Restructure solution layout by module
	
		
			
	
		
	
	
		
	
		
			Some checks failed
		
		
	
	
		
			
				
	
				Docs CI / lint-and-preview (push) Has been cancelled
				
			
		
		
	
	
				
					
				
			
		
		
							
								
								
									
										409
									
								
								docs/implplan/EPIC_4.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,409 @@ | ||||
| Here’s Epic 4 in the same paste‑into‑repo, implementation‑ready style as the prior epics. It’s exhaustive, formal, and slots directly into the existing AOC model, Policy Engine, and Console. | ||||
|  | ||||
| --- | ||||
|  | ||||
| # Epic 4: Policy Studio (author, version, simulate) | ||||
|  | ||||
| > Short name: **Policy Studio** | ||||
| > Services touched: **Policy Engine**, **Policy Registry** (new), **Web API Gateway**, **Authority** (authN/Z), **Scheduler/Workers**, **SBOM Service**, **Conseiller (Feedser)**, **Excitator (Vexer)**, **Telemetry** | ||||
| > Surfaces: **Console (Web UI)** feature module, **CLI**, **CI hooks** | ||||
| > Deliverables: Authoring workspace, policy versioning, static checks, simulation at scale, reviews/approvals, signing/publishing, promotion | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) What it is | ||||
|  | ||||
| **Policy Studio** is the end‑to‑end system for creating, evolving, and safely rolling out the rules that turn AOC facts (SBOM, advisories, VEX) into **effective findings**. It provides: | ||||
|  | ||||
| * A **workspace** where authors write policies in the DSL (Epic 2), with linting, autocompletion, snippets, and templates. | ||||
| * A **Policy Registry** that stores immutable versions, compiled artifacts, metadata, provenance, and signatures. | ||||
| * **Simulation** at two levels: quick local samples and large batch simulations across real SBOM inventories with full deltas. | ||||
| * A **review/approval** workflow with comments, diffs, required approvers, and promotion to environments (dev/test/prod). | ||||
| * **Publishing** semantics: signed, immutable versions bound to tenants; rollback and deprecation. | ||||
| * Tight integration with **Explain** traces so any change can show exactly which rules fired and why outcomes shifted. | ||||
|  | ||||
| The Studio respects **AOC enforcement**: policies never edit or merge facts. They only interpret facts and produce determinations consistent with precedence rules defined in the DSL. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2) Why | ||||
|  | ||||
| * Policy errors are expensive. Authors need safe sandboxes, deterministic builds, and evidence before rollout. | ||||
| * Auditors require immutability, provenance, and reproducibility from “source policy” to “effective finding.” | ||||
| * Teams want gradual rollout: simulate, canary, promote, observe, rollback. | ||||
| * Policy knowledge should be modular, reusable, and testable, not tribal. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3) How it should work (maximum detail) | ||||
|  | ||||
| ### 3.1 Domain model | ||||
|  | ||||
| * **PolicyPackage**: `{name, tenant, description, owners[], tags[], created_at}` | ||||
| * **PolicyVersion** (immutable): `{package, semver, source_sha, compiled_sha, status: draft|review|approved|published|deprecated|archived, created_by, created_at, signatures[], changelog, metadata{}}` | ||||
| * **Workspace**: mutable working area for authors; holds unversioned edits until compiled. | ||||
| * **CompilationArtifact**: `{policy_version, compiler_version, diagnostics[], rule_index[], symbol_table}` | ||||
| * **SimulationSpec**: `{policy_version|workspace, sbom_selector, time_window?, environment?, sample_size?, severity_floor?, includes{advisories?, vex?}}` | ||||
| * **SimulationRun**: `{run_id, spec, started_at, finished_at, result{counts_before, counts_after, top_deltas[], by_rule_hit[], sample_explains[]}}` | ||||
| * **Review**: `{policy_version, required_approvers[], votes[], comments[], files_changed[], diffs[]}` | ||||
| * **Promotion**: `{policy_version, environment: dev|test|prod, promoted_by, promoted_at, rollout_strategy: All|Percent|TenantSubset}` | ||||
| * **Attestation**: OIDC‑backed signature metadata binding `source_sha` and `compiled_sha` to an actor and time. | ||||
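The status field on **PolicyVersion** can be sketched as a small state machine. This is a minimal illustration only: the epic lists the statuses but not the permitted transitions, so the `ALLOWED` graph below is an assumption, as are the `Status` enum and the `transition` helper.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(str, Enum):
    DRAFT = "draft"
    REVIEW = "review"
    APPROVED = "approved"
    PUBLISHED = "published"
    DEPRECATED = "deprecated"
    ARCHIVED = "archived"


# Assumed transition graph (hypothetical): the epic names the statuses only.
ALLOWED = {
    Status.DRAFT: {Status.REVIEW},
    Status.REVIEW: {Status.DRAFT, Status.APPROVED},
    Status.APPROVED: {Status.PUBLISHED},
    Status.PUBLISHED: {Status.DEPRECATED},
    Status.DEPRECATED: {Status.ARCHIVED},
    Status.ARCHIVED: set(),
}


@dataclass(frozen=True)
class PolicyVersion:
    package: str
    semver: str
    source_sha: str
    compiled_sha: str
    status: Status = Status.DRAFT

    def transition(self, target: Status) -> "PolicyVersion":
        """Return a new record with the target status; never mutate in place."""
        if target not in ALLOWED[self.status]:
            raise ValueError(
                f"illegal transition {self.status.value} -> {target.value}"
            )
        return replace(self, status=target)
```

A frozen dataclass mirrors the immutability requirement: a status change yields a new record rather than editing the existing one.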
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
|  | ||||
| ### 3.2 Authoring workflow | ||||
|  | ||||
| 1. **Create** a workspace from a template (e.g., “Default Risk Model,” “License Tilted,” “Cloud‑Native SBOM”). | ||||
| 2. **Edit** in the Studio: Monaco editor with DSL grammar, intelligent completion for predicates, policies, attributes. | ||||
| 3. **Lint & compile** locally: semantic checks, forbidden rules detection, policy size limits, constant‑folding. | ||||
| 4. **Unit tests**: run policy test cases on bundled fixtures and golden expectations. | ||||
| 5. **Quick simulate** on selected SBOMs (10–50 items) to preview counts, examples, and rule heatmap. | ||||
| 6. **Propose version**: bump semver, enter changelog; create a **PolicyVersion** in `review` with compiled artifacts. | ||||
| 7. **Review & approval**: side‑by‑side diff, comments, required approvers enforced by RBAC. | ||||
| 8. **Batch simulation**: run at scale across tenant inventory; produce deltas, sample explainer evidence. | ||||
| 9. **Publish**: sign and move to `published`; optional **Promotion** to target environment(s). | ||||
| 10. **Run** evaluation with the selected policy version; verify outcomes; optionally promote to default. | ||||
| 11. **Rollback**: select an older version; promotion updates references without mutating older versions. | ||||
|  | ||||
| ### 3.3 Editing experience (Console) | ||||
|  | ||||
| * **Three‑pane layout**: file tree, editor, diagnostics/simulation. | ||||
| * **Features**: autocomplete from symbol table, in‑editor docs on hover, go‑to definition, rule references, rename symbols across files, snippet library, policy templates. | ||||
| * **Validations**: | ||||
|  | ||||
|   * AOC guardrails: no edit/merge actions on source facts, only interpretation. | ||||
|   * Precedence correctness: if rules conflict, studio shows explicit order and effective winner. | ||||
|   * Severity floor and normalization mapping validated against registry configuration. | ||||
| * **Diagnostics panel**: errors, warnings, performance hints (e.g., “predicate X loads N advisories per component; consider indexing”). | ||||
| * **Rule heatmap**: during simulation, bar chart of rule firings and the objects they impact. | ||||
| * **Explain sampler**: click any delta bucket to open a sampled finding with full trace. | ||||
|  | ||||
| ### 3.4 Simulation | ||||
|  | ||||
| * **Quick Sim**: synchronous; runs as a browser‑orchestrated job against the API, constrained by `sample_size`. | ||||
| * **Batch Sim**: asynchronous; runs in workers: | ||||
|  | ||||
|   * Input selection: all SBOMs, labels, artifact regex, last N ingests, or a curated set. | ||||
|   * Outputs: counts by severity before/after, by status, top deltas by component and advisory, rule heatmap, top K affected artifacts. | ||||
|   * Evidence: NDJSON of sampled findings with traces; CSV summary; signed result manifest. | ||||
|   * Guardrails: a version cannot be published when batch sim drift exceeds a configurable threshold, unless an override justification is recorded. | ||||
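The publish guardrail could look roughly like the sketch below. The epic does not define how drift is measured, so the metric here (sum of absolute per-severity changes divided by the total findings before) is an assumption, as are the function names `drift_ratio` and `can_publish`.

```python
from typing import Dict, Optional


def drift_ratio(counts_before: Dict[str, int], counts_after: Dict[str, int]) -> float:
    """Assumed metric: total absolute change per severity / findings before."""
    keys = set(counts_before) | set(counts_after)
    changed = sum(
        abs(counts_after.get(k, 0) - counts_before.get(k, 0)) for k in keys
    )
    total = sum(counts_before.values())
    if total == 0:
        # No baseline findings: any change counts as full drift.
        return 1.0 if changed else 0.0
    return changed / total


def can_publish(
    counts_before: Dict[str, int],
    counts_after: Dict[str, int],
    threshold: float,
    override_justification: Optional[str] = None,
) -> bool:
    """Allow publish under the threshold, or above it with a non-empty override."""
    if drift_ratio(counts_before, counts_after) <= threshold:
        return True
    return bool(override_justification and override_justification.strip())
```

With `{"critical": 10, "high": 40, "medium": 50}` before and `{"critical": 12, "high": 33, "medium": 50}` after, nine findings moved out of one hundred, so a 5% threshold would block the publish unless a justification is supplied.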
|  | ||||
| ### 3.5 Versioning & promotion | ||||
|  | ||||
| * Semver is enforced: `major` signals a compatibility break (e.g., precedence changes), `minor` adds rules, and `patch` carries fixes. | ||||
| * **Immutable**: after `published`, the version cannot change; deprecate instead. | ||||
| * **Environment bindings**: dev/test/prod mapping per tenant; default policy per environment. | ||||
| * **Canary**: promote to a subset of tenants or artifacts; the Runs page displays A/B comparisons. | ||||
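Two small helpers illustrate the versioning and canary rules above. Both are sketches under stated assumptions: `bump` handles only plain `MAJOR.MINOR.PATCH` strings, and `in_canary` assumes that percent rollouts are decided by hashing a tenant id into one of 100 buckets, which the epic does not specify.

```python
import hashlib


def bump(version: str, level: str) -> str:
    """Bump a plain MAJOR.MINOR.PATCH string; pre-release tags are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump level: {level}")


def in_canary(tenant_id: str, percent: int) -> bool:
    """Hypothetical bucketing: stable hash of the tenant id, modulo 100."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```

Hashing keeps the canary membership stable between runs, and raising the percentage only ever adds tenants to the canary set.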
|  | ||||
| ### 3.6 Review & approval | ||||
|  | ||||
| * Require N approvers by role; self‑approval optionally prohibited. | ||||
| * Line and file comments; overall decision with justification. | ||||
| * Review snapshot captures: diffs, diagnostics, simulation summary. | ||||
| * Webhooks to notify external systems of review events. | ||||
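The "N approvers by role" rule, with optional prohibition of self-approval, might be checked along these lines. The `Vote` shape and the `quorum_met` function are hypothetical; the sketch counts approvals only and does not model how rejections or change requests affect the outcome.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Vote:
    user: str
    role: str
    approve: bool


def quorum_met(
    votes: Iterable[Vote],
    author: str,
    required: Dict[str, int],
    allow_self_approval: bool = False,
) -> bool:
    """True when every role has at least the required number of distinct approvers."""
    approvers: Dict[str, Set[str]] = {role: set() for role in required}
    for vote in votes:
        if not vote.approve or vote.role not in required:
            continue
        if vote.user == author and not allow_self_approval:
            continue  # the author's own approval is ignored
        approvers[vote.role].add(vote.user)
    return all(len(approvers[role]) >= n for role, n in required.items())
```

Collecting approvers in a set means a user who votes twice is still counted once.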
|  | ||||
| ### 3.7 RBAC (Authority) | ||||
|  | ||||
| Roles per tenant: | ||||
|  | ||||
| * **Policy Author**: create/edit workspace, quick sim, propose versions. | ||||
| * **Policy Reviewer**: comment, request changes, approve/reject. | ||||
| * **Policy Approver**: final approve, publish. | ||||
| * **Policy Operator**: promote, rollback, schedule runs. | ||||
| * **Read‑only Auditor**: view everything, download evidence. | ||||
|  | ||||
| All actions are checked server‑side; the UI only hides affordances the user is not permitted to use. | ||||
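A deny-by-default server-side check for these roles could be sketched as follows. The role keys and action strings are illustrative names invented for this example, not identifiers defined by Authority.

```python
from typing import Dict, Iterable, Set

# Hypothetical role -> action mapping derived from the role list above.
PERMISSIONS: Dict[str, Set[str]] = {
    "policy-author": {"workspace:edit", "simulate:quick", "version:propose"},
    "policy-reviewer": {"review:comment", "review:request-changes",
                        "review:approve", "review:reject"},
    "policy-approver": {"version:approve", "version:publish"},
    "policy-operator": {"version:promote", "version:rollback", "run:schedule"},
    "auditor": {"read:all", "evidence:download"},
}


def is_allowed(roles: Iterable[str], action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions grant nothing."""
    return any(action in PERMISSIONS.get(role, set()) for role in roles)
```

Because the lookup falls back to an empty set, a misspelled or unconfigured role never widens access.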
|  | ||||
| ### 3.8 CLI + CI integration | ||||
|  | ||||
| CLI verbs (examples): | ||||
|  | ||||
| ``` | ||||
| stella policy init --template default | ||||
| stella policy lint | ||||
| stella policy compile | ||||
| stella policy test --golden ./tests | ||||
| stella policy simulate --sboms label:prod --sample 1000 | ||||
| stella policy version bump --level minor --changelog "Normalize GHSA CVSS" | ||||
| stella policy submit --reviewers alice@example.com,bob@example.com | ||||
| stella policy approve --version 1.3.0 | ||||
| stella policy publish --version 1.3.0 --sign | ||||
| stella policy promote --version 1.3.0 --env test --percent 20 | ||||
| stella policy rollback --env prod --to 1.2.1 | ||||
| ``` | ||||
|  | ||||
| CI usage: | ||||
|  | ||||
| * Lint, compile, and run unit tests on PRs that modify `/policies/**`. | ||||
| * Optionally trigger **Batch Sim** against a staging inventory and post a Markdown report to the PR. | ||||
| * Block merge if diagnostics include errors or drift exceeds thresholds. | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
|  | ||||
| ### 3.9 APIs (representative) | ||||
|  | ||||
| * `POST /policies/workspaces` create from template | ||||
| * `PUT /policies/workspaces/{id}/files` edit source files | ||||
| * `POST /policies/workspaces/{id}/compile` get diagnostics + compiled artifact | ||||
| * `POST /policies/workspaces/{id}/simulate` quick sim | ||||
| * `POST /policies/versions` create version from workspace with semver + changelog | ||||
| * `GET /policies/versions/{id}` fetch version + diagnostics + sim summary | ||||
| * `POST /policies/versions/{id}/reviews` open review | ||||
| * `POST /policies/versions/{id}/approve` record approval | ||||
| * `POST /policies/versions/{id}/publish` sign + publish | ||||
| * `POST /policies/versions/{id}/promote` bind to env/canary | ||||
| * `POST /policies/versions/{id}/simulate-batch` start batch sim (async) | ||||
| * `GET /policies/simulations/{run_id}` get sim results and artifacts | ||||
| * `GET /policies/registry` list packages/versions, status and bindings | ||||
|  | ||||
| All calls require tenant scoping and RBAC. | ||||
|  | ||||
| ### 3.10 Storage & data | ||||
|  | ||||
| * **Policy Registry DB** (MongoDB): packages, versions, workspaces, metadata. | ||||
| * **Object storage**: source bundles, compiled artifacts, simulation result bundles, evidence. | ||||
| * **Indexing**: compound indexes by `{tenant, package}`, `{tenant, status}`, `{tenant, environment}`. | ||||
| * **Retention**: configurable retention for workspaces and simulation artifacts; versions never deleted, only archived. | ||||
|  | ||||
| ### 3.11 Evidence & provenance | ||||
|  | ||||
| * Every published version has: | ||||
|  | ||||
|   * `source_sha` (content digest of the policy source bundle) | ||||
|   * `compiled_sha` (digest of compiled artifact) | ||||
|   * Attestation: signed envelope binding digests to an identity, time, and tenant. | ||||
|   * Links to the exact compiler version, inputs, and environment. | ||||
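One way to obtain a reproducible `source_sha` is to hash the source files in sorted path order with length prefixes. This is a sketch only: SHA-256, the `sha256:` prefix, and the framing of paths and contents are assumptions, since the epic states only that the value is a content digest of the source bundle.

```python
import hashlib
from typing import Dict


def source_digest(files: Dict[str, bytes]) -> str:
    """Digest a {path: content} bundle independently of insertion order."""
    hasher = hashlib.sha256()
    for path in sorted(files):
        name = path.encode("utf-8")
        data = files[path]
        # Length prefixes keep ("ab", "c") distinct from ("a", "bc").
        hasher.update(len(name).to_bytes(8, "big"))
        hasher.update(name)
        hasher.update(len(data).to_bytes(8, "big"))
        hasher.update(data)
    return "sha256:" + hasher.hexdigest()
```

Sorting the paths makes the digest depend only on file names and contents, which supports the determinism requirement that identical inputs yield identical digests.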
|  | ||||
| ### 3.12 Observability | ||||
|  | ||||
| * Metrics: compile time, diagnostics rate, simulation queue depth, delta magnitude distribution, approval latencies. | ||||
| * Logs: structured events for lifecycle transitions. | ||||
| * Traces: long simulations emit one span per shard. | ||||
|  | ||||
| ### 3.13 Performance & scale | ||||
|  | ||||
| * Compilation should complete in under 3 seconds for typical policies; emit a warning at 10 seconds. | ||||
| * Batch sim uses workers with partitioning by SBOM id; results reduced by the API. | ||||
| * Memory guardrails on rule execution; deny policies that exceed configured complexity limits. | ||||
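Partitioning by SBOM id and reducing partial results can be sketched as below. The CRC32-modulo shard assignment and the function names are assumptions made for illustration; the epic says only that workers partition by SBOM id and that the API reduces the results.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List


def shard_for(sbom_id: str, shard_count: int) -> int:
    """Stable shard index for an SBOM id (assumed scheme: CRC32 modulo N)."""
    return zlib.crc32(sbom_id.encode("utf-8")) % shard_count


def partition(sbom_ids: Iterable[str], shard_count: int) -> List[List[str]]:
    """Group SBOM ids into shard_count lists; each id lands in exactly one."""
    shards: List[List[str]] = [[] for _ in range(shard_count)]
    for sbom_id in sbom_ids:
        shards[shard_for(sbom_id, shard_count)].append(sbom_id)
    return shards


def reduce_partials(partials: Iterable[Dict[str, int]]) -> Counter:
    """Sum per-shard severity counts into one result."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Because addition is associative and commutative, the reducer gives the same totals regardless of the order in which shard results arrive.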
|  | ||||
| ### 3.14 Security | ||||
|  | ||||
| * OIDC‑backed signing and attestation. | ||||
| * Policy sources are scanned on upload for secrets; blocked if found. | ||||
| * Strict CSP in Studio pages; tokens stored in memory, not localStorage. | ||||
| * Tenant isolation in buckets and DB collections. | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Implementation plan | ||||
|  | ||||
| ### 4.1 Services | ||||
|  | ||||
| * **Policy Registry (new microservice)** | ||||
|  | ||||
|   * REST API and background workers for batch simulation orchestration. | ||||
|   * Stores workspaces, versions, metadata, bindings, reviews. | ||||
|   * Generates signed attestations at publish time. | ||||
|   * Coordinates with **Policy Engine** for compile/simulate invocations. | ||||
|  | ||||
| * **Policy Engine (existing)** | ||||
|  | ||||
|   * Expose compile and simulate endpoints with deterministic outputs. | ||||
|   * Provide rule coverage, symbol table, and explain traces for samples. | ||||
|  | ||||
| * **Web API Gateway** | ||||
|  | ||||
|   * Routes requests; injects tenant context; enforces RBAC. | ||||
|  | ||||
| ### 4.2 Console (Web UI) feature module | ||||
|  | ||||
| * `packages/features/policies` (shared with Epic 3): | ||||
|  | ||||
|   * **Studio** routes: `/policies/studio`, `/policies/:id/versions/:v/edit`, `/simulate`, `/review`. | ||||
|   * Monaco editor wrapper for DSL with hover docs, autocomplete. | ||||
|   * Diff viewer, diagnostics, heatmap, explain sampler, review UI. | ||||
|  | ||||
| ### 4.3 CLI | ||||
|  | ||||
| * New commands under `stella policy *`; typed client generated from OpenAPI. | ||||
| * Outputs machine‑readable JSON and pretty tables. | ||||
|  | ||||
| ### 4.4 Workers | ||||
|  | ||||
| * **Simulation workers**: pull shards of SBOMs, run policy, emit partials, reduce into result bundle. | ||||
| * **Notification worker**: sends webhooks on review, approval, publish, promote. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Documentation changes (create/update) | ||||
|  | ||||
| 1. **`/docs/policy/studio-overview.md`** | ||||
|  | ||||
|    * Concepts, roles, lifecycle, glossary. | ||||
| 2. **`/docs/policy/authoring.md`** | ||||
|  | ||||
|    * Workspace, templates, snippets, lint rules, best practices. | ||||
| 3. **`/docs/policy/versioning-and-publishing.md`** | ||||
|  | ||||
|    * Semver, immutability, deprecation, rollback, attestations. | ||||
| 4. **`/docs/policy/simulation.md`** | ||||
|  | ||||
|    * Quick vs batch sim, selection strategies, thresholds, evidence artifacts. | ||||
| 5. **`/docs/policy/review-and-approval.md`** | ||||
|  | ||||
|    * Required approvers, comments, webhooks, audit trail. | ||||
| 6. **`/docs/policy/promotion.md`** | ||||
|  | ||||
|    * Environments, canary, default policy binding, rollback. | ||||
| 7. **`/docs/policy/cli.md`** | ||||
|  | ||||
|    * Command reference with examples and JSON outputs. | ||||
| 8. **`/docs/policy/api.md`** | ||||
|  | ||||
|    * REST endpoints, request/response schemas, error codes. | ||||
| 9. **`/docs/security/policy-attestations.md`** | ||||
|  | ||||
|    * Signatures, digests, verifier steps. | ||||
| 10. **`/docs/architecture/policy-registry.md`** | ||||
|  | ||||
|     * Service design, schemas, queues, failure modes. | ||||
| 11. **`/docs/observability/policy-telemetry.md`** | ||||
|  | ||||
|     * Metrics, logs, tracing, dashboards. | ||||
| 12. **`/docs/runbooks/policy-incident.md`** | ||||
|  | ||||
|     * Rolling back a bad policy, freezing publishes, forensic steps. | ||||
| 13. **`/docs/examples/policy-templates.md`** | ||||
|  | ||||
|     * Ready‑made templates and snippet catalog. | ||||
| 14. **`/docs/aoc/aoc-guardrails.md`** | ||||
|  | ||||
|     * How Studio enforces AOC in authoring and review. | ||||
|  | ||||
| Each doc ends with a “Compliance checklist.” | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Tasks | ||||
|  | ||||
| ### 6.1 Backend: Policy Registry | ||||
|  | ||||
| * [ ] Define OpenAPI spec for Registry (workspaces, versions, reviews, sim). | ||||
| * [ ] Implement workspace storage and file CRUD. | ||||
| * [ ] Integrate with Policy Engine compile endpoint; return diagnostics, symbol table. | ||||
| * [ ] Implement quick simulation with request limits. | ||||
| * [ ] Implement batch simulation orchestration: enqueue shards, collect results, reduce deltas, store artifacts. | ||||
| * [ ] Implement review model: comments, required approvers, decisions. | ||||
| * [ ] Implement publish: sign, persist attestation, set status=published. | ||||
| * [ ] Implement promotion bindings per tenant/environment; canary subsets. | ||||
| * [ ] RBAC checks for all endpoints. | ||||
| * [ ] Unit/integration tests; load tests for batch sim. | ||||
|  | ||||
| ### 6.2 Policy Engine enhancements | ||||
|  | ||||
| * [ ] Return rule coverage and firing counts with compile/simulate. | ||||
| * [ ] Return symbol table and inline docs for editor autocomplete. | ||||
| * [ ] Expose deterministic Explain traces for sampled findings. | ||||
| * [ ] Enforce complexity/time limits and report breaches. | ||||
|  | ||||
| ### 6.3 Console (Web UI) | ||||
|  | ||||
| * [ ] Build Studio editor wrapper with Monaco + DSL language server hooks. | ||||
| * [ ] Implement file tree, snippets, templates, hotkeys, search/replace. | ||||
| * [ ] Diagnostics panel with jump‑to‑line, quick fixes. | ||||
| * [ ] Simulation panel: quick sim UI, charts, heatmap, sample explains. | ||||
| * [ ] Review UI: diff, comments, approvals, status badges. | ||||
| * [ ] Publish & Promote flows with confirmation and post‑actions. | ||||
| * [ ] Batch sim results pages with export buttons. | ||||
| * [ ] Accessibility audits and keyboard‑only authoring flow. | ||||
|  | ||||
| ### 6.4 CLI | ||||
|  | ||||
| * [ ] Implement commands listed in 3.8 with rich help and examples. | ||||
| * [ ] Add `--json` flag for machine consumption; emit stable schemas. | ||||
| * [ ] Exit codes aligned with CI usage (lint errors → non‑zero). | ||||
|  | ||||
| ### 6.5 CI/CD & Security | ||||
|  | ||||
| * [ ] Add CI job that runs `stella policy lint/compile/test` on PRs. | ||||
| * [ ] Optional job that triggers batch sim against staging inventory; post summary to PR. | ||||
| * [ ] Policy source secret scanning; block on findings. | ||||
| * [ ] Signing keys configuration; verify pipeline for attestation on publish. | ||||
|  | ||||
| ### 6.6 Docs | ||||
|  | ||||
| * [ ] Write all docs in section 5 with screenshots and CLI transcripts. | ||||
| * [ ] Add cookbook examples and templates in `/docs/examples/policy-templates.md`. | ||||
| * [ ] Wire contextual Help links from Studio to relevant docs. | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7) Acceptance criteria | ||||
|  | ||||
| * Authors can create, edit, lint, compile policies with inline diagnostics and autocomplete. | ||||
| * Quick simulation produces counts, rule heatmap, and sample explains within UI. | ||||
| * Batch simulation scales across large SBOM sets, producing deltas and downloadable evidence. | ||||
| * Review requires configured approvers; comments and diffs are preserved. | ||||
| * Publish generates immutable, signed versions with attestations. | ||||
| * Promotion binds versions to environments and supports canary and rollback. | ||||
| * CLI supports full lifecycle and is usable in CI. | ||||
| * All actions are tenant‑scoped, RBAC‑enforced, and logged. | ||||
| * AOC guardrails prevent any mutation of raw facts. | ||||
| * Documentation shipped and linked contextually from the Studio. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 8) Risks & mitigations | ||||
|  | ||||
| * **Policy complexity causes timeouts** → compile‑time complexity scoring, execution limits, early diagnostics. | ||||
| * **Simulation cost at scale** → sharding and streaming reducers; sampling; configurable quotas. | ||||
| * **RBAC misconfiguration** → server‑enforced checks, defense‑in‑depth tests, deny‑by‑default. | ||||
| * **Attestation key management** → OIDC‑backed signatures; auditable verifier tool; time‑boxed credentials. | ||||
| * **Editor usability** → language server with accurate completions; docs on hover; snippet library. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 9) Test plan | ||||
|  | ||||
| * **Unit**: compiler adapters, registry models, review workflow, CLI options. | ||||
| * **Integration**: compile→simulate→publish→promote on seeded data. | ||||
| * **E2E**: Playwright flows for author→review→batch sim→publish→promote→rollback. | ||||
| * **Performance**: load test batch simulation with 100k components spread across SBOMs. | ||||
| * **Security**: RBAC matrix tests; secret scanning; signing and verification. | ||||
| * **Determinism**: same inputs produce identical `compiled_sha` and simulation summaries. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 10) Feature flags | ||||
|  | ||||
| * `policy.studio` (enables editor and quick sim) | ||||
| * `policy.batch-sim` | ||||
| * `policy.canary-promotion` | ||||
| * `policy.signature-required` (enforce signing on publish) | ||||
|  | ||||
| Flags documented in `/docs/observability/policy-telemetry.md`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 11) Non‑goals (this epic) | ||||
|  | ||||
| * Building a general IDE for arbitrary languages; the editor is purpose‑built for the DSL. | ||||
| * Auto‑generated policies from AI without human approval. | ||||
| * Cross‑tenant policies; all policies are tenant‑scoped. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 12) Philosophy | ||||
|  | ||||
| * **Safety first**: it’s cheaper to prevent a bad policy than to fix its fallout. | ||||
| * **Determinism**: same inputs, same outputs, verifiably. | ||||
| * **Immutability**: versions and evidence are forever; we deprecate, not mutate. | ||||
| * **Transparency**: every change is explainable with traces and proofs. | ||||
| * **Reusability**: templates, snippets, and tests turn policy from art into engineering. | ||||
|  | ||||
| > Final reminder: **Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.** | ||||