feat: Add new projects to solution and implement contract testing documentation
- Added "StellaOps.Policy.Engine", "StellaOps.Cartographer", and "StellaOps.SbomService" projects to the StellaOps solution.
- Created AGENTS.md to outline the Contract Testing Guild Charter, detailing mission, scope, and definition of done.
- Established TASKS.md for the Contract Testing Task Board, outlining tasks for Sprint 62 and Sprint 63 related to mock servers and replay testing.
409
EPIC_4.md
Normal file
@@ -0,0 +1,409 @@
Here’s Epic 4 in the same paste‑into‑repo, implementation‑ready style as the prior epics. It’s exhaustive, formal, and slots directly into the existing AOC model, Policy Engine, and Console.

---

# Epic 4: Policy Studio (author, version, simulate)

> Short name: **Policy Studio**
> Services touched: **Policy Engine**, **Policy Registry** (new), **Web API Gateway**, **Authority** (authN/Z), **Scheduler/Workers**, **SBOM Service**, **Conseiller (Feedser)**, **Excitator (Vexer)**, **Telemetry**
> Surfaces: **Console (Web UI)** feature module, **CLI**, **CI hooks**
> Deliverables: Authoring workspace, policy versioning, static checks, simulation at scale, reviews/approvals, signing/publishing, promotion

---

## 1) What it is

**Policy Studio** is the end‑to‑end system for creating, evolving, and safely rolling out the rules that turn AOC facts (SBOM, advisories, VEX) into **effective findings**. It provides:

* A **workspace** where authors write policies in the DSL (Epic 2), with linting, autocompletion, snippets, and templates.
* A **Policy Registry** that stores immutable versions, compiled artifacts, metadata, provenance, and signatures.
* **Simulation** at two levels: quick local samples and large batch simulations across real SBOM inventories with full deltas.
* A **review/approval** workflow with comments, diffs, required approvers, and promotion to environments (dev/test/prod).
* **Publishing** semantics: signed, immutable versions bound to tenants; rollback and deprecation.
* Tight integration with **Explain** traces so any change can show exactly which rules fired and why outcomes shifted.

The Studio respects **AOC enforcement**: policies never edit or merge facts. They only interpret facts and produce determinations consistent with precedence rules defined in the DSL.

---

## 2) Why

* Policy errors are expensive. Authors need safe sandboxes, deterministic builds, and evidence before rollout.
* Auditors require immutability, provenance, and reproducibility from “source policy” to “effective finding.”
* Teams want gradual rollout: simulate, canary, promote, observe, rollback.
* Policy knowledge should be modular, reusable, and testable, not tribal.

---

## 3) How it should work (maximum detail)

### 3.1 Domain model

* **PolicyPackage**: `{name, tenant, description, owners[], tags[], created_at}`
* **PolicyVersion** (immutable): `{package, semver, source_sha, compiled_sha, status: draft|review|approved|published|deprecated|archived, created_by, created_at, signatures[], changelog, metadata{}}`
* **Workspace**: mutable working area for authors; holds unversioned edits until compiled.
* **CompilationArtifact**: `{policy_version, compiler_version, diagnostics[], rule_index[], symbol_table}`
* **SimulationSpec**: `{policy_version|workspace, sbom_selector, time_window?, environment?, sample_size?, severity_floor?, includes{advisories?, vex?}}`
* **SimulationRun**: `{run_id, spec, started_at, finished_at, result{counts_before, counts_after, top_deltas[], by_rule_hit[], sample_explains[]}}`
* **Review**: `{policy_version, required_approvers[], votes[], comments[], files_changed[], diffs[]}`
* **Promotion**: `{policy_version, environment: dev|test|prod, promoted_by, promoted_at, rollout_strategy: All|Percent|TenantSubset}`
* **Attestation**: OIDC‑backed signature metadata binding `source_sha` and `compiled_sha` to an actor and time.
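
To make the "immutable" qualifier on **PolicyVersion** concrete, here is a minimal Python sketch of the record as a frozen dataclass. The field names follow the model above; the `Status` enum, the tuple type for `signatures`, and the `with_status` helper are illustrative assumptions, not the Registry's actual schema. It also assumes that a status change yields a new record rather than mutating the old one.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from datetime import datetime, timezone
from enum import Enum


class Status(str, Enum):
    DRAFT = "draft"
    REVIEW = "review"
    APPROVED = "approved"
    PUBLISHED = "published"
    DEPRECATED = "deprecated"
    ARCHIVED = "archived"


@dataclass(frozen=True)  # frozen: assigning to a field raises FrozenInstanceError
class PolicyVersion:
    package: str
    semver: str
    source_sha: str
    compiled_sha: str
    status: Status
    created_by: str
    created_at: datetime
    signatures: tuple[str, ...] = ()  # tuple, so the collection is immutable too
    changelog: str = ""


def with_status(version: PolicyVersion, status: Status) -> PolicyVersion:
    """Hypothetical helper: return a copy with a new status; the input is untouched."""
    return replace(version, status=status)


draft = PolicyVersion(
    package="default-risk",          # illustrative values only
    semver="1.3.0",
    source_sha="sha256:aaaa",
    compiled_sha="sha256:bbbb",
    status=Status.DRAFT,
    created_by="alice@example.com",
    created_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
in_review = with_status(draft, Status.REVIEW)
```

Keeping digests and semver on a frozen record means any "change" is visible as a new object, which matches the deprecate-instead-of-mutate rule in 3.5.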

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

### 3.2 Authoring workflow

1. **Create** a workspace from a template (e.g., “Default Risk Model,” “License Tilted,” “Cloud‑Native SBOM”).
2. **Edit** in the Studio: Monaco editor with DSL grammar, intelligent completion for predicates, policies, attributes.
3. **Lint & compile** locally: semantic checks, forbidden rules detection, policy size limits, constant‑folding.
4. **Unit tests**: run policy test cases on bundled fixtures and golden expectations.
5. **Quick simulate** on selected SBOMs (10–50 items) to preview counts, examples, and rule heatmap.
6. **Propose version**: bump semver, enter changelog; create a **PolicyVersion** in `review` with compiled artifacts.
7. **Review & approval**: side‑by‑side diff, comments, required approvers enforced by RBAC.
8. **Batch simulation**: run at scale across tenant inventory; produce deltas, sample explainer evidence.
9. **Publish**: sign and move to `published`; optional **Promotion** to target environment(s).
10. **Run** evaluation with the selected policy version; verify outcomes; optionally promote to default.
11. **Rollback**: select an older version; promotion updates references without mutating older versions.
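
The workflow above implies a status lifecycle for a PolicyVersion. A minimal sketch of it as a transition table follows. The status names come from 3.1, but the exact set of allowed transitions (for example `review` back to `draft` when changes are requested) is an assumption made for illustration; the epic does not enumerate them.

```python
# Assumed transition table; only the status names are taken from the domain model.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "draft": {"review"},
    "review": {"approved", "draft"},   # "draft" models a changes-requested outcome
    "approved": {"published"},
    "published": {"deprecated"},       # published versions are never edited
    "deprecated": {"archived"},
    "archived": set(),
}


def transition(current: str, target: str) -> str:
    """Return the target status if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


status = "draft"
for step in ("review", "approved", "published"):
    status = transition(status, step)
```

Rollback does not appear in the table on purpose: per step 11 it changes which version an environment references, not the status of any version.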

### 3.3 Editing experience (Console)

* **Three‑pane layout**: file tree, editor, diagnostics/simulation.
* **Features**: autocomplete from symbol table, in‑editor docs on hover, go‑to definition, rule references, rename symbols across files, snippet library, policy templates.
* **Validations**:

  * AOC guardrails: no edit/merge actions on source facts, only interpretation.
  * Precedence correctness: if rules conflict, the Studio shows the explicit order and the effective winner.
  * Severity floor and normalization mapping validated against registry configuration.
* **Diagnostics panel**: errors, warnings, performance hints (e.g., “predicate X loads N advisories per component; consider indexing”).
* **Rule heatmap**: during simulation, bar chart of rule firings and the objects they impact.
* **Explain sampler**: click any delta bucket to open a sampled finding with full trace.

### 3.4 Simulation

* **Quick Sim**: synchronous; runs as a browser‑orchestrated job against the API, constrained by `sample_size`.
* **Batch Sim**: asynchronous; runs in workers:

  * Input selection: all SBOMs, labels, artifact regex, last N ingests, or a curated set.
  * Outputs: counts by severity before/after, by status, top deltas by component and advisory, rule heatmap, top K affected artifacts.
  * Evidence: NDJSON of sampled findings with traces; CSV summary; signed result manifest.
  * Guardrails: cannot publish if batch sim drift > configurable threshold without an override justification.
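
The publish guardrail can be sketched as a small check over the before/after severity counts. The epic does not define how "drift" is measured, so the ratio below (sum of absolute per-severity changes divided by the total before) is an assumption, as are the function and parameter names.

```python
from typing import Optional


def drift_ratio(before: dict[str, int], after: dict[str, int]) -> float:
    """Assumed drift metric: total absolute count change relative to the baseline."""
    severities = before.keys() | after.keys()
    changed = sum(abs(after.get(s, 0) - before.get(s, 0)) for s in severities)
    total = sum(before.values())
    if total == 0:
        # No baseline findings: any new finding counts as full drift.
        return 1.0 if changed else 0.0
    return changed / total


def can_publish(
    before: dict[str, int],
    after: dict[str, int],
    threshold: float,
    override_justification: Optional[str] = None,
) -> bool:
    """Allow publish when drift is within the threshold or a justification is given."""
    if drift_ratio(before, after) <= threshold:
        return True
    return bool(override_justification and override_justification.strip())


counts_before = {"critical": 10, "high": 40, "medium": 50}
counts_after = {"critical": 12, "high": 38, "medium": 50}
```

With these sample counts, four findings moved out of a baseline of one hundred, so a 5% threshold passes and a 1% threshold requires an override justification.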

### 3.5 Versioning & promotion

* Semver enforced: `major` implies compatibility break (e.g., precedence changes), `minor` adds rules, `patch` fixes.
* **Immutable**: after `published`, the version cannot change; deprecate instead.
* **Environment bindings**: dev/test/prod mapping per tenant; default policy per environment.
* **Canary**: promote to a subset of tenants or artifacts; the Runs page displays A/B comparisons.
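
As an illustration of the semver rule, a version bump can be reduced to a few lines. This is a sketch that handles plain `MAJOR.MINOR.PATCH` strings only (no pre-release or build metadata); the function name is hypothetical and this is not the CLI's actual implementation.

```python
def bump(version: str, level: str) -> str:
    """Bump a plain MAJOR.MINOR.PATCH version string at the given level."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":   # compatibility break, e.g. precedence changes
        return f"{major + 1}.0.0"
    if level == "minor":   # new rules added
        return f"{major}.{minor + 1}.0"
    if level == "patch":   # fixes only
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump level: {level!r}")


next_version = bump("1.2.1", "minor")  # "1.3.0"
```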

### 3.6 Review & approval

* Require N approvers by role; self‑approval optionally prohibited.
* Line and file comments; overall decision with justification.
* Review snapshot captures: diffs, diagnostics, simulation summary.
* Webhooks to notify external systems of review events.
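
The "N approvers by role" rule with optional self-approval prohibition could look like the sketch below. The `Vote` shape and role strings are assumptions for illustration, and the sketch deliberately ignores how rejections should be weighed, since the epic does not specify that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    reviewer: str
    role: str
    approve: bool


def is_approved(
    votes: list[Vote],
    author: str,
    required_by_role: dict[str, int],
    allow_self_approval: bool = False,
) -> bool:
    """True when every role has at least its required number of distinct approvals."""
    approvals: dict[str, int] = {}
    seen: set[str] = set()
    for vote in votes:
        if not vote.approve:
            continue
        if vote.reviewer == author and not allow_self_approval:
            continue  # the author's own approval does not count
        if vote.reviewer in seen:
            continue  # count each reviewer once
        seen.add(vote.reviewer)
        approvals[vote.role] = approvals.get(vote.role, 0) + 1
    return all(approvals.get(role, 0) >= n for role, n in required_by_role.items())


votes = [
    Vote("alice@example.com", "reviewer", True),   # alice is the author here
    Vote("bob@example.com", "reviewer", True),
    Vote("carol@example.com", "approver", True),
]
required = {"reviewer": 1, "approver": 1}
```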

### 3.7 RBAC (Authority)

Roles per tenant:

* **Policy Author**: create/edit workspace, quick sim, propose versions.
* **Policy Reviewer**: comment, request changes, approve/reject.
* **Policy Approver**: final approve, publish.
* **Policy Operator**: promote, rollback, schedule runs.
* **Read‑only Auditor**: view everything, download evidence.

All actions are checked server‑side; the UI only hides affordances.
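
A deny-by-default, tenant-scoped permission check for these roles might be sketched as follows. The role keys and action strings are hypothetical identifiers chosen for the example; Authority's real scope names are not given in this epic.

```python
# Hypothetical role and action identifiers, derived loosely from the role list above.
ROLE_ACTIONS: dict[str, set[str]] = {
    "policy-author": {"workspace.edit", "simulate.quick", "version.propose"},
    "policy-reviewer": {"review.comment", "review.request-changes", "review.vote"},
    "policy-approver": {"review.final-approve", "version.publish"},
    "policy-operator": {"version.promote", "version.rollback", "run.schedule"},
    "auditor": {"read", "evidence.download"},
}


def is_allowed(roles_by_tenant: dict[str, set[str]], tenant: str, action: str) -> bool:
    """Deny by default: unknown tenants, roles, and actions all yield False."""
    roles = roles_by_tenant.get(tenant, set())
    return any(action in ROLE_ACTIONS.get(role, set()) for role in roles)


# A user who is an author in tenant-a and an operator in tenant-b.
user_roles = {"tenant-a": {"policy-author"}, "tenant-b": {"policy-operator"}}
```

Because roles are looked up per tenant, a grant in one tenant never leaks into another, which is the property the server-side check has to guarantee regardless of what the UI shows.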

### 3.8 CLI + CI integration

CLI verbs (examples):

```
stella policy init --template default
stella policy lint
stella policy compile
stella policy test --golden ./tests
stella policy simulate --sboms label:prod --sample 1000
stella policy version bump --level minor --changelog "Normalize GHSA CVSS"
stella policy submit --reviewers alice@example.com,bob@example.com
stella policy approve --version 1.3.0
stella policy publish --version 1.3.0 --sign
stella policy promote --version 1.3.0 --env test --percent 20
stella policy rollback --env prod --to 1.2.1
```

CI usage:

* Lint, compile, and run unit tests on PRs that modify `/policies/**`.
* Optionally trigger **Batch Sim** against a staging inventory and post a Markdown report to the PR.
* Block merge if diagnostics include errors or drift exceeds thresholds.

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

### 3.9 APIs (representative)

* `POST /policies/workspaces` create from template
* `PUT /policies/workspaces/{id}/files` edit source files
* `POST /policies/workspaces/{id}/compile` get diagnostics + compiled artifact
* `POST /policies/workspaces/{id}/simulate` quick sim
* `POST /policies/versions` create version from workspace with semver + changelog
* `GET /policies/versions/{id}` fetch version + diagnostics + sim summary
* `POST /policies/versions/{id}/reviews` open review
* `POST /policies/versions/{id}/approve` record approval
* `POST /policies/versions/{id}/publish` sign + publish
* `POST /policies/versions/{id}/promote` bind to env/canary
* `POST /policies/versions/{id}/simulate-batch` start batch sim (async)
* `GET /policies/simulations/{run_id}` get sim results and artifacts
* `GET /policies/registry` list packages/versions, status and bindings

All calls require tenant scoping and RBAC.
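
For orientation, this is roughly how a client could build a tenant-scoped compile request with the Python standard library. Only the route comes from the list above; the base URL, the `X-Tenant` header name, the bearer-token scheme, and the empty JSON body are assumptions, since the request schema is left to the OpenAPI spec. The request is constructed but not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def compile_request(base_url: str, workspace_id: str, tenant: str, token: str) -> Request:
    """Build (but do not send) POST /policies/workspaces/{id}/compile."""
    url = f"{base_url.rstrip('/')}/policies/workspaces/{quote(workspace_id, safe='')}/compile"
    return Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # body schema is not defined here
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",   # assumed auth scheme
            "X-Tenant": tenant,                   # hypothetical tenant header
            "Content-Type": "application/json",
        },
    )


req = compile_request("https://stella.example.internal/", "ws-1", "tenant-a", "t0k3n")
```

Passing the result to `urllib.request.urlopen` would perform the call; the server is still expected to re-check tenant and role on every request.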

### 3.10 Storage & data

* **Policy Registry DB** (MongoDB): packages, versions, workspaces, metadata.
* **Object storage**: source bundles, compiled artifacts, simulation result bundles, evidence.
* **Indexing**: compound indexes by `{tenant, package}`, `{tenant, status}`, `{tenant, environment}`.
* **Retention**: configurable retention for workspaces and simulation artifacts; versions never deleted, only archived.

### 3.11 Evidence & provenance

* Every published version has:

  * `source_sha` (content digest of the policy source bundle)
  * `compiled_sha` (digest of compiled artifact)
  * Attestation: signed envelope binding digests to an identity, time, and tenant.
  * Links to the exact compiler version, inputs, and environment.
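
One way to obtain a reproducible `source_sha` is to hash the bundle's files in sorted path order with length prefixes. The epic does not prescribe a digest algorithm or bundle layout, so SHA-256, the `sha256:` prefix, and the in-memory `dict` of files are assumptions in this sketch.

```python
import hashlib


def bundle_digest(files: dict[str, bytes]) -> str:
    """Digest a source bundle independently of file insertion order."""
    h = hashlib.sha256()
    for path in sorted(files):
        name = path.encode("utf-8")
        data = files[path]
        # Length prefixes keep ("ab", "c") distinct from ("a", "bc").
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return "sha256:" + h.hexdigest()


bundle = {"policies/base.dsl": b"rule a {}", "policies/vex.dsl": b"rule b {}"}
reordered = {"policies/vex.dsl": b"rule b {}", "policies/base.dsl": b"rule a {}"}
source_sha = bundle_digest(bundle)
```

The same approach applies to `compiled_sha`, provided the compiler output itself is deterministic.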

### 3.12 Observability

* Metrics: compile time, diagnostics rate, simulation queue depth, delta magnitude distribution, approval latencies.
* Logs: structured events for lifecycle transitions.
* Traces: long simulations emit one span per shard.

### 3.13 Performance & scale

* Compilation should complete in under 3 seconds for typical policies; warn at 10 seconds.
* Batch sim uses workers with partitioning by SBOM id; results reduced by the API.
* Memory guardrails on rule execution; deny policies that exceed configured complexity limits.
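
Partitioning by SBOM id and reducing partial results can be sketched as below. Hash-modulo sharding and `Counter`-based reduction are assumptions chosen for the example; the epic only states that work is partitioned by SBOM id and reduced by the API.

```python
import hashlib
from collections import Counter


def shard_of(sbom_id: str, shard_count: int) -> int:
    """Stable shard assignment: the same id always maps to the same shard."""
    digest = hashlib.sha256(sbom_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(sbom_ids: list[str], shard_count: int) -> list[list[str]]:
    shards: list[list[str]] = [[] for _ in range(shard_count)]
    for sbom_id in sbom_ids:
        shards[shard_of(sbom_id, shard_count)].append(sbom_id)
    return shards


def reduce_partials(partials: list[dict[str, int]]) -> Counter:
    """Sum per-shard severity counts into one result."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


ids = [f"sbom-{n}" for n in range(20)]
shards = partition(ids, shard_count=4)
summary = reduce_partials([{"high": 3, "low": 1}, {"high": 2}, {"critical": 1}])
```

A cryptographic hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would break stable assignment across workers.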

### 3.14 Security

* OIDC‑backed signing and attestation.
* Policy sources are scanned on upload for secrets; blocked if found.
* Strict CSP in Studio pages; tokens stored in memory, not localStorage.
* Tenant isolation in buckets and DB collections.

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

## 4) Implementation plan

### 4.1 Services

* **Policy Registry (new microservice)**

  * REST API and background workers for batch simulation orchestration.
  * Stores workspaces, versions, metadata, bindings, reviews.
  * Generates signed attestations at publish time.
  * Coordinates with **Policy Engine** for compile/simulate invocations.

* **Policy Engine (existing)**

  * Expose compile and simulate endpoints with deterministic outputs.
  * Provide rule coverage, symbol table, and explain traces for samples.

* **Web API Gateway**

  * Routes requests; injects tenant context; enforces RBAC.

### 4.2 Console (Web UI) feature module

* `packages/features/policies` (shared with Epic 3):

  * **Studio** routes: `/policies/studio`, `/policies/:id/versions/:v/edit`, `/simulate`, `/review`.
  * Monaco editor wrapper for DSL with hover docs, autocomplete.
  * Diff viewer, diagnostics, heatmap, explain sampler, review UI.

### 4.3 CLI

* New commands under `stella policy *`; typed client generated from OpenAPI.
* Outputs machine‑readable JSON and pretty tables.

### 4.4 Workers

* **Simulation workers**: pull shards of SBOMs, run policy, emit partials, reduce into result bundle.
* **Notification worker**: sends webhooks on review, approval, publish, promote.

---

## 5) Documentation changes (create/update)

1. **`/docs/policy/studio-overview.md`**

   * Concepts, roles, lifecycle, glossary.
2. **`/docs/policy/authoring.md`**

   * Workspace, templates, snippets, lint rules, best practices.
3. **`/docs/policy/versioning-and-publishing.md`**

   * Semver, immutability, deprecation, rollback, attestations.
4. **`/docs/policy/simulation.md`**

   * Quick vs batch sim, selection strategies, thresholds, evidence artifacts.
5. **`/docs/policy/review-and-approval.md`**

   * Required approvers, comments, webhooks, audit trail.
6. **`/docs/policy/promotion.md`**

   * Environments, canary, default policy binding, rollback.
7. **`/docs/policy/cli.md`**

   * Command reference with examples and JSON outputs.
8. **`/docs/policy/api.md`**

   * REST endpoints, request/response schemas, error codes.
9. **`/docs/security/policy-attestations.md`**

   * Signatures, digests, verifier steps.
10. **`/docs/architecture/policy-registry.md`**

    * Service design, schemas, queues, failure modes.
11. **`/docs/observability/policy-telemetry.md`**

    * Metrics, logs, tracing, dashboards.
12. **`/docs/runbooks/policy-incident.md`**

    * Rolling back a bad policy, freezing publishes, forensic steps.
13. **`/docs/examples/policy-templates.md`**

    * Ready‑made templates and snippet catalog.
14. **`/docs/aoc/aoc-guardrails.md`**

    * How Studio enforces AOC in authoring and review.

Each doc ends with a “Compliance checklist.”

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

## 6) Tasks

### 6.1 Backend: Policy Registry

* [ ] Define OpenAPI spec for Registry (workspaces, versions, reviews, sim).
* [ ] Implement workspace storage and file CRUD.
* [ ] Integrate with Policy Engine compile endpoint; return diagnostics, symbol table.
* [ ] Implement quick simulation with request limits.
* [ ] Implement batch simulation orchestration: enqueue shards, collect results, reduce deltas, store artifacts.
* [ ] Implement review model: comments, required approvers, decisions.
* [ ] Implement publish: sign, persist attestation, set status=published.
* [ ] Implement promotion bindings per tenant/environment; canary subsets.
* [ ] RBAC checks for all endpoints.
* [ ] Unit/integration tests; load tests for batch sim.

### 6.2 Policy Engine enhancements

* [ ] Return rule coverage and firing counts with compile/simulate.
* [ ] Return symbol table and inline docs for editor autocomplete.
* [ ] Expose deterministic Explain traces for sampled findings.
* [ ] Enforce complexity/time limits and report breaches.

### 6.3 Console (Web UI)

* [ ] Build Studio editor wrapper with Monaco + DSL language server hooks.
* [ ] Implement file tree, snippets, templates, hotkeys, search/replace.
* [ ] Diagnostics panel with jump‑to‑line, quick fixes.
* [ ] Simulation panel: quick sim UI, charts, heatmap, sample explains.
* [ ] Review UI: diff, comments, approvals, status badges.
* [ ] Publish & Promote flows with confirmation and post‑actions.
* [ ] Batch sim results pages with export buttons.
* [ ] Accessibility audits and keyboard‑only authoring flow.

### 6.4 CLI

* [ ] Implement commands listed in 3.8 with rich help and examples.
* [ ] Add `--json` flag for machine consumption; emit stable schemas.
* [ ] Exit codes aligned with CI usage (lint errors → non‑zero).

### 6.5 CI/CD & Security

* [ ] Add CI job that runs `stella policy lint/compile/test` on PRs.
* [ ] Optional job that triggers batch sim against staging inventory; post summary to PR.
* [ ] Policy source secret scanning; block on findings.
* [ ] Signing keys configuration; verify pipeline for attestation on publish.

### 6.6 Docs

* [ ] Write all docs in section 5 with screenshots and CLI transcripts.
* [ ] Add cookbook examples and templates in `/docs/examples/policy-templates.md`.
* [ ] Wire contextual Help links from Studio to relevant docs.

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

## 7) Acceptance criteria

* Authors can create, edit, lint, compile policies with inline diagnostics and autocomplete.
* Quick simulation produces counts, rule heatmap, and sample explains within UI.
* Batch simulation scales across large SBOM sets, producing deltas and downloadable evidence.
* Review requires configured approvers; comments and diffs are preserved.
* Publish generates immutable, signed versions with attestations.
* Promotion binds versions to environments and supports canary and rollback.
* CLI supports full lifecycle and is usable in CI.
* All actions are tenant‑scoped, RBAC‑enforced, and logged.
* AOC guardrails prevent any mutation of raw facts.
* Documentation shipped and linked contextually from the Studio.

---

## 8) Risks & mitigations

* **Policy complexity causes timeouts** → compile‑time complexity scoring, execution limits, early diagnostics.
* **Simulation cost at scale** → sharding and streaming reducers; sampling; configurable quotas.
* **RBAC misconfiguration** → server‑enforced checks, defense‑in‑depth tests, deny‑by‑default.
* **Attestation key management** → OIDC‑backed signatures; auditable verifier tool; time‑boxed credentials.
* **Editor usability** → language server with accurate completions; docs on hover; snippet library.

---

## 9) Test plan

* **Unit**: compiler adapters, registry models, review workflow, CLI options.
* **Integration**: compile→simulate→publish→promote on seeded data.
* **E2E**: Playwright flows for author→review→batch sim→publish→promote→rollback.
* **Performance**: load test batch simulation with 100k components spread across SBOMs.
* **Security**: RBAC matrix tests; secret scanning; signing and verification.
* **Determinism**: same inputs produce identical `compiled_sha` and simulation summaries.

---

## 10) Feature flags

* `policy.studio` (enables editor and quick sim)
* `policy.batch-sim`
* `policy.canary-promotion`
* `policy.signature-required` (enforce signing on publish)

Flags documented in `/docs/observability/policy-telemetry.md`.

---

## 11) Non‑goals (this epic)

* Building a general IDE for arbitrary languages; the editor is purpose‑built for the DSL.
* Auto‑generated policies from AI without human approval.
* Cross‑tenant policies; all policies are tenant‑scoped.

---

## 12) Philosophy

* **Safety first**: it’s cheaper to prevent a bad policy than to fix its fallout.
* **Determinism**: same inputs, same outputs, verifiably.
* **Immutability**: versions and evidence are forever; we deprecate, not mutate.
* **Transparency**: every change is explainable with traces and proofs.
* **Reusability**: templates, snippets, and tests turn policy from art into engineering.

> Final reminder: **Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.**