up

2025-11-26 20:23:28 +02:00
parent 4831c7fcb0
commit d63af51f84
139 changed files with 8010 additions and 2795 deletions
--- a/docs/policy/overview.md
+++ b/docs/policy/overview.md
@@ -1,173 +1,54 @@
-# Policy Engine Overview
-
-> **Goal:** Evaluate organisation policies deterministically against scanner SBOMs, Concelier advisories, and Excititor VEX evidence, then publish effective findings that downstream services can trust.
-
-This document introduces the v2 Policy Engine: how the service fits into Stella Ops, the artefacts it produces, the contracts it honours, and the guardrails that keep policy decisions reproducible across air-gapped and connected deployments.
-
----
-
-## 1 · Role in the Platform
-
-- **Purpose:** Compose policy verdicts by reconciling SBOM inventory, advisory metadata, VEX statements, and organisation rules.
-- **Form factor:** Dedicated `.NET 10` Minimal API host (`StellaOps.Policy.Engine`) plus worker orchestration. Policies are defined in `stella-dsl@1` packs compiled to an intermediate representation (IR) with a stable SHA-256 digest.
-- **Tenancy:** All workloads run under Authority-enforced scopes (`policy:*`, `findings:read`, `effective:write`). Only the Policy Engine identity may materialise effective findings collections.
-- **Consumption:** Findings ledger, Console, CLI, and Notify read the published `effective_finding_{policyId}` materialisations and policy run ledger (`policy_runs`).
-- **Offline parity:** Bundled policies import/export alongside advisories and VEX. In sealed mode the engine degrades gracefully, annotating explanations whenever cached signals replace live lookups.
-
----
-
-## 2 · High-Level Architecture
-
-```mermaid
-flowchart LR
-    subgraph Inputs
-        A[Scanner SBOMs<br/>Inventory & Usage]
-        B[Concelier Advisories<br/>Canonical linksets]
-        C[Excititor VEX<br/>Consensus status]
-        D[Policy Packs<br/>stella-dsl@1]
-    end
-    subgraph PolicyEngine["StellaOps.Policy.Engine"]
-        P1[DSL Compiler<br/>IR + Digest]
-        P2[Joiners<br/>SBOM ↔ Advisory ↔ VEX]
-        P3[Deterministic Evaluator<br/>Rule hits + scoring]
-        P4[Materialisers<br/>effective findings]
-        P5[Run Orchestrator<br/>Full & incremental]
-    end
-    subgraph Outputs
-        O1[Effective Findings Collections]
-        O2[Explain Traces<br/>Rule hit lineage]
-        O3[Metrics & Traces<br/>policy_run_seconds,<br/>rules_fired_total]
-        O4[Simulation/Preview Feeds<br/>CLI & Studio]
-    end
-
-    A --> P2
-    B --> P2
-    C --> P2
-    D --> P1 --> P3
-    P2 --> P3 --> P4 --> O1
-    P3 --> O2
-    P5 --> P3
-    P3 --> O3
-    P3 --> O4
-```
-
----
-
-## 3 · Core Concepts
-
-| Concept | Description |
-|---------|-------------|
-| **Policy Pack** | Versioned bundle of DSL documents, metadata, and checksum manifest. Packs import/export via CLI and Offline Kit bundles. |
-| **Policy Digest** | SHA-256 of the canonical IR; used for caching, explain trace attribution, and audit proofs. |
-| **Effective Findings** | Append-only Mongo collections (`effective_finding_{policyId}`) storing the latest verdict per finding, plus history sidecars. |
-| **Policy Run** | Execution record persisted in `policy_runs` capturing inputs, run mode, timings, and determinism hash. |
-| **Explain Trace** | Structured tree showing rule matches, data provenance, and scoring components for UI/CLI explain features. |
-| **Simulation** | Dry-run evaluation that compares a candidate pack against the active pack and produces verdict diffs without persisting results. |
-| **Incident Mode** | Elevated sampling/trace capture toggled automatically when SLOs breach; emits events for Notifier and Timeline Indexer. |
-
----
-
-## 4 · Inputs & Pre-processing
-
-### 4.1 SBOM Inventory
-
-- **Source:** Scanner.WebService publishes inventory/usage SBOMs plus BOM-Index (roaring bitmap) metadata.
-- **Consumption:** Policy joiners use the index to expand candidate components quickly, keeping evaluation under the `< 5 s` warm path budget.
-- **Schema:** CycloneDX Protobuf + JSON views; Policy Engine reads canonical projections via shared SBOM adapters.
-
-### 4.2 Advisory Corpus
-
-- **Source:** Concelier exports canonical advisories with deterministic identifiers, linksets, and equivalence tables.
-- **Contract:** Policy Engine only consumes raw `content.raw`, `identifiers`, and `linkset` fields per Aggregation-Only Contract (AOC); derived precedence remains a policy concern.
-
-### 4.3 VEX Evidence
-
-- **Source:** Excititor consensus service resolves OpenVEX / CSAF statements, preserving conflicts.
-- **Usage:** Policy rules can require specific VEX vendors or justification codes; evaluator records when cached evidence substitutes for live statements (sealed mode).
-
-### 4.4 Policy Packs
-
-- Authored in Policy Studio or CLI, validated against the `stella-dsl@1` schema.
-- Compiler performs canonicalisation (ordering, defaulting) before emitting IR and digest.
-- Packs bundle scoring profiles, allowlist metadata, and optional reachability weighting tables.
-
----
-
-## 5 · Evaluation Flow
-
-1. **Run selection** – Orchestrator accepts `full`, `incremental`, or `simulate` jobs. Incremental runs listen to change streams from Concelier, Excititor, and SBOM imports to scope re-evaluation.
-2. **Input staging** – Candidates fetched in deterministic batches; identity graph from Concelier strengthens PURL lookups.
-3. **Rule execution** – Evaluator walks rules in lexical order (first-match wins). Actions available: `block`, `ignore`, `warn`, `defer`, `escalate`, `requireVex`, each supporting quieting semantics where permitted.
-4. **Scoring** – `PolicyScoringConfig` applies severity, trust, reachability weights plus penalties (`warnPenalty`, `ignorePenalty`, `quietPenalty`).
-5. **Verdict and explain** – Engine constructs `PolicyVerdict` records with inputs, quiet flags, unknown confidence bands, and provenance markers; explain trees capture rule lineage.
-6. **Materialisation** – Effective findings collections are upserted append-only, stamped with run identifier, policy digest, and tenant.
-7. **Publishing** – Completed run writes to `policy_runs`, emits metrics (`policy_run_seconds`, `rules_fired_total`, `vex_overrides_total`), and raises events for Console/Notify subscribers.
-
----
-
-## 6 · Run Modes
-
-| Mode | Trigger | Scope | Persistence | Typical Use |
-|------|---------|-------|-------------|-------------|
-| **Full** | Manual CLI (`stella policy run`), scheduled nightly, or emergency rebaseline | Entire tenant | Writes effective findings and run record | After policy publish or major advisory/VEX import |
-| **Incremental** | Change-stream queue driven by Concelier/Excititor/SBOM deltas | Only affected artefacts | Writes effective findings and run record | Continuous upkeep; ensures SLA ≤ 5 min from source change |
-| **Simulate** | CLI/Studio preview, CI pipelines | Candidate subset (diff against baseline) | No materialisation; produces explain & diff payloads | Policy authoring, CI regression suites |
-
-All modes are cancellation-aware and checkpoint progress for replay in case of deployment restarts.
-
----
-
-## 7 · Outputs & Integrations
-
-- **APIs** – Minimal API exposes policy CRUD, run orchestration, explain fetches, and cursor-based listing of effective findings (see `/docs/api/policy.md` once published).
-- **CLI** – `stella policy simulate/run/show` commands surface JSON verdicts, exit codes, and diff summaries suitable for CI gating.
-- **Console / Policy Studio** – UI reads explain traces, policy metadata, approval workflow status, and simulation diffs to guide reviewers.
-- **Findings Ledger** – Effective findings feed downstream export, Notify, and risk scoring jobs.
-- **Air-gap bundles** – Offline Kit includes policy packs, scoring configs, and explain indexes; export commands generate DSSE-signed bundles for transfer.
-
----
-
-## 8 · Determinism & Guardrails
-
-- **Deterministic inputs** – All joins rely on canonical linksets and equivalence tables; batches are sorted, and random/wall-clock APIs are blocked by static analysis plus runtime guards (`ERR_POL_004`).
-- **Stable outputs** – Canonical JSON serializers sort keys; digests recorded in run metadata enable reproducible diffs across machines.
-- **Idempotent writes** – Materialisers upsert using `{policyId, findingId, tenant}` keys and retain prior versions with append-only history.
-- **Sandboxing** – Policy evaluation executes in-process with timeouts; restart-only plug-ins guarantee no runtime DLL injection.
-- **Compliance proof** – Every run stores digest of inputs (policy, SBOM batch, advisory snapshot) so auditors can replay decisions offline.
-
----
-
-## 9 · Security, Tenancy & Offline Notes
-
-- **Authority scopes:** Gateway enforces `policy:read`, `policy:write`, `policy:simulate`, `policy:runs`, `findings:read`, `effective:write`. Service identities must present DPoP-bound tokens.
-- **Tenant isolation:** Collections partition by tenant identifier; cross-tenant queries require explicit admin scopes and return audit warnings.
-- **Sealed mode:** In air-gapped deployments the engine surfaces `sealed=true` hints in explain traces, warning about cached EPSS/KEV data and suggesting bundle refreshes (see `docs/airgap/airgap-mode.md`).
-- **Observability:** Structured logs carry correlation IDs matching orchestrator job IDs; metrics integrate with OpenTelemetry exporters; sampled rule-hit logs redact policy secrets.
-- **Incident response:** Incident mode can be forced via API, boosting trace retention and notifying Notifier through `policy.incident.activated` events.
-
----
-
-## 10 · Working with Policy Packs
-
-1. **Author** in Policy Studio or edit DSL files locally. Validate with `stella policy lint`.
-2. **Simulate** against golden SBOM fixtures (`stella policy simulate --sbom fixtures/*.json`). Inspect explain traces for unexpected overrides.
-3. **Publish** via API or CLI; Authority enforces review/approval workflows (`draft → review → approve → rollout`).
-4. **Monitor** the subsequent incremental runs; if determinism diff fails in CI, roll back pack while investigating digests.
-5. **Bundle** packs for offline sites with `stella policy bundle export` and distribute via Offline Kit.
-
----
-
-## 11 · Compliance Checklist
-
-- [ ] **Scopes enforced:** Confirm gateway policy requires `policy:*` and `effective:write` scopes for all mutating endpoints.
-- [ ] **Determinism guard active:** Static analyzer blocks clock/RNG usage; CI determinism job diffing repeated runs passes.
-- [ ] **Materialisation audit:** Effective findings collections use append-only writers and retain history per policy run.
-- [ ] **Explain availability:** UI/CLI expose explain traces for every verdict; sealed-mode warnings display when cached evidence is used.
-- [ ] **Offline parity:** Policy bundles (import/export) tested in sealed environment; air-gap degradations documented for operators.
-- [ ] **Observability wired:** Metrics (`policy_run_seconds`, `rules_fired_total`, `vex_overrides_total`) and sampled rule hit logs emit to the shared telemetry pipeline with correlation IDs.
-- [ ] **Documentation synced:** API (`/docs/api/policy.md`), DSL grammar (`/docs/policy/dsl.md`), lifecycle (`/docs/policy/lifecycle.md`), and run modes (`/docs/policy/runs.md`) cross-link back to this overview.
-
----
-
-*Last updated: 2025-10-26 (Sprint 20).*
-
+# Policy System Overview
+
+> **Imposed rule:** Policies that change reachability or trust weighting must enter shadow mode first and ship coverage fixtures; promotion is blocked until shadow + coverage gates pass (see `docs/policy/lifecycle.md`).
+
+This overview orients authors, reviewers, and operators to the Stella Policy system: the SPL language, lifecycle, evidence inputs, and how policies are enforced online and in air-gapped sites.
+
+## 1. What the Policy System Does
+- Combines SBOM facts, advisories (Concelier), VEX claims (Excititor), reachability signals (Graphs + runtime), trust/entropy signals, and operator metadata to produce deterministic findings.
+- Produces explainable outputs: every verdict carries rule, rationale (`because`), inputs, and evidence hashes.
+- Works online or offline: policies, inputs, and outputs are content-addressed and can be replayed with no network.
+
+## 2. Layers
+- **SPL (Stella Policy Language):** declarative rules (`stella-dsl@1`) with profiles, maps, and rule blocks; no loops or network calls.
+- **Compiler:** canonicalises SPL, emits IR + hash; used by CLI, Console, and CI. Canonical hashes feed attestation and replay.
+- **Engine:** evaluates IR against SBOM/VEX/reachability signals; outputs effective findings and explains every rule fire.
+- **Attestation:** optional DSSE over policy IR and approval metadata; Rekor mirror when online.
+- **Distribution:** policy packs are versioned, tenant-scoped, and promoted via Authority scopes; Offline Kit includes packs + attestations.
+
+## 3. Inputs & Signals
+- SBOM inventory/usage (Scanner), advisories (Concelier), VEX (Excititor), reachability graphs/runtime (Signals), trust/entropy/uncertainty scores, secret-leak findings, environment metadata, and tenant policy defaults.
+- Signals dictionary (normalised): `trust_score`, `reachability.state/score`, `entropy_penalty`, `uncertainty.level`, `runtime_hits`.
+- All inputs must be content-addressed; missing fields evaluate to `unknown`/null and must be handled explicitly.
+
+## 4. Lifecycle (summary)
+1. Draft in SPL with shadow mode on and coverage fixtures (`stella policy test`).
+2. Submit with lint/simulate + coverage artefacts attached.
+3. Review/approve with Authority scopes; determinism and shadow gates enforced in CI.
+4. Publish/attest (DSSE + optional Rekor); promote to environments; activate runs.
+5. Archive or roll back with audit trail preserved.
+
+## 5. Governance & Roles
+- Scopes: `policy:author`, `policy:review`, `policy:approve`, `policy:operate`, `policy:publish`, `policy:activate`, `policy:audit`.
+- Two-person rule recommended for publish/promote; enforced by Authority per tenant.
+- AOC: Aggregation-Only Contract applies to regulated tenants—UI/CLI must respect AOC flags on policies and evidence.
+
+## 6. Review Checklist (fast path)
+- Lint + simulate outputs attached and fresh (<24h).
+- Shadow mode enabled; coverage fixtures passing; twin-run determinism check green.
+- `because` present on every status/severity change; suppressions scoped.
+- Inputs handled explicitly when `unknown` (reachability/runtime missing).
+- Attestation metadata ready (reason, ticket, IR hash) if publish is requested.
+- AOC impact noted; air-gap replay steps documented if applicable.
+
+## 7. Air-gap / Offline Notes
+- Policy packs, attestations, and coverage fixtures ship in Offline Kits; no live feed calls allowed during evaluation.
+- CLI `stella policy simulate --sealed` enforces no-network; policy runs must use frozen SBOM/advisory/VEX bundles and reachability graphs.
+- Attestations and hashes recorded in Evidence Locker; Timeline events emitted on publish/activate.
+
+## 8. Key References
+- `docs/policy/dsl.md` (language)
+- `docs/policy/lifecycle.md` (process, gates)
+- `docs/policy/architecture.md` (engine internals)
+- `docs/modules/policy/implementation_plan.md`
+- `docs/policy/governance.md` (once published)