up

2025-11-26 20:23:28 +02:00
parent 4831c7fcb0
commit d63af51f84
139 changed files with 8010 additions and 2795 deletions
--- a/docs/policy/runtime.md
+++ b/docs/policy/runtime.md
@@ -0,0 +1,65 @@
+# Policy Runtime & Evaluation
+
+> **Imposed rule:** Runtime evaluations must use frozen inputs (SBOM, advisories, VEX, reachability, signals) and emit explain traces plus DSSE/attestation metadata; no live feed calls during evaluation.
+
+This document describes how SPL policies are compiled, cached, and executed, and how results are surfaced via APIs, CLI, UI, and observability.
+
+## 1. Components
+- **Compiler**: converts SPL (`stella-dsl@1`) into canonical IR JSON, hashes it, and runs lint and coverage validation. Its output populates the IR cache used by the Engine.
+- **Engine**: deterministic evaluator that consumes the IR plus frozen inputs (SBOM, advisories, VEX, signals) and emits findings and explain traces.
+- **Caches**:
+  - IR cache keyed by `policyId`/`version`/IR hash.
+  - Input cursors (SBOM/advisory/VEX snapshots, reachability graphs) to guarantee replay.
+  - Explain trace cache for recently queried runs (TTL-bound, tenant-scoped).
+- **Attestation**: optional DSSE envelope over the IR hash and approval metadata, mirrored to Rekor when online and stored alongside run outputs in the Evidence Locker.
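+
+The Compiler's IR hashing and the IR cache key can be sketched as follows. This is a minimal illustration, assuming the canonical form is JSON with sorted keys and compact separators and that hashes carry a `sha256:` prefix; the function names and the exact `policyId/version/hash` key layout are hypothetical, not the shipped Compiler API.
+
+```python
+import hashlib
+import json
+from typing import Optional
+
+
+def canonical_ir_bytes(ir: dict) -> bytes:
+    # Assumed canonicalization: sorted keys, no insignificant whitespace, UTF-8.
+    return json.dumps(ir, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
+
+
+def ir_hash(ir: dict) -> str:
+    # Hypothetical "sha256:<hex>" format for the IR hash.
+    return "sha256:" + hashlib.sha256(canonical_ir_bytes(ir)).hexdigest()
+
+
+def ir_cache_key(policy_id: str, version: int, ir: dict) -> str:
+    # Hypothetical cache key layout: policyId / version / IR hash.
+    return f"{policy_id}/{version}/{ir_hash(ir)}"
+
+
+def verify_attested_hash(ir: dict, attested: Optional[str]) -> None:
+    # Block the run when an attested hash is provided and does not match.
+    if attested is not None and ir_hash(ir) != attested:
+        raise ValueError("policy.ir_hash_mismatch")
+```
+
+Because key order does not affect the canonical bytes, two semantically identical IR documents share one cache entry and one attested hash.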
+
+## 2. Execution flow
+1. Resolve the tenant's active policy version (or the explicitly requested version when simulating).
+2. Load the IR from cache; if an attested hash is provided, verify that the IR hash matches it.
+3. Fetch frozen inputs via cursors: SBOM digest, advisory snapshot id, VEX set, reachability graph hash, signals bundle.
+4. Evaluate rules in priority order; record explain entries (rule, because, inputs, signals).
+5. Persist findings, explain traces, and run metadata (`runId`, `policyVersion`, hashes) to storage.
+6. Emit events: `policy.run.started`, `policy.run.completed`, `policy.run.failed`; optionally `policy.run.shadow` when `settings.shadow=true`.
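+
+Steps 4 to 6 can be sketched as a small evaluator. It assumes rules are plain predicates over one component at a time, a lower `priority` value runs first, and the first matching rule decides the verdict; `Rule`, `RunResult`, and `evaluate` are hypothetical names, and real IR evaluation is richer (VEX merging, signals, unknowns).
+
+```python
+from dataclasses import dataclass, field
+from typing import Callable
+
+
+@dataclass(frozen=True)
+class Rule:
+    name: str
+    priority: int  # assumed: lower value evaluates first
+    because: str
+    when: Callable[[dict], bool]
+    status: str
+
+
+@dataclass
+class RunResult:
+    findings: list = field(default_factory=list)
+    explain: list = field(default_factory=list)
+    events: list = field(default_factory=list)
+
+
+def evaluate(rules: list[Rule], components: list[dict], shadow: bool = False) -> RunResult:
+    result = RunResult()
+    result.events.append("policy.run.started")
+    # Sort rules by (priority, name) and components by purl so output order is deterministic.
+    ordered = sorted(rules, key=lambda r: (r.priority, r.name))
+    for component in sorted(components, key=lambda c: c["purl"]):
+        for rule in ordered:
+            if rule.when(component):
+                result.findings.append({"purl": component["purl"], "status": rule.status, "rule": rule.name})
+                result.explain.append({
+                    "rule": rule.name,
+                    "because": rule.because,
+                    "inputs": component,
+                    "signals": component.get("signals", {}),
+                })
+                break  # assumed semantics: first matching rule wins
+    result.events.append("policy.run.completed")
+    if shadow:
+        result.events.append("policy.run.shadow")
+    return result
+```
+
+Sorting both rules and components makes the findings and explain order independent of input order, which is what replay against frozen inputs relies on.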
+
+## 3. Caching & determinism
+- The IR cache is warmed at publish time and invalidated when a new policy version is published.
+- Input cursors are mandatory; if any are missing, the run is blocked and returns `inputs_unfrozen`.
+- Explain trace storage preserves deterministic ordering and is capped by tenant quotas.
+- Shadow-mode runs record findings but mark them `enforced=false`; promotion is blocked until the shadow and coverage gates pass.
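+
+The cursor and shadow rules above reduce to two checks, sketched here under the assumption that a run's cursors arrive as a dictionary keyed by input kind. The key names mirror step 3 of the execution flow but are hypothetical, as are `require_frozen` and `mark_enforcement`.
+
+```python
+REQUIRED_CURSORS = ("sbom", "advisories", "vex", "reachability", "signals")
+
+
+class InputsUnfrozen(Exception):
+    """Raised when a run lacks one or more frozen input cursors."""
+
+    def __init__(self, missing: list[str]):
+        super().__init__("inputs_unfrozen: missing " + ", ".join(missing))
+        self.missing = missing
+
+
+def require_frozen(cursors: dict) -> dict:
+    # Every cursor must be present and non-empty; otherwise the run is blocked.
+    missing = [name for name in REQUIRED_CURSORS if not cursors.get(name)]
+    if missing:
+        raise InputsUnfrozen(missing)
+    return cursors
+
+
+def mark_enforcement(findings: list[dict], shadow: bool) -> list[dict]:
+    # Shadow runs keep their findings but never enforce them.
+    return [{**finding, "enforced": not shadow} for finding in findings]
+```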
+
+## 4. APIs & CLI
+- API: `POST /policies/{id}/simulate`, `POST /policies/{id}/run`, `GET /policy-runs/{runId}` (findings + explain), `GET /policies/{id}/versions/{v}` (IR, hash, attestation refs).
+- CLI: `stella policy simulate`, `stella policy run`, `stella policy explain <runId> --format json|table`, `stella policy export --run <runId> --offline`.
+- Headers: `X-Stella-Tenant`, `X-Stella-Shadow` (optional), `If-None-Match` for IR cache revalidation.
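+
+A client call against these endpoints might look like the following sketch, which only builds requests with Python's standard `urllib.request` and never sends them. The base URL, the JSON body shape, and the `true`/`false` encoding of `X-Stella-Shadow` are assumptions; only the paths and header names come from this section.
+
+```python
+import json
+import urllib.request
+from typing import Optional
+
+BASE_URL = "https://stella.example.internal"  # placeholder host, not a real endpoint
+
+
+def build_simulate_request(policy_id: str, tenant: str, body: dict,
+                           shadow: Optional[bool] = None) -> urllib.request.Request:
+    headers = {"Content-Type": "application/json", "X-Stella-Tenant": tenant}
+    if shadow is not None:
+        # Assumed encoding of the optional shadow header.
+        headers["X-Stella-Shadow"] = "true" if shadow else "false"
+    return urllib.request.Request(
+        f"{BASE_URL}/policies/{policy_id}/simulate",
+        data=json.dumps(body).encode("utf-8"),
+        headers=headers,
+        method="POST",
+    )
+
+
+def build_ir_request(policy_id: str, version: int, tenant: str,
+                     etag: Optional[str] = None) -> urllib.request.Request:
+    headers = {"X-Stella-Tenant": tenant}
+    if etag is not None:
+        # Lets the server answer 304 Not Modified when the cached IR is still current.
+        headers["If-None-Match"] = etag
+    return urllib.request.Request(
+        f"{BASE_URL}/policies/{policy_id}/versions/{version}",
+        headers=headers,
+        method="GET",
+    )
+```
+
+Note that `urllib` stores header names via `str.capitalize()`, so the tenant header is read back with `req.get_header("X-stella-tenant")`.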
+
+## 5. Observability & SLOs
+- Metrics: `policy_runs_total{status}`, `policy_run_duration_seconds`, `policy_explain_cache_hits`, `policy_inputs_unfrozen_total`, `policy_shadow_runs_total`.
+- Logs include `policyId`, `version`, `runId`, `tenant`, `shadow`, `input_cursor` hashes.
+- Traces: span per run with events for rule evaluation batches; attributes include counts of rules fired and unknowns encountered.
+- SLOs (suggested):
+  - p95 policy run latency < 2s for simulate, < 10s for full run.
+  - Error budget: <0.5% failed runs per rolling 7d.
+  - Explain cache hit rate >80% for repeated queries.
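+
+The suggested SLOs translate into simple threshold checks. The sketch below assumes latencies are collected in seconds and uses the nearest-rank definition of p95; the function names and report keys are hypothetical, and a production setup would more likely derive percentiles from the `policy_run_duration_seconds` metric.
+
+```python
+def p95(samples: list[float]) -> float:
+    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integers to avoid float error.
+    ordered = sorted(samples)
+    rank = max(1, (95 * len(ordered) + 99) // 100)
+    return ordered[rank - 1]
+
+
+def slo_report(simulate_s: list[float], run_s: list[float], failed_runs: int, total_runs: int,
+               cache_hits: int, cache_lookups: int) -> dict:
+    return {
+        "simulate_p95_ok": p95(simulate_s) < 2.0,       # p95 < 2 s for simulate
+        "run_p95_ok": p95(run_s) < 10.0,                # p95 < 10 s for full runs
+        "error_budget_ok": failed_runs / total_runs < 0.005,  # < 0.5% failed runs
+        "explain_cache_ok": cache_hits / cache_lookups > 0.80,  # > 80% hit rate
+    }
+```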
+
+## 6. Failure modes & handling
+- **Inputs unfrozen**: return 409 listing the required cursors; emit a `policy.inputs_unfrozen` event.
+- **Hash mismatch**: the IR hash differs from the attested value; block the run and emit a `policy.ir_hash_mismatch` alert.
+- **Unknown signals**: if required signals are missing, downgrade the result to `unknown` and optionally set `status=under_investigation`; flag the gap in the explain trace.
+- **Exceeded quotas**: when explain storage or run-count caps are reached, return 429 with `Retry-After`; the run is not executed.
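+
+The unknown-signals and quota rules can be sketched as two small helpers. Assumptions: a finding is a dictionary with a `verdict` field, signals arrive as a dictionary keyed by signal name, and the 429 body shape is invented for illustration; `apply_signal_policy` and `quota_exceeded_response` are hypothetical names.
+
+```python
+from typing import Optional
+
+
+def apply_signal_policy(finding: dict, signals: dict, required: list[str],
+                        mark_under_investigation: bool = False) -> tuple[dict, Optional[dict]]:
+    # Returns the (possibly downgraded) finding and an explain-trace flag, if any.
+    missing = sorted(name for name in required if name not in signals)
+    if not missing:
+        return finding, None
+    downgraded = {**finding, "verdict": "unknown"}
+    if mark_under_investigation:
+        downgraded["status"] = "under_investigation"
+    return downgraded, {"unknown_signals": missing}
+
+
+def quota_exceeded_response(retry_after_seconds: int) -> tuple[int, dict, dict]:
+    # 429 with Retry-After; the caller must not execute the run.
+    return 429, {"Retry-After": str(retry_after_seconds)}, {"error": "quota_exceeded"}
+```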
+
+## 7. Offline / air-gap
+- All inputs are loaded from Offline Kit bundles; no network access occurs during evaluation.
+- CLI `stella policy run --sealed --bundle <path>` loads IR, inputs, and signals from bundle; writes outputs + attestation-ready manifest.
+- Runs produce DSSE-ready payloads (`policy.run@1`) that can be signed later when connectivity is restored.
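+
+An offline run can stage its output as an unsigned DSSE envelope and precompute the pre-authentication encoding (PAE) that a signer will sign later. The PAE layout follows the DSSE v1 specification; the payload type string, the manifest fields, and the helper names are assumptions, and carrying the `policy.run@1` identifier inside the payload is only one possible placement.
+
+```python
+import base64
+import json
+
+PAYLOAD_TYPE = "application/vnd.stella.policy.run+json"  # hypothetical media type
+
+
+def pae(payload_type: str, payload: bytes) -> bytes:
+    # DSSE v1 pre-authentication encoding:
+    # "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body, lengths as ASCII decimal byte counts.
+    type_bytes = payload_type.encode("utf-8")
+    return b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(payload), payload)
+
+
+def unsigned_envelope(run_manifest: dict) -> tuple[dict, bytes]:
+    # Sorted keys and compact separators keep the payload bytes stable across re-serialization.
+    body = json.dumps({"type": "policy.run@1", **run_manifest},
+                      sort_keys=True, separators=(",", ":")).encode("utf-8")
+    envelope = {
+        "payloadType": PAYLOAD_TYPE,
+        "payload": base64.b64encode(body).decode("ascii"),
+        "signatures": [],  # filled in once connectivity and signing keys are available
+    }
+    return envelope, pae(PAYLOAD_TYPE, body)
+```
+
+Signing the PAE bytes rather than the raw payload binds the signature to the payload type as well as the body.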
+
+## 8. Data model (high level)
+- `policy_runs`: `runId`, `policyId`, `version`, `tenant`, `shadow`, `input_cursors`, `ir_hash`, `attestation_ref`, `started_at`, `completed_at`, `status`, `stats` (rules fired, explains, unknowns), `storage_refs` (findings, explains).
+- `policy_findings`: flattened findings with references to explain entries.
+- `policy_explains`: rule-level explain traces with inputs, signals, and `because` text.
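+
+The `policy_runs` record maps naturally onto a typed structure. This sketch assumes Python dataclasses with snake_case fields mirroring the column list above; the field types (for example an integer `version`) and the `duration_seconds` and `to_record` helpers are assumptions, not the persisted schema.
+
+```python
+from dataclasses import asdict, dataclass, field
+from datetime import datetime
+from typing import Optional
+
+
+@dataclass
+class PolicyRunStats:
+    rules_fired: int = 0
+    explains: int = 0
+    unknowns: int = 0
+
+
+@dataclass
+class PolicyRun:
+    run_id: str
+    policy_id: str
+    version: int
+    tenant: str
+    shadow: bool
+    input_cursors: dict[str, str]
+    ir_hash: str
+    status: str
+    started_at: datetime
+    completed_at: Optional[datetime] = None
+    attestation_ref: Optional[str] = None
+    stats: PolicyRunStats = field(default_factory=PolicyRunStats)
+    # References to persisted findings and explain traces.
+    storage_refs: dict[str, str] = field(default_factory=dict)
+
+    def duration_seconds(self) -> Optional[float]:
+        # None while the run is still in progress.
+        if self.completed_at is None:
+            return None
+        return (self.completed_at - self.started_at).total_seconds()
+
+    def to_record(self) -> dict:
+        # Flatten to a plain dict (nested stats included) for storage.
+        return asdict(self)
+```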
+
+## 9. References
+- `docs/policy/dsl.md`
+- `docs/policy/lifecycle.md`
+- `docs/policy/architecture.md`
+- `docs/policy/overview.md`
+- `docs/reachability/DELIVERY_GUIDE.md`