# Policy Runtime & Evaluation

> **Imposed rule:** Runtime evaluations must use frozen inputs (SBOM, advisories, VEX, reachability, signals) and emit explain traces plus DSSE/attestation metadata; no live feed calls during evaluation.

This document describes how SPL policies are compiled, cached, and executed, and how results are surfaced via APIs, CLI, UI, and observability.

## 1. Components

- **Compiler**: converts SPL (`stella-dsl@1`) into canonical IR JSON, hashes it, and runs lint/coverage validation. Produces the IR cache used by the Engine.
- **Engine**: deterministic evaluator that consumes IR + inputs (SBOM, advisory, VEX, signals) and emits findings + explain traces.
- **Caches**:
  - IR cache keyed by `policyId`/`version`/IR hash.
  - Input cursors (SBOM/advisory/VEX snapshots, reachability graphs) that guarantee replay.
  - Explain trace cache for recently queried runs (TTL-bound, tenant-scoped).
- **Attestation**: optional DSSE envelope over the IR hash + approval metadata; mirrored to Rekor when online; stored alongside run outputs in the Evidence Locker.
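
The following is a minimal sketch of the Compiler's hashing step and the IR cache key. It assumes two things: that "canonical IR JSON" means sorted keys with no insignificant whitespace, and that hashes are SHA-256 with a `sha256:` prefix. The actual canonicalization and digest format are not specified here. `canonical_ir_bytes`, `ir_hash`, and `ir_cache_key` are hypothetical helpers.

```python
import hashlib
import json
from typing import Any


def canonical_ir_bytes(ir: dict[str, Any]) -> bytes:
    """Assumed canonical form: sorted keys, compact separators, UTF-8."""
    return json.dumps(ir, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def ir_hash(ir: dict[str, Any]) -> str:
    """Content hash of the canonical IR; the 'sha256:' prefix is an assumption."""
    return "sha256:" + hashlib.sha256(canonical_ir_bytes(ir)).hexdigest()


def ir_cache_key(policy_id: str, version: int, digest: str) -> str:
    """Cache key combining policyId, version, and IR hash."""
    return f"{policy_id}/{version}/{digest}"


# Two IR documents that differ only in key order hash identically.
ir_a = {"dsl": "stella-dsl@1", "rules": [{"name": "block-critical", "priority": 1}]}
ir_b = {"rules": [{"priority": 1, "name": "block-critical"}], "dsl": "stella-dsl@1"}
print(ir_hash(ir_a) == ir_hash(ir_b))  # True
print(ir_cache_key("p1", 3, ir_hash(ir_a)))
```

Because the key embeds the IR hash, a stale entry cannot be served for IR whose content changed, even if a `policyId`/`version` pair were reused.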

## 2. Execution flow

1. Resolve the active policy version for the tenant (or the specified version for simulate).
2. Load IR from cache; verify the hash matches the attested value if one is provided.
3. Fetch frozen inputs via cursors: SBOM digest, advisory snapshot id, VEX set, reachability graph hash, signals bundle.
4. Evaluate rules in priority order; record explain entries (rule, because, inputs, signals).
5. Persist findings, explain traces, and run metadata (`runId`, `policyVersion`, hashes) to storage.
6. Emit events: `policy.run.started`, `policy.run.completed`, `policy.run.failed`; optionally `policy.run.shadow` when `settings.shadow=true`.
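
Step 4 can be sketched as a loop over rules sorted by priority that stops at the first rule that fires and records an explain entry for it. The sketch also applies the *Unknown signals* handling from the failure-mode list: a rule whose required signals are absent downgrades the result to `unknown` and flags the trace.

Several things here are assumptions, and actual SPL semantics may differ:

- lower priority numbers run first;
- the first match wins;
- `"pass"` is the default verdict;
- the `Rule` and `ExplainEntry` shapes are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical shapes; the real IR and explain schemas are not shown in this document.
Predicate = Callable[[dict, dict], bool]


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int                      # assumption: lower number is evaluated first
    required_signals: tuple[str, ...]
    when: Predicate                    # condition over (component, signals)
    verdict: str                       # verdict emitted when the rule fires


@dataclass(frozen=True)
class ExplainEntry:
    rule: str
    because: str
    inputs: dict
    signals: dict
    unknown: bool = False


def evaluate(rules: list[Rule], component: dict, signals: dict) -> tuple[str, list[ExplainEntry]]:
    """Evaluate rules in priority order; the first rule that fires decides the verdict."""
    trace: list[ExplainEntry] = []
    # Tie-break on name so ordering is deterministic even for equal priorities.
    for rule in sorted(rules, key=lambda r: (r.priority, r.name)):
        used = {s: signals[s] for s in rule.required_signals if s in signals}
        missing = [s for s in rule.required_signals if s not in signals]
        if missing:
            # Required signals absent: conservatively downgrade to "unknown" and flag it.
            trace.append(ExplainEntry(rule.name, "missing signals: " + ", ".join(missing),
                                      component, used, unknown=True))
            return "unknown", trace
        if rule.when(component, used):
            trace.append(ExplainEntry(rule.name, f"{rule.name} matched", component, used))
            return rule.verdict, trace
    return "pass", trace  # assumption: no rule fired means the component passes


rules = [
    Rule("warn-medium", 20, (), lambda c, s: c["severity"] == "medium", "warn"),
    Rule("block-reachable-critical", 10, ("reachable",),
         lambda c, s: c["severity"] == "critical" and s["reachable"], "block"),
]
component = {"purl": "pkg:npm/lodash@4.17.20", "severity": "critical"}
print(evaluate(rules, component, {"reachable": True})[0])  # block
print(evaluate(rules, component, {})[0])                    # unknown
```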

## 3. Caching & determinism

- The IR cache is warmed at publish and invalidated on each new policy version.
- Input cursors are mandatory; if any is missing, the run is blocked and returns `inputs_unfrozen`.
- Explain trace storage keeps deterministic ordering and is capped by tenant quotas.
- Shadow-mode runs record findings but mark them `enforced=false`; promotion is blocked until shadow and coverage gates pass.
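
A minimal sketch of the run gate described above. It refuses to proceed unless every input cursor is present, and it derives `enforced` from the shadow flag. The cursor names in `REQUIRED_CURSORS` and the returned dict shape are hypothetical; they only mirror the frozen inputs listed in the execution flow.

```python
# Hypothetical cursor names, mirroring the frozen inputs in step 3 of the execution flow.
REQUIRED_CURSORS = ("sbom", "advisory", "vex", "reachability", "signals")


def prepare_run(cursors: dict[str, str], shadow: bool) -> dict:
    """Gate a run on frozen inputs and derive its enforcement flag."""
    missing = [name for name in REQUIRED_CURSORS if not cursors.get(name)]
    if missing:
        # Blocked: nothing is evaluated unless every input is pinned to a snapshot.
        return {"status": "inputs_unfrozen", "missing": missing}
    return {
        "status": "ready",
        # Sorted so the stored cursor map serializes identically on every run.
        "input_cursors": dict(sorted(cursors.items())),
        "shadow": shadow,
        "enforced": not shadow,  # shadow runs record findings without enforcing them
    }


print(prepare_run({"sbom": "sha256:1f2e", "advisory": "snap-1"}, shadow=False)["missing"])
# ['vex', 'reachability', 'signals']
```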

## 4. APIs & CLI

- API: `POST /policies/{id}/simulate`, `POST /policies/{id}/run`, `GET /policy-runs/{runId}` (findings + explain), `GET /policies/{id}/versions/{v}` (IR, hash, attestation refs).
- CLI: `stella policy simulate`, `stella policy run`, `stella policy explain <runId> --format json|table`, `stella policy export --run <runId> --offline`.
- Headers: `X-Stella-Tenant`, `X-Stella-Shadow` (optional), `If-None-Match` for IR cache revalidation.
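
A sketch of building two of the requests above with Python's standard `urllib.request`. The paths and header names come from this section; the following parts are assumptions:

- the host in `BASE_URL`;
- the `"true"`/`"false"` value format for `X-Stella-Shadow`;
- the simulate request body, whose schema is not documented here.

The requests are only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical host; the document only gives the paths.
BASE_URL = "https://stella.example.internal"


def _policy_path(policy_id: str) -> str:
    return "/policies/" + urllib.parse.quote(policy_id, safe="")


def simulate_request(policy_id: str, tenant: str, body: dict,
                     shadow: Optional[bool] = None) -> urllib.request.Request:
    headers = {"Content-Type": "application/json", "X-Stella-Tenant": tenant}
    if shadow is not None:
        headers["X-Stella-Shadow"] = "true" if shadow else "false"  # value format assumed
    return urllib.request.Request(
        BASE_URL + _policy_path(policy_id) + "/simulate",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def policy_version_request(policy_id: str, version: str, tenant: str,
                           etag: Optional[str] = None) -> urllib.request.Request:
    headers = {"X-Stella-Tenant": tenant}
    if etag:
        # Conditional GET: the server may answer 304 if the cached IR is still current.
        headers["If-None-Match"] = etag
    return urllib.request.Request(
        BASE_URL + _policy_path(policy_id) + "/versions/" + urllib.parse.quote(version, safe=""),
        headers=headers,
        method="GET",
    )
```

`urllib.request.urlopen` raises `HTTPError` for a 304 response. A revalidation client should therefore treat `err.code == 304` as "keep the cached IR" rather than as a failure.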

## 5. Observability & SLOs

- Metrics: `policy_runs_total{status}`, `policy_run_duration_seconds`, `policy_explain_cache_hits`, `policy_inputs_unfrozen_total`, `policy_shadow_runs_total`.
- Logs include `policyId`, `version`, `runId`, `tenant`, `shadow`, `input_cursor` hashes.
- Traces: one span per run with events for rule evaluation batches; attributes include counts of rules fired and unknowns encountered.
- SLOs (suggested):
  - p95 policy run latency < 2s for simulate, < 10s for full run.
  - Error budget: <0.5% failed runs per rolling 7d.
  - Explain cache hit rate >80% for repeated queries.
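
A sketch of checking the suggested SLOs over one window of raw samples, using the thresholds listed above. It uses a nearest-rank p95 computed from individual run durations. In practice these figures would more likely come from the metrics backend (for example, from `policy_run_duration_seconds` and `policy_runs_total{status}`); raw samples are an assumption made to keep the example self-contained. The `"simulate"`/`"run"` mode keys are hypothetical.

```python
def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile: 1-based rank = ceil(0.95 * n)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil avoids float rounding
    return ordered[rank - 1]


# Thresholds copied from the suggested SLOs.
LATENCY_LIMIT_S = {"simulate": 2.0, "run": 10.0}
MAX_FAILED_RATIO = 0.005
MIN_EXPLAIN_HIT_RATIO = 0.80


def check_slos(latencies_s: list[float], mode: str, failed_runs: int, total_runs: int,
               cache_hits: int, cache_lookups: int) -> dict[str, bool]:
    """Return pass/fail per SLO for one evaluation window."""
    return {
        "latency": p95(latencies_s) < LATENCY_LIMIT_S[mode],
        "error_budget": total_runs == 0 or failed_runs / total_runs < MAX_FAILED_RATIO,
        "explain_cache": cache_lookups == 0 or cache_hits / cache_lookups > MIN_EXPLAIN_HIT_RATIO,
    }


print(check_slos([0.5] * 19 + [3.0], "simulate", 1, 1000, 90, 100))
# {'latency': True, 'error_budget': True, 'explain_cache': True}
```

With 20 samples the nearest rank is 19, so a single slow outlier does not breach the p95 target. Two outliers would.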

## 6. Failure modes & handling

- **Inputs unfrozen**: return 409 with the required cursors; emit a `policy.inputs_unfrozen` event.
- **Hash mismatch**: the IR hash differs from the attested value; block the run and emit a `policy.ir_hash_mismatch` alert.
- **Unknown signals**: if required signals are missing, downgrade to `unknown` and optionally set `status=under_investigation`; flag this in the explain trace.
- **Exceeded quotas**: explain storage or run count caps exceeded → 429 with `Retry-After`; the run is not executed.
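
A sketch of mapping the two failure modes that have documented HTTP status codes (409 and 429) to responses. The exception classes, the `requiredCursors` body field, and the `quota_exceeded` error code are hypothetical. Hash mismatch is deliberately left unmapped, because no status code is specified for it above.

```python
from typing import Any


class InputsUnfrozen(Exception):
    """Hypothetical error: one or more input cursors are missing."""

    def __init__(self, missing_cursors: list[str]):
        super().__init__("inputs_unfrozen")
        self.missing_cursors = list(missing_cursors)


class QuotaExceeded(Exception):
    """Hypothetical error: explain storage or run-count caps were hit."""

    def __init__(self, retry_after_s: int):
        super().__init__("quota_exceeded")
        self.retry_after_s = retry_after_s


def to_http_response(err: Exception) -> tuple[int, dict[str, str], dict[str, Any]]:
    """Map a run failure to (status, headers, JSON body)."""
    if isinstance(err, InputsUnfrozen):
        # 409: the caller must supply the listed cursors before retrying.
        return 409, {}, {"error": "inputs_unfrozen", "requiredCursors": err.missing_cursors}
    if isinstance(err, QuotaExceeded):
        # 429: Retry-After in delta-seconds form; the run was not executed.
        return 429, {"Retry-After": str(err.retry_after_s)}, {"error": "quota_exceeded"}
    raise err  # other failures (e.g. IR hash mismatch) are not mapped in this sketch
```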

## 7. Offline / air-gap

- All inputs are fetched from Offline Kit bundles; no network access during evaluation.
- CLI `stella policy run --sealed --bundle <path>` loads IR, inputs, and signals from the bundle and writes outputs plus an attestation-ready manifest.
- Runs produce DSSE-ready payloads (`policy.run@1`) that can be signed later when connectivity is restored.
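
A "DSSE-ready" payload can be prepared fully offline. DSSE signs the pre-authentication encoding (PAE) of the payload type and payload bytes, so those exact bytes can be computed and stored now, and the signature attached once a signer is reachable.

The sketch below follows the DSSE v1 PAE format and envelope fields. Two parts are hypothetical:

- the `PAYLOAD_TYPE` value, since the document only names the payload `policy.run@1`;
- the payload contents.

The unsigned envelope is not verifiable until a signature is added.

```python
import base64
import json

# Hypothetical payloadType; the document only names the payload `policy.run@1`.
PAYLOAD_TYPE = "application/vnd.stellaops.policy.run.v1+json"


def pae(payload_type: str, body: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: the exact bytes a signer signs."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(body), body)


def unsigned_envelope(run_payload: dict) -> tuple[dict, bytes]:
    """Build a DSSE envelope with no signatures yet, plus the bytes to sign later."""
    body = json.dumps(run_payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    envelope = {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(body).decode("ascii"),
        "signatures": [],  # filled in once connectivity to a signer is restored
    }
    return envelope, pae(PAYLOAD_TYPE, body)


envelope, to_sign = unsigned_envelope({"type": "policy.run@1", "runId": "run-001"})
```

Storing `to_sign` alongside the envelope lets a later signing step work without re-serializing the payload, which could otherwise change the signed bytes.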

## 8. Data model (high level)

- `policy_runs`: `runId`, `policyId`, `version`, `tenant`, `shadow`, `input_cursors`, `ir_hash`, `attestation_ref`, `started_at`, `completed_at`, `status`, `stats` (rules fired, explains, unknowns), `storage_refs` (findings, explains).
- `policy_findings`: flattened findings with references to explain entries.
- `policy_explains`: rule-level explain traces with inputs, signals, because text.
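
A sketch of the three records as Python dataclasses. `PolicyRun` follows the `policy_runs` fields listed above. The field types, and the `explain_id`, `component`, `verdict`, and `explain_ids` fields on the other two records, are hypothetical, because the document describes those tables only at a high level. `explains_for` shows how a flattened finding could resolve its explain references.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class PolicyRun:
    """One `policy_runs` record; field types are assumptions."""
    run_id: str
    policy_id: str
    version: int
    tenant: str
    shadow: bool
    input_cursors: dict[str, str]
    ir_hash: str
    status: str
    started_at: datetime
    completed_at: Optional[datetime] = None
    attestation_ref: Optional[str] = None
    stats: dict[str, int] = field(default_factory=dict)         # rules fired, explains, unknowns
    storage_refs: dict[str, str] = field(default_factory=dict)  # findings, explains


@dataclass
class PolicyExplain:
    """One rule-level `policy_explains` entry (explain_id is hypothetical)."""
    explain_id: str
    run_id: str
    rule: str
    because: str
    inputs: dict
    signals: dict


@dataclass
class PolicyFinding:
    """A flattened `policy_findings` row; component/verdict/explain_ids are hypothetical."""
    run_id: str
    component: str
    verdict: str
    explain_ids: list[str] = field(default_factory=list)


def explains_for(finding: PolicyFinding, explains: dict[str, PolicyExplain]) -> list[PolicyExplain]:
    """Resolve a finding's explain references in stored order, skipping dangling ids."""
    return [explains[eid] for eid in finding.explain_ids if eid in explains]
```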

## 9. References

- `docs/policy/dsl.md`
- `docs/policy/lifecycle.md`
- `docs/policy/architecture.md`
- `docs/policy/overview.md`
- `docs/reachability/DELIVERY_GUIDE.md`