Policy Runtime & Evaluation
Imposed rule: Runtime evaluations must use frozen inputs (SBOM, advisories, VEX, reachability, signals) and emit explain traces plus DSSE/attestation metadata; no live feed calls during evaluation.
This document describes how SPL policies are compiled, cached, and executed, and how results are surfaced via APIs, CLI, UI, and observability.
1. Components
- Compiler: converts SPL (`stella-dsl@1`) into canonical IR JSON, hashes it, and validates lint/coverage. Produces IR cache used by Engine.
- Engine: deterministic evaluator that consumes IR + inputs (SBOM, advisory, VEX, signals) and emits findings + explain traces.
- Caches:
  - IR cache keyed by `policyId/version/IR hash`.
  - Input cursors (SBOM/advisory/VEX snapshots, reachability graphs) to guarantee replay.
  - Explain trace cache for recently queried runs (TTL, tenant-scoped).
- Attestation: optional DSSE over IR hash + approval metadata; Rekor mirror when online; stored alongside run outputs in Evidence Locker.
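The Compiler's "canonical IR JSON, hashed" step and the IR cache key can be illustrated with a minimal sketch. The canonicalization rules (sorted keys, compact separators), the choice of SHA-256, and the helper names below are assumptions for illustration; the actual compiler may differ.

```python
import hashlib
import json


def canonical_ir_bytes(ir: dict) -> bytes:
    """Serialize IR with sorted keys and no insignificant whitespace (assumed rules)."""
    text = json.dumps(ir, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def ir_hash(ir: dict) -> str:
    """Hash the canonical bytes; SHA-256 is an assumption, not taken from the spec."""
    return "sha256:" + hashlib.sha256(canonical_ir_bytes(ir)).hexdigest()


def ir_cache_key(policy_id: str, version: int, ir: dict) -> str:
    """Build a key shaped like policyId/version/IR hash."""
    return f"{policy_id}/{version}/{ir_hash(ir)}"


# Two IR documents that differ only in key order hash identically.
ir_a = {"syntax": "stella-dsl@1", "rules": [{"name": "block-critical", "priority": 1}]}
ir_b = {"rules": [{"priority": 1, "name": "block-critical"}], "syntax": "stella-dsl@1"}
same_hash = ir_hash(ir_a) == ir_hash(ir_b)
```

Because the hash depends only on canonical bytes, recompiling an unchanged policy yields the same cache key, which is what lets the Engine verify a cached IR against an attested value.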
2. Execution flow
- Resolve active policy version for tenant (or specified version for simulate).
- Load IR from cache; verify hash matches attested value if provided.
- Fetch frozen inputs via cursors: SBOM digest, advisory snapshot id, VEX set, reachability graph hash, signals bundle.
- Evaluate rules in priority order; record explain entries (rule, because, inputs, signals).
- Persist findings, explain traces, and run metadata (`runId`, `policyVersion`, hashes) to storage.
- Emit events: `policy.run.started`, `policy.run.completed`, `policy.run.failed`; optionally `policy.run.shadow` when settings.shadow=true.
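The evaluation step of this flow (rules in priority order, one explain entry per fired rule) might look roughly like the following. The `Rule` type, the "lower number first" ordering, the name tie-break, and the sample cursor values are all hypothetical; only the explain fields (rule, because, inputs, signals) come from the text above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int  # assumption: lower number evaluates first
    matches: Callable[[dict, dict], bool]  # (facts, signals) -> did the rule fire?
    because: str


def evaluate(rules, cursors, facts, signals):
    """Evaluate rules in a stable order and collect explain entries."""
    explains = []
    # Tie-break on name so equal priorities still yield one deterministic order.
    for rule in sorted(rules, key=lambda r: (r.priority, r.name)):
        if rule.matches(facts, signals):
            explains.append({
                "rule": rule.name,
                "because": rule.because,
                "inputs": dict(cursors),
                "signals": dict(signals),
            })
    return explains


rules = [
    Rule("warn-unreachable", 20, lambda f, s: not s.get("reachable", True), "component not reachable"),
    Rule("block-critical", 10, lambda f, s: f.get("severity") == "critical", "severity is critical"),
]
cursors = {"sbom_digest": "sha256:aaa", "advisory_snapshot_id": "adv-1"}
trace = evaluate(rules, cursors, {"severity": "critical"}, {"reachable": False})
fired = [entry["rule"] for entry in trace]
```

Sorting on a total order rather than relying on declaration order is one way to keep explain traces identical across replays of the same frozen inputs.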
3. Caching & determinism
- IR cache warmed at publish; invalidated on new policy version.
- Input cursors are mandatory; if missing, run is blocked (returns `inputs_unfrozen`).
- Explain trace storage keeps deterministic ordering; capped by tenant quotas.
- Shadow mode runs record findings but mark `enforced=false`; promotion blocked until shadow+coverage gates pass.
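A small sketch of the cursor gate and the shadow flag described above. The cursor field names are hypothetical, derived from the inputs listed in the execution flow; the result shape is likewise an assumption.

```python
# Hypothetical cursor names, one per frozen input named in the execution flow.
REQUIRED_CURSORS = (
    "sbom_digest",
    "advisory_snapshot_id",
    "vex_set",
    "reachability_graph_hash",
    "signals_bundle",
)


def missing_cursors(cursors: dict) -> list:
    """Return the required cursors that are absent or empty, in a fixed order."""
    return [name for name in REQUIRED_CURSORS if not cursors.get(name)]


def admit_run(cursors: dict, shadow: bool = False) -> dict:
    """Block the run unless every input is frozen; shadow runs are never enforced."""
    missing = missing_cursors(cursors)
    if missing:
        return {"status": "blocked", "error": "inputs_unfrozen", "missing": missing}
    return {"status": "accepted", "enforced": not shadow}


frozen = {name: f"sha256:{i:02d}" for i, name in enumerate(REQUIRED_CURSORS)}
blocked = admit_run({"sbom_digest": "sha256:00"})
shadow_run = admit_run(frozen, shadow=True)
```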
4. APIs & CLI
- API: `POST /policies/{id}/simulate`, `POST /policies/{id}/run`, `GET /policy-runs/{runId}` (findings + explain), `GET /policies/{id}/versions/{v}` (IR, hash, attestation refs).
- CLI: `stella policy simulate`, `stella policy run`, `stella policy explain <runId> --format json|table`, `stella policy export --run <runId> --offline`.
- Headers: `X-Stella-Tenant`, `X-Stella-Shadow` (optional), `If-None-Match` for IR cache revalidation.
5. Observability & SLOs
- Metrics: `policy_runs_total{status}`, `policy_run_duration_seconds`, `policy_explain_cache_hits`, `policy_inputs_unfrozen_total`, `policy_shadow_runs_total`.
- Logs include `policyId`, `version`, `runId`, `tenant`, `shadow`, `input_cursor` hashes.
- Traces: span per run with events for rule evaluation batches; attributes include counts of rules fired and unknowns encountered.
- SLOs (suggested):
  - p95 policy run latency < 2s for simulate, < 10s for full run.
  - Error budget: <0.5% failed runs per rolling 7d.
  - Explain cache hit rate >80% for repeated queries.
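To make the suggested SLO thresholds concrete, here is a sketch that checks them against raw samples. It uses a nearest-rank p95 and plain counters; in practice these values would more likely come from the metrics backend, and the function names and sample data are illustrative assumptions.

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    if not values:
        raise ValueError("p95 of empty sequence")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def slo_report(simulate_ms, run_ms, failed_runs, total_runs, cache_hits, cache_lookups):
    """Compare one window of samples against the suggested thresholds."""
    return {
        "simulate_p95_ok": p95(simulate_ms) < 2_000,
        "run_p95_ok": p95(run_ms) < 10_000,
        "error_budget_ok": failed_runs / total_runs < 0.005,
        "explain_cache_ok": cache_hits / cache_lookups > 0.80,
    }


simulate_ms = list(range(100, 2100, 100))  # 20 samples: 100 .. 2000 ms
run_ms = [4_000] * 19 + [12_000]           # one slow outlier out of 20
report = slo_report(simulate_ms, run_ms, failed_runs=1, total_runs=400,
                    cache_hits=90, cache_lookups=100)
```

With 20 samples the nearest-rank p95 is the 19th value, so a single outlier does not breach the latency target while two would.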
6. Failure modes & handling
- Inputs unfrozen: return 409 with required cursors; emit `policy.inputs_unfrozen` event.
- Hash mismatch: IR hash differs from attested; block run and emit `policy.ir_hash_mismatch` alert.
- Unknown signals: if required signals missing, downgrade to `unknown` and optionally set `status=under_investigation`; flag in explain trace.
- Exceeded quotas: explain storage or run count caps → 429 with `Retry-After`; run not executed.
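Two of these failure modes, the hash mismatch check and the unknown-signal downgrade, can be sketched as pure functions. The finding fields `verdict` and `explain_flags` are hypothetical names chosen for this example; the event name and the `under_investigation` status come from the list above.

```python
def verify_ir_hash(computed, attested):
    """Return the events to emit; an empty list means the run may proceed."""
    if attested is not None and computed != attested:
        return ["policy.ir_hash_mismatch"]
    return []


def apply_signal_gate(finding, required, signals, mark_under_investigation=False):
    """Downgrade a finding to 'unknown' when required signals are absent."""
    missing = sorted(name for name in required if name not in signals)
    if not missing:
        return finding
    # Copy rather than mutate, and record why the downgrade happened.
    downgraded = dict(finding, verdict="unknown",
                      explain_flags=[f"missing_signal:{name}" for name in missing])
    if mark_under_investigation:
        downgraded["status"] = "under_investigation"
    return downgraded


finding = {"id": "F-1", "verdict": "blocked"}
downgraded = apply_signal_gate(finding, ["reachability", "exploit_maturity"],
                               {"reachability": "unreachable"},
                               mark_under_investigation=True)
mismatch_events = verify_ir_hash("sha256:aaa", "sha256:bbb")
```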
7. Offline / air-gap
- All inputs fetched from Offline Kit bundles; no network during evaluate.
- CLI `stella policy run --sealed --bundle <path>` loads IR, inputs, and signals from bundle; writes outputs + attestation-ready manifest.
- Runs produce DSSE-ready payloads (`policy.run@1`) that can be signed later when connectivity is restored.
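What "DSSE-ready" could mean in practice: the payload is serialized once and wrapped in an envelope with an empty signature list, and the DSSE pre-authentication encoding (PAE) gives the exact bytes to sign later. The PAE layout follows the DSSE specification; using `policy.run@1` as the `payloadType` and the manifest fields shown are assumptions for illustration.

```python
import base64
import json


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the bytes a signer actually signs."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def unsigned_envelope(payload_type: str, payload: bytes) -> dict:
    """Build an envelope with no signatures yet; they can be appended when online."""
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [],
    }


# Hypothetical run manifest; serialized deterministically so the bytes are stable.
run_manifest = {"runId": "run-001", "ir_hash": "sha256:abc", "status": "completed"}
payload = json.dumps(run_manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
envelope = unsigned_envelope("policy.run@1", payload)
to_sign = pae(envelope["payloadType"], payload)
```

Keeping the serialized payload bytes in the envelope, rather than re-serializing at signing time, avoids any mismatch between what was evaluated offline and what is eventually signed.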
8. Data model (high level)
- `policy_runs`: `runId`, `policyId`, `version`, `tenant`, `shadow`, `input_cursors`, `ir_hash`, `attestation_ref`, `started_at`, `completed_at`, `status`, `stats` (rules fired, explains, unknowns), `storage_refs` (findings, explains).
- `policy_findings`: flattened findings with references to explain entries.
- `policy_explains`: rule-level explain traces with inputs, signals, because text.
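As a rough illustration, a `policy_runs` record could be modelled as below. The field names are taken from the list above; every type, default, and sample value is an assumption, since the document does not specify a schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class PolicyRun:
    # Field names follow the data model; types and defaults are assumed.
    runId: str
    policyId: str
    version: int
    tenant: str
    shadow: bool
    input_cursors: dict
    ir_hash: str
    attestation_ref: Optional[str] = None
    started_at: Optional[str] = None    # e.g. an ISO 8601 timestamp
    completed_at: Optional[str] = None
    status: str = "started"
    stats: dict = field(default_factory=dict)         # rules fired, explains, unknowns
    storage_refs: dict = field(default_factory=dict)  # findings, explains


run = PolicyRun(
    runId="run-001",
    policyId="P-7",
    version=3,
    tenant="tenant-a",
    shadow=False,
    input_cursors={"sbom_digest": "sha256:aaa"},
    ir_hash="sha256:abc",
)
record = asdict(run)
```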
9. References
- `docs/policy/dsl.md`
- `docs/policy/lifecycle.md`
- `docs/policy/architecture.md`
- `docs/policy/overview.md`
- `docs/reachability/DELIVERY_GUIDE.md`