git.stella-ops.org/docs/policy/runtime.md
StellaOps Bot d63af51f84
2025-11-26 20:23:28 +02:00

Policy Runtime & Evaluation

Imposed rule: Runtime evaluations must use frozen inputs (SBOM, advisories, VEX, reachability, signals) and emit explain traces plus DSSE/attestation metadata; no live feed calls during evaluation.

This document describes how SPL policies are compiled, cached, and executed, and how results are surfaced via APIs, CLI, UI, and observability.

1. Components

  • Compiler: converts SPL (stella-dsl@1) into canonical IR JSON, hashes it, and validates lint/coverage. Produces IR cache used by Engine.
  • Engine: deterministic evaluator that consumes the IR plus frozen inputs (SBOM, advisories, VEX, reachability, signals) and emits findings + explain traces.
  • Caches:
    • IR cache keyed by policyId/version/IR hash.
    • Input cursors (SBOM/advisory/VEX snapshots, reachability graphs) to guarantee replay.
    • Explain trace cache for recently queried runs (TTL, tenant-scoped).
  • Attestation: optional DSSE over IR hash + approval metadata; Rekor mirror when online; stored alongside run outputs in Evidence Locker.
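
To illustrate how a canonical IR can be hashed and used as a cache key, here is a minimal sketch. It assumes canonicalization means sorted keys with compact separators and that the hash is SHA-256; neither is confirmed by this document, and the function names and sample IR fields are hypothetical.

```python
import hashlib
import json


def canonical_ir_bytes(ir: dict) -> bytes:
    """Serialize IR with sorted keys and no insignificant whitespace.

    Assumption: the real canonical form may differ; this only shows the idea.
    """
    text = json.dumps(ir, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def ir_hash(ir: dict) -> str:
    """Hash the canonical bytes (SHA-256 is an assumed choice)."""
    return "sha256:" + hashlib.sha256(canonical_ir_bytes(ir)).hexdigest()


def ir_cache_key(policy_id: str, version: int, ir: dict) -> str:
    """Build a key following the policyId/version/IR hash layout."""
    return f"{policy_id}/{version}/{ir_hash(ir)}"


# Two IR documents that differ only in key order produce the same hash.
ir_a = {"syntax": "stella-dsl@1", "rules": [{"name": "block-critical", "priority": 1}]}
ir_b = {"rules": [{"priority": 1, "name": "block-critical"}], "syntax": "stella-dsl@1"}
same_hash = ir_hash(ir_a) == ir_hash(ir_b)
```

Hashing the canonical form rather than the source text means formatting-only changes to a policy do not invalidate the cache entry.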

2. Execution flow

  1. Resolve active policy version for tenant (or specified version for simulate).
  2. Load IR from cache; verify hash matches attested value if provided.
  3. Fetch frozen inputs via cursors: SBOM digest, advisory snapshot id, VEX set, reachability graph hash, signals bundle.
  4. Evaluate rules in priority order; record explain entries (rule, because, inputs, signals).
  5. Persist findings, explain traces, and run metadata (runId, policyVersion, hashes) to storage.
  6. Emit events: policy.run.started, policy.run.completed, policy.run.failed; optionally policy.run.shadow when settings.shadow=true.
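
Step 4 of the flow above can be sketched as follows. This is an illustration only: the `Rule` shape, the tie-break on rule name, and the convention that a lower priority value runs first are assumptions, not the Engine's actual behavior.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int                    # assumption: lower value is evaluated first
    because: str                     # human-readable reason recorded in the trace
    reads: tuple                     # input fields the rule consults
    signals: tuple                   # signal names the rule consults
    matches: Callable[[dict], bool]  # predicate over the frozen inputs


def evaluate(rules, inputs):
    """Evaluate rules in priority order; return explain entries for rules that fired."""
    explains = []
    # Ties on priority are broken by name so repeated runs give the same order.
    for rule in sorted(rules, key=lambda r: (r.priority, r.name)):
        if rule.matches(inputs):
            explains.append({
                "rule": rule.name,
                "because": rule.because,
                "inputs": {k: inputs[k] for k in rule.reads},
                "signals": list(rule.signals),
            })
    return explains


rules = [
    Rule("warn-unreachable", 20, "component is not reachable",
         ("reachable",), ("reachability",), lambda i: not i["reachable"]),
    Rule("block-critical", 10, "severity is critical",
         ("severity",), (), lambda i: i["severity"] == "critical"),
]
frozen_inputs = {"severity": "critical", "reachable": False}
trace = evaluate(rules, frozen_inputs)
```

Sorting on a total order before evaluation is what keeps the explain trace stable across replays of the same inputs.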

3. Caching & determinism

  • IR cache warmed at publish; invalidated on new policy version.
  • Input cursors are mandatory; if any cursor is missing, the run is blocked and returns inputs_unfrozen (HTTP 409, see section 6).
  • Explain trace storage keeps deterministic ordering; capped by tenant quotas.
  • Shadow mode runs record findings but mark them enforced=false; promotion is blocked until both the shadow and coverage gates pass.
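
The mandatory-cursor rule can be sketched as a pre-flight check. The cursor field names below are hypothetical, derived from the inputs listed in the execution flow, and the shape of the returned dictionary is an assumption.

```python
# Hypothetical cursor names, one per frozen input named in the execution flow.
REQUIRED_CURSORS = (
    "sbom_digest",
    "advisory_snapshot_id",
    "vex_set",
    "reachability_graph_hash",
    "signals_bundle",
)


def check_frozen(cursors: dict) -> dict:
    """Return a blocking result listing any cursors that are absent or empty."""
    missing = [name for name in REQUIRED_CURSORS if not cursors.get(name)]
    if missing:
        return {"status": "blocked", "code": "inputs_unfrozen", "required": missing}
    return {"status": "ok"}


partial = {"sbom_digest": "sha256:abc", "vex_set": "vex-42"}
result = check_frozen(partial)
```

Running the check before any evaluation work means a blocked run never touches the Engine, which keeps replay guarantees simple to reason about.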

4. APIs & CLI

  • API: POST /policies/{id}/simulate, POST /policies/{id}/run, GET /policy-runs/{runId} (findings + explain), GET /policies/{id}/versions/{v} (IR, hash, attestation refs).
  • CLI: stella policy simulate, stella policy run, stella policy explain <runId> --format json|table, stella policy export --run <runId> --offline.
  • Headers: X-Stella-Tenant, X-Stella-Shadow (optional), If-None-Match for IR cache revalidation.
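
As an illustration of the endpoint and headers listed above, the following sketch builds (but does not send) a simulate request using only the Python standard library. The base URL, the request body shape, and the value `"true"` for X-Stella-Shadow are assumptions; only the path and header names come from this document.

```python
import json
import urllib.parse
import urllib.request


def build_simulate_request(base_url, policy_id, tenant, body, shadow=False):
    """Build a POST /policies/{id}/simulate request without sending it."""
    path = "/policies/" + urllib.parse.quote(policy_id, safe="") + "/simulate"
    headers = {
        "Content-Type": "application/json",
        "X-Stella-Tenant": tenant,
    }
    if shadow:
        headers["X-Stella-Shadow"] = "true"  # assumed value format
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical host and body; replace with real values before sending.
req = build_simulate_request(
    "https://stella.example.invalid/",
    "P-7",
    tenant="tenant-a",
    body={"sbomDigest": "sha256:abc"},
    shadow=True,
)
```

Passing the result to `urllib.request.urlopen(req)` would send it; that step is left out here so the sketch has no network side effects.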

5. Observability & SLOs

  • Metrics: policy_runs_total{status}, policy_run_duration_seconds, policy_explain_cache_hits, policy_inputs_unfrozen_total, policy_shadow_runs_total.
  • Logs include policyId, version, runId, tenant, shadow, input_cursor hashes.
  • Traces: span per run with events for rule evaluation batches; attributes include counts of rules fired and unknowns encountered.
  • SLOs (suggested):
    • p95 policy run latency < 2s for simulate, < 10s for full run.
    • Error budget: <0.5% failed runs per rolling 7d.
    • Explain cache hit rate >80% for repeated queries.
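
The suggested SLOs can be checked from raw samples with a few lines of arithmetic. This sketch uses the nearest-rank definition of p95, which is one common choice and an assumption here; the function and key names are hypothetical.

```python
def p95(samples):
    """95th percentile by the nearest-rank method (no interpolation)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer form of ceil(0.95 * n)
    return ordered[rank - 1]


def check_slos(simulate_secs, run_secs, failed_runs, total_runs, cache_hits, cache_lookups):
    """Compare observed values against the suggested thresholds."""
    return {
        "simulate_p95_ok": p95(simulate_secs) < 2.0,
        "run_p95_ok": p95(run_secs) < 10.0,
        "error_budget_ok": failed_runs / total_runs < 0.005,
        "explain_cache_ok": cache_hits / cache_lookups > 0.80,
    }


report = check_slos(
    simulate_secs=[0.5] * 19 + [3.0],  # one slow outlier out of 20
    run_secs=[4.0] * 20,
    failed_runs=1,
    total_runs=1000,
    cache_hits=90,
    cache_lookups=100,
)
```

In practice these values would come from the histogram and counter metrics listed above over a rolling 7-day window, not from in-memory lists.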

6. Failure modes & handling

  • Inputs unfrozen: return 409 with required cursors; emit policy.inputs_unfrozen event.
  • Hash mismatch: the IR hash differs from the attested value; block the run and emit a policy.ir_hash_mismatch alert.
  • Unknown signals: if required signals are missing, downgrade the result to unknown and optionally set status=under_investigation; flag this in the explain trace.
  • Exceeded quotas: if explain storage or run count caps are exceeded, return 429 with Retry-After; the run is not executed.
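
The unknown-signals handling can be sketched as below. The finding fields (`verdict`, `status`, `explain_flags`) and the flag format are hypothetical; only the downgrade to unknown, the optional under_investigation status, and the explain-trace flag come from this document.

```python
def apply_signal_check(finding, required_signals, available_signals,
                       mark_under_investigation=False):
    """Downgrade a finding when any required signal is missing.

    Returns a new dict; the input finding is left unchanged.
    """
    missing = sorted(set(required_signals) - set(available_signals))
    if not missing:
        return dict(finding)
    downgraded = dict(finding)
    downgraded["verdict"] = "unknown"
    if mark_under_investigation:
        downgraded["status"] = "under_investigation"
    # Flag the gap so it is visible in the explain trace.
    downgraded["explain_flags"] = [f"missing_signal:{name}" for name in missing]
    return downgraded


finding = {"id": "F-1", "verdict": "affected"}
checked = apply_signal_check(
    finding,
    required_signals=["reachability", "exploit_maturity"],
    available_signals=["reachability"],
    mark_under_investigation=True,
)
```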

7. Offline / air-gap

  • All inputs are fetched from Offline Kit bundles; no network access during evaluation.
  • CLI: stella policy run --sealed --bundle <path> loads IR, inputs, and signals from the bundle and writes outputs plus an attestation-ready manifest.
  • Runs produce DSSE-ready payloads (policy.run@1) that can be signed later when connectivity is restored.
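
To show what a "DSSE-ready" payload might look like before signing, the sketch below builds an unsigned envelope and the bytes a signer would later sign, using the pre-authentication encoding (PAE) defined by the DSSE specification. Treating `policy.run@1` as the literal payloadType and the manifest fields shown are assumptions.

```python
import base64
import json

PAYLOAD_TYPE = "policy.run@1"  # assumption: used verbatim as the DSSE payloadType


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: 'DSSEv1 <len> <type> <len> <body>'."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(payload), payload)


def unsigned_envelope(run_manifest: dict) -> dict:
    """Wrap a run manifest in an envelope with no signatures yet."""
    payload = json.dumps(run_manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [],  # filled in once connectivity (and a signer) is available
    }


def bytes_to_sign(envelope: dict) -> bytes:
    """Recompute the exact bytes a signer must sign for this envelope."""
    return pae(envelope["payloadType"], base64.b64decode(envelope["payload"]))


# Hypothetical manifest fields for illustration.
envelope = unsigned_envelope({"runId": "run-1", "ir_hash": "sha256:abc"})
to_sign = bytes_to_sign(envelope)
```

Because the signature covers the PAE bytes rather than the JSON envelope, the envelope can be stored offline and signed later without re-serializing the payload.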

8. Data model (high level)

  • policy_runs: runId, policyId, version, tenant, shadow, input_cursors, ir_hash, attestation_ref, started_at, completed_at, status, stats (rules fired, explains, unknowns), storage_refs (findings, explains).
  • policy_findings: flattened findings with references to explain entries.
  • policy_explains: rule-level explain traces with inputs, signals, because text.
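
The policy_runs record can be pictured as a typed structure. The field names follow the list above; the Python types, defaults, and example values are assumptions rather than the actual storage schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RunStats:
    rules_fired: int = 0
    explains: int = 0
    unknowns: int = 0


@dataclass
class PolicyRun:
    runId: str
    policyId: str
    version: int
    tenant: str
    shadow: bool
    input_cursors: dict              # cursor name -> digest or snapshot id
    ir_hash: str
    started_at: str                  # assumed ISO 8601 timestamp string
    status: str
    attestation_ref: Optional[str] = None
    completed_at: Optional[str] = None
    stats: RunStats = field(default_factory=RunStats)
    storage_refs: dict = field(default_factory=dict)  # e.g. findings, explains


run = PolicyRun(
    runId="run-1",
    policyId="P-7",
    version=3,
    tenant="tenant-a",
    shadow=True,
    input_cursors={"sbom_digest": "sha256:abc"},
    ir_hash="sha256:def",
    started_at="2025-11-26T18:23:28Z",
    status="running",
)
record = asdict(run)  # plain dict, ready for JSON serialization
```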

9. References

  • docs/policy/dsl.md
  • docs/policy/lifecycle.md
  • docs/policy/architecture.md
  • docs/policy/overview.md
  • docs/reachability/DELIVERY_GUIDE.md