Policy Engine Overview
Goal: Evaluate organisation policies deterministically against scanner SBOMs, Concelier advisories, and Excititor VEX evidence, then publish effective findings that downstream services can trust.
This document introduces the v2 Policy Engine: how the service fits into Stella Ops, the artefacts it produces, the contracts it honours, and the guardrails that keep policy decisions reproducible across air-gapped and connected deployments.
1 · Role in the Platform
- Purpose: Compose policy verdicts by reconciling SBOM inventory, advisory metadata, VEX statements, and organisation rules.
- Form factor: Dedicated .NET 10 Minimal API host (StellaOps.Policy.Engine) plus worker orchestration. Policies are defined in stella-dsl@1 packs compiled to an intermediate representation (IR) with a stable SHA-256 digest.
- Tenancy: All workloads run under Authority-enforced scopes (policy:*, findings:read, effective:write). Only the Policy Engine identity may materialise effective findings collections.
- Consumption: Findings ledger, Console, CLI, and Notify read the published effective_finding_{policyId} materialisations and policy run ledger (policy_runs).
- Offline parity: Bundled policies import/export alongside advisories and VEX. In sealed mode the engine degrades gracefully, annotating explanations whenever cached signals replace live lookups.
2 · High-Level Architecture
flowchart LR
subgraph Inputs
A[Scanner SBOMs<br/>Inventory & Usage]
B[Concelier Advisories<br/>Canonical linksets]
C[Excititor VEX<br/>Consensus status]
D[Policy Packs<br/>stella-dsl@1]
end
subgraph PolicyEngine["StellaOps.Policy.Engine"]
P1[DSL Compiler<br/>IR + Digest]
P2[Joiners<br/>SBOM ↔ Advisory ↔ VEX]
P3[Deterministic Evaluator<br/>Rule hits + scoring]
P4[Materialisers<br/>effective findings]
P5[Run Orchestrator<br/>Full & incremental]
end
subgraph Outputs
O1[Effective Findings Collections]
O2[Explain Traces<br/>Rule hit lineage]
O3[Metrics & Traces<br/>policy_run_seconds,<br/>rules_fired_total]
O4[Simulation/Preview Feeds<br/>CLI & Studio]
end
A --> P2
B --> P2
C --> P2
D --> P1 --> P3
P2 --> P3 --> P4 --> O1
P3 --> O2
P5 --> P3
P3 --> O3
P3 --> O4
3 · Core Concepts
| Concept | Description |
|---|---|
| Policy Pack | Versioned bundle of DSL documents, metadata, and checksum manifest. Packs import/export via CLI and Offline Kit bundles. |
| Policy Digest | SHA-256 of the canonical IR; used for caching, explain trace attribution, and audit proofs. |
| Effective Findings | Append-only Mongo collections (effective_finding_{policyId}) storing the latest verdict per finding, plus history sidecars. |
| Policy Run | Execution record persisted in policy_runs capturing inputs, run mode, timings, and determinism hash. |
| Explain Trace | Structured tree showing rule matches, data provenance, and scoring components for UI/CLI explain features. |
| Simulation | Dry-run evaluation that compares a candidate pack against the active pack and produces verdict diffs without persisting results. |
| Incident Mode | Elevated sampling/trace capture toggled automatically when SLOs breach; emits events for Notifier and Timeline Indexer. |
4 · Inputs & Pre-processing
4.1 SBOM Inventory
- Source: Scanner.WebService publishes inventory/usage SBOMs plus BOM-Index (roaring bitmap) metadata.
- Consumption: Policy joiners use the index to expand candidate components quickly, keeping evaluation under the < 5 s warm path budget.
- Schema: CycloneDX Protobuf + JSON views; Policy Engine reads canonical projections via shared SBOM adapters.
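The index-driven candidate expansion described above can be pictured as a union of posting sets. The following is a minimal sketch only: Python's built-in sets stand in for roaring bitmaps, and the index layout (component PURL mapped to image IDs), the function name `candidate_images`, and the sample data are all assumptions made for illustration, not the BOM-Index format.

```python
from typing import Dict, Set

# Hypothetical index: component PURL -> IDs of images whose SBOM contains it.
# Plain sets stand in for the roaring bitmaps used by the real BOM-Index.
BomIndex = Dict[str, Set[int]]


def candidate_images(index: BomIndex, affected_purls: Set[str]) -> Set[int]:
    """Return every image that contains at least one affected component."""
    hits: Set[int] = set()
    for purl in affected_purls:
        # Components missing from the index simply contribute nothing.
        hits |= index.get(purl, set())
    return hits


index: BomIndex = {
    "pkg:npm/lodash@4.17.20": {1, 2},
    "pkg:npm/express@4.18.2": {2, 3},
    "pkg:pypi/requests@2.31.0": {4},
}
affected = {
    "pkg:npm/lodash@4.17.20",
    "pkg:pypi/requests@2.31.0",
    "pkg:npm/not-indexed@1.0.0",
}

print(sorted(candidate_images(index, affected)))  # [1, 2, 4]
```

Only the images returned here would need to be joined against advisories and VEX, which is what keeps the evaluation set small.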
4.2 Advisory Corpus
- Source: Concelier exports canonical advisories with deterministic identifiers, linksets, and equivalence tables.
- Contract: Policy Engine only consumes raw content.raw, identifiers, and linkset fields per Aggregation-Only Contract (AOC); derived precedence remains a policy concern.
4.3 VEX Evidence
- Source: Excititor consensus service resolves OpenVEX / CSAF statements, preserving conflicts.
- Usage: Policy rules can require specific VEX vendors or justification codes; evaluator records when cached evidence substitutes for live statements (sealed mode).
4.4 Policy Packs
- Authored in Policy Studio or CLI, validated against the stella-dsl@1 schema.
- Compiler performs canonicalisation (ordering, defaulting) before emitting IR and digest.
- Packs bundle scoring profiles, allowlist metadata, and optional reachability weighting tables.
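To illustrate why canonicalisation must precede digesting, here is a minimal sketch in Python. It assumes a pack can be represented as a JSON-like dictionary; the field names (`rules`, `action`, `quiet`, `tags`), the default values, and the helper names are hypothetical and do not describe the actual IR. Rule order is preserved on the assumption that it is significant; only defaults and unordered fields are normalised.

```python
import hashlib
import json

# Hypothetical defaults filled in during canonicalisation.
RULE_DEFAULTS = {"action": "warn", "quiet": False, "tags": []}


def canonicalise(pack: dict) -> dict:
    """Apply defaults and sort unordered fields; rule order is kept as written."""
    rules = []
    for rule in pack.get("rules", []):
        merged = {**RULE_DEFAULTS, **rule}
        merged["tags"] = sorted(set(merged["tags"]))  # tags are treated as a set
        rules.append(merged)
    return {"syntax": pack.get("syntax", "stella-dsl@1"), "rules": rules}


def policy_digest(pack: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    encoded = json.dumps(
        canonicalise(pack), sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Two spellings of the same pack: key order, explicit defaults, and tag order differ.
pack_a = {"rules": [{"name": "block-critical", "action": "block", "tags": ["prod", "pci"]}]}
pack_b = {
    "syntax": "stella-dsl@1",
    "rules": [{"tags": ["pci", "prod"], "quiet": False, "name": "block-critical", "action": "block"}],
}
# A real semantic change: the action differs.
pack_c = {"rules": [{"name": "block-critical", "action": "warn", "tags": ["prod", "pci"]}]}

print(policy_digest(pack_a) == policy_digest(pack_b))  # True
print(policy_digest(pack_a) == policy_digest(pack_c))  # False
```

Equivalent packs hash to the same digest, so caches and audit records keyed by digest are insensitive to cosmetic edits.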
5 · Evaluation Flow
- Run selection – Orchestrator accepts full, incremental, or simulate jobs. Incremental runs listen to change streams from Concelier, Excititor, and SBOM imports to scope re-evaluation.
- Input staging – Candidates fetched in deterministic batches; identity graph from Concelier strengthens PURL lookups.
- Rule execution – Evaluator walks rules in lexical order (first-match wins). Actions available: block, ignore, warn, defer, escalate, requireVex, each supporting quieting semantics where permitted.
- Scoring – PolicyScoringConfig applies severity, trust, reachability weights plus penalties (warnPenalty, ignorePenalty, quietPenalty).
- Verdict and explain – Engine constructs PolicyVerdict records with inputs, quiet flags, unknown confidence bands, and provenance markers; explain trees capture rule lineage.
- Materialisation – Effective findings collections are upserted append-only, stamped with run identifier, policy digest, and tenant.
- Publishing – Completed run writes to policy_runs, emits metrics (policy_run_seconds, rules_fired_total, vex_overrides_total), and raises events for Console/Notify subscribers.
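The rule execution step above (first-match wins) can be sketched as follows. This is an illustrative Python model, not the engine's evaluator: the `Rule` and `Verdict` shapes, the finding fields, and the fallback action used when no rule matches are assumptions. "Lexical order" is taken here to mean the order in which rules appear in the pack.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Finding = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    name: str
    action: str  # one of: block, ignore, warn, defer, escalate, requireVex
    matches: Callable[[Finding], bool]


@dataclass(frozen=True)
class Verdict:
    finding_id: str
    action: str
    rule: Optional[str]  # None when no rule matched


def evaluate(rules: List[Rule], finding: Finding, default_action: str = "warn") -> Verdict:
    """Walk rules in order and stop at the first one that matches."""
    for rule in rules:
        if rule.matches(finding):
            return Verdict(finding["id"], rule.action, rule.name)
    # Hypothetical fallback; the real default behaviour is not specified here.
    return Verdict(finding["id"], default_action, None)


rules = [
    Rule("vex-not-affected", "ignore", lambda f: f.get("vex") == "not_affected"),
    Rule("critical-block", "block", lambda f: f.get("severity") == "critical"),
]

# Both rules match this finding, but the earlier rule decides the verdict.
suppressed = evaluate(rules, {"id": "F-1", "severity": "critical", "vex": "not_affected"})
blocked = evaluate(rules, {"id": "F-2", "severity": "critical", "vex": "affected"})
unmatched = evaluate(rules, {"id": "F-3", "severity": "low"})

print(suppressed.action, suppressed.rule)  # ignore vex-not-affected
print(blocked.action, blocked.rule)        # block critical-block
print(unmatched.action, unmatched.rule)    # warn None
```

Recording the matching rule name alongside the action is what lets an explain trace attribute each verdict to a specific rule.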
6 · Run Modes
| Mode | Trigger | Scope | Persistence | Typical Use |
|---|---|---|---|---|
| Full | Manual CLI (stella policy run), scheduled nightly, or emergency rebaseline | Entire tenant | Writes effective findings and run record | After policy publish or major advisory/VEX import |
| Incremental | Change-stream queue driven by Concelier/Excititor/SBOM deltas | Only affected artefacts | Writes effective findings and run record | Continuous upkeep; ensures SLA ≤ 5 min from source change |
| Simulate | CLI/Studio preview, CI pipelines | Candidate subset (diff against baseline) | No materialisation; produces explain & diff payloads | Policy authoring, CI regression suites |
All modes are cancellation-aware and checkpoint progress for replay in case of deployment restarts.
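As an illustration of what a simulate run produces, the sketch below diffs baseline verdicts against candidate verdicts without writing anything. It is a simplified Python model: verdicts are reduced to a single action string per finding, and the `absent` marker and function name are hypothetical.

```python
from typing import Dict, List, Tuple


def diff_verdicts(
    baseline: Dict[str, str], candidate: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (finding_id, old_action, new_action) for every changed verdict."""
    changes = []
    # Sorting the union of IDs keeps the diff stable across runs.
    for finding_id in sorted(baseline.keys() | candidate.keys()):
        old = baseline.get(finding_id, "absent")
        new = candidate.get(finding_id, "absent")
        if old != new:
            changes.append((finding_id, old, new))
    return changes


active = {"F-1": "block", "F-2": "warn", "F-3": "ignore"}
proposed = {"F-1": "block", "F-2": "block", "F-4": "warn"}

for change in diff_verdicts(active, proposed):
    print(change)
# ('F-2', 'warn', 'block')
# ('F-3', 'ignore', 'absent')
# ('F-4', 'absent', 'warn')
```

A CI job could fail the build when the diff contains unexpected transitions, for example any move from block to ignore.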
7 · Outputs & Integrations
- APIs – Minimal API exposes policy CRUD, run orchestration, explain fetches, and cursor-based listing of effective findings (see /docs/api/policy.md once published).
- CLI – stella policy simulate/run/show commands surface JSON verdicts, exit codes, and diff summaries suitable for CI gating.
- Console / Policy Studio – UI reads explain traces, policy metadata, approval workflow status, and simulation diffs to guide reviewers.
- Findings Ledger – Effective findings feed downstream export, Notify, and risk scoring jobs.
- Air-gap bundles – Offline Kit includes policy packs, scoring configs, and explain indexes; export commands generate DSSE-signed bundles for transfer.
8 · Determinism & Guardrails
- Deterministic inputs – All joins rely on canonical linksets and equivalence tables; batches are sorted, and random/wall-clock APIs are blocked by static analysis plus runtime guards (ERR_POL_004).
- Stable outputs – Canonical JSON serializers sort keys; digests recorded in run metadata enable reproducible diffs across machines.
- Idempotent writes – Materialisers upsert using {policyId, findingId, tenant} keys and retain prior versions with append-only history.
- Sandboxing – Policy evaluation executes in-process with timeouts; restart-only plug-ins guarantee no runtime DLL injection.
- Compliance proof – Every run stores digest of inputs (policy, SBOM batch, advisory snapshot) so auditors can replay decisions offline.
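The idempotent-write guardrail can be modelled with a small in-memory store. This is a sketch under stated assumptions: dictionaries stand in for the database collections, the record fields (`verdict`, `runId`, `policyDigest`) are hypothetical, and "unchanged" is defined here as same verdict under the same policy digest.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (policyId, findingId, tenant)


class EffectiveFindings:
    """In-memory stand-in for an effective findings collection plus history."""

    def __init__(self) -> None:
        self.latest: Dict[Key, dict] = {}
        self.history: List[Tuple[Key, dict]] = []  # append-only, never rewritten

    def upsert(self, key: Key, verdict: str, run_id: str, policy_digest: str) -> bool:
        """Store the verdict; return False when the write is a no-op replay."""
        current = self.latest.get(key)
        if (
            current is not None
            and current["verdict"] == verdict
            and current["policyDigest"] == policy_digest
        ):
            return False  # replaying the same result changes nothing
        if current is not None:
            self.history.append((key, current))  # retain the prior version
        self.latest[key] = {
            "verdict": verdict,
            "runId": run_id,
            "policyDigest": policy_digest,
        }
        return True


store = EffectiveFindings()
key = ("policy-1", "F-1", "tenant-a")

first = store.upsert(key, "warn", "run-1", "sha256:aaa")
replay = store.upsert(key, "warn", "run-1", "sha256:aaa")
changed = store.upsert(key, "block", "run-2", "sha256:bbb")

print(first, replay, changed)  # True False True
print(len(store.history))      # 1
```

Because a replayed run leaves both the latest record and the history untouched, restarting a job after a deployment does not produce duplicate history entries.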
9 · Security, Tenancy & Offline Notes
- Authority scopes: Gateway enforces policy:read, policy:write, policy:simulate, policy:runs, findings:read, effective:write. Service identities must present DPoP-bound tokens.
- Tenant isolation: Collections partition by tenant identifier; cross-tenant queries require explicit admin scopes and return audit warnings.
- Sealed mode: In air-gapped deployments the engine surfaces sealed=true hints in explain traces, warning about cached EPSS/KEV data and suggesting bundle refreshes (see docs/airgap/EPIC_16_AIRGAP_MODE.md §3.7).
- Observability: Structured logs carry correlation IDs matching orchestrator job IDs; metrics integrate with OpenTelemetry exporters; sampled rule-hit logs redact policy secrets.
- Incident response: Incident mode can be forced via API, boosting trace retention and notifying Notifier through policy.incident.activated events.
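A scope and tenant check of the kind described above might look like the following sketch. The scope names come from the list above, but the operation names, their pairing with scopes, and the deny-by-default behaviour are assumptions for illustration; the admin-scoped cross-tenant path and DPoP token validation are deliberately left out.

```python
from typing import Dict, Set

# Hypothetical mapping from operation to required scopes.
REQUIRED_SCOPES: Dict[str, Set[str]] = {
    "read_policy": {"policy:read"},
    "edit_policy": {"policy:write"},
    "simulate": {"policy:simulate"},
    "materialise": {"effective:write"},
}


def is_allowed(
    operation: str, granted: Set[str], token_tenant: str, resource_tenant: str
) -> bool:
    """Allow only same-tenant calls that hold every scope the operation needs."""
    if token_tenant != resource_tenant:
        return False  # cross-tenant access is out of scope for this sketch
    required = REQUIRED_SCOPES.get(operation)
    if required is None:
        return False  # unknown operations are denied by default
    return required <= granted


print(is_allowed("simulate", {"policy:read", "policy:simulate"}, "tenant-a", "tenant-a"))  # True
print(is_allowed("materialise", {"policy:write"}, "tenant-a", "tenant-a"))                 # False
print(is_allowed("read_policy", {"policy:read"}, "tenant-a", "tenant-b"))                  # False
```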
10 · Working with Policy Packs
- Author in Policy Studio or edit DSL files locally. Validate with stella policy lint.
- Simulate against golden SBOM fixtures (stella policy simulate --sbom fixtures/*.json). Inspect explain traces for unexpected overrides.
- Publish via API or CLI; Authority enforces review/approval workflows (draft → review → approve → rollout).
- Monitor the subsequent incremental runs; if determinism diff fails in CI, roll back pack while investigating digests.
- Bundle packs for offline sites with stella policy bundle export and distribute via Offline Kit.
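The determinism diff mentioned in the steps above amounts to evaluating the same inputs more than once and comparing hashes of the results. The sketch below shows the idea in Python with a toy evaluator; `evaluate_batch`, `determinism_hash`, and the finding fields are hypothetical stand-ins, not the engine's implementation or its hash format.

```python
import hashlib
import json
import random
from typing import Dict, List

Finding = Dict[str, str]


def evaluate_batch(findings: List[Finding]) -> List[Dict[str, str]]:
    """Toy evaluator: sorts inputs by ID so arrival order cannot affect output."""
    ordered = sorted(findings, key=lambda f: f["id"])
    return [
        {"id": f["id"], "action": "block" if f["severity"] == "critical" else "warn"}
        for f in ordered
    ]


def determinism_hash(verdicts: List[Dict[str, str]]) -> str:
    """SHA-256 over canonical JSON (sorted keys, compact separators)."""
    encoded = json.dumps(verdicts, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


findings = [
    {"id": "F-3", "severity": "low"},
    {"id": "F-1", "severity": "critical"},
    {"id": "F-2", "severity": "medium"},
]

# Re-run with the inputs in a different arrival order (seeded for repeatability).
shuffled = findings[:]
random.Random(7).shuffle(shuffled)

hash_a = determinism_hash(evaluate_batch(findings))
hash_b = determinism_hash(evaluate_batch(shuffled))
print(hash_a == hash_b)  # True
```

If the evaluator skipped the sort, the two hashes could differ whenever inputs arrived in a different order, which is the class of regression such a CI job is meant to catch.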
11 · Compliance Checklist
- Scopes enforced: Confirm gateway policy requires policy:* and effective:write scopes for all mutating endpoints.
- Determinism guard active: Static analyzer blocks clock/RNG usage; CI determinism job diffing repeated runs passes.
- Materialisation audit: Effective findings collections use append-only writers and retain history per policy run.
- Explain availability: UI/CLI expose explain traces for every verdict; sealed-mode warnings display when cached evidence is used.
- Offline parity: Policy bundles (import/export) tested in sealed environment; air-gap degradations documented for operators.
- Observability wired: Metrics (policy_run_seconds, rules_fired_total, vex_overrides_total) and sampled rule hit logs emit to the shared telemetry pipeline with correlation IDs.
- Documentation synced: API (/docs/api/policy.md), DSL grammar (/docs/policy/dsl.md), lifecycle (/docs/policy/lifecycle.md), and run modes (/docs/policy/runs.md) cross-link back to this overview.
Last updated: 2025-10-26 (Sprint 20).