Policy Engine Overview

Goal: Evaluate organisation policies deterministically against scanner SBOMs, Concelier advisories, and Excititor VEX evidence, then publish effective findings that downstream services can trust.

This document introduces the v2 Policy Engine: how the service fits into StellaOps, the artefacts it produces, the contracts it honours, and the guardrails that keep policy decisions reproducible across air-gapped and connected deployments.


1·Role in the Platform

  • Purpose: Compose policy verdicts by reconciling SBOM inventory, advisory metadata, VEX statements, and organisation rules.
  • Form factor: Dedicated .NET 10 Minimal API host (StellaOps.Policy.Engine) plus worker orchestration. Policies are defined in stella-dsl@1 packs compiled to an intermediate representation (IR) with a stable SHA-256 digest.
  • Tenancy: All workloads run under Authority-enforced scopes (policy:*, findings:read, effective:write). Only the Policy Engine identity may materialise effective findings collections.
  • Consumption: Findings ledger, Console, CLI, and Notify read the published effective_finding_{policyId} materialisations and policy run ledger (policy_runs).
  • Offline parity: Bundled policies import/export alongside advisories and VEX. In sealed mode the engine degrades gracefully, annotating explanations whenever cached signals replace live lookups.

2·High-Level Architecture

flowchart LR
    subgraph Inputs
        A[Scanner SBOMs<br/>Inventory & Usage]
        B[Concelier Advisories<br/>Canonical linksets]
        C[Excititor VEX<br/>Consensus status]
        D[Policy Packs<br/>stella-dsl@1]
    end
    subgraph PolicyEngine["StellaOps.Policy.Engine"]
        P1[DSL Compiler<br/>IR + Digest]
        P2[Joiners<br/>SBOM ↔ Advisory ↔ VEX]
        P3[Deterministic Evaluator<br/>Rule hits + scoring]
        P4[Materialisers<br/>effective findings]
        P5[Run Orchestrator<br/>Full & incremental]
    end
    subgraph Outputs
        O1[Effective Findings Collections]
        O2[Explain Traces<br/>Rule hit lineage]
        O3[Metrics & Traces<br/>policy_run_seconds,<br/>rules_fired_total]
        O4[Simulation/Preview Feeds<br/>CLI & Studio]
    end

    A --> P2
    B --> P2
    C --> P2
    D --> P1 --> P3
    P2 --> P3 --> P4 --> O1
    P3 --> O2
    P5 --> P3
    P3 --> O3
    P3 --> O4

3·Core Concepts

| Concept | Description |
| --- | --- |
| Policy Pack | Versioned bundle of DSL documents, metadata, and checksum manifest. Packs import/export via CLI and Offline Kit bundles. |
| Policy Digest | SHA-256 of the canonical IR; used for caching, explain trace attribution, and audit proofs. |
| Effective Findings | Append-only Mongo collections (effective_finding_{policyId}) storing the latest verdict per finding, plus history sidecars. |
| Policy Run | Execution record persisted in policy_runs capturing inputs, run mode, timings, and determinism hash. |
| Explain Trace | Structured tree showing rule matches, data provenance, and scoring components for UI/CLI explain features. |
| Simulation | Dry-run evaluation that compares a candidate pack against the active pack and produces verdict diffs without persisting results. |
| Incident Mode | Elevated sampling/trace capture toggled automatically when SLOs breach; emits events for Notifier and Timeline Indexer. |

4·Inputs & Pre-processing

4.1 SBOM Inventory

  • Source: Scanner.WebService publishes inventory/usage SBOMs plus BOM-Index (roaring bitmap) metadata.
  • Consumption: Policy joiners use the index to expand candidate components quickly, keeping evaluation within the < 5 s warm-path budget.
  • Schema: CycloneDX Protobuf + JSON views; Policy Engine reads canonical projections via shared SBOM adapters.
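
As a rough illustration of how an index can narrow the candidate set before the joiners run, the sketch below uses plain Python sets in place of roaring bitmaps. The index layout (component PURL mapped to image ids), the sample PURLs, and the function name `candidate_images` are assumptions made for this example; the actual BOM-Index format is not described in this overview.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in for BOM-Index: component PURL -> ids of images containing it.
# A real roaring bitmap would hold these id sets in compressed form.
BomIndex = Dict[str, Set[int]]


def candidate_images(index: BomIndex, affected_purls: Iterable[str]) -> List[int]:
    """Return the sorted ids of images that contain at least one affected component."""
    hits: Set[int] = set()
    for purl in affected_purls:
        hits |= index.get(purl, set())  # set union plays the role of a bitmap OR
    return sorted(hits)  # sorted so downstream batches are deterministic


index: BomIndex = {
    "pkg:npm/lodash@4.17.20": {1, 3},
    "pkg:pypi/requests@2.31.0": {2},
    "pkg:maven/org.example/lib@1.0": {3, 4},
}

affected = ["pkg:npm/lodash@4.17.20", "pkg:maven/org.example/lib@1.0"]
print(candidate_images(index, affected))  # → [1, 3, 4]
```

Only the images returned here would need to be joined against advisories and VEX, which is the kind of pruning that keeps a warm evaluation path short.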

4.2 Advisory Corpus

  • Source: Concelier exports canonical advisories with deterministic identifiers, linksets, and equivalence tables.
  • Contract: Policy Engine consumes only the raw content.raw, identifiers, and linkset fields, per the Aggregation-Only Contract (AOC); derived precedence remains a policy concern.

4.3 VEX Evidence

  • Source: Excititor consensus service resolves OpenVEX / CSAF statements, preserving conflicts.
  • Usage: Policy rules can require specific VEX vendors or justification codes; evaluator records when cached evidence substitutes for live statements (sealed mode).
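
The following is a minimal sketch of how a rule might check a VEX requirement and record whether cached evidence was used. The `VexStatement` and `VexRequirement` types, the `from_cache` flag, and `check_requirement` are hypothetical names invented for this example; the status and justification strings follow OpenVEX vocabulary.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class VexStatement:
    vendor: str
    status: str                          # OpenVEX status, e.g. "not_affected"
    justification: Optional[str] = None  # OpenVEX justification code, if any
    from_cache: bool = False             # assumed flag: cached evidence used in sealed mode


@dataclass(frozen=True)
class VexRequirement:
    vendors: FrozenSet[str]
    justifications: FrozenSet[str]


def check_requirement(req: VexRequirement,
                      statements: Iterable[VexStatement]) -> Tuple[bool, bool]:
    """Return (satisfied, used_cached_evidence) for the first acceptable statement."""
    # Sort so the result does not depend on input order; live evidence sorts before cached.
    ordered = sorted(
        statements,
        key=lambda s: (s.vendor, s.status, s.justification or "", s.from_cache),
    )
    for s in ordered:
        if (s.status == "not_affected"
                and s.vendor in req.vendors
                and s.justification in req.justifications):
            return True, s.from_cache
    return False, False


req = VexRequirement(
    vendors=frozenset({"vendor-a"}),
    justifications=frozenset({"vulnerable_code_not_present"}),
)
stmts = [
    VexStatement("vendor-b", "not_affected", "vulnerable_code_not_present"),
    VexStatement("vendor-a", "not_affected", "vulnerable_code_not_present", from_cache=True),
]
print(check_requirement(req, stmts))  # → (True, True)
```

The second element of the result is what an explain trace could surface as a sealed-mode hint.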

4.4 Policy Packs

  • Authored in Policy Studio or CLI, validated against the stella-dsl@1 schema.
  • Compiler performs canonicalisation (ordering, defaulting) before emitting IR and digest.
  • Packs bundle scoring profiles, allowlist metadata, and optional reachability weighting tables.
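
To make the canonicalise-then-digest idea concrete, here is a small sketch under stated assumptions: the pack is modelled as a plain dictionary, the defaults in `DEFAULTS` are invented, and canonical JSON with sorted keys stands in for the real IR encoding, which this document does not specify.

```python
import hashlib
import json

# Hypothetical defaults applied to every rule during canonicalisation.
DEFAULTS = {"action": "warn", "quiet": False}


def canonicalise(pack: dict) -> dict:
    """Apply defaults and a stable rule order so equivalent packs yield the same IR."""
    rules = [{**DEFAULTS, **rule} for rule in pack.get("rules", [])]
    rules.sort(key=lambda r: r["name"])  # ordering step
    return {"syntax": pack.get("syntax", "stella-dsl@1"), "rules": rules}


def policy_digest(pack: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the IR (sorted keys, no extra whitespace)."""
    payload = json.dumps(canonicalise(pack), sort_keys=True,
                         separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two packs that differ only in key order, rule order, and explicit defaults.
pack_a = {"rules": [{"name": "b-warn", "action": "warn"},
                    {"name": "a-block", "action": "block", "quiet": False}]}
pack_b = {"syntax": "stella-dsl@1",
          "rules": [{"action": "block", "name": "a-block"}, {"name": "b-warn"}]}

print(policy_digest(pack_a) == policy_digest(pack_b))  # → True
```

Because the digest is computed after ordering and defaulting, cosmetic edits to a pack do not invalidate caches or audit proofs keyed on it.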

5·Evaluation Flow

  1. Run selection: The orchestrator accepts full, incremental, or simulate jobs. Incremental runs listen to change streams from Concelier, Excititor, and SBOM imports to scope re-evaluation.
  2. Input staging: Candidates are fetched in deterministic batches; the identity graph from Concelier strengthens PURL lookups.
  3. Rule execution: The evaluator walks rules in lexical order (first match wins). Available actions are block, ignore, warn, defer, escalate, and requireVex, each supporting quieting semantics where permitted.
  4. Scoring: PolicyScoringConfig applies severity, trust, and reachability weights plus penalties (warnPenalty, ignorePenalty, quietPenalty).
  5. Verdict and explain: The engine constructs PolicyVerdict records with inputs, quiet flags, unknown confidence bands, and provenance markers; explain trees capture rule lineage.
  6. Materialisation: Effective findings are upserted into their collections with append-only history, and each write is stamped with the run identifier, policy digest, and tenant.
  7. Publishing: The completed run writes to policy_runs, emits metrics (policy_run_seconds, rules_fired_total, vex_overrides_total), and raises events for Console/Notify subscribers.
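
Steps 3 and 4 above can be sketched roughly as follows. The `Rule` type, the rule names, the weight tables, and the scoring formula (severity × trust × reachability minus penalties, clamped at zero) are all assumptions made for illustration; the real PolicyScoringConfig formula is not given in this overview.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    action: str  # one of: block, ignore, warn, defer, escalate, requireVex
    matches: Callable[[dict], bool]


def first_match(rules: Iterable[Rule], finding: dict) -> Optional[Rule]:
    """Walk rules in lexical order of their names; the first matching rule wins."""
    for rule in sorted(rules, key=lambda r: r.name):
        if rule.matches(finding):
            return rule
    return None


# Hypothetical weights and penalties standing in for PolicyScoringConfig.
SEVERITY_WEIGHT = {"critical": 90.0, "high": 80.0, "medium": 50.0, "low": 20.0}
ACTION_PENALTY = {"warn": 15.0, "ignore": 35.0}  # warnPenalty, ignorePenalty
QUIET_PENALTY = 10.0                             # quietPenalty


def score(finding: dict, action: str, quiet: bool = False) -> float:
    base = SEVERITY_WEIGHT[finding["severity"]] * finding["trust"] * finding["reachability"]
    penalty = ACTION_PENALTY.get(action, 0.0) + (QUIET_PENALTY if quiet else 0.0)
    return max(0.0, base - penalty)


def evaluate(rules: Iterable[Rule], finding: dict) -> dict:
    rule = first_match(rules, finding)
    action = rule.action if rule else "none"
    return {"rule": rule.name if rule else None,
            "action": action,
            "score": score(finding, action)}


# Declared out of order on purpose: evaluation order comes from the names.
RULES = [
    Rule("20-warn-high", "warn", lambda f: f["severity"] == "high"),
    Rule("10-block-kev", "block", lambda f: f.get("kev", False)),
]

finding = {"severity": "high", "trust": 0.5, "reachability": 0.5, "kev": True}
print(evaluate(RULES, finding))
# → {'rule': '10-block-kev', 'action': 'block', 'score': 20.0}
```

Sorting by name before matching means the verdict does not depend on the order in which rules were loaded, which is one simple way to keep first-match semantics reproducible.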

6·Run Modes

| Mode | Trigger | Scope | Persistence | Typical Use |
| --- | --- | --- | --- | --- |
| Full | Manual CLI (stella policy run), scheduled nightly, or emergency rebaseline | Entire tenant | Writes effective findings and run record | After policy publish or major advisory/VEX import |
| Incremental | Change-stream queue driven by Concelier/Excititor/SBOM deltas | Only affected artefacts | Writes effective findings and run record | Continuous upkeep; ensures SLA ≤ 5 min from source change |
| Simulate | CLI/Studio preview, CI pipelines | Candidate subset (diff against baseline) | No materialisation; produces explain & diff payloads | Policy authoring, CI regression suites |

All modes are cancellation-aware and checkpoint progress for replay in case of deployment restarts.
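
A simulate run's diff output can be pictured with the sketch below, which compares verdicts from the active pack with those from a candidate pack and persists nothing. The payload shape (`findingId`, `before`, `after`) and the function name `diff_verdicts` are assumptions for this example, not the engine's actual diff format.

```python
from typing import Dict, List, Optional


def diff_verdicts(baseline: Dict[str, str],
                  candidate: Dict[str, str]) -> List[Dict[str, Optional[str]]]:
    """List findings whose verdict differs between the active and the candidate pack."""
    changes = []
    for finding_id in sorted(set(baseline) | set(candidate)):  # stable ordering
        before = baseline.get(finding_id)
        after = candidate.get(finding_id)
        if before != after:
            changes.append({"findingId": finding_id, "before": before, "after": after})
    return changes


baseline = {"f1": "block", "f2": "warn"}
candidate = {"f1": "block", "f2": "ignore", "f3": "warn"}

for change in diff_verdicts(baseline, candidate):
    print(change)
# → {'findingId': 'f2', 'before': 'warn', 'after': 'ignore'}
# → {'findingId': 'f3', 'before': None, 'after': 'warn'}
```

A CI job could fail the build when this list is non-empty for fixtures that are expected to stay unchanged.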


7·Outputs & Integrations

  • APIs: Minimal API exposes policy CRUD, run orchestration, explain fetches, and cursor-based listing of effective findings (see /docs/api/policy.md once published).
  • CLI: stella policy simulate/run/show commands surface JSON verdicts, exit codes, and diff summaries suitable for CI gating.
  • Console / Policy Studio: UI reads explain traces, policy metadata, approval workflow status, and simulation diffs to guide reviewers.
  • Findings Ledger: Effective findings feed downstream export, Notify, and risk scoring jobs.
  • Air-gap bundles: Offline Kit includes policy packs, scoring configs, and explain indexes; export commands generate DSSE-signed bundles for transfer.

8·Determinism & Guardrails

  • Deterministic inputs: All joins rely on canonical linksets and equivalence tables; batches are sorted, and random/wall-clock APIs are blocked by static analysis plus runtime guards (ERR_POL_004).
  • Stable outputs: Canonical JSON serializers sort keys; digests recorded in run metadata enable reproducible diffs across machines.
  • Idempotent writes: Materialisers upsert using {policyId, findingId, tenant} keys and retain prior versions with append-only history.
  • Sandboxing: Policy evaluation executes in-process with timeouts; restart-only plug-ins guarantee no runtime DLL injection.
  • Compliance proof: Every run stores a digest of its inputs (policy, SBOM batch, advisory snapshot) so auditors can replay decisions offline.
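
The idempotent-write guardrail can be illustrated with an in-memory sketch: an upsert keyed on (policyId, findingId, tenant) that treats a replayed verdict as a no-op and moves superseded versions into an append-only history list. The class `EffectiveFindingsStore` and its fields are hypothetical and do not reflect the real Mongo schema.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (policyId, findingId, tenant)


class EffectiveFindingsStore:
    """In-memory stand-in for an effective findings collection plus its history sidecar."""

    def __init__(self) -> None:
        self.current: Dict[Key, dict] = {}  # latest verdict per finding
        self.history: List[dict] = []       # prior versions; only ever appended to

    def upsert(self, policy_id: str, finding_id: str, tenant: str,
               verdict: str, run_id: str) -> bool:
        """Write the verdict; return False when a replay would change nothing."""
        key = (policy_id, finding_id, tenant)
        previous = self.current.get(key)
        if previous is not None and previous["verdict"] == verdict:
            return False  # same verdict replayed: idempotent no-op
        if previous is not None:
            self.history.append({"key": key, **previous})  # retain the prior version
        self.current[key] = {"verdict": verdict, "runId": run_id}
        return True


store = EffectiveFindingsStore()
print(store.upsert("P-1", "finding-1", "tenant-a", "warn", "run-1"))   # → True
print(store.upsert("P-1", "finding-1", "tenant-a", "warn", "run-2"))   # → False
print(store.upsert("P-1", "finding-1", "tenant-a", "block", "run-3"))  # → True
print(len(store.history))                                              # → 1
```

Replaying a checkpointed batch after a restart therefore leaves both the latest verdicts and the history unchanged.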

9·Security, Tenancy & Offline Notes

  • Authority scopes: Gateway enforces policy:read, policy:write, policy:simulate, policy:runs, findings:read, effective:write. Service identities must present DPoP-bound tokens.
  • Tenant isolation: Collections partition by tenant identifier; cross-tenant queries require explicit admin scopes and return audit warnings.
  • Sealed mode: In air-gapped deployments the engine surfaces sealed=true hints in explain traces, warning about cached EPSS/KEV data and suggesting bundle refreshes (see docs/airgap/EPIC_16_AIRGAP_MODE.md §3.7).
  • Observability: Structured logs carry correlation IDs matching orchestrator job IDs; metrics integrate with OpenTelemetry exporters; sampled rule-hit logs redact policy secrets.
  • Incident response: Incident mode can be forced via API, boosting trace retention and notifying Notifier through policy.incident.activated events.
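
As a hedged sketch of a scope check, the snippet below expands policy:* into the four policy scopes listed above and reports which required scopes a token lacks. Treating policy:* as shorthand for exactly those four scopes is an assumption, and `missing_scopes` is a hypothetical helper rather than part of Authority or the gateway.

```python
from typing import Iterable, List, Set

# Assumption: policy:* is shorthand for the four policy scopes named in this section.
POLICY_SCOPES = frozenset({"policy:read", "policy:write", "policy:simulate", "policy:runs"})


def expand(required: Iterable[str]) -> Set[str]:
    """Replace the policy:* shorthand with the concrete scopes it stands for."""
    expanded: Set[str] = set()
    for scope in required:
        if scope == "policy:*":
            expanded |= POLICY_SCOPES
        else:
            expanded.add(scope)
    return expanded


def missing_scopes(token_scopes: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required scopes absent from the token, sorted for stable output."""
    return sorted(expand(required) - set(token_scopes))


mutating_endpoint = ["policy:*", "effective:write"]
reader_token = {"policy:read", "findings:read"}
print(missing_scopes(reader_token, mutating_endpoint))
# → ['effective:write', 'policy:runs', 'policy:simulate', 'policy:write']
```

An empty result means the request may proceed; anything else is the list a gateway could log when rejecting the call.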

10·Working with Policy Packs

  1. Author in Policy Studio or edit DSL files locally. Validate with stella policy lint.
  2. Simulate against golden SBOM fixtures (stella policy simulate --sbom fixtures/*.json). Inspect explain traces for unexpected overrides.
  3. Publish via API or CLI; Authority enforces review/approval workflows (draft → review → approve → rollout).
  4. Monitor the subsequent incremental runs; if the determinism diff fails in CI, roll back the pack while investigating the digests.
  5. Bundle packs for offline sites with stella policy bundle export and distribute via Offline Kit.
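
The determinism diff mentioned in step 4 could look something like the sketch below: evaluate the same inputs several times, digest each canonical output, and require a single distinct digest. The evaluator signature and the names `run_digest` and `is_deterministic` are assumptions for this example, not the project's CI job.

```python
import hashlib
import json
from typing import Callable, List


def run_digest(verdicts: List[dict]) -> str:
    """Digest of the canonical JSON form of a run's verdicts (sorted keys, compact)."""
    payload = json.dumps(verdicts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_deterministic(evaluate: Callable[[List[dict]], List[dict]],
                     inputs: List[dict], repeats: int = 3) -> bool:
    """Run the evaluator repeatedly on identical inputs and compare output digests."""
    digests = {run_digest(evaluate(list(inputs))) for _ in range(repeats)}
    return len(digests) == 1


def stable_eval(findings: List[dict]) -> List[dict]:
    # Output depends only on the inputs and is emitted in sorted order.
    return [{"id": f["id"], "action": "warn"} for f in sorted(findings, key=lambda f: f["id"])]


fixtures = [{"id": "f2"}, {"id": "f1"}]
print(is_deterministic(stable_eval, fixtures))  # → True
```

An evaluator that leaks a clock value, a counter, or unordered iteration into its output would yield more than one digest and fail this check.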

11·Compliance Checklist

  • Scopes enforced: Confirm gateway policy requires policy:* and effective:write scopes for all mutating endpoints.
  • Determinism guard active: Static analyzer blocks clock/RNG usage; CI determinism job diffing repeated runs passes.
  • Materialisation audit: Effective findings collections use append-only writers and retain history per policy run.
  • Explain availability: UI/CLI expose explain traces for every verdict; sealed-mode warnings display when cached evidence is used.
  • Offline parity: Policy bundles (import/export) tested in sealed environment; air-gap degradations documented for operators.
  • Observability wired: Metrics (policy_run_seconds, rules_fired_total, vex_overrides_total) and sampled rule hit logs emit to the shared telemetry pipeline with correlation IDs.
  • Documentation synced: API (/docs/api/policy.md), DSL grammar (/docs/policy/dsl.md), lifecycle (/docs/policy/lifecycle.md), and run modes (/docs/policy/runs.md) cross-link back to this overview.

Last updated: 2025-10-26 (Sprint 20).