Add Policy DSL Validator, Schema Exporter, and Simulation Smoke tools
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled

- Implemented PolicyDslValidator with command-line options for strict mode and JSON output.
- Created PolicySchemaExporter to generate JSON schemas for policy-related models.
- Developed PolicySimulationSmoke tool to validate policy simulations against expected outcomes.
- Added project files and necessary dependencies for each tool.
- Ensured proper error handling and usage instructions across tools.
2025-10-27 08:00:11 +02:00
parent 651b8e0fa3
commit 96d52884e8
712 changed files with 49449 additions and 6124 deletions

294
docs/policy/dsl.md Normal file

@@ -0,0 +1,294 @@
# Stella Policy DSL (`stella-dsl@1`)
> **Audience:** Policy authors, reviewers, and tooling engineers building lint/compile flows for the Policy Engine v2 rollout (Sprint 20).
This document specifies the `stella-dsl@1` grammar, semantics, and guardrails used by StellaOps to transform SBOM facts, Concelier advisories, and Excititor VEX statements into effective findings. Use it with the [Policy Engine Overview](overview.md) for architectural context and the upcoming lifecycle/run guides for operational workflows.
---
## 1·Design Goals
- **Deterministic:** Same policy + same inputs ⇒ identical findings on every machine.
- **Declarative:** No arbitrary loops, network calls, or clock access.
- **Explainable:** Every decision records the rule, inputs, and rationale in the explain trace.
- **Lean authoring:** Common precedence, severity, and suppression patterns are first-class.
- **Offline-friendly:** Grammar and built-ins avoid cloud dependencies and run the same in sealed deployments.
---
## 2·Document Structure
Policy packs ship one or more `.stella` files. Each file contains exactly one `policy` block:
```dsl
policy "Default Org Policy" syntax "stella-dsl@1" {
  metadata {
    description = "Baseline severity + VEX precedence"
    tags = ["baseline","vex"]
  }
  profile severity {
    map vendor_weight {
      source "GHSA" => +0.5
      source "OSV" => +0.0
      source "VendorX" => -0.2
    }
    env exposure_adjustments {
      if env.runtime == "serverless" then -0.5
      if env.exposure == "internal-only" then -1.0
    }
  }
  rule vex_precedence priority 10 {
    when vex.any(status in ["not_affected","fixed"])
      and vex.justification in ["component_not_present","vulnerable_code_not_present"]
    then status := vex.status
    because "Strong vendor justification prevails";
  }
}
```
High-level layout:
| Section | Purpose |
|---------|---------|
| `metadata` | Optional descriptive fields surfaced in Console/CLI. |
| `imports` | Reserved for future reuse (not yet implemented in `@1`). |
| `profile` blocks | Declarative scoring modifiers (`severity`, `trust`, `reachability`). |
| `rule` blocks | When/then logic applied to each `(component, advisory, vex[])` tuple. |
| `settings` | Optional evaluation toggles (sampling, default status overrides). |
---
## 3·Lexical Rules
- **Case sensitivity:** Keywords are lowercase; identifiers are case-sensitive.
- **Whitespace:** Space, tab, newline act as separators. Indentation is cosmetic.
- **Comments:** `// inline` and `/* block */` are ignored.
- **Literals:**
  - Strings use double quotes (`"text"`); escape with `\"`, `\n`, `\t`.
  - Numbers are decimal; suffix `%` allowed for percentage weights (`-2.5%` becomes `-0.025`).
  - Booleans: `true`, `false`.
  - Lists: `[1, 2, 3]`, `["a","b"]`.
- **Identifiers:** Start with letter or underscore, continue with letters, digits, `_`.
- **Operators:** `=`, `==`, `!=`, `<`, `<=`, `>`, `>=`, `in`, `not in`, `and`, `or`, `not`, `:=`.
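The percentage rule for number literals can be illustrated with a short Python sketch. It is not the StellaOps lexer: the function name `parse_number` and the use of `Decimal` are assumptions made for illustration, chosen so that `-2.5%` maps exactly to `-0.025` without binary floating-point noise.

```python
from decimal import Decimal


def parse_number(literal: str) -> Decimal:
    """Parse a decimal literal; a trailing '%' divides the value by 100.

    Hypothetical helper: the real lexer may represent numbers differently.
    """
    text = literal.strip()
    if text.endswith("%"):
        return Decimal(text[:-1]) / Decimal(100)
    return Decimal(text)


print(parse_number("-2.5%"))  # -0.025
print(parse_number("+0.5"))   # 0.5
```

Using an exact decimal type keeps the conversion reproducible across machines, which matches the determinism goal stated in §1.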
---
## 4·Grammar (EBNF)
```ebnf
policy = "policy", string, "syntax", string, "{", policy-body, "}" ;
policy-body = { metadata | profile | settings | rule | helper } ;
metadata = "metadata", "{", { meta-entry }, "}" ;
meta-entry = identifier, "=", (string | list) ;
profile = "profile", identifier, "{", { profile-item }, "}" ;
profile-item= map | env-map | scalar ;
map = "map", identifier, "{", { "source", string, "=>", number, ";" }, "}" ;
env-map = "env", identifier, "{", { "if", expression, "then", number, ";" }, "}" ;
scalar = identifier, "=", (number | string | list), ";" ;
settings = "settings", "{", { setting-entry }, "}" ;
setting-entry = identifier, "=", (number | string | boolean), ";" ;
rule = "rule", identifier, [ "priority", integer ], "{",
         "when", predicate,
         { "and", predicate },
         "then", { action },
         [ "else", { action } ],
         [ "because", string ],
       "}" ;
predicate = expression ;
expression = term, { ("and" | "or"), term } ;
term = ["not"], factor ;
factor = comparison | membership | function-call | literal | identifier | "(", expression, ")" ;
comparison = value, comparator, value ;
membership = value, ("in" | "not in"), list ;
value = identifier | literal | function-call | field-access ;
field-access= identifier, { ".", identifier | "[", literal, "]" } ;
function-call = identifier, "(", [ arg-list ], ")" ;
arg-list = expression, { ",", expression } ;
literal = string | number | boolean | list ;
action = assignment | ignore | escalate | require | warn | defer | annotate ;
assignment = target, ":=", expression, ";" ;
target = identifier, { ".", identifier } ;
ignore = "ignore", [ "until", expression ], [ "because", string ], ";" ;
escalate = "escalate", [ "to", expression ], [ "when", expression ], ";" ;
require = "requireVex", "{", require-fields, "}", ";" ;
warn = "warn", [ "message", string ], ";" ;
defer = "defer", [ "until", expression ], ";" ;
annotate = "annotate", identifier, ":=", expression, ";" ;
```
Notes:
- `helper` is reserved for shared calculations (not yet implemented in `@1`).
- The `else` branch executes only if the `when` predicates evaluate falsy **and** no rule earlier in priority has already handled the tuple.
- Semicolons inside rule bodies are optional when each clause is on its own line; the compiler emits canonical semicolons in IR.
---
## 5·Evaluation Context
Within predicates and actions you may reference the following namespaces:
| Namespace | Fields | Description |
|-----------|--------|-------------|
| `sbom` | `purl`, `name`, `version`, `licenses`, `layerDigest`, `tags`, `usedByEntrypoint` | Component metadata from Scanner. |
| `advisory` | `id`, `source`, `aliases`, `severity`, `cvss`, `publishedAt`, `modifiedAt`, `content.raw` | Canonical Concelier advisory view. |
| `vex` | `status`, `justification`, `statementId`, `timestamp`, `scope` | Current VEX statement when iterating; aggregator helpers available. |
| `vex` (aggregates) | `vex.any(...)`, `vex.all(...)`, `vex.count(...)` | Functions operating over all matching statements. |
| `run` | `policyId`, `policyVersion`, `tenant`, `timestamp` | Metadata for explain annotations. |
| `env` | Arbitrary keys (e.g., `environment`, `runtime`) | Key/value pairs injected per run. |
| `telemetry` | Optional reachability signals | Missing fields evaluate to `unknown`. |
| `profile.<name>` | Maps, scalars | Values computed inside profile blocks. |
Missing fields evaluate to `null`, which is falsey in boolean context and propagates through comparisons unless explicitly checked.
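The null rule above can be sketched in Python, with `None` standing in for the DSL's `null`. The helper names `compare_gte` and `is_truthy` are hypothetical and only illustrate the described behaviour: a comparison with a missing operand yields null, and null counts as false once a predicate is evaluated.

```python
from typing import Optional


def compare_gte(left: Optional[float], right: Optional[float]) -> Optional[bool]:
    """Return None (null) if either operand is missing, else left >= right."""
    if left is None or right is None:
        return None
    return left >= right


def is_truthy(value: Optional[bool]) -> bool:
    """Null is falsy in boolean context, so a null predicate does not match."""
    return bool(value)


advisory = {"cvss": None}  # field missing from the advisory
result = compare_gte(advisory.get("cvss"), 7.0)
print(result, is_truthy(result))  # None False
```

Because the null silently turns into "no match", a rule that depends on an optional field should check for its presence explicitly, as the `exists(...)` built-in in §6 allows.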
---
## 6·Built-ins (v1)
| Function / Property | Signature | Description |
|---------------------|-----------|-------------|
| `normalize_cvss(advisory)` | `Advisory → SeverityScalar` | Parses `advisory.content.raw` for CVSS data; falls back to policy maps. |
| `cvss(score, vector)` | `double × string → SeverityScalar` | Constructs a severity object manually. |
| `severity_band(value)` | `string → SeverityBand` | Normalises strings like `"critical"`, `"medium"`. |
| `risk_score(base, modifiers...)` | Variadic | Multiplies numeric modifiers (severity × trust × reachability). |
| `vex.any(predicate)` | `(Statement → bool) → bool` | `true` if any statement satisfies predicate. |
| `vex.all(predicate)` | `(Statement → bool) → bool` | `true` if all statements satisfy predicate. |
| `vex.latest()` | `→ Statement` | Lexicographically newest statement. |
| `advisory.has_tag(tag)` | `string → bool` | Checks advisory metadata tags. |
| `advisory.matches(pattern)` | `string → bool` | Glob match against advisory identifiers. |
| `sbom.has_tag(tag)` | `string → bool` | Uses SBOM inventory tags (usage vs inventory). |
| `exists(expression)` | `→ bool` | `true` when the value is neither null nor empty. |
| `coalesce(a, b, ...)` | `→ value` | First non-null argument. |
| `days_between(dateA, dateB)` | `→ int` | Absolute day difference (UTC). |
| `percent_of(part, whole)` | `→ double` | Fractions for scoring adjustments. |
| `lowercase(text)` | `string → string` | Normalises casing deterministically (InvariantCulture). |
All built-ins are pure; if inputs are null the result is null unless otherwise noted.
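A few of the built-ins can be approximated in Python to show the "pure, null in means null out" convention. This is a sketch under assumptions rather than the engine's implementation: the empty-value rule in `exists`, the truncation to whole days in `days_between`, and the requirement for timezone-aware inputs are all choices made here for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def coalesce(*values: Any) -> Any:
    """Return the first argument that is not None, or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None


def exists(value: Any) -> bool:
    """True when the value is neither None nor an empty string/collection."""
    if value is None:
        return False
    if isinstance(value, (str, list, tuple, dict, set)):
        return len(value) > 0
    return True


def days_between(a: Optional[datetime], b: Optional[datetime]) -> Optional[int]:
    """Absolute whole-day difference; None if either input is None.

    Both datetimes must be timezone-aware so the subtraction is done in UTC.
    """
    if a is None or b is None:
        return None
    return abs(a - b).days


published = datetime(2025, 10, 20, 12, 0, tzinfo=timezone.utc)
run_time = datetime(2025, 10, 26, 0, 0, tzinfo=timezone.utc)
print(days_between(run_time, published))  # 5
print(coalesce(None, "fallback"))         # fallback
```

None of these helpers reads a clock or any external state, which is what makes them safe to use in deterministic evaluation.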
---
## 7·Rule Semantics
1. **Ordering:** Rules execute in ascending `priority`. When priorities tie, lexical order defines precedence.
2. **Short-circuit:** Once a rule sets `status`, subsequent rules only execute if they use `combine`. Use this sparingly to avoid ambiguity.
3. **Actions:**
   - `status := <string>`: Allowed values are `affected`, `not_affected`, `fixed`, `suppressed`, `under_investigation`, `escalated`.
   - `severity := <SeverityScalar>`: Either from `normalize_cvss`, `cvss`, or a numeric map; ensures `normalized` and `score` are populated.
   - `ignore until <ISO-8601>`: Temporarily treats the finding as suppressed until the timestamp; recorded in the explain trace.
   - `warn message "<text>"`: Adds a warn verdict and deducts `warnPenalty`.
   - `escalate to severity_band("critical") when condition`: Forces verdict severity upward when the condition is true.
   - `requireVex { vendors = ["VendorX"], justifications = ["component_not_present"] }`: Fails evaluation if matching VEX evidence is absent.
   - `annotate reason := "text"`: Adds free-form key/value pairs to the explain payload.
4. **Because clause:** Mandatory for actions changing status or severity; captured verbatim in explain traces.
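The ordering and short-circuit rules can be sketched as follows. This is an illustrative model, not the evaluator: the `Rule` dataclass and `evaluate` function are hypothetical, "lexical order" is assumed to mean declaration order in the file, every rule is given an explicit priority, and `combine` is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int
    matches: Callable[[dict], bool]  # stands in for the `when` predicates
    status: str                      # value assigned by `status := ...`
    because: str                     # rationale captured in the explain trace


def evaluate(rules: list[Rule], facts: dict) -> Optional[tuple[str, str, str]]:
    """Return (status, rule name, rationale) from the first matching rule."""
    # Ascending priority; ties fall back to declaration order (assumed).
    ordered = sorted(enumerate(rules), key=lambda pair: (pair[1].priority, pair[0]))
    for _, rule in ordered:
        if rule.matches(facts):
            return rule.status, rule.name, rule.because  # short-circuit
    return None


rules = [
    Rule("default_affected", 50, lambda f: True, "affected", "Fallback"),
    Rule("vex_precedence", 10,
         lambda f: f.get("vex_status") == "not_affected",
         "not_affected", "Strong vendor justification prevails"),
]
print(evaluate(rules, {"vex_status": "not_affected"}))
# ('not_affected', 'vex_precedence', 'Strong vendor justification prevails')
```

Returning the rule name and rationale together with the status mirrors the requirement that every status change carries a `because` clause into the explain trace.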
---
## 8·Scoping Helpers
- **Maps:** Use `profile severity { map vendor_weight { ... } }` to declare additive factors. Retrieve with `profile.severity.vendor_weight["GHSA"]`.
- **Environment overrides:** `env` profiles allow conditional adjustments based on runtime metadata.
- **Tenancy:** `run.tenant` ensures policies remain tenant-aware; avoid hardcoding single-tenant IDs.
- **Default values:** Use `settings { default_status = "affected"; }` to override built-in defaults.
---
## 9·Examples
### 9.1 Baseline Severity Normalisation
```dsl
rule advisory_normalization {
  when advisory.source in ["GHSA","OSV"]
  then severity := normalize_cvss(advisory)
  because "Align vendor severity to CVSS baseline";
}
```
### 9.2 VEX Override with Quiet Mode
```dsl
rule vex_strong_claim priority 5 {
  when vex.any(status == "not_affected")
    and vex.justification in ["component_not_present","vulnerable_code_not_present"]
  then status := vex.status
  annotate winning_statement := vex.latest().statementId
  warn message "VEX override applied"
  because "Strong VEX justification";
}
```
### 9.3 Environment-Specific Escalation
```dsl
rule internet_exposed_guard {
  when env.exposure == "internet"
    and severity.normalized >= "High"
  then escalate to severity_band("Critical")
  because "Internet-exposed assets require critical posture";
}
```
### 9.4 Anti-pattern (flagged by linter)
```dsl
rule catch_all {
  when true
  then status := "suppressed"
  because "Suppress everything" // ❌ Fails lint: unbounded suppression
}
```
---
## 10·Validation & Tooling
- `stella policy lint` ensures:
  - Grammar compliance and canonical formatting.
  - Static determinism guard (no forbidden namespaces).
  - Anti-pattern detection (e.g., unconditional suppression, missing `because`).
- `stella policy compile` emits IR (`.stella.ir.json`) and SHA-256 digest used in `policy_runs`.
- CI pipelines (see `DEVOPS-POLICY-20-001`) compile sample packs and fail on lint violations.
- Simulation harnesses (`stella policy simulate`) highlight provided/queried fields so policy authors affirm assumptions before promotion.
---
## 11·Anti-patterns & Mitigations
| Anti-pattern | Risk | Mitigation |
|--------------|------|------------|
| Catch-all suppress/ignore without scope | Masks all findings | Linter blocks rules with `when true` unless `priority` > 1000 and justification includes remediation plan. |
| Comparing strings with inconsistent casing | Missed matches | Wrap comparisons in `lowercase(value)` to align casing or normalise metadata during ingest. |
| Referencing `telemetry` without fallback | Null propagation | Wrap access in `exists(telemetry.reachability)`. |
| Hardcoding tenant IDs | Breaks multi-tenancy | Prefer `env.tenantTag` or metadata-sourced predicates. |
| Duplicated rule names | Explain trace ambiguity | Compiler enforces unique `rule` identifiers within a policy. |
---
## 12·Versioning & Compatibility
- `syntax "stella-dsl@1"` is mandatory.
- Future revisions (`@2`, …) will be additive; existing packs continue to compile with their declared version.
- The compiler canonicalises documents (sorted keys, normalised whitespace) before hashing to ensure reproducibility.
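The canonicalise-then-hash idea can be shown with a minimal Python sketch. The exact canonical form used by the compiler is not specified here, so the JSON settings below (sorted keys, compact separators, UTF-8) and the function name `canonical_digest` are assumptions for illustration only.

```python
import hashlib
import json


def canonical_digest(document: dict) -> str:
    """SHA-256 over a canonical JSON rendering (sorted keys, no extra whitespace)."""
    canonical = json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order: the digests must match.
a = {"syntax": "stella-dsl@1", "name": "Default Org Policy"}
b = {"name": "Default Org Policy", "syntax": "stella-dsl@1"}
print(canonical_digest(a) == canonical_digest(b))  # True
```

Hashing the canonical form rather than the source text means cosmetic edits such as re-indentation or reordered keys do not change the digest.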
---
## 13·Compliance Checklist
- [ ] **Grammar validated:** Policy compiles with `stella policy lint` and matches `syntax "stella-dsl@1"`.
- [ ] **Deterministic constructs only:** No use of forbidden namespaces (`DateTime.Now`, `Guid.NewGuid`, external services).
- [ ] **Rationales present:** Every status/severity change includes a `because` clause or `annotate` entry.
- [ ] **Scoped suppressions:** Rules that ignore/suppress findings reference explicit components, vendors, or VEX justifications.
- [ ] **Explain fields verified:** `annotate` keys align with Console/CLI expectations (documented in upcoming lifecycle guide).
- [ ] **Offline parity tested:** Policy pack simulated in sealed mode (`--sealed`) to confirm absence of network dependencies.
---
*Last updated: 2025-10-26 (Sprint 20).*

239
docs/policy/lifecycle.md Normal file

@@ -0,0 +1,239 @@
# Policy Lifecycle & Approvals
> **Audience:** Policy authors, reviewers, security approvers, release engineers.
> **Scope:** End-to-end flow for `stella-dsl@1` policies from draft through archival, including CLI/Console touch-points, Authority scopes, audit artefacts, and offline considerations.
This guide explains how a policy progresses through StellaOps, which roles are involved, and the artefacts produced at every step. Pair it with the [Policy Engine Overview](overview.md), [DSL reference](dsl.md), and upcoming run documentation to ensure consistent authoring and rollout.
---
## 1·Protocol Summary
- Policies are **immutable versions** attached to a stable `policy_id`.
- Lifecycle states: `draft → submitted → approved → active → archived`.
- Every transition requires explicit Authority scopes and produces structured events + storage artefacts (`policies`, `policy_runs`, audit log collections).
- Simulation and CI gating happen **before** approvals can be granted.
- Activation triggers (runs, bundle exports, CLI `promote`) operate on the **latest approved** version per tenant.
```mermaid
stateDiagram-v2
[*] --> Draft
Draft --> Draft: edit/save (policy:write)
Draft --> Submitted: submit(reviewers) (policy:submit)
Submitted --> Draft: requestChanges (policy:write)
Submitted --> Approved: approve (policy:approve)
Approved --> Active: activate/run (policy:run)
Active --> Archived: archive (policy:archive)
Approved --> Archived: superseded/explicit archive
Archived --> [*]
```
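The state diagram can be read as a transition table keyed by `(from, to)` with the scope each edge requires. The Python sketch below is hypothetical (names such as `TRANSITIONS` and `transition` are not part of StellaOps); the scope for `approved → archived` is an assumption taken from §3.6, since the diagram does not name one.

```python
TRANSITIONS = {
    ("draft", "draft"): "policy:write",
    ("draft", "submitted"): "policy:submit",
    ("submitted", "draft"): "policy:write",
    ("submitted", "approved"): "policy:approve",
    ("approved", "active"): "policy:run",
    ("active", "archived"): "policy:archive",
    ("approved", "archived"): "policy:archive",  # assumed; diagram names no scope
}


class TransitionError(Exception):
    """Raised when a transition is unknown or the caller lacks the scope."""


def transition(current: str, target: str, scopes: set[str]) -> str:
    """Return the new state, or raise if the edge or the scope is missing."""
    required = TRANSITIONS.get((current, target))
    if required is None:
        raise TransitionError(f"{current} -> {target} is not a lifecycle transition")
    if required not in scopes:
        raise TransitionError(f"{current} -> {target} requires scope {required}")
    return target


print(transition("draft", "submitted", {"policy:write", "policy:submit"}))  # submitted
```

Keeping the edges in one table makes it easy to verify that no path leads from `draft` to `active` without passing through `approved`.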
---
## 2·Roles & Authority Scopes
| Role (suggested) | Required scopes | Responsibilities |
|------------------|-----------------|------------------|
| **Policy Author** | `policy:write`, `policy:submit`, `policy:simulate` | Draft DSL, run local/CI simulations, submit for review. |
| **Policy Reviewer** | `policy:review`, `policy:simulate`, `policy:runs` | Comment on submissions, demand additional simulations, request changes. |
| **Policy Approver** | `policy:approve`, `policy:runs`, `policy:audit` | Grant final approval, ensure sign-off evidence captured. |
| **Policy Operator** | `policy:run`, `policy:activate`, `findings:read` | Trigger full/incremental runs, monitor results, roll back to previous version. |
| **Policy Auditor** | `policy:audit`, `findings:read`, `policy:history` | Review past versions, verify attestations, respond to compliance requests. |
| **Policy Engine Service** | `effective:write`, `findings:read` | Materialise effective findings during runs; no approval capabilities. |
> Scopes are issued by Authority (`AUTH-POLICY-20-001`). Tenants may map organisational roles (e.g., `secops.approver`) to these scopes via issuer policy.
---
## 3·Lifecycle Stages in Detail
### 3.1 Draft
- **Who:** Authors (`policy:write`).
- **Tools:** Console editor, `stella policy edit`, policy DSL files.
- **Actions:**
  - Author DSL leveraging [stella-dsl@1](dsl.md).
  - Run `stella policy lint` and `stella policy simulate --sbom <fixtures>` locally.
  - Attach rationale metadata (`metadata.description`, tags).
- **Artefacts:**
  - `policies` document with `status=draft`, `version=n`, `provenance.created_by`.
  - Local IR cache (`.stella.ir.json`) generated by CLI compile.
- **Guards:**
  - Draft versions never run in production.
  - CI must lint drafts before allowing submission PRs (see `DEVOPS-POLICY-20-001`).
### 3.2 Submission
- **Who:** Authors with `policy:submit`.
- **Tools:** Console “Submit for review” button, `stella policy submit <policyId> --reviewers ...`.
- **Actions:**
  - Provide review notes and required simulations (CLI uploads attachments).
  - Choose reviewer groups; Authority records them in submission metadata.
- **Artefacts:**
  - Policy document transitions to `status=submitted`, capturing `submitted_by`, `submitted_at`, reviewer list, simulation digest references.
  - Audit event `policy.submitted` (Authority timeline / Notifier integration).
- **Guards:**
  - Submission is blocked unless the latest lint and compile both succeeded within the last 24 h.
  - Must reference at least one simulation artefact (CLI enforces via `--attach`).
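The two submission guards amount to a freshness check plus a non-empty attachment check. The Python sketch below is hypothetical: the function name, its parameters, and the message strings are invented for illustration, and `now` is passed in explicitly so the check itself never reads the clock.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(hours=24)


def submission_blockers(
    lint_passed_at: Optional[datetime],
    compile_passed_at: Optional[datetime],
    simulation_attachments: list[str],
    now: datetime,
) -> list[str]:
    """Return the reasons a submission must be rejected (empty list = allowed)."""
    blockers = []
    for label, passed_at in (("lint", lint_passed_at), ("compile", compile_passed_at)):
        if passed_at is None:
            blockers.append(f"{label} has not succeeded")
        elif now - passed_at >= MAX_AGE:
            blockers.append(f"{label} result is older than 24h")
    if not simulation_attachments:
        blockers.append("no simulation artefact attached")
    return blockers


now = datetime(2025, 10, 27, 8, 0, tzinfo=timezone.utc)
print(submission_blockers(now - timedelta(hours=2), now - timedelta(hours=30), [], now))
# ['compile result is older than 24h', 'no simulation artefact attached']
```

Returning every blocker at once, rather than failing on the first, lets an author fix all problems in a single pass.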
### 3.3 Review (Submitted)
- **Who:** Reviewers (`policy:review`), optionally authors responding.
- **Tools:** Console review pane (line comments, overall verdict), `stella policy review`.
- **Actions:**
  - Inspect DSL diff vs previous approved version.
  - Run additional `simulate` jobs (UI button or CLI).
  - Request changes: the policy returns to `draft` with the comment log.
- **Artefacts:**
  - Comments stored in `policy_reviews` collection with timestamps, resolved flag.
  - Additional simulation run records appended to submission metadata.
- **Guards:**
  - Approval cannot proceed until all blocking comments are resolved.
  - Required reviewers (Authority rule) must vote before the approver sees the Approve button.
### 3.4 Approval
- **Who:** Approvers (`policy:approve`).
- **Tools:** Console “Approve”, CLI `stella policy approve <id> --version n --note "rationale"`.
- **Actions:**
  - Confirm compliance checks (see §6) all green.
  - Provide approval note (mandatory string captured in audit trail).
- **Artefacts:**
  - Policy `status=approved`, `approved_by`, `approved_at`, `approval_note`.
  - Audit event `policy.approved` plus optional Notifier broadcast.
  - Immutable approval record stored in `policy_history`.
- **Guards:**
  - Approver cannot be same identity as author (enforced by Authority config).
  - Approver must attest to successful simulation diff review (`--attach diff.json`).
### 3.5 Activation & Runs
- **Who:** Operators (`policy:run`, `policy:activate`).
- **Tools:** Console “Promote to active”, CLI `stella policy activate <id> --version n`, `stella policy run`.
- **Actions:**
  - Mark approved version as the tenant’s active policy.
  - Trigger full run or rely on orchestrator for incremental runs.
  - Monitor results via Console dashboards or CLI run logs.
- **Artefacts:**
  - `policy_runs` entries with `mode=full|incremental`, `policy_version=n`.
  - Effective findings collections updated; explain traces stored.
  - Activation event `policy.activated` with `runId`.
- **Guards:**
  - Activation is blocked if the previous full run (less than 24 h old) failed or is still pending.
  - Selection of SBOM/advisory snapshots uses consistent cursors recorded for reproducibility.
### 3.6 Archival / Rollback
- **Who:** Approvers or Operators with `policy:archive`.
- **Tools:** Console menu, CLI `stella policy archive <id> --version n --reason`.
- **Actions:**
  - Retire policies superseded by newer versions or revert to older approved version (`stella policy activate <id> --version n-1`).
  - Export archived version for audit bundles (Offline Kit integration).
- **Artefacts:**
  - Policy `status=archived`, `archived_by`, `archived_at`, reason.
  - Audit event `policy.archived`.
  - Exported DSSE-signed policy pack stored if requested.
- **Guards:**
  - Archival cannot proceed while runs using that version are in-flight.
  - Rollback requires documented incident reference.
---
## 4·Tooling Touchpoints
| Stage | Console | CLI | API |
|-------|---------|-----|-----|
| Draft | Inline linting, simulation panel | `stella policy lint`, `edit`, `simulate` | `POST /policies`, `PUT /policies/{id}/versions/{v}` |
| Submit | Submit modal (attach simulations) | `stella policy submit` | `POST /policies/{id}/submit` |
| Review | Comment threads, diff viewer | `stella policy review --approve/--request-changes` | `POST /policies/{id}/reviews` |
| Approve | Approve dialog | `stella policy approve` | `POST /policies/{id}/approve` |
| Activate | Promote button, run scheduler | `stella policy activate`, `run`, `simulate` | `POST /policies/{id}/run`, `POST /policies/{id}/activate` |
| Archive | Archive / rollback menu | `stella policy archive` | `POST /policies/{id}/archive` |
All CLI commands emit structured JSON by default; use `--format table` for human review.
---
## 5·Audit & Observability
- **Storage:**
  - `policies` retains all versions with provenance metadata.
  - `policy_reviews` stores reviewer comments, timestamps, attachments.
  - `policy_history` summarises transitions (state, actor, note, diff digest).
  - `policy_runs` retains input cursors and determinism hash per run.
- **Events:**
  - `policy.submitted`, `policy.review.requested`, `policy.approved`, `policy.activated`, `policy.archived`, `policy.rollback`.
  - Routed to Notifier + Timeline Indexer; offline deployments log to local event store.
- **Logs & metrics:**
  - Policy Engine logs include `policyId`, `policyVersion`, `runId`, `approvalNote`.
  - Observability dashboards (see forthcoming `/docs/observability/policy.md`) highlight pending approvals, run SLA, VEX overrides.
- **Reproducibility:**
  - Each state transition stores IR checksum and simulation diff digests, enabling offline audit replay.
---
## 6·Compliance Gates
| Gate | Stage | Enforced by | Requirement |
|------|-------|-------------|-------------|
| **DSL lint** | Draft → Submit | CLI/CI | `stella policy lint` successful within 24h. |
| **Simulation evidence** | Submit | CLI/Console | Attach diff from `stella policy simulate` covering baseline SBOM set. |
| **Reviewer quorum** | Submit → Approve | Authority | Minimum approver/reviewer count configurable per tenant. |
| **Determinism CI** | Approve | DevOps job | Twin run diff passes (`DEVOPS-POLICY-20-003`). |
| **Activation health** | Approve → Activate | Policy Engine | Last run status succeeded; orchestrator queue healthy. |
| **Export validation** | Archive | Offline Kit | DSSE-signed policy pack generated for long-term retention. |
Failure of any gate emits a `policy.lifecycle.violation` event and blocks transition until resolved.
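Gate enforcement can be pictured as running a list of checks and emitting one event per failure. In this Python sketch only the event name `policy.lifecycle.violation` comes from the text above; the `Gate` type, the `run_gates` function, the context keys, and the event payload fields are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Gate:
    name: str
    check: Callable[[dict], bool]  # True means the gate passes


def run_gates(gates: list[Gate], context: dict) -> list[dict]:
    """Return one violation event per failed gate; an empty list allows the transition."""
    return [
        {
            "event": "policy.lifecycle.violation",
            "gate": gate.name,
            "policyId": context.get("policyId"),
        }
        for gate in gates
        if not gate.check(context)
    ]


gates = [
    Gate("DSL lint", lambda c: c.get("lint_ok", False)),
    Gate("Simulation evidence", lambda c: bool(c.get("simulation_diffs"))),
]
violations = run_gates(gates, {"policyId": "P-1", "lint_ok": True, "simulation_diffs": []})
print([v["gate"] for v in violations])  # ['Simulation evidence']
```

The transition proceeds only when the returned list is empty; otherwise each event names the gate that has to be resolved first.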
---
## 7·Offline / Air-Gap Considerations
- Offline Kit bundles include:
  - Approved policy packs (`.policy.bundle` + DSSE signatures).
  - Submission/approval audit logs.
  - Simulation diff JSON for reproducibility.
- Air-gapped sites operate with the same lifecycle:
  - Approvals happen locally; Authority runs in enclave.
  - Rollout requires manual import of policy packs from connected environment via signed bundles.
- `stella policy simulate --sealed` ensures no outbound calls; required before approval in sealed mode.
---
## 8·Incident Response & Rollback
- Incident mode (triggered via `policy incident activate`) forces:
  - Immediate incremental run to evaluate mitigation policies.
  - Expanded trace retention for affected runs.
  - Automatic snapshot of currently active policies for evidence locker.
- Rollback path:
  1. `stella policy activate <id> --version <previous>` with incident note.
  2. Orchestrator schedules full run to ensure findings align.
  3. Archive problematic version with reason referencing incident ticket.
- Post-incident review must confirm new version passes gates before re-activation.
---
## 9·CI/CD Integration (Reference)
- **Pre-merge:** run lint + simulation jobs against golden SBOM fixtures.
- **Post-merge (main):** compile, compute IR checksum, stage for Offline Kit.
- **Nightly:** determinism replay, `policy simulate` diff drift alerts, backlog of pending approvals.
- **Notifications:** Slack/Email via Notifier when submissions have awaited review longer than the SLA or when approvals succeed.
---
## 10·Compliance Checklist
- [ ] **Role mapping validated:** Authority issuer config maps organisational roles to required `policy:*` scopes (per tenant).
- [ ] **Submission evidence attached:** Latest simulation diff and lint artefacts linked to submission.
- [ ] **Reviewer quorum met:** All required reviewers approved or acknowledged; no unresolved blocking comments.
- [ ] **Approval note logged:** Approver justification recorded in audit trail alongside IR checksum.
- [ ] **Activation guard passed:** Latest run status success, orchestrator queue healthy, determinism job green.
- [ ] **Archive bundles produced:** When archiving, DSSE-signed policy pack exported and stored for offline retention.
- [ ] **Offline parity proven:** For sealed deployments, `--sealed` simulations executed and logged before approval.
---
*Last updated: 2025-10-26 (Sprint 20).*

173
docs/policy/overview.md Normal file

@@ -0,0 +1,173 @@
# Policy Engine Overview
> **Goal:** Evaluate organisation policies deterministically against scanner SBOMs, Concelier advisories, and Excititor VEX evidence, then publish effective findings that downstream services can trust.
This document introduces the v2 Policy Engine: how the service fits into StellaOps, the artefacts it produces, the contracts it honours, and the guardrails that keep policy decisions reproducible across air-gapped and connected deployments.
---
## 1·Role in the Platform
- **Purpose:** Compose policy verdicts by reconciling SBOM inventory, advisory metadata, VEX statements, and organisation rules.
- **Form factor:** Dedicated `.NET 10` Minimal API host (`StellaOps.Policy.Engine`) plus worker orchestration. Policies are defined in `stella-dsl@1` packs compiled to an intermediate representation (IR) with a stable SHA-256 digest.
- **Tenancy:** All workloads run under Authority-enforced scopes (`policy:*`, `findings:read`, `effective:write`). Only the Policy Engine identity may materialise effective findings collections.
- **Consumption:** Findings ledger, Console, CLI, and Notify read the published `effective_finding_{policyId}` materialisations and policy run ledger (`policy_runs`).
- **Offline parity:** Bundled policies import/export alongside advisories and VEX. In sealed mode the engine degrades gracefully, annotating explanations whenever cached signals replace live lookups.
---
## 2·High-Level Architecture
```mermaid
flowchart LR
subgraph Inputs
A[Scanner SBOMs<br/>Inventory & Usage]
B[Concelier Advisories<br/>Canonical linksets]
C[Excititor VEX<br/>Consensus status]
D[Policy Packs<br/>stella-dsl@1]
end
subgraph PolicyEngine["StellaOps.Policy.Engine"]
P1[DSL Compiler<br/>IR + Digest]
P2[Joiners<br/>SBOM ↔ Advisory ↔ VEX]
P3[Deterministic Evaluator<br/>Rule hits + scoring]
P4[Materialisers<br/>effective findings]
P5[Run Orchestrator<br/>Full & incremental]
end
subgraph Outputs
O1[Effective Findings Collections]
O2[Explain Traces<br/>Rule hit lineage]
O3[Metrics & Traces<br/>policy_run_seconds,<br/>rules_fired_total]
O4[Simulation/Preview Feeds<br/>CLI & Studio]
end
A --> P2
B --> P2
C --> P2
D --> P1 --> P3
P2 --> P3 --> P4 --> O1
P3 --> O2
P5 --> P3
P3 --> O3
P3 --> O4
```
---
## 3·Core Concepts
| Concept | Description |
|---------|-------------|
| **Policy Pack** | Versioned bundle of DSL documents, metadata, and checksum manifest. Packs import/export via CLI and Offline Kit bundles. |
| **Policy Digest** | SHA-256 of the canonical IR; used for caching, explain trace attribution, and audit proofs. |
| **Effective Findings** | Append-only Mongo collections (`effective_finding_{policyId}`) storing the latest verdict per finding, plus history sidecars. |
| **Policy Run** | Execution record persisted in `policy_runs` capturing inputs, run mode, timings, and determinism hash. |
| **Explain Trace** | Structured tree showing rule matches, data provenance, and scoring components for UI/CLI explain features. |
| **Simulation** | Dry-run evaluation that compares a candidate pack against the active pack and produces verdict diffs without persisting results. |
| **Incident Mode** | Elevated sampling/trace capture toggled automatically when SLOs breach; emits events for Notifier and Timeline Indexer. |
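The "Simulation" concept, comparing a candidate pack's verdicts with the active pack's without persisting anything, can be sketched as a pure diff over two mappings. The finding identifiers, the `diff_verdicts` name, and the output shape are hypothetical; the real diff payload is not described in this document.

```python
def diff_verdicts(active: dict[str, str], candidate: dict[str, str]) -> dict[str, list]:
    """Compare verdicts keyed by finding ID; outputs are sorted for stable diffs."""
    added = sorted(key for key in candidate if key not in active)
    removed = sorted(key for key in active if key not in candidate)
    changed = sorted(
        (key, active[key], candidate[key])
        for key in active
        if key in candidate and active[key] != candidate[key]
    )
    return {"added": added, "removed": removed, "changed": changed}


active = {"f1": "affected", "f2": "affected"}
candidate = {"f1": "not_affected", "f3": "affected"}
print(diff_verdicts(active, candidate))
# {'added': ['f3'], 'removed': ['f2'], 'changed': [('f1', 'affected', 'not_affected')]}
```

The function only reads its inputs and returns a new structure, matching the "without persisting results" property of simulation runs.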
---
## 4·Inputs & Pre-processing
### 4.1 SBOM Inventory
- **Source:** Scanner.WebService publishes inventory/usage SBOMs plus BOM-Index (roaring bitmap) metadata.
- **Consumption:** Policy joiners use the index to expand candidate components quickly, keeping evaluation under the `<5s` warm path budget.
- **Schema:** CycloneDX Protobuf + JSON views; Policy Engine reads canonical projections via shared SBOM adapters.
### 4.2 Advisory Corpus
- **Source:** Concelier exports canonical advisories with deterministic identifiers, linksets, and equivalence tables.
- **Contract:** Policy Engine only consumes raw `content.raw`, `identifiers`, and `linkset` fields per Aggregation-Only Contract (AOC); derived precedence remains a policy concern.
### 4.3 VEX Evidence
- **Source:** Excititor consensus service resolves OpenVEX / CSAF statements, preserving conflicts.
- **Usage:** Policy rules can require specific VEX vendors or justification codes; evaluator records when cached evidence substitutes for live statements (sealed mode).
### 4.4 Policy Packs
- Authored in Policy Studio or CLI, validated against the `stella-dsl@1` schema.
- Compiler performs canonicalisation (ordering, defaulting) before emitting IR and digest.
- Packs bundle scoring profiles, allowlist metadata, and optional reachability weighting tables.
---
## 5·Evaluation Flow
1. **Run selection:** Orchestrator accepts `full`, `incremental`, or `simulate` jobs. Incremental runs listen to change streams from Concelier, Excititor, and SBOM imports to scope re-evaluation.
2. **Input staging:** Candidates fetched in deterministic batches; identity graph from Concelier strengthens PURL lookups.
3. **Rule execution:** Evaluator walks rules in lexical order (first-match wins). Actions available: `block`, `ignore`, `warn`, `defer`, `escalate`, `requireVex`, each supporting quieting semantics where permitted.
4. **Scoring:** `PolicyScoringConfig` applies severity, trust, reachability weights plus penalties (`warnPenalty`, `ignorePenalty`, `quietPenalty`).
5. **Verdict and explain:** Engine constructs `PolicyVerdict` records with inputs, quiet flags, unknown confidence bands, and provenance markers; explain trees capture rule lineage.
6. **Materialisation:** Effective findings collections are upserted append-only, stamped with run identifier, policy digest, and tenant.
7. **Publishing:** Completed run writes to `policy_runs`, emits metrics (`policy_run_seconds`, `rules_fired_total`, `vex_overrides_total`), and raises events for Console/Notify subscribers.
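The "deterministic batches" in the input-staging step can be illustrated by sorting candidates before chunking them, so every machine sees the same batch boundaries. This is a sketch under assumptions: the candidate shape `(purl, advisory_id)`, the de-duplication step, and the function name are hypothetical.

```python
from typing import Iterator

Candidate = tuple[str, str]  # (purl, advisory_id), an assumed shape


def deterministic_batches(candidates: list[Candidate], size: int) -> Iterator[list[Candidate]]:
    """Yield fixed-size batches in sorted order, independent of input order."""
    ordered = sorted(set(candidates))  # de-duplicate, then impose a total order
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]


incoming = [("pkg:b", "ADV-2"), ("pkg:a", "ADV-1"), ("pkg:a", "ADV-1"), ("pkg:c", "ADV-3")]
print(list(deterministic_batches(incoming, 2)))
# [[('pkg:a', 'ADV-1'), ('pkg:b', 'ADV-2')], [('pkg:c', 'ADV-3')]]
```

Because the order is derived from the data rather than from arrival time, replaying the same inputs produces identical batches.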
---
## 6·Run Modes
| Mode | Trigger | Scope | Persistence | Typical Use |
|------|---------|-------|-------------|-------------|
| **Full** | Manual CLI (`stella policy run`), scheduled nightly, or emergency rebaseline | Entire tenant | Writes effective findings and run record | After policy publish or major advisory/VEX import |
| **Incremental** | Change-stream queue driven by Concelier/Excititor/SBOM deltas | Only affected artefacts | Writes effective findings and run record | Continuous upkeep; ensures SLA ≤ 5 min from source change |
| **Simulate** | CLI/Studio preview, CI pipelines | Candidate subset (diff against baseline) | No materialisation; produces explain & diff payloads | Policy authoring, CI regression suites |
All modes are cancellation-aware and checkpoint progress for replay in case of deployment restarts.
---
## 7·Outputs & Integrations
- **APIs:** Minimal API exposes policy CRUD, run orchestration, explain fetches, and cursor-based listing of effective findings (see `/docs/api/policy.md` once published).
- **CLI:** `stella policy simulate/run/show` commands surface JSON verdicts, exit codes, and diff summaries suitable for CI gating.
- **Console / Policy Studio:** UI reads explain traces, policy metadata, approval workflow status, and simulation diffs to guide reviewers.
- **Findings Ledger:** Effective findings feed downstream export, Notify, and risk scoring jobs.
- **Air-gap bundles:** Offline Kit includes policy packs, scoring configs, and explain indexes; export commands generate DSSE-signed bundles for transfer.
---
## 8·Determinism & Guardrails
- **Deterministic inputs:** All joins rely on canonical linksets and equivalence tables; batches are sorted, and random/wall-clock APIs are blocked by static analysis plus runtime guards (`ERR_POL_004`).
- **Stable outputs:** Canonical JSON serializers sort keys; digests recorded in run metadata enable reproducible diffs across machines.
- **Idempotent writes:** Materialisers upsert using `{policyId, findingId, tenant}` keys and retain prior versions with append-only history.
- **Sandboxing:** Policy evaluation executes in-process with timeouts; restart-only plug-ins guarantee no runtime DLL injection.
- **Compliance proof:** Every run stores a digest of its inputs (policy, SBOM batch, advisory snapshot) so auditors can replay decisions offline.
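
To show why sorted-key serialisation yields stable digests, here is a small Python sketch. It uses the standard `json` and `hashlib` modules as stand-ins; the engine's own canonical serializer and digest format are not described in this document, so the `sha256:` prefix and the helper names are assumptions.

```python
import hashlib
import json


def canonical_json(value) -> str:
    # sort_keys fixes key order; compact separators remove whitespace variance.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def digest(value) -> str:
    # Assumed digest format: "sha256:" followed by the hex digest of the canonical form.
    return "sha256:" + hashlib.sha256(canonical_json(value).encode("utf-8")).hexdigest()


# Two logically identical records whose keys were inserted in different orders.
a = {"policyId": "P-7", "tenant": "default", "findingId": "F-1"}
b = {"findingId": "F-1", "tenant": "default", "policyId": "P-7"}

print(canonical_json(a))
print(digest(a) == digest(b))
```

Both records serialise to the same byte sequence, so their digests match regardless of the order in which the fields were produced.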
---
## 9·Security, Tenancy & Offline Notes
- **Authority scopes:** Gateway enforces `policy:read`, `policy:write`, `policy:simulate`, `policy:runs`, `findings:read`, `effective:write`. Service identities must present DPoP-bound tokens.
- **Tenant isolation:** Collections partition by tenant identifier; cross-tenant queries require explicit admin scopes and return audit warnings.
- **Sealed mode:** In air-gapped deployments the engine surfaces `sealed=true` hints in explain traces, warning about cached EPSS/KEV data and suggesting bundle refreshes (see `docs/airgap/EPIC_16_AIRGAP_MODE.md` §3.7).
- **Observability:** Structured logs carry correlation IDs matching orchestrator job IDs; metrics integrate with OpenTelemetry exporters; sampled rule-hit logs redact policy secrets.
- **Incident response:** Incident mode can be forced via API, boosting trace retention and notifying Notifier through `policy.incident.activated` events.
---
## 10·Working with Policy Packs
1. **Author** in Policy Studio or edit DSL files locally. Validate with `stella policy lint`.
2. **Simulate** against golden SBOM fixtures (`stella policy simulate --sbom fixtures/*.json`). Inspect explain traces for unexpected overrides.
3. **Publish** via API or CLI; Authority enforces review/approval workflows (`draft → review → approve → rollout`).
4. **Monitor** the subsequent incremental runs; if the determinism diff fails in CI, roll back the pack while investigating the digests.
5. **Bundle** packs for offline sites with `stella policy bundle export` and distribute via Offline Kit.
---
## 11·Compliance Checklist
- [ ] **Scopes enforced:** Confirm gateway policy requires `policy:*` and `effective:write` scopes for all mutating endpoints.
- [ ] **Determinism guard active:** Static analyzer blocks clock/RNG usage; CI determinism job diffing repeated runs passes.
- [ ] **Materialisation audit:** Effective findings collections use append-only writers and retain history per policy run.
- [ ] **Explain availability:** UI/CLI expose explain traces for every verdict; sealed-mode warnings display when cached evidence is used.
- [ ] **Offline parity:** Policy bundles (import/export) tested in sealed environment; air-gap degradations documented for operators.
- [ ] **Observability wired:** Metrics (`policy_run_seconds`, `rules_fired_total`, `vex_overrides_total`) and sampled rule hit logs emit to the shared telemetry pipeline with correlation IDs.
- [ ] **Documentation synced:** API (`/docs/api/policy.md`), DSL grammar (`/docs/policy/dsl.md`), lifecycle (`/docs/policy/lifecycle.md`), and run modes (`/docs/policy/runs.md`) cross-link back to this overview.
---
*Last updated: 2025-10-26 (Sprint 20).*

187
docs/policy/runs.md Normal file

@@ -0,0 +1,187 @@
# Policy Runs & Orchestration
> **Audience:** Policy Engine operators, Scheduler team, DevOps, and tooling engineers planning CI integrations.
> **Scope:** Run modes (`full`, `incremental`, `simulate`), orchestration pipeline, cursor management, replay/determinism guarantees, monitoring, and recovery procedures.
Policies only generate value when they execute deterministically against current SBOM, advisory, and VEX inputs. This guide explains how runs are triggered, how the orchestrator scopes work, and what artefacts you should expect at each stage.
---
## 1·Run Modes at a Glance
| Mode | Trigger sources | Scope | Persistence | Primary use |
|------|-----------------|-------|-------------|-------------|
| **Full** | Manual CLI (`stella policy run`), Console “Run now”, scheduled nightly job | Entire tenant (all registered SBOMs) | Writes `effective_finding_{policyId}` and `policy_runs` record | Baseline after policy approval, quarterly attestation, post-incident rechecks |
| **Incremental** | Change streams (Concelier advisories, Excititor VEX, SBOM imports), orchestrator cron | Only affected `(sbom, advisory)` tuples | Writes diffs to effective findings and run record | Continuous upkeep meeting ≤ 5 min SLA from input change |
| **Simulate** | Console review workspace, CLI (`stella policy simulate`), CI pipeline | Selected SBOM sample set (provided or golden set) | No materialisation; captures diff summary + explain traces | Authoring validation, regression safeguards, sealed-mode rehearsals |
All modes record their status in `policy_runs` with deterministic metadata. The `status` field is one of `queued`, `running`, `succeeded`, `failed`, `canceled`, or `replay_pending`:
```json
{
  "_id": "run:P-7:2025-10-26T14:05:11Z:3f9a",
  "policy_id": "P-7",
  "policy_version": 4,
  "mode": "incremental",
  "status": "succeeded",
  "inputs": {
    "sbom_set": ["sbom:S-42", "sbom:S-318"],
    "advisory_cursor": "2025-10-26T13:59:00Z",
    "vex_cursor": "2025-10-26T13:58:30Z",
    "env": {"exposure": "internet"}
  },
  "stats": {
    "components": 1742,
    "rules_fired": 68023,
    "findings_written": 4321,
    "vex_overrides": 210
  },
  "determinism_hash": "sha256:…",
  "started_at": "2025-10-26T14:05:11Z",
  "finished_at": "2025-10-26T14:06:01Z",
  "tenant": "default"
}
```
> **Schemas & samples:** see `src/StellaOps.Scheduler.Models/docs/SCHED-MODELS-20-001-POLICY-RUNS.md` and the fixtures in `samples/api/scheduler/policy-*.json` for canonical payloads consumed by CLI/UI/worker integrations.
---
## 2·Pipeline Overview
```mermaid
sequenceDiagram
autonumber
participant Trigger as Trigger (CLI / Console / Change Stream)
participant Orchestrator as Policy Orchestrator
participant Queue as Scheduler Queue (Mongo/NATS)
participant Engine as Policy Engine Workers
participant Concelier as Concelier Service
participant Excititor as Excititor Service
participant SBOM as SBOM Service
participant Store as Mongo (policy_runs & effective_finding_*)
participant Observability as Metrics/Events
Trigger->>Orchestrator: Run request (mode, scope, env)
Orchestrator->>Queue: Enqueue PolicyRunRequest (idempotent key)
Queue->>Engine: Lease job (fairness window)
Engine->>Concelier: Fetch advisories + linksets (cursor-aware)
Engine->>Excititor: Fetch VEX statements (cursor-aware)
Engine->>SBOM: Fetch SBOM segments / BOM-Index
Engine->>Engine: Evaluate policy (deterministic batches)
Engine->>Store: Upsert effective findings + append history
Engine->>Store: Persist policy_runs record + determinism hash
Engine->>Observability: Emit metrics, traces, rule-hit logs
Engine->>Orchestrator: Ack completion / failure
Orchestrator->>Trigger: Notify (webhook, CLI, Console update)
```
- **Trigger:** CLI, Console, or automated change stream publishes a `PolicyRunRequest`.
- **Orchestrator:** Runs inside the `StellaOps.Policy.Engine` worker host; applies fairness (tenant + policy quotas) and idempotency using run keys.
- **Queue:** Backed by Mongo + optional NATS for fan-out; supports leases and replay on crash.
- **Engine:** Stateless worker executing the deterministic evaluator.
- **Store:** Mongo collections: `policy_runs`, `effective_finding_{policyId}`, `policy_run_events` (append-only history), optional object storage for explain traces.
- **Observability:** Prometheus metrics (`policy_run_seconds`), OTLP traces, structured logs.
---
## 3·Input Scoping & Cursors
### 3.1 Advisory & VEX Cursors
- Each run records the latest Concelier change stream timestamp (`advisory_cursor`) and Excititor timestamp (`vex_cursor`).
- Incremental runs receive change batches `(feedId, lastOffset)`; orchestrator deduplicates using `change_digest`.
- Full runs set cursors to “current read time”, effectively resetting incremental baseline.
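
The deduplication of change batches can be sketched as follows. How `change_digest` is actually computed is not specified here; this example assumes it covers only the `(feedId, lastOffset)` pair, and the function names are hypothetical.

```python
import hashlib
import json


def change_digest(feed_id: str, last_offset: int) -> str:
    # Assumption: the digest covers only the (feedId, lastOffset) pair.
    payload = json.dumps([feed_id, last_offset], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe(batches: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Keep the first occurrence of each batch, preserving arrival order."""
    seen: set[str] = set()
    unique = []
    for feed_id, last_offset in batches:
        key = change_digest(feed_id, last_offset)
        if key not in seen:
            seen.add(key)
            unique.append((feed_id, last_offset))
    return unique


# A redelivered Concelier batch appears twice and is dropped the second time.
incoming = [("concelier", 101), ("excititor", 57), ("concelier", 101)]
print(dedupe(incoming))
```

Keying on a digest rather than on the raw tuple keeps the sketch close to the described mechanism and lets the digest definition grow (for example, to include payload content) without changing the loop.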
### 3.2 SBOM Selection
- Full runs enumerate all SBOM records declared active for the tenant.
- Incremental runs derive SBOM set by intersecting advisory/VEX changes with BOM-Index lookups (component → SBOM mapping).
- Simulations accept explicit SBOM list; if omitted, CLI uses `etc/policy/golden-sboms.json`.
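
As a rough illustration of incremental SBOM selection, the following Python sketch maps changed components to SBOMs through an in-memory index. The dictionary-based `bom_index` and the sample PURLs are assumptions; the real BOM-Index storage and lookup API are not described in this guide.

```python
def affected_sboms(changed_components: set[str], bom_index: dict[str, set[str]]) -> list[str]:
    """Return the sorted SBOM ids that contain at least one changed component."""
    hits: set[str] = set()
    for component in changed_components:
        # Components missing from the index simply contribute nothing.
        hits |= bom_index.get(component, set())
    return sorted(hits)  # sorted so downstream batching sees a stable order


# Hypothetical component -> SBOM mapping.
bom_index = {
    "pkg:npm/lodash@4.17.20": {"sbom:S-42", "sbom:S-318"},
    "pkg:pypi/requests@2.31.0": {"sbom:S-7"},
}
changed = {"pkg:npm/lodash@4.17.20", "pkg:cargo/unknown@0.1.0"}

print(affected_sboms(changed, bom_index))
```

Only SBOMs that reference a changed component are returned, which is what keeps incremental runs scoped to affected artefacts.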
### 3.3 Environment Metadata
- `env` block (free-form key/values) allows scenario-specific evaluation (e.g., `env.exposure=internet`).
- Stored verbatim in `policy_runs.inputs.env` for replay; orchestrator hashes environment data to avoid cache collisions.
---
## 4·Execution Semantics
1. **Preparation:** Worker loads compiled IR for target policy version (cached by digest).
2. **Batching:** Candidate tuples are grouped by SBOM, then by advisory to maintain deterministic order; page size defaults to 1024 tuples.
3. **Evaluation:** Rules execute with first-match semantics; results captured as `PolicyVerdict`.
4. **Materialisation:**
   - Upserts into `effective_finding_{policyId}` using `{policyId, sbomId, findingKey}`.
   - Previous versions stored in `effective_finding_{policyId}_history`.
5. **Explain storage:** Full explain trees stored in blob store when `captureExplain=true`; incremental runs keep sampled traces (configurable).
6. **Completion:** Worker writes final status, stats, determinism hash (combination of policy digest + ordered input digests), and emits `policy.run.completed` event.
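
The batching step can be sketched in Python as below. The ordering (SBOM first, then advisory) and the default page size of 1024 come from the list above; plain lexicographic string ordering and the function name are assumptions made for illustration.

```python
from typing import Iterator

PAGE_SIZE = 1024  # default page size stated in the batching step


def batches(
    tuples: list[tuple[str, str]], page_size: int = PAGE_SIZE
) -> Iterator[list[tuple[str, str]]]:
    """Yield pages of (sbom_id, advisory_id) tuples in a stable order."""
    # Sorting tuples orders by SBOM id first, then by advisory id.
    ordered = sorted(tuples)
    for start in range(0, len(ordered), page_size):
        yield ordered[start:start + page_size]


candidates = [("S-2", "A-1"), ("S-1", "A-9"), ("S-1", "A-2")]
for page in batches(candidates, page_size=2):
    print(page)
```

Because the pages depend only on the sorted input, two workers given the same candidate set produce identical batches regardless of arrival order.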
---
## 5·Retry, Replay & Determinism
- **Retries:** Failures (network, validation) mark run `status=failed` and enqueue retry with exponential backoff capped at 3 attempts. Manual re-run via CLI resets counters.
- **Replay:**
  - Use `policy_runs` record to assemble input snapshot (policy version, cursors, env).
  - Fetch associated SBOM/advisory/VEX data via `stella policy replay --run <id>`, which rehydrates data into a sealed bundle.
  - Determinism hash mismatches between replay and recorded run indicate drift; CI job `DEVOPS-POLICY-20-003` compares successive runs to guard this.
- **Cancellation:** Manual `stella policy run cancel <runId>` or orchestrator TTL triggers `status=canceled`; partial changes roll back via history append (no destructive delete).
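
A determinism hash built from the policy digest plus ordered input digests might look like the following Python sketch. The exact byte layout used by the engine is not documented here; the newline separator, the `sha256:` prefix, and the helper names are assumptions.

```python
import hashlib


def determinism_hash(policy_digest: str, input_digests: list[str]) -> str:
    """Combine the policy digest with input digests in a fixed (sorted) order."""
    hasher = hashlib.sha256()
    hasher.update(policy_digest.encode("utf-8"))
    for item in sorted(input_digests):  # ordering makes the hash independent of fetch order
        hasher.update(b"\n")
        hasher.update(item.encode("utf-8"))
    return "sha256:" + hasher.hexdigest()


def has_drifted(recorded: str, replayed: str) -> bool:
    """A replay that yields a different hash indicates drift."""
    return recorded != replayed


recorded = determinism_hash("sha256:policy-v4", ["sha256:sbom-42", "sha256:adv-snap"])
replayed = determinism_hash("sha256:policy-v4", ["sha256:adv-snap", "sha256:sbom-42"])
print(has_drifted(recorded, replayed))
```

The replay fetched its inputs in a different order, yet the hashes agree because digests are sorted before hashing; a changed policy or input digest would flip the result.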
---
## 6·Trigger Sources & Scheduling
| Source | Description | SLAs |
|--------|-------------|------|
| **Nightly full run** | Default schedule per tenant; ensures baseline alignment. | Finish before 07:00 UTC |
| **Change stream** | Concelier (`advisory_raw`), Excititor (`vex_raw`), SBOM imports emit `policy.trigger.delta` events. | Start within 60 s; complete within 5 min |
| **Manual CLI/Console** | Operators run ad-hoc evaluations. | No SLA; warns if warm path > target |
| **CI** | `stella policy simulate` runs in pipelines referencing golden SBOMs. | Must complete under 10 min to avoid pipeline timeout |
The orchestrator enforces max concurrency per tenant (`maxActiveRuns`), queue depth alarms, and fairness (round-robin per policy).
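
One way to picture the combination of a per-tenant concurrency cap and round-robin fairness is the Python sketch below. The queue layout, the function name, and the alphabetical policy order are assumptions; only `maxActiveRuns` and the round-robin-per-policy behaviour are taken from the text.

```python
from collections import deque


def next_runs(pending: dict[str, deque], active: int, max_active_runs: int) -> list[str]:
    """Pick runs round-robin across policies until the tenant's cap is reached."""
    picked: list[str] = []
    slots = max(0, max_active_runs - active)  # free capacity for this tenant
    policies = sorted(pending)  # stable policy order (assumed)
    while slots > 0 and any(pending[p] for p in policies):
        for policy in policies:
            if slots == 0:
                break
            if pending[policy]:
                picked.append(pending[policy].popleft())
                slots -= 1
    return picked


# One busy policy and one quiet policy sharing a tenant with one run already active.
queues = {"P-7": deque(["r1", "r2", "r3"]), "P-9": deque(["r4"])}
print(next_runs(queues, active=1, max_active_runs=4))
```

The quiet policy `P-9` gets its run scheduled in the first round even though `P-7` has more queued work, which is the starvation-avoidance property fairness is meant to provide.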
---
## 7·Monitoring & Alerts
- **Metrics:** `policy_run_seconds`, `policy_run_queue_depth`, `policy_run_failures_total`, `policy_run_incremental_backlog`, `policy_rules_fired_total`.
- **Dashboards:** Highlight pending approvals, incremental backlog age, top failing policies, VEX override ratios (tie-in with `/docs/observability/policy.md` once published).
- **Alerts:**
  - Incremental backlog > 3 cycles.
  - Determinism hash mismatch.
  - Failure rate > 5% over rolling hour.
  - Run duration > SLA (full > 30 min, incremental > 5 min).
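
The alert conditions above can be expressed as a small Python check. The thresholds come from the list; the `RunWindow` structure, its field names, and the alert identifiers are hypothetical and not part of any documented monitoring configuration.

```python
from dataclasses import dataclass


# Hypothetical summary of one run plus its rolling-hour context.
@dataclass
class RunWindow:
    backlog_cycles: int      # incremental backlog, in cycles
    hash_mismatch: bool      # determinism hash differed from the recorded one
    failed: int              # failed runs in the rolling hour
    total: int               # total runs in the rolling hour
    mode: str                # "full" or "incremental"
    duration_seconds: float


SLA_SECONDS = {"full": 30 * 60, "incremental": 5 * 60}


def alerts(w: RunWindow) -> list[str]:
    fired = []
    if w.backlog_cycles > 3:
        fired.append("incremental_backlog")
    if w.hash_mismatch:
        fired.append("determinism_mismatch")
    # Integer comparison for "failure rate > 5%" avoids floating-point edge cases.
    if w.total > 0 and w.failed * 100 > 5 * w.total:
        fired.append("failure_rate")
    limit = SLA_SECONDS.get(w.mode)
    if limit is not None and w.duration_seconds > limit:
        fired.append("duration_sla")
    return fired


print(alerts(RunWindow(4, False, 1, 10, "incremental", 301)))
```

All comparisons are strict, matching the ">" wording of the alert list, so a run that lands exactly on a threshold does not fire.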
---
## 8·Failure Handling & Rollback
- **Soft failures:** Worker retries; after final failure, orchestrator emits `policy.run.failed` with diagnostics and recommended actions (e.g., missing SBOM segment).
- **Hard failures:** Schema mismatch, determinism guard violation (`ERR_POL_004`) blocks further runs until resolved.
- **Rollback:** Operators can activate previous policy version (see [Lifecycle guide](lifecycle.md)) and schedule full run to restore prior state.
---
## 9·Offline / Sealed Mode
- Change streams originate from offline bundle imports; orchestrator processes delta manifests.
- Runs execute with `sealed=true`, blocking any external lookups; `policy_runs.inputs.env.sealed` set for auditing.
- Explain traces annotate cached data usage to prompt bundle refresh.
- Offline Kit exports include latest `policy_runs` snapshot and determinism hashes for evidence lockers.
---
## 10·Compliance Checklist
- [ ] **Run schemas validated:** `PolicyRunRequest` / `PolicyRunStatus` DTOs from Scheduler Models (`SCHED-MODELS-20-001`) serialise deterministically; schema samples up to date.
- [ ] **Cursor integrity:** Incremental runs persist advisory & VEX cursors; replay verifies identical input digests.
- [ ] **Queue fairness configured:** Tenant-level concurrency limits and lease timeouts applied; no starvation of lower-volume policies.
- [ ] **Determinism guard active:** CI replay job (`DEVOPS-POLICY-20-003`) green; determinism hash recorded on each run.
- [ ] **Observability wired:** Metrics exported, alerts configured, and run events flowing to Notifier/Timeline.
- [ ] **Offline tested:** `stella policy run --sealed` executed in air-gapped environment; explain traces flag cached evidence usage.
- [ ] **Recovery plan rehearsed:** Failure and rollback drill documented; incident checklist aligned with Lifecycle guide.
---
*Last updated: 2025-10-26 (Sprint 20).*