	Fine. Here’s the next epic, written so you can paste it straight into the repo without having to babysit me. Same structure as before, maximum detail, zero hand‑waving.
Epic 2: Policy Engine & Policy Editor (VEX + Advisory Application Rules)
Short name: Policy Engine v2
Services touched: Policy Engine, Web API, Console (Policy Editor), CLI, Conseiller, Excitator, SBOM Service, Authority, Workers/Scheduler
Data stores: MongoDB (policies, runs, effective findings), optional Redis/NATS for jobs
1) What it is
This epic delivers the organization‑specific decision layer for Stella. Ingestion is now AOC‑compliant (Epic 1). That means advisories and VEX arrive as immutable raw facts. This epic builds the place where those facts become effective findings under policies you control.
Core deliverables:
- Policy Engine: deterministic evaluator that applies rule sets to inputs:
  - Inputs: advisory_raw, vex_raw, SBOMs, optional telemetry hooks (reachability stubs), org metadata.
  - Outputs: effective_finding_{policyId} materializations, with full explanation traces.
- Policy Editor (Console + CLI): versioned policy authoring, simulation, review/approval workflow, and change diffs.
- Rules DSL v1: safe, declarative language for VEX application, advisory normalization, and risk scoring. No arbitrary code execution, no network calls.
- Run Orchestrator: incremental re‑evaluation when new raw facts or SBOM changes arrive; efficient partial updates.
The philosophy is boring on purpose: policy is a pure function of inputs. Same inputs and same policy yield the same outputs, every time, on every machine. If you want drama, watch reality TV, not your risk pipeline.
2) Why
- Vendors disagree, contexts differ, and your tolerance for risk is not universal.
- VEX means nothing until you decide how to apply it to your assets.
- Auditors care about the “why.” You’ll need consistent, replayable answers, with traces.
- Security teams need simulation before rollouts, and diffs after.
3) How it should work (deep details)
3.1 Data model
3.1.1 Policy documents (Mongo: policies)
{
  "_id": "policy:P-7:v3",
  "policy_id": "P-7",
  "version": 3,
  "name": "Default Org Policy",
  "status": "approved",        // draft | submitted | approved | archived
  "owned_by": "team:sec-plat",
  "valid_from": "2025-01-15T00:00:00Z",
  "valid_to": null,
  "dsl": {
    "syntax": "stella-dsl@1",
    "source": "rule-set text or compiled IR ref"
  },
  "metadata": {
    "description": "Baseline scoring + VEX precedence",
    "tags": ["baseline","vex","cvss"]
  },
  "provenance": {
    "created_by": "user:ali",
    "created_at": "2025-01-15T08:00:00Z",
    "submitted_by": "user:kay",
    "approved_by": "user:root",
    "approval_at": "2025-01-16T10:00:00Z",
    "checksum": "sha256:..."
  },
  "tenant": "default"
}
Constraints:
- status=approved is required to run in production.
- Version increments are append‑only. Old versions remain runnable for replay.
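The two constraints above can be sketched as a small state machine. This is a minimal illustration, not the engine's implementation: the transition table is an assumption inferred from the draft | submitted | approved | archived status values, and the names PolicyVersion, transition, can_run_in_production, and next_version are hypothetical.

```python
from dataclasses import dataclass, replace

# Assumed transition table, inferred from the status values in the schema.
ALLOWED_TRANSITIONS = {
    "draft": {"submitted"},
    "submitted": {"approved"},
    "approved": {"archived"},
    "archived": set(),
}


@dataclass(frozen=True)
class PolicyVersion:
    policy_id: str
    version: int
    status: str = "draft"


def transition(policy: PolicyVersion, new_status: str) -> PolicyVersion:
    """Return a copy in the new status, or raise if the move is not allowed."""
    if new_status not in ALLOWED_TRANSITIONS.get(policy.status, set()):
        raise ValueError(f"{policy.status} -> {new_status} is not allowed")
    return replace(policy, status=new_status)


def can_run_in_production(policy: PolicyVersion) -> bool:
    """Only approved versions may run in production."""
    return policy.status == "approved"


def next_version(existing_versions: list) -> int:
    """Versions are append-only: a new version is always max + 1."""
    return max(existing_versions, default=0) + 1
```

Because PolicyVersion is frozen, a status change always produces a new object, which mirrors the append-only intent: nothing is edited in place.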
3.1.2 Policy runs (Mongo: policy_runs)
{
  "_id": "run:P-7:2025-02-20T12:34:56Z:abcd",
  "policy_id": "P-7",
  "policy_version": 3,
  "inputs": {
    "sbom_set": ["sbom:S-42"],
    "advisory_cursor": "2025-02-20T00:00:00Z",
    "vex_cursor": "2025-02-20T00:00:00Z"
  },
  "mode": "incremental",   // full | incremental | simulate
  "stats": {
    "components": 1742,
    "advisories_considered": 9210,
    "vex_considered": 1187,
    "rules_fired": 68023,
    "findings_out": 4321
  },
  "trace": {
    "location": "blob://traces/run-.../index.json",
    "sampling": "smart-10pct"
  },
  "status": "succeeded",   // queued | running | failed | succeeded | canceled
  "started_at": "2025-02-20T12:34:56Z",
  "finished_at": "2025-02-20T12:35:41Z",
  "tenant": "default"
}
3.1.3 Effective findings (Mongo: effective_finding_P-7)
{
  "_id": "P-7:S-42:pkg:npm/lodash@4.17.21:CVE-2021-23337",
  "policy_id": "P-7",
  "policy_version": 3,
  "sbom_id": "S-42",
  "component_purl": "pkg:npm/lodash@4.17.21",
  "advisory_ids": ["CVE-2021-23337", "GHSA-..."],
  "status": "affected",     // affected | not_affected | fixed | under_investigation | suppressed
  "severity": {
    "normalized": "High",
    "score": 7.5,
    "vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N",
    "rationale": "cvss_base(OSV) + vendor_weighting + env_modifiers"
  },
  "rationale": [
    {"rule":"vex.precedence","detail":"VendorX not_affected justified=component_not_present wins"},
    {"rule":"advisory.cvss.normalization","detail":"mapped GHSA severity to CVSS 3.1 = 7.5"}
  ],
  "references": {
    "advisory_raw_ids": ["advisory_raw:osv:GHSA-...:v3"],
    "vex_raw_ids": ["vex_raw:VendorX:doc-123:v4"]
  },
  "run_id": "run:P-7:2025-02-20T12:34:56Z:abcd",
  "tenant": "default"
}
Write protection: only the Policy Engine service identity may write any effective_finding_* collection.
3.2 Rules DSL v1 (stella‑dsl@1)
Design goals
- Declarative, composable, deterministic.
- No loops, no network IO, no non‑deterministic time.
- Policy authors see readable text; the engine compiles to a safe IR.
Concepts
- WHEN condition matches a tuple (sbom_component, advisory, optional vex_statements)
- THEN actions set status, compute severity, attach rationale, or suppress with reason.
- Profiles for severity and scoring; Maps for vendor weighting; Guards for VEX justification.
Mini‑grammar (subset)
policy "Default Org Policy" syntax "stella-dsl@1" {
  profile severity {
    map vendor_weight {
      source "GHSA" => +0.5
      source "OSV"  => +0.0
      source "VendorX" => -0.2
    }
    env base_cvss {
      if env.runtime == "serverless" then -0.5
      if env.exposure == "internal-only" then -1.0
    }
  }
  rule vex_precedence {
    when vex.any(status in ["not_affected","fixed"])
      and vex.justification in ["component_not_present","vulnerable_code_not_present"]
    then status := vex.status
         because "VEX strong justification prevails";
  }
  rule advisory_to_cvss {
    when advisory.source in ["GHSA","OSV"]
    then severity := normalize_cvss(advisory)
         because "Map vendor severity or CVSS vector";
  }
  rule reachability_soft_suppress {
    when severity.normalized <= "Medium"
      and telemetry.reachability == "none"
    then status := "suppressed"
         because "not reachable and low severity";
  }
}
Built‑ins (non‑exhaustive)
- normalize_cvss(advisory) maps GHSA/OSV/CSAF severity fields to CVSS v3.1 numbers when possible; otherwise it falls back to the vendor‑to‑numeric mapping table in the policy.
- vex.any(...) tests across matching VEX statements for the same (component, advisory).
- telemetry.* is an optional input namespace reserved for future reachability data; if absent, expressions evaluate to unknown (no effect).
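One way to read "evaluate to unknown (no effect)" is three-valued logic, where a rule fires only when its whole condition is definitely true. The sketch below assumes that reading; None stands for unknown, and the helper names (tri_and, telemetry_equals, rule_fires) are hypothetical.

```python
from typing import Optional

# Tri-state truth value: True, False, or None for "unknown".
Tri = Optional[bool]


def tri_and(a: Tri, b: Tri) -> Tri:
    """Three-valued AND: a definite False wins, otherwise unknown propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def telemetry_equals(telemetry: Optional[dict], key: str, expected: str) -> Tri:
    """Compare a telemetry field; a missing namespace or key yields unknown."""
    if not telemetry or key not in telemetry:
        return None
    return telemetry[key] == expected


def rule_fires(condition: Tri) -> bool:
    """Unknown never fires a rule, so absent telemetry has no effect."""
    return condition is True
```

Under this reading, reachability_soft_suppress stays silent for a Medium finding when no telemetry is supplied, because `tri_and(True, None)` is unknown rather than true.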
Determinism
- Rules are evaluated in stable order: explicit priority attribute or lexical order.
- First‑match semantics for conflicting status unless combine is used.
- Severity computations are pure; numeric maps are part of policy document.
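A minimal sketch of stable ordering with first-match status follows. How priority and lexical order combine is not spelled out above, so this assumes prioritized rules run first (lowest number first) and unprioritized rules follow in name order. The Rule class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    when: Callable[[dict], bool]
    status: str
    priority: Optional[int] = None


def stable_order(rules):
    """Assumed ordering: explicit priority first, then lexical rule name."""
    return sorted(
        rules,
        key=lambda r: (
            r.priority is None,
            r.priority if r.priority is not None else 0,
            r.name,
        ),
    )


def evaluate_status(rules, ctx, default="affected"):
    """Return (status, fired rule names); the first matching rule sets status."""
    status = None
    fired = []
    for rule in stable_order(rules):
        if rule.when(ctx):
            fired.append(rule.name)
            if status is None:
                status = rule.status  # first match wins; later matches are only traced
    return (status if status is not None else default), fired
```

Sorting before evaluation means the result does not depend on the order in which rules were declared or loaded, which is the property the determinism section asks for.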
3.3 Evaluation model
- Selection
  - For each SBOM component PURL, find candidate advisories from advisory_raw via linkset PURLs or identifiers.
  - For each pair (component, advisory), load all matching VEX facts from vex_raw.
- Context assembly
  - Build an evaluation context from:
    - sbom_component: PURL, licenses, relationships.
    - advisory: source, identifiers, references, embedded vendor severity (kept in content.raw).
    - vex: list of statements with status and justification.
    - env: org‑specific env vars configured per policy run (e.g., exposure).
    - Optional telemetry if available.
- Rule execution
  - Compile DSL to IR once per policy version; cache.
  - Execute rules per tuple; record which rules fired and the order.
  - If no rule sets status, default is affected.
  - If no rule sets severity, default severity uses normalize_cvss(advisory) with vendor defaults.
- Materialization
  - Write to effective_finding_{policyId} with rationale chain and references to raw docs.
  - Emit per‑tuple trace events; sample and store full traces per run.
- Incremental updates
  - A watch job observes new advisory_raw and vex_raw inserts and SBOM deltas.
  - The orchestrator computes the affected tuples and re‑evaluates only those.
- Replay
  - Any policy_run is fully reproducible by (policy_id, version, input set, cursors).
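The selection and incremental steps above can be sketched with plain dictionaries. This is an illustration under assumptions: the flattened document shape ({"_id", "advisory_id", "purls"}) is hypothetical and stands in for the real linkset fields, and matching is by exact PURL only.

```python
from collections import defaultdict


def select_tuples(component_purls, advisories, vex_docs):
    """Join components to advisories by PURL, then attach matching VEX ids."""
    adv_by_purl = defaultdict(list)
    for adv in advisories:
        for purl in adv["purls"]:
            adv_by_purl[purl].append(adv)

    vex_by_key = defaultdict(list)
    for vex in vex_docs:
        for purl in vex["purls"]:
            vex_by_key[(purl, vex["advisory_id"])].append(vex["_id"])

    tuples = []
    # Sorted iteration keeps the output order independent of input order.
    for purl in sorted(set(component_purls)):
        for adv in sorted(adv_by_purl.get(purl, []), key=lambda a: a["_id"]):
            vex_ids = tuple(sorted(vex_by_key.get((purl, adv["advisory_id"]), [])))
            tuples.append((purl, adv["_id"], vex_ids))
    return tuples


def tuples_to_reevaluate(tuples, changed_raw_ids, changed_purls):
    """Keep only tuples touched by a changed raw document or SBOM component."""
    changed_raw_ids = set(changed_raw_ids)
    changed_purls = set(changed_purls)
    return [
        t for t in tuples
        if t[0] in changed_purls
        or t[1] in changed_raw_ids
        or changed_raw_ids.intersection(t[2])
    ]
```

A real orchestrator would query indexes instead of holding everything in memory; the point here is only that the set of tuples to re-evaluate is a pure function of the change set.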
3.4 VEX application semantics
- Precedence: a not_affected with strong justification (component_not_present, vulnerable_code_not_present, fix_not_required) wins unless another rule explicitly overrides by environment context.
- Scoping: VEX statements often specify product/component scope. Matching uses PURL equivalence and version ranges extracted during ingestion linkset generation.
- Conflicts: If multiple VEX statements conflict, the default is most‑specific scope wins (component > product > vendor), then newest document_version. Policies can override with explicit rules.
- Explainability: Every VEX‑driven decision records which statement IDs were considered and which one won.
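The default conflict rule (most specific scope, then newest document_version) can be expressed as a sort key. In this sketch the statement fields id, scope, document_version, and status are a hypothetical shape, and the final tie-break on id is an added assumption to keep the result stable when scope and version are equal.

```python
# Higher rank means more specific scope: component > product > vendor.
SCOPE_RANK = {"component": 3, "product": 2, "vendor": 1}


def resolve_vex(statements):
    """Pick the winning VEX statement and record everything that was considered."""
    if not statements:
        return {"considered": [], "winner": None, "status": None}
    winner = max(
        statements,
        key=lambda s: (SCOPE_RANK[s["scope"]], s["document_version"], s["id"]),
    )
    return {
        "considered": sorted(s["id"] for s in statements),
        "winner": winner["id"],
        "status": winner["status"],
    }
```

Returning the considered IDs next to the winner gives the explain endpoint what it needs without a second lookup.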
3.5 Advisory normalization rules
- Vendor severity mapping: Map GHSA levels or CSAF product‑tree severities to CVSS‑like numeric bands via policy maps.
- CVSS vector use: If a valid vector exists in content.raw, parse and compute; apply policy modifiers from profile severity.
- Temporal/environment modifiers: Optional reductions for network exposure, isolation, or compensating controls, all encoded in policy.
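As a worked example of these rules, the sketch below applies the vendor weights and environment modifiers from the sample policy to a base score and maps the result to a band. Additive combination follows the "cvss_base + vendor_weighting + env_modifiers" rationale string in the sample finding; clamping to 0.0-10.0 and rounding to one decimal are assumptions. The band thresholds are the CVSS v3.1 qualitative ratings.

```python
# Values copied from the sample policy's profile severity block.
VENDOR_WEIGHT = {"GHSA": 0.5, "OSV": 0.0, "VendorX": -0.2}
ENV_MODIFIERS = [
    ("runtime", "serverless", -0.5),
    ("exposure", "internal-only", -1.0),
]

# CVSS v3.1 qualitative severity rating scale (lower bound, label).
BANDS = [(9.0, "Critical"), (7.0, "High"), (4.0, "Medium"), (0.1, "Low")]


def adjusted_score(base: float, source: str, env: dict) -> float:
    """Base score plus vendor weight plus every matching environment modifier."""
    value = base + VENDOR_WEIGHT.get(source, 0.0)
    for key, expected, delta in ENV_MODIFIERS:
        if env.get(key) == expected:
            value += delta
    # Assumption: clamp to the CVSS range and round to one decimal place.
    return round(min(10.0, max(0.0, value)), 1)


def severity_band(score: float) -> str:
    """Map a numeric score to its qualitative band."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "None"
```

For instance, a GHSA advisory with base 7.5 on an internal-only service works out to 7.5 + 0.5 - 1.0 = 7.0, which still lands in the High band.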
3.6 Performance and scale
- Partition evaluation by SBOM ID and hash ranges of PURLs.
- Pre‑index advisory_raw.linkset.purls and vex_raw.linkset.purls (already in Epic 1).
- Use streaming iterators; avoid loading entire SBOM or advisory sets into memory.
- Materialize only changed findings (diff‑aware writes).
- Target: 100k components, 1M advisories considered, 5 minutes incremental SLA on commodity hardware.
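Two of the points above, hash-range partitioning and diff-aware writes, can be sketched briefly. The function names are hypothetical, and treating run_id as the only volatile field is an assumption (without it, every run would rewrite every finding).

```python
import hashlib

# Assumption: run_id changes on every run and must not count as a content change.
VOLATILE_FIELDS = {"run_id"}


def partition_for(sbom_id: str, purl: str, partitions: int) -> int:
    """Stable partition index; avoids Python's per-process salted hash()."""
    if partitions <= 0:
        raise ValueError("partitions must be positive")
    digest = hashlib.sha256(f"{sbom_id}\n{purl}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def _stable_view(finding: dict) -> dict:
    return {k: v for k, v in finding.items() if k not in VOLATILE_FIELDS}


def changed_findings(existing_by_id: dict, new_findings: list) -> list:
    """Return only the findings that are new or whose stable content differs."""
    changed = []
    for finding in new_findings:
        previous = existing_by_id.get(finding["_id"])
        if previous is None or _stable_view(previous) != _stable_view(finding):
            changed.append(finding)
    return changed
```

Using a cryptographic digest for partitioning is heavier than strictly needed, but it gives the same partition for the same key on every machine and every process.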
3.7 Error codes
| Code | Meaning | HTTP | 
|---|---|---|
| ERR_POL_001 | Policy syntax error | 400 | 
| ERR_POL_002 | Policy not approved for run | 403 | 
| ERR_POL_003 | Missing inputs (SBOM/advisory/vex fetch failed) | 424 | 
| ERR_POL_004 | Determinism guard triggered (non‑pure function usage) | 500 | 
| ERR_POL_005 | Write denied to effective findings (caller invalid) | 403 | 
| ERR_POL_006 | Run canceled or timed out | 408 | 
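The table translates directly into a lookup that an API layer could share. In this sketch the code-to-status pairs come from the table; the PolicyEngineError class and the response body shape are hypothetical.

```python
# Code -> HTTP status, copied from the error code table.
ERROR_HTTP_STATUS = {
    "ERR_POL_001": 400,  # policy syntax error
    "ERR_POL_002": 403,  # policy not approved for run
    "ERR_POL_003": 424,  # missing inputs
    "ERR_POL_004": 500,  # determinism guard triggered
    "ERR_POL_005": 403,  # write denied to effective findings
    "ERR_POL_006": 408,  # run canceled or timed out
}


class PolicyEngineError(Exception):
    """Carries one of the ERR_POL_* codes plus a human-readable detail."""

    def __init__(self, code: str, detail: str):
        if code not in ERROR_HTTP_STATUS:
            raise ValueError(f"unknown error code: {code}")
        super().__init__(f"{code}: {detail}")
        self.code = code
        self.detail = detail
        self.http_status = ERROR_HTTP_STATUS[code]


def to_response(err: PolicyEngineError):
    """Return (status, body); the body shape is an illustrative assumption."""
    return err.http_status, {"error": err.code, "detail": err.detail}
```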
3.8 Observability
- Metrics:
  - policy_compile_seconds, policy_run_seconds{mode=...}, rules_fired_total, findings_written_total, vex_overrides_total, simulate_diff_total{delta=up|down|unchanged}.
- Tracing:
  - Spans: policy.compile, policy.select, policy.eval, policy.materialize.
- Logs:
  - Include policy_id, version, run_id, sbom_id, component_purl, advisory_id, vex_count, rule_hits.
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
3.9 Security and tenancy
- Only users with policy:write can create/modify policies.
- policy:approve is a separate privileged role.
- Only Policy Engine service identity has effective:write.
- Tenancy is explicit on all documents and queries.
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
4) API surface
4.1 Policy CRUD and lifecycle
- POST /policies: create draft
- GET /policies?status=...: list
- GET /policies/{policyId}/versions/{v}: fetch
- POST /policies/{policyId}/submit: move draft to submitted
- POST /policies/{policyId}/approve: approve version
- POST /policies/{policyId}/archive: archive version
4.2 Compilation and validation
- POST /policies/{policyId}/versions/{v}/compile
  - Returns IR checksum, syntax diagnostics, rule stats.
4.3 Runs
- POST /policies/{policyId}/runs: body {mode, sbom_set, advisory_cursor?, vex_cursor?, env?}
- GET /policies/{policyId}/runs/{runId}: status + stats
- POST /policies/{policyId}/simulate: returns diff vs current approved version on a sample SBOM set.
4.4 Findings and explanations
- GET /findings/{policyId}?sbom_id=S-42&status=affected&severity=High+Critical
- GET /findings/{policyId}/{findingId}/explain: returns ordered rule hits and linked raw IDs.
All endpoints require tenant scoping and appropriate policy:* or findings:* roles.
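To make the run request concrete, here is a small client-side helper that builds the path and body for POST /policies/{policyId}/runs. Field names follow the body listed in 4.3 and the mode values from the policy_runs schema; the function itself and its validation rules (for example, rejecting an empty sbom_set) are assumptions, and no HTTP call is made.

```python
RUN_MODES = {"full", "incremental", "simulate"}


def build_run_request(policy_id, mode, sbom_set,
                      advisory_cursor=None, vex_cursor=None, env=None):
    """Return (path, body) for a run request; optional fields are omitted when unset."""
    if mode not in RUN_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    if not sbom_set:
        raise ValueError("sbom_set must name at least one SBOM")

    body = {"mode": mode, "sbom_set": list(sbom_set)}
    if advisory_cursor is not None:
        body["advisory_cursor"] = advisory_cursor
    if vex_cursor is not None:
        body["vex_cursor"] = vex_cursor
    if env:
        body["env"] = dict(env)
    return f"/policies/{policy_id}/runs", body
```

A caller would serialize the body as JSON and attach tenant and role credentials; both are outside this sketch.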
5) Console (Policy Editor) and CLI behavior
Console
- Monaco‑style editor with DSL syntax highlighting, lint, quick docs.
- Side‑by‑side Simulation panel: show count of affected findings before/after.
- Approval workflow: submit, review comments, approve with rationale.
- Diffs: show rule‑wise changes and estimated impact.
- Read‑only run viewer: heatmap of rules fired, top suppressions, VEX wins.
CLI
- stella policy new --name "Default Org Policy"
- stella policy edit P-7: opens local editor -> submit
- stella policy approve P-7 --version 3
- stella policy simulate P-7 --sbom S-42 --env exposure=internal-only
- stella findings ls --policy P-7 --sbom S-42 --status affected
Exit codes map to ERR_POL_*.
6) Implementation tasks
6.1 Policy Engine service
- Implement DSL parser and IR compiler (stella-dsl@1).
- Build evaluator with stable ordering and first‑match semantics.
- Implement selection joiners for SBOM↔advisory↔vex using linksets.
- Materialization writer with upsert‑only semantics to effective_finding_{policyId}.
- Determinism guard (ban wall‑clock, network, and RNG during eval).
- Incremental orchestrator listening to advisory/vex/SBOM change streams.
- Trace emitter with rule‑hit sampling.
- Unit tests, property tests, golden fixtures; perf tests to target SLA.
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
6.2 Web API
- Policy CRUD, compile, run, simulate, findings, explain endpoints.
- Pagination, filters, and tenant enforcement on all list endpoints.
- Error mapping to ERR_POL_*.
- Rate limits on simulate endpoints.
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
6.3 Console (Policy Editor)
- Editor with DSL syntax highlighting and inline diagnostics.
- Simulation UI with pre/post counts and top deltas.
- Approval workflow UI with audit trail.
- Run viewer dashboards (rule heatmap, VEX wins, suppressions).
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
6.4 CLI
- New commands: policy new|edit|submit|approve|simulate, findings ls|get.
- JSON/YAML output formats for CI consumption.
- Non‑zero exits on syntax errors or simulation failures; map to ERR_POL_*.
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
6.5 Conseiller & Excitator integration
- Provide search endpoints optimized for policy selection (batch by PURLs and IDs).
- Harden linkset extraction to maximize join recall.
- Add cursors for incremental selection windows per run.
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
6.6 SBOM Service
- Ensure fast PURL index and component metadata projection for policy queries.
- Provide relationship graph API for future transitive logic.
- Emit change events on SBOM updates.
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
6.7 Authority
- Define scopes: policy:write, policy:approve, policy:run, findings:read, effective:write.
- Issue service identity for Policy Engine with effective:write only.
- Enforce tenant claims at gateway.
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
6.8 CI/CD
- Lint policy DSL in PRs; block invalid syntax.
- Run simulate against golden SBOMs to detect explosive deltas.
- Determinism CI: two runs with identical seeds produce identical outputs.
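The determinism check above could be as small as hashing a canonical form of the findings from two runs and comparing digests. This sketch assumes run_id is the only field allowed to differ between runs; run_policy is a hypothetical callable that returns a list of finding documents.

```python
import hashlib
import json

# Assumption: run_id embeds a timestamp, so it is excluded from the comparison.
VOLATILE_FIELDS = {"run_id"}


def findings_digest(findings) -> str:
    """SHA-256 over a canonical JSON form, independent of list and key order."""
    stable = [
        {k: v for k, v in f.items() if k not in VOLATILE_FIELDS}
        for f in findings
    ]
    stable.sort(key=lambda f: f["_id"])
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_determinism(run_policy, attempts: int = 2) -> str:
    """Run the policy several times and fail if any digest differs."""
    digests = {findings_digest(run_policy()) for _ in range(attempts)}
    if len(digests) != 1:
        raise AssertionError("policy run is not deterministic")
    return digests.pop()
```

The returned digest can also be stored with a golden fixture, so a later CI run can detect drift against a known-good result rather than only against itself.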
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
7) Documentation changes (create/update these files)
- /docs/policy/overview.md
  - What the Policy Engine is, high‑level concepts, inputs, outputs, determinism.
- /docs/policy/dsl.md
  - Full grammar, built‑ins, examples, best practices, anti‑patterns.
- /docs/policy/lifecycle.md
  - Draft → submitted → approved → archived, roles, and audit trail.
- /docs/policy/runs.md
  - Run modes, incremental mechanics, cursors, replay.
- /docs/api/policy.md
  - Endpoints, request/response schemas, error codes.
- /docs/cli/policy.md
  - Command usage, examples, exit codes, JSON output contracts.
- /docs/ui/policy-editor.md
  - Screens, workflows, simulation, diffs, approvals.
- /docs/architecture/policy-engine.md
  - Detailed sequence diagrams, selection/join strategy, materialization schema.
- /docs/observability/policy.md
  - Metrics, tracing, logs, sample dashboards.
- /docs/security/policy-governance.md
  - Scopes, approvals, tenancy, least privilege.
- /docs/examples/policies/
  - baseline.pol, serverless.pol, internal-only.pol, each with commentary.
- /docs/faq/policy-faq.md
  - Common pitfalls, VEX conflict handling, determinism gotchas.
Each file includes a Compliance checklist for authors and reviewers.
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
8) Acceptance criteria
- Policies are versioned, approvable, and compilable; invalid DSL blocks merges.
- Engine produces deterministic outputs with full rationale chains.
- VEX precedence rules work per spec and are overridable by policy.
- Simulation yields accurate pre/post deltas and diffs.
- Only Policy Engine can write to effective_finding_*.
- Incremental runs pick up new advisories/VEX/SBOM changes without full re‑runs.
- Console and CLI cover authoring, simulation, approval, and retrieval.
- Observability dashboards show rule hits, VEX wins, and run timings.
9) Risks and mitigations
- Policy sprawl: too many similar policies.
  - Mitigation: templates, policy inheritance in v1.1, tagging, ownership metadata.
- Non‑determinism creep: someone sneaks wall‑clock or network into evaluation.
  - Mitigation: determinism guard, static analyzer, and CI replay check.
- Join miss‑rate: weak linksets cause under‑matching.
  - Mitigation: linkset strengthening in ingestion, PURL equivalence tables, monitoring for “zero‑hit” rates.
- Approval bottlenecks: blocked rollouts.
  - Mitigation: RBAC with delegated approvers and time‑boxed SLAs.
10) Test plan
- Unit: parser, compiler, evaluator; conflict resolution; precedence.
- Property: random policies over synthetic inputs; ensure no panics and stable outputs.
- Golden: fixed SBOM + curated advisories/VEX → expected findings; compare every run.
- Performance: large SBOMs with heavy rule sets; assert run times and memory ceilings.
- Integration: end‑to‑end simulate → approve → run → diff; verify write protections.
- Chaos: inject malformed VEX, missing advisories; ensure graceful degradation and clear errors.
11) Developer checklists
Definition of Ready
- Policy grammar finalized; examples prepared.
- Linkset join queries benchmarked.
- Owner and approvers assigned.
Definition of Done
- All APIs live with RBAC.
- CLI and Console features shipped.
- Determinism and golden tests green.
- Observability dashboards deployed.
- Docs in section 7 merged.
- Two real org policies migrated and in production.
12) Glossary
- Policy: versioned rule set controlling status and severity.
- DSL: domain‑specific language used to express rules.
- Run: a single evaluation execution with defined inputs and outputs.
- Simulation: a run that doesn’t write findings; returns diffs.
- Materialization: persisted effective findings for fast queries.
- Determinism: same inputs + same policy = same outputs. Always.
Final imposed reminder
Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.