
# Epic 2: Policy Engine & Policy Editor (VEX + Advisory Application Rules)

> Short name: **Policy Engine v2**
> Services touched: **Policy Engine, Web API, Console (Policy Editor), CLI, Conseiller, Excitator, SBOM Service, Authority, Workers/Scheduler**
> Data stores: **MongoDB (policies, runs, effective findings), optional Redis/NATS for jobs**

---

## 1) What it is

This epic delivers the **organization‑specific decision layer** for Stella. Ingestion is now AOC‑compliant (Epic 1). That means advisories and VEX arrive as immutable raw facts. This epic builds the place where those facts become **effective findings** under policies you control.

Core deliverables:

* **Policy Engine**: deterministic evaluator that applies rule sets to inputs:

  * Inputs: `advisory_raw`, `vex_raw`, SBOMs, optional telemetry hooks (reachability stubs), org metadata.
  * Outputs: `effective_finding_{policyId}` materializations, with full explanation traces.
* **Policy Editor (Console + CLI)**: versioned policy authoring, simulation, review/approval workflow, and change diffs.
* **Rules DSL v1**: safe, declarative language for VEX application, advisory normalization, and risk scoring. No arbitrary code execution, no network calls.
* **Run Orchestrator**: incremental re‑evaluation when new raw facts or SBOM changes arrive; efficient partial updates.

The philosophy is boring on purpose: policy is a **pure function of inputs**. Same inputs and same policy yield the same outputs, every time, on every machine. If you want drama, watch reality TV, not your risk pipeline.

---

## 2) Why

* Vendors disagree, contexts differ, and your tolerance for risk is not universal.
* VEX means nothing until you decide **how** to apply it to **your** assets.
* Auditors care about the “why.” You’ll need consistent, replayable answers, with traces.
* Security teams need **simulation** before rollouts, and **diffs** after.

---

## 3) How it should work (deep details)

### 3.1 Data model

#### 3.1.1 Policy documents (Mongo: `policies`)

```jsonc
{
  "_id": "policy:P-7:v3",
  "policy_id": "P-7",
  "version": 3,
  "name": "Default Org Policy",
  "status": "approved",        // draft | submitted | approved | archived
  "owned_by": "team:sec-plat",
  "valid_from": "2025-01-15T00:00:00Z",
  "valid_to": null,
  "dsl": {
    "syntax": "stella-dsl@1",
    "source": "rule-set text or compiled IR ref"
  },
  "metadata": {
    "description": "Baseline scoring + VEX precedence",
    "tags": ["baseline","vex","cvss"]
  },
  "provenance": {
    "created_by": "user:ali",
    "created_at": "2025-01-15T08:00:00Z",
    "submitted_by": "user:kay",
    "approved_by": "user:root",
    "approved_at": "2025-01-16T10:00:00Z",
    "checksum": "sha256:..."
  },
  "tenant": "default"
}
```

Constraints:

* `status=approved` is required to run in production.
* Versions are append‑only: every change creates a new version number, and old versions remain runnable for replay.
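The two constraints above (append‑only versions, approved‑only production runs) can be sketched as follows. This is an in‑memory illustration, not the Mongo implementation; `PolicyStore`, `assert_runnable_in_production`, and the assumption that `provenance.checksum` is a SHA‑256 over the DSL source are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PolicyVersion:
    policy_id: str
    version: int
    dsl_source: str
    status: str = "draft"  # draft | submitted | approved | archived

    @property
    def doc_id(self) -> str:
        # Same shape as the example "_id": policy:P-7:v3
        return f"policy:{self.policy_id}:v{self.version}"

    @property
    def checksum(self) -> str:
        # Assumption: provenance.checksum is SHA-256 over the UTF-8 DSL source.
        return "sha256:" + hashlib.sha256(self.dsl_source.encode("utf-8")).hexdigest()


class PolicyStore:
    """In-memory stand-in for the `policies` collection; versions are never overwritten."""

    def __init__(self) -> None:
        self._versions: dict[str, list[PolicyVersion]] = {}

    def append(self, policy_id: str, dsl_source: str) -> PolicyVersion:
        history = self._versions.setdefault(policy_id, [])
        pv = PolicyVersion(policy_id, len(history) + 1, dsl_source)
        history.append(pv)  # append-only: older versions stay available for replay
        return pv

    def get(self, policy_id: str, version: int) -> PolicyVersion:
        return self._versions[policy_id][version - 1]


def assert_runnable_in_production(pv: PolicyVersion) -> None:
    if pv.status != "approved":
        raise PermissionError(f"ERR_POL_002: {pv.doc_id} is {pv.status}, not approved")
```

Status is the only field that changes in place; the DSL text of a version is never edited, so a replay of `v1` always sees the same source.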

#### 3.1.2 Policy runs (Mongo: `policy_runs`)

```jsonc
{
  "_id": "run:P-7:2025-02-20T12:34:56Z:abcd",
  "policy_id": "P-7",
  "policy_version": 3,
  "inputs": {
    "sbom_set": ["sbom:S-42"],
    "advisory_cursor": "2025-02-20T00:00:00Z",
    "vex_cursor": "2025-02-20T00:00:00Z"
  },
  "mode": "incremental",   // full | incremental | simulate
  "stats": {
    "components": 1742,
    "advisories_considered": 9210,
    "vex_considered": 1187,
    "rules_fired": 68023,
    "findings_out": 4321
  },
  "trace": {
    "location": "blob://traces/run-.../index.json",
    "sampling": "smart-10pct"
  },
  "status": "succeeded",   // queued | running | failed | succeeded | canceled
  "started_at": "2025-02-20T12:34:56Z",
  "finished_at": "2025-02-20T12:35:41Z",
  "tenant": "default"
}
```
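The run document lists the possible `status` values but not which transitions are legal. A sketch of one plausible lifecycle is below; the transition graph and the meaning of the trailing `_id` suffix are assumptions, only the state names, modes, and ID shape come from the example.

```python
# Assumed lifecycle for policy_runs.status; the epic lists the states, not the edges.
ALLOWED_TRANSITIONS: dict[str, frozenset[str]] = {
    "queued": frozenset({"running", "canceled"}),
    "running": frozenset({"succeeded", "failed", "canceled"}),
    "succeeded": frozenset(),  # terminal
    "failed": frozenset(),     # terminal
    "canceled": frozenset(),   # terminal
}

RUN_MODES = frozenset({"full", "incremental", "simulate"})


def make_run_id(policy_id: str, started_at: str, suffix: str) -> str:
    # Mirrors the example "_id": run:P-7:2025-02-20T12:34:56Z:abcd
    # (`suffix` is a hypothetical disambiguator for runs started in the same second).
    return f"run:{policy_id}:{started_at}:{suffix}"


def transition(current: str, target: str) -> str:
    if target not in ALLOWED_TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal run transition {current} -> {target}")
    return target
```

Keeping terminal states terminal means a finished run's stats and trace pointer never change after the fact, which is what replay and audit rely on.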

#### 3.1.3 Effective findings (Mongo: `effective_finding_P-7`)

```jsonc
{
  "_id": "P-7:S-42:pkg:npm/lodash@4.17.21:CVE-2021-23337",
  "policy_id": "P-7",
  "policy_version": 3,
  "sbom_id": "S-42",
  "component_purl": "pkg:npm/lodash@4.17.21",
  "advisory_ids": ["CVE-2021-23337", "GHSA-..."],
  "status": "not_affected", // affected | not_affected | fixed | under_investigation | suppressed
  "severity": {
    "normalized": "High",
    "score": 7.5,
    "vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N",
    "rationale": "cvss_base(OSV) + vendor_weighting + env_modifiers"
  },
  "rationale": [
    {"rule":"vex.precedence","detail":"VendorX not_affected justified=component_not_present wins"},
    {"rule":"advisory.cvss.normalization","detail":"mapped GHSA severity to CVSS 3.1 = 7.5"}
  ],
  "references": {
    "advisory_raw_ids": ["advisory_raw:osv:GHSA-...:v3"],
    "vex_raw_ids": ["vex_raw:VendorX:doc-123:v4"]
  },
  "run_id": "run:P-7:2025-02-20T12:34:56Z:abcd",
  "tenant": "default"
}
```

Write protection: only the **Policy Engine** service identity may write any `effective_finding_*` collection.
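A minimal sketch of the finding key and the write guard, assuming the key is simply the colon‑joined tuple shown in the example `_id` and that the guard checks for the `effective:write` scope defined in the Authority tasks. Function names are hypothetical.

```python
def finding_id(policy_id: str, sbom_id: str, component_purl: str, advisory_id: str) -> str:
    # Mirrors the example key: P-7:S-42:pkg:npm/lodash@4.17.21:CVE-2021-23337
    return f"{policy_id}:{sbom_id}:{component_purl}:{advisory_id}"


def collection_name(policy_id: str) -> str:
    return f"effective_finding_{policy_id}"


def assert_can_write_findings(caller_scopes: set[str]) -> None:
    # Only the Policy Engine service identity is issued effective:write.
    if "effective:write" not in caller_scopes:
        raise PermissionError("ERR_POL_005: write denied to effective findings")
```

PURLs themselves contain colons, so the key should be treated as opaque: look findings up by the stored `sbom_id`, `component_purl`, and `advisory_ids` fields rather than by splitting `_id`.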

---

### 3.2 Rules DSL v1 (stella‑dsl@1)

**Design goals**

* Declarative, composable, deterministic.
* No loops, no network I/O, no wall‑clock time, no randomness.
* Policy authors see readable text; the engine compiles to a safe IR.

**Concepts**

* **WHEN** condition matches a tuple `(sbom_component, advisory, optional vex_statements)`
* **THEN** actions set `status`, compute `severity`, attach `rationale`, or `suppress` with reason.
* **Profiles** for severity and scoring; **Maps** for vendor weighting; **Guards** for VEX justification.

**Mini‑grammar (subset)**

```
policy "Default Org Policy" syntax "stella-dsl@1" {

  profile severity {
    map vendor_weight {
      source "GHSA" => +0.5
      source "OSV"  => +0.0
      source "VendorX" => -0.2
    }
    env base_cvss {
      if env.runtime == "serverless" then -0.5
      if env.exposure == "internal-only" then -1.0
    }
  }

  rule vex_precedence {
    when vex.any(status in ["not_affected","fixed"])
      and vex.justification in ["component_not_present","vulnerable_code_not_present"]
    then status := vex.status
         because "VEX strong justification prevails";
  }

  rule advisory_to_cvss {
    when advisory.source in ["GHSA","OSV"]
    then severity := normalize_cvss(advisory)
         because "Map vendor severity or CVSS vector";
  }

  rule reachability_soft_suppress {
    when severity.normalized <= "Medium"
      and telemetry.reachability == "none"
    then status := "suppressed"
         because "not reachable and low severity";
  }
}
```

**Built‑ins** (non‑exhaustive)

* `normalize_cvss(advisory)` maps GHSA/OSV/CSAF severity fields to CVSS v3.1 scores when possible; otherwise it falls back to the vendor‑to‑numeric mapping table defined in the policy.
* `vex.any(...)` is true if at least one VEX statement matching the same `(component, advisory)` satisfies the predicate.
* `telemetry.*` is an optional input namespace reserved for future reachability data; if absent, expressions evaluate to `unknown` (no effect).
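When an advisory carries a valid CVSS v3.1 vector, the "when possible" branch of `normalize_cvss` reduces to the base‑score equations from the FIRST CVSS v3.1 specification. The sketch below implements only that branch; the function names are hypothetical and the vendor fallback table is left to the policy.

```python
import math

_W = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on Scope (U = unchanged, C = changed).
_PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27}, "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def _roundup(value: float) -> float:
    # CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe.
    as_int = round(value * 100_000)
    if as_int % 10_000 == 0:
        return as_int / 100_000.0
    return (math.floor(as_int / 10_000) + 1) / 10.0


def cvss31_base_score(vector: str) -> float:
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are handled in this sketch")
    m = dict(p.split(":", 1) for p in parts[1:])
    scope = m["S"]
    iss = 1 - ((1 - _W["CIA"][m["C"]]) * (1 - _W["CIA"][m["I"]]) * (1 - _W["CIA"][m["A"]]))
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (8.22 * _W["AV"][m["AV"]] * _W["AC"][m["AC"]]
                      * _PR[scope][m["PR"]] * _W["UI"][m["UI"]])
    if impact <= 0:
        return 0.0
    if scope == "U":
        return _roundup(min(impact + exploitability, 10))
    return _roundup(min(1.08 * (impact + exploitability), 10))


def severity_label(score: float) -> str:
    # Qualitative rating scale from the CVSS v3.1 specification.
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```

For example, `cvss31_base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N")` yields `7.5`, which `severity_label` maps to `"High"`. The specification's `Roundup` is used instead of `round()` so that every machine produces the same one‑decimal score.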

**Determinism**

* Rules are evaluated in **stable order**: explicit `priority` attribute or lexical order.
* **First‑match** semantics for conflicting status unless `combine` is used.
* Severity computations are pure; numeric maps are part of policy document.
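The ordering and first‑match rules above can be sketched like this. It assumes a lower `priority` number runs earlier, rules without a priority follow in source (declaration) order, and every rule is still evaluated so the trace records all hits; `Rule` and `evaluate_status` are hypothetical stand‑ins for the compiled IR.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Context = dict[str, Any]  # sbom_component, advisory, vex, env, telemetry


@dataclass(frozen=True)
class Rule:
    name: str
    when: Callable[[Context], bool]
    sets_status: Optional[str] = None
    priority: Optional[int] = None  # assumption: lower number runs first


def evaluation_order(rules: list[Rule]) -> list[Rule]:
    # Prioritized rules first (ascending), then the rest in declaration order.
    indexed = list(enumerate(rules))
    indexed.sort(key=lambda pair: (pair[1].priority is None, pair[1].priority or 0, pair[0]))
    return [rule for _, rule in indexed]


def evaluate_status(rules: list[Rule], ctx: Context) -> tuple[str, list[str]]:
    fired: list[str] = []
    status: Optional[str] = None
    for rule in evaluation_order(rules):
        if rule.when(ctx):
            fired.append(rule.name)  # every hit is traced, in order
            if status is None and rule.sets_status is not None:
                status = rule.sets_status  # first match wins
    return (status or "affected"), fired  # default when no rule sets status
```

Sorting on `(has_priority, priority, declaration_index)` gives a total order, so two machines compiling the same policy version always run rules in the same sequence.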

---

### 3.3 Evaluation model

1. **Selection**

   * For each SBOM component PURL, find candidate advisories from `advisory_raw` via linkset PURLs or identifiers.
   * For each pair `(component, advisory)`, load all matching VEX facts from `vex_raw`.

2. **Context assembly**

   * Build an evaluation context from:

     * `sbom_component`: PURL, licenses, relationships.
     * `advisory`: source, identifiers, references, embedded vendor severity (kept in `content.raw`).
     * `vex`: list of statements with status and justification.
     * `env`: org‑specific env vars configured per policy run (e.g., exposure).
     * Optional `telemetry` if available.

3. **Rule execution**

   * Compile DSL to IR once per policy version; cache.
   * Execute rules per tuple; record which rules fired and in what order.
   * If no rule sets status, default is `affected`.
   * If no rule sets severity, default severity uses `normalize_cvss(advisory)` with vendor defaults.

4. **Materialization**

   * Write to `effective_finding_{policyId}` with `rationale` chain and references to raw docs.
   * Emit per‑tuple trace events; sample and store full traces per run.

5. **Incremental updates**

   * A watch job observes new `advisory_raw` and `vex_raw` inserts and SBOM deltas.
   * The orchestrator computes the affected tuples and re‑evaluates only those.

6. **Replay**

   * Any `policy_run` is fully reproducible from `(policy_id, version, input set, cursors, env)`.
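Steps 1 and 5 amount to a join over linkset PURLs. A minimal in‑memory sketch, assuming raw documents expose `linkset.purls` (pre‑indexed in Epic 1) and that VEX facts have already been grouped by `(purl, advisory_id)`; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# (component_purl, advisory_raw_id, [vex_raw_id, ...])
Selected = tuple[str, str, list[str]]


def index_by_purl(raw_docs: Iterable[dict]) -> dict[str, list[str]]:
    # Map each linkset PURL to the raw document IDs that mention it.
    index: dict[str, list[str]] = defaultdict(list)
    for doc in raw_docs:
        for purl in doc["linkset"]["purls"]:
            index[purl].append(doc["_id"])
    return index


def select(component_purls: Iterable[str],
           advisories_by_purl: dict[str, list[str]],
           vex_by_pair: dict[tuple[str, str], list[str]]) -> Iterator[Selected]:
    # Generator: streams tuples instead of materializing the whole SBOM join.
    for purl in component_purls:
        for adv_id in sorted(advisories_by_purl.get(purl, ())):
            yield purl, adv_id, sorted(vex_by_pair.get((purl, adv_id), ()))


def tuples_to_reevaluate(changed_purls: Iterable[str],
                         advisories_by_purl: dict[str, list[str]]) -> set[tuple[str, str]]:
    # Incremental mode: only pairs touching a PURL from a new raw fact or SBOM delta.
    return {(p, a) for p in changed_purls for a in advisories_by_purl.get(p, ())}
```

For a new `advisory_raw` or `vex_raw` insert, the changed PURLs are that document's `linkset.purls`; for an SBOM delta, they are the added or modified components.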

---

### 3.4 VEX application semantics

* **Precedence**: a `not_affected` statement with a strong justification (`component_not_present`, `vulnerable_code_not_present`, `fix_not_required`) wins unless another rule explicitly overrides it based on environment context.
* **Scoping**: VEX statements often specify product/component scope. Matching uses PURL equivalence and version ranges extracted during ingestion linkset generation.
* **Conflicts**: If multiple VEX statements conflict, the default is **most‑specific scope wins** (component > product > vendor), then newest `document_version`. Policies can override with explicit rules.
* **Explainability**: Every VEX‑driven decision records which statement IDs were considered and which one won.
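The default conflict rule (most‑specific scope, then newest `document_version`) can be expressed as a single sort key. This sketch covers only that default, not the strong‑justification precedence; the `VexStatement` shape, integer document versions, and the final `statement_id` tie‑breaker are assumptions added so that the result is always deterministic.

```python
from dataclasses import dataclass
from typing import Optional

# Most-specific scope wins: component > product > vendor.
SCOPE_RANK = {"component": 3, "product": 2, "vendor": 1}


@dataclass(frozen=True)
class VexStatement:
    statement_id: str
    status: str            # affected | not_affected | fixed | under_investigation
    scope: str             # component | product | vendor
    document_version: int
    justification: Optional[str] = None


def pick_winner(statements: list[VexStatement]) -> tuple[VexStatement, list[str]]:
    if not statements:
        raise ValueError("no VEX statements to resolve")
    # Scope first, then newest document_version, then statement_id as a stable tie-break.
    winner = max(statements, key=lambda s: (SCOPE_RANK[s.scope], s.document_version, s.statement_id))
    considered = sorted(s.statement_id for s in statements)
    return winner, considered  # both go into the finding's rationale
```

Returning the considered IDs alongside the winner gives the explainability requirement above something concrete to record.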

---

### 3.5 Advisory normalization rules

* **Vendor severity mapping**: Map GHSA levels or CSAF product‑tree severities to CVSS‑like numeric bands via policy maps.
* **CVSS vector use**: If a valid vector exists in `content.raw`, parse and compute; apply policy modifiers from `profile severity`.
* **Temporal/environment modifiers**: Optional reductions for network exposure, isolation, or compensating controls, all encoded in policy.
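A sketch of how the policy maps might turn a vendor level into a number and then apply modifiers. The vendor weights and env deltas reuse the values from the `profile severity` example in 3.2; the GHSA band values are placeholders, and applying weights additively with a clamp to the 0 to 10 range is an assumption.

```python
# Hypothetical policy maps. Weights and env deltas mirror the 3.2 example;
# the GHSA band values are placeholders a real policy would define.
GHSA_BANDS = {"low": 2.0, "moderate": 5.5, "high": 8.0, "critical": 9.5}
VENDOR_WEIGHT = {"GHSA": 0.5, "OSV": 0.0, "VendorX": -0.2}
ENV_MODIFIERS = [
    ("runtime", "serverless", -0.5),
    ("exposure", "internal-only", -1.0),
]


def adjusted_score(base: float, source: str, env: dict[str, str]) -> float:
    score = base + VENDOR_WEIGHT.get(source, 0.0)  # unknown sources get no weighting
    for key, value, delta in ENV_MODIFIERS:
        if env.get(key) == value:
            score += delta
    # Clamp to the CVSS range and keep one decimal so results compare stably.
    return round(min(max(score, 0.0), 10.0), 1)
```

With these maps, a GHSA `high` advisory in an `internal-only` environment scores `8.0 + 0.5 - 1.0 = 7.5`.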

---

### 3.6 Performance and scale

* Partition evaluation by SBOM ID and hash ranges of PURLs.
* Pre‑index `advisory_raw.linkset.purls` and `vex_raw.linkset.purls` (already in Epic 1).
* Use streaming iterators; avoid loading entire SBOM or advisory sets into memory.
* Materialize only changed findings (diff‑aware writes).
* Target: incremental runs over 100k components with 1M candidate advisories complete within 5 minutes on commodity hardware.
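Two of the techniques above, hash‑range partitioning and diff‑aware writes, fit in a few lines. This is a sketch under assumptions: `run_id` is treated as the only volatile field of a finding, and the function names are hypothetical.

```python
import hashlib
import json

VOLATILE_KEYS = frozenset({"run_id"})  # assumption: changes on every run, not a real diff


def partition_for(purl: str, partitions: int) -> int:
    # Stable digest, not Python's hash(), which is salted per process.
    digest = hashlib.sha256(purl.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def content_hash(finding: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so equal findings hash equally.
    stable = {k: v for k, v in finding.items() if k not in VOLATILE_KEYS}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_only(new_findings: dict[str, dict], stored_hashes: dict[str, str]) -> dict[str, dict]:
    """Return only findings whose content differs from what is already materialized."""
    return {
        fid: doc
        for fid, doc in new_findings.items()
        if stored_hashes.get(fid) != content_hash(doc)
    }
```

Using a stable digest matters for determinism: Python's built‑in `hash()` of a string varies between processes, so two workers could otherwise disagree on which partition owns a PURL.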

---

### 3.7 Error codes

| Code          | Meaning                                               | HTTP |
| ------------- | ----------------------------------------------------- | ---- |
| `ERR_POL_001` | Policy syntax error                                   | 400  |
| `ERR_POL_002` | Policy not approved for run                           | 403  |
| `ERR_POL_003` | Missing inputs (SBOM/advisory/vex fetch failed)       | 424  |
| `ERR_POL_004` | Determinism guard triggered (non‑pure function usage) | 500  |
| `ERR_POL_005` | Write denied to effective findings (caller invalid)   | 403  |
| `ERR_POL_006` | Run canceled or timed out                             | 408  |
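Keeping the table in one place avoids drift between the Web API and CLI. A sketch of the table as an enum plus a response mapper; the class names and the JSON error body shape are hypothetical, while the codes, meanings, and HTTP statuses come straight from the table.

```python
from enum import Enum


class PolicyError(Enum):
    # (meaning, HTTP status), taken from the error-code table.
    ERR_POL_001 = ("Policy syntax error", 400)
    ERR_POL_002 = ("Policy not approved for run", 403)
    ERR_POL_003 = ("Missing inputs (SBOM/advisory/VEX fetch failed)", 424)
    ERR_POL_004 = ("Determinism guard triggered (non-pure function usage)", 500)
    ERR_POL_005 = ("Write denied to effective findings (caller invalid)", 403)
    ERR_POL_006 = ("Run canceled or timed out", 408)

    @property
    def meaning(self) -> str:
        return self.value[0]

    @property
    def http_status(self) -> int:
        return self.value[1]


class PolicyEngineError(Exception):
    def __init__(self, code: PolicyError, detail: str = "") -> None:
        suffix = f" ({detail})" if detail else ""
        super().__init__(f"{code.name}: {code.meaning}{suffix}")
        self.code = code


def to_http_response(err: PolicyEngineError) -> tuple[int, dict]:
    # Hypothetical body shape; the stable part is the ERR_POL_* code.
    return err.code.http_status, {"error": err.code.name, "message": str(err)}
```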

---

### 3.8 Observability

* Metrics:

  * `policy_compile_seconds`, `policy_run_seconds{mode=...}`, `rules_fired_total`, `findings_written_total`, `vex_overrides_total`, `simulate_diff_total{delta=up|down|unchanged}`.
* Tracing:

  * Spans: `policy.compile`, `policy.select`, `policy.eval`, `policy.materialize`.
* Logs:

  * Include `policy_id`, `version`, `run_id`, `sbom_id`, `component_purl`, `advisory_id`, `vex_count`, `rule_hits`.
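A sketch of a structured evaluation log line that refuses to emit without the required fields. The helper name and the JSON‑lines format are assumptions; the field list is the one above.

```python
import json
import logging

REQUIRED_FIELDS = ("policy_id", "version", "run_id", "sbom_id",
                   "component_purl", "advisory_id", "vex_count", "rule_hits")

logger = logging.getLogger("policy.eval")


def eval_log_record(**fields) -> str:
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing log fields: {missing}")
    # Sorted keys keep log lines diffable across runs.
    return json.dumps(fields, sort_keys=True, separators=(",", ":"))
```

Usage: `logger.info(eval_log_record(policy_id="P-7", version=3, ...))`, one line per evaluated tuple.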

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

### 3.9 Security and tenancy

* Only users with `policy:write` can create/modify policies.
* `policy:approve` is a separate privileged role.
* Only Policy Engine service identity has `effective:write`.
* Tenancy is explicit on all documents and queries.
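A minimal sketch of the scope and tenancy checks, assuming the gateway has already validated the token and exposes the caller's tenant and scopes as claims. `Caller`, `authorize`, and `tenant_filter` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    subject: str
    tenant: str
    scopes: frozenset[str]


def authorize(caller: Caller, required_scope: str, resource_tenant: str) -> None:
    # Tenancy first: a caller never acts on another tenant's documents.
    if caller.tenant != resource_tenant:
        raise PermissionError("tenant mismatch")
    if required_scope not in caller.scopes:
        raise PermissionError(f"missing scope {required_scope}")


def tenant_filter(caller: Caller, query: dict) -> dict:
    # Every query is explicitly scoped; a caller-supplied tenant is overwritten.
    return {**query, "tenant": caller.tenant}
```

Because `policy:approve` is a separate scope, an author holding only `policy:write` cannot approve their own draft through this check.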

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

## 4) API surface

### 4.1 Policy CRUD and lifecycle

* `POST /policies` create draft
* `GET /policies?status=...` list
* `GET /policies/{policyId}/versions/{v}` fetch
* `POST /policies/{policyId}/submit` move draft to submitted
* `POST /policies/{policyId}/approve` approve version
* `POST /policies/{policyId}/archive` archive version

### 4.2 Compilation and validation

* `POST /policies/{policyId}/versions/{v}/compile`

  * Returns IR checksum, syntax diagnostics, rule stats.

### 4.3 Runs

* `POST /policies/{policyId}/runs` body: `{mode, sbom_set, advisory_cursor?, vex_cursor?, env?}`
* `GET /policies/{policyId}/runs/{runId}` status + stats
* `POST /policies/{policyId}/simulate` returns **diff** vs current approved version on a sample SBOM set.
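A sketch of the diff that `simulate` could compute, bucketing each finding by severity movement to match the `simulate_diff_total{delta=up|down|unchanged}` metric. The separate `added`/`removed` buckets and the function names are assumptions.

```python
from collections import Counter


def simulate_diff(current: dict[str, dict], candidate: dict[str, dict]) -> Counter:
    """Compare findings keyed by finding id: approved policy vs candidate policy."""
    deltas: Counter = Counter()
    for fid in sorted(current.keys() | candidate.keys()):
        before, after = current.get(fid), candidate.get(fid)
        if before is None:
            deltas["added"] += 1
        elif after is None:
            deltas["removed"] += 1
        elif after["severity"]["score"] > before["severity"]["score"]:
            deltas["up"] += 1
        elif after["severity"]["score"] < before["severity"]["score"]:
            deltas["down"] += 1
        else:
            deltas["unchanged"] += 1
    return deltas


def affected_count(findings: dict[str, dict]) -> int:
    # Feeds the Console's before/after "affected" counts.
    return sum(1 for f in findings.values() if f["status"] == "affected")
```

Severity deltas alone miss status flips (for example `affected` to `suppressed` at the same score), which is why the affected counts are reported alongside them.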

### 4.4 Findings and explanations

* `GET /findings/{policyId}?sbom_id=S-42&status=affected&severity=High+Critical`
* `GET /findings/{policyId}/{findingId}/explain` returns ordered rule hits and linked raw IDs.

All endpoints require tenant scoping and appropriate `policy:*` or `findings:*` roles.

---

## 5) Console (Policy Editor) and CLI behavior

**Console**

* Monaco‑style editor with DSL syntax highlighting, lint, quick docs.
* Side‑by‑side **Simulation** panel: show count of affected findings before/after.
* Approval workflow: submit, review comments, approve with rationale.
* Diffs: show rule‑wise changes and estimated impact.
* Read‑only run viewer: heatmap of rules fired, top suppressions, VEX wins.

**CLI**

* `stella policy new --name "Default Org Policy"`
* `stella policy edit P-7` opens the policy in a local editor; follow with `stella policy submit P-7`
* `stella policy approve P-7 --version 3`
* `stella policy simulate P-7 --sbom S-42 --env exposure=internal-only`
* `stella findings ls --policy P-7 --sbom S-42 --status affected`

Exit codes map to `ERR_POL_*`.

---

## 6) Implementation tasks

### 6.1 Policy Engine service

* [ ] Implement DSL parser and IR compiler (`stella-dsl@1`).
* [ ] Build evaluator with stable ordering and first‑match semantics.
* [ ] Implement selection joiners for SBOM↔advisory↔vex using linksets.
* [ ] Materialization writer with upsert‑only semantics to `effective_finding_{policyId}`.
* [ ] Determinism guard (ban wall‑clock, network, and RNG during eval).
* [ ] Incremental orchestrator listening to advisory/vex/SBOM change streams.
* [ ] Trace emitter with rule‑hit sampling.
* [ ] Unit tests, property tests, golden fixtures; perf tests to target SLA.

**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

### 6.2 Web API

* [ ] Policy CRUD, compile, run, simulate, findings, explain endpoints.
* [ ] Pagination, filters, and tenant enforcement on all list endpoints.
* [ ] Error mapping to `ERR_POL_*`.
* [ ] Rate limits on simulate endpoints.

**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

### 6.3 Console (Policy Editor)

* [ ] Editor with DSL syntax highlighting and inline diagnostics.
* [ ] Simulation UI with pre/post counts and top deltas.
* [ ] Approval workflow UI with audit trail.
* [ ] Run viewer dashboards (rule heatmap, VEX wins, suppressions).

**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

### 6.4 CLI

* [ ] New commands: `policy new|edit|submit|approve|simulate`, `findings ls|get`.
* [ ] JSON/YAML output formats for CI consumption.
* [ ] Non‑zero exits on syntax errors or simulation failures; map to `ERR_POL_*`.

**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

### 6.5 Conseiller & Excitator integration

* [ ] Provide search endpoints optimized for policy selection (batch by PURLs and IDs).
* [ ] Harden linkset extraction to maximize join recall.
* [ ] Add cursors for incremental selection windows per run.

**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

### 6.6 SBOM Service

* [ ] Ensure fast PURL index and component metadata projection for policy queries.
* [ ] Provide relationship graph API for future transitive logic.
* [ ] Emit change events on SBOM updates.

**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

### 6.7 Authority

* [ ] Define scopes: `policy:write`, `policy:approve`, `policy:run`, `findings:read`, `effective:write`.
* [ ] Issue service identity for Policy Engine with `effective:write` only.
* [ ] Enforce tenant claims at gateway.

**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

### 6.8 CI/CD

* [ ] Lint policy DSL in PRs; block invalid syntax.
* [ ] Run `simulate` against golden SBOMs to detect explosive deltas.
* [ ] Determinism CI: two runs with identical inputs and the same policy version produce identical outputs.
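The determinism check can compare an order‑independent digest of two runs' outputs. A sketch assuming findings carry `_id` and that `run_id` is the only field expected to differ between runs; `output_digest` and `check_determinism` are hypothetical names.

```python
import hashlib
import json
from typing import Callable, Iterable


def output_digest(findings: Iterable[dict]) -> str:
    # Drop run_id (differs per run by design), sort by _id, serialize canonically.
    rows = sorted(({k: v for k, v in f.items() if k != "run_id"} for f in findings),
                  key=lambda row: row["_id"])
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_determinism(run: Callable[[], list[dict]]) -> bool:
    # Execute the same run twice and compare digests.
    return output_digest(run()) == output_digest(run())
```

Sorting before hashing keeps the check independent of write order, so parallel partitions finishing in different orders do not trigger false failures.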

**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

## 7) Documentation changes (create/update these files)

1. **`/docs/policy/overview.md`**

   * What the Policy Engine is, high‑level concepts, inputs, outputs, determinism.
2. **`/docs/policy/dsl.md`**

   * Full grammar, built‑ins, examples, best practices, anti‑patterns.
3. **`/docs/policy/lifecycle.md`**

   * Draft → submitted → approved → archived, roles, and audit trail.
4. **`/docs/policy/runs.md`**

   * Run modes, incremental mechanics, cursors, replay.
5. **`/docs/api/policy.md`**

   * Endpoints, request/response schemas, error codes.
6. **`/docs/cli/policy.md`**

   * Command usage, examples, exit codes, JSON output contracts.
7. **`/docs/ui/policy-editor.md`**

   * Screens, workflows, simulation, diffs, approvals.
8. **`/docs/architecture/policy-engine.md`**

   * Detailed sequence diagrams, selection/join strategy, materialization schema.
9. **`/docs/observability/policy.md`**

   * Metrics, tracing, logs, sample dashboards.
10. **`/docs/security/policy-governance.md`**

    * Scopes, approvals, tenancy, least privilege.
11. **`/docs/examples/policies/`**

    * `baseline.pol`, `serverless.pol`, `internal-only.pol`, each with commentary.
12. **`/docs/faq/policy-faq.md`**

    * Common pitfalls, VEX conflict handling, determinism gotchas.

Each file includes a **Compliance checklist** for authors and reviewers.

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

## 8) Acceptance criteria

* Policies are versioned, approvable, and compilable; invalid DSL blocks merges.
* Engine produces deterministic outputs with full rationale chains.
* VEX precedence rules work per spec and are overridable by policy.
* Simulation yields accurate pre/post deltas and diffs.
* Only Policy Engine can write to `effective_finding_*`.
* Incremental runs pick up new advisories/VEX/SBOM changes without full re‑runs.
* Console and CLI cover authoring, simulation, approval, and retrieval.
* Observability dashboards show rule hits, VEX wins, and run timings.

---

## 9) Risks and mitigations

* **Policy sprawl**: too many similar policies.

  * Mitigation: templates, policy inheritance in v1.1, tagging, ownership metadata.
* **Non‑determinism creep**: someone sneaks wall‑clock or network into evaluation.

  * Mitigation: determinism guard, static analyzer, and CI replay check.
* **Join miss‑rate**: weak linksets cause under‑matching.

  * Mitigation: linkset strengthening in ingestion, PURL equivalence tables, monitoring for “zero‑hit” rates.
* **Approval bottlenecks**: blocked rollouts.

  * Mitigation: RBAC with delegated approvers and time‑boxed SLAs.

---

## 10) Test plan

* **Unit**: parser, compiler, evaluator; conflict resolution; precedence.
* **Property**: random policies over synthetic inputs; ensure no panics and stable outputs.
* **Golden**: fixed SBOM + curated advisories/VEX → expected findings; compare every run.
* **Performance**: large SBOMs with heavy rule sets; assert run times and memory ceilings.
* **Integration**: end‑to‑end simulate → approve → run → diff; verify write protections.
* **Chaos**: inject malformed VEX, missing advisories; ensure graceful degradation and clear errors.

---

## 11) Developer checklists

**Definition of Ready**

* Policy grammar finalized; examples prepared.
* Linkset join queries benchmarked.
* Owner and approvers assigned.

**Definition of Done**

* All APIs live with RBAC.
* CLI and Console features shipped.
* Determinism and golden tests green.
* Observability dashboards deployed.
* Docs in section 7 merged.
* Two real org policies migrated and in production.

---

## 12) Glossary

* **Policy**: versioned rule set controlling status and severity.
* **DSL**: domain‑specific language used to express rules.
* **Run**: a single evaluation execution with defined inputs and outputs.
* **Simulation**: a run that doesn’t write findings; returns diffs.
* **Materialization**: persisted effective findings for fast queries.
* **Determinism**: same inputs + same policy = same outputs. Always.

---

### Final imposed reminder

**Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.**