Restructure solution layout by module

2025-10-28 15:10:40 +02:00

Fine. Here’s the next epic, written so you can paste it straight into the repo without having to babysit me. Same structure as before, maximum detail, zero hand‑waving.

Epic 2: Policy Engine & Policy Editor (VEX + Advisory Application Rules)

Short name: Policy Engine v2
Services touched: Policy Engine, Web API, Console (Policy Editor), CLI, Conseiller, Excitator, SBOM Service, Authority, Workers/Scheduler
Data stores: MongoDB (policies, runs, effective findings), optional Redis/NATS for jobs

1) What it is

This epic delivers the organization‑specific decision layer for Stella. Ingestion is now AOC‑compliant (Epic 1). That means advisories and VEX arrive as immutable raw facts. This epic builds the place where those facts become effective findings under policies you control.

Core deliverables:

Policy Engine: deterministic evaluator that applies rule sets to inputs:
- Inputs: advisory_raw, vex_raw, SBOMs, optional telemetry hooks (reachability stubs), org metadata.
- Outputs: effective_finding_{policyId} materializations, with full explanation traces.
Policy Editor (Console + CLI): versioned policy authoring, simulation, review/approval workflow, and change diffs.
Rules DSL v1: safe, declarative language for VEX application, advisory normalization, and risk scoring. No arbitrary code execution, no network calls.
Run Orchestrator: incremental re‑evaluation when new raw facts or SBOM changes arrive; efficient partial updates.

The philosophy is boring on purpose: policy is a pure function of inputs. Same inputs and same policy yield the same outputs, every time, on every machine. If you want drama, watch reality TV, not your risk pipeline.

2) Why

Vendors disagree, contexts differ, and your tolerance for risk is not universal.
VEX means nothing until you decide how to apply it to your assets.
Auditors care about the “why.” You’ll need consistent, replayable answers, with traces.
Security teams need simulation before rollouts, and diffs after.

3) How it should work (deep details)

3.1 Data model

3.1.1 Policy documents (Mongo: `policies`)

{
  "_id": "policy:P-7:v3",
  "policy_id": "P-7",
  "version": 3,
  "name": "Default Org Policy",
  "status": "approved",        // draft | submitted | approved | archived
  "owned_by": "team:sec-plat",
  "valid_from": "2025-01-15T00:00:00Z",
  "valid_to": null,
  "dsl": {
    "syntax": "stella-dsl@1",
    "source": "rule-set text or compiled IR ref"
  },
  "metadata": {
    "description": "Baseline scoring + VEX precedence",
    "tags": ["baseline","vex","cvss"]
  },
  "provenance": {
    "created_by": "user:ali",
    "created_at": "2025-01-15T08:00:00Z",
    "submitted_by": "user:kay",
    "approved_by": "user:root",
    "approval_at": "2025-01-16T10:00:00Z",
    "checksum": "sha256:..."
  },
  "tenant": "default"
}

Constraints:

status=approved is required to run in production.
Version increments are append‑only. Old versions remain runnable for replay.
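
The two constraints above can be sketched as small guards. This is a minimal illustration, not the engine's implementation: `PolicyVersion`, `can_run_in_production`, and `append_version` are hypothetical names, and the in-memory list stands in for the `policies` collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyVersion:
    policy_id: str
    version: int
    status: str  # draft | submitted | approved | archived


def can_run_in_production(policy: PolicyVersion) -> bool:
    # Only approved versions may run in production.
    return policy.status == "approved"


def append_version(history: list, policy_id: str) -> list:
    """Return a new history with the next version appended as a draft.

    Existing entries are never mutated or removed, so old versions
    stay available for replay.
    """
    next_no = max((p.version for p in history), default=0) + 1
    return [*history, PolicyVersion(policy_id, next_no, "draft")]
```

Freezing the dataclass and returning a new list keeps the append-only property visible in the types: there is no code path that edits an existing version in place.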

3.1.2 Policy runs (Mongo: `policy_runs`)

{
  "_id": "run:P-7:2025-02-20T12:34:56Z:abcd",
  "policy_id": "P-7",
  "policy_version": 3,
  "inputs": {
    "sbom_set": ["sbom:S-42"],
    "advisory_cursor": "2025-02-20T00:00:00Z",
    "vex_cursor": "2025-02-20T00:00:00Z"
  },
  "mode": "incremental",   // full | incremental | simulate
  "stats": {
    "components": 1742,
    "advisories_considered": 9210,
    "vex_considered": 1187,
    "rules_fired": 68023,
    "findings_out": 4321
  },
  "trace": {
    "location": "blob://traces/run-.../index.json",
    "sampling": "smart-10pct"
  },
  "status": "succeeded",   // queued | running | failed | succeeded | canceled
  "started_at": "2025-02-20T12:34:56Z",
  "finished_at": "2025-02-20T12:35:41Z",
  "tenant": "default"
}

3.1.3 Effective findings (Mongo: `effective_finding_P-7`)

{
  "_id": "P-7:S-42:pkg:npm/lodash@4.17.21:CVE-2021-23337",
  "policy_id": "P-7",
  "policy_version": 3,
  "sbom_id": "S-42",
  "component_purl": "pkg:npm/lodash@4.17.21",
  "advisory_ids": ["CVE-2021-23337", "GHSA-..."],
  "status": "affected",     // affected | not_affected | fixed | under_investigation | suppressed
  "severity": {
    "normalized": "High",
    "score": 7.5,
    "vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N",
    "rationale": "cvss_base(OSV) + vendor_weighting + env_modifiers"
  },
  "rationale": [
    {"rule":"vex.precedence","detail":"VendorX not_affected justified=component_not_present wins"},
    {"rule":"advisory.cvss.normalization","detail":"mapped GHSA severity to CVSS 3.1 = 7.5"}
  ],
  "references": {
    "advisory_raw_ids": ["advisory_raw:osv:GHSA-...:v3"],
    "vex_raw_ids": ["vex_raw:VendorX:doc-123:v4"]
  },
  "run_id": "run:P-7:2025-02-20T12:34:56Z:abcd",
  "tenant": "default"
}

Write protection: only the Policy Engine service identity may write any effective_finding_* collection.
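
The sample document suggests that a finding `_id` is composed from the policy, SBOM, component PURL, and an advisory identifier, and that each policy gets its own collection. A minimal sketch of that naming follows; the helper names are hypothetical, and which advisory ID is used when several are linked is an assumption (the first one passed in).

```python
def finding_id(policy_id: str, sbom_id: str, component_purl: str, advisory_id: str) -> str:
    # Mirrors the shape of the sample _id: policy, SBOM, PURL, advisory, colon-separated.
    return f"{policy_id}:{sbom_id}:{component_purl}:{advisory_id}"


def collection_name(policy_id: str) -> str:
    # One materialization collection per policy: effective_finding_{policyId}.
    return f"effective_finding_{policy_id}"
```

Because the `_id` is derived only from the tuple being evaluated, re-running the same policy over the same tuple overwrites the same document instead of creating a duplicate.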

3.2 Rules DSL v1 (stella‑dsl@1)

Design goals

Declarative, composable, deterministic.
No loops, no network IO, no non‑deterministic time.
Policy authors see readable text; the engine compiles to a safe IR.

Concepts

WHEN condition matches a tuple (sbom_component, advisory, optional vex_statements)
THEN actions set status, compute severity, attach rationale, or suppress with reason.
Profiles for severity and scoring; Maps for vendor weighting; Guards for VEX justification.

Mini‑grammar (subset)

policy "Default Org Policy" syntax "stella-dsl@1" {

  profile severity {
    map vendor_weight {
      source "GHSA" => +0.5
      source "OSV"  => +0.0
      source "VendorX" => -0.2
    }
    env base_cvss {
      if env.runtime == "serverless" then -0.5
      if env.exposure == "internal-only" then -1.0
    }
  }

  rule vex_precedence {
    when vex.any(status in ["not_affected","fixed"])
      and vex.justification in ["component_not_present","vulnerable_code_not_present"]
    then status := vex.status
         because "VEX strong justification prevails";
  }

  rule advisory_to_cvss {
    when advisory.source in ["GHSA","OSV"]
    then severity := normalize_cvss(advisory)
         because "Map vendor severity or CVSS vector";
  }

  rule reachability_soft_suppress {
    when severity.normalized <= "Medium"
      and telemetry.reachability == "none"
    then status := "suppressed"
         because "not reachable and low severity";
  }
}

Built‑ins (non‑exhaustive)

normalize_cvss(advisory) maps GHSA/OSV/CSAF severity fields to CVSS v3.1 numbers when possible; otherwise it falls back to the vendor‑to‑numeric mapping table defined in the policy.
vex.any(...) tests across matching VEX statements for the same (component, advisory).
telemetry.* is an optional input namespace reserved for future reachability data; if absent, expressions evaluate to unknown (no effect).
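
One way to get "unknown means no effect" is three-valued logic, where a missing input yields `None` and a rule fires only on a definite `True`. The sketch below assumes Kleene-style semantics; the source only states that unknown has no effect, so the exact truth table and the helper names `eq3`, `and3`, and `rule_fires` are assumptions.

```python
from typing import Optional


def eq3(value: Optional[str], expected: str) -> Optional[bool]:
    # A missing input (None) compares as unknown rather than False.
    return None if value is None else value == expected


def and3(*operands: Optional[bool]) -> Optional[bool]:
    # Kleene AND: any False decides the result; otherwise unknown is contagious.
    if any(o is False for o in operands):
        return False
    if any(o is None for o in operands):
        return None
    return True


def rule_fires(condition: Optional[bool]) -> bool:
    # Only a definite True triggers the THEN branch.
    return condition is True
```

With this reading, `reachability_soft_suppress` never fires when no telemetry is supplied, because `eq3(None, "none")` is unknown and the conjunction cannot become `True`.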

Determinism

Rules are evaluated in stable order: explicit priority attribute or lexical order.
First‑match semantics for conflicting status unless combine is used.
Severity computations are pure; numeric maps are part of policy document.
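
The ordering and first‑match rules can be sketched as follows. Two points are assumptions, not stated in this spec: rules with an explicit priority run before rules without one, and a lower priority number runs earlier. `Rule`, `ordered`, and `first_match_status` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    when: Callable[[dict], bool]
    status: str
    priority: Optional[int] = None  # assumption: lower number runs first


def ordered(rules):
    # Prioritized rules first (ascending), then the rest in lexical name order.
    return sorted(rules, key=lambda r: (r.priority is None, r.priority or 0, r.name))


def first_match_status(rules, ctx):
    """Return (status, winning_rule_name); default to 'affected' if nothing matches."""
    for rule in ordered(rules):
        if rule.when(ctx):
            return rule.status, rule.name
    return "affected", None
```

Sorting on a total key (priority, then name) is what makes the result independent of the order in which rules were declared or loaded.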

3.3 Evaluation model

Selection
- For each SBOM component PURL, find candidate advisories from advisory_raw via linkset PURLs or identifiers.
- For each pair (component, advisory), load all matching VEX facts from vex_raw.
Context assembly
- Build an evaluation context from:
  - sbom_component: PURL, licenses, relationships.
  - advisory: source, identifiers, references, embedded vendor severity (kept in content.raw).
  - vex: list of statements with status and justification.
  - env: org‑specific env vars configured per policy run (e.g., exposure).
  - Optional telemetry if available.
Rule execution
- Compile DSL to IR once per policy version; cache.
- Execute rules per tuple; record which rules fired and the order.
- If no rule sets status, default is affected.
- If no rule sets severity, default severity uses normalize_cvss(advisory) with vendor defaults.
Materialization
- Write to effective_finding_{policyId} with rationale chain and references to raw docs.
- Emit per‑tuple trace events; sample and store full traces per run.
Incremental updates
- A watch job observes new advisory_raw and vex_raw inserts and SBOM deltas.
- The orchestrator computes the affected tuples and re‑evaluates only those.
Replay
- Any policy_run is fully reproducible by (policy_id, version, input set, cursors).
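
The incremental step (work out which tuples a new raw fact touches, then re‑evaluate only those) might look like the sketch below. It assumes exact PURL string matching for brevity, whereas real matching also involves version ranges from the ingestion linkset; the function names and input shapes are hypothetical.

```python
from collections import defaultdict


def index_components(sboms: dict) -> dict:
    """Map each PURL to the set of SBOM IDs that contain it.

    sboms: {sbom_id: [purl, ...]}
    """
    index = defaultdict(set)
    for sbom_id, purls in sboms.items():
        for purl in purls:
            index[purl].add(sbom_id)
    return index


def affected_tuples(index: dict, new_facts) -> list:
    """Return sorted (sbom_id, purl, advisory_id) tuples needing re-evaluation.

    new_facts: iterable of (advisory_id, linkset_purls) for newly inserted raw docs.
    """
    out = set()
    for advisory_id, purls in new_facts:
        for purl in purls:
            for sbom_id in index.get(purl, ()):
                out.add((sbom_id, purl, advisory_id))
    return sorted(out)
```

Returning a sorted list rather than a set keeps the evaluation order stable between runs, which matters for reproducible traces.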

3.4 VEX application semantics

Precedence: a not_affected with strong justification (component_not_present, vulnerable_code_not_present, fix_not_required) wins unless another rule explicitly overrides by environment context.
Scoping: VEX statements often specify product/component scope. Matching uses PURL equivalence and version ranges extracted during ingestion linkset generation.
Conflicts: If multiple VEX statements conflict, the default is most‑specific scope wins (component > product > vendor), then newest document_version. Policies can override with explicit rules.
Explainability: Every VEX‑driven decision records which statement IDs were considered and which one won.
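
The default conflict rule (most‑specific scope, then newest `document_version`) reduces to a single ordered comparison. In this sketch the `VexStatement` shape and the numeric scope ranks are assumptions, and the final tie‑break on `statement_id` is added only to keep the result deterministic when scope and version are equal.

```python
from dataclasses import dataclass

# component > product > vendor, per the default conflict rule.
SCOPE_RANK = {"component": 3, "product": 2, "vendor": 1}


@dataclass(frozen=True)
class VexStatement:
    statement_id: str
    status: str
    scope: str  # component | product | vendor
    document_version: int


def resolve(statements):
    """Return (winner, considered_ids); winner is None when there are no statements."""
    considered = [s.statement_id for s in statements]
    if not statements:
        return None, considered
    winner = max(
        statements,
        key=lambda s: (SCOPE_RANK[s.scope], s.document_version, s.statement_id),
    )
    return winner, considered
```

Returning the considered IDs alongside the winner gives the explanation trace both pieces it needs: what was looked at and what prevailed.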

3.5 Advisory normalization rules

Vendor severity mapping: Map GHSA levels or CSAF product‑tree severities to CVSS‑like numeric bands via policy maps.
CVSS vector use: If a valid vector exists in content.raw, parse and compute; apply policy modifiers from profile severity.
Temporal/environment modifiers: Optional reductions for network exposure, isolation, or compensating controls, all encoded in policy.
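
A rough sketch of the normalization path: take a parsed CVSS base score if one exists, otherwise look up the vendor level in a policy map, then apply policy modifiers and clamp. The numeric values in `VENDOR_TO_SCORE` are hypothetical placeholders for a policy‑defined map; the band boundaries follow the CVSS v3.x qualitative rating scale.

```python
# Hypothetical policy map from vendor severity labels to numeric scores.
VENDOR_TO_SCORE = {"low": 2.0, "moderate": 5.0, "high": 7.5, "critical": 9.5}


def band(score: float) -> str:
    # CVSS v3.x qualitative severity rating scale.
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def normalize(vendor_level: str, base_score=None, modifiers=()) -> dict:
    """Prefer a parsed CVSS base score; otherwise fall back to the vendor map.

    modifiers: signed adjustments from the policy (vendor weights, env reductions).
    """
    score = base_score if base_score is not None else VENDOR_TO_SCORE[vendor_level.lower()]
    score += sum(modifiers)
    score = round(min(10.0, max(0.0, score)), 1)  # clamp to the CVSS range
    return {"score": score, "normalized": band(score)}
```

For example, a GHSA "high" advisory with a +0.5 vendor weight and a -1.0 internal‑only reduction lands at 7.0, which still bands as High.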

3.6 Performance and scale

Partition evaluation by SBOM ID and hash ranges of PURLs.
Pre‑index advisory_raw.linkset.purls and vex_raw.linkset.purls (already in Epic 1).
Use streaming iterators; avoid loading entire SBOM or advisory sets into memory.
Materialize only changed findings (diff‑aware writes).
Target: 100k components, 1M advisories considered, 5 minutes incremental SLA on commodity hardware.
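
Two of these points, hash‑range partitioning and diff‑aware writes, are easy to sketch. The code below assumes SHA‑256 as the hash (chosen because it is stable across processes and machines, unlike Python's built‑in `hash`) and plain dictionaries standing in for stored findings; the function names are illustrative.

```python
import hashlib


def purl_bucket(purl: str, buckets: int) -> int:
    """Map a PURL to one of `buckets` contiguous hash ranges."""
    digest = hashlib.sha256(purl.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return (value * buckets) >> 64             # range index in [0, buckets)


def partition_key(sbom_id: str, purl: str, buckets: int) -> tuple:
    # Partition by SBOM ID first, then by the PURL's hash range.
    return (sbom_id, purl_bucket(purl, buckets))


def changed_findings(previous: dict, current: dict) -> dict:
    """Return only the findings that are new or whose content differs.

    Both arguments map finding _id to its document.
    """
    return {fid: doc for fid, doc in current.items() if previous.get(fid) != doc}
```

This sketch does not handle findings that disappear between runs; a real materializer would also need a rule for those.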

3.7 Error codes

| Code | Meaning | HTTP |
|------|---------|------|
| `ERR_POL_001` | Policy syntax error | 400 |
| `ERR_POL_002` | Policy not approved for run | 403 |
| `ERR_POL_003` | Missing inputs (SBOM/advisory/vex fetch failed) | 424 |
| `ERR_POL_004` | Determinism guard triggered (non‑pure function usage) | 500 |
| `ERR_POL_005` | Write denied to effective findings (caller invalid) | 403 |
| `ERR_POL_006` | Run canceled or timed out | 408 |
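
The error table translates directly into an enumeration that services can share. The codes, meanings, and HTTP statuses below come from the table; the `to_response` helper and the response body shape are assumptions for illustration.

```python
from enum import Enum


class PolicyError(Enum):
    ERR_POL_001 = ("Policy syntax error", 400)
    ERR_POL_002 = ("Policy not approved for run", 403)
    ERR_POL_003 = ("Missing inputs (SBOM/advisory/vex fetch failed)", 424)
    ERR_POL_004 = ("Determinism guard triggered (non-pure function usage)", 500)
    ERR_POL_005 = ("Write denied to effective findings (caller invalid)", 403)
    ERR_POL_006 = ("Run canceled or timed out", 408)

    @property
    def meaning(self) -> str:
        return self.value[0]

    @property
    def http_status(self) -> int:
        return self.value[1]


def to_response(err: PolicyError) -> tuple:
    # Hypothetical response shape: (HTTP status, JSON-serializable body).
    return err.http_status, {"code": err.name, "message": err.meaning}
```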

3.8 Observability

Metrics:
- policy_compile_seconds, policy_run_seconds{mode=...}, rules_fired_total, findings_written_total, vex_overrides_total, simulate_diff_total{delta=up|down|unchanged}.
Tracing:
- Spans: policy.compile, policy.select, policy.eval, policy.materialize.
Logs:
- Include policy_id, version, run_id, sbom_id, component_purl, advisory_id, vex_count, rule_hits.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

3.9 Security and tenancy

Only users with policy:write can create/modify policies.
policy:approve is a separate privileged role.
Only Policy Engine service identity has effective:write.
Tenancy is explicit on all documents and queries.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
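
The role and tenancy rules in 3.9 could be enforced with a check along these lines. The scope strings come from this section; the action names, the `authorize` signature, and the idea of passing scopes as a set are assumptions.

```python
# Action names are hypothetical; scope strings are the ones named in section 3.9.
REQUIRED_SCOPE = {
    "policy.create": "policy:write",
    "policy.modify": "policy:write",
    "policy.approve": "policy:approve",
    "finding.write": "effective:write",
}


def authorize(action: str, caller_scopes: set, caller_tenant: str, doc_tenant: str) -> bool:
    """Allow the action only with the required scope and a matching tenant."""
    return REQUIRED_SCOPE[action] in caller_scopes and caller_tenant == doc_tenant
```

Keeping `policy:approve` as its own scope means an author holding only `policy:write` cannot approve their own draft.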

4) API surface

4.1 Policy CRUD and lifecycle

`POST /policies`: create draft
`GET /policies?status=...`: list
`GET /policies/{policyId}/versions/{v}`: fetch
`POST /policies/{policyId}/submit`: move draft to submitted
`POST /policies/{policyId}/approve`: approve version
`POST /policies/{policyId}/archive`: archive version

4.2 Compilation and validation

POST /policies/{policyId}/versions/{v}/compile
- Returns IR checksum, syntax diagnostics, rule stats.

4.3 Runs

`POST /policies/{policyId}/runs`: body `{mode, sbom_set, advisory_cursor?, vex_cursor?, env?}`
`GET /policies/{policyId}/runs/{runId}`: status + stats
`POST /policies/{policyId}/simulate`: returns diff vs current approved version on a sample SBOM set.
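
A simulation diff can be summarized with the same `up`, `down`, and `unchanged` labels used by the `simulate_diff_total` metric. This sketch assumes the delta refers to the normalized severity band of each finding, and that findings present on only one side are ignored; neither is specified here, and `simulate_diff` is a hypothetical name.

```python
# Severity bands in ascending order, used to compare two evaluations.
ORDER = ["None", "Low", "Medium", "High", "Critical"]


def simulate_diff(baseline: dict, candidate: dict) -> dict:
    """Count severity movements between two policy versions.

    Both arguments map finding _id to its normalized severity band.
    Only findings present in both results are compared.
    """
    counts = {"up": 0, "down": 0, "unchanged": 0}
    for fid in baseline.keys() & candidate.keys():
        before = ORDER.index(baseline[fid])
        after = ORDER.index(candidate[fid])
        if after > before:
            counts["up"] += 1
        elif after < before:
            counts["down"] += 1
        else:
            counts["unchanged"] += 1
    return counts
```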

4.4 Findings and explanations

`GET /findings/{policyId}?sbom_id=S-42&status=affected&severity=High+Critical`
`GET /findings/{policyId}/{findingId}/explain`: returns ordered rule hits and linked raw IDs.

All endpoints require tenant scoping and appropriate policy:* or findings:* roles.

5) Console (Policy Editor) and CLI behavior

Console

Monaco‑style editor with DSL syntax highlighting, lint, quick docs.
Side‑by‑side Simulation panel: show count of affected findings before/after.
Approval workflow: submit, review comments, approve with rationale.
Diffs: show rule‑wise changes and estimated impact.
Read‑only run viewer: heatmap of rules fired, top suppressions, VEX wins.

CLI

stella policy new --name "Default Org Policy"
stella policy edit P-7   # opens a local editor, then submits
stella policy approve P-7 --version 3
stella policy simulate P-7 --sbom S-42 --env exposure=internal-only
stella findings ls --policy P-7 --sbom S-42 --status affected

Exit codes map to ERR_POL_*.