feat: Add new provenance and crypto registry documentation

- Introduced attestation inventory and subject-rekor mapping files for tracking Docker packages.
- Added a comprehensive crypto registry decision document outlining defaults and required follow-ups.
- Created an offline feeds manifest for bundling air-gap resources.
- Implemented a script to generate and update binary manifests for curated binaries.
- Added a verification script to ensure binary artefacts are located in approved directories.
- Defined new schemas for AdvisoryEvidenceBundle, OrchestratorEnvelope, ScannerReportReadyPayload, and ScannerScanCompletedPayload.
- Established project files for StellaOps.Orchestrator.Schemas and StellaOps.PolicyAuthoritySignals.Contracts.
- Updated vendor manifest to track pinned binaries for integrity.
2025-11-18 23:47:13 +02:00
parent d3ecd7f8e6
commit e91da22836
44 changed files with 6793 additions and 99 deletions
--- a/docs/product-advisories/18-Nov-2026
+++ b/docs/product-advisories/18-Nov-2026
@@ -0,0 +1,719 @@
+
+Here’s a crisp idea you can drop straight into Stella Ops: treat “unknowns” as first‑class data, not noise.
+
+---
+
+# Unknowns Registry — turning uncertainty into signals
+
+**Why:** Scanners and VEX feeds miss things (ambiguous package IDs, unverifiable hashes, orphaned layers, missing SBOM edges, runtime-only artifacts). Today these get logged and forgotten. If we **structure** them, downstream agents can reason about risk and shrink blast radius proactively.
+
+**What it is:** A small service + schema that records every uncertainty with enough context for later inference.
+
+## Core model (v0)
+
+```json
+{
+  "unknown_id": "unk:sha256:…",
+  "observed_at": "2025-11-18T12:00:00Z",
+  "provenance": {
+    "source": "Scanner.Analyzer.DotNet|Sbomer|Signals|Vexer",
+    "host": "runner-42",
+    "scan_id": "scan:…"
+  },
+  "scope": {
+    "artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
+    "subpath": "/app/bin/Contoso.dll",
+    "phase": "build|scan|runtime"
+  },
+  "unknown_type": "identity_gap|version_conflict|hash_mismatch|missing_edge|runtime_shadow|policy_undecidable",
+  "evidence": {
+    "raw": "nuget id 'Serilog' but assembly name 'Serilog.Core'",
+    "signals": ["sym:Serilog.Core.Logger", "procopen:/app/agent"]
+  },
+  "transitive": {
+    "depth": 2,
+    "parents": ["pkg:nuget/Serilog@?"],
+    "children": []
+  },
+  "confidence": { "p": 0.42, "method": "bayes-merge|rule" },
+  "exposure_hints": {
+    "surface": ["logging pipeline", "startup path"],
+    "runtime_hits": 3
+  },
+  "status": "open|triaged|suppressed|resolved",
+  "labels": ["reachability:possible", "sbom:incomplete"]
+}
+```
+
+## Categorize by three axes
+
+* **Provenance** (where it came from): Scanner vs Sbomer vs Vexer vs Signals.
+* **Scope** (what it touches): image/layer/file/symbol/runtime‑proc/policy.
+* **Transitive depth** (how far from an entry point): 0 = direct, 1..N via deps.
+
+## How agents use it
+
+* **Cartographer**: includes unknown edges in the graph with special weight; lets Policy/Lattice down‑rank vulnerable nodes near high‑impact unknowns.
+* **Remedy Assistant (Zastava)**: proposes micro‑probes (“add EventPipe/JFR tap for X symbol”) or build‑time assertions (“pin Serilog>=3.1, regenerate SBOM”).
+* **Scheduler**: prioritizes scans where unknown density × asset criticality is highest.
+
+## Minimal API (idempotent, additive)
+
+* `POST /unknowns/ingest` — upsert by `unknown_id` (hash of type+scope+evidence).
+* `GET  /unknowns?artifact=…&status=open` — list for a target.
+* `POST /unknowns/:id/triage` — set status/labels, attach rationale.
+* `GET  /metrics` — density by artifact/namespace/unknown_type.
+
+*All additive; no versioning required. Repeat calls with the same payload are no‑ops.*
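
The upsert-by-hash behaviour above can be sketched as follows. This is a minimal illustration, not the service's implementation: the canonical form (sorted-key JSON over `unknown_type`, `scope`, `evidence`) and the `UnknownsStore` class are assumptions made for the example. `observed_at` is deliberately left out of the hash so that re-observing the same unknown stays a no-op.

```python
import hashlib
import json


def unknown_id(unknown_type: str, scope: dict, evidence: dict) -> str:
    """Derive a stable ID from type + scope + evidence.

    Assumption: canonical form is sorted-key, whitespace-free JSON in UTF-8.
    """
    canonical = json.dumps(
        {"unknown_type": unknown_type, "scope": scope, "evidence": evidence},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return "unk:sha256:" + hashlib.sha256(canonical).hexdigest()


class UnknownsStore:
    """Hypothetical in-memory stand-in for POST /unknowns/ingest."""

    def __init__(self) -> None:
        self._rows = {}

    def ingest(self, record: dict):
        """Upsert by unknown_id; returns (id, created)."""
        uid = unknown_id(record["unknown_type"], record["scope"], record["evidence"])
        created = uid not in self._rows
        if created:
            # First sighting: store the record. Repeats change nothing.
            self._rows[uid] = {**record, "unknown_id": uid}
        return uid, created
```

Because the ID depends only on content, two emitters that serialize the same unknown with different key order still converge on one row.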
+
+## Scoring hook (into your lattice)
+
+* Add a **“Unknowns Pressure”** term:
+  `risk = base ⊕ (α * density_depth≤1) ⊕ (β * runtime_shadow) ⊕ (γ * policy_undecidable)`
+* Gate “green” only if `density_depth≤1 == 0` **or** compensating controls active.
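
The text leaves the `⊕` combinator undefined. As one possible reading, the sketch below assumes all terms are normalized to `[0, 1]` and that `⊕` is a noisy-OR, which is commutative, associative, and never lowers risk. The default weights are placeholders, not recommended values.

```python
from functools import reduce


def clamp01(x: float) -> float:
    """Keep a term inside [0, 1] so the combinator stays well-defined."""
    return max(0.0, min(1.0, x))


def oplus(a: float, b: float) -> float:
    """Assumed combinator: noisy-OR on [0, 1]."""
    return 1.0 - (1.0 - a) * (1.0 - b)


def risk(base, density_depth_le1, runtime_shadow, policy_undecidable,
         alpha=0.3, beta=0.2, gamma=0.1):
    """risk = base ⊕ (α·density) ⊕ (β·runtime_shadow) ⊕ (γ·policy_undecidable)."""
    terms = [
        base,
        alpha * density_depth_le1,
        beta * runtime_shadow,
        gamma * policy_undecidable,
    ]
    return reduce(oplus, (clamp01(t) for t in terms))


def gate_green(density_depth_le1: float, compensating_controls: bool) -> bool:
    """Green only with zero near-entrypoint unknowns, or with controls active."""
    return density_depth_le1 == 0 or compensating_controls
```

With no unknowns the score collapses to `base`, so existing scores are unchanged until unknowns are actually recorded.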
+
+## Storage & plumbing
+
+* **Store:** append‑only KV (Badger/Rocks) + Graph overlay (SQLite/Neo4j—your call).
+* **Emit:** DSSE‑signed “Unknowns Attestation” per scan for replayable audits.
+* **UI:** heatmap per artifact (unknowns by type × depth), drill‑down to evidence.
+
+## First 2‑day slice
+
+1. Define `unknown_type` enum + hashable `unknown_id`.
+2. Wire Scanner/Sbomer/Vexer to emit unknowns (start with: identity_gap, missing_edge).
+3. Persist + expose `/metrics` (density, by depth and type).
+4. In Policy Studio, add the Unknowns Pressure term with default α/β/γ.
+
+If you want, I’ll draft the exact protobuf/JSON schema and drop in .NET 10 record types + an EF model, plus a tiny CLI to query and a Grafana panel JSON.
+I will treat “it” as the whole vision behind **Pushing Binary Reachability Toward True Determinism** inside Stella Ops: function-/symbol-level reachability for binaries and higher-level languages, wired into Scanner, Cartographer, Signals, and VEX.
+
+Below is an implementation-oriented architecture plan you can hand directly to agents.
+
+---
+
+## 1. Scope, goals, and non-negotiable invariants
+
+### 1.1. Scope
+
+Deliver a deterministic reachability pipeline for containers that:
+
+1. Builds **call graphs** and **symbol usage maps** for:
+
+   * Native binaries (ELF, PE, Mach-O) — primary for this branch.
+   * Scripted/VM languages later: JS, Python, PHP (as part of the same architecture).
+2. Maps symbols and functions to:
+
+   * Packages (purls).
+   * Vulnerabilities (CVE → symbol/function list via Concelier/VEX data).
+3. Computes **deterministic reachability states** for each `(vulnerability, artifact)` pair.
+4. Emits:
+
+   * Machine-readable JSON (with `purl`s).
+   * Graph overlays for Cartographer.
+   * Inputs for the lattice/trust engine and VEXer/Excitor.
+
+### 1.2. Invariants
+
+* **Deterministic replay**: Given the same:
+
+  * Image digest(s),
+  * Analyzer versions,
+  * Config + policy,
+  * Runtime trace inputs (if any),
+    the same reachability outputs must be produced, bit-for-bit.
+* **Idempotent, additive APIs**:
+
+  * No versioning of endpoints, only additive/optional fields.
+  * Same request = same response, no side effects besides storing/caching.
+* **Lattice logic runs in `Scanner.WebService`**:
+
+  * All “reachable/unreachable/unknown” and confidence merging lives in Scanner, not Concelier/Excitors.
+* **Preserve prune source**:
+
+  * Concelier and Excitors preserve provenance and do not “massage” reachability; they only consume it.
+* **Offline, air-gap friendly**:
+
+  * No mandatory external calls; dependency on local analyzers and local advisory/VEX cache.
+
+---
+
+## 2. High-level pipeline
+
+From container image to reachability output:
+
+1. **Image enumeration**
+   `Scanner.WebService` receives an image ref or tarball and spawns an analysis run.
+2. **Binary discovery & classification**
+   Binary analyzers detect ELF/PE/Mach-O + main interpreters (python, node, php) and scripts.
+3. **Symbolization & call graph building**
+
+   * For each binary/module, we produce:
+
+     * Symbol table (exported + imported).
+     * Call graph edges (function-level where possible).
+   * For dynamic languages, we later plug in appropriate analyzers.
+4. **Symbol→package mapping**
+
+   * Match symbols to packages and `purl`s using:
+
+     * Known vendor symbol maps (from Concelier / Feedser).
+     * Heuristics, path patterns, build IDs.
+5. **Vulnerability→symbol mapping**
+
+   * From Concelier/VEX/CSAF: map each CVE to the set of symbols/functions it affects.
+6. **Reachability solving**
+
+   * For each `(CVE, artifact)`:
+
+     * Determine presence and reachability of affected symbols from known entrypoints.
+     * Merge static call graph and runtime signals (if available) via deterministic lattice.
+7. **Output & storage**
+
+   * Reachability JSON with purls and confidence.
+   * Graph overlay into Cartographer.
+   * Signals/events for downstream scoring.
+   * DSSE-signed reachability attestation for replay/audit.
+
+---
+
+## 3. Component architecture
+
+### 3.1. New and extended services
+
+1. **`StellaOps.Scanner.WebService` (extended)**
+
+   * Orchestration of reachability analyses.
+   * Lattice/merging engine.
+   * Idempotent reachability APIs.
+
+2. **`StellaOps.Scanner.Analyzers.Binary.*` (new)**
+
+   * `…Binary.Discovery`: file type detection, ELF/PE/Mach-O parsing.
+   * `…Binary.Symbolizer`: resolves symbols, imports/exports, relocations.
+   * `…Binary.CallGraph.Native`: builds call graphs where possible (via disassembly/CFG).
+   * `…Binary.CallGraph.DynamicStubs`: heuristics for indirect calls, PLT/GOT, vtables.
+
+3. **`StellaOps.Scanner.Analyzers.Script.*` (future extension)**
+
+   * `…Lang.JavaScript.CallGraph`
+   * `…Lang.Python.CallGraph`
+   * `…Lang.Php.CallGraph`
+   * These emit the same generic call-graph IR.
+
+4. **`StellaOps.Reachability.Engine` (within Scanner.WebService)**
+
+   * Normalizes all call graphs into a common IR.
+   * Merges static and dynamic evidence.
+   * Computes reachability states and scores.
+
+5. **`StellaOps.Cartographer.ReachabilityOverlay` (new overlay module)**
+
+   * Stores per-artifact call graphs and reachability tags.
+   * Provides graph queries for UI and policy tools.
+
+6. **`StellaOps.Signals` (extended)**
+
+   * Ingests runtime call traces (e.g., from EventPipe/JFR/eBPF in other branches).
+   * Feeds function-hit events into the Reachability Engine.
+
+7. **Unknowns Registry integration (optional but recommended)**
+
+   * Stores unresolved symbol/package mappings and incomplete edges as `unknowns`.
+   * Used to adjust risk scores (“Unknowns Pressure”) when binary analysis is incomplete.
+
+---
+
+## 4. Detailed design by layer
+
+### 4.1. Static analysis layer (binaries)
+
+#### 4.1.1. Binary discovery
+
+Module: `StellaOps.Scanner.Analyzers.Binary.Discovery`
+
+* Inputs:
+
+  * Per-image file list (from existing Scanner).
+  * Byte slices of candidate binaries.
+* Logic:
+
+  * Detect ELF/PE/Mach-O via magic bytes, not extensions.
+  * Classify as:
+
+    * Main executable
+    * Shared library
+    * Plugin/module
+* Output:
+
+  * `binary_manifest.json` per image:
+
+    ```json
+    {
+      "image_ref": "registry/app@sha256:…",
+      "binaries": [
+        {
+          "id": "bin:elf:/usr/local/bin/app",
+          "path": "/usr/local/bin/app",
+          "format": "elf",
+          "arch": "x86_64",
+          "role": "executable"
+        }
+      ]
+    }
+    ```
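
Detection by magic bytes, as described above, could look roughly like this. The function name and return labels are illustrative. Fat/universal Mach-O (`0xCAFEBABE`) is intentionally omitted because that magic collides with Java class files and needs extra disambiguation.

```python
import struct
from typing import Optional

# Thin Mach-O magics as they appear on disk.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce",  # MH_MAGIC: 32-bit, big-endian file
    b"\xce\xfa\xed\xfe",  # MH_CIGAM: 32-bit, little-endian file
    b"\xfe\xed\xfa\xcf",  # MH_MAGIC_64: 64-bit, big-endian file
    b"\xcf\xfa\xed\xfe",  # MH_CIGAM_64: 64-bit, little-endian file
}


def detect_format(data: bytes) -> Optional[str]:
    """Classify a byte slice as 'elf', 'pe', 'macho', or None (not a known binary)."""
    if data[:4] == b"\x7fELF":
        return "elf"
    if data[:4] in MACHO_MAGICS:
        return "macho"
    # PE: DOS header starts with 'MZ'; the 32-bit little-endian value at 0x3C
    # (e_lfanew) points at the 'PE\0\0' signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "pe"
    return None
```

Only the first few dozen bytes plus the PE signature are needed, so the analyzer can classify candidates from small byte slices without reading whole files.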
+
+#### 4.1.2. Symbolization
+
+Module: `StellaOps.Scanner.Analyzers.Binary.Symbolizer`
+
+* Uses:
+
+  * ELF/PE/Mach-O parsers (internal or third-party), no external calls.
+* Output per binary:
+
+  ```json
+  {
+    "binary_id": "bin:elf:/usr/local/bin/app",
+    "build_id": "buildid:abcd…",
+    "exports": ["pkg1::ClassA::method1", "..."],
+    "imports": ["openssl::EVP_EncryptInit_ex", "..."],
+    "sections": { "text": { "va": "0x...", "size": 12345 } }
+  }
+  ```
+* Writes unresolved symbol sets to Unknowns Registry when:
+
+  * Imports cannot be tied to known packages or symbols.
+
+#### 4.1.3. Call graph construction
+
+Module: `StellaOps.Scanner.Analyzers.Binary.CallGraph.Native`
+
+* Core tasks:
+
+  * Build control-flow graphs (CFG) for each function via:
+
+    * Disassembly.
+    * Basic block detection.
+  * Identify direct calls (`call func`) and indirect calls (function pointers, vtables).
+* IR model:
+
+  ```json
+  {
+    "binary_id": "bin:elf:/usr/local/bin/app",
+    "functions": [
+      { "fid": "func:app::main", "va": "0x401000", "size": 128 },
+      { "fid": "func:libssl::EVP_EncryptInit_ex", "external": true }
+    ],
+    "edges": [
+      { "caller": "func:app::main", "callee": "func:app::init_config", "type": "direct" },
+      { "caller": "func:app::main", "callee": "func:libssl::EVP_EncryptInit_ex", "type": "import" }
+    ]
+  }
+  ```
+* Edge confidence:
+
+  * `type: direct|import|indirect|heuristic`
+  * Used later by the lattice.
+
+#### 4.1.4. Entry point inference
+
+* Sources:
+
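+  * ELF header `e_entry` (with `PT_INTERP` identifying the dynamic loader for dynamically linked binaries), PE `AddressOfEntryPoint`, Mach-O `LC_MAIN`.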
+  * Application-level hints (known frameworks, service main methods).
+  * Container metadata (CMD, ENTRYPOINT).
+* Output:
+
+  ```json
+  {
+    "binary_id": "bin:elf:/usr/local/bin/app",
+    "entrypoints": ["func:app::main"]
+  }
+  ```
+
+> Note: For JS/Python/PHP, equivalent analyzers will later define module entrypoints (`index.js`, `wsgi_app`, `public/index.php`).
+
+---
+
+### 4.2. Symbol-to-package and CVE-to-symbol mapping
+
+#### 4.2.1. Symbol→package mapping
+
+Module: `StellaOps.Reachability.Mapping.SymbolToPurl`
+
+* Inputs:
+
+  * Binary symbolization outputs.
+  * Local mapping DB in Concelier (vendor symbol maps, debug info, name patterns).
+  * File path + container context (`/usr/lib/...`, `/site-packages/...`).
+* Output:
+
+  ```json
+  {
+    "symbol": "libssl::EVP_EncryptInit_ex",
+    "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
+    "confidence": 0.93,
+    "method": "vendor_map+path_heuristic"
+  }
+  ```
+* Unresolved / ambiguous symbols:
+
+  * Stored as `unknowns` of type `identity_gap`.
+
+#### 4.2.2. CVE→symbol mapping
+
+Responsibility: Concelier + its advisory ingestion.
+
+* For each vulnerability:
+
+  ```json
+  {
+    "cve_id": "CVE-2025-12345",
+    "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
+    "affected_symbols": [
+      "libssl::EVP_EncryptInit_ex",
+      "libssl::EVP_EncryptUpdate"
+    ],
+    "source": "vendor_vex",
+    "confidence": 1.0
+  }
+  ```
+* Reachability Engine consumes this mapping read-only.
+
+---
+
+### 4.3. Reachability Engine
+
+Module: `StellaOps.Reachability.Engine` (in Scanner.WebService)
+
+#### 4.3.1. Core data model
+
+Per `(artifact, cve, purl)`:
+
+```json
+{
+  "artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
+  "cve_id": "CVE-2025-12345",
+  "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
+  "symbols": [
+    {
+      "symbol": "libssl::EVP_EncryptInit_ex",
+      "static_presence": "present|absent|unknown",
+      "static_reachability": "reachable|unreachable|unknown",
+      "runtime_hits": 3,
+      "runtime_reachability": "observed|not_observed|unknown"
+    }
+  ],
+  "reachability_state": "runtime_observed|statically_reachable|present_not_reachable|not_present|unknown",
+  "confidence": {
+    "p": 0.87,
+    "evidence": ["static_callgraph", "runtime_trace", "symbol_map"],
+    "unknowns_pressure": 0.12
+  }
+}
+```
+
+#### 4.3.2. Lattice / state machine
+
+Define a deterministic lattice over states:
+
+* `NOT_PRESENT`
+* `PRESENT_NOT_REACHABLE`
+* `STATICALLY_REACHABLE`
+* `RUNTIME_OBSERVED`
+
+An “unknown” flag is overlaid on any state when evidence is missing.
+
+Merging rules (simplified):
+
+* If `NOT_PRESENT` and no conflicting evidence → `NOT_PRESENT`.
+* If at least one affected symbol is on a static path from any entrypoint → `STATICALLY_REACHABLE`.
+* If symbol observed at runtime → `RUNTIME_OBSERVED` (top state).
+* If symbol present in binary but not on any static path → `PRESENT_NOT_REACHABLE`, unless unknown edges exist near it (then downgrade with lower confidence).
+* Unknowns Registry entries near affected symbols increase `unknowns_pressure` and may push from `NOT_PRESENT` to `UNKNOWN`.
+
+Implementation: pure functional merge functions inside Scanner.WebService:
+
+```csharp
+ReachabilityState Merge(ReachabilityState a, ReachabilityState b);
+ReachabilityState FromEvidence(StaticEvidence s, RuntimeEvidence r, UnknownsPressure u);
+```
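
To make the merge rules concrete, here is a small sketch (in Python rather than C#, so it can be run standalone) that models the four states as a total order and the "unknown" overlay as a separate flag. Treating merge as the join `max(state)` plus a logical OR on the flag is an assumption. It satisfies the properties a deterministic merge needs: commutative, associative, idempotent.

```python
from dataclasses import dataclass
from enum import IntEnum


class State(IntEnum):
    NOT_PRESENT = 0
    PRESENT_NOT_REACHABLE = 1
    STATICALLY_REACHABLE = 2
    RUNTIME_OBSERVED = 3  # top state


@dataclass(frozen=True)
class Reachability:
    state: State
    unknown: bool = False  # overlay: evidence near the symbol is incomplete


def merge(a: Reachability, b: Reachability) -> Reachability:
    """Join of two observations: strongest state wins, unknown flag is sticky."""
    return Reachability(max(a.state, b.state), a.unknown or b.unknown)


def from_evidence(present: bool, on_static_path: bool,
                  runtime_hits: int, unknowns_nearby: bool) -> Reachability:
    """Apply the simplified rules in priority order (runtime beats static)."""
    if runtime_hits > 0:
        state = State.RUNTIME_OBSERVED
    elif present and on_static_path:
        state = State.STATICALLY_REACHABLE
    elif present:
        state = State.PRESENT_NOT_REACHABLE
    else:
        state = State.NOT_PRESENT
    return Reachability(state, unknowns_nearby)
```

Because the merge is order-independent, per-symbol results can be folded in any order and still yield the same artifact-level state on replay.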
+
+#### 4.3.3. Deterministic inputs
+
+To guarantee replay:
+
+* Build **Reachability Plan Manifest** per run:
+
+  ```json
+  {
+    "plan_id": "reach:sha256:…",
+    "scanner_version": "1.4.0",
+    "analyzers": {
+      "binary_discovery": "1.0.0",
+      "binary_symbolizer": "1.1.0",
+      "binary_callgraph": "1.2.0"
+    },
+    "inputs": {
+      "image_digest": "sha256:…",
+      "runtime_trace_files": ["signals:run:2025-11-18T12:00:00Z"],
+      "config": {
+        "assume_indirect_calls": "conservative",
+        "max_call_depth": 10
+      }
+    }
+  }
+  ```
+* DSSE-sign the plan + result.
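
For orientation, the DSSE envelope signs a pre-authentication encoding (PAE) of the payload type and payload rather than the raw payload. The sketch below builds that encoding and wraps it in an envelope. HMAC-SHA256 is used purely as a stand-in so the example runs with the standard library; a real attestation would use an asymmetric signer. The payload type string in the usage note is hypothetical.

```python
import base64
import hashlib
import hmac


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: 'DSSEv1 LEN(type) type LEN(body) body'."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(t), t, len(payload), payload)


def sign_envelope(payload_type: str, payload: bytes, key: bytes, keyid: str) -> dict:
    """Build a DSSE-shaped envelope. HMAC is a stand-in for a real signature."""
    sig = hmac.new(key, pae(payload_type, payload), hashlib.sha256).digest()
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> bool:
    """Recompute the PAE from the envelope and compare in constant time."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    return any(
        hmac.compare_digest(expected, base64.b64decode(s["sig"]))
        for s in envelope["signatures"]
    )
```

A caller might serialize the plan manifest and results to canonical JSON bytes and pass them as `payload` with a type such as `application/vnd.stellaops.reachability+json` (an invented name for this example).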
+
+---
+
+### 4.4. Storage and graph overlay
+
+#### 4.4.1. Reachability store
+
+Backend: re-use existing Scanner/Cartographer storage stack (e.g., Postgres or SQLite + blob store).
+
+Tables/collections:
+
+* `reachability_runs`
+
+  * `plan_id`, `image_ref`, `created_at`, `scanner_version`.
+
+* `reachability_results`
+
+  * `plan_id`, `cve_id`, `purl`, `state`, `confidence_p`, `unknowns_pressure`, `payload_json`.
+
+* Indexes on `(image_ref, cve_id)`, `(image_ref, purl)`.
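
One way to lay these tables out in SQLite is sketched below. The column types, the composite primary key, and copying `image_ref` onto `reachability_results` (so the two listed indexes can live on a single table) are assumptions, not decisions taken from the text.

```python
import sqlite3

SCHEMA = """
CREATE TABLE reachability_runs (
    plan_id         TEXT PRIMARY KEY,
    image_ref       TEXT NOT NULL,
    created_at      TEXT NOT NULL,
    scanner_version TEXT NOT NULL
);
CREATE TABLE reachability_results (
    plan_id           TEXT NOT NULL REFERENCES reachability_runs(plan_id),
    image_ref         TEXT NOT NULL,  -- denormalized from runs (assumption)
    cve_id            TEXT NOT NULL,
    purl              TEXT NOT NULL,
    state             TEXT NOT NULL,
    confidence_p      REAL NOT NULL,
    unknowns_pressure REAL NOT NULL,
    payload_json      TEXT NOT NULL,
    PRIMARY KEY (plan_id, cve_id, purl)
);
CREATE INDEX ix_results_image_cve  ON reachability_results (image_ref, cve_id);
CREATE INDEX ix_results_image_purl ON reachability_results (image_ref, purl);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the schema in a fresh database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def results_for(conn: sqlite3.Connection, image_ref: str, cve_id: str):
    """Look up results via the (image_ref, cve_id) index, in a stable order."""
    cur = conn.execute(
        "SELECT purl, state, confidence_p FROM reachability_results "
        "WHERE image_ref = ? AND cve_id = ? ORDER BY purl",
        (image_ref, cve_id),
    )
    return cur.fetchall()
```

The explicit `ORDER BY` keeps query output stable across runs, which matters for bit-for-bit replay.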
+
+#### 4.4.2. Cartographer overlay
+
+Edges:
+
+* `IMAGE` → `BINARY` → `FUNCTION` → `PACKAGE` → `CVE`
+* Extra property on `IMAGE -[AFFECTED_BY]-> CVE`:
+
+  * `reachability_state`
+  * `reachability_plan_id`
+
+Enables queries:
+
+* “Show me all CVEs with `STATICALLY_REACHABLE` in this namespace.”
+* “Show me binaries with high density of reachable crypto CVEs.”
+
+---
+
+### 4.5. APIs (idempotent, additive)
+
+#### 4.5.1. Trigger reachability
+
+`POST /reachability/runs`
+
+Request:
+
+```json
+{
+  "artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
+  "config": {
+    "include_languages": ["binary"],
+    "max_call_depth": 10,
+    "assume_indirect_calls": "conservative"
+  }
+}
+```
+
+Response:
+
+```json
+{ "plan_id": "reach:sha256:…" }
+```
+
+* Idempotency key: `(image_ref, config_hash)`. Subsequent calls with the same key return the same `plan_id`.
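
A minimal sketch of that idempotency behaviour follows. It assumes `config_hash` is a SHA-256 over canonical JSON and that `plan_id` is derived from the key alone. A real plan would presumably also fold in analyzer versions from the Plan Manifest. `RunRegistry` is a hypothetical in-memory stand-in for the endpoint.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hash the config in canonical form so key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_id(image_ref: str, config: dict) -> str:
    """Derive the plan ID from the idempotency key (image_ref, config_hash)."""
    key = f"{image_ref}\n{config_hash(config)}".encode("utf-8")
    return "reach:sha256:" + hashlib.sha256(key).hexdigest()


class RunRegistry:
    """Hypothetical stand-in for POST /reachability/runs."""

    def __init__(self) -> None:
        self._runs = {}

    def trigger(self, image_ref: str, config: dict) -> str:
        pid = plan_id(image_ref, config)
        # setdefault makes repeat calls a no-op: the first run record wins.
        self._runs.setdefault(
            pid, {"image_ref": image_ref, "config": config, "status": "queued"}
        )
        return pid

    def count(self) -> int:
        return len(self._runs)
```

Retrying a timed-out request is therefore safe: the client gets the same `plan_id` back and no second analysis is scheduled.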
+
+#### 4.5.2. Fetch results
+
+`GET /reachability/runs/:plan_id`
+
+```json
+{
+  "plan": { /* reachability plan manifest */ },
+  "results": [
+    {
+      "cve_id": "CVE-2025-12345",
+      "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
+      "reachability_state": "statically_reachable",
+      "confidence": { "p": 0.84, "unknowns_pressure": 0.1 }
+    }
+  ]
+}
+```
+
+#### 4.5.3. Per-CVE view for VEXer/Excitor
+
+`GET /reachability/by-cve?artifact=…&cve_id=…`
+
+* Returns filtered result for downstream VEX creation.
+
+All APIs are **read-only** except for the side effect of storing/caching runs.
+
+---
+
+## 5. Interaction with other Stella Ops modules
+
+### 5.1. Concelier
+
+* Provides:
+
+  * CVE→purl→symbol mapping.
+  * Vendor VEX statements indicating affected functions.
+* Consumes:
+
+  * Nothing from reachability directly; `Scanner.WebService` passes the reachability summary to VEXer/Excitor, which merges it with vendor statements.
+
+### 5.2. VEXer / Excitor
+
+* Input:
+
+  * For each `(artifact, cve)`:
+
+    * Reachability state.
+    * Confidence.
+* Logic:
+
+  * Translate states to VEX statements:
+
+    * `NOT_PRESENT` → `not_affected`
+    * `PRESENT_NOT_REACHABLE` → `not_affected` (with justification “code not reachable according to analysis”)
+    * `STATICALLY_REACHABLE` → `affected`
+    * `RUNTIME_OBSERVED` → `affected` (higher severity)
+  * Attach determinism proof:
+
+    * Plan ID + DSSE of reachability run.
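
The state-to-VEX translation above is essentially a lookup table. The sketch below pairs each state with a VEX status and, for `not_affected`, a justification label from the common VEX vocabulary. The specific justification chosen per state, the `UNKNOWN` to `under_investigation` row, and the flat statement shape are assumptions for illustration, not a complete VEX document.

```python
# state -> (VEX status, justification or None)
VEX_BY_STATE = {
    "NOT_PRESENT": ("not_affected", "vulnerable_code_not_present"),
    "PRESENT_NOT_REACHABLE": ("not_affected", "vulnerable_code_not_in_execute_path"),
    "STATICALLY_REACHABLE": ("affected", None),
    "RUNTIME_OBSERVED": ("affected", None),
    "UNKNOWN": ("under_investigation", None),  # assumption: not specified in the text
}


def to_vex_statement(cve_id: str, product: str, state: str, plan_id: str) -> dict:
    """Translate one reachability result into a simplified VEX-like statement."""
    status, justification = VEX_BY_STATE[state]
    statement = {
        "vulnerability": cve_id,
        "product": product,
        "status": status,
        # Determinism proof: points back at the signed reachability run.
        "reachability_plan_id": plan_id,
    }
    if justification is not None:
        statement["justification"] = justification
    if state == "RUNTIME_OBSERVED":
        statement["priority"] = "elevated"  # hypothetical field for "higher severity"
    return statement
```

Keeping the table as data rather than branching logic makes the translation easy to audit and to diff when the mapping changes.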
+
+### 5.3. Signals
+
+* Provides:
+
+  * Function hit events: `(binary_id, function_id, timestamp)` aggregated per image.
+* Reachability Engine:
+
+  * Marks `runtime_hits` and state `RUNTIME_OBSERVED` for symbols with hits.
+* Unknowns:
+
+  * If runtime sees hits in functions with no static edges to entrypoints (or unmapped symbols), these produce Unknowns and increase `unknowns_pressure`.
+
+### 5.4. Unknowns Registry
+
+* From reachability pipeline, create Unknowns when:
+
+  * Symbol→package mapping is ambiguous.
+  * CVE→symbol mapping exists, but symbol cannot be found in binaries.
+  * Call graph has indirect calls that cannot be resolved.
+* The “Unknowns Pressure” term is fed into:
+
+  * Reachability confidence.
+  * Global risk scoring (Trust Algebra Studio).
+
+---
+
+## 6. Implementation phases and engineering plan
+
+### Phase 0 – Scaffolding & manifests (1 sprint)
+
+* Create:
+
+  * `StellaOps.Reachability.Engine` skeleton.
+  * Reachability Plan Manifest schema.
+  * Reachability Run + Result persistence.
+* Add `/reachability/runs` and `/reachability/runs/:plan_id` endpoints, returning mock data.
+* Wire DSSE attestation generation for reachability results (even if payload is empty).
+
+### Phase 1 – Binary discovery + symbolization (1–2 sprints)
+
+* Implement `Binary.Discovery` and `Binary.Symbolizer`.
+* Feed symbol tables into Reachability Engine as “presence-only evidence”:
+
+  * States: `NOT_PRESENT` vs `PRESENT_NOT_REACHABLE` vs `UNKNOWN`.
+* Integrate with Concelier’s CVE→purl mapping (no symbol-level yet):
+
+  * For CVEs affecting a package present in the image, mark as `PRESENT_NOT_REACHABLE`.
+* Emit Unknowns for unresolved binary roles and ambiguous package mapping.
+
+Deliverable: package-level reachability with deterministic manifests.
+
+### Phase 2 – Binary call graphs & entrypoints (2–3 sprints)
+
+* Implement `Binary.CallGraph.Native`:
+
+  * CFG + direct call edges.
+* Implement entrypoint inference from binary + container ENTRYPOINT/CMD.
+* Add static reachability algorithm:
+
+  * DFS/BFS from entrypoints through call graph.
+  * Mark affected symbols as reachable if found on paths.
+* Extend Concelier to ingest symbol-aware vulnerability metadata (for pilots; can be partial).
+
+Deliverable: function-level static reachability for native binaries where symbol maps exist.
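
The static reachability pass in this phase amounts to a bounded breadth-first search over the call-graph IR. The sketch below works on the edge shape from section 4.1.3. Sorting adjacency lists and entrypoints keeps traversal order deterministic. Mapping an affected symbol to a function ID by prefixing `func:` is an assumption about how the two namespaces line up.

```python
from collections import deque


def reachable_functions(edges, entrypoints, max_call_depth=10,
                        edge_types=("direct", "import")):
    """BFS from entrypoints; returns {function_id: shortest call depth}."""
    adjacency = {}
    for edge in edges:
        if edge["type"] in edge_types:
            adjacency.setdefault(edge["caller"], []).append(edge["callee"])
    for callees in adjacency.values():
        callees.sort()  # deterministic visit order

    roots = sorted(set(entrypoints))
    depth = {fn: 0 for fn in roots}
    queue = deque(roots)
    while queue:
        fn = queue.popleft()
        if depth[fn] >= max_call_depth:
            continue  # do not expand beyond the configured depth
        for callee in adjacency.get(fn, []):
            if callee not in depth:
                depth[callee] = depth[fn] + 1
                queue.append(callee)
    return depth


def static_reachability(affected_symbols, depth):
    """Tag each affected symbol; assumes function IDs are 'func:' + symbol."""
    return {
        sym: "reachable" if f"func:{sym}" in depth else "unreachable"
        for sym in affected_symbols
    }
```

Passing a wider `edge_types` tuple (for example adding `"indirect"`) is one way to express the `assume_indirect_calls` setting, trading precision for fewer missed paths.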
+
+### Phase 3 – Runtime integration (2 sprints, may be in parallel workstream)
+
+* Integrate Signals runtime evidence:
+
+  * Define schema for function hit events.
+  * Add ingestion path into Reachability Engine.
+* Update lattice:
+
+  * Promote symbols to `RUNTIME_OBSERVED` when hits exist.
+* Extend DSSE attestation to reference runtime evidence URIs (hashes of trace inputs).
+
+Deliverable: static + runtime-confirmed reachability.
+
+### Phase 4 – Unknowns & pressure (1 sprint)
+
+* Wire Unknowns Registry:
+
+  * Emit unknowns from Symbolizer and CallGraph (identity gaps, missing edges).
+  * Compute `unknowns_pressure` per `(artifact, cve)` as density of unknowns near affected symbols.
+* Adjust confidence calculation in Reachability Engine.
+* Expose unknowns metrics in API and Cartographer.
+
+Deliverable: explicit modelling of uncertainty, feeding into trust/lattice.
+
+### Phase 5 – Language extensions (JS/Python/PHP) (ongoing)
+
+* Implement per-language call-graph analyzers creating the same IR as binary.
+* Extend symbol→purl mapping for these ecosystems (npm, PyPI, Packagist).
+* Update reachability solver to include multi-language edges (e.g., Python calling into native modules).
+
+---
+
+## 7. Minimal contracts for agents
+
+To hand off to agents, you can codify:
+
+1. **IR schemas**
+
+   * Call graph IR.
+   * Reachability Result JSON.
+   * Reachability Plan Manifest.
+2. **API contracts**
+
+   * `POST /reachability/runs`
+   * `GET /reachability/runs/:plan_id`
+   * `GET /reachability/by-cve`
+3. **Module boundaries**
+
+   * `Scanner.Analyzers.Binary.*` produce IR only; NO network calls.
+   * `Reachability.Engine` is the only place where lattice logic lives.
+   * `Concelier` is read-only for reachability; no custom logic there.
+4. **Determinism practices**
+
+   * All algorithmic randomness is banned; where unavoidable, seed with values derived from plan_id.
+   * All external inputs must be listed in the Plan Manifest.
+
+If you like, as a next step I can draft:
+
+* Concrete C# record types for the IRs.
+* A small pseudo-code implementation of the lattice functions and static reachability DFS.
+* A proposed directory layout under `src/StellaOps.Scanner` and `src/StellaOps.Cartographer`.