# Function-Level Evidence Readiness (Nov 2025 Advisory)

_Last updated: 2025-11-12. Owner: Business Analysis Guild._

This memo captures the outstanding work required to make Stella Ops scanners emit stable, function-level evidence that matches the November 2025 advisory. It does **not** implement any code; instead it enumerates requirements, links them to sprint tasks, and spells out the schema/API updates that the next agent must land.

---

## 1. Goal & Scope

**Goal.** Anchor every vulnerability finding to an immutable `{artifact_digest, code_id}` tuple plus optional symbol hints so replayers can prove reachability against stripped binaries.

**Scope.** Scanner analyzers, runtime ingestion, Signals scoring, Replay manifests, Policy/VEX emission, CLI/UI explainers, and documentation/runbooks needed to operationalise the advisory.

Out of scope: implementing disassemblers or symbol servers; those will be handled inside the module-specific backlog tasks referenced below.

---

## 2. Advisory Requirements vs. System Gaps

| Requirement | Current gap | Task references | Notes |
|-------------|-------------|-----------------|-------|
| Immutable code identity (`code_id` = `{format, build_id, start, length}` + optional `code_block_hash`) | Callgraph nodes are opaque strings with no address metadata. | Sprint 401 `GRAPH-CAS-401-001`, `GAP-SCAN-001`, `GAP-SYM-007` | `code_id` should live alongside existing `SymbolID` helpers so analyzers can emit it without duplicating logic. |
| Symbol hints (demangled name, source, confidence) | No schema fields for symbol metadata; demangling is ad-hoc per analyzer. | `GAP-SYM-007` | Require deterministic casing + `symbol.source ∈ {DWARF,PDB,SYM,none}`. |
| Runtime facts mapped to code anchors | `/signals/runtime-facts` now accepts JSON and NDJSON (gzip) streams, stores symbol/code/process/container metadata. | Sprint 400 `ZASTAVA-REACH-201-001`, Sprint 401 `SIGNALS-RUNTIME-401-002`, `GAP-ZAS-002`, `GAP-SIG-003` | Provenance enrichment (process/socket/container) persisted; next step is exposing CAS URIs + context facts and emitting events for Policy/Replay. |
| Replay/DSSE coverage | Replay manifests don’t enforce hash/CAS registration for graphs/traces. | Sprint 400 `REPLAY-REACH-201-005`, Sprint 401 `REPLAY-401-004`, `GAP-REP-004` | Extend manifest v2 with analyzer versions + BLAKE3 digests; add DSSE predicate types. |
| Policy/VEX/UI explainability | Policy uses coarse `reachability:*` tags; UI/CLI cannot show call paths or evidence hashes. | Sprint 401 `POLICY-VEX-401-006`, `UI-CLI-401-007`, `GAP-POL-005`, `GAP-VEX-006`, `EXPERIENCE-GAP-401-012` | Evidence blocks must cite `code_id`, graph hash, runtime CAS URI, analyzer version. |
| Operator documentation & samples | No guide shows how to replay `{build_id,start,len}` across CLI/API. | Sprint 401 `QA-DOCS-401-008`, `GAP-DOC-008` | Produce samples under `samples/reachability/**` plus CLI walkthroughs. |
| Build-id propagation | Build-id not consistently captured or threaded into `SymbolID`/`code_id`; SBOM/runtime joins are brittle. | Sprint 401 `SCANNER-BUILDID-401-035` | Capture `.note.gnu.build-id`, include in code identity, expose in SBOM exports and runtime events. |
| Load-time constructors as roots | Graph roots omit `.preinit_array`/`.init_array`/`_init`, missing load-time edges. | Sprint 401 `SCANNER-INITROOT-401-036` | Add synthetic roots with `phase=load`; include `DT_NEEDED` deps’ constructors. |
| PURL-resolved edges | Call edges do not carry `purl` or `symbol_digest`, slowing SBOM joins. | Sprint 401 `GRAPH-PURL-401-034` | Annotate edges per `docs/reachability/purl-resolved-edges.md`; keep deterministic graph hash. |
| Unknowns handling | Unresolved symbols/edges disappear silently. | Sprint 0400 `SIGNALS-UNKNOWN-201-008` | Emit Unknowns records (see `docs/signals/unknowns-registry.md`) and feed `unknowns_pressure` into scoring. |
| Patch-oracle QA | No guard-rail tests proving binary analyzers see real patch deltas. | Sprint 401 `QA-PORACLE-401-037` | Add paired vuln/fixed fixtures and expectations; wire to CI using `docs/reachability/patch-oracles.md`. |

---

## 3. Workstreams & Expectations

### 3.1 Scanner Symbolization (GAP-SCAN-001 / GAP-SYM-007)

* Define `SymbolID` helpers that glue together `{artifact_digest, file, section (optional), addr, length, code_block_hash}`.
* Update analyzer contracts so every analyzer returns both `symbol_id` and `code_id`, with demangled names stored under the new `symbol` block.
* Persist the data into `richgraph-v1` payloads and attach CAS URIs via `StellaOps.Scanner.Reachability`.
* Deliver fixtures in `tests/reachability/StellaOps.ScannerSignals.IntegrationTests` that prove determinism (same hash when analyzer flags reorder).
* **Helper status (2025-12-02):** `SymbolId.ForBinaryAddressed` + `CodeId.ForBinarySegment` now encode `{file_hash, section, addr, name, linkage, length, code_block_hash}` with normalized hex addresses. Analyzers should start emitting these tuples instead of ad-hoc hashes.
* **Binary lifter (2025-12-03):** `BinaryReachabilityLifter` emits richgraph nodes for ELF/PE/Mach-O using file SHA-256 + section/address tuples, attaches `code_id` anchors, and turns imports/load commands into `import` edges.
* **Schema wiring (2025-12-12):** `reachability-union` + `richgraph-v1` serializers now emit `symbol {mangled,demangled,source,confidence}` and optional `code_block_hash` for stripped blocks; confidence is clamped to `[0,1]` and `source` normalized to uppercase (`DWARF|PDB|SYM|NONE`).

### 3.2 Runtime + Signals (GAP-ZAS-002 / GAP-SIG-003)

* Extend Zastava Observer NDJSON schema to emit: `symbol_id`, `code_id`, `hit_count`, `observed_at`, `loader_base`, `process.buildId`.
* Implement `/signals/runtime-facts` ingestion (gzip + NDJSON) with CAS-backed storage under `cas://reachability/runtime/{sha256}`.
* Update `ReachabilityScoringService` to use lattice states and include runtime evidence references plus CAS URIs in `ReachabilityFactDocument.Metadata`.

### 3.3 Replay & Evidence (GAP-REP-004)

* Enforce CAS registration + BLAKE3 hashing before manifest writes (graphs and traces).
* Teach `ReachabilityReplayWriter` to require analyzer name/version, graph kind, and a `code_id` coverage summary.
* Update `docs/replay/DETERMINISTIC_REPLAY.md` once schema v2 is finalized.

### 3.4 Policy, VEX, CLI/UI (GAP-POL-005 / GAP-VEX-006)

* Policy Engine: ingest new reachability facts, expose `reachability.state`, `max_path_conf`, and `evidence.graph_hash` via SPL + API.
* CLI/UI: add `stella graph explain` and an explain drawer showing the call path (`SymbolID` list), code anchors, runtime hits, and DSSE references.
* Notify templates: include a short evidence summary (first hop + truncated `code_id`).

### 3.5 Documentation & Samples (GAP-DOC-008)

* Publish schema diffs in `docs/data/evidence-schema.md` (new file) covering SBOM evidence nodes, runtime NDJSON, and API responses.
* Write CLI/API walkthroughs in `docs/09_API_CLI_REFERENCE.md` and `docs/api/policy.md` showing how to request reachability evidence and verify DSSE chains.
* Produce OpenVEX + replay samples under `samples/reachability/` showing `facts.type = "stella.reachability"` with `graph_hash` and `code_id` arrays.

### 3.6 Native lifter & Reachability Store (SCANNER-NATIVE-401-015 / SIG-STORE-401-016)

* Stand up `Scanner.Symbols.Native` + `Scanner.CallGraph.Native` libraries that:
  * parse ELF (DWARF + `.symtab`/`.dynsym`), PE/COFF (CodeView/PDB), and stripped binaries via probabilistic carving;
  * emit deterministic `FuncNode` + `CallEdge` records with demangled names, language hints, and `{confidence,evidence}` arrays; and
  * attach analyzer + toolchain identifiers consumed by `richgraph-v1`.
* Introduce `Reachability.Store` collections in Mongo:
  * `func_nodes` – keyed by `func:::` (see the `_id` values in §4.2) with `{binDigest,name,addr,size,lang,confidence,sym}`.
  * `call_edges` – `{from,to,kind,confidence,evidence[]}` linking internal/external nodes.
  * `cve_func_hits` – `{cve,purl,func_id,match_kind,confidence,source}` for advisory alignment.
* Build indexes (`binDigest+name`, `from→to`, `cve+func_id`) and expose repository interfaces so Scanner, Signals, and Policy can reuse the same canonical data without duplicating queries.

---

## 4. Schema & API Touchpoints

Authoritative field list lives in `docs/reachability/evidence-schema.md`; use it for DTOs and CAS writers.

The next implementation pass must cover the following documents/files (create them if missing):

1. `docs/data/evidence-schema.md` – authoritative schema for `{code_id, symbol, tool}` blocks.
2. `docs/runbooks/reachability-runtime.md` – operator steps for staging runtime ingestion bundles, retention, and troubleshooting.
3. `docs/runbooks/replay_ops.md` – add section detailing replay verification using the new graph/runtime CAS entries.

API contracts to amend:

- `POST /signals/callgraphs` response should include `graphHash` (BLAKE3) once `GRAPH-CAS-401-001` lands.
- `POST /signals/runtime-facts` request body schema (NDJSON) with `symbol_id`, `code_id`, `hit_count`, `loader_base`.
- `GET /policy/findings` payload must surface `reachability.evidence[]` objects.

### 4.1 Signals runtime ingestion snapshot (Nov 2025)

- `/signals/runtime-facts` (JSON) and `/signals/runtime-facts/ndjson` (streaming, optional gzip) accept the following event fields:
  - `symbolId` (required), `codeId`, `loaderBase`, `hitCount`, `processId`, `processName`, `socketAddress`, `containerId`, `evidenceUri`, `metadata`.
- Subject context (`scanId` / `imageDigest` / `component` / `version`) plus `callgraphId` is supplied either in the JSON body or as query params for the NDJSON endpoint.
- Signals dedupes events, merges metadata, and persists the aggregated `RuntimeFacts` onto `ReachabilityFactDocument`. These facts now feed reachability scoring (SIGNALS-24-004/005) as part of the runtime bonus lattice.
- Outstanding work: record CAS URIs for runtime traces, emit provenance events, and expose the enriched context to Policy/Replay consumers.

### 4.2 Reachability store layout (SIG-STORE-401-016)

All producers **must** persist native function evidence using the shared collections below (names are advisory; exact names live in Mongo options):

```json
// func_nodes
{
  "_id": "func:ELF:sha256:4012a0",
  "binDigest": "sha256:deadbeef...",
  "name": "ssl3_read_bytes",
  "addr": "0x4012a0",
  "size": 312,
  "lang": "c",
  "confidence": 0.92,
  "symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF" },
  "sym": "present"
}

// call_edges
{
  "from": "func:ELF:sha256:4012a0",
  "to": "func:ELF:sha256:40f0ff",
  "kind": "static",
  "confidence": 0.88,
  "evidence": ["reloc:.plt.got", "bb-target:0x40f0ff"]
}

// cve_func_hits
{
  "cve": "CVE-2023-XXXX",
  "purl": "pkg:generic/openssl@1.1.1u",
  "func_id": "func:ELF:sha256:4012a0",
  "match": "name+version",
  "confidence": 0.77,
  "source": "concelier:openssl-advisory"
}
```

Writers **must**:

1. Upsert `func_nodes` before emitting edges/hits to ensure `_id` lookups remain stable.
2. Serialize evidence arrays in deterministic order (`reloc`, `bb-target`, `import`, …) and normalise hex casing.
3. Attach analyzer fingerprints (`scanner.native@sha256:...`) so Replay/Policy can enforce provenance.

---

## 5. Test & Fixture Expectations

- **Reachbench fixtures**: update golden cases with `code_id` + `symbol` metadata. Ensure both reachable/unreachable variants still pass once graphs contain the richer IDs.
- **Signals unit tests**: add deterministic tests for lattice scoring + runtime evidence linking (`tests/reachability/StellaOps.Signals.Reachability.Tests`).
- **Replay tests**: extend `tests/reachability/StellaOps.Replay.Core.Tests` to assert manifest v2 serialization and hash enforcement.

All fixtures must remain deterministic: sort nodes/edges, normalise casing, and freeze timestamps in test data.

---

## 6. Handoff Checklist for the Next Agent

1. Confirm sprint entries (`SPRINT_400` and `SPRINT_401`) remain in sync when moving `GAP-*` tasks to DOING/DONE.
2. Start with `GAP-SYM-007` (schema/helper implementation) because downstream work depends on the new `code_id` payload shape.
3. Once schema PR merges, coordinate with Signals + Policy guilds to align on CAS naming and DSSE predicates before wiring APIs.
4. Update the docs listed in §4 as each component lands; keep this file current with statuses and links to PRs/ADRs.
5. Before shipping, run the reachbench fixtures end-to-end and capture hashes for inclusion in replay docs.

Keep this document updated as tasks change state; it is the authoritative hand-off note for the advisory.
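
To illustrate the determinism rules from §4.2 and §5 (sorted nodes and edges, normalised hex casing, ordered evidence arrays), the following is a minimal Python sketch of a canonical graph hash. It is not the Stella Ops implementation: every function name is hypothetical, the evidence ranking is only inferred from the `reloc`, `bb-target`, `import` ordering hint, and SHA-256 stands in for BLAKE3 because BLAKE3 is not part of the Python standard library.

```python
import hashlib
import json

# Assumed ranking, inferred from the "reloc, bb-target, import, ..." hint in section 4.2.
EVIDENCE_RANK = {"reloc": 0, "bb-target": 1, "import": 2}


def normalise_hex(value):
    """Return a lower-case hex literal without leading zeros."""
    return hex(int(value, 16))


def normalise_evidence(item):
    """Normalise the hex part of entries such as 'bb-target:0x40F0FF'."""
    kind, sep, detail = item.partition(":")
    if sep and detail.lower().startswith("0x"):
        detail = normalise_hex(detail)
    return kind + sep + detail


def evidence_key(item):
    """Order evidence by kind rank, then by the full string; unknown kinds go last."""
    kind = item.partition(":")[0]
    return (EVIDENCE_RANK.get(kind, len(EVIDENCE_RANK)), item)


def canonical_graph_hash(nodes, edges):
    """Hash a graph so that input order and hex casing do not affect the result."""
    canon_nodes = sorted(
        ({**n, "addr": normalise_hex(n["addr"])} for n in nodes),
        key=lambda n: n["_id"],
    )
    canon_edges = sorted(
        (
            {**e, "evidence": sorted(
                (normalise_evidence(x) for x in e.get("evidence", [])),
                key=evidence_key,
            )}
            for e in edges
        ),
        key=lambda e: (e["from"], e["to"], e["kind"]),
    )
    payload = json.dumps(
        {"nodes": canon_nodes, "edges": canon_edges},
        sort_keys=True,
        separators=(",", ":"),
    )
    # SHA-256 is a stand-in here; the memo calls for BLAKE3 digests.
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Feeding the same graph with reordered nodes, shuffled evidence, or upper-case addresses should produce the same digest, which is the property the reachbench and replay fixtures are expected to assert.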