Function-Level Evidence Readiness (Nov 2025 Advisory)
Last updated: 2025-11-12. Owner: Business Analysis Guild.
This memo captures the outstanding work required to make Stella Ops scanners emit stable, function-level evidence that matches the November 2025 advisory. It does not implement any code; instead it enumerates requirements, links them to sprint tasks, and spells out the schema/API updates that the next agent must land.
1. Goal & Scope
Goal. Anchor every vulnerability finding to an immutable {artifact_digest, code_id} tuple plus optional symbol hints so replayers can prove reachability against stripped binaries.
Scope. Scanner analyzers, runtime ingestion, Signals scoring, Replay manifests, Policy/VEX emission, CLI/UI explainers, and documentation/runbooks needed to operationalise the advisory.
Out of scope: implementing disassemblers or symbol servers; those will be handled inside the module-specific backlog tasks referenced below.
2. Advisory Requirements vs. System Gaps
| Requirement | Current gap | Task references | Notes |
|---|---|---|---|
| Immutable code identity (code_id = {format, build_id, start, length} + optional code_block_hash) | Callgraph nodes are opaque strings with no address metadata. | Sprint 401 GRAPH-CAS-401-001, GAP-SCAN-001, GAP-SYM-007 | code_id should live alongside existing SymbolID helpers so analyzers can emit it without duplicating logic. |
| Symbol hints (demangled name, source, confidence) | No schema fields for symbol metadata; demangling is ad-hoc per analyzer. | GAP-SYM-007 | Require deterministic casing + symbol.source ∈ {DWARF,PDB,SYM,none}. |
| Runtime facts mapped to code anchors | /signals/runtime-facts now accepts JSON and NDJSON (gzip) streams, stores symbol/code/process/container metadata. | Sprint 400 ZASTAVA-REACH-201-001, Sprint 401 SIGNALS-RUNTIME-401-002, GAP-ZAS-002, GAP-SIG-003 | Provenance enrichment (process/socket/container) persisted; next step is exposing CAS URIs + context facts and emitting events for Policy/Replay. |
| Replay/DSSE coverage | Replay manifests don’t enforce hash/CAS registration for graphs/traces. | Sprint 400 REPLAY-REACH-201-005, Sprint 401 REPLAY-401-004, GAP-REP-004 | Extend manifest v2 with analyzer versions + BLAKE3 digests; add DSSE predicate types. |
| Policy/VEX/UI explainability | Policy uses coarse reachability:* tags; UI/CLI cannot show call paths or evidence hashes. | Sprint 401 POLICY-VEX-401-006, UI-CLI-401-007, GAP-POL-005, GAP-VEX-006, EXPERIENCE-GAP-401-012 | Evidence blocks must cite code_id, graph hash, runtime CAS URI, analyzer version. |
| Operator documentation & samples | No guide shows how to replay {build_id,start,len} across CLI/API. | Sprint 401 QA-DOCS-401-008, GAP-DOC-008 | Produce samples under samples/reachability/** plus CLI walkthroughs. |
| Build-id propagation | Build-id not consistently captured or threaded into SymbolID/code_id; SBOM/runtime joins are brittle. | Sprint 401 SCANNER-BUILDID-401-035 | Capture .note.gnu.build-id, include in code identity, expose in SBOM exports and runtime events. |
| Load-time constructors as roots | Graph roots omit .preinit_array/.init_array/_init, missing load-time edges. | Sprint 401 SCANNER-INITROOT-401-036 | Add synthetic roots with phase=load; include DT_NEEDED deps’ constructors. |
| PURL-resolved edges | Call edges do not carry purl or symbol_digest, slowing SBOM joins. | Sprint 401 GRAPH-PURL-401-034 | Annotate edges per docs/reachability/purl-resolved-edges.md; keep deterministic graph hash. |
| Unknowns handling | Unresolved symbols/edges disappear silently. | Sprint 0400 SIGNALS-UNKNOWN-201-008 | Emit Unknowns records (see docs/signals/unknowns-registry.md) and feed unknowns_pressure into scoring. |
| Patch-oracle QA | No guard-rail tests proving binary analyzers see real patch deltas. | Sprint 401 QA-PORACLE-401-037 | Add paired vuln/fixed fixtures and expectations; wire to CI using docs/reachability/patch-oracles.md. |
3. Workstreams & Expectations
3.1 Scanner Symbolization (GAP-SCAN-001 / GAP-SYM-007)
- Define SymbolID helpers that glue together {artifact_digest, file, optional section, addr, length, code_block_hash}.
- Update analyzer contracts so every analyzer returns both symbol_id and code_id, with demangled names stored under the new symbol block.
- Persist the data into richgraph-v1 payloads and attach CAS URIs via StellaOps.Scanner.Reachability.
- Deliver fixtures in tests/reachability/StellaOps.ScannerSignals.IntegrationTests that prove determinism (same hash when analyzer flags reorder).
- Helper status (2025-12-02): SymbolId.ForBinaryAddressed + CodeId.ForBinarySegment now encode {file_hash, section, addr, name, linkage, length, code_block_hash} with normalized hex addresses. Analyzers should start emitting these tuples instead of ad-hoc hashes.
- Binary lifter (2025-12-03): BinaryReachabilityLifter emits richgraph nodes for ELF/PE/Mach-O using file SHA-256 + section/address tuples, attaches code_id anchors, and turns imports/load commands into import edges.
- Schema wiring (2025-12-12): reachability-union + richgraph-v1 serializers now emit symbol {mangled,demangled,source,confidence} and optional code_block_hash for stripped blocks; confidence is clamped to [0,1] and source normalized to uppercase (DWARF|PDB|SYM|NONE).
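The normalisation rules in the schema-wiring note can be sketched as follows. This is a minimal illustration, not the StellaOps helper code: the function names `normalize_hex`, `normalize_symbol`, and `make_code_id` are hypothetical, and lowercase `0x`-prefixed hex is an assumed convention (the memo only says addresses are "normalized").

```python
ALLOWED_SOURCES = {"DWARF", "PDB", "SYM", "NONE"}


def normalize_hex(addr):
    """Return a 0x-prefixed lowercase hex string (assumed convention)."""
    value = addr if isinstance(addr, int) else int(str(addr), 16)
    return hex(value)


def normalize_symbol(mangled, demangled, source, confidence):
    """Build a symbol block: uppercase source, confidence clamped to [0, 1]."""
    src = (source or "none").upper()
    if src not in ALLOWED_SOURCES:
        raise ValueError(f"unsupported symbol source: {source!r}")
    conf = min(1.0, max(0.0, float(confidence)))
    return {
        "mangled": mangled,
        "demangled": demangled,
        "source": src,
        "confidence": conf,
    }


def make_code_id(fmt, build_id, start, length, code_block_hash=None):
    """Assemble a code_id tuple; code_block_hash is only set when provided."""
    code_id = {
        "format": fmt,
        "build_id": build_id,
        "start": normalize_hex(start),
        "length": int(length),
    }
    if code_block_hash is not None:
        code_id["code_block_hash"] = code_block_hash
    return code_id
```

Because inputs are normalised before serialisation, two analyzers that report `0x004012A0` and `4012a0` for the same function would produce the same anchor under these assumptions.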
3.2 Runtime + Signals (GAP-ZAS-002 / GAP-SIG-003)
- Extend Zastava Observer NDJSON schema to emit: symbol_id, code_id, hit_count, observed_at, loader_base, process.buildId.
- Implement /signals/runtime-facts ingestion (gzip + NDJSON) with CAS-backed storage under cas://reachability/runtime/{sha256}.
- Update ReachabilityScoringService to lattice states and include runtime evidence references plus CAS URIs in ReachabilityFactDocument.Metadata.
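A rough sketch of the ingestion step, assuming the CAS key is the SHA-256 of the payload bytes exactly as received (the memo does not say whether the digest covers compressed or decompressed bytes) and that symbol_id is mandatory, as §4.1 states for symbolId. The function names are hypothetical.

```python
import gzip
import hashlib
import json

GZIP_MAGIC = b"\x1f\x8b"


def cas_uri_for(payload: bytes) -> str:
    """Derive a cas://reachability/runtime/{sha256} URI from raw bytes."""
    return "cas://reachability/runtime/" + hashlib.sha256(payload).hexdigest()


def parse_runtime_facts(payload: bytes) -> list:
    """Decode an NDJSON stream (optionally gzip-compressed) into events."""
    raw = gzip.decompress(payload) if payload[:2] == GZIP_MAGIC else payload
    events = []
    for number, line in enumerate(raw.decode("utf-8").splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        event = json.loads(line)
        if "symbol_id" not in event:
            raise ValueError(f"line {number}: symbol_id is required")
        events.append(event)
    return events
```

Hashing the bytes as received keeps the CAS key independent of parser behaviour, at the cost of producing different keys for the same events compressed differently; hashing the decompressed stream is the obvious alternative.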
3.3 Replay & Evidence (GAP-REP-004)
- Enforce CAS registration + BLAKE3 hashing before manifest writes (graphs and traces).
- Teach ReachabilityReplayWriter to require analyzer name/version, graph kind, code_id coverage summary.
- Update docs/replay/DETERMINISTIC_REPLAY.md once schema v2 is finalized.
3.4 Policy, VEX, CLI/UI (GAP-POL-005 / GAP-VEX-006)
- Policy Engine: ingest new reachability facts, expose reachability.state, max_path_conf, and evidence.graph_hash via SPL + API.
- CLI/UI: add stella graph explain and explain drawer showing call path (SymbolID list), code anchors, runtime hits, DSSE references.
- Notify templates: include short evidence summary (first hop + truncated code_id).
3.5 Documentation & Samples (GAP-DOC-008)
- Publish schema diffs in docs/data/evidence-schema.md (new file) covering SBOM evidence nodes, runtime NDJSON, and API responses.
- Write CLI/API walkthroughs in docs/09_API_CLI_REFERENCE.md and docs/api/policy.md showing how to request reachability evidence and verify DSSE chains.
- Produce OpenVEX + replay samples under samples/reachability/ showing facts.type = "stella.reachability" with graph_hash and code_id arrays.
3.6 Native lifter & Reachability Store (SCANNER-NATIVE-401-015 / SIG-STORE-401-016)
- Stand up Scanner.Symbols.Native + Scanner.CallGraph.Native libraries that:
  - parse ELF (DWARF + .symtab/.dynsym), PE/COFF (CodeView/PDB), and stripped binaries via probabilistic carving;
  - emit deterministic FuncNode + CallEdge records with demangled names, language hints, and {confidence,evidence} arrays; and
  - attach analyzer + toolchain identifiers consumed by richgraph-v1.
- Introduce Reachability.Store collections in Mongo:
  - func_nodes – keyed by func:<format>:<sha256>:<va> with {binDigest,name,addr,size,lang,confidence,sym}.
  - call_edges – {from,to,kind,confidence,evidence[]} linking internal/external nodes.
  - cve_func_hits – {cve,purl,func_id,match_kind,confidence,source} for advisory alignment.
- Build indexes (binDigest+name, from→to, cve+func_id) and expose repository interfaces so Scanner, Signals, and Policy can reuse the same canonical data without duplicating queries.
4. Schema & API Touchpoints
Authoritative field list lives in docs/reachability/evidence-schema.md; use it for DTOs and CAS writers.
The next implementation pass must cover the following documents/files (create them if missing):
- docs/data/evidence-schema.md – authoritative schema for {code_id, symbol, tool} blocks.
- docs/runbooks/reachability-runtime.md – operator steps for staging runtime ingestion bundles, retention, and troubleshooting.
- docs/runbooks/replay_ops.md – add section detailing replay verification using the new graph/runtime CAS entries.
API contracts to amend:
- POST /signals/callgraphs response includes graphHash (sha256) for the normalized callgraph; richgraph-v1 uses BLAKE3 for graph CAS hashes.
- POST /signals/runtime-facts request body schema (NDJSON) with symbol_id, code_id, hit_count, loader_base.
- GET /policy/findings payload must surface reachability.evidence[] objects.
4.1 Signals runtime ingestion snapshot (Nov 2025)
- /signals/runtime-facts (JSON) and /signals/runtime-facts/ndjson (streaming, optional gzip) accept the following event fields:
  - symbolId (required), codeId, loaderBase, hitCount, processId, processName, socketAddress, containerId, evidenceUri, metadata.
  - Subject context (scanId / imageDigest / component / version) plus callgraphId is supplied either in the JSON body or as query params for the NDJSON endpoint.
- Signals dedupes events, merges metadata, and persists the aggregated RuntimeFacts onto ReachabilityFactDocument. These facts now feed reachability scoring (SIGNALS-24-004/005) as part of the runtime bonus lattice.
- Outstanding work: record CAS URIs for runtime traces, emit provenance events, and expose the enriched context to Policy/Replay consumers.
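The dedupe-and-merge behaviour could look roughly like the sketch below. The memo does not specify the dedupe key or merge rules, so three things here are assumptions: events are keyed on (symbolId, codeId, loaderBase), a missing hitCount counts as 1, and later metadata values overwrite earlier ones. The function name is hypothetical.

```python
def aggregate_runtime_facts(events):
    """Collapse duplicate runtime events, summing hit counts per key."""
    merged = {}
    for event in events:
        key = (event["symbolId"], event.get("codeId"), event.get("loaderBase"))
        slot = merged.setdefault(
            key,
            {
                "symbolId": key[0],
                "codeId": key[1],
                "loaderBase": key[2],
                "hitCount": 0,
                "metadata": {},
            },
        )
        slot["hitCount"] += int(event.get("hitCount", 1))
        slot["metadata"].update(event.get("metadata") or {})

    def sort_key(key):
        # None is not orderable against str, so map it to an empty string.
        return tuple("" if part is None else str(part) for part in key)

    # Emit in a stable order so persisted facts are reproducible.
    return [merged[key] for key in sorted(merged, key=sort_key)]
```

Sorting the output keeps the aggregated facts deterministic regardless of arrival order, which matters once they feed hashed replay artifacts.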
4.2 Reachability store layout (SIG-STORE-401-016)
All producers must persist native function evidence using the shared collections below (names are advisory; exact names live in Mongo options):
// func_nodes
{
"_id": "func:ELF:sha256:4012a0",
"binDigest": "sha256:deadbeef...",
"name": "ssl3_read_bytes",
"addr": "0x4012a0",
"size": 312,
"lang": "c",
"confidence": 0.92,
"symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF" },
"sym": "present"
}
// call_edges
{
"from": "func:ELF:sha256:4012a0",
"to": "func:ELF:sha256:40f0ff",
"kind": "static",
"confidence": 0.88,
"evidence": ["reloc:.plt.got", "bb-target:0x40f0ff"]
}
// cve_func_hits
{
"cve": "CVE-2023-XXXX",
"purl": "pkg:generic/openssl@1.1.1u",
"func_id": "func:ELF:sha256:4012a0",
"match": "name+version",
"confidence": 0.77,
"source": "concelier:openssl-advisory"
}
Writers must:
- Upsert func_nodes before emitting edges/hits to ensure _id lookups remain stable.
- Serialize evidence arrays in deterministic order (reloc, bb-target, import, …) and normalise hex casing.
- Attach analyzer fingerprints (scanner.native@sha256:...) so Replay/Policy can enforce provenance.
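One way to meet the ordering rule for evidence arrays is sketched below. Only the three kinds the memo names are ranked; sorting any other kind after them alphabetically is an assumption, as is lowercasing hex literals. `sort_evidence` is a hypothetical helper name.

```python
import re

# Kinds named in the memo, in their stated order.
EVIDENCE_ORDER = ["reloc", "bb-target", "import"]
HEX_LITERAL = re.compile(r"0[xX][0-9a-fA-F]+")


def normalise_hex_casing(text: str) -> str:
    """Lowercase every 0x-prefixed literal inside an evidence string."""
    return HEX_LITERAL.sub(lambda match: match.group(0).lower(), text)


def sort_evidence(evidence):
    """Return evidence strings in a stable, kind-ranked order."""
    def rank(item):
        kind = item.split(":", 1)[0]
        if kind in EVIDENCE_ORDER:
            return (EVIDENCE_ORDER.index(kind), item)
        return (len(EVIDENCE_ORDER), item)  # unranked kinds go last

    return sorted((normalise_hex_casing(item) for item in evidence), key=rank)
```

For example, `sort_evidence(["import:libssl", "bb-target:0x40F0FF", "reloc:.plt.got"])` yields the reloc entry first, then the lowercased bb-target entry, then the import entry.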
5. Test & Fixture Expectations
- Reachbench fixtures: update golden cases with code_id + symbol metadata. Ensure both reachable/unreachable variants still pass once graphs contain the richer IDs.
- Signals unit tests: add deterministic tests for lattice scoring + runtime evidence linking (tests/reachability/StellaOps.Signals.Reachability.Tests).
- Replay tests: extend tests/reachability/StellaOps.Replay.Core.Tests to assert manifest v2 serialization and hash enforcement.
All fixtures must remain deterministic: sort nodes/edges, normalise casing, and freeze timestamps in test data.
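A determinism check of this kind can be sketched as below. It sorts nodes and edges, serialises them as canonical JSON, and hashes the result. SHA-256 is used only because it ships with Python's standard library; richgraph-v1 CAS hashes use BLAKE3, which would need a third-party package. The field names `id`, `from`, `to`, and `kind` and the function name are illustrative assumptions.

```python
import hashlib
import json


def canonical_graph_hash(nodes, edges) -> str:
    """Hash a graph so that input ordering does not affect the digest."""
    canonical = {
        "nodes": sorted(nodes, key=lambda node: node["id"]),
        "edges": sorted(
            edges, key=lambda edge: (edge["from"], edge["to"], edge["kind"])
        ),
    }
    blob = json.dumps(
        canonical,
        sort_keys=True,          # stable key order inside every object
        separators=(",", ":"),   # no whitespace differences
        ensure_ascii=False,
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(blob).hexdigest()
```

A fixture test would then build the same graph twice with shuffled node and edge order and assert that both calls return the same digest.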
6. Handoff Checklist for the Next Agent
- Confirm sprint entries (SPRINT_400 and SPRINT_401) remain in sync when moving GAP-* tasks to DOING/DONE.
- Start with GAP-SYM-007 (schema/helper implementation) because downstream work depends on the new code_id payload shape.
- Once schema PR merges, coordinate with Signals + Policy guilds to align on CAS naming and DSSE predicates before wiring APIs.
- Update the docs listed in §4 as each component lands; keep this file current with statuses and links to PRs/ADRs.
- Before shipping, run the reachbench fixtures end-to-end and capture hashes for inclusion in replay docs.
Keep this document updated as tasks change state; it is the authoritative hand-off note for the advisory.