Function-Level Evidence Readiness (Nov 2025 Advisory)

Last updated: 2025-11-12. Owner: Business Analysis Guild.

This memo captures the outstanding work required to make StellaOps scanners emit stable, function-level evidence that matches the November 2025 advisory. It does not implement any code; instead it enumerates requirements, links them to sprint tasks, and spells out the schema/API updates that the next agent must land.


1. Goal & Scope

Goal. Anchor every vulnerability finding to an immutable {artifact_digest, code_id} tuple plus optional symbol hints so replayers can prove reachability against stripped binaries.

Scope. Scanner analyzers, runtime ingestion, Signals scoring, Replay manifests, Policy/VEX emission, CLI/UI explainers, and documentation/runbooks needed to operationalise the advisory.

Out of scope: implementing disassemblers or symbol servers; those will be handled inside the module-specific backlog tasks referenced below.


2. Advisory Requirements vs. System Gaps

| Requirement | Current gap | Task references | Notes |
| --- | --- | --- | --- |
| Immutable code identity (code_id = {format, build_id, start, length} + optional code_block_hash) | Callgraph nodes are opaque strings with no address metadata. | Sprint401 GRAPH-CAS-401-001, GAP-SCAN-001, GAP-SYM-007 | code_id should live alongside existing SymbolID helpers so analyzers can emit it without duplicating logic. |
| Symbol hints (demangled name, source, confidence) | No schema fields for symbol metadata; demangling is ad-hoc per analyzer. | GAP-SYM-007 | Require deterministic casing + symbol.source ∈ {DWARF,PDB,SYM,none}. |
| Runtime facts mapped to code anchors | /signals/runtime-facts now accepts JSON and NDJSON (gzip) streams, stores symbol/code/process/container metadata. | Sprint400 ZASTAVA-REACH-201-001, Sprint401 SIGNALS-RUNTIME-401-002, GAP-ZAS-002, GAP-SIG-003 | Provenance enrichment (process/socket/container) persisted; next step is exposing CAS URIs + context facts and emitting events for Policy/Replay. |
| Replay/DSSE coverage | Replay manifests don't enforce hash/CAS registration for graphs/traces. | Sprint400 REPLAY-REACH-201-005, Sprint401 REPLAY-401-004, GAP-REP-004 | Extend manifest v2 with analyzer versions + BLAKE3 digests; add DSSE predicate types. |
| Policy/VEX/UI explainability | Policy uses coarse reachability:* tags; UI/CLI cannot show call paths or evidence hashes. | Sprint401 POLICY-VEX-401-006, UI-CLI-401-007, GAP-POL-005, GAP-VEX-006, EXPERIENCE-GAP-401-012 | Evidence blocks must cite code_id, graph hash, runtime CAS URI, analyzer version. |
| Operator documentation & samples | No guide shows how to replay {build_id,start,len} across CLI/API. | Sprint401 QA-DOCS-401-008, GAP-DOC-008 | Produce samples under samples/reachability/** plus CLI walkthroughs. |
| Build-id propagation | Build-id not consistently captured or threaded into SymbolID/code_id; SBOM/runtime joins are brittle. | Sprint401 SCANNER-BUILDID-401-035 | Capture .note.gnu.build-id, include in code identity, expose in SBOM exports and runtime events. |
| Load-time constructors as roots | Graph roots omit .preinit_array/.init_array/_init, missing load-time edges. | Sprint401 SCANNER-INITROOT-401-036 | Add synthetic roots with phase=load; include constructors of DT_NEEDED dependencies. |
| PURL-resolved edges | Call edges do not carry purl or symbol_digest, slowing SBOM joins. | Sprint401 GRAPH-PURL-401-034 | Annotate edges per docs/reachability/purl-resolved-edges.md; keep deterministic graph hash. |
| Unknowns handling | Unresolved symbols/edges disappear silently. | Sprint0400 SIGNALS-UNKNOWN-201-008 | Emit Unknowns records (see docs/signals/unknowns-registry.md) and feed unknowns_pressure into scoring. |
| Patch-oracle QA | No guard-rail tests proving binary analyzers see real patch deltas. | Sprint401 QA-PORACLE-401-037 | Add paired vuln/fixed fixtures and expectations; wire to CI using docs/reachability/patch-oracles.md. |

3. Workstreams & Expectations

3.1 Scanner Symbolization (GAP-SCAN-001 / GAP-SYM-007)

  • Define SymbolID helpers that glue together {artifact_digest, file, optional section, addr, length, code_block_hash}.
  • Update analyzer contracts so every analyzer returns both symbol_id and code_id, with demangled names stored under the new symbol block.
  • Persist the data into richgraph-v1 payloads and attach CAS URIs via StellaOps.Scanner.Reachability.
  • Deliver fixtures in tests/reachability/StellaOps.ScannerSignals.IntegrationTests that prove determinism (the hash stays the same when analyzer flags are reordered).
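As a rough illustration of the code_id shape from §2, the sketch below builds a canonical, order-independent serialization and validates the symbol.source values listed there. It is not the StellaOps helper: the class name, the make_symbol_block function, the lower-case hex convention, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the advisory table: symbol.source ∈ {DWARF,PDB,SYM,none}.
ALLOWED_SYMBOL_SOURCES = {"DWARF", "PDB", "SYM", "none"}


@dataclass(frozen=True)
class CodeId:
    """Hypothetical holder for {format, build_id, start, length} + optional hash."""
    format: str
    build_id: str
    start: int
    length: int
    code_block_hash: Optional[str] = None

    def canonical(self) -> str:
        # Sorted keys, no whitespace, lower-case hex: the bytes do not depend
        # on field order or on how the caller cased the build-id (assumption).
        body = {
            "format": self.format,
            "build_id": self.build_id.lower(),
            "start": f"0x{self.start:x}",
            "length": self.length,
        }
        if self.code_block_hash is not None:
            body["code_block_hash"] = self.code_block_hash.lower()
        return json.dumps(body, sort_keys=True, separators=(",", ":"))

    def digest(self) -> str:
        # SHA-256 is a stand-in; the memo does not name the hash used for code_id.
        return "sha256:" + hashlib.sha256(self.canonical().encode("utf-8")).hexdigest()


def make_symbol_block(demangled: str, source: str, confidence: float) -> dict:
    """Build a symbol hint block, rejecting sources outside the allowed set."""
    if source not in ALLOWED_SYMBOL_SOURCES:
        raise ValueError(f"unsupported symbol.source: {source!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    return {"demangled": demangled, "source": source, "confidence": confidence}
```

Two CodeId values that differ only in build-id casing produce the same digest, which is the property the determinism fixtures are meant to lock in.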

3.2 Runtime + Signals (GAP-ZAS-002 / GAP-SIG-003)

  • Extend Zastava Observer NDJSON schema to emit: symbol_id, code_id, hit_count, observed_at, loader_base, process.buildId.
  • Implement /signals/runtime-facts ingestion (gzip + NDJSON) with CAS-backed storage under cas://reachability/runtime/{sha256}.
  • Update ReachabilityScoringService to score with lattice states and to include runtime evidence references plus CAS URIs in ReachabilityFactDocument.Metadata.
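The NDJSON and CAS requirements above could look roughly like the following sketch. The field names come from the Zastava Observer bullet; the event values are made up, and hashing the uncompressed NDJSON bytes (rather than the gzip stream) to form the CAS key is an assumption, since the memo does not say which bytes are addressed.

```python
import gzip
import hashlib
import json


def to_ndjson(events: list[dict]) -> bytes:
    """One compact JSON object per line, keys sorted for stable bytes."""
    lines = [json.dumps(e, sort_keys=True, separators=(",", ":")) for e in events]
    return ("\n".join(lines) + "\n").encode("utf-8")


def encode_runtime_facts(events: list[dict]) -> bytes:
    """Gzip the NDJSON payload; mtime=0 keeps the gzip header timestamp fixed."""
    return gzip.compress(to_ndjson(events), mtime=0)


def decode_runtime_facts(blob: bytes) -> list[dict]:
    """Inverse of encode_runtime_facts, skipping blank lines."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


def runtime_cas_uri(events: list[dict]) -> str:
    """CAS key derived from the uncompressed NDJSON bytes (assumption)."""
    digest = hashlib.sha256(to_ndjson(events)).hexdigest()
    return f"cas://reachability/runtime/{digest}"


# Hypothetical event carrying the fields named in the schema bullet.
sample_events = [{
    "symbol_id": "sym:example",
    "code_id": "code:example",
    "hit_count": 3,
    "observed_at": "2025-11-12T00:00:00Z",
    "loader_base": "0x400000",
    "process": {"buildId": "abcdef"},
}]
```

Decoding the encoded blob returns the original events, and the same events always map to the same CAS URI regardless of key order in the input dictionaries.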

3.3 Replay & Evidence (GAP-REP-004)

  • Enforce CAS registration + BLAKE3 hashing before manifest writes (graphs and traces).
  • Teach ReachabilityReplayWriter to require analyzer name/version, graph kind, code_id coverage summary.
  • Update docs/replay/DETERMINISTIC_REPLAY.md once schema v2 is finalized.
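A minimal sketch of the "refuse to write unless everything is registered" rule is shown below. The entry field names, the blake3: digest prefix, and the manifest layout are hypothetical, because manifest v2 is not finalized in this memo; the code only checks that a digest is declared and does not compute BLAKE3 itself.

```python
# Hypothetical field names for a manifest entry; manifest v2 is not final.
REQUIRED_FIELDS = ("analyzer_name", "analyzer_version", "graph_kind", "digest", "cas_uri")


def validate_entry(entry: dict, registered_cas_uris: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the entry may be written."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not entry.get(name)]
    digest = entry.get("digest") or ""
    if digest and not digest.startswith("blake3:"):
        problems.append("digest is not declared as BLAKE3")
    uri = entry.get("cas_uri")
    if uri and uri not in registered_cas_uris:
        problems.append("artifact is not registered in CAS")
    return problems


def build_manifest(entries: list[dict], registered_cas_uris: set[str]) -> dict:
    """Assemble the manifest only when every graph/trace entry passes validation."""
    failures = {}
    for index, entry in enumerate(entries):
        problems = validate_entry(entry, registered_cas_uris)
        if problems:
            failures[index] = problems
    if failures:
        raise ValueError(f"manifest rejected: {failures}")
    return {"schemaVersion": 2, "reachability": entries}
```

Collecting every problem before raising gives the operator one complete error report instead of a fix-and-retry loop.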

3.4 Policy, VEX, CLI/UI (GAP-POL-005 / GAP-VEX-006)

  • Policy Engine: ingest new reachability facts, expose reachability.state, max_path_conf, and evidence.graph_hash via SPL + API.
  • CLI/UI: add the stella graph explain command and an explain drawer showing the call path (SymbolID list), code anchors, runtime hits, and DSSE references.
  • Notify templates: include short evidence summary (first hop + truncated code_id).
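For the Notify summary, one possible reading of "first hop + truncated code_id" is sketched here. Treating the first entry of the call path as the first hop, the 12-character cut-off, and the output wording are all assumptions for illustration.

```python
def evidence_summary(call_path: list[str], code_id: str, keep: int = 12) -> str:
    """Render a one-line summary: first call-path entry plus a shortened code_id."""
    if not call_path:
        raise ValueError("call_path must contain at least one SymbolID")
    first_hop = call_path[0]
    # Truncate long identifiers so the summary fits a notification line.
    short = code_id if len(code_id) <= keep else code_id[:keep] + "..."
    return f"{first_hop} (code_id {short})"


print(evidence_summary(["main", "ssl3_read_bytes"], "sha256:deadbeefcafebabe"))
# prints: main (code_id sha256:deadb...)
```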

3.5 Documentation & Samples (GAP-DOC-008)

  • Publish schema diffs in docs/data/evidence-schema.md (new file) covering SBOM evidence nodes, runtime NDJSON, and API responses.
  • Write CLI/API walkthroughs in docs/09_API_CLI_REFERENCE.md and docs/api/policy.md showing how to request reachability evidence and verify DSSE chains.
  • Produce OpenVEX + replay samples under samples/reachability/ showing facts.type = "stella.reachability" with graph_hash and code_id arrays.

3.6 Native lifter & Reachability Store (SCANNER-NATIVE-401-015 / SIG-STORE-401-016)

  • Stand up Scanner.Symbols.Native + Scanner.CallGraph.Native libraries that:
    • parse ELF (DWARF + .symtab/.dynsym), PE/COFF (CodeView/PDB), and stripped binaries via probabilistic carving;
    • emit deterministic FuncNode + CallEdge records with demangled names, language hints, and {confidence,evidence} arrays; and
    • attach analyzer + toolchain identifiers consumed by richgraph-v1.
  • Introduce Reachability.Store collections in Mongo:
    • func_nodes keyed by func:<format>:<sha256>:<va> with {binDigest,name,addr,size,lang,confidence,sym}.
    • call_edges {from,to,kind,confidence,evidence[]} linking internal/external nodes.
    • cve_func_hits {cve,purl,func_id,match_kind,confidence,source} for advisory alignment.
  • Build indexes (binDigest+name, from→to, cve+func_id) and expose repository interfaces so Scanner, Signals, and Policy can reuse the same canonical data without duplicating queries.

4. Schema & API Touchpoints

Authoritative field list lives in docs/reachability/evidence-schema.md; use it for DTOs and CAS writers.

The next implementation pass must cover the following documents/files (create them if missing):

  1. docs/data/evidence-schema.md: authoritative schema for {code_id, symbol, tool} blocks.
  2. docs/runbooks/reachability-runtime.md: operator steps for staging runtime ingestion bundles, retention, and troubleshooting.
  3. docs/runbooks/replay_ops.md: add a section detailing replay verification using the new graph/runtime CAS entries.

API contracts to amend:

  • POST /signals/callgraphs: the response should include graphHash (BLAKE3) once GRAPH-CAS-401-001 lands.
  • POST /signals/runtime-facts: the request body schema (NDJSON) should carry symbol_id, code_id, hit_count, and loader_base.
  • GET /policy/findings: the payload must surface reachability.evidence[] objects.

4.1 Signals runtime ingestion snapshot (Nov 2025)

  • /signals/runtime-facts (JSON) and /signals/runtime-facts/ndjson (streaming, optional gzip) accept the following event fields:
    • symbolId (required), codeId, loaderBase, hitCount, processId, processName, socketAddress, containerId, evidenceUri, metadata.
    • Subject context (scanId / imageDigest / component / version) plus callgraphId is supplied either in the JSON body or as query params for the NDJSON endpoint.
  • Signals dedupes events, merges metadata, and persists the aggregated RuntimeFacts onto ReachabilityFactDocument. These facts now feed reachability scoring (SIGNALS-24-004/005) as part of the runtime bonus lattice.
  • Outstanding work: record CAS URIs for runtime traces, emit provenance events, and expose the enriched context to Policy/Replay consumers.
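The dedupe-and-merge step described above might be approximated as follows. The grouping key (symbolId, codeId, loaderBase), the default hit count of 1, and last-writer-wins metadata merging are assumptions; the memo does not spell out how Signals aggregates.

```python
def aggregate_runtime_facts(events: list[dict]) -> list[dict]:
    """Collapse duplicate events, summing hit counts and merging metadata."""
    merged: dict[tuple, dict] = {}
    for event in events:
        symbol_id = event.get("symbolId")
        if not symbol_id:
            raise ValueError("symbolId is required")  # the only required field
        key = (symbol_id, event.get("codeId"), event.get("loaderBase"))
        fact = merged.setdefault(key, {
            "symbolId": symbol_id,
            "codeId": event.get("codeId"),
            "loaderBase": event.get("loaderBase"),
            "hitCount": 0,
            "metadata": {},
        })
        fact["hitCount"] += int(event.get("hitCount", 1))
        fact["metadata"].update(event.get("metadata") or {})
    # Sort so the aggregated output is stable across ingestion order.
    return sorted(
        merged.values(),
        key=lambda f: (f["symbolId"], f["codeId"] or "", f["loaderBase"] or ""),
    )
```

Sorting the result keeps the persisted facts independent of arrival order, in line with the determinism expectations in §5.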

4.2 Reachability store layout (SIG-STORE-401-016)

All producers must persist native function evidence using the shared collections below (names are advisory; exact names live in Mongo options):

// func_nodes
{
  "_id": "func:ELF:sha256:4012a0",
  "binDigest": "sha256:deadbeef...",
  "name": "ssl3_read_bytes",
  "addr": "0x4012a0",
  "size": 312,
  "lang": "c",
  "confidence": 0.92,
  "symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF" },
  "sym": "present"
}

// call_edges
{
  "from": "func:ELF:sha256:4012a0",
  "to": "func:ELF:sha256:40f0ff",
  "kind": "static",
  "confidence": 0.88,
  "evidence": ["reloc:.plt.got", "bb-target:0x40f0ff"]
}

// cve_func_hits
{
  "cve": "CVE-2023-XXXX",
  "purl": "pkg:generic/openssl@1.1.1u",
  "func_id": "func:ELF:sha256:4012a0",
  "match_kind": "name+version",
  "confidence": 0.77,
  "source": "concelier:openssl-advisory"
}

Writers must:

  1. Upsert func_nodes before emitting edges/hits to ensure _id lookups remain stable.
  2. Serialize evidence arrays in deterministic order (reloc, bb-target, import, …) and normalise hex casing.
  3. Attach analyzer fingerprints (scanner.native@sha256:...) so Replay/Policy can enforce provenance.
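Rules 2 and 3 can be sketched as a small normalisation pass run before an edge is written. Only the three evidence kinds named above are ranked; placing any other kind after them, lower-casing hex literals, and the analyzer field name are assumptions for this example.

```python
import re

# Known prefix order from the writer rules; the full list is elided in the memo.
EVIDENCE_ORDER = ("reloc", "bb-target", "import")
_HEX_LITERAL = re.compile(r"0[xX][0-9a-fA-F]+")


def normalise_hex(text: str) -> str:
    """Lower-case every 0x... literal inside an evidence string."""
    return _HEX_LITERAL.sub(lambda match: match.group(0).lower(), text)


def sort_evidence(evidence: list[str]) -> list[str]:
    """Order evidence by kind rank, then lexically; unknown kinds go last."""
    def rank(item: str) -> tuple[int, str]:
        kind = item.split(":", 1)[0]
        index = EVIDENCE_ORDER.index(kind) if kind in EVIDENCE_ORDER else len(EVIDENCE_ORDER)
        return (index, item)

    return sorted((normalise_hex(item) for item in evidence), key=rank)


def prepare_edge(edge: dict, analyzer_fingerprint: str) -> dict:
    """Return a copy of the edge that is ready for a deterministic write."""
    prepared = dict(edge)
    prepared["evidence"] = sort_evidence(edge.get("evidence", []))
    prepared["analyzer"] = analyzer_fingerprint  # hypothetical field name
    return prepared
```

Applying the pass to the sample edge in this section leaves its evidence array unchanged, since it is already in reloc, bb-target order with lower-case hex.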

5. Test & Fixture Expectations

  • Reachbench fixtures: update golden cases with code_id + symbol metadata. Ensure both reachable/unreachable variants still pass once graphs contain the richer IDs.
  • Signals unit tests: add deterministic tests for lattice scoring + runtime evidence linking (tests/reachability/StellaOps.Signals.Reachability.Tests).
  • Replay tests: extend tests/reachability/StellaOps.Replay.Core.Tests to assert manifest v2 serialization and hash enforcement.

All fixtures must remain deterministic: sort nodes/edges, normalise casing, and freeze timestamps in test data.
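One way to apply those three rules to a fixture graph is sketched below. The graph layout (nodes with id and addr, edges with from, to, and kind), the frozen timestamp value, and SHA-256 as the fixture hash are assumptions; production graph hashes are specified as BLAKE3, which is not in the Python standard library.

```python
import hashlib
import json

FROZEN_TIMESTAMP = "2025-01-01T00:00:00Z"  # arbitrary fixed value for test data


def canonical_graph(graph: dict) -> dict:
    """Sort nodes and edges, lower-case addresses, and pin the timestamp."""
    nodes = sorted(
        ({**node, "addr": node["addr"].lower()} for node in graph["nodes"]),
        key=lambda node: node["id"],
    )
    edges = sorted(
        graph["edges"],
        key=lambda edge: (edge["from"], edge["to"], edge.get("kind", "")),
    )
    return {"generated_at": FROZEN_TIMESTAMP, "nodes": nodes, "edges": edges}


def fixture_hash(graph: dict) -> str:
    """Hash the canonical form; SHA-256 stands in for the real graph hash."""
    blob = json.dumps(canonical_graph(graph), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

A golden test can then assert that shuffling the node and edge lists, or changing address casing, leaves fixture_hash unchanged.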


6. Handoff Checklist for the Next Agent

  1. Confirm sprint entries (SPRINT_400 and SPRINT_401) remain in sync when moving GAP-* tasks to DOING/DONE.
  2. Start with GAP-SYM-007 (schema/helper implementation) because downstream work depends on the new code_id payload shape.
  3. Once schema PR merges, coordinate with Signals + Policy guilds to align on CAS naming and DSSE predicates before wiring APIs.
  4. Update the docs listed in §4 as each component lands; keep this file current with statuses and links to PRs/ADRs.
  5. Before shipping, run the reachbench fixtures end-to-end and capture hashes for inclusion in replay docs.

Keep this document updated as tasks change state; it is the authoritative hand-off note for the advisory.