feat(scanner): Implement Deno analyzer and associated tests

- Added Deno analyzer with comprehensive metadata and evidence structure.
- Created a detailed implementation plan for Sprint 130 focusing on Deno analyzer.
- Introduced AdvisoryAiGuardrailOptions for managing guardrail configurations.
- Developed GuardrailPhraseLoader for loading blocked phrases from JSON files.
- Implemented tests for AdvisoryGuardrailOptions binding and phrase loading.
- Enhanced telemetry for Advisory AI with metrics tracking.
- Added VexObservationProjectionService for querying VEX observations.
- Created extensive tests for VexObservationProjectionService functionality.
- Introduced Ruby language analyzer with tests for simple and complex workspaces.
- Added Ruby application fixtures for testing purposes.
2025-11-12 10:01:54 +02:00
parent 0e8655cbb1
commit babb81af52
75 changed files with 3346 additions and 187 deletions
--- a/docs/reachability/function-level-evidence.md
+++ b/docs/reachability/function-level-evidence.md
@@ -1,6 +1,6 @@
 # Function-Level Evidence Readiness (Nov 2025 Advisory)

-_Last updated: 2025-11-09. Owner: Business Analysis Guild._
+_Last updated: 2025-11-12. Owner: Business Analysis Guild._

 This memo captures the outstanding work required to make Stella Ops scanners emit stable, function-level evidence that matches the November 2025 advisory. It does **not** implement any code; instead it enumerates requirements, links them to sprint tasks, and spells out the schema/API updates that the next agent must land.

@@ -62,6 +62,18 @@ Out of scope: implementing disassemblers or symbol servers; those will be handle
 * Write CLI/API walkthroughs in `docs/09_API_CLI_REFERENCE.md` and `docs/api/policy.md` showing how to request reachability evidence and verify DSSE chains.
 * Produce OpenVEX + replay samples under `samples/reachability/` showing `facts.type = "stella.reachability"` with `graph_hash` and `code_id` arrays.

+### 3.6 Native lifter & Reachability Store (SCANNER-NATIVE-401-015 / SIG-STORE-401-016)
+
+* Stand up `Scanner.Symbols.Native` + `Scanner.CallGraph.Native` libraries that:
+  * parse ELF (DWARF + `.symtab`/`.dynsym`), PE/COFF (CodeView/PDB), and stripped binaries via probabilistic carving;
+  * emit deterministic `FuncNode` + `CallEdge` records with demangled names, language hints, and `{confidence,evidence}` arrays; and
+  * attach analyzer + toolchain identifiers consumed by `richgraph-v1`.
+* Introduce `Reachability.Store` collections in Mongo:
+  * `func_nodes` – keyed by `func:<format>:<sha256>:<va>` with `{binDigest,name,addr,size,lang,confidence,sym}`.
+  * `call_edges` – `{from,to,kind,confidence,evidence[]}` linking internal/external nodes.
+  * `cve_func_hits` – `{cve,purl,func_id,match_kind,confidence,source}` for advisory alignment.
+* Build indexes (`binDigest+name`, `from→to`, `cve+func_id`) and expose repository interfaces so Scanner, Signals, and Policy can reuse the same canonical data without duplicating queries.
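
As a rough illustration of the record shapes listed in 3.6, the Python sketch below builds a `func:<format>:<sha256>:<va>` key and the tuples that would back the `binDigest+name` and `from→to` indexes. It is not the Stella Ops implementation. The names `func_id`, `FuncNode`, `CallEdge`, and `make_node` are hypothetical, and three details are assumptions taken from the sample identifiers in this memo: lowercase hex, no `0x` prefix inside the key, and a `binDigest` of the form `sha256:<hex>`.

```python
from dataclasses import dataclass


def func_id(fmt: str, sha256_hex: str, va: int) -> str:
    """Build a func:<format>:<sha256>:<va> key (lowercase hex is an assumption)."""
    return f"func:{fmt}:{sha256_hex.lower()}:{va:x}"


@dataclass(frozen=True)
class FuncNode:
    """Hypothetical in-memory mirror of a func_nodes document."""
    id: str
    bin_digest: str
    name: str
    addr: str
    size: int
    lang: str
    confidence: float
    sym: str

    def index_key(self) -> tuple:
        """Tuple backing the binDigest+name index."""
        return (self.bin_digest, self.name)


@dataclass(frozen=True)
class CallEdge:
    """Hypothetical mirror of a call_edges document ('from' is a Python keyword)."""
    src: str
    dst: str
    kind: str
    confidence: float
    evidence: tuple

    def index_key(self) -> tuple:
        """Tuple backing the from->to index."""
        return (self.src, self.dst)


def make_node(fmt, sha256_hex, va, name, size, lang, confidence, sym) -> FuncNode:
    """Derive the key, digest, and address fields from one set of inputs."""
    return FuncNode(
        id=func_id(fmt, sha256_hex, va),
        bin_digest=f"sha256:{sha256_hex.lower()}",  # assumed digest format
        name=name,
        addr=f"0x{va:x}",
        size=size,
        lang=lang,
        confidence=confidence,
        sym=sym,
    )


node = make_node("ELF", "AB12", 0x4012A0, "ssl3_read_bytes", 312, "c", 0.92, "present")
print(node.id)  # func:ELF:ab12:4012a0
```

Deriving `id`, `bin_digest`, and `addr` from the same inputs in one place keeps the three fields from drifting apart, which is one way to support the "deterministic records" requirement.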
+
 ---

 ## 4. Schema & API Touchpoints
@@ -86,6 +98,50 @@ API contracts to amend:
 - Signals dedupes events, merges metadata, and persists the aggregated `RuntimeFacts` onto `ReachabilityFactDocument`. These facts now feed reachability scoring (SIGNALS-24-004/005) as part of the runtime bonus lattice.
 - Outstanding work: record CAS URIs for runtime traces, emit provenance events, and expose the enriched context to Policy/Replay consumers.

+### 4.2 Reachability store layout (SIG-STORE-401-016)
+
+All producers **must** persist native function evidence using the shared collections below (the collection names shown here are illustrative; the exact names are set in the Mongo options):
+
+```json
+// func_nodes
+{
+  "_id": "func:ELF:sha256:4012a0",
+  "binDigest": "sha256:deadbeef...",
+  "name": "ssl3_read_bytes",
+  "addr": "0x4012a0",
+  "size": 312,
+  "lang": "c",
+  "confidence": 0.92,
+  "symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF" },
+  "sym": "present"
+}
+
+// call_edges
+{
+  "from": "func:ELF:sha256:4012a0",
+  "to": "func:ELF:sha256:40f0ff",
+  "kind": "static",
+  "confidence": 0.88,
+  "evidence": ["reloc:.plt.got", "bb-target:0x40f0ff"]
+}
+
+// cve_func_hits
+{
+  "cve": "CVE-2023-XXXX",
+  "purl": "pkg:generic/openssl@1.1.1u",
+  "func_id": "func:ELF:sha256:4012a0",
+  "match_kind": "name+version",
+  "confidence": 0.77,
+  "source": "concelier:openssl-advisory"
+}
+```
+
+Writers **must**:
+
+1. Upsert `func_nodes` before emitting edges/hits to ensure `_id` lookups remain stable.
+2. Serialize evidence arrays in deterministic order (`reloc`, `bb-target`, `import`, …) and normalise hex casing (the samples above use lowercase).
+3. Attach analyzer fingerprints (`scanner.native@sha256:...`) so Replay/Policy can enforce provenance.
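
The three writer rules can be sketched as follows, using a plain dictionary in place of Mongo. This is a minimal sketch under stated assumptions, not the actual writer. The names `canonical_evidence`, `normalise_hex`, and `write_edge` and the `analyzer` field are hypothetical. Evidence kinds beyond the three named in rule 2 are assumed to sort after them, alphabetically, and lowercase is assumed as the canonical hex casing.

```python
import re

# Order taken from rule 2; the memo elides the rest of the list, so unknown
# kinds are placed after these three (an assumption).
EVIDENCE_ORDER = ("reloc", "bb-target", "import")

_HEX = re.compile(r"0[xX][0-9a-fA-F]+")


def normalise_hex(text: str) -> str:
    """Lowercase every 0x-prefixed hex literal in an evidence string."""
    return _HEX.sub(lambda m: m.group(0).lower(), text)


def canonical_evidence(items):
    """Return evidence strings in a deterministic, kind-ranked order."""
    def rank(item: str):
        kind = item.split(":", 1)[0]
        if kind in EVIDENCE_ORDER:
            return (EVIDENCE_ORDER.index(kind), item)
        return (len(EVIDENCE_ORDER), item)

    return sorted((normalise_hex(i) for i in items), key=rank)


def write_edge(store: dict, nodes: list, edge: dict, analyzer: str) -> dict:
    """Apply the writer rules against an in-memory stand-in for the store."""
    for node in nodes:  # rule 1: upsert func_nodes before the edge
        store["func_nodes"][node["_id"]] = {**node, "analyzer": analyzer}
    record = {
        **edge,
        "evidence": canonical_evidence(edge["evidence"]),  # rule 2
        "analyzer": analyzer,  # rule 3; field name is hypothetical
    }
    store["call_edges"].append(record)
    return record


store = {"func_nodes": {}, "call_edges": []}
nodes = [{"_id": "func:ELF:sha256:4012a0"}, {"_id": "func:ELF:sha256:40f0ff"}]
edge = {
    "from": "func:ELF:sha256:4012a0",
    "to": "func:ELF:sha256:40f0ff",
    "kind": "static",
    "confidence": 0.88,
    "evidence": ["bb-target:0x40F0FF", "reloc:.plt.got"],
}
written = write_edge(store, nodes, edge, "scanner.native@sha256:...")
print(written["evidence"])  # ['reloc:.plt.got', 'bb-target:0x40f0ff']
```

Because the evidence order and casing are fixed before the record is stored, two writers that see the same evidence in a different order produce byte-identical arrays, which is what lets downstream hashing and replay compare them.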
+
 ---

 ## 5. Test & Fixture Expectations