feat(scanner): Implement Deno analyzer and associated tests

- Added Deno analyzer with comprehensive metadata and evidence structure.
- Created a detailed implementation plan for Sprint 130 focusing on Deno analyzer.
- Introduced AdvisoryAiGuardrailOptions for managing guardrail configurations.
- Developed GuardrailPhraseLoader for loading blocked phrases from JSON files.
- Implemented tests for AdvisoryGuardrailOptions binding and phrase loading.
- Enhanced telemetry for Advisory AI with metrics tracking.
- Added VexObservationProjectionService for querying VEX observations.
- Created extensive tests for VexObservationProjectionService functionality.
- Introduced Ruby language analyzer with tests for simple and complex workspaces.
- Added Ruby application fixtures for testing purposes.
2025-11-12 10:01:54 +02:00
parent 0e8655cbb1
commit babb81af52
75 changed files with 3346 additions and 187 deletions
--- a/docs/reachability/DELIVERY_GUIDE.md
+++ b/docs/reachability/DELIVERY_GUIDE.md
@@ -36,6 +36,8 @@ This guide translates the deterministic reachability blueprint into concrete wor

 | Stream | Owner Guild(s) | Key deliverables |
 |--------|----------------|------------------|
+| **Native symbols & callgraphs** | Scanner Worker · Symbols Guild | Ship `Scanner.Symbols.Native` + `Scanner.CallGraph.Native`, integrate Symbol Manifest v1, demangle Itanium/MSVC names, emit `FuncNode`/`CallEdge` CAS bundles (task `SCANNER-NATIVE-401-015`). |
+| **Reachability store** | Signals · BE-Base Platform | Provision shared Mongo collections (`func_nodes`, `call_edges`, `cve_func_hits`), indexes, and repositories plus REST hooks for reuse (task `SIG-STORE-401-016`). |
 | **Language lifters** | Scanner Worker | CLI/hosted lifters for DotNet, Go, Node/Deno, JVM, Rust, Swift, Binary, Shell with CAS uploads and richgraph output |
 | **Signals ingestion & scoring** | Signals | `/callgraphs`, `/runtime-facts` (JSON + NDJSON/gzip), `/graphs/{id}`, `/reachability/recompute` GA; CAS-backed storage, runtime dedupe, BFS+predicates scoring |
 | **Runtime capture** | Zastava + Runtime Guild | EntryTrace/eBPF samplers, NDJSON batches (symbol IDs + timestamps + counts) |
@@ -104,7 +106,8 @@ Each sprint is two weeks; refer to `docs/implplan/SPRINT_401_reachability_eviden

 - Place developer-facing updates here (`docs/reachability`).
 - [Function-level evidence guide](function-level-evidence.md) captures the Nov 2025 advisory scope, task references, and schema expectations; keep it in lockstep with sprint status.
- Operator runbooks (`docs/runbooks/reachability-runtime.md`) – TODO reference to be added when runtime pipeline lands.
+- [Reachability runtime runbook](../runbooks/reachability-runtime.md) now documents ingestion, CAS staging, air-gap handling, and troubleshooting—link every runtime feature PR to this guide.
+- [VEX Evidence Playbook](../benchmarks/vex-evidence-playbook.md) defines the bench repo layout, artifact shapes, verifier tooling, and metrics; keep it updated when Policy/Signer/CLI features land.
 - Update module dossiers (Scanner, Signals, Replay, Authority, Policy, UI) once each guild lands work.

 ---
--- a/docs/reachability/function-level-evidence.md
+++ b/docs/reachability/function-level-evidence.md
@@ -1,6 +1,6 @@
 # Function-Level Evidence Readiness (Nov 2025 Advisory)

-_Last updated: 2025-11-09. Owner: Business Analysis Guild._
+_Last updated: 2025-11-12. Owner: Business Analysis Guild._

 This memo captures the outstanding work required to make Stella Ops scanners emit stable, function-level evidence that matches the November 2025 advisory. It does **not** implement any code; instead it enumerates requirements, links them to sprint tasks, and spells out the schema/API updates that the next agent must land.

@@ -62,6 +62,18 @@ Out of scope: implementing disassemblers or symbol servers; those will be handle
 * Write CLI/API walkthroughs in `docs/09_API_CLI_REFERENCE.md` and `docs/api/policy.md` showing how to request reachability evidence and verify DSSE chains.
 * Produce OpenVEX + replay samples under `samples/reachability/` showing `facts.type = "stella.reachability"` with `graph_hash` and `code_id` arrays.

+### 3.6 Native lifter & Reachability Store (SCANNER-NATIVE-401-015 / SIG-STORE-401-016)
+
+* Stand up `Scanner.Symbols.Native` + `Scanner.CallGraph.Native` libraries that:
+  * parse ELF (DWARF + `.symtab`/`.dynsym`), PE/COFF (CodeView/PDB), and stripped binaries via probabilistic carving;
+  * emit deterministic `FuncNode` + `CallEdge` records with demangled names, language hints, and `{confidence,evidence}` arrays; and
+  * attach analyzer + toolchain identifiers consumed by `richgraph-v1`.
+* Introduce `Reachability.Store` collections in Mongo:
+  * `func_nodes` – keyed by `func:<format>:<sha256>:<va>` with `{binDigest,name,addr,size,lang,confidence,sym}`.
+  * `call_edges` – `{from,to,kind,confidence,evidence[]}` linking internal/external nodes.
+  * `cve_func_hits` – `{cve,purl,func_id,match_kind,confidence,source}` for advisory alignment.
+* Build indexes (`binDigest+name`, `from→to`, `cve+func_id`) and expose repository interfaces so Scanner, Signals, and Policy can reuse the same canonical data without duplicating queries.
+
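Editor's sketch (not part of the commit): the `func:<format>:<sha256>:<va>` key described in 3.6 could be assembled roughly as follows. The helper name `make_func_id` is hypothetical, and the casing rules (upper-case format tag, bare lowercase hex digest, address without a `0x` prefix) are assumptions inferred from the `func:ELF:sha256:4012a0` example, not a confirmed contract.

```python
def make_func_id(binary_format: str, bin_digest: str, va: int) -> str:
    """Build a func_nodes key shaped like func:<format>:<sha256>:<va>."""
    if va < 0:
        raise ValueError("virtual address must be non-negative")
    # Assumption: the digest is embedded as bare lowercase hex, without a "sha256:" prefix.
    digest = bin_digest.lower().removeprefix("sha256:")
    # Assumption: upper-case format tag (as in "ELF"), lowercase hex address without "0x".
    return f"func:{binary_format.upper()}:{digest}:{va:x}"
```

Deriving the key from normalised inputs means two producers that see the same binary and address would arrive at the same `_id`, which is what makes upserts idempotent.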
 ---

 ## 4. Schema & API Touchpoints
@@ -86,6 +98,50 @@ API contracts to amend:
 - Signals dedupes events, merges metadata, and persists the aggregated `RuntimeFacts` onto `ReachabilityFactDocument`. These facts now feed reachability scoring (SIGNALS-24-004/005) as part of the runtime bonus lattice.
 - Outstanding work: record CAS URIs for runtime traces, emit provenance events, and expose the enriched context to Policy/Replay consumers.

+### 4.2 Reachability store layout (SIG-STORE-401-016)
+
+All producers **must** persist native function evidence using the shared collections below (names are advisory; exact names live in Mongo options):
+
+```json
+// func_nodes
+{
+  "_id": "func:ELF:sha256:4012a0",
+  "binDigest": "sha256:deadbeef...",
+  "name": "ssl3_read_bytes",
+  "addr": "0x4012a0",
+  "size": 312,
+  "lang": "c",
+  "confidence": 0.92,
+  "symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF" },
+  "sym": "present"
+}
+
+// call_edges
+{
+  "from": "func:ELF:sha256:4012a0",
+  "to": "func:ELF:sha256:40f0ff",
+  "kind": "static",
+  "confidence": 0.88,
+  "evidence": ["reloc:.plt.got", "bb-target:0x40f0ff"]
+}
+
+// cve_func_hits
+{
+  "cve": "CVE-2023-XXXX",
+  "purl": "pkg:generic/openssl@1.1.1u",
+  "func_id": "func:ELF:sha256:4012a0",
+  "match_kind": "name+version",
+  "confidence": 0.77,
+  "source": "concelier:openssl-advisory"
+}
+```
+
+Writers **must**:
+
+1. Upsert `func_nodes` before emitting edges/hits to ensure `_id` lookups remain stable.
+2. Serialize evidence arrays in deterministic order (`reloc`, `bb-target`, `import`, …) and normalise hex casing.
+3. Attach analyzer fingerprints (`scanner.native@sha256:...`) so Replay/Policy can enforce provenance.
+
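Editor's sketch (not part of the commit): writer rule 2 above could be implemented along these lines. The function name `normalise_evidence`, the `KIND_ORDER` table, the choice of lowercase for hex literals, and the de-duplication step are all assumptions for illustration; the source only states the `reloc`, `bb-target`, `import`, … ordering and that hex casing is normalised.

```python
import re

# Assumed priority of evidence kinds, following the order listed in rule 2.
# Kinds not listed here sort after the known ones, alphabetically.
KIND_ORDER = {"reloc": 0, "bb-target": 1, "import": 2}

# Matches hex literals such as 0x40F0FF or 0X40f0ff.
_HEX_LITERAL = re.compile(r"\b0[xX][0-9a-fA-F]+")


def normalise_evidence(items):
    """Return evidence strings de-duplicated, hex-lowercased, and stably ordered."""
    cleaned = {
        _HEX_LITERAL.sub(lambda m: m.group(0).lower(), item.strip())
        for item in items
    }

    def sort_key(item):
        kind = item.split(":", 1)[0]
        return (KIND_ORDER.get(kind, len(KIND_ORDER)), item)

    return sorted(cleaned, key=sort_key)
```

Under these assumptions, `["bb-target:0x40F0FF", "reloc:.plt.got"]` and `["reloc:.plt.got", "bb-target:0x40f0ff"]` serialize identically, so re-running an analyzer on the same binary would not change the stored `evidence` array.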
 ---

 ## 5. Test & Fixture Expectations