feat(ruby): Implement RubyManifestParser for parsing gem groups and dependencies

feat(ruby): Add RubyVendorArtifactCollector to collect vendor artifacts
test(deno): Add golden tests for Deno analyzer with various fixtures
test(deno): Create Deno module and package files for testing
test(deno): Implement Deno lock and import map for dependency management
test(deno): Add FFI and worker scripts for Deno testing
feat(ruby): Set up Ruby workspace with Gemfile and dependencies
feat(ruby): Add expected output for Ruby workspace tests
feat(signals): Introduce CallgraphManifest model for signal processing
2025-11-10 09:27:03 +02:00
parent 69c59defdc
commit 56c687253f
87 changed files with 2462 additions and 542 deletions
--- a/docs/reachability/function-level-evidence.md
+++ b/docs/reachability/function-level-evidence.md
@@ -22,7 +22,7 @@ Out of scope: implementing disassemblers or symbol servers; those will be handle
 |-------------|-------------|-----------------|-------|
 | Immutable code identity (`code_id` = `{format, build_id, start, length}` + optional `code_block_hash`) | Callgraph nodes are opaque strings with no address metadata. | Sprint 401 `GRAPH-CAS-401-001`, `GAP-SCAN-001`, `GAP-SYM-007` | `code_id` should live alongside existing `SymbolID` helpers so analyzers can emit it without duplicating logic. |
 | Symbol hints (demangled name, source, confidence) | No schema fields for symbol metadata; demangling is ad-hoc per analyzer. | `GAP-SYM-007` | Require deterministic casing + `symbol.source ∈ {DWARF,PDB,SYM,none}`. |
-| Runtime facts mapped to code anchors | `/signals/runtime-facts` is a stub; Zastava streams only Build-IDs. | Sprint 400 `ZASTAVA-REACH-201-001`, Sprint 401 `SIGNALS-RUNTIME-401-002`, `GAP-ZAS-002`, `GAP-SIG-003` | Need NDJSON schema documenting `code_id`, `symbol.sid`, `hit_count`, `loader_base`. |
+| Runtime facts mapped to code anchors | `/signals/runtime-facts` now accepts JSON and NDJSON (gzip) streams and stores symbol/code/process/container metadata. | Sprint 400 `ZASTAVA-REACH-201-001`, Sprint 401 `SIGNALS-RUNTIME-401-002`, `GAP-ZAS-002`, `GAP-SIG-003` | Provenance enrichment (process/socket/container) is persisted; the next step is exposing CAS URIs + context facts and emitting events for Policy/Replay. |
 | Replay/DSSE coverage | Replay manifests don’t enforce hash/CAS registration for graphs/traces. | Sprint 400 `REPLAY-REACH-201-005`, Sprint 401 `REPLAY-401-004`, `GAP-REP-004` | Extend manifest v2 with analyzer versions + BLAKE3 digests; add DSSE predicate types. |
 | Policy/VEX/UI explainability | Policy uses coarse `reachability:*` tags; UI/CLI cannot show call paths or evidence hashes. | Sprint 401 `POLICY-VEX-401-006`, `UI-CLI-401-007`, `GAP-POL-005`, `GAP-VEX-006`, `EXPERIENCE-GAP-401-012` | Evidence blocks must cite `code_id`, graph hash, runtime CAS URI, analyzer version. |
 | Operator documentation & samples | No guide shows how to replay `{build_id,start,len}` across CLI/API. | Sprint 401 `QA-DOCS-401-008`, `GAP-DOC-008` | Produce samples under `samples/reachability/**` plus CLI walkthroughs. |
@@ -78,6 +78,14 @@ API contracts to amend:
 - `POST /signals/runtime-facts` request body schema (NDJSON) with `symbol_id`, `code_id`, `hit_count`, `loader_base`.
 - `GET /policy/findings` payload must surface `reachability.evidence[]` objects.

+### 4.1 Signals runtime ingestion snapshot (Nov 2025)
+
+- `/signals/runtime-facts` (JSON) and `/signals/runtime-facts/ndjson` (streaming, optional gzip) accept the following event fields:
+  - `symbolId` (required), `codeId`, `loaderBase`, `hitCount`, `processId`, `processName`, `socketAddress`, `containerId`, `evidenceUri`, `metadata`.
+  - Subject context (`scanId` / `imageDigest` / `component` / `version`) plus `callgraphId` is supplied either in the JSON body or as query params for the NDJSON endpoint.
+- Signals dedupes events, merges metadata, and persists the aggregated `RuntimeFacts` onto `ReachabilityFactDocument`. These facts now feed reachability scoring (SIGNALS-24-004/005) as part of the runtime bonus lattice.
+- Outstanding work: record CAS URIs for runtime traces, emit provenance events, and expose the enriched context to Policy/Replay consumers.
+
 ---

 ## 5. Test & Fixture Expectations
@@ -99,4 +107,3 @@ All fixtures must remain deterministic: sort nodes/edges, normalise casing, and
 5. Before shipping, run the reachbench fixtures end-to-end and capture hashes for inclusion in replay docs.

 Keep this document updated as tasks change state; it is the authoritative hand-off note for the advisory.
-
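
Reviewer note: the runtime ingestion snapshot added in section 4.1 lists the event fields and says Signals dedupes events and merges metadata. The diff does not show the wire format of individual values or the dedupe key. The Python sketch below is therefore illustrative only: it builds a gzip-compressed NDJSON body from events using the documented field names and shows one plausible merge strategy. The sample values, the helper names (`encode_ndjson_gzip`, `decode_ndjson_gzip`, `merge_events`), the merge key `(symbolId, codeId, loaderBase)`, and the summing of `hitCount` are all assumptions, not the Signals implementation.

```python
import gzip
import json
from typing import Iterable


def encode_ndjson_gzip(events: Iterable[dict]) -> bytes:
    """Serialize events as one JSON object per line, then gzip the result."""
    lines = []
    for event in events:
        # The doc marks `symbolId` as the only required field.
        if not event.get("symbolId"):
            raise ValueError("symbolId is required")
        # Sorted keys and compact separators keep the output deterministic.
        lines.append(json.dumps(event, sort_keys=True, separators=(",", ":")))
    body = "".join(line + "\n" for line in lines)
    return gzip.compress(body.encode("utf-8"))


def decode_ndjson_gzip(payload: bytes) -> list:
    """Inverse of encode_ndjson_gzip: gunzip and parse each non-empty line."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def merge_events(events: Iterable[dict]) -> list:
    """Hypothetical dedupe: one record per (symbolId, codeId, loaderBase).

    ASSUMPTION: hit counts are summed and metadata maps are merged with
    later values winning. The real Signals rules are not shown in the diff.
    """
    merged = {}
    for event in events:
        key = (event["symbolId"], event.get("codeId"), event.get("loaderBase"))
        metadata = event.get("metadata") or {}
        if key not in merged:
            merged[key] = {
                **event,
                "hitCount": event.get("hitCount", 0),
                "metadata": dict(metadata),
            }
        else:
            current = merged[key]
            current["hitCount"] += event.get("hitCount", 0)
            current["metadata"].update(metadata)
    return list(merged.values())


# Placeholder values; real identifier formats are not specified in the diff.
sample_events = [
    {"symbolId": "sym-a", "codeId": "code-1", "loaderBase": "0x400000",
     "hitCount": 2, "processName": "app", "metadata": {"source": "probe-1"}},
    {"symbolId": "sym-a", "codeId": "code-1", "loaderBase": "0x400000",
     "hitCount": 3, "metadata": {"region": "test"}},
    {"symbolId": "sym-b", "hitCount": 1},
]
```

Calling `encode_ndjson_gzip(sample_events)` yields bytes that could serve as a request body for the NDJSON endpoint, with the subject context (`scanId`, `imageDigest`, `component`, `version`, `callgraphId`) passed as query parameters as the doc describes. The HTTP call itself is omitted because the host, authentication, and headers are not part of this diff.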