feat(ruby): Implement RubyManifestParser for parsing gem groups and dependencies

feat(ruby): Add RubyVendorArtifactCollector to collect vendor artifacts

test(deno): Add golden tests for Deno analyzer with various fixtures

test(deno): Create Deno module and package files for testing

test(deno): Implement Deno lock and import map for dependency management

test(deno): Add FFI and worker scripts for Deno testing

feat(ruby): Set up Ruby workspace with Gemfile and dependencies

feat(ruby): Add expected output for Ruby workspace tests

feat(signals): Introduce CallgraphManifest model for signal processing
2025-11-10 09:27:03 +02:00
parent 69c59defdc
commit 56c687253f
87 changed files with 2462 additions and 542 deletions

@@ -22,7 +22,7 @@ Out of scope: implementing disassemblers or symbol servers; those will be handle
|-------------|-------------|-----------------|-------|
| Immutable code identity (`code_id` = `{format, build_id, start, length}` + optional `code_block_hash`) | Callgraph nodes are opaque strings with no address metadata. | Sprint401 `GRAPH-CAS-401-001`, `GAP-SCAN-001`, `GAP-SYM-007` | `code_id` should live alongside existing `SymbolID` helpers so analyzers can emit it without duplicating logic. |
| Symbol hints (demangled name, source, confidence) | No schema fields for symbol metadata; demangling is ad-hoc per analyzer. | `GAP-SYM-007` | Require deterministic casing + `symbol.source ∈ {DWARF,PDB,SYM,none}`. |
| Runtime facts mapped to code anchors | `/signals/runtime-facts` is a stub; Zastava streams only Build-IDs. | Sprint400 `ZASTAVA-REACH-201-001`, Sprint401 `SIGNALS-RUNTIME-401-002`, `GAP-ZAS-002`, `GAP-SIG-003` | Need NDJSON schema documenting `code_id`, `symbol.sid`, `hit_count`, `loader_base`. |
| Runtime facts mapped to code anchors | `/signals/runtime-facts` now accepts JSON and NDJSON (gzip) streams and stores symbol/code/process/container metadata. | Sprint400 `ZASTAVA-REACH-201-001`, Sprint401 `SIGNALS-RUNTIME-401-002`, `GAP-ZAS-002`, `GAP-SIG-003` | Provenance enrichment (process/socket/container) persisted; next step is exposing CAS URIs + context facts and emitting events for Policy/Replay. |
| Replay/DSSE coverage | Replay manifests don't enforce hash/CAS registration for graphs/traces. | Sprint400 `REPLAY-REACH-201-005`, Sprint401 `REPLAY-401-004`, `GAP-REP-004` | Extend manifest v2 with analyzer versions + BLAKE3 digests; add DSSE predicate types. |
| Policy/VEX/UI explainability | Policy uses coarse `reachability:*` tags; UI/CLI cannot show call paths or evidence hashes. | Sprint401 `POLICY-VEX-401-006`, `UI-CLI-401-007`, `GAP-POL-005`, `GAP-VEX-006`, `EXPERIENCE-GAP-401-012` | Evidence blocks must cite `code_id`, graph hash, runtime CAS URI, analyzer version. |
| Operator documentation & samples | No guide shows how to replay `{build_id,start,len}` across CLI/API. | Sprint401 `QA-DOCS-401-008`, `GAP-DOC-008` | Produce samples under `samples/reachability/**` plus CLI walkthroughs. |
@@ -78,6 +78,14 @@ API contracts to amend:
- `POST /signals/runtime-facts` request body schema (NDJSON) with `symbol_id`, `code_id`, `hit_count`, `loader_base`.
- `GET /policy/findings` payload must surface `reachability.evidence[]` objects.
### 4.1 Signals runtime ingestion snapshot (Nov 2025)
- `/signals/runtime-facts` (JSON) and `/signals/runtime-facts/ndjson` (streaming, optional gzip) accept the following event fields:
- `symbolId` (required), `codeId`, `loaderBase`, `hitCount`, `processId`, `processName`, `socketAddress`, `containerId`, `evidenceUri`, `metadata`.
- Subject context (`scanId` / `imageDigest` / `component` / `version`) plus `callgraphId` is supplied either in the JSON body or as query params for the NDJSON endpoint.
- Signals dedupes events, merges metadata, and persists the aggregated `RuntimeFacts` onto `ReachabilityFactDocument`. These facts now feed reachability scoring (SIGNALS-24-004/005) as part of the runtime bonus lattice.
- Outstanding work: record CAS URIs for runtime traces, emit provenance events, and expose the enriched context to Policy/Replay consumers.
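
A minimal sketch of the ingestion flow above, for illustration only. The event field names come from the list above. The helper names, the sample values, and the merge rule are assumptions rather than the actual Signals implementation: events are keyed on `(symbolId, codeId, loaderBase)`, `hitCount` is summed, and `metadata` is merged with later values winning.

```python
import gzip
import json


def encode_ndjson_gzip(events):
    """Serialise events as one JSON object per line, then gzip the result."""
    lines = "".join(json.dumps(e, sort_keys=True) + "\n" for e in events)
    return gzip.compress(lines.encode("utf-8"))


def decode_ndjson_gzip(body):
    """Inverse of encode_ndjson_gzip: gunzip and parse each non-empty line."""
    text = gzip.decompress(body).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def merge_events(events):
    """Hypothetical dedupe step (assumed rule, not the documented one).

    Events sharing (symbolId, codeId, loaderBase) collapse into one entry:
    hitCount values are added and metadata dicts are merged.
    """
    merged = {}
    for event in events:
        if not event.get("symbolId"):
            raise ValueError("symbolId is required")
        key = (event["symbolId"], event.get("codeId"), event.get("loaderBase"))
        if key not in merged:
            slot = dict(event)
            slot["hitCount"] = event.get("hitCount", 0)
            slot["metadata"] = dict(event.get("metadata") or {})
            merged[key] = slot
        else:
            slot = merged[key]
            slot["hitCount"] += event.get("hitCount", 0)
            slot["metadata"].update(event.get("metadata") or {})
    return list(merged.values())


# Sample events with made-up values; only the field names come from the doc.
events = [
    {"symbolId": "sym-a", "codeId": "code-1", "hitCount": 2,
     "processName": "api", "metadata": {"source": "probe-1"}},
    {"symbolId": "sym-a", "codeId": "code-1", "hitCount": 3,
     "metadata": {"region": "eu"}},
    {"symbolId": "sym-b", "hitCount": 1},
]

body = encode_ndjson_gzip(events)      # bytes suitable for a gzip NDJSON upload
round_tripped = decode_ndjson_gzip(body)
facts = merge_events(round_tripped)
```

Keeping the encode and merge steps separate mirrors the split between the transport format (NDJSON, optionally gzipped) and the aggregation that produces the persisted runtime facts; the real dedupe key may differ.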
---
## 5. Test & Fixture Expectations
@@ -99,4 +107,3 @@ All fixtures must remain deterministic: sort nodes/edges, normalise casing, and
5. Before shipping, run the reachbench fixtures end-to-end and capture hashes for inclusion in replay docs.
Keep this document updated as tasks change state; it is the authoritative hand-off note for the advisory.