# Reachability · Runtime + Static Union (v0.1)

## What this covers
- End-to-end flow for combining static callgraphs (Scanner) and runtime traces (Zastava) into replayable reachability bundles.
- Storage layout (CAS namespaces), manifest fields, and the Signals APIs that consume and emit reachability facts.
- How unknowns, unknowns pressure, and scoring are derived so Policy and UI can explain outcomes.

## Pipeline (at a glance)
1. **Scanner** emits language-specific callgraphs as `richgraph-v1` and packs them into CAS under `reachability_graphs/.tar.zst` with a `meta.json` manifest.
2. **Zastava Observer** streams NDJSON runtime facts (`symbol_id`, `code_id`, `hit_count`, `loader_base`, `cas_uri`) to Signals via `POST /signals/runtime-facts` or `POST /signals/runtime-facts/ndjson`.
3. **Union bundles** (runtime + static) are uploaded as a ZIP to `POST /signals/reachability/union` with an optional `X-Analysis-Id` header; Signals stores them under `reachability_graphs/{analysisId}/`.
4. **Signals scoring** consumes union data and runtime facts, computes per-target states (bucket, weight, confidence, score), a fact-level score, and unknowns pressure, then publishes `signals.fact.updated@v1` events.
5. **Replay** records provenance: the reachability section of the replay manifest lists CAS URIs (graphs and runtime traces), namespaces, analyzer/version, `callgraphIds`, and the shared `analysisId`.

## Storage & CAS namespaces
- Static graphs: `cas://reachability_graphs//.tar.zst` (`meta.json` + graph files).
- Runtime traces: `cas://runtime_traces//.tar.zst` (NDJSON or zipped stream).
- The replay manifest now includes `analysisId` to correlate graphs and traces; each reference also carries `namespace` and, for static graphs, `callgraphId`, so replay is unambiguous.

## Signals API quick reference
- `POST /signals/runtime-facts`: structured request body; recomputes reachability.
- `POST /signals/runtime-facts/ndjson`: streaming NDJSON (plain or gzip); requires `callgraphId` in the header parameters.
- `POST /signals/reachability/union`: uploads a ZIP bundle; optional `X-Analysis-Id` header.
- `GET /signals/reachability/union/{analysisId}/meta`: returns `meta.json`.
- `GET /signals/reachability/union/{analysisId}/files/{fileName}`: downloads bundled graph/trace files.
- `GET /signals/facts/{subjectKey}`: fetches the latest reachability fact (including unknowns counters and targets).

## Scoring and unknowns
- Buckets (default weights): entrypoint 1.0, direct 0.85, runtime 0.45, unknown 0.5, unreachable 0.0.
- Confidence: starts from a reachable or unreachable base value, adds a runtime bonus, and is clamped between Min and Max (defaults 0.05–0.99).
- Unknowns: Signals counts unresolved symbols/edges per subject; `UnknownsPressure = unknowns / (states + unknowns)` (capped). The fact score is reduced by `UnknownsPenaltyCeiling` (default 0.35) × pressure.
- Events: `signals.fact.updated@v1` now emits `unknownsCount` and `unknownsPressure` in addition to bucket, weight, stateCount, and targets.

## Replay contract changes (v0.1 add-ons)
- `reachability.analysisId` (string, optional): ties the manifest to the Signals union ingest.
- Graph refs include `namespace`, `callgraphId`, analyzer, version, sha256, and casUri.
- Runtime trace refs include `namespace`, recordedAt, sha256, and casUri.

## Operator checklist
- Use deterministic CAS paths; never embed absolute file paths.
- When emitting runtime NDJSON, include `loader_base` and `code_id` when available so facts can be de-duplicated.
- Propagate `analysisId` from Scanner/Zastava into Signals ingest to keep replay manifests linked.
- Keep feeds frozen for reproducibility; avoid external downloads during union preparation.

---

## Node Hash Joins and Runtime Evidence Linkage

Sprint: SPRINT_20260112_008_DOCS_path_witness_contracts (PW-DOC-002)

### Overview

Node hashes provide a canonical way to join static reachability analysis with runtime observations. Each node in a callgraph can be identified by a stable hash computed from its PURL and symbol information, enabling:

1. **Static-to-runtime correlation**: match runtime stack traces to static callgraph nodes.
2. **Cross-scan consistency**: compare reachability across different analysis runs.
3. **Evidence linking**: associate attestations with specific code paths.

### Node Hash Recipe

A node hash is computed as:

```
nodeHash = SHA256(normalize(purl) + ":" + normalize(symbol))
```

Where:
- `normalize(purl)` lowercases the PURL and sorts its qualifiers alphabetically.
- `normalize(symbol)` removes whitespace and normalizes platform-specific decorations.

Example:

```json
{
  "purl": "pkg:npm/express@4.18.2",
  "symbol": "Router.handle",
  "nodeHash": "sha256:a1b2c3d4..."
}
```

### Path Hash and Top-K Selection

A path hash identifies a specific call path from entrypoint to sink:

```
pathHash = SHA256(entryNodeHash + ":" + joinedIntermediateHashes + ":" + sinkNodeHash)
```

For long paths, only the **top-K** most significant nodes are included (default K=10):
- Entry node (always included)
- Sink node (always included)
- Intermediate nodes ranked by call frequency or security relevance

### Runtime Evidence Linkage

Runtime observations from Zastava can be linked to static analysis using node hashes:

| Field | Description |
|-------|-------------|
| `observedNodeHashes` | Node hashes seen at runtime |
| `observedPathHashes` | Path hashes confirmed by runtime traces |
| `runtimeEvidenceAt` | Timestamp of the runtime observation (RFC 3339) |
| `callstackHash` | Hash of the observed call stack |

### Join Example

To correlate static reachability with runtime evidence:

```sql
-- Find statically-reachable vulnerabilities confirmed at runtime
SELECT s.vulnerability_id, s.path_hash, r.observed_at
FROM static_reachability s
JOIN runtime_observations r
  -- PostgreSQL ANY(): match the sink hash against the observed_node_hashes array
  ON s.sink_node_hash = ANY(r.observed_node_hashes)
WHERE s.reachable = true
  -- Only consider runtime observations from the last 7 days
  AND r.observed_at > NOW() - INTERVAL '7 days';
```

### SARIF Integration

Node hashes are exposed in SARIF outputs via `stellaops/*` property keys:

```json
{
  "results": [{
    "ruleId": "CVE-2024-1234",
    "properties": {
      "stellaops/nodeHash": "sha256:abc123...",
      "stellaops/pathHash": "sha256:def456...",
      "stellaops/topKNodeHashes": ["sha256:...", "sha256:..."],
      "stellaops/evidenceUri": "cas://evidence/...",
      "stellaops/observedAtRuntime": true
    }
  }]
}
```

### Policy Gate Usage

Policy rules can reference node and path hashes for fine-grained control:

```yaml
rules:
  - name: block-confirmed-critical-path
    match:
      severity: CRITICAL
      reachability:
        pathHash:
          exists: true
        observedAtRuntime: true
    action: block
```

See `policies/path-gates-advanced.yaml` for comprehensive examples.

---

## References
- Schema: `docs/modules/reach-graph/schemas/runtime-static-union-schema.md`
- Delivery guide: `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`
- Unknowns registry & scoring: Signals code (`ReachabilityScoringService`, `UnknownsIngestionService`) and the events doc `docs/modules/signals/guides/events-24-005.md`.
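
## Sketch: unknowns pressure, confidence clamp, and fact score

The rules under **Scoring and unknowns** can be illustrated with a short Python sketch. It is not the Signals implementation (`ReachabilityScoringService`), and several details are assumptions: the pressure cap (1.0 here), the additive `base + runtime_bonus` form of confidence, and reading "reduced by `UnknownsPenaltyCeiling` × pressure" as a multiplicative penalty `score × (1 − ceiling × pressure)`. All function and constant names are hypothetical.

```python
# Hypothetical constants: the ceiling and clamp bounds mirror the documented
# defaults; PRESSURE_CAP is an assumption (the doc only says "capped").
UNKNOWNS_PENALTY_CEILING = 0.35
PRESSURE_CAP = 1.0
CONFIDENCE_MIN = 0.05
CONFIDENCE_MAX = 0.99


def unknowns_pressure(states: int, unknowns: int, cap: float = PRESSURE_CAP) -> float:
    """UnknownsPressure = unknowns / (states + unknowns), capped at `cap`."""
    total = states + unknowns
    if total == 0:
        return 0.0  # nothing scored and nothing unresolved: no pressure
    return min(unknowns / total, cap)


def clamp_confidence(base: float, runtime_bonus: float,
                     lo: float = CONFIDENCE_MIN, hi: float = CONFIDENCE_MAX) -> float:
    """Assumed additive form: base + runtime bonus, clamped to [lo, hi]."""
    return max(lo, min(hi, base + runtime_bonus))


def penalized_fact_score(base_score: float, pressure: float,
                         ceiling: float = UNKNOWNS_PENALTY_CEILING) -> float:
    """Assumed multiplicative penalty: score * (1 - ceiling * pressure)."""
    return base_score * (1.0 - ceiling * pressure)
```

With 3 scored states and 1 unknown, pressure is 0.25, so under this assumed penalty form a base fact score of 0.8 drops to about 0.73; at maximum pressure the score can lose at most 35%.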
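
## Sketch: node hash computation

The **Node Hash Recipe** can be sketched with Python's `hashlib`. This is an illustration under assumptions, not the canonical implementation: qualifiers are taken to be the `?key=value&...` segment before any `#subpath`, lowercasing applies to the whole PURL as the recipe states, "removes whitespace" means every whitespace character, UTF-8 is the hashed encoding, and the `sha256:` hex prefix follows the example value. Platform-specific decoration handling is left out because its rules are not specified here. The function names are hypothetical.

```python
import hashlib


def normalize_purl(purl: str) -> str:
    """Lowercase the PURL and sort its qualifiers by key (assumed layout: base?quals#subpath)."""
    purl = purl.strip().lower()
    base, sep_sub, subpath = purl.partition("#")
    head, _, quals = base.partition("?")
    # Drop empty segments, then sort key=value pairs alphabetically by key.
    pairs = sorted((q for q in quals.split("&") if q), key=lambda kv: kv.partition("=")[0])
    if pairs:
        head += "?" + "&".join(pairs)
    return head + (sep_sub + subpath if sep_sub else "")


def normalize_symbol(symbol: str) -> str:
    """Remove all whitespace; decoration normalization is not modelled in this sketch."""
    return "".join(symbol.split())


def node_hash(purl: str, symbol: str) -> str:
    """nodeHash = SHA256(normalize(purl) + ":" + normalize(symbol)), hex-encoded."""
    material = normalize_purl(purl) + ":" + normalize_symbol(symbol)
    return "sha256:" + hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Because both inputs are normalized before hashing, `node_hash("PKG:NPM/Express@4.18.2", " Router.handle ")` yields the same value as the canonical spelling, which is what makes the hash usable as a join key across scans and runtime traces.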
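
## Sketch: top-K selection and path hash

The **Path Hash and Top-K Selection** rules can be sketched as follows. Assumptions not confirmed by the recipe: nodes arrive in path order paired with a numeric significance score (for example call frequency), K counts the entry and sink, ties are broken by earlier position, the selected intermediates keep their path order, and `joinedIntermediateHashes` uses `:` as its separator. `select_top_k` and `path_hash` are hypothetical names.

```python
import hashlib
from typing import List, Sequence, Tuple

DEFAULT_TOP_K = 10  # documented default K


def select_top_k(path: Sequence[Tuple[str, float]], k: int = DEFAULT_TOP_K) -> List[str]:
    """Keep entry and sink plus the (k - 2) highest-scoring intermediates, in path order."""
    if len(path) < 2:
        raise ValueError("a path needs at least an entry and a sink")
    if k < 2:
        raise ValueError("k must leave room for the entry and sink")
    hashes = [node for node, _ in path]
    intermediates = list(enumerate(path[1:-1], start=1))
    # Rank by score (descending); ties go to the node that appears earlier.
    ranked = sorted(intermediates, key=lambda item: (-item[1][1], item[0]))
    keep = sorted(idx for idx, _ in ranked[: k - 2])
    return [hashes[0]] + [hashes[i] for i in keep] + [hashes[-1]]


def path_hash(selected: Sequence[str]) -> str:
    """pathHash = SHA256(entry + ":" + joinedIntermediates + ":" + sink)."""
    entry, *middle, sink = selected
    material = entry + ":" + ":".join(middle) + ":" + sink
    return "sha256:" + hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Keeping the selected intermediates in path order rather than score order means two runs that select the same node set on the same path produce the same hash, even if their scores differ.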
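
## Sketch: assembling runtime evidence fields

A sketch of filling the **Runtime Evidence Linkage** fields from one observed call stack. Assumptions not stated in the contract: a static path counts as confirmed only when every node hash in its top-K selection was observed, `callstackHash` is SHA-256 over the stack's node hashes joined with `:` in observed order, and the output uses the table's camelCase keys. The SQL join example is looser; it treats an observed sink node as enough. `link_runtime_evidence` is a hypothetical name.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Sequence


def link_runtime_evidence(
    stack: Sequence[str],
    static_paths: Dict[str, Sequence[str]],
    observed_at: datetime,
) -> dict:
    """Build runtime-evidence fields for one observed call stack.

    stack: node hashes in the order they appear in the observed stack.
    static_paths: pathHash -> top-K node hashes of that static path.
    observed_at: timezone-aware observation time.
    """
    observed = set(stack)
    # Assumption: a path is confirmed only if all of its top-K nodes were seen.
    confirmed = sorted(
        ph for ph, nodes in static_paths.items() if set(nodes) <= observed
    )
    # Assumption: the call stack hash covers node hashes in stack order.
    callstack_material = ":".join(stack).encode("utf-8")
    return {
        "observedNodeHashes": sorted(observed),  # de-duplicated, sorted for determinism
        "observedPathHashes": confirmed,
        # isoformat() on an aware UTC datetime yields an RFC 3339 timestamp.
        "runtimeEvidenceAt": observed_at.astimezone(timezone.utc).isoformat(),
        "callstackHash": "sha256:" + hashlib.sha256(callstack_material).hexdigest(),
    }
```

Sorting the node and path hash lists keeps the output deterministic regardless of set iteration order, in line with the reproducibility goals in the operator checklist.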