# Reachability · Runtime + Static Union (v0.1)

## What this covers

- End-to-end flow for combining static callgraphs (Scanner) and runtime traces (Zastava) into replayable reachability bundles.
- Storage layout (CAS namespaces), manifest fields, and Signals APIs that consume/emit reachability facts.
- How unknowns/pressure and scoring are derived so Policy/UI can explain outcomes.

## Pipeline (at a glance)

1. **Scanner** emits language-specific callgraphs as `richgraph-v1` and packs them into CAS under `reachability_graphs/.tar.zst` with manifest `meta.json`.
2. **Zastava Observer** streams NDJSON runtime facts (`symbol_id`, `code_id`, `hit_count`, `loader_base`, `cas_uri`) to Signals `POST /signals/runtime-facts` or `/runtime-facts/ndjson`.
3. **Union bundles** (runtime + static) are uploaded as ZIP to `POST /signals/reachability/union` with optional `X-Analysis-Id`; Signals stores under `reachability_graphs/{analysisId}/`.
4. **Signals scoring** consumes union data + runtime facts, computes per-target states (bucket, weight, confidence, score), fact-level score, unknowns pressure, and publishes `signals.fact.updated@v1` events.
5. **Replay** records provenance: reachability section in replay manifest lists CAS URIs (graphs + runtime traces), namespaces, analyzer/version, callgraphIds, and the shared `analysisId`.

## Storage & CAS namespaces

- Static graphs: `cas://reachability_graphs//.tar.zst` (meta.json + graph files).
- Runtime traces: `cas://runtime_traces//.tar.zst` (NDJSON or zipped stream).
- Replay manifest now includes `analysisId` to correlate graphs/traces; each reference also carries `namespace` and `callgraphId` (static) for unambiguous replay.

## Signals API quick reference

- `POST /signals/runtime-facts`: structured request body; recomputes reachability.
- `POST /signals/runtime-facts/ndjson`: streaming NDJSON/gzip; requires `callgraphId` header params.
- `POST /signals/reachability/union`: upload ZIP bundle; optional `X-Analysis-Id`.
- `GET /signals/reachability/union/{analysisId}/meta`: returns meta.json.
- `GET /signals/reachability/union/{analysisId}/files/{fileName}`: download bundled graph/trace files.
- `GET /signals/facts/{subjectKey}`: fetch latest reachability fact (includes unknowns counters and targets).

## Scoring and unknowns

- Buckets (default weights): entrypoint 1.0, direct 0.85, runtime 0.45, unknown 0.5, unreachable 0.0.
- Confidence: reachable vs unreachable base, runtime bonus, clamped between Min/Max (defaults 0.05–0.99).
- Unknowns: Signals counts unresolved symbols/edges per subject; `UnknownsPressure = unknowns / (states + unknowns)` (capped). Fact score is reduced by `UnknownsPenaltyCeiling` (default 0.35) × pressure.
- Events: `signals.fact.updated@v1` now emits `unknownsCount` and `unknownsPressure` plus bucket/weight/stateCount/targets.

## Replay contract changes (v0.1 add-ons)

- `reachability.analysisId` (string, optional): ties to Signals union ingest.
- Graph refs include `namespace`, `callgraphId`, analyzer, version, sha256, casUri.
- Runtime trace refs include `namespace`, recordedAt, sha256, casUri.

## Operator checklist

- Use deterministic CAS paths; never embed absolute file paths.
- When emitting runtime NDJSON, include `loader_base` and `code_id` when available for de-dup.
- Ensure `analysisId` is propagated from Scanner/Zastava into Signals ingest to keep replay manifests linked.
- Keep feeds frozen for reproducibility; avoid external downloads in union preparation.

## References

- Schema: `docs/reachability/runtime-static-union-schema.md`
- Delivery guide: `docs/reachability/DELIVERY_GUIDE.md`
- Unknowns registry & scoring: Signals code (`ReachabilityScoringService`, `UnknownsIngestionService`) and events doc `docs/signals/events-24-005.md`.
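
## Appendix: unknowns pressure sketch

The unknowns arithmetic from "Scoring and unknowns" can be illustrated with a minimal Python sketch. It is not the Signals implementation. The function names, the cap of 1.0 on pressure, the subtractive reading of "reduced by", and the clamp to a non-negative score are all assumptions made for illustration; only the formula `unknowns / (states + unknowns)` and the 0.35 default ceiling come from this document.

```python
# Hypothetical sketch of the unknowns penalty; names and clamping are assumptions.

UNKNOWNS_PENALTY_CEILING = 0.35  # default stated in "Scoring and unknowns"


def unknowns_pressure(state_count: int, unknowns_count: int, cap: float = 1.0) -> float:
    """Return unknowns / (states + unknowns), capped at `cap` (cap value assumed)."""
    total = state_count + unknowns_count
    if total <= 0:
        # No states and no unknowns: treat pressure as zero (assumption).
        return 0.0
    return min(unknowns_count / total, cap)


def penalized_score(
    base_score: float,
    state_count: int,
    unknowns_count: int,
    ceiling: float = UNKNOWNS_PENALTY_CEILING,
) -> float:
    """Subtract ceiling * pressure from the base score, never going below 0.

    The subtractive form is one reading of "reduced by"; the real service
    may apply the penalty differently (for example multiplicatively).
    """
    pressure = unknowns_pressure(state_count, unknowns_count)
    return max(0.0, base_score - ceiling * pressure)


if __name__ == "__main__":
    # 3 resolved states and 1 unknown give a pressure of 0.25.
    print(unknowns_pressure(3, 1))
    print(penalized_score(0.8, 3, 1))
```

Under these assumptions, a subject with 3 resolved states and 1 unknown has a pressure of 0.25, so a base score of 0.8 loses 0.35 × 0.25 = 0.0875.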