
# Reachability · Runtime + Static Union (v0.1)

## What this covers
- End-to-end flow for combining static callgraphs (Scanner) and runtime traces (Zastava) into replayable reachability bundles.
- Storage layout (CAS namespaces), manifest fields, and Signals APIs that consume/emit reachability facts.
- How unknowns/pressure and scoring are derived so Policy/UI can explain outcomes.

## Pipeline (at a glance)
1. **Scanner** emits language-specific callgraphs as `richgraph-v1` and packs them into CAS under `reachability_graphs/<digest>.tar.zst` with manifest `meta.json`.
2. **Zastava Observer** streams NDJSON runtime facts (`symbol_id`, `code_id`, `hit_count`, `loader_base`, `cas_uri`) to Signals `POST /signals/runtime-facts` or `/runtime-facts/ndjson`.
3. **Union bundles** (runtime + static) are uploaded as ZIP to `POST /signals/reachability/union` with optional `X-Analysis-Id`; Signals stores them under `reachability_graphs/{analysisId}/`.
4. **Signals scoring** consumes union data and runtime facts; computes per-target states (bucket, weight, confidence, score), a fact-level score, and unknowns pressure; and publishes `signals.fact.updated@v1` events.
5. **Replay** records provenance: the reachability section of the replay manifest lists CAS URIs (graphs + runtime traces), namespaces, analyzer/version, callgraphIds, and the shared `analysisId`.
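
To make step 2 concrete, here is a minimal sketch of serializing runtime facts as NDJSON before they are sent to Signals. The field names come from the pipeline description above; the helper name `to_ndjson`, the value types (for example `loader_base` as a hex string), and the sample values are assumptions for illustration, not the Zastava wire format.

```python
import json
from typing import Iterable, Mapping


def to_ndjson(facts: Iterable[Mapping[str, object]]) -> str:
    """Serialize runtime facts as NDJSON: one compact JSON object per line.

    Hypothetical helper; keys are sorted so identical facts yield identical bytes.
    """
    lines = [json.dumps(fact, sort_keys=True, separators=(",", ":")) for fact in facts]
    return ("\n".join(lines) + "\n") if lines else ""


# Sample values are invented; only the field names are taken from this guide.
sample_facts = [
    {
        "symbol_id": "sym:example-1",
        "code_id": "code:example-1",
        "hit_count": 3,
        "loader_base": "0x7f0000000000",
        "cas_uri": "cas://runtime_traces/...",
    }
]

body = to_ndjson(sample_facts)
```

The resulting `body` would be the payload for `POST /signals/runtime-facts/ndjson`; transport, compression, and headers are intentionally left out of this sketch.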

## Storage & CAS namespaces
- Static graphs: `cas://reachability_graphs/<hh>/<sha>.tar.zst` (meta.json + graph files).
- Runtime traces: `cas://runtime_traces/<hh>/<sha>.tar.zst` (NDJSON or zipped stream).
- Replay manifest now includes `analysisId` to correlate graphs/traces; each reference also carries `namespace` and `callgraphId` (static) for unambiguous replay.
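
The CAS layout above can be sketched as a small path builder. This assumes that `<sha>` is the hex SHA-256 of the archive bytes and that `<hh>` is the first two hex characters of that digest; neither detail is stated in this guide, and `cas_uri` is a hypothetical helper name.

```python
import hashlib


def cas_uri(namespace: str, archive_bytes: bytes) -> str:
    """Build a deterministic CAS URI for an archive.

    ASSUMPTION: <sha> is the hex SHA-256 of the archive and <hh> is its
    first two hex characters (a common fan-out scheme, not confirmed here).
    """
    sha = hashlib.sha256(archive_bytes).hexdigest()
    return f"cas://{namespace}/{sha[:2]}/{sha}.tar.zst"


graph_uri = cas_uri("reachability_graphs", b"example archive bytes")
trace_uri = cas_uri("runtime_traces", b"example archive bytes")
```

Because the URI depends only on the namespace and the content, the same archive always maps to the same location, which is what makes replay references stable.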

## Signals API quick reference
- `POST /signals/runtime-facts` — structured request body; recomputes reachability.
- `POST /signals/runtime-facts/ndjson` — streaming NDJSON/gzip; requires `callgraphId` header params.
- `POST /signals/reachability/union` — upload ZIP bundle; optional `X-Analysis-Id`.
- `GET /signals/reachability/union/{analysisId}/meta` — returns meta.json.
- `GET /signals/reachability/union/{analysisId}/files/{fileName}` — download bundled graph/trace files.
- `GET /signals/facts/{subjectKey}` — fetch latest reachability fact (includes unknowns counters and targets).

## Scoring and unknowns
- Buckets (default weights): entrypoint 1.0, direct 0.85, runtime 0.45, unknown 0.5, unreachable 0.0.
- Confidence: starts from a reachable or unreachable base value, adds a runtime bonus, and is clamped between Min/Max (defaults 0.05–0.99).
- Unknowns: Signals counts unresolved symbols/edges per subject; `UnknownsPressure = unknowns / (states + unknowns)` (capped). Fact score is reduced by `UnknownsPenaltyCeiling` (default 0.35) × pressure.
- Events: `signals.fact.updated@v1` now emits `unknownsCount` and `unknownsPressure` plus bucket/weight/stateCount/targets.
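
The arithmetic above can be sketched as follows. The weights and defaults are the ones listed in this section; everything else is an assumption: the penalty is applied subtractively and floored at 0, the pressure cap is taken to be 1.0, and the names (`ScoringOptions`, `penalized_score`, and so on) are hypothetical rather than the `ReachabilityScoringService` API.

```python
from dataclasses import dataclass

# Default bucket weights as listed in this guide.
BUCKET_WEIGHTS = {
    "entrypoint": 1.0,
    "direct": 0.85,
    "runtime": 0.45,
    "unknown": 0.5,
    "unreachable": 0.0,
}


@dataclass(frozen=True)
class ScoringOptions:
    unknowns_penalty_ceiling: float = 0.35
    min_confidence: float = 0.05
    max_confidence: float = 0.99
    pressure_cap: float = 1.0  # ASSUMPTION: the guide says "capped" without a value


def unknowns_pressure(unknowns: int, states: int, opts: ScoringOptions = ScoringOptions()) -> float:
    """UnknownsPressure = unknowns / (states + unknowns), capped."""
    total = states + unknowns
    if total <= 0:
        return 0.0
    return min(opts.pressure_cap, unknowns / total)


def clamp_confidence(value: float, opts: ScoringOptions = ScoringOptions()) -> float:
    """Clamp a raw confidence into [Min, Max]."""
    return max(opts.min_confidence, min(opts.max_confidence, value))


def penalized_score(base_score: float, unknowns: int, states: int,
                    opts: ScoringOptions = ScoringOptions()) -> float:
    """ASSUMPTION: the penalty (ceiling x pressure) is subtracted and floored at 0."""
    penalty = opts.unknowns_penalty_ceiling * unknowns_pressure(unknowns, states, opts)
    return max(0.0, base_score - penalty)
```

For example, a fact with a base score of 0.85, 6 states, and 2 unknowns has pressure 0.25 and, under the subtractive assumption, a penalized score of about 0.7625.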

## Replay contract changes (v0.1 add-ons)
- `reachability.analysisId` (string, optional) — ties to Signals union ingest.
- Graph refs include `namespace`, `callgraphId`, analyzer, version, sha256, casUri.
- Runtime trace refs include `namespace`, recordedAt, sha256, casUri.

## Operator checklist
- Use deterministic CAS paths; never embed absolute file paths.
- In runtime NDJSON, include `loader_base` and `code_id` whenever they are available, to support de-duplication.
- Ensure `analysisId` is propagated from Scanner/Zastava into Signals ingest to keep replay manifests linked.
- Keep feeds frozen for reproducibility; avoid external downloads in union preparation.

---

## Node Hash Joins and Runtime Evidence Linkage

Sprint: SPRINT_20260112_008_DOCS_path_witness_contracts (PW-DOC-002)

### Overview

Node hashes provide a canonical way to join static reachability analysis with runtime observations. Each node in a callgraph can be identified by a stable hash computed from its PURL and symbol information, enabling:

1. **Static-to-runtime correlation**: Match runtime stack traces to static callgraph nodes
2. **Cross-scan consistency**: Compare reachability across different analysis runs
3. **Evidence linking**: Associate attestations with specific code paths

### Node Hash Recipe

A node hash is computed as:

```
nodeHash = SHA256(normalize(purl) + ":" + normalize(symbol))
```

Where:
- `normalize(purl)` lowercases the PURL and sorts qualifiers alphabetically
- `normalize(symbol)` removes whitespace and normalizes platform-specific decorations

Example:
```json
{
  "purl": "pkg:npm/express@4.18.2",
  "symbol": "Router.handle",
  "nodeHash": "sha256:a1b2c3d4..."
}
```
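
A minimal sketch of the recipe, under stated assumptions: input is encoded as UTF-8, the digest is rendered as `sha256:<hex>` to match the example above, qualifiers are the `?key=value&...` part of the PURL, and only whitespace removal is implemented for symbols because the platform-specific decoration rules are not spelled out here. The function names are hypothetical.

```python
import hashlib


def normalize_purl(purl: str) -> str:
    """Lowercase the PURL and sort its qualifiers alphabetically."""
    lowered = purl.strip().lower()
    base, hash_sep, subpath = lowered.partition("#")
    head, _, query = base.partition("?")
    if query:
        qualifiers = sorted(q for q in query.split("&") if q)
        head = head + "?" + "&".join(qualifiers)
    return head + ((hash_sep + subpath) if hash_sep else "")


def normalize_symbol(symbol: str) -> str:
    """Remove all whitespace. Decoration handling is omitted in this sketch."""
    return "".join(symbol.split())


def node_hash(purl: str, symbol: str) -> str:
    """nodeHash = SHA256(normalize(purl) + ":" + normalize(symbol))."""
    material = normalize_purl(purl) + ":" + normalize_symbol(symbol)
    # ASSUMPTION: UTF-8 encoding and a "sha256:" prefix on the hex digest.
    return "sha256:" + hashlib.sha256(material.encode("utf-8")).hexdigest()


h = node_hash("pkg:npm/express@4.18.2", "Router.handle")
```

With this normalization, `pkg:npm/Express@4.18.2` with `" Router . handle "` hashes to the same value as the canonical form, which is the property the join relies on.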

### Path Hash and Top-K Selection

A path hash identifies a specific call path from entrypoint to sink:

```
pathHash = SHA256(entryNodeHash + ":" + joinedIntermediateHashes + ":" + sinkNodeHash)
```

For long paths, only the **top-K** most significant nodes are included (default K=10):
- Entry node (always included)
- Sink node (always included)
- Intermediate nodes ranked by call frequency or security relevance
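
The selection and hashing steps above can be sketched as follows. Several details are assumptions, since this guide does not specify them: K counts the entry and sink nodes, selected intermediates keep their original path order, intermediate hashes are joined with `":"`, and ranking is supplied by the caller as a score per node hash. The function names are hypothetical.

```python
import hashlib
from typing import List, Mapping, Sequence


def select_top_k(path: Sequence[str], rank: Mapping[str, float], k: int = 10) -> List[str]:
    """Keep entry and sink, plus the highest-ranked intermediates, up to k nodes."""
    if len(path) < 2 or k < 2:
        raise ValueError("need at least an entry node, a sink node, and k >= 2")
    if len(path) <= k:
        return list(path)
    middle = list(enumerate(path[1:-1]))
    # Highest rank first; ties are broken by position in the path.
    best = sorted(middle, key=lambda item: (-rank.get(item[1], 0.0), item[0]))[: k - 2]
    kept_in_order = [node for _, node in sorted(best)]
    return [path[0], *kept_in_order, path[-1]]


def path_hash(path: Sequence[str], rank: Mapping[str, float], k: int = 10) -> str:
    """pathHash = SHA256(entry + ":" + joinedIntermediates + ":" + sink)."""
    nodes = select_top_k(path, rank, k)
    joined = ":".join(nodes[1:-1])  # ASSUMPTION: ":" as the join separator
    material = nodes[0] + ":" + joined + ":" + nodes[-1]
    return "sha256:" + hashlib.sha256(material.encode("utf-8")).hexdigest()


example_path = ["entry", "a", "b", "c", "sink"]
example_rank = {"a": 1.0, "b": 5.0, "c": 3.0}
selected = select_top_k(example_path, example_rank, k=4)
```

With `k=4`, the example keeps `entry`, `b`, `c`, and `sink`: the two highest-ranked intermediates in their original order.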

### Runtime Evidence Linkage

Runtime observations from Zastava can be linked to static analysis using node hashes:

| Field | Description |
|-------|-------------|
| `observedNodeHashes` | Node hashes seen at runtime |
| `observedPathHashes` | Path hashes confirmed by runtime traces |
| `runtimeEvidenceAt` | Timestamp of runtime observation (RFC3339) |
| `callstackHash` | Hash of the observed call stack |

### Join Example

To correlate static reachability with runtime evidence:

```sql
-- Find statically-reachable vulnerabilities confirmed at runtime
SELECT
  s.vulnerability_id,
  s.path_hash,
  r.observed_at
FROM static_reachability s
JOIN runtime_observations r
  ON s.sink_node_hash = ANY(r.observed_node_hashes)
WHERE s.reachable = true
  AND r.observed_at > NOW() - INTERVAL '7 days';
```
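
The same join can be sketched in memory, for cases where the records are already loaded rather than stored in a database. The row shapes mirror the columns used in the query above; the function name and the dictionary-based records are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


def confirmed_at_runtime(static_rows: List[Dict], runtime_rows: List[Dict],
                         now: datetime,
                         window: timedelta = timedelta(days=7)) -> List[Tuple[str, str, datetime]]:
    """Return (vulnerability_id, path_hash, observed_at) for reachable sinks seen at runtime."""
    cutoff = now - window
    matches = []
    for s in static_rows:
        if not s["reachable"]:
            continue
        for r in runtime_rows:
            recent = r["observed_at"] > cutoff
            if recent and s["sink_node_hash"] in r["observed_node_hashes"]:
                matches.append((s["vulnerability_id"], s["path_hash"], r["observed_at"]))
    return matches


# Invented sample data using the column names from the SQL example.
now = datetime(2026, 1, 12, tzinfo=timezone.utc)
static_rows = [
    {"vulnerability_id": "CVE-2024-1234", "path_hash": "sha256:p1",
     "sink_node_hash": "sha256:n1", "reachable": True},
    {"vulnerability_id": "CVE-2024-9999", "path_hash": "sha256:p2",
     "sink_node_hash": "sha256:n2", "reachable": False},
]
runtime_rows = [
    {"observed_node_hashes": ["sha256:n1", "sha256:n2"], "observed_at": now - timedelta(days=1)},
    {"observed_node_hashes": ["sha256:n1"], "observed_at": now - timedelta(days=30)},
]
hits = confirmed_at_runtime(static_rows, runtime_rows, now)
```

Only the first vulnerability matches: its sink is reachable and was observed within the last seven days, while the second is filtered out by `reachable` and the older observation falls outside the window.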

### SARIF Integration

Node hashes are exposed in SARIF outputs via `stellaops/*` property keys:

```json
{
  "results": [{
    "ruleId": "CVE-2024-1234",
    "properties": {
      "stellaops/nodeHash": "sha256:abc123...",
      "stellaops/pathHash": "sha256:def456...",
      "stellaops/topKNodeHashes": ["sha256:...", "sha256:..."],
      "stellaops/evidenceUri": "cas://evidence/...",
      "stellaops/observedAtRuntime": true
    }
  }]
}
```

### Policy Gate Usage

Policy rules can reference node and path hashes for fine-grained control:

```yaml
rules:
  - name: block-confirmed-critical-path
    match:
      severity: CRITICAL
      reachability:
        pathHash:
          exists: true
        observedAtRuntime: true
    action: block
```

See `policies/path-gates-advanced.yaml` for comprehensive examples.

---

## References
- Schema: `docs/modules/reach-graph/schemas/runtime-static-union-schema.md`
- Delivery guide: `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`
- Unknowns registry & scoring: Signals code (`ReachabilityScoringService`, `UnknownsIngestionService`) and events doc `docs/modules/signals/guides/events-24-005.md`.