git.stella-ops.org/docs/modules/reach-graph/guides/reachability.md
2026-01-14 18:39:19 +02:00

# Reachability · Runtime + Static Union (v0.1)
## What this covers
- End-to-end flow for combining static callgraphs (Scanner) and runtime traces (Zastava) into replayable reachability bundles.
- Storage layout (CAS namespaces), manifest fields, and Signals APIs that consume/emit reachability facts.
- How unknowns/pressure and scoring are derived so Policy/UI can explain outcomes.
## Pipeline (at a glance)
1. **Scanner** emits language-specific callgraphs as `richgraph-v1` and packs them into CAS under `reachability_graphs/<digest>.tar.zst` with manifest `meta.json`.
2. **Zastava Observer** streams NDJSON runtime facts (`symbol_id`, `code_id`, `hit_count`, `loader_base`, `cas_uri`) to Signals via `POST /signals/runtime-facts` or `POST /signals/runtime-facts/ndjson`.
3. **Union bundles** (runtime + static) are uploaded as ZIP to `POST /signals/reachability/union` with optional `X-Analysis-Id`; Signals stores them under `reachability_graphs/{analysisId}/`.
4. **Signals scoring** consumes union data + runtime facts, computes per-target states (bucket, weight, confidence, score), fact-level score, unknowns pressure, and publishes `signals.fact.updated@v1` events.
5. **Replay** records provenance: reachability section in replay manifest lists CAS URIs (graphs + runtime traces), namespaces, analyzer/version, callgraphIds, and the shared `analysisId`.
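As a minimal sketch of the payload in step 2, the snippet below serializes runtime facts as NDJSON (one JSON object per line). The field names come from the list above; the helper name `to_ndjson`, the sample values, and the value types (for example `loader_base` as a hex string) are assumptions for illustration, not the confirmed Zastava schema.

```python
import json


def to_ndjson(facts):
    """Serialize an iterable of dicts as NDJSON: one compact JSON object per line."""
    return "".join(
        json.dumps(fact, sort_keys=True, separators=(",", ":")) + "\n"
        for fact in facts
    )


# Illustrative values only; real facts are produced by Zastava Observer.
facts = [
    {
        "symbol_id": "sym-0001",
        "code_id": "code-0001",
        "hit_count": 3,
        "loader_base": "0x7f0000000000",  # assumed representation
        "cas_uri": "cas://runtime_traces/ab/abcdef.tar.zst",  # placeholder URI
    },
    {
        "symbol_id": "sym-0002",
        "code_id": "code-0002",
        "hit_count": 1,
        "loader_base": "0x7f0000000000",
        "cas_uri": "cas://runtime_traces/ab/abcdef.tar.zst",
    },
]

payload = to_ndjson(facts)
```

Sorting keys and using compact separators keeps the output byte-stable across runs, which fits the determinism goals in the operator checklist.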
## Storage & CAS namespaces
- Static graphs: `cas://reachability_graphs/<hh>/<sha>.tar.zst` (meta.json + graph files).
- Runtime traces: `cas://runtime_traces/<hh>/<sha>.tar.zst` (NDJSON or zipped stream).
- Replay manifest now includes `analysisId` to correlate graphs/traces; each reference also carries `namespace` and `callgraphId` (static) for unambiguous replay.
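The CAS layout above can be sketched as a small path builder. This assumes that `<sha>` is the lowercase hex SHA-256 of the archive bytes and that `<hh>` is the first two hex characters of that digest (a common sharding scheme); neither detail is confirmed by this guide, and `cas_uri` is a hypothetical helper name.

```python
import hashlib


def cas_uri(namespace: str, archive_bytes: bytes) -> str:
    """Build a deterministic CAS URI of the form cas://<namespace>/<hh>/<sha>.tar.zst.

    Assumption: <hh> is the two-character prefix of the hex SHA-256 digest.
    """
    sha = hashlib.sha256(archive_bytes).hexdigest()
    return f"cas://{namespace}/{sha[:2]}/{sha}.tar.zst"


# Placeholder bytes; a real caller would pass the .tar.zst archive content.
graph_uri = cas_uri("reachability_graphs", b"example graph archive")
trace_uri = cas_uri("runtime_traces", b"example trace archive")
```

Because the URI depends only on the namespace and the content digest, identical archives always map to the same location, with no absolute file paths involved.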
## Signals API quick reference
- `POST /signals/runtime-facts` — structured request body; recomputes reachability.
- `POST /signals/runtime-facts/ndjson` — streaming NDJSON/gzip; requires `callgraphId` header params.
- `POST /signals/reachability/union` — upload ZIP bundle; optional `X-Analysis-Id`.
- `GET /signals/reachability/union/{analysisId}/meta` — returns meta.json.
- `GET /signals/reachability/union/{analysisId}/files/{fileName}` — download bundled graph/trace files.
- `GET /signals/facts/{subjectKey}` — fetch latest reachability fact (includes unknowns counters and targets).
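A hedged sketch of preparing a union upload for `POST /signals/reachability/union` follows. The endpoint path and the `X-Analysis-Id` header come from the list above; the base URL, the `Content-Type` value, the bundle file names other than `meta.json`, and the helper `build_union_request` are assumptions. The request is only constructed, not sent, so the example runs offline.

```python
import io
import urllib.request
import zipfile

# A constant timestamp keeps the archive bytes reproducible between runs.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def build_union_request(base_url, files, analysis_id=None):
    """Pack files (name -> bytes) into a ZIP and wrap it in a POST request."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # stable entry order
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            zf.writestr(info, files[name], compress_type=zipfile.ZIP_DEFLATED)
    headers = {"Content-Type": "application/zip"}  # assumed content type
    if analysis_id is not None:
        headers["X-Analysis-Id"] = analysis_id
    return urllib.request.Request(
        base_url.rstrip("/") + "/signals/reachability/union",
        data=buf.getvalue(),
        headers=headers,
        method="POST",
    )


request = build_union_request(
    "https://signals.example.internal",  # hypothetical host
    {"meta.json": b'{"analysisId": "analysis-001"}', "graph.json": b"{}"},
    analysis_id="analysis-001",
)
# urllib.request.urlopen(request) would send it; omitted to keep the sketch offline.
```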
## Scoring and unknowns
- Buckets (default weights): entrypoint 1.0, direct 0.85, runtime 0.45, unknown 0.5, unreachable 0.0.
- Confidence: starts from a reachable or unreachable base value, adds a runtime bonus, and is clamped between Min and Max (defaults 0.05 to 0.99).
- Unknowns: Signals counts unresolved symbols/edges per subject; `UnknownsPressure = unknowns / (states + unknowns)` (capped). Fact score is reduced by `UnknownsPenaltyCeiling` (default 0.35) × pressure.
- Events: `signals.fact.updated@v1` now emits `unknownsCount` and `unknownsPressure` plus bucket/weight/stateCount/targets.
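The arithmetic above can be sketched as follows. The pressure formula and the default values (0.35, 0.05, 0.99) come from the bullets; everything else is an assumption: the cap on pressure is taken to be 1.0, and "reduced by" is read as a multiplicative penalty, `score × (1 − ceiling × pressure)`. A subtractive reading is equally plausible, so check `ReachabilityScoringService` for the actual behavior.

```python
def clamp(value: float, lo: float, hi: float) -> float:
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def unknowns_pressure(unknowns: int, states: int, cap: float = 1.0) -> float:
    """UnknownsPressure = unknowns / (states + unknowns), capped (cap value assumed)."""
    total = states + unknowns
    if total == 0:
        return 0.0
    return min(cap, unknowns / total)


def penalized_score(score: float, pressure: float, ceiling: float = 0.35) -> float:
    """Apply the unknowns penalty. Assumption: the penalty is multiplicative."""
    return score * (1.0 - ceiling * pressure)


# Worked example: 2 unresolved symbols against 8 resolved target states.
pressure = unknowns_pressure(unknowns=2, states=8)
score = penalized_score(0.85, pressure)
confidence = clamp(1.2, 0.05, 0.99)  # an out-of-range value is pulled back to Max
```

With these inputs the pressure is 2 / 10 and the penalty factor is 1 − 0.35 × 0.2, so a fact with no unknowns keeps its score unchanged while a fact made entirely of unknowns loses at most 35% of it.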
## Replay contract changes (v0.1 add-ons)
- `reachability.analysisId` (string, optional) — ties to Signals union ingest.
- Graph refs include `namespace`, `callgraphId`, analyzer, version, sha256, casUri.
- Runtime trace refs include `namespace`, recordedAt, sha256, casUri.
## Operator checklist
- Use deterministic CAS paths; never embed absolute file paths.
- When emitting runtime NDJSON, include `loader_base` and `code_id` when available for de-dup.
- Ensure `analysisId` is propagated from Scanner/Zastava into Signals ingest to keep replay manifests linked.
- Keep feeds frozen for reproducibility; avoid external downloads in union preparation.
---
## Node Hash Joins and Runtime Evidence Linkage
Sprint: SPRINT_20260112_008_DOCS_path_witness_contracts (PW-DOC-002)
### Overview
Node hashes provide a canonical way to join static reachability analysis with runtime observations. Each node in a callgraph can be identified by a stable hash computed from its PURL and symbol information, enabling:
1. **Static-to-runtime correlation**: Match runtime stack traces to static callgraph nodes
2. **Cross-scan consistency**: Compare reachability across different analysis runs
3. **Evidence linking**: Associate attestations with specific code paths
### Node Hash Recipe
A node hash is computed as:
```
nodeHash = SHA256(normalize(purl) + ":" + normalize(symbol))
```
Where:
- `normalize(purl)` lowercases the PURL and sorts qualifiers alphabetically
- `normalize(symbol)` removes whitespace and normalizes platform-specific decorations
Example:
```json
{
  "purl": "pkg:npm/express@4.18.2",
  "symbol": "Router.handle",
  "nodeHash": "sha256:a1b2c3d4..."
}
```
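The recipe can be sketched in Python as below. The lowercasing, qualifier sorting, whitespace removal, and `":"` separator follow the recipe above; the UTF-8 encoding, hex output with a `sha256:` prefix (mirroring the example), and the function names are assumptions. Platform-specific symbol decorations are not handled here because the guide does not spell them out.

```python
import hashlib


def normalize_purl(purl: str) -> str:
    """Lowercase the PURL and sort its qualifiers alphabetically."""
    purl = purl.strip().lower()
    base, has_query, rest = purl.partition("?")
    if not has_query:
        return base
    qualifiers, has_subpath, subpath = rest.partition("#")
    ordered = "&".join(sorted(q for q in qualifiers.split("&") if q))
    return base + "?" + ordered + has_subpath + subpath


def normalize_symbol(symbol: str) -> str:
    """Remove all whitespace. Decoration handling is omitted (unspecified here)."""
    return "".join(symbol.split())


def node_hash(purl: str, symbol: str) -> str:
    """nodeHash = SHA256(normalize(purl) + ":" + normalize(symbol)), hex-encoded."""
    material = normalize_purl(purl) + ":" + normalize_symbol(symbol)
    return "sha256:" + hashlib.sha256(material.encode("utf-8")).hexdigest()


h = node_hash("pkg:npm/express@4.18.2", "Router.handle")
```

Two inputs that differ only in case, qualifier order, or symbol whitespace produce the same hash, which is what makes the value usable as a join key between static and runtime data.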
### Path Hash and Top-K Selection
A path hash identifies a specific call path from entrypoint to sink:
```
pathHash = SHA256(entryNodeHash + ":" + joinedIntermediateHashes + ":" + sinkNodeHash)
```
For long paths, only the **top-K** most significant nodes are included (default K=10):
- Entry node (always included)
- Sink node (always included)
- Intermediate nodes ranked by call frequency or security relevance
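A minimal sketch of top-K selection followed by path hashing is shown below. It assumes that K counts the entry and sink nodes, that selected intermediates keep their original path order, that intermediate hashes are joined with `","`, and that ranking scores are supplied by the caller; none of these details are fixed by this guide, and the function names are hypothetical.

```python
import hashlib


def select_top_k(path, rank, k=10):
    """Keep entry and sink, plus the (k - 2) highest-ranked intermediate nodes.

    path: ordered node hashes from entry to sink.
    rank: mapping of node hash to a significance score (higher is more significant).
    """
    if k < 2:
        raise ValueError("k must be at least 2 to hold entry and sink")
    if len(path) <= k:
        return list(path)
    middle = path[1:-1]
    # Rank by descending score, breaking ties by position for determinism.
    by_rank = sorted(range(len(middle)), key=lambda i: (-rank.get(middle[i], 0), i))
    keep = sorted(by_rank[: k - 2])  # restore original path order
    return [path[0]] + [middle[i] for i in keep] + [path[-1]]


def path_hash(nodes):
    """pathHash = SHA256(entry + ":" + joined intermediates + ":" + sink)."""
    joined = ",".join(nodes[1:-1])  # assumed joiner
    material = f"{nodes[0]}:{joined}:{nodes[-1]}"
    return "sha256:" + hashlib.sha256(material.encode("utf-8")).hexdigest()


full_path = ["entry", "a", "b", "c", "sink"]  # placeholder node hashes
selected = select_top_k(full_path, {"a": 1, "b": 5, "c": 3}, k=4)
digest = path_hash(selected)
```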
### Runtime Evidence Linkage
Runtime observations from Zastava can be linked to static analysis using node hashes:
| Field | Description |
|-------|-------------|
| `observedNodeHashes` | Node hashes seen at runtime |
| `observedPathHashes` | Path hashes confirmed by runtime traces |
| `runtimeEvidenceAt` | Timestamp of runtime observation (RFC3339) |
| `callstackHash` | Hash of the observed call stack |
### Join Example
To correlate static reachability with runtime evidence:
```sql
-- Find statically-reachable vulnerabilities confirmed at runtime
SELECT
    s.vulnerability_id,
    s.path_hash,
    r.observed_at
FROM static_reachability s
JOIN runtime_observations r
    ON s.sink_node_hash = ANY(r.observed_node_hashes)
WHERE s.reachable = true
  AND r.observed_at > NOW() - INTERVAL '7 days';
```
### SARIF Integration
Node hashes are exposed in SARIF outputs via `stellaops/*` property keys:
```json
{
  "results": [{
    "ruleId": "CVE-2024-1234",
    "properties": {
      "stellaops/nodeHash": "sha256:abc123...",
      "stellaops/pathHash": "sha256:def456...",
      "stellaops/topKNodeHashes": ["sha256:...", "sha256:..."],
      "stellaops/evidenceUri": "cas://evidence/...",
      "stellaops/observedAtRuntime": true
    }
  }]
}
```
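As a sketch of consuming these keys, the snippet below collects the `stellaops/*` properties from each result. The property names come from the example above; the helper name is hypothetical. Standard SARIF nests results under `runs[]`, while the abbreviated example shows `results` at the top level, so the sketch accepts both shapes.

```python
import json

PREFIX = "stellaops/"


def stellaops_properties(sarif: dict) -> list:
    """Return (ruleId, properties) pairs with the stellaops/ prefix stripped."""
    results = list(sarif.get("results", []))  # abbreviated shape used in this guide
    for run in sarif.get("runs", []):  # standard SARIF shape
        results.extend(run.get("results", []))
    pairs = []
    for result in results:
        props = {
            key[len(PREFIX):]: value
            for key, value in result.get("properties", {}).items()
            if key.startswith(PREFIX)
        }
        pairs.append((result.get("ruleId"), props))
    return pairs


document = json.loads("""
{
  "results": [{
    "ruleId": "CVE-2024-1234",
    "properties": {
      "stellaops/nodeHash": "sha256:abc123...",
      "stellaops/observedAtRuntime": true,
      "unrelated": 1
    }
  }]
}
""")
extracted = stellaops_properties(document)
```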
### Policy Gate Usage
Policy rules can reference node and path hashes for fine-grained control:
```yaml
rules:
  - name: block-confirmed-critical-path
    match:
      severity: CRITICAL
      reachability:
        pathHash:
          exists: true
        observedAtRuntime: true
    action: block
See `policies/path-gates-advanced.yaml` for comprehensive examples.
---
## References
- Schema: `docs/modules/reach-graph/schemas/runtime-static-union-schema.md`
- Delivery guide: `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`
- Unknowns registry & scoring: Signals code (`ReachabilityScoringService`, `UnknownsIngestionService`) and events doc `docs/modules/signals/guides/events-24-005.md`.