# Reachability · Runtime + Static Union (v0.1)

## What this covers
- End-to-end flow for combining static callgraphs (Scanner) and runtime traces (Zastava) into replayable reachability bundles.
- Storage layout (CAS namespaces), manifest fields, and Signals APIs that consume/emit reachability facts.
- How unknowns/pressure and scoring are derived so Policy/UI can explain outcomes.

## Pipeline (at a glance)
1. **Scanner** emits language-specific callgraphs as `richgraph-v1` and packs them into CAS under `reachability_graphs/<digest>.tar.zst` with manifest `meta.json`.
2. **Zastava Observer** streams NDJSON runtime facts (`symbol_id`, `code_id`, `hit_count`, `loader_base`, `cas_uri`) to Signals via `POST /signals/runtime-facts` or `POST /signals/runtime-facts/ndjson`.
3. **Union bundles** (runtime + static) are uploaded as ZIP to `POST /signals/reachability/union` with optional `X-Analysis-Id`; Signals stores them under `reachability_graphs/{analysisId}/`.
4. **Signals scoring** consumes union data + runtime facts, computes per-target states (bucket, weight, confidence, score), the fact-level score, and unknowns pressure, then publishes `signals.fact.updated@v1` events.
5. **Replay** records provenance: the reachability section of the replay manifest lists CAS URIs (graphs + runtime traces), namespaces, analyzer/version, callgraphIds, and the shared `analysisId`.
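
Step 2 can be illustrated with a minimal sketch of NDJSON serialization. Only the five field names come from the list above; the sample values, the `to_ndjson` helper, and the compact key-sorted encoding are assumptions made for illustration, not the Zastava Observer's actual output format.

```python
import json

def to_ndjson(facts):
    """Serialize runtime facts as NDJSON: one compact JSON object per line."""
    return "".join(
        json.dumps(fact, sort_keys=True, separators=(",", ":")) + "\n"
        for fact in facts
    )

# Hypothetical sample records; every value is a placeholder.
facts = [
    {
        "symbol_id": "sym:example-1",
        "code_id": "code:example-1",
        "hit_count": 12,
        "loader_base": "0x7f0000000000",
        "cas_uri": "cas://runtime_traces/<hh>/<sha>.tar.zst",
    },
    {
        "symbol_id": "sym:example-2",
        "code_id": "code:example-2",
        "hit_count": 3,
        "loader_base": "0x7f0000000000",
        "cas_uri": "cas://runtime_traces/<hh>/<sha>.tar.zst",
    },
]

payload = to_ndjson(facts)
```

Sorting keys keeps the byte output stable for identical input, which fits the document's emphasis on replayable bundles.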

## Storage & CAS namespaces
- Static graphs: `cas://reachability_graphs/<hh>/<sha>.tar.zst` (meta.json + graph files).
- Runtime traces: `cas://runtime_traces/<hh>/<sha>.tar.zst` (NDJSON or zipped stream).
- The replay manifest now includes `analysisId` to correlate graphs and traces; each reference also carries `namespace`, and static graph references carry `callgraphId`, for unambiguous replay.
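
The CAS path shape above can be sketched as follows. This assumes `<sha>` is the hex SHA-256 digest of the archive bytes and `<hh>` is its first two hex characters (a common sharding convention); the source does not confirm either detail, and `cas_uri` is a hypothetical helper name.

```python
import hashlib

def cas_uri(namespace: str, archive_bytes: bytes) -> str:
    """Build a content-addressed URI of the form cas://<namespace>/<hh>/<sha>.tar.zst.

    Assumption: <hh> is the first two hex characters of the SHA-256 digest.
    """
    sha = hashlib.sha256(archive_bytes).hexdigest()
    return f"cas://{namespace}/{sha[:2]}/{sha}.tar.zst"

# The same bytes always map to the same URI, which keeps paths deterministic.
graph_uri = cas_uri("reachability_graphs", b"example archive bytes")
trace_uri = cas_uri("runtime_traces", b"example archive bytes")
```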

## Signals API quick reference
- `POST /signals/runtime-facts`: structured request body; recomputes reachability.
- `POST /signals/runtime-facts/ndjson`: streaming NDJSON/gzip; requires `callgraphId` header params.
- `POST /signals/reachability/union`: upload ZIP bundle; optional `X-Analysis-Id`.
- `GET /signals/reachability/union/{analysisId}/meta`: returns meta.json.
- `GET /signals/reachability/union/{analysisId}/files/{fileName}`: download bundled graph/trace files.
- `GET /signals/facts/{subjectKey}`: fetch latest reachability fact (includes unknowns counters and targets).
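
As a rough illustration of the union endpoints, the sketch below builds (but does not send) an upload request with the standard library. The base URL, the `application/zip` content type, and both helper names are assumptions; only the paths and the `X-Analysis-Id` header come from the list above.

```python
import urllib.parse
import urllib.request
from typing import Optional

SIGNALS_BASE_URL = "https://signals.example.internal"  # hypothetical host

def build_union_upload(bundle_zip: bytes,
                       analysis_id: Optional[str] = None) -> urllib.request.Request:
    """Prepare a POST to /signals/reachability/union without sending it."""
    headers = {"Content-Type": "application/zip"}  # assumed media type
    if analysis_id is not None:
        headers["X-Analysis-Id"] = analysis_id  # optional per the API list
    return urllib.request.Request(
        f"{SIGNALS_BASE_URL}/signals/reachability/union",
        data=bundle_zip,
        headers=headers,
        method="POST",
    )

def union_meta_url(analysis_id: str) -> str:
    """URL for GET /signals/reachability/union/{analysisId}/meta."""
    quoted = urllib.parse.quote(analysis_id, safe="")
    return f"{SIGNALS_BASE_URL}/signals/reachability/union/{quoted}/meta"
```

Passing the prepared request to `urllib.request.urlopen` would perform the upload; authentication is omitted because the source does not describe it.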

## Scoring and unknowns
- Buckets (default weights): entrypoint 1.0, direct 0.85, runtime 0.45, unknown 0.5, unreachable 0.0.
- Confidence: starts from a reachable or unreachable base, adds a runtime bonus, and is clamped between Min/Max (defaults 0.05–0.99).
- Unknowns: Signals counts unresolved symbols/edges per subject; `UnknownsPressure = unknowns / (states + unknowns)` (capped). Fact score is reduced by `UnknownsPenaltyCeiling` (default 0.35) × pressure.
- Events: `signals.fact.updated@v1` now emits `unknownsCount` and `unknownsPressure` plus bucket/weight/stateCount/targets.
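
The arithmetic above can be worked through in a short sketch. The function names are hypothetical, the pressure cap is assumed to be 1.0 because the source only says "capped", and the penalty is read literally as a subtraction floored at zero; a multiplicative reduction would be an equally plausible reading, so treat this as one interpretation rather than the behavior of `ReachabilityScoringService`.

```python
def clamp_confidence(value: float, minimum: float = 0.05, maximum: float = 0.99) -> float:
    """Clamp a confidence value to the documented default Min/Max range."""
    return max(minimum, min(maximum, value))

def unknowns_pressure(unknowns: int, states: int, cap: float = 1.0) -> float:
    """UnknownsPressure = unknowns / (states + unknowns), capped (cap value assumed)."""
    total = states + unknowns
    if total == 0:
        return 0.0  # no states and no unknowns: nothing to penalize
    return min(unknowns / total, cap)

def penalized_score(base_score: float, unknowns: int, states: int,
                    penalty_ceiling: float = 0.35) -> float:
    """Reduce the fact score by penalty_ceiling x pressure (subtractive reading)."""
    pressure = unknowns_pressure(unknowns, states)
    return max(0.0, base_score - penalty_ceiling * pressure)

# Worked example: 2 unknowns and 6 states give pressure 2 / 8 = 0.25,
# so a base score of 0.85 drops by 0.35 * 0.25 = 0.0875.
example_score = penalized_score(0.85, unknowns=2, states=6)
```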

## Replay contract changes (v0.1 add-ons)
- `reachability.analysisId` (string, optional): ties to Signals union ingest.
- Graph refs include `namespace`, `callgraphId`, analyzer, version, sha256, casUri.
- Runtime trace refs include `namespace`, recordedAt, sha256, casUri.

## Operator checklist
- Use deterministic CAS paths; never embed absolute file paths.
- When emitting runtime NDJSON, include `loader_base` and `code_id` when available for de-dup.
- Ensure `analysisId` is propagated from Scanner/Zastava into Signals ingest to keep replay manifests linked.
- Keep feeds frozen for reproducibility; avoid external downloads in union preparation.

---

## Node Hash Joins and Runtime Evidence Linkage

Sprint: SPRINT_20260112_008_DOCS_path_witness_contracts (PW-DOC-002)

### Overview

Node hashes provide a canonical way to join static reachability analysis with runtime observations. Each node in a callgraph can be identified by a stable hash computed from its PURL and symbol information, enabling:

1. **Static-to-runtime correlation**: Match runtime stack traces to static callgraph nodes
2. **Cross-scan consistency**: Compare reachability across different analysis runs
3. **Evidence linking**: Associate attestations with specific code paths

### Node Hash Recipe

A node hash is computed as:

```
nodeHash = SHA256(normalize(purl) + ":" + normalize(symbol))
```

Where:
- `normalize(purl)` lowercases the PURL and sorts qualifiers alphabetically
- `normalize(symbol)` removes whitespace and normalizes platform-specific decorations

Example:
```json
{
  "purl": "pkg:npm/express@4.18.2",
  "symbol": "Router.handle",
  "nodeHash": "sha256:a1b2c3d4..."
}
```
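
The recipe can be sketched in a few lines. This is an illustrative reading, not the Scanner's implementation: the helper names are hypothetical, the `sha256:` prefix and UTF-8 encoding are assumed from the example above, and the platform-specific decoration rules are not specified in this document, so the sketch only strips whitespace from symbols.

```python
import hashlib

def normalize_purl(purl: str) -> str:
    """Lowercase the PURL and sort its qualifiers alphabetically."""
    purl = purl.strip().lower()
    base, has_query, rest = purl.partition("?")
    if not has_query:
        return base
    qualifiers, has_subpath, subpath = rest.partition("#")
    ordered = "&".join(sorted(q for q in qualifiers.split("&") if q))
    return f"{base}?{ordered}{has_subpath}{subpath}"

def normalize_symbol(symbol: str) -> str:
    """Remove all whitespace; decoration handling is omitted (unspecified here)."""
    return "".join(symbol.split())

def node_hash(purl: str, symbol: str) -> str:
    """nodeHash = SHA256(normalize(purl) + ":" + normalize(symbol))."""
    material = f"{normalize_purl(purl)}:{normalize_symbol(symbol)}"
    return "sha256:" + hashlib.sha256(material.encode("utf-8")).hexdigest()

express_hash = node_hash("pkg:npm/express@4.18.2", "Router.handle")
```

Because both inputs are normalized before hashing, differences in casing, qualifier order, or incidental whitespace do not change the resulting hash.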

### Path Hash and Top-K Selection

A path hash identifies a specific call path from entrypoint to sink:

```
pathHash = SHA256(entryNodeHash + ":" + joinedIntermediateHashes + ":" + sinkNodeHash)
```

For long paths, only the **top-K** most significant nodes are included (default K=10):
- Entry node (always included)
- Sink node (always included)
- Intermediate nodes ranked by call frequency or security relevance
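
A minimal sketch of top-K selection followed by path hashing is shown below. Several details are assumptions because the document leaves them open: K is taken to count the entry and sink nodes, intermediate hashes are joined with `,`, selected intermediates keep their original path order, and ranking uses a caller-supplied score per node hash. Both function names are hypothetical.

```python
import hashlib

def select_top_k(path, scores, k=10):
    """Keep entry and sink, plus the highest-scoring intermediates, up to k nodes."""
    if len(path) <= k:
        return list(path)
    entry, sink = path[0], path[-1]
    middle = list(enumerate(path[1:-1]))
    # Rank by descending score; ties fall back to path position.
    ranked = sorted(middle, key=lambda item: (-scores.get(item[1], 0.0), item[0]))
    kept = sorted(ranked[: max(k - 2, 0)])  # restore original path order
    return [entry] + [node for _, node in kept] + [sink]

def path_hash(nodes):
    """pathHash = SHA256(entry + ":" + joined intermediates + ":" + sink)."""
    entry, sink = nodes[0], nodes[-1]
    joined = ",".join(nodes[1:-1])  # delimiter is an assumption
    digest = hashlib.sha256(f"{entry}:{joined}:{sink}".encode("utf-8")).hexdigest()
    return "sha256:" + digest

full_path = ["entry", "a", "b", "c", "sink"]
node_scores = {"a": 1.0, "b": 5.0, "c": 3.0}
selected = select_top_k(full_path, node_scores, k=4)
selected_hash = path_hash(selected)
```

With `k=4`, the two highest-scoring intermediates (`b` and `c`) survive alongside the entry and sink, and the hash is computed over that reduced path.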

### Runtime Evidence Linkage

Runtime observations from Zastava can be linked to static analysis using node hashes:

| Field | Description |
|-------|-------------|
| `observedNodeHashes` | Node hashes seen at runtime |
| `observedPathHashes` | Path hashes confirmed by runtime traces |
| `runtimeEvidenceAt` | Timestamp of runtime observation (RFC3339) |
| `callstackHash` | Hash of the observed call stack |

### Join Example

To correlate static reachability with runtime evidence:

```sql
-- Find statically-reachable vulnerabilities confirmed at runtime
SELECT
  s.vulnerability_id,
  s.path_hash,
  r.observed_at
FROM static_reachability s
JOIN runtime_observations r
  ON s.sink_node_hash = ANY(r.observed_node_hashes)
WHERE s.reachable = true
  AND r.observed_at > NOW() - INTERVAL '7 days';
```

### SARIF Integration

Node hashes are exposed in SARIF outputs via `stellaops/*` property keys:

```json
{
  "results": [{
    "ruleId": "CVE-2024-1234",
    "properties": {
      "stellaops/nodeHash": "sha256:abc123...",
      "stellaops/pathHash": "sha256:def456...",
      "stellaops/topKNodeHashes": ["sha256:...", "sha256:..."],
      "stellaops/evidenceUri": "cas://evidence/...",
      "stellaops/observedAtRuntime": true
    }
  }]
}
```

### Policy Gate Usage

Policy rules can reference node and path hashes for fine-grained control:

```yaml
rules:
  - name: block-confirmed-critical-path
    match:
      severity: CRITICAL
      reachability:
        pathHash:
          exists: true
        observedAtRuntime: true
    action: block
```

See `policies/path-gates-advanced.yaml` for comprehensive examples.

---

## References
- Schema: `docs/modules/reach-graph/schemas/runtime-static-union-schema.md`
- Delivery guide: `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`
- Unknowns registry & scoring: Signals code (`ReachabilityScoringService`, `UnknownsIngestionService`) and events doc `docs/modules/signals/guides/events-24-005.md`.