2026-01-14 18:39:19 +02:00

Reachability · Runtime + Static Union (v0.1)

What this covers

  • End-to-end flow for combining static callgraphs (Scanner) and runtime traces (Zastava) into replayable reachability bundles.
  • Storage layout (CAS namespaces), manifest fields, and Signals APIs that consume/emit reachability facts.
  • How unknowns/pressure and scoring are derived so Policy/UI can explain outcomes.

Pipeline (at a glance)

  1. Scanner emits language-specific callgraphs as richgraph-v1 and packs them into CAS under reachability_graphs/<digest>.tar.zst with manifest meta.json.
  2. Zastava Observer streams NDJSON runtime facts (symbol_id, code_id, hit_count, loader_base, cas_uri) to Signals via POST /signals/runtime-facts or POST /signals/runtime-facts/ndjson.
  3. Union bundles (runtime + static) are uploaded as ZIP to POST /signals/reachability/union with an optional X-Analysis-Id header; Signals stores each bundle under reachability_graphs/{analysisId}/.
  4. Signals scoring consumes union data + runtime facts, computes per-target states (bucket, weight, confidence, score), fact-level score, unknowns pressure, and publishes signals.fact.updated@v1 events.
  5. Replay records provenance: reachability section in replay manifest lists CAS URIs (graphs + runtime traces), namespaces, analyzer/version, callgraphIds, and the shared analysisId.
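
As an illustration of step 2, the sketch below serializes runtime facts into NDJSON (one compact JSON object per line). The field names come from the pipeline description; which fields are required and which are optional is an assumption made for this example, not something the contract above states.

```python
import json

# Field names are taken from the pipeline description above. The split into
# required and optional fields is an assumption made for this sketch.
REQUIRED_FIELDS = ("symbol_id", "hit_count")
OPTIONAL_FIELDS = ("code_id", "loader_base", "cas_uri")


def to_ndjson(facts):
    """Serialize runtime facts to NDJSON: one compact JSON object per line."""
    lines = []
    for fact in facts:
        missing = [name for name in REQUIRED_FIELDS if name not in fact]
        if missing:
            raise ValueError(f"runtime fact is missing {missing}")
        record = {name: fact[name] for name in REQUIRED_FIELDS}
        # Optional fields are emitted only when the observer has a value.
        for name in OPTIONAL_FIELDS:
            if fact.get(name) is not None:
                record[name] = fact[name]
        # sort_keys keeps the byte output stable across runs.
        lines.append(json.dumps(record, sort_keys=True, separators=(",", ":")))
    return "".join(line + "\n" for line in lines)
```

Sorting keys and using compact separators keeps the payload byte-stable, which helps when the same trace is later hashed into CAS.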

Storage & CAS namespaces

  • Static graphs: cas://reachability_graphs/<hh>/<sha>.tar.zst (meta.json + graph files).
  • Runtime traces: cas://runtime_traces/<hh>/<sha>.tar.zst (NDJSON or zipped stream).
  • Replay manifest now includes analysisId to correlate graphs/traces; each reference also carries namespace and callgraphId (static) for unambiguous replay.
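
A minimal sketch of how a deterministic CAS URI in this layout could be derived from an archive's bytes. It assumes that <hh> is the first two hex characters of the SHA-256 digest; the layout above does not spell this out, so treat it as a guess. The helper name cas_uri is hypothetical.

```python
import hashlib


def cas_uri(namespace, payload):
    """Build cas://<namespace>/<hh>/<sha>.tar.zst from the archive bytes."""
    sha = hashlib.sha256(payload).hexdigest()
    # Assumption: <hh> is a two-character shard prefix taken from the digest.
    return f"cas://{namespace}/{sha[:2]}/{sha}.tar.zst"
```

For example, cas_uri("runtime_traces", archive_bytes) always yields the same URI for the same bytes, with no absolute file path involved.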

Signals API quick reference

  • POST /signals/runtime-facts — structured request body; recomputes reachability.
  • POST /signals/runtime-facts/ndjson — streaming NDJSON/gzip; requires callgraphId header params.
  • POST /signals/reachability/union — upload ZIP bundle; optional X-Analysis-Id.
  • GET /signals/reachability/union/{analysisId}/meta — returns meta.json.
  • GET /signals/reachability/union/{analysisId}/files/{fileName} — download bundled graph/trace files.
  • GET /signals/facts/{subjectKey} — fetch latest reachability fact (includes unknowns counters and targets).
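
The union upload can be sketched with the standard library as follows. The route and the X-Analysis-Id header come from the list above; the base URL, the Content-Type value, and the helper name are assumptions, and authentication is left out entirely.

```python
import urllib.request


def build_union_upload(base_url, zip_bytes, analysis_id=None):
    """Prepare (but do not send) a POST /signals/reachability/union request."""
    # Assumption: the service accepts the bundle as a raw application/zip body.
    headers = {"Content-Type": "application/zip"}
    if analysis_id is not None:
        headers["X-Analysis-Id"] = analysis_id
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/signals/reachability/union",
        data=zip_bytes,
        headers=headers,
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; building the request separately keeps the sketch free of network side effects.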

Scoring and unknowns

  • Buckets (default weights): entrypoint 1.0, direct 0.85, runtime 0.45, unknown 0.5, unreachable 0.0.
  • Confidence: starts from a reachable or unreachable base value, adds a runtime bonus, and is clamped between Min and Max (defaults 0.05 and 0.99).
  • Unknowns: Signals counts unresolved symbols/edges per subject; UnknownsPressure = unknowns / (states + unknowns) (capped). Fact score is reduced by UnknownsPenaltyCeiling (default 0.35) × pressure.
  • Events: signals.fact.updated@v1 now emits unknownsCount and unknownsPressure plus bucket/weight/stateCount/targets.
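
The arithmetic above can be sketched as below. Three things are assumptions rather than confirmed behavior: the pressure cap is taken to be 1.0, "reduced by" is read as a multiplicative reduction (a subtractive reading is equally plausible), and the confidence defaults are read as 0.05 and 0.99. The function names are hypothetical and do not mirror ReachabilityScoringService.

```python
def unknowns_pressure(unknowns, states, cap=1.0):
    """UnknownsPressure = unknowns / (states + unknowns), capped (cap assumed)."""
    total = states + unknowns
    if total == 0:
        return 0.0
    return min(unknowns / total, cap)


def penalized_score(base_score, unknowns, states, penalty_ceiling=0.35):
    """Reduce the fact score by UnknownsPenaltyCeiling x pressure.

    Assumption: the reduction is multiplicative, score * (1 - ceiling * pressure).
    """
    pressure = unknowns_pressure(unknowns, states)
    return base_score * (1.0 - penalty_ceiling * pressure)


def clamp_confidence(value, minimum=0.05, maximum=0.99):
    """Clamp a confidence value between the configured Min and Max."""
    return max(minimum, min(maximum, value))
```

With 2 unknowns and 6 states the pressure is 0.25, so under these assumptions a base score of 0.8 is scaled by 1 - 0.35 × 0.25 = 0.9125.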

Replay contract changes (v0.1 add-ons)

  • reachability.analysisId (string, optional) — ties to Signals union ingest.
  • Graph refs include namespace, callgraphId, analyzer, version, sha256, casUri.
  • Runtime trace refs include namespace, recordedAt, sha256, casUri.

Operator checklist

  • Use deterministic CAS paths; never embed absolute file paths.
  • When emitting runtime NDJSON, include loader_base and code_id when available for de-dup.
  • Ensure analysisId is propagated from Scanner/Zastava into Signals ingest to keep replay manifests linked.
  • Keep feeds frozen for reproducibility; avoid external downloads in union preparation.

Node Hash Joins and Runtime Evidence Linkage

Sprint: SPRINT_20260112_008_DOCS_path_witness_contracts (PW-DOC-002)

Overview

Node hashes provide a canonical way to join static reachability analysis with runtime observations. Each node in a callgraph can be identified by a stable hash computed from its PURL and symbol information, enabling:

  1. Static-to-runtime correlation: Match runtime stack traces to static callgraph nodes
  2. Cross-scan consistency: Compare reachability across different analysis runs
  3. Evidence linking: Associate attestations with specific code paths

Node Hash Recipe

A node hash is computed as:

nodeHash = SHA256(normalize(purl) + ":" + normalize(symbol))

Where:

  • normalize(purl) lowercases the PURL and sorts qualifiers alphabetically
  • normalize(symbol) removes whitespace and normalizes platform-specific decorations

Example:

{
  "purl": "pkg:npm/express@4.18.2",
  "symbol": "Router.handle",
  "nodeHash": "sha256:a1b2c3d4..."
}
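
The recipe above can be sketched as follows. This is an illustration, not the canonical implementation: it lowercases the whole PURL, sorts qualifiers as plain key=value strings, and only strips whitespace from the symbol. Normalizing platform-specific decorations is not specified here, so the sketch leaves it out. The "sha256:" prefix follows the example output.

```python
import hashlib


def normalize_purl(purl):
    """Lowercase the PURL and sort its qualifiers alphabetically."""
    purl = purl.strip().lower()
    base, _, subpath = purl.partition("#")
    base, _, query = base.partition("?")
    qualifiers = sorted(q for q in query.split("&") if q)
    if qualifiers:
        base += "?" + "&".join(qualifiers)
    return base + ("#" + subpath if subpath else "")


def normalize_symbol(symbol):
    """Remove all whitespace; decoration handling is omitted in this sketch."""
    return "".join(symbol.split())


def node_hash(purl, symbol):
    """nodeHash = SHA256(normalize(purl) + ":" + normalize(symbol))."""
    joined = normalize_purl(purl) + ":" + normalize_symbol(symbol)
    return "sha256:" + hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Because both inputs are normalized first, a PURL whose qualifiers arrive in a different order, or a symbol with stray spaces, produces the same node hash.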

Path Hash and Top-K Selection

A path hash identifies a specific call path from entrypoint to sink:

pathHash = SHA256(entryNodeHash + ":" + joinedIntermediateHashes + ":" + sinkNodeHash)

For long paths, only the top-K most significant nodes are included (default K=10):

  • Entry node (always included)
  • Sink node (always included)
  • Intermediate nodes ranked by call frequency or security relevance
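
A sketch of top-K selection and the path hash under stated assumptions: K counts the entry and sink nodes, intermediates are ranked by a caller-supplied score (standing in for call frequency or security relevance), the kept nodes stay in path order, and the intermediate hashes are joined with a comma. None of these details are fixed by the text above, and the function names are hypothetical.

```python
import hashlib


def select_top_k(path, rank, k=10):
    """Keep entry and sink plus the highest-ranked intermediates, in path order."""
    if len(path) < 2:
        raise ValueError("a path needs at least an entry and a sink node")
    if len(path) <= k or len(path) < 3:
        return list(path)
    budget = max(k - 2, 0)  # assumption: K includes the entry and sink nodes
    middle = range(1, len(path) - 1)
    # Highest rank first; ties fall back to path position for determinism.
    keep = sorted(middle, key=lambda i: (-rank.get(path[i], 0), i))[:budget]
    return [path[0]] + [path[i] for i in sorted(keep)] + [path[-1]]


def path_hash(path, rank=None, k=10):
    """pathHash = SHA256(entry + ":" + joined intermediates + ":" + sink)."""
    nodes = select_top_k(path, rank or {}, k)
    intermediates = ",".join(nodes[1:-1])  # assumption: comma as the joiner
    joined = nodes[0] + ":" + intermediates + ":" + nodes[-1]
    return "sha256:" + hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Here path is a list of node hashes from entrypoint to sink, and rank maps a node hash to its significance score.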

Runtime Evidence Linkage

Runtime observations from Zastava can be linked to static analysis using node hashes:

Field               Description
observedNodeHashes  Node hashes seen at runtime
observedPathHashes  Path hashes confirmed by runtime traces
runtimeEvidenceAt   Timestamp of runtime observation (RFC3339)
callstackHash       Hash of the observed call stack

Join Example

To correlate static reachability with runtime evidence:

-- Find statically-reachable vulnerabilities confirmed at runtime
SELECT 
  s.vulnerability_id,
  s.path_hash,
  r.observed_at
FROM static_reachability s
JOIN runtime_observations r 
  ON s.sink_node_hash = ANY(r.observed_node_hashes)
WHERE s.reachable = true
  AND r.observed_at > NOW() - INTERVAL '7 days';
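
For readers without the tables at hand, the same join can be sketched in memory. The row shapes below mirror the column names used in the query; the function name and the dict-based rows are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def confirmed_at_runtime(static_rows, runtime_rows, now=None, window_days=7):
    """Return (vulnerability_id, path_hash, observed_at) for confirmed sinks."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    matches = []
    for s in static_rows:
        if not s["reachable"]:
            continue
        for r in runtime_rows:
            # Same predicate as the SQL: recent observation and the sink's
            # node hash appears among the observed node hashes.
            recent = r["observed_at"] > cutoff
            if recent and s["sink_node_hash"] in r["observed_node_hashes"]:
                matches.append((s["vulnerability_id"], s["path_hash"], r["observed_at"]))
    return matches
```

The nested loop is quadratic and only meant to show the join predicate; a real workload would lean on the database or an index keyed by node hash.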

SARIF Integration

Node hashes are exposed in SARIF outputs via stellaops/* property keys:

{
  "results": [{
    "ruleId": "CVE-2024-1234",
    "properties": {
      "stellaops/nodeHash": "sha256:abc123...",
      "stellaops/pathHash": "sha256:def456...",
      "stellaops/topKNodeHashes": ["sha256:...", "sha256:..."],
      "stellaops/evidenceUri": "cas://evidence/...",
      "stellaops/observedAtRuntime": true
    }
  }]
}
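
A consumer could pull these properties out of a parsed SARIF fragment roughly as follows. The sketch accepts any object that carries a "results" array shaped like the example above; the function name is hypothetical, and stripping the stellaops/ prefix is a convenience chosen for this example.

```python
PREFIX = "stellaops/"


def reachability_properties(sarif_fragment):
    """Return (ruleId, properties) pairs with the stellaops/ prefix stripped."""
    extracted = []
    for result in sarif_fragment.get("results", []):
        props = result.get("properties", {})
        stripped = {
            key[len(PREFIX):]: value
            for key, value in props.items()
            if key.startswith(PREFIX)
        }
        extracted.append((result.get("ruleId"), stripped))
    return extracted
```

Properties under other prefixes are ignored, so the function is safe to run on SARIF produced by other tools.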

Policy Gate Usage

Policy rules can reference node and path hashes for fine-grained control:

rules:
  - name: block-confirmed-critical-path
    match:
      severity: CRITICAL
      reachability:
        pathHash:
          exists: true
        observedAtRuntime: true
    action: block

See policies/path-gates-advanced.yaml for comprehensive examples.
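
To make the matching semantics concrete, here is a hypothetical evaluator for a rule shaped like the one above, with the rule expressed as a Python dict. It is not the Policy engine's implementation: the finding field names, the first-match-wins order, and the "allow" default are all assumptions.

```python
RULE = {
    "name": "block-confirmed-critical-path",
    "match": {
        "severity": "CRITICAL",
        "reachability": {"pathHash": {"exists": True}, "observedAtRuntime": True},
    },
    "action": "block",
}


def rule_matches(rule, finding):
    """Check severity, pathHash existence, and runtime confirmation."""
    match = rule.get("match", {})
    if "severity" in match and finding.get("severity") != match["severity"]:
        return False
    reach = match.get("reachability", {})
    path_cond = reach.get("pathHash", {})
    if "exists" in path_cond:
        if bool(finding.get("pathHash")) != path_cond["exists"]:
            return False
    if "observedAtRuntime" in reach:
        if bool(finding.get("observedAtRuntime")) != reach["observedAtRuntime"]:
            return False
    return True


def decide(rules, finding, default="allow"):
    """Return the action of the first matching rule (assumed semantics)."""
    for rule in rules:
        if rule_matches(rule, finding):
            return rule["action"]
    return default
```

Under these assumptions, a CRITICAL finding with a path hash and runtime confirmation is blocked, while the same finding without runtime confirmation falls through to the default.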


References

  • Schema: docs/modules/reach-graph/schemas/runtime-static-union-schema.md
  • Delivery guide: docs/modules/reach-graph/guides/DELIVERY_GUIDE.md
  • Unknowns registry & scoring: Signals code (ReachabilityScoringService, UnknownsIngestionService) and the events doc at docs/modules/signals/guides/events-24-005.md.