up
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Mirror Thin Bundle Sign & Verify / mirror-sign (push) Has been cancelled
Signals CI & Image / signals-ci (push) Has been cancelled

This commit is contained in:
StellaOps Bot
2025-11-26 07:47:08 +02:00
parent 56e2f64d07
commit 1c782897f7
184 changed files with 8991 additions and 649 deletions

View File

@@ -0,0 +1,34 @@
# Reachability Callgraph Formats (richgraph-v1)
## Purpose
Normalize static callgraphs across languages so Signals can merge them with runtime traces and replay bundles deterministically.
## Core fields (per node/edge)
- `nodes[].id` — canonical SymbolID (language-specific, stable, lowercase where applicable).
- `nodes[].kind` — e.g., method/function/class/file.
- `edges[].sourceId` / `edges[].targetId` — SymbolIDs; edge types include `call`, `import`, `inherit`, `reference`.
- `artifact` — CAS paths for source graph files; include `sha256`, `uri`, optional `generator` (analyzer name/version).
## Language-specific notes
- **JVM**: use JVM internal names; include signature for overloads.
- **.NET/Roslyn**: fully-qualified method token; include assembly and module for cross-assembly edges.
- **Go SSA**: package path + function; include receiver for methods.
- **Node/Deno TS**: module path + exported symbol; ES module graph only.
- **Rust MIR**: crate::module::symbol; monomorphized forms allowed if stable.
- **Swift SIL**: mangled name; demangled kept in metadata only.
- **Shell/binaries**: when present, use ELF/PE symbol+offset; mark `kind=binary`.
## CAS layout
- Store graph bundles under `reachability_graphs/<hh>/<sha>.tar.zst`.
- Bundle SHOULD contain `meta.json` with analyzer, version, language, component, and entry points (array).
- File order inside tar must be lexicographic to keep hashes stable.
## Validation rules
- No duplicate node IDs; edges must reference existing nodes.
- Entry points list must be present (even if empty) for Signals recompute.
- Graph SHA256 must match tar content; Signals rejects mismatched SHA.
- Only ASCII; UTF-8 paths are allowed but must be normalized (NFC).
## References
- Union schema: `docs/reachability/runtime-static-union-schema.md`
- Delivery guide: `docs/reachability/DELIVERY_GUIDE.md`

View File

@@ -0,0 +1,48 @@
# Reachability · Runtime + Static Union (v0.1)
## What this covers
- End-to-end flow for combining static callgraphs (Scanner) and runtime traces (Zastava) into replayable reachability bundles.
- Storage layout (CAS namespaces), manifest fields, and Signals APIs that consume/emit reachability facts.
- How unknowns/pressure and scoring are derived so Policy/UI can explain outcomes.
## Pipeline (at a glance)
1. **Scanner** emits language-specific callgraphs as `richgraph-v1` and packs them into CAS under `reachability_graphs/<digest>.tar.zst` with manifest `meta.json`.
2. **Zastava Observer** streams NDJSON runtime facts (`symbol_id`, `code_id`, `hit_count`, `loader_base`, `cas_uri`) to Signals `POST /signals/runtime-facts` or `/runtime-facts/ndjson`.
3. **Union bundles** (runtime + static) are uploaded as ZIP to `POST /signals/reachability/union` with optional `X-Analysis-Id`; Signals stores under `reachability_graphs/{analysisId}/`.
4. **Signals scoring** consumes union data + runtime facts, computes per-target states (bucket, weight, confidence, score), fact-level score, unknowns pressure, and publishes `signals.fact.updated@v1` events.
5. **Replay** records provenance: reachability section in replay manifest lists CAS URIs (graphs + runtime traces), namespaces, analyzer/version, callgraphIds, and the shared `analysisId`.
## Storage & CAS namespaces
- Static graphs: `cas://reachability_graphs/<hh>/<sha>.tar.zst` (meta.json + graph files).
- Runtime traces: `cas://runtime_traces/<hh>/<sha>.tar.zst` (NDJSON or zipped stream).
- Replay manifest now includes `analysisId` to correlate graphs/traces; each reference also carries `namespace` and `callgraphId` (static) for unambiguous replay.
## Signals API quick reference
- `POST /signals/runtime-facts` — structured request body; recomputes reachability.
- `POST /signals/runtime-facts/ndjson` — streaming NDJSON/gzip; requires `callgraphId` header params.
- `POST /signals/reachability/union` — upload ZIP bundle; optional `X-Analysis-Id`.
- `GET /signals/reachability/union/{analysisId}/meta` — returns meta.json.
- `GET /signals/reachability/union/{analysisId}/files/{fileName}` — download bundled graph/trace files.
- `GET /signals/facts/{subjectKey}` — fetch latest reachability fact (includes unknowns counters and targets).
## Scoring and unknowns
- Buckets (default weights): entrypoint 1.0, direct 0.85, runtime 0.45, unknown 0.5, unreachable 0.0.
- Confidence: reachable vs unreachable base, runtime bonus, clamped between Min/Max (defaults 0.050.99).
- Unknowns: Signals counts unresolved symbols/edges per subject; `UnknownsPressure = unknowns / (states + unknowns)` (capped). Fact score is reduced by `UnknownsPenaltyCeiling` (default 0.35) × pressure.
- Events: `signals.fact.updated@v1` now emits `unknownsCount` and `unknownsPressure` plus bucket/weight/stateCount/targets.
## Replay contract changes (v0.1 add-ons)
- `reachability.analysisId` (string, optional) — ties to Signals union ingest.
- Graph refs include `namespace`, `callgraphId`, analyzer, version, sha256, casUri.
- Runtime trace refs include `namespace`, recordedAt, sha256, casUri.
## Operator checklist
- Use deterministic CAS paths; never embed absolute file paths.
- When emitting runtime NDJSON, include `loader_base` and `code_id` when available for de-dup.
- Ensure `analysisId` is propagated from Scanner/Zastava into Signals ingest to keep replay manifests linked.
- Keep feeds frozen for reproducibility; avoid external downloads in union preparation.
## References
- Schema: `docs/reachability/runtime-static-union-schema.md`
- Delivery guide: `docs/reachability/DELIVERY_GUIDE.md`
- Unknowns registry & scoring: Signals code (`ReachabilityScoringService`, `UnknownsIngestionService`) and events doc `docs/signals/events-24-005.md`.

View File

@@ -0,0 +1,38 @@
# Runtime Facts (Signals/Zastava) v0.1
## Payload shapes
- **Structured** (`POST /signals/runtime-facts`):
- `subject` (imageDigest | scanId | component+version)
- `callgraphId` (required)
- `events[]`: `{ symbolId, codeId?, purl?, buildId?, loaderBase?, processId?, processName?, socketAddress?, containerId?, evidenceUri?, hitCount, observedAt?, metadata{} }`
- **Streaming NDJSON** (`POST /signals/runtime-facts/ndjson`): one JSON object per line with the same fields; supports `Content-Encoding: gzip`; callgraphId provided via query/header metadata.
## Provenance/metadata
- Signals stamps:
- `provenance.source` (defaults to `runtime` unless provided in metadata)
- `provenance.ingestedAt` (ISO-8601 UTC)
- `provenance.callgraphId`
- Runtime hits are aggregated per `symbolId` (summing hitCount) before persisting and feeding scoring.
## Validation
- `symbolId` required; events list must not be empty.
- `callgraphId` required and must resolve to a stored callgraph/union bundle.
- Subject must yield a non-empty `subjectKey`.
- Empty runtime stream is rejected.
## Storage and cache
- Stored alongside reachability facts in Mongo collection `reachability_facts`.
- Runtime hits cached in Redis via `reachability_cache:*` entries; invalidated on ingest.
## Interaction with scoring
- Ingest triggers recompute: runtime hits added to prior facts hits, targets set to symbols observed, entryPoints taken from callgraph.
- Reachability states include runtime evidence on the path; bucket/weight may be `runtime` when hits are present.
- Unknowns registry stays separate; unknowns count still factors into fact score via pressure penalty.
## Replay alignment
- Runtime traces packaged under CAS namespace `runtime_traces`; referenced in replay manifest with `namespace` and `analysisId` to link to static graphs.
## Determinism rules
- Keep NDJSON ordering stable when generating bundles.
- Use UTC timestamps; avoid environment-dependent metadata values.
- No external network lookups during ingest.