# Runbook — Reachability Runtime Ingestion

> **Audience:** Signals Guild · Zastava Guild · Scanner Guild · Ops Guild
> **Prereqs:** `docs/reachability/DELIVERY_GUIDE.md`, `docs/reachability/function-level-evidence.md`, `docs/modules/platform/architecture-overview.md` §5

This runbook documents how to stage, ingest, and troubleshoot runtime evidence (`/signals/runtime-facts`) so function-level reachability data remains provable across online and air-gapped environments.

---

## 1 · Runtime capture pipeline

1. **Zastava Observer / runtime probes**
   - Emit NDJSON lines with `symbolId`, `codeId`, `loaderBase`, `hitCount`, `process{Id,Name}`, `socketAddress`, `containerId`, an optional `evidenceUri`, and a `metadata` map.
   - Compress large batches with gzip (`.ndjson.gz`), keep each chunk at or below 10 MiB, and keep timestamps monotonic.
   - Attach subject context via HTTP query (`scanId`, `imageDigest`, `component`, `version`) when using the streaming endpoint.
2. **CAS staging (optional but recommended)**
   - Upload raw batches to `cas://reachability/runtime/` before ingestion.
   - Store CAS URIs alongside probe metadata so Signals can echo them in `ReachabilityFactDocument.Metadata`.
3. **Signals ingestion**
   - POST `/signals/runtime-facts` (JSON) for one-off uploads, or stream NDJSON to `/signals/runtime-facts/ndjson` (set `Content-Encoding: gzip` when applicable).
   - Signals validates the schema, deduplicates events by `(symbolId, codeId, loaderBase)`, and updates `runtimeFacts` with the cumulative `hitCount`.
4. **Reachability scoring**
   - `ReachabilityScoringService` recomputes lattice states (`Unknown → Observed`), persists references to runtime CAS artifacts, and emits `signals.fact.updated` once `GAP-SIG-003` lands.

---

## 2 · Operator checklist

| Step | Action | Owner | Notes |
|------|--------|-------|-------|
| 1 | Verify probe health (`zastava observer status`) and confirm NDJSON batches include `symbolId` + `codeId`. | Runtime Guild | Reject batches missing `symbolId`; restart probe with debug logging. |
| 2 | Stage batches in CAS (`stella cas put reachability/runtime ...`) and record the returned URI. | Ops Guild | Required for replay-grade evidence. |
| 3 | Call `/signals/runtime-facts/ndjson` with `tenant` and `callgraphId` headers, streaming the gzip payload. | Signals Guild | Use service identity with `signals.runtime:write`. |
| 4 | Monitor ingestion metrics: `signals_runtime_events_total`, `signals_runtime_ingest_failures_total`. | Observability | Alert if failures exceed 1% over 5 min. |
| 5 | Trigger recompute (`POST /signals/reachability/recompute`) when new runtime batches arrive for an active scan. | Signals Guild | Provide `callgraphId` + subject tuple. |
| 6 | Validate Policy/UI surfaces by requesting `/policy/findings?includeReachability=true` and checking `reachability.evidence`. | Policy + UI Guilds | Ensure evidence references the CAS URIs from Step 2. |

---

## 3 · Air-gapped workflow

1. Export runtime NDJSON batches via Offline Kit: `offline/reachability/runtime//.ndjson.gz` + manifest.
2. On the secure network, load CAS entries locally (`stella cas load ...`) and invoke `stella signals runtime-facts ingest --from offline/...`.
3. Re-run `stella replay manifest.json --section reachability` to ensure manifests cite the imported runtime digests.
4. Sync ingestion receipts (`signals-runtime-ingest.log`) back to the online environment for audit.

---

## 4 · Troubleshooting

| Symptom | Cause | Resolution |
|---------|-------|------------|
| `422 Unprocessable Entity: missing symbolId` | Probe emitted incomplete JSON. | Restart probe with `--include-symbols`, confirm symbol server availability, regenerate batch. |
| `403 Forbidden: sealed-mode evidence invalid` | Signals sealed-mode verifier rejected payload (likely missing CAS proof). | Upload batch to CAS first, include `X-Reachability-Cas-Uri` header, or disable sealed-mode in non-prod. |
| Runtime facts missing from Policy/UI | Recompute not triggered or `callgraphId` mismatch. | List facts via `/signals/reachability/facts?subject=...`, confirm `callgraphId`, then POST recompute. |
| CAS hash mismatch during replay | Batch mutated post-ingestion. | Re-stage from original gzip, invalidate old CAS entry, rerun ingestion to regenerate manifest references. |

---

## 5 · Retention & observability

- Default retention: 30 days hot in Signals Mongo, 180 days in CAS (match replay policy). Configure via `signals.runtimeFacts.retentionDays`.
- Metrics to alert on:
  - `signals_runtime_ingest_latency_seconds` (P95 < 2 s).
  - `signals_runtime_cas_miss_total` (should be 0 once CAS is mandatory).
- Logs/traces:
  - Category `Reachability.Runtime` records ingestion batches and CAS URIs.
  - Trace attributes: `callgraphId`, `subjectKey`, `casUri`, `eventCount`.

---

## 6 · References

- `docs/reachability/DELIVERY_GUIDE.md`
- `docs/reachability/function-level-evidence.md`
- `docs/replay/DETERMINISTIC_REPLAY.md`
- `docs/modules/platform/architecture-overview.md` §5 (Replay CAS)
- `docs/runbooks/replay_ops.md`

Update this runbook whenever endpoints, retention knobs, or CAS layouts change.
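
For operators who want to sanity-check a batch before upload, the deduplication rule from §1 step 3 can be approximated locally. The Python sketch below is illustrative only and is not the Signals implementation: the function names `parse_ndjson` and `merge_runtime_facts` are hypothetical, only `symbolId` is enforced (mirroring the `missing symbolId` rejection in §4), and treating an absent `hitCount` as 0 is an assumption.

```python
import gzip
import json


def parse_ndjson(payload: bytes, gzipped: bool = False) -> list[dict]:
    """Decode an NDJSON batch (optionally gzip-compressed) into event dicts."""
    if gzipped:
        payload = gzip.decompress(payload)
    events = []
    for line_no, line in enumerate(payload.decode("utf-8").splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        event = json.loads(line)
        if not event.get("symbolId"):
            # Assumption: reject locally what the endpoint would reject with 422.
            raise ValueError(f"line {line_no}: missing symbolId")
        events.append(event)
    return events


def merge_runtime_facts(events: list[dict]) -> list[dict]:
    """Deduplicate by (symbolId, codeId, loaderBase) and sum hitCount per key."""
    merged: dict[tuple, dict] = {}
    for event in events:
        key = (event["symbolId"], event.get("codeId"), event.get("loaderBase"))
        hits = event.get("hitCount", 0)  # assumption: missing hitCount counts as 0
        if key in merged:
            merged[key]["hitCount"] += hits
        else:
            fact = dict(event)  # copy so the caller's event is not mutated
            fact["hitCount"] = hits
            merged[key] = fact
    # dicts preserve insertion order, so facts come back in first-seen order
    return list(merged.values())
```

Calling `merge_runtime_facts(parse_ndjson(data, gzipped=True))` on the bytes of a `.ndjson.gz` chunk yields one record per `(symbolId, codeId, loaderBase)` key, which can be compared against the facts returned by `/signals/reachability/facts?subject=...` after ingestion.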