Runbook — Reachability Runtime Ingestion
Audience: Signals Guild · Zastava Guild · Scanner Guild · Ops Guild
Prereqs: docs/reachability/DELIVERY_GUIDE.md, docs/reachability/function-level-evidence.md, docs/modules/platform/architecture-overview.md §5
This runbook documents how to stage, ingest, and troubleshoot runtime evidence (/signals/runtime-facts) so function-level reachability data remains provable across online and air-gapped environments.
1 · Runtime capture pipeline
- Zastava Observer / runtime probes
  - Emit NDJSON lines with symbolId, codeId, loaderBase, hitCount, process{Id,Name}, socketAddress, containerId, optional evidenceUri, and metadata map.
  - Compress large batches with gzip (.ndjson.gz), max 10 MiB per chunk, monotonic timestamps.
  - Attach subject context via HTTP query (scanId, imageDigest, component, version) when using the streaming endpoint.
- CAS staging (optional but recommended)
  - Upload raw batches to cas://reachability/runtime/<sha256> before ingestion.
  - Store CAS URIs alongside probe metadata so Signals can echo them in ReachabilityFactDocument.Metadata.
- Signals ingestion
  - POST /signals/runtime-facts (JSON) for one-off uploads or stream NDJSON to /signals/runtime-facts/ndjson (set Content-Encoding: gzip when applicable).
  - Signals validates schema, dedupes events by (symbolId, codeId, loaderBase), and updates runtimeFacts with cumulative hitCount.
- Reachability scoring
  - ReachabilityScoringService recomputes lattice states (Unknown → Observed), persists references to runtime CAS artifacts, and emits signals.fact.updated once GAP-SIG-003 lands.
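The dedupe step under Signals ingestion can be pictured with a minimal Python sketch. It assumes each NDJSON line is a JSON object carrying the fields listed above, and that a line without hitCount counts as a single hit. The function name merge_runtime_events is hypothetical and is not part of Signals.

```python
import json
from collections import defaultdict


def merge_runtime_events(ndjson_lines):
    """Aggregate events by (symbolId, codeId, loaderBase), summing hitCount.

    Illustrative only: the real Signals service also validates the schema
    and persists the result, which this sketch does not attempt.
    """
    totals = defaultdict(int)
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        event = json.loads(line)
        key = (event["symbolId"], event["codeId"], event.get("loaderBase"))
        # Assumption: a missing hitCount means the event was seen once.
        totals[key] += int(event.get("hitCount", 1))
    return dict(totals)
```

Two events that share the same key collapse into one fact whose hitCount is the sum of both, which is what "cumulative hitCount" describes.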
2 · Operator checklist
| Step | Action | Owner | Notes |
|---|---|---|---|
| 1 | Verify probe health (zastava observer status) and confirm NDJSON batches include symbolId + codeId. | Runtime Guild | Reject batches missing symbolId; restart probe with debug logging. |
| 2 | Stage batches in CAS (stella cas put reachability/runtime ...) and record the returned URI. | Ops Guild | Required for replay-grade evidence. |
| 3 | Call /signals/runtime-facts/ndjson with tenant and callgraphId headers, streaming the gzip payload. | Signals Guild | Use service identity with signals.runtime:write. |
| 4 | Monitor ingestion metrics: signals_runtime_events_total, signals_runtime_ingest_failures_total. | Observability | Alert if failures exceed 1% over 5 min. |
| 5 | Trigger recompute (POST /signals/reachability/recompute) when new runtime batches arrive for an active scan. | Signals Guild | Provide callgraphId + subject tuple. |
| 6 | Validate Policy/UI surfaces by requesting /policy/findings?includeReachability=true and checking reachability.evidence. | Policy + UI Guilds | Ensure evidence references the CAS URIs from Step 2. |
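Before Steps 2 and 3, a batch can be checked locally. The sketch below is one possible pre-flight, not a Stella tool: it rejects events missing symbolId or codeId, gzips the NDJSON, enforces the 10 MiB limit, and previews a CAS URI. It assumes that the limit applies to the compressed chunk and that the `<sha256>` in the CAS path is the digest of the gzip bytes. The source does not confirm either point, and the authoritative URI is the one returned by stella cas put.

```python
import gzip
import hashlib
import json

MAX_CHUNK_BYTES = 10 * 1024 * 1024  # 10 MiB per chunk (assumed: compressed size)
REQUIRED_FIELDS = ("symbolId", "codeId")


def preflight_batch(events):
    """Validate events, gzip them as NDJSON, and derive a preview CAS URI."""
    for index, event in enumerate(events):
        missing = [name for name in REQUIRED_FIELDS if not event.get(name)]
        if missing:
            raise ValueError(f"event {index} is missing {', '.join(missing)}")

    ndjson = "".join(json.dumps(event, sort_keys=True) + "\n" for event in events)
    # mtime=0 keeps the gzip header stable, so the same input gives the same digest.
    payload = gzip.compress(ndjson.encode("utf-8"), mtime=0)
    if len(payload) > MAX_CHUNK_BYTES:
        raise ValueError("compressed chunk exceeds 10 MiB; split the batch")

    # Assumption: the CAS key is the SHA-256 of the compressed bytes.
    digest = hashlib.sha256(payload).hexdigest()
    return payload, f"cas://reachability/runtime/{digest}"
```

The returned payload is what Step 3 would stream with Content-Encoding: gzip. If the preview URI differs from the one returned in Step 2, the digest assumption above does not hold.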
3 · Air-gapped workflow
- Export runtime NDJSON batches via Offline Kit: offline/reachability/runtime/<scan-id>/<timestamp>.ndjson.gz + manifest.
- On the secure network, load CAS entries locally (stella cas load ...) and invoke stella signals runtime-facts ingest --from offline/....
- Re-run stella replay manifest.json --section reachability to ensure manifests cite the imported runtime digests.
- Sync ingestion receipts (signals-runtime-ingest.log) back to the air-gapped environment for audit.
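Digest drift can also be spotted by hand before running the replay check above. The following sketch hashes every .ndjson.gz file under an Offline Kit directory and reports the ones whose SHA-256 does not appear in a set of cited digests. The manifest layout is not described in this runbook, so the caller supplies the cited digests directly. Both function names are hypothetical, and hashing the gzip bytes is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_uncited_batches(kit_root, cited_digests):
    """List batch files under kit_root whose digest is not in cited_digests."""
    cited = {value.lower() for value in cited_digests}
    uncited = []
    for batch in sorted(Path(kit_root).rglob("*.ndjson.gz")):
        if sha256_file(batch) not in cited:
            uncited.append(str(batch))
    return uncited
```

An empty result means every exported batch is accounted for. Any path returned is a batch to re-stage or re-export before replay.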
4 · Troubleshooting
| Symptom | Cause | Resolution |
|---|---|---|
| 422 Unprocessable Entity: missing symbolId | Probe emitted incomplete JSON. | Restart probe with --include-symbols, confirm symbol server availability, regenerate batch. |
| 403 Forbidden: sealed-mode evidence invalid | Signals sealed-mode verifier rejected payload (likely missing CAS proof). | Upload batch to CAS first, include X-Reachability-Cas-Uri header, or disable sealed-mode in non-prod. |
| Runtime facts missing from Policy/UI | Recompute not triggered or callgraphId mismatch. | List facts via /signals/reachability/facts?subject=..., confirm callgraphId, then POST recompute. |
| CAS hash mismatch during replay | Batch mutated post-ingestion. | Re-stage from original gzip, invalidate old CAS entry, rerun ingestion to regenerate manifest references. |
5 · Retention & observability
- Default retention: 30 days hot in Signals Mongo, 180 days in CAS (match replay policy). Configure via signals.runtimeFacts.retentionDays.
- Metrics to alert on:
  - signals_runtime_ingest_latency_seconds (P95 < 2 s).
  - signals_runtime_cas_miss_total (should be 0 once CAS is mandatory).
- Logs/traces:
  - Category Reachability.Runtime records ingestion batches and CAS URIs.
  - Trace attributes: callgraphId, subjectKey, casUri, eventCount.
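The two alert thresholds in this runbook (failures above 1% over 5 min, P95 latency below 2 s) can be expressed as a small check over one window of samples. This is a sketch for reasoning about the thresholds, not the alerting rule shipped with the platform. It assumes the failure ratio is the increase in signals_runtime_ingest_failures_total divided by the increase in signals_runtime_events_total, and it uses a nearest-rank percentile over raw latency samples. A metrics backend would normally derive the quantile from histogram buckets instead.

```python
FAILURE_RATIO_LIMIT = 0.01  # failures exceed 1% over the window
P95_LATENCY_LIMIT_S = 2.0   # P95 must stay below 2 s


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, (95 * len(ordered) + 99) // 100)  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def evaluate_window(events_delta, failures_delta, latencies_s):
    """Return the names of the thresholds breached in one 5-minute window."""
    alerts = []
    # Assumption: the ratio is failures / events over the same window.
    if events_delta > 0 and failures_delta / events_delta > FAILURE_RATIO_LIMIT:
        alerts.append("ingest_failure_ratio")
    if latencies_s and p95(latencies_s) >= P95_LATENCY_LIMIT_S:
        alerts.append("ingest_latency_p95")
    return alerts
```

The alert names returned here are placeholders. Only the metric names and thresholds come from this runbook.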
6 · References
- docs/reachability/DELIVERY_GUIDE.md
- docs/reachability/function-level-evidence.md
- docs/replay/DETERMINISTIC_REPLAY.md
- docs/modules/platform/architecture-overview.md §5 (Replay CAS)
- docs/runbooks/replay_ops.md
Update this runbook whenever endpoints, retention knobs, or CAS layouts change.