up
Some checks failed
api-governance / spectral-lint (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
oas-ci / oas-validate (push) Has been cancelled
SDK Publish & Sign / sdk-publish (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Policy Simulation / policy-simulate (push) Has been cancelled
devportal-offline / build-offline (push) Has been cancelled
This commit is contained in:
@@ -1,95 +1,63 @@
# Runbook: Runtime Reachability Facts (Zastava → Signals)
# Reachability Runtime Ingestion Runbook

## Goal
Stream runtime symbol evidence from Zastava Observer to Signals in NDJSON batches that align with the runtime/static union schema, stay deterministic, and are replayable.
> **Imposed rule:** Runtime traces must never bypass CAS/DSSE verification; ingest only CAS-addressed NDJSON with hashes logged to Timeline and Evidence Locker.

## Endpoints
- Signals structured ingest: `POST /signals/runtime-facts`
- Signals NDJSON ingest: `POST /signals/runtime-facts/ndjson`
- Headers: `Content-Encoding: gzip` (optional), `Content-Type: application/x-ndjson`
- Query/header metadata: `callgraphId` (required), `scanId|imageDigest|component+version`, optional `source`
This runbook guides operators through ingesting runtime reachability evidence (EntryTrace, probes, Signals ingestion) and wiring it into the reachability evidence chain.

## NDJSON event shape (one per line)
```json
{
  "symbolId": "pkg:python/django.views:View.as_view",
  "codeId": "buildid-abc123",
  "purl": "pkg:pypi/django@4.2.7",
  "loaderBase": "0x7f23c01000",
  "processId": 214,
  "processName": "uwsgi",
  "containerId": "c123",
  "socketAddress": "10.0.0.5:8443",
  "hitCount": 3,
  "observedAt": "2025-11-26T12:00:00Z",
  "metadata": { "pid": "214" }
}
```
## 1. Prerequisites
- Services: `Signals` API, `Zastava Observer` (or other probes), `Evidence Locker`, optional `Attestor` for DSSE.
- Reachability schema: `docs/reachability/function-level-evidence.md`, `docs/reachability/evidence-schema.md`.
- CAS: configured bucket/path for `cas://reachability/runtime/*` and `.../graphs/*`.
- Time sync: AirGap Time anchor if sealed; otherwise NTP with drift <200ms.

Required: `symbolId`, `hitCount`; `callgraphId` is provided via query/header metadata. Optional fields shown for correlation.
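
The required-field rule can be checked locally before a batch is sent. The following is a minimal sketch, assuming only what this runbook states (`symbolId` and `hitCount` are required). The function name `validate_event_line` and the rule that `hitCount` must be a non-negative integer are illustrative assumptions, not the Signals implementation.

```python
import json


def validate_event_line(line: str) -> list[str]:
    """Return a list of problems for one NDJSON line (empty list means OK)."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(event, dict):
        return ["event must be a JSON object"]

    problems = []
    symbol_id = event.get("symbolId")
    if not isinstance(symbol_id, str) or not symbol_id:
        problems.append("symbolId is required")

    hit_count = event.get("hitCount")
    # bool is a subclass of int in Python, so exclude it explicitly.
    # ASSUMPTION: hitCount is a non-negative integer.
    if isinstance(hit_count, bool) or not isinstance(hit_count, int) or hit_count < 0:
        problems.append("hitCount is required and must be a non-negative integer")
    return problems
```

Running this over every line of a batch and refusing to send when any list is non-empty keeps malformed events out of the ingest path.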
## 2. Ingestion workflow (online)
1) **Capture traces** from Observer/probes → NDJSON (`runtime-trace.ndjson.gz`) with `symbol_id`, `purl`, `timestamp`, `pid`, `container`, `count`.
2) **Stage to CAS**: upload file, record `sha256`, store at `cas://reachability/runtime/<sha256>`.
3) **Optionally sign**: wrap CAS digest in DSSE (`stella attest runtime --bundle runtime.dsse.json`).
4) **Ingest** via Signals API:
   ```sh
   # The payload is gzip-compressed, so declare the encoding explicitly.
   curl -H "X-Stella-Tenant: acme" \
     -H "Content-Type: application/x-ndjson" \
     -H "Content-Encoding: gzip" \
     --data-binary @runtime-trace.ndjson.gz \
     "https://signals.example/api/v1/runtime-facts?graph_hash=<graph>"
   ```
   Headers returned: `Content-SHA256`, `X-Graph-Hash`, `X-Ingest-Id`.
5) **Emit timeline**: ensure Timeline event `reach.runtime.ingested` with CAS digest and ingest id.
6) **Verify**: run `stella graph verify --runtime runtime-trace.ndjson.gz --graph <graph_hash>` to confirm edges mapped.
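
Step 2 of the workflow above (record `sha256`, derive the CAS location) can be sketched as follows. The helper names `sha256_file` and `runtime_cas_uri` are hypothetical, and the sketch assumes the `<sha256>` path segment is the lowercase hex digest of the compressed file. The upload itself is left to whatever CAS client the deployment uses.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large traces are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def runtime_cas_uri(sha256_hex: str) -> str:
    """Build the CAS location named in this runbook for a runtime trace."""
    # ASSUMPTION: the <sha256> segment is the lowercase hex digest.
    return f"cas://reachability/runtime/{sha256_hex.lower()}"
```

For example, `runtime_cas_uri(sha256_file(Path("runtime-trace.ndjson.gz")))` yields the URI to record alongside the ingest id.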

## Batch rules
- NDJSON MUST NOT be empty; empty streams are rejected.
- Compress with gzip when large; maintain stable line ordering.
- Use UTC timestamps (ISO-8601 `observedAt`).
- Avoid PII; redact process/user info before send.
## 3. Ingestion workflow (air-gap)
1) Receive runtime bundle containing `runtime-trace.ndjson.gz`, `manifest.json` (hashes), optional DSSE.
2) Validate hashes against manifest; if present, verify DSSE bundle.
3) Import into CAS path `cas://reachability/runtime/<sha256>` using offline loader.
4) Run Signals offline ingest tool:
   ```sh
   signals-offline ingest-runtime \
     --tenant acme \
     --graph-hash <graph_hash> \
     --runtime runtime-trace.ndjson.gz \
     --manifest manifest.json
   ```
5) Export ingest receipt and add to Evidence Locker; update Timeline when reconnected.
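
The hash validation in step 2 of the air-gap workflow could look like the sketch below. The runbook does not specify the layout of `manifest.json`, so this assumes a hypothetical shape of `{"files": {"<name>": "<sha256 hex>"}}`. Adjust the lookup to the real manifest schema before relying on it. DSSE verification is out of scope here.

```python
import hashlib
import json
from pathlib import Path


def verify_against_manifest(bundle_dir: Path, manifest_name: str = "manifest.json") -> list[str]:
    """Return names of files that are missing or whose digest differs from the manifest."""
    manifest = json.loads((bundle_dir / manifest_name).read_text(encoding="utf-8"))
    mismatched = []
    # ASSUMPTION: manifest layout is {"files": {"<name>": "<sha256 hex>"}}.
    for name, expected in sorted(manifest["files"].items()):
        target = bundle_dir / name
        if not target.is_file():
            mismatched.append(name)
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatched.append(name)
    return mismatched
```

An empty result means every listed file matched. Any other result should stop the import before the bundle reaches CAS.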

## CAS alignment
- When runtime trace bundles are produced, store under `cas://runtime_traces/<hh>/<sha>.tar.zst` and include `meta.json` with analysisId.
- Pass the same `analysisId` in `X-Analysis-Id` (if present) when uploading union bundles so replay manifests can link graphs+traces.
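
As an illustration of the trace bundle path, the sketch below derives the URI from a digest. It rests on two assumptions that the runbook does not confirm: that `<sha>` is a SHA-256 hex digest, and that `<hh>` is a fan-out prefix made of its first two hex characters. The function name `runtime_trace_cas_uri` is hypothetical.

```python
import re

_SHA256_HEX = re.compile(r"^[0-9a-f]{64}$")


def runtime_trace_cas_uri(sha256_hex: str) -> str:
    """Build cas://runtime_traces/<hh>/<sha>.tar.zst for a trace bundle digest."""
    digest = sha256_hex.lower()
    if not _SHA256_HEX.match(digest):
        raise ValueError("expected a 64-character hex SHA-256 digest")
    # ASSUMPTION: <hh> is the first two hex characters of the digest.
    return f"cas://runtime_traces/{digest[:2]}/{digest}.tar.zst"
```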
## 4. Checks & alerts
- **Drift**: block ingest if time anchor age > configured budget; surface `staleness_seconds`.
- **Hash mismatch**: fail ingest; write `runtime.ingest.failed` event with reason.
- **Orphan traces**: if no matching `graph_hash`, queue for retry and alert `reachability.orphan_traces` counter.
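
The drift rule is a simple comparison, sketched below under the assumption that the anchor time and the current time are both timezone-aware UTC values. The function names and the `budget_seconds` parameter are illustrative; only the `staleness_seconds` metric name comes from this runbook.

```python
from datetime import datetime, timedelta, timezone


def staleness_seconds(anchor_time: datetime, now: datetime) -> float:
    """Age of the time anchor in seconds, clamped at zero for clock skew."""
    return max(0.0, (now - anchor_time).total_seconds())


def ingest_allowed(anchor_time: datetime, now: datetime, budget_seconds: float) -> bool:
    """Block ingest when the anchor age exceeds the configured budget."""
    return staleness_seconds(anchor_time, now) <= budget_seconds
```

With a 120-second budget, an anchor that is 90 seconds old passes and one that is 121 seconds old is blocked.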

## Errors & remediation
- `400 callgraphId is required` → set `callgraphId` header/query.
- `400 runtime fact stream was empty` → ensure NDJSON has events.
- `400 Subject must include scanId/imageDigest/component+version` → populate subject metadata.
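
These three rejections can be anticipated client-side. The sketch below mirrors the error messages listed above. It assumes the subject rule means "at least one of `scanId`, `imageDigest`, or `component` together with `version`", which is an interpretation of the message rather than confirmed server behaviour. The function name `preflight_errors` is hypothetical.

```python
def preflight_errors(metadata: dict, ndjson_body: str) -> list[str]:
    """Return the 400-style messages a request would likely trigger (empty list means OK)."""
    errors = []
    if not metadata.get("callgraphId"):
        errors.append("callgraphId is required")
    # A stream with only blank lines is treated as empty.
    if not any(line.strip() for line in ndjson_body.splitlines()):
        errors.append("runtime fact stream was empty")
    # ASSUMPTION: any one of scanId, imageDigest, or component+version identifies the subject.
    has_subject = bool(
        metadata.get("scanId")
        or metadata.get("imageDigest")
        or (metadata.get("component") and metadata.get("version"))
    )
    if not has_subject:
        errors.append("Subject must include scanId/imageDigest/component+version")
    return errors
```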
## 5. Troubleshooting
- **400 Bad Request**: validate NDJSON schema; run `scripts/reachability/validate_runtime_trace.py`.
- **Hash mismatch**: recompute `sha256sum runtime-trace.ndjson.gz`; compare to manifest.
- **Missing symbols**: ensure symbol manifest ingested (see `docs/specs/symbols/SYMBOL_MANIFEST_v1.md`); rerun `stella graph verify`.
- **High drift**: refresh time anchor (AirGap Time service) or resync NTP; retry ingest.

## Determinism checklist
- Stable ordering of NDJSON lines.
- No host-dependent paths; only IDs/digests.
- Fixed gzip level if used (suggest 6) to aid reproducibility.
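
One way to apply this checklist when compressing a batch is sketched below. It fixes the gzip level at 6 and additionally zeroes the gzip header timestamp and omits the file name, since both would otherwise vary between runs. Those two header settings are an addition of this sketch, not something the runbook prescribes, and `gzip_deterministic` is a hypothetical name. Byte-identical output also depends on using the same zlib build, so treat it as an aid to reproducibility rather than a guarantee.

```python
import gzip
import io


def gzip_deterministic(lines: list[str], level: int = 6) -> bytes:
    """Compress already-ordered NDJSON lines with a fixed level and a stable header."""
    if not lines:
        raise ValueError("NDJSON batch must not be empty")
    # The caller supplies lines in their final, stable order; one event per line.
    payload = "".join(line.rstrip("\n") + "\n" for line in lines).encode("utf-8")
    buffer = io.BytesIO()
    # mtime=0 and an empty filename keep host- and time-dependent data out of the header.
    with gzip.GzipFile(filename="", mode="wb", compresslevel=level, fileobj=buffer, mtime=0) as gz:
        gz.write(payload)
    return buffer.getvalue()
```

Compressing the same ordered lines twice in the same environment then yields the same bytes, so the recorded `sha256` stays stable across replays.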
## 6. Artefact checklist
- `runtime-trace.ndjson.gz` (or `.json`), `sha256` recorded.
- Optional `runtime.dsse.json` DSSE bundle.
- Ingest receipt (ingest id, graph hash, CAS digest, tenant).
- Timeline event `reach.runtime.ingested` and Evidence Locker record (bundle + receipt).

## Zastava Observer setup (runtime sampler)
- **Sampling mode:** deterministic EntryTrace sampler; default 1:1 (no drop) for pilot. Enable rate/CPU guard: `Sampler:MaxEventsPerSecond` (default 500), `Sampler:MaxCpuPercent` (default 35). When rates are exceeded, emit `sampler.dropped` counters with drop reason `rate_limit`/`cpu_guard`.
- **Symbol capture:** enable build-id collection (`SymbolCapture:CollectBuildIds=true`) and loader base addresses (`SymbolCapture:EmitLoaderBase=true`) to match static graphs.
- **Batching:** buffer up to 1,000 events or 2s, whichever comes first (`Ingest:BatchSize`, `Ingest:FlushIntervalMs`). Batches are sorted by `observedAt` before send to keep deterministic order.
- **Transport:** NDJSON POST to Signals `/signals/runtime-facts/ndjson` with headers `X-Callgraph-Id`, optional `X-Analysis-Id`. Set `Content-Encoding: gzip` when batches exceed 64 KiB.
- **CAS traces (optional):** if EntryTrace raw traces are persisted, package as `cas://runtime_traces/<hh>/<sha>.tar.zst` with `meta.json` containing `analysisId`, `nodeCount`, `edgeCount`, `traceVersion`. Include the CAS URI in `metadata.casUri` on each NDJSON event.
- **Security/offline:** disable egress by default; allowlist only the Signals host. TLS must be enabled; supply client certs per platform runbook if required. No PID/user names are emitted; only digests/IDs.
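
The batching rule (flush at 1,000 events or 2 s, sorted by `observedAt`) can be illustrated with the sketch below. It is not the Observer's implementation. The class name `EventBatcher`, the `symbolId` tie-breaker, and the choice to check the interval only when an event arrives (rather than on a timer) are simplifying assumptions. Sorting the `observedAt` strings directly is only chronological when all timestamps share the same UTC ISO-8601 format.

```python
import json
from typing import Callable, Optional


class EventBatcher:
    """Buffer events and flush on size or age, whichever comes first."""

    def __init__(self, send: Callable[[list[str]], None],
                 batch_size: int = 1000, flush_interval_ms: int = 2000) -> None:
        self._send = send
        self._batch_size = batch_size
        self._flush_interval_ms = flush_interval_ms
        self._events: list[dict] = []
        self._first_at_ms: Optional[int] = None

    def add(self, event: dict, now_ms: int) -> None:
        if not self._events:
            self._first_at_ms = now_ms
        self._events.append(event)
        too_many = len(self._events) >= self._batch_size
        too_old = now_ms - self._first_at_ms >= self._flush_interval_ms
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if not self._events:
            return  # never send an empty NDJSON stream
        # ASSUMPTION: symbolId breaks ties so equal timestamps still order stably.
        ordered = sorted(self._events, key=lambda e: (e["observedAt"], e.get("symbolId", "")))
        self._events = []
        self._first_at_ms = None
        # Sorted keys and compact separators keep each serialized line reproducible.
        self._send([json.dumps(e, sort_keys=True, separators=(",", ":")) for e in ordered])
```

A caller would pass a `send` function that joins the lines with newlines and posts them to the NDJSON endpoint, and would call `flush()` once more at shutdown.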

### Example appsettings (Observer)
```json
{
  "Sampler": {
    "MaxEventsPerSecond": 500,
    "MaxCpuPercent": 35
  },
  "SymbolCapture": {
    "CollectBuildIds": true,
    "EmitLoaderBase": true
  },
  "Ingest": {
    "BatchSize": 1000,
    "FlushIntervalMs": 2000,
    "Endpoint": "https://signals.local/signals/runtime-facts/ndjson",
    "Headers": {
      "X-Callgraph-Id": "cg-123"
    }
  }
}
```

### Operational steps
1) Enable EntryTrace sampler in Zastava Observer with the config above; verify `sampler.dropped` stays at 0 during pilot.
2) Run a 5-minute capture and send NDJSON to a staging Signals instance using the smoke test; confirm 202 and CAS pointers recorded.
3) Correlate runtime facts to static graphs by callgraphId in Signals; ensure counts match sampler totals.
4) Promote config to prod/offline bundle; freeze config hashes for replay.

## Smoke test
```bash
gzip -c events.ndjson | \
  curl -X POST "https://signals.local/signals/runtime-facts/ndjson?callgraphId=cg-123&component=web&version=1.0.0" \
    -H "Content-Type: application/x-ndjson" \
    -H "Content-Encoding: gzip" \
    --data-binary @-
```
Expect 202 Accepted with SubjectKey in response; Signals will recompute reachability and emit `signals.fact.updated@v1`.
## 7. References
- `docs/reachability/DELIVERY_GUIDE.md`
- `docs/reachability/function-level-evidence.md`
- `docs/reachability/evidence-schema.md`
- `docs/specs/symbols/SYMBOL_MANIFEST_v1.md`