docs consolidation and other updates

This commit is contained in:
master
2026-01-06 19:02:21 +02:00
parent d7bdca6d97
commit 4789027317
849 changed files with 16551 additions and 66770 deletions


@@ -0,0 +1,42 @@
# Assistant Ops Runbook (DOCS-AIAI-31-009)
_Updated: 2025-11-24 · Owners: DevOps Guild · Advisory AI Guild · Sprint 0111_
This runbook covers day-2 operations for Advisory AI (web + worker) with emphasis on cache priming, guardrail verification, and outage handling in offline/air-gapped installs.
## 1) Warmup & cache priming
- Ensure Offline Kit fixtures are staged:
- CLI guardrail bundles: `out/console/guardrails/cli-vuln-29-001/`, `out/console/guardrails/cli-vex-30-001/`.
- SBOM context fixtures: copy into `data/advisory-ai/fixtures/sbom/` and record hashes in `SHA256SUMS`.
- Profiles/prompts manifests: ensure `profiles.catalog.json` and `prompts.manifest` hashes match `AdvisoryAI:Provenance` settings.
- Start services and prime caches using cache-only calls:
- `stella advise run summary --advisory-key <id> --timeout 0 --json` (should return cached/empty context, exit 0).
- `stella advise run remediation --advisory-key <id> --artifact-id <id> --timeout 0 --json` (verifies SBOM clamps without executing inference).
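A minimal warmup sketch combining the fixture check and the cache-only probes above; the fixture path follows the layout listed here, and the advisory/artifact ids are placeholders:
```bash
# Sketch: verify staged fixtures, then run the cache-only probes; both probes must exit 0.
set -euo pipefail
ADVISORY_KEY="CVE-2025-0001"   # placeholder advisory key
ARTIFACT_ID="pkg:npm/example"  # placeholder artifact id
# Fixture hashes must match the recorded SHA256SUMS before priming.
(cd data/advisory-ai/fixtures/sbom && sha256sum -c SHA256SUMS)
# Cache-only probes (timeout 0 skips inference).
stella advise run summary --advisory-key "$ADVISORY_KEY" --timeout 0 --json
stella advise run remediation --advisory-key "$ADVISORY_KEY" --artifact-id "$ARTIFACT_ID" --timeout 0 --json
```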
## 2) Guardrail & provenance verification
- Run guardrail self-test: `dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj --filter Guardrail` (offline-safe).
- Validate DSSE bundles:
- `slsa-verifier verify-attestation --bundle offline-kit/advisory-ai/provenance/prompts.manifest.dsse --source prompts.manifest`
- `slsa-verifier verify-attestation --bundle offline-kit/advisory-ai/provenance/policy-bundle.intoto.jsonl --digest <policy-digest>`
- Confirm `AdvisoryAI:Guardrails:BlockedPhrases` file matches the hash captured during pack build; diff against `prompts.manifest`.
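A hedged sketch for the blocked-phrases check; the file path and the assumption that `prompts.manifest` contains the hash verbatim are illustrative, so adapt to the actual pack layout:
```bash
# Sketch: compare the deployed blocked-phrases file against the hash captured at pack build.
# BLOCKED_PHRASES is a hypothetical path; adapt to where AdvisoryAI:Guardrails:BlockedPhrases points.
BLOCKED_PHRASES="etc/advisory-ai/guardrails/blocked-phrases.txt"
ACTUAL=$(sha256sum "$BLOCKED_PHRASES" | awk '{print $1}')
# Assumption: prompts.manifest carries the hash as plain text; if it is structured JSON, use jq instead of grep.
grep -q "$ACTUAL" prompts.manifest \
  && echo "blocked-phrases hash OK" \
  || echo "MISMATCH: $ACTUAL not found in prompts.manifest"
```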
## 3) Scaling & queue health
- Defaults: queue capacity 1024, dequeue wait 1s (see `docs/modules/policy/guides/assistant-parameters.md`). For bursty tenants, scale workers horizontally before increasing queue size to preserve determinism.
- Metrics to watch: `advisory_ai_queue_depth`, `advisory_ai_latency_seconds`, `advisory_ai_guardrail_blocks_total`.
- If queue depth stays above 75% of `Queue:Capacity` for 5 minutes, add one worker pod or increase `Queue:Capacity` by 25% (record the change in the ops log).
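A sketch for the queue-depth check via the Prometheus HTTP API; the Prometheus address is an assumption, the metric name is the one listed above, and the capacity default is 1024:
```bash
# Sketch: read the current queue depth from Prometheus and compare against 75% of capacity.
PROM_URL="http://prometheus:9090"   # assumption: in-cluster Prometheus address
CAPACITY=1024                       # default Queue:Capacity per the guide above
DEPTH=$(curl -sSf "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=max(advisory_ai_queue_depth)' \
  | jq -r '.data.result[0].value[1] // "0"')
THRESHOLD=$((CAPACITY * 75 / 100))
echo "queue depth: $DEPTH / threshold: $THRESHOLD"
awk -v d="$DEPTH" -v t="$THRESHOLD" 'BEGIN { exit (d > t) }' \
  || echo "ACTION: add a worker pod or raise Queue:Capacity by 25% and record it in the ops log"
```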
## 4) Outage handling
- **SBOM service down**: switch to `NullSbomContextClient` by unsetting `ADVISORYAI__SBOM__BASEADDRESS` (failover sketch after this list); Advisory AI then returns deterministic responses with `sbomSummary` counts at 0.
- **Policy Engine unavailable**: pin last-known `policyVersion`; set `AdvisoryAI:Guardrails:RequireCitations=true` to avoid drift; raise `advisory.remediation.policyHold` in responses.
- **Remote profile disabled**: keep `profile=cloud-openai` blocked; return `advisory.inference.remoteDisabled` with exit code 12 in CLI (see `docs/modules/advisory-ai/guides/cli.md`).
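A sketch of the SBOM failover toggle for the outage above; the deployment names and namespace are assumptions, and the env var is the one named in the list:
```bash
# Sketch: drop the SBOM base address so web + worker fall back to NullSbomContextClient.
# Deployment names and namespace are assumptions; the trailing '-' removes the variable.
kubectl set env deployment/advisory-ai-web    -n stellaops ADVISORYAI__SBOM__BASEADDRESS-
kubectl set env deployment/advisory-ai-worker -n stellaops ADVISORYAI__SBOM__BASEADDRESS-
# Restore once the SBOM service recovers (address is illustrative):
# kubectl set env deployment/advisory-ai-web -n stellaops ADVISORYAI__SBOM__BASEADDRESS=https://sbom.internal
```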
## 5) Air-gap / offline posture
- All external calls are disabled by default. To re-enable remote inference, set `ADVISORYAI__INFERENCE__MODE=Remote` and provide an allowlisted `Remote.BaseAddress`; record the consent in Authority and in the ops log.
- Mirror the guardrail artefact folders and `hashes.sha256` into the Offline Kit; re-run the guardrail self-test after mirroring.
## 6) Checklist before declaring healthy
- [ ] Guardrail self-test suite green.
- [ ] Cache-only CLI probes return 0 with correct `context.planCacheKey`.
- [ ] DSSE verifications logged for prompts, profiles, policy bundle.
- [ ] Metrics scrape shows queue depth < 75% and latency within SLO.
- [ ] Ops log updated with any config overrides (queue size, clamps, remote inference toggles).


@@ -0,0 +1,44 @@
# Concelier Air-Gap Bundle Deploy Runbook (CONCELIER-AIRGAP-56-003)
Status: draft · 2025-11-24
Scope: deploy sealed-mode Concelier evidence bundles using deterministic NDJSON + manifest/entry-trace outputs.
## Inputs
- Bundle: `concelier-airgap.ndjson`
- Manifest: `bundle.manifest.json`
- Entry trace: `bundle.entry-trace.json`
- Hashes: SHA256 recorded in manifest and entry-trace; verify before import.
## Preconditions
- Concelier WebService running with `concelier:features:airgap` enabled.
- No external egress; only local file system allowed for bundle path.
- PostgreSQL indexes applied (`advisory_observations`, `advisory_linksets` tables).
## Steps
1) Transfer bundle directory to offline controller host.
2) Verify hashes:
```bash
# Bundle hash: compare only the digest (first field of sha256sum) with the manifest value.
diff <(sha256sum concelier-airgap.ndjson | awk '{print $1}') <(jq -r '.bundleSha256' bundle.manifest.json)
# Per-entry hashes: number them for spot checks against the transferred entries.
jq -r '.[].sha256' bundle.entry-trace.json | nl -ba > entry.hashes
```
3) Import:
```bash
curl -sSf -X POST \
-H 'Content-Type: application/x-ndjson' \
--data-binary @concelier-airgap.ndjson \
http://localhost:5000/internal/airgap/import
```
4) Validate import:
```bash
curl -sSf http://localhost:5000/internal/airgap/status | jq
```
5) Record evidence:
- Store manifest + entry-trace alongside TRX/logs in `artifacts/airgap/<date>/`.
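A small sketch for the evidence step, assuming the `artifacts/airgap/<date>/` layout above (the date format is an assumption):
```bash
# Sketch: copy manifest + entry-trace next to the run logs and record their hashes.
DEST="artifacts/airgap/$(date -u +%Y-%m-%d)"
mkdir -p "$DEST"
cp bundle.manifest.json bundle.entry-trace.json "$DEST/"
(cd "$DEST" && sha256sum bundle.manifest.json bundle.entry-trace.json > SHA256SUMS)
```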
## Determinism notes
- NDJSON ordering is lexicographic; do not re-sort downstream.
- Entry-trace hashes must match post-transfer; any mismatch aborts import.
## Rollback
- Delete imported batch by `bundleId` from `advisory_observations` and `advisory_linksets` (requires DBA approval); rerun import after fixing hash.
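A rollback sketch only, not a tested procedure: the table names come from the preconditions, but the `bundle_id` column and the connection string are assumptions; run under DBA approval as noted above.
```bash
# Sketch only: delete an imported batch by bundle id, then rerun the import after fixing the hash.
# Table names come from the preconditions; the bundle_id column and DSN are assumptions.
BUNDLE_ID="<bundleId>"
psql "$CONCELIER_DB_DSN" <<SQL
BEGIN;
DELETE FROM advisory_linksets     WHERE bundle_id = '${BUNDLE_ID}';
DELETE FROM advisory_observations WHERE bundle_id = '${BUNDLE_ID}';
COMMIT;
SQL
```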


@@ -0,0 +1,17 @@
# Incident Mode Runbook (outline)
- Activation, escalation, retention, and verification checklists are TBD pending input from the Ops Guild.
## Pending Inputs
- See the SPRINT_0309_0001_0009_docs_tasks_md_ix action tracker; inputs are due 2025-12-09 through 2025-12-12 from the owning guilds.
## Determinism Checklist
- [ ] Hash any inbound assets/payloads; place sums alongside artifacts (e.g., SHA256SUMS in this folder).
- [ ] Keep examples offline-friendly and deterministic (fixed seeds, pinned versions, stable ordering).
- [ ] Note source/approver for any provided captures or schemas.
## Sections to fill (once inputs arrive)
- Activation criteria and toggle steps.
- Escalation paths and roles.
- Retention/cleanup impacts.
- Verification checklist and imposed-rule banner text.


@@ -0,0 +1,50 @@
# Policy Publish / Incident Runbook (draft)
Status: DRAFT — pending policy-registry overlay and production digests. Use for dev/mock exercises until policy release artefacts land.
## Scope
- Policy Registry publish/promote workflows (canary → full rollout).
- Emergency freeze for publish endpoints.
- Evidence capture for audits and postmortems.
## Pre-flight checks (dev vs. prod)
1) Validate manifests
- Dev/mock: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-mock-dev.yaml --downloads deploy/downloads/manifest.json`
- Prod: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-stable.yaml --downloads deploy/downloads/manifest.json`
- Confirm `.gitea/workflows/release-manifest-verify.yml` is green for the target manifest change.
2) Render deployment plan (no apply yet)
- Helm: `helm template stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-orchestrator.yaml > /tmp/policy-plan.yaml`
- Compose (dev): `USE_MOCK=1 deploy/compose/scripts/quickstart.sh env/dev.env.example && docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml config > /tmp/policy-compose.yaml`
3) Backups
- Run `deploy/compose/scripts/backup.sh` before production rollout; archive PostgreSQL/Redis/ObjectStore snapshots to the regulated vault.
## Canary publish → promote
1) Prepare override (temporary)
- Create `deploy/helm/stellaops/values-policy-canary.yaml` with a single replica, reduced worker counts, and an isolated ingress path for policy publish.
- Keep `mock.enabled=false`; only use real digests when available.
2) Dry-run render
- `helm template stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-policy-canary.yaml --debug --validate > /tmp/policy-canary.yaml`
3) Apply canary
- `helm upgrade --install stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-policy-canary.yaml --atomic --timeout 10m`
- Monitor: `kubectl logs deployment/policy-registry -n stellaops --tail=200 -f` and readiness probes; rollback on errors.
4) Promote
- Remove the canary override from the release branch; rerender with `values-prod.yaml` only and redeploy.
- Update the release manifest with final policy digests and rerun `release-manifest-verify`.
## Emergency freeze
- Hard-stop publishes while keeping read access:
- `kubectl scale deployment/policy-registry -n stellaops --replicas=0`
- Alternatively, block ingress to the policy-registry pods with a NetworkPolicy (deny-all sketch at the end of this section). Plain NetworkPolicies act at L3/L4, so keeping status/read paths open while blocking publish requires an ingress-controller or mesh rule instead.
- Manifest gate
- Remove policy entries from the target `deploy/releases/*.yaml` and rerun `.gitea/workflows/release-manifest-verify.yml` so pipelines fail closed until the issue is cleared.
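A deny-all NetworkPolicy sketch for the hard stop above; the `app=policy-registry` label matches the selector used in Evidence capture below and is otherwise an assumption, and read paths also go dark unless re-opened at the ingress controller:
```bash
# Sketch: deny all ingress to policy-registry pods for the duration of the freeze.
kubectl apply -n stellaops -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: policy-registry-freeze
spec:
  podSelector:
    matchLabels:
      app: policy-registry
  policyTypes:
    - Ingress
EOF
# Lift the freeze:
# kubectl delete networkpolicy policy-registry-freeze -n stellaops
```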
## Evidence capture
- Release artefacts: copy the exact release manifest, `/tmp/policy-canary.yaml`, and `/tmp/policy-compose.yaml` used for rollout.
- Runtime state: `kubectl get deploy,po,svc -n stellaops -l app=policy-registry -o yaml > /tmp/policy-live.yaml`.
- Logs: `kubectl logs deployment/policy-registry -n stellaops --since=1h > /tmp/policy-logs.txt`.
- Package as `tar -czf policy-incident-$(date -u +%Y%m%dT%H%M%SZ).tar.gz /tmp/policy-*.yaml /tmp/policy-*.txt` and store in the audit bucket.
## Open items (blockers)
- Replace mock digests with production pins in `deploy/releases/*` once provided.
- Update the canary override file with the real policy-registry chart values (service/env schema pending from DEPLOY-POLICY-27-001).
- Add Grafana/Prometheus dashboard references once policy metrics are exposed.


@@ -0,0 +1,63 @@
# Reachability Runtime Ingestion Runbook
> **Imposed rule:** Runtime traces must never bypass CAS/DSSE verification; ingest only CAS-addressed NDJSON with hashes logged to Timeline and Evidence Locker.
This runbook guides operators through ingesting runtime reachability evidence (EntryTrace, probes, Signals ingestion) and wiring it into the reachability evidence chain.
## 1. Prerequisites
- Services: `Signals` API, `Zastava Observer` (or other probes), `Evidence Locker`, optional `Attestor` for DSSE.
- Reachability schema: `docs/modules/reach-graph/guides/function-level-evidence.md`, `docs/modules/reach-graph/guides/evidence-schema.md`.
- CAS: configured bucket/path for `cas://reachability/runtime/*` and `.../graphs/*`.
- Time sync: AirGap Time anchor if sealed; otherwise NTP with drift <200ms.
## 2. Ingestion workflow (online)
1) **Capture traces** from Observer or other probes as NDJSON (`runtime-trace.ndjson.gz`) with `symbol_id`, `purl`, `timestamp`, `pid`, `container`, `count`.
2) **Stage to CAS**: upload the file, record its `sha256`, and store it at `cas://reachability/runtime/<sha256>` (digest sketch after this list).
3) **Optionally sign**: wrap CAS digest in DSSE (`stella attest runtime --bundle runtime.dsse.json`).
4) **Ingest** via Signals API:
```sh
curl -H "X-Stella-Tenant: acme" \
-H "Content-Type: application/x-ndjson" \
--data-binary @runtime-trace.ndjson.gz \
"https://signals.example/api/v1/runtime-facts?graph_hash=<graph>"
```
Headers returned: `Content-SHA256`, `X-Graph-Hash`, `X-Ingest-Id`.
5) **Emit timeline**: ensure Timeline event `reach.runtime.ingested` with CAS digest and ingest id.
6) **Verify**: run `stella graph verify --runtime runtime-trace.ndjson.gz --graph <graph_hash>` to confirm edges mapped.
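A digest sketch for step 2; the actual CAS upload tooling is deployment specific, so this only computes the hash and target path:
```bash
# Sketch: compute the digest and CAS target for the staged trace (step 2).
TRACE="runtime-trace.ndjson.gz"
SHA=$(sha256sum "$TRACE" | awk '{print $1}')
echo "sha256=${SHA}"
echo "target=cas://reachability/runtime/${SHA}"
# Record both values in the Timeline event (reach.runtime.ingested) and the Evidence Locker entry.
```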
## 3. Ingestion workflow (air-gap)
1) Receive runtime bundle containing `runtime-trace.ndjson.gz`, `manifest.json` (hashes), optional DSSE.
2) Validate hashes against manifest; if present, verify DSSE bundle.
3) Import into CAS path `cas://reachability/runtime/<sha256>` using offline loader.
4) Run Signals offline ingest tool:
```sh
signals-offline ingest-runtime \
--tenant acme \
--graph-hash <graph_hash> \
--runtime runtime-trace.ndjson.gz \
--manifest manifest.json
```
5) Export ingest receipt and add to Evidence Locker; update Timeline when reconnected.
## 4. Checks & alerts
- **Drift**: block ingest if time anchor age > configured budget; surface `staleness_seconds`.
- **Hash mismatch**: fail ingest; write `runtime.ingest.failed` event with reason.
- **Orphan traces**: if no matching `graph_hash`, queue for retry and alert `reachability.orphan_traces` counter.
## 5. Troubleshooting
- **400 Bad Request**: validate NDJSON schema; run `scripts/reachability/validate_runtime_trace.py`.
- **Hash mismatch**: recompute `sha256sum runtime-trace.ndjson.gz`; compare to manifest.
- **Missing symbols**: ensure symbol manifest ingested (see `docs/specs/symbols/SYMBOL_MANIFEST_v1.md`); rerun `stella graph verify`.
- **High drift**: refresh time anchor (AirGap Time service) or resync NTP; retry ingest.
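A hash-mismatch sketch for the troubleshooting item above; the manifest field path is an assumption about `manifest.json`'s layout:
```bash
# Sketch: recompute the trace digest and compare it to the bundle manifest.
# The .files[...] field path is an assumption; adjust to the real manifest.json layout.
ACTUAL=$(sha256sum runtime-trace.ndjson.gz | awk '{print $1}')
EXPECTED=$(jq -r '.files["runtime-trace.ndjson.gz"].sha256' manifest.json)
[ "$ACTUAL" = "$EXPECTED" ] && echo "hash OK" \
  || { echo "hash mismatch: got $ACTUAL, expected $EXPECTED"; exit 1; }
```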
## 6. Artefact checklist
- `runtime-trace.ndjson.gz` (or `.json`), `sha256` recorded.
- Optional `runtime.dsse.json` DSSE bundle.
- Ingest receipt (ingest id, graph hash, CAS digest, tenant).
- Timeline event `reach.runtime.ingested` and Evidence Locker record (bundle + receipt).
## 7. References
- `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`
- `docs/modules/reach-graph/guides/function-level-evidence.md`
- `docs/modules/reach-graph/guides/evidence-schema.md`
- `docs/specs/symbols/SYMBOL_MANIFEST_v1.md`


@@ -0,0 +1,96 @@
# Runbook - Replay Operations
> **Audience:** Ops Guild / Evidence Locker Guild / Scanner Guild / Authority/Signer / Attestor
> **Prereqs:** `docs/modules/replay/guides/DETERMINISTIC_REPLAY.md`, `docs/modules/replay/guides/DEVS_GUIDE_REPLAY.md`, `docs/modules/replay/guides/TEST_STRATEGY.md`, `docs/modules/platform/architecture-overview.md`
This runbook governs day-to-day replay operations, retention, and incident handling across online and air-gapped environments. Keep it in sync with the tasks in `docs/implplan/SPRINT_0187_0001_0001_evidence_locker_cli_integration.md`.
---
## 1 Terminology
- **Replay Manifest** - `manifest.json` describing scan inputs, outputs, signatures.
- **Input Bundle** - `inputbundle.tar.zst` containing feeds, policies, tools, env.
- **Output Bundle** - `outputbundle.tar.zst` with SBOM, findings, VEX, logs.
- **DSSE Envelope** - Signed metadata produced by Authority/Signer.
- **RootPack** - Trusted key bundle used to validate DSSE signatures offline.
---
## 2 Normal operations
1. **Ingestion**
- Scanner WebService writes manifest metadata to `replay_runs`.
- Bundles uploaded to CAS (`cas://replay/...`) and mirrored into Evidence Locker (`evidence.replay_bundles`).
- Authority triggers DSSE signing; Attestor optionally anchors to Rekor.
2. **Verification**
- Nightly job runs `stella verify` on the latest N replay manifests per tenant.
- Metrics `replay_verify_total{result}`, `replay_bundle_size_bytes` recorded in Telemetry Stack (see `docs/modules/telemetry/architecture.md`).
- Failures alert `#ops-replay` via PagerDuty with runbook link.
3. **Retention**
- Hot CAS retention: 180 days (configurable per tenant). Cron job `replay-retention` prunes expired digests and writes audit entries.
- Cold storage (Evidence Locker): 2 years; legal holds extend via `/evidence/holds`. Ensure holds recorded in `timeline.events` with type `replay.hold.created`.
- Retention declaration: validate against `docs/schemas/replay-retention.schema.json` (frozen 2025-12-10). Include `retention_policy_id`, `tenant_id`, `bundle_type`, `retention_days`, `legal_hold`, `purge_after`, `checksum`, `created_at`; a validation sketch follows this list. Audit the checksum via a DSSE envelope when persisting.
4. **Access control**
- Only service identities with `replay:read` scope may fetch bundles. CLI requires device or client credential flow with DPoP.
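A validation sketch for the retention declaration above; the example values and the choice of `check-jsonschema` as validator are assumptions, only the field names come from this runbook:
```bash
# Sketch: draft a retention declaration with the fields listed above and validate it
# against the frozen schema. Example values and the validator choice are assumptions.
cat > /tmp/replay-retention.json <<'EOF'
{
  "retention_policy_id": "replay-default-180d",
  "tenant_id": "acme",
  "bundle_type": "output",
  "retention_days": 180,
  "legal_hold": false,
  "purge_after": "2026-06-30T00:00:00Z",
  "checksum": "sha256:0000000000000000000000000000000000000000000000000000000000000000",
  "created_at": "2025-12-10T00:00:00Z"
}
EOF
check-jsonschema --schemafile docs/schemas/replay-retention.schema.json /tmp/replay-retention.json
```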
---
## 3 Incident response (Replay Integrity)
| Step | Action | Owner | Notes |
|------|--------|-------|-------|
| 1 | Page Ops via `replay_verify_total{result="failed"}` alert | Observability | Include scan id, tenant, failure codes |
| 2 | Lock affected bundles (`POST /evidence/holds`) | Evidence Locker | Reference incident ticket |
| 3 | Re-run `stella verify` with `--explain` to gather diffs | Scanner Guild | Attach diff JSON to incident |
| 4 | Check Rekor inclusion proofs (`stella verify --ledger`) | Attestor | Flag if ledger mismatch or stale |
| 5 | If tool hashes drift -> coordinate with Signer for rotation | Authority/Signer | Rotate DSSE profile, update RootPack |
| 6 | Update incident timeline (`docs/operations/runbooks/replay_ops.md` -> Incident Log) | Ops Guild | Record timestamps and decisions |
| 7 | Close hold once resolved, publish postmortem | Ops + Docs | Postmortem must reference replay spec sections |
---
## 4 Air-gapped workflow
1. Receive Offline Kit bundle containing:
- `offline/replay/<scan-id>/manifest.json`
- Bundles + DSSE signatures
- RootPack snapshot
2. Run `stella replay manifest.json --strict --offline` using local CLI.
3. Load feed/policy snapshots from kit; never hit external networks.
4. Store verification logs under `ops/offline/replay/<scan-id>/`.
5. Sync results back to Evidence Locker once connectivity restored.
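A sketch combining steps 2 and 4 above, assuming the kit layout listed in step 1; `<scan-id>` stays a placeholder:
```bash
# Sketch: run the strict offline verification and keep the log under ops/offline/replay/<scan-id>/.
SCAN_ID="<scan-id>"
KIT="offline/replay/${SCAN_ID}"
LOG_DIR="ops/offline/replay/${SCAN_ID}"
mkdir -p "$LOG_DIR"
stella replay "${KIT}/manifest.json" --strict --offline 2>&1 | tee "${LOG_DIR}/verify-$(date -u +%Y%m%dT%H%M%SZ).log"
```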
---
## 5 Maintenance checklist
- [ ] RootPack rotated quarterly; CLI/Evidence Locker updated with new fingerprints.
- [ ] CAS retention job executed successfully in the past 24 hours.
- [ ] Replay verification metrics present in dashboards (x64 + arm64 lanes).
- [ ] Runbook incident log updated (see section 6) for the last drill.
- [ ] Offline kit instructions verified against current CLI version.
---
## 6 Incident log
| Date (UTC) | Incident ID | Tenant | Summary | Follow-up |
|------------|-------------|--------|---------|-----------|
| _TBD_ | | | | |
---
## 7 References
- `docs/modules/replay/guides/DETERMINISTIC_REPLAY.md`
- `docs/modules/replay/guides/DEVS_GUIDE_REPLAY.md`
- `docs/modules/replay/guides/TEST_STRATEGY.md`
- `docs/modules/platform/architecture-overview.md` section 5
- `docs/modules/evidence-locker/architecture.md`
- `docs/modules/telemetry/architecture.md`
- `docs/implplan/SPRINT_0187_0001_0001_evidence_locker_cli_integration.md`
---
*Created: 2025-11-03 - Update alongside replay task status changes.*


@@ -0,0 +1,35 @@
# VEX Ops Runbook (dev-mock ready)
Status: DRAFT (2025-12-06 UTC). Safe for dev/mock exercises; production rollouts wait on policy/VEX final digests.
## Pre-flight (dev vs. prod)
1) Release manifest guard
- Dev/mock: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-mock-dev.yaml --downloads deploy/downloads/manifest.json`
- Prod: rerun against `deploy/releases/2025.09-stable.yaml` once VEX digests land.
2) Render plan
- Helm (mock overlay): `helm template vex-mock ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --debug --validate > /tmp/vex-mock.yaml`
- Compose (dev with overlay): `USE_MOCK=1 deploy/compose/scripts/quickstart.sh env/dev.env.example && docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml config > /tmp/vex-compose.yaml`
3) Backups (when touching prod data) — not required for mock, but in prod take PostgreSQL snapshots for issuer-directory and VEX state before rollout.
## Deploy (mock path)
- Helm dry-run already covers structural checks. To apply in a dev cluster: `helm upgrade --install stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --atomic --timeout 10m`.
- Observe VEX Lens pod logs: `kubectl logs deploy/vex-lens -n stellaops --tail=200 -f`.
- Issuer Directory seed: ensure `issuer-directory-config` ConfigMap includes `csaf-publishers.json`; mock overlay already mounts default seed.
## Rollback
- Helm: `helm rollback stellaops <previous-revision>` (list revisions with `helm history stellaops`). Mock overlay uses `stellaops.dev/mock: "true"` annotations; safe to tear down after tests.
- Compose: `docker compose --env-file env/dev.env.example -f docker-compose.dev.yaml -f docker-compose.mock.yaml down`.
## Troubleshooting
- Recompute storms: throttle via the `VEX_LENS__MAX_PARALLELISM` env var (to be set in values once the schema lands); until then, scale the deployment down to 1 replica to reduce concurrency.
- Mapping failures: capture request/response with `kubectl logs ... --since=10m`; rerun after clearing queue.
- Signature errors: confirm Authority token audience/issuer; mock overlay uses the same auth settings as dev compose.
## Evidence capture
- Save `/tmp/vex-mock.yaml` and `/tmp/vex-compose.yaml` with the manifest used.
- `kubectl get deploy,po,svc -n stellaops -l app=vex-lens -o yaml > /tmp/vex-live.yaml`.
- Tarball: `tar -czf vex-evidence-$(date -u +%Y%m%dT%H%M%SZ).tar.gz /tmp/vex-*`.
## Open TODOs
- Replace mock digests with production pins and add env/schema knobs for VEX Lens once published.
- Add Grafana panels for recompute throughput and mapping failure rate after metrics are exposed.


@@ -0,0 +1,40 @@
# Vuln / Findings Ops Runbook (dev-mock ready)
Status: DRAFT (2025-12-06 UTC). Safe for dev/mock exercises; production steps need final digests and schema from DEPLOY-VULN-29-001.
## Scope
- Findings Ledger + projector + Vuln Explorer API deployment/rollback, plus common incident drills (lag, storms, export failures).
## Pre-flight (dev vs. prod)
1) Release manifest guard
- Dev/mock: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-mock-dev.yaml --downloads deploy/downloads/manifest.json`
- Prod: rerun against `deploy/releases/2025.09-stable.yaml` once ledger/api digests land.
2) Render plan
- Helm (mock overlay): `helm template vuln-mock ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --debug --validate > /tmp/vuln-mock.yaml`
- Compose (dev with overlay): `USE_MOCK=1 deploy/compose/scripts/quickstart.sh env/dev.env.example && docker compose --env-file env/dev.env.example -f docker-compose.dev.yaml -f docker-compose.mock.yaml config > /tmp/vuln-compose.yaml`
3) Backups (prod only)
- PostgreSQL dump for Findings Ledger DB; copy object-store buckets tied to projector anchors.
## Deploy (mock path)
- Helm apply (dev): `helm upgrade --install stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --atomic --timeout 10m`.
- Compose: quickstart already starts ledger + vuln API with mock pins; validate health at `https://localhost:8443/swagger` (dev certs).
## Incident drills
- Projector lag: scale projector worker up (`kubectl scale deploy/findings-ledger -n stellaops --replicas=2`) then back down; monitor queue length (metric hook pending).
- Resolver storms: temporarily raise `ASPNETCORE_THREADPOOL_MINTHREADS` or scale the API horizontally; in Compose, bump the `VULNEXPLORER__MAX_CONCURRENCY` env var (once the schema lands) and run `docker compose restart vuln-explorer-api`.
- Export failures: re-run export job after verifying hashes in `deploy/releases/*`; mock path skips signing but still exercises checksum validation via `ops/devops/release/check_release_manifest.py`.
## Rollback
- Helm: `helm rollback stellaops <previous-revision>` (list revisions with `helm history stellaops`).
- Compose: `docker compose --env-file env/dev.env.example -f docker-compose.dev.yaml -f docker-compose.mock.yaml down`.
## Evidence capture
- Keep `/tmp/vuln-mock.yaml`, `/tmp/vuln-compose.yaml`, and the release manifest used.
- `kubectl logs deployment/findings-ledger -n stellaops --since=30m > /tmp/ledger-logs.txt`
- DB snapshot checksums if taken; bundle into `vuln-evidence-$(date -u +%Y%m%dT%H%M%SZ).tar.gz`.
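A packaging sketch mirroring the VEX evidence step; the file paths are the ones named above, with the dev/mock release manifest standing in for whichever manifest was actually used:
```bash
# Sketch: bundle the evidence files named above into a single timestamped archive.
STAMP=$(date -u +%Y%m%dT%H%M%SZ)
tar -czf "vuln-evidence-${STAMP}.tar.gz" \
  /tmp/vuln-mock.yaml /tmp/vuln-compose.yaml /tmp/ledger-logs.txt \
  deploy/releases/2025.09-mock-dev.yaml
sha256sum "vuln-evidence-${STAMP}.tar.gz" > "vuln-evidence-${STAMP}.tar.gz.sha256"
```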
## Open TODOs
- Replace mock digests with production pins; add concrete env knobs for projector and API when schemas publish.
- Hook Prometheus counters for projector lag and resolver storm dashboards once metrics are exported.
_Last updated: 2025-12-06 (UTC)_