ops: add mock-ready VEX/Vuln runbooks

StellaOps Bot
2025-12-07 00:09:24 +00:00
parent e0f6efecce
commit 8a72779c16
5 changed files with 80 additions and 33 deletions


@@ -1,15 +1,35 @@
# VEX Ops Runbook — Draft Skeleton (2025-12-05 UTC)
# VEX Ops Runbook (dev-mock ready)
Status: draft placeholder. Inputs pending: DevOps rollout plan for signatures/ops.
Status: DRAFT (2025-12-06 UTC). Safe for dev/mock exercises; production rollouts wait on policy/VEX final digests.
## Recompute Storms
- Steps to mitigate; throttling knobs (to fill).
## Pre-flight (dev vs. prod)
1) Release manifest guard
- Dev/mock: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-mock-dev.yaml --downloads deploy/downloads/manifest.json`
- Prod: rerun against `deploy/releases/2025.09-stable.yaml` once VEX digests land.
2) Render plan
- Helm (mock overlay): `helm template vex-mock ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --debug --validate > /tmp/vex-mock.yaml`
- Compose (dev with overlay): `USE_MOCK=1 deploy/compose/scripts/quickstart.sh env/dev.env.example && docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml config > /tmp/vex-compose.yaml`
3) Backups (prod only): not required for mock; before a prod rollout, take Mongo snapshots of issuer-directory and VEX state.
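Since production rollouts wait on final digests, a quick scan of the rendered plan can show which images are still referenced by tag. The sketch below is a rough text scan, not what `check_release_manifest.py` does (its checks are not described here). The function name `unpinned_images` is hypothetical, and it assumes images appear on `image:` lines in the rendered YAML.

```python
import re

# Matches "image: <ref>" lines, optionally starting a list item or quoted.
IMAGE_LINE = re.compile(r"^[ \t]*-?[ \t]*image:[ \t]*[\"']?([^\"'\s#]+)", re.MULTILINE)
# An image reference pinned by digest ends with @sha256:<64 hex chars>.
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(rendered: str) -> list[str]:
    """Return image references in rendered YAML text that lack a sha256 digest."""
    images = sorted(set(IMAGE_LINE.findall(rendered)))
    return [image for image in images if not DIGEST.search(image)]
```

For example, `unpinned_images(open("/tmp/vex-mock.yaml").read())` lists the references still to be pinned in the mock render.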
## Mapping Failures
- Triage steps; retry/backfill guidance.
## Deploy (mock path)
- The `helm template --validate` render from Pre-flight already covers structural checks. To apply in a dev cluster: `helm upgrade --install stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --atomic --timeout 10m`.
- Observe VEX Lens pod logs: `kubectl logs deploy/vex-lens -n stellaops --tail=200 -f`.
- Issuer Directory seed: ensure `issuer-directory-config` ConfigMap includes `csaf-publishers.json`; mock overlay already mounts default seed.
## Signature Errors
- Diagnosis workflow; key rotation checks.
## Rollback
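- Helm: run `helm history stellaops` to find the last good revision, then `helm rollback stellaops <REVISION>`; omitting the revision rolls back to the previous release. Mock overlay uses `stellaops.dev/mock: "true"` annotations; safe to tear down after tests.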
- Compose: `docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml down`.
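The revision passed to `helm rollback` is best taken from the release history rather than a fixed number. The sketch below picks a target from the output of `helm history stellaops -o json`. It assumes that output is a JSON list of objects with `revision` and `status` fields. The function name `previous_revision` is hypothetical, and skipping anything not marked `superseded` is a design choice of this example, not a Helm rule.

```python
import json


def previous_revision(history_json: str) -> int:
    """Pick the newest superseded revision older than the deployed one."""
    entries = json.loads(history_json)
    deployed = [int(e["revision"]) for e in entries if e.get("status") == "deployed"]
    # Fall back to the highest revision if nothing is marked deployed.
    current = max(deployed) if deployed else max(int(e["revision"]) for e in entries)
    candidates = [
        int(e["revision"])
        for e in entries
        if e.get("status") == "superseded" and int(e["revision"]) < current
    ]
    if not candidates:
        raise ValueError("no earlier superseded revision to roll back to")
    return max(candidates)
```

The returned number can then be passed as the revision argument to `helm rollback stellaops`.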
## Troubleshooting
- Recompute storms: throttle via `VEX_LENS__MAX_PARALLELISM` env (set in values once schema lands); for now scale deployment down to 1 replica to reduce concurrency.
- Mapping failures: capture request/response with `kubectl logs ... --since=10m`; rerun after clearing queue.
- Signature errors: confirm Authority token audience/issuer; mock overlay uses the same auth settings as dev compose.
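To confirm the token audience and issuer, it can help to look at the claims the service actually receives. The sketch below assumes the Authority issues JWT bearer tokens, which the runbook does not state. It decodes the payload without verifying the signature, so it is for inspection only. The names `jwt_claims` and `audience_issuer_problems` are hypothetical.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def audience_issuer_problems(claims: dict, expected_iss: str, expected_aud: str) -> list[str]:
    """Return human-readable mismatches between the claims and expected values."""
    problems = []
    if claims.get("iss") != expected_iss:
        problems.append(f"iss is {claims.get('iss')!r}, expected {expected_iss!r}")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be str or list
    if expected_aud not in audiences:
        problems.append(f"aud is {aud!r}, expected {expected_aud!r}")
    return problems
```

An empty list means both claims match. Any other result points at the setting to compare against the dev compose auth configuration.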
## Evidence capture
- Save `/tmp/vex-mock.yaml` and `/tmp/vex-compose.yaml` with the manifest used.
- `kubectl get deploy,pod,svc -n stellaops -l app=vex-lens -o yaml > /tmp/vex-live.yaml`.
- Tarball: `tar -czf vex-evidence-$(date -u +%Y%m%dT%H%M%SZ).tar.gz /tmp/vex-*`.
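The `tar` one-liner archives whatever the glob matches, so a missing render goes unnoticed. The sketch below is a stricter variant that fails if an expected file is absent. The function name `bundle_evidence` and its parameters are hypothetical; the timestamp format mirrors the `date -u` pattern above.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def bundle_evidence(prefix: str, files: list[Path], out_dir: Path) -> Path:
    """Archive the given files as <prefix>-evidence-<UTC timestamp>.tar.gz."""
    missing = [str(f) for f in files if not f.is_file()]
    if missing:
        raise FileNotFoundError(f"evidence files missing: {', '.join(missing)}")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = out_dir / f"{prefix}-evidence-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=f.name)  # store flat names, not absolute paths
    return archive
```

A call such as `bundle_evidence("vex", [Path("/tmp/vex-mock.yaml"), Path("/tmp/vex-compose.yaml"), Path("/tmp/vex-live.yaml")], Path("."))` would cover the three files listed above.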
## Open TODOs
- Add concrete commands and dashboards once rollout plan is delivered.
- Replace mock digests with production pins and add env/schema knobs for VEX Lens once published.
- Add Grafana panels for recompute throughput and mapping failure rate after metrics are exposed.

View File

@@ -1,22 +1,40 @@
# Vuln Ops Runbook (Md.XI draft)
# Vuln / Findings Ops Runbook (dev-mock ready)
> Status: DRAFT — pending policy overlay outputs and Ops scenarios. Keep TODO.
Status: DRAFT (2025-12-06 UTC). Safe for dev/mock exercises; production steps need final digests and schema from DEPLOY-VULN-29-001.
## Scope
- Operational responses: projector lag, resolver storms, export failures, policy activation steps.
- Findings Ledger + projector + Vuln Explorer API deployment/rollback, plus common incident drills (lag, storms, export failures).
## Dependencies
- Policy overlay outputs; GRAP0101 identifiers; export bundle spec.
## Pre-flight (dev vs. prod)
1) Release manifest guard
- Dev/mock: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-mock-dev.yaml --downloads deploy/downloads/manifest.json`
- Prod: rerun against `deploy/releases/2025.09-stable.yaml` once ledger/api digests land.
2) Render plan
- Helm (mock overlay): `helm template vuln-mock ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --debug --validate > /tmp/vuln-mock.yaml`
- Compose (dev with overlay): `USE_MOCK=1 deploy/compose/scripts/quickstart.sh env/dev.env.example && docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml config > /tmp/vuln-compose.yaml`
3) Backups (prod only)
- Postgres dump for Findings Ledger DB; Mongo dump if projector uses Mongo cache; copy object-store buckets tied to projector anchors.
## Outline
- Projector lag: detection, remediation, replay steps.
- Resolver storms: rate limits, backpressure, queue drains.
- Export failures: bundle retry, manifest verification, hash checks.
- Policy activation: rollout checklist and rollback.
## Deploy (mock path)
- Helm apply (dev): `helm upgrade --install stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --atomic --timeout 10m`.
- Compose: quickstart already starts ledger + vuln API with mock pins; validate health at `https://localhost:8443/swagger` (dev certs).
### Hash Capture Checklist (when scenarios scripted)
- `assets/vuln-explorer/runbook-projector-lag.md`
- `assets/vuln-explorer/runbook-resolver-storm.json`
- `assets/vuln-explorer/runbook-export-failure.json`
- `assets/vuln-explorer/runbook-policy-activation.md`
_Last updated: 2025-12-05 (UTC)_
## Incident drills
- Projector lag: scale projector worker up (`kubectl scale deploy/findings-ledger -n stellaops --replicas=2`) then back down; monitor queue length (metric hook pending).
- Resolver storms: temporarily set `ASPNETCORE_THREADPOOL_MINTHREADS` higher or scale API horizontally; in compose, use `docker compose restart vuln-explorer-api` after bumping `VULNEXPLORER__MAX_CONCURRENCY` env once schema lands.
- Export failures: re-run export job after verifying hashes in `deploy/releases/*`; mock path skips signing but still exercises checksum validation via `ops/devops/release/check_release_manifest.py`.
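The hash check in the export-failure drill can be illustrated with a small SHA-256 comparison. This sketch does not reproduce `ops/devops/release/check_release_manifest.py`. The `expected` mapping of relative path to digest is a hypothetical stand-in for whatever the release manifest records, and `sha256_of` and `mismatches` are illustrative names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatches(expected: dict[str, str], root: Path) -> dict[str, str]:
    """Map each relative path to a reason when it is missing or its digest differs."""
    problems = {}
    for rel_path, digest in expected.items():
        want = digest.lower().removeprefix("sha256:")  # accept "sha256:<hex>" or bare hex
        target = root / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
        elif sha256_of(target) != want:
            problems[rel_path] = "digest mismatch"
    return problems
```

An empty result means every listed file is present and matches, which is the state to reach before re-running the export job.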
## Rollback
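- Helm: run `helm history stellaops` to find the last good revision, then `helm rollback stellaops <REVISION>`; omitting the revision rolls back to the previous release.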
- Compose: `docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml down`.
## Evidence capture
- Keep `/tmp/vuln-mock.yaml`, `/tmp/vuln-compose.yaml`, and the release manifest used.
- `kubectl logs deployment/findings-ledger -n stellaops --since=30m > /tmp/ledger-logs.txt`
- Record DB snapshot checksums if snapshots were taken; bundle all evidence into `vuln-evidence-$(date -u +%Y%m%dT%H%M%SZ).tar.gz`.
## Open TODOs
- Replace mock digests with production pins; add concrete env knobs for projector and API when schemas publish.
- Hook Prometheus counters for projector lag and resolver storm dashboards once metrics are exported.
_Last updated: 2025-12-06 (UTC)_