ops: add mock-ready VEX/Vuln runbooks
@@ -1,15 +1,35 @@
# VEX Ops Runbook — Draft Skeleton (2025-12-05 UTC)
# VEX Ops Runbook (dev-mock ready)
Status: draft placeholder. Inputs pending: DevOps rollout plan for signatures/ops.
Status: DRAFT (2025-12-06 UTC). Safe for dev/mock exercises; production rollouts wait on policy/VEX final digests.
## Recompute Storms
- Steps to mitigate; throttling knobs (to fill).
## Pre-flight (dev vs. prod)
1) Release manifest guard
- Dev/mock: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-mock-dev.yaml --downloads deploy/downloads/manifest.json`
- Prod: rerun against `deploy/releases/2025.09-stable.yaml` once VEX digests land.
2) Render plan
- Helm (mock overlay): `helm template vex-mock ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --debug --validate > /tmp/vex-mock.yaml`
- Compose (dev with overlay): `USE_MOCK=1 deploy/compose/scripts/quickstart.sh env/dev.env.example && docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml config > /tmp/vex-compose.yaml`
3) Backups (only when touching prod data): not required for mock; in prod, take Mongo snapshots of the issuer-directory and VEX state before rollout.
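
Because production rollouts wait on final digests, it can help to scan the rendered plan for images that are still referenced by tag. The following is a minimal sketch, not part of `check_release_manifest.py`; the function name `unpinned_images` and the regex are assumptions, and it only recognizes plain `image:` lines in rendered YAML.

```python
import re

# Matches `image: <ref>` lines as they typically appear in rendered
# Kubernetes or Compose YAML (optionally quoted, optionally a list item).
IMAGE_LINE = re.compile(r"""^\s*-?\s*image:\s*["']?([^\s"']+)""")


def unpinned_images(rendered_yaml: str) -> list[str]:
    """Return image references that are not pinned by an @sha256 digest."""
    found = set()
    for line in rendered_yaml.splitlines():
        match = IMAGE_LINE.match(line)
        if match and "@sha256:" not in match.group(1):
            found.add(match.group(1))
    return sorted(found)


# Hypothetical usage against the rendered Helm plan:
#   with open("/tmp/vex-mock.yaml", encoding="utf-8") as handle:
#       print(unpinned_images(handle.read()))
```

An empty result suggests every image in the render is digest-pinned; any entries are candidates for the "replace mock digests with production pins" TODO.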
## Mapping Failures
- Triage steps; retry/backfill guidance.
## Deploy (mock path)
- The `helm template --validate` render from Pre-flight already covers structural checks. To apply in a dev cluster: `helm upgrade --install stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --atomic --timeout 10m`.
- Observe VEX Lens pod logs: `kubectl logs deploy/vex-lens -n stellaops --tail=200 -f`.
- Issuer Directory seed: ensure `issuer-directory-config` ConfigMap includes `csaf-publishers.json`; mock overlay already mounts default seed.
## Signature Errors
- Diagnosis workflow; key rotation checks.
## Rollback
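- Helm: `helm rollback stellaops <REVISION>`, taking the previous revision number from `helm history stellaops` (omitting the revision rolls back to the previous release; `1` is correct only when the first revision is the target). Mock overlay uses `stellaops.dev/mock: "true"` annotations; safe to tear down after tests.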
- Compose: `docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml down` (same `-f` paths as the render-plan step).
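
To choose the previous revision for the Helm rollback, one option is to parse the output of `helm history stellaops -o json`. The sketch below assumes each entry carries an integer `revision` and a lowercase `status` such as `deployed` or `superseded`; the helper name `previous_revision` is hypothetical, and the result should be reviewed before it is passed to `helm rollback`.

```python
import json


def previous_revision(history_json: str) -> int:
    """Pick the newest 'superseded' revision below the currently deployed one."""
    entries = json.loads(history_json)
    deployed = [e["revision"] for e in entries if e["status"] == "deployed"]
    if not deployed:
        raise ValueError("no deployed revision found in history")
    current = max(deployed)
    candidates = [
        e["revision"]
        for e in entries
        if e["status"] == "superseded" and e["revision"] < current
    ]
    if not candidates:
        raise ValueError("no earlier superseded revision to roll back to")
    return max(candidates)


# Hypothetical usage:
#   helm history stellaops -o json > /tmp/vex-history.json
#   previous_revision(open("/tmp/vex-history.json").read())
```

Failed revisions are skipped on purpose, so a broken upgrade attempt is never selected as the rollback target.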
## Troubleshooting
- Recompute storms: throttle via the `VEX_LENS__MAX_PARALLELISM` env var (set in values once schema lands); for now, scale the deployment down to 1 replica to reduce concurrency (`kubectl scale deploy/vex-lens -n stellaops --replicas=1`).
- Mapping failures: capture request/response with `kubectl logs ... --since=10m`; rerun after clearing queue.
- Signature errors: confirm Authority token audience/issuer; mock overlay uses the same auth settings as dev compose.
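
For readers unfamiliar with what a parallelism cap does during a recompute storm, here is an illustrative sketch. It is not how VEX Lens is implemented: only the variable name `VEX_LENS__MAX_PARALLELISM` comes from the runbook, while `recompute_all`, the default of 4, and the thread-pool approach are assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def recompute_all(items, recompute, default_parallelism=4):
    """Run `recompute` over `items` with at most N workers in flight."""
    # Read the cap at call time so an operator can change it between runs.
    limit = int(os.environ.get("VEX_LENS__MAX_PARALLELISM", default_parallelism))
    # Never go below one worker, even if the variable is set to 0.
    with ThreadPoolExecutor(max_workers=max(1, limit)) as pool:
        # pool.map preserves input order in the returned results.
        return list(pool.map(recompute, items))
```

Lowering the cap trades throughput for a smaller burst of downstream load, which is the same effect the single-replica workaround approximates at the deployment level.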
## Evidence capture
- Save `/tmp/vex-mock.yaml` and `/tmp/vex-compose.yaml` together with the release manifest used.
- `kubectl get deploy,pod,svc -n stellaops -l app=vex-lens -o yaml > /tmp/vex-live.yaml`.
- Tarball: `tar -czf vex-evidence-$(date -u +%Y%m%dT%H%M%SZ).tar.gz /tmp/vex-*`.
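
The tarball step can also be scripted so that each captured file gets a recorded checksum. This is a sketch under stated assumptions: `bundle_evidence` is a hypothetical helper, the archive name mirrors the `vex-evidence-<UTC timestamp>.tar.gz` pattern above, and files are stored under their base names rather than full `/tmp` paths.

```python
import glob
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def bundle_evidence(pattern="/tmp/vex-*", out_dir=".", prefix="vex-evidence"):
    """Archive files matching `pattern`; return (archive path, sha256 by name)."""
    # Resolve the file list before the archive exists, so the archive
    # can never match the pattern and include itself.
    files = [Path(name) for name in sorted(glob.glob(pattern))]
    files = [path for path in files if path.is_file()]
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = Path(out_dir) / f"{prefix}-{stamp}.tar.gz"
    digests = {}
    with tarfile.open(archive, "w:gz") as tar:
        for path in files:
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
            tar.add(path, arcname=path.name)
    return archive, digests
```

Recording the digests next to the archive makes it possible to show later that the evidence was not altered after capture.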
## Open TODOs
- Add concrete commands and dashboards once rollout plan is delivered.
- Replace mock digests with production pins and add env/schema knobs for VEX Lens once published.
- Add Grafana panels for recompute throughput and mapping failure rate after metrics are exposed.
@@ -1,22 +1,40 @@
# Vuln Ops Runbook (Md.XI draft)
# Vuln / Findings Ops Runbook (dev-mock ready)
> Status: DRAFT — pending policy overlay outputs and Ops scenarios. Keep TODO.
Status: DRAFT (2025-12-06 UTC). Safe for dev/mock exercises; production steps need final digests and schema from DEPLOY-VULN-29-001.
## Scope
- Operational responses: projector lag, resolver storms, export failures, policy activation steps.
- Findings Ledger + projector + Vuln Explorer API deployment/rollback, plus common incident drills (lag, storms, export failures).
## Dependencies
- Policy overlay outputs; GRAP0101 identifiers; export bundle spec.
## Pre-flight (dev vs. prod)
1) Release manifest guard
- Dev/mock: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-mock-dev.yaml --downloads deploy/downloads/manifest.json`
- Prod: rerun against `deploy/releases/2025.09-stable.yaml` once ledger/api digests land.
2) Render plan
- Helm (mock overlay): `helm template vuln-mock ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --debug --validate > /tmp/vuln-mock.yaml`
- Compose (dev with overlay): `USE_MOCK=1 deploy/compose/scripts/quickstart.sh env/dev.env.example && docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml config > /tmp/vuln-compose.yaml`
3) Backups (prod only)
- Postgres dump for Findings Ledger DB; Mongo dump if projector uses Mongo cache; copy object-store buckets tied to projector anchors.
## Outline
- Projector lag: detection, remediation, replay steps.
- Resolver storms: rate limits, backpressure, queue drains.
- Export failures: bundle retry, manifest verification, hash checks.
- Policy activation: rollout checklist and rollback.
## Deploy (mock path)
- Helm apply (dev): `helm upgrade --install stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --atomic --timeout 10m`.
- Compose: quickstart already starts ledger + vuln API with mock pins; validate health at `https://localhost:8443/swagger` (dev certs).
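
The Swagger health check can be polled rather than checked by hand while the stack starts. The sketch below is an assumption-laden example: `fetch_status` and `wait_until_healthy` are hypothetical helpers, HTTP 200 is assumed to mean healthy, and TLS verification is disabled only because the endpoint uses dev certs.

```python
import ssl
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code, skipping TLS verification (dev certs only)."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx responses still carry a status code


def wait_until_healthy(url, attempts=10, delay=3.0,
                       fetch=fetch_status, sleep=time.sleep):
    """Poll `url` until it answers 200; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url) == 200:
                return attempt
        except OSError:
            pass  # connection refused or TLS not ready yet
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"{url} not healthy after {attempts} attempts")


# Hypothetical usage:
#   wait_until_healthy("https://localhost:8443/swagger")
```

Never reuse the unverified TLS context against production endpoints; it exists here only for self-signed dev certificates.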
### Hash Capture Checklist (when scenarios scripted)
- `assets/vuln-explorer/runbook-projector-lag.md`
- `assets/vuln-explorer/runbook-resolver-storm.json`
- `assets/vuln-explorer/runbook-export-failure.json`
- `assets/vuln-explorer/runbook-policy-activation.md`
_Last updated: 2025-12-05 (UTC)_
## Incident drills
- Projector lag: scale projector worker up (`kubectl scale deploy/findings-ledger -n stellaops --replicas=2`) then back down; monitor queue length (metric hook pending).
- Resolver storms: temporarily set `ASPNETCORE_THREADPOOL_MINTHREADS` higher or scale API horizontally; in compose, use `docker compose restart vuln-explorer-api` after bumping `VULNEXPLORER__MAX_CONCURRENCY` env once schema lands.
- Export failures: re-run export job after verifying hashes in `deploy/releases/*`; mock path skips signing but still exercises checksum validation via `ops/devops/release/check_release_manifest.py`.
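
For the export-failure drill, the hash verification can be approximated with a short script. This sketch does not reproduce `check_release_manifest.py`; `verify_hashes` is a hypothetical helper and the expected-digest mapping (relative path to hex SHA-256, optionally prefixed with `sha256:`) is an assumed format, not the release manifest schema.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large bundles do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_hashes(expected, base_dir="."):
    """Compare files under `base_dir` with expected digests; return problems."""
    problems = []
    for rel_path, want in sorted(expected.items()):
        target = Path(base_dir) / rel_path
        want_hex = want.lower().removeprefix("sha256:")
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_file(target) != want_hex:
            problems.append(f"mismatch: {rel_path}")
    return problems
```

An empty list means every listed file exists and matches; re-running the export only makes sense once the reported problems are understood.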
## Rollback
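- Helm: `helm rollback stellaops <REVISION>`, taking the previous revision number from `helm history stellaops` (omitting the revision rolls back to the previous release; `1` is correct only when the first revision is the target).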
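- Compose: `docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml down` (paths relative to the repo root, matching the `deploy/compose/scripts/quickstart.sh` invocation).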
## Evidence capture
- Keep `/tmp/vuln-mock.yaml`, `/tmp/vuln-compose.yaml`, and the release manifest used.
- `kubectl logs deployment/findings-ledger -n stellaops --since=30m > /tmp/ledger-logs.txt`
- DB snapshot checksums if taken; bundle into `vuln-evidence-$(date -u +%Y%m%dT%H%M%SZ).tar.gz`.
## Open TODOs
- Replace mock digests with production pins; add concrete env knobs for projector and API when schemas publish.
- Hook Prometheus counters for projector lag and resolver storm dashboards once metrics are exported.
_Last updated: 2025-12-06 (UTC)_