git.stella-ops.org/docs/runbooks/vex-ops.md
2025-12-07 00:09:24 +00:00

# VEX Ops Runbook (dev-mock ready)
Status: DRAFT (2025-12-06 UTC). Safe for dev/mock exercises; production rollouts are blocked until the final policy/VEX digests are published.
## Pre-flight (dev vs. prod)
1) Release manifest guard
- Dev/mock: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-mock-dev.yaml --downloads deploy/downloads/manifest.json`
- Prod: rerun against `deploy/releases/2025.09-stable.yaml` once VEX digests land.
2) Render plan
- Helm (mock overlay): `helm template vex-mock ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --debug --validate > /tmp/vex-mock.yaml`
- Compose (dev with overlay): `USE_MOCK=1 deploy/compose/scripts/quickstart.sh env/dev.env.example && docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml config > /tmp/vex-compose.yaml`
3) Backups (only when touching prod data): not required for mock. In prod, take Mongo snapshots of issuer-directory and VEX state before rollout.
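
The manifest guard in step 1 differs between dev and prod only in the manifest path, so it can be wrapped in a small helper. This is a minimal sketch, assuming the prod run takes the same `--downloads` argument as the dev/mock run and that it is invoked from the repository root. The function names `manifest_for` and `run_manifest_guard` are hypothetical and not part of the repo.

```shell
# Sketch only: helper names are illustrative, not part of the repository.

# Map a mode (dev|prod) to the release manifest named in this runbook.
manifest_for() {
  case "${1:-}" in
    dev)  echo "deploy/releases/2025.09-mock-dev.yaml" ;;
    prod) echo "deploy/releases/2025.09-stable.yaml" ;;
    *)    echo "unknown mode: ${1:-}" >&2; return 2 ;;
  esac
}

# Run the release manifest guard for the chosen mode.
# Assumption: prod uses the same --downloads manifest as dev/mock.
run_manifest_guard() {
  local manifest
  manifest="$(manifest_for "${1:-}")" || return $?
  python ops/devops/release/check_release_manifest.py "$manifest" \
    --downloads deploy/downloads/manifest.json
}

# Usage (from the repo root):
#   run_manifest_guard dev
```

Rejecting unknown modes up front avoids running the guard against a mistyped manifest path.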
## Deploy (mock path)
- The `helm template --validate` render from pre-flight already covers structural checks. To apply in a dev cluster: `helm upgrade --install stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml --atomic --timeout 10m`.
- Observe VEX Lens pod logs: `kubectl logs deploy/vex-lens -n stellaops --tail=200 -f`.
- Issuer Directory seed: ensure the `issuer-directory-config` ConfigMap includes `csaf-publishers.json`; the mock overlay already mounts the default seed.
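
The Issuer Directory seed check can be scripted rather than eyeballed. The sketch below assumes the ConfigMap lives in the same `stellaops` namespace as `vex-lens` and that `csaf-publishers.json` is a key under `.data`; neither is confirmed by this runbook. The helper names `has_seed_key` and `check_issuer_seed` are hypothetical, and the match is a rough text check, not a JSON parse.

```shell
# Assumption: same namespace as the vex-lens commands in this runbook.
NS="${NS:-stellaops}"

# Succeeds if the ConfigMap JSON on stdin contains a key named csaf-publishers.json.
has_seed_key() {
  grep -q '"csaf-publishers\.json"[[:space:]]*:'
}

# Fetch the ConfigMap and check for the seed key.
check_issuer_seed() {
  kubectl get configmap issuer-directory-config -n "$NS" -o json | has_seed_key
}

# Usage after `helm upgrade --install`:
#   kubectl rollout status deploy/vex-lens -n "$NS" --timeout=5m && check_issuer_seed
```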
## Rollback
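- Helm: `helm rollback stellaops <REVISION>`; pick the target revision from `helm history stellaops` (omitting the revision rolls back to the previous release). Resources from the mock overlay carry the `stellaops.dev/mock: "true"` annotation and are safe to tear down after tests.
- Compose (same files as the render step): `docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml down`.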
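
Because a mistyped revision can roll back to the wrong state, the Helm rollback can be wrapped with a small guard. This is a sketch under stated assumptions: the release is named `stellaops` as in the deploy step, and `rollback_to` is a hypothetical helper, not an existing script. It uses only the standard `helm rollback` flags `--wait` and `--timeout`.

```shell
# Assumption: release name matches the `helm upgrade --install` command above.
RELEASE="${RELEASE:-stellaops}"

# Roll back to an explicit numeric revision; refuse anything else.
rollback_to() {
  case "${1:-}" in
    ''|*[!0-9]*)
      echo "revision must be a number (see: helm history $RELEASE)" >&2
      return 2
      ;;
  esac
  helm rollback "$RELEASE" "$1" --wait --timeout 10m
}

# Usage:
#   helm history "$RELEASE" --max 10   # find the last known-good revision
#   rollback_to 3
```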
## Troubleshooting
- Recompute storms: throttle via the `VEX_LENS__MAX_PARALLELISM` environment variable (to be set in values once the schema lands); until then, scale the deployment down to 1 replica to reduce concurrency.
- Mapping failures: capture request/response with `kubectl logs ... --since=10m`; rerun after clearing queue.
- Signature errors: confirm Authority token audience/issuer; mock overlay uses the same auth settings as dev compose.
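
For the recompute-storm workaround, it helps to record the replica count before scaling down so it can be restored afterwards. The sketch below assumes the deployment is `deploy/vex-lens` in the `stellaops` namespace, as in the log command under Deploy; `throttle_vex_lens` and `restore_vex_lens` are hypothetical helper names. If an autoscaler manages the deployment, it may override a manual scale.

```shell
# Assumption: namespace and deployment name match the log command in this runbook.
NS="${NS:-stellaops}"

# Print the current replica count, then scale VEX Lens down to one replica.
throttle_vex_lens() {
  local current
  current="$(kubectl get deploy/vex-lens -n "$NS" -o jsonpath='{.spec.replicas}')" || return 1
  echo "previous replicas: $current"
  kubectl scale deploy/vex-lens -n "$NS" --replicas=1
}

# Restore the replica count recorded by throttle_vex_lens.
restore_vex_lens() {
  kubectl scale deploy/vex-lens -n "$NS" --replicas="$1"
}

# Usage:
#   throttle_vex_lens      # note the "previous replicas" value
#   restore_vex_lens 3     # once the storm has drained
```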
## Evidence capture
- Save `/tmp/vex-mock.yaml` and `/tmp/vex-compose.yaml` with the manifest used.
- Live state: `kubectl get deploy,pod,svc -n stellaops -l app=vex-lens -o yaml > /tmp/vex-live.yaml`.
- Tarball: `tar -czf vex-evidence-$(date -u +%Y%m%dT%H%M%SZ).tar.gz /tmp/vex-*`.
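
The capture steps above can be combined into one helper that also records a checksum for the bundle. This is a minimal sketch, not an existing script: `bundle_evidence` is a hypothetical name, `sha256sum` assumes GNU coreutils, and the glob is narrowed to `vex-*.yaml` so that earlier tarballs in the same directory are not swept into the new one.

```shell
# Bundle vex-*.yaml from a directory (default /tmp) into a timestamped tarball
# in the current directory, write a .sha256 next to it, and print the tarball path.
bundle_evidence() {
  local src_dir="${1:-/tmp}"
  local out="$PWD/vex-evidence-$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
  # Archive with relative member names by changing into the source directory first.
  if ! ( cd "$src_dir" && tar -czf "$out" vex-*.yaml ); then
    rm -f "$out"
    echo "could not archive vex-*.yaml from $src_dir" >&2
    return 1
  fi
  sha256sum "$out" > "$out.sha256"
  echo "$out"
}

# Usage:
#   bundle_evidence          # reads /tmp/vex-*.yaml
#   bundle_evidence ./out    # reads ./out/vex-*.yaml
```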
## Open TODOs
- Replace mock digests with production pins and add env/schema knobs for VEX Lens once published.
- Add Grafana panels for recompute throughput and mapping failure rate after metrics are exposed.