ops: add policy incident runbook draft
This commit is contained in:
@@ -20,7 +20,7 @@
|
||||
## Delivery Tracker
|
||||
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
| 1 | DEPLOY-POLICY-27-002 | TODO | Depends on DEPLOY-POLICY-27-001 | Deployment Guild, Policy Guild | Document rollout/rollback playbooks for policy publish/promote (canary, emergency freeze, evidence retrieval) under `docs/runbooks/policy-incident.md` |
|
||||
| 1 | DEPLOY-POLICY-27-002 | DOING (draft 2025-12-06) | Pending policy overlay/digests from DEPLOY-POLICY-27-001; draft runbook at `docs/runbooks/policy-incident.md` | Deployment Guild, Policy Guild | Document rollout/rollback playbooks for policy publish/promote (canary, emergency freeze, evidence retrieval) under `docs/runbooks/policy-incident.md` |
|
||||
| 2 | DEPLOY-VEX-30-001 | DOING (dev-mock digests 2025-12-06) | Mock digests published in `deploy/releases/2025.09-mock-dev.yaml`; production still awaits real artefacts | Deployment Guild, VEX Lens Guild | Provide Helm/Compose overlays, scaling defaults, offline kit instructions for VEX Lens service |
|
||||
| 3 | DEPLOY-VEX-30-002 | DOING (dev-mock digests 2025-12-06) | Depends on DEPLOY-VEX-30-001 | Deployment Guild, Issuer Directory Guild | Package Issuer Directory deployment manifests, backups, security hardening guidance |
|
||||
| 4 | DEPLOY-VULN-29-001 | DOING (dev-mock digests 2025-12-06) | Mock digests available in `deploy/releases/2025.09-mock-dev.yaml`; production pins pending | Deployment Guild, Findings Ledger Guild | Helm/Compose overlays for Findings Ledger + projector incl. DB migrations, Merkle anchor jobs, scaling guidance |
|
||||
@@ -33,6 +33,7 @@
|
||||
## Execution Log
|
||||
| Date (UTC) | Update | Owner |
|
||||
| --- | --- | --- |
|
||||
| 2025-12-06 | Drafted policy incident runbook (`docs/runbooks/policy-incident.md`); set DEPLOY-POLICY-27-002 to DOING pending policy overlay/digests. | Deployment Guild |
|
||||
| 2025-12-06 | Header normalised to standard template; no content/status changes. | Project Mgmt |
|
||||
| 2025-12-06 | Seeded mock dev release manifest (`deploy/releases/2025.09-mock-dev.yaml`) covering VEX Lens and Findings/Vuln stacks; tasks moved to DOING (dev-mock) for development packaging. Production release still awaits real digests. | Deployment Guild |
|
||||
| 2025-12-06 | Added mock downloads manifest at `deploy/downloads/manifest.json` to unblock dev/test; production still requires signed console artefacts. | Deployment Guild |
|
||||
@@ -53,6 +54,7 @@
|
||||
- Risk: Offline kit instructions must avoid external image pulls; ensure pinned digests and air-gap copy steps.
|
||||
- VEX Lens and Findings/Vuln overlays blocked: release digests absent from `deploy/releases/2025.09-stable.yaml`; cannot pin images or publish offline bundles until artefacts land.
|
||||
- Console downloads manifest blocked: console images/bundles not published, so `deploy/downloads/manifest.json` cannot be signed/updated.
|
||||
- Policy incident runbook is draft-only until DEPLOY-POLICY-27-001 delivers policy overlay schema and production digests.
|
||||
|
||||
## Next Checkpoints
|
||||
| Date (UTC) | Session / Owner | Target outcome | Fallback / Escalation |
|
||||
|
||||
@@ -539,7 +539,7 @@
|
||||
| DEPLOY-PACKS-42-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Packs Registry Guild | ops/deployment | Provide deployment manifests for packs-registry and task-runner services, including Helm/Compose overlays, scaling defaults, and secret templates. | Wait for pack registry schema | AGDP0101 |
|
||||
| DEPLOY-PACKS-43-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Task Runner Guild | ops/deployment | Ship remote Task Runner worker profiles, object storage bootstrap, approval workflow integration, and Offline Kit packaging instructions. Dependencies: DEPLOY-PACKS-42-001. | Needs #7 artifacts | AGDP0101 |
|
||||
| DEPLOY-POLICY-27-001 | DOING (dev-mock 2025-12-06) | 2025-12-05 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Policy Registry Guild | ops/deployment | Produce Helm/Compose overlays for Policy Registry + simulation workers (migrations, buckets, signing keys, tenancy defaults). | WEPO0101 | DVPL0105 |
|
||||
| DEPLOY-POLICY-27-002 | TODO | | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild · Policy Guild | ops/deployment | Document rollout/rollback playbooks for policy publish/promote (canary strategy, emergency freeze, evidence retrieval). | DEPLOY-POLICY-27-001 | DVPL0105 |
|
||||
| DEPLOY-POLICY-27-002 | DOING (draft 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild · Policy Guild | ops/deployment | Drafted `docs/runbooks/policy-incident.md` (publish/promote, freeze, evidence); awaiting policy overlay schema/digests from DEPLOY-POLICY-27-001. | DEPLOY-POLICY-27-001 | DVPL0105 |
|
||||
| DEPLOY-VEX-30-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + VEX Lens Guild | ops/deployment | Provide Helm/Compose overlays, scaling defaults, and offline kit instructions for VEX Lens service. | Wait for CCWO0101 schema | DVPL0101 |
|
||||
| DEPLOY-VEX-30-002 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild | ops/deployment | Package Issuer Directory deployment manifests, backups, and security hardening guidance. Dependencies: DEPLOY-VEX-30-001. | Depends on #5 | DVPL0101 |
|
||||
| DEPLOY-VULN-29-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + Vuln Guild | ops/deployment | Produce Helm/Compose overlays for Findings Ledger + projector, including DB migrations, Merkle anchor jobs, and scaling guidance. | Needs CCWO0101 | DVPL0101 |
|
||||
@@ -2753,7 +2753,7 @@
|
||||
| DEPLOY-PACKS-42-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Packs Registry Guild | ops/deployment | Provide deployment manifests for packs-registry and task-runner services, including Helm/Compose overlays, scaling defaults, and secret templates. | Wait for pack registry schema | AGDP0101 |
|
||||
| DEPLOY-PACKS-43-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Task Runner Guild | ops/deployment | Ship remote Task Runner worker profiles, object storage bootstrap, approval workflow integration, and Offline Kit packaging instructions. Dependencies: DEPLOY-PACKS-42-001. | Needs #7 artifacts | AGDP0101 |
|
||||
| DEPLOY-POLICY-27-001 | DOING (dev-mock 2025-12-06) | 2025-12-05 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Policy Registry Guild | ops/deployment | Produce Helm/Compose overlays for Policy Registry + simulation workers, including Mongo migrations, object storage buckets, signing key secrets, and tenancy defaults. | Needs registry schema + secrets | AGDP0101 |
|
||||
| DEPLOY-POLICY-27-002 | TODO | | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild · Policy Guild | ops/deployment | Document rollout/rollback playbooks for policy publish/promote (canary strategy, emergency freeze toggle, evidence retrieval) under `/docs/runbooks/policy-incident.md`. Dependencies: DEPLOY-POLICY-27-001. | Depends on 27-001 | AGDP0101 |
|
||||
| DEPLOY-POLICY-27-002 | DOING (draft 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild · Policy Guild | ops/deployment | Drafted `docs/runbooks/policy-incident.md` (publish/promote, freeze, evidence); finalize once DEPLOY-POLICY-27-001 ships schema/digests. | Depends on 27-001 | AGDP0101 |
|
||||
| DEPLOY-VEX-30-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + VEX Lens Guild | ops/deployment | Provide Helm/Compose overlays, scaling defaults, and offline kit instructions for VEX Lens service. | Wait for CCWO0101 schema | DVPL0101 |
|
||||
| DEPLOY-VEX-30-002 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild | ops/deployment | Package Issuer Directory deployment manifests, backups, and security hardening guidance. Dependencies: DEPLOY-VEX-30-001. | Depends on #5 | DVPL0101 |
|
||||
| DEPLOY-VULN-29-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + Vuln Guild | ops/deployment | Produce Helm/Compose overlays for Findings Ledger + projector, including DB migrations, Merkle anchor jobs, and scaling guidance. | Needs CCWO0101 | DVPL0101 |
|
||||
|
||||
50
docs/runbooks/policy-incident.md
Normal file
50
docs/runbooks/policy-incident.md
Normal file
@@ -0,0 +1,50 @@
|
||||
# Policy Publish / Incident Runbook (draft)
|
||||
|
||||
Status: DRAFT — pending policy-registry overlay and production digests. Use for dev/mock exercises until policy release artefacts land.
|
||||
|
||||
## Scope
|
||||
- Policy Registry publish/promote workflows (canary → full rollout).
|
||||
- Emergency freeze for publish endpoints.
|
||||
- Evidence capture for audits and postmortems.
|
||||
|
||||
## Pre-flight checks (dev vs. prod)
|
||||
1) Validate manifests
|
||||
- Dev/mock: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-mock-dev.yaml --downloads deploy/downloads/manifest.json`
|
||||
- Prod: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-stable.yaml --downloads deploy/downloads/manifest.json`
|
||||
- Confirm `.gitea/workflows/release-manifest-verify.yml` is green for the target manifest change.
|
||||
2) Render deployment plan (no apply yet)
|
||||
- Helm: `helm template stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-orchestrator.yaml > /tmp/policy-plan.yaml`
|
||||
- Compose (dev): `USE_MOCK=1 deploy/compose/scripts/quickstart.sh env/dev.env.example && docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml config > /tmp/policy-compose.yaml`
|
||||
3) Backups
|
||||
- Run `deploy/compose/scripts/backup.sh` before production rollout; archive Mongo/Redis/ObjectStore snapshots to the regulated vault.
|
||||
|
||||
## Canary publish → promote
|
||||
1) Prepare override (temporary)
|
||||
- Create `deploy/helm/stellaops/values-policy-canary.yaml` with a single replica, reduced worker counts, and an isolated ingress path for policy publish.
|
||||
- Keep `mock.enabled=false`; only use real digests when available.
|
||||
2) Dry-run render
|
||||
- `helm template stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-policy-canary.yaml --debug --validate > /tmp/policy-canary.yaml`
|
||||
3) Apply canary
|
||||
- `helm upgrade --install stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-policy-canary.yaml --atomic --timeout 10m`
|
||||
- Monitor: `kubectl logs deployment/policy-registry -n stellaops --tail=200 -f` and readiness probes; rollback on errors.
|
||||
4) Promote
|
||||
- Remove the canary override from the release branch; rerender with `values-prod.yaml` only and redeploy.
|
||||
- Update the release manifest with final policy digests and rerun `release-manifest-verify`.
|
||||
|
||||
## Emergency freeze
|
||||
- Hard stop publishes while keeping read access
|
||||
- `kubectl scale deployment/policy-registry -n stellaops --replicas=0`
|
||||
- Alternatively, apply a NetworkPolicy that blocks ingress to the publish endpoint while leaving status/read paths open.
|
||||
- Manifest gate
|
||||
- Remove policy entries from the target `deploy/releases/*.yaml` and rerun `.gitea/workflows/release-manifest-verify.yml` so pipelines fail closed until the issue is cleared.
|
||||
|
||||
## Evidence capture
|
||||
- Release artefacts: copy the exact release manifest, `/tmp/policy-canary.yaml`, and `/tmp/policy-compose.yaml` used for rollout.
|
||||
- Runtime state: `kubectl get deploy,po,svc -n stellaops -l app=policy-registry -o yaml > /tmp/policy-live.yaml`.
|
||||
- Logs: `kubectl logs deployment/policy-registry -n stellaops --since=1h > /tmp/policy-logs.txt`.
|
||||
- Package as `tar -czf policy-incident-$(date -u +%Y%m%dT%H%M%SZ).tar.gz /tmp/policy-*.yaml /tmp/policy-*.txt` and store in the audit bucket.
|
||||
|
||||
## Open items (blockers)
|
||||
- Replace mock digests with production pins in `deploy/releases/*` once provided.
|
||||
- Update the canary override file with the real policy-registry chart values (service/env schema pending from DEPLOY-POLICY-27-001).
|
||||
- Add Grafana/Prometheus dashboard references once policy metrics are exposed.
|
||||
Reference in New Issue
Block a user