ops: add policy incident runbook draft

This commit is contained in:
StellaOps Bot
2025-12-06 23:30:12 +00:00
parent 69651212ec
commit 98934170ca
3 changed files with 55 additions and 3 deletions

View File

@@ -20,7 +20,7 @@
## Delivery Tracker ## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition | | # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- | --- |
| 1 | DEPLOY-POLICY-27-002 | TODO | Depends on DEPLOY-POLICY-27-001 | Deployment Guild, Policy Guild | Document rollout/rollback playbooks for policy publish/promote (canary, emergency freeze, evidence retrieval) under `docs/runbooks/policy-incident.md` | | 1 | DEPLOY-POLICY-27-002 | DOING (draft 2025-12-06) | Pending policy overlay/digests from DEPLOY-POLICY-27-001; draft runbook at `docs/runbooks/policy-incident.md` | Deployment Guild, Policy Guild | Document rollout/rollback playbooks for policy publish/promote (canary, emergency freeze, evidence retrieval) under `docs/runbooks/policy-incident.md` |
| 2 | DEPLOY-VEX-30-001 | DOING (dev-mock digests 2025-12-06) | Mock digests published in `deploy/releases/2025.09-mock-dev.yaml`; production still awaits real artefacts | Deployment Guild, VEX Lens Guild | Provide Helm/Compose overlays, scaling defaults, offline kit instructions for VEX Lens service | | 2 | DEPLOY-VEX-30-001 | DOING (dev-mock digests 2025-12-06) | Mock digests published in `deploy/releases/2025.09-mock-dev.yaml`; production still awaits real artefacts | Deployment Guild, VEX Lens Guild | Provide Helm/Compose overlays, scaling defaults, offline kit instructions for VEX Lens service |
| 3 | DEPLOY-VEX-30-002 | DOING (dev-mock digests 2025-12-06) | Depends on DEPLOY-VEX-30-001 | Deployment Guild, Issuer Directory Guild | Package Issuer Directory deployment manifests, backups, security hardening guidance | | 3 | DEPLOY-VEX-30-002 | DOING (dev-mock digests 2025-12-06) | Depends on DEPLOY-VEX-30-001 | Deployment Guild, Issuer Directory Guild | Package Issuer Directory deployment manifests, backups, security hardening guidance |
| 4 | DEPLOY-VULN-29-001 | DOING (dev-mock digests 2025-12-06) | Mock digests available in `deploy/releases/2025.09-mock-dev.yaml`; production pins pending | Deployment Guild, Findings Ledger Guild | Helm/Compose overlays for Findings Ledger + projector incl. DB migrations, Merkle anchor jobs, scaling guidance | | 4 | DEPLOY-VULN-29-001 | DOING (dev-mock digests 2025-12-06) | Mock digests available in `deploy/releases/2025.09-mock-dev.yaml`; production pins pending | Deployment Guild, Findings Ledger Guild | Helm/Compose overlays for Findings Ledger + projector incl. DB migrations, Merkle anchor jobs, scaling guidance |
@@ -33,6 +33,7 @@
## Execution Log ## Execution Log
| Date (UTC) | Update | Owner | | Date (UTC) | Update | Owner |
| --- | --- | --- | | --- | --- | --- |
| 2025-12-06 | Drafted policy incident runbook (`docs/runbooks/policy-incident.md`); set DEPLOY-POLICY-27-002 to DOING pending policy overlay/digests. | Deployment Guild |
| 2025-12-06 | Header normalised to standard template; no content/status changes. | Project Mgmt | | 2025-12-06 | Header normalised to standard template; no content/status changes. | Project Mgmt |
| 2025-12-06 | Seeded mock dev release manifest (`deploy/releases/2025.09-mock-dev.yaml`) covering VEX Lens and Findings/Vuln stacks; tasks moved to DOING (dev-mock) for development packaging. Production release still awaits real digests. | Deployment Guild | | 2025-12-06 | Seeded mock dev release manifest (`deploy/releases/2025.09-mock-dev.yaml`) covering VEX Lens and Findings/Vuln stacks; tasks moved to DOING (dev-mock) for development packaging. Production release still awaits real digests. | Deployment Guild |
| 2025-12-06 | Added mock downloads manifest at `deploy/downloads/manifest.json` to unblock dev/test; production still requires signed console artefacts. | Deployment Guild | | 2025-12-06 | Added mock downloads manifest at `deploy/downloads/manifest.json` to unblock dev/test; production still requires signed console artefacts. | Deployment Guild |
@@ -53,6 +54,7 @@
- Risk: Offline kit instructions must avoid external image pulls; ensure pinned digests and air-gap copy steps. - Risk: Offline kit instructions must avoid external image pulls; ensure pinned digests and air-gap copy steps.
- VEX Lens and Findings/Vuln overlays blocked: release digests absent from `deploy/releases/2025.09-stable.yaml`; cannot pin images or publish offline bundles until artefacts land. - VEX Lens and Findings/Vuln overlays blocked: release digests absent from `deploy/releases/2025.09-stable.yaml`; cannot pin images or publish offline bundles until artefacts land.
- Console downloads manifest blocked: console images/bundles not published, so `deploy/downloads/manifest.json` cannot be signed/updated. - Console downloads manifest blocked: console images/bundles not published, so `deploy/downloads/manifest.json` cannot be signed/updated.
- Policy incident runbook is draft-only until DEPLOY-POLICY-27-001 delivers policy overlay schema and production digests.
## Next Checkpoints ## Next Checkpoints
| Date (UTC) | Session / Owner | Target outcome | Fallback / Escalation | | Date (UTC) | Session / Owner | Target outcome | Fallback / Escalation |

View File

@@ -539,7 +539,7 @@
| DEPLOY-PACKS-42-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Packs Registry Guild | ops/deployment | Provide deployment manifests for packs-registry and task-runner services, including Helm/Compose overlays, scaling defaults, and secret templates. | Wait for pack registry schema | AGDP0101 | | DEPLOY-PACKS-42-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Packs Registry Guild | ops/deployment | Provide deployment manifests for packs-registry and task-runner services, including Helm/Compose overlays, scaling defaults, and secret templates. | Wait for pack registry schema | AGDP0101 |
| DEPLOY-PACKS-43-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Task Runner Guild | ops/deployment | Ship remote Task Runner worker profiles, object storage bootstrap, approval workflow integration, and Offline Kit packaging instructions. Dependencies: DEPLOY-PACKS-42-001. | Needs #7 artifacts | AGDP0101 | | DEPLOY-PACKS-43-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Task Runner Guild | ops/deployment | Ship remote Task Runner worker profiles, object storage bootstrap, approval workflow integration, and Offline Kit packaging instructions. Dependencies: DEPLOY-PACKS-42-001. | Needs #7 artifacts | AGDP0101 |
| DEPLOY-POLICY-27-001 | DOING (dev-mock 2025-12-06) | 2025-12-05 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Policy Registry Guild | ops/deployment | Produce Helm/Compose overlays for Policy Registry + simulation workers (migrations, buckets, signing keys, tenancy defaults). | WEPO0101 | DVPL0105 | | DEPLOY-POLICY-27-001 | DOING (dev-mock 2025-12-06) | 2025-12-05 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Policy Registry Guild | ops/deployment | Produce Helm/Compose overlays for Policy Registry + simulation workers (migrations, buckets, signing keys, tenancy defaults). | WEPO0101 | DVPL0105 |
| DEPLOY-POLICY-27-002 | TODO | | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild · Policy Guild | ops/deployment | Document rollout/rollback playbooks for policy publish/promote (canary strategy, emergency freeze, evidence retrieval). | DEPLOY-POLICY-27-001 | DVPL0105 | | DEPLOY-POLICY-27-002 | DOING (draft 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild · Policy Guild | ops/deployment | Drafted `docs/runbooks/policy-incident.md` (publish/promote, freeze, evidence); awaiting policy overlay schema/digests from DEPLOY-POLICY-27-001. | DEPLOY-POLICY-27-001 | DVPL0105 |
| DEPLOY-VEX-30-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + VEX Lens Guild | ops/deployment | Provide Helm/Compose overlays, scaling defaults, and offline kit instructions for VEX Lens service. | Wait for CCWO0101 schema | DVPL0101 | | DEPLOY-VEX-30-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + VEX Lens Guild | ops/deployment | Provide Helm/Compose overlays, scaling defaults, and offline kit instructions for VEX Lens service. | Wait for CCWO0101 schema | DVPL0101 |
| DEPLOY-VEX-30-002 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild | ops/deployment | Package Issuer Directory deployment manifests, backups, and security hardening guidance. Dependencies: DEPLOY-VEX-30-001. | Depends on #5 | DVPL0101 | | DEPLOY-VEX-30-002 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild | ops/deployment | Package Issuer Directory deployment manifests, backups, and security hardening guidance. Dependencies: DEPLOY-VEX-30-001. | Depends on #5 | DVPL0101 |
| DEPLOY-VULN-29-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + Vuln Guild | ops/deployment | Produce Helm/Compose overlays for Findings Ledger + projector, including DB migrations, Merkle anchor jobs, and scaling guidance. | Needs CCWO0101 | DVPL0101 | | DEPLOY-VULN-29-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + Vuln Guild | ops/deployment | Produce Helm/Compose overlays for Findings Ledger + projector, including DB migrations, Merkle anchor jobs, and scaling guidance. | Needs CCWO0101 | DVPL0101 |
@@ -2753,7 +2753,7 @@
| DEPLOY-PACKS-42-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Packs Registry Guild | ops/deployment | Provide deployment manifests for packs-registry and task-runner services, including Helm/Compose overlays, scaling defaults, and secret templates. | Wait for pack registry schema | AGDP0101 | | DEPLOY-PACKS-42-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Packs Registry Guild | ops/deployment | Provide deployment manifests for packs-registry and task-runner services, including Helm/Compose overlays, scaling defaults, and secret templates. | Wait for pack registry schema | AGDP0101 |
| DEPLOY-PACKS-43-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Task Runner Guild | ops/deployment | Ship remote Task Runner worker profiles, object storage bootstrap, approval workflow integration, and Offline Kit packaging instructions. Dependencies: DEPLOY-PACKS-42-001. | Needs #7 artifacts | AGDP0101 | | DEPLOY-PACKS-43-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Task Runner Guild | ops/deployment | Ship remote Task Runner worker profiles, object storage bootstrap, approval workflow integration, and Offline Kit packaging instructions. Dependencies: DEPLOY-PACKS-42-001. | Needs #7 artifacts | AGDP0101 |
| DEPLOY-POLICY-27-001 | DOING (dev-mock 2025-12-06) | 2025-12-05 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Policy Registry Guild | ops/deployment | Produce Helm/Compose overlays for Policy Registry + simulation workers, including Mongo migrations, object storage buckets, signing key secrets, and tenancy defaults. | Needs registry schema + secrets | AGDP0101 | | DEPLOY-POLICY-27-001 | DOING (dev-mock 2025-12-06) | 2025-12-05 | SPRINT_0501_0001_0001_ops_deployment_i | Deployment Guild · Policy Registry Guild | ops/deployment | Produce Helm/Compose overlays for Policy Registry + simulation workers, including Mongo migrations, object storage buckets, signing key secrets, and tenancy defaults. | Needs registry schema + secrets | AGDP0101 |
| DEPLOY-POLICY-27-002 | TODO | | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild · Policy Guild | ops/deployment | Document rollout/rollback playbooks for policy publish/promote (canary strategy, emergency freeze toggle, evidence retrieval) under `/docs/runbooks/policy-incident.md`. Dependencies: DEPLOY-POLICY-27-001. | Depends on 27-001 | AGDP0101 | | DEPLOY-POLICY-27-002 | DOING (draft 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild · Policy Guild | ops/deployment | Drafted `docs/runbooks/policy-incident.md` (publish/promote, freeze, evidence); finalize once DEPLOY-POLICY-27-001 ships schema/digests. | Depends on 27-001 | AGDP0101 |
| DEPLOY-VEX-30-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + VEX Lens Guild | ops/deployment | Provide Helm/Compose overlays, scaling defaults, and offline kit instructions for VEX Lens service. | Wait for CCWO0101 schema | DVPL0101 | | DEPLOY-VEX-30-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + VEX Lens Guild | ops/deployment | Provide Helm/Compose overlays, scaling defaults, and offline kit instructions for VEX Lens service. | Wait for CCWO0101 schema | DVPL0101 |
| DEPLOY-VEX-30-002 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild | ops/deployment | Package Issuer Directory deployment manifests, backups, and security hardening guidance. Dependencies: DEPLOY-VEX-30-001. | Depends on #5 | DVPL0101 | | DEPLOY-VEX-30-002 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment Guild | ops/deployment | Package Issuer Directory deployment manifests, backups, and security hardening guidance. Dependencies: DEPLOY-VEX-30-001. | Depends on #5 | DVPL0101 |
| DEPLOY-VULN-29-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + Vuln Guild | ops/deployment | Produce Helm/Compose overlays for Findings Ledger + projector, including DB migrations, Merkle anchor jobs, and scaling guidance. | Needs CCWO0101 | DVPL0101 | | DEPLOY-VULN-29-001 | DOING (dev-mock 2025-12-06) | 2025-12-06 | SPRINT_0502_0001_0001_ops_deployment_ii | Deployment + Vuln Guild | ops/deployment | Produce Helm/Compose overlays for Findings Ledger + projector, including DB migrations, Merkle anchor jobs, and scaling guidance. | Needs CCWO0101 | DVPL0101 |

View File

@@ -0,0 +1,50 @@
# Policy Publish / Incident Runbook (draft)
Status: DRAFT — pending policy-registry overlay and production digests. Use for dev/mock exercises until policy release artefacts land.
## Scope
- Policy Registry publish/promote workflows (canary → full rollout).
- Emergency freeze for publish endpoints.
- Evidence capture for audits and postmortems.
## Pre-flight checks (dev vs. prod)
1) Validate manifests
- Dev/mock: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-mock-dev.yaml --downloads deploy/downloads/manifest.json`
- Prod: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-stable.yaml --downloads deploy/downloads/manifest.json`
- Confirm `.gitea/workflows/release-manifest-verify.yml` is green for the target manifest change.
2) Render deployment plan (no apply yet)
- Helm: `helm template stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-orchestrator.yaml > /tmp/policy-plan.yaml`
- Compose (dev): `USE_MOCK=1 deploy/compose/scripts/quickstart.sh env/dev.env.example && docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml config > /tmp/policy-compose.yaml`
3) Backups
- Run `deploy/compose/scripts/backup.sh` before production rollout; archive Mongo/Redis/ObjectStore snapshots to the regulated vault.
## Canary publish → promote
1) Prepare override (temporary)
- Create `deploy/helm/stellaops/values-policy-canary.yaml` with a single replica, reduced worker counts, and an isolated ingress path for policy publish.
- Keep `mock.enabled=false`; only use real digests when available.
2) Dry-run render
- `helm template stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-policy-canary.yaml --debug --validate > /tmp/policy-canary.yaml`
3) Apply canary
- `helm upgrade --install stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-policy-canary.yaml --atomic --timeout 10m`
- Monitor: `kubectl logs deployment/policy-registry -n stellaops --tail=200 -f` and readiness probes; rollback on errors.
4) Promote
- Remove the canary override from the release branch; rerender with `values-prod.yaml` only and redeploy.
- Update the release manifest with final policy digests and rerun `release-manifest-verify`.
## Emergency freeze
- Hard stop publishes while keeping read access
- `kubectl scale deployment/policy-registry -n stellaops --replicas=0`
- Alternatively, apply a NetworkPolicy that blocks ingress to the publish endpoint while leaving status/read paths open.
- Manifest gate
- Remove policy entries from the target `deploy/releases/*.yaml` and rerun `.gitea/workflows/release-manifest-verify.yml` so pipelines fail closed until the issue is cleared.
## Evidence capture
- Release artefacts: copy the exact release manifest, `/tmp/policy-canary.yaml`, and `/tmp/policy-compose.yaml` used for rollout.
- Runtime state: `kubectl get deploy,po,svc -n stellaops -l app=policy-registry -o yaml > /tmp/policy-live.yaml`.
- Logs: `kubectl logs deployment/policy-registry -n stellaops --since=1h > /tmp/policy-logs.txt`.
- Package as `tar -czf policy-incident-$(date -u +%Y%m%dT%H%M%SZ).tar.gz /tmp/policy-*.yaml /tmp/policy-*.txt` and store in the audit bucket.
## Open items (blockers)
- Replace mock digests with production pins in `deploy/releases/*` once provided.
- Update the canary override file with the real policy-registry chart values (service/env schema pending from DEPLOY-POLICY-27-001).
- Add Grafana/Prometheus dashboard references once policy metrics are exposed.