# Policy Publish / Incident Runbook (draft)

Status: DRAFT, pending the policy-registry overlay and production digests. Use for dev/mock exercises until the policy release artefacts land.

## Scope

- Policy Registry publish/promote workflows (canary → full rollout).
- Emergency freeze for publish endpoints.
- Evidence capture for audits and postmortems.

## Pre-flight checks (dev vs. prod)

1) Validate manifests
   - Dev/mock: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-mock-dev.yaml --downloads deploy/downloads/manifest.json`
   - Prod: `python ops/devops/release/check_release_manifest.py deploy/releases/2025.09-stable.yaml --downloads deploy/downloads/manifest.json`
   - Confirm that `.gitea/workflows/release-manifest-verify.yml` is green for the target manifest change.
2) Render the deployment plan (no apply yet)
   - Helm: `helm template stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-orchestrator.yaml > /tmp/policy-plan.yaml`
   - Compose (dev): `USE_MOCK=1 deploy/compose/scripts/quickstart.sh env/dev.env.example && docker compose --env-file env/dev.env.example -f deploy/compose/docker-compose.dev.yaml -f deploy/compose/docker-compose.mock.yaml config > /tmp/policy-compose.yaml`
3) Back up
   - Run `deploy/compose/scripts/backup.sh` before the production rollout, and archive the Mongo/Redis/ObjectStore snapshots to the regulated vault.

## Canary publish → promote

1) Prepare a temporary override
   - Create `deploy/helm/stellaops/values-policy-canary.yaml` with a single replica, reduced worker counts, and an isolated ingress path for policy publish.
   - Keep `mock.enabled=false`; use only real digests, once they are available.
2) Dry-run render
   - `helm template stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-policy-canary.yaml --debug --validate > /tmp/policy-canary.yaml`
   - `--validate` checks the rendered manifests against the cluster in the current kube context, so this step needs cluster access.
3) Apply the canary
   - `helm upgrade --install stellaops ./deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml -f deploy/helm/stellaops/values-policy-canary.yaml --atomic --timeout 10m`
   - Monitor `kubectl logs deployment/policy-registry -n stellaops --tail=200 -f` and the readiness probes; roll back on errors.
4) Promote
   - Remove the canary override from the release branch, re-render with `values-prod.yaml` only, and redeploy.
   - Update the release manifest with the final policy digests and rerun `release-manifest-verify`.

## Emergency freeze

- Hard-stop publishes
  - `kubectl scale deployment/policy-registry -n stellaops --replicas=0` stops publishes immediately, but it also takes the status/read paths down.
  - To keep read access, apply a NetworkPolicy that blocks ingress to the publish endpoint while leaving status/read paths open. NetworkPolicy filters by pod and port, not by HTTP path, so this works only if publish is served on its own port or pods; otherwise block the publish path at the ingress.
- Manifest gate
  - Remove the policy entries from the target `deploy/releases/*.yaml` and rerun `.gitea/workflows/release-manifest-verify.yml` so pipelines fail closed until the issue is cleared.

## Evidence capture

- Release artefacts: copy the exact release manifest, `/tmp/policy-canary.yaml`, and `/tmp/policy-compose.yaml` used for the rollout.
- Runtime state: `kubectl get deploy,po,svc -n stellaops -l app=policy-registry -o yaml > /tmp/policy-live.yaml`
- Logs: `kubectl logs deployment/policy-registry -n stellaops --since=1h > /tmp/policy-logs.txt`
- Package: `tar -czf policy-incident-$(date -u +%Y%m%dT%H%M%SZ).tar.gz /tmp/policy-*.yaml /tmp/policy-*.txt`, then store the archive in the audit bucket. The glob only matches `/tmp/policy-*` files, so append the copied release manifest's path to the `tar` command.

## Open items (blockers)

- Replace the mock digests in `deploy/releases/*` with production pins once they are provided.
- Update the canary override file with the real policy-registry chart values (the service/env schema is pending from DEPLOY-POLICY-27-001).
- Add Grafana/Prometheus dashboard references once policy metrics are exposed.
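The Evidence capture steps above can be wrapped in two shell functions so they run identically under incident pressure. This is a minimal sketch, not something shipped in the repo: the function names and the directory parameters are hypothetical, and it assumes bash plus a `tar` that supports `-z` and `-C`. `capture_policy_state` reuses the runbook's `kubectl` commands unchanged (it needs cluster access), and `package_policy_evidence` archives every `policy-*.yaml` / `policy-*.txt` file with paths stored relative to the evidence directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the runbook's "Evidence capture" steps.
# Assumes bash and a tar supporting -z and -C (GNU tar and bsdtar both do).
set -euo pipefail

# Snapshot live policy-registry objects and the last hour of logs into DIR.
# Same kubectl commands as the runbook; requires cluster access.
capture_policy_state() {
  local ns="${1:-stellaops}" dir="${2:-/tmp}"
  kubectl get deploy,po,svc -n "$ns" -l app=policy-registry -o yaml > "$dir/policy-live.yaml"
  kubectl logs deployment/policy-registry -n "$ns" --since=1h > "$dir/policy-logs.txt"
}

# Archive every policy-*.yaml / policy-*.txt file in DIR into
# OUT_DIR/policy-incident-<UTC timestamp>.tar.gz and print the archive path.
# Returns 1 if DIR holds no matching evidence files.
package_policy_evidence() {
  local dir="${1:-/tmp}" out_dir stamp tarball
  out_dir="$(cd "${2:-.}" && pwd)"   # absolute, so tar's -C cannot relocate it
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  tarball="$out_dir/policy-incident-$stamp.tar.gz"

  # nullglob makes unmatched patterns expand to nothing instead of themselves.
  shopt -s nullglob
  local files=("$dir"/policy-*.yaml "$dir"/policy-*.txt)
  shopt -u nullglob

  if (( ${#files[@]} == 0 )); then
    echo "no policy-* evidence files in $dir" >&2
    return 1
  fi

  # Store bare names (policy-live.yaml), not absolute /tmp/... paths.
  tar -czf "$tarball" -C "$dir" "${files[@]##*/}"
  echo "$tarball"
}
```

For example, `capture_policy_state stellaops /tmp && package_policy_evidence /tmp .` prints the path of the new archive. The glob does not pick up the release manifest, so copy it into the evidence directory under a `policy-*` name first (for example, a hypothetical `policy-release-manifest.yaml`). Writing to a fresh directory instead of `/tmp` also keeps stale `policy-*` files from earlier exercises out of the archive.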