# Stella Ops Deployment Upgrade & Rollback Runbook
Last updated: 2025-10-26 (Sprint 14 – DEVOPS-OPS-14-003).
This runbook describes how to promote a new release across the supported deployment profiles (Helm and Docker Compose), how to roll back safely, and how to keep channels (edge, stable, airgap) aligned. All steps assume you are working from a clean checkout of the release branch/tag.
## 1. Channel overview
| Channel | Release manifest | Helm values | Compose profile |
|---|---|---|---|
| `edge` | `deploy/releases/2025.10-edge.yaml` | `deploy/helm/stellaops/values-dev.yaml` | `deploy/compose/docker-compose.dev.yaml` |
| `stable` | `deploy/releases/2025.09-stable.yaml` | `deploy/helm/stellaops/values-stage.yaml`, `deploy/helm/stellaops/values-prod.yaml` | `deploy/compose/docker-compose.stage.yaml`, `deploy/compose/docker-compose.prod.yaml` |
| `airgap` | `deploy/releases/2025.09-airgap.yaml` | `deploy/helm/stellaops/values-airgap.yaml` | `deploy/compose/docker-compose.airgap.yaml` |

Infrastructure components (MongoDB, MinIO, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Supporting dependencies such as `nats` remain on upstream LTS tags; review `deploy/compose/*.yaml` for the authoritative set.
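
The channel table can also be captured as a small lookup helper, so that the alignment checker from the pre-flight checklist runs with the right manifest and profiles for each channel. This is a minimal sketch, not part of the repository: the function names and the `CHECKER` override are hypothetical, while the paths and the `--release`, `--target`, and `--ignore-repo` flags are copied from this runbook.

```bash
# Print the release manifest for a channel (paths copied from the channel table).
channel_manifest() {
  case "$1" in
    edge)   echo "deploy/releases/2025.10-edge.yaml" ;;
    stable) echo "deploy/releases/2025.09-stable.yaml" ;;
    airgap) echo "deploy/releases/2025.09-airgap.yaml" ;;
    *) echo "unknown channel: $1" >&2; return 1 ;;
  esac
}

# Print the deployment profiles for a channel, one path per line.
channel_targets() {
  case "$1" in
    edge)
      echo "deploy/helm/stellaops/values-dev.yaml"
      echo "deploy/compose/docker-compose.dev.yaml" ;;
    stable)
      echo "deploy/helm/stellaops/values-stage.yaml"
      echo "deploy/helm/stellaops/values-prod.yaml"
      echo "deploy/compose/docker-compose.stage.yaml"
      echo "deploy/compose/docker-compose.prod.yaml" ;;
    airgap)
      echo "deploy/helm/stellaops/values-airgap.yaml"
      echo "deploy/compose/docker-compose.airgap.yaml" ;;
    *) echo "unknown channel: $1" >&2; return 1 ;;
  esac
}

# Run the alignment checker against every profile of one channel.
# CHECKER is a hypothetical override; CHECKER=echo prints the arguments instead.
check_channel() {
  local channel="$1" manifest targets target
  manifest="$(channel_manifest "$channel")" || return 1
  targets="$(channel_targets "$channel")" || return 1
  set -- --release "$manifest"
  for target in $targets; do   # the paths contain no whitespace
    set -- "$@" --target "$target"
  done
  "${CHECKER:-./deploy/tools/check-channel-alignment.py}" "$@" --ignore-repo nats
}
```

For a dry run, `CHECKER=echo check_channel stable` prints the arguments that would be passed to the checker. If the manifests are renamed for a new release, the lookup has to be updated by hand.
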
## 2. Pre-flight checklist
1. **Refresh release manifest**
   Pull the latest manifest for the channel you are promoting (`deploy/releases/<version>-<channel>.yaml`).
2. **Align deployment bundles with the manifest**
   Run the alignment checker for every profile that should pick up the release. Pass `--ignore-repo nats` to skip auxiliary services.
   ```bash
   ./deploy/tools/check-channel-alignment.py \
     --release deploy/releases/2025.10-edge.yaml \
     --target deploy/helm/stellaops/values-dev.yaml \
     --target deploy/compose/docker-compose.dev.yaml \
     --ignore-repo nats
   ```
   Repeat for other channels (`stable`, `airgap`), substituting the manifest and target files.
3. **Lint and template profiles**
   ```bash
   ./deploy/tools/validate-profiles.sh
   ```
4. **Smoke the Offline Kit debug store (edge/stable only)**
   When the release pipeline has generated `out/release/debug/.build-id/**`, mirror the assets into the Offline Kit staging tree:
   ```bash
   ./ops/offline-kit/mirror_debug_store.py \
     --release-dir out/release \
     --offline-kit-dir out/offline-kit
   ```
   Archive the resulting `out/offline-kit/metadata/debug-store.json` alongside the kit bundle.
5. **Review compatibility matrix**
Confirm MongoDB, MinIO, and RustFS versions in the release manifest match platform SLOs. The default targets are `mongo@sha256:c258…`, `minio@sha256:14ce…`, `rustfs:2025.10.0-edge`.
6. **Create a rollback bookmark**
Record the current Helm revision (`helm history stellaops -n stellaops`) and compose tag (`git describe --tags`) before applying changes.
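
The rollback bookmark from the last checklist item could be captured with a pair of helpers like the ones below. This is a sketch under stated assumptions: the function names, the bookmark line format, and the `rollback-bookmarks.log` file name are hypothetical, and the parser assumes the default `helm history` table layout (one header row, then one row per revision, oldest first, revision number in the first column).

```bash
# Read `helm history` table output on stdin and print the newest revision number.
current_revision() {
  awk 'NR > 1 && $1 ~ /^[0-9]+$/ { rev = $1 } END { if (rev != "") print rev }'
}

# Format one bookmark line from a Helm revision and a Git tag.
bookmark_line() {
  printf 'helm_revision=%s compose_tag=%s\n' "$1" "$2"
}

# Example usage (commented out because it needs cluster and repository access):
#   rev="$(helm history stellaops -n stellaops | current_revision)"
#   bookmark_line "$rev" "$(git describe --tags)" >> rollback-bookmarks.log
```
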
---
## 3. Helm upgrade procedure (staging → production)
1. Switch to the deployment branch and ensure secrets/config maps are current.
2. Apply the upgrade in the staging cluster:
   ```bash
   helm upgrade stellaops deploy/helm/stellaops \
     -f deploy/helm/stellaops/values-stage.yaml \
     --namespace stellaops \
     --atomic \
     --timeout 15m
   ```
3. Run smoke tests (`scripts/smoke-tests.sh` or environment-specific checks).
4. Promote to production using the prod values file and the same command.
5. Record the new revision number and Git SHA in the change log.
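
Because staging and production use the same command with a different values file, the upgrade can be wrapped in one parameterized function. The sketch below is illustrative only: `upgrade_env` and the `HELM` override are hypothetical names, and the Helm flags are exactly those shown in the staging step.

```bash
# Run the runbook's helm upgrade for one environment: "stage" or "prod".
# HELM is a hypothetical override; HELM=echo prints the arguments instead of running helm.
upgrade_env() {
  local env="$1"
  case "$env" in
    stage|prod) ;;
    *) echo "usage: upgrade_env stage|prod" >&2; return 2 ;;
  esac
  "${HELM:-helm}" upgrade stellaops deploy/helm/stellaops \
    -f "deploy/helm/stellaops/values-${env}.yaml" \
    --namespace stellaops \
    --atomic \
    --timeout 15m
}
```

Running `HELM=echo upgrade_env prod` first gives a quick check of the arguments before the real promotion. Rejecting anything other than `stage` or `prod` keeps a typo from silently selecting a missing values file.
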
### Rollback (Helm)
1. Identify the previous revision: `helm history stellaops -n stellaops`.
2. Execute:
   ```bash
   helm rollback stellaops <revision> \
     --namespace stellaops \
     --wait \
     --timeout 10m
   ```
3. Verify `kubectl get pods` returns healthy workloads; rerun smoke tests.
4. Update the incident/operations log with root cause and rollback details.
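
The pod health check after a rollback can be made less error-prone by filtering the `kubectl get pods` output instead of reading it by eye. The helper below is a sketch: `unhealthy_pods` is a hypothetical name, and its definition of healthy (status `Running` with all containers ready, or status `Completed`) is an assumption that may need adjusting for your workloads.

```bash
# Read `kubectl get pods --no-headers` output on stdin (NAME READY STATUS ...)
# and print the name of every pod that is neither Completed nor Running
# with all of its containers ready.
unhealthy_pods() {
  awk '
    $3 == "Completed" { next }
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}
```

A typical invocation would be `kubectl get pods -n stellaops --no-headers | unhealthy_pods`; empty output means every pod passed the check.
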
## 4. Docker Compose upgrade procedure
1. Update environment files (`deploy/compose/env/*.env.example`) with any new settings and sync secrets to hosts.
2. Pull the tagged repository state corresponding to the release (e.g. `git checkout 2025.09.2` for stable).
3. Apply the upgrade:
   ```bash
   docker compose \
     --env-file deploy/compose/env/prod.env \
     -f deploy/compose/docker-compose.prod.yaml \
     pull
   docker compose \
     --env-file deploy/compose/env/prod.env \
     -f deploy/compose/docker-compose.prod.yaml \
     up -d
   ```
4. Tail logs for critical services (`docker compose logs -f authority concelier`).
5. Update monitoring dashboards/alerts to confirm normal operation.
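
The pull and up commands share the same env file and Compose file, so they can be wrapped in one function that stops if the pull fails. This is a sketch: `compose_release` and the `DOCKER` override are hypothetical, and the `deploy/compose/env/<profile>.env` naming is confirmed by this runbook only for `prod`; it is assumed for other profiles.

```bash
# Pull and start one Compose profile (for example "prod").
# DOCKER is a hypothetical override; DOCKER=echo prints the commands' arguments instead.
compose_release() {
  local profile="$1"
  [ -n "$profile" ] || { echo "usage: compose_release <profile>" >&2; return 2; }
  local env_file="deploy/compose/env/${profile}.env"
  local compose_file="deploy/compose/docker-compose.${profile}.yaml"
  # Only start containers if every image was pulled successfully.
  "${DOCKER:-docker}" compose --env-file "$env_file" -f "$compose_file" pull &&
    "${DOCKER:-docker}" compose --env-file "$env_file" -f "$compose_file" up -d
}
```

Chaining with `&&` means a failed pull leaves the running containers untouched.
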
### Rollback (Compose)
1. Check out the previous release tag (e.g. `git checkout 2025.09.1`).
2. Re-run `docker compose pull` and `docker compose up -d` with that profile. Docker will restore the prior digests.
3. If reverting to a known-good snapshot is required, restore volume backups (see `docs/ops/authority-backup-restore.md` and associated service guides).
4. Log the rollback in the operations journal.
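
Finding the previous release tag can be scripted when the tag list is long. The helper below is a sketch: `previous_tag` is a hypothetical name, it assumes release tags order correctly under version sort (`sort -V`, available in GNU coreutils), and it does not distinguish between channels.

```bash
# Read tag names on stdin and print the tag that immediately precedes the
# given tag in version order. Returns non-zero if the tag is unknown or
# has no predecessor.
previous_tag() {
  sort -V | awk -v cur="$1" '
    $0 == cur { found = 1; exit }
    { prev = $0 }
    END {
      if (!found || prev == "") exit 1
      print prev
    }'
}
```

For example, `git tag --list | previous_tag 2025.09.2` prints the tag to check out for the rollback, provided both tags exist in the repository. Confirm the result against the rollback bookmark before checking it out.
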
## 5. Channel promotion workflow
1. Author or update the channel manifest under `deploy/releases/`.
2. Mirror the new digests into Helm/Compose values and run the alignment script for each profile.
3. Commit the changes with a message that references the release version and channel (e.g. `deploy: promote 2025.10.0-edge`).
4. Publish release notes and update `deploy/releases/README.md` (if applicable).
5. Tag the repository when promoting stable or airgap builds.
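
The commit-message convention and the tagging rule from this workflow can be encoded in two small helpers so that promotions stay consistent. Both function names are hypothetical; the message format follows the `deploy: promote 2025.10.0-edge` example above.

```bash
# Build the promotion commit message from a release version and a channel.
promotion_commit_message() {
  printf 'deploy: promote %s-%s\n' "$1" "$2"
}

# Succeed only for channels whose promotion should be tagged (stable, airgap).
needs_tag() {
  case "$1" in
    stable|airgap) return 0 ;;
    *) return 1 ;;
  esac
}
```

A promotion script might then run `git commit -m "$(promotion_commit_message "$version" "$channel")"` and create a tag only when `needs_tag "$channel"` succeeds, where `version` and `channel` are variables set by the caller.
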
## 6. Upgrade rehearsal & rollback drill log
Maintain rehearsal notes in `docs/ops/launch-cutover.md` or the relevant sprint planning document. After each drill capture:
- Release version tested
- Date/time
- Participants
- Issues encountered & fixes
- Rollback duration (if executed)

Attach the log to the sprint retro or operational wiki.

| Date (UTC) | Channel | Outcome | Notes |
|---|---|---|---|
| 2025-10-26 | Documentation dry-run | Planned | Runbook refreshed; next live drill scheduled for 2025-11 edge → stable promotion. |
## 7. References
- `deploy/README.md` – structure and validation workflow for deployment bundles.
- `docs/13_RELEASE_ENGINEERING_PLAYBOOK.md` – release automation and signing pipeline.
- `docs/ARCHITECTURE_DEVOPS.md` – high-level DevOps architecture, SLOs, and compliance requirements.
- `ops/offline-kit/mirror_debug_store.py` – debug-store mirroring helper.
- `deploy/tools/check-channel-alignment.py` – release vs deployment digest alignment checker.