Add Policy DSL Validator, Schema Exporter, and Simulation Smoke tools
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Implemented PolicyDslValidator with command-line options for strict mode and JSON output. - Created PolicySchemaExporter to generate JSON schemas for policy-related models. - Developed PolicySimulationSmoke tool to validate policy simulations against expected outcomes. - Added project files and necessary dependencies for each tool. - Ensured proper error handling and usage instructions across tools.
This commit is contained in:
151
docs/ops/deployment-upgrade-runbook.md
Normal file
151
docs/ops/deployment-upgrade-runbook.md
Normal file
@@ -0,0 +1,151 @@
|
||||
# Stella Ops Deployment Upgrade & Rollback Runbook
|
||||
|
||||
_Last updated: 2025-10-26 (Sprint 14 – DEVOPS-OPS-14-003)._
|
||||
|
||||
This runbook describes how to promote a new release across the supported deployment profiles (Helm and Docker Compose), how to roll back safely, and how to keep channels (`edge`, `stable`, `airgap`) aligned. All steps assume you are working from a clean checkout of the release branch/tag.
|
||||
|
||||
---
|
||||
|
||||
## 1. Channel overview
|
||||
|
||||
| Channel | Release manifest | Helm values | Compose profile |
|
||||
|---------|------------------|-------------|-----------------|
|
||||
| `edge` | `deploy/releases/2025.10-edge.yaml` | `deploy/helm/stellaops/values-dev.yaml` | `deploy/compose/docker-compose.dev.yaml` |
|
||||
| `stable` | `deploy/releases/2025.09-stable.yaml` | `deploy/helm/stellaops/values-stage.yaml`, `deploy/helm/stellaops/values-prod.yaml` | `deploy/compose/docker-compose.stage.yaml`, `deploy/compose/docker-compose.prod.yaml` |
|
||||
| `airgap` | `deploy/releases/2025.09-airgap.yaml` | `deploy/helm/stellaops/values-airgap.yaml` | `deploy/compose/docker-compose.airgap.yaml` |
|
||||
|
||||
Infrastructure components (MongoDB, MinIO, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Supporting dependencies such as `nats` remain on upstream LTS tags; review `deploy/compose/*.yaml` for the authoritative set.
|
||||
|
||||
---
|
||||
|
||||
## 2. Pre-flight checklist
|
||||
|
||||
1. **Refresh release manifest**
|
||||
Pull the latest manifest for the channel you are promoting (`deploy/releases/<version>-<channel>.yaml`).
|
||||
|
||||
2. **Align deployment bundles with the manifest**
|
||||
Run the alignment checker for every profile that should pick up the release. Pass `--ignore-repo nats` to skip auxiliary services.
|
||||
```bash
|
||||
./deploy/tools/check-channel-alignment.py \
|
||||
--release deploy/releases/2025.10-edge.yaml \
|
||||
--target deploy/helm/stellaops/values-dev.yaml \
|
||||
--target deploy/compose/docker-compose.dev.yaml \
|
||||
--ignore-repo nats
|
||||
```
|
||||
Repeat for other channels (`stable`, `airgap`), substituting the manifest and target files.
|
||||
|
||||
3. **Lint and template profiles**
|
||||
```bash
|
||||
./deploy/tools/validate-profiles.sh
|
||||
```
|
||||
|
||||
4. **Smoke the Offline Kit debug store (edge/stable only)**
|
||||
When the release pipeline has generated `out/release/debug/.build-id/**`, mirror the assets into the Offline Kit staging tree:
|
||||
```bash
|
||||
./ops/offline-kit/mirror_debug_store.py \
|
||||
--release-dir out/release \
|
||||
--offline-kit-dir out/offline-kit
|
||||
```
|
||||
Archive the resulting `out/offline-kit/metadata/debug-store.json` alongside the kit bundle.
|
||||
|
||||
5. **Review compatibility matrix**
|
||||
Confirm MongoDB, MinIO, and RustFS versions in the release manifest match platform SLOs. The default targets are `mongo@sha256:c258…`, `minio@sha256:14ce…`, `rustfs:2025.10.0-edge`.
|
||||
|
||||
6. **Create a rollback bookmark**
|
||||
Record the current Helm revision (`helm history stellaops -n stellaops`) and compose tag (`git describe --tags`) before applying changes.
|
||||
|
||||
---
|
||||
|
||||
## 3. Helm upgrade procedure (staging → production)
|
||||
|
||||
1. Switch to the deployment branch and ensure secrets/config maps are current.
|
||||
2. Apply the upgrade in the staging cluster:
|
||||
```bash
|
||||
helm upgrade stellaops deploy/helm/stellaops \
|
||||
-f deploy/helm/stellaops/values-stage.yaml \
|
||||
--namespace stellaops \
|
||||
--atomic \
|
||||
--timeout 15m
|
||||
```
|
||||
3. Run smoke tests (`scripts/smoke-tests.sh` or environment-specific checks).
|
||||
4. Promote to production using the prod values file and the same command.
|
||||
5. Record the new revision number and Git SHA in the change log.
|
||||
|
||||
### Rollback (Helm)
|
||||
|
||||
1. Identify the previous revision: `helm history stellaops -n stellaops`.
|
||||
2. Execute:
|
||||
```bash
|
||||
helm rollback stellaops <revision> \
|
||||
--namespace stellaops \
|
||||
--wait \
|
||||
--timeout 10m
|
||||
```
|
||||
3. Verify `kubectl get pods` returns healthy workloads; rerun smoke tests.
|
||||
4. Update the incident/operations log with root cause and rollback details.
|
||||
|
||||
---
|
||||
|
||||
## 4. Docker Compose upgrade procedure
|
||||
|
||||
1. Update environment files (`deploy/compose/env/*.env.example`) with any new settings and sync secrets to hosts.
|
||||
2. Pull the tagged repository state corresponding to the release (e.g. `git checkout 2025.09.2` for stable).
|
||||
3. Apply the upgrade:
|
||||
```bash
|
||||
docker compose \
|
||||
--env-file deploy/compose/env/prod.env \
|
||||
-f deploy/compose/docker-compose.prod.yaml \
|
||||
pull
|
||||
|
||||
docker compose \
|
||||
--env-file deploy/compose/env/prod.env \
|
||||
-f deploy/compose/docker-compose.prod.yaml \
|
||||
up -d
|
||||
```
|
||||
4. Tail logs for critical services (`docker compose logs -f authority concelier`).
|
||||
5. Update monitoring dashboards/alerts to confirm normal operation.
|
||||
|
||||
### Rollback (Compose)
|
||||
|
||||
1. Check out the previous release tag (e.g. `git checkout 2025.09.1`).
|
||||
2. Re-run `docker compose pull` and `docker compose up -d` with that profile. Docker will restore the prior digests.
|
||||
3. If reverting to a known-good snapshot is required, restore volume backups (see `docs/ops/authority-backup-restore.md` and associated service guides).
|
||||
4. Log the rollback in the operations journal.
|
||||
|
||||
---
|
||||
|
||||
## 5. Channel promotion workflow
|
||||
|
||||
1. Author or update the channel manifest under `deploy/releases/`.
|
||||
2. Mirror the new digests into Helm/Compose values and run the alignment script for each profile.
|
||||
3. Commit the changes with a message that references the release version and channel (e.g. `deploy: promote 2025.10.0-edge`).
|
||||
4. Publish release notes and update `deploy/releases/README.md` (if applicable).
|
||||
5. Tag the repository when promoting stable or airgap builds.
|
||||
|
||||
---
|
||||
|
||||
## 6. Upgrade rehearsal & rollback drill log
|
||||
|
||||
Maintain rehearsal notes in `docs/ops/launch-cutover.md` or the relevant sprint planning document. After each drill capture:
|
||||
|
||||
- Release version tested
|
||||
- Date/time
|
||||
- Participants
|
||||
- Issues encountered & fixes
|
||||
- Rollback duration (if executed)
|
||||
|
||||
Attach the log to the sprint retro or operational wiki.
|
||||
|
||||
| Date (UTC) | Channel | Outcome | Notes |
|
||||
|------------|---------|---------|-------|
|
||||
| 2025-10-26 | Documentation dry-run | Planned | Runbook refreshed; next live drill scheduled for 2025-11 edge → stable promotion.
|
||||
|
||||
---
|
||||
|
||||
## 7. References
|
||||
|
||||
- `deploy/README.md` – structure and validation workflow for deployment bundles.
|
||||
- `docs/13_RELEASE_ENGINEERING_PLAYBOOK.md` – release automation and signing pipeline.
|
||||
- `docs/ARCHITECTURE_DEVOPS.md` – high-level DevOps architecture, SLOs, and compliance requirements.
|
||||
- `ops/offline-kit/mirror_debug_store.py` – debug-store mirroring helper.
|
||||
- `deploy/tools/check-channel-alignment.py` – release vs deployment digest alignment checker.
|
||||
Reference in New Issue
Block a user