StellaOps Deployment Upgrade & Rollback Runbook

Last updated: 2025-10-26 (Sprint 14 DEVOPS-OPS-14-003).

This runbook describes how to promote a new release across the supported deployment profiles (Helm and Docker Compose), how to roll back safely, and how to keep channels (edge, stable, airgap) aligned. All steps assume you are working from a clean checkout of the release branch/tag.


1. Channel overview

| Channel | Release manifest | Helm values | Compose profile |
|---------|------------------|-------------|-----------------|
| edge | deploy/releases/2025.10-edge.yaml | deploy/helm/stellaops/values-dev.yaml | deploy/compose/docker-compose.dev.yaml |
| stable | deploy/releases/2025.09-stable.yaml | deploy/helm/stellaops/values-stage.yaml, deploy/helm/stellaops/values-prod.yaml | deploy/compose/docker-compose.stage.yaml, deploy/compose/docker-compose.prod.yaml |
| airgap | deploy/releases/2025.09-airgap.yaml | deploy/helm/stellaops/values-airgap.yaml | deploy/compose/docker-compose.airgap.yaml |

Infrastructure components (MongoDB, MinIO, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Supporting dependencies such as nats remain on upstream LTS tags; review deploy/compose/*.yaml for the authoritative set.
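The channel table above can also be captured as data, which makes it easier to loop over every profile that should track a manifest. The following is a minimal sketch: the paths are copied from the table, while the `ChannelProfile` class and the `CHANNELS` mapping are illustrative names that do not exist in the repository.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ChannelProfile:
    """Files that belong to one release channel (hypothetical helper type)."""

    manifest: str
    helm_values: Tuple[str, ...]
    compose_files: Tuple[str, ...]

    def targets(self) -> Tuple[str, ...]:
        """Every deployment profile that should track this channel's manifest."""
        return self.helm_values + self.compose_files


# Paths copied from the channel overview table.
CHANNELS = {
    "edge": ChannelProfile(
        manifest="deploy/releases/2025.10-edge.yaml",
        helm_values=("deploy/helm/stellaops/values-dev.yaml",),
        compose_files=("deploy/compose/docker-compose.dev.yaml",),
    ),
    "stable": ChannelProfile(
        manifest="deploy/releases/2025.09-stable.yaml",
        helm_values=(
            "deploy/helm/stellaops/values-stage.yaml",
            "deploy/helm/stellaops/values-prod.yaml",
        ),
        compose_files=(
            "deploy/compose/docker-compose.stage.yaml",
            "deploy/compose/docker-compose.prod.yaml",
        ),
    ),
    "airgap": ChannelProfile(
        manifest="deploy/releases/2025.09-airgap.yaml",
        helm_values=("deploy/helm/stellaops/values-airgap.yaml",),
        compose_files=("deploy/compose/docker-compose.airgap.yaml",),
    ),
}

if __name__ == "__main__":
    for name, profile in CHANNELS.items():
        print(name, profile.manifest, *profile.targets())
```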


2. Pre-flight checklist

  1. Refresh release manifest
    Pull the latest manifest for the channel you are promoting (deploy/releases/<version>-<channel>.yaml).

  2. Align deployment bundles with the manifest
    Run the alignment checker for every profile that should pick up the release. Pass --ignore-repo nats to skip auxiliary services.

    ./deploy/tools/check-channel-alignment.py \
        --release deploy/releases/2025.10-edge.yaml \
        --target deploy/helm/stellaops/values-dev.yaml \
        --target deploy/compose/docker-compose.dev.yaml \
        --ignore-repo nats
    

    Repeat for other channels (stable, airgap), substituting the manifest and target files.

  3. Lint and template profiles

    ./deploy/tools/validate-profiles.sh
    
  4. Smoke the Offline Kit debug store (edge/stable only)
    When the release pipeline has generated out/release/debug/.build-id/**, mirror the assets into the Offline Kit staging tree:

    ./ops/offline-kit/mirror_debug_store.py \
        --release-dir out/release \
        --offline-kit-dir out/offline-kit

    Archive the resulting out/offline-kit/metadata/debug-store.json alongside the kit bundle.

  5. Review compatibility matrix
    Confirm that the MongoDB, MinIO, and RustFS versions in the release manifest match platform SLOs. The default targets are mongo@sha256:c258…, minio@sha256:14ce…, and rustfs:2025.10.0-edge.

  6. Create a rollback bookmark
    Record the current Helm revision (helm history stellaops -n stellaops) and compose tag (git describe --tags) before applying changes.
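Repeating the alignment check from step 2 for every channel is easy to get wrong by hand. The sketch below only assembles the argument list, using the flags shown in step 2 (`--release`, `--target`, `--ignore-repo`). The function name `alignment_command` is hypothetical, and passing `--ignore-repo` more than once is an assumption about the checker, so the default keeps the single `nats` entry.

```python
import shlex


def alignment_command(manifest, targets, ignore_repos=("nats",)):
    """Build the argv for check-channel-alignment.py from the flags shown in step 2."""
    argv = ["./deploy/tools/check-channel-alignment.py", "--release", manifest]
    for target in targets:
        argv += ["--target", target]
    for repo in ignore_repos:
        # Assumption: the flag may be repeated; only "nats" is used by default.
        argv += ["--ignore-repo", repo]
    return argv


cmd = alignment_command(
    "deploy/releases/2025.09-airgap.yaml",
    [
        "deploy/helm/stellaops/values-airgap.yaml",
        "deploy/compose/docker-compose.airgap.yaml",
    ],
)

if __name__ == "__main__":
    # Print a copy-pasteable shell line; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```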

3. Helm upgrade procedure (staging → production)

  1. Switch to the deployment branch and ensure secrets/config maps are current.
  2. Apply the upgrade in the staging cluster:
    helm upgrade stellaops deploy/helm/stellaops \
      -f deploy/helm/stellaops/values-stage.yaml \
      --namespace stellaops \
      --atomic \
      --timeout 15m

  3. Run smoke tests (scripts/smoke-tests.sh or environment-specific checks).
  4. Promote to production by running the same command with deploy/helm/stellaops/values-prod.yaml.
  5. Record the new revision number and Git SHA in the change log.
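The staging-then-production order can be expressed as a small gate: production is only touched once staging has passed its smoke tests. This is a sketch under stated assumptions, not part of the repository. `promote`, `run_command`, and `smoke_test` are hypothetical names, and the caller supplies the two callables (for example wrappers around `subprocess.run` and scripts/smoke-tests.sh).

```python
def helm_upgrade_command(values_file, timeout="15m"):
    """Build the `helm upgrade` argv shown above for one values file."""
    return [
        "helm", "upgrade", "stellaops", "deploy/helm/stellaops",
        "-f", values_file,
        "--namespace", "stellaops",
        "--atomic",
        "--timeout", timeout,
    ]


# Environments in promotion order, with the values files from the channel table.
PROMOTION_ORDER = [
    ("staging", "deploy/helm/stellaops/values-stage.yaml"),
    ("production", "deploy/helm/stellaops/values-prod.yaml"),
]


def promote(run_command, smoke_test):
    """Upgrade each environment in order and stop at the first failed smoke test.

    run_command(argv) executes a command; smoke_test(env) returns True on success.
    Returns the list of environments that were upgraded.
    """
    upgraded = []
    for env, values_file in PROMOTION_ORDER:
        run_command(helm_upgrade_command(values_file))
        upgraded.append(env)
        if not smoke_test(env):
            break  # do not continue to the next environment
    return upgraded


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    promote(lambda argv: print(" ".join(argv)), lambda env: True)
```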

Rollback (Helm)

  1. Identify the previous revision: helm history stellaops -n stellaops.
  2. Execute:
    helm rollback stellaops <revision> \
      --namespace stellaops \
      --wait \
      --timeout 10m
    
  3. Verify that kubectl get pods -n stellaops shows healthy workloads, then rerun the smoke tests.
  4. Update the incident/operations log with root cause and rollback details.
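Picking the revision for step 1 can be automated from `helm history stellaops -n stellaops -o json`. The sketch below assumes the JSON output is a list of objects with `revision` and `status` fields, and that a previously healthy release is marked `superseded`. Verify both against your Helm version before relying on it. The function name is hypothetical.

```python
import json


def previous_good_revision(history_json):
    """Return the newest superseded revision before the current one, or None."""
    entries = sorted(json.loads(history_json), key=lambda e: e["revision"])
    # Skip the latest entry (the revision being rolled back) and walk backwards.
    for entry in reversed(entries[:-1]):
        if entry["status"] == "superseded":
            return entry["revision"]
    return None


if __name__ == "__main__":
    # Illustrative history only; real output carries more fields.
    sample = json.dumps([
        {"revision": 1, "status": "superseded"},
        {"revision": 2, "status": "failed"},
        {"revision": 3, "status": "deployed"},
    ])
    print(previous_good_revision(sample))
```

Skipping `failed` entries avoids rolling back onto a revision that never became healthy.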

4. Docker Compose upgrade procedure

  1. Update environment files (deploy/compose/env/*.env.example) with any new settings and sync secrets to hosts.
  2. Pull the tagged repository state corresponding to the release (e.g. git checkout 2025.09.2 for stable).
  3. Apply the upgrade:
    docker compose \
      --env-file deploy/compose/env/prod.env \
      -f deploy/compose/docker-compose.prod.yaml \
      pull
    
    docker compose \
      --env-file deploy/compose/env/prod.env \
      -f deploy/compose/docker-compose.prod.yaml \
      up -d
    
  4. Tail logs for critical services (docker compose logs -f authority concelier).
  5. Check monitoring dashboards and alerts to confirm normal operation.

Rollback (Compose)

  1. Check out the previous release tag (e.g. git checkout 2025.09.1).
  2. Re-run docker compose pull and docker compose up -d with that profile, using the same --env-file and -f arguments as the upgrade. Compose then recreates the services from the image digests pinned at that tag.
  3. If reverting to a known-good snapshot is required, restore volume backups (see docs/ops/authority-backup-restore.md and associated service guides).
  4. Log the rollback in the operations journal.
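Upgrade and rollback use the same Compose commands and differ only in the Git tag that is checked out first. A minimal sketch of that sequence follows. `compose_steps` is a hypothetical helper, and the defaults mirror the prod env file and Compose profile used above.

```python
def compose_steps(
    tag,
    env_file="deploy/compose/env/prod.env",
    compose_file="deploy/compose/docker-compose.prod.yaml",
):
    """Return the commands, in order, that move a Compose host to the given tag."""
    base = ["docker", "compose", "--env-file", env_file, "-f", compose_file]
    return [
        ["git", "checkout", tag],  # select the release (or previous release) state
        base + ["pull"],           # fetch the images pinned at that tag
        base + ["up", "-d"],       # recreate services in the background
    ]


if __name__ == "__main__":
    # Rollback example using the tag mentioned in the rollback steps.
    for step in compose_steps("2025.09.1"):
        print(" ".join(step))
```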

5. Channel promotion workflow

  1. Author or update the channel manifest under deploy/releases/.
  2. Mirror the new digests into Helm/Compose values and run the alignment script for each profile.
  3. Commit the changes with a message that references the release version and channel (e.g. deploy: promote 2025.10.0-edge).
  4. Publish release notes and update deploy/releases/README.md (if applicable).
  5. Tag the repository when promoting stable or airgap builds.
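The commit-message convention in step 3 and the tagging rule in step 5 are simple enough to encode, which helps keep promotion commits uniform. The helper names below are hypothetical. The message format follows the example in step 3.

```python
# Channels that require a repository tag on promotion (step 5).
TAGGED_CHANNELS = {"stable", "airgap"}


def promotion_commit_message(version, channel):
    """Format the commit message as in step 3, e.g. 'deploy: promote 2025.10.0-edge'."""
    return f"deploy: promote {version}-{channel}"


def requires_tag(channel):
    """True when promoting this channel should also tag the repository."""
    return channel in TAGGED_CHANNELS


if __name__ == "__main__":
    print(promotion_commit_message("2025.10.0", "edge"))
    print(requires_tag("edge"), requires_tag("stable"))
```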

6. Upgrade rehearsal & rollback drill log

Maintain rehearsal notes in docs/ops/launch-cutover.md or the relevant sprint planning document. After each drill, capture:

  • Release version tested
  • Date/time
  • Participants
  • Issues encountered & fixes
  • Rollback duration (if executed)

Attach the log to the sprint retro or operational wiki.

| Date (UTC) | Channel | Outcome | Notes |
|------------|---------|---------|-------|
| 2025-10-26 | Documentation dry-run | Planned | Runbook refreshed; next live drill scheduled for 2025-11 edge → stable promotion. |
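If drill notes are collected programmatically, a small record type keeps the captured fields consistent with the list above and yields the four columns of the log table. This is only a sketch. `DrillRecord` is a hypothetical name, and the values in the usage example are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class DrillRecord:
    """One upgrade rehearsal or rollback drill (hypothetical helper type)."""

    release: str
    channel: str
    started_at: datetime  # must be timezone-aware
    participants: List[str]
    outcome: str
    issues: List[str] = field(default_factory=list)
    rollback_minutes: Optional[float] = None  # None when no rollback was executed

    def table_row(self):
        """Return (Date (UTC), Channel, Outcome, Notes) for the drill log table."""
        date_utc = self.started_at.astimezone(timezone.utc).date().isoformat()
        notes = "; ".join(self.issues) if self.issues else "No issues recorded"
        if self.rollback_minutes is not None:
            notes += f"; rollback took {self.rollback_minutes:g} min"
        return (date_utc, self.channel, self.outcome, notes)


if __name__ == "__main__":
    # Illustrative values only, not a real drill.
    record = DrillRecord(
        release="2025.10.0-edge",
        channel="edge",
        started_at=datetime(2025, 11, 4, 1, 0, tzinfo=timezone(timedelta(hours=2))),
        participants=["operator-a", "operator-b"],
        outcome="Rolled back",
        issues=["smoke test timeout"],
        rollback_minutes=7.5,
    )
    print(" | ".join(record.table_row()))
```

Converting to UTC before taking the date keeps entries comparable when drills are run from different time zones.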

7. References

  • deploy/README.md: structure and validation workflow for deployment bundles.
  • docs/13_RELEASE_ENGINEERING_PLAYBOOK.md: release automation and signing pipeline.
  • docs/ARCHITECTURE_DEVOPS.md: high-level DevOps architecture, SLOs, and compliance requirements.
  • ops/offline-kit/mirror_debug_store.py: debug-store mirroring helper.
  • deploy/tools/check-channel-alignment.py: release vs deployment digest alignment checker.