# Stella Ops Deployment Upgrade & Rollback Runbook

_Last updated: 2025-10-26 (Sprint 14 – DEVOPS-OPS-14-003)._

This runbook describes how to promote a new release across the supported deployment profiles (Helm and Docker Compose), how to roll back safely, and how to keep channels (`edge`, `stable`, `airgap`) aligned. All steps assume you are working from a clean checkout of the release branch/tag.

---

## 1. Channel overview

| Channel | Release manifest | Helm values | Compose profile |
|---------|------------------|-------------|-----------------|
| `edge`  | `deploy/releases/2025.10-edge.yaml` | `deploy/helm/stellaops/values-dev.yaml` | `deploy/compose/docker-compose.dev.yaml` |
| `stable` | `deploy/releases/2025.09-stable.yaml` | `deploy/helm/stellaops/values-stage.yaml`, `deploy/helm/stellaops/values-prod.yaml` | `deploy/compose/docker-compose.stage.yaml`, `deploy/compose/docker-compose.prod.yaml` |
| `airgap` | `deploy/releases/2025.09-airgap.yaml` | `deploy/helm/stellaops/values-airgap.yaml` | `deploy/compose/docker-compose.airgap.yaml` |

Infrastructure components (MongoDB, MinIO, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Supporting dependencies such as `nats` remain on upstream LTS tags; review `deploy/compose/*.yaml` for the authoritative set.
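
To see which infrastructure images a manifest actually pins, a grep over the file is often enough. This is a sketch only: it assumes each image reference sits on a single line as `<repo>@sha256:<digest>` or `<repo>:<tag>`, since the manifest schema is not described here, and `infra_refs` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: list infrastructure image references pinned in a release manifest.
# Assumption: each reference sits on one line as <repo>@sha256:<digest> or <repo>:<tag>.
set -euo pipefail

# Print every MongoDB, MinIO, or RustFS image reference found in the given file.
infra_refs() {
  local manifest="$1"
  grep -oE '[A-Za-z0-9./_-]*(mongo|minio|rustfs)[A-Za-z0-9./_-]*(@sha256:[0-9a-f]+|:[A-Za-z0-9._-]+)' \
    "$manifest"
}
```

For example, `infra_refs deploy/releases/2025.10-edge.yaml` should list the MongoDB, MinIO, and RustFS references that the compatibility review in the pre-flight checklist asks you to confirm; `nats` and other auxiliary images are not matched.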

---

## 2. Pre-flight checklist

1. **Refresh release manifest**  
   Pull the latest manifest for the channel you are promoting (`deploy/releases/<version>-<channel>.yaml`).

2. **Align deployment bundles with the manifest**  
   Run the alignment checker for every profile that should pick up the release. Pass `--ignore-repo nats` to skip auxiliary services.
   ```bash
   ./deploy/tools/check-channel-alignment.py \
       --release deploy/releases/2025.10-edge.yaml \
       --target deploy/helm/stellaops/values-dev.yaml \
       --target deploy/compose/docker-compose.dev.yaml \
       --ignore-repo nats
   ```
   Repeat for the other channels (`stable`, `airgap`), substituting the manifest and target files.

3. **Lint and template profiles**
   ```bash
   ./deploy/tools/validate-profiles.sh
   ```

4. **Smoke-test the Offline Kit debug store (edge/stable only)**  
   When the release pipeline has generated `out/release/debug/.build-id/**`, mirror the assets into the Offline Kit staging tree:
   ```bash
   ./ops/offline-kit/mirror_debug_store.py \
       --release-dir out/release \
       --offline-kit-dir out/offline-kit
   ```
   Archive the resulting `out/offline-kit/metadata/debug-store.json` alongside the kit bundle.

5. **Review compatibility matrix**  
   Confirm that the MongoDB, MinIO, and RustFS versions in the release manifest match platform SLOs. The default targets are `mongo@sha256:c258…`, `minio@sha256:14ce…`, and `rustfs:2025.10.0-edge`.

6. **Create a rollback bookmark**  
   Record the current Helm revision (`helm history stellaops -n stellaops`) and Compose tag (`git describe --tags`) before applying changes.
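
The alignment and bookmark steps above lend themselves to a small helper. This is a sketch, not a shipped tool: `channel_files`, `check_all_channels`, `latest_revision`, and `write_bookmark` are hypothetical names, the file lists copy the table in section 1 (update them when manifests roll forward), the checker path can be overridden through a hypothetical `CHECKER` variable, and the revision parser assumes the default table output of `helm history`.

```bash
#!/usr/bin/env bash
# Sketch: run the alignment checker for every channel and record a rollback bookmark.
# Assumptions: run from the repository root; file lists mirror the section 1 table.
set -euo pipefail

# Print the release manifest first, then every deployment target for the channel.
channel_files() {
  case "$1" in
    edge)
      printf '%s\n' deploy/releases/2025.10-edge.yaml \
        deploy/helm/stellaops/values-dev.yaml \
        deploy/compose/docker-compose.dev.yaml ;;
    stable)
      printf '%s\n' deploy/releases/2025.09-stable.yaml \
        deploy/helm/stellaops/values-stage.yaml \
        deploy/helm/stellaops/values-prod.yaml \
        deploy/compose/docker-compose.stage.yaml \
        deploy/compose/docker-compose.prod.yaml ;;
    airgap)
      printf '%s\n' deploy/releases/2025.09-airgap.yaml \
        deploy/helm/stellaops/values-airgap.yaml \
        deploy/compose/docker-compose.airgap.yaml ;;
    *) echo "unknown channel: $1" >&2; return 1 ;;
  esac
}

# Invoke the checker once per channel; CHECKER (hypothetical) can point elsewhere.
check_all_channels() {
  local checker="${CHECKER:-./deploy/tools/check-channel-alignment.py}"
  local channel file files args
  for channel in edge stable airgap; do
    files=()
    while IFS= read -r file; do files+=("$file"); done < <(channel_files "$channel")
    args=(--release "${files[0]}")
    for file in "${files[@]:1}"; do args+=(--target "$file"); done
    "$checker" "${args[@]}" --ignore-repo nats
  done
}

# Print the newest revision number from `helm history` table output on stdin.
latest_revision() {
  awk 'NR > 1 && $1 ~ /^[0-9]+$/ { rev = $1 } END { if (rev == "") exit 1; print rev }'
}

# Write the current Helm revision and Git tag to a bookmark file (hypothetical default name).
write_bookmark() {
  local out="${1:-rollback-bookmark.txt}" rev tag
  rev="$(helm history stellaops -n stellaops | latest_revision)"
  tag="$(git describe --tags)"
  printf 'helm_revision=%s\ngit_tag=%s\n' "$rev" "$tag" > "$out"
}
```

Run `check_all_channels` from the repository root, then `write_bookmark` to capture the Helm revision and `git describe --tags` output before touching any cluster.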

---

## 3. Helm upgrade procedure (staging → production)

1. Switch to the deployment branch and ensure secrets/config maps are current.
2. Apply the upgrade in the staging cluster:
   ```bash
   helm upgrade stellaops deploy/helm/stellaops \
     -f deploy/helm/stellaops/values-stage.yaml \
     --namespace stellaops \
     --atomic \
     --timeout 15m
   ```
3. Run smoke tests (`scripts/smoke-tests.sh` or environment-specific checks).
4. Promote to production by running the same command against the production cluster with `deploy/helm/stellaops/values-prod.yaml`.
5. Record the new revision number and Git SHA in the change log.
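
Because staging and production differ only in the values file, a thin wrapper can keep the two invocations identical. The `values_file` and `helm_upgrade` names are hypothetical, and the sketch assumes your current kubectl context already points at the intended cluster; it does not switch contexts for you.

```bash
#!/usr/bin/env bash
# Sketch: one upgrade function for staging and production.
# Assumption: the active kubectl context already targets the intended cluster.
set -euo pipefail

# Map an environment name to its Helm values file.
values_file() {
  case "$1" in
    stage|prod) echo "deploy/helm/stellaops/values-$1.yaml" ;;
    *) echo "unsupported environment: $1" >&2; return 1 ;;
  esac
}

# Run the documented upgrade command with the environment's values file.
helm_upgrade() {
  local values
  values="$(values_file "$1")"
  helm upgrade stellaops deploy/helm/stellaops \
    -f "$values" \
    --namespace stellaops \
    --atomic \
    --timeout 15m
}
```

A typical run would be `helm_upgrade stage`, the smoke tests, a context switch to production, and then `helm_upgrade prod`.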

### Rollback (Helm)

1. Identify the previous revision: `helm history stellaops -n stellaops`.
2. Execute:
   ```bash
   helm rollback stellaops <revision> \
     --namespace stellaops \
     --wait \
     --timeout 10m
   ```
3. Verify that `kubectl get pods -n stellaops` reports healthy workloads, then rerun the smoke tests.
4. Update the incident/operations log with the root cause and rollback details.
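
Reading the revision number off `helm history` by hand is error-prone under pressure. The sketch below picks the most recent revision whose status is `superseded`, which is normally the release that was live before the last successful upgrade. It assumes the default table output, oldest revision first. After a failed `--atomic` upgrade Helm has already rolled back on its own, so confirm the chosen revision before running it. `previous_revision` and `rollback_to_previous` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: roll back to the last revision Helm marked as superseded.
# Assumption: `helm history` prints its default table, oldest revision first.
set -euo pipefail

# Print the highest revision whose STATUS column reads "superseded".
previous_revision() {
  awk 'NR > 1 { for (i = 2; i <= NF; i++) if ($i == "superseded") rev = $1 }
       END { if (rev == "") exit 1; print rev }'
}

# Resolve the target revision, announce it on stderr, then run the documented rollback.
rollback_to_previous() {
  local rev
  rev="$(helm history stellaops -n stellaops | previous_revision)"
  echo "Rolling back stellaops to revision ${rev}" >&2
  helm rollback stellaops "$rev" \
    --namespace stellaops \
    --wait \
    --timeout 10m
}
```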

---

## 4. Docker Compose upgrade procedure

1. Update environment files (`deploy/compose/env/*.env.example`) with any new settings and sync secrets to hosts.
2. Check out the tagged repository state for the release (e.g. `git checkout 2025.09.2` for stable).
3. Apply the upgrade:
   ```bash
   docker compose \
     --env-file deploy/compose/env/prod.env \
     -f deploy/compose/docker-compose.prod.yaml \
     pull

   docker compose \
     --env-file deploy/compose/env/prod.env \
     -f deploy/compose/docker-compose.prod.yaml \
     up -d
   ```
4. Tail logs for critical services, passing the same `--env-file` and `-f` flags so Compose resolves the right project (e.g. `docker compose --env-file deploy/compose/env/prod.env -f deploy/compose/docker-compose.prod.yaml logs -f authority concelier`).
5. Check monitoring dashboards and alerts to confirm normal operation.
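
Repeating `--env-file` and `-f` on every command is an easy place to mix up profiles. A small wrapper can pin both once. It assumes each profile `<name>` has an env file at `deploy/compose/env/<name>.env` next to `deploy/compose/docker-compose.<name>.yaml`; only `prod.env` is named in this runbook, so check the other profiles before relying on the pattern. `compose_profile` and `compose_upgrade` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: pin --env-file and -f once per profile so every Compose call agrees.
# Assumption: profile <name> uses deploy/compose/env/<name>.env and
# deploy/compose/docker-compose.<name>.yaml (only prod.env is named in this runbook).
set -euo pipefail

# Run any docker compose subcommand against the given profile.
compose_profile() {
  local profile="$1"
  shift
  docker compose \
    --env-file "deploy/compose/env/${profile}.env" \
    -f "deploy/compose/docker-compose.${profile}.yaml" \
    "$@"
}

# Step 3: pull the pinned images, then recreate containers whose image or config changed.
compose_upgrade() {
  compose_profile "$1" pull
  compose_profile "$1" up -d
}
```

For example, `compose_upgrade prod` followed by `compose_profile prod logs -f authority concelier` covers steps 3 and 4 with consistent flags.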

### Rollback (Compose)

1. Check out the previous release tag (e.g. `git checkout 2025.09.1`).
2. Re-run `docker compose pull` and `docker compose up -d` with that profile, using the same `--env-file` and `-f` flags. Compose then recreates the services from the image digests pinned at that tag.
3. If reverting to a known-good snapshot is required, restore volume backups (see `docs/ops/authority-backup-restore.md` and associated service guides).
4. Log the rollback in the operations journal.
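
When the previous release tag is not at hand, it can be derived from `git tag --list`. This sketch assumes release tags follow the plain `YYYY.MM.N` pattern of the examples above (`2025.09.1`, `2025.09.2`) and skips channel-suffixed tags; `previous_tag` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print the release tag that precedes the given one.
# Assumption: release tags follow the plain YYYY.MM.N pattern; suffixed tags are skipped.
set -euo pipefail

# Reads tag names on stdin; prints the tag sorted immediately before "$1".
previous_tag() {
  local current="$1"
  grep -E '^[0-9]{4}\.[0-9]{2}\.[0-9]+$' |
    sort -t. -k1,1n -k2,2n -k3,3n |
    awk -v cur="$current" '
      $0 == cur { found = 1; result = prev }
      { prev = $0 }
      END { if (!found || result == "") exit 1; print result }'
}
```

For example: `git checkout "$(git tag --list | previous_tag "$(git describe --tags --abbrev=0)")"`. Numeric per-field sorting keeps `2025.09.10` after `2025.09.2`, which a plain lexical sort would not.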

---

## 5. Channel promotion workflow

1. Author or update the channel manifest under `deploy/releases/`.
2. Mirror the new digests into Helm/Compose values and run the alignment script for each profile.
3. Commit the changes with a message that references the release version and channel (e.g. `deploy: promote 2025.10.0-edge`).
4. Publish release notes and update `deploy/releases/README.md` (if applicable).
5. Tag the repository when promoting stable or airgap builds.
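
Steps 3 and 5 can be scripted so the commit subject always matches the documented convention. The sketch assumes the subject format `deploy: promote <version>-<channel>` from the example above and defaults the tag name to the bare version (as in `2025.09.2`); pass a third argument if stable and airgap builds need distinct tags. `commit_subject`, `needs_tag`, and `promote_commit` are hypothetical names, and the staged paths are the `deploy/` directories named in this runbook.

```bash
#!/usr/bin/env bash
# Sketch of steps 3 and 5: commit the promotion and tag stable/airgap builds.
# Assumptions: subject format "deploy: promote <version>-<channel>"; the tag defaults
# to the bare version unless a third argument is given.
set -euo pipefail

commit_subject() { printf 'deploy: promote %s-%s\n' "$1" "$2"; }

# Only stable and airgap promotions are tagged.
needs_tag() {
  case "$1" in
    stable|airgap) return 0 ;;
    *) return 1 ;;
  esac
}

promote_commit() {
  local version="$1" channel="$2" tag="${3:-$1}" subject
  subject="$(commit_subject "$version" "$channel")"
  git add deploy/releases deploy/helm deploy/compose
  git commit -m "$subject"
  if needs_tag "$channel"; then
    git tag -a "$tag" -m "$subject"
  fi
}
```

For example, `promote_commit 2025.10.0 edge` commits without tagging, while `promote_commit 2025.09.2 stable` also creates an annotated `2025.09.2` tag.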

---

## 6. Upgrade rehearsal & rollback drill log

Maintain rehearsal notes in `docs/ops/launch-cutover.md` or the relevant sprint planning document. After each drill, capture:

- Release version tested
- Date/time
- Participants
- Issues encountered & fixes
- Rollback duration (if executed)

Attach the log to the sprint retro or operational wiki.

| Date (UTC) | Channel | Outcome | Notes |
|------------|---------|---------|-------|
| 2025-10-26 | Documentation dry-run | Planned | Runbook refreshed; next live drill scheduled for 2025-11 edge → stable promotion. |

---

## 7. References

- `deploy/README.md` – structure and validation workflow for deployment bundles.
- `docs/13_RELEASE_ENGINEERING_PLAYBOOK.md` – release automation and signing pipeline.
- `docs/ARCHITECTURE_DEVOPS.md` – high-level DevOps architecture, SLOs, and compliance requirements.
- `ops/offline-kit/mirror_debug_store.py` – debug-store mirroring helper.
- `deploy/tools/check-channel-alignment.py` – release vs deployment digest alignment checker.