feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules
- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
docs/modules/devops/runbooks/deployment-upgrade.md
							| @@ -0,0 +1,151 @@ | ||||
| # Stella Ops Deployment Upgrade & Rollback Runbook | ||||
|  | ||||
| _Last updated: 2025-10-26 (Sprint 14 – DEVOPS-OPS-14-003)._ | ||||
|  | ||||
| This runbook describes how to promote a new release across the supported deployment profiles (Helm and Docker Compose), how to roll back safely, and how to keep channels (`edge`, `stable`, `airgap`) aligned. All steps assume you are working from a clean checkout of the release branch/tag. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1. Channel overview | ||||
|  | ||||
| | Channel | Release manifest | Helm values | Compose profile | | ||||
| |---------|------------------|-------------|-----------------| | ||||
| | `edge`  | `deploy/releases/2025.10-edge.yaml` | `deploy/helm/stellaops/values-dev.yaml` | `deploy/compose/docker-compose.dev.yaml` | | ||||
| | `stable` | `deploy/releases/2025.09-stable.yaml` | `deploy/helm/stellaops/values-stage.yaml`, `deploy/helm/stellaops/values-prod.yaml` | `deploy/compose/docker-compose.stage.yaml`, `deploy/compose/docker-compose.prod.yaml` | | ||||
| | `airgap` | `deploy/releases/2025.09-airgap.yaml` | `deploy/helm/stellaops/values-airgap.yaml` | `deploy/compose/docker-compose.airgap.yaml` | | ||||
|  | ||||
| Infrastructure components (MongoDB, MinIO, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Supporting dependencies such as `nats` remain on upstream LTS tags; review `deploy/compose/*.yaml` for the authoritative set. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2. Pre-flight checklist | ||||
|  | ||||
| 1. **Refresh release manifest**   | ||||
|    Pull the latest manifest for the channel you are promoting (`deploy/releases/<version>-<channel>.yaml`). | ||||
|  | ||||
| 2. **Align deployment bundles with the manifest**   | ||||
|    Run the alignment checker for every profile that should pick up the release. Pass `--ignore-repo nats` to skip auxiliary services. | ||||
|    ```bash | ||||
|    ./deploy/tools/check-channel-alignment.py \ | ||||
|        --release deploy/releases/2025.10-edge.yaml \ | ||||
|        --target deploy/helm/stellaops/values-dev.yaml \ | ||||
|        --target deploy/compose/docker-compose.dev.yaml \ | ||||
|        --ignore-repo nats | ||||
|    ``` | ||||
|    Repeat for other channels (`stable`, `airgap`), substituting the manifest and target files. | ||||
|  | ||||
| 3. **Lint and template profiles** | ||||
|    ```bash | ||||
|    ./deploy/tools/validate-profiles.sh | ||||
|    ``` | ||||
|  | ||||
| 4. **Smoke the Offline Kit debug store (edge/stable only)**   | ||||
|    When the release pipeline has generated `out/release/debug/.build-id/**`, mirror the assets into the Offline Kit staging tree: | ||||
|    ```bash | ||||
|    ./ops/offline-kit/mirror_debug_store.py \ | ||||
|        --release-dir out/release \ | ||||
|        --offline-kit-dir out/offline-kit | ||||
|    ``` | ||||
|    Archive the resulting `out/offline-kit/metadata/debug-store.json` alongside the kit bundle. | ||||
|  | ||||
| 5. **Review compatibility matrix**   | ||||
|    Confirm MongoDB, MinIO, and RustFS versions in the release manifest match platform SLOs. The default targets are `mongo@sha256:c258…`, `minio@sha256:14ce…`, `rustfs:2025.10.0-edge`. | ||||
|  | ||||
| 6. **Create a rollback bookmark**   | ||||
|    Record the current Helm revision (`helm history stellaops -n stellaops`) and compose tag (`git describe --tags`) before applying changes. | ||||
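The alignment check in step 2 can be pictured as a comparison of pinned images between the release manifest and each target profile. The following is only a minimal sketch of that idea, not the logic of `check-channel-alignment.py`, whose internals are not shown here. The function name and the two input mappings (image repository name to pinned digest or tag) are hypothetical.

```python
from typing import Dict, Iterable, List


def find_misaligned(
    release: Dict[str, str],
    target: Dict[str, str],
    ignore_repos: Iterable[str] = (),
) -> List[str]:
    """Return one message per image in `target` that disagrees with `release`.

    Both mappings are hypothetical: image repository name -> pinned digest or tag.
    """
    ignored = set(ignore_repos)
    problems = []
    for repo, pinned in sorted(target.items()):
        if repo in ignored:
            continue  # auxiliary service skipped on request, e.g. nats
        expected = release.get(repo)
        if expected is None:
            problems.append(f"{repo}: not listed in release manifest")
        elif expected != pinned:
            problems.append(f"{repo}: target has {pinned}, release has {expected}")
    return problems
```

An empty result means the profile matches the manifest for every image that was not ignored; any message should block the promotion until the profile is corrected.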
|  | ||||
| --- | ||||
|  | ||||
| ## 3. Helm upgrade procedure (staging → production) | ||||
|  | ||||
| 1. Switch to the deployment branch and ensure secrets/config maps are current. | ||||
| 2. Apply the upgrade in the staging cluster: | ||||
|    ```bash | ||||
|    helm upgrade stellaops deploy/helm/stellaops \ | ||||
|      -f deploy/helm/stellaops/values-stage.yaml \ | ||||
|      --namespace stellaops \ | ||||
|      --atomic \ | ||||
|      --timeout 15m | ||||
|    ``` | ||||
| 3. Run smoke tests (`scripts/smoke-tests.sh` or environment-specific checks). | ||||
| 4. Promote to production by re-running the same command with `deploy/helm/stellaops/values-prod.yaml` against the production cluster. | ||||
| 5. Record the new revision number and Git SHA in the change log. | ||||
|  | ||||
| ### Rollback (Helm) | ||||
|  | ||||
| 1. Identify the previous revision: `helm history stellaops -n stellaops`. | ||||
| 2. Execute: | ||||
|    ```bash | ||||
|    helm rollback stellaops <revision> \ | ||||
|      --namespace stellaops \ | ||||
|      --wait \ | ||||
|      --timeout 10m | ||||
|    ``` | ||||
| 3. Verify `kubectl get pods` returns healthy workloads; rerun smoke tests. | ||||
| 4. Update the incident/operations log with root cause and rollback details. | ||||
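When the history is long, the candidate revision for step 1 can be picked programmatically. This is a rough sketch that assumes the output of `helm history stellaops -n stellaops -o json` is a JSON array whose entries carry `revision` and `status` fields; the helper name is hypothetical. Treat the result as a suggestion and confirm it against the rollback bookmark recorded during pre-flight.

```python
import json
from typing import Optional


def previous_revision(history_json: str) -> Optional[int]:
    """Return the newest revision marked 'superseded', or None if there is none.

    Assumes `history_json` is the JSON array printed by `helm history ... -o json`.
    """
    entries = json.loads(history_json)
    superseded = [
        entry["revision"]
        for entry in entries
        if entry.get("status") == "superseded"
    ]
    return max(superseded) if superseded else None
```

If earlier rollbacks already appear in the history, the newest superseded revision may itself be a bad release, so a human check of the `description` column is still advisable.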
|  | ||||
| --- | ||||
|  | ||||
| ## 4. Docker Compose upgrade procedure | ||||
|  | ||||
| 1. Update environment files (`deploy/compose/env/*.env.example`) with any new settings and sync secrets to hosts. | ||||
| 2. Pull the tagged repository state corresponding to the release (e.g. `git checkout 2025.09.2` for stable). | ||||
| 3. Apply the upgrade: | ||||
|    ```bash | ||||
|    docker compose \ | ||||
|      --env-file deploy/compose/env/prod.env \ | ||||
|      -f deploy/compose/docker-compose.prod.yaml \ | ||||
|      pull | ||||
|  | ||||
|    docker compose \ | ||||
|      --env-file deploy/compose/env/prod.env \ | ||||
|      -f deploy/compose/docker-compose.prod.yaml \ | ||||
|      up -d | ||||
|    ``` | ||||
| 4. Tail logs for critical services (`docker compose logs -f authority concelier`). | ||||
| 5. Check monitoring dashboards/alerts to confirm normal operation. | ||||
|  | ||||
| ### Rollback (Compose) | ||||
|  | ||||
| 1. Check out the previous release tag (e.g. `git checkout 2025.09.1`). | ||||
| 2. Re-run `docker compose pull` and `docker compose up -d` with that profile, using the same `--env-file` and `-f` arguments as the upgrade. Compose recreates the services from the images pinned at that tag. | ||||
| 3. If reverting to a known-good snapshot is required, restore volume backups (see `docs/modules/authority/operations/backup-restore.md` and associated service guides). | ||||
| 4. Log the rollback in the operations journal. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5. Channel promotion workflow | ||||
|  | ||||
| 1. Author or update the channel manifest under `deploy/releases/`. | ||||
| 2. Mirror the new digests into Helm/Compose values and run the alignment script for each profile. | ||||
| 3. Commit the changes with a message that references the release version and channel (e.g. `deploy: promote 2025.10.0-edge`). | ||||
| 4. Publish release notes and update `deploy/releases/README.md` (if applicable). | ||||
| 5. Tag the repository when promoting stable or airgap builds. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6. Upgrade rehearsal & rollback drill log | ||||
|  | ||||
| Maintain rehearsal notes in `docs/modules/devops/runbooks/launch-cutover.md` or the relevant sprint planning document. After each drill capture: | ||||
|  | ||||
| - Release version tested | ||||
| - Date/time | ||||
| - Participants | ||||
| - Issues encountered & fixes | ||||
| - Rollback duration (if executed) | ||||
|  | ||||
| Attach the log to the sprint retro or operational wiki. | ||||
|  | ||||
| | Date (UTC) | Channel | Outcome | Notes | | ||||
| |------------|---------|---------|-------| | ||||
| | 2025-10-26 | Documentation dry-run | Planned | Runbook refreshed; next live drill scheduled for 2025-11 edge → stable promotion. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7. References | ||||
|  | ||||
| - `deploy/README.md` – structure and validation workflow for deployment bundles. | ||||
| - `docs/13_RELEASE_ENGINEERING_PLAYBOOK.md` – release automation and signing pipeline. | ||||
| - `docs/modules/devops/architecture.md` – high-level DevOps architecture, SLOs, and compliance requirements. | ||||
| - `ops/offline-kit/mirror_debug_store.py` – debug-store mirroring helper. | ||||
| - `deploy/tools/check-channel-alignment.py` – release vs deployment digest alignment checker. | ||||
							
								
								
									
docs/modules/devops/runbooks/launch-cutover.md
							| @@ -0,0 +1,128 @@ | ||||
| # Launch Cutover Runbook - Stella Ops | ||||
|  | ||||
| _Document owner: DevOps Guild (2025-10-26)_   | ||||
| _Scope:_ Full-platform launch from staging to production for release `2025.09.2`. | ||||
|  | ||||
| ## 1. Roles and Communication | ||||
|  | ||||
| | Role | Primary | Backup | Contact | | ||||
| | --- | --- | --- | --- | | ||||
| | Cutover lead | DevOps Guild (on-call engineer) | Platform Ops lead | `#launch-bridge` (Mattermost) | | ||||
| | Authority stack | Authority Core guild rep | Security guild rep | `#authority` | | ||||
| | Scanner / Queue | Scanner WebService guild rep | Runtime guild rep | `#scanner` | | ||||
| | Storage | Mongo/MinIO operators | Backup DB admin | Pager escalation | | ||||
| | Observability | Telemetry guild rep | SRE on-call | `#telemetry` | | ||||
| | Approvals | Product owner + CTO | DevOps lead | Approval recorded in change ticket | | ||||
|  | ||||
| Set up a bridge call 30 minutes before start and keep `#launch-bridge` updated every 10 minutes. | ||||
|  | ||||
| ## 2. Timeline Overview (UTC) | ||||
|  | ||||
| | Time | Activity | Owner | | ||||
| | --- | --- | --- | | ||||
| | T-24h | Change ticket approved, prod secrets verified, offline kit build status checked (`DEVOPS-OFFLINE-18-005`). | DevOps lead | | ||||
| | T-12h | Run `deploy/tools/validate-profiles.sh`; capture logs in ticket. | DevOps engineer | | ||||
| | T-6h | Freeze non-launch deployments; notify guild leads. | Product owner | | ||||
| | T-2h | Execute rehearsal in staging (Section 3) using `values-stage.yaml` to verify scripts. | DevOps + module reps | | ||||
| | T-30m | Final go/no-go with guild leads; confirm monitoring dashboards green. | Cutover lead | | ||||
| | T0 | Execute production cutover steps (Section 4). | Cutover team | | ||||
| | T+45m | Smoke tests complete (Section 5); announce success or trigger rollback. | Cutover lead | | ||||
| | T+4h | Post-cutover metrics review, notify stakeholders, close ticket. | DevOps + product owner | | ||||
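Once T0 is agreed, the relative offsets in the table can be turned into concrete UTC timestamps for the change ticket. The sketch below copies the offsets from the table; the function name is hypothetical, and the output format mirrors the `2025-10-26T14:05Z` style used elsewhere in these runbooks.

```python
from datetime import datetime, timedelta, timezone

# Offsets copied from the timeline table, relative to T0.
OFFSETS = {
    "T-24h": timedelta(hours=-24),
    "T-12h": timedelta(hours=-12),
    "T-6h": timedelta(hours=-6),
    "T-2h": timedelta(hours=-2),
    "T-30m": timedelta(minutes=-30),
    "T0": timedelta(0),
    "T+45m": timedelta(minutes=45),
    "T+4h": timedelta(hours=4),
}


def schedule(t0: datetime) -> dict:
    """Map each milestone label to a UTC timestamp string."""
    if t0.tzinfo is None:
        raise ValueError("t0 must be timezone-aware")
    t0_utc = t0.astimezone(timezone.utc)
    return {
        label: (t0_utc + delta).strftime("%Y-%m-%dT%H:%MZ")
        for label, delta in OFFSETS.items()
    }
```

Requiring a timezone-aware `t0` avoids silently interpreting a local wall-clock time as UTC.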
|  | ||||
| ## 3. Rehearsal (Staging) Checklist | ||||
|  | ||||
| 1. `docker network create stellaops_frontdoor || true` (if not present on staging jump host). | ||||
| 2. Run `deploy/tools/validate-profiles.sh` and archive output. | ||||
| 3. Apply staging secrets (`kubectl apply -f secrets/stage/*.yaml` or `helm secrets upgrade`) ensuring `stellaops-stage` credentials align with `values-stage.yaml`. | ||||
| 4. Perform `helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-stage.yaml` in staging cluster. | ||||
| 5. Verify health endpoints: `curl https://authority.stage.../healthz`, `curl https://scanner.stage.../healthz`. | ||||
| 6. Execute smoke CLI: `stellaops-cli scan submit --profile staging --sbom samples/sbom/demo.json` and confirm report status in UI. | ||||
| 7. Document total wall time and any deviations in the rehearsal log. | ||||
|  | ||||
| Rehearsal must complete without manual interventions before proceeding to production. | ||||
|  | ||||
| ## 4. Production Cutover Steps | ||||
|  | ||||
| ### 4.1 Pre-flight | ||||
| - Confirm production secrets in the appropriate secret store (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`) contain the keys referenced in `values-prod.yaml`. | ||||
| - Ensure the external reverse proxy network exists: `docker network create stellaops_frontdoor || true` on each compose host. | ||||
| - Back up current configuration and data: | ||||
|   - Mongo snapshot: `mongodump --uri "$MONGO_BACKUP_URI" --out /backups/launch-$(date -Iseconds)`. | ||||
|   - MinIO bucket mirror: `mc mirror --overwrite minio/stellaops minio-backup/stellaops-$(date +%Y%m%d%H%M)`. | ||||
|  | ||||
| ### 4.2 Apply Updates (Compose) | ||||
| 1. On each compose node, pull updated images for release `2025.09.2`: | ||||
|    ```bash | ||||
|    docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml pull | ||||
|    ``` | ||||
| 2. Deploy changes: | ||||
|    ```bash | ||||
|    docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml up -d | ||||
|    ``` | ||||
| 3. Confirm containers are healthy via `docker compose ps` and `docker logs <service> --tail 50`. | ||||
|  | ||||
| ### 4.3 Apply Updates (Helm/Kubernetes) | ||||
| If using Kubernetes, perform: | ||||
| ```bash | ||||
| helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml --namespace stellaops --atomic --timeout 15m | ||||
| ``` | ||||
| Monitor rollout with `kubectl get pods -n stellaops --watch` and `kubectl rollout status deployment/<service>`. | ||||
|  | ||||
| ### 4.4 Configuration Validation | ||||
| - Verify Authority issuer metadata: `curl https://authority.prod.../.well-known/openid-configuration`. | ||||
| - Validate Signer DSSE endpoint: `stellaops-cli signer verify --base-url https://signer.prod... --bundle samples/dsse/demo.json`. | ||||
| - Check Scanner queue connectivity: `docker exec stellaops-scanner-web dotnet StellaOps.Scanner.WebService.dll health queue` (returns success). | ||||
| - Ensure Notify (legacy) is still accessible while the Notifier migration is pending. | ||||
|  | ||||
| ## 5. Smoke Tests | ||||
|  | ||||
| | Test | Command / Action | Expected Result | | ||||
| | --- | --- | --- | | ||||
| | API health | `curl https://scanner.prod.../healthz` | HTTP 200 with `"status":"Healthy"` | | ||||
| | Scan submit | `stellaops-cli scan submit --profile prod --sbom samples/sbom/demo.json` | Scan completes < 5 minutes; report accessible with signed DSSE | | ||||
| | Runtime event ingest | Post sample event from Zastava observer fixture | `/runtime/events` responds 202 Accepted; record visible in Mongo `runtime_events` | | ||||
| | Signing | `stellaops-cli signer sign --bundle demo.json` | Returns DSSE with matching SHA256 and signer metadata | | ||||
| | Attestor verify | `stellaops-cli attestor verify --uuid <uuid>` | Verification result `ok=true` | | ||||
| | Web UI | Manual login, verify dashboards render and latency within budget | UI loads under 2 seconds; policy views consistent | | ||||
|  | ||||
| Log results in the change ticket with timestamps and screenshots where applicable. | ||||
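The API health row can be evaluated mechanically when results are being logged. This is a minimal sketch, assuming the `/healthz` body is a JSON object with a top-level `status` field as the expected result suggests; the helper name is hypothetical and the HTTP call itself is left to `curl` or whichever client is in use.

```python
import json


def is_healthy(status_code: int, body: str) -> bool:
    """Return True only for HTTP 200 with a JSON body whose status is 'Healthy'."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # non-JSON body, e.g. a proxy error page
    return isinstance(payload, dict) and payload.get("status") == "Healthy"
```

Checking the body as well as the status code guards against a reverse proxy answering 200 on behalf of a service that is not actually up.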
|  | ||||
| ## 6. Rollback Procedure | ||||
|  | ||||
| 1. Assess failure scope; if systemic, initiate rollback immediately while preserving logs/artifacts. | ||||
| 2. For Compose: | ||||
|    ```bash | ||||
|    docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml down | ||||
|    docker compose --env-file stage.env -f deploy/compose/docker-compose.stage.yaml up -d | ||||
|    ``` | ||||
| 3. For Helm: | ||||
|    ```bash | ||||
|    helm rollback stellaops <previous-release-number> --namespace stellaops | ||||
|    ``` | ||||
| 4. Restore the Mongo snapshot if data inconsistency is detected: `mongorestore --uri "$MONGO_BACKUP_URI" --drop /backups/launch-<timestamp>`. | ||||
| 5. Restore MinIO mirror if required: `mc mirror minio-backup/stellaops-<timestamp> minio/stellaops`. | ||||
| 6. Notify stakeholders of rollback and capture root cause notes in incident ticket. | ||||
|  | ||||
| ## 7. Post-cutover Actions | ||||
|  | ||||
| - Keep heightened monitoring for 4 hours post cutover; track latency, error rates, and queue depth. | ||||
| - Confirm audit trails: Authority tokens issued, Scanner events recorded, Attestor submissions stored. | ||||
| - Update `docs/modules/devops/runbooks/launch-readiness.md` if any new gaps or follow-ups are discovered. | ||||
| - Schedule retrospective within 48 hours; include DevOps, module guilds, and product owner. | ||||
|  | ||||
| ## 8. Approval Matrix | ||||
|  | ||||
| | Step | Required Approvers | Record Location | | ||||
| | --- | --- | --- | | ||||
| | Production deployment plan | CTO + DevOps lead | Change ticket comment | | ||||
| | Cutover start (T0) | DevOps lead + module reps | `#launch-bridge` summary | | ||||
| | Post-smoke success | DevOps lead + product owner | Change ticket closure | | ||||
| | Rollback (if invoked) | DevOps lead + CTO | Incident ticket | | ||||
|  | ||||
| Retain all approvals and logs for audit. Update this runbook after each execution to record actual timings and lessons learned. | ||||
|  | ||||
| ## 9. Rehearsal Log | ||||
|  | ||||
| | Date (UTC) | What We Exercised | Outcome | Follow-up | | ||||
| | --- | --- | --- | --- | | ||||
| | 2025-10-26 | Dry-run of compose/Helm validation via `deploy/tools/validate-profiles.sh` (dev/stage/prod/airgap/mirror). Network creation simulated (`docker network create stellaops_frontdoor` planned) and stage CLI submission reviewed. | Validation script succeeded; all profiles templated cleanly. Stage deployment apply deferred because no staging cluster is accessible from the current environment. | Schedule full stage rehearsal once staging cluster credentials are available; reuse this log section to capture timings. | | ||||
							
								
								
									
docs/modules/devops/runbooks/launch-readiness.md
							| @@ -0,0 +1,49 @@ | ||||
| # Launch Readiness Record - Stella Ops | ||||
|  | ||||
| _Updated: 2025-10-26 (UTC)_ | ||||
|  | ||||
| This document captures production launch sign-offs, deployment readiness checkpoints, and any open risks that must be tracked before GA cutover. | ||||
|  | ||||
| ## 1. Sign-off Summary | ||||
|  | ||||
| | Module / Service | Guild / Point of Contact | Evidence (Task or Runbook) | Status | Timestamp (UTC) | Notes | | ||||
| | --- | --- | --- | --- | --- | --- | | ||||
| | Authority (Issuer) | Authority Core Guild | `AUTH-AOC-19-001` - scope issuance & configuration complete (DONE 2025-10-26) | READY | 2025-10-26T14:05Z | Tenant scope propagation follow-up (`AUTH-AOC-19-002`) tracked in gaps section. | | ||||
| | Signer | Signer Guild | `SIGNER-API-11-101` / `SIGNER-REF-11-102` / `SIGNER-QUOTA-11-103` (DONE 2025-10-21) | READY | 2025-10-26T14:07Z | DSSE signing, referrer verification, and quota enforcement validated in CI. | | ||||
| | Attestor | Attestor Guild | `ATTESTOR-API-11-201` / `ATTESTOR-VERIFY-11-202` / `ATTESTOR-OBS-11-203` (DONE 2025-10-19) | READY | 2025-10-26T14:10Z | Rekor submission/verification pipeline green; telemetry pack published. | | ||||
| | Scanner Web + Worker | Scanner WebService Guild | `SCANNER-WEB-09-10x`, `SCANNER-RUNTIME-12-30x` (DONE 2025-10-18 -> 2025-10-24) | READY* | 2025-10-26T14:20Z | Orchestrator envelope work (`SCANNER-EVENTS-16-301/302`) still open; see gaps. | | ||||
| | Concelier Core & Connectors | Concelier Core / Ops Guild | Ops runbook sign-off in `docs/modules/concelier/operations/conflict-resolution.md` (2025-10-16) | READY | 2025-10-26T14:25Z | Conflict resolution & connector coverage accepted; Mongo schema hardening pending (see gaps). | | ||||
| | Excititor API | Excititor Core Guild | Wave 0 connector ingest sign-offs (EXECPLAN.Section  Wave 0) | READY | 2025-10-26T14:28Z | VEX linkset publishing complete for launch datasets. | | ||||
| | Notify Web (legacy) | Notify Guild | Existing stack carried forward; Notifier program tracked separately (Sprint 38-40) | PENDING | 2025-10-26T14:32Z | Legacy notify web remains operational; migration to Notifier blocked on `SCANNER-EVENTS-16-301`. | | ||||
| | Web UI | UI Guild | Stable build `registry.stella-ops.org/.../web-ui@sha256:10d9248...` deployed in stage and smoke-tested | READY | 2025-10-26T14:35Z | Policy editor GA items (Sprint 20) outside launch scope. | | ||||
| | DevOps / Release | DevOps Guild | `deploy/tools/validate-profiles.sh` run (2025-10-26) covering dev/stage/prod/airgap/mirror | READY | 2025-10-26T15:02Z | Compose/Helm lint + docker compose config validated; see Section 2 for details. | | ||||
| | Offline Kit | Offline Kit Guild | `DEVOPS-OFFLINE-18-004` (Go analyzer) and `DEVOPS-OFFLINE-18-005` (Python analyzer) complete; debug-store mirror pending (`DEVOPS-OFFLINE-17-004`). | PENDING | 2025-10-26T15:05Z | Awaiting release debug artefacts to finalise `DEVOPS-OFFLINE-17-004`; tracked in Section 3. | | ||||
|  | ||||
| _\* READY with caveat - remaining work noted in Section 3._ | ||||
|  | ||||
| ## 2. Deployment Readiness Checklist | ||||
|  | ||||
| - **Production profiles committed:** `deploy/compose/docker-compose.prod.yaml` and `deploy/helm/stellaops/values-prod.yaml` added with front-door network hand-off and secret references for Mongo/MinIO/core services. | ||||
| - **Secrets placeholders documented:** `deploy/compose/env/prod.env.example` enumerates required credentials (`MONGO_INITDB_ROOT_PASSWORD`, `MINIO_ROOT_PASSWORD`, Redis/NATS endpoints, `FRONTDOOR_NETWORK`). Helm values reference Kubernetes secrets (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`). | ||||
| - **Static validation executed:** `deploy/tools/validate-profiles.sh` run on 2025-10-26 (docker compose config + helm lint/template) with all profiles passing. | ||||
| - **Ingress model defined:** Production compose profile introduces external `frontdoor` network; README updated with creation instructions and scope of externally reachable services. | ||||
| - **Observability hooks:** Authority/Signer/Attestor telemetry packs verified; scanner runtime build-id metrics landed (`SCANNER-RUNTIME-17-401`). Grafana dashboards referenced in component runbooks. | ||||
| - **Rollback assets:** Stage Compose profile remains aligned (`docker-compose.stage.yaml`), enabling rehearsals before prod cutover; release manifests (`deploy/releases/2025.09-stable.yaml`) map digests for reproducible rollback. | ||||
| - **Rehearsal status:** 2025-10-26 validation dry-run executed (`deploy/tools/validate-profiles.sh` across dev/stage/prod/airgap/mirror). Full stage Helm rollout pending access to the managed staging cluster; target to complete once credentials are provisioned. | ||||
|  | ||||
| ## 3. Outstanding Gaps & Follow-ups | ||||
|  | ||||
| | Item | Owner | Tracking Ref | Target / Next Step | Impact | | ||||
| | --- | --- | --- | --- | --- | | ||||
| | Tenant scope propagation and audit coverage | Authority Core Guild | `AUTH-AOC-19-002` (DOING 2025-10-26) | Land enforcement + audit fixtures by Sprint 19 freeze | Medium - required for multi-tenant GA but does not block initial cutover if tenants scoped manually. | | ||||
| | Orchestrator event envelopes + Notifier handshake | Scanner WebService Guild | `SCANNER-EVENTS-16-301` (BLOCKED), `SCANNER-EVENTS-16-302` (DOING) | Coordinate with Gateway/Notifier owners on preview package replacement or binding redirects; rerun `dotnet test` once patch lands and refresh schema docs. Share envelope samples in `docs/events/` after tests pass. | High - gating Notifier migration; legacy notify path remains functional meanwhile. | | ||||
| | Offline Kit Python analyzer bundle | Offline Kit Guild + Scanner Guild | `DEVOPS-OFFLINE-18-005` (DONE 2025-10-26) | Monitor for follow-up manifest updates and rerun smoke script when analyzers change. | Medium - ensures language analyzer coverage stays current for offline installs. | | ||||
| | Offline Kit debug store mirror | Offline Kit Guild + DevOps Guild | `DEVOPS-OFFLINE-17-004` (BLOCKED 2025-10-26) | Release pipeline must publish `out/release/debug` artefacts; once available, run `mirror_debug_store.py` and commit `metadata/debug-store.json`. | Low - symbol lookup remains accessible from staging assets but required before next Offline Kit tag. | | ||||
| | Mongo schema validators for advisory ingestion | Concelier Storage Guild | `CONCELIER-STORE-AOC-19-001` (TODO) | Finalize JSON schema + migration toggles; coordinate with Ops for rollout window | Low - current validation handled in app layer; schema guard adds defense-in-depth. | | ||||
| | Authority plugin telemetry alignment | Security Guild | `SEC2.PLG`, `SEC3.PLG`, `SEC5.PLG` (BLOCKED pending AUTH DPoP/MTLS tasks) | Resume once upstream auth surfacing stabilises | Low - plugin remains optional; launch uses default Authority configuration. | | ||||
|  | ||||
| ## 4. Approvals & Distribution | ||||
|  | ||||
| - Record shared in `#launch-readiness` (Mattermost) 2025-10-26 15:15 UTC with DevOps + Guild leads for acknowledgement. | ||||
| - Updates to this document require dual sign-off from DevOps Guild (owner) and impacted module guild lead; retain change log via Git history. | ||||
| - Cutover rehearsal and rollback drills are tracked separately in `docs/modules/devops/runbooks/launch-cutover.md` (see associated Task `DEVOPS-LAUNCH-18-001`). | ||||
							
								
								
									
docs/modules/devops/runbooks/nuget-preview-bootstrap.md
							| @@ -0,0 +1,64 @@ | ||||
| # NuGet Preview Bootstrap (Offline-Friendly) | ||||
|  | ||||
| The StellaOps build relies on .NET 10 RC2 packages (Microsoft.Extensions.*, JwtBearer 10.0 RC). | ||||
| `NuGet.config` now wires three sources: | ||||
|  | ||||
| 1. `local` → `./local-nuget` (preferred, air-gapped mirror) | ||||
| 2. `dotnet-public` → `https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-public/nuget/v3/index.json` | ||||
| 3. `nuget.org` → fallback for everything else | ||||
|  | ||||
| Follow the steps below whenever you refresh the repo or roll a new Offline Kit drop. | ||||
|  | ||||
| ## 1. Mirror the preview packages | ||||
|  | ||||
| ```bash | ||||
| ./ops/devops/sync-preview-nuget.sh | ||||
| ``` | ||||
|  | ||||
| * Reads `ops/devops/nuget-preview-packages.csv`. Each line specifies the package, version, expected SHA-256 hash, and (optionally) the flat-container base URL (we pin to `dotnet-public`). | ||||
| * Downloads the `.nupkg` straight into `./local-nuget/` and re-verifies the checksum. Existing files are skipped when hashes already match. | ||||
| * Use `NUGET_V2_BASE` if you need to temporarily point at a different mirror. | ||||
|  | ||||
| 💡 The script never mutates packages in place—if a checksum changes you will see a “SHA mismatch … refreshing” message. | ||||
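The checksum behaviour described above can be illustrated as follows. This is a sketch of the idea only, not the contents of `sync-preview-nuget.sh`: it assumes the manifest has no header row and uses the column order given above (package, version, SHA-256, optional base URL). All function names are hypothetical.

```python
import csv
import hashlib
import io
from pathlib import Path


def parse_manifest(text: str):
    """Yield (package, version, sha256, base_url) tuples; base_url may be ''."""
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue  # blank or incomplete line
        package, version, sha256 = (field.strip() for field in row[:3])
        base_url = row[3].strip() if len(row) > 3 else ""
        yield package, version, sha256.lower(), base_url


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large packages are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_refresh(path: Path, expected_sha256: str) -> bool:
    """True when the package is missing or its hash differs from the manifest."""
    return not path.exists() or sha256_of(path) != expected_sha256.lower()
```

A package for which `needs_refresh` returns `False` corresponds to the "existing files are skipped" case; anything else would be downloaded again and re-verified.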
|  | ||||
| ## 2. Restore using the shared `NuGet.config` | ||||
|  | ||||
| From the repo root: | ||||
|  | ||||
| ```bash | ||||
| DOTNET_NOLOGO=1 dotnet restore src/Excititor/__Libraries/StellaOps.Excititor.Connectors.Abstractions/StellaOps.Excititor.Connectors.Abstractions.csproj \ | ||||
|   --configfile NuGet.config | ||||
| ``` | ||||
|  | ||||
| The `packageSourceMapping` section keeps `Microsoft.Extensions.*`, `Microsoft.AspNetCore.*`, and `Microsoft.Data.Sqlite` bound to `local`/`dotnet-public`, so `dotnet restore` never has to reach out to nuget.org when mirrors are populated. | ||||
|  | ||||
| Before committing changes (or when wiring up a new environment) run: | ||||
|  | ||||
| ```bash | ||||
| python3 ops/devops/validate_restore_sources.py | ||||
| ``` | ||||
|  | ||||
| The validator asserts: | ||||
|  | ||||
| - `NuGet.config` lists `local` → `dotnet-public` → `nuget.org` in that order. | ||||
| - `Directory.Build.props` pins `RestoreSources` so every project prioritises the local mirror. | ||||
| - No stray `NuGet.config` files shadow the repo root configuration. | ||||
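The first assertion, source ordering, can be checked with the standard library alone. The sketch below is illustrative and is not the code of `validate_restore_sources.py`; it assumes the usual `NuGet.config` layout with `<add key="..."/>` entries under `<packageSources>`, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Order required by the validator, as listed above.
EXPECTED_ORDER = ["local", "dotnet-public", "nuget.org"]


def source_keys(nuget_config_xml: str) -> list:
    """Return the package source keys in document order."""
    root = ET.fromstring(nuget_config_xml)
    sources = root.find("packageSources")
    if sources is None:
        return []
    return [entry.get("key") for entry in sources.findall("add")]


def sources_in_expected_order(nuget_config_xml: str) -> bool:
    """True when the config lists exactly the expected sources, in order."""
    return source_keys(nuget_config_xml) == EXPECTED_ORDER
```

For example, `sources_in_expected_order(Path("NuGet.config").read_text())` could be run from the repo root before a restore, after importing `Path` from `pathlib`.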
|  | ||||
| CI executes the validator in both the `build-test-deploy` and `release` workflows, | ||||
| so regressions trip before any restore/build begins. | ||||
|  | ||||
| If you run fully air-gapped, remember to clear the cache between SDK upgrades: | ||||
|  | ||||
| ```bash | ||||
| dotnet nuget locals all --clear | ||||
| ``` | ||||
|  | ||||
| ## 3. Troubleshooting | ||||
|  | ||||
| | Symptom | Fix | | ||||
| | --- | --- | | ||||
| | `dotnet restore` still hits nuget.org for preview packages | Re-run `sync-preview-nuget.sh` to ensure the `.nupkg` exists locally, then delete `~/.nuget/packages/microsoft.extensions.*` so the resolver picks up the mirrored copy. | | ||||
| | SHA mismatch in the manifest | Update `ops/devops/nuget-preview-packages.csv` with the new version + checksum (from the feed) and re-run the sync script. | | ||||
| | Azure DevOps feed throttling | Set `DOTNET_PUBLIC_FLAT_BASE` env var and point it at your own mirrored flat-container, then add the URL to the 4th column of the manifest. | | ||||
|  | ||||
| Keep this doc alongside Offline Kit instructions so air-gapped operators know exactly how to refresh the mirror and verify packages before restore. | ||||