audit work, fixed StellaOps.sln warnings/errors, fixed tests, sprints work, new advisories
151
docs/operations/devops/runbooks/deployment-upgrade.md
Normal file
@@ -0,0 +1,151 @@
|
||||
# Stella Ops Deployment Upgrade & Rollback Runbook
|
||||
|
||||
_Last updated: 2025-10-26 (Sprint 14 – DEVOPS-OPS-14-003)._
|
||||
|
||||
This runbook describes how to promote a new release across the supported deployment profiles (Helm and Docker Compose), how to roll back safely, and how to keep channels (`edge`, `stable`, `airgap`) aligned. All steps assume you are working from a clean checkout of the release branch/tag.
|
||||
|
||||
---
|
||||
|
||||
## 1. Channel overview
|
||||
|
||||
| Channel | Release manifest | Helm values | Compose profile |
|---------|------------------|-------------|-----------------|
| `edge` | `deploy/releases/2025.10-edge.yaml` | `devops/helm/stellaops/values-dev.yaml` | `devops/compose/docker-compose.dev.yaml` |
| `stable` | `deploy/releases/2025.09-stable.yaml` | `devops/helm/stellaops/values-stage.yaml`, `devops/helm/stellaops/values-prod.yaml` | `devops/compose/docker-compose.stage.yaml`, `devops/compose/docker-compose.prod.yaml` |
| `airgap` | `deploy/releases/2025.09-airgap.yaml` | `devops/helm/stellaops/values-airgap.yaml` | `devops/compose/docker-compose.airgap.yaml` |
|
||||
|
||||
Infrastructure components (PostgreSQL, Valkey, MinIO, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Supporting dependencies such as `nats` remain on upstream LTS tags; review `devops/compose/*.yaml` for the authoritative set.
|
||||
|
||||
---
|
||||
|
||||
## 2. Pre-flight checklist
|
||||
|
||||
1. **Refresh release manifest**
|
||||
Pull the latest manifest for the channel you are promoting (`deploy/releases/<version>-<channel>.yaml`).
|
||||
|
||||
2. **Align deployment bundles with the manifest**
|
||||
Run the alignment checker for every profile that should pick up the release. Pass `--ignore-repo nats` to skip auxiliary services.
|
||||
```bash
./deploy/tools/check-channel-alignment.py \
  --release deploy/releases/2025.10-edge.yaml \
  --target devops/helm/stellaops/values-dev.yaml \
  --target devops/compose/docker-compose.dev.yaml \
  --ignore-repo nats
```
|
||||
Repeat for other channels (`stable`, `airgap`), substituting the manifest and target files.
|
||||
|
||||
3. **Lint and template profiles**
|
||||
```bash
./deploy/tools/validate-profiles.sh
```
|
||||
|
||||
4. **Smoke the Offline Kit debug store (edge/stable only)**
|
||||
When the release pipeline has generated `out/release/debug/.build-id/**`, mirror the assets into the Offline Kit staging tree:
|
||||
```bash
./ops/offline-kit/mirror_debug_store.py \
  --release-dir out/release \
  --offline-kit-dir out/offline-kit
```
|
||||
Archive the resulting `out/offline-kit/metadata/debug-store.json` alongside the kit bundle.
|
||||
|
||||
5. **Review compatibility matrix**
|
||||
Confirm PostgreSQL, Valkey, and RustFS versions in the release manifest match platform SLOs. The default targets are `postgres:16-alpine`, `valkey:8.0`, `rustfs:2025.10.0-edge`.
|
||||
|
||||
6. **Create a rollback bookmark**
|
||||
Record the current Helm revision (`helm history stellaops -n stellaops`) and compose tag (`git describe --tags`) before applying changes.
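Step 2 has to be repeated for each channel, which is easy to get wrong by hand. The following is a minimal sketch of a wrapper, not part of the repository: the function names (`manifest_for_channel`, `targets_for_channel`, `check_channel`) are hypothetical, and the channel-to-file mapping is copied from the table in Section 1, so it needs updating whenever that table changes.

```bash
# Sketch only: helper names are hypothetical; paths come from the Section 1 channel table.

# Print the release manifest for a channel.
manifest_for_channel() {
  case "$1" in
    edge)   echo "deploy/releases/2025.10-edge.yaml" ;;
    stable) echo "deploy/releases/2025.09-stable.yaml" ;;
    airgap) echo "deploy/releases/2025.09-airgap.yaml" ;;
    *)      echo "unknown channel: $1" >&2; return 1 ;;
  esac
}

# Print the Helm values and Compose files for a channel, one per line.
targets_for_channel() {
  case "$1" in
    edge)
      echo "devops/helm/stellaops/values-dev.yaml"
      echo "devops/compose/docker-compose.dev.yaml" ;;
    stable)
      echo "devops/helm/stellaops/values-stage.yaml"
      echo "devops/helm/stellaops/values-prod.yaml"
      echo "devops/compose/docker-compose.stage.yaml"
      echo "devops/compose/docker-compose.prod.yaml" ;;
    airgap)
      echo "devops/helm/stellaops/values-airgap.yaml"
      echo "devops/compose/docker-compose.airgap.yaml" ;;
    *)
      echo "unknown channel: $1" >&2; return 1 ;;
  esac
}

# Run the alignment checker for one channel, passing every target file.
check_channel() {
  local channel="$1" manifest target
  manifest="$(manifest_for_channel "$channel")" || return 1
  set -- --release "$manifest"
  # The paths contain no whitespace, so word splitting is safe here.
  for target in $(targets_for_channel "$channel"); do
    set -- "$@" --target "$target"
  done
  ./deploy/tools/check-channel-alignment.py "$@" --ignore-repo nats
}
```

From a clean checkout this could be driven with `for c in edge stable airgap; do check_channel "$c" || break; done`.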
|
||||
|
||||
---
|
||||
|
||||
## 3. Helm upgrade procedure (staging → production)
|
||||
|
||||
1. Switch to the deployment branch and ensure secrets/config maps are current.
|
||||
2. Apply the upgrade in the staging cluster:
|
||||
```bash
helm upgrade stellaops devops/helm/stellaops \
  -f devops/helm/stellaops/values-stage.yaml \
  --namespace stellaops \
  --atomic \
  --timeout 15m
```
|
||||
3. Run smoke tests (`scripts/smoke-tests.sh` or environment-specific checks).
|
||||
4. Promote to production using the prod values file and the same command.
|
||||
5. Record the new revision number and Git SHA in the change log.
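Step 4 reuses the staging command with only the values file swapped. As a hedged illustration, a small wrapper keeps the two invocations identical apart from that file. `helm_upgrade_env` is a hypothetical name, and the flags are exactly those from step 2.

```bash
# Sketch only: `helm_upgrade_env` is a hypothetical wrapper around the command from step 2.
helm_upgrade_env() {
  local env="$1"   # "stage" or "prod"
  case "$env" in
    stage|prod) ;;
    *) echo "usage: helm_upgrade_env stage|prod" >&2; return 2 ;;
  esac
  # Same release, chart, and flags for both environments; only the values file differs.
  helm upgrade stellaops devops/helm/stellaops \
    -f "devops/helm/stellaops/values-${env}.yaml" \
    --namespace stellaops \
    --atomic \
    --timeout 15m
}
```

A promotion could then read `helm_upgrade_env stage`, followed by the smoke tests, followed by `helm_upgrade_env prod`, each run against the matching cluster context.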
|
||||
|
||||
### Rollback (Helm)
|
||||
|
||||
1. Identify the previous revision: `helm history stellaops -n stellaops`.
|
||||
2. Execute:
|
||||
```bash
helm rollback stellaops <revision> \
  --namespace stellaops \
  --wait \
  --timeout 10m
```
|
||||
3. Verify `kubectl get pods` returns healthy workloads; rerun smoke tests.
|
||||
4. Update the incident/operations log with root cause and rollback details.
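To avoid reading the revision off the table by eye during an incident, it can be extracted from the `helm history` output. This is a sketch under one assumption: the revision you want is the second-to-last row. Always check the STATUS column first, because a failed `--atomic` upgrade can leave extra rows behind. `previous_revision` is a hypothetical helper name.

```bash
# Sketch only: `previous_revision` is a hypothetical helper.
# Reads `helm history` table output on stdin and prints the second-to-last revision number.
previous_revision() {
  # NR > 1 skips the header row; the first column of each remaining row is the revision.
  awk 'NR > 1 { prev = last; last = $1 } END { if (prev != "") print prev }'
}

# Usage (not executed here):
#   rev="$(helm history stellaops -n stellaops | previous_revision)"
#   helm rollback stellaops "$rev" --namespace stellaops --wait --timeout 10m
```

If the release has only one revision, the helper prints nothing, so callers should treat an empty result as "no rollback target".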
|
||||
|
||||
---
|
||||
|
||||
## 4. Docker Compose upgrade procedure
|
||||
|
||||
1. Update environment files (`devops/compose/env/*.env.example`) with any new settings and sync secrets to hosts.
|
||||
2. Pull the tagged repository state corresponding to the release (e.g. `git checkout 2025.09.2` for stable).
|
||||
3. Apply the upgrade:
|
||||
```bash
docker compose \
  --env-file devops/compose/env/prod.env \
  -f devops/compose/docker-compose.prod.yaml \
  pull

docker compose \
  --env-file devops/compose/env/prod.env \
  -f devops/compose/docker-compose.prod.yaml \
  up -d
```
|
||||
4. Tail logs for critical services (`docker compose logs -f authority concelier`).
|
||||
5. Check monitoring dashboards/alerts to confirm normal operation.
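Every Compose command in this procedure needs the same `--env-file` and `-f` arguments, including the `logs` call in step 4. One possible way to keep them consistent is a thin wrapper. `compose_prod` is a hypothetical name, and the paths are the ones used in step 3.

```bash
# Sketch only: `compose_prod` is a hypothetical wrapper that pins the env file and profile from step 3.
compose_prod() {
  docker compose \
    --env-file devops/compose/env/prod.env \
    -f devops/compose/docker-compose.prod.yaml \
    "$@"
}
```

With this in place the sequence becomes `compose_prod pull`, `compose_prod up -d`, then `compose_prod logs -f authority concelier`, all resolved against the same project.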
|
||||
|
||||
### Rollback (Compose)
|
||||
|
||||
1. Check out the previous release tag (e.g. `git checkout 2025.09.1`).
|
||||
2. Re-run `docker compose pull` and `docker compose up -d` with that profile. Compose pulls the image digests pinned at that tag and recreates the affected containers.
|
||||
3. If reverting to a known-good snapshot is required, restore volume backups (see `docs/modules/authority/operations/backup-restore.md` and associated service guides).
|
||||
4. Log the rollback in the operations journal.
|
||||
|
||||
---
|
||||
|
||||
## 5. Channel promotion workflow
|
||||
|
||||
1. Author or update the channel manifest under `deploy/releases/`.
|
||||
2. Mirror the new digests into Helm/Compose values and run the alignment script for each profile.
|
||||
3. Commit the changes with a message that references the release version and channel (e.g. `deploy: promote 2025.10.0-edge`).
|
||||
4. Publish release notes and update `deploy/releases/README.md` (if applicable).
|
||||
5. Tag the repository when promoting stable or airgap builds.
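The commit message convention in step 3 is simple enough to generate, which helps keep promotion commits greppable. A minimal sketch follows. `promotion_message` is a hypothetical helper, and the format is inferred from the single example above (`deploy: promote 2025.10.0-edge`).

```bash
# Sketch only: `promotion_message` is hypothetical; the format follows the example in step 3.
promotion_message() {
  local version="$1" channel="$2"
  case "$channel" in
    edge|stable|airgap) printf 'deploy: promote %s-%s\n' "$version" "$channel" ;;
    *) echo "unknown channel: $channel" >&2; return 1 ;;
  esac
}
```

For example, `git commit -m "$(promotion_message 2025.10.0 edge)"` produces the message shown in step 3.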
|
||||
|
||||
---
|
||||
|
||||
## 6. Upgrade rehearsal & rollback drill log
|
||||
|
||||
Maintain rehearsal notes in `docs/modules/devops/runbooks/launch-cutover.md` or the relevant sprint planning document. After each drill capture:
|
||||
|
||||
- Release version tested
|
||||
- Date/time
|
||||
- Participants
|
||||
- Issues encountered & fixes
|
||||
- Rollback duration (if executed)
|
||||
|
||||
Attach the log to the sprint retro or operational wiki.
|
||||
|
||||
| Date (UTC) | Channel | Outcome | Notes |
|------------|---------|---------|-------|
| 2025-10-26 | Documentation dry-run | Planned | Runbook refreshed; next live drill scheduled for 2025-11 edge → stable promotion. |
|
||||
|
||||
---
|
||||
|
||||
## 7. References
|
||||
|
||||
- `deploy/README.md` – structure and validation workflow for deployment bundles.
|
||||
- `docs/RELEASE_ENGINEERING_PLAYBOOK.md` – release automation and signing pipeline.
|
||||
- `docs/modules/devops/architecture.md` – high-level DevOps architecture, SLOs, and compliance requirements.
|
||||
- `ops/offline-kit/mirror_debug_store.py` – debug-store mirroring helper.
|
||||
- `deploy/tools/check-channel-alignment.py` – release vs deployment digest alignment checker.
|
||||
130
docs/operations/devops/runbooks/launch-cutover.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# Launch Cutover Runbook - Stella Ops
|
||||
|
||||
_Document owner: DevOps Guild (2025-10-26)_
|
||||
_Scope:_ Full-platform launch from staging to production for release `2025.09.2`.
|
||||
|
||||
> **Note (2025-12):** This document reflects the state at initial launch. Since then, MongoDB has been fully removed (Sprint 4400) and replaced with PostgreSQL. MinIO references now use RustFS. Redis references now use Valkey. See current deployment docs in `deploy/` for up-to-date configuration.
|
||||
|
||||
## 1. Roles and Communication
|
||||
|
||||
| Role | Primary | Backup | Contact |
| --- | --- | --- | --- |
| Cutover lead | DevOps Guild (on-call engineer) | Platform Ops lead | `#launch-bridge` (Mattermost) |
| Authority stack | Authority Core guild rep | Security guild rep | `#authority` |
| Scanner / Queue | Scanner WebService guild rep | Runtime guild rep | `#scanner` |
| Storage | Mongo/MinIO operators | Backup DB admin | Pager escalation |
| Observability | Telemetry guild rep | SRE on-call | `#telemetry` |
| Approvals | Product owner + CTO | DevOps lead | Approval recorded in change ticket |
|
||||
|
||||
Set up a bridge call 30 minutes before start and keep `#launch-bridge` updated every 10 minutes.
|
||||
|
||||
## 2. Timeline Overview (UTC)
|
||||
|
||||
| Time | Activity | Owner |
| --- | --- | --- |
| T-24h | Change ticket approved, prod secrets verified, offline kit build status checked (`DEVOPS-OFFLINE-18-005`). | DevOps lead |
| T-12h | Run `deploy/tools/validate-profiles.sh`; capture logs in ticket. | DevOps engineer |
| T-6h | Freeze non-launch deployments; notify guild leads. | Product owner |
| T-2h | Execute rehearsal in staging (Section 3) using `values-stage.yaml` to verify scripts. | DevOps + module reps |
| T-30m | Final go/no-go with guild leads; confirm monitoring dashboards green. | Cutover lead |
| T0 | Execute production cutover steps (Section 4). | Cutover team |
| T+45m | Smoke tests complete (Section 5); announce success or trigger rollback. | Cutover lead |
| T+4h | Post-cutover metrics review, notify stakeholders, close ticket. | DevOps + product owner |
|
||||
|
||||
## 3. Rehearsal (Staging) Checklist
|
||||
|
||||
1. `docker network create stellaops_frontdoor || true` (if not present on staging jump host).
|
||||
2. Run `deploy/tools/validate-profiles.sh` and archive output.
|
||||
3. Apply staging secrets (`kubectl apply -f secrets/stage/` or `helm secrets upgrade`), ensuring `stellaops-stage` credentials align with `values-stage.yaml`.
|
||||
4. Perform `helm upgrade stellaops devops/helm/stellaops -f devops/helm/stellaops/values-stage.yaml` in staging cluster.
|
||||
5. Verify health endpoints: `curl https://authority.stage.../healthz`, `curl https://scanner.stage.../healthz`.
|
||||
6. Execute smoke CLI: `stellaops-cli scan submit --profile staging --sbom samples/sbom/demo.json` and confirm report status in UI.
|
||||
7. Document total wall time and any deviations in the rehearsal log.
|
||||
|
||||
Rehearsal must complete without manual interventions before proceeding to production.
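The health checks in step 5 can be scripted so that the rehearsal fails loudly when any endpoint is down. This is a sketch only: `check_healthz` is a hypothetical helper, and the base URLs are deliberately left as arguments because the runbook elides the real staging hostnames.

```bash
# Sketch only: `check_healthz` is hypothetical; pass the real base URLs for your staging hosts.
check_healthz() {
  local failed=0 base
  for base in "$@"; do
    # -f fails on HTTP errors, -sS stays quiet but still prints errors,
    # --max-time keeps a hung endpoint from stalling the rehearsal.
    if curl -fsS --max-time 10 "${base}/healthz" >/dev/null; then
      echo "OK   ${base}"
    else
      echo "FAIL ${base}"
      failed=1
    fi
  done
  return "$failed"
}
```

Called with the Authority and Scanner base URLs, it prints one line per endpoint and returns non-zero if any check failed, which makes it easy to gate the remaining rehearsal steps.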
|
||||
|
||||
## 4. Production Cutover Steps
|
||||
|
||||
### 4.1 Pre-flight
|
||||
- Confirm production secrets in the appropriate secret store (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`) contain the keys referenced in `values-prod.yaml`.
|
||||
- Ensure the external reverse proxy network exists: `docker network create stellaops_frontdoor || true` on each compose host.
|
||||
- Back up current configuration and data:
|
||||
- Mongo snapshot: `mongodump --uri "$MONGO_BACKUP_URI" --out /backups/launch-$(date -Iseconds)`.
|
||||
- MinIO bucket mirror: `mc mirror --overwrite minio/stellaops minio-backup/stellaops-$(date +%Y%m%d%H%M)`.
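The two backup commands above each compute their own timestamp, so the Mongo and MinIO backups end up with different suffixes and the `<timestamp>` needed for rollback has to be looked up twice. A hedged sketch that shares one timestamp is shown below. `backup_stamp` and `launch_backup` are hypothetical names, and the commands otherwise mirror the bullets as written for the launch-era stack.

```bash
# Sketch only: `backup_stamp` and `launch_backup` are hypothetical names.

# Print a UTC timestamp that is safe to use in paths (no colons).
backup_stamp() {
  date -u +%Y%m%dT%H%M%SZ
}

# Take both backups under the same timestamp and echo it for the change ticket.
launch_backup() {
  local stamp="$1"
  mongodump --uri "$MONGO_BACKUP_URI" --out "/backups/launch-${stamp}" || return 1
  mc mirror --overwrite minio/stellaops "minio-backup/stellaops-${stamp}" || return 1
  echo "$stamp"
}
```

Running `launch_backup "$(backup_stamp)"` and recording the printed value gives a single identifier to substitute for `<timestamp>` in the rollback commands.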
|
||||
|
||||
### 4.2 Apply Updates (Compose)
|
||||
1. On each compose node, pull updated images for release `2025.09.2`:
|
||||
```bash
docker compose --env-file prod.env -f devops/compose/docker-compose.prod.yaml pull
```
|
||||
2. Deploy changes:
|
||||
```bash
docker compose --env-file prod.env -f devops/compose/docker-compose.prod.yaml up -d
```
|
||||
3. Confirm containers are healthy via `docker compose ps` and `docker logs <service> --tail 50`.
|
||||
|
||||
### 4.3 Apply Updates (Helm/Kubernetes)
|
||||
If using Kubernetes, perform:
|
||||
```bash
helm upgrade stellaops devops/helm/stellaops \
  -f devops/helm/stellaops/values-prod.yaml \
  --namespace stellaops \
  --atomic \
  --timeout 15m
```
|
||||
Monitor rollout with `kubectl get pods -n stellaops --watch` and `kubectl rollout status deployment/<service>`.
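Rather than checking each deployment by name, the rollout check can iterate over whatever deployments exist in the namespace. The sketch below assumes the `stellaops` namespace used elsewhere in this runbook. `wait_for_rollouts` is a hypothetical helper, and the 5-minute per-deployment timeout is an arbitrary choice.

```bash
# Sketch only: `wait_for_rollouts` is hypothetical; the default namespace matches the commands above.
wait_for_rollouts() {
  local ns="${1:-stellaops}" deploy
  # `-o name` prints entries such as deployment.apps/<name>, which `rollout status` accepts directly.
  for deploy in $(kubectl get deployments -n "$ns" -o name); do
    kubectl rollout status "$deploy" -n "$ns" --timeout=5m || return 1
  done
}
```

The function stops at the first deployment that fails to become ready, which is usually the one to inspect before deciding on rollback.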
|
||||
|
||||
### 4.4 Configuration Validation
|
||||
- Verify Authority issuer metadata: `curl https://authority.prod.../.well-known/openid-configuration`.
|
||||
- Validate Signer DSSE endpoint: `stellaops-cli signer verify --base-url https://signer.prod... --bundle samples/dsse/demo.json`.
|
||||
- Check Scanner queue connectivity: `docker exec stellaops-scanner-web dotnet StellaOps.Scanner.WebService.dll health queue` (returns success).
|
||||
- Ensure Notify (legacy) is still accessible while the Notifier migration is pending.
|
||||
|
||||
## 5. Smoke Tests
|
||||
|
||||
| Test | Command / Action | Expected Result |
| --- | --- | --- |
| API health | `curl https://scanner.prod.../healthz` | HTTP 200 with `"status":"Healthy"` |
| Scan submit | `stellaops-cli scan submit --profile prod --sbom samples/sbom/demo.json` | Scan completes < 5 minutes; report accessible with signed DSSE |
| Runtime event ingest | Post sample event from Zastava observer fixture | `/runtime/events` responds 202 Accepted; record visible in Mongo `runtime_events` |
| Signing | `stellaops-cli signer sign --bundle demo.json` | Returns DSSE with matching SHA256 and signer metadata |
| Attestor verify | `stellaops-cli attestor verify --uuid <uuid>` | Verification result `ok=true` |
| Web UI | Manual login, verify dashboards render and latency within budget | UI loads under 2 seconds; policy views consistent |
|
||||
|
||||
Log results in the change ticket with timestamps and screenshots where applicable.
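The API health row expects both an HTTP 200 and a `"status":"Healthy"` fragment in the body. The body half can be checked mechanically. This is a rough sketch rather than a JSON parser: `body_is_healthy` is a hypothetical helper that only looks for that literal fragment after stripping whitespace.

```bash
# Sketch only: `body_is_healthy` is hypothetical; it looks for the "status":"Healthy" fragment from the table.
body_is_healthy() {
  # Strip whitespace first so pretty-printed JSON also matches.
  tr -d ' \t\r\n' | grep -q '"status":"Healthy"'
}
```

Piping `curl -fsS` output for the scanner health endpoint into `body_is_healthy` covers both conditions: `-f` fails on non-2xx responses and the helper fails on an unexpected body.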
|
||||
|
||||
## 6. Rollback Procedure
|
||||
|
||||
1. Assess failure scope; if systemic, initiate rollback immediately while preserving logs/artifacts.
|
||||
2. For Compose:
|
||||
```bash
docker compose --env-file prod.env -f devops/compose/docker-compose.prod.yaml down
docker compose --env-file stage.env -f devops/compose/docker-compose.stage.yaml up -d
```
|
||||
3. For Helm:
|
||||
```bash
helm rollback stellaops <previous-release-number> --namespace stellaops
```
|
||||
4. Restore Mongo snapshot if data inconsistency detected: `mongorestore --uri "$MONGO_BACKUP_URI" --drop /backups/launch-<timestamp>`.
|
||||
5. Restore MinIO mirror if required: `mc mirror minio-backup/stellaops-<timestamp> minio/stellaops`.
|
||||
6. Notify stakeholders of rollback and capture root cause notes in incident ticket.
|
||||
|
||||
## 7. Post-cutover Actions
|
||||
|
||||
- Keep heightened monitoring for 4 hours post cutover; track latency, error rates, and queue depth.
|
||||
- Confirm audit trails: Authority tokens issued, Scanner events recorded, Attestor submissions stored.
|
||||
- Update `docs/modules/devops/runbooks/launch-readiness.md` if any new gaps or follow-ups discovered.
|
||||
- Schedule retrospective within 48 hours; include DevOps, module guilds, and product owner.
|
||||
|
||||
## 8. Approval Matrix
|
||||
|
||||
| Step | Required Approvers | Record Location |
| --- | --- | --- |
| Production deployment plan | CTO + DevOps lead | Change ticket comment |
| Cutover start (T0) | DevOps lead + module reps | `#launch-bridge` summary |
| Post-smoke success | DevOps lead + product owner | Change ticket closure |
| Rollback (if invoked) | DevOps lead + CTO | Incident ticket |
|
||||
|
||||
Retain all approvals and logs for audit. Update this runbook after each execution to record actual timings and lessons learned.
|
||||
|
||||
## 9. Rehearsal Log
|
||||
|
||||
| Date (UTC) | What We Exercised | Outcome | Follow-up |
| --- | --- | --- | --- |
| 2025-10-26 | Dry-run of compose/Helm validation via `deploy/tools/validate-profiles.sh` (dev/stage/prod/airgap/mirror). Network creation simulated (`docker network create stellaops_frontdoor` planned) and stage CLI submission reviewed. | Validation script succeeded; all profiles templated cleanly. Stage deployment apply deferred because no staging cluster is accessible from the current environment. | Schedule full stage rehearsal once staging cluster credentials are available; reuse this log section to capture timings. |
|
||||
51
docs/operations/devops/runbooks/launch-readiness.md
Normal file
@@ -0,0 +1,51 @@
|
||||
# Launch Readiness Record - Stella Ops
|
||||
|
||||
_Updated: 2025-10-26 (UTC)_
|
||||
|
||||
> **Note (2025-12):** This document reflects the state at initial launch. Since then, MongoDB has been fully removed (Sprint 4400) and replaced with PostgreSQL. Redis references now use Valkey. See current deployment docs in `deploy/` for up-to-date configuration.
|
||||
|
||||
This document captures production launch sign-offs, deployment readiness checkpoints, and any open risks that must be tracked before GA cutover.
|
||||
|
||||
## 1. Sign-off Summary
|
||||
|
||||
| Module / Service | Guild / Point of Contact | Evidence (Task or Runbook) | Status | Timestamp (UTC) | Notes |
| --- | --- | --- | --- | --- | --- |
| Authority (Issuer) | Authority Core Guild | `AUTH-AOC-19-001` - scope issuance & configuration complete (DONE 2025-10-26) | READY | 2025-10-26T14:05Z | Tenant scope propagation follow-up (`AUTH-AOC-19-002`) tracked in gaps section. |
| Signer | Signer Guild | `SIGNER-API-11-101` / `SIGNER-REF-11-102` / `SIGNER-QUOTA-11-103` (DONE 2025-10-21) | READY | 2025-10-26T14:07Z | DSSE signing, referrer verification, and quota enforcement validated in CI. |
| Attestor | Attestor Guild | `ATTESTOR-API-11-201` / `ATTESTOR-VERIFY-11-202` / `ATTESTOR-OBS-11-203` (DONE 2025-10-19) | READY | 2025-10-26T14:10Z | Rekor submission/verification pipeline green; telemetry pack published. |
| Scanner Web + Worker | Scanner WebService Guild | `SCANNER-WEB-09-10x`, `SCANNER-RUNTIME-12-30x` (DONE 2025-10-18 -> 2025-10-24) | READY* | 2025-10-26T14:20Z | Orchestrator envelope work (`SCANNER-EVENTS-16-301/302`) still open; see gaps. |
| Concelier Core & Connectors | Concelier Core / Ops Guild | Ops runbook sign-off in `docs/modules/concelier/operations/conflict-resolution.md` (2025-10-16) | READY | 2025-10-26T14:25Z | Conflict resolution & connector coverage accepted; Mongo schema hardening pending (see gaps). |
| Excititor API | Excititor Core Guild | Wave 0 connector ingest sign-offs (Sprint backlog reference) | READY | 2025-10-26T14:28Z | VEX linkset publishing complete for launch datasets. |
| Notify Web (legacy) | Notify Guild | Existing stack carried forward; Notifier program tracked separately (Sprint 38-40) | PENDING | 2025-10-26T14:32Z | Legacy notify web remains operational; migration to Notifier blocked on `SCANNER-EVENTS-16-301`. |
| Web UI | UI Guild | Stable build `registry.stella-ops.org/.../web-ui@sha256:10d9248...` deployed in stage and smoke-tested | READY | 2025-10-26T14:35Z | Policy editor GA items (Sprint 20) outside launch scope. |
| DevOps / Release | DevOps Guild | `deploy/tools/validate-profiles.sh` run (2025-10-26) covering dev/stage/prod/airgap/mirror | READY | 2025-10-26T15:02Z | Compose/Helm lint + docker compose config validated; see Section 2 for details. |
| Offline Kit | Offline Kit Guild | `DEVOPS-OFFLINE-18-004` (Go analyzer) and `DEVOPS-OFFLINE-18-005` (Python analyzer) complete; debug-store mirror pending (`DEVOPS-OFFLINE-17-004`). | PENDING | 2025-11-23T15:05Z | Release workflow now ships `out/release/debug`; run `mirror_debug_store.py` on next release artefact and commit `metadata/debug-store.json`. |
|
||||
|
||||
_\* READY with caveat - remaining work noted in Section 3._
|
||||
|
||||
## 2. Deployment Readiness Checklist
|
||||
|
||||
- **Production profiles committed:** `devops/compose/docker-compose.prod.yaml` and `devops/helm/stellaops/values-prod.yaml` added with front-door network hand-off and secret references for Mongo/MinIO/core services.
|
||||
- **Secrets placeholders documented:** `devops/compose/env/prod.env.example` enumerates required credentials (`MONGO_INITDB_ROOT_PASSWORD`, `MINIO_ROOT_PASSWORD`, Redis/NATS endpoints, `FRONTDOOR_NETWORK`). Helm values reference Kubernetes secrets (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`).
|
||||
- **Static validation executed:** `deploy/tools/validate-profiles.sh` run on 2025-10-26 (docker compose config + helm lint/template) with all profiles passing.
|
||||
- **Ingress model defined:** Production compose profile introduces external `frontdoor` network; README updated with creation instructions and scope of externally reachable services.
|
||||
- **Observability hooks:** Authority/Signer/Attestor telemetry packs verified; scanner runtime build-id metrics landed (`SCANNER-RUNTIME-17-401`). Grafana dashboards referenced in component runbooks.
|
||||
- **Rollback assets:** Stage Compose profile remains aligned (`docker-compose.stage.yaml`), enabling rehearsals before prod cutover; release manifests (`deploy/releases/2025.09-stable.yaml`) map digests for reproducible rollback.
|
||||
- **Rehearsal status:** 2025-10-26 validation dry-run executed (`deploy/tools/validate-profiles.sh` across dev/stage/prod/airgap/mirror). Full stage Helm rollout pending access to the managed staging cluster; target to complete once credentials are provisioned.
|
||||
|
||||
## 3. Outstanding Gaps & Follow-ups
|
||||
|
||||
| Item | Owner | Tracking Ref | Target / Next Step | Impact |
| --- | --- | --- | --- | --- |
| Tenant scope propagation and audit coverage | Authority Core Guild | `AUTH-AOC-19-002` (DOING 2025-10-26) | Land enforcement + audit fixtures by Sprint 19 freeze | Medium - required for multi-tenant GA but does not block initial cutover if tenants scoped manually. |
| Orchestrator event envelopes + Notifier handshake | Scanner WebService Guild | `SCANNER-EVENTS-16-301` (BLOCKED), `SCANNER-EVENTS-16-302` (DOING) | Coordinate with Gateway/Notifier owners on preview package replacement or binding redirects; rerun `dotnet test` once patch lands and refresh schema docs. Share envelope samples in `docs/modules/signals/events/` after tests pass. | High - gating Notifier migration; legacy notify path remains functional meanwhile. |
| Offline Kit Python analyzer bundle | Offline Kit Guild + Scanner Guild | `DEVOPS-OFFLINE-18-005` (DONE 2025-10-26) | Monitor for follow-up manifest updates and rerun smoke script when analyzers change. | Medium - ensures language analyzer coverage stays current for offline installs. |
| Offline Kit debug store mirror | Offline Kit Guild + DevOps Guild | `DEVOPS-OFFLINE-17-004` (TODO 2025-11-23) | Release pipeline now publishes `out/release/debug`; run `mirror_debug_store.py`, verify hashes, and commit `metadata/debug-store.json`. | Low - symbol lookup remains accessible from staging assets but required before next Offline Kit tag. |
| Mongo schema validators for advisory ingestion | Concelier Storage Guild | `CONCELIER-STORE-AOC-19-001` (TODO) | Finalize JSON schema + migration toggles; coordinate with Ops for rollout window | Low - current validation handled in app layer; schema guard adds defense-in-depth. |
| Authority plugin telemetry alignment | Security Guild | `SEC2.PLG`, `SEC3.PLG`, `SEC5.PLG` (BLOCKED pending AUTH DPoP/MTLS tasks) | Resume once upstream auth surfacing stabilises | Low - plugin remains optional; launch uses default Authority configuration. |
|
||||
|
||||
## 4. Approvals & Distribution
|
||||
|
||||
- Record shared in `#launch-readiness` (Mattermost) 2025-10-26 15:15 UTC with DevOps + Guild leads for acknowledgement.
|
||||
- Updates to this document require dual sign-off from DevOps Guild (owner) and impacted module guild lead; retain change log via Git history.
|
||||
- Cutover rehearsal and rollback drills are tracked separately in `docs/modules/devops/runbooks/launch-cutover.md` (see associated Task `DEVOPS-LAUNCH-18-001`).
|
||||
64
docs/operations/devops/runbooks/nuget-preview-bootstrap.md
Normal file
@@ -0,0 +1,64 @@
|
||||
# NuGet Preview Bootstrap (Offline-Friendly)
|
||||
|
||||
The StellaOps build relies on .NET 10 RC2 packages (Microsoft.Extensions.*, JwtBearer 10.0 RC).
|
||||
`NuGet.config` now wires three sources:
|
||||
|
||||
1. `local` → `./local-nuget` (preferred, air-gapped mirror)
|
||||
2. `dotnet-public` → `https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-public/nuget/v3/index.json`
|
||||
3. `nuget.org` → fallback for everything else
|
||||
|
||||
Follow the steps below whenever you refresh the repo or roll a new Offline Kit drop.
|
||||
|
||||
## 1. Mirror the preview packages
|
||||
|
||||
```bash
./ops/devops/sync-preview-nuget.sh
```
|
||||
|
||||
* Reads `ops/devops/nuget-preview-packages.csv`. Each line specifies the package, version, expected SHA-256 hash, and (optionally) the flat-container base URL (we pin to `dotnet-public`).
|
||||
* Downloads the `.nupkg` straight into `./local-nuget/` and re-verifies the checksum. Existing files are skipped when hashes already match.
|
||||
* Use `NUGET_V2_BASE` if you need to temporarily point at a different mirror.
|
||||
|
||||
💡 The script never mutates packages in place—if a checksum changes you will see a “SHA mismatch … refreshing” message.
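When a package has to be checked by hand, for example after copying a mirror onto an air-gapped host, the checksum comparison can be reproduced outside the sync script. The following sketch is not the logic of `sync-preview-nuget.sh`. `verify_nupkg` is a hypothetical helper, and it assumes GNU `sha256sum` is available (on macOS, `shasum -a 256` is the usual substitute).

```bash
# Sketch only: `verify_nupkg` is hypothetical and is not taken from sync-preview-nuget.sh.
# Usage: verify_nupkg <path-to-nupkg> <expected-sha256-hex>
verify_nupkg() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | awk '{ print $1 }')" || return 1
  # Compare case-insensitively, since a manifest may store upper- or lower-case hex.
  [ "$(printf '%s' "$actual" | tr 'A-F' 'a-f')" = "$(printf '%s' "$expected" | tr 'A-F' 'a-f')" ]
}
```

The expected hash would come from the third column of `ops/devops/nuget-preview-packages.csv`; a non-zero return means the local file should not be trusted for restore.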
|
||||
|
||||
## 2. Restore using the shared `NuGet.config`
|
||||
|
||||
From the repo root:
|
||||
|
||||
```bash
DOTNET_NOLOGO=1 dotnet restore src/Excititor/__Libraries/StellaOps.Excititor.Connectors.Abstractions/StellaOps.Excititor.Connectors.Abstractions.csproj \
  --configfile NuGet.config
```
|
||||
|
||||
The `packageSourceMapping` section keeps `Microsoft.Extensions.*`, `Microsoft.AspNetCore.*`, and `Microsoft.Data.Sqlite` bound to `local`/`dotnet-public`, so `dotnet restore` never has to reach out to nuget.org when mirrors are populated.
|
||||
|
||||
Before committing changes (or when wiring up a new environment) run:
|
||||
|
||||
```bash
python3 ops/devops/validate_restore_sources.py
```
|
||||
|
||||
The validator asserts:
|
||||
|
||||
- `NuGet.config` lists `local` → `dotnet-public` → `nuget.org` in that order.
|
||||
- `Directory.Build.props` pins `RestoreSources` so every project prioritises the local mirror.
|
||||
- No stray `NuGet.config` files shadow the repo root configuration.
|
||||
|
||||
CI executes the validator in both the `build-test-deploy` and `release` workflows,
|
||||
so regressions trip before any restore/build begins.
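The third assertion (no shadowing `NuGet.config` files) is easy to spot-check locally before pushing. This is a sketch of that one check only and makes no claim about how `validate_restore_sources.py` implements it. `stray_nuget_configs` is a hypothetical helper, and it relies on the `-mindepth` and `-iname` options found in GNU and BSD `find`.

```bash
# Sketch only: `stray_nuget_configs` is hypothetical and is not the validator's implementation.
# Prints every NuGet.config below the repo root, excluding the root-level file itself.
stray_nuget_configs() {
  local root="${1:-.}"
  # -mindepth 2 skips the root-level file; -iname also catches variants such as nuget.config.
  find "$root" -mindepth 2 -type f -iname 'NuGet.config' -not -path '*/.git/*'
}
```

Empty output from `stray_nuget_configs .` at the repo root means nothing shadows the shared configuration.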
|
||||
|
||||
If you run fully air-gapped, remember to clear the cache between SDK upgrades:
|
||||
|
||||
```bash
dotnet nuget locals all --clear
```
|
||||
|
||||
## 3. Troubleshooting
|
||||
|
||||
| Symptom | Fix |
| --- | --- |
| `dotnet restore` still hits nuget.org for preview packages | Re-run `sync-preview-nuget.sh` to ensure the `.nupkg` exists locally, then delete `~/.nuget/packages/microsoft.extensions.*` so the resolver picks up the mirrored copy. |
| SHA mismatch in the manifest | Update `ops/devops/nuget-preview-packages.csv` with the new version + checksum (from the feed) and re-run the sync script. |
| Azure DevOps feed throttling | Set `DOTNET_PUBLIC_FLAT_BASE` env var and point it at your own mirrored flat-container, then add the URL to the 4th column of the manifest. |
|
||||
|
||||
Keep this doc alongside Offline Kit instructions so air-gapped operators know exactly how to refresh the mirror and verify packages before restore.
|
||||
49
docs/operations/devops/runbooks/zastava-deployment.md
Normal file
@@ -0,0 +1,49 @@
|
||||
# Zastava Deployment Runbook
|
||||
|
||||
> **Audience:** DevOps, Zastava Guild
|
||||
>
|
||||
> **Purpose:** Provide steps for deploying Zastava Observer + Webhook in connected and air-gapped clusters.
|
||||
|
||||
## 1. Prerequisites
|
||||
|
||||
- Kubernetes 1.26+ with admission registration permissions.
|
||||
- Access to StellaOps Container Registry or offline bundle with Zastava images.
|
||||
- Authority scopes and certificates configured for Zastava identities.
|
||||
- Surface.FS cache endpoint (RustFS/S3) reachable from nodes.
|
||||
|
||||
## 2. Installation Steps
|
||||
|
||||
1. **Prepare namespace & secrets**
|
||||
- Create Kubernetes namespace (default `stellaops-runtime`).
|
||||
- Provision secrets (`zastava-mtls`, `zastava-op-token`, `surface-secrets`).
|
||||
2. **Deploy Observer**
|
||||
- Apply Helm chart `helm/zastava` with values aligning to Surface.Env settings.
|
||||
- Confirm DaemonSet pods schedule on all nodes; check `/healthz` endpoints.
|
||||
3. **Deploy Webhook**
|
||||
- Install ValidatingWebhookConfiguration with CA bundle and service reference.
|
||||
- Enable dry-run mode first, monitor logs, then switch `enforce=true` once validations pass.
|
||||
4. **Configure policies**
|
||||
- Populate admission policies in Policy Engine; ensure tokens contain `runtime:read` scopes.
|
||||
- Update CLI/Console settings for runtime posture view.
|
||||
5. **Observability**
|
||||
- Scrape metrics (`zastava_observer_*`, `zastava_webhook_*`).
|
||||
- Stream logs to central collector.
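For step 2, "pods schedule on all nodes" can be checked by comparing each DaemonSet's desired and ready counts. The sketch below assumes the default `stellaops-runtime` namespace from step 1. `unready_daemonsets` is a hypothetical helper, and the DaemonSet names are read from the cluster rather than assumed.

```bash
# Sketch only: `unready_daemonsets` is hypothetical.
# Input on stdin: one "<name> <desired> <ready>" line per DaemonSet; prints names where the counts differ.
unready_daemonsets() {
  awk '$2 != $3 { print $1 }'
}

# Usage (not executed here):
#   kubectl get daemonsets -n stellaops-runtime \
#     -o jsonpath='{range .items[*]}{.metadata.name} {.status.desiredNumberScheduled} {.status.numberReady}{"\n"}{end}' \
#     | unready_daemonsets
```

Empty output means every DaemonSet in the namespace reports as many ready pods as it wants scheduled. Anything printed is worth inspecting before the webhook is switched to `enforce=true`.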
|
||||
|
||||
## 3. Air-Gapped Deployment Notes
|
||||
|
||||
- Use Offline Kit bundle (`offline/zastava/`) to load images and configuration.
|
||||
- Validate Surface.FS bundles before enabling enforcement.
|
||||
- Replace webhook CA with offline authority; document rotation schedule.
|
||||
|
||||
## 4. Validation
|
||||
|
||||
- Run `stella runtime policy test` against sample workloads.
|
||||
- Trigger deployment denial for unsigned images; verify Notifier emits alerts.
|
||||
- Check timeline events for observer telemetry.
|
||||
|
||||
## 5. References
|
||||
|
||||
- `docs/modules/zastava/architecture.md`
|
||||
- `docs/modules/scanner/architecture.md`
|
||||
- `docs/airgap/airgap-mode.md`
|
||||
- `docs/forensics/timeline.md`
|
||||