feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules

- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
2025-10-30 00:09:39 +02:00
parent 3154c67978
commit 7b5bdcf4d3
503 changed files with 16136 additions and 54638 deletions

@@ -0,0 +1,22 @@
# DevOps agent guide
## Mission
The DevOps module captures release, deployment, and migration playbooks that keep StellaOps deterministic across environments.
## Key docs
- [Module README](./README.md)
- [Architecture](./architecture.md)
- [Implementation plan](./implementation_plan.md)
- [Task board](./TASKS.md)
## How to get started
1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module.
2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED).
3. Read the architecture and README for domain context before editing code or docs.
4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan.
## Guardrails
- Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md).
- Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts.
- Keep Offline Kit parity in mind—document air-gapped workflows for any new feature.
- Update runbooks/observability assets when operational characteristics change.
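The determinism guardrail (sorted outputs, UTC ISO-8601 timestamps) can be illustrated with a minimal Python sketch. The function names here are hypothetical and not part of any StellaOps module; the sketch only shows one way to make serialised output independent of input order and local timezone.

```python
import json
from datetime import datetime, timedelta, timezone


def normalise_timestamp(value: datetime) -> str:
    """Return a UTC ISO-8601 string with second precision and a 'Z' suffix."""
    if value.tzinfo is None:
        # A naive datetime depends on the machine's local zone, so reject it.
        raise ValueError("naive timestamps are ambiguous; attach a timezone first")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def render_deterministic(records: list[dict]) -> str:
    """Serialise records with sorted keys and a stable record order."""
    ordered = sorted(records, key=lambda r: json.dumps(r, sort_keys=True))
    return json.dumps(ordered, sort_keys=True, separators=(",", ":"))


if __name__ == "__main__":
    local = datetime(2025, 10, 30, 0, 9, 39, tzinfo=timezone(timedelta(hours=2)))
    print(normalise_timestamp(local))
    print(render_deterministic([{"b": 1, "a": 2}, {"a": 1}]))
```

Two runs over the same records in a different order produce byte-identical output, which is the property a golden-file test would compare against.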

@@ -0,0 +1,41 @@
# StellaOps DevOps
The DevOps module captures release, deployment, and migration playbooks that keep StellaOps deterministic across environments.
## Responsibilities
- Maintain CI pipelines, signing workflows, and release packaging steps.
- Operate shared runbooks for launch readiness, upgrades, and NuGet previews.
- Provide offline kit assembly instructions and tooling integration.
- Wrap observability/telemetry bootstrap flows for platform teams.
## Key components
- Runbooks under ./runbooks/ (launch, deployment, nuget).
- Migration guidance under ./migrations/.
- Architecture overview bridging CI/CD & infrastructure concerns.
## Integrations & dependencies
- Ops pipelines (Gitea, GitHub Actions) and artifact registries.
- Authority/Signer for supply chain signing.
- Telemetry stack bootstrap scripts.
## Operational notes
- Offline bundle packaging guidance in docs/modules/export-center/operations/runbook.md.
- Dashboards for launch cutover rehearsals.
- Coordination with Security for enforced guardrails.
## Related resources
- ./runbooks/launch-readiness.md
- ./runbooks/launch-cutover.md
- ./runbooks/deployment-upgrade.md
- ./runbooks/nuget-preview-bootstrap.md
- ./migrations/semver-style.md
## Backlog references
- DEVOPS-LAUNCH-18-001 / 18-900 runbooks in ../../TASKS.md.
- Telemetry bootstrap automation tracked in `ops/devops/TASKS.md`.
## Epic alignment
- **Epic 1 – AOC enforcement:** bake AOC verifier steps, CI guards, and schema validation into pipelines.
- **Epic 9 – Orchestrator Dashboard:** support operational dashboards, job recovery runbooks, and rate-limit governance.
- **Epic 10 – Export Center:** manage signing workflows, Offline Kit packaging, and release promotion for exports.
- **Epic 15 – Observability & Forensics:** coordinate telemetry deployment, evidence retention, and forensic automation.

@@ -0,0 +1,9 @@
# Task board — DevOps
> Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable.
| ID | Status | Owner(s) | Description | Notes |
|----|--------|----------|-------------|-------|
| DEVOPS-DOCS-0001 | TODO | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md |
| DEVOPS-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md |
| DEVOPS-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow |

@@ -0,0 +1,488 @@
# component_architecture_devops.md — **StellaOps Release & Operations** (2025Q4)
> Draws from the AOC guardrails, Orchestrator, Export Center, and Observability module plans to describe how StellaOps is built, signed, distributed, and operated.
> **Scope.** Implementation-ready blueprint for **how StellaOps is built, versioned, signed, distributed, upgraded, licensed (PoE)**, and operated in customer environments (online and air-gapped). Covers reproducible builds, supply-chain attestations, registries, offline kits, migration/rollback, artifact lifecycle (RustFS default + Mongo, S3 fallback), monitoring SLOs, and customer activation.
---
## 0) Product vision (operations lens)
StellaOps must be **trustable at a glance** and **boringly operable**:
* Every release ships with **first-party SBOMs, provenance, and signatures**; services verify **each other's** integrity at runtime.
* Customers can deploy by **digest** and stay aligned with **LTS/stable/edge** channels.
* Paid customers receive **attestation authority** (Signer accepts their PoE) while the core platform remains **free to run**.
* Air-gapped customers receive **offline kits** with verifiable digests and deterministic import.
* Artifacts expire predictably; operators know what's kept, for how long, and why.
---
## 1) Release trains & versioning
### 1.1 Channels
* **LTS** (12-month support window): quarterly cadence (Q1/Q2/Q3/Q4).
* **Stable** (default): monthly rollup (bug fixes + compatible features).
* **Edge**: weekly; for early adopters, no guarantees.
### 1.2 Version strings
Semantic core + calendar tag:
```
<MAJOR>.<MINOR>.<PATCH> (<YYYY>.<MM>) e.g., 2.4.1 (2027.06)
```
* **MAJOR**: breaking API/DB changes (rare).
* **MINOR**: new features, compatible schema migrations (expand/contract pattern).
* **PATCH**: bug fixes, perf and security updates.
* **Calendar tag** exposes **release year** used by Signer for **PoE window checks**.
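As an illustration of the version-string format, the following Python sketch splits a string such as `2.4.1 (2027.06)` into its semantic and calendar parts. The `ReleaseVersion` type and `parse_release_version` helper are hypothetical names introduced here; the exact whitespace and zero-padding rules are assumptions based on the single example above.

```python
import re
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    year: int   # release year from the calendar tag
    month: int  # release month from the calendar tag


# Assumes one space before the parenthesised tag and a two-digit month.
_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+) \((\d{4})\.(\d{2})\)$")


def parse_release_version(text: str) -> ReleaseVersion:
    """Parse '<MAJOR>.<MINOR>.<PATCH> (<YYYY>.<MM>)' into typed fields."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    version = ReleaseVersion(*(int(group) for group in match.groups()))
    if not 1 <= version.month <= 12:
        raise ValueError(f"calendar month out of range: {text!r}")
    return version


if __name__ == "__main__":
    print(parse_release_version("2.4.1 (2027.06)"))
```

Keeping the calendar year as its own field makes a later release-window comparison a plain integer check.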
### 1.3 Component alignment
A release is a **bundle** of image digests + charts + manifests. All services in a bundle are **wire-compatible**. Mixed minor versions are allowed within a bounded skew:
* **Web UI ↔ backend**: `±1 minor`.
* **Scanner ↔ Policy/Excititor/Concelier**: `±1 minor`.
* **Authority/Signer/Attestor triangle**: **must** be same minor (crypto and DPoP/mTLS binding rules).
At startup, services **self-advertise** their semver & channel; the UI surfaces **mismatch warnings**.
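The skew rules can be sketched as a small checker over self-advertised versions. This is an illustrative Python sketch, not the UI's actual logic: the service keys, the warning texts, and the requirement that skewed services share the same major are assumptions. The Web UI rule would follow the same pattern as the Scanner pairs.

```python
Version = tuple[int, int]  # (major, minor), as self-advertised at startup

# Assumed service keys; the real advertisement format is not specified here.
TRUST_TRIANGLE = ("authority", "signer", "attestor")
SKEW_PAIRS = (("scanner", "policy"), ("scanner", "excititor"), ("scanner", "concelier"))


def within_skew(a: Version, b: Version, max_minor_skew: int = 1) -> bool:
    """True when both versions share a major and differ by at most max_minor_skew minors."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= max_minor_skew


def mismatch_warnings(versions: dict[str, Version]) -> list[str]:
    """Return human-readable warnings for every violated skew rule."""
    warnings = []
    triangle = {versions[name] for name in TRUST_TRIANGLE if name in versions}
    if len(triangle) > 1:
        warnings.append("authority/signer/attestor must share the same minor")
    for left, right in SKEW_PAIRS:
        if left in versions and right in versions:
            if not within_skew(versions[left], versions[right]):
                warnings.append(f"{left} and {right} differ by more than one minor")
    return warnings


if __name__ == "__main__":
    print(mismatch_warnings({"authority": (2, 4), "signer": (2, 4), "attestor": (2, 5)}))
```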
---
## 2) Supply-chain pipeline (how a release is built)
### 2.1 Deterministic builds
* **Builders**: isolated **BuildKit** workers with pinned base images (digest only).
* **Pinning**: lock files or `go.mod`, `package-lock.json`, `global.json`, `Directory.Packages.props` are **frozen** at tag.
* **Reproducibility**: timestamps normalized; source date epoch; deterministic zips/tars.
* **Multi-arch**: linux/amd64 + linux/arm64 (Windows images track M2 roadmap).
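To make the reproducibility bullet concrete, here is a minimal Python sketch of a deterministic tar archive: entries are added in sorted order, and every header field that normally varies between machines (mtime, owner, mode) is fixed. The helper name is hypothetical, and the sketch assumes the timestamp comes from a `SOURCE_DATE_EPOCH`-style value supplied by the build; it does not describe the actual StellaOps packaging code.

```python
import io
import tarfile


def build_deterministic_tar(files: dict[str, bytes], source_date_epoch: int) -> bytes:
    """Build an uncompressed tar whose bytes depend only on names, contents, and the epoch."""
    buffer = io.BytesIO()
    # USTAR avoids pax extended headers, which can embed extra metadata.
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in sorted(files):            # stable entry order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = source_date_epoch    # normalised timestamp
            info.mode = 0o644
            info.uid = info.gid = 0           # no machine-specific ownership
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


if __name__ == "__main__":
    first = build_deterministic_tar({"b.txt": b"2", "a.txt": b"1"}, 1750420800)
    second = build_deterministic_tar({"a.txt": b"1", "b.txt": b"2"}, 1750420800)
    print(first == second)
```

Compression is left out on purpose: some compressors write their own timestamps into the stream, so a compressed variant would need that header normalised as well.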
### 2.2 First-party SBOMs & provenance
* Each image gets **CycloneDX (JSON+Protobuf) SBOM** and **SLSA-style provenance** attached as **OCI referrers**.
* Scanner's **Buildx generator** is used to produce SBOMs *during* build; a separate post-build scan verifies parity (red flag if drift).
* **Release manifest** (see §6.1) lists all digests and SBOM/attestation refs.
### 2.3 Signing & transparency
* Images are **cosign-signed** (keyless) with a StellaOps release identity; inclusion in a **transparency log** (Rekor) is required.
* SBOM and provenance attestations are **DSSE** and also transparency-logged.
* Release keys (Fulcio roots or public keys) are embedded in **Signer** policy (for **scanner-release validation** on the customer side).
### 2.4 Gates & tests
* **Static**: linters, codegen checks, protobuf API freeze (backward-compat tests).
* **Unit/integration**: per-component, plus **end-to-end** flows (scan→vex→policy→sign→attest).
* **Perf SLOs**: hot paths (SBOM compose, diff, export) measured against budgets.
* **Security**: dependency audit vs Concelier export; container hardening tests; minimal caps.
* **Analyzer smoke**: restart-time language plug-ins (currently Python) verified via `dotnet run --project src/Tools/LanguageAnalyzerSmoke` to ensure manifest integrity plus cold vs warm determinism (<30s / <5s budgets); the harness logs deviations from repository goldens for follow-up.
* **Canary cohort**: internal staging + selected customers; one week on **edge** before **stable** tag.
### 2.5 Debug-store artefacts
* Every release exports stripped debug information for ELF binaries discovered in service images. Debug files follow the GNU build-id layout (`debug/.build-id/<aa>/<rest>.debug`) and are generated via `objcopy --only-keep-debug`.
* `debug/debug-manifest.json` captures build-id to component/image/source mappings with SHA-256 checksums so operators can mirror the directory into debuginfod or offline symbol stores. The manifest (and its `.sha256` companion) ships with every release bundle and Offline Kit.
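The GNU build-id layout splits the hex build-id after its first two characters. A short Python sketch of that mapping follows; the helper name is hypothetical and the validation rules are an assumption.

```python
from pathlib import PurePosixPath


def debug_path_for(build_id: str) -> PurePosixPath:
    """Map a hex build-id to debug/.build-id/<aa>/<rest>.debug."""
    build_id = build_id.lower()
    if len(build_id) < 3 or any(ch not in "0123456789abcdef" for ch in build_id):
        raise ValueError(f"not a usable hex build-id: {build_id!r}")
    # First two hex characters name the directory; the remainder names the file.
    return PurePosixPath("debug/.build-id") / build_id[:2] / f"{build_id[2:]}.debug"


if __name__ == "__main__":
    print(debug_path_for("ABcdef0123"))
```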
---
## 3) Distribution & activation
### 3.1 Registries
* **Primary**: `registry.stella-ops.org` (OCI v2, supports Referrers API).
* **Mirrors**: GHCR (read-only), regional mirrors for latency.
* Operational runbook: see `docs/modules/concelier/operations/mirror.md` for deployment profiles, CDN guidance, and sync automation.
* **Pull by digest only** in Kubernetes/Compose manifests.
**Gating policy**:
* **Core images** (Authority, Scanner, Concelier, Excititor, Attestor, UI): public **read**.
* **Enterprise add-ons** (if any) and **pre-release**: private repos via the **Registry Token Service** (`src/Registry/StellaOps.Registry.TokenService`), which exchanges Authority-issued OpToks for short-lived Docker registry bearer tokens.
> Monetization lever is **signing** (PoE gate), not image pulls, so the core remains simple to consume.
### 3.2 OAuth2 token service (for private repos)
* Docker Registry's token flow backed by **Authority**:
1. Client hits registry (`401` with `WWW-Authenticate: Bearer realm=…`).
2. Client gets an **access token** from the token service (validated by Authority) with `scope=repository:…:pull`.
3. Registry allows pull for the requested repo.
* Tokens are **short-lived** (60–300 s) and **DPoP-bound**.
The token service enforces plan gating via `registry-token.yaml` (see `docs/modules/registry/operations/token-service.md`) and exposes Prometheus metrics (`registry_token_issued_total`, `registry_token_rejected_total`). Revoked licence identifiers halt issuance even when scope requirements are met.
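Step 1 of the token flow hands the client a `WWW-Authenticate` challenge that it must parse before requesting a token. The Python sketch below shows that parsing step and the resulting token request URL, following the general Docker registry token convention of `realm`, `service`, and `scope` parameters. The host name in the example is a placeholder, and DPoP binding and plan gating are deliberately left out, so this is not a description of the Registry Token Service itself.

```python
import re
from urllib.parse import urlencode

_PARAM = re.compile(r'(\w+)="([^"]*)"')


def parse_bearer_challenge(header: str) -> dict[str, str]:
    """Split 'Bearer realm="…",service="…",scope="…"' into a dict of parameters."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"unsupported auth scheme: {scheme!r}")
    return dict(_PARAM.findall(params))


def token_request_url(challenge: dict[str, str]) -> str:
    """Build the URL the client calls to obtain a short-lived access token."""
    query = {key: challenge[key] for key in ("service", "scope") if key in challenge}
    return f"{challenge['realm']}?{urlencode(query)}"


if __name__ == "__main__":
    # Placeholder host; a real challenge comes from the registry's 401 response.
    header = (
        'Bearer realm="https://registry.example.invalid/token",'
        'service="registry.example.invalid",'
        'scope="repository:stellaops/scanner-web:pull"'
    )
    print(token_request_url(parse_bearer_challenge(header)))
```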
### 3.3 Offline kits (air-gapped)
* Tarball per release channel:
```
stellaops-kit-<ver>-<channel>.tar.zst
  /images/    OCI layout with all first-party images (multi-arch)
  /sboms/     CycloneDX JSON+PB for each image
  /attest/    DSSE bundles + Rekor proofs
  /charts/    Helm charts + values templates
  /compose/   docker-compose.yml + .env template
  /plugins/   Concelier/Excititor connectors (restart-time)
  /policy/    example policies
  /manifest/  release.yaml (see §6.1)
```
* Import via CLI `offline kit import`; checks digests and signatures before load.
---
## 4) Licensing (PoE) & monetization
**Principle**: **Only paid StellaOps issues valid signed attestations.** Running the stack is free; signing requires PoE.
### 4.1 PoE issuance
* Customers purchase a plan and obtain a **PoE artifact** from `www.stella-ops.org`:
* **PoE-JWT** (DPoP/mTLS-bound) **or** **PoE mTLS client certificate**.
* Contains: `license_id`, `plan`, `valid_release_year`, `max_version`, `exp`, optional `tenant/customer` IDs.
### 4.2 Online enforcement
* **Signer** calls **Licensing /license/introspect** on every signing request (see signer doc).
* If **revoked/expired/out-of-window** → deny with machine-readable reason.
* All **valid** bundles are DSSE-signed and **Attestor** logs them; Rekor UUID returned.
* UI badges: “**Verified by StellaOps**” with link to the public log.
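The deny path can be pictured as a function that maps PoE claims to a machine-readable reason. The Python sketch below is an assumption-heavy illustration: the reason strings are invented, and the rule that a release is in window when its calendar year does not exceed `valid_release_year` is a guess at the semantics, not the Signer's documented behaviour.

```python
from datetime import datetime, timezone
from typing import Optional


def poe_denial_reason(
    claims: dict,
    release_year: int,
    now: datetime,
    revoked: bool = False,
) -> Optional[str]:
    """Return None when signing may proceed, otherwise a machine-readable reason."""
    if revoked:
        return "revoked"
    if now.timestamp() >= claims["exp"]:      # 'exp' as seconds since the epoch
        return "expired"
    if release_year > claims["valid_release_year"]:
        return "out_of_window"                # release is newer than the PoE covers
    return None


if __name__ == "__main__":
    claims = {
        "license_id": "example",
        "valid_release_year": 2027,
        "exp": datetime(2030, 1, 1, tzinfo=timezone.utc).timestamp(),
    }
    print(poe_denial_reason(claims, 2028, datetime(2027, 6, 20, tzinfo=timezone.utc)))
```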
### 4.3 Air-gapped / offline
* Customers obtain a **time-boxed PoE lease** (signed JSON, 7–30 days).
* Signer accepts the lease and emits **provisional** attestations (clearly labeled).
* When connectivity returns, a background job **endorses** the provisional entries with the cloud service, updating their status to **verified**.
* Operators can export a **verification bundle** for auditors even before endorsement (contains DSSE + local Rekor proof + lease snapshot).
### 4.4 Stolen/abused PoE
* Customers report theft; **Licensing** flags `license_id` as **revoked**.
* Subsequent Signer requests are **denied**; previous attestations remain but can be marked **contested** (UI shows badge, optional re-sign path upon new PoE).
---
## 5) Deployment path (customer side)
### 5.1 First install
* **Helm** (Kubernetes) or **Compose** (VMs). Example (K8s):
```bash
helm repo add stellaops https://charts.stella-ops.org
helm install stella stellaops/platform \
  --version 2.4.0 \
  --set global.channel=stable \
  --set authority.issuer=https://authority.stella.local \
  --set scanner.minio.endpoint=http://minio.stella.local:9000 \
  --set scanner.mongo.uri=mongodb://mongo/scanner \
  --set concelier.mongo.uri=mongodb://mongo/concelier \
  --set excititor.mongo.uri=mongodb://mongo/excititor
```
* Post-install job registers **Authority clients** (Scanner, Signer, Attestor, UI) and prints **bootstrap** URLs and client credentials (sealed secrets).
* UI banner shows **release bundle** and verification state (cosign OK? Rekor OK?).
### 5.2 Updates
* **Blue/green**: pull new bundle by **digest**; deploy side-by-side; cut traffic.
* **Rolling**: upgrade stateful components in safe order:
1. Authority (stateless, dual-key rotation ready)
2. Signer/Attestor (same minor)
3. Scanner WebService & Workers
4. Concelier, then Excititor (schema migrations are expand/contract)
5. UI last
* **DB migrations** are **expand/contract**:
* Phase A (release N): **add** new fields/indexes, write old+new.
* Phase B (N+1): **read** new fields; **drop** old.
* Rollback is a matter of redeploying previous images and keeping both schemas valid.
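The expand/contract phases can be sketched on plain dictionaries. The field names (`sev` as the old field, `severity` as the new one) are hypothetical, and the fallback read in Phase B is an assumption added so that documents not yet rewritten stay readable.

```python
def write_phase_a(doc: dict, level: str) -> dict:
    """Phase A (release N): write the old and the new field side by side."""
    updated = dict(doc)
    updated["sev"] = level                   # old field, still read by the previous release
    updated["severity"] = {"level": level}   # new field introduced by the expand step
    return updated


def read_phase_b(doc: dict) -> str:
    """Phase B (release N+1): prefer the new field, fall back for untouched documents."""
    if "severity" in doc:
        return doc["severity"]["level"]
    return doc["sev"]


def contract(doc: dict) -> dict:
    """Phase B clean-up: drop the old field once every reader uses the new one."""
    return {key: value for key, value in doc.items() if key != "sev"}


if __name__ == "__main__":
    stored = write_phase_a({"_id": "a1"}, "high")
    print(stored["sev"], read_phase_b(stored), contract(stored))
```

Because Phase A keeps writing `sev`, redeploying the previous images after a failed upgrade still finds every field it expects, which is what makes the rollback described above safe.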
### 5.3 Rollback
* Images referenced by **digest**; keep previous release manifest `K` versions back.
* `helm rollback` or compose `docker compose -f release-K.yml up -d`.
* Mongo migrations are additive; **no destructive changes** within a single minor.
---
## 6) Release payloads & manifests
### 6.1 Release manifest (`release.yaml`)
```yaml
release:
  version: "2.4.1"
  channel: "stable"
  date: "2027-06-20T12:00:00Z"
  calendar: "2027.06"
components:
  - name: scanner-webservice
    image: registry.stella-ops.org/stellaops/scanner-web@sha256:aa..bb
    sbom: oci://.../referrers/cdx-json@sha256:11..22
    provenance: oci://.../attest/provenance@sha256:33..44
    signature: { rekorUUID: "…" }
  - name: signer
    image: registry.stella-ops.org/stellaops/signer@sha256:cc..dd
    signature: { rekorUUID: "…" }
charts:
  - name: platform
    version: "2.4.1"
    digest: "sha256:ee..ff"
compose:
  file: "docker-compose.yml"
  digest: "sha256:77..88"
checksums:
  sha256: "… digest of this release.yaml …"
```
The manifest is **cosign-signed**; UI/CLI can verify a bundle without talking to registries.
> Deployment guardrails: the repository keeps channel-aligned Compose bundles
> in `deploy/compose/` and Helm overlays in `deploy/helm/stellaops/`. Both sets
> pull their digests from `deploy/releases/` and are validated by
> `deploy/tools/validate-profiles.sh` to guarantee lint/dry-run cleanliness.
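One guard that follows from the manifest format is checking that every component image is pinned by digest rather than by tag. The Python sketch below works on an already parsed manifest (a plain dict); the helper name is hypothetical, the placement of `components` as a top-level key is assumed, and the sketch is not what `validate-profiles.sh` does.

```python
import re

# A digest-pinned reference ends in '@sha256:' followed by 64 lowercase hex characters.
_DIGEST_REF = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def unpinned_components(manifest: dict) -> list[str]:
    """Return names of components whose image reference is not digest-pinned."""
    return [
        component["name"]
        for component in manifest.get("components", [])
        if not _DIGEST_REF.match(component.get("image", ""))
    ]


if __name__ == "__main__":
    manifest = {
        "components": [
            {"name": "scanner-webservice",
             "image": "registry.stella-ops.org/stellaops/scanner-web@sha256:" + "a" * 64},
            {"name": "signer", "image": "registry.stella-ops.org/stellaops/signer:2.4.1"},
        ]
    }
    print(unpinned_components(manifest))
```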
### 6.2 Image labels (release metadata)
Each image sets OCI labels:
```
org.opencontainers.image.version = "2.4.1"
org.opencontainers.image.revision = "<git sha>"
org.opencontainers.image.created = "2027-06-20T12:00:00Z"
org.stellaops.release.calendar = "2027.06"
org.stellaops.release.channel = "stable"
org.stellaops.build.slsaProvenance = "oci://…"
```
Signer validates the **scanner** image's cosign identity + calendar tag for **release window** checks.
---
## 7) Artifact lifecycle & storage (RustFS/Mongo)
### 7.1 Buckets & prefixes (RustFS)
```
rustfs://stellaops/
  scanner/
    layers/<sha256>/sbom.cdx.json.zst
    images/<imgDigest>/inventory.cdx.pb
    images/<imgDigest>/usage.cdx.pb
    diffs/<old>_<new>/diff.json.zst
    attest/<artifactSha256>.dsse.json
  concelier/
    json/<exportId>/...
    trivy/<exportId>/...
  excititor/
    exports/<exportId>/...
  attestor/
    dsse/<bundleSha256>.json
    proof/<rekorUuid>.json
```
### 7.2 ILM classes
* **`short`**: working artifacts (diffs, queues) — TTL 7–14 days.
* **`default`**: SBOMs & indexes — TTL 90–180 days (configurable).
* **`compliance`**: signed reports & attested exports — retention enforced via RustFS hold or S3 Object Lock (governance/compliance) for 1–7 years.
### 7.3 Artifact Lifecycle Controller (ALC)
* A background worker (part of Scanner.WebService) enforces **TTL** and **reference counting**:
* Artifacts referenced by **reports** or **tickets** are pinned.
* ILM actions logged; UI shows per-class usage & upcoming purges.
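The interaction between TTL and reference counting can be sketched as a single purge predicate. This Python sketch is illustrative only: the TTL values pick the upper end of each range above, and treating `compliance` artifacts as never purgeable by the controller (because retention is enforced by the object store) is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumed defaults taken from the upper end of each ILM class range.
TTL_DAYS = {"short": 14, "default": 180}


def is_purgeable(ilm_class: str, created: datetime, now: datetime, reference_count: int) -> bool:
    """True when an artifact is past its TTL and nothing pins it."""
    if ilm_class == "compliance":
        return False                      # retention handled by hold / Object Lock
    if reference_count > 0:
        return False                      # pinned by a report or ticket
    return now - created > timedelta(days=TTL_DAYS[ilm_class])


if __name__ == "__main__":
    now = datetime(2025, 10, 30, tzinfo=timezone.utc)
    print(is_purgeable("short", now - timedelta(days=15), now, reference_count=0))
```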
> **Migration note.** Follow `docs/modules/scanner/operations/rustfs-migration.md` when transitioning existing
> MinIO buckets to RustFS. The provided migrator is idempotent and safe to rerun per prefix.
### 7.4 Mongo retention
* **Scanner**: `runtime.events` use TTL (e.g., 30–90 days); **catalog** permanent.
* **Concelier/Excititor**: raw docs keep **last N windows**; canonical stores permanent.
* **Attestor**: `entries` permanent; `dedupe` TTL 24–48 h.
### 7.5 Mongo server baseline
* **Minimum supported server:** MongoDB **4.2+**. Driver 3.5.0 removes compatibility shims for 4.0; upstream has already announced 4.0 support will be dropped in upcoming C# driver releases.
* **Deploy images:** Compose/Helm defaults stay on `mongo:7.x`. For air-gapped installs, refresh Offline Kit bundles so the packaged `mongod` matches ≥4.2.
* **Upgrade guard:** During rollout, verify replica sets reach FCV `4.2` or above before swapping binaries; automation should hard-stop if FCV is <4.2.
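The upgrade guard comes down to a numeric comparison of the feature compatibility version. A minimal Python sketch, assuming the FCV has already been read from the replica set as a string such as `"4.2"`; the function name is hypothetical.

```python
def fcv_allows_upgrade(fcv: str, minimum: tuple[int, int] = (4, 2)) -> bool:
    """Compare an FCV string numerically; string comparison would rank '10.0' below '4.2'."""
    major, minor = (int(part) for part in fcv.split(".")[:2])
    return (major, minor) >= minimum


if __name__ == "__main__":
    for candidate in ("4.0", "4.2", "7.0"):
        print(candidate, fcv_allows_upgrade(candidate))
```

Automation following the guard above would stop the rollout as soon as this returns `False` for any replica set member.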
---
## 8) Observability & SLOs (operations)
* **Uptime SLO**: 99.9% for Signer/Authority/Attestor; 99.5% for Scanner WebService; Excititor/Concelier 99.0%.
* **Error budgets**: tracked per month; dashboards show burn rates.
* **Golden signals**:
* **Latency**: token issuance, sign→attest round-trip, scan enqueue→emit, export build.
* **Saturation**: queue depth, Mongo write IOPS, RustFS throughput / queue depth (or S3 metrics when in fallback mode).
* **Traffic**: scans/min, attestations/min, webhook admits/min.
* **Errors**: 5xx rates, cosign verification failures, Rekor timeouts.
Prometheus + OTLP; Grafana dashboards ship in the charts.
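To put the uptime targets in concrete terms, the monthly error budget is the share of the month that may be spent unavailable. The Python sketch below assumes a 30-day month (43,200 minutes) and uses `Decimal` to avoid floating-point rounding; the helper names are illustrative.

```python
from decimal import Decimal

MINUTES_PER_30_DAYS = Decimal(30 * 24 * 60)  # 43,200 minutes


def allowed_downtime_minutes(slo_percent: str) -> Decimal:
    """Error budget in minutes for a 30-day month at the given uptime SLO."""
    return (Decimal(100) - Decimal(slo_percent)) / Decimal(100) * MINUTES_PER_30_DAYS


def burn_rate(downtime_minutes: str, slo_percent: str) -> Decimal:
    """Fraction of the monthly budget already consumed (1 means fully spent)."""
    return Decimal(downtime_minutes) / allowed_downtime_minutes(slo_percent)


if __name__ == "__main__":
    for target in ("99.9", "99.5", "99.0"):
        print(target, allowed_downtime_minutes(target))
```

Under that assumption, 99.9% allows 43.2 minutes per month, 99.5% allows 216 minutes, and 99.0% allows 432 minutes.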
---
## 9) Security & compliance operations
* **Key rotation**:
* Authority JWKS: 60-day cadence, dual-key overlap.
* Release signing identities: rotate per minor or quarterly.
* Sigstore roots mirrored and pinned; alarms on drift.
* **FIPS mode** (Gov build):
* Enforce `ES256` + KMS/HSM; disable Ed25519; MLS ciphers only.
* Local **Rekor v2** and **Fulcio** alternatives; **air-gapped** CA.
* **Vulnerability response**:
* Concelier red-flag advisories trigger accelerated **stable** patch rollout; UI/CLI “security patch available” notice.
* 2025-10: Pinned `MongoDB.Driver` **3.5.0** and `SharpCompress` **0.41.0** across services (DEVOPS-SEC-10-301) to eliminate NU1902/NU1903 warnings surfaced during scanner cache/worker test runs; repacked the local `Mongo2Go` feed so test fixtures inherit the patched dependencies; future bumps follow the same central override pattern.
* **Backups/DR**:
* Mongo nightly snapshots; MinIO versioning + replication (if configured).
* Restore runbooks tested quarterly with synthetic data.
---
## 10) Customer update flow (how versions are fetched & activated)
### 10.1 Online clusters
* **UI** surfaces update banner with **release manifest** diff and risk notes.
* Operator approves → **Controller** pulls new images by digest, runs health checks, moves traffic, and deprecates the old revision.
* Post-switch, **schema Phase B** migrations (if any) run automatically.
### 10.2 Air-gapped clusters
* Operator downloads **offline kit** from a mirror → `stellaops offline kit import`.
* Controller validates bundle checksums and **cosign signatures**; applies charts/compose by digest.
* After install, **verify** page shows green checks: image sigs, SBOMs attached, provenance logged.
### 10.3 CLI self-update (optional)
* `stellaops self-update` pulls a **signed release manifest** and verifies the **CLI binary** with cosign before swapping (admin can disable).
---
## 11) Compatibility & deprecation policy
* **APIs** are stable within a **major**; breaking changes imply **MAJOR++** and deprecation period of one minor.
* **Storage**: expand/contract; “drop old fields” only after one minor grace.
* **Config**: feature flags (default off) for risky features (e.g., eBPF).
---
## 12) Runbooks (selected)
### 12.1 Lost PoE
1. Suspend **automatic attestation** jobs.
2. Use CLI `stellaops signer status` to confirm `entitlement_denied`.
3. Obtain new PoE from portal; verify on Signer `/poe/verify`.
4. Re-enable; optionally **re-sign** last N reports (UI button → batch).
### 12.2 Rekor outage (self-hosted)
* Attestor returns `202 (pending)` with queued proof fetch.
* Keep DSSE bundles locally; resubmit on schedule; UI badge shows **Pending**.
* If outage > SLA, you can switch to a **mirror** log in config; Attestor writes to both when restored.
### 12.3 Emergency downgrade
* Identify prior release manifest (UI → Admin → Releases).
* `helm rollback stella <revision>` (or compose apply previous file).
* Services tolerate skew per §1.3; ensure **Signer/Authority/Attestor** are rolled together.
---
## 13) Example: cluster bootstrap (Compose)
```yaml
version: "3.9"
services:
  authority:
    image: registry.stella-ops.org/stellaops/authority@sha256:...
    env_file: ./env/authority.env
    ports: ["8440:8440"]
  signer:
    image: registry.stella-ops.org/stellaops/signer@sha256:...
    depends_on: [authority]
    environment:
      - SIGNER__POE__LICENSING__INTROSPECTURL=https://www.stella-ops.org/api/v1/license/introspect
  attestor:
    image: registry.stella-ops.org/stellaops/attestor@sha256:...
    depends_on: [signer]
  scanner-web:
    image: registry.stella-ops.org/stellaops/scanner-web@sha256:...
    environment:
      - SCANNER__S3__ENDPOINT=http://minio:9000
  scanner-worker:
    image: registry.stella-ops.org/stellaops/scanner-worker@sha256:...
    deploy: { replicas: 4 }
  concelier:
    image: registry.stella-ops.org/stellaops/concelier@sha256:...
  excititor:
    image: registry.stella-ops.org/stellaops/excititor@sha256:...
  web-ui:
    image: registry.stella-ops.org/stellaops/web-ui@sha256:...
  mongo:
    image: mongo:7
  minio:
    image: minio/minio:RELEASE.2025-07-10T00-00-00Z
```
---
## 14) Governance & keys (who owns the trust root)
* **Release key policy**: only the Release Engineering group can push signed releases; 4-eyes approval; TUF-style manifest possible in future.
* **Signer acceptance policy**: embedded release identities are updated **only** via minor upgrade; emergency CRL supported.
* **Customer keys**: none needed for core use; enterprise add-ons may require per-customer registries and keys.
---
## 15) Roadmap (Ops)
* **Windows containers GA** (Scanner + Zastava).
* **Key Transparency** for Signer certs.
* **Delta-kit** (offline) for incremental updates.
* **Operator CRDs** (K8s) to manage policy and ILM declaratively.
* **SBOM protobuf** as default transport at rest (smaller, faster).
---
### Appendix A — Minimal SLO monitors
* `authority.tokens_issued_total` slope ≈ normal.
* `signer.requests_total{result="success"}/minute` > 0 (when scans occur).
* `attestor.submit_latency_seconds{quantile=0.95}` < 0.3.
* `scanner.scan_latency_seconds{quantile=0.95}` < target per image size.
* `concelier.export.duration_seconds` stable; `excititor.consensus.conflicts_total` not exploding after policy changes.
* RustFS request error rate near zero (or `s3_requests_errors_total` when operating against S3); Mongo `opcounters` hit expected baseline.
### Appendix B — Upgrade safety checklist
* Verify **release manifest** signature.
* Ensure **Signer/Authority/Attestor** are same minor.
* Verify **DB backups** < 24h old.
* Confirm **ILM** won't purge compliance artifacts during upgrade window.
* Roll **one component** at a time; watch SLOs; abort on regression.
---
**End — component_architecture_devops.md**

@@ -0,0 +1,22 @@
# Implementation plan — DevOps
## Current objectives
- Maintain deterministic behaviour and offline parity across releases.
- Keep documentation, telemetry, and runbooks aligned with the latest sprint outcomes.
## Workstreams
- Backlog grooming: reconcile open stories in ../../TASKS.md with this module's roadmap.
- Implementation: collaborate with service owners to land feature work defined in SPRINTS/EPIC docs.
- Validation: extend tests/fixtures to preserve determinism and provenance requirements.
## Epic milestones
- **Epic 1 – AOC enforcement:** ensure CI/CD guardrails, schema validation, and verifier pipelines are enforced.
- **Epic 9 – Orchestrator Dashboard:** deliver dashboards, recovery runbooks, and rate-limit governance.
- **Epic 10 – Export Center:** manage signing/promotions and Offline Kit bundle publishing.
- **Epic 15 – Observability & Forensics:** coordinate telemetry deployments, evidence retention, and forensic automation.
- Track module runbooks (DEVOPS-LAUNCH-18-001/900) and telemetry automation via ../../TASKS.md and ops/devops/TASKS.md.
## Coordination
- Review ./AGENTS.md before picking up new work.
- Sync with cross-cutting teams noted in ../../implplan/SPRINTS.md.
- Update this plan whenever scope, dependencies, or guardrails change.

@@ -0,0 +1,50 @@
# SemVer Style Backfill Runbook
_Last updated: 2025-10-11_
## Overview
The SemVer style migration populates the new `normalizedVersions` field on advisory documents and ensures
provenance `decisionReason` values are preserved during future reads. The migration is idempotent and only
runs when the feature flag `concelier:storage:enableSemVerStyle` is enabled.
## Preconditions
1. **Review configuration**: set `concelier.storage.enableSemVerStyle` to `true` on all Concelier services.
2. **Confirm batch size**: adjust `concelier.storage.backfillBatchSize` if you need smaller batches for older
deployments (default: `250`).
3. **Back up**: capture a fresh snapshot of the `advisory` collection or a full MongoDB backup.
4. **Staging dry-run**: enable the flag in a staging environment and observe the migration output before
rolling to production.
## Execution
No manual command is required. After deploying the configuration change, restart the Concelier WebService or
any component that hosts the Mongo migration runner. During startup you will see log entries similar to:
```
Applying Mongo migration 20251011-semver-style-backfill: Populate advisory.normalizedVersions for existing documents when SemVer style storage is enabled.
Mongo migration 20251011-semver-style-backfill applied
```
The migration reads advisories in batches (`concelier.storage.backfillBatchSize`) and writes flattened
`normalizedVersions` arrays. Existing documents without SemVer ranges remain untouched.
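The batch-and-flatten behaviour can be pictured with a small Python sketch. It is not the migration's code: the embedded document shape (`affectedPackages[].normalizedVersions[]` with an `identifier` per package) is a hypothetical stand-in for the real advisory schema, and only the default batch size of 250 comes from this runbook.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable[dict], size: int = 250) -> Iterator[list[dict]]:
    """Yield lists of at most `size` documents, mirroring backfillBatchSize."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def flatten_normalized_versions(advisory: dict) -> dict:
    """Copy embedded per-package rules into a top-level normalizedVersions array."""
    flattened = [
        {"package": package.get("identifier"), **rule}
        for package in advisory.get("affectedPackages", [])      # hypothetical field
        for rule in package.get("normalizedVersions", [])
    ]
    if not flattened:
        return advisory   # documents without SemVer rules stay untouched
    return {**advisory, "normalizedVersions": flattened}
```

Because the top-level array is always recomputed from the embedded data, applying the function twice yields the same document, which matches the idempotence the runbook describes.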
## Post-checks
1. Verify the new indexes exist:
```
db.advisory.getIndexes()
```
You should see `advisory_normalizedVersions_pkg_scheme_type` and `advisory_normalizedVersions_value`.
2. Spot check a few advisories to confirm the top-level `normalizedVersions` array exists and matches
the embedded package data.
3. Run `dotnet test` for `StellaOps.Concelier.Storage.Mongo.Tests` (optional but recommended) in CI to confirm
the storage suite passes with the feature flag enabled.
## Rollback
Set `concelier.storage.enableSemVerStyle` back to `false` and redeploy. The migration will be skipped on
subsequent startups. You can leave the populated `normalizedVersions` arrays in place; they are ignored when
the feature flag is off. If you must remove them entirely, restore from the backup captured during
preparation.

@@ -0,0 +1,151 @@
# StellaOps Deployment Upgrade & Rollback Runbook
_Last updated: 2025-10-26 (Sprint 14 DEVOPS-OPS-14-003)._
This runbook describes how to promote a new release across the supported deployment profiles (Helm and Docker Compose), how to roll back safely, and how to keep channels (`edge`, `stable`, `airgap`) aligned. All steps assume you are working from a clean checkout of the release branch/tag.
---
## 1. Channel overview
| Channel | Release manifest | Helm values | Compose profile |
|---------|------------------|-------------|-----------------|
| `edge` | `deploy/releases/2025.10-edge.yaml` | `deploy/helm/stellaops/values-dev.yaml` | `deploy/compose/docker-compose.dev.yaml` |
| `stable` | `deploy/releases/2025.09-stable.yaml` | `deploy/helm/stellaops/values-stage.yaml`, `deploy/helm/stellaops/values-prod.yaml` | `deploy/compose/docker-compose.stage.yaml`, `deploy/compose/docker-compose.prod.yaml` |
| `airgap` | `deploy/releases/2025.09-airgap.yaml` | `deploy/helm/stellaops/values-airgap.yaml` | `deploy/compose/docker-compose.airgap.yaml` |
Infrastructure components (MongoDB, MinIO, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Supporting dependencies such as `nats` remain on upstream LTS tags; review `deploy/compose/*.yaml` for the authoritative set.
---
## 2. Pre-flight checklist
1. **Refresh release manifest**
Pull the latest manifest for the channel you are promoting (`deploy/releases/<version>-<channel>.yaml`).
2. **Align deployment bundles with the manifest**
Run the alignment checker for every profile that should pick up the release. Pass `--ignore-repo nats` to skip auxiliary services.
```bash
./deploy/tools/check-channel-alignment.py \
  --release deploy/releases/2025.10-edge.yaml \
  --target deploy/helm/stellaops/values-dev.yaml \
  --target deploy/compose/docker-compose.dev.yaml \
  --ignore-repo nats
```
Repeat for other channels (`stable`, `airgap`), substituting the manifest and target files.
3. **Lint and template profiles**
```bash
./deploy/tools/validate-profiles.sh
```
4. **Smoke the Offline Kit debug store (edge/stable only)**
When the release pipeline has generated `out/release/debug/.build-id/**`, mirror the assets into the Offline Kit staging tree:
```bash
./ops/offline-kit/mirror_debug_store.py \
  --release-dir out/release \
  --offline-kit-dir out/offline-kit
```
Archive the resulting `out/offline-kit/metadata/debug-store.json` alongside the kit bundle.
5. **Review compatibility matrix**
Confirm MongoDB, MinIO, and RustFS versions in the release manifest match platform SLOs. The default targets are `mongo@sha256:c258`, `minio@sha256:14ce`, `rustfs:2025.10.0-edge`.
6. **Create a rollback bookmark**
Record the current Helm revision (`helm history stellaops -n stellaops`) and compose tag (`git describe --tags`) before applying changes.
---
## 3. Helm upgrade procedure (staging → production)
1. Switch to the deployment branch and ensure secrets/config maps are current.
2. Apply the upgrade in the staging cluster:
```bash
helm upgrade stellaops deploy/helm/stellaops \
  -f deploy/helm/stellaops/values-stage.yaml \
  --namespace stellaops \
  --atomic \
  --timeout 15m
```
3. Run smoke tests (`scripts/smoke-tests.sh` or environment-specific checks).
4. Promote to production using the prod values file and the same command.
5. Record the new revision number and Git SHA in the change log.
### Rollback (Helm)
1. Identify the previous revision: `helm history stellaops -n stellaops`.
2. Execute:
```bash
helm rollback stellaops <revision> \
  --namespace stellaops \
  --wait \
  --timeout 10m
```
3. Verify `kubectl get pods` returns healthy workloads; rerun smoke tests.
4. Update the incident/operations log with root cause and rollback details.
---
## 4. Docker Compose upgrade procedure
1. Update environment files (`deploy/compose/env/*.env.example`) with any new settings and sync secrets to hosts.
2. Pull the tagged repository state corresponding to the release (e.g. `git checkout 2025.09.2` for stable).
3. Apply the upgrade:
```bash
docker compose \
--env-file deploy/compose/env/prod.env \
-f deploy/compose/docker-compose.prod.yaml \
pull
docker compose \
--env-file deploy/compose/env/prod.env \
-f deploy/compose/docker-compose.prod.yaml \
up -d
```
4. Tail logs for critical services (`docker compose logs -f authority concelier`).
5. Check monitoring dashboards and alerts to confirm normal operation.
### Rollback (Compose)
1. Check out the previous release tag (e.g. `git checkout 2025.09.1`).
2. Re-run `docker compose pull` and `docker compose up -d` with that profile; Compose pulls and starts the image digests pinned at that tag.
3. If reverting to a known-good snapshot is required, restore volume backups (see `docs/modules/authority/operations/backup-restore.md` and associated service guides).
4. Log the rollback in the operations journal.
---
## 5. Channel promotion workflow
1. Author or update the channel manifest under `deploy/releases/`.
2. Mirror the new digests into Helm/Compose values and run the alignment script for each profile.
3. Commit the changes with a message that references the release version and channel (e.g. `deploy: promote 2025.10.0-edge`).
4. Publish release notes and update `deploy/releases/README.md` (if applicable).
5. Tag the repository when promoting stable or airgap builds.
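The idea behind the alignment check in step 2 can be pictured as a digest comparison between the channel manifest and a deployment profile. The sketch below is not the behaviour of `deploy/tools/check-channel-alignment.py`; `missing_digests` is a hypothetical helper, and it assumes both files reference images by full 64-character `sha256:` digests.

```bash
# Hypothetical helper: list sha256 digests that appear in a release manifest
# but not in a Helm values or Compose file. Prints nothing when aligned.
missing_digests() {
  manifest=$1   # e.g. a channel manifest under deploy/releases/
  target=$2     # e.g. a Helm values file or Compose profile
  grep -o 'sha256:[0-9a-f]\{64\}' "$manifest" | sort -u |
    while read -r digest; do
      grep -qF "$digest" "$target" || printf '%s\n' "$digest"
    done
}

# Usage (paths are placeholders):
#   missing_digests "$MANIFEST" deploy/helm/stellaops/values-dev.yaml
```

An empty result only shows that every manifest digest is present in the target; it does not check that each digest is attached to the right service.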
---
## 6. Upgrade rehearsal & rollback drill log
Maintain rehearsal notes in `docs/modules/devops/runbooks/launch-cutover.md` or the relevant sprint planning document. After each drill capture:
- Release version tested
- Date/time
- Participants
- Issues encountered & fixes
- Rollback duration (if executed)
Attach the log to the sprint retro or operational wiki.
| Date (UTC) | Channel | Outcome | Notes |
|------------|---------|---------|-------|
| 2025-10-26 | Documentation dry-run | Planned | Runbook refreshed; next live drill scheduled for 2025-11 edge → stable promotion. |
---
## 7. References
- `deploy/README.md`: structure and validation workflow for deployment bundles.
- `docs/13_RELEASE_ENGINEERING_PLAYBOOK.md`: release automation and signing pipeline.
- `docs/modules/devops/architecture.md`: high-level DevOps architecture, SLOs, and compliance requirements.
- `ops/offline-kit/mirror_debug_store.py`: debug-store mirroring helper.
- `deploy/tools/check-channel-alignment.py`: release vs deployment digest alignment checker.


@@ -0,0 +1,128 @@
# Launch Cutover Runbook - Stella Ops
_Document owner: DevOps Guild (2025-10-26)_
_Scope:_ Full-platform launch from staging to production for release `2025.09.2`.
## 1. Roles and Communication
| Role | Primary | Backup | Contact |
| --- | --- | --- | --- |
| Cutover lead | DevOps Guild (on-call engineer) | Platform Ops lead | `#launch-bridge` (Mattermost) |
| Authority stack | Authority Core guild rep | Security guild rep | `#authority` |
| Scanner / Queue | Scanner WebService guild rep | Runtime guild rep | `#scanner` |
| Storage | Mongo/MinIO operators | Backup DB admin | Pager escalation |
| Observability | Telemetry guild rep | SRE on-call | `#telemetry` |
| Approvals | Product owner + CTO | DevOps lead | Approval recorded in change ticket |
Set up a bridge call 30 minutes before start and keep `#launch-bridge` updated every 10 minutes.
## 2. Timeline Overview (UTC)
| Time | Activity | Owner |
| --- | --- | --- |
| T-24h | Change ticket approved, prod secrets verified, offline kit build status checked (`DEVOPS-OFFLINE-18-005`). | DevOps lead |
| T-12h | Run `deploy/tools/validate-profiles.sh`; capture logs in ticket. | DevOps engineer |
| T-6h | Freeze non-launch deployments; notify guild leads. | Product owner |
| T-2h | Execute rehearsal in staging (Section 3) using `values-stage.yaml` to verify scripts. | DevOps + module reps |
| T-30m | Final go/no-go with guild leads; confirm monitoring dashboards green. | Cutover lead |
| T0 | Execute production cutover steps (Section 4). | Cutover team |
| T+45m | Smoke tests complete (Section 5); announce success or trigger rollback. | Cutover lead |
| T+4h | Post-cutover metrics review, notify stakeholders, close ticket. | DevOps + product owner |
## 3. Rehearsal (Staging) Checklist
1. `docker network create stellaops_frontdoor || true` (if not present on staging jump host).
2. Run `deploy/tools/validate-profiles.sh` and archive output.
3. Apply staging secrets (`kubectl apply -f secrets/stage/` or `helm secrets upgrade`), ensuring `stellaops-stage` credentials align with `values-stage.yaml`.
4. Perform `helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-stage.yaml` in staging cluster.
5. Verify health endpoints: `curl https://authority.stage.../healthz`, `curl https://scanner.stage.../healthz`.
6. Execute smoke CLI: `stellaops-cli scan submit --profile staging --sbom samples/sbom/demo.json` and confirm report status in UI.
7. Document total wall time and any deviations in the rehearsal log.
Rehearsal must complete without manual interventions before proceeding to production.
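To capture the total wall time required by step 7, the rehearsal can be bracketed with two timestamps. This is a minimal sketch; `format_duration` is a hypothetical helper and not an existing repository script.

```bash
# Hypothetical helper: format a duration given in whole seconds as HH:MM:SS.
format_duration() {
  total=$1
  printf '%02d:%02d:%02d\n' \
    $(( total / 3600 )) $(( (total % 3600) / 60 )) $(( total % 60 ))
}

# Usage: bracket the rehearsal and log the elapsed wall time.
#   start=$(date +%s)
#   ... run checklist steps 1-6 ...
#   format_duration $(( $(date +%s) - start ))
```

The same helper can be used to record rollback duration during a drill.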
## 4. Production Cutover Steps
### 4.1 Pre-flight
- Confirm production secrets in the appropriate secret store (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`) contain the keys referenced in `values-prod.yaml`.
- Ensure the external reverse proxy network exists: `docker network create stellaops_frontdoor || true` on each compose host.
- Back up current configuration and data:
- Mongo snapshot: `mongodump --uri "$MONGO_BACKUP_URI" --out /backups/launch-$(date -Iseconds)`.
- MinIO bucket mirror: `mc mirror --overwrite minio/stellaops minio-backup/stellaops-$(date +%Y%m%d%H%M)`.
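The rollback procedure restores from `launch-<timestamp>` and `stellaops-<timestamp>`, so it can help to derive both backup locations from one captured timestamp and note it in the change ticket. The sketch below is one possible convention, not the runbook's required format; `backup_targets` is a hypothetical helper.

```bash
# Hypothetical helper: derive both backup locations from a single timestamp so
# the rollback steps can refer to one recorded value.
backup_targets() {
  ts=$1   # e.g. "$(date -u +%Y%m%dT%H%M%SZ)"
  printf 'mongo=/backups/launch-%s\n' "$ts"
  printf 'minio=minio-backup/stellaops-%s\n' "$ts"
}

# Usage:
#   ts=$(date -u +%Y%m%dT%H%M%SZ)
#   backup_targets "$ts"    # paste the output into the change ticket
```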
### 4.2 Apply Updates (Compose)
1. On each compose node, pull updated images for release `2025.09.2`:
```bash
docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml pull
```
2. Deploy changes:
```bash
docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml up -d
```
3. Confirm containers healthy via `docker compose ps` and `docker logs <service> --tail 50`.
### 4.3 Apply Updates (Helm/Kubernetes)
If using Kubernetes, perform:
```bash
helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml --namespace stellaops --atomic --timeout 15m
```
Monitor rollout with `kubectl get pods -n stellaops --watch` and `kubectl rollout status deployment/<service> -n stellaops`.
### 4.4 Configuration Validation
- Verify Authority issuer metadata: `curl https://authority.prod.../.well-known/openid-configuration`.
- Validate Signer DSSE endpoint: `stellaops-cli signer verify --base-url https://signer.prod... --bundle samples/dsse/demo.json`.
- Check Scanner queue connectivity: `docker exec stellaops-scanner-web dotnet StellaOps.Scanner.WebService.dll health queue` (should return success).
- Ensure Notify (legacy) is still accessible while the Notifier migration is pending.
## 5. Smoke Tests
| Test | Command / Action | Expected Result |
| --- | --- | --- |
| API health | `curl https://scanner.prod.../healthz` | HTTP 200 with `"status":"Healthy"` |
| Scan submit | `stellaops-cli scan submit --profile prod --sbom samples/sbom/demo.json` | Scan completes in under 5 minutes; report accessible with signed DSSE |
| Runtime event ingest | Post sample event from Zastava observer fixture | `/runtime/events` responds 202 Accepted; record visible in Mongo `runtime_events` |
| Signing | `stellaops-cli signer sign --bundle demo.json` | Returns DSSE with matching SHA256 and signer metadata |
| Attestor verify | `stellaops-cli attestor verify --uuid <uuid>` | Verification result `ok=true` |
| Web UI | Manual login, verify dashboards render and latency within budget | UI loads under 2 seconds; policy views consistent |
Log results in the change ticket with timestamps and screenshots where applicable.
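The API health row can be turned into a scripted pass/fail check. This is a minimal sketch, assuming the `/healthz` body is JSON containing a `status` field with the value `Healthy`; `is_healthy` and `SCANNER_URL` are hypothetical names introduced for illustration.

```bash
# Hypothetical helper: succeed when a /healthz response body read from stdin
# reports a Healthy status. Assumes a JSON body with "status":"Healthy".
is_healthy() {
  grep -q '"status"[[:space:]]*:[[:space:]]*"Healthy"'
}

# Usage (SCANNER_URL is a placeholder for the production scanner base URL;
# curl -f makes non-2xx responses fail before the body is inspected):
#   curl -fsS "$SCANNER_URL/healthz" | is_healthy && echo "API health: PASS"
```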
## 6. Rollback Procedure
1. Assess failure scope; if systemic, initiate rollback immediately while preserving logs/artifacts.
2. For Compose:
```bash
docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml down
docker compose --env-file stage.env -f deploy/compose/docker-compose.stage.yaml up -d
```
3. For Helm:
```bash
helm rollback stellaops <previous-revision> --namespace stellaops
```
4. Restore the Mongo snapshot if data inconsistency is detected: `mongorestore --uri "$MONGO_BACKUP_URI" --drop /backups/launch-<timestamp>`.
5. Restore MinIO mirror if required: `mc mirror minio-backup/stellaops-<timestamp> minio/stellaops`.
6. Notify stakeholders of rollback and capture root cause notes in incident ticket.
## 7. Post-cutover Actions
- Keep heightened monitoring for 4 hours post cutover; track latency, error rates, and queue depth.
- Confirm audit trails: Authority tokens issued, Scanner events recorded, Attestor submissions stored.
- Update `docs/modules/devops/runbooks/launch-readiness.md` if any new gaps or follow-ups are discovered.
- Schedule retrospective within 48 hours; include DevOps, module guilds, and product owner.
## 8. Approval Matrix
| Step | Required Approvers | Record Location |
| --- | --- | --- |
| Production deployment plan | CTO + DevOps lead | Change ticket comment |
| Cutover start (T0) | DevOps lead + module reps | `#launch-bridge` summary |
| Post-smoke success | DevOps lead + product owner | Change ticket closure |
| Rollback (if invoked) | DevOps lead + CTO | Incident ticket |
Retain all approvals and logs for audit. Update this runbook after each execution to record actual timings and lessons learned.
## 9. Rehearsal Log
| Date (UTC) | What We Exercised | Outcome | Follow-up |
| --- | --- | --- | --- |
| 2025-10-26 | Dry-run of compose/Helm validation via `deploy/tools/validate-profiles.sh` (dev/stage/prod/airgap/mirror). Network creation simulated (`docker network create stellaops_frontdoor` planned) and stage CLI submission reviewed. | Validation script succeeded; all profiles templated cleanly. Stage deployment apply deferred because no staging cluster is accessible from the current environment. | Schedule full stage rehearsal once staging cluster credentials are available; reuse this log section to capture timings. |


@@ -0,0 +1,49 @@
# Launch Readiness Record - Stella Ops
_Updated: 2025-10-26 (UTC)_
This document captures production launch sign-offs, deployment readiness checkpoints, and any open risks that must be tracked before GA cutover.
## 1. Sign-off Summary
| Module / Service | Guild / Point of Contact | Evidence (Task or Runbook) | Status | Timestamp (UTC) | Notes |
| --- | --- | --- | --- | --- | --- |
| Authority (Issuer) | Authority Core Guild | `AUTH-AOC-19-001` - scope issuance & configuration complete (DONE 2025-10-26) | READY | 2025-10-26T14:05Z | Tenant scope propagation follow-up (`AUTH-AOC-19-002`) tracked in gaps section. |
| Signer | Signer Guild | `SIGNER-API-11-101` / `SIGNER-REF-11-102` / `SIGNER-QUOTA-11-103` (DONE 2025-10-21) | READY | 2025-10-26T14:07Z | DSSE signing, referrer verification, and quota enforcement validated in CI. |
| Attestor | Attestor Guild | `ATTESTOR-API-11-201` / `ATTESTOR-VERIFY-11-202` / `ATTESTOR-OBS-11-203` (DONE 2025-10-19) | READY | 2025-10-26T14:10Z | Rekor submission/verification pipeline green; telemetry pack published. |
| Scanner Web + Worker | Scanner WebService Guild | `SCANNER-WEB-09-10x`, `SCANNER-RUNTIME-12-30x` (DONE 2025-10-18 -> 2025-10-24) | READY* | 2025-10-26T14:20Z | Orchestrator envelope work (`SCANNER-EVENTS-16-301/302`) still open; see gaps. |
| Concelier Core & Connectors | Concelier Core / Ops Guild | Ops runbook sign-off in `docs/modules/concelier/operations/conflict-resolution.md` (2025-10-16) | READY | 2025-10-26T14:25Z | Conflict resolution & connector coverage accepted; Mongo schema hardening pending (see gaps). |
| Excititor API | Excititor Core Guild | Wave 0 connector ingest sign-offs (EXECPLAN.Section Wave 0) | READY | 2025-10-26T14:28Z | VEX linkset publishing complete for launch datasets. |
| Notify Web (legacy) | Notify Guild | Existing stack carried forward; Notifier program tracked separately (Sprint 38-40) | PENDING | 2025-10-26T14:32Z | Legacy notify web remains operational; migration to Notifier blocked on `SCANNER-EVENTS-16-301`. |
| Web UI | UI Guild | Stable build `registry.stella-ops.org/.../web-ui@sha256:10d9248...` deployed in stage and smoke-tested | READY | 2025-10-26T14:35Z | Policy editor GA items (Sprint 20) outside launch scope. |
| DevOps / Release | DevOps Guild | `deploy/tools/validate-profiles.sh` run (2025-10-26) covering dev/stage/prod/airgap/mirror | READY | 2025-10-26T15:02Z | Compose/Helm lint + docker compose config validated; see Section 2 for details. |
| Offline Kit | Offline Kit Guild | `DEVOPS-OFFLINE-18-004` (Go analyzer) and `DEVOPS-OFFLINE-18-005` (Python analyzer) complete; debug-store mirror pending (`DEVOPS-OFFLINE-17-004`). | PENDING | 2025-10-26T15:05Z | Awaiting release debug artefacts to finalise `DEVOPS-OFFLINE-17-004`; tracked in Section 3. |
_\* READY with caveat - remaining work noted in Section 3._
## 2. Deployment Readiness Checklist
- **Production profiles committed:** `deploy/compose/docker-compose.prod.yaml` and `deploy/helm/stellaops/values-prod.yaml` added with front-door network hand-off and secret references for Mongo/MinIO/core services.
- **Secrets placeholders documented:** `deploy/compose/env/prod.env.example` enumerates required credentials (`MONGO_INITDB_ROOT_PASSWORD`, `MINIO_ROOT_PASSWORD`, Redis/NATS endpoints, `FRONTDOOR_NETWORK`). Helm values reference Kubernetes secrets (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`).
- **Static validation executed:** `deploy/tools/validate-profiles.sh` run on 2025-10-26 (docker compose config + helm lint/template) with all profiles passing.
- **Ingress model defined:** Production compose profile introduces external `frontdoor` network; README updated with creation instructions and scope of externally reachable services.
- **Observability hooks:** Authority/Signer/Attestor telemetry packs verified; scanner runtime build-id metrics landed (`SCANNER-RUNTIME-17-401`). Grafana dashboards referenced in component runbooks.
- **Rollback assets:** Stage Compose profile remains aligned (`docker-compose.stage.yaml`), enabling rehearsals before prod cutover; release manifests (`deploy/releases/2025.09-stable.yaml`) map digests for reproducible rollback.
- **Rehearsal status:** 2025-10-26 validation dry-run executed (`deploy/tools/validate-profiles.sh` across dev/stage/prod/airgap/mirror). Full stage Helm rollout pending access to the managed staging cluster; target to complete once credentials are provisioned.
## 3. Outstanding Gaps & Follow-ups
| Item | Owner | Tracking Ref | Target / Next Step | Impact |
| --- | --- | --- | --- | --- |
| Tenant scope propagation and audit coverage | Authority Core Guild | `AUTH-AOC-19-002` (DOING 2025-10-26) | Land enforcement + audit fixtures by Sprint 19 freeze | Medium - required for multi-tenant GA but does not block initial cutover if tenants scoped manually. |
| Orchestrator event envelopes + Notifier handshake | Scanner WebService Guild | `SCANNER-EVENTS-16-301` (BLOCKED), `SCANNER-EVENTS-16-302` (DOING) | Coordinate with Gateway/Notifier owners on preview package replacement or binding redirects; rerun `dotnet test` once patch lands and refresh schema docs. Share envelope samples in `docs/events/` after tests pass. | High - gating Notifier migration; legacy notify path remains functional meanwhile. |
| Offline Kit Python analyzer bundle | Offline Kit Guild + Scanner Guild | `DEVOPS-OFFLINE-18-005` (DONE 2025-10-26) | Monitor for follow-up manifest updates and rerun smoke script when analyzers change. | Medium - ensures language analyzer coverage stays current for offline installs. |
| Offline Kit debug store mirror | Offline Kit Guild + DevOps Guild | `DEVOPS-OFFLINE-17-004` (BLOCKED 2025-10-26) | Release pipeline must publish `out/release/debug` artefacts; once available, run `mirror_debug_store.py` and commit `metadata/debug-store.json`. | Low - symbol lookup remains accessible from staging assets but required before next Offline Kit tag. |
| Mongo schema validators for advisory ingestion | Concelier Storage Guild | `CONCELIER-STORE-AOC-19-001` (TODO) | Finalize JSON schema + migration toggles; coordinate with Ops for rollout window | Low - current validation handled in app layer; schema guard adds defense-in-depth. |
| Authority plugin telemetry alignment | Security Guild | `SEC2.PLG`, `SEC3.PLG`, `SEC5.PLG` (BLOCKED pending AUTH DPoP/MTLS tasks) | Resume once upstream auth surfacing stabilises | Low - plugin remains optional; launch uses default Authority configuration. |
## 4. Approvals & Distribution
- Record shared in `#launch-readiness` (Mattermost) 2025-10-26 15:15 UTC with DevOps + Guild leads for acknowledgement.
- Updates to this document require dual sign-off from DevOps Guild (owner) and impacted module guild lead; retain change log via Git history.
- Cutover rehearsal and rollback drills are tracked separately in `docs/modules/devops/runbooks/launch-cutover.md` (see associated Task `DEVOPS-LAUNCH-18-001`).


@@ -0,0 +1,64 @@
# NuGet Preview Bootstrap (Offline-Friendly)
The StellaOps build relies on .NET 10 RC2 packages (Microsoft.Extensions.*, JwtBearer 10.0 RC).
`NuGet.config` now wires three sources:
1. `local` → `./local-nuget` (preferred, air-gapped mirror)
2. `dotnet-public` → `https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-public/nuget/v3/index.json`
3. `nuget.org` → fallback for everything else
Follow the steps below whenever you refresh the repo or roll a new Offline Kit drop.
## 1. Mirror the preview packages
```bash
./ops/devops/sync-preview-nuget.sh
```
* Reads `ops/devops/nuget-preview-packages.csv`. Each line specifies the package, version, expected SHA-256 hash, and (optionally) the flat-container base URL (we pin to `dotnet-public`).
* Downloads the `.nupkg` straight into `./local-nuget/` and re-verifies the checksum. Existing files are skipped when hashes already match.
* Use `NUGET_V2_BASE` if you need to temporarily point at a different mirror.
💡 The script never mutates packages in place—if a checksum changes you will see a “SHA mismatch … refreshing” message.
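The download and checksum steps described above can be sketched as two small functions. This is not the actual content of `sync-preview-nuget.sh`; `nupkg_url` and `sha256_matches` are hypothetical helpers. The URL layout follows the NuGet v3 flat-container convention (`<base>/<id>/<version>/<id>.<version>.nupkg`, with id and version lowercased), and the hash check assumes `sha256sum` is available (on macOS, `shasum -a 256` would be the substitute).

```bash
# Hypothetical helper: build a flat-container download URL for one package.
nupkg_url() {
  base=${1%/}   # flat-container base URL, trailing slash removed
  id=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  version=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')
  printf '%s/%s/%s/%s.%s.nupkg\n' "$base" "$id" "$version" "$id" "$version"
}

# Hypothetical helper: succeed when the file's SHA-256 equals the expected hex
# digest from the manifest (comparison is case-insensitive).
sha256_matches() {
  file=$1
  expected=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  [ "$actual" = "$expected" ]
}

# Usage (variables are placeholders for one manifest row):
#   sha256_matches "local-nuget/$file" "$sha" || curl -fsSL -o "local-nuget/$file" \
#     "$(nupkg_url "$base" "$id" "$version")"
```

Skipping the download when the hash already matches keeps repeated syncs cheap, which matters when refreshing a mirror over a slow link.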
## 2. Restore using the shared `NuGet.config`
From the repo root:
```bash
DOTNET_NOLOGO=1 dotnet restore src/Excititor/__Libraries/StellaOps.Excititor.Connectors.Abstractions/StellaOps.Excititor.Connectors.Abstractions.csproj \
--configfile NuGet.config
```
The `packageSourceMapping` section keeps `Microsoft.Extensions.*`, `Microsoft.AspNetCore.*`, and `Microsoft.Data.Sqlite` bound to `local`/`dotnet-public`, so `dotnet restore` never has to reach out to nuget.org when mirrors are populated.
Before committing changes (or when wiring up a new environment) run:
```bash
python3 ops/devops/validate_restore_sources.py
```
The validator asserts:
- `NuGet.config` lists `local` → `dotnet-public` → `nuget.org` in that order.
- `Directory.Build.props` pins `RestoreSources` so every project prioritises the local mirror.
- No stray `NuGet.config` files shadow the repo root configuration.
CI executes the validator in both the `build-test-deploy` and `release` workflows,
so regressions trip before any restore/build begins.
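The source-order assertion can be approximated by hand when the validator is not available. The sketch below is not `validate_restore_sources.py`; `package_source_order` is a hypothetical helper, and it assumes each `<add .../>` element inside `<packageSources>` sits on its own line.

```bash
# Hypothetical helper: print the source keys declared inside <packageSources>
# of a NuGet.config read from stdin, in file order.
package_source_order() {
  sed -n '/<packageSources>/,/<\/packageSources>/p' |
    sed -n 's/.*<add[[:space:]][^>]*key="\([^"]*\)".*/\1/p'
}

# Usage:
#   expected=$(printf 'local\ndotnet-public\nnuget.org')
#   [ "$(package_source_order < NuGet.config)" = "$expected" ] \
#     || echo "NuGet.config source order drifted"
```

Restricting the match to the `<packageSources>` block avoids picking up `<add>` entries from other sections of the file.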
If you run fully air-gapped, remember to clear the cache between SDK upgrades:
```bash
dotnet nuget locals all --clear
```
## 3. Troubleshooting
| Symptom | Fix |
| --- | --- |
| `dotnet restore` still hits nuget.org for preview packages | Re-run `sync-preview-nuget.sh` to ensure the `.nupkg` exists locally, then delete `~/.nuget/packages/microsoft.extensions.*` so the resolver picks up the mirrored copy. |
| SHA mismatch in the manifest | Update `ops/devops/nuget-preview-packages.csv` with the new version + checksum (from the feed) and re-run the sync script. |
| Azure DevOps feed throttling | Set `DOTNET_PUBLIC_FLAT_BASE` env var and point it at your own mirrored flat-container, then add the URL to the 4th column of the manifest. |
Keep this doc alongside Offline Kit instructions so air-gapped operators know exactly how to refresh the mirror and verify packages before restore.