> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

# Epic 13: Containerized Distribution & Quickstart

**Short name:** `Containerized Distribution & Quickstart`

**Primary components:** OCI images for all services, Compose Quickstart, Helm chart for production, Air-gap bundles

**Surfaces:** Container registry, `/deploy/*`, `/docs/install/*`, Console onboarding screen

**Touches:** Authority (authN/Z), Web Services API, Orchestrator, Task Runner, Policy Engine, Conseiller (Feedser), Excitator (Vexer), Findings Ledger, Export Center, Notifications Studio, Advisory AI Assistant, Object Storage/KMS, Telemetry

**AOC ground rule reminder:** Conseiller and Excitator aggregate and link advisories/VEX. They never merge or mutate source records. Containerized deployments must preserve this behavior and expose links to the original records.

---

## 1) What it is

A complete, reproducible containerized distribution of StellaOps with three delivery modes:

1. **Quickstart (single host)** using Docker Compose: one command starts a full stack suitable for evaluation and local development. Ships with seed data and sane defaults.

2. **Production Helm chart** for Kubernetes: a modular, scalable, secure-by-default deployment with optional HA and external dependencies.

3. **Air-gapped bundles**: signed offline packages containing images, seed configs, and installation scripts for disconnected environments.

All images are multi-arch (amd64/arm64), signed, SBOM-attached, and versioned with a consistent tagging scheme. A “Download & Install” doc set guides users from zero to a working system in minutes, and to a production-ready posture in hours.

---

## 2) Why (brief)

People don’t adopt tools they can’t run quickly or securely. Containers make our deployment reproducible; Quickstart removes friction; Helm unlocks real operations. Air-gap bundles acknowledge the reality of regulated environments.

---

## 3) How it should work (maximum detail)

### 3.1 Image catalog

Build and publish OCI images for the following:

* `stella-api` (Web Services API)
* `stella-console` (Web UI)
* `stella-orchestrator` (source/job scheduler)
* `stella-task-runner` (executes Task Packs remotely)
* `stella-conseiller` (Feedser; advisory aggregator)
* `stella-excitator` (Vexer; VEX aggregator)
* `stella-policy` (Policy Engine)
* `stella-ledger` (Findings Ledger worker, if separated from the API)
* `stella-export` (Export Center worker; optional if part of the API)
* `stella-notify` (Notifications Studio worker)
* `stella-ai` (Advisory AI Assistant; a lightweight service that calls configured LLM backends or local models)
* Support services (optionally bundled for Quickstart): `postgres`, `redis`, `object-store` (S3-compatible), `queue` (NATS or RabbitMQ), `otel-collector`.

**Image standards**

* **Base:** distroless or minimal; non-root user; read-only root filesystem; writable `/tmp` only if needed.
* **Ports:** declare via labels; expose the health endpoints `/health/liveness` and `/health/readiness`.
* **Env:** explicit, documented, with safe defaults; secrets via env or file mounts only.
* **Config:** `STELLA_*` environment variables or a mounted config directory at `/etc/stella/`.
* **SBOM:** attach an SPDX JSON SBOM as an OCI artifact, and bake a copy into the image as `/app/sbom.spdx.json` during the build.
* **Signing:** cosign attestations for image, SBOM, and provenance.
* **Labels:** `org.opencontainers.image.*` (title, version, revision, source, licenses).
* **Entrypoint:** an init process as PID 1 that reaps zombie processes; graceful shutdown on SIGTERM; configurable termination grace period.
* **Logs:** structured JSON by default, written to stdout/stderr only.

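
As an illustration of the **Labels** standard, here is a minimal Python sketch a build script might use to assemble the `org.opencontainers.image.*` labels and pass them to `docker buildx build` through its `--label` flag. The helper names (`oci_labels`, `label_flags`) and every example value, including the source URL and the license, are hypothetical placeholders; only the five label keys come from the list above.

```python
# Standard OCI annotation keys required by the image standards above.
REQUIRED_OCI_LABELS = ("title", "version", "revision", "source", "licenses")


def oci_labels(**values: str) -> dict[str, str]:
    """Build the org.opencontainers.image.* label set; fail fast if a required key is missing."""
    missing = [key for key in REQUIRED_OCI_LABELS if not values.get(key)]
    if missing:
        raise ValueError(f"missing OCI labels: {', '.join(missing)}")
    return {f"org.opencontainers.image.{key}": values[key] for key in REQUIRED_OCI_LABELS}


def label_flags(labels: dict[str, str]) -> list[str]:
    """Render labels as repeated `--label key=value` arguments, sorted for reproducible builds."""
    flags: list[str] = []
    for key, value in sorted(labels.items()):
        flags += ["--label", f"{key}={value}"]
    return flags


# Example with placeholder values (not real StellaOps metadata).
labels = oci_labels(
    title="stella-api",
    version="v1.2.3",
    revision="0123abc",
    source="https://example.com/stellaops/stella",
    licenses="Apache-2.0",
)
print(" ".join(["docker", "buildx", "build", *label_flags(labels), "."]))
```

Failing the build on a missing key keeps unlabeled images out of the registry, which matters because the audit requirement in §3.7 relies on these labels.
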
**Tagging scheme**

* `:vX.Y.Z` (immutable release)
* `:vX.Y.Z-rc.N` (release candidate)
* `:edge` (latest build of `main`)
* `:nightly-YYYYMMDD` (optional)
* Multi-arch manifest lists for `linux/amd64` and `linux/arm64`.

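
The tagging scheme is strict enough to check mechanically, for example in a CI step that refuses to push an unexpected tag. A minimal Python sketch, assuming `X.Y.Z` uses SemVer-style numbers without leading zeros and that `N` starts at 1 (neither detail is stated above); `classify_tag` and `is_immutable` are hypothetical names.

```python
import datetime
import re

# Patterns for the tagging scheme above (numbering rules are assumptions).
_RELEASE = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")
_RC = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)-rc\.([1-9]\d*)")
_NIGHTLY = re.compile(r"nightly-(\d{4})(\d{2})(\d{2})")


def classify_tag(tag: str) -> str:
    """Return 'release', 'rc', 'edge' or 'nightly'; raise ValueError otherwise."""
    if _RELEASE.fullmatch(tag):
        return "release"
    if _RC.fullmatch(tag):
        return "rc"
    if tag == "edge":
        return "edge"
    nightly = _NIGHTLY.fullmatch(tag)
    if nightly:
        try:
            # Reject impossible dates such as nightly-20241399.
            datetime.date(int(nightly[1]), int(nightly[2]), int(nightly[3]))
        except ValueError as exc:
            raise ValueError(f"tag {tag!r} has an invalid date") from exc
        return "nightly"
    raise ValueError(f"tag {tag!r} does not follow the tagging scheme")


def is_immutable(tag: str) -> bool:
    """Only :vX.Y.Z is documented as immutable in the scheme above."""
    return classify_tag(tag) == "release"
```

Usage: `classify_tag("v1.2.3-rc.2")` returns `"rc"`, while `classify_tag("latest")` raises, so an accidental `:latest` push fails loudly.
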
### 3.2 Quickstart (Compose)

**Goal:** a `curl | sh` equivalent that yields a working stack on a laptop or server, with defaults and demo data. No internet access is required beyond pulling images, unless configured otherwise.

**Compose file `deploy/compose/docker-compose.yml`**

* Services:
  * `api`, `console`, `orchestrator`, `task-runner`, `conseiller`, `excitator`, `policy`, `notify`, `export`, `ai`
  * `postgres`, `redis`, `minio` (S3), `nats` or `rabbitmq`, `otel-collector`
* Volumes:
  * `pgdata`, `minio-data`, `redis-data`, `stella-state` (local cache, packs registry)
* Networks:
  * `stella-net` (bridge)
* Ports (defaults):
  * Console `8080`, API `8081`, MinIO `9000`, and the default NATS (`4222`) or RabbitMQ (`5672`) port
* Env files:
  * `.env.example` with safe defaults; users copy it to `.env`.

**Seed data**

* Seed an admin account and tenant on first run via the `stella-api` migration/seed job.
* Seed demo SBOMs, advisories, VEX samples, a baseline policy, and a task pack.
* On first login, the Console shows a “Welcome” wizard: confirm endpoints, generate an API token, run a sample scan import, and open the Vulnerability Explorer.

**Security posture**

* Default credentials are used only in Quickstart, and even there secrets are randomized on the first `up` and stored under `.secrets/`.
* All services run as non-root and bind to localhost by default unless `EXPOSE_PUBLIC=1` is set.
* TLS is optional via a Caddy or nginx sidecar, disabled by default.

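
A minimal Python sketch of the “randomize on first `up`” behavior, assuming one file per secret under `.secrets/`; the file layout and the secret names are hypothetical. Creating each file with `O_EXCL` makes the step idempotent: reruns reuse the stored value instead of rotating credentials that an already-initialized database still expects.

```python
import os
import secrets
from pathlib import Path

# Hypothetical secret names; the real list would come from the Compose file.
SECRET_NAMES = ("POSTGRES_PASSWORD", "MINIO_ROOT_PASSWORD", "STELLA_ADMIN_PASSWORD")


def ensure_secrets(secrets_dir: Path, names=SECRET_NAMES) -> dict[str, str]:
    """Generate each secret once (first `up`) and reuse it on later runs."""
    secrets_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    values: dict[str, str] = {}
    for name in names:
        path = secrets_dir / name
        try:
            # O_EXCL: create the file only if it does not exist yet, readable by the owner only.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            pass  # keep the secret generated on an earlier run
        else:
            with os.fdopen(fd, "w") as fh:
                fh.write(secrets.token_urlsafe(32))
        values[name] = path.read_text()
    return values
```

`quickstart.sh` could then export these values into the environment of `docker compose up -d`, so no secret is ever written into `docker-compose.yml` itself.
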
**One-liner**

* `./deploy/compose/quickstart.sh` runs preflight checks, pulls images, writes `.env`, runs `docker compose up -d`, polls readiness, and prints URLs and credentials.

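
The “polls readiness” step can be sketched as follows. This is a Python illustration rather than the shell script itself; the polled URLs would be assumptions derived from the default ports and the `/health/readiness` endpoint, and the 300-second deadline is an arbitrary placeholder. The probe, clock, and sleep are injectable so the loop can be tested without a running stack.

```python
import http.client
import time
import urllib.request
from typing import Callable, Iterable


def http_ready(url: str, timeout: float = 2.0) -> bool:
    """True only if GET `url` answers HTTP 200 within `timeout`; errors count as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):  # URLError/HTTPError are OSError subclasses
        return False


def wait_until_ready(urls: Iterable[str],
                     probe: Callable[[str], bool] = http_ready,
                     deadline_s: float = 300.0,
                     interval_s: float = 2.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Poll until every URL is ready or the deadline passes; return the URLs still not ready."""
    pending = list(urls)
    end = clock() + deadline_s
    while pending:
        pending = [url for url in pending if not probe(url)]
        if not pending or clock() >= end:
            break
        sleep(interval_s)
    return pending
```

Usage: `wait_until_ready(["http://localhost:8081/health/readiness", "http://localhost:8080/health/readiness"])` returns an empty list when the stack is up; anything left over is what the script should report before printing URLs.
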
**Backups & reset**

* `./deploy/compose/backup.sh` creates a tarball of volumes and config.
* `./deploy/compose/reset.sh` deletes persistent volumes after an explicit, prominent confirmation prompt; `--yes` skips the prompt.

### 3.3 Production Helm chart

**Chart location:** `deploy/helm/stella/`, with subcharts or per-component toggles.

**Chart features**

* Components enabled via values: `api`, `console`, `orchestrator`, `taskRunner`, `conseiller`, `excitator`, `policy`, `notify`, `export`, `ai`.
* External dependencies by default:
  * PostgreSQL, Redis, S3 bucket, message queue, and OTel endpoint are provided via values.
  * Optional “bundled” mode for lab clusters using StatefulSets.
* Security:
  * Pod and container `securityContext`: `runAsNonRoot`, `readOnlyRootFilesystem` (container-level), and `fsGroup` (pod-level) when needed.
  * NetworkPolicy for east-west traffic: deny all, then allow specific ports.
  * Secrets as Kubernetes `Secret` objects, sourced from the External Secrets Operator or Sealed Secrets.
* HPA per component; PDBs; liveness/readiness probes.
* Ingress:
  * One hostname for the Console, one for the API; TLS required in production values.
  * Option to serve the Console as static assets behind a CDN while the API sits behind a private ingress gateway.
* Config:
  * Values for Authority provider, token TTLs, policy cache TTL, pack registry endpoint, notification sinks, and export locations.
  * Feature flags to enable individual epics.
* Migrations:
  * A `stella-migrator` Job runs before rollouts; migrations are idempotent.
  * Optional “break glass” manual job.
* Observability:
  * `/metrics` endpoints scraped by Prometheus; exemplars via OTel; structured logs.
  * OpenTelemetry auto-configuration via env when a collector is provided.
* Upgrades:
  * Blue/green or rolling; readiness gates based on background indexers catching up.
  * Chart hooks block until Conseiller/Excitator catch up to feed watermarks.

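
The watermark gate in the last bullet reduces to a comparison per feed source. A minimal Python sketch, assuming each source exposes an upstream watermark and an ingested watermark as timestamps; how a hook Job would fetch them is not specified here, and the 5-minute tolerance and source names are placeholders. A hook Job could poll this until it returns an empty list.

```python
from datetime import datetime, timedelta


def lagging_sources(upstream: dict[str, datetime],
                    ingested: dict[str, datetime],
                    tolerance: timedelta = timedelta(minutes=5)) -> list[str]:
    """Feed sources whose ingested watermark trails upstream by more than `tolerance`.

    A source with no ingested watermark yet counts as lagging.
    """
    lagging = []
    for source, head in sorted(upstream.items()):
        seen = ingested.get(source)
        if seen is None or head - seen > tolerance:
            lagging.append(source)
    return lagging
```

Treating a missing watermark as lagging errs on the side of blocking the upgrade, which matches the intent of gating traffic on indexers catching up.
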
### 3.4 Air-gapped distribution

**Bundle format**

* `stella-bundle-vX.Y.Z.tar.zst` containing:
  * All images in OCI layout (multi-arch), with cosign signatures, SBOMs, and SLSA provenance.
  * `load.sh` to import the images into a local registry.
  * `compose/` and `helm/` directories with pinned image digests.
  * `checksums.txt` and `bundle.sig`.

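
Once `bundle.sig` has been verified (that step establishes that `checksums.txt` itself can be trusted and is not shown here), each file can be checked against `checksums.txt`. A minimal Python sketch, assuming the file uses the `sha256sum` output format (`<hex digest>  <relative path>`); that format is an assumption, not part of the spec above. Paths that resolve outside the bundle are rejected so a tampered listing cannot point the check at arbitrary files.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB image layers need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_checksums(bundle_root: Path, listing: str = "checksums.txt") -> list[str]:
    """Check every `<sha256>  <relative path>` line; an empty result means all files match."""
    root = bundle_root.resolve()
    problems = []
    lines = (root / listing).read_text().splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            problems.append(f"line {line_no}: malformed entry")
            continue
        # sha256sum prefixes the path with '*' for binary-mode entries.
        expected, rel = parts[0].lower(), parts[1].lstrip("*")
        target = (root / rel).resolve()
        if root not in target.parents:
            problems.append(f"line {line_no}: path escapes bundle: {rel}")
        elif not target.is_file():
            problems.append(f"line {line_no}: missing file: {rel}")
        elif sha256_file(target) != expected:
            problems.append(f"line {line_no}: checksum mismatch: {rel}")
    return problems
```

Returning a list of problems instead of stopping at the first one gives operators a complete picture of a damaged transfer in one run.
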
**Process**

* An online build job crafts the bundle; signatures are produced with CI keys.
* Offline install:
  1. Verify `bundle.sig`.
  2. `./load.sh --to registry.local:5000`
  3. `helm install stella ./helm -f values-airgap.yaml --set image.registry=registry.local:5000`

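
`load.sh` has to re-point image references at the private registry without touching repository paths or digests, because the digests are what the signatures cover. A minimal Python sketch of that rewrite, assuming Docker's reference convention that the first path component names a registry only if it contains `.` or `:` or equals `localhost`; the function name and the upstream registry `registry.example.com` are hypothetical.

```python
def rewrite_image_ref(ref: str, registry: str) -> str:
    """Re-point an image reference at `registry`, keeping repository path, tag and digest."""
    first, sep, rest = ref.partition("/")
    # Docker convention: the first component is a registry host only if it looks like one.
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    remainder = rest if has_registry else ref
    return f"{registry.rstrip('/')}/{remainder}"


images = {
    "api": "registry.example.com/stellaops/stella-api@sha256:0123",  # placeholder digest
    "postgres": "postgres:16",
}
print({name: rewrite_image_ref(ref, "registry.local:5000") for name, ref in images.items()})
```

Docker Hub short names such as `postgres:16` map to `registry.local:5000/postgres:16` here; some mirroring setups keep Docker Hub's implicit `library/` prefix instead, so the convention must match how `load.sh` pushes those images.
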
### 3.5 Configuration matrix

Document every config knob in a single table:

* Auth: Authority issuer, JWKS, RBAC cache TTL.
* Storage: DB URL, pool sizes, migration flags.
* Object store: S3 endpoint, buckets, SSE, IAM.
* Queue: URL, prefetch, retention.
* Policy engine: rule cache TTL, default policy version.
* Conseiller/Excitator: polling intervals, feed sources, retry backoff, max in-flight; merge is **always disabled** (enforced).
* Orchestrator/Task Runner: concurrency, sandbox, network egress policy, artifact retention.
* Notifications: sinks, templates path, batch windows.
* Export Center: formats enabled, rate limits.
* AI Assistant: model endpoint, token limits, guardrails; disabled by default.

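
A minimal Python sketch of loading these knobs, assuming the precedence defaults < files in `/etc/stella/` < `STELLA_*` environment variables; both the precedence order and the one-file-per-key layout are assumptions, and `load_config` and the example keys are hypothetical. Kubernetes `Secret` and `ConfigMap` volumes mount one file per key, so the same loader covers file-mounted secrets. Rejecting keys that are absent from the defaults keeps the documented matrix and the code from drifting apart.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

PREFIX = "STELLA_"


def load_config(defaults: Mapping[str, str],
                env: Optional[Mapping[str, str]] = None,
                config_dir: Path = Path("/etc/stella")) -> dict[str, str]:
    """Resolve settings with precedence: defaults < files in config_dir < STELLA_* env vars."""
    env = os.environ if env is None else env
    config = dict(defaults)
    if config_dir.is_dir():
        for entry in sorted(config_dir.iterdir()):
            # Skip hidden entries such as the "..data" link Kubernetes adds to volume mounts.
            if entry.is_file() and not entry.name.startswith("."):
                config[entry.name.upper()] = entry.read_text().strip()
    for name, value in env.items():
        if name.startswith(PREFIX):
            config[name[len(PREFIX):]] = value
    # Every accepted key must appear in the documented defaults (the configuration matrix).
    unknown = sorted(set(config) - set(defaults))
    if unknown:
        raise ValueError(f"undocumented settings: {', '.join(unknown)}")
    return config
```

A typo such as `STELLA_DB_ULR` then fails at startup instead of being silently ignored.
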
### 3.6 Health, readiness, and upgrades

* **Health endpoints:** `GET /health/liveness` returns 200 if the process is responsive; `GET /health/readiness` checks dependencies, each with a timeout.
* **Graceful shutdown:** SIGTERM starts draining; HTTP endpoints return 503; background workers flush; the process exits at the deadline.
* **Upgrade choreography:** migrations run, the API becomes ready, workers restart in a rolling fashion, indexes catch up, AOC evaluation warms caches, and then traffic flips.
* **Version skew policy:** define the supported version skew between components; the chart validates it.

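
The skew policy itself is still to be defined. As a placeholder, the Python sketch below assumes every component must share the API's major version and stay within one minor version of it; both the anchor component and the limit are assumptions. In the chart, the same check would more likely live in a template that calls Helm's `fail` function; Python is used here only for readability.

```python
import re

_VERSION = re.compile(r"v?(\d+)\.(\d+)\.(\d+)(?:-[0-9A-Za-z.-]+)?")


def parse_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch), ignoring any pre-release suffix such as -rc.1."""
    match = _VERSION.fullmatch(version)
    if not match:
        raise ValueError(f"not a release version: {version!r}")
    return int(match[1]), int(match[2]), int(match[3])


def skew_violations(versions: dict[str, str], anchor: str = "api",
                    max_minor_skew: int = 1) -> list[str]:
    """Compare each component with the anchor: same major, minor within max_minor_skew."""
    base_major, base_minor, _ = parse_version(versions[anchor])
    problems = []
    for component, version in sorted(versions.items()):
        major, minor, _ = parse_version(version)
        if major != base_major:
            problems.append(f"{component} {version}: major version differs from {anchor} {versions[anchor]}")
        elif abs(minor - base_minor) > max_minor_skew:
            problems.append(f"{component} {version}: more than {max_minor_skew} minor version(s) from {anchor}")
    return problems
```
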
### 3.7 Security & compliance

* **Image signing & verification:** cosign attestations; optional admission policy to verify signatures by key.
* **SBOM provenance:** attach SPDX and provenance attestations; publish via registry referrers.
* **Non-root & least privilege:** all capabilities dropped; only `NET_BIND_SERVICE` for proxies, if needed.
* **Secrets handling:** mount from files; never put secrets in command-line args; logs redacted by default.
* **Audit:** container labels propagate release metadata to all logs and spans.
* **AOC enforcement:** Conseiller/Excitator images hard-disable merge code paths via env/defaults.

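
Redacted-by-default logging can be enforced where structured JSON records are serialized. A minimal Python sketch that masks values under secret-looking field names and passwords embedded in connection URLs; the field-name pattern is a heuristic assumption, not an exhaustive list, and it backs up, rather than replaces, keeping secrets out of log calls in the first place.

```python
import re
from typing import Any

REDACTED = "[REDACTED]"
# Heuristic (an assumption, not an exhaustive list): field names that usually hold secrets.
_SECRET_FIELD = re.compile(r"pass(word)?|secret|token|api[_-]?key|private[_-]?key|credential", re.IGNORECASE)
# user:password@ inside URLs such as postgres://stella:hunter2@db:5432/stella
_URL_PASSWORD = re.compile(r"(?P<prefix>[a-z][a-z0-9+.-]*://[^:/@\s]+:)[^@/\s]+@", re.IGNORECASE)


def redact(value: Any, field: str = "") -> Any:
    """Return a copy of a structured log record with secret-looking values masked."""
    if field and _SECRET_FIELD.search(field):
        return REDACTED
    if isinstance(value, dict):
        return {key: redact(item, str(key)) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return _URL_PASSWORD.sub(rf"\g<prefix>{REDACTED}@", value)
    return value
```

A JSON log formatter would call `json.dumps(redact(record))` just before writing to stdout, so every emitter gets the same protection.
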
### 3.8 Quickstart UX polish

* The Console shows a “Connected to Quickstart” banner with “View install docs” and “Export pack to production” buttons.
* One click generates a `Task Pack` that exports seed data from Quickstart to a production tenant via Export Center.

---

## 4) Architecture

### 4.1 Repos & layout

```
/deploy
  /compose
    docker-compose.yml
    .env.example
    quickstart.sh
    backup.sh
    reset.sh
  /helm
    /stella
      Chart.yaml
      values.yaml
      values-prod.yaml
      values-airgap.yaml
      templates/*.yaml
  /docker
    stella-api.Dockerfile
    stella-console.Dockerfile
    stella-orchestrator.Dockerfile
    stella-task-runner.Dockerfile
    stella-conseiller.Dockerfile
    stella-excitator.Dockerfile
    stella-policy.Dockerfile
    stella-notify.Dockerfile
    stella-export.Dockerfile
    stella-ai.Dockerfile
```

### 4.2 CI/CD flow

* Build multi-arch images with buildx; run unit/integration tests; embed version metadata and SBOM.
* Sign images; push to the registry; publish the Helm chart with pinned digests.
* Generate the air-gap bundle and its signatures.
* Smoke-test Quickstart on a fresh VM; e2e tests exercise Console and CLI parity (Epic 12).

---

## 5) APIs and contracts

No new external APIs, but every service must expose:

* `GET /health/liveness` and `GET /health/readiness`.
* `GET /version` returning `{ version, gitCommit, buildDate }`.
* `GET /metrics` when enabled.
* A config discovery endpoint for the Console with trimmed, safe values (no secrets).
* Conseiller/Excitator must also expose `GET /capabilities` returning `{"merge": false}` to prove merge is disabled.

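
A minimal Python sketch of the shared health and version contract using only the standard library, assuming the drain behavior from §3.6 (readiness returns 503 once SIGTERM has been received). The class and function names, the body shapes other than `/version`, and the wiring in the trailing comments are illustrative; each dependency probe is assumed to enforce its own timeout.

```python
import json
import threading
from dataclasses import dataclass, field
from http.server import BaseHTTPRequestHandler
from typing import Callable


@dataclass
class HealthState:
    version: str
    git_commit: str
    build_date: str
    # Dependency name -> probe; each probe is assumed to enforce its own timeout.
    checks: dict[str, Callable[[], bool]] = field(default_factory=dict)
    draining: threading.Event = field(default_factory=threading.Event)


def handle(state: HealthState, path: str) -> tuple[int, dict]:
    """Map a GET path to (status code, JSON body) for the shared endpoint contract."""
    path = path.split("?", 1)[0]
    if path == "/health/liveness":
        # Liveness never consults dependencies, so a slow database cannot cause restart loops.
        return 200, {"status": "alive"}
    if path == "/health/readiness":
        if state.draining.is_set():
            return 503, {"status": "draining"}
        failed = sorted(name for name, probe in state.checks.items() if not probe())
        if failed:
            return 503, {"status": "not-ready", "failed": failed}
        return 200, {"status": "ready"}
    if path == "/version":
        return 200, {"version": state.version, "gitCommit": state.git_commit, "buildDate": state.build_date}
    return 404, {"error": "not found"}


def make_handler(state: HealthState) -> type[BaseHTTPRequestHandler]:
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            status, body = handle(state, self.path)
            payload = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return Handler

# Wiring (not run here):
#   server = ThreadingHTTPServer(("0.0.0.0", 8081), make_handler(state))
#   signal.signal(signal.SIGTERM, lambda signum, frame: state.draining.set())
```

Keeping `handle` free of HTTP plumbing makes the contract easy to unit-test and to reuse behind whatever server framework each service actually uses.
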
---

## 6) Documentation changes

Create/update:

1. `/docs/install/overview.md`

   Supported deployment modes, hardware requirements, network ports, Quickstart vs. production.

2. `/docs/install/compose-quickstart.md`

   Preconditions, one-liner, first-login wizard, seed data, reset/backup, common pitfalls.

3. `/docs/install/helm-prod.md`

   Prerequisites, external dependencies, values reference, TLS/ingress, HPA, PDB, upgrades, rollbacks.

4. `/docs/install/airgap.md`

   Bundle verification, loading into a private registry, running without internet, patching images.

5. `/docs/install/configuration-reference.md`

   The full configuration matrix with examples.

6. `/docs/security/supply-chain.md`

   Image signing, SBOMs, provenance, admission controls, non-root posture.

7. `/docs/operations/health-and-readiness.md`

   Endpoints, probes, troubleshooting, expected states during upgrades.

8. `/docs/release/image-catalog.md`

   All image names, tags, architectures, and checksums; mapping between chart version and image digests.

9. `/docs/console/onboarding.md`

   Quickstart banner, links to install docs, exporting data to production.

Add at the top of each page:

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

## 7) Implementation plan

### New modules/artifacts

* Dockerfiles per service under `/deploy/docker/` with common builder stages.
* Helm chart under `/deploy/helm/stella`.
* Compose quickstart under `/deploy/compose/`.
* Air-gap bundle generator in CI, via the script `tools/make-airgap-bundle.sh`.
* Seed dataset packaged as a container image layer or as mounted config.

### Changes to services

* Add health/version/metrics endpoints where missing.
* Ensure all services read config from env/files, with defaults suitable for Quickstart.
* Conseiller/Excitator: add a hard config flag `DISABLE_MERGE=true`, defaulted in images and Helm values.
* API: seed job and migration runner; serve `/welcome` state for the Console wizard.
* Console: onboarding wizard and Quickstart banner.
* Task Runner: respect offline mode by failing gracefully if egress is blocked.

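
One way to make `DISABLE_MERGE=true` a hard flag rather than a preference is to refuse to start when it holds any other value, and to derive the `/capabilities` body from the same check. A minimal Python sketch under that assumption; the refuse-to-start behavior and the names `MergeNotPermitted`, `merge_disabled`, and `capabilities` are hypothetical, since the spec only requires that merge be disabled and reported.

```python
import json
import os
from typing import Mapping


class MergeNotPermitted(RuntimeError):
    """Raised at startup if configuration tries to turn merging back on."""


def merge_disabled(env: Mapping[str, str]) -> bool:
    """DISABLE_MERGE defaults to 'true'; any other value is refused rather than honoured."""
    raw = env.get("DISABLE_MERGE", "true").strip().lower()
    if raw != "true":
        raise MergeNotPermitted(f"DISABLE_MERGE={raw!r}: Conseiller/Excitator are aggregation-only")
    return True


def capabilities(env: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Body for GET /capabilities; merge is reported false whenever the service is allowed to run."""
    return {"merge": not merge_disabled(env)}


print(json.dumps(capabilities({})))  # {"merge": false}
```

Failing closed at startup means a misconfigured values file surfaces as a crash-looping pod during rollout, not as silently merged advisories later.
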
### Packaging & signing

* Embed an SBOM in every image; publish SBOMs as OCI referrers.
* Cosign-sign images and attest provenance; verify in CI.
* Publish checksums and signatures on the release page.

---

## 8) Engineering tasks

**Images**

* [ ] Author multi-stage Dockerfiles with cache-efficient builds.
* [ ] Add non-root user, drop capabilities, read-only FS, healthcheck scripts.
* [ ] Generate and attach an SBOM for each image.
* [ ] Implement `/health/*`, `/version`, and optional `/metrics`.

**Compose**

* [ ] Write `docker-compose.yml` with all core services and dependencies.
* [ ] Create `.env.example`, `quickstart.sh`, `backup.sh`, `reset.sh`.
* [ ] Seed job container and sample data ingestion on first run.

**Helm**

* [ ] Scaffold chart; values for each component; pinned digests.
* [ ] Ingress, TLS, HPA, PDB, NetworkPolicy, ServiceAccount/RBAC.
* [ ] Migration Job and upgrade hooks; readiness gates for indexers.
* [ ] Document values with the `helm-docs` generator.

**Air-gap**

* [ ] Build script to save images to OCI layout; compress, sign, and checksum.
* [ ] `load.sh` to import into a private registry and rewrite manifests.
* [ ] `values-airgap.yaml` with image registry overrides.

**Console & API**

* [ ] Onboarding wizard, Quickstart banner, links to docs.
* [ ] Seed data endpoints guarded behind `QUICKSTART_MODE`.
* [ ] Config discovery endpoint for the Console.

**Security**

* [ ] Cosign integration; key management; CI verification step.
* [ ] Admission policy example in docs to enforce signatures.
* [ ] Secret redaction in logs; env var audit.

**Observability**

* [ ] OTel config sample; `/metrics` endpoints; Prometheus scrape config for Compose.
* [ ] Helm values for tracing and metrics.

**Validation**

* [ ] Fresh VM smoke test for the Compose quickstart.
* [ ] kind cluster e2e for the Helm path.
* [ ] Air-gap install test in CI with a local registry.

**Docs**

* [ ] Write all pages listed in §6 with copy-pasteable commands and screenshots.
* [ ] Include a troubleshooting matrix: symptom → probable cause → fix.
* [ ] Add the “Imposed rule” header line to each page.

---

## 9) Feature changes required

* **Console:** onboarding wizard, Quickstart banner, and deep links to install docs; in Quickstart, “Copy CLI” buttons should fall back to the `stella` container image when no local binary is installed.
* **API:** seed job and health endpoints; version reporting; feature flag `QUICKSTART_MODE`.
* **Registry/Release tooling:** publish the image catalog and checksums; maintain a compatibility matrix per chart version.
* **Task Runner:** offline-mode awareness and an explicit error when attempting egress in an air-gapped install.
* **Conseiller/Excitator:** enforce non-merge at runtime and expose the capability endpoint.

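
The Task Runner requirement amounts to checking the destination before any outbound call. A minimal Python sketch, assuming offline mode is a boolean and that an allowlist of in-network hosts (such as the private registry) stays reachable; the exception name, parameters, and message wording are hypothetical.

```python
from urllib.parse import urlsplit


class EgressBlockedError(RuntimeError):
    """Raised instead of attempting a network call that offline mode forbids."""


def check_egress(url: str, offline: bool, allowed_hosts: frozenset[str] = frozenset()) -> None:
    """Fail fast, with an actionable message, before any outbound request in offline mode."""
    host = urlsplit(url).hostname or ""  # hostname is lower-cased; '' if the URL has none
    if offline and host not in allowed_hosts:
        raise EgressBlockedError(
            f"offline mode: egress to {host or url!r} is blocked; "
            "mirror the resource into the local registry or bundle instead"
        )
```

Unparsable URLs produce an empty host and are therefore blocked too, so the guard fails closed. Entries in `allowed_hosts` should be lower-case to match.
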
---

## 10) Acceptance criteria

* Quickstart: from a clean host to a working Console in under 5 minutes on a typical laptop; seed data visible; AOC rules active.
* Helm: install succeeds with external dependencies; roll forward and roll back with zero data loss; probes green.
* Air-gap: the bundle verifies, loads into a private registry, and installs without an external network.
* All images: signed, SBOM-attached, non-root, read-only FS, health endpoints exposed.
* Docs: a new user can complete Quickstart without assistance; a platform team can deploy the chart by editing values only.
* Conseiller/Excitator: the capability endpoint confirms `merge=false`; tests prove aggregation-only behavior.

---

## 11) Risks & mitigations

* **Config sprawl.** Centralize in `/docs/install/configuration-reference.md` and ship sane defaults.
* **Drift between Compose and Helm.** Pin digests; generate manifests from a common values source where possible; diff them in CI.
* **Resource contention in Quickstart.** Limit concurrency; ship low default worker counts; document overrides.
* **Air-gap surprises.** Remove implicit egress; provide offline doc copies in the bundle; use deterministic artifact paths.
* **Security regressions.** Enforce non-root/read-only in CI; signature verification gates release.

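
The CI diff for Compose/Helm drift can compare digests rather than full references, so an air-gap registry rewrite does not count as drift. A minimal Python sketch, assuming both deployment paths have already been reduced to a mapping from a common component name to an image reference (for example from `docker-compose.yml` and `helm template` output, with `task-runner`/`taskRunner` normalized; that extraction is not shown). The function names are hypothetical.

```python
from typing import Optional


def digest_of(ref: str) -> Optional[str]:
    """The `sha256:...` part of a digest-pinned reference, or None if it is unpinned."""
    _, sep, digest = ref.partition("@")
    return digest if sep else None


def digest_drift(compose: dict[str, str], helm: dict[str, str]) -> list[str]:
    """Report components missing from one side, unpinned, or pinned to different digests."""
    problems = []
    for component in sorted(set(compose) | set(helm)):
        in_compose, in_helm = compose.get(component), helm.get(component)
        if in_compose is None or in_helm is None:
            problems.append(f"{component}: only in {'helm' if in_compose is None else 'compose'}")
            continue
        compose_digest, helm_digest = digest_of(in_compose), digest_of(in_helm)
        if compose_digest is None or helm_digest is None:
            problems.append(f"{component}: not pinned by digest")
        elif compose_digest != helm_digest:
            problems.append(f"{component}: compose {compose_digest} != helm {helm_digest}")
    return problems
```

Flagging unpinned references as problems in their own right also enforces the “pin digests” half of the mitigation.
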
---

## 12) Philosophy

* **First run matters.** Quickstart must be boring, predictable, and immediately useful.
* **Prod isn’t a flag.** Helm defaults are safe; “convenience” belongs in Quickstart, not production.
* **Prove your supply chain.** Signed images, SBOMs, and provenance are table stakes, not an upsell.

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.