| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| # Epic 13: Containerized Distribution & Quickstart
 | ||
| 
 | ||
| **Short name:** `Containerized Distribution & Quickstart`
 | ||
| **Primary components:** OCI images for all services, Compose Quickstart, Helm chart for production, Air‑gap bundles
 | ||
| **Surfaces:** Container registry, `/deploy/*`, `/docs/install/*`, Console onboarding screen
 | ||
| **Touches:** Authority (authN/Z), Web Services API, Orchestrator, Task Runner, Policy Engine, Conseiller (Feedser), Excitator (Vexer), Findings Ledger, Export Center, Notifications Studio, Advisory AI Assistant, Object Storage/KMS, Telemetry
 | ||
| 
 | ||
| **AOC ground rule reminder:** Conseiller and Excitator aggregate and link advisories/VEX. They never merge or mutate source records. Containerized deployments must preserve this behavior and expose links to originals.
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 1) What it is
 | ||
| 
 | ||
| A complete, reproducible containerized distribution of StellaOps with three delivery modes:
 | ||
| 
 | ||
| 1. **Quickstart (single host)** using Docker Compose: one command to run a full stack suitable for evaluation and local development. Ships with seed data and sane defaults.
 | ||
| 
 | ||
| 2. **Production Helm chart** for Kubernetes: modular, scalable, secure‑by‑default deployment with optional HA and external dependencies.
 | ||
| 
 | ||
| 3. **Air‑gapped bundles**: signed offline packages containing images, seed configs, and installation scripts for disconnected environments.
 | ||
| 
 | ||
| All images are multi‑arch (amd64/arm64), signed, SBOM‑attached, and versioned with consistent tags. A “Download & Install” doc set guides users from zero to a working system in minutes and to a production‑ready posture in hours.
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 2) Why (brief)
 | ||
| 
 | ||
| People don’t adopt tools they can’t run quickly or securely. Containers make our deployment reproducible; Quickstart removes friction; Helm unlocks real ops. Air‑gap bundles acknowledge reality in regulated environments.
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 3) How it should work (maximum detail)
 | ||
| 
 | ||
| ### 3.1 Image catalog
 | ||
| 
 | ||
| Build and publish OCI images for the following:
 | ||
| 
 | ||
| * `stella-api` (Web Services API)
 | ||
| * `stella-console` (Web UI)
 | ||
| * `stella-orchestrator` (source/job scheduler)
 | ||
| * `stella-task-runner` (executes Task Packs remotely)
 | ||
| * `stella-conseiller` (Feedser; advisory aggregator)
 | ||
| * `stella-excitator` (Vexer; VEX aggregator)
 | ||
| * `stella-policy` (Policy Engine)
 | ||
| * `stella-ledger` (Findings Ledger worker; if separated from API)
 | ||
| * `stella-export` (Export Center worker; optional if part of API)
 | ||
| * `stella-notify` (Notifications Studio worker)
 | ||
| * `stella-ai` (Advisory AI Assistant; lightweight service calling configured LLM backends or local models)
 | ||
| * Support services (optionally bundled for Quickstart): `postgres`, `redis`, `object-store` (S3‑compatible), `queue` (NATS or RabbitMQ), `otel-collector`.
 | ||
| 
 | ||
| **Image standards**
 | ||
| 
 | ||
| * **Base:** distroless or minimal; non‑root user; read‑only filesystem; writable `/tmp` only if needed.
 | ||
| * **Ports:** declare via labels; expose health endpoints `/health/liveness`, `/health/readiness`.
 | ||
| * **Env:** explicit, documented, with safe defaults; secrets via env or file mounts only.
 | ||
| * **Config:** `STELLA_*` envs or mounted config directory `/etc/stella/`.
 | ||
| * **SBOM:** attach SPDX JSON as an OCI artifact and bake a copy into the image at `/app/sbom.spdx.json` at build time.
 | ||
| * **Signing:** cosign signature for each image, plus cosign attestations for SBOM and provenance.
 | ||
| * **Labels:** org.opencontainers.image.* (title, version, revision, source, licenses).
 | ||
| * **Entrypoint:** runs as PID 1 and reaps zombie child processes; graceful shutdown on SIGTERM; configurable termination grace period.
 | ||
| * **Logs:** structured JSON by default; stdout/stderr only.
 | ||
| 
 | ||
| **Tagging scheme**
 | ||
| 
 | ||
| * `:vX.Y.Z` (immutable release)
 | ||
| * `:vX.Y.Z-rc.N` (release candidate)
 | ||
| * `:edge` (latest main)
 | ||
| * `:nightly-YYYYMMDD` (optional)
 | ||
| * Multi‑arch manifest lists for linux/amd64 and linux/arm64.
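
The tagging scheme above can be checked mechanically, for example in a release script. The following is a minimal sketch, assuming `X`, `Y`, `Z`, and `N` are plain decimal integers and that `YYYYMMDD` must be a valid calendar date. `classify_tag` is a hypothetical helper and not part of any StellaOps tooling.

```python
import re
from datetime import datetime
from typing import Optional

# Patterns mirror the tagging scheme listed above.
_RELEASE = re.compile(r"v[0-9]+\.[0-9]+\.[0-9]+")
_RC = re.compile(r"v[0-9]+\.[0-9]+\.[0-9]+-rc\.[0-9]+")
_NIGHTLY = re.compile(r"nightly-([0-9]{8})")


def classify_tag(tag: str) -> Optional[str]:
    """Return the tag kind, or None if the tag does not match the scheme."""
    if _RELEASE.fullmatch(tag):
        return "release"
    if _RC.fullmatch(tag):
        return "release-candidate"
    if tag == "edge":
        return "edge"
    nightly = _NIGHTLY.fullmatch(tag)
    if nightly:
        try:
            # Reject impossible dates such as month 13.
            datetime.strptime(nightly.group(1), "%Y%m%d")
        except ValueError:
            return None
        return "nightly"
    return None
```

A CI job could call `classify_tag` on every tag it is about to push and refuse to publish when the result is `None`.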
 | ||
| 
 | ||
| ### 3.2 Quickstart (Compose)
 | ||
| 
 | ||
| **Goal:** a `curl | sh` equivalent that yields a working stack on a laptop or server with defaults and demo data. No internet access is needed beyond pulling images, unless configured otherwise.
 | ||
| 
 | ||
| **Compose file `deploy/compose/docker-compose.yml`**
 | ||
| 
 | ||
| * Services:
 | ||
| 
 | ||
|   * `api`, `console`, `orchestrator`, `task-runner`, `conseiller`, `excitator`, `policy`, `notify`, `export`, `ai`
 | ||
|   * `postgres`, `redis`, `minio` (S3), `nats` or `rabbitmq`, `otel-collector`
 | ||
| * Volumes:
 | ||
| 
 | ||
|   * `pgdata`, `minio-data`, `redis-data`, `stella-state` (for local cache, packs registry)
 | ||
| * Networks:
 | ||
| 
 | ||
|   * `stella-net` bridge
 | ||
| * Ports (defaults):
 | ||
| 
 | ||
|   * Console `8080`, API `8081`, MinIO `9000`, NATS/RabbitMQ default ports
 | ||
| * Env files:
 | ||
| 
 | ||
|   * `.env.example` with safe defaults; users copy to `.env`.
 | ||
| 
 | ||
| **Seed data**
 | ||
| 
 | ||
| * Seed admin account and tenant on first run via `stella-api` migration/seed job.
 | ||
| * Seed demo SBOMs, advisories, VEX samples, baseline policy, and a task pack.
 | ||
| * On first login, Console shows “Welcome” wizard: confirm endpoints, generate API token, run sample scan import, open Vulnerability Explorer.
 | ||
| 
 | ||
| **Security posture**
 | ||
| 
 | ||
| * Default credentials exist only in Quickstart; secrets are randomized on first `up` and stored under `.secrets/`.
 | ||
| * All services run as non‑root; bind to localhost by default unless `EXPOSE_PUBLIC=1` set.
 | ||
| * TLS is optional via a `CADDY` or `nginx` sidecar, which is disabled by default.
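
The "randomize secrets on first `up`" step could look roughly like the sketch below. It assumes one file per secret under `.secrets/` and owner-only permissions. The secret names are hypothetical placeholders, and the real list would depend on the Compose file.

```python
import os
import secrets
from pathlib import Path

# Hypothetical secret names, used here for illustration only.
SECRET_NAMES = ("postgres_password", "minio_secret_key", "admin_password")


def ensure_secrets(directory: Path, names=SECRET_NAMES) -> dict:
    """Create any missing secret files and return all secrets by name.

    Existing files are left untouched, so repeated runs are idempotent.
    """
    directory.mkdir(mode=0o700, parents=True, exist_ok=True)
    result = {}
    for name in names:
        path = directory / name
        if not path.exists():
            # O_EXCL refuses to overwrite; 0o600 keeps the file owner-only.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            with os.fdopen(fd, "w") as handle:
                handle.write(secrets.token_urlsafe(32))
        result[name] = path.read_text()
    return result
```

Because existing files are never rewritten, a second `up` reuses the same credentials and does not lock the stack out of its own database.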
 | ||
| 
 | ||
| **One‑liner**
 | ||
| 
 | ||
| * `./deploy/compose/quickstart.sh` does: preflight checks, pulls images, writes `.env`, runs `docker compose up -d`, polls readiness, prints URLs and credentials.
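
The "polls readiness" step of the one-liner can be sketched as a bounded retry loop. This is an illustration in Python and not the contents of `quickstart.sh`. The deadline and interval defaults are assumptions, and the `sleep` and `clock` parameters exist only to make the loop testable.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if GET `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and non-2xx responses.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    deadline_seconds: float = 120.0,
    interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `probe` until it returns True or the deadline passes."""
    deadline = clock() + deadline_seconds
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)
```

With the default API port from the Compose section, a caller might use `wait_until_ready(lambda: http_ready("http://localhost:8081/health/readiness"))` and print URLs and credentials only when it returns `True`.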
 | ||
| 
 | ||
| **Backups & reset**
 | ||
| 
 | ||
| * `./deploy/compose/backup.sh` creates a tarball of volumes and config.
 | ||
| * `./deploy/compose/reset.sh` deletes persistent volumes after a prominent confirmation prompt; pass `--yes` to skip the prompt.
 | ||
| 
 | ||
| ### 3.3 Production Helm chart
 | ||
| 
 | ||
| **Chart location:** `deploy/helm/stella/` with subcharts or toggles.
 | ||
| 
 | ||
| **Chart features**
 | ||
| 
 | ||
| * Components enabled via values: `api`, `console`, `orchestrator`, `taskRunner`, `conseiller`, `excitator`, `policy`, `notify`, `export`, `ai`.
 | ||
| * External dependencies by default:
 | ||
| 
 | ||
|   * PostgreSQL, Redis, S3 bucket, Message queue, OTel endpoint provided via values.
 | ||
|   * Optional “bundled” mode for lab clusters using StatefulSets.
 | ||
| * Security:
 | ||
| 
 | ||
|   * Security contexts: `runAsNonRoot` and `fsGroup` (when needed) at pod level; `readOnlyRootFilesystem` at container level.
 | ||
|   * NetworkPolicy for east‑west traffic; deny‑all then allow specific ports.
 | ||
|   * Secrets as `Secret` from External Secrets operator or sealed secrets.
 | ||
|   * HPA per component; PDBs; liveness/readiness probes.
 | ||
| * Ingress:
 | ||
| 
 | ||
|   * One hostname for Console, one for API; TLS required in production values.
 | ||
|   * Option to serve Console as static behind CDN while API behind private ingress gateway.
 | ||
| * Config:
 | ||
| 
 | ||
|   * Values for Authority provider, token TTLs, policy cache TTL, pack registry endpoint, notifications sinks, export locations.
 | ||
|   * Feature flags per epic enablement.
 | ||
| * Migrations:
 | ||
| 
 | ||
|   * `stella-migrator` Job runs before rollouts; idempotent migrations.
 | ||
|   * Optional “break glass” manual job.
 | ||
| * Observability:
 | ||
| 
 | ||
|   * `/metrics` endpoints scraped by Prometheus; exemplars via OTel; logs structured.
 | ||
|   * OpenTelemetry auto‑config via env if collector provided.
 | ||
| * Upgrades:
 | ||
| 
 | ||
|   * Blue/green or rolling; readiness gates based on background indexers catching up.
 | ||
|   * Chart hooks to block until Conseiller/Excitator catch up to feed watermarks.
 | ||
| 
 | ||
| ### 3.4 Air‑gapped distribution
 | ||
| 
 | ||
| **Bundle format**
 | ||
| 
 | ||
| * `stella-bundle-vX.Y.Z.tar.zst` containing:
 | ||
| 
 | ||
|   * All images as OCI layout (multi‑arch), cosign signatures, SBOMs, SLSA provenance.
 | ||
|   * `load.sh` to import into a local registry.
 | ||
|   * `compose/` and `helm/` directories with pinned image digests.
 | ||
|   * `checksums.txt` and `bundle.sig`.
 | ||
| * **Process**
 | ||
| 
 | ||
|   * Online build job crafts bundle; signatures produced by CI keys.
 | ||
|   * Offline install:
 | ||
| 
 | ||
|     * Verify `bundle.sig`
 | ||
|     * `./load.sh --to registry.local:5000`
 | ||
|     * `helm install stella ./helm -f values-airgap.yaml --set image.registry=registry.local:5000`
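
Alongside the `bundle.sig` check, an offline installer would normally confirm each file against `checksums.txt`. The sketch below assumes `checksums.txt` uses the `sha256sum` line format (`<hex digest>  <relative path>`), which the text above does not specify. It does not verify the signature itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large image archives do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(bundle_dir: Path, manifest: str = "checksums.txt") -> list:
    """Return the relative paths whose hash is wrong or whose file is missing."""
    failures = []
    for line in (bundle_dir / manifest).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        target = bundle_dir / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(name)
    return failures
```

An empty result means every listed file is present and intact. Any other result should stop the install before `load.sh` runs.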
 | ||
| 
 | ||
| ### 3.5 Configuration matrix
 | ||
| 
 | ||
| Document every config knob in a single table:
 | ||
| 
 | ||
| * Auth: Authority issuer, JWKS, RBAC cache TTL.
 | ||
| * Storage: DB URL, pool sizes, migration flags.
 | ||
| * Object store: S3 endpoint, buckets, SSE, IAM.
 | ||
| * Queue: URL, prefetch, retention.
 | ||
| * Policy engine: rule cache TTL, default policy version.
 | ||
| * Conseiller/Excitator: polling intervals, feed sources, retry backoff, max in‑flight; **merge disabled** enforced.
 | ||
| * Orchestrator/Task Runner: concurrency, sandbox, network egress policy, artifact retention.
 | ||
| * Notifications: sinks, templates path, batch windows.
 | ||
| * Export Center: formats enabled, rate limits.
 | ||
| * AI Assistant: model endpoint, token limits, guardrails, disable by default.
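
To illustrate how a service might consume a few of these knobs through `STELLA_*` variables, here is a minimal sketch. The variable names `STELLA_DB_URL`, `STELLA_POLICY_CACHE_TTL_SECONDS`, and `STELLA_AI_ENABLED`, and their default values, are hypothetical. Only the `STELLA_*` prefix and the "AI disabled by default" rule come from this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    database_url: str
    policy_cache_ttl_seconds: int
    ai_enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from STELLA_* variables, falling back to defaults."""
    ttl = int(env.get("STELLA_POLICY_CACHE_TTL_SECONDS", "300"))
    if ttl < 0:
        raise ValueError("STELLA_POLICY_CACHE_TTL_SECONDS must be >= 0")
    return Settings(
        database_url=env.get("STELLA_DB_URL", "postgresql://localhost:5432/stella"),
        policy_cache_ttl_seconds=ttl,
        # The AI Assistant stays off unless explicitly enabled.
        ai_enabled=_as_bool(env.get("STELLA_AI_ENABLED", "false")),
    )
```

Validating values at startup, as the TTL check does here, turns a misconfigured knob into an immediate failure instead of a latent runtime error.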
 | ||
| 
 | ||
| ### 3.6 Health, readiness, and upgrades
 | ||
| 
 | ||
| * **Health endpoints:** `GET /health/liveness` returns 200 if the process is responsive; `GET /health/readiness` checks dependencies with a timeout.
 | ||
| * **Graceful shutdown:** SIGTERM starts drain; HTTP returns 503; background workers flush; exit on deadline.
 | ||
| * **Upgrade choreography:** migrations run, API becomes ready, workers rolling restart, indexes catch up, AOC evaluation warms caches, then flip traffic.
 | ||
| * **Version skew policy:** define supported skew between components; chart validates.
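
The interaction between health endpoints and graceful shutdown can be sketched as a small state holder. This is one possible reading of "SIGTERM starts drain; HTTP returns 503": readiness flips to 503 so that traffic is routed away, while liveness stays 200 so the process is not restarted mid-drain. The `Lifecycle` class is hypothetical.

```python
import signal
import threading


class Lifecycle:
    """Tracks whether the process is serving or draining."""

    def __init__(self) -> None:
        self._draining = threading.Event()

    def begin_drain(self, signum=None, frame=None) -> None:
        # Signature matches what signal.signal() passes to handlers.
        self._draining.set()

    def readiness_status(self, dependencies_ok: bool) -> int:
        """HTTP status for GET /health/readiness."""
        if self._draining.is_set() or not dependencies_ok:
            return 503
        return 200

    def liveness_status(self) -> int:
        """HTTP status for GET /health/liveness; stays 200 while draining."""
        return 200

    def wait_for_drain(self, timeout=None) -> bool:
        """Block until draining starts; background workers can flush after this."""
        return self._draining.wait(timeout)


def install_sigterm_handler(lifecycle: Lifecycle) -> None:
    """Start draining on SIGTERM (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, lifecycle.begin_drain)
```

Workers would block on `wait_for_drain()`, flush their queues, and exit before the configured termination grace period elapses.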
 | ||
| 
 | ||
| ### 3.7 Security & compliance
 | ||
| 
 | ||
| * **Image signing & verification:** cosign attestations; optional admission policy to verify signatures by key.
 | ||
| * **SBOM provenance:** attach SPDX and provenance attestations; publish via registry referrers.
 | ||
| * **Non‑root & least privilege:** capabilities dropped; only `NET_BIND_SERVICE` for proxies if needed.
 | ||
| * **Secrets handling:** mount from files; avoid putting secrets in args; redacted logs by default.
 | ||
| * **Audit:** container labels propagate release metadata to all logs and spans.
 | ||
| * **AOC enforcement:** images for Conseiller/Excitator hard‑disable merge code paths via env/defaults.
 | ||
| 
 | ||
| ### 3.8 Quickstart UX polish
 | ||
| 
 | ||
| * Console shows “Connected to Quickstart” banner with a button “View install docs” and “Export pack to production.”
 | ||
| * One click to generate a `Task Pack` that exports seed data from Quickstart to a production tenant via Export Center.
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 4) Architecture
 | ||
| 
 | ||
| ### 4.1 Repos & layout
 | ||
| 
 | ||
| ```
 | ||
| /deploy
 | ||
|   /compose
 | ||
|     docker-compose.yml
 | ||
|     .env.example
 | ||
|     quickstart.sh
 | ||
|     backup.sh
 | ||
|     reset.sh
 | ||
|   /helm
 | ||
|     /stella
 | ||
|       Chart.yaml
 | ||
|       values.yaml
 | ||
|       values-prod.yaml
 | ||
|       values-airgap.yaml
 | ||
|       templates/*.yaml
 | ||
|   /docker
 | ||
|     stella-api.Dockerfile
 | ||
|     stella-console.Dockerfile
 | ||
|     stella-orchestrator.Dockerfile
 | ||
|     stella-task-runner.Dockerfile
 | ||
|     stella-conseiller.Dockerfile
 | ||
|     stella-excitator.Dockerfile
 | ||
|     stella-policy.Dockerfile
 | ||
|     stella-notify.Dockerfile
 | ||
|     stella-export.Dockerfile
 | ||
|     stella-ai.Dockerfile
 | ||
| ```
 | ||
| 
 | ||
| ### 4.2 CI/CD flow
 | ||
| 
 | ||
| * Build multi‑arch with buildx; run unit/integration tests; embed version metadata and SBOM.
 | ||
| * Sign images; push to registry; publish Helm chart with pinned digests.
 | ||
| * Generate Air‑gap bundle and signatures.
 | ||
| * Smoke test Quickstart on fresh VM; e2e tests exercise Console and CLI parity (Epic 12).
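
The "pinned digests" requirement lends itself to a simple CI gate. The following sketch assumes the pipeline can collect the rendered image references as plain strings. `unpinned_images` is a hypothetical helper.

```python
import re

# A digest-pinned reference ends in "@sha256:" plus 64 lowercase hex digits.
_PINNED = re.compile(r"[^@\s]+@sha256:[0-9a-f]{64}")


def unpinned_images(references):
    """Return the image references that are not pinned by sha256 digest."""
    return [ref for ref in references if not _PINNED.fullmatch(ref)]
```

A release job could fail whenever `unpinned_images(...)` returns a non-empty list for the published chart or the air-gap bundle manifests.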
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 5) APIs and contracts
 | ||
| 
 | ||
| No new external APIs, but every service must expose:
 | ||
| 
 | ||
| * `GET /health/liveness` and `GET /health/readiness`.
 | ||
| * `GET /version` returning `{ version, gitCommit, buildDate }`.
 | ||
| * `GET /metrics` when enabled.
 | ||
| * Config discovery endpoint for Console with trimmed, safe values (no secrets).
 | ||
| * Conseiller/Excitator must expose `GET /capabilities` returning `{"merge": false}` to prove merge is disabled.
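
A minimal sketch of this contract, using only the Python standard library, is shown below. The paths and the `/version` and `/capabilities` payload shapes follow the list above. The health response bodies, the 404 body, and the placeholder build metadata are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler

# Placeholder build metadata; real values would be injected at build time.
BUILD_INFO = {"version": "v0.0.0", "gitCommit": "unknown", "buildDate": "unknown"}


def route(path: str, ready: bool = True):
    """Map a GET path to (status, JSON-serializable body)."""
    if path == "/health/liveness":
        return 200, {"status": "ok"}
    if path == "/health/readiness":
        return (200, {"status": "ready"}) if ready else (503, {"status": "not ready"})
    if path == "/version":
        return 200, BUILD_INFO
    if path == "/capabilities":
        # Conseiller/Excitator only: merge is always reported as disabled.
        return 200, {"merge": False}
    return 404, {"error": "not found"}


class ContractHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Ignore any query string when matching the path.
        status, body = route(self.path.split("?", 1)[0])
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

Keeping `route` as a pure function lets contract tests assert on status codes and bodies without opening a socket. For a manual check, `http.server.HTTPServer(("127.0.0.1", 8081), ContractHandler).serve_forever()` would serve it locally.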
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 6) Documentation changes
 | ||
| 
 | ||
| Create/update:
 | ||
| 
 | ||
| 1. `/docs/install/overview.md`
 | ||
|    Supported deployment modes, hardware requirements, network ports, quickstart vs production.
 | ||
| 
 | ||
| 2. `/docs/install/compose-quickstart.md`
 | ||
|    Preconditions, one‑liner, first‑login wizard, seed data, reset/backup, common pitfalls.
 | ||
| 
 | ||
| 3. `/docs/install/helm-prod.md`
 | ||
|    Prereqs, external dependencies, values reference, TLS/ingress, HPA, PDB, upgrades, rollbacks.
 | ||
| 
 | ||
| 4. `/docs/install/airgap.md`
 | ||
|    Bundle verification, loading into private registry, running without internet, patching images.
 | ||
| 
 | ||
| 5. `/docs/install/configuration-reference.md`
 | ||
|    The full configuration matrix with examples.
 | ||
| 
 | ||
| 6. `/docs/security/supply-chain.md`
 | ||
|    Image signing, SBOMs, provenance, admission controls, non‑root posture.
 | ||
| 
 | ||
| 7. `/docs/operations/health-and-readiness.md`
 | ||
|    Endpoints, probes, troubleshooting, expected states during upgrades.
 | ||
| 
 | ||
| 8. `/docs/release/image-catalog.md`
 | ||
|    All image names, tags, architectures, checksums; mapping between chart version and image digests.
 | ||
| 
 | ||
| 9. `/docs/console/onboarding.md`
 | ||
|    Quickstart banner, links to install docs, exporting data to production.
 | ||
| 
 | ||
| Add at the top of each page:
 | ||
| 
 | ||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 7) Implementation plan
 | ||
| 
 | ||
| ### New modules/artifacts
 | ||
| 
 | ||
| * Dockerfiles per service under `/deploy/docker/` with common builder stages.
 | ||
| * Helm chart under `/deploy/helm/stella`.
 | ||
| * Compose quickstart under `/deploy/compose/`.
 | ||
| * Air‑gap bundle generator in CI, script `tools/make-airgap-bundle.sh`.
 | ||
| * Seed dataset packaged as container image layer or mounted config.
 | ||
| 
 | ||
| ### Changes to services
 | ||
| 
 | ||
| * Add health/version/metrics endpoints where missing.
 | ||
| * Ensure all services read config from env/files with defaults suitable for Quickstart.
 | ||
| * Conseiller/Excitator: add hard config flag `DISABLE_MERGE=true` defaulted in images and values.
 | ||
| * API: seed job and migration runner; serve `/welcome` state for Console wizard.
 | ||
| * Console: onboarding wizard and Quickstart banner.
 | ||
| * Task Runner: respect offline mode by failing gracefully if egress is blocked.
 | ||
| 
 | ||
| ### Packaging & signing
 | ||
| 
 | ||
| * Embed SBOM in all images; publish as OCI referrers.
 | ||
| * Sign images with cosign and attest provenance; verify both in CI.
 | ||
| * Publish checksums and signatures on release page.
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 8) Engineering tasks
 | ||
| 
 | ||
| **Images**
 | ||
| 
 | ||
| * [ ] Author multi‑stage Dockerfiles with cache‑efficient builds.
 | ||
| * [ ] Add non‑root user, drop capabilities, read‑only FS, healthcheck scripts.
 | ||
| * [ ] Generate and attach SBOM for each image.
 | ||
| * [ ] Implement `/health/*`, `/version`, optional `/metrics`.
 | ||
| 
 | ||
| **Compose**
 | ||
| 
 | ||
| * [ ] Write `docker-compose.yml` with all core services and deps.
 | ||
| * [ ] Create `.env.example`, `quickstart.sh`, `backup.sh`, `reset.sh`.
 | ||
| * [ ] Seed job container and sample data ingestion on first run.
 | ||
| 
 | ||
| **Helm**
 | ||
| 
 | ||
| * [ ] Scaffold chart; values for each component; pinned digests.
 | ||
| * [ ] Ingress, TLS, HPA, PDB, NetworkPolicy, ServiceAccount/RBAC.
 | ||
| * [ ] Migration Job and upgrade hooks; readiness gates for indexers.
 | ||
| * [ ] Documentation of values with `helm-docs` generator.
 | ||
| 
 | ||
| **Air‑gap**
 | ||
| 
 | ||
| * [ ] Build script to save images to OCI layout; compress, sign, and checksum.
 | ||
| * [ ] `load.sh` to import into private registry and rewrite manifests.
 | ||
| * [ ] `values-airgap.yaml` with image registry overrides.
 | ||
| 
 | ||
| **Console & API**
 | ||
| 
 | ||
| * [ ] Onboarding wizard, Quickstart banner, links to docs.
 | ||
| * [ ] Seed data endpoints guarded behind `QUICKSTART_MODE`.
 | ||
| * [ ] Config discovery endpoint for console.
 | ||
| 
 | ||
| **Security**
 | ||
| 
 | ||
| * [ ] Cosign integration; key management; CI verification step.
 | ||
| * [ ] Admission policy example in docs to enforce signatures.
 | ||
| * [ ] Secret redaction in logs; env var audit.
 | ||
| 
 | ||
| **Observability**
 | ||
| 
 | ||
| * [ ] OTel config sample; `/metrics` endpoints; Prometheus scrape config for Compose.
 | ||
| * [ ] Helm values for tracing and metrics.
 | ||
| 
 | ||
| **Validation**
 | ||
| 
 | ||
| * [ ] Fresh VM smoke test for Compose quickstart.
 | ||
| * [ ] Kind cluster e2e for Helm path.
 | ||
| * [ ] Air‑gap install test in CI with a local registry.
 | ||
| 
 | ||
| **Docs**
 | ||
| 
 | ||
| * [ ] Write all pages listed in §6 with copy‑pasteable commands and screenshots.
 | ||
| * [ ] Include a troubleshooting matrix: symptom → probable cause → fix.
 | ||
| * [ ] Add “Imposed rule” header line to each page.
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 9) Feature changes required
 | ||
| 
 | ||
| * **Console:** Onboarding wizard, Quickstart banner, and deep links to install docs; “Copy CLI” buttons should prefer the `stella` container image in quickstart if local binary missing.
 | ||
| * **API:** Seed job and health endpoints; version reporting; feature flag `QUICKSTART_MODE`.
 | ||
| * **Registry/Release tooling:** Publish image catalog and checksums; maintain compatibility matrix per chart version.
 | ||
| * **Task Runner:** Offline mode awareness and explicit error when attempting egress in air‑gap.
 | ||
| * **Conseiller/Excitator:** enforce non‑merge at runtime and expose the `GET /capabilities` endpoint.
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 10) Acceptance criteria
 | ||
| 
 | ||
| * Quickstart: from clean host to working Console in under 5 minutes on a typical laptop; seed data visible; AOC rules active.
 | ||
| * Helm: install succeeds with external dependencies; roll forward and roll back with zero data loss; probes green.
 | ||
| * Air‑gap: bundle verifies, loads to a private registry, and installs without external network.
 | ||
| * All images: signed, SBOM‑attached, non‑root, read‑only FS, health endpoints exposed.
 | ||
| * Docs: a new user can complete Quickstart without assistance; a platform team can deploy the chart with only values editing.
 | ||
| * Conseiller/Excitator: capability endpoint confirms `merge=false`; tests prove aggregation‑only behavior.
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 11) Risks & mitigations
 | ||
| 
 | ||
| * **Config sprawl.** Centralize in `/docs/install/configuration-reference.md` and ship sane defaults.
 | ||
| * **Drift between Compose and Helm.** Pin digests; generate manifests from a common values source where possible; CI diff.
 | ||
| * **Resource contention in Quickstart.** Limit concurrency; ship low default worker counts; document overrides.
 | ||
| * **Air‑gap surprises.** Remove implicit egress; provide offline doc copies in bundle; deterministic artifact paths.
 | ||
| * **Security regressions.** Enforce non‑root/read‑only in CI; signature verification gates release.
 | ||
| 
 | ||
| ---
 | ||
| 
 | ||
| ## 12) Philosophy
 | ||
| 
 | ||
| * **First run matters.** Quickstart must be boring, predictable, and immediately useful.
 | ||
| * **Prod isn’t a flag.** Helm defaults are safe; “convenience” belongs in Quickstart, not production.
 | ||
| * **Prove your supply chain.** Signed images, SBOMs, and provenance are table stakes, not an upsell.
 | ||
| 
 | ||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
 |