Fine. Shipping containers, but for software. Here’s the serious version you can paste into your docs without the sarcasm.
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
Epic 13: Containerized Distribution & Quickstart
Short name: Containerized Distribution & Quickstart
Primary components: OCI images for all services, Compose Quickstart, Helm chart for production, Air‑gap bundles
Surfaces: Container registry, /deploy/*, /docs/install/*, Console onboarding screen
Touches: Authority (authN/Z), Web Services API, Orchestrator, Task Runner, Policy Engine, Conseiller (Feedser), Excitator (Vexer), Findings Ledger, Export Center, Notifications Studio, Advisory AI Assistant, Object Storage/KMS, Telemetry
AOC ground rule reminder: Conseiller and Excitator aggregate and link advisories/VEX. They never merge or mutate source records. Containerized deployments must preserve this behavior and expose links to originals.
1) What it is
A complete, reproducible containerized distribution of StellaOps with three delivery modes:
- Quickstart (single host) using Docker Compose: one command to run a full stack suitable for evaluation and local development. Ships with seed data and sane defaults.
- Production Helm chart for Kubernetes: modular, scalable, secure‑by‑default deployment with optional HA and external dependencies.
- Air‑gapped bundles: signed offline packages containing images, seed configs, and installation scripts for disconnected environments.
All images are multi‑arch (amd64/arm64), signed, SBOM‑attached, and versioned with consistent tags. A “Download & Install” doc set guides users from zero to a working system in minutes and to a production‑ready posture in hours.
2) Why (brief)
People don’t adopt tools they can’t run quickly or securely. Containers make our deployment reproducible; Quickstart removes friction; Helm unlocks real ops. Air‑gap bundles acknowledge reality in regulated environments.
3) How it should work (maximum detail)
3.1 Image catalog
Build and publish OCI images for the following:
- stella-api (Web Services API)
- stella-console (Web UI)
- stella-orchestrator (source/job scheduler)
- stella-task-runner (executes Task Packs remotely)
- stella-conseiller (Feedser; advisory aggregator)
- stella-excitator (Vexer; VEX aggregator)
- stella-policy (Policy Engine)
- stella-ledger (Findings Ledger worker; if separated from API)
- stella-export (Export Center worker; optional if part of API)
- stella-notify (Notifications Studio worker)
- stella-ai (Advisory AI Assistant; lightweight service calling configured LLM backends or local models)
- Support services (optionally bundled for Quickstart): postgres, redis, object-store (S3‑compatible), queue (NATS or RabbitMQ), otel-collector.
Image standards
- Base: distroless or minimal; non‑root user; read‑only filesystem; writable /tmp only if needed.
- Ports: declare via labels; expose health endpoints /health/liveness, /health/readiness.
- Env: explicit, documented, with safe defaults; secrets via env or file mounts only.
- Config: STELLA_* envs or mounted config directory /etc/stella/.
- SBOM: attach SPDX JSON as OCI artifact and include in /app/sbom.spdx.json baked at build time.
- Signing: cosign attestations for image, SBOM, and provenance.
- Labels: org.opencontainers.image.* (title, version, revision, source, licenses).
- Entrypoint: PID 1 with reap; graceful shutdown on SIGTERM; configurable termination grace period.
- Logs: structured JSON by default; stdout/stderr only.
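The label requirement above lends itself to a small CI check. The following is a minimal sketch, assuming the labels have already been read into a dictionary (for example from the JSON that `docker inspect` prints for `.Config.Labels`); the function name `missing_oci_labels` is hypothetical and not part of StellaOps.

```python
# Hypothetical helper: report which required OCI labels an image is missing.
PREFIX = "org.opencontainers.image."
REQUIRED_SUFFIXES = ("title", "version", "revision", "source", "licenses")


def missing_oci_labels(labels):
    """Return the required label keys that are absent or blank, in a stable order."""
    missing = []
    for suffix in REQUIRED_SUFFIXES:
        key = PREFIX + suffix
        # Treat None, empty, and whitespace-only values as missing.
        if not (labels.get(key) or "").strip():
            missing.append(key)
    return missing
```

A release gate could fail the build whenever the returned list is non-empty.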
Tagging scheme
- :vX.Y.Z (immutable release)
- :vX.Y.Z-rc.N (release candidate)
- :edge (latest main)
- :nightly-YYYYMMDD (optional)
- Multi‑arch manifest lists for linux/amd64 and linux/arm64.
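To make the tagging scheme checkable in release tooling, a classifier along these lines could be used. It is a sketch only: the function names are hypothetical, the nightly pattern checks the digit layout but not whether the date is valid, and treating only vX.Y.Z as immutable is an assumption drawn from the list above.

```python
import re

# One pattern per tag kind from the tagging scheme (tags given without the leading colon).
_PATTERNS = [
    ("release", re.compile(r"v\d+\.\d+\.\d+")),
    ("release-candidate", re.compile(r"v\d+\.\d+\.\d+-rc\.\d+")),
    ("edge", re.compile(r"edge")),
    ("nightly", re.compile(r"nightly-\d{8}")),
]


def classify_tag(tag):
    """Return the kind of tag, or None if it does not follow the scheme."""
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(tag):
            return kind
    return None


def is_immutable(tag):
    """Only final releases are treated as immutable in this sketch."""
    return classify_tag(tag) == "release"
```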
3.2 Quickstart (Compose)
Goal: curl | sh equivalent that yields a working stack on a laptop/server with defaults and demo data. No internet beyond pulling images, unless configured.
Compose file deploy/compose/docker-compose.yml
- Services:
  - api, console, orchestrator, task-runner, conseiller, excitator, policy, notify, export, ai
  - postgres, redis, minio (S3), nats or rabbitmq, otel-collector
- Volumes:
  - pgdata, minio-data, redis-data, stella-state (for local cache, packs registry)
- Networks:
  - stella-net bridge
- Ports (defaults):
  - Console 8080, API 8081, MinIO 9000, NATS/RabbitMQ default ports
- Env files:
  - .env.example with safe defaults; users copy to .env.
Seed data
- Seed admin account and tenant on first run via stella-api migration/seed job.
- Seed demo SBOMs, advisories, VEX samples, baseline policy, and a task pack.
- On first login, Console shows “Welcome” wizard: confirm endpoints, generate API token, run sample scan import, open Vulnerability Explorer.
Security posture
- Default credentials only for Quickstart; randomize secrets on first up and store in .secrets/ file.
- All services run as non‑root; bind to localhost by default unless EXPOSE_PUBLIC=1 is set.
- TLS optional via CADDY or nginx sidecar, disabled by default.
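The "randomize secrets on first up" rule can be sketched as a create-once helper. This is an illustration under assumptions: the helper name `ensure_secret` and the one-file-per-secret layout are hypothetical, and it assumes a single process runs the quickstart at a time.

```python
import os
import secrets
from pathlib import Path


def ensure_secret(secrets_dir, name, nbytes=32):
    """Create <secrets_dir>/<name> with a random value on first call; reuse it afterwards."""
    directory = Path(secrets_dir)
    directory.mkdir(mode=0o700, parents=True, exist_ok=True)
    path = directory / name
    try:
        # O_EXCL makes creation atomic, so an existing secret is never overwritten.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return path.read_text(encoding="utf-8").strip()
    value = secrets.token_urlsafe(nbytes)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(value + "\n")
    return value
```

Calling it again with the same name returns the stored value, so repeated runs of the stack keep the same credentials until the directory is removed.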
One‑liner
./deploy/compose/quickstart.sh runs preflight checks, pulls images, writes .env, runs docker compose up -d, polls readiness, and prints URLs and credentials.
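The "polls readiness" step might look like the sketch below. The function names, the 120-second deadline, and the 2-second interval are assumptions rather than the behaviour of the real quickstart.sh; the probe is injectable so the loop can be exercised without a running stack.

```python
import http.client
import time
import urllib.request


def http_ready(url, timeout=2.0):
    """Return True if GET <url> answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and non-2xx answers all count as "not ready yet".
        return False


def wait_until_ready(urls, probe=http_ready, deadline_s=120.0, interval_s=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until every URL is ready or the deadline passes; return the URLs still pending."""
    pending = set(urls)
    stop_at = clock() + deadline_s
    while pending:
        pending = {url for url in pending if not probe(url)}
        if not pending or clock() >= stop_at:
            break
        sleep(interval_s)
    return pending
```

With the default ports listed earlier, a caller could pass something like `["http://localhost:8081/health/readiness"]` and treat a non-empty return value as a failed start.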
Backups & reset
- ./deploy/compose/backup.sh creates a tarball of volumes and config.
- ./deploy/compose/reset.sh nukes persistent volumes with a big scary prompt unless --yes is passed.
3.3 Production Helm chart
Chart location: deploy/helm/stella/ with subcharts or toggles.
Chart features
- Components enabled via values: api, console, orchestrator, taskRunner, conseiller, excitator, policy, notify, export, ai.
- External dependencies by default:
  - PostgreSQL, Redis, S3 bucket, Message queue, OTel endpoint provided via values.
  - Optional “bundled” mode for lab clusters using StatefulSets.
- Security:
  - Pod and container securityContext: runAsNonRoot, readOnlyRootFilesystem, fsGroup when needed.
  - NetworkPolicy for east‑west traffic; deny‑all then allow specific ports.
  - Secrets as Secret from External Secrets operator or sealed secrets.
  - HPA per component; PDBs; liveness/readiness probes.
- Ingress:
  - One hostname for Console, one for API; TLS required in production values.
  - Option to serve Console as static behind CDN while API behind private ingress gateway.
- Config:
  - Values for Authority provider, token TTLs, policy cache TTL, pack registry endpoint, notifications sinks, export locations.
  - Feature flags per epic enablement.
- Migrations:
  - stella-migrator Job runs before rollouts; idempotent migrations.
  - Optional “break glass” manual job.
- Observability:
  - /metrics endpoints scraped by Prometheus; exemplars via OTel; logs structured.
  - OpenTelemetry auto‑config via env if collector provided.
- Upgrades:
  - Blue/green or rolling; readiness gates based on background indexers catching up.
  - Chart hooks to block until Conseiller/Excitator catch up to feed watermarks.
3.4 Air‑gapped distribution
Bundle format
- stella-bundle-vX.Y.Z.tar.zst containing:
  - All images as OCI layout (multi‑arch), cosign signatures, SBOMs, SLSA provenance.
  - load.sh to import into a local registry.
  - compose/ and helm/ directories with pinned image digests.
  - checksums.txt and bundle.sig.
Process
- Online build job crafts bundle; signatures produced by CI keys.
- Offline install:
  - Verify bundle.sig
  - ./load.sh --to registry.local:5000
  - helm install stella ./helm -f values-airgap.yaml --set image.registry=registry.local:5000
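Alongside the signature check, the extracted bundle contents can be compared against checksums.txt before anything is loaded. The sketch below assumes checksums.txt uses the common `sha256sum` layout (hex digest, two spaces, relative path); the document does not specify the format, so treat that and the function names as assumptions. It does not verify bundle.sig, which remains a separate step.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large image archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(bundle_dir, manifest_name="checksums.txt"):
    """Return (relative_path, problem) for every manifest entry that fails verification."""
    root = Path(bundle_dir)
    failures = []
    for line in (root / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, _, rel_path = line.partition(" ")
        rel_path = rel_path.lstrip(" *")  # sha256sum writes two spaces, or " *" in binary mode
        target = root / rel_path
        if not target.is_file():
            failures.append((rel_path, "missing"))
        elif sha256_of(target) != expected.lower():
            failures.append((rel_path, "mismatch"))
    return failures
```

An installer would stop before running load.sh if the returned list is non-empty.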
3.5 Configuration matrix
Document every config knob in a single table:
- Auth: Authority issuer, JWKS, RBAC cache TTL.
- Storage: DB URL, pool sizes, migration flags.
- Object store: S3 endpoint, buckets, SSE, IAM.
- Queue: URL, prefetch, retention.
- Policy engine: rule cache TTL, default policy version.
- Conseiller/Excitator: polling intervals, feed sources, retry backoff, max in‑flight; merge disabled enforced.
- Orchestrator/Task Runner: concurrency, sandbox, network egress policy, artifact retention.
- Notifications: sinks, templates path, batch windows.
- Export Center: formats enabled, rate limits.
- AI Assistant: model endpoint, token limits, guardrails, disabled by default.
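A service could read a few of these knobs from STELLA_* environment variables roughly as follows. The variable names, default values, and the `Settings` type are hypothetical and chosen only to illustrate the pattern of explicit, typed settings with safe defaults (the AI Assistant stays off unless enabled).

```python
from dataclasses import dataclass

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _parse_bool(raw):
    """Parse a boolean env value strictly, so typos fail loudly instead of silently."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


@dataclass(frozen=True)
class Settings:
    db_url: str
    policy_cache_ttl_seconds: int
    ai_enabled: bool


def load_settings(environ):
    """Build Settings from a mapping such as os.environ; all names are illustrative."""
    return Settings(
        db_url=environ.get("STELLA_DB_URL", "postgresql://localhost:5432/stella"),
        policy_cache_ttl_seconds=int(environ.get("STELLA_POLICY_CACHE_TTL_SECONDS", "300")),
        ai_enabled=_parse_bool(environ.get("STELLA_AI_ENABLED", "false")),
    )
```

Passing the mapping in, rather than reading `os.environ` directly, keeps the loader easy to test.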
3.6 Health, readiness, and upgrades
- Health endpoints: GET /health/liveness returns 200 if process responsive; GET /health/readiness checks dependencies with timeout.
- Graceful shutdown: SIGTERM starts drain; HTTP returns 503; background workers flush; exit on deadline.
- Upgrade choreography: migrations run, API becomes ready, workers rolling restart, indexes catch up, AOC evaluation warms caches, then flip traffic.
- Version skew policy: define supported skew between components; chart validates.
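The readiness rule ("checks dependencies with timeout") can be sketched as a function that runs each dependency check concurrently under one shared deadline and maps the outcome to 200 or 503. The function name, the result strings, and the 2-second default are assumptions, not the services' actual contract.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def readiness(checks, timeout_s=2.0):
    """Run {name: callable} checks concurrently; return (http_status, per-dependency results)."""
    results = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(checks)))
    try:
        futures = {name: pool.submit(check) for name, check in checks.items()}
        deadline = time.monotonic() + timeout_s  # one deadline shared by all checks
        for name, future in futures.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                results[name] = "ok" if future.result(timeout=remaining) else "failed"
            except FutureTimeout:
                results[name] = "timeout"
            except Exception as exc:  # a crashing check means "not ready", never a crash of the probe
                results[name] = f"error: {type(exc).__name__}"
    finally:
        pool.shutdown(wait=False)  # do not block the response on slow checks
    status = 200 if all(value == "ok" for value in results.values()) else 503
    return status, results
```

During graceful shutdown the same handler could simply return 503 once draining has started, which matches the SIGTERM behaviour described above.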
3.7 Security & compliance
- Image signing & verification: cosign attestations; optional admission policy to verify signatures by key.
- SBOM provenance: attach SPDX and provenance attestations; publish via registry referrers.
- Non‑root & least privilege: capabilities dropped; only NET_BIND_SERVICE for proxies if needed.
- Secrets handling: mount from files; avoid putting secrets in args; redacted logs by default.
- Audit: container labels propagate release metadata to all logs and spans.
- AOC enforcement: images for Conseiller/Excitator hard‑disable merge code paths via env/defaults.
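For "redacted logs by default", one possible approach is to mask sensitive fields in a structured log event before it is serialized. The key pattern and the placeholder string below are assumptions; a real deployment would tune the list of sensitive names.

```python
import re

# Field names that are treated as sensitive in this sketch.
_SENSITIVE_KEY = re.compile(r"(password|passwd|secret|token|api[_-]?key|authorization)", re.IGNORECASE)
REDACTED = "***REDACTED***"


def redact(value, key=""):
    """Return a copy of a JSON-like value with sensitive fields masked."""
    if key and _SENSITIVE_KEY.search(key):
        return REDACTED  # mask the whole value, whatever its type
    if isinstance(value, dict):
        return {k: redact(v, str(k)) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Because the function builds a new structure, the original event stays untouched for in-process use.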
3.8 Quickstart UX polish
- Console shows “Connected to Quickstart” banner with a button “View install docs” and “Export pack to production.”
- One click to generate a Task Pack that exports seed data from Quickstart to a production tenant via Export Center.
4) Architecture
4.1 Repos & layout
/deploy
  /compose
    docker-compose.yml
    .env.example
    quickstart.sh
    backup.sh
    reset.sh
  /helm
    /stella
      Chart.yaml
      values.yaml
      values-prod.yaml
      values-airgap.yaml
      templates/*.yaml
  /docker
    stella-api.Dockerfile
    stella-console.Dockerfile
    stella-orchestrator.Dockerfile
    stella-task-runner.Dockerfile
    stella-conseiller.Dockerfile
    stella-excitator.Dockerfile
    stella-policy.Dockerfile
    stella-notify.Dockerfile
    stella-export.Dockerfile
    stella-ai.Dockerfile
4.2 CI/CD flow
- Build multi‑arch with buildx; run unit/integration tests; embed version metadata and SBOM.
- Sign images; push to registry; publish Helm chart with pinned digests.
- Generate Air‑gap bundle and signatures.
- Smoke test Quickstart on fresh VM; e2e tests exercise Console and CLI parity (Epic 12).
5) APIs and contracts
No new external APIs, but every service must expose:
- GET /health/liveness and GET /health/readiness.
- GET /version returning { version, gitCommit, buildDate }.
- GET /metrics when enabled.
- Config discovery endpoint for Console with trimmed, safe values (no secrets).
- Conseiller/Excitator must expose GET /capabilities returning {"merge": false} to prove merge is disabled.
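The shape of these contract endpoints can be illustrated with a small standard-library handler. This is a sketch, not the services' implementation: the build values are placeholders, the liveness body is an assumption, and only the /version field names and the /capabilities payload come from the contract above.

```python
import json
from http.server import BaseHTTPRequestHandler

# Placeholder build metadata; a real image would embed these at build time.
BUILD_INFO = {"version": "v0.0.0", "gitCommit": "unknown", "buildDate": "unknown"}


def route(path):
    """Map a request path to (status, JSON-serializable payload)."""
    if path == "/health/liveness":
        return 200, {"status": "ok"}
    if path == "/version":
        return 200, dict(BUILD_INFO)
    if path == "/capabilities":
        return 200, {"merge": False}  # aggregation-only: merge is never advertised
    return 404, {"error": "not found"}


class ContractHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = route(self.path.split("?", 1)[0])
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the handler can be passed to `http.server.HTTPServer(("127.0.0.1", 8081), ContractHandler)` and served with `serve_forever()`. Keeping `route` as a pure function lets contract tests assert on payloads without opening a socket.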
6) Documentation changes
Create/update:
- /docs/install/overview.md: Supported deployment modes, hardware requirements, network ports, quickstart vs production.
- /docs/install/compose-quickstart.md: Preconditions, one‑liner, first‑login wizard, seed data, reset/backup, common pitfalls.
- /docs/install/helm-prod.md: Prereqs, external dependencies, values reference, TLS/ingress, HPA, PDB, upgrades, rollbacks.
- /docs/install/airgap.md: Bundle verification, loading into private registry, running without internet, patching images.
- /docs/install/configuration-reference.md: The full configuration matrix with examples.
- /docs/security/supply-chain.md: Image signing, SBOMs, provenance, admission controls, non‑root posture.
- /docs/operations/health-and-readiness.md: Endpoints, probes, troubleshooting, expected states during upgrades.
- /docs/release/image-catalog.md: All image names, tags, architectures, checksums; mapping between chart version and image digests.
- /docs/console/onboarding.md: Quickstart banner, links to install docs, exporting data to production.
Add at the top of each page:
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
7) Implementation plan
New modules/artifacts
- Dockerfiles per service under /deploy/docker/ with common builder stages.
- Helm chart under /deploy/helm/stella.
- Compose quickstart under /deploy/compose/.
- Air‑gap bundle generator in CI, script tools/make-airgap-bundle.sh.
- Seed dataset packaged as container image layer or mounted config.
Changes to services
- Add health/version/metrics endpoints where missing.
- Ensure all services read config from env/files with defaults suitable for Quickstart.
- Conseiller/Excitator: add hard config flag DISABLE_MERGE=true defaulted in images and values.
- API: seed job and migration runner; serve /welcome state for Console wizard.
- Console: onboarding wizard and Quickstart banner.
- Task Runner: respect offline mode by failing gracefully if egress blocked.
Packaging & signing
- Embed SBOM in all images; publish as OCI referrers.
- Cosign sign images and attest provenance; verify in CI.
- Publish checksums and signatures on release page.
8) Engineering tasks
Images
- Author multi‑stage Dockerfiles with cache‑efficient builds.
- Add non‑root user, drop capabilities, read‑only FS, healthcheck scripts.
- Generate and attach SBOM for each image.
- Implement /health/*, /version, optional /metrics.
Compose
- Write docker-compose.yml with all core services and deps.
- Create .env.example, quickstart.sh, backup.sh, reset.sh.
- Seed job container and sample data ingestion on first run.
Helm
- Scaffold chart; values for each component; pinned digests.
- Ingress, TLS, HPA, PDB, NetworkPolicy, ServiceAccount/RBAC.
- Migration Job and upgrade hooks; readiness gates for indexers.
- Documentation of values with helm-docs generator.
Air‑gap
- Build script to save images to OCI layout; compress, sign, and checksum.
- load.sh to import into private registry and rewrite manifests.
- values-airgap.yaml with image registry overrides.
Console & API
- Onboarding wizard, Quickstart banner, links to docs.
- Seed data endpoints guarded behind QUICKSTART_MODE.
- Config discovery endpoint for console.
Security
- Cosign integration; key management; CI verification step.
- Admission policy example in docs to enforce signatures.
- Secret redaction in logs; env var audit.
Observability
- OTel config sample; /metrics endpoints; compose prom scrape.
- Helm values for tracing and metrics.
Validation
- Fresh VM smoke test for Compose quickstart.
- Kind cluster e2e for Helm path.
- Air‑gap install test in CI with a local registry.
Docs
- Write all pages listed in §6 with copy‑pasteable commands and screenshots.
- Include a troubleshooting matrix: symptom → probable cause → fix.
- Add “Imposed rule” header line to each page.
9) Feature changes required
- Console: Onboarding wizard, Quickstart banner, and deep links to install docs; “Copy CLI” buttons should prefer the stella container image in quickstart if local binary missing.
- API: Seed job and health endpoints; version reporting; feature flag QUICKSTART_MODE.
- Registry/Release tooling: Publish image catalog and checksums; maintain compatibility matrix per chart version.
- Task Runner: Offline mode awareness and explicit error when attempting egress in air‑gap.
- Conseiller/Excitator: enforce non‑merge at runtime and show capability endpoint.
10) Acceptance criteria
- Quickstart: from clean host to working Console in under 5 minutes on a typical laptop; seed data visible; AOC rules active.
- Helm: install succeeds with external dependencies; roll forward and roll back with zero data loss; probes green.
- Air‑gap: bundle verifies, loads to a private registry, and installs without external network.
- All images: signed, SBOM‑attached, non‑root, read‑only FS, health endpoints exposed.
- Docs: a new user can complete Quickstart without assistance; a platform team can deploy the chart with only values editing.
- Conseiller/Excitator: capability endpoint confirms merge=false; tests prove aggregation‑only behavior.
11) Risks & mitigations
- Config sprawl. Centralize in /docs/install/configuration-reference.md and ship sane defaults.
- Drift between Compose and Helm. Pin digests; generate manifests from a common values source where possible; CI diff.
- Resource contention in Quickstart. Limit concurrency; ship low default worker counts; document overrides.
- Air‑gap surprises. Remove implicit egress; provide offline doc copies in bundle; deterministic artifact paths.
- Security regressions. Enforce non‑root/read‑only in CI; signature verification gates release.
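The "CI diff" mitigation for Compose/Helm drift could be as simple as comparing the pinned digests from both sources. The sketch below assumes each source has already been parsed into a mapping of image name to digest; the parsing step and the function name `digest_drift` are hypothetical.

```python
def digest_drift(compose_images, helm_images):
    """Compare two {image name: digest} maps and return the entries that disagree.

    The result maps each drifting image to (compose digest, helm digest);
    None marks an image that is pinned in only one of the two sources.
    """
    drift = {}
    for name in sorted(set(compose_images) | set(helm_images)):
        left = compose_images.get(name)
        right = helm_images.get(name)
        if left != right:
            drift[name] = (left, right)
    return drift
```

A CI job would fail when the result is non-empty and print the pairs so the mismatch is easy to trace.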
12) Philosophy
- First run matters. Quickstart must be boring, predictable, and immediately useful.
- Prod isn’t a flag. Helm defaults are safe; “convenience” belongs in Quickstart, not production.
- Prove your supply chain. Signed images, SBOMs, and provenance are table stakes, not an upsell.