

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


Epic 13: Containerized Distribution & Quickstart

Short name: Containerized Distribution & Quickstart
Primary components: OCI images for all services, Compose Quickstart, Helm chart for production, Air-gap bundles
Surfaces: Container registry, /deploy/*, /docs/install/*, Console onboarding screen
Touches: Authority (authN/Z), Web Services API, Orchestrator, Task Runner, Policy Engine, Conseiller (Feedser), Excitator (Vexer), Findings Ledger, Export Center, Notifications Studio, Advisory AI Assistant, Object Storage/KMS, Telemetry

AOC ground rule reminder: Conseiller and Excitator aggregate and link advisories/VEX. They never merge or mutate source records. Containerized deployments must preserve this behavior and expose links to originals.


1) What it is

A complete, reproducible containerized distribution of StellaOps with three delivery modes:

  1. Quickstart (single host) using Docker Compose: one command to run a full stack suitable for evaluation and local development. Ships with seed data and sane defaults.

  2. Production Helm chart for Kubernetes: modular, scalable, secure-by-default deployment with optional HA and external dependencies.

  3. Air-gapped bundles: signed offline packages containing images, seed configs, and installation scripts for disconnected environments.

All images are multi-arch (amd64/arm64), signed, SBOM-attached, and versioned with consistent tags. A “Download & Install” doc set guides users from zero to a working system in minutes and to a production-ready posture in hours.


2) Why (brief)

People don't adopt tools they can't run quickly or securely. Containers make our deployment reproducible; Quickstart removes friction; Helm unlocks real ops. Air-gap bundles acknowledge reality in regulated environments.


3) How it should work (maximum detail)

3.1 Image catalog

Build and publish OCI images for the following:

  • stella-api (Web Services API)
  • stella-console (Web UI)
  • stella-orchestrator (source/job scheduler)
  • stella-task-runner (executes Task Packs remotely)
  • stella-conseiller (Feedser; advisory aggregator)
  • stella-excitator (Vexer; VEX aggregator)
  • stella-policy (Policy Engine)
  • stella-ledger (Findings Ledger worker; if separated from API)
  • stella-export (Export Center worker; optional if part of API)
  • stella-notify (Notifications Studio worker)
  • stella-ai (Advisory AI Assistant; lightweight service calling configured LLM backends or local models)
  • Support services (optionally bundled for Quickstart): postgres, redis, object-store (S3-compatible), queue (NATS or RabbitMQ), otel-collector.

Image standards

  • Base: distroless or minimal; non-root user; read-only filesystem; writable /tmp only if needed.
  • Ports: declare via labels; expose health endpoints /health/liveness, /health/readiness.
  • Env: explicit, documented, with safe defaults; secrets via env or file mounts only.
  • Config: STELLA_* envs or mounted config directory /etc/stella/.
  • SBOM: attach SPDX JSON as an OCI artifact and bake a copy into /app/sbom.spdx.json at build time.
  • Signing: cosign attestations for image, SBOM, and provenance.
  • Labels: org.opencontainers.image.* (title, version, revision, source, licenses).
  • Entrypoint: PID 1 reaps orphaned child processes; graceful shutdown on SIGTERM; configurable termination grace period.
  • Logs: structured JSON by default; stdout/stderr only.

Tagging scheme

  • :vX.Y.Z (immutable release)
  • :vX.Y.Z-rc.N (release candidate)
  • :edge (latest main)
  • :nightly-YYYYMMDD (optional)
  • Multi-arch manifest lists for linux/amd64 and linux/arm64.
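A release pipeline could enforce this scheme mechanically before pushing. The sketch below is a hypothetical helper (classify_tag and is_immutable are illustrative names, not existing StellaOps tooling) that recognizes the four documented tag shapes and treats only vX.Y.Z as immutable. It assumes version components are integers without leading zeros, which the scheme itself does not state.

```python
import re
from datetime import datetime
from typing import Optional

# Illustrative patterns for the documented tag shapes. Version components are
# assumed to be integers without leading zeros (an assumption, not a stated rule).
_NUM = r"(?:0|[1-9]\d*)"
_RELEASE = re.compile(rf"v{_NUM}\.{_NUM}\.{_NUM}")
_RC = re.compile(rf"v{_NUM}\.{_NUM}\.{_NUM}-rc\.{_NUM}")
_NIGHTLY = re.compile(r"nightly-(\d{8})")


def classify_tag(tag: str) -> Optional[str]:
    """Return "release", "rc", "edge", or "nightly"; None if the tag fits no scheme."""
    if _RELEASE.fullmatch(tag):
        return "release"
    if _RC.fullmatch(tag):
        return "rc"
    if tag == "edge":
        return "edge"
    m = _NIGHTLY.fullmatch(tag)
    if m:
        try:
            datetime.strptime(m.group(1), "%Y%m%d")  # reject impossible dates
            return "nightly"
        except ValueError:
            return None
    return None


def is_immutable(tag: str) -> bool:
    """Only :vX.Y.Z release tags are immutable; CI could refuse to overwrite them."""
    return classify_tag(tag) == "release"
```

A publish step could call is_immutable and refuse to push when a release tag already exists in the registry.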

3.2 Quickstart (Compose)

Goal: the equivalent of curl | sh, yielding a working stack with defaults and demo data on a laptop or server. No internet access is needed beyond pulling images, unless configured otherwise.

Compose file: deploy/compose/docker-compose.yml

  • Services:

    • api, console, orchestrator, task-runner, conseiller, excitator, policy, notify, export, ai
    • postgres, redis, minio (S3), nats or rabbitmq, otel-collector
  • Volumes:

    • pgdata, minio-data, redis-data, stella-state (for local cache, packs registry)
  • Networks:

    • stella-net bridge
  • Ports (defaults):

    • Console 8080, API 8081, MinIO 9000, NATS/RabbitMQ default ports
  • Env files:

    • .env.example with safe defaults; users copy to .env.

Seed data

  • Seed admin account and tenant on first run via stella-api migration/seed job.
  • Seed demo SBOMs, advisories, VEX samples, baseline policy, and a task pack.
  • On first login, the Console shows a “Welcome” wizard: confirm endpoints, generate an API token, import a sample scan, open the Vulnerability Explorer.

Security posture

  • Default credentials exist only in Quickstart; secrets are randomized on the first docker compose up and stored in .secrets/.
  • All services run as non-root and bind to localhost by default unless EXPOSE_PUBLIC=1 is set.
  • TLS is optional via a Caddy or nginx sidecar and is disabled by default.
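Randomizing secrets on first start can be as small as the sketch below. It assumes a one-file-per-secret layout under .secrets/ so each value can later be mounted as a file, in line with the "secrets via env or file mounts only" image standard. The secret names and the ensure_secrets helper are hypothetical placeholders; the real Compose stack defines its own.

```python
import os
import secrets
from pathlib import Path
from typing import Dict, Iterable

# Hypothetical secret names; the actual stack may use different keys.
DEFAULT_SECRET_KEYS = ("POSTGRES_PASSWORD", "MINIO_ROOT_PASSWORD", "STELLA_ADMIN_PASSWORD")


def ensure_secrets(secrets_dir: Path, keys: Iterable[str] = DEFAULT_SECRET_KEYS) -> Dict[str, str]:
    """Generate each missing secret once; reuse existing files on later runs."""
    secrets_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    values: Dict[str, str] = {}
    for key in keys:
        path = secrets_dir / key
        if path.exists():
            values[key] = path.read_text().strip()
            continue
        value = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoding
        # O_EXCL avoids clobbering a file created concurrently; 0o600 = owner-only.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "w") as fh:
            fh.write(value + "\n")
        values[key] = value
    return values
```

Because a second call only reads the existing files, quickstart.sh could run this step on every invocation without rotating credentials underneath a running stack.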

One-liner

  • ./deploy/compose/quickstart.sh does: preflight checks, pulls images, writes .env, runs docker compose up -d, polls readiness, prints URLs and credentials.
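The "polls readiness" step is the part most likely to misbehave on slow hosts. Below is a Python rendering of that loop for illustration only (the real script is shell; wait_until_ready and http_ready are hypothetical names). It probes each service until all report ready or a deadline expires; the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import http.client
import time
import urllib.request
from typing import Callable, Dict


def http_ready(url: str, timeout: float = 2.0) -> bool:
    """True if GET url answers HTTP 200 within the timeout; any error counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):  # URLError subclasses OSError
        return False


def wait_until_ready(
    probes: Dict[str, Callable[[], bool]],
    deadline_s: float = 300.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Poll probes until all pass or the deadline passes; return the last status per service."""
    start = clock()
    status = {name: False for name in probes}
    while True:
        for name, probe in probes.items():
            if not status[name]:  # stop probing services that are already ready
                status[name] = probe()
        if all(status.values()) or clock() - start >= deadline_s:
            return status
        sleep(interval_s)
```

With the default ports, the probes would target http://localhost:8080/health/readiness (Console) and http://localhost:8081/health/readiness (API); any service still False when the deadline expires is what the script should report.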

Backups & reset

  • ./deploy/compose/backup.sh creates a tarball of volumes and config.
  • ./deploy/compose/reset.sh nukes persistent volumes with a big scary prompt unless --yes.

3.3 Production Helm chart

Chart location: deploy/helm/stella/ with subcharts or toggles.

Chart features

  • Components enabled via values: api, console, orchestrator, taskRunner, conseiller, excitator, policy, notify, export, ai.

  • External dependencies by default:

    • PostgreSQL, Redis, S3 bucket, Message queue, OTel endpoint provided via values.
    • Optional “bundled” mode for lab clusters using StatefulSets.
  • Security:

    • PodSecurityContext: runAsNonRoot, readOnlyRootFilesystem, fsGroup when needed.
    • NetworkPolicy for east-west traffic; deny-all, then allow specific ports.
    • Secrets consumed as Kubernetes Secrets provisioned by the External Secrets Operator or Sealed Secrets.
    • HPA per component; PDBs; liveness/readiness probes.
  • Ingress:

    • One hostname for Console, one for API; TLS required in production values.
    • Option to serve the Console as static assets behind a CDN while the API sits behind a private ingress gateway.
  • Config:

    • Values for Authority provider, token TTLs, policy cache TTL, pack registry endpoint, notifications sinks, export locations.
    • Per-epic feature flags to enable or disable individual epics.
  • Migrations:

    • stella-migrator Job runs before rollouts; idempotent migrations.
    • Optional “break glass” manual job.
  • Observability:

    • /metrics endpoints scraped by Prometheus; exemplars via OTel; logs structured.
    • OpenTelemetry auto-configuration via env if a collector is provided.
  • Upgrades:

    • Blue/green or rolling; readiness gates based on background indexers catching up.
    • Chart hooks to block until Conseiller/Excitator catch up to feed watermarks.
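The watermark gate in the last bullet reduces to a per-source comparison. The sketch assumes each feed source exposes an upstream watermark and the aggregator reports the latest watermark it has processed; how those values are fetched, the source names used in tests, and the five-minute tolerance are all assumptions for illustration. A pre-upgrade hook Job could loop on upgrade_gate_open (a hypothetical name) and exit non-zero on timeout.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def lagging_sources(
    upstream: Dict[str, datetime],
    processed: Dict[str, datetime],
    max_lag: timedelta = timedelta(minutes=5),
) -> List[str]:
    """Return sources (sorted) whose processed watermark trails upstream by more than max_lag.

    A source that has never been processed counts as lagging.
    """
    lagging = []
    for source, upstream_mark in sorted(upstream.items()):
        done = processed.get(source)
        if done is None or upstream_mark - done > max_lag:
            lagging.append(source)
    return lagging


def upgrade_gate_open(upstream: Dict[str, datetime], processed: Dict[str, datetime]) -> bool:
    """True when every feed source has caught up, so the rollout may proceed."""
    return not lagging_sources(upstream, processed)
```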

3.4 Air-gapped distribution

Bundle format

  • stella-bundle-vX.Y.Z.tar.zst containing:

    • All images as OCI layout (multi-arch), cosign signatures, SBOMs, SLSA provenance.
    • load.sh to import into a local registry.
    • compose/ and helm/ directories with pinned image digests.
    • checksums.txt and bundle.sig.
  • Process

    • Online build job crafts bundle; signatures produced by CI keys.

    • Offline install:

      • Verify bundle.sig
      • ./load.sh --to registry.local:5000
      • helm install stella ./helm -f values-airgap.yaml --set image.registry=registry.local:5000
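Before load.sh runs, every file in the bundle should match checksums.txt. The sketch below assumes that file uses the common sha256sum output format ("<hex digest>  <relative path>"); the bundle layout above does not specify the format, so treat it as an assumption. Verifying bundle.sig is left to whichever signing tool CI uses and should happen first, because the checksum list only proves integrity once the list itself is trusted.

```python
import hashlib
from pathlib import Path, PurePosixPath
from typing import List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large image layers do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(bundle_dir: Path, manifest: str = "checksums.txt") -> List[str]:
    """Check every "<hex digest>  <relative path>" entry; return a list of problems."""
    problems: List[str] = []
    lines = (bundle_dir / manifest).read_text().splitlines()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            problems.append(f"line {lineno}: malformed entry")
            continue
        expected, name = parts[0].lower(), parts[1].lstrip("*")  # '*' marks binary mode
        rel = PurePosixPath(name)
        if rel.is_absolute() or ".." in rel.parts:
            problems.append(f"{name}: path escapes bundle directory")
            continue
        target = bundle_dir / name
        if not target.is_file():
            problems.append(f"{name}: missing")
        elif sha256_file(target) != expected:
            problems.append(f"{name}: checksum mismatch")
    return problems
```

An empty list means the bundle is intact; anything else should stop the install before images reach the local registry.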

3.5 Configuration matrix

Document every config knob in a single table:

  • Auth: Authority issuer, JWKS, RBAC cache TTL.
  • Storage: DB URL, pool sizes, migration flags.
  • Object store: S3 endpoint, buckets, SSE, IAM.
  • Queue: URL, prefetch, retention.
  • Policy engine: rule cache TTL, default policy version.
  • Conseiller/Excitator: polling intervals, feed sources, retry backoff, max in-flight; merge always disabled (enforced).
  • Orchestrator/Task Runner: concurrency, sandbox, network egress policy, artifact retention.
  • Notifications: sinks, templates path, batch windows.
  • Export Center: formats enabled, rate limits.
  • AI Assistant: model endpoint, token limits, guardrails; disabled by default.
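To make the table concrete, here is a sketch of how a Conseiller/Excitator process might load a few of these knobs from environment variables with safe defaults and refuse to start if merge is re-enabled. Apart from DISABLE_MERGE (named in the implementation plan), the variable names, defaults, and helper names are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _parse_bool(name: str, raw: str) -> bool:
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}: expected a boolean, got {raw!r}")


def _parse_positive_int(name: str, raw: str) -> int:
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{name}: must be > 0, got {value}")
    return value


@dataclass(frozen=True)
class FeedConfig:
    poll_interval_s: int
    max_in_flight: int
    merge_disabled: bool


def load_feed_config(env: Mapping[str, str] = os.environ) -> FeedConfig:
    """Read feed settings; STELLA_* names are hypothetical, DISABLE_MERGE comes from this epic."""
    poll = _parse_positive_int("STELLA_FEED_POLL_INTERVAL_SECONDS",
                               env.get("STELLA_FEED_POLL_INTERVAL_SECONDS", "300"))
    in_flight = _parse_positive_int("STELLA_FEED_MAX_IN_FLIGHT",
                                    env.get("STELLA_FEED_MAX_IN_FLIGHT", "4"))
    merge_disabled = _parse_bool("DISABLE_MERGE", env.get("DISABLE_MERGE", "true"))
    if not merge_disabled:
        # AOC: aggregation-only services must never run with merge enabled.
        raise ValueError("DISABLE_MERGE=false is not supported: Conseiller/Excitator are aggregation-only")
    return FeedConfig(poll, in_flight, merge_disabled)
```

Failing at startup on an invalid or unsafe value keeps misconfiguration out of the readiness path entirely.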

3.6 Health, readiness, and upgrades

  • Health endpoints: GET /health/liveness returns 200 if the process is responsive; GET /health/readiness checks dependencies with a timeout.
  • Graceful shutdown: SIGTERM starts drain; HTTP returns 503; background workers flush; exit on deadline.
  • Upgrade choreography: migrations run, API becomes ready, workers rolling restart, indexes catch up, AOC evaluation warms caches, then flip traffic.
  • Version skew policy: define supported skew between components; chart validates.
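A sketch of the readiness and drain behavior described above, assuming dependency checks are plain callables (for example a database ping) and one overall timeout per readiness request. Readiness is a hypothetical helper, not an existing StellaOps class; a /health/readiness handler would return its status code and body.

```python
import concurrent.futures as cf
import threading
from typing import Callable, Dict, Tuple


class Readiness:
    """Hypothetical readiness tracker: dependency checks plus a drain flag."""

    def __init__(self, checks: Dict[str, Callable[[], bool]], timeout_s: float = 2.0):
        self._checks = checks
        self._timeout_s = timeout_s
        self._draining = threading.Event()
        self._pool = cf.ThreadPoolExecutor(max_workers=max(1, len(checks)))

    def start_drain(self) -> None:
        """Called on SIGTERM: readiness fails from now on so traffic moves elsewhere."""
        self._draining.set()

    def evaluate(self) -> Tuple[int, Dict[str, str]]:
        """Run all checks concurrently under one deadline; return (HTTP status, details)."""
        if self._draining.is_set():
            return 503, {"status": "draining"}
        futures = {name: self._pool.submit(check) for name, check in self._checks.items()}
        done, _ = cf.wait(futures.values(), timeout=self._timeout_s)
        results: Dict[str, str] = {}
        for name, fut in futures.items():
            if fut not in done:
                results[name] = "timeout"
            elif fut.exception() is not None:
                results[name] = "error"
            else:
                results[name] = "ok" if fut.result() else "failing"
        status = 200 if all(r == "ok" for r in results.values()) else 503
        return status, results
```

A service could wire drain to the signal with signal.signal(signal.SIGTERM, lambda *_: readiness.start_drain()), so the probe flips to 503 while background work flushes. One caveat of this design: a check that hangs keeps its worker thread busy, so real checks should carry their own client-side timeouts.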

3.7 Security & compliance

  • Image signing & verification: cosign attestations; optional admission policy to verify signatures by key.
  • SBOM provenance: attach SPDX and provenance attestations; publish via registry referrers.
  • Non-root & least privilege: all capabilities dropped; only NET_BIND_SERVICE for proxies if needed.
  • Secrets handling: mount from files; avoid putting secrets in args; redacted logs by default.
  • Audit: container labels propagate release metadata to all logs and spans.
  • AOC enforcement: images for Conseiller/Excitator hard-disable merge code paths via env/defaults.
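Redaction is easiest to enforce once, in the shared logging setup. The sketch below pairs the structured-JSON log requirement with key-based and pattern-based redaction using Python's standard logging module. The sensitive-key list, the "fields" convention (structured context passed via extra={"fields": ...}), and JsonRedactingFormatter are assumptions for illustration, not an existing StellaOps logging contract.

```python
import json
import logging
import re
from typing import Any

# Hypothetical key list; extend it to match the real configuration surface.
SENSITIVE_KEYS = re.compile(r"(password|secret|token|api[_-]?key|authorization)", re.I)
BEARER = re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+")


def redact(value: Any) -> Any:
    """Recursively mask values under sensitive keys and bearer tokens inside strings."""
    if isinstance(value, dict):
        return {k: "[REDACTED]" if SENSITIVE_KEYS.search(str(k)) else redact(v)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return BEARER.sub("Bearer [REDACTED]", value)
    return value


class JsonRedactingFormatter(logging.Formatter):
    """Emit one JSON object per record, redacting secrets in the message and fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "logger": record.name,
                   "msg": redact(record.getMessage())}
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            payload.update(redact(fields))
        return json.dumps(payload, sort_keys=True)
```

Usage would look like logger.info("login ok", extra={"fields": {"tenant": "demo", "token": token}}), with the handler's formatter set to JsonRedactingFormatter().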

3.8 Quickstart UX polish

  • Console shows a “Connected to Quickstart” banner with “View install docs” and “Export pack to production” buttons.
  • One click to generate a Task Pack that exports seed data from Quickstart to a production tenant via Export Center.

4) Architecture

4.1 Repos & layout

/deploy
  /compose
    docker-compose.yml
    .env.example
    quickstart.sh
    backup.sh
    reset.sh
  /helm
    /stella
      Chart.yaml
      values.yaml
      values-prod.yaml
      values-airgap.yaml
      templates/*.yaml
  /docker
    stella-api.Dockerfile
    stella-console.Dockerfile
    stella-orchestrator.Dockerfile
    stella-task-runner.Dockerfile
    stella-conseiller.Dockerfile
    stella-excitator.Dockerfile
    stella-policy.Dockerfile
    stella-notify.Dockerfile
    stella-export.Dockerfile
    stella-ai.Dockerfile

4.2 CI/CD flow

  • Build multi-arch with buildx; run unit/integration tests; embed version metadata and SBOM.
  • Sign images; push to registry; publish Helm chart with pinned digests.
  • Generate Airgap bundle and signatures.
  • Smoke test Quickstart on fresh VM; e2e tests exercise Console and CLI parity (Epic 12).

5) APIs and contracts

No new external APIs, but every service must expose:

  • GET /health/liveness and GET /health/readiness.
  • GET /version returning { version, gitCommit, buildDate }.
  • GET /metrics when enabled.
  • Config discovery endpoint for Console with trimmed, safe values (no secrets).
  • Conseiller/Excitator must expose GET /capabilities returning {"merge": false} to prove merge is disabled.
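The shared endpoint contract can live in one small routing function that every service reuses. This sketch uses Python's http.server purely for illustration; the build metadata values, FEED_CAPABILITIES, and StellaHandler are hypothetical, and /health/* and /metrics are omitted for brevity. json.dumps renders Python's False as false, so the capabilities body is literally {"merge": false}.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Any, Dict, Optional, Tuple

# Hypothetical build metadata; a real image would inject these at build time.
BUILD_INFO = {"version": "v0.0.0-dev", "gitCommit": "unknown", "buildDate": "unknown"}
# Only Conseiller/Excitator would advertise this; other services answer 404.
FEED_CAPABILITIES = {"merge": False}


def route(path: str, build_info: Dict[str, str] = BUILD_INFO,
          capabilities: Optional[Dict[str, Any]] = None) -> Tuple[int, Dict[str, Any]]:
    """Map a request path to (status, JSON body) for the shared endpoints."""
    if path == "/version":
        return 200, dict(build_info)
    if path == "/capabilities" and capabilities is not None:
        return 200, dict(capabilities)
    return 404, {"error": "not found"}


class StellaHandler(BaseHTTPRequestHandler):
    capabilities: Optional[Dict[str, Any]] = None  # set to FEED_CAPABILITIES in feed services

    def do_GET(self) -> None:
        status, body = route(self.path.split("?", 1)[0], capabilities=self.capabilities)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

A Conseiller-style service could subclass StellaHandler with capabilities = FEED_CAPABILITIES and serve it via http.server.ThreadingHTTPServer; keeping route a pure function lets the contract be unit-tested without sockets.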

6) Documentation changes

Create/update:

  1. /docs/install/overview.md Supported deployment modes, hardware requirements, network ports, quickstart vs production.

  2. /docs/install/compose-quickstart.md Preconditions, one-liner, first-login wizard, seed data, reset/backup, common pitfalls.

  3. /docs/install/helm-prod.md Prereqs, external dependencies, values reference, TLS/ingress, HPA, PDB, upgrades, rollbacks.

  4. /docs/install/airgap.md Bundle verification, loading into private registry, running without internet, patching images.

  5. /docs/install/configuration-reference.md The full configuration matrix with examples.

  6. /docs/security/supply-chain.md Image signing, SBOMs, provenance, admission controls, non-root posture.

  7. /docs/operations/health-and-readiness.md Endpoints, probes, troubleshooting, expected states during upgrades.

  8. /docs/release/image-catalog.md All image names, tags, architectures, checksums; mapping between chart version and image digests.

  9. /docs/console/onboarding.md Quickstart banner, links to install docs, exporting data to production.

Add at the top of each page:

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


7) Implementation plan

New modules/artifacts

  • Dockerfiles per service under /deploy/docker/ with common builder stages.
  • Helm chart under /deploy/helm/stella.
  • Compose quickstart under /deploy/compose/.
  • Air-gap bundle generator in CI (script: tools/make-airgap-bundle.sh).
  • Seed dataset packaged as container image layer or mounted config.

Changes to services

  • Add health/version/metrics endpoints where missing.
  • Ensure all services read config from env/files with defaults suitable for Quickstart.
  • Conseiller/Excitator: add hard config flag DISABLE_MERGE=true defaulted in images and values.
  • API: seed job and migration runner; serve /welcome state for Console wizard.
  • Console: onboarding wizard and Quickstart banner.
  • Task Runner: respect offline mode by failing gracefully if egress blocked.

Packaging & signing

  • Embed SBOM in all images; publish as OCI referrers.
  • Cosign sign images and attest provenance; verify in CI.
  • Publish checksums and signatures on release page.

8) Engineering tasks

Images

  • Author multi-stage Dockerfiles with cache-efficient builds.
  • Add non-root user, drop capabilities, read-only FS, healthcheck scripts.
  • Generate and attach SBOM for each image.
  • Implement /health/*, /version, optional /metrics.

Compose

  • Write docker-compose.yml with all core services and deps.
  • Create .env.example, quickstart.sh, backup.sh, reset.sh.
  • Seed job container and sample data ingestion on first run.

Helm

  • Scaffold chart; values for each component; pinned digests.
  • Ingress, TLS, HPA, PDB, NetworkPolicy, ServiceAccount/RBAC.
  • Migration Job and upgrade hooks; readiness gates for indexers.
  • Documentation of values with helm-docs generator.

Air-gap

  • Build script to save images to OCI layout; compress, sign, and checksum.
  • load.sh to import into private registry and rewrite manifests.
  • values-airgap.yaml with image registry overrides.
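The manifest-rewrite step in load.sh amounts to swapping the registry host while keeping repository paths, tags, and digests intact. This Python sketch assumes Compose files declare images on single-line image: keys; rewrite_image_ref and rewrite_compose_text are illustrative names, and registry.example.org stands in for whatever public registry the images come from. Because digests are preserved, signature verification keyed to image digests should keep working against the mirrored copies.

```python
import re

# Matches single-line `image:` entries, optionally quoted (an assumption about the Compose files).
_IMAGE_LINE = re.compile(r"""^(\s*image:\s*)(["']?)([^"'\s]+)\2\s*$""")


def rewrite_image_ref(ref: str, target_registry: str) -> str:
    """Point ref at target_registry, keeping repository path, tag, and digest."""
    first, sep, rest = ref.partition("/")
    # Docker convention: the first component is a registry host only if it
    # contains "." or ":" or is "localhost"; otherwise the ref has no registry.
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    path = rest if has_registry else ref
    return f"{target_registry.rstrip('/')}/{path}"


def rewrite_compose_text(text: str, target_registry: str) -> str:
    """Rewrite every image: line in a Compose file; other lines pass through unchanged."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        m = _IMAGE_LINE.match(body)
        if m:
            indent, quote, ref = m.groups()
            body = f"{indent}{quote}{rewrite_image_ref(ref, target_registry)}{quote}"
        out.append(body + ending)
    return "".join(out)
```

Helm needs no text rewriting, since values-airgap.yaml overrides the registry instead.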

Console & API

  • Onboarding wizard, Quickstart banner, links to docs.
  • Seed data endpoints guarded behind QUICKSTART_MODE.
  • Config discovery endpoint for console.

Security

  • Cosign integration; key management; CI verification step.
  • Admission policy example in docs to enforce signatures.
  • Secret redaction in logs; env var audit.

Observability

  • OTel config sample; /metrics endpoints; Prometheus scrape config for Compose.
  • Helm values for tracing and metrics.

Validation

  • Fresh VM smoke test for Compose quickstart.
  • Kind cluster e2e for Helm path.
  • Air-gap install test in CI with a local registry.

Docs

  • Write all pages listed in §6 with copy-pasteable commands and screenshots.
  • Include a troubleshooting matrix: symptom → probable cause → fix.
  • Add “Imposed rule” header line to each page.

9) Feature changes required

  • Console: Onboarding wizard, Quickstart banner, and deep links to install docs; in Quickstart, “Copy CLI” buttons should prefer the stella container image when no local binary is installed.
  • API: Seed job and health endpoints; version reporting; feature flag QUICKSTART_MODE.
  • Registry/Release tooling: Publish image catalog and checksums; maintain compatibility matrix per chart version.
  • Task Runner: Offline mode awareness and an explicit error when a task attempts egress in an air-gapped environment.
  • Conseiller/Excitator: enforce non-merge at runtime and expose the capability endpoint.
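For the Task Runner's offline mode, the key property is failing fast with a clear message rather than timing out on a blocked socket. A minimal sketch, assuming the runner knows whether it is offline and has a set of internal hosts it may still reach; allowed_hosts, EgressBlockedError, and check_egress are hypothetical names.

```python
from typing import FrozenSet
from urllib.parse import urlsplit


class EgressBlockedError(RuntimeError):
    """Raised when a task step tries to reach a host that offline mode forbids."""


def check_egress(url: str, offline: bool, allowed_hosts: FrozenSet[str] = frozenset()) -> None:
    """Fail fast with an actionable message instead of hanging on a blocked connection."""
    host = urlsplit(url).hostname or ""  # urlsplit lower-cases the hostname
    allowed = {h.lower() for h in allowed_hosts}
    if offline and host not in allowed:
        raise EgressBlockedError(
            f"egress to {host or url!r} blocked: Task Runner is in offline (air-gap) mode; "
            "mirror the resource to an internal host and add it to the allow-list"
        )
```

Calling this before every outbound fetch turns a silent network timeout into an error that names the offending host.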

10) Acceptance criteria

  • Quickstart: from clean host to working Console in under 5 minutes on a typical laptop; seed data visible; AOC rules active.
  • Helm: install succeeds with external dependencies; roll forward and roll back with zero data loss; probes green.
  • Air-gap: bundle verifies, loads to a private registry, and installs without external network.
  • All images: signed, SBOM-attached, non-root, read-only FS, health endpoints exposed.
  • Docs: a new user can complete Quickstart without assistance; a platform team can deploy the chart with only values editing.
  • Conseiller/Excitator: capability endpoint confirms merge=false; tests prove aggregation-only behavior.

11) Risks & mitigations

  • Config sprawl. Centralize in /docs/install/configuration-reference.md and ship sane defaults.
  • Drift between Compose and Helm. Pin digests; generate manifests from a common values source where possible; CI diff.
  • Resource contention in Quickstart. Limit concurrency; ship low default worker counts; document overrides.
  • Air-gap surprises. Remove implicit egress; provide offline doc copies in bundle; deterministic artifact paths.
  • Security regressions. Enforce non-root/read-only in CI; signature verification gates release.
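The CI diff mentioned under drift could start as a digest comparison between the Compose file and rendered Helm manifests (for example, the output of helm template). The sketch assumes both sides reference images as name@sha256:<digest> and keys images by their last path component; only stella-* images are compared, since Quickstart bundles support services that production deployments provide externally. pinned_digests and digest_drift are hypothetical names.

```python
import re
from typing import Dict, List

_PINNED = re.compile(r"(?P<ref>[\w.\-/:]+)@(?P<digest>sha256:[0-9a-f]{64})")


def pinned_digests(text: str) -> Dict[str, str]:
    """Map image name (last path component, tag stripped) to its pinned digest."""
    found: Dict[str, str] = {}
    for m in _PINNED.finditer(text):
        name = m.group("ref").rsplit("/", 1)[-1].split(":", 1)[0]
        found[name] = m.group("digest")
    return found


def digest_drift(compose_text: str, helm_text: str, prefix: str = "stella-") -> List[str]:
    """List stella-* images whose pins differ or exist on only one side."""
    compose, helm = pinned_digests(compose_text), pinned_digests(helm_text)
    issues: List[str] = []
    for name in sorted(n for n in set(compose) | set(helm) if n.startswith(prefix)):
        if name not in compose:
            issues.append(f"{name}: pinned in Helm only")
        elif name not in helm:
            issues.append(f"{name}: pinned in Compose only")
        elif compose[name] != helm[name]:
            issues.append(f"{name}: digest differs ({compose[name]} vs {helm[name]})")
    return issues
```

A CI job would fail whenever digest_drift returns a non-empty list.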

12) Philosophy

  • First run matters. Quickstart must be boring, predictable, and immediately useful.
  • Prod isn't a flag. Helm defaults are safe; “convenience” belongs in Quickstart, not production.
  • Prove your supply chain. Signed images, SBOMs, and provenance are table stakes, not an upsell.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.