# Stella Ops Compose Profiles
These Compose bundles ship the minimum services required to exercise the scanner pipeline plus control-plane dependencies. Every profile is pinned to immutable image digests sourced from `deploy/releases/*.yaml` and is linted via `docker compose config` in CI.
## Layout
| Path | Purpose |
| ---- | ------- |
| `docker-compose.dev.yaml` | Edge/nightly stack tuned for laptops and iterative work. |
| `docker-compose.stage.yaml` | Stable channel stack mirroring pre-production clusters. |
| `docker-compose.prod.yaml` | Production cutover stack with front-door network hand-off and Notify events enabled. |
| `docker-compose.airgap.yaml` | Stable stack with air-gapped defaults (no outbound hostnames). |
| `docker-compose.mirror.yaml` | Managed mirror topology for `*.stella-ops.org` distribution (Concelier + Excititor + CDN gateway). |
| `docker-compose.telemetry.yaml` | Optional OpenTelemetry collector overlay (mutual TLS, OTLP ingest endpoints). |
| `docker-compose.telemetry-storage.yaml` | Prometheus/Tempo/Loki storage overlay with multi-tenant defaults. |
| `docker-compose.gpu.yaml` | Optional GPU overlay enabling NVIDIA devices for Advisory AI web/worker. Apply with `-f docker-compose.<env>.yaml -f docker-compose.gpu.yaml`. |
| `env/*.env.example` | Seed `.env` files that document required secrets and ports per profile. |
| `scripts/backup.sh` | Pauses workers and creates a tar.gz archive of the Mongo/MinIO/Redis volumes (deterministic snapshot). |
| `scripts/reset.sh` | Stops the stack and removes the Mongo/MinIO/Redis volumes after explicit confirmation. |
| `scripts/quickstart.sh` | Helper that validates config and starts the dev stack; set `USE_MOCK=1` to include the `docker-compose.mock.yaml` overlay. |
| `docker-compose.mock.yaml` | Dev-only overlay with placeholder digests for missing services (orchestrator, policy-registry, packs, task-runner, VEX/Vuln stack). Use only with the mock release manifest `deploy/releases/2025.09-mock-dev.yaml`. |
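
The two volume helpers are typically run straight from this directory; a quick sketch (both scripts are described in the table above, and neither takes flags documented here):

```bash
# sketch: volume lifecycle helpers (run from the compose bundle directory)
./scripts/backup.sh   # pauses workers, snapshots Mongo/MinIO/Redis volumes to tar.gz
./scripts/reset.sh    # stops the stack and removes the same volumes after confirmation
```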
## Usage
```bash
cp env/dev.env.example dev.env
docker compose --env-file dev.env -f docker-compose.dev.yaml config
docker compose --env-file dev.env -f docker-compose.dev.yaml up -d
```
The stage and airgap variants behave the same way; swap the file names accordingly. All profiles expose 443/8443 for the UI and REST APIs, and they share a `stellaops` Docker network scoped to the compose project.
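
For example, a minimal stage bring-up follows the dev flow above with the file names swapped (assumes an `env/stage.env.example` seed per the `env/*.env.example` convention):

```bash
# sketch: stage profile bring-up; env file name follows the documented pattern
cp env/stage.env.example stage.env
docker compose --env-file stage.env -f docker-compose.stage.yaml up -d
```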
> **Graph Explorer reminder:** If you enable Cartographer or Graph API containers alongside these profiles, update `etc/authority.yaml` so the `cartographer-service` client is marked with `properties.serviceIdentity: "cartographer"` and carries a tenant hint. The Authority host now refuses `graph:write` tokens without that marker, so apply the configuration change before rolling out the updated images.
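
A quick pre-rollout check, sketched with `grep` (the exact YAML layout of `etc/authority.yaml` is not shown in this document, so treat the expected output as an assumption):

```bash
# sketch: confirm the cartographer client entry carries the required marker
grep -n -A4 'cartographer-service' etc/authority.yaml
# expect properties.serviceIdentity: "cartographer" plus a tenant hint nearby
```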
### Telemetry collector overlay
The OpenTelemetry collector overlay is optional and can be layered on top of any profile:
```bash
./ops/devops/telemetry/generate_dev_tls.sh
docker compose -f docker-compose.telemetry.yaml up -d
python ../../ops/devops/telemetry/smoke_otel_collector.py --host localhost
docker compose -f docker-compose.telemetry-storage.yaml up -d
```
The generator script creates a development CA plus server/client certificates under `deploy/telemetry/certs/`. The smoke test sends OTLP/HTTP payloads using the generated client certificate and asserts the collector reports accepted traces, metrics, and logs.

The storage overlay starts Prometheus, Tempo, and Loki with multitenancy enabled so you can validate the end-to-end pipeline before promoting changes to staging. Adjust the configs in `deploy/telemetry/storage/` before running in production. Mount the same certificates when running workloads so the collector can enforce mutual TLS.
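
As a sketch of that mutual-TLS hand-off (image name, endpoint, and certificate file names are assumptions; the env vars are the standard OTLP exporter settings):

```bash
# sketch: reuse the generated dev certificates from a workload container
docker run --rm \
  -v "$PWD/deploy/telemetry/certs:/etc/otel/certs:ro" \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=https://localhost:4318 \
  -e OTEL_EXPORTER_OTLP_CERTIFICATE=/etc/otel/certs/ca.crt \
  -e OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE=/etc/otel/certs/client.crt \
  -e OTEL_EXPORTER_OTLP_CLIENT_KEY=/etc/otel/certs/client.key \
  my-workload:latest
```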
For production cutovers, copy `env/prod.env.example` to `prod.env`, update the secret placeholders, and create the external network expected by the profile:
```bash
docker network create stellaops_frontdoor
docker compose --env-file prod.env -f docker-compose.prod.yaml config
```
### Scanner event stream settings
Scanner WebService can emit signed `scanner.report.*` events to Redis Streams when `SCANNER__EVENTS__ENABLED=true`. Each profile ships environment placeholders you can override in the `.env` file; a sample excerpt appears below:

- `SCANNER_EVENTS_ENABLED` – toggles emission on/off (defaults to `false`).
- `SCANNER_EVENTS_DRIVER` – currently only `redis` is supported.
- `SCANNER_EVENTS_DSN` – Redis endpoint; leave blank to reuse the queue DSN when it uses `redis://`.
- `SCANNER_EVENTS_STREAM` – stream name (`stella.events` by default).
- `SCANNER_EVENTS_PUBLISH_TIMEOUT_SECONDS` – per-publish timeout window (defaults to `5`).
- `SCANNER_EVENTS_MAX_STREAM_LENGTH` – maximum stream length before Redis trims entries (defaults to `10000`).

Helm values mirror the same knobs under each service’s `env` map (see `deploy/helm/stellaops/values-*.yaml`).
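
A sample `.env` excerpt, as referenced above (the Redis DSN hostname is an assumption; the other values restate the documented defaults):

```bash
# hypothetical dev.env excerpt enabling signed event emission
SCANNER_EVENTS_ENABLED=true
SCANNER_EVENTS_DRIVER=redis
SCANNER_EVENTS_DSN=redis://redis:6379
SCANNER_EVENTS_STREAM=stella.events
SCANNER_EVENTS_PUBLISH_TIMEOUT_SECONDS=5
SCANNER_EVENTS_MAX_STREAM_LENGTH=10000
```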
### Scheduler worker configuration
Every Compose profile now provisions the `scheduler-worker` container (backed by the `StellaOps.Scheduler.Worker.Host` entrypoint). The environment placeholders exposed in the `.env` samples match the options bound by `AddSchedulerWorker`:

- `SCHEDULER_QUEUE_KIND` – queue transport (`Nats` or `Redis`).
- `SCHEDULER_QUEUE_NATS_URL` – NATS connection string used by planner/runner consumers.
- `SCHEDULER_STORAGE_DATABASE` – MongoDB database name for scheduler state.
- `SCHEDULER_SCANNER_BASEADDRESS` – base URL the runner uses when invoking Scanner’s `/api/v1/reports` (defaults to the in-cluster `http://scanner-web:8444`).

Helm deployments inherit the same defaults from `services.scheduler-worker.env` in `values.yaml`; override them per environment as needed.
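
A sample `.env` excerpt for the worker (the NATS URL and database name are illustrative; the Scanner address restates the documented in-cluster default):

```bash
# hypothetical .env excerpt for scheduler-worker
SCHEDULER_QUEUE_KIND=Nats
SCHEDULER_QUEUE_NATS_URL=nats://nats:4222
SCHEDULER_STORAGE_DATABASE=scheduler
SCHEDULER_SCANNER_BASEADDRESS=http://scanner-web:8444
```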
### Advisory AI configuration
`advisory-ai-web` hosts the API/plan cache while `advisory-ai-worker` executes queued tasks. Both containers mount the shared volumes (`advisory-ai-queue`, `advisory-ai-plans`, `advisory-ai-outputs`) so they always read/write the same deterministic state. New environment knobs (a sample excerpt appears below):

- `ADVISORY_AI_SBOM_BASEADDRESS` – endpoint the SBOM context client hits (defaults to the in-cluster Scanner URL).
- `ADVISORY_AI_INFERENCE_MODE` – `Local` (default) keeps inference on-prem; `Remote` posts sanitized prompts to the URL supplied via `ADVISORY_AI_REMOTE_BASEADDRESS`. Optional `ADVISORY_AI_REMOTE_APIKEY` carries the bearer token when remote inference is enabled.
- `ADVISORY_AI_WEB_PORT` – host port for `advisory-ai-web`.

The Helm chart mirrors these settings under `services.advisory-ai-web` / `advisory-ai-worker` and expects a PVC named `stellaops-advisory-ai-data` so both deployments can mount the same RWX volume.
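
A sample `.env` excerpt switching to remote inference (URL, token, and port are placeholders, not defaults from this repository):

```bash
# hypothetical .env excerpt; sanitized prompts leave the host in Remote mode
ADVISORY_AI_INFERENCE_MODE=Remote
ADVISORY_AI_REMOTE_BASEADDRESS=https://inference.internal.example
ADVISORY_AI_REMOTE_APIKEY=change-me
ADVISORY_AI_WEB_PORT=8447
```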
### Front-door network hand-off
`docker-compose.prod.yaml` adds a `frontdoor` network so operators can attach Traefik, Envoy, or an on-prem load balancer that terminates TLS. Override `FRONTDOOR_NETWORK` in `prod.env` if your reverse proxy uses a different bridge name. Attach only the externally reachable services (Authority, Signer, Attestor, Concelier, Scanner Web, Notify Web, UI) to that network; internal infrastructure (Mongo, MinIO, RustFS, NATS) stays on the private `stellaops` network.
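
A sketch of pointing the profile at an existing proxy bridge (the `traefik_proxy` name is an assumption):

```bash
# sketch: reuse a reverse proxy's bridge instead of the default front-door network
docker network create traefik_proxy   # skip if the proxy already created it
echo 'FRONTDOOR_NETWORK=traefik_proxy' >> prod.env
docker compose --env-file prod.env -f docker-compose.prod.yaml config >/dev/null
```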
### Updating to a new release
1. Import the new manifest into `deploy/releases/` (see `deploy/README.md`).
2. Update image digests in the relevant Compose file(s).
3. Re-run `docker compose config` to confirm the bundle is deterministic.
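
For example, re-validating the stage bundle after a digest bump (a sketch; pick the env file matching the profile you edited):

```bash
# sketch: step 3 for the stage profile
docker compose --env-file stage.env -f docker-compose.stage.yaml config >/dev/null \
  && echo "stage bundle renders deterministically"
```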
### Mock overlay for missing digests (dev only)
Until official digests land, you can exercise Compose packaging with mock placeholders:
```bash
# assumes docker-compose.dev.yaml as the base profile
USE_MOCK=1 ./scripts/quickstart.sh env/dev.env.example
```
The overlay pins the missing services (orchestrator, policy-registry, packs-registry, task-runner, VEX/Vuln stack) to mock digests from `deploy/releases/2025.09-mock-dev.yaml` and starts their real entrypoints so integration flows can be exercised end-to-end. Replace the mock pins with production digests once releases publish; keep the mock overlay dev-only.

Keep digests synchronized between Compose, Helm, and the release manifest to preserve reproducibility guarantees. `deploy/tools/validate-profiles.sh` performs a quick audit.
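
The audit is a plain script invocation from the repository root (a sketch; this README documents no flags for it):

```bash
# sketch: cross-check digests across Compose, Helm, and the release manifest
./deploy/tools/validate-profiles.sh
```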
### GPU toggle for Advisory AI
GPU is disabled by default. To run inference on NVIDIA GPUs:
```bash
docker compose \
  --env-file prod.env \
  -f docker-compose.prod.yaml \
  -f docker-compose.gpu.yaml \
  up -d
```
The GPU overlay requests one GPU for `advisory-ai-worker` and `advisory-ai-web` and sets `ADVISORY_AI_INFERENCE_GPU=true`. Ensure the host has the NVIDIA container runtime and that the base compose file still sets the correct digests.
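
To confirm the device request actually took effect, one option is to run `nvidia-smi` inside the worker (assumes the image ships the tool):

```bash
# sketch: verify the worker sees the GPU after applying the overlay
docker compose --env-file prod.env \
  -f docker-compose.prod.yaml -f docker-compose.gpu.yaml \
  exec advisory-ai-worker nvidia-smi
```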