devops folders consolidate

This commit is contained in:
master
2026-01-25 23:27:41 +02:00
parent 6e687b523a
commit a50bbb38ef
334 changed files with 35079 additions and 5569 deletions

View File

@@ -1,150 +1,459 @@
# StellaOps Compose Profiles
# Stella Ops Docker Compose Profiles
These Compose bundles ship the minimum services required to exercise the scanner pipeline plus control-plane dependencies. Every profile is pinned to immutable image digests sourced from `deploy/releases/*.yaml` and is linted via `docker compose config` in CI.
Consolidated Docker Compose configuration for the StellaOps platform. All profiles use immutable image digests from `deploy/releases/*.yaml` and are validated via `docker compose config` in CI.
## Layout
## Quick Reference
| I want to... | Command |
|--------------|---------|
| Run the full platform | `docker compose -f docker-compose.stella-ops.yml up -d` |
| Add observability | `docker compose -f docker-compose.stella-ops.yml -f docker-compose.telemetry.yml up -d` |
| Run CI/testing infrastructure | `docker compose -f docker-compose.testing.yml --profile ci up -d` |
| Deploy with China compliance | See [China Compliance](#china-compliance-sm2sm3sm4) |
| Deploy with Russia compliance | See [Russia Compliance](#russia-compliance-gost) |
| Deploy with EU compliance | See [EU Compliance](#eu-compliance-eidas) |
---
## File Structure
### Core Stack Files
| File | Purpose |
|------|---------|
| `docker-compose.stella-ops.yml` | **Main stack**: PostgreSQL 18.1, Valkey 9.0.1, RustFS, Rekor v2, all StellaOps services |
| `docker-compose.telemetry.yml` | **Observability**: OpenTelemetry collector, Prometheus, Tempo, Loki |
| `docker-compose.testing.yml` | **CI/Testing**: Test databases, mock services, Gitea for integration tests |
| `docker-compose.dev.yml` | **Minimal dev infrastructure**: PostgreSQL, Valkey, RustFS only |
### Specialized Infrastructure
| File | Purpose |
|------|---------|
| `docker-compose.bsim.yml` | **BSim analysis**: PostgreSQL for Ghidra binary similarity corpus |
| `docker-compose.corpus.yml` | **Function corpus**: PostgreSQL for function behavior database |
| `docker-compose.sealed-ci.yml` | **Air-gapped CI**: Sealed testing environment with authority, signer, attestor |
| `docker-compose.telemetry-offline.yml` | **Offline observability**: Air-gapped Loki, Promtail, OTEL collector, Tempo, Prometheus |
### Regional Compliance Overlays
| File | Purpose | Jurisdiction |
|------|---------|--------------|
| `docker-compose.compliance-china.yml` | SM2/SM3/SM4 ShangMi crypto configuration | China (OSCCA) |
| `docker-compose.compliance-russia.yml` | GOST R 34.10-2012 crypto configuration | Russia (FSB) |
| `docker-compose.compliance-eu.yml` | eIDAS qualified trust services configuration | EU |
### Crypto Provider Overlays
| File | Purpose | Use Case |
|------|---------|----------|
| `docker-compose.crypto-sim.yml` | Universal crypto simulation | Testing without licensed crypto |
| `docker-compose.cryptopro.yml` | CryptoPro CSP (real GOST) | Production Russia deployments |
| `docker-compose.sm-remote.yml` | SM Remote service (real SM2) | Production China deployments |
### Additional Overlays
| File | Purpose | Use Case |
|------|---------|----------|
| `docker-compose.gpu.yaml` | NVIDIA GPU acceleration | Advisory AI inference with GPU |
| `docker-compose.cas.yaml` | Content Addressable Storage | Dedicated CAS with retention policies |
| `docker-compose.tile-proxy.yml` | Rekor tile caching proxy | Air-gapped Sigstore deployments |
### Supporting Files
| Path | Purpose |
| ---- | ------- |
| `docker-compose.dev.yaml` | Edge/nightly stack tuned for laptops and iterative work. |
| `docker-compose.stage.yaml` | Stable channel stack mirroring pre-production clusters. |
| `docker-compose.prod.yaml` | Production cutover stack with front-door network hand-off and Notify events enabled. |
| `docker-compose.airgap.yaml` | Stable stack with air-gapped defaults (no outbound hostnames). |
| `docker-compose.mirror.yaml` | Managed mirror topology for `*.stella-ops.org` distribution (Concelier + Excititor + CDN gateway). |
| `docker-compose.rekor-v2.yaml` | Rekor v2 tiles overlay (MySQL-free) for bundled transparency logs. |
| `docker-compose.telemetry.yaml` | Optional OpenTelemetry collector overlay (mutual TLS, OTLP ingest endpoints). |
| `docker-compose.telemetry-storage.yaml` | Prometheus/Tempo/Loki storage overlay with multi-tenant defaults. |
| `docker-compose.gpu.yaml` | Optional GPU overlay enabling NVIDIA devices for Advisory AI web/worker. Apply with `-f docker-compose.<env>.yaml -f docker-compose.gpu.yaml`. |
| `env/*.env.example` | Seed `.env` files that document required secrets and ports per profile. |
| `scripts/backup.sh` | Pauses workers and creates tar.gz of Mongo/MinIO/Valkey volumes (deterministic snapshot). |
| `scripts/reset.sh` | Stops the stack and removes Mongo/MinIO/Valkey volumes after explicit confirmation. |
| `scripts/quickstart.sh` | Helper to validate config and start dev stack; set `USE_MOCK=1` to include `docker-compose.mock.yaml` overlay. |
| `docker-compose.mock.yaml` | Dev-only overlay with placeholder digests for missing services (orchestrator, policy-registry, packs, task-runner, VEX/Vuln stack). Use only with mock release manifest `deploy/releases/2025.09-mock-dev.yaml`. |
|------|---------|
| `env/*.env.example` | Environment variable templates per profile |
| `scripts/backup.sh` | Create deterministic volume snapshots |
| `scripts/reset.sh` | Stop stack and remove volumes (with confirmation) |
## Usage
---
## Usage Patterns
### Basic Development
```bash
cp env/dev.env.example dev.env
docker compose --env-file dev.env -f docker-compose.dev.yaml config
docker compose --env-file dev.env -f docker-compose.dev.yaml up -d
# Copy environment template
cp env/stellaops.env.example .env
# Validate configuration
docker compose -f docker-compose.stella-ops.yml config
# Start the platform
docker compose -f docker-compose.stella-ops.yml up -d
# View logs
docker compose -f docker-compose.stella-ops.yml logs -f scanner-web
```
The stage and airgap variants behave the same way—swap the file names accordingly. All profiles expose 443/8443 for the UI and REST APIs, and they share a `stellaops` Docker network scoped to the compose project.
### Rekor v2 overlay (tiles)
Use the overlay below and set the Rekor env vars in your `.env` file (see
`env/dev.env.example`):
```bash
docker compose --env-file dev.env \
-f docker-compose.dev.yaml \
-f docker-compose.rekor-v2.yaml \
--profile sigstore up -d
```
> **Surface.Secrets:** set `SCANNER_SURFACE_SECRETS_PROVIDER`/`SCANNER_SURFACE_SECRETS_ROOT` in your `.env` and point `SURFACE_SECRETS_HOST_PATH` to the decrypted bundle path (default `./offline/surface-secrets`). The stack mounts that path read-only into Scanner Web/Worker so `secret://` references resolve without embedding plaintext.
> **Graph Explorer reminder:** If you enable Cartographer or Graph API containers alongside these profiles, update `etc/authority.yaml` so the `cartographer-service` client is marked with `properties.serviceIdentity: "cartographer"` and carries a tenant hint. The Authority host now refuses `graph:write` tokens without that marker, so apply the configuration change before rolling out the updated images.
### Telemetry collector overlay
The OpenTelemetry collector overlay is optional and can be layered on top of any profile:
### With Observability
```bash
# Generate TLS certificates for telemetry
./ops/devops/telemetry/generate_dev_tls.sh
docker compose -f docker-compose.telemetry.yaml up -d
python ../../ops/devops/telemetry/smoke_otel_collector.py --host localhost
docker compose -f docker-compose.telemetry-storage.yaml up -d
# Start platform with telemetry
docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.telemetry.yml up -d
```
The generator script creates a development CA plus server/client certificates under
`deploy/telemetry/certs/`. The smoke test sends OTLP/HTTP payloads using the generated
client certificate and asserts the collector reports accepted traces, metrics, and logs.
The storage overlay starts Prometheus, Tempo, and Loki with multitenancy enabled so you
can validate the end-to-end pipeline before promoting changes to staging. Adjust the
configs in `deploy/telemetry/storage/` before running in production.
Mount the same certificates when running workloads so the collector can enforce mutual TLS.
For production cutovers copy `env/prod.env.example` to `prod.env`, update the secret placeholders, and create the external network expected by the profile:
### CI/Testing Infrastructure
```bash
# Start CI infrastructure only (different ports to avoid conflicts)
docker compose -f docker-compose.testing.yml --profile ci up -d
# Start mock services for integration testing
docker compose -f docker-compose.testing.yml --profile mock up -d
# Start Gitea for SCM integration tests
docker compose -f docker-compose.testing.yml --profile gitea up -d
# Start everything
docker compose -f docker-compose.testing.yml --profile all up -d
```
**Test Infrastructure Ports:**
| Service | Port | Purpose |
|---------|------|---------|
| postgres-test | 5433 | PostgreSQL 18 for tests |
| valkey-test | 6380 | Valkey for cache/queue tests |
| rustfs-test | 8180 | S3-compatible storage |
| mock-registry | 5001 | Container registry mock |
| gitea | 3000 | Git hosting for SCM tests |
---
## Regional Compliance Deployments
### China Compliance (SM2/SM3/SM4)
**For Testing (simulation):**
```bash
docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.compliance-china.yml \
-f docker-compose.crypto-sim.yml up -d
```
**For Production (real SM crypto):**
```bash
docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.compliance-china.yml \
-f docker-compose.sm-remote.yml up -d
```
**With OSCCA-certified HSM:**
```bash
# Set HSM connection details in environment
export SM_REMOTE_HSM_URL="https://sm-hsm.example.com:8900"
export SM_REMOTE_HSM_API_KEY="your-api-key"
docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.compliance-china.yml \
-f docker-compose.sm-remote.yml up -d
```
**Algorithms:**
- SM2: Public key cryptography (GM/T 0003-2012)
- SM3: Hash function, 256-bit (GM/T 0004-2012)
- SM4: Block cipher, 128-bit (GM/T 0002-2012)
---
### Russia Compliance (GOST)
**For Testing (simulation):**
```bash
docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.compliance-russia.yml \
-f docker-compose.crypto-sim.yml up -d
```
**For Production (CryptoPro CSP):**
```bash
# CryptoPro requires EULA acceptance
CRYPTOPRO_ACCEPT_EULA=1 docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.compliance-russia.yml \
-f docker-compose.cryptopro.yml up -d
```
**Requirements for CryptoPro:**
- CryptoPro CSP license files in `opt/cryptopro/downloads/`
- `CRYPTOPRO_ACCEPT_EULA=1` environment variable
- Valid CryptoPro container images
**Algorithms:**
- GOST R 34.10-2012: Digital signature (256/512-bit)
- GOST R 34.11-2012: Hash function (Streebog, 256/512-bit)
- GOST R 34.12-2015: Block cipher (Kuznyechik, Magma)
---
### EU Compliance (eIDAS)
**For Testing (simulation):**
```bash
docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.compliance-eu.yml \
-f docker-compose.crypto-sim.yml up -d
```
**For Production:**
EU eIDAS deployments typically integrate with external Qualified Trust Service Providers (QTSPs) rather than hosting crypto locally. Configure your QTSP integration in the application settings.
```bash
docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.compliance-eu.yml up -d
```
**Standards:**
- ETSI TS 119 312 compliant algorithms
- Qualified electronic signatures
- QTSP integration for qualified trust services
---
## Crypto Simulation Details
The `docker-compose.crypto-sim.yml` overlay provides a unified simulation service for all sovereign crypto profiles:
| Algorithm ID | Simulation | Use Case |
|--------------|------------|----------|
| `SM2`, `sm.sim` | HMAC-SHA256 | China testing |
| `GOST12-256`, `GOST12-512` | HMAC-SHA256 | Russia testing |
| `ru.magma.sim`, `ru.kuznyechik.sim` | HMAC-SHA256 | Russia testing |
| `DILITHIUM3`, `FALCON512`, `pq.sim` | HMAC-SHA256 | Post-quantum testing |
| `fips.sim`, `eidas.sim`, `kcmvp.sim` | ECDSA P-256 | FIPS/EU/Korea testing |
**Important:** Simulation is for testing only. Uses deterministic HMAC or static ECDSA keys—not suitable for production or compliance certification.
---
## Configuration Reference
### Infrastructure Services
| Service | Default Port | Purpose |
|---------|--------------|---------|
| PostgreSQL | 5432 | Primary database |
| Valkey | 6379 | Cache, queues, events |
| RustFS | 8080 | S3-compatible artifact storage |
| Rekor v2 | (internal) | Sigstore transparency log |
### Application Services
| Service | Default Port | Purpose |
|---------|--------------|---------|
| Authority | 8440 | OAuth2/OIDC identity provider |
| Signer | 8441 | Cryptographic signing |
| Attestor | 8442 | SLSA attestation |
| Scanner Web | 8444 | SBOM/vulnerability scanning API |
| Concelier | 8445 | Advisory aggregation |
| Notify Web | 8446 | Notification service |
| Issuer Directory | 8447 | CSAF publisher registry |
| Advisory AI Web | 8448 | AI-powered advisory analysis |
| Web UI | 8443 | Angular frontend |
### Environment Variables
Key variables (see `env/*.env.example` for complete list):
```bash
# Database
POSTGRES_USER=stellaops
POSTGRES_PASSWORD=<secret>
POSTGRES_DB=stellaops_platform
# Authority
AUTHORITY_ISSUER=https://authority.example.com
# Scanner
SCANNER_EVENTS_ENABLED=false
SCANNER_OFFLINEKIT_ENABLED=false
# Crypto (for compliance overlays)
STELLAOPS_CRYPTO_PROFILE=default # or: china, russia, eu
STELLAOPS_CRYPTO_ENABLE_SIM=0 # set to 1 for simulation
# CryptoPro (Russia only)
CRYPTOPRO_ACCEPT_EULA=0 # must be 1 to use CryptoPro
# SM Remote (China only)
SM_SOFT_ALLOWED=1 # software-only SM2
SM_REMOTE_HSM_URL= # optional: OSCCA-certified HSM
```
---
## Networking
All profiles use a shared `stellaops` Docker network. Production deployments can attach a `frontdoor` network for reverse proxy integration:
```bash
# Create external network for load balancer
docker network create stellaops_frontdoor
docker compose --env-file prod.env -f docker-compose.prod.yaml config
# Set in environment
export FRONTDOOR_NETWORK=stellaops_frontdoor
```
### Scanner event stream settings
Only externally-reachable services (Authority, Signer, Attestor, Concelier, Scanner Web, Notify Web, UI) attach to the frontdoor network. Infrastructure services (PostgreSQL, Valkey, RustFS) remain on the private network.
Scanner WebService can emit signed `scanner.report.*` events to Redis Streams when `SCANNER__EVENTS__ENABLED=true`. Each profile ships environment placeholders you can override in the `.env` file:
---
- `SCANNER_EVENTS_ENABLED` toggle emission on/off (defaults to `false`).
- `SCANNER_EVENTS_DRIVER` currently only `redis` is supported.
- `SCANNER_EVENTS_DSN` Redis endpoint; leave blank to reuse the queue DSN when it uses `redis://`.
- `SCANNER_EVENTS_STREAM` stream name (`stella.events` by default).
- `SCANNER_EVENTS_PUBLISH_TIMEOUT_SECONDS` per-publish timeout window (defaults to `5`).
- `SCANNER_EVENTS_MAX_STREAM_LENGTH` max stream length before Redis trims entries (defaults to `10000`).
## Sigstore Tools
Helm values mirror the same knobs under each services `env` map (see `deploy/helm/stellaops/values-*.yaml`).
### Scheduler worker configuration
Every Compose profile now provisions the `scheduler-worker` container (backed by the
`StellaOps.Scheduler.Worker.Host` entrypoint). The environment placeholders exposed
in the `.env` samples match the options bound by `AddSchedulerWorker`:
- `SCHEDULER_QUEUE_KIND` queue transport (`Nats` or `Redis`).
- `SCHEDULER_QUEUE_NATS_URL` NATS connection string used by planner/runner consumers.
- `SCHEDULER_STORAGE_DATABASE` PostgreSQL database name for scheduler state.
- `SCHEDULER_SCANNER_BASEADDRESS` base URL the runner uses when invoking Scanners
`/api/v1/reports` (defaults to the in-cluster `http://scanner-web:8444`).
Helm deployments inherit the same defaults from `services.scheduler-worker.env` in
`values.yaml`; override them per environment as needed.
### Advisory AI configuration
`advisory-ai-web` hosts the API/plan cache while `advisory-ai-worker` executes queued tasks. Both containers mount the shared volumes (`advisory-ai-queue`, `advisory-ai-plans`, `advisory-ai-outputs`) so they always read/write the same deterministic state. New environment knobs:
- `ADVISORY_AI_SBOM_BASEADDRESS` endpoint the SBOM context client hits (defaults to the in-cluster Scanner URL).
- `ADVISORY_AI_INFERENCE_MODE` `Local` (default) keeps inference on-prem; `Remote` posts sanitized prompts to the URL supplied via `ADVISORY_AI_REMOTE_BASEADDRESS`. Optional `ADVISORY_AI_REMOTE_APIKEY` carries the bearer token when remote inference is enabled.
- `ADVISORY_AI_WEB_PORT` host port for `advisory-ai-web`.
The Helm chart mirrors these settings under `services.advisory-ai-web` / `advisory-ai-worker` and expects a PVC named `stellaops-advisory-ai-data` so both deployments can mount the same RWX volume.
### Front-door network hand-off
`docker-compose.prod.yaml` adds a `frontdoor` network so operators can attach Traefik, Envoy, or an on-prem load balancer that terminates TLS. Override `FRONTDOOR_NETWORK` in `prod.env` if your reverse proxy uses a different bridge name. Attach only the externally reachable services (Authority, Signer, Attestor, Concelier, Scanner Web, Notify Web, UI) to that network—internal infrastructure (Mongo, MinIO, RustFS, NATS) stays on the private `stellaops` network.
### Updating to a new release
1. Import the new manifest into `deploy/releases/` (see `deploy/README.md`).
2. Update image digests in the relevant Compose file(s).
3. Re-run `docker compose config` to confirm the bundle is deterministic.
### Mock overlay for missing digests (dev only)
Until official digests land, you can exercise Compose packaging with mock placeholders:
Enable Sigstore CLI tools (rekor-cli, cosign) with the `sigstore` profile:
```bash
# assumes docker-compose.dev.yaml as the base profile
USE_MOCK=1 ./scripts/quickstart.sh env/dev.env.example
docker compose -f docker-compose.stella-ops.yml --profile sigstore up -d
```
The overlay pins the missing services (orchestrator, policy-registry, packs-registry, task-runner, VEX/Vuln stack) to mock digests from `deploy/releases/2025.09-mock-dev.yaml` and starts their real entrypoints so integration flows can be exercised end-to-end. Replace the mock pins with production digests once releases publish; keep the mock overlay dev-only.
---
Keep digests synchronized between Compose, Helm, and the release manifest to preserve reproducibility guarantees. `deploy/tools/validate-profiles.sh` performs a quick audit.
## GPU Support for Advisory AI
### GPU toggle for Advisory AI
GPU is disabled by default. To run inference on NVIDIA GPUs:
GPU is disabled by default. To enable NVIDIA GPU inference:
```bash
docker compose \
--env-file prod.env \
-f docker-compose.prod.yaml \
-f docker-compose.gpu.yaml \
up -d
docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.gpu.yaml up -d
```
The GPU overlay requests one GPU for `advisory-ai-worker` and `advisory-ai-web` and sets `ADVISORY_AI_INFERENCE_GPU=true`. Ensure the host has the NVIDIA container runtime and that the base compose file still sets the correct digests.
**Requirements:**
- NVIDIA GPU with CUDA support
- nvidia-container-toolkit installed
- Docker configured with nvidia runtime
---
## Content Addressable Storage (CAS)
The CAS overlay provides dedicated RustFS instances with retention policies for different artifact types:
```bash
# Standalone CAS infrastructure
docker compose -f docker-compose.cas.yaml up -d
# Combined with main stack
docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.cas.yaml up -d
```
**CAS Services:**
| Service | Port | Purpose |
|---------|------|---------|
| rustfs-cas | 8180 | Runtime facts, signals, replay artifacts |
| rustfs-evidence | 8181 | Merkle roots, hash chains, evidence bundles (immutable) |
| rustfs-attestation | 8182 | DSSE envelopes, in-toto attestations (immutable) |
**Retention Policies (configurable via `env/cas.env.example`):**
- Vulnerability DB: 7 days
- SBOM artifacts: 365 days
- Scan results: 90 days
- Evidence bundles: Indefinite (immutable)
- Attestations: Indefinite (immutable)
---
## Tile Proxy (Air-Gapped Sigstore)
For air-gapped deployments, the tile-proxy caches Rekor transparency log tiles locally from public Sigstore:
```bash
docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.tile-proxy.yml up -d
```
**Tile Proxy vs Rekor v2:**
- Use `--profile sigstore` when running your own Rekor transparency log locally
- Use `docker-compose.tile-proxy.yml` when caching tiles from public Sigstore (rekor.sigstore.dev)
**Configuration:**
| Variable | Default | Purpose |
|----------|---------|---------|
| `REKOR_SERVER_URL` | `https://rekor.sigstore.dev` | Upstream Rekor to proxy |
| `TILE_PROXY_SYNC_ENABLED` | `true` | Enable periodic tile sync |
| `TILE_PROXY_SYNC_SCHEDULE` | `0 */6 * * *` | Sync every 6 hours |
| `TILE_PROXY_CACHE_MAX_SIZE_GB` | `10` | Local cache size limit |
The proxy syncs tiles on schedule and serves them to internal services for offline verification.
---
## Maintenance
### Backup
```bash
./scripts/backup.sh # Creates timestamped tar.gz of volumes
```
### Reset
```bash
./scripts/reset.sh # Stops stack, removes volumes (requires confirmation)
```
### Validate Configuration
```bash
docker compose -f docker-compose.stella-ops.yml config
```
### Update to New Release
1. Import new manifest to `deploy/releases/`
2. Update image digests in compose files
3. Run `docker compose config` to validate
4. Run `deploy/tools/validate-profiles.sh` for audit
---
## Troubleshooting
### Port Conflicts
Override ports in your `.env` file:
```bash
POSTGRES_PORT=5433
VALKEY_PORT=6380
SCANNER_WEB_PORT=8544
```
### Service Dependencies
Services declare `depends_on` with health checks. If a service fails to start, check its dependencies:
```bash
docker compose -f docker-compose.stella-ops.yml ps
docker compose -f docker-compose.stella-ops.yml logs postgres
docker compose -f docker-compose.stella-ops.yml logs valkey
```
### Crypto Provider Issues
For crypto simulation issues:
```bash
# Check sim-crypto service
docker compose logs sim-crypto
curl http://localhost:18090/keys
```
For CryptoPro issues:
```bash
# Verify EULA acceptance
echo $CRYPTOPRO_ACCEPT_EULA # must be 1
# Check CryptoPro service
docker compose logs cryptopro-csp
```
---
## Related Documentation
- [Deployment Upgrade Runbook](../../docs/operations/devops/runbooks/deployment-upgrade.md)
- [Local CI Guide](../../docs/technical/testing/LOCAL_CI_GUIDE.md)
- [Crypto Profile Configuration](../../docs/security/crypto-profile-configuration.md)
- [Regional Deployments](../../docs/operations/regional-deployments.md)