Add Policy DSL Validator, Schema Exporter, and Simulation Smoke tools

- Implemented PolicyDslValidator with command-line options for strict mode and JSON output.
- Created PolicySchemaExporter to generate JSON schemas for policy-related models.
- Developed PolicySimulationSmoke tool to validate policy simulations against expected outcomes.
- Added project files and necessary dependencies for each tool.
- Ensured proper error handling and usage instructions across tools.
This commit is contained in: master
2025-10-27 08:00:11 +02:00
parent 2b7b88ca77
commit 799f787de2
712 changed files with 49449 additions and 6124 deletions

@@ -5,20 +5,34 @@ This directory contains deterministic deployment bundles for the core Stella Ops
## Structure
- `releases/`: canonical release manifests (edge, stable, airgap) used to source image digests.
- `compose/`: Docker Compose bundles for dev/stage/airgap targets plus `.env` seed files.
- `compose/docker-compose.mirror.yaml`: managed mirror bundle for `*.stella-ops.org` with gateway cache and multi-tenant auth.
- `compose/docker-compose.telemetry.yaml`: optional OpenTelemetry collector overlay (mutual TLS, OTLP pipelines).
- `compose/docker-compose.telemetry-storage.yaml`: optional Prometheus/Tempo/Loki stack for observability backends.
- `helm/stellaops/`: multi-profile Helm chart with values files for dev/stage/airgap.
- `telemetry/`: shared OpenTelemetry collector configuration and certificate artefacts (generated via tooling).
- `tools/validate-profiles.sh`: helper that runs `docker compose config` and `helm lint/template` for every profile.
## Workflow
1. Update or add a release manifest under `releases/` with the new digests.
2. Mirror the digests into the Compose and Helm profiles that correspond to that channel.
3. Run `deploy/tools/validate-profiles.sh` (requires Docker CLI and Helm) to ensure the bundles lint and template cleanly.
4. If telemetry ingest is required for the release, generate development certificates using
`./ops/devops/telemetry/generate_dev_tls.sh` and run the collector smoke test with
`python ./ops/devops/telemetry/smoke_otel_collector.py` to verify the OTLP endpoints.
5. Commit the change alongside any documentation updates (e.g. install guide cross-links).
Maintaining the digest linkage keeps offline/air-gapped installs reproducible and avoids tag drift between environments.
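As a rough illustration of what "tag drift" looks like in practice, the sketch below scans Compose text for `image:` entries that are not pinned to a `sha256` digest. It is not part of the repository's tooling; the function name `find_unpinned_images` and the sample registry names are hypothetical, and the regular expressions assume one image reference per `image:` line.

```python
import re

# Matches lines such as `image: registry.example/app@sha256:<digest>`;
# the value may optionally be wrapped in quotes.
IMAGE_LINE = re.compile(r"^\s*image:\s*[\"']?(?P<ref>[^\s\"']+)", re.MULTILINE)
# A digest-pinned reference ends in `@sha256:` followed by 64 hex characters.
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def find_unpinned_images(compose_text: str) -> list[str]:
    """Return image references that are not pinned to a sha256 digest."""
    refs = [match.group("ref") for match in IMAGE_LINE.finditer(compose_text)]
    return [ref for ref in refs if not DIGEST.search(ref)]


if __name__ == "__main__":
    placeholder_digest = "a" * 64  # stand-in digest, not a real image
    sample = (
        "services:\n"
        "  db:\n"
        f"    image: registry.example/db@sha256:{placeholder_digest}\n"
        "  cache:\n"
        "    image: registry.example/cache:1.2.3\n"
    )
    for ref in find_unpinned_images(sample):
        print(f"not digest-pinned: {ref}")
```

A check like this only flags tag-based references; it does not confirm that a pinned digest matches the release manifest.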
### Additional tooling
- `deploy/tools/check-channel-alignment.py` verifies that Helm/Compose profiles reference the exact images listed in a release manifest. Run it for each channel before promoting a release.
- `ops/devops/telemetry/generate_dev_tls.sh` produces local CA/server/client certificates for Compose-based collector testing.
- `ops/devops/telemetry/smoke_otel_collector.py` sends OTLP traffic and asserts the collector accepted traces, metrics, and logs.
- `ops/devops/telemetry/package_offline_bundle.py` packages telemetry assets (config/Helm/Compose) into a signed tarball for air-gapped installs.
- `docs/ops/deployment-upgrade-runbook.md`: end-to-end instructions for upgrade, rollback, and channel promotion workflows (Helm + Compose).
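The channel-alignment idea described above can be sketched as a set comparison. This is only an assumption about the general approach, not the logic of `check-channel-alignment.py`; the function name `misaligned_images` is hypothetical, and the caller is assumed to have already extracted image references from the manifest and the profile.

```python
def misaligned_images(manifest_images: set[str], profile_images: set[str]) -> dict[str, list[str]]:
    """Compare image references from a release manifest with those used by a profile.

    Both inputs are plain sets of image reference strings; how they are
    extracted from the manifest or profile files is left to the caller.
    """
    return {
        # Images the profile uses that the manifest does not list.
        "missing_from_manifest": sorted(profile_images - manifest_images),
        # Images the manifest lists that the profile never references.
        "unused_in_profile": sorted(manifest_images - profile_images),
    }


if __name__ == "__main__":
    manifest = {"registry.example/a@sha256:111", "registry.example/b@sha256:222"}
    profile = {"registry.example/a@sha256:111", "registry.example/b@sha256:999"}
    for kind, refs in misaligned_images(manifest, profile).items():
        for ref in refs:
            print(f"{kind}: {ref}")
```

Whether an image that is listed in the manifest but unused by a profile should count as an error depends on the channel; the sketch reports both directions and leaves that decision to the caller.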
## CI smoke checks
The `.gitea/workflows/build-test-deploy.yml` pipeline includes a `notify-smoke` stage that validates scanner event propagation after staging deployments. Configure the following repository secrets (or environment-level secrets) so the job can connect to Redis and the Notify API:

@@ -7,21 +7,52 @@ These Compose bundles ship the minimum services required to exercise the scanner
| Path | Purpose |
| ---- | ------- |
| `docker-compose.dev.yaml` | Edge/nightly stack tuned for laptops and iterative work. |
| `docker-compose.stage.yaml` | Stable channel stack mirroring pre-production clusters. |
| `docker-compose.prod.yaml` | Production cutover stack with front-door network hand-off and Notify events enabled. |
| `docker-compose.airgap.yaml` | Stable stack with air-gapped defaults (no outbound hostnames). |
| `docker-compose.mirror.yaml` | Managed mirror topology for `*.stella-ops.org` distribution (Concelier + Excititor + CDN gateway). |
| `docker-compose.telemetry.yaml` | Optional OpenTelemetry collector overlay (mutual TLS, OTLP ingest endpoints). |
| `docker-compose.telemetry-storage.yaml` | Prometheus/Tempo/Loki storage overlay with multi-tenant defaults. |
| `env/*.env.example` | Seed `.env` files that document required secrets and ports per profile. |
## Usage
```bash
cp env/dev.env.example dev.env
docker compose --env-file dev.env -f docker-compose.dev.yaml config
docker compose --env-file dev.env -f docker-compose.dev.yaml up -d
```
The stage and airgap variants behave the same way—swap the file names accordingly. All profiles expose 443/8443 for the UI and REST APIs, and they share a `stellaops` Docker network scoped to the compose project.
> **Graph Explorer reminder:** If you enable Cartographer or Graph API containers alongside these profiles, update `etc/authority.yaml` so the `cartographer-service` client is marked with `properties.serviceIdentity: "cartographer"` and carries a tenant hint. The Authority host now refuses `graph:write` tokens without that marker, so apply the configuration change before rolling out the updated images.
### Telemetry collector overlay
The OpenTelemetry collector overlay is optional and can be layered on top of any profile:
```bash
./ops/devops/telemetry/generate_dev_tls.sh
docker compose -f docker-compose.telemetry.yaml up -d
python ../../ops/devops/telemetry/smoke_otel_collector.py --host localhost
docker compose -f docker-compose.telemetry-storage.yaml up -d
```
The generator script creates a development CA plus server/client certificates under
`deploy/telemetry/certs/`. The smoke test sends OTLP/HTTP payloads using the generated
client certificate and asserts the collector reports accepted traces, metrics, and logs.
The storage overlay starts Prometheus, Tempo, and Loki with multitenancy enabled so you
can validate the end-to-end pipeline before promoting changes to staging. Adjust the
configs in `deploy/telemetry/storage/` before running in production.
Mount the same certificates when running workloads so the collector can enforce mutual TLS.
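To make the mutual TLS requirement concrete, here is a minimal sketch of an OTLP/HTTP trace export using only the Python standard library. It is not the repository's `smoke_otel_collector.py`; the function names are hypothetical, and the certificate file names (`ca.crt`, `client.crt`, `client.key`) and the default OTLP/HTTP port 4318 with path `/v1/traces` are assumptions about a typical setup.

```python
import http.client
import json
import secrets
import ssl
import time


def build_trace_payload(service_name: str, span_name: str) -> bytes:
    """Build a minimal OTLP/HTTP JSON body containing a single span."""
    now = time.time_ns()
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes, hex-encoded
        "spanId": secrets.token_hex(8),    # 8 random bytes, hex-encoded
        "name": span_name,
        "kind": 1,  # SPAN_KIND_INTERNAL
        "startTimeUnixNano": str(now),
        "endTimeUnixNano": str(now + 1_000_000),
    }
    body = {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{"scope": {"name": "smoke"}, "spans": [span]}],
        }]
    }
    return json.dumps(body).encode("utf-8")


def build_mtls_context(ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    """Trust the development CA and present the client certificate."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def post_traces(host: str, port: int, payload: bytes, context: ssl.SSLContext) -> int:
    """POST the payload to the collector and return the HTTP status code."""
    conn = http.client.HTTPSConnection(host, port, context=context, timeout=10)
    try:
        conn.request("POST", "/v1/traces", body=payload,
                     headers={"Content-Type": "application/json"})
        return conn.getresponse().status
    finally:
        conn.close()
```

With the overlay running, a call such as `post_traces("localhost", 4318, build_trace_payload("smoke", "ping"), build_mtls_context(...))` would be expected to fail the TLS handshake when the client certificate is omitted, which is a quick way to confirm that mutual TLS is being enforced.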
For production cutovers copy `env/prod.env.example` to `prod.env`, update the secret placeholders, and create the external network expected by the profile:
```bash
docker network create stellaops_frontdoor
docker compose --env-file prod.env -f docker-compose.prod.yaml config
```
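Because the example env file ships with `REPLACE_WITH_...` placeholder values, a small pre-flight check can catch secrets that were never filled in. The sketch below is an illustration only; the helper names are hypothetical, and the parser handles just simple `KEY=value` lines with `#` comments.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unresolved_placeholders(env: dict[str, str], marker: str = "REPLACE_WITH_") -> list[str]:
    """Return the keys whose values still start with the placeholder marker."""
    return sorted(key for key, value in env.items() if value.startswith(marker))


if __name__ == "__main__":
    with open("prod.env", encoding="utf-8") as handle:
        leftovers = unresolved_placeholders(parse_env(handle.read()))
    if leftovers:
        raise SystemExit("placeholders still present: " + ", ".join(leftovers))
    print("no placeholder secrets found")
```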
### Scanner event stream settings
Scanner WebService can emit signed `scanner.report.*` events to Redis Streams when `SCANNER__EVENTS__ENABLED=true`. Each profile ships environment placeholders you can override in the `.env` file:
@@ -35,6 +66,10 @@ Scanner WebService can emit signed `scanner.report.*` events to Redis Streams wh
Helm values mirror the same knobs under each service's `env` map (see `deploy/helm/stellaops/values-*.yaml`).
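The `SCANNER__EVENTS__*` names appear to follow the common convention in which a double underscore separates nested configuration sections. Assuming that convention applies here (the source does not show how the service binds its settings), the mapping can be sketched as follows; `env_to_config` is a hypothetical helper.

```python
def env_to_config(env: dict[str, str], prefix: str) -> dict:
    """Fold PREFIX__A__B=value into {"a": {"b": value}} with lower-cased keys."""
    root: dict = {}
    for name, value in env.items():
        parts = name.split("__")
        # Skip variables outside the prefix or without a nested section.
        if len(parts) < 2 or parts[0] != prefix:
            continue
        node = root
        for part in parts[1:-1]:
            node = node.setdefault(part.lower(), {})
        node[parts[-1].lower()] = value
    return root


if __name__ == "__main__":
    sample = {
        "SCANNER__EVENTS__ENABLED": "true",
        "SCANNER__EVENTS__STREAM": "stella.events",
        "SCANNER__QUEUE__BROKER": "nats://nats:4222",
    }
    print(env_to_config(sample, "SCANNER"))
```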
### Front-door network hand-off
`docker-compose.prod.yaml` adds a `frontdoor` network so operators can attach Traefik, Envoy, or an on-prem load balancer that terminates TLS. Override `FRONTDOOR_NETWORK` in `prod.env` if your reverse proxy uses a different bridge name. Attach only the externally reachable services (Authority, Signer, Attestor, Concelier, Scanner Web, Notify Web, UI) to that network—internal infrastructure (Mongo, MinIO, RustFS, NATS) stays on the private `stellaops` network.
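The attachment rule above lends itself to a simple guard. The sketch below assumes the service-to-network mapping has already been loaded from the Compose file into a dictionary; the function name `frontdoor_violations` is hypothetical, and the internal service names are taken from the list in the paragraph above.

```python
# Infrastructure services that should stay on the private network only.
INTERNAL_ONLY = {"mongo", "minio", "rustfs", "nats"}


def frontdoor_violations(service_networks: dict[str, list[str]],
                         frontdoor: str = "frontdoor") -> list[str]:
    """Return internal services that are attached to the front-door network."""
    return sorted(
        name
        for name, networks in service_networks.items()
        if name in INTERNAL_ONLY and frontdoor in networks
    )


if __name__ == "__main__":
    attachments = {
        "mongo": ["stellaops"],
        "authority": ["stellaops", "frontdoor"],
        "nats": ["stellaops", "frontdoor"],  # deliberately misconfigured
    }
    print(frontdoor_violations(attachments))
```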
### Updating to a new release
1. Import the new manifest into `deploy/releases/` (see `deploy/README.md`).

@@ -0,0 +1,237 @@
x-release-labels: &release-labels
  com.stellaops.release.version: "2025.09.2"
  com.stellaops.release.channel: "stable"
  com.stellaops.profile: "prod"
networks:
  stellaops:
    driver: bridge
  frontdoor:
    external: true
    name: ${FRONTDOOR_NETWORK:-stellaops_frontdoor}
volumes:
  mongo-data:
  minio-data:
  rustfs-data:
  concelier-jobs:
  nats-data:
services:
  mongo:
    image: docker.io/library/mongo@sha256:c258b26dbb7774f97f52aff52231ca5f228273a84329c5f5e451c3739457db49
    command: ["mongod", "--bind_ip_all"]
    restart: unless-stopped
    environment:
      MONGO_INITDB_ROOT_USERNAME: "${MONGO_INITDB_ROOT_USERNAME}"
      MONGO_INITDB_ROOT_PASSWORD: "${MONGO_INITDB_ROOT_PASSWORD}"
    volumes:
      - mongo-data:/data/db
    networks:
      - stellaops
    labels: *release-labels
  minio:
    image: docker.io/minio/minio@sha256:14cea493d9a34af32f524e538b8346cf79f3321eff8e708c1e2960462bd8936e
    command: ["server", "/data", "--console-address", ":9001"]
    restart: unless-stopped
    environment:
      MINIO_ROOT_USER: "${MINIO_ROOT_USER}"
      MINIO_ROOT_PASSWORD: "${MINIO_ROOT_PASSWORD}"
    volumes:
      - minio-data:/data
    ports:
      - "${MINIO_CONSOLE_PORT:-9001}:9001"
    networks:
      - stellaops
    labels: *release-labels
  rustfs:
    image: registry.stella-ops.org/stellaops/rustfs:2025.10.0-edge
    command: ["serve", "--listen", "0.0.0.0:8080", "--root", "/data"]
    restart: unless-stopped
    environment:
      RUSTFS__LOG__LEVEL: info
      RUSTFS__STORAGE__PATH: /data
    volumes:
      - rustfs-data:/data
    ports:
      - "${RUSTFS_HTTP_PORT:-8080}:8080"
    networks:
      - stellaops
    labels: *release-labels
  nats:
    image: docker.io/library/nats@sha256:c82559e4476289481a8a5196e675ebfe67eea81d95e5161e3e78eccfe766608e
    command:
      - "-js"
      - "-sd"
      - /data
    restart: unless-stopped
    ports:
      - "${NATS_CLIENT_PORT:-4222}:4222"
    volumes:
      - nats-data:/data
    networks:
      - stellaops
    labels: *release-labels
  authority:
    image: registry.stella-ops.org/stellaops/authority@sha256:b0348bad1d0b401cc3c71cb40ba034c8043b6c8874546f90d4783c9dbfcc0bf5
    restart: unless-stopped
    depends_on:
      - mongo
    environment:
      STELLAOPS_AUTHORITY__ISSUER: "${AUTHORITY_ISSUER}"
      STELLAOPS_AUTHORITY__MONGO__CONNECTIONSTRING: "mongodb://${MONGO_INITDB_ROOT_USERNAME}:${MONGO_INITDB_ROOT_PASSWORD}@mongo:27017"
      STELLAOPS_AUTHORITY__PLUGINDIRECTORIES__0: "/app/plugins"
      STELLAOPS_AUTHORITY__PLUGINS__CONFIGURATIONDIRECTORY: "/app/etc/authority.plugins"
    volumes:
      - ../../etc/authority.yaml:/etc/authority.yaml:ro
      - ../../etc/authority.plugins:/app/etc/authority.plugins:ro
    ports:
      - "${AUTHORITY_PORT:-8440}:8440"
    networks:
      - stellaops
      - frontdoor
    labels: *release-labels
  signer:
    image: registry.stella-ops.org/stellaops/signer@sha256:8ad574e61f3a9e9bda8a58eb2700ae46813284e35a150b1137bc7c2b92ac0f2e
    restart: unless-stopped
    depends_on:
      - authority
    environment:
      SIGNER__AUTHORITY__BASEURL: "https://authority:8440"
      SIGNER__POE__INTROSPECTURL: "${SIGNER_POE_INTROSPECT_URL}"
      SIGNER__STORAGE__MONGO__CONNECTIONSTRING: "mongodb://${MONGO_INITDB_ROOT_USERNAME}:${MONGO_INITDB_ROOT_PASSWORD}@mongo:27017"
    ports:
      - "${SIGNER_PORT:-8441}:8441"
    networks:
      - stellaops
      - frontdoor
    labels: *release-labels
  attestor:
    image: registry.stella-ops.org/stellaops/attestor@sha256:0534985f978b0b5d220d73c96fddd962cd9135f616811cbe3bff4666c5af568f
    restart: unless-stopped
    depends_on:
      - signer
    environment:
      ATTESTOR__SIGNER__BASEURL: "https://signer:8441"
      ATTESTOR__MONGO__CONNECTIONSTRING: "mongodb://${MONGO_INITDB_ROOT_USERNAME}:${MONGO_INITDB_ROOT_PASSWORD}@mongo:27017"
    ports:
      - "${ATTESTOR_PORT:-8442}:8442"
    networks:
      - stellaops
      - frontdoor
    labels: *release-labels
  concelier:
    image: registry.stella-ops.org/stellaops/concelier@sha256:c58cdcaee1d266d68d498e41110a589dd204b487d37381096bd61ab345a867c5
    restart: unless-stopped
    depends_on:
      - mongo
      - minio
    environment:
      CONCELIER__STORAGE__MONGO__CONNECTIONSTRING: "mongodb://${MONGO_INITDB_ROOT_USERNAME}:${MONGO_INITDB_ROOT_PASSWORD}@mongo:27017"
      CONCELIER__STORAGE__S3__ENDPOINT: "http://minio:9000"
      CONCELIER__STORAGE__S3__ACCESSKEYID: "${MINIO_ROOT_USER}"
      CONCELIER__STORAGE__S3__SECRETACCESSKEY: "${MINIO_ROOT_PASSWORD}"
      CONCELIER__AUTHORITY__BASEURL: "https://authority:8440"
    volumes:
      - concelier-jobs:/var/lib/concelier/jobs
    ports:
      - "${CONCELIER_PORT:-8445}:8445"
    networks:
      - stellaops
      - frontdoor
    labels: *release-labels
  scanner-web:
    image: registry.stella-ops.org/stellaops/scanner-web@sha256:14b23448c3f9586a9156370b3e8c1991b61907efa666ca37dd3aaed1e79fe3b7
    restart: unless-stopped
    depends_on:
      - concelier
      - rustfs
      - nats
    environment:
      SCANNER__STORAGE__MONGO__CONNECTIONSTRING: "mongodb://${MONGO_INITDB_ROOT_USERNAME}:${MONGO_INITDB_ROOT_PASSWORD}@mongo:27017"
      SCANNER__ARTIFACTSTORE__DRIVER: "rustfs"
      SCANNER__ARTIFACTSTORE__ENDPOINT: "http://rustfs:8080/api/v1"
      SCANNER__ARTIFACTSTORE__BUCKET: "scanner-artifacts"
      SCANNER__ARTIFACTSTORE__TIMEOUTSECONDS: "30"
      SCANNER__QUEUE__BROKER: "${SCANNER_QUEUE_BROKER}"
      SCANNER__EVENTS__ENABLED: "${SCANNER_EVENTS_ENABLED:-true}"
      SCANNER__EVENTS__DRIVER: "${SCANNER_EVENTS_DRIVER:-redis}"
      SCANNER__EVENTS__DSN: "${SCANNER_EVENTS_DSN:-}"
      SCANNER__EVENTS__STREAM: "${SCANNER_EVENTS_STREAM:-stella.events}"
      SCANNER__EVENTS__PUBLISHTIMEOUTSECONDS: "${SCANNER_EVENTS_PUBLISH_TIMEOUT_SECONDS:-5}"
      SCANNER__EVENTS__MAXSTREAMLENGTH: "${SCANNER_EVENTS_MAX_STREAM_LENGTH:-10000}"
    ports:
      - "${SCANNER_WEB_PORT:-8444}:8444"
    networks:
      - stellaops
      - frontdoor
    labels: *release-labels
  scanner-worker:
    image: registry.stella-ops.org/stellaops/scanner-worker@sha256:32e25e76386eb9ea8bee0a1ad546775db9a2df989fab61ac877e351881960dab
    restart: unless-stopped
    depends_on:
      - scanner-web
      - rustfs
      - nats
    environment:
      SCANNER__STORAGE__MONGO__CONNECTIONSTRING: "mongodb://${MONGO_INITDB_ROOT_USERNAME}:${MONGO_INITDB_ROOT_PASSWORD}@mongo:27017"
      SCANNER__ARTIFACTSTORE__DRIVER: "rustfs"
      SCANNER__ARTIFACTSTORE__ENDPOINT: "http://rustfs:8080/api/v1"
      SCANNER__ARTIFACTSTORE__BUCKET: "scanner-artifacts"
      SCANNER__ARTIFACTSTORE__TIMEOUTSECONDS: "30"
      SCANNER__QUEUE__BROKER: "${SCANNER_QUEUE_BROKER}"
    networks:
      - stellaops
    labels: *release-labels
  notify-web:
    image: ${NOTIFY_WEB_IMAGE:-registry.stella-ops.org/stellaops/notify-web:2025.09.2}
    restart: unless-stopped
    depends_on:
      - mongo
      - authority
    environment:
      DOTNET_ENVIRONMENT: Production
    volumes:
      - ../../etc/notify.prod.yaml:/app/etc/notify.yaml:ro
    ports:
      - "${NOTIFY_WEB_PORT:-8446}:8446"
    networks:
      - stellaops
      - frontdoor
    labels: *release-labels
  excititor:
    image: registry.stella-ops.org/stellaops/excititor@sha256:59022e2016aebcef5c856d163ae705755d3f81949d41195256e935ef40a627fa
    restart: unless-stopped
    depends_on:
      - concelier
    environment:
      EXCITITOR__CONCELIER__BASEURL: "https://concelier:8445"
      EXCITITOR__STORAGE__MONGO__CONNECTIONSTRING: "mongodb://${MONGO_INITDB_ROOT_USERNAME}:${MONGO_INITDB_ROOT_PASSWORD}@mongo:27017"
    networks:
      - stellaops
    labels: *release-labels
  web-ui:
    image: registry.stella-ops.org/stellaops/web-ui@sha256:10d924808c48e4353e3a241da62eb7aefe727a1d6dc830eb23a8e181013b3a23
    restart: unless-stopped
    depends_on:
      - scanner-web
    environment:
      STELLAOPS_UI__BACKEND__BASEURL: "https://scanner-web:8444"
    ports:
      - "${UI_PORT:-8443}:8443"
    networks:
      - stellaops
      - frontdoor
    labels: *release-labels

@@ -0,0 +1,57 @@
version: "3.9"
services:
  prometheus:
    image: prom/prometheus:v2.53.0
    container_name: stellaops-prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yaml"
    volumes:
      - ../telemetry/storage/prometheus.yaml:/etc/prometheus/prometheus.yaml:ro
      - prometheus-data:/prometheus
      - ../telemetry/certs:/etc/telemetry/tls:ro
      - ../telemetry/storage/auth:/etc/telemetry/auth:ro
    environment:
      PROMETHEUS_COLLECTOR_TARGET: stellaops-otel-collector:9464
    ports:
      - "9090:9090"
    depends_on:
      - tempo
      - loki
  tempo:
    image: grafana/tempo:2.5.0
    container_name: stellaops-tempo
    command:
      - "-config.file=/etc/tempo/tempo.yaml"
    volumes:
      - ../telemetry/storage/tempo.yaml:/etc/tempo/tempo.yaml:ro
      - ../telemetry/storage/tenants/tempo-overrides.yaml:/etc/telemetry/tenants/tempo-overrides.yaml:ro
      - ../telemetry/certs:/etc/telemetry/tls:ro
      - tempo-data:/var/tempo
    ports:
      - "3200:3200"
    environment:
      TEMPO_ZONE: docker
  loki:
    image: grafana/loki:3.1.0
    container_name: stellaops-loki
    command:
      - "-config.file=/etc/loki/loki.yaml"
    volumes:
      - ../telemetry/storage/loki.yaml:/etc/loki/loki.yaml:ro
      - ../telemetry/storage/tenants/loki-overrides.yaml:/etc/telemetry/tenants/loki-overrides.yaml:ro
      - ../telemetry/certs:/etc/telemetry/tls:ro
      - loki-data:/var/loki
    ports:
      - "3100:3100"
volumes:
  prometheus-data:
  tempo-data:
  loki-data:
networks:
  default:
    name: stellaops-telemetry

@@ -0,0 +1,34 @@
version: "3.9"
services:
  otel-collector:
    image: otel/opentelemetry-collector:0.105.0
    container_name: stellaops-otel-collector
    command:
      - "--config=/etc/otel-collector/config.yaml"
    environment:
      STELLAOPS_OTEL_TLS_CERT: /etc/otel-collector/tls/collector.crt
      STELLAOPS_OTEL_TLS_KEY: /etc/otel-collector/tls/collector.key
      STELLAOPS_OTEL_TLS_CA: /etc/otel-collector/tls/ca.crt
      STELLAOPS_OTEL_PROMETHEUS_ENDPOINT: 0.0.0.0:9464
      STELLAOPS_OTEL_REQUIRE_CLIENT_CERT: "true"
      STELLAOPS_TENANT_ID: dev
    volumes:
      - ../telemetry/otel-collector-config.yaml:/etc/otel-collector/config.yaml:ro
      - ../telemetry/certs:/etc/otel-collector/tls:ro
    ports:
      - "4317:4317" # OTLP gRPC (mTLS)
      - "4318:4318" # OTLP HTTP (mTLS)
      - "9464:9464" # Prometheus exporter (mTLS)
      - "13133:13133" # Health check
      - "1777:1777" # pprof
    healthcheck:
      test: ["CMD", "curl", "-fsk", "--cert", "/etc/otel-collector/tls/client.crt", "--key", "/etc/otel-collector/tls/client.key", "--cacert", "/etc/otel-collector/tls/ca.crt", "https://localhost:13133/healthz"]
      interval: 30s
      start_period: 15s
      timeout: 5s
      retries: 3
networks:
  default:
    name: stellaops-telemetry

deploy/compose/env/prod.env.example

@@ -0,0 +1,29 @@
# Substitutions for docker-compose.prod.yaml
# ⚠️ Replace all placeholder secrets with values sourced from your secret manager.
MONGO_INITDB_ROOT_USERNAME=stellaops-prod
MONGO_INITDB_ROOT_PASSWORD=REPLACE_WITH_STRONG_PASSWORD
MINIO_ROOT_USER=stellaops-prod
MINIO_ROOT_PASSWORD=REPLACE_WITH_STRONG_PASSWORD
# Expose the MinIO console only to trusted operator networks.
MINIO_CONSOLE_PORT=39001
RUSTFS_HTTP_PORT=8080
AUTHORITY_ISSUER=https://authority.prod.stella-ops.org
AUTHORITY_PORT=8440
SIGNER_POE_INTROSPECT_URL=https://licensing.prod.stella-ops.org/introspect
SIGNER_PORT=8441
ATTESTOR_PORT=8442
CONCELIER_PORT=8445
SCANNER_WEB_PORT=8444
UI_PORT=8443
NATS_CLIENT_PORT=4222
SCANNER_QUEUE_BROKER=nats://nats:4222
# `true` enables signed scanner events for Notify ingestion.
SCANNER_EVENTS_ENABLED=true
SCANNER_EVENTS_DRIVER=redis
# Leave SCANNER_EVENTS_DSN empty to inherit the Redis queue DSN when SCANNER_QUEUE_BROKER uses redis://.
SCANNER_EVENTS_DSN=
SCANNER_EVENTS_STREAM=stella.events
SCANNER_EVENTS_PUBLISH_TIMEOUT_SECONDS=5
SCANNER_EVENTS_MAX_STREAM_LENGTH=10000
# External reverse proxy (Traefik, Envoy, etc.) that terminates TLS.
FRONTDOOR_NETWORK=stellaops_frontdoor

@@ -0,0 +1,64 @@
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: ${STELLAOPS_OTEL_TLS_CERT:?STELLAOPS_OTEL_TLS_CERT not set}
          key_file: ${STELLAOPS_OTEL_TLS_KEY:?STELLAOPS_OTEL_TLS_KEY not set}
          client_ca_file: ${STELLAOPS_OTEL_TLS_CA:?STELLAOPS_OTEL_TLS_CA not set}
          require_client_certificate: ${STELLAOPS_OTEL_REQUIRE_CLIENT_CERT:true}
      http:
        endpoint: 0.0.0.0:4318
        tls:
          cert_file: ${STELLAOPS_OTEL_TLS_CERT:?STELLAOPS_OTEL_TLS_CERT not set}
          key_file: ${STELLAOPS_OTEL_TLS_KEY:?STELLAOPS_OTEL_TLS_KEY not set}
          client_ca_file: ${STELLAOPS_OTEL_TLS_CA:?STELLAOPS_OTEL_TLS_CA not set}
          require_client_certificate: ${STELLAOPS_OTEL_REQUIRE_CLIENT_CERT:true}
processors:
  attributes/tenant-tag:
    actions:
      - key: tenant.id
        action: insert
        value: ${STELLAOPS_TENANT_ID:unknown}
  batch:
    send_batch_size: 1024
    timeout: 5s
exporters:
  logging:
    verbosity: normal
  prometheus:
    endpoint: ${STELLAOPS_OTEL_PROMETHEUS_ENDPOINT:0.0.0.0:9464}
    enable_open_metrics: true
    metric_expiration: 5m
    tls:
      cert_file: ${STELLAOPS_OTEL_TLS_CERT:?STELLAOPS_OTEL_TLS_CERT not set}
      key_file: ${STELLAOPS_OTEL_TLS_KEY:?STELLAOPS_OTEL_TLS_KEY not set}
      client_ca_file: ${STELLAOPS_OTEL_TLS_CA:?STELLAOPS_OTEL_TLS_CA not set}
extensions:
  health_check:
    endpoint: ${STELLAOPS_OTEL_HEALTH_ENDPOINT:0.0.0.0:13133}
  pprof:
    endpoint: ${STELLAOPS_OTEL_PPROF_ENDPOINT:0.0.0.0:1777}
service:
  telemetry:
    logs:
      level: ${STELLAOPS_OTEL_LOG_LEVEL:info}
  extensions: [health_check, pprof]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/tenant-tag, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [attributes/tenant-tag, batch]
      exporters: [logging, prometheus]
    logs:
      receivers: [otlp]
      processors: [attributes/tenant-tag, batch]
      exporters: [logging]

@@ -1,6 +1,18 @@
{{- define "stellaops.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- define "stellaops.telemetryCollector.config" -}}
{{- if .Values.telemetry.collector.config }}
{{ tpl .Values.telemetry.collector.config . }}
{{- else }}
{{ tpl (.Files.Get "files/otel-collector-config.yaml") . }}
{{- end }}
{{- end -}}
{{- define "stellaops.telemetryCollector.fullname" -}}
{{- printf "%s-otel-collector" (include "stellaops.name" .) | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- define "stellaops.fullname" -}}
{{- $name := default .root.Chart.Name .root.Values.fullnameOverride -}}

@@ -0,0 +1,121 @@
{{- if .Values.telemetry.collector.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "stellaops.telemetryCollector.fullname" . }}
  labels:
    {{- include "stellaops.labels" (dict "root" . "name" "otel-collector" "svc" (dict "class" "telemetry")) | nindent 4 }}
data:
  config.yaml: |
{{ include "stellaops.telemetryCollector.config" . | indent 4 }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "stellaops.telemetryCollector.fullname" . }}
  labels:
    {{- include "stellaops.labels" (dict "root" . "name" "otel-collector" "svc" (dict "class" "telemetry")) | nindent 4 }}
spec:
  replicas: {{ .Values.telemetry.collector.replicas | default 1 }}
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "stellaops.name" . | quote }}
      app.kubernetes.io/component: "otel-collector"
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "stellaops.name" . | quote }}
        app.kubernetes.io/component: "otel-collector"
        stellaops.profile: {{ .Values.global.profile | quote }}
    spec:
      containers:
        - name: otel-collector
          image: {{ .Values.telemetry.collector.image | default "otel/opentelemetry-collector:0.105.0" | quote }}
          args:
            - "--config=/etc/otel/config.yaml"
          ports:
            - name: otlp-grpc
              containerPort: 4317
            - name: otlp-http
              containerPort: 4318
            - name: metrics
              containerPort: 9464
            - name: health
              containerPort: 13133
            - name: pprof
              containerPort: 1777
          env:
            - name: STELLAOPS_OTEL_TLS_CERT
              value: {{ .Values.telemetry.collector.tls.certPath | default "/etc/otel/tls/tls.crt" | quote }}
            - name: STELLAOPS_OTEL_TLS_KEY
              value: {{ .Values.telemetry.collector.tls.keyPath | default "/etc/otel/tls/tls.key" | quote }}
            - name: STELLAOPS_OTEL_TLS_CA
              value: {{ .Values.telemetry.collector.tls.caPath | default "/etc/otel/tls/ca.crt" | quote }}
            - name: STELLAOPS_OTEL_PROMETHEUS_ENDPOINT
              value: {{ .Values.telemetry.collector.prometheusEndpoint | default "0.0.0.0:9464" | quote }}
            - name: STELLAOPS_OTEL_REQUIRE_CLIENT_CERT
              value: {{ .Values.telemetry.collector.requireClientCert | default true | quote }}
            - name: STELLAOPS_TENANT_ID
              value: {{ .Values.telemetry.collector.defaultTenant | default "unknown" | quote }}
            - name: STELLAOPS_OTEL_LOG_LEVEL
              value: {{ .Values.telemetry.collector.logLevel | default "info" | quote }}
          volumeMounts:
            - name: config
              mountPath: /etc/otel/config.yaml
              subPath: config.yaml
              readOnly: true
            - name: tls
              mountPath: /etc/otel/tls
              readOnly: true
          livenessProbe:
            httpGet:
              scheme: HTTPS
              port: health
              path: /healthz
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              scheme: HTTPS
              port: health
              path: /healthz
            initialDelaySeconds: 5
            periodSeconds: 15
{{- with .Values.telemetry.collector.resources }}
          resources:
{{ toYaml . | indent 12 }}
{{- end }}
      volumes:
        - name: config
          configMap:
            name: {{ include "stellaops.telemetryCollector.fullname" . }}
        - name: tls
          secret:
            secretName: {{ .Values.telemetry.collector.tls.secretName | required "telemetry.collector.tls.secretName is required" }}
{{- if .Values.telemetry.collector.tls.items }}
            items:
{{ toYaml .Values.telemetry.collector.tls.items | indent 14 }}
{{- end }}
---
apiVersion: v1
kind: Service
metadata:
  name: {{ include "stellaops.telemetryCollector.fullname" . }}
  labels:
    {{- include "stellaops.labels" (dict "root" . "name" "otel-collector" "svc" (dict "class" "telemetry")) | nindent 4 }}
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: {{ include "stellaops.name" . | quote }}
    app.kubernetes.io/component: "otel-collector"
  ports:
    - name: otlp-grpc
      port: {{ .Values.telemetry.collector.service.grpcPort | default 4317 }}
      targetPort: otlp-grpc
    - name: otlp-http
      port: {{ .Values.telemetry.collector.service.httpPort | default 4318 }}
      targetPort: otlp-http
    - name: metrics
      port: {{ .Values.telemetry.collector.service.metricsPort | default 9464 }}
      targetPort: metrics
{{- end }}

@@ -1,15 +1,22 @@
global:
  profile: dev
  release:
    version: "2025.10.0-edge"
    channel: edge
    manifestSha256: "822f82987529ea38d2321dbdd2ef6874a4062a117116a20861c26a8df1807beb"
  image:
    pullPolicy: IfNotPresent
  labels:
    stellaops.io/channel: edge
telemetry:
  collector:
    enabled: true
    defaultTenant: dev
    tls:
      secretName: stellaops-otel-tls
configMaps:
  notify-config:
    data:
      notify.yaml: |

@@ -0,0 +1,221 @@
global:
  profile: prod
  release:
    version: "2025.09.2"
    channel: stable
    manifestSha256: "dc3c8fe1ab83941c838ccc5a8a5862f7ddfa38c2078e580b5649db26554565b7"
  image:
    pullPolicy: IfNotPresent
  labels:
    stellaops.io/channel: stable
    stellaops.io/profile: prod
configMaps:
  notify-config:
    data:
      notify.yaml: |
        storage:
          driver: mongo
          connectionString: "mongodb://stellaops-mongo:27017"
          database: "stellaops_notify_prod"
          commandTimeoutSeconds: 45
        authority:
          enabled: true
          issuer: "https://authority.prod.stella-ops.org"
          metadataAddress: "https://authority.prod.stella-ops.org/.well-known/openid-configuration"
          requireHttpsMetadata: true
          allowAnonymousFallback: false
          backchannelTimeoutSeconds: 30
          tokenClockSkewSeconds: 60
          audiences:
            - notify
          readScope: notify.read
          adminScope: notify.admin
        api:
          basePath: "/api/v1/notify"
          internalBasePath: "/internal/notify"
          tenantHeader: "X-StellaOps-Tenant"
        plugins:
          baseDirectory: "/opt/stellaops"
          directory: "plugins/notify"
          searchPatterns:
            - "StellaOps.Notify.Connectors.*.dll"
          orderedPlugins:
            - StellaOps.Notify.Connectors.Slack
            - StellaOps.Notify.Connectors.Teams
            - StellaOps.Notify.Connectors.Email
            - StellaOps.Notify.Connectors.Webhook
        telemetry:
          enableRequestLogging: true
          minimumLogLevel: Information
services:
  authority:
    image: registry.stella-ops.org/stellaops/authority@sha256:b0348bad1d0b401cc3c71cb40ba034c8043b6c8874546f90d4783c9dbfcc0bf5
    service:
      port: 8440
    env:
      STELLAOPS_AUTHORITY__ISSUER: "https://authority.prod.stella-ops.org"
      STELLAOPS_AUTHORITY__PLUGINDIRECTORIES__0: "/app/plugins"
      STELLAOPS_AUTHORITY__PLUGINS__CONFIGURATIONDIRECTORY: "/app/etc/authority.plugins"
    envFrom:
      - secretRef:
          name: stellaops-prod-core
  signer:
    image: registry.stella-ops.org/stellaops/signer@sha256:8ad574e61f3a9e9bda8a58eb2700ae46813284e35a150b1137bc7c2b92ac0f2e
    service:
      port: 8441
    env:
      SIGNER__AUTHORITY__BASEURL: "https://stellaops-authority:8440"
      SIGNER__POE__INTROSPECTURL: "https://licensing.prod.stella-ops.org/introspect"
    envFrom:
      - secretRef:
          name: stellaops-prod-core
  attestor:
    image: registry.stella-ops.org/stellaops/attestor@sha256:0534985f978b0b5d220d73c96fddd962cd9135f616811cbe3bff4666c5af568f
    service:
      port: 8442
    env:
      ATTESTOR__SIGNER__BASEURL: "https://stellaops-signer:8441"
    envFrom:
      - secretRef:
          name: stellaops-prod-core
  concelier:
    image: registry.stella-ops.org/stellaops/concelier@sha256:c58cdcaee1d266d68d498e41110a589dd204b487d37381096bd61ab345a867c5
    service:
      port: 8445
    env:
      CONCELIER__STORAGE__S3__ENDPOINT: "http://stellaops-minio:9000"
      CONCELIER__AUTHORITY__BASEURL: "https://stellaops-authority:8440"
    envFrom:
      - secretRef:
          name: stellaops-prod-core
    volumeMounts:
      - name: concelier-jobs
        mountPath: /var/lib/concelier/jobs
    volumeClaims:
      - name: concelier-jobs
        claimName: stellaops-concelier-jobs
  scanner-web:
    image: registry.stella-ops.org/stellaops/scanner-web@sha256:14b23448c3f9586a9156370b3e8c1991b61907efa666ca37dd3aaed1e79fe3b7
    service:
      port: 8444
    env:
      SCANNER__ARTIFACTSTORE__DRIVER: "rustfs"
      SCANNER__ARTIFACTSTORE__ENDPOINT: "http://stellaops-rustfs:8080/api/v1"
      SCANNER__ARTIFACTSTORE__BUCKET: "scanner-artifacts"
      SCANNER__ARTIFACTSTORE__TIMEOUTSECONDS: "30"
      SCANNER__QUEUE__BROKER: "nats://stellaops-nats:4222"
      SCANNER__EVENTS__ENABLED: "true"
      SCANNER__EVENTS__DRIVER: "redis"
      SCANNER__EVENTS__DSN: ""
      SCANNER__EVENTS__STREAM: "stella.events"
      SCANNER__EVENTS__PUBLISHTIMEOUTSECONDS: "5"
      SCANNER__EVENTS__MAXSTREAMLENGTH: "10000"
    envFrom:
      - secretRef:
          name: stellaops-prod-core
  scanner-worker:
    image: registry.stella-ops.org/stellaops/scanner-worker@sha256:32e25e76386eb9ea8bee0a1ad546775db9a2df989fab61ac877e351881960dab
    replicas: 3
    env:
      SCANNER__ARTIFACTSTORE__DRIVER: "rustfs"
      SCANNER__ARTIFACTSTORE__ENDPOINT: "http://stellaops-rustfs:8080/api/v1"
      SCANNER__ARTIFACTSTORE__BUCKET: "scanner-artifacts"
      SCANNER__ARTIFACTSTORE__TIMEOUTSECONDS: "30"
      SCANNER__QUEUE__BROKER: "nats://stellaops-nats:4222"
      SCANNER__EVENTS__ENABLED: "true"
      SCANNER__EVENTS__DRIVER: "redis"
      SCANNER__EVENTS__DSN: ""
      SCANNER__EVENTS__STREAM: "stella.events"
      SCANNER__EVENTS__PUBLISHTIMEOUTSECONDS: "5"
      SCANNER__EVENTS__MAXSTREAMLENGTH: "10000"
    envFrom:
      - secretRef:
          name: stellaops-prod-core
  notify-web:
    image: registry.stella-ops.org/stellaops/notify-web:2025.09.2
    service:
      port: 8446
    env:
      DOTNET_ENVIRONMENT: Production
    envFrom:
      - secretRef:
          name: stellaops-prod-notify
    configMounts:
      - name: notify-config
        mountPath: /app/etc/notify.yaml
        subPath: notify.yaml
        configMap: notify-config
  excititor:
    image: registry.stella-ops.org/stellaops/excititor@sha256:59022e2016aebcef5c856d163ae705755d3f81949d41195256e935ef40a627fa
    env:
      EXCITITOR__CONCELIER__BASEURL: "https://stellaops-concelier:8445"
    envFrom:
      - secretRef:
          name: stellaops-prod-core
  web-ui:
    image: registry.stella-ops.org/stellaops/web-ui@sha256:10d924808c48e4353e3a241da62eb7aefe727a1d6dc830eb23a8e181013b3a23
    service:
      port: 8443
    env:
      STELLAOPS_UI__BACKEND__BASEURL: "https://stellaops-scanner-web:8444"
  mongo:
    class: infrastructure
    image: docker.io/library/mongo@sha256:c258b26dbb7774f97f52aff52231ca5f228273a84329c5f5e451c3739457db49
    service:
      port: 27017
    command:
      - mongod
      - --bind_ip_all
    envFrom:
      - secretRef:
          name: stellaops-prod-mongo
    volumeMounts:
      - name: mongo-data
        mountPath: /data/db
    volumeClaims:
      - name: mongo-data
        claimName: stellaops-mongo-data
  minio:
    class: infrastructure
    image: docker.io/minio/minio@sha256:14cea493d9a34af32f524e538b8346cf79f3321eff8e708c1e2960462bd8936e
    service:
      port: 9000
    command:
      - server
      - /data
      - --console-address
      - :9001
    envFrom:
      - secretRef:
          name: stellaops-prod-minio
    volumeMounts:
      - name: minio-data
        mountPath: /data
    volumeClaims:
      - name: minio-data
        claimName: stellaops-minio-data
  rustfs:
    class: infrastructure
    image: registry.stella-ops.org/stellaops/rustfs:2025.10.0-edge
    service:
      port: 8080
    command:
      - serve
      - --listen
      - 0.0.0.0:8080
      - --root
      - /data
    env:
      RUSTFS__LOG__LEVEL: info
      RUSTFS__STORAGE__PATH: /data
    volumeMounts:
      - name: rustfs-data
        mountPath: /data
    volumeClaims:
      - name: rustfs-data
        claimName: stellaops-rustfs-data


@@ -1,13 +1,20 @@
global:
  profile: stage
  release:
    version: "2025.09.2"
global:
  profile: stage
  release:
    version: "2025.09.2"
    channel: stable
    manifestSha256: "dc3c8fe1ab83941c838ccc5a8a5862f7ddfa38c2078e580b5649db26554565b7"
  image:
    pullPolicy: IfNotPresent
  labels:
    stellaops.io/channel: stable
  labels:
    stellaops.io/channel: stable
telemetry:
  collector:
    enabled: true
    defaultTenant: stage
    tls:
      secretName: stellaops-otel-tls-stage
configMaps:
  notify-config:


@@ -1,10 +1,37 @@
global:
  release:
    version: ""
    channel: ""
    manifestSha256: ""
  profile: ""
  image:
    pullPolicy: IfNotPresent
  labels: {}
services: {}
global:
  release:
    version: ""
    channel: ""
    manifestSha256: ""
  profile: ""
  image:
    pullPolicy: IfNotPresent
  labels: {}

telemetry:
  collector:
    enabled: false
    replicas: 1
    image: otel/opentelemetry-collector:0.105.0
    requireClientCert: true
    defaultTenant: unknown
    logLevel: info
    tls:
      secretName: ""
      certPath: /etc/otel/tls/tls.crt
      keyPath: /etc/otel/tls/tls.key
      caPath: /etc/otel/tls/ca.crt
      items:
        - key: tls.crt
          path: tls.crt
        - key: tls.key
          path: tls.key
        - key: ca.crt
          path: ca.crt
    service:
      grpcPort: 4317
      httpPort: 4318
      metricsPort: 9464
    resources: {}

services: {}

deploy/telemetry/.gitignore

@@ -0,0 +1 @@
certs/


@@ -0,0 +1,35 @@
# Telemetry Collector Assets

These assets provision the default OpenTelemetry Collector instance required by
`DEVOPS-OBS-50-001`. The collector acts as the secured ingest point for traces,
metrics, and logs emitted by StellaOps services.

## Contents

| File | Purpose |
| ---- | ------- |
| `otel-collector-config.yaml` | Baseline collector configuration (mutual TLS, OTLP receivers, Prometheus exporter). |
| `storage/prometheus.yaml` | Prometheus scrape configuration tuned for the collector and service tenants. |
| `storage/tempo.yaml` | Tempo configuration with multitenancy, WAL, and compaction settings. |
| `storage/loki.yaml` | Loki configuration enabling multitenant log ingestion with retention policies. |
| `storage/tenants/*.yaml` | Per-tenant overrides for Tempo and Loki rate/retention controls. |

## Development workflow

1. Generate development certificates (collector + client) using
   `ops/devops/telemetry/generate_dev_tls.sh`.
2. Launch the collector via `docker compose -f docker-compose.telemetry.yaml up`.
3. Launch the storage backends (Prometheus, Tempo, Loki) via
   `docker compose -f docker-compose.telemetry-storage.yaml up`.
4. Run the smoke test: `python ops/devops/telemetry/smoke_otel_collector.py`.
5. Explore the storage configuration (`storage/README.md`) to tune retention/limits.

The smoke test sends OTLP traffic over TLS and asserts the collector accepted
traces, metrics, and logs by scraping the Prometheus metrics endpoint.
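
The scrape-and-assert step can be pictured with a small sketch. It is not the code of `smoke_otel_collector.py`, which this change does not show. The helper names and the `demo_*` metric names are placeholders invented for illustration. The sketch parses Prometheus text exposition, sums each metric across its label sets, and reports which required signals are still at zero.

```python
from typing import Dict, Iterable, List


def parse_samples(exposition: str) -> Dict[str, float]:
    """Sum sample values per metric name across all label sets."""
    totals: Dict[str, float] = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        if "{" in line:
            # "<name>{labels} <value> [timestamp]"; label values may contain spaces.
            name = line.split("{", 1)[0]
            fields = line.rsplit("}", 1)[1].split()
        else:
            # "<name> <value> [timestamp]"
            name, *fields = line.split()
        if not fields:
            continue  # malformed line, ignore it
        totals[name] = totals.get(name, 0.0) + float(fields[0])
    return totals


def missing_signals(exposition: str, required: Iterable[str]) -> List[str]:
    """Return the required metric names whose summed value is not positive."""
    totals = parse_samples(exposition)
    return [name for name in required if totals.get(name, 0.0) <= 0.0]


# Placeholder metric names; the real smoke test may check different series.
REQUIRED = [
    "demo_spans_accepted_total",
    "demo_metric_points_accepted_total",
    "demo_logs_accepted_total",
]

SAMPLE = """\
# HELP demo_spans_accepted_total Spans accepted by the collector.
# TYPE demo_spans_accepted_total counter
demo_spans_accepted_total{receiver="otlp",transport="grpc"} 12
demo_spans_accepted_total{receiver="otlp",transport="http"} 3
demo_metric_points_accepted_total{receiver="otlp"} 40
demo_logs_accepted_total{receiver="otlp"} 0
"""

if __name__ == "__main__":
    print(missing_signals(SAMPLE, REQUIRED))
```

In this sample the log counter is still zero, so a smoke test built this way would fail until log ingestion is confirmed.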
## Kubernetes

The Helm chart consumes the same configuration (see `values.yaml`). Provide TLS
material via a secret referenced by `telemetry.collector.tls.secretName`,
containing `ca.crt`, `tls.crt`, and `tls.key`. Client certificates are required
for ingestion and should be issued by the same CA.


@@ -0,0 +1,67 @@
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: ${STELLAOPS_OTEL_TLS_CERT:?STELLAOPS_OTEL_TLS_CERT not set}
          key_file: ${STELLAOPS_OTEL_TLS_KEY:?STELLAOPS_OTEL_TLS_KEY not set}
          client_ca_file: ${STELLAOPS_OTEL_TLS_CA:?STELLAOPS_OTEL_TLS_CA not set}
          require_client_certificate: ${STELLAOPS_OTEL_REQUIRE_CLIENT_CERT:true}
      http:
        endpoint: 0.0.0.0:4318
        tls:
          cert_file: ${STELLAOPS_OTEL_TLS_CERT:?STELLAOPS_OTEL_TLS_CERT not set}
          key_file: ${STELLAOPS_OTEL_TLS_KEY:?STELLAOPS_OTEL_TLS_KEY not set}
          client_ca_file: ${STELLAOPS_OTEL_TLS_CA:?STELLAOPS_OTEL_TLS_CA not set}
          require_client_certificate: ${STELLAOPS_OTEL_REQUIRE_CLIENT_CERT:true}

processors:
  attributes/tenant-tag:
    actions:
      - key: tenant.id
        action: insert
        value: ${STELLAOPS_TENANT_ID:unknown}
  batch:
    send_batch_size: 1024
    timeout: 5s

exporters:
  logging:
    verbosity: normal
  prometheus:
    endpoint: ${STELLAOPS_OTEL_PROMETHEUS_ENDPOINT:0.0.0.0:9464}
    enable_open_metrics: true
    metric_expiration: 5m
    tls:
      cert_file: ${STELLAOPS_OTEL_TLS_CERT:?STELLAOPS_OTEL_TLS_CERT not set}
      key_file: ${STELLAOPS_OTEL_TLS_KEY:?STELLAOPS_OTEL_TLS_KEY not set}
      client_ca_file: ${STELLAOPS_OTEL_TLS_CA:?STELLAOPS_OTEL_TLS_CA not set}
  # Additional OTLP exporters can be configured by extending this section at runtime.
  # For example, set STELLAOPS_OTEL_UPSTREAM_ENDPOINT and mount certificates, then
  # add the exporter via a sidecar overlay.

extensions:
  health_check:
    endpoint: ${STELLAOPS_OTEL_HEALTH_ENDPOINT:0.0.0.0:13133}
  pprof:
    endpoint: ${STELLAOPS_OTEL_PPROF_ENDPOINT:0.0.0.0:1777}

service:
  telemetry:
    logs:
      level: ${STELLAOPS_OTEL_LOG_LEVEL:info}
  extensions: [health_check, pprof]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/tenant-tag, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [attributes/tenant-tag, batch]
      exporters: [logging, prometheus]
    logs:
      receivers: [otlp]
      processors: [attributes/tenant-tag, batch]
      exporters: [logging]
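
When an overlay extends this file with extra exporters, as the comment in the `exporters` section suggests, each pipeline may only name components declared in the matching top-level section. The collector is expected to reject a configuration that breaks this rule. The sketch below is a hypothetical pre-flight check, not part of this change. `CONFIG` is a hand-written dictionary mirroring the component names above, and `undefined_references` and `otlp/upstream` are names made up for the example.

```python
from copy import deepcopy
from typing import Any, Dict, List

# Hand-written mirror of the component names in the collector config above.
CONFIG: Dict[str, Any] = {
    "receivers": {"otlp": {}},
    "processors": {"attributes/tenant-tag": {}, "batch": {}},
    "exporters": {"logging": {}, "prometheus": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["attributes/tenant-tag", "batch"],
                "exporters": ["logging"],
            },
            "metrics": {
                "receivers": ["otlp"],
                "processors": ["attributes/tenant-tag", "batch"],
                "exporters": ["logging", "prometheus"],
            },
            "logs": {
                "receivers": ["otlp"],
                "processors": ["attributes/tenant-tag", "batch"],
                "exporters": ["logging"],
            },
        }
    },
}


def undefined_references(config: Dict[str, Any]) -> List[str]:
    """List pipeline entries that name a component missing from its section."""
    problems: List[str] = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for kind in ("receivers", "processors", "exporters"):
            declared = config.get(kind, {})
            for component in pipeline.get(kind, []):
                if component not in declared:
                    problems.append(
                        f"{pipeline_name}: {kind[:-1]} '{component}' is not declared"
                    )
    return problems


if __name__ == "__main__":
    print(undefined_references(CONFIG))

    # Simulate an overlay that wires an exporter into a pipeline without declaring it.
    broken = deepcopy(CONFIG)
    broken["service"]["pipelines"]["traces"]["exporters"].append("otlp/upstream")
    print(undefined_references(broken))
```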


@@ -0,0 +1,33 @@
# Telemetry Storage Stack

Configuration snippets for the default StellaOps observability backends used in
staging and production environments. The stack comprises:

- **Prometheus** for metrics (scraping the collector's Prometheus exporter)
- **Tempo** for traces (OTLP ingest via mTLS)
- **Loki** for logs (HTTP ingest with tenant isolation)

## Files

| Path | Description |
| ---- | ----------- |
| `prometheus.yaml` | Scrape configuration for the collector (mTLS + bearer token placeholder). |
| `tempo.yaml` | Tempo configuration with multitenancy enabled and local storage paths. |
| `loki.yaml` | Loki configuration enabling per-tenant overrides and boltdb-shipper storage. |
| `tenants/tempo-overrides.yaml` | Example tenant overrides for Tempo (retention, limits). |
| `tenants/loki-overrides.yaml` | Example tenant overrides for Loki (rate limits, retention). |
| `auth/` | Placeholder directory for Prometheus bearer token files (e.g., `token`). |

These configurations are referenced by the Docker Compose overlay
(`deploy/compose/docker-compose.telemetry-storage.yaml`) and the staging rollout documented in
`docs/ops/telemetry-storage.md`. Adjust paths, credentials, and overrides before running in
connected environments. Place the Prometheus bearer token in `auth/token` when using the
Compose overlay (the directory contains a `.gitkeep` placeholder and is gitignored by default).

## Security

- Both Tempo and Loki require mutual TLS.
- Prometheus uses mTLS plus a bearer token that should be minted by Authority.
- Update the overrides files to enforce per-tenant retention/ingestion limits.

For comprehensive deployment steps see `docs/ops/telemetry-storage.md`.


@@ -0,0 +1,48 @@
auth_enabled: true

server:
  http_listen_port: 3100
  log_level: info

common:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /var/loki

schema_config:
  configs:
    - from: 2024-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v13
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  filesystem:
    directory: /var/loki/chunks
  boltdb_shipper:
    active_index_directory: /var/loki/index
    cache_location: /var/loki/index_cache
    shared_store: filesystem

ruler:
  storage:
    type: local
    local:
      directory: /var/loki/rules
  rule_path: /tmp/loki-rules
  enable_api: true

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_entries_limit_per_query: 5000
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  per_tenant_override_config: /etc/telemetry/tenants/loki-overrides.yaml


@@ -0,0 +1,19 @@
global:
  scrape_interval: 15s
  evaluation_interval: 30s

scrape_configs:
  - job_name: "stellaops-otel-collector"
    scheme: https
    metrics_path: /
    tls_config:
      ca_file: ${PROMETHEUS_TLS_CA_FILE:-/etc/telemetry/tls/ca.crt}
      cert_file: ${PROMETHEUS_TLS_CERT_FILE:-/etc/telemetry/tls/client.crt}
      key_file: ${PROMETHEUS_TLS_KEY_FILE:-/etc/telemetry/tls/client.key}
      insecure_skip_verify: false
    authorization:
      type: Bearer
      credentials_file: ${PROMETHEUS_BEARER_TOKEN_FILE:-/etc/telemetry/auth/token}
    static_configs:
      - targets:
          - ${PROMETHEUS_COLLECTOR_TARGET:-stellaops-otel-collector:9464}
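
The `${VAR:-default}` placeholders follow the shell convention: use the variable when it is set and non-empty, otherwise fall back to the default. Prometheus does not, to our knowledge, expand such placeholders in its configuration file, so this file presumably passes through a rendering step that this diff does not show. The sketch below illustrates that substitution under that assumption. The `render` helper is hypothetical and only handles the `:-` form used here.

```python
import re
from typing import Mapping, Match

# Matches ${NAME:-default}; the default may be empty but cannot contain "}".
PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):-(?P<default>[^}]*)\}"
)


def render(template: str, env: Mapping[str, str]) -> str:
    """Expand ${NAME:-default} placeholders using shell ':-' semantics."""

    def substitute(match: Match[str]) -> str:
        value = env.get(match.group("name"), "")
        # ':-' falls back when the variable is unset OR set to an empty string.
        return value if value else match.group("default")

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    line = "- ${PROMETHEUS_COLLECTOR_TARGET:-stellaops-otel-collector:9464}"
    print(render(line, {}))
    print(render(line, {"PROMETHEUS_COLLECTOR_TARGET": "collector.stage:9464"}))
```

Passing `os.environ` as `env` would render the file against the current process environment.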


@@ -0,0 +1,56 @@
multitenancy_enabled: true
usage_report:
  reporting_enabled: false

server:
  http_listen_port: 3200
  log_level: info

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          tls:
            cert_file: ${TEMPO_TLS_CERT_FILE:-/etc/telemetry/tls/server.crt}
            key_file: ${TEMPO_TLS_KEY_FILE:-/etc/telemetry/tls/server.key}
            client_ca_file: ${TEMPO_TLS_CA_FILE:-/etc/telemetry/tls/ca.crt}
            require_client_cert: true
        http:
          tls:
            cert_file: ${TEMPO_TLS_CERT_FILE:-/etc/telemetry/tls/server.crt}
            key_file: ${TEMPO_TLS_KEY_FILE:-/etc/telemetry/tls/server.key}
            client_ca_file: ${TEMPO_TLS_CA_FILE:-/etc/telemetry/tls/ca.crt}
            require_client_cert: true

ingester:
  lifecycler:
    ring:
      instance_availability_zone: ${TEMPO_ZONE:-zone-a}
  trace_idle_period: 10s
  max_block_bytes: 1_048_576

compactor:
  compaction:
    block_retention: 168h

metrics_generator:
  registry:
    external_labels:
      cluster: stellaops

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal
  metrics:
    backend: prometheus

overrides:
  defaults:
    ingestion_rate_limit_bytes: 1048576
    max_traces_per_user: 200000
  per_tenant_override_config: /etc/telemetry/tenants/tempo-overrides.yaml


@@ -0,0 +1,19 @@
# Example Loki per-tenant overrides
# Adjust according to https://grafana.com/docs/loki/latest/configuration/#limits_config

stellaops-dev:
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  max_global_streams_per_user: 5000
  retention_period: 168h

stellaops-stage:
  ingestion_rate_mb: 20
  ingestion_burst_size_mb: 40
  max_global_streams_per_user: 10000
  retention_period: 336h

__default__:
  ingestion_rate_mb: 5
  ingestion_burst_size_mb: 10
  retention_period: 72h


@@ -0,0 +1,16 @@
# Example Tempo per-tenant overrides
# Consult https://grafana.com/docs/tempo/latest/configuration/#limits-configuration
# before applying in production.

stellaops-dev:
  traces_per_second_limit: 100000
  max_bytes_per_trace: 10485760
  max_search_bytes_per_trace: 20971520

stellaops-stage:
  traces_per_second_limit: 200000
  max_bytes_per_trace: 20971520

__default__:
  traces_per_second_limit: 50000
  max_bytes_per_trace: 5242880


@@ -0,0 +1,130 @@
#!/usr/bin/env python3
"""
Ensure deployment bundles reference the images defined in a release manifest.

Usage:
    ./deploy/tools/check-channel-alignment.py \
        --release deploy/releases/2025.10-edge.yaml \
        --target deploy/helm/stellaops/values-dev.yaml \
        --target deploy/compose/docker-compose.dev.yaml

For every target file, the script scans `image:` declarations and verifies that
any image belonging to a repository listed in the release manifest matches the
exact digest or tag recorded there. Images outside of the manifest (for example,
supporting services such as `nats`) are ignored.
"""

from __future__ import annotations

import argparse
import pathlib
import re
import sys
from typing import Dict, Iterable, List, Optional, Set

IMAGE_LINE = re.compile(r"^\s*image:\s*['\"]?(?P<image>\S+)['\"]?\s*$")


def extract_images(path: pathlib.Path) -> List[str]:
    images: List[str] = []
    for line in path.read_text(encoding="utf-8").splitlines():
        match = IMAGE_LINE.match(line)
        if match:
            images.append(match.group("image"))
    return images


def image_repo(image: str) -> str:
    if "@" in image:
        return image.split("@", 1)[0]
    # Split on the last colon to preserve registries with ports (e.g. localhost:5000)
    if ":" in image:
        prefix, tag = image.rsplit(":", 1)
        if "/" in tag:
            # The last colon belongs to a registry port (e.g. localhost:5000/repo),
            # so the reference carries no tag.
            return image
        return prefix
    return image


def load_release_map(release_path: pathlib.Path) -> Dict[str, str]:
    release_map: Dict[str, str] = {}
    for image in extract_images(release_path):
        repo = image_repo(image)
        release_map[repo] = image
    return release_map


def check_target(
    target_path: pathlib.Path,
    release_map: Dict[str, str],
    ignore_repos: Set[str],
) -> List[str]:
    errors: List[str] = []
    for image in extract_images(target_path):
        repo = image_repo(image)
        if repo in ignore_repos:
            continue
        if repo not in release_map:
            continue
        expected = release_map[repo]
        if image != expected:
            errors.append(
                f"{target_path}: {image} does not match release value {expected}"
            )
    return errors


def parse_args(argv: Optional[Iterable[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--release",
        required=True,
        type=pathlib.Path,
        help="Path to the release manifest (YAML)",
    )
    parser.add_argument(
        "--target",
        action="append",
        required=True,
        type=pathlib.Path,
        help="Deployment profile to validate against the release manifest",
    )
    parser.add_argument(
        "--ignore-repo",
        action="append",
        default=[],
        help="Repository prefix to ignore (may be repeated)",
    )
    return parser.parse_args(argv)


def main(argv: Optional[Iterable[str]] = None) -> int:
    args = parse_args(argv)

    release_map = load_release_map(args.release)
    ignore_repos = {repo.rstrip("/") for repo in args.ignore_repo}

    if not release_map:
        print(f"error: no images found in release manifest {args.release}", file=sys.stderr)
        return 2

    total_errors: List[str] = []
    for target in args.target:
        if not target.exists():
            total_errors.append(f"{target}: file not found")
            continue
        total_errors.extend(check_target(target, release_map, ignore_repos))

    if total_errors:
        print("✖ channel alignment check failed:", file=sys.stderr)
        for err in total_errors:
            print(f"  - {err}", file=sys.stderr)
        return 1

    print("✓ deployment profiles reference release images for the inspected repositories.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
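
To make the matching rule concrete, the sketch below copies `image_repo` from the script and exercises it on a few image references. The `mismatches` helper is a condensed, hypothetical stand-in for `load_release_map` plus `check_target` that works on in-memory lists instead of files. The `localhost:5000` and `nats` references and the `2025.10.0` tag are illustrative values, not taken from the release manifests.

```python
from typing import Dict, List


def image_repo(image: str) -> str:
    """Strip the digest or tag from an image reference (copied from the script)."""
    if "@" in image:
        return image.split("@", 1)[0]
    if ":" in image:
        prefix, tag = image.rsplit(":", 1)
        if "/" in tag:
            # The last colon is a registry port, so there is no tag to strip.
            return image
        return prefix
    return image


def mismatches(target_images: List[str], release_images: List[str]) -> List[str]:
    """Return target images whose repository is in the release but whose pin differs."""
    release_map: Dict[str, str] = {image_repo(img): img for img in release_images}
    return [
        img
        for img in target_images
        if image_repo(img) in release_map and release_map[image_repo(img)] != img
    ]


if __name__ == "__main__":
    for ref in (
        "registry.stella-ops.org/stellaops/notify-web:2025.09.2",
        "docker.io/library/mongo@sha256:c258",
        "localhost:5000/stellaops/rustfs",
        "localhost:5000/stellaops/rustfs:edge",
    ):
        print(f"{ref} -> {image_repo(ref)}")

    release = ["registry.stella-ops.org/stellaops/notify-web:2025.09.2"]
    target = [
        "registry.stella-ops.org/stellaops/notify-web:2025.10.0",  # drifted pin
        "docker.io/library/nats:2.10",  # not in the release, so ignored
    ]
    print(mismatches(target, release))
```

Because the comparison is on the full reference string, a repository pinned by tag in the release and by digest in a profile is reported as a mismatch even when both point at the same image.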


@@ -1,53 +1,61 @@
#!/usr/bin/env bash
set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
COMPOSE_DIR="$ROOT_DIR/compose"
HELM_DIR="$ROOT_DIR/helm/stellaops"

compose_profiles=(
  "docker-compose.dev.yaml:env/dev.env.example"
  "docker-compose.stage.yaml:env/stage.env.example"
  "docker-compose.airgap.yaml:env/airgap.env.example"
  "docker-compose.mirror.yaml:env/mirror.env.example"
)

docker_ready=false
if command -v docker >/dev/null 2>&1; then
  if docker compose version >/dev/null 2>&1; then
    docker_ready=true
  else
    echo "⚠️ docker CLI present but Compose plugin unavailable; skipping compose validation" >&2
  fi
else
  echo "⚠️ docker CLI not found; skipping compose validation" >&2
fi

if [[ "$docker_ready" == "true" ]]; then
  for entry in "${compose_profiles[@]}"; do
    IFS=":" read -r compose_file env_file <<<"$entry"
    printf '→ validating %s with %s\n' "$compose_file" "$env_file"
    docker compose \
      --env-file "$COMPOSE_DIR/$env_file" \
      -f "$COMPOSE_DIR/$compose_file" config >/dev/null
  done
fi

helm_values=(
  "$HELM_DIR/values-dev.yaml"
  "$HELM_DIR/values-stage.yaml"
  "$HELM_DIR/values-airgap.yaml"
  "$HELM_DIR/values-mirror.yaml"
)

if command -v helm >/dev/null 2>&1; then
  for values in "${helm_values[@]}"; do
    printf '→ linting Helm chart with %s\n' "$(basename "$values")"
    helm lint "$HELM_DIR" -f "$values"
    helm template test-release "$HELM_DIR" -f "$values" >/dev/null
  done
else
  echo "⚠️ helm CLI not found; skipping Helm lint/template" >&2
fi

printf 'Profiles validated (where tooling was available).\n'
#!/usr/bin/env bash
set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
COMPOSE_DIR="$ROOT_DIR/compose"
HELM_DIR="$ROOT_DIR/helm/stellaops"

compose_profiles=(
  "docker-compose.dev.yaml:env/dev.env.example"
  "docker-compose.stage.yaml:env/stage.env.example"
  "docker-compose.prod.yaml:env/prod.env.example"
  "docker-compose.airgap.yaml:env/airgap.env.example"
  "docker-compose.mirror.yaml:env/mirror.env.example"
  "docker-compose.telemetry.yaml:"
  "docker-compose.telemetry-storage.yaml:"
)

docker_ready=false
if command -v docker >/dev/null 2>&1; then
  if docker compose version >/dev/null 2>&1; then
    docker_ready=true
  else
    echo "⚠️ docker CLI present but Compose plugin unavailable; skipping compose validation" >&2
  fi
else
  echo "⚠️ docker CLI not found; skipping compose validation" >&2
fi

if [[ "$docker_ready" == "true" ]]; then
  for entry in "${compose_profiles[@]}"; do
    IFS=":" read -r compose_file env_file <<<"$entry"
    printf '→ validating %s with %s\n' "$compose_file" "$env_file"
    if [[ -n "$env_file" ]]; then
      docker compose \
        --env-file "$COMPOSE_DIR/$env_file" \
        -f "$COMPOSE_DIR/$compose_file" config >/dev/null
    else
      docker compose -f "$COMPOSE_DIR/$compose_file" config >/dev/null
    fi
  done
fi

helm_values=(
  "$HELM_DIR/values-dev.yaml"
  "$HELM_DIR/values-stage.yaml"
  "$HELM_DIR/values-prod.yaml"
  "$HELM_DIR/values-airgap.yaml"
  "$HELM_DIR/values-mirror.yaml"
)

if command -v helm >/dev/null 2>&1; then
  for values in "${helm_values[@]}"; do
    printf '→ linting Helm chart with %s\n' "$(basename "$values")"
    helm lint "$HELM_DIR" -f "$values"
    helm template test-release "$HELM_DIR" -f "$values" >/dev/null
  done
else
  echo "⚠️ helm CLI not found; skipping Helm lint/template" >&2
fi

printf 'Profiles validated (where tooling was available).\n'