
Deployment Profiles

This directory contains deterministic deployment bundles for the core Stella Ops stack. All manifests reference immutable image digests and map 1:1 to the release manifests stored under deploy/releases/.

Structure

  • releases/: canonical release manifests (edge, stable, airgap) used to source image digests.
  • compose/: Docker Compose bundles for dev/stage/airgap targets plus .env seed files.
  • compose/docker-compose.mirror.yaml: managed mirror bundle for *.stella-ops.org with gateway cache and multi-tenant auth.
  • compose/docker-compose.telemetry.yaml: optional OpenTelemetry collector overlay (mutual TLS, OTLP pipelines).
  • compose/docker-compose.telemetry-storage.yaml: optional Prometheus/Tempo/Loki stack for observability backends.
  • helm/stellaops/: multi-profile Helm chart with values files for dev/stage/airgap.
  • helm/stellaops/INSTALL.md: install guide and runbook for prod and airgap profiles with digest pins.
  • telemetry/: shared OpenTelemetry collector configuration and certificate artefacts (generated via tooling).
  • tools/validate-profiles.sh: helper that runs docker compose config and helm lint/template for every profile.

Workflow

  1. Update or add a release manifest under releases/ with the new digests.
  2. Mirror the digests into the Compose and Helm profiles that correspond to that channel.
  3. Run deploy/tools/validate-profiles.sh (requires Docker CLI and Helm) to ensure the bundles lint and template cleanly.
  4. If telemetry ingest is required for the release, generate development certificates using ./ops/devops/telemetry/generate_dev_tls.sh and run the collector smoke test with python ./ops/devops/telemetry/smoke_otel_collector.py to verify the OTLP endpoints.
  5. Commit the change alongside any documentation updates (e.g. install guide cross-links).

Maintaining the digest linkage keeps offline/air-gapped installs reproducible and avoids tag drift between environments.
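
As an illustration of the digest linkage described above, the following minimal sketch flags image references that are not pinned to a sha256 digest. It is not part of the repository tooling: the function name unpinned_images and the line-based matching of image: entries are assumptions made for the example, and a real check would parse the YAML properly.

```python
import re

# Matches a Compose/Helm style "image:" line and captures the reference.
# Line-based matching is an assumption for this sketch, not a YAML parser.
IMAGE_LINE = re.compile(
    r"^[ \t]*(?:-[ \t]*)?image:[ \t]*[\"']?([^\s\"'#]+)", re.MULTILINE
)
# A digest-pinned reference ends in @sha256:<64 lowercase hex characters>.
DIGEST_REF = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(manifest_text: str) -> list[str]:
    """Return image references that are not pinned to a sha256 digest."""
    refs = IMAGE_LINE.findall(manifest_text)
    return [ref for ref in refs if not DIGEST_REF.search(ref)]
```

Calling unpinned_images on the text of a Compose file or a rendered Helm template returns an empty list when every image is digest-pinned; any tag-only reference (for example one ending in :latest) is returned so it can be fixed before commit.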

Surface.Env rollout warnings

  • Compose (deploy/compose/env/*.env.example) and Helm (deploy/helm/stellaops/values-*.yaml) now seed SCANNER_SURFACE_* and ZASTAVA_SURFACE_* variables so Scanner Worker/WebService and Zastava Observer/Webhook resolve cache roots, Surface.FS endpoints, and secrets providers through StellaOps.Scanner.Surface.Env.
  • During rollout, watch for structured log messages (and readiness output) whose codes start with surface.env. (for example, surface.env.cache_root_missing, surface.env.endpoint_unreachable, or surface.env.secrets_provider_invalid).
  • Treat these warnings as deployment blockers: update the endpoint/cache/secrets values or permissions before promoting the environment; otherwise, workers will fail fast at startup.
  • Air-gapped bundles default the secrets provider to file with /etc/stellaops/secrets; connected clusters default to kubernetes. Adjust the provider/root pair if your secrets manager differs.
  • Secret provisioning workflows for Kubernetes/Compose/Offline Kit are documented in ops/devops/secrets/surface-secrets-provisioning.md; follow that for Surface.Secrets handles and RBAC/permissions.
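
A rollout gate for the warnings above could be sketched as follows. This is a hedged example: it assumes only that the surface.env. codes appear verbatim somewhere in each log line; the helper names surface_env_warnings and is_blocked are hypothetical, and the actual log schema is not specified here.

```python
import re
from collections import Counter

# Codes such as surface.env.cache_root_missing. The character set allowed
# after the prefix is an assumption made for this sketch.
SURFACE_ENV_CODE = re.compile(r"\bsurface\.env\.[a-z0-9_]+")


def surface_env_warnings(log_lines) -> Counter:
    """Count each surface.env.* code seen in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        counts.update(SURFACE_ENV_CODE.findall(line))
    return counts


def is_blocked(log_lines) -> bool:
    """Treat any surface.env.* message as a deployment blocker."""
    return bool(surface_env_warnings(log_lines))
```

Feeding the function the captured worker logs for a rollout gives a per-code count, which makes it easier to see whether the problem is a cache root, an endpoint, or the secrets provider before promoting the environment.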

Mongo2Go OpenSSL prerequisites

  • Linux runners that execute Mongo2Go-backed suites (Excititor, Scheduler, Graph, etc.) must expose OpenSSL 1.1 (libcrypto.so.1.1, libssl.so.1.1). The canonical copies live under tests/native/openssl-1.1/linux-x64.
  • Export LD_LIBRARY_PATH="$(git rev-parse --show-toplevel)/tests/native/openssl-1.1/linux-x64:${LD_LIBRARY_PATH:-}" before invoking dotnet test. Example (run from the repository root, because it relies on $(pwd)):
    LD_LIBRARY_PATH="$(pwd)/tests/native/openssl-1.1/linux-x64" dotnet test src/Excititor/__Tests/StellaOps.Excititor.WebService.Tests/StellaOps.Excititor.WebService.Tests.csproj --nologo
  • CI agents or Dockerfiles that host these tests should either mount the directory into the container or copy the two .so files into a directory that is already on the runtime library path.
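
For test wrappers written in Python, the prerequisite above might be checked along these lines. This is a sketch under stated assumptions: missing_openssl_libs and with_openssl_path are hypothetical helper names, and only the two library file names come from this document.

```python
import os
from pathlib import Path

# The two shared objects named in the prerequisites above.
REQUIRED_LIBS = ("libcrypto.so.1.1", "libssl.so.1.1")


def missing_openssl_libs(lib_dir) -> list[str]:
    """Return the required OpenSSL 1.1 files that are absent from lib_dir."""
    lib_dir = Path(lib_dir)
    return [name for name in REQUIRED_LIBS if not (lib_dir / name).is_file()]


def with_openssl_path(lib_dir, env=None) -> dict:
    """Copy env and prepend lib_dir to LD_LIBRARY_PATH, keeping existing entries."""
    env = dict(os.environ if env is None else env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Only join with ':' when there is an existing value, to avoid a trailing colon.
    env["LD_LIBRARY_PATH"] = f"{lib_dir}:{existing}" if existing else str(lib_dir)
    return env
```

A wrapper can refuse to start when missing_openssl_libs returns a non-empty list, and otherwise pass the result of with_openssl_path as the env argument of subprocess.run when launching dotnet test.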

Additional tooling

  • deploy/tools/check-channel-alignment.py verifies that Helm/Compose profiles reference the exact images listed in a release manifest. Run it for each channel before promoting a release.
  • ops/devops/telemetry/generate_dev_tls.sh produces local CA/server/client certificates for Compose-based collector testing.
  • ops/devops/telemetry/smoke_otel_collector.py sends OTLP traffic and asserts the collector accepted traces, metrics, and logs.
  • ops/devops/telemetry/package_offline_bundle.py packages telemetry assets (config/Helm/Compose) into a signed tarball for air-gapped installs.
  • docs/modules/devops/runbooks/deployment-upgrade.md: end-to-end instructions for upgrade, rollback, and channel promotion workflows (Helm + Compose).
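
The idea behind a channel alignment check can be illustrated with a set comparison. The sketch below is not the logic of check-channel-alignment.py, whose internals are not described here; alignment_report and is_aligned are hypothetical names, and the inputs are assumed to be plain lists of image references already extracted from the release manifest and from a profile.

```python
def alignment_report(release_images, profile_images) -> dict:
    """Compare two collections of image references as sets."""
    release, profile = set(release_images), set(profile_images)
    return {
        # Listed in the release manifest but absent from the profile.
        "missing_from_profile": sorted(release - profile),
        # Used by the profile but not listed in the release manifest.
        "not_in_release": sorted(profile - release),
    }


def is_aligned(release_images, profile_images) -> bool:
    """True when both sides reference exactly the same images."""
    report = alignment_report(release_images, profile_images)
    return not report["missing_from_profile"] and not report["not_in_release"]
```

Comparing exact strings is deliberate: with digest-pinned references, any difference in the string means a different image, so no fuzzy matching is needed.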

Tenancy observability & chaos (DEVOPS-TEN-49-001)

  • Import ops/devops/tenant/recording-rules.yaml and ops/devops/tenant/alerts.yaml into your Prometheus rule groups.
  • Add Grafana dashboard ops/devops/tenant/dashboards/tenant-audit.json (folder StellaOps / Tenancy) to watch latency/error/auth cache ratios per tenant/service.
  • Run the multi-tenant k6 harness ops/devops/tenant/k6-tenant-load.js to hit 5k concurrent tenant-labelled requests (defaults to read/write 90/10, header X-StellaOps-Tenant).
  • Execute JWKS outage chaos via ops/devops/tenant/jwks-chaos.sh on an isolated agent with sudo/iptables; watch alerts jwks_cache_miss_spike and tenant_auth_failures_spike while load is active.

CI smoke checks

The .gitea/workflows/build-test-deploy.yml pipeline includes a notify-smoke stage that validates scanner event propagation after staging deployments. Configure the following repository secrets (or environment-level secrets) so the job can connect to Redis and the Notify API:

  • NOTIFY_SMOKE_REDIS_DSN: Redis connection string (redis://user:pass@host:port/db).
  • NOTIFY_SMOKE_NOTIFY_BASEURL: base URL for the staging Notify WebService (e.g. https://notify.stage.stella-ops.internal).
  • NOTIFY_SMOKE_NOTIFY_TOKEN: OAuth bearer token (service account) with permission to read deliveries.
  • NOTIFY_SMOKE_NOTIFY_TENANT: tenant identifier used for the smoke validation requests.
  • (Optional) NOTIFY_SMOKE_NOTIFY_TENANT_HEADER: override for the tenant header name (defaults to X-StellaOps-Tenant).

Define the following repository variables (or secrets) to drive the assertions performed by the smoke check:

  • NOTIFY_SMOKE_EXPECT_KINDS: comma-separated event kinds the checker must observe (for example scanner.report.ready,scanner.scan.completed).
  • NOTIFY_SMOKE_LOOKBACK_MINUTES: time window (in minutes) used when scanning the Redis stream for recent events (for example 30).

All of the above values are required, except the optional tenant header override. The workflow fails fast with a descriptive error if any required value is missing or empty. Provide the secrets and variables at the organisation or repository scope before enabling the smoke stage.
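
The fail-fast behaviour described above can be sketched as follows. This is an illustrative example rather than the workflow's actual implementation: load_smoke_settings and the keys of the returned dictionary are hypothetical, while the variable names and the default header come from this section.

```python
import os

# Required settings, as listed in this section.
REQUIRED = (
    "NOTIFY_SMOKE_REDIS_DSN",
    "NOTIFY_SMOKE_NOTIFY_BASEURL",
    "NOTIFY_SMOKE_NOTIFY_TOKEN",
    "NOTIFY_SMOKE_NOTIFY_TENANT",
    "NOTIFY_SMOKE_EXPECT_KINDS",
    "NOTIFY_SMOKE_LOOKBACK_MINUTES",
)
DEFAULT_TENANT_HEADER = "X-StellaOps-Tenant"


def load_smoke_settings(env=None) -> dict:
    """Validate the smoke-check settings, raising ValueError if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        raise ValueError("missing or empty settings: " + ", ".join(missing))
    header = env.get("NOTIFY_SMOKE_NOTIFY_TENANT_HEADER", "").strip()
    kinds = env["NOTIFY_SMOKE_EXPECT_KINDS"].split(",")
    return {
        "redis_dsn": env["NOTIFY_SMOKE_REDIS_DSN"],
        "base_url": env["NOTIFY_SMOKE_NOTIFY_BASEURL"],
        "token": env["NOTIFY_SMOKE_NOTIFY_TOKEN"],
        "tenant": env["NOTIFY_SMOKE_NOTIFY_TENANT"],
        # The optional override falls back to the documented default header.
        "tenant_header": header or DEFAULT_TENANT_HEADER,
        "expect_kinds": [k.strip() for k in kinds if k.strip()],
        "lookback_minutes": int(env["NOTIFY_SMOKE_LOOKBACK_MINUTES"]),
    }
```

Reporting every missing name in a single error, instead of stopping at the first, lets an operator fix the whole configuration in one pass.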