devops folders consolidate
@@ -2,34 +2,44 @@

This directory contains operational tooling, deployment configurations, and CI/CD support for StellaOps.

## Infrastructure Stack

| Component | Technology | Purpose |
|-----------|------------|---------|
| Database | PostgreSQL 18.1 | Primary data store |
| Messaging/Cache | Valkey 9.0.1 | Queues, caching, pub/sub |
| Object Storage | RustFS | S3-compatible storage |
| Transparency Log | Rekor v2 | Sigstore transparency |

## Directory Structure

```
devops/
├── ansible/                    # Ansible playbooks for deployment automation
├── compose/                    # Docker Compose configurations
├── compose/                    # Docker Compose configurations (consolidated)
│   ├── docker-compose.stella-ops.yml      # Main stack
│   ├── docker-compose.telemetry.yml       # Observability stack
│   ├── docker-compose.testing.yml         # CI/testing services
│   └── docker-compose.compliance-*.yml    # Regional crypto overlays
├── database/                   # Database schemas and migrations
│   ├── mongo/                  # MongoDB (deprecated)
│   └── postgres/               # PostgreSQL schemas
│   ├── migrations/             # Schema migration scripts
│   └── postgres/               # PostgreSQL configuration
├── docker/                     # Dockerfiles and container build scripts
│   ├── Dockerfile.ci           # CI runner environment
│   └── base/                   # Base images
│   └── repro-builders/         # Reproducible build containers
├── docs/                       # This documentation
├── gitlab/                     # GitLab CI templates (legacy)
├── helm/                       # Helm charts for Kubernetes deployment
│   └── stellaops/              # Main Helm chart with env-specific values
├── logging/                    # Logging configuration templates
│   ├── serilog.json.template   # Serilog config for .NET services
│   ├── filebeat.yml            # Filebeat for log shipping
│   └── logrotate.conf          # Log rotation configuration
├── observability/              # Monitoring, metrics, and tracing
├── observability/              # Monitoring, alerting, and dashboards
├── offline/                    # Air-gap deployment support
│   ├── airgap/                 # Air-gap bundle scripts
│   └── kit/                    # Offline installation kit
├── releases/                   # Release artifacts and manifests
├── scripts/                    # Operational scripts
├── scripts/                    # Operational scripts and libraries
├── services/                   # Per-service operational configs
├── telemetry/                  # OpenTelemetry and metrics configs
└── tools/                      # DevOps tooling
├── telemetry/                  # OpenTelemetry collector and storage
└── tools/                      # DevOps tooling and helpers
```

## Quick Start
@@ -9,8 +9,8 @@ This directory contains deterministic deployment bundles for the core Stella Ops
- `compose/docker-compose.mirror.yaml` – managed mirror bundle for `*.stella-ops.org` with gateway cache and multi-tenant auth.
- `compose/docker-compose.telemetry.yaml` – optional OpenTelemetry collector overlay (mutual TLS, OTLP pipelines).
- `compose/docker-compose.telemetry-storage.yaml` – optional Prometheus/Tempo/Loki stack for observability backends.
- `helm/stellaops/` – multi-profile Helm chart with values files for dev/stage/airgap.
- `helm/stellaops/INSTALL.md` – install/runbook for prod and airgap profiles with digest pins.
- `telemetry/` – shared OpenTelemetry collector configuration and certificate artefacts (generated via tooling).
- `tools/validate-profiles.sh` – helper that runs `docker compose config` and `helm lint/template` for every profile.

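As a rough illustration of the kind of check `tools/validate-profiles.sh` is described as performing, the Python sketch below builds the `docker compose config` and `helm lint` / `helm template` invocations for a set of profiles. It is not the script itself: the function names, the `stellaops` release name, and the example values file are assumptions made for illustration.

```python
import subprocess


def build_validation_commands(compose_files, chart_dir, values_files):
    """Return the CLI invocations that would validate each profile.

    All arguments are caller-supplied paths; nothing here is read from the
    repository, so file names are assumptions of the caller.
    """
    commands = []
    for compose_file in compose_files:
        # `config --quiet` validates the file without printing the result.
        commands.append(
            ["docker", "compose", "-f", str(compose_file), "config", "--quiet"]
        )
    for values_file in values_files:
        commands.append(
            ["helm", "lint", str(chart_dir), "--values", str(values_file)]
        )
        # "stellaops" is a hypothetical release name used only for rendering.
        commands.append(
            ["helm", "template", "stellaops", str(chart_dir),
             "--values", str(values_file)]
        )
    return commands


def run_all(commands):
    """Run every command and return the ones that exited non-zero."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(cmd)
    return failures
```

Keeping command construction separate from execution makes the list easy to inspect or log before anything touches Docker or Helm.
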
@@ -24,37 +24,30 @@ This directory contains deterministic deployment bundles for the core Stella Ops
   `python ./ops/devops/telemetry/smoke_otel_collector.py` to verify the OTLP endpoints.
5. Commit the change alongside any documentation updates (e.g. install guide cross-links).

Maintaining the digest linkage keeps offline/air-gapped installs reproducible and avoids tag drift between environments.

### Surface.Env rollout warnings

- Compose (`deploy/compose/env/*.env.example`) and Helm (`deploy/helm/stellaops/values-*.yaml`) now seed `SCANNER_SURFACE_*` _and_ `ZASTAVA_SURFACE_*` variables so Scanner Worker/WebService and Zastava Observer/Webhook resolve cache roots, Surface.FS endpoints, and secrets providers through `StellaOps.Scanner.Surface.Env`.
- During rollout, watch for structured log messages (and readiness output) prefixed with `surface.env.`, for example `surface.env.cache_root_missing`, `surface.env.endpoint_unreachable`, or `surface.env.secrets_provider_invalid`.
- Treat these warnings as deployment blockers: update the endpoint/cache/secrets values or permissions before promoting the environment, otherwise workers will fail fast at startup.
- Air-gapped bundles default the secrets provider to `file` with `/etc/stellaops/secrets`; connected clusters default to `kubernetes`. Adjust the provider/root pair if your secrets manager differs.
- Secret provisioning workflows for Kubernetes/Compose/Offline Kit are documented in `ops/devops/secrets/surface-secrets-provisioning.md`; follow that for `Surface.Secrets` handles and RBAC/permissions.

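One way to enforce the "deployment blocker" rule above is to scan collected log lines for `surface.env.` events before promoting an environment. The sketch below is a minimal, hypothetical gate: it assumes only that the event name appears somewhere in each log line as plain text, and the function names are illustrative rather than part of StellaOps.

```python
import re
from collections import Counter

# Matches event names such as surface.env.cache_root_missing anywhere in a line.
SURFACE_ENV_EVENT = re.compile(r"\bsurface\.env\.[a-z0-9_]+")


def find_surface_env_warnings(log_lines):
    """Count every surface.env.* event name seen in the given log lines."""
    counts = Counter()
    for line in log_lines:
        for event in SURFACE_ENV_EVENT.findall(line):
            counts[event] += 1
    return counts


def is_promotable(log_lines):
    """Treat any surface.env.* warning as a blocker for promotion."""
    return not find_surface_env_warnings(log_lines)
```

A rollout job could feed it the output of a log query and fail the pipeline when `is_promotable` returns `False`, printing the counter so operators see which setting to fix.
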
### Mongo2Go OpenSSL prerequisites

- Linux runners that execute Mongo2Go-backed suites (Excititor, Scheduler, Graph, etc.) must expose OpenSSL 1.1 (`libcrypto.so.1.1`, `libssl.so.1.1`). The canonical copies live under `tests/native/openssl-1.1/linux-x64`.
- Export `LD_LIBRARY_PATH="$(git rev-parse --show-toplevel)/tests/native/openssl-1.1/linux-x64:${LD_LIBRARY_PATH:-}"` before invoking `dotnet test`. Example:\
  `LD_LIBRARY_PATH="$(pwd)/tests/native/openssl-1.1/linux-x64" dotnet test src/Excititor/__Tests/StellaOps.Excititor.WebService.Tests/StellaOps.Excititor.WebService.Tests.csproj --nologo`.
- CI agents or Dockerfiles that host these tests should either mount the directory into the container or copy the two `.so` files into a directory that is already on the runtime library path.

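A test wrapper can check the prerequisite before spending time on a test run. The following sketch, with hypothetical helper names, verifies that both shared objects exist in a given directory and prepares an environment with that directory prepended to `LD_LIBRARY_PATH`. Unlike the shell export above, it omits the trailing colon when the variable was previously unset.

```python
import os
from pathlib import Path

# The two libraries the Mongo2Go-backed suites are stated to need.
REQUIRED_LIBS = ("libcrypto.so.1.1", "libssl.so.1.1")


def missing_openssl_libs(lib_dir):
    """Return the required library file names absent from lib_dir."""
    lib_dir = Path(lib_dir)
    return [name for name in REQUIRED_LIBS if not (lib_dir / name).is_file()]


def env_with_openssl(lib_dir, base_env=None):
    """Copy the environment and prepend lib_dir to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = f"{lib_dir}:{existing}" if existing else str(lib_dir)
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run(["dotnet", "test", ...])` once `missing_openssl_libs` returns an empty list.
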
### Additional tooling

- `deploy/tools/check-channel-alignment.py` – verifies that Helm/Compose profiles reference the exact images listed in a release manifest. Run it for each channel before promoting a release.
- `ops/devops/telemetry/generate_dev_tls.sh` – produces local CA/server/client certificates for Compose-based collector testing.
- `ops/devops/telemetry/smoke_otel_collector.py` – sends OTLP traffic and asserts the collector accepted traces, metrics, and logs.
- `ops/devops/telemetry/package_offline_bundle.py` – packages telemetry assets (config/Helm/Compose) into a signed tarball for air-gapped installs.
- `docs/modules/devops/runbooks/deployment-upgrade.md` – end-to-end instructions for upgrade, rollback, and channel promotion workflows (Helm + Compose).

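The idea behind a channel-alignment check can be sketched in a few lines. The example below is not `check-channel-alignment.py`; it assumes the image references have already been extracted from a profile and from a release manifest as plain strings, and it flags references that are not digest-pinned or that do not appear in the release. The function name and the report keys are hypothetical.

```python
def check_alignment(profile_images, release_images):
    """Compare a profile's image references against a release manifest.

    Both arguments are iterables of image reference strings. How they are
    extracted from Helm values, Compose files, or the manifest is left out.
    """
    release = set(release_images)
    profile = sorted(set(profile_images))
    return {
        # References without a digest can drift when a tag is re-pushed.
        "unpinned": [img for img in profile if "@sha256:" not in img],
        # Anything not listed in the release manifest is misaligned.
        "not_in_release": [img for img in profile if img not in release],
    }
```

An empty list under both keys would mean the profile references only digest-pinned images that the release manifest also lists.
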
### Tenancy observability & chaos (DEVOPS-TEN-49-001)

- Import `ops/devops/tenant/recording-rules.yaml` and `ops/devops/tenant/alerts.yaml` into your Prometheus rule groups.
- Add Grafana dashboard `ops/devops/tenant/dashboards/tenant-audit.json` (folder `StellaOps / Tenancy`) to watch latency/error/auth cache ratios per tenant/service.
- Run the multi-tenant k6 harness `ops/devops/tenant/k6-tenant-load.js` to hit 5k concurrent tenant-labelled requests (defaults to read/write 90/10, header `X-StellaOps-Tenant`).
- Execute JWKS outage chaos via `ops/devops/tenant/jwks-chaos.sh` on an isolated agent with sudo/iptables; watch alerts `jwks_cache_miss_spike` and `tenant_auth_failures_spike` while load is active.
## CI smoke checks