devops folders consolidate

This commit is contained in:
master
2026-01-25 23:27:41 +02:00
parent 6e687b523a
commit a50bbb38ef
334 changed files with 35079 additions and 5569 deletions

@@ -1,46 +1,39 @@
# Orchestrator Infra Bootstrap (DEVOPS-ORCH-32-001)
## Components
- Postgres 16 (state/config)
- Mongo 7 (job ledger history)
- NATS 2.10 JetStream (queue/bus)
- PostgreSQL 18.1 (state/config/job ledger)
- Valkey 9.0.1 (queue/bus/cache)
Compose file: `ops/devops/orchestrator/docker-compose.orchestrator.yml`
Compose file: `devops/compose/docker-compose.stella-ops.yml`
## Quick start (offline-friendly)
```bash
# bring up infra
COMPOSE_FILE=ops/devops/orchestrator/docker-compose.orchestrator.yml docker compose up -d
docker compose -f devops/compose/docker-compose.stella-ops.yml up -d stellaops-postgres stellaops-valkey
# smoke check and emit connection strings
scripts/orchestrator/smoke.sh
devops/tools/orchestrator-scripts/smoke.sh
cat out/orchestrator-smoke/readiness.txt
# synthetic probe (postgres/mongo/nats health)
scripts/orchestrator/probe.sh
# synthetic probe (postgres/valkey health)
devops/tools/orchestrator-scripts/probe.sh
cat out/orchestrator-probe/status.txt
# replay readiness (restart then smoke)
scripts/orchestrator/replay-smoke.sh
```
Connection strings
- Postgres: `postgres://orch:orchpass@localhost:55432/orchestrator`
- Mongo: `mongodb://localhost:57017`
- NATS: `nats://localhost:4222`
- Postgres: `postgres://stellaops:stellaops@localhost:5432/stellaops`
- Valkey: `valkey://localhost:6379`
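As a rough illustration of how the connection strings above break down, the following sketch extracts the host and port from a URL so that a script could hand them to a readiness tool. The function name `parse_host_port` is hypothetical and not part of the repository; it assumes URLs of the form `scheme://[user:pass@]host:port[/path]`.

```bash
# Hypothetical helper (not in the repo): print "host port" for a connection URL.
# Assumes the shape scheme://[user:pass@]host:port[/path].
parse_host_port() {
  local url="$1"
  local rest="${url#*://}"   # drop the scheme
  rest="${rest##*@}"         # drop credentials, if any
  rest="${rest%%/*}"         # drop the path or database name, if any
  printf '%s %s\n' "${rest%%:*}" "${rest##*:}"
}

parse_host_port 'postgres://stellaops:stellaops@localhost:5432/stellaops'  # prints: localhost 5432
parse_host_port 'valkey://localhost:6379'                                  # prints: localhost 6379
```

The resulting pair could then be passed to something like `pg_isready -h <host> -p <port>` or `valkey-cli -h <host> -p <port> ping`, if those clients are installed on the machine running the check.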
## Observability
- Alerts: `ops/devops/orchestrator/alerts.yaml`
- Grafana dashboard: `ops/devops/orchestrator/grafana/orchestrator-overview.json`
- Alerts: `devops/observability/alerting/`
- Grafana dashboard: `devops/observability/dashboards/`
- Metrics expected: `job_queue_depth`, `job_failures_total`, `lease_extensions_total`, `job_latency_seconds_bucket`.
- Runbook: `ops/devops/orchestrator/incident-response.md`
- Synthetic probes: `scripts/orchestrator/probe.sh` (writes `out/orchestrator-probe/status.txt`).
- Replay smoke: `scripts/orchestrator/replay-smoke.sh` (idempotent restart + smoke).
- Synthetic probes: `devops/tools/orchestrator-scripts/probe.sh` (writes `out/orchestrator-probe/status.txt`).
## CI hook (suggested)
Add a workflow step (or local cron) to run `scripts/orchestrator/smoke.sh` with `SKIP_UP=1` against existing infra and publish the `readiness.txt` artifact for traceability.
Add a workflow step (or local cron) to run `devops/tools/orchestrator-scripts/smoke.sh` with `SKIP_UP=1` against existing infra and publish the `readiness.txt` artifact for traceability.
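A minimal sketch of what such a step might run, assuming the smoke script honours `SKIP_UP=1` as described and writes `out/orchestrator-smoke/readiness.txt` (the path used in the quick start). The helper names `require_artifact` and `ci_smoke` are hypothetical, and publishing the artifact is left to whichever CI system is in use.

```bash
# Hypothetical helper: succeed only if the artifact exists and is non-empty.
require_artifact() {
  local path="$1"
  if [ ! -s "$path" ]; then
    echo "missing or empty artifact: $path" >&2
    return 1
  fi
}

# Hypothetical CI entry point: smoke-test existing infra, then verify the artifact.
# Paths are taken from the README; adjust them if the layout differs.
ci_smoke() {
  SKIP_UP=1 devops/tools/orchestrator-scripts/smoke.sh || return 1
  require_artifact out/orchestrator-smoke/readiness.txt
}

# Usage (from the repository root): ci_smoke || exit 1
```

Checking that the file is non-empty before publishing keeps a silently failing smoke run from producing an empty but "successful" artifact.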
## Notes
- Uses fixed ports for determinism; adjust via COMPOSE overrides if needed.
- Data volumes: `orch_pg_data`, `orch_mongo_data` (docker volumes).
- Data volumes: `stellaops-postgres`, `stellaops-valkey` (docker volumes).
- No external downloads beyond base images; pin images to specific tags above.

@@ -1,4 +1,14 @@
version: "3.9"
# =============================================================================
# ORCHESTRATOR - LOCAL DEVELOPMENT INFRASTRUCTURE
# =============================================================================
# Infrastructure services for Orchestrator local development.
#
# Usage:
# docker compose -f docker-compose.orchestrator.yml up -d
#
# For production, use compose/docker-compose.stella-ops.yml instead.
# =============================================================================
services:
  orchestrator-postgres:
    image: postgres:18.1-alpine
@@ -17,28 +27,15 @@ services:
      retries: 5
    restart: unless-stopped
  orchestrator-mongo:
    image: mongo:7
    command: ["mongod", "--quiet", "--storageEngine=wiredTiger"]
  orchestrator-valkey:
    image: valkey/valkey:9.0.1-alpine
    ports:
      - "57017:27017"
      - "56379:6379"
    command: ["valkey-server", "--appendonly", "yes"]
    volumes:
      - orch_mongo_data:/data/db
      - orch_valkey_data:/data
    healthcheck:
      test: ["CMD", "mongosh", "--quiet", "--eval", "db.adminCommand('ping')"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
  orchestrator-nats:
    image: nats:2.10-alpine
    ports:
      - "5422:4222"
      - "5822:8222"
    command: ["-js", "-m", "8222"]
    healthcheck:
      test: ["CMD", "nats", "--server", "localhost:4222", "ping"]
      test: ["CMD", "valkey-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
@@ -46,5 +43,4 @@ services:
volumes:
  orch_pg_data:
  orch_mongo_data:
  orch_valkey_data:
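
The healthchecks above poll with a fixed interval and a bounded number of retries. A script that waits for the stack from the outside could mirror that pattern; the sketch below is one way to do it, with a hypothetical `retry` helper that is not part of the repository.

```bash
# Hypothetical helper: run a command up to <attempts> times,
# sleeping <delay> seconds between failed attempts.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (not executed here): wait for Valkey using the same 5 x 10s budget.
# retry 5 10 docker compose -f docker-compose.orchestrator.yml \
#   exec -T orchestrator-valkey valkey-cli ping
```

Reusing the same retry budget as the compose healthcheck keeps the external wait and the container's own health status roughly in step, though the exact values are a judgment call.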