# Telemetry Collector Deployment Guide

> **Scope:** DevOps Guild, Observability Guild, and operators enabling the StellaOps telemetry pipeline (DEVOPS-OBS-50-001 / DEVOPS-OBS-50-003).

This guide describes how to deploy the default OpenTelemetry Collector packaged with Stella Ops, validate its ingest endpoints, and prepare an offline-ready bundle for air-gapped environments.

---

## 1. Overview

The collector terminates OTLP traffic from Stella Ops services and exports metrics, traces, and logs.

| Endpoint | Purpose | TLS | Authentication |
| -------- | ------- | --- | -------------- |
| `:4317`  | OTLP gRPC ingest | mTLS | Client certificate issued by collector CA |
| `:4318`  | OTLP HTTP ingest | mTLS | Client certificate issued by collector CA |
| `:9464`  | Prometheus scrape | mTLS | Same client certificate |
| `:13133` | Health check | mTLS | Same client certificate |
| `:1777`  | pprof diagnostics | mTLS | Same client certificate |

The default configuration lives at `deploy/telemetry/otel-collector-config.yaml` and mirrors the Helm values in the `stellaops` chart.

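To spot-check the non-ingest endpoints from a machine that holds a client certificate, a small shell function can help. This is a minimal sketch, assuming the health check is served at `/healthz` (the path used in the operational checks), the Prometheus exporter at `/metrics`, pprof at `/debug/pprof/`, and that the collector certificate is valid for the host name you pass. `probe_collector` and the `CLIENT_CERT`/`CLIENT_KEY`/`CA_CERT` variables are hypothetical, not part of the Stella Ops tooling.

```bash
# Hypothetical helper: probe the collector's health, metrics, and pprof
# endpoints over mTLS and report each result. The URL paths are assumptions;
# align them with deploy/telemetry/otel-collector-config.yaml.
probe_collector() {
  local host="${1:-localhost}" failed=0 url
  local cert="${CLIENT_CERT:-client.crt}"   # client certificate issued by the collector CA
  local key="${CLIENT_KEY:-client.key}"     # matching private key
  local ca="${CA_CERT:-ca.crt}"             # CA that signed the collector's certificate
  for url in \
    "https://${host}:13133/healthz" \
    "https://${host}:9464/metrics" \
    "https://${host}:1777/debug/pprof/"
  do
    # -f: fail on HTTP errors; -sS: silent except for error messages.
    if curl -fsS -o /dev/null --cert "$cert" --key "$key" --cacert "$ca" "$url"; then
      echo "ok   $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, point `CLIENT_CERT`, `CLIENT_KEY`, and `CA_CERT` at the files your certificate tooling produced and run `probe_collector localhost`; the function returns non-zero if any endpoint fails.
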
---

## 2. Local validation (Compose)

```bash
# 1. Generate dev certificates (CA + collector + client)
./ops/devops/telemetry/generate_dev_tls.sh

# 2. Start the collector overlay
cd deploy/compose
docker compose -f docker-compose.telemetry.yaml up -d

# 3. Start the storage overlay (Prometheus, Tempo, Loki)
docker compose -f docker-compose.telemetry-storage.yaml up -d

# 4. Run the smoke test (OTLP HTTP)
python ../../ops/devops/telemetry/smoke_otel_collector.py --host localhost
```

The smoke test posts sample traces, metrics, and logs and verifies that the collector increments the `otelcol_receiver_accepted_*` counters exposed via the Prometheus exporter. The storage overlay gives you a local Prometheus/Tempo/Loki stack to confirm end-to-end wiring. Local services can present the same client certificate when exporting OTLP data, so their traces reach the same collector and can be correlated. See [`Telemetry Storage Deployment`](telemetry-storage.md) for the storage configuration guidelines used in staging/production.

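If the smoke test fails, reading the counters directly shows whether any data reached the receivers. This is a minimal sketch, assuming Prometheus text-format input with one sample per line (an optional trailing timestamp is tolerated) and label values that do not contain `} `; `sum_accepted` is a hypothetical name.

```bash
# Hypothetical helper: read Prometheus text exposition on stdin and print the
# sum of all otelcol_receiver_accepted_* samples (spans, metric points, logs).
sum_accepted() {
  awk '
    /^otelcol_receiver_accepted_/ {
      line = $0
      if (index(line, "}") > 0)
        sub(/.*[}] /, "", line)      # drop "name{labels} "
      else
        sub(/^[^ ]+ /, "", line)     # drop bare "name "
      split(line, parts, " ")        # parts[1] = value, parts[2] = optional timestamp
      total += parts[1]
    }
    END { print total + 0 }'
}
```

For example, assuming the compose overlay publishes port 9464 on the host: `curl -fsS --cert client.crt --key client.key --cacert ca.crt https://localhost:9464/metrics | sum_accepted`. A total that increases after each smoke-test run means the receivers are accepting data.
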
---

## 3. Kubernetes deployment

Enable the collector in Helm by setting the following values (example shown for the dev profile):

```yaml
telemetry:
  collector:
    enabled: true
    defaultTenant: <tenant>
    tls:
      secretName: stellaops-otel-tls-<env>
```

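Rather than hand-editing these values for each environment, you could generate them so the secret name always follows the `stellaops-otel-tls-<env>` convention. This is a minimal sketch, assuming only the keys shown above are needed; `write_collector_values` and the output file name in the usage line are hypothetical.

```bash
# Hypothetical helper: print a Helm values override that enables the collector
# for one environment, following the stellaops-otel-tls-<env> naming convention.
write_collector_values() {
  local env_name="$1" tenant="$2"
  if [ -z "$env_name" ] || [ -z "$tenant" ]; then
    echo "usage: write_collector_values <env> <tenant>" >&2
    return 2
  fi
  # Quote the tenant so YAML treats it as a string.
  cat <<EOF
telemetry:
  collector:
    enabled: true
    defaultTenant: "${tenant}"
    tls:
      secretName: stellaops-otel-tls-${env_name}
EOF
}
```

For example, `write_collector_values dev <tenant> > values-telemetry-dev.yaml`, then append `-f values-telemetry-dev.yaml` to the `helm upgrade` command in this section; when several `-f` files are given, Helm gives precedence to the last one.
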
Provide a Kubernetes secret named `stellaops-otel-tls-<env>` (for staging: `stellaops-otel-tls-stage`) containing three keys: `tls.crt` (collector certificate), `tls.key` (its private key), and `ca.crt` (issuing CA). Example:

```bash
kubectl create secret generic stellaops-otel-tls-stage \
  --from-file=tls.crt=collector.crt \
  --from-file=tls.key=collector.key \
  --from-file=ca.crt=ca.crt
```

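Before running Helm, you can confirm the secret exposes all three keys. This is a minimal sketch, assuming `kubectl` points at the right cluster; `check_otel_secret` is a hypothetical name, the namespace defaults to `default`, and the check only proves each key is present and non-empty, not that the certificate chain is valid.

```bash
# Hypothetical helper: verify that a TLS secret exposes tls.crt, tls.key and
# ca.crt. Usage: check_otel_secret <secret-name> [namespace]
check_otel_secret() {
  local secret="$1" ns="${2:-default}" missing=0 key escaped value
  for key in tls.crt tls.key ca.crt; do
    # jsonpath needs dots inside key names escaped, e.g. {.data.tls\.crt}
    escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
    value=$(kubectl get secret "$secret" -n "$ns" -o "jsonpath={.data.${escaped}}" 2>/dev/null)
    if [ -z "$value" ]; then
      echo "missing key: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_otel_secret stellaops-otel-tls-stage <namespace>` returns non-zero and names each missing key.
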
Helm renders the collector deployment, service, and config map automatically:

```bash
helm upgrade --install stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-dev.yaml
```

Update client workloads to trust `ca.crt` and present client certificates that chain back to the same CA.

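To confirm that a client workload's certificate is accepted end to end, you can send an empty OTLP/HTTP export request from a host or pod that holds it. This is a minimal sketch, assuming the collector is reachable on port 4318 under the host name you pass and accepts JSON-encoded OTLP on the standard `/v1/traces`, `/v1/metrics`, and `/v1/logs` paths; the function names and the `CLIENT_CERT`/`CLIENT_KEY`/`CA_CERT` variables are hypothetical.

```bash
# Hypothetical helpers: build an OTLP/HTTP URL and POST an empty export request
# with a client certificate. Success means TLS, client auth, and export worked.
otlp_http_url() {
  case "$2" in
    traces|metrics|logs) printf 'https://%s:4318/v1/%s\n' "$1" "$2" ;;
    *) echo "unknown signal: $2" >&2; return 2 ;;
  esac
}

otlp_ping() {
  local host="$1" signal="${2:-traces}" url body
  url=$(otlp_http_url "$host" "$signal") || return 2
  # Empty JSON-encoded OTLP export requests, one per signal.
  case "$signal" in
    traces)  body='{"resourceSpans":[]}' ;;
    metrics) body='{"resourceMetrics":[]}' ;;
    logs)    body='{"resourceLogs":[]}' ;;
  esac
  curl -fsS -o /dev/null \
    --cert "${CLIENT_CERT:-client.crt}" --key "${CLIENT_KEY:-client.key}" \
    --cacert "${CA_CERT:-ca.crt}" \
    -H 'Content-Type: application/json' -d "$body" "$url"
}
```

For example, `otlp_ping <collector-host> traces` exits 0 only when the TLS handshake, client-certificate check, and export request all succeed.
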
---

## 4. Offline packaging (DEVOPS-OBS-50-003)

Use the packaging helper to produce a tarball that can be mirrored into the Offline Kit or copied to air-gapped sites:

```bash
python ops/devops/telemetry/package_offline_bundle.py --output out/telemetry/telemetry-bundle.tar.gz
```

The script gathers:

- `deploy/telemetry/README.md`
- Collector configuration (`deploy/telemetry/otel-collector-config.yaml` and its Helm copy)
- Helm template/values for the collector
- Compose overlay (`deploy/compose/docker-compose.telemetry.yaml`)

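Before shipping the tarball, you could confirm it contains the files listed above. This is a minimal sketch: because the archive's internal directory prefix is not documented here, it matches each expected path as a suffix of the tar listing, and it skips the Helm template/values entry whose exact paths are not listed. `check_bundle_contents` is a hypothetical name.

```bash
# Hypothetical helper: check that a gzipped telemetry bundle contains the files
# this guide lists. Paths are matched as suffixes because the archive prefix is
# unknown (entries may look like ./deploy/... or bundle/deploy/...).
check_bundle_contents() {
  local bundle="$1" listing path rc=0
  listing=$(tar -tzf "$bundle") || return 2
  for path in \
    deploy/telemetry/README.md \
    deploy/telemetry/otel-collector-config.yaml \
    deploy/compose/docker-compose.telemetry.yaml
  do
    # awk exits 0 if any entry equals the path or ends with "/<path>".
    if ! printf '%s\n' "$listing" | awk -v p="$path" '
        $0 == p || substr($0, length($0) - length(p)) == ("/" p) { found = 1 }
        END { exit !found }'; then
      echo "missing from bundle: $path" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_bundle_contents out/telemetry/telemetry-bundle.tar.gz` prints one line per missing file and returns non-zero if any are absent.
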
The tarball ships with a `.sha256` checksum. To attach a Cosign signature, add `--sign` and provide the `COSIGN_KEY_REF`/`COSIGN_IDENTITY_TOKEN` environment variables (or use the `--cosign-key` flag).

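At the receiving site, verify the checksum before unpacking. This is a minimal sketch, assuming the `.sha256` file sits next to the tarball and begins with the hex digest (the layout `sha256sum` produces); `verify_bundle_checksum` is a hypothetical name, and Cosign signature verification is not covered.

```bash
# Hypothetical helper: compare a bundle's SHA-256 digest with <bundle>.sha256.
# Returns 0 on match, 1 on mismatch, 2 if the checksum file is unreadable.
verify_bundle_checksum() {
  local bundle="$1" expected actual
  expected=$(awk 'NR == 1 { print tolower($1) }' "${bundle}.sha256") || return 2
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$bundle" | awk '{ print $1 }')
  else
    actual=$(shasum -a 256 "$bundle" | awk '{ print $1 }')   # e.g. macOS
  fi
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "checksum OK: $bundle"
  else
    echo "checksum MISMATCH: $bundle" >&2
    return 1
  fi
}
```

For example, `verify_bundle_checksum telemetry-bundle.tar.gz` after copying both files into the enclave.
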
Distribute the bundle alongside certificates generated by your PKI. For air-gapped installs, regenerate certificates inside the enclave and recreate the `stellaops-otel-tls-<env>` secret.

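If no enclave PKI is available for a test install, a throwaway CA and collector certificate can be minted with OpenSSL. This is a minimal sketch, not a replacement for your PKI: `make_collector_certs`, the output layout, the subject names, and the 365-day lifetime are hypothetical, and the DNS name must be the one clients actually dial. It writes its own small OpenSSL config so the result does not depend on the system `openssl.cnf`.

```bash
# Hypothetical helper: mint a throwaway CA plus a collector server certificate
# in <outdir>, with <dns-name> as the collector's subjectAltName.
make_collector_certs() {
  local out="$1" dns="$2"
  [ -n "$out" ] && [ -n "$dns" ] || return 2
  mkdir -p "$out" || return 1
  cat > "$out/openssl.cnf" <<EOF
[req]
distinguished_name = dn
prompt = no
[dn]
CN = stellaops-otel-ca
[ca_ext]
basicConstraints = critical,CA:TRUE
keyUsage = critical,keyCertSign,cRLSign
subjectKeyIdentifier = hash
[server_ext]
basicConstraints = CA:FALSE
keyUsage = critical,digitalSignature,keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = DNS:${dns}
EOF
  # Self-signed CA; keys are unencrypted so they can be loaded into a secret.
  openssl req -x509 -newkey rsa:3072 -nodes -days 365 \
    -config "$out/openssl.cnf" -extensions ca_ext \
    -keyout "$out/ca.key" -out "$out/ca.crt" 2>/dev/null || return 1
  # Collector key and certificate signing request.
  openssl req -new -newkey rsa:3072 -nodes -config "$out/openssl.cnf" \
    -subj "/CN=${dns}" \
    -keyout "$out/collector.key" -out "$out/collector.csr" 2>/dev/null || return 1
  # Sign the CSR with the CA, adding the server extensions (including the SAN).
  openssl x509 -req -in "$out/collector.csr" -days 365 \
    -CA "$out/ca.crt" -CAkey "$out/ca.key" -CAcreateserial \
    -extfile "$out/openssl.cnf" -extensions server_ext \
    -out "$out/collector.crt" 2>/dev/null || return 1
  # Confirm the collector certificate chains to the new CA.
  openssl verify -CAfile "$out/ca.crt" "$out/collector.crt" >/dev/null
}
```

For example, run `make_collector_certs ./pki otel-collector.<namespace>.svc`, then create the secret with the `kubectl create secret generic` command from section 3, pointing `--from-file` at `./pki/collector.crt`, `./pki/collector.key`, and `./pki/ca.crt`. Keep `ca.key` out of the secret. Client certificates can be issued from the same CA in the same way, using `extendedKeyUsage = clientAuth` instead of `serverAuth`.
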
---

## 5. Operational checks

1. **Health probes** – `kubectl exec` into the collector pod and run `curl -fsSk --cert client.crt --key client.key --cacert ca.crt https://127.0.0.1:13133/healthz`. The `-k` flag skips server-certificate verification, so `--cacert` has no effect here; drop `-k` if the collector certificate lists `127.0.0.1` as an IP subject alternative name.
2. **Metrics scrape** – confirm Prometheus ingests `otelcol_receiver_accepted_*` counters.
3. **Trace correlation** – ensure services propagate `trace_id` and `tenant.id` attributes; refer to `docs/observability/observability.md` for expected spans.
4. **Certificate rotation** – when rotating the CA, update the secret and restart the collector. If client-certificate enforcement is being staged, roll out the new client certificates before enabling `require_client_certificate`.

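For certificate rotation, it helps to know how close the current certificates are to expiry. This is a minimal sketch wrapping `openssl x509 -checkend`, which exits non-zero when a certificate expires within the given number of seconds; the function name and the 30-day default are hypothetical choices.

```bash
# Hypothetical helper: report whether a PEM certificate expires within N days.
# Returns 0 if it does (or if OpenSSL cannot parse it), 1 if it stays valid
# past the window, and 2 if the file is unreadable.
cert_expires_within() {
  local cert="$1" days="${2:-30}"
  [ -r "$cert" ] || return 2
  # -checkend N exits 0 when the certificate will NOT expire within N seconds.
  if openssl x509 -in "$cert" -noout -checkend $((days * 86400)) >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

For example, `cert_expires_within collector.crt 30 && echo "rotate collector certificate"`. Running it against both the collector certificate and `ca.crt` shows which one drives the rotation schedule.
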
---

## 6. Related references

- `deploy/telemetry/README.md` – source configuration and local workflow.
- `ops/devops/telemetry/smoke_otel_collector.py` – OTLP smoke test.
- `docs/observability/observability.md` – metrics/traces/logs taxonomy.
- `docs/13_RELEASE_ENGINEERING_PLAYBOOK.md` – release checklist for telemetry assets.