Telemetry Collector Deployment Guide
Scope: DevOps Guild, Observability Guild, and operators enabling the StellaOps telemetry pipeline (DEVOPS-OBS-50-001 / DEVOPS-OBS-50-003).
This guide describes how to deploy the default OpenTelemetry Collector packaged with Stella Ops, validate its ingest endpoints, and prepare an offline-ready bundle for air-gapped environments.
1. Overview
The collector terminates OTLP traffic from Stella Ops services and exports metrics, traces, and logs.
| Endpoint | Purpose | TLS | Authentication | 
|---|---|---|---|
| :4317 | OTLP gRPC ingest | mTLS | Client certificate issued by collector CA | 
| :4318 | OTLP HTTP ingest | mTLS | Client certificate issued by collector CA | 
| :9464 | Prometheus scrape | mTLS | Same client certificate | 
| :13133 | Health check | mTLS | Same client certificate | 
| :1777 | pprof diagnostics | mTLS | Same client certificate | 
The default configuration lives at deploy/telemetry/otel-collector-config.yaml and mirrors the Helm values in the stellaops chart.
2. Local validation (Compose)
# 1. Generate dev certificates (CA + collector + client)
./ops/devops/telemetry/generate_dev_tls.sh
# 2. Start the collector overlay
cd deploy/compose
docker compose -f docker-compose.telemetry.yaml up -d
# 3. Start the storage overlay (Prometheus, Tempo, Loki)
docker compose -f docker-compose.telemetry-storage.yaml up -d
# 4. Run the smoke test (OTLP HTTP)
python ../../ops/devops/telemetry/smoke_otel_collector.py --host localhost
The smoke test posts sample traces, metrics, and logs and verifies that the collector increments the otelcol_receiver_accepted_* counters exposed via the Prometheus exporter. The storage overlay gives you a local Prometheus/Tempo/Loki stack to confirm end-to-end wiring. The same client certificate can be used by local services to weave traces together. See Telemetry Storage Deployment for the storage configuration guidelines used in staging/production.
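The counter check described above can be sketched as follows. This is not the code in smoke_otel_collector.py; it is a minimal illustration that assumes you have captured the text served on the Prometheus scrape endpoint before and after sending telemetry. The function names are hypothetical.

```python
import re
from typing import Dict, List

# One sample line of the Prometheus text exposition format:
#   metric_name{optional="labels"} value [timestamp]
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{.*\})?\s+(\S+)')


def accepted_totals(exposition: str) -> Dict[str, float]:
    """Sum every otelcol_receiver_accepted_* sample by metric name."""
    totals: Dict[str, float] = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, value = match.group(1), match.group(2)
        if name.startswith("otelcol_receiver_accepted_"):
            totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def increased_counters(before: str, after: str) -> List[str]:
    """Return the accepted counters whose summed value grew between two scrapes."""
    old = accepted_totals(before)
    new = accepted_totals(after)
    return sorted(name for name, value in new.items() if value > old.get(name, 0.0))
```

An empty result from increased_counters after a smoke run would suggest the collector did not accept the posted signals, or that the scrape was taken from a different collector instance.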
3. Kubernetes deployment
Enable the collector in Helm by setting the following values (example shown for the dev profile):
telemetry:
  collector:
    enabled: true
    defaultTenant: <tenant>
    tls:
      secretName: stellaops-otel-tls-<env>
Provide a Kubernetes secret named stellaops-otel-tls-<env> (for staging: stellaops-otel-tls-stage) with the keys tls.crt, tls.key, and ca.crt. The secret must contain the collector certificate, private key, and issuing CA respectively. Example:
kubectl create secret generic stellaops-otel-tls-stage \
  --from-file=tls.crt=collector.crt \
  --from-file=tls.key=collector.key \
  --from-file=ca.crt=ca.crt
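If you prefer to generate the secret manifest from a script, for example in a pipeline that already holds the PEM material in memory, the same object can be built as a plain dictionary. This is a sketch under assumptions: build_tls_secret is a hypothetical helper, and the Opaque type matches what kubectl create secret generic produces.

```python
import base64
from typing import Dict

# Keys the collector secret is expected to carry, per the guide.
REQUIRED_KEYS = ("tls.crt", "tls.key", "ca.crt")


def build_tls_secret(env: str, files: Dict[str, bytes]) -> dict:
    """Build a Secret manifest named stellaops-otel-tls-<env> from raw PEM bytes."""
    missing = [key for key in REQUIRED_KEYS if key not in files]
    if missing:
        raise ValueError(f"missing secret keys: {', '.join(missing)}")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": f"stellaops-otel-tls-{env}"},
        # Secret data values must be base64-encoded strings.
        "data": {
            key: base64.b64encode(files[key]).decode("ascii")
            for key in REQUIRED_KEYS
        },
    }
```

Serialising the result with json.dumps gives a document that kubectl apply -f - accepts, since kubectl reads JSON as well as YAML.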
Helm renders the collector deployment, service, and config map automatically:
helm upgrade --install stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-dev.yaml
Update client workloads to trust ca.crt and present client certificates that chain back to the same CA.
4. Offline packaging (DEVOPS-OBS-50-003)
Use the packaging helper to produce a tarball that can be mirrored inside the Offline Kit or air-gapped sites:
python ops/devops/telemetry/package_offline_bundle.py --output out/telemetry/telemetry-bundle.tar.gz
The script gathers:
- deploy/telemetry/README.md
- Collector configuration (deploy/telemetry/otel-collector-config.yaml and Helm copy)
- Helm template/values for the collector
- Compose overlay (deploy/compose/docker-compose.telemetry.yaml)
The tarball ships with a .sha256 checksum. To attach a Cosign signature, add --sign and provide COSIGN_KEY_REF/COSIGN_IDENTITY_TOKEN env vars (or use the --cosign-key flag).
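Before importing the bundle on the receiving side, it is worth re-checking the checksum. The sketch below assumes the .sha256 file stores the hex digest as its first whitespace-separated token (the layout sha256sum writes). The guide does not specify the exact format, so adjust the parsing if the packaging script writes something different. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large bundles do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle: Path, checksum_file: Path) -> bool:
    """Compare the bundle digest with the first token of the checksum file."""
    tokens = checksum_file.read_text(encoding="utf-8").split()
    if not tokens:
        return False  # an empty checksum file cannot vouch for anything
    return sha256_of(bundle) == tokens[0].lower()
```

A checksum only detects corruption or accidental substitution. The Cosign signature remains the control for authenticity.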
Distribute the bundle alongside certificates generated by your PKI. For air-gapped installs, regenerate certificates inside the enclave and recreate the stellaops-otel-tls-<env> secret.
5. Operational checks
- Health probes – kubectl exec into the collector pod and run curl -fsSk --cert client.crt --key client.key --cacert ca.crt https://127.0.0.1:13133/healthz.
- Metrics scrape – confirm Prometheus ingests otelcol_receiver_accepted_* counters.
- Trace correlation – ensure services propagate trace_id and tenant.id attributes; refer to docs/observability/observability.md for expected spans.
- Certificate rotation – when rotating the CA, update the secret and restart the collector; roll out new client certificates before enabling require_client_certificate if staged.
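The health probe can also be scripted for use in a rotation runbook. The sketch below uses only the Python standard library, and the helper names are hypothetical. Unlike the curl command, which passes -k, it keeps server certificate and hostname verification on, so the URL host must appear in the collector certificate's subject alternative names. Treat that as an assumption about your certificates.

```python
import ssl
import urllib.request


def build_mtls_context(ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    """Trust only the collector CA and present the client certificate."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def collector_healthy(
    context: ssl.SSLContext,
    url: str = "https://127.0.0.1:13133/healthz",
    timeout: float = 5.0,
) -> bool:
    """Return True only when the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, TLS failures, and timeouts all derive from OSError.
        return False
```

Running the check once with the old client certificate and once with the new one during a rotation gives a quick signal on whether both are still accepted.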
6. Related references
- deploy/telemetry/README.md – source configuration and local workflow.
- ops/devops/telemetry/smoke_otel_collector.py – OTLP smoke test.
- docs/observability/observability.md – metrics/traces/logs taxonomy.
- docs/13_RELEASE_ENGINEERING_PLAYBOOK.md – release checklist for telemetry assets.