# StellaOps Telemetry Telemetry module captures deployment and operations guidance for the shared observability stack (collectors, storage, dashboards). ## Latest updates (2025-11-30) - Sprint tracker `docs/implplan/SPRINT_0330_0001_0001_docs_modules_telemetry.md` and module `TASKS.md` added to mirror status. - Observability runbook stub + dashboard placeholder added under `operations/` (offline import). - Storage/isolation posture references updated; align with platform docs. ## Responsibilities - Deploy and operate OpenTelemetry collectors for StellaOps services. - Provide storage configuration for Prometheus/Tempo/Loki stacks. - Document smoke tests and offline bootstrapping steps. - Align metrics and alert packs with module SLOs. ## Key components - Collector deployment guide (./operations/collector.md). - Storage deployment guide (./operations/storage.md). - Smoke tooling in `ops/devops/telemetry/`. ## Integrations & dependencies - DevOps pipelines for packaging telemetry bundles. - Module-specific dashboards (scheduler, scanner, etc.). - Security/Compliance for retention policies. ## Operational notes - Smoke script references (../../ops/devops/telemetry). - Bundle packaging instructions in ops/devops/telemetry. - Sprint 23 console security sign-off (2025-10-27) added the `console-security.json` Grafana board and burn-rate alert pack—ensure environments import the updated dashboards/alerts referenced in `docs/updates/2025-10-27-console-security-signoff.md`. - Observability assets for this sprint: `operations/observability.md` and `operations/dashboards/telemetry-observability.json` (offline import). ## Related resources - ./operations/collector.md - ./operations/storage.md ## Backlog references - TELEMETRY-OBS-50-001 … 50-004 in ../../TASKS.md. - Collector/storage automation tracked in ops/devops/TASKS.md. ## Implementation Status ### Phase 1 – Collector & pipeline profiles (In Progress) - OpenTelemetry collector configs: default, forensic, airgap profiles - Ingest gateways with TLS/mTLS support - Attribute redaction policies and tenant isolation - CLI automation: stella telemetry deploy, stella telemetry profile diff ### Phase 2 – Storage backends & retention (Planned) - Prometheus/Tempo/Loki deployment with retention tiers - Bucket/object storage with deterministic manifest generation - Sealed-mode allowlists and offline bundle support - Remote-write configuration and archivers ### Phase 3 – Incident mode & forensic capture (Planned) - Incident toggles via CLI/API for sampling adjustments - Tail sampling to 100% during incidents - Forensic bundle generation: OTLP archives with manifest/signature - Notify hooks for incident escalation ### Phase 4 – Observability dashboards & automation (Planned) - Service SLO dashboards: queue depth, policy latency, ingestion violations - Alert rules: burn-rate, collector failure, exporter backlog - Grafana packages for core services - Self-observability metrics ### Phase 5 – Offline & compliance (Planned) - Offline Kit artifacts: collector binaries/configs, import scripts - Deterministic bundles with signed manifests - Replay tooling and compliance checklists - File-based exporters for air-gapped environments ### Phase 6 – Hardening & SOC handoff (Planned) - RBAC integration and audit logging - Incident response runbooks and performance tuning - Integration tests across services - SOC handoff package with control objectives ### Key Acceptance Criteria - Collectors ingest metrics/logs/traces with redaction rules and tenant isolation - Storage backends retain data per SLAs with deterministic manifests - Incident mode triggers forensic capture with signed bundles - Dashboards/alerts cover service SLOs and telemetry stack health - CLI automates config rollout, forensic capture, verification - Offline bundles replay telemetry in sealed environments ### Technical Decisions & Risks - PII leakage prevented via strict redaction processors, policy-managed allowlists - Collector overload managed with horizontal scaling, batching, circuit breakers - Storage cost controlled via tiered retention, compression, pruning, offline archiving - Air-gap drift mitigated with offline kit refresh schedule, manifest verification - Alert fatigue reduced with burn-rate alerts, deduping, SOC runbooks ### Operational Assets (Sprint 0330 · 2025-11-30) - Observability runbook: operations/observability.md - Dashboard placeholder: operations/dashboards/telemetry-observability.json - Console security dashboard: console-security.json (Sprint 23) - Burn-rate alert pack for environments ## Epic alignment - **Epic 15 – Observability & Forensics:** deliver collector/storage deployments, forensic evidence retention, and observability bundles with deterministic configuration.