Telemetry architecture

Derived from Epic 15 (Observability & Forensics), this document details collector topology, storage profiles, forensic pipelines, and offline packaging.

1) Topology

  • Collector tier. OpenTelemetry Collector instances are deployed per environment (TLS on ingest, OTLP/gRPC receivers, tail-based sampling). Config packages are delivered via Offline Kit.
  • Processing pipelines. Separate pipelines for traces, metrics, and logs, with processors for batching, tail sampling, attribute redaction, and resource detection. Profiles: default, forensic (high-retention), and airgap (file-based exporters).
  • Exporters. OTLP to Prometheus/Tempo/Loki (online) or file/OTLP-HTTP to Offline Kit staging (air-gapped). Exporters are allow-listed to satisfy Sovereign readiness.
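
The following is a minimal sketch of how the three profiles might map onto Collector pipelines. It assembles only the `service.pipelines` section of an OpenTelemetry Collector configuration and rejects any exporter whose type is not on the allow-list. The exporter instance names (`otlp/tempo`, `otlphttp/prometheus`, `otlphttp/loki`, `file/staging`), the allow-list contents, and the helper names are assumptions for illustration, not the shipped StellaOps config bundles. The component types themselves (`otlp` receiver; `batch`, `tail_sampling`, `attributes`, `resourcedetection` processors; `otlp`, `otlphttp`, `file` exporters) are existing Collector core or contrib components.

```python
"""Sketch: per-profile OpenTelemetry Collector pipelines with an exporter allow-list.

Assumption: exporter instance names, ALLOWED_EXPORTER_TYPES, and the
profile table are hypothetical; only service.pipelines is generated.
"""

# Hypothetical allow-list of exporter *types* (the part of a component ID before "/").
ALLOWED_EXPORTER_TYPES = {"otlp", "otlphttp", "file"}

# Hypothetical exporter instance per signal for each profile. The forensic
# profile differs mainly in retention (storage side), so it reuses the
# online exporters here.
PROFILE_EXPORTERS = {
    "default": {"traces": "otlp/tempo", "metrics": "otlphttp/prometheus", "logs": "otlphttp/loki"},
    "forensic": {"traces": "otlp/tempo", "metrics": "otlphttp/prometheus", "logs": "otlphttp/loki"},
    "airgap": {"traces": "file/staging", "metrics": "file/staging", "logs": "file/staging"},
}


def exporter_type(component_id: str) -> str:
    """Collector component IDs have the form 'type[/name]'; return the type."""
    return component_id.split("/", 1)[0]


def build_pipelines(profile: str) -> dict:
    """Return a service.pipelines mapping for the given profile."""
    exporters = PROFILE_EXPORTERS[profile]
    for signal, exporter in exporters.items():
        if exporter_type(exporter) not in ALLOWED_EXPORTER_TYPES:
            raise ValueError(f"exporter {exporter!r} for {signal} is not allow-listed")

    pipelines = {}
    for signal, exporter in exporters.items():
        processors = ["attributes/redact", "resourcedetection"]
        if signal == "traces":
            # tail_sampling buffers whole traces, so it only belongs in the traces pipeline.
            processors.append("tail_sampling")
        processors.append("batch")  # batch last, just before export
        pipelines[signal] = {"receivers": ["otlp"], "processors": processors, "exporters": [exporter]}
    return {"service": {"pipelines": pipelines}}
```

For example, `build_pipelines("airgap")` routes all three signals to `file/staging`. A complete Collector config would also need the top-level `receivers`, `processors`, and `exporters` sections that define each component's settings.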

2) Storage

  • Prometheus for metrics, with remote-write support and retention windows (default 30 days, forensic 180 days).
  • Tempo (or Jaeger all-in-one) for traces with block storage backend (S3-compatible or filesystem) and deterministic chunk manifests.
  • Loki for logs stored in immutable chunks; index shards hashed for reproducibility.
  • Forensic archive — periodic export of raw OTLP records into signed bundles (otlp/metrics.pb, otlp/traces.pb, otlp/logs.pb, manifest.json).
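
One way to make the forensic bundle manifest reproducible is sketched below, assuming a hypothetical manifest schema (`{"files": [{"path", "sha256", "size"}]}`); only the member file names come from this document. Each member is hashed with SHA-256, entries are sorted by path, and the JSON is serialised with sorted keys and compact separators, so identical bundle contents always yield byte-identical manifests. The `build_manifest` and `verify_manifest` helpers are hypothetical.

```python
"""Sketch: deterministic manifest.json for a forensic OTLP bundle.

Assumption: the manifest schema and helper names are hypothetical;
only the member file names (otlp/*.pb) come from this document.
"""
import hashlib
import json


def build_manifest(files: dict[str, bytes]) -> bytes:
    """Hash each bundle member and serialise the manifest canonically."""
    entries = [
        {"path": path, "sha256": hashlib.sha256(data).hexdigest(), "size": len(data)}
        # Sorting by path makes the output independent of insertion order.
        for path, data in sorted(files.items())
    ]
    # sort_keys + compact separators give byte-identical JSON for identical input.
    return json.dumps({"files": entries}, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_manifest(manifest: bytes, files: dict[str, bytes]) -> list[str]:
    """Return paths whose digest is unknown, mismatched, or missing from the bundle."""
    expected = {e["path"]: e["sha256"] for e in json.loads(manifest)["files"]}
    bad = [p for p, d in files.items() if expected.get(p) != hashlib.sha256(d).hexdigest()]
    bad += [p for p in expected if p not in files]  # listed in the manifest but absent
    return sorted(bad)
```

Because the manifest bytes are deterministic, signing them once (for example with DSSE, as the Offline Kit bundles do) lets any later verifier recompute and compare digests without trusting the transport.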

3) Pipelines & Guardrails

  • Redaction. Attribute processors strip PII/secrets based on policy-managed allowed keys. Redaction profiles are mirrored in Offline Kit.
  • Sampling. Tail sampling by service and error status; incident mode (triggered by Orchestrator) promotes services to 100% sampling, extends retention, and toggles Notify alerts.
  • Alerting. Prometheus rules and dashboards are packaged with Export Center, covering service SLOs, queue depth, policy run latency, and ingestion AOC violations.
  • Sealed-mode guard. StellaOps.Telemetry.Core enforces IEgressPolicy on OTLP exporters; when air-gap mode is sealed, any non-loopback collector endpoints are automatically disabled and a structured warning with remediation guidance is emitted.
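
The redaction and sampling guardrails can be illustrated with the sketch below. It applies an allow-list (not a deny-list) to span attributes, then makes a tail-sampling decision once the whole trace is buffered: traces touching an incident-mode service are always kept, traces with an error are always kept, and the rest are sampled deterministically by hashing the trace ID. The `ALLOWED_KEYS` set, the `Span` shape, `base_rate`, and all function names are hypothetical; the real policy-managed key list and Collector sampler settings are not specified in this document.

```python
"""Sketch: allow-list attribute redaction plus a tail-sampling decision.

Assumptions: ALLOWED_KEYS, the Span shape, and base_rate are hypothetical.
"""
import hashlib
from dataclasses import dataclass, field

# Hypothetical policy-managed allow-list; every other attribute key is dropped.
ALLOWED_KEYS = frozenset({"service.name", "http.method", "http.route", "http.status_code"})


@dataclass
class Span:
    service: str
    is_error: bool = False
    attributes: dict = field(default_factory=dict)


def redact(attributes: dict, allowed: frozenset = ALLOWED_KEYS) -> dict:
    """Keep only allow-listed keys, so unknown (possibly sensitive) keys are stripped by default."""
    return {k: v for k, v in attributes.items() if k in allowed}


def keep_trace(trace_id: str, spans: list[Span], incident_services: set[str], base_rate: float) -> bool:
    """Decide after the whole trace is buffered (tail sampling)."""
    if any(s.service in incident_services for s in spans):
        return True  # incident mode: promoted services are sampled at 100%
    if any(s.is_error for s in spans):
        return True  # always keep traces that contain an error
    # Deterministic sampling: take 53 bits of the hash so the bucket is exactly in [0, 1).
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < base_rate
```

Hashing the trace ID, rather than drawing a random number, means every collector replica reaches the same decision for the same trace.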
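
The sealed-mode guard can be sketched as a filter over configured OTLP endpoints. The sketch assumes endpoints are full URLs and treats `localhost` plus any loopback IP literal (127.0.0.0/8, `::1`) as loopback; every other host, including DNS names that might resolve locally, is dropped with a structured warning. It only mirrors the behaviour described for the .NET `IEgressPolicy` check; the function names and warning fields are hypothetical.

```python
"""Sketch: sealed-mode filtering of OTLP collector endpoints.

Assumption: function names and warning fields are hypothetical; this
illustrates the described IEgressPolicy behaviour, not its implementation.
"""
import ipaddress
import logging
from urllib.parse import urlsplit

log = logging.getLogger("telemetry.egress")


def is_loopback(endpoint: str) -> bool:
    """True when the endpoint URL targets localhost or a loopback IP literal."""
    host = urlsplit(endpoint).hostname  # lowercased; IPv6 brackets stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other DNS name is treated as remote


def apply_sealed_mode(endpoints: list[str], sealed: bool) -> list[str]:
    """Drop non-loopback endpoints when sealed, emitting a structured warning for each."""
    if not sealed:
        return list(endpoints)
    kept = []
    for endpoint in endpoints:
        if is_loopback(endpoint):
            kept.append(endpoint)
        else:
            log.warning(
                "OTLP exporter disabled in sealed mode",
                extra={"endpoint": endpoint, "remediation": "point the exporter at a loopback collector"},
            )
    return kept
```

Failing closed on DNS names avoids a resolution step (itself an egress attempt) while sealed.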

4) APIs & integration

  • GET /telemetry/config/profile/{name} — download collector config bundle (YAML + signature).
  • POST /telemetry/incidents/mode — toggle incident sampling + forensic bundle generation.
  • GET /telemetry/exports/forensic/{window} — stream signed OTLP bundles for compliance.
  • CLI commands: stella telemetry deploy --profile default, stella telemetry capture --window 24h --out bundle.tar.gz.
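
A minimal client sketch for the three endpoints is shown below. It only constructs `urllib.request.Request` objects and does not send them. The base URL, the JSON body of the incident-mode toggle (`{"enabled": ...}`), and the helper names are hypothetical; only the paths come from this document, and authentication, which is not described here, is omitted.

```python
"""Sketch: building requests for the telemetry endpoints listed above.

Assumptions: BASE_URL, the incident-mode JSON body, and helper names are hypothetical.
"""
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://stella.example.internal"  # hypothetical


def profile_config_request(name: str) -> Request:
    # Percent-encode the profile name so it stays a single path segment.
    return Request(f"{BASE_URL}/telemetry/config/profile/{quote(name, safe='')}", method="GET")


def incident_mode_request(enabled: bool) -> Request:
    body = json.dumps({"enabled": enabled}).encode("utf-8")  # hypothetical payload shape
    return Request(
        f"{BASE_URL}/telemetry/incidents/mode",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def forensic_export_request(window: str) -> Request:
    # The window format ("24h") follows the CLI example; the server-side grammar is not specified.
    return Request(f"{BASE_URL}/telemetry/exports/forensic/{quote(window, safe='')}", method="GET")
```

Passing one of these requests to `urllib.request.urlopen` (with whatever credentials the deployment requires) would perform the call; the forensic export response is a stream, so it should be read in chunks rather than all at once.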

5) Offline support

  • Offline Kit ships collector binaries/config, bootstrap scripts, dashboards, alert rules, and OTLP replay tooling. Bundles include manifest.json with digests, DSSE signatures, and instructions.
  • For offline environments, exporters write to the local filesystem; operators transfer bundles to an analysis workstation using signed manifests.
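
To show what a DSSE signature over a bundle manifest involves, the sketch below implements the DSSE v1 pre-authentication encoding (PAE) and the envelope shape (`payloadType`, base64 `payload`, `signatures[].keyid`/`sig`). HMAC-SHA256 stands in for the real asymmetric signature purely so the example runs on the standard library; the payload type string, key ID, and helper names are hypothetical, and this is not the Offline Kit's signing code.

```python
"""Sketch: DSSE envelope over a bundle manifest.

Assumptions: HMAC-SHA256 is a placeholder for a real asymmetric signer;
PAYLOAD_TYPE and helper names are hypothetical. PAE follows DSSE v1.
"""
import base64
import hashlib
import hmac

PAYLOAD_TYPE = "application/vnd.stellaops.telemetry-manifest+json"  # hypothetical


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: 'DSSEv1 LEN(type) type LEN(body) body', space-separated."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def sign_envelope(payload: bytes, key: bytes, keyid: str) -> dict:
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()  # placeholder signer
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> bytes:
    """Return the payload if any signature verifies; raise ValueError otherwise."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    for s in envelope["signatures"]:
        if hmac.compare_digest(base64.b64decode(s["sig"]), expected):
            return payload
    raise ValueError("no valid signature on envelope")
```

Signing the PAE rather than the raw payload binds the payload type into the signature, so an envelope cannot be replayed under a different type.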

6) Observability of telemetry stack

  • Meta-metrics: collector_export_failures_total, telemetry_bundle_generation_seconds, telemetry_incident_mode{state}.
  • Health endpoints for collectors and storage clusters, plus dashboards for ingestion rate, retention, rule evaluations.

Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised.