# Aggregation Observability

Last updated: 2025-11-25 (Docs Tasks Md.V · DOCS-LNM-22-007)

Covers metrics, traces, and logs for Link-Not-Merge (LNM) aggregation and evidence pipelines.

## Metrics

- `aggregation_ingest_latency_seconds` (histogram) — end-to-end ingest latency per statement; labels: `tenant`, `source`, `status`.
- `aggregation_conflict_total` (counter) — conflicts encountered; labels: `tenant`, `advisory`, `product`, `reason`.
- `aggregation_overlay_cache_hits_total` / `_misses_total` — overlay cache effectiveness; labels: `tenant`, `cache`.
- `aggregation_vex_gate_total` — VEX gating outcomes; labels: `tenant`, `status` (`affected`, `not_affected`, `unknown`).
- `aggregation_queue_depth` (gauge) — pending statements per tenant.

## Traces

- Span name `aggregation.process` with attributes:
  - `tenant`, `advisory`, `product`, `vex_status`, `source_kind`
  - `overlay_version`, `cache_hit` (bool)
- Link to upstream ingest span (`traceparent` forwarded by Excititor/Concelier).
- Export to OTLP; sampling default 10% outside prod, 100% for `status=error`.

## Logs

Structured JSON with fields: `tenant`, `advisory`, `product`, `vex_status`, `decision` (`merged|suppressed|dropped`), `reason`, `duration_ms`, `trace_id`.

## SLOs

- **Ingest latency**: p95 < 500ms per statement (steady state).
- **Cache hit rate**: >80% for overlays; alert when below for 15 minutes.
- **Error rate**: <0.1% over a 10-minute window.

## Alerts

- `HighConflictRate` — `aggregation_conflict_total` delta > 100/minute per tenant.
- `QueueBacklog` — `aggregation_queue_depth` > 10k for 5 minutes.
- `LowCacheHit` — overlay cache hit rate < 60% for 10 minutes.

## Offline/air-gap considerations

- Export metrics to local Prometheus scrape; no external sinks.
- Trace sampling and log retention are configured via environment, without needing control-plane access.
- Deterministic ordering preserved; cache warmers seeded from bundled fixtures.
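
The overlay cache hit rate referenced in the SLOs and Alerts sections is not exported directly; it has to be derived from the `aggregation_overlay_cache_hits_total` and `_misses_total` counters. The following is a minimal Python sketch of that derivation, assuming two cumulative counter samples taken at the start and end of a window. The names `CacheSample`, `hit_rate`, and `classify` are hypothetical helpers introduced for illustration, and the sketch does not model the "for 10 minutes" / "for 15 minutes" duration conditions.

```python
from dataclasses import dataclass
from typing import Optional

SLO_HIT_RATE = 0.80    # SLO target: hit rate should stay above 80%
ALERT_HIT_RATE = 0.60  # LowCacheHit alert: hit rate below 60%


@dataclass(frozen=True)
class CacheSample:
    """Cumulative counter values at one point in time (hypothetical type)."""
    hits: float
    misses: float


def hit_rate(start: CacheSample, end: CacheSample) -> Optional[float]:
    """Hit rate over the window between two cumulative samples.

    Returns None when there was no traffic or a counter went backwards
    (for example after a process restart), since no rate can be derived.
    """
    d_hits = end.hits - start.hits
    d_misses = end.misses - start.misses
    if d_hits < 0 or d_misses < 0:
        return None
    total = d_hits + d_misses
    if total == 0:
        return None
    return d_hits / total


def classify(rate: Optional[float]) -> str:
    """Map a hit rate onto the thresholds stated in this document."""
    if rate is None:
        return "no_data"
    if rate < ALERT_HIT_RATE:
        return "alert"       # would satisfy the LowCacheHit condition
    if rate <= SLO_HIT_RATE:
        return "below_slo"   # SLO requires strictly more than 80%
    return "ok"
```

For example, `classify(hit_rate(CacheSample(100, 50), CacheSample(190, 60)))` sees 90 hits and 10 misses in the window and lands in the `ok` band. In a real deployment this computation would more likely live in a Prometheus recording or alerting rule than in application code.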
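
To make the field list in the Logs section concrete, here is a small Python sketch that builds one structured log line with exactly those fields. The function name `build_log_record` and all sample values are made up for illustration; the actual pipeline's logging code and value formats are not shown in this document.

```python
import json

# Decision values as listed in the Logs section.
ALLOWED_DECISIONS = {"merged", "suppressed", "dropped"}


def build_log_record(*, tenant: str, advisory: str, product: str,
                     vex_status: str, decision: str, reason: str,
                     duration_ms: float, trace_id: str) -> str:
    """Serialize one aggregation log event as a single JSON line.

    Hypothetical helper: rejects decisions outside the documented set and
    sorts keys so that identical events serialize identically.
    """
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    record = {
        "tenant": tenant,
        "advisory": advisory,
        "product": product,
        "vex_status": vex_status,
        "decision": decision,
        "reason": reason,
        "duration_ms": duration_ms,
        "trace_id": trace_id,
    }
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


# Illustrative values only; none of these come from a real tenant or advisory.
line = build_log_record(
    tenant="tenant-a",
    advisory="ADV-EXAMPLE-0001",
    product="example-product",
    vex_status="not_affected",
    decision="suppressed",
    reason="vex_gate",
    duration_ms=12.5,
    trace_id="0af7651916cd43dd8448eb211c80319c",
)
print(line)
```

Carrying `trace_id` in every record is what lets a log line be joined back to its `aggregation.process` span.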