# Excititor Observability Guide

> Added 2025-11-14 alongside Sprint 119 (`EXCITITOR-AIAI-31-003`). Complements the AirGap/mirror runbooks under the same folder.

Excititor’s evidence APIs now emit first-class OpenTelemetry metrics, so Lens, Advisory AI, and Ops can detect misuse or missing provenance without paging through logs. This document lists the counters and histograms shipped by the WebService (`src/Excititor/StellaOps.Excititor.WebService`) and explains how to hook them into your exporters and dashboards.

## Telemetry prerequisites

- Enable `Excititor:Telemetry` in the service configuration (`appsettings.*`) and make sure **metrics** export is on. The WebService automatically registers the evidence meter (`StellaOps.Excititor.WebService.Evidence`) alongside the ingestion meter.
- Deploy at least one OTLP or console exporter (see `TelemetryExtensions.ConfigureExcititorTelemetry`). If your region lacks OTLP transport, fall back to scraping the console exporter for smoke tests.
- Coordinate with the Ops/Signals guild to provision the span and metric sinks referenced in `docs/modules/platform/architecture-overview.md#observability`.

## Metrics reference

| Metric | Type | Description | Key dimensions |
| --- | --- | --- | --- |
| `excititor.vex.observation.requests` | Counter | Number of `/v1/vex/observations/{vulnerabilityId}/{productKey}` requests handled. | `tenant`, `outcome` (`success`, `error`, `cancelled`), `truncated` (`true`/`false`) |
| `excititor.vex.observation.statement_count` | Histogram | Distribution of statements returned per observation projection request. | `tenant`, `outcome` |
| `excititor.vex.signature.status` | Counter | Signature status per statement (missing vs. unverified). | `tenant`, `status` (`missing`, `unverified`) |
| `excititor.vex.aoc.guard_violations` | Counter | Aggregated count of Aggregation-Only Contract (AOC) violations detected by the WebService (ingest and `/v1/vex/aoc/verify`). | `tenant`, `surface` (`ingest`, `aoc_verify`, etc.), `code` (AOC error code) |
| `excititor.vex.chunks.requests` | Counter | Requests to `/v1/vex/evidence/chunks` (NDJSON stream). | `tenant`, `outcome` (`success`, `error`, `cancelled`), `truncated` (`true`/`false`) |
| `excititor.vex.chunks.bytes` | Histogram | Size of NDJSON chunk streams served (bytes). | `tenant`, `outcome` |
| `excititor.vex.chunks.records` | Histogram | Count of evidence records emitted per chunk stream. | `tenant`, `outcome` |

> All metrics originate from the `EvidenceTelemetry` helper (`src/Excititor/StellaOps.Excititor.WebService/Telemetry/EvidenceTelemetry.cs`). When telemetry is disabled, the helper is inert.

### Dashboard hints

- **Advisory-AI readiness** – alert when `excititor.vex.signature.status{status="missing"}` spikes for a tenant, which indicates that connectors aren’t supplying signatures.
- **Guardrail monitoring** – graph `excititor.vex.aoc.guard_violations` per `code` to catch upstream feed regressions before they pollute Evidence Locker or Lens caches.
- **Capacity planning** – use histogram percentiles of `excititor.vex.observation.statement_count` to inform API sizing; higher counts mean Advisory AI is requesting broader scopes.

## Operational steps

1. **Enable telemetry**: set `Excititor:Telemetry:EnableMetrics=true` and configure OTLP endpoints and headers as described in `TelemetryExtensions`.
2. **Add dashboards**: import panels that reference the metrics above (see the Grafana JSON snippets in the Ops repo once merged).
3. **Alerting**: add rules for high guard-violation rates, missing signatures, and abnormal chunk byte/record counts. Tie alerts back to connectors via tenant metadata.
4. **Post-deploy checks**: after each release, verify that metrics are emitted by curling `/v1/vex/observations/...` and `/v1/vex/evidence/chunks` while watching the console exporter (dev) or the OTLP pipeline (prod).

## SLOs (Sprint 119 – OBS-51-001)

The following SLOs apply to Excititor evidence read paths when telemetry is enabled.
Record them in the shared SLO registry and alert via the platform alertmanager.

| Surface | SLI | Target | Window | Burn alert | Notes |
| --- | --- | --- | --- | --- | --- |
| `/v1/vex/observations` | p95 latency | ≤ 450 ms | 7d | 2 % over 1h | Measured on successful responses only; tenant-scoped. |
| `/v1/vex/observations` | freshness | ≥ 99 % within 5 min of upstream ingest | 7d | 5 % over 4h | Derived from arrival time minus `createdAt`; requires ingest clocks in UTC. |
| `/v1/vex/observations` | signature presence | ≥ 98 % of statements with a signature present | 7d | 3 % over 24h | Compare `excititor.vex.signature.status{status="missing"}` against the total number of statements served. |
| `/v1/vex/evidence/chunks` | p95 stream duration | ≤ 600 ms | 7d | 2 % over 1h | From request start to last NDJSON write; excludes client disconnects. |
| `/v1/vex/evidence/chunks` | truncation rate | ≤ 1 % truncated streams | 7d | 1 % over 1h | `excititor.vex.chunks.requests{truncated="true"}` as a share of all `excititor.vex.chunks.requests`. |
| AOC guardrail | zero hard violations | 0 | continuous | immediate | Any `excititor.vex.aoc.guard_violations` with severity `error` pages Ops. |

Implementation notes:

- Emit the latency and freshness SLIs via OpenTelemetry views that pre-aggregate by tenant and route to the platform SLO backend; keep histogram bucket boundaries aligned at 50/100/250/450/650/1000 ms.
- The freshness SLI is derived from ingest timestamps, so ensure clocks are synchronized (NTP) and timestamps are stored in UTC.
- For air-gapped deployments without OTEL sinks, scrape the console exporter and push the samples to an offline Prometheus; the same thresholds apply.

## Related documents

- `docs/modules/excititor/architecture.md` – API contract, AOC guardrails, connector responsibilities.
- `docs/modules/excititor/mirrors.md` – AirGap/mirror ingestion checklist (feeds into `EXCITITOR-AIRGAP-56/57`).
- `docs/modules/platform/architecture-overview.md#observability` – platform-wide telemetry guidance.
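For air-gapped sites that push console-exporter samples to an offline Prometheus, it can help to sanity-check the SLO table above by hand. The Python sketch below is a hypothetical helper, not part of Excititor: the function names (`estimate_quantile`, `ratio`, `evaluate`), the input shapes (non-cumulative per-bucket counts and raw counter totals), and the choice of total statements served as the signature-presence denominator are all assumptions. It estimates p95 by linear interpolation inside the 50/100/250/450/650/1000 ms buckets from the implementation notes (similar to Prometheus' `histogram_quantile`) and compares the results with the latency, truncation, and signature-presence targets. Burn-rate alerting is left to the platform SLO backend.

```python
import math
from dataclasses import dataclass

# Bucket upper bounds in ms, from the implementation notes; +inf catches overflow.
BUCKET_BOUNDS_MS = [50.0, 100.0, 250.0, 450.0, 650.0, 1000.0, math.inf]


@dataclass
class SloResult:
    name: str
    value: float
    target: float
    ok: bool


def estimate_quantile(q: float, bucket_counts: list[int]) -> float:
    """Estimate quantile q (0..1) from non-cumulative per-bucket counts.

    Interpolates linearly inside the bucket holding the target rank. If the
    rank falls in the +inf bucket, the last finite bound is returned.
    """
    if len(bucket_counts) != len(BUCKET_BOUNDS_MS):
        raise ValueError("expected one count per bucket, including +inf")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    total = sum(bucket_counts)
    if total == 0:
        raise ValueError("histogram has no observations")
    rank = q * total
    cumulative = 0
    lower = 0.0
    for upper, count in zip(BUCKET_BOUNDS_MS, bucket_counts):
        if count and cumulative + count >= rank:
            if math.isinf(upper):
                return lower
            # Fraction of this bucket needed to reach the target rank.
            return lower + (upper - lower) * (rank - cumulative) / count
        cumulative += count
        lower = upper
    return lower


def ratio(part: int, whole: int) -> float:
    """part / whole, treating an empty denominator as 0."""
    return part / whole if whole else 0.0


def evaluate(obs_buckets: list[int], chunk_buckets: list[int],
             truncated_streams: int, chunk_requests: int,
             missing_signatures: int, statements: int) -> list[SloResult]:
    """Compare measured values with the targets from the SLO table."""
    p95_obs = estimate_quantile(0.95, obs_buckets)
    p95_chunks = estimate_quantile(0.95, chunk_buckets)
    truncation = ratio(truncated_streams, chunk_requests)
    presence = 1.0 - ratio(missing_signatures, statements)
    return [
        SloResult("observations p95 latency (ms)", p95_obs, 450.0, p95_obs <= 450.0),
        SloResult("chunks p95 stream duration (ms)", p95_chunks, 600.0, p95_chunks <= 600.0),
        SloResult("chunks truncation rate", truncation, 0.01, truncation <= 0.01),
        SloResult("signature presence", presence, 0.98, presence >= 0.98),
    ]


if __name__ == "__main__":
    # Illustrative counts only, not real measurements.
    for r in evaluate([10, 30, 40, 15, 4, 1, 0], [5, 20, 50, 15, 8, 2, 0],
                      truncated_streams=3, chunk_requests=400,
                      missing_signatures=30, statements=1000):
        print(f"{r.name}: {r.value:.4g} (target {r.target}) {'OK' if r.ok else 'BREACH'}")
```

With these illustrative counts, the observation p95 interpolates to exactly 450 ms (within target), the chunk p95 to 575 ms, and the 3 % missing-signature share breaches the 98 % presence target. Interpolation also covers targets such as 600 ms that do not sit on a bucket boundary, at the cost of some estimation error.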