Excititor Observability Guide
Added 2025-11-14 alongside Sprint 119 (`EXCITITOR-AIAI-31-003`). Complements the AirGap/mirror runbooks under the same folder.
Excititor’s evidence APIs now emit first-class OpenTelemetry metrics so Lens, Advisory AI, and Ops can detect misuse or missing provenance without paging through logs. This document lists the counters and histograms shipped by the WebService (`src/Excititor/StellaOps.Excititor.WebService`) and explains how to hook them into your exporters and dashboards.
Telemetry prerequisites
- Enable `Excititor:Telemetry` in the service configuration (`appsettings.*`), ensuring metrics export is on. The WebService automatically adds the evidence meter (`StellaOps.Excititor.WebService.Evidence`) alongside the ingestion meter.
- Deploy at least one OTLP or console exporter (see `TelemetryExtensions.ConfigureExcititorTelemetry`). If your region lacks OTLP transport, fall back to scraping the console exporter for smoke tests.
- Coordinate with the Ops/Signals guild to provision the span/metric sinks referenced in `docs/modules/platform/architecture-overview.md#observability`.
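The colon-delimited key style used in this guide (for example `Excititor:Telemetry:EnableMetrics`, named under Operational steps) follows the standard .NET configuration convention: `:` separates sections, nested JSON objects express the same hierarchy in `appsettings.*`, and environment variables use `__` in place of `:`. The sketch below only illustrates that mapping. It assumes the service binds its settings through the default .NET configuration providers, which this guide does not state explicitly, and the helper names are hypothetical.

```python
import json


def to_env_var(key: str) -> str:
    """Return the environment-variable form of a colon-delimited config key.

    The .NET environment-variable provider treats '__' as the ':' separator.
    """
    return key.replace(":", "__")


def to_nested(key: str, value):
    """Expand a colon-delimited key into the nested shape used in appsettings JSON."""
    nested = value
    for part in reversed(key.split(":")):
        nested = {part: nested}
    return nested


# Key taken from this guide; the value enables metrics export.
KEY = "Excititor:Telemetry:EnableMetrics"

env_name = to_env_var(KEY)
appsettings_fragment = to_nested(KEY, True)

print(f"{env_name}=true")
print(json.dumps(appsettings_fragment, indent=2))
```

Either form should switch metrics export on under that assumption. Confirm the exact option names against `TelemetryExtensions` before relying on them.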
Metrics reference
| Metric | Type | Description | Key dimensions |
|---|---|---|---|
| `excititor.vex.observation.requests` | Counter | Number of `/v1/vex/observations/{vulnerabilityId}/{productKey}` requests handled. | `tenant`, `outcome` (`success`, `error`, `cancelled`), `truncated` (`true`/`false`) |
| `excititor.vex.observation.statement_count` | Histogram | Distribution of statements returned per observation projection request. | `tenant`, `outcome` |
| `excititor.vex.signature.status` | Counter | Signature status per statement (missing vs. unverified). | `tenant`, `status` (`missing`, `unverified`) |
| `excititor.vex.aoc.guard_violations` | Counter | Aggregated count of Aggregation-Only Contract violations detected by the WebService (ingest + `/vex/aoc/verify`). | `tenant`, `surface` (`ingest`, `aoc_verify`, etc.), `code` (AOC error code) |
All metrics originate from the `EvidenceTelemetry` helper (`src/Excititor/StellaOps.Excititor.WebService/Telemetry/EvidenceTelemetry.cs`). When telemetry is disabled, the helper is inert and records nothing.
Dashboard hints
- Advisory-AI readiness – alert when `excititor.vex.signature.status{status="missing"}` spikes for a tenant, indicating connectors aren’t supplying signatures.
- Guardrail monitoring – graph `excititor.vex.aoc.guard_violations` per `code` to catch upstream feed regressions before they pollute Evidence Locker or Lens caches.
- Capacity planning – histogram percentiles of `excititor.vex.observation.statement_count` feed API sizing (higher counts mean Advisory AI is requesting broad scopes).
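To make the capacity-planning hint concrete, here is a minimal sketch of how a percentile can be estimated from explicit-bucket histogram data such as `excititor.vex.observation.statement_count`. It interpolates linearly inside the bucket that contains the target rank, which is a common approach in metrics backends. The bucket bounds and counts are invented for illustration and are not the boundaries the WebService actually configures.

```python
from bisect import bisect_left
from itertools import accumulate


def estimate_percentile(bounds, counts, q):
    """Estimate the q-quantile (0 < q <= 1) from explicit-bucket histogram data.

    bounds: ascending upper bounds of the finite buckets.
    counts: per-bucket (non-cumulative) counts; one more entry than bounds,
            the last entry being the overflow bucket above bounds[-1].
    Returns None when the histogram is empty.
    """
    if len(counts) != len(bounds) + 1:
        raise ValueError("counts must have exactly one more entry than bounds")
    total = sum(counts)
    if total == 0:
        return None

    rank = q * total
    cumulative = list(accumulate(counts))
    idx = bisect_left(cumulative, rank)  # first bucket whose cumulative count reaches rank

    if idx == len(bounds):
        # Rank falls in the overflow bucket: only the highest finite bound is known.
        return float(bounds[-1])

    lower = bounds[idx - 1] if idx > 0 else 0.0
    upper = bounds[idx]
    below = cumulative[idx - 1] if idx > 0 else 0
    # Linear interpolation within the selected bucket.
    return lower + (upper - lower) * (rank - below) / counts[idx]


# Hypothetical bucket layout and counts, for illustration only.
bounds = [1, 5, 10, 50, 100]
counts = [40, 30, 20, 8, 2, 0]

p50 = estimate_percentile(bounds, counts, 0.50)
p95 = estimate_percentile(bounds, counts, 0.95)
print(f"p50 ~ {p50:.2f} statements, p95 ~ {p95:.2f} statements")
```

The estimate is only as precise as the bucket boundaries allow, so a rising p95 is better read as a trend signal for broad Advisory AI scopes than as an exact statement count.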
Operational steps
- Enable telemetry: set `Excititor:Telemetry:EnableMetrics=true` and configure OTLP endpoints/headers as described in `TelemetryExtensions`.
- Add dashboards: import panels referencing the metrics above (see Grafana JSON snippets in Ops repo once merged).
- Alerting: add rules for high guard violation rates and missing signatures. Tie alerts back to connectors via tenant metadata.
- Post-deploy checks: after each release, verify that metrics are emitted by curling `/v1/vex/observations/...` and watching the console exporter (dev) or OTLP (prod).
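The alerting step can be reasoned about as a comparison between two scrapes of a cumulative counter. The sketch below assumes per-tenant snapshots of `excititor.vex.signature.status{status="missing"}` have already been pulled from whatever backend holds the metrics. The function names, tenant names, and the threshold of 100 are hypothetical, and a real deployment would more likely express this as an alert rule in the monitoring system than in application code.

```python
def counter_increase(previous, current):
    """Per-key increase between two snapshots of a cumulative counter.

    A value lower than the previous one is treated as a process restart,
    so the new value is counted in full.
    """
    increases = {}
    for key, value in current.items():
        before = previous.get(key, 0)
        increases[key] = value - before if value >= before else value
    return increases


def tenants_over_threshold(previous, current, threshold):
    """Return the tenants whose counter grew by more than threshold in the window."""
    increases = counter_increase(previous, current)
    return sorted(tenant for tenant, inc in increases.items() if inc > threshold)


# Hypothetical snapshots of the missing-signature counter, keyed by tenant.
previous = {"tenant-a": 120, "tenant-b": 15}
current = {"tenant-a": 125, "tenant-b": 415, "tenant-c": 3}

flagged = tenants_over_threshold(previous, current, threshold=100)
print("tenants with a missing-signature spike:", flagged)
```

The same comparison applies to `excititor.vex.aoc.guard_violations` if the snapshot keys are `(tenant, code)` pairs instead of tenant names, which keeps the alert traceable to a specific connector and AOC error code.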
Related documents
- `docs/modules/excititor/architecture.md` – API contract, AOC guardrails, connector responsibilities.
- `docs/modules/excititor/mirrors.md` – AirGap/mirror ingestion checklist (feeds into `EXCITITOR-AIRGAP-56/57`).
- `docs/modules/platform/architecture-overview.md#observability` – platform-wide telemetry guidance.