Excititor Observability Guide

Added 2025-11-14 alongside Sprint 119 (EXCITITOR-AIAI-31-003). Complements the AirGap/mirror runbooks under the same folder.

Excititor's evidence APIs now emit first-class OpenTelemetry metrics so Lens, Advisory AI, and Ops can detect misuse or missing provenance without paging through logs. This document lists the counters and histograms shipped by the WebService (src/Excititor/StellaOps.Excititor.WebService) and explains how to hook them into your exporters and dashboards.

Telemetry prerequisites

  • Enable Excititor:Telemetry in the service configuration (appsettings.*), ensuring metrics export is on. The WebService automatically adds the evidence meter (StellaOps.Excititor.WebService.Evidence) alongside the ingestion meter.
  • Deploy at least one OTLP or console exporter (see TelemetryExtensions.ConfigureExcititorTelemetry). If your region lacks OTLP transport, fall back to scraping the console exporter for smoke tests.
  • Coordinate with the Ops/Signals guild to provision the span/metric sinks referenced in docs/modules/platform/architecture-overview.md#observability.
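As a rough illustration of how a colon-delimited setting such as Excititor:Telemetry:EnableMetrics maps onto configuration sources, the sketch below expands the key into the nested JSON shape used by appsettings.* files and into the double-underscore environment-variable form that .NET configuration accepts. The helper names are hypothetical, and whether a given deployment reads environment variables depends on how the host is configured.

```python
import json


def nest_config(flat: dict) -> dict:
    """Expand colon-delimited .NET configuration keys into nested objects."""
    root: dict = {}
    for key, value in flat.items():
        node = root
        *parents, leaf = key.split(":")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return root


def env_var_name(key: str) -> str:
    """Environment-variable form of a configuration key (':' becomes '__')."""
    return key.replace(":", "__")


# The only key taken from this guide; any other telemetry keys would be assumptions.
settings = {"Excititor:Telemetry:EnableMetrics": True}

print(json.dumps(nest_config(settings), indent=2))
print(env_var_name("Excititor:Telemetry:EnableMetrics"))
```

The JSON output can be merged into an existing appsettings file; the environment-variable form is convenient for container deployments where files are immutable.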

Metrics reference

| Metric | Type | Description | Key dimensions |
| --- | --- | --- | --- |
| excititor.vex.observation.requests | Counter | Number of /v1/vex/observations/{vulnerabilityId}/{productKey} requests handled. | tenant, outcome (success, error, cancelled), truncated (true/false) |
| excititor.vex.observation.statement_count | Histogram | Distribution of statements returned per observation projection request. | tenant, outcome |
| excititor.vex.signature.status | Counter | Signature status per statement (missing vs. unverified). | tenant, status (missing, unverified) |
| excititor.vex.aoc.guard_violations | Counter | Aggregated count of Aggregation-Only Contract violations detected by the WebService (ingest + /vex/aoc/verify). | tenant, surface (ingest, aoc_verify, etc.), code (AOC error code) |

All metrics originate from the EvidenceTelemetry helper (src/Excititor/StellaOps.Excititor.WebService/Telemetry/EvidenceTelemetry.cs). When telemetry is disabled, the helper is inert.
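If the metrics end up in a Prometheus-compatible backend, the dotted instrument names are usually rewritten on export. The sketch below shows the common OpenTelemetry-to-Prometheus convention (dots become underscores, counters gain a _total suffix, histograms expand into _bucket, _sum, and _count series). It assumes the instruments declare no unit; the exact series names depend on the exporter or collector version and its settings, so treat the output as a starting point for queries rather than a guarantee.

```python
import re

# Instrument names and kinds as listed in the metrics reference.
INSTRUMENTS = {
    "excititor.vex.observation.requests": "counter",
    "excititor.vex.observation.statement_count": "histogram",
    "excititor.vex.signature.status": "counter",
    "excititor.vex.aoc.guard_violations": "counter",
}


def prometheus_series(name: str, kind: str) -> list:
    """Return the series names a Prometheus backend would likely expose."""
    # Characters outside the Prometheus metric-name alphabet become underscores.
    base = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    if kind == "counter":
        return [base if base.endswith("_total") else base + "_total"]
    if kind == "histogram":
        return [base + "_bucket", base + "_sum", base + "_count"]
    raise ValueError("unsupported instrument kind: " + kind)


for metric, kind in INSTRUMENTS.items():
    print(metric, "->", ", ".join(prometheus_series(metric, kind)))
```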

Dashboard hints

  • Advisory-AI readiness: alert when excititor.vex.signature.status{status="missing"} spikes for a tenant, which indicates that connectors aren't supplying signatures.
  • Guardrail monitoring: graph excititor.vex.aoc.guard_violations per code to catch upstream feed regressions before they pollute Evidence Locker or Lens caches.
  • Capacity planning: use histogram percentiles of excititor.vex.observation.statement_count to inform API sizing (higher counts mean Advisory AI is requesting broad scopes).
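To make the capacity-planning hint concrete, here is a minimal sketch of how a percentile can be estimated from cumulative histogram buckets using linear interpolation, the same idea most dashboards apply to bucketed data. The bucket boundaries and counts are invented for illustration and are not the boundaries Excititor ships.

```python
from math import inf


def bucket_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if bound == inf or in_bucket == 0:
                # No finite upper edge (or an empty bucket): report the lower edge.
                return prev_bound
            # Interpolate linearly between the bucket's lower and upper bounds.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts of statements per request.
SAMPLE = [(1, 40), (5, 70), (10, 90), (50, 99), (inf, 100)]

print(round(bucket_quantile(0.50, SAMPLE), 2))
print(round(bucket_quantile(0.95, SAMPLE), 2))
```

A rising p95 with a flat median suggests that a few callers are requesting very broad scopes, which is a different sizing problem than a general increase in payload size.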

Operational steps

  1. Enable telemetry: set Excititor:Telemetry:EnableMetrics=true and configure OTLP endpoints/headers as described in TelemetryExtensions.
  2. Add dashboards: import panels referencing the metrics above (see the Grafana JSON snippets in the Ops repo once merged).
  3. Alerting: add rules for high guard violation rates and missing signatures. Tie alerts back to connectors via tenant metadata.
  4. Post-deploy checks: after each release, verify that metrics are emitted by curling /v1/vex/observations/... and watching the console exporter (dev) or OTLP (prod).
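The post-deploy check can be scripted. The sketch below builds a request URL from the /v1/vex/observations/{vulnerabilityId}/{productKey} route listed in the metrics reference and issues a single GET. The base URL, the sample identifiers, and the decision to percent-encode the product key are all assumptions, and any tenant or authentication headers your deployment requires would need to be passed in by the caller.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request, urlopen


def observation_url(base_url: str, vulnerability_id: str, product_key: str) -> str:
    """Build the observations URL with both path segments percent-encoded."""
    return "{}/v1/vex/observations/{}/{}".format(
        base_url.rstrip("/"),
        quote(vulnerability_id, safe=""),
        quote(product_key, safe=""),
    )


def smoke_check(url: str, headers: Optional[dict] = None, timeout: float = 10.0) -> int:
    """Issue one GET and return the HTTP status (urlopen raises HTTPError on 4xx/5xx)."""
    request = Request(url, headers=headers or {})
    with urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical host and identifiers; replace with values valid for your tenant.
    url = observation_url("http://localhost:8080", "CVE-2025-0001", "pkg:npm/example@1.0.0")
    print(url)
    # print(smoke_check(url))  # uncomment when a WebService instance is reachable
```

After the request completes, excititor.vex.observation.requests should show a new data point in the console exporter or the OTLP sink.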

Related documents

  • docs/modules/excititor/architecture.md: API contract, AOC guardrails, connector responsibilities.
  • docs/modules/excititor/mirrors.md: AirGap/mirror ingestion checklist (feeds into EXCITITOR-AIRGAP-56/57).
  • docs/modules/platform/architecture-overview.md#observability: platform-wide telemetry guidance.