Excititor Observability Guide

Added 2025-11-14 alongside Sprint 119 (EXCITITOR-AIAI-31-003). Complements the AirGap/mirror runbooks under the same folder.

Excititor's evidence APIs now emit first-class OpenTelemetry metrics so Lens, Advisory AI, and Ops can detect misuse or missing provenance without paging through logs. This document lists the counters and histograms shipped by the WebService (src/Excititor/StellaOps.Excititor.WebService) and explains how to hook them into your exporters and dashboards.

Telemetry prerequisites

  • Enable Excititor:Telemetry in the service configuration (appsettings.*), ensuring metrics export is on. The WebService automatically adds the evidence meter (StellaOps.Excititor.WebService.Evidence) alongside the ingestion meter.
  • Deploy at least one OTLP or console exporter (see TelemetryExtensions.ConfigureExcititorTelemetry). If your region lacks OTLP transport, fall back to scraping the console exporter for smoke tests.
  • Coordinate with the Ops/Signals guild to provision the span/metric sinks referenced in docs/modules/platform/architecture-overview.md#observability.
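As a rough illustration of the first prerequisite, the sketch below renders the `Excititor:Telemetry:EnableMetrics` key (named under Operational steps) in the two shapes .NET configuration commonly accepts: nested JSON for appsettings.* and a double-underscore environment variable. The helper names `to_env_var` and `to_appsettings` are hypothetical, and whether the WebService reads environment variables in your deployment is an assumption to verify.

```python
import json


def to_env_var(config_key: str) -> str:
    """Map a colon-delimited .NET configuration key to its environment-variable form."""
    # The .NET environment-variable provider treats "__" as the ":" separator.
    return config_key.replace(":", "__")


def to_appsettings(settings: dict) -> dict:
    """Expand colon-delimited keys into the nested shape used by appsettings.* files."""
    root: dict = {}
    for key, value in settings.items():
        node = root
        *parents, leaf = key.split(":")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return root


# The only key taken from this guide; everything else here is illustrative.
settings = {"Excititor:Telemetry:EnableMetrics": True}

print(json.dumps(to_appsettings(settings), indent=2))
for key, value in settings.items():
    print(f"{to_env_var(key)}={str(value).lower()}")
```

Exporter endpoints and headers are left out on purpose: their key names live in TelemetryExtensions and are not listed in this guide.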

Metrics reference

| Metric | Type | Description | Key dimensions |
| --- | --- | --- | --- |
| excititor.vex.observation.requests | Counter | Number of /v1/vex/observations/{vulnerabilityId}/{productKey} requests handled. | tenant, outcome (success, error, cancelled), truncated (true/false) |
| excititor.vex.observation.statement_count | Histogram | Distribution of statements returned per observation projection request. | tenant, outcome |
| excititor.vex.signature.status | Counter | Signature status per statement (missing vs. unverified). | tenant, status (missing, unverified) |
| excititor.vex.aoc.guard_violations | Counter | Aggregated count of Aggregation-Only Contract violations detected by the WebService (ingest + /v1/vex/aoc/verify). | tenant, surface (ingest, aoc_verify, etc.), code (AOC error code) |
| excititor.vex.chunks.requests | Counter | Requests to /v1/vex/evidence/chunks (NDJSON stream). | tenant, outcome (success, error, cancelled), truncated (true/false) |
| excititor.vex.chunks.bytes | Histogram | Size of NDJSON chunk streams served (bytes). | tenant, outcome |
| excititor.vex.chunks.records | Histogram | Count of evidence records emitted per chunk stream. | tenant, outcome |

All metrics originate from the EvidenceTelemetry helper (src/Excititor/StellaOps.Excititor.WebService/Telemetry/EvidenceTelemetry.cs). When telemetry is disabled, the helper is inert and emits nothing.
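One way to keep dashboards and alert rules in step with the metrics reference is to hold it as data and lint selectors against it. The following is a minimal sketch, not part of Excititor: the `METRICS` catalog mirrors the metric names, types, and dimensions documented above, while `check_selector` is a hypothetical helper.

```python
# Catalog transcribed from the metrics reference: name -> (instrument type, dimensions).
METRICS = {
    "excititor.vex.observation.requests": ("counter", {"tenant", "outcome", "truncated"}),
    "excititor.vex.observation.statement_count": ("histogram", {"tenant", "outcome"}),
    "excititor.vex.signature.status": ("counter", {"tenant", "status"}),
    "excititor.vex.aoc.guard_violations": ("counter", {"tenant", "surface", "code"}),
    "excititor.vex.chunks.requests": ("counter", {"tenant", "outcome", "truncated"}),
    "excititor.vex.chunks.bytes": ("histogram", {"tenant", "outcome"}),
    "excititor.vex.chunks.records": ("histogram", {"tenant", "outcome"}),
}


def check_selector(metric: str, labels: set[str]) -> list[str]:
    """Return a list of problems for a metric/label selector used in a panel or rule."""
    if metric not in METRICS:
        return [f"unknown metric: {metric}"]
    _, known = METRICS[metric]
    # Any label not documented for this metric is reported, in stable order.
    return [f"{metric}: unknown dimension '{label}'" for label in sorted(labels - known)]


print(check_selector("excititor.vex.signature.status", {"tenant", "status"}))
print(check_selector("excititor.vex.chunks.bytes", {"tenant", "truncated"}))
```

Note that some exporters rewrite metric names (for example, replacing dots with underscores), so the names a dashboard queries may differ from the instrument names listed here.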

Dashboard hints

  • Advisory-AI readiness: alert when excititor.vex.signature.status{status="missing"} spikes for a tenant, which indicates that connectors aren't supplying signatures.
  • Guardrail monitoring: graph excititor.vex.aoc.guard_violations per code to catch upstream feed regressions before they pollute Evidence Locker or Lens caches.
  • Capacity planning: histogram percentiles of excititor.vex.observation.statement_count feed API sizing (higher counts mean Advisory AI is requesting broad scopes).
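For the capacity-planning hint, a percentile can be estimated from exported histogram buckets by linear interpolation inside the bucket that contains the target rank, which is roughly what most dashboard backends do. The sketch below assumes per-bucket (non-cumulative) counts, a lowest bucket starting at 0, and no overflow bucket; the bucket boundaries and counts are made-up sample data, not Excititor defaults, and `estimate_quantile` is a hypothetical helper.

```python
def estimate_quantile(q: float, upper_bounds: list[float], counts: list[int]) -> float:
    """Estimate quantile q (0..1) from per-bucket counts using linear interpolation."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if len(upper_bounds) != len(counts):
        raise ValueError("upper_bounds and counts must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram has no observations")

    rank = q * total      # how many observations lie at or below the quantile
    seen = 0              # observations in buckets already passed
    lower = 0.0           # lower edge of the current bucket (assumed to start at 0)
    for upper, count in zip(upper_bounds, counts):
        if count and seen + count >= rank:
            # Assume observations are spread evenly across the bucket.
            fraction = (rank - seen) / count
            return lower + (upper - lower) * fraction
        seen += count
        lower = upper
    return upper_bounds[-1]


# Hypothetical statement_count buckets for one tenant: <=10, <=50, <=100, <=500.
bounds = [10.0, 50.0, 100.0, 500.0]
counts = [60, 30, 8, 2]
print(f"p50 ~ {estimate_quantile(0.50, bounds, counts):.2f} statements")
print(f"p95 ~ {estimate_quantile(0.95, bounds, counts):.2f} statements")
```

The estimate is only as fine as the bucket layout, so a p95 that lands in a wide bucket should be read as a range rather than a precise value.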

Operational steps

  1. Enable telemetry: set Excititor:Telemetry:EnableMetrics=true and configure OTLP endpoints/headers as described in TelemetryExtensions.
  2. Add dashboards: import panels referencing the metrics above (see Grafana JSON snippets in Ops repo once merged).
  3. Alerting: add rules for high guard violation rates, missing signatures, and abnormal chunk bytes/record counts. Tie alerts back to connectors via tenant metadata.
  4. Post-deploy checks: after each release, verify that metrics are emitted by curling /v1/vex/observations/... and /v1/vex/evidence/chunks while watching the console exporter (dev) or the OTLP sink (prod).
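
The post-deploy check in step 4 could be scripted along the following lines. This is a sketch under assumptions: the host name is a placeholder, the vulnerability ID and product key are sample values, percent-encoding the path segments is assumed to be what the service expects, and any query parameters, authentication, or tenant headers your deployment requires are omitted because this guide does not specify them.

```python
from urllib.parse import quote
from urllib.request import urlopen


def observation_url(base_url: str, vulnerability_id: str, product_key: str) -> str:
    """Build the /v1/vex/observations/{vulnerabilityId}/{productKey} URL."""
    # Product keys such as package URLs contain ":" and "/", so encode each segment fully.
    return (
        f"{base_url.rstrip('/')}/v1/vex/observations/"
        f"{quote(vulnerability_id, safe='')}/{quote(product_key, safe='')}"
    )


def chunks_url(base_url: str) -> str:
    """Build the /v1/vex/evidence/chunks URL (NDJSON stream)."""
    return f"{base_url.rstrip('/')}/v1/vex/evidence/chunks"


def smoke_check(base_url: str, vulnerability_id: str, product_key: str, timeout: float = 10.0) -> dict:
    """Request both endpoints once and return the HTTP status per URL."""
    results = {}
    for url in (observation_url(base_url, vulnerability_id, product_key), chunks_url(base_url)):
        # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
        with urlopen(url, timeout=timeout) as response:
            results[url] = response.status
    return results


# Placeholder values for illustration only; no request is sent here.
base = "https://excititor.example.internal"
print(observation_url(base, "CVE-2024-0001", "pkg:npm/lodash@4.17.21"))
print(chunks_url(base))
```

After calling `smoke_check` against a real instance, excititor.vex.observation.requests and excititor.vex.chunks.requests should each show a new data point in the console exporter or OTLP sink.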

Related documents

  • docs/modules/excititor/architecture.md: API contract, AOC guardrails, connector responsibilities.
  • docs/modules/excititor/mirrors.md: AirGap/mirror ingestion checklist (feeds into EXCITITOR-AIRGAP-56/57).
  • docs/modules/platform/architecture-overview.md#observability: platform-wide telemetry guidance.