Implement ledger metrics for observability and add tests for Ruby packages endpoints
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled

- Added `LedgerMetrics` class to record write latency and total events for ledger operations.
- Created comprehensive tests for Ruby packages endpoints, covering scenarios for missing inventory, successful retrieval, and identifier handling.
- Introduced `TestSurfaceSecretsScope` for managing environment variables during tests.
- Developed `ProvenanceMongoExtensions` for attaching DSSE provenance and trust information to event documents.
- Implemented `EventProvenanceWriter` and `EventWriter` classes for managing event provenance in MongoDB.
- Established MongoDB indexes for efficient querying of events based on provenance and trust.
- Added models and JSON parsing logic for DSSE provenance and trust information.
This commit is contained in:
master
2025-11-13 09:29:09 +02:00
parent 151f6b35cc
commit 61f963fd52
101 changed files with 5881 additions and 1776 deletions

View File

@@ -0,0 +1,41 @@
# Excititor Observability Guide
> Added 2025-11-14 alongside Sprint 119 (`EXCITITOR-AIAI-31-003`). Complements the AirGap/mirror runbooks under the same folder.
Excititors evidence APIs now emit first-class OpenTelemetry metrics so Lens, Advisory AI, and Ops can detect misuse or missing provenance without paging through logs. This document lists the counters/histograms shipped by the WebService (`src/Excititor/StellaOps.Excititor.WebService`) and how to hook them into your exporters/dashboards.
## Telemetry prerequisites
- Enable `Excititor:Telemetry` in the service configuration (`appsettings.*`), ensuring **metrics** export is on. The WebService automatically adds the evidence meter (`StellaOps.Excititor.WebService.Evidence`) alongside the ingestion meter.
- Deploy at least one OTLP or console exporter (see `TelemetryExtensions.ConfigureExcititorTelemetry`). If your region lacks OTLP transport, fall back to scraping the console exporter for smoke tests.
- Coordinate with the Ops/Signals guild to provision the span/metric sinks referenced in `docs/modules/platform/architecture-overview.md#observability`.
## Metrics reference
| Metric | Type | Description | Key dimensions |
| --- | --- | --- | --- |
| `excititor.vex.observation.requests` | Counter | Number of `/v1/vex/observations/{vulnerabilityId}/{productKey}` requests handled. | `tenant`, `outcome` (`success`, `error`, `cancelled`), `truncated` (`true/false`) |
| `excititor.vex.observation.statement_count` | Histogram | Distribution of statements returned per observation projection request. | `tenant`, `outcome` |
| `excititor.vex.signature.status` | Counter | Signature status per statement (missing vs. unverified). | `tenant`, `status` (`missing`, `unverified`) |
| `excititor.vex.aoc.guard_violations` | Counter | Aggregated count of Aggregation-Only Contract violations detected by the WebService (ingest + `/vex/aoc/verify`). | `tenant`, `surface` (`ingest`, `aoc_verify`, etc.), `code` (AOC error code) |
> All metrics originate from the `EvidenceTelemetry` helper (`src/Excititor/StellaOps.Excititor.WebService/Telemetry/EvidenceTelemetry.cs`). When disabled (telemetry off), the helper is inert.
### Dashboard hints
- **Advisory-AI readiness** alert when `excititor.vex.signature.status{status="missing"}` spikes for a tenant, indicating connectors arent supplying signatures.
- **Guardrail monitoring** graph `excititor.vex.aoc.guard_violations` per `code` to catch upstream feed regressions before they pollute Evidence Locker or Lens caches.
- **Capacity planning** histogram percentiles of `excititor.vex.observation.statement_count` feed API sizing (higher counts mean Advisory AI is requesting broad scopes).
## Operational steps
1. **Enable telemetry**: set `Excititor:Telemetry:EnableMetrics=true`, configure OTLP endpoints/headers as described in `TelemetryExtensions`.
2. **Add dashboards**: import panels referencing the metrics above (see Grafana JSON snippets in Ops repo once merged).
3. **Alerting**: add rules for high guard violation rates and missing signatures. Tie alerts back to connectors via tenant metadata.
4. **Post-deploy checks**: after each release, verify metrics emit by curling `/v1/vex/observations/...`, watching the console exporter (dev) or OTLP (prod).
## Related documents
- `docs/modules/excititor/architecture.md` API contract, AOC guardrails, connector responsibilities.
- `docs/modules/excititor/mirrors.md` AirGap/mirror ingestion checklist (feeds into `EXCITITOR-AIRGAP-56/57`).
- `docs/modules/platform/architecture-overview.md#observability` platform-wide telemetry guidance.