Implement ledger metrics for observability and add tests for Ruby packages endpoints
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Added `LedgerMetrics` class to record write latency and total events for ledger operations. - Created comprehensive tests for Ruby packages endpoints, covering scenarios for missing inventory, successful retrieval, and identifier handling. - Introduced `TestSurfaceSecretsScope` for managing environment variables during tests. - Developed `ProvenanceMongoExtensions` for attaching DSSE provenance and trust information to event documents. - Implemented `EventProvenanceWriter` and `EventWriter` classes for managing event provenance in MongoDB. - Established MongoDB indexes for efficient querying of events based on provenance and trust. - Added models and JSON parsing logic for DSSE provenance and trust information.
This commit is contained in:
41
docs/modules/excititor/operations/observability.md
Normal file
41
docs/modules/excititor/operations/observability.md
Normal file
@@ -0,0 +1,41 @@
|
||||
# Excititor Observability Guide
|
||||
|
||||
> Added 2025-11-14 alongside Sprint 119 (`EXCITITOR-AIAI-31-003`). Complements the AirGap/mirror runbooks under the same folder.
|
||||
|
||||
Excititor’s evidence APIs now emit first-class OpenTelemetry metrics so Lens, Advisory AI, and Ops can detect misuse or missing provenance without paging through logs. This document lists the counters/histograms shipped by the WebService (`src/Excititor/StellaOps.Excititor.WebService`) and how to hook them into your exporters/dashboards.
|
||||
|
||||
## Telemetry prerequisites
|
||||
|
||||
- Enable `Excititor:Telemetry` in the service configuration (`appsettings.*`), ensuring **metrics** export is on. The WebService automatically adds the evidence meter (`StellaOps.Excititor.WebService.Evidence`) alongside the ingestion meter.
|
||||
- Deploy at least one OTLP or console exporter (see `TelemetryExtensions.ConfigureExcititorTelemetry`). If your region lacks OTLP transport, fall back to scraping the console exporter for smoke tests.
|
||||
- Coordinate with the Ops/Signals guild to provision the span/metric sinks referenced in `docs/modules/platform/architecture-overview.md#observability`.
|
||||
|
||||
## Metrics reference
|
||||
|
||||
| Metric | Type | Description | Key dimensions |
|
||||
| --- | --- | --- | --- |
|
||||
| `excititor.vex.observation.requests` | Counter | Number of `/v1/vex/observations/{vulnerabilityId}/{productKey}` requests handled. | `tenant`, `outcome` (`success`, `error`, `cancelled`), `truncated` (`true/false`) |
|
||||
| `excititor.vex.observation.statement_count` | Histogram | Distribution of statements returned per observation projection request. | `tenant`, `outcome` |
|
||||
| `excititor.vex.signature.status` | Counter | Signature status per statement (missing vs. unverified). | `tenant`, `status` (`missing`, `unverified`) |
|
||||
| `excititor.vex.aoc.guard_violations` | Counter | Aggregated count of Aggregation-Only Contract violations detected by the WebService (ingest + `/vex/aoc/verify`). | `tenant`, `surface` (`ingest`, `aoc_verify`, etc.), `code` (AOC error code) |
|
||||
|
||||
> All metrics originate from the `EvidenceTelemetry` helper (`src/Excititor/StellaOps.Excititor.WebService/Telemetry/EvidenceTelemetry.cs`). When disabled (telemetry off), the helper is inert.
|
||||
|
||||
### Dashboard hints
|
||||
|
||||
- **Advisory-AI readiness** – alert when `excititor.vex.signature.status{status="missing"}` spikes for a tenant, indicating connectors aren’t supplying signatures.
|
||||
- **Guardrail monitoring** – graph `excititor.vex.aoc.guard_violations` per `code` to catch upstream feed regressions before they pollute Evidence Locker or Lens caches.
|
||||
- **Capacity planning** – histogram percentiles of `excititor.vex.observation.statement_count` feed API sizing (higher counts mean Advisory AI is requesting broad scopes).
|
||||
|
||||
## Operational steps
|
||||
|
||||
1. **Enable telemetry**: set `Excititor:Telemetry:EnableMetrics=true`, configure OTLP endpoints/headers as described in `TelemetryExtensions`.
|
||||
2. **Add dashboards**: import panels referencing the metrics above (see Grafana JSON snippets in Ops repo once merged).
|
||||
3. **Alerting**: add rules for high guard violation rates and missing signatures. Tie alerts back to connectors via tenant metadata.
|
||||
4. **Post-deploy checks**: after each release, verify metrics emit by curling `/v1/vex/observations/...`, watching the console exporter (dev) or OTLP (prod).
|
||||
|
||||
## Related documents
|
||||
|
||||
- `docs/modules/excititor/architecture.md` – API contract, AOC guardrails, connector responsibilities.
|
||||
- `docs/modules/excititor/mirrors.md` – AirGap/mirror ingestion checklist (feeds into `EXCITITOR-AIRGAP-56/57`).
|
||||
- `docs/modules/platform/architecture-overview.md#observability` – platform-wide telemetry guidance.
|
||||
Reference in New Issue
Block a user