up
Some checks failed
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Mirror Thin Bundle Sign & Verify / mirror-sign (push) Has been cancelled
api-governance / spectral-lint (push) Has been cancelled

This commit is contained in:
StellaOps Bot
2025-11-24 07:52:25 +02:00
parent 5970f0d9bd
commit 150b3730ef
215 changed files with 8119 additions and 740 deletions

View File

@@ -37,6 +37,24 @@ Excititors evidence APIs now emit first-class OpenTelemetry metrics so Lens,
3. **Alerting**: add rules for high guard violation rates, missing signatures, and abnormal chunk bytes/record counts. Tie alerts back to connectors via tenant metadata.
4. **Post-deploy checks**: after each release, verify metrics emit by curling `/v1/vex/observations/...` and `/v1/vex/evidence/chunks`, watching the console exporter (dev) or OTLP (prod).
## SLOs (Sprint 119 OBS-51-001)
The following SLOs apply to Excititor evidence read paths when telemetry is enabled. Record them in the shared SLO registry and alert via the platform alertmanager.
| Surface | SLI | Target | Window | Burn alert | Notes |
| --- | --- | --- | --- | --- | --- |
| `/v1/vex/observations` | p95 latency | ≤ 450ms | 7d | 2% over 1h | Measured on successful responses only; tenant scoped. |
| `/v1/vex/observations` | freshness | ≥ 99% within 5min of upstream ingest | 7d | 5% over 4h | Derived from arrival minus `createdAt`; requires ingest clocks in UTC. |
| `/v1/vex/observations` | signature presence | ≥ 98% statements with signature present | 7d | 3% over 24h | Use `excititor.vex.signature.status{status="missing"}`. |
| `/v1/vex/evidence/chunks` | p95 stream duration | ≤ 600ms | 7d | 2% over 1h | From request start to last NDJSON write; excludes client disconnects. |
| `/v1/vex/evidence/chunks` | truncation rate | ≤ 1% truncated streams | 7d | 1% over 1h | `excititor.vex.chunks.records` with `truncated=true`. |
| AOC guardrail | zero hard violations | 0 | continuous | immediate | Any `excititor.vex.aoc.guard_violations` with severity `error` pages ops. |
Implementation notes:
- Emit latency/freshness SLOs via OTEL views that pre-aggregate by tenant and route to the platform SLO backend; keep bucket boundaries aligned with 50/100/250/450/650/1000ms.
- Freshness SLI derived from ingest timestamps; ensure clocks are synchronized (NTP) and stored in UTC.
- For air-gapped deployments without OTEL sinks, scrape console exporter and push to offline Prometheus; same thresholds apply.
## Related documents
- `docs/modules/excititor/architecture.md` API contract, AOC guardrails, connector responsibilities.