docs consolidation

This commit is contained in:
StellaOps Bot
2025-12-25 12:16:13 +02:00
parent deb82b4f03
commit 223843f1d1
34 changed files with 2141 additions and 106 deletions

View File

@@ -156,40 +156,40 @@ Downstream automation reads `manifest.json`/`bundle.json` directly, while `/exci
---
## 6) Future alignment
* Replace manual export definitions with generated mirror bundle manifests once `EXCITITOR-EXPORT-01-007` ships.
* Extend `/index` payload with quiet-provenance when `EXCITITOR-EXPORT-01-006` adds that metadata.
* Integrate domain manifests with DevOps mirror profiles (`DEVOPS-MIRROR-08-001`) so helm/compose overlays can enable or disable domains declaratively.
---
## 7) Runbook & observability checklist (Sprint 22 demo refresh · 2025-11-07)
### Daily / on-call checks
1. **Index freshness** watch `excitor_mirror_export_latency_seconds` (p95 < 180) grouped by `domainId`. If latency grows past 10 minutes, verify the export worker queue (`stellaops-export-worker` logs) and ensure Mongo `vex_exports` has entries newer than `now()-10m`.
2. **Quota exhaustion** alert on `excitor_mirror_quota_exhausted_total{scope="download"}` increases. When triggered, inspect structured logs (`MirrorDomainId`, `QuotaScope`, `RemoteIp`) and either raise limits or throttle abusive clients.
3. **Bundle signature health** metric `excitor_mirror_bundle_signature_verified_total` should match download counts when signing enabled. Deltas indicate missing `.jws` files; rebuild the bundle via export job or copy artefacts from the authority mirror cache.
4. **HTTP errors** dashboards should track 4xx/5xx rates split by route; repeated `503` statuses imply misconfigured exports. Check `mirror/index` logs for `status=misconfigured`.
### Incident steps
1. Use `GET /excititor/mirror/domains/{id}/index` to capture current manifests. Attach the response to the incident log for reproducibility.
2. For quota incidents, temporarily raise `maxIndexRequestsPerHour`/`maxDownloadRequestsPerHour` via the `Excititor:Mirror:Domains` config override, redeploy, then work with the consuming team on caching.
3. For stale exports, trigger the export job (`Excititor.ExportRunner`) and confirm the artefacts are written to `outputRoot/<domain>`.
4. Validate DSSE artefacts by running `cosign verify-blob --certificate-rekor-url=<rekor> --bundle <domain>/bundle.json --signature <domain>/bundle.json.jws`.
### Logging fields (structured)
| Field | Description |
| --- | --- |
| `MirrorDomainId` | Domain handling the request (matches `id` in config). |
| `QuotaScope` | `index` / `download`, useful when alerting on quota events. |
| `ExportKey` | Included in download logs to pinpoint misconfigured exports. |
| `BundleDigest` | SHA-256 of the artefact; compare with index payload when debugging corruption. |
### OTEL signals
- **Counters:** `excitor.mirror.requests`, `excitor.mirror.quota_blocked`, `excitor.mirror.signature.failures`.
- **Histograms:** `excitor.mirror.download.duration`, `excitor.mirror.export.latency`.
- **Spans:** `mirror.index`, `mirror.download` include attributes `mirror.domain`, `mirror.export.key`, and `mirror.quota.remaining`.
Add these instruments via the `MirrorEndpoints` middleware; see `StellaOps.Excititor.WebService/Telemetry/MirrorMetrics.cs`.
## 6) Future alignment
* Replace manual export definitions with generated mirror bundle manifests once `EXCITITOR-EXPORT-01-007` ships.
* Extend `/index` payload with quiet-provenance when `EXCITITOR-EXPORT-01-006` adds that metadata.
* Integrate domain manifests with DevOps mirror profiles (`DEVOPS-MIRROR-08-001`) so helm/compose overlays can enable or disable domains declaratively.
---
## 7) Runbook & observability checklist (Sprint 22 demo refresh · 2025-11-07)
### Daily / on-call checks
1. **Index freshness** watch `excitor_mirror_export_latency_seconds` (p95 < 180) grouped by `domainId`. If latency grows past 10 minutes, verify the export worker queue (`stellaops-export-worker` logs) and ensure PostgreSQL `vex.exports` has entries newer than `now()-10m`.
2. **Quota exhaustion** alert on `excitor_mirror_quota_exhausted_total{scope="download"}` increases. When triggered, inspect structured logs (`MirrorDomainId`, `QuotaScope`, `RemoteIp`) and either raise limits or throttle abusive clients.
3. **Bundle signature health** metric `excitor_mirror_bundle_signature_verified_total` should match download counts when signing enabled. Deltas indicate missing `.jws` files; rebuild the bundle via export job or copy artefacts from the authority mirror cache.
4. **HTTP errors** dashboards should track 4xx/5xx rates split by route; repeated `503` statuses imply misconfigured exports. Check `mirror/index` logs for `status=misconfigured`.
### Incident steps
1. Use `GET /excititor/mirror/domains/{id}/index` to capture current manifests. Attach the response to the incident log for reproducibility.
2. For quota incidents, temporarily raise `maxIndexRequestsPerHour`/`maxDownloadRequestsPerHour` via the `Excititor:Mirror:Domains` config override, redeploy, then work with the consuming team on caching.
3. For stale exports, trigger the export job (`Excititor.ExportRunner`) and confirm the artefacts are written to `outputRoot/<domain>`.
4. Validate DSSE artefacts by running `cosign verify-blob --certificate-rekor-url=<rekor> --bundle <domain>/bundle.json --signature <domain>/bundle.json.jws`.
### Logging fields (structured)
| Field | Description |
| --- | --- |
| `MirrorDomainId` | Domain handling the request (matches `id` in config). |
| `QuotaScope` | `index` / `download`, useful when alerting on quota events. |
| `ExportKey` | Included in download logs to pinpoint misconfigured exports. |
| `BundleDigest` | SHA-256 of the artefact; compare with index payload when debugging corruption. |
### OTEL signals
- **Counters:** `excitor.mirror.requests`, `excitor.mirror.quota_blocked`, `excitor.mirror.signature.failures`.
- **Histograms:** `excitor.mirror.download.duration`, `excitor.mirror.export.latency`.
- **Spans:** `mirror.index`, `mirror.download` include attributes `mirror.domain`, `mirror.export.key`, and `mirror.quota.remaining`.
Add these instruments via the `MirrorEndpoints` middleware; see `StellaOps.Excititor.WebService/Telemetry/MirrorMetrics.cs`.