feat: Add Scanner CI runner and related artifacts
- Implemented `run-scanner-ci.sh` to build and run tests for the Scanner solution with a warmed NuGet cache.
- Created `excititor-vex-traces.json` dashboard for monitoring Excititor VEX observations.
- Added Docker Compose configuration for the OTLP span sink in `docker-compose.spansink.yml`.
- Configured OpenTelemetry collector in `otel-spansink.yaml` to receive and process traces.
- Developed `run-spansink.sh` script to run the OTLP span sink for Excititor traces.
- Introduced `FileSystemRiskBundleObjectStore` for storing risk bundle artifacts in the filesystem.
- Built `RiskBundleBuilder` for creating risk bundles with associated metadata and providers.
- Established `RiskBundleJob` to execute the risk bundle creation and storage process.
- Defined models for risk bundle inputs, entries, and manifests in `RiskBundleModels.cs`.
- Implemented signing functionality for risk bundle manifests with `HmacRiskBundleManifestSigner`.
- Created unit tests for `RiskBundleBuilder`, `RiskBundleJob`, and signing functionality to ensure correctness.
- Added filesystem artifact reader tests to validate manifest parsing and artifact listing.
- Included test manifests for egress scenarios in the task runner tests.
- Developed timeline query service tests to verify tenant and event ID handling.
37
docs/modules/export-center/operations/observability.md
Normal file
@@ -0,0 +1,37 @@
# Export Center observability runbook (stub · 2025-11-29 demo)
## Dashboards (offline import)
- Grafana JSON: `docs/modules/export-center/operations/dashboards/export-center-observability.json` (import locally; no external data sources assumed).
- Planned panels: export job duration p95/p99, bundle size histogram, registry push latency, provenance/attestation verification failures, queue depth, and error rate per profile.
## Key metrics
- `export_job_duration_seconds_bucket{profile}` — export duration by profile.
- `export_bundle_size_bytes_bucket{profile}` — bundle size distribution.
- `export_registry_push_latency_seconds_bucket{profile}` — registry push latency.
- `export_attestation_failures_total{reason}` — DSSE/provenance verification failures.
- `export_queue_depth` — pending export jobs.
- `export_manifest_publish_total{result}` — manifest publish successes/failures.
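
The `_bucket` series above are Prometheus histograms, so the p95/p99 panels are estimated from cumulative bucket counts rather than read directly. The sketch below illustrates that estimation with linear interpolation inside the matching bucket. It is a simplified approximation of what PromQL's `histogram_quantile()` does, not the Prometheus implementation, and the bucket bounds and counts in the usage note are made-up values.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Simplified sketch: assumes the lowest bucket starts at 0 and
    interpolates linearly inside the bucket that contains the target rank.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    if not buckets:
        return math.nan
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls into the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with hypothetical cumulative counts `[(1.0, 10), (5.0, 90), (10.0, 100), (math.inf, 100)]`, the 0.95 rank lands halfway through the 5 to 10 second bucket, so the estimate sits between those two bounds. Coarse buckets give coarse estimates, which is worth keeping in mind when comparing p99 to an SLA target.
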
## Logs & traces
- Correlate by `exportId`, `profile`, and `tenant`; include `bundleDigest`, `attestationStatus`, and `registry` in each record. Traces are disabled by default; enable OTLP export to an on-prem collector when permitted.
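
As an illustration of the correlation fields listed above, here is a minimal sketch that renders one structured log line as JSON. The field names come from this runbook; the function name `format_export_log`, the `message` key, and the sample values are hypothetical and do not describe the export service's actual logging code.

```python
import json


def format_export_log(message: str, *, export_id: str, profile: str, tenant: str,
                      bundle_digest: str, attestation_status: str, registry: str) -> str:
    """Render a single JSON log line carrying the runbook's correlation fields."""
    record = {
        "message": message,
        # Correlation keys: search and join on these.
        "exportId": export_id,
        "profile": profile,
        "tenant": tenant,
        # Context fields: included on every record for diagnosis.
        "bundleDigest": bundle_digest,
        "attestationStatus": attestation_status,
        "registry": registry,
    }
    # sort_keys keeps the output stable, which helps when diffing logs offline.
    return json.dumps(record, sort_keys=True)
```

A call such as `format_export_log("bundle pushed", export_id="exp-1", profile="mirror", tenant="tenant-a", bundle_digest="sha256:abc", attestation_status="verified", registry="registry.local")` yields one line that can be filtered by `exportId` with standard JSON tooling.
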
## Health/diagnostics
- `/health/liveness` and `/health/readiness` (export service) check storage, registry reachability, and attestation verification path.
- `/status` exposes build version, commit, feature flags; verify against offline bundle manifest.
- Verification probe: run `stella export bundle verify --manifest <path>` once a bundle is available, and validate the hashes against the manifest.
## Alert hints
- Export job duration p99 > target SLA per profile.
- Attestation verification failures > 0 over 10m.
- Registry push latency spikes or error rate > threshold.
- Queue depth growth without completion.
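
Two of the hints above ("failures > 0 over 10m" and "queue depth growth without completion") reduce to simple window checks over time-ordered samples. The sketch below shows one way to express them. The `(timestamp, value)` sample shape, the function names, and the idea of using a completion counter as the "completion" signal are assumptions for illustration; real alert rules would live in the monitoring stack.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (unix_seconds, value), assumed sorted by time


def _window(samples: Sequence[Sample], window_seconds: float, now: float) -> List[float]:
    """Return the values whose timestamps fall inside [now - window, now]."""
    return [v for t, v in samples if now - window_seconds <= t <= now]


def counter_increase(samples: Sequence[Sample], window_seconds: float, now: float) -> float:
    """Total increase of a monotonic counter in the window, tolerating resets."""
    values = _window(samples, window_seconds, now)
    increase = 0.0
    for prev, cur in zip(values, values[1:]):
        # A drop means the counter restarted from zero, so count cur as new.
        increase += cur - prev if cur >= prev else cur
    return increase


def queue_stalled(depth: Sequence[Sample], completed: Sequence[Sample],
                  window_seconds: float, now: float) -> bool:
    """True when queue depth grew over the window but nothing completed."""
    depths = _window(depth, window_seconds, now)
    growing = len(depths) >= 2 and depths[-1] > depths[0]
    return growing and counter_increase(completed, window_seconds, now) == 0
```

With a 600-second window, `counter_increase(failures, 600, now) > 0` corresponds to the attestation hint, and `queue_stalled(...)` to the queue hint.
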
## Offline verification steps
1) Import the Grafana JSON locally and point it at the Prometheus scrape labeled `export-center`.
2) Run `stella export bundle --profile <profile> --manifest out/manifest.json` and verify hashes via `jq -r '.files[].sha256'` against generated bundles.
3) Fetch `/status` and compare commit/version to offline bundle manifest.
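
The hash check in step 2 can also be scripted when `jq` output needs to be compared against many files. The sketch below assumes the manifest is a JSON object with a `files` array whose entries carry `sha256` (as the `jq` path in step 2 implies) and a relative `path` field. The `path` field name and the function names are assumptions, so adjust them to the real manifest schema.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large bundles do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path, bundle_dir: Path) -> List[str]:
    """Return the manifest paths that are missing or whose hash does not match."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    mismatches = []
    for entry in manifest.get("files", []):
        target = bundle_dir / entry["path"]  # "path" is an assumed field name
        expected = entry["sha256"].lower()
        if not target.is_file() or sha256_of(target) != expected:
            mismatches.append(entry["path"])
    return mismatches
```

An empty list from `verify_manifest(Path("out/manifest.json"), Path("out"))` means every listed file is present with the expected digest; any returned path should be treated as a failed verification.
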
## Evidence locations
- Sprint tracker: `docs/implplan/SPRINT_0320_0001_0001_docs_modules_export_center.md`.
- Module docs: `README.md`, `architecture.md`, `implementation_plan.md`.
- Dashboard stub: `operations/dashboards/export-center-observability.json`.