- Implemented `run-scanner-ci.sh` to build and run tests for the Scanner solution with a warmed NuGet cache.
- Created `excititor-vex-traces.json` dashboard for monitoring Excititor VEX observations.
- Added Docker Compose configuration for the OTLP span sink in `docker-compose.spansink.yml`.
- Configured OpenTelemetry collector in `otel-spansink.yaml` to receive and process traces.
- Developed `run-spansink.sh` script to run the OTLP span sink for Excititor traces.
- Introduced `FileSystemRiskBundleObjectStore` for storing risk bundle artifacts in the filesystem.
- Built `RiskBundleBuilder` for creating risk bundles with associated metadata and providers.
- Established `RiskBundleJob` to execute the risk bundle creation and storage process.
- Defined models for risk bundle inputs, entries, and manifests in `RiskBundleModels.cs`.
- Implemented signing functionality for risk bundle manifests with `HmacRiskBundleManifestSigner`.
- Created unit tests for `RiskBundleBuilder`, `RiskBundleJob`, and signing functionality to ensure correctness.
- Added filesystem artifact reader tests to validate manifest parsing and artifact listing.
- Included test manifests for egress scenarios in the task runner tests.
- Developed timeline query service tests to verify tenant and event ID handling.
# Export Center observability runbook (stub · 2025-11-29 demo)
## Dashboards (offline import)
- Grafana JSON: `docs/modules/export-center/operations/dashboards/export-center-observability.json` (import locally; no external data sources assumed).
- Planned panels: export job duration p95/p99, bundle size histogram, registry push latency, provenance/attestation verification failures, queue depth, and error rate per profile.
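The import can also be scripted against a local Grafana instance. The sketch below is a minimal illustration, assuming the file is a plain dashboard model (not an "export for sharing" file with `__inputs` placeholders, which needs the UI import flow instead) and that you have a Grafana service-account token. The helpers `build_import_payload` and `import_dashboard` are hypothetical; the only external interface used is Grafana's `POST /api/dashboards/db` endpoint.

```python
import json
import urllib.request
from pathlib import Path


def build_import_payload(dashboard: dict, overwrite: bool = True) -> dict:
    """Wrap a dashboard model in the body expected by POST /api/dashboards/db."""
    dash = dict(dashboard)  # shallow copy so the caller's dict is not mutated
    dash["id"] = None  # let the target Grafana assign its own numeric id
    return {"dashboard": dash, "overwrite": overwrite}


def import_dashboard(json_path: str, grafana_url: str, api_token: str, timeout: float = 10.0) -> dict:
    """Hypothetical helper: upload a dashboard JSON file to a local Grafana."""
    dashboard = json.loads(Path(json_path).read_text(encoding="utf-8"))
    body = json.dumps(build_import_payload(dashboard)).encode("utf-8")
    req = urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_token}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)  # Grafana replies with the saved dashboard's uid, url, version
```

Clearing `id` avoids clashing with ids from the instance the JSON was exported from; if the JSON carries a `uid`, re-running the import with `overwrite` enabled updates the same dashboard rather than creating a copy.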
## Key metrics
- `export_job_duration_seconds_bucket{profile}`: export duration by profile.
- `export_bundle_size_bytes_bucket{profile}`: bundle size distribution.
- `export_registry_push_latency_seconds_bucket{profile}`: registry push latency.
- `export_attestation_failures_total{reason}`: DSSE/provenance verification failures.
- `export_queue_depth`: pending export jobs.
- `export_manifest_publish_total{result}`: manifest publish successes/failures.
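The `_bucket` series are cumulative Prometheus histogram buckets, so the p95/p99 panels would typically use a query such as `histogram_quantile(0.99, sum by (le, profile) (rate(export_job_duration_seconds_bucket[5m])))`. To make the estimate less of a black box, here is a minimal Python sketch of the linear interpolation `histogram_quantile` performs over cumulative `(upper_bound, count)` pairs; the bucket bounds and counts in the test are made-up numbers, not Export Center data.

```python
import math
from typing import Sequence, Tuple


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket,
    mirroring how Prometheus interpolates linearly inside the target bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank), len(buckets) - 1)
    if b == len(buckets) - 1:
        # Rank lands in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    if b == 0 and upper <= 0:
        return upper
    lower, below = buckets[b - 1] if b > 0 else (0.0, 0.0)
    if count == below:
        return lower
    return lower + (upper - lower) * (rank - below) / (count - below)
```

Because the result is interpolated within a bucket, p99 accuracy depends on the bucket boundaries chosen for each metric.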
## Logs & traces
- Correlate log entries by `exportId`, `profile`, and `tenant`; include `bundleDigest`, `attestationStatus`, and `registry` in each record.
- Traces are disabled by default; enable OTLP export to the on-prem collector when permitted.
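When logs are exported offline, the correlation keys above can be used to regroup records per export. This is a minimal sketch assuming JSON-lines logs in which `exportId`, `profile`, `tenant`, and `bundleDigest` appear as top-level keys; that log shape is an assumption, and `correlate` / `digests_per_export` are hypothetical helpers.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Correlation keys named in the runbook; treating them as top-level JSON keys is an assumption.
CORRELATION_KEYS = ("exportId", "profile", "tenant")


def correlate(lines: Iterable[str]) -> Dict[Tuple[str, ...], List[dict]]:
    """Group JSON-lines log records by (exportId, profile, tenant)."""
    groups: Dict[Tuple[str, ...], List[dict]] = defaultdict(list)
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip plain-text lines mixed into the stream
        if not isinstance(record, dict):
            continue
        key = tuple(str(record.get(k, "")) for k in CORRELATION_KEYS)
        groups[key].append(record)
    return dict(groups)


def digests_per_export(groups: Dict[Tuple[str, ...], List[dict]]) -> Dict[Tuple[str, ...], Set[str]]:
    """Collect the distinct bundleDigest values logged for each correlated export."""
    return {key: {r["bundleDigest"] for r in recs if "bundleDigest" in r} for key, recs in groups.items()}
```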
## Health/diagnostics
- `/health/liveness` and `/health/readiness` (export service) check storage, registry reachability, and the attestation verification path.
- `/status` exposes build version, commit, and feature flags; verify these against the offline bundle manifest.
- Verification probe: once a bundle is available, run `stella export bundle verify --manifest <path>` and validate hashes against the manifest.
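The `/status` check can be reduced to a field-by-field comparison. The sketch below assumes `/status` returns JSON and that both it and the offline manifest carry `version` and `commit` keys; those key names, the base URL you pass in, and the helpers `fetch_status` / `status_drift` are hypothetical, since the runbook does not document the exact schema.

```python
import json
import urllib.request
from typing import Dict, Sequence, Tuple

# Hypothetical key names: the runbook only says /status exposes build version and commit.
COMPARED_FIELDS = ("version", "commit")


def fetch_status(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/status and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/status", timeout=timeout) as resp:
        return json.load(resp)


def status_drift(status: dict, manifest: dict,
                 fields: Sequence[str] = COMPARED_FIELDS) -> Dict[str, Tuple[object, object]]:
    """Return {field: (status_value, manifest_value)} for every missing or mismatched field."""
    drift = {}
    for field in fields:
        if field not in status or field not in manifest or status[field] != manifest[field]:
            drift[field] = (status.get(field), manifest.get(field))
    return drift
```

A missing key is reported as drift rather than silently matching, so an empty result means both sides positively agree on every compared field.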
## Alert hints
- Export job duration p99 > target SLA per profile.
- Attestation verification failures > 0 over 10m.
- Registry push latency spikes or error rate > threshold.
- Queue depth keeps growing while no export jobs complete.
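Two of these hints can be expressed as simple checks over scraped samples. This is a minimal sketch, not the alerting rules themselves: `counter_increase` approximates PromQL `increase()` without its edge extrapolation, the 600-second default encodes the "over 10m" window, and the completed-job count passed to `queue_stalled` is a hypothetical input because the runbook does not name a completion counter.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: Sequence[Sample], now: float, window_s: float) -> float:
    """Sum the rise of a monotonic counter within [now - window_s, now].

    A drop between consecutive samples is treated as a counter reset (process
    restart), so the post-reset value counts as new increments.
    """
    values = [v for t, v in sorted(samples) if now - window_s <= t <= now]
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def attestation_failure_alert(samples: Sequence[Sample], now: float, window_s: float = 600.0) -> bool:
    """'Attestation verification failures > 0 over 10m' on export_attestation_failures_total."""
    return counter_increase(samples, now, window_s) > 0


def queue_stalled(depths: Sequence[float], completed_in_window: float) -> bool:
    """export_queue_depth never drops, ends higher than it started, and no job completed."""
    non_decreasing = all(b >= a for a, b in zip(depths, depths[1:]))
    return len(depths) >= 2 and non_decreasing and depths[-1] > depths[0] and completed_in_window == 0
```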
## Offline verification steps
- Import the Grafana JSON locally and point it at the Prometheus scrape labeled `export-center`.
- Run `stella export bundle --profile <profile> --manifest out/manifest.json`, then verify the hashes listed by `jq -r '.files[].sha256'` against the generated bundles.
- Fetch `/status` and compare commit/version with the offline bundle manifest.
## Evidence locations
- Sprint tracker: `docs/implplan/SPRINT_0320_0001_0001_docs_modules_export_center.md`.
- Module docs: `README.md`, `architecture.md`, `implementation_plan.md`.
- Dashboard stub: `operations/dashboards/export-center-observability.json`.