git.stella-ops.org/docs/modules/ui/operations/observability.md

Console UI observability runbook (stub · 2025-11-29 demo)

Dashboards (offline import)

  • Grafana JSON: docs/modules/ui/operations/dashboards/console-ui-observability.json (import locally; no external data sources assumed).
  • Panels to include: API latency (p95/p99), error rate, WebSocket/SSE connection count, asset load time, bundle size budget, Core Web Vitals (LCP/FID/CLS), and triage view render time.

Key metrics

  • console_ui_http_request_duration_seconds_bucket{route}: API call latency histogram, labeled by route.
  • console_ui_http_requests_total{status}: request counter by status; the basis for error-rate tracking.
  • console_ui_websocket_connections: number of active live sessions.
  • console_ui_bundle_bytes{chunk}: bundle size per chunk (used to enforce the offline kit budget).
  • console_ui_core_web_vitals{metric}: LCP/FID/CLS gauges.
  • console_ui_export_duration_seconds_bucket: time from export trigger to download completion.
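
The `_bucket` series above are cumulative histograms, so p95/p99 values are estimated rather than observed directly. A minimal sketch of that estimation follows, using linear interpolation inside the matching bucket in the same spirit as Prometheus `histogram_quantile`. The function names, the sample bucket layout, and the choice to count only 5xx statuses as errors are assumptions made for illustration; they are not taken from the Console UI code.

```python
from math import inf


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound_seconds, cumulative_count) pairs,
    sorted by bound and ending with the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == inf:
                # Cannot interpolate into +Inf: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


def error_ratio(requests_by_status):
    """Share of requests with a 5xx status (assumed definition of 'error')."""
    total = sum(requests_by_status.values())
    if total == 0:
        return 0.0
    errors = sum(n for status, n in requests_by_status.items()
                 if str(status).startswith("5"))
    return errors / total


if __name__ == "__main__":
    sample = [(0.1, 50), (0.5, 90), (1.0, 99), (inf, 100)]  # hypothetical data
    print("p95 estimate (s):", estimate_quantile(0.95, sample))
    print("error ratio:", error_ratio({"200": 95, "404": 3, "500": 2}))
```

The accuracy of the estimate depends on the bucket boundaries, so a 1s alert threshold is best served by a bucket edge at or near 1s.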

Logs & traces

  • Correlate log entries by correlationId (propagated from the API) and tenant. Each entry should also carry the feature (triage, findings, policy) and route fields.
  • Traces are disabled by default for air-gapped deployments; to enable them, point the OTLP endpoint at an on-prem collector and set Telemetry:ExportEnabled=true.
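
As a rough illustration of the correlation step, the sketch below filters log lines by correlationId and, optionally, tenant. It assumes logs are emitted as one JSON object per line, which this runbook does not state; only the field names (correlationId, tenant, feature, route) come from the text above, and `filter_logs` is a hypothetical helper.

```python
import json


def filter_logs(lines, correlation_id, tenant=None):
    """Return parsed log records matching a correlationId (and tenant, if given).

    Assumes JSON-lines input; lines that are blank, not valid JSON, or not
    JSON objects are skipped rather than treated as errors.
    """
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        if record.get("correlationId") != correlation_id:
            continue
        if tenant is not None and record.get("tenant") != tenant:
            continue
        matches.append(record)
    return matches


if __name__ == "__main__":
    sample = [  # hypothetical log lines
        '{"correlationId": "abc", "tenant": "t1", "feature": "triage", "route": "/api/findings"}',
        '{"correlationId": "abc", "tenant": "t2", "feature": "policy", "route": "/api/policy"}',
        "plain text line that is not JSON",
    ]
    for rec in filter_logs(sample, "abc", tenant="t1"):
        print(rec["feature"], rec["route"])
```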

Health/diagnostics

  • /health/liveness and /health/readiness (UI backend) must both return 200; readiness additionally checks asset storage and API gateway reachability.
  • /status exposes the build version, commit, and feature flags; ensure it matches the offline bundle manifest when shipping sealed kits.
  • Frontend self-check: open /health/ui to verify that core bundles are reachable and that their integrity hashes match the manifest.
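
The backend probes above can be scripted with the standard library alone, which suits air-gapped hosts. The following is a minimal sketch: the two endpoint paths come from this runbook, while the base URL, timeout, and helper names are assumptions. The fetch function is injectable so the logic can be exercised without a running backend.

```python
import urllib.error
import urllib.request

HEALTH_PATHS = ("/health/liveness", "/health/readiness")


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code worth reporting.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def check_health(base_url, fetch_status=http_status):
    """Map each health path to True when it answered with exactly 200."""
    base = base_url.rstrip("/")
    return {path: fetch_status(base + path) == 200 for path in HEALTH_PATHS}


if __name__ == "__main__":
    # Hypothetical local address; replace with the UI backend under test.
    for path, ok in check_health("http://localhost:8080").items():
        print(path, "OK" if ok else "FAILED")
```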

Alert hints

  • p99 API latency > 1s for /api/findings or /api/policy.
  • SSE/WS disconnect rate > 2% over a 5m window.
  • Main chunk bundle size > 3.5 MB after gzip (offline kit budget breach).
  • Core Web Vitals: LCP > 2.5s or CLS > 0.1 on the internal demo dataset.
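
For offline checks where no alerting rules are loaded, the thresholds above can be evaluated against a hand-collected snapshot. This is a sketch only: the `Snapshot` type and its field names are hypothetical, and 3.5 MB is read here as 3,500,000 bytes, which is an assumption about the intended unit.

```python
from dataclasses import dataclass, field

# Thresholds copied from the alert hints; the MB interpretation is assumed decimal.
P99_LATENCY_LIMIT_S = 1.0
DISCONNECT_RATE_LIMIT = 0.02
MAIN_CHUNK_GZIP_LIMIT_BYTES = 3_500_000
LCP_LIMIT_S = 2.5
CLS_LIMIT = 0.1
WATCHED_ROUTES = ("/api/findings", "/api/policy")


@dataclass
class Snapshot:
    """Hypothetical container for manually gathered measurements."""
    p99_latency_s: dict = field(default_factory=dict)  # route -> seconds
    disconnect_rate_5m: float = 0.0                     # fraction, 0..1
    main_chunk_gzip_bytes: int = 0
    lcp_s: float = 0.0
    cls: float = 0.0


def breached_alerts(s):
    """Return a list of human-readable names for every breached threshold."""
    alerts = []
    for route in WATCHED_ROUTES:
        if s.p99_latency_s.get(route, 0.0) > P99_LATENCY_LIMIT_S:
            alerts.append(f"p99 latency {route}")
    if s.disconnect_rate_5m > DISCONNECT_RATE_LIMIT:
        alerts.append("SSE/WS disconnect rate")
    if s.main_chunk_gzip_bytes > MAIN_CHUNK_GZIP_LIMIT_BYTES:
        alerts.append("main chunk bundle size")
    if s.lcp_s > LCP_LIMIT_S:
        alerts.append("LCP")
    if s.cls > CLS_LIMIT:
        alerts.append("CLS")
    return alerts


if __name__ == "__main__":
    snap = Snapshot(p99_latency_s={"/api/findings": 1.4}, lcp_s=2.1, cls=0.15)
    print(breached_alerts(snap))
```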

Offline verification steps

  1. Import the Grafana JSON locally and point it at the Prometheus scrape job labeled console-ui.
  2. Run npm run build -- --configuration=production (or the offline kit build) and verify the bundle hashes against the manifest used by /health/ui.
  3. Fetch /status and compare its commit and version to the static asset manifest embedded in the offline kit.
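
Steps 2 and 3 can be sketched as a small script. The manifest format is not specified in this runbook, so the example assumes a plain mapping of relative file path to SHA-256 hex digest and a metadata mapping with `commit` and `version` keys; both shapes, and the helper names, are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path):
    """Hash a file in chunks so large bundles do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundles(dist_dir, manifest):
    """Compare built files against {relative_path: sha256_hex}; return problems found."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        path = Path(dist_dir) / rel
        if not path.is_file():
            problems.append((rel, "missing"))
        elif sha256_file(path) != expected.lower():
            problems.append((rel, "hash mismatch"))
    return problems


def status_matches(status, manifest_meta, keys=("commit", "version")):
    """True only when every key is present in /status and equal to the manifest value."""
    return all(
        status.get(k) is not None and status.get(k) == manifest_meta.get(k)
        for k in keys
    )


if __name__ == "__main__":
    # Hypothetical values; a real run would load these from the kit and /status.
    manifest = {"main.js": "0" * 64}
    print(verify_bundles("dist", manifest))
    print(status_matches({"commit": "abc", "version": "1.0"},
                         {"commit": "abc", "version": "1.0"}))
```

An empty list from `verify_bundles` means every listed file exists and hashes to the expected value; files present on disk but absent from the manifest are not flagged by this sketch.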

Evidence locations

  • Sprint tracker: docs/implplan/SPRINT_0331_0001_0001_docs_modules_ui.md.
  • Module front doors: README.md, architecture.md, implementation_plan.md.
  • Dashboard stub: operations/dashboards/console-ui-observability.json.