git.stella-ops.org/docs/modules/vex-lens/runbooks/observability.md

VEX Lens observability runbook (stub · 2025-11-29 demo)

Dashboards (offline import)

  • Grafana JSON: docs/modules/vex-lens/runbooks/dashboards/vex-lens-observability.json (import locally; no external data sources assumed).
  • Planned panels: consensus latency, conflict backlog, recompute duration, issuer trust changes, export job success rate, and DSSE verification failures.

Key metrics

  • vex_consensus_latency_seconds_bucket — latency from observation intake to consensus write.
  • vex_conflict_queue_depth — size of unresolved conflict queue.
  • vex_recompute_duration_seconds_bucket{reason} — recompute times by trigger (issuer update, policy knob, ingestion delta).
  • vex_export_duration_seconds_bucket — export job runtime.
  • vex_dsse_verification_failures_total — failed attestations during export/ingest.
  • vex_consensus_conflicts_total{reason} — conflict counts by reason (status disagreement, scope mismatch, missing provenance).
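The _bucket series above are cumulative histograms, so percentiles such as p99 have to be estimated from bucket counts. The following is a minimal sketch of that estimation using linear interpolation inside the matching bucket, similar in spirit to what Prometheus does for histogram quantiles. The function name and the sample bucket bounds are illustrative assumptions, not values taken from VEX Lens.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound_seconds, cumulative_count) pairs, the
    shape exposed by a `_bucket` series. The last bound is expected to be
    math.inf. Values are assumed to be spread evenly inside each bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the overflow bucket: report the highest
                # finite bound, since no upper limit is known.
                return prev_bound
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical sample: 100 observations spread over four buckets.
sample = [(0.5, 90), (1.0, 98), (2.0, 100), (math.inf, 100)]
p99 = histogram_quantile(0.99, sample)
```

Because the estimate interpolates within a bucket, its precision depends on how finely the bucket bounds are spaced around the alert threshold.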

Logs & traces

  • Correlate by correlationId, artifactKey, advisoryKey, and issuer. Include trustTier, weightBefore, weightAfter, and justification fields for audits.
  • Traces are disabled by default for air-gapped deployments; to enable them, set Telemetry:ExportEnabled=true and point the OTLP endpoint at an on-prem collector.
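As a rough illustration of the field requirements above, the sketch below checks whether a log record carries the correlation and audit fields. It assumes logs are emitted as one JSON object per line, which this runbook does not state; the function name and sample values are hypothetical.

```python
import json

# Field names taken from the runbook; the JSON-lines log shape is an assumption.
CORRELATION_FIELDS = ("correlationId", "artifactKey", "advisoryKey", "issuer")
AUDIT_FIELDS = ("trustTier", "weightBefore", "weightAfter", "justification")


def missing_audit_fields(log_line):
    """Return the expected field names that are absent from one JSON log line."""
    record = json.loads(log_line)
    expected = CORRELATION_FIELDS + AUDIT_FIELDS
    return [name for name in expected if name not in record]


# Hypothetical record that lacks the justification field.
example_line = json.dumps({
    "correlationId": "c-123",
    "artifactKey": "artifact-a",
    "advisoryKey": "advisory-b",
    "issuer": "issuer-x",
    "trustTier": "high",
    "weightBefore": 0.4,
    "weightAfter": 0.6,
})
gaps = missing_audit_fields(example_line)
```

A check like this could run over a sample of exported logs before an audit to confirm that weight changes are traceable to a justification.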

Health/diagnostics

  • /health/liveness and /health/readiness (service) must return HTTP 200; readiness also checks that MongoDB, the cache, and the event bus are reachable.
  • /status exposes build version, commit, feature flags; verify it matches offline bundle manifest.
  • Export self-check: run stella vex export --format json --manifest out/manifest.json and validate the hashes of the exported files against the manifest entries.
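The two health probes can be polled with a short script. This is a minimal sketch using only the Python standard library; the base URL passed in is whatever address the service listens on in your environment, and the helper names are illustrative rather than part of any StellaOps tooling.

```python
import urllib.error
import urllib.request

# Probe paths taken from the runbook.
HEALTH_PATHS = ("/health/liveness", "/health/readiness")


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def check_health(base_url, get_status=http_status):
    """Probe both endpoints; return (all_healthy, {path: status})."""
    base = base_url.rstrip("/")
    results = {path: get_status(base + path) for path in HEALTH_PATHS}
    return all(status == 200 for status in results.values()), results
```

Calling check_health with the service's base URL returns a flag plus the per-endpoint status codes. The get_status parameter exists so the check can be exercised without a running service.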

Alert hints

  • Consensus latency p99 > 1.5s over 5m.
  • Conflict queue depth > 500 for any tenant.
  • DSSE verification failures > 0 in a 10m window.
  • Export failure rate > 2% over 10m.
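The thresholds above can be written down as a small evaluation function, which is handy for testing alert logic before encoding it in an alerting system. This is a sketch only: the input structure, field names, and alert labels are assumptions made for illustration, and the windowed values are expected to be computed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertInputs:
    """Pre-aggregated values; all names here are illustrative assumptions."""
    consensus_latency_p99_seconds: float  # p99 over the last 5 minutes
    max_tenant_conflict_queue_depth: int  # deepest queue across tenants
    dsse_failures_last_10m: int
    export_jobs_failed_last_10m: int
    export_jobs_total_last_10m: int


def firing_alerts(inputs):
    """Return the labels of the alert hints whose thresholds are exceeded."""
    alerts = []
    if inputs.consensus_latency_p99_seconds > 1.5:
        alerts.append("consensus_latency_p99")
    if inputs.max_tenant_conflict_queue_depth > 500:
        alerts.append("conflict_queue_depth")
    if inputs.dsse_failures_last_10m > 0:
        alerts.append("dsse_verification_failures")
    total = inputs.export_jobs_total_last_10m
    # Skip the rate check when no export jobs ran in the window.
    if total > 0 and inputs.export_jobs_failed_last_10m / total > 0.02:
        alerts.append("export_failure_rate")
    return alerts
```

All comparisons are strict, matching the ">" in the hints, so a queue depth of exactly 500 does not fire.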

Offline verification steps

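  1. Import the Grafana JSON locally and point it at the Prometheus scrape labeled vex-lens.
  2. Run the export CLI command from Health/diagnostics, list the manifest hashes with jq -r '.files[].sha256' out/manifest.json, and compare them with the SHA-256 hashes of the exported files.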
  3. Fetch /status and confirm commit/version match the exported manifest and offline kit bundle metadata.
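The hash comparison in step 2 can be automated. The sketch below assumes the manifest has a top-level files array whose entries carry a sha256 field (as the jq filter implies) plus a path field relative to the manifest's directory. The path field name is an assumption, so adjust it to the actual manifest layout.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large exports do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, root=None):
    """Return a list of (path, expected, actual) for entries that do not match.

    actual is None when the file is missing. An empty list means every
    entry verified. The "path" key is an assumed field name.
    """
    manifest_path = Path(manifest_path)
    root = Path(root) if root is not None else manifest_path.parent
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    mismatches = []
    for entry in manifest["files"]:
        target = root / entry["path"]
        actual = sha256_file(target) if target.is_file() else None
        if actual != entry["sha256"].lower():
            mismatches.append((entry["path"], entry["sha256"], actual))
    return mismatches
```

Running verify_manifest("out/manifest.json") after the export and checking for an empty result covers the hash portion of the offline verification; the /status comparison in step 3 remains a separate check.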

Evidence locations

  • Sprint tracker: docs/implplan/SPRINT_0332_0001_0001_docs_modules_vex_lens.md.
  • Module docs: README.md, architecture.md, implementation_plan.md.
  • Dashboard stub: runbooks/dashboards/vex-lens-observability.json.