feat: Implement Filesystem and MongoDB provenance writers for PackRun execution context
- Added `FilesystemPackRunProvenanceWriter` to write provenance manifests to the filesystem.
- Introduced `MongoPackRunArtifactReader` to read artifacts from MongoDB.
- Created `MongoPackRunProvenanceWriter` to store provenance manifests in MongoDB.
- Developed unit tests for filesystem and MongoDB provenance writers.
- Established `ITimelineEventStore` and `ITimelineIngestionService` interfaces for timeline event handling.
- Implemented `TimelineIngestionService` to validate and persist timeline events with hashing.
- Created PostgreSQL schema and migration scripts for timeline indexing.
- Added dependency injection support for timeline indexer services.
- Developed tests for timeline ingestion and schema validation.
@@ -0,0 +1,6 @@
{
  "_note": "Placeholder Grafana dashboard stub for Notify. Replace panels when metrics endpoints are available; keep offline-import friendly.",
  "schemaVersion": 39,
  "title": "Notify Observability (stub)",
  "panels": []
}
38
docs/modules/notify/operations/observability.md
Normal file
@@ -0,0 +1,38 @@
# Notify observability runbook (stub · 2025-11-29 demo)

## Dashboards (offline import)
- Grafana JSON: `docs/modules/notify/operations/dashboards/notify-observability.json` (import locally; no external data sources assumed).
- Planned panels: enqueue/dequeue rate, delivery latency p95/p99, channel error rate, retry/dead-letter counts, rule evaluation latency, tenant isolation breaches (should stay 0), and notification simulation outcomes.

## Key metrics
- `notify_enqueue_total{channel}`: notifications enqueued, by channel.
- `notify_delivery_latency_seconds_bucket{channel}`: delivery latency histogram buckets, per channel.
- `notify_delivery_failures_total{channel,reason}`: failed deliveries, by channel and reason.
- `notify_retry_total{channel}` and `notify_deadletter_total{channel}`: retries and dead letters, by channel.
- `notify_rule_eval_duration_seconds_bucket`: rule evaluation latency histogram buckets.
- `notify_simulation_total{result}`: simulation outcomes when quiet-hours/correlation rules are applied.

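
The planned p95/p99 panels would be derived from the `_bucket` series above. As a minimal sketch of how a quantile can be estimated from cumulative histogram buckets (linear interpolation inside the matching bucket, similar in spirit to what Prometheus does server-side), the helper below works on plain `(upper_bound, cumulative_count)` pairs. The function name and the sample bucket bounds are illustrative assumptions, not part of Notify.

```python
from math import inf


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    `buckets` is an iterable of (upper_bound, cumulative_count) pairs and must
    include a +Inf bucket. Hypothetical helper, for illustration only.
    """
    ordered = sorted(buckets)
    if not ordered or ordered[-1][0] != inf:
        raise ValueError("buckets must include a +Inf bucket")
    total = ordered[-1][1]
    if total == 0:
        return float("nan")  # no observations yet

    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in ordered:
        if count >= rank:
            in_bucket = count - prev_count
            if bound == inf or in_bucket == 0:
                # Cannot interpolate: fall back to the last finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Sample data (assumed): 100 deliveries, bounds in seconds.
sample = [(0.5, 50), (1.0, 90), (2.0, 99), (inf, 100)]
p99 = estimate_quantile(0.99, sample)
```

The estimate is only as precise as the bucket layout, so bucket bounds near the alert thresholds matter more than the total number of buckets.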
## Logs & traces
- Correlate by `notificationId`, `ruleId`, `tenant`, and `channel`. Include the `quietHoursApplied`, `correlationKey`, and `retries` fields.
- Traces are disabled by default for air-gapped deployments; enable them by pointing the OTLP exporter at an on-prem collector.

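
For illustration, a structured log line carrying the correlation fields listed above might be assembled as follows. This is a sketch only: the field names come from this runbook, while the function name, the JSON-lines shape, and the sample values are assumptions rather than the service's actual log format.

```python
import json


def build_log_record(notification_id, rule_id, tenant, channel,
                     quiet_hours_applied, correlation_key, retries):
    """Serialize one log event as a single JSON line (hypothetical format)."""
    record = {
        "notificationId": notification_id,
        "ruleId": rule_id,
        "tenant": tenant,
        "channel": channel,
        "quietHoursApplied": quiet_hours_applied,
        "correlationKey": correlation_key,
        "retries": retries,
    }
    # sort_keys keeps the output deterministic, which helps when diffing logs.
    return json.dumps(record, sort_keys=True)


# Sample values are made up for the example.
line = build_log_record("n-1", "rule-7", "tenant-a", "email", True, "corr-42", 2)
```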
## Health/diagnostics
- `/health/liveness` and `/health/readiness` check queue backend reachability and channel provider credentials.
- `/status` exposes build version, commit, feature flags; verify against offline bundle manifest.
- Simulation probe: `/api/notify/simulate` with sample rule set to validate correlation/digest wiring once NOTIFY-SVC-39-001..004 land.

## Alert hints
- Delivery latency p99 > 1.5s for email/webhook channels.
- Dead-letter queue growth > threshold.
- Rule evaluation latency p99 > 500ms.
- Correlation/quiet-hours simulation failures once enabled.

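
The hints above can be sketched as a small threshold check. The 1.5 s and 500 ms limits come from this runbook; the dead-letter threshold is left as a parameter because no value is specified here, and the `Snapshot` type and function name are hypothetical. A real deployment would more likely express these as alerting rules in the monitoring stack.

```python
from dataclasses import dataclass, field

DELIVERY_P99_LIMIT_S = 1.5    # from the runbook: email/webhook channels
RULE_EVAL_P99_LIMIT_S = 0.5   # from the runbook: 500 ms


@dataclass(frozen=True)
class Snapshot:
    """Point-in-time readings (hypothetical shape, for illustration)."""
    delivery_p99_seconds: dict = field(default_factory=dict)  # channel -> p99
    rule_eval_p99_seconds: float = 0.0
    deadletter_growth: int = 0
    simulation_failures: int = 0


def evaluate_alerts(snap, deadletter_threshold):
    """Return a list of human-readable alert messages for the snapshot."""
    alerts = []
    for channel in ("email", "webhook"):
        p99 = snap.delivery_p99_seconds.get(channel)
        if p99 is not None and p99 > DELIVERY_P99_LIMIT_S:
            alerts.append(f"delivery latency p99 {p99:.2f}s exceeds 1.5s on {channel}")
    if snap.deadletter_growth > deadletter_threshold:
        alerts.append(f"dead-letter growth {snap.deadletter_growth} exceeds {deadletter_threshold}")
    if snap.rule_eval_p99_seconds > RULE_EVAL_P99_LIMIT_S:
        alerts.append(f"rule evaluation p99 {snap.rule_eval_p99_seconds:.3f}s exceeds 0.5s")
    if snap.simulation_failures > 0:
        alerts.append(f"{snap.simulation_failures} simulation failure(s)")
    return alerts


quiet = Snapshot(delivery_p99_seconds={"email": 0.8, "webhook": 1.2})
noisy = Snapshot(delivery_p99_seconds={"email": 2.0}, rule_eval_p99_seconds=0.7,
                 deadletter_growth=30, simulation_failures=1)
```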
## Offline verification steps
1) Import the Grafana JSON locally; point it at the Prometheus scrape labeled `notify`.
2) Run `stella notify simulate --rules samples/rules.yaml --dry-run` (once available) and confirm that metrics and logs are emitted locally.
3) Fetch `/status` and compare the commit/version against the offline bundle manifest.

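
Step 3 can be sketched as a plain dictionary comparison once both documents are loaded as JSON. The key names `version` and `commit` are assumptions; the actual `/status` payload and manifest layout are not defined in this runbook, so adjust them to the real schema.

```python
def compare_status(status, manifest, keys=("version", "commit")):
    """Return {key: (status_value, manifest_value)} for every mismatch.

    A key missing from either side counts as a mismatch. The default key
    names are hypothetical.
    """
    mismatches = {}
    for key in keys:
        status_value = status.get(key)
        manifest_value = manifest.get(key)
        if key not in status or key not in manifest or status_value != manifest_value:
            mismatches[key] = (status_value, manifest_value)
    return mismatches


# Sample documents (made up for the example).
status_doc = {"version": "1.4.0", "commit": "abc123", "featureFlags": []}
manifest_doc = {"version": "1.4.0", "commit": "def456"}
diff = compare_status(status_doc, manifest_doc)
```

An empty result means the running build matches the bundle; anything else lists the offending keys with both values for the evidence record.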
## Evidence locations
- Sprint tracker: `docs/implplan/SPRINT_322_docs_modules_notify.md`.
- Module docs: `README.md`, `architecture.md`, `implementation_plan.md`.
- Dashboard stub: `operations/dashboards/notify-observability.json`.