
Implementation plan — Notify

Delivery phases

  • Phase 1 Core rules engine & delivery ledger
    Implement rules/channels schema, event ingestion, rule evaluation, idempotent deliveries, and audit logging.
  • Phase 2 Connectors & rendering
    Ship Slack/Teams/Email/Webhook connectors, template rendering, localization, throttling, retries, and secret referencing.
  • Phase 3 Console & CLI authoring
    Provide UI/CLI for rule authoring, previews, channel health, delivery browsing, digests, and test sends.
  • Phase 4 Governance & observability
    Add approvals, RBAC, tenant quotas, Notify metrics/logs/traces, dashboards, Notify-specific alerts, and Notify runbooks.
  • Phase 5 Offline & compliance
    Produce Offline Kit bundles (rules/channels/deploy scripts), signed exports, retention policies, and auditing for regulated environments.
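One way to read "rules evaluate deterministically per event" in Phase 1 is shown in the minimal Python sketch below. It assumes a hypothetical `Rule` shape (ID, subscribed event kinds, minimum severity) and a hypothetical event dict with `kind` and `severity` keys. None of these names come from the Notify schema. Iterating in rule-ID order keeps the match list independent of how rules were stored or loaded.

```python
from dataclasses import dataclass

# Hypothetical severity ordering; not taken from the Notify schema.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Rule:
    rule_id: str
    event_kinds: frozenset[str]  # event kinds the rule subscribes to
    min_severity: str            # lowest severity that triggers the rule


def matching_rules(rules: list[Rule], event: dict) -> list[str]:
    """Return IDs of rules matching `event`, always in rule-ID order."""
    matched = []
    # Sorting by ID makes the result independent of load/storage order.
    for rule in sorted(rules, key=lambda r: r.rule_id):
        if event["kind"] not in rule.event_kinds:
            continue
        if SEVERITY_RANK[event["severity"]] < SEVERITY_RANK[rule.min_severity]:
            continue
        matched.append(rule.rule_id)
    return matched
```

In this sketch the result is a pure function of the rule set and the event. Replaying the same event therefore yields the same match list, which is the property the delivery ledger and audit trail rely on.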

Work breakdown

  • Service & worker
    • REST API for rules/channels/delivery history, idempotency middleware, digest scheduler.
    • Worker pipelines for event intake, rule matching, template rendering, delivery execution, retries, and throttling.
    • Delivery ledger capturing payload metadata, response, retry state, DSSE signatures.
  • Connectors
    • Slack/Teams/Email/Webhook plug-ins with configuration validation, rate limiting, error classification.
    • Secrets referenced via Authority/Secret store; no plaintext storage.
  • Console & CLI
    • Console module for rules builder, condition editor, preview, test send, delivery insights, digests and schedule configuration.
    • CLI (stella notify rule|channel|delivery) for automation, export/import.
  • Integrations
    • Event sources: Concelier, Excititor, Policy Engine, Vuln Explorer, Export Center, Attestor, Zastava, Scheduler.
    • Notify emits its own events back into Notify (meta-notifications) for failure escalations and accepted-risk expiration reminders.
  • Observability & ops
    • Metrics: delivery success/failure, retry counts, throttle hits, digest generation, channel health.
    • Logs/traces with tenant, rule ID, channel, correlation ID; dashboards and alerts.
    • Runbooks for misconfigured channels, throttling, event backlog, incident digest.
  • Docs & compliance
    • Update Notifications Studio guides, channel runbooks, security/RBAC docs, Offline Kit instructions.
    • Provide a compliance checklist covering audit logging, retention, and opt-out.
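The idempotency middleware and delivery ledger described above can be sketched as follows. This is an illustration under assumptions, not the Notify implementation. The choice of key fields (tenant, rule ID, channel, event ID), the `idempotency_key` and `DeliveryLedger` names, and the in-memory dictionary standing in for persistent storage are all hypothetical.

```python
import hashlib
import json


def idempotency_key(tenant: str, rule_id: str, channel: str, event_id: str) -> str:
    """Derive a stable key from the fields that identify one logical delivery."""
    # Canonical JSON (sorted keys, no whitespace) keeps the hash input stable.
    canonical = json.dumps(
        {"tenant": tenant, "rule": rule_id, "channel": channel, "event": event_id},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DeliveryLedger:
    """In-memory stand-in for a persistent delivery ledger (hypothetical)."""

    def __init__(self) -> None:
        self._entries: dict[str, dict] = {}

    def record(self, key: str, payload_meta: dict) -> bool:
        """Insert a delivery once; return False if the key is already recorded."""
        if key in self._entries:
            return False  # duplicate: a retry or replay of the same event
        self._entries[key] = {"meta": payload_meta, "attempts": 0, "state": "pending"}
        return True

    def entry(self, key: str) -> dict:
        return self._entries[key]
```

A persistent ledger could enforce the same rule with a unique index on the key, so a retried or replayed event maps to the existing entry instead of producing a second delivery.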

Acceptance criteria

  • Rules evaluate deterministically per event; deliveries are idempotent, with an audit trail and DSSE signatures.
  • Channel connectors support retries, rate limits, health checks, and previews; secrets are referenced securely, never stored in plaintext.
  • Console/CLI support rule creation, testing, digests, delivery browsing, and export/import workflows.
  • Observability dashboards track delivery health; alerts fire for sustained failures or backlog; runbooks cover remediation.
  • Offline Kit bundle contains configs, rules, digests, and deployment scripts for air-gapped installs.
  • Notify respects tenancy and RBAC; governance (approvals, change log) enforced for high-impact rules.
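DSSE signatures are computed over the Pre-Authentication Encoding (PAE) of the payload type and payload, not over the raw payload. The sketch below implements PAE as defined in the DSSE v1 specification. This plan does not specify the payload type a Notify ledger entry would use, so any value passed in is an assumption. Signing and verification with an actual key are left out.

```python
def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 Pre-Authentication Encoding:
    "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body
    """
    type_bytes = payload_type.encode("utf-8")
    # LEN is the byte length written as ASCII decimal without leading zeros.
    return b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(payload), payload)
```

Including the payload type in the signed bytes binds the signature to how the payload is meant to be interpreted. Verification would recompute the PAE and check the signature against it.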

Risks & mitigations

  • Notification storms: throttling, digests, dedupe windows, preview/test gating.
  • Secret compromise: secret references only, rotation workflows, audit logging.
  • Connector API changes: versioned adapter layer, nightly health checks, fallback channels.
  • Noise vs signal: simulation previews, metrics, rule scoring, recommended defaults.
  • Offline parity: export/import of rules, connectors, and digests with signed manifests.
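A dedupe window, one of the storm mitigations listed above, can be sketched in a few lines. The class name, the string dedupe key, and the caller-supplied `now` timestamp (in seconds) are assumptions for illustration. Passing the clock in keeps the behavior deterministic and easy to test.

```python
class DedupeWindow:
    """Suppress repeats of the same dedupe key within `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # dedupe key -> timestamp of the last notification actually sent
        self._last_sent: dict[str, float] = {}

    def should_send(self, key: str, now: float) -> bool:
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # inside the window: suppress this repeat
        self._last_sent[key] = now
        return True
```

In this sketch, suppressed notifications are simply dropped. A fuller pipeline might instead fold them into a digest, which is another of the listed mitigations.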

Test strategy

  • Unit: rule evaluation, template rendering, connector clients, throttling, digests.
  • Integration: end-to-end events from core services, multi-channel routing, retries, audit logging.
  • Performance: burst throttling, digest creation, large rule sets.
  • Security: RBAC tests, tenant isolation, secret reference validation, DSSE signature verification.
  • Offline: export/import round-trips, Offline Kit deployment, manual delivery replay.

Definition of done

  • Notify service, workers, connectors, Console/CLI, observability, and Offline Kit assets shipped with documentation and runbooks.
  • Compliance checklist appended to docs; ./TASKS.md and ../../TASKS.md updated with progress.

Sprint alignment (2025-11-30)

  • Docs sprint: docs/implplan/SPRINT_322_docs_modules_notify.md; statuses mirrored in docs/modules/notify/TASKS.md.
  • Observability evidence stub: operations/observability.md and operations/dashboards/notify-observability.json (to be populated once the next demo produces outputs).
  • NOTIFY-DOCS-0002 remains blocked pending NOTIFY-SVC-39-001..004 (correlation/digests/simulation/quiet hours); keep sprint/TASKS synced when those land.

Sprint readiness tracker

Last updated: 2025-11-27 (NOTIFY-ENG-0001)

This section maps delivery phases to implementation sprints and tracks readiness checkpoints.

Phase 1 — Core rules engine & delivery ledger

| Task ID | Status | Sprint | Notes |
| --- | --- | --- | --- |
| NOTIFY-SVC-37-001 | DONE (2025-11-24) | SPRINT_0172_0001_0002_notifier_ii | Pack approval contract published (OpenAPI schema, payloads). |
| NOTIFY-SVC-37-002 | DONE (2025-11-24) | SPRINT_0172_0001_0002_notifier_ii | Ingestion endpoint with Mongo persistence, idempotent writes, audit trail. |
| NOTIFY-SVC-37-003 | 🔄 DOING | SPRINT_0172_0001_0002_notifier_ii | Approval/policy templates, routing predicates; dispatch/rendering pending. |
| NOTIFY-SVC-37-004 | DONE (2025-11-24) | SPRINT_0172_0001_0002_notifier_ii | Acknowledgement API, test harness, metrics. |
| NOTIFY-OAS-61-001 | DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | OAS with rules/templates/incidents/quiet hours endpoints. |
| NOTIFY-OAS-61-002 | DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | /.well-known/openapi discovery endpoint. |
| NOTIFY-OAS-62-001 | DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | SDK examples for rule CRUD. |
| NOTIFY-OAS-63-001 | DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | Deprecation headers and templates. |

Checkpoint: Core rules engine mostly complete; template dispatch/rendering in progress.

Phase 2 — Connectors & rendering

| Task ID | Status | Sprint | Notes |
| --- | --- | --- | --- |
| NOTIFY-SVC-38-002 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Channel adapters (email, chat webhook, generic webhook) with retry policies. |
| NOTIFY-SVC-38-003 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Template service, renderer with redaction and localization. |
| NOTIFY-SVC-38-004 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | REST + WS APIs for rules CRUD, templates preview, incidents. |
| NOTIFY-DOC-70-001 | DONE (2025-11-02) | SPRINT_0171_0001_0001_notifier_i | Architecture docs for src/Notify vs src/Notifier split. |

Checkpoint: Connector and rendering work not yet started; depends on Phase 1 completion.
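NOTIFY-SVC-38-002 calls for channel adapters with retry policies, and the work breakdown mentions error classification. The minimal sketch below shows both. It assumes HTTP-based connectors, capped exponential backoff, and a caller-supplied `send` callable that returns a status code. All of these names are hypothetical and do not describe the adapter API.

```python
import time
from typing import Callable


def is_retryable(status_code: int) -> bool:
    """Rate limiting (429) and server errors (5xx) are treated as transient."""
    return status_code == 429 or 500 <= status_code <= 599


def backoff_delays(base: float, factor: float, cap: float, retries: int) -> list[float]:
    """Delay before each retry: base * factor**n, clamped to `cap` seconds."""
    return [min(cap, base * factor ** n) for n in range(retries)]


def deliver_with_retry(
    send: Callable[[], int],
    delays: list[float],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it succeeds, fails permanently, or retries run out."""
    status = send()
    for delay in delays:
        if not is_retryable(status):
            break  # success or permanent failure: stop retrying
        sleep(delay)
        status = send()
    return status
```

Treating 429 and 5xx as transient and other 4xx responses as permanent is a common convention rather than something this plan specifies. Production adapters would typically also add jitter and honor `Retry-After`.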

Phase 3 — Console & CLI authoring

| Task ID | Status | Sprint | Notes |
| --- | --- | --- | --- |
| NOTIFY-SVC-39-001 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Correlation engine with throttler, quiet hours, incident lifecycle. |
| NOTIFY-SVC-39-002 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Digest generator with schedule runner. |
| NOTIFY-SVC-39-003 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Simulation engine for dry-run rules against historical events. |
| NOTIFY-SVC-39-004 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Quiet hour calendars with audit logging. |

Checkpoint: Console/CLI authoring work not started; depends on Phase 2 completion.
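Quiet hours (NOTIFY-SVC-39-001/004) need a timezone-aware window check that handles windows crossing midnight. The sketch below is illustrative only. The function name and the use of a fixed `tzinfo` per calendar are assumptions. Calendars that observe daylight-saving changes would use a named zone such as `zoneinfo.ZoneInfo` instead.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def in_quiet_hours(moment: datetime, start: time, end: time, tz: tzinfo) -> bool:
    """True if the aware datetime `moment`, seen in `tz`, falls in [start, end)."""
    local = moment.astimezone(tz).time()
    if start <= end:
        return start <= local < end
    # The window wraps past midnight, e.g. 22:00 to 07:00.
    return local >= start or local < end


if __name__ == "__main__":
    utc_plus_1 = timezone(timedelta(hours=1))  # fixed offset, no DST
    # 21:30 UTC is 22:30 at UTC+1, inside a 22:00-07:00 window.
    moment = datetime(2025, 1, 1, 21, 30, tzinfo=timezone.utc)
    print(in_quiet_hours(moment, time(22), time(7), utc_plus_1))
```

Converting to the calendar's zone before comparing means one stored UTC timestamp can be checked against per-tenant or per-channel calendars.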

Phase 4 — Governance & observability

| Task ID | Status | Sprint | Notes |
| --- | --- | --- | --- |
| NOTIFY-SVC-40-001 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Escalations, on-call schedules, PagerDuty/OpsGenie adapters. |
| NOTIFY-SVC-40-002 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Summary storm breaker, localization bundles. |
| NOTIFY-SVC-40-003 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Security hardening (signed ack links, webhook HMAC). |
| NOTIFY-SVC-40-004 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Observability metrics/traces, dead-letter handling, chaos tests. |
| NOTIFY-OBS-51-001 | DONE (2025-11-22) | SPRINT_0171_0001_0001_notifier_i | SLO evaluator webhooks with templates/routing/suppression. |
| NOTIFY-OBS-55-001 | DONE (2025-11-22) | SPRINT_0171_0001_0001_notifier_i | Incident mode templates with evidence/trace/retention context. |
| NOTIFY-ATTEST-74-001 | DONE (2025-11-16) | SPRINT_0171_0001_0001_notifier_i | Templates for verification failures, key revocations, transparency. |
| NOTIFY-ATTEST-74-002 | 📝 TODO | SPRINT_0171_0001_0001_notifier_i | Wire notifications to key rotation/revocation events. |
| NOTIFY-RISK-66-001 | BLOCKED | SPRINT_0171_0001_0001_notifier_i | Risk severity escalation triggers; needs POLICY-RISK-40-002. |
| NOTIFY-RISK-67-001 | BLOCKED | SPRINT_0171_0001_0001_notifier_i | Risk profile publish/deprecate notifications. |
| NOTIFY-RISK-68-001 | BLOCKED | SPRINT_0171_0001_0001_notifier_i | Per-profile routing, quiet hours, dedupe. |

Checkpoint: Core observability complete; governance and risk notifications blocked on upstream dependencies.
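For the webhook HMAC hardening in NOTIFY-SVC-40-003, a common pattern is to sign a timestamp together with the exact body bytes and to verify with a constant-time comparison. This is a sketch under assumptions. The `timestamp.body` message layout, the five-minute tolerance, and the function names are hypothetical, and the plan does not say how Notify transmits the signature.

```python
import hashlib
import hmac


def sign_webhook(secret: bytes, timestamp: int, body: bytes) -> str:
    """Hex HMAC-SHA256 over "<timestamp>." followed by the raw body bytes."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: bytes, timestamp: int, body: bytes, signature: str, now: int, tolerance: int = 300
) -> bool:
    """Reject stale timestamps, then compare signatures in constant time."""
    if abs(now - timestamp) > tolerance:
        return False  # outside the tolerance window: possible replay
    expected = sign_webhook(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

Signing the timestamp limits how long a captured request can be replayed. `hmac.compare_digest` avoids leaking, through timing, how much of a forged signature matched.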

Phase 5 — Offline & compliance

| Task ID | Status | Sprint | Notes |
| --- | --- | --- | --- |
| NOTIFY-AIRGAP-56-002 | DONE | SPRINT_0171_0001_0001_notifier_i | Bootstrap Pack with deterministic secrets and offline validation. |
| NOTIFY-TEN-48-001 | BLOCKED | SPRINT_0173_0001_0003_notifier_iii | Tenant-scope rules/templates; needs Sprint 0172 tenancy model. |

Checkpoint: Offline basics complete; tenancy work blocked on upstream Sprint 0172.


Overall readiness summary

| Phase | Status | Blocking items |
| --- | --- | --- |
| 1. Core rules engine | 🔄 In progress | NOTIFY-SVC-37-003 dispatch/rendering |
| 2. Connectors & rendering | 📝 Not started | Phase 1 completion |
| 3. Console & CLI | 📝 Not started | Phase 2 completion |
| 4. Governance & observability | 🔄 Partial | POLICY-RISK-40-002 for risk notifications |
| 5. Offline & compliance | 🔄 Partial | Sprint 0172 tenancy model |

Cross-module dependencies

| Dependency | Required by | Status |
| --- | --- | --- |
| Attestor payload localization | NOTIFY-ATTEST-74-002 | Freeze pending |
| POLICY-RISK-40-002 export | NOTIFY-RISK-66/67/68 | BLOCKED |
| Sprint 0172 tenancy model | NOTIFY-TEN-48-001 | In progress |
| Telemetry SLO webhook schema | NOTIFY-OBS-51-001 | Published (docs/notifications/slo-webhook-schema.md) |

Next actions

  1. Complete NOTIFY-SVC-37-003 dispatch/rendering wiring (Sprint 0172).
  2. Start NOTIFY-SVC-38-002 channel adapters once Phase 1 closes.
  3. Track POLICY-RISK-40-002 to unblock risk notification tasks.
  4. Monitor Sprint 0172 tenancy model for NOTIFY-TEN-48-001.