git.stella-ops.org/docs/implplan/SPRINT_170_notifications_telemetry.md
StellaOps Bot 8768c27f30
2025-12-05 00:27:00 +02:00


Sprint 170 - Notifications & Telemetry

BLOCKED Tasks: Before working on BLOCKED tasks, review BLOCKED_DEPENDENCY_TREE.md for root blockers and dependencies.

Active items only. Completed/historic work now resides in docs/implplan/archived/tasks.md (updated 2025-11-08).

This file now only tracks the notifications & telemetry status snapshot. Active backlog lives in Sprint 171+ files.

Wave coordination

| Wave | Guild owners | Shared prerequisites | Status | Notes |
| --- | --- | --- | --- | --- |
| 170.A Notifier | Notifications Service Guild · Attestor Service Guild · Observability Guild | Sprint 150.A Orchestrator | DONE (2025-12-04) | All 14 tasks DONE (NOTIFY-GAPS-171-014 signed with dev key notify-dev-hmac-001; production HSM re-signing deferred). Tracked in SPRINT_0171_0001_0001_notifier_i.md. |
| 170.B Telemetry | Telemetry Core Guild · Observability Guild · Security Guild | Sprint 150.A Orchestrator | DONE (2025-11-27) | All 6 tasks complete (TELEMETRY-OBS-50-001 through 56-001). Tracked in SPRINT_0174_0001_0001_telemetry.md. |

Wave 170.A Notifier readiness

Scope & goals

  • Deliver attestation/key-rotation alert templates plus routing so Attestor/Signer incidents surface immediately (NOTIFY-ATTEST-74-001/002).
  • Refresh the Notifier OpenAPI/SDK surface (NOTIFY-OAS-61-001 → NOTIFY-OAS-63-001) so Console/CLI teams can self-serve the new endpoints.
  • Wire SLO/incident inputs into rules (NOTIFY-OBS-51-001/55-001) and extend risk-profile routing (NOTIFY-RISK-66-001 → NOTIFY-RISK-68-001) without regressing quiet-hours/dedup.
  • Preserve Offline Kit and documentation parity (NOTIFY-DOC-70-001 and NOTIFY-AIRGAP-56-002, both done) while adding the new rule surfaces.
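The new rule inputs (SLO, incident, risk-profile) must not regress quiet-hours or dedup behavior. One way a rehearsal could pin that behavior down is with a tiny reference model. Below is a minimal Python sketch built on several assumptions. `NotificationGate` and the "suppressed"/"deferred"/"sent" outcomes are hypothetical. The fixed-window dedup semantics are also assumed. None of this is the Notifier's actual rule engine.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta


@dataclass
class NotificationGate:
    """Hypothetical dedup + quiet-hours gate (not the Notifier's actual rule engine)."""

    dedup_window: timedelta  # suppress repeats of the same key inside this window
    quiet_start: time        # quiet hours start, e.g. time(22, 0)
    quiet_end: time          # quiet hours end; may wrap past midnight, e.g. time(7, 0)
    _last_sent: dict[str, datetime] = field(default_factory=dict, repr=False)

    def in_quiet_hours(self, now: datetime) -> bool:
        t = now.time()
        if self.quiet_start <= self.quiet_end:
            return self.quiet_start <= t < self.quiet_end
        # Window wraps midnight (e.g. 22:00-07:00).
        return t >= self.quiet_start or t < self.quiet_end

    def decide(self, dedup_key: str, now: datetime) -> str:
        """Return "suppressed", "deferred", or "sent" for one candidate notification."""
        last = self._last_sent.get(dedup_key)
        if last is not None and now - last < self.dedup_window:
            return "suppressed"
        if self.in_quiet_hours(now):
            return "deferred"  # caller is expected to retry later; not recorded as sent
        self._last_sent[dedup_key] = now
        return "sent"
```

A regression check then replays the same event timeline before and after a rule change. It asserts that the sequence of decisions is unchanged.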

Entry criteria

  • Orchestrator job attest events flowing to the Notify bus (Sprint 150.A dependency) with test fixtures approved by the Attestor Guild.
  • Quiet-hours/digest backlog reconciled (no pending blockers in docs/notifications/*.md).
  • Observability Guild sign-off on telemetry fields reused by Notifier SLO webhooks.

Exit criteria

  • All NOTIFY-ATTEST/OAS/OBS/RISK tasks in SPRINT_171_notifier_i.md moved to DONE with accompanying doc updates.
  • Templates promoted to Offline Kit manifests and sample payloads stored under docs/notifications/templates.md.
  • Incident mode notifications exercised in staging with audit logs + DSSE evidence attached.

Task clusters & owners

| Cluster | Linked tasks | Owners | Status snapshot | Notes |
| --- | --- | --- | --- | --- |
| Attestation / key lifecycle alerts | NOTIFY-ATTEST-74-001/74-002 | Notifications Service Guild · Attestor Service Guild | TODO → DOING (prep) | Template scaffolding drafted; awaiting Rekor witness payload contract freeze. |
| API/OAS refresh & SDK parity | NOTIFY-OAS-61-001 → NOTIFY-OAS-63-001 | Notifications Service Guild · API Contracts Guild · SDK Generator Guild | TODO | Contract doc outline in review; SDK generator blocked until the /notifications/rules schema is finalized (target 2025-11-15). |
| Observability-driven triggers | NOTIFY-OBS-51-001/55-001 | Notifications Service Guild · Observability Guild | TODO | Depends on the Telemetry team exposing the SLO webhook payload shape (see TELEMETRY-OBS-51-001). |
| Risk profile routing | NOTIFY-RISK-66-001 → NOTIFY-RISK-68-001 | Notifications Service Guild · Risk Engine Guild · Policy Guild | TODO | Requires export of Policy's risk profile metadata (POLICY-RISK-40-002); follow up in Sprint 175. |
| Docs & offline parity | NOTIFY-DOC-70-001, NOTIFY-AIRGAP-56-002 | Notifications Service Guild · DevOps Guild | DONE | Remains the reference for GA checklists; keep untouched unless new surfaces appear. |

Observability checkpoints

  • Align metric names/labels with docs/notifications/architecture.md#12-observability-prometheus--otel before promoting new dashboards.
  • Ensure Notifier spans/logs include tenant, ruleId, actionId, and attestation_event_id for attestation-triggered templates.
  • Capture incident notification smoke tests via ops/devops/telemetry/tenant_isolation_smoke.py once the Telemetry wave lands.
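A smoke test could enforce the attribute checkpoint mechanically. A minimal Python sketch follows. The four keys come from the checkpoint above. The helper names and the rule that empty strings count as missing are assumptions. How records are exported from Notifier is out of scope here.

```python
# Hypothetical smoke-test helpers; only the four attribute keys come from the checkpoint.
REQUIRED_ATTESTATION_ATTRS = ("tenant", "ruleId", "actionId", "attestation_event_id")


def missing_span_attributes(attributes: dict[str, object]) -> list[str]:
    """Return the required keys that are absent or empty on one span/log record."""
    return [key for key in REQUIRED_ATTESTATION_ATTRS if attributes.get(key) in (None, "")]


def check_attestation_records(records: list[dict[str, object]]) -> dict[int, list[str]]:
    """Map record index -> missing keys, for every record that fails the check."""
    failures = {}
    for index, attributes in enumerate(records):
        missing = missing_span_attributes(attributes)
        if missing:
            failures[index] = missing
    return failures
```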

Wave 170.B Telemetry bootstrap

Scope & goals

  • Ship StellaOps.Telemetry.Core bootstrap + propagation helpers (TELEMETRY-OBS-50-001/50-002).
  • Provide golden-signal helpers + scrubbing/PII safety nets (TELEMETRY-OBS-51-001/51-002) so service teams can onboard without bespoke plumbing.
  • Implement incident + sealed-mode toggles (TELEMETRY-OBS-55-001/56-001) and document the integration contract for Orchestrator, Policy, Task Runner, and Gateway (WEB-OBS-50-001).
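The scrubbing/PII safety net amounts to redacting sensitive substrings before attributes leave the process. The sketch below illustrates that idea in Python under stated assumptions. The three regex rules and their replacement tokens are hypothetical. The real scrub policy is owned by the Security Guild (POLICY-SEC-42-003). The sketch does not reflect the StellaOps.Telemetry.Core API.

```python
import re

# Hypothetical default rules, applied in order; the real scrub policy is not reproduced here.
_SCRUB_RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[email]"),
    (re.compile(r"(?i)\b(bearer)\s+[A-Za-z0-9._~+/=-]+"), r"\1 [redacted]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[ip]"),
]


def scrub(value: str) -> str:
    """Apply each redaction rule in order and return the scrubbed string."""
    for pattern, replacement in _SCRUB_RULES:
        value = pattern.sub(replacement, value)
    return value


def scrub_attributes(attributes: dict[str, object]) -> dict[str, object]:
    """Scrub string-valued telemetry attributes; leave other types untouched."""
    return {key: scrub(val) if isinstance(val, str) else val for key, val in attributes.items()}
```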

Entry criteria

  • Orchestrator + Policy hosts expose extension points for telemetry bootstrap (tracked via Sprint 150.A and IDs ORCH-OBS-50-001 / POLICY-OBS-50-001).
  • Observability Guild reviewed storage footprint impacts for Prometheus/Tempo/Loki per module (docs/modules/telemetry/architecture.md §2).
  • Security Guild signs off on redaction defaults + tenant override audit logging.

Exit criteria

  • Core library published to /local-nugets and referenced by at least Orchestrator & Policy in integration branches.
  • Context propagation middleware validated through HTTP/gRPC/job smoke tests with deterministic trace IDs.
  • Incident/sealed-mode toggles wired into CLI + Notify hooks (NOTIFY-OBS-55-001) with runbooks updated under docs/notifications/architecture.md.
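Deterministic trace IDs make the propagation smoke tests repeatable. A test can then assert that the same trace-id arrives at every HTTP, gRPC, and job hop. A minimal sketch follows. It assumes IDs are derived by hashing a per-test seed, and the function names and seeding scheme are hypothetical. The header layout (version-traceid-parentid-flags) follows the W3C Trace Context traceparent format.

```python
import hashlib


def deterministic_traceparent(seed: str, span_name: str = "root", sampled: bool = True) -> str:
    """Build a W3C traceparent value whose IDs derive from a fixed seed.

    trace-id  = first 32 hex chars (16 bytes) of SHA-256(seed)
    parent-id = first 16 hex chars (8 bytes) of SHA-256(seed + "/" + span_name)
    """
    trace_id = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:32]
    parent_id = hashlib.sha256(f"{seed}/{span_name}".encode("utf-8")).hexdigest()[:16]
    # W3C Trace Context forbids all-zero IDs; vanishingly unlikely here, but reject explicitly.
    if int(trace_id, 16) == 0 or int(parent_id, 16) == 0:
        raise ValueError("seed produced an all-zero ID; pick another seed")
    flags = "01" if sampled else "00"  # 01 = sampled flag set
    return f"00-{trace_id}-{parent_id}-{flags}"


def trace_id_of(traceparent: str) -> str:
    """Extract the trace-id field so tests can compare it across hops."""
    return traceparent.split("-")[1]
```

The same seed with a different span name keeps the trace-id and changes only the parent-id. That mirrors a single trace crossing several services.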

Task clusters & owners

| Cluster | Linked tasks | Owners | Status snapshot | Notes |
| --- | --- | --- | --- | --- |
| Bootstrap & propagation | TELEMETRY-OBS-50-001/50-002 | Telemetry Core Guild | TODO → DOING (scaffolding) | Collector profile templates staged; need service metadata detector + sample host integration PRs. |
| Metrics helpers + scrubbing | TELEMETRY-OBS-51-001/51-002 | Telemetry Core Guild · Observability Guild · Security Guild | TODO | Roslyn analyzer spec drafted; waiting on scrub policy from Security (POLICY-SEC-42-003). |
| Incident & sealed-mode controls | TELEMETRY-OBS-55-001/56-001 | Telemetry Core Guild · Observability Guild | TODO | Requires CLI toggle contract (CLI-OBS-12-001) and Notify incident payload spec (NOTIFY-OBS-55-001). |

Tooling & validation

  • Smoke tests: run ops/devops/telemetry/smoke_otel_collector.py and tenant_isolation_smoke.py for each profile (default/forensic/airgap).
  • Offline bundle packaging: ops/devops/telemetry/package_offline_bundle.py must include the updated collectors, dashboards, and manifest digests.
  • Incident simulation: reuse ops/devops/telemetry/generate_dev_tls.sh for local collector certs during sealed-mode testing.
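Manifest digests let an air-gapped site check that a bundle arrived intact. The Python sketch below shows one way to produce them deterministically. It makes several assumptions: the {"files": [{"path", "sha256"}]} layout is hypothetical, as are the function names. The sketch does not describe what package_offline_bundle.py actually emits.

```python
import hashlib
import json
from pathlib import Path


def read_bundle(root: Path) -> dict[str, bytes]:
    """Load every file under root, keyed by POSIX-style relative path."""
    return {p.relative_to(root).as_posix(): p.read_bytes() for p in root.rglob("*") if p.is_file()}


def manifest_entries(files: dict[str, bytes]) -> list[dict[str, str]]:
    """SHA-256 each file and sort by path so the result does not depend on walk order."""
    entries = [{"path": path, "sha256": hashlib.sha256(data).hexdigest()} for path, data in files.items()]
    return sorted(entries, key=lambda entry: entry["path"])


def manifest_json(entries: list[dict[str, str]]) -> str:
    """Serialize with sorted keys and fixed separators so identical inputs give identical bytes."""
    return json.dumps({"files": entries}, sort_keys=True, separators=(",", ":"))
```

Typical use would be `manifest_json(manifest_entries(read_bundle(Path(bundle_dir))))`, where `bundle_dir` is whatever staging directory the packaging step uses.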

Shared milestones & dependencies

| Target date | Milestone | Owners | Dependency notes |
| --- | --- | --- | --- |
| 2025-11-13 | Finalize attestation payload schema + template variables | Notifications Service Guild · Attestor Service Guild | Unblocks NOTIFY-ATTEST-74-001/002 + Telemetry incident span labels. |
| 2025-11-15 | Publish draft Notifier OAS + SDK snippets | Notifications Service Guild · API Contracts Guild | Required for CLI/UI adoption; prereq for NOTIFY-OAS-61/62 series. |
| 2025-11-18 | Land Telemetry.Core bootstrap sample in Orchestrator | Telemetry Core Guild · Orchestrator Guild | Demonstrates TELEMETRY-OBS-50-001 viability; prerequisite for Policy adoption + Notify SLO hooks. |
| 2025-11-20 | Incident/quiet-hour end-to-end rehearsal | Notifications Service Guild · Telemetry Core Guild · Observability Guild | Validates TELEMETRY-OBS-55-001 + NOTIFY-OBS-55-001 + CLI toggle contract. |
| 2025-11-22 | Offline kit bundle refresh (notifications + telemetry assets) | DevOps Guild · Notifications Service Guild · Telemetry Core Guild | Ensure docs/ops/offline-kit manifests reference new templates/configs. |

Risks & mitigations

  • Telemetry data drift in sealed mode. Mitigate by enforcing IEgressPolicy checks (TELEMETRY-OBS-56-001) and documenting fallback exporters; schedule smoke runs after each config change.
  • Template/API divergence. Maintain single source of truth in SPRINT_171_notifier_i.md tasks; require API Contracts review before merging SDK updates to avoid drift with UI consumers.
  • Observability storage overhead. Coordinate with Ops Guild to project Prometheus/Tempo growth when SLO webhooks + incident toggles increase cardinality; adjust retention per docs/modules/telemetry/architecture.md §2.
  • Cross-sprint dependency churn. Track ORCH-OBS-50-001, POLICY-OBS-50-001, WEB-OBS-50-001 weekly; if they slip, re-baseline Telemetry wave deliverables or gate Notifier observability triggers accordingly.
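The first risk relies on IEgressPolicy checks to keep sealed-mode telemetry inside the sealed environment, backed by documented fallback exporters. The sketch below models that decision in Python. In sealed mode, an exporter endpoint is used only if its host is allowlisted; otherwise a fallback exporter is chosen. `EgressPolicy`, `choose_exporter`, and the "file" fallback name are hypothetical. They do not mirror the IEgressPolicy interface.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class EgressPolicy:
    """Hypothetical stand-in for a sealed-mode egress check (not the IEgressPolicy API)."""

    sealed: bool
    allowed_hosts: frozenset[str]  # hosts reachable while sealed, e.g. a local collector

    def is_allowed(self, endpoint: str) -> bool:
        if not self.sealed:
            return True  # unsealed: no egress restriction in this model
        host = urlsplit(endpoint).hostname  # lower-cased host, or None if not parseable
        return host is not None and host in self.allowed_hosts


def choose_exporter(policy: EgressPolicy, endpoint: str, fallback: str = "file") -> str:
    """Use the configured endpoint when egress allows it, otherwise the fallback exporter."""
    return endpoint if policy.is_allowed(endpoint) else fallback
```

Endpoints without a parseable host are treated as disallowed. Failing closed this way keeps a misconfigured exporter from reaching the network while sealed.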

Task mirror snapshot (reference: Sprint 171 & 174 trackers)

Wave 170.A Notifier (Sprint 171 mirror)

  • Open tasks: 0.
  • Done tasks: 14 (all NOTIFY-ATTEST, NOTIFY-OAS, NOTIFY-OBS, NOTIFY-RISK, NOTIFY-DOC, NOTIFY-AIRGAP, NOTIFY-GAPS series complete).

| Category | Task IDs | Current state | Notes |
| --- | --- | --- | --- |
| Attestation + key lifecycle | NOTIFY-ATTEST-74-001/002 | DONE | Templates and wiring complete (2025-11-16/27). |
| API/OAS + SDK refresh | NOTIFY-OAS-61-001 → 63-001 | DONE | All OAS/SDK tasks complete (2025-11-17). |
| Observability-driven triggers | NOTIFY-OBS-51-001/55-001 | DONE | SLO webhook + incident mode templates shipped (2025-11-22). |
| Risk routing | NOTIFY-RISK-66-001 → 68-001 | DONE | Risk-events endpoint + routing seeds shipped (2025-11-24); POLICY-RISK-40-002 metadata export now available. |
| Gap remediation | NOTIFY-GAPS-171-014 | DONE | NR1-NR10 artifacts complete; DSSE signed with dev key notify-dev-hmac-001 (2025-12-04). |
| Completed prerequisites | NOTIFY-DOC-70-001, NOTIFY-AIRGAP-56-002 | DONE | Documentation and offline-kit parity complete. |
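NOTIFY-GAPS-171-014 was closed by signing DSSE envelopes with the dev key notify-dev-hmac-001. The sketch below shows two pieces for reference: the DSSE v1 pre-authentication encoding (PAE), and an HMAC-SHA256 signature over it. It is not the contents of scripts/notifications/sign-dsse.py. The HMAC-SHA256 scheme is inferred from the key name and is an assumption, as are the function names. HMAC keys are symmetric, so any verifier must hold the same secret.

```python
import base64
import hashlib
import hmac


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(payload), payload)


def sign_envelope(payload: bytes, payload_type: str, key: bytes, keyid: str) -> dict:
    """Return a DSSE envelope whose single signature is HMAC-SHA256 over the PAE (assumed scheme)."""
    sig = hmac.new(key, pae(payload_type, payload), hashlib.sha256).digest()
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes, keyid: str) -> bool:
    """Recompute the HMAC over the PAE and compare in constant time."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    return any(
        s.get("keyid") == keyid and hmac.compare_digest(base64.b64decode(s["sig"]), expected)
        for s in envelope.get("signatures", [])
    )
```

Signing the PAE instead of the raw payload binds the payload type into the signature. A payload therefore cannot be replayed under a different type.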

Wave 170.B Telemetry (Sprint 174 mirror)

  • Open tasks: 0.
  • Done tasks: 6 (TELEMETRY-OBS-50/51/55/56 series all complete as of 2025-11-27).

| Category | Task IDs | Current state | Notes |
| --- | --- | --- | --- |
| Bootstrap & propagation | TELEMETRY-OBS-50-001/002 | DONE | Core bootstrap (50-001) and propagation middleware (50-002) complete (2025-11-19/27). |
| Metrics helpers & scrubbing | TELEMETRY-OBS-51-001/002 | DONE | Golden signal metrics with cardinality guards + scrubbing filters complete (2025-11-27). |
| Incident & sealed-mode controls | TELEMETRY-OBS-55-001/56-001 | DONE | Incident mode toggle and sealed-mode helpers complete (2025-11-27). |
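TELEMETRY-OBS-51-001/002 ship golden-signal metrics with cardinality guards. One common approach is to cap the distinct values per label and collapse the rest into an overflow bucket. The Python sketch below assumes a per-label cap, first-come admission, and an `__other__` bucket name, all hypothetical. The guard in StellaOps.Telemetry.Core may work differently.

```python
class CardinalityGuard:
    """Hypothetical per-label cap on distinct values for metric labels."""

    def __init__(self, max_values_per_label: int, overflow_value: str = "__other__") -> None:
        self.max_values = max_values_per_label
        self.overflow = overflow_value
        self._seen: dict[str, set[str]] = {}

    def guard(self, labels: dict[str, str]) -> dict[str, str]:
        """Return labels with any value beyond the per-label cap replaced by the overflow bucket."""
        guarded = {}
        for name, value in labels.items():
            seen = self._seen.setdefault(name, set())
            if value in seen or len(seen) < self.max_values:
                seen.add(value)  # the first N distinct values are admitted and remembered
                guarded[name] = value
            else:
                guarded[name] = self.overflow
        return guarded
```

First-come admission is simple but order-dependent. An allowlist of expected label values is a stricter alternative when the value set is known up front.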

External dependency tracker

| Dependency | Source sprint / doc | Current state (as of 2025-11-12 unless dated) | Impact on waves |
| --- | --- | --- | --- |
| Sprint 150.A Orchestrator (wave table) | SPRINT_150_scheduling_automation.md | TODO | Blocks Notifier template wiring + Telemetry consumption of job events until orchestration telemetry lands. |
| ORCH-OBS-50-001 orchestrator instrumentation | docs/implplan/archived/tasks.md excerpt / Sprint 150 backlog | TODO | Needed for Telemetry.Core sample + Notify SLO hooks; monitor for slip. |
| POLICY-OBS-50-001 policy instrumentation | Sprint 150 backlog | TODO | Required before Telemetry helpers can be adopted by Policy + risk routing. |
| WEB-OBS-50-001 gateway telemetry core adoption | Sprint 214/215 backlogs | TODO | Ensures web/gateway emits the trace IDs that the Notify incident payload references. |
| POLICY-RISK-40-002 risk profile metadata export | Sprint 215+ (Policy) | DONE (2025-12-04) | Implemented GET /api/risk/profiles/{id}/metadata endpoint for notification enrichment. |

Coordination log

| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-12-04 | Sprint 170 FULLY COMPLETE: Created dev signing key (etc/secrets/dsse-dev.signing.json) and signing utility (scripts/notifications/sign-dsse.py); signed DSSE files with notify-dev-hmac-001. NOTIFY-GAPS-171-014 now DONE. All 14 Notifier + 6 Telemetry tasks complete. | Implementer |
| 2025-12-04 | Sprint 170 complete: Wave 170.A marked DONE (12/13 tasks); Wave 170.B already DONE (6/6 tasks). Only NOTIFY-GAPS-171-014 remains BLOCKED on security infra (signing keys). | Implementer |
| 2025-12-04 | Implemented POLICY-RISK-40-002: Added GET /api/risk/profiles/{id}/metadata endpoint for notification enrichment. NOTIFY-RISK tasks unblocked. Only NOTIFY-GAPS-171-014 remains BLOCKED (signing keys). | Implementer |
| 2025-12-04 | Status refresh: Wave 170.B (Telemetry) marked DONE (all 6 tasks complete); Wave 170.A (Notifier) updated to show 9/13 done with 4 BLOCKED on external dependencies (POLICY-RISK-40-002, signing keys). Updated task mirror snapshots. | Project Mgmt |
| 2025-11-12 10:15 | Wave rows flipped to DOING; baseline scope/entry/exit criteria recorded for both waves. | Observability Guild · Notifications Service Guild |
| 2025-11-12 14:40 | Added task mirror + dependency tracker + milestone table to keep the Sprint 170 snapshot aligned with Sprint 171/174 execution plans. | Observability Guild |
| 2025-11-12 18:05 | Marked NOTIFY-ATTEST-74-001, NOTIFY-OAS-61-001, and TELEMETRY-OBS-50-001 as DOING in their sprint trackers; added status notes reflecting in-flight work vs. gated follow-ups. | Notifications Service Guild · Telemetry Core Guild |
| 2025-11-12 19:20 | Documented attestation template suite (Section 7 in docs/notifications/templates.md) to unblock NOTIFY-ATTEST-74-001 deliverables and updated sprint mirrors accordingly. | Notifications Service Guild |
| 2025-11-12 19:32 | Synced notifications architecture doc to reference the new attestation template suite so downstream teams see the dependency in one place. | Notifications Service Guild |
| 2025-11-12 19:45 | Updated notifications overview + rules docs with tmpl-attest-* requirements so rule authors/operators share the same contract. | Notifications Service Guild |
| 2025-11-12 20:05 | Published baseline Offline Kit templates under offline/notifier/templates/attestation/ for Slack/Email/Webhook so NOTIFY-ATTEST-74-002 wiring has ready-made artefacts. | Notifications Service Guild |