Sprint 170 - Notifications & Telemetry
BLOCKED Tasks: Before working on BLOCKED tasks, review BLOCKED_DEPENDENCY_TREE.md for root blockers and dependencies.
Active items only. Completed/historic work now resides in docs/implplan/archived/tasks.md (updated 2025-11-08).
This file now only tracks the notifications & telemetry status snapshot. Active backlog lives in Sprint 171+ files.
Wave coordination
| Wave | Guild owners | Shared prerequisites | Status | Notes |
|---|---|---|---|---|
| 170.A Notifier | Notifications Service Guild · Attestor Service Guild · Observability Guild | Sprint 150.A – Orchestrator | DONE (2025-12-04) | All 14 tasks DONE (NOTIFY-GAPS-171-014 signed with dev key notify-dev-hmac-001; production HSM re-signing deferred). Tracked in SPRINT_0171_0001_0001_notifier_i.md. |
| 170.B Telemetry | Telemetry Core Guild · Observability Guild · Security Guild | Sprint 150.A – Orchestrator | DONE (2025-11-27) | All 6 tasks complete (TELEMETRY-OBS-50-001 through 56-001). Tracked in SPRINT_0174_0001_0001_telemetry.md. |
Wave 170.A – Notifier readiness
Scope & goals
- Deliver attestation/key-rotation alert templates plus routing so Attestor/Signer incidents surface immediately (NOTIFY-ATTEST-74-001/002).
- Refresh Notifier OpenAPI/SDK surface (NOTIFY-OAS-61-001 → NOTIFY-OAS-63-001) so Console/CLI teams can self-serve the new endpoints.
- Wire SLO/incident inputs into rules (NOTIFY-OBS-51-001/55-001) and extend risk-profile routing (NOTIFY-RISK-66-001 → NOTIFY-RISK-68-001) without regressing quiet-hours/dedup.
- Preserve Offline Kit and documentation parity (NOTIFY-DOC-70-001 — done, NOTIFY-AIRGAP-56-002 — done) while adding the new rule surfaces.
Entry criteria
- Orchestrator job attest events flowing to Notify bus (Sprint 150.A dependency) with test fixtures approved by Attestor Guild.
- Quiet-hours/digest backlog reconciled (no pending blockers in docs/notifications/*.md).
- Observability Guild sign-off on telemetry fields reused by Notifier SLO webhooks.
Exit criteria
- All NOTIFY-ATTEST/OAS/OBS/RISK tasks in SPRINT_171_notifier_i.md moved to DONE with accompanying doc updates.
- Templates promoted to Offline Kit manifests and sample payloads stored under docs/notifications/templates.md.
- Incident mode notifications exercised in staging with audit logs + DSSE evidence attached.
Task clusters & owners
| Cluster | Linked tasks | Owners | Status snapshot | Notes |
|---|---|---|---|---|
| Attestation / key lifecycle alerts | NOTIFY-ATTEST-74-001/74-002 | Notifications Service Guild · Attestor Service Guild | TODO → DOING (prep) | Template scaffolding drafted; awaiting Rekor witness payload contract freeze. |
| API/OAS refresh & SDK parity | NOTIFY-OAS-61-001 → NOTIFY-OAS-63-001 | Notifications Service Guild · API Contracts Guild · SDK Generator Guild | TODO | Contract doc outline in review; SDK generator blocked on /notifications/rules schema finalize date (target 2025-11-15). |
| Observability-driven triggers | NOTIFY-OBS-51-001/55-001 | Notifications Service Guild · Observability Guild | TODO | Depends on Telemetry team exposing SLO webhook payload shape (see TELEMETRY-OBS-51-001). |
| Risk profile routing | NOTIFY-RISK-66-001 → NOTIFY-RISK-68-001 | Notifications Service Guild · Risk Engine Guild · Policy Guild | TODO | Requires Policy’s risk profile metadata (POLICY-RISK-40-002) export; follow up in Sprint 175. |
| Docs & offline parity | NOTIFY-DOC-70-001, NOTIFY-AIRGAP-56-002 | Notifications Service Guild · DevOps Guild | DONE | Remains reference for GA checklists; keep untouched unless new surfaces appear. |
Observability checkpoints
- Align metric names/labels with docs/notifications/architecture.md#12-observability-prometheus--otel before promoting new dashboards.
- Ensure Notifier spans/logs include tenant, ruleId, actionId, and attestation_event_id for attestation-triggered templates.
- Capture incident notification smoke tests via ops/devops/telemetry/tenant_isolation_smoke.py once Telemetry wave lands.
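The span/log field checkpoint above could be asserted in a smoke test with a check along these lines. This is a minimal sketch: the helper name `missing_attributes` and the dict-shaped record are assumptions for illustration, not part of the Notifier codebase; only the four field names come from the checklist.

```python
from typing import List, Mapping

# Field names come from the checkpoint above; everything else is hypothetical.
REQUIRED_ATTESTATION_FIELDS = ("tenant", "ruleId", "actionId", "attestation_event_id")


def missing_attributes(record: Mapping[str, object]) -> List[str]:
    """Return the required fields that are absent or empty in a span/log record."""
    return [
        name
        for name in REQUIRED_ATTESTATION_FIELDS
        if record.get(name) in (None, "")
    ]
```

A smoke test would fail when the returned list is non-empty, which also names the fields to fix.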
Wave 170.B – Telemetry bootstrap
Scope & goals
- Ship StellaOps.Telemetry.Core bootstrap + propagation helpers (TELEMETRY-OBS-50-001/50-002).
- Provide golden-signal helpers + scrubbing/PII safety nets (TELEMETRY-OBS-51-001/51-002) so service teams can onboard without bespoke plumbing.
- Implement incident + sealed-mode toggles (TELEMETRY-OBS-55-001/56-001) and document the integration contract for Orchestrator, Policy, Task Runner, Gateway (WEB-OBS-50-001).
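To make the scrubbing goal concrete, a redaction filter might look roughly like this. It is an illustrative Python sketch only: the real helpers live in the .NET library, and the key patterns below are placeholders rather than the redaction defaults the Security Guild signs off on.

```python
import re
from typing import Any, Dict

# Placeholder patterns for illustration; not the approved redaction defaults.
SENSITIVE_KEY = re.compile(r"(password|secret|token|authorization)", re.IGNORECASE)
REDACTED = "[REDACTED]"


def scrub(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with sensitive values masked, recursing into nested dicts."""
    cleaned: Dict[str, Any] = {}
    for key, value in attributes.items():
        if SENSITIVE_KEY.search(key):
            cleaned[key] = REDACTED          # mask by key name, never inspect the value
        elif isinstance(value, dict):
            cleaned[key] = scrub(value)      # nested attribute groups are scrubbed too
        else:
            cleaned[key] = value
    return cleaned
```

Matching on key names rather than values keeps the filter cheap and predictable, at the cost of missing secrets stored under innocuous keys.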
Entry criteria
- Orchestrator + Policy hosts expose extension points for telemetry bootstrap (tracked via Sprint 150.A and IDs ORCH-OBS-50-001 / POLICY-OBS-50-001).
- Observability Guild reviewed storage footprint impacts for Prometheus/Tempo/Loki per module (docs/modules/telemetry/architecture.md §2).
- Security Guild signs off on redaction defaults + tenant override audit logging.
Exit criteria
- Core library published to /local-nugets and referenced by at least Orchestrator & Policy in integration branches.
- Context propagation middleware validated through HTTP/gRPC/job smoke tests with deterministic trace IDs.
- Incident/sealed-mode toggles wired into CLI + Notify hooks (NOTIFY-OBS-55-001) with runbooks updated under docs/notifications/architecture.md.
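One way to get deterministic trace IDs for the smoke tests is to derive them from a fixed seed. The sketch below assumes W3C Trace Context `traceparent` headers and SHA-256 derivation; the function name and seed scheme are hypothetical and not taken from the Telemetry.Core implementation.

```python
import hashlib


def deterministic_traceparent(seed: str, sampled: bool = True) -> str:
    """Build a W3C traceparent header whose IDs are derived from a seed."""
    digest = hashlib.sha256(seed.encode("utf-8")).hexdigest()
    trace_id = digest[:32]    # 16 bytes, hex-encoded
    span_id = digest[32:48]   # 8 bytes, hex-encoded
    # All-zero IDs are invalid in W3C Trace Context; guard even though a
    # SHA-256 prefix of all zeros is practically impossible.
    if int(trace_id, 16) == 0 or int(span_id, 16) == 0:
        raise ValueError("seed produced an all-zero ID")
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"
```

Using the test case name as the seed (for example one seed per HTTP, gRPC, and job scenario) lets a test assert that the same trace ID appears on both sides of a hop.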
Task clusters & owners
| Cluster | Linked tasks | Owners | Status snapshot | Notes |
|---|---|---|---|---|
| Bootstrap & propagation | TELEMETRY-OBS-50-001/50-002 | Telemetry Core Guild | TODO → DOING (scaffolding) | Collector profile templates staged; need service metadata detector + sample host integration PRs. |
| Metrics helpers + scrubbing | TELEMETRY-OBS-51-001/51-002 | Telemetry Core Guild · Observability Guild · Security Guild | TODO | Roslyn analyzer spec drafted; waiting on scrub policy from Security (POLICY-SEC-42-003). |
| Incident & sealed-mode controls | TELEMETRY-OBS-55-001/56-001 | Telemetry Core Guild · Observability Guild | TODO | Requires CLI toggle contract (CLI-OBS-12-001) and Notify incident payload spec (NOTIFY-OBS-55-001). |
Tooling & validation
- Smoke: ops/devops/telemetry/smoke_otel_collector.py + tenant_isolation_smoke.py to run for each profile (default/forensic/airgap).
- Offline bundle packaging: ops/devops/telemetry/package_offline_bundle.py to include updated collectors, dashboards, manifest digests.
- Incident simulation: reuse ops/devops/telemetry/generate_dev_tls.sh for local collector certs during sealed-mode testing.
Shared milestones & dependencies
| Target date | Milestone | Owners | Dependency notes |
|---|---|---|---|
| 2025-11-13 | Finalize attestation payload schema + template variables | Notifications Service Guild · Attestor Service Guild | Unblocks NOTIFY-ATTEST-74-001/002 + Telemetry incident span labels. |
| 2025-11-15 | Publish draft Notifier OAS + SDK snippets | Notifications Service Guild · API Contracts Guild | Required for CLI/UI adoption; prereq for NOTIFY-OAS-61/62 series. |
| 2025-11-18 | Land Telemetry.Core bootstrap sample in Orchestrator | Telemetry Core Guild · Orchestrator Guild | Demonstrates TELEMETRY-OBS-50-001 viability; prerequisite for Policy adoption + Notify SLO hooks. |
| 2025-11-20 | Incident/quiet-hour end-to-end rehearsal | Notifications Service Guild · Telemetry Core Guild · Observability Guild | Validates TELEMETRY-OBS-55-001 + NOTIFY-OBS-55-001 + CLI toggle contract. |
| 2025-11-22 | Offline kit bundle refresh (notifications + telemetry assets) | DevOps Guild · Notifications Service Guild · Telemetry Core Guild | Ensure docs/ops/offline-kit manifests reference new templates/configs. |
Risks & mitigations
- Telemetry data drift in sealed mode. Mitigate by enforcing IEgressPolicy checks (TELEMETRY-OBS-56-001) and documenting fallback exporters; schedule smoke runs after each config change.
- Template/API divergence. Maintain single source of truth in SPRINT_171_notifier_i.md tasks; require API Contracts review before merging SDK updates to avoid drift with UI consumers.
- Observability storage overhead. Coordinate with Ops Guild to project Prometheus/Tempo growth when SLO webhooks + incident toggles increase cardinality; adjust retention per docs/modules/telemetry/architecture.md §2.
- Cross-sprint dependency churn. Track ORCH-OBS-50-001, POLICY-OBS-50-001, WEB-OBS-50-001 weekly; if they slip, re-baseline Telemetry wave deliverables or gate Notifier observability triggers accordingly.
Task mirror snapshot (reference: Sprint 171 & 174 trackers)
Wave 170.A – Notifier (Sprint 171 mirror)
- Open tasks: 0.
- Done tasks: 14 (all NOTIFY-ATTEST, NOTIFY-OAS, NOTIFY-OBS, NOTIFY-RISK, NOTIFY-DOC, NOTIFY-AIRGAP, NOTIFY-GAPS series complete).
| Category | Task IDs | Current state | Notes |
|---|---|---|---|
| Attestation + key lifecycle | NOTIFY-ATTEST-74-001/002 | DONE | Templates and wiring complete (2025-11-16/27). |
| API/OAS + SDK refresh | NOTIFY-OAS-61-001 → 63-001 | DONE | All OAS/SDK tasks complete (2025-11-17). |
| Observability-driven triggers | NOTIFY-OBS-51-001/55-001 | DONE | SLO webhook + incident mode templates shipped (2025-11-22). |
| Risk routing | NOTIFY-RISK-66-001 → 68-001 | DONE | Risk-events endpoint + routing seeds shipped (2025-11-24); POLICY-RISK-40-002 metadata export now available. |
| Gap remediation | NOTIFY-GAPS-171-014 | DONE | NR1-NR10 artifacts complete; DSSE signed with dev key notify-dev-hmac-001 (2025-12-04). |
| Completed prerequisites | NOTIFY-DOC-70-001, NOTIFY-AIRGAP-56-002 | DONE | Documentation and offline-kit parity complete. |
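For readers unfamiliar with DSSE signing, the dev-key step noted for NOTIFY-GAPS-171-014 can be pictured as follows. This is a hedged sketch, not the contents of scripts/notifications/sign-dsse.py: it assumes HMAC-SHA256 (inferred only from the key name notify-dev-hmac-001), and the function names and key bytes are hypothetical. The pre-authentication encoding (PAE) follows the DSSE specification.

```python
import base64
import hashlib
import hmac
from typing import Any, Dict


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: 'DSSEv1 LEN(type) type LEN(body) body'."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


def sign_envelope(payload_type: str, payload: bytes, key: bytes, key_id: str) -> Dict[str, Any]:
    """Wrap a payload in a DSSE envelope signed with HMAC-SHA256 (assumed algorithm)."""
    sig = hmac.new(key, pae(payload_type, payload), hashlib.sha256).digest()
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": key_id, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: Dict[str, Any], key: bytes) -> bool:
    """Recompute the HMAC over the PAE and compare in constant time."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    return any(
        hmac.compare_digest(expected, base64.b64decode(s["sig"]))
        for s in envelope["signatures"]
    )
```

Because the signature covers the PAE rather than the raw payload, changing either the payload or its declared type invalidates it. An HMAC key is symmetric, so anyone who can verify can also sign, which is consistent with treating this as a dev-only key until production HSM re-signing happens.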
Wave 170.B – Telemetry (Sprint 174 mirror)
- Open tasks: 0.
- Done tasks: 6 (TELEMETRY-OBS-50/51/55/56 series all complete as of 2025-11-27).
| Category | Task IDs | Current state | Notes |
|---|---|---|---|
| Bootstrap & propagation | TELEMETRY-OBS-50-001/002 | DONE | Core bootstrap (50-001) and propagation middleware (50-002) complete (2025-11-19/27). |
| Metrics helpers & scrubbing | TELEMETRY-OBS-51-001/002 | DONE | Golden signal metrics with cardinality guards + scrubbing filters complete (2025-11-27). |
| Incident & sealed-mode controls | TELEMETRY-OBS-55-001/56-001 | DONE | Incident mode toggle and sealed-mode helpers complete (2025-11-27). |
External dependency tracker
| Dependency | Source sprint / doc | Current state (as of 2025-11-12) | Impact on waves |
|---|---|---|---|
| Sprint 150.A – Orchestrator (wave table) | SPRINT_150_scheduling_automation.md | TODO | Blocks Notifier template wiring + Telemetry consumption of job events until orchestration telemetry lands. |
| ORCH-OBS-50-001 orchestrator instrumentation | docs/implplan/archived/tasks.md excerpt / Sprint 150 backlog | TODO | Needed for Telemetry.Core sample + Notify SLO hooks; monitor for slip. |
| POLICY-OBS-50-001 policy instrumentation | Sprint 150 backlog | TODO | Required before Telemetry helpers can be adopted by Policy + risk routing. |
| WEB-OBS-50-001 gateway telemetry core adoption | Sprint 214/215 backlogs | TODO | Ensures web/gateway emits trace IDs that Notify incident payload references. |
| POLICY-RISK-40-002 risk profile metadata export | Sprint 215+ (Policy) | DONE (2025-12-04) | Implemented GET /api/risk/profiles/{id}/metadata endpoint for notification enrichment. |
Coordination log
| Date (UTC) | Update | Owner |
|---|---|---|
| 2025-12-04 | Sprint 170 FULLY COMPLETE: Created dev signing key (etc/secrets/dsse-dev.signing.json) and signing utility (scripts/notifications/sign-dsse.py); signed DSSE files with notify-dev-hmac-001. NOTIFY-GAPS-171-014 now DONE. All 14 Notifier + 6 Telemetry tasks complete. | Implementer |
| 2025-12-04 | Sprint 170 complete: Wave 170.A marked DONE (12/13 tasks); Wave 170.B already DONE (6/6 tasks). Only NOTIFY-GAPS-171-014 remains BLOCKED on security infra (signing keys). | Implementer |
| 2025-12-04 | Implemented POLICY-RISK-40-002: Added GET /api/risk/profiles/{id}/metadata endpoint for notification enrichment. NOTIFY-RISK tasks unblocked. Only NOTIFY-GAPS-171-014 remains BLOCKED (signing keys). | Implementer |
| 2025-12-04 | Status refresh: Wave 170.B (Telemetry) marked DONE (all 6 tasks complete); Wave 170.A (Notifier) updated to show 9/13 done with 4 BLOCKED on external dependencies (POLICY-RISK-40-002, signing keys). Updated task mirror snapshots. | Project Mgmt |
| 2025-11-12 10:15 | Wave rows flipped to DOING; baseline scope/entry/exit criteria recorded for both waves. | Observability Guild · Notifications Service Guild |
| 2025-11-12 14:40 | Added task mirror + dependency tracker + milestone table to keep Sprint 170 snapshot aligned with Sprint 171/174 execution plans. | Observability Guild |
| 2025-11-12 18:05 | Marked NOTIFY-ATTEST-74-001, NOTIFY-OAS-61-001, and TELEMETRY-OBS-50-001 as DOING in their sprint trackers; added status notes reflecting in-flight work vs. gated follow-ups. | Notifications Service Guild · Telemetry Core Guild |
| 2025-11-12 19:20 | Documented attestation template suite (Section 7 in docs/notifications/templates.md) to unblock NOTIFY-ATTEST-74-001 deliverables and updated sprint mirrors accordingly. | Notifications Service Guild |
| 2025-11-12 19:32 | Synced notifications architecture doc to reference the new attestation template suite so downstream teams see the dependency in one place. | Notifications Service Guild |
| 2025-11-12 19:45 | Updated notifications overview + rules docs with tmpl-attest-* requirements so rule authors/operators share the same contract. | Notifications Service Guild |
| 2025-11-12 20:05 | Published baseline Offline Kit templates under offline/notifier/templates/attestation/ for Slack/Email/Webhook so NOTIFY-ATTEST-74-002 wiring has ready-made artefacts. | Notifications Service Guild |