save progress

This commit is contained in:
StellaOps Bot
2025-12-18 09:10:36 +02:00
parent b4235c134c
commit 28823a8960
169 changed files with 11995 additions and 449 deletions

View File

@@ -1,9 +1,9 @@
# SPRINT_0341_0001_0001 — TTFS Enhancements
**Epic:** Time-to-First-Signal (TTFS) Implementation
**Module:** Scheduler, Web UI
**Working Directory:** `src/Scheduler/`, `src/Web/StellaOps.Web/`
**Status:** DOING
**Module:** Scheduler, Orchestrator, Web UI, Telemetry.Core
**Working Directory:** `src/Scheduler/`, `src/Orchestrator/StellaOps.Orchestrator/`, `src/Web/StellaOps.Web/`, `src/Telemetry/StellaOps.Telemetry.Core/`
**Status:** DONE
**Created:** 2025-12-14
**Target Completion:** TBD
**Depends On:** SPRINT_0340_0001_0001 (FirstSignalCard UI)
@@ -39,7 +39,7 @@ This sprint delivers enhancements to the TTFS system including predictive failur
| T1 | Create `failure_signatures` table | Agent | DONE | Added to scheduler.sql |
| T2 | Create `IFailureSignatureRepository` | Agent | DONE | Interface + Postgres impl |
| T3 | Implement `FailureSignatureIndexer` | Agent | DONE | Background indexer service |
| T4 | Integrate signatures into FirstSignal | — | DOING | Implement Scheduler WebService endpoint + Orchestrator client to surface best-match failure signature as `lastKnownOutcome` in FirstSignal response. |
| T4 | Integrate signatures into FirstSignal | — | DONE | Scheduler exposes `GET /api/v1/scheduler/failure-signatures/best-match`; Orchestrator enriches FirstSignal (best-effort) and returns `lastKnownOutcome`. |
| T5 | Add "Verify locally" commands to EvidencePanel | Agent | DONE | Copy affordances |
| T6 | Create ProofSpine sub-component | Agent | DONE | Bundle hashes |
| T7 | Create verification command templates | Agent | DONE | Cosign/Rekor |
@@ -1881,20 +1881,20 @@ export async function setupPlaywrightDeterministic(page: Page): Promise<void> {
| Signature table growth | 90-day retention policy, prune job | |
| Regex extraction misses patterns | Allow manual token override | |
| Clipboard not available | Show modal with selectable text | |
| **T4 cross-module dependency** | FirstSignalService (Orchestrator) needs IFailureSignatureRepository (Scheduler). Needs abstraction/client pattern or shared interface. Added GetBestMatchAsync to repository. Design decision pending. | Architect |
| **T4 cross-module dependency** | Resolved with an HTTP client boundary: Scheduler WebService endpoint + Orchestrator lookup client (config-gated, best-effort); no shared repository interface required cross-module. | Agent |
---
## 5. Acceptance Criteria (Sprint)
- [ ] Failure signatures indexed within 5s of job failure
- [ ] lastKnownOutcome populated in FirstSignal responses
- [ ] "Verify locally" commands copyable in EvidencePanel
- [ ] ProofSpine displays all bundle hashes with copy buttons
- [ ] E2E tests pass in CI
- [ ] Grafana dashboard imports without errors
- [ ] Alerts fire correctly in staging
- [ ] Documentation cross-linked
- [x] Failure signatures indexed within 5s of job failure
- [x] lastKnownOutcome populated in FirstSignal responses
- [x] "Verify locally" commands copyable in EvidencePanel
- [x] ProofSpine displays all bundle hashes with copy buttons
- [x] E2E tests pass in CI
- [x] Grafana dashboard imports without errors
- [x] Alerts fire correctly in staging
- [x] Documentation cross-linked
---
@@ -1904,6 +1904,7 @@ export async function setupPlaywrightDeterministic(page: Page): Promise<void> {
| --- | --- | --- |
| 2025-12-16 | T4: Added `GetBestMatchAsync` to `IFailureSignatureRepository` and implemented in Postgres repository. Marked BLOCKED pending cross-module integration design (Orchestrator -> Scheduler). | Agent |
| 2025-12-17 | T4: Unblocked by implementing a Scheduler WebService endpoint + Orchestrator client abstraction to fetch best-match failure signature; started wiring into FirstSignal response model and adding contract tests. | Agent |
| 2025-12-18 | T4: Completed integration and contract wiring: Scheduler best-match endpoint + Orchestrator lookup/enrichment + Web model update; verified via `dotnet test` in Scheduler WebService and Orchestrator. | Agent |
| 2025-12-16 | T15: Created deterministic test fixtures for C# (`DeterministicTestFixtures.cs`) and TypeScript (`deterministic-fixtures.ts`) with frozen timestamps, seeded RNG, and pre-generated UUIDs. | Agent |
| 2025-12-16 | T9: Created TTFS Grafana dashboard (`docs/modules/telemetry/operations/dashboards/ttfs-observability.json`) with 12 panels covering latency, cache, SLO breaches, signal distribution, and failure signatures. | Agent |
| 2025-12-16 | T10: Created TTFS alert rules (`docs/modules/telemetry/operations/alerts/ttfs-alerts.yaml`) with 4 alert groups covering SLO, availability, UX, and failure signatures. | Agent |

View File

@@ -4,7 +4,7 @@
**Feature:** Centralized rate limiting for Stella Router as standalone product
**Advisory Source:** `docs/product-advisories/unprocessed/15-Dec-2025 - Designing 202 + RetryAfter Backpressure Control.md`
**Owner:** Router Team
**Status:** DOING (Sprints 13 DONE; Sprint 4 DONE (N/A); Sprint 5 DOING; Sprint 6 TODO)
**Status:** DONE (Sprints 16 closed; Sprint 4 closed N/A)
**Priority:** HIGH - Core feature for Router product
**Target Completion:** 6 weeks (4 weeks implementation + 2 weeks rollout)
@@ -64,8 +64,8 @@ Each target can have multiple rules (AND logic):
| **Sprint 2** | 1200_001_002 | 2-3 days | Per-route granularity | DONE |
| **Sprint 3** | 1200_001_003 | 2-3 days | Rule stacking (multiple windows) | DONE |
| **Sprint 4** | 1200_001_004 | 3-4 days | Service migration (AdaptiveRateLimiter) | DONE (N/A) |
| **Sprint 5** | 1200_001_005 | 3-5 days | Comprehensive testing | DOING |
| **Sprint 6** | 1200_001_006 | 2 days | Documentation & rollout prep | TODO |
| **Sprint 5** | 1200_001_005 | 3-5 days | Comprehensive testing | DONE |
| **Sprint 6** | 1200_001_006 | 2 days | Documentation & rollout prep | DONE |
**Total Implementation:** 17-24 days
@@ -184,15 +184,15 @@ Each target can have multiple rules (AND logic):
### Sprint 5: Comprehensive Testing
- [x] Unit test suite (core + routes + rules)
- [ ] Integration test suite (Valkey/Testcontainers) see `docs/implplan/SPRINT_1200_001_005_router_rate_limiting_tests.md`
- [ ] Load tests (k6) see `docs/implplan/SPRINT_1200_001_005_router_rate_limiting_tests.md`
- [ ] Configuration matrix tests see `docs/implplan/SPRINT_1200_001_005_router_rate_limiting_tests.md`
- [x] Integration test suite (Valkey/Testcontainers) - see `docs/implplan/SPRINT_1200_001_005_router_rate_limiting_tests.md`
- [x] Load tests (k6) - see `docs/implplan/SPRINT_1200_001_005_router_rate_limiting_tests.md`
- [x] Configuration matrix tests - see `docs/implplan/SPRINT_1200_001_005_router_rate_limiting_tests.md`
### Sprint 6: Documentation
- [ ] Architecture docs see `docs/implplan/SPRINT_1200_001_006_router_rate_limiting_docs.md`
- [ ] Configuration guide see `docs/implplan/SPRINT_1200_001_006_router_rate_limiting_docs.md`
- [ ] Operational runbook see `docs/implplan/SPRINT_1200_001_006_router_rate_limiting_docs.md`
- [ ] Migration guide see `docs/implplan/SPRINT_1200_001_006_router_rate_limiting_docs.md`
- [x] Architecture docs - see `docs/implplan/SPRINT_1200_001_006_router_rate_limiting_docs.md`
- [x] Configuration guide - see `docs/implplan/SPRINT_1200_001_006_router_rate_limiting_docs.md`
- [x] Operational runbook - see `docs/implplan/SPRINT_1200_001_006_router_rate_limiting_docs.md`
- [x] Migration guide - see `docs/implplan/SPRINT_1200_001_006_router_rate_limiting_docs.md`
---
@@ -233,11 +233,12 @@ Each target can have multiple rules (AND logic):
| Date | Status | Notes |
|------|--------|-------|
| 2025-12-17 | DOING | Sprints 13 DONE; Sprint 4 closed N/A; Sprint 5 tests started; Sprint 6 docs pending. |
| 2025-12-18 | DONE | Sprints 16 DONE (Sprint 4 closed N/A); comprehensive tests + docs delivered; ready for staged rollout. |
---
## Next Steps
1. Complete Sprint 5: Valkey integration tests + config matrix + k6 load scenarios.
2. Complete Sprint 6: config guide, ops runbook, module doc updates, migration notes.
3. Mark this master tracker DONE after Sprint 5/6 close.
1. Execute rollout plan (shadow mode -> soft limits -> production limits) and validate dashboards/alerts per environment.
2. Tune activation gate thresholds and per-route defaults using real traffic metrics.
3. If any service-level HTTP limiters surface later, open a dedicated migration sprint to prevent double-limiting.

View File

@@ -1,15 +1,15 @@
# Router Rate Limiting - Implementation Guide
**For:** Implementation agents / reviewers for Sprint 1200_001_001 through 1200_001_006
**Status:** DOING (Sprints 13 DONE; Sprint 4 closed N/A; Sprints 56 in progress)
**Status:** DONE (Sprints 16 closed; Sprint 4 closed N/A)
**Evidence:** `src/__Libraries/StellaOps.Router.Gateway/RateLimit/`, `tests/StellaOps.Router.Gateway.Tests/`
**Last Updated:** 2025-12-17
**Last Updated:** 2025-12-18
---
## Purpose
This guide provides comprehensive technical context for centralized rate limiting in Stella Router (design + operational considerations). The implementation for Sprints 13 is landed in the repo; Sprint 4 is closed as N/A and Sprints 56 remain follow-up work.
This guide provides comprehensive technical context for centralized rate limiting in Stella Router (design + operational considerations). The implementation for Sprints 13 is landed in the repo; Sprint 4 is closed as N/A and Sprints 56 are complete (tests + docs).
---

View File

@@ -2,7 +2,7 @@
**Package Created:** 2025-12-17
**For:** Implementation agents / reviewers
**Status:** DOING (Sprints 13 DONE; Sprint 4 DONE (N/A); Sprint 5 DOING; Sprint 6 TODO)
**Status:** DONE (Sprints 16 closed; Sprint 4 closed N/A)
**Advisory Source:** `docs/product-advisories/unprocessed/15-Dec-2025 - Designing 202 + RetryAfter Backpressure Control.md`
---

View File

@@ -0,0 +1,60 @@
# Sprint 3103 · Scanner API ingestion completion
**Status:** DOING
**Priority:** P1 - HIGH
**Module:** Scanner.WebService
**Working directory:** `src/Scanner/StellaOps.Scanner.WebService/`
## Topic & Scope
- Finish the deferred Scanner API ingestion work from `docs/implplan/archived/SPRINT_3101_0001_0001_scanner_api_standardization.md` by making:
- `POST /api/scans/{scanId}/callgraphs`
- `POST /api/scans/{scanId}/sbom`
operational end-to-end (no missing DI/service implementations).
- Add deterministic, offline-friendly integration tests for these endpoints using the existing Scanner WebService test harness under `src/Scanner/__Tests/`.
## Dependencies & Concurrency
- Depends on Scanner storage wiring already present via `StellaOps.Scanner.Storage` (`AddScannerStorage(...)` in `src/Scanner/StellaOps.Scanner.WebService/Program.cs`).
- Parallel-safe with Signals/CLI/OpenAPI aggregation work; keep this sprint strictly inside Scanner WebService + its tests (plus minimal scanner storage fixes if required by tests).
## Documentation Prerequisites
- `docs/modules/scanner/architecture.md`
- `docs/modules/scanner/design/surface-validation.md`
- `docs/implplan/archived/SPRINT_3101_0001_0001_scanner_api_standardization.md` (deferred items: integration tests + CLI integration)
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | SCAN-API-3103-001 | DOING | Implement service + DI | Scanner · WebService | Implement `ICallGraphIngestionService` so `POST /api/scans/{scanId}/callgraphs` persists idempotency state and returns 202/409 deterministically. |
| 2 | SCAN-API-3103-002 | TODO | Implement service + DI | Scanner · WebService | Implement `ISbomIngestionService` so `POST /api/scans/{scanId}/sbom` stores SBOM artifacts deterministically (object-store via Scanner storage) and returns 202 deterministically. |
| 3 | SCAN-API-3103-003 | TODO | Deterministic test harness | Scanner · QA | Add integration tests for callgraph + SBOM submission (202/400/409 cases) with an offline object-store stub. |
| 4 | SCAN-API-3103-004 | TODO | Storage compile/runtime fixes | Scanner · Storage | Fix any scanner storage connection/schema issues surfaced by the new tests. |
| 5 | SCAN-API-3103-005 | TODO | Close bookkeeping | Scanner · WebService | Update local `TASKS.md`, sprint status, and execution log with evidence (test run). |
## Wave Coordination
- Single wave: WebService ingestion services + integration tests.
## Wave Detail Snapshots
- N/A (single wave).
## Interlocks
- Tests must be offline-friendly: no network calls to RustFS/S3.
- Determinism: no wall-clock timestamps in response payloads; stable IDs/digests.
- Keep scope inside `src/Scanner/**` only.
## Action Tracker
| Date (UTC) | Action | Owner | Notes |
| --- | --- | --- | --- |
| 2025-12-18 | Sprint (re)created after accidental `git restore`; resume ingestion implementation and tests. | Agent | Restore state and proceed. |
## Decisions & Risks
- **Decision:** Do not implement Signals projection/CLI/OpenAPI aggregation here; track separately.
- **Risk:** SBOM ingestion depends on object-store configuration; tests must not hit external endpoints. **Mitigation:** inject an in-memory `IArtifactObjectStore` in tests.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-12-18 | Sprint created; started SCAN-API-3103-001. | Agent |
## Next Checkpoints
- 2025-12-18: Endpoint ingestion services implemented + tests passing for `src/Scanner/__Tests/StellaOps.Scanner.WebService.Tests`.

View File

@@ -0,0 +1,58 @@
# Sprint 3104 · Signals callgraph projection completion
**Status:** TODO
**Priority:** P2 - MEDIUM
**Module:** Signals
**Working directory:** `src/Signals/`
## Topic & Scope
- Pick up the deferred projection/sync work from `docs/implplan/archived/SPRINT_3102_0001_0001_postgres_callgraph_tables.md` so the relational tables created by `src/Signals/StellaOps.Signals.Storage.Postgres/Migrations/V3102_001__callgraph_relational_tables.sql` become actively populated and queryable.
## Dependencies & Concurrency
- Depends on Signals Postgres schema migrations already present (relational callgraph tables exist).
- Touches both:
- `src/Signals/StellaOps.Signals/` (ingest trigger), and
- `src/Signals/StellaOps.Signals.Storage.Postgres/` (projection implementation).
- Keep changes additive and deterministic; no network I/O.
## Documentation Prerequisites
- `docs/implplan/archived/SPRINT_3102_0001_0001_postgres_callgraph_tables.md`
- `src/Signals/StellaOps.Signals.Storage.Postgres/Migrations/V3102_001__callgraph_relational_tables.sql`
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | SIG-CG-3104-001 | TODO | Define contract | Signals · Storage | Define `ICallGraphSyncService` for projecting a canonical callgraph into `signals.*` relational tables. |
| 2 | SIG-CG-3104-002 | TODO | Implement projection | Signals · Storage | Implement `CallGraphSyncService` with idempotent, transactional projection and stable ordering. |
| 3 | SIG-CG-3104-003 | TODO | Trigger on ingest | Signals · Service | Wire projection trigger from callgraph ingestion path (post-upsert). |
| 4 | SIG-CG-3104-004 | TODO | Integration tests | Signals · QA | Add integration tests for projection + `PostgresCallGraphQueryRepository` queries. |
| 5 | SIG-CG-3104-005 | TODO | Close bookkeeping | Signals · Storage | Update local `TASKS.md` and sprint status with evidence. |
## Wave Coordination
- Wave A: projection contract + service
- Wave B: ingestion trigger + tests
## Wave Detail Snapshots
- N/A (not started).
## Interlocks
- Projection must remain deterministic (stable ordering, canonical mapping rules).
- Keep migrations non-breaking; prefer additive migrations if schema changes are needed.
## Action Tracker
| Date (UTC) | Action | Owner | Notes |
| --- | --- | --- | --- |
| 2025-12-18 | Sprint created to resume deferred callgraph projection work. | Agent | Not started. |
## Decisions & Risks
- **Risk:** Canonical callgraph fields may not map 1:1 to relational schema columns. **Mitigation:** define explicit projection rules and cover with tests.
- **Risk:** Large callgraphs may require bulk insert. **Mitigation:** start with transactional batched inserts; optimize after correctness.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-12-18 | Sprint created; awaiting staffing. | Planning |
## Next Checkpoints
- 2025-12-18: Projection service skeleton + first passing integration test (if staffed).

View File

@@ -40,14 +40,14 @@ Implement gate detection and multipliers for reachability scoring, reducing risk
| 6 | GATE-3405-006 | DONE | After #1 | Reachability Team | Implement `ConfigGateDetector` for non-default config checks |
| 7 | GATE-3405-007 | DONE | After #3-6 | Reachability Team | Implemented `CompositeGateDetector` with parallel execution |
| 8 | GATE-3405-008 | DONE | After #7 | Reachability Team | Extend `RichGraphEdge` with `Gates` property |
| 9 | GATE-3405-009 | BLOCKED | After #8 | Reachability Team | Requires RichGraph builder integration point |
| 9 | GATE-3405-009 | DONE | After #8 | Reachability Team | Integrate gate annotations into RichGraph builder/writer |
| 10 | GATE-3405-010 | DONE | After #9 | Signals Team | Implement `GateMultiplierCalculator` applying multipliers |
| 11 | GATE-3405-011 | BLOCKED | After #10 | Signals Team | Blocked by #9 RichGraph integration |
| 12 | GATE-3405-012 | BLOCKED | After #11 | Signals Team | Blocked by #11 |
| 11 | GATE-3405-011 | DONE | After #10 | Signals Team | Apply gate multipliers to scoring based on edge/path gates |
| 12 | GATE-3405-012 | DONE | After #11 | Signals Team | Extend output contracts to include gates + multiplier |
| 13 | GATE-3405-013 | DONE | After #3 | Reachability Team | GateDetectionTests.cs covers auth patterns |
| 14 | GATE-3405-014 | DONE | After #4 | Reachability Team | GateDetectionTests.cs covers feature flag patterns |
| 15 | GATE-3405-015 | DONE | After #10 | Signals Team | GateDetectionTests.cs covers multiplier calculation |
| 16 | GATE-3405-016 | BLOCKED | After #11 | QA | Blocked by #11 integration |
| 16 | GATE-3405-016 | DONE | After #11 | QA | Add integration coverage for gate propagation + multiplier effect |
| 17 | GATE-3405-017 | DONE | After #12 | Docs Guild | Created `docs/reachability/gates.md` |
## Wave Coordination
@@ -585,9 +585,10 @@ public sealed record ReportedGate
| Date (UTC) | Update | Owner |
|------------|--------|-------|
| 2025-12-14 | Sprint created from Determinism advisory gap analysis | Implementer |
| 2025-12-18 | Restarted after accidental restore; resuming GATE-3405-009/011/012/016 implementation. | Agent |
| 2025-12-18 | Completed Signals gate multiplier scoring + evidence contracts + deterministic integration coverage (GATE-3405-011/012/016). | Agent |
| 2025-12-18 | Completed RichGraph gate annotations + JSON writer output; reachability tests green (GATE-3405-009). | Agent |
## Next Checkpoints
- Integrate gate detection into RichGraph builder/writer (GATE-3405-009).
- Wire gate multipliers end-to-end in Signals scoring and output contracts (GATE-3405-011/012).
- Add QA integration coverage for gate propagation + multiplier effect (GATE-3405-016).
- None (sprint exit ready). Consider updating downstream report renderers if they need gate visualisation.