feat: add Attestation Chain and Triage Evidence API clients and models
- Implemented Attestation Chain API client with methods for verifying, fetching, and managing attestation chains.
- Created models for Attestation Chain, including DSSE envelope structures and verification results.
- Developed Triage Evidence API client for fetching finding evidence, including methods for evidence retrieval by CVE and component.
- Added models for Triage Evidence, encapsulating evidence responses, entry points, boundary proofs, and VEX evidence.
- Introduced mock implementations for both API clients to facilitate testing and development.
docs/implplan/archived/SPRINT_1200_001_README.md
@@ -0,0 +1,502 @@
# Router Rate Limiting - Sprint Package README

**Package Created:** 2025-12-17
**For:** Implementation agents / reviewers
**Status:** DONE (Sprints 1–6 closed; Sprint 4 closed N/A)
**Advisory Source:** `docs/product-advisories/unprocessed/15-Dec-2025 - Designing 202 + Retry‑After Backpressure Control.md`

---

## Package Contents

This sprint package contains the original plan plus the landed implementation for centralized rate limiting in Stella Router.

### Core Sprint Files

| File | Purpose | Agent Role |
|------|---------|------------|
| `SPRINT_1200_001_000_router_rate_limiting_master.md` | Master tracker | **START HERE** - Overview & progress tracking |
| `SPRINT_1200_001_001_router_rate_limiting_core.md` | Sprint 1: Core implementation | Implementer - 5-7 days |
| `SPRINT_1200_001_002_router_rate_limiting_per_route.md` | Sprint 2: Per-route granularity | Implementer - 2-3 days |
| `SPRINT_1200_001_003_router_rate_limiting_rule_stacking.md` | Sprint 3: Rule stacking | Implementer - 2-3 days |
| `SPRINT_1200_001_004_router_rate_limiting_service_migration.md` | Sprint 4: Service migration (closed N/A) | Project manager / reviewer |
| `SPRINT_1200_001_005_router_rate_limiting_tests.md` | Sprint 5: Comprehensive testing | QA / implementer |
| `SPRINT_1200_001_006_router_rate_limiting_docs.md` | Sprint 6: Documentation & rollout prep | Docs / implementer |
| `SPRINT_1200_001_IMPLEMENTATION_GUIDE.md` | Technical reference | **READ FIRST** before coding |

### Documentation Files

| File | Purpose | Created In |
|------|---------|------------|
| `docs/router/rate-limiting-routes.md` | Per-route configuration guide | Sprint 2 |
| `docs/router/rate-limiting.md` | User-facing configuration guide | Sprint 6 |
| `docs/operations/router-rate-limiting.md` | Operational runbook | Sprint 6 |
| `docs/modules/router/rate-limiting.md` | Module-level rate-limiting dossier | Sprint 6 |

---

## Implementation Sequence

### Phase 1: Core Implementation (Sprints 1-3)

```
Sprint 1 (5-7 days)
├── Task 1.1: Configuration Models
├── Task 1.2: Instance Rate Limiter
├── Task 1.3: Valkey Backend
├── Task 1.4: Middleware Integration
├── Task 1.5: Metrics
└── Task 1.6: Wire into Pipeline

Sprint 2 (2-3 days)
├── Task 2.1: Extend Config for Routes
├── Task 2.2: Route Matching
├── Task 2.3: Inheritance Resolution
├── Task 2.4: Integrate into Service
└── Task 2.5: Documentation

Sprint 3 (2-3 days)
├── Task 3.1: Config for Rule Arrays
├── Task 3.2: Update Instance Limiter
├── Task 3.3: Enhance Valkey Lua Script
└── Task 3.4: Update Inheritance Resolver
```

### Phase 2: Migration & Testing (Sprints 4-5)

```
Sprint 4 (3-4 days) - Service Migration
├── Extract AdaptiveRateLimiter configs
├── Add to Router configuration
├── Refactor AdaptiveRateLimiter
└── Integration validation

Sprint 5 (3-5 days) - Comprehensive Testing
├── Unit test suite
├── Integration tests (Testcontainers)
├── Load tests (k6 scenarios A-F)
└── Configuration matrix tests
```

### Phase 3: Documentation & Rollout (Sprint 6)

```
Sprint 6 (2 days)
├── Architecture docs
├── Configuration guide
├── Operational runbook
└── Migration guide
```

### Phase 4: Rollout (3 weeks, post-implementation)

```
Week 1: Shadow Mode
└── Metrics only, no enforcement

Week 2: Soft Limits
└── Limits set at 2x traffic peaks

Week 3: Production Limits
└── Full enforcement

Week 4+: Service Migration
└── Remove redundant limiters
```

---

## Quick Start for Agents

### 1. Context Gathering (30 minutes)

**Read in this order:**

1. `SPRINT_1200_001_000_router_rate_limiting_master.md` - Overview
2. `SPRINT_1200_001_IMPLEMENTATION_GUIDE.md` - Technical details
3. Original advisory: `docs/product-advisories/unprocessed/15-Dec-2025 - Designing 202 + Retry‑After Backpressure Control.md`
4. Analysis plan: `C:\Users\VladimirMoushkov\.claude\plans\vectorized-kindling-rocket.md`

### 2. Environment Setup

```bash
# Working directory
cd src/__Libraries/StellaOps.Router.Gateway/

# Restore dependencies
dotnet restore

# Start a local Valkey container for testing
docker run -d -p 6379:6379 valkey/valkey:latest

# Run existing tests to establish a baseline
dotnet test
```

### 3. Start Sprint 1

Open `SPRINT_1200_001_001_router_rate_limiting_core.md` and follow the task breakdown.

**Task execution pattern:**

```
For each task:
  1. Read task description
  2. Review implementation code samples
  3. Create files as specified
  4. Write unit tests
  5. Mark task complete in master tracker
  6. Commit with message: "feat(router): [Sprint 1.X] Task name"
```

---

## Key Design Decisions (Reference)

### 1. Status Codes
- ✅ **429 Too Many Requests** for rate limiting
- ❌ NOT 503 (that's for service health)
- ❌ NOT 202 (that's for async job acceptance)
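
The status-code decision can be illustrated with a small helper. This is a minimal sketch rather than the landed middleware: `RateLimitResponse` and `Build` are hypothetical names, and the sketch assumes `Retry-After` is sent in its whole-seconds form and never as less than one second.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper (not part of the landed code): describes the
// response a router could send when a request is rate limited.
public static class RateLimitResponse
{
    public const int TooManyRequests = 429;

    // Returns the status code and headers for a rejected request.
    // Retry-After is rounded up to whole seconds and clamped to at least 1,
    // so clients are never told to retry immediately.
    public static (int StatusCode, IReadOnlyDictionary<string, string> Headers) Build(TimeSpan retryAfter)
    {
        var seconds = Math.Max(1, (long)Math.Ceiling(retryAfter.TotalSeconds));
        var headers = new Dictionary<string, string>
        {
            ["Retry-After"] = seconds.ToString(CultureInfo.InvariantCulture),
        };
        return (TooManyRequests, headers);
    }
}
```

Rounding up rather than down avoids inviting a retry that would land just before the window resets.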

### 2. Two-Scope Architecture
- **for_instance**: In-memory, protects a single router instance
- **for_environment**: Valkey-backed, protects the aggregate across all router instances

Both are necessary; neither can replace the other.

### 3. Fail-Open Philosophy
- Circuit breaker on Valkey failures
- Activation gate optimization
- Instance limits enforced even if Valkey down
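
The interaction between the two scopes and the fail-open rule can be sketched as follows. This is an illustration under assumptions, not the landed `RateLimitService`: `TwoScopeLimiter` is a hypothetical name, and both checks are modelled as plain delegates so the sketch stays self-contained. The circuit breaker and the activation gate are deliberately left out.

```csharp
using System;

// Hypothetical sketch of the two-scope decision with fail-open behaviour.
public sealed class TwoScopeLimiter
{
    private readonly Func<string, bool> _instanceCheck;     // in-memory, never throws
    private readonly Func<string, bool> _environmentCheck;  // Valkey-backed, may throw

    public TwoScopeLimiter(Func<string, bool> instanceCheck, Func<string, bool> environmentCheck)
    {
        _instanceCheck = instanceCheck;
        _environmentCheck = environmentCheck;
    }

    // Returns true when the request identified by `key` may proceed.
    public bool IsAllowed(string key)
    {
        // The instance limit is always enforced, even when Valkey is down.
        if (!_instanceCheck(key))
        {
            return false;
        }

        try
        {
            return _environmentCheck(key);
        }
        catch (Exception)
        {
            // Fail open: a backend outage must not reject traffic.
            return true;
        }
    }
}
```

Checking the instance scope first also means a request rejected locally never costs a round trip to Valkey.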

### 4. Configuration Inheritance
- Replacement semantics (not merge)
- Most specific wins: route > microservice > environment > global
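
Replacement semantics can be sketched in a few lines. The names below are hypothetical (the landed `LimitInheritanceResolver` may be shaped differently), the record fields are assumed from the `per_seconds` / `max_requests` YAML keys, and `null` is assumed to mean "not configured at this level".

```csharp
using System;
using System.Collections.Generic;

// Assumed shape, mirroring the per_seconds / max_requests config keys.
public sealed record RateLimitRule(int PerSeconds, int MaxRequests);

public static class InheritanceSketch
{
    // Returns the rule list of the most specific level that defines one.
    // The chosen list replaces the less specific ones; nothing is merged.
    public static IReadOnlyList<RateLimitRule> Resolve(
        IReadOnlyList<RateLimitRule> route,
        IReadOnlyList<RateLimitRule> microservice,
        IReadOnlyList<RateLimitRule> environment,
        IReadOnlyList<RateLimitRule> globalRules)
    {
        return route
            ?? microservice
            ?? environment
            ?? globalRules
            ?? Array.Empty<RateLimitRule>();
    }
}
```

With the sample values from the configuration reference, a route that defines 50 requests per 10 seconds is governed by that rule alone; the microservice's 600 per 60 seconds rule is not also applied.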

### 5. Rule Stacking
- Multiple rules per target = AND logic
- All rules must pass
- Most restrictive Retry-After returned
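
A minimal sketch of stacked-rule evaluation follows. `WindowState` and `RuleStackingSketch` are hypothetical names, the counters are passed in as a snapshot rather than read from a real limiter, and "most restrictive" is interpreted here as the longest wait among the violated rules, which is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical snapshot of one rule's window at decision time.
public sealed record WindowState(int MaxRequests, int CurrentCount, TimeSpan UntilReset);

public static class RuleStackingSketch
{
    // AND logic: the request is allowed only if every rule has capacity.
    // When denied, RetryAfter is the longest wait among the violated rules.
    public static (bool Allowed, TimeSpan RetryAfter) Evaluate(IEnumerable<WindowState> rules)
    {
        var allowed = true;
        var retryAfter = TimeSpan.Zero;

        foreach (var rule in rules)
        {
            if (rule.CurrentCount >= rule.MaxRequests)
            {
                allowed = false;
                if (rule.UntilReset > retryAfter)
                {
                    retryAfter = rule.UntilReset;
                }
            }
        }

        return (allowed, retryAfter);
    }
}
```

Returning the longest wait means a client that honours `Retry-After` should not be rejected again by a slower-resetting rule on its next attempt.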

---

## Performance Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Instance check latency | <1ms P99 | BenchmarkDotNet |
| Environment check latency | <10ms P99 | k6 load test |
| Router throughput | 100k req/sec | k6 constant-arrival-rate |
| Valkey load per instance | <1000 ops/sec | redis-cli INFO |

---

## Testing Requirements

### Unit Tests
- **Coverage:** >90% for all RateLimit/* files
- **Framework:** xUnit
- **Patterns:** Arrange-Act-Assert

### Integration Tests
- **Tool:** TestServer + Testcontainers (Valkey)
- **Scope:** End-to-end middleware pipeline
- **Scenarios:** All config combinations

### Load Tests
- **Tool:** k6
- **Scenarios:** A (instance), B (environment), C (activation gate), D (microservice), E (Valkey failure), F (max throughput)
- **Duration:** 30s per scenario minimum

---

## Common Implementation Gotchas

⚠️ **Middleware Pipeline Order**
```csharp
// CORRECT:
app.UsePayloadLimits();
app.UseRateLimiting();         // BEFORE routing
app.UseEndpointResolution();

// WRONG:
app.UseEndpointResolution();
app.UseRateLimiting();         // Too late, can't identify microservice
```

⚠️ **Lua Script Deployment**
```xml
<!-- REQUIRED in .csproj -->
<ItemGroup>
  <Content Include="RateLimit\Scripts\*.lua">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
</ItemGroup>
```

⚠️ **Clock Skew**
```lua
-- CORRECT: Use Valkey server time
local now = tonumber(redis.call("TIME")[1])

-- WRONG: Use client time (clock skew issues)
local now = os.time()
```

⚠️ **Circuit Breaker Half-Open**
```csharp
// REQUIRED: Implement half-open state
if (_state == CircuitState.Open && DateTime.UtcNow >= _halfOpenAt)
{
    _state = CircuitState.HalfOpen;  // Allow ONE test request
}
```
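
The half-open transition above can be placed in context with a fuller sketch. It is illustrative only: `CircuitBreakerSketch` is a hypothetical class, the landed `CircuitBreaker.cs` may differ, the clock is injected purely so the behaviour can be exercised deterministically, and a simple lock stands in for whatever synchronization the real implementation uses.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// Hypothetical lock-based circuit breaker with a single-probe half-open state.
public sealed class CircuitBreakerSketch
{
    private readonly object _gate = new object();
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTime> _utcNow;
    private int _failures;
    private DateTime _halfOpenAt;
    private bool _probeInFlight;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public CircuitBreakerSketch(int failureThreshold, TimeSpan openDuration, Func<DateTime> utcNow)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _utcNow = utcNow;
    }

    // Returns true if the caller may contact Valkey for this request.
    public bool TryEnter()
    {
        lock (_gate)
        {
            if (State == CircuitState.Open && _utcNow() >= _halfOpenAt)
            {
                State = CircuitState.HalfOpen;
                _probeInFlight = false;
            }

            if (State == CircuitState.Closed) return true;

            if (State == CircuitState.HalfOpen && !_probeInFlight)
            {
                _probeInFlight = true; // allow exactly one test request
                return true;
            }

            return false;
        }
    }

    public void RecordSuccess()
    {
        lock (_gate)
        {
            _failures = 0;
            State = CircuitState.Closed;
        }
    }

    public void RecordFailure()
    {
        lock (_gate)
        {
            _failures++;
            // A failed probe, or too many failures while closed, opens the circuit.
            if (State == CircuitState.HalfOpen || _failures >= _failureThreshold)
            {
                State = CircuitState.Open;
                _halfOpenAt = _utcNow() + _openDuration;
            }
        }
    }
}
```

A caller would wrap each Valkey call in `TryEnter()` and report the outcome through `RecordSuccess()` or `RecordFailure()`; when `TryEnter()` returns false, the fail-open rule means the environment check is skipped rather than the request rejected.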

---

## Success Criteria Checklist

Copy this checklist to the master tracker and update it as you progress:

### Functional
- [ ] Router enforces per-instance limits (in-memory)
- [ ] Router enforces per-environment limits (Valkey-backed)
- [ ] Per-microservice configuration works
- [ ] Per-route configuration works
- [ ] Multiple rules per target work (rule stacking)
- [ ] 429 + Retry-After response format correct
- [ ] Circuit breaker handles Valkey failures
- [ ] Activation gate reduces Valkey load

### Performance
- [ ] Instance check <1ms P99
- [ ] Environment check <10ms P99
- [ ] 100k req/sec throughput maintained
- [ ] Valkey load <1000 ops/sec per instance

### Operational
- [ ] Metrics exported to OpenTelemetry
- [ ] Dashboards created (Grafana)
- [ ] Alerts configured (Alertmanager)
- [ ] Documentation complete
- [ ] Migration from service-level rate limiters complete

### Quality
- [ ] Unit test coverage >90%
- [ ] Integration tests pass (all scenarios)
- [ ] Load tests pass (k6 scenarios A-F)
- [ ] Failure injection tests pass

---

## Escalation & Support

### Blocked on Technical Decision
**Escalate to:** Architecture Guild (#stella-architecture)
**Response SLA:** 24 hours

### Blocked on Resource (Valkey, config, etc.)
**Escalate to:** Platform Engineering (#stella-platform)
**Response SLA:** 4 hours

### Blocked on Clarification
**Escalate to:** Router Team Lead (#stella-router-dev)
**Response SLA:** 2 hours

### Sprint Falling Behind Schedule
**Escalate to:** Project Manager (update master tracker with BLOCKED status)
**Action:** Add note in "Decisions & Risks" section

---

## File Structure (After Implementation)

### Actual (landed)

```
src/__Libraries/StellaOps.Router.Gateway/RateLimit/
  CircuitBreaker.cs
  EnvironmentRateLimiter.cs
  InMemoryValkeyRateLimitStore.cs
  InstanceRateLimiter.cs
  LimitInheritanceResolver.cs
  RateLimitConfig.cs
  RateLimitDecision.cs
  RateLimitMetrics.cs
  RateLimitMiddleware.cs
  RateLimitRule.cs
  RateLimitRouteMatcher.cs
  RateLimitService.cs
  RateLimitServiceCollectionExtensions.cs
  ValkeyRateLimitStore.cs

tests/StellaOps.Router.Gateway.Tests/
  LimitInheritanceResolverTests.cs
  InMemoryValkeyRateLimitStoreTests.cs
  InstanceRateLimiterTests.cs
  RateLimitConfigTests.cs
  RateLimitRouteMatcherTests.cs
  RateLimitServiceTests.cs

docs/router/rate-limiting-routes.md
```

### Original plan (reference)

```
src/__Libraries/StellaOps.Router.Gateway/
├── RateLimit/
│   ├── RateLimitConfig.cs
│   ├── IRateLimiter.cs
│   ├── InstanceRateLimiter.cs
│   ├── EnvironmentRateLimiter.cs
│   ├── RateLimitService.cs
│   ├── RateLimitMetrics.cs
│   ├── RateLimitDecision.cs
│   ├── ValkeyRateLimitStore.cs
│   ├── CircuitBreaker.cs
│   ├── LimitInheritanceResolver.cs
│   ├── Models/
│   │   ├── InstanceLimitsConfig.cs
│   │   ├── EnvironmentLimitsConfig.cs
│   │   ├── MicroserviceLimitsConfig.cs
│   │   ├── RouteLimitsConfig.cs
│   │   ├── RateLimitRule.cs
│   │   └── EffectiveLimits.cs
│   ├── RouteMatching/
│   │   ├── IRouteMatcher.cs
│   │   ├── RouteMatcher.cs
│   │   ├── ExactRouteMatcher.cs
│   │   ├── PrefixRouteMatcher.cs
│   │   └── RegexRouteMatcher.cs
│   ├── Internal/
│   │   └── SlidingWindowCounter.cs
│   └── Scripts/
│       └── rate_limit_check.lua
├── Middleware/
│   └── RateLimitMiddleware.cs
├── ApplicationBuilderExtensions.cs (modified)
└── ServiceCollectionExtensions.cs (modified)

__Tests/
├── RateLimit/
│   ├── InstanceRateLimiterTests.cs
│   ├── EnvironmentRateLimiterTests.cs
│   ├── ValkeyRateLimitStoreTests.cs
│   ├── RateLimitMiddlewareTests.cs
│   ├── ConfigurationTests.cs
│   ├── RouteMatchingTests.cs
│   └── InheritanceResolverTests.cs

tests/load/
└── router-rate-limiting-load-test.js
```

---

## Next Steps After Package Review

1. **Acknowledge receipt** of sprint package
2. **Set up development environment** (Valkey, dependencies)
3. **Read Implementation Guide** in full
4. **Start Sprint 1, Task 1.1** (Configuration Models)
5. **Update master tracker** as tasks complete
6. **Commit frequently** with clear messages
7. **Run tests after each task**
8. **Ask questions early** if blocked

---

## Configuration Quick Reference

### Minimal Config (Just Defaults)

```yaml
rate_limiting:
  for_instance:
    per_seconds: 300
    max_requests: 30000
```

### Full Config (All Features)

```yaml
rate_limiting:
  process_back_pressure_when_more_than_per_5min: 5000

  for_instance:
    rules:
      - per_seconds: 300
        max_requests: 30000
      - per_seconds: 30
        max_requests: 5000

  for_environment:
    valkey_bucket: "stella-router-rate-limit"
    valkey_connection: "valkey.stellaops.local:6379"

    circuit_breaker:
      failure_threshold: 5
      timeout_seconds: 30
      half_open_timeout: 10

    rules:
      - per_seconds: 300
        max_requests: 30000

    microservices:
      concelier:
        rules:
          - per_seconds: 1
            max_requests: 10
          - per_seconds: 3600
            max_requests: 3000

      scanner:
        rules:
          - per_seconds: 60
            max_requests: 600

        routes:
          scan_submit:
            pattern: "/api/scans"
            match_type: exact
            rules:
              - per_seconds: 10
                max_requests: 50
```
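
The `pattern` / `match_type` pair on a route could be evaluated along these lines. This is a hedged sketch, not the landed `RateLimitRouteMatcher`: `RouteMatchSketch` is a hypothetical name, the `prefix` and `regex` values are assumed from the matcher class names in the original plan, and case-sensitive ordinal comparison is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical route matcher keyed on the YAML match_type value.
public static class RouteMatchSketch
{
    public static bool IsMatch(string matchType, string pattern, string path)
    {
        switch (matchType)
        {
            case "exact":
                return string.Equals(path, pattern, StringComparison.Ordinal);
            case "prefix":
                return path.StartsWith(pattern, StringComparison.Ordinal);
            case "regex":
                // A match timeout bounds the cost of pathological patterns.
                return Regex.IsMatch(path, pattern, RegexOptions.CultureInvariant, TimeSpan.FromMilliseconds(100));
            default:
                throw new ArgumentException($"Unknown match_type '{matchType}'.", nameof(matchType));
        }
    }
}
```

With the `scan_submit` route above, `IsMatch("exact", "/api/scans", "/api/scans")` matches while a request to `/api/scans/123` does not, so the latter would fall back to the rules of the enclosing microservice.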

---

## Related Documentation

### Source Documents
- **Advisory:** `docs/product-advisories/unprocessed/15-Dec-2025 - Designing 202 + Retry‑After Backpressure Control.md`
- **Analysis Plan:** `C:\Users\VladimirMoushkov\.claude\plans\vectorized-kindling-rocket.md`
- **Architecture:** `docs/modules/platform/architecture-overview.md`

### Implementation Sprints
- **Master Tracker:** `SPRINT_1200_001_000_router_rate_limiting_master.md`
- **Sprint 1:** `SPRINT_1200_001_001_router_rate_limiting_core.md`
- **Sprint 2:** `SPRINT_1200_001_002_router_rate_limiting_per_route.md`
- **Sprint 3:** `SPRINT_1200_001_003_router_rate_limiting_rule_stacking.md`
- **Sprint 4:** `SPRINT_1200_001_004_router_rate_limiting_service_migration.md` (closed N/A)
- **Sprint 5:** `SPRINT_1200_001_005_router_rate_limiting_tests.md`
- **Sprint 6:** `SPRINT_1200_001_006_router_rate_limiting_docs.md`

### Technical Guides
- **Implementation Guide:** `SPRINT_1200_001_IMPLEMENTATION_GUIDE.md` (comprehensive)
- **HTTP 429 Semantics:** RFC 6585
- **Valkey Documentation:** https://valkey.io/docs/

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-17 | Initial sprint package created |

---

**Already implemented.** Review the master tracker and run `dotnet test StellaOps.Router.slnx -c Release`.