feat: add Attestation Chain and Triage Evidence API clients and models

- Implemented Attestation Chain API client with methods for verifying, fetching, and managing attestation chains.
- Created models for Attestation Chain, including DSSE envelope structures and verification results.
- Developed Triage Evidence API client for fetching finding evidence, including methods for evidence retrieval by CVE and component.
- Added models for Triage Evidence, encapsulating evidence responses, entry points, boundary proofs, and VEX evidence.
- Introduced mock implementations for both API clients to facilitate testing and development.
2025-12-18 13:15:13 +02:00
parent 7d5250238c
commit 00d2c99af9
118 changed files with 13463 additions and 151 deletions


@@ -0,0 +1,502 @@
# Router Rate Limiting - Sprint Package README
**Package Created:** 2025-12-17
**For:** Implementation agents / reviewers
**Status:** DONE (Sprints 1-6 closed; Sprint 4 closed as N/A)
**Advisory Source:** `docs/product-advisories/unprocessed/15-Dec-2025 - Designing 202 + RetryAfter Backpressure Control.md`
---
## Package Contents
This sprint package contains the original plan plus the landed implementation for centralized rate limiting in Stella Router.
### Core Sprint Files
| File | Purpose | Agent Role |
|------|---------|------------|
| `SPRINT_1200_001_000_router_rate_limiting_master.md` | Master tracker | **START HERE** - Overview & progress tracking |
| `SPRINT_1200_001_001_router_rate_limiting_core.md` | Sprint 1: Core implementation | Implementer - 5-7 days |
| `SPRINT_1200_001_002_router_rate_limiting_per_route.md` | Sprint 2: Per-route granularity | Implementer - 2-3 days |
| `SPRINT_1200_001_003_router_rate_limiting_rule_stacking.md` | Sprint 3: Rule stacking | Implementer - 2-3 days |
| `SPRINT_1200_001_004_router_rate_limiting_service_migration.md` | Sprint 4: Service migration (closed N/A) | Project manager / reviewer |
| `SPRINT_1200_001_005_router_rate_limiting_tests.md` | Sprint 5: Comprehensive testing | QA / implementer |
| `SPRINT_1200_001_006_router_rate_limiting_docs.md` | Sprint 6: Documentation & rollout prep | Docs / implementer |
| `SPRINT_1200_001_IMPLEMENTATION_GUIDE.md` | Technical reference | **READ FIRST** before coding |
### Documentation Files
| File | Purpose | Created In |
|------|---------|------------|
| `docs/router/rate-limiting-routes.md` | Per-route configuration guide | Sprint 2 |
| `docs/router/rate-limiting.md` | User-facing configuration guide | Sprint 6 |
| `docs/operations/router-rate-limiting.md` | Operational runbook | Sprint 6 |
| `docs/modules/router/rate-limiting.md` | Module-level rate-limiting dossier | Sprint 6 |
---
## Implementation Sequence
### Phase 1: Core Implementation (Sprints 1-3)
```
Sprint 1 (5-7 days)
├── Task 1.1: Configuration Models
├── Task 1.2: Instance Rate Limiter
├── Task 1.3: Valkey Backend
├── Task 1.4: Middleware Integration
├── Task 1.5: Metrics
└── Task 1.6: Wire into Pipeline
Sprint 2 (2-3 days)
├── Task 2.1: Extend Config for Routes
├── Task 2.2: Route Matching
├── Task 2.3: Inheritance Resolution
├── Task 2.4: Integrate into Service
└── Task 2.5: Documentation
Sprint 3 (2-3 days)
├── Task 3.1: Config for Rule Arrays
├── Task 3.2: Update Instance Limiter
├── Task 3.3: Enhance Valkey Lua Script
└── Task 3.4: Update Inheritance Resolver
```
### Phase 2: Migration & Testing (Sprints 4-5)
```
Sprint 4 (3-4 days) - Service Migration
├── Extract AdaptiveRateLimiter configs
├── Add to Router configuration
├── Refactor AdaptiveRateLimiter
└── Integration validation
Sprint 5 (3-5 days) - Comprehensive Testing
├── Unit test suite
├── Integration tests (Testcontainers)
├── Load tests (k6 scenarios A-F)
└── Configuration matrix tests
```
### Phase 3: Documentation & Rollout (Sprint 6)
```
Sprint 6 (2 days)
├── Architecture docs
├── Configuration guide
├── Operational runbook
└── Migration guide
```
### Phase 4: Rollout (3 weeks, post-implementation)
```
Week 1: Shadow Mode
└── Metrics only, no enforcement
Week 2: Soft Limits
└── Limits set at 2x observed traffic peaks
Week 3: Production Limits
└── Full enforcement
Week 4+: Service Migration
└── Remove redundant limiters
```
---
## Quick Start for Agents
### 1. Context Gathering (30 minutes)
**Read in this order:**
1. `SPRINT_1200_001_000_router_rate_limiting_master.md` - Overview
2. `SPRINT_1200_001_IMPLEMENTATION_GUIDE.md` - Technical details
3. Original advisory: `docs/product-advisories/unprocessed/15-Dec-2025 - Designing 202 + RetryAfter Backpressure Control.md`
4. Analysis plan: `C:\Users\VladimirMoushkov\.claude\plans\vectorized-kindling-rocket.md`
### 2. Environment Setup
```bash
# Working directory
cd src/__Libraries/StellaOps.Router.Gateway/
# Verify dependencies
dotnet restore
# Install Valkey for local testing
docker run -d -p 6379:6379 valkey/valkey:latest
# Run existing tests to ensure baseline
dotnet test
```
### 3. Start Sprint 1
Open `SPRINT_1200_001_001_router_rate_limiting_core.md` and follow the task breakdown.
**Task execution pattern:**
```
For each task:
1. Read task description
2. Review implementation code samples
3. Create files as specified
4. Write unit tests
5. Mark task complete in master tracker
6. Commit with message: "feat(router): [Sprint 1.X] Task name"
```
---
## Key Design Decisions (Reference)
### 1. Status Codes
- ✅ **429 Too Many Requests** for rate limiting
- ❌ NOT 503 (that's for service health)
- ❌ NOT 202 (that's for async job acceptance)
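As a rough illustration of the 429 decision above, the following sketch builds a rejection with a `Retry-After` header. `RateLimitResponse` and `Reject` are hypothetical names introduced here, not the landed `RateLimitMiddleware`, which would write to the HTTP response directly. Rounding the delay up to whole seconds is an assumption made so that clients never retry early.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Usage: a request rejected with 12.3 seconds left in the current window.
var response = RateLimitResponse.Reject(TimeSpan.FromSeconds(12.3));
Console.WriteLine($"{response.StatusCode} Retry-After: {response.Headers["Retry-After"]}");
// prints "429 Retry-After: 13"

// Hypothetical response shape, used only to keep the example self-contained.
public sealed record RateLimitResponse(int StatusCode, IReadOnlyDictionary<string, string> Headers)
{
    public const int TooManyRequests = 429; // defined in RFC 6585

    public static RateLimitResponse Reject(TimeSpan retryAfter)
    {
        // Retry-After carries whole seconds: round up and never send less than 1.
        var seconds = Math.Max(1, (int)Math.Ceiling(retryAfter.TotalSeconds));
        var headers = new Dictionary<string, string>
        {
            ["Retry-After"] = seconds.ToString(CultureInfo.InvariantCulture),
        };
        return new RateLimitResponse(TooManyRequests, headers);
    }
}
```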
### 2. Two-Scope Architecture
- **for_instance**: In-memory, protects single router
- **for_environment**: Valkey-backed, protects aggregate
Both scopes are necessary; neither can replace the other.
### 3. Fail-Open Philosophy
- Circuit breaker opens on Valkey failures
- Activation gate reduces Valkey load
- Instance limits stay enforced even when Valkey is down
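The interaction between the two scopes and the fail-open rule can be sketched as follows. This is a minimal illustration under stated assumptions: `TwoScopeCheck` is a hypothetical type, the two checks are passed in as delegates, and the activation gate is assumed to mean "skip the Valkey round trip while recent local traffic is at or below a threshold". The landed `RateLimitService` may be structured differently.

```csharp
using System;

// Usage: the environment check throws (Valkey unreachable), yet the request passes.
var check = new TwoScopeCheck(
    instanceCheck: () => true,
    environmentCheck: () => throw new TimeoutException("Valkey unreachable"),
    activationThreshold: 0);
Console.WriteLine(check.IsAllowed(recentRequestCount: 10)); // True

// Hypothetical composition of the two scopes; all names are illustrative.
public sealed class TwoScopeCheck
{
    private readonly Func<bool> _instanceCheck;
    private readonly Func<bool> _environmentCheck;
    private readonly long _activationThreshold;

    public TwoScopeCheck(Func<bool> instanceCheck, Func<bool> environmentCheck, long activationThreshold)
    {
        _instanceCheck = instanceCheck;
        _environmentCheck = environmentCheck;
        _activationThreshold = activationThreshold;
    }

    public bool IsAllowed(long recentRequestCount)
    {
        // 1. The in-memory instance limit always applies, with or without Valkey.
        if (!_instanceCheck()) return false;

        // 2. Activation gate (assumed behavior): no Valkey call under low traffic.
        if (recentRequestCount <= _activationThreshold) return true;

        // 3. Environment limit: any store failure fails open and allows the request.
        try
        {
            return _environmentCheck();
        }
        catch (Exception)
        {
            return true;
        }
    }
}
```

Failing open is acceptable here only because step 1 has already applied the instance limit, so a Valkey outage degrades protection to per-router limits rather than removing it.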
### 4. Configuration Inheritance
- Replacement semantics (not merge)
- Most specific wins: route > microservice > environment > global
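Replacement semantics can be illustrated with a small resolver. This is a sketch, not the landed `LimitInheritanceResolver`: the `Rule` record, the `Inheritance.Resolve` signature, and the convention that `null` means "this level defines no rules" are all assumptions made for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var globalRules = new[] { new Rule(300, 30000) };
var microserviceRules = new[] { new Rule(60, 600) };
var routeRules = new[] { new Rule(10, 50) };

// The route list replaces the others outright; nothing is merged in.
var effective = Inheritance.Resolve(routeRules, microserviceRules, null, globalRules);
Console.WriteLine($"{effective.Count} rule(s), first: {effective[0].MaxRequests} per {effective[0].PerSeconds}s");
// prints "1 rule(s), first: 50 per 10s"

// Hypothetical rule shape mirroring the per_seconds / max_requests config keys.
public sealed record Rule(int PerSeconds, int MaxRequests);

public static class Inheritance
{
    // Returns the rule list of the most specific level that defines one:
    // route > microservice > environment > global.
    public static IReadOnlyList<Rule> Resolve(
        IReadOnlyList<Rule>? route,
        IReadOnlyList<Rule>? microservice,
        IReadOnlyList<Rule>? environment,
        IReadOnlyList<Rule>? global)
        => route ?? microservice ?? environment ?? global ?? Array.Empty<Rule>();
}
```

With merge semantics the route would also inherit the 60-second microservice rule; with replacement it does not, so every level has to restate the limits it wants to keep.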
### 5. Rule Stacking
- Multiple rules per target = AND logic
- All rules must pass
- Most restrictive Retry-After returned
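The AND logic can be sketched as a fold over per-rule outcomes. `RuleOutcome` and `RuleStack.Combine` are hypothetical names, and "most restrictive Retry-After" is interpreted here as the longest wait among the rules that denied the request; the landed limiter and Lua script may compute this differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: one rule passes, two are exhausted with different reset times.
var outcomes = new[]
{
    new RuleOutcome(Allowed: true, RetryAfter: TimeSpan.Zero),
    new RuleOutcome(Allowed: false, RetryAfter: TimeSpan.FromSeconds(120)),
    new RuleOutcome(Allowed: false, RetryAfter: TimeSpan.FromSeconds(7)),
};
var decision = RuleStack.Combine(outcomes);
Console.WriteLine($"{decision.Allowed} {decision.RetryAfter.TotalSeconds}");
// prints "False 120"

// Hypothetical per-rule result.
public sealed record RuleOutcome(bool Allowed, TimeSpan RetryAfter);

public static class RuleStack
{
    public static RuleOutcome Combine(IEnumerable<RuleOutcome> outcomes)
    {
        // AND logic: a single denied rule denies the whole request.
        var denied = outcomes.Where(o => !o.Allowed).ToList();
        if (denied.Count == 0)
        {
            return new RuleOutcome(true, TimeSpan.Zero);
        }

        // Report the longest wait so that a retry satisfies every denied rule.
        return new RuleOutcome(false, denied.Max(o => o.RetryAfter));
    }
}
```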
---
## Performance Targets
| Metric | Target | Measurement |
|--------|--------|-------------|
| Instance check latency | <1ms P99 | BenchmarkDotNet |
| Environment check latency | <10ms P99 | k6 load test |
| Router throughput | 100k req/sec | k6 constant-arrival-rate |
| Valkey load per instance | <1000 ops/sec | redis-cli INFO |
---
## Testing Requirements
### Unit Tests
- **Coverage:** >90% for all RateLimit/* files
- **Framework:** xUnit
- **Patterns:** Arrange-Act-Assert
### Integration Tests
- **Tool:** TestServer + Testcontainers (Valkey)
- **Scope:** End-to-end middleware pipeline
- **Scenarios:** All config combinations
### Load Tests
- **Tool:** k6
- **Scenarios:** A (instance), B (environment), C (activation gate), D (microservice), E (Valkey failure), F (max throughput)
- **Duration:** 30s per scenario minimum
---
## Common Implementation Gotchas
⚠️ **Middleware Pipeline Order**
```csharp
// CORRECT:
app.UsePayloadLimits();
app.UseRateLimiting(); // BEFORE routing
app.UseEndpointResolution();
// WRONG:
app.UseEndpointResolution();
app.UseRateLimiting(); // Too late, can't identify microservice
```
⚠️ **Lua Script Deployment**
```xml
<!-- REQUIRED in .csproj -->
<ItemGroup>
  <Content Include="RateLimit\Scripts\*.lua">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
</ItemGroup>
```
⚠️ **Clock Skew**
```lua
-- CORRECT: Use Valkey server time
local now = tonumber(redis.call("TIME")[1])
-- WRONG: Use client time (clock skew issues)
local now = os.time()
```
⚠️ **Circuit Breaker Half-Open**
```csharp
// REQUIRED: Implement half-open state
if (_state == CircuitState.Open && DateTime.UtcNow >= _halfOpenAt)
{
    _state = CircuitState.HalfOpen; // Allow ONE test request
}
```
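For context, a fuller state machine around that transition might look like the sketch below. It is not the landed `CircuitBreaker.cs`: `SimpleCircuitBreaker`, its constructor parameters, and the injected clock are assumptions chosen to keep the example self-contained and deterministic.

```csharp
using System;

var now = DateTime.UtcNow;
var breaker = new SimpleCircuitBreaker(failureThreshold: 2, openDuration: TimeSpan.FromSeconds(30), clock: () => now);

breaker.RecordFailure();
breaker.RecordFailure();                   // threshold reached: circuit opens
Console.WriteLine(breaker.AllowRequest()); // False

now = now.AddSeconds(31);                  // open period has elapsed
Console.WriteLine(breaker.AllowRequest()); // True  (the single half-open probe)
Console.WriteLine(breaker.AllowRequest()); // False (probe still in flight)

breaker.RecordSuccess();                   // probe succeeded: circuit closes
Console.WriteLine(breaker.AllowRequest()); // True

public enum CircuitState { Closed, Open, HalfOpen }

public sealed class SimpleCircuitBreaker
{
    private readonly object _gate = new();
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTime> _clock;
    private CircuitState _state = CircuitState.Closed;
    private int _failures;
    private DateTime _halfOpenAt;
    private bool _probeInFlight;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan openDuration, Func<DateTime> clock)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _clock = clock;
    }

    public bool AllowRequest()
    {
        lock (_gate)
        {
            // Open -> HalfOpen once the open period has elapsed.
            if (_state == CircuitState.Open && _clock() >= _halfOpenAt)
            {
                _state = CircuitState.HalfOpen;
                _probeInFlight = false;
            }

            switch (_state)
            {
                case CircuitState.Closed:
                    return true;
                case CircuitState.HalfOpen when !_probeInFlight:
                    _probeInFlight = true; // exactly one probe request is let through
                    return true;
                default:
                    return false;
            }
        }
    }

    public void RecordSuccess()
    {
        lock (_gate)
        {
            _state = CircuitState.Closed;
            _failures = 0;
            _probeInFlight = false;
        }
    }

    public void RecordFailure()
    {
        lock (_gate)
        {
            _failures++;
            // A failed probe, or too many failures while closed, (re)opens the circuit.
            if (_state == CircuitState.HalfOpen || _failures >= _failureThreshold)
            {
                _state = CircuitState.Open;
                _halfOpenAt = _clock() + _openDuration;
                _probeInFlight = false;
            }
        }
    }
}
```

Without the half-open state the breaker would either stay open indefinitely or send full traffic to a store that may still be failing the moment the timeout expires.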
---
## Success Criteria Checklist
Copy this to the master tracker and update it as you progress:
### Functional
- [ ] Router enforces per-instance limits (in-memory)
- [ ] Router enforces per-environment limits (Valkey-backed)
- [ ] Per-microservice configuration works
- [ ] Per-route configuration works
- [ ] Multiple rules per target work (rule stacking)
- [ ] 429 + Retry-After response format correct
- [ ] Circuit breaker handles Valkey failures
- [ ] Activation gate reduces Valkey load
### Performance
- [ ] Instance check <1ms P99
- [ ] Environment check <10ms P99
- [ ] 100k req/sec throughput maintained
- [ ] Valkey load <1000 ops/sec per instance
### Operational
- [ ] Metrics exported to OpenTelemetry
- [ ] Dashboards created (Grafana)
- [ ] Alerts configured (Alertmanager)
- [ ] Documentation complete
- [ ] Migration from service-level rate limiters complete
### Quality
- [ ] Unit test coverage >90%
- [ ] Integration tests pass (all scenarios)
- [ ] Load tests pass (k6 scenarios A-F)
- [ ] Failure injection tests pass
---
## Escalation & Support
### Blocked on Technical Decision
**Escalate to:** Architecture Guild (#stella-architecture)
**Response SLA:** 24 hours
### Blocked on Resource (Valkey, config, etc.)
**Escalate to:** Platform Engineering (#stella-platform)
**Response SLA:** 4 hours
### Blocked on Clarification
**Escalate to:** Router Team Lead (#stella-router-dev)
**Response SLA:** 2 hours
### Sprint Falling Behind Schedule
**Escalate to:** Project Manager (update master tracker with BLOCKED status)
**Action:** Add note in "Decisions & Risks" section
---
## File Structure (After Implementation)
### Actual (landed)
```
src/__Libraries/StellaOps.Router.Gateway/RateLimit/
  CircuitBreaker.cs
  EnvironmentRateLimiter.cs
  InMemoryValkeyRateLimitStore.cs
  InstanceRateLimiter.cs
  LimitInheritanceResolver.cs
  RateLimitConfig.cs
  RateLimitDecision.cs
  RateLimitMetrics.cs
  RateLimitMiddleware.cs
  RateLimitRule.cs
  RateLimitRouteMatcher.cs
  RateLimitService.cs
  RateLimitServiceCollectionExtensions.cs
  ValkeyRateLimitStore.cs
tests/StellaOps.Router.Gateway.Tests/
  LimitInheritanceResolverTests.cs
  InMemoryValkeyRateLimitStoreTests.cs
  InstanceRateLimiterTests.cs
  RateLimitConfigTests.cs
  RateLimitRouteMatcherTests.cs
  RateLimitServiceTests.cs
docs/router/rate-limiting-routes.md
```
### Original plan (reference)
```
src/__Libraries/StellaOps.Router.Gateway/
├── RateLimit/
│ ├── RateLimitConfig.cs
│ ├── IRateLimiter.cs
│ ├── InstanceRateLimiter.cs
│ ├── EnvironmentRateLimiter.cs
│ ├── RateLimitService.cs
│ ├── RateLimitMetrics.cs
│ ├── RateLimitDecision.cs
│ ├── ValkeyRateLimitStore.cs
│ ├── CircuitBreaker.cs
│ ├── LimitInheritanceResolver.cs
│ ├── Models/
│ │ ├── InstanceLimitsConfig.cs
│ │ ├── EnvironmentLimitsConfig.cs
│ │ ├── MicroserviceLimitsConfig.cs
│ │ ├── RouteLimitsConfig.cs
│ │ ├── RateLimitRule.cs
│ │ └── EffectiveLimits.cs
│ ├── RouteMatching/
│ │ ├── IRouteMatcher.cs
│ │ ├── RouteMatcher.cs
│ │ ├── ExactRouteMatcher.cs
│ │ ├── PrefixRouteMatcher.cs
│ │ └── RegexRouteMatcher.cs
│ ├── Internal/
│ │ └── SlidingWindowCounter.cs
│ └── Scripts/
│ └── rate_limit_check.lua
├── Middleware/
│ └── RateLimitMiddleware.cs
├── ApplicationBuilderExtensions.cs (modified)
└── ServiceCollectionExtensions.cs (modified)
__Tests/
├── RateLimit/
│ ├── InstanceRateLimiterTests.cs
│ ├── EnvironmentRateLimiterTests.cs
│ ├── ValkeyRateLimitStoreTests.cs
│ ├── RateLimitMiddlewareTests.cs
│ ├── ConfigurationTests.cs
│ ├── RouteMatchingTests.cs
│ └── InheritanceResolverTests.cs
tests/load/
└── router-rate-limiting-load-test.js
```
---
## Next Steps After Package Review
1. **Acknowledge receipt** of sprint package
2. **Set up development environment** (Valkey, dependencies)
3. **Read Implementation Guide** in full
4. **Start Sprint 1, Task 1.1** (Configuration Models)
5. **Update master tracker** as tasks complete
6. **Commit frequently** with clear messages
7. **Run tests after each task**
8. **Ask questions early** if blocked
---
## Configuration Quick Reference
### Minimal Config (Just Defaults)
```yaml
rate_limiting:
  for_instance:
    per_seconds: 300
    max_requests: 30000
```
### Full Config (All Features)
```yaml
rate_limiting:
  process_back_pressure_when_more_than_per_5min: 5000
  for_instance:
    rules:
      - per_seconds: 300
        max_requests: 30000
      - per_seconds: 30
        max_requests: 5000
  for_environment:
    valkey_bucket: "stella-router-rate-limit"
    valkey_connection: "valkey.stellaops.local:6379"
    circuit_breaker:
      failure_threshold: 5
      timeout_seconds: 30
      half_open_timeout: 10
    rules:
      - per_seconds: 300
        max_requests: 30000
    microservices:
      concelier:
        rules:
          - per_seconds: 1
            max_requests: 10
          - per_seconds: 3600
            max_requests: 3000
      scanner:
        rules:
          - per_seconds: 60
            max_requests: 600
        routes:
          scan_submit:
            pattern: "/api/scans"
            match_type: exact
            rules:
              - per_seconds: 10
                max_requests: 50
```
---
## Related Documentation
### Source Documents
- **Advisory:** `docs/product-advisories/unprocessed/15-Dec-2025 - Designing 202 + RetryAfter Backpressure Control.md`
- **Analysis Plan:** `C:\Users\VladimirMoushkov\.claude\plans\vectorized-kindling-rocket.md`
- **Architecture:** `docs/modules/platform/architecture-overview.md`
### Implementation Sprints
- **Master Tracker:** `SPRINT_1200_001_000_router_rate_limiting_master.md`
- **Sprint 1:** `SPRINT_1200_001_001_router_rate_limiting_core.md`
- **Sprint 2:** `SPRINT_1200_001_002_router_rate_limiting_per_route.md`
- **Sprint 3:** `SPRINT_1200_001_003_router_rate_limiting_rule_stacking.md`
- **Sprint 4:** `SPRINT_1200_001_004_router_rate_limiting_service_migration.md` (closed N/A)
- **Sprint 5:** `SPRINT_1200_001_005_router_rate_limiting_tests.md`
- **Sprint 6:** `SPRINT_1200_001_006_router_rate_limiting_docs.md`
### Technical Guides
- **Implementation Guide:** `SPRINT_1200_001_IMPLEMENTATION_GUIDE.md` (comprehensive)
- **HTTP 429 Semantics:** RFC 6585
- **Valkey Documentation:** https://valkey.io/docs/
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-17 | Initial sprint package created |
---
**Already implemented.** Review the master tracker and run `dotnet test StellaOps.Router.slnx -c Release`.