save progress

StellaOps Bot
2025-12-26 22:03:32 +02:00
parent 9a4cd2e0f7
commit e6c47c8f50
3634 changed files with 253222 additions and 56632 deletions

@@ -0,0 +1,325 @@
# component_architecture_riskengine.md - **Stella Ops RiskEngine** (2025Q4)
> Risk scoring runtime with pluggable providers and explainability.
> **Scope.** Implementation-ready architecture for **RiskEngine**: the scoring runtime that computes Risk Scoring Profiles across deployments while preserving provenance and explainability. Covers scoring workers, providers, caching, and integration with Policy Engine.
---
## 0) Mission & boundaries
**Mission.** Compute **deterministic, explainable risk scores** for vulnerabilities by aggregating signals from multiple data sources (EPSS, CVSS, KEV, VEX, reachability). Produce audit trails and explainability payloads for every scoring decision.
**Boundaries.**
* RiskEngine **does not** make PASS/FAIL decisions. It provides scores to the Policy Engine.
* RiskEngine **does not** own vulnerability data. It consumes from Concelier, Excititor, and Signals.
* Scoring is **deterministic**: same inputs produce identical scores.
* Supports **offline/air-gapped** operation via factor bundles.
---
## 1) Solution & project layout
```
src/RiskEngine/StellaOps.RiskEngine/
├─ StellaOps.RiskEngine.Core/ # Scoring orchestrators, provider contracts
│ ├─ Providers/
│ │ ├─ IRiskScoreProvider.cs # Provider interface
│ │ ├─ EpssProvider.cs # EPSS score provider
│ │ ├─ CvssKevProvider.cs # CVSS + KEV provider
│ │ ├─ VexGateProvider.cs # VEX status provider
│ │ ├─ FixExposureProvider.cs # Fix availability provider
│ │ └─ DefaultTransformsProvider.cs # Score transformations
│ ├─ Contracts/
│ │ ├─ ScoreRequest.cs # Scoring request DTO
│ │ └─ RiskScoreResult.cs # Scoring result with explanation
│ └─ Services/
│ ├─ RiskScoreWorker.cs # Scoring job executor
│ └─ RiskScoreQueue.cs # Job queue management
├─ StellaOps.RiskEngine.Infrastructure/ # Persistence, caching, connectors
│ └─ Stores/
│ └─ InMemoryRiskScoreResultStore.cs
├─ StellaOps.RiskEngine.WebService/ # REST API for jobs and results
│ └─ Program.cs
├─ StellaOps.RiskEngine.Worker/ # Background scoring workers
│ ├─ Program.cs
│ └─ Worker.cs
└─ StellaOps.RiskEngine.Tests/ # Unit and integration tests
```
---
## 2) External dependencies
* **PostgreSQL** - Score persistence and job state
* **Concelier** - Vulnerability advisory data, EPSS scores
* **Excititor** - VEX statements
* **Signals** - Reachability and runtime signals
* **Policy Engine** - Consumes risk scores for decision-making
* **Authority** - Authentication and authorization
* **Valkey/Redis** - Score caching (optional)
---
## 3) Contracts & data model
### 3.1 ScoreRequest
```csharp
public sealed record ScoreRequest
{
    public required string VulnerabilityId { get; init; }         // CVE or vuln ID
    public required string ArtifactId { get; init; }              // PURL or component ID
    public string? TenantId { get; init; }
    public string? ContextId { get; init; }                       // Scan or assessment ID
    public IReadOnlyList<string>? EnabledProviders { get; init; }
}
```
### 3.2 RiskScoreResult
```csharp
public sealed record RiskScoreResult
{
    public required string RequestId { get; init; }
    public required decimal FinalScore { get; init; }             // 0.0-10.0
    public required string Tier { get; init; }                    // Critical/High/Medium/Low/Info
    public required DateTimeOffset ComputedAt { get; init; }
    public required IReadOnlyList<ProviderContribution> Contributions { get; init; }
    public required ExplainabilityPayload Explanation { get; init; }
}

public sealed record ProviderContribution
{
    public required string ProviderId { get; init; }
    public required decimal RawScore { get; init; }
    public required decimal Weight { get; init; }
    public required decimal WeightedScore { get; init; }
    public string? FactorSource { get; init; }                    // Where data came from
    public DateTimeOffset? FactorTimestamp { get; init; }         // When factor was computed
}
```
### 3.3 Provider Interface
```csharp
public interface IRiskScoreProvider
{
    string ProviderId { get; }
    decimal DefaultWeight { get; }
    TimeSpan CacheTtl { get; }

    Task<ProviderResult> ComputeAsync(
        ScoreRequest request,
        CancellationToken ct);

    Task<bool> IsHealthyAsync(CancellationToken ct);
}
```
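As an illustration of the contract above, here is a minimal sketch of a provider. It is not the shipped `EpssProvider`: the `SketchEpssProvider` class, the in-memory lookup table, and the shape of `ProviderResult` (which this document does not define) are all assumptions made for the example, and `ScoreRequest` is trimmed to its required members.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Usage: score one CVE against a stubbed EPSS table (the probability is made up).
var provider = new SketchEpssProvider(new Dictionary<string, decimal>
{
    ["CVE-2024-0001"] = 0.42m,
});
var request = new ScoreRequest
{
    VulnerabilityId = "CVE-2024-0001",
    ArtifactId = "pkg:npm/example@1.0.0",
};
var result = await provider.ComputeAsync(request, CancellationToken.None);
Console.WriteLine($"{provider.ProviderId}: raw={result.RawScore} source={result.FactorSource}");

// Trimmed copy of the ScoreRequest contract from section 3.1.
public sealed record ScoreRequest
{
    public required string VulnerabilityId { get; init; }
    public required string ArtifactId { get; init; }
}

// ASSUMPTION: ProviderResult is not defined in this document; this shape is illustrative.
public sealed record ProviderResult(decimal RawScore, string? FactorSource);

public interface IRiskScoreProvider
{
    string ProviderId { get; }
    decimal DefaultWeight { get; }
    TimeSpan CacheTtl { get; }
    Task<ProviderResult> ComputeAsync(ScoreRequest request, CancellationToken ct);
    Task<bool> IsHealthyAsync(CancellationToken ct);
}

// Hypothetical provider backed by an in-memory table instead of Concelier.
public sealed class SketchEpssProvider : IRiskScoreProvider
{
    private readonly IReadOnlyDictionary<string, decimal> _probabilities;

    public SketchEpssProvider(IReadOnlyDictionary<string, decimal> probabilities)
        => _probabilities = probabilities;

    public string ProviderId => "epss";
    public decimal DefaultWeight => 0.25m;
    public TimeSpan CacheTtl => TimeSpan.FromHours(1);

    public Task<ProviderResult> ComputeAsync(ScoreRequest request, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        // ASSUMPTION: a CVE with no EPSS entry contributes a raw score of 0.
        var probability = _probabilities.TryGetValue(request.VulnerabilityId, out var p) ? p : 0m;
        // Scale the 0-1 probability onto the 0-10 score range.
        var raw = Math.Clamp(probability, 0m, 1m) * 10m;
        return Task.FromResult(new ProviderResult(raw, "in-memory-fixture"));
    }

    public Task<bool> IsHealthyAsync(CancellationToken ct) => Task.FromResult(true);
}
```

Keeping the provider free of wall-clock reads and random values is what lets the same request yield the same raw score on every run.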
---
## 4) Score Providers
### 4.1 Built-in Providers
| Provider | Data Source | Weight | Description |
|----------|-------------|--------|-------------|
| `epss` | Concelier/EPSS | 0.25 | EPSS probability score (0-1 → 0-10) |
| `cvss-kev` | Concelier | 0.30 | CVSS base + KEV boost |
| `vex-gate` | Excititor | 0.20 | VEX status (affected/not_affected) |
| `fix-exposure` | Concelier | 0.15 | Fix availability window |
| `reachability` | Signals | 0.10 | Code path reachability |
### 4.2 Score Computation
```
FinalScore = Σ(provider.weight × provider.score) / Σ(provider.weight)

Tier mapping:
  9.0-10.0 → Critical
  7.0-8.9  → High
  4.0-6.9  → Medium
  1.0-3.9  → Low
  0.0-0.9  → Info
```
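The formula and tier bands above can be worked through in a few lines. The sketch below uses the default weights from section 4.1 with made-up raw scores. Two details are assumptions, not part of the specification: the result is rounded to one decimal place, and each band's lower bound is treated as an inclusive threshold, so a value such as 8.95 cannot fall between bands. `RiskMath` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Default weights from section 4.1; the raw scores are invented for illustration.
var contributions = new List<(string Provider, decimal Weight, decimal Score)>
{
    ("epss", 0.25m, 4.0m),
    ("cvss-kev", 0.30m, 9.0m),
    ("vex-gate", 0.20m, 10.0m),
    ("fix-exposure", 0.15m, 6.0m),
    ("reachability", 0.10m, 5.0m),
};
var finalScore = RiskMath.Aggregate(contributions);
Console.WriteLine($"{finalScore} -> {RiskMath.ToTier(finalScore)}");

public static class RiskMath
{
    // FinalScore = sum(weight * score) / sum(weight), over the providers that contributed.
    public static decimal Aggregate(IEnumerable<(string Provider, decimal Weight, decimal Score)> items)
    {
        var list = items.ToList();
        var totalWeight = list.Sum(i => i.Weight);
        if (totalWeight <= 0m)
            throw new InvalidOperationException("No weighted contributions.");
        var weighted = list.Sum(i => i.Weight * i.Score);
        // ASSUMPTION: round to one decimal so the result lands inside a tier band.
        return Math.Round(weighted / totalWeight, 1, MidpointRounding.AwayFromZero);
    }

    // ASSUMPTION: each band's lower bound is an inclusive threshold.
    public static string ToTier(decimal score) => score switch
    {
        >= 9.0m => "Critical",
        >= 7.0m => "High",
        >= 4.0m => "Medium",
        >= 1.0m => "Low",
        _ => "Info",
    };
}
```

With these inputs the weighted sum is 1.0 + 2.7 + 2.0 + 0.9 + 0.5 = 7.1 over a total weight of 1.0, which maps to High. Because the denominator is the sum of the weights actually used, disabling a provider renormalizes the rest: dropping `reachability` gives 6.6 / 0.9 ≈ 7.3.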
### 4.3 Provider Data Sources
```csharp
public interface IEpssSources
{
    Task<EpssScore?> GetScoreAsync(string cveId, CancellationToken ct);
}

public interface ICvssKevSources
{
    Task<CvssData?> GetCvssAsync(string cveId, CancellationToken ct);
    Task<bool> IsKevAsync(string cveId, CancellationToken ct);
}
```
---
## 5) REST API (RiskEngine.WebService)
All under `/api/v1/risk`. Auth: **OpTok**.
```
POST /scores                  { request: ScoreRequest }     → { jobId }
GET  /scores/{jobId}                                        → { result: RiskScoreResult, status }
GET  /scores/{jobId}/explain                                → { explanation: ExplainabilityPayload }
POST /batch                   { requests: ScoreRequest[] }  → { batchId }
GET  /batch/{batchId}                                       → { results: RiskScoreResult[], status }
GET  /providers                                             → { providers: ProviderInfo[] }
GET  /providers/{id}/health                                 → { healthy: bool, lastCheck }
GET  /healthz | /readyz | /metrics
```
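A client-side sketch of the `POST /scores` call follows. It only builds the request and prints it, so it runs without a server. The host name, the `RiskApi` helper, and the camelCase JSON property names are assumptions. The OpTok header is left out because its wire format is not specified in this document.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Build (but do not send) a scoring request; the host is a placeholder.
var message = RiskApi.BuildScoreRequest(
    new Uri("https://riskengine.example.internal"),
    "CVE-2024-0001",
    "pkg:npm/example@1.0.0");
Console.WriteLine($"{message.Method} {message.RequestUri}");
Console.WriteLine(await message.Content!.ReadAsStringAsync());

public static class RiskApi
{
    public static HttpRequestMessage BuildScoreRequest(Uri baseAddress, string vulnerabilityId, string artifactId)
    {
        // ASSUMPTION: the body wraps the ScoreRequest in a "request" property
        // with camelCase member names, following the listing above.
        var body = JsonSerializer.Serialize(
            new { request = new { vulnerabilityId, artifactId } });

        return new HttpRequestMessage(HttpMethod.Post, new Uri(baseAddress, "/api/v1/risk/scores"))
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json"),
        };
    }
}
```

The response carries a `jobId`, so a real client would follow up with `GET /scores/{jobId}` until `status` reports completion.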
---
## 6) Configuration (YAML)
```yaml
RiskEngine:
  Postgres:
    ConnectionString: "Host=postgres;Database=risk;..."
  Cache:
    Enabled: true
    Provider: "valkey"
    ConnectionString: "redis://valkey:6379"
    DefaultTtl: "00:15:00"
  Providers:
    Epss:
      Enabled: true
      Weight: 0.25
      CacheTtl: "01:00:00"
      Source: "concelier"
    CvssKev:
      Enabled: true
      Weight: 0.30
      KevBoost: 2.0
    VexGate:
      Enabled: true
      Weight: 0.20
      NotAffectedScore: 0.0
      AffectedScore: 10.0
    FixExposure:
      Enabled: true
      Weight: 0.15
      NoFixPenalty: 1.5
    Reachability:
      Enabled: true
      Weight: 0.10
      UnreachableDiscount: 0.5
  Worker:
    Concurrency: 4
    BatchSize: 100
    PollInterval: "00:00:05"
  Offline:
    FactorBundlePath: "/data/risk-factors"
    AllowStaleData: true
    MaxStalenessHours: 168
```
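A startup check over these settings might look like the sketch below. The `ProviderOptions` and `ConfigChecks` types are hypothetical, and the rules they enforce (a positive TTL, at least one enabled provider, no negative weights) are assumptions, since this document does not define validation behavior. The duration strings in the YAML use the `hh:mm:ss` form that `TimeSpan` parses with the constant `"c"` format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Values copied from the YAML above; three providers shown for brevity,
// with VexGate switched off to exercise the Enabled flag.
var providers = new Dictionary<string, ProviderOptions>
{
    ["Epss"] = new(Enabled: true, Weight: 0.25m),
    ["CvssKev"] = new(Enabled: true, Weight: 0.30m),
    ["VexGate"] = new(Enabled: false, Weight: 0.20m),
};
var defaultTtl = TimeSpan.ParseExact("00:15:00", "c", CultureInfo.InvariantCulture);

var errors = ConfigChecks.Validate(providers, defaultTtl);
Console.WriteLine(errors.Count == 0 ? "configuration ok" : string.Join(Environment.NewLine, errors));

// Hypothetical options type covering only the keys every provider shares.
public sealed record ProviderOptions(bool Enabled, decimal Weight);

public static class ConfigChecks
{
    // ASSUMPTION: these rules are illustrative, not documented RiskEngine behavior.
    public static IReadOnlyList<string> Validate(
        IReadOnlyDictionary<string, ProviderOptions> providers,
        TimeSpan defaultTtl)
    {
        var errors = new List<string>();
        if (defaultTtl <= TimeSpan.Zero)
            errors.Add("Cache:DefaultTtl must be positive.");
        if (!providers.Values.Any(p => p.Enabled))
            errors.Add("At least one provider must be enabled.");
        foreach (var (name, options) in providers)
        {
            if (options.Weight < 0m)
                errors.Add($"Providers:{name}:Weight must not be negative.");
        }
        return errors;
    }
}
```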
---
## 7) Security & compliance
* **AuthN/Z**: Authority-issued OpToks with `risk.score` scope
* **Tenant isolation**: Scores scoped by tenant ID
* **Audit trail**: All scoring decisions logged with inputs and factors
* **No PII**: Only vulnerability and artifact identifiers processed
---
## 8) Performance targets
* **Single score**: < 100ms P95 (cached factors)
* **Batch scoring**: < 500ms P95 for 100 items
* **Provider health check**: < 1s timeout
* **Cache hit rate**: > 80% for repeated CVEs
---
## 9) Observability
**Metrics:**
* `risk.scores.computed_total{tier,provider}`
* `risk.scores.duration_seconds`
* `risk.providers.health{provider,status}`
* `risk.cache.hits_total` / `risk.cache.misses_total`
* `risk.batch.size_histogram`
**Tracing:** Spans for each provider contribution, cache operations, and aggregation.
**Logs:** Structured logs with `cve_id`, `artifact_id`, `tenant`, `final_score`.
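The metric names above map onto .NET's `System.Diagnostics.Metrics` instruments. The following is a minimal sketch, assuming a meter named `StellaOps.RiskEngine` (a hypothetical name). The `MeterListener` is there only so the example can observe its own measurement. A real deployment would attach an exporter instead.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// ASSUMPTION: the meter name is illustrative; the instrument names come from the list above.
using var meter = new Meter("StellaOps.RiskEngine");
var computed = meter.CreateCounter<long>("risk.scores.computed_total");
var duration = meter.CreateHistogram<double>("risk.scores.duration_seconds", unit: "s");

// In-process listener so the sketch can show that the counter fired.
long total = 0;
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "StellaOps.RiskEngine")
        l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) => total += value);
listener.Start();

// Record one scored vulnerability with the tier and provider labels.
computed.Add(
    1,
    new KeyValuePair<string, object?>("tier", "High"),
    new KeyValuePair<string, object?>("provider", "epss"));
duration.Record(0.042);

Console.WriteLine($"recorded {total} score(s)");
```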
---
## 10) Testing matrix
* **Provider tests**: Each provider returns expected scores for fixture data
* **Aggregation tests**: Weighted combination produces correct final score
* **Determinism tests**: Same inputs produce identical scores
* **Cache tests**: Cache hit/miss behavior correct
* **Offline tests**: Factor bundles load and score correctly
* **Integration tests**: Full scoring pipeline with mocked data sources
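A determinism check from this matrix could be sketched as below: it shuffles the order in which contributions arrive and asserts that the aggregate never changes. It is written as a plain program, not against a specific test framework, and the `Determinism.Compute` helper is hypothetical. Sorting by provider ID before summing is one way to make the result independent of arrival order. It is not necessarily how RiskEngine does it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Fixture contributions; the raw scores are invented for the test.
var inputs = new List<(string Provider, decimal Weight, decimal Score)>
{
    ("epss", 0.25m, 3.7m),
    ("cvss-kev", 0.30m, 8.1m),
    ("vex-gate", 0.20m, 10.0m),
};

var baseline = Determinism.Compute(inputs);
var rng = new Random(12345); // fixed seed keeps the check itself reproducible

for (var run = 0; run < 100; run++)
{
    var shuffled = inputs.OrderBy(_ => rng.Next()).ToList();
    if (Determinism.Compute(shuffled) != baseline)
        throw new InvalidOperationException($"Score changed on run {run}.");
}
Console.WriteLine("deterministic");

public static class Determinism
{
    // Sort by provider ID (ordinal) before summing so arrival order cannot affect the result.
    public static decimal Compute(IEnumerable<(string Provider, decimal Weight, decimal Score)> items)
    {
        var ordered = items.OrderBy(i => i.Provider, StringComparer.Ordinal).ToList();
        var totalWeight = ordered.Sum(i => i.Weight);
        if (totalWeight <= 0m)
            throw new InvalidOperationException("No weighted contributions.");
        return ordered.Sum(i => i.Weight * i.Score) / totalWeight;
    }
}
```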
---
## 11) Offline/Air-Gap Support
### Factor Bundles
Pre-computed factor data for offline operation:
```
/data/risk-factors/
├─ epss/
│ └─ epss-2025-01-15.json.gz
├─ cvss/
│ └─ cvss-2025-01-15.json.gz
├─ kev/
│ └─ kev-2025-01-15.json
└─ manifest.json
```
### Staleness Handling
When operating offline, scores include staleness indicators:
```json
{
  "finalScore": 7.2,
  "dataFreshness": {
    "epss": { "age": "48h", "stale": false },
    "kev": { "age": "24h", "stale": false }
  }
}
```
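The `age` and `stale` fields could be derived as in the following sketch. It assumes a factor becomes stale once its age exceeds `MaxStalenessHours` (168 in the configuration in section 6); this document does not state the exact rule. `FactorFreshness` and `Freshness.Evaluate` are hypothetical names.

```csharp
using System;

// MaxStalenessHours from the configuration in section 6.
var maxStaleness = TimeSpan.FromHours(168);

// Fixed timestamps keep the example reproducible: the EPSS bundle is two days old.
var now = new DateTimeOffset(2025, 1, 17, 0, 0, 0, TimeSpan.Zero);
var epssBundleTime = new DateTimeOffset(2025, 1, 15, 0, 0, 0, TimeSpan.Zero);

var epss = Freshness.Evaluate(epssBundleTime, now, maxStaleness);
Console.WriteLine($"epss age={epss.Age.TotalHours}h stale={epss.Stale}");

public readonly record struct FactorFreshness(TimeSpan Age, bool Stale);

public static class Freshness
{
    // ASSUMPTION: "stale" means the factor is older than the configured maximum.
    public static FactorFreshness Evaluate(DateTimeOffset factorTimestamp, DateTimeOffset now, TimeSpan maxStaleness)
    {
        var age = now - factorTimestamp;
        if (age < TimeSpan.Zero)
            age = TimeSpan.Zero; // tolerate clock skew between bundle producer and consumer
        return new FactorFreshness(age, age > maxStaleness);
    }
}
```

Passing `now` in as a parameter, with no read of the system clock inside `Evaluate`, keeps the staleness decision reproducible for a given set of inputs.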
---
## Related Documentation
* Policy scoring: `../policy/architecture.md`
* Concelier feeds: `../concelier/architecture.md`
* Excititor VEX: `../excititor/architecture.md`
* Signals reachability: `../signals/architecture.md`