# component_architecture_riskengine.md - **Stella Ops RiskEngine** (2025Q4)

> Risk scoring runtime with pluggable providers and explainability.

> **Scope.** Implementation-ready architecture for **RiskEngine**: the scoring runtime that computes Risk Scoring Profiles across deployments while preserving provenance and explainability. Covers scoring workers, providers, caching, and integration with Policy Engine.

---

## 0) Mission & boundaries

**Mission.** Compute **deterministic, explainable risk scores** for vulnerabilities by aggregating signals from multiple data sources (EPSS, CVSS, KEV, VEX, reachability). Produce audit trails and explainability payloads for every scoring decision.

**Boundaries.**

* RiskEngine **does not** make PASS/FAIL decisions. It provides scores to the Policy Engine.
* RiskEngine **does not** own vulnerability data. It consumes from Concelier, Excititor, and Signals.
* Scoring is **deterministic**: same inputs produce identical scores.
* Supports **offline/air-gapped** operation via factor bundles.

---

## 1) Solution & project layout

```
src/RiskEngine/StellaOps.RiskEngine/
├─ StellaOps.RiskEngine.Core/              # Scoring orchestrators, provider contracts
│  ├─ Providers/
│  │  ├─ IRiskScoreProvider.cs             # Provider interface
│  │  ├─ EpssProvider.cs                   # EPSS score provider
│  │  ├─ CvssKevProvider.cs                # CVSS + KEV provider
│  │  ├─ VexGateProvider.cs                # VEX status provider
│  │  ├─ FixExposureProvider.cs            # Fix availability provider
│  │  └─ DefaultTransformsProvider.cs      # Score transformations
│  ├─ Contracts/
│  │  ├─ ScoreRequest.cs                   # Scoring request DTO
│  │  └─ RiskScoreResult.cs                # Scoring result with explanation
│  └─ Services/
│     ├─ RiskScoreWorker.cs                # Scoring job executor
│     └─ RiskScoreQueue.cs                 # Job queue management
│
├─ StellaOps.RiskEngine.Infrastructure/    # Persistence, caching, connectors
│  └─ Stores/
│     └─ InMemoryRiskScoreResultStore.cs
│
├─ StellaOps.RiskEngine.WebService/        # REST API for jobs and results
│  └─ Program.cs
│
├─ StellaOps.RiskEngine.Worker/            # Background scoring workers
│  ├─ Program.cs
│  └─ Worker.cs
│
└─ StellaOps.RiskEngine.Tests/             # Unit and integration tests
```

---

## 2) External dependencies

* **PostgreSQL** - Score persistence and job state
* **Concelier** - Vulnerability advisory data, EPSS scores
* **Excititor** - VEX statements
* **Signals** - Reachability and runtime signals
* **Policy Engine** - Consumes risk scores for decision-making
* **Authority** - Authentication and authorization
* **Valkey/Redis** - Score caching (optional)

---

## 3) Contracts & data model

### 3.1 ScoreRequest

```csharp
public sealed record ScoreRequest
{
    public required string VulnerabilityId { get; init; }  // CVE or vuln ID
    public required string ArtifactId { get; init; }       // PURL or component ID
    public string? TenantId { get; init; }
    public string? ContextId { get; init; }                // Scan or assessment ID
    public IReadOnlyList<string>? EnabledProviders { get; init; }
}
```

### 3.2 RiskScoreResult

```csharp
public sealed record RiskScoreResult
{
    public required string RequestId { get; init; }
    public required decimal FinalScore { get; init; }      // 0.0-10.0
    public required string Tier { get; init; }             // Critical/High/Medium/Low/Info
    public required DateTimeOffset ComputedAt { get; init; }
    public required IReadOnlyList<ProviderContribution> Contributions { get; init; }
    public required ExplainabilityPayload Explanation { get; init; }
}

public sealed record ProviderContribution
{
    public required string ProviderId { get; init; }
    public required decimal RawScore { get; init; }
    public required decimal Weight { get; init; }
    public required decimal WeightedScore { get; init; }
    public string? FactorSource { get; init; }             // Where data came from
    public DateTimeOffset? FactorTimestamp { get; init; }  // When factor was computed
}
```

### 3.3 Provider Interface

```csharp
public interface IRiskScoreProvider
{
    string ProviderId { get; }
    decimal DefaultWeight { get; }
    TimeSpan CacheTtl { get; }

    Task<ProviderResult> ComputeAsync(
        ScoreRequest request,
        CancellationToken ct);

    Task<bool> IsHealthyAsync(CancellationToken ct);
}
```

---

## 4) Score Providers

### 4.1 Built-in Providers

| Provider | Data Source | Weight | Description |
|----------|-------------|--------|-------------|
| `epss` | Concelier/EPSS | 0.25 | EPSS probability score (0-1 → 0-10) |
| `cvss-kev` | Concelier | 0.30 | CVSS base + KEV boost |
| `vex-gate` | Excititor | 0.20 | VEX status (affected/not_affected) |
| `fix-exposure` | Concelier | 0.15 | Fix availability window |
| `reachability` | Signals | 0.10 | Code path reachability |
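
The raw-score mappings in the table can be sketched as small pure functions. This is a minimal illustration, not the providers' actual code: the `RawScoreSketch` class and its method names are hypothetical, and it assumes the KEV boost is added to the CVSS base score and capped at 10, which this document does not spell out.

```csharp
using System;

// Hypothetical helpers for illustration; these names do not come from the RiskEngine codebase.
public static class RawScoreSketch
{
    // EPSS is a probability in [0, 1]; the table maps it linearly onto 0-10.
    public static decimal FromEpss(decimal probability) =>
        Math.Clamp(probability, 0m, 1m) * 10m;

    // ASSUMPTION: the KEV boost is additive and the result is capped at 10.
    public static decimal FromCvssKev(decimal cvssBase, bool isKev, decimal kevBoost = 2.0m) =>
        Math.Clamp(cvssBase + (isKev ? kevBoost : 0m), 0m, 10m);
}
```

Under these assumptions, an EPSS probability of 0.42 becomes a raw score of 4.2, and a KEV-listed CVE with a CVSS base of 9.1 saturates at 10.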

### 4.2 Score Computation

```
FinalScore = Σ(provider.weight × provider.score) / Σ(provider.weight)

Tier mapping:
  9.0-10.0 → Critical
  7.0-8.9  → High
  4.0-6.9  → Medium
  1.0-3.9  → Low
  0.0-0.9  → Info
```
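
The formula and tier bands above can be sketched in a few lines of C#. This is an illustrative sketch under stated assumptions, not the engine's implementation: `WeightedInput` and `AggregationSketch` are hypothetical names, the result is rounded to one decimal so that every score falls inside one of the listed bands, and an empty provider set is assumed to score 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; not the RiskEngine implementation.
public readonly record struct WeightedInput(string ProviderId, decimal Weight, decimal Score);

public static class AggregationSketch
{
    // FinalScore = Σ(weight × score) / Σ(weight), over the providers that contributed.
    public static decimal Aggregate(IEnumerable<WeightedInput> inputs)
    {
        // Sort by provider ID so the summation order never depends on arrival order.
        var ordered = inputs.OrderBy(i => i.ProviderId, StringComparer.Ordinal).ToList();
        decimal totalWeight = ordered.Sum(i => i.Weight);
        if (totalWeight == 0m) return 0m; // ASSUMPTION: no contributing provider scores 0.
        decimal weighted = ordered.Sum(i => i.Weight * i.Score);
        // ASSUMPTION: round to one decimal so the score lands in a listed tier band.
        return Math.Round(weighted / totalWeight, 1, MidpointRounding.AwayFromZero);
    }

    // Maps a one-decimal score onto the tier bands listed above.
    public static string ToTier(decimal score) => score switch
    {
        >= 9.0m => "Critical",
        >= 7.0m => "High",
        >= 4.0m => "Medium",
        >= 1.0m => "Low",
        _ => "Info",
    };
}
```

With the default weights and raw scores of 4.2 (`epss`), 10 (`cvss-kev`), 10 (`vex-gate`), 5.0 (`fix-exposure`), and 2.0 (`reachability`), the weighted sum is 7.00 over a total weight of 1.00, so the sketch yields 7.0, tier High. Dividing by the sum of weights keeps the score on the 0-10 scale when a provider is disabled.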

### 4.3 Provider Data Sources

```csharp
public interface IEpssSources
{
    Task<EpssScore?> GetScoreAsync(string cveId, CancellationToken ct);
}

public interface ICvssKevSources
{
    Task<CvssData?> GetCvssAsync(string cveId, CancellationToken ct);
    Task<bool> IsKevAsync(string cveId, CancellationToken ct);
}
```

---

## 5) REST API (RiskEngine.WebService)

All under `/api/v1/risk`. Auth: **OpTok**.

```
POST   /scores                   { request: ScoreRequest }      → { jobId }
GET    /scores/{jobId}                                          → { result: RiskScoreResult, status }
GET    /scores/{jobId}/explain                                  → { explanation: ExplainabilityPayload }

POST   /batch                    { requests: ScoreRequest[] }   → { batchId }
GET    /batch/{batchId}                                         → { results: RiskScoreResult[], status }

GET    /providers                                               → { providers: ProviderInfo[] }
GET    /providers/{id}/health                                   → { healthy: bool, lastCheck }

GET    /healthz | /readyz | /metrics
```
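
As an illustration of the `POST /scores` envelope, the following sketch serializes a `ScoreRequest` into a request body with `System.Text.Json`. The `ScoreRequestPayloadSketch` class is hypothetical, and the wire format is assumed to use camelCase property names with null fields omitted; the document does not specify either. Sending the body and attaching the OpTok are left out.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Mirrors the ScoreRequest contract from section 3.1.
public sealed record ScoreRequest
{
    public required string VulnerabilityId { get; init; }
    public required string ArtifactId { get; init; }
    public string? TenantId { get; init; }
    public string? ContextId { get; init; }
    public IReadOnlyList<string>? EnabledProviders { get; init; }
}

// Hypothetical helper for illustration only.
public static class ScoreRequestPayloadSketch
{
    // ASSUMPTION: camelCase property names and omitted nulls on the wire.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web)
    {
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
    };

    // Wraps the request in the { request: ... } envelope shown for POST /scores.
    public static string BuildBody(ScoreRequest request) =>
        JsonSerializer.Serialize(new { request }, Options);
}
```

A caller would pass the resulting string as the JSON body of `POST /api/v1/risk/scores` and then poll `GET /scores/{jobId}` with the returned `jobId`.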

---

## 6) Configuration (YAML)

```yaml
RiskEngine:
  Postgres:
    ConnectionString: "Host=postgres;Database=risk;..."

  Cache:
    Enabled: true
    Provider: "valkey"
    ConnectionString: "redis://valkey:6379"
    DefaultTtl: "00:15:00"

  Providers:
    Epss:
      Enabled: true
      Weight: 0.25
      CacheTtl: "01:00:00"
      Source: "concelier"

    CvssKev:
      Enabled: true
      Weight: 0.30
      KevBoost: 2.0

    VexGate:
      Enabled: true
      Weight: 0.20
      NotAffectedScore: 0.0
      AffectedScore: 10.0

    FixExposure:
      Enabled: true
      Weight: 0.15
      NoFixPenalty: 1.5

    Reachability:
      Enabled: true
      Weight: 0.10
      UnreachableDiscount: 0.5

  Worker:
    Concurrency: 4
    BatchSize: 100
    PollInterval: "00:00:05"

  Offline:
    FactorBundlePath: "/data/risk-factors"
    AllowStaleData: true
    MaxStalenessHours: 168
```

---

## 7) Security & compliance

* **AuthN/Z**: Authority-issued OpToks with `risk.score` scope
* **Tenant isolation**: Scores scoped by tenant ID
* **Audit trail**: All scoring decisions logged with inputs and factors
* **No PII**: Only vulnerability and artifact identifiers processed

---

## 8) Performance targets

* **Single score**: < 100ms P95 (cached factors)
* **Batch scoring**: < 500ms P95 for 100 items
* **Provider health check**: < 1s timeout
* **Cache hit rate**: > 80% for repeated CVEs

---

## 9) Observability

**Metrics:**

* `risk.scores.computed_total{tier,provider}`
* `risk.scores.duration_seconds`
* `risk.providers.health{provider,status}`
* `risk.cache.hits_total` / `risk.cache.misses_total`
* `risk.batch.size_histogram`
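
One way to emit the first two metrics is through `System.Diagnostics.Metrics`. The sketch below is illustrative only: the `RiskMetricsSketch` class and the meter name `StellaOps.RiskEngine` are assumptions, while the instrument names and the `tier` and `provider` labels are taken from the list above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical wrapper for illustration; the meter name is an assumption.
public sealed class RiskMetricsSketch : IDisposable
{
    public const string MeterName = "StellaOps.RiskEngine"; // assumed meter name

    private readonly Meter _meter = new(MeterName);
    private readonly Counter<long> _computed;
    private readonly Histogram<double> _duration;

    public RiskMetricsSketch()
    {
        _computed = _meter.CreateCounter<long>("risk.scores.computed_total");
        _duration = _meter.CreateHistogram<double>("risk.scores.duration_seconds", unit: "s");
    }

    // Records one computed score, tagged with the tier and provider labels.
    public void RecordScore(string tier, string provider, double seconds)
    {
        _computed.Add(1,
            new KeyValuePair<string, object?>("tier", tier),
            new KeyValuePair<string, object?>("provider", provider));
        _duration.Record(seconds);
    }

    public void Dispose() => _meter.Dispose();
}
```

An exporter or a `MeterListener` subscribed to the assumed meter name would then observe one counter increment and one duration sample per `RecordScore` call.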

**Tracing:** Spans for each provider contribution, cache operations, and aggregation.

**Logs:** Structured logs with `cve_id`, `artifact_id`, `tenant`, `final_score`.

---

## 10) Testing matrix

* **Provider tests**: Each provider returns expected scores for fixture data
* **Aggregation tests**: Weighted combination produces correct final score
* **Determinism tests**: Same inputs produce identical scores
* **Cache tests**: Cache hit/miss behavior correct
* **Offline tests**: Factor bundles load and score correctly
* **Integration tests**: Full scoring pipeline with mocked data sources

---

## 11) Offline/Air-Gap Support

### Factor Bundles

Pre-computed factor data for offline operation:

```
/data/risk-factors/
├─ epss/
│  └─ epss-2025-01-15.json.gz
├─ cvss/
│  └─ cvss-2025-01-15.json.gz
├─ kev/
│  └─ kev-2025-01-15.json
└─ manifest.json
```

### Staleness Handling

When operating offline, scores include staleness indicators:

```json
{
  "finalScore": 7.2,
  "dataFreshness": {
    "epss": { "age": "48h", "stale": false },
    "kev": { "age": "24h", "stale": false }
  }
}
```
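
A freshness entry like the ones above could be derived from a factor's timestamp and the `MaxStalenessHours` setting. The sketch below is a guess at that logic, not the documented behavior: `Freshness` and `StalenessSketch` are hypothetical names, and it assumes a factor counts as stale only once its age strictly exceeds the configured limit. The current time is passed in explicitly so the result stays deterministic for a given input.

```csharp
using System;

// Hypothetical types for illustration; the real freshness model is not shown in this document.
public readonly record struct Freshness(string Age, bool Stale);

public static class StalenessSketch
{
    // ASSUMPTION: a factor is stale once its age exceeds maxStalenessHours (168 in the sample config).
    public static Freshness Evaluate(
        DateTimeOffset factorTimestamp,
        DateTimeOffset now,
        int maxStalenessHours = 168)
    {
        TimeSpan age = now - factorTimestamp;
        if (age < TimeSpan.Zero) age = TimeSpan.Zero; // clock skew: treat future timestamps as age zero
        long wholeHours = age.Ticks / TimeSpan.TicksPerHour; // truncate to whole hours
        return new Freshness($"{wholeHours}h", age > TimeSpan.FromHours(maxStalenessHours));
    }
}
```

For a factor computed 48 hours before `now`, this returns an age of `48h` with `Stale` set to false, matching the shape of the `epss` entry in the JSON above.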

---

## Related Documentation

* Policy scoring: `../policy/architecture.md`
* Concelier feeds: `../concelier/architecture.md`
* Excititor VEX: `../excititor/architecture.md`
* Signals reachability: `../signals/architecture.md`