component_architecture_riskengine.md - Stella Ops RiskEngine (2025Q4)
Risk scoring runtime with pluggable providers and explainability.
Scope. Implementation-ready architecture for RiskEngine: the scoring runtime that computes Risk Scoring Profiles across deployments while preserving provenance and explainability. Covers scoring workers, providers, caching, and integration with Policy Engine.
0) Mission & boundaries
Mission. Compute deterministic, explainable risk scores for vulnerabilities by aggregating signals from multiple data sources (EPSS, CVSS, KEV, VEX, reachability). Produce audit trails and explainability payloads for every scoring decision.
Boundaries.
- RiskEngine does not make PASS/FAIL decisions. It provides scores to the Policy Engine.
- RiskEngine does not own vulnerability data. It consumes from Concelier, Excititor, and Signals.
- Scoring is deterministic: same inputs produce identical scores.
- Supports offline/air-gapped operation via factor bundles.
1) Solution & project layout
src/RiskEngine/StellaOps.RiskEngine/
├─ StellaOps.RiskEngine.Core/ # Scoring orchestrators, provider contracts
│ ├─ Providers/
│ │ ├─ IRiskScoreProvider.cs # Provider interface
│ │ ├─ EpssProvider.cs # EPSS score provider
│ │ ├─ CvssKevProvider.cs # CVSS + KEV provider
│ │ ├─ VexGateProvider.cs # VEX status provider
│ │ ├─ FixExposureProvider.cs # Fix availability provider
│ │ └─ DefaultTransformsProvider.cs # Score transformations
│ ├─ Contracts/
│ │ ├─ ScoreRequest.cs # Scoring request DTO
│ │ └─ RiskScoreResult.cs # Scoring result with explanation
│ └─ Services/
│ ├─ RiskScoreWorker.cs # Scoring job executor
│ └─ RiskScoreQueue.cs # Job queue management
│
├─ StellaOps.RiskEngine.Infrastructure/ # Persistence, caching, connectors
│ └─ Stores/
│ └─ InMemoryRiskScoreResultStore.cs
│
├─ StellaOps.RiskEngine.WebService/ # REST API for jobs and results
│ └─ Program.cs
│
├─ StellaOps.RiskEngine.Worker/ # Background scoring workers
│ ├─ Program.cs
│ └─ Worker.cs
│
└─ StellaOps.RiskEngine.Tests/ # Unit and integration tests
2) External dependencies
- PostgreSQL - Score persistence and job state
- Concelier - Vulnerability advisory data, EPSS scores
- Excititor - VEX statements
- Signals - Reachability and runtime signals
- Policy Engine - Consumes risk scores for decision-making
- Authority - Authentication and authorization
- Valkey/Redis - Score caching (optional)
3) Contracts & data model
3.1 ScoreRequest
public sealed record ScoreRequest
{
    public required string VulnerabilityId { get; init; }          // CVE or vuln ID
    public required string ArtifactId { get; init; }               // PURL or component ID
    public string? TenantId { get; init; }
    public string? ContextId { get; init; }                        // Scan or assessment ID
    public IReadOnlyList<string>? EnabledProviders { get; init; }
}
3.2 RiskScoreResult
public sealed record RiskScoreResult
{
    public required string RequestId { get; init; }
    public required decimal FinalScore { get; init; }              // 0.0-10.0
    public required string Tier { get; init; }                     // Critical/High/Medium/Low/Info
    public required DateTimeOffset ComputedAt { get; init; }
    public required IReadOnlyList<ProviderContribution> Contributions { get; init; }
    public required ExplainabilityPayload Explanation { get; init; }
}

public sealed record ProviderContribution
{
    public required string ProviderId { get; init; }
    public required decimal RawScore { get; init; }
    public required decimal Weight { get; init; }
    public required decimal WeightedScore { get; init; }
    public string? FactorSource { get; init; }                     // Where data came from
    public DateTimeOffset? FactorTimestamp { get; init; }          // When factor was computed
}
3.3 Provider Interface
public interface IRiskScoreProvider
{
    string ProviderId { get; }
    decimal DefaultWeight { get; }
    TimeSpan CacheTtl { get; }

    Task<ProviderResult> ComputeAsync(
        ScoreRequest request,
        CancellationToken ct);

    Task<bool> IsHealthyAsync(CancellationToken ct);
}
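As an illustration of the contract above, the following is a minimal sketch of a provider implementation. `FixedScoreProvider` and the shape of `ProviderResult` are hypothetical (this document does not define `ProviderResult`), and `ScoreRequest` is trimmed to its required members so the sketch compiles on its own.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Usage: run the hypothetical provider against a single request.
IRiskScoreProvider provider = new FixedScoreProvider("demo-fixed", 5.0m);
ProviderResult result = await provider.ComputeAsync(
    new ScoreRequest
    {
        VulnerabilityId = "CVE-2024-0001",
        ArtifactId = "pkg:npm/example@1.0.0",
    },
    CancellationToken.None);
bool healthy = await provider.IsHealthyAsync(CancellationToken.None);
Console.WriteLine($"{provider.ProviderId}: score={result.Score}, healthy={healthy}");

// Trimmed copy of the ScoreRequest contract (required members only).
public sealed record ScoreRequest
{
    public required string VulnerabilityId { get; init; }
    public required string ArtifactId { get; init; }
}

// Hypothetical shape: ProviderResult is not defined in this document.
public sealed record ProviderResult(decimal Score, string? FactorSource);

// Copy of the provider interface from this section.
public interface IRiskScoreProvider
{
    string ProviderId { get; }
    decimal DefaultWeight { get; }
    TimeSpan CacheTtl { get; }
    Task<ProviderResult> ComputeAsync(ScoreRequest request, CancellationToken ct);
    Task<bool> IsHealthyAsync(CancellationToken ct);
}

// Hypothetical provider that returns a constant score.
// A real provider would query a data source such as Concelier or Excititor.
public sealed class FixedScoreProvider : IRiskScoreProvider
{
    private readonly decimal _score;

    public FixedScoreProvider(string providerId, decimal score)
    {
        ProviderId = providerId;
        _score = Math.Clamp(score, 0m, 10m); // keep within the 0.0-10.0 score range
    }

    public string ProviderId { get; }
    public decimal DefaultWeight => 0.10m;                 // illustrative value
    public TimeSpan CacheTtl => TimeSpan.FromMinutes(15);  // illustrative value

    public Task<ProviderResult> ComputeAsync(ScoreRequest request, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        return Task.FromResult(new ProviderResult(_score, FactorSource: "fixture"));
    }

    public Task<bool> IsHealthyAsync(CancellationToken ct) => Task.FromResult(true);
}
```

Because the provider holds no mutable state and reads no clock, repeated calls with the same request return the same score, which is the property the determinism boundary asks of every provider.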
4) Score Providers
4.1 Built-in Providers
| Provider | Data Source | Weight | Description |
|---|---|---|---|
| epss | Concelier/EPSS | 0.25 | EPSS probability score (0-1 → 0-10) |
| cvss-kev | Concelier | 0.30 | CVSS base + KEV boost |
| vex-gate | Excititor | 0.20 | VEX status (affected/not_affected) |
| fix-exposure | Concelier | 0.15 | Fix availability window |
| reachability | Signals | 0.10 | Code path reachability |
4.2 Score Computation
FinalScore = Σ(provider.weight × provider.score) / Σ(provider.weight)
Tier mapping:
9.0-10.0 → Critical
7.0-8.9 → High
4.0-6.9 → Medium
1.0-3.9 → Low
0.0-0.9 → Info
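The formula and tier mapping above can be sketched as two small functions. This is an illustrative sketch, not the RiskEngine implementation: the names `Aggregate` and `ToTier` are hypothetical, the raw scores are made-up sample values, and rounding to one decimal place before the tier lookup is an assumption made so that a value such as 8.95 does not fall between two bands.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Weighted mean over provider contributions: Σ(weight × score) / Σ(weight).
static decimal Aggregate(IReadOnlyList<(string ProviderId, decimal Weight, decimal Score)> contributions)
{
    decimal totalWeight = contributions.Sum(c => c.Weight);
    if (totalWeight == 0m)
    {
        throw new InvalidOperationException("At least one provider must carry a non-zero weight.");
    }
    return contributions.Sum(c => c.Weight * c.Score) / totalWeight;
}

// Assumption: the score is rounded to one decimal place before the tier lookup.
static string ToTier(decimal score)
{
    decimal rounded = Math.Round(score, 1, MidpointRounding.AwayFromZero);
    return rounded switch
    {
        >= 9.0m => "Critical",
        >= 7.0m => "High",
        >= 4.0m => "Medium",
        >= 1.0m => "Low",
        _ => "Info",
    };
}

// Default weights from the built-in provider table; raw scores are sample values.
var contributions = new List<(string ProviderId, decimal Weight, decimal Score)>
{
    ("epss", 0.25m, 6.0m),
    ("cvss-kev", 0.30m, 9.0m),
    ("vex-gate", 0.20m, 10.0m),
    ("fix-exposure", 0.15m, 4.0m),
    ("reachability", 0.10m, 5.0m),
};

decimal finalScore = Aggregate(contributions);
string tier = ToTier(finalScore);
Console.WriteLine($"{finalScore} -> {tier}");
```

With these sample inputs the weighted sum is 1.5 + 2.7 + 2.0 + 0.6 + 0.5 = 7.3 and the weights sum to 1.0, so the final score is 7.3, which maps to High. Dividing by the sum of weights also means that disabling a provider renormalizes the remaining weights instead of lowering every score.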
4.3 Provider Data Sources
public interface IEpssSources
{
    Task<EpssScore?> GetScoreAsync(string cveId, CancellationToken ct);
}

public interface ICvssKevSources
{
    Task<CvssData?> GetCvssAsync(string cveId, CancellationToken ct);
    Task<bool> IsKevAsync(string cveId, CancellationToken ct);
}
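Once a provider has fetched its factors through these source interfaces, it still has to map them onto the 0-10 scale. The sketch below shows one plausible pair of transforms. Both function names are hypothetical, the linear EPSS mapping follows the "0-1 → 0-10" note in the provider table, and treating the KEV boost as an additive bump capped at 10.0 is an assumption, since this document does not say how the boost is applied.

```csharp
using System;

// Maps an EPSS probability (0-1) linearly onto the 0-10 score scale.
static decimal EpssToScore(decimal probability) =>
    Math.Clamp(probability, 0m, 1m) * 10m;

// Assumption: KEV membership adds kevBoost to the CVSS base score, capped at 10.0.
// The default of 2.0 mirrors the KevBoost value in the sample configuration.
static decimal CvssKevScore(decimal cvssBase, bool isKev, decimal kevBoost = 2.0m) =>
    Math.Clamp(isKev ? cvssBase + kevBoost : cvssBase, 0m, 10m);

decimal epssScore = EpssToScore(0.42m);          // 0.42 probability -> 4.2
decimal boosted = CvssKevScore(7.5m, isKev: true);   // 7.5 + 2.0 -> 9.5
decimal capped = CvssKevScore(9.1m, isKev: true);    // 11.1 capped to 10
decimal plain = CvssKevScore(7.5m, isKev: false);    // unchanged, 7.5

Console.WriteLine($"{epssScore} {boosted} {capped} {plain}");
```

Keeping the transforms as pure functions of their inputs makes them straightforward to cover with the provider and determinism tests listed in the testing matrix.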
5) REST API (RiskEngine.WebService)
All under /api/v1/risk. Auth: OpTok.
POST /scores { request: ScoreRequest } → { jobId }
GET /scores/{jobId} → { result: RiskScoreResult, status }
GET /scores/{jobId}/explain → { explanation: ExplainabilityPayload }
POST /batch { requests: ScoreRequest[] } → { batchId }
GET /batch/{batchId} → { results: RiskScoreResult[], status }
GET /providers → { providers: ProviderInfo[] }
GET /providers/{id}/health → { healthy: bool, lastCheck }
GET /healthz | /readyz | /metrics
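A client call to POST /scores might be assembled as in the sketch below. Several details are assumptions not stated in this document: the host name is a placeholder, the camelCase JSON property names are inferred from the staleness payload in section 11, and passing the OpTok as a bearer token in the Authorization header is a guess at the transport. The request is built but not sent, so the sketch runs without a server.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Placeholder token; a real OpTok would be issued by Authority.
string opTok = "example-token";

// Body shape follows "{ request: ScoreRequest }"; property casing is an assumption.
string body = JsonSerializer.Serialize(new
{
    request = new
    {
        vulnerabilityId = "CVE-2024-0001",
        artifactId = "pkg:npm/example@1.0.0",
        tenantId = "tenant-a",
    },
});

using var message = new HttpRequestMessage(
    HttpMethod.Post, "https://riskengine.example/api/v1/risk/scores")
{
    Content = new StringContent(body, Encoding.UTF8, "application/json"),
};
message.Headers.Authorization = new AuthenticationHeaderValue("Bearer", opTok);

// Sending is left out on purpose. With a reachable service it would look like:
//   using var client = new HttpClient();
//   using var response = await client.SendAsync(message);
Console.WriteLine($"{message.Method} {message.RequestUri}");
Console.WriteLine(body);
```

The response carries a jobId, which the caller then uses with GET /scores/{jobId} to read the result and its status.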
6) Configuration (YAML)
RiskEngine:
  Postgres:
    ConnectionString: "Host=postgres;Database=risk;..."
  Cache:
    Enabled: true
    Provider: "valkey"
    ConnectionString: "redis://valkey:6379"
    DefaultTtl: "00:15:00"
  Providers:
    Epss:
      Enabled: true
      Weight: 0.25
      CacheTtl: "01:00:00"
      Source: "concelier"
    CvssKev:
      Enabled: true
      Weight: 0.30
      KevBoost: 2.0
    VexGate:
      Enabled: true
      Weight: 0.20
      NotAffectedScore: 0.0
      AffectedScore: 10.0
    FixExposure:
      Enabled: true
      Weight: 0.15
      NoFixPenalty: 1.5
    Reachability:
      Enabled: true
      Weight: 0.10
      UnreachableDiscount: 0.5
  Worker:
    Concurrency: 4
    BatchSize: 100
    PollInterval: "00:00:05"
  Offline:
    FactorBundlePath: "/data/risk-factors"
    AllowStaleData: true
    MaxStalenessHours: 168
7) Security & compliance
- AuthN/Z: Authority-issued OpToks with risk.score scope
- Tenant isolation: Scores scoped by tenant ID
- Audit trail: All scoring decisions logged with inputs and factors
- No PII: Only vulnerability and artifact identifiers processed
8) Performance targets
- Single score: < 100ms P95 (cached factors)
- Batch scoring: < 500ms P95 for 100 items
- Provider health check: < 1s timeout
- Cache hit rate: > 80% for repeated CVEs
9) Observability
Metrics:
- risk.scores.computed_total{tier,provider}
- risk.scores.duration_seconds
- risk.providers.health{provider,status}
- risk.cache.hits_total / risk.cache.misses_total
- risk.batch.size_histogram
Tracing: Spans for each provider contribution, cache operations, and aggregation.
Logs: Structured logs with cve_id, artifact_id, tenant, final_score.
10) Testing matrix
- Provider tests: Each provider returns expected scores for fixture data
- Aggregation tests: Weighted combination produces correct final score
- Determinism tests: Same inputs produce identical scores
- Cache tests: Cache hit/miss behavior correct
- Offline tests: Factor bundles load and score correctly
- Integration tests: Full scoring pipeline with mocked data sources
11) Offline/Air-Gap Support
Factor Bundles
Pre-computed factor data for offline operation:
/data/risk-factors/
├─ epss/
│ └─ epss-2025-01-15.json.gz
├─ cvss/
│ └─ cvss-2025-01-15.json.gz
├─ kev/
│ └─ kev-2025-01-15.json
└─ manifest.json
Staleness Handling
When operating offline, scores include staleness indicators:
{
  "finalScore": 7.2,
  "dataFreshness": {
    "epss": { "age": "48h", "stale": false },
    "kev": { "age": "24h", "stale": false }
  }
}
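The freshness entries in this payload could be derived as in the following sketch. The function name `Freshness` is hypothetical, and the rule that a factor becomes stale once its age exceeds Offline.MaxStalenessHours (168 in the sample configuration) is an assumption; this document does not spell out the threshold logic.

```csharp
using System;

// Assumption: a factor is stale once its age exceeds maxStalenessHours.
// Age is reported in whole hours, matching the "48h" style in the payload above.
static (string Age, bool Stale) Freshness(
    DateTimeOffset factorTimestamp, DateTimeOffset asOf, int maxStalenessHours)
{
    TimeSpan age = asOf - factorTimestamp;
    long wholeHours = (long)Math.Floor(age.TotalHours);
    return ($"{wholeHours}h", age > TimeSpan.FromHours(maxStalenessHours));
}

// Fixed timestamps keep the example deterministic.
var scoredAt = new DateTimeOffset(2025, 1, 17, 0, 0, 0, TimeSpan.Zero);
var epss = Freshness(new DateTimeOffset(2025, 1, 15, 0, 0, 0, TimeSpan.Zero), scoredAt, 168);
var outdated = Freshness(new DateTimeOffset(2025, 1, 9, 0, 0, 0, TimeSpan.Zero), scoredAt, 168);

Console.WriteLine($"epss: age={epss.Age}, stale={epss.Stale}");
Console.WriteLine($"outdated: age={outdated.Age}, stale={outdated.Stale}");
```

A bundle dated two days before scoring yields an age of 48h and is not stale, while one dated eight days before (192h) crosses the 168-hour limit. Passing the scoring time in as a parameter, instead of reading the system clock inside the function, keeps the result reproducible for the same inputs.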
Related Documentation
- Policy scoring: ../policy/architecture.md
- Concelier feeds: ../concelier/architecture.md
- Excititor VEX: ../excititor/architecture.md
- Signals reachability: ../signals/architecture.md