# component_architecture_riskengine.md - **Stella Ops RiskEngine** (2025Q4)

> Risk scoring runtime with pluggable providers and explainability.

> **Scope.** Implementation-ready architecture for **RiskEngine**: the scoring runtime that computes Risk Scoring Profiles across deployments while preserving provenance and explainability. Covers scoring workers, providers, caching, and integration with Policy Engine.

---

## 0) Mission & boundaries

**Mission.** Compute **deterministic, explainable risk scores** for vulnerabilities by aggregating signals from multiple data sources (EPSS, CVSS, KEV, VEX, reachability). Produce audit trails and explainability payloads for every scoring decision.

**Boundaries.**

* RiskEngine **does not** make PASS/FAIL decisions. It provides scores to the Policy Engine.
* RiskEngine **does not** own vulnerability data. It consumes from Concelier, Excititor, and Signals.
* Scoring is **deterministic**: same inputs produce identical scores.
* Supports **offline/air-gapped** operation via factor bundles.

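
To make the determinism boundary concrete, the following is a minimal sketch of an order-independent aggregation step. It assumes the weighted-average formula given in section 4.2. The `Contribution` record and `DeterministicAggregator` class are hypothetical names used only for illustration, and rounding to two decimal places is an assumption rather than documented RiskEngine behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration type; not the RiskEngine ProviderContribution contract.
public sealed record Contribution(string ProviderId, decimal RawScore, decimal Weight);

public static class DeterministicAggregator
{
    // Weighted average: sum(weight * score) / sum(weight).
    // Sorting by provider ID with an ordinal comparer fixes the evaluation order,
    // and decimal arithmetic avoids binary floating-point drift across platforms.
    public static decimal Aggregate(IEnumerable<Contribution> contributions)
    {
        var ordered = contributions
            .OrderBy(c => c.ProviderId, StringComparer.Ordinal)
            .ToList();

        decimal weightSum = ordered.Sum(c => c.Weight);
        if (weightSum == 0m)
        {
            return 0m; // Assumption: no weighted providers yields a zero score.
        }

        decimal weighted = ordered.Sum(c => c.Weight * c.RawScore);

        // Assumption: scores are rounded to two decimal places.
        return Math.Round(weighted / weightSum, 2, MidpointRounding.AwayFromZero);
    }
}
```

With this shape, calling `Aggregate` on the same contributions in any input order returns the same value, which is the property the determinism tests in section 10 would check.
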

---

## 1) Solution & project layout

```
src/RiskEngine/StellaOps.RiskEngine/
 ├─ StellaOps.RiskEngine.Core/              # Scoring orchestrators, provider contracts
 │   ├─ Providers/
 │   │   ├─ IRiskScoreProvider.cs           # Provider interface
 │   │   ├─ EpssProvider.cs                 # EPSS score provider
 │   │   ├─ CvssKevProvider.cs              # CVSS + KEV provider
 │   │   ├─ VexGateProvider.cs              # VEX status provider
 │   │   ├─ FixExposureProvider.cs          # Fix availability provider
 │   │   └─ DefaultTransformsProvider.cs    # Score transformations
 │   ├─ Contracts/
 │   │   ├─ ScoreRequest.cs                 # Scoring request DTO
 │   │   └─ RiskScoreResult.cs              # Scoring result with explanation
 │   └─ Services/
 │       ├─ RiskScoreWorker.cs              # Scoring job executor
 │       └─ RiskScoreQueue.cs               # Job queue management
 │
 ├─ StellaOps.RiskEngine.Infrastructure/    # Persistence, caching, connectors
 │   └─ Stores/
 │       └─ InMemoryRiskScoreResultStore.cs
 │
 ├─ StellaOps.RiskEngine.WebService/        # REST API for jobs and results
 │   └─ Program.cs
 │
 ├─ StellaOps.RiskEngine.Worker/            # Background scoring workers
 │   ├─ Program.cs
 │   └─ Worker.cs
 │
 └─ StellaOps.RiskEngine.Tests/             # Unit and integration tests
```

---

## 2) External dependencies

* **PostgreSQL** - Score persistence and job state
* **Concelier** - Vulnerability advisory data, EPSS scores
* **Excititor** - VEX statements
* **Signals** - Reachability and runtime signals
* **Policy Engine** - Consumes risk scores for decision-making
* **Authority** - Authentication and authorization
* **Valkey/Redis** - Score caching (optional)

---

## 3) Contracts & data model

### 3.1 ScoreRequest

```csharp
public sealed record ScoreRequest
{
    public required string VulnerabilityId { get; init; }  // CVE or vuln ID
    public required string ArtifactId { get; init; }       // PURL or component ID
    public string? TenantId { get; init; }
    public string? ContextId { get; init; }                // Scan or assessment ID
    public IReadOnlyList<string>? EnabledProviders { get; init; }
}
```

### 3.2 RiskScoreResult

```csharp
public sealed record RiskScoreResult
{
    public required string RequestId { get; init; }
    public required decimal FinalScore { get; init; }      // 0.0-10.0
    public required string Tier { get; init; }             // Critical/High/Medium/Low/Info
    public required DateTimeOffset ComputedAt { get; init; }
    public required IReadOnlyList<ProviderContribution> Contributions { get; init; }
    public required ExplainabilityPayload Explanation { get; init; }
}

public sealed record ProviderContribution
{
    public required string ProviderId { get; init; }
    public required decimal RawScore { get; init; }
    public required decimal Weight { get; init; }
    public required decimal WeightedScore { get; init; }
    public string? FactorSource { get; init; }             // Where data came from
    public DateTimeOffset? FactorTimestamp { get; init; }  // When factor was computed
}
```

### 3.3 Provider Interface

```csharp
public interface IRiskScoreProvider
{
    string ProviderId { get; }
    decimal DefaultWeight { get; }
    TimeSpan CacheTtl { get; }

    Task ComputeAsync(
        ScoreRequest request,
        CancellationToken ct);

    Task<bool> IsHealthyAsync(CancellationToken ct);
}
```

---

## 4) Score Providers

### 4.1 Built-in Providers

| Provider | Data Source | Weight | Description |
|----------|-------------|--------|-------------|
| `epss` | Concelier/EPSS | 0.25 | EPSS probability score (0-1 → 0-10) |
| `cvss-kev` | Concelier | 0.30 | CVSS base + KEV boost |
| `vex-gate` | Excititor | 0.20 | VEX status (affected/not_affected) |
| `fix-exposure` | Concelier | 0.15 | Fix availability window |
| `reachability` | Signals | 0.10 | Code path reachability |

### 4.2 Score Computation

```
FinalScore = Σ(provider.weight × provider.score) / Σ(provider.weight)

Tier mapping:
  9.0-10.0 → Critical
  7.0-8.9  → High
  4.0-6.9  → Medium
  1.0-3.9  → Low
  0.0-0.9  → Info
```

### 4.3 Provider Data Sources

```csharp
public interface IEpssSources
{
    Task GetScoreAsync(string cveId, CancellationToken ct);
}

public interface ICvssKevSources
{
    Task GetCvssAsync(string cveId, CancellationToken ct);
    Task<bool> IsKevAsync(string cveId, CancellationToken ct);
}
```

---

## 5) REST API (RiskEngine.WebService)

All under `/api/v1/risk`. Auth: **OpTok**.

```
POST /scores                  { request: ScoreRequest }    → { jobId }
GET  /scores/{jobId}          → { result: RiskScoreResult, status }
GET  /scores/{jobId}/explain  → { explanation: ExplainabilityPayload }

POST /batch                   { requests: ScoreRequest[] } → { batchId }
GET  /batch/{batchId}         → { results: RiskScoreResult[], status }

GET  /providers               → { providers: ProviderInfo[] }
GET  /providers/{id}/health   → { healthy: bool, lastCheck }

GET  /healthz | /readyz | /metrics
```

---

## 6) Configuration (YAML)

```yaml
RiskEngine:
  Postgres:
    ConnectionString: "Host=postgres;Database=risk;..."

  Cache:
    Enabled: true
    Provider: "valkey"
    ConnectionString: "redis://valkey:6379"
    DefaultTtl: "00:15:00"

  Providers:
    Epss:
      Enabled: true
      Weight: 0.25
      CacheTtl: "01:00:00"
      Source: "concelier"
    CvssKev:
      Enabled: true
      Weight: 0.30
      KevBoost: 2.0
    VexGate:
      Enabled: true
      Weight: 0.20
      NotAffectedScore: 0.0
      AffectedScore: 10.0
    FixExposure:
      Enabled: true
      Weight: 0.15
      NoFixPenalty: 1.5
    Reachability:
      Enabled: true
      Weight: 0.10
      UnreachableDiscount: 0.5

  Worker:
    Concurrency: 4
    BatchSize: 100
    PollInterval: "00:00:05"

  Offline:
    FactorBundlePath: "/data/risk-factors"
    AllowStaleData: true
    MaxStalenessHours: 168
```

---

## 7) Security & compliance

* **AuthN/Z**: Authority-issued OpToks with `risk.score` scope
* **Tenant isolation**: Scores scoped by tenant ID
* **Audit trail**: All scoring decisions logged with inputs and factors
* **No PII**: Only vulnerability and artifact identifiers processed

---

## 8) Performance targets

* **Single score**: < 100ms P95 (cached factors)
* **Batch scoring**: < 500ms P95 for 100 items
* **Provider health check**: < 1s timeout
* **Cache hit rate**: > 80% for repeated CVEs

---

## 9) Observability

**Metrics:**

* `risk.scores.computed_total{tier,provider}`
* `risk.scores.duration_seconds`
* `risk.providers.health{provider,status}`
* `risk.cache.hits_total` / `risk.cache.misses_total`
* `risk.batch.size_histogram`

**Tracing:** Spans for each provider contribution, cache operations, and aggregation.

**Logs:** Structured logs with `cve_id`, `artifact_id`, `tenant`, `final_score`.

---

## 10) Testing matrix

* **Provider tests**: Each provider returns expected scores for fixture data
* **Aggregation tests**: Weighted combination produces the correct final score
* **Determinism tests**: Same inputs produce identical scores
* **Cache tests**: Cache hit/miss behavior is correct
* **Offline tests**: Factor bundles load and score correctly
* **Integration tests**: Full scoring pipeline with mocked data sources

---

## 11) Offline/Air-Gap Support

### Factor Bundles

Pre-computed factor data for offline operation:

```
/data/risk-factors/
 ├─ epss/
 │   └─ epss-2025-01-15.json.gz
 ├─ cvss/
 │   └─ cvss-2025-01-15.json.gz
 ├─ kev/
 │   └─ kev-2025-01-15.json
 └─ manifest.json
```

### Staleness Handling

When operating offline, scores include staleness indicators:

```json
{
  "finalScore": 7.2,
  "dataFreshness": {
    "epss": { "age": "48h", "stale": false },
    "kev": { "age": "24h", "stale": false }
  }
}
```

---

## Related Documentation

* Policy scoring: `../policy/architecture.md`
* Concelier feeds: `../concelier/architecture.md`
* Excititor VEX: `../excititor/architecture.md`
* Signals reachability: `../signals/architecture.md`
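
As an illustration of the staleness indicators described in section 11, the sketch below derives an `age`/`stale` pair from a factor timestamp and the `MaxStalenessHours` setting from section 6. The `FactorFreshness` record and `StalenessEvaluator` class are hypothetical names, and both the whole-hour age format and the clamping of negative ages are assumptions rather than documented behavior.

```csharp
using System;

// Hypothetical shape mirroring one "dataFreshness" entry from section 11.
public sealed record FactorFreshness(string Age, bool Stale);

public static class StalenessEvaluator
{
    // Compares a factor's age against a MaxStalenessHours-style limit
    // (168 in the sample configuration).
    public static FactorFreshness Evaluate(
        DateTimeOffset factorTimestamp,
        DateTimeOffset now,
        int maxStalenessHours)
    {
        TimeSpan age = now - factorTimestamp;
        if (age < TimeSpan.Zero)
        {
            age = TimeSpan.Zero; // Assumption: clock skew is clamped rather than rejected.
        }

        // Assumption: age is reported in whole hours, rounded down.
        long wholeHours = (long)Math.Floor(age.TotalHours);

        // A factor is stale only once its age exceeds the limit.
        bool stale = age > TimeSpan.FromHours(maxStalenessHours);
        return new FactorFreshness($"{wholeHours}h", stale);
    }
}
```

Taking `now` as a parameter instead of reading the system clock keeps the evaluation reproducible for a given input, in line with the determinism goal in section 0.
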