# component_architecture_riskengine.md - **Stella Ops RiskEngine** (2025Q4)

> Risk scoring runtime with pluggable providers and explainability.

> **Scope.** Implementation-ready architecture for **RiskEngine**: the scoring runtime that computes Risk Scoring Profiles across deployments while preserving provenance and explainability. Covers scoring workers, providers, caching, and integration with Policy Engine.

---

## 0) Mission & boundaries

**Mission.** Compute **deterministic, explainable risk scores** for vulnerabilities by aggregating signals from multiple data sources (EPSS, CVSS, KEV, VEX, reachability). Produce audit trails and explainability payloads for every scoring decision.

**Boundaries.**

* RiskEngine **does not** make PASS/FAIL decisions. It provides scores to the Policy Engine.
* RiskEngine **does not** own vulnerability data. It consumes from Concelier, Excititor, and Signals.
* Scoring is **deterministic**: the same inputs (request, factor data, and provider weights) always produce the same score.
* Supports **offline/air-gapped** operation via factor bundles.

---

## 1) Solution & project layout

```
src/RiskEngine/StellaOps.RiskEngine/
 ├─ StellaOps.RiskEngine.Core/           # Scoring orchestrators, provider contracts
 │   ├─ Providers/
 │   │   ├─ IRiskScoreProvider.cs        # Provider interface
 │   │   ├─ EpssProvider.cs              # EPSS score provider
 │   │   ├─ CvssKevProvider.cs           # CVSS + KEV provider
 │   │   ├─ VexGateProvider.cs           # VEX status provider
 │   │   ├─ FixExposureProvider.cs       # Fix availability provider
 │   │   └─ DefaultTransformsProvider.cs # Score transformations
 │   ├─ Contracts/
 │   │   ├─ ScoreRequest.cs              # Scoring request DTO
 │   │   └─ RiskScoreResult.cs           # Scoring result with explanation
 │   └─ Services/
 │       ├─ RiskScoreWorker.cs           # Scoring job executor
 │       └─ RiskScoreQueue.cs            # Job queue management
 │
 ├─ StellaOps.RiskEngine.Infrastructure/ # Persistence, caching, connectors
 │   └─ Stores/
 │       └─ InMemoryRiskScoreResultStore.cs
 │
 ├─ StellaOps.RiskEngine.WebService/     # REST API for jobs and results
 │   └─ Program.cs
 │
 ├─ StellaOps.RiskEngine.Worker/         # Background scoring workers
 │   ├─ Program.cs
 │   └─ Worker.cs
 │
 └─ StellaOps.RiskEngine.Tests/          # Unit and integration tests
```

---

## 2) External dependencies

* **PostgreSQL** - Score persistence and job state
* **Concelier** - Vulnerability advisory data, EPSS scores
* **Excititor** - VEX statements
* **Signals** - Reachability and runtime signals
* **Policy Engine** - Consumes risk scores for decision-making
* **Authority** - Authentication and authorization
* **Valkey/Redis** - Score caching (optional)

---

## 3) Contracts & data model

### 3.1 ScoreRequest

```csharp
public sealed record ScoreRequest
{
    public required string VulnerabilityId { get; init; }   // CVE or vuln ID
    public required string ArtifactId { get; init; }        // PURL or component ID
    public string? TenantId { get; init; }
    public string? ContextId { get; init; }                 // Scan or assessment ID
    public IReadOnlyList<string>? EnabledProviders { get; init; }
}
```

### 3.2 RiskScoreResult

```csharp
public sealed record RiskScoreResult
{
    public required string RequestId { get; init; }
    public required decimal FinalScore { get; init; }       // 0.0-10.0
    public required string Tier { get; init; }              // Critical/High/Medium/Low/Info
    public required DateTimeOffset ComputedAt { get; init; }
    public required IReadOnlyList<ProviderContribution> Contributions { get; init; }
    public required ExplainabilityPayload Explanation { get; init; }
}

public sealed record ProviderContribution
{
    public required string ProviderId { get; init; }
    public required decimal RawScore { get; init; }
    public required decimal Weight { get; init; }
    public required decimal WeightedScore { get; init; }
    public string? FactorSource { get; init; }              // Where data came from
    public DateTimeOffset? FactorTimestamp { get; init; }   // When factor was computed
}
```

### 3.3 Provider Interface

```csharp
public interface IRiskScoreProvider
{
    string ProviderId { get; }
    decimal DefaultWeight { get; }
    TimeSpan CacheTtl { get; }

    Task<ProviderResult> ComputeAsync(
        ScoreRequest request,
        CancellationToken ct);

    Task<bool> IsHealthyAsync(CancellationToken ct);
}
```
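
To make the provider contract concrete, here is a minimal sketch of a provider that serves scores from a fixed lookup table, the kind of thing a fixture-driven test might use. `StaticTableProvider`, its `"static-table"` ID, and the two-field `ProviderResult` record are hypothetical; the document does not show `ProviderResult`, and the `ScoreRequest` stand-in is trimmed so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Stand-in types so the sketch compiles on its own. The real ScoreRequest is
// defined in 3.1; the shape of ProviderResult here is an assumption.
public sealed record ScoreRequest(string VulnerabilityId, string ArtifactId);
public sealed record ProviderResult(decimal Score, string? FactorSource);

public interface IRiskScoreProvider
{
    string ProviderId { get; }
    decimal DefaultWeight { get; }
    TimeSpan CacheTtl { get; }
    Task<ProviderResult> ComputeAsync(ScoreRequest request, CancellationToken ct);
    Task<bool> IsHealthyAsync(CancellationToken ct);
}

// Hypothetical provider backed by a fixed lookup table (not part of RiskEngine).
public sealed class StaticTableProvider : IRiskScoreProvider
{
    private readonly IReadOnlyDictionary<string, decimal> _scores;

    public StaticTableProvider(IReadOnlyDictionary<string, decimal> scores) => _scores = scores;

    public string ProviderId => "static-table";
    public decimal DefaultWeight => 0.10m;
    public TimeSpan CacheTtl => TimeSpan.FromHours(1);

    public Task<ProviderResult> ComputeAsync(ScoreRequest request, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        // Unknown IDs score 0 in this sketch; a real provider might instead
        // report "no data" so the aggregator can leave it out.
        var score = _scores.TryGetValue(request.VulnerabilityId, out var s) ? s : 0m;
        // Keep the result inside the documented 0.0-10.0 range.
        return Task.FromResult(new ProviderResult(Math.Clamp(score, 0m, 10m), "fixture"));
    }

    public Task<bool> IsHealthyAsync(CancellationToken ct) => Task.FromResult(true);
}
```

Returning a completed `Task` keeps the sketch synchronous; a provider that calls Concelier or Excititor would await its data source instead.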

---

## 4) Score Providers

### 4.1 Built-in Providers

| Provider | Data Source | Default Weight | Description |
|----------|-------------|--------|-------------|
| `epss` | Concelier/EPSS | 0.25 | EPSS probability score (0-1 → 0-10) |
| `cvss-kev` | Concelier | 0.30 | CVSS base + KEV boost |
| `vex-gate` | Excititor | 0.20 | VEX status (affected/not_affected) |
| `fix-exposure` | Concelier | 0.15 | Fix availability window |
| `reachability` | Signals | 0.10 | Code path reachability |

### 4.2 Score Computation

```
FinalScore = Σ(provider.weight × provider.score) / Σ(provider.weight)

Tier mapping:
  9.0-10.0 → Critical
  7.0-8.9  → High
  4.0-6.9  → Medium
  1.0-3.9  → Low
  0.0-0.9  → Info
```
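
The weighted average and the tier mapping can be sketched as follows. `RiskAggregation` is a hypothetical name. The tier table lists bands to one decimal place, so a score such as 8.95 falls between two bands; this sketch assumes each lower bound is an inclusive threshold, which places 8.95 in High. Rounding to one decimal before banding would be an equally plausible reading.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not the RiskEngine implementation.
public static class RiskAggregation
{
    // FinalScore = sum(weight * score) / sum(weight)
    public static decimal Aggregate(IEnumerable<(decimal Weight, decimal Score)> contributions)
    {
        decimal weightedSum = 0m, weightSum = 0m;
        foreach (var (weight, score) in contributions)
        {
            weightedSum += weight * score;
            weightSum += weight;
        }
        if (weightSum == 0m)
            throw new InvalidOperationException("No weighted contributions to aggregate.");
        return weightedSum / weightSum;
    }

    // Assumption: each band's lower bound is an inclusive threshold.
    public static string ToTier(decimal score) => score switch
    {
        >= 9.0m => "Critical",
        >= 7.0m => "High",
        >= 4.0m => "Medium",
        >= 1.0m => "Low",
        _ => "Info",
    };
}
```

As a worked example with the default weights and illustrative provider scores of 5, 9, 10, 6, and 2 for `epss`, `cvss-kev`, `vex-gate`, `fix-exposure`, and `reachability`: (0.25×5 + 0.30×9 + 0.20×10 + 0.15×6 + 0.10×2) / 1.00 = 7.05, which maps to High.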

### 4.3 Provider Data Sources

```csharp
public interface IEpssSources
{
    Task<EpssScore?> GetScoreAsync(string cveId, CancellationToken ct);
}

public interface ICvssKevSources
{
    Task<CvssData?> GetCvssAsync(string cveId, CancellationToken ct);
    Task<bool> IsKevAsync(string cveId, CancellationToken ct);
}
```
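
Because providers depend on these source interfaces rather than on Concelier directly, tests can swap in an in-memory implementation. The sketch below does that for `ICvssKevSources`. `InMemoryCvssKevSources` and the two-field `CvssData` record are hypothetical (the document does not show `CvssData`), and case-insensitive CVE matching is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shape; the real CvssData type is not shown in this document.
public sealed record CvssData(decimal BaseScore, string Vector);

public interface ICvssKevSources
{
    Task<CvssData?> GetCvssAsync(string cveId, CancellationToken ct);
    Task<bool> IsKevAsync(string cveId, CancellationToken ct);
}

// In-memory stand-in for tests; not part of RiskEngine.
public sealed class InMemoryCvssKevSources : ICvssKevSources
{
    private readonly Dictionary<string, CvssData> _cvss;
    private readonly HashSet<string> _kev;

    public InMemoryCvssKevSources(IDictionary<string, CvssData> cvss, IEnumerable<string> kev)
    {
        // Assumption: CVE IDs are matched case-insensitively.
        _cvss = new Dictionary<string, CvssData>(cvss, StringComparer.OrdinalIgnoreCase);
        _kev = new HashSet<string>(kev, StringComparer.OrdinalIgnoreCase);
    }

    // Returns null when no CVSS data is known for the ID.
    public Task<CvssData?> GetCvssAsync(string cveId, CancellationToken ct) =>
        Task.FromResult<CvssData?>(_cvss.TryGetValue(cveId, out var data) ? data : null);

    public Task<bool> IsKevAsync(string cveId, CancellationToken ct) =>
        Task.FromResult(_kev.Contains(cveId));
}
```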

---

## 5) REST API (RiskEngine.WebService)

All endpoints live under `/api/v1/risk` and require an Authority-issued **OpTok**.

```
POST /scores                    { request: ScoreRequest } → { jobId }
GET  /scores/{jobId}            → { result: RiskScoreResult, status }
GET  /scores/{jobId}/explain    → { explanation: ExplainabilityPayload }

POST /batch                     { requests: ScoreRequest[] } → { batchId }
GET  /batch/{batchId}           → { results: RiskScoreResult[], status }

GET  /providers                 → { providers: ProviderInfo[] }
GET  /providers/{id}/health     → { healthy: bool, lastCheck }

GET  /healthz | /readyz | /metrics
```
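
A client submits a request, receives a `jobId`, and then polls the result or explanation endpoint. The sketch below only builds the paths and the submit body, leaving the HTTP calls to the caller. `RiskApi` is a hypothetical helper, and the camelCase JSON property names are an assumption inferred from the `{ request: ... }` notation above.

```csharp
using System;
using System.Text.Json;

// Hypothetical client-side helper; not shipped with RiskEngine.
public static class RiskApi
{
    public const string BasePath = "/api/v1/risk";

    public static string SubmitPath => $"{BasePath}/scores";

    // Job IDs are escaped so they are safe to embed in a path segment.
    public static string ResultPath(string jobId) =>
        $"{BasePath}/scores/{Uri.EscapeDataString(jobId)}";

    public static string ExplainPath(string jobId) => $"{ResultPath(jobId)}/explain";

    // Wraps the request in the { request: ... } envelope used by POST /scores.
    // Property casing is an assumption; check the service's actual contract.
    public static string BuildSubmitBody(string vulnerabilityId, string artifactId, string? tenantId = null)
    {
        var body = new { request = new { vulnerabilityId, artifactId, tenantId } };
        return JsonSerializer.Serialize(body);
    }
}
```

A caller would POST the body to `RiskApi.SubmitPath` with the OpTok as a bearer token, then GET `RiskApi.ResultPath(jobId)` until `status` indicates completion.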

---

## 6) Configuration (YAML)

```yaml
RiskEngine:
  Postgres:
    ConnectionString: "Host=postgres;Database=risk;..."

  Cache:
    Enabled: true
    Provider: "valkey"
    ConnectionString: "redis://valkey:6379"
    DefaultTtl: "00:15:00"

  Providers:
    Epss:
      Enabled: true
      Weight: 0.25
      CacheTtl: "01:00:00"
      Source: "concelier"

    CvssKev:
      Enabled: true
      Weight: 0.30
      KevBoost: 2.0

    VexGate:
      Enabled: true
      Weight: 0.20
      NotAffectedScore: 0.0
      AffectedScore: 10.0

    FixExposure:
      Enabled: true
      Weight: 0.15
      NoFixPenalty: 1.5

    Reachability:
      Enabled: true
      Weight: 0.10
      UnreachableDiscount: 0.5

  Worker:
    Concurrency: 4
    BatchSize: 100
    PollInterval: "00:00:05"

  Offline:
    FactorBundlePath: "/data/risk-factors"
    AllowStaleData: true
    MaxStalenessHours: 168   # 7 days
```
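
Because the final score divides by the sum of weights (section 4.2), disabling a provider rescales the remaining ones instead of lowering every score. The sketch below computes those effective weights. It assumes disabled providers are excluded from both sums; `ProviderSetting` and `EffectiveWeights` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical options shape covering only the Enabled and Weight keys.
public sealed record ProviderSetting(bool Enabled, decimal Weight);

public static class EffectiveWeights
{
    // Returns each enabled provider's share of the total enabled weight.
    public static IReadOnlyDictionary<string, decimal> Compute(
        IReadOnlyDictionary<string, ProviderSetting> providers)
    {
        var enabled = providers.Where(p => p.Value.Enabled).ToList();
        var total = enabled.Sum(p => p.Value.Weight);
        if (total <= 0m)
            throw new InvalidOperationException("At least one enabled provider needs a positive weight.");
        return enabled.ToDictionary(p => p.Key, p => p.Value.Weight / total);
    }
}
```

With the configuration above and `Reachability` disabled, the enabled weights total 0.90, so `Epss` contributes 0.25 / 0.90 ≈ 0.278 of the final score instead of 0.25.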

---

## 7) Security & compliance

* **AuthN/Z**: Authority-issued OpToks with `risk.score` scope
* **Tenant isolation**: Scores scoped by tenant ID
* **Audit trail**: All scoring decisions logged with inputs and factors
* **No PII**: Only vulnerability and artifact identifiers processed

---

## 8) Performance targets

* **Single score**: < 100 ms at P95 (with cached factors)
* **Batch scoring**: < 500 ms at P95 for a 100-item batch
* **Provider health check**: responds within a 1 s timeout
* **Cache hit rate**: > 80% for repeated CVEs

---

## 9) Observability

**Metrics:**
* `risk.scores.computed_total{tier,provider}`
* `risk.scores.duration_seconds`
* `risk.providers.health{provider,status}`
* `risk.cache.hits_total` / `risk.cache.misses_total`
* `risk.batch.size_histogram`

**Tracing:** Spans for each provider contribution, cache operations, and aggregation.

**Logs:** Structured logs with `cve_id`, `artifact_id`, `tenant`, `final_score`.
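
One way to emit the first two metrics is through `System.Diagnostics.Metrics`, sketched below. The meter name `StellaOps.RiskEngine` and the `RiskMetrics` class are assumptions; the instrument names and the `tier` and `provider` labels come from the list above.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical instrumentation holder; the meter name is an assumption.
public static class RiskMetrics
{
    public static readonly Meter Meter = new Meter("StellaOps.RiskEngine");

    public static readonly Counter<long> ScoresComputed =
        Meter.CreateCounter<long>("risk.scores.computed_total");

    public static readonly Histogram<double> ScoreDuration =
        Meter.CreateHistogram<double>("risk.scores.duration_seconds", unit: "s");

    // Records one computed score with its tier and provider labels,
    // plus how long the computation took.
    public static void RecordScore(string tier, string provider, double seconds)
    {
        ScoresComputed.Add(1,
            new KeyValuePair<string, object?>("tier", tier),
            new KeyValuePair<string, object?>("provider", provider));
        ScoreDuration.Record(seconds);
    }
}
```

Measurements are only collected once a listener or exporter subscribes to the meter, so calling `RecordScore` without one is effectively a no-op.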

---

## 10) Testing matrix

* **Provider tests**: Each provider returns expected scores for fixture data
* **Aggregation tests**: Weighted combination produces correct final score
* **Determinism tests**: Same inputs produce identical scores
* **Cache tests**: Cache hit and miss paths behave correctly
* **Offline tests**: Factor bundles load and score correctly
* **Integration tests**: Full scoring pipeline with mocked data sources
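
A determinism test needs a stable way to compare two scoring runs. One option, sketched here as an assumption and not as the project's actual test helper, is to fingerprint the contributions in a canonical order with culture-invariant formatting, so provider ordering and locale cannot change the hash. `ScoreFingerprint` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical test helper; not part of RiskEngine.
public static class ScoreFingerprint
{
    public static string Compute(
        IEnumerable<(string ProviderId, decimal Weight, decimal RawScore)> contributions)
    {
        // Sort by provider ID with ordinal comparison so input order is irrelevant.
        var lines = contributions
            .OrderBy(c => c.ProviderId, StringComparer.Ordinal)
            .Select(c => string.Join("|",
                c.ProviderId,
                // Fixed four-decimal format: 0.25 and 0.250 serialize identically.
                c.Weight.ToString("0.0000", CultureInfo.InvariantCulture),
                c.RawScore.ToString("0.0000", CultureInfo.InvariantCulture)));

        var canonical = string.Join("\n", lines);
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonical)));
    }
}
```

A test can then assert that two runs over the same inputs, with providers registered in a different order, yield the same fingerprint.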

---

## 11) Offline/Air-Gap Support

### Factor Bundles

Pre-computed factor data for offline operation:

```
/data/risk-factors/
  ├─ epss/
  │   └─ epss-2025-01-15.json.gz
  ├─ cvss/
  │   └─ cvss-2025-01-15.json.gz
  ├─ kev/
  │   └─ kev-2025-01-15.json
  └─ manifest.json
```
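
The listing suggests a `<kind>-<yyyy-MM-dd>.json[.gz]` naming pattern. Assuming that pattern holds (the document does not define it, nor the `manifest.json` schema), a loader could pick the newest bundle per factor kind as sketched below. `FactorBundles` is a hypothetical helper that works on file names, not full paths.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the file naming convention is inferred from the listing.
public static class FactorBundles
{
    private static readonly Regex NamePattern = new Regex(
        @"^(?<kind>[a-z]+)-(?<date>[0-9]{4}-[0-9]{2}-[0-9]{2})\.json(\.gz)?$",
        RegexOptions.CultureInvariant);

    // Returns the newest bundle file name for the given kind, or null if none match.
    public static string? Latest(string kind, IEnumerable<string> fileNames) =>
        fileNames
            .Select(name => (Name: name, Date: DateOf(kind, name)))
            .Where(x => x.Date.HasValue)
            .OrderByDescending(x => x.Date)
            .Select(x => x.Name)
            .FirstOrDefault();

    // Extracts the date from a matching file name; invalid dates are ignored.
    private static DateTime? DateOf(string kind, string fileName)
    {
        var match = NamePattern.Match(fileName);
        if (!match.Success || match.Groups["kind"].Value != kind) return null;
        return DateTime.TryParseExact(match.Groups["date"].Value, "yyyy-MM-dd",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out var date)
            ? date
            : (DateTime?)null;
    }
}
```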

### Staleness Handling

When operating offline, scores include staleness indicators:

```json
{
  "finalScore": 7.2,
  "dataFreshness": {
    "epss": { "age": "48h", "stale": false },
    "kev": { "age": "24h", "stale": false }
  }
}
```
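
The `age` and `stale` fields can be derived from a factor's timestamp and the `MaxStalenessHours` setting. The sketch below assumes a factor becomes stale once its age exceeds that limit and that age is reported in whole hours; both are assumptions, and `Freshness` is a hypothetical helper.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; the staleness rule is an assumption.
public static class Freshness
{
    public static (string Age, bool Stale) Describe(
        DateTimeOffset factorTimestamp, DateTimeOffset now, int maxStalenessHours)
    {
        var age = now - factorTimestamp;
        // Treat timestamps slightly in the future (clock skew) as age zero.
        if (age < TimeSpan.Zero) age = TimeSpan.Zero;

        var wholeHours = (long)Math.Floor(age.TotalHours);
        var label = wholeHours.ToString(CultureInfo.InvariantCulture) + "h";

        // Stale only when strictly older than the configured limit.
        return (label, age > TimeSpan.FromHours(maxStalenessHours));
    }
}
```

With the configured limit of 168 hours, a factor computed 48 hours ago is reported as `"48h"` and not stale, matching the sample payload.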

---

## Related Documentation

* Policy scoring: `../policy/architecture.md`
* Concelier feeds: `../concelier/architecture.md`
* Excititor VEX: `../excititor/architecture.md`
* Signals reachability: `../signals/architecture.md`