git.stella-ops.org/docs/risk/EPIC_18_RISK_PROFILES.md
master 651b8e0fa3 feat: Add new projects to solution and implement contract testing documentation
- Added "StellaOps.Policy.Engine", "StellaOps.Cartographer", and "StellaOps.SbomService" projects to the StellaOps solution.
- Created AGENTS.md to outline the Contract Testing Guild Charter, detailing mission, scope, and definition of done.
- Established TASKS.md for the Contract Testing Task Board, outlining tasks for Sprint 62 and Sprint 63 related to mock servers and replay testing.
2025-10-27 07:57:55 +02:00


Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


Epic 18: Risk Scoring Profiles

Short name: Risk Profiles

Primary components: Policy Engine, Findings Ledger, Conseiller (Feedser), Excitator (VEXer), StellaOps Console, Policy Studio, CLI, Export Center, Authority & Tenancy, Observability

Surfaces: policy documents, scoring engine, factor providers, explainability artifacts, APIs, CLI, UI

AOC ground rule reminder: Conseiller and Excitator aggregate and link advisories/VEX. They never merge or mutate source records. Risk scoring consumes linked items and computes a contextual score per finding and per asset without collapsing sources; provenance is preserved and shown.


1) What it is

Risk Scoring Profiles let users define, version and apply customizable formulas that turn raw signals (CVSS, EPSS-like exploit likelihood, KEV-style exploited lists, VEX status, reachability, runtime evidence, fix availability, asset criticality, provenance trust, etc.) into a single normalized risk score from 0 to 100 with severity buckets. Profiles are authored in Policy Studio, attached to scopes/tenants/projects, simulated against inventories and SBOMs, and executed by a scoring engine that outputs:

  • A final score and severity.
  • A factor-by-factor contribution breakdown, including the math.
  • Gating decisions (e.g., VEX “not affected” forces the score to 0).
  • Audit and provenance for every signal used.

Profiles can differ by environment: “Exploit-aware prod,” “Compliance-focused,” “Safety-critical,” “Dev velocity,” and so on. The engine is pluggable: new signals can be added without breaking existing profiles.


2) Why

  • One size doesn’t fit anyone. Different orgs weigh exploitability vs business criticality differently.
  • Reduce noise and accelerate triage by aligning scores with how teams actually make decisions.
  • Make risk explainable. If a score says 86, show why.
  • Enable policy-aware flows elsewhere: gates, notifications, dashboards, remediation queues.

3) How it should work

3.1 Core model

A RiskProfile defines:

  • Metadata: name, version, description, owner, scope selector, status (draft/published/deprecated).
  • Signals: named inputs with source bindings and transforms.
  • Formula: a composition of weighted terms, caps, gates, and overrides producing a 0-100 score.
  • Severity mapping: score→{Critical, High, Medium, Low, None}.
  • Gates: hard conditions that short-circuit scoring (e.g., VEX Not Affected → 0).
  • Overrides: explicit per-package/per-CVE/per-asset adjustments with audit.
  • Explainability: the engine must compute the contribution of each term and include the raw values.
  • Versioning: immutable content hash, profile_id@version. Inheritance is supported via extends.
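A minimal TypeScript sketch of profile metadata and the content-hash idea follows. The field names, the canonical-JSON rule and the FNV-1a hash are illustrative assumptions. A production registry would use a cryptographic digest such as SHA-256. The point is that the hash depends only on content, not on key order.

```typescript
type ProfileStatus = "draft" | "published" | "deprecated";

// Hypothetical RiskProfile shape; field names are illustrative, not a published schema.
interface RiskProfile {
  id: string;
  version: string;
  description?: string;
  owner: string;
  scopeSelector: string;
  status: ProfileStatus;
  extends?: string; // parent reference, "profile_id@version"
  weights: Record<string, number>;
  severityThresholds: Record<string, number>; // e.g. { Critical: 85, High: 70 }
}

// Canonical JSON: object keys sorted recursively and undefined fields dropped,
// so documents with the same content always serialize identically.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return "[" + value.map(canonicalJson).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalJson(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// FNV-1a (32-bit) over UTF-16 code units, a dependency-free stand-in for SHA-256.
// Lifecycle status is excluded (assumption) so deprecating a profile keeps its hash.
function contentHash(profile: RiskProfile): string {
  const s = canonicalJson({ ...profile, status: undefined });
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

const profileRef = (p: RiskProfile): string => `${p.id}@${p.version}`;
```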

3.2 Signals (factor) catalog

Initial signals supported out of the box:

| Signal | Description | Expected range | Default transform |
|---|---|---|---|
| cvss_base | CVSS base score from each advisory | 0..10 | linear: x/10 |
| epss_like | Exploit likelihood (0..1) | 0..1 | identity |
| kev_flag | Known exploited in the wild (boolean) | {0,1} | step: 0 or 1 |
| vex_status | VEX: affected, not_affected, under_investigation | enum | gate + multiplier |
| reachability | Static reachability to vulnerable code path | 0..1 | identity |
| runtime_evidence | Runtime evidence of vulnerable symbol/path | 0..1 | identity |
| internet_exposed | Asset externally reachable | {0,1} | multiplier |
| asset_criticality | Business criticality of asset | 1..5 | normalize: (x-1)/4 |
| fix_available | Patch or upgrade exists | {0,1} | negative weight |
| age_days | Days since advisory published | 0..∞ | logistic decay |
| privilege_escalation | Elevation potential | {0,1} | positive bump |
| rce_flag | Remote code execution | {0,1} | positive bump |
| provenance_trust | Signature/provenance (SLSA-ish) | 0..1 | inverse weight |
| pkg_popularity | Package ecosystem usage | 0..1 | mild bump |
| source_consensus | Count of agreeing sources (Conseiller-linked) | 1..N | saturating transform |

Notes:

  • Conseiller can link multiple advisories per CVE. Signals like cvss_base and kev_flag are aggregated via declared reducers: max, mean, or consensus (e.g., count of sources claiming exploited).
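The numeric transforms in the catalog can be sketched as plain functions that map each raw value into [0, 1]. Only transforms with a numeric shape are shown. The logistic midpoint and steepness for age_days and the rate for source_consensus are assumptions chosen for illustration. The catalog names the transform shapes but not their parameters.

```typescript
// Clamp any number into [0, 1].
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Default transforms from the signal catalog; each returns a value in [0, 1].
const transforms = {
  cvss_base: (x: number): number => clamp01(x / 10),              // linear: x/10
  epss_like: (x: number): number => clamp01(x),                   // identity
  kev_flag: (x: number): number => (x > 0 ? 1 : 0),               // step: 0 or 1
  asset_criticality: (x: number): number => clamp01((x - 1) / 4), // normalize 1..5
  // Logistic decay: close to 1 for fresh advisories, 0.5 at `midpoint` days,
  // approaching 0 as the advisory ages. Midpoint and steepness are assumptions.
  age_days: (days: number, midpoint = 180, steepness = 0.03): number =>
    1 / (1 + Math.exp(steepness * (days - midpoint))),
  // Saturating: a single source maps to 0, and each additional agreeing source
  // adds less than the previous one. The rate is an assumption.
  source_consensus: (n: number, rate = 0.5): number =>
    1 - Math.exp(-rate * Math.max(0, n - 1)),
};
```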

3.3 Formula template

Default formula (normalized result 0..1 before scaling to 0..100):

score =
  gate(VEX_not_affected => 0) *
  clamp01(
    w1*cvss'
  + w2*epss'
  + w3*reachability'
  + w4*runtime_evidence'
  + w5*internet_exposed'
  + w6*asset_criticality'
  + w7*kev_flag'
  + w8*rce_flag'
  + w9*privilege_escalation'
  + w10*source_consensus'
  + w11*(1 - provenance_trust')
  + w12*(1 - fix_available')
  + w13*age_decay'
  + bias
  )
  • Each term is a transformed, normalized signal (denoted ').

  • Weights default to reasonable values (e.g., cvss 0.25, epss 0.2, reachability 0.1, runtime 0.1, internet_exposed 0.08, asset_criticality 0.08, kev 0.07, rce 0.04, priv_esc 0.03, consensus 0.03, provenance inverse 0.01, fix inverse 0.005, age 0.005).

  • Severity mapping (default):

    • Critical ≥ 85
    • High 70-84
    • Medium 40-69
    • Low 15-39
    • None < 15

Profiles can override weights, gating, transforms and severity thresholds.
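A minimal sketch of the default formula follows. It assumes the caller has already transformed and reduced each signal (the primed terms) and has pre-computed the inverse terms (1 - provenance_trust') and (1 - fix_available'). The weight keys follow the explainability example. DEFAULT_WEIGHTS, computeScore and severityOf are illustrative names. Each contribution is weight × normalized value × 100, which reproduces the cvss_base entry in the explainability artifact (0.25 × 0.98 × 100 = 24.5).

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low" | "None";

interface Contribution { signal: string; weight: number; value: number; contribution: number; }
interface ScoreResult { score: number; severity: Severity; contributions: Contribution[]; }

// Default weights from the formula template; they sum to 1.0.
const DEFAULT_WEIGHTS: Record<string, number> = {
  cvss: 0.25, epss: 0.2, reachability: 0.1, runtime_evidence: 0.1,
  internet_exposed: 0.08, asset_criticality: 0.08, kev: 0.07, rce: 0.04,
  priv_esc: 0.03, consensus: 0.03, provenance_inverse: 0.01, fix_inverse: 0.005, age: 0.005,
};

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Default severity buckets on the 0..100 scale.
function severityOf(score: number): Severity {
  if (score >= 85) return "Critical";
  if (score >= 70) return "High";
  if (score >= 40) return "Medium";
  if (score >= 15) return "Low";
  return "None";
}

// `terms` holds normalized values in [0, 1]; in this sketch a missing term contributes nothing.
function computeScore(
  terms: Record<string, number>,
  vexNotAffected: boolean,
  weights: Record<string, number> = DEFAULT_WEIGHTS,
  bias = 0,
): ScoreResult {
  const contributions: Contribution[] = Object.keys(weights)
    .filter((name) => terms[name] !== undefined)
    .map((name) => ({
      signal: name,
      weight: weights[name],
      value: terms[name],
      contribution: weights[name] * terms[name] * 100,
    }));
  // Gate: VEX not_affected forces the score to 0; contributions are still reported.
  if (vexNotAffected) return { score: 0, severity: "None", contributions };
  const raw = contributions.reduce((sum, c) => sum + c.weight * c.value, bias);
  const score = Math.round(clamp01(raw) * 1000) / 10; // scale to 0..100, one decimal
  return { score, severity: severityOf(score), contributions };
}
```

Profiles that override weights or thresholds would pass their own weights map and bucket boundaries. The gate is applied after contributions are computed so that the explanation still shows what the score would have been.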

3.4 Reducers and provenance

For signals with multiple sources:

  • cvss_base: default reducer max.
  • kev_flag: reducer any.
  • epss_like: reducer max.
  • vex_status: gate precedence: if any linked VEX statement says not_affected, apply the not-affected gate (score 0) unless an explicit policy disables that source; otherwise, the most conservative status wins (affected > under_investigation > unknown).
  • Every reduction lists contributing sources in the explanation with their digests.
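The reducers above can be sketched as small functions. Each returns the contributing sources so that they can be listed in the explanation. The sketch assumes that a source disabled by policy is ignored entirely, not only for the gate. Sourced, reduceVex and VexDecision are hypothetical names.

```typescript
type VexStatus = "affected" | "not_affected" | "under_investigation" | "unknown";
interface Sourced<T> { source: string; value: T; }

// Numeric and boolean reducers; reduceMax returns -Infinity for an empty list.
const reduceMax = (xs: Sourced<number>[]): number => Math.max(...xs.map((x) => x.value));
const reduceAny = (xs: Sourced<boolean>[]): boolean => xs.some((x) => x.value);

// Non-gating precedence, most conservative first.
const VEX_PRECEDENCE: VexStatus[] = ["affected", "under_investigation", "unknown"];

interface VexDecision { decision: VexStatus; gateApplied: boolean; sources: string[]; }

// Sources disabled by policy are dropped before any decision (assumption).
function reduceVex(xs: Sourced<VexStatus>[], disabledSources: Set<string> = new Set<string>()): VexDecision {
  const active = xs.filter((x) => !disabledSources.has(x.source));
  const notAffected = active.filter((x) => x.value === "not_affected");
  if (notAffected.length > 0) {
    return { decision: "not_affected", gateApplied: true, sources: notAffected.map((x) => x.source) };
  }
  for (const status of VEX_PRECEDENCE) {
    const matching = active.filter((x) => x.value === status);
    if (matching.length > 0) {
      return { decision: status, gateApplied: false, sources: matching.map((x) => x.source) };
    }
  }
  return { decision: "unknown", gateApplied: false, sources: [] };
}
```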

3.5 Explainability artifact

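For every scored item, the engine produces a JSON explainability object like the one below. The example is abbreviated: only two signals and their contributions are shown, so the listed contributions do not add up to the final score.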

{
  "profile_id": "risk-default",
  "profile_version": "1.2.0",
  "input": { "asset_id": "...", "package": "openssl@1.1.1u", "cve": "CVE-XXXX-YYYY" },
  "signals": {
    "cvss_base": { "values": [{"source":"nvd","value":9.8}, {"source":"vendor","value":9.1}], "reducer":"max", "reduced":9.8, "normalized":0.98 },
    "epss_like": { "value":0.72, "normalized":0.72 },
    "vex_status": { "values":[{"source":"vendor","value":"affected"}], "decision":"affected" }
  },
  "formula": {
    "weights": { "cvss":0.25, "epss":0.20 },
    "gates": [{ "name":"VEX_not_affected", "applied": false }]
  },
  "contributions": [
    { "signal":"cvss_base", "weight":0.25, "value":0.98, "contribution":24.5 },
    { "signal":"epss_like", "weight":0.20, "value":0.72, "contribution":14.4 }
  ],
  "score": 87.1,
  "severity": "Critical",
  "provenance": { "calculated_at":"2025-10-25T12:00:00Z", "engine":"risk-engine@v0.6.3", "trace_id":"..." }
}
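Each contribution is stated alongside its weight and normalized value, so the artifact can be checked for internal consistency. The sketch below assumes that contributions are weight × value × 100, as in the example, and that an applied gate forces the score to 0. Explanation and checkExplanation are hypothetical names, and only a subset of the artifact's fields is typed.

```typescript
interface Contribution { signal: string; weight: number; value: number; contribution: number; }

// Hypothetical typing of a subset of the artifact's fields.
interface Explanation {
  profile_id: string;
  profile_version: string;
  formula: { weights: Record<string, number>; gates: { name: string; applied: boolean }[] };
  contributions: Contribution[];
  score: number;
  severity: string;
}

// Returns human-readable problems; an empty list means the artifact is consistent.
// The default tolerance allows for contributions rounded to one decimal.
function checkExplanation(e: Explanation, tolerance = 0.05): string[] {
  const problems: string[] = [];
  for (const c of e.contributions) {
    const expected = c.weight * c.value * 100;
    if (Math.abs(expected - c.contribution) > tolerance) {
      problems.push(`${c.signal}: expected ${expected.toFixed(2)}, got ${c.contribution}`);
    }
  }
  if (e.formula.gates.some((g) => g.applied) && e.score !== 0) {
    problems.push("a gate was applied but the score is not 0");
  }
  if (e.score < 0 || e.score > 100) problems.push("score outside 0..100");
  return problems;
}
```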

3.6 Profile scoping and inheritance

  • Profiles attach to scopes via Authority & Tenancy: org/tenant/project/environment.
  • A scope resolves one active profile by precedence: project > environment > org default.
  • Profiles may extend a base profile via extends, overriding weights and thresholds. Inheritance is resolved through the immutable parent chain.
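Scope resolution and inheritance can be sketched as two independent steps. The first picks the active profile reference by precedence. The second flattens the extends chain root-first so that child overrides win. Binding, Scope, ProfileDoc and both functions are hypothetical. The sketch also assumes that scope ids are unique per level and that a cycle or a missing parent is an error.

```typescript
type Level = "project" | "environment" | "org";
interface Scope { org: string; environment?: string; project?: string; }
interface Binding { level: Level; scopeId: string; profileRef: string; }

// Precedence: project > environment > org default.
function resolveActiveProfile(scope: Scope, bindings: Binding[]): string | undefined {
  const candidates: [Level, string | undefined][] = [
    ["project", scope.project],
    ["environment", scope.environment],
    ["org", scope.org],
  ];
  for (const [level, id] of candidates) {
    if (id === undefined) continue;
    const hit = bindings.find((b) => b.level === level && b.scopeId === id);
    if (hit) return hit.profileRef;
  }
  return undefined;
}

interface ProfileDoc {
  ref: string;      // "profile_id@version"
  extends?: string; // parent ref
  weights: Record<string, number>;
  thresholds: Record<string, number>;
}

// Walk the parent chain, then merge root-first so child overrides win.
function resolveInheritance(ref: string, store: Map<string, ProfileDoc>): ProfileDoc {
  const chain: ProfileDoc[] = [];
  const seen = new Set<string>();
  let current: string | undefined = ref;
  while (current !== undefined) {
    if (seen.has(current)) throw new Error(`inheritance cycle at ${current}`);
    seen.add(current);
    const doc: ProfileDoc | undefined = store.get(current);
    if (doc === undefined) throw new Error(`unknown profile ${current}`);
    chain.unshift(doc); // parents end up first
    current = doc.extends;
  }
  const init: ProfileDoc = { ref, weights: {}, thresholds: {} };
  return chain.reduce<ProfileDoc>(
    (acc, doc) => ({
      ref,
      weights: { ...acc.weights, ...doc.weights },
      thresholds: { ...acc.thresholds, ...doc.thresholds },
    }),
    init,
  );
}
```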

3.7 Execution path

  1. New or updated findings arrive from Conseiller/Excitator into Findings Ledger.
  2. A Scoring Job is enqueued per scope with a batch of items.
  3. The engine pulls necessary signals via Factor Providers (reachability, runtime, KEV lists, etc.).
  4. The formula executes; results are upserted to Findings Ledger with an explainability blob pointer.
  5. Notifications Studio triggers based on severity deltas.
  6. Console and CLI read scored findings; filters and charts operate on score and severity.
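Steps 3 to 5 can be sketched as a single job over one batch. The in-memory ledger, the injected fetchSignals/score/notify callbacks and the explanationRef format are hypothetical stand-ins for the Findings Ledger, Factor Providers and Notifications Studio. Providers are called synchronously here for brevity, although the provider interface returns a Promise.

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low" | "None";
interface Finding { key: string; signals: Record<string, number>; }
interface LedgerRow { key: string; score: number; severity: Severity; explanationRef: string; }

// Hypothetical in-memory stand-in for the Findings Ledger.
class InMemoryLedger {
  private rows = new Map<string, LedgerRow>();
  get(key: string): LedgerRow | undefined { return this.rows.get(key); }
  upsert(row: LedgerRow): void { this.rows.set(row.key, row); }
}

// Runs one scoring job and returns the number of findings written.
function runScoringJob(
  batch: Finding[],
  fetchSignals: (keys: string[]) => Map<string, Record<string, number>>,
  score: (signals: Record<string, number>) => { score: number; severity: Severity },
  ledger: InMemoryLedger,
  notify: (key: string, from: Severity | undefined, to: Severity) => void,
): number {
  const provided = fetchSignals(batch.map((f) => f.key)); // step 3: one batched call
  for (const f of batch) {
    const extra: Record<string, number> = provided.get(f.key) ?? {};
    const signals: Record<string, number> = { ...f.signals, ...extra };
    const result = score(signals); // step 4: evaluate the formula
    const previous = ledger.get(f.key);
    ledger.upsert({ key: f.key, ...result, explanationRef: `explanations/${f.key}` });
    // Step 5: notify only when the severity bucket changes.
    if (previous?.severity !== result.severity) notify(f.key, previous?.severity, result.severity);
  }
  return batch.length;
}
```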

3.8 Factor Provider interface

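// ScoringContext and FactorValue are engine types defined elsewhere; InputKey is assumed to be a string key.
type InputKey = string;
interface FactorProvider {
  id(): string;                // stable provider identifier
  requiredInputs(): string[];  // input fields the provider needs
  fetch(ctx: ScoringContext, inputs: InputKey[]): Promise<Map<InputKey, FactorValue>>;
}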

Providers must be deterministic and cacheable. Every factor has a TTL and a backfill policy.
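One way to meet the determinism and TTL requirements is to wrap provider lookups in a small cache with an injectable clock. This sketch assumes a KEV provider backed by a bundled CVE list, for example one shipped in an offline bundle. TtlCache, KevFactorProvider and this FactorValue shape are hypothetical, and backfill is not modeled.

```typescript
// Hypothetical value shape; the interface only names FactorValue.
interface FactorValue { value: number; source: string; }

// TTL cache with an injectable clock so expiry is deterministic and testable.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number, private now: () => number = () => Date.now()) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: the next lookup recomputes
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Hypothetical KEV provider over a bundled CVE set. The same bundle and inputs
// always produce the same values, which is what makes caching safe.
class KevFactorProvider {
  private cache: TtlCache<FactorValue>;
  constructor(private kevCves: Set<string>, ttlMs: number, now?: () => number) {
    this.cache = new TtlCache<FactorValue>(ttlMs, now);
  }
  id(): string { return "kev"; }
  requiredInputs(): string[] { return ["cve"]; }
  async fetch(_ctx: unknown, cves: string[]): Promise<Map<string, FactorValue>> {
    return this.lookup(cves);
  }
  lookup(cves: string[]): Map<string, FactorValue> {
    const out = new Map<string, FactorValue>();
    for (const cve of cves) {
      let v = this.cache.get(cve);
      if (v === undefined) {
        v = { value: this.kevCves.has(cve) ? 1 : 0, source: "kev-bundle" };
        this.cache.set(cve, v);
      }
      out.set(cve, v);
    }
    return out;
  }
}
```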

3.9 Simulation

Policy Studio provides “Simulate with profile” functionality to test profiles against SBOMs or asset sets. Simulation outputs include distributions, severity shifts, and top movers, and can be exported.
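This sketch shows one way simulation output could be summarized. It assumes that the current and candidate profiles have already been evaluated over the same items. The summarize function and its output shape are illustrative, and "top movers" is taken here to mean the largest absolute score changes.

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low" | "None";
interface Scored { score: number; severity: Severity; }

interface SimulationSummary {
  distribution: Record<Severity, number>;      // counts under the candidate profile
  shifts: Record<string, number>;              // e.g. "High->Critical" -> count
  topMovers: { key: string; delta: number }[]; // largest absolute score changes
}

function summarize(
  baseline: Map<string, Scored>,
  candidate: Map<string, Scored>,
  topN = 5,
): SimulationSummary {
  const distribution: Record<Severity, number> = { Critical: 0, High: 0, Medium: 0, Low: 0, None: 0 };
  const shifts: Record<string, number> = {};
  const movers: { key: string; delta: number }[] = [];
  candidate.forEach((next, key) => {
    distribution[next.severity]++;
    const prev = baseline.get(key);
    if (prev === undefined) return; // new item: counted in the distribution only
    if (prev.severity !== next.severity) {
      const label = `${prev.severity}->${next.severity}`;
      shifts[label] = (shifts[label] ?? 0) + 1;
    }
    movers.push({ key, delta: next.score - prev.score });
  });
  movers.sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta));
  return { distribution, shifts, topMovers: movers.slice(0, topN) };
}
```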

3.10 Air-gapped behavior

Profiles work offline; providers rely on bundled datasets produced by Export Center. Missing providers surface explicit gaps in explanations.
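In air-gapped mode, a missing provider or a missing bundled value should be reported rather than silently scored as 0. The sketch assumes that each bundled dataset is exposed as a lookup function keyed by signal name. collectSignals and the gap reasons are hypothetical.

```typescript
interface Gap { signal: string; reason: string; }
interface CollectedSignals { signals: Record<string, number>; gaps: Gap[]; }

// `bundled` maps a signal name to a lookup built from an offline bundle (hypothetical).
function collectSignals(
  required: string[],
  bundled: Map<string, (findingKey: string) => number | undefined>,
  findingKey: string,
): CollectedSignals {
  const out: CollectedSignals = { signals: {}, gaps: [] };
  for (const name of required) {
    const lookup = bundled.get(name);
    if (lookup === undefined) {
      out.gaps.push({ signal: name, reason: "provider not available in offline bundle" });
      continue;
    }
    const value = lookup(findingKey);
    if (value === undefined) out.gaps.push({ signal: name, reason: "no value in bundled dataset" });
    else out.signals[name] = value;
  }
  return out;
}
```

The gaps list would be copied into the explainability artifact so that a lower score caused by missing data is visible, not hidden.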


4) Architecture

4.1 New modules

  • src/StellaOps.RiskEngine/
  • src/StellaOps.RiskEngine/providers/
  • src/StellaOps.Policy.RiskProfile/
  • Database migrations for profiles/results/explanations
  • src/StellaOps.UI
  • src/StellaOps.Cli
  • src/StellaOps.ExportCenter.RiskBundles

4.2 Data model

Tables for risk_profiles, scoring_jobs, scoring_results, explanations with indexes on finding keys, scope, severity, and timestamps.


5) APIs and contracts

Endpoints include profile CRUD, publish, simulate, job enqueue, results queries, explanation retrieval, and schema discovery. Authentication scopes: risk.profile:*, risk.result:read, risk.job:write.
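The scope names suggest a resource:action pattern with a wildcard. The text does not define the matching rules. This minimal check assumes that a trailing :* grants every action on that resource and that all other scopes must match exactly.

```typescript
// Assumption: a trailing ":*" grants every action under that resource prefix,
// so "risk.profile:*" covers "risk.profile:read" and "risk.profile:publish".
function hasScope(granted: string[], required: string): boolean {
  return granted.some(
    (g) => g === required || (g.endsWith(":*") && required.startsWith(g.slice(0, -1))),
  );
}
```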


6) Documentation changes

List of required docs with banner statements covering overview, profiles, factors, formulas, explainability, API, console UI, CLI commands, air-gapped bundles, and AOC invariants.


7) Implementation plan

Seven phases: foundations, storage/APIs, Console & Policy Studio, CLI & SDKs, expanded factors, air-gapped support, quality/performance.


8) Engineering tasks

Detailed task list spanning schema, engine, providers, APIs, ledger integration, console, CLI, export center, observability, docs, and testing.


9) Feature changes required in other components

Defines cross-team expectations for Conseiller, Excitator, Findings Ledger, Policy Studio, Vulnerability Explorer, SBOM Graph Explorer, Notifications, Authority, Export Center, CLI & SDKs.


10) Acceptance criteria

Coverage of authoring, simulation, scoring, UI, CLI, air-gapped support, AOC invariants, and performance.


11) Risks and mitigations

Addresses signal drift, weight overfitting, performance, VEX trust, and compliance differences.


12) Philosophy

Principles: context, explainability, truth preservation, portability, and loud failures.


13) Example profile

Contains an abbreviated YAML example demonstrating schema usage, weights, gates, severity mapping, and overrides.
