Here's a simple, actionable way to keep "unknowns" from piling up in StellaOps: rank them by how risky they might be and how widely they could spread, then let Scheduler auto-recheck or escalate based on that score.


Unknowns Triage: a lightweight, high-leverage scheme

Goal: decide which “Unknown” findings (no proof yet; inconclusive reachability; unparsed advisory; mismatched version; missing evidence) to rescan first or route into VEX escalation—without waiting for perfect certainty.

1) Define the score

Score each Unknown U with a weighted sum (normalize each input to 0–1):

  • Component popularity (P): how many distinct workloads/images depend on this package (direct + transitive). Proxy: in-degree or deployment count across environments.
  • CVSS uncertainty (C): how fuzzy the risk is (e.g., missing vector, version ranges like <=, vendor ambiguity). Proxy: 1 − certainty; higher = less certain, more dangerous to ignore.
  • Graph centrality (G): how "hub-like" the component is in your dependency graph. Proxy: normalized betweenness/degree centrality in your SBOM DAG.

TriageScore(U) = wP·P + wC·C + wG·G, with default weights: wP=0.4, wC=0.35, wG=0.25.

Thresholds (tuneable):

  • ≥ 0.70 → Hot: immediate rescan + VEX escalation job
  • 0.40–0.69 → Warm: schedule rescan within 24–48h
  • < 0.40 → Cold: batch into weekly sweep
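
For example (hypothetical inputs): a package deployed in many workloads (P = 0.8), with a version-range advisory but a known vector (C = 0.5) and moderate centrality (G = 0.3), scores 0.4·0.8 + 0.35·0.5 + 0.25·0.3 = 0.32 + 0.175 + 0.075 = 0.57, which lands in the Warm band and gets a rescan within 24–48h.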

2) Minimal schema (Postgres or Mongo) to support it

  • unknowns(id, pkg_id, version, source, first_seen, last_seen, certainty, evidence_hash, status)
  • deploy_refs(pkg_id, image_id, env, first_seen, last_seen) → compute popularity P
  • graph_metrics(pkg_id, degree_c, betweenness_c, last_calc_at) → compute centrality G
  • advisory_gaps(pkg_id, missing_fields[], has_range_version, vendor_mismatch) → compute uncertainty C

Store triage_score, triage_band on write so Scheduler can act without recomputing everything.

3) Fast heuristics to fill inputs

  • P (popularity): P = min(1, log10(1 + deployments)/log10(1 + 100))
  • C (uncertainty): start at 0; +0.3 if version range, +0.2 if vendor mismatch, +0.2 if missing CVSS vector, +0.2 if evidence stale (>7d), cap at 1.0
  • G (centrality): precompute on SBOM DAG nightly; normalize to [0,1]
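
As a sketch, the P and C heuristics above could be computed like this (method and parameter names are illustrative, not the actual UnknownsRegistry API):

using System;

public static class TriageHeuristics
{
    // P: saturating log scale; roughly 100 deployments or more maps to 1.0.
    public static double Popularity(int deployments) =>
        Math.Min(1.0, Math.Log10(1 + deployments) / Math.Log10(1 + 100));

    // C: additive uncertainty flags, capped at 1.0.
    public static double Uncertainty(bool versionRange, bool vendorMismatch,
        bool missingCvssVector, TimeSpan evidenceAge)
    {
        double c = 0.0;
        if (versionRange) c += 0.3;
        if (vendorMismatch) c += 0.2;
        if (missingCvssVector) c += 0.2;
        if (evidenceAge > TimeSpan.FromDays(7)) c += 0.2; // stale evidence (>7d)
        return Math.Min(1.0, c);
    }
}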

4) Scheduler rules (UnknownsRegistry → jobs)

  • On unknowns.upsert:

    • compute (P,C,G) → triage_score

    • if Hot → enqueue:

      • Deterministic rescan (fresh feeds + strict lattice)
      • VEX escalation (Excititor) with context pack (SBOM slice, provenance, last evidence)
    • if Warm → enqueue rescan with jitter (spread load)

    • if Cold → tag for weekly batch

  • Backoff: if the same Unknown stays Hot after N attempts, widen evidence (alternate feeds, secondary matcher, vendor OVAL, NVD mirror) and alert.
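
A minimal sketch of how a Scheduler handler might act on these rules, keyed off the band string produced by the Score helper below (job names, the jitter policy, and the backoff threshold are assumptions, not the actual Scheduler surface):

using System;
using System.Collections.Generic;

public static class UnknownsSchedulerRules
{
    // Maps a triage band to the jobs to enqueue on unknowns.upsert.
    public static IReadOnlyList<string> PlanJobs(string band, int hotAttempts, Random jitter)
    {
        var jobs = new List<string>();
        switch (band)
        {
            case "HOT":
                jobs.Add("rescan:deterministic");      // fresh feeds + strict lattice
                jobs.Add("vex:escalate:excititor");    // context pack: SBOM slice, provenance, last evidence
                if (hotAttempts >= 3)
                    jobs.Add("evidence:widen+alert");  // backoff: alternate feeds, secondary matcher
                break;
            case "WARM":
                jobs.Add($"rescan:delay:{24 + jitter.Next(0, 25)}h"); // spread load over 24-48h
                break;
            default:                                   // COLD
                jobs.Add("batch:weekly");
                break;
        }
        return jobs;
    }
}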

5) Operator-visible UX (DevOps-friendly)

  • Unknowns list: columns = pkg@ver, deployments, centrality, uncertainty flags, last evidence age, score badge (Hot/Warm/Cold), Next action chip.
  • Side panel: show why the score is high (P/C/G subscores) + scheduled jobs and last outcomes.
  • Bulk actions: “Recompute scores”, “Force VEX escalation”, “Dedupe aliases”.

6) Guardrails to keep it deterministic

  • Record the inputs + weights + feed hashes in the scan manifest (your “replay” object).
  • Any change to weights or heuristics → new policy version in the manifest; old runs remain replayable.

7) Reference snippets

SQL (Postgres) — compute and persist score:

update unknowns u
set triage_score = least(1, 0.4*u.popularity_p + 0.35*u.cvss_uncertainty_c + 0.25*u.graph_centrality_g),
    triage_band  = case
        when (0.4*u.popularity_p + 0.35*u.cvss_uncertainty_c + 0.25*u.graph_centrality_g) >= 0.70 then 'HOT'
        when (0.4*u.popularity_p + 0.35*u.cvss_uncertainty_c + 0.25*u.graph_centrality_g) >= 0.40 then 'WARM'
        else 'COLD'
    end,
    last_scored_at = now()
where u.status = 'OPEN';

C# (Common) — score helper:

public static (double score, string band) Score(double p, double c, double g,
    double wP=0.4, double wC=0.35, double wG=0.25)
{
    var s = Math.Min(1.0, wP*p + wC*c + wG*g);
    var band = s >= 0.70 ? "HOT" : s >= 0.40 ? "WARM" : "COLD";
    return (s, band);
}

8) Where this plugs into StellaOps

  • Scanner.WebService: writes Unknowns with raw flags (version range, missing vector, vendor mismatch).
  • UnknownsRegistry: computes P/C/G, persists triage fields, emits Unknown.Triaged.
  • Scheduler: listens → enqueues Rescan / VEX Escalation with jitter/backoff.
  • Excititor (VEX): builds vendor-merge proof or raises “Unresolvable” with rationale.
  • Authority: records policy version + weights in replay manifest.

If you want, I can drop in a ready-to-use UnknownsRegistry table DDL + EF Core 9 model and a tiny Scheduler job that implements these thresholds.

Below is a complete, production-grade developer guideline for Ranking Unknowns in Reachability Graphs inside Stella Ops. It fits the existing architectural rules (scanner = origin of truth, Concelier/Vexer = prune-preservers, Authority = replay manifest owner, Scheduler = executor).

These guidelines give:

  1. Definitions
  2. Ranking dimensions
  3. Deterministic scoring formula
  4. Evidence capture
  5. Scheduler policies
  6. UX and API rules
  7. Testing rules and golden fixtures

Stella Ops Developer Guidelines

Ranking Unknowns in Reachability Graphs

0. Purpose

An Unknown is any vulnerability-like record where reachability, affectability, or evidence linkage cannot yet be proved true or false. We rank Unknowns to:

  1. Prioritize rescans
  2. Trigger VEX escalation
  3. Guide operators in constrained time windows
  4. Maintain deterministic behaviour under replay manifests
  5. Avoid non-deterministic or “probabilistic” security decisions

Unknown ranking never declares security state. It determines the order of proof acquisition.


1. Formal Definition of “Unknown”

A record is classified as Unknown if one or more of the following is true:

  1. Dependency Reachability Unproven

    • Graph traversal exists but is not validated by call-graph/rule-graph evidence.
    • Downstream node is reachable but no execution path has sufficient evidence.
  2. Version Semantics Uncertain

    • Advisory reports <=, <, >=, version ranges, or ambiguous pseudo-versions.
    • Normalized version mapping disagrees between data sources.
  3. Component Provenance Uncertain

    • Package cannot be deterministically linked to its SBOM node (name-alias confusion, epoch mismatch, distro backport case).
  4. Missing/Contradictory Evidence

    • Feeds disagree; Vendor VEX differs from NVD; OSS index has missing CVSS vector; environment evidence incomplete.
  5. Weak Graph Anchoring

    • Node exists but cannot be anchored to a layer digest or artifact hash (common in scratch/base images and badly packaged libs).

Unknowns must be stored with explicit flags—not as a collapsed bucket.
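
One way to keep those flags explicit is a flags enum covering the five categories above (a sketch; the actual flag set and names in UnknownsRegistry may differ):

using System;

[Flags]
public enum UnknownFlags
{
    None                  = 0,
    ReachabilityUnproven  = 1 << 0, // graph path exists, no call-graph evidence
    VersionRangeAmbiguous = 1 << 1, // <=, <, >=, or pseudo-version in the advisory
    ProvenanceUncertain   = 1 << 2, // package cannot be deterministically linked to its SBOM node
    ConflictingEvidence   = 1 << 3, // feeds, vendor VEX, and NVD disagree or are missing
    WeakGraphAnchor       = 1 << 4  // no layer digest or artifact hash anchor
}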


2. Dimensions for Ranking Unknowns

Each Unknown is ranked along five deterministic axes:

2.1 Popularity Impact (P)

How broadly the component is used across workloads.

Evidence sources:

  • SBOM deployment graph
  • Workload registry
  • Layer-to-package index

Compute: P = normalized log(deployment_count).

2.2 Exploit Consequence Potential (E)

Not risk itself, but the consequence if the Unknown turns out to be an actual vulnerability.

Compute from:

  • Maximum CVSS across feeds
  • CWE category weight
  • Vendor “criticality marker” if present
  • If CVSS missing → use CWE fallback → mark uncertainty penalty.

2.3 Uncertainty Density (U)

How much is missing or contradictory.

Flags (examples):

  • version_range → +0.25
  • missing_vector → +0.15
  • conflicting_feeds → +0.20
  • no provenance anchor → +0.30
  • unreachable source advisory → +0.10

U ∈ [0, 1].

2.4 Graph Centrality (C)

Is this component a structural hub?

Use:

  • In-degree
  • Out-degree
  • Betweenness centrality

Normalize per artifact type.

2.5 Evidence Staleness (S)

Age of last successful evidence pull.

Decay function: S = min(1, age_days / 14).
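
A sketch of the five sub-score computations described in 2.1–2.5 (the CVSS normalization and CWE fallback weight are illustrative assumptions):

using System;

public static class UnknownAxes
{
    // P: saturating log of deployment count (roughly 100 deployments maps to 1.0).
    public static double Popularity(int deployments) =>
        Math.Min(1.0, Math.Log10(1 + deployments) / Math.Log10(101));

    // E: consequence potential from max CVSS across feeds, or a CWE fallback weight.
    public static double Consequence(double? maxCvss, double cweFallbackWeight) =>
        maxCvss is double cvss ? Math.Min(1.0, cvss / 10.0) : cweFallbackWeight;

    // U: additive uncertainty flags from section 2.3, capped at 1.0.
    public static double Uncertainty(bool versionRange, bool missingVector,
        bool conflictingFeeds, bool noProvenanceAnchor, bool advisoryUnreachable)
    {
        double u = 0.0;
        if (versionRange) u += 0.25;
        if (missingVector) u += 0.15;
        if (conflictingFeeds) u += 0.20;
        if (noProvenanceAnchor) u += 0.30;
        if (advisoryUnreachable) u += 0.10;
        return Math.Min(1.0, u);
    }

    // C: centrality is already normalized per artifact type upstream.
    public static double Centrality(double normalizedCentrality) =>
        Math.Clamp(normalizedCentrality, 0.0, 1.0);

    // S: evidence staleness decays to 1.0 at 14 days.
    public static double Staleness(double ageDays) => Math.Min(1.0, ageDays / 14.0);
}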


3. Deterministic Ranking Score

All Unknowns get a reproducible score under replay manifest:

Score = clamp01(
    wP·P  +
    wE·E  +
    wU·U  +
    wC·C  +
    wS·S
)

Default recommended weights:

wP = 0.25   (deployment impact)
wE = 0.25   (potential consequence)
wU = 0.25   (uncertainty density)
wC = 0.15   (graph centrality)
wS = 0.10   (evidence staleness)

The manifest must record:

  • weights
  • transform functions
  • normalization rules
  • feed hashes
  • evidence hashes

Thus the ranking is replayable bit-for-bit.
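
One way to capture those inputs is a single immutable record persisted next to the score (a sketch; the names are not the Authority manifest schema):

using System;
using System.Collections.Generic;

// Everything the ranking depended on, so a replay can recompute it bit-for-bit.
public sealed record RankingManifest(
    string PolicyVersion,
    IReadOnlyDictionary<string, double> Weights,  // wP, wE, wU, wC, wS
    string TransformRulesVersion,                 // transform functions + normalization rules
    IReadOnlyList<string> FeedHashes,             // content hashes of consumed feeds
    IReadOnlyList<string> EvidenceHashes);        // hashes of advisories / VEX extracts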


4. Ranking Bands

After computing Score:

  • Hot (Score ≥ 0.70): immediate rescan, VEX escalation, widen evidence sources.

  • Warm (0.40 ≤ Score < 0.70): scheduled rescan, no escalation yet.

  • Cold (Score < 0.40): batch weekly; suppressed from UI noise except on request.

Band assignment must be stored explicitly.


5. Evidence Capture Requirements

Every Unknown must persist:

  1. UnknownFlags[]: all uncertainty flags
  2. GraphSliceHash: deterministic hash of dependents/ancestors
  3. EvidenceSetHash: hashes of advisories, vendor VEXes, feed extracts
  4. NormalizationTrace: version normalization decision path
  5. CallGraphAttemptHash: even if incomplete
  6. PackageMatchTrace: exact match reasoning (name, epoch, distro backport heuristics)

This allows Inspector/Authority to replay everything and prevents “ghost Unknowns” caused by environment drift.
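
For the hash fields, the key property is a canonical input ordering so replays produce the same bytes; a sketch (the canonical form chosen here is an assumption):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class EvidenceHashing
{
    // Hash a set of identifiers (graph node ids, advisory ids, feed extract hashes)
    // in a stable order so GraphSliceHash/EvidenceSetHash are replay-deterministic.
    public static byte[] HashSet(IEnumerable<string> items)
    {
        var canonical = string.Join("\n",
            items.Select(i => i.Trim()).OrderBy(i => i, StringComparer.Ordinal));
        return SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
    }
}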


6. Scheduler Policies

6.1 On Unknown Created

Scheduler receives event: Unknown.Created.

Decision matrix:

Condition            Action
Score ≥ 0.70         Immediate Rescan + VEX Escalation job
Score 0.40–0.69      Queue rescan within 12–72h (jitter)
Score < 0.40         Add to weekly batch

6.2 On Unknown Unchanged after N rescans

If the same UnknownFlags persist for N = 3 consecutive runs:

  • Force alternate feeds (mirror, vendor direct)
  • Run VEX escalation (Excititor) with the full provenance pack
  • If still unresolved → emit Unknown.Unresolvable event (not an error; a state)

6.3 Failure Recovery

If fetch/feed errors occur, the Unknown transitions to Unknown.EvidenceFailed. This must raise S (staleness) on the next compute.
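
A compact sketch of the 6.2/6.3 transitions (the event names match the ones above; the action strings and the handling of attempts beyond N = 3 are assumptions):

using System;

public static class UnknownEscalation
{
    // Next action after a rescan attempt, per sections 6.2/6.3.
    public static string Next(bool evidenceFetchFailed, bool flagsUnchanged, int consecutiveUnchanged)
    {
        if (evidenceFetchFailed)
            return "Unknown.EvidenceFailed";       // staleness S rises on the next compute
        if (flagsUnchanged && consecutiveUnchanged == 3)
            return "WidenEvidence+VexEscalation";  // alternate feeds, full provenance pack
        if (flagsUnchanged && consecutiveUnchanged > 3)
            return "Unknown.Unresolvable";         // a state, not an error
        return "Rescan.Scheduled";
    }
}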


7. Scanner Implementation Guidelines (.NET 10)

7.1 Ranking Computation Location

Ranking is computed inside scanner.webservice immediately after Unknown classification. Concelier/Vexer must not touch ranking logic.

7.2 Graph Metrics Service

Maintain a cached daily calculation of centrality metrics to prevent per-scan recomputation cost explosion.

7.3 Compute Path

1. Build evidence set
2. Classify UnknownFlags
3. Compute P, E, U, C, S
4. Compute Score
5. Assign Band
6. Persist UnknownRecord
7. Emit Unknown.Triaged event

7.4 Storage Schema (Postgres)

Fields required:

unknown_id PK
pkg_id
pkg_version
digest_anchor
unknown_flags jsonb
popularity_p float
potential_e float
uncertainty_u float
centrality_c float
staleness_s float
score float
band enum
graph_slice_hash bytea
evidence_set_hash bytea
normalization_trace jsonb
callgraph_attempt_hash bytea
created_at, updated_at
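
A corresponding EF Core entity could look roughly like this (property and type mappings are assumptions; the jsonb columns are shown as raw JSON strings for brevity):

using System;

public enum TriageBand { Hot, Warm, Cold }

public sealed class UnknownRecord
{
    public Guid UnknownId { get; set; }                       // unknown_id (PK)
    public string PkgId { get; set; } = "";
    public string PkgVersion { get; set; } = "";
    public string? DigestAnchor { get; set; }
    public string UnknownFlags { get; set; } = "{}";          // jsonb
    public double PopularityP { get; set; }
    public double PotentialE { get; set; }
    public double UncertaintyU { get; set; }
    public double CentralityC { get; set; }
    public double StalenessS { get; set; }
    public double Score { get; set; }
    public TriageBand Band { get; set; }
    public byte[] GraphSliceHash { get; set; } = Array.Empty<byte>();
    public byte[] EvidenceSetHash { get; set; } = Array.Empty<byte>();
    public string NormalizationTrace { get; set; } = "{}";    // jsonb
    public byte[]? CallGraphAttemptHash { get; set; }
    public DateTimeOffset CreatedAt { get; set; }
    public DateTimeOffset UpdatedAt { get; set; }
}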

8. API and UX Guidelines

8.1 Operator UI

For every Unknown:

  • Score badge (Hot/Warm/Cold)
  • Sub-component contributions (P/E/U/C/S)
  • Flags list
  • Evidence age
  • Scheduled next action
  • History graph of score evolution

8.2 Filters

Operators may filter by:

  • High P (impactful components)
  • High U (ambiguous advisories)
  • High S (stale data)
  • High C (graph hubs)

8.3 Reasoning Transparency

UI must show exactly why the ranking is high. No hidden heuristics.


9. Unit Testing & Golden Fixtures

9.1 Golden Unknown Cases

Provide frozen fixtures for:

  • Version range ambiguity
  • Mismatched epoch/backport
  • Missing vector
  • Conflicting severity between vendor/NVD
  • Unanchored filesystem library

Each fixture stores expected:

  • Flags
  • P/E/U/C/S
  • Score
  • Band

9.2 Replay Manifest Tests

Given a manifest containing:

  • feed hashes
  • rules version
  • normalization logic
  • lattice rules (for overall system)

Ensure ranking recomputes identically.
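
For example, a replay test could take the shape below (GoldenFixtures and Ranker are hypothetical helpers standing in for whatever loads the frozen manifest and runs the ranking):

using Xunit;

public class ReplayRankingTests
{
    [Fact]
    public void Ranking_recomputes_identically_under_the_same_manifest()
    {
        // Hypothetical fixture loaders: frozen feeds, rules version, normalization logic.
        var manifest = GoldenFixtures.LoadManifest("fixtures/replay/manifest-001.json");
        var unknowns = GoldenFixtures.LoadUnknowns("fixtures/replay/unknowns-001.json");

        var first  = Ranker.Rank(unknowns, manifest);   // hypothetical ranking entry point
        var second = Ranker.Rank(unknowns, manifest);

        // Same scores, bands, and evidence hashes on every replay.
        Assert.Equal(first, second);
    }
}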


10. Developer Checklist (must be followed)

  1. Did I persist all traces needed for deterministic replay?
  2. Does ranking depend only on manifest-declared parameters (not environment)?
  3. Are all uncertainty factors explicit flags, never inferred fuzzily?
  4. Is the scoring reproducible under identical inputs?
  5. Is the Scheduler decision table deterministic and exhaustively tested?
  6. Does the API expose full reasoning without hiding rules?

If you want, I can now produce:

  1. A full Postgres DDL for Unknowns.
  2. A .NET 10 service class for ranking calculation.
  3. A golden test suite with 20 fixtures.
  4. UI wireframe for Unknown triage screen.

Which one should I generate?