Unknowns Ranking Algorithm Reference

This document describes the multi-factor scoring algorithm used to rank and triage unknowns in the StellaOps Signals module.

Purpose

When reachability analysis encounters unresolved symbols, edges, or package identities, these are recorded as unknowns. The ranking algorithm prioritizes unknowns by computing a composite score from five factors, then assigns each to a triage band (HOT/WARM/COLD) that determines rescan scheduling and escalation policies.

Scoring Formula

The composite score is computed as:

Score = wP × P + wE × E + wU × U + wC × C + wS × S

Where:

P = Popularity (deployment impact)
E = Exploit potential (CVE severity)
U = Uncertainty density (flag accumulation)
C = Centrality (graph position importance)
S = Staleness (evidence age)

All factors are normalized to [0.0, 1.0] before weighting. The final score is clamped to [0.0, 1.0].

Default Weights

Factor	Weight	Description
wP	0.25	Popularity weight
wE	0.25	Exploit potential weight
wU	0.25	Uncertainty density weight
wC	0.15	Centrality weight
wS	0.10	Staleness weight

Weights must sum to 1.0 and are configurable via Signals:UnknownsScoring settings.
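
As a rough illustration of the formula and the weight constraint above, the following Python sketch computes the composite score. The names `Weights`, `clamp01`, and `composite_score` are hypothetical and are not taken from the Signals codebase; the 1e-9 tolerance on the weight sum is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Factor weights; the defaults mirror the Default Weights table."""
    popularity: float = 0.25
    exploit: float = 0.25
    uncertainty: float = 0.25
    centrality: float = 0.15
    staleness: float = 0.10

    def validate(self) -> None:
        total = (self.popularity + self.exploit + self.uncertainty
                 + self.centrality + self.staleness)
        # Tolerance is an assumption; the document only says "must sum to 1.0".
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"weights sum to {total}, expected 1.0")


def clamp01(value: float) -> float:
    """Clamp a value to the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def composite_score(p: float, e: float, u: float, c: float, s: float,
                    weights: Weights = Weights()) -> float:
    """Score = wP*P + wE*E + wU*U + wC*C + wS*S, clamped to [0, 1]."""
    weights.validate()
    raw = (weights.popularity * clamp01(p)
           + weights.exploit * clamp01(e)
           + weights.uncertainty * clamp01(u)
           + weights.centrality * clamp01(c)
           + weights.staleness * clamp01(s))
    return clamp01(raw)
```

Because the weights sum to 1.0 and every factor lies in [0.0, 1.0], the final clamp only matters as a guard against misconfigured weights or rounding drift.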

Factor Details

Factor P: Popularity (Deployment Impact)

Measures how widely the unknown's package is deployed across monitored environments.

Formula:

P = min(1, log10(1 + deploymentCount) / log10(1 + maxDeployments))

Parameters:

deploymentCount: Number of deployments referencing the package (from deploy_refs table)
maxDeployments: Normalization ceiling (default: 100)

Rationale: Logarithmic scaling prevents a single highly-deployed package from dominating scores while still prioritizing widely-used dependencies.
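
A minimal sketch of the popularity formula, assuming a hypothetical function name `popularity` and treating negative counts as invalid input (the document does not say how they are handled):

```python
import math


def popularity(deployment_count: int, max_deployments: int = 100) -> float:
    """P = min(1, log10(1 + deploymentCount) / log10(1 + maxDeployments))."""
    if deployment_count < 0:
        raise ValueError("deployment_count must be non-negative")
    if max_deployments < 1:
        # log10(1 + 0) is 0, which would make the denominator zero.
        raise ValueError("max_deployments must be at least 1")
    ratio = math.log10(1 + deployment_count) / math.log10(1 + max_deployments)
    return min(1.0, ratio)
```

With the default ceiling of 100, a package with 1 deployment scores about 0.15 and one with 10 deployments about 0.52, while anything at or above 100 deployments saturates at 1.0.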

Factor E: Exploit Potential (CVE Severity)

Estimates how severe the consequences would be if the unknown resolved to a vulnerable component.

Current Implementation:

Returns 0.5 (medium potential) when no CVE association exists
Future: Integrate KEV lookup, EPSS scores, and exploit database references

Planned Enhancements:

CVE severity mapping (Critical=1.0, High=0.8, Medium=0.5, Low=0.2)
KEV (Known Exploited Vulnerabilities) flag boost
EPSS (Exploit Prediction Scoring System) integration
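
The planned severity mapping could look roughly like the sketch below. This illustrates a planned enhancement, not current behavior: the function name `exploit_potential`, the case-insensitive lookup, and the fallback to 0.5 for unrecognized labels are all assumptions. The KEV boost and EPSS integration are left out because the document does not specify how they would be combined.

```python
from typing import Optional

# Mapping taken from the Planned Enhancements list above.
SEVERITY_SCORES = {"critical": 1.0, "high": 0.8, "medium": 0.5, "low": 0.2}

# Current behavior: medium potential when no CVE association exists.
DEFAULT_EXPLOIT_POTENTIAL = 0.5


def exploit_potential(severity: Optional[str]) -> float:
    """Return E for a CVE severity label, or the default when none is known."""
    if severity is None:
        return DEFAULT_EXPLOIT_POTENTIAL
    # Assumption: unrecognized labels fall back to the default.
    return SEVERITY_SCORES.get(severity.strip().lower(), DEFAULT_EXPLOIT_POTENTIAL)
```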

Factor U: Uncertainty Density (Flag Accumulation)

Aggregates uncertainty signals from multiple sources. Each flag contributes a weighted penalty.

Flag Weights:

Flag	Weight	Description
`NoProvenanceAnchor`	0.30	Cannot verify package source
`VersionRange`	0.25	Version specified as range, not exact
`DynamicCallTarget`	0.25	Reflection, eval, or dynamic dispatch
`ConflictingFeeds`	0.20	Contradictory info from different feeds
`ExternalAssembly`	0.20	Assembly outside analysis scope
`MissingVector`	0.15	No CVSS vector for severity assessment
`UnreachableSourceAdvisory`	0.10	Source advisory URL unreachable

Formula:

U = min(1.0, sum(activeFlags × flagWeight))

Example:

NoProvenanceAnchor (0.30) + VersionRange (0.25) + MissingVector (0.15) = 0.70
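
The flag accumulation can be sketched as follows. The weights are copied from the table above; the function name `uncertainty`, the de-duplication of repeated flags, and the `KeyError` on unknown flag names are assumptions made for this example.

```python
FLAG_WEIGHTS = {
    "NoProvenanceAnchor": 0.30,
    "VersionRange": 0.25,
    "DynamicCallTarget": 0.25,
    "ConflictingFeeds": 0.20,
    "ExternalAssembly": 0.20,
    "MissingVector": 0.15,
    "UnreachableSourceAdvisory": 0.10,
}


def uncertainty(active_flags) -> float:
    """U = min(1.0, sum of the weights of all active flags)."""
    # Sorting gives a fixed summation order, so the float result is reproducible.
    unique_flags = sorted(set(active_flags))
    total = sum((FLAG_WEIGHTS[flag] for flag in unique_flags), 0.0)
    return min(1.0, total)
```

All seven flags together sum to 1.45, so the cap at 1.0 is reachable in practice and not just a safety net.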

Factor C: Centrality (Graph Position Importance)

Measures how important the unknown's position is in the call graph, using betweenness centrality.

Formula:

C = min(1.0, betweenness / maxBetweenness)

Parameters:

betweenness: Raw betweenness centrality from graph analysis
maxBetweenness: Normalization ceiling (default: 1000)

Rationale: High-betweenness nodes appear on many shortest paths, meaning they're likely to be reached regardless of entry point.

Related Metrics:

DegreeCentrality: Number of incoming + outgoing edges (stored but not used in score)
BetweennessCentrality: Raw betweenness value (stored for debugging)
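
A small sketch of the normalization step only; computing betweenness itself is left to the graph analysis. The function name `centrality` and the handling of negative or zero inputs are assumptions.

```python
def centrality(betweenness: float, max_betweenness: float = 1000.0) -> float:
    """C = min(1.0, betweenness / maxBetweenness), floored at 0.0."""
    if max_betweenness <= 0:
        raise ValueError("max_betweenness must be positive")
    # Assumption: negative raw values are treated as 0.0 rather than rejected.
    return max(0.0, min(1.0, betweenness / max_betweenness))
```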

Factor S: Staleness (Evidence Age)

Measures how much time has passed since the last successful analysis; older evidence scores higher.

Formula:

S = min(1.0, daysSinceLastAnalysis / maxDays)

Optional exponential variant, which approaches 1.0 asymptotically as the evidence ages:

S = 1 - exp(-daysSinceLastAnalysis / tau)

Parameters:

daysSinceLastAnalysis: Days since LastAnalyzedAt timestamp
maxDays: Staleness ceiling (default: 14 days)
tau: Decay constant for exponential model (default: 14)

Special Cases:

Never analyzed (LastAnalyzedAt is null): S = 1.0 (maximum staleness)
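
Both staleness variants, including the never-analyzed case, can be sketched as below. The function names are hypothetical, and the use of fractional days is an assumption; an implementation that works on whole days would round before dividing.

```python
import math
from datetime import datetime, timezone
from typing import Optional

SECONDS_PER_DAY = 86400.0


def _days_since(last_analyzed_at: datetime, now: datetime) -> float:
    """Elapsed days, floored at 0 to tolerate small clock skew."""
    return max(0.0, (now - last_analyzed_at).total_seconds() / SECONDS_PER_DAY)


def staleness_linear(last_analyzed_at: Optional[datetime], now: datetime,
                     max_days: float = 14.0) -> float:
    """S = min(1.0, daysSinceLastAnalysis / maxDays); 1.0 if never analyzed."""
    if last_analyzed_at is None:
        return 1.0
    return min(1.0, _days_since(last_analyzed_at, now) / max_days)


def staleness_exponential(last_analyzed_at: Optional[datetime], now: datetime,
                          tau: float = 14.0) -> float:
    """S = 1 - exp(-daysSinceLastAnalysis / tau); 1.0 if never analyzed."""
    if last_analyzed_at is None:
        return 1.0
    return 1.0 - math.exp(-_days_since(last_analyzed_at, now) / tau)
```

Passing `now` in explicitly, as a UTC `datetime`, keeps the function reproducible for a given input, which matters for trace replay. At 7 days the linear model yields 0.5, whereas the exponential model with tau = 14 yields roughly 0.39, so the two variants are not interchangeable without retuning thresholds.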

Band Assignment

Based on the composite score, unknowns are assigned to triage bands:

Band	Threshold	Rescan Policy	Description
HOT	Score >= 0.70	15 minutes	Immediate rescan + VEX escalation
WARM	0.40 <= Score < 0.70	24 hours	Scheduled rescan within 12-72h
COLD	Score < 0.40	7 days	Weekly batch processing

Thresholds are configurable:

Signals:
  UnknownsScoring:
    HotThreshold: 0.70
    WarmThreshold: 0.40
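
The threshold logic amounts to two comparisons. A minimal sketch, with a hypothetical function name and upper-case band labels chosen for this example:

```python
def assign_band(score: float, hot_threshold: float = 0.70,
                warm_threshold: float = 0.40) -> str:
    """Map a composite score to HOT, WARM, or COLD (lower bounds inclusive)."""
    if warm_threshold > hot_threshold:
        raise ValueError("warm_threshold must not exceed hot_threshold")
    if score >= hot_threshold:
        return "HOT"
    if score >= warm_threshold:
        return "WARM"
    return "COLD"
```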

Scheduler Integration

The UnknownsRescanWorker processes unknowns based on their band:

HOT Band Processing

Poll interval: 1 minute
Batch size: 10 items
Action: Trigger immediate rescan via IRescanOrchestrator
On failure: Exponential backoff, max 3 retries before demotion to WARM

WARM Band Processing

Poll interval: 5 minutes
Batch size: 50 items
Scheduled window: 12-72 hours based on score within band
On failure: Increment RescanAttempts, re-queue with delay

COLD Band Processing

Schedule: Weekly on configurable day (default: Sunday)
Batch size: 500 items
Action: Batch rescan job submission
On failure: Log and retry next week
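
The per-band scheduling rules could be approximated as in the sketch below. It uses the default rescan intervals (15 minutes, 24 hours, 7 days) and the HOT retry limit of 3. The function names, the one-minute backoff base, and the doubling factor are assumptions, since the document only says "exponential backoff". The score-dependent 12-72 hour WARM window is not modeled here.

```python
from datetime import datetime, timedelta, timezone

RESCAN_INTERVALS = {
    "HOT": timedelta(minutes=15),
    "WARM": timedelta(hours=24),
    "COLD": timedelta(days=7),
}
MAX_HOT_RETRIES = 3


def next_scheduled_rescan(band: str, now: datetime) -> datetime:
    """Return the next rescan time for a band, relative to `now` (UTC)."""
    return now + RESCAN_INTERVALS[band]


def hot_backoff_delay(attempt: int,
                      base: timedelta = timedelta(minutes=1)) -> timedelta:
    """Delay before retry number `attempt` (1-based): base, 2x base, 4x base."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return base * (2 ** (attempt - 1))


def band_after_failure(band: str, failed_retries: int) -> str:
    """Demote HOT to WARM once the retry budget is exhausted."""
    if band == "HOT" and failed_retries >= MAX_HOT_RETRIES:
        return "WARM"
    return band
```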

Normalization Trace

Each scored unknown includes a NormalizationTrace for debugging and replay:

{
  "rawPopularity": 42,
  "normalizedPopularity": 0.815,
  "popularityFormula": "min(1, log10(1 + 42) / log10(1 + 100))",

  "rawExploitPotential": 0.5,
  "normalizedExploitPotential": 0.5,

  "rawUncertainty": 0.55,
  "normalizedUncertainty": 0.55,
  "activeFlags": ["NoProvenanceAnchor", "VersionRange"],

  "rawCentrality": 250.0,
  "normalizedCentrality": 0.25,

  "rawStaleness": 7,
  "normalizedStaleness": 0.5,

  "weights": {
    "wP": 0.25,
    "wE": 0.25,
    "wU": 0.25,
    "wC": 0.15,
    "wS": 0.10
  },
  "finalScore": 0.55,
  "assignedBand": "Warm",
  "computedAt": "2025-12-15T10:00:00Z"
}

Replay Capability: Given the trace, the score can be recomputed from the normalized values and weights:

Score = 0.25×0.815 + 0.25×0.5 + 0.25×0.55 + 0.15×0.25 + 0.10×0.5
      = 0.20375 + 0.125 + 0.1375 + 0.0375 + 0.05
      = 0.55375 ≈ 0.55
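
A replay check along these lines could be used to audit stored traces. It reads only the field names shown in the trace above; the function names and the 0.01 tolerance (chosen because `finalScore` appears rounded to two decimals) are assumptions.

```python
# Pairs of (weight key, normalized factor key) as they appear in the trace.
FACTOR_KEYS = (
    ("wP", "normalizedPopularity"),
    ("wE", "normalizedExploitPotential"),
    ("wU", "normalizedUncertainty"),
    ("wC", "normalizedCentrality"),
    ("wS", "normalizedStaleness"),
)


def replay_score(trace: dict) -> float:
    """Recompute the composite score from a NormalizationTrace dictionary."""
    weights = trace["weights"]
    raw = sum(weights[w_key] * trace[n_key] for w_key, n_key in FACTOR_KEYS)
    return max(0.0, min(1.0, raw))


def trace_is_consistent(trace: dict, tolerance: float = 0.01) -> bool:
    """True if the stored finalScore matches the replayed score within tolerance."""
    return abs(replay_score(trace) - trace["finalScore"]) <= tolerance
```

Iterating over a fixed tuple keeps the summation order stable, so repeated replays of the same trace produce bit-identical floats.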

API Endpoints

Query Unknowns by Band

GET /api/signals/unknowns?band=hot&limit=50&offset=0

Response:

{
  "items": [
    {
      "id": "unk-123",
      "subjectKey": "myapp|1.0.0",
      "purl": "pkg:npm/lodash@4.17.21",
      "score": 0.82,
      "band": "Hot",
      "flags": { "noProvenanceAnchor": true, "versionRange": true },
      "nextScheduledRescan": "2025-12-15T10:15:00Z"
    }
  ],
  "total": 15,
  "hasMore": false
}
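
A client might build the query URL and page through results as sketched below. The base URL is a placeholder, the helper names are hypothetical, and accepting `warm` and `cold` as `band` values is an assumption, since only `band=hot` appears in the example above. No HTTP request is made here, and authentication is out of scope.

```python
from typing import Optional
from urllib.parse import urlencode

VALID_BANDS = ("hot", "warm", "cold")


def unknowns_query_url(base_url: str, band: str,
                       limit: int = 50, offset: int = 0) -> str:
    """Build the GET /api/signals/unknowns URL for one page of results."""
    if band not in VALID_BANDS:
        raise ValueError(f"band must be one of {VALID_BANDS}")
    query = urlencode({"band": band, "limit": limit, "offset": offset})
    return f"{base_url.rstrip('/')}/api/signals/unknowns?{query}"


def next_offset(page: dict, offset: int) -> Optional[int]:
    """Return the offset of the next page, or None when hasMore is false."""
    if not page["hasMore"]:
        return None
    return offset + len(page["items"])
```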

Get Score Explanation

GET /api/signals/unknowns/{id}/explain

Response:

{
  "unknown": { /* full UnknownSymbolDocument */ },
  "normalizationTrace": { /* trace object */ },
  "factorBreakdown": {
    "popularity": { "raw": 42, "normalized": 0.815, "weighted": 0.20375 },
    "exploitPotential": { "raw": 0.5, "normalized": 0.5, "weighted": 0.125 },
    "uncertainty": { "raw": 0.55, "normalized": 0.55, "weighted": 0.1375 },
    "centrality": { "raw": 250, "normalized": 0.25, "weighted": 0.0375 },
    "staleness": { "raw": 7, "normalized": 0.5, "weighted": 0.05 }
  },
  "bandThresholds": { "hot": 0.70, "warm": 0.40 }
}

Configuration Reference

Signals:
  UnknownsScoring:
    # Factor weights (must sum to 1.0)
    WeightPopularity: 0.25
    WeightExploitPotential: 0.25
    WeightUncertainty: 0.25
    WeightCentrality: 0.15
    WeightStaleness: 0.10

    # Popularity normalization
    PopularityMaxDeployments: 100

    # Uncertainty flag weights
    FlagWeightNoProvenance: 0.30
    FlagWeightVersionRange: 0.25
    FlagWeightConflictingFeeds: 0.20
    FlagWeightMissingVector: 0.15
    FlagWeightUnreachableSource: 0.10
    FlagWeightDynamicTarget: 0.25
    FlagWeightExternalAssembly: 0.20

    # Centrality normalization
    CentralityMaxBetweenness: 1000.0

    # Staleness normalization
    StalenessMaxDays: 14
    StalenessTau: 14  # For exponential decay

    # Band thresholds
    HotThreshold: 0.70
    WarmThreshold: 0.40

    # Rescan scheduling
    HotRescanMinutes: 15
    WarmRescanHours: 24
    ColdRescanDays: 7

  UnknownsDecay:
    # Nightly batch decay
    BatchEnabled: true
    MaxSubjectsPerBatch: 1000
    ColdBatchDay: Sunday

Determinism Requirements

The scoring algorithm is fully deterministic:

Same inputs produce identical scores - Given identical UnknownSymbolDocument, deployment counts, and graph metrics, the score will always be the same
Normalization trace enables replay - The trace contains all raw values and weights needed to reproduce the score
Timestamps use UTC ISO 8601 - All ComputedAt, LastAnalyzedAt, and NextScheduledRescan timestamps are UTC
Weights logged per computation - The trace includes the exact weights used, allowing audit of configuration changes

Database Schema

-- Unknowns table (enhanced)
CREATE TABLE signals.unknowns (
    id UUID PRIMARY KEY,
    subject_key TEXT NOT NULL,
    purl TEXT,
    symbol_id TEXT,
    callgraph_id TEXT,

    -- Scoring factors
    popularity_score FLOAT DEFAULT 0,
    deployment_count INT DEFAULT 0,
    exploit_potential_score FLOAT DEFAULT 0,
    uncertainty_score FLOAT DEFAULT 0,
    centrality_score FLOAT DEFAULT 0,
    degree_centrality INT DEFAULT 0,
    betweenness_centrality FLOAT DEFAULT 0,
    staleness_score FLOAT DEFAULT 0,
    days_since_last_analysis INT DEFAULT 0,

    -- Composite score and band
    score FLOAT DEFAULT 0,
    band TEXT DEFAULT 'cold' CHECK (band IN ('hot', 'warm', 'cold')),

    -- Metadata
    flags JSONB DEFAULT '{}',
    normalization_trace JSONB,
    rescan_attempts INT DEFAULT 0,
    last_rescan_result TEXT,

    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_analyzed_at TIMESTAMPTZ,
    next_scheduled_rescan TIMESTAMPTZ
);

-- Indexes for band-based queries
CREATE INDEX idx_unknowns_band ON signals.unknowns(band);
CREATE INDEX idx_unknowns_score ON signals.unknowns(score DESC);
CREATE INDEX idx_unknowns_next_rescan ON signals.unknowns(next_scheduled_rescan)
    WHERE next_scheduled_rescan IS NOT NULL;
CREATE INDEX idx_unknowns_subject ON signals.unknowns(subject_key);

Metrics and Observability

The following metrics are exposed for monitoring:

Metric	Type	Description
`signals_unknowns_total`	Gauge	Total unknowns by band
`signals_unknowns_rescans_total`	Counter	Rescans triggered by band
`signals_unknowns_scoring_duration_seconds`	Histogram	Scoring computation time
`signals_unknowns_band_transitions_total`	Counter	Band changes (e.g., WARM->HOT)

Related Documentation

Unknowns Registry - Data model and API for unknowns
Reachability Analysis - Reachability scoring integration
Callgraph Schema - Graph structure for centrality computation
