Unknowns Ranking Algorithm Reference
This document describes the multi-factor scoring algorithm used to rank and triage unknowns in the StellaOps Signals module.
Purpose
When reachability analysis encounters unresolved symbols, edges, or package identities, these are recorded as unknowns. The ranking algorithm prioritizes unknowns by computing a composite score from five factors, then assigns each to a triage band (HOT/WARM/COLD) that determines rescan scheduling and escalation policies.
Scoring Formula
The composite score is computed as:
Score = wP × P + wE × E + wU × U + wC × C + wS × S
Where:
- P = Popularity (deployment impact)
- E = Exploit potential (CVE severity)
- U = Uncertainty density (flag accumulation)
- C = Centrality (graph position importance)
- S = Staleness (evidence age)
All factors are normalized to [0.0, 1.0] before weighting. The final score is clamped to [0.0, 1.0].
Default Weights
| Factor | Weight | Description |
|---|---|---|
| wP | 0.25 | Popularity weight |
| wE | 0.25 | Exploit potential weight |
| wU | 0.25 | Uncertainty density weight |
| wC | 0.15 | Centrality weight |
| wS | 0.10 | Staleness weight |
Weights must sum to 1.0 and are configurable via Signals:UnknownsScoring settings.
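The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not the Signals implementation: the names `Weights`, `clamp01`, and `composite_score` are hypothetical, and the tolerance used for the sum-to-1.0 check is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Default factor weights from the table above (hypothetical container)."""
    popularity: float = 0.25
    exploit: float = 0.25
    uncertainty: float = 0.25
    centrality: float = 0.15
    staleness: float = 0.10

    def total(self) -> float:
        return (self.popularity + self.exploit + self.uncertainty
                + self.centrality + self.staleness)


def clamp01(value: float) -> float:
    """Clamp a value to the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def composite_score(p: float, e: float, u: float, c: float, s: float,
                    w: Weights = Weights()) -> float:
    """Score = wP*P + wE*E + wU*U + wC*C + wS*S, clamped to [0.0, 1.0]."""
    # Assumption: a small tolerance absorbs floating-point rounding in the sum.
    if abs(w.total() - 1.0) > 1e-9:
        raise ValueError("factor weights must sum to 1.0")
    raw = (w.popularity * clamp01(p)
           + w.exploit * clamp01(e)
           + w.uncertainty * clamp01(u)
           + w.centrality * clamp01(c)
           + w.staleness * clamp01(s))
    return clamp01(raw)
```

Because every factor is normalized before weighting and the weights sum to 1.0, the final clamp only guards against rounding drift.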
Factor Details
Factor P: Popularity (Deployment Impact)
Measures how widely the unknown's package is deployed across monitored environments.
Formula:
P = min(1, log10(1 + deploymentCount) / log10(1 + maxDeployments))
Parameters:
- deploymentCount: Number of deployments referencing the package (from deploy_refs table)
- maxDeployments: Normalization ceiling (default: 100)
Rationale: Logarithmic scaling prevents a single highly-deployed package from dominating scores while still prioritizing widely-used dependencies.
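A small sketch of the popularity formula, assuming a hypothetical function name `popularity` and treating negative counts as zero (the source does not say how invalid counts are handled):

```python
import math


def popularity(deployment_count: int, max_deployments: int = 100) -> float:
    """P = min(1, log10(1 + deploymentCount) / log10(1 + maxDeployments))."""
    if max_deployments <= 0:
        raise ValueError("max_deployments must be positive")
    # Assumption: negative counts are treated as zero deployments.
    count = max(0, deployment_count)
    return min(1.0, math.log10(1 + count) / math.log10(1 + max_deployments))


# The logarithm compresses the upper range: 9 deployments already score
# about 0.50, and anything at or above the ceiling scores exactly 1.0.
few = popularity(9)
many = popularity(5000)
```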
Factor E: Exploit Potential (CVE Severity)
Estimates the consequence severity if the unknown resolves to a vulnerable component.
Current Implementation:
- Returns 0.5 (medium potential) when no CVE association exists
- Future: Integrate KEV lookup, EPSS scores, and exploit database references
Planned Enhancements:
- CVE severity mapping (Critical=1.0, High=0.8, Medium=0.5, Low=0.2)
- KEV (Known Exploited Vulnerabilities) flag boost
- EPSS (Exploit Prediction Scoring System) integration
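The planned severity mapping could look roughly like the sketch below. It is illustrative only: the function name `exploit_potential` and the case-insensitive label matching are assumptions, and the KEV and EPSS adjustments are omitted because the source does not define them.

```python
from typing import Optional

# Planned mapping from the list above; not part of the current implementation.
SEVERITY_SCORES = {"critical": 1.0, "high": 0.8, "medium": 0.5, "low": 0.2}

# Current behaviour: medium potential when no CVE association exists.
DEFAULT_EXPLOIT_POTENTIAL = 0.5


def exploit_potential(severity: Optional[str]) -> float:
    """Return E for a CVE severity label, or the default when none is known."""
    if severity is None:
        return DEFAULT_EXPLOIT_POTENTIAL
    # Assumption: unrecognised labels fall back to the default as well.
    return SEVERITY_SCORES.get(severity.strip().lower(), DEFAULT_EXPLOIT_POTENTIAL)
```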
Factor U: Uncertainty Density (Flag Accumulation)
Aggregates uncertainty signals from multiple sources. Each flag contributes a weighted penalty.
Flag Weights:
| Flag | Weight | Description |
|---|---|---|
| NoProvenanceAnchor | 0.30 | Cannot verify package source |
| VersionRange | 0.25 | Version specified as range, not exact |
| DynamicCallTarget | 0.25 | Reflection, eval, or dynamic dispatch |
| ConflictingFeeds | 0.20 | Contradictory info from different feeds |
| ExternalAssembly | 0.20 | Assembly outside analysis scope |
| MissingVector | 0.15 | No CVSS vector for severity assessment |
| UnreachableSourceAdvisory | 0.10 | Source advisory URL unreachable |
Formula:
U = min(1.0, sum(activeFlags × flagWeight))
Example:
- NoProvenanceAnchor (0.30) + VersionRange (0.25) + MissingVector (0.15) = 0.70
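The flag accumulation can be sketched as follows. The `FLAG_WEIGHTS` dictionary copies the default weights from the table; the function name `uncertainty` and the decision to reject unknown flag names are assumptions made for this illustration.

```python
from typing import Iterable

# Default flag weights from the table above.
FLAG_WEIGHTS = {
    "NoProvenanceAnchor": 0.30,
    "VersionRange": 0.25,
    "DynamicCallTarget": 0.25,
    "ConflictingFeeds": 0.20,
    "ExternalAssembly": 0.20,
    "MissingVector": 0.15,
    "UnreachableSourceAdvisory": 0.10,
}


def uncertainty(active_flags: Iterable[str]) -> float:
    """U = min(1.0, sum of the weights of all active flags)."""
    flags = set(active_flags)  # each flag counts at most once
    unknown = flags - FLAG_WEIGHTS.keys()
    if unknown:
        # Assumption: unknown flag names are an error rather than ignored.
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    # Sorting keeps the summation order stable, so results are reproducible.
    return min(1.0, sum((FLAG_WEIGHTS[f] for f in sorted(flags)), 0.0))
```

With all seven flags active the raw sum is 1.45, so the cap at 1.0 matters in practice.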
Factor C: Centrality (Graph Position Importance)
Measures how important the unknown's position is in the call graph, using betweenness centrality.
Formula:
C = min(1.0, betweenness / maxBetweenness)
Parameters:
- betweenness: Raw betweenness centrality from graph analysis
- maxBetweenness: Normalization ceiling (default: 1000)
Rationale: High-betweenness nodes appear on many shortest paths, meaning they're likely to be reached regardless of entry point.
Related Metrics:
- DegreeCentrality: Number of incoming + outgoing edges (stored but not used in score)
- BetweennessCentrality: Raw betweenness value (stored for debugging)
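The normalization step is a plain ratio with a cap. The sketch below assumes a hypothetical function name `centrality` and takes the raw betweenness value as already computed by graph analysis.

```python
def centrality(betweenness: float, max_betweenness: float = 1000.0) -> float:
    """C = min(1.0, betweenness / maxBetweenness)."""
    if max_betweenness <= 0:
        raise ValueError("max_betweenness must be positive")
    # Assumption: negative raw values are treated as zero.
    return min(1.0, max(0.0, betweenness) / max_betweenness)


# A raw betweenness of 250 against the default ceiling of 1000 gives 0.25.
example = centrality(250.0)
```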
Factor S: Staleness (Evidence Age)
Measures the age of the evidence, in days since the last successful analysis attempt.
Formula:
S = min(1.0, daysSinceLastAnalysis / maxDays)
With exponential decay enhancement (optional):
S = 1 - exp(-daysSinceLastAnalysis / tau)
Parameters:
- daysSinceLastAnalysis: Days since LastAnalyzedAt timestamp
- maxDays: Staleness ceiling (default: 14 days)
- tau: Decay constant for exponential model (default: 14)
Special Cases:
- Never analyzed (LastAnalyzedAt is null): S = 1.0 (maximum staleness)
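Both staleness variants and the null special case fit in one function. This is a sketch under assumptions: the name `staleness` and the `exponential` switch are hypothetical, elapsed time is counted in whole days, and the current time is passed in explicitly so that the result stays reproducible.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Optional


def staleness(last_analyzed_at: Optional[datetime], now: datetime,
              max_days: float = 14.0, tau: float = 14.0,
              exponential: bool = False) -> float:
    """Linear: min(1.0, days / maxDays). Exponential: 1 - exp(-days / tau)."""
    if last_analyzed_at is None:
        return 1.0  # never analyzed: maximum staleness
    # Assumption: whole elapsed days, clamped at zero for future timestamps.
    days = max(0, (now - last_analyzed_at).days)
    if exponential:
        return 1.0 - math.exp(-days / tau)
    return min(1.0, days / max_days)


now = datetime(2025, 12, 15, 10, 0, tzinfo=timezone.utc)
week_old = staleness(now - timedelta(days=7), now)  # 7 / 14 = 0.5
never = staleness(None, now)                        # 1.0
```

The linear form saturates at `max_days`, while the exponential form approaches 1.0 asymptotically and reaches about 0.63 when the age equals `tau`.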
Band Assignment
Based on the composite score, unknowns are assigned to triage bands:
| Band | Threshold | Rescan Policy | Description |
|---|---|---|---|
| HOT | Score >= 0.70 | 15 minutes | Immediate rescan + VEX escalation |
| WARM | 0.40 <= Score < 0.70 | 24 hours | Scheduled rescan within 12-72h |
| COLD | Score < 0.40 | 7 days | Weekly batch processing |
Thresholds are configurable:
Signals:
  UnknownsScoring:
    HotThreshold: 0.70
    WarmThreshold: 0.40
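The threshold comparison from the band table can be sketched as below. The function name `assign_band` and the ordering check on the thresholds are assumptions for illustration.

```python
def assign_band(score: float, hot_threshold: float = 0.70,
                warm_threshold: float = 0.40) -> str:
    """Map a composite score to HOT, WARM, or COLD."""
    # Assumption: a configuration with warm > hot is rejected outright.
    if not 0.0 <= warm_threshold <= hot_threshold <= 1.0:
        raise ValueError("expected 0.0 <= warm_threshold <= hot_threshold <= 1.0")
    if score >= hot_threshold:
        return "HOT"
    if score >= warm_threshold:
        return "WARM"
    return "COLD"
```

Both lower bounds are inclusive, so a score of exactly 0.70 is HOT and a score of exactly 0.40 is WARM.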
Scheduler Integration
The UnknownsRescanWorker processes unknowns based on their band:
HOT Band Processing
- Poll interval: 1 minute
- Batch size: 10 items
- Action: Trigger immediate rescan via IRescanOrchestrator
- On failure: Exponential backoff, max 3 retries before demotion to WARM
WARM Band Processing
- Poll interval: 5 minutes
- Batch size: 50 items
- Scheduled window: 12-72 hours based on score within band
- On failure: Increment RescanAttempts, re-queue with delay
COLD Band Processing
- Schedule: Weekly on configurable day (default: Sunday)
- Batch size: 500 items
- Action: Batch rescan job submission
- On failure: Log and retry next week
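The source does not specify how a WARM score maps onto the 12-72 hour window. One plausible reading is a linear interpolation in which higher scores are rescanned sooner; the sketch below shows that reading only. The function name `warm_rescan_delay_hours`, the direction of the mapping, and the linear shape are all assumptions.

```python
def warm_rescan_delay_hours(score: float,
                            warm_threshold: float = 0.40,
                            hot_threshold: float = 0.70,
                            min_hours: float = 12.0,
                            max_hours: float = 72.0) -> float:
    """Interpolate a rescan delay inside the WARM window (assumed linear)."""
    if not warm_threshold <= score < hot_threshold:
        raise ValueError("score is outside the WARM band")
    # Position of the score inside the band: 0.0 at the WARM threshold,
    # approaching 1.0 just below the HOT threshold.
    position = (score - warm_threshold) / (hot_threshold - warm_threshold)
    # Assumption: higher scores get shorter delays.
    return max_hours - position * (max_hours - min_hours)
```

Under this assumption a score at the bottom of the band waits the full 72 hours, and a score in the middle of the band waits about 42 hours.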
Normalization Trace
Each scored unknown includes a NormalizationTrace for debugging and replay:
{
"rawPopularity": 42,
"normalizedPopularity": 0.81,
"popularityFormula": "min(1, log10(1 + 42) / log10(1 + 100))",
"rawExploitPotential": 0.5,
"normalizedExploitPotential": 0.5,
"rawUncertainty": 0.55,
"normalizedUncertainty": 0.55,
"activeFlags": ["NoProvenanceAnchor", "VersionRange"],
"rawCentrality": 250.0,
"normalizedCentrality": 0.25,
"rawStaleness": 7,
"normalizedStaleness": 0.5,
"weights": {
"wP": 0.25,
"wE": 0.25,
"wU": 0.25,
"wC": 0.15,
"wS": 0.10
},
"finalScore": 0.55,
"assignedBand": "Warm",
"computedAt": "2025-12-15T10:00:00Z"
}
Replay Capability: Given the trace, the exact score can be recomputed:
Score = 0.25×0.81 + 0.25×0.5 + 0.25×0.55 + 0.15×0.25 + 0.10×0.5
= 0.2025 + 0.125 + 0.1375 + 0.0375 + 0.05
= 0.5525 ≈ 0.55
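Replay from a stored trace amounts to re-applying the logged weights to the logged normalized values. The sketch below assumes the trace has been parsed into a Python dictionary with the field names shown above; the function name `replay_score` and the sample trace values are hypothetical.

```python
from typing import Any, Dict

# Pairs of (weight key, normalized factor key) as they appear in the trace.
FACTOR_FIELDS = (
    ("wP", "normalizedPopularity"),
    ("wE", "normalizedExploitPotential"),
    ("wU", "normalizedUncertainty"),
    ("wC", "normalizedCentrality"),
    ("wS", "normalizedStaleness"),
)


def replay_score(trace: Dict[str, Any]) -> float:
    """Recompute the composite score from a NormalizationTrace dictionary."""
    weights = trace["weights"]
    total = sum(weights[w] * trace[field] for w, field in FACTOR_FIELDS)
    return max(0.0, min(1.0, total))


# Hypothetical trace used only to exercise the function.
sample_trace = {
    "normalizedPopularity": 0.2,
    "normalizedExploitPotential": 0.5,
    "normalizedUncertainty": 0.3,
    "normalizedCentrality": 0.1,
    "normalizedStaleness": 1.0,
    "weights": {"wP": 0.25, "wE": 0.25, "wU": 0.25, "wC": 0.15, "wS": 0.10},
}
replayed = replay_score(sample_trace)
```

Because the trace carries its own weights, a replay stays valid even after the live configuration has changed.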
API Endpoints
Query Unknowns by Band
GET /api/signals/unknowns?band=hot&limit=50&offset=0
Response:
{
"items": [
{
"id": "unk-123",
"subjectKey": "myapp|1.0.0",
"purl": "pkg:npm/lodash@4.17.21",
"score": 0.82,
"band": "Hot",
"flags": { "noProvenanceAnchor": true, "versionRange": true },
"nextScheduledRescan": "2025-12-15T10:15:00Z"
}
],
"total": 15,
"hasMore": false
}
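A client might build the query URL and page through results as sketched below. Only the path, the query parameters, and the `hasMore` field come from this document; the helper names `unknowns_query_url` and `next_offset`, and the idea of advancing the offset by the page size, are assumptions.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

VALID_BANDS = {"hot", "warm", "cold"}


def unknowns_query_url(base_url: str, band: str,
                       limit: int = 50, offset: int = 0) -> str:
    """Build the GET URL for listing unknowns in one band."""
    band = band.lower()
    if band not in VALID_BANDS:
        raise ValueError(f"band must be one of {sorted(VALID_BANDS)}")
    query = urlencode({"band": band, "limit": limit, "offset": offset})
    return f"{base_url.rstrip('/')}/api/signals/unknowns?{query}"


def next_offset(response: Dict[str, Any], offset: int, limit: int) -> Optional[int]:
    """Return the offset of the next page, or None when hasMore is false."""
    # Assumption: pages are contiguous, so the next page starts at offset + limit.
    return offset + limit if response.get("hasMore") else None
```

The base URL is supplied by the caller; this document does not define a host name.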
Get Score Explanation
GET /api/signals/unknowns/{id}/explain
Response:
{
"unknown": { /* full UnknownSymbolDocument */ },
"normalizationTrace": { /* trace object */ },
"factorBreakdown": {
"popularity": { "raw": 42, "normalized": 0.81, "weighted": 0.2025 },
"exploitPotential": { "raw": 0.5, "normalized": 0.5, "weighted": 0.125 },
"uncertainty": { "raw": 0.55, "normalized": 0.55, "weighted": 0.1375 },
"centrality": { "raw": 250, "normalized": 0.25, "weighted": 0.0375 },
"staleness": { "raw": 7, "normalized": 0.5, "weighted": 0.05 }
},
"bandThresholds": { "hot": 0.70, "warm": 0.40 }
}
Configuration Reference
Signals:
  UnknownsScoring:
    # Factor weights (must sum to 1.0)
    WeightPopularity: 0.25
    WeightExploitPotential: 0.25
    WeightUncertainty: 0.25
    WeightCentrality: 0.15
    WeightStaleness: 0.10
    # Popularity normalization
    PopularityMaxDeployments: 100
    # Uncertainty flag weights
    FlagWeightNoProvenance: 0.30
    FlagWeightVersionRange: 0.25
    FlagWeightConflictingFeeds: 0.20
    FlagWeightMissingVector: 0.15
    FlagWeightUnreachableSource: 0.10
    FlagWeightDynamicTarget: 0.25
    FlagWeightExternalAssembly: 0.20
    # Centrality normalization
    CentralityMaxBetweenness: 1000.0
    # Staleness normalization
    StalenessMaxDays: 14
    StalenessTau: 14 # For exponential decay
    # Band thresholds
    HotThreshold: 0.70
    WarmThreshold: 0.40
    # Rescan scheduling
    HotRescanMinutes: 15
    WarmRescanHours: 24
    ColdRescanDays: 7
  UnknownsDecay:
    # Nightly batch decay
    BatchEnabled: true
    MaxSubjectsPerBatch: 1000
    ColdBatchDay: Sunday
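A startup check for these settings might look like the sketch below. Only the rule that the weights must sum to 1.0 comes from this document; the per-weight range check, the threshold ordering check, and the function name `validate_scoring_settings` are assumptions. The settings are taken as an already-parsed dictionary, so no YAML library is involved.

```python
from typing import Dict, List

WEIGHT_KEYS = (
    "WeightPopularity",
    "WeightExploitPotential",
    "WeightUncertainty",
    "WeightCentrality",
    "WeightStaleness",
)

# Default values copied from the configuration reference above.
DEFAULT_SETTINGS: Dict[str, float] = {
    "WeightPopularity": 0.25,
    "WeightExploitPotential": 0.25,
    "WeightUncertainty": 0.25,
    "WeightCentrality": 0.15,
    "WeightStaleness": 0.10,
    "HotThreshold": 0.70,
    "WarmThreshold": 0.40,
}


def validate_scoring_settings(settings: Dict[str, float],
                              tolerance: float = 1e-9) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    weights = [settings[key] for key in WEIGHT_KEYS]
    # Assumption: individual weights are expected to lie in [0.0, 1.0].
    if any(w < 0.0 or w > 1.0 for w in weights):
        errors.append("each factor weight must lie in [0.0, 1.0]")
    if abs(sum(weights) - 1.0) > tolerance:
        errors.append("factor weights must sum to 1.0")
    hot, warm = settings["HotThreshold"], settings["WarmThreshold"]
    # Assumption: the WARM threshold may not exceed the HOT threshold.
    if not 0.0 <= warm <= hot <= 1.0:
        errors.append("expected 0.0 <= WarmThreshold <= HotThreshold <= 1.0")
    return errors
```

Returning all problems at once, rather than raising on the first, makes a misconfigured deployment easier to diagnose in one pass.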
Determinism Requirements
The scoring algorithm is fully deterministic:
- Same inputs produce identical scores - Given identical UnknownSymbolDocument, deployment counts, and graph metrics, the score will always be the same
- Normalization trace enables replay - The trace contains all raw values and weights needed to reproduce the score
- Timestamps use UTC ISO 8601 - All ComputedAt, LastAnalyzedAt, and NextScheduledRescan timestamps are UTC
- Weights logged per computation - The trace includes the exact weights used, allowing audit of configuration changes
Database Schema
-- Unknowns table (enhanced)
CREATE TABLE signals.unknowns (
id UUID PRIMARY KEY,
subject_key TEXT NOT NULL,
purl TEXT,
symbol_id TEXT,
callgraph_id TEXT,
-- Scoring factors
popularity_score FLOAT DEFAULT 0,
deployment_count INT DEFAULT 0,
exploit_potential_score FLOAT DEFAULT 0,
uncertainty_score FLOAT DEFAULT 0,
centrality_score FLOAT DEFAULT 0,
degree_centrality INT DEFAULT 0,
betweenness_centrality FLOAT DEFAULT 0,
staleness_score FLOAT DEFAULT 0,
days_since_last_analysis INT DEFAULT 0,
-- Composite score and band
score FLOAT DEFAULT 0,
band TEXT DEFAULT 'cold' CHECK (band IN ('hot', 'warm', 'cold')),
-- Metadata
flags JSONB DEFAULT '{}',
normalization_trace JSONB,
rescan_attempts INT DEFAULT 0,
last_rescan_result TEXT,
-- Timestamps
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
last_analyzed_at TIMESTAMPTZ,
next_scheduled_rescan TIMESTAMPTZ
);
-- Indexes for band-based queries
CREATE INDEX idx_unknowns_band ON signals.unknowns(band);
CREATE INDEX idx_unknowns_score ON signals.unknowns(score DESC);
CREATE INDEX idx_unknowns_next_rescan ON signals.unknowns(next_scheduled_rescan)
WHERE next_scheduled_rescan IS NOT NULL;
CREATE INDEX idx_unknowns_subject ON signals.unknowns(subject_key);
Metrics and Observability
The following metrics are exposed for monitoring:
| Metric | Type | Description |
|---|---|---|
| signals_unknowns_total | Gauge | Total unknowns by band |
| signals_unknowns_rescans_total | Counter | Rescans triggered by band |
| signals_unknowns_scoring_duration_seconds | Histogram | Scoring computation time |
| signals_unknowns_band_transitions_total | Counter | Band changes (e.g., WARM->HOT) |
Related Documentation
- Unknowns Registry - Data model and API for unknowns
- Reachability Analysis - Reachability scoring integration
- Callgraph Schema - Graph structure for centrality computation