Add call graph fixtures for various languages and scenarios

- Introduced `all-edge-reasons.json` to test edge resolution reasons in .NET.
- Added `all-visibility-levels.json` to validate method visibility levels in .NET.
- Created `dotnet-aspnetcore-minimal.json` for a minimal ASP.NET Core application.
- Included `go-gin-api.json` for a Go Gin API application structure.
- Added `java-spring-boot.json` for the Spring PetClinic application in Java.
- Introduced `legacy-no-schema.json` for legacy application structure without schema.
- Created `node-express-api.json` for an Express.js API application structure.
2025-12-16 10:44:24 +02:00
parent 4391f35d8a
commit 5a480a3c2a
223 changed files with 19367 additions and 727 deletions
--- a/docs/signals/callgraph-formats.md
+++ b/docs/signals/callgraph-formats.md
@@ -1,15 +1,355 @@
-# Callgraph Formats (outline)
+# Callgraph Schema Reference

-## Pending Inputs
- See sprint SPRINT_0309_0001_0009_docs_tasks_md_ix action tracker; inputs due 2025-12-09..12 from owning guilds.
+This document describes the `stella.callgraph.v1` schema used for representing call graphs in StellaOps.

-## Determinism Checklist
- [ ] Hash any inbound assets/payloads; place sums alongside artifacts (e.g., SHA256SUMS in this folder).
- [ ] Keep examples offline-friendly and deterministic (fixed seeds, pinned versions, stable ordering).
- [ ] Note source/approver for any provided captures or schemas.
+## Schema Version

-## Sections to fill (once inputs arrive)
- Supported callgraph schema versions and shapes.
- Field definitions and validation rules.
- Common validation errors with deterministic examples.
- Hashes for any sample graphs provided.
+**Current Version:** `stella.callgraph.v1`
+
+All call graphs should include the `schema` field set to `stella.callgraph.v1`. Legacy call graphs without this field are automatically migrated on ingestion.
+
+## Document Structure
+
+A `CallgraphDocument` contains the following top-level fields:
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `schema` | string | Yes | Schema identifier: `stella.callgraph.v1` |
+| `scanKey` | string | No | Scan context identifier |
+| `language` | CallgraphLanguage | No | Primary language of the call graph |
+| `artifacts` | CallgraphArtifact[] | No | Artifacts included in the graph |
+| `nodes` | CallgraphNode[] | Yes | Graph nodes representing symbols |
+| `edges` | CallgraphEdge[] | Yes | Call edges between nodes |
+| `entrypoints` | CallgraphEntrypoint[] | No | Discovered entrypoints |
+| `metadata` | CallgraphMetadata | No | Graph-level metadata |
+| `id` | string | Yes | Unique graph identifier |
+| `component` | string | No | Component name |
+| `version` | string | No | Component version |
+| `ingestedAt` | DateTimeOffset | No | Ingestion timestamp (ISO 8601) |
+| `graphHash` | string | No | Content hash for deduplication |
+
+### Legacy Fields
+
+These fields are preserved for backward compatibility:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `languageString` | string | Legacy language string |
+| `roots` | CallgraphRoot[] | Legacy root/entrypoint representation |
+| `schemaVersion` | string | Legacy schema version field |
+
+## Enumerations
+
+### CallgraphLanguage
+
+Supported languages for call graph analysis:
+
+| Value | Description |
+|-------|-------------|
+| `Unknown` | Language not determined |
+| `DotNet` | .NET (C#, F#, VB.NET) |
+| `Java` | Java and JVM languages |
+| `Node` | Node.js / JavaScript / TypeScript |
+| `Python` | Python |
+| `Go` | Go |
+| `Rust` | Rust |
+| `Ruby` | Ruby |
+| `Php` | PHP |
+| `Binary` | Native binary (ELF, PE) |
+| `Swift` | Swift |
+| `Kotlin` | Kotlin |
+
+### SymbolVisibility
+
+Access visibility levels for symbols:
+
+| Value | Description |
+|-------|-------------|
+| `Unknown` | Visibility not determined |
+| `Public` | Publicly accessible |
+| `Internal` | Internal to assembly/module |
+| `Protected` | Protected (subclass accessible) |
+| `Private` | Private to containing type |
+
+### EdgeKind
+
+Edge classification based on analysis confidence:
+
+| Value | Description | Confidence |
+|-------|-------------|------------|
+| `Static` | Statically determined call | High |
+| `Heuristic` | Heuristically inferred | Medium |
+| `Runtime` | Runtime-observed edge | Highest |
+
+### EdgeReason
+
+Reason codes explaining why an edge exists (critical for explainability):
+
+| Value | Description | Typical Kind |
+|-------|-------------|--------------|
+| `DirectCall` | Direct method/function call | Static |
+| `VirtualCall` | Virtual/interface dispatch | Static |
+| `ReflectionString` | Reflection-based invocation | Heuristic |
+| `DiBinding` | Dependency injection binding | Heuristic |
+| `DynamicImport` | Dynamic import/require | Heuristic |
+| `NewObj` | Constructor/object instantiation | Static |
+| `DelegateCreate` | Delegate/function pointer creation | Static |
+| `AsyncContinuation` | Async/await continuation | Static |
+| `EventHandler` | Event handler subscription | Heuristic |
+| `GenericInstantiation` | Generic type instantiation | Static |
+| `NativeInterop` | Native interop (P/Invoke, JNI, FFI) | Static |
+| `RuntimeMinted` | Runtime-minted edge from execution | Runtime |
+| `Unknown` | Reason could not be determined | - |
+
+### EntrypointKind
+
+Types of entrypoints:
+
+| Value | Description |
+|-------|-------------|
+| `Unknown` | Type not determined |
+| `Http` | HTTP endpoint |
+| `Grpc` | gRPC endpoint |
+| `Cli` | CLI command handler |
+| `Job` | Background job |
+| `Event` | Event handler |
+| `MessageQueue` | Message queue consumer |
+| `Timer` | Timer/scheduled task |
+| `Test` | Test method |
+| `Main` | Main entry point |
+| `ModuleInit` | Module initializer |
+| `StaticConstructor` | Static constructor |
+
+### EntrypointFramework
+
+Frameworks that expose entrypoints:
+
+| Value | Description | Language |
+|-------|-------------|----------|
+| `Unknown` | Framework not determined | - |
+| `AspNetCore` | ASP.NET Core | DotNet |
+| `MinimalApi` | ASP.NET Core Minimal APIs | DotNet |
+| `Spring` | Spring Framework | Java |
+| `SpringBoot` | Spring Boot | Java |
+| `Express` | Express.js | Node |
+| `Fastify` | Fastify | Node |
+| `NestJs` | NestJS | Node |
+| `FastApi` | FastAPI | Python |
+| `Flask` | Flask | Python |
+| `Django` | Django | Python |
+| `Rails` | Ruby on Rails | Ruby |
+| `Gin` | Gin | Go |
+| `Echo` | Echo | Go |
+| `Actix` | Actix Web | Rust |
+| `Rocket` | Rocket | Rust |
+| `AzureFunctions` | Azure Functions | Multi |
+| `AwsLambda` | AWS Lambda | Multi |
+| `CloudFunctions` | Google Cloud Functions | Multi |
+
+### EntrypointPhase
+
+Execution phase for entrypoints:
+
+| Value | Description |
+|-------|-------------|
+| `ModuleInit` | Module/assembly initialization |
+| `AppStart` | Application startup (Main) |
+| `Runtime` | Runtime request handling |
+| `Shutdown` | Shutdown/cleanup handlers |
+
+## Node Structure
+
+A `CallgraphNode` represents a symbol (method, function, type) in the call graph:
+
+```json
+{
+  "id": "n001",
+  "nodeId": "n001",
+  "name": "GetWeatherForecast",
+  "kind": "method",
+  "namespace": "SampleApi.Controllers",
+  "file": "WeatherForecastController.cs",
+  "line": 15,
+  "symbolKey": "SampleApi.Controllers.WeatherForecastController::GetWeatherForecast()",
+  "artifactKey": "SampleApi.dll",
+  "visibility": "Public",
+  "isEntrypointCandidate": true,
+  "attributes": {
+    "returnType": "IEnumerable<WeatherForecast>",
+    "httpMethod": "GET",
+    "route": "/weatherforecast"
+  },
+  "flags": 3
+}
+```
+
+### Node Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `id` | string | Yes | Unique identifier within the graph |
+| `nodeId` | string | No | Alias for id (v1 schema convention) |
+| `name` | string | Yes | Human-readable symbol name |
+| `kind` | string | Yes | Symbol kind (method, function, class) |
+| `namespace` | string | No | Namespace or module path |
+| `file` | string | No | Source file path |
+| `line` | int | No | Source line number |
+| `symbolKey` | string | No | Canonical symbol key (v1) |
+| `artifactKey` | string | No | Reference to containing artifact |
+| `visibility` | SymbolVisibility | No | Access visibility |
+| `isEntrypointCandidate` | bool | No | Whether node is an entrypoint candidate |
+| `purl` | string | No | Package URL for external packages |
+| `symbolDigest` | string | No | Content-addressed symbol digest |
+| `attributes` | object | No | Additional attributes |
+| `flags` | int | No | Bitmask for efficient filtering |
+
+### Symbol Key Format
+
+The `symbolKey` follows a canonical format:
+
+```
+{Namespace}.{Type}[`Arity][+Nested]::{Method}[`Arity]({ParamTypes})
+```
+
+Examples:
+- `System.String::Concat(string, string)`
+- `MyApp.Controllers.UserController::GetUser(int)`
+- `System.Collections.Generic.List`1::Add(T)`
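
The symbol key pattern above can be sketched as a small builder. This is a minimal illustration only: the function name `build_symbol_key` and its parameters are hypothetical and are not taken from the StellaOps code base.

```python
def build_symbol_key(namespace, type_name, method, param_types=(),
                     type_arity=0, method_arity=0, nested=()):
    """Assemble a key shaped like {Namespace}.{Type}[`Arity][+Nested]::{Method}[`Arity]({ParamTypes})."""
    type_part = f"{namespace}.{type_name}" if namespace else type_name
    if type_arity:
        # Generic arity is appended with a backtick, e.g. List`1
        type_part += f"`{type_arity}"
    for inner in nested:
        # Nested types are joined with '+'
        type_part += f"+{inner}"
    method_part = method
    if method_arity:
        method_part += f"`{method_arity}"
    return f"{type_part}::{method_part}({', '.join(param_types)})"


concat_key = build_symbol_key("System", "String", "Concat", ["string", "string"])
list_add_key = build_symbol_key("System.Collections.Generic", "List", "Add", ["T"], type_arity=1)
```

Both calls reproduce the first and third examples listed above.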
+
+## Edge Structure
+
+A `CallgraphEdge` represents a call relationship between two symbols:
+
+```json
+{
+  "sourceId": "n001",
+  "targetId": "n002",
+  "from": "n001",
+  "to": "n002",
+  "type": "call",
+  "kind": "Static",
+  "reason": "DirectCall",
+  "weight": 1.0,
+  "offset": 42,
+  "isResolved": true,
+  "provenance": "static-analysis"
+}
+```
+
+### Edge Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `sourceId` | string | Yes | Source node ID (caller) |
+| `targetId` | string | Yes | Target node ID (callee) |
+| `from` | string | No | Alias for sourceId (v1) |
+| `to` | string | No | Alias for targetId (v1) |
+| `type` | string | No | Legacy edge type |
+| `kind` | EdgeKind | No | Edge classification |
+| `reason` | EdgeReason | No | Reason for edge existence |
+| `weight` | double | No | Confidence weight (0.0-1.0) |
+| `offset` | int | No | IL/bytecode offset |
+| `isResolved` | bool | No | Whether target was fully resolved |
+| `provenance` | string | No | Provenance information |
+| `candidates` | string[] | No | Virtual dispatch candidates |
+
+## Entrypoint Structure
+
+A `CallgraphEntrypoint` represents a discovered entrypoint:
+
+```json
+{
+  "nodeId": "n001",
+  "kind": "Http",
+  "route": "/api/users/{id}",
+  "httpMethod": "GET",
+  "framework": "AspNetCore",
+  "source": "attribute",
+  "phase": "Runtime",
+  "order": 0
+}
+```
+
+### Entrypoint Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `nodeId` | string | Yes | Reference to the node |
+| `kind` | EntrypointKind | Yes | Type of entrypoint |
+| `route` | string | No | HTTP route pattern |
+| `httpMethod` | string | No | HTTP method (GET, POST, etc.) |
+| `framework` | EntrypointFramework | No | Framework exposing the entrypoint |
+| `source` | string | No | Discovery source |
+| `phase` | EntrypointPhase | No | Execution phase |
+| `order` | int | No | Deterministic ordering |
+
+## Determinism Requirements
+
+For reproducible analysis, call graphs must be deterministic:
+
+1. **Stable Ordering**
+   - Nodes must be sorted by `id` (ordinal string comparison)
+   - Edges must be sorted by `sourceId`, then `targetId`
+   - Entrypoints must be sorted by `order`
+
+2. **Enum Serialization**
+   - All enums serialize as camelCase strings
+   - Example: `EdgeReason.DirectCall` → `"directCall"`
+
+3. **Timestamps**
+   - All timestamps must be UTC ISO 8601 format
+   - Example: `2025-01-15T10:00:00Z`
+
+4. **Content Hashing**
+   - The `graphHash` field should contain a stable content hash
+   - Hash algorithm: SHA-256
+   - Format: `sha256:{hex-digest}`
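
A rough sketch of the ordering and hashing rules above. Several details are assumptions rather than documented behavior: the canonical byte form (compact JSON with sorted keys), the exclusion of `graphHash` from the hashed payload, and the `nodeId` tie-breaker for entrypoints with equal `order`. Python compares strings by code point, which matches ordinal comparison for identifiers like the ones shown in this document.

```python
import hashlib
import json


def canonicalize(graph):
    """Return a copy of the graph with nodes, edges and entrypoints in stable order."""
    out = dict(graph)
    out["nodes"] = sorted(graph.get("nodes", []), key=lambda n: n["id"])
    out["edges"] = sorted(graph.get("edges", []),
                          key=lambda e: (e["sourceId"], e["targetId"]))
    # Assumption: ties on "order" are broken by nodeId to keep the result stable.
    out["entrypoints"] = sorted(graph.get("entrypoints", []),
                                key=lambda p: (p.get("order", 0), p["nodeId"]))
    return out


def graph_hash(graph):
    """Compute a sha256:{hex-digest} value over an assumed canonical JSON form."""
    canonical = canonicalize(graph)
    canonical.pop("graphHash", None)  # assumption: the hash does not cover itself
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False)
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this sketch, two documents that differ only in the order of their `nodes` or `edges` arrays hash to the same value.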
+
+## Schema Migration
+
+Legacy call graphs without the `schema` field are automatically migrated:
+
+1. **Schema Field**: Set to `stella.callgraph.v1`
+2. **Language Parsing**: String language converted to `CallgraphLanguage` enum
+3. **Visibility Inference**: Inferred from symbol key patterns:
+   - Contains `.Internal.` → `Internal`
+   - Contains `._` or `<` → `Private`
+   - Default → `Public`
+4. **Edge Reason Inference**: Based on legacy `type` field:
+   - `call`, `direct` → `DirectCall`
+   - `virtual`, `callvirt` → `VirtualCall`
+   - `newobj` → `NewObj`
+   - etc.
+5. **Entrypoint Inference**: Built from legacy `roots` and candidate nodes
+6. **Symbol Key Generation**: Built from namespace and name if missing
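
The visibility and edge-reason inference steps can be sketched as follows. The check order, the case-insensitive lookup, and the `Unknown` fallback for legacy types not listed above are assumptions; the actual migrator may map additional types.

```python
# Only the mappings listed in this document; anything else is left unmapped here.
LEGACY_TYPE_TO_REASON = {
    "call": "DirectCall",
    "direct": "DirectCall",
    "virtual": "VirtualCall",
    "callvirt": "VirtualCall",
    "newobj": "NewObj",
}


def infer_visibility(symbol_key):
    """Infer SymbolVisibility from symbol key patterns (Internal checked first)."""
    if ".Internal." in symbol_key:
        return "Internal"
    if "._" in symbol_key or "<" in symbol_key:
        return "Private"
    return "Public"


def infer_edge_reason(legacy_type):
    """Map a legacy edge `type` to an EdgeReason; unlisted types fall back to Unknown (assumption)."""
    return LEGACY_TYPE_TO_REASON.get((legacy_type or "").lower(), "Unknown")
```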
+
+## Validation Rules
+
+Call graphs are validated against these rules:
+
+1. All node `id` values must be unique
+2. Every edge `sourceId` and `targetId` must reference an existing node
+3. Every entrypoint `nodeId` must reference an existing node
+4. Edge `weight` must be between 0.0 and 1.0
+5. Artifacts referenced by nodes must exist in the `artifacts` list
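
The five rules above could be checked along these lines. This is a sketch, not the StellaOps validator: the function name is hypothetical, and because the shape of `CallgraphArtifact` is not described in this document, the `artifactKey` field on artifacts is an assumption.

```python
def validate_callgraph(graph):
    """Return a list of rule violations; an empty list means the graph passed."""
    errors = []
    node_ids = set()
    for node in graph.get("nodes", []):
        if node["id"] in node_ids:
            errors.append(f"duplicate node id: {node['id']}")
        node_ids.add(node["id"])

    for edge in graph.get("edges", []):
        for field in ("sourceId", "targetId"):
            if edge[field] not in node_ids:
                errors.append(f"edge {field} references unknown node: {edge[field]}")
        weight = edge.get("weight")
        if weight is not None and not 0.0 <= weight <= 1.0:
            errors.append(f"edge weight out of range: {weight}")

    for entry in graph.get("entrypoints", []):
        if entry["nodeId"] not in node_ids:
            errors.append(f"entrypoint references unknown node: {entry['nodeId']}")

    # Assumption: each artifact exposes its key under "artifactKey".
    artifact_keys = {a.get("artifactKey") for a in graph.get("artifacts", [])}
    for node in graph.get("nodes", []):
        key = node.get("artifactKey")
        if key is not None and key not in artifact_keys:
            errors.append(f"node {node['id']} references unknown artifact: {key}")
    return errors
```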
+
+## Golden Fixtures
+
+Reference fixtures for testing are located at:
+`tests/reachability/fixtures/callgraph-schema-v1/`
+
+| Fixture | Description |
+|---------|-------------|
+| `dotnet-aspnetcore-minimal.json` | ASP.NET Core application |
+| `java-spring-boot.json` | Spring Boot application |
+| `node-express-api.json` | Express.js API |
+| `go-gin-api.json` | Go Gin API |
+| `legacy-no-schema.json` | Legacy format for migration testing |
+| `all-edge-reasons.json` | All 13 edge reason codes |
+| `all-visibility-levels.json` | All 5 visibility levels |
+
+## Related Documentation
+
+- [Reachability Analysis Technical Reference](../reachability/README.md)
+- [Schema Migration Implementation](../../src/Signals/StellaOps.Signals/Parsing/CallgraphSchemaMigrator.cs)
+- [SPRINT_1100: CallGraph Schema Enhancement](../implplan/SPRINT_1100_0001_0001_callgraph_schema_enhancement.md)
--- a/docs/signals/unknowns-ranking.md
+++ b/docs/signals/unknowns-ranking.md
@@ -0,0 +1,383 @@
+# Unknowns Ranking Algorithm Reference
+
+This document describes the multi-factor scoring algorithm used to rank and triage unknowns in the StellaOps Signals module.
+
+## Purpose
+
+When reachability analysis encounters unresolved symbols, edges, or package identities, these are recorded as **unknowns**. The ranking algorithm prioritizes unknowns by computing a composite score from five factors, then assigns each to a triage band (HOT/WARM/COLD) that determines rescan scheduling and escalation policies.
+
+## Scoring Formula
+
+The composite score is computed as:
+
+```
+Score = wP × P + wE × E + wU × U + wC × C + wS × S
+```
+
+Where:
+- **P** = Popularity (deployment impact)
+- **E** = Exploit potential (CVE severity)
+- **U** = Uncertainty density (flag accumulation)
+- **C** = Centrality (graph position importance)
+- **S** = Staleness (evidence age)
+
+All factors are normalized to [0.0, 1.0] before weighting. The final score is clamped to [0.0, 1.0].
+
+### Default Weights
+
+| Factor | Weight | Description |
+|--------|--------|-------------|
+| wP | 0.25 | Popularity weight |
+| wE | 0.25 | Exploit potential weight |
+| wU | 0.25 | Uncertainty density weight |
+| wC | 0.15 | Centrality weight |
+| wS | 0.10 | Staleness weight |
+
+Weights must sum to 1.0 and are configurable via `Signals:UnknownsScoring` settings.
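
As an illustration of the weighted sum and clamping described above, a minimal sketch follows. The names `DEFAULT_WEIGHTS` and `composite_score` are hypothetical, and rejecting weights that do not sum to 1.0 is one possible way to enforce the stated constraint.

```python
DEFAULT_WEIGHTS = {"P": 0.25, "E": 0.25, "U": 0.25, "C": 0.15, "S": 0.10}


def composite_score(factors, weights=DEFAULT_WEIGHTS):
    """Weighted sum of normalized factors, clamped to [0.0, 1.0]."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    raw = sum(weights[name] * factors[name] for name in weights)
    return max(0.0, min(1.0, raw))


example = composite_score({"P": 0.65, "E": 0.5, "U": 0.55, "C": 0.25, "S": 0.5})
```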
+
+## Factor Details
+
+### Factor P: Popularity (Deployment Impact)
+
+Measures how widely the unknown's package is deployed across monitored environments.
+
+**Formula:**
+```
+P = min(1, log10(1 + deploymentCount) / log10(1 + maxDeployments))
+```
+
+**Parameters:**
+- `deploymentCount`: Number of deployments referencing the package (from `deploy_refs` table)
+- `maxDeployments`: Normalization ceiling (default: 100)
+
+**Rationale:** Logarithmic scaling prevents a single highly-deployed package from dominating scores while still prioritizing widely-used dependencies.
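
The popularity formula translates directly into code. The sketch below assumes non-positive deployment counts are treated as zero, which the document does not state.

```python
import math


def popularity(deployment_count, max_deployments=100):
    """P = min(1, log10(1 + deploymentCount) / log10(1 + maxDeployments))."""
    if deployment_count <= 0:
        return 0.0  # assumption: missing or negative counts contribute nothing
    return min(1.0, math.log10(1 + deployment_count) / math.log10(1 + max_deployments))
```

Because of the logarithm, a package at 10% of the ceiling already scores roughly half: `popularity(10)` is about 0.52, while any count at or above `max_deployments` is clamped to 1.0.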
+
+### Factor E: Exploit Potential (CVE Severity)
+
+Estimates the consequence severity if the unknown resolves to a vulnerable component.
+
+**Current Implementation:**
+- Returns 0.5 (medium potential) when no CVE association exists
+- Future: Integrate KEV lookup, EPSS scores, and exploit database references
+
+**Planned Enhancements:**
+- CVE severity mapping (Critical=1.0, High=0.8, Medium=0.5, Low=0.2)
+- KEV (Known Exploited Vulnerabilities) flag boost
+- EPSS (Exploit Prediction Scoring System) integration
+
+### Factor U: Uncertainty Density (Flag Accumulation)
+
+Aggregates uncertainty signals from multiple sources. Each flag contributes a weighted penalty.
+
+**Flag Weights:**
+
+| Flag | Weight | Description |
+|------|--------|-------------|
+| `NoProvenanceAnchor` | 0.30 | Cannot verify package source |
+| `VersionRange` | 0.25 | Version specified as range, not exact |
+| `DynamicCallTarget` | 0.25 | Reflection, eval, or dynamic dispatch |
+| `ConflictingFeeds` | 0.20 | Contradictory info from different feeds |
+| `ExternalAssembly` | 0.20 | Assembly outside analysis scope |
+| `MissingVector` | 0.15 | No CVSS vector for severity assessment |
+| `UnreachableSourceAdvisory` | 0.10 | Source advisory URL unreachable |
+
+**Formula:**
+```
+U = min(1.0, sum(activeFlags × flagWeight))
+```
+
+**Example:**
+- NoProvenanceAnchor (0.30) + VersionRange (0.25) + MissingVector (0.15) = 0.70
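
The flag accumulation can be sketched with the default weights from the table above. The names `FLAG_WEIGHTS` and `uncertainty` are hypothetical; unknown flag names raise `KeyError` in this sketch.

```python
FLAG_WEIGHTS = {
    "NoProvenanceAnchor": 0.30,
    "VersionRange": 0.25,
    "DynamicCallTarget": 0.25,
    "ConflictingFeeds": 0.20,
    "ExternalAssembly": 0.20,
    "MissingVector": 0.15,
    "UnreachableSourceAdvisory": 0.10,
}


def uncertainty(active_flags):
    """U = min(1.0, sum of weights of the active flags); each flag counts once."""
    return min(1.0, sum(FLAG_WEIGHTS[flag] for flag in set(active_flags)))
```

Note that the default weights sum to 1.45, so an unknown with every flag set is clamped to 1.0.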
+
+### Factor C: Centrality (Graph Position Importance)
+
+Measures the unknown's position importance in the call graph using betweenness centrality.
+
+**Formula:**
+```
+C = min(1.0, betweenness / maxBetweenness)
+```
+
+**Parameters:**
+- `betweenness`: Raw betweenness centrality from graph analysis
+- `maxBetweenness`: Normalization ceiling (default: 1000)
+
+**Rationale:** High-betweenness nodes appear on many shortest paths, meaning they're likely to be reached regardless of entry point.
+
+**Related Metrics:**
+- `DegreeCentrality`: Number of incoming + outgoing edges (stored but not used in score)
+- `BetweennessCentrality`: Raw betweenness value (stored for debugging)
+
+### Factor S: Staleness (Evidence Age)
+
+Measures how much time has passed since the last successful analysis.
+
+**Formula:**
+```
+S = min(1.0, daysSinceLastAnalysis / maxDays)
+```
+
+Optional exponential model, which approaches 1.0 asymptotically instead of clamping at `maxDays`:
+```
+S = 1 - exp(-daysSinceLastAnalysis / tau)
+```
+
+**Parameters:**
+- `daysSinceLastAnalysis`: Days since `LastAnalyzedAt` timestamp
+- `maxDays`: Staleness ceiling (default: 14 days)
+- `tau`: Decay constant for exponential model (default: 14)
+
+**Special Cases:**
+- Never analyzed (`LastAnalyzedAt` is null): S = 1.0 (maximum staleness)
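
Both staleness models fit in one small function. In this sketch, passing `tau` selects the exponential model; using whole days (matching the integer `days_since_last_analysis` idea) and treating future timestamps as zero days are assumptions.

```python
import math
from datetime import datetime, timedelta, timezone


def staleness(last_analyzed_at, now, max_days=14, tau=None):
    """Linear staleness by default; exponential model when tau is given."""
    if last_analyzed_at is None:
        return 1.0  # never analyzed: maximum staleness
    days = max(0, (now - last_analyzed_at).days)  # whole days (assumption)
    if tau is not None:
        return 1.0 - math.exp(-days / tau)
    return min(1.0, days / max_days)


now = datetime(2025, 12, 15, 10, 0, tzinfo=timezone.utc)
week_old = staleness(now - timedelta(days=7), now)  # linear model: 7 / 14
never = staleness(None, now)
```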
+
+## Band Assignment
+
+Based on the composite score, unknowns are assigned to triage bands:
+
+| Band | Threshold | Rescan Policy | Description |
+|------|-----------|---------------|-------------|
+| **HOT** | Score >= 0.70 | 15 minutes | Immediate rescan + VEX escalation |
+| **WARM** | 0.40 <= Score < 0.70 | 24 hours | Scheduled rescan within 12-72h |
+| **COLD** | Score < 0.40 | 7 days | Weekly batch processing |
+
+Thresholds are configurable:
+```yaml
+Signals:
+  UnknownsScoring:
+    HotThreshold: 0.70
+    WarmThreshold: 0.40
+```
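
The band thresholds above amount to two comparisons. A minimal sketch, with a hypothetical function name and the default thresholds as keyword arguments:

```python
def assign_band(score, hot_threshold=0.70, warm_threshold=0.40):
    """Map a composite score to a triage band; lower bounds are inclusive."""
    if score >= hot_threshold:
        return "HOT"
    if score >= warm_threshold:
        return "WARM"
    return "COLD"
```

A score of exactly 0.70 lands in HOT and exactly 0.40 in WARM, matching the `>=` comparisons in the table.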
+
+## Scheduler Integration
+
+The `UnknownsRescanWorker` processes unknowns based on their band:
+
+### HOT Band Processing
+- Poll interval: 1 minute
+- Batch size: 10 items
+- Action: Trigger immediate rescan via `IRescanOrchestrator`
+- On failure: Exponential backoff, max 3 retries before demotion to WARM
+
+### WARM Band Processing
+- Poll interval: 5 minutes
+- Batch size: 50 items
+- Scheduled window: 12-72 hours based on score within band
+- On failure: Increment `RescanAttempts`, re-queue with delay
+
+### COLD Band Processing
+- Schedule: Weekly on configurable day (default: Sunday)
+- Batch size: 500 items
+- Action: Batch rescan job submission
+- On failure: Log and retry next week
+
+## Normalization Trace
+
+Each scored unknown includes a `NormalizationTrace` for debugging and replay:
+
+```json
+{
+  "rawPopularity": 19,
+  "normalizedPopularity": 0.65,
+  "popularityFormula": "min(1, log10(1 + 19) / log10(1 + 100))",
+
+  "rawExploitPotential": 0.5,
+  "normalizedExploitPotential": 0.5,
+
+  "rawUncertainty": 0.55,
+  "normalizedUncertainty": 0.55,
+  "activeFlags": ["NoProvenanceAnchor", "VersionRange"],
+
+  "rawCentrality": 250.0,
+  "normalizedCentrality": 0.25,
+
+  "rawStaleness": 7,
+  "normalizedStaleness": 0.5,
+
+  "weights": {
+    "wP": 0.25,
+    "wE": 0.25,
+    "wU": 0.25,
+    "wC": 0.15,
+    "wS": 0.10
+  },
+  "finalScore": 0.5125,
+  "assignedBand": "Warm",
+  "computedAt": "2025-12-15T10:00:00Z"
+}
+```
+
+**Replay Capability:** Given the trace, the exact score can be recomputed:
+```
+Score = 0.25×0.65 + 0.25×0.5 + 0.25×0.55 + 0.15×0.25 + 0.10×0.5
+      = 0.1625 + 0.125 + 0.1375 + 0.0375 + 0.05
+      = 0.5125
+```
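
A replay helper might read the normalized factors and weights straight from the trace. This is a sketch: the `FACTOR_KEYS` mapping and `replay_score` are hypothetical names, and only the trace field names shown in the JSON example are relied on.

```python
# Maps each weight key in the trace to the normalized factor it multiplies.
FACTOR_KEYS = {
    "wP": "normalizedPopularity",
    "wE": "normalizedExploitPotential",
    "wU": "normalizedUncertainty",
    "wC": "normalizedCentrality",
    "wS": "normalizedStaleness",
}


def replay_score(trace):
    """Recompute the composite score from a NormalizationTrace-shaped dict."""
    total = sum(trace["weights"][w] * trace[f] for w, f in FACTOR_KEYS.items())
    return max(0.0, min(1.0, total))


trace = {
    "normalizedPopularity": 0.65,
    "normalizedExploitPotential": 0.5,
    "normalizedUncertainty": 0.55,
    "normalizedCentrality": 0.25,
    "normalizedStaleness": 0.5,
    "weights": {"wP": 0.25, "wE": 0.25, "wU": 0.25, "wC": 0.15, "wS": 0.10},
}
recomputed = replay_score(trace)
```

Comparing the recomputed value with the stored score using a small tolerance avoids false mismatches from floating-point rounding.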
+
+## API Endpoints
+
+### Query Unknowns by Band
+
+```
+GET /api/signals/unknowns?band=hot&limit=50&offset=0
+```
+
+Response:
+```json
+{
+  "items": [
+    {
+      "id": "unk-123",
+      "subjectKey": "myapp|1.0.0",
+      "purl": "pkg:npm/lodash@4.17.21",
+      "score": 0.82,
+      "band": "Hot",
+      "flags": { "noProvenanceAnchor": true, "versionRange": true },
+      "nextScheduledRescan": "2025-12-15T10:15:00Z"
+    }
+  ],
+  "total": 15,
+  "hasMore": false
+}
+```
+
+### Get Score Explanation
+
+```
+GET /api/signals/unknowns/{id}/explain
+```
+
+Response:
+```json
+{
+  "unknown": { /* full UnknownSymbolDocument */ },
+  "normalizationTrace": { /* trace object */ },
+  "factorBreakdown": {
+    "popularity": { "raw": 19, "normalized": 0.65, "weighted": 0.1625 },
+    "exploitPotential": { "raw": 0.5, "normalized": 0.5, "weighted": 0.125 },
+    "uncertainty": { "raw": 0.55, "normalized": 0.55, "weighted": 0.1375 },
+    "centrality": { "raw": 250, "normalized": 0.25, "weighted": 0.0375 },
+    "staleness": { "raw": 7, "normalized": 0.5, "weighted": 0.05 }
+  },
+  "bandThresholds": { "hot": 0.70, "warm": 0.40 }
+}
+```
+
+## Configuration Reference
+
+```yaml
+Signals:
+  UnknownsScoring:
+    # Factor weights (must sum to 1.0)
+    WeightPopularity: 0.25
+    WeightExploitPotential: 0.25
+    WeightUncertainty: 0.25
+    WeightCentrality: 0.15
+    WeightStaleness: 0.10
+
+    # Popularity normalization
+    PopularityMaxDeployments: 100
+
+    # Uncertainty flag weights
+    FlagWeightNoProvenance: 0.30
+    FlagWeightVersionRange: 0.25
+    FlagWeightConflictingFeeds: 0.20
+    FlagWeightMissingVector: 0.15
+    FlagWeightUnreachableSource: 0.10
+    FlagWeightDynamicTarget: 0.25
+    FlagWeightExternalAssembly: 0.20
+
+    # Centrality normalization
+    CentralityMaxBetweenness: 1000.0
+
+    # Staleness normalization
+    StalenessMaxDays: 14
+    StalenessTau: 14  # For exponential decay
+
+    # Band thresholds
+    HotThreshold: 0.70
+    WarmThreshold: 0.40
+
+    # Rescan scheduling
+    HotRescanMinutes: 15
+    WarmRescanHours: 24
+    ColdRescanDays: 7
+
+  UnknownsDecay:
+    # Nightly batch decay
+    BatchEnabled: true
+    MaxSubjectsPerBatch: 1000
+    ColdBatchDay: Sunday
+```
+
+## Determinism Requirements
+
+The scoring algorithm is fully deterministic:
+
+1. **Same inputs produce identical scores** - Given identical `UnknownSymbolDocument`, deployment counts, and graph metrics, the score will always be the same
+2. **Normalization trace enables replay** - The trace contains all raw values and weights needed to reproduce the score
+3. **Timestamps use UTC ISO 8601** - All `ComputedAt`, `LastAnalyzedAt`, and `NextScheduledRescan` timestamps are UTC
+4. **Weights logged per computation** - The trace includes the exact weights used, allowing audit of configuration changes
+
+## Database Schema
+
+```sql
+-- Unknowns table (enhanced)
+CREATE TABLE signals.unknowns (
+    id UUID PRIMARY KEY,
+    subject_key TEXT NOT NULL,
+    purl TEXT,
+    symbol_id TEXT,
+    callgraph_id TEXT,
+
+    -- Scoring factors
+    popularity_score FLOAT DEFAULT 0,
+    deployment_count INT DEFAULT 0,
+    exploit_potential_score FLOAT DEFAULT 0,
+    uncertainty_score FLOAT DEFAULT 0,
+    centrality_score FLOAT DEFAULT 0,
+    degree_centrality INT DEFAULT 0,
+    betweenness_centrality FLOAT DEFAULT 0,
+    staleness_score FLOAT DEFAULT 0,
+    days_since_last_analysis INT DEFAULT 0,
+
+    -- Composite score and band
+    score FLOAT DEFAULT 0,
+    band TEXT DEFAULT 'cold' CHECK (band IN ('hot', 'warm', 'cold')),
+
+    -- Metadata
+    flags JSONB DEFAULT '{}',
+    normalization_trace JSONB,
+    rescan_attempts INT DEFAULT 0,
+    last_rescan_result TEXT,
+
+    -- Timestamps
+    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    last_analyzed_at TIMESTAMPTZ,
+    next_scheduled_rescan TIMESTAMPTZ
+);
+
+-- Indexes for band-based queries
+CREATE INDEX idx_unknowns_band ON signals.unknowns(band);
+CREATE INDEX idx_unknowns_score ON signals.unknowns(score DESC);
+CREATE INDEX idx_unknowns_next_rescan ON signals.unknowns(next_scheduled_rescan)
+    WHERE next_scheduled_rescan IS NOT NULL;
+CREATE INDEX idx_unknowns_subject ON signals.unknowns(subject_key);
+```
+
+## Metrics and Observability
+
+The following metrics are exposed for monitoring:
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `signals_unknowns_total` | Gauge | Total unknowns by band |
+| `signals_unknowns_rescans_total` | Counter | Rescans triggered by band |
+| `signals_unknowns_scoring_duration_seconds` | Histogram | Scoring computation time |
+| `signals_unknowns_band_transitions_total` | Counter | Band changes (e.g., WARM->HOT) |
+
+## Related Documentation
+
+- [Unknowns Registry](./unknowns-registry.md) - Data model and API for unknowns
+- [Reachability Analysis](./reachability.md) - Reachability scoring integration
+- [Callgraph Schema](./callgraph-formats.md) - Graph structure for centrality computation
--- a/docs/signals/unknowns-registry.md
+++ b/docs/signals/unknowns-registry.md
@@ -46,6 +46,22 @@ All endpoints are additive; no hard deletes. Payloads must include tenant bindin
 - Policy can block `not_affected` claims when `unknowns_pressure` exceeds thresholds.
 - UI/CLI show unknown chips with reason and depth; operators can triage or suppress.

+### 5.1 Multi-Factor Ranking
+
+Unknowns are ranked using a 5-factor scoring algorithm that computes a composite score from:
+- **Popularity (P)** - Deployment impact based on usage count
+- **Exploit Potential (E)** - CVE severity if known
+- **Uncertainty (U)** - Accumulated flag weights
+- **Centrality (C)** - Graph position importance (betweenness)
+- **Staleness (S)** - Evidence age since last analysis
+
+Based on the composite score, unknowns are assigned to triage bands:
+- **HOT** (score >= 0.70): Immediate rescan, 15-minute scheduling
+- **WARM** (0.40 <= score < 0.70): Scheduled rescan within 12-72h
+- **COLD** (score < 0.40): Weekly batch processing
+
+See [Unknowns Ranking Algorithm](./unknowns-ranking.md) for the complete formula reference.
+
 ## 6. Storage & CAS

 - Primary store: append-only KV/graph in Mongo (collections `unknowns`, `unknown_metrics`).