docs consolidation and others

2026-01-06 19:02:21 +02:00
parent d7bdca6d97
commit 4789027317
849 changed files with 16551 additions and 66770 deletions
--- a/docs/modules/signals/guides/callgraph-formats.md
+++ b/docs/modules/signals/guides/callgraph-formats.md
@@ -0,0 +1,355 @@
+# Callgraph Schema Reference
+
+This document describes the `stella.callgraph.v1` schema used for representing call graphs in StellaOps.
+
+## Schema Version
+
+**Current Version:** `stella.callgraph.v1`
+
+All call graphs should include the `schema` field set to `stella.callgraph.v1`. Legacy call graphs without this field are automatically migrated on ingestion.
+
+## Document Structure
+
+A `CallgraphDocument` contains the following top-level fields:
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `schema` | string | Yes | Schema identifier: `stella.callgraph.v1` |
+| `scanKey` | string | No | Scan context identifier |
+| `language` | CallgraphLanguage | No | Primary language of the call graph |
+| `artifacts` | CallgraphArtifact[] | No | Artifacts included in the graph |
+| `nodes` | CallgraphNode[] | Yes | Graph nodes representing symbols |
+| `edges` | CallgraphEdge[] | Yes | Call edges between nodes |
+| `entrypoints` | CallgraphEntrypoint[] | No | Discovered entrypoints |
+| `metadata` | CallgraphMetadata | No | Graph-level metadata |
+| `id` | string | Yes | Unique graph identifier |
+| `component` | string | No | Component name |
+| `version` | string | No | Component version |
+| `ingestedAt` | DateTimeOffset | No | Ingestion timestamp (ISO 8601) |
+| `graphHash` | string | No | Content hash for deduplication |
+
+### Legacy Fields
+
+These fields are preserved for backward compatibility:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `languageString` | string | Legacy language string |
+| `roots` | CallgraphRoot[] | Legacy root/entrypoint representation |
+| `schemaVersion` | string | Legacy schema version field |
+
+## Enumerations
+
+### CallgraphLanguage
+
+Supported languages for call graph analysis:
+
+| Value | Description |
+|-------|-------------|
+| `Unknown` | Language not determined |
+| `DotNet` | .NET (C#, F#, VB.NET) |
+| `Java` | Java and JVM languages |
+| `Node` | Node.js / JavaScript / TypeScript |
+| `Python` | Python |
+| `Go` | Go |
+| `Rust` | Rust |
+| `Ruby` | Ruby |
+| `Php` | PHP |
+| `Binary` | Native binary (ELF, PE) |
+| `Swift` | Swift |
+| `Kotlin` | Kotlin |
+
+### SymbolVisibility
+
+Access visibility levels for symbols:
+
+| Value | Description |
+|-------|-------------|
+| `Unknown` | Visibility not determined |
+| `Public` | Publicly accessible |
+| `Internal` | Internal to assembly/module |
+| `Protected` | Protected (subclass accessible) |
+| `Private` | Private to containing type |
+
+### EdgeKind
+
+Edge classification based on analysis confidence:
+
+| Value | Description | Confidence |
+|-------|-------------|------------|
+| `Static` | Statically determined call | High |
+| `Heuristic` | Heuristically inferred | Medium |
+| `Runtime` | Runtime-observed edge | Highest |
+
+### EdgeReason
+
+Reason codes explaining why an edge exists (critical for explainability):
+
+| Value | Description | Typical Kind |
+|-------|-------------|--------------|
+| `DirectCall` | Direct method/function call | Static |
+| `VirtualCall` | Virtual/interface dispatch | Static |
+| `ReflectionString` | Reflection-based invocation | Heuristic |
+| `DiBinding` | Dependency injection binding | Heuristic |
+| `DynamicImport` | Dynamic import/require | Heuristic |
+| `NewObj` | Constructor/object instantiation | Static |
+| `DelegateCreate` | Delegate/function pointer creation | Static |
+| `AsyncContinuation` | Async/await continuation | Static |
+| `EventHandler` | Event handler subscription | Heuristic |
+| `GenericInstantiation` | Generic type instantiation | Static |
+| `NativeInterop` | Native interop (P/Invoke, JNI, FFI) | Static |
+| `RuntimeMinted` | Runtime-minted edge from execution | Runtime |
+| `Unknown` | Reason could not be determined | - |
+
+### EntrypointKind
+
+Types of entrypoints:
+
+| Value | Description |
+|-------|-------------|
+| `Unknown` | Type not determined |
+| `Http` | HTTP endpoint |
+| `Grpc` | gRPC endpoint |
+| `Cli` | CLI command handler |
+| `Job` | Background job |
+| `Event` | Event handler |
+| `MessageQueue` | Message queue consumer |
+| `Timer` | Timer/scheduled task |
+| `Test` | Test method |
+| `Main` | Main entry point |
+| `ModuleInit` | Module initializer |
+| `StaticConstructor` | Static constructor |
+
+### EntrypointFramework
+
+Frameworks that expose entrypoints:
+
+| Value | Description | Language |
+|-------|-------------|----------|
+| `Unknown` | Framework not determined | - |
+| `AspNetCore` | ASP.NET Core | DotNet |
+| `MinimalApi` | ASP.NET Core Minimal APIs | DotNet |
+| `Spring` | Spring Framework | Java |
+| `SpringBoot` | Spring Boot | Java |
+| `Express` | Express.js | Node |
+| `Fastify` | Fastify | Node |
+| `NestJs` | NestJS | Node |
+| `FastApi` | FastAPI | Python |
+| `Flask` | Flask | Python |
+| `Django` | Django | Python |
+| `Rails` | Ruby on Rails | Ruby |
+| `Gin` | Gin | Go |
+| `Echo` | Echo | Go |
+| `Actix` | Actix Web | Rust |
+| `Rocket` | Rocket | Rust |
+| `AzureFunctions` | Azure Functions | Multi |
+| `AwsLambda` | AWS Lambda | Multi |
+| `CloudFunctions` | Google Cloud Functions | Multi |
+
+### EntrypointPhase
+
+Execution phase for entrypoints:
+
+| Value | Description |
+|-------|-------------|
+| `ModuleInit` | Module/assembly initialization |
+| `AppStart` | Application startup (Main) |
+| `Runtime` | Runtime request handling |
+| `Shutdown` | Shutdown/cleanup handlers |
+
+## Node Structure
+
+A `CallgraphNode` represents a symbol (method, function, type) in the call graph:
+
+```json
+{
+  "id": "n001",
+  "nodeId": "n001",
+  "name": "GetWeatherForecast",
+  "kind": "method",
+  "namespace": "SampleApi.Controllers",
+  "file": "WeatherForecastController.cs",
+  "line": 15,
+  "symbolKey": "SampleApi.Controllers.WeatherForecastController::GetWeatherForecast()",
+  "artifactKey": "SampleApi.dll",
+  "visibility": "Public",
+  "isEntrypointCandidate": true,
+  "attributes": {
+    "returnType": "IEnumerable<WeatherForecast>",
+    "httpMethod": "GET",
+    "route": "/weatherforecast"
+  },
+  "flags": 3
+}
+```
+
+### Node Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `id` | string | Yes | Unique identifier within the graph |
+| `nodeId` | string | No | Alias for id (v1 schema convention) |
+| `name` | string | Yes | Human-readable symbol name |
+| `kind` | string | Yes | Symbol kind (method, function, class) |
+| `namespace` | string | No | Namespace or module path |
+| `file` | string | No | Source file path |
+| `line` | int | No | Source line number |
+| `symbolKey` | string | No | Canonical symbol key (v1) |
+| `artifactKey` | string | No | Reference to containing artifact |
+| `visibility` | SymbolVisibility | No | Access visibility |
+| `isEntrypointCandidate` | bool | No | Whether node is an entrypoint candidate |
+| `purl` | string | No | Package URL for external packages |
+| `symbolDigest` | string | No | Content-addressed symbol digest |
+| `attributes` | object | No | Additional attributes |
+| `flags` | int | No | Bitmask for efficient filtering |
+
+### Symbol Key Format
+
+The `symbolKey` follows a canonical format:
+
+```
+{Namespace}.{Type}[`Arity][+Nested]::{Method}[`Arity]({ParamTypes})
+```
+
+Examples:
+- `System.String::Concat(string, string)`
+- `MyApp.Controllers.UserController::GetUser(int)`
+- ``System.Collections.Generic.List`1::Add(T)``
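The canonical format above can be split into its parts with a small parser. The following is a minimal sketch, not the StellaOps implementation: `SymbolKey`, `parse_symbol_key`, and `format_symbol_key` are hypothetical names, and the comma split on parameters is a simplifying assumption that does not handle generic parameter types containing commas.

```python
import re
from typing import List, NamedTuple


class SymbolKey(NamedTuple):
    """Parts of a canonical symbol key (hypothetical helper type)."""
    type_name: str     # namespace-qualified type, including any `Arity or +Nested suffix
    method: str        # method name, including any `Arity suffix
    params: List[str]  # parameter type names, in declaration order


# {Type}::{Method}({ParamTypes}); the type part ends at the first "::".
_SYMBOL_KEY = re.compile(r"^(?P<type>.+?)::(?P<method>[^()]+)\((?P<params>.*)\)$")


def parse_symbol_key(key: str) -> SymbolKey:
    """Split a symbol key into type, method, and parameter types."""
    match = _SYMBOL_KEY.match(key)
    if match is None:
        raise ValueError(f"not a canonical symbol key: {key!r}")
    raw = match.group("params").strip()
    # Assumption: a plain comma split; generic arguments such as
    # Dictionary<string, int> would need a bracket-aware splitter.
    params = [p.strip() for p in raw.split(",")] if raw else []
    return SymbolKey(match.group("type"), match.group("method"), params)


def format_symbol_key(type_name: str, method: str, params: List[str]) -> str:
    """Rebuild the canonical string from its parts."""
    return f"{type_name}::{method}({', '.join(params)})"
```

Round-tripping a key through `parse_symbol_key` and `format_symbol_key` is a cheap way to check that a producer emits keys in the documented shape.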
+
+## Edge Structure
+
+A `CallgraphEdge` represents a call relationship between two symbols:
+
+```json
+{
+  "sourceId": "n001",
+  "targetId": "n002",
+  "from": "n001",
+  "to": "n002",
+  "type": "call",
+  "kind": "Static",
+  "reason": "DirectCall",
+  "weight": 1.0,
+  "offset": 42,
+  "isResolved": true,
+  "provenance": "static-analysis"
+}
+```
+
+### Edge Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `sourceId` | string | Yes | Source node ID (caller) |
+| `targetId` | string | Yes | Target node ID (callee) |
+| `from` | string | No | Alias for sourceId (v1) |
+| `to` | string | No | Alias for targetId (v1) |
+| `type` | string | No | Legacy edge type |
+| `kind` | EdgeKind | No | Edge classification |
+| `reason` | EdgeReason | No | Reason for edge existence |
+| `weight` | double | No | Confidence weight (0.0-1.0) |
+| `offset` | int | No | IL/bytecode offset |
+| `isResolved` | bool | No | Whether target was fully resolved |
+| `provenance` | string | No | Provenance information |
+| `candidates` | string[] | No | Virtual dispatch candidates |
+
+## Entrypoint Structure
+
+A `CallgraphEntrypoint` represents a discovered entrypoint:
+
+```json
+{
+  "nodeId": "n001",
+  "kind": "Http",
+  "route": "/api/users/{id}",
+  "httpMethod": "GET",
+  "framework": "AspNetCore",
+  "source": "attribute",
+  "phase": "Runtime",
+  "order": 0
+}
+```
+
+### Entrypoint Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `nodeId` | string | Yes | Reference to the node |
+| `kind` | EntrypointKind | Yes | Type of entrypoint |
+| `route` | string | No | HTTP route pattern |
+| `httpMethod` | string | No | HTTP method (GET, POST, etc.) |
+| `framework` | EntrypointFramework | No | Framework exposing the entrypoint |
+| `source` | string | No | Discovery source |
+| `phase` | EntrypointPhase | No | Execution phase |
+| `order` | int | No | Deterministic ordering |
+
+## Determinism Requirements
+
+For reproducible analysis, call graphs must be deterministic:
+
+1. **Stable Ordering**
+   - Nodes must be sorted by `id` (ordinal string comparison)
+   - Edges must be sorted by `sourceId`, then `targetId`
+   - Entrypoints must be sorted by `order`
+
+2. **Enum Serialization**
+   - All enums serialize as camelCase strings
+   - Example: `EdgeReason.DirectCall` → `"directCall"`
+
+3. **Timestamps**
+   - All timestamps must be UTC ISO 8601 format
+   - Example: `2025-01-15T10:00:00Z`
+
+4. **Content Hashing**
+   - The `graphHash` field should contain a stable content hash
+   - Hash algorithm: SHA-256
+   - Format: `sha256:{hex-digest}`
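The ordering and hashing rules above can be sketched as follows. This is an illustration under stated assumptions, not the actual hasher: the canonical byte form (compact JSON with sorted keys), the exclusion of `graphHash` and `ingestedAt` from the hashed body, and the `nodeId` tie-break for entrypoints are all assumptions. Python compares strings by code point, which matches ordinal comparison for ASCII identifiers.

```python
import hashlib
import json
from typing import Dict


def canonicalize(graph: Dict) -> Dict:
    """Return a shallow copy of the graph with deterministically ordered lists."""
    out = dict(graph)
    sort_keys = {
        "nodes": lambda n: n["id"],
        "edges": lambda e: (e["sourceId"], e["targetId"]),
        # Assumption: ties on `order` are broken by nodeId.
        "entrypoints": lambda p: (p.get("order", 0), p["nodeId"]),
    }
    for field, key in sort_keys.items():
        if field in out:
            out[field] = sorted(out[field], key=key)
    return out


def graph_hash(graph: Dict) -> str:
    """Compute a stable content hash in the `sha256:{hex-digest}` format."""
    # Assumption: the hash field itself and the ingestion timestamp are excluded.
    body = {k: v for k, v in canonicalize(graph).items()
            if k not in ("graphHash", "ingestedAt")}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()
```

With this shape, two documents that differ only in list order or ingestion time produce the same `graphHash`, which is what deduplication needs.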
+
+## Schema Migration
+
+Legacy call graphs without the `schema` field are automatically migrated:
+
+1. **Schema Field**: Set to `stella.callgraph.v1`
+2. **Language Parsing**: String language converted to `CallgraphLanguage` enum
+3. **Visibility Inference**: Inferred from symbol key patterns:
+   - Contains `.Internal.` → `Internal`
+   - Contains `._` or `<` → `Private`
+   - Default → `Public`
+4. **Edge Reason Inference**: Based on legacy `type` field:
+   - `call`, `direct` → `DirectCall`
+   - `virtual`, `callvirt` → `VirtualCall`
+   - `newobj` → `NewObj`
+   - etc.
+5. **Entrypoint Inference**: Built from legacy `roots` and candidate nodes
+6. **Symbol Key Generation**: Built from namespace and name if missing
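The visibility and edge-reason inference rules in steps 3 and 4 could look roughly like this. It is a sketch only: the function names are hypothetical, the rule order (Internal before Private), the case-insensitive lookup, and the `Unknown` fallback for unlisted legacy types are assumptions, and only the mappings listed above are included.

```python
from typing import Optional


def infer_visibility(symbol_key: str) -> str:
    """Infer SymbolVisibility from symbol key patterns (step 3)."""
    if ".Internal." in symbol_key:
        return "Internal"
    if "._" in symbol_key or "<" in symbol_key:
        return "Private"
    return "Public"


# Only the legacy types named in this document; the real table may be larger.
_LEGACY_TYPE_TO_REASON = {
    "call": "DirectCall",
    "direct": "DirectCall",
    "virtual": "VirtualCall",
    "callvirt": "VirtualCall",
    "newobj": "NewObj",
}


def infer_edge_reason(legacy_type: Optional[str]) -> str:
    """Map a legacy edge `type` to an EdgeReason (step 4)."""
    if not legacy_type:
        return "Unknown"
    # Assumption: matching is case-insensitive and unmapped types become Unknown.
    return _LEGACY_TYPE_TO_REASON.get(legacy_type.strip().lower(), "Unknown")
```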
+
+## Validation Rules
+
+Call graphs are validated against these rules:
+
+1. All node `id` values must be unique
+2. All edge `sourceId` and `targetId` must reference existing nodes
+3. All entrypoint `nodeId` must reference existing nodes
+4. Edge `weight` must be between 0.0 and 1.0
+5. Artifacts referenced by nodes must exist in the `artifacts` list
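The five rules above translate into a straightforward checker. The sketch below is illustrative rather than the validator shipped with Signals; `validate_callgraph` is a hypothetical name, and it assumes each entry in `artifacts` exposes its key as `artifactKey`, which this document does not specify.

```python
from collections import Counter
from typing import Dict, List


def validate_callgraph(graph: Dict) -> List[str]:
    """Return a list of rule violations; an empty list means the graph is valid."""
    errors: List[str] = []
    nodes = graph.get("nodes", [])

    # Rule 1: node ids are unique.
    ids = Counter(node["id"] for node in nodes)
    for node_id, count in sorted(ids.items()):
        if count > 1:
            errors.append(f"duplicate node id: {node_id}")

    # Rules 2 and 4: edge endpoints exist and weight is within [0.0, 1.0].
    for index, edge in enumerate(graph.get("edges", [])):
        for field in ("sourceId", "targetId"):
            if edge[field] not in ids:
                errors.append(f"edge {index}: {field} {edge[field]!r} not found")
        weight = edge.get("weight")
        if weight is not None and not 0.0 <= weight <= 1.0:
            errors.append(f"edge {index}: weight {weight} out of range")

    # Rule 3: entrypoints reference existing nodes.
    for index, entry in enumerate(graph.get("entrypoints", [])):
        if entry["nodeId"] not in ids:
            errors.append(f"entrypoint {index}: nodeId {entry['nodeId']!r} not found")

    # Rule 5: artifacts referenced by nodes exist (artifact key field name assumed).
    artifact_keys = {a.get("artifactKey") for a in graph.get("artifacts", [])}
    for node in nodes:
        key = node.get("artifactKey")
        if key is not None and key not in artifact_keys:
            errors.append(f"node {node['id']}: artifact {key!r} not listed")
    return errors
```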
+
+## Golden Fixtures
+
+Reference fixtures for testing are located at:
+`tests/reachability/fixtures/callgraph-schema-v1/`
+
+| Fixture | Description |
+|---------|-------------|
+| `dotnet-aspnetcore-minimal.json` | ASP.NET Core application |
+| `java-spring-boot.json` | Spring Boot application |
+| `node-express-api.json` | Express.js API |
+| `go-gin-api.json` | Go Gin API |
+| `legacy-no-schema.json` | Legacy format for migration testing |
+| `all-edge-reasons.json` | All 13 edge reason codes |
+| `all-visibility-levels.json` | All 5 visibility levels |
+
+## Related Documentation
+
+- [Reachability Analysis Technical Reference](../reachability/README.md)
+- [Schema Migration Implementation](../../../../src/Signals/StellaOps.Signals/Parsing/CallgraphSchemaMigrator.cs)
+- [SPRINT_1100: CallGraph Schema Enhancement](../implplan/SPRINT_1100_0001_0001_callgraph_schema_enhancement.md)
--- a/docs/modules/signals/guides/cas-promotion-24-002.md
+++ b/docs/modules/signals/guides/cas-promotion-24-002.md
@@ -0,0 +1,34 @@
+# SIGNALS-24-002 · CAS promotion checklist (v1)
+
+Purpose: unblock CAS promotion + signed manifest rollout for callgraph storage so SIGNALS-24-002 can move from BLOCKED to implementation.
+
+## Preconditions
+- CAS bucket created for `signals-callgraphs` with write limited to Signals service principals.
+- Surface bundle mock hash recorded; real scanner cache ETA published.
+- Signed manifest tooling available (sigstore or in-house signer) with add-only policy.
+
+## Steps
+1) Freeze manifest schema (fields: `graph_id`, `digest`, `language`, `source`, `created`, `signer`, `signature`).
+2) Generate manifests for existing callgraphs; store under `cas://signals/manifests/{graph_id}.json`.
+3) Sign each manifest; attach DSSE envelope; store under `cas://signals/manifests/{graph_id}.json.dsse`.
+4) Apply bucket policy: read-only for downstream, write for Signals service; deny deletes.
+5) Configure GC policy: retain manifests indefinitely; retain callgraph blobs on a 30-day rolling window unless they are still referenced.
+6) Enable alerts for failed retrievals and missing manifest/DSSE pairs.
+7) Record hash list and signer key IDs in release notes.
+
+## Deliverables
+- Policy document + proof of applied IAM
+- Manifest schema JSON
+- Signed manifest samples (see tests)
+- Hash list of all published callgraphs (sha256)
+
+## Evidence locations (repo paths)
+- Policy & schema: `docs/modules/signals/guides/cas-promotion-24-002.md` (this file)
+- Sample manifest + DSSE: `tests/reachability/corpus/manifest.json` (already present); it maps to the expected structure.
+
+## Owners
+- Signals Guild (implementation)
+- Platform Storage Guild (policy/approvals)
+
+## Status
+- Checklist published 2025-11-19; awaiting Platform Storage approval to proceed.
--- a/docs/modules/signals/guides/events-24-005.md
+++ b/docs/modules/signals/guides/events-24-005.md
@@ -0,0 +1,49 @@
+# signals.fact.updated event contract (SIGNALS-24-005 prep)
+
+**Purpose**: replace the in-memory logger used during Signals development with a real event bus contract so reachability caches can be invalidated and downstream consumers (Policy Engine, Notifications, Console) can subscribe deterministically.
+
+## Topic / channel
+- Primary topic: `signals.fact.updated.v1`
+- Dead-letter topic: `signals.fact.updated.dlq`
+- Delivery: at-least-once; consumers must de-duplicate using `event_id`.
+
+## Message envelope
+```jsonc
+{
+  "event_id": "uuid-v4",                // stable across retries; used for idempotency
+  "emitted_at": "2025-11-20T00:00:00Z", // UTC, RFC3339
+  "tenant": "acme",                     // required; lower-case
+  "subject_key": "sbom:sha256:…" ,       // subject of facts (asset, sbom, host). Deterministic model key.
+  "fact_kind": "callgraph" | "runtime" | "reachability" | "signal", // enums mapped from Signals domain
+  "fact_version": 1,                     // monotonically increasing per subject_key + fact_kind
+  "digest": "sha256:…",                 // CAS digest of canonical fact document
+  "content_type": "application/json",   // or application/vnd.stellaops.ndjson when chunked
+  "producer": "StellaOps.Signals",      // emitting service
+  "source": {
+    "pipeline": "signals",             // consistent with Observability tags
+    "release": "0.4.0-alpha"           // optional
+  },
+  "trace": {
+    "trace_id": "…",                   // pass-through if available
+    "span_id": "…"
+  }
+}
+```
+
+## Routing / partitions
+- Partition key: `tenant` to keep per-tenant ordering.
+- Retry policy: exponential backoff up to 5 minutes; move to DLQ thereafter with `dlq_reason` header.
+
+## Consumer expectations
+- De-duplicate on `event_id` and `digest`.
+- Fetch fact body from CAS using `digest`; avoid embedding large payloads in the message.
+- If consumer cannot resolve CAS, treat as transient and retry later (do not drop).
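A consumer that follows these expectations might be structured like the sketch below. It is not tied to any particular bus client: `FactUpdatedConsumer`, `fetch_from_cas`, and `apply_fact` are hypothetical names, the in-memory seen-set stands in for whatever durable idempotency store a real consumer would use, and signalling a CAS miss via `LookupError` is an assumption.

```python
from typing import Callable, Dict, Set, Tuple


class FactUpdatedConsumer:
    """Idempotent handler for signals.fact.updated envelopes (illustrative only)."""

    def __init__(self,
                 fetch_from_cas: Callable[[str], bytes],
                 apply_fact: Callable[[Dict, bytes], None]) -> None:
        self._fetch = fetch_from_cas
        self._apply = apply_fact
        # Assumption: in-memory set; production code would persist this.
        self._seen: Set[Tuple[str, str]] = set()

    def handle(self, envelope: Dict) -> str:
        """Return 'processed', 'duplicate', or 'retry' for one delivery."""
        key = (envelope["event_id"], envelope["digest"])
        if key in self._seen:
            return "duplicate"  # at-least-once delivery: drop repeats
        try:
            body = self._fetch(envelope["digest"])  # fact body lives in CAS
        except LookupError:
            return "retry"  # transient: do not drop, redeliver later
        self._apply(envelope, body)
        self._seen.add(key)  # mark only after a successful apply
        return "processed"
```

Marking the event as seen only after the fact is applied keeps a failed CAS lookup from being mistaken for a completed delivery.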
+
+## Security / air-gap posture
+- No PII; tenant id only.
+- Works offline when bus is intra-cluster (e.g., NATS/Valkey Streams); external exporters disabled in sealed mode.
+
+## Provenance
+- This contract supersedes the temporary log-based publisher referenced in Signals sprint 0143 Execution Log (2025-11-18). Aligns with `signals.fact.updated@v1` payload shape already covered by unit tests.
+- Implementation: `Signals.Events` defaults to Valkey Streams (`signals.fact.updated.v1` with `signals.fact.updated.dlq`), emitting envelopes that include `event_id`, `fact_version`, and deterministic `fact.digest` (sha256) generated by the reachability fact hasher.
+- Router transport: set `Signals.Events.Driver=router` to POST envelopes to the StellaOps Router gateway (`BaseUrl` + `Path`, default `/router/events/signals.fact.updated`) with optional API key/headers. This path should forward to downstream consumers registered in Router; Valkey remains mandatory for reachability cache but not for event fan-out when router is enabled.
--- a/docs/modules/signals/guides/provenance-24-003.md
+++ b/docs/modules/signals/guides/provenance-24-003.md
@@ -0,0 +1,31 @@
+# SIGNALS-24-003 · Provenance appendix checklist (v1)
+
+Purpose: unblock provenance enrichment for runtime facts so SIGNALS-24-003 can advance once CAS promotion is approved.
+
+## Required fields (per runtime fact)
+- `callgraph_id` (matches CAS manifest id)
+- `ingested_at` (UTC ISO-8601), `received_at`
+- `tenant`
+- `source` (host/service emitting facts)
+- `pipeline_version` (git SHA or build ID)
+- `provenance_hash` (sha256 of raw fact blob)
+- `signer` (key id) and optional `rekor_uuid` or `skip_reason: offline`
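As a rough illustration of the field list above, a provenance record could be assembled as follows. This is a sketch pending the frozen schema: `build_provenance` is a hypothetical helper, the bare hex encoding of `provenance_hash` is an assumption, and timestamps are assumed to be timezone-aware.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Optional


def _utc(ts: datetime) -> str:
    """Format a timezone-aware datetime as UTC ISO-8601 with a Z suffix."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_provenance(raw_fact: bytes, *, callgraph_id: str, tenant: str,
                     source: str, pipeline_version: str, signer: str,
                     received_at: datetime, ingested_at: datetime,
                     rekor_uuid: Optional[str] = None) -> Dict[str, str]:
    """Assemble the required provenance fields for one runtime fact."""
    record = {
        "callgraph_id": callgraph_id,
        "ingested_at": _utc(ingested_at),
        "received_at": _utc(received_at),
        "tenant": tenant,
        "source": source,
        "pipeline_version": pipeline_version,
        # Assumption: bare hex digest; the schema may require a "sha256:" prefix.
        "provenance_hash": hashlib.sha256(raw_fact).hexdigest(),
        "signer": signer,
    }
    if rekor_uuid is not None:
        record["rekor_uuid"] = rekor_uuid
    else:
        record["skip_reason"] = "offline"  # no transparency log entry available
    return record
```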
+
+## Steps
+1) Freeze provenance JSON schema (`provenance.runtime.fact.v1`).
+2) Add enrichment stage writing provenance into CAS alongside runtime facts.
+3) Emit DSSE attestation per batch of runtime facts; store in CAS.
+4) Update `/signals/runtime-facts/ndjson` handler to return `provenance_hash` and `callgraph_id` when available.
+5) Add validation tests to ensure add-only evolution and deterministic ordering.
+
+## Deliverables
+- Schema file: `docs/modules/signals/guides/provenance-24-003.md` (this file) with field list and invariants.
+- Test fixtures: reuse `tests/reachability/corpus/*/vex.openvex.json` provenance anchors; add `provenance_hash` coverage to `ReachabilityLatticeTests` when available.
+
+## Owners
+- Signals Guild (implementation)
+- Runtime Guild (schema review)
+- Authority Guild (signing/attestation)
+
+## Status
+- Checklist published 2025-11-19; awaiting schema/signing approval to proceed.
--- a/docs/modules/signals/guides/reachability.md
+++ b/docs/modules/signals/guides/reachability.md
@@ -0,0 +1,16 @@
+# Reachability Signals (outline)
+
+## Pending Inputs
+- See sprint SPRINT_0309_0001_0009_docs_tasks_md_ix action tracker; inputs due 2025-12-09..12 from owning guilds.
+
+## Determinism Checklist
+- [ ] Hash any inbound assets/payloads; place sums alongside artifacts (e.g., SHA256SUMS in this folder).
+- [ ] Keep examples offline-friendly and deterministic (fixed seeds, pinned versions, stable ordering).
+- [ ] Note source/approver for any provided captures or schemas.
+
+## Sections to fill (once inputs arrive)
+- Purpose & scope (what “reachability” means across components).
+- States and scoring semantics.
+- Provenance and evidence sources.
+- Retention and TTL policy.
+- Sample payloads (with hashes recorded alongside).
--- a/docs/modules/signals/guides/runtime-facts.md
+++ b/docs/modules/signals/guides/runtime-facts.md
@@ -0,0 +1,15 @@
+# Runtime Facts (outline)
+
+## Pending Inputs
+- See sprint SPRINT_0309_0001_0009_docs_tasks_md_ix action tracker; inputs due 2025-12-09..12 from owning guilds.
+
+## Determinism Checklist
+- [ ] Hash any inbound assets/payloads; place sums alongside artifacts (e.g., SHA256SUMS in this folder).
+- [ ] Keep examples offline-friendly and deterministic (fixed seeds, pinned versions, stable ordering).
+- [ ] Note source/approver for any provided captures or schemas.
+
+## Sections to fill (once inputs arrive)
+- Runtime agent capabilities captured.
+- Privacy safeguards and opt-in flags.
+- Payload schema and field descriptions.
+- Examples and hash listings for sample traces.
--- a/docs/modules/signals/guides/unknowns-ranking.md
+++ b/docs/modules/signals/guides/unknowns-ranking.md
@@ -0,0 +1,383 @@
+# Unknowns Ranking Algorithm Reference
+
+This document describes the multi-factor scoring algorithm used to rank and triage unknowns in the StellaOps Signals module.
+
+## Purpose
+
+When reachability analysis encounters unresolved symbols, edges, or package identities, these are recorded as **unknowns**. The ranking algorithm prioritizes unknowns by computing a composite score from five factors, then assigns each to a triage band (HOT/WARM/COLD) that determines rescan scheduling and escalation policies.
+
+## Scoring Formula
+
+The composite score is computed as:
+
+```
+Score = wP × P + wE × E + wU × U + wC × C + wS × S
+```
+
+Where:
+- **P** = Popularity (deployment impact)
+- **E** = Exploit potential (CVE severity)
+- **U** = Uncertainty density (flag accumulation)
+- **C** = Centrality (graph position importance)
+- **S** = Staleness (evidence age)
+
+All factors are normalized to [0.0, 1.0] before weighting. The final score is clamped to [0.0, 1.0].
+
+### Default Weights
+
+| Factor | Weight | Description |
+|--------|--------|-------------|
+| wP | 0.25 | Popularity weight |
+| wE | 0.25 | Exploit potential weight |
+| wU | 0.25 | Uncertainty density weight |
+| wC | 0.15 | Centrality weight |
+| wS | 0.10 | Staleness weight |
+
+Weights must sum to 1.0 and are configurable via `Signals:UnknownsScoring` settings.
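The weighted sum and the sum-to-one constraint can be expressed in a few lines. This is a minimal sketch with hypothetical names (`DEFAULT_WEIGHTS`, `composite_score`), assuming the factors arrive already normalized; the tolerance used for the weight check is also an assumption.

```python
import math
from typing import Dict

# Default weights from the table above, keyed by factor letter.
DEFAULT_WEIGHTS = {"P": 0.25, "E": 0.25, "U": 0.25, "C": 0.15, "S": 0.10}


def _clamp01(value: float) -> float:
    return min(1.0, max(0.0, value))


def composite_score(factors: Dict[str, float],
                    weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Compute Score = wP*P + wE*E + wU*U + wC*C + wS*S, clamped to [0, 1]."""
    # Assumption: a small absolute tolerance absorbs floating-point error.
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    total = sum(weights[name] * _clamp01(factors[name]) for name in weights)
    return _clamp01(total)
```

Rejecting a misconfigured weight set up front keeps scores comparable across deployments, since a sum other than 1.0 would silently shift every band boundary.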
+
+## Factor Details
+
+### Factor P: Popularity (Deployment Impact)
+
+Measures how widely the unknown's package is deployed across monitored environments.
+
+**Formula:**
+```
+P = min(1, log10(1 + deploymentCount) / log10(1 + maxDeployments))
+```
+
+**Parameters:**
+- `deploymentCount`: Number of deployments referencing the package (from `deploy_refs` table)
+- `maxDeployments`: Normalization ceiling (default: 100)
+
+**Rationale:** Logarithmic scaling prevents a single highly-deployed package from dominating scores while still prioritizing widely-used dependencies.
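The popularity formula can be written directly in code. A minimal sketch, assuming `max_deployments` is positive and treating negative counts as zero; the function name is illustrative.

```python
import math


def popularity(deployment_count: int, max_deployments: int = 100) -> float:
    """P = min(1, log10(1 + deploymentCount) / log10(1 + maxDeployments))."""
    if max_deployments <= 0:
        raise ValueError("max_deployments must be positive")
    if deployment_count <= 0:
        return 0.0  # assumption: negative counts are treated as no deployments
    return min(1.0, math.log10(1 + deployment_count) / math.log10(1 + max_deployments))
```

Because of the logarithm, going from 10 to 20 deployments raises the factor more than going from 80 to 90, and anything at or above the ceiling saturates at 1.0.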
+
+### Factor E: Exploit Potential (CVE Severity)
+
+Estimates the consequence severity if the unknown resolves to a vulnerable component.
+
+**Current Implementation:**
+- Returns 0.5 (medium potential) when no CVE association exists
+- Future: Integrate KEV lookup, EPSS scores, and exploit database references
+
+**Planned Enhancements:**
+- CVE severity mapping (Critical=1.0, High=0.8, Medium=0.5, Low=0.2)
+- KEV (Known Exploited Vulnerabilities) flag boost
+- EPSS (Exploit Prediction Scoring System) integration
+
+### Factor U: Uncertainty Density (Flag Accumulation)
+
+Aggregates uncertainty signals from multiple sources. Each flag contributes a weighted penalty.
+
+**Flag Weights:**
+
+| Flag | Weight | Description |
+|------|--------|-------------|
+| `NoProvenanceAnchor` | 0.30 | Cannot verify package source |
+| `VersionRange` | 0.25 | Version specified as range, not exact |
+| `DynamicCallTarget` | 0.25 | Reflection, eval, or dynamic dispatch |
+| `ConflictingFeeds` | 0.20 | Contradictory info from different feeds |
+| `ExternalAssembly` | 0.20 | Assembly outside analysis scope |
+| `MissingVector` | 0.15 | No CVSS vector for severity assessment |
+| `UnreachableSourceAdvisory` | 0.10 | Source advisory URL unreachable |
+
+**Formula:**
+```
+U = min(1.0, sum(activeFlags × flagWeight))
+```
+
+**Example:**
+- NoProvenanceAnchor (0.30) + VersionRange (0.25) + MissingVector (0.15) = 0.70
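The flag accumulation can be sketched as a capped sum over the active flags. The weights below are copied from the table above; the function name and the choice to reject unrecognized flags are assumptions.

```python
from typing import Iterable

FLAG_WEIGHTS = {
    "NoProvenanceAnchor": 0.30,
    "VersionRange": 0.25,
    "DynamicCallTarget": 0.25,
    "ConflictingFeeds": 0.20,
    "ExternalAssembly": 0.20,
    "MissingVector": 0.15,
    "UnreachableSourceAdvisory": 0.10,
}


def uncertainty(active_flags: Iterable[str]) -> float:
    """U = min(1.0, sum of weights of active flags); each flag counts once."""
    flags = set(active_flags)
    unknown = flags - FLAG_WEIGHTS.keys()
    if unknown:
        # Assumption: unrecognized flags are an error rather than ignored.
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return min(1.0, sum(FLAG_WEIGHTS[flag] for flag in flags))
```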
+
+### Factor C: Centrality (Graph Position Importance)
+
+Measures the unknown's position importance in the call graph using betweenness centrality.
+
+**Formula:**
+```
+C = min(1.0, betweenness / maxBetweenness)
+```
+
+**Parameters:**
+- `betweenness`: Raw betweenness centrality from graph analysis
+- `maxBetweenness`: Normalization ceiling (default: 1000)
+
+**Rationale:** High-betweenness nodes appear on many shortest paths, meaning they're likely to be reached regardless of entry point.
+
+**Related Metrics:**
+- `DegreeCentrality`: Number of incoming + outgoing edges (stored but not used in score)
+- `BetweennessCentrality`: Raw betweenness value (stored for debugging)
+
+### Factor S: Staleness (Evidence Age)
+
+Measures the age of the evidence, counted from the last successful analysis attempt.
+
+**Formula:**
+```
+S = min(1.0, daysSinceLastAnalysis / maxDays)
+```
+
+With exponential decay enhancement (optional):
+```
+S = 1 - exp(-daysSinceLastAnalysis / tau)
+```
+
+**Parameters:**
+- `daysSinceLastAnalysis`: Days since `LastAnalyzedAt` timestamp
+- `maxDays`: Staleness ceiling (default: 14 days)
+- `tau`: Decay constant for exponential model (default: 14)
+
+**Special Cases:**
+- Never analyzed (`LastAnalyzedAt` is null): S = 1.0 (maximum staleness)
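Both staleness variants, including the never-analyzed special case, can be sketched as below. Function names are illustrative, and the use of fractional days (rather than whole days) is an assumption.

```python
import math
from datetime import datetime
from typing import Optional


def _days_since(last_analyzed_at: datetime, now: datetime) -> float:
    # Assumption: fractional days; clock skew into the future counts as zero.
    return max(0.0, (now - last_analyzed_at).total_seconds() / 86400.0)


def staleness_linear(last_analyzed_at: Optional[datetime], now: datetime,
                     max_days: float = 14.0) -> float:
    """S = min(1.0, daysSinceLastAnalysis / maxDays); 1.0 if never analyzed."""
    if last_analyzed_at is None:
        return 1.0
    return min(1.0, _days_since(last_analyzed_at, now) / max_days)


def staleness_exponential(last_analyzed_at: Optional[datetime], now: datetime,
                          tau: float = 14.0) -> float:
    """S = 1 - exp(-daysSinceLastAnalysis / tau); 1.0 if never analyzed."""
    if last_analyzed_at is None:
        return 1.0
    return 1.0 - math.exp(-_days_since(last_analyzed_at, now) / tau)
```

The linear form reaches 1.0 exactly at `max_days`, while the exponential form approaches 1.0 asymptotically and so keeps distinguishing between very old entries.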
+
+## Band Assignment
+
+Based on the composite score, unknowns are assigned to triage bands:
+
+| Band | Threshold | Rescan Policy | Description |
+|------|-----------|---------------|-------------|
+| **HOT** | Score >= 0.70 | 15 minutes | Immediate rescan + VEX escalation |
+| **WARM** | 0.40 <= Score < 0.70 | 24 hours | Scheduled rescan within 12-72h |
+| **COLD** | Score < 0.40 | 7 days | Weekly batch processing |
+
+Thresholds are configurable:
+```yaml
+Signals:
+  UnknownsScoring:
+    HotThreshold: 0.70
+    WarmThreshold: 0.40
+```
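Band assignment and the matching rescan interval can be sketched as follows, using the default thresholds and the rescan policy column from the table above. The function names are illustrative, not the scheduler's API.

```python
from datetime import datetime, timedelta

# Rescan intervals from the band table (defaults).
RESCAN_INTERVALS = {
    "Hot": timedelta(minutes=15),
    "Warm": timedelta(hours=24),
    "Cold": timedelta(days=7),
}


def assign_band(score: float, hot_threshold: float = 0.70,
                warm_threshold: float = 0.40) -> str:
    """Map a composite score to a triage band; lower bounds are inclusive."""
    if score >= hot_threshold:
        return "Hot"
    if score >= warm_threshold:
        return "Warm"
    return "Cold"


def next_rescan(band: str, computed_at: datetime) -> datetime:
    """Compute the next scheduled rescan time for a band."""
    return computed_at + RESCAN_INTERVALS[band]
```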
+
+## Scheduler Integration
+
+The `UnknownsRescanWorker` processes unknowns based on their band:
+
+### HOT Band Processing
+- Poll interval: 1 minute
+- Batch size: 10 items
+- Action: Trigger immediate rescan via `IRescanOrchestrator`
+- On failure: Exponential backoff, max 3 retries before demotion to WARM
+
+### WARM Band Processing
+- Poll interval: 5 minutes
+- Batch size: 50 items
+- Scheduled window: 12-72 hours based on score within band
+- On failure: Increment `RescanAttempts`, re-queue with delay
+
+### COLD Band Processing
+- Schedule: Weekly on configurable day (default: Sunday)
+- Batch size: 500 items
+- Action: Batch rescan job submission
+- On failure: Log and retry next week
+
+## Normalization Trace
+
+Each scored unknown includes a `NormalizationTrace` for debugging and replay:
+
+```json
+{
+  "rawPopularity": 19,
+  "normalizedPopularity": 0.65,
+  "popularityFormula": "min(1, log10(1 + 19) / log10(1 + 100))",
+
+  "rawExploitPotential": 0.5,
+  "normalizedExploitPotential": 0.5,
+
+  "rawUncertainty": 0.55,
+  "normalizedUncertainty": 0.55,
+  "activeFlags": ["NoProvenanceAnchor", "VersionRange"],
+
+  "rawCentrality": 250.0,
+  "normalizedCentrality": 0.25,
+
+  "rawStaleness": 7,
+  "normalizedStaleness": 0.5,
+
+  "weights": {
+    "wP": 0.25,
+    "wE": 0.25,
+    "wU": 0.25,
+    "wC": 0.15,
+    "wS": 0.10
+  },
+  "finalScore": 0.51,
+  "assignedBand": "Warm",
+  "computedAt": "2025-12-15T10:00:00Z"
+}
+```
+
+**Replay Capability:** Given the trace, the exact score can be recomputed:
+```
+Score = 0.25×0.65 + 0.25×0.5 + 0.25×0.55 + 0.15×0.25 + 0.10×0.5
+      = 0.1625 + 0.125 + 0.1375 + 0.0375 + 0.05
+      = 0.5125 ≈ 0.51
+```
+
+## API Endpoints
+
+### Query Unknowns by Band
+
+```
+GET /api/signals/unknowns?band=hot&limit=50&offset=0
+```
+
+Response:
+```json
+{
+  "items": [
+    {
+      "id": "unk-123",
+      "subjectKey": "myapp|1.0.0",
+      "purl": "pkg:npm/lodash@4.17.21",
+      "score": 0.82,
+      "band": "Hot",
+      "flags": { "noProvenanceAnchor": true, "versionRange": true },
+      "nextScheduledRescan": "2025-12-15T10:15:00Z"
+    }
+  ],
+  "total": 15,
+  "hasMore": false
+}
+```
+
+### Get Score Explanation
+
+```
+GET /api/signals/unknowns/{id}/explain
+```
+
+Response:
+```json
+{
+  "unknown": { /* full UnknownSymbolDocument */ },
+  "normalizationTrace": { /* trace object */ },
+  "factorBreakdown": {
+    "popularity": { "raw": 19, "normalized": 0.65, "weighted": 0.1625 },
+    "exploitPotential": { "raw": 0.5, "normalized": 0.5, "weighted": 0.125 },
+    "uncertainty": { "raw": 0.55, "normalized": 0.55, "weighted": 0.1375 },
+    "centrality": { "raw": 250, "normalized": 0.25, "weighted": 0.0375 },
+    "staleness": { "raw": 7, "normalized": 0.5, "weighted": 0.05 }
+  },
+  "bandThresholds": { "hot": 0.70, "warm": 0.40 }
+}
+```
+
+## Configuration Reference
+
+```yaml
+Signals:
+  UnknownsScoring:
+    # Factor weights (must sum to 1.0)
+    WeightPopularity: 0.25
+    WeightExploitPotential: 0.25
+    WeightUncertainty: 0.25
+    WeightCentrality: 0.15
+    WeightStaleness: 0.10
+
+    # Popularity normalization
+    PopularityMaxDeployments: 100
+
+    # Uncertainty flag weights
+    FlagWeightNoProvenance: 0.30
+    FlagWeightVersionRange: 0.25
+    FlagWeightConflictingFeeds: 0.20
+    FlagWeightMissingVector: 0.15
+    FlagWeightUnreachableSource: 0.10
+    FlagWeightDynamicTarget: 0.25
+    FlagWeightExternalAssembly: 0.20
+
+    # Centrality normalization
+    CentralityMaxBetweenness: 1000.0
+
+    # Staleness normalization
+    StalenessMaxDays: 14
+    StalenessTau: 14  # For exponential decay
+
+    # Band thresholds
+    HotThreshold: 0.70
+    WarmThreshold: 0.40
+
+    # Rescan scheduling
+    HotRescanMinutes: 15
+    WarmRescanHours: 24
+    ColdRescanDays: 7
+
+  UnknownsDecay:
+    # Nightly batch decay
+    BatchEnabled: true
+    MaxSubjectsPerBatch: 1000
+    ColdBatchDay: Sunday
+```
+
+## Determinism Requirements
+
+The scoring algorithm is fully deterministic:
+
+1. **Same inputs produce identical scores** - Given identical `UnknownSymbolDocument`, deployment counts, and graph metrics, the score will always be the same
+2. **Normalization trace enables replay** - The trace contains all raw values and weights needed to reproduce the score
+3. **Timestamps use UTC ISO 8601** - All `ComputedAt`, `LastAnalyzedAt`, and `NextScheduledRescan` timestamps are UTC
+4. **Weights logged per computation** - The trace includes the exact weights used, allowing audit of configuration changes
+
+## Database Schema
+
+```sql
+-- Unknowns table (enhanced)
+CREATE TABLE signals.unknowns (
+    id UUID PRIMARY KEY,
+    subject_key TEXT NOT NULL,
+    purl TEXT,
+    symbol_id TEXT,
+    callgraph_id TEXT,
+
+    -- Scoring factors
+    popularity_score FLOAT DEFAULT 0,
+    deployment_count INT DEFAULT 0,
+    exploit_potential_score FLOAT DEFAULT 0,
+    uncertainty_score FLOAT DEFAULT 0,
+    centrality_score FLOAT DEFAULT 0,
+    degree_centrality INT DEFAULT 0,
+    betweenness_centrality FLOAT DEFAULT 0,
+    staleness_score FLOAT DEFAULT 0,
+    days_since_last_analysis INT DEFAULT 0,
+
+    -- Composite score and band
+    score FLOAT DEFAULT 0,
+    band TEXT DEFAULT 'cold' CHECK (band IN ('hot', 'warm', 'cold')),
+
+    -- Metadata
+    flags JSONB DEFAULT '{}',
+    normalization_trace JSONB,
+    rescan_attempts INT DEFAULT 0,
+    last_rescan_result TEXT,
+
+    -- Timestamps
+    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    last_analyzed_at TIMESTAMPTZ,
+    next_scheduled_rescan TIMESTAMPTZ
+);
+
+-- Indexes for band-based queries
+CREATE INDEX idx_unknowns_band ON signals.unknowns(band);
+CREATE INDEX idx_unknowns_score ON signals.unknowns(score DESC);
+CREATE INDEX idx_unknowns_next_rescan ON signals.unknowns(next_scheduled_rescan)
+    WHERE next_scheduled_rescan IS NOT NULL;
+CREATE INDEX idx_unknowns_subject ON signals.unknowns(subject_key);
+```
+
+## Metrics and Observability
+
+The following metrics are exposed for monitoring:
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `signals_unknowns_total` | Gauge | Total unknowns by band |
+| `signals_unknowns_rescans_total` | Counter | Rescans triggered by band |
+| `signals_unknowns_scoring_duration_seconds` | Histogram | Scoring computation time |
+| `signals_unknowns_band_transitions_total` | Counter | Band changes (e.g., WARM->HOT) |
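The counter and gauge semantics in the table can be sketched with plain Python containers. This is not a metrics client: `band_transitions` and `unknowns_by_band` are hypothetical stand-ins, and keying transitions by a (from, to) pair is an assumption about the metric labels.

```python
from collections import Counter

# Stand-in for signals_unknowns_band_transitions_total, keyed by (from, to).
band_transitions = Counter()


def record_band(previous, current):
    """Count a transition only when the band actually changes."""
    if previous != current:
        band_transitions[(previous, current)] += 1


def unknowns_by_band(bands):
    """Gauge-style snapshot: current number of unknowns per band."""
    return Counter(bands)
```

A counter only ever increases, which is why an unchanged band is skipped rather than recorded as a zero-value event.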
+
+## Related Documentation
+
+- [Unknowns Registry](./unknowns-registry.md) - Data model and API for unknowns
+- [Reachability Analysis](./reachability.md) - Reachability scoring integration
+- [Callgraph Schema](./callgraph-formats.md) - Graph structure for centrality computation
--- a/docs/modules/signals/guides/unknowns-registry.md
+++ b/docs/modules/signals/guides/unknowns-registry.md
@@ -0,0 +1,81 @@
+# Unknowns Registry (Signals) - November 2025
+
+This document defines the Unknowns Registry, which turns unresolved identities or edges into first-class signals. It replaces the temporary notes from late 2025 advisories.
+
+## 1. Purpose
+
+When scanners or runtime probes cannot decisively map artifacts, symbols, or package identities, the gap is recorded as an **Unknown** instead of being dropped. Policy and scoring can then incorporate “unknowns pressure” to avoid silent false negatives.
+
+## 2. Data model (v0)
+
+```json
+{
+  "unknown_id": "unk:sha256:<type+scope+evidence>",
+  "observed_at": "2025-11-20T00:00:00Z",
+  "provenance": { "source": "Scanner|Signals|SbomService|Vexer", "host": "runner-42", "scan_id": "scan:..." },
+  "scope": { "artifact": { "type": "oci.image", "ref": "registry/app@sha256:..." }, "subpath": "/app/bin/libssl.so.3", "phase": "scan|runtime|build" },
+  "unknown_type": "identity_gap|version_conflict|hash_mismatch|missing_edge|runtime_shadow|policy_undecidable",
+  "evidence": { "raw": "dynsym missing for libssl.so.3", "signals": ["sym:memcpy", "import:SSL_free"] },
+  "transitive": { "depth": 1, "parents": ["pkg:deb/openssl@3.0.2"], "children": [] },
+  "confidence": { "p": 0.42, "method": "rule" },
+  "exposure_hints": { "surface": ["startup"], "runtime_hits": 0 },
+  "status": "open|triaged|suppressed|resolved",
+  "labels": ["reachability:possible", "sbom:incomplete"]
+}
+```
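The `unknown_id` pattern `unk:sha256:<type+scope+evidence>` suggests a content-derived identifier. One possible derivation is sketched below, assuming the three parts are serialized as canonical JSON (sorted keys, compact separators) before hashing. The canonical form and the `unknown_id` helper are assumptions, not the documented algorithm.

```python
import hashlib
import json


def unknown_id(unknown_type, scope, evidence):
    """Derive a stable id from type + scope + evidence (assumed canonical form)."""
    canonical = json.dumps(
        {"unknown_type": unknown_type, "scope": scope, "evidence": evidence},
        sort_keys=True,          # key order must not change the hash
        separators=(",", ":"),   # no whitespace variation
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"unk:sha256:{digest}"
```

Deriving the id from content means two producers that observe the same gap emit the same `unknown_id`, which is what makes upsert-by-id deduplicate naturally.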
+
+## 3. API (idempotent)
+
+- `POST /unknowns/ingest` — upsert by `unknown_id`; repeat payloads are no-ops.
+- `GET /unknowns?artifact=...&status=open` — list unknowns for a target.
+- `POST /unknowns/{id}/triage` — update `status`/`labels`, attach rationale.
+- `GET /unknowns/metrics` — density by artifact / unknown_type / depth.
+
+All endpoints are additive, and none performs hard deletes. Payloads must include tenant bindings, plus CAS URIs when evidence is stored externally.
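The idempotent upsert behavior of `POST /unknowns/ingest` can be sketched with an in-memory store. `UnknownsStore` and its methods are hypothetical and only model the semantics described above. They are not the service implementation.

```python
class UnknownsStore:
    """In-memory stand-in for the registry; models upsert-by-id semantics only."""

    def __init__(self):
        self._records = {}

    def ingest(self, payload):
        """Upsert by unknown_id; return True only if the stored state changed."""
        key = payload["unknown_id"]
        if self._records.get(key) == payload:
            return False  # repeat payload: no-op
        self._records[key] = dict(payload)
        return True

    def list_by_status(self, status):
        """Return all stored unknowns with the given status."""
        return [r for r in self._records.values() if r["status"] == status]
```

Returning a changed/unchanged flag lets a caller retry safely after a timeout: a second delivery of the same payload leaves the store untouched.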
+
+## 4. Producers
+
+- **Scanner**: unresolved symbol → package mapping (stripped binaries), missing build-id, ambiguous purl; log with `unknown_type=identity_gap` or `missing_edge`.
+- **Signals**: runtime hits that cannot map to a graph node or purl; unresolved call edges.
+- **SbomService**: conflicting versions for the same path; hash mismatch between SBOM and observed file.
+- **Vexer/Policy**: advisory without trustworthy provenance (`unknown_type=policy_undecidable`).
+
+## 5. Consumers & scoring
+
+- Signals scoring adds `unknowns_pressure = f(density(depth<=1), runtime_shadow, policy_undecidable)` and feeds it into reachability/risk scores.
+- Policy can block `not_affected` claims when `unknowns_pressure` exceeds thresholds.
+- UI/CLI show unknown chips with reason and depth; operators can triage or suppress.
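The document leaves the function `f` in `unknowns_pressure` unspecified. The sketch below shows one possible shape under stated assumptions: density is the number of open unknowns at depth <= 1 divided by a component count, and the two unknown types contribute fixed boosts. The weights, the default threshold, and both helper names are hypothetical.

```python
def unknowns_pressure(unknowns, component_count):
    """Hypothetical f(): blend shallow density with two high-signal types."""
    open_items = [u for u in unknowns if u["status"] == "open"]
    shallow = [u for u in open_items if u["transitive"]["depth"] <= 1]
    density = min(1.0, len(shallow) / max(1, component_count))
    has_shadow = any(u["unknown_type"] == "runtime_shadow" for u in open_items)
    has_undecidable = any(u["unknown_type"] == "policy_undecidable" for u in open_items)
    # Illustrative weights only; booleans count as 0 or 1 in the arithmetic.
    return min(1.0, 0.6 * density + 0.25 * has_shadow + 0.15 * has_undecidable)


def allow_not_affected(pressure, threshold=0.5):
    """Policy gate: reject a not_affected claim when pressure exceeds the threshold."""
    return pressure <= threshold
```

Clamping the result to [0, 1] keeps the value comparable with the other normalized scores it feeds into.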
+
+### 5.1 Multi-Factor Ranking
+
+Unknowns are ranked using a 5-factor scoring algorithm that computes a composite score from:
+- **Popularity (P)** - Deployment impact based on usage count
+- **Exploit Potential (E)** - CVE severity if known
+- **Uncertainty (U)** - Accumulated flag weights
+- **Centrality (C)** - Graph position importance (betweenness)
+- **Staleness (S)** - Evidence age since last analysis
+
+Based on the composite score, unknowns are assigned to triage bands:
+- **HOT** (score >= 0.70): Immediate rescan, 15-minute scheduling
+- **WARM** (0.40 <= score < 0.70): Scheduled rescan within 12-72h
+- **COLD** (score < 0.40): Weekly batch processing
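
The band thresholds above map directly to a small function. In this sketch the thresholds come from the list, while the `assign_band` and `MAX_RESCAN_DELAY` names are hypothetical. Reading WARM as the 72h upper bound and COLD as 7 days is an assumption.

```python
from datetime import timedelta


def assign_band(score):
    """Map a composite score in [0, 1] to a triage band."""
    if score >= 0.70:
        return "hot"
    if score >= 0.40:
        return "warm"
    return "cold"


# Assumed upper bound on the delay before the next rescan for each band.
MAX_RESCAN_DELAY = {
    "hot": timedelta(minutes=15),
    "warm": timedelta(hours=72),   # latest end of the 12-72h window
    "cold": timedelta(days=7),     # weekly batch
}
```

The comparisons are inclusive at the lower edge of each band, so a score of exactly 0.70 is HOT and exactly 0.40 is WARM.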
+
+See [Unknowns Ranking Algorithm](./unknowns-ranking.md) for the complete formula reference.
+
+## 6. Storage & CAS
+
+- Primary store: append-only KV/graph in PostgreSQL (tables `unknowns`, `unknown_metrics`).
+- Evidence blobs: CAS under `cas://unknowns/{sha256}` for large payloads (runtime traces, partial SBOMs).
+- Include analyzer fingerprint + schema version in each record for replay.
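
A content-addressed evidence reference following the `cas://unknowns/{sha256}` pattern could be built as sketched here. It assumes the digest is the lowercase hex SHA-256 of the raw blob bytes. The `evidence_ref` helper and its field names are hypothetical.

```python
import hashlib


def cas_uri(blob: bytes) -> str:
    """Build the CAS URI for an evidence blob from its SHA-256 hex digest."""
    return f"cas://unknowns/{hashlib.sha256(blob).hexdigest()}"


def evidence_ref(blob: bytes, analyzer_fingerprint: str, schema_version: str) -> dict:
    """Bundle the CAS URI with the replay metadata each record should carry."""
    return {
        "cas_uri": cas_uri(blob),
        "size_bytes": len(blob),
        "analyzer_fingerprint": analyzer_fingerprint,
        "schema_version": schema_version,
    }
```

Identical blobs always resolve to the same URI, so storing the same trace twice costs no extra space in the CAS.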
+
+## 7. Integration checkpoints
+
+- Add writer hooks in Scanner/Signals once `richgraph-v1` and runtime ingestion surface unmapped items.
+- Extend reachability lattice docs to note `unknowns_pressure` input.
+- Add Grafana panel for unknown density per artifact/namespace.
+
+## 8. Acceptance criteria
+
+- APIs deployed with idempotent behavior and tenant guards.
+- At least two producer paths writing Unknowns (Scanner unresolved symbol; Signals runtime shadow).
+- Metrics endpoint shows density and trend; UI/CLI expose triage status.