# Unknowns Registry (Signals) - November 2025
This document defines the Unknowns Registry, which turns unresolved identities or edges into first-class signals. It replaces the temporary notes from late 2025 advisories.
## 1. Purpose
When scanners or runtime probes cannot decisively map artifacts, symbols, or package identities, the gap is recorded as an **Unknown** instead of being dropped. Policy and scoring can then incorporate “unknowns pressure” to avoid silent false negatives.
## 2. Data model (v0)
```json
{
  "unknown_id": "unk:sha256:<type+scope+evidence>",
  "observed_at": "2025-11-20T00:00:00Z",
  "provenance": { "source": "Scanner|Signals|SbomService|Vexer", "host": "runner-42", "scan_id": "scan:..." },
  "scope": { "artifact": { "type": "oci.image", "ref": "registry/app@sha256:..." }, "subpath": "/app/bin/libssl.so.3", "phase": "scan|runtime|build" },
  "unknown_type": "identity_gap|version_conflict|hash_mismatch|missing_edge|runtime_shadow|policy_undecidable",
  "evidence": { "raw": "dynsym missing for libssl.so.3", "signals": ["sym:memcpy", "import:SSL_free"] },
  "transitive": { "depth": 1, "parents": ["pkg:deb/openssl@3.0.2"], "children": [] },
  "confidence": { "p": 0.42, "method": "rule" },
  "exposure_hints": { "surface": ["startup"], "runtime_hits": 0 },
  "status": "open|triaged|suppressed|resolved",
  "labels": ["reachability:possible", "sbom:incomplete"]
}
```
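The `unknown_id` placeholder suggests a content-derived key over type, scope, and evidence. A minimal sketch of one way to derive it follows; the canonical-JSON encoding (sorted keys, compact separators) and the function name `compute_unknown_id` are assumptions for illustration, not the registry's specified canonicalization.

```python
import hashlib
import json


def compute_unknown_id(unknown_type: str, scope: dict, evidence: dict) -> str:
    """Derive a deterministic id from type + scope + evidence.

    Assumption: the hash input is canonical JSON (sorted keys, compact
    separators). The real canonicalization rule is not defined in this doc.
    """
    canonical = json.dumps(
        {"unknown_type": unknown_type, "scope": scope, "evidence": evidence},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"unk:sha256:{digest}"


# Key order in the input dicts does not affect the resulting id.
example_id = compute_unknown_id(
    "identity_gap",
    {"subpath": "/app/bin/libssl.so.3", "phase": "scan"},
    {"raw": "dynsym missing for libssl.so.3"},
)
```

Because the id depends only on the hashed fields, two producers that observe the same gap would emit the same key, which is what makes upsert-by-id meaningful.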
## 3. API (idempotent)
- `POST /unknowns/ingest` — upsert by `unknown_id`; repeat payloads are no-ops.
- `GET /unknowns?artifact=...&status=open` — list unknowns for a target.
- `POST /unknowns/{id}/triage` — update `status`/`labels`, attach rationale.
- `GET /unknowns/metrics` — density by artifact / unknown_type / depth.
All endpoints are additive; there are no hard deletes. Every payload must include tenant bindings, plus CAS URIs when evidence is stored externally.
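The idempotency contract of `POST /unknowns/ingest` can be illustrated with an in-memory stand-in. This is a sketch only: the class `InMemoryUnknownsStore` and its method names are hypothetical and do not describe the service's actual implementation or storage layer.

```python
from typing import Dict, List


class InMemoryUnknownsStore:
    """Hypothetical stand-in for the registry, keyed by unknown_id."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def ingest(self, record: dict) -> bool:
        """Upsert by unknown_id; return True only if stored state changed."""
        key = record["unknown_id"]
        if self._records.get(key) == record:
            return False  # repeat payload: no-op
        self._records[key] = dict(record)
        return True

    def query(self, artifact_ref: str, status: str) -> List[dict]:
        """Mirror of GET /unknowns?artifact=...&status=..."""
        return [
            r for r in self._records.values()
            if r["scope"]["artifact"]["ref"] == artifact_ref
            and r["status"] == status
        ]


store = InMemoryUnknownsStore()
payload = {
    "unknown_id": "unk:sha256:abc",
    "scope": {"artifact": {"type": "oci.image", "ref": "registry/app"}},
    "status": "open",
}
first = store.ingest(payload)   # stored
second = store.ingest(payload)  # identical payload, nothing changes
```

Returning a changed/unchanged flag makes the no-op case observable to callers and tests without requiring a separate read.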
## 4. Producers
- **Scanner**: unresolved symbol → package mapping (stripped binaries), missing build-id, ambiguous purl; log with `unknown_type=identity_gap` or `missing_edge`.
- **Signals**: runtime hits that cannot map to a graph node or purl; unresolved call edges.
- **SbomService**: conflicting versions for same path; hash mismatch between SBOM and observed file.
- **Vexer/Policy**: advisories whose provenance cannot be trusted (`unknown_type=policy_undecidable`).
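The producer list above can be read as a mapping from source to the unknown types it is expected to emit. The sketch below encodes that reading; only the Scanner and Vexer/Policy types are named explicitly in the text, so the Signals and SbomService entries are inferred from their descriptions and should be treated as assumptions.

```python
# Hypothetical lookup table. Scanner and Vexer entries come from the text;
# Signals and SbomService entries are inferred and may not match the service.
PRODUCER_TYPES = {
    "Scanner": {"identity_gap", "missing_edge"},
    "Signals": {"runtime_shadow", "missing_edge"},
    "SbomService": {"version_conflict", "hash_mismatch"},
    "Vexer": {"policy_undecidable"},
}


def is_expected_type(source: str, unknown_type: str) -> bool:
    """Return True if this source is expected to emit this unknown_type."""
    return unknown_type in PRODUCER_TYPES.get(source, set())
```

A check like this could run at ingest time to flag producers that emit types outside their expected set.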
## 5. Consumers & scoring
- Signals scoring adds `unknowns_pressure = f(density(depth<=1), runtime_shadow, policy_undecidable)` and feeds it into reachability/risk scores.
- Policy can block `not_affected` claims when `unknowns_pressure` exceeds thresholds.
- UI/CLI show unknown chips with reason and depth; operators can triage or suppress.
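The function `f` behind `unknowns_pressure` is not specified here. The sketch below assumes a simple weighted sum for illustration: the weights, the normalization by `component_count`, and the policy threshold are all hypothetical values, not the scoring service's formula.

```python
from typing import Iterable


def unknowns_pressure(
    unknowns: Iterable[dict],
    component_count: int,
    w_density: float = 0.5,   # assumed weight
    w_shadow: float = 0.3,    # assumed weight
    w_policy: float = 0.2,    # assumed weight
) -> float:
    """Combine near-depth density with runtime_shadow / policy_undecidable flags."""
    open_items = [u for u in unknowns if u["status"] == "open"]
    near = [u for u in open_items if u["transitive"]["depth"] <= 1]
    # Density of unknowns at depth <= 1, capped at 1.0.
    density = min(1.0, len(near) / max(component_count, 1))
    shadow = 1.0 if any(u["unknown_type"] == "runtime_shadow" for u in open_items) else 0.0
    policy = 1.0 if any(u["unknown_type"] == "policy_undecidable" for u in open_items) else 0.0
    return w_density * density + w_shadow * shadow + w_policy * policy


def blocks_not_affected(pressure: float, threshold: float = 0.5) -> bool:
    """Policy gate: reject a not_affected claim above the (assumed) threshold."""
    return pressure > threshold
```

With the default weights the result stays in the range 0.0 to 1.0, which keeps a single threshold meaningful across artifacts of different sizes.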
### 5.1 Multi-Factor Ranking
Unknowns are ranked using a 5-factor scoring algorithm that computes a composite score from:
- **Popularity (P)** - Deployment impact based on usage count
- **Exploit Potential (E)** - CVE severity if known
- **Uncertainty (U)** - Accumulated flag weights
- **Centrality (C)** - Graph position importance (betweenness)
- **Staleness (S)** - Evidence age since last analysis
Based on the composite score, unknowns are assigned to triage bands:
- **HOT** (score >= 0.70): Immediate rescan, 15-minute scheduling
- **WARM** (0.40 <= score < 0.70): Scheduled rescan within 12-72h
- **COLD** (score < 0.40): Weekly batch processing
See [Unknowns Ranking Algorithm](./unknowns-ranking.md) for the complete formula reference.
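The band thresholds above translate directly into a small classifier. This sketch covers only the banding step and takes the composite score as given; the function name `triage_band` is illustrative.

```python
def triage_band(score: float) -> str:
    """Map a composite score to its triage band using the thresholds above."""
    if score >= 0.70:
        return "HOT"
    if score >= 0.40:
        return "WARM"
    return "COLD"
```

Checking the highest threshold first keeps the boundaries unambiguous: a score of exactly 0.70 is HOT and exactly 0.40 is WARM.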
## 6. Storage & CAS
- Primary store: append-only KV/graph in Mongo (collections `unknowns`, `unknown_metrics`).
- Evidence blobs: CAS under `cas://unknowns/{sha256}` for large payloads (runtime traces, partial SBOMs).
- Include analyzer fingerprint + schema version in each record for replay.
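The CAS layout above implies that an evidence blob's address is its SHA-256 digest. A minimal sketch of building that URI follows; the helper name `cas_uri_for` is hypothetical, and the sketch does not cover how blobs are uploaded or when evidence is kept inline.

```python
import hashlib


def cas_uri_for(blob: bytes) -> str:
    """Build the content address for an evidence blob (hex SHA-256 digest)."""
    return f"cas://unknowns/{hashlib.sha256(blob).hexdigest()}"


# Identical bytes always map to the same URI, so re-uploads deduplicate.
trace_uri = cas_uri_for(b"runtime trace bytes")
```

Since the address is derived from content, a consumer replaying a record can verify the fetched blob by re-hashing it and comparing against the URI.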
## 7. Integration checkpoints
- Add writer hooks in Scanner/Signals once `richgraph-v1` and runtime ingestion surface unmapped items.
- Extend reachability lattice docs to note `unknowns_pressure` input.
- Add Grafana panel for unknown density per artifact/namespace.
## 8. Acceptance criteria
- APIs deployed with idempotent behavior and tenant guards.
- At least two producer paths writing Unknowns (Scanner unresolved symbol; Signals runtime shadow).
- Metrics endpoint shows density and trend; UI/CLI expose triage status.
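The density figure that the metrics endpoint is expected to report can be approximated by counting open unknowns per artifact, type, and depth. The sketch below is illustrative only: the function name `density_by` and the choice to count only `open` records are assumptions.

```python
from collections import Counter
from typing import Iterable


def density_by(unknowns: Iterable[dict]) -> Counter:
    """Count open unknowns per (artifact ref, unknown_type, depth)."""
    counts: Counter = Counter()
    for u in unknowns:
        if u["status"] != "open":
            continue  # assumption: only open unknowns contribute to density
        key = (
            u["scope"]["artifact"]["ref"],
            u["unknown_type"],
            u["transitive"]["depth"],
        )
        counts[key] += 1
    return counts
```

A trend could then be derived by comparing these counters across successive snapshots.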