# Uncertainty States & Entropy Scoring
> **Status:** Implemented v0 for reachability facts (Signals).
> **Owners:** Signals Guild · Policy Guild · UI Guild.
StellaOps treats missing data and untrusted evidence as **first-class uncertainty states**, not silent false negatives. Signals persists uncertainty state entries alongside reachability facts and derives a deterministic `riskScore` that increases when entropy is high.
---
## 1. Core states (extensible)
| Code | Name | Meaning |
|------|------|---------|
| `U1` | `MissingSymbolResolution` | Unresolved symbols/edges prevent a complete reachability proof. |
| `U2` | `MissingPurl` | Package identity/version is ambiguous (lockfile absent, heuristics only). |
| `U3` | `UntrustedAdvisory` | Advisory source lacks provenance/corroboration. |
| `U4` | `Unknown` | No analyzers have processed this subject; baseline uncertainty. |
Each state records:
- `entropy` (0..1)
- `evidence[]` list pointing to analyzers/heuristics/sources
- optional `timestamp` (UTC)
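
As an illustration of the record described above, a state could be modelled as below. This is a minimal sketch, not the Signals implementation: the class names `Evidence` and `UncertaintyState` and the snake_case field names are hypothetical, and the only rule enforced is the documented `entropy` range of 0..1.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    """Pointer to the analyzer, heuristic, or source behind a state (hypothetical shape)."""
    type: str        # e.g. "UnknownsRegistry"
    source_id: str   # e.g. "signals.unknowns"
    detail: str = ""


@dataclass(frozen=True)
class UncertaintyState:
    """One uncertainty state entry (hypothetical shape)."""
    code: str                              # "U1" .. "U4"
    name: str                              # e.g. "MissingSymbolResolution"
    entropy: float                         # documented range: 0..1
    evidence: Tuple[Evidence, ...] = ()
    timestamp: Optional[datetime] = None   # UTC when present

    def __post_init__(self) -> None:
        # Reject values outside the documented entropy range.
        if not 0.0 <= self.entropy <= 1.0:
            raise ValueError(f"entropy must be in [0, 1], got {self.entropy}")
```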
---
## 1.1 Uncertainty Tiers (v1 — Sprint 0401)
Uncertainty states are grouped into **tiers** that determine policy thresholds and UI treatment.
### Tier Definitions
| Tier | Entropy Range | States | Risk Modifier | Policy Implication |
|------|---------------|--------|---------------|-------------------|
| **T1 (High)** | `0.7 - 1.0` | `U1` (high), `U4` | `+50%` | Block "not_affected", require human review |
| **T2 (Medium)** | `0.4 - 0.69` | `U1` (medium), `U2` | `+25%` | Warn on "not_affected", flag for review |
| **T3 (Low)** | `0.1 - 0.39` | `U2` (low), `U3` | `+10%` | Allow "not_affected" with advisory note |
| **T4 (Negligible)** | `0.0 - 0.09` | `U3` (low) | `+0%` | Normal processing, no special handling |
### Tier Assignment Rules
1. **U1 (MissingSymbolResolution):**
   - `entropy >= 0.7` → T1 (>30% unknowns in callgraph)
   - `entropy >= 0.4` → T2 (15-30% unknowns)
   - `entropy < 0.4` → T3 (<15% unknowns)
2. **U2 (MissingPurl):**
   - `entropy >= 0.5` → T2 (>50% packages unresolved)
   - `entropy < 0.5` → T3 (<50% packages unresolved)
3. **U3 (UntrustedAdvisory):**
   - `entropy >= 0.6` → T3 (no corroboration)
   - `entropy < 0.6` → T4 (partial corroboration)
4. **U4 (Unknown):**
   - Always T1 (no analysis performed = maximum uncertainty)
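
The assignment rules above can be sketched as a single function. This is an illustrative reading of the four rules, not the code of `UncertaintyTierCalculator`; the function name `assign_tier` and the decision to raise on unknown codes are assumptions.

```python
def assign_tier(code: str, entropy: float) -> str:
    """Map a state code and entropy to a tier, following the per-state rules above."""
    if not 0.0 <= entropy <= 1.0:
        raise ValueError(f"entropy must be in [0, 1], got {entropy}")
    if code == "U1":  # MissingSymbolResolution
        if entropy >= 0.7:
            return "T1"
        if entropy >= 0.4:
            return "T2"
        return "T3"
    if code == "U2":  # MissingPurl
        return "T2" if entropy >= 0.5 else "T3"
    if code == "U3":  # UntrustedAdvisory
        return "T3" if entropy >= 0.6 else "T4"
    if code == "U4":  # Unknown: always the most severe tier
        return "T1"
    # Assumption: codes outside U1..U4 are rejected rather than defaulted.
    raise ValueError(f"unknown uncertainty code: {code}")
```

Under these rules the thresholds are checked from most to least severe, so a `U1` state with `entropy = 0.72` lands in T1 and never reaches the T2 test.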
### Aggregate Tier Calculation
When multiple uncertainty states exist, the aggregate tier is the **maximum** (most severe):
```
aggregateTier = max(tier(state) for state in uncertainty.states)
```
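
Because T1 is the most severe tier, "maximum" here means maximum severity, not the largest tier number. A minimal sketch of that ordering follows; the `SEVERITY` table and the `None` result for an empty state list are assumptions, since the text does not say what the aggregate is when no states exist.

```python
from typing import Iterable, Optional

# Higher number = more severe, so max() picks T1 over T4.
SEVERITY = {"T1": 4, "T2": 3, "T3": 2, "T4": 1}


def aggregate_tier(tiers: Iterable[str]) -> Optional[str]:
    """Return the most severe tier, or None when there are no states (assumption)."""
    return max(tiers, key=SEVERITY.__getitem__, default=None)
```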
---
## 2. JSON shape
```json
{
  "uncertainty": {
    "states": [
      {
        "code": "U1",
        "name": "MissingSymbolResolution",
        "entropy": 0.72,
        "timestamp": "2025-11-12T14:12:00Z",
        "evidence": [
          {
            "type": "UnknownsRegistry",
            "sourceId": "signals.unknowns",
            "detail": "unknownsCount=12;unknownsPressure=0.375"
          }
        ]
      }
    ]
  }
}
```
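
A consumer might read this shape as sketched below. The helper names are hypothetical, and treating `detail` as a `key=value;key=value` string is inferred from the sample only; the format is not specified here.

```python
import json
from typing import Dict, List


def parse_detail(detail: str) -> Dict[str, str]:
    """Split a 'k1=v1;k2=v2' detail string into a dict (format inferred from the sample)."""
    result = {}
    for item in detail.split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments that carry no '='
            result[key.strip()] = value.strip()
    return result


def read_states(payload: str) -> List[dict]:
    """Return the uncertainty states from a JSON document, or [] when absent."""
    document = json.loads(payload)
    return document.get("uncertainty", {}).get("states", [])


sample = """
{"uncertainty": {"states": [{
  "code": "U1", "name": "MissingSymbolResolution", "entropy": 0.72,
  "timestamp": "2025-11-12T14:12:00Z",
  "evidence": [{"type": "UnknownsRegistry", "sourceId": "signals.unknowns",
                "detail": "unknownsCount=12;unknownsPressure=0.375"}]
}]}}
"""

for state in read_states(sample):
    for item in state.get("evidence", []):
        print(state["code"], state["entropy"], parse_detail(item["detail"]))
```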
---
## 3. Risk score math (Signals)
Signals computes a `riskScore` deterministically during reachability recompute:
```
meanEntropy = avg(uncertainty.states[].entropy) // 0 when no states
entropyBoost = clamp(meanEntropy * k, 0 .. boostCeiling)
riskScore = clamp(baseScore * (1 + entropyBoost), 0 .. 1)
```
Where:
- `baseScore` is the average of per-target reachability state scores (before unknowns penalty).
- `k` defaults to `0.5` (`SignalsOptions:Scoring:UncertaintyEntropyMultiplier`).
- `boostCeiling` defaults to `0.5` (`SignalsOptions:Scoring:UncertaintyBoostCeiling`).
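
The formula and its defaults translate directly into code. The sketch below is illustrative, not the `ReachabilityScoringService` source; the function and parameter names are hypothetical, while the defaults `k = 0.5` and `boostCeiling = 0.5` come from the list above.

```python
from typing import Sequence


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def risk_score_v0(
    base_score: float,
    entropies: Sequence[float],
    k: float = 0.5,              # UncertaintyEntropyMultiplier default
    boost_ceiling: float = 0.5,  # UncertaintyBoostCeiling default
) -> float:
    """Compute riskScore from a base score and the entropies of all states."""
    # meanEntropy is 0 when there are no states.
    mean_entropy = sum(entropies) / len(entropies) if entropies else 0.0
    entropy_boost = clamp(mean_entropy * k, 0.0, boost_ceiling)
    return clamp(base_score * (1.0 + entropy_boost), 0.0, 1.0)
```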
---
## 4. Policy guidance (high level)
Uncertainty should bias decisions away from "not affected" when evidence is missing:
- High entropy (`U1` with high `entropy`) should lead to **under investigation** and drive remediation (upload symbols, run probes, close unknowns).
- Low entropy should allow normal confidence-based gates.
See `docs/reachability/lattice.md` for the current reachability score model and `docs/api/signals/reachability-contract.md` for the Signals contract.
---
## 5. Tier-Based Risk Score (v1 — Sprint 0401)
### Risk Score Formula
Building on §3, the v1 risk score incorporates tier-based modifiers:
```
tierModifier = {
  T1: 0.50,
  T2: 0.25,
  T3: 0.10,
  T4: 0.00
}[aggregateTier]
riskScore = clamp(baseScore * (1 + tierModifier + entropyBoost), 0 .. 1)
```
Where:
- `baseScore` is the average of per-target reachability state scores
- `tierModifier` is the tier-based risk increase
- `entropyBoost` is the existing entropy-based boost (§3)
### Example Calculation
```
Given:
- baseScore = 0.4 (moderate reachability)
- uncertainty.states = [
{code: "U1", entropy: 0.72}, // T1 tier
{code: "U3", entropy: 0.45} // T4 tier (U3 with entropy < 0.6)
]
- aggregateTier = T1 (max of T1, T4)
- tierModifier = 0.50
meanEntropy = (0.72 + 0.45) / 2 = 0.585
entropyBoost = clamp(0.585 * 0.5, 0 .. 0.5) = 0.2925
riskScore = clamp(0.4 * (1 + 0.50 + 0.2925), 0 .. 1)
= clamp(0.4 * 1.7925, 0 .. 1)
= clamp(0.717, 0 .. 1)
= 0.717
```
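
The worked example can be reproduced with a short script. This is a sketch under the defaults from §3 (`k = 0.5`, `boostCeiling = 0.5`); the function name `risk_score_v1` is hypothetical and the aggregate tier is passed in rather than derived.

```python
from typing import Sequence

TIER_MODIFIER = {"T1": 0.50, "T2": 0.25, "T3": 0.10, "T4": 0.00}


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def risk_score_v1(
    base_score: float,
    entropies: Sequence[float],
    aggregate_tier: str,
    k: float = 0.5,
    boost_ceiling: float = 0.5,
) -> float:
    """v1 riskScore: the entropy boost from section 3 plus the tier modifier."""
    mean_entropy = sum(entropies) / len(entropies) if entropies else 0.0
    entropy_boost = clamp(mean_entropy * k, 0.0, boost_ceiling)
    tier_modifier = TIER_MODIFIER[aggregate_tier]
    return clamp(base_score * (1.0 + tier_modifier + entropy_boost), 0.0, 1.0)


# Inputs from the example: baseScore 0.4, entropies 0.72 and 0.45, aggregate tier T1.
score = risk_score_v1(0.4, [0.72, 0.45], "T1")
print(round(score, 4))  # 0.717
```

Note that the tier modifier and the entropy boost are added before multiplying, so with a T1 aggregate and a saturated boost the multiplier can reach 2.0 and the final clamp to 1 becomes the binding limit for any `baseScore` above 0.5.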
### Tier Thresholds for Policy Gates
| Tier | `riskScore` Range | VEX "not_affected" | VEX "affected" | Auto-triage |
|------|-------------------|-------------------|----------------|-------------|
| T1 | `>= 0.6` | blocked | review | `under_investigation` |
| T2 | `0.4 - 0.59` | warning | allowed | Manual review |
| T3 | `0.2 - 0.39` | with note | allowed | Normal |
| T4 | `< 0.2` | allowed | allowed | Normal |
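
One way to apply this table is a lookup keyed on `riskScore`. The sketch below is an interpretation, not the policy engine: it assumes the bands are half-open with lower bounds 0.6, 0.4, and 0.2 (so a score such as 0.595 falls in the T2 band), and the `GateDecision` type and its lowercase outcome labels are hypothetical.

```python
from typing import NamedTuple


class GateDecision(NamedTuple):
    band: str          # tier band selected by riskScore
    not_affected: str  # handling of a VEX "not_affected" claim
    affected: str      # handling of a VEX "affected" claim
    auto_triage: str


# (lower bound, decision), ordered from most to least severe.
_BANDS = [
    (0.6, GateDecision("T1", "blocked", "review", "under_investigation")),
    (0.4, GateDecision("T2", "warning", "allowed", "manual_review")),
    (0.2, GateDecision("T3", "with_note", "allowed", "normal")),
    (0.0, GateDecision("T4", "allowed", "allowed", "normal")),
]


def gate_for(risk_score: float) -> GateDecision:
    """Return the gate outcome for a riskScore in [0, 1]."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError(f"riskScore must be in [0, 1], got {risk_score}")
    for lower_bound, decision in _BANDS:
        if risk_score >= lower_bound:
            return decision
    raise AssertionError("unreachable: the last band starts at 0.0")
```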
---
## 6. JSON Schema (v1)
Extended schema with tier information:
```json
{
  "uncertainty": {
    "states": [
      {
        "code": "U1",
        "name": "MissingSymbolResolution",
        "entropy": 0.72,
        "tier": "T1",
        "timestamp": "2025-12-13T10:00:00Z",
        "evidence": [
          {
            "type": "UnknownsRegistry",
            "sourceId": "signals.unknowns",
            "detail": "unknownsCount=45;totalSymbols=125;unknownsPressure=0.36"
          }
        ]
      },
      {
        "code": "U4",
        "name": "Unknown",
        "entropy": 1.0,
        "tier": "T1",
        "timestamp": "2025-12-13T10:00:00Z",
        "evidence": [
          {
            "type": "NoAnalysis",
            "sourceId": "signals.bootstrap",
            "detail": "subject not yet analyzed"
          }
        ]
      }
    ],
    "aggregateTier": "T1",
    "riskScore": 0.717,
    "computedAt": "2025-12-13T10:00:00Z"
  }
}
```
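
A consumer could sanity-check a v1 payload before trusting it. The checker below is a hedged sketch with a hypothetical name (`check_uncertainty`); it verifies only what this document states: entropies and `riskScore` lie in 0..1, tiers are T1 to T4, and `aggregateTier` equals the most severe state tier.

```python
import json
from typing import List

_SEVERITY = {"T1": 4, "T2": 3, "T3": 2, "T4": 1}


def check_uncertainty(payload: str) -> List[str]:
    """Return human-readable problems; an empty list means the payload is consistent."""
    problems = []
    uncertainty = json.loads(payload).get("uncertainty", {})
    tiers = []
    for index, state in enumerate(uncertainty.get("states", [])):
        entropy = state.get("entropy")
        if not isinstance(entropy, (int, float)) or not 0.0 <= entropy <= 1.0:
            problems.append(f"states[{index}]: entropy {entropy!r} outside 0..1")
        tier = state.get("tier")
        if tier in _SEVERITY:
            tiers.append(tier)
        else:
            problems.append(f"states[{index}]: unknown tier {tier!r}")
    if tiers:
        # The aggregate must be the most severe tier among the states.
        expected = max(tiers, key=_SEVERITY.__getitem__)
        actual = uncertainty.get("aggregateTier")
        if actual != expected:
            problems.append(f"aggregateTier is {actual!r}, expected {expected!r}")
    risk = uncertainty.get("riskScore")
    if isinstance(risk, (int, float)) and not 0.0 <= risk <= 1.0:
        problems.append(f"riskScore {risk!r} outside 0..1")
    return problems
```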
---
## 7. Implementation Pointers
- **Tier calculation:** `UncertaintyTierCalculator` in `src/Signals/StellaOps.Signals/Services/`
- **Risk score math:** `ReachabilityScoringService.ComputeRiskScore()` (extend existing)
- **Policy integration:** `docs/reachability/policy-gate.md` for gate rules
- **Lattice integration:** `docs/reachability/lattice.md` §9 for v1 lattice states