# Uncertainty States & Entropy Scoring

> **Status:** Implemented v0 for reachability facts (Signals).
> **Owners:** Signals Guild · Policy Guild · UI Guild.

StellaOps treats missing data and untrusted evidence as **first-class uncertainty states**, not silent false negatives. Signals persists uncertainty state entries alongside reachability facts and derives a deterministic `riskScore` that increases when entropy is high.

---

## 1. Core states (extensible)

| Code | Name | Meaning |
|------|------|---------|
| `U1` | `MissingSymbolResolution` | Unresolved symbols/edges prevent a complete reachability proof. |
| `U2` | `MissingPurl` | Package identity/version is ambiguous (lockfile absent, heuristics only). |
| `U3` | `UntrustedAdvisory` | Advisory source lacks provenance/corroboration. |
| `U4` | `Unknown` | No analyzers have processed this subject; baseline uncertainty. |

Besides its `code` and `name`, each state records:

- `entropy` (0..1)
- `evidence[]` list pointing to analyzers/heuristics/sources
- optional `timestamp` (UTC)
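The state record can be modelled as an immutable value type. The following is a minimal Python sketch, assuming each `name` maps one-to-one onto a `code`. `UncertaintyCode`, `UncertaintyState`, `Evidence` and `source_id` are hypothetical names chosen for illustration, not the Signals types. The sketch rejects entropy outside `0..1` and timestamps that are not timezone-aware UTC.

```python
# Hypothetical model for illustration only; not the Signals implementation.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional, Tuple


class UncertaintyCode(Enum):
    """Core uncertainty states; the enum value is the state's `name`."""
    U1 = "MissingSymbolResolution"
    U2 = "MissingPurl"
    U3 = "UntrustedAdvisory"
    U4 = "Unknown"


@dataclass(frozen=True)
class Evidence:
    type: str         # e.g. "UnknownsRegistry"
    source_id: str    # e.g. "signals.unknowns"
    detail: str = ""  # free-form description


@dataclass(frozen=True)
class UncertaintyState:
    code: UncertaintyCode
    entropy: float                        # 0..1
    evidence: Tuple[Evidence, ...] = ()
    timestamp: Optional[datetime] = None  # optional; must be UTC when present

    def __post_init__(self) -> None:
        if not 0.0 <= self.entropy <= 1.0:
            raise ValueError(f"entropy must be in [0, 1], got {self.entropy}")
        # utcoffset() is None for naive datetimes, so those are rejected too.
        if self.timestamp is not None and self.timestamp.utcoffset() != timedelta(0):
            raise ValueError("timestamp must be a timezone-aware UTC datetime")

    @property
    def name(self) -> str:
        return self.code.value


# Usage: the U1 state from the JSON example in section 2.
state = UncertaintyState(
    code=UncertaintyCode.U1,
    entropy=0.72,
    evidence=(Evidence("UnknownsRegistry", "signals.unknowns",
                       "unknownsCount=12;unknownsPressure=0.375"),),
    timestamp=datetime(2025, 11, 12, 14, 12, tzinfo=timezone.utc),
)
```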

---

## 1.1 Uncertainty Tiers (v1 — Sprint 0401)

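Uncertainty states are grouped into **tiers** that determine policy thresholds and UI treatment. A state's tier comes from the per-state assignment rules below; the entropy ranges in the table are indicative.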

### Tier Definitions

| Tier | Entropy Range | States | Risk Modifier | Policy Implication |
|------|---------------|--------|---------------|-------------------|
| **T1 (High)** | `[0.7, 1.0]` | `U1` (high), `U4` | `+50%` | Block "not_affected", require human review |
| **T2 (Medium)** | `[0.4, 0.7)` | `U1` (medium), `U2` | `+25%` | Warn on "not_affected", flag for review |
| **T3 (Low)** | `[0.1, 0.4)` | `U2` (low), `U3` | `+10%` | Allow "not_affected" with advisory note |
| **T4 (Negligible)** | `[0.0, 0.1)` | `U3` (low) | `+0%` | Normal processing, no special handling |

### Tier Assignment Rules

1. **U1 (MissingSymbolResolution):**
   - `entropy >= 0.7` → T1 (>30% unknowns in callgraph)
   - `entropy >= 0.4` → T2 (15-30% unknowns)
   - `entropy < 0.4` → T3 (<15% unknowns)

2. **U2 (MissingPurl):**
   - `entropy >= 0.5` → T2 (>50% packages unresolved)
   - `entropy < 0.5` → T3 (<50% packages unresolved)

3. **U3 (UntrustedAdvisory):**
   - `entropy >= 0.6` → T3 (no corroboration)
   - `entropy < 0.6` → T4 (partial corroboration)

4. **U4 (Unknown):**
   - Always T1 (no analysis performed = maximum uncertainty)
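The four rules read as a first-match lookup per state code. A minimal Python sketch, assuming the thresholds above are inclusive lower bounds exactly as written; `Tier` and `assign_tier` are hypothetical names, and rejecting unlisted codes is a choice of this sketch because the rules define none.

```python
# Sketch only: names are hypothetical; thresholds are copied from the rules above
# and treated as inclusive lower bounds, evaluated top to bottom.
from enum import IntEnum


class Tier(IntEnum):
    T1 = 1  # High
    T2 = 2  # Medium
    T3 = 3  # Low
    T4 = 4  # Negligible


def assign_tier(code: str, entropy: float) -> Tier:
    """Map a single uncertainty state to its tier."""
    if code == "U1":  # MissingSymbolResolution
        if entropy >= 0.7:
            return Tier.T1
        if entropy >= 0.4:
            return Tier.T2
        return Tier.T3
    if code == "U2":  # MissingPurl
        return Tier.T2 if entropy >= 0.5 else Tier.T3
    if code == "U3":  # UntrustedAdvisory
        return Tier.T3 if entropy >= 0.6 else Tier.T4
    if code == "U4":  # Unknown: no analysis performed
        return Tier.T1
    # The state set is extensible, but no rule exists for other codes yet;
    # this sketch fails loudly instead of guessing a tier.
    raise ValueError(f"no tier rule for uncertainty code {code!r}")


print(assign_tier("U1", 0.72).name)  # prints T1
```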

### Aggregate Tier Calculation

When multiple uncertainty states exist, the aggregate tier is the **most severe** tier among them (T1 is the most severe, T4 the least):

```
aggregateTier = max(tier(state) for state in uncertainty.states)   // max by severity: T1 > T2 > T3 > T4
```
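Because T1 is the most severe tier, "maximum" here means maximum severity, which is the lowest tier number; taking `max` over the labels `"T1"` to `"T4"` as plain strings would wrongly select T4. A minimal sketch that encodes severity in an `IntEnum` and therefore uses `min`; `Tier` and `aggregate_tier` are hypothetical names, and returning `None` for an empty state list is an assumption, since the aggregate for that case is not defined here.

```python
# Hypothetical sketch: severity is encoded in the enum value, so the most
# severe tier (T1) is the numerically smallest one.
from enum import IntEnum
from typing import Iterable, Optional


class Tier(IntEnum):
    T1 = 1
    T2 = 2
    T3 = 3
    T4 = 4


def aggregate_tier(tiers: Iterable[Tier]) -> Optional[Tier]:
    """Return the most severe tier, or None when there are no states.

    Returning None for an empty list is an assumption of this sketch.
    """
    return min(tiers, default=None)


print(aggregate_tier([Tier.T4, Tier.T1, Tier.T3]).name)  # prints T1
```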

---

## 2. JSON shape

```json
{
  "uncertainty": {
    "states": [
      {
        "code": "U1",
        "name": "MissingSymbolResolution",
        "entropy": 0.72,
        "timestamp": "2025-11-12T14:12:00Z",
        "evidence": [
          {
            "type": "UnknownsRegistry",
            "sourceId": "signals.unknowns",
            "detail": "unknownsCount=12;unknownsPressure=0.375"
          }
        ]
      }
    ]
  }
}
```
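The `detail` field in these examples packs metrics as `key=value` pairs separated by `;`. Assuming that convention holds for `UnknownsRegistry` evidence (it is inferred from the examples in this document, not from a published contract), a hypothetical helper `parse_detail` can recover the values:

```python
# Hypothetical helper: the `key=value;key=value` convention is inferred from the
# examples in this document, not from a published Signals contract.
from typing import Dict


def parse_detail(detail: str) -> Dict[str, str]:
    """Split an evidence detail string into a key/value mapping."""
    pairs: Dict[str, str] = {}
    for segment in detail.split(";"):
        if not segment.strip():
            continue  # tolerate trailing or doubled separators
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"not a key=value segment: {segment!r}")
        pairs[key.strip()] = value.strip()
    return pairs


detail = parse_detail("unknownsCount=12;unknownsPressure=0.375")
unknowns = int(detail["unknownsCount"])       # 12
pressure = float(detail["unknownsPressure"])  # 0.375
```

Free-text details such as the `NoAnalysis` evidence in section 6 (`"subject not yet analyzed"`) do not follow this convention, so the sketch raises `ValueError` for them instead of guessing; a caller would parse only evidence types known to use pairs.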

---

## 3. Risk score math (Signals)

Signals computes a `riskScore` deterministically during each reachability recomputation:

```
meanEntropy  = avg(uncertainty.states[].entropy)              // 0 when no states
entropyBoost = clamp(meanEntropy * k, 0 .. boostCeiling)
riskScore    = clamp(baseScore * (1 + entropyBoost), 0 .. 1)
```

Where:

- `baseScore` is the average of per-target reachability state scores (before the unknowns penalty is applied).
- `k` defaults to `0.5` (`SignalsOptions:Scoring:UncertaintyEntropyMultiplier`).
- `boostCeiling` defaults to `0.5` (`SignalsOptions:Scoring:UncertaintyBoostCeiling`).
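A minimal Python sketch of the three formulas above, useful for checking expected scores by hand. `risk_score_v0` and `clamp` are hypothetical names; only the two configuration keys and their defaults come from this document, and this is not `ReachabilityScoringService` itself.

```python
# Hypothetical sketch of the v0 formulas above; not the Signals implementation.
from typing import Sequence


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(x, hi))


def risk_score_v0(
    base_score: float,
    entropies: Sequence[float],
    k: float = 0.5,              # SignalsOptions:Scoring:UncertaintyEntropyMultiplier
    boost_ceiling: float = 0.5,  # SignalsOptions:Scoring:UncertaintyBoostCeiling
) -> float:
    # meanEntropy is 0 when there are no uncertainty states.
    mean_entropy = sum(entropies) / len(entropies) if entropies else 0.0
    entropy_boost = clamp(mean_entropy * k, 0.0, boost_ceiling)
    return clamp(base_score * (1.0 + entropy_boost), 0.0, 1.0)


print(risk_score_v0(0.4, []))  # prints 0.4 (no states, no boost)
```

With the default `k`, `meanEntropy * k` never exceeds `0.5`, so the default `boostCeiling` only binds when `k` is configured higher; either way, the default ceiling limits the uplift to 50% of `baseScore` before the final clamp.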

---

## 4. Policy guidance (high level)

Uncertainty should bias decisions away from "not affected" when evidence is missing:

- High entropy (`U1` with high `entropy`) should result in an **under investigation** status and drive remediation (upload symbols, run probes, close unknowns).
- Low entropy should allow normal confidence-based gates.

See `docs/reachability/lattice.md` for the current reachability score model and `docs/api/signals/reachability-contract.md` for the Signals contract.

---

## 5. Tier-Based Risk Score (v1 — Sprint 0401)

### Risk Score Formula

Building on §3, the v1 risk score incorporates tier-based modifiers:

```
tierModifier = {
  T1: 0.50,
  T2: 0.25,
  T3: 0.10,
  T4: 0.00
}[aggregateTier]

riskScore = clamp(baseScore * (1 + tierModifier + entropyBoost), 0 .. 1)
```

Where:
- `baseScore` is the average of per-target reachability state scores
- `tierModifier` is the tier-based risk increase
- `entropyBoost` is the existing entropy-based boost (§3)

### Example Calculation

```
Given:
  - baseScore = 0.4 (moderate reachability)
  - uncertainty.states = [
      {code: "U1", entropy: 0.72},  // T1 tier
      {code: "U3", entropy: 0.45}   // T4 tier (U3 with entropy < 0.6)
    ]
  - aggregateTier = T1 (most severe of T1, T4)
  - tierModifier = 0.50

  meanEntropy = (0.72 + 0.45) / 2 = 0.585
  entropyBoost = clamp(0.585 * 0.5, 0 .. 0.5) = 0.2925

  riskScore = clamp(0.4 * (1 + 0.50 + 0.2925), 0 .. 1)
            = clamp(0.4 * 1.7925, 0 .. 1)
            = clamp(0.717, 0 .. 1)
            = 0.717
```
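The v1 formula differs from §3 only by the additive `tierModifier`. A minimal Python sketch that takes the aggregate tier as an input and reproduces the example above; `risk_score_v1`, `TIER_MODIFIER` and `clamp` are hypothetical names, and the sketch assumes the same `k` and `boostCeiling` defaults as §3.

```python
# Hypothetical sketch of the v1 formula; names are illustrative and the
# aggregate tier is passed in rather than derived here.
from typing import Sequence

TIER_MODIFIER = {"T1": 0.50, "T2": 0.25, "T3": 0.10, "T4": 0.00}


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(x, hi))


def risk_score_v1(
    base_score: float,
    entropies: Sequence[float],
    aggregate_tier: str,
    k: float = 0.5,              # same default as in §3
    boost_ceiling: float = 0.5,  # same default as in §3
) -> float:
    mean_entropy = sum(entropies) / len(entropies) if entropies else 0.0
    entropy_boost = clamp(mean_entropy * k, 0.0, boost_ceiling)
    tier_modifier = TIER_MODIFIER[aggregate_tier]
    return clamp(base_score * (1.0 + tier_modifier + entropy_boost), 0.0, 1.0)


# The worked example: baseScore 0.4, entropies 0.72 and 0.45, aggregate tier T1.
score = risk_score_v1(0.4, [0.72, 0.45], "T1")
print(f"{score:.3f}")  # prints 0.717
```

Because the tier modifier and the entropy boost are added, the multiplier on `baseScore` can reach `1 + 0.50 + 0.5 = 2.0` with default settings, so a T1 subject with mean entropy `1.0` and a `baseScore` of `0.5` or more clamps to `1.0`.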

### Tier Thresholds for Policy Gates

| Tier | `riskScore` Range | VEX "not_affected" | VEX "affected" | Auto-triage |
|------|-------------------|-------------------|----------------|-------------|
| T1 | `[0.6, 1.0]` | ❌ blocked | ⚠️ review | → `under_investigation` |
| T2 | `[0.4, 0.6)` | ⚠️ warning | ✅ allowed | Manual review |
| T3 | `[0.2, 0.4)` | ✅ with note | ✅ allowed | Normal |
| T4 | `[0.0, 0.2)` | ✅ allowed | ✅ allowed | Normal |
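A lookup over this table can be sketched as a descending scan of lower bounds. This minimal Python sketch reads each range as half-open (lower bound inclusive), so no score falls between rows; `GateBand`, `BANDS`, `gate_for` and the string labels are hypothetical. It keys the lookup on `riskScore` alone, as the table's range column suggests; `docs/reachability/policy-gate.md` remains the reference for the actual gate rules.

```python
# Hypothetical sketch: maps a riskScore to a row of the gate table above,
# reading each range as half-open (lower bound inclusive). Labels are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class GateBand:
    tier: str
    not_affected: str  # handling of a VEX "not_affected" claim
    affected: str      # handling of a VEX "affected" claim
    auto_triage: str


# (inclusive lower bound, band), ordered from the highest bound down.
BANDS = (
    (0.6, GateBand("T1", "blocked", "review", "under_investigation")),
    (0.4, GateBand("T2", "warning", "allowed", "manual_review")),
    (0.2, GateBand("T3", "allowed_with_note", "allowed", "normal")),
    (0.0, GateBand("T4", "allowed", "allowed", "normal")),
)


def gate_for(risk_score: float) -> GateBand:
    """Return the first band whose lower bound risk_score reaches."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError(f"riskScore must be in [0, 1], got {risk_score}")
    for lower, band in BANDS:
        if risk_score >= lower:
            return band
    raise AssertionError("unreachable: the last band starts at 0.0")


band = gate_for(0.717)  # riskScore from the example in this section
print(band.tier, band.not_affected, band.auto_triage)  # prints T1 blocked under_investigation
```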

---

## 6. JSON Schema (v1)

Extended schema with tier information:

```json
{
  "uncertainty": {
    "states": [
      {
        "code": "U1",
        "name": "MissingSymbolResolution",
        "entropy": 0.72,
        "tier": "T1",
        "timestamp": "2025-12-13T10:00:00Z",
        "evidence": [
          {
            "type": "UnknownsRegistry",
            "sourceId": "signals.unknowns",
            "detail": "unknownsCount=45;totalSymbols=125;unknownsPressure=0.36"
          }
        ]
      },
      {
        "code": "U4",
        "name": "Unknown",
        "entropy": 1.0,
        "tier": "T1",
        "timestamp": "2025-12-13T10:00:00Z",
        "evidence": [
          {
            "type": "NoAnalysis",
            "sourceId": "signals.bootstrap",
            "detail": "subject not yet analyzed"
          }
        ]
      }
    ],
    "aggregateTier": "T1",
    "riskScore": 0.717,
    "computedAt": "2025-12-13T10:00:00Z"
  }
}
```

---

## 7. Implementation Pointers

- **Tier calculation:** `UncertaintyTierCalculator` in `src/Signals/StellaOps.Signals/Services/`
- **Risk score math:** `ReachabilityScoringService.ComputeRiskScore()` (extend existing)
- **Policy integration:** `docs/reachability/policy-gate.md` for gate rules
- **Lattice integration:** `docs/reachability/lattice.md` §9 for v1 lattice states