# Uncertainty States & Entropy Scoring
> **Status:** Implemented v0 for reachability facts (Signals).
> **Owners:** Signals Guild · Policy Guild · UI Guild.
StellaOps treats missing data and untrusted evidence as **first-class uncertainty states**, not silent false negatives. Signals persists uncertainty state entries alongside reachability facts and derives a deterministic `riskScore` that increases when entropy is high.

---
## 1. Core states (extensible)
| Code | Name | Meaning |
|------|------|---------|
| `U1` | `MissingSymbolResolution` | Unresolved symbols/edges prevent a complete reachability proof. |
| `U2` | `MissingPurl` | Package identity/version is ambiguous (lockfile absent, heuristics only). |
| `U3` | `UntrustedAdvisory` | Advisory source lacks provenance/corroboration. |
| `U4` | `Unknown` | No analyzers have processed this subject; baseline uncertainty. |
Each state records:
- `entropy` (0..1)
- `evidence[]` list pointing to analyzers/heuristics/sources
- optional `timestamp` (UTC)
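The record described above can be modelled roughly as follows. This is a minimal Python sketch, not the Signals implementation: the class names `Evidence` and `UncertaintyState` and the validation behaviour are assumptions made for illustration, while the field names follow the JSON shape in §2.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    """Pointer to the analyzer, heuristic, or source behind a state (hypothetical type)."""
    type: str
    source_id: str
    detail: str = ""


@dataclass(frozen=True)
class UncertaintyState:
    """One uncertainty state entry (hypothetical type mirroring the documented fields)."""
    code: str                              # "U1" .. "U4"
    name: str                              # e.g. "MissingSymbolResolution"
    entropy: float                         # documented range: 0..1
    evidence: Tuple[Evidence, ...] = ()
    timestamp: Optional[datetime] = None   # optional, UTC when present

    def __post_init__(self) -> None:
        # Assumption: out-of-range entropy is rejected rather than clamped.
        if not 0.0 <= self.entropy <= 1.0:
            raise ValueError(f"entropy must be within 0..1, got {self.entropy}")
        # Assumption: a timestamp, when given, must be timezone-aware UTC.
        if self.timestamp is not None and self.timestamp.utcoffset() != timedelta(0):
            raise ValueError("timestamp must be timezone-aware UTC")
```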
---
## 1.1 Uncertainty Tiers (v1 — Sprint 0401)
Uncertainty states are grouped into **tiers** that determine policy thresholds and UI treatment.
### Tier Definitions
| Tier | Entropy Range | States | Risk Modifier | Policy Implication |
|------|---------------|--------|---------------|-------------------|
| **T1 (High)** | `0.7 - 1.0` | `U1` (high), `U4` | `+50%` | Block "not_affected", require human review |
| **T2 (Medium)** | `0.4 - 0.69` | `U1` (medium), `U2` | `+25%` | Warn on "not_affected", flag for review |
| **T3 (Low)** | `0.1 - 0.39` | `U2` (low), `U3` | `+10%` | Allow "not_affected" with advisory note |
| **T4 (Negligible)** | `0.0 - 0.09` | `U3` (low) | `+0%` | Normal processing, no special handling |
### Tier Assignment Rules
1. **U1 (MissingSymbolResolution):**
   - `entropy >= 0.7` → T1 (>30% unknowns in callgraph)
   - `entropy >= 0.4` → T2 (15-30% unknowns)
   - `entropy < 0.4` → T3 (<15% unknowns)
2. **U2 (MissingPurl):**
   - `entropy >= 0.5` → T2 (>50% packages unresolved)
   - `entropy < 0.5` → T3 (<50% packages unresolved)
3. **U3 (UntrustedAdvisory):**
   - `entropy >= 0.6` → T3 (no corroboration)
   - `entropy < 0.6` → T4 (partial corroboration)
4. **U4 (Unknown):**
   - Always T1 (no analysis performed = maximum uncertainty)
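The assignment rules above translate directly into a lookup function. The following Python sketch is illustrative only: the function name `assign_tier` and the error handling are assumptions, and the thresholds are copied from the rules as written.

```python
def assign_tier(code: str, entropy: float) -> str:
    """Map an uncertainty state code and entropy to a tier label ("T1".."T4")."""
    if not 0.0 <= entropy <= 1.0:
        raise ValueError(f"entropy must be within 0..1, got {entropy}")

    if code == "U1":  # MissingSymbolResolution
        if entropy >= 0.7:
            return "T1"
        if entropy >= 0.4:
            return "T2"
        return "T3"
    if code == "U2":  # MissingPurl
        return "T2" if entropy >= 0.5 else "T3"
    if code == "U3":  # UntrustedAdvisory
        return "T3" if entropy >= 0.6 else "T4"
    if code == "U4":  # Unknown: always the most severe tier
        return "T1"

    # Assumption: unknown codes are an error; the state list is described as extensible,
    # so a real implementation may choose a default tier instead.
    raise ValueError(f"unrecognised uncertainty code: {code}")
```

For instance, `assign_tier("U1", 0.72)` yields `"T1"`, while `assign_tier("U3", 0.45)` yields `"T4"` because the entropy is below the 0.6 corroboration threshold.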
### Aggregate Tier Calculation
When multiple uncertainty states exist, the aggregate tier is the **maximum** (most severe):
```
aggregateTier = max(tier(state) for state in uncertainty.states)
```
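Note that "maximum" refers to severity, so T1 outranks T4 even though its number is lower. A small Python sketch of that ordering follows; the names `SEVERITY` and `aggregate_tier` are hypothetical, and returning `None` for an empty state list is an assumption, since the source does not define that case.

```python
from typing import Iterable, Optional

# Higher value = more severe. T1 is the most severe tier.
SEVERITY = {"T1": 4, "T2": 3, "T3": 2, "T4": 1}


def aggregate_tier(tiers: Iterable[str]) -> Optional[str]:
    """Return the most severe tier among the given per-state tiers."""
    tiers = list(tiers)
    if not tiers:
        return None  # assumption: no states means no aggregate tier
    return max(tiers, key=lambda tier: SEVERITY[tier])
```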
---
## 2. JSON shape
```json
{
  "uncertainty": {
    "states": [
      {
        "code": "U1",
        "name": "MissingSymbolResolution",
        "entropy": 0.72,
        "timestamp": "2025-11-12T14:12:00Z",
        "evidence": [
          {
            "type": "UnknownsRegistry",
            "sourceId": "signals.unknowns",
            "detail": "unknownsCount=12;unknownsPressure=0.375"
          }
        ]
      }
    ]
  }
}
```
---
## 3. Risk score math (Signals)
Signals computes a `riskScore` deterministically during reachability recompute:
```
meanEntropy = avg(uncertainty.states[].entropy) // 0 when no states
entropyBoost = clamp(meanEntropy * k, 0 .. boostCeiling)
riskScore = clamp(baseScore * (1 + entropyBoost), 0 .. 1)
```
Where:
- `baseScore` is the average of per-target reachability state scores (before unknowns penalty).
- `k` defaults to `0.5` (`SignalsOptions:Scoring:UncertaintyEntropyMultiplier`).
- `boostCeiling` defaults to `0.5` (`SignalsOptions:Scoring:UncertaintyBoostCeiling`).
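The formula can be checked with a few lines of Python. This is a sketch of the arithmetic only, not the `ReachabilityScoringService` code; the function names are hypothetical and the defaults mirror the documented `k` and `boostCeiling` values.

```python
from typing import Sequence


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def risk_score_v0(
    base_score: float,
    entropies: Sequence[float],
    k: float = 0.5,               # UncertaintyEntropyMultiplier default
    boost_ceiling: float = 0.5,   # UncertaintyBoostCeiling default
) -> float:
    """Apply the entropy boost to a base reachability score."""
    # meanEntropy is 0 when there are no uncertainty states.
    mean_entropy = sum(entropies) / len(entropies) if entropies else 0.0
    entropy_boost = clamp(mean_entropy * k, 0.0, boost_ceiling)
    return clamp(base_score * (1.0 + entropy_boost), 0.0, 1.0)
```

With the defaults, the boost never exceeds 0.5, so uncertainty alone can raise a score by at most 50% before the final clamp to 1.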
---
## 4. Policy guidance (high level)
Uncertainty should bias decisions away from "not affected" when evidence is missing:
- High entropy (`U1` with high `entropy`) should lead to **under investigation** and drive remediation (upload symbols, run probes, close unknowns).
- Low entropy should allow normal confidence-based gates.
See `docs/reachability/lattice.md` for the current reachability score model and `docs/api/signals/reachability-contract.md` for the Signals contract.

---
## 5. Tier-Based Risk Score (v1 — Sprint 0401)
### Risk Score Formula
Building on §3, the v1 risk score incorporates tier-based modifiers:
```
tierModifier = {
  T1: 0.50,
  T2: 0.25,
  T3: 0.10,
  T4: 0.00
}[aggregateTier]
riskScore = clamp(baseScore * (1 + tierModifier + entropyBoost), 0 .. 1)
```
Where:
- `baseScore` is the average of per-target reachability state scores
- `tierModifier` is the tier-based risk increase
- `entropyBoost` is the existing entropy-based boost (§3)
### Example Calculation
```
Given:
- baseScore = 0.4 (moderate reachability)
- uncertainty.states = [
    {code: "U1", entropy: 0.72},  // T1 tier (entropy >= 0.7)
    {code: "U3", entropy: 0.45}   // T4 tier (entropy < 0.6)
  ]
- aggregateTier = T1 (max of T1, T4)
- tierModifier = 0.50
meanEntropy = (0.72 + 0.45) / 2 = 0.585
entropyBoost = clamp(0.585 * 0.5, 0 .. 0.5) = 0.2925
riskScore = clamp(0.4 * (1 + 0.50 + 0.2925), 0 .. 1)
= clamp(0.4 * 1.7925, 0 .. 1)
= clamp(0.717, 0 .. 1)
= 0.717
```
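The same calculation can be reproduced in Python as a sanity check. This is an illustrative sketch rather than the production code: `risk_score_v1` and `TIER_MODIFIER` are hypothetical names, and the defaults for `k` and `boost_ceiling` are the ones given in §3.

```python
from typing import Sequence

# Tier-based risk modifiers from the formula above.
TIER_MODIFIER = {"T1": 0.50, "T2": 0.25, "T3": 0.10, "T4": 0.00}


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def risk_score_v1(
    base_score: float,
    entropies: Sequence[float],
    aggregate_tier: str,
    k: float = 0.5,
    boost_ceiling: float = 0.5,
) -> float:
    """Combine the tier modifier and the entropy boost into one risk score."""
    mean_entropy = sum(entropies) / len(entropies) if entropies else 0.0
    entropy_boost = clamp(mean_entropy * k, 0.0, boost_ceiling)
    tier_modifier = TIER_MODIFIER[aggregate_tier]
    return clamp(base_score * (1.0 + tier_modifier + entropy_boost), 0.0, 1.0)


# Inputs from the worked example: baseScore 0.4, entropies 0.72 and 0.45, tier T1.
score = risk_score_v1(0.4, [0.72, 0.45], "T1")
print(round(score, 4))  # 0.717
```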
### Tier Thresholds for Policy Gates
| Tier | `riskScore` Range | VEX "not_affected" | VEX "affected" | Auto-triage |
|------|-------------------|-------------------|----------------|-------------|
| T1 | `>= 0.6` | blocked | review | `under_investigation` |
| T2 | `0.4 - 0.59` | warning | allowed | Manual review |
| T3 | `0.2 - 0.39` | with note | allowed | Normal |
| T4 | `< 0.2` | allowed | allowed | Normal |
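One way to read the table is as a lookup from `riskScore` to gate outcomes. The Python sketch below makes two assumptions that the table does not state: the band is selected by `riskScore` alone, and the ranges are treated as half-open intervals (for example, 0.595 falls into the T2 band). The function name `policy_band` and the dictionary keys are hypothetical.

```python
def policy_band(risk_score: float) -> dict:
    """Return the gate outcomes for a risk score, following the thresholds table."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError(f"riskScore must be within 0..1, got {risk_score}")

    if risk_score >= 0.6:
        return {"band": "T1", "not_affected": "blocked", "affected": "review",
                "auto_triage": "under_investigation"}
    if risk_score >= 0.4:
        return {"band": "T2", "not_affected": "warning", "affected": "allowed",
                "auto_triage": "Manual review"}
    if risk_score >= 0.2:
        return {"band": "T3", "not_affected": "with note", "affected": "allowed",
                "auto_triage": "Normal"}
    return {"band": "T4", "not_affected": "allowed", "affected": "allowed",
            "auto_triage": "Normal"}
```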
---
## 6. JSON Schema (v1)
Extended schema with tier information:
```json
{
  "uncertainty": {
    "states": [
      {
        "code": "U1",
        "name": "MissingSymbolResolution",
        "entropy": 0.72,
        "tier": "T1",
        "timestamp": "2025-12-13T10:00:00Z",
        "evidence": [
          {
            "type": "UnknownsRegistry",
            "sourceId": "signals.unknowns",
            "detail": "unknownsCount=45;totalSymbols=125;unknownsPressure=0.36"
          }
        ]
      },
      {
        "code": "U4",
        "name": "Unknown",
        "entropy": 1.0,
        "tier": "T1",
        "timestamp": "2025-12-13T10:00:00Z",
        "evidence": [
          {
            "type": "NoAnalysis",
            "sourceId": "signals.bootstrap",
            "detail": "subject not yet analyzed"
          }
        ]
      }
    ],
    "aggregateTier": "T1",
    "riskScore": 0.717,
    "computedAt": "2025-12-13T10:00:00Z"
  }
}
```
---
## 7. Implementation Pointers
- **Tier calculation:** `UncertaintyTierCalculator` in `src/Signals/StellaOps.Signals/Services/`
- **Risk score math:** `ReachabilityScoringService.ComputeRiskScore()` (extend existing)
- **Policy integration:** `docs/policy/dsl.md` §12 for uncertainty gates
- **Lattice integration:** `docs/reachability/lattice.md` §9 for v1 lattice states
---
## 8. Policy Guidance (v1 — Sprint 0401)
Uncertainty gates enforce evidence-quality thresholds in the Policy Engine. When entropy is too high or evidence is missing, policies block or downgrade VEX decisions.
### 8.1 Gate Mapping
| Gate | Uncertainty State | Tier | Policy Action |
|------|------------------|------|---------------|
| `U1` | `MissingSymbolResolution` | T1/T2 | Block `not_affected`, require review |
| `U2` | `MissingPurl` | T2/T3 | Warn on `not_affected`, add review flag |
| `U3` | `UntrustedAdvisory` | T3/T4 | Advisory caveat, no blocking |
### 8.2 Sample Policy Rules
```dsl
// Block not_affected when symbol resolution has high entropy
rule u1_gate_high_entropy priority 5 {
  when signals.uncertainty.level == "U1"
    and signals.uncertainty.entropy >= 0.7
  then status := "under_investigation"
    annotate gate := "U1"
    annotate remediation := "Upload symbols or close unknowns registry"
  because "High symbol entropy blocks strong VEX claims";
}

// Tier-based compound gate
rule tier1_block_not_affected priority 3 {
  when signals.uncertainty.aggregateTier == "T1"
    and vex.any(status == "not_affected")
  then status := "under_investigation"
    annotate blocked_reason := "T1 uncertainty requires evidence"
  because "Maximum uncertainty tier blocks all exclusion claims";
}
```
### 8.3 YAML Configuration
```yaml
uncertainty_gates:
  u1_gate:
    entropy_threshold: 0.7
    blocked_statuses: [not_affected]
    fallback_status: under_investigation
    remediation_hint: "Upload symbols or resolve unknowns"
  u2_gate:
    entropy_threshold: 0.4
    blocked_statuses: [not_affected]
    warn_on_block: true
  u3_gate:
    entropy_threshold: 0.1
    annotate_caveat: true
```
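To make the configuration concrete, here is a hedged Python sketch of how such gates might be evaluated. The semantics are assumptions inferred from the gate mapping in §8.1, not the Policy Engine's documented behaviour: a gate fires when entropy reaches its threshold, `fallback_status` replaces a blocked status, `warn_on_block` downgrades the block to a warning, and `annotate_caveat` only adds a note. The dictionary mirrors the YAML above so the example needs no YAML parser; `GATES` and `apply_gate` are hypothetical names.

```python
# Mirror of the uncertainty_gates YAML, keyed by uncertainty code.
GATES = {
    "U1": {
        "entropy_threshold": 0.7,
        "blocked_statuses": ["not_affected"],
        "fallback_status": "under_investigation",
        "remediation_hint": "Upload symbols or resolve unknowns",
    },
    "U2": {
        "entropy_threshold": 0.4,
        "blocked_statuses": ["not_affected"],
        "warn_on_block": True,
    },
    "U3": {"entropy_threshold": 0.1, "annotate_caveat": True},
}


def apply_gate(code: str, entropy: float, proposed_status: str) -> dict:
    """Evaluate one uncertainty state against its gate (assumed semantics)."""
    result = {"status": proposed_status, "warnings": [], "annotations": {}}
    gate = GATES.get(code)
    if gate is None or entropy < gate["entropy_threshold"]:
        return result  # gate absent or not triggered: status passes through

    if gate.get("annotate_caveat"):
        result["annotations"]["caveat"] = f"{code} uncertainty at entropy {entropy}"

    if proposed_status in gate.get("blocked_statuses", []):
        if gate.get("warn_on_block"):
            # Assumption: warn instead of overriding the status.
            result["warnings"].append(f"{code} gate: review '{proposed_status}' claim")
        elif "fallback_status" in gate:
            result["status"] = gate["fallback_status"]
            result["annotations"]["gate"] = code
            if "remediation_hint" in gate:
                result["annotations"]["remediation"] = gate["remediation_hint"]
    return result
```

Under these assumptions, `apply_gate("U1", 0.72, "not_affected")` returns a result whose status is `under_investigation`, matching the first sample rule in §8.2.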
See `docs/policy/dsl.md` §12 for complete gate rules and tier-aware compound patterns.

---
## 9. Remediation Actions
Each uncertainty state has recommended remediation steps:
| State | Code | Remediation | CLI Command |
|-------|------|-------------|-------------|
| MissingSymbolResolution | `U1` | Upload debug symbols, resolve unknowns | `stella symbols ingest --path <symbols>` |
| MissingPurl | `U2` | Generate lockfile, verify package coordinates | `stella sbom refresh --resolve` |
| UntrustedAdvisory | `U3` | Cross-reference trusted sources | `stella advisory verify --source NVD,GHSA` |
| Unknown | `U4` | Run initial analysis | `stella scan --full` |
### 9.1 Automated Remediation Flow
```
1. Policy blocks decision with U1/U2 gate
2. Console/CLI shows remediation hint
3. User runs remediation command (e.g., stella symbols ingest)
4. Signals recomputes uncertainty states
5. Risk score updates, tier may drop
6. Policy re-evaluates, decision may proceed
```
### 9.2 Remediation Priority
When multiple uncertainty states exist, prioritize by tier:
1. **T1 states first:** Block all exclusions until resolved
2. **T2 states:** May proceed with warnings if T1 cleared
3. **T3/T4 states:** Normal flow with caveats
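The ordering above, combined with the remediation table, can be sketched as a small planner. This Python example is illustrative: `remediation_plan` is a hypothetical helper, the input is assumed to be a list of dictionaries carrying `code` and `tier` as in the v1 JSON schema, and the commands are copied verbatim from the table in §9.

```python
from typing import Dict, List, Tuple

# CLI commands as listed in the remediation table.
REMEDIATION = {
    "U1": "stella symbols ingest --path <symbols>",
    "U2": "stella sbom refresh --resolve",
    "U3": "stella advisory verify --source NVD,GHSA",
    "U4": "stella scan --full",
}

# Lower rank = handle first. T1 is the most severe tier.
TIER_RANK = {"T1": 0, "T2": 1, "T3": 2, "T4": 3}


def remediation_plan(states: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (code, tier, command) tuples ordered from most to least severe tier."""
    # sorted() is stable, so states within the same tier keep their input order.
    ordered = sorted(states, key=lambda state: TIER_RANK[state["tier"]])
    return [(s["code"], s["tier"], REMEDIATION[s["code"]]) for s in ordered]


plan = remediation_plan([
    {"code": "U3", "tier": "T4"},
    {"code": "U2", "tier": "T2"},
    {"code": "U1", "tier": "T1"},
])
for code, tier, command in plan:
    print(f"{tier} {code}: {command}")
```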
---
*Last updated: 2025-12-13 (Sprint 0401).*