Files
git.stella-ops.org/docs/reachability/lattice.md
StellaOps Bot 999e26a48e up
2025-12-13 02:22:15 +02:00

255 lines
9.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Reachability Lattice & Scoring Model
> **Status:** Implemented v0 in Signals; this document describes the current deterministic bucket model and its policy-facing implications.
> **Owners:** Scanner Guild · Signals Guild · Policy Guild.
StellaOps models reachability as a deterministic, evidence-linked outcome that can safely represent "unknown" without silently producing false safety. Signals produces a `ReachabilityFactDocument` with per-target `states[]` and a top-level `score` that is stable under replays.
---
## 1. Current model (Signals v0)
Signals scoring (`src/Signals/StellaOps.Signals/Services/ReachabilityScoringService.cs`) computes, for each `target` symbol:
- `reachable`: whether there exists a path from the selected `entryPoints[]` to `target`.
- `bucket`: a coarse classification of *why* the target is/was reachable.
- `confidence` (0..1): a bounded confidence value.
- `weight` (0..1): bucket multiplier.
- `score` (0..1): `confidence * weight`.
- `path[]`: the discovered path (if reachable), deterministically ordered.
- `evidence.runtimeHits[]`: runtime hit symbols that appear on the chosen path.
The fact-level `score` is the average of per-target scores, penalized by unknowns pressure (see §4).
---
## 2. Buckets & default weights
Bucket assignment is deterministic and uses this precedence:
1. `unreachable` — no path exists.
2. `entrypoint` — the `target` itself is an entrypoint.
3. `runtime` — at least one runtime hit overlaps the discovered path.
4. `direct` — reachable and the discovered path is length ≤ 2.
5. `unknown` — reachable but none of the above classifications apply.
Default weights (configurable via `SignalsOptions:Scoring:ReachabilityBuckets`):
| Bucket | Default weight |
|--------|----------------|
| `entrypoint` | `1.0` |
| `direct` | `0.85` |
| `runtime` | `0.45` |
| `unknown` | `0.5` |
| `unreachable` | `0.0` |
---
## 3. Confidence (reachable vs unreachable)
Default confidence values (configurable via `SignalsOptions:Scoring:*`):
| Input | Default |
|-------|---------|
| `reachableConfidence` | `0.75` |
| `unreachableConfidence` | `0.25` |
| `runtimeBonus` | `0.15` |
| `minConfidence` | `0.05` |
| `maxConfidence` | `0.99` |
Rules:
- Base confidence is `reachableConfidence` when `reachable=true`, otherwise `unreachableConfidence`.
- When `reachable=true` and runtime evidence overlaps the selected path, add `runtimeBonus` (bounded by `maxConfidence`).
- The final confidence is clamped to `[minConfidence, maxConfidence]`.
---
## 4. Unknowns pressure (missing/ambiguous evidence)
Signals tracks unresolved symbols/edges as **Unknowns** (see `docs/signals/unknowns-registry.md`). The number of unknowns for a subject influences the final score:
```
unknownsPressure = unknownsCount / (targetsCount + unknownsCount)
pressurePenalty = min(unknownsPenaltyCeiling, unknownsPressure)
fact.score = avg(states[i].score) * (1 - pressurePenalty)
```
Default `unknownsPenaltyCeiling` is `0.35` (configurable).
This keeps the system deterministic while preventing unknown-heavy subjects from appearing "safe" by omission.
---
## 5. Evidence references & determinism anchors
Signals produces stable references intended for downstream evidence chains:
- `metadata.fact.digest` — canonical digest of the reachability fact (`sha256:<hex>`).
- `metadata.fact.version` — monotonically increasing integer for the same `subjectKey`.
- Callgraph ingestion returns a deterministic `graphHash` (sha256) for the normalized callgraph.
Downstream services (Policy, UI/CLI explainers, replay tooling) should use these fields as stable evidence references.
---
## 6. Policy-facing guidance (avoid false "not affected")
Policy should treat `unreachable` (or low fact score) as **insufficient** to claim "not affected" unless:
- the reachability evidence is present and referenced (`metadata.fact.digest`), and
- confidence is above a high-confidence threshold.
When evidence is missing or confidence is low, the correct output is **under investigation** rather than "not affected".
---
## 7. Signals API pointers
- `docs/api/signals/reachability-contract.md`
- `docs/api/signals/samples/facts-sample.json`
---
## 8. Roadmap (tracked in Sprint 0401)
- Introduce first-class uncertainty state lists + entropy-derived `riskScore` (see `docs/uncertainty/README.md`).
- Extend evidence refs to include CAS/DSSE pointers for graph-level and edge-bundle attestations.
---
## 9. Formal Lattice Model v1 (design — Sprint 0401)
The v0 bucket model provides coarse classification. The v1 lattice model introduces a formal 7-state lattice with algebraic join/meet operations for monotonic, deterministic reachability analysis across evidence types.
### 9.1 State Definitions
| State | Code | Ordering | Description |
|-------|------|----------|-------------|
| `Unknown` | `U` | ⊥ (bottom) | No evidence available; default state |
| `StaticallyReachable` | `SR` | 1 | Static analysis suggests path exists |
| `StaticallyUnreachable` | `SU` | 1 | Static analysis finds no path |
| `RuntimeObserved` | `RO` | 2 | Runtime probe/hit confirms execution |
| `RuntimeUnobserved` | `RU` | 2 | Runtime probe active but no hit observed |
| `ConfirmedReachable` | `CR` | 3 | Both static + runtime agree reachable |
| `ConfirmedUnreachable` | `CU` | 3 | Both static + runtime agree unreachable |
| `Contested` | `X` | (top) | Static and runtime evidence conflict |
### 9.2 Lattice Ordering (Hasse Diagram)
```
Contested (X)
/ | \
/ | \
ConfirmedReachable | ConfirmedUnreachable
(CR) | (CU)
| \ / / |
| \ / / |
| \ / / |
RuntimeObserved RuntimeUnobserved
(RO) (RU)
| |
| |
StaticallyReachable StaticallyUnreachable
(SR) (SU)
\ /
\ /
Unknown (U)
```
### 9.3 Join Rules (⊔ — least upper bound)
When combining evidence from multiple sources, use the join operation:
```
U ⊔ S = S (any evidence beats unknown)
SR ⊔ RO = CR (static reachable + runtime hit = confirmed)
SU ⊔ RU = CU (static unreachable + runtime miss = confirmed)
SR ⊔ RU = X (static reachable but runtime miss = contested)
SU ⊔ RO = X (static unreachable but runtime hit = contested)
CR ⊔ CU = X (conflicting confirmations = contested)
X ⊔ * = X (contested absorbs all)
```
**Full join table:**
| ⊔ | U | SR | SU | RO | RU | CR | CU | X |
|---|---|----|----|----|----|----|----|---|
| **U** | U | SR | SU | RO | RU | CR | CU | X |
| **SR** | SR | SR | X | CR | X | CR | X | X |
| **SU** | SU | X | SU | X | CU | X | CU | X |
| **RO** | RO | CR | X | RO | X | CR | X | X |
| **RU** | RU | X | CU | X | RU | X | CU | X |
| **CR** | CR | CR | X | CR | X | CR | X | X |
| **CU** | CU | X | CU | X | CU | X | CU | X |
| **X** | X | X | X | X | X | X | X | X |
### 9.4 Meet Rules (⊓ — greatest lower bound)
Used for conservative intersection (e.g., multi-entry-point consensus):
```
U ⊓ * = U (unknown is bottom)
CR ⊓ CR = CR (agreement preserved)
X ⊓ S = S (drop contested to either side)
```
### 9.5 Monotonicity Properties
1. **Evidence accumulation is monotonic:** Once state rises in the lattice, it cannot descend without explicit revocation.
2. **Revocation resets to Unknown:** When evidence is invalidated (e.g., graph invalidation), state resets to `U`.
3. **Contested states require human triage:** `X` state triggers policy flags and UI attention.
### 9.6 Mapping v0 Buckets to v1 States
| v0 Bucket | v1 State(s) | Notes |
|-----------|-------------|-------|
| `unreachable` | `SU`, `CU` | Depends on runtime evidence availability |
| `entrypoint` | `CR` | Entry points are by definition reachable |
| `runtime` | `RO`, `CR` | Depends on static analysis agreement |
| `direct` | `SR`, `CR` | Direct paths with/without runtime confirmation |
| `unknown` | `U` | No evidence available |
### 9.7 Policy Decision Matrix
| v1 State | VEX "not_affected" | VEX "affected" | VEX "under_investigation" |
|----------|-------------------|----------------|---------------------------|
| `U` | ❌ blocked | ⚠️ needs evidence | ✅ default |
| `SR` | ❌ blocked | ✅ allowed | ✅ allowed |
| `SU` | ⚠️ low confidence | ❌ contested | ✅ allowed |
| `RO` | ❌ blocked | ✅ allowed | ✅ allowed |
| `RU` | ⚠️ medium confidence | ❌ contested | ✅ allowed |
| `CR` | ❌ blocked | ✅ required | ❌ invalid |
| `CU` | ✅ allowed | ❌ blocked | ❌ invalid |
| `X` | ❌ blocked | ❌ blocked | ✅ required |
### 9.8 Implementation Notes
- **State storage:** `ReachabilityFactDocument.states[].latticeState` field (enum)
- **Join implementation:** `ReachabilityLattice.Join(a, b)` in `src/Signals/StellaOps.Signals/Services/`
- **Backward compatibility:** v0 bucket computed from v1 state for API consumers
### 9.9 Evidence Chain Requirements
Each lattice state transition must be accompanied by evidence references:
```json
{
"symbol": "sym:java:...",
"latticeState": "CR",
"previousState": "SR",
"evidence": {
"static": {
"graphHash": "blake3:...",
"pathLength": 3,
"confidence": 0.92
},
"runtime": {
"probeId": "probe:...",
"hitCount": 47,
"observedAt": "2025-12-13T10:00:00Z"
}
},
"transitionAt": "2025-12-13T10:00:00Z"
}