# Competitive Benchmark Implementation Milestones

> Source: `docs/product-advisories/19-Dec-2025 - Benchmarking Container Scanners Against Stella Ops.md`
>
> This document translates the competitive matrix into concrete implementation milestones with measurable acceptance criteria.

---

## Executive Summary

The competitive analysis identifies **seven structural gaps** in existing container security tools (Trivy, Syft/Grype, Snyk, Prisma, Aqua, Anchore) that Stella Ops can exploit:

| Gap | Competitor Status | Stella Ops Target |
|-----|-------------------|-------------------|
| SBOM as static artifact | Generate → store → scan | Stateful ledger with lineage |
| VEX as metadata | Annotation/suppression | Formal lattice reasoning |
| Probability-based scoring | CVSS + heuristics | Deterministic, provable scores |
| File-level diffing | Image hash comparison | Semantic smart-diff |
| Runtime context ≠ reachability | Coarse correlation | Call-path proofs |
| Uncertainty suppressed | Hidden/ignored | Explicit unknowns state |
| Offline = operational only | Can run offline | Epistemic completeness |

---

## Milestone 1: SBOM Ledger (SBOM-L)

**Goal:** Transform the SBOM from a static artifact into a stateful ledger with lineage tracking.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| SBOM-L-001 | Component identity = (source + digest + build recipe hash) | TBD | TODO |
| SBOM-L-002 | Binary → source mapping (ELF Build-ID, PE hash, Mach-O UUID) | 3700 | DOING |
| SBOM-L-003 | Layer-aware dependency graphs with loader resolution | TBD | TODO |
| SBOM-L-004 | SBOM versioning and merge semantics | TBD | TODO |
| SBOM-L-005 | Replay manifest with exact feeds/policies/timestamps | 3500 | DONE |

### Acceptance Criteria

- [ ] Component identity includes build recipe hash
- [ ] Binary provenance tracked via Build-ID/UUID
- [ ] Dependency graph includes loader rules
- [ ] SBOM versions can be diffed semantically
- [ ] Replay manifests are content-addressed

### Competitive Edge

> "No competitor offers SBOM lineage + merge semantics with proofs."

---

## Milestone 2: VEX Lattice Reasoning (VEX-L)

**Goal:** VEX becomes a logical input to lattice merge, not just annotation.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| VEX-L-001 | VEX statement → lattice predicate conversion | 3500 | DONE |
| VEX-L-002 | Multi-source VEX conflict resolution (vendor/distro/internal) | 3500 | DONE |
| VEX-L-003 | Jurisdiction-specific trust rules | TBD | TODO |
| VEX-L-004 | Customer override with audit trail | TBD | TODO |
| VEX-L-005 | VEX evidence linking (proof pointers) | 3800 | TODO |

### Acceptance Criteria

- [ ] Conflicting VEX from multiple sources merges deterministically
- [ ] Trust rules are configurable per jurisdiction
- [ ] All overrides have signed audit trails
- [ ] Every VEX decision links to an evidence bundle

### Competitive Edge

> "First tool with formal VEX reasoning, not just ingestion."
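To make the deterministic-merge criterion concrete, here is a minimal sketch of multi-source VEX resolution. It simplifies the lattice to a total order (trust class, then status severity, then source name), which is enough to show why the same statements always yield the same verdict. The status values follow the OpenVEX vocabulary, but every class name, function, and trust weight below is a hypothetical stand-in, not the actual Stella Ops API.

```python
"""Sketch only: deterministic multi-source VEX merge (VEX-L-001/002)."""
from dataclasses import dataclass

# OpenVEX statuses ranked from least to most conservative; ties between
# equally trusted sources resolve toward "assume exploitable".
STATUS_SEVERITY = {
    "not_affected": 0,
    "fixed": 1,
    "under_investigation": 2,
    "affected": 3,
}

# Placeholder trust classes per VEX-L-002; under VEX-L-003 these weights
# would become jurisdiction-configurable rules.
TRUST = {"internal": 3, "vendor": 2, "distro": 1}

@dataclass(frozen=True)
class VexStatement:
    source: str        # e.g. "vendor:acme"
    source_class: str  # "internal" | "vendor" | "distro"
    status: str        # an OpenVEX status

def merge(statements: list[VexStatement]) -> VexStatement:
    """Highest trust wins; within a trust class the most conservative
    status wins; remaining ties break on source name. The key is a pure
    total order, so identical inputs always produce identical verdicts."""
    return max(
        statements,
        key=lambda s: (TRUST[s.source_class],
                       STATUS_SEVERITY[s.status],
                       s.source),
    )

# Example: a vendor "not_affected" claim outranks a distro "affected" claim.
verdict = merge([
    VexStatement("distro:debian", "distro", "affected"),
    VexStatement("vendor:acme", "vendor", "not_affected"),
])
assert verdict.status == "not_affected"
```

A real lattice merge would also carry the proof pointers of VEX-L-005 through the join, so the winning verdict stays linked to the statements it overrode.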
---

## Milestone 3: Explainable Findings (EXP-F)

**Goal:** Every finding answers four questions: evidence, path, assumptions, falsifiability.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| EXP-F-001 | Evidence bundle per finding (SBOM + graph + loader + runtime) | 3800 | TODO |
| EXP-F-002 | Assumption set capture (compiler flags, runtime config, gates) | 3600 | DONE |
| EXP-F-003 | Confidence score from evidence density | 3700 | DONE |
| EXP-F-004 | Falsification conditions ("what would change this verdict") | TBD | TODO |
| EXP-F-005 | Evidence drawer UI with proof tabs | 4100 | TODO |

### Acceptance Criteria

- [ ] Each finding has an explicit evidence bundle
- [ ] Assumptions are captured and displayed
- [ ] Confidence derives from evidence, not CVSS
- [ ] UI shows "what would falsify this"

### Competitive Edge

> "Only tool that answers: what would falsify this conclusion?"

---

## Milestone 4: Semantic Smart-Diff (S-DIFF)

**Goal:** Diff security meaning, not just artifacts.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| S-DIFF-001 | Reachability graph diffing | 3600 | DONE |
| S-DIFF-002 | Policy outcome diffing | TBD | TODO |
| S-DIFF-003 | Trust weight diffing | TBD | TODO |
| S-DIFF-004 | Unknowns delta tracking | 3500 | DONE |
| S-DIFF-005 | Risk delta summary ("reduced surface by X% despite +N CVEs") | 3600 | DONE |

### Acceptance Criteria

- [ ] Diff output shows semantic security changes
- [ ] The same CVE with a removed call path shows as mitigated
- [ ] A new binary with dead code shows no new risk
- [ ] Summary quantifies net security posture change

### Competitive Edge

> "Outputs 'This release reduces exploitability by 41%' — no competitor does this."

---

## Milestone 5: Call-Path Reachability (CPR)

**Goal:** Three-layer reachability proof: static graph + binary resolution + runtime gating.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| CPR-001 | Static call graph from entrypoints to vulnerable symbols | 3600 | DONE |
| CPR-002 | Binary resolution (dynamic loader rules, symbol versioning) | 3700 | DOING |
| CPR-003 | Runtime gating (feature flags, config, environment) | 3600 | DONE |
| CPR-004 | Confidence tiers (Confirmed/Likely/Present/Unreachable) | 3700 | DONE |
| CPR-005 | Path witnesses with surface evidence | 3700 | DONE |

### Acceptance Criteria

- [ ] All three layers must align for exploitability
- [ ] False positives are structurally impossible (not heuristically reduced)
- [ ] Confidence tier reflects evidence quality
- [ ] Witnesses are DSSE-signed

### Competitive Edge

> "Makes false positives structurally impossible, not heuristically reduced."

---

## Milestone 6: Deterministic Scoring (D-SCORE)

**Goal:** Score = deterministic function with signed proofs.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| D-SCORE-001 | Score from evidence count/strength | 3500 | DONE |
| D-SCORE-002 | Assumption penalties in score | TBD | TODO |
| D-SCORE-003 | Trust source weights | TBD | TODO |
| D-SCORE-004 | Policy constraint integration | 3500 | DONE |
| D-SCORE-005 | Signed score attestation | 3800 | TODO |

### Acceptance Criteria

- [ ] Same inputs → same score, forever
- [ ] Score attestation is DSSE-signed
- [ ] Cross-org verification is possible
- [ ] Scoring rules are auditable

### Competitive Edge

> "Signed risk decisions that are legally defensible."
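The determinism criterion is easiest to see as code: the score must be a pure function over canonicalized inputs, and the digest of that canonical form is what a signer attests. The sketch below is illustrative only; the weights, penalty, and function names are hypothetical placeholders, and a real implementation would wrap the payload in a DSSE envelope (D-SCORE-005) rather than returning a bare SHA-256.

```python
"""Sketch only: a deterministic, attestable score (D-SCORE-001/002/005)."""
import hashlib
import json

# Placeholder weights standing in for the auditable trust/source rules of
# D-SCORE-003; the real rules would themselves be snapshotted (see E-OFF).
EVIDENCE_WEIGHTS = {
    "static_path": 0.4,
    "loader_resolution": 0.3,
    "runtime_observation": 0.3,
}

def score(evidence: dict[str, bool], assumptions: list[str]) -> float:
    """Sum the weights of present evidence, minus a flat penalty per
    unverified assumption (D-SCORE-002). No clocks, no randomness, no
    network calls: determinism holds by construction."""
    base = sum(w for kind, w in EVIDENCE_WEIGHTS.items() if evidence.get(kind))
    return round(max(0.0, base - 0.05 * len(assumptions)), 4)

def attestation_payload(evidence: dict[str, bool], assumptions: list[str]) -> dict:
    """Canonical payload a signer would wrap in a DSSE envelope. Sorted
    keys and fixed separators make the digest reproducible everywhere."""
    payload = {
        "assumptions": sorted(assumptions),
        "evidence": dict(sorted(evidence.items())),
        "score": score(evidence, assumptions),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return {"payload": payload,
            "sha256": hashlib.sha256(canonical.encode()).hexdigest()}
```

Because the payload is canonicalized before hashing, a second organization replaying the same evidence can recompute the digest independently — exactly the cross-org verification the acceptance criteria call for.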
---

## Milestone 7: Unknowns as First-Class State (UNK)

**Goal:** Explicit unknowns modeling with risk implications.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| UNK-001 | Unknown-reachable and unknown-unreachable states | 3500 | DONE |
| UNK-002 | Unknowns pressure in scoring | 3500 | DONE |
| UNK-003 | Unknowns registry and API | 3500 | DONE |
| UNK-004 | UI unknowns chips and triage actions | 4100 | TODO |
| UNK-005 | Zero-day window tracking | TBD | TODO |

### Acceptance Criteria

- [ ] Unknowns are distinct from vulnerabilities
- [ ] Scoring reflects unknowns pressure
- [ ] UI surfaces unknowns prominently
- [ ] Air-gap and zero-day scenarios handled

### Competitive Edge

> "No competitor models uncertainty explicitly."

---

## Milestone 8: Epistemic Offline (E-OFF)

**Goal:** Offline = cryptographically bound knowledge state.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| E-OFF-001 | Feed snapshot with digest | Existing | DONE |
| E-OFF-002 | Policy snapshot with digest | Existing | DONE |
| E-OFF-003 | Scoring rules snapshot | TBD | TODO |
| E-OFF-004 | Trust anchor snapshot | Existing | DONE |
| E-OFF-005 | Knowledge state attestation in scan result | 3500 | DONE |

### Acceptance Criteria

- [ ] Every offline scan knows exactly what knowledge it had
- [ ] Forensic replayability, not just offline execution
- [ ] Audit can answer: "what did you know when you made this decision?"

### Competitive Edge

> "Epistemic completeness vs. just operational offline."

---

## Priority Matrix

| Milestone | Strategic Value | Implementation Effort | Priority |
|-----------|-----------------|-----------------------|----------|
| CPR (Call-Path Reachability) | ★★★★★ | High | P0 |
| S-DIFF (Semantic Smart-Diff) | ★★★★★ | Medium | P0 |
| EXP-F (Explainable Findings) | ★★★★☆ | Medium | P1 |
| VEX-L (VEX Lattice) | ★★★★☆ | Medium | P1 |
| D-SCORE (Deterministic Scoring) | ★★★★☆ | Medium | P1 |
| UNK (Unknowns State) | ★★★★☆ | Low | P1 |
| SBOM-L (SBOM Ledger) | ★★★☆☆ | High | P2 |
| E-OFF (Epistemic Offline) | ★★★☆☆ | Low | P2 |

---

## Sprint Alignment

| Sprint | Milestones Addressed |
|--------|----------------------|
| 3500 (Smart-Diff) | S-DIFF, UNK, D-SCORE, E-OFF |
| 3600 (Reachability Drift) | CPR, S-DIFF, EXP-F |
| 3700 (Vuln Surfaces) | CPR, SBOM-L |
| 3800 (Explainable Triage) | EXP-F, VEX-L, D-SCORE |
| 4100 (Triage UI) | EXP-F, UNK |

---

## Benchmark Tests

Each milestone should have corresponding benchmark tests in `bench/`:

| Benchmark | Tests |
|-----------|-------|
| `bench/reachability-benchmark/` | CPR accuracy vs. ground truth |
| `bench/smart-diff/` | Semantic diff correctness |
| `bench/determinism/` | Replay fidelity |
| `bench/unknowns/` | Unknowns tracking accuracy |
| `bench/vex-lattice/` | VEX merge correctness |

---

## References

- Source advisory: `docs/product-advisories/19-Dec-2025 - Benchmarking Container Scanners Against Stella Ops.md`
- Moat spec: `docs/moat.md`
- Key features: `docs/key-features.md`
- Reachability delivery: `docs/reachability/DELIVERY_GUIDE.md`