# Competitive Benchmark Implementation Milestones

> Source: `docs/product-advisories/19-Dec-2025 - Benchmarking Container Scanners Against Stella Ops.md`
>
> This document translates the competitive matrix into concrete implementation milestones with measurable acceptance criteria.

---

## Executive Summary

The competitive analysis identifies **seven structural gaps** in existing container security tools (Trivy, Syft/Grype, Snyk, Prisma, Aqua, Anchore) that Stella Ops can exploit:

| Gap | Competitor Status | Stella Ops Target |
|-----|-------------------|-------------------|
| SBOM as static artifact | Generate → store → scan | Stateful ledger with lineage |
| VEX as metadata | Annotation/suppression | Formal lattice reasoning |
| Probability-based scoring | CVSS + heuristics | Deterministic, provable scores |
| File-level diffing | Image hash comparison | Semantic smart-diff |
| Runtime context ≠ reachability | Coarse correlation | Call-path proofs |
| Uncertainty suppressed | Hidden/ignored | Explicit unknowns state |
| Offline = operational only | Can run offline | Epistemic completeness |

---

## Milestone 1: SBOM Ledger (SBOM-L)

**Goal:** Transform the SBOM from a static artifact into a stateful ledger with lineage tracking.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| SBOM-L-001 | Component identity = (source + digest + build recipe hash) | TBD | TODO |
| SBOM-L-002 | Binary → source mapping (ELF Build-ID, PE hash, Mach-O UUID) | 3700 | DOING |
| SBOM-L-003 | Layer-aware dependency graphs with loader resolution | TBD | TODO |
| SBOM-L-004 | SBOM versioning and merge semantics | TBD | TODO |
| SBOM-L-005 | Replay manifest with exact feeds/policies/timestamps | 3500 | DONE |

### Acceptance Criteria

- [ ] Component identity includes build recipe hash
- [ ] Binary provenance tracked via Build-ID/UUID
- [ ] Dependency graph includes loader rules
- [ ] SBOM versions can be diffed semantically
- [ ] Replay manifests are content-addressed

### Competitive Edge

> "No competitor offers SBOM lineage + merge semantics with proofs."

---

## Milestone 2: VEX Lattice Reasoning (VEX-L)

**Goal:** VEX becomes a logical input to lattice merge, not just annotation.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| VEX-L-001 | VEX statement → lattice predicate conversion | 3500 | DONE |
| VEX-L-002 | Multi-source VEX conflict resolution (vendor/distro/internal) | 3500 | DONE |
| VEX-L-003 | Jurisdiction-specific trust rules | TBD | TODO |
| VEX-L-004 | Customer override with audit trail | TBD | TODO |
| VEX-L-005 | VEX evidence linking (proof pointers) | 3800 | TODO |

### Acceptance Criteria

- [ ] Conflicting VEX from multiple sources merges deterministically
- [ ] Trust rules are configurable per jurisdiction
- [ ] All overrides have signed audit trails
- [ ] Every VEX decision links to an evidence bundle

### Competitive Edge

> "First tool with formal VEX reasoning, not just ingestion."
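To make the deterministic-merge criterion concrete, here is a minimal sketch of multi-source VEX resolution. It simplifies the lattice to a total order (trust class, then status severity, then source name), which is enough to show why the same statements always yield the same verdict. The status values follow the OpenVEX vocabulary, but every class name, function, and trust weight below is a hypothetical stand-in, not the actual Stella Ops API.

```python
"""Sketch only: deterministic multi-source VEX merge (VEX-L-001/002)."""
from dataclasses import dataclass

# OpenVEX statuses ranked from least to most conservative; ties between
# equally trusted sources resolve toward "assume exploitable".
STATUS_SEVERITY = {
    "not_affected": 0,
    "fixed": 1,
    "under_investigation": 2,
    "affected": 3,
}

# Placeholder trust classes per VEX-L-002; under VEX-L-003 these weights
# would become jurisdiction-configurable rules.
TRUST = {"internal": 3, "vendor": 2, "distro": 1}

@dataclass(frozen=True)
class VexStatement:
    source: str        # e.g. "vendor:acme"
    source_class: str  # "internal" | "vendor" | "distro"
    status: str        # an OpenVEX status

def merge(statements: list[VexStatement]) -> VexStatement:
    """Highest trust wins; within a trust class the most conservative
    status wins; remaining ties break on source name. The key is a pure
    total order, so identical inputs always produce identical verdicts."""
    return max(
        statements,
        key=lambda s: (TRUST[s.source_class],
                       STATUS_SEVERITY[s.status],
                       s.source),
    )

# Example: a vendor "not_affected" claim outranks a distro "affected" claim.
verdict = merge([
    VexStatement("distro:debian", "distro", "affected"),
    VexStatement("vendor:acme", "vendor", "not_affected"),
])
assert verdict.status == "not_affected"
```

A real lattice merge would also carry the proof pointers of VEX-L-005 through the join, so the winning verdict stays linked to the statements it overrode.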
---

## Milestone 3: Explainable Findings (EXP-F)

**Goal:** Every finding answers four questions: evidence, path, assumptions, falsifiability.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| EXP-F-001 | Evidence bundle per finding (SBOM + graph + loader + runtime) | 3800 | TODO |
| EXP-F-002 | Assumption set capture (compiler flags, runtime config, gates) | 3600 | DONE |
| EXP-F-003 | Confidence score from evidence density | 3700 | DONE |
| EXP-F-004 | Falsification conditions ("what would change this verdict") | TBD | TODO |
| EXP-F-005 | Evidence drawer UI with proof tabs | 4100 | TODO |

### Acceptance Criteria

- [ ] Each finding has an explicit evidence bundle
- [ ] Assumptions are captured and displayed
- [ ] Confidence derives from evidence, not CVSS
- [ ] UI shows "what would falsify this"

### Competitive Edge

> "Only tool that answers: what would falsify this conclusion?"

---

## Milestone 4: Semantic Smart-Diff (S-DIFF)

**Goal:** Diff security meaning, not just artifacts.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| S-DIFF-001 | Reachability graph diffing | 3600 | DONE |
| S-DIFF-002 | Policy outcome diffing | TBD | TODO |
| S-DIFF-003 | Trust weight diffing | TBD | TODO |
| S-DIFF-004 | Unknowns delta tracking | 3500 | DONE |
| S-DIFF-005 | Risk delta summary ("reduced surface by X% despite +N CVEs") | 3600 | DONE |

### Acceptance Criteria

- [ ] Diff output shows semantic security changes
- [ ] The same CVE with a removed call path shows as mitigated
- [ ] A new binary with dead code shows no new risk
- [ ] Summary quantifies net security posture change

### Competitive Edge

> "Outputs 'This release reduces exploitability by 41%' — no competitor does this."

---

## Milestone 5: Call-Path Reachability (CPR)

**Goal:** Three-layer reachability proof: static graph + binary resolution + runtime gating.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| CPR-001 | Static call graph from entrypoints to vulnerable symbols | 3600 | DONE |
| CPR-002 | Binary resolution (dynamic loader rules, symbol versioning) | 3700 | DOING |
| CPR-003 | Runtime gating (feature flags, config, environment) | 3600 | DONE |
| CPR-004 | Confidence tiers (Confirmed/Likely/Present/Unreachable) | 3700 | DONE |
| CPR-005 | Path witnesses with surface evidence | 3700 | DONE |

### Acceptance Criteria

- [ ] All three layers must align for exploitability
- [ ] False positives are structurally impossible (not heuristically reduced)
- [ ] Confidence tier reflects evidence quality
- [ ] Witnesses are DSSE-signed

### Competitive Edge

> "Makes false positives structurally impossible, not heuristically reduced."

---

## Milestone 6: Deterministic Scoring (D-SCORE)

**Goal:** Score = deterministic function with signed proofs.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| D-SCORE-001 | Score from evidence count/strength | 3500 | DONE |
| D-SCORE-002 | Assumption penalties in score | TBD | TODO |
| D-SCORE-003 | Trust source weights | TBD | TODO |
| D-SCORE-004 | Policy constraint integration | 3500 | DONE |
| D-SCORE-005 | Signed score attestation | 3800 | TODO |

### Acceptance Criteria

- [ ] Same inputs → same score, forever
- [ ] Score attestation is DSSE-signed
- [ ] Cross-org verification is possible
- [ ] Scoring rules are auditable

### Competitive Edge

> "Signed risk decisions that are legally defensible."
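The determinism criterion is easiest to see as code: the score must be a pure function over canonicalized inputs, and the digest of that canonical form is what a signer attests. The sketch below is illustrative only; the weights, penalty, and function names are hypothetical placeholders, and a real implementation would wrap the payload in a DSSE envelope (D-SCORE-005) rather than returning a bare SHA-256.

```python
"""Sketch only: a deterministic, attestable score (D-SCORE-001/002/005)."""
import hashlib
import json

# Placeholder weights standing in for the auditable trust/source rules of
# D-SCORE-003; the real rules would themselves be snapshotted (see E-OFF).
EVIDENCE_WEIGHTS = {
    "static_path": 0.4,
    "loader_resolution": 0.3,
    "runtime_observation": 0.3,
}

def score(evidence: dict[str, bool], assumptions: list[str]) -> float:
    """Sum the weights of present evidence, minus a flat penalty per
    unverified assumption (D-SCORE-002). No clocks, no randomness, no
    network calls: determinism holds by construction."""
    base = sum(w for kind, w in EVIDENCE_WEIGHTS.items() if evidence.get(kind))
    return round(max(0.0, base - 0.05 * len(assumptions)), 4)

def attestation_payload(evidence: dict[str, bool], assumptions: list[str]) -> dict:
    """Canonical payload a signer would wrap in a DSSE envelope. Sorted
    keys and fixed separators make the digest reproducible everywhere."""
    payload = {
        "assumptions": sorted(assumptions),
        "evidence": dict(sorted(evidence.items())),
        "score": score(evidence, assumptions),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return {"payload": payload,
            "sha256": hashlib.sha256(canonical.encode()).hexdigest()}
```

Because the payload is canonicalized before hashing, a second organization replaying the same evidence can recompute the digest independently — exactly the cross-org verification the acceptance criteria call for.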
---

## Milestone 7: Unknowns as First-Class State (UNK)

**Goal:** Explicit unknowns modeling with risk implications.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| UNK-001 | Unknown-reachable and unknown-unreachable states | 3500 | DONE |
| UNK-002 | Unknowns pressure in scoring | 3500 | DONE |
| UNK-003 | Unknowns registry and API | 3500 | DONE |
| UNK-004 | UI unknowns chips and triage actions | 4100 | TODO |
| UNK-005 | Zero-day window tracking | TBD | TODO |

### Acceptance Criteria

- [ ] Unknowns are distinct from vulnerabilities
- [ ] Scoring reflects unknowns pressure
- [ ] UI surfaces unknowns prominently
- [ ] Air-gap and zero-day scenarios handled

### Competitive Edge

> "No competitor models uncertainty explicitly."

---

## Milestone 8: Epistemic Offline (E-OFF)

**Goal:** Offline = cryptographically bound knowledge state.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| E-OFF-001 | Feed snapshot with digest | Existing | DONE |
| E-OFF-002 | Policy snapshot with digest | Existing | DONE |
| E-OFF-003 | Scoring rules snapshot | TBD | TODO |
| E-OFF-004 | Trust anchor snapshot | Existing | DONE |
| E-OFF-005 | Knowledge state attestation in scan result | 3500 | DONE |

### Acceptance Criteria

- [ ] Every offline scan knows exactly what knowledge it had
- [ ] Forensic replayability, not just offline execution
- [ ] Audit can answer: "what did you know when you made this decision?"

### Competitive Edge

> "Epistemic completeness vs. just operational offline."

---

## Priority Matrix

| Milestone | Strategic Value | Implementation Effort | Priority |
|-----------|-----------------|-----------------------|----------|
| CPR (Call-Path Reachability) | ★★★★★ | High | P0 |
| S-DIFF (Semantic Smart-Diff) | ★★★★★ | Medium | P0 |
| EXP-F (Explainable Findings) | ★★★★☆ | Medium | P1 |
| VEX-L (VEX Lattice) | ★★★★☆ | Medium | P1 |
| D-SCORE (Deterministic Scoring) | ★★★★☆ | Medium | P1 |
| UNK (Unknowns State) | ★★★★☆ | Low | P1 |
| SBOM-L (SBOM Ledger) | ★★★☆☆ | High | P2 |
| E-OFF (Epistemic Offline) | ★★★☆☆ | Low | P2 |

---

## Sprint Alignment

| Sprint | Milestones Addressed |
|--------|----------------------|
| 3500 (Smart-Diff) | S-DIFF, UNK, D-SCORE, E-OFF |
| 3600 (Reachability Drift) | CPR, S-DIFF, EXP-F |
| 3700 (Vuln Surfaces) | CPR, SBOM-L |
| 3800 (Explainable Triage) | EXP-F, VEX-L, D-SCORE |
| 4100 (Triage UI) | EXP-F, UNK |

---

## Benchmark Tests

Each milestone should have corresponding benchmark tests in `bench/`:

| Benchmark | Tests |
|-----------|-------|
| `bench/reachability-benchmark/` | CPR accuracy vs. ground truth |
| `bench/smart-diff/` | Semantic diff correctness |
| `bench/determinism/` | Replay fidelity |
| `bench/unknowns/` | Unknowns tracking accuracy |
| `bench/vex-lattice/` | VEX merge correctness |

---

## References

- Source advisory: `docs/product-advisories/19-Dec-2025 - Benchmarking Container Scanners Against Stella Ops.md`
- Moat spec: `docs/moat.md`
- Key features: `docs/key-features.md`
- Reachability delivery: `docs/reachability/DELIVERY_GUIDE.md`