
# Competitive Benchmark Implementation Milestones

> Source: `docs/product-advisories/19-Dec-2025 - Benchmarking Container Scanners Against Stella Ops.md`

This document translates the competitive matrix into concrete implementation milestones with measurable acceptance criteria.


## Executive Summary

The competitive analysis identifies seven structural gaps in existing container security tools (Trivy, Syft/Grype, Snyk, Prisma, Aqua, Anchore) that Stella Ops can exploit:

| Gap | Competitor Status | Stella Ops Target |
|-----|-------------------|-------------------|
| SBOM as static artifact | Generate → store → scan | Stateful ledger with lineage |
| VEX as metadata | Annotation/suppression | Formal lattice reasoning |
| Probability-based scoring | CVSS + heuristics | Deterministic, provable scores |
| File-level diffing | Image hash comparison | Semantic smart-diff |
| Runtime context ≠ reachability | Coarse correlation | Call-path proofs |
| Uncertainty suppressed | Hidden/ignored | Explicit unknowns state |
| Offline = operational only | Can run offline | Epistemic completeness |

## Milestone 1: SBOM Ledger (SBOM-L)

**Goal:** Transform the SBOM from a static artifact into a stateful ledger with lineage tracking.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| SBOM-L-001 | Component identity = (source + digest + build recipe hash) | TBD | TODO |
| SBOM-L-002 | Binary → source mapping (ELF Build-ID, PE hash, Mach-O UUID) | 3700 | DOING |
| SBOM-L-003 | Layer-aware dependency graphs with loader resolution | TBD | TODO |
| SBOM-L-004 | SBOM versioning and merge semantics | TBD | TODO |
| SBOM-L-005 | Replay manifest with exact feeds/policies/timestamps | 3500 | DONE |

### Acceptance Criteria

- Component identity includes build recipe hash
- Binary provenance tracked via Build-ID/UUID
- Dependency graph includes loader rules
- SBOM versions can be diffed semantically
- Replay manifests are content-addressed
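
The "content-addressed" criterion can be illustrated with a minimal sketch (not the Stella Ops implementation; the function name and inputs are hypothetical): the identity is a hash over the source reference, the artifact digest, and a canonical-JSON hash of the build recipe, so the same inputs always yield the same identity and any recipe change yields a new one.

```python
import hashlib
import json

def component_identity(source: str, digest: str, build_recipe: dict) -> str:
    """Derive a content-addressed component identity from (source + digest
    + build recipe hash). The recipe is serialized as canonical JSON
    (sorted keys, no whitespace) so key order cannot change the result."""
    recipe_hash = hashlib.sha256(
        json.dumps(build_recipe, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()
    material = f"{source}\n{digest}\n{recipe_hash}"
    return "sha256:" + hashlib.sha256(material.encode()).hexdigest()
```

Because the recipe hash is part of the identity, two builds of the same source with different compiler flags are distinct components — the property the ledger needs for lineage tracking.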

### Competitive Edge

> "No competitor offers SBOM lineage + merge semantics with proofs."


## Milestone 2: VEX Lattice Reasoning (VEX-L)

**Goal:** VEX becomes a logical input to the lattice merge, not just an annotation.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| VEX-L-001 | VEX statement → lattice predicate conversion | 3500 | DONE |
| VEX-L-002 | Multi-source VEX conflict resolution (vendor/distro/internal) | 3500 | DONE |
| VEX-L-003 | Jurisdiction-specific trust rules | TBD | TODO |
| VEX-L-004 | Customer override with audit trail | TBD | TODO |
| VEX-L-005 | VEX evidence linking (proof pointers) | 3800 | TODO |

### Acceptance Criteria

- Conflicting VEX from multiple sources merges deterministically
- Trust rules are configurable per jurisdiction
- All overrides have signed audit trails
- Every VEX decision links to evidence bundle
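
A deterministic merge of conflicting statements can be sketched as follows (an illustrative toy, not the actual lattice; the trust tiers, status precedence, and function name are assumptions): each statement is ranked by (source trust, status precedence), and the maximum wins, so the result never depends on input order.

```python
# Illustrative precedence: when sources at the same trust tier disagree,
# the more severe status wins (fail-safe toward "affected").
STATUS_PRECEDENCE = {"affected": 3, "under_investigation": 2,
                     "not_affected": 1, "fixed": 0}
# Illustrative trust tiers: internal analysis outranks vendor, vendor
# outranks distro.
TRUST = {"internal": 3, "vendor": 2, "distro": 1}

def merge_vex(statements):
    """Deterministically merge (source_kind, status) pairs: pick the
    statement with the highest (trust, precedence) key."""
    best = max(statements, key=lambda s: (TRUST[s[0]], STATUS_PRECEDENCE[s[1]]))
    return best[1]
```

Because the merge is a pure function over a total order, replaying the same statements — in any order — always reproduces the same verdict, which is what makes the result auditable.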

### Competitive Edge

> "First tool with formal VEX reasoning, not just ingestion."


## Milestone 3: Explainable Findings (EXP-F)

**Goal:** Every finding answers four questions: evidence, path, assumptions, falsifiability.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| EXP-F-001 | Evidence bundle per finding (SBOM + graph + loader + runtime) | 3800 | TODO |
| EXP-F-002 | Assumption set capture (compiler flags, runtime config, gates) | 3600 | DONE |
| EXP-F-003 | Confidence score from evidence density | 3700 | DONE |
| EXP-F-004 | Falsification conditions ("what would change this verdict") | TBD | TODO |
| EXP-F-005 | Evidence drawer UI with proof tabs | 4100 | TODO |

### Acceptance Criteria

- Each finding has explicit evidence bundle
- Assumptions are captured and displayed
- Confidence derives from evidence, not CVSS
- UI shows "what would falsify this"
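
"Confidence derives from evidence, not CVSS" could look like the following sketch (weights and layer names are illustrative assumptions, not EXP-F-003's actual values): confidence is simply the summed weight of the evidence layers present in the bundle.

```python
def confidence(evidence: dict) -> float:
    """Confidence as a deterministic function of evidence density: which
    of the four bundle layers (SBOM, call graph, loader, runtime) are
    present. Weights here are illustrative."""
    weights = {"sbom": 0.2, "call_graph": 0.3, "loader": 0.2, "runtime": 0.3}
    return round(sum(w for k, w in weights.items() if evidence.get(k)), 2)
```

A finding backed only by an SBOM match scores low; one with a call graph and runtime observation scores high — and since the score is a function of the bundle, it changes only when the evidence does.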

### Competitive Edge

> "Only tool that answers: what would falsify this conclusion?"


## Milestone 4: Semantic Smart-Diff (S-DIFF)

**Goal:** Diff security meaning, not just artifacts.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| S-DIFF-001 | Reachability graph diffing | 3600 | DONE |
| S-DIFF-002 | Policy outcome diffing | TBD | TODO |
| S-DIFF-003 | Trust weight diffing | TBD | TODO |
| S-DIFF-004 | Unknowns delta tracking | 3500 | DONE |
| S-DIFF-005 | Risk delta summary ("reduced surface by X% despite +N CVEs") | 3600 | DONE |

### Acceptance Criteria

- Diff output shows semantic security changes
- Same CVE with removed call path shows as mitigated
- New binary with dead code shows no new risk
- Summary quantifies net security posture change
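
The risk-delta summary can be sketched by diffing the sets of *reachable* vulnerabilities between two releases (a toy model; the function name and output shape are assumptions): CVEs whose call paths were removed show up as mitigated, and the net change is quantified even when new CVEs arrived.

```python
def risk_delta(before: set, after: set) -> dict:
    """Diff two sets of reachable CVE IDs and summarize the net change
    in exploitable surface, not the raw artifact change."""
    mitigated = before - after      # path removed -> no longer reachable
    introduced = after - before     # newly reachable in this release
    pct = (len(before) - len(after)) / len(before) * 100 if before else 0.0
    return {
        "mitigated": sorted(mitigated),
        "introduced": sorted(introduced),
        "surface_reduction_pct": round(pct, 1),
    }
```

This is how a release can honestly report "reduced surface by 33% despite +1 CVE": the new CVE may be present in the image, but what is diffed is reachability, not the package list.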

### Competitive Edge

> "Outputs 'This release reduces exploitability by 41%' — no competitor does this."


## Milestone 5: Call-Path Reachability (CPR)

**Goal:** Three-layer reachability proof: static graph + binary resolution + runtime gating.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| CPR-001 | Static call graph from entrypoints to vulnerable symbols | 3600 | DONE |
| CPR-002 | Binary resolution (dynamic loader rules, symbol versioning) | 3700 | DOING |
| CPR-003 | Runtime gating (feature flags, config, environment) | 3600 | DONE |
| CPR-004 | Confidence tiers (Confirmed/Likely/Present/Unreachable) | 3700 | DONE |
| CPR-005 | Path witnesses with surface evidence | 3700 | DONE |

### Acceptance Criteria

- All three layers must align for exploitability
- False positives structurally impossible (not heuristically reduced)
- Confidence tier reflects evidence quality
- Witnesses are DSSE-signed
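
One plausible mapping from the three evidence layers to the four confidence tiers is sketched below (the exact tier semantics are an assumption, not CPR-004's definition): a vulnerability is exploitable only when all three layers align, and missing layers degrade the tier rather than silently inflating it.

```python
def tier(static_path: bool, binary_resolved: bool, runtime_gate_open) -> str:
    """Combine the three reachability layers into a confidence tier.
    runtime_gate_open is True/False/None (None = gating state unknown)."""
    if not static_path:
        return "Unreachable"   # no call path from any entrypoint
    if not binary_resolved:
        return "Present"       # symbol shipped, but loader never resolves it
    if runtime_gate_open is True:
        return "Confirmed"     # all three layers align
    if runtime_gate_open is None:
        return "Likely"        # path exists; gating unverified
    return "Present"           # path exists but the runtime gate is closed
```

Note the structural property the milestone claims: "Confirmed" requires positive evidence on every layer, so a false positive would require all three independent layers to be wrong at once.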

### Competitive Edge

> "Makes false positives structurally impossible, not heuristically reduced."


## Milestone 6: Deterministic Scoring (D-SCORE)

**Goal:** Score = deterministic function with signed proofs.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| D-SCORE-001 | Score from evidence count/strength | 3500 | DONE |
| D-SCORE-002 | Assumption penalties in score | TBD | TODO |
| D-SCORE-003 | Trust source weights | TBD | TODO |
| D-SCORE-004 | Policy constraint integration | 3500 | DONE |
| D-SCORE-005 | Signed score attestation | 3800 | TODO |

### Acceptance Criteria

- Same inputs → same score, forever
- Score attestation is DSSE-signed
- Cross-org verification possible
- Scoring rules are auditable
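
The deliverables D-SCORE-001 through D-SCORE-003 compose naturally into a pure function; the sketch below shows the shape (formula and constants are illustrative assumptions, not the shipped rules): evidence raises the score, unverified assumptions penalize it, and source trust scales it.

```python
def score(evidence_count: int, evidence_strength: float,
          assumptions: int, trust_weight: float) -> float:
    """Pure scoring function: no randomness, no clock, no network, so the
    same inputs always produce the same score and the computation itself
    can be replayed and attested."""
    base = min(evidence_count * evidence_strength, 10.0)  # cap at 10
    penalty = 0.5 * assumptions       # each unverified assumption costs 0.5
    return round(max(base - penalty, 0.0) * trust_weight, 2)
```

Determinism is the whole point: because the function has no hidden inputs, a verifier in another organization can recompute the score from the signed evidence bundle and check that it matches the attestation byte-for-byte.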

### Competitive Edge

> "Signed risk decisions that are legally defensible."


## Milestone 7: Unknowns as First-Class State (UNK)

**Goal:** Explicit unknowns modeling with risk implications.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| UNK-001 | Unknown-reachable and unknown-unreachable states | 3500 | DONE |
| UNK-002 | Unknowns pressure in scoring | 3500 | DONE |
| UNK-003 | Unknowns registry and API | 3500 | DONE |
| UNK-004 | UI unknowns chips and triage actions | 4100 | TODO |
| UNK-005 | Zero-day window tracking | TBD | TODO |

### Acceptance Criteria

- Unknowns are distinct from vulnerabilities
- Scoring reflects unknowns pressure
- UI surfaces unknowns prominently
- Air-gap and zero-day scenarios handled
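
To make "unknowns are distinct from vulnerabilities" concrete, a sketch (state names follow UNK-001; the enum, helper, and pressure formula are illustrative assumptions): unknowns get their own states rather than being collapsed into affected/not-affected, and "unknowns pressure" is simply the unknown fraction of the finding set.

```python
from enum import Enum

class FindingState(Enum):
    AFFECTED = "affected"
    NOT_AFFECTED = "not_affected"
    UNKNOWN_REACHABLE = "unknown_reachable"      # vuln present, path unproven
    UNKNOWN_UNREACHABLE = "unknown_unreachable"  # believed unreachable, unverified

def unknowns_pressure(findings) -> float:
    """Fraction of findings in an unknown state; feeds the score as
    'unknowns pressure' instead of being silently dropped."""
    unknown = sum(1 for f in findings
                  if f in (FindingState.UNKNOWN_REACHABLE,
                           FindingState.UNKNOWN_UNREACHABLE))
    return unknown / len(findings) if findings else 0.0
```

An air-gapped scan with stale feeds would then report rising pressure instead of a falsely clean result — the uncertainty is surfaced, not suppressed.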

### Competitive Edge

> "No competitor models uncertainty explicitly."


## Milestone 8: Epistemic Offline (E-OFF)

**Goal:** Offline = cryptographically bound knowledge state.

### Deliverables

| ID | Deliverable | Sprint | Status |
|----|-------------|--------|--------|
| E-OFF-001 | Feed snapshot with digest | Existing | DONE |
| E-OFF-002 | Policy snapshot with digest | Existing | DONE |
| E-OFF-003 | Scoring rules snapshot | TBD | TODO |
| E-OFF-004 | Trust anchor snapshot | Existing | DONE |
| E-OFF-005 | Knowledge state attestation in scan result | 3500 | DONE |

### Acceptance Criteria

- Every offline scan knows exactly what knowledge it had
- Forensic replayability, not just offline execution
- Audit can answer: "what did you know when you made this decision?"
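
Binding the four snapshot digests (E-OFF-001 through E-OFF-004) into a single knowledge-state digest could look like this sketch (function name and composition are assumptions): the combined digest is embedded in the scan result, so an audit can later prove exactly which feeds, policies, rules, and trust anchors the decision was made with.

```python
import hashlib

def knowledge_state(feed_digest: str, policy_digest: str,
                    rules_digest: str, anchors_digest: str) -> str:
    """Bind the four knowledge snapshots an offline scan ran with into
    one content-addressed digest for the scan-result attestation."""
    material = "\n".join([feed_digest, policy_digest,
                          rules_digest, anchors_digest])
    return "sha256:" + hashlib.sha256(material.encode()).hexdigest()
```

Any change to any snapshot changes the combined digest, which is what turns "can run offline" into "can prove what it knew while offline."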

### Competitive Edge

> "Epistemic completeness vs. just operational offline."


## Priority Matrix

| Milestone | Strategic Value | Implementation Effort | Priority |
|-----------|-----------------|-----------------------|----------|
| CPR (Call-Path Reachability) | ★★★★★ | High | P0 |
| S-DIFF (Semantic Smart-Diff) | ★★★★★ | Medium | P0 |
| EXP-F (Explainable Findings) | ★★★★☆ | Medium | P1 |
| VEX-L (VEX Lattice) | ★★★★☆ | Medium | P1 |
| D-SCORE (Deterministic Scoring) | ★★★★☆ | Medium | P1 |
| UNK (Unknowns State) | ★★★★☆ | Low | P1 |
| SBOM-L (SBOM Ledger) | ★★★☆☆ | High | P2 |
| E-OFF (Epistemic Offline) | ★★★☆☆ | Low | P2 |

## Sprint Alignment

| Sprint | Milestones Addressed |
|--------|----------------------|
| 3500 (Smart-Diff) | S-DIFF, UNK, D-SCORE, E-OFF |
| 3600 (Reachability Drift) | CPR, S-DIFF, EXP-F |
| 3700 (Vuln Surfaces) | CPR, SBOM-L |
| 3800 (Explainable Triage) | EXP-F, VEX-L, D-SCORE |
| 4100 (Triage UI) | EXP-F, UNK |

## Benchmark Tests

Each milestone should have corresponding benchmark tests in `bench/`:

| Benchmark | Tests |
|-----------|-------|
| `bench/reachability-benchmark/` | CPR accuracy vs. ground truth |
| `bench/smart-diff/` | Semantic diff correctness |
| `bench/determinism/` | Replay fidelity |
| `bench/unknowns/` | Unknowns tracking accuracy |
| `bench/vex-lattice/` | VEX merge correctness |

## References

- Source advisory: `docs/product-advisories/19-Dec-2025 - Benchmarking Container Scanners Against Stella Ops.md`
- Moat spec: `docs/moat.md`
- Key features: `docs/key-features.md`
- Reachability delivery: `docs/reachability/DELIVERY_GUIDE.md`