# Competitive Benchmark Implementation Milestones
Source: `docs/product-advisories/19-Dec-2025 - Benchmarking Container Scanners Against Stella Ops.md`

This document translates the competitive matrix into concrete implementation milestones with measurable acceptance criteria.
## Executive Summary
The competitive analysis identifies seven structural gaps in existing container security tools (Trivy, Syft/Grype, Snyk, Prisma, Aqua, Anchore) that Stella Ops can exploit:
| Gap | Competitor Status | Stella Ops Target |
|---|---|---|
| SBOM as static artifact | Generate → store → scan | Stateful ledger with lineage |
| VEX as metadata | Annotation/suppression | Formal lattice reasoning |
| Probability-based scoring | CVSS + heuristics | Deterministic provable scores |
| File-level diffing | Image hash comparison | Semantic smart-diff |
| Runtime context ≠ reachability | Coarse correlation | Call-path proofs |
| Uncertainty suppressed | Hidden/ignored | Explicit unknowns state |
| Offline = operational only | Can run offline | Epistemic completeness |
## Milestone 1: SBOM Ledger (SBOM-L)

**Goal:** Transform the SBOM from a static artifact into a stateful ledger with lineage tracking.

### Deliverables
| ID | Deliverable | Sprint | Status |
|---|---|---|---|
| SBOM-L-001 | Component identity = (source + digest + build recipe hash) | TBD | TODO |
| SBOM-L-002 | Binary → source mapping (ELF Build-ID, PE hash, Mach-O UUID) | 3700 | DOING |
| SBOM-L-003 | Layer-aware dependency graphs with loader resolution | TBD | TODO |
| SBOM-L-004 | SBOM versioning and merge semantics | TBD | TODO |
| SBOM-L-005 | Replay manifest with exact feeds/policies/timestamps | 3500 | DONE |
### Acceptance Criteria
- Component identity includes build recipe hash
- Binary provenance tracked via Build-ID/UUID
- Dependency graph includes loader rules
- SBOM versions can be diffed semantically
- Replay manifests are content-addressed
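
As a sketch of the identity scheme behind SBOM-L-001 (the field names, canonical-JSON layout, and hash choice are illustrative assumptions, not the shipped schema), a component identity can be computed as a content address over source, artifact digest, and build recipe hash:

```python
import hashlib
import json

def component_identity(source: str, artifact_digest: str, build_recipe_hash: str) -> str:
    """Content-addressed component identity in the spirit of SBOM-L-001.

    The inputs are serialized canonically (sorted keys, no extra whitespace)
    so the same logical component always hashes to the same identity.
    Field names here are illustrative, not the actual schema.
    """
    canonical = json.dumps(
        {
            "source": source,                      # e.g. VCS URL + revision
            "artifactDigest": artifact_digest,     # e.g. sha256 of the binary/layer
            "buildRecipeHash": build_recipe_hash,  # hash of Dockerfile/build config
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two builds that agree on source, digest, and recipe map to one identity;
# changing any ingredient yields a new, diffable identity.
cid = component_identity(
    "git+https://example.org/lib.git@a1b2c3",
    "sha256:feedface",
    "sha256:deadbeef",
)
```

Because the serialization is canonical, two independently produced SBOMs that agree on these three ingredients agree on the identity, which is what makes the versioning and merge semantics of SBOM-L-004 tractable.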
### Competitive Edge

> "No competitor offers SBOM lineage + merge semantics with proofs."
## Milestone 2: VEX Lattice Reasoning (VEX-L)

**Goal:** Treat VEX as a logical input to the lattice merge, not just an annotation.

### Deliverables
| ID | Deliverable | Sprint | Status |
|---|---|---|---|
| VEX-L-001 | VEX statement → lattice predicate conversion | 3500 | DONE |
| VEX-L-002 | Multi-source VEX conflict resolution (vendor/distro/internal) | 3500 | DONE |
| VEX-L-003 | Jurisdiction-specific trust rules | TBD | TODO |
| VEX-L-004 | Customer override with audit trail | TBD | TODO |
| VEX-L-005 | VEX evidence linking (proof pointers) | 3800 | TODO |
### Acceptance Criteria
- Conflicting VEX from multiple sources merges deterministically
- Trust rules are configurable per jurisdiction
- All overrides have signed audit trails
- Every VEX decision links to evidence bundle
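
A minimal sketch of the lattice idea in VEX-L-001/002, assuming an illustrative status ordering and trust tiers (the real ordering, and the jurisdiction rules of VEX-L-003, are policy-configurable):

```python
from dataclasses import dataclass

# Illustrative status ordering: lower rank = more conservative claim.
# The real lattice is policy-defined, not hard-coded.
STATUS_RANK = {
    "affected": 0,
    "under_investigation": 1,
    "fixed": 2,
    "not_affected": 3,
}

# Illustrative trust tiers; VEX-L-003 makes these jurisdiction-configurable.
TRUST_RANK = {"internal": 3, "vendor": 2, "distro": 1}

@dataclass(frozen=True)
class VexStatement:
    vuln_id: str
    status: str   # key of STATUS_RANK
    source: str   # key of TRUST_RANK

def merge(statements: list[VexStatement]) -> VexStatement:
    """Deterministic conflict resolution: highest trust wins; ties fall
    back to the more conservative (lower-ranked) status, so the merge
    never silently discards an 'affected' claim."""
    return min(
        statements,
        key=lambda s: (-TRUST_RANK[s.source], STATUS_RANK[s.status]),
    )

merged = merge([
    VexStatement("CVE-2025-0001", "not_affected", "vendor"),
    VexStatement("CVE-2025-0001", "affected", "distro"),
])
# Vendor outranks distro in this example, so the merged verdict is
# not_affected — with the losing statement retained in the audit trail
# (not shown here; see VEX-L-004).
```

The point is that the merge is a total, deterministic function of the statements and the trust configuration: replaying the same inputs always yields the same verdict.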
### Competitive Edge

> "First tool with formal VEX reasoning, not just ingestion."
## Milestone 3: Explainable Findings (EXP-F)

**Goal:** Every finding answers four questions: what is the evidence, what is the path, what are the assumptions, and what would falsify it.

### Deliverables
| ID | Deliverable | Sprint | Status |
|---|---|---|---|
| EXP-F-001 | Evidence bundle per finding (SBOM + graph + loader + runtime) | 3800 | TODO |
| EXP-F-002 | Assumption set capture (compiler flags, runtime config, gates) | 3600 | DONE |
| EXP-F-003 | Confidence score from evidence density | 3700 | DONE |
| EXP-F-004 | Falsification conditions ("what would change this verdict") | TBD | TODO |
| EXP-F-005 | Evidence drawer UI with proof tabs | 4100 | TODO |
### Acceptance Criteria
- Each finding has explicit evidence bundle
- Assumptions are captured and displayed
- Confidence derives from evidence, not CVSS
- UI shows "what would falsify this"
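
One way to read EXP-F-003's confidence-from-evidence-density rule, as a sketch (the evidence categories, weights, and assumption penalty below are illustrative assumptions, not the shipped formula):

```python
from dataclasses import dataclass, field

# Illustrative evidence weights; the shipped weighting is its own.
EVIDENCE_WEIGHT = {"sbom": 0.2, "call_graph": 0.3, "loader": 0.2, "runtime": 0.3}

@dataclass
class Finding:
    vuln_id: str
    evidence: set[str] = field(default_factory=set)        # kinds present in the bundle
    assumptions: list[str] = field(default_factory=list)   # EXP-F-002
    falsifiers: list[str] = field(default_factory=list)    # EXP-F-004

    def confidence(self) -> float:
        """Confidence grows with evidence density and shrinks with each
        unverified assumption — it is never derived from CVSS."""
        base = sum(EVIDENCE_WEIGHT[k] for k in self.evidence)
        penalty = 0.05 * len(self.assumptions)
        return max(0.0, min(1.0, base - penalty))

f = Finding(
    "CVE-2025-0002",
    evidence={"sbom", "call_graph", "runtime"},
    assumptions=["default build flags"],
    falsifiers=["runtime trace showing the gate is disabled"],
)
assert abs(f.confidence() - 0.75) < 1e-9  # 0.2 + 0.3 + 0.3 - 0.05
```

Note that the `falsifiers` list is what the evidence drawer of EXP-F-005 would surface as "what would change this verdict".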
### Competitive Edge

> "Only tool that answers: what would falsify this conclusion?"
## Milestone 4: Semantic Smart-Diff (S-DIFF)

**Goal:** Diff the security meaning of a release, not just its artifacts.

### Deliverables
| ID | Deliverable | Sprint | Status |
|---|---|---|---|
| S-DIFF-001 | Reachability graph diffing | 3600 | DONE |
| S-DIFF-002 | Policy outcome diffing | TBD | TODO |
| S-DIFF-003 | Trust weight diffing | TBD | TODO |
| S-DIFF-004 | Unknowns delta tracking | 3500 | DONE |
| S-DIFF-005 | Risk delta summary ("reduced surface by X% despite +N CVEs") | 3600 | DONE |
### Acceptance Criteria
- Diff output shows semantic security changes
- Same CVE with removed call path shows as mitigated
- New binary with dead code shows no new risk
- Summary quantifies net security posture change
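
A sketch of the core S-DIFF rule (the types and reachability maps are illustrative): the same CVE whose call path disappears between releases is reported as mitigated, and a new CVE with no call path adds no exploitable surface.

```python
def semantic_diff(old: dict[str, bool], new: dict[str, bool]) -> dict[str, list[str]]:
    """Each map is CVE id -> 'has a live call path' for one release.
    The diff reports security meaning, not raw artifact changes."""
    out: dict[str, list[str]] = {
        "introduced": [],          # new CVE with a live call path
        "mitigated": [],           # same CVE, call path removed
        "new_but_unreachable": [], # new CVE, dead code: no new risk
    }
    for cve, reachable in new.items():
        if cve not in old:
            out["introduced" if reachable else "new_but_unreachable"].append(cve)
        elif old[cve] and not reachable:
            out["mitigated"].append(cve)
    return out

delta = semantic_diff(
    {"CVE-A": True, "CVE-B": False},
    {"CVE-A": False, "CVE-B": False, "CVE-C": False},
)
# {'introduced': [], 'mitigated': ['CVE-A'], 'new_but_unreachable': ['CVE-C']}
# A risk-delta summary (S-DIFF-005) would aggregate these buckets into a
# statement like "exploitable surface reduced despite +1 CVE".
```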
### Competitive Edge

> "Outputs 'This release reduces exploitability by 41%' — no competitor does this."
## Milestone 5: Call-Path Reachability (CPR)

**Goal:** A three-layer reachability proof: static call graph + binary resolution + runtime gating.

### Deliverables
| ID | Deliverable | Sprint | Status |
|---|---|---|---|
| CPR-001 | Static call graph from entrypoints to vulnerable symbols | 3600 | DONE |
| CPR-002 | Binary resolution (dynamic loader rules, symbol versioning) | 3700 | DOING |
| CPR-003 | Runtime gating (feature flags, config, environment) | 3600 | DONE |
| CPR-004 | Confidence tiers (Confirmed/Likely/Present/Unreachable) | 3700 | DONE |
| CPR-005 | Path witnesses with surface evidence | 3700 | DONE |
### Acceptance Criteria
- All three layers must align for exploitability
- False positives are structurally impossible, not heuristically reduced
- Confidence tier reflects evidence quality
- Witnesses are DSSE-signed
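
The alignment rule can be stated as a conjunction over the three layers. The tier names follow CPR-004; everything else below is an illustrative sketch, not the shipped classifier:

```python
from enum import Enum

class Tier(Enum):
    CONFIRMED = "Confirmed"      # all three layers agree, runtime-observed
    LIKELY = "Likely"            # static + binary agree, runtime unobserved
    PRESENT = "Present"          # vulnerable code present, path unproven
    UNREACHABLE = "Unreachable"  # some layer structurally rules the path out

def classify(static_path: bool, binary_resolves: bool,
             runtime_gate_open: bool | None) -> Tier:
    """Exploitability requires every layer to align (CPR-001..003).
    `runtime_gate_open` is None when runtime evidence was not collected."""
    if not (static_path and binary_resolves):
        # No static path is a structural proof of unreachability; a path
        # the loader never resolves leaves the code merely present.
        return Tier.UNREACHABLE if not static_path else Tier.PRESENT
    if runtime_gate_open is True:
        return Tier.CONFIRMED
    if runtime_gate_open is None:
        return Tier.LIKELY
    return Tier.UNREACHABLE  # a closed runtime gate breaks the path

assert classify(True, True, True) is Tier.CONFIRMED
assert classify(True, False, True) is Tier.PRESENT
```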
### Competitive Edge

> "Makes false positives structurally impossible, not heuristically reduced."
## Milestone 6: Deterministic Scoring (D-SCORE)

**Goal:** Every score is a deterministic function of its inputs, backed by signed proofs.

### Deliverables
| ID | Deliverable | Sprint | Status |
|---|---|---|---|
| D-SCORE-001 | Score from evidence count/strength | 3500 | DONE |
| D-SCORE-002 | Assumption penalties in score | TBD | TODO |
| D-SCORE-003 | Trust source weights | TBD | TODO |
| D-SCORE-004 | Policy constraint integration | 3500 | DONE |
| D-SCORE-005 | Signed score attestation | 3800 | TODO |
### Acceptance Criteria
- Same inputs → same score → forever
- Score attestation is DSSE-signed
- Cross-org verification possible
- Scoring rules are auditable
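
A sketch of the "same inputs → same score" property (the weights, penalty, and payload layout are assumptions; D-SCORE-005 specifies DSSE for the real attestation, which is out of scope here):

```python
import hashlib
import json

def score(evidence_strengths: list[float], assumptions: int,
          trust_weight: float) -> dict:
    """A pure function of its inputs: no clock, no randomness, no network.
    Replaying the same inputs always reproduces the same score and digest."""
    raw = sum(evidence_strengths) * trust_weight - 0.1 * assumptions  # D-SCORE-001..003
    value = round(max(0.0, min(10.0, raw)), 4)
    payload = json.dumps(
        {"evidence": evidence_strengths, "assumptions": assumptions,
         "trustWeight": trust_weight, "score": value},
        sort_keys=True, separators=(",", ":"),
    )
    # The digest is what a DSSE envelope would sign (D-SCORE-005);
    # the signing step itself is omitted from this sketch.
    return {"score": value, "digest": hashlib.sha256(payload.encode()).hexdigest()}

a = score([3.0, 2.5], assumptions=1, trust_weight=1.2)
b = score([3.0, 2.5], assumptions=1, trust_weight=1.2)
assert a == b  # cross-org verification: identical inputs, identical attestation
```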
### Competitive Edge

> "Signed risk decisions that are legally defensible."
## Milestone 7: Unknowns as First-Class State (UNK)

**Goal:** Model unknowns explicitly, with their risk implications.

### Deliverables
| ID | Deliverable | Sprint | Status |
|---|---|---|---|
| UNK-001 | Unknown-reachable and unknown-unreachable states | 3500 | DONE |
| UNK-002 | Unknowns pressure in scoring | 3500 | DONE |
| UNK-003 | Unknowns registry and API | 3500 | DONE |
| UNK-004 | UI unknowns chips and triage actions | 4100 | TODO |
| UNK-005 | Zero-day window tracking | TBD | TODO |
### Acceptance Criteria
- Unknowns are distinct from vulnerabilities
- Scoring reflects unknowns pressure
- UI surfaces unknowns prominently
- Air-gap and zero-day scenarios handled
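
A sketch of the UNK-001/002 state model (the state names track the deliverables; the pressure formula is an illustrative assumption):

```python
from enum import Enum

class Knowledge(Enum):
    REACHABLE = "reachable"
    UNREACHABLE = "unreachable"
    UNKNOWN_REACHABLE = "unknown-reachable"      # suspected path, unproven
    UNKNOWN_UNREACHABLE = "unknown-unreachable"  # suspected dead, unproven

def unknowns_pressure(states: list[Knowledge]) -> float:
    """UNK-002: unknowns are not vulnerabilities, but they still push risk
    upward — here as the fraction of components whose state is unproven."""
    unknown = sum(
        1 for s in states
        if s in (Knowledge.UNKNOWN_REACHABLE, Knowledge.UNKNOWN_UNREACHABLE)
    )
    return unknown / len(states) if states else 0.0

pressure = unknowns_pressure([
    Knowledge.REACHABLE,
    Knowledge.UNKNOWN_REACHABLE,
    Knowledge.UNREACHABLE,
    Knowledge.UNKNOWN_UNREACHABLE,
])
assert pressure == 0.5  # half of the graph's state is unproven
```

Keeping the unknown states distinct from vulnerabilities is what lets the UI (UNK-004) surface them as their own triage queue rather than folding them into findings.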
### Competitive Edge

> "No competitor models uncertainty explicitly."
## Milestone 8: Epistemic Offline (E-OFF)

**Goal:** Offline operation means a cryptographically bound knowledge state, not merely disconnected execution.

### Deliverables
| ID | Deliverable | Sprint | Status |
|---|---|---|---|
| E-OFF-001 | Feed snapshot with digest | Existing | DONE |
| E-OFF-002 | Policy snapshot with digest | Existing | DONE |
| E-OFF-003 | Scoring rules snapshot | TBD | TODO |
| E-OFF-004 | Trust anchor snapshot | Existing | DONE |
| E-OFF-005 | Knowledge state attestation in scan result | 3500 | DONE |
### Acceptance Criteria
- Every offline scan knows exactly what knowledge it had
- Forensic replayability, not just offline execution
- Audit can answer: "what did you know when you made this decision?"
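
A sketch of the knowledge-state binding in E-OFF-005 (the snapshot kinds and digest layout are illustrative): the scan result embeds one digest over every knowledge snapshot it was computed from, so an audit can answer exactly what the scanner knew.

```python
import hashlib
import json

def knowledge_state_digest(snapshots: dict[str, str]) -> str:
    """`snapshots` maps snapshot kind -> content digest, e.g. feeds
    (E-OFF-001), policy (E-OFF-002), scoring rules (E-OFF-003), and
    trust anchors (E-OFF-004). The combined digest is embedded in the
    scan result (E-OFF-005), so "what did you know when you decided?"
    has a cryptographic answer."""
    canonical = json.dumps(snapshots, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

state = knowledge_state_digest({
    "feeds": "sha256:1111",
    "policy": "sha256:2222",
    "scoringRules": "sha256:3333",
    "trustAnchors": "sha256:4444",
})
# Replaying a scan against snapshots matching this digest is forensically
# equivalent to re-running it at the original point in time.
```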
### Competitive Edge

> "Epistemic completeness vs. just operational offline."
## Priority Matrix
| Milestone | Strategic Value | Implementation Effort | Priority |
|---|---|---|---|
| CPR (Call-Path Reachability) | ★★★★★ | High | P0 |
| S-DIFF (Semantic Smart-Diff) | ★★★★★ | Medium | P0 |
| EXP-F (Explainable Findings) | ★★★★☆ | Medium | P1 |
| VEX-L (VEX Lattice) | ★★★★☆ | Medium | P1 |
| D-SCORE (Deterministic Scoring) | ★★★★☆ | Medium | P1 |
| UNK (Unknowns State) | ★★★★☆ | Low | P1 |
| SBOM-L (SBOM Ledger) | ★★★☆☆ | High | P2 |
| E-OFF (Epistemic Offline) | ★★★☆☆ | Low | P2 |
## Sprint Alignment
| Sprint | Milestones Addressed |
|---|---|
| 3500 (Smart-Diff) | S-DIFF, UNK, D-SCORE, E-OFF |
| 3600 (Reachability Drift) | CPR, S-DIFF, EXP-F |
| 3700 (Vuln Surfaces) | CPR, SBOM-L |
| 3800 (Explainable Triage) | EXP-F, VEX-L, D-SCORE |
| 4100 (Triage UI) | EXP-F, UNK |
## Benchmark Tests

Each milestone should have corresponding benchmark tests under `bench/` (a sketch of the determinism check follows the table):
| Benchmark | Tests |
|---|---|
| `bench/reachability-benchmark/` | CPR accuracy vs. ground truth |
| `bench/smart-diff/` | Semantic diff correctness |
| `bench/determinism/` | Replay fidelity |
| `bench/unknowns/` | Unknowns tracking accuracy |
| `bench/vex-lattice/` | VEX merge correctness |
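
As a sketch of the replay-fidelity check implied by `bench/determinism/` (the `run_scan` entry point is hypothetical, standing in for the real scanner):

```python
import json

def run_scan(image_digest: str, knowledge_state: dict) -> dict:
    # Hypothetical stand-in for the real scanner: deterministic by
    # construction, since it derives everything from its two inputs.
    return {"image": image_digest, "knowledge": knowledge_state, "findings": []}

def test_replay_fidelity():
    """bench/determinism/: two scans with identical inputs must be
    byte-identical after canonical serialization, not merely 'equivalent'."""
    image = "sha256:abc123"
    knowledge = {"feeds": "sha256:1111", "policy": "sha256:2222"}
    first = json.dumps(run_scan(image, knowledge), sort_keys=True)
    second = json.dumps(run_scan(image, knowledge), sort_keys=True)
    assert first == second
```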
## References

- Source advisory: `docs/product-advisories/19-Dec-2025 - Benchmarking Container Scanners Against Stella Ops.md`
- Moat spec: `docs/moat.md`
- Key features: `docs/key-features.md`
- Reachability delivery: `docs/reachability/DELIVERY_GUIDE.md`