# Benchmark Module Architecture

## Overview

The Benchmark module provides infrastructure for validating and demonstrating Stella Ops' competitive advantages through automated comparison against other container security tools (Trivy, Grype, Syft, etc.).

**Module Path**: `src/Scanner/__Libraries/StellaOps.Scanner.Benchmark/`

**Status**: PLANNED (Sprint 7000.0001.0001)

---

## Mission

Establish verifiable, reproducible benchmarks that:

1. Validate competitive claims with evidence
2. Detect regressions in accuracy or performance
3. Generate marketing-ready comparison materials
4. Provide a ground-truth corpus for testing

---

## Architecture

```
┌───────────────────────────────────────────────────────────┐
│                     Benchmark Module                      │
├───────────────────────────────────────────────────────────┤
│                                                           │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐   │
│   │   Corpus    │    │   Harness   │    │   Metrics   │   │
│   │   Manager   │───▶│   Runner    │───▶│ Calculator  │   │
│   └─────────────┘    └─────────────┘    └─────────────┘   │
│          │                  │                  │          │
│          │                  │                  │          │
│          ▼                  ▼                  ▼          │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐   │
│   │Ground Truth │    │ Competitor  │    │   Claims    │   │
│   │  Manifest   │    │  Adapters   │    │    Index    │   │
│   └─────────────┘    └─────────────┘    └─────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
```

---

## Components

### 1. Corpus Manager

**Namespace**: `StellaOps.Scanner.Benchmark.Corpus`

Manages the ground-truth corpus of container images with known vulnerabilities.
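Each corpus image has a per-image ground-truth JSON file (see Ground Truth Format). The sketch below is a rough illustration of what loading one file involves. It deserializes that format with `System.Text.Json` and assumes .NET 5 or later.

The names `GroundTruthFile`, `GroundTruthSets`, `GroundTruthEntry` and `GroundTruthLoader` are hypothetical DTOs, not the module's actual types. `category` is kept as a plain string rather than `CorpusCategory` so that the snippet compiles on its own.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical DTOs mirroring the per-image ground-truth JSON format.
// Positional records deserialize through their primary constructor (.NET 5+);
// keys absent from an entry (e.g. "notes" vs "reason") bind to null.
public sealed record GroundTruthEntry(
    string CveId,
    string Package,
    string? Version,
    string? Notes,   // present on truePositives entries
    string? Reason); // present on knownFalsePositives / expectedUnreachable entries

public sealed record GroundTruthSets(
    GroundTruthEntry[] TruePositives,
    GroundTruthEntry[] KnownFalsePositives,
    GroundTruthEntry[] ExpectedUnreachable);

public sealed record GroundTruthFile(
    string ImageDigest,
    string ImageName,
    string Category,              // kept as a string; could be mapped to CorpusCategory
    GroundTruthSets GroundTruth,
    DateTimeOffset LastVerified,
    string VerifiedBy);

public static class GroundTruthLoader
{
    // camelCase JSON keys ("cveId") match PascalCase record members case-insensitively.
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNameCaseInsensitive = true,
    };

    public static GroundTruthFile Parse(string json) =>
        JsonSerializer.Deserialize<GroundTruthFile>(json, Options)
        ?? throw new JsonException("Ground-truth document was null.");
}
```

A real `ICorpusManager` would presumably map these DTOs onto the `Corpus` and `GroundTruth` records and check image digests. That mapping is omitted here.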
```csharp
using System;
using System.Collections.Immutable;
using System.Threading;
using System.Threading.Tasks;

public interface ICorpusManager
{
    Task<Corpus> LoadCorpusAsync(string corpusPath, CancellationToken ct);
    Task<CorpusImage> GetImageAsync(string digest, CancellationToken ct);
    Task<GroundTruth> GetGroundTruthAsync(string digest, CancellationToken ct);
}

public record Corpus(
    string Version,
    DateTimeOffset CreatedAt,
    ImmutableArray<CorpusImage> Images
);

public record CorpusImage(
    string Digest,
    string Name,
    string Tag,
    CorpusCategory Category,
    GroundTruth GroundTruth
);

// NormalizedFinding: normalized finding model from StellaOps.Scanner.Core (type name assumed)
public record GroundTruth(
    ImmutableArray<NormalizedFinding> TruePositives,
    ImmutableArray<NormalizedFinding> KnownFalsePositives,
    ImmutableArray<string> Notes
);

public enum CorpusCategory
{
    BaseOS,             // Alpine, Debian, Ubuntu, RHEL
    ApplicationNode,    // Node.js applications
    ApplicationPython,  // Python applications
    ApplicationJava,    // Java applications
    ApplicationDotNet,  // .NET applications
    BackportScenario,   // Known backported fixes
    Unreachable         // Known unreachable vulns
}
```

### 2. Harness Runner

**Namespace**: `StellaOps.Scanner.Benchmark.Harness`

Executes scans using Stella Ops and competitor tools.

```csharp
public interface IHarnessRunner
{
    Task<BenchmarkRun> RunAsync(
        Corpus corpus,
        ImmutableArray<ITool> tools,
        BenchmarkOptions options,
        CancellationToken ct
    );
}

public interface ITool
{
    string Name { get; }
    string Version { get; }
    Task<ToolResult> ScanAsync(string imageRef, CancellationToken ct);
}

public record BenchmarkRun(
    string RunId,
    DateTimeOffset StartedAt,
    DateTimeOffset CompletedAt,
    ImmutableArray<ToolResult> Results
);

public record ToolResult(
    string ToolName,
    string ToolVersion,
    string ImageDigest,
    ImmutableArray<NormalizedFinding> Findings,
    TimeSpan Duration
);
```

### 3. Competitor Adapters

**Namespace**: `StellaOps.Scanner.Benchmark.Adapters`

Normalizes the output of each competitor tool into the common finding model.
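Normalization means mapping each tool's report onto one shared finding shape. The minimal sketch below does this for Trivy and Grype JSON reports.

It assumes two report layouts:

- **Trivy** (`--format json`): `Results[].Vulnerabilities[]`, with `VulnerabilityID`, `PkgName` and `InstalledVersion`.
- **Grype** (`-o json`): `matches[]`, with `vulnerability.id`, `artifact.name` and `artifact.version`.

`NormalizedFinding` and `FindingNormalizer` are hypothetical stand-ins for the real model and adapters.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical stand-in for the normalized finding model in StellaOps.Scanner.Core.
public sealed record NormalizedFinding(string CveId, string Package, string Version);

public static class FindingNormalizer
{
    // Trivy JSON: Results[].Vulnerabilities[] (VulnerabilityID, PkgName, InstalledVersion).
    public static IReadOnlyList<NormalizedFinding> FromTrivy(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var findings = new List<NormalizedFinding>();
        if (!doc.RootElement.TryGetProperty("Results", out var results) ||
            results.ValueKind != JsonValueKind.Array)
            return findings;

        foreach (var result in results.EnumerateArray())
        {
            // Targets without findings omit "Vulnerabilities" or set it to null.
            if (!result.TryGetProperty("Vulnerabilities", out var vulns) ||
                vulns.ValueKind != JsonValueKind.Array)
                continue;

            foreach (var v in vulns.EnumerateArray())
            {
                findings.Add(new NormalizedFinding(
                    v.GetProperty("VulnerabilityID").GetString()!,
                    v.GetProperty("PkgName").GetString()!,
                    v.GetProperty("InstalledVersion").GetString()!));
            }
        }
        // Records have value equality, so Distinct() drops duplicates across targets.
        return findings.Distinct().ToList();
    }

    // Grype JSON: matches[] with vulnerability.id and artifact.name / artifact.version.
    public static IReadOnlyList<NormalizedFinding> FromGrype(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var findings = new List<NormalizedFinding>();
        if (!doc.RootElement.TryGetProperty("matches", out var matches) ||
            matches.ValueKind != JsonValueKind.Array)
            return findings;

        foreach (var m in matches.EnumerateArray())
        {
            var vuln = m.GetProperty("vulnerability");
            var artifact = m.GetProperty("artifact");
            findings.Add(new NormalizedFinding(
                vuln.GetProperty("id").GetString()!,
                artifact.GetProperty("name").GetString()!,
                artifact.GetProperty("version").GetString()!));
        }
        return findings.Distinct().ToList();
    }
}
```

The sketch keys each finding on CVE, package and installed version. That key is what would let the metrics calculator compare tools against the ground truth with plain set operations. A production adapter would likely also carry severity and fixed-version data.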
```csharp
public interface ICompetitorAdapter : ITool
{
    Task<ImmutableArray<NormalizedFinding>> ParseOutputAsync(
        string output,
        CancellationToken ct
    );
}

// Implementations (bodies omitted)
public class TrivyAdapter : ICompetitorAdapter { /* ... */ }
public class GrypeAdapter : ICompetitorAdapter { /* ... */ }
public class SyftAdapter : ICompetitorAdapter { /* ... */ }
public class StellaOpsAdapter : ICompetitorAdapter { /* ... */ }
```

### 4. Metrics Calculator

**Namespace**: `StellaOps.Scanner.Benchmark.Metrics`

Calculates precision, recall, F1, and other metrics. Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is the harmonic mean of the two.

```csharp
public interface IMetricsCalculator
{
    BenchmarkMetrics Calculate(
        ToolResult result,
        GroundTruth groundTruth
    );

    ComparativeMetrics Compare(
        BenchmarkMetrics baseline,
        BenchmarkMetrics comparison
    );
}

public record BenchmarkMetrics(
    int TruePositives,
    int FalsePositives,
    int TrueNegatives,
    int FalseNegatives,
    double Precision,
    double Recall,
    double F1Score,
    ImmutableDictionary<CorpusCategory, BenchmarkMetrics> ByCategory
);

public record ComparativeMetrics(
    string BaselineTool,
    string ComparisonTool,
    double PrecisionDelta,
    double RecallDelta,
    double F1Delta,
    ImmutableArray<NormalizedFinding> UniqueFindings,
    ImmutableArray<NormalizedFinding> MissedFindings
);
```

### 5. Claims Index

**Namespace**: `StellaOps.Scanner.Benchmark.Claims`

Manages verifiable claims with links to their evidence.

```csharp
public interface IClaimsIndex
{
    Task<ImmutableArray<Claim>> GetAllClaimsAsync(CancellationToken ct);
    Task<ClaimVerification> VerifyClaimAsync(string claimId, CancellationToken ct);
    Task UpdateClaimsAsync(BenchmarkRun run, CancellationToken ct);
}

public record Claim(
    string Id,
    ClaimCategory Category,
    string Statement,
    string EvidencePath,
    ClaimStatus Status,
    DateTimeOffset LastVerified
);

public enum ClaimStatus
{
    Pending,
    Verified,
    Published,
    Disputed,
    Resolved
}

public record ClaimVerification(
    string ClaimId,
    bool IsValid,
    string? Evidence,
    string? FailureReason
);
```

---

## Data Flow

```
           ┌────────────────┐
           │ Corpus Images  │
           │  (50+ images)  │
           └───────┬────────┘
                   │
        ┌──────────┴──────────┐
        ▼                     ▼
┌────────────────┐    ┌────────────────┐
│   Stella Ops   │    │  Trivy/Grype   │
│      Scan      │    │      Scan      │
└───────┬────────┘    └───────┬────────┘
        │                     │
        ▼                     ▼
┌────────────────┐    ┌────────────────┐
│   Normalized   │    │   Normalized   │
│    Findings    │    │    Findings    │
└───────┬────────┘    └───────┬────────┘
        │                     │
        └──────────┬──────────┘
                   │
                   ▼
            ┌──────────────┐
            │ Ground Truth │
            │  Comparison  │
            └──────┬───────┘
                   │
                   ▼
            ┌──────────────┐
            │   Metrics    │
            │   (P/R/F1)   │
            └──────┬───────┘
                   │
                   ▼
            ┌──────────────┐
            │ Claims Index │
            │    Update    │
            └──────────────┘
```

---

## Corpus Structure

```
bench/competitors/
├── corpus/
│   ├── manifest.json              # Corpus metadata
│   ├── ground-truth/
│   │   ├── alpine-3.18.json       # Per-image ground truth
│   │   ├── debian-bookworm.json
│   │   └── ...
│   └── images/
│       ├── base-os/
│       ├── applications/
│       └── edge-cases/
├── results/
│   ├── 2025-12-22/
│   │   ├── stellaops.json
│   │   ├── trivy.json
│   │   ├── grype.json
│   │   └── comparison.json
│   └── latest -> 2025-12-22/
└── fixtures/
    └── adapters/                  # Test fixtures for adapters
```

---

## Ground Truth Format

```json
{
  "imageDigest": "sha256:abc123...",
  "imageName": "alpine:3.18",
  "category": "BaseOS",
  "groundTruth": {
    "truePositives": [
      {
        "cveId": "CVE-2024-1234",
        "package": "openssl",
        "version": "3.0.8",
        "notes": "Fixed in 3.0.9"
      }
    ],
    "knownFalsePositives": [
      {
        "cveId": "CVE-2024-9999",
        "package": "zlib",
        "version": "1.2.13",
        "reason": "Backported in alpine:3.18"
      }
    ],
    "expectedUnreachable": [
      {
        "cveId": "CVE-2024-5678",
        "package": "curl",
        "reason": "Vulnerable function not linked"
      }
    ]
  },
  "lastVerified": "2025-12-01T00:00:00Z",
  "verifiedBy": "security-team"
}
```

---

## CI Integration

### Workflow: `benchmark-vs-competitors.yml`

```yaml
name: Competitive Benchmark

on:
  schedule:
    - cron: '0 2 * * 0'  # Weekly, Sunday 02:00 UTC
  workflow_dispatch:
  push:
    paths:
      - 'src/Scanner/__Libraries/StellaOps.Scanner.Benchmark/**'
      - 'bench/competitors/**'

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install competitor tools
        run: |
          # Install Trivy into /usr/local/bin so it is on PATH
          curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sudo sh -s -- -b /usr/local/bin
          # Install Grype into /usr/local/bin so it is on PATH
          curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sudo sh -s -- -b /usr/local/bin

      - name: Run benchmark
        run: stella benchmark run --corpus bench/competitors/corpus --output bench/competitors/results/$(date +%Y-%m-%d)

      - name: Update claims index
        run: stella benchmark claims --output docs/claims-index.md

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: bench/competitors/results/
```

---

## CLI Commands

```bash
# Run full benchmark
stella benchmark run --corpus <corpus-path> --competitors trivy,grype,syft

# Verify a specific claim
stella benchmark verify <claim-id>

# Generate claims index
stella benchmark claims --output docs/claims-index.md

# Generate marketing battlecard
stella benchmark battlecard --output docs/marketing/battlecard.md

# Show comparison summary (format: table, json, or markdown)
stella benchmark summary --format table|json|markdown
```

---

## Testing

| Test Type | Location | Purpose |
|-----------|----------|---------|
| Unit | `StellaOps.Scanner.Benchmark.Tests/` | Adapter parsing, metrics calculation |
| Integration | `StellaOps.Scanner.Benchmark.Integration.Tests/` | Full benchmark flow |
| Golden | `bench/competitors/fixtures/` | Deterministic output verification |

---

## Security Considerations

1. **Competitor binaries**: Run in isolated containers with no network access during scans, so their vulnerability databases must be fetched beforehand.
2. **Corpus images**: Verified by digest; no external pulls during a benchmark run.
3. **Results**: Signed with DSSE before publishing.
4. **Claims**: Require PR review before any status change.

---

## Dependencies

- `StellaOps.Scanner.Core` - Normalized finding models
- `StellaOps.Attestor.Dsse` - Result signing
- Docker - Competitor tool execution
- Ground-truth corpus (maintained separately)

---

## Related Documentation

- [Claims Index](../../claims-index.md)
- [Sprint 7000.0001.0001](../../implplan/SPRINT_7000_0001_0001_competitive_benchmarking.md)
- [Testing Strategy](../../implplan/SPRINT_5100_SUMMARY.md)

---

*Document Version*: 1.0.0

*Created*: 2025-12-22