
# Benchmark Module Architecture

## Overview

The Benchmark module provides infrastructure for validating and demonstrating Stella Ops' competitive advantages through automated comparison against other container security tools (Trivy, Grype, Syft, etc.).

**Module Path**: `src/Scanner/__Libraries/StellaOps.Scanner.Benchmark/`
**Status**: PLANNED (Sprint 7000.0001.0001)

---

## Mission

Establish verifiable, reproducible benchmarks that:
1. Validate competitive claims with evidence
2. Detect regressions in accuracy or performance
3. Generate marketing-ready comparison materials
4. Provide ground-truth corpus for testing

---

## Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│                      Benchmark Module                            │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐           │
│  │   Corpus    │    │   Harness   │    │   Metrics   │           │
│  │  Manager    │───▶│   Runner    │───▶│  Calculator │           │
│  └─────────────┘    └─────────────┘    └─────────────┘           │
│         │                  │                  │                  │
│         │                  │                  │                  │
│         ▼                  ▼                  ▼                  │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐           │
│  │Ground Truth │    │ Competitor  │    │   Claims    │           │
│  │  Manifest   │    │  Adapters   │    │   Index     │           │
│  └─────────────┘    └─────────────┘    └─────────────┘           │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```

---

## Components

### 1. Corpus Manager

**Namespace**: `StellaOps.Scanner.Benchmark.Corpus`

Manages the ground-truth corpus of container images with known vulnerabilities.

```csharp
public interface ICorpusManager
{
    Task<Corpus> LoadCorpusAsync(string corpusPath, CancellationToken ct);
    Task<CorpusImage> GetImageAsync(string digest, CancellationToken ct);
    Task<GroundTruth> GetGroundTruthAsync(string digest, CancellationToken ct);
}

public record Corpus(
    string Version,
    DateTimeOffset CreatedAt,
    ImmutableArray<CorpusImage> Images
);

public record CorpusImage(
    string Digest,
    string Name,
    string Tag,
    CorpusCategory Category,
    GroundTruth GroundTruth
);

public record GroundTruth(
    ImmutableArray<string> TruePositives,
    ImmutableArray<string> KnownFalsePositives,
    ImmutableArray<string> Notes
);

public enum CorpusCategory
{
    BaseOS,            // Alpine, Debian, Ubuntu, RHEL
    ApplicationNode,   // Node.js applications
    ApplicationPython, // Python applications
    ApplicationJava,   // Java applications
    ApplicationDotNet, // .NET applications
    BackportScenario,  // Known backported fixes
    Unreachable        // Known unreachable vulnerabilities
}
```

### 2. Harness Runner

**Namespace**: `StellaOps.Scanner.Benchmark.Harness`

Executes scans using Stella Ops and competitor tools.

```csharp
public interface IHarnessRunner
{
    Task<BenchmarkRun> RunAsync(
        Corpus corpus,
        ImmutableArray<ITool> tools,
        BenchmarkOptions options,
        CancellationToken ct
    );
}

public interface ITool
{
    string Name { get; }
    string Version { get; }
    Task<ToolResult> ScanAsync(string imageRef, CancellationToken ct);
}

public record BenchmarkRun(
    string RunId,
    DateTimeOffset StartedAt,
    DateTimeOffset CompletedAt,
    ImmutableArray<ToolResult> Results
);

public record ToolResult(
    string ToolName,
    string ToolVersion,
    string ImageDigest,
    ImmutableArray<NormalizedFinding> Findings,
    TimeSpan Duration
);
```
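
As a rough illustration of the runner's job, the sketch below times each tool against each image. It is only a sketch under stated assumptions: `IScanTool`, `TimedResult`, and `HarnessSketch` are hypothetical, trimmed-down stand-ins for `ITool`, `ToolResult`, and `IHarnessRunner`, findings are reduced to plain strings, and running scans sequentially (so timings are not skewed by contention) is an assumption rather than a documented design decision.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, trimmed-down stand-in for ITool.
public interface IScanTool
{
    string Name { get; }
    Task<IReadOnlyList<string>> ScanAsync(string imageRef, CancellationToken ct);
}

// Hypothetical, trimmed-down stand-in for ToolResult.
public sealed record TimedResult(
    string ToolName,
    string ImageDigest,
    IReadOnlyList<string> Findings,
    TimeSpan Duration);

public static class HarnessSketch
{
    // Runs every tool against every image, one scan at a time (assumption),
    // and records the wall-clock duration of each scan.
    public static async Task<List<TimedResult>> RunAsync(
        IEnumerable<string> imageDigests,
        IReadOnlyList<IScanTool> tools,
        CancellationToken ct)
    {
        var results = new List<TimedResult>();
        foreach (var digest in imageDigests)
        {
            foreach (var tool in tools)
            {
                ct.ThrowIfCancellationRequested();
                var stopwatch = Stopwatch.StartNew();
                var findings = await tool.ScanAsync(digest, ct);
                stopwatch.Stop();
                results.Add(new TimedResult(tool.Name, digest, findings, stopwatch.Elapsed));
            }
        }
        return results;
    }
}
```

Results come back grouped by image, with tools in the order they were supplied, which keeps per-image comparisons straightforward.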

### 3. Competitor Adapters

**Namespace**: `StellaOps.Scanner.Benchmark.Adapters`

Normalizes output from competitor tools into the shared `NormalizedFinding` model.

```csharp
public interface ICompetitorAdapter : ITool
{
    Task<ImmutableArray<NormalizedFinding>> ParseOutputAsync(
        string output,
        CancellationToken ct
    );
}

// Implementations (members omitted for brevity)
public class TrivyAdapter : ICompetitorAdapter { }
public class GrypeAdapter : ICompetitorAdapter { }
public class SyftAdapter : ICompetitorAdapter { }
public class StellaOpsAdapter : ICompetitorAdapter { }
```
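
To make the normalization step concrete, here is a minimal sketch of flattening Trivy's JSON report (`trivy image --format json`), which nests vulnerabilities under `Results[].Vulnerabilities[]`. `FindingSketch` and `TrivyJsonSketch` are hypothetical names; the real `NormalizedFinding` model lives in `StellaOps.Scanner.Core` and its shape is not specified here. Treating missing fields as empty strings is also an assumption.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for the normalized finding model.
public sealed record FindingSketch(string CveId, string Package, string Version);

public static class TrivyJsonSketch
{
    // Flattens Results[].Vulnerabilities[] into a single list of findings.
    public static List<FindingSketch> Parse(string output)
    {
        var findings = new List<FindingSketch>();
        using var doc = JsonDocument.Parse(output);

        if (!doc.RootElement.TryGetProperty("Results", out var results) ||
            results.ValueKind != JsonValueKind.Array)
        {
            return findings; // no results section: nothing was reported
        }

        foreach (var result in results.EnumerateArray())
        {
            // Targets without vulnerabilities omit the array or set it to null.
            if (!result.TryGetProperty("Vulnerabilities", out var vulns) ||
                vulns.ValueKind != JsonValueKind.Array)
            {
                continue;
            }

            foreach (var vuln in vulns.EnumerateArray())
            {
                findings.Add(new FindingSketch(
                    Text(vuln, "VulnerabilityID"),
                    Text(vuln, "PkgName"),
                    Text(vuln, "InstalledVersion")));
            }
        }

        return findings;
    }

    // Assumption: a missing or non-string field is normalized to an empty string.
    private static string Text(JsonElement element, string name) =>
        element.TryGetProperty(name, out var value) && value.ValueKind == JsonValueKind.String
            ? value.GetString() ?? ""
            : "";
}
```

An adapter for another tool would follow the same pattern with that tool's own report structure, so the metrics stage only ever sees one finding shape.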

### 4. Metrics Calculator

**Namespace**: `StellaOps.Scanner.Benchmark.Metrics`

Calculates precision, recall, F1, and related metrics.

```csharp
public interface IMetricsCalculator
{
    BenchmarkMetrics Calculate(
        ToolResult result,
        GroundTruth groundTruth
    );

    ComparativeMetrics Compare(
        BenchmarkMetrics baseline,
        BenchmarkMetrics comparison
    );
}

public record BenchmarkMetrics(
    int TruePositives,
    int FalsePositives,
    int TrueNegatives,
    int FalseNegatives,
    double Precision,
    double Recall,
    double F1Score,
    ImmutableDictionary<string, BenchmarkMetrics> ByCategory
);

public record ComparativeMetrics(
    string BaselineTool,
    string ComparisonTool,
    double PrecisionDelta,
    double RecallDelta,
    double F1Delta,
    ImmutableArray<string> UniqueFindings,
    ImmutableArray<string> MissedFindings
);
```
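
The core arithmetic behind `Calculate` can be sketched as follows. This is an illustration, not the planned implementation: `MetricsSketch.Score` is a hypothetical helper, findings are reduced to CVE ID strings, and returning 0 when a denominator is empty is an assumption. True negatives are left out because they cannot be derived from these two sets alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not part of the planned API surface.
public static class MetricsSketch
{
    // Scores reported CVE IDs against the CVE IDs that ground truth marks as real.
    public static (int Tp, int Fp, int Fn, double Precision, double Recall, double F1) Score(
        IEnumerable<string> reported,
        IEnumerable<string> truePositives)
    {
        var reportedSet = new HashSet<string>(reported, StringComparer.OrdinalIgnoreCase);
        var truthSet = new HashSet<string>(truePositives, StringComparer.OrdinalIgnoreCase);

        int tp = reportedSet.Count(id => truthSet.Contains(id)); // reported and real
        int fp = reportedSet.Count - tp;                         // reported but not real
        int fn = truthSet.Count - tp;                            // real but not reported

        // Assumption: an empty denominator yields 0 rather than NaN.
        double precision = tp + fp == 0 ? 0.0 : (double)tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double)tp / (tp + fn);
        double f1 = precision + recall == 0.0
            ? 0.0
            : 2 * precision * recall / (precision + recall);

        return (tp, fp, fn, precision, recall, f1);
    }
}
```

The delta fields in `ComparativeMetrics` would then presumably be plain differences between two such results, for example the comparison tool's precision minus the baseline's.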

### 5. Claims Index

**Namespace**: `StellaOps.Scanner.Benchmark.Claims`

Manages verifiable claims with links to supporting evidence.

```csharp
public interface IClaimsIndex
{
    Task<ImmutableArray<Claim>> GetAllClaimsAsync(CancellationToken ct);
    Task<ClaimVerification> VerifyClaimAsync(string claimId, CancellationToken ct);
    Task UpdateClaimsAsync(BenchmarkRun run, CancellationToken ct);
}

public record Claim(
    string Id,
    ClaimCategory Category,
    string Statement,
    string EvidencePath,
    ClaimStatus Status,
    DateTimeOffset LastVerified
);

public enum ClaimStatus { Pending, Verified, Published, Disputed, Resolved }

public record ClaimVerification(
    string ClaimId,
    bool IsValid,
    string? Evidence,
    string? FailureReason
);
```

---

## Data Flow

```
┌────────────────┐
│  Corpus Images │
│  (50+ images)  │
└───────┬────────┘
        │
        ▼
┌────────────────┐     ┌────────────────┐
│ Stella Ops Scan│     │ Trivy/Grype    │
│                │     │ Scan           │
└───────┬────────┘     └───────┬────────┘
        │                      │
        ▼                      ▼
┌────────────────┐     ┌────────────────┐
│ Normalized     │     │ Normalized     │
│ Findings       │     │ Findings       │
└───────┬────────┘     └───────┬────────┘
        │                      │
        └──────────┬───────────┘
                   │
                   ▼
            ┌──────────────┐
            │ Ground Truth │
            │ Comparison   │
            └──────┬───────┘
                   │
                   ▼
            ┌──────────────┐
            │   Metrics    │
            │  (P/R/F1)    │
            └──────┬───────┘
                   │
                   ▼
            ┌──────────────┐
            │ Claims Index │
            │   Update     │
            └──────────────┘
```

---

## Corpus Structure

```
bench/competitors/
├── corpus/
│   ├── manifest.json              # Corpus metadata
│   ├── ground-truth/
│   │   ├── alpine-3.18.json       # Per-image ground truth
│   │   ├── debian-bookworm.json
│   │   └── ...
│   └── images/
│       ├── base-os/
│       ├── applications/
│       └── edge-cases/
├── results/
│   ├── 2025-12-22/
│   │   ├── stellaops.json
│   │   ├── trivy.json
│   │   ├── grype.json
│   │   └── comparison.json
│   └── latest -> 2025-12-22/
└── fixtures/
    └── adapters/                  # Test fixtures for adapters
```

---

## Ground Truth Format

```json
{
  "imageDigest": "sha256:abc123...",
  "imageName": "alpine:3.18",
  "category": "BaseOS",
  "groundTruth": {
    "truePositives": [
      {
        "cveId": "CVE-2024-1234",
        "package": "openssl",
        "version": "3.0.8",
        "notes": "Fixed in 3.0.9"
      }
    ],
    "knownFalsePositives": [
      {
        "cveId": "CVE-2024-9999",
        "package": "zlib",
        "version": "1.2.13",
        "reason": "Backported in alpine:3.18"
      }
    ],
    "expectedUnreachable": [
      {
        "cveId": "CVE-2024-5678",
        "package": "curl",
        "reason": "Vulnerable function not linked"
      }
    ]
  },
  "lastVerified": "2025-12-01T00:00:00Z",
  "verifiedBy": "security-team"
}
```
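
A ground-truth file in this format could be read with `System.Text.Json` roughly as sketched below. The DTO names (`GroundTruthEntry`, `GroundTruthLists`, `GroundTruthFile`, `GroundTruthReader`) are hypothetical and simply mirror the JSON keys above; they are not the `GroundTruth` record from the Corpus Manager section, which models entries as plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTOs that mirror the JSON document above; names are illustrative.
public sealed record GroundTruthEntry(
    string CveId,
    string Package,
    string? Version,
    string? Notes,
    string? Reason);

public sealed record GroundTruthLists(
    List<GroundTruthEntry> TruePositives,
    List<GroundTruthEntry> KnownFalsePositives,
    List<GroundTruthEntry> ExpectedUnreachable);

public sealed record GroundTruthFile(
    string ImageDigest,
    string ImageName,
    string Category,
    GroundTruthLists GroundTruth,
    DateTimeOffset LastVerified,
    string VerifiedBy);

public static class GroundTruthReader
{
    // Web defaults give camelCase, case-insensitive property matching.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static GroundTruthFile Parse(string json) =>
        JsonSerializer.Deserialize<GroundTruthFile>(json, Options)
        ?? throw new JsonException("Ground-truth document was empty.");
}
```

Optional keys such as `version`, `notes`, and `reason` deserialize to `null` when an entry omits them, which matches the way the three lists above carry different fields.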

---

## CI Integration

### Workflow: `benchmark-vs-competitors.yml`

```yaml
name: Competitive Benchmark

on:
  schedule:
    - cron: '0 2 * * 0'  # Weekly, Sunday 02:00 UTC
  workflow_dispatch:
  push:
    paths:
      - 'src/Scanner/__Libraries/StellaOps.Scanner.Benchmark/**'
      - 'bench/competitors/**'

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install competitor tools
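        run: |
          # Both install scripts default to ./bin; -b puts the binaries on PATH instead
          mkdir -p "$HOME/.local/bin"
          # Install Trivy
          curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b "$HOME/.local/bin"
          # Install Grype
          curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b "$HOME/.local/bin"
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"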

      - name: Run benchmark
        run: stella benchmark run --corpus bench/competitors/corpus --output bench/competitors/results/$(date +%Y-%m-%d)

      - name: Update claims index
        run: stella benchmark claims --output docs/claims-index.md

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: bench/competitors/results/
```

---

## CLI Commands

```bash
# Run full benchmark
stella benchmark run --corpus <path> --competitors trivy,grype,syft

# Verify a specific claim
stella benchmark verify <CLAIM_ID>

# Generate claims index
stella benchmark claims --output docs/claims-index.md

# Generate marketing battlecard
stella benchmark battlecard --output docs/marketing/battlecard.md

# Show comparison summary (format is one of: table, json, markdown)
stella benchmark summary --format table
```

---

## Testing

| Test Type | Location | Purpose |
|-----------|----------|---------|
| Unit | `StellaOps.Scanner.Benchmark.Tests/` | Adapter parsing, metrics calculation |
| Integration | `StellaOps.Scanner.Benchmark.Integration.Tests/` | Full benchmark flow |
| Golden | `bench/competitors/fixtures/` | Deterministic output verification |

---

## Security Considerations

1. **Competitor binaries**: Run in isolated containers, no network access during scan
2. **Corpus images**: Verified digests, no external pulls during benchmark
3. **Results**: Signed with DSSE before publishing
4. **Claims**: Require PR review before status change
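
The first point can be illustrated with a small sketch that assembles `docker run` arguments with networking disabled. `IsolatedScanSketch` and its parameters are hypothetical; the sketch also assumes the tool image, the image under test, and any vulnerability database are already available locally, since nothing can be downloaded once `--network none` is in effect.

```csharp
using System.Collections.Generic;

// Hypothetical helper; shows only how the isolation flags might be assembled.
public static class IsolatedScanSketch
{
    // Builds the argument list for `docker run` with network access disabled.
    public static IReadOnlyList<string> BuildDockerArgs(string toolImage, IEnumerable<string> toolArgs)
    {
        var args = new List<string>
        {
            "run",
            "--rm",              // remove the container once the scan finishes
            "--network", "none", // no network access during the scan
            toolImage,
        };
        args.AddRange(toolArgs); // tool-specific arguments follow the image name
        return args;
    }
}
```

Passing the list through `ProcessStartInfo.ArgumentList` rather than a single concatenated string avoids quoting problems when image references or paths contain spaces.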

---

## Dependencies

- `StellaOps.Scanner.Core` - Normalized finding models
- `StellaOps.Attestor.Dsse` - Result signing
- Docker - Competitor tool execution
- Ground-truth corpus (maintained separately)

---

## Related Documentation

- [Claims Index](../../claims-index.md)
- [Sprint 7000.0001.0001](../../implplan/SPRINT_7000_0001_0001_competitive_benchmarking.md)
- [Testing Strategy](../../implplan/SPRINT_5100_0000_0000_epic_summary.md)

---

*Document Version*: 1.0.0
*Created*: 2025-12-22