git.stella-ops.org/docs/modules/benchmark/architecture.md

# Benchmark Module Architecture

## Overview

The Benchmark module provides infrastructure for validating and demonstrating Stella Ops' competitive advantages through automated comparison against other container security scanners (Trivy, Grype, Syft, etc.).

**Module Path**: `src/Scanner/__Libraries/StellaOps.Scanner.Benchmark/`
**Status**: PLANNED (Sprint 7000.0001.0001)

---

## Mission

Establish verifiable, reproducible benchmarks that:
1. Validate competitive claims with evidence
2. Detect regressions in accuracy or performance
3. Generate marketing-ready comparison materials
4. Provide ground-truth corpus for testing

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      Benchmark Module                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
│  │   Corpus    │    │   Harness   │    │   Metrics   │         │
│  │  Manager    │───▶│   Runner    │───▶│  Calculator │         │
│  └─────────────┘    └─────────────┘    └─────────────┘         │
│         │                  │                  │                 │
│         │                  │                  │                 │
│         ▼                  ▼                  ▼                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
│  │Ground Truth │    │ Competitor  │    │   Claims    │         │
│  │  Manifest   │    │  Adapters   │    │   Index     │         │
│  └─────────────┘    └─────────────┘    └─────────────┘         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

---

## Components

### 1. Corpus Manager

**Namespace**: `StellaOps.Scanner.Benchmark.Corpus`

Manages the ground-truth corpus of container images with known vulnerabilities.

```csharp
public interface ICorpusManager
{
    Task<Corpus> LoadCorpusAsync(string corpusPath, CancellationToken ct);
    Task<CorpusImage> GetImageAsync(string digest, CancellationToken ct);
    Task<GroundTruth> GetGroundTruthAsync(string digest, CancellationToken ct);
}

public record Corpus(
    string Version,
    DateTimeOffset CreatedAt,
    ImmutableArray<CorpusImage> Images
);

public record CorpusImage(
    string Digest,
    string Name,
    string Tag,
    CorpusCategory Category,
    GroundTruth GroundTruth
);

public record GroundTruth(
    ImmutableArray<string> TruePositives,
    ImmutableArray<string> KnownFalsePositives,
    ImmutableArray<string> Notes
);

public enum CorpusCategory
{
    BaseOS,           // Alpine, Debian, Ubuntu, RHEL
    ApplicationNode,  // Node.js applications
    ApplicationPython,// Python applications
    ApplicationJava,  // Java applications
    ApplicationDotNet,// .NET applications
    BackportScenario, // Known backported fixes
    Unreachable       // Known unreachable vulns
}
```

### 2. Harness Runner

**Namespace**: `StellaOps.Scanner.Benchmark.Harness`

Executes scans using Stella Ops and competitor tools.

```csharp
public interface IHarnessRunner
{
    Task<BenchmarkRun> RunAsync(
        Corpus corpus,
        ImmutableArray<ITool> tools,
        BenchmarkOptions options,
        CancellationToken ct
    );
}

public interface ITool
{
    string Name { get; }
    string Version { get; }
    Task<ToolResult> ScanAsync(string imageRef, CancellationToken ct);
}

public record BenchmarkRun(
    string RunId,
    DateTimeOffset StartedAt,
    DateTimeOffset CompletedAt,
    ImmutableArray<ToolResult> Results
);

public record ToolResult(
    string ToolName,
    string ToolVersion,
    string ImageDigest,
    ImmutableArray<NormalizedFinding> Findings,
    TimeSpan Duration
);
```

### 3. Competitor Adapters

**Namespace**: `StellaOps.Scanner.Benchmark.Adapters`

Normalize output from competitor tools.

```csharp
public interface ICompetitorAdapter : ITool
{
    Task<ImmutableArray<NormalizedFinding>> ParseOutputAsync(
        string output,
        CancellationToken ct
    );
}

// Implementations
public class TrivyAdapter : ICompetitorAdapter { }
public class GrypeAdapter : ICompetitorAdapter { }
public class SyftAdapter : ICompetitorAdapter { }
public class StellaOpsAdapter : ICompetitorAdapter { }
```

### 4. Metrics Calculator

**Namespace**: `StellaOps.Scanner.Benchmark.Metrics`

Calculate precision, recall, F1, and other metrics.

```csharp
public interface IMetricsCalculator
{
    BenchmarkMetrics Calculate(
        ToolResult result,
        GroundTruth groundTruth
    );

    ComparativeMetrics Compare(
        BenchmarkMetrics baseline,
        BenchmarkMetrics comparison
    );
}

public record BenchmarkMetrics(
    int TruePositives,
    int FalsePositives,
    int TrueNegatives,
    int FalseNegatives,
    double Precision,
    double Recall,
    double F1Score,
    ImmutableDictionary<string, BenchmarkMetrics> ByCategory
);

public record ComparativeMetrics(
    string BaselineTool,
    string ComparisonTool,
    double PrecisionDelta,
    double RecallDelta,
    double F1Delta,
    ImmutableArray<string> UniqueFindings,
    ImmutableArray<string> MissedFindings
);
```

### 5. Claims Index

**Namespace**: `StellaOps.Scanner.Benchmark.Claims`

Manage verifiable claims with evidence links.

```csharp
public interface IClaimsIndex
{
    Task<ImmutableArray<Claim>> GetAllClaimsAsync(CancellationToken ct);
    Task<ClaimVerification> VerifyClaimAsync(string claimId, CancellationToken ct);
    Task UpdateClaimsAsync(BenchmarkRun run, CancellationToken ct);
}

public record Claim(
    string Id,
    ClaimCategory Category,
    string Statement,
    string EvidencePath,
    ClaimStatus Status,
    DateTimeOffset LastVerified
);

public enum ClaimStatus { Pending, Verified, Published, Disputed, Resolved }

public record ClaimVerification(
    string ClaimId,
    bool IsValid,
    string? Evidence,
    string? FailureReason
);
```

---

## Data Flow

```
┌────────────────┐
│  Corpus Images │
│  (50+ images)  │
└───────┬────────┘
        │
        ▼
┌────────────────┐     ┌────────────────┐
│ Stella Ops Scan│     │ Trivy/Grype    │
│                │     │ Scan           │
└───────┬────────┘     └───────┬────────┘
        │                      │
        ▼                      ▼
┌────────────────┐     ┌────────────────┐
│ Normalized     │     │ Normalized     │
│ Findings       │     │ Findings       │
└───────┬────────┘     └───────┬────────┘
        │                      │
        └──────────┬───────────┘
                   │
                   ▼
           ┌──────────────┐
           │ Ground Truth │
           │ Comparison   │
           └──────┬───────┘
                  │
                  ▼
           ┌──────────────┐
           │   Metrics    │
           │  (P/R/F1)    │
           └──────┬───────┘
                  │
                  ▼
           ┌──────────────┐
           │ Claims Index │
           │   Update     │
           └──────────────┘
```

---

## Corpus Structure

```
bench/competitors/
├── corpus/
│   ├── manifest.json              # Corpus metadata
│   ├── ground-truth/
│   │   ├── alpine-3.18.json       # Per-image ground truth
│   │   ├── debian-bookworm.json
│   │   └── ...
│   └── images/
│       ├── base-os/
│       ├── applications/
│       └── edge-cases/
├── results/
│   ├── 2025-12-22/
│   │   ├── stellaops.json
│   │   ├── trivy.json
│   │   ├── grype.json
│   │   └── comparison.json
│   └── latest -> 2025-12-22/
└── fixtures/
    └── adapters/                  # Test fixtures for adapters
```

---

## Ground Truth Format

```json
{
  "imageDigest": "sha256:abc123...",
  "imageName": "alpine:3.18",
  "category": "BaseOS",
  "groundTruth": {
    "truePositives": [
      {
        "cveId": "CVE-2024-1234",
        "package": "openssl",
        "version": "3.0.8",
        "notes": "Fixed in 3.0.9"
      }
    ],
    "knownFalsePositives": [
      {
        "cveId": "CVE-2024-9999",
        "package": "zlib",
        "version": "1.2.13",
        "reason": "Backported in alpine:3.18"
      }
    ],
    "expectedUnreachable": [
      {
        "cveId": "CVE-2024-5678",
        "package": "curl",
        "reason": "Vulnerable function not linked"
      }
    ]
  },
  "lastVerified": "2025-12-01T00:00:00Z",
  "verifiedBy": "security-team"
}
```

---

## CI Integration

### Workflow: `benchmark-vs-competitors.yml`

```yaml
name: Competitive Benchmark

on:
  schedule:
    - cron: '0 2 * * 0'  # Weekly Sunday 2 AM
  workflow_dispatch:
  push:
    paths:
      - 'src/Scanner/__Libraries/StellaOps.Scanner.Benchmark/**'
      - 'bench/competitors/**'

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install competitor tools
        run: |
          # Install Trivy
          curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh
          # Install Grype
          curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh

      - name: Run benchmark
        run: stella benchmark run --corpus bench/competitors/corpus --output bench/competitors/results/$(date +%Y-%m-%d)

      - name: Update claims index
        run: stella benchmark claims --output docs/claims-index.md

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: bench/competitors/results/
```

---

## CLI Commands

```bash
# Run full benchmark
stella benchmark run --corpus <path> --competitors trivy,grype,syft

# Verify a specific claim
stella benchmark verify <CLAIM_ID>

# Generate claims index
stella benchmark claims --output docs/claims-index.md

# Generate marketing battlecard
stella benchmark battlecard --output docs/marketing/battlecard.md

# Show comparison summary
stella benchmark summary --format table|json|markdown
```

---

## Testing

| Test Type | Location | Purpose |
|-----------|----------|---------|
| Unit | `StellaOps.Scanner.Benchmark.Tests/` | Adapter parsing, metrics calculation |
| Integration | `StellaOps.Scanner.Benchmark.Integration.Tests/` | Full benchmark flow |
| Golden | `bench/competitors/fixtures/` | Deterministic output verification |

---

## Security Considerations

1. **Competitor binaries**: Run in isolated containers, no network access during scan
2. **Corpus images**: Verified digests, no external pulls during benchmark
3. **Results**: Signed with DSSE before publishing
4. **Claims**: Require PR review before status change

---

## Dependencies

- `StellaOps.Scanner.Core` - Normalized finding models
- `StellaOps.Attestor.Dsse` - Result signing
- Docker - Competitor tool execution
- Ground-truth corpus (maintained separately)

---

## Related Documentation

- [Claims Index](../../claims-index.md)
- [Sprint 7000.0001.0001](../../implplan/SPRINT_7000_0001_0001_competitive_benchmarking.md)
- [Testing Strategy](../../implplan/SPRINT_5100_SUMMARY.md)

---

*Document Version*: 1.0.0
*Created*: 2025-12-22