# Benchmark Module Architecture

## Overview

The Benchmark module provides infrastructure for validating and demonstrating Stella Ops' competitive advantages through automated comparison against other container security tools such as Trivy, Grype, and Syft.

**Module Path**: `src/Scanner/__Libraries/StellaOps.Scanner.Benchmark/`
**Status**: PLANNED (Sprint 7000.0001.0001)

---

## Mission

Establish verifiable, reproducible benchmarks that:

1. Validate competitive claims with evidence
2. Detect regressions in accuracy or performance
3. Generate marketing-ready comparison materials
4. Provide a ground-truth corpus for testing

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        Benchmark Module                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   Corpus    │    │   Harness   │    │   Metrics   │          │
│  │   Manager   │───▶│   Runner    │───▶│ Calculator  │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
│         │                  │                  │                 │
│         │                  │                  │                 │
│         ▼                  ▼                  ▼                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │Ground Truth │    │ Competitor  │    │   Claims    │          │
│  │  Manifest   │    │  Adapters   │    │    Index    │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

---

## Components

### 1. Corpus Manager

**Namespace**: `StellaOps.Scanner.Benchmark.Corpus`

Manages the ground-truth corpus of container images with known vulnerabilities.

```csharp
using System.Collections.Immutable;

public interface ICorpusManager
{
    Task<Corpus> LoadCorpusAsync(string corpusPath, CancellationToken ct);
    Task<CorpusImage> GetImageAsync(string digest, CancellationToken ct);
    Task<GroundTruth> GetGroundTruthAsync(string digest, CancellationToken ct);
}

public record Corpus(
    string Version,
    DateTimeOffset CreatedAt,
    ImmutableArray<CorpusImage> Images
);

public record CorpusImage(
    string Digest,
    string Name,
    string Tag,
    CorpusCategory Category,
    GroundTruth GroundTruth
);

public record GroundTruth(
    ImmutableArray<string> TruePositives,
    ImmutableArray<string> KnownFalsePositives,
    ImmutableArray<string> Notes
);

public enum CorpusCategory
{
    BaseOS,            // Alpine, Debian, Ubuntu, RHEL
    ApplicationNode,   // Node.js applications
    ApplicationPython, // Python applications
    ApplicationJava,   // Java applications
    ApplicationDotNet, // .NET applications
    BackportScenario,  // Known backported fixes
    Unreachable        // Known unreachable vulns
}
```

### 2. Harness Runner

**Namespace**: `StellaOps.Scanner.Benchmark.Harness`

Executes scans using Stella Ops and competitor tools.

```csharp
using System.Collections.Immutable;

public interface IHarnessRunner
{
    Task<BenchmarkRun> RunAsync(
        Corpus corpus,
        ImmutableArray<ITool> tools,
        BenchmarkOptions options,
        CancellationToken ct
    );
}

public interface ITool
{
    string Name { get; }
    string Version { get; }
    Task<ToolResult> ScanAsync(string imageRef, CancellationToken ct);
}

public record BenchmarkRun(
    string RunId,
    DateTimeOffset StartedAt,
    DateTimeOffset CompletedAt,
    ImmutableArray<ToolResult> Results
);

public record ToolResult(
    string ToolName,
    string ToolVersion,
    string ImageDigest,
    ImmutableArray<NormalizedFinding> Findings,
    TimeSpan Duration
);
```
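
The runner's core loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned implementation: `IScanTool`, `TimedResult`, and `HarnessSketch` are hypothetical, simplified stand-ins for `ITool`, `ToolResult`, and `IHarnessRunner`, with findings reduced to plain strings. It runs each tool against each image sequentially and records wall-clock duration per scan.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for ITool (findings reduced to strings).
public interface IScanTool
{
    string Name { get; }
    Task<IReadOnlyList<string>> ScanAsync(string imageRef, CancellationToken ct);
}

// Hypothetical, simplified stand-in for ToolResult.
public sealed record TimedResult(
    string ToolName,
    string ImageRef,
    IReadOnlyList<string> Findings,
    TimeSpan Duration);

public static class HarnessSketch
{
    // Scans run one at a time so that timings are not skewed by contention
    // between tools (an assumption; the design does not specify scheduling).
    public static async Task<IReadOnlyList<TimedResult>> RunAsync(
        IEnumerable<string> imageRefs,
        IReadOnlyList<IScanTool> tools,
        CancellationToken ct)
    {
        var results = new List<TimedResult>();
        foreach (var image in imageRefs)
        {
            foreach (var tool in tools)
            {
                ct.ThrowIfCancellationRequested();
                var stopwatch = Stopwatch.StartNew();
                var findings = await tool.ScanAsync(image, ct);
                stopwatch.Stop();
                results.Add(new TimedResult(tool.Name, image, findings, stopwatch.Elapsed));
            }
        }
        return results;
    }
}
```

Results come back in image-major order (all tools for the first image, then all tools for the second), which keeps per-image comparisons adjacent.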

### 3. Competitor Adapters

**Namespace**: `StellaOps.Scanner.Benchmark.Adapters`

Normalizes output from competitor tools into a common finding model.

```csharp
using System.Collections.Immutable;

public interface ICompetitorAdapter : ITool
{
    Task<ImmutableArray<NormalizedFinding>> ParseOutputAsync(
        string output,
        CancellationToken ct
    );
}

// Implementations (members omitted for brevity)
public class TrivyAdapter : ICompetitorAdapter { /* ... */ }
public class GrypeAdapter : ICompetitorAdapter { /* ... */ }
public class SyftAdapter : ICompetitorAdapter { /* ... */ }
public class StellaOpsAdapter : ICompetitorAdapter { /* ... */ }
```
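
As an illustration of what an adapter's parsing step might look like, the sketch below reads Trivy's JSON report (`trivy image --format json`) and flattens it into findings. The `NormalizedFinding` record here is a hypothetical stand-in, since the real model lives in `StellaOps.Scanner.Core` and is not shown in this document. `TrivyOutputParser` is likewise an assumed name. The field names `Results`, `Vulnerabilities`, `VulnerabilityID`, `PkgName`, and `InstalledVersion` come from Trivy's report format.

```csharp
using System.Collections.Immutable;
using System.Text.Json;

// Hypothetical stand-in for the NormalizedFinding model in StellaOps.Scanner.Core.
public sealed record NormalizedFinding(string CveId, string Package, string Version);

public static class TrivyOutputParser
{
    public static ImmutableArray<NormalizedFinding> Parse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var findings = ImmutableArray.CreateBuilder<NormalizedFinding>();

        // A report without a "Results" array yields no findings.
        if (doc.RootElement.ValueKind != JsonValueKind.Object ||
            !doc.RootElement.TryGetProperty("Results", out var results) ||
            results.ValueKind != JsonValueKind.Array)
        {
            return findings.ToImmutable();
        }

        foreach (var result in results.EnumerateArray())
        {
            // "Vulnerabilities" is omitted or null for targets with no findings.
            if (!result.TryGetProperty("Vulnerabilities", out var vulns) ||
                vulns.ValueKind != JsonValueKind.Array)
            {
                continue;
            }

            foreach (var vuln in vulns.EnumerateArray())
            {
                findings.Add(new NormalizedFinding(
                    Text(vuln, "VulnerabilityID"),
                    Text(vuln, "PkgName"),
                    Text(vuln, "InstalledVersion")));
            }
        }

        return findings.ToImmutable();
    }

    // Returns the named string property, or an empty string when it is missing.
    private static string Text(JsonElement element, string name) =>
        element.TryGetProperty(name, out var value) && value.ValueKind == JsonValueKind.String
            ? value.GetString()!
            : "";
}
```

Keeping the parser a pure function of the tool's output makes it straightforward to test against the recorded fixtures under `bench/competitors/fixtures/adapters/`.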

### 4. Metrics Calculator

**Namespace**: `StellaOps.Scanner.Benchmark.Metrics`

Calculates precision, recall, F1, and related metrics for each tool against the ground truth.

```csharp
using System.Collections.Immutable;

public interface IMetricsCalculator
{
    BenchmarkMetrics Calculate(
        ToolResult result,
        GroundTruth groundTruth
    );

    ComparativeMetrics Compare(
        BenchmarkMetrics baseline,
        BenchmarkMetrics comparison
    );
}

public record BenchmarkMetrics(
    int TruePositives,
    int FalsePositives,
    int TrueNegatives,
    int FalseNegatives,
    double Precision,
    double Recall,
    double F1Score,
    ImmutableDictionary<string, BenchmarkMetrics> ByCategory
);

public record ComparativeMetrics(
    string BaselineTool,
    string ComparisonTool,
    double PrecisionDelta,
    double RecallDelta,
    double F1Delta,
    ImmutableArray<string> UniqueFindings,
    ImmutableArray<string> MissedFindings
);
```
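
The core arithmetic behind `Calculate` can be sketched as a set comparison. The code below is an illustration only: `MetricsSketch` and its `Counts` record are hypothetical names, findings are reduced to string identifiers such as a CVE ID, and two counting rules are assumptions that the design does not state. Any reported finding outside the true-positive set counts as a false positive, and a true negative is a known false positive that the tool did not report.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not the module's IMetricsCalculator implementation.
public static class MetricsSketch
{
    public sealed record Counts(
        int TruePositives, int FalsePositives, int TrueNegatives, int FalseNegatives,
        double Precision, double Recall, double F1Score);

    public static Counts Calculate(
        IEnumerable<string> reported,
        IEnumerable<string> truePositives,
        IEnumerable<string> knownFalsePositives)
    {
        var found = new HashSet<string>(reported, StringComparer.OrdinalIgnoreCase);
        var truth = new HashSet<string>(truePositives, StringComparer.OrdinalIgnoreCase);

        int tp = found.Count(id => truth.Contains(id));
        int fp = found.Count - tp; // assumption: anything outside the truth set is a false positive
        int fn = truth.Count - tp;

        // Assumption: a true negative is a known false positive the tool left out.
        int tn = knownFalsePositives
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Count(id => !found.Contains(id));

        // Each ratio is defined as 0 when its denominator is 0, which avoids NaN.
        double precision = tp + fp == 0 ? 0.0 : (double)tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double)tp / (tp + fn);
        double f1 = precision + recall == 0
            ? 0.0
            : 2 * precision * recall / (precision + recall);

        return new Counts(tp, fp, tn, fn, precision, recall, f1);
    }
}
```

For example, with reported findings `{A, B, X}`, true positives `{A, B, C}`, and known false positives `{X, Y}`, the sketch yields TP = 2, FP = 1, FN = 1, TN = 1, and precision, recall, and F1 all equal to 2/3.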

### 5. Claims Index

**Namespace**: `StellaOps.Scanner.Benchmark.Claims`

Manages verifiable claims with evidence links.

```csharp
using System.Collections.Immutable;

public interface IClaimsIndex
{
    Task<ImmutableArray<Claim>> GetAllClaimsAsync(CancellationToken ct);
    Task<ClaimVerification> VerifyClaimAsync(string claimId, CancellationToken ct);
    Task UpdateClaimsAsync(BenchmarkRun run, CancellationToken ct);
}

// ClaimCategory is referenced here but not yet specified in this document.
public record Claim(
    string Id,
    ClaimCategory Category,
    string Statement,
    string EvidencePath,
    ClaimStatus Status,
    DateTimeOffset LastVerified
);

public enum ClaimStatus { Pending, Verified, Published, Disputed, Resolved }

public record ClaimVerification(
    string ClaimId,
    bool IsValid,
    string? Evidence,
    string? FailureReason
);
```

---

## Data Flow

```
┌────────────────┐
│ Corpus Images  │
│ (50+ images)   │
└───────┬────────┘
        │
        ▼
┌────────────────┐     ┌────────────────┐
│ Stella Ops Scan│     │ Trivy/Grype    │
│                │     │ Scan           │
└───────┬────────┘     └───────┬────────┘
        │                      │
        ▼                      ▼
┌────────────────┐     ┌────────────────┐
│ Normalized     │     │ Normalized     │
│ Findings       │     │ Findings       │
└───────┬────────┘     └───────┬────────┘
        │                      │
        └──────────┬───────────┘
                   │
                   ▼
            ┌──────────────┐
            │ Ground Truth │
            │ Comparison   │
            └──────┬───────┘
                   │
                   ▼
            ┌──────────────┐
            │ Metrics      │
            │ (P/R/F1)     │
            └──────┬───────┘
                   │
                   ▼
            ┌──────────────┐
            │ Claims Index │
            │ Update       │
            └──────────────┘
```

---

## Corpus Structure

```
bench/competitors/
├── corpus/
│   ├── manifest.json              # Corpus metadata
│   ├── ground-truth/
│   │   ├── alpine-3.18.json       # Per-image ground truth
│   │   ├── debian-bookworm.json
│   │   └── ...
│   └── images/
│       ├── base-os/
│       ├── applications/
│       └── edge-cases/
├── results/
│   ├── 2025-12-22/
│   │   ├── stellaops.json
│   │   ├── trivy.json
│   │   ├── grype.json
│   │   └── comparison.json
│   └── latest -> 2025-12-22/
└── fixtures/
    └── adapters/                  # Test fixtures for adapters
```

---

## Ground Truth Format

```json
{
  "imageDigest": "sha256:abc123...",
  "imageName": "alpine:3.18",
  "category": "BaseOS",
  "groundTruth": {
    "truePositives": [
      {
        "cveId": "CVE-2024-1234",
        "package": "openssl",
        "version": "3.0.8",
        "notes": "Fixed in 3.0.9"
      }
    ],
    "knownFalsePositives": [
      {
        "cveId": "CVE-2024-9999",
        "package": "zlib",
        "version": "1.2.13",
        "reason": "Backported in alpine:3.18"
      }
    ],
    "expectedUnreachable": [
      {
        "cveId": "CVE-2024-5678",
        "package": "curl",
        "reason": "Vulnerable function not linked"
      }
    ]
  },
  "lastVerified": "2025-12-01T00:00:00Z",
  "verifiedBy": "security-team"
}
```
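
A file in this shape could be read with `System.Text.Json` roughly as follows. This is a sketch under assumptions: the record and class names (`GroundTruthEntry`, `GroundTruthSection`, `GroundTruthFile`, `GroundTruthLoader`) are hypothetical DTOs that mirror the JSON above. They are deliberately separate from the `GroundTruth` record in the Corpus Manager section, which holds string arrays and has no `expectedUnreachable` member, so a mapping step between the two would still be needed. Optional fields (`version`, `notes`, `reason`) are modelled as nullable because not every entry carries them.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTOs mirroring the JSON layout; not the module's GroundTruth record.
public sealed record GroundTruthEntry(
    string CveId,
    string Package,
    string? Version,
    string? Notes,
    string? Reason);

public sealed record GroundTruthSection(
    List<GroundTruthEntry> TruePositives,
    List<GroundTruthEntry> KnownFalsePositives,
    List<GroundTruthEntry> ExpectedUnreachable);

public sealed record GroundTruthFile(
    string ImageDigest,
    string ImageName,
    string Category,
    GroundTruthSection GroundTruth,
    DateTimeOffset LastVerified,
    string VerifiedBy);

public static class GroundTruthLoader
{
    // JsonSerializerDefaults.Web enables camelCase, case-insensitive property matching.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static GroundTruthFile Parse(string json) =>
        JsonSerializer.Deserialize<GroundTruthFile>(json, Options)
        ?? throw new JsonException("Ground-truth document was null.");
}
```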

---

## CI Integration

### Workflow: `benchmark-vs-competitors.yml`

```yaml
name: Competitive Benchmark

on:
  schedule:
    - cron: '0 2 * * 0'  # Weekly, Sunday 02:00 UTC
  workflow_dispatch:
  push:
    paths:
      - 'src/Scanner/__Libraries/StellaOps.Scanner.Benchmark/**'
      - 'bench/competitors/**'

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install competitor tools
        run: |
          # Install Trivy
          curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh
          # Install Grype
          curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh

      - name: Run benchmark
        run: stella benchmark run --corpus bench/competitors/corpus --output bench/competitors/results/$(date +%Y-%m-%d)

      - name: Update claims index
        run: stella benchmark claims --output docs/claims-index.md

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: bench/competitors/results/
```

---

## CLI Commands

```bash
# Run full benchmark
stella benchmark run --corpus <path> --competitors trivy,grype,syft

# Verify a specific claim
stella benchmark verify <CLAIM_ID>

# Generate claims index
stella benchmark claims --output docs/claims-index.md

# Generate marketing battlecard
stella benchmark battlecard --output docs/marketing/battlecard.md

# Show comparison summary (choose one format)
stella benchmark summary --format <table|json|markdown>
```

---

## Testing

| Test Type | Location | Purpose |
|-----------|----------|---------|
| Unit | `StellaOps.Scanner.Benchmark.Tests/` | Adapter parsing, metrics calculation |
| Integration | `StellaOps.Scanner.Benchmark.Integration.Tests/` | Full benchmark flow |
| Golden | `bench/competitors/fixtures/` | Deterministic output verification |

---

## Security Considerations

1. **Competitor binaries**: Run in isolated containers with no network access during scans
2. **Corpus images**: Pinned by verified digest, with no external pulls during a benchmark run
3. **Results**: Signed with DSSE before publishing
4. **Claims**: Require PR review before any status change
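
The digest check in item 2 can be illustrated with a short sketch. For OCI and Docker images, a `sha256:` digest is the SHA-256 hash of the image manifest bytes, so verification means hashing those bytes and comparing the result with the pinned value. `DigestCheck` is a hypothetical helper, and how the module obtains the manifest bytes (local registry, OCI layout directory, or otherwise) is not specified in this document.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper; assumes the caller already holds the manifest bytes.
public static class DigestCheck
{
    // Returns true when the SHA-256 of `content` matches a "sha256:<hex>" digest string.
    public static bool Matches(ReadOnlySpan<byte> content, string expectedDigest)
    {
        const string prefix = "sha256:";
        if (!expectedDigest.StartsWith(prefix, StringComparison.Ordinal))
        {
            return false; // only SHA-256 digests are handled in this sketch
        }

        string actualHex = Convert.ToHexString(SHA256.HashData(content));
        string expectedHex = expectedDigest.Substring(prefix.Length);

        // Hex digits are compared case-insensitively; registries emit lowercase.
        return string.Equals(actualHex, expectedHex, StringComparison.OrdinalIgnoreCase);
    }
}
```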

---

## Dependencies

- `StellaOps.Scanner.Core` - Normalized finding models
- `StellaOps.Attestor.Dsse` - Result signing
- Docker - Competitor tool execution
- Ground-truth corpus (maintained separately)

---

## Related Documentation

- [Claims Index](../../claims-index.md)
- [Sprint 7000.0001.0001](../../implplan/SPRINT_7000_0001_0001_competitive_benchmarking.md)
- [Testing Strategy](../../implplan/SPRINT_5100_0000_0000_epic_summary.md)

---

*Document Version*: 1.0.0
*Created*: 2025-12-22