# Reachability benchmarks with ground-truth datasets
## Module
Bench
## Status
VERIFIED
## Description
Reachability benchmark suite with ground-truth datasets (Java Log4j, C# reachable/dead-code, native ELF), schema validation, and signal-level ground-truth validators.
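A ground-truth dataset is, in essence, a list of code locations the analyzer is expected to mark reachable or unreachable. As a purely illustrative sketch of what such a fixture might look like (the field names and structure here are hypothetical; the actual schema is defined by `BaselineEntry.cs` and the fixture files):

```json
{
  "dataset": "java-log4j-2.14.1",
  "language": "java",
  "entries": [
    { "symbol": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup", "reachable": true },
    { "symbol": "org.apache.logging.log4j.core.util.SomeDeadHelper.run", "reachable": false }
  ]
}
```

Schema validation rejects fixtures that drift from the expected shape before any benchmark runs, so scoring is never performed against a malformed baseline.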
## Implementation Details
- **Scanner.Analyzers Benchmark**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/` -- benchmark runner for scanner analyzers against ground-truth datasets. Key files: `ScenarioRunners.cs` (orchestrates benchmark scenarios against corpus data), `NodeBenchMetrics.cs` (captures per-node precision/recall metrics), `BenchmarkConfig.cs` (configures which datasets and analyzers to run).
- **Baseline Infrastructure**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs` (ground-truth entry model), `BaselineLoader.cs` (loads ground-truth datasets from fixture files).
- **Reporting**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkJsonWriter.cs` (JSON output), `BenchmarkScenarioReport.cs` (report with precision/recall/F1), `PrometheusWriter.cs` (metric export).
- **Tests**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/BaselineLoaderTests.cs`, `BenchmarkJsonWriterTests.cs`, `BenchmarkScenarioReportTests.cs`, `PrometheusWriterTests.cs`.
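The reporting components score analyzer findings against the baseline entries. As an illustrative sketch only (written in Python rather than the project's C#, with hypothetical shapes; the real models live in `BaselineEntry.cs` and `BenchmarkScenarioReport.cs`), the precision/recall/F1 computation amounts to set comparisons between ground-truth and reported symbols:

```python
# Illustrative precision/recall/F1 scoring against a ground-truth
# baseline. Identifiers are treated as opaque strings; the real
# benchmark uses richer entry models.

def score(ground_truth: set[str], reported: set[str]) -> dict[str, float]:
    """Score reported findings against ground-truth identifiers."""
    tp = len(ground_truth & reported)   # true positives: correctly reported
    fp = len(reported - ground_truth)   # false positives: reported but not in baseline
    fn = len(ground_truth - reported)   # false negatives: in baseline but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

baseline = {"log4j-core:JndiLookup", "log4j-core:JndiManager"}
findings = {"log4j-core:JndiLookup", "log4j-api:Logger"}
print(score(baseline, findings))  # precision 0.5, recall 0.5, f1 0.5
```

This also shows why growing the baseline without changing analyzer output surfaces new false negatives: every added ground-truth entry the analyzer does not report lands in `fn` and pulls recall down.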
## E2E Test Plan
- [ ] Load a Java Log4j ground-truth dataset via `BaselineLoader` and run the scanner analyzer benchmark; verify precision and recall metrics are computed against the ground truth
- [ ] Load a C# reachable/dead-code ground-truth dataset and verify the benchmark correctly classifies true positives, false positives, and false negatives
- [ ] Run the benchmark with a native ELF dataset and verify the `NodeBenchMetrics` captures per-node accuracy
- [ ] Verify JSON report output contains precision, recall, F1 score, and per-scenario timing data
- [ ] Verify that modifying the ground-truth baseline to include additional entries causes the benchmark to report new false negatives
- [ ] Verify Prometheus metrics export includes labeled gauges for precision and recall per dataset
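The last check above expects labeled gauges per dataset. In the Prometheus text exposition format, such an export would look roughly like this (metric and label names here are illustrative assumptions, not taken from `PrometheusWriter.cs`):

```text
# HELP bench_precision Precision of the analyzer against the ground-truth dataset
# TYPE bench_precision gauge
bench_precision{dataset="java-log4j",analyzer="scanner"} 0.97
# HELP bench_recall Recall of the analyzer against the ground-truth dataset
# TYPE bench_recall gauge
bench_recall{dataset="java-log4j",analyzer="scanner"} 0.92
```

Labeling by dataset lets a single scrape carry scores for the Java, C#, and native ELF corpora side by side.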
## Verification
- **Verified**: 2026-02-11
- **Method**: Tier 0 source verification + Tier 1 build/test + Tier 2 behavioral CLI benchmark replay
- **Build**: PASS (src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj)
- **Tests**: PASS (src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/StellaOps.Bench.ScannerAnalyzers.Tests.csproj: 15/15)
- **Tier 0 Evidence**: docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier0-source-check.json
- **Tier 1 Evidence**: docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier1-build-check.json
- **Tier 2 Evidence**: docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier2-integration-check.json
## Retest Notes
- **Initial failure (run-001)**: Tier 2 CLI execution failed because the analyzer IDs listed in the benchmark config could not be instantiated by `ScenarioRunnerFactory`.
- **Fix and retest (run-002)**: Added the missing analyzer factory mappings plus covering tests, then reran Tiers 0, 1, and 2 with fresh artifacts and a passing verdict.