Reachability benchmarks with ground-truth datasets
Module
Bench
Status
IMPLEMENTED
Description
Reachability benchmark suite with ground-truth datasets (Java Log4j, C# reachable/dead-code, native ELF), schema validation, and signal-level ground-truth validators.
Implementation Details
- Scanner.Analyzers Benchmark: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/ -- benchmark runner for scanner analyzers against ground-truth datasets. Key files: ScenarioRunners.cs (orchestrates benchmark scenarios against corpus data), NodeBenchMetrics.cs (captures per-node precision/recall metrics), BenchmarkConfig.cs (configures which datasets and analyzers to run).
- Baseline Infrastructure: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs (ground-truth entry model), BaselineLoader.cs (loads ground-truth datasets from fixture files).
- Reporting: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkJsonWriter.cs (JSON output), BenchmarkScenarioReport.cs (report with precision/recall/F1), PrometheusWriter.cs (metric export).
- Tests: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/BaselineLoaderTests.cs, BenchmarkJsonWriterTests.cs, BenchmarkScenarioReportTests.cs, PrometheusWriterTests.cs
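The precision/recall/F1 figures mentioned above follow the standard definitions over true positives, false positives, and false negatives. The sketch below is a language-neutral illustration in Python, not the C# code in BenchmarkScenarioReport.cs; the names `Scores` and `score`, and the choice to represent findings as sets of string identifiers, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """Confusion counts for one scenario (hypothetical model, not the repo's type)."""
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        # Share of reported findings that the ground truth confirms.
        denom = self.true_positives + self.false_positives
        return self.true_positives / denom if denom else 0.0

    @property
    def recall(self) -> float:
        # Share of ground-truth entries that the analyzer reported.
        denom = self.true_positives + self.false_negatives
        return self.true_positives / denom if denom else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def score(reported: set[str], ground_truth: set[str]) -> Scores:
    """Compare analyzer output with a ground-truth baseline."""
    return Scores(
        true_positives=len(reported & ground_truth),
        false_positives=len(reported - ground_truth),
        false_negatives=len(ground_truth - reported),
    )


# Illustrative identifiers only; real baselines come from fixture files.
reported = {"sym.a", "sym.b", "sym.c"}
baseline = {"sym.a", "sym.b", "sym.d"}
before = score(reported, baseline)

# Adding an entry to the baseline that the analyzer does not report
# raises the false-negative count and lowers recall.
after = score(reported, baseline | {"sym.e"})
```

With these inputs, `before` has two true positives, one false positive, and one false negative, while `after` has two false negatives. Returning 0.0 when a denominator is zero is a convention chosen for the sketch; the actual report may handle empty scenarios differently.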
E2E Test Plan
- Load a Java Log4j ground-truth dataset via BaselineLoader and run the scanner analyzer benchmark; verify precision and recall metrics are computed against the ground truth
- Load a C# reachable/dead-code ground-truth dataset and verify the benchmark correctly classifies true positives, false positives, and false negatives
- Run the benchmark with a native ELF dataset and verify that NodeBenchMetrics captures per-node accuracy
- Verify JSON report output contains precision, recall, F1 score, and per-scenario timing data
- Verify that modifying the ground-truth baseline to include additional entries causes the benchmark to report new false negatives
- Verify Prometheus metrics export includes labeled gauges for precision and recall per dataset
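For the last check, it may help to see what labeled gauges look like in the Prometheus text exposition format. The following Python sketch is illustrative only: the metric names `bench_reachability_precision` and `bench_reachability_recall`, the `dataset` label, and the `render_gauges` helper are assumptions, not the names emitted by PrometheusWriter.cs.

```python
def escape_label(value: str) -> str:
    """Escape a label value as the Prometheus text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(results: dict[str, dict[str, float]]) -> str:
    """Render one gauge per metric, with one sample per dataset label."""
    lines = []
    for metric in ("precision", "recall"):
        name = f"bench_reachability_{metric}"  # hypothetical metric name
        lines.append(f"# TYPE {name} gauge")
        for dataset in sorted(results):
            label = escape_label(dataset)
            lines.append(f'{name}{{dataset="{label}"}} {results[dataset][metric]}')
    return "\n".join(lines) + "\n"


# Illustrative values, not measured results.
text = render_gauges({"java-log4j": {"precision": 0.9, "recall": 0.75}})
```

An E2E assertion could then look for a line such as `bench_reachability_precision{dataset="java-log4j"} 0.9` in the exported text, adjusted to whatever metric and label names the real writer uses.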