Reachability benchmarks with ground-truth datasets
Module
Bench
Status
VERIFIED
Description
Reachability benchmark suite with ground-truth datasets (Java Log4j, C# reachable/dead-code, native ELF), schema validation, and signal-level ground-truth validators.
Implementation Details
- Scanner.Analyzers Benchmark: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/ -- benchmark runner for scanner analyzers against ground-truth datasets. Key files: ScenarioRunners.cs (orchestrates benchmark scenarios against corpus data), NodeBenchMetrics.cs (captures per-node precision/recall metrics), BenchmarkConfig.cs (configures which datasets and analyzers to run).
- Baseline Infrastructure: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs (ground-truth entry model), BaselineLoader.cs (loads ground-truth datasets from fixture files).
- Reporting: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkJsonWriter.cs (JSON output), BenchmarkScenarioReport.cs (report with precision/recall/F1), PrometheusWriter.cs (metric export).
- Tests: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/BaselineLoaderTests.cs, BenchmarkJsonWriterTests.cs, BenchmarkScenarioReportTests.cs, PrometheusWriterTests.cs
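The precision/recall/F1 figures mentioned for the report can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's C# code; the names `Scores` and `score` are hypothetical, and it assumes a ground-truth dataset reduces to a set of identifiers expected to be reachable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """Confusion counts for one benchmark scenario (hypothetical model)."""
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        denom = self.true_positives + self.false_positives
        return self.true_positives / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.true_positives + self.false_negatives
        return self.true_positives / denom if denom else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall; 0.0 when both are zero.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def score(ground_truth: set[str], reported: set[str]) -> Scores:
    """Compare what an analyzer reported with the ground-truth set."""
    return Scores(
        true_positives=len(ground_truth & reported),
        false_positives=len(reported - ground_truth),
        false_negatives=len(ground_truth - reported),
    )
```

With a ground truth of `{"a", "b", "c"}` and a report of `{"a", "b", "d"}`, this yields two true positives, one false positive, and one false negative. Adding an entry to the ground truth that the analyzer does not report raises the false-negative count by one.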
E2E Test Plan
- Load a Java Log4j ground-truth dataset via BaselineLoader and run the scanner analyzer benchmark; verify precision and recall metrics are computed against the ground truth
- Load a C# reachable/dead-code ground-truth dataset and verify the benchmark correctly classifies true positives, false positives, and false negatives
- Run the benchmark with a native ELF dataset and verify that NodeBenchMetrics captures per-node accuracy
- Verify JSON report output contains precision, recall, F1 score, and per-scenario timing data
- Verify that modifying the ground-truth baseline to include additional entries causes the benchmark to report new false negatives
- Verify Prometheus metrics export includes labeled gauges for precision and recall per dataset
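For the last check, it may help to see what labeled gauges look like in the Prometheus text exposition format. The sketch below is in Python and does not reproduce PrometheusWriter.cs; the metric name `bench_precision`, the `dataset` label, and both helper functions are assumptions made for illustration. It also assumes the help text contains no backslashes or newlines.

```python
def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(metric: str, help_text: str, samples: dict[str, float]) -> str:
    """Render one gauge family with a 'dataset' label per sample."""
    lines = [f"# HELP {metric} {help_text}", f"# TYPE {metric} gauge"]
    for dataset in sorted(samples):  # sorted for stable, diffable output
        label = escape_label(dataset)
        lines.append(f'{metric}{{dataset="{label}"}} {samples[dataset]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values, not results from an actual benchmark run.
    print(render_gauges("bench_precision", "Precision per dataset",
                        {"java-log4j": 0.9, "native-elf": 1.0}), end="")
```

A test for the export could then check that each dataset appears exactly once as a `dataset="..."` label on both the precision and the recall gauge families.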
Verification
- Verified: 2026-02-11
- Method: Tier 0 source verification + Tier 1 build/test + Tier 2 behavioral CLI benchmark replay
- Build: PASS (src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj)
- Tests: PASS (src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/StellaOps.Bench.ScannerAnalyzers.Tests.csproj: 15/15)
- Tier 0 Evidence: docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier0-source-check.json
- Tier 1 Evidence: docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier1-build-check.json
- Tier 2 Evidence: docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier2-integration-check.json
Retest Notes
- Initial failure (run-001): Tier 2 CLI execution failed because analyzer IDs in benchmark config were not instantiable by ScenarioRunnerFactory.
- Fix and retest (run-002): Added analyzer factory mappings + tests, then reran Tier 0/1/2 with fresh artifacts and passing verdict.