Reachability benchmarks with ground-truth datasets

Module

Bench

Status

IMPLEMENTED

Description

Reachability benchmark suite with ground-truth datasets (Java Log4j, C# reachable/dead-code, native ELF), schema validation, and signal-level ground-truth validators.

Implementation Details

Scanner.Analyzers Benchmark: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/ -- benchmark runner for scanner analyzers against ground-truth datasets. Key files: ScenarioRunners.cs (orchestrates benchmark scenarios against corpus data), NodeBenchMetrics.cs (captures per-node precision/recall metrics), BenchmarkConfig.cs (configures which datasets and analyzers to run).
Baseline Infrastructure: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs (ground-truth entry model), BaselineLoader.cs (loads ground-truth datasets from fixture files).
Reporting: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkJsonWriter.cs (JSON output), BenchmarkScenarioReport.cs (report with precision/recall/F1), PrometheusWriter.cs (metric export).
Tests: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/BaselineLoaderTests.cs, BenchmarkJsonWriterTests.cs, BenchmarkScenarioReportTests.cs, PrometheusWriterTests.cs
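
The precision/recall/F1 figures mentioned above follow the standard definitions over true positives, false positives, and false negatives. The following is a minimal Python sketch of that scoring step, not the code in BenchmarkScenarioReport.cs; the names Scores and score, and the use of plain string identifiers for findings, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """Confusion counts for one scenario (hypothetical type, for illustration)."""
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        # Share of reported findings that are in the ground truth.
        denom = self.true_positives + self.false_positives
        return self.true_positives / denom if denom else 0.0

    @property
    def recall(self) -> float:
        # Share of ground-truth entries that the analyzer reported.
        denom = self.true_positives + self.false_negatives
        return self.true_positives / denom if denom else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall; 0.0 when both are 0.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def score(reported: set[str], ground_truth: set[str]) -> Scores:
    """Compare analyzer output with a ground-truth baseline by set membership."""
    return Scores(
        true_positives=len(reported & ground_truth),
        false_positives=len(reported - ground_truth),
        false_negatives=len(ground_truth - reported),
    )
```

Under these definitions, adding a ground-truth entry that the analyzer does not report raises the false-negative count by one and lowers recall, while precision stays unchanged.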

E2E Test Plan

Load a Java Log4j ground-truth dataset via BaselineLoader and run the scanner analyzer benchmark; verify precision and recall metrics are computed against the ground truth
Load a C# reachable/dead-code ground-truth dataset and verify the benchmark correctly classifies true positives, false positives, and false negatives
Run the benchmark with a native ELF dataset and verify that NodeBenchMetrics captures per-node precision and recall
Verify JSON report output contains precision, recall, F1 score, and per-scenario timing data
Verify that modifying the ground-truth baseline to include additional entries causes the benchmark to report new false negatives
Verify Prometheus metrics export includes labeled gauges for precision and recall per dataset
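
For the Prometheus check, it may help to picture what "labeled gauges per dataset" look like in the Prometheus text exposition format. The sketch below is in Python and is not the output of PrometheusWriter.cs; the metric names (bench_reachability_precision, bench_reachability_recall), the dataset label name, and the render_gauges helper are all hypothetical.

```python
def escape_label_value(value: str) -> str:
    """Escape a label value as the Prometheus text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(results: dict[str, dict[str, float]]) -> str:
    """Render precision and recall per dataset as labeled gauges.

    `results` maps a dataset name to {"precision": ..., "recall": ...}.
    Metric and label names here are assumptions, not the real exporter's.
    """
    lines = []
    for metric in ("precision", "recall"):
        name = f"bench_reachability_{metric}"  # hypothetical metric name
        lines.append(f"# TYPE {name} gauge")
        for dataset in sorted(results):
            label = escape_label_value(dataset)
            value = results[dataset][metric]
            lines.append(f'{name}{{dataset="{label}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gauges({"java-log4j": {"precision": 0.75, "recall": 0.5}}), end="")
```

A test along these lines would parse the exported text and assert that every dataset in the run appears once per metric with the expected label.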
