Reachability benchmarks with ground-truth datasets

Module

Bench

Status

VERIFIED

Description

Reachability benchmark suite with ground-truth datasets (Java Log4j, C# reachable/dead-code, native ELF), schema validation, and signal-level ground-truth validators.
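Scoring against ground truth reduces to comparing sets. The Python sketch below (Python for brevity; the benchmark itself is C#) assumes each dataset can be flattened into a set of symbol identifiers labeled reachable. Under that assumption, a symbol the analyzer reports as reachable but the dataset marks as dead code counts as a false positive. The names `Score` and `score`, and returning 0.0 when a denominator is zero, are illustrative assumptions rather than the StellaOps API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """Confusion counts and derived metrics for one benchmark scenario (hypothetical type)."""
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        # Fraction of symbols reported reachable that really are reachable.
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 0.0

    @property
    def recall(self) -> float:
        # Fraction of ground-truth reachable symbols that the analyzer found.
        expected = self.true_positives + self.false_negatives
        return self.true_positives / expected if expected else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def score(predicted_reachable: set[str], truth_reachable: set[str]) -> Score:
    """Compare the analyzer's reachable set with the ground-truth reachable set."""
    return Score(
        true_positives=len(predicted_reachable & truth_reachable),
        false_positives=len(predicted_reachable - truth_reachable),
        false_negatives=len(truth_reachable - predicted_reachable),
    )
```

For example, with ground truth {A, B, C} and analyzer output {A, B, D}, the sketch counts 2 true positives, 1 false positive, and 1 false negative. Precision, recall, and F1 are therefore all 2/3.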

Implementation Details

Scanner.Analyzers Benchmark: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/ is the benchmark runner that executes scanner analyzers against ground-truth datasets. Key files: ScenarioRunners.cs (orchestrates benchmark scenarios against corpus data), NodeBenchMetrics.cs (captures per-node precision/recall metrics), BenchmarkConfig.cs (configures which datasets and analyzers to run).
Baseline Infrastructure: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs (ground-truth entry model), BaselineLoader.cs (loads ground-truth datasets from fixture files).
Reporting: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkJsonWriter.cs (JSON output), BenchmarkScenarioReport.cs (report with precision/recall/F1), PrometheusWriter.cs (metric export).
Tests: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/BaselineLoaderTests.cs, BenchmarkJsonWriterTests.cs, BenchmarkScenarioReportTests.cs, PrometheusWriterTests.cs
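PrometheusWriter exports benchmark metrics for scraping. The Python sketch below illustrates what per-dataset labeled gauges look like in the Prometheus text exposition format. The metric name, the `dataset` label, and the function names are hypothetical and may not match what PrometheusWriter.cs actually emits.

```python
def escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote, and newline in label values.
    # Backslash must be escaped first so the other escapes are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(metric: str, help_text: str, values: dict[str, float]) -> str:
    """Render one gauge family with one sample per dataset label (hypothetical helper).

    help_text is assumed to contain no backslashes or newlines.
    """
    lines = [f"# HELP {metric} {help_text}", f"# TYPE {metric} gauge"]
    for dataset in sorted(values):  # sorted so repeated runs produce identical output
        label = escape_label_value(dataset)
        lines.append(f'{metric}{{dataset="{label}"}} {values[dataset]}')
    return "\n".join(lines) + "\n"


print(render_gauges(
    "scanner_analyzer_benchmark_precision",  # hypothetical metric name
    "Precision of reachability findings against ground truth.",
    {"java-log4j": 0.75, "native-elf": 1.0},  # hypothetical dataset labels and values
))
```

Emitting one gauge family per metric (precision, recall) with a `dataset` label keeps per-dataset series queryable. This is the shape the E2E plan's Prometheus item describes, though the real writer's naming may differ.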

E2E Test Plan

Load a Java Log4j ground-truth dataset via BaselineLoader and run the scanner analyzer benchmark; verify precision and recall metrics are computed against the ground truth
Load a C# reachable/dead-code ground-truth dataset and verify the benchmark correctly classifies true positives, false positives, and false negatives
Run the benchmark with a native ELF dataset and verify that NodeBenchMetrics captures per-node accuracy
Verify JSON report output contains precision, recall, F1 score, and per-scenario timing data
Verify that modifying the ground-truth baseline to include additional entries causes the benchmark to report new false negatives
Verify Prometheus metrics export includes labeled gauges for precision and recall per dataset
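The plan's JSON-report item amounts to a schema check on the writer's output. The Python sketch below shows such a writer and check. The field names (`scenario`, `precision`, `recall`, `f1`, `duration_ms`) and helper names are assumptions, not the actual BenchmarkJsonWriter schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ScenarioResult:
    # Hypothetical report fields; the real BenchmarkScenarioReport shape may differ.
    scenario: str
    precision: float
    recall: float
    f1: float
    duration_ms: float  # wall-clock time for the scenario, in milliseconds


def write_report(results: list[ScenarioResult]) -> str:
    """Serialize scenario results as JSON with a stable key order."""
    payload = {"scenarios": [asdict(r) for r in results]}
    return json.dumps(payload, indent=2, sort_keys=True)


REQUIRED_FIELDS = {"scenario", "precision", "recall", "f1", "duration_ms"}


def check_report(report_json: str) -> None:
    """Test helper: raise AssertionError if a scenario lacks a field or has an out-of-range value."""
    for entry in json.loads(report_json)["scenarios"]:
        missing = REQUIRED_FIELDS - set(entry)
        assert not missing, f"{entry.get('scenario')}: missing {sorted(missing)}"
        for metric in ("precision", "recall", "f1"):
            assert 0.0 <= entry[metric] <= 1.0, f"{entry['scenario']}: {metric} out of range"
        assert entry["duration_ms"] >= 0, f"{entry['scenario']}: negative duration"
```

Sorting keys makes the output byte-stable across runs, which keeps golden-file comparisons in tests simple.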

Verification

Verified: 2026-02-11
Method: Tier 0 source verification + Tier 1 build/test + Tier 2 behavioral CLI benchmark replay
Build: PASS (src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj)
Tests: PASS (src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/StellaOps.Bench.ScannerAnalyzers.Tests.csproj: 15/15)
Tier 0 Evidence: docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier0-source-check.json
Tier 1 Evidence: docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier1-build-check.json
Tier 2 Evidence: docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier2-integration-check.json

Retest Notes

Initial failure (run-001): Tier 2 CLI execution failed because ScenarioRunnerFactory could not instantiate the analyzer IDs listed in the benchmark config.
Fix and retest (run-002): Added analyzer factory mappings and tests, then reran Tiers 0, 1, and 2 with fresh artifacts, which produced a passing verdict.
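A registry-style factory can surface the run-001 failure before any benchmark runs. The Python sketch below assumes a simple mapping from analyzer ID to constructor. `RunnerFactory`, `ScenarioRunner`, `UnknownAnalyzerError`, and the analyzer IDs are hypothetical and do not describe how ScenarioRunnerFactory is written.

```python
from typing import Callable


class ScenarioRunner:
    """Hypothetical stand-in for a benchmark scenario runner."""

    def __init__(self, analyzer_id: str) -> None:
        self.analyzer_id = analyzer_id


class UnknownAnalyzerError(KeyError):
    """Raised when a configured analyzer ID has no registered runner."""


class RunnerFactory:
    """Maps analyzer IDs from the benchmark config to runner constructors (hypothetical)."""

    def __init__(self) -> None:
        self._registry: dict[str, Callable[[], ScenarioRunner]] = {}

    def register(self, analyzer_id: str, ctor: Callable[[], ScenarioRunner]) -> None:
        self._registry[analyzer_id] = ctor

    def create(self, analyzer_id: str) -> ScenarioRunner:
        ctor = self._registry.get(analyzer_id)
        if ctor is None:
            known = ", ".join(sorted(self._registry))
            raise UnknownAnalyzerError(
                f"no runner mapped for analyzer '{analyzer_id}' (known: {known})"
            )
        return ctor()

    def unmapped(self, configured_ids: list[str]) -> list[str]:
        """Config IDs without a mapping; an empty list means every ID can be instantiated."""
        return [i for i in configured_ids if i not in self._registry]


factory = RunnerFactory()
factory.register("java", lambda: ScenarioRunner("java"))      # hypothetical ID
factory.register("dotnet", lambda: ScenarioRunner("dotnet"))  # hypothetical ID
print(factory.unmapped(["java", "dotnet", "native"]))
```

Calling `unmapped` on the configured IDs in a unit test would flag a missing mapping during Tier 1 rather than at Tier 2 CLI execution. This illustrates the approach and is not a description of the tests actually added in run-002.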