Reachability benchmarks with ground-truth datasets

Module

Bench

Status

IMPLEMENTED

Description

Reachability benchmark suite with ground-truth datasets (Java Log4j, C# reachable/dead-code, native ELF), schema validation, and signal-level ground-truth validators.

Implementation Details

  • Scanner.Analyzers Benchmark: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/ -- benchmark runner for scanner analyzers against ground-truth datasets. Key files: ScenarioRunners.cs (orchestrates benchmark scenarios against corpus data), NodeBenchMetrics.cs (captures per-node precision/recall metrics), BenchmarkConfig.cs (configures which datasets and analyzers to run).
  • Baseline Infrastructure: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs (ground-truth entry model), BaselineLoader.cs (loads ground-truth datasets from fixture files).
  • Reporting: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkJsonWriter.cs (JSON output), BenchmarkScenarioReport.cs (report with precision/recall/F1), PrometheusWriter.cs (metric export).
  • Tests: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/BaselineLoaderTests.cs, BenchmarkJsonWriterTests.cs, BenchmarkScenarioReportTests.cs, PrometheusWriterTests.cs.
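
To illustrate the kind of output a metric exporter such as PrometheusWriter.cs would need to produce, here is a minimal Python sketch of the Prometheus text exposition format for labeled gauges. It is not the project's C# implementation: the function names, the metric name `bench_precision`, and the `dataset` label are hypothetical and chosen only for this example.

```python
from __future__ import annotations


def escape_label(value: str) -> str:
    """Escape a label value per the Prometheus text format rules."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(metric: str, help_text: str, samples: dict[str, float]) -> str:
    """Render one gauge family with a hypothetical `dataset` label per sample."""
    lines = [f"# HELP {metric} {help_text}", f"# TYPE {metric} gauge"]
    # Sort by dataset name so the output is deterministic and easy to diff.
    for dataset in sorted(samples):
        label = escape_label(dataset)
        lines.append(f'{metric}{{dataset="{label}"}} {samples[dataset]}')
    return "\n".join(lines) + "\n"
```

Calling `render_gauges("bench_precision", "Precision per dataset", {"java-log4j": 0.75})` yields a `# HELP` line, a `# TYPE ... gauge` line, and one sample line of the form `bench_precision{dataset="java-log4j"} 0.75`. Sorting the samples keeps the export stable across runs, which simplifies snapshot-style tests.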

E2E Test Plan

  • Load a Java Log4j ground-truth dataset via BaselineLoader and run the scanner analyzer benchmark; verify precision and recall metrics are computed against the ground truth
  • Load a C# reachable/dead-code ground-truth dataset and verify the benchmark correctly classifies true positives, false positives, and false negatives
  • Run the benchmark with a native ELF dataset and verify that NodeBenchMetrics captures per-node accuracy
  • Verify JSON report output contains precision, recall, F1 score, and per-scenario timing data
  • Verify that modifying the ground-truth baseline to include additional entries causes the benchmark to report new false negatives
  • Verify Prometheus metrics export includes labeled gauges for precision and recall per dataset
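
The classification checks in this plan rest on standard set arithmetic between the ground truth and the analyzer's findings. The following Python sketch shows that arithmetic under simplifying assumptions: findings are identified by plain string keys, and the `Score` type and `score` function are hypothetical names for this example, not part of the Bench module.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        denom = self.true_positives + self.false_positives
        return self.true_positives / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.true_positives + self.false_negatives
        return self.true_positives / denom if denom else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall; 0.0 when both are zero.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def score(ground_truth: set[str], reported: set[str]) -> Score:
    """Compare reported findings against ground-truth entries."""
    return Score(
        true_positives=len(ground_truth & reported),   # in both sets
        false_positives=len(reported - ground_truth),  # reported but not expected
        false_negatives=len(ground_truth - reported),  # expected but missed
    )
```

For example, `score({"a", "b", "c"}, {"a", "b", "d"})` gives two true positives, one false positive, and one false negative. Adding an entry `"e"` to the ground truth while leaving the reported set unchanged raises the false-negative count to two and lowers recall from 2/3 to 0.5, which is the behavior the baseline-modification check expects.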