Public Reachability Benchmark Dataset

Module

__Tests

Status

IMPLEMENTED

Description

Complete reachability benchmark dataset with JSON/YAML schemas for ground truth, traces, submissions, cases, coverage, and entrypoints. Includes website, submission guide, and legal notices (LICENSE/NOTICE).

Implementation Details

  • Benchmark Dataset: src/__Tests/__Benchmarks/reachability-benchmark/ -- complete public benchmark dataset including JSON/YAML schemas for ground truth, trace data, submission formats, test cases, coverage metrics, and entry point definitions.
  • Benchmark Harness: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/ -- evaluation harness that scores submissions against the ground truth.
  • Baseline Infrastructure: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs, BaselineLoader.cs -- loads ground-truth baselines for benchmark evaluation.
  • Reporting: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkScenarioReport.cs -- produces detailed benchmark reports with precision, recall, and F1 scores per category.
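To make the per-category scoring described above concrete, here is a minimal Python sketch. It is not the StellaOps harness (which lives in the C# projects listed above). The function name `score_by_category`, the input shapes, and the rule that a case missing from a submission counts as "unreachable" are all assumptions made for illustration.

```python
from collections import defaultdict


def score_by_category(ground_truth, submission):
    """Compute precision, recall, and F1 per category.

    ground_truth: dict mapping case id -> (category, is_reachable)
    submission:   dict mapping case id -> predicted is_reachable
    Both shapes are hypothetical; the real schemas may differ.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for case_id, (category, truth) in ground_truth.items():
        bucket = counts[category]  # touch the bucket so every category is reported
        # Assumption: a case absent from the submission is treated as "unreachable".
        predicted = submission.get(case_id, False)
        if predicted and truth:
            bucket["tp"] += 1
        elif predicted and not truth:
            bucket["fp"] += 1
        elif truth and not predicted:
            bucket["fn"] += 1

    report = {}
    for category, c in counts.items():
        predicted_positive = c["tp"] + c["fp"]
        actual_positive = c["tp"] + c["fn"]
        precision = c["tp"] / predicted_positive if predicted_positive else 0.0
        recall = c["tp"] / actual_positive if actual_positive else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        report[category] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

Keying the counts by category keeps one weak category from being hidden by a strong aggregate score, which is presumably why the report breaks the metrics down that way.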

E2E Test Plan

  • Validate all JSON/YAML schemas in the benchmark dataset and verify they are well-formed and internally consistent
  • Submit a scanner's reachability results in the submission format and verify the evaluation harness produces a valid score report
  • Verify the ground-truth data covers all declared entry points and traces
  • Verify coverage metrics: submit a complete analysis and confirm the coverage report shows 100% of test cases evaluated
  • Verify the dataset includes required legal notices (LICENSE, NOTICE) and the submission guide is accessible
  • Load the baseline and compare a new submission against it; verify the harness correctly identifies improvements and regressions
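The coverage and baseline checks in the plan above could be expressed roughly as follows. This is a hedged Python sketch, not the project's test code: `coverage_percent`, `compare_to_baseline`, the `tolerance` parameter, and the score dictionaries are hypothetical names and shapes chosen for illustration.

```python
def coverage_percent(ground_truth_ids, submitted_ids):
    """Percentage of ground-truth cases that the submission evaluated.

    Extra ids in the submission that are not in the ground truth are ignored.
    """
    total = set(ground_truth_ids)
    if not total:
        return 0.0
    evaluated = total & set(submitted_ids)
    return 100.0 * len(evaluated) / len(total)


def compare_to_baseline(baseline, candidate, tolerance=0.0):
    """Classify each baseline metric as improved, regressed, unchanged, or missing.

    baseline, candidate: dicts mapping a metric name (e.g. a category) -> score,
    where higher is better. `tolerance` absorbs small numeric differences.
    """
    result = {"improved": [], "regressed": [], "unchanged": [], "missing": []}
    for name in sorted(baseline):
        if name not in candidate:
            result["missing"].append(name)
            continue
        delta = candidate[name] - baseline[name]
        if delta > tolerance:
            result["improved"].append(name)
        elif delta < -tolerance:
            result["regressed"].append(name)
        else:
            result["unchanged"].append(name)
    return result
```

A test along these lines would assert that a complete submission yields `coverage_percent(...) == 100.0` and that a deliberately degraded submission shows up under `regressed`.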