semi implemented and features implemented save checkpoint

2026-02-08 18:00:49 +02:00
parent 04360dff63
commit 1bf6bbf395
20895 changed files with 716795 additions and 64 deletions
--- a/docs/features/unchecked/tests/public-reachability-benchmark-dataset.md
+++ b/docs/features/unchecked/tests/public-reachability-benchmark-dataset.md
@@ -0,0 +1,24 @@
+# Public Reachability Benchmark Dataset
+
+## Module
+__Tests
+
+## Status
+IMPLEMENTED
+
+## Description
+Complete reachability benchmark dataset with JSON/YAML schemas for ground truth, traces, submissions, cases, coverage, and entry points. Includes a website, a submission guide, and legal notices (LICENSE/NOTICE).
+
+## Implementation Details
+- **Benchmark Dataset**: `src/__Tests/__Benchmarks/reachability-benchmark/` -- complete public benchmark dataset including JSON/YAML schemas for ground truth, trace data, submission formats, test cases, coverage metrics, and entry point definitions.
+- **Benchmark Harness**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/` -- evaluation harness that scores submissions against the ground truth.
+- **Baseline Infrastructure**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs`, `BaselineLoader.cs` -- loads ground-truth baselines for benchmark evaluation.
+- **Reporting**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkScenarioReport.cs` -- produces detailed benchmark reports with precision, recall, and F1 scores per category.
+
+## E2E Test Plan
+- [ ] Validate all JSON/YAML schemas in the benchmark dataset and verify they are well-formed and internally consistent
+- [ ] Submit a scanner's reachability results in the submission format and verify the evaluation harness produces a valid score report
+- [ ] Verify the ground-truth data covers all declared entry points and traces
+- [ ] Verify coverage metrics: submit a complete analysis and confirm the coverage report shows 100% of test cases evaluated
+- [ ] Verify the dataset includes required legal notices (LICENSE, NOTICE) and the submission guide is accessible
+- [ ] Load the baseline and compare a new submission against it; verify the harness correctly identifies improvements and regressions
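
The reporting and baseline-comparison items in the file above can be illustrated with a small sketch. This is not the StellaOps harness (which lives in the C# projects listed under Implementation Details); it is a hypothetical Python illustration of how precision, recall, and F1 might be computed for one category, and how a new submission's per-category scores might be classified against a baseline. The names `Score`, `score`, and `compare_to_baseline`, the set-of-identifiers input shape, and the tolerance value are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """Precision, recall, and F1 for one benchmark category (hypothetical shape)."""
    precision: float
    recall: float
    f1: float


def score(predicted: set[str], truth: set[str]) -> Score:
    """Score predicted reachable items against ground-truth reachable items."""
    true_positives = len(predicted & truth)
    # Guard against empty sets so an empty submission scores 0 instead of raising.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return Score(precision, recall, f1)


def compare_to_baseline(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 1e-9,
) -> dict[str, str]:
    """Classify each category's score change relative to the baseline.

    Both arguments map a category name to a single score (for example F1).
    The tolerance is an assumed value that absorbs floating-point noise.
    """
    verdicts: dict[str, str] = {}
    for category in sorted(baseline.keys() | current.keys()):
        old = baseline.get(category)
        new = current.get(category)
        if old is None:
            verdicts[category] = "new"
        elif new is None:
            verdicts[category] = "missing"
        elif new > old + tolerance:
            verdicts[category] = "improved"
        elif new < old - tolerance:
            verdicts[category] = "regressed"
        else:
            verdicts[category] = "unchanged"
    return verdicts


if __name__ == "__main__":
    # Illustrative identifiers only; real submissions follow the dataset's schemas.
    result = score({"a", "b", "c"}, {"b", "c", "d", "e"})
    print(result)
    print(compare_to_baseline({"java": 0.8, "python": 0.6}, {"java": 0.9, "python": 0.5}))
```

Reporting "new" and "missing" categories separately keeps a submission that drops a category from being mistaken for one that merely held steady, which matters for the last test-plan item.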