# Public Reachability Benchmark Dataset

## Module
__Tests

## Status
VERIFIED

## Description
Complete reachability benchmark dataset with JSON/YAML schemas for ground truth, traces, submissions, cases, coverage, and entrypoints. Includes website, submission guide, and legal notices (LICENSE/NOTICE).

## Implementation Details
- **Benchmark Dataset**: `src/__Tests/__Benchmarks/reachability-benchmark/` -- complete public benchmark dataset including JSON/YAML schemas for ground truth, trace data, submission formats, test cases, coverage metrics, and entry point definitions.
- **Benchmark Harness**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/` -- evaluation harness that scores submissions against the ground truth.
- **Baseline Infrastructure**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs`, `BaselineLoader.cs` -- loads ground-truth baselines for benchmark evaluation.
- **Reporting**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkScenarioReport.cs` -- produces detailed benchmark reports with precision, recall, and F1 scores per category.
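
The per-category precision, recall, and F1 scores named in the Reporting entry can be illustrated with a short sketch. The harness itself is C# and its scoring code is not shown in this document, so the Python below is only an assumption about how such scores might be derived. The function name `f1_by_category` and the `(category, item_id)` encoding of reachable findings are hypothetical.

```python
def f1_by_category(truth, predicted):
    """Return {category: (precision, recall, f1)}.

    truth, predicted: sets of (category, item_id) pairs marked reachable.
    This encoding is an illustrative assumption, not the dataset's schema.
    """
    categories = {c for c, _ in truth} | {c for c, _ in predicted}
    report = {}
    for category in sorted(categories):
        expected = {i for c, i in truth if c == category}
        reported = {i for c, i in predicted if c == category}
        true_positives = len(expected & reported)
        # Guard against empty sets so a missing category scores 0.0.
        precision = true_positives / len(reported) if reported else 0.0
        recall = true_positives / len(expected) if expected else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[category] = (precision, recall, f1)
    return report
```

Scoring each category separately keeps a strong result in one category from masking a weak result in another, which is presumably why the report breaks the scores down that way.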

## E2E Test Plan
- [ ] Validate all JSON schemas in the benchmark dataset and verify they are well-formed and internally consistent
- [ ] Submit a scanner's reachability results in the submission format and verify the evaluation harness produces a valid score report
- [ ] Verify the ground-truth data covers all declared entry points and traces
- [ ] Verify coverage metrics: submit a complete analysis and confirm the coverage report shows 100% of test cases evaluated
- [ ] Verify the dataset includes required legal notices (LICENSE, NOTICE) and the submission guide is accessible
- [ ] Load the baseline and compare a new submission against it; verify the harness correctly identifies improvements and regressions
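
The coverage and baseline-comparison checks in the plan above can be sketched as two small functions. This is a minimal illustration under stated assumptions, not the harness's actual logic: the names `coverage` and `compare_to_baseline`, the `{category: f1}` input shape, and the `tolerance` parameter are all hypothetical.

```python
def coverage(case_ids, evaluated_ids):
    """Fraction of declared test cases that were actually evaluated."""
    declared = set(case_ids)
    if not declared:
        return 0.0
    return len(declared & set(evaluated_ids)) / len(declared)


def compare_to_baseline(baseline, current, tolerance=1e-9):
    """Classify each category by the change in its F1 score.

    baseline, current: {category: f1}. A category missing from either
    side is treated as scoring 0.0 (an assumption made for this sketch).
    """
    result = {"improved": [], "regressed": [], "unchanged": []}
    for category in sorted(set(baseline) | set(current)):
        delta = current.get(category, 0.0) - baseline.get(category, 0.0)
        if delta > tolerance:
            result["improved"].append(category)
        elif delta < -tolerance:
            result["regressed"].append(category)
        else:
            result["unchanged"].append(category)
    return result
```

A complete submission should yield `coverage(...) == 1.0`, which corresponds to the 100% expectation in the plan. The tolerance keeps floating-point noise from being reported as an improvement or a regression.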

## Verification
- Verified on 2026-02-13 via `run-001`.
- Tier 0: Source files confirmed present on disk.
- Tier 1: `dotnet build` passed (0 errors); 266/266 tests passed across Chaos.Tests, Evidence.Tests, Replay.Tests, FixtureTests.
- Tier 2d: `docs/qa/feature-checks/runs/tests/public-reachability-benchmark-dataset/run-001/tier2-integration-check.json`