semi implemented and features implemented save checkpoint

2026-02-08 18:00:49 +02:00
parent 04360dff63
commit 1bf6bbf395
20895 changed files with 716795 additions and 64 deletions
--- a/docs/features/unchecked/bench/benchmark-harness.md
+++ b/docs/features/unchecked/bench/benchmark-harness.md
@@ -0,0 +1,28 @@
+# Benchmark harness (reachability, scanner analyzers, policy engine, determinism)
+
+## Module
+Bench
+
+## Status
+IMPLEMENTED
+
+## Description
+A benchmark harness covers reachability, scanner analyzer, policy engine, determinism, graph, LinkNotMerge, and notification dispatch benchmarks, with JSON and Prometheus metric export.
+
+## Implementation Details
+- **LinkNotMerge Benchmark**: `src/Bench/StellaOps.Bench/LinkNotMerge/StellaOps.Bench.LinkNotMerge/` -- benchmark scenarios for linkset aggregation performance. Key files: `LinkNotMergeScenarioRunner.cs` (runs benchmark scenarios), `LinksetAggregator.cs` (aggregation logic under test), `ObservationData.cs` (test data models), `BenchmarkConfig.cs` (scenario configuration), `ScenarioStatistics.cs` / `ScenarioResult.cs` / `ScenarioExecutionResult.cs` (result models).
+- **LinkNotMerge VEX Benchmark**: `src/Bench/StellaOps.Bench/LinkNotMerge.Vex/StellaOps.Bench.LinkNotMerge.Vex/` -- VEX-specific linkset benchmarks. Key files: `VexScenarioRunner.cs`, `VexLinksetAggregator.cs`, `VexObservationGenerator.cs`, `VexScenarioConfig.cs`, `Statistics.cs`.
+- **Notify Benchmark**: `src/Bench/StellaOps.Bench/Notify/StellaOps.Bench.Notify/` -- notification dispatch benchmarks. Key files: `NotifyScenarioRunner.cs`, `DispatchAccumulator.cs`, `BenchmarkConfig.cs`.
+- **PolicyEngine Benchmark**: `src/Bench/StellaOps.Bench/PolicyEngine/StellaOps.Bench.PolicyEngine/` -- policy evaluation benchmarks. Key files: `PolicyScenarioRunner.cs`, `PathUtilities.cs`, `BenchmarkConfig.cs`.
+- **Scanner.Analyzers Benchmark**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/` -- scanner analyzer benchmarks. Key files: `ScenarioRunners.cs`, `NodeBenchMetrics.cs`, `BenchmarkConfig.cs`.
+- **Baseline Infrastructure**: Each benchmark has `Baseline/BaselineEntry.cs` and `Baseline/BaselineLoader.cs` for loading ground-truth comparison baselines.
+- **Reporting Infrastructure**: Each benchmark has `Reporting/BenchmarkJsonWriter.cs` (JSON output), `Reporting/BenchmarkScenarioReport.cs` (report model), and `Reporting/PrometheusWriter.cs` (Prometheus metric export).
+- **Tests**: `src/Bench/StellaOps.Bench/LinkNotMerge/StellaOps.Bench.LinkNotMerge.Tests/LinkNotMergeScenarioRunnerTests.cs`, `BaselineLoaderTests.cs`, `BenchmarkScenarioReportTests.cs`; `src/Bench/StellaOps.Bench/LinkNotMerge.Vex/StellaOps.Bench.LinkNotMerge.Vex.Tests/VexScenarioRunnerTests.cs`; `src/Bench/StellaOps.Bench/Notify/StellaOps.Bench.Notify.Tests/NotifyScenarioRunnerTests.cs`, `PrometheusWriterTests.cs`
+
+## E2E Test Plan
+- [ ] Run the LinkNotMerge benchmark suite and verify it produces a valid `BenchmarkScenarioReport` with timing statistics and passes baseline comparison
+- [ ] Run the PolicyEngine benchmark and verify scenario results include evaluation counts and latency percentiles
+- [ ] Run the Scanner.Analyzers benchmark and verify `NodeBenchMetrics` are captured per analyzer
+- [ ] Verify Prometheus export: run any benchmark and confirm `PrometheusWriter` outputs valid Prometheus exposition format with scenario labels
+- [ ] Verify JSON export: run a benchmark and confirm `BenchmarkJsonWriter` produces a valid JSON report matching the `BenchmarkScenarioReport` schema
+- [ ] Verify baseline comparison: load a baseline, run the scenarios, and confirm the harness reports a regression when a result exceeds its baseline threshold
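The baseline comparison and Prometheus export described in this feature doc can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not a port of the C# `BaselineLoader.cs` or `PrometheusWriter.cs`: the record shapes, the 10% default tolerance, and the metric name `bench_scenario_mean_seconds` are all hypothetical. A scenario counts as a regression when its measured mean exceeds the baseline mean by more than the tolerance; scenarios with no baseline entry are skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineEntry:
    # Hypothetical baseline record: one reference mean timing per scenario.
    scenario: str
    mean_ms: float


@dataclass(frozen=True)
class ScenarioResult:
    # Hypothetical measured result for the same scenario in the current run.
    scenario: str
    mean_ms: float


def find_regressions(baseline, results, tolerance=0.10):
    """Return scenarios whose mean exceeds the baseline by more than `tolerance` (a fraction)."""
    by_name = {entry.scenario: entry for entry in baseline}
    regressions = []
    for result in results:
        entry = by_name.get(result.scenario)
        if entry is None:
            continue  # no baseline yet, so nothing to compare against
        limit = entry.mean_ms * (1.0 + tolerance)
        if result.mean_ms > limit:
            regressions.append(result.scenario)
    return regressions


def escape_label(value):
    # Prometheus text format: escape backslash, double quote, and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(results, metric="bench_scenario_mean_seconds"):
    """Render one gauge sample per scenario, converting milliseconds to seconds."""
    lines = [f"# TYPE {metric} gauge"]
    for result in results:
        label = escape_label(result.scenario)
        lines.append(f'{metric}{{scenario="{label}"}} {result.mean_ms / 1000.0}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    baseline = [BaselineEntry("lnm_small", 100.0), BaselineEntry("lnm_large", 400.0)]
    results = [ScenarioResult("lnm_small", 105.0), ScenarioResult("lnm_large", 480.0)]
    print(find_regressions(baseline, results))  # ['lnm_large']
    print(to_prometheus(results), end="")
```

Reporting in seconds follows the Prometheus convention of base units in metric names; whether the real harness converts units this way is not stated in this doc.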
--- a/docs/features/unchecked/bench/reachability-benchmarks-with-ground-truth-datasets.md
+++ b/docs/features/unchecked/bench/reachability-benchmarks-with-ground-truth-datasets.md
@@ -0,0 +1,24 @@
+# Reachability benchmarks with ground-truth datasets
+
+## Module
+Bench
+
+## Status
+IMPLEMENTED
+
+## Description
+A reachability benchmark suite built on ground-truth datasets (Java Log4j, C# reachable/dead-code, native ELF), with schema validation and signal-level ground-truth validators.
+
+## Implementation Details
+- **Scanner.Analyzers Benchmark**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/` -- benchmark runner for scanner analyzers against ground-truth datasets. Key files: `ScenarioRunners.cs` (orchestrates benchmark scenarios against corpus data), `NodeBenchMetrics.cs` (captures per-node precision/recall metrics), `BenchmarkConfig.cs` (configures which datasets and analyzers to run).
+- **Baseline Infrastructure**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs` (ground-truth entry model), `BaselineLoader.cs` (loads ground-truth datasets from fixture files).
+- **Reporting**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkJsonWriter.cs` (JSON output), `BenchmarkScenarioReport.cs` (report with precision/recall/F1), `PrometheusWriter.cs` (metric export).
+- **Tests**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/BaselineLoaderTests.cs`, `BenchmarkJsonWriterTests.cs`, `BenchmarkScenarioReportTests.cs`, `PrometheusWriterTests.cs`
+
+## E2E Test Plan
+- [ ] Load a Java Log4j ground-truth dataset via `BaselineLoader` and run the scanner analyzer benchmark; verify precision and recall metrics are computed against the ground truth
+- [ ] Load a C# reachable/dead-code ground-truth dataset and verify the benchmark correctly classifies true positives, false positives, and false negatives
+- [ ] Run the benchmark with a native ELF dataset and verify that `NodeBenchMetrics` captures per-node accuracy
+- [ ] Verify the JSON report contains precision, recall, F1 score, and per-scenario timing data
+- [ ] Verify that modifying the ground-truth baseline to include additional entries causes the benchmark to report new false negatives
+- [ ] Verify Prometheus metrics export includes labeled gauges for precision and recall per dataset
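The precision, recall, and F1 checks in this test plan reduce to set arithmetic over ground-truth and reported nodes. A minimal Python sketch, assuming each dataset's ground truth is a set of reachable node identifiers and the analyzer reports the set of nodes it considers reachable; the function names, node identifiers, and the metric names `bench_reachability_precision` / `bench_reachability_recall` are hypothetical and not taken from `NodeBenchMetrics.cs` or `PrometheusWriter.cs`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        # Fraction of reported nodes that are truly reachable; 0.0 if nothing was reported.
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 0.0

    @property
    def recall(self) -> float:
        # Fraction of truly reachable nodes the analyzer found; 0.0 if ground truth is empty.
        expected = self.true_positives + self.false_negatives
        return self.true_positives / expected if expected else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def score(ground_truth: set[str], reported: set[str]) -> Score:
    """Classify reported nodes against the ground-truth reachable set."""
    return Score(
        true_positives=len(reported & ground_truth),
        false_positives=len(reported - ground_truth),
        false_negatives=len(ground_truth - reported),
    )


def to_prometheus(scores: dict[str, Score]) -> str:
    """Emit labeled precision and recall gauges, one sample per dataset.

    Dataset names are assumed to contain no quotes, backslashes, or newlines,
    so label values are written without escaping.
    """
    lines = []
    for metric in ("precision", "recall"):
        name = f"bench_reachability_{metric}"
        lines.append(f"# TYPE {name} gauge")
        for dataset, s in sorted(scores.items()):
            lines.append(f'{name}{{dataset="{dataset}"}} {getattr(s, metric)}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    s = score({"A", "B", "C", "D"}, {"A", "B", "E"})
    print(s.true_positives, s.false_positives, s.false_negatives)  # 2 1 2
    print(s.recall)  # 0.5
```

Adding a ground-truth entry that the analyzer does not report raises `false_negatives` by one and lowers recall, which is the behavior the baseline-modification item in the test plan checks.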