save checkpoint: save features

2026-02-12 10:27:23 +02:00
parent dca86e1248
commit 5bca406787
8837 changed files with 1796879 additions and 5294 deletions
--- a/docs/features/checked/bench/benchmark-harness.md
+++ b/docs/features/checked/bench/benchmark-harness.md
@@ -0,0 +1,35 @@
+# Benchmark harness (reachability, scanner analyzers, policy engine, determinism)
+
+## Module
+Bench
+
+## Status
+VERIFIED
+
+## Description
+Benchmark harness code exists for the LinkNotMerge, LinkNotMerge.Vex, Notify, PolicyEngine, and Scanner.Analyzers modules, with deterministic benchmarking and reporting support.
+
+## Implementation Details
+- **LinkNotMerge Benchmark**: `src/Bench/StellaOps.Bench/LinkNotMerge/StellaOps.Bench.LinkNotMerge/` -- benchmark scenarios for linkset aggregation performance.
+- **LinkNotMerge VEX Benchmark**: `src/Bench/StellaOps.Bench/LinkNotMerge.Vex/StellaOps.Bench.LinkNotMerge.Vex/` -- VEX-specific linkset benchmarks.
+- **Notify Benchmark**: `src/Bench/StellaOps.Bench/Notify/StellaOps.Bench.Notify/` -- notification dispatch benchmarks.
+- **PolicyEngine Benchmark**: `src/Bench/StellaOps.Bench/PolicyEngine/StellaOps.Bench.PolicyEngine/` -- policy evaluation benchmarks.
+- **PolicyEngine Benchmark Policy**: `src/Bench/StellaOps.Bench/PolicyEngine/policies/benchmark-default.yaml` -- benchmark policy fixture compatible with current `StellaOps.Policy` binder schema.
+- **Scanner.Analyzers Benchmark**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/` -- scanner analyzer benchmarks.
+- **Baseline Infrastructure**: benchmark modules include `Baseline/BaselineEntry.cs` and `Baseline/BaselineLoader.cs` for ground-truth comparison.
+- **Reporting Infrastructure**: benchmark modules include JSON and Prometheus writers for machine-consumable artifacts.
+- **Tests**: link-not-merge, vex, notify, and scanner-analyzer benchmark test suites under `src/Bench/StellaOps.Bench/**.Tests/`.
+
+## E2E Test Plan
+- [x] Run LinkNotMerge benchmark harness and verify scenario table output is generated.
+- [x] Verify JSON report output is produced and non-empty.
+- [x] Verify Prometheus metrics output is produced and non-empty.
+- [x] Verify CSV result output is produced and non-empty.
+- [x] Verify negative-path CLI behavior (`--config` missing path) exits non-zero.
+
+## Verification
+- Verified on 2026-02-11 via FLOW Tier 0/1/2 replay in `run-005`.
+- Tier 0: `docs/qa/feature-checks/runs/bench/benchmark-harness/run-005/tier0-source-check.json`
+- Tier 1: `docs/qa/feature-checks/runs/bench/benchmark-harness/run-005/tier1-build-check.json`
+- Tier 2: `docs/qa/feature-checks/runs/bench/benchmark-harness/run-005/tier2-integration-check.json`
+- Tier 2 evidence: `docs/qa/feature-checks/runs/bench/benchmark-harness/run-005/evidence/`
--- a/docs/features/checked/bench/reachability-benchmarks-with-ground-truth-datasets.md
+++ b/docs/features/checked/bench/reachability-benchmarks-with-ground-truth-datasets.md
@@ -0,0 +1,37 @@
+# Reachability benchmarks with ground-truth datasets
+
+## Module
+Bench
+
+## Status
+VERIFIED
+
+## Description
+Reachability benchmark suite with ground-truth datasets (Java Log4j, C# reachable/dead-code, native ELF), schema validation, and signal-level ground-truth validators.
+
+## Implementation Details
+- **Scanner.Analyzers Benchmark**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/` -- benchmark runner for scanner analyzers against ground-truth datasets. Key files: `ScenarioRunners.cs` (orchestrates benchmark scenarios against corpus data), `NodeBenchMetrics.cs` (captures per-node precision/recall metrics), `BenchmarkConfig.cs` (configures which datasets and analyzers to run).
+- **Baseline Infrastructure**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs` (ground-truth entry model), `BaselineLoader.cs` (loads ground-truth datasets from fixture files).
+- **Reporting**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkJsonWriter.cs` (JSON output), `BenchmarkScenarioReport.cs` (report with precision/recall/F1), `PrometheusWriter.cs` (metric export).
+- **Tests**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/BaselineLoaderTests.cs`, `BenchmarkJsonWriterTests.cs`, `BenchmarkScenarioReportTests.cs`, `PrometheusWriterTests.cs`
+
+## E2E Test Plan
+- [ ] Load a Java Log4j ground-truth dataset via `BaselineLoader` and run the scanner analyzer benchmark; verify precision and recall metrics are computed against the ground truth
+- [ ] Load a C# reachable/dead-code ground-truth dataset and verify the benchmark correctly classifies true positives, false positives, and false negatives
+- [ ] Run the benchmark with a native ELF dataset and verify the `NodeBenchMetrics` captures per-node accuracy
+- [ ] Verify JSON report output contains precision, recall, F1 score, and per-scenario timing data
+- [ ] Verify that modifying the ground-truth baseline to include additional entries causes the benchmark to report new false negatives
+- [ ] Verify Prometheus metrics export includes labeled gauges for precision and recall per dataset
+
+## Verification
+- **Verified**: 2026-02-11
+- **Method**: Tier 0 source verification + Tier 1 build/test + Tier 2 behavioral CLI benchmark replay
+- **Build**: PASS (`src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj`)
+- **Tests**: PASS (`src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers.Tests/StellaOps.Bench.ScannerAnalyzers.Tests.csproj`: 15/15)
+- **Tier 0 Evidence**: `docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier0-source-check.json`
+- **Tier 1 Evidence**: `docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier1-build-check.json`
+- **Tier 2 Evidence**: `docs/qa/feature-checks/runs/bench/reachability-benchmarks-with-ground-truth-datasets/run-002/tier2-integration-check.json`
+
+## Retest Notes
+- **Initial failure (run-001)**: Tier 2 CLI execution failed because the analyzer IDs in the benchmark config could not be instantiated by `ScenarioRunnerFactory`.
+- **Fix and retest (run-002)**: Added analyzer factory mappings and tests, then reran Tier 0/1/2 with fresh artifacts and a passing verdict.
--- a/docs/features/checked/bench/vendor-comparison-scanner-parity-tracking.md
+++ b/docs/features/checked/bench/vendor-comparison-scanner-parity-tracking.md
@@ -0,0 +1,31 @@
+# Vendor comparison / scanner parity tracking
+
+## Module
+Bench
+
+## Status
+VERIFIED
+
+## Description
+Scanner analyzer benchmarks track vendor parity through benchmark reports and metric exports. Behavioral verification confirmed that parity-report fields are emitted in the benchmark JSON output and that the CLI exits non-zero when the configuration is missing.
+
+## What's Implemented
+- **Scanner Analyzers Benchmark**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/` -- benchmark harness evaluating analyzer scenarios and recording metrics.
+- **Baseline Loader**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineLoader.cs` -- loads baseline data for benchmark comparisons.
+- **Baseline Entry**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs` -- baseline model.
+- **Benchmark Scenario Report**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkScenarioReport.cs` -- per-scenario report model including regression and parity fields.
+- **Benchmark JSON Writer**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkJsonWriter.cs` -- JSON report writer.
+- **Prometheus Writer**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/PrometheusWriter.cs` -- Prometheus metrics exporter.
+- **Vendor Parity Analyzer**: `src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/VendorParityAnalyzer.cs` -- computes vendor parity projections where vendor fixtures are available.
+
+## E2E Test Plan
+- [x] Run scanner-analyzers benchmark harness and verify JSON/Prometheus/CSV outputs are generated.
+- [x] Validate benchmark JSON output contains `vendorParity` fields in scenario reports.
+- [x] Verify baseline/regression metadata is emitted in benchmark JSON.
+- [x] Verify negative-path behavior with missing config returns non-zero exit code.
+
+## Verification
+- Verified on 2026-02-11 via FLOW Tier 0/1/2 replay in `run-001`.
+- Tier 0: `docs/qa/feature-checks/runs/bench/vendor-comparison-scanner-parity-tracking/run-001/tier0-source-check.json`
+- Tier 1: `docs/qa/feature-checks/runs/bench/vendor-comparison-scanner-parity-tracking/run-001/tier1-build-check.json`
+- Tier 2: `docs/qa/feature-checks/runs/bench/vendor-comparison-scanner-parity-tracking/run-001/tier2-integration-check.json`