Vendor comparison / scanner parity tracking

Module

Bench

Status

IMPLEMENTED

Description

Scanner analyzer benchmarks and golden-set diff comparisons exist, but neither the dedicated vendor-comparison dashboard nor the automated parity scoring system described in the advisory was found in the source tree.

What's Implemented

  • Scanner Analyzers Benchmark: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/ -- benchmark harness that evaluates scanner analyzers against ground-truth datasets, computing precision, recall, and F1 metrics per scanner.
  • Baseline Loader: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineLoader.cs -- loads ground-truth baseline data for benchmark comparison, enabling diff detection between scanner runs.
  • Baseline Entry: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Baseline/BaselineEntry.cs -- data model for a single baseline entry with expected findings, labels, and metadata.
  • Benchmark Scenario Report: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkScenarioReport.cs -- produces per-scenario benchmark reports with precision/recall/F1 breakdowns.
  • Benchmark JSON Writer: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/BenchmarkJsonWriter.cs -- serializes benchmark results to JSON for CI consumption and historical tracking.
  • Prometheus Writer: src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/Reporting/PrometheusWriter.cs -- exports benchmark metrics to Prometheus format for dashboard visualization.
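
To make the metrics named above concrete, here is a minimal sketch of how precision, recall, and F1 can be computed from a scanner's reported findings and a ground-truth set. It is written in Python for brevity rather than the project's C#, and it is not taken from the harness: `Metrics`, `score`, and the finding identifiers are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float


def score(reported: set[str], expected: set[str]) -> Metrics:
    """Compare reported finding IDs against ground-truth finding IDs."""
    true_pos = len(reported & expected)
    # Guard against empty sets so an empty scan does not divide by zero.
    precision = true_pos / len(reported) if reported else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(precision, recall, f1)


# Hypothetical finding identifiers, for illustration only.
expected = {"finding-a", "finding-b", "finding-c", "finding-d"}
reported = {"finding-a", "finding-b", "finding-x"}

m = score(reported, expected)
print(f"precision={m.precision:.3f} recall={m.recall:.3f} f1={m.f1:.3f}")
# prints: precision=0.667 recall=0.500 f1=0.571
```

Treating findings as set members assumes each finding has a stable identifier; how the actual harness keys its baseline entries is not stated here.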

What's Missing

  • Vendor Comparison Dashboard: No dedicated UI or API endpoint exists for side-by-side vendor scanner comparison. Current benchmarks evaluate StellaOps scanners against ground truth, but do not compare against third-party vendor scanner outputs.
  • Automated Parity Scoring: No automated system computes a parity score between StellaOps scanner results and vendor scanner results (e.g., Snyk, Grype, Trivy) for the same input images.
  • Vendor Result Ingestion: No ingestion pipeline exists to import vendor scanner outputs (SARIF, JSON) as comparison baselines alongside StellaOps results.
  • Regression Tracking Dashboard: While PrometheusWriter exports metrics, no pre-built Grafana dashboard or equivalent exists for tracking scanner parity over time.
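
One possible definition of the missing parity score is the Jaccard overlap between two scanners' finding sets for the same image, reported alongside the counts unique to each side. The sketch below assumes findings are already normalized to `(package, vulnerability_id)` pairs. The function name, the keying scheme, and the sample data are all hypothetical; the advisory may define parity differently.

```python
Finding = tuple[str, str]  # (package, vulnerability_id), an assumed key


def parity(stella: set[Finding], vendor: set[Finding]) -> dict[str, float]:
    """Summarize agreement between two scanners on the same input image."""
    shared = stella & vendor
    union = stella | vendor
    return {
        "shared": len(shared),
        "only_stella": len(stella - vendor),
        "only_vendor": len(vendor - stella),
        # Jaccard index; two empty result sets count as full agreement.
        "parity": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical findings, for illustration only.
stella = {("openssl", "V-1"), ("zlib", "V-2"), ("curl", "V-3")}
vendor = {("openssl", "V-1"), ("curl", "V-3"), ("bash", "V-4")}

print(parity(stella, vendor))
# prints: {'shared': 2, 'only_stella': 1, 'only_vendor': 1, 'parity': 0.5}
```

A symmetric measure like this treats neither scanner as ground truth, which fits a vendor comparison where no labeled baseline is available.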

Implementation Plan

  • Add a vendor result ingestion pipeline that imports SARIF/JSON from third-party scanners and normalizes findings to a common schema
  • Extend BenchmarkScenarioReport to include vendor comparison columns (StellaOps findings vs. vendor findings, findings unique to each, and overlap percentage)
  • Build an automated parity scoring system that computes agreement/disagreement rates between scanner outputs
  • Create a dashboard (Grafana or Web UI) for visualizing parity trends over time

Related Documentation

  • Scanner benchmark infrastructure: src/Bench/StellaOps.Bench/Scanner.Analyzers/
  • Reachability benchmark datasets: src/__Tests/__Benchmarks/reachability-benchmark/
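
The ingestion step in the implementation plan could start from something like the following sketch, which flattens a SARIF 2.1.0 document into a small common record. The field paths (`runs[].tool.driver.name`, `runs[].results[].ruleId`, and `locations[].physicalLocation.artifactLocation.uri`) come from the SARIF format. The `Finding` schema, the function name, and the sample document are assumptions made for illustration, and Python is used for brevity rather than the project's C#.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    tool: str       # scanner that produced the result
    rule_id: str    # SARIF ruleId, often a vulnerability identifier
    location: str   # URI of the first reported artifact, if any


def normalize_sarif(document: dict) -> list[Finding]:
    """Flatten every result in every SARIF run into Finding records."""
    findings = []
    for run in document.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # Results may omit locations; fall back to an empty URI.
            locations = result.get("locations") or [{}]
            uri = (
                locations[0]
                .get("physicalLocation", {})
                .get("artifactLocation", {})
                .get("uri", "")
            )
            findings.append(Finding(tool, result.get("ruleId", ""), uri))
    return findings


# Minimal hypothetical SARIF document, for illustration only.
sample = json.loads("""
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "example-scanner"}},
    "results": [
      {"ruleId": "RULE-1",
       "message": {"text": "example"},
       "locations": [{"physicalLocation":
                      {"artifactLocation": {"uri": "usr/lib/libexample.so"}}}]},
      {"ruleId": "RULE-2", "message": {"text": "no location"}}
    ]
  }]
}
""")

for finding in normalize_sarif(sample):
    print(finding)
```

Vendor-specific JSON formats would each need their own adapter that produces the same record type, so the scoring code only ever sees one schema.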