Scanner Analyzer Benchmarks – Operations Guide

Purpose

Keep the language analyzer microbench within the < 5 s SBOM pledge. CI emits Prometheus metrics and JSON fixtures so that trend dashboards and alerts stay in lockstep with the repository baseline.

Grafana note: Import docs/ops/scanner-analyzers-grafana-dashboard.json into your Prometheus-backed Grafana stack to monitor scanner_analyzer_bench_* metrics and alert on regressions.

Publishing workflow

CI (or an engineer running locally) executes:

dotnet run \
  --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj \
  -- \
  --repo-root . \
  --out src/Bench/StellaOps.Bench/Scanner.Analyzers/baseline.csv \
  --json out/bench/scanner-analyzers/latest.json \
  --prom out/bench/scanner-analyzers/latest.prom \
  --commit "$(git rev-parse HEAD)" \
  --environment "${CI_ENVIRONMENT_NAME:-local}"

Publish the artefacts (baseline.csv, latest.json, latest.prom) to bench-artifacts/<date>/.
Promtail (or the CI job) pushes latest.prom into Prometheus; JSON lands in long-term storage for workbook snapshots.
The harness exits non-zero if:
- max_ms for any scenario breaches its configured threshold; or
- max_ms regresses ≥ 20 % versus baseline.csv.
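
The two exit conditions above can be sketched as follows. This is a minimal illustration, not the harness's actual implementation: the function name `gate_failures`, its parameters, the strict `>` comparison for the threshold, and the reading of "regresses ≥ 20 %" as `max_ms / baseline_max_ms >= 1.20` are all assumptions.

```python
from typing import List, Optional

REGRESSION_LIMIT = 1.20  # assumed: "≥ 20 %" expressed as a ratio to baseline


def gate_failures(max_ms: float,
                  threshold_ms: float,
                  baseline_max_ms: Optional[float],
                  limit: float = REGRESSION_LIMIT) -> List[str]:
    """Return the reasons a scenario would fail; an empty list means pass.

    Hypothetical helper: names and comparison operators are illustrative.
    """
    failures: List[str] = []
    # Condition 1: max_ms breaches the scenario's configured threshold.
    if max_ms > threshold_ms:
        failures.append("threshold")
    # Condition 2: max_ms regressed by the limit or more versus the baseline.
    # Skipped when no baseline entry exists for the scenario.
    if baseline_max_ms and max_ms / baseline_max_ms >= limit:
        failures.append("regression")
    return failures


if __name__ == "__main__":
    # 650 ms against a 500 ms baseline is a 30 % regression.
    print(gate_failures(650.0, 1000.0, 500.0))
```

A caller would exit non-zero whenever any scenario returns a non-empty list.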

Grafana dashboard

Import docs/ops/scanner-analyzers-grafana-dashboard.json.
Point the template variable datasource to the Prometheus instance ingesting scanner_analyzer_bench_* metrics.
Panels:
- Max Duration (ms) – compares live runs vs baseline.
- Regression Ratio vs Limit – plots (max / baseline_max - 1) * 100.
- Breached Scenarios – stat panel sourced from scanner_analyzer_bench_regression_breached.
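
As a worked example of the Regression Ratio panel formula, the sketch below converts a run into the plotted percentage. It assumes the ratio metric is max / baseline_max, so that the 1.20 alert limit corresponds to 20 % on the panel; the function names are hypothetical.

```python
def regression_percent(max_ms: float, baseline_max_ms: float) -> float:
    """Panel formula from the list above: (max / baseline_max - 1) * 100."""
    return (max_ms / baseline_max_ms - 1.0) * 100.0


def ratio_to_percent(ratio: float) -> float:
    """Convert a ratio (assumed to be max / baseline_max) to panel percent."""
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # A 750 ms run against a 500 ms baseline plots as a 50 % regression.
    print(regression_percent(750.0, 500.0))
    # The 1.20 limit plots at roughly 20 % (subject to float rounding).
    print(round(ratio_to_percent(1.20), 6))
```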

Alerting & on-call response

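Primary alert: fire when scanner_analyzer_bench_regression_ratio{scenario=~".+"} >= 1.20 for 2 consecutive samples (10 min default). Suggested PromQL (note that max_over_time matches as soon as any single sample in the 10-minute window reaches 1.20; use min_over_time instead if every sample in the window must breach):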
```
max_over_time(scanner_analyzer_bench_regression_ratio[10m]) >= 1.20
```
Suppress duplicates using the scenario label.
Pager payload should include scenario, max_ms, baseline_max_ms, and commit.
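
The alert rule, per-scenario de-duplication, and pager payload described above can be modelled offline as in the sketch below. It is only an illustration under stated assumptions: the sample data layout, the function names, and the scenario names in the usage example are hypothetical, and the real evaluation happens in Prometheus and the alerting stack.

```python
from typing import Dict, List

LIMIT = 1.20       # regression ratio limit from the alert rule
CONSECUTIVE = 2    # consecutive breaching samples required


def breached_scenarios(samples: Dict[str, List[float]],
                       limit: float = LIMIT,
                       consecutive: int = CONSECUTIVE) -> List[str]:
    """Return scenarios whose most recent `consecutive` samples are all >= limit.

    `samples` maps a scenario label to its ratio samples, oldest first
    (an assumed layout). Each scenario appears at most once in the result,
    which is what keying suppression on the scenario label achieves.
    """
    firing: List[str] = []
    for scenario, ratios in sorted(samples.items()):
        recent = ratios[-consecutive:]
        if len(recent) == consecutive and all(r >= limit for r in recent):
            firing.append(scenario)
    return firing


def pager_payload(scenario: str, max_ms: float,
                  baseline_max_ms: float, commit: str) -> Dict[str, object]:
    """Build a payload carrying the four fields the guide asks for."""
    return {
        "scenario": scenario,
        "max_ms": max_ms,
        "baseline_max_ms": baseline_max_ms,
        "commit": commit,
    }


if __name__ == "__main__":
    # Hypothetical scenario names and samples.
    history = {"node": [1.0, 1.25, 1.3], "java": [1.3, 1.0], "python": [1.5]}
    print(breached_scenarios(history))
```

Only a scenario with two trailing breaches fires here; a single spike or a recovered scenario does not.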
Immediate triage steps:
1. Check latest.json artefact for the failing scenario – confirm commit and environment.
2. Re-run the harness with --captured-at and --baseline pointing at the last known good CSV to verify determinism.
3. If regression persists, open an incident ticket tagged scanner-analyzer-perf and page the owning language guild.
4. Roll back the offending change or update the baseline after sign-off from the guild lead and Perf captain.

Document the outcome in docs/12_PERFORMANCE_WORKBOOK.md (section 8) so trendlines reflect any accepted regressions.
