Scanner Analyzer Benchmarks – Operations Guide
Purpose
Keep the language analyzer microbench under the < 5 s SBOM pledge. CI emits Prometheus metrics and JSON fixtures so trend dashboards and alerts stay in lockstep with the repository baseline.
Grafana note: Import docs/ops/scanner-analyzers-grafana-dashboard.json into your Prometheus-backed Grafana stack to monitor scanner_analyzer_bench_* metrics and alert on regressions.
Publishing workflow
- CI (or engineers running locally) execute:

      dotnet run \
        --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj \
        -- \
        --repo-root . \
        --out src/Bench/StellaOps.Bench/Scanner.Analyzers/baseline.csv \
        --json out/bench/scanner-analyzers/latest.json \
        --prom out/bench/scanner-analyzers/latest.prom \
        --commit "$(git rev-parse HEAD)" \
        --environment "${CI_ENVIRONMENT_NAME:-local}"

- Publish the artefacts (baseline.csv, latest.json, latest.prom) to bench-artifacts/<date>/.
- Promtail (or the CI job) pushes latest.prom into Prometheus; JSON lands in long-term storage for workbook snapshots.
- The harness exits non-zero if:
  - max_ms for any scenario breaches its configured threshold; or
  - max_ms regresses ≥ 20 % versus baseline.csv.
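The two exit conditions above can be sketched as a small gate. This is an illustrative Python sketch, not the harness's actual code (the harness is a .NET project): the `ScenarioResult` type, its field names, and the strict `>` comparison for the threshold check are assumptions made for the example.

```python
from dataclasses import dataclass

REGRESSION_LIMIT = 1.20  # max_ms at or above 120 % of baseline counts as a regression


@dataclass
class ScenarioResult:
    # Hypothetical record; field names are assumptions, not the harness schema.
    scenario: str
    max_ms: float
    threshold_ms: float     # configured per-scenario ceiling
    baseline_max_ms: float  # value recorded in baseline.csv


def breach_reasons(result: ScenarioResult) -> list[str]:
    """Return which of the two documented conditions a scenario violates."""
    reasons = []
    # Condition 1: max_ms breaches the configured threshold (assumed strict).
    if result.max_ms > result.threshold_ms:
        reasons.append("threshold")
    # Condition 2: max_ms regresses by 20 % or more versus the baseline.
    if (result.baseline_max_ms > 0
            and result.max_ms / result.baseline_max_ms >= REGRESSION_LIMIT):
        reasons.append("regression")
    return reasons


def exit_code(results: list[ScenarioResult]) -> int:
    """Non-zero when any scenario violates either condition."""
    return 1 if any(breach_reasons(r) for r in results) else 0
```

A scenario can stay under its absolute threshold and still fail the gate, which is why both checks are evaluated independently.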
Grafana dashboard
- Import docs/ops/scanner-analyzers-grafana-dashboard.json.
- Point the template variable datasource to the Prometheus instance ingesting scanner_analyzer_bench_* metrics.
- Panels:
  - Max Duration (ms) – compares live runs vs baseline.
  - Regression Ratio vs Limit – plots (max / baseline_max - 1) * 100.
  - Breached Scenarios – stat panel sourced from scanner_analyzer_bench_regression_breached.
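As a worked example of the Regression Ratio vs Limit expression, the sketch below evaluates (max / baseline_max - 1) * 100 in Python. The function name and the sample values are hypothetical; only the formula comes from the panel description.

```python
def regression_percent(max_ms: float, baseline_max_ms: float) -> float:
    """Percentage change of max_ms relative to the baseline maximum."""
    if baseline_max_ms <= 0:
        raise ValueError("baseline_max_ms must be positive")
    return (max_ms / baseline_max_ms - 1) * 100


# Hypothetical values: a run at 1200 ms against a 1000 ms baseline
# is roughly 20 % slower (floating-point rounding aside).
print(round(regression_percent(1200.0, 1000.0), 2))
```

Assuming the ratio metric is defined as max / baseline_max, a ratio of 1.20 corresponds to 20 % on this panel.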
Alerting & on-call response
- Primary alert: fire when scanner_analyzer_bench_regression_ratio{scenario=~".+"} >= 1.20 for 2 consecutive samples (10 min default). Suggested PromQL:

      max_over_time(scanner_analyzer_bench_regression_ratio[10m]) >= 1.20

- Suppress duplicates using the scenario label.
- Pager payload should include scenario, max_ms, baseline_max_ms, and commit.
- Immediate triage steps:
  - Check latest.json artefact for the failing scenario – confirm commit and environment.
  - Re-run the harness with --captured-at and --baseline pointing at the last known good CSV to verify determinism.
  - If regression persists, open an incident ticket tagged scanner-analyzer-perf and page the owning language guild.
  - Roll back the offending change or update the baseline after sign-off from the guild lead and Perf captain.
- Document the outcome in docs/12_PERFORMANCE_WORKBOOK.md (section 8) so trendlines reflect any accepted regressions.
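The alert rule, the per-scenario deduplication, and the pager payload described in this section can be sketched together as follows. This is a minimal Python illustration under stated assumptions: the sample dictionaries, their keys, and the `build_alerts` helper are hypothetical and do not reflect any Alertmanager or harness schema. It implements the prose rule (two consecutive samples at or above 1.20); a `max_over_time` expression would instead match if any single sample in the window reached the limit.

```python
from collections import defaultdict

ALERT_RATIO = 1.20
CONSECUTIVE_SAMPLES = 2


def build_alerts(samples: list[dict]) -> list[dict]:
    """Return at most one pager payload per scenario.

    `samples` is assumed to be in chronological order, each entry holding
    scenario, ratio, max_ms, baseline_max_ms, and commit (hypothetical keys).
    """
    # Group by the scenario label so duplicates collapse into one alert.
    by_scenario = defaultdict(list)
    for sample in samples:
        by_scenario[sample["scenario"]].append(sample)

    alerts = []
    for scenario, history in by_scenario.items():
        recent = history[-CONSECUTIVE_SAMPLES:]
        # Fire only when the latest two samples are both at or above the limit.
        if (len(recent) == CONSECUTIVE_SAMPLES
                and all(s["ratio"] >= ALERT_RATIO for s in recent)):
            latest = recent[-1]
            alerts.append({
                "scenario": scenario,
                "max_ms": latest["max_ms"],
                "baseline_max_ms": latest["baseline_max_ms"],
                "commit": latest["commit"],
            })
    return alerts
```

Building the payload from the most recent sample keeps the reported commit aligned with the run that confirmed the regression.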