# Scanner Analyzer Benchmarks – Operations Guide
## Purpose
Keep the language analyzer microbenchmarks within the < 5 s SBOM pledge. CI emits Prometheus metrics and JSON fixtures so trend dashboards and alerts stay in lockstep with the repository baseline.
> **Grafana note:** Import `docs/ops/scanner-analyzers-grafana-dashboard.json` into your Prometheus-backed Grafana stack to monitor `scanner_analyzer_bench_*` metrics and alert on regressions.
## Publishing workflow
1. CI (or an engineer running locally) executes:
   ```bash
   dotnet run \
     --project src/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj \
     -- \
     --repo-root . \
     --out src/StellaOps.Bench/Scanner.Analyzers/baseline.csv \
     --json out/bench/scanner-analyzers/latest.json \
     --prom out/bench/scanner-analyzers/latest.prom \
     --commit "$(git rev-parse HEAD)" \
     --environment "${CI_ENVIRONMENT_NAME:-local}"
   ```
2. Publish the artefacts (`baseline.csv`, `latest.json`, `latest.prom`) to `bench-artifacts/<date>/`.
3. Promtail (or the CI job) pushes `latest.prom` into Prometheus; JSON lands in long-term storage for workbook snapshots.
4. The harness exits non-zero if:
- `max_ms` for any scenario breaches its configured threshold; or
- `max_ms` regresses ≥ 20 % versus `baseline.csv`.
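The publish and gate steps above can be sketched as a shell script. This is an illustrative assumption, not the harness's actual implementation: the artefact paths are taken from the steps above, while `check_regression` merely mirrors the 20 % rule the harness enforces itself.

```shell
#!/usr/bin/env sh
# Illustrative sketch of steps 2-4; the helper below is an assumption,
# the real threshold check lives inside the bench harness.

DATE=$(date -u +%Y-%m-%d)
DEST="bench-artifacts/$DATE"
mkdir -p "$DEST"

# Step 2: publish artefacts (guarded so the sketch also runs standalone).
for f in src/StellaOps.Bench/Scanner.Analyzers/baseline.csv \
         out/bench/scanner-analyzers/latest.json \
         out/bench/scanner-analyzers/latest.prom; do
  if [ -f "$f" ]; then cp "$f" "$DEST/"; fi
done

# Step 4, second rule: fail (non-zero exit) when max_ms regresses
# by 20 % or more versus the baseline value for the same scenario.
check_regression() {
  # $1 = baseline max_ms, $2 = current max_ms
  awk -v b="$1" -v c="$2" 'BEGIN { exit (c / b >= 1.20) ? 1 : 0 }'
}
```

For example, `check_regression 100 125` fails (ratio 1.25 ≥ 1.20) while `check_regression 100 119` passes.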
## Grafana dashboard
- Import `docs/ops/scanner-analyzers-grafana-dashboard.json`.
- Point the template variable `datasource` to the Prometheus instance ingesting `scanner_analyzer_bench_*` metrics.
- Panels:
- **Max Duration (ms)** – compares live runs vs baseline.
- **Regression Ratio vs Limit** – plots `(max / baseline_max - 1) * 100`.
- **Breached Scenarios** – stat panel sourced from `scanner_analyzer_bench_regression_breached`.
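Assuming `scanner_analyzer_bench_regression_ratio` is exported as `max / baseline_max`, the **Regression Ratio vs Limit** panel can be driven by a query along these lines (a sketch, not the dashboard's exact JSON):

```
(scanner_analyzer_bench_regression_ratio - 1) * 100
```

This yields the percentage regression directly, so a horizontal threshold line at 20 marks the breach limit.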
## Alerting & on-call response
- **Primary alert**: fire when `scanner_analyzer_bench_regression_ratio{scenario=~".+"} >= 1.20` for 2 consecutive samples (10 min default). Suggested PromQL:
  ```
  max_over_time(scanner_analyzer_bench_regression_ratio[10m]) >= 1.20
  ```
- Suppress duplicates using the `scenario` label.
- Pager payload should include `scenario`, `max_ms`, `baseline_max_ms`, and `commit`.
- Immediate triage steps:
1. Check the `latest.json` artefact for the failing scenario – confirm the commit and environment.
2. Re-run the harness with `--captured-at` and `--baseline` pointing at the last known good CSV to verify determinism.
3. If regression persists, open an incident ticket tagged `scanner-analyzer-perf` and page the owning language guild.
4. Roll back the offending change or update the baseline after sign-off from the guild lead and Perf captain.
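The primary alert above can be encoded as a Prometheus alerting rule. The sketch below uses the suggested PromQL; the group name, `for` duration, and `severity` label are assumptions to adapt to your Alertmanager routing:

```yaml
groups:
  - name: scanner-analyzer-bench
    rules:
      - alert: ScannerAnalyzerBenchRegression
        expr: max_over_time(scanner_analyzer_bench_regression_ratio[10m]) >= 1.20
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Analyzer bench regression in scenario {{ $labels.scenario }}"
          description: "max_ms is >= 20 % above baseline for {{ $labels.scenario }}"
```

Grouping by the `scenario` label in Alertmanager keeps repeated samples from the same scenario to a single page.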
Document the outcome in `docs/12_PERFORMANCE_WORKBOOK.md` (section 8) so trendlines reflect any accepted regressions.