# Scanner Analyzer Benchmarks – Operations Guide

## Purpose
Keep the language analyzer microbench under the < 5 s SBOM pledge. CI emits Prometheus metrics and JSON fixtures so trend dashboards and alerts stay in lockstep with the repository baseline.

> **Grafana note:** Import `docs/modules/scanner/operations/analyzers-grafana-dashboard.json` into your Prometheus-backed Grafana stack to monitor `scanner_analyzer_bench_*` metrics and alert on regressions.

## Publishing workflow
1. CI (or an engineer running locally) executes:
   ```bash
   dotnet run \
     --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj \
     -- \
     --repo-root . \
     --out src/Bench/StellaOps.Bench/Scanner.Analyzers/baseline.csv \
     --json out/bench/scanner-analyzers/latest.json \
     --prom out/bench/scanner-analyzers/latest.prom \
     --commit "$(git rev-parse HEAD)" \
     --environment "${CI_ENVIRONMENT_NAME:-local}"
   ```
2. Publish the artefacts (`baseline.csv`, `latest.json`, `latest.prom`) to `bench-artifacts/<date>/`.
3. Promtail (or the CI job) pushes `latest.prom` into Prometheus; JSON lands in long-term storage for workbook snapshots.
4. The harness exits non-zero if:
   - `max_ms` for any scenario breaches its configured threshold; or
   - `max_ms` regresses ≥ 20 % versus `baseline.csv`.

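The exit rule in step 4 can be sketched in shell as follows. This only illustrates the two conditions and is not the harness's actual code: the function name `bench_gate` and its argument order are hypothetical, a "breach" is assumed to mean strictly above the threshold, and `awk` handles the floating-point comparison.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the step 4 exit rule; the real check lives in the harness.
# Usage: bench_gate <max_ms> <threshold_ms> <baseline_max_ms>
# Returns 0 when the scenario passes and 1 when either condition trips.
bench_gate() {
  local max_ms="$1" threshold_ms="$2" baseline_max_ms="$3"
  awk -v max="$max_ms" -v thr="$threshold_ms" -v base="$baseline_max_ms" 'BEGIN {
    if (max > thr) exit 1                        # assumed: "breach" means strictly above
    if (base > 0 && max / base >= 1.20) exit 1   # regression of 20 % or more
    exit 0
  }'
}

# Example: 130 ms against a 100 ms baseline is a 30 % regression, so the gate fails.
if bench_gate 130 5000 100; then echo "pass"; else echo "fail"; fi
```

A scenario with no usable baseline (`baseline_max_ms` of 0) is only checked against its absolute threshold in this sketch; whether the harness behaves the same way is not stated here.
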
## Grafana dashboard
- Import `docs/modules/scanner/operations/analyzers-grafana-dashboard.json`.
- Point the template variable `datasource` to the Prometheus instance ingesting `scanner_analyzer_bench_*` metrics.
- Panels:
  - **Max Duration (ms)** – compares live runs vs baseline.
  - **Regression Ratio vs Limit** – plots `(max / baseline_max - 1) * 100`.
  - **Breached Scenarios** – stat panel sourced from `scanner_analyzer_bench_regression_breached`.

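As a worked example of the **Regression Ratio vs Limit** expression, the helper below evaluates `(max / baseline_max - 1) * 100` for a single scenario. The function name `regression_pct` and the sample numbers are made up for illustration; the dashboard itself computes this from the exported metrics.

```bash
# Hypothetical helper mirroring the panel expression (max / baseline_max - 1) * 100.
# Usage: regression_pct <max_ms> <baseline_max_ms>   (baseline must be non-zero)
regression_pct() {
  awk -v max="$1" -v base="$2" 'BEGIN { printf "%.1f\n", (max / base - 1) * 100 }'
}

# A 1500 ms run against a 1200 ms baseline.
regression_pct 1500 1200   # prints 25.0
```

A panel value of 25.0 corresponds to a ratio of 1.25, which is above the 1.20 limit used by the regression check.
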
## Alerting & on-call response
- **Primary alert**: fire when `scanner_analyzer_bench_regression_ratio{scenario=~".+"} >= 1.20` for 2 consecutive samples (10 min default). Suggested PromQL (note that `max_over_time` matches when any sample in the 10 min window reaches the limit; `min_over_time` would require every sample in the window to do so):
  ```
  max_over_time(scanner_analyzer_bench_regression_ratio[10m]) >= 1.20
  ```
- Suppress duplicates using the `scenario` label.
- Pager payload should include `scenario`, `max_ms`, `baseline_max_ms`, and `commit`.
- Immediate triage steps:
  1. Check the `latest.json` artefact for the failing scenario – confirm commit and environment.
  2. Re-run the harness with `--captured-at` and `--baseline` pointing at the last known good CSV to verify determinism.
  3. If the regression persists, open an incident ticket tagged `scanner-analyzer-perf` and page the owning language guild.
  4. Roll back the offending change or update the baseline after sign-off from the guild lead and Perf captain.

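The four recommended pager fields could be rendered as a small JSON document along these lines. This is a sketch under assumptions: the function name `pager_payload`, the JSON shape, and the sample scenario name are hypothetical, and the values are assumed to contain no characters that need JSON escaping. Adapt it to whatever format your paging tool expects.

```bash
# Hypothetical sketch: render the recommended pager fields as one JSON line.
# Usage: pager_payload <scenario> <max_ms> <baseline_max_ms> <commit>
# Assumes no value contains quotes, backslashes, or control characters.
pager_payload() {
  local scenario="$1" max_ms="$2" baseline_max_ms="$3" commit="$4"
  printf '{"scenario":"%s","max_ms":%s,"baseline_max_ms":%s,"commit":"%s"}\n' \
    "$scenario" "$max_ms" "$baseline_max_ms" "$commit"
}

# Example with made-up values.
pager_payload example_scenario 130 100 abc123
```
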
Document the outcome in `docs/12_PERFORMANCE_WORKBOOK.md` (section 8) so trendlines reflect any accepted regressions.