Scanner Analyzer Benchmarks Operations Guide

Purpose

Keep the language analyzer microbench within the <5s SBOM pledge. CI emits Prometheus metrics and JSON fixtures so trend dashboards and alerts stay in lockstep with the repository baseline.
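
For orientation, latest.prom is plain Prometheus text-exposition output with one sample per scenario. The lines below are an illustrative sketch only: the scenario label values, the numbers, and the max_ms/baseline_max_ms metric names are assumptions, while scanner_analyzer_bench_regression_ratio and scanner_analyzer_bench_regression_breached are the metrics referenced elsewhere in this guide.

    # Illustrative sample - verify metric names against a real latest.prom.
    scanner_analyzer_bench_max_ms{scenario="node-monorepo"} 3120
    scanner_analyzer_bench_baseline_max_ms{scenario="node-monorepo"} 3000
    scanner_analyzer_bench_regression_ratio{scenario="node-monorepo"} 1.04
    scanner_analyzer_bench_regression_breached{scenario="node-monorepo"} 0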

Grafana note: Import docs/modules/scanner/operations/analyzers-grafana-dashboard.json into your Prometheus-backed Grafana stack to monitor scanner_analyzer_bench_* metrics and alert on regressions.

Publishing workflow

  1. CI (or an engineer running locally) runs:
    dotnet run \
      --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj \
      -- \
      --repo-root . \
      --out src/Bench/StellaOps.Bench/Scanner.Analyzers/baseline.csv \
      --json out/bench/scanner-analyzers/latest.json \
      --prom out/bench/scanner-analyzers/latest.prom \
      --commit "$(git rev-parse HEAD)" \
      --environment "${CI_ENVIRONMENT_NAME:-local}"
    
  2. Publish the artefacts (baseline.csv, latest.json, latest.prom) to bench-artifacts/<date>/; a wrapper covering steps 1 and 2 is sketched after this list.
  3. Promtail (or the CI job) pushes latest.prom into Prometheus; JSON lands in long-term storage for workbook snapshots.
  4. The harness exits non-zero if:
    • max_ms for any scenario breaches its configured threshold; or
    • max_ms regresses ≥20% versus baseline.csv.
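
A minimal CI wrapper for steps 1 and 2 could look like the sketch below. The dated bench-artifacts layout and the publish-before-fail ordering are assumptions; the harness invocation is the one from step 1, and its exit code is propagated so the job still fails on a breach.

    #!/usr/bin/env bash
    # Sketch only: directory layout and fail-after-publish policy are assumptions.
    set -euo pipefail

    out=out/bench/scanner-analyzers
    dest="bench-artifacts/$(date -u +%F)"
    status=0

    # Step 1: run the harness; capture the exit code so artefacts still get published on a breach.
    dotnet run \
      --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj \
      -- \
      --repo-root . \
      --out src/Bench/StellaOps.Bench/Scanner.Analyzers/baseline.csv \
      --json "$out/latest.json" \
      --prom "$out/latest.prom" \
      --commit "$(git rev-parse HEAD)" \
      --environment "${CI_ENVIRONMENT_NAME:-local}" || status=$?

    # Step 2: publish the artefacts to the dated directory.
    mkdir -p "$dest"
    cp src/Bench/StellaOps.Bench/Scanner.Analyzers/baseline.csv "$out/latest.json" "$out/latest.prom" "$dest/"

    # Surface the harness result (non-zero on a threshold breach or >=20% regression).
    exit "$status"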

Grafana dashboard

  • Import docs/modules/scanner/operations/analyzers-grafana-dashboard.json.
  • Point the template variable datasource to the Prometheus instance ingesting scanner_analyzer_bench_* metrics.
  • Panels:
    • Max Duration (ms) compares live runs vs baseline.
    • Regression Ratio vs Limit plots (max / baseline_max - 1) * 100 (see the query sketch after this list).
    • Breached Scenarios stat panel sourced from scanner_analyzer_bench_regression_breached.
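
If the Regression Ratio panel needs to be rebuilt by hand, a query along the following lines should reproduce it. The scanner_analyzer_bench_max_ms and scanner_analyzer_bench_baseline_max_ms metric names are assumptions inferred from the panel descriptions above; confirm them against latest.prom before saving the panel.

    # Assumed metric names - adjust to whatever latest.prom actually exposes.
    (scanner_analyzer_bench_max_ms / on(scenario) scanner_analyzer_bench_baseline_max_ms - 1) * 100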

Alerting & on-call response

  • Primary alert: fire when scanner_analyzer_bench_regression_ratio{scenario=~".+"} >= 1.20 for two consecutive samples (10 min by default). Suggested PromQL:
    max_over_time(scanner_analyzer_bench_regression_ratio[10m]) >= 1.20
    
  • Suppress duplicates using the scenario label.
  • Pager payload should include scenario, max_ms, baseline_max_ms, and commit.
  • Immediate triage steps:
    1. Check the latest.json artefact for the failing scenario and confirm the commit and environment.
    2. Re-run the harness with --captured-at set and --baseline pointing at the last known-good CSV to verify determinism (an example invocation follows this list).
    3. If regression persists, open an incident ticket tagged scanner-analyzer-perf and page the owning language guild.
    4. Roll back the offending change or update the baseline after sign-off from the guild lead and Perf captain.
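
For triage step 2, a verification re-run against the last known-good baseline might look like the following. The retest output paths, the ISO-8601 timestamp for --captured-at, and the bench-artifacts location of the known-good CSV are assumptions; substitute whatever the last good publish actually produced.

    # Sketch only - retest paths, --captured-at format, and baseline location are assumptions.
    dotnet run \
      --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj \
      -- \
      --repo-root . \
      --baseline bench-artifacts/<last-good-date>/baseline.csv \
      --captured-at "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      --out out/bench/scanner-analyzers/retest.csv \
      --json out/bench/scanner-analyzers/retest.json \
      --prom out/bench/scanner-analyzers/retest.prom \
      --commit "$(git rev-parse HEAD)" \
      --environment "${CI_ENVIRONMENT_NAME:-local}"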

Document the outcome in docs/12_PERFORMANCE_WORKBOOK.md (section 8) so trendlines reflect any accepted regressions.