Files
git.stella-ops.org/docs/12_PERFORMANCE_WORKBOOK.md
master 651b8e0fa3 feat: Add new projects to solution and implement contract testing documentation
- Added "StellaOps.Policy.Engine", "StellaOps.Cartographer", and "StellaOps.SbomService" projects to the StellaOps solution.
- Created AGENTS.md to outline the Contract Testing Guild Charter, detailing mission, scope, and definition of done.
- Established TASKS.md for the Contract Testing Task Board, outlining tasks for Sprint 62 and Sprint 63 related to mock servers and replay testing.
2025-10-27 07:57:55 +02:00

171 lines
7.2 KiB
Markdown
Executable File
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

#12 - Performance Workbook
*Purpose* define **repeatable, datadriven** benchmarks that guard StellaOps core pledge:
> *“P95 vulnerability feedback in ≤5seconds.”*
---
##0Benchmark Scope
| Area | Included | Excluded |
|------------------|----------------------------------|---------------------------|
| SBOMfirst scan | Trivy engine w/ warmed DB | Full image unpack ≥300MB |
| Delta SBOM ⭑ | Missinglayer lookup & merge | Multiarch images |
| Policy eval ⭑ | YAML → JSON → rule match | Rego (until GA) |
| Feed merge | NVD JSON 20232025 | GHSA GraphQL (plugin) |
| Quota waitpath | 5s softwait, 60s hardwait behaviour | Paid tiers (unlimited) |
| API latency | REST `/scan`, `/layers/missing` | UI SPA calls |
⭑ = new in July2025.
---
##1Hardware Baseline (Reference Rig)
| Element | Spec |
|-------------|------------------------------------|
| CPU | 8vCPU (Intel IceLake equiv.) |
| Memory | 16GiB |
| Disk | NVMe SSD, 3GB/s R/W |
| Network | 1Gbit virt. switch |
| Container | Docker 25.0 + overlay2 |
| OS | Ubuntu 22.04LTS (kernel 6.8) |
*All P95 targets assume a **singlenode** deployment on this rig unless stated.*
---
##2Phase Targets & Gates
| Phase (ID) | Target P95 | Gate (CI) | Rationale |
|-----------------------|-----------:|-----------|----------------------------------------|
| **SBOM_FIRST** | ≤5s | `hard` | Core UX promise. |
| **IMAGE_UNPACK** | ≤10s | `soft` | Fallback path for legacy flows. |
| **DELTA_SBOM** ⭑ | ≤1s | `hard` | Needed to stay sub5s for big bases. |
| **POLICY_EVAL** ⭑ | ≤50ms | `hard` | Keeps gate latency invisible to users. |
| **QUOTA_WAIT** ⭑ | *soft*5s<br>*hard*60s | `hard` | Ensures graceful Freetier throttling. |
| **SCHED_RESCAN** | ≤30s | `soft` | Nightly batch not userfacing. |
| **FEED_MERGE** | ≤60s | `soft` | Offpeak cron @ 01:00. |
| **API_P95** | ≤200ms | `hard` | UI snappiness. |
*Gate* legend `hard`: break CI if regression>3×target,
`soft`: raise warning & issue ticket.
---
##3Test Harness
* **Runner** `perf/run.sh`, accepts `--phase` and `--samples`.
* **Language analyzers microbench** `dotnet run --project src/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj -- --repo-root . --out src/StellaOps.Bench/Scanner.Analyzers/baseline.csv --json out/bench/scanner-analyzers/latest.json --prom out/bench/scanner-analyzers/latest.prom --commit $(git rev-parse HEAD)` produces CSV + JSON + Prometheus gauges for analyzer scenarios. Runs fail if `max_ms` regresses ≥20% against `baseline.csv` or if thresholds are exceeded.
* **Metrics** Prometheus + `jq` extracts; aggregated via `scripts/aggregate.ts`.
* **CI** GitLab CI job *benchmark* publishes JSON to `benchartifacts/`.
* **Visualisation** Grafana dashboard *StellaPerf* (provisioned JSON).
> **Note** harness mounts `/var/cache/trivy` tmpfs to avoid disk noise.
---
##4Current Results (July2025)
| Phase | Samples | Mean (s) | P95 (s) | Target OK? |
|---------------|--------:|---------:|--------:|-----------:|
| SBOM_FIRST | 100 | 3.7 | 4.9 | ✅ |
| IMAGE_UNPACK | 50 | 6.4 | 9.2 | ✅ |
| **DELTA_SBOM**| 100 | 0.46 | 0.83 | ✅ |
| **POLICY_EVAL** | 1000 | 0.021 | 0.041 | ✅ |
| **QUOTA_WAIT** | 80 | 4.0* | 4.9* | ✅ |
| SCHED_RESCAN | 10 | 18.3 | 24.9 | ✅ |
| FEED_MERGE | 3 | 38.1 | 41.0 | ✅ |
| API_P95 | 20000 | 0.087 | 0.143 | ✅ |
*Data files:* `bench-artifacts/20250714/phasestats.json`.
---
##5ΔSBOM MicroBenchmark Detail
### 5.1 Scenario
1. Base image `python:3.12-slim` already scanned (all layers cached).
2. Application layer (`COPY . /app`) triggers new digest.
3. `Stella CLI` lists **7** layers, backend replies *6 hit*, *1 miss*.
4. Builder scans **only 1 layer** (~9MiB, 217files) & uploads delta.
### 5.2 Key Timings
| Step | Time (ms) |
|---------------------|----------:|
| `/layers/missing` | 13 |
| Trivy single layer | 655 |
| Upload delta blob | 88 |
| Backend merge + CVE | 74 |
| **Total walltime** | **830ms** |
---
##6Quota WaitPath Benchmark Detail
###6.1Scenario
1. Freetier token reaches **scan #200** dashboard shows yellow banner.
###6.2 Key Timings
| Step | Time (ms) |
|------------------------------------|----------:|
| `/quota/check` Redis LUA INCR | 0.8 |
| Soft wait sleep (server) | 5000 |
| Hard wait sleep (server) | 60000 |
| Endtoend walltime (softhit) | 5003 |
| Endtoend walltime (hardhit) | 60004 |
---
##7Policy Eval Bench
### 7.1 Setup
* Policy YAML: **28** rules, mix severity & package conditions.
* Input: scan result JSON with **1026** findings.
* Evaluator: custom rules engine (Go structs → map lookups).
### 7.2 Latency Histogram
```
010ms ▇▇▇▇▇▇▇▇▇▇ 38%
1020ms ▇▇▇▇▇▇▇▇▇▇ 42%
2040ms ▇▇▇▇▇▇ 17%
4050ms ▇ 3%
```
P99=48ms. Meets 50ms gate.
---
##8Trend Snapshot
![Perf trend sparkline placeholder](perftrend.svg)
> **Grafana/Alerting** Import `docs/ops/scanner-analyzers-grafana-dashboard.json` and point it at the Prometheus datasource storing `scanner_analyzer_bench_*` metrics. Configure an alert on `scanner_analyzer_bench_regression_ratio` ≥1.20 (default limit); the bundled Stat panel surfaces breached scenarios (non-zero values). On-call runbook: `docs/ops/scanner-analyzers-operations.md`.
_Plot generated weekly by `scripts/updatetrend.py`; shows last 12 weeks P95 per phase._
---
##9Action Items
1. **Image Unpack** Evaluate zstd for layer decompress; aim to shave 1s.
2. **Feed Merge** Parallelise regional XML feed parse (plugin) once stable.
3. **Rego Support** Prototype OPA sidecar; target ≤100ms eval.
4. **Concurrency** Stresstest 100rps on 4node Redis cluster (Q42025).
---
##10Change Log
| Date | Note |
|------------|-------------------------------------------------------------------------|
| 20250714 | Added ΔSBOM & Policy Eval phases; updated targets & current results. |
| 20250712 | First public workbook (SBOMfirst, imageunpack, feed merge). |
---