# 12 - Performance Workbook

*Purpose* – define **repeatable, data‑driven** benchmarks that guard Stella Ops’ core pledge:

> *“P95 vulnerability feedback in ≤ 5 seconds.”*

---

## 0 Benchmark Scope

| Area             | Included                                | Excluded                   |
|------------------|-----------------------------------------|----------------------------|
| SBOM‑first scan  | Trivy engine w/ warmed DB               | Full image unpack ≥ 300 MB |
| Delta SBOM ⭑     | Missing‑layer lookup & merge            | Multi‑arch images          |
| Policy eval ⭑    | YAML → JSON → rule match                | Rego (until GA)            |
| Feed merge       | NVD JSON 2023–2025                      | GHSA GraphQL (plugin)      |
| Quota wait‑path  | 5 s soft‑wait, 60 s hard‑wait behaviour | Paid tiers (unlimited)     |
| API latency      | REST `/scan`, `/layers/missing`         | UI SPA calls               |

⭑ = new in July 2025.

---

## 1 Hardware Baseline (Reference Rig)

| Element     | Spec                               |
|-------------|------------------------------------|
| CPU         | 8 vCPU (Intel Ice‑Lake equiv.)     |
| Memory      | 16 GiB                             |
| Disk        | NVMe SSD, 3 GB/s R/W               |
| Network     | 1 Gbit virt. switch                |
| Container   | Docker 25.0 + overlay2             |
| OS          | Ubuntu 22.04 LTS (kernel 6.8)      |

*All P95 targets assume a **single‑node** deployment on this rig unless stated otherwise.*

---

## 2 Phase Targets & Gates

| Phase (ID)            | Target P95 | Gate (CI) | Rationale                                  |
|-----------------------|-----------:|-----------|--------------------------------------------|
| **SBOM_FIRST**        | ≤ 5 s      | `hard`    | Core UX promise.                           |
| **IMAGE_UNPACK**      | ≤ 10 s     | `soft`    | Fallback path for legacy flows.            |
| **DELTA_SBOM** ⭑      | ≤ 1 s      | `hard`    | Needed to stay sub‑5 s for large base images. |
| **POLICY_EVAL** ⭑     | ≤ 50 ms    | `hard`    | Keeps gate latency invisible to users.     |
| **QUOTA_WAIT** ⭑      | *soft* ≤ 5 s<br>*hard* ≤ 60 s | `hard` | Ensures graceful Free‑tier throttling. |
| **SCHED_RESCAN**      | ≤ 30 s     | `soft`    | Nightly batch – not user‑facing.           |
| **FEED_MERGE**        | ≤ 60 s     | `soft`    | Off‑peak cron @ 01:00.                     |
| **API_P95**           | ≤ 200 ms   | `hard`    | UI snappiness.                             |

*Gate* legend:

* `hard` – fail the CI job if the regression exceeds 3 × target.
* `soft` – raise a warning and open a ticket.

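The following is a minimal Python sketch of how a CI step could apply these gates. It assumes the harness has raw per-phase latency samples in seconds. `GATES`, `HARD_FACTOR`, `p95` and `check_gate` are hypothetical names and are not part of `perf/run.sh`. The table covers only a subset of phases; QUOTA_WAIT, with its two targets, is left out. The percentile is nearest-rank, one of several common definitions. The `hard` rule follows the legend literally, so a hard-gated phase that misses its target but stays within 3 × target only warns.

```python
# Hypothetical gate table (subset of the phases above): phase -> (target P95 in s, gate type).
GATES: dict[str, tuple[float, str]] = {
    "SBOM_FIRST": (5.0, "hard"),
    "IMAGE_UNPACK": (10.0, "soft"),
    "DELTA_SBOM": (1.0, "hard"),
    "POLICY_EVAL": (0.050, "hard"),
    "API_P95": (0.200, "hard"),
}

# Literal reading of the legend: a hard gate breaks CI once P95 exceeds 3 x target.
HARD_FACTOR = 3.0


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile: smallest sample with >= 95 % of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic, 1-based
    return ordered[rank - 1]


def check_gate(phase: str, samples: list[float]) -> str:
    """Return 'pass', 'warn' (raise warning, open ticket) or 'fail' (break CI)."""
    target, gate = GATES[phase]
    observed = p95(samples)
    if observed <= target:
        return "pass"
    if gate == "hard" and observed > HARD_FACTOR * target:
        return "fail"
    return "warn"
```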
---

## 3 Test Harness

* **Runner** – `perf/run.sh`, accepts `--phase` and `--samples`.
* **Language analyzers microbench** – `dotnet run --project src/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj -- --repo-root . --out src/StellaOps.Bench/Scanner.Analyzers/baseline.csv --json out/bench/scanner-analyzers/latest.json --prom out/bench/scanner-analyzers/latest.prom --commit $(git rev-parse HEAD)` produces CSV, JSON and Prometheus gauges for the analyzer scenarios. A run fails if `max_ms` regresses by ≥ 20 % against `baseline.csv` or if its thresholds are exceeded.
* **Metrics** – Prometheus + `jq` extracts, aggregated via `scripts/aggregate.ts`.
* **CI** – GitLab CI job *benchmark* publishes JSON to `bench-artifacts/`.
* **Visualisation** – Grafana dashboard *Stella‑Perf* (provisioned JSON).

> **Note** – the harness mounts `/var/cache/trivy` as tmpfs to avoid disk I/O noise.

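The 20 % rule for the analyzer microbench is equivalent to flagging any scenario whose current `max_ms` divided by its baseline `max_ms` reaches 1.20. That is the same ratio the `scanner_analyzer_bench_regression_ratio` alert in section 8 uses. Below is a minimal sketch, assuming `baseline.csv` has `scenario` and `max_ms` columns; the real column layout may differ. `load_baseline` and `regressions` are hypothetical helpers, not code from `StellaOps.Bench.ScannerAnalyzers`, and the scenario names in the example are made up.

```python
import csv
import io

# "Regresses by >= 20 %" expressed as a ratio: current / baseline >= 1.20.
REGRESSION_LIMIT = 1.20


def load_baseline(text: str) -> dict[str, float]:
    """Parse baseline CSV text; the 'scenario' and 'max_ms' column names are assumptions."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["scenario"]: float(row["max_ms"]) for row in reader}


def regressions(baseline: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Return {scenario: ratio} for every scenario at or above the regression limit."""
    flagged: dict[str, float] = {}
    for scenario, max_ms in current.items():
        base = baseline.get(scenario)
        if base is None or base <= 0:
            continue  # new scenario or unusable baseline: nothing to compare against
        ratio = max_ms / base
        if ratio >= REGRESSION_LIMIT:
            flagged[scenario] = ratio
    return flagged


# Synthetic example with hypothetical scenario names.
baseline = load_baseline("scenario,max_ms\nnode_small,10.0\npython_wheels,50.0\n")
print(regressions(baseline, {"node_small": 12.5, "python_wheels": 51.0}))  # {'node_small': 1.25}
```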
---

## 4 Current Results (July 2025)

| Phase           | Samples | Mean (s) | P95 (s) | Target OK? |
|-----------------|--------:|---------:|--------:|-----------:|
| SBOM_FIRST      |     100 |      3.7 |     4.9 | ✅ |
| IMAGE_UNPACK    |      50 |      6.4 |     9.2 | ✅ |
| **DELTA_SBOM**  |     100 |     0.46 |    0.83 | ✅ |
| **POLICY_EVAL** |   1 000 |    0.021 |   0.041 | ✅ |
| **QUOTA_WAIT**  |      80 |     4.0* |    4.9* | ✅ |
| SCHED_RESCAN    |      10 |     18.3 |    24.9 | ✅ |
| FEED_MERGE      |       3 |     38.1 |    41.0 | ✅ |
| API_P95         |  20 000 |    0.087 |   0.143 | ✅ |

*Data files:* `bench-artifacts/2025-07-14/phase-stats.json`.

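Each row above reduces raw samples to a mean and a P95. Below is a minimal sketch of that reduction. It assumes nearest-rank percentiles, which the workbook does not state, and uses synthetic inputs; `summarize_row` is a hypothetical helper. One consequence is worth keeping in mind: under nearest-rank, any phase with fewer than 20 samples reports its slowest run as the P95. If the harness uses that method, the SCHED_RESCAN (10 samples) and FEED_MERGE (3 samples) P95 figures are effectively maxima.

```python
from statistics import fmean


def summarize_row(phase: str, samples_s: list[float], target_s: float) -> str:
    """Format one results-table row: sample count, mean, nearest-rank P95, pass mark."""
    ordered = sorted(samples_s)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")
    p95 = ordered[(95 * n + 99) // 100 - 1]  # nearest rank: ceil(0.95 * n), 1-based
    mark = "✅" if p95 <= target_s else "❌"
    return f"| {phase} | {n} | {fmean(ordered):.2f} | {p95:.2f} | {mark} |"


# Synthetic samples: with n < 20 the nearest-rank P95 equals the maximum.
print(summarize_row("FEED_MERGE", [30.0, 40.0, 50.0], target_s=60.0))
```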
---

## 5 Δ‑SBOM Micro‑Benchmark Detail

### 5.1 Scenario

1. Base image `python:3.12-slim` already scanned (all layers cached).
2. Application layer (`COPY . /app`) triggers a new digest.
3. `Stella CLI` lists the **7** image layers; the backend replies *6 hit*, *1 miss*.
4. Builder scans **only 1 layer** (~9 MiB, 217 files) and uploads the delta.

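The scenario reduces to a set difference on layer digests, followed by a merge of per-layer results. Below is a minimal sketch under stated assumptions: digests and component identifiers are plain strings, and `missing_layers` and `merge_layer_sboms` are hypothetical names, not the `/layers/missing` API contract. The merge is a plain union. It ignores files that an upper layer deletes (whiteouts), which a real merge would have to honour.

```python
def missing_layers(image_layers: list[str], cached: set[str]) -> list[str]:
    """Digests the backend has not seen yet, in image order (the 'miss' list)."""
    return [digest for digest in image_layers if digest not in cached]


def merge_layer_sboms(image_layers: list[str], layer_sboms: dict[str, list[str]]) -> list[str]:
    """Union of per-layer component lists (cached + freshly scanned), first-seen order.

    Simplification: whiteouts (files deleted by an upper layer) are ignored.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for digest in image_layers:
        for component in layer_sboms.get(digest, []):
            if component not in seen:
                seen.add(component)
                merged.append(component)
    return merged


# Mirrors the scenario above: 7 layers, the first 6 already cached.
layers = [f"sha256:layer{i}" for i in range(1, 8)]
miss = missing_layers(layers, cached=set(layers[:6]))
print(f"{len(layers) - len(miss)} hit, {len(miss)} miss")  # 6 hit, 1 miss
```

Only the digests in `miss` would be scanned and uploaded, and every other layer's components come from the cache. This is why the single-layer scan dominates the timings.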

### 5.2 Key Timings

| Step                | Time (ms)  |
|---------------------|-----------:|
| `/layers/missing`   |         13 |
| Trivy single layer  |        655 |
| Upload delta blob   |         88 |
| Backend merge + CVE |         74 |
| **Total wall‑time** | **830 ms** |

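Adding up the steps shows where the Δ‑SBOM budget goes. This small check uses a hypothetical helper, `budget_report`, and the single-run numbers from the table. Comparing the total with the 1 s DELTA_SBOM P95 target is therefore a sanity check rather than a gate. The 830 ms total leaves 170 ms of headroom, and the single-layer Trivy scan accounts for roughly 79 % of it.

```python
# Step timings (ms) copied from the table above; one run, not a P95.
STEPS_MS = {
    "/layers/missing": 13,
    "Trivy single layer": 655,
    "Upload delta blob": 88,
    "Backend merge + CVE": 74,
}
DELTA_SBOM_TARGET_MS = 1_000  # DELTA_SBOM target P95 from section 2


def budget_report(steps: dict[str, int], target_ms: int) -> tuple[int, int, dict[str, float]]:
    """Return (total ms, headroom ms, percentage share of each step)."""
    total = sum(steps.values())
    shares = {name: 100 * ms / total for name, ms in steps.items()}
    return total, target_ms - total, shares


total, headroom, shares = budget_report(STEPS_MS, DELTA_SBOM_TARGET_MS)
print(total, headroom, round(shares["Trivy single layer"], 1))  # 830 170 78.9
```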
---

## 6 Quota Wait‑Path Benchmark Detail

### 6.1 Scenario

1. Free‑tier token reaches **scan #200** – the dashboard shows a yellow banner.

### 6.2 Key Timings

| Step                               | Time (ms) |
|------------------------------------|----------:|
| `/quota/check` Redis Lua INCR      |       0.8 |
| Soft wait sleep (server)           |     5 000 |
| Hard wait sleep (server)           |    60 000 |
| End‑to‑end wall‑time (soft‑hit)    |     5 003 |
| End‑to‑end wall‑time (hard‑hit)    |    60 004 |

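The 0.8 ms `/quota/check` step runs `INCR` inside a Lua script. Redis executes a script atomically, so the increment and any threshold comparison cannot interleave with concurrent requests. Below is a single-process Python sketch of the resulting decision, assuming a per-token counter and two thresholds. `QuotaGate`, `soft_after` and `hard_after` are hypothetical, and the real limits are not given here. Counter expiry (for example a periodic reset) is not modelled, and unlike the Lua version this sketch is not safe under concurrency.

```python
from dataclasses import dataclass, field

SOFT_WAIT_S = 5.0   # soft-wait sleep from the table above
HARD_WAIT_S = 60.0  # hard-wait sleep from the table above


@dataclass
class QuotaGate:
    """Single-process stand-in for the Redis counter; thresholds are hypothetical."""

    soft_after: int  # scans allowed before soft waits start
    hard_after: int  # scans allowed before hard waits start
    counts: dict[str, int] = field(default_factory=dict)

    def check(self, token: str) -> float:
        """Increment the token's counter (like INCR) and return the server-side delay in s."""
        count = self.counts.get(token, 0) + 1
        self.counts[token] = count
        if count > self.hard_after:
            return HARD_WAIT_S
        if count > self.soft_after:
            return SOFT_WAIT_S
        return 0.0


gate = QuotaGate(soft_after=2, hard_after=4)  # tiny limits for illustration only
print([gate.check("free-token") for _ in range(6)])  # [0.0, 0.0, 5.0, 5.0, 60.0, 60.0]
```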
---

## 7 Policy Eval Bench

### 7.1 Setup

* Policy YAML: **28** rules mixing severity and package conditions.
* Input: scan result JSON with **1 026** findings.
* Evaluator: custom rules engine (Go structs → map look‑ups).

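The setup notes that the evaluator turns the policy into Go structs and resolves rules through map look‑ups. Below is a Python sketch of that idea: rules are indexed once by package, so each finding only checks the rules that can apply to it rather than all 28. The rule shape (`id`, optional `package`, optional `min_severity`, `action`) is an assumption standing in for the real policy YAML schema. `index_rules` and `evaluate` are hypothetical names, not the engine's API.

```python
from collections import defaultdict

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def index_rules(rules: list[dict]) -> tuple[dict[str, list[dict]], list[dict]]:
    """Bucket package-specific rules by package name; keep package-agnostic rules apart."""
    by_package: dict[str, list[dict]] = defaultdict(list)
    generic: list[dict] = []
    for rule in rules:
        if "package" in rule:
            by_package[rule["package"]].append(rule)
        else:
            generic.append(rule)
    return dict(by_package), generic


def evaluate(
    findings: list[dict], by_package: dict[str, list[dict]], generic: list[dict]
) -> list[tuple[str, str, str]]:
    """Return (finding id, rule id, action) for every rule a finding satisfies."""
    verdicts = []
    for finding in findings:
        severity = SEVERITY_RANK[finding["severity"]]
        # One dict lookup per finding instead of scanning every rule.
        for rule in by_package.get(finding["package"], []) + generic:
            if severity >= SEVERITY_RANK[rule.get("min_severity", "low")]:
                verdicts.append((finding["id"], rule["id"], rule["action"]))
    return verdicts


rules = [
    {"id": "R1", "min_severity": "critical", "action": "block"},
    {"id": "R2", "package": "openssl", "min_severity": "high", "action": "block"},
    {"id": "R3", "package": "zlib", "action": "warn"},
]
findings = [
    {"id": "F1", "package": "openssl", "severity": "high"},
    {"id": "F2", "package": "zlib", "severity": "low"},
    {"id": "F3", "package": "curl", "severity": "critical"},
]
print(evaluate(findings, *index_rules(rules)))
# [('F1', 'R2', 'block'), ('F2', 'R3', 'warn'), ('F3', 'R1', 'block')]
```

Building the index is a one-off cost per policy load. Per finding, the work is one dictionary lookup plus the package-agnostic rules.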

### 7.2 Latency Histogram

```
0‑10 ms   ▇▇▇▇▇▇▇▇▇▇ 38 %
10‑20 ms  ▇▇▇▇▇▇▇▇▇▇ 42 %
20‑40 ms  ▇▇▇▇▇▇     17 %
40‑50 ms  ▇           3 %
```

P99 = 48 ms, so even the 99th percentile stays under the 50 ms POLICY_EVAL gate.

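The histogram buckets and the P99 figure can be derived from raw per-evaluation latencies. Below is a minimal sketch that assumes half-open buckets ([10, 20) ms and so on), an overflow bucket from 50 ms up, and a nearest-rank P99. `histogram` and `p99` are hypothetical helpers. The sample data is synthetic, chosen only to reproduce the bucket shares shown, not the real measurements.

```python
import bisect

# Upper edges (ms) of the buckets above; values >= 50 ms fall into an overflow bucket.
EDGES_MS = [10, 20, 40, 50]
LABELS = ["0-10 ms", "10-20 ms", "20-40 ms", "40-50 ms", ">= 50 ms"]


def histogram(latencies_ms: list[float]) -> dict[str, float]:
    """Percentage of samples per bucket; each bucket is half-open [low, high)."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    counts = [0] * len(LABELS)
    for value in latencies_ms:
        counts[bisect.bisect_right(EDGES_MS, value)] += 1
    return {label: 100 * c / len(latencies_ms) for label, c in zip(LABELS, counts)}


def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile."""
    ordered = sorted(latencies_ms)
    return ordered[(99 * len(ordered) + 99) // 100 - 1]


# Synthetic data that reproduces the bucket shares shown above.
samples = [5.0] * 38 + [15.0] * 42 + [30.0] * 17 + [45.0] * 3
print(histogram(samples))
print(p99(samples))  # 45.0
```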
---

## 8 Trend Snapshot

> **Grafana/Alerting** – Import `docs/ops/scanner-analyzers-grafana-dashboard.json` and point it at the Prometheus datasource storing `scanner_analyzer_bench_*` metrics. Configure an alert on `scanner_analyzer_bench_regression_ratio` ≥ 1.20 (default limit); the bundled Stat panel surfaces breached scenarios (non-zero values). On-call runbook: `docs/ops/scanner-analyzers-operations.md`.

_Plot generated weekly by `scripts/update-trend.py`; it shows the P95 per phase over the last 12 weeks._

---

## 9 Action Items

1. **Image Unpack** – Evaluate zstd for layer decompression; aim to shave 1 s.
2. **Feed Merge** – Parallelise regional XML feed parsing (plugin) once stable.
3. **Rego Support** – Prototype an OPA side‑car; target ≤ 100 ms eval.
4. **Concurrency** – Stress‑test 100 rps on a 4‑node Redis cluster (Q4 2025).

---

## 10 Change Log

| Date       | Note                                                                    |
|------------|-------------------------------------------------------------------------|
| 2025‑07‑14 | Added Δ‑SBOM & Policy Eval phases; updated targets & current results.   |
| 2025‑07‑12 | First public workbook (SBOM‑first, image‑unpack, feed merge).           |

---