# 12 - Performance Workbook

*Purpose* – define **repeatable, data‑driven** benchmarks that guard Stella Ops’ core pledge:

> *“P95 vulnerability feedback in ≤ 5 seconds.”*

---
## 0 Benchmark Scope

| Area | Included | Excluded |
|------------------|------------------------------------------|----------------------------|
| SBOM‑first scan | Trivy engine w/ warmed DB | Full image unpack ≥ 300 MB |
| Delta SBOM ⭑ | Missing‑layer lookup & merge | Multi‑arch images |
| Policy eval ⭑ | YAML → JSON → rule match | Rego (until GA) |
| Feed merge | NVD JSON 2023–2025 | GHSA GraphQL (plugin) |
| Quota wait‑path | 5 s soft‑wait, 60 s hard‑wait behaviour | Paid tiers (unlimited) |
| API latency | REST `/scan`, `/layers/missing` | UI SPA calls |

⭑ = new in July 2025.

---
## 1 Hardware Baseline (Reference Rig)

| Element | Spec |
|-----------|--------------------------------|
| CPU | 8 vCPU (Intel Ice‑Lake equiv.) |
| Memory | 16 GiB |
| Disk | NVMe SSD, 3 GB/s R/W |
| Network | 1 Gbit virt. switch |
| Container | Docker 25.0 + overlay2 |
| OS | Ubuntu 22.04 LTS (kernel 6.8) |

*All P95 targets assume a **single‑node** deployment on this rig unless stated otherwise.*

---
## 2 Phase Targets & Gates

| Phase (ID) | Target P95 | Gate (CI) | Rationale |
|-----------------------|------------------------------:|-----------|----------------------------------------|
| **SBOM_FIRST** | ≤ 5 s | `hard` | Core UX promise. |
| **IMAGE_UNPACK** | ≤ 10 s | `soft` | Fallback path for legacy flows. |
| **DELTA_SBOM** ⭑ | ≤ 1 s | `hard` | Needed to stay sub‑5 s for big bases. |
| **POLICY_EVAL** ⭑ | ≤ 50 ms | `hard` | Keeps gate latency invisible to users. |
| **QUOTA_WAIT** ⭑ | *soft* ≤ 5 s<br>*hard* ≤ 60 s | `hard` | Ensures graceful Free‑tier throttling. |
| **SCHED_RESCAN** | ≤ 30 s | `soft` | Nightly batch – not user‑facing. |
| **FEED_MERGE** | ≤ 60 s | `soft` | Off‑peak cron @ 01:00. |
| **API_P95** | ≤ 200 ms | `hard` | UI snappiness. |

*Gate legend* – `hard`: break CI if the regression exceeds 3 × target; `soft`: raise a warning & open an issue ticket.

---
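The gate semantics above can be sketched as a small classifier. This is an illustrative helper, not the actual CI code; it reads the legend literally, so a hard gate that misses its target but stays under 3 × only warns:

```go
package main

import "fmt"

// GateResult classifies a measured P95 against its phase target.
type GateResult string

const (
	Pass GateResult = "pass"
	Warn GateResult = "warn"
	Fail GateResult = "fail"
)

// checkGate applies the gate legend: a hard gate breaks CI only when
// the measurement exceeds 3x the target; anything else past the target
// is a warning, for both hard and soft gates.
func checkGate(measuredP95, targetP95 float64, hard bool) GateResult {
	switch {
	case measuredP95 <= targetP95:
		return Pass
	case hard && measuredP95 > 3*targetP95:
		return Fail
	default:
		return Warn
	}
}

func main() {
	// SBOM_FIRST: hard gate, 5 s target.
	fmt.Println(checkGate(4.9, 5.0, true))  // pass
	fmt.Println(checkGate(16.0, 5.0, true)) // fail: regression > 3x target
}
```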
## 3 Test Harness
|
||
|
||
* **Runner** – `perf/run.sh`, accepts `--phase` and `--samples`.
|
||
* **Metrics** – Prometheus + `jq` extracts; aggregated via `scripts/aggregate.ts`.
|
||
* **CI** – GitLab CI job *benchmark* publishes JSON to `bench‑artifacts/`.
|
||
* **Visualisation** – Grafana dashboard *Stella‑Perf* (provisioned JSON).
|
||
|
||
> **Note** – harness mounts `/var/cache/trivy` tmpfs to avoid disk noise.
|
||
|
||
---
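The P95 figures reported below are order statistics over the collected samples. A minimal nearest‑rank sketch (an assumed method for illustration – the actual `scripts/aggregate.ts` may compute percentiles differently):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// p95 returns the nearest-rank 95th percentile of the samples:
// sort ascending, take the value at rank ceil(0.95 * n) (1-based).
func p95(samples []float64) float64 {
	s := append([]float64(nil), samples...) // copy; leave caller's slice intact
	sort.Float64s(s)
	rank := int(math.Ceil(0.95 * float64(len(s))))
	return s[rank-1]
}

func main() {
	// With 100 samples, nearest-rank P95 is the 95th sorted value.
	samples := make([]float64, 100)
	for i := range samples {
		samples[i] = float64(i + 1) // 1..100
	}
	fmt.Println(p95(samples)) // 95
}
```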
## 4 Current Results (July 2025)

| Phase | Samples | Mean (s) | P95 (s) | Target OK? |
|-----------------|--------:|---------:|--------:|:----------:|
| SBOM_FIRST | 100 | 3.7 | 4.9 | ✅ |
| IMAGE_UNPACK | 50 | 6.4 | 9.2 | ✅ |
| **DELTA_SBOM** | 100 | 0.46 | 0.83 | ✅ |
| **POLICY_EVAL** | 1 000 | 0.021 | 0.041 | ✅ |
| **QUOTA_WAIT** | 80 | 4.0* | 4.9* | ✅ |
| SCHED_RESCAN | 10 | 18.3 | 24.9 | ✅ |
| FEED_MERGE | 3 | 38.1 | 41.0 | ✅ |
| API_P95 | 20 000 | 0.087 | 0.143 | ✅ |

*Data files:* `bench-artifacts/2025‑07‑14/phase‑stats.json`.

---
## 5 Δ‑SBOM Micro‑Benchmark Detail

### 5.1 Scenario

1. Base image `python:3.12-slim` already scanned (all layers cached).
2. Application layer (`COPY . /app`) produces a new digest.
3. Santech lists **7** layers; the backend replies *6 hit*, *1 miss*.
4. The builder scans **only 1 layer** (~9 MiB, 217 files) and uploads the delta.

### 5.2 Key Timings

| Step | Time (ms) |
|---------------------|-----------:|
| `/layers/missing` | 13 |
| Trivy single layer | 655 |
| Upload delta blob | 88 |
| Backend merge + CVE | 74 |
| **Total wall‑time** | **830 ms** |

---
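The delta flow above – list all layer digests, ask the backend which are missing, scan only those – can be sketched as follows. `/layers/missing` is the real endpoint from section 0, but the in-memory cache and types here are simplified assumptions:

```go
package main

import "fmt"

// missingLayers stands in for the /layers/missing call: given every
// layer digest of an image, return only the digests the backend has
// not already scanned and cached.
func missingLayers(cache map[string]bool, digests []string) []string {
	var miss []string
	for _, d := range digests {
		if !cache[d] {
			miss = append(miss, d)
		}
	}
	return miss
}

func main() {
	// 7 layers; 6 are cached from the earlier python:3.12-slim scan.
	cache := map[string]bool{
		"sha256:l1": true, "sha256:l2": true, "sha256:l3": true,
		"sha256:l4": true, "sha256:l5": true, "sha256:l6": true,
	}
	digests := []string{
		"sha256:l1", "sha256:l2", "sha256:l3",
		"sha256:l4", "sha256:l5", "sha256:l6",
		"sha256:app", // the new COPY . /app layer
	}
	miss := missingLayers(cache, digests)
	// Only the single new layer needs a Trivy pass and a delta upload.
	fmt.Println(len(miss), miss[0]) // 1 sha256:app
}
```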
## 6 Quota Wait‑Path Benchmark Detail

### 6.1 Scenario

1. A Free‑tier token reaches **scan #200** – the dashboard shows a yellow banner.

### 6.2 Key Timings

| Step | Time (ms) |
|------------------------------------|----------:|
| `/quota/check` Redis Lua INCR | 0.8 |
| Soft‑wait sleep (server) | 5 000 |
| Hard‑wait sleep (server) | 60 000 |
| End‑to‑end wall‑time (soft‑hit) | 5 003 |
| End‑to‑end wall‑time (hard‑hit) | 60 004 |

---
## 7 Policy Eval Bench

### 7.1 Setup

* Policy YAML: **28** rules, a mix of severity & package conditions.
* Input: scan‑result JSON with **1 026** findings.
* Evaluator: custom rules engine (Go structs → map look‑ups).

### 7.2 Latency Histogram

```
0‑10 ms  ▇▇▇▇▇▇▇▇▇▇ 38 %
10‑20 ms ▇▇▇▇▇▇▇▇▇▇ 42 %
20‑40 ms ▇▇▇▇▇▇     17 %
40‑50 ms ▇           3 %
```

P99 = 48 ms, within the 50 ms gate.

---
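The evaluator described above (Go structs → map look‑ups) might look roughly like this; the rule and finding shapes are simplified assumptions, not the engine's real types:

```go
package main

import "fmt"

// Finding is one entry from the scan-result JSON (simplified).
type Finding struct {
	Pkg      string
	Severity string
}

// Rule matches a finding on package and, optionally, severity
// (empty severity means "any").
type Rule struct {
	Pkg      string
	Severity string
}

// violations returns the findings matched by at least one rule.
// Findings are indexed by package first, so each rule is a map
// look-up instead of a scan over all findings.
func violations(rules []Rule, findings []Finding) []Finding {
	byPkg := map[string][]Finding{}
	for _, f := range findings {
		byPkg[f.Pkg] = append(byPkg[f.Pkg], f)
	}
	var out []Finding
	for _, r := range rules {
		for _, f := range byPkg[r.Pkg] {
			if r.Severity == "" || r.Severity == f.Severity {
				out = append(out, f)
			}
		}
	}
	return out
}

func main() {
	rules := []Rule{{Pkg: "openssl", Severity: "CRITICAL"}}
	findings := []Finding{
		{Pkg: "openssl", Severity: "CRITICAL"},
		{Pkg: "zlib", Severity: "LOW"},
	}
	fmt.Println(len(violations(rules, findings))) // 1
}
```

With 28 rules against 1 026 findings this stays comfortably inside the histogram's 0‑50 ms range, since the per-rule cost depends only on how many findings share the rule's package.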
## 8 Trend Snapshot

![Perf trend sparkline placeholder](perf‑trend.png)

_Plot generated weekly by `scripts/update‑trend.py`; shows the last 12 weeks of P95 per phase._

---
## 9 Action Items

1. **Image Unpack** – Evaluate zstd for layer decompression; aim to shave 1 s.
2. **Feed Merge** – Parallelise the BDU XML parse (plugin) once stable.
3. **Rego Support** – Prototype an OPA side‑car; target ≤ 100 ms eval.
4. **Concurrency** – Stress‑test 100 rps on a 4‑node Redis cluster (Q4‑2025).

---
## 10 Change Log

| Date | Note |
|------------|------------------------------------------------------------------------|
| 2025‑07‑14 | Added Δ‑SBOM & Policy Eval phases; updated targets & current results. |
| 2025‑07‑12 | First public workbook (SBOM‑first, image‑unpack, feed merge). |

---