
# 12 - Performance Workbook

**Purpose** – define repeatable, data‑driven benchmarks that guard Stella Ops’ core pledge:

> “P95 vulnerability feedback in ≤ 5 seconds.”

## 0 Benchmark Scope

| Area | Included | Excluded |
|------|----------|----------|
| SBOM‑first scan | Trivy engine w/ warmed DB | Full image unpack ≥ 300 MB |
| Delta SBOM ⭑ | Missing‑layer lookup & merge | Multi‑arch images |
| Policy eval ⭑ | YAML → JSON → rule match | Rego (until GA) |
| Feed merge | NVD JSON 2023–2025 | GHSA GraphQL (plugin) |
| Quota wait‑path | 5 s soft‑wait, 60 s hard‑wait behaviour | Paid tiers (unlimited) |
| API latency | REST `/scan`, `/layers/missing` | UI SPA calls |

⭑ = new in July 2025.

## 1 Hardware Baseline (Reference Rig)

| Element | Spec |
|---------|------|
| CPU | 8 vCPU (Intel Ice‑Lake equiv.) |
| Memory | 16 GiB |
| Disk | NVMe SSD, 3 GB/s R/W |
| Network | 1 Gbit virt. switch |
| Container | Docker 25.0 + overlay2 |
| OS | Ubuntu 22.04 LTS (kernel 6.8) |

All P95 targets assume a single‑node deployment on this rig unless stated.

## 2 Phase Targets & Gates

| Phase (ID) | Target P95 | Gate (CI) | Rationale |
|------------|------------|-----------|-----------|
| SBOM_FIRST | ≤ 5 s | `hard` | Core UX promise. |
| IMAGE_UNPACK | ≤ 10 s | `soft` | Fallback path for legacy flows. |
| DELTA_SBOM ⭑ | ≤ 1 s | `hard` | Needed to stay sub‑5 s for big bases. |
| POLICY_EVAL ⭑ | ≤ 50 ms | `hard` | Keeps gate latency invisible to users. |
| QUOTA_WAIT ⭑ | soft ≤ 5 s, hard ≤ 60 s | `hard` | Ensures graceful Free‑tier throttling. |
| SCHED_RESCAN | ≤ 30 s | `soft` | Nightly batch – not user‑facing. |
| FEED_MERGE | ≤ 60 s | `soft` | Off‑peak cron @ 01:00. |
| API_P95 | ≤ 200 ms | `hard` | UI snappiness. |

Gate legend: `hard` breaks CI if a regression exceeds 3 × target; `soft` raises a warning and opens an issue ticket.
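The gate legend can be sketched as a small decision function. This is a minimal illustration, not the project's CI code: the `Gate` type and `evaluate_gate` name are hypothetical, "regression > 3 × target" is read here as "measured P95 above three times the target", and the soft gate is assumed to warn as soon as the target is exceeded (the legend does not state the soft threshold).

```python
from dataclasses import dataclass

# Factor taken from the gate legend: hard gates break CI beyond 3 x target.
HARD_FACTOR = 3.0


@dataclass(frozen=True)
class Gate:
    """Hypothetical record for one row of the phase-target table."""
    phase: str
    target_s: float  # P95 target in seconds
    kind: str        # "hard" or "soft"


def evaluate_gate(gate: Gate, measured_p95_s: float) -> str:
    """Return "fail", "warn", or "ok" for a measured P95.

    Assumptions: a hard gate fails only above HARD_FACTOR * target;
    a soft gate warns above the target itself.
    """
    if gate.kind == "hard":
        if measured_p95_s > HARD_FACTOR * gate.target_s:
            return "fail"
        return "ok"
    if gate.kind == "soft":
        return "warn" if measured_p95_s > gate.target_s else "ok"
    raise ValueError(f"unknown gate kind: {gate.kind!r}")


sbom_first = Gate("SBOM_FIRST", target_s=5.0, kind="hard")
image_unpack = Gate("IMAGE_UNPACK", target_s=10.0, kind="soft")
print(evaluate_gate(sbom_first, 4.9))     # ok
print(evaluate_gate(sbom_first, 15.1))    # fail
print(evaluate_gate(image_unpack, 10.5))  # warn
```

Under this reading a hard gate tolerates moderate drift and only stops the pipeline on a large regression, while soft gates never block a merge.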

## 3 Test Harness

- **Runner** – `perf/run.sh`, accepts `--phase` and `--samples`.
- **Language analyzers microbench** – `dotnet run --project bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj -- --repo-root . --out bench/Scanner.Analyzers/baseline.csv` produces deterministic CSVs for analyzer scenarios (Node today, others as they land).
- **Metrics** – Prometheus + jq extracts; aggregated via `scripts/aggregate.ts`.
- **CI** – GitLab CI job `benchmark` publishes JSON to `bench-artifacts/`.
- **Visualisation** – Grafana dashboard Stella‑Perf (provisioned JSON).

> **Note** – the harness mounts `/var/cache/trivy` on tmpfs to avoid disk noise.
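Every target in this workbook is a P95 over the samples collected for a phase. The sketch below shows one common way to derive that figure, the nearest-rank method. It is an assumption for illustration: the workbook does not say which percentile definition `scripts/aggregate.ts` uses, and the `percentile` helper is hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. `p` is in the range (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to keep whole-number cases exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Synthetic durations in seconds, purely illustrative.
durations = [float(i) for i in range(1, 101)]
print(percentile(durations, 95))  # 95.0
```

Other definitions (for example linear interpolation between ranks) give slightly different values on small sample counts, which matters for phases such as FEED_MERGE that run only a handful of samples.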

## 4 Current Results (July 2025)

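| Phase | Samples | Mean (s) | P95 (s) | Target OK? |
|-------|---------|----------|---------|------------|
| SBOM_FIRST | 100 | 3.7 | 4.9 | ✅ |
| IMAGE_UNPACK | 50 | 6.4 | 9.2 | ✅ |
| DELTA_SBOM | 100 | 0.46 | 0.83 | ✅ |
| POLICY_EVAL | 1 000 | 0.021 | 0.041 | ✅ |
| QUOTA_WAIT | 80 | 4.0\* | 4.9\* | ✅ |
| SCHED_RESCAN | 10 | 18.3 | 24.9 | ✅ |
| FEED_MERGE | 3 | 38.1 | 41.0 | ✅ |
| API_P95 | 20 000 | 0.087 | 0.143 | ✅ |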

Data files: `bench-artifacts/2025-07-14/phase-stats.json`.

## 5 Δ‑SBOM Micro‑Benchmark Detail

### 5.1 Scenario

1. Base image `python:3.12-slim` already scanned (all layers cached).
2. Application layer (`COPY . /app`) triggers new digest.
3. Stella CLI lists 7 layers, backend replies 6 hit, 1 miss.
4. Builder scans only 1 layer (~9 MiB, 217 files) & uploads delta.
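The hit/miss split in this scenario amounts to partitioning the image's layer digests against the set the backend has already scanned. The sketch below illustrates that idea only; `split_layers` and the digest strings are hypothetical, and the real `/layers/missing` request and response shapes are not described in this workbook.

```python
def split_layers(image_layers, cached_digests):
    """Partition layer digests into cache hits and misses, keeping image order."""
    cached = set(cached_digests)
    hits = [digest for digest in image_layers if digest in cached]
    misses = [digest for digest in image_layers if digest not in cached]
    return hits, misses


# Hypothetical digests: six base-image layers already scanned, one new app layer.
base_layers = [f"sha256:base{i}" for i in range(6)]
image_layers = base_layers + ["sha256:app"]

hits, misses = split_layers(image_layers, base_layers)
print(len(hits), len(misses))  # 6 1
```

Only the digests in `misses` need a fresh scan, which is why the builder in the scenario touches a single ~9 MiB layer instead of the whole image.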

### 5.2 Key Timings

| Step | Time (ms) |
|------|-----------|
| `/layers/missing` | 13 |
| Trivy single layer | 655 |
| Upload delta blob | 88 |
| Backend merge + CVE | 74 |
| **Total wall‑time** | **830** |
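As a quick check of the table, the step timings can be summed and broken down by share of wall-time. The figures below are copied from the table; the variable names are illustrative only.

```python
# Step timings in milliseconds, copied from the key-timings table.
timings_ms = {
    "/layers/missing": 13,
    "Trivy single layer": 655,
    "Upload delta blob": 88,
    "Backend merge + CVE": 74,
}

total_ms = sum(timings_ms.values())
# Percentage of total wall-time spent in each step, rounded to one decimal.
shares = {step: round(100 * ms / total_ms, 1) for step, ms in timings_ms.items()}
slowest = max(timings_ms, key=timings_ms.get)

print(total_ms)                   # 830
print(slowest, shares[slowest])   # Trivy single layer 78.9
```

The single-layer Trivy scan accounts for roughly four fifths of the 830 ms, so it is the step with the most influence on the DELTA_SBOM target of ≤ 1 s.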

## 6 Quota Wait‑Path Benchmark Detail

### 6.1 Scenario

Free‑tier token reaches scan #200 – dashboard shows yellow banner.

### 6.2 Key Timings

| Step | Time (ms) |
|------|-----------|
| `/quota/check` Redis Lua `INCR` | 0.8 |
| Soft wait sleep (server) | 5 000 |
| Hard wait sleep (server) | 60 000 |
| End‑to‑end wall‑time (soft‑hit) | 5 003 |
| End‑to‑end wall‑time (hard‑hit) | 60 004 |
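The wait-path logic can be sketched as a counter increment followed by a choice of sleep duration. This is a rough illustration under stated assumptions: the in-memory `QuotaCounter` stands in for the Redis counter, `soft_limit` and `hard_limit` are hypothetical parameters (this section gives only the 5 s and 60 s wait durations, not the thresholds that trigger them), and the `>=` comparison is a guess.

```python
SOFT_WAIT_S = 5.0   # soft-wait duration from the table
HARD_WAIT_S = 60.0  # hard-wait duration from the table


class QuotaCounter:
    """In-memory stand-in for the per-token Redis counter (assumption)."""

    def __init__(self):
        self._counts = {}

    def incr(self, token):
        """Increment and return the scan count for a token."""
        self._counts[token] = self._counts.get(token, 0) + 1
        return self._counts[token]


def wait_seconds(count, soft_limit, hard_limit):
    """Pick the server-side sleep for a scan count.

    soft_limit and hard_limit are hypothetical; the workbook does not state them.
    """
    if count >= hard_limit:
        return HARD_WAIT_S
    if count >= soft_limit:
        return SOFT_WAIT_S
    return 0.0


counter = QuotaCounter()
count = counter.incr("free-tier-token")
print(wait_seconds(count, soft_limit=200, hard_limit=300))  # 0.0
```

In the measured runs the sleep dominates: the soft-hit path takes 5 003 ms end to end, of which 5 000 ms is the deliberate wait.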

## 7 Policy Eval Bench

### 7.1 Setup

- Policy YAML: 28 rules, mix of severity & package conditions.
- Input: scan result JSON with 1 026 findings.
- Evaluator: custom rules engine (Go structs → map look‑ups).

### 7.2 Latency Histogram

0‑10 ms  ▇▇▇▇▇▇▇▇▇▇  38 %
10‑20 ms ▇▇▇▇▇▇▇▇▇▇  42 %
20‑40 ms ▇▇▇▇▇▇     17 %
40‑50 ms ▇           3 %

P99 = 48 ms. Since the P99 is already under 50 ms, the P95 ≤ 50 ms gate is met.
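The histogram can be cross-checked by accumulating bucket shares until a given percentile is covered. The sketch below uses the bucket percentages shown above; `bucket_for_percentile` is a hypothetical helper, and because the shares are rounded it can only locate the bucket, not an exact latency.

```python
# (lower_ms, upper_ms) bounds and share of samples in percent, from the histogram.
buckets = [
    ((0, 10), 38),
    ((10, 20), 42),
    ((20, 40), 17),
    ((40, 50), 3),
]


def bucket_for_percentile(buckets, p):
    """Return the bounds of the first bucket whose cumulative share reaches p."""
    cumulative = 0
    for bounds, share in buckets:
        cumulative += share
        if cumulative >= p:
            return bounds
    raise ValueError("percentile exceeds histogram coverage")


print(bucket_for_percentile(buckets, 99))  # (40, 50)
print(bucket_for_percentile(buckets, 50))  # (10, 20)
```

The 99th percentile lands in the 40‑50 ms bucket, which agrees with the reported P99 of 48 ms.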

## 8 Trend Snapshot

Plot generated weekly by `scripts/update-trend.py`; shows the P95 per phase over the last 12 weeks.

## 9 Action Items

- **Image Unpack** – Evaluate zstd for layer decompress; aim to shave 1 s.
- **Feed Merge** – Parallelise regional XML feed parse (plugin) once stable.
- **Rego Support** – Prototype OPA side‑car; target ≤ 100 ms eval.
- **Concurrency** – Stress‑test 100 rps on 4‑node Redis cluster (Q4‑2025).

## 10 Change Log

| Date | Note |
|------|------|
| 2025‑07‑14 | Added Δ‑SBOM & Policy Eval phases; updated targets & current results. |
| 2025‑07‑12 | First public workbook (SBOM‑first, image‑unpack, feed merge). |
