# 12 - Performance Workbook

Purpose: define repeatable, data-driven benchmarks that guard the StellaOps core pledge:

“P95 vulnerability feedback in ≤ 5 seconds.”


## 0 Benchmark Scope

| Area | Included | Excluded |
|------|----------|----------|
| SBOM-first scan | Trivy engine w/ warmed DB | Full image unpack ≥ 300 MB |
| Delta SBOM ⭑ | Missing-layer lookup & merge | Multi-arch images |
| Policy eval ⭑ | YAML → JSON → rule match | Rego (until GA) |
| Feed merge | NVD JSON 2023-2025 | GHSA GraphQL (plugin) |
| Quota wait-path | 5 s soft-wait, 60 s hard-wait behaviour | Paid tiers (unlimited) |
| API latency | REST `/scan`, `/layers/missing` | UI SPA calls |

⭑ = new in July 2025.


## 1 Hardware Baseline (Reference Rig)

| Element | Spec |
|---------|------|
| CPU | 8 vCPU (Intel Ice Lake equiv.) |
| Memory | 16 GiB |
| Disk | NVMe SSD, 3 GB/s R/W |
| Network | 1 Gbit virt. switch |
| Container | Docker 25.0 + overlay2 |
| OS | Ubuntu 22.04 LTS (kernel 6.8) |

All P95 targets assume a single-node deployment on this rig unless stated.


## 2 Phase Targets & Gates

| Phase (ID) | Target P95 | Gate (CI) | Rationale |
|------------|------------|-----------|-----------|
| SBOM_FIRST | 5 s | hard | Core UX promise. |
| IMAGE_UNPACK | 10 s | soft | Fallback path for legacy flows. |
| DELTA_SBOM | 1 s | hard | Needed to stay sub-5 s for big bases. |
| POLICY_EVAL | 50 ms | hard | Keeps gate latency invisible to users. |
| QUOTA_WAIT | soft 5 s; hard 60 s | hard | Ensures graceful Free-tier throttling. |
| SCHED_RESCAN | 30 s | soft | Nightly batch, not user-facing. |
| FEED_MERGE | 60 s | soft | Off-peak cron @ 01:00. |
| API_P95 | 200 ms | hard | UI snappiness. |

Gate legend: hard = break CI if regression > 3 × target; soft = raise warning & issue ticket.
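
The gate legend can be read as a small decision rule. The sketch below is one possible interpretation, not the project's CI code: it assumes "regression > 3 × target" means a measured P95 above three times the phase target, and that a hard-gated phase between 1× and 3× its target only warns. `PhaseGate`, `GATES`, and `evaluate_gate` are hypothetical names, and QUOTA_WAIT is left out because it carries two targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseGate:
    target_s: float  # target P95 in seconds, copied from the table above
    gate: str        # "hard" or "soft"


# Targets from the Phase Targets & Gates table (QUOTA_WAIT omitted: dual target).
GATES = {
    "SBOM_FIRST": PhaseGate(5.0, "hard"),
    "IMAGE_UNPACK": PhaseGate(10.0, "soft"),
    "DELTA_SBOM": PhaseGate(1.0, "hard"),
    "POLICY_EVAL": PhaseGate(0.050, "hard"),
    "SCHED_RESCAN": PhaseGate(30.0, "soft"),
    "FEED_MERGE": PhaseGate(60.0, "soft"),
    "API_P95": PhaseGate(0.200, "hard"),
}

# Assumption: "regression > 3 x target" is measured against the phase target itself.
HARD_FAIL_FACTOR = 3.0


def evaluate_gate(phase: str, p95_s: float) -> str:
    """Return "ok", "warn" or "fail" for a measured P95 (seconds)."""
    gate = GATES[phase]
    if p95_s <= gate.target_s:
        return "ok"
    if gate.gate == "hard" and p95_s > HARD_FAIL_FACTOR * gate.target_s:
        return "fail"  # hard gate: break CI
    return "warn"      # soft gate, or hard gate still within 3 x target (assumption)
```

Under this reading, `evaluate_gate("SBOM_FIRST", 6.0)` only warns, while a P95 above 15 s would break the build.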


## 3 Test Harness

  • Runner: `perf/run.sh`, accepts `--phase` and `--samples`.
  • Metrics: Prometheus + jq extracts; aggregated via `scripts/aggregate.ts`.
  • CI: GitLab CI job `benchmark` publishes JSON to `bench-artifacts/`.
  • Visualisation: Grafana dashboard StellaPerf (provisioned JSON).

Note: the harness mounts `/var/cache/trivy` as tmpfs to avoid disk noise.
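
For readers unfamiliar with how a mean and P95 are derived from raw samples, here is a minimal sketch. It is not a port of `scripts/aggregate.ts`, whose method is not shown in this workbook; it assumes the nearest-rank percentile definition, and `p95` and `summarise` are hypothetical helper names.

```python
from statistics import mean


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # ceil(0.95 * n) computed with integers to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarise(samples):
    """Return the per-phase figures reported in a results table."""
    return {"samples": len(samples), "mean": mean(samples), "p95": p95(samples)}
```

With very small sample counts (for example three feed-merge runs) the nearest-rank P95 is simply the slowest run, which is worth keeping in mind when reading low-sample phases.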


## 4 Current Results (July 2025)

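| Phase | Samples | Mean (s) | P95 (s) | Target OK? |
|-------|---------|----------|---------|------------|
| SBOM_FIRST | 100 | 3.7 | 4.9 | ✅ |
| IMAGE_UNPACK | 50 | 6.4 | 9.2 | ✅ |
| DELTA_SBOM | 100 | 0.46 | 0.83 | ✅ |
| POLICY_EVAL | 1000 | 0.021 | 0.041 | ✅ |
| QUOTA_WAIT | 80 | 4.0* | 4.9* | ✅ |
| SCHED_RESCAN | 10 | 18.3 | 24.9 | ✅ |
| FEED_MERGE | 3 | 38.1 | 41.0 | ✅ |
| API_P95 | 20000 | 0.087 | 0.143 | ✅ |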

Data files: bench-artifacts/20250714/phasestats.json.


## 5 Δ-SBOM Micro-Benchmark Detail

### 5.1 Scenario

  1. Base image python:3.12-slim already scanned (all layers cached).
  2. Application layer (COPY . /app) triggers new digest.
  3. Santech lists 7 layers, backend replies 6 hit, 1 miss.
  4. Builder scans only 1 layer (~9 MiB, 217 files) & uploads delta.
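
The missing-layer lookup in steps 3 and 4 boils down to a set difference over layer digests. The sketch below illustrates that idea only; it does not reproduce the `/layers/missing` API, and the function name and placeholder digests are hypothetical.

```python
def missing_layers(image_layers, cached_digests):
    """Return the digests, in image order, that have not been scanned yet."""
    cached = set(cached_digests)  # O(1) membership checks
    return [digest for digest in image_layers if digest not in cached]


# Scenario from above: 7 layers, 6 already cached, 1 new application layer.
# Digests are placeholders, not real sha256 values.
base_layers = [f"sha256:base{i}" for i in range(6)]
image = base_layers + ["sha256:app"]

to_scan = missing_layers(image, base_layers)
print(len(image), "layers,", len(to_scan), "to scan:", to_scan)
```

Only the digests returned here need a Trivy pass; the cached layers are reused as-is when the delta is merged.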

### 5.2 Key Timings

| Step | Time (ms) |
|------|-----------|
| `/layers/missing` | 13 |
| Trivy single layer | 655 |
| Upload delta blob | 88 |
| Backend merge + CVE | 74 |
| Total wall-time | 830 ms |

## 6 Quota Wait-Path Benchmark Detail

### 6.1 Scenario

  1. Free-tier token reaches scan #200; dashboard shows yellow banner.

### 6.2 Key Timings

| Step | Time (ms) |
|------|-----------|
| `/quota/check` Redis Lua INCR | 0.8 |
| Soft wait sleep (server) | 5000 |
| Hard wait sleep (server) | 60000 |
| End-to-end wall-time (soft-hit) | 5003 |
| End-to-end wall-time (hard-hit) | 60004 |
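
The wait-path can be pictured as a two-threshold rule applied after the counter increment. This is a sketch under stated assumptions: the workbook gives the 5 s and 60 s sleeps but not the scan counts at which each applies, so `soft_limit` and `hard_limit` are hypothetical parameters, as is the function name.

```python
SOFT_WAIT_S = 5    # soft wait sleep, from the timings table
HARD_WAIT_S = 60   # hard wait sleep, from the timings table


def wait_seconds(scan_count, soft_limit, hard_limit):
    """Return how long the server would sleep before serving this scan.

    soft_limit and hard_limit are hypothetical thresholds; the workbook
    does not state the real values.
    """
    if scan_count > hard_limit:
        return HARD_WAIT_S
    if scan_count > soft_limit:
        return SOFT_WAIT_S
    return 0
```

The measured end-to-end figures (5003 ms and 60004 ms) suggest that everything outside the sleep itself, including the sub-millisecond quota check, adds only a few milliseconds.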

## 7 Policy Eval Bench

### 7.1 Setup

  • Policy YAML: 28 rules, mix of severity & package conditions.
  • Input: scan result JSON with 1026 findings.
  • Evaluator: custom rules engine (Go structs → map lookups).
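
To show why map lookups keep evaluation cheap, the sketch below indexes rules by the value they match so that each finding costs a couple of dictionary lookups rather than a scan over all rules. The real evaluator is written in Go and its schema is not shown here; the rule and finding fields (`field`, `equals`, `action`, `id`) and both function names are hypothetical.

```python
from collections import defaultdict


def build_index(rules):
    """Group rules by (field, expected value) for constant-time lookup."""
    index = defaultdict(list)
    for rule in rules:
        index[(rule["field"], rule["equals"])].append(rule)
    return index


def evaluate(index, findings):
    """Return (finding id, action) pairs for every rule a finding triggers."""
    hits = []
    for finding in findings:
        for field in ("severity", "package"):
            # .get() avoids inserting empty entries into the defaultdict.
            for rule in index.get((field, finding.get(field)), []):
                hits.append((finding["id"], rule["action"]))
    return hits


# Illustrative data only; identifiers are made up.
rules = [
    {"field": "severity", "equals": "CRITICAL", "action": "block"},
    {"field": "package", "equals": "openssl", "action": "warn"},
]
findings = [
    {"id": "F-1", "severity": "CRITICAL", "package": "zlib"},
    {"id": "F-2", "severity": "LOW", "package": "openssl"},
    {"id": "F-3", "severity": "LOW", "package": "bash"},
]
print(evaluate(build_index(rules), findings))
```

The index is built once per policy, so the per-finding cost stays flat as the rule count grows.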

### 7.2 Latency Histogram

0-10 ms  ▇▇▇▇▇▇▇▇▇▇  38%
10-20 ms ▇▇▇▇▇▇▇▇▇▇  42%
20-40 ms ▇▇▇▇▇▇     17%
40-50 ms ▇           3%

P99 = 48 ms. Meets 50 ms gate.


## 8 Trend Snapshot

Perf trend spark‑line placeholder

Plot generated weekly by scripts/updatetrend.py; shows the last 12 weeks of P95 per phase.


## 9 Action Items

  1. Image Unpack: evaluate zstd for layer decompression; aim to shave 1 s.
  2. Feed Merge: parallelise BDU XML parse (plugin) once stable.
  3. Rego Support: prototype OPA sidecar; target ≤ 100 ms eval.
  4. Concurrency: stress-test 100 rps on a 4-node Redis cluster (Q4 2025).

## 10 Change Log

| Date | Note |
|------|------|
| 2025-07-14 | Added Δ-SBOM & Policy Eval phases; updated targets & current results. |
| 2025-07-12 | First public workbook (SBOM-first, image-unpack, feed merge). |