Performance Testing Pipeline for Queue-Based Workflows
Note: This document was originally created as part of advisory analysis. It provides a comprehensive playbook for HTTP → Valkey → Worker performance testing.
What we're measuring (plain English)
- TTFB/TTFS (HTTP): time the gateway spends accepting the request + queuing the job.
- Valkey latency: enqueue (LPUSH/XADD), pop/claim (BRPOP/XREADGROUP), and round-trip.
- Worker service time: time to pick up, process, and ack.
- Queueing delay: time spent waiting in the queue (arrival → start of worker).
These four add up to the "hop latency" users feel when the system is under load.
Minimal tracing you can add today
Emit these IDs/headers end-to-end:
- x-stella-corr-id (uuid)
- x-stella-enq-ts (gateway enqueue ts, ns)
- x-stella-claim-ts (worker claim ts, ns)
- x-stella-done-ts (worker done ts, ns)
From these, compute:
- queue_delay = claim_ts - enq_ts
- service_time = done_ts - claim_ts
- http_ttfs = gateway_first_byte_ts - http_request_start_ts
- hop_latency = done_ts - enq_ts (or return-path if synchronous)
Clock-sync tip: use monotonic clocks in code and convert to ns; don't mix wall-clock.
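For illustration, a minimal Python sketch of these computations, assuming all three timestamps were captured in nanoseconds from comparable clocks (this is not a prescribed tool, just the arithmetic):

# Derive the per-request latencies from the three pipeline timestamps (ns).
def derive_latencies(enq_ts_ns: int, claim_ts_ns: int, done_ts_ns: int) -> dict:
    queue_delay_ns = claim_ts_ns - enq_ts_ns   # waiting in the queue
    service_ns = done_ts_ns - claim_ts_ns      # worker processing time
    hop_ns = done_ts_ns - enq_ts_ns            # enqueue -> done ("hop")
    return {
        "queue_delay_ms": queue_delay_ns / 1e6,
        "service_time_ms": service_ns / 1e6,
        "hop_latency_ms": hop_ns / 1e6,
    }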
Valkey commands (safe, BSD Valkey)
Use Valkey Streams + Consumer Groups for fairness and metrics:
- Enqueue: XADD jobs * corr-id <uuid> enq-ts <ns> payload <...>
- Claim: XREADGROUP GROUP workers w1 COUNT 1 BLOCK 1000 STREAMS jobs >
- Ack: XACK jobs workers <id>
Add a small Lua script for timestamping at enqueue (atomic):
-- KEYS[1]=stream
-- ARGV[1]=enq_ts_ns, ARGV[2]=corr_id, ARGV[3]=payload
return redis.call('XADD', KEYS[1], '*',
'corr', ARGV[2], 'enq', ARGV[1], 'p', ARGV[3])
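A gateway could invoke the script roughly like this (a Python sketch with a redis-py-compatible client, which also speaks the Valkey protocol; host, port, and the payload string are placeholders):

import time
import uuid
import redis  # redis-py; protocol-compatible with Valkey

ENQUEUE_LUA = """
return redis.call('XADD', KEYS[1], '*',
  'corr', ARGV[2], 'enq', ARGV[1], 'p', ARGV[3])
"""

r = redis.Redis(host="localhost", port=6379)
enqueue = r.register_script(ENQUEUE_LUA)  # loads once, then runs via EVALSHA

entry_id = enqueue(
    keys=["jobs"],
    args=[time.time_ns(), str(uuid.uuid4()), '{"data":"ping"}'],
)
print("enqueued stream entry:", entry_id)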
Load shapes to test (find the envelope)
- Open-loop (arrival-rate controlled): 50 → 10k req/min in steps; constant rate per step. Reveals queueing onset.
- Burst: 0 → N in short spikes (e.g., 5k in 10s) to see saturation and drain time.
- Step-up/down: double every 2 min until SLO breach; then halve down.
- Long tail soak: run at 70–80% of max for 1h; watch p95-p99.9 drift.
Target outputs per step: p50/p90/p95/p99 for queue_delay, service_time, hop_latency, plus throughput and error rate.
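As a sketch of what those per-step summaries could look like offline (Python, nearest-rank percentiles; inputs are assumed to be hop latencies in milliseconds plus an error count, roughly the kind of logic that could live in a tool like tools/analyze.py from the layout below):

# Per-plateau summary: nearest-rank percentiles plus throughput and errors.
def percentile(sorted_values, p):
    if not sorted_values:
        return None
    idx = max(0, int(round(p / 100.0 * len(sorted_values))) - 1)
    return sorted_values[idx]

def summarize(hop_ms, errors, duration_s):
    xs = sorted(hop_ms)
    return {
        "p50": percentile(xs, 50),
        "p90": percentile(xs, 90),
        "p95": percentile(xs, 95),
        "p99": percentile(xs, 99),
        "throughput_rps": len(xs) / duration_s,
        "error_rate": errors / max(1, len(xs) + errors),
    }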
k6 script (HTTP client pressure)
// save as hop-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { uuidv4 } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';
export let options = {
scenarios: {
step_load: {
executor: 'ramping-arrival-rate',
startRate: 20, timeUnit: '1s',
preAllocatedVUs: 200, maxVUs: 5000,
stages: [
{ target: 50, duration: '1m' },
{ target: 100, duration: '1m' },
{ target: 200, duration: '1m' },
{ target: 400, duration: '1m' },
{ target: 800, duration: '1m' },
],
},
},
thresholds: {
'http_req_failed': ['rate<0.01'],
'http_req_duration{phase:hop}': ['p(95)<500'],
},
};
export default function () {
const corr = uuidv4(); // jslib helper; avoids depending on a WebCrypto global
const res = http.post(
__ENV.GW_URL,
JSON.stringify({ data: 'ping', corr }),
{
headers: { 'Content-Type': 'application/json', 'x-stella-corr-id': corr },
tags: { phase: 'hop' },
}
);
check(res, { 'status 2xx/202': r => r.status === 200 || r.status === 202 });
sleep(0.01);
}
Run: GW_URL=https://gateway.example/hop k6 run hop-test.js
Worker hooks (.NET 10 sketch)
using System.Diagnostics; // Stopwatch

// At claim
var now = Stopwatch.GetTimestamp(); // monotonic
var claimNs = now.ToNanoseconds();
log.AddTag("x-stella-claim-ts", claimNs);
// After processing
var doneNs = Stopwatch.GetTimestamp().ToNanoseconds();
log.AddTag("x-stella-done-ts", doneNs);
// Include corr-id and stream entry id in logs/metrics
Helper:
public static class MonoTime {
static readonly double _nsPerTick = 1_000_000_000d / Stopwatch.Frequency;
public static long ToNanoseconds(this long ticks) => (long)(ticks * _nsPerTick);
}
Prometheus metrics to expose
- valkey_enqueue_ns (histogram)
- valkey_claim_block_ms (gauge)
- worker_service_ns (histogram, labels: worker_type, route)
- queue_delay_ns (histogram, derived from claim_ts - enq_ts)
- queue_depth (gauge via XLEN or XINFO STREAM)
- enqueue_rate, dequeue_rate (counters)
Example recording rules:
- record: hop:queue_delay_p95
  expr: histogram_quantile(0.95, sum(rate(queue_delay_ns_bucket[1m])) by (le))
- record: hop:service_time_p95
  expr: histogram_quantile(0.95, sum(rate(worker_service_ns_bucket[1m])) by (le))
- record: hop:latency_budget_p95
  expr: hop:queue_delay_p95 + hop:service_time_p95
Autoscaling signals (HPA/KEDA friendly)
- Primary: queue depth & its derivative (d/dt).
- Secondary: p95 queue_delay and worker CPU.
- Safety: max in-flight per worker; backpressure with HTTP 429 when queue_depth > D or p95_queue_delay > SLO*0.8 (see the sketch below).
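For illustration, the safety rule as a small Python predicate; MAX_QUEUE_DEPTH and SLO_MS are placeholder names and values to tune for your system:

# Illustrative backpressure check; thresholds are placeholders you tune.
SLO_MS = 500
MAX_QUEUE_DEPTH = 10_000          # "D" from the rule above

def should_shed_load(queue_depth: int, p95_queue_delay_ms: float) -> bool:
    """Return True when the gateway should answer 429 instead of enqueueing."""
    return queue_depth > MAX_QUEUE_DEPTH or p95_queue_delay_ms > 0.8 * SLO_MS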
Plot the "envelope" (what you'll look at)
- X-axis: offered load (req/s).
- Y-axis: p95 hop latency (ms).
- Overlay: p99 (dashed), SLO line (e.g., 500 ms), and capacity knee (where p95 sharply rises).
- Add secondary panel: queue depth vs load.
Performance Test Guidelines
HTTP → Valkey → Worker pipeline
1) Objectives and scope
Primary objectives
Your performance tests MUST answer these questions with evidence:
- Capacity knee: At what offered load does queue delay start growing sharply?
- User-impact envelope: What are p50/p95/p99 hop latency curves vs offered load?
- Decomposition: How much of hop latency is:
- gateway enqueue time
- Valkey enqueue/claim RTT
- queue wait time
- worker service time
- Scaling behavior: How do these change with worker replica counts (N workers)?
- Stability: Under sustained load, do latencies drift (GC, memory, fragmentation, background jobs)?
Non-goals (explicitly out of scope unless you add them later)
- Micro-optimizing single function runtime
- Synthetic "max QPS" records without a representative payload
- Tests beyond basic smoke that collect only end-to-end metrics, without per-segment breakdowns
2) Definitions and required metrics
Required latency definitions (standardize these names)
Agents MUST compute and report these per request/job:
- t_http_accept: time from client send → gateway accepts request
- t_enqueue: time spent in gateway to enqueue into Valkey (server-side)
- t_valkey_rtt_enq: client-observed RTT for enqueue command(s)
- t_queue_delay: claim_ts - enq_ts
- t_service: done_ts - claim_ts
- t_hop: done_ts - enq_ts (this is the "true pipeline hop" latency)
- Optional but recommended:
  - t_ack: time to ack completion (Valkey ack RTT)
  - t_http_response: request start → gateway response sent (TTFB/TTFS)
Required percentiles and aggregations
Per scenario step (e.g., each offered load plateau), agents MUST output:
- p50 / p90 / p95 / p99 / p99.9 for: t_hop, t_queue_delay, t_service, t_enqueue
- Throughput: offered rps and achieved rps
- Error rate: HTTP failures, enqueue failures, worker failures
- Queue depth and backlog drain time
Required system-level telemetry (minimum)
Agents MUST collect these time series during tests:
- Worker: CPU, memory, GC pauses (if .NET), threadpool saturation indicators
- Valkey: ops/sec, connected clients, blocked clients, memory used, evictions, slowlog count
- Gateway: CPU/mem, request rate, response codes, request duration histogram
3) Environment and test hygiene requirements
Environment requirements
Agents SHOULD run tests in an environment that matches production in:
- container CPU/memory limits
- number of nodes, network topology
- Valkey topology (single, cluster, sentinel, etc.)
- worker replica autoscaling rules (or deliberately disabled)
If exact parity isn't possible, agents MUST record all known differences in the report.
Test hygiene (non-negotiable)
Agents MUST:
- Start from empty queues (no backlog).
- Disable client retries (or explicitly run two variants: retries off / retries on).
- Warm up before measuring (e.g., 60s warm-up minimum).
- Hold steady plateaus long enough to stabilize (usually 2–5 minutes per step).
- Cool down and verify backlog drains (queue depth returns to baseline).
- Record exact versions/SHAs of gateway/worker and Valkey config.
Load generator hygiene
Agents MUST ensure the load generator is not the bottleneck:
- CPU < ~70% during test
- no local socket exhaustion
- enough VUs/connections
- if needed, distributed load generation
4) Instrumentation spec (agents implement this first)
Correlation and timestamps
Agents MUST propagate an end-to-end correlation ID and timestamps.
Required fields
- corr_id (UUID)
- enq_ts_ns (set at enqueue, monotonic or consistent clock)
- claim_ts_ns (set by worker when job is claimed)
- done_ts_ns (set by worker when job processing ends)
Where these live
- HTTP request header: x-corr-id: <uuid>
- Valkey job payload fields: corr, enq, and optionally payload size/type
- Worker logs/metrics: include corr_id, job id, claim_ts_ns, done_ts_ns
Clock requirements
Agents MUST use a consistent timing source:
- Prefer monotonic timers for durations (Stopwatch / monotonic clock)
- If timestamps cross machines, ensure they're comparable:
- either rely on synchronized clocks (NTP) and monitor drift
- or compute durations using monotonic tick deltas within the same host and transmit durations (less ideal for queue delay)
Practical recommendation: use wall-clock ns for cross-host timestamps with NTP + drift checks, and also record per-host monotonic durations for sanity.
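A small Python sketch of that recommendation, capturing both clock families at enqueue time (the variable names are illustrative):

import time

# Record both clock families at enqueue:
# - wall-clock ns: comparable across hosts if NTP keeps drift small
# - monotonic ns: safe for durations measured on the same host
enq_wall_ns = time.time_ns()
enq_mono_ns = time.monotonic_ns()

# ... later, on the same host ...
local_elapsed_ns = time.monotonic_ns() - enq_mono_ns   # immune to clock steps
cross_host_delay_ns = time.time_ns() - enq_wall_ns     # needs NTP drift checks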
Valkey queue semantics (recommended)
Agents SHOULD use Streams + Consumer Groups for stable claim semantics and good observability:
- Enqueue: XADD jobs * corr <uuid> enq <ns> payload <...>
- Claim: XREADGROUP GROUP workers <consumer> COUNT 1 BLOCK 1000 STREAMS jobs >
- Ack: XACK jobs workers <id>
Agents MUST record stream length (XLEN) or consumer group lag (XINFO GROUPS) as queue depth/lag.
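One way to sample depth and lag (a Python sketch with a redis-py-compatible client; the lag field is only reported by newer stream servers, so it may come back as None):

import redis

r = redis.Redis(host="localhost", port=6379)

def queue_depth_and_lag(stream: str = "jobs", group: str = "workers"):
    depth = r.xlen(stream)                      # total entries in the stream
    lag = None
    for g in r.xinfo_groups(stream):            # one dict per consumer group
        if g.get("name") in (group, group.encode()):
            lag = g.get("lag")                  # undelivered entries, if reported
    return depth, lag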
Metrics exposure
Agents MUST publish Prometheus (or equivalent) histograms:
- gateway_enqueue_seconds (or ns) histogram
- valkey_enqueue_rtt_seconds histogram
- worker_service_seconds histogram
- queue_delay_seconds histogram (derived from timestamps; can be computed in worker or offline)
- hop_latency_seconds histogram
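A minimal exposure sketch using the Python prometheus_client (the .NET services would use their own Prometheus client library; the bucket boundaries here are illustrative):

from prometheus_client import Gauge, Histogram, start_http_server

# Illustrative buckets (seconds); tune to your expected latency range.
BUCKETS = (0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5)

queue_delay_seconds = Histogram(
    "queue_delay_seconds", "claim_ts - enq_ts", buckets=BUCKETS)
worker_service_seconds = Histogram(
    "worker_service_seconds", "done_ts - claim_ts", buckets=BUCKETS)
hop_latency_seconds = Histogram(
    "hop_latency_seconds", "done_ts - enq_ts", buckets=BUCKETS)
queue_depth = Gauge("queue_depth", "XLEN of the jobs stream")

start_http_server(9100)  # expose /metrics for Prometheus to scrape

def observe(enq_ns: int, claim_ns: int, done_ns: int) -> None:
    # Called once per completed job, with the three pipeline timestamps in ns.
    queue_delay_seconds.observe((claim_ns - enq_ns) / 1e9)
    worker_service_seconds.observe((done_ns - claim_ns) / 1e9)
    hop_latency_seconds.observe((done_ns - enq_ns) / 1e9)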
5) Workload modeling and test data
Agents MUST define a workload model before running capacity tests:
- Endpoint(s): list exact gateway routes under test
- Payload types: small/typical/large
- Mix: e.g., 70/25/5 by payload size
- Idempotency rules: ensure repeated jobs don't corrupt state
- Data reset strategy: how test data is cleaned or isolated per run
Agents SHOULD test at least:
- Typical payload (p50)
- Large payload (p95)
- Worst-case allowed payload (bounded by your API limits)
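A sketch of a weighted payload sampler matching the 70/25/5 mix above (Python; the payload sizes are illustrative placeholders, not measured profiles):

import json
import random

# Illustrative payload profiles keyed by the mix above (70/25/5).
PAYLOADS = {
    "p50": {"data": "x" * 256},       # typical
    "p95": {"data": "x" * 8_192},     # large
    "max": {"data": "x" * 65_536},    # worst-case allowed by the API limit
}
MIX = [("p50", 0.70), ("p95", 0.25), ("max", 0.05)]

def next_payload() -> str:
    profile = random.choices(
        [name for name, _ in MIX],
        weights=[w for _, w in MIX],
    )[0]
    return json.dumps(PAYLOADS[profile])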
6) Scenario suite your agents MUST implement
Each scenario MUST be defined as code/config (not manual).
Scenario A — Smoke (fast sanity)
Goal: verify instrumentation + basic correctness
Load: low (e.g., 1–5 rps), 2 minutes
Pass:
- 0 backlog after run
- error rate < 0.1%
- metrics present for all segments
Scenario B — Baseline (repeatable reference point)
Goal: establish a stable baseline for regression tracking
Load: fixed moderate load (e.g., 30–50% of expected capacity), 10 minutes
Pass:
- p95 t_hop within baseline ± tolerance (set after first runs)
- no upward drift in p95 across time (trend line ~flat)
Scenario C — Capacity ramp (open-loop)
Goal: find the knee where queueing begins
Method: open-loop arrival-rate ramp with plateaus
Example stages (edit to fit your system):
- 50 rps for 2m
- 100 rps for 2m
- 200 rps for 2m
- 400 rps for 2m
- … until SLO breach or errors spike
MUST:
- warm-up stage before first plateau
- record per-plateau summary
Stop conditions (any one triggers a stop):
- error rate > 1%
- queue depth grows without bound over an entire plateau
- p95 t_hop exceeds SLO for 2 consecutive plateaus
Scenario D — Stress (push past capacity)
Goal: characterize failure mode and recovery
Load: 120–200% of knee load, 5–10 minutes
Pass (for resilience):
- system does not crash permanently
- once load stops, backlog drains within target time (define it)
Scenario E — Burst / spike
Goal: see how quickly the queue grows and drains
Load shape:
- baseline low load
- sudden burst (e.g., 10× for 10–30s)
- return to baseline
Report:
- peak queue depth
- time to drain to baseline
- p99 t_hop during burst
Scenario F — Soak (long-running)
Goal: detect drift (leaks, fragmentation, GC patterns)
Load: 70–85% of knee, 60–180 minutes
Pass:
- p95 does not trend upward beyond threshold
- memory remains bounded
- no rising error rate
Scenario G — Scaling curve (worker replica sweep)
Goal: turn results into scaling rules
Method:
- Repeat Scenario C with worker replicas = 1, 2, 4, 8…
Deliverable:
- plot of knee load vs worker count
- p95 t_service vs worker count (should remain similar; queue delay should drop)
7) Execution protocol (runbook)
Agents MUST run every scenario using the same disciplined flow:
Pre-run checklist
- confirm system versions/SHAs
- confirm autoscaling mode:
- Off for baseline capacity characterization
- On for validating autoscaling policies
- clear queues and consumer group pending entries
- restart or at least record "time since deploy" for services (cold vs warm)
During run
- ensure load is truly open-loop when required (arrival-rate based)
- continuously record:
- offered vs achieved rate
- queue depth
- CPU/mem for gateway/worker/Valkey
Post-run
- stop load
- wait until backlog drains (or record that it doesn't)
- export:
- k6/runner raw output
- Prometheus time series snapshot
- sampled logs with corr_id fields
- generate a summary report automatically (no hand calculations)
8) Analysis rules (how agents compute "the envelope")
Agents MUST generate at minimum two plots per run:
- Latency envelope: offered load (x-axis) vs p95 t_hop (y-axis), with p99 and the SLO line overlaid
- Queue behavior: offered load vs queue depth (or lag), plus drain time
How to identify the "knee"
Agents SHOULD mark the knee as the first plateau where:
- queue depth grows monotonically within the plateau, or
- p95 t_queue_delay increases by > X% step-to-step (e.g., 50–100%)
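For illustration, a Python sketch that applies the second rule to per-plateau summaries; the field names offered_rps and p95_queue_delay_ms are assumptions about your summary format:

# Knee detection over per-plateau summaries ordered by offered load, e.g.
# [{"offered_rps": 100, "p95_queue_delay_ms": 12}, ...]
def find_knee(plateaus, growth_threshold=0.5):
    """Return the offered load of the first plateau whose p95 queue delay
    grows by more than growth_threshold (0.5 = 50%) vs the previous plateau."""
    for prev, cur in zip(plateaus, plateaus[1:]):
        if prev["p95_queue_delay_ms"] > 0:
            growth = (cur["p95_queue_delay_ms"] / prev["p95_queue_delay_ms"]) - 1
            if growth > growth_threshold:
                return cur["offered_rps"]
    return None  # no knee observed within the tested range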
Convert results into scaling guidance
Agents SHOULD compute:
- capacity_per_worker ≈ 1 / mean(t_service) (jobs/sec per worker)
- recommended replicas for offered load λ at target utilization U: workers_needed = ceil(λ * mean(t_service) / U)
- choose U ~ 0.6–0.75 for headroom
This should be reported alongside the measured envelope.
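A quick arithmetic sketch of that sizing rule in Python (the example numbers are purely illustrative):

import math

def workers_needed(offered_rps: float, mean_service_s: float,
                   utilization: float = 0.7) -> int:
    """ceil(lambda * mean(t_service) / U), with U ~ 0.6-0.75 for headroom."""
    return math.ceil(offered_rps * mean_service_s / utilization)

# Example: 400 jobs/s at 12 ms mean service time and U = 0.7 -> 7 workers.
print(workers_needed(400, 0.012, 0.7))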
9) Pass/fail criteria and regression gates
Agents MUST define gates in configuration, not in someone's head.
Suggested gating structure:
- Smoke gate: error rate < 0.1%, backlog drains
- Baseline gate: p95 t_hop regression < 10% (tune after you have history)
- Capacity gate: knee load regression < 10% (optional but very valuable)
- Soak gate: p95 drift over time < 15% and no memory runaway
10) Common pitfalls (agents must avoid)
- Closed-loop tests used for capacity: closed-loop ("N concurrent users") self-throttles and can hide queueing onset. Use open-loop arrival rate for capacity.
- Ignoring queue depth: a system can look "healthy" in request latency while silently building backlog.
- Measuring only gateway latency: you must measure enqueue → claim → done to see the real hop.
- Load generator bottleneck: if the generator saturates, you'll under-estimate capacity.
- Retries enabled by default: retries can inflate load and hide root causes; run with retries off first.
- Not controlling warm vs cold: cold caches vs warmed services produce different envelopes; record the condition.
Agent implementation checklist (deliverables)
Assign these as concrete tasks to your agents.
Agent 1 — Observability & tracing
MUST deliver:
- correlation id propagation gateway → Valkey → worker
- timestamps enq/claim/done
- Prometheus histograms for enqueue, service, hop
- queue depth metric (XLEN / XINFO lag)
Agent 2 — Load test harness
MUST deliver:
- test runner scripts (k6 or equivalent) for scenarios A–G
- test config file (YAML/JSON) controlling:
- stages (rates/durations)
- payload mix
- headers (corr-id)
- reproducible seeds and version stamping
Agent 3 — Result collector and analyzer
MUST deliver:
- a pipeline that merges:
- load generator output
- hop timing data (from logs or a completion stream)
- Prometheus snapshots
- automatic summary + plots:
- latency envelope
- queue depth/drain
- CSV/JSON exports for long-term tracking
Agent 4 — Reporting and dashboards
MUST deliver:
- a standard report template that includes:
- environment details
- scenario details
- key charts
- knee estimate
- scaling recommendation
- Grafana dashboard with the required panels
Agent 5 — CI / release integration
SHOULD deliver:
- PR-level smoke test (Scenario A)
- nightly baseline (Scenario B)
- weekly capacity sweep (Scenario C + scaling curve)
Template: scenario spec (agents can copy/paste)
test_run:
system_under_test:
gateway_sha: "<git sha>"
worker_sha: "<git sha>"
valkey_version: "<version>"
environment:
cluster: "<name>"
workers: 4
autoscaling: "off" # off|on
workload:
endpoint: "/hop"
payload_profile: "p50"
mix:
p50: 0.7
p95: 0.25
max: 0.05
scenario:
name: "capacity_ramp"
mode: "open_loop"
warmup_seconds: 60
stages:
- rps: 50
duration_seconds: 120
- rps: 100
duration_seconds: 120
- rps: 200
duration_seconds: 120
- rps: 400
duration_seconds: 120
gates:
max_error_rate: 0.01
slo_ms_p95_hop: 500
backlog_must_drain_seconds: 300
outputs:
artifacts_dir: "./artifacts/<timestamp>/"
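One way to keep the runner driven by this spec rather than hand-edited scripts: a Python sketch that converts the stages into k6 ramping-arrival-rate stages (assumes PyYAML and the field names above; scenario.yaml is a placeholder filename):

import json
import yaml  # PyYAML

# Convert the scenario spec's stages into k6 'ramping-arrival-rate' stages.
with open("scenario.yaml") as f:
    spec = yaml.safe_load(f)

stages = [
    {"target": s["rps"], "duration": f'{s["duration_seconds"]}s'}
    for s in spec["test_run"]["scenario"]["stages"]
]

# Hand this to k6 via an environment variable or a generated config file.
print(json.dumps(stages, indent=2))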
Sample folder layout
perf/
docker-compose.yml
prometheus/
prometheus.yml
k6/
lib.js
smoke.js
capacity_ramp.js
burst.js
soak.js
stress.js
scaling_curve.sh
tools/
analyze.py
src/
Perf.Gateway/
Perf.Worker/
Document Version: 1.0
Archived From: docs/product-advisories/unprocessed/16-Dec-2025 - Reimagining Proof-Linked UX in Security Workflows.md
Archive Reason: Wrong content was pasted; this performance testing content preserved for future use.