Here’s a compact, ready‑to‑use playbook to measure and plot performance envelopes for an HTTP → Valkey → Worker hop under variable concurrency, so you can tune autoscaling and predict user‑visible spikes.
What we’re measuring (plain English)
- TTFB/TTFS (HTTP): time the gateway spends accepting the request + queuing the job.
- Valkey latency: enqueue (LPUSH/XADD), pop/claim (BRPOP/XREADGROUP), and round‑trip.
- Worker service time: time to pick up, process, and ack.
- Queueing delay: time spent waiting in the queue (arrival → start of worker).
These four add up to the “hop latency” users feel when the system is under load.
Minimal tracing you can add today
Emit these IDs/headers end‑to‑end:
- x-stella-corr-id (uuid)
- x-stella-enq-ts (gateway enqueue ts, ns)
- x-stella-claim-ts (worker claim ts, ns)
- x-stella-done-ts (worker done ts, ns)
From these, compute:
- queue_delay = claim_ts - enq_ts
- service_time = done_ts - claim_ts
- http_ttfs = gateway_first_byte_ts - http_request_start_ts
- hop_latency = done_ts - enq_ts (or return‑path if synchronous)
Clock‑sync tip: use monotonic clocks in code for durations and convert to ns; don’t mix wall‑clock and monotonic timestamps in the same calculation.
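As a quick sanity check, here's a tiny Python sketch of those formulas over one request's timestamps (field names follow the headers above; the values are made up):

# Hypothetical per-request record holding the three nanosecond timestamps above.
def derive_durations(rec: dict) -> dict:
    """rec holds enq_ts, claim_ts, done_ts in ns, all from the same clock domain."""
    return {
        "queue_delay_ms": (rec["claim_ts"] - rec["enq_ts"]) / 1e6,
        "service_time_ms": (rec["done_ts"] - rec["claim_ts"]) / 1e6,
        "hop_latency_ms": (rec["done_ts"] - rec["enq_ts"]) / 1e6,
    }

print(derive_durations({"enq_ts": 0, "claim_ts": 4_000_000, "done_ts": 9_000_000}))
# -> {'queue_delay_ms': 4.0, 'service_time_ms': 5.0, 'hop_latency_ms': 9.0}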
Valkey commands (safe, BSD Valkey)
Use Valkey Streams + Consumer Groups for fairness and metrics:
- Enqueue: XADD jobs * corr-id <uuid> enq-ts <ns> payload <...>
- Claim: XREADGROUP GROUP workers w1 COUNT 1 BLOCK 1000 STREAMS jobs >
- Ack: XACK jobs workers <id>
Add a small Lua for timestamping at enqueue (atomic):
-- KEYS[1]=stream
-- ARGV[1]=enq_ts_ns, ARGV[2]=corr_id, ARGV[3]=payload
return redis.call('XADD', KEYS[1], '*',
'corr', ARGV[2], 'enq', ARGV[1], 'p', ARGV[3])
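If it helps, here's a minimal redis-py sketch (redis-py speaks the same protocol as Valkey) that registers and invokes the script above; host/port and the payload literal are placeholders:

import time, uuid
import redis  # redis-py works against Valkey

LUA = """
return redis.call('XADD', KEYS[1], '*',
  'corr', ARGV[2], 'enq', ARGV[1], 'p', ARGV[3])
"""

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
enqueue = r.register_script(LUA)  # loads once, then EVALSHA under the hood

entry_id = enqueue(keys=["jobs"], args=[time.time_ns(), str(uuid.uuid4()), "payload-bytes"])
print("enqueued as", entry_id)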
Load shapes to test (find the envelope)
- Open‑loop (arrival‑rate controlled): 50 → 10k req/min in steps; constant rate per step. Reveals queueing onset.
- Burst: 0 → N in short spikes (e.g., 5k in 10s) to see saturation and drain time.
- Step‑up/down: double every 2 min until SLO breach; then halve down.
- Long tail soak: run at 70–80% of max for 1h; watch p95‑p99.9 drift.
Target outputs per step: p50/p90/p95/p99 for queue_delay, service_time, hop_latency, plus throughput and error rate.
k6 script (HTTP client pressure)
// save as hop-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
export let options = {
scenarios: {
step_load: {
executor: 'ramping-arrival-rate',
startRate: 20, timeUnit: '1s',
preAllocatedVUs: 200, maxVUs: 5000,
stages: [
{ target: 50, duration: '1m' },
{ target: 100, duration: '1m' },
{ target: 200, duration: '1m' },
{ target: 400, duration: '1m' },
{ target: 800, duration: '1m' },
],
},
},
thresholds: {
'http_req_failed': ['rate<0.01'],
'http_req_duration{phase:hop}': ['p(95)<500'],
},
};
export default function () {
  // crypto.randomUUID() needs a recent k6 with global WebCrypto;
  // on older builds import uuidv4 from https://jslib.k6.io/k6-utils instead.
  const corr = crypto.randomUUID();
const res = http.post(
__ENV.GW_URL,
JSON.stringify({ data: 'ping', corr }),
{
headers: { 'Content-Type': 'application/json', 'x-stella-corr-id': corr },
tags: { phase: 'hop' },
}
);
check(res, { 'status 2xx/202': r => r.status === 200 || r.status === 202 });
sleep(0.01);
}
Run: GW_URL=https://gateway.example/hop k6 run hop-test.js
Worker hooks (.NET 10 sketch)
// At claim
var now = Stopwatch.GetTimestamp(); // monotonic
var claimNs = now.ToNanoseconds();
log.AddTag("x-stella-claim-ts", claimNs);
// After processing
var doneNs = Stopwatch.GetTimestamp().ToNanoseconds();
log.AddTag("x-stella-done-ts", doneNs);
// Include corr-id and stream entry id in logs/metrics
Helper:
public static class MonoTime {
static readonly double _nsPerTick = 1_000_000_000d / Stopwatch.Frequency;
public static long ToNanoseconds(this long ticks) => (long)(ticks * _nsPerTick);
}
Prometheus metrics to expose
- valkey_enqueue_ns (histogram)
- valkey_claim_block_ms (gauge)
- worker_service_ns (histogram, labels: worker_type, route)
- queue_delay_ns (histogram, derived from claim_ts - enq_ts)
- queue_depth (gauge via XLEN or XINFO STREAM)
- enqueue_rate, dequeue_rate (counters)
Example recording rules:
- record: hop:queue_delay_p95
expr: histogram_quantile(0.95, sum(rate(queue_delay_ns_bucket[1m])) by (le))
- record: hop:service_time_p95
expr: histogram_quantile(0.95, sum(rate(worker_service_ns_bucket[1m])) by (le))
- record: hop:latency_budget_p95
expr: hop:queue_delay_p95 + hop:service_time_p95
Autoscaling signals (HPA/KEDA friendly)
- Primary: queue depth & its derivative (d/dt).
- Secondary: p95 queue_delay and worker CPU.
- Safety: max in‑flight per worker; backpressure HTTP 429 when queue_depth > D or p95_queue_delay > SLO*0.8.
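As a sketch, the safety rule reduces to a small predicate the gateway can evaluate before enqueueing (Python; the D and SLO values are assumptions to tune from your measured envelope):

# Hypothetical thresholds; tune from your measured envelope.
MAX_QUEUE_DEPTH = 5_000          # "D"
SLO_MS = 500                     # p95 hop-latency SLO
BACKPRESSURE_FRACTION = 0.8

def should_shed(queue_depth: int, p95_queue_delay_ms: float) -> bool:
    """Return True when the gateway should answer 429 instead of enqueueing."""
    return (queue_depth > MAX_QUEUE_DEPTH
            or p95_queue_delay_ms > SLO_MS * BACKPRESSURE_FRACTION)

assert should_shed(6_000, 10.0)      # depth breach
assert should_shed(10, 450.0)        # latency breach (450 > 0.8 * 500)
assert not should_shed(10, 10.0)     # healthy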
Plot the “envelope” (what you’ll look at)
- X‑axis: offered load (req/s).
- Y‑axis: p95 hop latency (ms).
- Overlay: p99 (dashed), SLO line (e.g., 500 ms), and capacity knee (where p95 sharply rises).
- Add secondary panel: queue depth vs load.
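A minimal matplotlib sketch of that envelope plot, assuming you've already aggregated per-step percentiles (the numbers below are placeholders, not measurements):

import matplotlib.pyplot as plt

# Example per-plateau results (offered load in req/s, latencies in ms).
load   = [50, 100, 200, 400, 800]
p95_ms = [40, 45, 60, 180, 900]
p99_ms = [70, 80, 120, 400, 2500]
SLO_MS = 500

fig, ax = plt.subplots()
ax.plot(load, p95_ms, marker="o", label="p95 hop latency")
ax.plot(load, p99_ms, marker="o", linestyle="--", label="p99 hop latency")
ax.axhline(SLO_MS, color="red", linewidth=1, label=f"SLO ({SLO_MS} ms)")
ax.set_xlabel("offered load (req/s)")
ax.set_ylabel("latency (ms)")
ax.set_yscale("log")  # the capacity knee is easier to see on a log axis
ax.legend()
plt.savefig("envelope.png", dpi=150)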
If you want, I can generate a ready‑made notebook that ingests your logs/metrics CSV and outputs these plots.

Below is a set of implementation guidelines your agents can follow to build a repeatable performance test system for the HTTP → Valkey → Worker pipeline. It’s written as a “spec + runbook” with clear MUST/SHOULD requirements and concrete scenario definitions.
Performance Test Guidelines
HTTP → Valkey → Worker pipeline
1) Objectives and scope
Primary objectives
Your performance tests MUST answer these questions with evidence:
- Capacity knee: At what offered load does queue delay start growing sharply?
- User-impact envelope: What are p50/p95/p99 hop latency curves vs offered load?
- Decomposition: How much of hop latency is:
  - gateway enqueue time
  - Valkey enqueue/claim RTT
  - queue wait time
  - worker service time
- Scaling behavior: How do these change with worker replica counts (N workers)?
- Stability: Under sustained load, do latencies drift (GC, memory, fragmentation, background jobs)?
Non-goals (explicitly out of scope unless you add them later)
- Micro-optimizing single function runtime
- Synthetic “max QPS” records without a representative payload
- End-to-end-only tests that don’t collect segment metrics (acceptable only as basic smoke)
2) Definitions and required metrics
Required latency definitions (standardize these names)
Agents MUST compute and report these per request/job:
- t_http_accept: time from client send → gateway accepts request
- t_enqueue: time spent in gateway to enqueue into Valkey (server-side)
- t_valkey_rtt_enq: client-observed RTT for enqueue command(s)
- t_queue_delay: claim_ts - enq_ts
- t_service: done_ts - claim_ts
- t_hop: done_ts - enq_ts (this is the “true pipeline hop” latency)

Optional but recommended:
- t_ack: time to ack completion (Valkey ack RTT)
- t_http_response: request start → gateway response sent (TTFB/TTFS)
Required percentiles and aggregations
Per scenario step (e.g., each offered load plateau), agents MUST output:
- p50 / p90 / p95 / p99 / p99.9 for: t_hop, t_queue_delay, t_service, t_enqueue
- Throughput: offered rps and achieved rps
- Error rate: HTTP failures, enqueue failures, worker failures
- Queue depth and backlog drain time
Required system-level telemetry (minimum)
Agents MUST collect these time series during tests:
- Worker: CPU, memory, GC pauses (if .NET), threadpool saturation indicators
- Valkey: ops/sec, connected clients, blocked clients, memory used, evictions, slowlog count
- Gateway: CPU/mem, request rate, response codes, request duration histogram
3) Environment and test hygiene requirements
Environment requirements
Agents SHOULD run tests in an environment that matches production in:
- container CPU/memory limits
- number of nodes, network topology
- Valkey topology (single, cluster, sentinel, etc.)
- worker replica autoscaling rules (or deliberately disabled)
If exact parity isn’t possible, agents MUST record all known differences in the report.
Test hygiene (non-negotiable)
Agents MUST:
- Start from empty queues (no backlog).
- Disable client retries (or explicitly run two variants: retries off / retries on).
- Warm up before measuring (e.g., 60s warm-up minimum).
- Hold steady plateaus long enough to stabilize (usually 2–5 minutes per step).
- Cool down and verify backlog drains (queue depth returns to baseline).
- Record exact versions/SHAs of gateway/worker and Valkey config.
Load generator hygiene
Agents MUST ensure the load generator is not the bottleneck:
- CPU < ~70% during test
- no local socket exhaustion
- enough VUs/connections
- if needed, distributed load generation
4) Instrumentation spec (agents implement this first)
Correlation and timestamps
Agents MUST propagate an end-to-end correlation ID and timestamps.
Required fields
- corr_id (UUID)
- enq_ts_ns (set at enqueue, monotonic or consistent clock)
- claim_ts_ns (set by worker when job is claimed)
- done_ts_ns (set by worker when job processing ends)
Where these live
- HTTP request header: x-corr-id: <uuid>
- Valkey job payload fields: corr, enq, and optionally payload size/type
- Worker logs/metrics: include corr_id, job id, claim_ts_ns, done_ts_ns
Clock requirements
Agents MUST use a consistent timing source:
- Prefer monotonic timers for durations (Stopwatch / monotonic clock)
- If timestamps cross machines, ensure they’re comparable:
  - either rely on synchronized clocks (NTP) and monitor drift
  - or compute durations using monotonic tick deltas within the same host and transmit durations (less ideal for queue delay)
Practical recommendation: use wall-clock ns for cross-host timestamps with NTP + drift checks, and also record per-host monotonic durations for sanity.
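In Python terms, the recommendation looks like this (time.monotonic_ns for same-host durations, time.time_ns for cross-host timestamps):

import time

# Durations within one host: monotonic, immune to NTP steps.
t0 = time.monotonic_ns()
time.sleep(0.005)
elapsed_ms = (time.monotonic_ns() - t0) / 1e6  # ~5 ms

# Cross-host timestamps (enq_ts_ns, claim_ts_ns, ...): wall clock in ns,
# comparable across machines only while NTP keeps drift small.
enq_ts_ns = time.time_ns()

print(f"elapsed={elapsed_ms:.2f} ms, enq_ts_ns={enq_ts_ns}")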
Valkey queue semantics (recommended)
Agents SHOULD use Streams + Consumer Groups for stable claim semantics and good observability:
- Enqueue: XADD jobs * corr <uuid> enq <ns> payload <...>
- Claim: XREADGROUP GROUP workers <consumer> COUNT 1 BLOCK 1000 STREAMS jobs >
- Ack: XACK jobs workers <id>
Agents MUST record stream length (XLEN) or consumer group lag (XINFO GROUPS) as queue depth/lag.
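A minimal redis-py poller for depth and lag might look like this (stream/group names match the examples above; wiring the values into Prometheus gauges is left to your metrics library):

import time
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def sample_queue(stream: str = "jobs", group: str = "workers") -> dict:
    depth = r.xlen(stream)  # total entries still in the stream
    # XINFO GROUPS exposes a per-group 'lag' field on Valkey / Redis >= 7.
    lag = next((g.get("lag") for g in r.xinfo_groups(stream)
                if g["name"] == group), None)
    return {"depth": depth, "lag": lag}

while True:
    print(sample_queue())   # replace with gauge.set(...) in real code
    time.sleep(5)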
Metrics exposure
Agents MUST publish Prometheus (or equivalent) histograms:
- gateway_enqueue_seconds (or ns) histogram
- valkey_enqueue_rtt_seconds histogram
- worker_service_seconds histogram
- queue_delay_seconds histogram (derived from timestamps; can be computed in worker or offline)
- hop_latency_seconds histogram
5) Workload modeling and test data
Agents MUST define a workload model before running capacity tests:
- Endpoint(s): list exact gateway routes under test
- Payload types: small/typical/large
- Mix: e.g., 70/25/5 by payload size
- Idempotency rules: ensure repeated jobs don’t corrupt state
- Data reset strategy: how test data is cleaned or isolated per run
Agents SHOULD test at least:
- Typical payload (p50)
- Large payload (p95)
- Worst-case allowed payload (bounded by your API limits)
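For illustration, a weighted payload sampler matching the example 70/25/5 mix (Python; the byte sizes are placeholders, not measured values):

import random

# Hypothetical payload sizes per profile; substitute representative bodies.
PROFILES = {"p50": 512, "p95": 16_384, "max": 1_048_576}
WEIGHTS = {"p50": 0.70, "p95": 0.25, "max": 0.05}

def sample_payload() -> bytes:
    """Pick a profile per the mix and return a body of that size."""
    profile = random.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]
    return bytes(PROFILES[profile])  # zero-filled body of the chosen size

sizes = [len(sample_payload()) for _ in range(10_000)]
print("mean payload bytes:", sum(sizes) / len(sizes))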
6) Scenario suite your agents MUST implement
Each scenario MUST be defined as code/config (not manual).
Scenario A — Smoke (fast sanity)
Goal: verify instrumentation + basic correctness
Load: low (e.g., 1–5 rps), 2 minutes
Pass:
- 0 backlog after run
- error rate < 0.1%
- metrics present for all segments
Scenario B — Baseline (repeatable reference point)
Goal: establish a stable baseline for regression tracking
Load: fixed moderate load (e.g., 30–50% of expected capacity), 10 minutes
Pass:
- p95 t_hop within baseline ± tolerance (set after first runs)
- no upward drift in p95 across time (trend line ~flat)
Scenario C — Capacity ramp (open-loop)
Goal: find the knee where queueing begins
Method: open-loop arrival-rate ramp with plateaus
Example stages (edit to fit your system):
- 50 rps for 2m
- 100 rps for 2m
- 200 rps for 2m
- 400 rps for 2m
- … until SLO breach or errors spike
MUST:
- warm-up stage before first plateau
- record per-plateau summary
Stop conditions (any one triggers a stop):
- error rate > 1%
- queue depth grows without bound over an entire plateau
- p95 t_hop exceeds SLO for 2 consecutive plateaus
Scenario D — Stress (push past capacity)
Goal: characterize failure mode and recovery
Load: 120–200% of knee load, 5–10 minutes
Pass (for resilience):
- system does not crash permanently
- once load stops, backlog drains within target time (define it)
Scenario E — Burst / spike
Goal: see how quickly queue grows and drains
Load shape:
- baseline low load
- sudden burst (e.g., 10× for 10–30s)
- return to baseline
Report:
- peak queue depth
- time to drain to baseline
- p99 t_hop during burst
Scenario F — Soak (long-running)
Goal: detect drift (leaks, fragmentation, GC patterns)
Load: 70–85% of knee, 60–180 minutes
Pass:
- p95 does not trend upward beyond threshold
- memory remains bounded
- no rising error rate
Scenario G — Scaling curve (worker replica sweep)
Goal: turn results into scaling rules
Method:
- Repeat Scenario C with worker replicas = 1, 2, 4, 8…
Deliverable:
- plot of knee load vs worker count
- p95 t_service vs worker count (should remain similar; queue delay should drop)
7) Execution protocol (runbook)
Agents MUST run every scenario using the same disciplined flow:
Pre-run checklist
- confirm system versions/SHAs
- confirm autoscaling mode:
  - Off for baseline capacity characterization
  - On for validating autoscaling policies
- clear queues and consumer group pending entries
- restart or at least record “time since deploy” for services (cold vs warm)
During run
- ensure load is truly open-loop when required (arrival-rate based)
- continuously record:
  - offered vs achieved rate
  - queue depth
  - CPU/mem for gateway/worker/Valkey
Post-run
- stop load
- wait until backlog drains (or record that it doesn’t)
- export:
  - k6/runner raw output
  - Prometheus time series snapshot
  - sampled logs with corr_id fields
- generate a summary report automatically (no hand calculations)
8) Analysis rules (how agents compute “the envelope”)
Agents MUST generate at minimum two plots per run:
- Latency envelope: offered load (x-axis) vs p95 t_hop (y-axis), with p99 and the SLO line overlaid
- Queue behavior: offered load vs queue depth (or lag), plus drain time
How to identify the “knee”
Agents SHOULD mark the knee as the first plateau where:
- queue depth grows monotonically within the plateau, or
- p95 t_queue_delay increases by > X% step-to-step (e.g., 50–100%)
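A sketch of that knee rule over per-plateau p95 queue-delay values (Python; the 50% growth threshold stands in for the “X%” above):

def find_knee(p95_queue_delay_ms: list[float], growth_threshold: float = 0.5) -> int | None:
    """Return the index of the first plateau whose p95 queue delay grew
    by more than growth_threshold vs the previous plateau, else None."""
    for i in range(1, len(p95_queue_delay_ms)):
        prev, cur = p95_queue_delay_ms[i - 1], p95_queue_delay_ms[i]
        if prev > 0 and (cur - prev) / prev > growth_threshold:
            return i
    return None

# Plateaus at 50/100/200/400 rps: the jump at index 3 marks the knee.
print(find_knee([12.0, 13.5, 15.0, 95.0]))  # -> 3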
Convert results into scaling guidance
Agents SHOULD compute:
- capacity_per_worker ≈ 1 / mean(t_service) (jobs/sec per worker)
- recommended replicas for offered load λ at target utilization U:
  workers_needed = ceil(λ * mean(t_service) / U)
  - choose U ~ 0.6–0.75 for headroom
This should be reported alongside the measured envelope.
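The same arithmetic as a checkable snippet (the load and service-time inputs are placeholders):

import math

def workers_needed(offered_rps: float, mean_service_s: float, target_util: float = 0.7) -> int:
    """Replicas required so per-worker utilization stays at target_util."""
    return math.ceil(offered_rps * mean_service_s / target_util)

# 400 jobs/s at 20 ms mean service time = 8 worker-seconds of work per second,
# so 12 workers at U = 0.7.
print(workers_needed(400, 0.020))  # -> 12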
9) Pass/fail criteria and regression gates
Agents MUST define gates in configuration, not in someone’s head.
Suggested gating structure:
- Smoke gate: error rate < 0.1%, backlog drains
- Baseline gate: p95 t_hop regression < 10% (tune after you have history)
- Capacity gate: knee load regression < 10% (optional but very valuable)
- Soak gate: p95 drift over time < 15% and no memory runaway
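A sketch of a config-driven baseline gate (Python; the artifact paths and JSON shape are assumptions, adapt them to your exports):

import json, sys

# Hypothetical artifact layout: {"hop_ms": {"p95": ...}, ...}
with open("artifacts/baseline-summary.json") as f:
    baseline = json.load(f)
with open("artifacts/current-summary.json") as f:
    current = json.load(f)

MAX_REGRESSION = 0.10  # baseline gate: p95 t_hop regression < 10%

regression = (current["hop_ms"]["p95"] - baseline["hop_ms"]["p95"]) / baseline["hop_ms"]["p95"]
if regression > MAX_REGRESSION:
    sys.exit(f"FAIL: p95 hop regressed {regression:.1%} (> {MAX_REGRESSION:.0%})")
print(f"PASS: p95 hop regression {regression:.1%}")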
10) Common pitfalls (agents must avoid)
- Closed-loop tests used for capacity: closed-loop (“N concurrent users”) self-throttles and can hide queueing onset. Use open-loop arrival rate for capacity.
- Ignoring queue depth: a system can look “healthy” in request latency while silently building backlog.
- Measuring only gateway latency: you must measure enqueue → claim → done to see the real hop.
- Load generator bottleneck: if the generator saturates, you’ll under-estimate capacity.
- Retries enabled by default: retries can inflate load and hide root causes; run with retries off first.
- Not controlling warm vs cold: cold caches vs warmed services produce different envelopes; record the condition.
Agent implementation checklist (deliverables)
Assign these as concrete tasks to your agents.
Agent 1 — Observability & tracing
MUST deliver:
- correlation id propagation gateway → Valkey → worker
- timestamps enq/claim/done
- Prometheus histograms for enqueue, service, hop
- queue depth metric (XLEN / XINFO lag)
Agent 2 — Load test harness
MUST deliver:
- test runner scripts (k6 or equivalent) for scenarios A–G
- test config file (YAML/JSON) controlling:
  - stages (rates/durations)
  - payload mix
  - headers (corr-id)
- reproducible seeds and version stamping
Agent 3 — Result collector and analyzer
MUST deliver:
- a pipeline that merges:
  - load generator output
  - hop timing data (from logs or a completion stream)
  - Prometheus snapshots
- automatic summary + plots:
  - latency envelope
  - queue depth/drain
- CSV/JSON exports for long-term tracking
Agent 4 — Reporting and dashboards
MUST deliver:
- a standard report template that includes:
  - environment details
  - scenario details
  - key charts
  - knee estimate
  - scaling recommendation
- Grafana dashboard with the required panels
Agent 5 — CI / release integration
SHOULD deliver:
- PR-level smoke test (Scenario A)
- nightly baseline (Scenario B)
- weekly capacity sweep (Scenario C + scaling curve)
Template: scenario spec (agents can copy/paste)
test_run:
system_under_test:
gateway_sha: "<git sha>"
worker_sha: "<git sha>"
valkey_version: "<version>"
environment:
cluster: "<name>"
workers: 4
autoscaling: "off" # off|on
workload:
endpoint: "/hop"
payload_profile: "p50"
mix:
p50: 0.7
p95: 0.25
max: 0.05
scenario:
name: "capacity_ramp"
mode: "open_loop"
warmup_seconds: 60
stages:
- rps: 50
duration_seconds: 120
- rps: 100
duration_seconds: 120
- rps: 200
duration_seconds: 120
- rps: 400
duration_seconds: 120
gates:
max_error_rate: 0.01
slo_ms_p95_hop: 500
backlog_must_drain_seconds: 300
outputs:
artifacts_dir: "./artifacts/<timestamp>/"
If you want, I can also provide a single “golden” folder structure (tests/ scripts/ dashboards/ analysis/) and a “definition of done” checklist that matches how your repo is organized—but the above is already sufficient for agents to start implementing immediately.

Below is a sample / partial implementation that gives full functional coverage of your performance-test requirements (instrumentation, correlation, timestamps, queue semantics, scenarios A–G, artifact export, and analysis). It is intentionally minimal and “swap-in-real-code” friendly.
You can copy these files into a perf/ folder in your repo, build, and run locally with Docker Compose.
1) Suggested folder layout
perf/
docker-compose.yml
prometheus/
prometheus.yml
k6/
lib.js
smoke.js
capacity_ramp.js
burst.js
soak.js
stress.js
scaling_curve.sh
tools/
analyze.py
src/
Perf.Gateway/
Perf.Gateway.csproj
Program.cs
Metrics.cs
ValkeyStreams.cs
TimeNs.cs
Perf.Worker/
Perf.Worker.csproj
Program.cs
WorkerService.cs
Metrics.cs
ValkeyStreams.cs
TimeNs.cs
2) Gateway sample (.NET 10, Minimal API)
perf/src/Perf.Gateway/Perf.Gateway.csproj
<Project Sdk="Microsoft.NET.Sdk.Web">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
</PropertyGroup>
<ItemGroup>
<!-- StackExchange.Redis is MIT and works with Valkey -->
<PackageReference Include="StackExchange.Redis" Version="2.8.16" />
</ItemGroup>
</Project>
perf/src/Perf.Gateway/TimeNs.cs
namespace Perf.Gateway;
public static class TimeNs
{
private static readonly long UnixEpochTicks = DateTime.UnixEpoch.Ticks; // 100ns units
public static long UnixNowNs()
{
var ticks = DateTime.UtcNow.Ticks - UnixEpochTicks; // 100ns
return ticks * 100L; // ns
}
}
perf/src/Perf.Gateway/Metrics.cs
using System.Collections.Concurrent;
using System.Globalization;
using System.Text;
namespace Perf.Gateway;
public sealed class Metrics
{
private readonly ConcurrentDictionary<string, long> _counters = new();
// Simple fixed-bucket histograms in seconds (Prometheus histogram format)
private readonly ConcurrentDictionary<string, Histogram> _h = new();
public void Inc(string name, long by = 1) => _counters.AddOrUpdate(name, by, (_, v) => v + by);
public Histogram Hist(string name, double[] bucketsSeconds) =>
_h.GetOrAdd(name, _ => new Histogram(name, bucketsSeconds));
public string ExportPrometheus()
{
var sb = new StringBuilder(16 * 1024);
foreach (var (k, v) in _counters.OrderBy(kv => kv.Key))
{
sb.Append("# TYPE ").Append(k).Append(" counter\n");
sb.Append(k).Append(' ').Append(v.ToString(CultureInfo.InvariantCulture)).Append('\n');
}
foreach (var hist in _h.Values.OrderBy(h => h.Name))
{
sb.Append(hist.Export());
}
return sb.ToString();
}
public sealed class Histogram
{
public string Name { get; }
private readonly double[] _buckets; // sorted
private readonly long[] _bucketCounts; // maintained cumulatively in Observe
private long _count;
private double _sum;
private readonly object _lock = new();
public Histogram(string name, double[] bucketsSeconds)
{
Name = name;
_buckets = bucketsSeconds.OrderBy(x => x).ToArray();
_bucketCounts = new long[_buckets.Length];
}
public void Observe(double seconds)
{
lock (_lock)
{
_count++;
_sum += seconds;
for (int i = 0; i < _buckets.Length; i++)
{
if (seconds <= _buckets[i]) _bucketCounts[i]++;
}
}
}
public string Export()
{
// Prometheus hist buckets are cumulative; we already maintain that.
var sb = new StringBuilder(2048);
sb.Append("# TYPE ").Append(Name).Append(" histogram\n");
lock (_lock)
{
for (int i = 0; i < _buckets.Length; i++)
{
sb.Append(Name).Append("_bucket{le=\"")
.Append(_buckets[i].ToString("0.################", CultureInfo.InvariantCulture))
.Append("\"} ")
.Append(_bucketCounts[i].ToString(CultureInfo.InvariantCulture))
.Append('\n');
}
sb.Append(Name).Append("_bucket{le=\"+Inf\"} ")
.Append(_count.ToString(CultureInfo.InvariantCulture))
.Append('\n');
sb.Append(Name).Append("_sum ")
.Append(_sum.ToString(CultureInfo.InvariantCulture))
.Append('\n');
sb.Append(Name).Append("_count ")
.Append(_count.ToString(CultureInfo.InvariantCulture))
.Append('\n');
}
return sb.ToString();
}
}
}
perf/src/Perf.Gateway/ValkeyStreams.cs
using StackExchange.Redis;
namespace Perf.Gateway;
public sealed class ValkeyStreams
{
private readonly IDatabase _db;
public ValkeyStreams(IConnectionMultiplexer mux) => _db = mux.GetDatabase();
public async Task EnsureConsumerGroupAsync(string stream, string group)
{
try
{
// XGROUP CREATE <key> <groupname> $ MKSTREAM
await _db.ExecuteAsync("XGROUP", "CREATE", stream, group, "$", "MKSTREAM");
}
catch (RedisServerException ex) when (ex.Message.Contains("BUSYGROUP", StringComparison.OrdinalIgnoreCase))
{
// ok
}
}
public async Task<RedisResult> XAddAsync(string stream, NameValueEntry[] fields)
{
// XADD stream * field value field value ...
var args = new List<object>(2 + fields.Length * 2) { stream, "*" };
foreach (var f in fields) { args.Add(f.Name); args.Add(f.Value); }
return await _db.ExecuteAsync("XADD", args.ToArray());
}
}
perf/src/Perf.Gateway/Program.cs
using Perf.Gateway;
using StackExchange.Redis;
using System.Diagnostics;
var builder = WebApplication.CreateBuilder(args);
var valkey = builder.Configuration["VALKEY_ENDPOINT"] ?? "valkey:6379";
builder.Services.AddSingleton<IConnectionMultiplexer>(_ => ConnectionMultiplexer.Connect(valkey));
builder.Services.AddSingleton<ValkeyStreams>();
builder.Services.AddSingleton<Metrics>();
var app = builder.Build();
var metrics = app.Services.GetRequiredService<Metrics>();
var streams = app.Services.GetRequiredService<ValkeyStreams>();
const string JobsStream = "stella:perf:jobs";
const string DoneStream = "stella:perf:done";
const string Group = "workers";
await streams.EnsureConsumerGroupAsync(JobsStream, Group);
var allowTestControl = (app.Configuration["ALLOW_TEST_CONTROL"] ?? "1") == "1";
var runs = new Dictionary<string, long>(StringComparer.Ordinal); // run_id -> start_ns
if (allowTestControl)
{
app.MapPost("/test/start", () =>
{
var runId = Guid.NewGuid().ToString("N");
var startNs = TimeNs.UnixNowNs();
lock (runs) runs[runId] = startNs;
metrics.Inc("perf_test_start_total");
return Results.Ok(new { run_id = runId, start_ns = startNs, jobs_stream = JobsStream, done_stream = DoneStream });
});
app.MapPost("/test/end/{runId}", (string runId) =>
{
lock (runs) runs.Remove(runId);
metrics.Inc("perf_test_end_total");
return Results.Ok(new { run_id = runId });
});
}
app.MapGet("/metrics", () => Results.Text(metrics.ExportPrometheus(), "text/plain; version=0.0.4"));
app.MapPost("/hop", async (HttpRequest req) =>
{
// Correlation / run id
var corr = req.Headers["x-stella-corr-id"].FirstOrDefault() ?? Guid.NewGuid().ToString();
var runId = req.Headers["x-stella-run-id"].FirstOrDefault() ?? "no-run";
// Enqueue timestamp (UTC-derived ns)
var enqNs = TimeNs.UnixNowNs();
// Read raw body (payload) - keep it simple for perf harness
string payload;
using (var sr = new StreamReader(req.Body))
payload = await sr.ReadToEndAsync();
var sw = Stopwatch.GetTimestamp();
// Valkey enqueue
var valkeySw = Stopwatch.GetTimestamp();
var entryId = await streams.XAddAsync(JobsStream, new[]
{
new NameValueEntry("corr", corr),
new NameValueEntry("run", runId),
new NameValueEntry("enq_ns", enqNs),
new NameValueEntry("payload", payload),
});
var valkeyRttSec = (Stopwatch.GetTimestamp() - valkeySw) / (double)Stopwatch.Frequency;
var enqueueSec = (Stopwatch.GetTimestamp() - sw) / (double)Stopwatch.Frequency;
metrics.Inc("hop_requests_total");
metrics.Hist("gateway_enqueue_seconds", new[] { .001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2 }).Observe(enqueueSec);
metrics.Hist("valkey_enqueue_rtt_seconds", new[] { .0005, .001, .002, .005, .01, .02, .05, .1, .2 }).Observe(valkeyRttSec);
return Results.Accepted(value: new { corr, run_id = runId, enq_ns = enqNs, entry_id = entryId.ToString() });
});
app.Run("http://0.0.0.0:8080");
3) Worker sample (.NET 10 hosted service + metrics)
perf/src/Perf.Worker/Perf.Worker.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="StackExchange.Redis" Version="2.8.16" />
<!-- WebApplication (minimal metrics server in Program.cs) needs the ASP.NET Core shared framework -->
<FrameworkReference Include="Microsoft.AspNetCore.App" />
</ItemGroup>
</Project>
perf/src/Perf.Worker/TimeNs.cs
namespace Perf.Worker;
public static class TimeNs
{
private static readonly long UnixEpochTicks = DateTime.UnixEpoch.Ticks;
public static long UnixNowNs() => (DateTime.UtcNow.Ticks - UnixEpochTicks) * 100L;
}
perf/src/Perf.Worker/Metrics.cs
// Same as gateway Metrics.cs (copy/paste). Keep identical for consistency.
using System.Collections.Concurrent;
using System.Globalization;
using System.Text;
namespace Perf.Worker;
public sealed class Metrics
{
private readonly ConcurrentDictionary<string, long> _counters = new();
private readonly ConcurrentDictionary<string, Histogram> _h = new();
public void Inc(string name, long by = 1) => _counters.AddOrUpdate(name, by, (_, v) => v + by);
public Histogram Hist(string name, double[] bucketsSeconds) =>
_h.GetOrAdd(name, _ => new Histogram(name, bucketsSeconds));
public string ExportPrometheus()
{
var sb = new StringBuilder(16 * 1024);
foreach (var (k, v) in _counters.OrderBy(kv => kv.Key))
{
sb.Append("# TYPE ").Append(k).Append(" counter\n");
sb.Append(k).Append(' ').Append(v.ToString(CultureInfo.InvariantCulture)).Append('\n');
}
foreach (var hist in _h.Values.OrderBy(h => h.Name))
sb.Append(hist.Export());
return sb.ToString();
}
public sealed class Histogram
{
public string Name { get; }
private readonly double[] _buckets;
private readonly long[] _bucketCounts;
private long _count;
private double _sum;
private readonly object _lock = new();
public Histogram(string name, double[] bucketsSeconds)
{
Name = name;
_buckets = bucketsSeconds.OrderBy(x => x).ToArray();
_bucketCounts = new long[_buckets.Length];
}
public void Observe(double seconds)
{
lock (_lock)
{
_count++;
_sum += seconds;
for (int i = 0; i < _buckets.Length; i++)
if (seconds <= _buckets[i]) _bucketCounts[i]++;
}
}
public string Export()
{
var sb = new StringBuilder(2048);
sb.Append("# TYPE ").Append(Name).Append(" histogram\n");
lock (_lock)
{
for (int i = 0; i < _buckets.Length; i++)
{
sb.Append(Name).Append("_bucket{le=\"")
.Append(_buckets[i].ToString("0.################", CultureInfo.InvariantCulture))
.Append("\"} ")
.Append(_bucketCounts[i].ToString(CultureInfo.InvariantCulture))
.Append('\n');
}
sb.Append(Name).Append("_bucket{le=\"+Inf\"} ")
.Append(_count.ToString(CultureInfo.InvariantCulture))
.Append('\n');
sb.Append(Name).Append("_sum ")
.Append(_sum.ToString(CultureInfo.InvariantCulture))
.Append('\n');
sb.Append(Name).Append("_count ")
.Append(_count.ToString(CultureInfo.InvariantCulture))
.Append('\n');
}
return sb.ToString();
}
}
}
perf/src/Perf.Worker/ValkeyStreams.cs
using StackExchange.Redis;
namespace Perf.Worker;
public sealed class ValkeyStreams
{
private readonly IDatabase _db;
public ValkeyStreams(IConnectionMultiplexer mux) => _db = mux.GetDatabase();
public async Task EnsureConsumerGroupAsync(string stream, string group)
{
try
{
await _db.ExecuteAsync("XGROUP", "CREATE", stream, group, "$", "MKSTREAM");
}
catch (RedisServerException ex) when (ex.Message.Contains("BUSYGROUP", StringComparison.OrdinalIgnoreCase)) { }
}
public async Task<RedisResult> XReadGroupAsync(string group, string consumer, string stream, string id, int count, int blockMs)
=> await _db.ExecuteAsync("XREADGROUP", "GROUP", group, consumer, "COUNT", count, "BLOCK", blockMs, "STREAMS", stream, id);
public async Task XAckAsync(string stream, string group, RedisValue id)
=> await _db.ExecuteAsync("XACK", stream, group, id);
public async Task<RedisResult> XAddAsync(string stream, NameValueEntry[] fields)
{
var args = new List<object>(2 + fields.Length * 2) { stream, "*" };
foreach (var f in fields) { args.Add(f.Name); args.Add(f.Value); }
return await _db.ExecuteAsync("XADD", args.ToArray());
}
}
perf/src/Perf.Worker/WorkerService.cs
using StackExchange.Redis;
using System.Diagnostics;
namespace Perf.Worker;
public sealed class WorkerService : BackgroundService
{
private readonly ValkeyStreams _streams;
private readonly Metrics _metrics;
private readonly ILogger<WorkerService> _log;
private const string JobsStream = "stella:perf:jobs";
private const string DoneStream = "stella:perf:done";
private const string Group = "workers";
private readonly string _consumer;
public WorkerService(ValkeyStreams streams, Metrics metrics, ILogger<WorkerService> log)
{
_streams = streams;
_metrics = metrics;
_log = log;
_consumer = Environment.GetEnvironmentVariable("WORKER_CONSUMER") ?? $"w-{Environment.MachineName}-{Guid.NewGuid():N}";
}
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
await _streams.EnsureConsumerGroupAsync(JobsStream, Group);
var serviceBuckets = new[] { .001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5 };
var queueBuckets = new[] { .001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5, 10, 30 };
var hopBuckets = new[] { .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5, 10, 30 };
while (!stoppingToken.IsCancellationRequested)
{
RedisResult res;
try
{
res = await _streams.XReadGroupAsync(Group, _consumer, JobsStream, ">", count: 1, blockMs: 1000);
}
catch (Exception ex)
{
_metrics.Inc("worker_xread_errors_total");
_log.LogWarning(ex, "XREADGROUP failed");
await Task.Delay(250, stoppingToken);
continue;
}
if (res.IsNull) continue;
// Parse XREADGROUP result (array -> stream -> entries)
// Expected shape: [[stream, [[id, [field, value, field, value...]], ...]]]
var outer = (RedisResult[])res!;
foreach (var streamBlock in outer)
{
var sb = (RedisResult[])streamBlock!;
var entries = (RedisResult[])sb[1]!;
foreach (var entry in entries)
{
var e = (RedisResult[])entry!;
var entryId = (RedisValue)e[0]!;
var fields = (RedisResult[])e[1]!;
string corr = "", run = "no-run";
long enqNs = 0;
for (int i = 0; i < fields.Length; i += 2)
{
var key = (string)fields[i]!;
var val = fields[i + 1].ToString();
if (key == "corr") corr = val;
else if (key == "run") run = val;
else if (key == "enq_ns") _ = long.TryParse(val, out enqNs);
}
var claimNs = TimeNs.UnixNowNs();
var sw = Stopwatch.GetTimestamp();
// Placeholder "service work" – replace with real processing
// Keep it deterministic-ish; use env var to model different service times.
var workMs = int.TryParse(Environment.GetEnvironmentVariable("WORK_MS"), out var ms) ? ms : 5;
await Task.Delay(workMs, stoppingToken);
var doneNs = TimeNs.UnixNowNs();
var serviceSec = (Stopwatch.GetTimestamp() - sw) / (double)Stopwatch.Frequency;
var queueDelaySec = enqNs > 0 ? (claimNs - enqNs) / 1_000_000_000d : double.NaN;
var hopSec = enqNs > 0 ? (doneNs - enqNs) / 1_000_000_000d : double.NaN;
// Ack then publish "done" record for offline analysis
await _streams.XAckAsync(JobsStream, Group, entryId);
await _streams.XAddAsync(DoneStream, new[]
{
new NameValueEntry("run", run),
new NameValueEntry("corr", corr),
new NameValueEntry("entry", entryId),
new NameValueEntry("enq_ns", enqNs),
new NameValueEntry("claim_ns", claimNs),
new NameValueEntry("done_ns", doneNs),
new NameValueEntry("work_ms", workMs),
});
_metrics.Inc("worker_jobs_total");
_metrics.Hist("worker_service_seconds", serviceBuckets).Observe(serviceSec);
if (!double.IsNaN(queueDelaySec))
_metrics.Hist("queue_delay_seconds", queueBuckets).Observe(queueDelaySec);
if (!double.IsNaN(hopSec))
_metrics.Hist("hop_latency_seconds", hopBuckets).Observe(hopSec);
}
}
}
}
}
perf/src/Perf.Worker/Program.cs
using Perf.Worker;
using StackExchange.Redis;
var builder = Host.CreateApplicationBuilder(args);
var valkey = builder.Configuration["VALKEY_ENDPOINT"] ?? "valkey:6379";
builder.Services.AddSingleton<IConnectionMultiplexer>(_ => ConnectionMultiplexer.Connect(valkey));
builder.Services.AddSingleton<ValkeyStreams>();
builder.Services.AddSingleton<Metrics>();
builder.Services.AddHostedService<WorkerService>();
// Minimal metrics endpoint
builder.Services.AddSingleton<IHostedService>(sp =>
{
return new SimpleMetricsServer(
sp.GetRequiredService<Metrics>(),
url: "http://0.0.0.0:8081/metrics"
);
});
var host = builder.Build();
await host.RunAsync();
// ---- minimal metrics server ----
file sealed class SimpleMetricsServer : BackgroundService
{
private readonly Metrics _metrics;
private readonly string _url;
public SimpleMetricsServer(Metrics metrics, string url) { _metrics = metrics; _url = url; }
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
var builder = WebApplication.CreateBuilder();
var app = builder.Build();
app.MapGet("/metrics", () => Results.Text(_metrics.ExportPrometheus(), "text/plain; version=0.0.4"));
stoppingToken.Register(() => _ = app.StopAsync()); // RunAsync has no CancellationToken overload
await app.RunAsync(_url);
}
}
4) Docker Compose (Valkey + gateway + worker + Prometheus)
perf/docker-compose.yml
services:
valkey:
image: valkey/valkey:7.2
ports: ["6379:6379"]
gateway:
build:
context: ./src/Perf.Gateway
environment:
- VALKEY_ENDPOINT=valkey:6379
- ALLOW_TEST_CONTROL=1
ports: ["8080:8080"]
depends_on: [valkey]
worker:
build:
context: ./src/Perf.Worker
environment:
- VALKEY_ENDPOINT=valkey:6379
- WORK_MS=5
ports: ["8081:8081"]
depends_on: [valkey]
prometheus:
image: prom/prometheus:v2.55.0
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
ports: ["9090:9090"]
depends_on: [gateway, worker]
perf/prometheus/prometheus.yml
global:
scrape_interval: 5s
scrape_configs:
- job_name: gateway
static_configs:
- targets: ["gateway:8080"]
- job_name: worker
static_configs:
- targets: ["worker:8081"]
Run:
cd perf
docker compose up -d --build
5) k6 scenarios A–G (open-loop where required)
perf/k6/lib.js
import http from "k6/http";
export function startRun(baseUrl) {
const res = http.post(`${baseUrl}/test/start`, null, { tags: { phase: "control" } });
if (res.status !== 200) throw new Error(`startRun failed: ${res.status} ${res.body}`);
return res.json();
}
export function hop(baseUrl, runId) {
// crypto.randomUUID() needs a recent k6 with global WebCrypto;
// on older builds import uuidv4 from https://jslib.k6.io/k6-utils instead.
const corr = crypto.randomUUID();
const payload = JSON.stringify({ corr, data: "ping" });
return http.post(
`${baseUrl}/hop`,
payload,
{
headers: {
"content-type": "application/json",
"x-stella-run-id": runId,
"x-stella-corr-id": corr
},
tags: { phase: "hop" }
}
);
}
Scenario A: Smoke — perf/k6/smoke.js
import { check, sleep } from "k6";
import { startRun, hop } from "./lib.js";
export const options = {
scenarios: {
smoke: {
executor: "constant-arrival-rate",
rate: 2,
timeUnit: "1s",
duration: "2m",
preAllocatedVUs: 20,
maxVUs: 200
}
},
thresholds: {
http_req_failed: ["rate<0.001"]
}
};
export function setup() {
return startRun(__ENV.GW_URL);
}
export default function (data) {
const res = hop(__ENV.GW_URL, data.run_id);
check(res, { "202 accepted": r => r.status === 202 });
sleep(0.01);
}
Scenario C: Capacity ramp (open-loop) — perf/k6/capacity_ramp.js
import { check } from "k6";
import { startRun, hop } from "./lib.js";
export const options = {
scenarios: {
ramp: {
executor: "ramping-arrival-rate",
startRate: 50,
timeUnit: "1s",
preAllocatedVUs: 200,
maxVUs: 5000,
stages: [
{ target: 50, duration: "2m" },
{ target: 100, duration: "2m" },
{ target: 200, duration: "2m" },
{ target: 400, duration: "2m" },
{ target: 800, duration: "2m" }
]
}
},
thresholds: {
http_req_failed: ["rate<0.01"]
}
};
export function setup() {
return startRun(__ENV.GW_URL);
}
export default function (data) {
const res = hop(__ENV.GW_URL, data.run_id);
check(res, { "202 accepted": r => r.status === 202 });
}
Scenario E: Burst — perf/k6/burst.js
import { check } from "k6";
import { startRun, hop } from "./lib.js";
export const options = {
scenarios: {
burst: {
executor: "ramping-arrival-rate",
startRate: 20,
timeUnit: "1s",
preAllocatedVUs: 200,
maxVUs: 5000,
stages: [
{ target: 20, duration: "60s" },
{ target: 400, duration: "20s" },
{ target: 20, duration: "120s" }
]
}
}
};
export function setup() { return startRun(__ENV.GW_URL); }
export default function (data) {
const res = hop(__ENV.GW_URL, data.run_id);
check(res, { "202": r => r.status === 202 });
}
Scenario F: Soak — perf/k6/soak.js
import { check } from "k6";
import { startRun, hop } from "./lib.js";
export const options = {
scenarios: {
soak: {
executor: "constant-arrival-rate",
rate: 200,
timeUnit: "1s",
duration: "60m",
preAllocatedVUs: 500,
maxVUs: 5000
}
}
};
export function setup() { return startRun(__ENV.GW_URL); }
export default function (data) {
const res = hop(__ENV.GW_URL, data.run_id);
check(res, { "202": r => r.status === 202 });
}
Scenario D: Stress — perf/k6/stress.js
import { check } from "k6";
import { startRun, hop } from "./lib.js";
export const options = {
scenarios: {
stress: {
executor: "constant-arrival-rate",
rate: 1500,
timeUnit: "1s",
duration: "10m",
preAllocatedVUs: 2000,
maxVUs: 15000
}
},
thresholds: {
http_req_failed: ["rate<0.05"]
}
};
export function setup() { return startRun(__ENV.GW_URL); }
export default function (data) {
const res = hop(__ENV.GW_URL, data.run_id);
check(res, { "202": r => r.status === 202 });
}
Scenario G: Scaling curve orchestration — perf/k6/scaling_curve.sh
#!/usr/bin/env bash
set -euo pipefail
GW_URL="${GW_URL:-http://localhost:8080}"
for n in 1 2 4 8; do
echo "== Scaling workers to $n =="
# NOTE: scaling past 1 replica requires removing the worker's fixed host port mapping in docker-compose.yml.
docker compose -f ../docker-compose.yml up -d --scale worker="$n"
mkdir -p "../artifacts/scale-$n"
k6 run \
-e GW_URL="$GW_URL" \
--summary-export "../artifacts/scale-$n/k6-summary.json" \
./capacity_ramp.js
done
Run (examples):
cd perf/k6
GW_URL=http://localhost:8080 k6 run --summary-export ../artifacts/smoke-summary.json smoke.js
GW_URL=http://localhost:8080 k6 run --summary-export ../artifacts/ramp-summary.json capacity_ramp.js
6) Offline analysis tool (reads “done” stream by run_id)
perf/tools/analyze.py
import os, sys, json, math
from datetime import datetime, timezone
import redis
def pct(values, p):
if not values:
return None
values = sorted(values)
k = (len(values) - 1) * (p / 100.0)
f = math.floor(k); c = math.ceil(k)
if f == c:
return values[int(k)]
return values[f] * (c - k) + values[c] * (k - f)
def main():
valkey = os.getenv("VALKEY_ENDPOINT", "localhost:6379")
host, port = valkey.split(":")
r = redis.Redis(host=host, port=int(port), decode_responses=True)
run_id = os.getenv("RUN_ID")
if not run_id:
print("Set RUN_ID env var (from /test/start response).", file=sys.stderr)
sys.exit(2)
done_stream = os.getenv("DONE_STREAM", "stella:perf:done")
# Read up to `count` entries (fine at sample scale).
# For big runs, paginate XRANGE from the last returned entry id.
entries = r.xrange(done_stream, min='-', max='+', count=200000)
hop_ms = []
queue_ms = []
service_ms = []
matched = 0
for entry_id, fields in entries:
if fields.get("run") != run_id:
continue
matched += 1
enq_ns = int(fields.get("enq_ns", "0"))
claim_ns = int(fields.get("claim_ns", "0"))
done_ns = int(fields.get("done_ns", "0"))
if enq_ns > 0 and claim_ns > 0:
queue_ms.append((claim_ns - enq_ns) / 1_000_000.0)
if claim_ns > 0 and done_ns > 0:
service_ms.append((done_ns - claim_ns) / 1_000_000.0)
if enq_ns > 0 and done_ns > 0:
hop_ms.append((done_ns - enq_ns) / 1_000_000.0)
summary = {
"run_id": run_id,
"done_stream": done_stream,
"matched_jobs": matched,
"hop_ms": {
"p50": pct(hop_ms, 50), "p95": pct(hop_ms, 95), "p99": pct(hop_ms, 99)
},
"queue_ms": {
"p50": pct(queue_ms, 50), "p95": pct(queue_ms, 95), "p99": pct(queue_ms, 99)
},
"service_ms": {
"p50": pct(service_ms, 50), "p95": pct(service_ms, 95), "p99": pct(service_ms, 99)
},
"generated_at": datetime.now(timezone.utc).isoformat()
}
print(json.dumps(summary, indent=2))
if __name__ == "__main__":
main()
Run:
pip install redis
RUN_ID=<value_from_/test/start> python perf/tools/analyze.py
This yields the key percentiles for hop, queue_delay, and service from the authoritative worker-side timestamps.
7) What this sample already covers
- Correlation: x-stella-corr-id end-to-end.
- Run isolation: x-stella-run-id created via /test/start, used to filter results.
- Valkey Streams + consumer group: fair claim semantics.
- Required timestamps: enq_ns, claim_ns, done_ns.
- Metrics:
  - gateway_enqueue_seconds histogram
  - valkey_enqueue_rtt_seconds histogram
  - worker_service_seconds, queue_delay_seconds, hop_latency_seconds histograms
- Scenarios:
  - A Smoke, C Capacity ramp, D Stress, E Burst, F Soak
  - G Scaling curve via script (repeat ramp across worker counts)
8) Immediate next hardening steps (still “small”)
- Add queue depth / lag gauges: in worker or gateway, poll XLEN stella:perf:jobs and export it as a gauge metric in Prometheus format.
- Drain-time measurement: implement /test/end/{runId} so it waits until “matched jobs stop increasing” + queue depth returns to baseline, and records a final metric.
- Stage slicing (per plateau stats): extend analyze.py to accept your k6 stage plan and compute p95 per stage window (based on start_ns). A sketch of that slicing follows.
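A minimal per-plateau slicing sketch (Python; assumes the stage plan from capacity_ramp.js and records already extracted from the done stream, with a simple nearest-rank p95):

# Hypothetical stage plan mirroring capacity_ramp.js: (rps, duration_seconds).
STAGES = [(50, 120), (100, 120), (200, 120), (400, 120), (800, 120)]

def slice_by_stage(run_start_ns: int, records: list[dict]) -> list[dict]:
    """records: [{'done_ns': ..., 'hop_ms': ...}, ...] from the done stream."""
    out, t = [], run_start_ns
    for rps, dur in STAGES:
        end = t + dur * 1_000_000_000
        hop = sorted(r["hop_ms"] for r in records if t <= r["done_ns"] < end)
        p95 = hop[int(0.95 * (len(hop) - 1))] if hop else None  # nearest-rank
        out.append({"rps": rps, "n": len(hop), "p95_hop_ms": p95})
        t = end
    return out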
If you want, I can extend the sample with (1) queue-depth export and (2) per-plateau slicing in analyze.py without adding any new dependencies.