Here's a compact, ready-to-use playbook to **measure and plot performance envelopes for an HTTP → Valkey → Worker hop under variable concurrency**, so you can tune autoscaling and predict user-visible spikes.

---

## What we're measuring (plain English)

* **TTFB/TTFS (HTTP):** time the gateway spends accepting the request + queuing the job.
* **Valkey latency:** enqueue (`LPUSH`/`XADD`), pop/claim (`BRPOP`/`XREADGROUP`), and round-trip.
* **Worker service time:** time to pick up, process, and ack.
* **Queueing delay:** time spent waiting in the queue (arrival → start of worker).

These four add up to the "hop latency" users feel when the system is under load.

---

## Minimal tracing you can add today

Emit these IDs/headers end-to-end:

* `x-stella-corr-id` (uuid)
* `x-stella-enq-ts` (gateway enqueue ts, ns)
* `x-stella-claim-ts` (worker claim ts, ns)
* `x-stella-done-ts` (worker done ts, ns)

From these, compute:

* `queue_delay = claim_ts - enq_ts`
* `service_time = done_ts - claim_ts`
* `http_ttfs = gateway_first_byte_ts - http_request_start_ts`
* `hop_latency = done_ts - enq_ts` (or return-path if synchronous)

Clock-sync tip: use monotonic clocks in code and convert to ns; don't mix wall-clock and monotonic sources.

---

## Valkey commands (BSD-licensed Valkey)

Use **Valkey Streams + Consumer Groups** for fairness and metrics:

* Enqueue: `XADD jobs * corr-id <uuid> enq-ts <ns> payload <...>`
* Claim: `XREADGROUP GROUP workers w1 COUNT 1 BLOCK 1000 STREAMS jobs >`
* Ack: `XACK jobs workers <entry-id>`

Add a small Lua script for timestamping at enqueue (atomic):

```lua
-- KEYS[1] = stream
-- ARGV[1] = enq_ts_ns, ARGV[2] = corr_id, ARGV[3] = payload
return redis.call('XADD', KEYS[1], '*', 'corr', ARGV[2], 'enq', ARGV[1], 'p', ARGV[3])
```

---

## Load shapes to test (find the envelope)

1. **Open-loop (arrival-rate controlled):** 50 → 10k req/min in steps; constant rate per step. Reveals queueing onset.
2. **Burst:** 0 → N in short spikes (e.g., 5k in 10s) to see saturation and drain time.
3. **Step-up/down:** double every 2 min until SLO breach; then halve down.
4. **Long tail soak:** run at 70–80% of max for 1h; watch p95–p99.9 drift.

Target outputs per step: **p50/p90/p95/p99** for `queue_delay`, `service_time`, `hop_latency`, plus **throughput** and **error rate**.
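For reference, a minimal C# sketch (hypothetical names, separate from the harness later in this document) of how the three timestamps turn into the per-request durations those percentiles summarize:

```csharp
// Derive per-request durations from the correlation timestamps (all in Unix ns).
public readonly record struct HopSample(long EnqNs, long ClaimNs, long DoneNs)
{
    public TimeSpan QueueDelay  => TimeSpan.FromTicks((ClaimNs - EnqNs) / 100);  // 100 ns per tick
    public TimeSpan ServiceTime => TimeSpan.FromTicks((DoneNs - ClaimNs) / 100);
    public TimeSpan HopLatency  => TimeSpan.FromTicks((DoneNs - EnqNs) / 100);
}

public static class Percentiles
{
    // Nearest-rank percentile over the samples of one load step (input must be sorted).
    public static double P(IReadOnlyList<double> sortedMs, double p)
    {
        if (sortedMs.Count == 0) return double.NaN;
        var rank = (int)Math.Ceiling(p / 100.0 * sortedMs.Count) - 1;
        return sortedMs[Math.Clamp(rank, 0, sortedMs.Count - 1)];
    }
}
```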
---

## k6 script (HTTP client pressure)

```javascript
// save as hop-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  scenarios: {
    step_load: {
      executor: 'ramping-arrival-rate',
      startRate: 20,
      timeUnit: '1s',
      preAllocatedVUs: 200,
      maxVUs: 5000,
      stages: [
        { target: 50, duration: '1m' },
        { target: 100, duration: '1m' },
        { target: 200, duration: '1m' },
        { target: 400, duration: '1m' },
        { target: 800, duration: '1m' },
      ],
    },
  },
  thresholds: {
    'http_req_failed': ['rate<0.01'],
    'http_req_duration{phase:hop}': ['p(95)<500'],
  },
};

export default function () {
  const corr = crypto.randomUUID();
  const res = http.post(
    __ENV.GW_URL,
    JSON.stringify({ data: 'ping', corr }),
    {
      headers: { 'Content-Type': 'application/json', 'x-stella-corr-id': corr },
      tags: { phase: 'hop' },
    }
  );
  check(res, { 'status 2xx/202': r => r.status === 200 || r.status === 202 });
  sleep(0.01);
}
```

Run: `GW_URL=https://gateway.example/hop k6 run hop-test.js`

---

## Worker hooks (.NET 10 sketch)

```csharp
// At claim
var now = Stopwatch.GetTimestamp(); // monotonic, relative to process start
var claimNs = now.ToNanoseconds();
log.AddTag("x-stella-claim-ts", claimNs);

// After processing
var doneNs = Stopwatch.GetTimestamp().ToNanoseconds();
log.AddTag("x-stella-done-ts", doneNs);

// Include corr-id and stream entry id in logs/metrics
// (see the clock requirements below before comparing these values across hosts)
```

Helper:

```csharp
public static class MonoTime
{
    static readonly double _nsPerTick = 1_000_000_000d / Stopwatch.Frequency;
    public static long ToNanoseconds(this long ticks) => (long)(ticks * _nsPerTick);
}
```

---

## Prometheus metrics to expose

* `valkey_enqueue_ns` (histogram)
* `queue_delay_ns` (histogram, derived from `claim_ts - enq_ts`)
* `valkey_claim_block_ms` (gauge)
* `worker_service_ns` (histogram, labels: worker_type, route)
* `queue_depth` (gauge via `XLEN` or `XINFO STREAM`)
* `enqueue_rate`, `dequeue_rate` (counters)

Example recording rules:

```yaml
- record: hop:queue_delay_p95
  expr: histogram_quantile(0.95, sum(rate(queue_delay_ns_bucket[1m])) by (le))
- record: hop:service_time_p95
  expr: histogram_quantile(0.95, sum(rate(worker_service_ns_bucket[1m])) by (le))
- record: hop:latency_budget_p95
  expr: hop:queue_delay_p95 + hop:service_time_p95
```

---

## Autoscaling signals (HPA/KEDA friendly)

* **Primary:** queue depth & its derivative (d/dt).
* **Secondary:** p95 `queue_delay` and worker CPU.
* **Safety:** max in-flight per worker; backpressure with HTTP 429 when `queue_depth > D` or `p95_queue_delay > SLO*0.8`.

---

## Plot the "envelope" (what you'll look at)

* X-axis: **offered load** (req/s).
* Y-axis: **p95 hop latency** (ms).
* Overlay: p99 (dashed), **SLO line** (e.g., 500 ms), and **capacity knee** (where p95 sharply rises).
* Add a secondary panel: **queue depth** vs load.

If you want, I can generate a ready-made notebook that ingests your logs/metrics CSV and outputs these plots.

Below is a **set of implementation guidelines** your agents can follow to build a repeatable performance test system for the **HTTP → Valkey → Worker** pipeline. It's written as a "spec + runbook" with clear MUST/SHOULD requirements and concrete scenario definitions.

---

# Performance Test Guidelines
## HTTP → Valkey → Worker pipeline

## 1) Objectives and scope

### Primary objectives

Your performance tests MUST answer these questions with evidence:

1. **Capacity knee**: At what offered load does **queue delay** start growing sharply?
2. **User-impact envelope**: What are p50/p95/p99 **hop latency** curves vs offered load?
3. **Decomposition**: How much of hop latency is:
   * gateway enqueue time
   * Valkey enqueue/claim RTT
   * queue wait time
   * worker service time
4. **Scaling behavior**: How do these change with worker replica counts (N workers)?
5. **Stability**: Under sustained load, do latencies drift (GC, memory, fragmentation, background jobs)?

### Non-goals (explicitly out of scope unless you add them later)

* Micro-optimizing single function runtime
* Synthetic "max QPS" records without a representative payload
* Tests that don't collect segment metrics (end-to-end only) for anything beyond basic smoke

---

## 2) Definitions and required metrics

### Required latency definitions (standardize these names)

Agents MUST compute and report these per request/job:

* **`t_http_accept`**: time from client send → gateway accepts request
* **`t_enqueue`**: time spent in gateway to enqueue into Valkey (server-side)
* **`t_valkey_rtt_enq`**: client-observed RTT for enqueue command(s)
* **`t_queue_delay`**: `claim_ts - enq_ts`
* **`t_service`**: `done_ts - claim_ts`
* **`t_hop`**: `done_ts - enq_ts` (this is the "true pipeline hop" latency)
* Optional but recommended:
  * **`t_ack`**: time to ack completion (Valkey ack RTT)
  * **`t_http_response`**: request start → gateway response sent (TTFB/TTFS)

### Required percentiles and aggregations

Per scenario step (e.g., each offered load plateau), agents MUST output:

* p50 / p90 / p95 / p99 / p99.9 for: `t_hop`, `t_queue_delay`, `t_service`, `t_enqueue`
* Throughput: offered rps and achieved rps
* Error rate: HTTP failures, enqueue failures, worker failures
* Queue depth and backlog drain time

### Required system-level telemetry (minimum)

Agents MUST collect these time series during tests:

* **Worker**: CPU, memory, GC pauses (if .NET), threadpool saturation indicators
* **Valkey**: ops/sec, connected clients, blocked clients, memory used, evictions, slowlog count
* **Gateway**: CPU/mem, request rate, response codes, request duration histogram

---

## 3) Environment and test hygiene requirements

### Environment requirements

Agents SHOULD run tests in an environment that matches production in:

* container CPU/memory limits
* number of nodes, network topology
* Valkey topology (single, cluster, sentinel, etc.)
* worker replica autoscaling rules (or deliberately disabled)

If exact parity isn't possible, agents MUST record all known differences in the report.

### Test hygiene (non-negotiable)

Agents MUST:

1. **Start from empty queues** (no backlog).
2. **Disable client retries** (or explicitly run two variants: retries off / retries on).
3. **Warm up** before measuring (e.g., 60s warm-up minimum).
4. **Hold steady plateaus** long enough to stabilize (usually 2–5 minutes per step).
5. **Cool down** and verify backlog drains (queue depth returns to baseline).
6. Record exact versions/SHAs of gateway/worker and Valkey config.

### Load generator hygiene

Agents MUST ensure the load generator is not the bottleneck:

* CPU < ~70% during test
* no local socket exhaustion
* enough VUs/connections
* if needed, distributed load generation

---

## 4) Instrumentation spec (agents implement this first)

### Correlation and timestamps

Agents MUST propagate an end-to-end correlation ID and timestamps.
**Required fields**

* `corr_id` (UUID)
* `enq_ts_ns` (set at enqueue, monotonic or consistent clock)
* `claim_ts_ns` (set by worker when job is claimed)
* `done_ts_ns` (set by worker when job processing ends)

**Where these live**

* HTTP request header: `x-corr-id: <uuid>`
* Valkey job payload fields: `corr`, `enq`, and optionally payload size/type
* Worker logs/metrics: include `corr_id`, job id, `claim_ts_ns`, `done_ts_ns`

### Clock requirements

Agents MUST use a consistent timing source:

* Prefer monotonic timers for durations (Stopwatch / monotonic clock)
* If timestamps cross machines, ensure they're comparable:
  * either rely on synchronized clocks (NTP) **and** monitor drift
  * or compute durations using monotonic tick deltas within the same host and transmit durations (less ideal for queue delay)

**Practical recommendation**: use wall-clock ns for cross-host timestamps with NTP + drift checks, and also record per-host monotonic durations for sanity.

### Valkey queue semantics (recommended)

Agents SHOULD use **Streams + Consumer Groups** for stable claim semantics and good observability:

* Enqueue: `XADD jobs * corr <uuid> enq <ts_ns> payload <...>`
* Claim: `XREADGROUP GROUP workers <consumer> COUNT 1 BLOCK 1000 STREAMS jobs >`
* Ack: `XACK jobs workers <entry-id>`

Agents MUST record stream length (`XLEN`) or consumer group lag (`XINFO GROUPS`) as queue depth/lag.

### Metrics exposure

Agents MUST publish Prometheus (or equivalent) histograms:

* `gateway_enqueue_seconds` (or ns) histogram
* `valkey_enqueue_rtt_seconds` histogram
* `worker_service_seconds` histogram
* `queue_delay_seconds` histogram (derived from timestamps; can be computed in worker or offline)
* `hop_latency_seconds` histogram

---

## 5) Workload modeling and test data

Agents MUST define a workload model before running capacity tests:

1. **Endpoint(s)**: list exact gateway routes under test
2. **Payload types**: small/typical/large
3. **Mix**: e.g., 70/25/5 by payload size
4. **Idempotency rules**: ensure repeated jobs don't corrupt state
5. **Data reset strategy**: how test data is cleaned or isolated per run

Agents SHOULD test at least:

* Typical payload (p50)
* Large payload (p95)
* Worst-case allowed payload (bounded by your API limits)

---

## 6) Scenario suite your agents MUST implement

Each scenario MUST be defined as code/config (not manual).
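As one way to do that, a minimal C# sketch of a typed scenario spec (hypothetical names; same fields as the YAML template in the agent checklist section, bound here from JSON with System.Text.Json purely for illustration):

```csharp
using System.Text.Json;

// One load plateau and the gates a run must satisfy.
public sealed record Stage(int Rps, int DurationSeconds);
public sealed record GateSpec(double MaxErrorRate, int SloMsP95Hop, int BacklogMustDrainSeconds);

// The scenario itself: name, open/closed loop mode, warm-up, stages, gates.
public sealed record ScenarioSpec(string Name, string Mode, int WarmupSeconds, Stage[] Stages, GateSpec Gates);

public static class ScenarioLoader
{
    private static readonly JsonSerializerOptions Opts = new() { PropertyNameCaseInsensitive = true };

    public static ScenarioSpec Load(string path) =>
        JsonSerializer.Deserialize<ScenarioSpec>(File.ReadAllText(path), Opts)
        ?? throw new InvalidOperationException($"Empty or invalid scenario spec: {path}");
}
```

The point is that stages, rates, and gates are version-controlled data the runner and analyzer both read, not values typed into a load-test UI.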
### Scenario A — Smoke (fast sanity) **Goal**: verify instrumentation + basic correctness **Load**: low (e.g., 1–5 rps), 2 minutes **Pass**: * 0 backlog after run * error rate < 0.1% * metrics present for all segments ### Scenario B — Baseline (repeatable reference point) **Goal**: establish a stable baseline for regression tracking **Load**: fixed moderate load (e.g., 30–50% of expected capacity), 10 minutes **Pass**: * p95 `t_hop` within baseline ± tolerance (set after first runs) * no upward drift in p95 across time (trend line ~flat) ### Scenario C — Capacity ramp (open-loop) **Goal**: find the knee where queueing begins **Method**: open-loop arrival-rate ramp with plateaus Example stages (edit to fit your system): * 50 rps for 2m * 100 rps for 2m * 200 rps for 2m * 400 rps for 2m * … until SLO breach or errors spike **MUST**: * warm-up stage before first plateau * record per-plateau summary **Stop conditions** (any triggers stop): * error rate > 1% * queue depth grows without bound over an entire plateau * p95 `t_hop` exceeds SLO for 2 consecutive plateaus ### Scenario D — Stress (push past capacity) **Goal**: characterize failure mode and recovery **Load**: 120–200% of knee load, 5–10 minutes **Pass** (for resilience): * system does not crash permanently * once load stops, backlog drains within target time (define it) ### Scenario E — Burst / spike **Goal**: see how quickly queue grows and drains **Load shape**: * baseline low load * sudden burst (e.g., 10× for 10–30s) * return to baseline **Report**: * peak queue depth * time to drain to baseline * p99 `t_hop` during burst ### Scenario F — Soak (long-running) **Goal**: detect drift (leaks, fragmentation, GC patterns) **Load**: 70–85% of knee, 60–180 minutes **Pass**: * p95 does not trend upward beyond threshold * memory remains bounded * no rising error rate ### Scenario G — Scaling curve (worker replica sweep) **Goal**: turn results into scaling rules **Method**: * Repeat Scenario C with worker replicas = 1, 2, 4, 8… **Deliverable**: * plot of knee load vs worker count * p95 `t_service` vs worker count (should remain similar; queue delay should drop) --- ## 7) Execution protocol (runbook) Agents MUST run every scenario using the same disciplined flow: ### Pre-run checklist * confirm system versions/SHAs * confirm autoscaling mode: * **Off** for baseline capacity characterization * **On** for validating autoscaling policies * clear queues and consumer group pending entries * restart or at least record “time since deploy” for services (cold vs warm) ### During run * ensure load is truly open-loop when required (arrival-rate based) * continuously record: * offered vs achieved rate * queue depth * CPU/mem for gateway/worker/Valkey ### Post-run * stop load * wait until backlog drains (or record that it doesn’t) * export: * k6/runner raw output * Prometheus time series snapshot * sampled logs with corr_id fields * generate a summary report automatically (no hand calculations) --- ## 8) Analysis rules (how agents compute “the envelope”) Agents MUST generate at minimum two plots per run: 1. **Latency envelope**: offered load (x-axis) vs p95 `t_hop` (y-axis) * overlay p99 (and SLO line) 2. 
**Queue behavior**: offered load vs queue depth (or lag), plus drain time ### How to identify the “knee” Agents SHOULD mark the knee as the first plateau where: * queue depth grows monotonically within the plateau, **or** * p95 `t_queue_delay` increases by > X% step-to-step (e.g., 50–100%) ### Convert results into scaling guidance Agents SHOULD compute: * `capacity_per_worker ≈ 1 / mean(t_service)` (jobs/sec per worker) * recommended replicas for offered load λ at target utilization U: * `workers_needed = ceil(λ * mean(t_service) / U)` * choose U ~ 0.6–0.75 for headroom This should be reported alongside the measured envelope. --- ## 9) Pass/fail criteria and regression gates Agents MUST define gates in configuration, not in someone’s head. Suggested gating structure: * **Smoke gate**: error rate < 0.1%, backlog drains * **Baseline gate**: p95 `t_hop` regression < 10% (tune after you have history) * **Capacity gate**: knee load regression < 10% (optional but very valuable) * **Soak gate**: p95 drift over time < 15% and no memory runaway --- ## 10) Common pitfalls (agents must avoid) 1. **Closed-loop tests used for capacity** Closed-loop (“N concurrent users”) self-throttles and can hide queueing onset. Use open-loop arrival rate for capacity. 2. **Ignoring queue depth** A system can look “healthy” in request latency while silently building backlog. 3. **Measuring only gateway latency** You must measure enqueue → claim → done to see the real hop. 4. **Load generator bottleneck** If the generator saturates, you’ll under-estimate capacity. 5. **Retries enabled by default** Retries can inflate load and hide root causes; run with retries off first. 6. **Not controlling warm vs cold** Cold caches vs warmed services produce different envelopes; record the condition. --- # Agent implementation checklist (deliverables) Assign these as concrete tasks to your agents. 
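One calculation several of these deliverables share (the analyzer, the report, the scaling recommendation) is section 8's sizing rule; a minimal C# sketch (hypothetical names):

```csharp
// Sizing rule from section 8:
// capacity_per_worker ≈ 1 / mean(t_service); workers_needed = ceil(λ * mean(t_service) / U).
public static class CapacityPlanner
{
    // Jobs per second one worker can sustain at 100% utilization.
    public static double CapacityPerWorker(double meanServiceSeconds) => 1.0 / meanServiceSeconds;

    // Replicas needed for offered load λ at a target utilization U (0.6–0.75 recommended for headroom).
    public static int WorkersNeeded(double offeredRps, double meanServiceSeconds, double targetUtilization)
        => (int)Math.Ceiling(offeredRps * meanServiceSeconds / targetUtilization);
}

// Example: 300 rps offered, 20 ms mean service time, U = 0.7 → ceil(300 * 0.02 / 0.7) = 9 workers.
```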
## Agent 1 — Observability & tracing MUST deliver: * correlation id propagation gateway → Valkey → worker * timestamps `enq/claim/done` * Prometheus histograms for enqueue, service, hop * queue depth metric (`XLEN` / `XINFO` lag) ## Agent 2 — Load test harness MUST deliver: * test runner scripts (k6 or equivalent) for scenarios A–G * test config file (YAML/JSON) controlling: * stages (rates/durations) * payload mix * headers (corr-id) * reproducible seeds and version stamping ## Agent 3 — Result collector and analyzer MUST deliver: * a pipeline that merges: * load generator output * hop timing data (from logs or a completion stream) * Prometheus snapshots * automatic summary + plots: * latency envelope * queue depth/drain * CSV/JSON exports for long-term tracking ## Agent 4 — Reporting and dashboards MUST deliver: * a standard report template that includes: * environment details * scenario details * key charts * knee estimate * scaling recommendation * Grafana dashboard with the required panels ## Agent 5 — CI / release integration SHOULD deliver: * PR-level smoke test (Scenario A) * nightly baseline (Scenario B) * weekly capacity sweep (Scenario C + scaling curve) --- ## Template: scenario spec (agents can copy/paste) ```yaml test_run: system_under_test: gateway_sha: "" worker_sha: "" valkey_version: "" environment: cluster: "" workers: 4 autoscaling: "off" # off|on workload: endpoint: "/hop" payload_profile: "p50" mix: p50: 0.7 p95: 0.25 max: 0.05 scenario: name: "capacity_ramp" mode: "open_loop" warmup_seconds: 60 stages: - rps: 50 duration_seconds: 120 - rps: 100 duration_seconds: 120 - rps: 200 duration_seconds: 120 - rps: 400 duration_seconds: 120 gates: max_error_rate: 0.01 slo_ms_p95_hop: 500 backlog_must_drain_seconds: 300 outputs: artifacts_dir: "./artifacts//" ``` --- If you want, I can also provide a **single “golden” folder structure** (tests/ scripts/ dashboards/ analysis/) and a “definition of done” checklist that matches how your repo is organized—but the above is already sufficient for agents to start implementing immediately. Below is a **sample / partial implementation** that gives **full functional coverage** of your performance-test requirements (instrumentation, correlation, timestamps, queue semantics, scenarios A–G, artifact export, and analysis). It is intentionally minimal and “swap-in-real-code” friendly. You can copy these files into a `perf/` folder in your repo, build, and run locally with Docker Compose. 
---

## 1) Suggested folder layout

```
perf/
  docker-compose.yml
  prometheus/
    prometheus.yml
  k6/
    lib.js
    smoke.js
    capacity_ramp.js
    burst.js
    soak.js
    stress.js
    scaling_curve.sh
  tools/
    analyze.py
  src/
    Perf.Gateway/
      Perf.Gateway.csproj
      Program.cs
      Metrics.cs
      ValkeyStreams.cs
      TimeNs.cs
    Perf.Worker/
      Perf.Worker.csproj
      Program.cs
      WorkerService.cs
      Metrics.cs
      ValkeyStreams.cs
      TimeNs.cs
```

---

## 2) Gateway sample (.NET 10, Minimal API)

### `perf/src/Perf.Gateway/Perf.Gateway.csproj`

```xml
<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>net10.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>
  <ItemGroup>
    <!-- pin to your current StackExchange.Redis release -->
    <PackageReference Include="StackExchange.Redis" Version="2.8.0" />
  </ItemGroup>
</Project>
```

### `perf/src/Perf.Gateway/TimeNs.cs`

```csharp
namespace Perf.Gateway;

public static class TimeNs
{
    private static readonly long UnixEpochTicks = DateTime.UnixEpoch.Ticks; // 100ns units

    public static long UnixNowNs()
    {
        var ticks = DateTime.UtcNow.Ticks - UnixEpochTicks; // 100ns
        return ticks * 100L; // ns
    }
}
```

### `perf/src/Perf.Gateway/Metrics.cs`

```csharp
using System.Collections.Concurrent;
using System.Globalization;
using System.Text;

namespace Perf.Gateway;

public sealed class Metrics
{
    private readonly ConcurrentDictionary<string, long> _counters = new();

    // Simple fixed-bucket histograms in seconds (Prometheus histogram format)
    private readonly ConcurrentDictionary<string, Histogram> _h = new();

    public void Inc(string name, long by = 1) => _counters.AddOrUpdate(name, by, (_, v) => v + by);

    public Histogram Hist(string name, double[] bucketsSeconds) => _h.GetOrAdd(name, _ => new Histogram(name, bucketsSeconds));

    public string ExportPrometheus()
    {
        var sb = new StringBuilder(16 * 1024);
        foreach (var (k, v) in _counters.OrderBy(kv => kv.Key))
        {
            sb.Append("# TYPE ").Append(k).Append(" counter\n");
            sb.Append(k).Append(' ').Append(v.ToString(CultureInfo.InvariantCulture)).Append('\n');
        }
        foreach (var hist in _h.Values.OrderBy(h => h.Name))
        {
            sb.Append(hist.Export());
        }
        return sb.ToString();
    }

    public sealed class Histogram
    {
        public string Name { get; }
        private readonly double[] _buckets;     // sorted
        private readonly long[] _bucketCounts;  // cumulative, exported below
        private long _count;
        private double _sum;
        private readonly object _lock = new();

        public Histogram(string name, double[] bucketsSeconds)
        {
            Name = name;
            _buckets = bucketsSeconds.OrderBy(x => x).ToArray();
            _bucketCounts = new long[_buckets.Length];
        }

        public void Observe(double seconds)
        {
            lock (_lock)
            {
                _count++;
                _sum += seconds;
                for (int i = 0; i < _buckets.Length; i++)
                {
                    if (seconds <= _buckets[i]) _bucketCounts[i]++;
                }
            }
        }

        public string Export()
        {
            // Prometheus hist buckets are cumulative; we already maintain that.
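            // For one histogram, the exported text ends up looking like this (values illustrative):
            //   gateway_enqueue_seconds_bucket{le="0.005"} 12
            //   gateway_enqueue_seconds_bucket{le="+Inf"} 61
            //   gateway_enqueue_seconds_sum 0.8123
            //   gateway_enqueue_seconds_count 61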
            var sb = new StringBuilder(2048);
            sb.Append("# TYPE ").Append(Name).Append(" histogram\n");
            lock (_lock)
            {
                for (int i = 0; i < _buckets.Length; i++)
                {
                    sb.Append(Name).Append("_bucket{le=\"")
                      .Append(_buckets[i].ToString("0.################", CultureInfo.InvariantCulture))
                      .Append("\"} ")
                      .Append(_bucketCounts[i].ToString(CultureInfo.InvariantCulture))
                      .Append('\n');
                }
                sb.Append(Name).Append("_bucket{le=\"+Inf\"} ")
                  .Append(_count.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');
                sb.Append(Name).Append("_sum ")
                  .Append(_sum.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');
                sb.Append(Name).Append("_count ")
                  .Append(_count.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');
            }
            return sb.ToString();
        }
    }
}
```

### `perf/src/Perf.Gateway/ValkeyStreams.cs`

```csharp
using StackExchange.Redis;

namespace Perf.Gateway;

public sealed class ValkeyStreams
{
    private readonly IDatabase _db;

    public ValkeyStreams(IConnectionMultiplexer mux) => _db = mux.GetDatabase();

    public async Task EnsureConsumerGroupAsync(string stream, string group)
    {
        try
        {
            // XGROUP CREATE <stream> <group> $ MKSTREAM
            await _db.ExecuteAsync("XGROUP", "CREATE", stream, group, "$", "MKSTREAM");
        }
        catch (RedisServerException ex) when (ex.Message.Contains("BUSYGROUP", StringComparison.OrdinalIgnoreCase))
        {
            // ok
        }
    }

    public async Task<RedisResult> XAddAsync(string stream, NameValueEntry[] fields)
    {
        // XADD stream * field value field value ...
        var args = new List<object>(2 + fields.Length * 2) { stream, "*" };
        foreach (var f in fields) { args.Add(f.Name); args.Add(f.Value); }
        return await _db.ExecuteAsync("XADD", args.ToArray());
    }
}
```

### `perf/src/Perf.Gateway/Program.cs`

```csharp
using Perf.Gateway;
using StackExchange.Redis;
using System.Diagnostics;

var builder = WebApplication.CreateBuilder(args);

var valkey = builder.Configuration["VALKEY_ENDPOINT"] ?? "valkey:6379";
builder.Services.AddSingleton<IConnectionMultiplexer>(_ => ConnectionMultiplexer.Connect(valkey));
builder.Services.AddSingleton<Metrics>();
builder.Services.AddSingleton<ValkeyStreams>();

var app = builder.Build();
var metrics = app.Services.GetRequiredService<Metrics>();
var streams = app.Services.GetRequiredService<ValkeyStreams>();

const string JobsStream = "stella:perf:jobs";
const string DoneStream = "stella:perf:done";
const string Group = "workers";

await streams.EnsureConsumerGroupAsync(JobsStream, Group);

var allowTestControl = (app.Configuration["ALLOW_TEST_CONTROL"] ?? "1") == "1";
var runs = new Dictionary<string, long>(StringComparer.Ordinal); // run_id -> start_ns

if (allowTestControl)
{
    app.MapPost("/test/start", () =>
    {
        var runId = Guid.NewGuid().ToString("N");
        var startNs = TimeNs.UnixNowNs();
        lock (runs) runs[runId] = startNs;
        metrics.Inc("perf_test_start_total");
        return Results.Ok(new { run_id = runId, start_ns = startNs, jobs_stream = JobsStream, done_stream = DoneStream });
    });

    app.MapPost("/test/end/{runId}", (string runId) =>
    {
        lock (runs) runs.Remove(runId);
        metrics.Inc("perf_test_end_total");
        return Results.Ok(new { run_id = runId });
    });
}

app.MapGet("/metrics", () => Results.Text(metrics.ExportPrometheus(), "text/plain; version=0.0.4"));

app.MapPost("/hop", async (HttpRequest req) =>
{
    // Correlation / run id
    var corr = req.Headers["x-stella-corr-id"].FirstOrDefault() ?? Guid.NewGuid().ToString();
    var runId = req.Headers["x-stella-run-id"].FirstOrDefault() ??
"no-run"; // Enqueue timestamp (UTC-derived ns) var enqNs = TimeNs.UnixNowNs(); // Read raw body (payload) - keep it simple for perf harness string payload; using (var sr = new StreamReader(req.Body)) payload = await sr.ReadToEndAsync(); var sw = Stopwatch.GetTimestamp(); // Valkey enqueue var valkeySw = Stopwatch.GetTimestamp(); var entryId = await streams.XAddAsync(JobsStream, new[] { new NameValueEntry("corr", corr), new NameValueEntry("run", runId), new NameValueEntry("enq_ns", enqNs), new NameValueEntry("payload", payload), }); var valkeyRttSec = (Stopwatch.GetTimestamp() - valkeySw) / (double)Stopwatch.Frequency; var enqueueSec = (Stopwatch.GetTimestamp() - sw) / (double)Stopwatch.Frequency; metrics.Inc("hop_requests_total"); metrics.Hist("gateway_enqueue_seconds", new[] { .001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2 }).Observe(enqueueSec); metrics.Hist("valkey_enqueue_rtt_seconds", new[] { .0005, .001, .002, .005, .01, .02, .05, .1, .2 }).Observe(valkeyRttSec); return Results.Accepted(value: new { corr, run_id = runId, enq_ns = enqNs, entry_id = entryId.ToString() }); }); app.Run("http://0.0.0.0:8080"); ``` --- ## 3) Worker sample (.NET 10 hosted service + metrics) ### `perf/src/Perf.Worker/Perf.Worker.csproj` ```xml net10.0 enable enable ``` ### `perf/src/Perf.Worker/TimeNs.cs` ```csharp namespace Perf.Worker; public static class TimeNs { private static readonly long UnixEpochTicks = DateTime.UnixEpoch.Ticks; public static long UnixNowNs() => (DateTime.UtcNow.Ticks - UnixEpochTicks) * 100L; } ``` ### `perf/src/Perf.Worker/Metrics.cs` ```csharp // Same as gateway Metrics.cs (copy/paste). Keep identical for consistency. using System.Collections.Concurrent; using System.Globalization; using System.Text; namespace Perf.Worker; public sealed class Metrics { private readonly ConcurrentDictionary _counters = new(); private readonly ConcurrentDictionary _h = new(); public void Inc(string name, long by = 1) => _counters.AddOrUpdate(name, by, (_, v) => v + by); public Histogram Hist(string name, double[] bucketsSeconds) => _h.GetOrAdd(name, _ => new Histogram(name, bucketsSeconds)); public string ExportPrometheus() { var sb = new StringBuilder(16 * 1024); foreach (var (k, v) in _counters.OrderBy(kv => kv.Key)) { sb.Append("# TYPE ").Append(k).Append(" counter\n"); sb.Append(k).Append(' ').Append(v.ToString(CultureInfo.InvariantCulture)).Append('\n'); } foreach (var hist in _h.Values.OrderBy(h => h.Name)) sb.Append(hist.Export()); return sb.ToString(); } public sealed class Histogram { public string Name { get; } private readonly double[] _buckets; private readonly long[] _bucketCounts; private long _count; private double _sum; private readonly object _lock = new(); public Histogram(string name, double[] bucketsSeconds) { Name = name; _buckets = bucketsSeconds.OrderBy(x => x).ToArray(); _bucketCounts = new long[_buckets.Length]; } public void Observe(double seconds) { lock (_lock) { _count++; _sum += seconds; for (int i = 0; i < _buckets.Length; i++) if (seconds <= _buckets[i]) _bucketCounts[i]++; } } public string Export() { var sb = new StringBuilder(2048); sb.Append("# TYPE ").Append(Name).Append(" histogram\n"); lock (_lock) { for (int i = 0; i < _buckets.Length; i++) { sb.Append(Name).Append("_bucket{le=\"") .Append(_buckets[i].ToString("0.################", CultureInfo.InvariantCulture)) .Append("\"} ") .Append(_bucketCounts[i].ToString(CultureInfo.InvariantCulture)) .Append('\n'); } sb.Append(Name).Append("_bucket{le=\"+Inf\"} ") 
                  .Append(_count.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');
                sb.Append(Name).Append("_sum ")
                  .Append(_sum.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');
                sb.Append(Name).Append("_count ")
                  .Append(_count.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');
            }
            return sb.ToString();
        }
    }
}
```

### `perf/src/Perf.Worker/ValkeyStreams.cs`

```csharp
using StackExchange.Redis;

namespace Perf.Worker;

public sealed class ValkeyStreams
{
    private readonly IDatabase _db;

    public ValkeyStreams(IConnectionMultiplexer mux) => _db = mux.GetDatabase();

    public async Task EnsureConsumerGroupAsync(string stream, string group)
    {
        try
        {
            await _db.ExecuteAsync("XGROUP", "CREATE", stream, group, "$", "MKSTREAM");
        }
        catch (RedisServerException ex) when (ex.Message.Contains("BUSYGROUP", StringComparison.OrdinalIgnoreCase))
        {
        }
    }

    public async Task<RedisResult> XReadGroupAsync(string group, string consumer, string stream, string id, int count, int blockMs)
        => await _db.ExecuteAsync("XREADGROUP", "GROUP", group, consumer, "COUNT", count, "BLOCK", blockMs, "STREAMS", stream, id);

    public async Task XAckAsync(string stream, string group, RedisValue id)
        => await _db.ExecuteAsync("XACK", stream, group, id);

    public async Task<RedisResult> XAddAsync(string stream, NameValueEntry[] fields)
    {
        var args = new List<object>(2 + fields.Length * 2) { stream, "*" };
        foreach (var f in fields) { args.Add(f.Name); args.Add(f.Value); }
        return await _db.ExecuteAsync("XADD", args.ToArray());
    }
}
```

### `perf/src/Perf.Worker/WorkerService.cs`

```csharp
using StackExchange.Redis;
using System.Diagnostics;

namespace Perf.Worker;

public sealed class WorkerService : BackgroundService
{
    private readonly ValkeyStreams _streams;
    private readonly Metrics _metrics;
    private readonly ILogger<WorkerService> _log;

    private const string JobsStream = "stella:perf:jobs";
    private const string DoneStream = "stella:perf:done";
    private const string Group = "workers";

    private readonly string _consumer;

    public WorkerService(ValkeyStreams streams, Metrics metrics, ILogger<WorkerService> log)
    {
        _streams = streams;
        _metrics = metrics;
        _log = log;
        _consumer = Environment.GetEnvironmentVariable("WORKER_CONSUMER") ??
$"w-{Environment.MachineName}-{Guid.NewGuid():N}"; } protected override async Task ExecuteAsync(CancellationToken stoppingToken) { await _streams.EnsureConsumerGroupAsync(JobsStream, Group); var serviceBuckets = new[] { .001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5 }; var queueBuckets = new[] { .001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5, 10, 30 }; var hopBuckets = new[] { .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5, 10, 30 }; while (!stoppingToken.IsCancellationRequested) { RedisResult res; try { res = await _streams.XReadGroupAsync(Group, _consumer, JobsStream, ">", count: 1, blockMs: 1000); } catch (Exception ex) { _metrics.Inc("worker_xread_errors_total"); _log.LogWarning(ex, "XREADGROUP failed"); await Task.Delay(250, stoppingToken); continue; } if (res.IsNull) continue; // Parse XREADGROUP result (array -> stream -> entries) // Expected shape: [[stream, [[id, [field, value, field, value...]], ...]]] var outer = (RedisResult[])res!; foreach (var streamBlock in outer) { var sb = (RedisResult[])streamBlock!; var entries = (RedisResult[])sb[1]!; foreach (var entry in entries) { var e = (RedisResult[])entry!; var entryId = (RedisValue)e[0]!; var fields = (RedisResult[])e[1]!; string corr = "", run = "no-run"; long enqNs = 0; for (int i = 0; i < fields.Length; i += 2) { var key = (string)fields[i]!; var val = fields[i + 1].ToString(); if (key == "corr") corr = val; else if (key == "run") run = val; else if (key == "enq_ns") _ = long.TryParse(val, out enqNs); } var claimNs = TimeNs.UnixNowNs(); var sw = Stopwatch.GetTimestamp(); // Placeholder "service work" – replace with real processing // Keep it deterministic-ish; use env var to model different service times. var workMs = int.TryParse(Environment.GetEnvironmentVariable("WORK_MS"), out var ms) ? ms : 5; await Task.Delay(workMs, stoppingToken); var doneNs = TimeNs.UnixNowNs(); var serviceSec = (Stopwatch.GetTimestamp() - sw) / (double)Stopwatch.Frequency; var queueDelaySec = enqNs > 0 ? (claimNs - enqNs) / 1_000_000_000d : double.NaN; var hopSec = enqNs > 0 ? (doneNs - enqNs) / 1_000_000_000d : double.NaN; // Ack then publish "done" record for offline analysis await _streams.XAckAsync(JobsStream, Group, entryId); await _streams.XAddAsync(DoneStream, new[] { new NameValueEntry("run", run), new NameValueEntry("corr", corr), new NameValueEntry("entry", entryId), new NameValueEntry("enq_ns", enqNs), new NameValueEntry("claim_ns", claimNs), new NameValueEntry("done_ns", doneNs), new NameValueEntry("work_ms", workMs), }); _metrics.Inc("worker_jobs_total"); _metrics.Hist("worker_service_seconds", serviceBuckets).Observe(serviceSec); if (!double.IsNaN(queueDelaySec)) _metrics.Hist("queue_delay_seconds", queueBuckets).Observe(queueDelaySec); if (!double.IsNaN(hopSec)) _metrics.Hist("hop_latency_seconds", hopBuckets).Observe(hopSec); } } } } } ``` ### `perf/src/Perf.Worker/Program.cs` ```csharp using Perf.Worker; using StackExchange.Redis; var builder = Host.CreateApplicationBuilder(args); var valkey = builder.Configuration["VALKEY_ENDPOINT"] ?? 
"valkey:6379"; builder.Services.AddSingleton(_ => ConnectionMultiplexer.Connect(valkey)); builder.Services.AddSingleton(); builder.Services.AddSingleton(); builder.Services.AddHostedService(); // Minimal metrics endpoint builder.Services.AddSingleton(sp => { return new SimpleMetricsServer( sp.GetRequiredService(), url: "http://0.0.0.0:8081/metrics" ); }); var host = builder.Build(); await host.RunAsync(); // ---- minimal metrics server ---- file sealed class SimpleMetricsServer : BackgroundService { private readonly Metrics _metrics; private readonly string _url; public SimpleMetricsServer(Metrics metrics, string url) { _metrics = metrics; _url = url; } protected override async Task ExecuteAsync(CancellationToken stoppingToken) { var builder = WebApplication.CreateBuilder(); var app = builder.Build(); app.MapGet("/metrics", () => Results.Text(_metrics.ExportPrometheus(), "text/plain; version=0.0.4")); await app.RunAsync(_url, stoppingToken); } } ``` --- ## 4) Docker Compose (Valkey + gateway + worker + Prometheus) ### `perf/docker-compose.yml` ```yaml services: valkey: image: valkey/valkey:7.2 ports: ["6379:6379"] gateway: build: context: ./src/Perf.Gateway environment: - VALKEY_ENDPOINT=valkey:6379 - ALLOW_TEST_CONTROL=1 ports: ["8080:8080"] depends_on: [valkey] worker: build: context: ./src/Perf.Worker environment: - VALKEY_ENDPOINT=valkey:6379 - WORK_MS=5 ports: ["8081:8081"] depends_on: [valkey] prometheus: image: prom/prometheus:v2.55.0 volumes: - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro ports: ["9090:9090"] depends_on: [gateway, worker] ``` ### `perf/prometheus/prometheus.yml` ```yaml global: scrape_interval: 5s scrape_configs: - job_name: gateway static_configs: - targets: ["gateway:8080"] - job_name: worker static_configs: - targets: ["worker:8081"] ``` Run: ```bash cd perf docker compose up -d --build ``` --- ## 5) k6 scenarios A–G (open-loop where required) ### `perf/k6/lib.js` ```javascript import http from "k6/http"; export function startRun(baseUrl) { const res = http.post(`${baseUrl}/test/start`, null, { tags: { phase: "control" } }); if (res.status !== 200) throw new Error(`startRun failed: ${res.status} ${res.body}`); return res.json(); } export function hop(baseUrl, runId) { const corr = crypto.randomUUID(); const payload = JSON.stringify({ corr, data: "ping" }); return http.post( `${baseUrl}/hop`, payload, { headers: { "content-type": "application/json", "x-stella-run-id": runId, "x-stella-corr-id": corr }, tags: { phase: "hop" } } ); } ``` ### Scenario A: Smoke — `perf/k6/smoke.js` ```javascript import { check, sleep } from "k6"; import { startRun, hop } from "./lib.js"; export const options = { scenarios: { smoke: { executor: "constant-arrival-rate", rate: 2, timeUnit: "1s", duration: "2m", preAllocatedVUs: 20, maxVUs: 200 } }, thresholds: { http_req_failed: ["rate<0.001"] } }; export function setup() { return startRun(__ENV.GW_URL); } export default function (data) { const res = hop(__ENV.GW_URL, data.run_id); check(res, { "202 accepted": r => r.status === 202 }); sleep(0.01); } ``` ### Scenario C: Capacity ramp (open-loop) — `perf/k6/capacity_ramp.js` ```javascript import { check } from "k6"; import { startRun, hop } from "./lib.js"; export const options = { scenarios: { ramp: { executor: "ramping-arrival-rate", startRate: 50, timeUnit: "1s", preAllocatedVUs: 200, maxVUs: 5000, stages: [ { target: 50, duration: "2m" }, { target: 100, duration: "2m" }, { target: 200, duration: "2m" }, { target: 400, duration: "2m" }, { target: 800, duration: "2m" } 
] } }, thresholds: { http_req_failed: ["rate<0.01"] } }; export function setup() { return startRun(__ENV.GW_URL); } export default function (data) { const res = hop(__ENV.GW_URL, data.run_id); check(res, { "202 accepted": r => r.status === 202 }); } ``` ### Scenario E: Burst — `perf/k6/burst.js` ```javascript import { check } from "k6"; import { startRun, hop } from "./lib.js"; export const options = { scenarios: { burst: { executor: "ramping-arrival-rate", startRate: 20, timeUnit: "1s", preAllocatedVUs: 200, maxVUs: 5000, stages: [ { target: 20, duration: "60s" }, { target: 400, duration: "20s" }, { target: 20, duration: "120s" } ] } } }; export function setup() { return startRun(__ENV.GW_URL); } export default function (data) { const res = hop(__ENV.GW_URL, data.run_id); check(res, { "202": r => r.status === 202 }); } ``` ### Scenario F: Soak — `perf/k6/soak.js` ```javascript import { check } from "k6"; import { startRun, hop } from "./lib.js"; export const options = { scenarios: { soak: { executor: "constant-arrival-rate", rate: 200, timeUnit: "1s", duration: "60m", preAllocatedVUs: 500, maxVUs: 5000 } } }; export function setup() { return startRun(__ENV.GW_URL); } export default function (data) { const res = hop(__ENV.GW_URL, data.run_id); check(res, { "202": r => r.status === 202 }); } ``` ### Scenario D: Stress — `perf/k6/stress.js` ```javascript import { check } from "k6"; import { startRun, hop } from "./lib.js"; export const options = { scenarios: { stress: { executor: "constant-arrival-rate", rate: 1500, timeUnit: "1s", duration: "10m", preAllocatedVUs: 2000, maxVUs: 15000 } }, thresholds: { http_req_failed: ["rate<0.05"] } }; export function setup() { return startRun(__ENV.GW_URL); } export default function (data) { const res = hop(__ENV.GW_URL, data.run_id); check(res, { "202": r => r.status === 202 }); } ``` ### Scenario G: Scaling curve orchestration — `perf/k6/scaling_curve.sh` ```bash #!/usr/bin/env bash set -euo pipefail GW_URL="${GW_URL:-http://localhost:8080}" for n in 1 2 4 8; do echo "== Scaling workers to $n ==" docker compose -f ../docker-compose.yml up -d --scale worker="$n" mkdir -p "../artifacts/scale-$n" k6 run \ -e GW_URL="$GW_URL" \ --summary-export "../artifacts/scale-$n/k6-summary.json" \ ./capacity_ramp.js done ``` Run (examples): ```bash cd perf/k6 GW_URL=http://localhost:8080 k6 run --summary-export ../artifacts/smoke-summary.json smoke.js GW_URL=http://localhost:8080 k6 run --summary-export ../artifacts/ramp-summary.json capacity_ramp.js ``` --- ## 6) Offline analysis tool (reads “done” stream by run_id) ### `perf/tools/analyze.py` ```python import os, sys, json, math from datetime import datetime, timezone import redis def pct(values, p): if not values: return None values = sorted(values) k = (len(values) - 1) * (p / 100.0) f = math.floor(k); c = math.ceil(k) if f == c: return values[int(k)] return values[f] * (c - k) + values[c] * (k - f) def main(): valkey = os.getenv("VALKEY_ENDPOINT", "localhost:6379") host, port = valkey.split(":") r = redis.Redis(host=host, port=int(port), decode_responses=True) run_id = os.getenv("RUN_ID") if not run_id: print("Set RUN_ID env var (from /test/start response).", file=sys.stderr) sys.exit(2) done_stream = os.getenv("DONE_STREAM", "stella:perf:done") # Read all entries (sample scale). For big runs use XREAD with cursor. 
entries = r.xrange(done_stream, min='-', max='+', count=200000) hop_ms = [] queue_ms = [] service_ms = [] matched = 0 for entry_id, fields in entries: if fields.get("run") != run_id: continue matched += 1 enq_ns = int(fields.get("enq_ns", "0")) claim_ns = int(fields.get("claim_ns", "0")) done_ns = int(fields.get("done_ns", "0")) if enq_ns > 0 and claim_ns > 0: queue_ms.append((claim_ns - enq_ns) / 1_000_000.0) if claim_ns > 0 and done_ns > 0: service_ms.append((done_ns - claim_ns) / 1_000_000.0) if enq_ns > 0 and done_ns > 0: hop_ms.append((done_ns - enq_ns) / 1_000_000.0) summary = { "run_id": run_id, "done_stream": done_stream, "matched_jobs": matched, "hop_ms": { "p50": pct(hop_ms, 50), "p95": pct(hop_ms, 95), "p99": pct(hop_ms, 99) }, "queue_ms": { "p50": pct(queue_ms, 50), "p95": pct(queue_ms, 95), "p99": pct(queue_ms, 99) }, "service_ms": { "p50": pct(service_ms, 50), "p95": pct(service_ms, 95), "p99": pct(service_ms, 99) }, "generated_at": datetime.now(timezone.utc).isoformat() } print(json.dumps(summary, indent=2)) if __name__ == "__main__": main() ``` Run: ```bash pip install redis RUN_ID= python perf/tools/analyze.py ``` This yields the **key percentiles** for `hop`, `queue_delay`, and `service` from the authoritative worker-side timestamps. --- ## 7) What this sample already covers * **Correlation**: `x-stella-corr-id` end-to-end. * **Run isolation**: `x-stella-run-id` created via `/test/start`, used to filter results. * **Valkey Streams + consumer group**: fair claim semantics. * **Required timestamps**: `enq_ns`, `claim_ns`, `done_ns`. * **Metrics**: * `gateway_enqueue_seconds` histogram * `valkey_enqueue_rtt_seconds` histogram * `worker_service_seconds`, `queue_delay_seconds`, `hop_latency_seconds` histograms * **Scenarios**: * A Smoke, C Capacity ramp, D Stress, E Burst, F Soak * G Scaling curve via script (repeat ramp across worker counts) --- ## 8) Immediate next hardening steps (still “small”) 1. **Add queue depth / lag gauges**: in worker or gateway poll `XLEN stella:perf:jobs` and export as a gauge metric in Prometheus format. 2. **Drain-time measurement**: implement `/test/end/{runId}` that waits until “matched jobs stop increasing” + queue depth returns baseline, and records a final metric. 3. **Stage slicing** (per plateau stats): extend `analyze.py` to accept your k6 stage plan and compute p95 per stage window (based on start_ns). If you want, I can extend the sample with (1) queue-depth export and (2) per-plateau slicing in `analyze.py` without adding any new dependencies.
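For (1), here is a small head start: a minimal sketch of a queue-depth probe for the worker (hypothetical `QueueDepthProbe`; assumes the same StackExchange.Redis setup the worker already uses):

```csharp
using Microsoft.Extensions.Hosting;
using StackExchange.Redis;

// Polls XLEN on the jobs stream once per second and keeps the latest value,
// ready to be appended to the /metrics output as a "queue_depth" gauge line.
public sealed class QueueDepthProbe : BackgroundService
{
    private readonly IDatabase _db;
    private long _lastDepth;

    public QueueDepthProbe(IConnectionMultiplexer mux) => _db = mux.GetDatabase();

    public long LastDepth => Interlocked.Read(ref _lastDepth);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var len = (long)await _db.ExecuteAsync("XLEN", "stella:perf:jobs");
            Interlocked.Exchange(ref _lastDepth, len);
            await Task.Delay(TimeSpan.FromSeconds(1), stoppingToken);
        }
    }
}
```

Register it with `builder.Services.AddHostedService<QueueDepthProbe>();` and have the metrics endpoint emit `queue_depth <LastDepth>` alongside the existing histograms.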