Modeling StellaRouter Performance Curves

Here's a compact, ready-to-use playbook to measure and plot performance envelopes for an HTTP → Valkey → Worker hop under variable concurrency, so you can tune autoscaling and predict user-visible latency spikes.


What we're measuring (plain English)

  • TTFB/TTFS (HTTP): time the gateway spends accepting the request + queuing the job.
  • Valkey latency: enqueue (LPUSH/XADD), pop/claim (BRPOP/XREADGROUP), and round-trip.
  • Worker service time: time to pick up, process, and ack.
  • Queueing delay: time spent waiting in the queue (arrival → start of worker).

These four add up to the “hop latency” users feel when the system is under load.


Minimal tracing you can add today

Emit these IDs/headers end-to-end:

  • x-stella-corr-id (uuid)
  • x-stella-enq-ts (gateway enqueue ts, ns)
  • x-stella-claim-ts (worker claim ts, ns)
  • x-stella-done-ts (worker done ts, ns)

From these, compute:

  • queue_delay = claim_ts - enq_ts
  • service_time = done_ts - claim_ts
  • http_ttfs = gateway_first_byte_ts - http_request_start_ts
  • hop_latency = done_ts - enq_ts (or the return-path completion time if synchronous)

Clock-sync tip: use monotonic clocks in code and convert to ns; don't mix wall-clock and monotonic timestamps.


Valkey commands (safe, BSD Valkey)

Use Valkey Streams + Consumer Groups for fairness and metrics:

  • Enqueue: XADD jobs * corr-id <uuid> enq-ts <ns> payload <...>
  • Claim: XREADGROUP GROUP workers w1 COUNT 1 BLOCK 1000 STREAMS jobs >
  • Ack: XACK jobs workers <id>

Add a small Lua script to make enqueue timestamping atomic:

-- KEYS[1]=stream
-- ARGV[1]=enq_ts_ns, ARGV[2]=corr_id, ARGV[3]=payload
return redis.call('XADD', KEYS[1], '*',
  'corr', ARGV[2], 'enq', ARGV[1], 'p', ARGV[3])
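
If the gateway is .NET, the same script can be invoked through StackExchange.Redis (MIT-licensed and Valkey-compatible). A minimal sketch, assuming a local Valkey at localhost:6379 and a stream named jobs; the values are illustrative:

using StackExchange.Redis;

// Sketch: invoke the enqueue Lua above from C# (evaluated server-side).
const string EnqueueLua = @"
return redis.call('XADD', KEYS[1], '*',
  'corr', ARGV[2], 'enq', ARGV[1], 'p', ARGV[3])";

var mux = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
var db = mux.GetDatabase();

long enqTsNs = (DateTime.UtcNow - DateTime.UnixEpoch).Ticks * 100L; // 1 tick = 100 ns
var entryId = await db.ScriptEvaluateAsync(
    EnqueueLua,
    keys: new RedisKey[] { "jobs" },
    values: new RedisValue[] { enqTsNs, Guid.NewGuid().ToString(), "{\"data\":\"ping\"}" });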

Load shapes to test (find the envelope)

  1. Open-loop (arrival-rate controlled): 50 → 10k req/min in steps; constant rate per step. Reveals queueing onset.
  2. Burst: 0 → N in short spikes (e.g., 5k in 10s) to see saturation and drain time.
  3. Step-up/down: double every 2 min until SLO breach; then halve back down.
  4. Long-tail soak: run at 70–80% of max for 1h; watch p95–p99.9 drift.

Target outputs per step: p50/p90/p95/p99 for queue_delay, service_time, hop_latency, plus throughput and error rate.


k6 script (HTTP client pressure)

// save as hop-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  scenarios: {
    step_load: {
      executor: 'ramping-arrival-rate',
      startRate: 20, timeUnit: '1s',
      preAllocatedVUs: 200, maxVUs: 5000,
      stages: [
        { target: 50, duration: '1m' },
        { target: 100, duration: '1m' },
        { target: 200, duration: '1m' },
        { target: 400, duration: '1m' },
        { target: 800, duration: '1m' },
      ],
    },
  },
  thresholds: {
    'http_req_failed': ['rate<0.01'],
    'http_req_duration{phase:hop}': ['p(95)<500'],
  },
};

export default function () {
  // crypto.randomUUID() needs a recent k6 release; older versions can use
  // uuidv4 from https://jslib.k6.io/k6-utils/ instead.
  const corr = crypto.randomUUID();
  const res = http.post(
    __ENV.GW_URL,
    JSON.stringify({ data: 'ping', corr }),
    {
      headers: { 'Content-Type': 'application/json', 'x-stella-corr-id': corr },
      tags: { phase: 'hop' },
    }
  );
  check(res, { 'status 2xx/202': r => r.status === 200 || r.status === 202 });
  sleep(0.01);
}

Run: GW_URL=https://gateway.example/hop k6 run hop-test.js


Worker hooks (.NET 10 sketch)

// At claim
var now = Stopwatch.GetTimestamp(); // monotonic ticks (host-local)
var claimNs = now.ToNanoseconds();
log.AddTag("x-stella-claim-ts", claimNs); // log.AddTag is a placeholder for your logging API

// After processing
var doneNs = Stopwatch.GetTimestamp().ToNanoseconds();
log.AddTag("x-stella-done-ts", doneNs);
// Include corr-id and stream entry id in logs/metrics.
// Note: Stopwatch-based ns are only comparable within the same host/process.

Helper:

public static class MonoTime {
  static readonly double _nsPerTick = 1_000_000_000d / Stopwatch.Frequency;
  public static long ToNanoseconds(this long ticks) => (long)(ticks * _nsPerTick);
}

Prometheus metrics to expose

  • valkey_enqueue_ns (histogram)
  • queue_delay_ns (histogram, derived from claim_ts - enq_ts)
  • valkey_claim_block_ms (gauge)
  • worker_service_ns (histogram, labels: worker_type, route)
  • queue_depth (gauge via XLEN or XINFO STREAM)
  • enqueue_rate, dequeue_rate (counters)

Example recording rules:

- record: hop:queue_delay_p95
  expr: histogram_quantile(0.95, sum(rate(queue_delay_ns_bucket[1m])) by (le))
- record: hop:service_time_p95
  expr: histogram_quantile(0.95, sum(rate(worker_service_ns_bucket[1m])) by (le))
- record: hop:latency_budget_p95
  expr: hop:queue_delay_p95 + hop:service_time_p95

Note: summing two p95s gives a budget-style upper estimate, not the true p95 of the sum; treat hop:latency_budget_p95 as a planning aid.

Autoscaling signals (HPA/KEDA friendly)

  • Primary: queue depth & its derivative (d/dt).
  • Secondary: p95 queue_delay and worker CPU.
  • Safety: max inflight per worker; backpressure HTTP 429 when queue_depth > D or p95_queue_delay > SLO*0.8.
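
A minimal backpressure sketch for the gateway side, assuming StackExchange.Redis and a minimal-API gateway; the stream name and the depth limit D are placeholders to be tuned from your measured envelope:

// Sketch: shed load with 429 before the queue grows without bound.
app.MapPost("/hop", async (HttpContext ctx, IConnectionMultiplexer mux) =>
{
    const long maxQueueDepth = 10_000; // "D": pick from your envelope, not this number
    var db = mux.GetDatabase();
    var depth = (long)await db.ExecuteAsync("XLEN", "jobs");
    if (depth > maxQueueDepth) // the p95_queue_delay trigger would come from your metrics pipeline
    {
        ctx.Response.Headers["Retry-After"] = "1"; // hint clients to back off
        return Results.StatusCode(StatusCodes.Status429TooManyRequests);
    }
    // ... enqueue as usual, then:
    return Results.Accepted();
});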

Plot the “envelope” (what you'll look at)

  • X-axis: offered load (req/s).
  • Y-axis: p95 hop latency (ms).
  • Overlay: p99 (dashed), SLO line (e.g., 500ms), and capacity knee (where p95 sharply rises).
  • Add secondary panel: queue depth vs load.

If you want, I can generate a ready-made notebook that ingests your logs/metrics CSV and outputs these plots. Below is a set of implementation guidelines your agents can follow to build a repeatable performance test system for the HTTP → Valkey → Worker pipeline. It's written as a “spec + runbook” with clear MUST/SHOULD requirements and concrete scenario definitions.


Performance Test Guidelines

HTTP → Valkey → Worker pipeline

1) Objectives and scope

Primary objectives

Your performance tests MUST answer these questions with evidence:

  1. Capacity knee: At what offered load does queue delay start growing sharply?

  2. User-impact envelope: What are p50/p95/p99 hop latency curves vs offered load?

  3. Decomposition: How much of hop latency is:

    • gateway enqueue time
    • Valkey enqueue/claim RTT
    • queue wait time
    • worker service time
  4. Scaling behavior: How do these change with worker replica counts (N workers)?

  5. Stability: Under sustained load, do latencies drift (GC, memory, fragmentation, background jobs)?

Non-goals (explicitly out of scope unless you add them later)

  • Micro-optimizing single function runtime
  • Synthetic “max QPS” records without a representative payload
  • Tests that don't collect segment metrics (end-to-end only) for anything beyond basic smoke

2) Definitions and required metrics

Required latency definitions (standardize these names)

Agents MUST compute and report these per request/job:

  • t_http_accept: time from client send → gateway accepts request

  • t_enqueue: time spent in gateway to enqueue into Valkey (server-side)

  • t_valkey_rtt_enq: client-observed RTT for enqueue command(s)

  • t_queue_delay: claim_ts - enq_ts

  • t_service: done_ts - claim_ts

  • t_hop: done_ts - enq_ts (this is the “true pipeline hop” latency)

  • Optional but recommended:

    • t_ack: time to ack completion (Valkey ack RTT)
    • t_http_response: request start → gateway response sent (TTFB/TTFS)

Required percentiles and aggregations

Per scenario step (e.g., each offered load plateau), agents MUST output:

  • p50 / p90 / p95 / p99 / p99.9 for: t_hop, t_queue_delay, t_service, t_enqueue
  • Throughput: offered rps and achieved rps
  • Error rate: HTTP failures, enqueue failures, worker failures
  • Queue depth and backlog drain time

Required system-level telemetry (minimum)

Agents MUST collect these time series during tests:

  • Worker: CPU, memory, GC pauses (if .NET), threadpool saturation indicators
  • Valkey: ops/sec, connected clients, blocked clients, memory used, evictions, slowlog count
  • Gateway: CPU/mem, request rate, response codes, request duration histogram

3) Environment and test hygiene requirements

Environment requirements

Agents SHOULD run tests in an environment that matches production in:

  • container CPU/memory limits
  • number of nodes, network topology
  • Valkey topology (single, cluster, sentinel, etc.)
  • worker replica autoscaling rules (or deliberately disabled)

If exact parity isn't possible, agents MUST record all known differences in the report.

Test hygiene (non-negotiable)

Agents MUST:

  1. Start from empty queues (no backlog).
  2. Disable client retries (or explicitly run two variants: retries off / retries on).
  3. Warm up before measuring (e.g., 60s warm-up minimum).
  4. Hold steady plateaus long enough to stabilize (usually 2–5 minutes per step).
  5. Cool down and verify backlog drains (queue depth returns to baseline).
  6. Record exact versions/SHAs of gateway/worker and Valkey config.

Load generator hygiene

Agents MUST ensure the load generator is not the bottleneck:

  • CPU < ~70% during test
  • no local socket exhaustion
  • enough VUs/connections
  • if needed, distributed load generation

4) Instrumentation spec (agents implement this first)

Correlation and timestamps

Agents MUST propagate an end-to-end correlation ID and timestamps.

Required fields

  • corr_id (UUID)
  • enq_ts_ns (set at enqueue, monotonic or consistent clock)
  • claim_ts_ns (set by worker when job is claimed)
  • done_ts_ns (set by worker when job processing ends)

Where these live

  • HTTP request header: x-corr-id: <uuid>
  • Valkey job payload fields: corr, enq, and optionally payload size/type
  • Worker logs/metrics: include corr_id, job id, claim_ts_ns, done_ts_ns

Clock requirements

Agents MUST use a consistent timing source:

  • Prefer monotonic timers for durations (Stopwatch / monotonic clock)

  • If timestamps cross machines, ensure they're comparable:

    • either rely on synchronized clocks (NTP) and monitor drift
    • or compute durations using monotonic tick deltas within the same host and transmit durations (less ideal for queue delay)

Practical recommendation: use wall-clock ns for cross-host timestamps with NTP + drift checks, and also record per-host monotonic durations for sanity.
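
A sketch of that pattern in C# (names are illustrative; the wall-clock values go into cross-host fields, the Stopwatch delta stays host-local):

using System.Diagnostics;

// Capture both clock families at a boundary event (e.g., job claim).
long claimWallNs = (DateTime.UtcNow - DateTime.UnixEpoch).Ticks * 100L; // NTP-disciplined, cross-host
long monoStart   = Stopwatch.GetTimestamp();                            // monotonic, this host only

// ... process the job ...

double serviceSec = (Stopwatch.GetTimestamp() - monoStart) / (double)Stopwatch.Frequency;
long doneWallNs   = (DateTime.UtcNow - DateTime.UnixEpoch).Ticks * 100L;
// Emit claimWallNs/doneWallNs for cross-host queue-delay math,
// and serviceSec for the local service-time histogram (drift sanity check).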

Queue semantics (Valkey Streams)

Agents SHOULD use Streams + Consumer Groups for stable claim semantics and good observability:

  • Enqueue: XADD jobs * corr <uuid> enq <ns> payload <...>
  • Claim: XREADGROUP GROUP workers <consumer> COUNT 1 BLOCK 1000 STREAMS jobs >
  • Ack: XACK jobs workers <id>

Agents MUST record stream length (XLEN) or consumer group lag (XINFO GROUPS) as queue depth/lag.
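
A minimal depth-poller sketch (StackExchange.Redis; the stream name and the setGauge callback are assumptions, not part of a specific metrics library):

// Sketch: poll queue depth once per second and hand it to your gauge.
async Task PollDepthAsync(IDatabase db, Action<long> setGauge, CancellationToken ct)
{
    while (!ct.IsCancellationRequested)
    {
        var depth = (long)await db.ExecuteAsync("XLEN", "stella:perf:jobs");
        setGauge(depth); // plug into your metrics registry
        // Consumer-group lag is also reported by XINFO GROUPS <stream>
        // (the "lag" field, on servers that expose it).
        await Task.Delay(TimeSpan.FromSeconds(1), ct);
    }
}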

Metrics exposure

Agents MUST publish Prometheus (or equivalent) histograms:

  • gateway_enqueue_seconds (or ns) histogram
  • valkey_enqueue_rtt_seconds histogram
  • worker_service_seconds histogram
  • queue_delay_seconds histogram (derived from timestamps; can be computed in worker or offline)
  • hop_latency_seconds histogram

5) Workload modeling and test data

Agents MUST define a workload model before running capacity tests:

  1. Endpoint(s): list exact gateway routes under test
  2. Payload types: small/typical/large
  3. Mix: e.g., 70/25/5 by payload size
  4. Idempotency rules: ensure repeated jobs dont corrupt state
  5. Data reset strategy: how test data is cleaned or isolated per run

Agents SHOULD test at least:

  • Typical payload (p50)
  • Large payload (p95)
  • Worst-case allowed payload (bounded by your API limits)

6) Scenario suite your agents MUST implement

Each scenario MUST be defined as code/config (not manual).

Scenario A — Smoke (fast sanity)

Goal: verify instrumentation + basic correctness
Load: low (e.g., 1–5 rps), 2 minutes
Pass:

  • 0 backlog after run
  • error rate < 0.1%
  • metrics present for all segments

Scenario B — Baseline (repeatable reference point)

Goal: establish a stable baseline for regression tracking
Load: fixed moderate load (e.g., 30–50% of expected capacity), 10 minutes
Pass:

  • p95 t_hop within baseline ± tolerance (set after first runs)
  • no upward drift in p95 across time (trend line ~flat)

Scenario C — Capacity ramp (open-loop)

Goal: find the knee where queueing begins
Method: open-loop arrival-rate ramp with plateaus
Example stages (edit to fit your system):

  • 50 rps for 2m
  • 100 rps for 2m
  • 200 rps for 2m
  • 400 rps for 2m
  • … until SLO breach or errors spike

MUST:

  • warm-up stage before first plateau
  • record per-plateau summary

Stop conditions (any triggers stop):

  • error rate > 1%
  • queue depth grows without bound over an entire plateau
  • p95 t_hop exceeds SLO for 2 consecutive plateaus

Scenario D — Stress (push past capacity)

Goal: characterize failure mode and recovery
Load: 120–200% of knee load, 5–10 minutes
Pass (for resilience):

  • system does not crash permanently
  • once load stops, backlog drains within target time (define it)

Scenario E — Burst / spike

Goal: see how quickly the queue grows and drains
Load shape:

  • baseline low load
  • sudden burst (e.g., 10× for 10–30s)
  • return to baseline

Report:

  • peak queue depth
  • time to drain to baseline
  • p99 t_hop during burst

Scenario F — Soak (long-running)

Goal: detect drift (leaks, fragmentation, GC patterns)
Load: 70–85% of knee, 60–180 minutes
Pass:

  • p95 does not trend upward beyond threshold
  • memory remains bounded
  • no rising error rate

Scenario G — Scaling curve (worker replica sweep)

Goal: turn results into scaling rules
Method:

  • Repeat Scenario C with worker replicas = 1, 2, 4, 8…

Deliverable:

  • plot of knee load vs worker count
  • p95 t_service vs worker count (should remain similar; queue delay should drop)

7) Execution protocol (runbook)

Agents MUST run every scenario using the same disciplined flow:

Pre-run checklist

  • confirm system versions/SHAs

  • confirm autoscaling mode:

    • Off for baseline capacity characterization
    • On for validating autoscaling policies
  • clear queues and consumer group pending entries

  • restart or at least record “time since deploy” for services (cold vs warm)

During run

  • ensure load is truly open-loop when required (arrival-rate based)

  • continuously record:

    • offered vs achieved rate
    • queue depth
    • CPU/mem for gateway/worker/Valkey

Post-run

  • stop load

  • wait until backlog drains (or record that it doesn't)

  • export:

    • k6/runner raw output
    • Prometheus time series snapshot
    • sampled logs with corr_id fields
  • generate a summary report automatically (no hand calculations)


8) Analysis rules (how agents compute “the envelope”)

Agents MUST generate at minimum two plots per run:

  1. Latency envelope: offered load (x-axis) vs p95 t_hop (y-axis)

    • overlay p99 (and SLO line)
  2. Queue behavior: offered load vs queue depth (or lag), plus drain time

How to identify the “knee”

Agents SHOULD mark the knee as the first plateau where:

  • queue depth grows monotonically within the plateau, or
  • p95 t_queue_delay increases by > X% step-to-step (e.g., 50–100%)

Convert results into scaling guidance

Agents SHOULD compute:

  • capacity_per_worker ≈ 1 / mean(t_service) (jobs/sec per worker)

  • recommended replicas for offered load λ at target utilization U:

    • workers_needed = ceil(λ * mean(t_service) / U)
    • choose U ~ 0.6–0.75 for headroom

This should be reported alongside the measured envelope.
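
Worked example (illustrative numbers, not measurements): with mean(t_service) = 25 ms, an offered load of λ = 400 jobs/s, and U = 0.7, workers_needed = ceil(400 × 0.025 / 0.7) = ceil(14.3) = 15 replicas.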


9) Pass/fail criteria and regression gates

Agents MUST define gates in configuration, not in someone's head.

Suggested gating structure:

  • Smoke gate: error rate < 0.1%, backlog drains
  • Baseline gate: p95 t_hop regression < 10% (tune after you have history)
  • Capacity gate: knee load regression < 10% (optional but very valuable)
  • Soak gate: p95 drift over time < 15% and no memory runaway
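
A sketch of a config-driven gate check in C# (System.Text.Json; the summary and gate field names are assumptions chosen to echo the YAML template later in this document):

using System.Text.Json;

// Sketch: fail the pipeline when the run summary breaches configured gates.
// Assumes summary.json like { "error_rate": 0.002, "p95_hop_ms": 420 }
// and gates.json like      { "max_error_rate": 0.01, "slo_ms_p95_hop": 500 }.
var summary = JsonDocument.Parse(File.ReadAllText("summary.json")).RootElement;
var gates   = JsonDocument.Parse(File.ReadAllText("gates.json")).RootElement;

bool pass =
    summary.GetProperty("error_rate").GetDouble() <= gates.GetProperty("max_error_rate").GetDouble()
    && summary.GetProperty("p95_hop_ms").GetDouble() <= gates.GetProperty("slo_ms_p95_hop").GetDouble();

Console.WriteLine(pass ? "GATES: PASS" : "GATES: FAIL");
Environment.Exit(pass ? 0 : 1); // non-zero exit fails the CI step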

10) Common pitfalls (agents must avoid)

  1. Closed-loop tests used for capacity. Closed-loop (“N concurrent users”) self-throttles and can hide queueing onset. Use open-loop arrival rate for capacity.

  2. Ignoring queue depth. A system can look “healthy” in request latency while silently building backlog.

  3. Measuring only gateway latency. You must measure enqueue → claim → done to see the real hop.

  4. Load generator bottleneck. If the generator saturates, you'll under-estimate capacity.

  5. Retries enabled by default. Retries can inflate load and hide root causes; run with retries off first.

  6. Not controlling warm vs cold. Cold caches vs warmed services produce different envelopes; record the condition.


Agent implementation checklist (deliverables)

Assign these as concrete tasks to your agents.

Agent 1 — Observability & tracing

MUST deliver:

  • correlation id propagation gateway → Valkey → worker
  • timestamps enq/claim/done
  • Prometheus histograms for enqueue, service, hop
  • queue depth metric (XLEN / XINFO lag)

Agent 2 — Load test harness

MUST deliver:

  • test runner scripts (k6 or equivalent) for scenarios A–G

  • test config file (YAML/JSON) controlling:

    • stages (rates/durations)
    • payload mix
    • headers (corr-id)
  • reproducible seeds and version stamping

Agent 3 — Result collector and analyzer

MUST deliver:

  • a pipeline that merges:

    • load generator output
    • hop timing data (from logs or a completion stream)
    • Prometheus snapshots
  • automatic summary + plots:

    • latency envelope
    • queue depth/drain
  • CSV/JSON exports for long-term tracking

Agent 4 — Reporting and dashboards

MUST deliver:

  • a standard report template that includes:

    • environment details
    • scenario details
    • key charts
    • knee estimate
    • scaling recommendation
  • Grafana dashboard with the required panels

Agent 5 — CI / release integration

SHOULD deliver:

  • PR-level smoke test (Scenario A)
  • nightly baseline (Scenario B)
  • weekly capacity sweep (Scenario C + scaling curve)

Template: scenario spec (agents can copy/paste)

test_run:
  system_under_test:
    gateway_sha: "<git sha>"
    worker_sha: "<git sha>"
    valkey_version: "<version>"
  environment:
    cluster: "<name>"
    workers: 4
    autoscaling: "off"   # off|on
  workload:
    endpoint: "/hop"
    payload_profile: "p50"
    mix:
      p50: 0.7
      p95: 0.25
      max: 0.05
  scenario:
    name: "capacity_ramp"
    mode: "open_loop"
    warmup_seconds: 60
    stages:
      - rps: 50
        duration_seconds: 120
      - rps: 100
        duration_seconds: 120
      - rps: 200
        duration_seconds: 120
      - rps: 400
        duration_seconds: 120
  gates:
    max_error_rate: 0.01
    slo_ms_p95_hop: 500
    backlog_must_drain_seconds: 300
  outputs:
    artifacts_dir: "./artifacts/<timestamp>/"

If you want, I can also provide a single “golden” folder structure (tests/ scripts/ dashboards/ analysis/) and a “definition of done” checklist that matches how your repo is organized—but the above is already sufficient for agents to start implementing immediately. Below is a sample / partial implementation that gives full functional coverage of your performance-test requirements (instrumentation, correlation, timestamps, queue semantics, scenarios A–G, artifact export, and analysis). It is intentionally minimal and “swap-in-real-code” friendly.

You can copy these files into a perf/ folder in your repo, build, and run locally with Docker Compose.


1) Suggested folder layout

perf/
  docker-compose.yml
  prometheus/
    prometheus.yml
  k6/
    lib.js
    smoke.js
    capacity_ramp.js
    burst.js
    soak.js
    stress.js
    scaling_curve.sh
  tools/
    analyze.py
  src/
    Perf.Gateway/
      Perf.Gateway.csproj
      Program.cs
      Metrics.cs
      ValkeyStreams.cs
      TimeNs.cs
    Perf.Worker/
      Perf.Worker.csproj
      Program.cs
      WorkerService.cs
      Metrics.cs
      ValkeyStreams.cs
      TimeNs.cs

2) Gateway sample (.NET 10, Minimal API)

perf/src/Perf.Gateway/Perf.Gateway.csproj

<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>net10.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>

  <ItemGroup>
    <!-- StackExchange.Redis is MIT and works with Valkey -->
    <PackageReference Include="StackExchange.Redis" Version="2.8.16" />
  </ItemGroup>
</Project>

perf/src/Perf.Gateway/TimeNs.cs

namespace Perf.Gateway;

public static class TimeNs
{
    private static readonly long UnixEpochTicks = DateTime.UnixEpoch.Ticks; // 100ns units

    public static long UnixNowNs()
    {
        var ticks = DateTime.UtcNow.Ticks - UnixEpochTicks; // 100ns
        return ticks * 100L; // ns
    }
}

perf/src/Perf.Gateway/Metrics.cs

using System.Collections.Concurrent;
using System.Globalization;
using System.Text;

namespace Perf.Gateway;

public sealed class Metrics
{
    private readonly ConcurrentDictionary<string, long> _counters = new();

    // Simple fixed-bucket histograms in seconds (Prometheus histogram format)
    private readonly ConcurrentDictionary<string, Histogram> _h = new();

    public void Inc(string name, long by = 1) => _counters.AddOrUpdate(name, by, (_, v) => v + by);

    public Histogram Hist(string name, double[] bucketsSeconds) =>
        _h.GetOrAdd(name, _ => new Histogram(name, bucketsSeconds));

    public string ExportPrometheus()
    {
        var sb = new StringBuilder(16 * 1024);

        foreach (var (k, v) in _counters.OrderBy(kv => kv.Key))
        {
            sb.Append("# TYPE ").Append(k).Append(" counter\n");
            sb.Append(k).Append(' ').Append(v.ToString(CultureInfo.InvariantCulture)).Append('\n');
        }

        foreach (var hist in _h.Values.OrderBy(h => h.Name))
        {
            sb.Append(hist.Export());
        }

        return sb.ToString();
    }

    public sealed class Histogram
    {
        public string Name { get; }
        private readonly double[] _buckets; // sorted
        private readonly long[] _bucketCounts; // cumulative exposed later
        private long _count;
        private double _sum;

        private readonly object _lock = new();

        public Histogram(string name, double[] bucketsSeconds)
        {
            Name = name;
            _buckets = bucketsSeconds.OrderBy(x => x).ToArray();
            _bucketCounts = new long[_buckets.Length];
        }

        public void Observe(double seconds)
        {
            lock (_lock)
            {
                _count++;
                _sum += seconds;

                for (int i = 0; i < _buckets.Length; i++)
                {
                    if (seconds <= _buckets[i]) _bucketCounts[i]++;
                }
            }
        }

        public string Export()
        {
            // Prometheus hist buckets are cumulative; we already maintain that.
            var sb = new StringBuilder(2048);
            sb.Append("# TYPE ").Append(Name).Append(" histogram\n");

            lock (_lock)
            {
                for (int i = 0; i < _buckets.Length; i++)
                {
                    sb.Append(Name).Append("_bucket{le=\"")
                      .Append(_buckets[i].ToString("0.################", CultureInfo.InvariantCulture))
                      .Append("\"} ")
                      .Append(_bucketCounts[i].ToString(CultureInfo.InvariantCulture))
                      .Append('\n');
                }

                sb.Append(Name).Append("_bucket{le=\"+Inf\"} ")
                  .Append(_count.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');

                sb.Append(Name).Append("_sum ")
                  .Append(_sum.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');

                sb.Append(Name).Append("_count ")
                  .Append(_count.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');
            }

            return sb.ToString();
        }
    }
}

perf/src/Perf.Gateway/ValkeyStreams.cs

using StackExchange.Redis;

namespace Perf.Gateway;

public sealed class ValkeyStreams
{
    private readonly IDatabase _db;
    public ValkeyStreams(IConnectionMultiplexer mux) => _db = mux.GetDatabase();

    public async Task EnsureConsumerGroupAsync(string stream, string group)
    {
        try
        {
            // XGROUP CREATE <key> <groupname> $ MKSTREAM
            await _db.ExecuteAsync("XGROUP", "CREATE", stream, group, "$", "MKSTREAM");
        }
        catch (RedisServerException ex) when (ex.Message.Contains("BUSYGROUP", StringComparison.OrdinalIgnoreCase))
        {
            // ok
        }
    }

    public async Task<RedisResult> XAddAsync(string stream, NameValueEntry[] fields)
    {
        // XADD stream * field value field value ...
        // (StackExchange.Redis also has a typed StreamAddAsync; raw ExecuteAsync is
        // used here to mirror the Valkey commands one-to-one.)
        var args = new List<object>(2 + fields.Length * 2) { stream, "*" };
        foreach (var f in fields) { args.Add(f.Name); args.Add(f.Value); }
        return await _db.ExecuteAsync("XADD", args.ToArray());
    }
}

perf/src/Perf.Gateway/Program.cs

using Perf.Gateway;
using StackExchange.Redis;
using System.Diagnostics;

var builder = WebApplication.CreateBuilder(args);

var valkey = builder.Configuration["VALKEY_ENDPOINT"] ?? "valkey:6379";
builder.Services.AddSingleton<IConnectionMultiplexer>(_ => ConnectionMultiplexer.Connect(valkey));
builder.Services.AddSingleton<ValkeyStreams>();
builder.Services.AddSingleton<Metrics>();

var app = builder.Build();

var metrics = app.Services.GetRequiredService<Metrics>();
var streams = app.Services.GetRequiredService<ValkeyStreams>();

const string JobsStream = "stella:perf:jobs";
const string DoneStream = "stella:perf:done";
const string Group = "workers";

await streams.EnsureConsumerGroupAsync(JobsStream, Group);

var allowTestControl = (app.Configuration["ALLOW_TEST_CONTROL"] ?? "1") == "1";
var runs = new Dictionary<string, long>(StringComparer.Ordinal); // run_id -> start_ns

if (allowTestControl)
{
    app.MapPost("/test/start", () =>
    {
        var runId = Guid.NewGuid().ToString("N");
        var startNs = TimeNs.UnixNowNs();
        lock (runs) runs[runId] = startNs;

        metrics.Inc("perf_test_start_total");
        return Results.Ok(new { run_id = runId, start_ns = startNs, jobs_stream = JobsStream, done_stream = DoneStream });
    });

    app.MapPost("/test/end/{runId}", (string runId) =>
    {
        lock (runs) runs.Remove(runId);
        metrics.Inc("perf_test_end_total");
        return Results.Ok(new { run_id = runId });
    });
}

app.MapGet("/metrics", () => Results.Text(metrics.ExportPrometheus(), "text/plain; version=0.0.4"));

app.MapPost("/hop", async (HttpRequest req) =>
{
    // Correlation / run id
    var corr = req.Headers["x-stella-corr-id"].FirstOrDefault() ?? Guid.NewGuid().ToString();
    var runId = req.Headers["x-stella-run-id"].FirstOrDefault() ?? "no-run";

    // Enqueue timestamp (UTC-derived ns)
    var enqNs = TimeNs.UnixNowNs();

    // Read raw body (payload) - keep it simple for perf harness
    string payload;
    using (var sr = new StreamReader(req.Body))
        payload = await sr.ReadToEndAsync();

    var sw = Stopwatch.GetTimestamp();

    // Valkey enqueue
    var valkeySw = Stopwatch.GetTimestamp();
    var entryId = await streams.XAddAsync(JobsStream, new[]
    {
        new NameValueEntry("corr", corr),
        new NameValueEntry("run", runId),
        new NameValueEntry("enq_ns", enqNs),
        new NameValueEntry("payload", payload),
    });
    var valkeyRttSec = (Stopwatch.GetTimestamp() - valkeySw) / (double)Stopwatch.Frequency;

    var enqueueSec = (Stopwatch.GetTimestamp() - sw) / (double)Stopwatch.Frequency;

    metrics.Inc("hop_requests_total");
    metrics.Hist("gateway_enqueue_seconds", new[] { .001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2 }).Observe(enqueueSec);
    metrics.Hist("valkey_enqueue_rtt_seconds", new[] { .0005, .001, .002, .005, .01, .02, .05, .1, .2 }).Observe(valkeyRttSec);

    return Results.Accepted(value: new { corr, run_id = runId, enq_ns = enqNs, entry_id = entryId.ToString() });
});

app.Run("http://0.0.0.0:8080");

3) Worker sample (.NET 10 hosted service + metrics)

perf/src/Perf.Worker/Perf.Worker.csproj

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net10.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="StackExchange.Redis" Version="2.8.16" />
  </ItemGroup>
</Project>

perf/src/Perf.Worker/TimeNs.cs

namespace Perf.Worker;

public static class TimeNs
{
    private static readonly long UnixEpochTicks = DateTime.UnixEpoch.Ticks;
    public static long UnixNowNs() => (DateTime.UtcNow.Ticks - UnixEpochTicks) * 100L;
}

perf/src/Perf.Worker/Metrics.cs

// Same as gateway Metrics.cs (copy/paste). Keep identical for consistency.
using System.Collections.Concurrent;
using System.Globalization;
using System.Text;

namespace Perf.Worker;

public sealed class Metrics
{
    private readonly ConcurrentDictionary<string, long> _counters = new();
    private readonly ConcurrentDictionary<string, Histogram> _h = new();

    public void Inc(string name, long by = 1) => _counters.AddOrUpdate(name, by, (_, v) => v + by);
    public Histogram Hist(string name, double[] bucketsSeconds) =>
        _h.GetOrAdd(name, _ => new Histogram(name, bucketsSeconds));

    public string ExportPrometheus()
    {
        var sb = new StringBuilder(16 * 1024);

        foreach (var (k, v) in _counters.OrderBy(kv => kv.Key))
        {
            sb.Append("# TYPE ").Append(k).Append(" counter\n");
            sb.Append(k).Append(' ').Append(v.ToString(CultureInfo.InvariantCulture)).Append('\n');
        }

        foreach (var hist in _h.Values.OrderBy(h => h.Name))
            sb.Append(hist.Export());

        return sb.ToString();
    }

    public sealed class Histogram
    {
        public string Name { get; }
        private readonly double[] _buckets;
        private readonly long[] _bucketCounts;
        private long _count;
        private double _sum;
        private readonly object _lock = new();

        public Histogram(string name, double[] bucketsSeconds)
        {
            Name = name;
            _buckets = bucketsSeconds.OrderBy(x => x).ToArray();
            _bucketCounts = new long[_buckets.Length];
        }

        public void Observe(double seconds)
        {
            lock (_lock)
            {
                _count++;
                _sum += seconds;
                for (int i = 0; i < _buckets.Length; i++)
                    if (seconds <= _buckets[i]) _bucketCounts[i]++;
            }
        }

        public string Export()
        {
            var sb = new StringBuilder(2048);
            sb.Append("# TYPE ").Append(Name).Append(" histogram\n");
            lock (_lock)
            {
                for (int i = 0; i < _buckets.Length; i++)
                {
                    sb.Append(Name).Append("_bucket{le=\"")
                      .Append(_buckets[i].ToString("0.################", CultureInfo.InvariantCulture))
                      .Append("\"} ")
                      .Append(_bucketCounts[i].ToString(CultureInfo.InvariantCulture))
                      .Append('\n');
                }

                sb.Append(Name).Append("_bucket{le=\"+Inf\"} ")
                  .Append(_count.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');

                sb.Append(Name).Append("_sum ")
                  .Append(_sum.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');

                sb.Append(Name).Append("_count ")
                  .Append(_count.ToString(CultureInfo.InvariantCulture))
                  .Append('\n');
            }
            return sb.ToString();
        }
    }
}

perf/src/Perf.Worker/ValkeyStreams.cs

using StackExchange.Redis;

namespace Perf.Worker;

public sealed class ValkeyStreams
{
    private readonly IDatabase _db;
    public ValkeyStreams(IConnectionMultiplexer mux) => _db = mux.GetDatabase();

    public async Task EnsureConsumerGroupAsync(string stream, string group)
    {
        try
        {
            await _db.ExecuteAsync("XGROUP", "CREATE", stream, group, "$", "MKSTREAM");
        }
        catch (RedisServerException ex) when (ex.Message.Contains("BUSYGROUP", StringComparison.OrdinalIgnoreCase)) { }
    }

    public async Task<RedisResult> XReadGroupAsync(string group, string consumer, string stream, string id, int count, int blockMs)
        => await _db.ExecuteAsync("XREADGROUP", "GROUP", group, consumer, "COUNT", count, "BLOCK", blockMs, "STREAMS", stream, id);

    public async Task XAckAsync(string stream, string group, RedisValue id)
        => await _db.ExecuteAsync("XACK", stream, group, id);

    public async Task<RedisResult> XAddAsync(string stream, NameValueEntry[] fields)
    {
        var args = new List<object>(2 + fields.Length * 2) { stream, "*" };
        foreach (var f in fields) { args.Add(f.Name); args.Add(f.Value); }
        return await _db.ExecuteAsync("XADD", args.ToArray());
    }
}

perf/src/Perf.Worker/WorkerService.cs

using StackExchange.Redis;
using System.Diagnostics;

namespace Perf.Worker;

public sealed class WorkerService : BackgroundService
{
    private readonly ValkeyStreams _streams;
    private readonly Metrics _metrics;
    private readonly ILogger<WorkerService> _log;

    private const string JobsStream = "stella:perf:jobs";
    private const string DoneStream = "stella:perf:done";
    private const string Group = "workers";

    private readonly string _consumer;

    public WorkerService(ValkeyStreams streams, Metrics metrics, ILogger<WorkerService> log)
    {
        _streams = streams;
        _metrics = metrics;
        _log = log;
        _consumer = Environment.GetEnvironmentVariable("WORKER_CONSUMER") ?? $"w-{Environment.MachineName}-{Guid.NewGuid():N}";
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await _streams.EnsureConsumerGroupAsync(JobsStream, Group);

        var serviceBuckets = new[] { .001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5 };
        var queueBuckets = new[] { .001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5, 10, 30 };
        var hopBuckets   = new[] { .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5, 10, 30 };

        while (!stoppingToken.IsCancellationRequested)
        {
            RedisResult res;
            try
            {
                res = await _streams.XReadGroupAsync(Group, _consumer, JobsStream, ">", count: 1, blockMs: 1000);
            }
            catch (Exception ex)
            {
                _metrics.Inc("worker_xread_errors_total");
                _log.LogWarning(ex, "XREADGROUP failed");
                await Task.Delay(250, stoppingToken);
                continue;
            }

            if (res.IsNull) continue;

            // Parse XREADGROUP result (array -> stream -> entries)
            // Expected shape: [[stream, [[id, [field, value, field, value...]], ...]]]
            var outer = (RedisResult[])res!;
            foreach (var streamBlock in outer)
            {
                var sb = (RedisResult[])streamBlock!;
                var entries = (RedisResult[])sb[1]!;

                foreach (var entry in entries)
                {
                    var e = (RedisResult[])entry!;
                    var entryId = (RedisValue)e[0]!;
                    var fields = (RedisResult[])e[1]!;

                    string corr = "", run = "no-run";
                    long enqNs = 0;

                    for (int i = 0; i < fields.Length; i += 2)
                    {
                        var key = (string)fields[i]!;
                        var val = fields[i + 1].ToString();
                        if (key == "corr") corr = val;
                        else if (key == "run") run = val;
                        else if (key == "enq_ns") _ = long.TryParse(val, out enqNs);
                    }

                    var claimNs = TimeNs.UnixNowNs();

                    var sw = Stopwatch.GetTimestamp();

                    // Placeholder "service work"; replace with real processing.
                    // Keep it deterministic-ish; use an env var to model different service times.
                    var workMs = int.TryParse(Environment.GetEnvironmentVariable("WORK_MS"), out var ms) ? ms : 5;
                    await Task.Delay(workMs, stoppingToken);

                    var doneNs = TimeNs.UnixNowNs();
                    var serviceSec = (Stopwatch.GetTimestamp() - sw) / (double)Stopwatch.Frequency;

                    var queueDelaySec = enqNs > 0 ? (claimNs - enqNs) / 1_000_000_000d : double.NaN;
                    var hopSec = enqNs > 0 ? (doneNs - enqNs) / 1_000_000_000d : double.NaN;

                    // Ack then publish "done" record for offline analysis
                    await _streams.XAckAsync(JobsStream, Group, entryId);

                    await _streams.XAddAsync(DoneStream, new[]
                    {
                        new NameValueEntry("run", run),
                        new NameValueEntry("corr", corr),
                        new NameValueEntry("entry", entryId),
                        new NameValueEntry("enq_ns", enqNs),
                        new NameValueEntry("claim_ns", claimNs),
                        new NameValueEntry("done_ns", doneNs),
                        new NameValueEntry("work_ms", workMs),
                    });

                    _metrics.Inc("worker_jobs_total");
                    _metrics.Hist("worker_service_seconds", serviceBuckets).Observe(serviceSec);

                    if (!double.IsNaN(queueDelaySec))
                        _metrics.Hist("queue_delay_seconds", queueBuckets).Observe(queueDelaySec);

                    if (!double.IsNaN(hopSec))
                        _metrics.Hist("hop_latency_seconds", hopBuckets).Observe(hopSec);
                }
            }
        }
    }
}

perf/src/Perf.Worker/Program.cs

using Perf.Worker;
using StackExchange.Redis;

var builder = Host.CreateApplicationBuilder(args);

var valkey = builder.Configuration["VALKEY_ENDPOINT"] ?? "valkey:6379";
builder.Services.AddSingleton<IConnectionMultiplexer>(_ => ConnectionMultiplexer.Connect(valkey));
builder.Services.AddSingleton<ValkeyStreams>();
builder.Services.AddSingleton<Metrics>();
builder.Services.AddHostedService<WorkerService>();

// Minimal metrics endpoint (listen URL only; the /metrics route is mapped inside)
builder.Services.AddSingleton<IHostedService>(sp =>
{
    return new SimpleMetricsServer(
        sp.GetRequiredService<Metrics>(),
        url: "http://0.0.0.0:8081"
    );
});

var host = builder.Build();
await host.RunAsync();

// ---- minimal metrics server ----
file sealed class SimpleMetricsServer : BackgroundService
{
    private readonly Metrics _metrics;
    private readonly string _url;

    public SimpleMetricsServer(Metrics metrics, string url) { _metrics = metrics; _url = url; }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var builder = WebApplication.CreateBuilder();
        var app = builder.Build();
        app.MapGet("/metrics", () => Results.Text(_metrics.ExportPrometheus(), "text/plain; version=0.0.4"));
        stoppingToken.Register(() => _ = app.StopAsync()); // WebApplication.RunAsync(url) takes no token
        await app.RunAsync(_url);
    }
}

4) Docker Compose (Valkey + gateway + worker + Prometheus)

perf/docker-compose.yml

services:
  valkey:
    image: valkey/valkey:7.2
    ports: ["6379:6379"]

  gateway:
    build:
      context: ./src/Perf.Gateway
    environment:
      - VALKEY_ENDPOINT=valkey:6379
      - ALLOW_TEST_CONTROL=1
    ports: ["8080:8080"]
    depends_on: [valkey]

  worker:
    build:
      context: ./src/Perf.Worker
    environment:
      - VALKEY_ENDPOINT=valkey:6379
      - WORK_MS=5
    # No fixed host port: Prometheus scrapes worker:8081 over the compose network,
    # and publishing 8081 on the host would break `--scale worker=N` (Scenario G).
    expose: ["8081"]
    depends_on: [valkey]

  prometheus:
    image: prom/prometheus:v2.55.0
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports: ["9090:9090"]
    depends_on: [gateway, worker]

perf/prometheus/prometheus.yml

global:
  scrape_interval: 5s

scrape_configs:
  - job_name: gateway
    static_configs:
      - targets: ["gateway:8080"]

  - job_name: worker
    static_configs:
      - targets: ["worker:8081"]

Run (each service folder is assumed to contain a suitable Dockerfile; they are not shown here):

cd perf
docker compose up -d --build

5) k6 scenarios AG (open-loop where required)

perf/k6/lib.js

import http from "k6/http";

export function startRun(baseUrl) {
  const res = http.post(`${baseUrl}/test/start`, null, { tags: { phase: "control" } });
  if (res.status !== 200) throw new Error(`startRun failed: ${res.status} ${res.body}`);
  return res.json();
}

export function hop(baseUrl, runId) {
  // crypto.randomUUID() needs a recent k6 release; older versions can use
  // uuidv4 from https://jslib.k6.io/k6-utils/ instead.
  const corr = crypto.randomUUID();
  const payload = JSON.stringify({ corr, data: "ping" });

  return http.post(
    `${baseUrl}/hop`,
    payload,
    {
      headers: {
        "content-type": "application/json",
        "x-stella-run-id": runId,
        "x-stella-corr-id": corr
      },
      tags: { phase: "hop" }
    }
  );
}

Scenario A: Smoke — perf/k6/smoke.js

import { check, sleep } from "k6";
import { startRun, hop } from "./lib.js";

export const options = {
  scenarios: {
    smoke: {
      executor: "constant-arrival-rate",
      rate: 2,
      timeUnit: "1s",
      duration: "2m",
      preAllocatedVUs: 20,
      maxVUs: 200
    }
  },
  thresholds: {
    http_req_failed: ["rate<0.001"]
  }
};

export function setup() {
  return startRun(__ENV.GW_URL);
}

export default function (data) {
  const res = hop(__ENV.GW_URL, data.run_id);
  check(res, { "202 accepted": r => r.status === 202 });
  sleep(0.01);
}

Scenario C: Capacity ramp (open-loop) — perf/k6/capacity_ramp.js

import { check } from "k6";
import { startRun, hop } from "./lib.js";

export const options = {
  scenarios: {
    ramp: {
      executor: "ramping-arrival-rate",
      startRate: 50,
      timeUnit: "1s",
      preAllocatedVUs: 200,
      maxVUs: 5000,
      stages: [
        { target: 50, duration: "2m" },
        { target: 100, duration: "2m" },
        { target: 200, duration: "2m" },
        { target: 400, duration: "2m" },
        { target: 800, duration: "2m" }
      ]
    }
  },
  thresholds: {
    http_req_failed: ["rate<0.01"]
  }
};

export function setup() {
  return startRun(__ENV.GW_URL);
}

export default function (data) {
  const res = hop(__ENV.GW_URL, data.run_id);
  check(res, { "202 accepted": r => r.status === 202 });
}

Scenario E: Burst — perf/k6/burst.js

import { check } from "k6";
import { startRun, hop } from "./lib.js";

export const options = {
  scenarios: {
    burst: {
      executor: "ramping-arrival-rate",
      startRate: 20,
      timeUnit: "1s",
      preAllocatedVUs: 200,
      maxVUs: 5000,
      stages: [
        { target: 20, duration: "60s" },
        { target: 400, duration: "20s" },
        { target: 20, duration: "120s" }
      ]
    }
  }
};

export function setup() { return startRun(__ENV.GW_URL); }

export default function (data) {
  const res = hop(__ENV.GW_URL, data.run_id);
  check(res, { "202": r => r.status === 202 });
}

Scenario F: Soak — perf/k6/soak.js

import { check } from "k6";
import { startRun, hop } from "./lib.js";

export const options = {
  scenarios: {
    soak: {
      executor: "constant-arrival-rate",
      rate: 200,
      timeUnit: "1s",
      duration: "60m",
      preAllocatedVUs: 500,
      maxVUs: 5000
    }
  }
};

export function setup() { return startRun(__ENV.GW_URL); }

export default function (data) {
  const res = hop(__ENV.GW_URL, data.run_id);
  check(res, { "202": r => r.status === 202 });
}

Scenario D: Stress — perf/k6/stress.js

import { check } from "k6";
import { startRun, hop } from "./lib.js";

export const options = {
  scenarios: {
    stress: {
      executor: "constant-arrival-rate",
      rate: 1500,
      timeUnit: "1s",
      duration: "10m",
      preAllocatedVUs: 2000,
      maxVUs: 15000
    }
  },
  thresholds: {
    http_req_failed: ["rate<0.05"]
  }
};

export function setup() { return startRun(__ENV.GW_URL); }

export default function (data) {
  const res = hop(__ENV.GW_URL, data.run_id);
  check(res, { "202": r => r.status === 202 });
}

Scenario G: Scaling curve orchestration — perf/k6/scaling_curve.sh

#!/usr/bin/env bash
set -euo pipefail

GW_URL="${GW_URL:-http://localhost:8080}"

for n in 1 2 4 8; do
  echo "== Scaling workers to $n =="
  docker compose -f ../docker-compose.yml up -d --scale worker="$n"

  mkdir -p "../artifacts/scale-$n"
  k6 run \
    -e GW_URL="$GW_URL" \
    --summary-export "../artifacts/scale-$n/k6-summary.json" \
    ./capacity_ramp.js
done

Run (examples):

cd perf/k6
GW_URL=http://localhost:8080 k6 run --summary-export ../artifacts/smoke-summary.json smoke.js
GW_URL=http://localhost:8080 k6 run --summary-export ../artifacts/ramp-summary.json capacity_ramp.js

6) Offline analysis tool (reads “done” stream by run_id)

perf/tools/analyze.py

import os, sys, json, math
from datetime import datetime, timezone

import redis

def pct(values, p):
    if not values:
        return None
    values = sorted(values)
    k = (len(values) - 1) * (p / 100.0)
    f = math.floor(k); c = math.ceil(k)
    if f == c:
        return values[int(k)]
    return values[f] * (c - k) + values[c] * (k - f)

def main():
    valkey = os.getenv("VALKEY_ENDPOINT", "localhost:6379")
    host, port = valkey.split(":")
    r = redis.Redis(host=host, port=int(port), decode_responses=True)

    run_id = os.getenv("RUN_ID")
    if not run_id:
        print("Set RUN_ID env var (from /test/start response).", file=sys.stderr)
        sys.exit(2)

    done_stream = os.getenv("DONE_STREAM", "stella:perf:done")

    # Read all entries (fine at sample scale). For big runs, paginate XRANGE
    # using the last-seen entry id as the next 'min' cursor.
    entries = r.xrange(done_stream, min='-', max='+', count=200000)

    hop_ms = []
    queue_ms = []
    service_ms = []

    matched = 0
    for entry_id, fields in entries:
        if fields.get("run") != run_id:
            continue
        matched += 1

        enq_ns = int(fields.get("enq_ns", "0"))
        claim_ns = int(fields.get("claim_ns", "0"))
        done_ns = int(fields.get("done_ns", "0"))

        if enq_ns > 0 and claim_ns > 0:
            queue_ms.append((claim_ns - enq_ns) / 1_000_000.0)
        if claim_ns > 0 and done_ns > 0:
            service_ms.append((done_ns - claim_ns) / 1_000_000.0)
        if enq_ns > 0 and done_ns > 0:
            hop_ms.append((done_ns - enq_ns) / 1_000_000.0)

    summary = {
        "run_id": run_id,
        "done_stream": done_stream,
        "matched_jobs": matched,
        "hop_ms": {
            "p50": pct(hop_ms, 50), "p95": pct(hop_ms, 95), "p99": pct(hop_ms, 99)
        },
        "queue_ms": {
            "p50": pct(queue_ms, 50), "p95": pct(queue_ms, 95), "p99": pct(queue_ms, 99)
        },
        "service_ms": {
            "p50": pct(service_ms, 50), "p95": pct(service_ms, 95), "p99": pct(service_ms, 99)
        },
        "generated_at": datetime.now(timezone.utc).isoformat()
    }

    print(json.dumps(summary, indent=2))

if __name__ == "__main__":
    main()

Run:

pip install redis
RUN_ID=<value_from_/test/start> python perf/tools/analyze.py

This yields the key percentiles for hop, queue_delay, and service from the authoritative worker-side timestamps.


7) What this sample already covers

  • Correlation: x-stella-corr-id end-to-end.

  • Run isolation: x-stella-run-id created via /test/start, used to filter results.

  • Valkey Streams + consumer group: fair claim semantics.

  • Required timestamps: enq_ns, claim_ns, done_ns.

  • Metrics:

    • gateway_enqueue_seconds histogram
    • valkey_enqueue_rtt_seconds histogram
    • worker_service_seconds, queue_delay_seconds, hop_latency_seconds histograms
  • Scenarios:

    • A Smoke, C Capacity ramp, D Stress, E Burst, F Soak
    • G Scaling curve via script (repeat ramp across worker counts)

8) Immediate next hardening steps (still “small”)

  1. Add queue depth / lag gauges: in the worker or gateway, poll XLEN stella:perf:jobs and export it as a gauge metric in Prometheus format.
  2. Drain-time measurement: implement /test/end/{runId} so it waits until “matched jobs stop increasing” and queue depth returns to baseline, then records a final metric (a minimal drain-wait sketch follows this list).
  3. Stage slicing (per plateau stats): extend analyze.py to accept your k6 stage plan and compute p95 per stage window (based on start_ns).
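
A minimal drain-wait sketch for step 2 (StackExchange.Redis; the stream name, poll interval, and baseline depth are placeholders):

// Sketch: block until the jobs stream drains back to its baseline depth.
async Task<bool> WaitForDrainAsync(IDatabase db, string stream, long baselineDepth,
                                   TimeSpan timeout, CancellationToken ct)
{
    var deadline = DateTime.UtcNow + timeout;
    while (DateTime.UtcNow < deadline && !ct.IsCancellationRequested)
    {
        var depth = (long)await db.ExecuteAsync("XLEN", stream);
        if (depth <= baselineDepth) return true;       // drained
        await Task.Delay(TimeSpan.FromSeconds(1), ct);
    }
    return false; // did not drain in time; record this in the report
}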

If you want, I can extend the sample with (1) queue-depth export and (2) per-plateau slicing in analyze.py without adding any new dependencies.