Bench Prep — PREP-BENCH-GRAPH-21-002 (UI headless graph benchmarks)

Status: Ready for implementation (2025-11-20)
Owners: Bench Guild · UI Guild
Scope: Define the Playwright-based UI benchmark that rides on the graph harness from BENCH-GRAPH-21-001 (50k/100k node fixtures) and produces deterministic latency/FPS metrics.

Dependencies

Harness + fixtures from BENCH-GRAPH-21-001 (must expose HTTP endpoints and data seeds for 50k/100k graphs).
Graph API/Indexer stable query contract (per docs/modules/graph/architecture.md).

Benchmark plan

Runner: Playwright (Chromium, headless) driven via src/Bench/StellaOps.Bench.GraphUi.
Environment:
- Viewport: 1920x1080, device scale 1.0, throttling disabled; CPU pinned via --disable-features=CPUThrottling.
- Fixed session seed GRAPH_BENCH_SEED=2025-01-01T00:00:00Z for RNG use in camera jitter.
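A minimal sketch of how the fixed session seed could drive camera jitter deterministically. It assumes the seed is read from a GRAPH_BENCH_SEED environment variable; the helper names (make_jitter_rng, camera_jitter) and the 3 px jitter bound are hypothetical and not part of the harness.

```python
import os
import random

# Assumption: the seed string comes from the GRAPH_BENCH_SEED environment
# variable, falling back to the value quoted in this plan.
DEFAULT_SEED = "2025-01-01T00:00:00Z"


def make_jitter_rng(seed=None):
    """Return a private RNG seeded from the fixed session seed string."""
    if seed is None:
        seed = os.environ.get("GRAPH_BENCH_SEED", DEFAULT_SEED)
    # random.Random accepts a str seed and derives its state deterministically,
    # so the same seed string yields the same sequence on every run.
    return random.Random(seed)


def camera_jitter(rng, steps, max_px=3.0):
    """Hypothetical helper: one (dx, dy) pixel offset per pan step."""
    return [
        (rng.uniform(-max_px, max_px), rng.uniform(-max_px, max_px))
        for _ in range(steps)
    ]
```

Using a private random.Random instance rather than the module-level functions keeps the jitter sequence independent of any other code that draws random numbers during the run.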
Scenarios (each repeated 5x, median + p95 recorded):
1. Canvas load: open /graph/bench?fixture=50k → measure TTI, first contentful paint, tiles loaded count.
2. Pan/zoom loop: pan 500px x 20 iterations + zoom in/out (2x each) → record average FPS and frame jank percentage.
3. Path query: submit shortest-path query between two seeded nodes → measure query latency (client + API) and render latency.
4. Filter drill-down: apply two filters (severity=high, product="core") → measure time to filtered render + memory delta.
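For the pan/zoom scenario, average FPS and jank percentage can be derived from per-frame timestamps (for example, values collected from requestAnimationFrame callbacks in the page). The sketch below is illustrative only: the plan does not define jank, so the threshold used here (a frame interval longer than 1.5× a 60 Hz frame budget) is an assumption, as is the function name.

```python
def fps_and_jank(frame_times_ms, budget_ms=1000.0 / 60.0, jank_factor=1.5):
    """Compute (avg_fps, jank_pct) from monotonically increasing frame timestamps.

    Assumption: a frame counts as janky when its interval exceeds
    budget_ms * jank_factor. The plan itself does not fix this definition.
    """
    if len(frame_times_ms) < 2:
        raise ValueError("need at least two frame timestamps")
    intervals = [b - a for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    duration_ms = frame_times_ms[-1] - frame_times_ms[0]
    if duration_ms <= 0:
        raise ValueError("timestamps must span a positive duration")
    # Frames rendered per second over the whole measurement window.
    avg_fps = 1000.0 * len(intervals) / duration_ms
    janky = sum(1 for dt in intervals if dt > budget_ms * jank_factor)
    jank_pct = 100.0 * janky / len(intervals)
    return avg_fps, jank_pct
```

For instance, timestamps of 0, 10, 20, 60 and 70 ms give four intervals, one of which (40 ms) exceeds the 25 ms threshold, so the jank percentage is 25.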
Metrics captured to NDJSON per run:
- timestampUtc, scenario, fixture, p95_ms, median_ms, avg_fps, jank_pct, mem_mb, api_latency_ms (where applicable).
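One way to turn the five repetitions of a scenario into a single NDJSON line is sketched below. The percentile method is an assumption (nearest-rank); with only five samples, nearest-rank p95 is simply the slowest run. The scenario identifier in the usage note and the function names are hypothetical.

```python
import json
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def build_record(scenario, fixture, samples_ms, timestamp_utc, **extra):
    """Return one NDJSON line carrying the fields listed in this plan.

    Optional metrics (avg_fps, jank_pct, mem_mb, api_latency_ms) are passed
    as keyword arguments where applicable.
    """
    record = {
        "timestampUtc": timestamp_utc,
        "scenario": scenario,
        "fixture": fixture,
        "p95_ms": nearest_rank_percentile(samples_ms, 95),
        "median_ms": statistics.median(samples_ms),
    }
    record.update(extra)
    # Sorted keys and compact separators keep the serialized bytes stable.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))
```

A call such as build_record("canvas-load", "50k", [120.0, 100.0, 110.0, 140.0, 130.0], "2025-01-01T00:00:00Z", mem_mb=512.0) yields a single line with median_ms of 120.0 and p95_ms of 140.0.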
Determinism:
- All timestamps recorded in UTC ISO-8601; RNG seeded; cache cleared before each scenario; the UseAFH feature disabled (--disable-features=UseAFH) to avoid adaptive throttling.
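The UTC ISO-8601 requirement can be enforced at the point where timestamps are formatted. The following is a sketch, assuming second precision and a trailing "Z" suffix (matching the format of the seed value above); the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utc_iso8601(moment):
    """Format a timezone-aware datetime as UTC ISO-8601 with a 'Z' suffix."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        # Refuse naive datetimes so local time never leaks into results.
        raise ValueError("naive datetime; attach a timezone before recording")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Usage: a timestamp taken at 02:00 in a UTC+2 zone is recorded as midnight UTC.
local = datetime(2025, 1, 1, 2, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(utc_iso8601(local))  # prints 2025-01-01T00:00:00Z
```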

Outputs

NDJSON benchmark results stored under out/bench/graph/ui/{runId}.ndjson with a .sha256 alongside.
Summary CSV optional, derived from NDJSON for reporting only.
CI step publishes artifacts to out/bench/graph/ui/latest/ with write-once semantics per runId.
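The output layout above could be produced along these lines. This is a sketch under stated assumptions: the .sha256 sidecar uses the common "digest, two spaces, filename" layout (the plan does not specify one), write-once is approximated by refusing to overwrite an existing file for the same runId, and the function name is hypothetical.

```python
import hashlib
from pathlib import Path


def write_results(out_dir, run_id, lines):
    """Write {run_id}.ndjson plus a .sha256 sidecar; fail if the run already exists."""
    out_path = Path(out_dir) / f"{run_id}.ndjson"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    payload = "".join(line + "\n" for line in lines).encode("utf-8")
    # Mode "xb" raises FileExistsError when the file is already present,
    # which gives write-once behaviour per runId.
    with open(out_path, "xb") as handle:
        handle.write(payload)
    digest = hashlib.sha256(payload).hexdigest()
    sha_path = out_path.with_name(out_path.name + ".sha256")
    # Assumed sidecar layout: "<hex digest>  <file name>".
    sha_path.write_text(f"{digest}  {out_path.name}\n", encoding="utf-8")
    return out_path, sha_path
```

Hashing the exact bytes that were written, rather than re-reading the file, avoids a window in which the file could change between write and digest.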

Acceptance criteria

Playwright suite reproducibly exercises the four scenarios on 50k and 100k fixtures with seeded inputs.
Metrics include p95 and median for each scenario and fixture size; FPS ≥ 30 on 50k fixture baseline.
Archive outputs are deterministic for a given fixture and seed (filenames carry no wall-clock timestamps; timestamps are embedded only in file content).
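The first two criteria lend themselves to an automated gate over the NDJSON output. The sketch below assumes hypothetical scenario identifiers and fixture labels ("50k", "100k"); only the field names and the 30 FPS threshold come from this plan.

```python
import json

# Hypothetical identifiers for the four scenarios; the plan does not name them.
SCENARIOS = ("canvas-load", "pan-zoom", "path-query", "filter-drilldown")
FIXTURES = ("50k", "100k")
MIN_FPS_50K = 30.0


def check_acceptance(ndjson_text):
    """Return a list of failure messages; an empty list means the gate passes."""
    records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    by_key = {(r.get("scenario"), r.get("fixture")): r for r in records}
    failures = []
    for scenario in SCENARIOS:
        for fixture in FIXTURES:
            rec = by_key.get((scenario, fixture))
            if rec is None:
                failures.append(f"missing record: {scenario}/{fixture}")
                continue
            for key in ("p95_ms", "median_ms"):
                if not isinstance(rec.get(key), (int, float)):
                    failures.append(f"missing {key}: {scenario}/{fixture}")
    # The FPS floor applies only to 50k records that report avg_fps.
    for rec in records:
        fps = rec.get("avg_fps")
        if rec.get("fixture") == "50k" and fps is not None and fps < MIN_FPS_50K:
            failures.append(f"avg_fps {fps} below {MIN_FPS_50K}: {rec.get('scenario')}/50k")
    return failures
```

Returning a list of messages instead of raising on the first problem lets a CI step report every missing or failing metric in one pass.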

Next steps

Wire the Playwright harness into the BENCH-GRAPH-21-001 pipeline once the fixtures are ready.
Hook results into perf dashboard if available; otherwise store NDJSON + hashes.
