# Bench Prep — PREP-BENCH-POLICY-20-002 (Policy delta benchmark)

Status: **Ready for execution** (2025-12-11)

Owners: Bench Guild · Policy Guild · Scheduler Guild

Scope: Provide deterministic inputs and harness expectations to measure delta policy evaluation vs full runs.

## Goals
- Compare delta evaluation (incremental changes) against full evaluation over the same dataset.
- Capture throughput, latency (p50/p95/p99), and memory/GC impact under deterministic conditions.

## Dataset
- Baseline snapshot: `docs/samples/policy/policy-delta-baseline.ndjson`
  - 5,000 records of `{ "tenant": "bench", "policyId": "pol-<0001..5000>", "package": "bench.pkg.", "version": "1.0.", "decision": "allow|deny", "factors": { ... } }`
  - Deterministic ordering; SHA256 `40ca9ee15065a9e16f51a259d3feec778203ab461db2af3bf196f5fcd9f0d590` (`policy-delta-baseline.ndjson.sha256`).
- Delta patch: `docs/samples/policy/policy-delta-changes.ndjson`
  - 500 changes mixing updates/inserts/deletes (encoded with `op`: "upsert"|"delete").
  - Sorted by `policyId` then `op` for deterministic replay; SHA256 `7f9d7f124830b9fe4d3f232b4cc7e2e728be2ef725e8a66606b9e95682bf6318` (`policy-delta-changes.ndjson.sha256`).

## Harness plan (implemented under `src/Bench/StellaOps.Bench/PolicyDelta/policy_delta_bench.py`)
- Run 1 (Full): load baseline snapshot, evaluate full policy set; record metrics.
- Run 2 (Delta): apply delta patch to in-memory store, run incremental evaluation; record metrics.
- Metrics captured to NDJSON per run:
  - `{ run: "full"|"delta", startedAtUtc, durationMs, evaluationsPerSec, p50Ms, p95Ms, p99Ms, rssMb, managedMb, gcGen2 }`
- Determinism:
  - Use fixed random seed `2025-01-01` (passed as `--seed 20250101` on the harness CLI) for any shuffling; run single-threaded with `--threads 1` when reproducibility is needed.
  - All timestamps in UTC ISO-8601; output NDJSON sorted by `run`.

## Acceptance criteria
- Baseline + delta sample files and SHA256 hashes present under `docs/samples/policy/`.
- Harness reads only local files, no network dependencies; replays produce consistent NDJSON for given hardware.
- Delta run shows reduced duration vs full run; metrics captured for both p95/p99 and throughput.

## Next steps
- Harness CLI: `python src/Bench/StellaOps.Bench/PolicyDelta/policy_delta_bench.py --baseline docs/samples/policy/policy-delta-baseline.ndjson --delta docs/samples/policy/policy-delta-changes.ndjson --output src/Bench/StellaOps.Bench/PolicyDelta/results/policy-delta.ndjson --threads 1 --seed 20250101`.
- Results hashed at `src/Bench/StellaOps.Bench/PolicyDelta/results/policy-delta.ndjson.sha256`.
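
For orientation, the delta replay step from the harness plan (apply `op: "upsert"|"delete"` changes to an in-memory store keyed by `policyId`) could look roughly like the sketch below. It is not the code in `policy_delta_bench.py`. The helper names `sha256_hex`, `load_ndjson`, and `apply_delta` are hypothetical, the inline records are trimmed stand-ins for the real sample files, and two behaviours are assumptions: an upsert carries the full replacement record, and deleting an unknown `policyId` is a no-op.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex SHA256 of raw bytes, comparable to a .sha256 sidecar value."""
    return hashlib.sha256(data).hexdigest()


def load_ndjson(text: str) -> list[dict]:
    """Parse NDJSON text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def apply_delta(store: dict[str, dict], changes: list[dict]) -> dict[str, int]:
    """Apply changes in file order to the store (in place); return per-op counts."""
    counts = {"upsert": 0, "delete": 0}
    for change in changes:
        op = change["op"]
        policy_id = change["policyId"]
        if op == "upsert":
            # Assumption: the change holds the full replacement record.
            store[policy_id] = {k: v for k, v in change.items() if k != "op"}
        elif op == "delete":
            # Assumption: deleting a missing policyId is silently ignored.
            store.pop(policy_id, None)
        else:
            raise ValueError(f"unknown op: {op!r}")
        counts[op] += 1
    return counts


# Trimmed stand-ins for the baseline and delta sample files.
baseline_text = "\n".join([
    '{"tenant": "bench", "policyId": "pol-0001", "decision": "allow"}',
    '{"tenant": "bench", "policyId": "pol-0002", "decision": "deny"}',
])
delta_text = "\n".join([
    '{"op": "upsert", "tenant": "bench", "policyId": "pol-0001", "decision": "deny"}',
    '{"op": "delete", "policyId": "pol-0002"}',
    '{"op": "upsert", "tenant": "bench", "policyId": "pol-0003", "decision": "allow"}',
])

store = {rec["policyId"]: rec for rec in load_ndjson(baseline_text)}
counts = apply_delta(store, load_ndjson(delta_text))
digest = sha256_hex(baseline_text.encode("utf-8"))
print(sorted(store), counts, digest[:12])
```

Because the delta file is already sorted by `policyId` then `op`, replaying it in file order is enough for a deterministic result. Against the real inputs, the digest would be taken over the file bytes and compared with the value in the matching `.sha256` file before either run starts.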
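
The per-run metrics record can be illustrated in the same spirit. The sketch below derives `p50Ms`/`p95Ms`/`p99Ms` from per-evaluation latencies with the nearest-rank method and writes NDJSON sorted by `run`. The function names `percentile` and `build_record` are hypothetical, and the nearest-rank choice is an assumption (the harness may interpolate). So is treating `durationMs` as the sum of latencies, which only approximates wall time in single-threaded runs. The runtime-sourced fields `rssMb`, `managedMb`, and `gcGen2` are left out because they cannot be derived from latencies alone.

```python
import json
import math
from datetime import datetime, timezone


def percentile(sorted_samples: list[float], pct: int) -> float:
    """Nearest-rank percentile over an ascending list of samples."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def build_record(run: str, started_at: datetime, latencies_ms: list[float]) -> dict:
    """Build one metrics record from per-evaluation latencies in milliseconds."""
    ordered = sorted(latencies_ms)
    # Assumption: single-threaded run, so total duration ~ sum of latencies.
    duration_ms = sum(ordered)
    per_sec = len(ordered) / (duration_ms / 1000.0) if duration_ms else 0.0
    return {
        "run": run,
        "startedAtUtc": started_at.astimezone(timezone.utc).isoformat(),
        "durationMs": duration_ms,
        "evaluationsPerSec": per_sec,
        "p50Ms": percentile(ordered, 50),
        "p95Ms": percentile(ordered, 95),
        "p99Ms": percentile(ordered, 99),
    }


# Synthetic latencies for illustration only, not benchmark results.
started = datetime(2025, 12, 11, tzinfo=timezone.utc)
full = build_record("full", started, [float(ms) for ms in range(1, 101)])
delta = build_record("delta", started, [float(ms) for ms in range(1, 11)])

# Output NDJSON sorted by `run`, with stable key order inside each line.
records = sorted([full, delta], key=lambda rec: rec["run"])
lines = [json.dumps(rec, sort_keys=True) for rec in records]
print("\n".join(lines))
```

Sorting both the records and the keys within each record keeps the output byte-stable across replays, which is what makes hashing the results file meaningful.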