Bench Prep — PREP-BENCH-POLICY-20-002 (Policy delta benchmark)

Status: Ready for implementation (2025-11-20)
Owners: Bench Guild · Policy Guild · Scheduler Guild
Scope: Provide deterministic inputs and harness expectations to measure delta policy evaluation vs full runs.

Goals

Compare delta evaluation (incremental changes) against full evaluation over the same dataset.
Capture throughput, latency (p50/p95/p99), and memory/GC impact under deterministic conditions.

Dataset

Baseline snapshot: docs/samples/policy/policy-delta-baseline.ndjson
- 5,000 records of { "tenant": "bench", "policyId": "pol-<0001..5000>", "package": "bench.pkg.<n>", "version": "1.0.<n>", "decision": "allow|deny", "factors": { ... } }
- Deterministic ordering; SHA-256 digest saved as policy-delta-baseline.ndjson.sha256.
Delta patch: docs/samples/policy/policy-delta-changes.ndjson
- 500 changes mixing updates/inserts/deletes (encoded with op: "upsert"|"delete").
- Sorted by policyId then op for deterministic replay.
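
The replay order described above can be sketched as follows. This is an illustration in Python rather than the planned .NET harness, and it rests on an assumption the source does not state: each change line carries `policyId`, `op`, and, for upserts, the remaining record fields inline. `apply_delta` is a hypothetical helper name.

```python
import json


def apply_delta(store, change_lines):
    """Replay NDJSON change lines against a dict keyed by policyId.

    Assumption: an upsert line holds the full record plus an "op" field.
    Returns the policyIds touched, in replay order.
    """
    changes = [json.loads(line) for line in change_lines if line.strip()]
    # Re-sort defensively so replay order never depends on file order.
    changes.sort(key=lambda c: (c["policyId"], c["op"]))
    touched = []
    for change in changes:
        policy_id = change["policyId"]
        if change["op"] == "delete":
            store.pop(policy_id, None)
        elif change["op"] == "upsert":
            store[policy_id] = {k: v for k, v in change.items() if k != "op"}
        else:
            raise ValueError(f"unknown op: {change['op']!r}")
        touched.append(policy_id)
    return touched


# Tiny in-memory example; the real inputs would be the sample files.
store = {
    "pol-0001": {"policyId": "pol-0001", "decision": "allow"},
    "pol-0002": {"policyId": "pol-0002", "decision": "deny"},
}
delta = [
    '{"policyId": "pol-0002", "op": "delete"}',
    '{"policyId": "pol-0001", "op": "upsert", "decision": "deny"}',
    '{"policyId": "pol-5001", "op": "upsert", "decision": "allow"}',
]
touched = apply_delta(store, delta)
```

Returning the touched ids gives an incremental evaluator a natural work list: only those policies would need re-evaluation in the delta run.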

Harness plan (to be built under `src/Bench/StellaOps.Bench.Policy`)

Run 1 (Full): load baseline snapshot, evaluate full policy set; record metrics.
Run 2 (Delta): apply delta patch to in-memory store, run incremental evaluation; record metrics.
Metrics captured to NDJSON per run:
- { run: "full"|"delta", startedAtUtc, durationMs, evaluationsPerSec, p50Ms, p95Ms, p99Ms, rssMb, managedMb, gcGen2 }
Determinism:
- Use the fixed random seed 2025-01-01 for any shuffling; pass `--threads 1` to force single-threaded mode when reproducibility is needed.
- All timestamps in UTC ISO-8601; output NDJSON sorted by run.
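
A minimal Python sketch of how one metrics record per run might be assembled and emitted, assuming per-evaluation latencies are collected in milliseconds. The nearest-rank percentile method and the helper names (`percentile`, `metrics_record`, `to_ndjson`) are assumptions, not part of the plan. The .NET-specific fields (`rssMb`, `managedMb`, `gcGen2`) are left out because they have no direct Python equivalent.

```python
import json
from datetime import datetime, timezone


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer in 1..100."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct / 100 * n)
    return ordered[rank - 1]


def metrics_record(run, started_at, duration_ms, latencies_ms):
    """Build one record using the field names from the harness plan."""
    return {
        "run": run,
        "startedAtUtc": started_at.astimezone(timezone.utc).isoformat(),
        "durationMs": duration_ms,
        "evaluationsPerSec": len(latencies_ms) * 1000 / duration_ms,
        "p50Ms": percentile(latencies_ms, 50),
        "p95Ms": percentile(latencies_ms, 95),
        "p99Ms": percentile(latencies_ms, 99),
    }


def to_ndjson(records):
    """Serialize one record per line, sorted by run name."""
    ordered = sorted(records, key=lambda r: r["run"])
    return "".join(json.dumps(r) + "\n" for r in ordered)


# Synthetic latencies of 1..100 ms, used only to exercise the helpers.
started = datetime(2025, 1, 1, tzinfo=timezone.utc)
latencies = [float(ms) for ms in range(1, 101)]
full = metrics_record("full", started, 2000, latencies)
delta = metrics_record("delta", started, 500, latencies)
output = to_ndjson([full, delta])
```

Sorting by `run` places the `delta` line before the `full` line regardless of execution order, which keeps the file layout stable across replays.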

Acceptance criteria

Baseline + delta sample files and SHA256 hashes present under docs/samples/policy/.
Harness reads only local files and has no network dependencies; replays produce consistent NDJSON on the same hardware.
The delta run shows a shorter duration than the full run; p95/p99 latency and throughput are captured for each run.
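
The last criterion could be checked mechanically against the harness output. The Python sketch below assumes the metrics file holds one record per run with the field names listed in the harness plan. `check_acceptance` and `REQUIRED_KEYS` are hypothetical names, and the check covers only the metric-related criterion.

```python
import json

# Fields the criterion mentions, using the names from the metrics record.
REQUIRED_KEYS = {"run", "durationMs", "evaluationsPerSec", "p95Ms", "p99Ms"}


def check_acceptance(ndjson_text):
    """Return a list of failure messages; an empty list means the check passed."""
    runs = {}
    for line in ndjson_text.splitlines():
        if line.strip():
            record = json.loads(line)
            runs[record.get("run")] = record
    failures = []
    for name in ("full", "delta"):
        if name not in runs:
            failures.append(f"missing run: {name}")
            continue
        missing = REQUIRED_KEYS - runs[name].keys()
        if missing:
            failures.append(f"{name}: missing {sorted(missing)}")
    # Compare durations only when both records are complete.
    if not failures and runs["delta"]["durationMs"] >= runs["full"]["durationMs"]:
        failures.append("delta run is not faster than full run")
    return failures


sample = (
    '{"run": "delta", "durationMs": 500, "evaluationsPerSec": 200.0, "p95Ms": 9.5, "p99Ms": 9.9}\n'
    '{"run": "full", "durationMs": 2000, "evaluationsPerSec": 50.0, "p95Ms": 9.5, "p99Ms": 9.9}\n'
)
result = check_acceptance(sample)
```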

Next steps

Add sample files + hashes to docs/samples/policy/ (can be generated with fixed seed).
Implement the harness CLI wrapper `dotnet run -- policy-delta --baseline <path> --delta <path> [--threads 1]`, writing outputs to `out/bench/policy/` with accompanying `.sha256` files.
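
As a sketch of the first step, the baseline file and its hash could be generated along these lines in Python. Several details are assumptions rather than decisions recorded here: `<n>` is rendered unpadded, `decision` is drawn from a generator seeded with the string `2025-01-01`, `factors` stays an empty placeholder because its shape is not specified, and the sidecar uses the `sha256sum`-style `<digest>  <filename>` layout. All function names are hypothetical.

```python
import hashlib
import json
import random
from pathlib import Path

SEED = "2025-01-01"  # seed value named in the determinism notes


def baseline_records(count=5000, seed=SEED):
    """Yield baseline records in ascending policyId order."""
    rng = random.Random(seed)
    for n in range(1, count + 1):
        yield {
            "tenant": "bench",
            "policyId": f"pol-{n:04d}",
            "package": f"bench.pkg.{n}",   # assumption: <n> is unpadded
            "version": f"1.0.{n}",
            "decision": rng.choice(["allow", "deny"]),
            "factors": {},                 # placeholder: shape not specified
        }


def to_ndjson_bytes(records):
    """Serialize with fixed separators so the bytes are stable across runs."""
    lines = (json.dumps(r, separators=(",", ":")) for r in records)
    return "".join(line + "\n" for line in lines).encode("utf-8")


def write_with_hash(path, payload):
    """Write payload plus a '<name>.sha256' sidecar; return the hex digest."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    digest = hashlib.sha256(payload).hexdigest()
    sidecar = path.with_name(path.name + ".sha256")
    sidecar.write_text(f"{digest}  {path.name}\n", encoding="utf-8")
    return digest


payload = to_ndjson_bytes(baseline_records())
```

Calling `write_with_hash("docs/samples/policy/policy-delta-baseline.ndjson", payload)` would then produce both files. Byte-for-byte stability is only expected for a fixed interpreter version, since the sequence drawn by `random.choice` is not guaranteed to be identical across Python releases.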
