# Bench Prep — PREP-BENCH-POLICY-20-002 (Policy delta benchmark)

Status: **Ready for implementation** (2025-11-20)
Owners: Bench Guild · Policy Guild · Scheduler Guild
Scope: Provide deterministic inputs and harness expectations to measure delta policy evaluation vs full runs.

## Goals
- Compare delta evaluation (incremental changes) against full evaluation over the same dataset.
- Capture throughput, latency (p50/p95/p99), and memory/GC impact under deterministic conditions.

## Dataset
- Baseline snapshot: `docs/samples/policy/policy-delta-baseline.ndjson`
  - 5,000 records of `{ "tenant": "bench", "policyId": "pol-<0001..5000>", "package": "bench.pkg.<n>", "version": "1.0.<n>", "decision": "allow|deny", "factors": { ... } }`
  - Deterministic ordering; SHA-256 digest saved as `policy-delta-baseline.ndjson.sha256`.
- Delta patch: `docs/samples/policy/policy-delta-changes.ndjson`
  - 500 changes mixing updates, inserts, and deletes (encoded with `op`: `"upsert"` | `"delete"`).
  - Sorted by `policyId`, then `op`, for deterministic replay.

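
The baseline layout above can be illustrated with a minimal Python sketch. It is not the project's generator: the reuse of the seed `2025-01-01`, the empty `factors` placeholder, the compact key-sorted JSON encoding, and the `sha256sum`-style sidecar layout are all assumptions, since the spec does not fix them. The function names are hypothetical.

```python
import hashlib
import json
import random

SEED = "2025-01-01"   # assumption: reuse the seed the harness plan names for shuffling
RECORD_COUNT = 5000


def build_baseline(count: int = RECORD_COUNT, seed: str = SEED) -> list[dict]:
    """Build records in ascending policyId order with seed-driven decisions."""
    rng = random.Random(seed)  # string seeds are hashed deterministically by CPython
    records = []
    for n in range(1, count + 1):
        records.append({
            "tenant": "bench",
            "policyId": f"pol-{n:04d}",
            "package": f"bench.pkg.{n}",
            "version": f"1.0.{n}",
            "decision": rng.choice(["allow", "deny"]),
            "factors": {},  # placeholder: the factor schema is elided in the spec
        })
    return records


def to_ndjson(records: list[dict]) -> bytes:
    """Serialize one compact JSON object per line with a stable key order."""
    lines = (json.dumps(r, sort_keys=True, separators=(",", ":")) for r in records)
    return ("\n".join(lines) + "\n").encode("utf-8")


def sha256_sidecar(payload: bytes, filename: str) -> str:
    """Return a sidecar line in the `sha256sum` layout (assumed format)."""
    return f"{hashlib.sha256(payload).hexdigest()}  {filename}\n"


if __name__ == "__main__":
    payload = to_ndjson(build_baseline())
    print(sha256_sidecar(payload, "policy-delta-baseline.ndjson"), end="")
```

Fixing both the record order and the JSON key order is what makes the digest reproducible: any change in byte layout would change the hash even if the data were logically identical.
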
## Harness plan (to be built under `src/Bench/StellaOps.Bench.Policy`)
- Run 1 (Full): load baseline snapshot, evaluate full policy set; record metrics.
- Run 2 (Delta): apply delta patch to in-memory store, run incremental evaluation; record metrics.
- Metrics captured to NDJSON per run:
  - `{ run: "full"|"delta", startedAtUtc, durationMs, evaluationsPerSec, p50Ms, p95Ms, p99Ms, rssMb, managedMb, gcGen2 }`
- Determinism:
  - Use fixed random seed `2025-01-01` for any shuffling; pass the single-threaded mode flag `--threads 1` when reproducibility is needed.
  - All timestamps in UTC ISO-8601; output NDJSON sorted by `run`.

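
The delta step and the metrics record can be sketched in Python to make the intended semantics concrete. This is an illustration under stated assumptions, not the planned .NET harness: it assumes each change line carries the record fields plus `op`, that latency percentiles use the nearest-rank method, and that `durationMs` is the sum of per-evaluation latencies (reasonable only for a single-threaded run). The runtime fields `rssMb`, `managedMb`, and `gcGen2` are omitted because they come from the .NET process. All function names are hypothetical.

```python
import json
import math
from datetime import datetime, timezone


def apply_delta(store: dict, changes: list) -> set:
    """Apply upsert/delete changes in place; return policyIds to re-evaluate."""
    dirty = set()
    # Replay in the documented order: by policyId, then by op.
    for change in sorted(changes, key=lambda c: (c["policyId"], c["op"])):
        policy_id = change["policyId"]
        if change["op"] == "delete":
            store.pop(policy_id, None)
            dirty.discard(policy_id)
        elif change["op"] == "upsert":
            # Assumption: the change line holds the full record plus "op".
            store[policy_id] = {k: v for k, v in change.items() if k != "op"}
            dirty.add(policy_id)
        else:
            raise ValueError(f"unknown op: {change['op']!r}")
    return dirty


def percentile(sorted_ms: list, pct: int) -> float:
    """Nearest-rank percentile over an ascending, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def metrics_record(run: str, started_at: datetime, latencies_ms: list) -> dict:
    """Summarize one run; started_at must be a timezone-aware datetime."""
    ordered = sorted(latencies_ms)
    duration_ms = sum(ordered)  # assumption: serial execution, no overlap
    per_sec = len(ordered) / (duration_ms / 1000) if duration_ms else 0.0
    return {
        "run": run,
        "startedAtUtc": started_at.astimezone(timezone.utc).isoformat(),
        "durationMs": duration_ms,
        "evaluationsPerSec": per_sec,
        "p50Ms": percentile(ordered, 50),
        "p95Ms": percentile(ordered, 95),
        "p99Ms": percentile(ordered, 99),
    }


def metrics_ndjson(records: list) -> str:
    """Emit one JSON object per line, sorted by `run` as the plan requires."""
    ordered = sorted(records, key=lambda r: r["run"])
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in ordered)
```

Returning the set of touched `policyId` values from `apply_delta` is one way to tell the incremental evaluator what to recompute; the real harness may track changes differently.
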
## Acceptance criteria
- Baseline + delta sample files and SHA256 hashes present under `docs/samples/policy/`.
- Harness reads only local files and has no network dependencies; replays produce consistent NDJSON on the same hardware.
- Delta run shows reduced duration vs full run; both p95/p99 latency and throughput are captured.

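
The last criterion lends itself to an automated check. The following Python sketch assumes the metrics NDJSON uses the field names from the harness plan and contains one record per run; `check_metrics` and `REQUIRED_FIELDS` are hypothetical names, and the strict "delta faster than full" comparison is one possible reading of "reduced duration".

```python
import json

# Fields the acceptance criteria call out explicitly.
REQUIRED_FIELDS = ("durationMs", "evaluationsPerSec", "p95Ms", "p99Ms")


def check_metrics(ndjson_text: str) -> list:
    """Return a list of acceptance failures; an empty list means all checks passed."""
    runs = {}
    for line in ndjson_text.splitlines():
        if line.strip():
            record = json.loads(line)
            runs[record["run"]] = record

    failures = []
    for name in ("full", "delta"):
        if name not in runs:
            failures.append(f"missing run: {name}")
            continue
        for field in REQUIRED_FIELDS:
            if field not in runs[name]:
                failures.append(f"{name}: missing {field}")

    # Compare durations only when both runs are complete.
    if not failures and runs["delta"]["durationMs"] >= runs["full"]["durationMs"]:
        failures.append("delta run is not faster than full run")
    return failures
```

Returning a list of failures rather than raising on the first one lets a CI step report every unmet criterion in a single pass.
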
|
## Next steps
- Add sample files + hashes to `docs/samples/policy/` (can be generated with fixed seed).
- Implement harness CLI wrapper `dotnet run -- policy-delta --baseline <path> --delta <path> [--threads 1]` writing outputs to `out/bench/policy/` with accompanying `.sha256` files.