Bench Prep — PREP-BENCH-POLICY-20-002 (Policy delta benchmark)
Status: Ready for execution (2025-12-11)
Owners: Bench Guild · Policy Guild · Scheduler Guild
Scope: Provide deterministic inputs and harness expectations to measure delta policy evaluation vs full runs.
Goals
- Compare delta evaluation (incremental changes) against full evaluation over the same dataset.
- Capture throughput, latency (p50/p95/p99), and memory/GC impact under deterministic conditions.
Dataset
- Baseline snapshot: docs/samples/policy/policy-delta-baseline.ndjson
  - 5,000 records of { "tenant": "bench", "policyId": "pol-<0001..5000>", "package": "bench.pkg.<n>", "version": "1.0.<n>", "decision": "allow|deny", "factors": { ... } }
  - Deterministic ordering; SHA256 40ca9ee15065a9e16f51a259d3feec778203ab461db2af3bf196f5fcd9f0d590 (policy-delta-baseline.ndjson.sha256).
- Delta patch: docs/samples/policy/policy-delta-changes.ndjson
  - 500 changes mixing updates/inserts/deletes (encoded with op: "upsert"|"delete").
  - Sorted by policyId then op for deterministic replay; SHA256 7f9d7f124830b9fe4d3f232b4cc7e2e728be2ef725e8a66606b9e95682bf6318 (policy-delta-changes.ndjson.sha256).
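Before a run, the sample files can be checked against their recorded hashes. The following is a minimal sketch, not part of the harness itself: the helper names `sha256_hex` and `matches_sidecar` are hypothetical, and it assumes each `.sha256` sidecar starts with the hex digest, optionally followed by a filename (the `sha256sum` layout).

```python
import hashlib
from pathlib import Path


def sha256_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large NDJSON samples need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(data_path: Path, sidecar_path: Path) -> bool:
    """Compare the file's digest with the first token of its .sha256 sidecar.

    Assumption: the sidecar begins with the hex digest; anything after the
    first whitespace (such as a filename) is ignored.
    """
    expected = sidecar_path.read_text(encoding="utf-8").split()[0].lower()
    return sha256_hex(data_path) == expected
```

Calling `matches_sidecar` with the baseline or delta file and its sidecar returns `True` only when the bytes on disk match the recorded digest, which is a cheap guard against benchmarking a modified dataset.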
Harness plan (implemented under src/Bench/StellaOps.Bench/PolicyDelta/policy_delta_bench.py)
- Run 1 (Full): load baseline snapshot, evaluate full policy set; record metrics.
- Run 2 (Delta): apply delta patch to in-memory store, run incremental evaluation; record metrics.
- Metrics captured to NDJSON per run:
{ run: "full"|"delta", startedAtUtc, durationMs, evaluationsPerSec, p50Ms, p95Ms, p99Ms, rssMb, managedMb, gcGen2 }
- Determinism:
  - Use fixed random seed 2025-01-01 for any shuffling; pass the single-threaded mode flag --threads 1 when reproducibility is needed.
  - All timestamps in UTC ISO-8601; output NDJSON sorted by run.
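The delta step and the latency percentiles in the plan above can be sketched as follows. This is an illustration under stated assumptions rather than the code in policy_delta_bench.py: the function names are hypothetical, the store is assumed to be a plain dict keyed by `policyId`, an upsert row is assumed to carry the full replacement record, and the nearest-rank percentile method is chosen here only because it is deterministic and simple.

```python
import math
from typing import Dict, Iterable, List


def apply_delta(store: Dict[str, dict], changes: Iterable[dict]) -> Dict[str, dict]:
    """Apply upsert/delete changes in place, keyed by policyId."""
    for change in changes:
        policy_id = change["policyId"]
        if change["op"] == "delete":
            store.pop(policy_id, None)  # deleting a missing id is a no-op
        elif change["op"] == "upsert":
            # Assumption: the change row itself carries the new record fields.
            store[policy_id] = {k: v for k, v in change.items() if k != "op"}
        else:
            raise ValueError(f"unknown op: {change['op']!r}")
    return store


def percentile(samples_ms: List[float], pct: int) -> float:
    """Nearest-rank percentile; deterministic for a fixed sample list."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms: List[float]) -> dict:
    """Build the p50Ms/p95Ms/p99Ms fields of a metrics record."""
    return {
        "p50Ms": percentile(samples_ms, 50),
        "p95Ms": percentile(samples_ms, 95),
        "p99Ms": percentile(samples_ms, 99),
    }
```

Because the delta file is sorted by policyId then op, replaying it through `apply_delta` in file order yields the same store on every run, which is what makes the delta metrics comparable across replays.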
Acceptance criteria
- Baseline + delta sample files and SHA256 hashes present under docs/samples/policy/.
- Harness reads only local files, no network dependencies; replays produce consistent NDJSON for given hardware.
- Delta run shows reduced duration vs full run; metrics captured for both p95/p99 and throughput.
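The duration criterion can be checked mechanically against the results file. The sketch below is an assumption about how such a check might look, not an existing tool: `delta_is_faster` is a hypothetical helper that relies only on the `run` and `durationMs` fields listed in the metrics record.

```python
import json
from typing import Iterable


def delta_is_faster(ndjson_lines: Iterable[str]) -> bool:
    """Return True when the delta run's durationMs is below the full run's."""
    durations = {}
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        durations[record["run"]] = record["durationMs"]
    missing = {"full", "delta"} - durations.keys()
    if missing:
        raise ValueError(f"missing runs: {sorted(missing)}")
    return durations["delta"] < durations["full"]
```

Passing an open file handle for the results NDJSON works directly, since a text file iterates line by line.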
Next steps
- Harness CLI:
  python src/Bench/StellaOps.Bench/PolicyDelta/policy_delta_bench.py --baseline docs/samples/policy/policy-delta-baseline.ndjson --delta docs/samples/policy/policy-delta-changes.ndjson --output src/Bench/StellaOps.Bench/PolicyDelta/results/policy-delta.ndjson --threads 1 --seed 20250101
- Results hashed at src/Bench/StellaOps.Bench/PolicyDelta/results/policy-delta.ndjson.sha256.