git.stella-ops.org/docs/benchmarks/policy/bench-policy-20-002-prep.md
master d519782a8f
prep docs and service updates
2025-11-21 06:56:36 +00:00

Bench Prep — PREP-BENCH-POLICY-20-002 (Policy delta benchmark)

Status: Ready for implementation (2025-11-20)
Owners: Bench Guild · Policy Guild · Scheduler Guild
Scope: Provide deterministic inputs and harness expectations for measuring delta (incremental) policy evaluation against full evaluation runs.

Goals

  • Compare delta evaluation (incremental changes) against full evaluation over the same dataset.
  • Capture throughput, latency (p50/p95/p99), and memory/GC impact under deterministic conditions.

Dataset

  • Baseline snapshot: docs/samples/policy/policy-delta-baseline.ndjson
    • 5,000 records of the form { "tenant": "bench", "policyId": "pol-<0001..5000>", "package": "bench.pkg.<n>", "version": "1.0.<n>", "decision": "allow|deny", "factors": { ... } }
    • Records are in deterministic order; the SHA256 digest is saved as policy-delta-baseline.ndjson.sha256.
  • Delta patch: docs/samples/policy/policy-delta-changes.ndjson
    • 500 changes mixing updates, inserts, and deletes; updates and inserts are both encoded as op: "upsert", deletes as op: "delete".
    • Sorted by policyId then op for deterministic replay.
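The replay rules above (a policyId-keyed store, changes applied in policyId/op order) can be sketched as follows. This is a minimal Python illustration, not the planned .NET harness. It assumes that an upsert change carries the full record plus "op", that a delete needs only "policyId", and that deleting an unknown policyId is a no-op. The spec does not define any of these details.

```python
import json
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]

def load_ndjson(path: str) -> List[Record]:
    """Read an NDJSON file: one JSON object per non-blank line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]

def check_replay_order(changes: List[Record]) -> None:
    """Fail fast unless changes are sorted by policyId, then op."""
    keys = [(c["policyId"], c["op"]) for c in changes]
    if keys != sorted(keys):
        raise ValueError("delta changes are not sorted by policyId then op")

def apply_delta(baseline: Iterable[Record], changes: List[Record]) -> Dict[str, Record]:
    """Index the baseline by policyId, then replay the delta changes in file order."""
    store: Dict[str, Record] = {rec["policyId"]: rec for rec in baseline}
    check_replay_order(changes)
    for change in changes:
        op = change["op"]
        if op == "upsert":
            # Assumption: an upsert carries the full record; "op" is dropped before storing.
            store[change["policyId"]] = {k: v for k, v in change.items() if k != "op"}
        elif op == "delete":
            # Assumption: deleting a policyId that is not present is a no-op.
            store.pop(change["policyId"], None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return store
```

Against the sample files, the call would be apply_delta(load_ndjson(baseline_path), load_ndjson(delta_path)). Building new dicts for upserts leaves the baseline records untouched, so the full run can reuse the same snapshot.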

Harness plan (to be built under src/Bench/StellaOps.Bench.Policy)

  • Run 1 (Full): load baseline snapshot, evaluate full policy set; record metrics.
  • Run 2 (Delta): apply delta patch to in-memory store, run incremental evaluation; record metrics.
  • Metrics are written as one NDJSON record per run:
    • { run: "full"|"delta", startedAtUtc, durationMs, evaluationsPerSec, p50Ms, p95Ms, p99Ms, rssMb, managedMb, gcGen2 }
  • Determinism:
    • Use the fixed random seed 2025-01-01 for any shuffling; pass --threads 1 to run single-threaded when reproducibility is required.
    • All timestamps are UTC ISO-8601; output NDJSON lines are sorted by run.
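The following Python sketch shows how one per-run metrics record could be assembled and serialized under these rules. It is illustrative only and is not the harness. It assumes nearest-rank percentiles, because the spec names no percentile method. The memory and GC figures (rssMb, managedMb, gcGen2) would come from the .NET runtime, so the sketch simply passes them through.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Sequence

def percentile(sorted_ms: Sequence[float], p: int) -> float:
    """Nearest-rank percentile of an ascending sample list (method is an assumption)."""
    if not sorted_ms:
        raise ValueError("no latency samples")
    rank = max(1, (p * len(sorted_ms) + 99) // 100)  # integer ceil(p * n / 100)
    return sorted_ms[rank - 1]

def metrics_record(run: str, started_at: datetime, latencies_ms: Sequence[float],
                   duration_ms: float, rss_mb: float, managed_mb: float,
                   gc_gen2: int) -> Dict[str, Any]:
    """Build one metrics record; started_at must be a timezone-aware datetime."""
    lat = sorted(latencies_ms)
    return {
        "run": run,  # "full" or "delta"
        "startedAtUtc": started_at.astimezone(timezone.utc).isoformat().replace("+00:00", "Z"),
        "durationMs": duration_ms,
        "evaluationsPerSec": len(lat) * 1000.0 / duration_ms,
        "p50Ms": percentile(lat, 50),
        "p95Ms": percentile(lat, 95),
        "p99Ms": percentile(lat, 99),
        # Runtime-supplied memory/GC figures, passed through unchanged.
        "rssMb": rss_mb,
        "managedMb": managed_mb,
        "gcGen2": gc_gen2,
    }

def to_ndjson(records: List[Dict[str, Any]]) -> str:
    """One compact JSON object per line, sorted by run, keys sorted for stable bytes."""
    ordered = sorted(records, key=lambda r: r["run"])
    return "".join(json.dumps(r, sort_keys=True, separators=(",", ":")) + "\n" for r in ordered)
```

Sorting keys and using compact separators makes the serialized bytes depend only on the values. When measured values repeat on the same hardware, the output therefore repeats byte for byte.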

Acceptance criteria

  • Baseline + delta sample files and SHA256 hashes present under docs/samples/policy/.
  • Harness reads only local files and has no network dependencies; replays on the same hardware produce consistent NDJSON.
  • Delta run shows a shorter duration than the full run; p95/p99 latency and throughput are captured for both runs.
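The metrics-related criteria could be checked mechanically against the harness output, as in the Python sketch below. It covers only two things: that both runs are present with every listed field, and that the delta duration is below the full duration. Verifying the sample-file hashes is a separate step that this sketch does not perform.

```python
import json
from typing import Any, Dict, List

REQUIRED_FIELDS = ("run", "startedAtUtc", "durationMs", "evaluationsPerSec",
                   "p50Ms", "p95Ms", "p99Ms", "rssMb", "managedMb", "gcGen2")

def acceptance_failures(ndjson_text: str) -> List[str]:
    """Check metrics NDJSON against the run/duration criteria; [] means pass."""
    failures: List[str] = []
    runs: Dict[str, Dict[str, Any]] = {}
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            failures.append(f"run {record.get('run')!r} missing fields: {missing}")
        runs[str(record.get("run"))] = record
    for name in ("full", "delta"):
        if name not in runs:
            failures.append(f"no {name!r} run recorded")
    # Compare durations only when both records are present and complete.
    if not failures and runs["delta"]["durationMs"] >= runs["full"]["durationMs"]:
        failures.append("delta run is not faster than the full run")
    return failures
```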

Next steps

  • Add the sample files and their hashes to docs/samples/policy/ (they can be generated with the fixed seed).
  • Implement the harness CLI wrapper dotnet run -- policy-delta --baseline <path> --delta <path> [--threads 1], writing outputs to out/bench/policy/ with a .sha256 file alongside each output.
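The Python sketch below shows one way to generate the baseline sample and its .sha256 sidecar deterministically. Several details are assumptions that the spec leaves open. The seed string is passed directly to Python's random.Random. The <n> placeholder is taken as the unpadded record number. The sidecar uses the sha256sum line format "<hex>  <filename>". Finally, "factors" is left as an empty hypothetical placeholder, because the spec elides its contents.

```python
import hashlib
import json
import os
import random
from typing import List

def baseline_lines(count: int = 5000, seed: str = "2025-01-01") -> List[str]:
    """Deterministic baseline records, one compact JSON object per line, in policyId order."""
    rng = random.Random(seed)  # assumption: the seed string is fed to the RNG as-is
    lines = []
    for n in range(1, count + 1):
        record = {
            "tenant": "bench",
            "policyId": f"pol-{n:04d}",  # zero-padded so text order matches numeric order
            "package": f"bench.pkg.{n}",
            "version": f"1.0.{n}",
            "decision": rng.choice(["allow", "deny"]),
            "factors": {},  # hypothetical placeholder: factor contents are not specified
        }
        lines.append(json.dumps(record, sort_keys=True, separators=(",", ":")))
    return lines

def sha256_sidecar(data: bytes, filename: str) -> str:
    """sha256sum-style line '<hex digest>  <filename>' (format is an assumption)."""
    return f"{hashlib.sha256(data).hexdigest()}  {filename}\n"

def write_with_sidecar(path: str, lines: List[str]) -> str:
    """Write NDJSON with LF endings plus <path>.sha256; return the hex digest."""
    data = "".join(line + "\n" for line in lines).encode("utf-8")
    with open(path, "wb") as fh:
        fh.write(data)
    with open(path + ".sha256", "w", encoding="utf-8", newline="\n") as fh:
        fh.write(sha256_sidecar(data, os.path.basename(path)))
    return hashlib.sha256(data).hexdigest()
```

Writing bytes with explicit LF endings prevents platform newline translation from changing the digest between machines.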