up

2025-12-12 09:35:37 +02:00
parent ce5ec9c158
commit efaf3cb789
238 changed files with 146274 additions and 5767 deletions
--- a/docs/benchmarks/signals/bench-sig-26-002-prep.md
+++ b/docs/benchmarks/signals/bench-sig-26-002-prep.md
@@ -1,21 +1,31 @@
 # Policy Eval with Reachability Cache Prep — PREP-BENCH-SIG-26-002-BLOCKED-ON-26-001-OUTPU

-Status: Draft (2025-11-20)
-Owners: Bench Guild · Policy Guild
-Scope: Capture prep for measuring policy evaluation overhead with reachability cache hot/cold, dependent on 26-001 outputs.
+Status: Ready for execution (2025-12-11)
+Owners: Bench Guild · Policy Guild
+Scope: Measure policy evaluation overhead under cold, warm, and mixed reachability-cache scenarios, using outputs from BENCH-SIG-26-001.

 ## Dependencies
- Bench outputs from 26-001 (reachability scoring harness) providing cached datasets.
- Policy overlay schema (30-001) for status fields.
+- Reachability cache NDJSON from BENCH-SIG-26-001:
+  - `src/Bench/StellaOps.Bench/Signals/results/reachability-cache-10k.ndjson` (+ `.sha256`).
+  - 50k variant available for heavier runs (`reachability-cache-50k.ndjson` + `.sha256`).
+- Policy baseline dataset: `docs/samples/policy/policy-delta-baseline.ndjson` (+ `.sha256`).
+- Policy overlay schema (30-001): the harness currently uses a deterministic synthetic mapping; update it when the official schema lands.

-## Proposed benchmarks
- Scenarios: cold cache, warm cache, mixed workload (70/30), parallel workers.
- Metrics: added latency per evaluation (p50/p95), cache hit ratio, CPU, memory.
- Determinism: fixed seed; deterministic request order; stable JSON output ordering.
+## Harness
+- Script: `src/Bench/StellaOps.Bench/PolicyCache/policy_cache_bench.py`.
+- Scenarios: cold cache, warm cache, mixed (70/30 warm/cold).
+- Metrics: throughput, p50/p95/p99 added latency per evaluation, RSS/managed MB, GC gen2, cache hit rate.
+- Inputs: policy baseline + reachability cache NDJSON.
+
+## Commands
+- 10k cache with baseline policies:
+  `python src/Bench/StellaOps.Bench/PolicyCache/policy_cache_bench.py --policies docs/samples/policy/policy-delta-baseline.ndjson --reachability-cache src/Bench/StellaOps.Bench/Signals/results/reachability-cache-10k.ndjson --output src/Bench/StellaOps.Bench/PolicyCache/results/policy-cache.ndjson --seed 20250101 --threads 1`
+- To stress the larger dataset, point `--reachability-cache` at `reachability-cache-50k.ndjson` instead.

 ## Acceptance
- Reference to reachability dataset hash from 26-001 once available.
- Config/sample command drafted for `src/Bench/StellaOps.Bench.Policy` (or shared).
+- Cache input and policy baseline present with hashes. ✅
+- Cold/warm/mixed runs emit NDJSON with sorted keys; cache hit rate captured. ✅
+- Outputs hashed locally (`policy-cache.ndjson.sha256`) and ready for perf dashboard ingestion. ✅

 ## Handoff
-Use this prep doc to satisfy PREP-BENCH-SIG-26-002-BLOCKED-ON-26-001-OUTPU. Update with dataset hash and schema references after 26-001 is done, then move to DONE and unblock BENCH-SIG-26-002.
+Run the command above with the cache outputs from BENCH-SIG-26-001, then compare added latency between the cold and warm runs. The mixed scenario should stay within target thresholds (p95 delta ≤ configured budget).
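The handoff comparison (cold vs warm, mixed p95 against a budget) can be scripted. The following is a minimal sketch and is not part of the harness. It assumes each line of `policy-cache.ndjson` carries hypothetical `scenario` (`cold`/`warm`/`mixed`) and `added_latency_ms` fields. The harness's actual output schema is not stated in this doc. The sketch computes nearest-rank p50/p95/p99 per scenario, reports the cold-minus-warm p95 gap, and checks the mixed scenario's p95 against a budget. Reading "p95 delta" as the mixed scenario's p95 added latency is also an assumption. The helpers `summarize`, `check_budget`, and `report` are illustrative names.

```python
# Sketch: summarize policy-cache benchmark NDJSON and check a p95 budget.
# ASSUMPTION: each result line carries hypothetical fields "scenario"
# ("cold" | "warm" | "mixed") and "added_latency_ms"; the real harness
# output schema may differ.
import json
from collections import defaultdict


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile (integer pct, 1..100) of a sorted, non-empty list."""
    n = len(sorted_values)
    rank = max(1, (pct * n + 99) // 100)  # integer ceil(pct * n / 100), no float error
    return sorted_values[rank - 1]


def summarize(lines):
    """Group added latency by scenario and compute n, p50, p95, p99."""
    by_scenario = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the NDJSON stream
        record = json.loads(line)
        by_scenario[record["scenario"]].append(float(record["added_latency_ms"]))
    summary = {}
    for scenario in sorted(by_scenario):  # stable, sorted scenario order
        values = sorted(by_scenario[scenario])
        summary[scenario] = {
            "n": len(values),
            "p50": nearest_rank(values, 50),
            "p95": nearest_rank(values, 95),
            "p99": nearest_rank(values, 99),
        }
    return summary


def check_budget(summary, budget_ms):
    """Return (cold p95 minus warm p95, whether mixed p95 is within budget).

    ASSUMPTION: "p95 delta" is read as the mixed scenario's p95 added latency.
    """
    cold_minus_warm = summary["cold"]["p95"] - summary["warm"]["p95"]
    return cold_minus_warm, summary["mixed"]["p95"] <= budget_ms


def report(path, budget_ms):
    """Read a results file and return a JSON report with sorted keys."""
    with open(path, encoding="utf-8") as fh:
        summary = summarize(fh)
    gap, within_budget = check_budget(summary, budget_ms)
    return json.dumps(
        {
            "scenarios": summary,
            "cold_minus_warm_p95_ms": gap,
            "mixed_within_budget": within_budget,
        },
        sort_keys=True,
    )
```

For example, `report("src/Bench/StellaOps.Bench/PolicyCache/results/policy-cache.ndjson", budget_ms)` returns a sorted-key JSON string, where `budget_ms` is whatever budget is configured for the run. The sketch uses nearest-rank percentiles because they avoid interpolation, so every reported percentile is a latency that was actually observed.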