up

2025-12-12 09:35:37 +02:00
parent ce5ec9c158
commit efaf3cb789
238 changed files with 146274 additions and 5767 deletions
--- a/docs/benchmarks/impact/bench-impact-16-001-prep.md
+++ b/docs/benchmarks/impact/bench-impact-16-001-prep.md
@@ -1,16 +1,16 @@
 # Bench Prep — PREP-BENCH-IMPACT-16-001 (ImpactIndex dataset/replay)

-Status: **Ready for implementation** (2025-11-20)
+Status: **Ready for execution** (2025-12-11)
 Owners: Bench Guild · Scheduler Team
 Scope: Provide deterministic dataset + replay plan for ImpactIndex throughput benchmark (resolve 10k productKeys; measure latency/throughput/memory).

 ## Inputs/dataset
- Snapshot file: `bench/impactindex/products-10k.ndjson` (10,000 productKeys, shuffled once with seed `2025-01-01T00:00:00Z`).
+- Snapshot file: `docs/samples/impactindex/products-10k.ndjson` (10,000 productKeys, shuffled once with seed `2025-01-01T00:00:00Z`).
+- SHA256: `caa79c83b5a9affc3b9cc4e54a516281ddceff4804ce853fee3b62d7afb7ab69` (`products-10k.ndjson.sha256` included).
 - Each line: `{ "productKey": "pkg:<ecosystem>/<name>@<version>", "tenant": "bench" }`.
- Include checksum file `products-10k.ndjson.sha256` and drop into repo under `docs/samples/impactindex/`.

 ## Benchmark procedure
- Harness location: `src/Bench/StellaOps.Bench.ImpactIndex`.
+- Harness location: `src/Bench/StellaOps.Bench/ImpactIndex/impact_index_bench.py`.
 - Warmup: 1k lookups (excluded from metrics) to trigger caches.
 - Run: process all 10k productKeys twice (cold, warm). Record per-pass statistics.
 - Metrics to capture (per pass):
@@ -21,11 +21,10 @@ Scope: Provide deterministic dataset + replay plan for ImpactIndex throughput be
 - Determinism: fixed seed, single-threaded option flag `--threads 1` for reproducibility; timestamps in UTC ISO-8601.

 ## Acceptance criteria
- Dataset and checksum published; harness reads from local sample path (no network).
- Benchmark run produces deterministic NDJSON for given seed and hardware profile; differences limited to RSS variability but within ±5%.
- Cold vs warm pass metrics logged; throughput target ≥ 2k items/sec on reference hardware, p95 ≤ 25 ms.
+- Dataset and checksum published; harness reads from local sample path (no network). ✅
+- Benchmark run produces deterministic NDJSON for given seed and hardware profile; differences limited to ±5%.
+- Cold vs warm pass metrics logged; throughput target ≥ 2k items/sec on reference hardware, p95 ≤ 25 ms.

 ## Next steps
- Commit dataset + checksum under `docs/samples/impactindex/`.
- Wire harness CLI (`dotnet run -- impactindex --input docs/samples/impactindex/products-10k.ndjson --threads 1 --seed 20250101`).
- Surface metrics to perf dashboard once harness lands; otherwise store under `out/bench/impactindex/` with hashes.
+- Harness command: `python src/Bench/StellaOps.Bench/ImpactIndex/impact_index_bench.py --input docs/samples/impactindex/products-10k.ndjson --output src/Bench/StellaOps.Bench/ImpactIndex/results/impactindex.ndjson --threads 1 --seed 20250101`.
+- Surface metrics to perf dashboard once harness lands; otherwise store under `out/bench/impactindex/` with hashes (`results/impactindex.ndjson.sha256` present).
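
The checksum sidecars referenced in this prep doc can be checked before a run. The following is a minimal sketch, assuming each `.sha256` file stores the hex digest as its first whitespace-separated token (the layout `sha256sum` writes). The helper names `sha256_of` and `verify_sidecar` are illustrative and are not part of the harness.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sidecar(data_path: Path) -> bool:
    """Compare a file against `<name>.sha256` stored next to it.

    Assumption: the sidecar's first token is the hex digest.
    """
    sidecar = data_path.with_name(data_path.name + ".sha256")
    expected = sidecar.read_text(encoding="utf-8").split()[0].lower()
    return sha256_of(data_path) == expected
```

For example, `verify_sidecar(Path("docs/samples/impactindex/products-10k.ndjson"))` would be expected to return `True` when the dataset matches its published digest.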
--- a/docs/benchmarks/policy/bench-policy-20-002-prep.md
+++ b/docs/benchmarks/policy/bench-policy-20-002-prep.md
@@ -1,6 +1,6 @@
 # Bench Prep — PREP-BENCH-POLICY-20-002 (Policy delta benchmark)

-Status: **Ready for implementation** (2025-11-20)
+Status: **Ready for execution** (2025-12-11)
 Owners: Bench Guild · Policy Guild · Scheduler Guild
 Scope: Provide deterministic inputs and harness expectations to measure delta policy evaluation vs full runs.

@@ -11,12 +11,12 @@ Scope: Provide deterministic inputs and harness expectations to measure delta po
 ## Dataset
 - Baseline snapshot: `docs/samples/policy/policy-delta-baseline.ndjson`
  - 5,000 records of `{ "tenant": "bench", "policyId": "pol-<0001..5000>", "package": "bench.pkg.<n>", "version": "1.0.<n>", "decision": "allow|deny", "factors": { ... } }`
-  - Deterministic ordering; SHA256 file saved as `policy-delta-baseline.ndjson.sha256`.
+  - Deterministic ordering; SHA256 `40ca9ee15065a9e16f51a259d3feec778203ab461db2af3bf196f5fcd9f0d590` (`policy-delta-baseline.ndjson.sha256`).
 - Delta patch: `docs/samples/policy/policy-delta-changes.ndjson`
  - 500 changes mixing updates/inserts/deletes (encoded with `op`: "upsert"|"delete").
-  - Sorted by `policyId` then `op` for deterministic replay.
+  - Sorted by `policyId` then `op` for deterministic replay; SHA256 `7f9d7f124830b9fe4d3f232b4cc7e2e728be2ef725e8a66606b9e95682bf6318` (`policy-delta-changes.ndjson.sha256`).

-## Harness plan (to be built under `src/Bench/StellaOps.Bench.Policy`)
+## Harness plan (implemented under `src/Bench/StellaOps.Bench/PolicyDelta/policy_delta_bench.py`)
 - Run 1 (Full): load baseline snapshot, evaluate full policy set; record metrics.
 - Run 2 (Delta): apply delta patch to in-memory store, run incremental evaluation; record metrics.
 - Metrics captured to NDJSON per run:
@@ -31,5 +31,5 @@ Scope: Provide deterministic inputs and harness expectations to measure delta po
 - Delta run shows reduced duration vs full run; metrics captured for both p95/p99 and throughput.

 ## Next steps
- Add sample files + hashes to `docs/samples/policy/` (can be generated with fixed seed).
- Implement harness CLI wrapper `dotnet run -- policy-delta --baseline <path> --delta <path> [--threads 1]` writing outputs to `out/bench/policy/` with `.sha256`.
+- Harness CLI: `python src/Bench/StellaOps.Bench/PolicyDelta/policy_delta_bench.py --baseline docs/samples/policy/policy-delta-baseline.ndjson --delta docs/samples/policy/policy-delta-changes.ndjson --output src/Bench/StellaOps.Bench/PolicyDelta/results/policy-delta.ndjson --threads 1 --seed 20250101`.
+- Results hashed at `src/Bench/StellaOps.Bench/PolicyDelta/results/policy-delta.ndjson.sha256`.
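
The delta run applies `upsert` and `delete` operations to an in-memory store. A rough sketch of that step follows, assuming each delta line is a JSON object carrying `op` and `policyId`, and that an `upsert` line carries the full replacement record. The function name `apply_delta` and the dict-based store are hypothetical; the actual harness may differ.

```python
import json


def apply_delta(store: dict, delta_lines) -> dict:
    """Apply NDJSON delta lines to a store keyed by policyId.

    Lines are applied in the order given; the delta file is described
    as pre-sorted by policyId then op, so no re-sorting happens here.
    """
    for line in delta_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        change = json.loads(line)
        policy_id = change["policyId"]
        op = change["op"]
        if op == "delete":
            store.pop(policy_id, None)
        elif op == "upsert":
            # Keep every field except the op marker (an assumption).
            store[policy_id] = {k: v for k, v in change.items() if k != "op"}
        else:
            raise ValueError(f"unknown op: {op!r}")
    return store
```

Keeping the store keyed by `policyId` makes both operations constant-time, which keeps the cost of applying the patch small relative to the evaluation being measured.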
--- a/docs/benchmarks/signals/bench-sig-26-001-prep.md
+++ b/docs/benchmarks/signals/bench-sig-26-001-prep.md
@@ -1,23 +1,31 @@
 # Reachability Scoring Bench Prep — PREP-BENCH-SIG-26-001-REACHABILITY-SCHEMA-FIX

-Status: Draft (2025-11-20)
-Owners: Bench Guild · Signals Guild
-Scope: Define the inputs/fixtures for reachability scoring benchmarks pending schema freeze (Sprint 0400/0401).
+Status: Ready for execution (2025-12-11)
+Owners: Bench Guild · Signals Guild
+Scope: Define inputs/fixtures and schema for reachability scoring benchmarks (10k/50k functions) to unblock BENCH-SIG-26-001.

 ## Dependencies
- Reachability schema for runtime/static signals (Sprint 0400/0401).
+- Reachability schema hash captured locally for synthetic fixtures.
 - Sample callgraph/runtime traces sized for 10k/50k functions.

-## Proposed harness
- Project: `src/Bench/StellaOps.Bench.Signals` (or shared bench harness if preferred).
- Inputs: callgraph NDJSON + runtime traces; config with seed, concurrency, batch size.
- Metrics: facts/sec, p95 latency, peak RSS, cache hit ratio; output NDJSON with sorted records.
- Determinism: fixed seed; process inputs in lexical order; stable JSON property order.
+## Harness
+- Project: `src/Bench/StellaOps.Bench/Signals/reachability_bench.py`.
+- Inputs:
+  - Callgraph: `docs/samples/signals/reachability/callgraph-10k.ndjson` (`callgraph-10k.ndjson.sha256`).
+  - Runtime traces: `docs/samples/signals/reachability/runtime-10k.ndjson` (`runtime-10k.ndjson.sha256`).
+  - 50k variants under the same directory (`callgraph-50k.ndjson`, `runtime-50k.ndjson` + `.sha256`).
+- Schema: `docs/benchmarks/signals/reachability-schema.json` (sha256 `aaa5c8ab5cc2fe91e50976fafd8c73597387ab9a881af6d5d9818d202beba24e`).
+- Metrics: facts/sec, p50/p95/p99 per-node latency, peak RSS, managed MB, GC gen2.
+- Output: metrics NDJSON + cache NDJSON with reachability flags for each function (consumed by BENCH-SIG-26-002).

 ## Acceptance
- Schema hash referenced once Sprint 0400/0401 publishes; placeholder noted until then.
- Sample config + command documented.
- File paths for sample fixtures under `docs/samples/signals/` once available.
+- Schema hash recorded and referenced. ✅
+- Sample fixtures published under `docs/samples/signals/reachability/` for 10k/50k. ✅
+- Deterministic harness command documented; outputs written locally with `.sha256` hashes. ✅
+
+## Commands
+- 10k: `python src/Bench/StellaOps.Bench/Signals/reachability_bench.py --callgraph docs/samples/signals/reachability/callgraph-10k.ndjson --runtime docs/samples/signals/reachability/runtime-10k.ndjson --output src/Bench/StellaOps.Bench/Signals/results/reachability-metrics-10k.ndjson --cache-output src/Bench/StellaOps.Bench/Signals/results/reachability-cache-10k.ndjson --threads 1 --seed 20250101`
+- 50k: swap `10k` for `50k` in the command above (`reachability-*-50k.ndjson`).

 ## Handoff
-Use this prep doc to satisfy PREP-BENCH-SIG-26-001-REACHABILITY-SCHEMA-FIX. Update with schema hash and fixtures when published; then move the task to DONE and unblock BENCH-SIG-26-001 implementation.
+Use these fixtures + commands to run BENCH-SIG-26-001. Cache outputs (`reachability-cache-*.ndjson`) feed BENCH-SIG-26-002 for policy evaluation overhead measurements.
--- a/docs/benchmarks/signals/bench-sig-26-002-prep.md
+++ b/docs/benchmarks/signals/bench-sig-26-002-prep.md
@@ -1,21 +1,31 @@
 # Policy Eval with Reachability Cache Prep — PREP-BENCH-SIG-26-002-BLOCKED-ON-26-001-OUTPU

-Status: Draft (2025-11-20)
-Owners: Bench Guild · Policy Guild
-Scope: Capture prep for measuring policy evaluation overhead with reachability cache hot/cold, dependent on 26-001 outputs.
+Status: Ready for execution (2025-12-11)
+Owners: Bench Guild · Policy Guild
+Scope: Measure policy evaluation overhead with reachability cache hot/cold/mixed scenarios using outputs from BENCH-SIG-26-001.

 ## Dependencies
- Bench outputs from 26-001 (reachability scoring harness) providing cached datasets.
- Policy overlay schema (30-001) for status fields.
+- Reachability cache NDJSON from BENCH-SIG-26-001:
+  - `src/Bench/StellaOps.Bench/Signals/results/reachability-cache-10k.ndjson` (`.sha256`).
+  - 50k variant available for heavier runs (`reachability-cache-50k.ndjson` + `.sha256`).
+- Policy baseline dataset: `docs/samples/policy/policy-delta-baseline.ndjson` (+ `.sha256`).
+- Policy overlay schema (30-001) — using deterministic synthetic mapping in harness; update when official schema lands.

-## Proposed benchmarks
- Scenarios: cold cache, warm cache, mixed workload (70/30), parallel workers.
- Metrics: added latency per evaluation (p50/p95), cache hit ratio, CPU, memory.
- Determinism: fixed seed; deterministic request order; stable JSON output ordering.
+## Harness
+- Project: `src/Bench/StellaOps.Bench/PolicyCache/policy_cache_bench.py`.
+- Scenarios: cold cache, warm cache, mixed (70/30 warm/cold).
+- Metrics: throughput, p50/p95/p99 added latency per evaluation, RSS/managed MB, GC gen2, cache hit rate.
+- Inputs: policy baseline + reachability cache NDJSON.
+
+## Commands
+- 10k cache with baseline policies:
+  `python src/Bench/StellaOps.Bench/PolicyCache/policy_cache_bench.py --policies docs/samples/policy/policy-delta-baseline.ndjson --reachability-cache src/Bench/StellaOps.Bench/Signals/results/reachability-cache-10k.ndjson --output src/Bench/StellaOps.Bench/PolicyCache/results/policy-cache.ndjson --seed 20250101 --threads 1`
+- Swap cache path to `reachability-cache-50k.ndjson` to stress the larger dataset.

 ## Acceptance
- Reference to reachability dataset hash from 26-001 once available.
- Config/sample command drafted for `src/Bench/StellaOps.Bench.Policy` (or shared).
+- Cache input and policy baseline present with hashes. ✅
+- Cold/warm/mixed runs emit NDJSON with sorted keys; cache hit rate captured. ✅
+- Outputs hashed locally (`policy-cache.ndjson.sha256`) and ready for perf dashboard ingestion. ✅

 ## Handoff
-Use this prep doc to satisfy PREP-BENCH-SIG-26-002-BLOCKED-ON-26-001-OUTPU. Update with dataset hash and schema references after 26-001 is done, then move to DONE and unblock BENCH-SIG-26-002.
+Use cache outputs from BENCH-SIG-26-001 to run the above command. Compare added latency between cold vs warm runs; mixed scenario should stay within target thresholds (p95 delta ≤ configured budget).
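
The handoff check (p95 delta within a configured budget) can be sketched as below. This assumes raw per-evaluation latency samples in milliseconds are available and uses the nearest-rank percentile definition. Which two scenarios are compared, the percentile method the harness uses, and the function names are all assumptions.

```python
def nearest_rank_percentile(samples, percent: int):
    """Nearest-rank percentile for an integer percent in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of percent * n / 100, avoiding float rounding.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


def p95_delta_within_budget(candidate_ms, reference_ms, budget_ms):
    """Return (delta, ok), where delta = p95(candidate) - p95(reference)."""
    delta = (nearest_rank_percentile(candidate_ms, 95)
             - nearest_rank_percentile(reference_ms, 95))
    return delta, delta <= budget_ms
```

A call such as `p95_delta_within_budget(mixed_ms, warm_ms, budget_ms=5)` would compare a mixed run against a warm run under a hypothetical 5 ms budget.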
--- a/docs/benchmarks/signals/reachability-schema.json
+++ b/docs/benchmarks/signals/reachability-schema.json
@@ -0,0 +1,32 @@
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "title": "StellaOps Reachability Bench Schema",
+  "description": "Schema for synthetic reachability bench callgraph/runtime fixtures.",
+  "type": "object",
+  "oneOf": [
+    {
+      "title": "Callgraph",
+      "required": ["function", "calls", "weight"],
+      "properties": {
+        "function": { "type": "string" },
+        "calls": {
+          "type": "array",
+          "items": { "type": "string" },
+          "minItems": 0
+        },
+        "weight": { "type": "integer", "minimum": 0 }
+      },
+      "additionalProperties": false
+    },
+    {
+      "title": "RuntimeTrace",
+      "required": ["function", "count", "timestamp"],
+      "properties": {
+        "function": { "type": "string" },
+        "count": { "type": "integer", "minimum": 0 },
+        "timestamp": { "type": "string", "format": "date-time" }
+      },
+      "additionalProperties": false
+    }
+  ]
+}
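
To illustrate what the two `oneOf` branches accept, here is a hand-rolled check that mirrors the schema using only the standard library. It is a sketch, not a JSON Schema validator: it does not check the `date-time` format, and it rejects floats such as `1.0` that a draft-07 validator would treat as integers. The names `classify_record` and `classify_ndjson` are illustrative.

```python
import json

CALLGRAPH_KEYS = {"function", "calls", "weight"}
RUNTIME_KEYS = {"function", "count", "timestamp"}


def _is_non_negative_int(value) -> bool:
    # bool is a subclass of int in Python but is not a JSON integer.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def classify_record(record) -> str:
    """Return "Callgraph" or "RuntimeTrace", or raise ValueError.

    Every listed property is required and extras are forbidden, so the
    key set must equal one branch's key set exactly.
    """
    if not isinstance(record, dict):
        raise ValueError("record must be a JSON object")
    keys = set(record)
    if keys == CALLGRAPH_KEYS:
        if (isinstance(record["function"], str)
                and isinstance(record["calls"], list)
                and all(isinstance(c, str) for c in record["calls"])
                and _is_non_negative_int(record["weight"])):
            return "Callgraph"
    elif keys == RUNTIME_KEYS:
        if (isinstance(record["function"], str)
                and _is_non_negative_int(record["count"])
                and isinstance(record["timestamp"], str)):
            return "RuntimeTrace"
    raise ValueError(f"record matches neither branch: {sorted(keys)}")


def classify_ndjson(lines):
    """Classify each non-blank NDJSON line."""
    return [classify_record(json.loads(line)) for line in lines if line.strip()]
```

Because the two branches require different key sets, a record can match at most one of them, which is what `oneOf` demands.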