prep docs and service updates

This commit is contained in:
master
2025-11-21 06:56:36 +00:00
parent ca35db9ef4
commit d519782a8f
242 changed files with 17293 additions and 13367 deletions


@@ -0,0 +1,31 @@
# Bench Prep — PREP-BENCH-GRAPH-21-001 (Graph API/Indexer harness)
Status: **Ready for implementation** (2025-11-20)
Owners: Bench Guild · Graph Platform Guild
Scope: Build deterministic Graph benchmark harness for 50k/100k node fixtures measuring API/Indexer latency, memory, and tile cache hit rates.
## Fixtures
- Use SAMPLES-GRAPH-24-003 (40–50k) and extend to 100k via duplication with new ids; store under `docs/samples/graph/50k.ndjson` and `100k.ndjson` with `.sha256` hashes.
- Node ordering deterministic; timestamps fixed to `2025-01-01T00:00:00Z`.
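The duplication step above can be sketched in Python (illustrative only — the real harness is .NET, and the `id`/`timestamp` field names are assumptions about the fixture schema):

```python
import hashlib
import json

def extend_fixture(lines_50k, suffix="-dup"):
    """Duplicate a 50k-node fixture to 100k by re-emitting every record
    with a new id; ordering stays deterministic (originals first, copies second)."""
    out = list(lines_50k)
    for line in lines_50k:
        rec = json.loads(line)
        rec["id"] = str(rec["id"]) + suffix  # 'id' field name is an assumption
        # keep timestamps pinned so reruns hash identically
        rec["timestamp"] = "2025-01-01T00:00:00Z"
        out.append(json.dumps(rec, sort_keys=True))
    return out

def sha256_of(lines):
    """Hash the exact NDJSON bytes that would be written next to the fixture."""
    data = ("\n".join(lines) + "\n").encode("utf-8")
    return hashlib.sha256(data).hexdigest()
```

Because ordering and timestamps are fixed, `sha256_of` returns the same digest on every rerun, which is what the `.sha256` companion files are meant to pin down.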
## Harness plan (project: `src/Bench/StellaOps.Bench.GraphApi`)
- Scenarios (repeat 5x; report median/p95):
1. **Viewport fetch**: `/v1/graph/tiles?bbox=<seed>` — measure server latency + tile count.
2. **Path query**: `/v1/graph/path?from=...&to=...` — latency + hops + cache hits.
3. **Overlay apply**: apply policy overlay to 1k nodes; measure apply time and index rebuild cost.
4. **Cold vs warm cache**: run viewport + path with cache cold then warm; capture hit rate.
- Metrics captured as NDJSON per run: `{ scenario, fixture, pass: cold|warm, medianMs, p95Ms, maxMs, rssMb, managedMb, cacheHitRate }` plus start/end UTC timestamps.
- Determinism: fixed seed (`GRAPH_BENCH_SEED=2025-01-01T00:00:00Z`); single-thread option `--threads 1` for reproducibility; clear caches between cold/warm phases.
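A minimal Python sketch of folding the five repeats into one NDJSON result line (field names follow the metrics list above; the memory fields `rssMb`/`managedMb` are omitted here, and the nearest-rank percentile is an assumed convention):

```python
import json
import math

def percentile(samples, q):
    """Nearest-rank percentile over a small sample set (e.g. 5 repeats)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(scenario, fixture, phase, latencies_ms, cache_hits, cache_total):
    """Fold repeat latencies into one NDJSON line with stable key order."""
    return json.dumps({
        "scenario": scenario,
        "fixture": fixture,
        "pass": phase,  # "cold" or "warm"
        "medianMs": percentile(latencies_ms, 50),
        "p95Ms": percentile(latencies_ms, 95),
        "maxMs": max(latencies_ms),
        "cacheHitRate": cache_hits / cache_total if cache_total else None,
    }, sort_keys=True)
```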
## Outputs
- Store under `out/bench/graph/api/{runId}/results.ndjson` with `.sha256`.
- Optional summary CSV derived from the NDJSON; no dynamic wall-clock values in filenames beyond the runId.
## Acceptance criteria
- Harness runs offline against local fixtures; no external calls.
- Median/p95 for each scenario produced for both 50k and 100k fixtures; cache hit rate recorded where applicable.
- Re-running with same seed/fixtures yields identical NDJSON (apart from RSS variance).
## Next steps
- Generate fixtures + hashes; wire CLI entry `dotnet run -- graph-api --fixture docs/samples/graph/50k.ndjson --seed 20250101`.
- Add perf dashboard hook if available; otherwise publish artifacts under `out/bench/graph/api/latest/`.


@@ -0,0 +1,38 @@
# Bench Prep — PREP-BENCH-GRAPH-21-002 (UI headless graph benchmarks)
Status: **Ready for implementation** (2025-11-20)
Owners: Bench Guild · UI Guild
Scope: Define the Playwright-based UI benchmark that rides on the graph harness from BENCH-GRAPH-21-001 (50k/100k node fixtures) and produces deterministic latency/FPS metrics.
## Dependencies
- Harness + fixtures from BENCH-GRAPH-21-001 (must expose HTTP endpoints and data seeds for 50k/100k graphs).
- Graph API/Indexer stable query contract (per `docs/modules/graph/architecture.md`).
## Benchmark plan
- Runner: Playwright (Chromium, headless) driven via `src/Bench/StellaOps.Bench.GraphUi`.
- Environment:
- Viewport: 1920x1080, device scale 1.0, throttling disabled; CPU pinned via `--disable-features=CPUThrottling`.
- Fixed session seed `GRAPH_BENCH_SEED=2025-01-01T00:00:00Z` for RNG use in camera jitter.
- Scenarios (each repeated 5x, median + p95 recorded):
1. **Canvas load**: open `/graph/bench?fixture=50k` → measure TTI, first contentful paint, tiles loaded count.
2. **Pan/zoom loop**: pan 500px x 20 iterations + zoom in/out (2x each) → record average FPS and frame jank percentage.
3. **Path query**: submit shortest-path query between two seeded nodes → measure query latency (client + API) and render latency.
  4. **Filter drill-down**: apply two filters (severity=high, product="core") → measure time to filtered render + memory delta.
- Metrics captured to NDJSON per run:
- `timestampUtc`, `scenario`, `fixture`, `p95_ms`, `median_ms`, `avg_fps`, `jank_pct`, `mem_mb`, `api_latency_ms` (where applicable).
- Determinism:
  - All timestamps recorded in UTC ISO-8601; RNG seeded; cache cleared before each scenario; adaptive throttling disabled via `--disable-features=UseAFH`.
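How `avg_fps` and `jank_pct` could be derived from per-frame durations captured during the pan/zoom loop — an illustrative Python sketch, with the jank threshold (twice the 60 Hz frame budget) as an assumption to tune against real traces:

```python
def frame_stats(frame_times_ms):
    """Compute average FPS and jank percentage from frame durations.
    A frame counts as 'janky' if it exceeds ~2x the 16.7 ms budget
    (assumed threshold, not a Playwright-defined metric)."""
    total_s = sum(frame_times_ms) / 1000.0
    avg_fps = len(frame_times_ms) / total_s
    jank_threshold_ms = 2 * (1000.0 / 60.0)  # ~33.3 ms
    janky = sum(1 for t in frame_times_ms if t > jank_threshold_ms)
    jank_pct = 100.0 * janky / len(frame_times_ms)
    return avg_fps, jank_pct
```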
## Outputs
- NDJSON benchmark results stored under `out/bench/graph/ui/{runId}.ndjson` with a `.sha256` alongside.
- Summary CSV optional, derived from NDJSON for reporting only.
- CI step publishes artifacts to `out/bench/graph/ui/latest/` with write-once semantics per runId.
## Acceptance criteria
- Playwright suite reproducibly exercises the four scenarios on 50k and 100k fixtures with seeded inputs.
- Metrics include p95 and median for each scenario and fixture size; FPS ≥ 30 on 50k fixture baseline.
- Archive outputs are deterministic for given fixture and seed (excluding wall-clock timestamps in filenames; embed timestamps only in content).
## Next steps
- Wire Playwright harness into `BENCH-GRAPH-21-001` pipeline once fixtures ready.
- Hook results into perf dashboard if available; otherwise store NDJSON + hashes.


@@ -0,0 +1,31 @@
# Bench Prep — PREP-BENCH-IMPACT-16-001 (ImpactIndex dataset/replay)
Status: **Ready for implementation** (2025-11-20)
Owners: Bench Guild · Scheduler Team
Scope: Provide deterministic dataset + replay plan for ImpactIndex throughput benchmark (resolve 10k productKeys; measure latency/throughput/memory).
## Inputs/dataset
- Snapshot file: `bench/impactindex/products-10k.ndjson` (10,000 productKeys, shuffled once with seed `2025-01-01T00:00:00Z`).
- Each line: `{ "productKey": "pkg:<ecosystem>/<name>@<version>", "tenant": "bench" }`.
- Include checksum file `products-10k.ndjson.sha256` and drop into repo under `docs/samples/impactindex/`.
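Generating the snapshot deterministically can be sketched as follows (Python for illustration; the real harness is .NET, and feeding the seed string directly to the RNG is one possible convention):

```python
import hashlib
import json
import random

def build_snapshot(product_keys, seed="2025-01-01T00:00:00Z"):
    """Shuffle once with the fixed string seed and emit one NDJSON line
    per productKey; reruns with the same seed reproduce the byte stream."""
    rng = random.Random(seed)
    keys = list(product_keys)
    rng.shuffle(keys)
    lines = [json.dumps({"productKey": k, "tenant": "bench"}, sort_keys=True)
             for k in keys]
    body = "\n".join(lines) + "\n"
    checksum = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return body, checksum
```

The returned checksum is what would land in `products-10k.ndjson.sha256`.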
## Benchmark procedure
- Harness location: `src/Bench/StellaOps.Bench.ImpactIndex`.
- Warmup: 1k lookups (excluded from metrics) to trigger caches.
- Run: process all 10k productKeys twice (cold, warm). Record per-pass statistics.
- Metrics to capture (per pass):
- `throughput_items_per_sec`, `p95_ms`, `p99_ms`, `max_ms` for lookups.
- `rss_mb`, `managed_mb`, `gc_gen2_count` from .NET counters.
- `cache_hit_rate` if cache present.
- Output format: NDJSON; one object per pass with fields `{ pass: "cold"|"warm", startedAtUtc, durationMs, throughput, p95Ms, p99Ms, maxMs, rssMb, managedMb, gcGen2, cacheHitRate }`.
- Determinism: fixed seed; single-thread flag `--threads 1` for reproducibility; timestamps in UTC ISO-8601.
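A Python sketch of one timed pass (illustrative only — the real harness is .NET; `lookup` stands in for the actual ImpactIndex resolve call, and the `startedAtUtc`, memory, and GC fields are left out):

```python
import json
import time

def run_pass(name, product_keys, lookup):
    """Time one full pass over all keys and emit the per-pass NDJSON
    object described above (timing fields only)."""
    per_item_ms = []
    started = time.monotonic()
    for key in product_keys:
        t0 = time.monotonic()
        lookup(key)
        per_item_ms.append((time.monotonic() - t0) * 1000.0)
    duration_ms = (time.monotonic() - started) * 1000.0
    ordered = sorted(per_item_ms)
    return json.dumps({
        "pass": name,  # "cold" or "warm"
        "durationMs": duration_ms,
        "throughput": len(product_keys) / (duration_ms / 1000.0),
        "p95Ms": ordered[int(0.95 * (len(ordered) - 1))],
        "p99Ms": ordered[int(0.99 * (len(ordered) - 1))],
        "maxMs": ordered[-1],
    }, sort_keys=True)
```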
## Acceptance criteria
- Dataset and checksum published; harness reads from local sample path (no network).
- Benchmark run produces deterministic NDJSON for a given seed and hardware profile; differences are limited to RSS variability within ±5%.
- Cold vs warm pass metrics logged; throughput target ≥ 2k items/sec on reference hardware, p95 ≤ 25 ms.
## Next steps
- Commit dataset + checksum under `docs/samples/impactindex/`.
- Wire harness CLI (`dotnet run -- impactindex --input docs/samples/impactindex/products-10k.ndjson --threads 1 --seed 20250101`).
- Surface metrics to perf dashboard once harness lands; otherwise store under `out/bench/impactindex/` with hashes.


@@ -0,0 +1,35 @@
# Bench Prep — PREP-BENCH-POLICY-20-002 (Policy delta benchmark)
Status: **Ready for implementation** (2025-11-20)
Owners: Bench Guild · Policy Guild · Scheduler Guild
Scope: Provide deterministic inputs and harness expectations to measure delta policy evaluation vs full runs.
## Goals
- Compare delta evaluation (incremental changes) against full evaluation over the same dataset.
- Capture throughput, latency (p50/p95/p99), and memory/GC impact under deterministic conditions.
## Dataset
- Baseline snapshot: `docs/samples/policy/policy-delta-baseline.ndjson`
- 5,000 records of `{ "tenant": "bench", "policyId": "pol-<0001..5000>", "package": "bench.pkg.<n>", "version": "1.0.<n>", "decision": "allow|deny", "factors": { ... } }`
- Deterministic ordering; SHA256 file saved as `policy-delta-baseline.ndjson.sha256`.
- Delta patch: `docs/samples/policy/policy-delta-changes.ndjson`
- 500 changes mixing updates/inserts/deletes (encoded with `op`: "upsert"|"delete").
- Sorted by `policyId` then `op` for deterministic replay.
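Applying the patch deterministically can be sketched as (Python for illustration; record shapes follow the dataset description above, and the real harness is .NET):

```python
import json

def apply_delta(baseline, delta_lines):
    """Apply the sorted upsert/delete patch to an in-memory policy map
    keyed by policyId; input order is already deterministic per the prep."""
    store = {rec["policyId"]: rec for rec in baseline}
    for line in delta_lines:
        change = json.loads(line)
        if change["op"] == "delete":
            store.pop(change["policyId"], None)
        else:  # "upsert"
            rec = {k: v for k, v in change.items() if k != "op"}
            store[change["policyId"]] = rec
    # return records in policyId order so replays hash identically
    return [store[k] for k in sorted(store)]
```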
## Harness plan (to be built under `src/Bench/StellaOps.Bench.Policy`)
- Run 1 (Full): load baseline snapshot, evaluate full policy set; record metrics.
- Run 2 (Delta): apply delta patch to in-memory store, run incremental evaluation; record metrics.
- Metrics captured to NDJSON per run:
- `{ run: "full"|"delta", startedAtUtc, durationMs, evaluationsPerSec, p50Ms, p95Ms, p99Ms, rssMb, managedMb, gcGen2 }`
- Determinism:
  - Use the fixed random seed `2025-01-01` for any shuffling; pass the single-thread flag `--threads 1` when reproducibility is needed.
- All timestamps in UTC ISO-8601; output NDJSON sorted by `run`.
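The acceptance comparison between the two run records can be sketched as (illustrative Python; field names taken from the NDJSON shape above):

```python
import json

def check_delta_speedup(results_ndjson):
    """Read the two run records (sorted by 'run' per the plan) and check
    the acceptance condition: the delta run beats the full run on duration."""
    runs = {rec["run"]: rec for rec in map(json.loads, results_ndjson)}
    return runs["delta"]["durationMs"] < runs["full"]["durationMs"]
```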
## Acceptance criteria
- Baseline + delta sample files and SHA256 hashes present under `docs/samples/policy/`.
- Harness reads only local files, no network dependencies; replays produce consistent NDJSON for given hardware.
- Delta run shows reduced duration vs full run; metrics captured for both p95/p99 and throughput.
## Next steps
- Add sample files + hashes to `docs/samples/policy/` (can be generated with fixed seed).
- Implement harness CLI wrapper `dotnet run -- policy-delta --baseline <path> --delta <path> [--threads 1]` writing outputs to `out/bench/policy/` with `.sha256`.


@@ -0,0 +1,23 @@
# Reachability Scoring Bench Prep — PREP-BENCH-SIG-26-001-REACHABILITY-SCHEMA-FIX
Status: Draft (2025-11-20)
Owners: Bench Guild · Signals Guild
Scope: Define the inputs/fixtures for reachability scoring benchmarks pending schema freeze (Sprint 0400/0401).
## Dependencies
- Reachability schema for runtime/static signals (Sprint 0400/0401).
- Sample callgraph/runtime traces sized for 10k/50k functions.
## Proposed harness
- Project: `src/Bench/StellaOps.Bench.Signals` (or shared bench harness if preferred).
- Inputs: callgraph NDJSON + runtime traces; config with seed, concurrency, batch size.
- Metrics: facts/sec, p95 latency, peak RSS, cache hit ratio; output NDJSON with sorted records.
- Determinism: fixed seed; process inputs in lexical order; stable JSON property order.
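Stable JSON property order plus a stable record order can be sketched as (Python for illustration; the record keys are placeholders, and sorting serialized lines lexically is one possible convention for byte-identical reruns):

```python
import json

def emit_sorted_ndjson(records):
    """Serialize metric records with sorted keys and a sorted record
    order so two runs over the same inputs produce identical bytes."""
    canonical = [json.dumps(r, sort_keys=True, separators=(",", ":"))
                 for r in records]
    return "\n".join(sorted(canonical)) + "\n"
```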
## Acceptance
- Schema hash referenced once Sprint 0400/0401 publishes; placeholder noted until then.
- Sample config + command documented.
- File paths for sample fixtures under `docs/samples/signals/` once available.
## Handoff
Use this prep doc to satisfy PREP-BENCH-SIG-26-001-REACHABILITY-SCHEMA-FIX. Update with schema hash and fixtures when published; then move the task to DONE and unblock BENCH-SIG-26-001 implementation.


@@ -0,0 +1,21 @@
# Policy Eval with Reachability Cache Prep — PREP-BENCH-SIG-26-002-BLOCKED-ON-26-001-OUTPU
Status: Draft (2025-11-20)
Owners: Bench Guild · Policy Guild
Scope: Capture prep for measuring policy evaluation overhead with reachability cache hot/cold, dependent on 26-001 outputs.
## Dependencies
- Bench outputs from 26-001 (reachability scoring harness) providing cached datasets.
- Policy overlay schema (30-001) for status fields.
## Proposed benchmarks
- Scenarios: cold cache, warm cache, mixed workload (70/30), parallel workers.
- Metrics: added latency per evaluation (p50/p95), cache hit ratio, CPU, memory.
- Determinism: fixed seed; deterministic request order; stable JSON output ordering.
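The deterministic 70/30 mixed request order could be built like this (illustrative Python; the `mixed_workload` helper is hypothetical and not part of any existing harness):

```python
import random

def mixed_workload(hot_keys, cold_keys, n, seed="2025-01-01"):
    """Build a deterministic mixed request order: ~70% of requests hit
    the warm (cached) key set, ~30% the cold set; the fixed seed keeps
    replays byte-for-byte stable."""
    rng = random.Random(seed)
    order = []
    for _ in range(n):
        pool = hot_keys if rng.random() < 0.7 else cold_keys
        order.append(pool[rng.randrange(len(pool))])
    return order
```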
## Acceptance
- Reference to reachability dataset hash from 26-001 once available.
- Config/sample command drafted for `src/Bench/StellaOps.Bench.Policy` (or shared).
## Handoff
Use this prep doc to satisfy PREP-BENCH-SIG-26-002-BLOCKED-ON-26-001-OUTPU. Update with dataset hash and schema references after 26-001 is done, then move to DONE and unblock BENCH-SIG-26-002.