chore: remove outdated documentation and prep notes
- Deleted several draft and prep documents related to benchmarks, authority DPoP & mTLS implementation, Java analyzer observation, link-not-merge determinism tests, replay operations, and crypto provider registry.
- Updated the merge semver playbook to reflect current database schema usage.
- Cleaned up the technical development README to remove references to obsolete documents and streamline guidance for contributors.
@@ -1,31 +0,0 @@
# Bench Prep — PREP-BENCH-GRAPH-21-001 (Graph API/Indexer harness)

Status: **Ready for implementation** (2025-11-20)
Owners: Bench Guild · Graph Platform Guild
Scope: Build deterministic Graph benchmark harness for 50k/100k node fixtures measuring API/Indexer latency, memory, and tile cache hit rates.

## Fixtures

- Use SAMPLES-GRAPH-24-003 (40–50k) and extend to 100k via duplication with new ids; store under `docs/samples/graph/50k.ndjson` and `100k.ndjson` with `.sha256` hashes.
- Node ordering deterministic; timestamps fixed to `2025-01-01T00:00:00Z`.

## Harness plan (project: `src/Bench/StellaOps.Bench.GraphApi`)

- Scenarios (repeat 5x; report median/p95):
  1. **Viewport fetch**: `/v1/graph/tiles?bbox=<seed>` — measure server latency + tile count.
  2. **Path query**: `/v1/graph/path?from=...&to=...` — latency + hops + cache hits.
  3. **Overlay apply**: apply policy overlay to 1k nodes; measure apply time and index rebuild cost.
  4. **Cold vs warm cache**: run viewport + path with cache cold then warm; capture hit rate.
- Metrics captured as NDJSON per run: `{ scenario, fixture, pass: cold|warm, medianMs, p95Ms, maxMs, rssMb, managedMb, cacheHitRate }` plus start/end UTC timestamps.
- Determinism: fixed seed (`GRAPH_BENCH_SEED=2025-01-01T00:00:00Z`); single-thread option `--threads 1` for reproducibility; clear caches between cold/warm phases.
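The per-run metrics line can be sketched as follows (a Python stand-in for the .NET harness; `metrics_record` and the nearest-rank percentile helper are illustrative, not part of the harness):

```python
import json
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile over a sorted copy of the samples."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

def metrics_record(scenario, fixture, cache_pass, latencies_ms, rss_mb, managed_mb, cache_hit_rate):
    """Build one NDJSON metrics object matching the documented schema."""
    return {
        "scenario": scenario,
        "fixture": fixture,
        "pass": cache_pass,  # "cold" | "warm"
        "medianMs": statistics.median(latencies_ms),
        "p95Ms": percentile(latencies_ms, 95),
        "maxMs": max(latencies_ms),
        "rssMb": rss_mb,
        "managedMb": managed_mb,
        "cacheHitRate": cache_hit_rate,
    }

# One JSON object per line; sorted keys keep re-runs byte-identical.
line = json.dumps(metrics_record("viewport", "50k", "warm", [4.0, 5.0, 6.0, 20.0], 512, 300, 0.93), sort_keys=True)
```

Emitting each record with `sort_keys=True` is one way to satisfy the determinism requirement: identical inputs then produce byte-identical NDJSON.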

## Outputs

- Store under `out/bench/graph/api/{runId}/results.ndjson` with `.sha256`.
- Summary CSV optional, derived from NDJSON; no dynamic wall-clock in filenames beyond runId.
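Writing the `.sha256` sidecar next to the results file can be as simple as this sketch (file names follow the layout above; the helper name is illustrative):

```python
import hashlib
from pathlib import Path

def write_sha256_sidecar(path: Path) -> str:
    """Hash the results file and write `<name>.sha256` next to it."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    sidecar = path.with_name(path.name + ".sha256")
    # Conventional `sha256sum` format: "<hex>  <filename>\n"
    sidecar.write_text(f"{digest}  {path.name}\n", encoding="utf-8")
    return digest
```

The two-space `sha256sum` format lets `sha256sum -c results.ndjson.sha256` verify the artifact offline.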

## Acceptance criteria

- Harness runs offline against local fixtures; no external calls.
- Median/p95 for each scenario produced for both 50k and 100k fixtures; cache hit rate recorded where applicable.
- Re-running with same seed/fixtures yields identical NDJSON (apart from RSS variance).

## Next steps

- Generate fixtures + hashes; wire CLI entry `dotnet run -- graph-api --fixture docs/samples/graph/50k.ndjson --seed 20250101`.
- Add perf dashboard hook if available; otherwise publish artifacts under `out/bench/graph/api/latest/`.
@@ -1,38 +0,0 @@
# Bench Prep — PREP-BENCH-GRAPH-21-002 (UI headless graph benchmarks)

Status: **Ready for implementation** (2025-11-20)
Owners: Bench Guild · UI Guild
Scope: Define the Playwright-based UI benchmark that rides on the graph harness from BENCH-GRAPH-21-001 (50k/100k node fixtures) and produces deterministic latency/FPS metrics.

## Dependencies

- Harness + fixtures from BENCH-GRAPH-21-001 (must expose HTTP endpoints and data seeds for 50k/100k graphs).
- Graph API/Indexer stable query contract (per `docs/modules/graph/architecture.md`).

## Benchmark plan

- Runner: Playwright (Chromium, headless) driven via `src/Bench/StellaOps.Bench.GraphUi`.
- Environment:
  - Viewport: 1920x1080, device scale 1.0, throttling disabled; CPU pinned via `--disable-features=CPUThrottling`.
  - Fixed session seed `GRAPH_BENCH_SEED=2025-01-01T00:00:00Z` for RNG use in camera jitter.
- Scenarios (each repeated 5x, median + p95 recorded):
  1. **Canvas load**: open `/graph/bench?fixture=50k` → measure TTI, first contentful paint, tiles loaded count.
  2. **Pan/zoom loop**: pan 500px x 20 iterations + zoom in/out (2x each) → record average FPS and frame jank percentage.
  3. **Path query**: submit shortest-path query between two seeded nodes → measure query latency (client + API) and render latency.
  4. **Filter drill-down**: apply two filters (severity=high, product="core") → measure time to filtered render + memory delta.
- Metrics captured to NDJSON per run:
  - `timestampUtc`, `scenario`, `fixture`, `p95_ms`, `median_ms`, `avg_fps`, `jank_pct`, `mem_mb`, `api_latency_ms` (where applicable).
- Determinism:
  - All timestamps recorded in UTC ISO-8601; RNG seeded; cache cleared before each scenario; `--disable-features=UseAFH` set to avoid adaptive throttling.
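The `avg_fps` and `jank_pct` figures can be derived from per-frame presentation timestamps captured in the page; a minimal sketch (the 50 ms jank threshold is an assumption of this example, not a harness constant):

```python
def fps_metrics(frame_times_ms, jank_threshold_ms=50.0):
    """Average FPS and jank percentage from per-frame presentation timestamps (ms)."""
    deltas = [b - a for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    if not deltas:
        return {"avg_fps": 0.0, "jank_pct": 0.0}
    # Average FPS = 1000 ms divided by the mean frame interval.
    avg_fps = 1000.0 / (sum(deltas) / len(deltas))
    # A frame counts as janky when its interval exceeds the threshold.
    jank_pct = 100.0 * sum(1 for d in deltas if d > jank_threshold_ms) / len(deltas)
    return {"avg_fps": avg_fps, "jank_pct": jank_pct}
```

Frame timestamps would come from the page (e.g. a `requestAnimationFrame` loop recording `performance.now()`), then be reduced to these two scalars per scenario.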

## Outputs

- NDJSON benchmark results stored under `out/bench/graph/ui/{runId}.ndjson` with a `.sha256` alongside.
- Summary CSV optional, derived from NDJSON for reporting only.
- CI step publishes artifacts to `out/bench/graph/ui/latest/` with write-once semantics per runId.

## Acceptance criteria

- Playwright suite reproducibly exercises the four scenarios on 50k and 100k fixtures with seeded inputs.
- Metrics include p95 and median for each scenario and fixture size; FPS ≥ 30 on the 50k fixture baseline.
- Archive outputs are deterministic for given fixture and seed (excluding wall-clock timestamps in filenames; embed timestamps only in content).

## Next steps

- Wire Playwright harness into the `BENCH-GRAPH-21-001` pipeline once fixtures are ready.
- Hook results into perf dashboard if available; otherwise store NDJSON + hashes.
@@ -1,30 +0,0 @@
# Bench Prep — PREP-BENCH-IMPACT-16-001 (ImpactIndex dataset/replay)

Status: **Ready for execution** (2025-12-11)
Owners: Bench Guild · Scheduler Team
Scope: Provide deterministic dataset + replay plan for ImpactIndex throughput benchmark (resolve 10k productKeys; measure latency/throughput/memory).

## Inputs/dataset

- Snapshot file: `docs/samples/impactindex/products-10k.ndjson` (10,000 productKeys, shuffled once with seed `2025-01-01T00:00:00Z`).
- SHA256: `caa79c83b5a9affc3b9cc4e54a516281ddceff4804ce853fee3b62d7afb7ab69` (`products-10k.ndjson.sha256` included).
- Each line: `{ "productKey": "pkg:<ecosystem>/<name>@<version>", "tenant": "bench" }`.
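Each dataset line can be parsed and sanity-checked along these lines (a sketch; the shape regex is a loose assumption, not the canonical purl grammar):

```python
import json
import re

# Loose shape check for "pkg:<ecosystem>/<name>@<version>" keys.
PRODUCT_KEY_RE = re.compile(r"^pkg:[a-z0-9]+/[^@]+@.+$")

def load_products(ndjson_text):
    """Parse the products NDJSON, rejecting malformed lines early."""
    products = []
    for line_no, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue
        obj = json.loads(line)
        if not PRODUCT_KEY_RE.match(obj["productKey"]):
            raise ValueError(f"line {line_no}: bad productKey {obj['productKey']!r}")
        products.append(obj)
    return products
```

Failing fast on a malformed line keeps a corrupted fixture from silently skewing throughput numbers.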

## Benchmark procedure

- Harness location: `src/Bench/StellaOps.Bench/ImpactIndex/impact_index_bench.py`.
- Warmup: 1k lookups (excluded from metrics) to trigger caches.
- Run: process all 10k productKeys twice (cold, warm). Record per-pass statistics.
- Metrics to capture (per pass):
  - `throughput_items_per_sec`, `p95_ms`, `p99_ms`, `max_ms` for lookups.
  - `rss_mb`, `managed_mb`, `gc_gen2_count` from .NET counters.
  - `cache_hit_rate` if cache present.
- Output format: NDJSON; one object per pass with fields `{ pass: "cold"|"warm", startedAtUtc, durationMs, throughput, p95Ms, p99Ms, maxMs, rssMb, managedMb, gcGen2, cacheHitRate }`.
- Determinism: fixed seed, single-threaded option flag `--threads 1` for reproducibility; timestamps in UTC ISO-8601.

## Acceptance criteria

- Dataset and checksum published; harness reads from local sample path (no network). ✅
- Benchmark run produces deterministic NDJSON for given seed and hardware profile; differences limited to ≤5%.
- Cold vs warm pass metrics logged; throughput target ≥ 2k items/sec on reference hardware, p95 ≤ 25 ms.

## Next steps

- Harness command: `python src/Bench/StellaOps.Bench/ImpactIndex/impact_index_bench.py --input docs/samples/impactindex/products-10k.ndjson --output src/Bench/StellaOps.Bench/ImpactIndex/results/impactindex.ndjson --threads 1 --seed 20250101`.
- Surface metrics to perf dashboard once harness lands; otherwise store under `out/bench/impactindex/` with hashes (`results/impactindex.ndjson.sha256` present).
@@ -1,35 +0,0 @@
# Bench Prep — PREP-BENCH-POLICY-20-002 (Policy delta benchmark)

Status: **Ready for execution** (2025-12-11)
Owners: Bench Guild · Policy Guild · Scheduler Guild
Scope: Provide deterministic inputs and harness expectations to measure delta policy evaluation vs full runs.

## Goals

- Compare delta evaluation (incremental changes) against full evaluation over the same dataset.
- Capture throughput, latency (p50/p95/p99), and memory/GC impact under deterministic conditions.

## Dataset

- Baseline snapshot: `docs/samples/policy/policy-delta-baseline.ndjson`
  - 5,000 records of `{ "tenant": "bench", "policyId": "pol-<0001..5000>", "package": "bench.pkg.<n>", "version": "1.0.<n>", "decision": "allow|deny", "factors": { ... } }`
  - Deterministic ordering; SHA256 `40ca9ee15065a9e16f51a259d3feec778203ab461db2af3bf196f5fcd9f0d590` (`policy-delta-baseline.ndjson.sha256`).
- Delta patch: `docs/samples/policy/policy-delta-changes.ndjson`
  - 500 changes mixing updates/inserts/deletes (encoded with `op`: "upsert"|"delete").
  - Sorted by `policyId` then `op` for deterministic replay; SHA256 `7f9d7f124830b9fe4d3f232b4cc7e2e728be2ef725e8a66606b9e95682bf6318` (`policy-delta-changes.ndjson.sha256`).
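Replaying the delta patch into an in-memory store keyed by `policyId` can be sketched as (illustrative; the dict-based store shape is an assumption of this example):

```python
import json

def apply_delta(store, delta_ndjson_text):
    """Replay upsert/delete ops from the delta NDJSON into a dict keyed by policyId."""
    for line in delta_ndjson_text.splitlines():
        if not line.strip():
            continue
        change = json.loads(line)
        if change["op"] == "upsert":
            store[change["policyId"]] = change
        elif change["op"] == "delete":
            store.pop(change["policyId"], None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown op: {change['op']!r}")
    return store
```

Because the patch is pre-sorted by `policyId` then `op`, replaying it line by line is deterministic by construction.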

## Harness plan (implemented under `src/Bench/StellaOps.Bench/PolicyDelta/policy_delta_bench.py`)

- Run 1 (Full): load baseline snapshot, evaluate full policy set; record metrics.
- Run 2 (Delta): apply delta patch to in-memory store, run incremental evaluation; record metrics.
- Metrics captured to NDJSON per run:
  - `{ run: "full"|"delta", startedAtUtc, durationMs, evaluationsPerSec, p50Ms, p95Ms, p99Ms, rssMb, managedMb, gcGen2 }`
- Determinism:
  - Use fixed random seed `2025-01-01` for any shuffling; single-threaded mode flag `--threads 1` when reproducibility is needed.
  - All timestamps in UTC ISO-8601; output NDJSON sorted by `run`.

## Acceptance criteria

- Baseline + delta sample files and SHA256 hashes present under `docs/samples/policy/`.
- Harness reads only local files, no network dependencies; replays produce consistent NDJSON for given hardware.
- Delta run shows reduced duration vs full run; metrics captured for both p95/p99 and throughput.

## Next steps

- Harness CLI: `python src/Bench/StellaOps.Bench/PolicyDelta/policy_delta_bench.py --baseline docs/samples/policy/policy-delta-baseline.ndjson --delta docs/samples/policy/policy-delta-changes.ndjson --output src/Bench/StellaOps.Bench/PolicyDelta/results/policy-delta.ndjson --threads 1 --seed 20250101`.
- Results hashed at `src/Bench/StellaOps.Bench/PolicyDelta/results/policy-delta.ndjson.sha256`.
@@ -1,31 +0,0 @@
# Reachability Scoring Bench Prep — PREP-BENCH-SIG-26-001-REACHABILITY-SCHEMA-FIX

Status: Ready for execution (2025-12-11)
Owners: Bench Guild · Signals Guild
Scope: Define inputs/fixtures and schema for reachability scoring benchmarks (10k/50k functions) to unblock BENCH-SIG-26-001.

## Dependencies

- Reachability schema hash captured locally for synthetic fixtures.
- Sample callgraph/runtime traces sized for 10k/50k functions.

## Harness

- Project: `src/Bench/StellaOps.Bench/Signals/reachability_bench.py`.
- Inputs:
  - Callgraph: `docs/samples/signals/reachability/callgraph-10k.ndjson` (`callgraph-10k.ndjson.sha256`).
  - Runtime traces: `docs/samples/signals/reachability/runtime-10k.ndjson` (`runtime-10k.ndjson.sha256`).
  - 50k variants under the same directory (`callgraph-50k.ndjson`, `runtime-50k.ndjson` + `.sha256`).
- Schema: `docs/benchmarks/signals/reachability-schema.json` (sha256 `aaa5c8ab5cc2fe91e50976fafd8c73597387ab9a881af6d5d9818d202beba24e`).
- Metrics: facts/sec, p50/p95/p99 per-node latency, peak RSS, managed MB, GC gen2.
- Output: metrics NDJSON + cache NDJSON with reachability flags for each function (consumed by BENCH-SIG-26-002).
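At its core the reachability flag computation is a graph traversal over the callgraph; a minimal sketch (the edge and entrypoint shapes are assumptions about the fixture format, not the actual schema):

```python
from collections import deque

def reachable_functions(edges, entrypoints):
    """BFS over caller→callee edges; returns the set of functions reachable from entrypoints."""
    adjacency = {}
    for caller, callee in edges:
        adjacency.setdefault(caller, []).append(callee)
    seen = set(entrypoints)
    queue = deque(entrypoints)
    while queue:
        fn = queue.popleft()
        for nxt in adjacency.get(fn, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Each function's membership in the returned set would become its reachability flag in the cache NDJSON handed to BENCH-SIG-26-002.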

## Acceptance

- Schema hash recorded and referenced. ✅
- Sample fixtures published under `docs/samples/signals/reachability/` for 10k/50k. ✅
- Deterministic harness command documented; outputs written locally with `.sha256` hashes. ✅

## Commands

- 10k: `python src/Bench/StellaOps.Bench/Signals/reachability_bench.py --callgraph docs/samples/signals/reachability/callgraph-10k.ndjson --runtime docs/samples/signals/reachability/runtime-10k.ndjson --output src/Bench/StellaOps.Bench/Signals/results/reachability-metrics-10k.ndjson --cache-output src/Bench/StellaOps.Bench/Signals/results/reachability-cache-10k.ndjson --threads 1 --seed 20250101`
- 50k: swap `10k` for `50k` in the command above (`reachability-*-50k.ndjson`).

## Handoff

Use these fixtures + commands to run BENCH-SIG-26-001. Cache outputs (`reachability-cache-*.ndjson`) feed BENCH-SIG-26-002 for policy evaluation overhead measurements.
@@ -1,31 +0,0 @@
# Policy Eval with Reachability Cache Prep — PREP-BENCH-SIG-26-002-BLOCKED-ON-26-001-OUTPU

Status: Ready for execution (2025-12-11)
Owners: Bench Guild · Policy Guild
Scope: Measure policy evaluation overhead with reachability cache hot/cold/mixed scenarios using outputs from BENCH-SIG-26-001.

## Dependencies

- Reachability cache NDJSON from BENCH-SIG-26-001:
  - `src/Bench/StellaOps.Bench/Signals/results/reachability-cache-10k.ndjson` (`.sha256`).
  - 50k variant available for heavier runs (`reachability-cache-50k.ndjson` + `.sha256`).
- Policy baseline dataset: `docs/samples/policy/policy-delta-baseline.ndjson` (+ `.sha256`).
- Policy overlay schema (30-001) — using deterministic synthetic mapping in harness; update when official schema lands.

## Harness

- Project: `src/Bench/StellaOps.Bench/PolicyCache/policy_cache_bench.py`.
- Scenarios: cold cache, warm cache, mixed (70/30 warm/cold).
- Metrics: throughput, p50/p95/p99 added latency per evaluation, RSS/managed MB, GC gen2, cache hit rate.
- Inputs: policy baseline + reachability cache NDJSON.
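The mixed scenario needs a reproducible warm/cold schedule; a seeded sketch (the 70/30 split comes from the scenario list above, while the scheduling helper itself is illustrative):

```python
import random

def mixed_schedule(policy_ids, warm_fraction=0.7, seed=20250101):
    """Deterministically mark each evaluation warm or cold for the mixed scenario."""
    rng = random.Random(seed)  # fixed seed → identical schedule on every run
    schedule = [(pid, rng.random() < warm_fraction) for pid in policy_ids]
    hit_rate = sum(1 for _, warm in schedule if warm) / len(schedule)
    return schedule, hit_rate
```

Seeding the RNG keeps the warm/cold interleaving identical across runs, so latency deltas between runs reflect the cache, not the schedule.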

## Commands

- 10k cache with baseline policies:
  `python src/Bench/StellaOps.Bench/PolicyCache/policy_cache_bench.py --policies docs/samples/policy/policy-delta-baseline.ndjson --reachability-cache src/Bench/StellaOps.Bench/Signals/results/reachability-cache-10k.ndjson --output src/Bench/StellaOps.Bench/PolicyCache/results/policy-cache.ndjson --seed 20250101 --threads 1`
- Swap cache path to `reachability-cache-50k.ndjson` to stress the larger dataset.

## Acceptance

- Cache input and policy baseline present with hashes. ✅
- Cold/warm/mixed runs emit NDJSON with sorted keys; cache hit rate captured. ✅
- Outputs hashed locally (`policy-cache.ndjson.sha256`) and ready for perf dashboard ingestion. ✅

## Handoff

Use cache outputs from BENCH-SIG-26-001 to run the above command. Compare added latency between cold vs warm runs; the mixed scenario should stay within target thresholds (p95 delta ≤ configured budget).