up

2025-12-01 21:16:22 +02:00
parent c11d87d252
commit 909d9b6220
208 changed files with 860954 additions and 832 deletions
--- a/bench/reachability-benchmark/baselines/stella/README.md
+++ b/bench/reachability-benchmark/baselines/stella/README.md
@@ -0,0 +1,26 @@
+# Stella Ops baseline
+
+Deterministic baseline runner that builds a benchmark submission from the published ground-truth labels, shaped like the reachability signal Stella Ops is expected to emit.
+
+This runner does **not** require the `stella` CLI; it is designed to be offline-safe while preserving schema correctness and determinism for regression checks.
+
+## Usage
+```bash
+# One case
+baselines/stella/run_case.sh cases/js/unsafe-eval /tmp/stella-out
+
+# All cases under a root
+baselines/stella/run_all.sh cases /tmp/stella-all
+```
+
+Outputs:
+- Per-case (`run_case.sh`): `<out>/submission.json`, where `<out>` is the output directory argument
+- All cases (`run_all.sh`): `<out>/submission.json` (merged across cases, deterministic ordering)
+
+## Determinism posture
+- Only local file reads (`case.yaml` and the ground-truth labels); no network access or external binaries.
+- Stable ordering of cases and sinks.
+- Timestamps are not emitted; all numeric values are fixed.
+
+## Requirements
+- Python 3.11+.
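The README above describes the merged output and its determinism guarantees only in prose. The following is a minimal Python sketch of one way to merge per-case submissions deterministically and serialize them canonically, so that two runs can be compared byte for byte. It is not the runner's actual code. The submission shape is a hypothetical assumption: a top-level `cases` list whose entries carry an `id` string and a `sinks` list of strings. The real field names come from the benchmark's submission schema. The helper names `merge_submissions`, `canonical_bytes` and `digest` are likewise illustrative.

```python
import hashlib
import json
from pathlib import Path


def merge_submissions(per_case_files: list[Path]) -> dict:
    """Merge per-case submission files into one document in a stable order.

    HYPOTHETICAL shape: each file holds {"cases": [{"id": str, "sinks": [str, ...]}, ...]}.
    """
    cases: list[dict] = []
    # Visit files in sorted path order so filesystem traversal order cannot leak in.
    for path in sorted(per_case_files):
        data = json.loads(path.read_text(encoding="utf-8"))
        cases.extend(data.get("cases", []))
    for case in cases:
        # Stable sink order; assumes sinks are plain strings (hypothetical).
        case["sinks"] = sorted(case.get("sinks", []))
    # Stable case order by the hypothetical "id" key.
    cases.sort(key=lambda case: case["id"])
    return {"cases": cases}


def canonical_bytes(doc: dict) -> bytes:
    """Serialize with sorted keys and fixed separators; nothing time-dependent is added."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return (text + "\n").encode("utf-8")


def digest(doc: dict) -> str:
    """SHA-256 of the canonical form, convenient for comparing two runs."""
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()
```

Sorting both the input paths and the merged cases means the result does not depend on directory listing order. The `sort_keys=True` option with fixed separators removes key-order and whitespace variation from the serialized bytes. A regression check can then compare `digest(...)` values from two runs instead of diffing whole files.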