bench/reachability-benchmark/baselines/stella/README.md
@@ -0,0 +1,26 @@
# Stella Ops baseline

Deterministic baseline runner that emits a benchmark submission using the published ground-truth labels and the expected Stella Ops reachability signal shape.

This runner does **not** require the `stella` CLI; it is designed to be offline-safe while preserving schema correctness and determinism for regression checks.

## Usage
```bash
# One case
baselines/stella/run_case.sh cases/js/unsafe-eval /tmp/stella-out

# All cases under a root
baselines/stella/run_all.sh cases /tmp/stella-all
```
Outputs (`<out>` is the output directory passed as the second argument):

- Per-case: `<out>/submission.json`
- All cases: `<out>/submission.json` (merged, deterministic ordering)
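
To illustrate what "merged, deterministic ordering" can mean in practice, here is a minimal Python sketch that merges per-case submissions. It is not the runner's actual code: the field names `cases`, `case_id`, `sinks`, and `sink_id`, and the function `merge_submissions`, are assumptions made for illustration, since this README does not show the submission schema.

```python
import json
from pathlib import Path


def merge_submissions(root: Path) -> dict:
    """Merge every submission.json under root into one dict with stable ordering.

    The keys "cases", "case_id", "sinks" and "sink_id" are hypothetical;
    adapt them to the real benchmark schema.
    """
    cases = []
    # sorted() removes any dependence on filesystem enumeration order.
    for path in sorted(root.rglob("submission.json")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        cases.extend(doc.get("cases", []))

    # Order sinks within each case, then the cases themselves.
    for case in cases:
        case["sinks"] = sorted(case.get("sinks", []), key=lambda s: s["sink_id"])
    cases.sort(key=lambda c: c["case_id"])
    return {"cases": cases}
```

Sorting by explicit keys rather than relying on directory listing order means the merged file is the same on any filesystem or platform.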
## Determinism posture

- Reads only local files (`case.yaml` and the ground-truth labels); no network access and no external binaries.
- Stable ordering of cases and sinks.
- Timestamps are not emitted; all numeric values are fixed.
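
One way to check these properties in a regression test is to serialize the submission canonically and compare digests across runs. The sketch below assumes a JSON submission; the helper names `canonical_bytes` and `digest` and the sample payloads are hypothetical and not part of the runner.

```python
import hashlib
import json


def canonical_bytes(submission: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    text = json.dumps(submission, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return (text + "\n").encode("utf-8")


def digest(submission: dict) -> str:
    """SHA-256 of the canonical serialization, suitable for comparing two runs."""
    return hashlib.sha256(canonical_bytes(submission)).hexdigest()


# Hypothetical payloads: the same data with different key insertion order.
first = {"cases": [{"case_id": "js-unsafe-eval", "sinks": []}], "version": "1"}
second = {"version": "1", "cases": [{"sinks": [], "case_id": "js-unsafe-eval"}]}
assert digest(first) == digest(second)
```

Note that `sort_keys=True` only normalizes object keys; list order (cases, sinks) still has to be fixed by the producer, which is why stable ordering is listed separately above.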
## Requirements

- Python 3.11+.