Stella Ops baseline
Deterministic baseline runner that emits a benchmark submission using the published ground-truth labels and the expected Stella Ops reachability signal shape.
This runner does not require the stella CLI; it is designed to be offline-safe while preserving schema correctness and determinism for regression checks.
Usage
# One case
baselines/stella/run_case.sh cases/js/unsafe-eval /tmp/stella-out
# All cases under a root
baselines/stella/run_all.sh cases /tmp/stella-all
Outputs:
- Per-case: <out>/submission.json
- All cases: <out>/submission.json (merged, deterministic ordering)
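Because the output is meant to be deterministic, a regression check can compare two runs byte for byte. The following is a minimal sketch, not part of the runner itself; the helper names `file_digest` and `outputs_match` are hypothetical, and it assumes only that each run wrote a `submission.json` into its own output directory.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def outputs_match(run_a: Path, run_b: Path, name: str = "submission.json") -> bool:
    """Compare the same-named output file from two run directories.

    Hypothetical helper: a byte-identical file means the two runs agreed.
    """
    return file_digest(run_a / name) == file_digest(run_b / name)
```

For example, after running `run_all.sh` twice into `/tmp/stella-a` and `/tmp/stella-b`, `outputs_match(Path("/tmp/stella-a"), Path("/tmp/stella-b"))` should return `True` if the determinism guarantees hold.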
Determinism posture
- Pure local file reads (case.yaml + truth), no network or external binaries.
- Stable ordering of cases and sinks.
- Timestamps are not emitted; all numeric values are fixed.
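The points above can be illustrated with a short sketch of a deterministic merge. This is not the runner's actual code: the field names `case_id`, `sinks`, and `sink_id` and the top-level `cases` key are assumptions made for illustration, since the submission schema is not shown here.

```python
import json


def merge_submissions(per_case: list[dict]) -> str:
    """Merge per-case results into one JSON string with stable ordering.

    Assumed (hypothetical) shape: each item has a "case_id" string and a
    "sinks" list whose entries carry a "sink_id" string.
    """
    cases = []
    # Sort cases by id so input order (e.g. directory listing order) is irrelevant.
    for case in sorted(per_case, key=lambda c: c["case_id"]):
        # Sort sinks within each case for the same reason.
        sinks = sorted(case["sinks"], key=lambda s: s["sink_id"])
        cases.append({**case, "sinks": sinks})
    # sort_keys fixes key order; no timestamps or other run-specific values are added.
    return json.dumps({"cases": cases}, sort_keys=True, indent=2) + "\n"
```

Sorting both levels and serializing with `sort_keys=True` means the same inputs yield the same bytes regardless of the order in which the cases were discovered.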
Requirements
- Python 3.11+.