# Stella Ops baseline

Deterministic baseline runner that emits a benchmark submission using the published ground-truth labels and the expected Stella Ops reachability signal shape.

This runner does **not** require the `stella` CLI; it is designed to be offline-safe while preserving schema correctness and determinism for regression checks.

## Usage
```bash
# One case
baselines/stella/run_case.sh cases/js/unsafe-eval /tmp/stella-out

# All cases under a root
baselines/stella/run_all.sh cases /tmp/stella-all
```

Outputs:
- Per-case: `<out>/submission.json`
- All cases: `<out>/submission.json` (merged, deterministic ordering)
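
Because the merged file is described as deterministically ordered, one way to use it in a regression check is to compare the digests of two separate runs. The sketch below is an illustration only: the second output directory (`/tmp/stella-all-rerun`) and the helper names are assumptions, not part of the baseline scripts.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def same_submission(first: Path, second: Path) -> bool:
    """True when two submission files are byte-for-byte identical."""
    return file_digest(first) == file_digest(second)


if __name__ == "__main__":
    # Hypothetical output directories from two separate run_all.sh invocations.
    run_a = Path("/tmp/stella-all/submission.json")
    run_b = Path("/tmp/stella-all-rerun/submission.json")
    if run_a.exists() and run_b.exists():
        print("identical" if same_submission(run_a, run_b) else "outputs differ")
    else:
        print("run the baseline twice into both directories first")
```

A byte-level comparison is stricter than comparing parsed JSON, so it would also catch differences in key order or whitespace.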

## Determinism posture
- Pure local file reads (`case.yaml` plus the ground-truth labels); no network access or external binaries.
- Stable ordering of cases and sinks.
- Timestamps are not emitted; all numeric values are fixed.
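
The points above can be sketched in a few lines of Python. This is a minimal illustration of the general technique (sorted inputs, sorted keys, no run-specific fields), not the runner's actual code; the field names `case_id`, `sinks`, and `sink_id` are hypothetical placeholders rather than the benchmark's schema.

```python
import json


def merge_cases(per_case: list[dict]) -> str:
    """Merge per-case results into one JSON string with a stable layout.

    The keys "case_id", "sinks", and "sink_id" are hypothetical placeholders,
    not the benchmark's actual schema.
    """
    cases = []
    for case in sorted(per_case, key=lambda c: c["case_id"]):
        sinks = sorted(case["sinks"], key=lambda s: s["sink_id"])
        cases.append({"case_id": case["case_id"], "sinks": sinks})
    # sort_keys fixes key order; no timestamp or other run-specific field is added.
    return json.dumps({"cases": cases}, sort_keys=True, indent=2) + "\n"


if __name__ == "__main__":
    shuffled = [
        {"case_id": "b", "sinks": [{"sink_id": "s2"}, {"sink_id": "s1"}]},
        {"case_id": "a", "sinks": [{"sink_id": "s1"}]},
    ]
    # Input order does not affect the serialized result.
    assert merge_cases(shuffled) == merge_cases(list(reversed(shuffled)))
    print(merge_cases(shuffled))
```

Sorting before serialization means the output depends only on the input contents, not on filesystem traversal order.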

## Requirements
- Python 3.11+.