
StellaOps Bot 108d1c64b3


2025-12-09 09:38:09 +02:00


Reachability Benchmark · AGENTS

Scope & Roles

Working directory: bench/reachability-benchmark/
Roles: benchmark curator (datasets, schemas), tooling engineer (scorer/CI), docs maintainer (public README/CONTRIBUTING), DevOps (deterministic builds, CI).
Outputs are public-facing (Apache-2.0); keep artifacts deterministic and offline-friendly.

Required Reading

docs/README.md
docs/07_HIGH_LEVEL_ARCHITECTURE.md
docs/reachability/function-level-evidence.md
docs/reachability/lattice.md
Product advisories:
- docs/product-advisories/24-Nov-2025 - Designing a Deterministic Reachability Benchmark.md
- docs/product-advisories/archived/23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring.md
- docs/product-advisories/archived/23-Nov-2025 - Publishing a Reachability Benchmark Dataset.md
Sprint plan: docs/implplan/SPRINT_0513_0001_0001_public_reachability_benchmark.md
DB/spec guidance for determinism and licensing: docs/db/RULES.md, docs/db/VERIFICATION.md

Working Agreements

Determinism: pin toolchains; set SOURCE_DATE_EPOCH; sort file lists; stable JSON/YAML ordering; fixed seeds for any sampling.
Offline posture: no network at build/test time; vendored toolchains; registry pulls are forbidden, so use cached/bundled images.
Java builds: when JAVA_HOME/javac are absent, use the vendored Temurin 21 via tools/java/ensure_jdk.sh; keep .jdk/ out of VCS; use build_all.py --skip-lang when a toolchain is missing.
Licensing: all benchmark content Apache-2.0; include LICENSE in repo root; third-party cases must have compatible licenses and attributions.
Evidence: each case must include oracle tests and coverage proving its reachability label; store truth under benchmark/truth/ and submissions under benchmark/submissions/, both validated against JSON Schema.
Security: no secrets; scrub URLs/tokens; deterministic CI artifacts only.
Observability: scorer emits structured logs (JSON) with deterministic ordering; metrics optional.
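The determinism agreements above can be sketched in a few helpers: canonical JSON (sorted keys, fixed separators), sorted file listings, a timestamp read from SOURCE_DATE_EPOCH, and seeded sampling over a sorted population. This is a minimal illustration under stated assumptions, not the benchmark's actual tooling; the function names, the fallback of 0 when SOURCE_DATE_EPOCH is unset, and the default seed are all hypothetical.

```python
import hashlib
import json
import os
import random
from pathlib import Path


def canonical_json(obj) -> str:
    """Serialize with sorted keys and fixed separators so equal data
    always produces byte-identical text (trailing newline included)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False) + "\n"


def sorted_files(root: Path) -> list:
    """Relative POSIX paths of all files under root, in lexicographic order,
    so the result does not depend on filesystem enumeration order."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def build_timestamp() -> int:
    """Read SOURCE_DATE_EPOCH instead of the wall clock.
    ASSUMPTION: fall back to 0 when unset; a real build may prefer to fail."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", "0"))


def sample_cases(cases, k: int, seed: int = 1337) -> list:
    """Deterministic sample: sort the population first, then draw with a
    private seeded RNG. The default seed value is hypothetical."""
    rng = random.Random(seed)
    return rng.sample(sorted(cases), k)


def digest(text: str) -> str:
    """SHA-256 of UTF-8 text, for comparing artifacts across runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Sorting before sampling matters: a seeded RNG only gives reproducible picks if the input order is also fixed.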

Directory Contracts

cases/<lang>/<project>/: source, Dockerfile (deterministic), pinned dependencies, oracle tests, expected coverage output.
schemas/: JSON/YAML schemas for cases, entrypoints, truth, submission; include validation CLI.
tools/scorer/: rb-score CLI; no network; pure local file IO.
baselines/: reference runners (Semgrep/CodeQL/Stella) with normalized outputs.
ci/: deterministic CI workflows; no cache flakiness.
website/: static site (no trackers/fonts from CDN).
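A layout check for the cases/<lang>/<project>/ contract might look like the sketch below. It enumerates cases in sorted order so reports are deterministic, and flags case directories without a Dockerfile. Only the Dockerfile is checked because it is the one file the contract names; the filenames for sources, pinned dependencies, oracle tests, and coverage output are not specified here, so they are deliberately left out. The function names are hypothetical.

```python
from pathlib import Path


def discover_cases(root: Path) -> list:
    """Return sorted (lang, project) pairs for every cases/<lang>/<project>/ directory."""
    cases_dir = root / "cases"
    if not cases_dir.is_dir():
        return []
    found = []
    # Sort by name at each level so output order never depends on the filesystem.
    for lang_dir in sorted((p for p in cases_dir.iterdir() if p.is_dir()), key=lambda p: p.name):
        for proj_dir in sorted((p for p in lang_dir.iterdir() if p.is_dir()), key=lambda p: p.name):
            found.append((lang_dir.name, proj_dir.name))
    return found


def missing_dockerfiles(root: Path) -> list:
    """List '<lang>/<project>' entries whose case directory lacks a Dockerfile."""
    return [
        f"{lang}/{proj}"
        for lang, proj in discover_cases(root)
        if not (root / "cases" / lang / proj / "Dockerfile").is_file()
    ]
```

Because it touches only the local filesystem, a check like this fits the no-network rule for CI.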

Testing

Per-case oracle tests must pass locally without network.
Scorer unit tests: schema validation, scoring math (precision/recall/F1), explainability tiers.
Determinism tests: rerun scorer twice → identical outputs/hash.
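The scoring math and the rerun-twice determinism test can be illustrated together. The sketch assumes truth and submission labels reduce to sets of string identifiers for sinks labeled reachable; the real truth/submission schemas are not shown here. Returning 0.0 when a denominator is zero and rounding to six decimals are also assumptions, chosen so that degenerate inputs and serialized output stay stable. The function names are hypothetical.

```python
import hashlib
import json


def score(truth: set, predicted: set) -> dict:
    """Precision/recall/F1 over sets of reachable-sink identifiers."""
    tp = len(truth & predicted)   # predicted reachable and truly reachable
    fp = len(predicted - truth)   # predicted reachable, truly not
    fn = len(truth - predicted)   # truly reachable, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn,
        "precision": round(precision, 6),
        "recall": round(recall, 6),
        "f1": round(f1, 6),
    }


def result_hash(truth: set, predicted: set) -> str:
    """Hash canonical JSON of the score so two runs can be compared byte-for-byte."""
    text = json.dumps(score(truth, predicted), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Determinism test: scoring the same inputs twice must give an identical hash.
truth_ids = {"case-a:sink1", "case-b:sink2"}
submitted_ids = {"case-b:sink2"}
assert result_hash(truth_ids, submitted_ids) == result_hash(truth_ids, submitted_ids)
```

With truth {a, b, c} and a submission {b, c, d}, there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to 2/3.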

Status Discipline

Mirror task status in docs/implplan/SPRINT_0513_0001_0001_public_reachability_benchmark.md when starting/pausing/completing work.
Log material changes in the sprint's Execution Log, with the date in UTC.

Allowed Shared Libraries

Use only the existing repo toolchains (Python/Node/Go, kept minimal). Add no new external services. Keep scorer dependencies minimal and vendored where possible.
