# Reachability Corpus Plan (QA-CORPUS-401-031)

## Objective

- Maintain deterministic, offline reachability fixtures that validate callgraph ingestion, reachability truth-path handling, and VEX proof workflows.
- Keep the corpus small but multi-runtime (Go/.NET/Python/Rust), and keep a public-friendly mini dataset (PHP/JavaScript/C#) for docs and demos that does not depend on external repos.

## Corpus Map

### 1) Multi-runtime corpus (internal MVP)

Path: `tests/reachability/corpus/`

Per-case layout: `tests/reachability/corpus///`

- `callgraph.static.json`: static call graph sample (stub for the MVP).
- `ground-truth.json`: expected reachability outcome and example path(s) (Reachbench truth schema v1; `schema_version=reachbench.reachgraph.truth/v1`).
- `vex.openvex.json`: expected VEX slice for the case.
- Optional (future): `runtime/*.ndjson`, `sbom.*.json`.

`tests/reachability/corpus/manifest.json` records deterministic SHA-256 hashes for the required files in each case directory.

### 2) Public mini dataset (PHP/JS/C#)

Path: `tests/reachability/samples-public/`

Layout:

- `schema/ground-truth.schema.json`: JSON schema for `ground-truth.json` (Reachbench truth schema v1).
- `manifest.json`: deterministic SHA-256 hashes for the required files in each sample directory.
- `samples///`: per-sample artifacts (`callgraph.static.json`, `ground-truth.json`, `sbom.cdx.json`, `vex.openvex.json`, `repro.sh`).
- `runners/run_all.{sh,ps1}`: deterministic manifest regeneration.

### 3) Reachbench fixture pack (expanded, dual variants)

Path: `tests/reachability/fixtures/reachbench-2025-expanded/`

Each case has two variants (reachable/unreachable), each with its own `manifest.json` and `reachgraph.truth.json`. Fixture integrity is validated by `tests/reachability/StellaOps.Reachability.FixtureTests`.

## Ground Truth Conventions

- The corpus, the public samples, and the reachbench pack share one truth schema (`reachbench.reachgraph.truth/v1`) but use different file names: `ground-truth.json` in the corpus and public samples, `reachgraph.truth.json` in the reachbench pack.
- The legacy corpus `expect.yaml` has been retired; prior `state`/`score` values are preserved under `legacy_expect` in `ground-truth.json`.
- Legacy `conditional` states are represented as `variant=unreachable` plus `legacy_expect.state=conditional` until the truth schema gains a dedicated conditional/contested variant.

## Determinism & Runners

Regenerate all reachability manifests (corpus, public samples, and reachbench pack):

- `tests/reachability/runners/run_all.sh`
- `tests/reachability/runners/run_all.ps1`

Individual scripts:

- `python tests/reachability/scripts/update_corpus_manifest.py`
- `python tests/reachability/samples-public/scripts/update_manifest.py`
- `python tests/reachability/fixtures/reachbench-2025-expanded/harness/update_variant_manifests.py`

## CI Gates

- `tests/reachability/StellaOps.Reachability.FixtureTests`
  - validates file presence and hashes against the manifests for the corpus, public samples, and reachbench fixtures
  - enforces minimum language-bucket coverage (Go/.NET/Python/Rust + PHP/JS/C#)

## MVP Slice (stub cases)

- Go: `go-ssh-CVE-2020-9283-keyexchange`
- .NET: `dotnet-kestrel-CVE-2023-44487-http2-rapid-reset`
- Python: `python-django-CVE-2019-19844-sqli-like`
- Rust: `rust-axum-header-parsing-TBD`

## Next Work (post-MVP)

- Wire a CI job to run `tests/reachability/StellaOps.Reachability.FixtureTests`.
- Replace the stubs with real callgraphs/traces, and expand the corpus once CI is stable.
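The `manifest.json` files described in the Corpus Map, and regenerated by the runners under Determinism & Runners, record deterministic SHA-256 hashes per case directory. Neither the real manifest format nor the logic of `update_corpus_manifest.py` is shown in this plan, so what follows is only a minimal Python sketch of the idea under stated assumptions: the `{"cases": {case_dir: {file_name: sha256}}}` shape, the rule that any directory containing `ground-truth.json` is a case, and the helper names `build_manifest`, `dump_manifest`, and `verify_manifest` are all hypothetical. Determinism comes from hashing raw bytes, keying cases by relative POSIX paths, and serializing with sorted keys and a fixed trailing newline.

```python
import hashlib
import json
from pathlib import Path

# Required files per corpus case, taken from the corpus layout in the Corpus Map.
REQUIRED_FILES = ("callgraph.static.json", "ground-truth.json", "vex.openvex.json")


def sha256_file(path: Path) -> str:
    """Hash raw bytes, so the digest never depends on text decoding."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path, required=REQUIRED_FILES) -> dict:
    """Hypothetical manifest shape: {"cases": {case_dir: {file_name: sha256}}}.

    Assumption: any directory under `root` containing ground-truth.json is a case.
    A missing required file raises FileNotFoundError. Traversal order does not
    matter because dump_manifest sorts every key before serializing.
    """
    cases = {}
    for truth in root.rglob("ground-truth.json"):
        case_dir = truth.parent
        # POSIX-style relative keys keep the manifest identical on Windows and Unix.
        rel = case_dir.relative_to(root).as_posix()
        cases[rel] = {name: sha256_file(case_dir / name) for name in required}
    return {"cases": cases}


def dump_manifest(manifest: dict) -> str:
    """Sorted keys, fixed indentation and a trailing newline give reproducible bytes."""
    return json.dumps(manifest, indent=2, sort_keys=True) + "\n"


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return every missing file or hash mismatch, in sorted order."""
    problems = []
    for rel, files in sorted(manifest["cases"].items()):
        for name, expected in sorted(files.items()):
            path = root / rel / name
            if not path.is_file():
                problems.append(f"missing: {rel}/{name}")
            elif sha256_file(path) != expected:
                problems.append(f"hash mismatch: {rel}/{name}")
    return problems
```

Running `verify_manifest` against a freshly built manifest returns an empty list; editing a hashed file afterwards yields a `hash mismatch` entry for it, and deleting one yields a `missing` entry. This covers only the presence-and-hashes half of the CI gate; the language-bucket coverage check is not modeled here.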
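The legacy-state rule in Ground Truth Conventions can be sketched as a small conversion function. This is a hypothetical illustration, not the actual migration code: it assumes legacy `expect.yaml` states were the strings `reachable`, `unreachable`, or `conditional`, assumes `variant` is a top-level field of `ground-truth.json`, and omits every other truth-schema field (such as the example paths). The name `truth_fields_from_legacy` is invented for this sketch.

```python
SCHEMA_VERSION = "reachbench.reachgraph.truth/v1"

# Hypothetical: assumed legacy expect.yaml states mapped to truth-schema variants.
# "conditional" has no dedicated variant yet, so it collapses to "unreachable".
LEGACY_STATE_TO_VARIANT = {
    "reachable": "reachable",
    "unreachable": "unreachable",
    "conditional": "unreachable",
}


def truth_fields_from_legacy(state: str, score: object) -> dict:
    """Return the subset of ground-truth.json fields derived from a legacy expectation."""
    try:
        variant = LEGACY_STATE_TO_VARIANT[state]
    except KeyError:
        raise ValueError(f"unknown legacy state: {state!r}") from None
    return {
        "schema_version": SCHEMA_VERSION,
        "variant": variant,
        # The original state/score are kept verbatim so "conditional" is not lost.
        "legacy_expect": {"state": state, "score": score},
    }
```

Keeping the legacy state next to the collapsed variant means a future conditional/contested variant could be back-filled from `legacy_expect.state` without consulting the retired `expect.yaml`.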