up

2025-12-13 09:37:15 +02:00
parent e00f6365da
commit 6e45066e37
349 changed files with 17160 additions and 1867 deletions
--- a/docs/reachability/corpus-plan.md
+++ b/docs/reachability/corpus-plan.md
@@ -1,45 +1,69 @@
 # Reachability Corpus Plan (QA-CORPUS-401-031)

 Objective
- Build a multi-runtime reachability corpus (Go/.NET/Python/Rust) with EXPECT.yaml ground truths and captured traces.
- Make fixtures CI-consumable to validate reachability scoring and VEX proofs continuously.
- Add public mini-dataset cases (PHP/JavaScript/C#) from advisory 23-Nov-2025 for ingestion/bench reuse.
+- Maintain deterministic, offline reachability fixtures that validate callgraph ingestion, reachability truth-path handling, and VEX proof workflows.
+- Keep the corpus small but multi-runtime (Go/.NET/Python/Rust), and keep a public-friendly mini dataset (PHP/JavaScript/C#) for docs/demos without external repos.

-Scope & deliverables
- Fixture layout: `tests/reachability/corpus/<language>/<case>/`
-  - `expect.yaml` — states (`reachable|conditional|unreachable`), score, evidence refs.
-  - `callgraph.*.json` — static graphs per language.
-  - `runtime/*.ndjson` — traces/probes when available.
-  - `sbom.*.json` — CycloneDX/SPDX slices.
-  - `vex.openvex.json` — expected VEX statement.
- CI integration: add corpus harness to `tests/reachability/StellaOps.Reachability.FixtureTests` to validate presence, schema, and determinism (hash manifest).
- Offline posture: all artifacts deterministic, no external downloads; hashes recorded in manifest.
- Public mini-dataset layout (PHP/JS/C#) to be mirrored under `tests/reachability/samples-public/`:
-```
-vuln-reach-dataset/
-  schema/ground-truth.schema.json
-  runners/run_all.sh
-  samples/
-    php/php-001-phar-deserialize/...
-    js/js-002-yaml-unsafe-load/...
-    csharp/cs-001-binaryformatter-deserialize/...
-```
-Each sample ships: minimal app, lockfile, SBOM (CycloneDX JSON), VEX, ground truth (EXPECT/JSON), repro script.
+## Corpus Map

-MVP slice (proposed)
+### 1) Multi-runtime corpus (internal MVP)
+
+Path: `tests/reachability/corpus/`
+
+Per-case layout: `tests/reachability/corpus/<language>/<case>/`
+- `callgraph.static.json` — static call graph sample (stub for MVP).
+- `ground-truth.json` — expected reachability outcome and example path(s) (Reachbench truth schema v1; `schema_version=reachbench.reachgraph.truth/v1`).
+- `vex.openvex.json` — expected VEX slice for the case.
+- Optional (future): `runtime/*.ndjson`, `sbom.*.json`
+
+`tests/reachability/corpus/manifest.json` records deterministic SHA-256 hashes for required files in each case directory.
+
+### 2) Public mini dataset (PHP/JS/C#)
+
+Path: `tests/reachability/samples-public/`
+
+Layout:
+- `schema/ground-truth.schema.json` — JSON schema for `ground-truth.json` (Reachbench truth schema v1).
+- `manifest.json` — deterministic SHA-256 hashes for required files in each sample directory.
+- `samples/<lang>/<case-id>/` — per-sample artifacts: `callgraph.static.json`, `ground-truth.json`, `sbom.cdx.json`, `vex.openvex.json`, `repro.sh`.
+- `runners/run_all.{sh,ps1}` — deterministic manifest regeneration.
+
+### 3) Reachbench fixture pack (expanded, dual variants)
+
+Path: `tests/reachability/fixtures/reachbench-2025-expanded/`
+
+Each case has two variants (reachable/unreachable) with per-variant `manifest.json` and `reachgraph.truth.json`. Fixture integrity is validated by `tests/reachability/StellaOps.Reachability.FixtureTests`.
+
+## Ground Truth Conventions
+
+- Corpus and public samples share the reachbench pack's truth schema (`reachbench.reachgraph.truth/v1`); only the file name differs (`ground-truth.json` in the corpus and public samples, `reachgraph.truth.json` in the reachbench pack).
+- Legacy corpus `expect.yaml` has been retired; prior `state/score` values are preserved under `legacy_expect` in `ground-truth.json`.
+- Legacy `conditional` states are represented as `variant=unreachable` plus `legacy_expect.state=conditional` until the truth schema grows a dedicated conditional/contested variant.
+
+## Determinism & Runners
+
+Regenerate all reachability manifests (corpus + public samples + reachbench pack):
+- `tests/reachability/runners/run_all.sh`
+- `tests/reachability/runners/run_all.ps1`
+
+Individual scripts:
+- `python tests/reachability/scripts/update_corpus_manifest.py`
+- `python tests/reachability/samples-public/scripts/update_manifest.py`
+- `python tests/reachability/fixtures/reachbench-2025-expanded/harness/update_variant_manifests.py`
+
+## CI Gates
+
+- `tests/reachability/StellaOps.Reachability.FixtureTests`
+  - validates presence + hashes from manifests for corpus/public samples/reachbench fixtures
+  - enforces minimum language-bucket coverage (Go/.NET/Python/Rust + PHP/JS/C#)
+
+## MVP Slice (stub cases)
 - Go: `go-ssh-CVE-2020-9283-keyexchange`
 - .NET: `dotnet-kestrel-CVE-2023-44487-http2-rapid-reset`
 - Python: `python-django-CVE-2019-19844-sqli-like`
 - Rust: `rust-axum-header-parsing-TBD`

-Work plan
-1) Define shared manifest schema + hash manifest (NDJSON) under `tests/reachability/corpus/manifest.json`.
-2) For each MVP case, add minimal static callgraph + EXPECT.yaml with score/state and evidence links. (DONE: stub versions committed)
-3) Extend reachability fixture tests to cover corpus folders (presence, hashes, EXPECT.yaml schema). (DONE)
-4) Wire CI job to run the extended tests in `tests/reachability/StellaOps.Reachability.FixtureTests`. (TODO)
-5) Replace stubs with real callgraphs/traces and expand corpus after MVP passes CI. (TODO)
+## Next Work (post-MVP)
+- Wire a CI job to run `tests/reachability/StellaOps.Reachability.FixtureTests`.
+- Replace stubs with real callgraphs/traces and expand the corpus once CI is stable.

-Determinism rules
- Sort JSON keys; round scores to 2dp; UTC times only if needed.
- Stable ordering of files in manifests; hash with SHA-256.
- No network calls during test or generation.
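
The updated plan describes each `manifest.json` as recording deterministic SHA-256 hashes for the required files in every case directory. The scripts themselves (for example `update_corpus_manifest.py`) are not shown in this diff, so the following is only a minimal sketch of how such a manifest could be produced. The `REQUIRED_FILES` tuple, the function names, and the manifest shape (`"<language>/<case>"` mapped to `{filename: sha256}`) are assumptions for illustration, not the repository's actual format.

```python
import hashlib
import json
from pathlib import Path

# Assumption: the required per-case files, taken from the corpus layout in the plan.
# The real scripts may require a different set.
REQUIRED_FILES = ("callgraph.static.json", "ground-truth.json", "vex.openvex.json")


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(corpus_root: Path) -> dict:
    """Map '<language>/<case>' to {filename: sha256}.

    Case directories are visited in sorted POSIX-path order so the result
    does not depend on filesystem enumeration order. A missing required
    file raises FileNotFoundError rather than being skipped silently.
    """
    case_dirs = sorted(
        (p for p in corpus_root.glob("*/*") if p.is_dir()),
        key=lambda p: p.as_posix(),
    )
    manifest = {}
    for case_dir in case_dirs:
        key = case_dir.relative_to(corpus_root).as_posix()
        manifest[key] = {name: sha256_file(case_dir / name) for name in REQUIRED_FILES}
    return manifest


def render_manifest(manifest: dict) -> str:
    """Serialize with sorted keys and a trailing newline for byte-stable output."""
    return json.dumps(manifest, indent=2, sort_keys=True) + "\n"
```

Under these assumptions, regenerating the manifest twice over unchanged fixtures yields byte-identical text, which is the property a fixture test would compare against the committed `manifest.json`. Nothing here touches the network, in keeping with the plan's offline posture.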