# Golden Pairs Corpus

Golden pairs are curated binary pairs (original vs patched) used to validate binary-diff logic.
Binaries are stored outside git; this folder tracks metadata, hashes, and reports only.

## Current Corpus

| CVE | Name | Binary | Status | Notes |
|-----|------|--------|--------|-------|
| CVE-2021-3156 | Baron Samedit | sudo | Validated | Debian 10 packages with verified SHA-256 |
| CVE-2022-0847 | Dirty Pipe | vmlinux | Pending | Kernel binaries large; fetch pending |

## Layout

```
datasets/golden-pairs/
  index.json
  README.md
  CVE-2021-3156/
    metadata.json
    advisories/
  CVE-2022-0847/
    metadata.json
    advisories/
```

When binaries are fetched:

```
CVE-YYYY-NNNN/
  original/
    .sha256
    .sections.json
  patched/
    .sha256
    .sections.json
  diff-report.json
```

## File Conventions

- `metadata.json` follows `docs/schemas/golden-pair-v1.schema.json`.
- `index.json` follows `docs/schemas/golden-pairs-index.schema.json`.
- `*.sha256` contains a single lowercase hex digest, no prefix.
- `*.sections.json` contains section hash output from the ELF hash extractor.
- `diff-report.json` is produced by `golden-pairs diff`.

## Adding a Pair

1. Create a `CVE-YYYY-NNNN/metadata.json` with required fields.
2. Fetch binaries via `golden-pairs mirror CVE-...`.
3. Generate section hashes for each binary.
4. Run `golden-pairs diff CVE-...` and review `diff-report.json`.
5. Update `index.json` with status and summary counts.
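
The `*.sha256` convention (a single lowercase hex digest, no prefix) can be produced and checked with a few lines of standard-library Python. This is a minimal sketch, not part of the `golden-pairs` tooling: the helper names, the `<binary name>.sha256` file naming, and the trailing newline are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Chunked reads keep memory flat even for 100MB+ kernel images.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sha256_file(binary: Path) -> Path:
    """Write the digest next to the binary (assumed name: <binary name>.sha256)."""
    target = binary.with_name(binary.name + ".sha256")
    # One digest only: no "sha256:" prefix and no filename column.
    target.write_text(sha256_hex(binary) + "\n", encoding="ascii")
    return target


def verify_sha256_file(binary: Path) -> bool:
    """Compare the recorded digest with a freshly computed one."""
    recorded = binary.with_name(binary.name + ".sha256").read_text(encoding="ascii")
    return recorded.strip() == sha256_hex(binary)
```

Calling `verify_sha256_file(Path("CVE-2021-3156/original/sudo"))` after a mirror step would, under these assumptions, confirm that the fetched binary still matches its recorded digest.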
## Package Sources

### CVE-2021-3156 (Baron Samedit)

- **Vulnerable**: `sudo 1.8.27-1+deb10u2` from snapshot.debian.org
- **Patched**: `sudo 1.8.27-1+deb10u3` from debian-security
- Binary SHA-256 hashes verified and documented in metadata.json

### CVE-2022-0847 (Dirty Pipe)

- **Vulnerable**: `linux-image-unsigned-5.13.0-34-generic` from old-releases.ubuntu.com
- **Patched**: `linux-image-unsigned-5.13.0-35-generic` from old-releases.ubuntu.com
- Kernel binaries are large (100MB+); consider extracting specific sections

## Offline Notes

- Use cached package mirrors or `file://` sources for air-gapped runs.
- Keep hashes and timestamps deterministic; always use UTC ISO-8601 timestamps.
- Debian packages available via snapshot.debian.org for reproducible fetches.
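
One way to keep timestamps and serialized reports deterministic is sketched below, using only the Python standard library. The function names are hypothetical, and the `Z` suffix and sorted-key JSON layout are assumptions rather than requirements taken from the schemas.

```python
import json
from datetime import datetime, timezone


def utc_iso8601(epoch_seconds: int) -> str:
    """Format a Unix timestamp as UTC ISO-8601 with a trailing 'Z'."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_json(payload: dict) -> str:
    """Serialize with sorted keys and fixed indentation so reruns are byte-identical."""
    return json.dumps(payload, sort_keys=True, indent=2, ensure_ascii=True) + "\n"
```

Passing a fixed epoch value (for example, one taken from the package's own metadata) instead of the current wall-clock time keeps repeated runs from changing the output.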