# Golden Pairs Corpus

Golden pairs are curated binary pairs (original vs patched) used to validate binary-diff logic.
Binaries are stored outside git; this folder tracks metadata, hashes, and reports only.

## Current Corpus

| CVE | Name | Binary | Status | Notes |
|-----|------|--------|--------|-------|
| CVE-2021-3156 | Baron Samedit | sudo | Validated | Debian 10 packages with verified SHA-256 |
| CVE-2022-0847 | Dirty Pipe | vmlinux | Pending | Kernel binaries are large; fetch pending |

## Layout

```
datasets/golden-pairs/
  index.json
  README.md
  CVE-2021-3156/
    metadata.json
    advisories/
  CVE-2022-0847/
    metadata.json
    advisories/
```

When binaries are fetched:

```
CVE-YYYY-NNNN/
  original/
    <binary>
    <binary>.sha256
    <binary>.sections.json
  patched/
    <binary>
    <binary>.sha256
    <binary>.sections.json
  diff-report.json
```

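
The fetched layout can be checked mechanically before a pair is marked as validated. The following is a minimal sketch, not part of the `golden-pairs` tool: the function name `missing_files` is hypothetical, and it assumes `diff-report.json` sits at the top of the pair directory next to `original/` and `patched/`.

```python
from pathlib import Path

# Sidecar files expected next to each binary (per the fetched layout).
SIDE_SUFFIXES = (".sha256", ".sections.json")


def missing_files(pair_dir: Path, binary_name: str) -> list[str]:
    """Return the relative paths the fetched layout expects but that are absent.

    Assumption: diff-report.json lives directly under the pair directory.
    """
    expected = ["diff-report.json"]
    for side in ("original", "patched"):
        expected.append(f"{side}/{binary_name}")
        expected.extend(f"{side}/{binary_name}{suffix}" for suffix in SIDE_SUFFIXES)
    return [rel for rel in expected if not (pair_dir / rel).is_file()]
```

For example, `missing_files(Path("datasets/golden-pairs/CVE-2021-3156"), "sudo")` returns an empty list once both binaries, their sidecars, and the diff report are in place.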
## File Conventions

- `metadata.json` follows `docs/schemas/golden-pair-v1.schema.json`.
- `index.json` follows `docs/schemas/golden-pairs-index.schema.json`.
- `*.sha256` contains a single lowercase hex digest, no prefix.
- `*.sections.json` contains section hash output from the ELF hash extractor.
- `diff-report.json` is produced by `golden-pairs diff`.

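
The `*.sha256` convention is simple enough to reproduce by hand. The sketch below uses only the Python standard library; the helper names (`sha256_hex`, `write_sidecar`, `verify_sidecar`) are hypothetical, and the trailing newline in the sidecar file is an assumption rather than something the convention states. The file is read in chunks so that large kernel images do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file and return its SHA-256 as lowercase hex."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # hexdigest() is always lowercase


def write_sidecar(binary: Path) -> Path:
    """Write <binary>.sha256 containing only the digest (no prefix, no filename)."""
    sidecar = binary.with_name(binary.name + ".sha256")
    # Assumption: a single trailing newline is acceptable in the sidecar.
    sidecar.write_text(sha256_hex(binary) + "\n", encoding="ascii")
    return sidecar


def verify_sidecar(binary: Path) -> bool:
    """Compare the recorded digest against a fresh hash of the binary."""
    sidecar = binary.with_name(binary.name + ".sha256")
    recorded = sidecar.read_text(encoding="ascii").strip()
    return recorded == sha256_hex(binary)
```

Running `verify_sidecar` on both `original/<binary>` and `patched/<binary>` before diffing is a cheap way to catch a corrupted or partially fetched file.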
## Adding a Pair

1. Create `CVE-YYYY-NNNN/metadata.json` with the required fields.
2. Fetch binaries via `golden-pairs mirror CVE-...`.
3. Generate section hashes for each binary.
4. Run `golden-pairs diff CVE-...` and review `diff-report.json`.
5. Update `index.json` with status and summary counts.

## Package Sources

### CVE-2021-3156 (Baron Samedit)

- **Vulnerable**: `sudo 1.8.27-1+deb10u2` from snapshot.debian.org
- **Patched**: `sudo 1.8.27-1+deb10u3` from debian-security
- Binary SHA-256 hashes are verified and recorded in `metadata.json`.

### CVE-2022-0847 (Dirty Pipe)

- **Vulnerable**: `linux-image-unsigned-5.13.0-34-generic` from old-releases.ubuntu.com
- **Patched**: `linux-image-unsigned-5.13.0-35-generic` from old-releases.ubuntu.com
- Kernel binaries are large (100 MB+); consider extracting specific sections.

## Offline Notes

- Use cached package mirrors or `file://` sources for air-gapped runs.
- Keep hashes and timestamps deterministic; always use UTC ISO-8601 timestamps.
- Debian packages are available via snapshot.debian.org for reproducible fetches.
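
One way to keep timestamps deterministic is to pass the moment in explicitly and normalize it to UTC before serializing. This is a minimal sketch with a hypothetical helper name (`utc_iso8601`); the second-level precision and the `Z` suffix are assumptions, since the schemas may prescribe a different ISO-8601 profile.

```python
from datetime import datetime, timezone


def utc_iso8601(moment: datetime) -> str:
    """Format an aware datetime as UTC ISO-8601 with second precision."""
    if moment.tzinfo is None:
        # Naive datetimes are ambiguous; refuse rather than guess the zone.
        raise ValueError("naive datetime: attach a timezone before formatting")
    normalized = moment.astimezone(timezone.utc).replace(microsecond=0)
    return normalized.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Taking the datetime as a parameter, rather than calling `datetime.now()` inside the helper, lets a run reuse a recorded timestamp so that regenerated reports stay byte-identical.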