# Golden Pairs Corpus
Golden pairs are curated binary pairs (original vs patched) used to validate binary-diff logic.
Binaries are stored outside git; this folder tracks metadata, hashes, and reports only.
## Current Corpus
| CVE | Name | Binary | Status | Notes |
|-----|------|--------|--------|-------|
| CVE-2021-3156 | Baron Samedit | sudo | Validated | Debian 10 packages with verified SHA-256 |
| CVE-2022-0847 | Dirty Pipe | vmlinux | Pending | Kernel binaries are large; fetch pending |
## Layout
```
datasets/golden-pairs/
  index.json
  README.md
  CVE-2021-3156/
    metadata.json
    advisories/
  CVE-2022-0847/
    metadata.json
    advisories/
```
When binaries are fetched:
```
CVE-YYYY-NNNN/
  original/
    <binary>
    <binary>.sha256
    <binary>.sections.json
  patched/
    <binary>
    <binary>.sha256
    <binary>.sections.json
  diff-report.json
```
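A minimal Python sketch for checking that a fetched pair directory matches the layout above. The helper name `missing_pair_files` is hypothetical and not part of the `golden-pairs` tooling. It also assumes that `diff-report.json` sits at the top level of the CVE directory.

```python
from pathlib import Path


def missing_pair_files(pair_dir, binary_name):
    """Return the relative paths expected by the layout that are absent on disk."""
    root = Path(pair_dir)
    expected = []
    for side in ("original", "patched"):
        # Each side holds the binary, its digest file, and its section hashes.
        for suffix in ("", ".sha256", ".sections.json"):
            expected.append(f"{side}/{binary_name}{suffix}")
    # Assumption: the diff report lives next to original/ and patched/.
    expected.append("diff-report.json")
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty return value means every file named in the layout is present; it says nothing about whether the contents are valid.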
## File Conventions
- `metadata.json` follows `docs/schemas/golden-pair-v1.schema.json`.
- `index.json` follows `docs/schemas/golden-pairs-index.schema.json`.
- `*.sha256` contains a single lowercase hex digest, no prefix.
- `*.sections.json` contains section hash output from the ELF hash extractor.
- `diff-report.json` is produced by `golden-pairs diff`.
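The `*.sha256` convention can be checked with a short Python sketch such as the one below. The function names are hypothetical, and the trailing newline written after the digest is an assumption; the convention above only requires a single lowercase hex digest with no prefix.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large binaries are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # hexdigest() is already lowercase


def write_digest(binary_path):
    """Write <binary>.sha256 next to the binary and return its path."""
    sidecar = Path(str(binary_path) + ".sha256")
    # Assumption: one digest followed by a newline, nothing else.
    sidecar.write_text(sha256_of(binary_path) + "\n", encoding="ascii")
    return sidecar


def verify_digest(binary_path):
    """Return True when the recorded digest matches the binary on disk."""
    sidecar = Path(str(binary_path) + ".sha256")
    recorded = sidecar.read_text(encoding="ascii").strip()
    return recorded == sha256_of(binary_path)
```

Reading in fixed-size chunks keeps memory use flat, which matters for kernel images in the 100 MB range.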
## Adding a Pair
1. Create `CVE-YYYY-NNNN/metadata.json` with the required fields.
2. Fetch binaries via `golden-pairs mirror CVE-...`.
3. Generate section hashes for each binary.
4. Run `golden-pairs diff CVE-...` and review `diff-report.json`.
5. Update `index.json` with status and summary counts.
## Package Sources
### CVE-2021-3156 (Baron Samedit)
- **Vulnerable**: `sudo 1.8.27-1+deb10u2` from snapshot.debian.org
- **Patched**: `sudo 1.8.27-1+deb10u3` from debian-security
- Binary SHA-256 hashes are verified and recorded in `metadata.json`.
### CVE-2022-0847 (Dirty Pipe)
- **Vulnerable**: `linux-image-unsigned-5.13.0-34-generic` from old-releases.ubuntu.com
- **Patched**: `linux-image-unsigned-5.13.0-35-generic` from old-releases.ubuntu.com
- Kernel binaries are large (100 MB+); consider extracting only the sections needed for the diff.
## Offline Notes
- Use cached package mirrors or `file://` sources for air-gapped runs.
- Keep hashes and timestamps deterministic; always use UTC ISO-8601 timestamps.
- Debian packages are available from snapshot.debian.org for reproducible fetches.
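
One way to keep timestamps deterministic is to normalize every value to UTC before serializing it. The Python sketch below assumes second precision and a trailing `Z` designator; neither choice is stated in this README, and the helper name `utc_timestamp` is hypothetical.

```python
from datetime import datetime, timezone


def utc_timestamp(moment):
    """Format an aware datetime as a UTC ISO-8601 string such as 2024-01-02T03:04:05Z."""
    if moment.tzinfo is None:
        # A naive datetime would be interpreted in the local zone, which is not reproducible.
        raise ValueError("naive datetime; attach a timezone before formatting")
    # Assumption: second precision with a trailing "Z".
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Rejecting naive datetimes avoids output that silently depends on the local timezone of the machine running the tooling.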