Golden Pairs Corpus

Golden pairs are curated binary pairs (original vs patched) used to validate binary-diff logic. Binaries are stored outside git; this folder tracks metadata, hashes, and reports only.

Current Corpus

CVE             Name            Binary    Status      Notes
CVE-2021-3156   Baron Samedit   sudo      Validated   Debian 10 packages with verified SHA-256
CVE-2022-0847   Dirty Pipe      vmlinux   Pending     Kernel binaries large; fetch pending

Layout

datasets/golden-pairs/
  index.json
  README.md
  CVE-2021-3156/
    metadata.json
    advisories/
  CVE-2022-0847/
    metadata.json
    advisories/

After binaries are fetched, each pair directory also contains:

  CVE-YYYY-NNNN/
    original/
      <binary>
      <binary>.sha256
      <binary>.sections.json
    patched/
      <binary>
      <binary>.sha256
      <binary>.sections.json
    diff-report.json
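
The fetched layout above can be checked mechanically. The following is a minimal sketch, not part of the golden-pairs tooling: the helper names `expected_pair_files` and `missing_pair_files` are hypothetical, and it assumes both sides of a pair use the same binary file name, as the layout suggests.

```python
from pathlib import Path


def expected_pair_files(binary_name: str) -> list[str]:
    """List the relative paths a fetched pair directory should contain."""
    files = []
    for side in ("original", "patched"):
        files.append(f"{side}/{binary_name}")
        files.append(f"{side}/{binary_name}.sha256")
        files.append(f"{side}/{binary_name}.sections.json")
    files.append("diff-report.json")
    return files


def missing_pair_files(pair_dir: Path, binary_name: str) -> list[str]:
    """Return the expected relative paths that are not regular files under pair_dir."""
    return [
        rel
        for rel in expected_pair_files(binary_name)
        if not (pair_dir / rel).is_file()
    ]
```

For example, `missing_pair_files(Path("datasets/golden-pairs/CVE-2021-3156"), "sudo")` returns an empty list once every file in the layout is present.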

File Conventions

  • metadata.json follows docs/schemas/golden-pair-v1.schema.json.
  • index.json follows docs/schemas/golden-pairs-index.schema.json.
  • *.sha256 contains a single lowercase hex digest, no prefix.
  • *.sections.json contains section hash output from the ELF hash extractor.
  • diff-report.json is produced by golden-pairs diff.
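
As an illustration of the *.sha256 convention, here is a minimal sketch that writes a sidecar file next to a binary. The function names are hypothetical, and the trailing newline is an assumption, since the convention only specifies a single lowercase hex digest with no prefix.

```python
import hashlib
from pathlib import Path


def sha256_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # hexdigest() always returns lowercase hex


def write_sha256_sidecar(binary: Path) -> Path:
    """Write <binary>.sha256 holding only the digest: no prefix, no file name."""
    sidecar = binary.with_name(binary.name + ".sha256")
    # Assumption: one trailing newline; readers should strip whitespace.
    sidecar.write_text(sha256_hex(binary) + "\n", encoding="ascii")
    return sidecar
```

Reading in chunks keeps memory use flat for large inputs such as kernel images.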

Adding a Pair

  1. Create CVE-YYYY-NNNN/metadata.json with the fields required by docs/schemas/golden-pair-v1.schema.json.
  2. Fetch binaries via golden-pairs mirror CVE-....
  3. Generate section hashes for each binary.
  4. Run golden-pairs diff CVE-... and review diff-report.json.
  5. Update index.json with status and summary counts.
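
Before marking a pair as validated in index.json, it may help to confirm that each fetched binary still matches its recorded digest. This is a sketch under assumptions, not the golden-pairs implementation: `read_expected_digest` and `matches_sidecar` are hypothetical helpers, and the sidecar is assumed to sit next to the binary as in the layout.

```python
import hashlib
import re
from pathlib import Path

# A bare SHA-256 digest: exactly 64 lowercase hex characters, no prefix.
HEX_DIGEST = re.compile(r"[0-9a-f]{64}")


def read_expected_digest(sidecar: Path) -> str:
    """Read a *.sha256 file and reject anything but a bare lowercase digest."""
    text = sidecar.read_text(encoding="ascii").strip()
    if HEX_DIGEST.fullmatch(text) is None:
        raise ValueError(f"{sidecar}: expected a bare lowercase SHA-256 hex digest")
    return text


def matches_sidecar(binary: Path) -> bool:
    """Return True when the binary's SHA-256 equals the digest in <binary>.sha256."""
    sidecar = binary.with_name(binary.name + ".sha256")
    digest = hashlib.sha256()
    with binary.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == read_expected_digest(sidecar)
```

Rejecting prefixed or uppercase digests up front makes a convention violation fail loudly instead of surfacing as a confusing mismatch.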

Package Sources

CVE-2021-3156 (Baron Samedit)

  • Vulnerable: sudo 1.8.27-1+deb10u2 from snapshot.debian.org
  • Patched: sudo 1.8.27-1+deb10u3 from debian-security
  • Binary SHA-256 hashes verified and documented in metadata.json

CVE-2022-0847 (Dirty Pipe)

  • Vulnerable: linux-image-unsigned-5.13.0-34-generic from old-releases.ubuntu.com
  • Patched: linux-image-unsigned-5.13.0-35-generic from old-releases.ubuntu.com
  • Kernel binaries are large (100 MB+); consider extracting specific sections

Offline Notes

  • Use cached package mirrors or file:// sources for air-gapped runs.
  • Keep hashes and timestamps deterministic; always use UTC ISO-8601 timestamps.
  • Debian packages are available via snapshot.debian.org for reproducible fetches.
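
One way to keep timestamps and JSON output deterministic is sketched below. The second precision, the trailing `Z`, and the sorted-key JSON layout are assumptions made for illustration; the schemas under docs/schemas/ remain the authority on the actual field formats, and both function names are hypothetical.

```python
import json
from datetime import datetime, timezone


def utc_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC ISO-8601 with second precision."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime; attach a timezone before formatting")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def dump_deterministic(document: dict) -> str:
    """Serialize JSON with sorted keys so identical data yields identical bytes."""
    return json.dumps(document, sort_keys=True, indent=2, ensure_ascii=True) + "\n"
```

Passing a recorded timestamp into `utc_timestamp`, rather than calling the clock at report time, keeps repeated runs byte-identical.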