# ADR 0044: Binary Delta Signatures for Backport Detection
## Status
ACCEPTED (2026-01-03)
## Context
Vulnerability scanners today rely on version string comparison to determine if a package is vulnerable. However, Linux distributions (RHEL, Debian, Ubuntu, SUSE, Alpine) routinely **backport** security fixes into older versions without bumping the upstream version number.
### The Problem
**Example:** OpenSSL 1.0.1e on RHEL 6 carries a backported fix for Heartbleed (CVE-2014-0160). Upstream fixed the bug in 1.0.1g, and because `1.0.1e < 1.0.1g`, version-based scanners flag the package as vulnerable. This creates:
1. **False positives** - Patched systems flagged as vulnerable
2. **Alert fatigue** - Security teams waste time investigating non-issues
3. **Compliance failures** - Audit reports show phantom vulnerabilities
4. **Trust erosion** - Users distrust scanner results
### Current Mitigations
1. **Distro-specific advisory feeds** (DSA, RHSA, USN) - Incomplete coverage
2. **VEX statements from vendors** - Requires vendor participation, often delayed
3. **Manual triage** - Doesn't scale
4. **OVAL feeds** - OS packages only, not application binaries
### Requirements
- **Binary-level detection**: Examine compiled code, not version strings
- **Cryptographic proof**: Hash-based evidence that the fix is present
- **Offline operation**: Work in air-gapped environments
- **Multi-architecture**: Support x86-64, ARM64, and other ISAs
- **Deterministic**: Same binary → same signature across platforms
- **LTO resilience**: Handle Link-Time Optimization changes
## Decision
**Implement binary delta signature matching using normalized code comparison.**
### Architecture
```
┌──────────────────────────────────────────────────────────────────────────┐
│                         Delta Signature Pipeline                         │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Binary ──────► Disassembly ──────► Normalization ──────► Signature      │
│                                                                          │
│  ELF/PE/MachO   Iced (x86) or       Zero addresses        SHA-256 +      │
│                 B2R2 (ARM/MIPS)     Canonicalize NOPs     CFG hash +     │
│                                     Normalize PLT/GOT     Chunk hashes   │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
### Disassembly Engine Selection
**Chosen: Plugin-based architecture with Iced (primary) + B2R2 (fallback)**

| Engine | Strengths | Weaknesses |
|--------|-----------|------------|
| **Iced** | Fastest x86/x86-64 decoder, MIT license, pure C# | x86/x86-64 only |
| **B2R2** | Multi-arch (ARM, MIPS, RISC-V), IR lifting, MIT license | F# (requires wrapper) |

**Rationale:**
- Iced for the performance-critical x86/x86-64 path (90%+ of scanned binaries)
- B2R2 for ARM64, MIPS, RISC-V when needed
- Plugin architecture allows adding engines without core changes
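The plugin routing described in the rationale can be sketched as a small registry that asks each engine, in registration order, whether it supports a target architecture. This is a hypothetical Python illustration, not the StellaOps API (which is C#): `DisassemblyEngine`, `EngineRegistry`, the engine classes, and the architecture strings are all assumed names, and no actual decoding is performed.

```python
"""Hypothetical sketch of disassembly-engine selection by architecture.

None of these names come from the StellaOps codebase; they only illustrate
routing x86/x86-64 to Iced and everything else to B2R2.
"""
from dataclasses import dataclass, field
from typing import List, Protocol


class DisassemblyEngine(Protocol):
    name: str

    def supports(self, arch: str) -> bool: ...


@dataclass
class IcedEngine:
    name: str = "iced"

    def supports(self, arch: str) -> bool:
        return arch in {"x86", "x86_64"}


@dataclass
class B2R2Engine:
    name: str = "b2r2"

    def supports(self, arch: str) -> bool:
        # Assumed architecture identifiers; B2R2 also decodes x86.
        return arch in {"x86", "x86_64", "arm64", "armhf", "mips", "riscv64"}


@dataclass
class EngineRegistry:
    # Engines are consulted in registration order, so the fast x86-only
    # engine is registered first and the multi-arch engine acts as fallback.
    engines: List[DisassemblyEngine] = field(default_factory=list)

    def register(self, engine: DisassemblyEngine) -> None:
        self.engines.append(engine)

    def select(self, arch: str) -> DisassemblyEngine:
        for engine in self.engines:
            if engine.supports(arch):
                return engine
        raise LookupError(f"no disassembly engine registered for {arch!r}")


registry = EngineRegistry()
registry.register(IcedEngine())
registry.register(B2R2Engine())
print(registry.select("x86_64").name)  # iced
print(registry.select("arm64").name)   # b2r2
```

Registering Iced first means x86/x86-64 always takes the fast path even though B2R2 could also decode it, and adding another engine is a single `register` call, which is the property the plugin architecture is meant to provide.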
### Normalization Strategy
To compare binaries compiled by different toolchains/versions, we normalize:
1. **Zero addresses** - Replace absolute, PC-relative, and RIP-relative address operands with zero to remove layout variance
2. **Canonicalize NOPs** - Collapse NOP padding (single-byte `0x90`, multi-byte `0F 1F /0` forms, etc.) into a single canonical NOP
3. **Normalize PLT/GOT** - Replace dynamic linking stubs with symbolic tokens
4. **Zero relocations** - Remove relocation target variance
5. **Normalize jump tables** - Convert absolute offsets to relative

**Recipe versioning**: Every signature includes the normalization recipe ID and version. Changing normalization behavior requires a version bump.
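A toy sketch of steps 1 to 4, applied to instructions that have already been decoded into a mnemonic plus tagged operands. The operand kinds (`addr`, `rel`, `reloc`, `plt`, ...), the `Insn`/`Operand` types, and the recipe identifier are hypothetical, and jump-table normalization (step 5) is omitted. Two builds of the same function that differ only in load address and NOP padding normalize to the same text, which is what makes a later hash comparison meaningful.

```python
"""Toy sketch of instruction normalization before hashing.

Assumption: instructions arrive decoded as (mnemonic, operands), with each
operand tagged by a hypothetical kind. Real pipelines work on engine-specific
instruction objects.
"""
from typing import List, NamedTuple


class Operand(NamedTuple):
    kind: str      # hypothetical tags: "reg", "imm", "addr", "rel", "reloc", "plt", ...
    value: object


class Insn(NamedTuple):
    mnemonic: str
    operands: tuple


RECIPE = ("example.normalize.x64", "1.0.0")  # hypothetical recipe id and version


def normalize_operand(op: Operand) -> Operand:
    if op.kind in ("addr", "rel"):
        # Absolute, PC-relative and RIP-relative targets depend on layout.
        return Operand("addr", 0)
    if op.kind == "reloc":
        return Operand("reloc", 0)
    if op.kind == "plt":
        # "memcpy@plt" becomes the symbolic token "memcpy".
        return Operand("sym", str(op.value).split("@", 1)[0])
    return op


def normalize(insns: List[Insn]) -> List[Insn]:
    out: List[Insn] = []
    for insn in insns:
        if insn.mnemonic == "nop":
            # Any NOP form becomes a bare NOP, and a run of NOPs collapses to one.
            if out and out[-1].mnemonic == "nop":
                continue
            out.append(Insn("nop", ()))
            continue
        out.append(Insn(insn.mnemonic, tuple(normalize_operand(o) for o in insn.operands)))
    return out


def render(insns: List[Insn]) -> str:
    """Serialize normalized instructions into a stable text form for hashing."""
    lines = [f"recipe={RECIPE[0]}@{RECIPE[1]}"]
    for insn in insns:
        ops = ",".join(f"{o.kind}:{o.value}" for o in insn.operands)
        lines.append(f"{insn.mnemonic} {ops}".rstrip())
    return "\n".join(lines)


build_a = [
    Insn("lea", (Operand("reg", "rdi"), Operand("rel", 0x4010A0))),
    Insn("call", (Operand("plt", "memcpy@plt"),)),
    Insn("nop", (Operand("mem", "[rax+rax]"),)),  # multi-byte NOP
    Insn("nop", ()),
    Insn("ret", ()),
]
build_b = [  # same code, different load address and padding
    Insn("lea", (Operand("reg", "rdi"), Operand("rel", 0x5020C0))),
    Insn("call", (Operand("plt", "memcpy@plt"),)),
    Insn("nop", ()),
    Insn("ret", ()),
]
assert render(normalize(build_a)) == render(normalize(build_b))
```

The serialized form begins with the recipe identifier, so output produced under different recipe versions never hashes equal, in line with the recipe-versioning rule. Running `normalize` twice gives the same result as running it once, which is the idempotency property a normalization pipeline needs.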
### Signature Components
```json
{
  "schema": "stellaops.deltasig.v1",
  "cve": "CVE-2014-0160",
  "package": { "name": "openssl", "soname": "libssl.so.1.0.0" },
  "target": { "arch": "x86_64", "abi": "gnu" },
  "normalization": { "recipeId": "stellaops.normalize.x64.v1", "version": "1.0.0" },
  "signatureState": "patched",
  "symbols": [
    {
      "name": "tls1_process_heartbeat",
      "hashAlg": "sha256",
      "hashHex": "abc123...",
      "sizeBytes": 4096,
      "cfgBbCount": 15,
      "cfgEdgeHash": "def456...",
      "chunks": [
        { "offset": 0, "size": 2048, "hashHex": "..." },
        { "offset": 2048, "size": 2048, "hashHex": "..." }
      ]
    }
  ]
}
```
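The hash fields of a `symbols` entry can be derived from the normalized bytes of one function roughly as follows. This is a sketch under stated assumptions: fixed 2048-byte chunks (taken from the example), SHA-256 for every hash, and a CFG edge hash computed over sorted `(source, destination)` basic-block index pairs. The ADR does not specify how `cfgEdgeHash` is computed, so that part in particular is hypothetical.

```python
"""Sketch: derive the hash fields of one "symbols" entry from normalized bytes.

Assumptions: fixed-size chunks, SHA-256 throughout, and a hypothetical
CFG edge hash over sorted (src, dst) basic-block index pairs.
"""
import hashlib

CHUNK_SIZE = 2048  # assumed fixed chunk size


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def chunk_hashes(code: bytes, size: int = CHUNK_SIZE) -> list:
    chunks = []
    for offset in range(0, len(code), size):
        piece = code[offset:offset + size]  # last chunk may be shorter
        chunks.append({"offset": offset, "size": len(piece), "hashHex": sha256_hex(piece)})
    return chunks


def cfg_edge_hash(edges) -> str:
    # Sorting makes the hash independent of edge discovery order.
    canonical = ";".join(f"{src}->{dst}" for src, dst in sorted(edges))
    return sha256_hex(canonical.encode("ascii"))


def symbol_entry(name: str, code: bytes, block_count: int, edges) -> dict:
    return {
        "name": name,
        "hashAlg": "sha256",
        "hashHex": sha256_hex(code),
        "sizeBytes": len(code),
        "cfgBbCount": block_count,
        "cfgEdgeHash": cfg_edge_hash(edges),
        "chunks": chunk_hashes(code),
    }


normalized = bytes(range(256)) * 16  # stand-in for 4096 bytes of normalized code
entry = symbol_entry("tls1_process_heartbeat", normalized, 3, [(0, 1), (0, 2), (1, 2)])
print(entry["sizeBytes"], [c["offset"] for c in entry["chunks"]])  # 4096 [0, 2048]
```

Chunk hashes are what let a partially changed function still match: a local edit only perturbs the chunks it touches, while the whole-function `hashHex` changes on any difference.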
### Matching Strategy
1. **Exact match** - Full normalized hash matches patched or vulnerable signature
2. **Chunk match** - ≥70% of chunks match (handles LTO modifications)
3. **CFG match** - Control flow graph structure matches (catches recompilations)
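The three-tier cascade can be sketched as a function that tries the strictest test first and falls back to looser ones, evaluated against both the patched and the vulnerable signature. Assumptions: the tiers are tried in the listed order, the chunk tier counts the fraction of signature chunk hashes present in the candidate, the CFG tier compares `cfgBbCount` plus `cfgEdgeHash`, and an equal-strength hit on both states is reported as `ambiguous` (a hypothetical state not named in this ADR). Field names follow the signature JSON above; the functions themselves are illustrative.

```python
"""Sketch of the exact -> chunk -> CFG matching cascade (hypothetical helpers)."""
from typing import Optional, Tuple

CHUNK_THRESHOLD = 0.70
TIER_RANK = {"exact": 0, "chunk": 1, "cfg": 2}  # lower is stricter


def match_symbol(candidate: dict, signature: dict) -> Optional[str]:
    """Return the strictest tier that matches, or None."""
    if candidate["hashHex"] == signature["hashHex"]:
        return "exact"
    sig_chunks = [c["hashHex"] for c in signature["chunks"]]
    cand_chunks = {c["hashHex"] for c in candidate["chunks"]}
    if sig_chunks:
        matched = sum(1 for h in sig_chunks if h in cand_chunks)
        if matched / len(sig_chunks) >= CHUNK_THRESHOLD:
            return "chunk"
    if (candidate["cfgBbCount"] == signature["cfgBbCount"]
            and candidate["cfgEdgeHash"] == signature["cfgEdgeHash"]):
        return "cfg"
    return None


def classify(candidate: dict, patched: dict, vulnerable: dict) -> Tuple[str, Optional[str]]:
    hits = []
    for state, sig in (("patched", patched), ("vulnerable", vulnerable)):
        tier = match_symbol(candidate, sig)
        if tier is not None:
            hits.append((TIER_RANK[tier], state, tier))
    if not hits:
        return ("unknown", None)
    hits.sort()
    if len(hits) == 2 and hits[0][0] == hits[1][0]:
        return ("ambiguous", hits[0][2])  # same tier matched both states
    return (hits[0][1], hits[0][2])


def sym(h, chunks, bb=4, edge="e1"):
    """Build a minimal symbol record with placeholder hashes."""
    return {"hashHex": h, "chunks": [{"hashHex": c} for c in chunks],
            "cfgBbCount": bb, "cfgEdgeHash": edge}


patched = sym("P", ["a", "b", "c", "d"])
vulnerable = sym("V", ["a", "x", "y", "z"], edge="e2")
# Rebuilt with LTO: one chunk differs, so the whole-function hash differs.
candidate = sym("P'", ["a", "b", "c", "q"])
print(classify(candidate, patched, vulnerable))  # ('patched', 'chunk')
```

Ranking by tier means an exact hash hit on one state outranks a looser chunk or CFG hit on the other, which matters because the patched and vulnerable builds of a small fix can share chunks or CFG shape.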
### VEX Evidence Emission
When a binary is confirmed patched via delta signature:
```json
{
  "result": "patched",
  "cveIds": ["CVE-2014-0160"],
  "confidence": 0.95,
  "symbolMatches": [
    { "symbolName": "tls1_process_heartbeat", "state": "patched", "exactMatch": true }
  ],
  "justification": "vulnerable_code_not_present",
  "summary": "Binary confirmed PATCHED with 95% confidence. 1 symbol(s) matched patched signatures exactly."
}
```
This evidence feeds into VEX candidate generation with a full audit trail.
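Assembling the evidence object from per-symbol results is mostly bookkeeping. The sketch below reproduces the field names and summary wording of the example above. The caller supplies `confidence` because this ADR does not describe how it is scored, and the `inconclusive` result for anything short of all-patched is a hypothetical placeholder, not a documented state.

```python
"""Sketch: build the VEX evidence object from per-symbol match results."""


def build_evidence(cve_ids, symbol_matches, confidence):
    """symbol_matches: dicts with symbolName, state and exactMatch keys."""
    if not symbol_matches or any(m["state"] != "patched" for m in symbol_matches):
        # Hypothetical fallback: no not_affected justification is emitted.
        return {"result": "inconclusive", "cveIds": list(cve_ids),
                "confidence": confidence, "symbolMatches": list(symbol_matches)}
    exact = sum(1 for m in symbol_matches if m["exactMatch"])
    return {
        "result": "patched",
        "cveIds": list(cve_ids),
        "confidence": confidence,
        "symbolMatches": list(symbol_matches),
        # OpenVEX justification for a not_affected status.
        "justification": "vulnerable_code_not_present",
        "summary": (f"Binary confirmed PATCHED with {confidence:.0%} confidence. "
                    f"{exact} symbol(s) matched patched signatures exactly."),
    }


evidence = build_evidence(
    ["CVE-2014-0160"],
    [{"symbolName": "tls1_process_heartbeat", "state": "patched", "exactMatch": True}],
    0.95,
)
print(evidence["summary"])
```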
## Alternatives Considered
### 1. Source Code Comparison
**Rejected**: Requires source access, does not work for closed-source binaries, and compiler options change the resulting machine code.
### 2. Debug Symbol Matching
**Rejected**: Symbols are often stripped in production, and a symbol's presence does not prove what code it contains.
### 3. File Hash Matching
**Rejected**: The entire binary must match exactly, so any rebuild invalidates the signature.
### 4. YARA Rules
**Rejected**: Pattern-based, high false positive rate, doesn't provide cryptographic proof.
### 5. Single Disassembly Engine (B2R2 only)
**Rejected**: Performance is critical; Iced is 3-5x faster on x86/x86-64, which accounts for 90%+ of scanned binaries.
## Consequences
### Positive
1. **Eliminate false positives** for backported security fixes
2. **Cryptographic proof** of patch status (auditable, reproducible)
3. **Offline operation** with signature packs
4. **Multi-architecture** support for modern infrastructure
5. **VEX integration** for automated triage
### Negative
1. **Signature authoring required** - Must create signatures for each CVE/package
2. **Normalization limits** - Extreme compiler optimizations may defeat matching
3. **Storage overhead** - Signature database growth
4. **Compute cost** - Disassembly + normalization per binary
### Mitigations
- **Signature federation** - Share signatures across organizations
- **Chunk matching** - Resilient to LTO and PGO changes
- **Priority authoring** - Focus on high-severity CVEs first
- **Incremental scanning** - Cache analysis results
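Incremental scanning can be as simple as memoizing analysis by content. A minimal sketch, assuming a cache key of file SHA-256 plus normalization recipe ID and version, so that a recipe version bump (which recipe versioning requires whenever normalization changes) automatically invalidates old entries. `AnalysisCache` and the stand-in analyzer are hypothetical; a real deployment would persist entries rather than keep them in a dict.

```python
"""Sketch of content-addressed caching of per-binary analysis results."""
import hashlib
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (file sha256, recipe id, recipe version)


class AnalysisCache:
    def __init__(self) -> None:
        self._entries: Dict[Key, dict] = {}
        self.misses = 0

    def get_or_analyze(self, binary: bytes, recipe_id: str, recipe_version: str,
                       analyze: Callable[[bytes], dict]) -> dict:
        key = (hashlib.sha256(binary).hexdigest(), recipe_id, recipe_version)
        if key not in self._entries:
            self.misses += 1  # only cache misses pay for disassembly + normalization
            self._entries[key] = analyze(binary)
        return self._entries[key]


def fake_analyze(binary: bytes) -> dict:
    """Stand-in for the real disassembly + normalization pass."""
    return {"size": len(binary)}


cache = AnalysisCache()
recipe = "stellaops.normalize.x64.v1"
cache.get_or_analyze(b"\x90\xc3", recipe, "1.0.0", fake_analyze)  # miss
cache.get_or_analyze(b"\x90\xc3", recipe, "1.0.0", fake_analyze)  # hit
cache.get_or_analyze(b"\x90\xc3", recipe, "1.1.0", fake_analyze)  # miss: recipe bumped
print(cache.misses)  # 2
```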
## Implementation
### Sprint: SPRINT_20260102_001_BE
| Component | Status | Notes |
|-----------|--------|-------|
| Disassembly.Abstractions | DONE | Plugin interface, models |
| Disassembly.Iced | DONE | x86/x86-64 support |
| Disassembly.B2R2 | DONE | Multi-arch support |
| Normalization | DONE | X64 + ARM64 pipelines |
| DeltaSig | DONE | Generator + matcher |
| Persistence | DONE | PostgreSQL schema |
| CLI | DONE | extract, author, sign, verify, match, pack, inspect |
| Scanner integration | DONE | DeltaSigAnalyzer, IBinaryVulnerabilityService |
| VEX emission | DONE | DeltaSignatureEvidence, DeltaSigVexEmitter |
### Test Coverage
- 74 unit tests for DeltaSig library
- 45 unit tests for Normalization
- 24 unit tests for Disassembly
- 11 property tests (FsCheck) for normalization idempotency
- 14 golden tests for known CVEs (Heartbleed, Log4Shell, POODLE)
- 25 unit tests for VEX evidence emission
## References
- [Binary Diff Signatures Advisory](../product-advisories/30-Dec-2025%20-%20Binary%20Diff%20Signatures%20for%20Patch%20Detection.md)
- [B2R2 GitHub](https://github.com/B2R2-org/B2R2)
- [Iced GitHub](https://github.com/icedland/iced)
- [OpenVEX Specification](https://github.com/openvex/spec)
- [CVE-2014-0160 (Heartbleed)](https://nvd.nist.gov/vuln/detail/CVE-2014-0160)