# ADR 0042: CGS Merkle Tree Implementation
## Status
ACCEPTED (2025-12-29)
## Context
The CGS (Canonical Graph Signature) system requires deterministic hash computation for verdicts. We need to decide whether to:
1. Reuse existing `StellaOps.Attestor.ProofChain` Merkle tree builder
2. Build a custom Merkle tree implementation in `VerdictBuilderService`
### Requirements
- **Determinism**: Same evidence must always produce identical CGS hash
- **Order Independence**: VEX document ordering should not affect hash (sorted internally)
- **Cross-Platform**: Identical hash on Windows, macOS, Linux (glibc), Linux (musl), BSD
- **Leaf Composition**: Fixed ordering of evidence components (SBOM, feed snapshot digest, VEX sorted, reachability if present, policy lock)
### Existing ProofChain Merkle Builder
Located at: `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/`
**Pros:**
- Already implements Merkle tree construction
- Tested and proven in production
- Handles parent/child attestation chains
**Cons:**
- Designed for attestation chains, not evidence hashing
- Includes attestation-specific metadata in hash
- Doesn't support custom leaf ordering required for CGS
- Would require modifications that might break existing attestation behavior
## Decision
**Build custom Merkle tree implementation in `VerdictBuilderService`.**
### Rationale
1. **Separation of Concerns**: CGS hash computation has different requirements than attestation chain verification
2. **Full Control Over Determinism**: Custom implementation allows:
   - Explicit leaf ordering: SBOM → FeedSnapshot → VEX (sorted) → Reachability (if present) → PolicyLock
   - VEX document sorting by canonical JSON content (not insertion order)
   - Culture-invariant string comparison (`StringComparer.Ordinal`)
3. **Simplicity**: ~50 lines of code vs modifying 500+ lines in ProofChain
4. **No Breaking Changes**: Doesn't affect existing attestation infrastructure
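
The role of `StringComparer.Ordinal` in point 2 can be illustrated with a minimal sketch. The JSON strings below are hypothetical placeholders, not real VEX documents; the point is only that an ordinal sort yields the same sequence regardless of insertion order or current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical canonical VEX documents, supplied in two different insertion orders.
var first = new List<string> { "{\"id\":\"b\"}", "{\"id\":\"A\"}", "{\"id\":\"a\"}" };
var second = new List<string> { "{\"id\":\"a\"}", "{\"id\":\"b\"}", "{\"id\":\"A\"}" };

// Ordinal comparison orders strings by UTF-16 code unit value, so the result
// does not depend on the current culture or on platform collation data.
var sortedFirst = first.OrderBy(v => v, StringComparer.Ordinal).ToList();
var sortedSecond = second.OrderBy(v => v, StringComparer.Ordinal).ToList();

Console.WriteLine(sortedFirst.SequenceEqual(sortedSecond)); // True
Console.WriteLine(string.Join(" ", sortedFirst));           // {"id":"A"} {"id":"a"} {"id":"b"}
```

Uppercase `A` sorts before lowercase `a` here because ordinal comparison uses raw code unit values; a culture-sensitive comparer could order them differently depending on locale.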
### Implementation
```csharp
// VerdictBuilderService.cs (excerpt)
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

private static string ComputeCgsHash(EvidencePack evidence, PolicyLock policyLock)
{
    // Build Merkle tree leaves in a fixed order (only the VEX entries are sorted)
    var leaves = new List<string>
    {
        ComputeHash(evidence.SbomCanonJson),
        ComputeHash(evidence.FeedSnapshotDigest)
    };

    // Add VEX digests in sorted order (ORDER-CRITICAL for determinism!)
    foreach (var vex in evidence.VexCanonJson.OrderBy(v => v, StringComparer.Ordinal))
    {
        leaves.Add(ComputeHash(vex));
    }

    // Add reachability if present
    if (!string.IsNullOrEmpty(evidence.ReachabilityGraphJson))
    {
        leaves.Add(ComputeHash(evidence.ReachabilityGraphJson));
    }

    // Add policy lock hash
    var policyLockJson = JsonSerializer.Serialize(policyLock, CanonicalJsonOptions);
    leaves.Add(ComputeHash(policyLockJson));

    // Build Merkle root
    var merkleRoot = BuildMerkleRoot(leaves);
    return $"cgs:sha256:{merkleRoot}";
}

private static string BuildMerkleRoot(List<string> leaves)
{
    if (leaves.Count == 0)
        return ComputeHash("");
    if (leaves.Count == 1)
        return leaves[0];

    var level = leaves.ToList();
    while (level.Count > 1)
    {
        var nextLevel = new List<string>();
        for (int i = 0; i < level.Count; i += 2)
        {
            if (i + 1 < level.Count)
            {
                // Combine two hashes (concatenated lowercase hex strings)
                var combined = level[i] + level[i + 1];
                nextLevel.Add(ComputeHash(combined));
            }
            else
            {
                // Odd number of nodes, promote last one unchanged
                nextLevel.Add(level[i]);
            }
        }
        level = nextLevel;
    }
    return level[0];
}

// SHA-256 over the UTF-8 bytes of the input, returned as lowercase hex
private static string ComputeHash(string input)
{
    var bytes = Encoding.UTF8.GetBytes(input);
    var hashBytes = SHA256.HashData(bytes);
    return Convert.ToHexString(hashBytes).ToLowerInvariant();
}
```
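
As a worked example of the pairing rule, the following sketch restates `ComputeHash` and `BuildMerkleRoot` as standalone local functions and checks the three-leaf case by hand. The leaf inputs (`"sbom"`, `"feed"`, `"policy"`) are hypothetical placeholders; with an odd leaf count, the last leaf is promoted unchanged and combined one level up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Same hashing rule as the ADR: SHA-256 over UTF-8 bytes, lowercase hex.
string ComputeHash(string input) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(input))).ToLowerInvariant();

// Same pairing rule as the ADR: hash adjacent pairs, promote an unpaired last node.
string BuildMerkleRoot(List<string> leaves)
{
    if (leaves.Count == 0) return ComputeHash("");
    var level = leaves.ToList();
    while (level.Count > 1)
    {
        var next = new List<string>();
        for (int i = 0; i < level.Count; i += 2)
            next.Add(i + 1 < level.Count ? ComputeHash(level[i] + level[i + 1]) : level[i]);
        level = next;
    }
    return level[0];
}

// Hypothetical leaf inputs standing in for SBOM, feed snapshot, and policy lock.
var a = ComputeHash("sbom");
var b = ComputeHash("feed");
var c = ComputeHash("policy");

// Three leaves: level 1 is [H(a+b), c], level 2 is H(H(a+b) + c).
var root = BuildMerkleRoot(new List<string> { a, b, c });
var byHand = ComputeHash(ComputeHash(a + b) + c);

Console.WriteLine(root == byHand); // True
Console.WriteLine(root.Length);    // 64
```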
## Consequences
### Positive
- ✅ Full control over CGS hash computation logic
- ✅ No risk of breaking existing attestation chains
- ✅ Simple, testable implementation (~50 lines)
- ✅ Explicit ordering guarantees determinism
- ✅ Cross-platform verified (Windows, macOS, Linux, Alpine, Debian)
### Negative
- ⚠️ Code duplication with ProofChain (minimal - different use case)
- ⚠️ Need to maintain separate Merkle tree implementation (low maintenance burden)
### Neutral
- 📝 Custom implementation documented in tests (CgsDeterminismTests.cs)
- 📝 Future: Could extract shared Merkle tree primitives if needed
## Alternatives Considered
### Alternative 1: Modify ProofChain Builder
**Rejected because:**
- Would require adding configuration options to ProofChain
- Risk of breaking existing attestation behavior
- Increased complexity for both use cases
- Tight coupling between verdict and attestation systems
### Alternative 2: Use Third-Party Merkle Tree Library
**Rejected because:**
- External dependency for ~50 lines of code
- Less control over ordering and hash format
- Potential platform-specific issues
- Security review overhead
### Alternative 3: Single-Level Hash (No Merkle Tree)
**Rejected because:**
- Loses incremental verification capability
- Can't prove individual evidence components without full evidence pack
- Less efficient for large evidence packs (can't skip unchanged components)
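
To make the incremental-verification argument concrete, here is a minimal sketch of an inclusion proof under the same pairing rule (hash adjacent pairs, promote an unpaired node). The ADR does not show proof generation, so `BuildProof` and `VerifyProof` are hypothetical helpers, and the leaf inputs are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

string ComputeHash(string input) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(input))).ToLowerInvariant();

// Hypothetical helper: returns the root and the sibling path for the leaf at `index`.
// Assumes a non-empty list and an in-range index.
(string Root, List<(string Sibling, bool SiblingOnLeft)> Path) BuildProof(List<string> leaves, int index)
{
    var path = new List<(string Sibling, bool SiblingOnLeft)>();
    var level = leaves.ToList();
    while (level.Count > 1)
    {
        var next = new List<string>();
        for (int i = 0; i < level.Count; i += 2)
        {
            if (i + 1 < level.Count)
            {
                if (i == index) path.Add((level[i + 1], false));
                else if (i + 1 == index) path.Add((level[i], true));
                next.Add(ComputeHash(level[i] + level[i + 1]));
            }
            else
            {
                next.Add(level[i]); // promoted node: no sibling, so no proof step
            }
        }
        index /= 2; // position of the tracked node in the next level
        level = next;
    }
    return (level[0], path);
}

// Hypothetical helper: recomputes the root from one leaf hash and its sibling path.
bool VerifyProof(string leaf, List<(string Sibling, bool SiblingOnLeft)> path, string root)
{
    var current = leaf;
    foreach (var (sibling, siblingOnLeft) in path)
        current = siblingOnLeft ? ComputeHash(sibling + current) : ComputeHash(current + sibling);
    return current == root;
}

// Placeholder evidence: five leaves, so the last one is promoted twice.
var evidenceLeaves = new[] { "sbom", "feed", "vex-1", "vex-2", "policy" }
    .Select(s => ComputeHash(s)).ToList();
var (proofRoot, proofPath) = BuildProof(evidenceLeaves, 4);

Console.WriteLine(VerifyProof(evidenceLeaves[4], proofPath, proofRoot));       // True
Console.WriteLine(VerifyProof(ComputeHash("tampered"), proofPath, proofRoot)); // False
Console.WriteLine(proofPath.Count);                                            // 1
```

A verifier holding only the policy lock hash and one sibling hash can confirm membership without seeing the SBOM or VEX documents, which a single flat hash over all evidence would not allow.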
## Verification
### Test Coverage
File: `src/__Tests/Determinism/CgsDeterminismTests.cs`
1. **Golden File Test**: Known evidence produces expected hash
2. **10-Iteration Stability**: Same input produces identical hash 10 times
3. **VEX Order Independence**: VEX document ordering doesn't affect hash
4. **Reachability Inclusion**: Reachability graph changes hash predictably
5. **Policy Lock Versioning**: Different policy versions produce different hashes
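
The stability, order-independence, and policy-versioning checks might look roughly like the sketch below. It is not the content of `CgsDeterminismTests.cs`: `ComputeCgs` is a simplified, hypothetical stand-in that takes plain strings instead of `EvidencePack` and `PolicyLock`, omits the optional reachability leaf, and uses placeholder inputs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

string ComputeHash(string input) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(input))).ToLowerInvariant();

string BuildMerkleRoot(List<string> leaves)
{
    if (leaves.Count == 0) return ComputeHash("");
    var level = leaves.ToList();
    while (level.Count > 1)
    {
        var next = new List<string>();
        for (int i = 0; i < level.Count; i += 2)
            next.Add(i + 1 < level.Count ? ComputeHash(level[i] + level[i + 1]) : level[i]);
        level = next;
    }
    return level[0];
}

// Hypothetical simplification of ComputeCgsHash: fixed leaf order, VEX sorted ordinally.
string ComputeCgs(string sbom, string feed, IEnumerable<string> vex, string policyLockJson)
{
    var leaves = new List<string> { ComputeHash(sbom), ComputeHash(feed) };
    leaves.AddRange(vex.OrderBy(v => v, StringComparer.Ordinal).Select(v => ComputeHash(v)));
    leaves.Add(ComputeHash(policyLockJson));
    return $"cgs:sha256:{BuildMerkleRoot(leaves)}";
}

var baseline = ComputeCgs("sbom", "feed", new[] { "vex-a", "vex-b" }, "{\"version\":1}");

// Stability: ten repeated runs over the same input agree with the baseline.
var stable = Enumerable.Range(0, 10)
    .All(_ => ComputeCgs("sbom", "feed", new[] { "vex-a", "vex-b" }, "{\"version\":1}") == baseline);

// Order independence: swapping the VEX documents leaves the hash unchanged.
var reordered = ComputeCgs("sbom", "feed", new[] { "vex-b", "vex-a" }, "{\"version\":1}");

// Policy versioning: a different policy lock changes the hash.
var bumped = ComputeCgs("sbom", "feed", new[] { "vex-a", "vex-b" }, "{\"version\":2}");

Console.WriteLine(stable);                // True
Console.WriteLine(reordered == baseline); // True
Console.WriteLine(bumped != baseline);    // True
```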
### Cross-Platform Verification
CI/CD Workflow: `.gitea/workflows/cross-platform-determinism.yml`
- ✅ Windows (MSVC/UCRT)
- ✅ macOS (BSD libc)
- ✅ Linux Ubuntu (glibc)
- ✅ Linux Alpine (musl libc)
- ✅ Linux Debian (glibc)
All platforms produce the same CGS hash for the same input.
## Migration
No migration required - this is a new feature.
## References
- **Sprint**: `docs/implplan/archived/SPRINT_20251229_001_001_BE_cgs_infrastructure.md`
- **Implementation**: `src/__Libraries/StellaOps.Verdict/VerdictBuilderService.cs`
- **Tests**: `src/__Tests/Determinism/CgsDeterminismTests.cs`
- **ProofChain**: `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/`
## Decision Date
2025-12-29
## Decision Makers
- Backend Team
- Security Team
- Attestation Team (consulted)
## Review Date
2026-06-29 (6 months) - Evaluate if code duplication warrants shared library