
ADR 0042: CGS Merkle Tree Implementation

Status

ACCEPTED (2025-12-29)

Context

The CGS (Canonical Graph Signature) system requires deterministic hash computation for verdicts. We need to decide whether to:

  1. Reuse existing StellaOps.Attestor.ProofChain Merkle tree builder
  2. Build a custom Merkle tree implementation in VerdictBuilderService

Requirements

  • Determinism: The same evidence must always produce an identical CGS hash
  • Order Independence: The order in which VEX documents are supplied must not affect the hash (they are sorted internally)
  • Cross-Platform: Identical hash on Windows, macOS, Linux (glibc), Linux (musl), and BSD
  • Leaf Composition: A fixed ordering of evidence components (SBOM, feed snapshot digest, sorted VEX, reachability, policy lock)

Existing ProofChain Merkle Builder

Located at: src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/

Pros:

  • Already implements Merkle tree construction
  • Tested and proven in production
  • Handles parent/child attestation chains

Cons:

  • Designed for attestation chains, not evidence hashing
  • Includes attestation-specific metadata in the hash
  • Doesn't support the custom leaf ordering required for CGS
  • Would require modifications that might break existing attestation behavior

Decision

Build a custom Merkle tree implementation in VerdictBuilderService.

Rationale

  1. Separation of Concerns: CGS hash computation has different requirements than attestation chain verification

  2. Full Control Over Determinism: Custom implementation allows:

    • Explicit leaf ordering: SBOM → Feed snapshot digest → VEX (sorted) → Reachability (if present) → PolicyLock
    • VEX documents sorted by ordinal comparison of their canonical JSON (not insertion order)
    • Culture-independent, ordinal string comparison (StringComparer.Ordinal)
  3. Simplicity: ~50 lines of code vs modifying 500+ lines in ProofChain

  4. No Breaking Changes: Doesn't affect existing attestation infrastructure
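The ordering rules in item 2 can be illustrated with a short C# sketch. OrdinalOrderingDemo and SortOrdinal are hypothetical names used only for this example; the production code performs the same OrderBy(..., StringComparer.Ordinal) call inline in ComputeCgsHash.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of VerdictBuilderService) that isolates the
// ordering step ComputeCgsHash applies to VEX documents.
public static class OrdinalOrderingDemo
{
    // StringComparer.Ordinal compares UTF-16 code units one by one, so the
    // result does not depend on the current culture or on which globalization
    // library (ICU, NLS, or invariant mode) the runtime has loaded.
    public static List<string> SortOrdinal(IEnumerable<string> docs) =>
        docs.OrderBy(d => d, StringComparer.Ordinal).ToList();
}
```

For example, SortOrdinal(new[] { "b", "a", "Z" }) returns Z, a, b, because 'Z' (U+005A) has a lower code unit than 'a' (U+0061). A linguistic comparer would normally place "Z" last, and its exact rules can vary with the globalization library in use, which is why ordinal comparison is preferred for anything that feeds a hash.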

Implementation

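// VerdictBuilderService.cs (excerpt: members of VerdictBuilderService)
// File-level usings required by the members below
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
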
private static string ComputeCgsHash(EvidencePack evidence, PolicyLock policyLock)
{
    // Leaves are appended in a fixed order (SBOM, feed snapshot, VEX, reachability, policy lock);
    // only the VEX documents themselves are sorted
    var leaves = new List<string>
    {
        ComputeHash(evidence.SbomCanonJson),
        ComputeHash(evidence.FeedSnapshotDigest)
    };

    // Add one leaf per VEX document, ordered by ordinal comparison of the canonical JSON
    // so that input order cannot change the hash (ORDER-CRITICAL for determinism!)
    foreach (var vex in evidence.VexCanonJson.OrderBy(v => v, StringComparer.Ordinal))
    {
        leaves.Add(ComputeHash(vex));
    }

    // Add reachability if present
    if (!string.IsNullOrEmpty(evidence.ReachabilityGraphJson))
    {
        leaves.Add(ComputeHash(evidence.ReachabilityGraphJson));
    }

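    // Add policy lock hash. CanonicalJsonOptions is defined elsewhere in the class (not shown);
    // its serializer output must be byte-identical on every platform.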
    var policyLockJson = JsonSerializer.Serialize(policyLock, CanonicalJsonOptions);
    leaves.Add(ComputeHash(policyLockJson));

    // Build Merkle root
    var merkleRoot = BuildMerkleRoot(leaves);
    return $"cgs:sha256:{merkleRoot}";
}

private static string BuildMerkleRoot(List<string> leaves)
{
    if (leaves.Count == 0)
        return ComputeHash("");

    if (leaves.Count == 1)
        return leaves[0];

    var level = leaves.ToList();

    while (level.Count > 1)
    {
        var nextLevel = new List<string>();

        for (int i = 0; i < level.Count; i += 2)
        {
            if (i + 1 < level.Count)
            {
                // Combine two hashes: parent = SHA-256 of the concatenated lowercase hex strings
                var combined = level[i] + level[i + 1];
                nextLevel.Add(ComputeHash(combined));
            }
            else
            {
                // Odd number of nodes: promote the last one unchanged (it is not duplicated)
                nextLevel.Add(level[i]);
            }
        }

        level = nextLevel;
    }

    return level[0];
}

// SHA-256 over the UTF-8 bytes of input, returned as lowercase hex
private static string ComputeHash(string input)
{
    var bytes = Encoding.UTF8.GetBytes(input);
    var hashBytes = SHA256.HashData(bytes);
    return Convert.ToHexString(hashBytes).ToLowerInvariant();
}
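The sketch below reproduces BuildMerkleRoot and ComputeHash as a standalone class so the tree shape can be checked by hand; MerkleShapeDemo is a hypothetical name. With five leaves L0..L4, the pairing and promotion rules give root = H(H(H(L0 + L1) + H(L2 + L3)) + L4), where + concatenates lowercase hex strings.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical standalone copy of the hashing scheme shown above.
public static class MerkleShapeDemo
{
    // SHA-256 over UTF-8 bytes, lowercase hex (same as ComputeHash).
    public static string H(string input) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(input))).ToLowerInvariant();

    public static string Root(IReadOnlyList<string> leaves)
    {
        if (leaves.Count == 0) return H("");
        var level = new List<string>(leaves);
        while (level.Count > 1)
        {
            var next = new List<string>();
            for (int i = 0; i < level.Count; i += 2)
            {
                // Pair adjacent nodes; an unpaired last node is promoted unchanged.
                next.Add(i + 1 < level.Count ? H(level[i] + level[i + 1]) : level[i]);
            }
            level = next;
        }
        return level[0];
    }
}
```

Promoting the unpaired node, instead of duplicating it as Bitcoin's Merkle tree does, avoids the ambiguity in which leaf lists [A, B, C] and [A, B, C, C] share a root. Leaf and interior nodes use the same hash with no prefix byte; schemes such as RFC 6962 add 0x00/0x01 prefixes to keep the two kinds of node distinct.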

Consequences

Positive

  • Full control over CGS hash computation logic
  • No risk of breaking existing attestation chains
  • Simple, testable implementation (~50 lines)
  • Explicit ordering guarantees determinism
  • Cross-platform verified (Windows, macOS, Linux, Alpine, Debian)

Negative

  • ⚠️ Code duplication with ProofChain (minimal, since the use cases differ)
  • ⚠️ Need to maintain separate Merkle tree implementation (low maintenance burden)

Neutral

  • 📝 Custom implementation documented in tests (CgsDeterminismTests.cs)
  • 📝 Future: Could extract shared Merkle tree primitives if needed

Alternatives Considered

Alternative 1: Modify ProofChain Builder

Rejected because:

  • Would require adding configuration options to ProofChain
  • Risk of breaking existing attestation behavior
  • Increased complexity for both use cases
  • Tight coupling between verdict and attestation systems

Alternative 2: Use Third-Party Merkle Tree Library

Rejected because:

  • External dependency for ~50 lines of code
  • Less control over ordering and hash format
  • Potential platform-specific issues
  • Security review overhead

Alternative 3: Single-Level Hash (No Merkle Tree)

Rejected because:

  • Loses incremental verification capability
  • Can't prove individual evidence components without full evidence pack
  • Less efficient for large evidence packs (can't skip unchanged components)
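The first two points rest on the fact that a Merkle root can be checked against a single leaf plus about log2(n) sibling hashes. The ADR's BuildMerkleRoot returns only the root, so the following is a hypothetical sketch (InclusionProofSketch, Prove, and RootFromProof are illustrative names) of an audit path that follows the same pairing and promotion rules.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch; not an existing StellaOps API.
public static class InclusionProofSketch
{
    public static string H(string s) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(s))).ToLowerInvariant();

    // Collects (sibling hash, sibling-is-left) per level. A level where the node
    // is promoted without a sibling contributes no step.
    public static List<(string Sibling, bool SiblingIsLeft)> Prove(IReadOnlyList<string> leaves, int index)
    {
        if (index < 0 || index >= leaves.Count) throw new ArgumentOutOfRangeException(nameof(index));
        var proof = new List<(string Sibling, bool SiblingIsLeft)>();
        var level = new List<string>(leaves);
        while (level.Count > 1)
        {
            int sibling = index % 2 == 0 ? index + 1 : index - 1;
            if (sibling < level.Count)
                proof.Add((level[sibling], sibling < index));
            // Build the next level exactly as BuildMerkleRoot does.
            var next = new List<string>();
            for (int i = 0; i < level.Count; i += 2)
                next.Add(i + 1 < level.Count ? H(level[i] + level[i + 1]) : level[i]);
            level = next;
            index /= 2;
        }
        return proof;
    }

    // Recomputes the root from one leaf and its audit path.
    public static string RootFromProof(string leaf, IEnumerable<(string Sibling, bool SiblingIsLeft)> proof)
    {
        var node = leaf;
        foreach (var (sibling, left) in proof)
            node = left ? H(sibling + node) : H(node + sibling);
        return node;
    }
}
```

A verifier holding only the CGS root and, say, the policy lock leaf could confirm that leaf without seeing the SBOM or VEX content; a single flat hash over all evidence offers no equivalent.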

Verification

Test Coverage

File: src/__Tests/Determinism/CgsDeterminismTests.cs

  1. Golden File Test: Known evidence produces expected hash
  2. 10-Iteration Stability: Same input produces identical hash 10 times
  3. VEX Order Independence: VEX document ordering doesn't affect hash
  4. Reachability Inclusion: Reachability graph changes hash predictably
  5. Policy Lock Versioning: Different policy versions produce different hashes
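Checks 2 and 3 can be expressed without a test framework, as in this hypothetical sketch. HashWithVex is a reduced stand-in for ComputeCgsHash that keeps only the SBOM and VEX leaves; the real tests in CgsDeterminismTests.cs may be structured differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical, dependency-free versions of checks 2 and 3.
public static class DeterminismChecks
{
    static string H(string s) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(s))).ToLowerInvariant();

    // Same pairing/promotion rule as BuildMerkleRoot (callers pass at least one leaf).
    static string Root(List<string> level)
    {
        while (level.Count > 1)
        {
            var next = new List<string>();
            for (int i = 0; i < level.Count; i += 2)
                next.Add(i + 1 < level.Count ? H(level[i] + level[i + 1]) : level[i]);
            level = next;
        }
        return level[0];
    }

    public static string HashWithVex(string sbomJson, IEnumerable<string> vexJson)
    {
        var leaves = new List<string> { H(sbomJson) };
        leaves.AddRange(vexJson.OrderBy(v => v, StringComparer.Ordinal).Select(v => H(v)));
        return Root(leaves);
    }

    // Check 2: the same input hashes identically on every iteration.
    public static bool IsStable(Func<string> compute, int iterations = 10)
    {
        string first = compute();
        for (int i = 1; i < iterations; i++)
            if (compute() != first) return false;
        return true;
    }

    // Check 3: every rotation and the reversal of the VEX list yield the same hash.
    public static bool IsVexOrderIndependent(string sbomJson, IReadOnlyList<string> vexJson)
    {
        string baseline = HashWithVex(sbomJson, vexJson);
        for (int k = 1; k < vexJson.Count; k++)
            if (HashWithVex(sbomJson, vexJson.Skip(k).Concat(vexJson.Take(k))) != baseline) return false;
        return HashWithVex(sbomJson, Enumerable.Reverse(vexJson)) == baseline;
    }
}
```

Enumerable.Reverse is called explicitly so that the LINQ overload is used regardless of which Reverse extension methods are in scope.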

Cross-Platform Verification

CI/CD Workflow: .gitea/workflows/cross-platform-determinism.yml

  • Windows (Microsoft UCRT)
  • macOS (BSD libc)
  • Linux Ubuntu (glibc)
  • Linux Alpine (musl libc)
  • Linux Debian (glibc)

All platforms produce an identical CGS hash for the same input.
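A cheap complement to comparing full CGS hashes across runners is to push published SHA-256 test vectors through the same UTF-8 + SHA-256 + lowercase-hex pipeline, which catches encoding or hex-casing drift early. This is a hypothetical sketch (GoldenVectorCheck is an illustrative name), not a description of what cross-platform-determinism.yml runs.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical golden-vector check mirroring ComputeHash.
public static class GoldenVectorCheck
{
    // UTF-8 bytes (Encoding.UTF8.GetBytes emits no BOM), SHA-256, lowercase hex.
    public static string ComputeHash(string input) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(input))).ToLowerInvariant();

    // Well-known SHA-256 vectors: the empty string and "abc" (FIPS 180 example).
    static readonly (string Input, string Expected)[] Vectors =
    {
        ("", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),
        ("abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"),
    };

    // True when every vector hashes to its expected lowercase hex digest.
    public static bool AllVectorsMatch()
    {
        foreach (var (input, expected) in Vectors)
            if (ComputeHash(input) != expected) return false;
        return true;
    }
}
```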

Migration

No migration required - this is a new feature.

References

  • Sprint: docs/implplan/archived/SPRINT_20251229_001_001_BE_cgs_infrastructure.md
  • Implementation: src/__Libraries/StellaOps.Verdict/VerdictBuilderService.cs
  • Tests: src/__Tests/Determinism/CgsDeterminismTests.cs
  • ProofChain: src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/

Decision Date

2025-12-29

Decision Makers

  • Backend Team
  • Security Team
  • Attestation Team (consulted)

Review Date

2026-06-29 (6 months): Evaluate whether the code duplication warrants a shared library