# Golden File Establishment Guide

## Overview

Golden files are baseline reference values that verify deterministic behavior remains stable over time. This guide explains how to establish, verify, and maintain golden hashes for CGS (Canonical Graph Signature) and other deterministic subsystems.

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Initial Baseline Setup](#initial-baseline-setup)
3. [Cross-Platform Verification](#cross-platform-verification)
4. [Golden Hash Maintenance](#golden-hash-maintenance)
5. [Troubleshooting](#troubleshooting)
6. [Breaking Change Process](#breaking-change-process)

## Prerequisites

### Local Environment

- .NET 10 SDK (10.0.100 or later)
- Git access to the repository
- Write access to CI/CD workflows

### CI/CD Environment

- Gitea Actions enabled
- Cross-platform runners configured:
  - Windows (windows-latest)
  - macOS (macos-latest)
  - Linux (ubuntu-latest)
  - Alpine (mcr.microsoft.com/dotnet/sdk:10.0-alpine)
  - Debian (mcr.microsoft.com/dotnet/sdk:10.0-bookworm-slim)

## Initial Baseline Setup

### Step 1: Run Tests Locally

```bash
cd src/__Tests/Determinism

# Run CGS determinism tests
dotnet test --filter "Category=Determinism" --logger "console;verbosity=detailed"
```

**Expected Output:**

```
Test Name: CgsHash_WithKnownEvidence_MatchesGoldenHash
Outcome: Passed
Duration: 87ms
Standard Output Messages:
Computed CGS: cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
Golden CGS: cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
```

### Step 2: Verify Hash Format

The computed CGS hash should match this format:

- Prefix: `cgs:sha256:`
- Hash: 64 hexadecimal characters (lowercase)
- Total length: 75 characters

**Example:**

```
cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
|-------| |---------------------------------------------------------------|
 prefix                             64 hex chars
```

### Step 3: Run 10-Iteration Stability Test

```bash
# Run 10 times to verify determinism
for i in {1..10}; do
  echo "=== Iteration $i ==="
  dotnet test \
    --filter "FullyQualifiedName~CgsHash_SameInput_ProducesIdenticalHash_Across10Iterations" \
    --logger "console;verbosity=minimal"
done
```

**Expected Result:** All 10 iterations should pass.

If any iteration fails with:

```
Expected hashes.Distinct() to have count 1, but found 2 or more.
```

this indicates non-deterministic behavior. **DO NOT proceed** until determinism is fixed.

### Step 4: Verify VEX Order Independence

```bash
dotnet test --filter "FullyQualifiedName~CgsHash_VexOrderIndependent_ProducesIdenticalHash"
```

This test creates evidence packs with VEX documents in different orders (1-2-3, 3-1-2, 2-3-1) and verifies that all of them produce an identical hash.

**Expected Output:**

```
Test Passed
VEX order-independent CGS: cgs:sha256:...
```

### Step 5: Document Baseline

Create a baseline record:

```bash
cat > docs/testing/baselines/cgs-golden-hash-$(date +%Y%m%d).md <
```

[...]

```csharp
l, StringComparer.Ordinal).ToList();
```

2. **Explicit UTF-8 Encoding:**
   ```csharp
   // ❌ Wrong (platform default encoding)
   var bytes = Encoding.Default.GetBytes(input);

   // ✅ Correct (explicit UTF-8)
   var bytes = Encoding.UTF8.GetBytes(input);
   ```

3. **Invariant Culture for Numbers:**
   ```csharp
   // ❌ Wrong (implicit default options)
   var json = JsonSerializer.Serialize(data);

   // ✅ Correct (explicit, pinned options)
   var json = JsonSerializer.Serialize(data, new JsonSerializerOptions
   {
       PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
       WriteIndented = false,
       // Pins escaping behavior; System.Text.Json already writes numbers culture-invariantly
       Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
   });
   ```

### Divergence on Windows

**Symptom:**

```
Windows:     cgs:sha256:abc123...
macOS/Linux: cgs:sha256:def456...
```

**Likely Causes:**

1. Path separator differences (`\` vs `/`)
2. Line ending differences (CRLF vs LF)
3. Case sensitivity in file paths

**Solutions:**

1. **Use Path.Combine:**
   ```csharp
   // ❌ Wrong (hardcoded separator)
   var path = "dir\\file.txt";

   // ✅ Correct (cross-platform)
   var path = Path.Combine("dir", "file.txt");
   ```

2. **Normalize Line Endings:**
   ```csharp
   // ❌ Wrong (platform line endings)
   var text = File.ReadAllText(path);

   // ✅ Correct (normalized to \n)
   var text = File.ReadAllText(path).Replace("\r\n", "\n");
   ```

### Golden Hash Changes After .NET Upgrade

**Symptom:** After upgrading from .NET 10.0.100 to 10.0.101:

```
Expected: cgs:sha256:abc123...
Actual:   cgs:sha256:def456...
```

**Investigation:**

1. **Check .NET Version:**
   ```bash
   dotnet --version
   # Should be consistent across platforms
   ```

2. **Check JsonSerializer Behavior:**
   ```csharp
   // Test JSON serialization consistency
   var test = new { name = "test", value = 123 };
   var json1 = JsonSerializer.Serialize(test, CanonicalJsonOptions);
   var json2 = JsonSerializer.Serialize(test, CanonicalJsonOptions);
   Assert.Equal(json1, json2);
   ```

3. **Check Hash Algorithm:**
   ```csharp
   // Verify SHA256 produces expected output
   var input = "test";
   var hash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(input))).ToLowerInvariant();
   // Should be: 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
   ```

**Resolution:**

- If the .NET change is intentional and breaking, follow the [Breaking Change Process](#breaking-change-process)
- If the .NET change is unintentional, file a bug with the .NET team

## Breaking Change Process

### When Golden Hash MUST Change

Golden hash changes are **breaking changes** that affect audit trail integrity. Only change the golden hash for:

1. **Security Fixes**: Vulnerability in hash computation
2. **Correctness Bugs**: Hash not deterministic or incorrect
3. **Platform Incompatibility**: Hash diverges across platforms

### Change Process

#### Step 1: Document in ADR

Create `docs/adr/NNNN-cgs-hash-algorithm-change.md`:

```markdown
# ADR NNNN: CGS Hash Algorithm Change

## Status
ACCEPTED (2025-03-15)

## Context
The current CGS hash computation has a bug in VEX document ordering that causes
non-deterministic results when VEX documents have identical timestamps.

## Decision
Update VerdictBuilderService to sort VEX documents by
(timestamp, cve_id, issuer_id) instead of just (timestamp).

## Consequences

### Breaking Changes
- Golden hash will change from `cgs:sha256:abc123...` to `cgs:sha256:def456...`
- All historical verdicts computed before 2025-03-15 will have old hash format
- Audit trail verification requires dual-algorithm support during transition

### Migration Strategy
1. Deploy dual-algorithm support (v1 and v2 hash computation)
2. Recompute all verdicts created after 2025-02-01 with v2 algorithm
3. Store both v1 and v2 hashes for 90-day transition period
4. Deprecate v1 algorithm on 2025-06-15

### Testing
- Verify v2 hash is deterministic across all platforms
- Verify v1 verdicts can still be verified during transition
- Load test recomputation of 1M+ verdicts
```

#### Step 2: Implement Dual-Algorithm Support

```csharp
public enum CgsHashVersion
{
    V1 = 1, // Original algorithm (superseded 2025-03-15)
    V2 = 2  // Fixed VEX ordering (current)
}

public string ComputeCgsHash(EvidencePack evidence, PolicyLock policyLock, CgsHashVersion version = CgsHashVersion.V2)
{
    return version switch
    {
        CgsHashVersion.V1 => ComputeCgsHashV1(evidence, policyLock),
        CgsHashVersion.V2 => ComputeCgsHashV2(evidence, policyLock),
        _ => throw new ArgumentException($"Unsupported CGS hash version: {version}")
    };
}
```

#### Step 3: Update Tests with Both Versions

```csharp
[Theory]
[InlineData(CgsHashVersion.V1, "cgs:sha256:abc123...")] // Old golden hash
[InlineData(CgsHashVersion.V2, "cgs:sha256:def456...")] // New golden hash
public async Task CgsHash_WithKnownEvidence_MatchesGoldenHash_BothVersions(
    CgsHashVersion version,
    string expectedHash)
{
    // Test both algorithms during transition period
    var evidence = CreateKnownEvidencePack();
    var policyLock = CreateKnownPolicyLock();
    var service = CreateVerdictBuilder();

    var result = await service.BuildAsync(evidence, policyLock, version, CancellationToken.None);

    result.CgsHash.Should().Be(expectedHash);
}
```

#### Step 4: Create Migration Script

```csharp
// tools/migrate-cgs-hashes.cs
public class CgsHashMigrator
{
    public async Task MigrateVerdicts(DateTimeOffset since)
    {
        var verdicts = await _repository.GetVerdictsSince(since);

        foreach (var verdict in verdicts)
        {
            // Recompute with V2 algorithm
            var newHash = ComputeCgsHashV2(verdict.Evidence, verdict.PolicyLock);

            // Store both hashes during transition
            await _repository.UpdateVerdict(verdict.Id, new
            {
                CgsHashV1 = verdict.CgsHash,
                CgsHashV2 = newHash,
                MigratedAt = DateTimeOffset.UtcNow
            });
        }
    }
}
```

#### Step 5: Coordinate Deployment

**Timeline:**

- Week 1: Deploy dual-algorithm support to staging
- Week 2: Run migration script on staging data
- Week 3: Verify all verdicts have both v1 and v2 hashes
- Week 4: Deploy to production
- Week 5-16: 90-day transition period (both algorithms supported)
- Week 17: Deprecate v1, remove from codebase

#### Step 6: Update Golden Hash

After successful migration:

```csharp
// src/__Tests/Determinism/CgsDeterminismTests.cs
[Fact]
public async Task CgsHash_WithKnownEvidence_MatchesGoldenHash()
{
    // Arrange
    var evidence = CreateKnownEvidencePack();
    var policyLock = CreateKnownPolicyLock();
    var service = CreateVerdictBuilder();

    // Act
    var result = await service.BuildAsync(evidence, policyLock, CancellationToken.None);

    // Assert - Updated golden hash (2025-03-15)
    var goldenHash = "cgs:sha256:def456..."; // V2 algorithm
    result.CgsHash.Should().Be(goldenHash, "CGS hash must match golden file (V2 algorithm)");
}
```

#### Step 7: Document in Changelog

```markdown
## CHANGELOG

### [2.0.0] - 2025-03-15 - BREAKING CHANGE

#### Changed
- **CGS Hash Algorithm**: Fixed VEX ordering bug (#1234)
  - Old: `cgs:sha256:abc123...`
  - New: `cgs:sha256:def456...`
  - Migration: All verdicts after 2025-02-01 recomputed
  - Dual-algorithm support: 90 days (until 2025-06-15)

#### Migration Guide
See: `docs/migrations/cgs-hash-v2-migration.md`

#### ADR
See: `docs/adr/0042-cgs-hash-algorithm-change.md`
```

## Best Practices

### 1. Never Change Golden Hash Without ADR

Every golden hash change MUST have an ADR documenting:

- Why the change is necessary
- Impact on historical data
- Migration strategy
- Testing plan

### 2. Always Support Dual Algorithms During Transition

For 90 days after a change, support both the old and new algorithms to avoid breaking existing integrations.

### 3. Run Cross-Platform Tests Before Merge

Never merge golden hash changes without verifying that all 5 platforms produce identical results.

### 4. Version Golden Hashes in Baseline Files

Maintain a historical record:

```
docs/testing/baselines/
├── cgs-golden-hash-20250101-v1.md   # Original
└── cgs-golden-hash-20250315-v2.md   # Updated
```

### 5. Automate Monitoring

Set up daily cross-platform runs to detect drift early:

```yaml
# .gitea/workflows/golden-hash-monitor.yml
on:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight UTC
```

## References

- **Sprint Documentation**: `docs/implplan/archived/SPRINT_20251229_001_001_BE_cgs_infrastructure.md`
- **Test README**: `src/__Tests/Determinism/README.md`
- **CI/CD Workflow**: `.gitea/workflows/cross-platform-determinism.yml`
- **Batch Summary**: `docs/implplan/archived/2025-12-29-completed-sprints/BATCH_20251229_BE_COMPLETION_SUMMARY.md`

## Support

For questions or issues:

- Create an issue with the labels `determinism`, `golden-file`
- Priority: Critical (affects audit trail integrity)
- Slack: #determinism-testing
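
## Appendix: Hash Format Check

The format rules from Step 2 (prefix `cgs:sha256:`, 64 lowercase hexadecimal characters, 75 characters in total) can be checked mechanically before a hash is written to a baseline file. The following is a minimal sketch, not part of the repository: the function name `is_valid_cgs_hash` is hypothetical, and the script assumes only a POSIX shell and `grep`.

```bash
#!/bin/sh
# Hypothetical helper (an assumption, not an existing repository script).
# Succeeds only if the argument is "cgs:sha256:" followed by exactly
# 64 lowercase hexadecimal characters.
is_valid_cgs_hash() {
  # Total length must be 75: 11-character prefix + 64 hex characters.
  [ "${#1}" -eq 75 ] || return 1
  # LC_ALL=C keeps the [a-f] range from matching uppercase letters in some locales.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^cgs:sha256:[0-9a-f]{64}$'
}

# Usage: check the example hash from Step 2.
sample="cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3"
if is_valid_cgs_hash "$sample"; then
  echo "format OK"
else
  echo "format INVALID" >&2
fi
```

A check like this only confirms the shape of the value. It says nothing about whether the hash is the correct golden hash, which is still established by the determinism tests.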