git.stella-ops.org/docs-archived/implplan/2025-12-29-completed-sprints/GOLDEN_FILE_ESTABLISHMENT_GUIDE.md
2026-01-05 16:02:11 +02:00

Golden File Establishment Guide

Overview

Golden files are baseline reference values that verify deterministic behavior remains stable over time. This guide explains how to establish, verify, and maintain golden hashes for CGS (Canonical Graph Signature) and other deterministic subsystems.

Table of Contents

  1. Prerequisites
  2. Initial Baseline Setup
  3. Cross-Platform Verification
  4. Golden Hash Maintenance
  5. Troubleshooting
  6. Breaking Change Process
  7. Best Practices
  8. References
  9. Support

Prerequisites

Local Environment

  • .NET 10 SDK (10.0.100 or later)
  • Git access to repository
  • Write access to CI/CD workflows

CI/CD Environment

  • Gitea Actions enabled
  • Cross-platform runners configured:
    • Windows (windows-latest)
    • macOS (macos-latest)
    • Linux (ubuntu-latest)
    • Alpine (mcr.microsoft.com/dotnet/sdk:10.0-alpine)
    • Debian (mcr.microsoft.com/dotnet/sdk:10.0-bookworm-slim)

Initial Baseline Setup

Step 1: Run Tests Locally

cd src/__Tests/Determinism

# Run CGS determinism tests
dotnet test --filter "Category=Determinism" --logger "console;verbosity=detailed"

Expected Output:

Test Name: CgsHash_WithKnownEvidence_MatchesGoldenHash
Outcome: Passed
Duration: 87ms

Standard Output Messages:
Computed CGS: cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
Golden CGS:   cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3

Step 2: Verify Hash Format

The computed CGS hash should match this format:

  • Prefix: cgs:sha256:
  • Hash: 64 hexadecimal characters (lowercase)
  • Total length: 75 characters

Example:

cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
|-------| |---------------------------------------------------------------|
 prefix                           64 hex chars
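
The format rules above can be checked mechanically. The following is a minimal sketch in shell, assuming a POSIX-style environment with grep; the function name is_valid_cgs_hash is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): succeeds only when $1 is
# "cgs:sha256:" followed by exactly 64 lowercase hexadecimal characters.
# LC_ALL=C keeps the [0-9a-f] range independent of the machine's locale.
is_valid_cgs_hash() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^cgs:sha256:[0-9a-f]{64}$'
}

# Usage: check a hash copied from the test output.
hash="cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3"
if is_valid_cgs_hash "$hash"; then
  echo "format OK (${#hash} characters)"
else
  echo "format INVALID" >&2
fi
```

Anchoring the pattern with ^ and $ rejects hashes with a missing prefix, uppercase digits, or a truncated digest.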

Step 3: Run 10-Iteration Stability Test

# Run 10 times to verify determinism
for i in {1..10}; do
  echo "=== Iteration $i ==="
  dotnet test \
    --filter "FullyQualifiedName~CgsHash_SameInput_ProducesIdenticalHash_Across10Iterations" \
    --logger "console;verbosity=minimal"
done

Expected Result: All 10 iterations should pass.

If any iteration fails with:

Expected hashes.Distinct() to have count 1, but found 2 or more.

This indicates non-deterministic behavior. DO NOT proceed until determinism is fixed.
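
The "count of distinct hashes must be 1" check that the test performs can also be approximated outside the test suite. This is only a sketch: assert_stable_output is a hypothetical helper, sha256sum is assumed to be available (GNU coreutils or BusyBox), and the printf command stands in for whatever command prints the CGS hash in your setup.

```shell
# Hypothetical helper: run a command N times and succeed only if every run
# produced byte-identical output (exactly one distinct digest).
assert_stable_output() {
  local runs="$1" distinct
  shift
  distinct=$(
    i=0
    while [ "$i" -lt "$runs" ]; do
      "$@" | sha256sum          # one digest line per run
      i=$((i + 1))
    done | sort -u | wc -l
  )
  [ "$distinct" -eq 1 ]
}

# Usage with a stand-in command; replace it with the command that prints the hash.
if assert_stable_output 10 printf 'cgs:sha256:example\n'; then
  echo "stable across 10 runs"
else
  echo "non-deterministic output detected" >&2
fi
```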

Step 4: Verify VEX Order Independence

dotnet test --filter "FullyQualifiedName~CgsHash_VexOrderIndependent_ProducesIdenticalHash"

This test creates evidence packs with VEX documents in different orders (1-2-3, 3-1-2, 2-3-1) and verifies that all of them produce an identical hash.

Expected Output:

Test Passed
VEX order-independent CGS: cgs:sha256:...
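
One common way to obtain order independence is to sort the inputs into a canonical order before hashing. The sketch below illustrates that idea only; it is not the CGS algorithm, the function name order_independent_hash is hypothetical, and sha256sum is assumed to be available.

```shell
# Hypothetical sketch: hash a set of documents independently of argument order
# by sorting them bytewise (LC_ALL=C) before hashing.
order_independent_hash() {
  printf '%s\n' "$@" | LC_ALL=C sort | sha256sum | cut -d' ' -f1
}

# The three orderings used by the test above, with stand-in document names.
a=$(order_independent_hash vex-1 vex-2 vex-3)
b=$(order_independent_hash vex-3 vex-1 vex-2)
c=$(order_independent_hash vex-2 vex-3 vex-1)
[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "order-independent"
```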

Step 5: Document Baseline

Create a baseline record:

mkdir -p docs/testing/baselines
cat > docs/testing/baselines/cgs-golden-hash-$(date +%Y%m%d).md <<EOF
# CGS Golden Hash Baseline - $(date +%Y-%m-%d)

## Environment
- .NET Version: $(dotnet --version)
- Platform: $(uname -s)
- Machine: $(uname -m)

## Golden Hash
\`\`\`
cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
\`\`\`

## Verification
- 10-iteration stability: ✅ PASS
- VEX order independence: ✅ PASS
- Empty evidence test: ✅ PASS

## Evidence Pack
\`\`\`json
{
  "sbomCanonJson": "{\"spdxVersion\":\"SPDX-3.0.1\",\"name\":\"test-sbom\"}",
  "vexCanonJson": ["{\"id\":\"vex-1\",\"cve\":\"CVE-2024-0001\",\"status\":\"not_affected\"}"],
  "reachabilityGraphJson": null,
  "feedSnapshotDigest": "sha256:0000000000000000000000000000000000000000000000000000000000000001"
}
\`\`\`

## Policy Lock
\`\`\`json
{
  "schemaVersion": "1.0",
  "policyVersion": "1.0.0",
  "ruleHashes": {
    "rule-001": "sha256:aaaa",
    "rule-002": "sha256:bbbb"
  },
  "engineVersion": "1.0.0",
  "generatedAt": "2025-01-01T00:00:00Z"
}
\`\`\`

## Established By
- Name: [Your Name]
- Date: $(date +%Y-%m-%d)
- Commit: $(git rev-parse --short HEAD)
EOF

Cross-Platform Verification

Step 1: Push to Feature Branch

git checkout -b feature/establish-golden-hash
git add src/__Tests/Determinism/CgsDeterminismTests.cs
git add docs/testing/baselines/
git commit -m "chore: establish CGS golden hash baseline

- Verified 10-iteration stability locally
- Verified VEX order independence
- Ready for cross-platform verification"
git push origin feature/establish-golden-hash

Step 2: Create Pull Request

Create PR with description:

## Golden Hash Baseline Establishment

This PR establishes the golden hash baseline for CGS determinism testing.

### Local Verification ✅
- [x] 10-iteration stability test (all identical)
- [x] VEX order independence test
- [x] Empty evidence test
- [x] Policy lock version test

### Expected CI/CD Verification
- [ ] Windows: golden hash matches
- [ ] macOS: golden hash matches
- [ ] Linux (Ubuntu): golden hash matches
- [ ] Linux (Alpine, musl libc): golden hash matches
- [ ] Linux (Debian): golden hash matches

### Golden Hash

cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3


### References
- Baseline documentation: `docs/testing/baselines/cgs-golden-hash-20251229.md`
- Sprint: `docs/implplan/archived/SPRINT_20251229_001_001_BE_cgs_infrastructure.md`

Step 3: Monitor CI/CD Pipeline

Watch for cross-platform determinism workflow: .gitea/workflows/cross-platform-determinism.yml

Expected Workflow Jobs:

  1. determinism-windows
  2. determinism-macos
  3. determinism-linux
  4. determinism-alpine
  5. determinism-debian
  6. compare-hashes

Step 4: Review Hash Comparison Report

After all platform tests complete, the compare-hashes job generates a report:

Successful Output:

{
  "divergences": [],
  "platforms": {
    "windows": "cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3",
    "macos": "cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3",
    "linux": "cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3",
    "alpine": "cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3",
    "debian": "cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3"
  },
  "status": "SUCCESS",
  "message": "All hashes match across platforms."
}

Divergence Detected (FAILURE):

{
  "divergences": [
    {
      "key": "cgs_hash",
      "linux": "cgs:sha256:abc123...",
      "alpine": "cgs:sha256:def456...",
      "windows": "cgs:sha256:abc123...",
      "macos": "cgs:sha256:abc123...",
      "debian": "cgs:sha256:abc123..."
    }
  ],
  "status": "FAILURE",
  "message": "Hash divergence detected on Alpine platform (musl libc)"
}

If divergences are detected, DO NOT merge. See Troubleshooting.
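
The comparison itself is simple enough to reproduce by hand when debugging a report. The following sketch assumes the per-platform results are available as plain "platform hash" lines; that input format and the function name compare_platform_hashes are assumptions for illustration, not the actual implementation of the compare-hashes job.

```shell
# Hypothetical helper: read "platform hash" pairs on stdin and fail if
# anything other than exactly one distinct hash is present.
compare_platform_hashes() {
  local pairs distinct
  pairs=$(cat)
  distinct=$(printf '%s\n' "$pairs" | awk 'NF == 2 { print $2 }' | sort -u | wc -l)
  if [ "$distinct" -eq 1 ]; then
    echo "SUCCESS: all hashes match across platforms"
  else
    echo "FAILURE: hash divergence detected"
    printf '%s\n' "$pairs"
    return 1
  fi
}

# Usage with stand-in values.
printf '%s\n' \
  "linux  cgs:sha256:abc123" \
  "alpine cgs:sha256:abc123" \
  "debian cgs:sha256:abc123" | compare_platform_hashes
```

Empty input counts as a failure, so a run in which no platform reported a hash cannot pass silently.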

Step 5: Uncomment Golden Hash Assertion

After successful cross-platform verification:

# Edit CgsDeterminismTests.cs
vi src/__Tests/Determinism/CgsDeterminismTests.cs

Lines 68-69: Uncomment the assertion:

// Before:
// Uncomment when golden hash is established:
// result.CgsHash.Should().Be(goldenHash, "CGS hash must match golden file");

// After:
// Golden hash established 2025-12-29 (all platforms verified)
result.CgsHash.Should().Be(goldenHash, "CGS hash must match golden file");

Commit:

git add src/__Tests/Determinism/CgsDeterminismTests.cs
git commit -m "test: enable golden hash assertion

All platforms verified:
- Windows: ✅
- macOS: ✅
- Linux (Ubuntu): ✅
- Linux (Alpine): ✅
- Linux (Debian): ✅

Golden hash locked: cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3"
git push origin feature/establish-golden-hash

Step 6: Merge to Main

After PR approval and final CI/CD run:

git checkout main
git merge feature/establish-golden-hash
git push origin main

Golden Hash Maintenance

Regular Verification

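Run cross-platform tests at least weekly to detect drift:

# Trigger manual workflow dispatch
# (gh is the GitHub CLI; on a Gitea instance, start the workflow from the repository's Actions page instead)
gh workflow run cross-platform-determinism.yml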

Monitoring

Set up alerts for:

  • Hash divergence detected
  • Golden hash test failures
  • Cross-platform workflow failures

Slack/Email Alert Example:

⚠️ CGS Golden Hash Failure
Platform: Alpine (musl libc)
Expected: cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
Actual: cgs:sha256:e5f67851...

Investigate immediately - audit trail integrity at risk!

Version Tracking

Maintain golden hash changelog:

# CGS Golden Hash Changelog

## v1.0.0 (2025-01-01)
- Initial baseline: `cgs:sha256:d4e56740...`
- Established by: Team
- All platforms verified

## v1.1.0 (2025-02-15) - BREAKING CHANGE
- Updated to: `cgs:sha256:e5f67851...`
- Reason: Fixed VEX ordering bug in VerdictBuilderService
- Migration: Recompute all verdicts after 2025-02-01
- ADR: docs/adr/0042-cgs-vex-ordering-fix.md

Troubleshooting

Divergence on Alpine (musl libc)

Symptom:

Alpine: cgs:sha256:abc123...
Others: cgs:sha256:def456...

Likely Causes:

  1. String sorting differences (musl vs glibc strcoll)
  2. JSON serialization differences
  3. Floating-point formatting differences

Solutions:

  1. Use Ordinal String Comparison:
// ❌ Wrong (culture-dependent: List<string>.Sort() uses the current culture's comparer)
leaves.Sort();

// ✅ Correct (culture-independent)
leaves = leaves.OrderBy(l => l, StringComparer.Ordinal).ToList();
  2. Explicit UTF-8 Encoding:
// ❌ Avoid (Encoding.Default is UTF-8 on modern .NET, but its meaning depends on the runtime)
var bytes = Encoding.Default.GetBytes(input);

// ✅ Correct (explicit UTF-8)
var bytes = Encoding.UTF8.GetBytes(input);
  3. Explicit JSON Serializer Options:
// ❌ Wrong (implicit default options)
var json = JsonSerializer.Serialize(data);

// ✅ Correct (options pinned explicitly)
var json = JsonSerializer.Serialize(data, new JsonSerializerOptions
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    WriteIndented = false,
    // Controls character escaping only. System.Text.Json already writes numbers
    // culture-invariantly; format hand-built numeric strings with CultureInfo.InvariantCulture.
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
});
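
When reproducing an Alpine divergence from a shell, the same ordinal-versus-culture distinction applies to the sort utility. The sketch below shows a bytewise sort that behaves identically on musl and glibc systems; ordinal_sort is a hypothetical name.

```shell
# Bytewise ("ordinal") sort: LC_ALL=C makes sort compare raw byte values,
# so the result does not depend on the locale or libc of the machine.
ordinal_sort() {
  LC_ALL=C sort
}

# Uppercase letters have lower byte values than lowercase ones,
# so this prints A, B, a, b (one per line).
printf '%s\n' b A a B | ordinal_sort
```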

Divergence on Windows

Symptom:

Windows: cgs:sha256:abc123...
macOS/Linux: cgs:sha256:def456...

Likely Causes:

  1. Path separator differences (\ vs /)
  2. Line ending differences (CRLF vs LF)
  3. Case sensitivity in file paths

Solutions:

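  1. Use Path.Combine for File Access and "/" for Hashed Paths:
// ❌ Wrong (hardcoded separator)
var path = "dir\\file.txt";

// ✅ Correct for file access (uses the platform separator)
var path = Path.Combine("dir", "file.txt");

// ✅ If a path is part of the hashed content, normalize it to "/" first
var canonicalPath = path.Replace('\\', '/');
  2. Normalize Line Endings: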
// ❌ Wrong (platform line endings)
var text = File.ReadAllText(path);

// ✅ Correct (normalized to \n)
var text = File.ReadAllText(path).Replace("\r\n", "\n");
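
To confirm that line endings are the cause of a Windows divergence, the input can be hashed with and without normalization from a shell. This is a sketch under assumptions: normalized_sha256 is a hypothetical helper and sha256sum is assumed to be available.

```shell
# Hypothetical helper: SHA-256 of stdin with carriage returns removed.
# Note: tr -d '\r' drops every CR, including a lone CR that is not part of CRLF.
normalized_sha256() {
  tr -d '\r' | sha256sum | cut -d' ' -f1
}

crlf=$(printf 'line1\r\nline2\r\n' | normalized_sha256)
lf=$(printf 'line1\nline2\n' | normalized_sha256)
[ "$crlf" = "$lf" ] && echo "same hash after normalization"
```

If the normalized hashes match while the raw hashes differ, the divergence comes from CRLF versus LF rather than from the hash computation.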

Golden Hash Changes After .NET Upgrade

Symptom: After upgrading from .NET 10.0.100 to 10.0.101:

Expected: cgs:sha256:abc123...
Actual: cgs:sha256:def456...

Investigation:

  1. Check .NET Version:
dotnet --version  # Should be consistent across platforms
  2. Check JsonSerializer Behavior:
// Test JSON serialization consistency
// (this only proves in-process stability; also compare the output across SDK versions)
var test = new { name = "test", value = 123 };
var json1 = JsonSerializer.Serialize(test, CanonicalJsonOptions);
var json2 = JsonSerializer.Serialize(test, CanonicalJsonOptions);
Assert.Equal(json1, json2);
  3. Check Hash Algorithm:
// Verify SHA256 produces expected output
var input = "test";
var hash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(input))).ToLowerInvariant();
// Should be: 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
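
The same known-answer check can be run outside .NET to rule out the platform's SHA-256 implementation. A minimal sketch, assuming sha256sum is available (on macOS, shasum -a 256 is the usual substitute); sha256_hex is a hypothetical helper name.

```shell
# Hypothetical helper: lowercase hex SHA-256 of the exact bytes in $1
# (printf '%s' avoids appending a trailing newline).
sha256_hex() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

expected="9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
if [ "$(sha256_hex test)" = "$expected" ]; then
  echo "SHA-256 OK"
else
  echo "SHA-256 mismatch" >&2
fi
```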

Resolution:

  • If the .NET change is intentional and breaking, follow the Breaking Change Process
  • If the .NET change is unintentional, file a bug with the .NET team

Breaking Change Process

When Golden Hash MUST Change

Golden hash changes are breaking changes that affect audit trail integrity. Change the golden hash only for:

  1. Security Fixes: Vulnerability in hash computation
  2. Correctness Bugs: Hash not deterministic or incorrect
  3. Platform Incompatibility: Hash diverges across platforms

Change Process

Step 1: Document in ADR

Create docs/adr/NNNN-cgs-hash-algorithm-change.md:

# ADR NNNN: CGS Hash Algorithm Change

## Status
ACCEPTED (2025-03-15)

## Context
The current CGS hash computation has a bug in VEX document ordering that causes non-deterministic results when VEX documents have identical timestamps.

## Decision
Update VerdictBuilderService to sort VEX documents by (timestamp, cve_id, issuer_id) instead of just (timestamp).

## Consequences

### Breaking Changes
- Golden hash will change from `cgs:sha256:abc123...` to `cgs:sha256:def456...`
- All historical verdicts computed before 2025-03-15 will have old hash format
- Audit trail verification requires dual-algorithm support during transition

### Migration Strategy
1. Deploy dual-algorithm support (v1 and v2 hash computation)
2. Recompute all verdicts created after 2025-02-01 with v2 algorithm
3. Store both v1 and v2 hashes for 90-day transition period
4. Deprecate v1 algorithm on 2025-06-15

### Testing
- Verify v2 hash is deterministic across all platforms
- Verify v1 verdicts can still be verified during transition
- Load test recomputation of 1M+ verdicts

Step 2: Implement Dual-Algorithm Support

public enum CgsHashVersion
{
    V1 = 1,  // Original algorithm (superseded 2025-03-15; deprecation planned for 2025-06-15)
    V2 = 2   // Fixed VEX ordering (current)
}

public string ComputeCgsHash(EvidencePack evidence, PolicyLock policyLock, CgsHashVersion version = CgsHashVersion.V2)
{
    return version switch
    {
        CgsHashVersion.V1 => ComputeCgsHashV1(evidence, policyLock),
        CgsHashVersion.V2 => ComputeCgsHashV2(evidence, policyLock),
        _ => throw new ArgumentOutOfRangeException(nameof(version), version, "Unsupported CGS hash version")
    };
}
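
During the transition, verification has to accept a stored hash produced by either algorithm. The sketch below illustrates that fallback logic only. hash_v1 and hash_v2 are hypothetical stand-ins (v1 hashes inputs in the order given, v2 sorts them first), not the real ComputeCgsHashV1 and ComputeCgsHashV2, and sha256sum is assumed to be available.

```shell
# Hypothetical stand-in for the old algorithm: hashes inputs in the order given.
hash_v1() { printf '%s\n' "$@" | sha256sum | cut -d' ' -f1; }

# Hypothetical stand-in for the new algorithm: sorts inputs bytewise first.
hash_v2() { printf '%s\n' "$@" | LC_ALL=C sort | sha256sum | cut -d' ' -f1; }

# Succeeds if the stored hash matches either algorithm and prints which one.
# The current algorithm (v2) is tried first.
verify_during_transition() {
  local stored="$1"
  shift
  if [ "$stored" = "$(hash_v2 "$@")" ]; then
    echo "v2"
  elif [ "$stored" = "$(hash_v1 "$@")" ]; then
    echo "v1"
  else
    echo "mismatch"
    return 1
  fi
}

# Usage: a verdict stored before the change still verifies, reported as v1.
old_hash=$(hash_v1 vex-2 vex-1)
verify_during_transition "$old_hash" vex-2 vex-1
```

Reporting which algorithm matched makes it possible to count how many verdicts still depend on v1 before it is removed.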

Step 3: Update Tests with Both Versions

[Theory]
[InlineData(CgsHashVersion.V1, "cgs:sha256:abc123...")]  // Old golden hash
[InlineData(CgsHashVersion.V2, "cgs:sha256:def456...")]  // New golden hash
public async Task CgsHash_WithKnownEvidence_MatchesGoldenHash_BothVersions(
    CgsHashVersion version,
    string expectedHash)
{
    // Test both algorithms during transition period
    var evidence = CreateKnownEvidencePack();
    var policyLock = CreateKnownPolicyLock();
    var service = CreateVerdictBuilder();

    var result = await service.BuildAsync(evidence, policyLock, version, CancellationToken.None);

    result.CgsHash.Should().Be(expectedHash);
}

Step 4: Create Migration Script

// tools/migrate-cgs-hashes.cs
public class CgsHashMigrator
{
    public async Task MigrateVerdicts(DateTimeOffset since)
    {
        var verdicts = await _repository.GetVerdictsSince(since);

        foreach (var verdict in verdicts)
        {
            // Recompute with V2 algorithm
            var newHash = ComputeCgsHashV2(verdict.Evidence, verdict.PolicyLock);

            // Store both hashes during transition
            await _repository.UpdateVerdict(verdict.Id, new
            {
                CgsHashV1 = verdict.CgsHash,
                CgsHashV2 = newHash,
                MigratedAt = DateTimeOffset.UtcNow
            });
        }
    }
}

Step 5: Coordinate Deployment

Timeline:

  • Week 1: Deploy dual-algorithm support to staging
  • Week 2: Run migration script on staging data
  • Week 3: Verify all verdicts have both v1 and v2 hashes
  • Week 4: Deploy to production
  • Weeks 5-16: transition period of roughly 90 days (both algorithms supported)
  • Week 17: Deprecate v1, remove from codebase

Step 6: Update Golden Hash

After successful migration:

// src/__Tests/Determinism/CgsDeterminismTests.cs
[Fact]
public async Task CgsHash_WithKnownEvidence_MatchesGoldenHash()
{
    // Arrange
    var evidence = CreateKnownEvidencePack();
    var policyLock = CreateKnownPolicyLock();
    var service = CreateVerdictBuilder();

    // Act
    var result = await service.BuildAsync(evidence, policyLock, CancellationToken.None);

    // Assert - Updated golden hash (2025-03-15)
    var goldenHash = "cgs:sha256:def456...";  // V2 algorithm
    result.CgsHash.Should().Be(goldenHash, "CGS hash must match golden file (V2 algorithm)");
}

Step 7: Document in Changelog

## CHANGELOG

### [2.0.0] - 2025-03-15 - BREAKING CHANGE

#### Changed
- **CGS Hash Algorithm**: Fixed VEX ordering bug (#1234)
  - Old: `cgs:sha256:abc123...`
  - New: `cgs:sha256:def456...`
  - Migration: All verdicts after 2025-02-01 recomputed
  - Dual-algorithm support: 90 days (until 2025-06-15)

#### Migration Guide
See: `docs/migrations/cgs-hash-v2-migration.md`

#### ADR
See: `docs/adr/0042-cgs-hash-algorithm-change.md`

Best Practices

1. Never Change Golden Hash Without ADR

Every golden hash change MUST have an ADR documenting:

  • Why the change is necessary
  • Impact on historical data
  • Migration strategy
  • Testing plan

2. Always Support Dual Algorithms During Transition

For 90 days after a change, support both the old and new algorithms to avoid breaking existing integrations.

3. Run Cross-Platform Tests Before Merge

Never merge golden hash changes without verifying all 5 platforms produce identical results.

4. Version Golden Hashes in Baseline Files

Maintain historical record:

docs/testing/baselines/
├── cgs-golden-hash-20250101-v1.md  # Original
└── cgs-golden-hash-20250315-v2.md  # Updated

5. Automate Monitoring

Set up daily cross-platform runs to detect drift early:

# .gitea/workflows/golden-hash-monitor.yml
on:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight UTC

References

  • Sprint Documentation: docs/implplan/archived/SPRINT_20251229_001_001_BE_cgs_infrastructure.md
  • Test README: src/__Tests/Determinism/README.md
  • CI/CD Workflow: .gitea/workflows/cross-platform-determinism.yml
  • Batch Summary: docs/implplan/archived/2025-12-29-completed-sprints/BATCH_20251229_BE_COMPLETION_SUMMARY.md

Support

For questions or issues:

  • Create issue with label: determinism, golden-file
  • Priority: Critical (affects audit trail integrity)
  • Slack: #determinism-testing