git.stella-ops.org/docs/testing/DETERMINISM_DEVELOPER_GUIDE.md
2025-12-29 19:12:38 +02:00


Determinism Developer Guide

Overview

This guide helps developers add new determinism tests to StellaOps. Deterministic behavior is critical for:

  • Reproducible verdicts
  • Auditable evidence chains
  • Cryptographic verification
  • Cross-platform consistency

Table of Contents

  1. Core Principles
  2. Test Structure
  3. Common Patterns
  4. Anti-Patterns to Avoid
  5. Adding New Tests
  6. Cross-Platform Considerations
  7. Performance Guidelines
  8. Troubleshooting

Core Principles

1. Determinism Guarantee

Definition: Same inputs always produce identical outputs, regardless of:

  • Platform (Windows, macOS, Linux, Alpine, Debian)
  • Runtime (.NET version, JIT compiler)
  • Execution order (parallel vs sequential)
  • Time of day
  • System locale
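
In practice, this guarantee reduces to hashing a canonical byte representation of the input. A minimal sketch (illustrative only, not the actual StellaOps implementation):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustrative sketch: determinism means the hash of a canonical byte
// representation never varies between runs, runtimes, or hosts.
static string ComputeHash(string canonicalJson)
{
    // UTF-8 (no BOM) + SHA-256 behave identically on every supported platform.
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson));
    return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
}

string hash1 = ComputeHash("{\"id\":\"test-001\"}");
string hash2 = ComputeHash("{\"id\":\"test-001\"}");
Console.WriteLine(hash1 == hash2); // identical on every run, on every platform
```

Anything that perturbs the canonical representation (locale-dependent formatting, unstable ordering, wall-clock values) breaks this property, which is what the patterns below guard against.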

2. Golden File Philosophy

Golden files are baseline reference values that lock in correct behavior:

  • Established after careful verification
  • Never changed without ADR and migration plan
  • Verified on all platforms before acceptance
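
A golden-hash fixture could look like the sketch below. This layout is hypothetical; the actual format of cgs-v1.txt may differ:

```text
# golden-hashes/cgs-v1.txt (hypothetical layout)
# Established: 2025-12-29, ubuntu-latest, .NET 10.0
# Verified identical on Windows, macOS, Ubuntu, Alpine, Debian.
empty-evidence-pack    sha256:<64 hex chars>
known-evidence-pack    sha256:<64 hex chars>
```

Whatever the format, each entry should carry enough provenance (date, platform, runtime) that a future change can be traced back to its establishment.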

3. Test Independence

Each test must:

  • Not depend on other tests' execution or order
  • Clean up resources after completion
  • Use isolated data (no shared state)

Test Structure

Standard Test Template

[Fact]
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Unit)]
public async Task Feature_Behavior_ExpectedOutcome()
{
    // Arrange - Create deterministic inputs
    var input = CreateDeterministicInput();

    // Act - Execute feature
    var output1 = await ExecuteFeature(input);
    var output2 = await ExecuteFeature(input);

    // Assert - Verify determinism
    output1.Should().Be(output2, "same input should produce identical output");
}

Test Organization

src/__Tests/Determinism/
├── CgsDeterminismTests.cs          # CGS hash tests
├── LineageDeterminismTests.cs      # SBOM lineage tests
├── VexDeterminismTests.cs          # VEX consensus tests (future)
├── README.md                       # Test documentation
└── Fixtures/                       # Test data
    ├── known-evidence-pack.json
    ├── known-policy-lock.json
    └── golden-hashes/
        └── cgs-v1.txt

Common Patterns

Pattern 1: 10-Iteration Stability Test

Purpose: Verify that executing the same operation 10 times produces identical results.

[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_SameInput_ProducesIdenticalOutput_Across10Iterations()
{
    // Arrange
    var input = CreateDeterministicInput();
    var service = CreateService();
    var outputs = new List<string>();

    // Act - Execute 10 times
    for (int i = 0; i < 10; i++)
    {
        var result = await service.ProcessAsync(input, CancellationToken.None);
        outputs.Add(result.Hash);
        _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
    }

    // Assert - All hashes should be identical
    outputs.Distinct().Should().HaveCount(1,
        "same input should produce identical output across all iterations");
}

Why 10 iterations?

  • Catches non-deterministic behavior (e.g., GUID generation, random values)
  • Reasonable execution time (<5 seconds for most tests)
  • A common convention for lightweight determinism checks: enough repetitions to surface flakiness without the cost of a stress run

Pattern 2: Golden File Test

Purpose: Verify output matches a known-good baseline value.

[Fact]
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Golden)]
public async Task Feature_WithKnownInput_MatchesGoldenHash()
{
    // Arrange
    var input = CreateKnownInput();  // MUST be completely deterministic
    var service = CreateService();

    // Act
    var result = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    var goldenHash = "sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3";

    _output.WriteLine($"Computed Hash: {result.Hash}");
    _output.WriteLine($"Golden Hash:   {goldenHash}");

    result.Hash.Should().Be(goldenHash, "hash must match golden file");
}

Golden file best practices:

  • Document how golden value was established (date, platform, .NET version)
  • Include golden value directly in test code (not external file) for visibility
  • Add comment explaining what golden value represents
  • Test golden value on all platforms before merging
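
Putting the first three practices together, a golden constant might be documented like this. The class name, provenance details, and all-zero hash are illustrative placeholders, not real StellaOps values:

```csharp
// Sketch: record a golden value's provenance next to the constant itself.
public static class GoldenHashes
{
    // Established 2025-12-29 on ubuntu-latest, .NET 10.0.
    // Verified identical on Windows, macOS, Ubuntu, Alpine, and Debian.
    // Represents: CGS hash of Fixtures/known-evidence-pack.json.
    public const string CgsKnownEvidenceV1 =
        "sha256:0000000000000000000000000000000000000000000000000000000000000000";
}
```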

Pattern 3: Order Independence Test

Purpose: Verify that input ordering doesn't affect output.

[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_InputOrder_DoesNotAffectOutput()
{
    // Arrange
    var item1 = CreateItem("A");
    var item2 = CreateItem("B");
    var item3 = CreateItem("C");

    var service = CreateService();

    // Act - Process items in different orders
    var result1 = await service.ProcessAsync(new[] { item1, item2, item3 }, CancellationToken.None);
    var result2 = await service.ProcessAsync(new[] { item3, item1, item2 }, CancellationToken.None);
    var result3 = await service.ProcessAsync(new[] { item2, item3, item1 }, CancellationToken.None);

    // Assert - All should produce same hash
    result1.Hash.Should().Be(result2.Hash, "input order should not affect output");
    result1.Hash.Should().Be(result3.Hash, "input order should not affect output");

    _output.WriteLine($"Order-independent hash: {result1.Hash}");
}

When to use:

  • Collections that should be sorted internally (VEX documents, rules, dependencies)
  • APIs that accept unordered inputs (dictionary keys, sets)
  • Parallel processing where order is undefined

Pattern 4: Deterministic Timestamp Test

Purpose: Verify that fixed timestamps produce deterministic results.

[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_WithFixedTimestamp_IsDeterministic()
{
    // Arrange - Use FIXED timestamp (not DateTimeOffset.Now!)
    var timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z");
    var input = CreateInputWithTimestamp(timestamp);
    var service = CreateService();

    // Act
    var result1 = await service.ProcessAsync(input, CancellationToken.None);
    var result2 = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    result1.Hash.Should().Be(result2.Hash, "fixed timestamp should produce deterministic output");
}

Timestamp guidelines:

  • Never use: DateTimeOffset.Now, DateTime.UtcNow, Guid.NewGuid()
  • Always use: a fixed instant, e.g. DateTimeOffset.Parse("2025-01-01T00:00:00Z"); pass CultureInfo.InvariantCulture when parsing formats less strict than ISO 8601
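
When production code genuinely needs the current time, inject a clock rather than calling the forbidden APIs directly. A sketch using the .NET 8+ TimeProvider abstraction; FixedTimeProvider is a hypothetical test double:

```csharp
using System;
using System.Globalization;

// In a test: the clock always returns one fixed instant, so any code that
// reads time through it is fully deterministic.
var instant = DateTimeOffset.Parse("2025-01-01T00:00:00Z", CultureInfo.InvariantCulture);
var clock = new FixedTimeProvider(instant);
Console.WriteLine(clock.GetUtcNow()); // same instant on every call

// Hypothetical test double; production code would receive the built-in
// TimeProvider.System instead.
public sealed class FixedTimeProvider : TimeProvider
{
    private readonly DateTimeOffset _instant;
    public FixedTimeProvider(DateTimeOffset instant) => _instant = instant;
    public override DateTimeOffset GetUtcNow() => _instant;
}
```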

Pattern 5: Empty/Minimal Input Test

Purpose: Verify that minimal or empty inputs don't cause non-determinism.

[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_EmptyInput_ProducesDeterministicHash()
{
    // Arrange - Minimal input
    var input = CreateEmptyInput();
    var service = CreateService();

    // Act
    var result = await service.ProcessAsync(input, CancellationToken.None);

    // Assert - Verify format (hash may not be golden yet)
    result.Hash.Should().StartWith("sha256:");
    result.Hash.Length.Should().Be(71); // "sha256:" + 64 hex chars

    _output.WriteLine($"Empty input hash: {result.Hash}");
}

Edge cases to test:

  • Empty collections (Array.Empty<string>())
  • Null optional fields
  • Zero-length strings
  • Default values
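
The edge cases above can be exercised together in one runnable sketch; HashItems is a stand-in for the real service under test:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Stand-in for the service: a deterministic hash over joined items.
static string HashItems(string[] items) =>
    "sha256:" + Convert.ToHexString(
        SHA256.HashData(Encoding.UTF8.GetBytes(string.Join("\n", items))))
        .ToLowerInvariant();

string[][] edgeCases =
{
    Array.Empty<string>(),  // empty collection
    new[] { "" },           // zero-length string
    new[] { "item1" },      // minimal content
};

// Each edge case must hash identically on repeated evaluation.
foreach (string[] items in edgeCases)
{
    if (HashItems(items) != HashItems(items))
        throw new InvalidOperationException("edge case was not deterministic");
}
Console.WriteLine("all edge cases deterministic");
```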

Anti-Patterns to Avoid

Anti-Pattern 1: Using Current Time

// BAD - Non-deterministic!
var input = new Input
{
    Timestamp = DateTimeOffset.Now  // ❌ Different every run!
};

Fix:

// GOOD - Deterministic
var input = new Input
{
    Timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z")  // ✅ Same every run
};

Anti-Pattern 2: Using Random Values

// BAD - Non-deterministic!
var random = new Random();
var input = new Input
{
    Id = random.Next()  // ❌ Different every run!
};

Fix:

// GOOD - Deterministic
var input = new Input
{
    Id = 12345  // ✅ Same every run
};

Anti-Pattern 3: Using GUID Generation

// BAD - Non-deterministic!
var input = new Input
{
    Id = Guid.NewGuid().ToString()  // ❌ Different every run!
};

Fix:

// GOOD - Deterministic
var input = new Input
{
    Id = "00000000-0000-0000-0000-000000000001"  // ✅ Same every run
};

Anti-Pattern 4: Using Unordered Collections

// BAD - Dictionary iteration order is NOT guaranteed!
var dict = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2"
};

foreach (var kvp in dict)  // ❌ Order may vary!
{
    hash.Update(kvp.Key);
}

Fix:

// GOOD - Explicit ordering
var dict = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2"
};

foreach (var kvp in dict.OrderBy(x => x.Key, StringComparer.Ordinal))  // ✅ Consistent order
{
    hash.Update(kvp.Key);
}

Anti-Pattern 5: Platform-Specific Paths

// BAD - Platform-specific!
var path = "dir\\file.txt";  // ❌ Windows-only!

Fix:

// GOOD - Cross-platform
var path = Path.Combine("dir", "file.txt");  // ✅ Works everywhere

Anti-Pattern 6: Culture-Dependent Formatting

// BAD - Culture-dependent!
var formatted = value.ToString();  // ❌ Locale-specific!

Fix:

// GOOD - Culture-invariant
var formatted = value.ToString(CultureInfo.InvariantCulture);  // ✅ Same everywhere

Adding New Tests

Step 1: Identify Determinism Requirement

Ask yourself:

  • Does this feature produce a hash, signature, or cryptographic output?
  • Will this feature's output be stored and verified later?
  • Does this feature need to be reproducible across platforms?
  • Is this feature part of an audit trail?

If YES to any → Add determinism test.

Step 2: Create Test File

cd src/__Tests/Determinism
touch MyFeatureDeterminismTests.cs

Step 3: Write Test Class

using FluentAssertions;
using Microsoft.Extensions.Logging.Abstractions;
using StellaOps.TestKit;
using Xunit;
using Xunit.Abstractions;

namespace StellaOps.Tests.Determinism;

/// <summary>
/// Determinism tests for [Feature Name].
/// Verifies that [specific behavior] is deterministic across platforms and runs.
/// </summary>
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Unit)]
public sealed class MyFeatureDeterminismTests
{
    private readonly ITestOutputHelper _output;

    public MyFeatureDeterminismTests(ITestOutputHelper output)
    {
        _output = output;
    }

    [Fact]
    public async Task MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations()
    {
        // Arrange
        var input = CreateDeterministicInput();
        var service = CreateMyFeatureService();
        var outputs = new List<string>();

        // Act - Execute 10 times
        for (int i = 0; i < 10; i++)
        {
            var result = await service.ProcessAsync(input, CancellationToken.None);
            outputs.Add(result.Hash);
            _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
        }

        // Assert - All hashes should be identical
        outputs.Distinct().Should().HaveCount(1,
            "same input should produce identical output across all iterations");
    }

    #region Helper Methods

    private static MyInput CreateDeterministicInput()
    {
        return new MyInput
        {
            // ✅ Use fixed values
            Id = "test-001",
            Timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z"),
            Data = new[] { "item1", "item2", "item3" }
        };
    }

    private static MyFeatureService CreateMyFeatureService()
    {
        return new MyFeatureService(NullLogger<MyFeatureService>.Instance);
    }

    #endregion
}

Step 4: Run Test Locally 10 Times

for i in {1..10}; do
  echo "=== Run $i ==="
  dotnet test --filter "FullyQualifiedName~MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations"
done

Expected: All 10 runs pass with identical output.

Step 5: Add to CI/CD

Tests tagged Category=Determinism are picked up automatically on push; no extra configuration is needed.

The CI/CD workflow .gitea/workflows/cross-platform-determinism.yml runs all Category=Determinism tests on all five platforms.

Step 6: Document in README

Update src/__Tests/Determinism/README.md:

### MyFeature Determinism

Tests that verify [feature] hash computation is deterministic:

- **10-Iteration Stability**: Same input produces identical hash 10 times
- **Order Independence**: Input ordering doesn't affect hash
- **Empty Input**: Minimal input produces deterministic hash

Cross-Platform Considerations

Platform Matrix

Tests run on:

  • Windows (windows-latest): MSVC/UCRT C runtime, CRLF line endings
  • macOS (macos-latest): BSD libc, LF line endings
  • Linux Ubuntu (ubuntu-latest): glibc, LF line endings
  • Linux Alpine (Alpine Docker): musl libc, LF line endings
  • Linux Debian (Debian Docker): glibc, LF line endings

Common Cross-Platform Issues

Issue 1: String Sorting (musl vs glibc)

Symptom: Alpine produces different hash than Ubuntu.

Cause: musl libc has different strcoll implementation than glibc.

Solution: Always use StringComparer.Ordinal for sorting:

// ❌ Wrong - Platform-dependent sorting
items.Sort();

// ✅ Correct - Culture-invariant sorting
items = items.OrderBy(x => x, StringComparer.Ordinal).ToList();

Issue 2: Path Separators

Symptom: Windows produces different hash than macOS/Linux.

Cause: Windows uses \, Unix uses /.

Solution: Use Path.Combine or normalize:

// ❌ Wrong - Hardcoded separator
var path = "dir\\file.txt";

// ✅ Correct - Cross-platform
var path = Path.Combine("dir", "file.txt");

// ✅ Alternative - Normalize to forward slash
var normalizedPath = path.Replace('\\', '/');

Issue 3: Line Endings

Symptom: Hash includes file content with different line endings.

Cause: Windows uses CRLF (\r\n), Unix uses LF (\n).

Solution: Normalize to LF:

// ❌ Wrong - Platform line endings
var content = File.ReadAllText(path);

// ✅ Correct - Normalized to LF
var content = File.ReadAllText(path).Replace("\r\n", "\n");

Issue 4: Floating-Point Precision

Symptom: Different platforms produce slightly different floating-point values.

Cause: JIT compiler optimizations, FPU rounding modes.

Solution: Use decimal for exact arithmetic, or round explicitly:

// ❌ Wrong - Floating-point precision surprise
var value = 0.1 + 0.2;  // 0.30000000000000004 in IEEE 754 double, not 0.3

// ✅ Correct - Decimal for exact values
var value = 0.1m + 0.2m;  // Always 0.3

// ✅ Alternative - Round explicitly
var value = Math.Round(0.1 + 0.2, 2);  // 0.30

Performance Guidelines

Execution Time Targets

Test Type              | Target | Max
-----------------------|--------|-------
Single iteration       | <100ms | <500ms
10-iteration stability | <1s    | <3s
Golden file test       | <100ms | <500ms
Full test suite        | <5s    | <15s

Optimization Tips

  1. Avoid unnecessary I/O: create test data in memory instead of reading files
  2. Return Task.CompletedTask: for synchronous implementations of async signatures
  3. Minimize allocations: reuse test data across assertions
  4. Rely on parallel execution: xUnit runs test classes in parallel by default

Performance Regression Detection

If test execution time increases by >2x:

  1. Profile with dotnet-trace or BenchmarkDotNet
  2. Identify bottleneck (I/O, CPU, memory)
  3. Optimize or split into separate test
  4. Document performance expectations in test comments

Troubleshooting

Problem: Test Passes 9/10 Times, Fails 1/10

Cause: Non-deterministic input or race condition.

Debug Steps:

  1. Add logging to each iteration:
    _output.WriteLine($"Iteration {i}: Input={JsonSerializer.Serialize(input)}, Output={output}");
    
  2. Look for differences in input or output
  3. Check for Guid.NewGuid(), Random, DateTimeOffset.Now
  4. Check for unsynchronized parallel operations
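
Step 4 is worth illustrating: parallel task completion order is inherently unstable, so any order-sensitive accumulation must be sorted before hashing. A self-contained sketch:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Eight tasks enqueue concurrently; the enqueue order in `results` can
// differ from run to run, which breaks any hash computed over it directly.
var results = new ConcurrentQueue<int>();
await Task.WhenAll(Enumerable.Range(0, 8)
    .Select(i => Task.Run(() => results.Enqueue(i))));

// Sorting restores a stable order before hashing or comparing.
int[] stable = results.OrderBy(x => x).ToArray();
Console.WriteLine(string.Join(",", stable)); // 0,1,2,3,4,5,6,7
```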

Problem: Test Fails on Alpine but Passes Elsewhere

Cause: musl libc vs glibc difference.

Debug Steps:

  1. Run test locally with Alpine Docker:
    docker run -it --rm -v $(pwd):/app mcr.microsoft.com/dotnet/sdk:10.0-alpine sh
    cd /app
    dotnet test --filter "FullyQualifiedName~MyTest"
    
  2. Compare output with local (glibc) output
  3. Check for string sorting, culture-dependent formatting
  4. Use StringComparer.Ordinal and CultureInfo.InvariantCulture

Problem: Golden Hash Changes After .NET Upgrade

Cause: .NET runtime change in JSON serialization or hash algorithm.

Debug Steps:

  1. Compare .NET versions:
    dotnet --version  # Should be same in CI/CD
    
  2. Check JsonSerializer behavior:
    var json1 = JsonSerializer.Serialize(input, options);
    var json2 = JsonSerializer.Serialize(input, options);
    json1.Should().Be(json2);
    
  3. If intentional .NET change, follow Breaking Change Process

References

  • Test README: src/__Tests/Determinism/README.md
  • Golden File Guide: docs/implplan/archived/2025-12-29-completed-sprints/GOLDEN_FILE_ESTABLISHMENT_GUIDE.md
  • ADR 0042: CGS Merkle Tree Implementation
  • ADR 0043: Fulcio Keyless Signing
  • CI/CD Workflow: .gitea/workflows/cross-platform-determinism.yml

Getting Help

  • Slack: #determinism-testing
  • Issue Label: determinism, testing
  • Priority: High (determinism bugs affect audit trails)