stella-ops.org/git.stella-ops.org

master 4789027317 docs consolidation and others

2026-01-06 19:07:48 +02:00

Determinism Developer Guide

Overview

This guide helps developers add new determinism tests to StellaOps. Deterministic behavior is critical for:

Reproducible verdicts
Auditable evidence chains
Cryptographic verification
Cross-platform consistency

Table of Contents

Core Principles
Test Structure
Common Patterns
Anti-Patterns to Avoid
Adding New Tests
Cross-Platform Considerations
Performance Guidelines
Troubleshooting

Core Principles

1. Determinism Guarantee

Definition: Same inputs always produce identical outputs, regardless of:

Platform (Windows, macOS, Linux, Alpine, Debian)
Runtime (.NET version, JIT compiler)
Execution order (parallel vs sequential)
Time of day
System locale

2. Golden File Philosophy

Golden files are baseline reference values that lock in correct behavior:

Established after careful verification
Never changed without an ADR and a migration plan
Verified on all platforms before acceptance

3. Test Independence

Each test must:

Not depend on other tests' execution or order
Clean up resources after completion
Use isolated data (no shared state)

Test Structure

Standard Test Template

[Fact]
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Unit)]
public async Task Feature_Behavior_ExpectedOutcome()
{
    // Arrange - Create deterministic inputs
    var input = CreateDeterministicInput();

    // Act - Execute feature
    var output1 = await ExecuteFeature(input);
    var output2 = await ExecuteFeature(input);

    // Assert - Verify determinism
    output1.Should().Be(output2, "same input should produce identical output");
}

Test Organization

src/__Tests/Determinism/
├── CgsDeterminismTests.cs          # CGS hash tests
├── LineageDeterminismTests.cs      # SBOM lineage tests
├── VexDeterminismTests.cs          # VEX consensus tests (future)
├── README.md                       # Test documentation
└── Fixtures/                       # Test data
    ├── known-evidence-pack.json
    ├── known-policy-lock.json
    └── golden-hashes/
        └── cgs-v1.txt

Common Patterns

Pattern 1: 10-Iteration Stability Test

Purpose: Verify that executing the same operation 10 times produces identical results.

[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_SameInput_ProducesIdenticalOutput_Across10Iterations()
{
    // Arrange
    var input = CreateDeterministicInput();
    var service = CreateService();
    var outputs = new List<string>();

    // Act - Execute 10 times
    for (int i = 0; i < 10; i++)
    {
        var result = await service.ProcessAsync(input, CancellationToken.None);
        outputs.Add(result.Hash);
        _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
    }

    // Assert - All hashes should be identical
    outputs.Distinct().Should().HaveCount(1,
        "same input should produce identical output across all iterations");
}

Why 10 iterations?

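Catches sources of non-determinism that change on every call (e.g., GUID generation, random values, current time)
Reasonable execution time (under 3 seconds; see Execution Time Targets)
A pragmatic project convention, not a proof: rare race conditions may need more iterations or a stress run to surface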

Pattern 2: Golden File Test

Purpose: Verify output matches a known-good baseline value.

[Fact]
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Golden)]
public async Task Feature_WithKnownInput_MatchesGoldenHash()
{
    // Arrange
    var input = CreateKnownInput();  // MUST be completely deterministic
    var service = CreateService();

    // Act
    var result = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    var goldenHash = "sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3";

    _output.WriteLine($"Computed Hash: {result.Hash}");
    _output.WriteLine($"Golden Hash:   {goldenHash}");

    result.Hash.Should().Be(goldenHash, "hash must match golden file");
}

Golden file best practices:

Document how golden value was established (date, platform, .NET version)
Include golden value directly in test code (not external file) for visibility
Add comment explaining what golden value represents
Test golden value on all platforms before merging

Pattern 3: Order Independence Test

Purpose: Verify that input ordering doesn't affect output.

[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_InputOrder_DoesNotAffectOutput()
{
    // Arrange
    var item1 = CreateItem("A");
    var item2 = CreateItem("B");
    var item3 = CreateItem("C");

    var service = CreateService();

    // Act - Process items in different orders
    var result1 = await service.ProcessAsync(new[] { item1, item2, item3 }, CancellationToken.None);
    var result2 = await service.ProcessAsync(new[] { item3, item1, item2 }, CancellationToken.None);
    var result3 = await service.ProcessAsync(new[] { item2, item3, item1 }, CancellationToken.None);

    // Assert - All should produce same hash
    result1.Hash.Should().Be(result2.Hash, "input order should not affect output");
    result1.Hash.Should().Be(result3.Hash, "input order should not affect output");

    _output.WriteLine($"Order-independent hash: {result1.Hash}");
}

When to use:

Collections that should be sorted internally (VEX documents, rules, dependencies)
APIs that accept unordered inputs (dictionary keys, sets)
Parallel processing where order is undefined

Pattern 4: Deterministic Timestamp Test

Purpose: Verify that fixed timestamps produce deterministic results.

[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_WithFixedTimestamp_IsDeterministic()
{
    // Arrange - Use FIXED timestamp (not DateTimeOffset.Now!)
    var timestamp = new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero);  // culture-independent
    var input = CreateInputWithTimestamp(timestamp);
    var service = CreateService();

    // Act
    var result1 = await service.ProcessAsync(input, CancellationToken.None);
    var result2 = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    result1.Hash.Should().Be(result2.Hash, "fixed timestamp should produce deterministic output");
}

Timestamp guidelines:

❌ Never use: DateTimeOffset.Now, DateTime.UtcNow, Guid.NewGuid()
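✅ Always use a fixed, culture-independent value in tests, such as new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero). If you parse a string instead, pass CultureInfo.InvariantCulture, because DateTimeOffset.Parse otherwise uses the current culture's calendar.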
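When the code under test reads the clock itself, a fixed input timestamp is not enough. One option, sketched here on the assumption that the project targets .NET 8 or later, is to inject System.TimeProvider and substitute a fixed implementation in tests. FixedTimeProvider and StampedRecordFactory are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical test double: a TimeProvider that always reports the same instant.
public sealed class FixedTimeProvider : TimeProvider
{
    private readonly DateTimeOffset _now;

    public FixedTimeProvider(DateTimeOffset now) => _now = now;

    public override DateTimeOffset GetUtcNow() => _now;
}

// Hypothetical production code that takes its clock as a dependency
// instead of calling DateTimeOffset.UtcNow directly.
public sealed class StampedRecordFactory
{
    private readonly TimeProvider _clock;

    public StampedRecordFactory(TimeProvider clock) => _clock = clock;

    // The "O" (round-trip) format is culture-invariant.
    public string Create(string id) => $"{id}@{_clock.GetUtcNow():O}";
}
```

In production the factory would receive TimeProvider.System. In a determinism test it would receive new FixedTimeProvider(new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero)).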

Pattern 5: Empty/Minimal Input Test

Purpose: Verify that minimal or empty inputs don't cause non-determinism.

[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_EmptyInput_ProducesDeterministicHash()
{
    // Arrange - Minimal input
    var input = CreateEmptyInput();
    var service = CreateService();

    // Act
    var result = await service.ProcessAsync(input, CancellationToken.None);

    // Assert - Verify format (hash may not be golden yet)
    result.Hash.Should().StartWith("sha256:");
    result.Hash.Length.Should().Be(71); // "sha256:" + 64 hex chars

    _output.WriteLine($"Empty input hash: {result.Hash}");
}

Edge cases to test:

Empty collections (Array.Empty<string>())
Null optional fields
Zero-length strings
Default values

Anti-Patterns to Avoid

❌ Anti-Pattern 1: Using Current Time

// BAD - Non-deterministic!
var input = new Input
{
    Timestamp = DateTimeOffset.Now  // ❌ Different every run!
};

Fix:

// GOOD - Deterministic
var input = new Input
{
    Timestamp = new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero)  // ✅ Same every run, in every culture
};

❌ Anti-Pattern 2: Using Random Values

// BAD - Non-deterministic!
var random = new Random();
var input = new Input
{
    Id = random.Next()  // ❌ Different every run!
};

Fix:

// GOOD - Deterministic
var input = new Input
{
    Id = 12345  // ✅ Same every run
};

❌ Anti-Pattern 3: Using GUID Generation

// BAD - Non-deterministic!
var input = new Input
{
    Id = Guid.NewGuid().ToString()  // ❌ Different every run!
};

Fix:

// GOOD - Deterministic
var input = new Input
{
    Id = "00000000-0000-0000-0000-000000000001"  // ✅ Same every run
};

❌ Anti-Pattern 4: Using Unordered Collections

// BAD - Dictionary iteration order is NOT guaranteed!
var dict = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2"
};

foreach (var kvp in dict)  // ❌ Order may vary!
{
    hash.Update(kvp.Key);
}

Fix:

// GOOD - Explicit ordering
var dict = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2"
};

foreach (var kvp in dict.OrderBy(x => x.Key, StringComparer.Ordinal))  // ✅ Consistent order
{
    hash.Update(kvp.Key);
}

❌ Anti-Pattern 5: Platform-Specific Paths

// BAD - Platform-specific!
var path = "dir\\file.txt";  // ❌ Windows-only!

Fix:

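// GOOD - Cross-platform access, normalized before hashing
var path = Path.Combine("dir", "file.txt");  // ✅ Resolves on every OS, but the separator still differs
var canonicalPath = path.Replace('\\', '/');  // ✅ Same string everywhere; hash or serialize this one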

❌ Anti-Pattern 6: Culture-Dependent Formatting

// BAD - Culture-dependent!
var formatted = value.ToString();  // ❌ Locale-specific!

Fix:

// GOOD - Culture-invariant
var formatted = value.ToString(CultureInfo.InvariantCulture);  // ✅ Same everywhere
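To make the difference concrete, the sketch below formats the same number twice: once with the invariant culture and once with a NumberFormatInfo that uses a comma as the decimal separator, which stands in for a European locale without needing ICU data. InvariantFormatting is a hypothetical helper name.

```csharp
using System.Globalization;

// Hypothetical helper illustrating culture-dependent vs. invariant number formatting.
public static class InvariantFormatting
{
    // Culture-independent: always uses '.' as the decimal separator.
    public static string Format(double value) =>
        value.ToString("R", CultureInfo.InvariantCulture);

    // Simulates a locale whose decimal separator is ','.
    public static string FormatWithCommaDecimal(double value) =>
        value.ToString(new NumberFormatInfo { NumberDecimalSeparator = "," });
}
```

For 1234.5, Format returns "1234.5" while FormatWithCommaDecimal returns "1234,5". Feeding the second string into a hash would change the result on machines with such a locale.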

Adding New Tests

Step 1: Identify Determinism Requirement

Ask yourself:

Does this feature produce a hash, signature, or cryptographic output?
Will this feature's output be stored and verified later?
Does this feature need to be reproducible across platforms?
Is this feature part of an audit trail?

If YES to any → Add determinism test.

Step 2: Create Test File

cd src/__Tests/Determinism
touch MyFeatureDeterminismTests.cs

Step 3: Write Test Class

using FluentAssertions;
using Microsoft.Extensions.Logging.Abstractions;
using StellaOps.TestKit;
using Xunit;
using Xunit.Abstractions;

namespace StellaOps.Tests.Determinism;

/// <summary>
/// Determinism tests for [Feature Name].
/// Verifies that [specific behavior] is deterministic across platforms and runs.
/// </summary>
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Unit)]
public sealed class MyFeatureDeterminismTests
{
    private readonly ITestOutputHelper _output;

    public MyFeatureDeterminismTests(ITestOutputHelper output)
    {
        _output = output;
    }

    [Fact]
    public async Task MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations()
    {
        // Arrange
        var input = CreateDeterministicInput();
        var service = CreateMyFeatureService();
        var outputs = new List<string>();

        // Act - Execute 10 times
        for (int i = 0; i < 10; i++)
        {
            var result = await service.ProcessAsync(input, CancellationToken.None);
            outputs.Add(result.Hash);
            _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
        }

        // Assert - All hashes should be identical
        outputs.Distinct().Should().HaveCount(1,
            "same input should produce identical output across all iterations");
    }

    #region Helper Methods

    private static MyInput CreateDeterministicInput()
    {
        return new MyInput
        {
            // ✅ Use fixed values
            Id = "test-001",
            Timestamp = new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero),
            Data = new[] { "item1", "item2", "item3" }
        };
    }

    private static MyFeatureService CreateMyFeatureService()
    {
        return new MyFeatureService(NullLogger<MyFeatureService>.Instance);
    }

    #endregion
}

Step 4: Run Test Locally 10 Times

for i in {1..10}; do
  echo "=== Run $i ==="
  dotnet test --filter "FullyQualifiedName~MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations"
done

Expected: All 10 runs pass with identical output.

Step 5: Add to CI/CD

Test is automatically included when pushed (no configuration needed).

CI/CD workflow .gitea/workflows/cross-platform-determinism.yml runs all Category=Determinism tests on 5 platforms.

Step 6: Document in README

Update src/__Tests/Determinism/README.md:

### MyFeature Determinism

Tests that verify [feature] hash computation is deterministic:

- **10-Iteration Stability**: Same input produces identical hash 10 times
- **Order Independence**: Input ordering doesn't affect hash
- **Empty Input**: Minimal input produces deterministic hash

Cross-Platform Considerations

Platform Matrix

Tests run on:

Windows (windows-latest): Microsoft C runtime (UCRT), CRLF line endings
macOS (macos-latest): BSD libc, LF line endings
Linux Ubuntu (ubuntu-latest): glibc, LF line endings
Linux Alpine (Alpine Docker): musl libc, LF line endings
Linux Debian (Debian Docker): glibc, LF line endings

Common Cross-Platform Issues

Issue 1: String Sorting (musl vs glibc)

Symptom: Alpine produces different hash than Ubuntu.

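Cause: Sorting strings without an explicit comparer is culture-sensitive, and the result depends on the globalization data available on each platform. Alpine (musl libc) images often ship a different ICU build than Ubuntu, or run .NET in globalization-invariant mode, so the same strings can sort differently.

Solution: Always use StringComparer.Ordinal for sorting:

// ❌ Wrong - Culture-sensitive, platform-dependent sorting
items.Sort();

// ✅ Correct - Ordinal sorting (compares UTF-16 code units, identical everywhere)
items = items.OrderBy(x => x, StringComparer.Ordinal).ToList();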
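Ordinal order can look surprising at first, because every uppercase ASCII letter sorts before every lowercase one. The sketch below (OrdinalSorting is a hypothetical helper name) shows the resulting order for a small mixed-case input.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns a new list sorted by UTF-16 code unit value.
public static class OrdinalSorting
{
    public static List<string> SortOrdinal(IEnumerable<string> items)
    {
        var sorted = new List<string>(items);
        sorted.Sort(StringComparer.Ordinal);
        return sorted;
    }
}
```

Sorting { "b", "A", "a", "B" } this way yields A, B, a, b on every platform. The order is not meant for display to users, only for producing a stable byte sequence to hash.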

Issue 2: Path Separators

Symptom: Windows produces different hash than macOS/Linux.

Cause: Windows uses \, Unix uses /.

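Solution: Use Path.Combine for file system access, and normalize to forward slashes before a path is hashed or serialized:

// ❌ Wrong - Hardcoded separator
var path = "dir\\file.txt";

// ✅ Correct for file access - but the separator still differs per platform
var path = Path.Combine("dir", "file.txt");

// ✅ Required before hashing - Normalize to forward slash
var normalizedPath = path.Replace('\\', '/');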
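Absolute paths also differ between machines, so it usually helps to hash a path relative to a known root. The following is a minimal sketch under that assumption. CanonicalPath is a hypothetical helper, and it treats every backslash as a separator, which is wrong for the rare Unix file name that contains a literal backslash.

```csharp
using System.IO;

// Hypothetical helper: produces a machine-independent path string for hashing.
public static class CanonicalPath
{
    // Returns fullPath relative to root, using '/' as the only separator.
    public static string RelativeTo(string root, string fullPath)
    {
        var relative = Path.GetRelativePath(root, fullPath);
        return relative.Replace('\\', '/');
    }
}
```

With a root of <temp>/repo and a file at <temp>/repo/src/File.cs, the helper returns "src/File.cs" on Windows, macOS, and Linux alike.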

Issue 3: Line Endings

Symptom: Hash includes file content with different line endings.

Cause: Windows uses CRLF (\r\n), Unix uses LF (\n).

Solution: Normalize to LF:

// ❌ Wrong - Platform line endings
var content = File.ReadAllText(path);

// ✅ Correct - Normalized to LF
var content = File.ReadAllText(path).Replace("\r\n", "\n");
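The one-line fix above leaves a lone carriage return untouched, which can appear in files with mixed or legacy line endings. A slightly more defensive sketch follows. LineEndings is a hypothetical helper name.

```csharp
// Hypothetical helper: converts CRLF and lone CR line endings to LF.
public static class LineEndings
{
    public static string NormalizeToLf(string text) =>
        text.Replace("\r\n", "\n")   // Windows endings first, so CRLF becomes one LF
            .Replace('\r', '\n');    // then any remaining lone CR
}
```

On .NET 6 and later, string.ReplaceLineEndings("\n") is a built-in alternative. It also rewrites less common Unicode newline characters, which may or may not be what a given hash format expects.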

Issue 4: Floating-Point Precision

Symptom: Different platforms produce slightly different floating-point values.

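Cause: Binary floating point cannot represent most decimal fractions exactly, and Math functions such as Math.Pow or Math.Sin call into the platform's C runtime, whose results can differ in the last bits between operating systems and architectures.

Solution: Use decimal for exact arithmetic, or round explicitly:

// ❌ Wrong - Binary floating point is inexact
var value = 0.1 + 0.2;  // 0.30000000000000004, not 0.3

// ✅ Correct - Decimal for exact values
var value = 0.1m + 0.2m;  // Always 0.3

// ✅ Alternative - Round explicitly
var value = Math.Round(0.1 + 0.2, 2);  // 0.3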

Performance Guidelines

Execution Time Targets

Test Type	Target	Max
Single iteration	<100ms	<500ms
10-iteration stability	<1s	<3s
Golden file test	<100ms	<500ms
Full test suite	<5s	<15s

Optimization Tips

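Avoid unnecessary I/O: Create test data in memory
Use Task.CompletedTask: Return Task.CompletedTask or Task.FromResult from synchronous test doubles instead of wrapping work in Task.Run
Minimize allocations: Reuse test data across assertions
Parallel test execution: xUnit runs test collections in parallel by default (one collection per test class); tests within a class run sequentially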

Performance Regression Detection

If test execution time increases by >2x:

Profile with dotnet-trace or BenchmarkDotNet
Identify bottleneck (I/O, CPU, memory)
Optimize or split into separate test
Document performance expectations in test comments

Troubleshooting

Problem: Test Passes 9/10 Times, Fails 1/10

Cause: Non-deterministic input or race condition.

Debug Steps:

Add logging to each iteration:

_output.WriteLine($"Iteration {i}: Input={JsonSerializer.Serialize(input)}, Output={output}");

Look for differences in input or output
Check for Guid.NewGuid(), Random, DateTimeOffset.Now
Check for unsynchronized parallel operations

Problem: Test Fails on Alpine but Passes Elsewhere

Cause: A platform difference between Alpine (musl libc, different globalization data) and glibc-based distributions.

Debug Steps:

Run test locally with Alpine Docker:

docker run -it --rm -v "$(pwd)":/app mcr.microsoft.com/dotnet/sdk:10.0-alpine sh
cd /app
dotnet test --filter "FullyQualifiedName~MyTest"

Compare output with local (glibc) output
Check for string sorting, culture-dependent formatting
Use StringComparer.Ordinal and CultureInfo.InvariantCulture

Problem: Golden Hash Changes After .NET Upgrade

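Cause: A .NET runtime change in how the hashed bytes are produced, most often JSON serialization (escaping, number formatting, property order). SHA-256 itself returns the same digest for the same bytes on every version.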

Debug Steps:

Compare .NET versions:

dotnet --version  # Should be same in CI/CD

Check JsonSerializer behavior:

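// Confirms stability within the current runtime only
var json1 = JsonSerializer.Serialize(input, options);
var json2 = JsonSerializer.Serialize(input, options);
json1.Should().Be(json2);
// To detect a cross-version change, diff this JSON against output captured on the previous .NET version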

If intentional .NET change, follow Breaking Change Process
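One way to reduce exposure to serializer default changes is to pin every option that affects the output and to declare property order explicitly. The sketch below assumes System.Text.Json on .NET 6 or later. EvidenceSample and PinnedJson are hypothetical names, and this is not a full canonical JSON scheme.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical payload with an explicit, declared property order.
public sealed record EvidenceSample(
    [property: JsonPropertyOrder(0)] string Id,
    [property: JsonPropertyOrder(1)] int Count);

// Hypothetical helper: serializes with options that are fixed in code
// rather than inherited from runtime defaults.
public static class PinnedJson
{
    private static readonly JsonSerializerOptions Options = new()
    {
        WriteIndented = false,
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    };

    public static string Serialize<T>(T value) =>
        JsonSerializer.Serialize(value, Options);
}
```

Serializing new EvidenceSample("test-001", 3) produces {"id":"test-001","count":3}. Storing a small expected JSON string like this next to the golden hash makes it easier to see whether a hash change came from the serializer or from the feature itself.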

References

Test README: src/__Tests/Determinism/README.md
Golden File Guide: docs/implplan/archived/2025-12-29-completed-sprints/GOLDEN_FILE_ESTABLISHMENT_GUIDE.md
ADR 0042: CGS Merkle Tree Implementation
ADR 0043: Fulcio Keyless Signing
CI/CD Workflow: .gitea/workflows/cross-platform-determinism.yml

Getting Help

Slack: #determinism-testing
Issue Label: determinism, testing
Priority: High (determinism bugs affect audit trails)
