# Determinism Developer Guide

## Overview
This guide helps developers add new determinism tests to StellaOps. Deterministic behavior is critical for:
- Reproducible verdicts
- Auditable evidence chains
- Cryptographic verification
- Cross-platform consistency
## Table of Contents
- Core Principles
- Test Structure
- Common Patterns
- Anti-Patterns to Avoid
- Adding New Tests
- Cross-Platform Considerations
- Performance Guidelines
- Troubleshooting
## Core Principles

### 1. Determinism Guarantee

**Definition:** Same inputs always produce identical outputs, regardless of:
- Platform (Windows, macOS, Linux, Alpine, Debian)
- Runtime (.NET version, JIT compiler)
- Execution order (parallel vs sequential)
- Time of day
- System locale
### 2. Golden File Philosophy
Golden files are baseline reference values that lock in correct behavior:
- Established after careful verification
- Never changed without ADR and migration plan
- Verified on all platforms before acceptance
### 3. Test Independence
Each test must:
- Not depend on other tests' execution or order
- Clean up resources after completion
- Use isolated data (no shared state)
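A minimal sketch of per-test isolation, using a scratch directory created and deleted by each test instance (the fixture contents are illustrative, not an existing StellaOps helper):

```csharp
using System;
using System.IO;
using Xunit;

public sealed class IsolatedDeterminismTests : IDisposable
{
    // Per-test scratch directory; xUnit creates a new class instance per test,
    // so this state is never shared between tests.
    private readonly string _workDir =
        Directory.CreateTempSubdirectory("determinism-").FullName;

    [Fact]
    public void Feature_UsesOnlyItsOwnState()
    {
        // Arrange - data is created here, not read from shared static state
        var inputPath = Path.Combine(_workDir, "input.json");
        File.WriteAllText(inputPath, """{"id":"test-001"}""");

        // ... act and assert against inputPath only ...
    }

    public void Dispose()
    {
        // Clean up resources after completion
        Directory.Delete(_workDir, recursive: true);
    }
}
```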
## Test Structure

### Standard Test Template
```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Unit)]
public async Task Feature_Behavior_ExpectedOutcome()
{
    // Arrange - Create deterministic inputs
    var input = CreateDeterministicInput();

    // Act - Execute feature
    var output1 = await ExecuteFeature(input);
    var output2 = await ExecuteFeature(input);

    // Assert - Verify determinism
    output1.Should().Be(output2, "same input should produce identical output");
}
```
### Test Organization
```text
src/__Tests/Determinism/
├── CgsDeterminismTests.cs       # CGS hash tests
├── LineageDeterminismTests.cs   # SBOM lineage tests
├── VexDeterminismTests.cs       # VEX consensus tests (future)
├── README.md                    # Test documentation
└── Fixtures/                    # Test data
    ├── known-evidence-pack.json
    ├── known-policy-lock.json
    └── golden-hashes/
        └── cgs-v1.txt
```
## Common Patterns

### Pattern 1: 10-Iteration Stability Test

**Purpose:** Verify that executing the same operation 10 times produces identical results.
```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_SameInput_ProducesIdenticalOutput_Across10Iterations()
{
    // Arrange
    var input = CreateDeterministicInput();
    var service = CreateService();
    var outputs = new List<string>();

    // Act - Execute 10 times
    for (int i = 0; i < 10; i++)
    {
        var result = await service.ProcessAsync(input, CancellationToken.None);
        outputs.Add(result.Hash);
        _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
    }

    // Assert - All hashes should be identical
    outputs.Distinct().Should().HaveCount(1,
        "same input should produce identical output across all iterations");
}
```
**Why 10 iterations?**
- Catches non-deterministic behavior (e.g., GUID generation, random values)
- Reasonable execution time (<5 seconds for most tests)
- Industry standard for determinism verification
### Pattern 2: Golden File Test

**Purpose:** Verify output matches a known-good baseline value.
```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Golden)]
public async Task Feature_WithKnownInput_MatchesGoldenHash()
{
    // Arrange
    var input = CreateKnownInput(); // MUST be completely deterministic
    var service = CreateService();

    // Act
    var result = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    var goldenHash = "sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3";
    _output.WriteLine($"Computed Hash: {result.Hash}");
    _output.WriteLine($"Golden Hash:   {goldenHash}");
    result.Hash.Should().Be(goldenHash, "hash must match golden file");
}
```
**Golden file best practices:**
- Document how golden value was established (date, platform, .NET version)
- Include golden value directly in test code (not external file) for visibility
- Add comment explaining what golden value represents
- Test golden value on all platforms before merging
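One way to follow the first two practices is to keep the golden value as a documented constant next to the test. A sketch; the provenance fields below are placeholders to fill in, not real run metadata:

```csharp
// Golden hash for CGS v1 over the known-evidence-pack fixture.
// Established: <date>, <platform>, <SDK version> - record the actual values.
// Verified on: Windows, macOS, Ubuntu, Alpine, Debian before merge.
// Changing this value requires an ADR and migration plan.
private const string GoldenCgsHashV1 =
    "sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3";
```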
### Pattern 3: Order Independence Test

**Purpose:** Verify that input ordering doesn't affect output.
```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_InputOrder_DoesNotAffectOutput()
{
    // Arrange
    var item1 = CreateItem("A");
    var item2 = CreateItem("B");
    var item3 = CreateItem("C");
    var service = CreateService();

    // Act - Process items in different orders
    var result1 = await service.ProcessAsync(new[] { item1, item2, item3 }, CancellationToken.None);
    var result2 = await service.ProcessAsync(new[] { item3, item1, item2 }, CancellationToken.None);
    var result3 = await service.ProcessAsync(new[] { item2, item3, item1 }, CancellationToken.None);

    // Assert - All should produce same hash
    result1.Hash.Should().Be(result2.Hash, "input order should not affect output");
    result1.Hash.Should().Be(result3.Hash, "input order should not affect output");
    _output.WriteLine($"Order-independent hash: {result1.Hash}");
}
```
**When to use:**
- Collections that should be sorted internally (VEX documents, rules, dependencies)
- APIs that accept unordered inputs (dictionary keys, sets)
- Parallel processing where order is undefined
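Under the hood, order independence is usually achieved by canonicalizing the input before hashing. A minimal sketch, assuming items expose a stable string `Id` (the `Item` type and join format are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public sealed record Item(string Id);

public static class CanonicalHasher
{
    public static string ComputeHash(IEnumerable<Item> items)
    {
        // Sort on a stable key with ordinal comparison so the hash
        // does not depend on the caller's ordering.
        var canonical = string.Join("\n",
            items.Select(i => i.Id).OrderBy(id => id, StringComparer.Ordinal));

        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```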
### Pattern 4: Deterministic Timestamp Test

**Purpose:** Verify that fixed timestamps produce deterministic results.
```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_WithFixedTimestamp_IsDeterministic()
{
    // Arrange - Use FIXED timestamp (not DateTimeOffset.Now!)
    var timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z");
    var input = CreateInputWithTimestamp(timestamp);
    var service = CreateService();

    // Act
    var result1 = await service.ProcessAsync(input, CancellationToken.None);
    var result2 = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    result1.Hash.Should().Be(result2.Hash, "fixed timestamp should produce deterministic output");
}
```
**Timestamp guidelines:**
- ❌ Never use: `DateTimeOffset.Now`, `DateTime.UtcNow`, `Guid.NewGuid()`
- ✅ Always use: `DateTimeOffset.Parse("2025-01-01T00:00:00Z")` for tests
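The production-code counterpart of this guideline is to inject a clock rather than read the system time inline. A minimal sketch using the BCL `TimeProvider` (.NET 8+); tests can pass a provider pinned to a fixed instant, for example `FakeTimeProvider` from the `Microsoft.Extensions.Time.Testing` package:

```csharp
public sealed class EvidenceStamper
{
    private readonly TimeProvider _clock;

    // Production code passes TimeProvider.System;
    // determinism tests pass a provider pinned to a fixed instant.
    public EvidenceStamper(TimeProvider clock) => _clock = clock;

    public DateTimeOffset Stamp() => _clock.GetUtcNow();
}
```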
### Pattern 5: Empty/Minimal Input Test

**Purpose:** Verify that minimal or empty inputs don't cause non-determinism.
```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_EmptyInput_ProducesDeterministicHash()
{
    // Arrange - Minimal input
    var input = CreateEmptyInput();
    var service = CreateService();

    // Act
    var result = await service.ProcessAsync(input, CancellationToken.None);

    // Assert - Verify format (hash may not be golden yet)
    result.Hash.Should().StartWith("sha256:");
    result.Hash.Length.Should().Be(71); // "sha256:" + 64 hex chars
    _output.WriteLine($"Empty input hash: {result.Hash}");
}
```
**Edge cases to test:**
- Empty collections (`Array.Empty<string>()`)
- Null optional fields
- Zero-length strings
- Default values
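These edge cases fold naturally into a single xUnit `[Theory]`; a sketch reusing the hypothetical service and helpers from the pattern above (`CreateInputWithData` is assumed, not an existing helper):

```csharp
public static TheoryData<string[]> EdgeCaseInputs => new()
{
    Array.Empty<string>(),   // empty collection
    new[] { "" },            // zero-length string
    new[] { "item1" }        // minimal input
};

[Theory]
[Trait("Category", TestCategories.Determinism)]
[MemberData(nameof(EdgeCaseInputs))]
public async Task Feature_EdgeCaseInput_ProducesDeterministicHash(string[] data)
{
    var input = CreateInputWithData(data); // hypothetical helper
    var service = CreateService();

    var result1 = await service.ProcessAsync(input, CancellationToken.None);
    var result2 = await service.ProcessAsync(input, CancellationToken.None);

    result1.Hash.Should().Be(result2.Hash, "edge-case inputs must also be deterministic");
}
```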
## Anti-Patterns to Avoid

### ❌ Anti-Pattern 1: Using Current Time
```csharp
// BAD - Non-deterministic!
var input = new Input
{
    Timestamp = DateTimeOffset.Now // ❌ Different every run!
};
```

**Fix:**

```csharp
// GOOD - Deterministic
var input = new Input
{
    Timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z") // ✅ Same every run
};
```
### ❌ Anti-Pattern 2: Using Random Values
```csharp
// BAD - Non-deterministic!
var random = new Random();
var input = new Input
{
    Id = random.Next() // ❌ Different every run!
};
```

**Fix:**

```csharp
// GOOD - Deterministic
var input = new Input
{
    Id = 12345 // ✅ Same every run
};
```
### ❌ Anti-Pattern 3: Using GUID Generation
```csharp
// BAD - Non-deterministic!
var input = new Input
{
    Id = Guid.NewGuid().ToString() // ❌ Different every run!
};
```

**Fix:**

```csharp
// GOOD - Deterministic
var input = new Input
{
    Id = "00000000-0000-0000-0000-000000000001" // ✅ Same every run
};
```
### ❌ Anti-Pattern 4: Using Unordered Collections
```csharp
// BAD - Dictionary iteration order is NOT guaranteed!
var dict = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2"
};

foreach (var kvp in dict) // ❌ Order may vary!
{
    hash.Update(kvp.Key);
}
```

**Fix:**

```csharp
// GOOD - Explicit ordering
var dict = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2"
};

foreach (var kvp in dict.OrderBy(x => x.Key, StringComparer.Ordinal)) // ✅ Consistent order
{
    hash.Update(kvp.Key);
}
```
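Alternatively, a collection that is ordered by construction removes the need to remember the sort at each call site; a sketch using the same hypothetical `hash` object as above:

```csharp
// GOOD - SortedDictionary keeps keys in comparer order at all times
var dict = new SortedDictionary<string, string>(StringComparer.Ordinal)
{
    ["key1"] = "value1",
    ["key2"] = "value2"
};

foreach (var kvp in dict) // ✅ Always ordinal key order
{
    hash.Update(kvp.Key);
}
```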
### ❌ Anti-Pattern 5: Platform-Specific Paths
```csharp
// BAD - Platform-specific!
var path = "dir\\file.txt"; // ❌ Windows-only!
```

**Fix:**

```csharp
// GOOD - Cross-platform
var path = Path.Combine("dir", "file.txt"); // ✅ Works everywhere
```
### ❌ Anti-Pattern 6: Culture-Dependent Formatting
```csharp
// BAD - Culture-dependent!
var formatted = value.ToString(); // ❌ Locale-specific!
```

**Fix:**

```csharp
// GOOD - Culture-invariant
var formatted = value.ToString(CultureInfo.InvariantCulture); // ✅ Same everywhere
```
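Numbers with separators and dates hit the same pitfall; a short sketch of invariant formatting for both:

```csharp
using System.Globalization;

var amount = 1234.5;
var timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z", CultureInfo.InvariantCulture);

var bad = amount.ToString();                              // ❌ "1234,5" under de-DE
var good = amount.ToString(CultureInfo.InvariantCulture); // ✅ "1234.5" everywhere
var iso = timestamp.ToString("O");                        // ✅ round-trip ISO 8601, culture-insensitive
```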
## Adding New Tests

### Step 1: Identify Determinism Requirement

Ask yourself:
- Does this feature produce a hash, signature, or cryptographic output?
- Will this feature's output be stored and verified later?
- Does this feature need to be reproducible across platforms?
- Is this feature part of an audit trail?
If YES to any → Add determinism test.
### Step 2: Create Test File

```bash
cd src/__Tests/Determinism
touch MyFeatureDeterminismTests.cs
```
### Step 3: Write Test Class
```csharp
using FluentAssertions;
using Microsoft.Extensions.Logging.Abstractions;
using StellaOps.TestKit;
using Xunit;
using Xunit.Abstractions;

namespace StellaOps.Tests.Determinism;

/// <summary>
/// Determinism tests for [Feature Name].
/// Verifies that [specific behavior] is deterministic across platforms and runs.
/// </summary>
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Unit)]
public sealed class MyFeatureDeterminismTests
{
    private readonly ITestOutputHelper _output;

    public MyFeatureDeterminismTests(ITestOutputHelper output)
    {
        _output = output;
    }

    [Fact]
    public async Task MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations()
    {
        // Arrange
        var input = CreateDeterministicInput();
        var service = CreateMyFeatureService();
        var outputs = new List<string>();

        // Act - Execute 10 times
        for (int i = 0; i < 10; i++)
        {
            var result = await service.ProcessAsync(input, CancellationToken.None);
            outputs.Add(result.Hash);
            _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
        }

        // Assert - All hashes should be identical
        outputs.Distinct().Should().HaveCount(1,
            "same input should produce identical output across all iterations");
    }

    #region Helper Methods

    private static MyInput CreateDeterministicInput()
    {
        return new MyInput
        {
            // ✅ Use fixed values
            Id = "test-001",
            Timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z"),
            Data = new[] { "item1", "item2", "item3" }
        };
    }

    private static MyFeatureService CreateMyFeatureService()
    {
        return new MyFeatureService(NullLogger<MyFeatureService>.Instance);
    }

    #endregion
}
```
### Step 4: Run Test Locally 10 Times

```bash
for i in {1..10}; do
  echo "=== Run $i ==="
  dotnet test --filter "FullyQualifiedName~MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations"
done
```
**Expected:** All 10 runs pass with identical output.
### Step 5: Add to CI/CD

The test is included automatically on push; no configuration is needed. The CI/CD workflow `.gitea/workflows/cross-platform-determinism.yml` runs all `Category=Determinism` tests on 5 platforms.
### Step 6: Document in README

Update `src/__Tests/Determinism/README.md`:

```markdown
### MyFeature Determinism

Tests that verify [feature] hash computation is deterministic:
- **10-Iteration Stability**: Same input produces identical hash 10 times
- **Order Independence**: Input ordering doesn't affect hash
- **Empty Input**: Minimal input produces deterministic hash
```
## Cross-Platform Considerations

### Platform Matrix

Tests run on:
- **Windows** (`windows-latest`): Microsoft UCRT, CRLF line endings
- **macOS** (`macos-latest`): BSD-derived libc, LF line endings
- **Linux Ubuntu** (`ubuntu-latest`): glibc, LF line endings
- **Linux Alpine** (Alpine Docker): musl libc, LF line endings
- **Linux Debian** (Debian Docker): glibc, LF line endings
### Common Cross-Platform Issues

#### Issue 1: String Sorting (musl vs glibc)

**Symptom:** Alpine produces a different hash than Ubuntu.

**Cause:** musl libc implements collation (`strcoll` and related behavior) differently than glibc, so culture-sensitive sorting can differ.

**Solution:** Always use `StringComparer.Ordinal` for sorting:
```csharp
// ❌ Wrong - Platform-dependent sorting
items.Sort();

// ✅ Correct - Culture-invariant sorting
items = items.OrderBy(x => x, StringComparer.Ordinal).ToList();
```
#### Issue 2: Path Separators

**Symptom:** Windows produces a different hash than macOS/Linux.

**Cause:** Windows uses `\`; Unix uses `/`.

**Solution:** Use `Path.Combine` or normalize:
```csharp
// ❌ Wrong - Hardcoded separator
var path = "dir\\file.txt";

// ✅ Correct - Cross-platform
var path = Path.Combine("dir", "file.txt");

// ✅ Alternative - Normalize to forward slash
var normalizedPath = path.Replace('\\', '/');
```
#### Issue 3: Line Endings

**Symptom:** The hash covers file content whose line endings differ by platform.

**Cause:** Windows uses CRLF (`\r\n`); Unix uses LF (`\n`).

**Solution:** Normalize to LF:
```csharp
// ❌ Wrong - Platform line endings
var content = File.ReadAllText(path);

// ✅ Correct - Normalized to LF
var content = File.ReadAllText(path).Replace("\r\n", "\n");
```
#### Issue 4: Floating-Point Precision

**Symptom:** Different platforms produce slightly different floating-point values.

**Cause:** JIT compiler optimizations and FPU rounding modes.

**Solution:** Use `decimal` for exact arithmetic, or round explicitly:
```csharp
// ❌ Wrong - Floating-point non-determinism
var value = 0.1 + 0.2; // Might be 0.30000000000000004

// ✅ Correct - Decimal for exact values
var value = 0.1m + 0.2m; // Always 0.3

// ✅ Alternative - Round explicitly
var value = Math.Round(0.1 + 0.2, 2); // 0.30
```
## Performance Guidelines

### Execution Time Targets
| Test Type | Target | Max |
|---|---|---|
| Single iteration | <100ms | <500ms |
| 10-iteration stability | <1s | <3s |
| Golden file test | <100ms | <500ms |
| Full test suite | <5s | <15s |
### Optimization Tips

- **Avoid unnecessary I/O**: Create test data in memory
- **Return `Task.CompletedTask`**: For synchronous implementations of async interfaces (see the sketch below)
- **Minimize allocations**: Reuse test data across assertions
- **Parallel test execution**: xUnit runs tests in parallel by default
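For the `Task.CompletedTask` tip, a sketch of a synchronous test double implementing an async interface (the interface and type names are illustrative, not existing StellaOps types):

```csharp
public interface IEvidenceSink
{
    Task WriteAsync(string hash, CancellationToken ct);
}

// In-memory test double: no I/O, completes synchronously
public sealed class InMemoryEvidenceSink : IEvidenceSink
{
    public List<string> Hashes { get; } = new();

    public Task WriteAsync(string hash, CancellationToken ct)
    {
        Hashes.Add(hash);
        return Task.CompletedTask; // ✅ no per-call Task allocation
    }
}
```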
### Performance Regression Detection

If a test's execution time increases by more than 2x:
1. Profile with `dotnet-trace` or BenchmarkDotNet (see the sketch below)
2. Identify the bottleneck (I/O, CPU, memory)
3. Optimize, or split into a separate test
4. Document performance expectations in test comments
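A minimal BenchmarkDotNet sketch for step 1; `CgsHasher` stands in for whatever service regressed and is an assumption, not an existing type:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class CgsHashBenchmark
{
    private readonly CgsHasher _hasher = new();          // hypothetical service under test
    private readonly byte[] _payload = new byte[64 * 1024];

    [Benchmark]
    public string HashPayload() => _hasher.ComputeHash(_payload);
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<CgsHashBenchmark>();
}
```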
## Troubleshooting

### Problem: Test Passes 9/10 Times, Fails 1/10

**Cause:** Non-deterministic input or race condition.
**Debug steps:**
1. Add logging to each iteration:
   ```csharp
   _output.WriteLine($"Iteration {i}: Input={JsonSerializer.Serialize(input)}, Output={output}");
   ```
2. Look for differences in input or output
3. Check for `Guid.NewGuid()`, `Random`, `DateTimeOffset.Now`
4. Check for unsynchronized parallel operations
### Problem: Test Fails on Alpine but Passes Elsewhere

**Cause:** musl libc vs glibc difference.
**Debug steps:**
1. Run the test locally in an Alpine Docker container:
   ```bash
   docker run -it --rm -v "$(pwd)":/app mcr.microsoft.com/dotnet/sdk:10.0-alpine sh
   cd /app
   dotnet test --filter "FullyQualifiedName~MyTest"
   ```
2. Compare the output with your local (glibc) output
3. Check for string sorting and culture-dependent formatting
4. Use `StringComparer.Ordinal` and `CultureInfo.InvariantCulture`
### Problem: Golden Hash Changes After .NET Upgrade

**Cause:** A .NET runtime change in JSON serialization or hashing behavior.
**Debug steps:**
1. Compare .NET versions (the local version should match CI/CD):
   ```bash
   dotnet --version
   ```
2. Check `JsonSerializer` behavior:
   ```csharp
   var json1 = JsonSerializer.Serialize(input, options);
   var json2 = JsonSerializer.Serialize(input, options);
   json1.Should().Be(json2);
   ```
3. If the change is intentional in .NET, follow the Breaking Change Process
## References

- Test README: `src/__Tests/Determinism/README.md`
- Golden File Guide: `docs/implplan/archived/2025-12-29-completed-sprints/GOLDEN_FILE_ESTABLISHMENT_GUIDE.md`
- ADR 0042: CGS Merkle Tree Implementation
- ADR 0043: Fulcio Keyless Signing
- CI/CD Workflow: `.gitea/workflows/cross-platform-determinism.yml`
## Getting Help

- **Slack:** #determinism-testing
- **Issue Labels:** `determinism`, `testing`
- **Priority:** High (determinism bugs affect audit trails)