Determinism Tests
This test project verifies that StellaOps produces deterministic outputs across platforms, runs, and configurations. Deterministic behavior is critical for reproducible verdicts, auditable evidence chains, and cryptographic verification.
Test Categories
CGS (Canonical Graph Signature) Determinism
Tests that verify verdict hash computation is deterministic:
- Golden File Tests: Known evidence produces expected hash
- 10-Iteration Stability: Same input produces identical hash 10 times
- VEX Order Independence: VEX document ordering doesn't affect hash
- Reachability Graph Tests: Reachability inclusion changes hash predictably
- Policy Lock Tests: Different policy versions produce different hashes
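The order-independence property above can be illustrated with a minimal sketch. It is not the StellaOps CGS algorithm: `OrderIndependentHashSketch`, the newline separator, and the use of plain identifiers as input are assumptions made for illustration. Only the `cgs:sha256:` prefix is taken from the hash format shown in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration only; not the real CGS implementation.
public static class OrderIndependentHashSketch
{
    // Hashes a collection of document identifiers so that input order does not matter.
    public static string Compute(IEnumerable<string> documentIds)
    {
        // Ordinal sorting is independent of culture and platform collation rules.
        var canonical = documentIds.OrderBy(id => id, StringComparer.Ordinal);

        // Assumption: identifiers never contain a newline, so it is a safe separator.
        var payload = string.Join("\n", canonical);

        // Encode explicitly as UTF-8 before hashing.
        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(payload));
        return "cgs:sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Calling `Compute` with `{ "vex-b", "vex-a" }` and with `{ "vex-a", "vex-b" }` yields the same string, because both inputs are sorted into the same canonical sequence before hashing.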
Cross-Platform Verification
Tests run on multiple platforms via CI/CD:
- Windows (MSVC C runtime)
- macOS (BSD libc)
- Linux Ubuntu (glibc)
- Linux Alpine (musl libc)
- Linux Debian (glibc)
Running Tests Locally
Prerequisites
- .NET 10 SDK
- Docker (for Testcontainers, if needed)
Run All Determinism Tests
cd src/__Tests/Determinism
dotnet test
Run Specific Test Category
# Run only determinism tests
dotnet test --filter "Category=Determinism"
# Run only unit tests
dotnet test --filter "Category=Unit"
Run with Detailed Output
dotnet test --logger "console;verbosity=detailed"
Run and Generate TRX Report
dotnet test --logger "trx;LogFileName=determinism-results.trx" --results-directory ./test-results
Test Structure
CgsDeterminismTests.cs
├── Golden File Tests
│ ├── CgsHash_WithKnownEvidence_MatchesGoldenHash
│ └── CgsHash_EmptyEvidence_ProducesDeterministicHash
├── 10-Iteration Stability Tests
│ ├── CgsHash_SameInput_ProducesIdenticalHash_Across10Iterations
│ ├── CgsHash_VexOrderIndependent_ProducesIdenticalHash
│ └── CgsHash_WithReachability_IsDifferentFromWithout
└── Policy Lock Determinism Tests
└── CgsHash_DifferentPolicyVersion_ProducesDifferentHash
Golden File Workflow
Initial Baseline (First Time)
- Run tests locally to compute initial hash:
dotnet test --filter "FullyQualifiedName~CgsHash_WithKnownEvidence_MatchesGoldenHash"
- Observe the computed CGS hash in test output:
Computed CGS: cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
Golden CGS: cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
- Verify hash matches expected value (line 59 in CgsDeterminismTests.cs)
- Uncomment golden hash assertion (line 69):
result.CgsHash.Should().Be(goldenHash, "CGS hash must match golden file");
- Commit the change to lock in the golden hash
Verifying Golden Hash Stability
After establishing the baseline:
# Run 10 times to verify stability
for i in {1..10}; do
echo "Iteration $i"
dotnet test --filter "FullyQualifiedName~CgsHash_WithKnownEvidence_MatchesGoldenHash" --logger "console;verbosity=minimal"
done
All iterations should pass and report an identical hash.
Golden Hash Changes
⚠️ BREAKING CHANGE: If golden hash tests fail, the CGS algorithm (or an input to it, such as JSON serialization) has changed!
Impact:
- All historical verdicts become unverifiable
- Stored CGS hashes no longer match recomputed values
- Audit trails are broken
Process for Intentional Changes:
- Document the reason for algorithm change in ADR
- Create migration guide for existing verdicts
- Update golden hash in test
- Coordinate with all deployments
- Plan for dual-algorithm support during transition
CI/CD Integration
Cross-Platform Workflow
File: .gitea/workflows/cross-platform-determinism.yml
Triggers:
- Push to main branch
- Pull requests targeting main
- Manual dispatch
Platform Matrix:
- Windows: windows-latest
- macOS: macos-latest
- Linux: ubuntu-latest
- Alpine: mcr.microsoft.com/dotnet/sdk:10.0-alpine (musl libc)
- Debian: mcr.microsoft.com/dotnet/sdk:10.0-bookworm-slim
Outputs:
- TRX test results per platform
- Cross-platform hash comparison report
- Divergence detection (fails if hashes differ)
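As a rough illustration of divergence detection, the sketch below groups platforms by the hash they reported and flags a divergence when more than one distinct hash exists. The workflow's actual comparison step is not shown in this document, so `DivergenceCheckSketch` and its input shape (platform name mapped to hash) are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the real workflow may compare hashes differently.
public static class DivergenceCheckSketch
{
    // Groups platform names by the hash each platform produced.
    public static IReadOnlyDictionary<string, string[]> GroupByHash(
        IReadOnlyDictionary<string, string> hashByPlatform) =>
        hashByPlatform
            .GroupBy(kv => kv.Value, kv => kv.Key, StringComparer.Ordinal)
            .ToDictionary(
                g => g.Key,
                g => g.OrderBy(p => p, StringComparer.Ordinal).ToArray(),
                StringComparer.Ordinal);

    // True when at least two platforms disagree on the hash.
    public static bool HasDivergence(IReadOnlyDictionary<string, string> hashByPlatform) =>
        GroupByHash(hashByPlatform).Count > 1;
}
```

Grouping rather than only counting distinct values makes the report more useful: when Alpine disagrees with the other platforms, the group with a single member identifies it immediately.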
Running CI/CD Locally
Use act to run the Gitea Actions workflow locally:
# Install act (if not already installed)
# macOS: brew install act
# Linux: curl https://raw.githubusercontent.com/nektos/act/master/install.sh | sudo bash
# Run cross-platform determinism workflow
act -W .gitea/workflows/cross-platform-determinism.yml
Troubleshooting
Tests Fail with "Hashes Don't Match"
Symptoms:
Expected hashes.Distinct() to have count 1, but found 2.
Cause: Non-deterministic input or platform-specific behavior
Solutions:
- Check for timestamp usage (use fixed DateTimeOffset.Parse("2025-01-01T00:00:00Z"))
- Check for dictionary ordering (use OrderBy)
- Check for GUID generation (use fixed GUIDs in tests)
- Check for floating-point arithmetic (use decimal for determinism)
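The four checks above can be combined in a small sketch of a deterministic test input. `DeterministicInputSketch`, the fixed GUID value, and the `key=value` output layout are illustrative assumptions, not part of the test project.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper showing fixed inputs and stable ordering.
public static class DeterministicInputSketch
{
    // Fixed timestamp instead of DateTimeOffset.UtcNow.
    public static readonly DateTimeOffset FixedTimestamp =
        DateTimeOffset.Parse("2025-01-01T00:00:00Z", CultureInfo.InvariantCulture);

    // Fixed GUID instead of Guid.NewGuid(); the value itself is arbitrary.
    public static readonly Guid FixedId = Guid.Parse("11111111-1111-1111-1111-111111111111");

    // Serializes scores in a stable order, using decimal rather than double.
    public static string Serialize(IDictionary<string, decimal> scores)
    {
        var parts = scores
            .OrderBy(kv => kv.Key, StringComparer.Ordinal) // do not rely on dictionary enumeration order
            .Select(kv => kv.Key + "=" + kv.Value.ToString(CultureInfo.InvariantCulture));

        return FixedId.ToString("D") + "|"
            + FixedTimestamp.ToString("O", CultureInfo.InvariantCulture) + "|"
            + string.Join(";", parts);
    }
}
```

Two dictionaries holding the same entries, inserted in a different order, serialize to the same string, so any hash computed over that string is stable as well.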
Tests Fail on Alpine (musl libc)
Symptoms:
Hash divergence detected: Alpine produces different hash than Ubuntu
Cause: musl libc vs glibc differences in string handling, sorting, or crypto
Solutions:
- Use StringComparer.Ordinal for all sorting
- Use Encoding.UTF8.GetBytes() explicitly (don't rely on platform default)
- Use CultureInfo.InvariantCulture for number/date formatting
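A minimal sketch of the three recommendations follows. Culture-sensitive comparison and formatting can vary with the globalization data available on a platform, while the calls below do not depend on it. `PlatformNeutralSketch` is a hypothetical name, and the choice of the "R" format for doubles is an assumption for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper collecting platform-neutral string handling.
public static class PlatformNeutralSketch
{
    // Sorts by UTF-16 code unit value, ignoring culture-specific collation.
    public static string[] SortOrdinal(IEnumerable<string> values) =>
        values.OrderBy(v => v, StringComparer.Ordinal).ToArray();

    // Encodes explicitly as UTF-8 instead of relying on a platform default.
    public static byte[] ToUtf8(string text) => Encoding.UTF8.GetBytes(text);

    // Formats with the invariant culture so the decimal separator is always '.'.
    public static string FormatNumber(double value) =>
        value.ToString("R", CultureInfo.InvariantCulture);
}
```

With ordinal sorting, uppercase ASCII letters sort before lowercase ones, so `{ "b", "A", "a" }` becomes `{ "A", "a", "b" }` on every platform.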
Golden Hash Test Fails After Upgrade
Symptoms:
Expected "cgs:sha256:abc123..." but found "cgs:sha256:def456..."
Cause: .NET upgrade changed hash computation or JSON serialization
Solutions:
- Verify .NET version in CI/CD matches local (should be 10.0.100)
- Check CanonicalJsonOptions configuration (line 33 in CgsDeterminismTests.cs)
- Review recent changes to VerdictBuilderService.cs
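When comparing a local run with CI/CD, it can help to log which runtime actually executed the test. The sketch below is one possible way to do that; `RuntimeFingerprintSketch` is a hypothetical helper. Note that `Environment.Version` reports the runtime version, which is distinct from the SDK version printed by `dotnet --version`.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper that summarizes the executing runtime and platform.
public static class RuntimeFingerprintSketch
{
    // Returns a single line suitable for writing to test output next to the hash.
    public static string Describe() =>
        $"runtime={Environment.Version}; framework={RuntimeInformation.FrameworkDescription}; " +
        $"os={RuntimeInformation.OSDescription}; arch={RuntimeInformation.ProcessArchitecture}";
}
```

Writing this line to the test log alongside the computed hash makes it easier to tell whether two diverging hashes came from different runtime versions.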
Flaky Tests (Intermittent Failures)
Symptoms:
Test passes 9/10 times, fails 1/10
Cause: Race condition, timing dependency, or non-deterministic input
Solutions:
- Add Interlocked for thread-safe counters
- Use TaskCompletionSource instead of Task.Delay for synchronization
- Remove randomness (no Random or Guid.NewGuid() in test inputs)
- Fix ordering of parallel operations
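The first two solutions can be sketched together: workers wait on an explicit signal instead of a timed delay, and the shared counter is updated atomically. `SynchronizationSketch` is a hypothetical example and is not taken from the test project.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper demonstrating signal-based synchronization.
public static class SynchronizationSketch
{
    // Starts `workers` tasks that each increment a shared counter once.
    public static async Task<int> CountInParallelAsync(int workers)
    {
        var counter = 0;
        var start = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);

        var tasks = Enumerable.Range(0, workers).Select(_ => Task.Run(async () =>
        {
            // Wait for an explicit signal rather than guessing a duration with Task.Delay.
            await start.Task;

            // Atomic increment; a plain counter++ could lose updates under contention.
            Interlocked.Increment(ref counter);
        })).ToArray();

        start.SetResult();          // release all workers
        await Task.WhenAll(tasks);  // wait until every increment has completed
        return counter;
    }
}
```

Because completion is awaited explicitly, the result equals the number of workers regardless of scheduling or machine load.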
Adding New Determinism Tests
Step 1: Create Test Method
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task MyNewFeature_IsDeterministic()
{
// Arrange
var evidence = CreateKnownEvidencePack();
var policyLock = CreateKnownPolicyLock();
var service = CreateVerdictBuilder();
// Act
var result1 = await service.BuildAsync(evidence, policyLock, CancellationToken.None);
var result2 = await service.BuildAsync(evidence, policyLock, CancellationToken.None);
// Assert
result1.CgsHash.Should().Be(result2.CgsHash, "same input should produce same hash");
}
Step 2: Run Locally 10 Times
for i in {1..10}; do
dotnet test --filter "FullyQualifiedName~MyNewFeature_IsDeterministic"
done
Step 3: Verify Cross-Platform
Push to branch and check CI/CD results:
- Windows ✅
- macOS ✅
- Linux ✅
- Alpine ✅
- Debian ✅
Step 4: Document Edge Cases
Add comments explaining:
- What makes this test deterministic
- Any platform-specific considerations
- Expected hash format/structure
Performance Baselines
Typical test execution times (on CI/CD runners):
| Test | Windows | macOS | Linux | Alpine | Debian |
|---|---|---|---|---|---|
| Golden File Test | <100ms | <100ms | <100ms | <150ms | <100ms |
| 10-Iteration Stability | <1s | <1s | <1s | <1.5s | <1s |
| VEX Order Independence | <200ms | <200ms | <200ms | <300ms | <200ms |
| Total Suite | <3s | <3s | <3s | <4s | <3s |
If tests exceed these baselines by 2x, investigate performance regression.
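The 2x rule can be expressed as a small check. This is a sketch under assumptions: `BaselineCheckSketch` is a hypothetical helper, and the baseline value must be supplied by the caller from the table above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for comparing a measured duration against a baseline.
public static class BaselineCheckSketch
{
    // Measures how long the given action takes to run.
    public static TimeSpan Measure(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // True when the measured time is strictly greater than baseline * factor.
    public static bool ExceedsBaseline(TimeSpan measured, TimeSpan baseline, double factor = 2.0) =>
        measured.TotalMilliseconds > baseline.TotalMilliseconds * factor;
}
```

For example, with a 100 ms baseline, a 250 ms run is flagged while a 150 ms run is not.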
References
- Architecture: docs/modules/verdict/architecture.md (CGS section)
- Sprint Documentation: docs/implplan/archived/SPRINT_20251229_001_001_BE_cgs_infrastructure.md
- Batch Summary: docs/implplan/archived/2025-12-29-completed-sprints/BATCH_20251229_BE_COMPLETION_SUMMARY.md
- CI/CD Workflow: .gitea/workflows/cross-platform-determinism.yml
Contact
For questions or issues:
- Create issue in repository
- Tag: determinism, testing, cgs
- Priority: High (determinism bugs affect audit trails)