Determinism Tests

This test project verifies that StellaOps produces deterministic outputs across platforms, runs, and configurations. Deterministic behavior is critical for reproducible verdicts, auditable evidence chains, and cryptographic verification.

Test Categories

CGS (Canonical Graph Signature) Determinism

Tests that verify verdict hash computation is deterministic:

  • Golden File Tests: Known evidence produces expected hash
  • 10-Iteration Stability: Same input produces identical hash 10 times
  • VEX Order Independence: VEX document ordering doesn't affect hash
  • Reachability Graph Tests: Reachability inclusion changes hash predictably
  • Policy Lock Tests: Different policy versions produce different hashes
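The properties above can be illustrated with a small stand-alone sketch. It is not the real CGS algorithm: `ComputeSketchHash`, the evidence identifiers, and the way the policy version is folded in are all hypothetical and only show the shape of the checks (order independence, policy sensitivity, 10-iteration stability). The `cgs:sha256:` prefix follows the hash format shown elsewhere in this README.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for the CGS computation: sort the inputs ordinally,
// join them into one canonical string, and hash the UTF-8 bytes.
static string ComputeSketchHash(string[] vexDocumentIds, string policyVersion)
{
    var canonical = string.Join("\n",
        vexDocumentIds.OrderBy(id => id, StringComparer.Ordinal)
                      .Prepend("policy=" + policyVersion));
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
    return "cgs:sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
}

// VEX order independence: same documents, different order.
string forward = ComputeSketchHash(new[] { "vex-a", "vex-b", "vex-c" }, "1.0.0");
string reversed = ComputeSketchHash(new[] { "vex-c", "vex-b", "vex-a" }, "1.0.0");

// Policy lock: a different policy version must change the hash.
string otherPolicy = ComputeSketchHash(new[] { "vex-a", "vex-b", "vex-c" }, "2.0.0");

// 10-iteration stability: ten runs must collapse to one distinct value.
int distinctHashes = Enumerable.Range(0, 10)
    .Select(_ => ComputeSketchHash(new[] { "vex-a", "vex-b", "vex-c" }, "1.0.0"))
    .Distinct()
    .Count();

Console.WriteLine(forward == reversed);    // True
Console.WriteLine(forward != otherPolicy); // True
Console.WriteLine(distinctHashes);         // 1
```

Sorting before hashing is what makes the result independent of input order; the real implementation may canonicalize differently.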

Cross-Platform Verification

Tests run on multiple platforms via CI/CD:

  • Windows (Microsoft C runtime)
  • macOS (BSD libc)
  • Linux Ubuntu (glibc)
  • Linux Alpine (musl libc)
  • Linux Debian (glibc)

Running Tests Locally

Prerequisites

  • .NET 10 SDK
  • Docker (for Testcontainers, if needed)

Run All Determinism Tests

cd src/__Tests/Determinism
dotnet test

Run Specific Test Category

# Run only determinism tests
dotnet test --filter "Category=Determinism"

# Run only unit tests
dotnet test --filter "Category=Unit"

Run with Detailed Output

dotnet test --logger "console;verbosity=detailed"

Run and Generate TRX Report

dotnet test --logger "trx;LogFileName=determinism-results.trx" --results-directory ./test-results

Test Structure

CgsDeterminismTests.cs
├── Golden File Tests
│   ├── CgsHash_WithKnownEvidence_MatchesGoldenHash
│   └── CgsHash_EmptyEvidence_ProducesDeterministicHash
├── 10-Iteration Stability Tests
│   ├── CgsHash_SameInput_ProducesIdenticalHash_Across10Iterations
│   ├── CgsHash_VexOrderIndependent_ProducesIdenticalHash
│   └── CgsHash_WithReachability_IsDifferentFromWithout
└── Policy Lock Determinism Tests
    └── CgsHash_DifferentPolicyVersion_ProducesDifferentHash

Golden File Workflow

Initial Baseline (First Time)

  1. Run tests locally to compute initial hash:

    dotnet test --filter "FullyQualifiedName~CgsHash_WithKnownEvidence_MatchesGoldenHash"
    
  2. Observe the computed CGS hash in test output:

    Computed CGS: cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
    Golden CGS:   cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
    
  3. Verify hash matches expected value (line 59 in CgsDeterminismTests.cs)

  4. Uncomment golden hash assertion (line 69):

    result.CgsHash.Should().Be(goldenHash, "CGS hash must match golden file");
    
  5. Commit the change to lock in the golden hash

Verifying Golden Hash Stability

After establishing the baseline:

# Run 10 times to verify stability; stop at the first failure (requires bash)
for i in {1..10}; do
  echo "Iteration $i"
  dotnet test --filter "FullyQualifiedName~CgsHash_WithKnownEvidence_MatchesGoldenHash" --logger "console;verbosity=minimal" || { echo "Failed on iteration $i"; break; }
done

All iterations should pass with identical hash.

Golden Hash Changes

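⚠️ BREAKING CHANGE: A failing golden hash test means the computed CGS hash no longer matches the committed baseline, i.e. the CGS algorithm or its canonical serialization has changed!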

Impact:

  • All historical verdicts become unverifiable
  • Stored CGS hashes no longer match recomputed values
  • Audit trails are broken

Process for Intentional Changes:

  1. Document the reason for algorithm change in ADR
  2. Create migration guide for existing verdicts
  3. Update golden hash in test
  4. Coordinate with all deployments
  5. Plan for dual-algorithm support during transition

CI/CD Integration

Cross-Platform Workflow

File: .gitea/workflows/cross-platform-determinism.yml

Triggers:

  • Push to main branch
  • Pull requests targeting main
  • Manual dispatch

Platform Matrix:

  • Windows: windows-latest
  • macOS: macos-latest
  • Linux: ubuntu-latest
  • Alpine: mcr.microsoft.com/dotnet/sdk:10.0-alpine (musl libc)
  • Debian: mcr.microsoft.com/dotnet/sdk:10.0-bookworm-slim

Outputs:

  • TRX test results per platform
  • Cross-platform hash comparison report
  • Divergence detection (fails if hashes differ)
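As an illustration of the divergence check, the sketch below groups platforms by the hash they reported and treats more than one group as a failure. `DescribeDivergence`, the platform keys, and the hash values are hypothetical placeholders; how the workflow actually collects and compares hashes is not shown in this README.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns one line per distinct hash when platforms disagree,
// or an empty list when every platform reported the same hash.
static List<string> DescribeDivergence(IReadOnlyDictionary<string, string> hashByPlatform)
{
    var groups = hashByPlatform
        .GroupBy(kv => kv.Value, kv => kv.Key, StringComparer.Ordinal)
        .ToList();
    if (groups.Count <= 1)
    {
        return new List<string>();
    }
    return groups
        .Select(g => g.Key + " <- " + string.Join(", ", g.OrderBy(p => p, StringComparer.Ordinal)))
        .OrderBy(line => line, StringComparer.Ordinal)
        .ToList();
}

// Placeholder values, not real CGS hashes.
var reported = new Dictionary<string, string>
{
    ["windows"] = "cgs:sha256:aaaa",
    ["macos"] = "cgs:sha256:aaaa",
    ["ubuntu"] = "cgs:sha256:aaaa",
    ["alpine"] = "cgs:sha256:bbbb",
    ["debian"] = "cgs:sha256:aaaa",
};

List<string> divergence = DescribeDivergence(reported);
foreach (string line in divergence)
{
    Console.WriteLine(line);
}
Console.WriteLine(divergence.Count == 0 ? "OK" : "DIVERGENCE");
```

Listing the platforms behind each hash makes it easier to see which environment is the outlier.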

Running CI/CD Locally

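Use act to run the workflow locally. act was built for GitHub Actions and runs jobs in Docker containers by default, so expect it to cover the Linux-based jobs; the Windows and macOS jobs are typically exercised only in CI: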

# Install act (if not already installed)
# macOS: brew install act
# Linux: curl https://raw.githubusercontent.com/nektos/act/master/install.sh | sudo bash

# Run cross-platform determinism workflow
act -W .gitea/workflows/cross-platform-determinism.yml

Troubleshooting

Tests Fail with "Hashes Don't Match"

Symptoms:

Expected hashes.Distinct() to have count 1, but found 2.

Cause: Non-deterministic input or platform-specific behavior

Solutions:

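  1. Check for timestamp usage (use a fixed value such as DateTimeOffset.Parse("2025-01-01T00:00:00Z", CultureInfo.InvariantCulture))
  2. Check for dictionary ordering (enumeration order is not guaranteed; sort keys with OrderBy and StringComparer.Ordinal)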
  3. Check for GUID generation (use fixed GUIDs in tests)
  4. Check for floating-point arithmetic (use decimal for determinism)
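The four checks above can be sketched together. `Canonicalize` and the field names are hypothetical and exist only for illustration; the point is that fixed timestamps, fixed GUIDs, ordinal key ordering, and decimal arithmetic remove the usual sources of run-to-run variation.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Fixed values replace DateTimeOffset.UtcNow and Guid.NewGuid() in test inputs.
var fixedTime = DateTimeOffset.Parse("2025-01-01T00:00:00Z", CultureInfo.InvariantCulture);
var fixedId = Guid.Parse("00000000-0000-0000-0000-000000000001");

// Hypothetical canonical form: timestamp, id, then fields sorted by ordinal key.
static string Canonicalize(Dictionary<string, string> fields, DateTimeOffset timestamp, Guid id)
{
    var parts = new List<string>
    {
        "ts=" + timestamp.ToString("O", CultureInfo.InvariantCulture),
        "id=" + id.ToString("D"),
    };
    parts.AddRange(fields.OrderBy(kv => kv.Key, StringComparer.Ordinal)
                         .Select(kv => kv.Key + "=" + kv.Value));
    return string.Join(";", parts);
}

// Same content, different insertion order.
var first = new Dictionary<string, string> { ["severity"] = "high", ["component"] = "libfoo" };
var second = new Dictionary<string, string> { ["component"] = "libfoo", ["severity"] = "high" };

string canonicalFirst = Canonicalize(first, fixedTime, fixedId);
string canonicalSecond = Canonicalize(second, fixedTime, fixedId);
Console.WriteLine(canonicalFirst == canonicalSecond); // True

// decimal keeps base-10 fractions exact where double does not.
Console.WriteLine(0.1m + 0.2m == 0.3m); // True
Console.WriteLine(0.1 + 0.2 == 0.3);    // False
```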

Tests Fail on Alpine (musl libc)

Symptoms:

Hash divergence detected: Alpine produces different hash than Ubuntu

Cause: musl libc vs glibc differences in string handling, sorting, or crypto

Solutions:

  1. Use StringComparer.Ordinal for all sorting
  2. Use Encoding.UTF8.GetBytes() explicitly (don't rely on platform default)
  3. Use CultureInfo.InvariantCulture for number/date formatting

Golden Hash Test Fails After Upgrade

Symptoms:

Expected "cgs:sha256:abc123..." but found "cgs:sha256:def456..."

Cause: .NET upgrade changed hash computation or JSON serialization

Solutions:

  1. Verify .NET version in CI/CD matches local (should be 10.0.100)
  2. Check CanonicalJsonOptions configuration (line 33 in CgsDeterminismTests.cs)
  3. Review recent changes to VerdictBuilderService.cs

Flaky Tests (Intermittent Failures)

Symptoms:

Test passes 9/10 times, fails 1/10

Cause: Race condition, timing dependency, or non-deterministic input

Solutions:

  1. Add Interlocked for thread-safe counters
  2. Use TaskCompletionSource instead of Task.Delay for synchronization
  3. Remove randomness (no Random, Guid.NewGuid() in test inputs)
  4. Fix ordering of parallel operations
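The first two fixes can be sketched as follows. The worker count and loop size are arbitrary; the example only shows an explicit start signal via `TaskCompletionSource` in place of a timing guess, and `Interlocked.Increment` in place of an unsynchronized `counter++`.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int counter = 0;

// Explicit start signal: workers wait on this instead of on Task.Delay.
var ready = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);

Task[] workers = Enumerable.Range(0, 8).Select(_ => Task.Run(async () =>
{
    await ready.Task;
    for (int i = 0; i < 1000; i++)
    {
        // Atomic increment; a plain counter++ could lose updates under contention.
        Interlocked.Increment(ref counter);
    }
})).ToArray();

ready.SetResult();          // release all workers at once
await Task.WhenAll(workers);

Console.WriteLine(counter); // 8000
```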

Adding New Determinism Tests

Step 1: Create Test Method

[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task MyNewFeature_IsDeterministic()
{
    // Arrange
    var evidence = CreateKnownEvidencePack();
    var policyLock = CreateKnownPolicyLock();
    var service = CreateVerdictBuilder();

    // Act
    var result1 = await service.BuildAsync(evidence, policyLock, CancellationToken.None);
    var result2 = await service.BuildAsync(evidence, policyLock, CancellationToken.None);

    // Assert
    result1.CgsHash.Should().Be(result2.CgsHash, "same input should produce same hash");
}

Step 2: Run Locally 10 Times

for i in {1..10}; do
  dotnet test --filter "FullyQualifiedName~MyNewFeature_IsDeterministic" || break
done

Step 3: Verify Cross-Platform

Push to branch and check CI/CD results:

  • Windows
  • macOS
  • Linux
  • Alpine
  • Debian

Step 4: Document Edge Cases

Add comments explaining:

  • What makes this test deterministic
  • Any platform-specific considerations
  • Expected hash format/structure

Performance Baselines

Typical test execution times (on CI/CD runners):

Test                     Windows   macOS     Linux     Alpine    Debian
Golden File Test         <100ms    <100ms    <100ms    <150ms    <100ms
10-Iteration Stability   <1s       <1s       <1s       <1.5s     <1s
VEX Order Independence   <200ms    <200ms    <200ms    <300ms    <200ms
Total Suite              <3s       <3s       <3s       <4s       <3s

If tests exceed these baselines by 2x, investigate performance regression.
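The 2x rule can be expressed as a small check. This is a sketch only: `ExceedsBaseline` is a hypothetical helper, the baselines are the Linux figures from this section, and the measured durations are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// True when the measured duration is more than `factor` times the baseline.
static bool ExceedsBaseline(TimeSpan measured, TimeSpan baseline, double factor = 2.0)
    => measured > baseline * factor;

// Linux baselines from the table in this section.
var baselines = new Dictionary<string, TimeSpan>
{
    ["Golden File Test"] = TimeSpan.FromMilliseconds(100),
    ["10-Iteration Stability"] = TimeSpan.FromSeconds(1),
    ["VEX Order Independence"] = TimeSpan.FromMilliseconds(200),
    ["Total Suite"] = TimeSpan.FromSeconds(3),
};

// Made-up measurements for illustration.
var measured = new Dictionary<string, TimeSpan>
{
    ["Golden File Test"] = TimeSpan.FromMilliseconds(250),
    ["10-Iteration Stability"] = TimeSpan.FromMilliseconds(900),
    ["VEX Order Independence"] = TimeSpan.FromMilliseconds(180),
    ["Total Suite"] = TimeSpan.FromSeconds(2),
};

var regressions = new List<string>();
foreach (var (name, baseline) in baselines)
{
    if (ExceedsBaseline(measured[name], baseline))
    {
        regressions.Add(name);
    }
}

Console.WriteLine(regressions.Count == 0
    ? "No regressions"
    : "Investigate: " + string.Join(", ", regressions));
```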

References

  • Architecture: docs/modules/verdict/architecture.md (CGS section)
  • Sprint Documentation: docs/implplan/archived/SPRINT_20251229_001_001_BE_cgs_infrastructure.md
  • Batch Summary: docs/implplan/archived/2025-12-29-completed-sprints/BATCH_20251229_BE_COMPLETION_SUMMARY.md
  • CI/CD Workflow: .gitea/workflows/cross-platform-determinism.yml

Contact

For questions or issues:

  • Create issue in repository
  • Tag: determinism, testing, cgs
  • Priority: High (determinism bugs affect audit trails)