# Determinism Tests

This test project verifies that StellaOps produces deterministic outputs across platforms, runs, and configurations. Deterministic behavior is critical for reproducible verdicts, auditable evidence chains, and cryptographic verification.

## Test Categories

### CGS (Canonical Graph Signature) Determinism

Tests that verify verdict hash computation is deterministic:

- **Golden File Tests**: Known evidence produces expected hash
- **10-Iteration Stability**: Same input produces identical hash 10 times
- **VEX Order Independence**: VEX document ordering doesn't affect hash
- **Reachability Graph Tests**: Reachability inclusion changes hash predictably
- **Policy Lock Tests**: Different policy versions produce different hashes

### Cross-Platform Verification

Tests run on multiple platforms via CI/CD:

- Windows (Microsoft C runtime)
- macOS (BSD libc)
- Linux Ubuntu (glibc)
- Linux Alpine (musl libc)
- Linux Debian (glibc)

## Running Tests Locally

### Prerequisites

- .NET 10 SDK
- Docker (for Testcontainers, if needed)

### Run All Determinism Tests

```bash
cd src/__Tests/Determinism
dotnet test
```

### Run Specific Test Category

```bash
# Run only determinism tests
dotnet test --filter "Category=Determinism"

# Run only unit tests
dotnet test --filter "Category=Unit"
```

### Run with Detailed Output

```bash
dotnet test --logger "console;verbosity=detailed"
```

### Run and Generate TRX Report

```bash
dotnet test --logger "trx;LogFileName=determinism-results.trx" --results-directory ./test-results
```

## Test Structure

```
CgsDeterminismTests.cs
├── Golden File Tests
│   ├── CgsHash_WithKnownEvidence_MatchesGoldenHash
│   └── CgsHash_EmptyEvidence_ProducesDeterministicHash
├── 10-Iteration Stability Tests
│   ├── CgsHash_SameInput_ProducesIdenticalHash_Across10Iterations
│   ├── CgsHash_VexOrderIndependent_ProducesIdenticalHash
│   └── CgsHash_WithReachability_IsDifferentFromWithout
└── Policy Lock Determinism Tests
    └── CgsHash_DifferentPolicyVersion_ProducesDifferentHash
```

## Golden File Workflow

### Initial Baseline (First Time)

1. Run tests locally to compute the initial hash:

   ```bash
   dotnet test --filter "FullyQualifiedName~CgsHash_WithKnownEvidence_MatchesGoldenHash"
   ```

2. Observe the computed CGS hash in the test output:

   ```
   Computed CGS: cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
   Golden CGS: cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
   ```

3. Verify that the hash matches the expected value (line 59 in CgsDeterminismTests.cs)

4. Uncomment the golden hash assertion (line 69):

   ```csharp
   result.CgsHash.Should().Be(goldenHash, "CGS hash must match golden file");
   ```

5. Commit the change to lock in the golden hash

### Verifying Golden Hash Stability

After establishing the baseline:

```bash
# Run 10 times to verify stability
for i in {1..10}; do
  echo "Iteration $i"
  dotnet test --filter "FullyQualifiedName~CgsHash_WithKnownEvidence_MatchesGoldenHash" --logger "console;verbosity=minimal"
done
```

All iterations should pass with an identical hash.

### Golden Hash Changes

⚠️ **BREAKING CHANGE**: If golden hash tests fail, the CGS algorithm has changed!

**Impact**:

- All historical verdicts become unverifiable
- Stored CGS hashes no longer match recomputed values
- Audit trails are broken

**Process for Intentional Changes**:

1. Document the reason for the algorithm change in an ADR
2. Create a migration guide for existing verdicts
3. Update the golden hash in the test
4. Coordinate with all deployments
5. Plan for dual-algorithm support during the transition

## CI/CD Integration

### Cross-Platform Workflow

File: `.gitea/workflows/cross-platform-determinism.yml`

**Triggers**:

- Push to `main` branch
- Pull requests targeting `main`
- Manual dispatch

**Platform Matrix**:

- Windows: `windows-latest`
- macOS: `macos-latest`
- Linux: `ubuntu-latest`
- Alpine: `mcr.microsoft.com/dotnet/sdk:10.0-alpine` (musl libc)
- Debian: `mcr.microsoft.com/dotnet/sdk:10.0-bookworm-slim`

**Outputs**:

- TRX test results per platform
- Cross-platform hash comparison report
- Divergence detection (fails if hashes differ)

### Running CI/CD Locally

Use [act](https://github.com/nektos/act) to run the Gitea Actions workflow locally:

```bash
# Install act (if not already installed)
# macOS: brew install act
# Linux: curl https://raw.githubusercontent.com/nektos/act/master/install.sh | sudo bash

# Run cross-platform determinism workflow
act -W .gitea/workflows/cross-platform-determinism.yml
```

## Troubleshooting

### Tests Fail with "Hashes Don't Match"

**Symptoms**:

```
Expected hashes.Distinct() to have count 1, but found 2.
```

**Cause**: Non-deterministic input or platform-specific behavior

**Solutions**:

1. Check for timestamp usage (use fixed `DateTimeOffset.Parse("2025-01-01T00:00:00Z")`)
2. Check for dictionary ordering (use `OrderBy`)
3. Check for GUID generation (use fixed GUIDs in tests)
4. Check for floating-point arithmetic (use decimal for determinism)

### Tests Fail on Alpine (musl libc)

**Symptoms**:

```
Hash divergence detected: Alpine produces different hash than Ubuntu
```

**Cause**: musl libc vs glibc differences in string handling, sorting, or crypto

**Solutions**:

1. Use `StringComparer.Ordinal` for all sorting
2. Use `Encoding.UTF8.GetBytes()` explicitly (don't rely on platform default)
3. Use `CultureInfo.InvariantCulture` for number/date formatting

### Golden Hash Test Fails After Upgrade

**Symptoms**:

```
Expected "cgs:sha256:abc123..." but found "cgs:sha256:def456..."
```

**Cause**: .NET upgrade changed hash computation or JSON serialization

**Solutions**:

1. Verify that the .NET version in CI/CD matches the local one (should be 10.0.100)
2. Check `CanonicalJsonOptions` configuration (line 33 in CgsDeterminismTests.cs)
3. Review recent changes to VerdictBuilderService.cs

### Flaky Tests (Intermittent Failures)

**Symptoms**:

```
Test passes 9/10 times, fails 1/10
```

**Cause**: Race condition, timing dependency, or non-deterministic input

**Solutions**:

1. Add `Interlocked` for thread-safe counters
2. Use `TaskCompletionSource` instead of `Task.Delay` for synchronization
3. Remove randomness (no `Random` or `Guid.NewGuid()` in test inputs)
4. Fix ordering of parallel operations

## Adding New Determinism Tests

### Step 1: Create Test Method

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task MyNewFeature_IsDeterministic()
{
    // Arrange
    var evidence = CreateKnownEvidencePack();
    var policyLock = CreateKnownPolicyLock();
    var service = CreateVerdictBuilder();

    // Act
    var result1 = await service.BuildAsync(evidence, policyLock, CancellationToken.None);
    var result2 = await service.BuildAsync(evidence, policyLock, CancellationToken.None);

    // Assert
    result1.CgsHash.Should().Be(result2.CgsHash, "same input should produce same hash");
}
```

### Step 2: Run Locally 10 Times

```bash
for i in {1..10}; do
  dotnet test --filter "FullyQualifiedName~MyNewFeature_IsDeterministic"
done
```

### Step 3: Verify Cross-Platform

Push to a branch and check the CI/CD results:

- Windows ✅
- macOS ✅
- Linux ✅
- Alpine ✅
- Debian ✅

### Step 4: Document Edge Cases

Add comments explaining:

- What makes this test deterministic
- Any platform-specific considerations
- Expected hash format/structure

## Performance Baselines

Typical test execution times (on CI/CD runners):

| Test | Windows | macOS | Linux | Alpine | Debian |
|------|---------|-------|-------|--------|--------|
| Golden File Test | <100ms | <100ms | <100ms | <150ms | <100ms |
| 10-Iteration Stability | <1s | <1s | <1s | <1.5s | <1s |
| VEX Order Independence | <200ms | <200ms | <200ms | <300ms | <200ms |
| **Total Suite** | **<3s** | **<3s** | **<3s** | **<4s** | **<3s** |

If tests take more than twice these baselines, investigate a possible performance regression.

## References

- **Architecture**: `docs/modules/verdict/architecture.md` (CGS section)
- **Sprint Documentation**: `docs/implplan/archived/SPRINT_20251229_001_001_BE_cgs_infrastructure.md`
- **Batch Summary**: `docs/implplan/archived/2025-12-29-completed-sprints/BATCH_20251229_BE_COMPLETION_SUMMARY.md`
- **CI/CD Workflow**: `.gitea/workflows/cross-platform-determinism.yml`

## Contact

For questions or issues:

- Create an issue in the repository
- Tag: `determinism`, `testing`, `cgs`
- Priority: High (determinism bugs affect audit trails)
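
## Appendix: Divergence Check Sketch

The CI/CD Integration section lists divergence detection (fail if hashes differ) among the workflow outputs. As a rough illustration of that step, and not the workflow's actual implementation, the check could look like the shell function below. It assumes each platform job writes its computed CGS hash as the first line of its own text file; the function names `first_line` and `check_hash_divergence` and the file layout are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: compares one hash file per platform and fails on divergence.
# Assumption: each file holds a single CGS hash on its first line.

# Print the first line of a file, dropping any carriage return so that
# files written on Windows compare equal to files written on Linux.
first_line() {
  head -n 1 "$1" | tr -d '\r'
}

check_hash_divergence() {
  if [ "$#" -eq 0 ]; then
    echo "usage: check_hash_divergence FILE..." >&2
    return 2
  fi

  # Comparison report: one "file: hash" line per platform.
  local file
  for file in "$@"; do
    printf '%s: %s\n' "$file" "$(first_line "$file")"
  done

  # Count distinct hashes; exactly one means every platform agrees.
  local distinct
  distinct=$(for file in "$@"; do first_line "$file"; done | sort -u | wc -l | tr -d '[:space:]')

  if [ "$distinct" -ne 1 ]; then
    echo "Hash divergence detected: $distinct distinct hashes" >&2
    return 1
  fi
  echo "All platforms agree"
}
```

For example, with per-platform files collected under a hypothetical `hashes/` directory, `check_hash_divergence hashes/*.txt` returns a non-zero status when any platform disagrees, which a CI step can use to fail the build.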