UI work to fill SBOM sourcing management gap. UI planning remaining functionality exposure. Work on CI/Tests stabilization

Introduces CGS determinism test runs to CI workflows for Windows, macOS, Linux, Alpine, and Debian, fulfilling CGS-008 cross-platform requirements. Updates local-ci scripts to support new smoke steps, test timeouts, progress intervals, and project slicing for improved test isolation and diagnostics.
2025-12-29 19:12:38 +02:00
parent 41552d26ec
commit a4badc275e
286 changed files with 50918 additions and 992 deletions
--- a/src/__Tests/Determinism/README.md
+++ b/src/__Tests/Determinism/README.md
@@ -0,0 +1,299 @@
+# Determinism Tests
+
+This test project verifies that StellaOps produces deterministic outputs across platforms, runs, and configurations. Deterministic behavior is critical for reproducible verdicts, auditable evidence chains, and cryptographic verification.
+
+## Test Categories
+
+### CGS (Canonical Graph Signature) Determinism
+
+Tests that verify verdict hash computation is deterministic:
+
+- **Golden File Tests**: Known evidence produces expected hash
+- **10-Iteration Stability**: Same input produces identical hash 10 times
+- **VEX Order Independence**: VEX document ordering doesn't affect hash
+- **Reachability Graph Tests**: Reachability inclusion changes hash predictably
+- **Policy Lock Tests**: Different policy versions produce different hashes
+
+### Cross-Platform Verification
+
+Tests run on multiple platforms via CI/CD:
+- Windows (glibc)
+- macOS (BSD libc)
+- Linux Ubuntu (glibc)
+- Linux Alpine (musl libc)
+- Linux Debian (glibc)
+
+## Running Tests Locally
+
+### Prerequisites
+
+- .NET 10 SDK
+- Docker (for Testcontainers, if needed)
+
+### Run All Determinism Tests
+
+```bash
+cd src/__Tests/Determinism
+dotnet test
+```
+
+### Run Specific Test Category
+
+```bash
+# Run only determinism tests
+dotnet test --filter "Category=Determinism"
+
+# Run only unit tests
+dotnet test --filter "Category=Unit"
+```
+
+### Run with Detailed Output
+
+```bash
+dotnet test --logger "console;verbosity=detailed"
+```
+
+### Run and Generate TRX Report
+
+```bash
+dotnet test --logger "trx;LogFileName=determinism-results.trx" --results-directory ./test-results
+```
+
+## Test Structure
+
+```
+CgsDeterminismTests.cs
+├── Golden File Tests
+│   ├── CgsHash_WithKnownEvidence_MatchesGoldenHash
+│   └── CgsHash_EmptyEvidence_ProducesDeterministicHash
+├── 10-Iteration Stability Tests
+│   ├── CgsHash_SameInput_ProducesIdenticalHash_Across10Iterations
+│   ├── CgsHash_VexOrderIndependent_ProducesIdenticalHash
+│   └── CgsHash_WithReachability_IsDifferentFromWithout
+└── Policy Lock Determinism Tests
+    └── CgsHash_DifferentPolicyVersion_ProducesDifferentHash
+```
+
+## Golden File Workflow
+
+### Initial Baseline (First Time)
+
+1. Run tests locally to compute initial hash:
+   ```bash
+   dotnet test --filter "FullyQualifiedName~CgsHash_WithKnownEvidence_MatchesGoldenHash"
+   ```
+
+2. Observe the computed CGS hash in test output:
+   ```
+   Computed CGS: cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
+   Golden CGS:   cgs:sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
+   ```
+
+3. Verify hash matches expected value (line 59 in CgsDeterminismTests.cs)
+
+4. Uncomment golden hash assertion (line 69):
+   ```csharp
+   result.CgsHash.Should().Be(goldenHash, "CGS hash must match golden file");
+   ```
+
+5. Commit the change to lock in the golden hash
+
+### Verifying Golden Hash Stability
+
+After establishing the baseline:
+
+```bash
+# Run 10 times to verify stability
+for i in {1..10}; do
+  echo "Iteration $i"
+  dotnet test --filter "FullyQualifiedName~CgsHash_WithKnownEvidence_MatchesGoldenHash" --logger "console;verbosity=minimal"
+done
+```
+
+All iterations should pass with identical hash.
+
+### Golden Hash Changes
+
+⚠️ **BREAKING CHANGE**: If golden hash tests fail, the CGS algorithm has changed!
+
+**Impact**:
+- All historical verdicts become unverifiable
+- Stored CGS hashes no longer match recomputed values
+- Audit trails are broken
+
+**Process for Intentional Changes**:
+1. Document the reason for algorithm change in ADR
+2. Create migration guide for existing verdicts
+3. Update golden hash in test
+4. Coordinate with all deployments
+5. Plan for dual-algorithm support during transition
+
+## CI/CD Integration
+
+### Cross-Platform Workflow
+
+File: `.gitea/workflows/cross-platform-determinism.yml`
+
+**Triggers**:
+- Push to `main` branch
+- Pull requests targeting `main`
+- Manual dispatch
+
+**Platform Matrix**:
+- Windows: `windows-latest`
+- macOS: `macos-latest`
+- Linux: `ubuntu-latest`
+- Alpine: `mcr.microsoft.com/dotnet/sdk:10.0-alpine` (musl libc)
+- Debian: `mcr.microsoft.com/dotnet/sdk:10.0-bookworm-slim`
+
+**Outputs**:
+- TRX test results per platform
+- Cross-platform hash comparison report
+- Divergence detection (fails if hashes differ)
+
+### Running CI/CD Locally
+
+Using [act](https://github.com/nektos/act) to run Gitea Actions locally:
+
+```bash
+# Install act (if not already installed)
+# macOS: brew install act
+# Linux: curl https://raw.githubusercontent.com/nektos/act/master/install.sh | sudo bash
+
+# Run cross-platform determinism workflow
+act -W .gitea/workflows/cross-platform-determinism.yml
+```
+
+## Troubleshooting
+
+### Tests Fail with "Hashes Don't Match"
+
+**Symptoms**:
+```
+Expected hashes.Distinct() to have count 1, but found 2.
+```
+
+**Cause**: Non-deterministic input or platform-specific behavior
+
+**Solutions**:
+1. Check for timestamp usage (use fixed `DateTimeOffset.Parse("2025-01-01T00:00:00Z")`)
+2. Check for dictionary ordering (use `OrderBy`)
+3. Check for GUID generation (use fixed GUIDs in tests)
+4. Check for floating-point arithmetic (use decimal for determinism)
+
+### Tests Fail on Alpine (musl libc)
+
+**Symptoms**:
+```
+Hash divergence detected: Alpine produces different hash than Ubuntu
+```
+
+**Cause**: musl libc vs glibc differences in string handling, sorting, or crypto
+
+**Solutions**:
+1. Use `StringComparer.Ordinal` for all sorting
+2. Use `Encoding.UTF8.GetBytes()` explicitly (don't rely on platform default)
+3. Use `CultureInfo.InvariantCulture` for number/date formatting
+
+### Golden Hash Test Fails After Upgrade
+
+**Symptoms**:
+```
+Expected "cgs:sha256:abc123..." but found "cgs:sha256:def456..."
+```
+
+**Cause**: .NET upgrade changed hash computation or JSON serialization
+
+**Solutions**:
+1. Verify .NET version in CI/CD matches local (should be 10.0.100)
+2. Check `CanonicalJsonOptions` configuration (line 33 in CgsDeterminismTests.cs)
+3. Review recent changes to VerdictBuilderService.cs
+
+### Flaky Tests (Intermittent Failures)
+
+**Symptoms**:
+```
+Test passes 9/10 times, fails 1/10
+```
+
+**Cause**: Race condition, timing dependency, or non-deterministic input
+
+**Solutions**:
+1. Add `Interlocked` for thread-safe counters
+2. Use `TaskCompletionSource` instead of `Task.Delay` for synchronization
+3. Remove randomness (no `Random`, `Guid.NewGuid()` in test inputs)
+4. Fix ordering of parallel operations
+
+## Adding New Determinism Tests
+
+### Step 1: Create Test Method
+
+```csharp
+[Fact]
+[Trait("Category", TestCategories.Determinism)]
+public async Task MyNewFeature_IsDeterministic()
+{
+    // Arrange
+    var evidence = CreateKnownEvidencePack();
+    var policyLock = CreateKnownPolicyLock();
+    var service = CreateVerdictBuilder();
+
+    // Act
+    var result1 = await service.BuildAsync(evidence, policyLock, CancellationToken.None);
+    var result2 = await service.BuildAsync(evidence, policyLock, CancellationToken.None);
+
+    // Assert
+    result1.CgsHash.Should().Be(result2.CgsHash, "same input should produce same hash");
+}
+```
+
+### Step 2: Run Locally 10 Times
+
+```bash
+for i in {1..10}; do
+  dotnet test --filter "FullyQualifiedName~MyNewFeature_IsDeterministic"
+done
+```
+
+### Step 3: Verify Cross-Platform
+
+Push to branch and check CI/CD results:
+- Windows ✅
+- macOS ✅
+- Linux ✅
+- Alpine ✅
+- Debian ✅
+
+### Step 4: Document Edge Cases
+
+Add comments explaining:
+- What makes this test deterministic
+- Any platform-specific considerations
+- Expected hash format/structure
+
+## Performance Baselines
+
+Typical test execution times (on CI/CD runners):
+
+| Test | Windows | macOS | Linux | Alpine | Debian |
+|------|---------|-------|-------|--------|--------|
+| Golden File Test | <100ms | <100ms | <100ms | <150ms | <100ms |
+| 10-Iteration Stability | <1s | <1s | <1s | <1.5s | <1s |
+| VEX Order Independence | <200ms | <200ms | <200ms | <300ms | <200ms |
+| **Total Suite** | **<3s** | **<3s** | **<3s** | **<4s** | **<3s** |
+
+If tests exceed these baselines by 2x, investigate performance regression.
+
+## References
+
+- **Architecture**: `docs/modules/verdict/architecture.md` (CGS section)
+- **Sprint Documentation**: `docs/implplan/archived/SPRINT_20251229_001_001_BE_cgs_infrastructure.md`
+- **Batch Summary**: `docs/implplan/archived/2025-12-29-completed-sprints/BATCH_20251229_BE_COMPLETION_SUMMARY.md`
+- **CI/CD Workflow**: `.gitea/workflows/cross-platform-determinism.yml`
+
+## Contact
+
+For questions or issues:
+- Create issue in repository
+- Tag: `determinism`, `testing`, `cgs`
+- Priority: High (determinism bugs affect audit trails)