UI work to fill the SBOM sourcing management gap. UI planning for exposing the remaining functionality. Work on CI/test stabilization

Introduces CGS determinism test runs to CI workflows for Windows, macOS, Linux, Alpine, and Debian, fulfilling CGS-008 cross-platform requirements. Updates local-ci scripts to support new smoke steps, test timeouts, progress intervals, and project slicing for improved test isolation and diagnostics.
2025-12-29 19:12:38 +02:00
parent 41552d26ec
commit a4badc275e
286 changed files with 50918 additions and 992 deletions
--- a/docs/testing/DETERMINISM_DEVELOPER_GUIDE.md
+++ b/docs/testing/DETERMINISM_DEVELOPER_GUIDE.md
@@ -0,0 +1,646 @@
+# Determinism Developer Guide
+
+## Overview
+
+This guide helps developers add new determinism tests to StellaOps. Deterministic behavior is critical for:
+- Reproducible verdicts
+- Auditable evidence chains
+- Cryptographic verification
+- Cross-platform consistency
+
+## Table of Contents
+
+1. [Core Principles](#core-principles)
+2. [Test Structure](#test-structure)
+3. [Common Patterns](#common-patterns)
+4. [Anti-Patterns to Avoid](#anti-patterns-to-avoid)
+5. [Adding New Tests](#adding-new-tests)
+6. [Cross-Platform Considerations](#cross-platform-considerations)
+7. [Performance Guidelines](#performance-guidelines)
+8. [Troubleshooting](#troubleshooting)
+
+## Core Principles
+
+### 1. Determinism Guarantee
+
+**Definition**: Same inputs always produce identical outputs, regardless of:
+- Platform (Windows, macOS, Linux, Alpine, Debian)
+- Runtime (.NET version, JIT compiler)
+- Execution order (parallel vs sequential)
+- Time of day
+- System locale
+
+### 2. Golden File Philosophy
+
+**Golden files** are baseline reference values that lock in correct behavior:
+- Established after careful verification
+- Never changed without an ADR and a migration plan
+- Verified on all platforms before acceptance
+
+### 3. Test Independence
+
+Each test must:
+- Not depend on other tests' execution or order
+- Clean up resources after completion
+- Use isolated data (no shared state)
+
+## Test Structure
+
+### Standard Test Template
+
+```csharp
+[Fact]
+[Trait("Category", TestCategories.Determinism)]
+[Trait("Category", TestCategories.Unit)]
+public async Task Feature_Behavior_ExpectedOutcome()
+{
+    // Arrange - Create deterministic inputs
+    var input = CreateDeterministicInput();
+
+    // Act - Execute feature
+    var output1 = await ExecuteFeature(input);
+    var output2 = await ExecuteFeature(input);
+
+    // Assert - Verify determinism
+    output1.Should().Be(output2, "same input should produce identical output");
+}
+```
+
+### Test Organization
+
+```
+src/__Tests/Determinism/
+├── CgsDeterminismTests.cs          # CGS hash tests
+├── LineageDeterminismTests.cs      # SBOM lineage tests
+├── VexDeterminismTests.cs          # VEX consensus tests (future)
+├── README.md                       # Test documentation
+└── Fixtures/                       # Test data
+    ├── known-evidence-pack.json
+    ├── known-policy-lock.json
+    └── golden-hashes/
+        └── cgs-v1.txt
+```
+
+## Common Patterns
+
+### Pattern 1: 10-Iteration Stability Test
+
+**Purpose**: Verify that executing the same operation 10 times produces identical results.
+
+```csharp
+[Fact]
+[Trait("Category", TestCategories.Determinism)]
+public async Task Feature_SameInput_ProducesIdenticalOutput_Across10Iterations()
+{
+    // Arrange
+    var input = CreateDeterministicInput();
+    var service = CreateService();
+    var outputs = new List<string>();
+
+    // Act - Execute 10 times
+    for (int i = 0; i < 10; i++)
+    {
+        var result = await service.ProcessAsync(input, CancellationToken.None);
+        outputs.Add(result.Hash);
+        _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
+    }
+
+    // Assert - All hashes should be identical
+    outputs.Distinct().Should().HaveCount(1,
+        "same input should produce identical output across all iterations");
+}
+```
+
+**Why 10 iterations?**
+- Catches non-deterministic behavior (e.g., GUID generation, random values)
+- Reasonable execution time (<5 seconds for most tests)
+- A pragmatic convention, not a formal standard; cross-platform differences are covered by the CI matrix, not by repetition
+
+### Pattern 2: Golden File Test
+
+**Purpose**: Verify output matches a known-good baseline value.
+
+```csharp
+[Fact]
+[Trait("Category", TestCategories.Determinism)]
+[Trait("Category", TestCategories.Golden)]
+public async Task Feature_WithKnownInput_MatchesGoldenHash()
+{
+    // Arrange
+    var input = CreateKnownInput();  // MUST be completely deterministic
+    var service = CreateService();
+
+    // Act
+    var result = await service.ProcessAsync(input, CancellationToken.None);
+
+    // Assert
+    var goldenHash = "sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3";
+
+    _output.WriteLine($"Computed Hash: {result.Hash}");
+    _output.WriteLine($"Golden Hash:   {goldenHash}");
+
+    result.Hash.Should().Be(goldenHash, "hash must match golden file");
+}
+```
+
+**Golden file best practices:**
+- Document how golden value was established (date, platform, .NET version)
+- Include golden value directly in test code (not external file) for visibility
+- Add comment explaining what golden value represents
+- Test golden value on all platforms before merging
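+
+A golden value documented along these lines might look like the sketch below. The class name `GoldenHashes` and the angle-bracket placeholders are hypothetical; the hash is simply the example value from the test above.
+
+```csharp
+public static class GoldenHashes
+{
+    // Example golden hash (value copied from the test above).
+    // Represents:  <what input and feature this hash covers>
+    // Established: <date>, <platform>, .NET <version>
+    // Verified on: <platforms checked before merging>
+    public const string Example =
+        "sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3";
+}
+```
+
+Keeping the provenance next to the constant means a reviewer sees it in the same diff as any change to the value.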
+
+### Pattern 3: Order Independence Test
+
+**Purpose**: Verify that input ordering doesn't affect output.
+
+```csharp
+[Fact]
+[Trait("Category", TestCategories.Determinism)]
+public async Task Feature_InputOrder_DoesNotAffectOutput()
+{
+    // Arrange
+    var item1 = CreateItem("A");
+    var item2 = CreateItem("B");
+    var item3 = CreateItem("C");
+
+    var service = CreateService();
+
+    // Act - Process items in different orders
+    var result1 = await service.ProcessAsync(new[] { item1, item2, item3 }, CancellationToken.None);
+    var result2 = await service.ProcessAsync(new[] { item3, item1, item2 }, CancellationToken.None);
+    var result3 = await service.ProcessAsync(new[] { item2, item3, item1 }, CancellationToken.None);
+
+    // Assert - All should produce same hash
+    result1.Hash.Should().Be(result2.Hash, "input order should not affect output");
+    result1.Hash.Should().Be(result3.Hash, "input order should not affect output");
+
+    _output.WriteLine($"Order-independent hash: {result1.Hash}");
+}
+```
+
+**When to use:**
+- Collections that should be sorted internally (VEX documents, rules, dependencies)
+- APIs that accept unordered inputs (dictionary keys, sets)
+- Parallel processing where order is undefined
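+
+For a test like this to pass, the code under test has to canonicalize its inputs before hashing. A minimal sketch of that idea follows, assuming .NET 5 or later for `SHA256.HashData` and `Convert.ToHexString`; `CanonicalHash` is a hypothetical helper, not the StellaOps implementation.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using System.Security.Cryptography;
+using System.Text;
+
+public static class CanonicalHash
+{
+    // Hypothetical helper: hashes a collection of strings independently of input order.
+    public static string Compute(IEnumerable<string> items)
+    {
+        // Ordinal sort compares UTF-16 code units, so it does not depend on culture or platform.
+        var sorted = items.OrderBy(x => x, StringComparer.Ordinal);
+
+        // Join with an explicit separator (this sketch assumes items contain no '\n').
+        var canonical = string.Join("\n", sorted);
+
+        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
+        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
+    }
+}
+```
+
+With this helper, `CanonicalHash.Compute(new[] { "C", "A", "B" })` and `CanonicalHash.Compute(new[] { "A", "B", "C" })` return the same string.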
+
+### Pattern 4: Deterministic Timestamp Test
+
+**Purpose**: Verify that fixed timestamps produce deterministic results.
+
+```csharp
+[Fact]
+[Trait("Category", TestCategories.Determinism)]
+public async Task Feature_WithFixedTimestamp_IsDeterministic()
+{
+    // Arrange - Use FIXED timestamp (not DateTimeOffset.Now!)
+    var timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z");
+    var input = CreateInputWithTimestamp(timestamp);
+    var service = CreateService();
+
+    // Act
+    var result1 = await service.ProcessAsync(input, CancellationToken.None);
+    var result2 = await service.ProcessAsync(input, CancellationToken.None);
+
+    // Assert
+    result1.Hash.Should().Be(result2.Hash, "fixed timestamp should produce deterministic output");
+}
+```
+
+**Timestamp guidelines:**
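+- ❌ **Never use**: `DateTimeOffset.Now`, `DateTime.UtcNow`, or any other wall-clock read; likewise avoid `Guid.NewGuid()` for IDs
+- ✅ **Always use**: a fixed instant in tests, such as `DateTimeOffset.Parse("2025-01-01T00:00:00Z", CultureInfo.InvariantCulture)`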
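+
+One way to keep wall-clock reads out of the code under test is to inject the clock. The sketch below assumes .NET 8 or later, where the `TimeProvider` abstraction is available; `FixedTimeProvider` and `FixedClockExample` are hypothetical names, and whether a given StellaOps service accepts a `TimeProvider` is an assumption.
+
+```csharp
+using System;
+
+// Hypothetical test double: a clock that always returns the same instant.
+public sealed class FixedTimeProvider : TimeProvider
+{
+    private readonly DateTimeOffset _now;
+
+    public FixedTimeProvider(DateTimeOffset now) => _now = now;
+
+    // Every caller sees the same UTC instant, run after run.
+    public override DateTimeOffset GetUtcNow() => _now;
+}
+
+public static class FixedClockExample
+{
+    // Stands in for production code that would otherwise read DateTimeOffset.UtcNow.
+    public static DateTimeOffset Stamp(TimeProvider clock) => clock.GetUtcNow();
+}
+```
+
+A test would construct `new FixedTimeProvider(new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero))` and pass it wherever the production code expects a `TimeProvider`.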
+
+### Pattern 5: Empty/Minimal Input Test
+
+**Purpose**: Verify that minimal or empty inputs don't cause non-determinism.
+
+```csharp
+[Fact]
+[Trait("Category", TestCategories.Determinism)]
+public async Task Feature_EmptyInput_ProducesDeterministicHash()
+{
+    // Arrange - Minimal input
+    var input = CreateEmptyInput();
+    var service = CreateService();
+
+    // Act
+    var result = await service.ProcessAsync(input, CancellationToken.None);
+
+    // Assert - Verify format (hash may not be golden yet)
+    result.Hash.Should().StartWith("sha256:");
+    result.Hash.Length.Should().Be(71); // "sha256:" + 64 hex chars
+
+    _output.WriteLine($"Empty input hash: {result.Hash}");
+}
+```
+
+**Edge cases to test:**
+- Empty collections (`Array.Empty<string>()`)
+- Null optional fields
+- Zero-length strings
+- Default values
+
+## Anti-Patterns to Avoid
+
+### ❌ Anti-Pattern 1: Using Current Time
+
+```csharp
+// BAD - Non-deterministic!
+var input = new Input
+{
+    Timestamp = DateTimeOffset.Now  // ❌ Different every run!
+};
+```
+
+**Fix:**
+```csharp
+// GOOD - Deterministic
+var input = new Input
+{
+    Timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z")  // ✅ Same every run
+};
+```
+
+### ❌ Anti-Pattern 2: Using Random Values
+
+```csharp
+// BAD - Non-deterministic!
+var random = new Random();
+var input = new Input
+{
+    Id = random.Next()  // ❌ Different every run!
+};
+```
+
+**Fix:**
+```csharp
+// GOOD - Deterministic
+var input = new Input
+{
+    Id = 12345  // ✅ Same every run
+};
+```
+
+### ❌ Anti-Pattern 3: Using GUID Generation
+
+```csharp
+// BAD - Non-deterministic!
+var input = new Input
+{
+    Id = Guid.NewGuid().ToString()  // ❌ Different every run!
+};
+```
+
+**Fix:**
+```csharp
+// GOOD - Deterministic
+var input = new Input
+{
+    Id = "00000000-0000-0000-0000-000000000001"  // ✅ Same every run
+};
+```
+
+### ❌ Anti-Pattern 4: Using Unordered Collections
+
+```csharp
+// BAD - Dictionary iteration order is NOT guaranteed!
+var dict = new Dictionary<string, string>
+{
+    ["key1"] = "value1",
+    ["key2"] = "value2"
+};
+
+foreach (var kvp in dict)  // ❌ Order may vary!
+{
+    hash.Update(kvp.Key);
+}
+```
+
+**Fix:**
+```csharp
+// GOOD - Explicit ordering
+var dict = new Dictionary<string, string>
+{
+    ["key1"] = "value1",
+    ["key2"] = "value2"
+};
+
+foreach (var kvp in dict.OrderBy(x => x.Key, StringComparer.Ordinal))  // ✅ Consistent order
+{
+    hash.Update(kvp.Key);
+}
+```
+
+### ❌ Anti-Pattern 5: Platform-Specific Paths
+
+```csharp
+// BAD - Platform-specific!
+var path = "dir\\file.txt";  // ❌ Windows-only!
+```
+
+**Fix:**
+```csharp
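+// GOOD - Path.Combine for file system access, normalized separators for hashed values
+var path = Path.Combine("dir", "file.txt");  // ✅ Valid on every platform ("dir\file.txt" on Windows)
+var hashedPath = path.Replace('\\', '/');    // ✅ "dir/file.txt" everywhere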
+```
+
+### ❌ Anti-Pattern 6: Culture-Dependent Formatting
+
+```csharp
+// BAD - Culture-dependent!
+var formatted = value.ToString();  // ❌ Locale-specific!
+```
+
+**Fix:**
+```csharp
+// GOOD - Culture-invariant
+var formatted = value.ToString(CultureInfo.InvariantCulture);  // ✅ Same everywhere
+```
+
+## Adding New Tests
+
+### Step 1: Identify Determinism Requirement
+
+**Ask yourself:**
+- Does this feature produce a hash, signature, or cryptographic output?
+- Will this feature's output be stored and verified later?
+- Does this feature need to be reproducible across platforms?
+- Is this feature part of an audit trail?
+
+If **YES** to any → Add determinism test.
+
+### Step 2: Create Test File
+
+```bash
+cd src/__Tests/Determinism
+touch MyFeatureDeterminismTests.cs
+```
+
+### Step 3: Write Test Class
+
+```csharp
+using FluentAssertions;
+using Microsoft.Extensions.Logging.Abstractions;
+using StellaOps.TestKit;
+using Xunit;
+using Xunit.Abstractions;
+
+namespace StellaOps.Tests.Determinism;
+
+/// <summary>
+/// Determinism tests for [Feature Name].
+/// Verifies that [specific behavior] is deterministic across platforms and runs.
+/// </summary>
+[Trait("Category", TestCategories.Determinism)]
+[Trait("Category", TestCategories.Unit)]
+public sealed class MyFeatureDeterminismTests
+{
+    private readonly ITestOutputHelper _output;
+
+    public MyFeatureDeterminismTests(ITestOutputHelper output)
+    {
+        _output = output;
+    }
+
+    [Fact]
+    public async Task MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations()
+    {
+        // Arrange
+        var input = CreateDeterministicInput();
+        var service = CreateMyFeatureService();
+        var outputs = new List<string>();
+
+        // Act - Execute 10 times
+        for (int i = 0; i < 10; i++)
+        {
+            var result = await service.ProcessAsync(input, CancellationToken.None);
+            outputs.Add(result.Hash);
+            _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
+        }
+
+        // Assert - All hashes should be identical
+        outputs.Distinct().Should().HaveCount(1,
+            "same input should produce identical output across all iterations");
+    }
+
+    #region Helper Methods
+
+    private static MyInput CreateDeterministicInput()
+    {
+        return new MyInput
+        {
+            // ✅ Use fixed values
+            Id = "test-001",
+            Timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z"),
+            Data = new[] { "item1", "item2", "item3" }
+        };
+    }
+
+    private static MyFeatureService CreateMyFeatureService()
+    {
+        return new MyFeatureService(NullLogger<MyFeatureService>.Instance);
+    }
+
+    #endregion
+}
+```
+
+### Step 4: Run Test Locally 10 Times
+
+```bash
+for i in {1..10}; do
+  echo "=== Run $i ==="
+  dotnet test --filter "FullyQualifiedName~MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations" || break
+done
+```
+
+**Expected:** All 10 runs pass with identical output.
+
+### Step 5: Add to CI/CD
+
+Test is automatically included when pushed (no configuration needed).
+
+CI/CD workflow `.gitea/workflows/cross-platform-determinism.yml` runs all `Category=Determinism` tests on 5 platforms.
+
+### Step 6: Document in README
+
+Update `src/__Tests/Determinism/README.md`:
+
+```markdown
+### MyFeature Determinism
+
+Tests that verify [feature] hash computation is deterministic:
+
+- **10-Iteration Stability**: Same input produces identical hash 10 times
+- **Order Independence**: Input ordering doesn't affect hash
+- **Empty Input**: Minimal input produces deterministic hash
+```
+
+## Cross-Platform Considerations
+
+### Platform Matrix
+
+Tests run on:
+- **Windows** (windows-latest): Microsoft C runtime (UCRT), CRLF line endings
+- **macOS** (macos-latest): BSD libc, LF line endings
+- **Linux Ubuntu** (ubuntu-latest): glibc, LF line endings
+- **Linux Alpine** (Alpine Docker): musl libc, LF line endings
+- **Linux Debian** (Debian Docker): glibc, LF line endings
+
+### Common Cross-Platform Issues
+
+#### Issue 1: String Sorting (musl vs glibc)
+
+**Symptom**: Alpine produces different hash than Ubuntu.
+
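+**Cause**: Culture-sensitive string comparison depends on the globalization data available at runtime (ICU version, or invariant-globalization mode), which can differ between Alpine (`musl`) images and `glibc`-based distributions.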
+
+**Solution**: Always use `StringComparer.Ordinal` for sorting:
+
+```csharp
+// ❌ Wrong - Default comparer is culture-sensitive
+items.Sort();
+
+// ✅ Correct - Ordinal comparison, independent of culture and platform
+items = items.OrderBy(x => x, StringComparer.Ordinal).ToList();
+```
+
+#### Issue 2: Path Separators
+
+**Symptom**: Windows produces different hash than macOS/Linux.
+
+**Cause**: Windows uses `\`, Unix uses `/`.
+
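+**Solution**: Use `Path.Combine` for file system access, and normalize separators to `/` before a path is hashed or serialized:
+
+```csharp
+// ❌ Wrong - Hardcoded separator
+var path = "dir\\file.txt";
+
+// OK for file I/O only - still yields "dir\file.txt" on Windows and "dir/file.txt" on Unix
+var fsPath = Path.Combine("dir", "file.txt");
+
+// ✅ Correct for hashed values - Normalize to forward slash
+var normalizedPath = fsPath.Replace('\\', '/');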
+```
+
+#### Issue 3: Line Endings
+
+**Symptom**: Hashes of file content differ between Windows and Unix checkouts.
+
+**Cause**: Windows uses CRLF (`\r\n`), Unix uses LF (`\n`).
+
+**Solution**: Normalize to LF:
+
+```csharp
+// ❌ Wrong - Platform line endings
+var content = File.ReadAllText(path);
+
+// ✅ Correct - Normalized to LF
+var content = File.ReadAllText(path).Replace("\r\n", "\n");
+```
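+
+The `Replace("\r\n", "\n")` call above leaves a lone `\r` untouched. Where that matters, a sketch using `string.ReplaceLineEndings` (assuming .NET 6 or later) covers CRLF, lone CR, and the other Unicode newline sequences in one call; `LineEndings` is a hypothetical helper name.
+
+```csharp
+using System;
+
+public static class LineEndings
+{
+    // Hypothetical helper: converts every newline sequence in the text to LF.
+    public static string ToLf(string text) => text.ReplaceLineEndings("\n");
+}
+```
+
+For example, `LineEndings.ToLf("a\r\nb\rc\nd")` yields `"a\nb\nc\nd"`, so the hashed bytes are the same regardless of how the file was checked out.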
+
+#### Issue 4: Floating-Point Precision
+
+**Symptom**: Different platforms produce slightly different floating-point values.
+
+**Cause**: Binary floating-point rounding; results of some operations can also vary with JIT optimizations and hardware.
+
+**Solution**: Use `decimal` for exact arithmetic, or round explicitly:
+
+```csharp
+// ❌ Wrong - Binary floating-point is inexact
+var value = 0.1 + 0.2;  // 0.30000000000000004 as a double, not 0.3
+
+// ✅ Correct - Decimal for exact values
+var value = 0.1m + 0.2m;  // Always 0.3
+
+// ✅ Alternative - Round explicitly
+var value = Math.Round(0.1 + 0.2, 2);  // 0.30
+```
+
+## Performance Guidelines
+
+### Execution Time Targets
+
+| Test Type | Target | Max |
+|-----------|--------|-----|
+| Single iteration | <100ms | <500ms |
+| 10-iteration stability | <1s | <3s |
+| Golden file test | <100ms | <500ms |
+| **Full test suite** | **<5s** | **<15s** |
+
+### Optimization Tips
+
+1. **Avoid unnecessary I/O**: Create test data in memory
+2. **Return `Task.CompletedTask`**: For test doubles whose work is synchronous, instead of wrapping it in `Task.Run`
+3. **Minimize allocations**: Reuse test data across assertions
+4. **Parallel test execution**: xUnit runs test classes in parallel by default; tests within one class run sequentially
+
+### Performance Regression Detection
+
+If test execution time increases by >2x:
+1. Profile with `dotnet-trace` or BenchmarkDotNet
+2. Identify bottleneck (I/O, CPU, memory)
+3. Optimize or split into separate test
+4. Document performance expectations in test comments
+
+## Troubleshooting
+
+### Problem: Test Passes 9/10 Times, Fails 1/10
+
+**Cause**: Non-deterministic input or race condition.
+
+**Debug Steps:**
+1. Add logging to each iteration:
+   ```csharp
+   _output.WriteLine($"Iteration {i}: Input={JsonSerializer.Serialize(input)}, Output={output}");
+   ```
+2. Look for differences in input or output
+3. Check for `Guid.NewGuid()`, `Random`, `DateTimeOffset.Now`
+4. Check for unsynchronized parallel operations
+
+### Problem: Test Fails on Alpine but Passes Elsewhere
+
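+**Cause**: Usually a culture-sensitive operation that behaves differently on Alpine (`musl`, different globalization data) than on `glibc`-based platforms.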
+
+**Debug Steps:**
+1. Run test locally with Alpine Docker:
+   ```bash
+   docker run -it --rm -v "$(pwd)":/app mcr.microsoft.com/dotnet/sdk:10.0-alpine sh
+   # Inside the container:
+   cd /app
+   dotnet test --filter "FullyQualifiedName~MyTest"
+   ```
+2. Compare output with local (glibc) output
+3. Check for string sorting, culture-dependent formatting
+4. Use `StringComparer.Ordinal` and `CultureInfo.InvariantCulture`
+
+### Problem: Golden Hash Changes After .NET Upgrade
+
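+**Cause**: A .NET runtime change that alters the bytes being hashed, typically in JSON serialization; the hash algorithm itself (SHA-256) does not change between versions.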
+
+**Debug Steps:**
+1. Compare .NET versions:
+   ```bash
+   dotnet --version  # Should be same in CI/CD
+   ```
+2. Check JsonSerializer behavior:
+   ```csharp
+   var json1 = JsonSerializer.Serialize(input, options);
+   var json2 = JsonSerializer.Serialize(input, options);
+   json1.Should().Be(json2);
+   ```
+3. If intentional .NET change, follow [Breaking Change Process](./GOLDEN_FILE_ESTABLISHMENT_GUIDE.md#breaking-change-process)
+
+## References
+
+- **Test README**: `src/__Tests/Determinism/README.md`
+- **Golden File Guide**: `docs/implplan/archived/2025-12-29-completed-sprints/GOLDEN_FILE_ESTABLISHMENT_GUIDE.md`
+- **ADR 0042**: CGS Merkle Tree Implementation
+- **ADR 0043**: Fulcio Keyless Signing
+- **CI/CD Workflow**: `.gitea/workflows/cross-platform-determinism.yml`
+
+## Getting Help
+
+- **Slack**: #determinism-testing
+- **Issue Labels**: `determinism`, `testing`
+- **Priority**: High (determinism bugs affect audit trails)