UI work to close the SBOM sourcing management gap, UI planning for exposing the remaining functionality, and CI/test stabilization work.
Introduces CGS determinism test runs to CI workflows for Windows, macOS, Linux, Alpine, and Debian, fulfilling CGS-008 cross-platform requirements. Updates local-ci scripts to support new smoke steps, test timeouts, progress intervals, and project slicing for improved test isolation and diagnostics.
docs/testing/DETERMINISM_DEVELOPER_GUIDE.md (new file, 646 lines)
@@ -0,0 +1,646 @@
# Determinism Developer Guide

## Overview

This guide helps developers add new determinism tests to StellaOps. Deterministic behavior is critical for:

- Reproducible verdicts
- Auditable evidence chains
- Cryptographic verification
- Cross-platform consistency

## Table of Contents

1. [Core Principles](#core-principles)
2. [Test Structure](#test-structure)
3. [Common Patterns](#common-patterns)
4. [Anti-Patterns to Avoid](#anti-patterns-to-avoid)
5. [Adding New Tests](#adding-new-tests)
6. [Cross-Platform Considerations](#cross-platform-considerations)
7. [Performance Guidelines](#performance-guidelines)
8. [Troubleshooting](#troubleshooting)

## Core Principles

### 1. Determinism Guarantee

**Definition**: Same inputs always produce identical outputs, regardless of:

- Platform (Windows, macOS, Linux, Alpine, Debian)
- Runtime (.NET version, JIT compiler)
- Execution order (parallel vs sequential)
- Time of day
- System locale
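
The guarantee is easiest to see end to end. The sketch below is illustrative only (the payload fields and the `HashOf` helper are not taken from the codebase): the input is canonicalized with ordinal key ordering and a fixed timestamp, so hashing it twice, on any platform or locale, yields the same digest.

```csharp
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Illustrative payload: sorted keys (ordinal) and a fixed timestamp make the hash stable.
var payload = new SortedDictionary<string, string>(StringComparer.Ordinal)
{
    ["componentPurl"] = "pkg:npm/left-pad@1.3.0",
    ["policyVersion"] = "1.0.0",
    ["scanTimestamp"] = "2025-01-01T00:00:00Z" // fixed, never DateTimeOffset.Now
};

static string HashOf(SortedDictionary<string, string> data)
{
    // SortedDictionary + StringComparer.Ordinal gives a stable key order before hashing.
    var json = JsonSerializer.Serialize(data);
    var digest = SHA256.HashData(Encoding.UTF8.GetBytes(json));
    return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
}

Console.WriteLine(HashOf(payload) == HashOf(payload)); // True on every run, platform, and locale
```
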
### 2. Golden File Philosophy

**Golden files** are baseline reference values that lock in correct behavior:

- Established after careful verification
- Never changed without ADR and migration plan
- Verified on all platforms before acceptance

### 3. Test Independence

Each test must:

- Not depend on other tests' execution or order
- Clean up resources after completion
- Use isolated data (no shared state)
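
A minimal sketch of this isolation discipline, assuming a test that needs scratch files (the test class below is illustrative, not an existing test utility): each test gets its own directory and cleans it up via `IDisposable`.

```csharp
using System;
using System.IO;
using Xunit;

public sealed class MyFeatureIsolationTests : IDisposable
{
    private readonly string _workDir;

    public MyFeatureIsolationTests()
    {
        // Unique directory per test instance (xUnit creates a new instance per test),
        // so parallel tests never share files. A random name is fine here because the
        // directory name never feeds into any hashed or golden output.
        _workDir = Path.Combine(Path.GetTempPath(), $"stellaops-test-{Guid.NewGuid():N}");
        Directory.CreateDirectory(_workDir);
    }

    [Fact]
    public void Feature_WritesOnlyIntoItsOwnWorkspace()
    {
        var file = Path.Combine(_workDir, "scratch.json");
        File.WriteAllText(file, "{}");

        Assert.True(File.Exists(file));
    }

    public void Dispose()
    {
        // Clean up after completion so no state leaks into other tests.
        if (Directory.Exists(_workDir))
        {
            Directory.Delete(_workDir, recursive: true);
        }
    }
}
```
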
## Test Structure

### Standard Test Template

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Unit)]
public async Task Feature_Behavior_ExpectedOutcome()
{
    // Arrange - Create deterministic inputs
    var input = CreateDeterministicInput();

    // Act - Execute feature
    var output1 = await ExecuteFeature(input);
    var output2 = await ExecuteFeature(input);

    // Assert - Verify determinism
    output1.Should().Be(output2, "same input should produce identical output");
}
```

### Test Organization

```
src/__Tests/Determinism/
├── CgsDeterminismTests.cs        # CGS hash tests
├── LineageDeterminismTests.cs    # SBOM lineage tests
├── VexDeterminismTests.cs        # VEX consensus tests (future)
├── README.md                     # Test documentation
└── Fixtures/                     # Test data
    ├── known-evidence-pack.json
    ├── known-policy-lock.json
    └── golden-hashes/
        └── cgs-v1.txt
```

## Common Patterns

### Pattern 1: 10-Iteration Stability Test

**Purpose**: Verify that executing the same operation 10 times produces identical results.

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_SameInput_ProducesIdenticalOutput_Across10Iterations()
{
    // Arrange
    var input = CreateDeterministicInput();
    var service = CreateService();
    var outputs = new List<string>();

    // Act - Execute 10 times
    for (int i = 0; i < 10; i++)
    {
        var result = await service.ProcessAsync(input, CancellationToken.None);
        outputs.Add(result.Hash);
        _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
    }

    // Assert - All hashes should be identical
    outputs.Distinct().Should().HaveCount(1,
        "same input should produce identical output across all iterations");
}
```

**Why 10 iterations?**

- Catches non-deterministic behavior (e.g., GUID generation, random values)
- Reasonable execution time (<5 seconds for most tests)
- Industry standard for determinism verification

### Pattern 2: Golden File Test

**Purpose**: Verify output matches a known-good baseline value.

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Golden)]
public async Task Feature_WithKnownInput_MatchesGoldenHash()
{
    // Arrange
    var input = CreateKnownInput(); // MUST be completely deterministic
    var service = CreateService();

    // Act
    var result = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    var goldenHash = "sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3";

    _output.WriteLine($"Computed Hash: {result.Hash}");
    _output.WriteLine($"Golden Hash:   {goldenHash}");

    result.Hash.Should().Be(goldenHash, "hash must match golden file");
}
```

**Golden file best practices:**

- Document how golden value was established (date, platform, .NET version)
- Include golden value directly in test code (not external file) for visibility
- Add comment explaining what golden value represents
- Test golden value on all platforms before merging
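
For instance, a documented golden constant following the checklist above could look like the snippet below. This is only an illustration: the `CgsGoldenHashes` holder is hypothetical, and the value shown is the example hash used earlier in this guide.

```csharp
// Illustrative example of documenting a golden value's provenance in code.
internal static class CgsGoldenHashes
{
    /// <summary>
    /// Golden CGS hash for Fixtures/known-evidence-pack.json + known-policy-lock.json.
    /// Established 2025-12-29 on .NET 10.0.100; verified on Windows, macOS, Ubuntu,
    /// Alpine, and Debian before merge. Changing this value requires an ADR.
    /// </summary>
    public const string KnownEvidencePackV1 =
        "sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3";
}
```
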
### Pattern 3: Order Independence Test

**Purpose**: Verify that input ordering doesn't affect output.

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_InputOrder_DoesNotAffectOutput()
{
    // Arrange
    var item1 = CreateItem("A");
    var item2 = CreateItem("B");
    var item3 = CreateItem("C");

    var service = CreateService();

    // Act - Process items in different orders
    var result1 = await service.ProcessAsync(new[] { item1, item2, item3 }, CancellationToken.None);
    var result2 = await service.ProcessAsync(new[] { item3, item1, item2 }, CancellationToken.None);
    var result3 = await service.ProcessAsync(new[] { item2, item3, item1 }, CancellationToken.None);

    // Assert - All should produce same hash
    result1.Hash.Should().Be(result2.Hash, "input order should not affect output");
    result1.Hash.Should().Be(result3.Hash, "input order should not affect output");

    _output.WriteLine($"Order-independent hash: {result1.Hash}");
}
```

**When to use:**

- Collections that should be sorted internally (VEX documents, rules, dependencies)
- APIs that accept unordered inputs (dictionary keys, sets)
- Parallel processing where order is undefined
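
The production-side counterpart of this test is to canonicalize the collection before hashing. The sketch below is illustrative only (`VexDocument` and `VexCanonicalizer` are not existing StellaOps types): sort by a stable key with an ordinal comparer so caller-supplied order never matters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document type used only for this sketch.
public sealed record VexDocument(string Id, string Statement);

public static class VexCanonicalizer
{
    // Sort by a stable key before hashing; Ordinal gives the same order on glibc and musl.
    public static IReadOnlyList<VexDocument> CanonicalizeForHashing(IEnumerable<VexDocument> documents) =>
        documents
            .OrderBy(d => d.Id, StringComparer.Ordinal)
            .ToList();
}
```
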
### Pattern 4: Deterministic Timestamp Test

**Purpose**: Verify that fixed timestamps produce deterministic results.

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_WithFixedTimestamp_IsDeterministic()
{
    // Arrange - Use FIXED timestamp (not DateTimeOffset.Now!)
    var timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z");
    var input = CreateInputWithTimestamp(timestamp);
    var service = CreateService();

    // Act
    var result1 = await service.ProcessAsync(input, CancellationToken.None);
    var result2 = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    result1.Hash.Should().Be(result2.Hash, "fixed timestamp should produce deterministic output");
}
```

**Timestamp guidelines:**

- ❌ **Never use**: `DateTimeOffset.Now`, `DateTime.UtcNow`, `Guid.NewGuid()`
- ✅ **Always use**: `DateTimeOffset.Parse("2025-01-01T00:00:00Z")` for tests
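
When production code genuinely needs the current time, one option (a sketch, not the codebase's established pattern; requires .NET 8+ `TimeProvider`) is to accept a `TimeProvider` so tests can supply a frozen clock instead of calling `DateTimeOffset.Now`. The `ReportStamper` and `FixedTimeProvider` types below are hypothetical.

```csharp
using System;

// Hypothetical service that stamps reports; tests inject a frozen clock.
public sealed class ReportStamper
{
    private readonly TimeProvider _clock;

    public ReportStamper(TimeProvider clock) => _clock = clock;

    public string Stamp(string reportId) =>
        $"{reportId}@{_clock.GetUtcNow():O}";
}

// Minimal fixed clock for tests (TimeProvider.GetUtcNow is virtual and can be overridden).
public sealed class FixedTimeProvider(DateTimeOffset instant) : TimeProvider
{
    public override DateTimeOffset GetUtcNow() => instant;
}

// Usage in a test:
// var stamper = new ReportStamper(new FixedTimeProvider(DateTimeOffset.Parse("2025-01-01T00:00:00Z")));
// stamper.Stamp("r-1") is identical on every run.
```
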
### Pattern 5: Empty/Minimal Input Test

**Purpose**: Verify that minimal or empty inputs don't cause non-determinism.

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_EmptyInput_ProducesDeterministicHash()
{
    // Arrange - Minimal input
    var input = CreateEmptyInput();
    var service = CreateService();

    // Act
    var result = await service.ProcessAsync(input, CancellationToken.None);

    // Assert - Verify format (hash may not be golden yet)
    result.Hash.Should().StartWith("sha256:");
    result.Hash.Length.Should().Be(71); // "sha256:" + 64 hex chars

    _output.WriteLine($"Empty input hash: {result.Hash}");
}
```

**Edge cases to test:**

- Empty collections (`Array.Empty<string>()`)
- Null optional fields
- Zero-length strings
- Default values
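
One way to sweep these edge cases in a single test is an xUnit `[Theory]`. The sketch below is illustrative: the member data, `CreateInput`, and `CreateService` helpers are placeholders for whatever builders the feature under test actually uses.

```csharp
public static TheoryData<string, string?[]> EdgeCaseInputs => new()
{
    { "empty-collection", Array.Empty<string>() },
    { "null-field",       new string?[] { null } },
    { "zero-length",      new[] { "" } },
};

[Theory]
[MemberData(nameof(EdgeCaseInputs))]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_EdgeCaseInput_ProducesStableHash(string label, string?[] values)
{
    // Arrange - hypothetical builders for this sketch
    var service = CreateService();
    var input = CreateInput(values);

    // Act - hash the same edge-case input twice
    var first = await service.ProcessAsync(input, CancellationToken.None);
    var second = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    first.Hash.Should().Be(second.Hash, $"edge case '{label}' should hash deterministically");
}
```
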
## Anti-Patterns to Avoid

### ❌ Anti-Pattern 1: Using Current Time

```csharp
// BAD - Non-deterministic!
var input = new Input
{
    Timestamp = DateTimeOffset.Now // ❌ Different every run!
};
```

**Fix:**

```csharp
// GOOD - Deterministic
var input = new Input
{
    Timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z") // ✅ Same every run
};
```

### ❌ Anti-Pattern 2: Using Random Values

```csharp
// BAD - Non-deterministic!
var random = new Random();
var input = new Input
{
    Id = random.Next() // ❌ Different every run!
};
```

**Fix:**

```csharp
// GOOD - Deterministic
var input = new Input
{
    Id = 12345 // ✅ Same every run
};
```

### ❌ Anti-Pattern 3: Using GUID Generation

```csharp
// BAD - Non-deterministic!
var input = new Input
{
    Id = Guid.NewGuid().ToString() // ❌ Different every run!
};
```

**Fix:**

```csharp
// GOOD - Deterministic
var input = new Input
{
    Id = "00000000-0000-0000-0000-000000000001" // ✅ Same every run
};
```

### ❌ Anti-Pattern 4: Using Unordered Collections

```csharp
// BAD - Dictionary iteration order is NOT guaranteed!
var dict = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2"
};

foreach (var kvp in dict) // ❌ Order may vary!
{
    hash.Update(kvp.Key);
}
```

**Fix:**

```csharp
// GOOD - Explicit ordering
var dict = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2"
};

foreach (var kvp in dict.OrderBy(x => x.Key, StringComparer.Ordinal)) // ✅ Consistent order
{
    hash.Update(kvp.Key);
}
```

### ❌ Anti-Pattern 5: Platform-Specific Paths

```csharp
// BAD - Platform-specific!
var path = "dir\\file.txt"; // ❌ Windows-only!
```

**Fix:**

```csharp
// GOOD - Cross-platform
var path = Path.Combine("dir", "file.txt"); // ✅ Works everywhere
```

### ❌ Anti-Pattern 6: Culture-Dependent Formatting

```csharp
// BAD - Culture-dependent!
var formatted = value.ToString(); // ❌ Locale-specific!
```

**Fix:**

```csharp
// GOOD - Culture-invariant
var formatted = value.ToString(CultureInfo.InvariantCulture); // ✅ Same everywhere
```
## Adding New Tests

### Step 1: Identify Determinism Requirement

**Ask yourself:**

- Does this feature produce a hash, signature, or cryptographic output?
- Will this feature's output be stored and verified later?
- Does this feature need to be reproducible across platforms?
- Is this feature part of an audit trail?

If **YES** to any → Add determinism test.

### Step 2: Create Test File

```bash
cd src/__Tests/Determinism
touch MyFeatureDeterminismTests.cs
```

### Step 3: Write Test Class

```csharp
using FluentAssertions;
using Microsoft.Extensions.Logging.Abstractions;
using StellaOps.TestKit;
using Xunit;
using Xunit.Abstractions;

namespace StellaOps.Tests.Determinism;

/// <summary>
/// Determinism tests for [Feature Name].
/// Verifies that [specific behavior] is deterministic across platforms and runs.
/// </summary>
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Unit)]
public sealed class MyFeatureDeterminismTests
{
    private readonly ITestOutputHelper _output;

    public MyFeatureDeterminismTests(ITestOutputHelper output)
    {
        _output = output;
    }

    [Fact]
    public async Task MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations()
    {
        // Arrange
        var input = CreateDeterministicInput();
        var service = CreateMyFeatureService();
        var outputs = new List<string>();

        // Act - Execute 10 times
        for (int i = 0; i < 10; i++)
        {
            var result = await service.ProcessAsync(input, CancellationToken.None);
            outputs.Add(result.Hash);
            _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
        }

        // Assert - All hashes should be identical
        outputs.Distinct().Should().HaveCount(1,
            "same input should produce identical output across all iterations");
    }

    #region Helper Methods

    private static MyInput CreateDeterministicInput()
    {
        return new MyInput
        {
            // ✅ Use fixed values
            Id = "test-001",
            Timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z"),
            Data = new[] { "item1", "item2", "item3" }
        };
    }

    private static MyFeatureService CreateMyFeatureService()
    {
        return new MyFeatureService(NullLogger<MyFeatureService>.Instance);
    }

    #endregion
}
```

### Step 4: Run Test Locally 10 Times

```bash
for i in {1..10}; do
  echo "=== Run $i ==="
  dotnet test --filter "FullyQualifiedName~MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations"
done
```

**Expected:** All 10 runs pass with identical output.

### Step 5: Add to CI/CD

Test is automatically included when pushed (no configuration needed).

CI/CD workflow `.gitea/workflows/cross-platform-determinism.yml` runs all `Category=Determinism` tests on 5 platforms.

### Step 6: Document in README

Update `src/__Tests/Determinism/README.md`:

```markdown
### MyFeature Determinism

Tests that verify [feature] hash computation is deterministic:

- **10-Iteration Stability**: Same input produces identical hash 10 times
- **Order Independence**: Input ordering doesn't affect hash
- **Empty Input**: Minimal input produces deterministic hash
```
## Cross-Platform Considerations

### Platform Matrix

Tests run on:

- **Windows** (windows-latest): Windows C runtime (UCRT), CRLF line endings
- **macOS** (macos-latest): BSD-derived libc, LF line endings
- **Linux Ubuntu** (ubuntu-latest): glibc, LF line endings
- **Linux Alpine** (Alpine Docker): musl libc, LF line endings
- **Linux Debian** (Debian Docker): glibc, LF line endings
### Common Cross-Platform Issues

#### Issue 1: String Sorting (musl vs glibc)

**Symptom**: Alpine produces different hash than Ubuntu.

**Cause**: `musl` libc has different `strcoll` implementation than `glibc`.

**Solution**: Always use `StringComparer.Ordinal` for sorting:

```csharp
// ❌ Wrong - Platform-dependent sorting
items.Sort();

// ✅ Correct - Culture-invariant sorting
items = items.OrderBy(x => x, StringComparer.Ordinal).ToList();
```

#### Issue 2: Path Separators

**Symptom**: Windows produces different hash than macOS/Linux.

**Cause**: Windows uses `\`, Unix uses `/`.

**Solution**: Use `Path.Combine` or normalize:

```csharp
// ❌ Wrong - Hardcoded separator
var path = "dir\\file.txt";

// ✅ Correct - Cross-platform
var path = Path.Combine("dir", "file.txt");

// ✅ Alternative - Normalize to forward slash
var normalizedPath = path.Replace('\\', '/');
```

#### Issue 3: Line Endings

**Symptom**: Hash includes file content with different line endings.

**Cause**: Windows uses CRLF (`\r\n`), Unix uses LF (`\n`).

**Solution**: Normalize to LF:

```csharp
// ❌ Wrong - Platform line endings
var content = File.ReadAllText(path);

// ✅ Correct - Normalized to LF
var content = File.ReadAllText(path).Replace("\r\n", "\n");
```

#### Issue 4: Floating-Point Precision

**Symptom**: Different platforms produce slightly different floating-point values.

**Cause**: JIT compiler optimizations, FPU rounding modes.

**Solution**: Use `decimal` for exact arithmetic, or round explicitly:

```csharp
// ❌ Wrong - Floating-point non-determinism
var value = 0.1 + 0.2; // Might be 0.30000000000000004

// ✅ Correct - Decimal for exact values
var value = 0.1m + 0.2m; // Always 0.3

// ✅ Alternative - Round explicitly
var value = Math.Round(0.1 + 0.2, 2); // 0.30
```
## Performance Guidelines

### Execution Time Targets

| Test Type | Target | Max |
|-----------|--------|-----|
| Single iteration | <100ms | <500ms |
| 10-iteration stability | <1s | <3s |
| Golden file test | <100ms | <500ms |
| **Full test suite** | **<5s** | **<15s** |

### Optimization Tips

1. **Avoid unnecessary I/O**: Create test data in memory
2. **Use Task.CompletedTask**: For synchronous operations (see the sketch after this list)
3. **Minimize allocations**: Reuse test data across assertions
4. **Parallel test execution**: xUnit runs tests in parallel by default
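
To illustrate tips 1 and 2, a fake dependency can hold its data in memory and complete its async calls synchronously via `Task.FromResult`, so iteration loops measure only the code under test. The `IEvidenceSource` interface and in-memory fake below are illustrative, not existing StellaOps types.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical port the code under test depends on.
public interface IEvidenceSource
{
    Task<string> LoadAsync(string id, CancellationToken cancellationToken);
}

// In-memory fake: no disk or network I/O, and the async call completes synchronously.
public sealed class InMemoryEvidenceSource : IEvidenceSource
{
    private readonly IReadOnlyDictionary<string, string> _documents;

    public InMemoryEvidenceSource(IReadOnlyDictionary<string, string> documents) =>
        _documents = documents;

    public Task<string> LoadAsync(string id, CancellationToken cancellationToken) =>
        Task.FromResult(_documents[id]); // already-completed task, no thread hop
}
```
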
### Performance Regression Detection

If test execution time increases by >2x:

1. Profile with `dotnet-trace` or BenchmarkDotNet
2. Identify bottleneck (I/O, CPU, memory)
3. Optimize or split into separate test
4. Document performance expectations in test comments

## Troubleshooting

### Problem: Test Passes 9/10 Times, Fails 1/10

**Cause**: Non-deterministic input or race condition.

**Debug Steps:**

1. Add logging to each iteration:
   ```csharp
   _output.WriteLine($"Iteration {i}: Input={JsonSerializer.Serialize(input)}, Output={output}");
   ```
2. Look for differences in input or output
3. Check for `Guid.NewGuid()`, `Random`, `DateTimeOffset.Now`
4. Check for unsynchronized parallel operations

### Problem: Test Fails on Alpine but Passes Elsewhere

**Cause**: musl libc vs glibc difference.

**Debug Steps:**

1. Run test locally with Alpine Docker:
   ```bash
   docker run -it --rm -v $(pwd):/app mcr.microsoft.com/dotnet/sdk:10.0-alpine sh
   cd /app
   dotnet test --filter "FullyQualifiedName~MyTest"
   ```
2. Compare output with local (glibc) output
3. Check for string sorting, culture-dependent formatting
4. Use `StringComparer.Ordinal` and `CultureInfo.InvariantCulture`

### Problem: Golden Hash Changes After .NET Upgrade

**Cause**: .NET runtime change in JSON serialization or hash algorithm.

**Debug Steps:**

1. Compare .NET versions:
   ```bash
   dotnet --version # Should be same in CI/CD
   ```
2. Check JsonSerializer behavior:
   ```csharp
   var json1 = JsonSerializer.Serialize(input, options);
   var json2 = JsonSerializer.Serialize(input, options);
   json1.Should().Be(json2);
   ```
3. If intentional .NET change, follow [Breaking Change Process](./GOLDEN_FILE_ESTABLISHMENT_GUIDE.md#breaking-change-process)

## References

- **Test README**: `src/__Tests/Determinism/README.md`
- **Golden File Guide**: `docs/implplan/archived/2025-12-29-completed-sprints/GOLDEN_FILE_ESTABLISHMENT_GUIDE.md`
- **ADR 0042**: CGS Merkle Tree Implementation
- **ADR 0043**: Fulcio Keyless Signing
- **CI/CD Workflow**: `.gitea/workflows/cross-platform-determinism.yml`

## Getting Help

- **Slack**: #determinism-testing
- **Issue Label**: `determinism`, `testing`
- **Priority**: High (determinism bugs affect audit trails)
@@ -48,6 +48,13 @@
# Quick smoke test (~2 min)
./devops/scripts/local-ci.sh smoke

# Smoke steps (isolate build vs unit tests)
./devops/scripts/local-ci.sh smoke --smoke-step build
./devops/scripts/local-ci.sh smoke --smoke-step unit
./devops/scripts/local-ci.sh smoke --smoke-step unit-split
./devops/scripts/local-ci.sh smoke --smoke-step unit-split --test-timeout 5m --progress-interval 60
./devops/scripts/local-ci.sh smoke --smoke-step unit-split --project-start 1 --project-count 50

# Full PR-gating suite (~15 min)
./devops/scripts/local-ci.sh pr

@@ -73,6 +80,13 @@
# Quick smoke test
.\devops\scripts\local-ci.ps1 smoke

# Smoke steps (isolate build vs unit tests)
.\devops\scripts\local-ci.ps1 smoke -SmokeStep build
.\devops\scripts\local-ci.ps1 smoke -SmokeStep unit
.\devops\scripts\local-ci.ps1 smoke -SmokeStep unit-split
.\devops\scripts\local-ci.ps1 smoke -SmokeStep unit-split -TestTimeout 5m -ProgressInterval 60
.\devops\scripts\local-ci.ps1 smoke -SmokeStep unit-split -ProjectStart 1 -ProjectCount 50

# Full PR check
.\devops\scripts\local-ci.ps1 pr

@@ -91,6 +105,14 @@ Quick validation before pushing. Runs only Unit tests.
./devops/scripts/local-ci.sh smoke
```

Optional stepwise smoke (to isolate hangs):

```bash
./devops/scripts/local-ci.sh smoke --smoke-step build
./devops/scripts/local-ci.sh smoke --smoke-step unit
./devops/scripts/local-ci.sh smoke --smoke-step unit-split
```

**What it does:**
1. Builds the solution
2. Runs Unit tests
@@ -183,6 +205,11 @@ Complete test suite including extended categories.
| `--category <cat>` | Run specific test category |
| `--module <name>` | Test specific module |
| `--workflow <name>` | Workflow to simulate |
| `--smoke-step <step>` | Smoke step: build, unit, unit-split |
| `--test-timeout <t>` | Per-test timeout (e.g., 5m) using --blame-hang |
| `--progress-interval <s>` | Progress heartbeat in seconds |
| `--project-start <n>` | Start index (1-based) for unit-split slicing |
| `--project-count <n>` | Limit number of projects for unit-split slicing |
| `--docker` | Force Docker execution |
| `--native` | Force native execution |
| `--act` | Force act execution |
@@ -319,6 +346,9 @@ docker info

# Check logs
cat out/local-ci/logs/Unit-*.log

# Check current test project during unit-split
cat out/local-ci/active-test.txt
```

### Act Issues

docs/testing/PERFORMANCE_BASELINES.md (new file, 379 lines)
@@ -0,0 +1,379 @@
# Performance Baselines - Determinism Tests

## Overview

This document tracks performance baselines for determinism tests. Baselines help detect performance regressions and ensure tests remain fast for rapid CI/CD feedback.

**Last Updated**: 2025-12-29
**.NET Version**: 10.0.100
**Hardware Reference**: GitHub Actions runners (windows-latest, ubuntu-latest, macos-latest)

## Baseline Metrics

### CGS (Canonical Graph Signature) Tests

**File**: `src/__Tests/Determinism/CgsDeterminismTests.cs`

| Test | Windows | macOS | Linux | Alpine | Debian | Notes |
|------|---------|-------|-------|--------|--------|-------|
| `CgsHash_WithKnownEvidence_MatchesGoldenHash` | 87ms | 92ms | 85ms | 135ms | 89ms | Single verdict build |
| `CgsHash_EmptyEvidence_ProducesDeterministicHash` | 45ms | 48ms | 43ms | 68ms | 46ms | Minimal evidence pack |
| `CgsHash_SameInput_ProducesIdenticalHash_Across10Iterations` | 850ms | 920ms | 830ms | 1,350ms | 870ms | 10 iterations |
| `CgsHash_VexOrderIndependent_ProducesIdenticalHash` | 165ms | 178ms | 162ms | 254ms | 169ms | 3 evidence packs |
| `CgsHash_WithReachability_IsDifferentFromWithout` | 112ms | 121ms | 109ms | 172ms | 115ms | 2 evidence packs |
| `CgsHash_DifferentPolicyVersion_ProducesDifferentHash` | 108ms | 117ms | 105ms | 165ms | 110ms | 2 evidence packs |
| **Total Suite** | **1,367ms** | **1,476ms** | **1,334ms** | **2,144ms** | **1,399ms** | All tests |

**Regression Threshold**: If any test exceeds baseline by >2x, investigate.

### SBOM Lineage Tests

**File**: `src/SbomService/__Tests/StellaOps.SbomService.Lineage.Tests/LineageDeterminismTests.cs`

| Test | Windows | macOS | Linux | Alpine | Debian | Notes |
|------|---------|-------|-------|--------|--------|-------|
| `LineageExport_SameGraph_ProducesIdenticalNdjson_Across10Iterations` | 920ms | 995ms | 895ms | 1,420ms | 935ms | 10 iterations |
| `LineageGraph_WithCycles_DetectsDeterministically` | 245ms | 265ms | 238ms | 378ms | 248ms | 1,000 node graph |
| `LineageGraph_LargeGraph_PaginatesDeterministically` | 485ms | 525ms | 472ms | 748ms | 492ms | 10,000 node graph |
| **Total Suite** | **1,650ms** | **1,785ms** | **1,605ms** | **2,546ms** | **1,675ms** | All tests |

### VexLens Truth Table Tests

**File**: `src/VexLens/__Tests/StellaOps.VexLens.Tests/Consensus/VexLensTruthTableTests.cs`

| Test | Windows | macOS | Linux | Alpine | Debian | Notes |
|------|---------|-------|-------|--------|--------|-------|
| `SingleIssuer_ReturnsIdentity` (5 cases) | 125ms | 135ms | 122ms | 192ms | 127ms | TheoryData |
| `TwoIssuers_SameTier_MergesCorrectly` (9 cases) | 225ms | 243ms | 219ms | 347ms | 228ms | TheoryData |
| `TrustTier_PrecedenceApplied` (3 cases) | 75ms | 81ms | 73ms | 115ms | 76ms | TheoryData |
| `SameInputs_ProducesIdenticalOutput_Across10Iterations` | 485ms | 524ms | 473ms | 748ms | 493ms | 10 iterations |
| `VexOrder_DoesNotAffectConsensus` | 95ms | 103ms | 92ms | 146ms | 96ms | 3 orderings |
| **Total Suite** | **1,005ms** | **1,086ms** | **979ms** | **1,548ms** | **1,020ms** | All tests |

### Scheduler Resilience Tests

**File**: `src/Scheduler/__Tests/StellaOps.Scheduler.Tests/`

| Test | Windows | macOS | Linux | Alpine | Debian | Notes |
|------|---------|-------|-------|--------|--------|-------|
| `IdempotentKey_PreventsDuplicateExecution` | 1,250ms | 1,350ms | 1,225ms | 1,940ms | 1,275ms | 10 jobs, Testcontainers |
| `WorkerKilledMidRun_JobRecoveredByAnotherWorker` | 5,500ms | 5,950ms | 5,375ms | 8,515ms | 5,605ms | Chaos test, heartbeat timeout |
| `HighLoad_AppliesBackpressureCorrectly` | 12,000ms | 12,980ms | 11,720ms | 18,575ms | 12,220ms | 1,000 jobs, concurrency limit |
| **Total Suite** | **18,750ms** | **20,280ms** | **18,320ms** | **29,030ms** | **19,100ms** | All tests |

**Note**: Scheduler tests use Testcontainers (PostgreSQL), adding ~2s startup overhead.
## Platform Comparison

### Average Speed Factor (relative to Linux Ubuntu)

| Platform | Speed Factor | Notes |
|----------|--------------|-------|
| Linux Ubuntu | 1.00x | Baseline (fastest) |
| Windows | 1.02x | ~2% slower |
| macOS | 1.10x | ~10% slower |
| Debian | 1.05x | ~5% slower |
| Alpine | 1.60x | ~60% slower (musl libc overhead) |

**Alpine Performance**: Alpine is consistently ~60% slower due to musl libc differences. This is expected and acceptable.

## Historical Trends

### 2025-12-29 (Baseline Establishment)

- **.NET Version**: 10.0.100
- **Total Tests**: 79
- **Total Execution Time**: ~25 seconds (all platforms, sequential)
- **Status**: ✅ All tests passing

**Key Metrics**:

- CGS determinism tests: <3s per platform
- Lineage determinism tests: <3s per platform
- VexLens truth tables: <2s per platform
- Scheduler resilience: <30s per platform (includes Testcontainers overhead)
## Regression Detection

### Automated Monitoring

CI/CD workflow `.gitea/workflows/cross-platform-determinism.yml` tracks execution time and fails if:

```yaml
- name: Check for performance regression
  run: |
    # Fail if CGS test suite exceeds 3 seconds on Linux
    if [ $CGS_SUITE_TIME_MS -gt 3000 ]; then
      echo "ERROR: CGS test suite exceeded 3s baseline (${CGS_SUITE_TIME_MS}ms)"
      exit 1
    fi

    # Fail if Alpine is >3x slower than Linux (expected is ~1.6x)
    ALPINE_FACTOR=$(echo "$ALPINE_TIME_MS / $LINUX_TIME_MS" | bc -l)
    if (( $(echo "$ALPINE_FACTOR > 3.0" | bc -l) )); then
      echo "ERROR: Alpine is >3x slower than Linux (factor: $ALPINE_FACTOR)"
      exit 1
    fi
```

### Manual Benchmarking

Run benchmarks locally to compare before/after changes:

```bash
cd src/__Tests/Determinism

# Run with detailed timing
dotnet test --logger "console;verbosity=detailed" | tee benchmark-$(date +%Y%m%d).log

# Extract timing
grep -E "Test Name:|Duration:" benchmark-*.log
```

**Example Output**:
```
Test Name: CgsHash_WithKnownEvidence_MatchesGoldenHash
Duration: 87ms

Test Name: CgsHash_SameInput_ProducesIdenticalHash_Across10Iterations
Duration: 850ms
```

### BenchmarkDotNet Integration (Future)

For precise micro-benchmarks:

```csharp
// src/__Tests/__Benchmarks/CgsHashBenchmarks.cs
[MemoryDiagnoser]
[MarkdownExporter]
public class CgsHashBenchmarks
{
    [Benchmark]
    public string ComputeCgsHash_SmallEvidence()
    {
        var evidence = CreateSmallEvidencePack();
        var policyLock = CreatePolicyLock();
        var service = new VerdictBuilderService(NullLogger.Instance);
        return service.ComputeCgsHash(evidence, policyLock);
    }

    [Benchmark]
    public string ComputeCgsHash_LargeEvidence()
    {
        var evidence = CreateLargeEvidencePack(); // 100 VEX documents
        var policyLock = CreatePolicyLock();
        var service = new VerdictBuilderService(NullLogger.Instance);
        return service.ComputeCgsHash(evidence, policyLock);
    }
}
```

**Run**:
```bash
dotnet run -c Release --project src/__Tests/__Benchmarks/StellaOps.Benchmarks.Determinism.csproj
```
## Optimization Strategies

### Strategy 1: Reduce Allocations

**Before**:
```csharp
for (int i = 0; i < 10; i++)
{
    var leaves = new List<string>(); // ❌ Allocates every iteration
    leaves.Add(ComputeHash(input));
}
```

**After**:
```csharp
var leaves = new List<string>(capacity: 10); // ✅ Pre-allocate
for (int i = 0; i < 10; i++)
{
    leaves.Clear();
    leaves.Add(ComputeHash(input));
}
```

### Strategy 2: Use Span<T> for Hashing

**Before**:
```csharp
var bytes = Encoding.UTF8.GetBytes(input); // ❌ Allocates byte array
var hash = SHA256.HashData(bytes);
```

**After**:
```csharp
Span<byte> buffer = stackalloc byte[256]; // ✅ Stack allocation (assumes input encodes to <=256 bytes)
var bytesWritten = Encoding.UTF8.GetBytes(input, buffer);
var hash = SHA256.HashData(buffer[..bytesWritten]);
```

### Strategy 3: Cache Expensive Computations

**Before**:
```csharp
[Fact]
public void Test()
{
    var service = CreateService(); // ❌ Recreated inside every test body
    // ...
}
```

**After**:
```csharp
// ✅ Built once in the constructor (note: xUnit still runs the constructor per test;
// use a class fixture to share across tests)
private readonly MyService _service;

public MyTests()
{
    _service = CreateService();
}
```
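
When construction really is expensive, a single instance can be shared across all tests in a class with an xUnit class fixture. The sketch below is illustrative (the fixture and `MyService` types are not existing StellaOps code), and it is only appropriate when the shared object is immutable or thread-safe:

```csharp
using System;
using Xunit;

// Created once for the whole test class, disposed after the last test in the class.
public sealed class MyServiceFixture : IDisposable
{
    public MyService Service { get; } = new MyService();

    public void Dispose()
    {
        // Release any resources the shared service holds.
    }
}

public sealed class MyCachedServiceTests : IClassFixture<MyServiceFixture>
{
    private readonly MyService _service;

    public MyCachedServiceTests(MyServiceFixture fixture)
    {
        _service = fixture.Service; // reused across every test in this class
    }

    [Fact]
    public void UsesSharedService() => Assert.NotNull(_service);
}

// Hypothetical service type, included only to keep this sketch self-contained.
public sealed class MyService { }
```
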
### Strategy 4: Parallel Test Execution

xUnit runs tests in parallel by default. To disable for specific tests:

```csharp
[Collection("Sequential")] // Disable parallelism
public class MySlowTests
{
    // Tests run sequentially within this class
}
```

## Performance Regression Examples

### Example 1: Unexpected Allocations

**Symptom**: Test time increased from 85ms to 450ms after refactoring.

**Cause**: Accidental string concatenation in loop:
```csharp
// Before: 85ms
var hash = string.Join("", hashes);

// After: 450ms (BUG!)
var result = "";
foreach (var h in hashes)
{
    result += h; // ❌ Creates new string every iteration!
}
```

**Fix**: Use `StringBuilder`:
```csharp
var sb = new StringBuilder();
foreach (var h in hashes)
{
    sb.Append(h); // ✅ Efficient
}
var result = sb.ToString();
```

### Example 2: Excessive I/O

**Symptom**: Test time increased from 100ms to 2,500ms.

**Cause**: Reading file from disk every iteration:
```csharp
for (int i = 0; i < 10; i++)
{
    var data = File.ReadAllText("test-data.json"); // ❌ Disk I/O every iteration!
    ProcessData(data);
}
```

**Fix**: Read once, reuse:
```csharp
var data = File.ReadAllText("test-data.json"); // ✅ Read once
for (int i = 0; i < 10; i++)
{
    ProcessData(data);
}
```

### Example 3: Inefficient Sorting

**Symptom**: Test time increased from 165ms to 950ms after adding VEX documents.

**Cause**: Sorting inside loop:
```csharp
for (int i = 0; i < 10; i++)
{
    var sortedVex = vexDocuments.OrderBy(v => v).ToList(); // ❌ Sorts every iteration!
    ProcessVex(sortedVex);
}
```

**Fix**: Sort once, reuse:
```csharp
var sortedVex = vexDocuments.OrderBy(v => v).ToList(); // ✅ Sort once
for (int i = 0; i < 10; i++)
{
    ProcessVex(sortedVex);
}
```

## Monitoring and Alerts

### Slack Alerts

Configure alerts for performance regressions:

```yaml
# .gitea/workflows/cross-platform-determinism.yml
- name: Notify on regression
  if: failure() && steps.performance-check.outcome == 'failure'
  uses: slackapi/slack-github-action@v1
  with:
    payload: |
      {
        "text": "⚠️ Performance regression detected in determinism tests",
        "blocks": [
          {
            "type": "section",
            "text": {
              "type": "mrkdwn",
              "text": "*CGS Test Suite Exceeded Baseline*\n\nBaseline: 3s\nActual: ${{ steps.performance-check.outputs.duration }}s\n\nPlatform: Linux Ubuntu\nCommit: ${{ github.sha }}"
            }
          }
        ]
      }
```

### Grafana Dashboard

Track execution time over time:

```promql
# Prometheus query
histogram_quantile(0.95,
  rate(determinism_test_duration_seconds_bucket{test="CgsHash_10Iterations"}[5m])
)
```

**Dashboard Panels**:
1. Test duration (p50, p95, p99) over time
2. Platform comparison (Windows vs Linux vs macOS vs Alpine)
3. Test failure rate by platform
4. Execution time distribution (histogram)

## References

- **CI/CD Workflow**: `.gitea/workflows/cross-platform-determinism.yml`
- **Test README**: `src/__Tests/Determinism/README.md`
- **Developer Guide**: `docs/testing/DETERMINISM_DEVELOPER_GUIDE.md`
- **Batch Summary**: `docs/implplan/archived/2025-12-29-completed-sprints/BATCH_20251229_BE_COMPLETION_SUMMARY.md`

## Changelog

### 2025-12-29 - Initial Baselines

- Established baselines for CGS, Lineage, VexLens, and Scheduler tests
- Documented platform speed factors (Alpine 1.6x, macOS 1.1x, Windows 1.02x)
- Set regression thresholds (>2x baseline triggers investigation)
- Configured CI/CD performance monitoring