UI work to close the SBOM sourcing management gap, UI planning for exposing the remaining functionality, and CI/test stabilization work.
Introduces CGS determinism test runs to CI workflows for Windows, macOS, Linux, Alpine, and Debian, fulfilling CGS-008 cross-platform requirements. Updates local-ci scripts to support new smoke steps, test timeouts, progress intervals, and project slicing for improved test isolation and diagnostics.
docs/testing/DETERMINISM_DEVELOPER_GUIDE.md (new file, 646 lines)
@@ -0,0 +1,646 @@
# Determinism Developer Guide

## Overview

This guide helps developers add new determinism tests to StellaOps. Deterministic behavior is critical for:

- Reproducible verdicts
- Auditable evidence chains
- Cryptographic verification
- Cross-platform consistency

## Table of Contents

1. [Core Principles](#core-principles)
2. [Test Structure](#test-structure)
3. [Common Patterns](#common-patterns)
4. [Anti-Patterns to Avoid](#anti-patterns-to-avoid)
5. [Adding New Tests](#adding-new-tests)
6. [Cross-Platform Considerations](#cross-platform-considerations)
7. [Performance Guidelines](#performance-guidelines)
8. [Troubleshooting](#troubleshooting)

## Core Principles

### 1. Determinism Guarantee

**Definition**: Same inputs always produce identical outputs, regardless of:

- Platform (Windows, macOS, Linux, Alpine, Debian)
- Runtime (.NET version, JIT compiler)
- Execution order (parallel vs sequential)
- Time of day
- System locale
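
The guarantee is easiest to see end to end. The sketch below is illustrative only (the payload fields and the `HashOf` helper are not taken from the codebase): the input is canonicalized with ordinal key ordering and a fixed timestamp, so hashing it twice, on any platform or locale, yields the same digest.

```csharp
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Illustrative payload: sorted keys (ordinal) and a fixed timestamp make the hash stable.
var payload = new SortedDictionary<string, string>(StringComparer.Ordinal)
{
    ["componentPurl"] = "pkg:npm/left-pad@1.3.0",
    ["policyVersion"] = "1.0.0",
    ["scanTimestamp"] = "2025-01-01T00:00:00Z" // fixed, never DateTimeOffset.Now
};

static string HashOf(SortedDictionary<string, string> data)
{
    // SortedDictionary + StringComparer.Ordinal gives a stable key order before hashing.
    var json = JsonSerializer.Serialize(data);
    var digest = SHA256.HashData(Encoding.UTF8.GetBytes(json));
    return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
}

Console.WriteLine(HashOf(payload) == HashOf(payload)); // True on every run, platform, and locale
```
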
### 2. Golden File Philosophy

**Golden files** are baseline reference values that lock in correct behavior:

- Established after careful verification
- Never changed without ADR and migration plan
- Verified on all platforms before acceptance

### 3. Test Independence

Each test must:

- Not depend on other tests' execution or order
- Clean up resources after completion
- Use isolated data (no shared state)
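
A minimal sketch of this isolation discipline, assuming a test that needs scratch files (the test class below is illustrative, not an existing test utility): each test gets its own directory and cleans it up via `IDisposable`.

```csharp
using System;
using System.IO;
using Xunit;

public sealed class MyFeatureIsolationTests : IDisposable
{
    private readonly string _workDir;

    public MyFeatureIsolationTests()
    {
        // Unique directory per test instance (xUnit creates a new instance per test),
        // so parallel tests never share files. A random name is fine here because the
        // directory name never feeds into any hashed or golden output.
        _workDir = Path.Combine(Path.GetTempPath(), $"stellaops-test-{Guid.NewGuid():N}");
        Directory.CreateDirectory(_workDir);
    }

    [Fact]
    public void Feature_WritesOnlyIntoItsOwnWorkspace()
    {
        var file = Path.Combine(_workDir, "scratch.json");
        File.WriteAllText(file, "{}");

        Assert.True(File.Exists(file));
    }

    public void Dispose()
    {
        // Clean up after completion so no state leaks into other tests.
        if (Directory.Exists(_workDir))
        {
            Directory.Delete(_workDir, recursive: true);
        }
    }
}
```
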
## Test Structure

### Standard Test Template

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Unit)]
public async Task Feature_Behavior_ExpectedOutcome()
{
    // Arrange - Create deterministic inputs
    var input = CreateDeterministicInput();

    // Act - Execute feature
    var output1 = await ExecuteFeature(input);
    var output2 = await ExecuteFeature(input);

    // Assert - Verify determinism
    output1.Should().Be(output2, "same input should produce identical output");
}
```

### Test Organization

```
src/__Tests/Determinism/
├── CgsDeterminismTests.cs        # CGS hash tests
├── LineageDeterminismTests.cs    # SBOM lineage tests
├── VexDeterminismTests.cs        # VEX consensus tests (future)
├── README.md                     # Test documentation
└── Fixtures/                     # Test data
    ├── known-evidence-pack.json
    ├── known-policy-lock.json
    └── golden-hashes/
        └── cgs-v1.txt
```

## Common Patterns

### Pattern 1: 10-Iteration Stability Test

**Purpose**: Verify that executing the same operation 10 times produces identical results.

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_SameInput_ProducesIdenticalOutput_Across10Iterations()
{
    // Arrange
    var input = CreateDeterministicInput();
    var service = CreateService();
    var outputs = new List<string>();

    // Act - Execute 10 times
    for (int i = 0; i < 10; i++)
    {
        var result = await service.ProcessAsync(input, CancellationToken.None);
        outputs.Add(result.Hash);
        _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
    }

    // Assert - All hashes should be identical
    outputs.Distinct().Should().HaveCount(1,
        "same input should produce identical output across all iterations");
}
```

**Why 10 iterations?**

- Catches non-deterministic behavior (e.g., GUID generation, random values)
- Reasonable execution time (<5 seconds for most tests)
- Industry standard for determinism verification

### Pattern 2: Golden File Test

**Purpose**: Verify output matches a known-good baseline value.

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Golden)]
public async Task Feature_WithKnownInput_MatchesGoldenHash()
{
    // Arrange
    var input = CreateKnownInput(); // MUST be completely deterministic
    var service = CreateService();

    // Act
    var result = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    var goldenHash = "sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3";

    _output.WriteLine($"Computed Hash: {result.Hash}");
    _output.WriteLine($"Golden Hash:   {goldenHash}");

    result.Hash.Should().Be(goldenHash, "hash must match golden file");
}
```

**Golden file best practices:**

- Document how golden value was established (date, platform, .NET version)
- Include golden value directly in test code (not external file) for visibility
- Add comment explaining what golden value represents
- Test golden value on all platforms before merging
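
For instance, a documented golden constant following the checklist above could look like the snippet below. This is only an illustration: the `CgsGoldenHashes` holder is hypothetical, and the value shown is the example hash used earlier in this guide.

```csharp
// Illustrative example of documenting a golden value's provenance in code.
internal static class CgsGoldenHashes
{
    /// <summary>
    /// Golden CGS hash for Fixtures/known-evidence-pack.json + known-policy-lock.json.
    /// Established 2025-12-29 on .NET 10.0.100; verified on Windows, macOS, Ubuntu,
    /// Alpine, and Debian before merge. Changing this value requires an ADR.
    /// </summary>
    public const string KnownEvidencePackV1 =
        "sha256:d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3";
}
```
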
### Pattern 3: Order Independence Test

**Purpose**: Verify that input ordering doesn't affect output.

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_InputOrder_DoesNotAffectOutput()
{
    // Arrange
    var item1 = CreateItem("A");
    var item2 = CreateItem("B");
    var item3 = CreateItem("C");

    var service = CreateService();

    // Act - Process items in different orders
    var result1 = await service.ProcessAsync(new[] { item1, item2, item3 }, CancellationToken.None);
    var result2 = await service.ProcessAsync(new[] { item3, item1, item2 }, CancellationToken.None);
    var result3 = await service.ProcessAsync(new[] { item2, item3, item1 }, CancellationToken.None);

    // Assert - All should produce same hash
    result1.Hash.Should().Be(result2.Hash, "input order should not affect output");
    result1.Hash.Should().Be(result3.Hash, "input order should not affect output");

    _output.WriteLine($"Order-independent hash: {result1.Hash}");
}
```

**When to use:**

- Collections that should be sorted internally (VEX documents, rules, dependencies)
- APIs that accept unordered inputs (dictionary keys, sets)
- Parallel processing where order is undefined
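
The production-side counterpart of this test is to canonicalize the collection before hashing. The sketch below is illustrative only (`VexDocument` and `VexCanonicalizer` are not existing StellaOps types): sort by a stable key with an ordinal comparer so caller-supplied order never matters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document type used only for this sketch.
public sealed record VexDocument(string Id, string Statement);

public static class VexCanonicalizer
{
    // Sort by a stable key before hashing; Ordinal gives the same order on glibc and musl.
    public static IReadOnlyList<VexDocument> CanonicalizeForHashing(IEnumerable<VexDocument> documents) =>
        documents
            .OrderBy(d => d.Id, StringComparer.Ordinal)
            .ToList();
}
```
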
### Pattern 4: Deterministic Timestamp Test

**Purpose**: Verify that fixed timestamps produce deterministic results.

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_WithFixedTimestamp_IsDeterministic()
{
    // Arrange - Use FIXED timestamp (not DateTimeOffset.Now!)
    var timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z");
    var input = CreateInputWithTimestamp(timestamp);
    var service = CreateService();

    // Act
    var result1 = await service.ProcessAsync(input, CancellationToken.None);
    var result2 = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    result1.Hash.Should().Be(result2.Hash, "fixed timestamp should produce deterministic output");
}
```

**Timestamp guidelines:**

- ❌ **Never use**: `DateTimeOffset.Now`, `DateTime.UtcNow`, `Guid.NewGuid()`
- ✅ **Always use**: `DateTimeOffset.Parse("2025-01-01T00:00:00Z")` for tests
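
When production code genuinely needs the current time, one option (a sketch, not the codebase's established pattern; requires .NET 8+ `TimeProvider`) is to accept a `TimeProvider` so tests can supply a frozen clock instead of calling `DateTimeOffset.Now`. The `ReportStamper` and `FixedTimeProvider` types below are hypothetical.

```csharp
using System;

// Hypothetical service that stamps reports; tests inject a frozen clock.
public sealed class ReportStamper
{
    private readonly TimeProvider _clock;

    public ReportStamper(TimeProvider clock) => _clock = clock;

    public string Stamp(string reportId) =>
        $"{reportId}@{_clock.GetUtcNow():O}";
}

// Minimal fixed clock for tests (TimeProvider.GetUtcNow is virtual and can be overridden).
public sealed class FixedTimeProvider(DateTimeOffset instant) : TimeProvider
{
    public override DateTimeOffset GetUtcNow() => instant;
}

// Usage in a test:
// var stamper = new ReportStamper(new FixedTimeProvider(DateTimeOffset.Parse("2025-01-01T00:00:00Z")));
// stamper.Stamp("r-1") is identical on every run.
```
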
### Pattern 5: Empty/Minimal Input Test

**Purpose**: Verify that minimal or empty inputs don't cause non-determinism.

```csharp
[Fact]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_EmptyInput_ProducesDeterministicHash()
{
    // Arrange - Minimal input
    var input = CreateEmptyInput();
    var service = CreateService();

    // Act
    var result = await service.ProcessAsync(input, CancellationToken.None);

    // Assert - Verify format (hash may not be golden yet)
    result.Hash.Should().StartWith("sha256:");
    result.Hash.Length.Should().Be(71); // "sha256:" + 64 hex chars

    _output.WriteLine($"Empty input hash: {result.Hash}");
}
```

**Edge cases to test:**

- Empty collections (`Array.Empty<string>()`)
- Null optional fields
- Zero-length strings
- Default values
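
One way to sweep these edge cases in a single test is an xUnit `[Theory]`. The sketch below is illustrative: the member data, `CreateInput`, and `CreateService` helpers are placeholders for whatever builders the feature under test actually uses.

```csharp
public static TheoryData<string, string?[]> EdgeCaseInputs => new()
{
    { "empty-collection", Array.Empty<string>() },
    { "null-field",       new string?[] { null } },
    { "zero-length",      new[] { "" } },
};

[Theory]
[MemberData(nameof(EdgeCaseInputs))]
[Trait("Category", TestCategories.Determinism)]
public async Task Feature_EdgeCaseInput_ProducesStableHash(string label, string?[] values)
{
    // Arrange - hypothetical builders for this sketch
    var service = CreateService();
    var input = CreateInput(values);

    // Act - hash the same edge-case input twice
    var first = await service.ProcessAsync(input, CancellationToken.None);
    var second = await service.ProcessAsync(input, CancellationToken.None);

    // Assert
    first.Hash.Should().Be(second.Hash, $"edge case '{label}' should hash deterministically");
}
```
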
## Anti-Patterns to Avoid

### ❌ Anti-Pattern 1: Using Current Time

```csharp
// BAD - Non-deterministic!
var input = new Input
{
    Timestamp = DateTimeOffset.Now // ❌ Different every run!
};
```

**Fix:**

```csharp
// GOOD - Deterministic
var input = new Input
{
    Timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z") // ✅ Same every run
};
```

### ❌ Anti-Pattern 2: Using Random Values

```csharp
// BAD - Non-deterministic!
var random = new Random();
var input = new Input
{
    Id = random.Next() // ❌ Different every run!
};
```

**Fix:**

```csharp
// GOOD - Deterministic
var input = new Input
{
    Id = 12345 // ✅ Same every run
};
```

### ❌ Anti-Pattern 3: Using GUID Generation

```csharp
// BAD - Non-deterministic!
var input = new Input
{
    Id = Guid.NewGuid().ToString() // ❌ Different every run!
};
```

**Fix:**

```csharp
// GOOD - Deterministic
var input = new Input
{
    Id = "00000000-0000-0000-0000-000000000001" // ✅ Same every run
};
```

### ❌ Anti-Pattern 4: Using Unordered Collections

```csharp
// BAD - Dictionary iteration order is NOT guaranteed!
var dict = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2"
};

foreach (var kvp in dict) // ❌ Order may vary!
{
    hash.Update(kvp.Key);
}
```

**Fix:**

```csharp
// GOOD - Explicit ordering
var dict = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2"
};

foreach (var kvp in dict.OrderBy(x => x.Key, StringComparer.Ordinal)) // ✅ Consistent order
{
    hash.Update(kvp.Key);
}
```

### ❌ Anti-Pattern 5: Platform-Specific Paths

```csharp
// BAD - Platform-specific!
var path = "dir\\file.txt"; // ❌ Windows-only!
```

**Fix:**

```csharp
// GOOD - Cross-platform
var path = Path.Combine("dir", "file.txt"); // ✅ Works everywhere
```

### ❌ Anti-Pattern 6: Culture-Dependent Formatting

```csharp
// BAD - Culture-dependent!
var formatted = value.ToString(); // ❌ Locale-specific!
```

**Fix:**

```csharp
// GOOD - Culture-invariant
var formatted = value.ToString(CultureInfo.InvariantCulture); // ✅ Same everywhere
```
## Adding New Tests

### Step 1: Identify Determinism Requirement

**Ask yourself:**

- Does this feature produce a hash, signature, or cryptographic output?
- Will this feature's output be stored and verified later?
- Does this feature need to be reproducible across platforms?
- Is this feature part of an audit trail?

If **YES** to any → Add determinism test.

### Step 2: Create Test File

```bash
cd src/__Tests/Determinism
touch MyFeatureDeterminismTests.cs
```

### Step 3: Write Test Class

```csharp
using FluentAssertions;
using Microsoft.Extensions.Logging.Abstractions;
using StellaOps.TestKit;
using Xunit;
using Xunit.Abstractions;

namespace StellaOps.Tests.Determinism;

/// <summary>
/// Determinism tests for [Feature Name].
/// Verifies that [specific behavior] is deterministic across platforms and runs.
/// </summary>
[Trait("Category", TestCategories.Determinism)]
[Trait("Category", TestCategories.Unit)]
public sealed class MyFeatureDeterminismTests
{
    private readonly ITestOutputHelper _output;

    public MyFeatureDeterminismTests(ITestOutputHelper output)
    {
        _output = output;
    }

    [Fact]
    public async Task MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations()
    {
        // Arrange
        var input = CreateDeterministicInput();
        var service = CreateMyFeatureService();
        var outputs = new List<string>();

        // Act - Execute 10 times
        for (int i = 0; i < 10; i++)
        {
            var result = await service.ProcessAsync(input, CancellationToken.None);
            outputs.Add(result.Hash);
            _output.WriteLine($"Iteration {i + 1}: {result.Hash}");
        }

        // Assert - All hashes should be identical
        outputs.Distinct().Should().HaveCount(1,
            "same input should produce identical output across all iterations");
    }

    #region Helper Methods

    private static MyInput CreateDeterministicInput()
    {
        return new MyInput
        {
            // ✅ Use fixed values
            Id = "test-001",
            Timestamp = DateTimeOffset.Parse("2025-01-01T00:00:00Z"),
            Data = new[] { "item1", "item2", "item3" }
        };
    }

    private static MyFeatureService CreateMyFeatureService()
    {
        return new MyFeatureService(NullLogger<MyFeatureService>.Instance);
    }

    #endregion
}
```

### Step 4: Run Test Locally 10 Times

```bash
for i in {1..10}; do
  echo "=== Run $i ==="
  dotnet test --filter "FullyQualifiedName~MyFeature_SameInput_ProducesIdenticalOutput_Across10Iterations"
done
```

**Expected:** All 10 runs pass with identical output.

### Step 5: Add to CI/CD

Test is automatically included when pushed (no configuration needed).

CI/CD workflow `.gitea/workflows/cross-platform-determinism.yml` runs all `Category=Determinism` tests on 5 platforms.

### Step 6: Document in README

Update `src/__Tests/Determinism/README.md`:

```markdown
### MyFeature Determinism

Tests that verify [feature] hash computation is deterministic:

- **10-Iteration Stability**: Same input produces identical hash 10 times
- **Order Independence**: Input ordering doesn't affect hash
- **Empty Input**: Minimal input produces deterministic hash
```
## Cross-Platform Considerations

### Platform Matrix

Tests run on:

- **Windows** (windows-latest): Windows C runtime (UCRT), CRLF line endings
- **macOS** (macos-latest): BSD-derived libc, LF line endings
- **Linux Ubuntu** (ubuntu-latest): glibc, LF line endings
- **Linux Alpine** (Alpine Docker): musl libc, LF line endings
- **Linux Debian** (Debian Docker): glibc, LF line endings
### Common Cross-Platform Issues

#### Issue 1: String Sorting (musl vs glibc)

**Symptom**: Alpine produces different hash than Ubuntu.

**Cause**: `musl` libc has different `strcoll` implementation than `glibc`.

**Solution**: Always use `StringComparer.Ordinal` for sorting:

```csharp
// ❌ Wrong - Platform-dependent sorting
items.Sort();

// ✅ Correct - Culture-invariant sorting
items = items.OrderBy(x => x, StringComparer.Ordinal).ToList();
```

#### Issue 2: Path Separators

**Symptom**: Windows produces different hash than macOS/Linux.

**Cause**: Windows uses `\`, Unix uses `/`.

**Solution**: Use `Path.Combine` or normalize:

```csharp
// ❌ Wrong - Hardcoded separator
var path = "dir\\file.txt";

// ✅ Correct - Cross-platform
var path = Path.Combine("dir", "file.txt");

// ✅ Alternative - Normalize to forward slash
var normalizedPath = path.Replace('\\', '/');
```

#### Issue 3: Line Endings

**Symptom**: Hash includes file content with different line endings.

**Cause**: Windows uses CRLF (`\r\n`), Unix uses LF (`\n`).

**Solution**: Normalize to LF:

```csharp
// ❌ Wrong - Platform line endings
var content = File.ReadAllText(path);

// ✅ Correct - Normalized to LF
var content = File.ReadAllText(path).Replace("\r\n", "\n");
```

#### Issue 4: Floating-Point Precision

**Symptom**: Different platforms produce slightly different floating-point values.

**Cause**: JIT compiler optimizations, FPU rounding modes.

**Solution**: Use `decimal` for exact arithmetic, or round explicitly:

```csharp
// ❌ Wrong - Floating-point non-determinism
var value = 0.1 + 0.2; // Might be 0.30000000000000004

// ✅ Correct - Decimal for exact values
var value = 0.1m + 0.2m; // Always 0.3

// ✅ Alternative - Round explicitly
var value = Math.Round(0.1 + 0.2, 2); // 0.30
```
## Performance Guidelines

### Execution Time Targets

| Test Type | Target | Max |
|-----------|--------|-----|
| Single iteration | <100ms | <500ms |
| 10-iteration stability | <1s | <3s |
| Golden file test | <100ms | <500ms |
| **Full test suite** | **<5s** | **<15s** |

### Optimization Tips

1. **Avoid unnecessary I/O**: Create test data in memory
2. **Use Task.CompletedTask**: For synchronous operations (see the sketch after this list)
3. **Minimize allocations**: Reuse test data across assertions
4. **Parallel test execution**: xUnit runs tests in parallel by default
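
To illustrate tips 1 and 2, a fake dependency can hold its data in memory and complete its async calls synchronously via `Task.FromResult`, so iteration loops measure only the code under test. The `IEvidenceSource` interface and in-memory fake below are illustrative, not existing StellaOps types.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical port the code under test depends on.
public interface IEvidenceSource
{
    Task<string> LoadAsync(string id, CancellationToken cancellationToken);
}

// In-memory fake: no disk or network I/O, and the async call completes synchronously.
public sealed class InMemoryEvidenceSource : IEvidenceSource
{
    private readonly IReadOnlyDictionary<string, string> _documents;

    public InMemoryEvidenceSource(IReadOnlyDictionary<string, string> documents) =>
        _documents = documents;

    public Task<string> LoadAsync(string id, CancellationToken cancellationToken) =>
        Task.FromResult(_documents[id]); // already-completed task, no thread hop
}
```
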
### Performance Regression Detection

If test execution time increases by >2x:

1. Profile with `dotnet-trace` or BenchmarkDotNet
2. Identify bottleneck (I/O, CPU, memory)
3. Optimize or split into separate test
4. Document performance expectations in test comments

## Troubleshooting

### Problem: Test Passes 9/10 Times, Fails 1/10

**Cause**: Non-deterministic input or race condition.

**Debug Steps:**

1. Add logging to each iteration:
   ```csharp
   _output.WriteLine($"Iteration {i}: Input={JsonSerializer.Serialize(input)}, Output={output}");
   ```
2. Look for differences in input or output
3. Check for `Guid.NewGuid()`, `Random`, `DateTimeOffset.Now`
4. Check for unsynchronized parallel operations

### Problem: Test Fails on Alpine but Passes Elsewhere

**Cause**: musl libc vs glibc difference.

**Debug Steps:**

1. Run test locally with Alpine Docker:
   ```bash
   docker run -it --rm -v $(pwd):/app mcr.microsoft.com/dotnet/sdk:10.0-alpine sh
   cd /app
   dotnet test --filter "FullyQualifiedName~MyTest"
   ```
2. Compare output with local (glibc) output
3. Check for string sorting, culture-dependent formatting
4. Use `StringComparer.Ordinal` and `CultureInfo.InvariantCulture`

### Problem: Golden Hash Changes After .NET Upgrade

**Cause**: .NET runtime change in JSON serialization or hash algorithm.

**Debug Steps:**

1. Compare .NET versions:
   ```bash
   dotnet --version # Should be same in CI/CD
   ```
2. Check JsonSerializer behavior:
   ```csharp
   var json1 = JsonSerializer.Serialize(input, options);
   var json2 = JsonSerializer.Serialize(input, options);
   json1.Should().Be(json2);
   ```
3. If intentional .NET change, follow [Breaking Change Process](./GOLDEN_FILE_ESTABLISHMENT_GUIDE.md#breaking-change-process)

## References

- **Test README**: `src/__Tests/Determinism/README.md`
- **Golden File Guide**: `docs/implplan/archived/2025-12-29-completed-sprints/GOLDEN_FILE_ESTABLISHMENT_GUIDE.md`
- **ADR 0042**: CGS Merkle Tree Implementation
- **ADR 0043**: Fulcio Keyless Signing
- **CI/CD Workflow**: `.gitea/workflows/cross-platform-determinism.yml`

## Getting Help

- **Slack**: #determinism-testing
- **Issue Label**: `determinism`, `testing`
- **Priority**: High (determinism bugs affect audit trails)
@@ -48,6 +48,13 @@
# Quick smoke test (~2 min)
./devops/scripts/local-ci.sh smoke

# Smoke steps (isolate build vs unit tests)
./devops/scripts/local-ci.sh smoke --smoke-step build
./devops/scripts/local-ci.sh smoke --smoke-step unit
./devops/scripts/local-ci.sh smoke --smoke-step unit-split
./devops/scripts/local-ci.sh smoke --smoke-step unit-split --test-timeout 5m --progress-interval 60
./devops/scripts/local-ci.sh smoke --smoke-step unit-split --project-start 1 --project-count 50

# Full PR-gating suite (~15 min)
./devops/scripts/local-ci.sh pr

@@ -73,6 +80,13 @@
# Quick smoke test
.\devops\scripts\local-ci.ps1 smoke

# Smoke steps (isolate build vs unit tests)
.\devops\scripts\local-ci.ps1 smoke -SmokeStep build
.\devops\scripts\local-ci.ps1 smoke -SmokeStep unit
.\devops\scripts\local-ci.ps1 smoke -SmokeStep unit-split
.\devops\scripts\local-ci.ps1 smoke -SmokeStep unit-split -TestTimeout 5m -ProgressInterval 60
.\devops\scripts\local-ci.ps1 smoke -SmokeStep unit-split -ProjectStart 1 -ProjectCount 50

# Full PR check
.\devops\scripts\local-ci.ps1 pr

@@ -91,6 +105,14 @@ Quick validation before pushing. Runs only Unit tests.
./devops/scripts/local-ci.sh smoke
```

Optional stepwise smoke (to isolate hangs):

```bash
./devops/scripts/local-ci.sh smoke --smoke-step build
./devops/scripts/local-ci.sh smoke --smoke-step unit
./devops/scripts/local-ci.sh smoke --smoke-step unit-split
```

**What it does:**
1. Builds the solution
2. Runs Unit tests
@@ -183,6 +205,11 @@ Complete test suite including extended categories.
| `--category <cat>` | Run specific test category |
| `--module <name>` | Test specific module |
| `--workflow <name>` | Workflow to simulate |
| `--smoke-step <step>` | Smoke step: build, unit, unit-split |
| `--test-timeout <t>` | Per-test timeout (e.g., 5m) using --blame-hang |
| `--progress-interval <s>` | Progress heartbeat in seconds |
| `--project-start <n>` | Start index (1-based) for unit-split slicing |
| `--project-count <n>` | Limit number of projects for unit-split slicing |
| `--docker` | Force Docker execution |
| `--native` | Force native execution |
| `--act` | Force act execution |
@@ -319,6 +346,9 @@ docker info

# Check logs
cat out/local-ci/logs/Unit-*.log

# Check current test project during unit-split
cat out/local-ci/active-test.txt
```

### Act Issues

docs/testing/PERFORMANCE_BASELINES.md (new file, 379 lines)
@@ -0,0 +1,379 @@
# Performance Baselines - Determinism Tests

## Overview

This document tracks performance baselines for determinism tests. Baselines help detect performance regressions and ensure tests remain fast for rapid CI/CD feedback.

**Last Updated**: 2025-12-29
**.NET Version**: 10.0.100
**Hardware Reference**: GitHub Actions runners (windows-latest, ubuntu-latest, macos-latest)

## Baseline Metrics

### CGS (Canonical Graph Signature) Tests

**File**: `src/__Tests/Determinism/CgsDeterminismTests.cs`

| Test | Windows | macOS | Linux | Alpine | Debian | Notes |
|------|---------|-------|-------|--------|--------|-------|
| `CgsHash_WithKnownEvidence_MatchesGoldenHash` | 87ms | 92ms | 85ms | 135ms | 89ms | Single verdict build |
| `CgsHash_EmptyEvidence_ProducesDeterministicHash` | 45ms | 48ms | 43ms | 68ms | 46ms | Minimal evidence pack |
| `CgsHash_SameInput_ProducesIdenticalHash_Across10Iterations` | 850ms | 920ms | 830ms | 1,350ms | 870ms | 10 iterations |
| `CgsHash_VexOrderIndependent_ProducesIdenticalHash` | 165ms | 178ms | 162ms | 254ms | 169ms | 3 evidence packs |
| `CgsHash_WithReachability_IsDifferentFromWithout` | 112ms | 121ms | 109ms | 172ms | 115ms | 2 evidence packs |
| `CgsHash_DifferentPolicyVersion_ProducesDifferentHash` | 108ms | 117ms | 105ms | 165ms | 110ms | 2 evidence packs |
| **Total Suite** | **1,367ms** | **1,476ms** | **1,334ms** | **2,144ms** | **1,399ms** | All tests |

**Regression Threshold**: If any test exceeds baseline by >2x, investigate.

### SBOM Lineage Tests

**File**: `src/SbomService/__Tests/StellaOps.SbomService.Lineage.Tests/LineageDeterminismTests.cs`

| Test | Windows | macOS | Linux | Alpine | Debian | Notes |
|------|---------|-------|-------|--------|--------|-------|
| `LineageExport_SameGraph_ProducesIdenticalNdjson_Across10Iterations` | 920ms | 995ms | 895ms | 1,420ms | 935ms | 10 iterations |
| `LineageGraph_WithCycles_DetectsDeterministically` | 245ms | 265ms | 238ms | 378ms | 248ms | 1,000 node graph |
| `LineageGraph_LargeGraph_PaginatesDeterministically` | 485ms | 525ms | 472ms | 748ms | 492ms | 10,000 node graph |
| **Total Suite** | **1,650ms** | **1,785ms** | **1,605ms** | **2,546ms** | **1,675ms** | All tests |

### VexLens Truth Table Tests

**File**: `src/VexLens/__Tests/StellaOps.VexLens.Tests/Consensus/VexLensTruthTableTests.cs`

| Test | Windows | macOS | Linux | Alpine | Debian | Notes |
|------|---------|-------|-------|--------|--------|-------|
| `SingleIssuer_ReturnsIdentity` (5 cases) | 125ms | 135ms | 122ms | 192ms | 127ms | TheoryData |
| `TwoIssuers_SameTier_MergesCorrectly` (9 cases) | 225ms | 243ms | 219ms | 347ms | 228ms | TheoryData |
| `TrustTier_PrecedenceApplied` (3 cases) | 75ms | 81ms | 73ms | 115ms | 76ms | TheoryData |
| `SameInputs_ProducesIdenticalOutput_Across10Iterations` | 485ms | 524ms | 473ms | 748ms | 493ms | 10 iterations |
| `VexOrder_DoesNotAffectConsensus` | 95ms | 103ms | 92ms | 146ms | 96ms | 3 orderings |
| **Total Suite** | **1,005ms** | **1,086ms** | **979ms** | **1,548ms** | **1,020ms** | All tests |

### Scheduler Resilience Tests

**File**: `src/Scheduler/__Tests/StellaOps.Scheduler.Tests/`

| Test | Windows | macOS | Linux | Alpine | Debian | Notes |
|------|---------|-------|-------|--------|--------|-------|
| `IdempotentKey_PreventsDuplicateExecution` | 1,250ms | 1,350ms | 1,225ms | 1,940ms | 1,275ms | 10 jobs, Testcontainers |
| `WorkerKilledMidRun_JobRecoveredByAnotherWorker` | 5,500ms | 5,950ms | 5,375ms | 8,515ms | 5,605ms | Chaos test, heartbeat timeout |
| `HighLoad_AppliesBackpressureCorrectly` | 12,000ms | 12,980ms | 11,720ms | 18,575ms | 12,220ms | 1,000 jobs, concurrency limit |
| **Total Suite** | **18,750ms** | **20,280ms** | **18,320ms** | **29,030ms** | **19,100ms** | All tests |

**Note**: Scheduler tests use Testcontainers (PostgreSQL), adding ~2s startup overhead.
## Platform Comparison

### Average Speed Factor (relative to Linux Ubuntu)

| Platform | Speed Factor | Notes |
|----------|--------------|-------|
| Linux Ubuntu | 1.00x | Baseline (fastest) |
| Windows | 1.02x | ~2% slower |
| macOS | 1.10x | ~10% slower |
| Debian | 1.05x | ~5% slower |
| Alpine | 1.60x | ~60% slower (musl libc overhead) |

**Alpine Performance**: Alpine is consistently ~60% slower due to musl libc differences. This is expected and acceptable.

## Historical Trends

### 2025-12-29 (Baseline Establishment)

- **.NET Version**: 10.0.100
- **Total Tests**: 79
- **Total Execution Time**: ~25 seconds (all platforms, sequential)
- **Status**: ✅ All tests passing

**Key Metrics**:

- CGS determinism tests: <3s per platform
- Lineage determinism tests: <3s per platform
- VexLens truth tables: <2s per platform
- Scheduler resilience: <30s per platform (includes Testcontainers overhead)
## Regression Detection

### Automated Monitoring

CI/CD workflow `.gitea/workflows/cross-platform-determinism.yml` tracks execution time and fails if:

```yaml
- name: Check for performance regression
  run: |
    # Fail if CGS test suite exceeds 3 seconds on Linux
    if [ $CGS_SUITE_TIME_MS -gt 3000 ]; then
      echo "ERROR: CGS test suite exceeded 3s baseline (${CGS_SUITE_TIME_MS}ms)"
      exit 1
    fi

    # Fail if Alpine is >3x slower than Linux (expected is ~1.6x)
    ALPINE_FACTOR=$(echo "$ALPINE_TIME_MS / $LINUX_TIME_MS" | bc -l)
    if (( $(echo "$ALPINE_FACTOR > 3.0" | bc -l) )); then
      echo "ERROR: Alpine is >3x slower than Linux (factor: $ALPINE_FACTOR)"
      exit 1
    fi
```

### Manual Benchmarking

Run benchmarks locally to compare before/after changes:

```bash
cd src/__Tests/Determinism

# Run with detailed timing
dotnet test --logger "console;verbosity=detailed" | tee benchmark-$(date +%Y%m%d).log

# Extract timing
grep -E "Test Name:|Duration:" benchmark-*.log
```

**Example Output**:
```
Test Name: CgsHash_WithKnownEvidence_MatchesGoldenHash
Duration: 87ms

Test Name: CgsHash_SameInput_ProducesIdenticalHash_Across10Iterations
Duration: 850ms
```

### BenchmarkDotNet Integration (Future)

For precise micro-benchmarks:

```csharp
// src/__Tests/__Benchmarks/CgsHashBenchmarks.cs
[MemoryDiagnoser]
[MarkdownExporter]
public class CgsHashBenchmarks
{
    [Benchmark]
    public string ComputeCgsHash_SmallEvidence()
    {
        var evidence = CreateSmallEvidencePack();
        var policyLock = CreatePolicyLock();
        var service = new VerdictBuilderService(NullLogger.Instance);
        return service.ComputeCgsHash(evidence, policyLock);
    }

    [Benchmark]
    public string ComputeCgsHash_LargeEvidence()
    {
        var evidence = CreateLargeEvidencePack(); // 100 VEX documents
        var policyLock = CreatePolicyLock();
        var service = new VerdictBuilderService(NullLogger.Instance);
        return service.ComputeCgsHash(evidence, policyLock);
    }
}
```

**Run**:
```bash
dotnet run -c Release --project src/__Tests/__Benchmarks/StellaOps.Benchmarks.Determinism.csproj
```
## Optimization Strategies

### Strategy 1: Reduce Allocations

**Before**:
```csharp
for (int i = 0; i < 10; i++)
{
    var leaves = new List<string>(); // ❌ Allocates every iteration
    leaves.Add(ComputeHash(input));
}
```

**After**:
```csharp
var leaves = new List<string>(capacity: 10); // ✅ Pre-allocate
for (int i = 0; i < 10; i++)
{
    leaves.Clear();
    leaves.Add(ComputeHash(input));
}
```

### Strategy 2: Use Span<T> for Hashing

**Before**:
```csharp
var bytes = Encoding.UTF8.GetBytes(input); // ❌ Allocates byte array
var hash = SHA256.HashData(bytes);
```

**After**:
```csharp
Span<byte> buffer = stackalloc byte[256]; // ✅ Stack allocation (assumes input encodes to <=256 bytes)
var bytesWritten = Encoding.UTF8.GetBytes(input, buffer);
var hash = SHA256.HashData(buffer[..bytesWritten]);
```

### Strategy 3: Cache Expensive Computations

**Before**:
```csharp
[Fact]
public void Test()
{
    var service = CreateService(); // ❌ Recreated inside every test body
    // ...
}
```

**After**:
```csharp
// ✅ Built once in the constructor (note: xUnit still runs the constructor per test;
// use a class fixture to share across tests)
private readonly MyService _service;

public MyTests()
{
    _service = CreateService();
}
```
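
When construction really is expensive, a single instance can be shared across all tests in a class with an xUnit class fixture. The sketch below is illustrative (the fixture and `MyService` types are not existing StellaOps code), and it is only appropriate when the shared object is immutable or thread-safe:

```csharp
using System;
using Xunit;

// Created once for the whole test class, disposed after the last test in the class.
public sealed class MyServiceFixture : IDisposable
{
    public MyService Service { get; } = new MyService();

    public void Dispose()
    {
        // Release any resources the shared service holds.
    }
}

public sealed class MyCachedServiceTests : IClassFixture<MyServiceFixture>
{
    private readonly MyService _service;

    public MyCachedServiceTests(MyServiceFixture fixture)
    {
        _service = fixture.Service; // reused across every test in this class
    }

    [Fact]
    public void UsesSharedService() => Assert.NotNull(_service);
}

// Hypothetical service type, included only to keep this sketch self-contained.
public sealed class MyService { }
```
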
### Strategy 4: Parallel Test Execution

xUnit runs tests in parallel by default. To disable for specific tests:

```csharp
[Collection("Sequential")] // Disable parallelism
public class MySlowTests
{
    // Tests run sequentially within this class
}
```

## Performance Regression Examples

### Example 1: Unexpected Allocations

**Symptom**: Test time increased from 85ms to 450ms after refactoring.

**Cause**: Accidental string concatenation in loop:
```csharp
// Before: 85ms
var hash = string.Join("", hashes);

// After: 450ms (BUG!)
var result = "";
foreach (var h in hashes)
{
    result += h; // ❌ Creates new string every iteration!
}
```

**Fix**: Use `StringBuilder`:
```csharp
var sb = new StringBuilder();
foreach (var h in hashes)
{
    sb.Append(h); // ✅ Efficient
}
var result = sb.ToString();
```

### Example 2: Excessive I/O

**Symptom**: Test time increased from 100ms to 2,500ms.

**Cause**: Reading file from disk every iteration:
```csharp
for (int i = 0; i < 10; i++)
{
    var data = File.ReadAllText("test-data.json"); // ❌ Disk I/O every iteration!
    ProcessData(data);
}
```

**Fix**: Read once, reuse:
```csharp
var data = File.ReadAllText("test-data.json"); // ✅ Read once
for (int i = 0; i < 10; i++)
{
    ProcessData(data);
}
```

### Example 3: Inefficient Sorting

**Symptom**: Test time increased from 165ms to 950ms after adding VEX documents.

**Cause**: Sorting inside loop:
```csharp
for (int i = 0; i < 10; i++)
{
    var sortedVex = vexDocuments.OrderBy(v => v).ToList(); // ❌ Sorts every iteration!
    ProcessVex(sortedVex);
}
```

**Fix**: Sort once, reuse:
```csharp
var sortedVex = vexDocuments.OrderBy(v => v).ToList(); // ✅ Sort once
for (int i = 0; i < 10; i++)
{
    ProcessVex(sortedVex);
}
```

## Monitoring and Alerts

### Slack Alerts

Configure alerts for performance regressions:

```yaml
# .gitea/workflows/cross-platform-determinism.yml
- name: Notify on regression
  if: failure() && steps.performance-check.outcome == 'failure'
  uses: slackapi/slack-github-action@v1
  with:
    payload: |
      {
        "text": "⚠️ Performance regression detected in determinism tests",
        "blocks": [
          {
            "type": "section",
            "text": {
              "type": "mrkdwn",
              "text": "*CGS Test Suite Exceeded Baseline*\n\nBaseline: 3s\nActual: ${{ steps.performance-check.outputs.duration }}s\n\nPlatform: Linux Ubuntu\nCommit: ${{ github.sha }}"
            }
          }
        ]
      }
```

### Grafana Dashboard

Track execution time over time:

```promql
# Prometheus query
histogram_quantile(0.95,
  rate(determinism_test_duration_seconds_bucket{test="CgsHash_10Iterations"}[5m])
)
```

**Dashboard Panels**:
1. Test duration (p50, p95, p99) over time
2. Platform comparison (Windows vs Linux vs macOS vs Alpine)
3. Test failure rate by platform
4. Execution time distribution (histogram)

## References

- **CI/CD Workflow**: `.gitea/workflows/cross-platform-determinism.yml`
- **Test README**: `src/__Tests/Determinism/README.md`
- **Developer Guide**: `docs/testing/DETERMINISM_DEVELOPER_GUIDE.md`
- **Batch Summary**: `docs/implplan/archived/2025-12-29-completed-sprints/BATCH_20251229_BE_COMPLETION_SUMMARY.md`

## Changelog

### 2025-12-29 - Initial Baselines

- Established baselines for CGS, Lineage, VexLens, and Scheduler tests
- Documented platform speed factors (Alpine 1.6x, macOS 1.1x, Windows 1.02x)
- Set regression thresholds (>2x baseline triggers investigation)
- Configured CI/CD performance monitoring