docs consolidation
docs/testing/e2e-reproducibility.md
@@ -0,0 +1,354 @@
# End-to-End Reproducibility Testing Guide

> **Sprint:** SPRINT_8200_0001_0004_e2e_reproducibility_test
> **Tasks:** E2E-8200-025, E2E-8200-026
> **Last Updated:** 2025-06-15

## Overview

StellaOps implements comprehensive end-to-end (E2E) reproducibility testing to ensure that identical inputs always produce identical outputs across:

- Sequential pipeline runs
- Parallel pipeline runs
- Different execution environments (Ubuntu, Windows, macOS)
- Different points in time (using frozen timestamps)

This document describes the E2E test structure, how to run tests, and how to troubleshoot reproducibility failures.

## Test Architecture

### Pipeline Stages

The E2E reproducibility tests cover the full security scanning pipeline:

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Full E2E Pipeline                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    ┌───────────┐    ┌──────┐    ┌────────┐    ┌──────────┐    │
│  │  Ingest  │───▶│ Normalize │───▶│ Diff │───▶│ Decide │───▶│  Attest  │    │
│  │ Advisory │    │  Merge &  │    │ SBOM │    │ Policy │    │   DSSE   │    │
│  │  Feeds   │    │   Dedup   │    │  vs  │    │ Verdict│    │ Envelope │    │
│  └──────────┘    └───────────┘    │Adviso│    └────────┘    └──────────┘    │
│                                   │ries  │                       │          │
│                                   └──────┘                       ▼          │
│                                                             ┌──────────┐    │
│                                                             │  Bundle  │    │
│                                                             │ Package  │    │
│                                                             └──────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Key Components

| Component | File | Purpose |
|-----------|------|---------|
| Test Project | `StellaOps.Integration.E2E.csproj` | MSBuild project for E2E tests |
| Test Fixture | `E2EReproducibilityTestFixture.cs` | Pipeline composition and execution |
| Tests | `E2EReproducibilityTests.cs` | Reproducibility verification tests |
| Comparer | `ManifestComparer.cs` | Byte-for-byte manifest comparison |
| CI Workflow | `.gitea/workflows/e2e-reproducibility.yml` | Cross-platform CI pipeline |

## Running E2E Tests

### Prerequisites

- .NET 10.0 SDK
- Docker (for PostgreSQL container)
- At least 4GB RAM available

### Local Execution

```bash
# Run all E2E reproducibility tests
dotnet test tests/integration/StellaOps.Integration.E2E/ \
  --logger "console;verbosity=detailed"

# Run specific test category
dotnet test tests/integration/StellaOps.Integration.E2E/ \
  --filter "Category=Integration" \
  --logger "console;verbosity=detailed"

# Run with code coverage
dotnet test tests/integration/StellaOps.Integration.E2E/ \
  --collect:"XPlat Code Coverage" \
  --results-directory ./TestResults
```

### CI Execution

E2E tests run in CI on the following triggers:

- Pull requests affecting `src/**` or `tests/integration/**`
- Pushes to `main` and `develop` branches
- Nightly at 2:00 AM UTC (full cross-platform suite)
- Manual trigger with optional cross-platform flag

## Test Categories

### 1. Sequential Reproducibility (Tasks 11-14)

Tests that the pipeline produces identical results when run multiple times:

```csharp
[Fact]
public async Task FullPipeline_ProducesIdenticalVerdictHash_AcrossRuns()
{
    // Arrange
    var inputs = await _fixture.SnapshotInputsAsync();

    // Act - Run twice
    var result1 = await _fixture.RunFullPipelineAsync(inputs);
    var result2 = await _fixture.RunFullPipelineAsync(inputs);

    // Assert
    result1.VerdictId.Should().Be(result2.VerdictId);
    result1.BundleManifestHash.Should().Be(result2.BundleManifestHash);
}
```

### 2. Parallel Reproducibility (Task 14)

Tests that concurrent execution produces identical results:

```csharp
[Fact]
public async Task FullPipeline_ParallelExecution_10Concurrent_AllIdentical()
{
    var inputs = await _fixture.SnapshotInputsAsync();
    const int concurrentRuns = 10;

    var tasks = Enumerable.Range(0, concurrentRuns)
        .Select(_ => _fixture.RunFullPipelineAsync(inputs));

    var results = await Task.WhenAll(tasks);
    var comparison = ManifestComparer.CompareMultiple(results.ToList());

    comparison.AllMatch.Should().BeTrue();
}
```
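
This guide does not show how `ManifestComparer.CompareMultiple` arrives at `AllMatch`. The following is a minimal sketch of one plausible approach, assuming each run exposes its verdict ID and manifest hash as strings. `RunHashes`, `MultiComparison`, and `MultiRunCheck` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for illustration; the real result types may differ.
public sealed record RunHashes(string VerdictId, string BundleManifestHash);
public sealed record MultiComparison(bool AllMatch, IReadOnlyList<int> MismatchedRuns);

public static class MultiRunCheck
{
    // Treats the first run as the reference and reports the index of every
    // later run whose hashes differ from it.
    public static MultiComparison CompareAll(IReadOnlyList<RunHashes> runs)
    {
        if (runs.Count == 0)
            throw new ArgumentException("At least one run is required.", nameof(runs));

        var reference = runs[0];
        var mismatched = Enumerable.Range(1, runs.Count - 1)
            .Where(i => runs[i] != reference) // records compare by value
            .ToList();

        return new MultiComparison(mismatched.Count == 0, mismatched);
    }
}
```

Reporting the mismatched indices, rather than a bare boolean, makes it easier to tell whether one run is an outlier or whether every run disagrees.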

### 3. Cross-Platform Reproducibility (Tasks 15-18)

Tests that identical inputs produce identical outputs on different operating systems:

| Platform | Runner | Status |
|----------|--------|--------|
| Ubuntu | `ubuntu-latest` | Primary (runs on every PR) |
| Windows | `windows-latest` | Nightly / On-demand |
| macOS | `macos-latest` | Nightly / On-demand |

### 4. Golden Baseline Verification (Tasks 19-21)

Tests that current results match a pre-approved baseline stored in `bench/determinism/golden-baseline/e2e-hashes.json`:

```json
{
  "verdict_hash": "sha256:abc123...",
  "manifest_hash": "sha256:def456...",
  "envelope_hash": "sha256:ghi789...",
  "updated_at": "2025-06-15T12:00:00Z",
  "updated_by": "ci",
  "commit": "abc123def456"
}
```
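
A baseline check along these lines could read the hash fields and report which ones drifted. This is a sketch only: `BaselineCheck` and `FindMismatches` are hypothetical names, and the field names are the ones in the JSON shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class BaselineCheck
{
    // Returns one message per hash that differs from the baseline; an empty
    // list means the run matches. Throws KeyNotFoundException if the baseline
    // lacks a requested field.
    public static IReadOnlyList<string> FindMismatches(
        string baselineJson,
        IReadOnlyDictionary<string, string> actualHashes)
    {
        // Tolerate comments in case the baseline file carries annotations.
        var options = new JsonDocumentOptions { CommentHandling = JsonCommentHandling.Skip };
        using var baseline = JsonDocument.Parse(baselineJson, options);

        var mismatches = new List<string>();
        foreach (var (field, actual) in actualHashes)
        {
            var expected = baseline.RootElement.GetProperty(field).GetString();
            if (!string.Equals(expected, actual, StringComparison.Ordinal))
                mismatches.Add($"{field}: expected {expected}, got {actual}");
        }
        return mismatches;
    }
}
```

A test could pass the computed `verdict_hash`, `manifest_hash`, and `envelope_hash` values as the dictionary and assert that the returned list is empty.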

## Troubleshooting Reproducibility Failures

### Common Causes

#### 1. Non-Deterministic Ordering

**Symptom:** Different verdict hashes despite identical inputs.

**Diagnosis:**
```csharp
// Compare two runs and print which parts of the output differ
var comparison = ManifestComparer.Compare(result1, result2);
var report = ManifestComparer.GenerateDiffReport(comparison);
Console.WriteLine(report);
```

**Solution:** Ensure all collections are sorted before hashing:
```csharp
// Bad - non-deterministic: keeps whatever order the source produced
var unordered = results.ToList();

// Good - deterministic: explicit ordinal sort on stable keys
var findings = results.OrderBy(f => f.CveId, StringComparer.Ordinal)
    .ThenBy(f => f.Purl, StringComparer.Ordinal)
    .ToList();
```
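
To see why sorting matters, the sketch below hashes a collection after an ordinal sort, so that two runs delivering the same findings in a different order still produce the same digest. `Finding` and `FindingHasher` are hypothetical, and the `CveId|Purl` line encoding is an assumption made for illustration rather than the pipeline's actual canonical form.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical finding shape with only the two sort keys used above.
public sealed record Finding(string CveId, string Purl);

public static class FindingHasher
{
    // Sorts with ordinal comparison, then hashes a newline-delimited encoding.
    public static string Hash(IEnumerable<Finding> findings)
    {
        var lines = findings
            .OrderBy(f => f.CveId, StringComparer.Ordinal)
            .ThenBy(f => f.Purl, StringComparer.Ordinal)
            .Select(f => $"{f.CveId}|{f.Purl}");

        var bytes = Encoding.UTF8.GetBytes(string.Join("\n", lines));
        return "sha256:" + Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
    }
}
```

Ordinal comparison is used instead of the default culture-sensitive comparison because the latter can order strings differently depending on the host's locale.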

#### 2. Timestamp Drift

**Symptom:** Bundle manifests differ in the `createdAt` field.

**Diagnosis:**
```csharp
var jsonComparison = ManifestComparer.CompareJson(
    result1.BundleManifest,
    result2.BundleManifest);
```

**Solution:** Use frozen timestamps in tests:
```csharp
// In test fixture
public DateTimeOffset FrozenTimestamp { get; } =
    new DateTimeOffset(2025, 6, 15, 12, 0, 0, TimeSpan.Zero);
```
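
One way to make a frozen timestamp reach every pipeline stage is to inject a clock instead of reading `DateTimeOffset.UtcNow` directly. The sketch below assumes .NET 8 or later, where `TimeProvider` is available. `FrozenTimeProvider` and `ManifestStamper` are hypothetical names, not types from the StellaOps codebase.

```csharp
using System;

// Always reports the same instant, regardless of wall-clock time.
public sealed class FrozenTimeProvider : TimeProvider
{
    private readonly DateTimeOffset _frozen;

    public FrozenTimeProvider(DateTimeOffset frozen) => _frozen = frozen;

    public override DateTimeOffset GetUtcNow() => _frozen;
}

// Hypothetical pipeline component that stamps a manifest.
public sealed class ManifestStamper
{
    private readonly TimeProvider _clock;

    public ManifestStamper(TimeProvider clock) => _clock = clock;

    // The round-trip ("O") format is culture-independent.
    public string CreatedAt() => _clock.GetUtcNow().ToString("O");
}
```

In production the same component could receive `TimeProvider.System`, so only the test wiring changes.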

#### 3. Platform-Specific Behavior

**Symptom:** Tests pass on Ubuntu but fail on Windows/macOS.

**Common causes:**
- Line ending differences (`\n` vs `\r\n`)
- Path separator differences (`/` vs `\`)
- Unicode normalization differences
- Floating-point representation differences

**Diagnosis:**
```bash
# Download artifacts from all platforms
# Compare hex dumps
xxd ubuntu-manifest.bin > ubuntu.hex
xxd windows-manifest.bin > windows.hex
diff ubuntu.hex windows.hex
```

**Solution:** Use platform-agnostic serialization:
```csharp
// Use canonical JSON
var json = CanonJson.Serialize(data);

// Normalize line endings
var normalized = content.Replace("\r\n", "\n");
```
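
The four common causes listed above can each be neutralized before hashing. The helper below is a sketch under assumptions: `PlatformNormalizer` is a hypothetical name, and whether the pipeline should apply NFC or forward-slash paths is a design choice this guide does not specify.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class PlatformNormalizer
{
    // Converts CRLF and lone CR to LF, then applies Unicode NFC so that
    // composed and decomposed forms of the same text hash identically.
    public static string NormalizeText(string content) =>
        content.Replace("\r\n", "\n")
               .Replace('\r', '\n')
               .Normalize(NormalizationForm.FormC);

    // Uses forward slashes regardless of the host operating system.
    public static string NormalizePath(string path) => path.Replace('\\', '/');

    // Formats with the invariant culture so the decimal separator never
    // depends on the machine's locale.
    public static string FormatNumber(double value) =>
        value.ToString("R", CultureInfo.InvariantCulture);
}
```

Applying these at the serialization boundary, rather than at each call site, keeps the normalization rules in one place.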

#### 4. Key/Signature Differences

**Symptom:** Envelope hashes differ despite identical payloads.

**Diagnosis:**
```csharp
// Compare envelope structure
var envelope1 = JsonSerializer.Deserialize<DsseEnvelope>(result1.EnvelopeBytes);
var envelope2 = JsonSerializer.Deserialize<DsseEnvelope>(result2.EnvelopeBytes);

// Check if payloads match
envelope1.Payload.SequenceEqual(envelope2.Payload).Should().BeTrue();
```

**Solution:** Use deterministic key generation:
```csharp
// Generate key from fixed seed for reproducibility
private static ECDsa GenerateDeterministicKey(int seed)
{
    var rng = new DeterministicRng(seed);
    var keyBytes = new byte[32];
    rng.GetBytes(keyBytes);
    // ... create key from bytes
}
```

### Debugging Tools

#### ManifestComparer

```csharp
// Full comparison
var comparison = ManifestComparer.Compare(expected, actual);

// Multiple results
var multiComparison = ManifestComparer.CompareMultiple(results);

// Detailed report
var report = ManifestComparer.GenerateDiffReport(comparison);

// Hex dump for byte-level debugging
var hexDump = ManifestComparer.GenerateHexDump(expected.BundleManifest, actual.BundleManifest);
```
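
When a hex dump is long, the first differing offset is usually the most useful starting point. The helper below is an illustrative sketch, not part of `ManifestComparer`. `ByteDiff`, `FirstDifference`, and `HexWindow` are hypothetical names.

```csharp
using System;

public static class ByteDiff
{
    // Returns the offset of the first differing byte, or -1 if the inputs are
    // identical. A length mismatch counts as a difference at the shorter length.
    public static int FirstDifference(ReadOnlySpan<byte> expected, ReadOnlySpan<byte> actual)
    {
        var shared = Math.Min(expected.Length, actual.Length);
        for (var i = 0; i < shared; i++)
        {
            if (expected[i] != actual[i]) return i;
        }
        return expected.Length == actual.Length ? -1 : shared;
    }

    // Renders up to `count` bytes starting at `offset` as uppercase hex.
    public static string HexWindow(ReadOnlySpan<byte> data, int offset, int count)
    {
        var start = Math.Clamp(offset, 0, data.Length);
        var length = Math.Min(count, data.Length - start);
        return Convert.ToHexString(data.Slice(start, length));
    }
}
```

Printing `HexWindow` for both inputs around the reported offset often reveals a stray `0D` (carriage return) or a reordered field at a glance.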

#### JSON Comparison

```csharp
var jsonComparison = ManifestComparer.CompareJson(
    expected.BundleManifest,
    actual.BundleManifest);

foreach (var diff in jsonComparison.Differences)
{
    Console.WriteLine($"Path: {diff.Path}");
    Console.WriteLine($"Expected: {diff.Expected}");
    Console.WriteLine($"Actual: {diff.Actual}");
}
```

## Updating the Golden Baseline

When an intentional change alters the expected hashes (e.g., new fields, algorithm changes), update the golden baseline as follows:

### 1. Manual Update

```bash
# Run tests and capture new hashes
dotnet test tests/integration/StellaOps.Integration.E2E/ \
  --results-directory ./TestResults

# Update baseline
cp ./TestResults/verdict_hash.txt ./bench/determinism/golden-baseline/
# ... update e2e-hashes.json
```

### 2. CI Update (Recommended)

```bash
# Trigger workflow with update flag
# Via Gitea UI: Actions → E2E Reproducibility → Run workflow
# Set update_baseline = true
```

### 3. Approval Process

1. Create PR with baseline update
2. Explain why the change is intentional
3. Verify all platforms produce consistent results
4. Get approval from Platform Guild lead
5. Merge after CI passes

## CI Workflow Reference

### Jobs

| Job | Runs On | Trigger | Purpose |
|-----|---------|---------|---------|
| `reproducibility-ubuntu` | Every PR | PR/Push | Primary reproducibility check |
| `reproducibility-windows` | Nightly | Schedule/Manual | Cross-platform Windows |
| `reproducibility-macos` | Nightly | Schedule/Manual | Cross-platform macOS |
| `cross-platform-compare` | After platform jobs | Schedule/Manual | Compare hashes |
| `golden-baseline` | After Ubuntu | Always | Baseline verification |
| `reproducibility-gate` | After all | Always | Final status check |

### Artifacts

| Artifact | Retention | Contents |
|----------|-----------|----------|
| `e2e-results-{platform}` | 14 days | Test results (.trx), logs |
| `hashes-{platform}` | 14 days | Hash files for comparison |
| `cross-platform-report` | 30 days | Markdown comparison report |

## Related Documentation

- [Reproducibility Architecture](../reproducibility.md)
- [VerdictId Content-Addressing](../modules/policy/architecture.md#verdictid)
- [DSSE Envelope Format](../modules/attestor/architecture.md#dsse)
- [Determinism Testing](./determinism-verification.md)

## Sprint History

- **8200.0001.0004** - Initial E2E reproducibility test implementation
- **8200.0001.0001** - VerdictId content-addressing (dependency)
- **8200.0001.0002** - DSSE round-trip testing (dependency)