# Determinism Verification Guide
**Sprint:** 5100.0007.0003 (Epic B)

**Last Updated:** 2025-12-23
## Overview
StellaOps enforces deterministic artifact generation across all exported formats. This ensures:
1. **Reproducibility**: Given the same inputs, outputs are byte-for-byte identical
2. **Auditability**: Hash verification proves artifact integrity
3. **Compliance**: Regulated environments can replay and verify builds
4. **CI Gating**: Drift detection prevents unintended changes
## Supported Artifact Types
| Type | Format(s) | Test File |
|------|-----------|-----------|
| SBOM | SPDX 3.0.1, CycloneDX 1.6, CycloneDX 1.7 | `SbomDeterminismTests.cs` |
| VEX | OpenVEX, CSAF 2.0 | `VexDeterminismTests.cs` |
| Policy Verdicts | JSON | `PolicyDeterminismTests.cs` |
| Evidence Bundles | JSON, DSSE, in-toto | `EvidenceBundleDeterminismTests.cs` |
| AirGap Bundles | NDJSON | `AirGapBundleDeterminismTests.cs` |
| Advisory Normalization | Canonical JSON | `IngestionDeterminismTests.cs` |
## Determinism Manifest Format
Every deterministic artifact can produce a manifest describing its content hash and generation context.
### Schema (v1.0)
```json
{
  "schemaVersion": "1.0",
  "artifact": {
    "type": "sbom | vex | policy-verdict | evidence-bundle | airgap-bundle",
    "name": "artifact-identifier",
    "version": "1.0.0",
    "format": "SPDX 3.0.1 | CycloneDX 1.6 | OpenVEX | CSAF 2.0 | ..."
  },
  "canonicalHash": {
    "algorithm": "SHA-256",
    "value": "abc123..."
  },
  "toolchain": {
    "platform": ".NET 10.0",
    "components": [
      { "name": "StellaOps.Scanner", "version": "1.0.0" }
    ]
  },
  "inputs": {
    "feedSnapshotHash": "def456...",
    "policyManifestHash": "ghi789...",
    "configHash": "jkl012..."
  },
  "generatedAt": "2025-12-23T18:00:00Z"
}
```
### Field Descriptions
| Field | Description |
|-------|-------------|
| `schemaVersion` | Manifest schema version (currently `1.0`) |
| `artifact.type` | Category of the artifact |
| `artifact.name` | Identifier for the artifact |
| `artifact.version` | Version of the artifact (if applicable) |
| `artifact.format` | Specific format/spec version |
| `canonicalHash.algorithm` | Hash algorithm (always `SHA-256`) |
| `canonicalHash.value` | Lowercase hex hash of canonical bytes |
| `toolchain.platform` | Runtime platform |
| `toolchain.components` | List of generating components with versions |
| `inputs` | Hashes of input artifacts (feed snapshots, policies, etc.) |
| `generatedAt` | ISO-8601 UTC timestamp of generation |
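
To illustrate the `canonicalHash.value` and `generatedAt` rules in the table, the sketch below computes a lowercase hex SHA-256 digest and assembles a subset of a v1.0 manifest using only the .NET base class library. `Sha256LowerHex` and `BuildManifest` are hypothetical helpers, not part of `StellaOps.Testing.Determinism`, and the artifact values are placeholders copied from the schema example.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

// Lowercase hex SHA-256 of the canonical bytes (canonicalHash.value).
static string Sha256LowerHex(byte[] canonicalBytes) =>
    Convert.ToHexString(SHA256.HashData(canonicalBytes)).ToLowerInvariant();

// Hypothetical helper: builds a subset of a v1.0 manifest.
// JsonObject preserves insertion order, so the property layout is stable.
static JsonObject BuildManifest(byte[] canonicalBytes, DateTimeOffset frozenTime) => new JsonObject
{
    ["schemaVersion"] = "1.0",
    ["artifact"] = new JsonObject
    {
        ["type"] = "sbom",
        ["name"] = "my-container-sbom",
        ["version"] = "1.0.0",
        ["format"] = "CycloneDX 1.6",
    },
    ["canonicalHash"] = new JsonObject
    {
        ["algorithm"] = "SHA-256",
        ["value"] = Sha256LowerHex(canonicalBytes),
    },
    // generatedAt comes from the frozen clock, never from DateTimeOffset.UtcNow.
    ["generatedAt"] = frozenTime.ToUniversalTime()
        .ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture),
};

var bytes = Encoding.UTF8.GetBytes("abc");
var frozen = new DateTimeOffset(2025, 12, 23, 18, 0, 0, TimeSpan.Zero);
Console.WriteLine(BuildManifest(bytes, frozen).ToJsonString());
```

Because every value is derived from the artifact bytes or the frozen timestamp, running this twice yields identical manifest text.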
## Creating a Determinism Manifest
Use `DeterminismManifestWriter` from `StellaOps.Testing.Determinism`:
```csharp
using StellaOps.Testing.Determinism;

// Generate artifact bytes
var sbomBytes = GenerateSbom(input, frozenTime);

// Create artifact info
var artifactInfo = new ArtifactInfo
{
    Type = "sbom",
    Name = "my-container-sbom",
    Version = "1.0.0",
    Format = "CycloneDX 1.6"
};

// Create toolchain info
var toolchain = new ToolchainInfo
{
    Platform = ".NET 10.0",
    Components = new[]
    {
        new ComponentInfo { Name = "StellaOps.Scanner", Version = "1.0.0" }
    }
};

// Create manifest
var manifest = DeterminismManifestWriter.CreateManifest(
    sbomBytes,
    artifactInfo,
    toolchain);

// Save manifest
DeterminismManifestWriter.Save(manifest, "determinism.json");
```
## Reading and Verifying Manifests
```csharp
// Load manifest
var manifest = DeterminismManifestReader.Load("determinism.json");

// Verify artifact bytes match manifest hash
var currentBytes = File.ReadAllBytes("artifact.json");
var isValid = DeterminismManifestReader.Verify(manifest, currentBytes);

if (!isValid)
{
    throw new DeterminismDriftException(
        $"Artifact hash mismatch. Expected: {manifest.CanonicalHash.Value}");
}
```
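
The internals of `DeterminismManifestReader.Verify` are not documented in this guide. Assuming it recomputes the digest and compares it with `canonicalHash.value`, an equivalent check written against the base class library might look like this; `VerifyAgainstManifest` is a hypothetical name and the manifest is trimmed to the one field the check needs.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Hypothetical verifier: recompute SHA-256 over the artifact bytes and compare it
// with the manifest's canonicalHash.value (lowercase hex).
static bool VerifyAgainstManifest(string manifestJson, byte[] artifactBytes)
{
    using JsonDocument doc = JsonDocument.Parse(manifestJson);
    JsonElement hash = doc.RootElement.GetProperty("canonicalHash");

    if (hash.GetProperty("algorithm").GetString() != "SHA-256")
        throw new NotSupportedException("This sketch only handles SHA-256 manifests.");

    string expected = hash.GetProperty("value").GetString() ?? string.Empty;
    string actual = Convert.ToHexString(SHA256.HashData(artifactBytes)).ToLowerInvariant();

    // Exact, culture-independent comparison of the two hex strings.
    return string.Equals(expected, actual, StringComparison.Ordinal);
}

var sample = """
    {"canonicalHash":{"algorithm":"SHA-256","value":"ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"}}
    """;
Console.WriteLine(VerifyAgainstManifest(sample, Encoding.UTF8.GetBytes("abc"))); // True
Console.WriteLine(VerifyAgainstManifest(sample, Encoding.UTF8.GetBytes("abd"))); // False
```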
## Determinism Rules
### 1. Canonical JSON Serialization
All JSON output must use canonical serialization via `StellaOps.Canonical.Json`:
```csharp
using System.Text;
using StellaOps.Canonical.Json;

var json = CanonJson.Serialize(myObject);
var hash = CanonJson.Sha256Hex(Encoding.UTF8.GetBytes(json));
```
Rules:
- Keys sorted lexicographically
- No trailing whitespace
- Unix line endings (`\n`)
- No BOM
- UTF-8 encoding
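
A minimal sketch of the key-sorting and whitespace rules, written with `System.Text.Json` instead of `StellaOps.Canonical.Json` (whose internals are not shown here). `Canonicalize` is a hypothetical helper: it sorts object keys with an ordinal comparison, keeps array order, and writes compact UTF-8 without a BOM. It does not apply the full number and string normalization of RFC 8785 (JSON Canonicalization Scheme), so treat it as an illustration rather than a replacement for `CanonJson`.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical canonicalizer: sorted keys, no insignificant whitespace, UTF-8 without BOM.
static byte[] Canonicalize(string json)
{
    JsonNode? root = JsonNode.Parse(json);
    using var buffer = new MemoryStream();
    using (var writer = new Utf8JsonWriter(buffer, new JsonWriterOptions { Indented = false }))
    {
        WriteSorted(writer, root);
    } // disposing the writer flushes it
    return buffer.ToArray(); // Utf8JsonWriter never emits a byte order mark
}

static void WriteSorted(Utf8JsonWriter writer, JsonNode? node)
{
    switch (node)
    {
        case JsonObject obj:
            writer.WriteStartObject();
            foreach (var (key, value) in obj.OrderBy(p => p.Key, StringComparer.Ordinal))
            {
                writer.WritePropertyName(key);
                WriteSorted(writer, value);
            }
            writer.WriteEndObject();
            break;
        case JsonArray arr:
            writer.WriteStartArray();
            foreach (var item in arr) WriteSorted(writer, item); // array order is significant
            writer.WriteEndArray();
            break;
        case null:
            writer.WriteNullValue();
            break;
        default:
            node.WriteTo(writer); // strings, numbers, booleans
            break;
    }
}

byte[] a = Canonicalize("{ \"b\": 1, \"a\": [3, 1] }");
byte[] b = Canonicalize("{\"a\":[3,1],\"b\":1}");
Console.WriteLine(Encoding.UTF8.GetString(a)); // {"a":[3,1],"b":1}
Console.WriteLine(a.SequenceEqual(b));         // True
```

The two inputs differ only in whitespace and key order, so they canonicalize to the same bytes and therefore hash identically.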
### 2. Frozen Timestamps
All timestamps must be provided externally or use `DeterministicTime`:
```csharp
// ❌ BAD - Non-deterministic
var timestamp = DateTimeOffset.UtcNow;
// ✅ GOOD - Deterministic
var timestamp = frozenTime; // Passed as parameter
```
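
The API of `DeterministicTime` is not described in this guide, so the sketch below simply takes the frozen instant as a parameter. It shows why the troubleshooting advice to use UTC explicitly matters: the same instant recorded with different offsets renders identically only after it is normalized to UTC and formatted with the invariant culture. `FormatFrozen` is a hypothetical helper.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: renders a frozen instant as ISO-8601 UTC using the invariant
// culture, so the text does not depend on the host's time zone or regional settings.
static string FormatFrozen(DateTimeOffset frozenTime) =>
    frozenTime.ToUniversalTime()
        .ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);

// The same instant, captured with two different UTC offsets...
var capturedInUtc = new DateTimeOffset(2025, 12, 23, 18, 0, 0, TimeSpan.Zero);
var capturedAtPlus2 = new DateTimeOffset(2025, 12, 23, 20, 0, 0, TimeSpan.FromHours(2));

// ...produces identical text once normalized to UTC.
Console.WriteLine(FormatFrozen(capturedInUtc));   // 2025-12-23T18:00:00Z
Console.WriteLine(FormatFrozen(capturedAtPlus2)); // 2025-12-23T18:00:00Z
```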
### 3. Deterministic IDs
UUIDs and IDs must be derived from content, not random:
```csharp
// ❌ BAD - Random UUID
var id = Guid.NewGuid();
// ✅ GOOD - Content-derived ID
var seed = $"{input.Name}:{input.Version}:{timestamp:O}";
var hash = CanonJson.Sha256Hex(Encoding.UTF8.GetBytes(seed));
var id = new Guid(Convert.FromHexString(hash[..32]));
```
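
One caveat on the seed above: if `timestamp` is a `DateTimeOffset`, the `O` format includes its UTC offset, so the same instant captured in two time zones produces different IDs unless it is normalized to UTC first. Also, a GUID built from truncated SHA-256 output does not carry RFC 4122 version and variant bits. Where standards-conformant identifiers are preferred, a common alternative is a name-based version 5 UUID. The sketch below is that swapped-in technique, not necessarily what StellaOps uses; `NameBasedGuidV5` is a hypothetical helper, it needs .NET 8 or later for the big-endian `Guid` APIs, and the DNS namespace is used only because it has a published reference value (a real deployment would choose its own namespace GUID and could pass a content-derived string such as `seed` as the name).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// RFC 4122 name-based UUID, version 5 (SHA-1 of namespace bytes + name bytes).
static Guid NameBasedGuidV5(Guid namespaceId, string name)
{
    // RFC 4122 hashes the namespace in network (big-endian) byte order.
    byte[] nsBytes = namespaceId.ToByteArray(bigEndian: true);
    byte[] nameBytes = Encoding.UTF8.GetBytes(name);

    byte[] input = new byte[nsBytes.Length + nameBytes.Length];
    nsBytes.CopyTo(input, 0);
    nameBytes.CopyTo(input, nsBytes.Length);

    // SHA-1 is mandated by the v5 definition; it is used for naming, not security.
    byte[] hash = SHA1.HashData(input);
    Span<byte> uuid = hash.AsSpan(0, 16);
    uuid[6] = (byte)((uuid[6] & 0x0F) | 0x50); // version 5
    uuid[8] = (byte)((uuid[8] & 0x3F) | 0x80); // RFC 4122 variant
    return new Guid(uuid, bigEndian: true);
}

var dnsNamespace = Guid.Parse("6ba7b810-9dad-11d1-80b4-00c04fd430c8"); // RFC 4122 DNS namespace
Console.WriteLine(NameBasedGuidV5(dnsNamespace, "python.org"));
// 886313e1-3b8a-5372-9b90-0c9aee199e5d
```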
### 4. Stable Ordering
Collections must be sorted before serialization:
```csharp
// ❌ BAD - Dictionary enumeration order is not guaranteed
var items = dictionary.Values;
// ✅ GOOD - Sorted by key with an ordinal (culture-independent) comparer
var items = dictionary
    .OrderBy(kv => kv.Key, StringComparer.Ordinal)
    .Select(kv => kv.Value);
```
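
The choice of `StringComparer.Ordinal` matters: culture-aware comparers can order strings differently depending on the host's locale settings, while ordinal comparison compares UTF-16 code units and yields the same order everywhere. This sketch, with a hypothetical `ValuesInKeyOrder` helper and made-up component data, shows the resulting order, in which uppercase names sort before lowercase ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: emit a dictionary's values in ordinal key order.
static List<string> ValuesInKeyOrder(Dictionary<string, string> map) =>
    map.OrderBy(kv => kv.Key, StringComparer.Ordinal)
       .Select(kv => kv.Value)
       .ToList();

var components = new Dictionary<string, string>
{
    ["zlib"] = "1.3.1",
    ["Newtonsoft.Json"] = "13.0.3",
    ["acl"] = "2.3.2",
};

// Ordinal order: "Newtonsoft.Json" ('N' = 0x4E) < "acl" ('a' = 0x61) < "zlib" ('z' = 0x7A).
Console.WriteLine(string.Join(", ", ValuesInKeyOrder(components)));
// 13.0.3, 2.3.2, 1.3.1
```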
### 5. Parallel Safety
Determinism must hold under parallel execution:
```csharp
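// GenerateArtifact is assumed to return byte[]; run it 20 times concurrently
var tasks = Enumerable.Range(0, 20)
    .Select(_ => Task.Run(() => GenerateArtifact(input, frozenTime)))
    .ToArray();
var results = await Task.WhenAll(tasks);

// Every output must be byte-for-byte identical to the first one
results.Should().OnlyContain(r => r.SequenceEqual(results[0]));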
```
## CI Integration
### PR Merge Gate
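The determinism gate runs on pull requests (when new commits are pushed and when a draft is marked ready for review), so drift is caught before merge: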
```yaml
# .gitea/workflows/determinism-gate.yaml
name: Determinism Gate

on:
  pull_request:
    types: [synchronize, ready_for_review]

jobs:
  determinism:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'

      - name: Run Determinism Tests
        run: |
          dotnet test tests/integration/StellaOps.Integration.Determinism \
            --logger "trx;LogFileName=determinism.trx"

      - name: Generate Determinism Manifest
        run: |
          dotnet run --project tools/DeterminismManifestGenerator \
            --output determinism.json

      - name: Upload Determinism Artifact
        uses: actions/upload-artifact@v4
        with:
          name: determinism-manifest
          path: determinism.json
```
### Baseline Storage
Determinism baselines are committed to the repository under `ci-artifacts/determinism/`:
```
ci-artifacts/
  determinism/
    baseline/
      sbom-spdx-3.0.1.json
      sbom-cyclonedx-1.6.json
      sbom-cyclonedx-1.7.json
      vex-openvex.json
      vex-csaf.json
      policy-verdict.json
      evidence-bundle.json
      airgap-bundle.json
```
### Drift Detection
When a PR changes artifact output:
1. CI compares new manifest hash against baseline
2. If different, CI fails with diff report
3. Developer must either:
   - Fix the regression (restore determinism)
   - Update the baseline (if change is intentional)
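
The comparison in step 1 can be sketched as a digest check over the baseline files. This is not the actual gate implementation: `FindDrift` is a hypothetical helper that takes already-loaded file contents keyed by baseline file name (for example `policy-verdict.json`) and reports every file whose SHA-256 differs, that is missing from the current run, or that has no baseline yet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical drift check: returns one line per drifting file, in ordinal name order.
static List<string> FindDrift(
    IReadOnlyDictionary<string, byte[]> baseline,
    IReadOnlyDictionary<string, byte[]> current)
{
    static string Hash(byte[] bytes) =>
        Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();

    var report = new List<string>();
    foreach (var name in baseline.Keys.Union(current.Keys).OrderBy(n => n, StringComparer.Ordinal))
    {
        bool inBaseline = baseline.TryGetValue(name, out var expected);
        bool inCurrent = current.TryGetValue(name, out var actual);

        if (!inCurrent)
            report.Add($"{name}: missing from current run");
        else if (!inBaseline)
            report.Add($"{name}: no baseline (new artifact)");
        else if (Hash(expected!) != Hash(actual!))
            report.Add($"{name}: expected {Hash(expected!)}, got {Hash(actual!)}");
    }
    return report;
}

var baseline = new Dictionary<string, byte[]>
{
    ["vex-openvex.json"] = new byte[] { 1, 2, 3 },
    ["policy-verdict.json"] = new byte[] { 4, 5, 6 },
};
var current = new Dictionary<string, byte[]>
{
    ["vex-openvex.json"] = new byte[] { 1, 2, 3 },
    ["policy-verdict.json"] = new byte[] { 4, 5, 7 }, // simulated regression
};

var drift = FindDrift(baseline, current);
Console.WriteLine(drift.Count == 0 ? "No drift" : string.Join(Environment.NewLine, drift));
```

In a real gate, a non-empty report would fail the job, for example through a non-zero exit code.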
### Baseline Update Process
To intentionally update a baseline:
```bash
# 1. Run determinism tests to generate new manifests
dotnet test tests/integration/StellaOps.Integration.Determinism

# 2. Update baseline files
cp determinism/*.json ci-artifacts/determinism/baseline/

# 3. Commit with explicit message (blank line separates subject from body)
git add ci-artifacts/determinism/baseline/
git commit -m "chore(determinism): update baselines for [reason]

Breaking: [explain what changed]
Justification: [explain why this is correct]"
```
## Replay Verification
To verify an artifact was produced deterministically:
```bash
# 1. Get the manifest
curl -O https://releases.stellaops.io/v1.0.0/sbom.determinism.json
# 2. Get the artifact
curl -O https://releases.stellaops.io/v1.0.0/sbom.cdx.json
# 3. Verify
dotnet run --project tools/DeterminismVerifier \
--manifest sbom.determinism.json \
--artifact sbom.cdx.json
```
Output:
```
Determinism Verification
========================
Artifact: sbom.cdx.json
Manifest: sbom.determinism.json
Expected Hash: abc123...
Actual Hash: abc123...
Status: ✅ VERIFIED
```
## Test Files Reference
All determinism tests are in `tests/integration/StellaOps.Integration.Determinism/`:
| File | Tests | Description |
|------|-------|-------------|
| `DeterminismValidationTests.cs` | 16 | Manifest format and reader/writer |
| `SbomDeterminismTests.cs` | 14 | SPDX 3.0.1, CycloneDX 1.6/1.7 |
| `VexDeterminismTests.cs` | 17 | OpenVEX, CSAF 2.0 |
| `PolicyDeterminismTests.cs` | 18 | Policy verdict artifacts |
| `EvidenceBundleDeterminismTests.cs` | 15 | DSSE, in-toto attestations |
| `AirGapBundleDeterminismTests.cs` | 14 | NDJSON bundles, manifests |
| `IngestionDeterminismTests.cs` | 17 | NVD/OSV/GHSA/CSAF normalization |
## Troubleshooting
### Hash Mismatch
If you see a hash mismatch:
1. **Check timestamps**: Ensure frozen time is used
2. **Check ordering**: Ensure all collections are sorted
3. **Check IDs**: Ensure IDs are content-derived
4. **Check encoding**: Ensure UTF-8 without BOM
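
When a mismatch survives the checks above, inspecting the raw bytes often helps. The hypothetical `DiagnoseEncoding` helper below flags a UTF-8 BOM, carriage returns (Windows line endings) and trailing whitespace, which are exactly the properties excluded by the canonical JSON rules earlier in this guide.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical diagnostic: reports byte-level issues that change a hash
// without changing the parsed JSON content.
static List<string> DiagnoseEncoding(byte[] bytes)
{
    var findings = new List<string>();

    // UTF-8 byte order mark at the start of the file.
    if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        findings.Add("UTF-8 BOM present (EF BB BF)");

    // Any carriage return means the file is not using Unix line endings.
    int cr = Array.IndexOf(bytes, (byte)'\r');
    if (cr >= 0)
        findings.Add($"carriage return at byte offset {cr} (Windows line ending?)");

    // Report the first space or tab that ends a line (or the file).
    for (int i = 0; i < bytes.Length; i++)
    {
        bool isBlank = bytes[i] == (byte)' ' || bytes[i] == (byte)'\t';
        bool atLineEnd = i + 1 == bytes.Length || bytes[i + 1] == (byte)'\n' || bytes[i + 1] == (byte)'\r';
        if (isBlank && atLineEnd)
        {
            findings.Add($"trailing whitespace at byte offset {i}");
            break;
        }
    }
    return findings;
}

byte[] clean = Encoding.UTF8.GetBytes("{\"a\":1}\n");
byte[] dirty = new byte[] { 0xEF, 0xBB, 0xBF }
    .Concat(Encoding.UTF8.GetBytes("{\"a\":1} \r\n"))
    .ToArray();

Console.WriteLine(DiagnoseEncoding(clean).Count); // 0
foreach (var finding in DiagnoseEncoding(dirty))
    Console.WriteLine(finding);
```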
### Flaky Tests
If determinism tests are flaky:
1. **Check parallelism**: Ensure no shared mutable state
2. **Check time zones**: Use UTC explicitly
3. **Check random sources**: Remove all random number generation
4. **Check hash inputs**: Ensure every input that affects the output (feed snapshots, policies, configuration) is captured
### CI Failures
If CI determinism gate fails:
1. Compare the diff between expected and actual
2. Identify which field changed
3. Track back to the code change that caused it
4. Either fix the regression or update baseline with justification
## Related Documentation
- [Testing Strategy Models](testing-strategy-models.md) - Overview of testing models
- [Canonical JSON Specification](../11_DATA_SCHEMAS.md#canonical-json) - JSON serialization rules
- [CI/CD Workflows](../modules/devops/architecture.md) - CI pipeline details
- [Evidence Bundle Schema](../modules/evidence-locker/architecture.md) - Bundle format reference