# Determinism CI Harness for New Formats (SC5)

Status: Draft · Date: 2025-12-04

Scope: Define the determinism CI harness for validating stable ordering, hash checks, golden fixtures, and RNG seeds for CVSS v4, CycloneDX 1.7/CBOM, and SLSA 1.2 outputs.

## Objectives

- Ensure Scanner outputs are reproducible across builds, platforms, and time.
- Validate that serialized SBOM/VEX/attestation outputs have deterministic ordering.
- Anchor CI validation to golden fixtures with pre-computed hashes.
- Enable offline verification without network dependencies.

## CI Pipeline Integration

### Environment Setup

```yaml
# .gitea/workflows/scanner-determinism.yml additions
env:
  DOTNET_DISABLE_BUILTIN_GRAPH: "1"
  TZ: "UTC"
  LC_ALL: "C"
  STELLAOPS_DETERMINISM_SEED: "42"
  STELLAOPS_DETERMINISM_TIMESTAMP: "2025-01-01T00:00:00Z"
```

### Required Environment Variables

| Variable | Purpose | Default |
|----------|---------|---------|
| `TZ` | Force UTC timezone | `UTC` |
| `LC_ALL` | Force locale-invariant sorting | `C` |
| `STELLAOPS_DETERMINISM_SEED` | Fixed RNG seed for reproducibility | `42` |
| `STELLAOPS_DETERMINISM_TIMESTAMP` | Fixed timestamp for output | `2025-01-01T00:00:00Z` |
| `DOTNET_DISABLE_BUILTIN_GRAPH` | Disable non-deterministic graph features | `1` |

## Hash Validation Steps

### 1. Golden Fixture Verification

```bash
#!/bin/bash
# scripts/scanner/verify-determinism.sh

set -euo pipefail

FIXTURE_DIR="docs/modules/scanner/fixtures/cdx17-cbom"
HASH_FILE="${FIXTURE_DIR}/hashes.txt"

verify_fixture() {
  local file="$1"
  local expected_blake3="$2"
  local expected_sha256="$3"

  actual_blake3=$(b3sum "${file}" | cut -d' ' -f1)
  actual_sha256=$(sha256sum "${file}" | cut -d' ' -f1)

  if [[ "${actual_blake3}" != "${expected_blake3}" ]]; then
    echo "FAIL: ${file} BLAKE3 mismatch"
    echo "  expected: ${expected_blake3}"
    echo "  actual:   ${actual_blake3}"
    return 1
  fi

  if [[ "${actual_sha256}" != "${expected_sha256}" ]]; then
    echo "FAIL: ${file} SHA256 mismatch"
    echo "  expected: ${expected_sha256}"
    echo "  actual:   ${actual_sha256}"
    return 1
  fi

  echo "PASS: ${file}"
  return 0
}

# Parse hashes.txt and verify each fixture
while IFS=': ' read -r filename hashes; do
  blake3=$(echo "${hashes}" | grep -oP 'BLAKE3=\K[a-f0-9]+')
  sha256=$(echo "${hashes}" | grep -oP 'SHA256=\K[a-f0-9]+')
  verify_fixture "${FIXTURE_DIR}/${filename}" "${blake3}" "${sha256}"
done < <(grep -v '^#' "${HASH_FILE}")
```

### 2. Deterministic Serialization Test

```csharp
// src/Scanner/__Tests/StellaOps.Scanner.Determinism.Tests/CdxDeterminismTests.cs
[Fact]
public async Task Cdx17_Serialization_Is_Deterministic()
{
    // Arrange
    var options = new DeterminismOptions
    {
        Seed = 42,
        Timestamp = new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero),
        CultureInvariant = true
    };
    var sbom = CreateTestSbom();

    // Act - serialize twice
    var json1 = await _serializer.SerializeAsync(sbom, options);
    var json2 = await _serializer.SerializeAsync(sbom, options);

    // Assert - must be identical
    Assert.Equal(json1, json2);

    // Compute and verify hash
    var hash = Blake3.HashData(Encoding.UTF8.GetBytes(json1));
    Assert.Equal(ExpectedHash, Convert.ToHexString(hash).ToLowerInvariant());
}
```

### 3. Downgrade Adapter Verification

```csharp
[Fact]
public async Task Cdx17_To_Cdx16_Downgrade_Is_Deterministic()
{
    // Arrange
    var cdx17 = await LoadFixture("sample-cdx17-cbom.json");

    // Act
    var cdx16_1 = await _adapter.Downgrade(cdx17);
    var cdx16_2 = await _adapter.Downgrade(cdx17);

    // Assert
    var json1 = await _serializer.SerializeAsync(cdx16_1);
    var json2 = await _serializer.SerializeAsync(cdx16_2);
    Assert.Equal(json1, json2);

    // Verify matches golden fixture hash
    var hash = Blake3.HashData(Encoding.UTF8.GetBytes(json1));
    var expectedHash = LoadExpectedHash("sample-cdx16.json");
    Assert.Equal(expectedHash, Convert.ToHexString(hash).ToLowerInvariant());
}
```

## Ordering Rules

### Components (CycloneDX)

1. Sort by `purl` (case-insensitive, locale-invariant)
2. Ties: sort by `name` (case-insensitive)
3. Ties: sort by `version` (semantic version comparison)

### Vulnerabilities

1. Sort by `id` (lexicographic)
2. Ties: sort by `source.name` (lexicographic)
3. Ties: sort by highest severity rating score (descending)

### Properties

1. Sort by `name` (lexicographic, locale-invariant)

### Hashes

1. Sort by `alg` (BLAKE3-256, SHA-256, SHA-512 order)

### Ratings (CVSS)

1. CVSSv4 first
2. CVSSv31 second
3. CVSSv30 third
4. Others alphabetically by method

## Fixture Requirements (SC8 Cross-Reference)

Each golden fixture must include:

| Format | Fixture File | Contents |
|--------|--------------|----------|
| CDX 1.7 + CBOM | `sample-cdx17-cbom.json` | Full SBOM with CVSS v4/v3.1, CBOM properties, SLSA Source Track, evidence |
| CDX 1.6 (downgraded) | `sample-cdx16.json` | Downgraded version with CVSS v4 removed, CBOM dropped, audit markers |
| SLSA Source Track | `source-track.sample.json` | Standalone source provenance block |

## CI Workflow Steps

```yaml
# Add to .gitea/workflows/scanner-determinism.yml
jobs:
  determinism-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'

      - name: Set determinism environment
        run: |
          echo "TZ=UTC" >> $GITHUB_ENV
          echo "LC_ALL=C" >> $GITHUB_ENV
          echo "DOTNET_DISABLE_BUILTIN_GRAPH=1" >> $GITHUB_ENV
          echo "STELLAOPS_DETERMINISM_SEED=42" >> $GITHUB_ENV

      - name: Verify golden fixtures
        run: scripts/scanner/verify-determinism.sh

      - name: Run determinism tests
        run: |
          dotnet test src/Scanner/__Tests/StellaOps.Scanner.Determinism.Tests \
            --configuration Release \
            --verbosity normal

      - name: Run adapter determinism tests
        run: |
          dotnet test src/Scanner/__Tests/StellaOps.Scanner.Adapters.Tests \
            --filter "Category=Determinism" \
            --configuration Release
```

## Failure Handling

### Hash Mismatch Protocol

1. **Do not auto-update hashes** - manual review required
2. Log diff between expected and actual output
3. Capture both BLAKE3 and SHA256 for audit trail
4. Block merge until resolved

### Acceptable Reasons for Hash Update

- Schema version bump (documented in change log)
- Intentional ordering rule change (documented in adapter CSV)
- Bug fix that corrects previously non-deterministic output
- Never: cosmetic changes, timestamp updates, random salts

## Offline Verification

The harness must work completely offline:

- No network calls during serialization
- No external schema validation endpoints
- Trust roots and schemas bundled in repository
- All RNG seeded from environment variable

## Integration with SC8 Fixtures

The fixtures defined in SC8 serve as golden sources for this harness:

```
docs/modules/scanner/fixtures/
├── cdx17-cbom/
│   ├── sample-cdx17-cbom.json     # CVSS v4 + v3.1, CBOM, evidence
│   ├── sample-cdx16.json          # Downgraded, CVSS v3.1 only
│   ├── source-track.sample.json   # SLSA Source Track
│   └── hashes.txt                 # BLAKE3 + SHA256 for all fixtures
├── adapters/
│   ├── mapping-cvss4-to-cvss3.csv
│   ├── mapping-cdx17-to-cdx16.csv
│   ├── mapping-slsa12-to-slsa10.csv
│   └── hashes.txt
└── competitor-adapters/
    └── fixtures/
        ├── normalized-syft.json
        ├── normalized-trivy.json
        └── normalized-clair.json
```

## Links

- Sprint: `docs/implplan/SPRINT_0186_0001_0001_record_deterministic_execution.md` (SC5)
- Roadmap: `docs/modules/scanner/design/standards-convergence-roadmap.md` (SC1)
- Contract: `docs/modules/scanner/design/cdx17-cbom-contract.md` (SC2)
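
## Appendix: Rating Order Sketch

The rating order listed under "Ratings (CVSS)" (CVSSv4, then CVSSv31, then CVSSv30, then all other methods alphabetically) can be illustrated with a small shell helper. This is only a sketch for spot-checking output by hand, not the Scanner's serializer logic: the function names `rating_rank` and `sort_rating_methods` are hypothetical, and it assumes one method name per input line, spelled as in the ordering rules. "Alphabetically" is interpreted here as byte order under `LC_ALL=C`, in line with the locale setting the harness already forces.

```bash
# Hypothetical helper: map a rating method to its fixed rank.
# Methods not named in the ordering rules share the last rank.
rating_rank() {
  case "$1" in
    CVSSv4)  echo 0 ;;
    CVSSv31) echo 1 ;;
    CVSSv30) echo 2 ;;
    *)       echo 3 ;;
  esac
}

# Hypothetical helper: read method names on stdin (one per line) and
# print them in deterministic order: by rank first, then by byte order.
sort_rating_methods() {
  tab="$(printf '\t')"
  while IFS= read -r method; do
    # Skip empty lines so they do not produce blank output rows.
    [ -n "$method" ] || continue
    printf '%s\t%s\n' "$(rating_rank "$method")" "$method"
  done | LC_ALL=C sort -t "$tab" -k1,1n -k2,2 | cut -f2
}
```

For example, `printf 'OWASP\nCVSSv30\nCVSSv4\nCVSSv2\nCVSSv31\n' | sort_rating_methods` prints `CVSSv4`, `CVSSv31`, `CVSSv30`, `CVSSv2`, `OWASP`, one per line. Pinning `LC_ALL=C` inside the pipeline keeps the tie-break stable even if the caller's environment sets a different locale.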