# SBOM Determinism Guide

> **Sprint**: SPRINT_20260118_025_ReleaseOrchestrator_sbom_release_association
> **Task**: TASK-025-005
> **Status**: Living Document

This document consolidates all determinism requirements for Stella Ops SBOMs. Deterministic SBOMs are critical for reproducible builds, verifiable release gates, and trust chain integrity.

---

## 1. Why Determinism Matters

### 1.1 Reproducibility

Deterministic SBOMs ensure that scanning the same artifact multiple times produces identical output. This is essential for:

- **CI/CD Reliability**: Re-running a pipeline should produce the same SBOM hash
- **Audit Trails**: Evidence submitted to compliance frameworks must be reproducible
- **Caching**: Content-addressed storage can deduplicate identical SBOMs
- **Debugging**: Engineers can reproduce the exact SBOM state from an artifact digest

### 1.2 Verifiable Gates

Policy gates rely on SBOM hashes for trust verification:

```plaintext
Artifact Digest → SBOM Generation → Canonical Hash → DSSE Signature → Policy Evaluation
```

If SBOM generation is non-deterministic, the same artifact could produce different hashes, breaking:

- Signature verification (hash mismatch)
- Gate decisions (different vulnerability sets)
- Attestation chains (broken proof lineage)

### 1.3 Trust Chaining

Evidence chains require stable identifiers. A release component's `SbomDigest` must match the SBOM retrieved later for verification. Non-determinism breaks this chain:

```plaintext
Release Finalization:   SbomDigest = sha256:abc123...
Later Verification:     sha256(regenerated-sbom) = sha256:xyz789...  ← BROKEN
```

---

## 2. Canonicalization Rules

Stella Ops uses [RFC 8785 JSON Canonicalization Scheme (JCS)](https://tools.ietf.org/html/rfc8785) for deterministic JSON serialization.

### 2.1 Core JCS Rules

1. **No Whitespace**: Output has no insignificant whitespace between tokens (no formatting, newlines, or indentation)
2. **Sorted Keys**: Object keys are sorted by their UTF-16 code units, as RFC 8785 requires. This matches Unicode code point order for keys within the Basic Multilingual Plane
3. **Normalized Numbers**: Numbers use the ECMAScript shortest round-trip serialization that RFC 8785 mandates: no leading zeros and no trailing decimal zeros. Where exponent notation appears, it is lowercase with an explicit sign (for example `1e+30`)
4. **UTF-8 Encoding**: All strings encoded as UTF-8 without BOM
5. **No Duplicate Keys**: Object keys must be unique

### 2.2 Implementation

```csharp
using System.IO;
using StellaOps.Canonical.Json;

// Read the raw SBOM JSON bytes (the file name is illustrative)
byte[] jsonBytes = File.ReadAllBytes("bom.json");

// Canonicalize raw JSON bytes
byte[] canonical = CanonJson.CanonicalizeParsedJson(jsonBytes);

// Compute SHA-256 of canonical form
string digest = CanonJson.Sha256Hex(canonical);
```

### 2.3 SBOM-Specific Ordering

JCS sorts object keys but leaves array order untouched, so Stella Ops applies additional ordering to SBOM arrays:

| Element | Ordering Strategy |
|---------|-------------------|
| `components` | Sorted by `bom-ref` (Ordinal) |
| `dependencies` | Sorted by `ref` (Ordinal) |
| `hashes` | Sorted by `alg` (Ordinal) |
| `licenses` | Sorted by license ID (Ordinal) |
| `dependsOn` | Sorted lexicographically |

This ensures that the order in which components are emitted does not affect the canonical hash.

---

## 3. Identity Field Derivation

### 3.1 serialNumber (CycloneDX)

**Rule**: Use the `urn:sha256:` format, with the artifact's SHA-256 digest as the value, for deterministic identification.

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "serialNumber": "urn:sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
```

**Benefits**:

- Directly ties SBOM identity to the artifact it describes
- Enables verification: `serialNumber == urn:sha256:$(sha256sum artifact | cut -d' ' -f1)`
- Content-addressed: identical artifacts produce identical serialNumbers

**Fallback**: If the artifact digest is unavailable, a UUIDv5 derived from the sorted components is used for backwards compatibility. This produces a warning during validation (see 7.2).

### 3.2 bom-ref

**Rule**: Use deterministic derivation based on purl or component identity.

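
As a rough shell sketch of such a derivation, the helper below hashes the component identity and keeps the first 12 hex characters. The function name `bom_ref_for` is hypothetical and not part of the `stella` CLI. It assumes the three fields are concatenated as UTF-8 with no separator, which the formula that follows does not specify, and it assumes GNU `sha256sum` is available.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a stella command): derive a 12-character bom-ref
# from component identity fields.
# Assumption: purl, name, and version are concatenated with no separator.
bom_ref_for() {
  local purl="$1" name="$2" version="$3"
  # printf '%s' keeps a trailing newline out of the hash input,
  # which echo would otherwise add.
  printf '%s%s%s' "$purl" "$name" "$version" | sha256sum | cut -c1-12
}

# The same identity always yields the same bom-ref.
bom_ref_for "pkg:npm/lodash@4.17.21" "lodash" "4.17.21"
```

Plain concatenation is ambiguous at field boundaries (`"ab" + "c"` hashes the same as `"a" + "bc"`), so an implementation may prefer a delimiter or length prefix between fields. Whichever encoding is chosen must stay fixed, because changing it changes every derived bom-ref.
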
```plaintext
bom-ref = sha256(purl || name || version)[:12]  // truncated hash
```

Or use the package URL directly if available:

```json
{
  "bom-ref": "pkg:npm/lodash@4.17.21",
  "name": "lodash",
  "version": "4.17.21",
  "purl": "pkg:npm/lodash@4.17.21"
}
```

**Anti-pattern**: Random UUIDs or incrementing counters as bom-ref.

### 3.3 SPDX Document Namespace

**Rule**: Use artifact-derived namespace for SPDX documents.

```plaintext
DocumentNamespace: https://stella-ops.org/spdx/sha256/
```

---

## 4. Ephemeral Data Policy

Certain SBOM fields are inherently non-deterministic and must be pruned or normalized before hashing.

### 4.1 Prunable Fields

These fields should be omitted or normalized before hashing:

| Field | Treatment |
|-------|-----------|
| `metadata.timestamp` | Use fixed epoch or artifact build time |
| `metadata.tools[].version` | Optional: pin tool versions |
| File paths (absolute) | Convert to relative paths |
| Environment variables | Exclude from SBOM |

### 4.2 Timestamp Strategy

Option 1: **Fixed Epoch** (Recommended)

```json
"timestamp": "1970-01-01T00:00:00Z"
```

Option 2: **Artifact Build Time**

```json
"timestamp": ""
```

Option 3: **Omit Field**

```json
// No timestamp field - allowed by CycloneDX
```

### 4.3 Tool Metadata

Tool information aids debugging but affects hashes:

```json
"tools": [
  {
    "vendor": "Stella Ops",
    "name": "stella-scanner",
    "version": "1.0.0"  // Pin this version
  }
]
```

**Recommendation**: Pin tool versions in CI configuration to ensure reproducibility.

---

## 5. Verification Workflow

### 5.1 CLI Commands

**Verify Canonical Form**:

```bash
stella sbom verify input.json --canonical
# Exit 0: Input is canonical
# Exit 1: Input is not canonical (outputs SHA-256 of canonical form)
```

**Canonicalize and Output**:

```bash
stella sbom verify input.json --canonical --output bom.canonical.json
# Writes: bom.canonical.json (canonical SBOM)
# Writes: bom.canonical.json.sha256 (digest sidecar)
```

**Verbose Output**:

```bash
stella sbom verify input.json --canonical --verbose
# SHA-256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
# Canonical: yes
# Input size: 15234 bytes
# Canonical size: 12456 bytes
```

### 5.2 CI Gate Integration

```yaml
# .gitea/workflows/sbom-gate.yaml
steps:
  - name: Generate SBOM
    run: stella sbom generate --artifact ${{ artifact }} --output bom.json

  - name: Verify Canonical
    # Test the command directly: the runner's default shell exits on the first
    # failing command, so a separate `$?` check would never be reached.
    run: |
      if ! stella sbom verify bom.json --canonical --output bom.canonical.json; then
        echo "SBOM is not in canonical form"
        exit 1
      fi

  - name: Sign SBOM
    run: stella sbom sign bom.canonical.json --key ${{ signing_key }}

  - name: Store Digest
    run: |
      DIGEST=$(cat bom.canonical.json.sha256)
      echo "SBOM_DIGEST=$DIGEST" >> "$GITHUB_ENV"
```

### 5.3 Release Finalization

At release finalization, the SBOM digest is captured:

```plaintext
1. Lookup SBOM for artifact: ISbomService.GetByDigestAsync(artifact.Digest)
2. Extract canonical digest: sbom.SbomSha256
3. Store on ReleaseComponent: component.SbomDigest = sbom.SbomSha256
4. Include in release manifest hash computation
```

---

## 6. KPIs and Monitoring

### 6.1 Byte-Identical Rate

**Metric**: Percentage of SBOM regenerations that produce identical bytes.

**Target**: 100% for same artifact + same scanner version

**Alert**: A rate below 99.9% indicates a non-determinism bug

### 6.2 Stable-Field Coverage

**Metric**: Percentage of SBOM fields that are deterministic.


| Field Type | Target |
|------------|--------|
| Component identifiers | 100% |
| Hashes | 100% |
| Dependencies | 100% |
| Metadata timestamps | 95%+ (fixed epoch) |
| Tool versions | 90%+ (pinned) |

### 6.3 Gate False Positives

**Metric**: Signature verification failures due to hash mismatch.

**Target**: 0% for valid artifacts

**Investigation**: Any mismatch indicates canonicalization or regeneration issue.

---

## 7. Troubleshooting

### 7.1 Hash Mismatch on Regeneration

**Symptom**: Same artifact produces different SBOM hashes.

**Causes**:

1. **Timestamp drift**: Check if `metadata.timestamp` varies
2. **Tool version change**: Check scanner/tool versions
3. **Ordering instability**: Check component/dependency ordering
4. **Unicode normalization**: Check for composed vs decomposed characters

**Debug**:

```bash
# Compare two SBOMs
stella sbom diff bom1.json bom2.json

# Check canonical form
stella sbom verify bom1.json --canonical --verbose
stella sbom verify bom2.json --canonical --verbose
```

### 7.2 serialNumber Warning

**Symptom**: Warning `CDX_SERIAL_NON_DETERMINISTIC` during validation.

**Cause**: SBOM uses `urn:uuid:` format instead of `urn:sha256:`.

**Fix**: Ensure `ArtifactDigest` is provided when generating SBOM:

```csharp
var document = new SbomDocument
{
    Name = "my-app",
    ArtifactDigest = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    // ...
};
```

### 7.3 Canonical vs Pretty-Printed

**Symptom**: SBOM appears valid but fails canonical verification.

**Cause**: SBOM was saved with indentation/formatting.

**Fix**:

```bash
# Convert to canonical form
stella sbom verify input.json --canonical --output output.json
# Use output.json for signing and storage
```

### 7.4 Platform-Specific Differences

**Symptom**: Same code produces different SBOMs on Windows vs Linux.

**Causes**:

1. **Line endings**: CR+LF vs LF in embedded content
2. **Path separators**: `\` vs `/` in file paths
3. **Locale differences**: Number formatting, date formatting

**Prevention**:

- Normalize line endings in CI
- Use forward slashes for paths
- Use invariant culture for formatting

---

## References

- [RFC 8785: JSON Canonicalization Scheme](https://tools.ietf.org/html/rfc8785)
- [CycloneDX 1.6 Specification](https://cyclonedx.org/docs/1.6/json/)
- [SPDX 2.3 Specification](https://spdx.github.io/spdx-spec/v2.3/)
- `docs/modules/scanner/signed-sbom-archive-spec.md` - Archive format
- `docs/modules/scanner/deterministic-sbom-compose.md` - Composition rules
- `src/Attestor/__Libraries/StellaOps.Attestor.StandardPredicates/Writers/CycloneDxWriter.cs` - Implementation
- `src/__Libraries/StellaOps.Canonical.Json/CanonJson.cs` - Canonicalization library

---

## Changelog

| Date | Change |
|------|--------|
| 2026-01-19 | Initial creation (TASK-025-005) |