SBOM Determinism Guide

Sprint: SPRINT_20260118_025_ReleaseOrchestrator_sbom_release_association
Task: TASK-025-005
Status: Living Document

This document consolidates all determinism requirements for Stella Ops SBOMs. Deterministic SBOMs are critical for reproducible builds, verifiable release gates, and trust chain integrity.


1. Why Determinism Matters

1.1 Reproducibility

Deterministic SBOMs ensure that scanning the same artifact multiple times produces identical output. This is essential for:

  • CI/CD Reliability: Re-running a pipeline should produce the same SBOM hash
  • Audit Trails: Evidence submitted to compliance frameworks must be reproducible
  • Caching: Content-addressed storage can deduplicate identical SBOMs
  • Debugging: Engineers can reproduce exact SBOM state from artifact digest

1.2 Verifiable Gates

Policy gates rely on SBOM hashes for trust verification:

Artifact Digest → SBOM Generation → Canonical Hash → DSSE Signature → Policy Evaluation

If SBOM generation is non-deterministic, the same artifact could produce different hashes, breaking:

  • Signature verification (hash mismatch)
  • Gate decisions (different vulnerability sets)
  • Attestation chains (broken proof lineage)

1.3 Trust Chaining

Evidence chains require stable identifiers. The SbomDigest recorded on a release component must equal the digest of the SBOM retrieved or regenerated later for verification. Non-determinism breaks this chain:

Release Finalization:     SbomDigest = sha256:abc123...
Later Verification:       sha256(regenerated-sbom) = sha256:xyz789...  ← BROKEN

2. Canonicalization Rules

Stella Ops uses RFC 8785 JSON Canonicalization Scheme (JCS) for deterministic JSON serialization.

2.1 Core JCS Rules

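  1. No Whitespace: Output contains no insignificant whitespace (no newlines, indentation, or spaces between tokens)
  2. Sorted Keys: Object keys are sorted by their UTF-16 code units, as RFC 8785 specifies; this can differ from Unicode code point order only when keys contain characters outside the Basic Multilingual Plane
  3. Normalized Numbers: Numbers use the ECMAScript serialization: no leading zeros and no trailing fractional zeros; large exponents keep an explicit sign (e.g. 1e+30)
  4. UTF-8 Encoding: All strings encoded as UTF-8 without BOM
  5. No Duplicate Keys: Object keys must be unique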
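For readers who want to see the rules in action outside .NET, the following is a minimal Python sketch that approximates JCS for documents containing only strings, integers, booleans, null, arrays, and objects. It is not the StellaOps.Canonical.Json implementation: the function names are illustrative, floating-point numbers are not serialized the ECMAScript way, and duplicate keys are not detected.

```python
import hashlib
import json


def _sort_keys(value):
    """Recursively rebuild objects with keys ordered by UTF-16 code units."""
    if isinstance(value, dict):
        ordered = sorted(value, key=lambda k: k.encode("utf-16-be"))
        return {k: _sort_keys(value[k]) for k in ordered}
    if isinstance(value, list):
        return [_sort_keys(v) for v in value]  # array order is preserved
    return value


def canonicalize(raw: bytes) -> bytes:
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8 output."""
    parsed = json.loads(raw)
    text = json.dumps(_sort_keys(parsed), separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


pretty = b'{\n  "name": "lodash",\n  "bom-ref": "pkg:npm/lodash@4.17.21"\n}'
compact = b'{"bom-ref":"pkg:npm/lodash@4.17.21","name":"lodash"}'

# Both inputs collapse to the same bytes, and therefore the same digest.
print(canonicalize(pretty) == compact)
print(sha256_hex(canonicalize(pretty)) == sha256_hex(canonicalize(compact)))
```

Formatting and key order in the input do not influence the digest; only the parsed content does.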

2.2 Implementation

using System.IO;
using StellaOps.Canonical.Json;

// Read the raw SBOM bytes (bom.json is a placeholder path)
byte[] jsonBytes = File.ReadAllBytes("bom.json");

// Canonicalize raw JSON bytes
byte[] canonical = CanonJson.CanonicalizeParsedJson(jsonBytes);

// Compute SHA-256 of canonical form
string digest = CanonJson.Sha256Hex(canonical);

2.3 SBOM-Specific Ordering

Beyond JCS, Stella Ops applies additional ordering for SBOM elements:

| Element | Ordering Strategy |
| --- | --- |
| components | Sorted by bom-ref (Ordinal) |
| dependencies | Sorted by ref (Ordinal) |
| hashes | Sorted by alg (Ordinal) |
| licenses | Sorted by license ID (Ordinal) |
| dependsOn | Sorted lexicographically |

This ensures component order doesn't affect the canonical hash.
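The ordering rules can be sketched in a few lines of Python. This is an illustration under assumptions, not the Stella Ops implementation: `normalize_ordering` is a hypothetical helper, license sorting is left out for brevity, and Python compares strings by code point, which agrees with .NET ordinal comparison for everything inside the Basic Multilingual Plane.

```python
import copy


def normalize_ordering(bom: dict) -> dict:
    """Return a copy of a CycloneDX-style dict with list elements in a stable order."""
    out = copy.deepcopy(bom)

    for component in out.get("components", []):
        if "hashes" in component:
            component["hashes"].sort(key=lambda h: h["alg"])
    if "components" in out:
        out["components"].sort(key=lambda c: c["bom-ref"])

    for dependency in out.get("dependencies", []):
        if "dependsOn" in dependency:
            dependency["dependsOn"].sort()
    if "dependencies" in out:
        out["dependencies"].sort(key=lambda d: d["ref"])

    return out


scan_a = {
    "components": [{"bom-ref": "pkg:npm/b@1"}, {"bom-ref": "pkg:npm/a@1"}],
    "dependencies": [{"ref": "pkg:npm/a@1", "dependsOn": ["pkg:npm/c@1", "pkg:npm/b@1"]}],
}
scan_b = {
    "components": [{"bom-ref": "pkg:npm/a@1"}, {"bom-ref": "pkg:npm/b@1"}],
    "dependencies": [{"ref": "pkg:npm/a@1", "dependsOn": ["pkg:npm/b@1", "pkg:npm/c@1"]}],
}

# Two scans that discovered the same packages in a different order normalize identically.
print(normalize_ordering(scan_a) == normalize_ordering(scan_b))
```

Applying this step before JCS canonicalization matters because JCS sorts object keys but deliberately preserves array order.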


3. Identity Field Derivation

3.1 serialNumber (CycloneDX)

Rule: Use urn:sha256:<artifact-digest> format for deterministic identification.

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.7",
  "serialNumber": "urn:sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}

Benefits:

  • Directly ties SBOM identity to the artifact it describes
  • Enables verification: serialNumber == urn:sha256:$(sha256sum artifact | cut -d' ' -f1)
  • Content-addressed: identical artifacts produce identical serialNumbers

Fallback: If the artifact digest is unavailable, a UUIDv5 derived from the sorted component list is used for backwards compatibility. Validation emits a warning in this case (see 7.2).
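A minimal sketch of this selection logic follows. The UUIDv5 namespace and the way the sorted components are joined into a name are assumptions made for illustration; the guide does not specify either, and `derive_serial_number` is a hypothetical helper.

```python
import uuid
from typing import Iterable, Optional

# Assumption: the real namespace UUID used for the fallback is not stated in this guide.
FALLBACK_NAMESPACE = uuid.NAMESPACE_URL


def derive_serial_number(artifact_digest: Optional[str], component_refs: Iterable[str]) -> str:
    """Prefer the artifact digest; fall back to a UUIDv5 over the sorted component refs."""
    if artifact_digest:
        hex_digest = artifact_digest.lower()
        if hex_digest.startswith("sha256:"):
            hex_digest = hex_digest[len("sha256:"):]
        return f"urn:sha256:{hex_digest}"

    # Sorting first makes the fallback independent of discovery order.
    name = "\n".join(sorted(component_refs))
    return f"urn:uuid:{uuid.uuid5(FALLBACK_NAMESPACE, name)}"


print(derive_serial_number("sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", []))
print(derive_serial_number(None, ["pkg:npm/b@1", "pkg:npm/a@1"]))
```

Both branches are pure functions of their inputs, so repeated scans of the same artifact yield the same serialNumber.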

3.2 bom-ref

Rule: Use deterministic derivation based on purl or component identity.

bom-ref = sha256(purl || name || version)[:12]  // truncated hash

Or use the package URL directly if available:

{
  "bom-ref": "pkg:npm/lodash@4.17.21",
  "name": "lodash",
  "version": "4.17.21",
  "purl": "pkg:npm/lodash@4.17.21"
}

Anti-pattern: Random UUIDs or incrementing counters as bom-ref.
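The two options can be combined in one helper, sketched below. This is one possible reading of the rule, not the scanner's actual code: the purl is used directly when present, otherwise a truncated SHA-256 over name and version is used. The NUL separator between fields is an added assumption so that ("ab", "c") and ("a", "bc") cannot collide.

```python
import hashlib
from typing import Optional


def derive_bom_ref(name: str, version: str, purl: Optional[str] = None) -> str:
    """Return a bom-ref that depends only on the component's identity."""
    if purl:
        return purl
    # Assumption: fields are joined with a NUL byte before hashing.
    material = "\x00".join([name, version]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:12]


print(derive_bom_ref("lodash", "4.17.21", "pkg:npm/lodash@4.17.21"))
print(derive_bom_ref("internal-lib", "2.0.0"))
```

Unlike a random UUID or a counter, the result does not change between scans or with the order in which components are discovered.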

3.3 SPDX Document Namespace

Rule: Use artifact-derived namespace for SPDX documents.

DocumentNamespace: https://stella-ops.org/spdx/sha256/<artifact-digest>

4. Ephemeral Data Policy

Certain SBOM fields are inherently non-deterministic and should be handled carefully.

4.1 Prunable Fields

These fields should be omitted or normalized before hashing:

| Field | Treatment |
| --- | --- |
| metadata.timestamp | Use fixed epoch or artifact build time |
| metadata.tools[].version | Optional: pin tool versions |
| File paths (absolute) | Convert to relative paths |
| Environment variables | Exclude from SBOM |

4.2 Timestamp Strategy

Option 1: Fixed Epoch (Recommended)

"timestamp": "1970-01-01T00:00:00Z"

Option 2: Artifact Build Time

"timestamp": "<artifact-created-at>"

Option 3: Omit Field

// No timestamp field - allowed by CycloneDX
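The three options can be expressed as a single normalization step applied before hashing. The sketch below assumes a CycloneDX-style dict; `normalize_timestamp` and its parameters are hypothetical names, not part of any Stella Ops API.

```python
import copy
from typing import Optional

FIXED_EPOCH = "1970-01-01T00:00:00Z"


def normalize_timestamp(bom: dict, artifact_created_at: Optional[str] = None,
                        omit: bool = False) -> dict:
    """Return a copy of the SBOM with metadata.timestamp made deterministic."""
    out = copy.deepcopy(bom)
    if omit:
        # Option 3: drop the field entirely.
        if isinstance(out.get("metadata"), dict):
            out["metadata"].pop("timestamp", None)
        return out
    # Option 2 when a build time is supplied, otherwise Option 1.
    out.setdefault("metadata", {})["timestamp"] = artifact_created_at or FIXED_EPOCH
    return out


scanned = {"metadata": {"timestamp": "2026-01-19T10:42:07Z"}}
print(normalize_timestamp(scanned)["metadata"]["timestamp"])
print(normalize_timestamp(scanned, artifact_created_at="2025-12-01T00:00:00Z")["metadata"]["timestamp"])
print(normalize_timestamp(scanned, omit=True)["metadata"])
```

Whichever option is chosen, it has to be applied consistently at generation and at verification time, otherwise the two digests diverge.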

4.3 Tool Metadata

Tool information aids debugging but affects hashes:

"tools": [
  {
    "vendor": "Stella Ops",
    "name": "stella-scanner",
    "version": "1.0.0"  // Pin this version
  }
]

Recommendation: Pin tool versions in CI configuration to ensure reproducibility.


5. Verification Workflow

5.1 CLI Commands

Verify Canonical Form:

stella sbom verify input.json --canonical
# Exit 0: Input is canonical
# Exit 1: Input is not canonical (outputs SHA-256 of canonical form)

Canonicalize and Output:

stella sbom verify input.json --canonical --output bom.canonical.json
# Writes: bom.canonical.json (canonical SBOM)
# Writes: bom.canonical.json.sha256 (digest sidecar)

Verbose Output:

stella sbom verify input.json --canonical --verbose
# SHA-256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
# Canonical: yes
# Input size: 15234 bytes
# Canonical size: 12456 bytes

5.2 CI Gate Integration

# .gitea/workflows/sbom-gate.yaml
steps:
  - name: Generate SBOM
    run: stella sbom generate --artifact ${{ artifact }} --output bom.json

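  - name: Verify Canonical
    run: |
      # Test the command directly: run steps usually abort on the first
      # failing command, so a separate $? check would never be reached.
      if ! stella sbom verify bom.json --canonical --output bom.canonical.json; then
        echo "SBOM is not in canonical form"
        exit 1
      fi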

  - name: Sign SBOM
    run: stella sbom sign bom.canonical.json --key ${{ signing_key }}

  - name: Store Digest
    run: |
      DIGEST=$(cat bom.canonical.json.sha256)
      echo "SBOM_DIGEST=$DIGEST" >> "$GITHUB_ENV"

5.3 Release Finalization

At release finalization, the SBOM digest is captured:

1. Lookup SBOM for artifact: ISbomService.GetByDigestAsync(artifact.Digest)
2. Extract canonical digest: sbom.SbomSha256
3. Store on ReleaseComponent: component.SbomDigest = sbom.SbomSha256
4. Include in release manifest hash computation
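The check that later verification performs against the stored value can be sketched as follows. The function names are illustrative and the `sha256:<hex>` form is taken from the SbomDigest example in section 1.3; the input is assumed to be the already canonicalized SBOM bytes.

```python
import hashlib


def sbom_digest(canonical_sbom: bytes) -> str:
    """Digest in the sha256:<hex> form used for SbomDigest."""
    return "sha256:" + hashlib.sha256(canonical_sbom).hexdigest()


def verify_sbom_digest(stored_digest: str, canonical_sbom: bytes) -> bool:
    """True when the SBOM retrieved later still matches the digest stored at finalization."""
    return stored_digest.strip().lower() == sbom_digest(canonical_sbom)


canonical = b'{"bomFormat":"CycloneDX","specVersion":"1.7"}'
stored = sbom_digest(canonical)  # captured at release finalization

print(verify_sbom_digest(stored, canonical))                          # unchanged SBOM
print(verify_sbom_digest(stored, canonical.replace(b"1.7", b"1.6")))  # any byte change fails
```

Because the comparison is over bytes, it only succeeds if generation, ordering, and canonicalization are all deterministic.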

6. KPIs and Monitoring

6.1 Byte-Identical Rate

Metric: Percentage of SBOM regenerations that produce identical bytes.

Target: 100% for same artifact + same scanner version

Alert: A rate below 99.9% indicates a non-determinism bug
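One way to compute this metric is sketched below. It assumes the rate is measured as the share of regenerated outputs that match the most common output for the same artifact and scanner version; the guide does not fix the exact formula, so treat that definition and the names as assumptions.

```python
import hashlib
from collections import Counter
from typing import Sequence

ALERT_THRESHOLD = 99.9  # percent, from the alert rule above


def byte_identical_rate(outputs: Sequence[bytes]) -> float:
    """Percentage of regenerated SBOMs whose bytes match the most common output."""
    if not outputs:
        raise ValueError("need at least one SBOM output")
    counts = Counter(hashlib.sha256(o).hexdigest() for o in outputs)
    most_common_count = counts.most_common(1)[0][1]
    return 100.0 * most_common_count / len(outputs)


runs = [b'{"a":1}', b'{"a":1}', b'{"a":1}', b'{"a":2}']
rate = byte_identical_rate(runs)
print(rate, "ALERT" if rate < ALERT_THRESHOLD else "ok")
```

Grouping runs by artifact digest and scanner version before calling the function keeps legitimate differences (a new scanner release) out of the metric.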

6.2 Stable-Field Coverage

Metric: Percentage of SBOM fields that are deterministic.

| Field Type | Target |
| --- | --- |
| Component identifiers | 100% |
| Hashes | 100% |
| Dependencies | 100% |
| Metadata timestamps | 95%+ (fixed epoch) |
| Tool versions | 90%+ (pinned) |

6.3 Gate False Positives

Metric: Signature verification failures due to hash mismatch.

Target: 0% for valid artifacts

Investigation: Any mismatch indicates a canonicalization or regeneration issue.


7. Troubleshooting

7.1 Hash Mismatch on Regeneration

Symptom: Same artifact produces different SBOM hashes.

Causes:

  1. Timestamp drift: Check if metadata.timestamp varies
  2. Tool version change: Check scanner/tool versions
  3. Ordering instability: Check component/dependency ordering
  4. Unicode normalization: Check for composed vs decomposed characters
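The Unicode cause is the least obvious of the four, so here is a small Python demonstration. It assumes the canonicalizer leaves string contents untouched, as JCS does, which means normalization has to happen before canonicalization. The helper names are illustrative.

```python
import hashlib
import unicodedata


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def nfc(text: str) -> str:
    """Normalize to composed form (NFC)."""
    return unicodedata.normalize("NFC", text)


composed = "caf\u00e9"     # é as a single code point (U+00E9)
decomposed = "cafe\u0301"  # e followed by a combining acute accent (U+0301)

# The strings render identically but hash differently...
print(digest(composed) == digest(decomposed))            # False
# ...until both are normalized to the same form.
print(digest(nfc(composed)) == digest(nfc(decomposed)))  # True
```

A component name or file path that passes through a filesystem or tool producing decomposed characters can therefore change the SBOM hash without any visible difference.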

Debug:

# Compare two SBOMs
stella sbom diff bom1.json bom2.json

# Check canonical form
stella sbom verify bom1.json --canonical --verbose
stella sbom verify bom2.json --canonical --verbose

7.2 serialNumber Warning

Symptom: Warning CDX_SERIAL_NON_DETERMINISTIC during validation.

Cause: SBOM uses urn:uuid: format instead of urn:sha256:.

Fix: Ensure ArtifactDigest is provided when generating SBOM:

var document = new SbomDocument
{
    Name = "my-app",
    ArtifactDigest = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    // ...
};

7.3 Canonical vs Pretty-Printed

Symptom: SBOM appears valid but fails canonical verification.

Cause: SBOM was saved with indentation/formatting.

Fix:

# Convert to canonical form
stella sbom verify input.json --canonical --output output.json

# Use output.json for signing and storage

7.4 Platform-Specific Differences

Symptom: Same code produces different SBOMs on Windows vs Linux.

Causes:

  1. Line endings: CR+LF vs LF in embedded content
  2. Path separators: \ vs / in file paths
  3. Locale differences: Number formatting, date formatting

Prevention:

  • Normalize line endings in CI
  • Use forward slashes for paths
  • Use invariant culture for formatting
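The prevention steps can be sketched as small normalization helpers applied to values before they enter the SBOM. These are illustrative functions, not part of the Stella Ops scanner.

```python
def normalize_newlines(text: str) -> str:
    """Convert CR+LF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def normalize_path(path: str) -> str:
    """Use forward slashes regardless of the platform that produced the path."""
    return path.replace("\\", "/")


def format_number(value: float) -> str:
    """Fixed-point formatting that does not depend on the process locale."""
    return f"{value:.2f}"


print(repr(normalize_newlines("line1\r\nline2\rline3\n")))
print(normalize_path("src\\lib\\index.js"))
print(format_number(1234.5))
```

In .NET the equivalent of the last helper is formatting with the invariant culture, as the list above recommends.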

Changelog

| Date | Change |
| --- | --- |
| 2026-01-19 | Initial creation (TASK-025-005) |