git.stella-ops.org/docs/contracts/canonical-sbom-id-v1.md
2026-02-19 22:07:11 +02:00

Contract: Canonical SBOM Identifier (v1)

Status

  • Status: DRAFT (2026-02-19)
  • Owners: Scanner Guild, Attestor Guild
  • Consumers: Scanner, Attestor, VexLens, EvidenceLocker, Graph, Policy

Purpose

Define a single, deterministic, cross-module identifier for any CycloneDX SBOM document. All modules that reference SBOMs must use this identifier for cross-module joins, evidence threading, and verification.

Definition

canonical_id := "sha256:" + hex(SHA-256(JCS(sbom_json)))

Where:

  • sbom_json is the raw CycloneDX JSON document (v1.4 through v1.7)
  • JCS is JSON Canonicalization Scheme per RFC 8785
  • SHA-256 is the hash function
  • hex is lowercase hexadecimal encoding
  • The result is prefixed with sha256: for algorithm agility
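The definition above can be sketched in Python as an independent cross-check of the .NET implementation. This is a minimal sketch, not a full RFC 8785 implementation. It assumes the SBOM contains no non-integer numbers (it rejects them rather than guess at ECMAScript number formatting), that integers fit exactly in an IEEE 754 double, and that object keys stay within the Basic Multilingual Plane. The name canonical_id_sketch is hypothetical and not part of any StellaOps API.

```python
import hashlib
import json


def _reject_float(text: str) -> float:
    # Non-integer numbers need ECMAScript formatting, which this sketch omits.
    raise ValueError(f"non-integer number not supported by this sketch: {text}")


def canonical_id_sketch(sbom_json: bytes) -> str:
    """Approximate canonical_id for SBOMs with string/integer/bool/null values."""
    doc = json.loads(sbom_json, parse_float=_reject_float)
    canonical = json.dumps(
        doc,
        sort_keys=True,            # sorted keys (code point order; see assumptions)
        separators=(",", ":"),     # no whitespace between tokens
        ensure_ascii=False,        # keep non-ASCII characters as UTF-8, not \uXXXX
        allow_nan=False,           # NaN/Infinity are not valid JSON
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()
```

Production code should rely on a complete JCS implementation; the sketch is only meant to make each term of the formula concrete.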

Canonicalization Rules (RFC 8785)

Object Key Ordering

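All JSON object keys are sorted by comparing their UTF-16 code units as unsigned integers, per RFC 8785 section 3.2.3 (equivalent to StringComparer.Ordinal in .NET). For keys within the Basic Multilingual Plane this matches Unicode code point order; the two orderings can differ only when a key contains a supplementary-plane character.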
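RFC 8785 section 3.2.3 defines the key sort over UTF-16 code units, which is also what StringComparer.Ordinal compares. The following sketch shows one way to reproduce that order in Python and where it departs from a plain code point sort. The helper name jcs_key_order is hypothetical.

```python
def jcs_key_order(keys):
    """Sort keys by UTF-16 code units, the ordering RFC 8785 prescribes."""
    # Big-endian UTF-16 bytes compare the same way as unsigned 16-bit code units.
    return sorted(keys, key=lambda k: k.encode("utf-16-be"))


keys = ["\ufb33", "\U0001f600", "b", "a", "B"]

utf16_order = jcs_key_order(keys)   # U+1F600 sorts before U+FB33 (surrogate D83D < FB33)
code_point_order = sorted(keys)     # U+FB33 sorts before U+1F600
```

CycloneDX property names are ASCII, so the difference only matters for user-controlled keys, but a canonicalizer that sorts by code point would silently produce a different canonical_id for such documents.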

Number Serialization

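  • Numbers are serialized the way ECMAScript serializes them (RFC 8785 section 3.2.2.3), starting from their IEEE 754 double-precision value
  • Integers: no leading zeros and no fractional part (1.0 serializes as 1)
  • Floating point: use the shortest representation that round-trips exactly
  • No exponent notation for integer values below 1e21; at 1e21 and above, ECMAScript serialization switches to exponent form (for example 1e+21)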
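The number rules follow ECMAScript's Number-to-string algorithm, which RFC 8785 adopts. A rough Python rendering of that algorithm is sketched below. It assumes that Python's repr yields the same shortest round-trip digits as ECMAScript, which holds for ordinary values but should be verified against the RFC 8785 test data before being relied on. The name jcs_number is hypothetical.

```python
import math
from decimal import Decimal


def jcs_number(value: float) -> str:
    """Format a double the way ECMAScript's Number-to-string conversion does."""
    if not math.isfinite(value):
        raise ValueError("NaN and Infinity cannot be represented in JSON")
    if value == 0:
        return "0"  # also covers -0.0
    # repr() gives the shortest digits that round-trip; Decimal splits them up.
    sign, digits, exp = Decimal(repr(float(value))).as_tuple()
    digits = list(digits)
    while len(digits) > 1 and digits[-1] == 0:  # drop trailing zeros such as "1.0"
        digits.pop()
        exp += 1
    s = "".join(map(str, digits))
    k = len(s)    # number of significant digits
    n = k + exp   # position of the decimal point relative to the first digit
    if k <= n <= 21:
        body = s + "0" * (n - k)             # plain integer
    elif 0 < n <= 21:
        body = s[:n] + "." + s[n:]           # decimal point inside the digits
    elif -6 < n <= 0:
        body = "0." + "0" * (-n) + s         # small fraction, no exponent
    else:
        e = n - 1
        mantissa = s[0] + ("." + s[1:] if k > 1 else "")
        body = f"{mantissa}e{'+' if e > 0 else '-'}{abs(e)}"
    return ("-" if sign else "") + body
```

For example, jcs_number(1e20) returns the 21-digit integer string, while jcs_number(1e21) returns "1e+21".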

String Serialization

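  • Escape control characters (U+0000 through U+001F) as \uXXXX with lowercase hex digits, except those that have a short form, which use it: \b, \t, \n, \f, \r
  • Use minimal escaping: besides control characters, only " and \ are escaped (no unnecessary escape sequences)
  • Output is UTF-8 encoded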
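The escaping rules can be illustrated with Python's standard json module, whose string encoder with ensure_ascii=False happens to apply the same escapes that RFC 8785 requires. This is an illustration of the rules rather than a statement about how the StellaOps canonicalizer is implemented; the helper name jcs_string is hypothetical.

```python
import json


def jcs_string(value: str) -> bytes:
    """Serialize one JSON string with minimal escaping, encoded as UTF-8."""
    # ensure_ascii=False keeps non-ASCII characters literal instead of \uXXXX.
    # Encoding to UTF-8 raises on lone surrogates, which JCS also rejects.
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


newline = jcs_string("a\nb")        # short escape form \n
unit_sep = jcs_string("\x1f")       # no short form, so \u001f (lowercase hex)
euro = jcs_string("\u20ac")         # non-ASCII stays literal, 3 UTF-8 bytes
slash = jcs_string("/")             # "/" is never escaped
```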

Whitespace

  • No whitespace between tokens (no indentation, no trailing newlines)

Null Handling

  • Null values are serialized as null (not omitted)
  • Missing optional fields are omitted entirely

Array Element Order

  • Array elements maintain their original order (arrays are NOT sorted)
  • This is critical: CycloneDX component arrays preserve document order
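A consequence of the null and array rules is that some documents a reader might consider equivalent produce different identifiers. The sketch below demonstrates both cases, assuming a simplified canonicalizer built on Python's json module (adequate for ASCII keys and non-numeric values, and not the reference implementation). The names canonical_form and the sample component values are illustrative only.

```python
import json


def canonical_form(doc) -> str:
    """Simplified canonical form: sorted keys, no whitespace, arrays untouched."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# An explicit null is kept; an absent field stays absent.
with_null = canonical_form({"bomFormat": "CycloneDX", "serialNumber": None})
without_field = canonical_form({"bomFormat": "CycloneDX"})

# Arrays keep document order, so reordering components changes the output.
order_ab = canonical_form({"components": [{"name": "a"}, {"name": "b"}]})
order_ba = canonical_form({"components": [{"name": "b"}, {"name": "a"}]})
```

Producers that want a stable canonical_id across rebuilds therefore need to emit components in a deterministic order before canonicalization.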

Implementation Reference

.NET Implementation

Use StellaOps.AuditPack.Services.CanonicalJson.Canonicalize(ReadOnlySpan<byte> json):

using StellaOps.AuditPack.Services;
using System.Security.Cryptography;

byte[] sbomJsonBytes = ...; // raw CycloneDX JSON
byte[] canonicalBytes = CanonicalJson.Canonicalize(sbomJsonBytes);
byte[] hash = SHA256.HashData(canonicalBytes);
string canonicalId = "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();

CLI Implementation

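# Using jcs_canonicalize (or an equivalent RFC 8785 tool).
# The tool must not append a trailing newline: sha256sum hashes every byte it receives.
jcs_canonicalize ./bom.json | sha256sum | awk '{print "sha256:" $1}'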

Relationship to Existing Identifiers

ContentHash (CycloneDxArtifact.ContentHash)

  • Current: sha256(raw_json_bytes) -- hash of serialized JSON, NOT canonical
  • Relationship: ContentHash is a serialization-specific hash that depends on whitespace, key ordering, and JSON serializer settings. canonical_id is a content-specific hash that is stable across serializers.
  • Coexistence: Both identifiers are retained. ContentHash is used for integrity verification of a specific serialized form. canonical_id is used for cross-module reference and evidence threading.
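The difference between the two identifiers can be shown by hashing two serializations of the same content. This is a sketch under assumptions: raw_sha256_hex stands in for the ContentHash computation as described above (the exact output format of ContentHash is not specified here), and canonical_id_sketch is a simplified canonicalizer that is adequate only for documents without non-integer numbers or non-BMP keys. Both names are hypothetical.

```python
import hashlib
import json


def raw_sha256_hex(raw: bytes) -> str:
    """Hash of the bytes as serialized: changes with whitespace and key order."""
    return hashlib.sha256(raw).hexdigest()


def canonical_id_sketch(raw: bytes) -> str:
    """Hash of the canonical form: stable across serializations."""
    canonical = json.dumps(
        json.loads(raw), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


minified = b'{"bomFormat":"CycloneDX","specVersion":"1.7","version":1}'
pretty = json.dumps(json.loads(minified), indent=2).encode("utf-8")

raw_hashes_differ = raw_sha256_hex(minified) != raw_sha256_hex(pretty)
canonical_ids_match = canonical_id_sketch(minified) == canonical_id_sketch(pretty)
```

The two values coincide only when the serialized bytes already are the canonical form, as with the minified input here.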

stella.contentHash (SBOM metadata property)

  • Current: Same as ContentHash above, emitted as a metadata property
  • New: stella:canonical_id is emitted as a separate metadata property alongside stella.contentHash

CompositionRecipeSha256

  • Purpose: Hash of the composition recipe (layer ordering), not the SBOM content itself
  • Relationship: Independent. Composition recipe describes HOW the SBOM was built; canonical_id describes WHAT was built.

DSSE Subject Binding

When a DSSE attestation is created for an SBOM (predicate type StellaOps.SBOMAttestation@1), the subject MUST include canonical_id:

{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "sbom",
      "digest": {
        "sha256": "<canonical_id_hex_without_prefix>"
      }
    }
  ],
  "predicateType": "StellaOps.SBOMAttestation@1",
  "predicate": { ... }
}

Note: The subject.digest.sha256 value is the hex hash WITHOUT the sha256: prefix, following in-toto convention.
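Deriving the subject entry from a canonical_id amounts to validating and stripping the prefix. A minimal sketch follows; the function name sbom_subject is hypothetical, and the "sbom" subject name is taken from the example above.

```python
def sbom_subject(canonical_id: str) -> dict:
    """Build the in-toto subject entry for an SBOM from its canonical_id."""
    prefix = "sha256:"
    if not canonical_id.startswith(prefix):
        raise ValueError("canonical_id must start with 'sha256:'")
    hex_digest = canonical_id[len(prefix):]
    # SHA-256 is 32 bytes, so the digest must be 64 lowercase hex characters.
    if len(hex_digest) != 64 or any(c not in "0123456789abcdef" for c in hex_digest):
        raise ValueError("canonical_id digest must be 64 lowercase hex characters")
    return {"name": "sbom", "digest": {"sha256": hex_digest}}
```

Validating the digest at this boundary keeps an uppercase or truncated value from being signed into an attestation, where it would later fail to join against the canonical_id used by other modules.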

Stability Guarantee

Given identical CycloneDX content (same components in the same order, same metadata, same vulnerabilities), canonical_id MUST be the same value:

  • Across different machines
  • Across different .NET runtime versions
  • Across serialization/deserialization round-trips
  • Regardless of original JSON formatting (whitespace, key order)

This is the fundamental invariant that enables cross-module evidence joins.

Test Vectors

Vector 1: Minimal CycloneDX 1.7

Input:

{"bomFormat":"CycloneDX","specVersion":"1.7","version":1,"components":[]}

Expected canonical form:

{"bomFormat":"CycloneDX","components":[],"specVersion":"1.7","version":1}

Expected canonical_id: "sha256:" followed by the lowercase hex SHA-256 of the canonical form's UTF-8 bytes.

Vector 2: Key ordering

Input (keys out of order):

{"specVersion":"1.7","bomFormat":"CycloneDX","version":1}

Expected canonical form:

{"bomFormat":"CycloneDX","specVersion":"1.7","version":1}

Must produce the same canonical_id as any other key ordering of the same content.

Vector 3: Whitespace normalization

Input (pretty-printed):

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.7",
  "version": 1
}

Must produce the same canonical_id as the minified form of the same content (the canonical form shown in Vector 2).
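The three vectors can be exercised with a small harness. The sketch below uses a simplified canonicalizer built on Python's json module, which is sufficient for these vectors (ASCII keys, integer values only) but is not a complete RFC 8785 implementation. The function names are hypothetical, and no digest values are hard-coded because the contract does not publish them.

```python
import hashlib
import json

VECTOR_1 = b'{"bomFormat":"CycloneDX","specVersion":"1.7","version":1,"components":[]}'
VECTOR_2 = b'{"specVersion":"1.7","bomFormat":"CycloneDX","version":1}'
VECTOR_3 = b"""{
  "bomFormat": "CycloneDX",
  "specVersion": "1.7",
  "version": 1
}"""


def canonical_form(raw: bytes) -> bytes:
    """Simplified canonical form: sorted keys, no whitespace, UTF-8 output."""
    doc = json.loads(raw)
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_id_sketch(raw: bytes) -> str:
    return "sha256:" + hashlib.sha256(canonical_form(raw)).hexdigest()


def check_vectors() -> None:
    # Vector 1: keys are reordered, the empty array is preserved.
    assert canonical_form(VECTOR_1) == (
        b'{"bomFormat":"CycloneDX","components":[],"specVersion":"1.7","version":1}'
    )
    # Vector 2: key order of the input does not matter.
    assert canonical_form(VECTOR_2) == (
        b'{"bomFormat":"CycloneDX","specVersion":"1.7","version":1}'
    )
    # Vector 3: pretty-printing does not matter.
    assert canonical_id_sketch(VECTOR_3) == canonical_id_sketch(VECTOR_2)
    # Different content (Vector 1 has "components") yields a different identifier.
    assert canonical_id_sketch(VECTOR_1) != canonical_id_sketch(VECTOR_2)
```

Calling check_vectors() raises AssertionError on the first mismatch and returns silently otherwise.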

Migration Notes

  • New attestations MUST include canonical_id in the DSSE subject
  • Existing attestations are NOT backfilled (they retain their original subject digests)
  • Verification of historical attestations uses their original subject binding
  • The stella:canonical_id metadata property is added to new SBOMs only

References

  • RFC 8785: JSON Canonicalization Scheme (JCS)
  • CycloneDX v1.7 specification
  • DSSE v1.0 specification
  • in-toto Statement v1 specification