Contract: Canonical SBOM Identifier (v1)
Status
- Status: DRAFT (2026-02-19)
- Owners: Scanner Guild, Attestor Guild
- Consumers: Scanner, Attestor, VexLens, EvidenceLocker, Graph, Policy
Purpose
Define a single, deterministic, cross-module identifier for any CycloneDX SBOM document. All modules that reference SBOMs must use this identifier for cross-module joins, evidence threading, and verification.
Definition
canonical_id := "sha256:" + hex(SHA-256(JCS(sbom_json)))
Where:
- sbom_json is the raw CycloneDX JSON document (v1.4 through v1.7)
- JCS is JSON Canonicalization Scheme per RFC 8785
- SHA-256 is the hash function
- hex is lowercase hexadecimal encoding
- The result is prefixed with sha256: for algorithm agility
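The definition can be sketched in Python as an illustration outside the .NET reference implementation. This is a simplified approximation, not a full RFC 8785 implementation: the function names are hypothetical, and the standard json module reproduces JCS output only for documents with ASCII keys and integer numbers within the IEEE 754 safe range.

```python
import hashlib
import json


def canonicalize(raw: bytes) -> bytes:
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8 output."""
    doc = json.loads(raw)
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_id(raw: bytes) -> str:
    """canonical_id := "sha256:" + hex(SHA-256(JCS(sbom_json)))"""
    # hexdigest() already returns lowercase hexadecimal.
    return "sha256:" + hashlib.sha256(canonicalize(raw)).hexdigest()
```

Calling canonical_id on the raw bytes of a bom.json file returns a string of the form sha256: followed by 64 lowercase hex characters.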
Canonicalization Rules (RFC 8785)
Object Key Ordering
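All JSON object keys are sorted lexicographically by their UTF-16 code units, as RFC 8785 requires (equivalent to StringComparer.Ordinal in .NET). This matches Unicode code point order except when keys contain characters outside the Basic Multilingual Plane.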
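RFC 8785 compares keys as sequences of UTF-16 code units, which is also what StringComparer.Ordinal does. The result can differ from code point order when a key contains a supplementary-plane character. The Python sketch below shows the difference; the helper name utf16_sort_key and the sample keys are illustrative only.

```python
def utf16_sort_key(key: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as the 16-bit code units.
    return key.encode("utf-16-be")


keys = ["\ufb33", "\U0001f600", "a"]

# Python's default string ordering is by code point: U+0061 < U+FB33 < U+1F600.
by_code_point = sorted(keys)

# UTF-16 ordering: U+1F600 is the surrogate pair D83D DE00, which sorts before FB33.
by_utf16 = sorted(keys, key=utf16_sort_key)
```

For typical CycloneDX documents, whose keys are ASCII, both orderings agree.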
Number Serialization
- Integers: no leading zeros, no trailing zeros after decimal point
- Floating point: use shortest representation that round-trips exactly
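- No scientific notation for integers (RFC 8785 follows ECMAScript number serialization, which switches to exponent notation only at magnitudes of 1e21 and above)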
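The number rules can be illustrated with a deliberately partial Python sketch. The function jcs_number is hypothetical and covers only magnitudes where Python's repr() and ECMAScript formatting are known to agree; anything outside that range raises rather than guessing.

```python
import math


def jcs_number(value) -> str:
    """Serialize a number as RFC 8785 would, for a limited range of inputs."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("expected an int or float")
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError("NaN and Infinity cannot be represented in JSON")
    if abs(value) >= 2**53:
        raise NotImplementedError("large magnitudes are outside this sketch")
    if value == int(value):
        # Integer-valued numbers drop the fraction: 1.0 becomes "1", -0.0 becomes "0".
        return str(int(value))
    if abs(value) >= 1e-4:
        # In this range repr() gives the shortest round-trip digits in plain decimal notation.
        return repr(value)
    raise NotImplementedError("small magnitudes are outside this sketch")
```

A complete implementation needs the full ECMAScript number-to-string algorithm, which is why the reference implementation should be preferred for real SBOMs.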
String Serialization
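- Use \uXXXX escaping for control characters (U+0000 through U+001F); RFC 8785 uses the short forms \b, \t, \n, \f, and \r where they exist, and lowercase hex digits otherwise
- Use minimal escaping (no unnecessary escape sequences)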
- UTF-8 encoding
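A short Python sketch of these string rules follows. It relies on json.dumps with ensure_ascii=False, which escapes only the quote, the backslash, and control characters; jcs_string is a hypothetical helper name.

```python
import json


def jcs_string(value: str) -> bytes:
    """Serialize one JSON string with minimal escaping, encoded as UTF-8."""
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


control = jcs_string("a\u0001b")     # control character without a short form
newline = jcs_string("line\nbreak")  # control character with a short form
accented = jcs_string("caf\u00e9")   # non-ASCII text is emitted as raw UTF-8
solidus = jcs_string("a/b")          # "/" needs no escape
```

Non-ASCII characters are never escaped; they appear in the output as their UTF-8 byte sequences.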
Whitespace
- No whitespace between tokens (no indentation, no trailing newlines)
Null Handling
- Null values are serialized as null (not omitted)
- Missing optional fields are omitted entirely
Array Element Order
- Array elements maintain their original order (arrays are NOT sorted)
- This is critical: CycloneDX component arrays preserve document order
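The null and array rules have a practical consequence: documents that differ only in an explicit null or in array order receive different identifiers. The Python sketch below demonstrates this with a simplified canonicalizer (an approximation of JCS with hypothetical names, adequate for ASCII keys and integer numbers).

```python
import json


def canonicalize(raw: bytes) -> bytes:
    # Simplified JCS approximation: sorted keys, compact separators, UTF-8.
    doc = json.loads(raw)
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


# An explicit null is kept, so it is not equivalent to an omitted field.
with_null = canonicalize(b'{"name":"lib","version":null}')
omitted = canonicalize(b'{"name":"lib"}')

# Array order is preserved, so reordering components changes the canonical bytes.
order_ab = canonicalize(b'{"components":[{"name":"a"},{"name":"b"}]}')
order_ba = canonicalize(b'{"components":[{"name":"b"},{"name":"a"}]}')
```

Producers that want matching identifiers therefore need to agree on component order and on whether optional fields are emitted as null or left out.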
Implementation Reference
.NET Implementation
Use StellaOps.AuditPack.Services.CanonicalJson.Canonicalize(ReadOnlySpan<byte> json):
using StellaOps.AuditPack.Services;
using System.Security.Cryptography;
byte[] sbomJsonBytes = ...; // raw CycloneDX JSON
byte[] canonicalBytes = CanonicalJson.Canonicalize(sbomJsonBytes);
byte[] hash = SHA256.HashData(canonicalBytes);
string canonicalId = "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
CLI Implementation
# Using jcs_canonicalize (or equivalent tool)
jcs_canonicalize ./bom.json | sha256sum | awk '{print "sha256:" $1}'
Relationship to Existing Identifiers
ContentHash (CycloneDxArtifact.ContentHash)
- Current: sha256(raw_json_bytes), a hash of the serialized JSON, NOT canonical
- Relationship: ContentHash is a serialization-specific hash that depends on whitespace, key ordering, and JSON serializer settings. canonical_id is a content-specific hash that is stable across serializers.
- Coexistence: Both identifiers are retained. ContentHash is used for integrity verification of a specific serialized form. canonical_id is used for cross-module reference and evidence threading.
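The difference between the two identifiers can be shown with two serializations of the same document. In this Python sketch, content_hash and canonical_id are hypothetical helpers; the exact output format of ContentHash is an assumption, and the canonicalizer is a simplified approximation of JCS.

```python
import hashlib
import json


def content_hash(raw: bytes) -> str:
    # Hash of the bytes exactly as serialized (output format assumed: bare hex).
    return hashlib.sha256(raw).hexdigest()


def canonical_id(raw: bytes) -> str:
    # Hash of the canonical form (simplified JCS approximation).
    doc = json.loads(raw)
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


compact = b'{"bomFormat":"CycloneDX","specVersion":"1.7","version":1}'
pretty = b'{\n  "specVersion": "1.7",\n  "bomFormat": "CycloneDX",\n  "version": 1\n}'

same_canonical_id = canonical_id(compact) == canonical_id(pretty)
same_content_hash = content_hash(compact) == content_hash(pretty)
```

Here same_canonical_id is True and same_content_hash is False, which is why ContentHash suits byte-level integrity checks while canonical_id suits cross-module joins.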
stella.contentHash (SBOM metadata property)
- Current: Same as ContentHash above, emitted as a metadata property
- New: stella:canonical_id is emitted as a separate metadata property alongside stella.contentHash
CompositionRecipeSha256
- Purpose: Hash of the composition recipe (layer ordering), not the SBOM content itself
- Relationship: Independent. Composition recipe describes HOW the SBOM was built; canonical_id describes WHAT was built.
DSSE Subject Binding
When a DSSE attestation is created for an SBOM (predicate type StellaOps.SBOMAttestation@1), the subject MUST include canonical_id:
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "sbom",
      "digest": {
        "sha256": "<canonical_id_hex_without_prefix>"
      }
    }
  ],
  "predicateType": "StellaOps.SBOMAttestation@1",
  "predicate": { ... }
}
Note: The subject.digest.sha256 value is the hex hash WITHOUT the sha256: prefix, following in-toto convention.
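Building the subject from a canonical_id might look like the Python sketch below. The function names and the validation rules are assumptions for illustration, and the predicate content is left to the caller because this contract does not define it.

```python
def sbom_subject(canonical_id: str) -> dict:
    """Turn "sha256:<hex>" into an in-toto subject entry with the bare hex digest."""
    algorithm, _, digest_hex = canonical_id.partition(":")
    is_lower_hex = all(c in "0123456789abcdef" for c in digest_hex)
    if algorithm != "sha256" or len(digest_hex) != 64 or not is_lower_hex:
        raise ValueError(f"not a valid canonical_id: {canonical_id!r}")
    return {"name": "sbom", "digest": {"sha256": digest_hex}}


def build_statement(canonical_id: str, predicate: dict) -> dict:
    """Assemble the statement skeleton shown above around a caller-supplied predicate."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [sbom_subject(canonical_id)],
        "predicateType": "StellaOps.SBOMAttestation@1",
        "predicate": predicate,
    }
```

Rejecting malformed identifiers at this point keeps a prefixed or uppercase digest from ending up in a signed statement.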
Stability Guarantee
Given identical CycloneDX content (same components, same metadata, same vulnerabilities), computing canonical_id MUST yield the same value:
- Across different machines
- Across different .NET runtime versions
- Across serialization/deserialization round-trips
- Regardless of original JSON formatting (whitespace, key order)
This is the fundamental invariant that enables cross-module evidence joins.
Test Vectors
Vector 1: Minimal CycloneDX 1.7
Input:
{"bomFormat":"CycloneDX","specVersion":"1.7","version":1,"components":[]}
Expected canonical form: {"bomFormat":"CycloneDX","components":[],"specVersion":"1.7","version":1}
Expected canonical_id: compute SHA-256 of the canonical form bytes.
Vector 2: Key ordering
Input (keys out of order):
{"specVersion":"1.7","bomFormat":"CycloneDX","version":1}
Expected canonical form: {"bomFormat":"CycloneDX","specVersion":"1.7","version":1}
Must produce the same canonical_id as any other key ordering of the same content.
Vector 3: Whitespace normalization
Input (pretty-printed):
{
"bomFormat": "CycloneDX",
"specVersion": "1.7",
"version": 1
}
Must produce the same canonical_id as the minified form.
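The three vectors can be checked mechanically. The Python sketch below uses a simplified canonicalizer (an approximation of JCS that is sufficient for these inputs; the names are hypothetical) and asserts the expected canonical forms. It deliberately does not hard-code digest values.

```python
import hashlib
import json


def canonicalize(raw: bytes) -> bytes:
    # Simplified JCS approximation: sorted keys, compact separators, UTF-8.
    doc = json.loads(raw)
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def canonical_id(raw: bytes) -> str:
    return "sha256:" + hashlib.sha256(canonicalize(raw)).hexdigest()


VECTOR_1 = b'{"bomFormat":"CycloneDX","specVersion":"1.7","version":1,"components":[]}'
VECTOR_2 = b'{"specVersion":"1.7","bomFormat":"CycloneDX","version":1}'
VECTOR_3 = b'{\n  "bomFormat": "CycloneDX",\n  "specVersion": "1.7",\n  "version": 1\n}'

# Vector 1: keys are reordered so that "components" follows "bomFormat".
assert canonicalize(VECTOR_1) == b'{"bomFormat":"CycloneDX","components":[],"specVersion":"1.7","version":1}'

# Vectors 2 and 3: key order and whitespace do not affect the identifier.
assert canonicalize(VECTOR_2) == b'{"bomFormat":"CycloneDX","specVersion":"1.7","version":1}'
assert canonical_id(VECTOR_2) == canonical_id(VECTOR_3)
```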
Migration Notes
- New attestations MUST include canonical_id in the DSSE subject
- Existing attestations are NOT backfilled (they retain their original subject digests)
- Verification of historical attestations uses their original subject binding
- The stella:canonical_id metadata property is added to new SBOMs only
References
- RFC 8785: JSON Canonicalization Scheme (JCS)
- CycloneDX v1.7 specification
- DSSE v1.0 specification
- in-toto Statement v1 specification