# Contract: Canonical SBOM Identifier (v1)

## Status

- Status: DRAFT (2026-02-19)
- Owners: Scanner Guild, Attestor Guild
- Consumers: Scanner, Attestor, VexLens, EvidenceLocker, Graph, Policy

## Purpose

Define a single, deterministic, cross-module identifier for any CycloneDX SBOM document. All modules that reference SBOMs must use this identifier for cross-module joins, evidence threading, and verification.

## Definition

```
canonical_id := "sha256:" + hex(SHA-256(JCS(sbom_json)))
```

Where:

- `sbom_json` is the raw CycloneDX JSON document (v1.4 through v1.7)
- `JCS` is JSON Canonicalization Scheme per RFC 8785
- `SHA-256` is the hash function
- `hex` is lowercase hexadecimal encoding
- The result is prefixed with `sha256:` for algorithm agility

## Canonicalization Rules (RFC 8785)

### Object Key Ordering

All JSON object keys are sorted by comparing their UTF-16 code units as unsigned integers, as RFC 8785 specifies (equivalent to `StringComparer.Ordinal` in .NET). For keys that contain only Basic Multilingual Plane characters, this is the same as Unicode code point ordering.

### Number Serialization

- Integers: no leading zeros, no trailing zeros after decimal point
- Floating point: use shortest representation that round-trips exactly
- No scientific notation for integers

### String Serialization

- Use lowercase `\uXXXX` escaping for control characters (U+0000 through U+001F), except those with a predefined short form (`\b`, `\t`, `\n`, `\f`, `\r`), which use the short form
- Use minimal escaping (no unnecessary escape sequences)
- UTF-8 encoding

### Whitespace

- No whitespace between tokens (no indentation, no trailing newlines)

### Null Handling

- Null values are serialized as `null` (not omitted)
- Missing optional fields are omitted entirely

### Array Element Order

- Array elements maintain their original order (arrays are NOT sorted)
- This is critical: CycloneDX component arrays preserve document order

## Implementation Reference

### .NET Implementation

Use `StellaOps.AuditPack.Services.CanonicalJson.Canonicalize(ReadOnlySpan<byte> json)`:

```csharp
using StellaOps.AuditPack.Services;
using System.Security.Cryptography;

byte[] sbomJsonBytes = ...; // raw CycloneDX JSON

// Canonicalize per RFC 8785, then hash the canonical bytes.
byte[] canonicalBytes = CanonicalJson.Canonicalize(sbomJsonBytes);
byte[] hash = SHA256.HashData(canonicalBytes);

// Lowercase hex with the algorithm prefix.
string canonicalId = "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
```

### CLI Implementation

```bash
# Using jcs_canonicalize (or equivalent tool)
jcs_canonicalize ./bom.json | sha256sum | awk '{print "sha256:" $1}'
```

## Relationship to Existing Identifiers

### `ContentHash` (CycloneDxArtifact.ContentHash)

- **Current:** `sha256(raw_json_bytes)`, a hash of the serialized JSON, NOT canonical
- **Relationship:** `ContentHash` is a serialization-specific hash that depends on whitespace, key ordering, and JSON serializer settings. `canonical_id` is a content-specific hash that is stable across serializers.
- **Coexistence:** Both identifiers are retained. `ContentHash` is used for integrity verification of a specific serialized form. `canonical_id` is used for cross-module reference and evidence threading.

### `stella.contentHash` (SBOM metadata property)

- **Current:** Same as `ContentHash` above, emitted as a metadata property
- **New:** `stella:canonical_id` is emitted as a separate metadata property alongside `stella.contentHash`

### `CompositionRecipeSha256`

- **Purpose:** Hash of the composition recipe (layer ordering), not the SBOM content itself
- **Relationship:** Independent. Composition recipe describes HOW the SBOM was built; `canonical_id` describes WHAT was built.

## DSSE Subject Binding

When a DSSE attestation is created for an SBOM (predicate type `StellaOps.SBOMAttestation@1`), the subject MUST include `canonical_id`:

```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "sbom",
      "digest": {
        "sha256": ""
      }
    }
  ],
  "predicateType": "StellaOps.SBOMAttestation@1",
  "predicate": { ... }
}
```

Note: The `subject.digest.sha256` value is the hex hash WITHOUT the `sha256:` prefix, following in-toto convention.

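
The difference between a serialization-specific hash, `canonical_id`, and the DSSE subject digest can be illustrated with a short shell sketch in the style of the CLI pipeline above. It assumes `sha256sum` and `awk` are available, and it skips the JCS step by starting from a document that is already in canonical form. The `prefixed_sha256` helper and all variable names are illustrative only and are not part of any StellaOps tooling.

```bash
# Hash exactly the bytes given (printf '%s' adds no trailing newline)
# and prepend the algorithm prefix.
prefixed_sha256() {
  printf '%s' "$1" | sha256sum | awk '{print "sha256:" $1}'
}

# Two serializations of the same SBOM content.
# The first is already canonical: sorted keys, no whitespace.
canonical_form='{"bomFormat":"CycloneDX","specVersion":"1.7","version":1}'
reordered_form='{"specVersion":"1.7","bomFormat":"CycloneDX","version":1}'

# ContentHash-style digests cover the raw bytes, so they differ.
content_hash_canonical="$(prefixed_sha256 "$canonical_form")"
content_hash_reordered="$(prefixed_sha256 "$reordered_form")"

# canonical_id covers the canonical bytes. Both serializations map to
# canonical_form under JCS (not run here), so they share this value.
canonical_id="$content_hash_canonical"

# The in-toto subject digest carries the bare hex value without the prefix.
subject_digest="${canonical_id#sha256:}"

printf 'ContentHash (canonical bytes): %s\n' "$content_hash_canonical"
printf 'ContentHash (reordered bytes): %s\n' "$content_hash_reordered"
printf 'canonical_id:                  %s\n' "$canonical_id"
printf '{"name":"sbom","digest":{"sha256":"%s"}}\n' "$subject_digest"
```

One detail worth checking in any real pipeline: if the canonicalization tool appends a trailing newline, the hashed bytes are no longer the canonical form and the resulting identifier will not match other implementations.
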
## Stability Guarantee

Given identical CycloneDX content (same components, same metadata, same vulnerabilities), `canonical_id` MUST produce the same value:

- Across different machines
- Across different .NET runtime versions
- Across serialization/deserialization round-trips
- Regardless of original JSON formatting (whitespace, key order)

This is the fundamental invariant that enables cross-module evidence joins.

## Test Vectors

### Vector 1: Minimal CycloneDX 1.7

Input:

```json
{"bomFormat":"CycloneDX","specVersion":"1.7","version":1,"components":[]}
```

Expected canonical form: `{"bomFormat":"CycloneDX","components":[],"specVersion":"1.7","version":1}`

Expected canonical_id: compute SHA-256 of the canonical form bytes.

### Vector 2: Key ordering

Input (keys out of order):

```json
{"specVersion":"1.7","bomFormat":"CycloneDX","version":1}
```

Expected canonical form: `{"bomFormat":"CycloneDX","specVersion":"1.7","version":1}`

Must produce same `canonical_id` as any other key ordering of the same content.

### Vector 3: Whitespace normalization

Input (pretty-printed):

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.7",
  "version": 1
}
```

Must produce same `canonical_id` as the minified form.

## Migration Notes

- New attestations MUST include `canonical_id` in the DSSE subject
- Existing attestations are NOT backfilled (they retain their original subject digests)
- Verification of historical attestations uses their original subject binding
- The `stella:canonical_id` metadata property is added to new SBOMs only

## References

- RFC 8785: JSON Canonicalization Scheme (JCS)
- CycloneDX v1.7 specification
- DSSE v1.0 specification
- in-toto Statement v1 specification
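
As a companion to the test vectors, the expected identifiers can be derived by hashing the expected canonical forms directly. The sketch below assumes `sha256sum` and `awk` are available; `canonical_id_of` is a hypothetical helper name, and no JCS tool is invoked because the inputs are the already-canonical strings from Vectors 1 and 2. Vector 3 carries the same content as Vector 2, so it is expected to yield Vector 2's identifier once canonicalized.

```bash
# Expected canonical forms copied from the test vectors.
vector1_canonical='{"bomFormat":"CycloneDX","components":[],"specVersion":"1.7","version":1}'
vector2_canonical='{"bomFormat":"CycloneDX","specVersion":"1.7","version":1}'

# Hash the exact canonical bytes (no trailing newline) and add the prefix.
canonical_id_of() {
  printf '%s' "$1" | sha256sum | awk '{print "sha256:" $1}'
}

vector1_id="$(canonical_id_of "$vector1_canonical")"
vector2_id="$(canonical_id_of "$vector2_canonical")"

printf 'Vector 1 canonical_id:       %s\n' "$vector1_id"
printf 'Vector 2 and 3 canonical_id: %s\n' "$vector2_id"
```

An implementation under test can then be checked by feeding it each vector's input and comparing its output against these values.
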