# Contract: Canonical SBOM Identifier (v1)

## Status

- Status: DRAFT (2026-02-19)
- Owners: Scanner Guild, Attestor Guild
- Consumers: Scanner, Attestor, VexLens, EvidenceLocker, Graph, Policy

## Purpose

Define a single, deterministic, cross-module identifier for any CycloneDX SBOM document. All modules that reference SBOMs must use this identifier for cross-module joins, evidence threading, and verification.

## Definition

```
canonical_id := "sha256:" + hex(SHA-256(JCS(sbom_json)))
```

Where:

- `sbom_json` is the raw CycloneDX JSON document (v1.4 through v1.7)
- `JCS` is the JSON Canonicalization Scheme per RFC 8785; its output is a UTF-8 byte sequence, and those bytes are what is hashed
- `SHA-256` is the hash function
- `hex` is lowercase hexadecimal encoding (64 characters for a SHA-256 digest)
- The result is prefixed with `sha256:` for algorithm agility

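Once the canonical bytes exist, the remaining steps are mechanical. A minimal Python sketch of the hash-and-prefix step plus a format check, assuming the input is already RFC 8785 output; the function names are illustrative, not part of any StellaOps API:

```python
import hashlib
import re

# Shape of a well-formed identifier: "sha256:" plus 64 lowercase hex characters.
CANONICAL_ID_PATTERN = re.compile(r"sha256:[0-9a-f]{64}")


def canonical_id_from_canonical_bytes(canonical_bytes: bytes) -> str:
    """Hash already-canonicalized (RFC 8785) UTF-8 bytes into a canonical_id."""
    # hexdigest() returns lowercase hex, as the definition requires.
    return "sha256:" + hashlib.sha256(canonical_bytes).hexdigest()


def is_canonical_id(value: str) -> bool:
    """Check the textual format only; this does not recompute the hash."""
    return CANONICAL_ID_PATTERN.fullmatch(value) is not None
```

Keeping the format check separate from hashing lets a consumer reject malformed references cheaply before fetching the SBOM to recompute the digest.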
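## Canonicalization Rules (RFC 8785)

### Object Key Ordering

All JSON object keys are sorted lexicographically by their UTF-16 code units, as RFC 8785 requires (equivalent to `StringComparer.Ordinal` in .NET). This matches Unicode code point ordering except for keys containing characters outside the Basic Multilingual Plane, which sort before characters in the range U+E000 through U+FFFF.
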
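Python compares `str` values by code point, so a Python port has to request UTF-16 ordering explicitly. A minimal sketch (the helper name `jcs_sort_keys` is illustrative) that sorts on the big-endian UTF-16 encoding, whose byte order matches code-unit order:

```python
def jcs_sort_keys(keys):
    """Order object keys by UTF-16 code units, as RFC 8785 requires."""
    # Comparing UTF-16BE bytes lexicographically is the same as comparing
    # the 16-bit code units one by one (a shorter prefix sorts first).
    return sorted(keys, key=lambda key: key.encode("utf-16-be"))


keys = ["\ufffd", "\U0001f600", "version"]
print(ascii(sorted(keys)))         # code point order (Python default)
print(ascii(jcs_sort_keys(keys)))  # UTF-16 code unit order (what JCS requires)
```

The two orderings disagree only for keys that compare a supplementary-plane character against a character in U+E000 through U+FFFF at the same position.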
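### Number Serialization

- All numbers are treated as IEEE 754 double-precision values and serialized with the ECMAScript `Number.prototype.toString` algorithm, as RFC 8785 specifies
- Integral values have no decimal point and no trailing zeros: `1.0` and `1.00` both serialize as `1`, and `-0` serializes as `0`
- Non-integral values use the shortest representation that round-trips exactly (`4.50` serializes as `4.5`)
- Exponent notation appears only for magnitudes of 1e21 and above or below 1e-6 (for example `1e+21`, `1e-7`); integers below 1e21 never use it
- Integers whose magnitude exceeds 2^53 may lose precision during canonicalization
- `NaN` and `Infinity` are not valid JSON and are rejected
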
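The ECMAScript number-to-string rules can be ported directly. A minimal Python sketch, assuming CPython's `repr(float)`, which also yields the shortest digit string that round-trips; `jcs_number` is an illustrative name, and its branches mirror the cases of the ECMAScript algorithm:

```python
import math


def jcs_number(value) -> str:
    """Format a JSON number as RFC 8785 requires (ECMAScript Number::toString)."""
    value = float(value)  # JCS models every JSON number as an IEEE 754 double
    if math.isnan(value) or math.isinf(value):
        raise ValueError("NaN and Infinity are not valid JSON numbers")
    if value == 0:
        return "0"  # also covers -0.0
    sign = "-" if value < 0 else ""
    # repr() gives the shortest round-tripping digits, e.g. "4.5" or "1e-07".
    mantissa, _, exponent = repr(abs(value)).partition("e")
    int_part, _, frac_part = mantissa.partition(".")
    all_digits = int_part + frac_part
    significant = all_digits.lstrip("0")
    # n is the decimal exponent such that value == 0.<digits> * 10**n
    n = len(int_part) - (len(all_digits) - len(significant)) + int(exponent or 0)
    digits = significant.rstrip("0")
    k = len(digits)
    if k <= n <= 21:
        body = digits + "0" * (n - k)  # integral value, no decimal point
    elif 0 < n <= 21:
        body = digits[:n] + "." + digits[n:]  # e.g. 12.5
    elif -6 < n <= 0:
        body = "0." + "0" * -n + digits  # e.g. 0.001
    else:
        e = n - 1
        mantissa_text = digits[0] + ("." + digits[1:] if k > 1 else "")
        body = mantissa_text + "e" + ("+" if e >= 0 else "-") + str(abs(e))
    return sign + body
```

For example, `jcs_number(4.50)` returns `"4.5"` and `jcs_number(1e21)` returns `"1e+21"`.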
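### String Serialization

- Escape `"` as `\"` and `\` as `\\`
- Escape U+0008, U+0009, U+000A, U+000C, and U+000D as `\b`, `\t`, `\n`, `\f`, and `\r`
- Escape every other control character in U+0000 through U+001F as `\u00hh` with lowercase hexadecimal digits (for example `\u001f`)
- Emit all other characters as-is, with no unnecessary escape sequences (non-ASCII characters and `/` are not escaped)
- Strings containing unpaired surrogates (for example a lone U+D800) are invalid and MUST be rejected
- Encode the output as UTF-8
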
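The escaping rules above fit in one loop. A minimal Python sketch (`jcs_string` is an illustrative name) that serializes a single string value and rejects unpaired surrogates rather than emitting invalid UTF-8:

```python
# Two-character escapes mandated by RFC 8785; other C0 controls use \u00hh.
SHORT_ESCAPES = {
    '"': '\\"',
    "\\": "\\\\",
    "\b": "\\b",
    "\t": "\\t",
    "\n": "\\n",
    "\f": "\\f",
    "\r": "\\r",
}


def jcs_string(text: str) -> str:
    """Serialize a Python str as an RFC 8785 JSON string literal."""
    out = ['"']
    for ch in text:
        code = ord(ch)
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif code < 0x20:
            out.append(f"\\u{code:04x}")  # lowercase hex, e.g. \u001f
        elif 0xD800 <= code <= 0xDFFF:
            raise ValueError("unpaired surrogate in string")
        else:
            out.append(ch)  # non-ASCII and "/" are emitted as-is
    out.append('"')
    return "".join(out)


print(jcs_string('say "hi"\tnow'))
```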
### Whitespace

- No whitespace between tokens: no indentation, no spaces after `:` or `,`, and no trailing newline

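### Null Handling

- Null values are serialized as `null` (not omitted)
- Missing optional fields stay absent; canonicalization never adds or removes properties
- Consequently, `{"a":null}` and `{}` produce different `canonical_id` values
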
### Array Element Order

- Array elements keep their original order (arrays are NOT sorted)
- This is critical for CycloneDX: component arrays preserve document order, so two SBOMs that list the same components in a different order produce different `canonical_id` values

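Putting the structural rules together, a minimal Python sketch of a canonicalizer over parsed JSON. Assumptions: numbers are limited to integral values within ±2^53 (non-integral values raise and would need the full ECMAScript number algorithm), and strings go through `json.dumps(..., ensure_ascii=False)`, whose escaping matches the RFC 8785 string rules. It is an illustration, not a substitute for the reference implementation:

```python
import hashlib
import json

MAX_SAFE_INTEGER = 2**53  # larger magnitudes are not exact in an IEEE 754 double


def _reject_duplicates(pairs):
    # RFC 8785 input must be I-JSON, which forbids duplicate property names.
    obj = dict(pairs)
    if len(obj) != len(pairs):
        raise ValueError("duplicate object key")
    return obj


def canonicalize(value) -> str:
    """Serialize parsed JSON per RFC 8785, limited to integral numbers."""
    if value is None:
        return "null"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # 1.0 -> 1 and -0.0 -> 0, as JCS requires
    if isinstance(value, int):
        if abs(value) > MAX_SAFE_INTEGER:
            raise ValueError("integer outside the exactly representable range")
        return str(value)
    if isinstance(value, float):
        raise NotImplementedError("non-integral numbers need the ECMAScript algorithm")
    if isinstance(value, str):
        # Short escapes, lowercase \u00hh for other controls, everything else as-is.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        # Arrays keep document order; only object keys are sorted.
        return "[" + ",".join(canonicalize(item) for item in value) + "]"
    if isinstance(value, dict):
        # Keys ordered by UTF-16 code units; no whitespace between tokens.
        keys = sorted(value, key=lambda key: key.encode("utf-16-be"))
        members = (json.dumps(k, ensure_ascii=False) + ":" + canonicalize(value[k]) for k in keys)
        return "{" + ",".join(members) + "}"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


def canonical_id(sbom_json: bytes) -> str:
    """canonical_id := "sha256:" + hex(SHA-256(JCS(sbom_json)))"""
    parsed = json.loads(sbom_json, object_pairs_hook=_reject_duplicates)
    # encode() also fails on unpaired surrogates, which RFC 8785 forbids.
    canonical = canonicalize(parsed).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


print(canonical_id(b'{"specVersion":"1.7","bomFormat":"CycloneDX","version":1}'))
```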
## Implementation Reference

### .NET Implementation

Use `StellaOps.AuditPack.Services.CanonicalJson.Canonicalize(ReadOnlySpan<byte> json)`:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using StellaOps.AuditPack.Services;

byte[] sbomJsonBytes = File.ReadAllBytes("bom.json"); // raw CycloneDX JSON, as received
// RFC 8785 canonical bytes (byte[] converts implicitly to ReadOnlySpan<byte>)
byte[] canonicalBytes = CanonicalJson.Canonicalize(sbomJsonBytes);
byte[] hash = SHA256.HashData(canonicalBytes);
// Lowercase hex plus the algorithm prefix
string canonicalId = "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
```

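### CLI Implementation

```bash
# jcs_canonicalize stands for any RFC 8785 canonicalizer (or equivalent tool) writing to stdout.
# It must not append a trailing newline: an extra byte changes the hash.
jcs_canonicalize ./bom.json | sha256sum | awk '{print "sha256:" $1}'
```
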
## Relationship to Existing Identifiers

### `ContentHash` (CycloneDxArtifact.ContentHash)

- **Current:** `sha256(raw_json_bytes)`, a hash of the serialized JSON bytes, NOT of the canonical form
- **Relationship:** `ContentHash` is serialization-specific: it changes with whitespace, key ordering, and JSON serializer settings. `canonical_id` is content-specific and stays stable across serializers and formatting.
- **Coexistence:** Both identifiers are retained. `ContentHash` is used for integrity verification of a specific serialized form. `canonical_id` is used for cross-module reference and evidence threading.

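The difference shows up with two serializations of the same content. A minimal Python sketch; for brevity it stands in for JCS with `json.dumps(sort_keys=True, separators=(",", ":"), ensure_ascii=False)`, which agrees with RFC 8785 only for documents like this one (ASCII keys, integer numbers). The helper names are illustrative:

```python
import hashlib
import json

minified = b'{"specVersion":"1.7","bomFormat":"CycloneDX","version":1}'
pretty = b'{\n  "bomFormat": "CycloneDX",\n  "specVersion": "1.7",\n  "version": 1\n}\n'


def content_hash(raw: bytes) -> str:
    """ContentHash-style digest: SHA-256 over the bytes exactly as serialized."""
    return hashlib.sha256(raw).hexdigest()


def simplified_canonical_id(raw: bytes) -> str:
    """canonical_id over a simplified canonical form (ASCII keys, integers only)."""
    canonical = json.dumps(json.loads(raw), sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(content_hash(minified) == content_hash(pretty))                        # False: bytes differ
print(simplified_canonical_id(minified) == simplified_canonical_id(pretty))  # True: same content
```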
### `stella.contentHash` (SBOM metadata property)

- **Current:** Same value as `ContentHash` above, emitted as a metadata property
- **New:** `stella:canonical_id` is emitted as a separate metadata property alongside `stella.contentHash`

### `CompositionRecipeSha256`

- **Purpose:** Hash of the composition recipe (layer ordering), not of the SBOM content itself
- **Relationship:** Independent. The composition recipe describes HOW the SBOM was built; `canonical_id` describes WHAT was built.

## DSSE Subject Binding

When a DSSE attestation is created for an SBOM (predicate type `StellaOps.SBOMAttestation@1`), the in-toto Statement subject MUST carry the `canonical_id` digest:

```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "sbom",
      "digest": {
        "sha256": "<canonical_id_hex_without_prefix>"
      }
    }
  ],
  "predicateType": "StellaOps.SBOMAttestation@1",
  "predicate": { ... }
}
```

Note: The `subject.digest.sha256` value is the 64-character lowercase hex hash WITHOUT the `sha256:` prefix, following the in-toto convention that the digest map key names the algorithm.

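Converting between the two encodings means moving the algorithm name from the value prefix into the digest key. A minimal Python sketch; `subject_from_canonical_id` and `canonical_id_from_subject` are hypothetical helper names, and the subject name `"sbom"` is taken from the example statement:

```python
import re

# "sha256:" followed by exactly 64 lowercase hex characters
CANONICAL_ID = re.compile(r"sha256:([0-9a-f]{64})")


def subject_from_canonical_id(canonical_id: str, name: str = "sbom") -> dict:
    """Build an in-toto subject entry; the digest value omits the "sha256:" prefix."""
    match = CANONICAL_ID.fullmatch(canonical_id)
    if match is None:
        raise ValueError(f"not a canonical_id: {canonical_id!r}")
    return {"name": name, "digest": {"sha256": match.group(1)}}


def canonical_id_from_subject(subject: dict) -> str:
    """Inverse mapping, e.g. to join an attestation subject back to an SBOM."""
    candidate = "sha256:" + subject["digest"]["sha256"]
    if CANONICAL_ID.fullmatch(candidate) is None:
        raise ValueError("subject digest is not 64 lowercase hex characters")
    return candidate
```

Round-tripping through both helpers returns the original `canonical_id`, so evidence keyed by either form can be joined.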
## Stability Guarantee

Given identical CycloneDX content (same components, metadata, and vulnerabilities, with array elements in the same order), `canonical_id` MUST produce the same value:

- Across different machines
- Across different .NET runtime versions
- Across serialization/deserialization round-trips
- Regardless of original JSON formatting (whitespace, key order)

This is the fundamental invariant that enables cross-module evidence joins.

## Test Vectors

### Vector 1: Minimal CycloneDX 1.7

Input:

```json
{"bomFormat":"CycloneDX","specVersion":"1.7","version":1,"components":[]}
```

Expected canonical form: `{"bomFormat":"CycloneDX","components":[],"specVersion":"1.7","version":1}`

Expected `canonical_id`: `sha256:` followed by the lowercase hex SHA-256 of the UTF-8 bytes of the canonical form above.

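The expected identifier can be derived directly from the canonical form. A minimal Python sketch that hashes the Vector 1 canonical form as UTF-8:

```python
import hashlib

# Canonical form of Vector 1, exactly as listed (no trailing newline).
canonical_form = '{"bomFormat":"CycloneDX","components":[],"specVersion":"1.7","version":1}'

canonical_id = "sha256:" + hashlib.sha256(canonical_form.encode("utf-8")).hexdigest()
print(canonical_id)
```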
### Vector 2: Key ordering

Input (keys out of order):

```json
{"specVersion":"1.7","bomFormat":"CycloneDX","version":1}
```

Expected canonical form: `{"bomFormat":"CycloneDX","specVersion":"1.7","version":1}`

Must produce the same `canonical_id` as any other key ordering of the same content.

### Vector 3: Whitespace normalization

Input (pretty-printed):

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.7",
  "version": 1
}
```

Must produce the same `canonical_id` as Vector 2: both inputs canonicalize to `{"bomFormat":"CycloneDX","specVersion":"1.7","version":1}`.

## Migration Notes

- New attestations MUST include `canonical_id` in the DSSE subject
- Existing attestations are NOT backfilled; they retain their original subject digests
- Verification of historical attestations uses their original subject binding
- The `stella:canonical_id` metadata property is added to new SBOMs only

## References

- RFC 8785: JSON Canonicalization Scheme (JCS)
- CycloneDX v1.7 specification
- DSSE v1.0 specification
- in-toto Statement v1 specification