Canonicalization Version Migration Guide
Version: 1.0 | Status: Active | Last Updated: 2025-12-24
Overview
This guide describes the migration path for content-addressed identifiers from legacy (unversioned) canonicalization to versioned canonicalization (stella:canon:v1). Versioned canonicalization embeds algorithm version markers in the canonical JSON before hashing, ensuring forward compatibility and verifier clarity.
Why Versioning?
Problem
Legacy content-addressed IDs (EvidenceID, ReasoningID, VEXVerdictID, ProofBundleID) were computed as:
```
hash = SHA256(canonicalize(payload))
```
If the canonicalization algorithm ever changed (bug fix, specification update, optimization), re-canonicalizing a payload would no longer reproduce its stored hash, and nothing in the hash or the payload would record which algorithm produced it.
Solution
Versioned content-addressed IDs embed a version marker:
```
hash = SHA256(canonicalize_with_version(payload, "stella:canon:v1"))
```
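The canonical JSON now includes `_canonVersion` as the first field (shown pretty-printed here; the canonical form itself is compact, with no whitespace):
```json
{
  "_canonVersion": "stella:canon:v1",
  "sbomEntryId": "sha256:91f2ab3c:pkg:npm/lodash@4.17.21",
  "vulnerabilityId": "CVE-2021-23337"
}
```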
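To make the difference between the two formulas concrete, here is a minimal C# sketch that computes both the legacy and the versioned canonical form of the example payload and hashes each. It is an approximation under stated assumptions, not the StellaOps.Canonical.Json implementation: `WriteCanonical`, `Canonicalize` and `Sha256Id` are hypothetical helpers built on System.Text.Json (.NET 6 or later). Object keys are sorted by UTF-16 code unit as RFC 8785 requires, but numbers and string escapes are passed through System.Text.Json rather than serialized exactly per RFC 8785.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical helper: writes compact JSON with object keys sorted by UTF-16 code unit
// (ordinal order). Approximates RFC 8785; numbers and escapes are NOT normalized.
static void WriteCanonical(Utf8JsonWriter writer, JsonNode? node)
{
    switch (node)
    {
        case JsonObject obj:
            writer.WriteStartObject();
            foreach (var kv in obj.OrderBy(p => p.Key, StringComparer.Ordinal))
            {
                writer.WritePropertyName(kv.Key);
                WriteCanonical(writer, kv.Value);
            }
            writer.WriteEndObject();
            break;
        case JsonArray arr:
            writer.WriteStartArray();
            foreach (var item in arr) WriteCanonical(writer, item);
            writer.WriteEndArray();
            break;
        case null:
            writer.WriteNullValue();
            break;
        default:
            node.WriteTo(writer); // strings, numbers, booleans written as parsed
            break;
    }
}

// Hypothetical helper: legacy form when version is null, versioned form otherwise.
static byte[] Canonicalize(string json, string? version = null)
{
    JsonNode root = JsonNode.Parse(json) ?? throw new ArgumentException("payload is JSON null");
    if (version is not null)
        root.AsObject()["_canonVersion"] = version; // the marker is sorted with the other keys
    using var buffer = new MemoryStream();
    var options = new JsonWriterOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };
    using (var writer = new Utf8JsonWriter(buffer, options))
    {
        WriteCanonical(writer, root);
    }
    return buffer.ToArray();
}

// Hypothetical helper: "sha256:<lowercase hex>", the ID style used elsewhere in this guide.
static string Sha256Id(byte[] canonical) =>
    "sha256:" + Convert.ToHexString(SHA256.HashData(canonical)).ToLowerInvariant();

// Keys deliberately out of order: canonicalization sorts them.
const string payload = "{\"vulnerabilityId\":\"CVE-2021-23337\",\"sbomEntryId\":\"sha256:91f2ab3c:pkg:npm/lodash@4.17.21\"}";

byte[] legacyBytes = Canonicalize(payload);
byte[] v1Bytes = Canonicalize(payload, "stella:canon:v1");

Console.WriteLine(Encoding.UTF8.GetString(legacyBytes));
Console.WriteLine(Encoding.UTF8.GetString(v1Bytes));
Console.WriteLine(Sha256Id(legacyBytes));
Console.WriteLine(Sha256Id(v1Bytes)); // differs: the marker is part of the hashed bytes
```

Because the marker is inside the hashed bytes, the same payload yields two distinct IDs, and the versioned one carries its algorithm identity with it.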
Migration Phases
Phase 1: Dual-Mode (Current)
Timeline: Now
Behavior:
- Generation: All new content-addressed IDs use versioned canonicalization (v1)
- Verification: Accept both legacy and v1 hashes
- Detection: `CanonVersion.IsVersioned()` distinguishes the two formats
Impact:
- Zero downtime migration
- Existing attestations remain valid
- New attestations get version markers
Phase 2: Deprecation Warning
Timeline: +6 months from Phase 1
Behavior:
- Log warnings when verifying legacy hashes
- Emit metrics for legacy hash encounters
- Continue accepting legacy hashes
Operator Action:
- Monitor the `canon_legacy_hash_verified_total` metric
- Plan re-attestation of critical assets
Phase 3: Legacy Rejection
Timeline: +12 months from Phase 1
Behavior:
- Reject legacy hashes during verification
- Only v1 (or newer) hashes accepted
Operator Action:
- Re-attest any remaining legacy attestations before cutoff
- Use the `stella rehash` CLI command for bulk re-attestation
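The three phases amount to a small acceptance policy. The C# sketch below encodes it under stated assumptions: the phase constants, `Policy`, `Verify` and the local counter are hypothetical names for illustration, and the counter merely stands in for the `canon_legacy_hash_verified_total` metric that a real service would emit through its metrics library. The actual hash comparison is assumed to happen elsewhere and is passed in as `hashMatches`.

```csharp
using System;

// Hypothetical identifiers for the three migration phases in this guide.
const int DualMode = 1;           // Phase 1
const int DeprecationWarning = 2; // Phase 2
const int LegacyRejection = 3;    // Phase 3

// Hypothetical stand-in for the canon_legacy_hash_verified_total metric.
long legacyHashVerifiedTotal = 0;

// Which hashes each phase accepts, and whether a legacy hash should be flagged.
(bool Allowed, bool Warn) Policy(int phase, bool isVersioned) => (phase, isVersioned) switch
{
    (_, true) => (true, false),                  // v1 hashes: accepted in every phase
    (DualMode, false) => (true, false),          // Phase 1: legacy accepted silently
    (DeprecationWarning, false) => (true, true), // Phase 2: accepted, but logged and counted
    (LegacyRejection, false) => (false, false),  // Phase 3: legacy rejected
    _ => throw new ArgumentOutOfRangeException(nameof(phase)),
};

// hashMatches is the result of the actual hash comparison, done elsewhere.
bool Verify(int phase, bool isVersioned, bool hashMatches)
{
    var (allowed, warn) = Policy(phase, isVersioned);
    if (!allowed)
    {
        Console.Error.WriteLine("legacy hash not accepted");
        return false;
    }
    if (warn)
    {
        // Counted each time a legacy hash reaches verification in Phase 2.
        Console.Error.WriteLine("warning: legacy (unversioned) hash encountered");
        legacyHashVerifiedTotal++;
    }
    return hashMatches;
}

Console.WriteLine(Verify(LegacyRejection, isVersioned: false, hashMatches: true)); // False
```

Keeping the phase as a single switch makes the cutover a configuration change rather than a code change: moving from Phase 2 to Phase 3 only flips which arm a legacy hash hits.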
Detection and Verification
Detecting Versioned Hashes
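Versioned canonical JSON starts with `{"_canonVersion":` because keys are sorted by code unit and the underscore (U+005F) sorts before every lowercase ASCII letter. It sorts after digits and uppercase letters, however, so if the marker is placed by sorting alone it lands first only while every other top-level key starts with a lowercase letter, as camelCase keys such as `sbomEntryId` do.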
```csharp
using StellaOps.Canonical.Json;

// GetCanonicalPayload() is a placeholder for however the stored canonical JSON bytes are loaded
byte[] canonicalJson = GetCanonicalPayload();

// Check if canonical JSON is versioned
bool isVersioned = CanonVersion.IsVersioned(canonicalJson);

// Extract version if present
string? version = CanonVersion.ExtractVersion(canonicalJson);
if (version == CanonVersion.V1)
{
    // Use V1 verification algorithm
}
```
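For verifiers written without the StellaOps packages, the detection rule described above can be sketched in plain C#. `IsVersionedSketch` and `ExtractVersionSketch` are hypothetical names; this is not the library's source, only one way the described behavior could be implemented. The prefix check assumes exact canonical bytes (no leading whitespace, marker sorted first).

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical stand-in for CanonVersion.IsVersioned: a byte-prefix check on canonical JSON.
static bool IsVersionedSketch(byte[] canonicalJson)
{
    ReadOnlySpan<byte> data = canonicalJson;
    ReadOnlySpan<byte> prefix = Encoding.UTF8.GetBytes("{\"_canonVersion\":");
    return data.StartsWith(prefix);
}

// Hypothetical stand-in for CanonVersion.ExtractVersion: returns null for legacy payloads.
static string? ExtractVersionSketch(byte[] canonicalJson)
{
    if (!IsVersionedSketch(canonicalJson)) return null;
    try
    {
        var reader = new Utf8JsonReader(canonicalJson);
        reader.Read(); // StartObject
        reader.Read(); // PropertyName "_canonVersion" (guaranteed by the prefix check)
        return reader.Read() && reader.TokenType == JsonTokenType.String ? reader.GetString() : null;
    }
    catch (JsonException)
    {
        return null; // truncated or malformed JSON after the prefix
    }
}

byte[] v1 = Encoding.UTF8.GetBytes("{\"_canonVersion\":\"stella:canon:v1\",\"vulnerabilityId\":\"CVE-2021-23337\"}");
byte[] legacy = Encoding.UTF8.GetBytes("{\"vulnerabilityId\":\"CVE-2021-23337\"}");

Console.WriteLine(ExtractVersionSketch(v1) ?? "(legacy)");     // stella:canon:v1
Console.WriteLine(ExtractVersionSketch(legacy) ?? "(legacy)"); // (legacy)
```

The prefix test is cheap and needs no parsing; the reader is used only to pull out the version string once the prefix has matched.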
Verifying Both Formats
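During Phases 1 and 2, verifiers should accept both formats:
```csharp
public bool VerifyContentAddressedId(byte[] payload, string expectedHash)
{
    // Versioned payload: hash with the version recorded in the payload itself rather than
    // CanonVersion.Current, so v1 IDs keep verifying after a newer version ships
    string? version = CanonVersion.ExtractVersion(payload);
    if (version is not null)
    {
        var hash = CanonJson.HashVersioned(payload, version);
        return hash == expectedHash;
    }

    // Fall back to legacy (unversioned); disable this path once Phase 3 rejects legacy hashes
    var legacyHash = CanonJson.Hash(payload);
    return legacyHash == expectedHash;
}
```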
Re-Attestation Procedure
When to Re-Attest
Re-attestation is required when:
- Moving from Phase 1 to Phase 3
- Migrating attestations between systems
- Archiving for long-term storage
CLI Re-Attestation
```bash
# Re-attest a single attestation bundle
stella rehash --input attestation.json --output attestation-v1.json

# Bulk re-attest all attestations in a directory
stella rehash --input-dir /var/stella/attestations \
  --output-dir /var/stella/attestations-v1 \
  --version stella:canon:v1

# Dry-run to preview changes
stella rehash --input attestation.json --dry-run
```
Database Migration
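For PostgreSQL-stored attestations. The queries assume `content` stores the exact canonical JSON as text; a `jsonb` column does not preserve key order or whitespace, so a prefix match will not work there.
```sql
-- Find legacy attestations (those without version marker).
-- '_' is a LIKE wildcard, so it is escaped as '\_' to match a literal underscore.
SELECT id, content_hash, created_at
FROM attestations
WHERE content NOT LIKE '{"\_canonVersion":%'
ORDER BY created_at;

-- Export for re-processing.
-- COPY ... TO writes on the database server and requires superuser or pg_write_server_files;
-- psql's \copy writes the file on the client instead.
COPY (
    SELECT id, content
    FROM attestations
    WHERE content NOT LIKE '{"\_canonVersion":%'
) TO '/tmp/legacy_attestations.csv' WITH CSV HEADER;
```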
Troubleshooting
Hash Mismatch Errors
Symptom: Verification fails with "hash mismatch" error.
Diagnosis:
- Check if the stored hash was computed with legacy or versioned canonicalization
- Check the current verification mode (Phase 1/2/3)
Resolution:
```bash
# Inspect the attestation format
stella inspect attestation.json --show-version

# Output:
# Canonicalization Version: stella:canon:v1
# Hash Algorithm: SHA-256
# Computed Hash: sha256:7b8c9d0e...
```
Legacy Hash in Phase 3
Symptom: Verification rejected with "legacy hash not accepted" error.
Resolution:
- Re-attest the content with versioned canonicalization
- Update any references to the old hash
```bash
stella rehash --input old.json --output new.json
stella verify new.json  # Should pass
```
Performance Considerations
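Versioned canonicalization adds a fixed 34 bytes to each canonical payload (`"_canonVersion":"stella:canon:v1",`: 33 bytes for the key-value pair plus 1 for the comma separating it from the next field). The impact on:
- Hash computation time: <1μs overhead
- Storage size: the relative increase shrinks as payloads grow; it falls below 0.1% only for payloads larger than about 34 KB, while the 91-byte canonical form of the two-field example above grows by about 37%
- Network transfer: compression of batched payloads largely absorbs the marker, since the same string repeats in every payload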
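The marker overhead is plain arithmetic and can be checked by counting UTF-8 bytes. This short C# snippet measures the marker and the compact canonical form of the two-field example payload from this guide, and derives the payload size at which the marker is exactly 0.1% of the original.

```csharp
using System;
using System.Text;

// Bytes a v1 marker adds in front of the payload's existing fields (key, value, comma)
const string marker = "\"_canonVersion\":\"stella:canon:v1\",";
int overhead = Encoding.UTF8.GetByteCount(marker);

// Compact canonical form of the two-field example payload
const string legacy = "{\"sbomEntryId\":\"sha256:91f2ab3c:pkg:npm/lodash@4.17.21\",\"vulnerabilityId\":\"CVE-2021-23337\"}";
int baseSize = Encoding.UTF8.GetByteCount(legacy);

// Payload size at which the fixed overhead equals 0.1% of the payload
int breakEven = overhead * 1000;

Console.WriteLine($"marker overhead: {overhead} bytes"); // marker overhead: 34 bytes
Console.WriteLine($"example payload: {baseSize} -> {baseSize + overhead} bytes");
Console.WriteLine($"0.1% break-even: {breakEven} bytes"); // 0.1% break-even: 34000 bytes
```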
Version History
| Version | Identifier | Status | Notes |
|---|---|---|---|
| V1 | stella:canon:v1 | Current | RFC 8785 JSON canonicalization |
| Legacy | (none) | Deprecated | Pre-versioning; no version marker |