# Canonicalization Version Migration Guide

**Version**: 1.0  
**Status**: Active  
**Last Updated**: 2025-12-24

---

## Overview

This guide describes the migration path for content-addressed identifiers from legacy (unversioned) canonicalization to versioned canonicalization (`stella:canon:v1`). Versioned canonicalization embeds an algorithm version marker in the canonical JSON before hashing, so a verifier can always tell which algorithm produced a given hash and later algorithm changes stay forward compatible.

## Why Versioning?

### Problem

Legacy content-addressed IDs (EvidenceID, ReasoningID, VEXVerdictID, ProofBundleID) were computed as:

```
hash = SHA256(canonicalize(payload))
```

If the canonicalization algorithm ever changed (bug fix, specification update, optimization), recomputed hashes would no longer match the stored ones, and there would be no way to detect which algorithm had produced a given hash.

### Solution

Versioned content-addressed IDs embed a version marker:

```
hash = SHA256(canonicalize_with_version(payload, "stella:canon:v1"))
```

The canonical JSON now includes `_canonVersion` as the first field (shown pretty-printed here; the canonical bytes contain no insignificant whitespace):

```json
{
  "_canonVersion": "stella:canon:v1",
  "sbomEntryId": "sha256:91f2ab3c:pkg:npm/lodash@4.17.21",
  "vulnerabilityId": "CVE-2021-23337"
}
```

## Migration Phases

### Phase 1: Dual-Mode (Current)

**Timeline**: Now

**Behavior**:
- **Generation**: All new content-addressed IDs use versioned canonicalization (v1)
- **Verification**: Accept both legacy and v1 hashes
- **Detection**: `CanonVersion.IsVersioned()` distinguishes the two formats

**Impact**:
- Zero-downtime migration
- Existing attestations remain valid
- New attestations get version markers

### Phase 2: Deprecation Warning

**Timeline**: +6 months from Phase 1

**Behavior**:
- Log warnings when verifying legacy hashes
- Emit metrics for legacy hash encounters
- Continue accepting legacy hashes

**Operator Action**:
- Monitor the `canon_legacy_hash_verified_total` metric
- Plan re-attestation of critical assets

### Phase 3: Legacy Rejection

**Timeline**: +12 months from Phase 1

**Behavior**:
- Reject legacy hashes during verification
- Accept only v1 (or newer) hashes

**Operator Action**:
- Re-attest any remaining legacy attestations before the cutoff
- Use the `stella rehash` CLI command for bulk re-attestation

---

## Detection and Verification

### Detecting Versioned Hashes

Versioned canonical JSON always starts with `{"_canonVersion":` because keys are sorted lexicographically and, under RFC 8785 ordering, the underscore (U+005F) sorts before every lowercase ASCII letter. The prefix check therefore assumes that no top-level key begins with a character that sorts below `_`, such as a digit or an uppercase letter.

```csharp
using StellaOps.Canonical.Json;

// Check if canonical JSON is versioned
byte[] canonicalJson = GetCanonicalPayload();
bool isVersioned = CanonVersion.IsVersioned(canonicalJson);

// Extract version if present
string? version = CanonVersion.ExtractVersion(canonicalJson);
if (version == CanonVersion.V1)
{
    // Use V1 verification algorithm
}
```

### Verifying Both Formats

During Phases 1 and 2, verifiers should accept both formats:

```csharp
public bool VerifyContentAddressedId(byte[] payload, string expectedHash)
{
    // Try versioned first (new format)
    if (CanonVersion.IsVersioned(payload))
    {
        var hash = CanonJson.HashVersioned(payload, CanonVersion.Current);
        return hash == expectedHash;
    }

    // Fall back to legacy (unversioned)
    var legacyHash = CanonJson.Hash(payload);
    return legacyHash == expectedHash;
}
```

---

## Re-Attestation Procedure

### When to Re-Attest

Re-attestation is required when:
- Preparing for Phase 3, after which legacy hashes are rejected
- Migrating attestations between systems
- Archiving for long-term storage

### CLI Re-Attestation

```bash
# Re-attest a single attestation bundle
stella rehash --input attestation.json --output attestation-v1.json

# Bulk re-attest all attestations in a directory
stella rehash --input-dir /var/stella/attestations \
              --output-dir /var/stella/attestations-v1 \
              --version stella:canon:v1

# Dry-run to preview changes
stella rehash --input attestation.json --dry-run
```

### Database Migration

For PostgreSQL-stored attestations:

```sql
-- Find legacy attestations (those without a version marker).
-- Assumes "content" is a text column holding the canonical JSON.
-- In LIKE patterns "_" matches any single character, so it is escaped as "\_".
SELECT id, content_hash, created_at
FROM attestations
WHERE content NOT LIKE '{"\_canonVersion":%'
ORDER BY created_at;

-- Export for re-processing.
-- COPY ... TO writes the file on the database server and needs the matching
-- privileges; psql's \copy writes it on the client instead.
COPY (
    SELECT id, content
    FROM attestations
    WHERE content NOT LIKE '{"\_canonVersion":%'
) TO '/tmp/legacy_attestations.csv' WITH CSV HEADER;
```

---

## Troubleshooting

### Hash Mismatch Errors

**Symptom**: Verification fails with a "hash mismatch" error.

**Diagnosis**:
1. Check whether the stored hash was computed with legacy or versioned canonicalization
2. Check the current verification mode (Phase 1/2/3)

**Resolution**:

```bash
# Inspect the attestation format
stella inspect attestation.json --show-version

# Output:
# Canonicalization Version: stella:canon:v1
# Hash Algorithm: SHA-256
# Computed Hash: sha256:7b8c9d0e...
```

### Legacy Hash in Phase 3

**Symptom**: Verification is rejected with a "legacy hash not accepted" error.

**Resolution**:
1. Re-attest the content with versioned canonicalization
2. Update any references to the old hash

```bash
stella rehash --input old.json --output new.json
stella verify new.json  # Should pass
```

### Performance Considerations

Versioned canonicalization adds 34 bytes to each canonical payload (the inserted member `"_canonVersion":"stella:canon:v1",`). This has negligible impact on:
- Hash computation time (<1μs overhead)
- Storage size (<0.1% increase for typical payloads)
- Network transfer (compression eliminates overhead)

---

## Version History

| Version | Identifier | Status | Notes |
|---------|------------|--------|-------|
| V1 | `stella:canon:v1` | **Current** | RFC 8785 JSON canonicalization |
| Legacy | (none) | Deprecated | Pre-versioning; no version marker |

## Related Documentation

- [Proof Chain Specification](../modules/attestor/proof-chain-specification.md)
- [Content-Addressed Identifier System](../modules/attestor/proof-chain-specification.md#content-addressed-identifier-system)
- [CanonJson API Reference](../api/canon-json.md)
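
---

## Appendix: Classifying Payload Files from the Shell

The detection rule in this guide (versioned canonical JSON starts with `{"_canonVersion":`) can also be applied to payload files on disk during a Phase 2 audit. The following is a minimal sketch, not part of the `stella` CLI: the function names `canon_format` and `canon_sha256` are hypothetical, it assumes each file holds the raw canonical JSON bytes, and it relies on `head -c` and `sha256sum` as provided by GNU coreutils. The digest it prints is simply the SHA-256 of the file bytes, so it matches a stored content-addressed ID only if those bytes are exactly what was hashed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the stella CLI) for triaging canonical
# JSON payload files by their leading bytes.

# Prefix that versioned canonical JSON starts with, per the detection rule.
CANON_PREFIX='{"_canonVersion":'

# Print "versioned" if the file starts with the marker prefix, else "legacy".
canon_format() {
  local file=$1 prefix
  # Read exactly as many bytes as the (ASCII-only) prefix is long.
  prefix=$(head -c "${#CANON_PREFIX}" "$file") || return 1
  if [ "$prefix" = "$CANON_PREFIX" ]; then
    echo versioned
  else
    echo legacy
  fi
}

# Print the SHA-256 of the file's raw bytes in "sha256:<hex>" form.
canon_sha256() {
  local digest
  # Reading from stdin keeps the file name out of sha256sum's output.
  digest=$(sha256sum < "$1") || return 1
  echo "sha256:${digest%% *}"
}
```

After sourcing the script, a loop such as `for f in payloads/*.json; do echo "$(canon_format "$f") $(canon_sha256 "$f") $f"; done` lists each file with its detected format and digest (the `payloads/` directory is illustrative). The authoritative check remains `stella inspect --show-version`; this sketch only mirrors the prefix test and inherits its limits.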