Canonicalization Version Migration Guide

Version: 1.0
Status: Active
Last Updated: 2025-12-24


Overview

This guide describes the migration path for content-addressed identifiers from legacy (unversioned) canonicalization to versioned canonicalization (stella:canon:v1). Versioned canonicalization embeds an algorithm version marker in the canonical JSON before hashing, so a verifier can tell which algorithm produced a hash and later algorithm changes do not silently invalidate existing identifiers.

Why Versioning?

Problem

Legacy content-addressed IDs (EvidenceID, ReasoningID, VEXVerdictID, ProofBundleID) were computed as:

hash = SHA256(canonicalize(payload))

If the canonicalization algorithm ever changed (bug fix, specification update, optimization), existing hashes would become invalid with no way to detect which algorithm produced them.

Solution

Versioned content-addressed IDs embed a version marker:

hash = SHA256(canonicalize_with_version(payload, "stella:canon:v1"))

The canonical JSON now includes _canonVersion as the first field:

{
  "_canonVersion": "stella:canon:v1",
  "sbomEntryId": "sha256:91f2ab3c:pkg:npm/lodash@4.17.21",
  "vulnerabilityId": "CVE-2021-23337"
}
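As a rough illustration of the idea above, the following sketch adds the marker to a flat payload, sorts the keys, serializes without whitespace, and hashes the result. It is not the StellaOps.Canonical.Json implementation: the class and method names (VersionedHashSketch, CanonicalizeWithVersion) are hypothetical, and the sketch handles only string-valued fields rather than full RFC 8785 canonicalization.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

// Illustrative sketch; names are hypothetical and this is not the
// StellaOps.Canonical.Json implementation. Flat string fields only.
public static class VersionedHashSketch
{
    public const string VersionField = "_canonVersion";
    public const string V1 = "stella:canon:v1";

    public static byte[] CanonicalizeWithVersion(
        IReadOnlyDictionary<string, string> payload, string version)
    {
        // Ordinal comparison orders keys by UTF-16 code unit, so "_"
        // sorts ahead of keys that start with a lowercase letter.
        var members = new SortedDictionary<string, string>(StringComparer.Ordinal);
        foreach (var pair in payload)
        {
            members[pair.Key] = pair.Value;
        }
        members[VersionField] = version;

        var options = new JsonWriterOptions
        {
            Indented = false,
            Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
        };
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream, options))
        {
            writer.WriteStartObject();
            foreach (var pair in members)
            {
                writer.WriteString(pair.Key, pair.Value);
            }
            writer.WriteEndObject();
        }
        return stream.ToArray();
    }

    // Returns the SHA-256 of the versioned canonical bytes as "sha256:<hex>".
    public static string HashVersioned(
        IReadOnlyDictionary<string, string> payload, string version)
    {
        byte[] digest = SHA256.HashData(CanonicalizeWithVersion(payload, version));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Because the marker is part of the hashed bytes, the same payload hashed under a different version string yields a different identifier, which is what lets a verifier select the matching algorithm.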

Migration Phases

Phase 1: Dual-Mode (Current)

Timeline: Now
Behavior:

  • Generation: All new content-addressed IDs use versioned canonicalization (v1)
  • Verification: Accept both legacy and v1 hashes
  • Detection: CanonVersion.IsVersioned() distinguishes format

Impact:

  • Zero downtime migration
  • Existing attestations remain valid
  • New attestations get version markers

Phase 2: Deprecation Warning

Timeline: +6 months from Phase 1
Behavior:

  • Log warnings when verifying legacy hashes
  • Emit metrics for legacy hash encounters
  • Continue accepting legacy hashes

Operator Action:

  • Monitor canon_legacy_hash_verified_total metric
  • Plan re-attestation of critical assets

Phase 3: Legacy Rejection

Timeline: +12 months from Phase 1
Behavior:

  • Reject legacy hashes during verification
  • Only v1 (or newer) hashes accepted

Operator Action:

  • Re-attest any remaining legacy attestations before cutoff
  • Use the stella rehash CLI command for bulk re-attestation
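The three phases can be summarized as a small decision function. The sketch below is one possible way to model them, assuming the phase is derived from a configured Phase 1 start date; the type and member names (MigrationPhase, VerificationAction, LegacyHashPolicy) are hypothetical and do not come from the StellaOps codebase.

```csharp
using System;

// Hypothetical types for illustration; not part of the StellaOps API.
public enum MigrationPhase { DualMode = 1, DeprecationWarning = 2, LegacyRejection = 3 }

public enum VerificationAction { Accept, AcceptWithWarning, Reject }

public static class LegacyHashPolicy
{
    // Maps a point in time to a phase using the +6 and +12 month offsets.
    public static MigrationPhase PhaseAt(DateTimeOffset phase1Start, DateTimeOffset now)
    {
        if (now >= phase1Start.AddMonths(12)) return MigrationPhase.LegacyRejection;
        if (now >= phase1Start.AddMonths(6)) return MigrationPhase.DeprecationWarning;
        return MigrationPhase.DualMode;
    }

    // Versioned hashes are accepted in every phase; legacy hashes are
    // accepted, accepted with a warning, or rejected depending on the phase.
    public static VerificationAction Decide(MigrationPhase phase, bool isVersioned)
    {
        if (isVersioned) return VerificationAction.Accept;

        return phase switch
        {
            MigrationPhase.DualMode => VerificationAction.Accept,
            MigrationPhase.DeprecationWarning => VerificationAction.AcceptWithWarning,
            MigrationPhase.LegacyRejection => VerificationAction.Reject,
            _ => throw new ArgumentOutOfRangeException(nameof(phase), phase, "Unknown migration phase")
        };
    }
}
```

In Phase 2, a caller would log the warning and increment canon_legacy_hash_verified_total when Decide returns AcceptWithWarning.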

Detection and Verification

Detecting Versioned Hashes

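Versioned canonical JSON starts with {"_canonVersion": because keys are sorted lexicographically and the underscore (0x5F) sorts before every lowercase ASCII letter. Uppercase letters and digits sort before the underscore, so under plain lexicographic sorting this holds only while no payload key begins with a character below 0x5F.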

using StellaOps.Canonical.Json;

// Check if canonical JSON is versioned
byte[] canonicalJson = GetCanonicalPayload();
bool isVersioned = CanonVersion.IsVersioned(canonicalJson);

// Extract version if present
string? version = CanonVersion.ExtractVersion(canonicalJson);
if (version == CanonVersion.V1)
{
    // Use V1 verification algorithm
}
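To make the prefix rule concrete, here is a minimal sketch of how such a check could work on raw bytes, together with a helper that tests whether _canonVersion would sort first among a set of keys. This is an assumption about the approach, not the actual CanonVersion.IsVersioned implementation, and the names (CanonPrefixSketch, LooksVersioned, MarkerSortsFirst) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not CanonVersion.IsVersioned itself.
public static class CanonPrefixSketch
{
    private static readonly byte[] VersionedPrefix =
        Encoding.UTF8.GetBytes("{\"_canonVersion\":");

    // True when the canonical bytes begin with the version marker key.
    public static bool LooksVersioned(ReadOnlySpan<byte> canonicalJson)
    {
        ReadOnlySpan<byte> prefix = VersionedPrefix;
        return canonicalJson.StartsWith(prefix);
    }

    // True when "_canonVersion" precedes every given key in ordinal order,
    // which is the condition for it to be the first serialized field.
    public static bool MarkerSortsFirst(IEnumerable<string> keys)
    {
        foreach (string key in keys)
        {
            if (string.CompareOrdinal("_canonVersion", key) >= 0)
            {
                return false;
            }
        }
        return true;
    }
}
```

With keys such as sbomEntryId and vulnerabilityId the marker sorts first; a key such as Zeta (uppercase first letter) would sort ahead of it.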

Verifying Both Formats

During Phases 1 and 2, verifiers should accept both formats:

public bool VerifyContentAddressedId(byte[] payload, string expectedHash)
{
    // Try versioned first (new format)
    if (CanonVersion.IsVersioned(payload))
    {
        var hash = CanonJson.HashVersioned(payload, CanonVersion.Current);
        return hash == expectedHash;
    }

    // Fall back to legacy (unversioned)
    var legacyHash = CanonJson.Hash(payload);
    return legacyHash == expectedHash;
}

Re-Attestation Procedure

When to Re-Attest

Re-attestation is required when:

  • Preparing for Phase 3, when legacy hashes are rejected
  • Migrating attestations between systems
  • Archiving for long-term storage

CLI Re-Attestation

# Re-attest a single attestation bundle
stella rehash --input attestation.json --output attestation-v1.json

# Bulk re-attest all attestations in a directory
stella rehash --input-dir /var/stella/attestations \
              --output-dir /var/stella/attestations-v1 \
              --version stella:canon:v1

# Dry-run to preview changes
stella rehash --input attestation.json --dry-run

Database Migration

For PostgreSQL-stored attestations:

-- Find legacy attestations (those without a version marker).
-- The backslash escapes "_", which LIKE otherwise treats as a single-character wildcard.
SELECT id, content_hash, created_at
FROM attestations
WHERE content NOT LIKE '{"\_canonVersion":%'
ORDER BY created_at;

-- Export for re-processing (COPY ... TO writes the file on the database server)
COPY (
  SELECT id, content
  FROM attestations
  WHERE content NOT LIKE '{"\_canonVersion":%'
) TO '/tmp/legacy_attestations.csv' WITH CSV HEADER;

Troubleshooting

Hash Mismatch Errors

Symptom: Verification fails with "hash mismatch" error.

Diagnosis:

  1. Check if the stored hash was computed with legacy or versioned canonicalization
  2. Check the current verification mode (Phase 1/2/3)

Resolution:

# Inspect the attestation format
stella inspect attestation.json --show-version

# Output:
# Canonicalization Version: stella:canon:v1
# Hash Algorithm: SHA-256
# Computed Hash: sha256:7b8c9d0e...

Legacy Hash in Phase 3

Symptom: Verification rejected with "legacy hash not accepted" error.

Resolution:

  1. Re-attest the content with versioned canonicalization
  2. Update any references to the old hash

stella rehash --input old.json --output new.json
stella verify new.json  # Should pass

Performance Considerations

Versioned canonicalization adds 34 bytes to each non-empty canonical payload (the leading member "_canonVersion":"stella:canon:v1", including its trailing comma). This has negligible impact on:

  • Hash computation time (<1μs overhead)
  • Storage size (<0.1% increase for typical payloads)
  • Network transfer (compression eliminates overhead)
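The size overhead can be checked by counting the marker's bytes directly. The sketch below assumes the marker is serialized as a single leading member followed by a comma, with no whitespace; the class and method names (CanonOverheadSketch, MarkerBytes, RelativeOverhead) are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class CanonOverheadSketch
{
    // Fragment inserted after the opening brace of a non-empty object.
    public const string MarkerFragment = "\"_canonVersion\":\"stella:canon:v1\",";

    // Number of UTF-8 bytes the marker adds to a canonical payload.
    public static int MarkerBytes() => Encoding.UTF8.GetByteCount(MarkerFragment);

    // Marker size as a fraction of the original (unversioned) payload size.
    public static double RelativeOverhead(int originalPayloadBytes)
    {
        if (originalPayloadBytes <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(originalPayloadBytes));
        }
        return (double)MarkerBytes() / originalPayloadBytes;
    }
}
```

Calling RelativeOverhead with the size of a representative attestation gives the storage increase for that payload, which is a quick way to confirm the estimate for a specific deployment.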

Version History

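| Version | Identifier      | Status     | Notes                             |
|---------|-----------------|------------|-----------------------------------|
| V1      | stella:canon:v1 | Current    | RFC 8785 JSON canonicalization    |
| Legacy  | (none)          | Deprecated | Pre-versioning; no version marker |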