# Canonicalization Version Migration Guide
**Version**: 1.0
**Status**: Active
**Last Updated**: 2025-12-24

---
## Overview
This guide describes the migration path for content-addressed identifiers from legacy (unversioned) canonicalization to versioned canonicalization (`stella:canon:v1`). Versioned canonicalization embeds algorithm version markers in the canonical JSON before hashing, ensuring forward compatibility and verifier clarity.
## Why Versioning?
### Problem
Legacy content-addressed IDs (EvidenceID, ReasoningID, VEXVerdictID, ProofBundleID) were computed as:
```
hash = SHA256(canonicalize(payload))
```
If the canonicalization algorithm ever changed (bug fix, specification update, optimization), existing hashes would become invalid with no way to detect which algorithm produced them.
### Solution
Versioned content-addressed IDs embed a version marker:
```
hash = SHA256(canonicalize_with_version(payload, "stella:canon:v1"))
```
The canonical JSON now includes `_canonVersion` as the first field:
```json
{
  "_canonVersion": "stella:canon:v1",
  "sbomEntryId": "sha256:91f2ab3c:pkg:npm/lodash@4.17.21",
  "vulnerabilityId": "CVE-2021-23337"
}
```
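A minimal sketch of the versioned scheme, assuming the marker is merged into the top-level object and sorted with the other keys. `CanonicalizeWithVersion` and `WriteSorted` are hypothetical names, not the StellaOps API, and the `sha256:` prefix on the resulting ID is likewise an assumption. The sketch sorts keys in ordinal order but does not implement the number and string serialization rules a full RFC 8785 canonicalizer requires.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

const string MarkerKey = "_canonVersion";

// Writes a JSON value with object keys in ordinal (code-unit) order and no whitespace.
void WriteSorted(JsonElement value, Utf8JsonWriter writer)
{
    switch (value.ValueKind)
    {
        case JsonValueKind.Object:
            writer.WriteStartObject();
            foreach (var prop in value.EnumerateObject().OrderBy(x => x.Name, StringComparer.Ordinal))
            {
                writer.WritePropertyName(prop.Name);
                WriteSorted(prop.Value, writer);
            }
            writer.WriteEndObject();
            break;
        case JsonValueKind.Array:
            writer.WriteStartArray();
            foreach (var item in value.EnumerateArray()) WriteSorted(item, writer);
            writer.WriteEndArray();
            break;
        default:
            value.WriteTo(writer); // scalars are copied through without normalization
            break;
    }
}

// Hypothetical stand-in for canonicalize_with_version: adds the marker to the
// top-level object (which must be a JSON object), then serializes with sorted keys.
byte[] CanonicalizeWithVersion(string json, string version)
{
    using var doc = JsonDocument.Parse(json);
    using var stream = new MemoryStream();
    var options = new JsonWriterOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };
    using (var writer = new Utf8JsonWriter(stream, options))
    {
        var names = doc.RootElement.EnumerateObject()
            .Select(prop => prop.Name)
            .Where(name => name != MarkerKey)   // an existing marker is overwritten
            .Append(MarkerKey)
            .OrderBy(name => name, StringComparer.Ordinal);
        writer.WriteStartObject();
        foreach (var name in names)
        {
            writer.WritePropertyName(name);
            if (name == MarkerKey) writer.WriteStringValue(version);
            else WriteSorted(doc.RootElement.GetProperty(name), writer);
        }
        writer.WriteEndObject();
    }
    return stream.ToArray();
}

// Usage: input key order does not affect the canonical bytes or the resulting ID.
byte[] canonical = CanonicalizeWithVersion(
    "{\"vulnerabilityId\":\"CVE-2021-23337\",\"sbomEntryId\":\"sha256:91f2ab3c:pkg:npm/lodash@4.17.21\"}",
    "stella:canon:v1");
string id = "sha256:" + Convert.ToHexString(SHA256.HashData(canonical)).ToLowerInvariant();
Console.WriteLine(Encoding.UTF8.GetString(canonical));
Console.WriteLine(id);
```

Because the marker is part of the hashed bytes, two payloads that differ only in canonicalization version produce different IDs, which is what lets a verifier tell the algorithms apart.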
## Migration Phases
### Phase 1: Dual-Mode (Current)
**Timeline**: Now

**Behavior**:
- **Generation**: All new content-addressed IDs use versioned canonicalization (v1)
- **Verification**: Accept both legacy and v1 hashes
- **Detection**: `CanonVersion.IsVersioned()` distinguishes the two formats

**Impact**:
- Zero-downtime migration
- Existing attestations remain valid
- New attestations get version markers
### Phase 2: Deprecation Warning
**Timeline**: +6 months from Phase 1

**Behavior**:
- Log warnings when verifying legacy hashes
- Emit metrics for legacy hash encounters
- Continue accepting legacy hashes

**Operator Action**:
- Monitor `canon_legacy_hash_verified_total` metric
- Plan re-attestation of critical assets
### Phase 3: Legacy Rejection
**Timeline**: +12 months from Phase 1

**Behavior**:
- Reject legacy hashes during verification
- Only v1 (or newer) hashes accepted

**Operator Action**:
- Re-attest any remaining legacy attestations before cutoff
- Use `stella rehash` CLI command for bulk re-attestation
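
The three phases differ only in how a verifier treats a legacy hash, so the policy can be sketched as a small lookup. `LegacyPolicy`, `VerifyUnderPhase`, and the in-memory counter are hypothetical names for illustration; how StellaOps actually selects the phase and emits `canon_legacy_hash_verified_total` is not specified in this guide.

```csharp
using System;

// Hypothetical policy table: phase number -> (accept legacy hashes, warn about them).
(bool Accept, bool Warn) LegacyPolicy(int phase) => phase switch
{
    1 => (true, false),   // Dual-mode: accept silently
    2 => (true, true),    // Deprecation warning: accept, but log and count
    3 => (false, false),  // Legacy rejection: legacy hashes fail verification
    _ => throw new ArgumentOutOfRangeException(nameof(phase)),
};

// Stands in for the canon_legacy_hash_verified_total metric.
long legacyVerifiedTotal = 0;

bool VerifyUnderPhase(bool isVersioned, bool hashMatches, int phase)
{
    // Versioned hashes are handled the same way in every phase.
    if (isVersioned) return hashMatches;

    var (accept, warn) = LegacyPolicy(phase);
    if (!accept) return false;

    if (warn && hashMatches)
    {
        legacyVerifiedTotal++;
        Console.Error.WriteLine("warning: verified a legacy (unversioned) hash");
    }
    return hashMatches;
}

// Usage: the same legacy hash passes in Phases 1 and 2 and is rejected in Phase 3.
Console.WriteLine(VerifyUnderPhase(isVersioned: false, hashMatches: true, phase: 1));
Console.WriteLine(VerifyUnderPhase(isVersioned: false, hashMatches: true, phase: 2));
Console.WriteLine(VerifyUnderPhase(isVersioned: false, hashMatches: true, phase: 3));
```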
---
## Detection and Verification
### Detecting Versioned Hashes
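Versioned canonical JSON starts with `{"_canonVersion":` due to lexicographic sorting: underscore (0x5F) sorts before all lowercase ASCII letters. A top-level key beginning with an uppercase letter or digit would sort ahead of the marker.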
```csharp
using StellaOps.Canonical.Json;

// Check if canonical JSON is versioned
byte[] canonicalJson = GetCanonicalPayload();
bool isVersioned = CanonVersion.IsVersioned(canonicalJson);

// Extract version if present
string? version = CanonVersion.ExtractVersion(canonicalJson);
if (version == CanonVersion.V1)
{
    // Use V1 verification algorithm
}
```
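For readers without access to the library, the detection rule can be approximated as a byte-prefix check. This is a sketch under the assumption that the marker is the first member of the canonical JSON; `LooksVersioned` and `ReadVersion` are hypothetical stand-ins and may not match how `CanonVersion.IsVersioned` and `CanonVersion.ExtractVersion` are implemented.

```csharp
using System;
using System.Text;
using System.Text.Json;

byte[] versionPrefix = Encoding.UTF8.GetBytes("{\"_canonVersion\":");

// True when the canonical bytes begin with the version marker key.
bool LooksVersioned(ReadOnlySpan<byte> canonicalJson) =>
    canonicalJson.StartsWith(versionPrefix.AsSpan());

// Returns the marker value, or null for legacy (unversioned) payloads.
// Truncated or malformed input makes Utf8JsonReader throw a JsonException.
string? ReadVersion(ReadOnlySpan<byte> canonicalJson)
{
    if (!LooksVersioned(canonicalJson)) return null;

    var reader = new Utf8JsonReader(canonicalJson);
    reader.Read(); // start of object
    reader.Read(); // property name "_canonVersion"
    reader.Read(); // property value
    return reader.TokenType == JsonTokenType.String ? reader.GetString() : null;
}

// Usage
byte[] versioned = Encoding.UTF8.GetBytes("{\"_canonVersion\":\"stella:canon:v1\",\"a\":1}");
byte[] legacy = Encoding.UTF8.GetBytes("{\"a\":1}");
Console.WriteLine(ReadVersion(versioned) ?? "(legacy)");
Console.WriteLine(ReadVersion(legacy) ?? "(legacy)");
```

A prefix check avoids parsing the whole payload, which keeps detection cheap on large attestations.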
### Verifying Both Formats
During Phase 1, verifiers should accept both formats:
```csharp
using StellaOps.Canonical.Json;

public bool VerifyContentAddressedId(byte[] payload, string expectedHash)
{
    // Try versioned first (new format)
    if (CanonVersion.IsVersioned(payload))
    {
        var hash = CanonJson.HashVersioned(payload, CanonVersion.Current);
        return hash == expectedHash;
    }

    // Fall back to legacy (unversioned)
    var legacyHash = CanonJson.Hash(payload);
    return legacyHash == expectedHash;
}
```
---
## Re-Attestation Procedure
### When to Re-Attest
Re-attestation is required when:
- Preparing for Phase 3, when legacy hashes are rejected
- Migrating attestations between systems
- Archiving for long-term storage
### CLI Re-Attestation
```bash
# Re-attest a single attestation bundle
stella rehash --input attestation.json --output attestation-v1.json

# Bulk re-attest all attestations in a directory
stella rehash --input-dir /var/stella/attestations \
              --output-dir /var/stella/attestations-v1 \
              --version stella:canon:v1

# Dry-run to preview changes
stella rehash --input attestation.json --dry-run
```
### Database Migration
For PostgreSQL-stored attestations:
```sql
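-- Find legacy attestations (those without a version marker).
-- The backslash escapes "_", which LIKE otherwise treats as a single-character wildcard.
SELECT id, content_hash, created_at
FROM attestations
WHERE content NOT LIKE '{"\_canonVersion":%'
ORDER BY created_at;

-- Export for re-processing.
-- COPY ... TO writes on the database server and requires file-write privileges there.
COPY (
    SELECT id, content
    FROM attestations
    WHERE content NOT LIKE '{"\_canonVersion":%'
) TO '/tmp/legacy_attestations.csv' WITH CSV HEADER;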
```
---
## Troubleshooting
### Hash Mismatch Errors
**Symptom**: Verification fails with "hash mismatch" error.

**Diagnosis**:
1. Check if the stored hash was computed with legacy or versioned canonicalization
2. Check the current verification mode (Phase 1/2/3)

**Resolution**:
```bash
# Inspect the attestation format
stella inspect attestation.json --show-version
# Output:
# Canonicalization Version: stella:canon:v1
# Hash Algorithm: SHA-256
# Computed Hash: sha256:7b8c9d0e...
```
### Legacy Hash in Phase 3
**Symptom**: Verification rejected with "legacy hash not accepted" error.

**Resolution**:
1. Re-attest the content with versioned canonicalization
2. Update any references to the old hash

```bash
stella rehash --input old.json --output new.json
stella verify new.json # Should pass
```
### Performance Considerations
Versioned canonicalization adds 34 bytes to each canonical payload (the `"_canonVersion":"stella:canon:v1",` member inserted after the opening brace). This has negligible impact on:
- Hash computation time (<1μs overhead)
- Storage size (34 bytes per payload; under 0.1% once a payload exceeds about 34 KB)
- Network transfer (compression absorbs most of the repeated marker)
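
The byte overhead can be checked directly. The payload sizes below are illustrative assumptions, not measurements of real attestations.

```csharp
using System;
using System.Text;

// The member added after the opening brace of a non-empty object, including its trailing comma.
const string marker = "\"_canonVersion\":\"stella:canon:v1\",";
int overhead = Encoding.UTF8.GetByteCount(marker); // 34

// Relative storage increase for a few hypothetical payload sizes.
foreach (int payloadBytes in new[] { 512, 4096, 65536 })
{
    double percent = 100.0 * overhead / payloadBytes;
    Console.WriteLine($"{payloadBytes} B payload: +{overhead} B ({percent:F2}%)");
}
```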
---
## Version History
| Version | Identifier | Status | Notes |
|---------|------------|--------|-------|
| V1 | `stella:canon:v1` | **Current** | RFC 8785 JSON Canonicalization Scheme (JCS) |
| Legacy | (none) | Deprecated | Pre-versioning; no version marker |
## Related Documentation
- [Proof Chain Specification](../modules/attestor/proof-chain-specification.md)
- [Content-Addressed Identifier System](../modules/attestor/proof-chain-specification.md#content-addressed-identifier-system)
- [CanonJson API Reference](../api/canon-json.md)