sprints work

StellaOps Bot
2025-12-24 21:46:08 +02:00
parent 43e2af88f6
commit b9f71fc7e9
161 changed files with 29566 additions and 527 deletions

@@ -0,0 +1,363 @@
# Sprint 8100.0012.0001 · Canonicalizer Versioning for Content-Addressed Identifiers
## Topic & Scope
Embed a canonicalization version marker in every payload that feeds a content-addressed hash, so that hashes produced under different canonicalization rules stay distinguishable as the logic evolves. This sprint delivers:
1. **Canonicalizer Version Constant**: Define `CanonVersion.V1 = "stella:canon:v1"` as a stable version identifier.
2. **Version-Prefixed Hashing**: Update `ContentAddressedIdGenerator` to include version marker in canonicalized payloads before hashing.
3. **Backward Compatibility**: Existing hashes remain valid; new hashes include version marker; verification can detect and handle both formats.
4. **Documentation**: Update architecture docs with canonicalization versioning rationale and upgrade path.
**Working directory:** `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/`, `src/__Libraries/StellaOps.Canonical.Json/`, `src/__Libraries/__Tests/`.
**Evidence:** All content-addressed IDs include version marker; determinism tests pass; backward compatibility verified; no hash collisions between v0 (legacy) and v1 (versioned).
---
## Dependencies & Concurrency
- **Depends on:** None (foundational change).
- **Blocks:** Sprint 8100.0012.0002 (Unified Evidence Model), Sprint 8100.0012.0003 (Graph Root Attestation) — both depend on stable versioned hashing.
- **Safe to run in parallel with:** Unrelated module work.
---
## Documentation Prerequisites
- `docs/modules/attestor/README.md` (Attestor architecture)
- `docs/modules/attestor/proof-chain.md` (Proof chain design)
- Product Advisory: Merkle-Hash REG (this sprint's origin)
---
## Problem Statement
### Current State
The `ContentAddressedIdGenerator` computes hashes by:
1. Serializing predicates to JSON with `JsonSerializer`
2. Canonicalizing via `IJsonCanonicalizer` (RFC 8785)
3. Computing SHA-256 of canonical bytes
**Problem:** If the canonicalization algorithm ever changes (bug fix, spec update, optimization), existing hashes become invalid with no way to distinguish which version produced them.
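To make the problem concrete, the sketch below hashes two byte-level renderings of the same logical object. The two renderings differ only in key order and stand in for the output of two hypothetical canonicalizer revisions; neither is taken from the actual `IJsonCanonicalizer` implementation.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Two renderings of the same logical object. They stand in for the output of
// two hypothetical canonicalizer revisions (an assumption for illustration).
byte[] revisionA = Encoding.UTF8.GetBytes("{\"a\":1,\"b\":2}");
byte[] revisionB = Encoding.UTF8.GetBytes("{\"b\":2,\"a\":1}");

string hashA = Convert.ToHexString(SHA256.HashData(revisionA));
string hashB = Convert.ToHexString(SHA256.HashData(revisionB));

// The digests differ, and nothing in either digest records which revision
// produced it.
Console.WriteLine(hashA == hashB); // prints: False
```

A verifier holding only `hashA` cannot tell whether a mismatch means tampering or simply a different canonicalizer revision, which is the gap the version marker is meant to close.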
### Target State
Include a version marker in the canonical representation:
```json
{
  "_canonVersion": "stella:canon:v1",
  "evidenceSource": "...",
  "sbomEntryId": "...",
  ...
}
```
The version marker:
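- Is written as the first property. Under ordinal comparison the underscore (U+005F) sorts ahead of every lowercase letter, so it also precedes all camelCase keys; a key starting with a digit or an uppercase letter would sort before it.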
- Identifies the exact canonicalization algorithm used
- Enables verifiers to select the correct algorithm
- Allows graceful migration to future versions
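Where the marker lands under ordinal sorting depends on the first character of the other keys. The following is a minimal sketch, assuming the same `StringComparer.Ordinal` comparison that the design's `OrderBy` call uses; the keys `Zone` and `9lives` are hypothetical and exist only to show the edge cases.

```csharp
using System;
using System.Linq;

// Hypothetical property names, chosen only to show where "_canonVersion" lands.
string[] keys = { "sbomEntryId", "_canonVersion", "evidenceSource", "Zone", "9lives" };

// StringComparer.Ordinal compares UTF-16 code units:
// '9' (0x39) < 'Z' (0x5A) < '_' (0x5F) < 'e' (0x65) < 's' (0x73).
string[] sorted = keys.OrderBy(k => k, StringComparer.Ordinal).ToArray();

Console.WriteLine(string.Join(", ", sorted));
// prints: 9lives, Zone, _canonVersion, evidenceSource, sbomEntryId
```

With camelCase property names only, the marker is first by sort order as well as by construction. If a schema ever introduces keys starting with a digit or an uppercase letter, the marker stays first only because the writer emits it explicitly before the sorted properties.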
---
## Design Specification
### CanonVersion Constants
```csharp
// src/__Libraries/StellaOps.Canonical.Json/CanonVersion.cs
namespace StellaOps.Canonical.Json;

/// <summary>
/// Canonicalization version identifiers for content-addressed hashing.
/// </summary>
public static class CanonVersion
{
    /// <summary>
    /// Version 1: RFC 8785 JSON canonicalization with:
    /// - Ordinal key sorting
    /// - No whitespace
    /// - UTF-8 encoding without BOM
    /// - ECMAScript-compatible formatting of IEEE 754 double-precision numbers
    /// </summary>
    public const string V1 = "stella:canon:v1";

    /// <summary>
    /// Field name for version marker in canonical JSON.
    /// The underscore (U+005F) sorts before lowercase letters under ordinal
    /// comparison, so the field precedes camelCase keys. Keys starting with a
    /// digit or an uppercase letter would sort ahead of it.
    /// </summary>
    public const string VersionFieldName = "_canonVersion";

    /// <summary>
    /// Current default version for new hashes.
    /// </summary>
    public const string Current = V1;
}
```
### Updated CanonJson API
```csharp
// src/__Libraries/StellaOps.Canonical.Json/CanonJson.cs (additions)

/// <summary>
/// Canonicalizes an object with version marker for content-addressed hashing.
/// </summary>
/// <typeparam name="T">The type to serialize.</typeparam>
/// <param name="obj">The object to canonicalize.</param>
/// <param name="version">Canonicalization version (default: Current).</param>
/// <returns>UTF-8 encoded canonical JSON bytes with version marker.</returns>
public static byte[] CanonicalizeVersioned<T>(T obj, string version = CanonVersion.Current)
{
    var json = JsonSerializer.SerializeToUtf8Bytes(obj, DefaultOptions);
    using var doc = JsonDocument.Parse(json);
    using var ms = new MemoryStream();
    using var writer = new Utf8JsonWriter(ms, new JsonWriterOptions { Indented = false });

    writer.WriteStartObject();
    writer.WriteString(CanonVersion.VersionFieldName, version);

    // Write sorted properties from original object
    foreach (var prop in doc.RootElement.EnumerateObject()
        .OrderBy(p => p.Name, StringComparer.Ordinal))
    {
        writer.WritePropertyName(prop.Name);
        WriteElementSorted(prop.Value, writer);
    }

    writer.WriteEndObject();
    writer.Flush();
    return ms.ToArray();
}

/// <summary>
/// Computes SHA-256 hash with version marker.
/// </summary>
public static string HashVersioned<T>(T obj, string version = CanonVersion.Current)
{
    var canonical = CanonicalizeVersioned(obj, version);
    return Sha256Hex(canonical);
}

/// <summary>
/// Computes prefixed SHA-256 hash with version marker.
/// </summary>
public static string HashVersionedPrefixed<T>(T obj, string version = CanonVersion.Current)
{
    var canonical = CanonicalizeVersioned(obj, version);
    return Sha256Prefixed(canonical);
}
```
### Updated ContentAddressedIdGenerator
```csharp
// src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/Identifiers/ContentAddressedIdGenerator.cs
public EvidenceId ComputeEvidenceId(EvidencePredicate predicate)
{
    ArgumentNullException.ThrowIfNull(predicate);

    // Clear self-referential field, add version marker
    var toHash = predicate with { EvidenceId = null };
    var canonical = CanonicalizeVersioned(toHash, CanonVersion.Current);
    return new EvidenceId(HashSha256Hex(canonical));
}

// Similar updates for ComputeReasoningId, ComputeVexVerdictId, etc.

private byte[] CanonicalizeVersioned<T>(T value, string version)
{
    var json = JsonSerializer.SerializeToUtf8Bytes(value, SerializerOptions);
    return _canonicalizer.CanonicalizeWithVersion(json, version);
}
```
### IJsonCanonicalizer Extension
```csharp
// src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/Json/IJsonCanonicalizer.cs
public interface IJsonCanonicalizer
{
    /// <summary>
    /// Canonicalizes JSON bytes per RFC 8785.
    /// </summary>
    byte[] Canonicalize(ReadOnlySpan<byte> json);

    /// <summary>
    /// Canonicalizes JSON bytes and injects the version marker as the first
    /// property of the root object.
    /// </summary>
    byte[] CanonicalizeWithVersion(ReadOnlySpan<byte> json, string version);
}
```
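One way `CanonicalizeWithVersion` could behave is sketched below. This is an illustration under stated assumptions, not the `Rfc8785JsonCanonicalizer` implementation: the class name `VersionedCanonicalizerSketch` is hypothetical, the two guard clauses (non-object root, pre-existing `_canonVersion` key) are assumptions about desirable behavior, and the sketch covers only key ordering and marker injection, not RFC 8785 number formatting or string escaping.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

byte[] input = Encoding.UTF8.GetBytes("{\"sbomEntryId\":\"e1\",\"evidenceSource\":\"scanner\"}");
byte[] output = VersionedCanonicalizerSketch.CanonicalizeWithVersion(input, "stella:canon:v1");
Console.WriteLine(Encoding.UTF8.GetString(output));
// prints: {"_canonVersion":"stella:canon:v1","evidenceSource":"scanner","sbomEntryId":"e1"}

// Hypothetical helper; not part of the sprint's API surface.
public static class VersionedCanonicalizerSketch
{
    private const string VersionField = "_canonVersion";

    public static byte[] CanonicalizeWithVersion(ReadOnlySpan<byte> json, string version)
    {
        // JsonDocument.Parse needs ReadOnlyMemory<byte>, so the span is copied once.
        using var doc = JsonDocument.Parse(json.ToArray());
        var root = doc.RootElement;

        // Assumed guards: the marker can only be injected into an object, and a
        // payload that already carries the reserved key would yield duplicates.
        if (root.ValueKind != JsonValueKind.Object)
            throw new ArgumentException("Root must be a JSON object.", nameof(json));
        if (root.TryGetProperty(VersionField, out _))
            throw new ArgumentException("Payload already contains the reserved version field.", nameof(json));

        using var ms = new MemoryStream();
        using (var writer = new Utf8JsonWriter(ms))
        {
            writer.WriteStartObject();
            writer.WriteString(VersionField, version); // marker always goes first
            foreach (var prop in root.EnumerateObject().OrderBy(p => p.Name, StringComparer.Ordinal))
            {
                writer.WritePropertyName(prop.Name);
                WriteSorted(prop.Value, writer);
            }
            writer.WriteEndObject();
        }
        return ms.ToArray();
    }

    // Recursively writes an element, sorting object keys by ordinal comparison.
    private static void WriteSorted(JsonElement element, Utf8JsonWriter writer)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                writer.WriteStartObject();
                foreach (var child in element.EnumerateObject().OrderBy(c => c.Name, StringComparer.Ordinal))
                {
                    writer.WritePropertyName(child.Name);
                    WriteSorted(child.Value, writer);
                }
                writer.WriteEndObject();
                break;
            case JsonValueKind.Array:
                writer.WriteStartArray();
                foreach (var item in element.EnumerateArray())
                    WriteSorted(item, writer);
                writer.WriteEndArray();
                break;
            default:
                element.WriteTo(writer); // scalars are copied as parsed
                break;
        }
    }
}
```

Rejecting a payload that already contains `_canonVersion` is one possible answer to the "version field conflicts with user data" risk; silently overwriting the field would be another, with different trade-offs.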
---
## Backward Compatibility Strategy
### Phase 1: Dual-Mode (This Sprint)
- **Generation:** Always emit versioned hashes (v1)
- **Verification:** Accept both legacy (unversioned) and v1 hashes
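- **Detection:** Check whether the canonical JSON starts with `{"_canonVersion":` to determine the format. The check runs on the canonical bytes; a SHA-256 digest on its own carries no format information.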
```csharp
public static bool IsVersionedHash(ReadOnlySpan<byte> canonicalJson)
{
    // The versioned writer emits the version field first, so a versioned
    // payload always begins with this exact byte sequence.
    return canonicalJson.Length > 20 &&
           canonicalJson.StartsWith("{\"_canonVersion\":"u8);
}
```
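When a verifier holds only a stored identifier, dual-mode acceptance amounts to recomputing the digest under both formats and seeing which one matches. The sketch below illustrates that idea; `DualModeVerifier`, `HashFormat`, and `Match` are hypothetical names, and the caller is assumed to supply both canonical renderings of the same payload.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Canonical renderings of one payload: legacy (no marker) and v1 (with marker).
byte[] legacy = Encoding.UTF8.GetBytes("{\"evidenceSource\":\"scanner\"}");
byte[] versioned = Encoding.UTF8.GetBytes(
    "{\"_canonVersion\":\"stella:canon:v1\",\"evidenceSource\":\"scanner\"}");

// Pretend this identifier was stored before versioning existed.
string storedId = DualModeVerifier.Sha256Hex(legacy);

Console.WriteLine(DualModeVerifier.Match(storedId, legacy, versioned)); // prints: Legacy

// Hypothetical helper; not part of the sprint's API surface.
public static class DualModeVerifier
{
    public enum HashFormat { None, Legacy, V1 }

    // Tries the versioned format first, then falls back to the legacy format.
    public static HashFormat Match(
        string expectedHex,
        ReadOnlySpan<byte> legacyCanonical,
        ReadOnlySpan<byte> versionedCanonical)
    {
        if (HexEquals(expectedHex, Sha256Hex(versionedCanonical))) return HashFormat.V1;
        if (HexEquals(expectedHex, Sha256Hex(legacyCanonical))) return HashFormat.Legacy;
        return HashFormat.None;
    }

    public static string Sha256Hex(ReadOnlySpan<byte> data) =>
        Convert.ToHexString(SHA256.HashData(data)).ToLowerInvariant();

    private static bool HexEquals(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
}
```

Returning the matched format rather than a plain boolean would let Phase 2 log a migration warning whenever the result is `Legacy`, and let Phase 3 treat `Legacy` as a failure without changing the call sites.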
### Phase 2: Migration (Future Sprint)
- Emit migration warnings for legacy hashes in logs
- Provide tooling to rehash attestations with version marker
- Document upgrade path in `docs/operations/canon-version-migration.md`
### Phase 3: Deprecation (Future Sprint)
- Remove legacy hash acceptance
- Fail verification for unversioned hashes
---
## Delivery Tracker
| # | Task ID | Status | Key dependency | Owners | Task Definition |
|---|---------|--------|----------------|--------|-----------------|
| **Wave 0 (Constants & Types)** | | | | | |
| 1 | CANON-8100-001 | DONE | None | Platform Guild | Create `CanonVersion.cs` with V1 constant and field name. |
| 2 | CANON-8100-002 | DONE | Task 1 | Platform Guild | Add `CanonicalizeVersioned<T>()` to `CanonJson.cs`. |
| 3 | CANON-8100-003 | DONE | Task 1 | Platform Guild | Add `HashVersioned<T>()` and `HashVersionedPrefixed<T>()` to `CanonJson.cs`. |
| **Wave 1 (Canonicalizer Updates)** | | | | | |
| 4 | CANON-8100-004 | DONE | Task 2 | Attestor Guild | Extend `IJsonCanonicalizer` with `CanonicalizeWithVersion()` method. |
| 5 | CANON-8100-005 | DONE | Task 4 | Attestor Guild | Implement `CanonicalizeWithVersion()` in `Rfc8785JsonCanonicalizer`. |
| 6 | CANON-8100-006 | DONE | Task 5 | Attestor Guild | Add `IsVersionedHash()` detection utility. |
| **Wave 2 (Generator Updates)** | | | | | |
| 7 | CANON-8100-007 | DONE | Tasks 4-6 | Attestor Guild | Update `ComputeEvidenceId()` to use versioned canonicalization. |
| 8 | CANON-8100-008 | DONE | Task 7 | Attestor Guild | Update `ComputeReasoningId()` to use versioned canonicalization. |
| 9 | CANON-8100-009 | DONE | Task 7 | Attestor Guild | Update `ComputeVexVerdictId()` to use versioned canonicalization. |
| 10 | CANON-8100-010 | DONE | Task 7 | Attestor Guild | Update `ComputeProofBundleId()` to use versioned canonicalization. |
| 11 | CANON-8100-011 | DONE | Task 7 | Attestor Guild | Update `ComputeGraphRevisionId()` to use versioned canonicalization. |
| **Wave 3 (Tests)** | | | | | |
| 12 | CANON-8100-012 | DONE | Tasks 7-11 | QA Guild | Add unit tests: versioned hash differs from legacy hash for same input. |
| 13 | CANON-8100-013 | DONE | Task 12 | QA Guild | Add determinism tests: same input + same version = same hash. |
| 14 | CANON-8100-014 | DONE | Task 12 | QA Guild | Add backward compatibility tests: verify both legacy and v1 hashes accepted. |
| 15 | CANON-8100-015 | DONE | Task 12 | QA Guild | Add golden file tests: snapshot of v1 canonical output for known inputs. |
| **Wave 4 (Documentation)** | | | | | |
| 16 | CANON-8100-016 | DONE | Tasks 7-11 | Docs Guild | Update `docs/modules/attestor/proof-chain.md` with versioning rationale. |
| 17 | CANON-8100-017 | DONE | Task 16 | Docs Guild | Create `docs/operations/canon-version-migration.md` with upgrade path. |
| 18 | CANON-8100-018 | DONE | Task 16 | Docs Guild | Update API reference with new `CanonJson` methods. |
---
## Wave Coordination
| Wave | Tasks | Focus | Evidence |
|------|-------|-------|----------|
| **Wave 0** | 1-3 | Constants and CanonJson API | `CanonVersion.cs` exists; `CanonJson` has versioned methods |
| **Wave 1** | 4-6 | Canonicalizer implementation | `IJsonCanonicalizer.CanonicalizeWithVersion()` works; detection utility works |
| **Wave 2** | 7-11 | Generator updates | All `Compute*Id()` methods use versioned hashing |
| **Wave 3** | 12-15 | Tests | All tests pass; golden files stable |
| **Wave 4** | 16-18 | Documentation | Docs updated; migration guide complete |
---
## Test Cases
### TC-001: Versioned Hash Differs from Legacy
```csharp
[Fact]
public void VersionedHash_DiffersFromLegacy_ForSameInput()
{
    var predicate = new EvidencePredicate { /* ... */ };
    var legacyHash = CanonJson.Hash(predicate);
    var versionedHash = CanonJson.HashVersioned(predicate, CanonVersion.V1);
    Assert.NotEqual(legacyHash, versionedHash);
}
```
### TC-002: Determinism (Same Input, Same Version)
```csharp
[Fact]
public void VersionedHash_IsDeterministic()
{
    var predicate = new EvidencePredicate { /* ... */ };
    var hash1 = CanonJson.HashVersioned(predicate, CanonVersion.V1);
    var hash2 = CanonJson.HashVersioned(predicate, CanonVersion.V1);
    Assert.Equal(hash1, hash2);
}
```
### TC-003: Version Field Sorts First
```csharp
[Fact]
public void VersionedCanonical_HasVersionFieldFirst()
{
    var predicate = new EvidencePredicate { Source = "test" };
    var canonical = CanonJson.CanonicalizeVersioned(predicate, CanonVersion.V1);
    var json = Encoding.UTF8.GetString(canonical);
    Assert.StartsWith("{\"_canonVersion\":\"stella:canon:v1\"", json);
}
```
### TC-004: Golden File Stability
```csharp
[Fact]
public async Task VersionedCanonical_MatchesGoldenFile()
{
    var predicate = CreateKnownPredicate();
    var canonical = CanonJson.CanonicalizeVersioned(predicate, CanonVersion.V1);
    await Verify(Encoding.UTF8.GetString(canonical))
        .UseDirectory("Golden")
        .UseFileName("EvidencePredicate_v1");
}
```
---
## Decisions & Risks
### Decisions
| Decision | Rationale |
|----------|-----------|
| Use underscore prefix for version field | Sorts ahead of camelCase (lowercase-initial) keys under ordinal comparison; the writer also emits it first explicitly |
| Version string format `stella:canon:v1` | Namespaced, unambiguous, extensible |
| Dual-mode verification initially | Backward compatibility for existing attestations |
| Version field in payload, not hash prefix | Keeps hash format consistent (sha256:...) |
### Risks
| Risk | Impact | Mitigation | Owner |
|------|--------|------------|-------|
| Existing attestations invalidated | Verification failures | Dual-mode verification; migration tooling | Attestor Guild |
| Performance overhead of version injection | Latency | Minimal (the v1 marker adds 34 bytes per payload); benchmark | Platform Guild |
| Version field conflicts with user data | Hash collision | Reserved `_` prefix; schema validation | Attestor Guild |
| Future canonicalization changes | V2 needed | Design allows unlimited versions | Platform Guild |
---
## Execution Log
| Date (UTC) | Update | Owner |
|------------|--------|-------|
| 2025-12-24 | Sprint created from Merkle-Hash REG product advisory gap analysis. | Project Mgmt |
| 2025-12-24 | Wave 0-2 completed: CanonVersion.cs, CanonJson versioned methods, IJsonCanonicalizer.CanonicalizeWithVersion(), ContentAddressedIdGenerator updated. | Platform Guild |
| 2025-12-24 | Wave 3 completed: 33 unit tests added covering versioned vs legacy, determinism, backward compatibility, golden files, edge cases. All tests pass. | QA Guild |
| 2025-12-24 | Wave 4 completed: Updated proof-chain-specification.md with versioning section, created canon-version-migration.md guide, created canon-json.md API reference. Sprint complete. | Docs Guild |