# Canonicalization & Determinism Patterns

**Version:** 1.0
**Date:** December 2025
**Sprint:** SPRINT_20251226_007_BE_determinism_gaps (DET-GAP-20)

> **Audience:** All StellaOps contributors working on code that produces digests, attestations, or replayable outputs.
> **Goal:** Ensure byte-identical outputs for identical inputs across platforms, time, and Rust/Go/Node re-implementations.

---

## 1. Why Determinism Matters

StellaOps is built on **proof-of-state**: every verdict, attestation, and replay must be reproducible. Non-determinism breaks:

- **Signature verification:** Different serialization → different digest → invalid signature.
- **Replay guarantees:** Feed snapshots that produce different hashes cannot be replayed.
- **Audit trails:** Compliance teams require bit-exact reproduction of historical scans.
- **Cross-platform compatibility:** Windows/Linux/macOS must produce identical outputs.

---

## 2. RFC 8785 JSON Canonicalization Scheme (JCS)

All JSON that participates in digest computation **must** use RFC 8785 JCS. This includes:

- Attestation payloads (DSSE)
- Verdict JSON
- Policy evaluation results
- Feed snapshot manifests
- Proof bundles

### 2.1 The Rfc8785JsonCanonicalizer

Use the `Rfc8785JsonCanonicalizer` class for all canonical JSON operations:

```csharp
using StellaOps.Attestor.ProofChain.Json;

// Create canonicalizer (optionally with NFC normalization)
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);

// Canonicalize a JSON string
string canonical = canonicalizer.Canonicalize(jsonString);

// Or canonicalize a JsonElement
string canonicalFromElement = canonicalizer.Canonicalize(jsonElement);
```

### 2.2 JCS Rules Summary

RFC 8785 requires:

1. **No whitespace** between tokens.
2. **Lexicographic key ordering** within objects (property names are compared by their UTF-16 code units, not by culture-aware collation).
3. **Number serialization:** ECMAScript-style shortest form: no leading zeros, no trailing zeros after the decimal point, integers without a decimal point.
4. **String escaping:** Minimal escaping (only `"`, `\`, and control chars).
5. **UTF-8 encoding** without BOM.

### 2.3 Common Mistakes

❌ **Wrong:** Using `JsonSerializer.Serialize()` directly for digest input.

```csharp
// WRONG - output is not canonical: property order, escaping, and number
// formatting depend on the serializer and its options
var json = JsonSerializer.Serialize(obj);
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(json));
```

✅ **Correct:** Canonicalize before hashing.

```csharp
// CORRECT - deterministic
var canonicalizer = new Rfc8785JsonCanonicalizer();
// Convert to a JsonElement first so the Canonicalize(JsonElement) overload from section 2.1 applies
var canonical = canonicalizer.Canonicalize(JsonSerializer.SerializeToElement(obj));
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
```

---

## 3. Unicode NFC Normalization

Different platforms may store the same string in different Unicode normalization forms. For example, `é` can be encoded as U+00E9 or as `e` followed by U+0301; the two render identically but hash differently. Enable NFC normalization when:

- Processing user-supplied strings
- Aggregating data from multiple sources
- Working with file paths or identifiers from different systems

```csharp
// Enable NFC for cross-platform string stability
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);
```

When NFC is enabled, all strings are normalized via `string.Normalize(NormalizationForm.FormC)` before serialization.

---

## 4. Resolver Boundary Pattern

**Key principle:** All data entering or leaving a "resolver" (a service that produces verdicts, attestations, or replayable state) must be canonicalized.

### 4.1 What Is a Resolver Boundary?

A resolver boundary is any point where:

- Data is **serialized** for storage, transmission, or signing
- Data is **hashed** to produce a digest
- Data is **compared** for equality in replay validation

### 4.2 Boundary Enforcement

At resolver boundaries:

1. **Canonicalize** all JSON payloads using `Rfc8785JsonCanonicalizer`.
2. **Sort** collections deterministically (by key or ID, using ordinal comparison).
3. **Normalize** timestamps to ISO 8601 UTC with `Z` suffix.
4. **Freeze** dictionaries using `FrozenDictionary` to prevent mutation after construction. `FrozenDictionary` does not guarantee an enumeration order, so sort the keys with `StringComparer.Ordinal` wherever a dictionary is iterated to produce digest input.

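
Steps 2 and 3 of the list above can be sketched with BCL types only. This is a minimal illustration, not StellaOps code: `SourceEntry` and `BoundaryNormalizer` are hypothetical names, and the digest input is a plain line-based text format chosen for brevity. A real boundary would canonicalize a JSON payload with `Rfc8785JsonCanonicalizer` (step 1) instead.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical record used only for this sketch; not a StellaOps type.
public sealed record SourceEntry(string Id, DateTimeOffset ObservedAt);

public static class BoundaryNormalizer
{
    // Step 3: convert to UTC and format with second precision and a literal 'Z'.
    // InvariantCulture keeps the calendar and separators independent of the host locale.
    public static string FormatUtc(DateTimeOffset value) =>
        value.ToUniversalTime()
             .ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);

    // Step 2: sort by ID with ordinal comparison so the caller's ordering
    // cannot influence the digest.
    public static string ComputeDigest(IEnumerable<SourceEntry> entries)
    {
        var builder = new StringBuilder();
        foreach (var entry in entries.OrderBy(x => x.Id, StringComparer.Ordinal))
        {
            // Illustrative line format: "<id>|<timestamp>\n". Not JCS.
            builder.Append(entry.Id)
                   .Append('|')
                   .Append(FormatUtc(entry.ObservedAt))
                   .Append('\n');
        }

        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(builder.ToString()));
        return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Sorting inside `ComputeDigest`, rather than trusting the caller, keeps the guarantee local to the boundary: any caller can pass entries in any order and still obtain the same digest.
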
### 4.3 Example: Feed Snapshot Coordinator

```csharp
using System.Collections.Frozen;
using System.Security.Cryptography;
using System.Text;

// IFeedSourceProvider and SourceSnapshot are placeholder type names for this excerpt.
public sealed class FeedSnapshotCoordinatorService : IFeedSnapshotCoordinator
{
    private readonly FrozenDictionary<string, IFeedSourceProvider> _providers;

    public FeedSnapshotCoordinatorService(IEnumerable<IFeedSourceProvider> providers, ...)
    {
        // Sorting makes construction independent of registration order.
        // FrozenDictionary enumeration order is unspecified, so digest code
        // must still sort whenever it iterates.
        _providers = providers
            .OrderBy(p => p.SourceId, StringComparer.Ordinal)
            .ToFrozenDictionary(p => p.SourceId, p => p, StringComparer.OrdinalIgnoreCase);
    }

    private string ComputeCompositeDigest(IReadOnlyList<SourceSnapshot> sources)
    {
        // Sort at the point of iteration; do not rely on the caller's ordering
        using var sha256 = SHA256.Create();
        foreach (var source in sources.OrderBy(s => s.SourceId, StringComparer.Ordinal))
        {
            // Append each source digest to the hash computation
            var digestBytes = Encoding.UTF8.GetBytes(source.Digest);
            sha256.TransformBlock(digestBytes, 0, digestBytes.Length, null, 0);
        }
        sha256.TransformFinalBlock([], 0, 0);
        return $"sha256:{Convert.ToHexString(sha256.Hash!).ToLowerInvariant()}";
    }
}
```

---

## 5. Timestamp Handling

### 5.1 Rules

1. **Always use UTC** - never local time.
2. **ISO 8601 format** with `Z` suffix: `2025-12-27T14:30:00Z`
3. **Consistent precision** - truncate to seconds unless milliseconds are required.
4. **Use TimeProvider** for testability.

### 5.2 Example

```csharp
using System.Globalization;

// CORRECT - UTC with literal 'T' and 'Z', formatted with the invariant culture
var timestamp = timeProvider.GetUtcNow()
    .ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);

// WRONG - local time
var wrong = DateTime.Now.ToString("o");

// WRONG - default format depends on the current culture
var wrong2 = DateTimeOffset.UtcNow.ToString();
```

---

## 6. Numeric Stability

### 6.1 Avoid Floating Point for Determinism

Floating-point arithmetic can produce different results across platforms, runtimes, and evaluation orders. For deterministic values:

- Use `decimal` for scores, percentages, and monetary values.
- Use `int` or `long` for counts and identifiers.
- If floating-point is unavoidable, document the acceptable epsilon and rounding rules.

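
The difference between `double` and `decimal` for digest inputs can be shown with a small sketch. `ScoreMath` is a hypothetical helper, and the four-decimal precision and banker's rounding are assumptions chosen for illustration; the point is that the rounding rule is stated explicitly and the output is formatted with the invariant culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for this sketch; not a StellaOps type.
public static class ScoreMath
{
    // Binary floating point cannot represent 0.1 or 0.2 exactly,
    // so this comparison evaluates to false.
    public static bool DoubleSumIsExact() => 0.1 + 0.2 == 0.3;

    // decimal stores base-10 digits, so the same comparison is exact.
    public static bool DecimalSumIsExact() => 0.1m + 0.2m == 0.3m;

    // Mean of the scores with an explicit, documented rounding rule:
    // 4 decimal places, midpoint rounded to even (assumed for this sketch).
    public static decimal Mean(IReadOnlyCollection<decimal> scores)
    {
        if (scores.Count == 0)
        {
            throw new ArgumentException("At least one score is required.", nameof(scores));
        }

        return Math.Round(scores.Sum() / scores.Count, 4, MidpointRounding.ToEven);
    }

    // Culture-invariant rendering without trailing zeros.
    public static string Format(decimal value) =>
        value.ToString("0.####", CultureInfo.InvariantCulture);
}
```

For example, `ScoreMath.Format(ScoreMath.Mean(new[] { 0.1m, 0.2m }))` yields `0.15` on every host locale, whereas the default `ToString()` would use the current culture's decimal separator.
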
### 6.2 Number Serialization

RFC 8785 requires specific number formatting:

- Integers: no decimal point (`42`, not `42.0`)
- Decimals: no trailing zeros (`3.14`, not `3.140`)
- No leading zeros (`0.5`, not `00.5`)

The `Rfc8785JsonCanonicalizer` handles this automatically.

---

## 7. Collection Ordering

### 7.1 Rule

All collections that participate in digest computation must have **deterministic order**.

### 7.2 Implementation

```csharp
using System.Collections.Immutable;

// CORRECT - a sorted dictionary enumerates in comparer order
var orderedDict = items
    .ToImmutableSortedDictionary(x => x.Key, x => x.Value, StringComparer.Ordinal);

// CORRECT - sort before iteration
foreach (var item in items.OrderBy(x => x.Id, StringComparer.Ordinal))
{
    // ...
}

// WRONG - iteration order is undefined for Dictionary and FrozenDictionary,
// even when the entries were inserted in sorted order
foreach (var item in dictionary)
{
    // Order may vary between runs
}
```

---

## 8. Audit Hash Logging

For debugging determinism issues, use the `AuditHashLogger`:

```csharp
using StellaOps.Attestor.ProofChain.Audit;

var auditLogger = new AuditHashLogger(logger);

// Log both raw and canonical hashes
auditLogger.LogHashAudit(
    rawContent,
    canonicalContent,
    "sha256:abc...",
    "verdict",
    "scan-123",
    metadata);
```

This enables post-mortem analysis of canonicalization issues.

---

## 9. Testing Determinism

### 9.1 Required Tests

Every component that produces digests must have tests verifying:

1. **Idempotency:** Same input → same digest (multiple calls).
2. **Permutation invariance:** Reordering input collections → same digest.
3. **Cross-platform:** Windows/Linux/macOS produce identical outputs.

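
Checks 1 and 2 from the list above lend themselves to a small reusable helper. The following is a sketch under stated assumptions: `DeterminismCheck` is a hypothetical class, the digest function is passed in as a delegate, and only rotations and their reversals are tried rather than every permutation, which keeps the cost linear in the number of items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical test helper for this sketch; not part of StellaOps.
public static class DeterminismCheck
{
    // Idempotency: repeated calls on the same input return the same digest.
    public static bool IsIdempotent<T>(
        IReadOnlyList<T> items, Func<IEnumerable<T>, string> digest, int runs = 3)
    {
        var expected = digest(items);
        for (var i = 0; i < runs; i++)
        {
            if (!string.Equals(digest(items), expected, StringComparison.Ordinal))
            {
                return false;
            }
        }
        return true;
    }

    // Permutation invariance: every rotation of the input, and the reversal
    // of each rotation, must produce the digest of the original order.
    public static bool IsPermutationInvariant<T>(
        IReadOnlyList<T> items, Func<IEnumerable<T>, string> digest)
    {
        var expected = digest(items);
        for (var shift = 0; shift < items.Count; shift++)
        {
            var rotated = items.Skip(shift).Concat(items.Take(shift)).ToList();
            if (!string.Equals(digest(rotated), expected, StringComparison.Ordinal))
            {
                return false;
            }

            rotated.Reverse(); // List<T>.Reverse reverses in place
            if (!string.Equals(digest(rotated), expected, StringComparison.Ordinal))
            {
                return false;
            }
        }
        return true;
    }
}
```

A digest function that sorts its input with `StringComparer.Ordinal` passes both checks; one that joins items in arrival order fails the permutation check as soon as the input holds two distinct items. Cross-platform equality (check 3) cannot be verified in-process and is typically covered by comparing outputs across CI runners.
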
### 9.2 Example Test

```csharp
[Fact]
public async Task CreateSnapshot_ProducesDeterministicDigest()
{
    // Arrange
    var sources = CreateTestSources();

    // Act - create multiple snapshots with same data
    var bundle1 = await coordinator.CreateSnapshotAsync();
    var bundle2 = await coordinator.CreateSnapshotAsync();

    // Assert - digests must be identical
    Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}

[Fact]
public async Task CreateSnapshot_OrderIndependent()
{
    // Arrange - sources in different orders
    var sources = CreateTestSources();
    var sourcesAscending = sources.OrderBy(s => s.Id);
    var sourcesDescending = sources.OrderByDescending(s => s.Id);

    // Act
    var bundle1 = await CreateWithSources(sourcesAscending);
    var bundle2 = await CreateWithSources(sourcesDescending);

    // Assert - digest must be identical regardless of input order
    Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}
```

---

## 10. Determinism Manifest Schema

All replayable artifacts must include a determinism manifest conforming to the JSON Schema at:

`docs/testing/schemas/determinism-manifest.schema.json`

Key fields:

- `schemaVersion`: Must be `"1.0"`.
- `artifactType`: One of `verdict`, `attestation`, `snapshot`, `proof`, `sbom`, `vex`.
- `hashAlgorithm`: One of `sha256`, `sha384`, `sha512`.
- `ordering`: One of `alphabetical`, `timestamp`, `insertion`, `canonical`.
- `determinismGuarantee`: One of `strict`, `relaxed`, `best_effort`.

---

## 11. Checklist for Contributors

Before submitting a PR that involves digests or attestations:

- [ ] JSON is canonicalized via `Rfc8785JsonCanonicalizer` before hashing.
- [ ] NFC normalization is enabled if user-supplied strings are involved.
- [ ] Collections are sorted deterministically before iteration.
- [ ] Timestamps are UTC with ISO 8601 format and `Z` suffix.
- [ ] Numeric values avoid floating-point where possible.
- [ ] Unit tests verify digest idempotency and permutation invariance.
- [ ] Determinism manifest schema is validated for new artifact types.

---

## 12. Related Documents

- [docs/testing/schemas/determinism-manifest.schema.json](../testing/schemas/determinism-manifest.schema.json) - JSON Schema for manifests
- [docs/modules/policy/design/policy-determinism-tests.md](../modules/policy/design/policy-determinism-tests.md) - Policy engine determinism
- [docs/19_TEST_SUITE_OVERVIEW.md](../19_TEST_SUITE_OVERVIEW.md) - Testing strategy

---

## 13. Change Log

| Version | Date       | Notes                           |
|---------|------------|---------------------------------|
| 1.0     | 2025-12-27 | Initial version per DET-GAP-20. |