Canonicalization & Determinism Patterns

Version: 1.0
Date: December 2025
Sprint: SPRINT_20251226_007_BE_determinism_gaps (DET-GAP-20)

Audience: All StellaOps contributors working on code that produces digests, attestations, or replayable outputs.
Goal: Ensure byte-identical outputs for identical inputs across platforms, time, and Rust/Go/Node re-implementations.


1. Why Determinism Matters

StellaOps is built on proof-of-state: every verdict, attestation, and replay must be reproducible. Non-determinism breaks:

  • Signature verification: Different serialization → different digest → invalid signature.
  • Replay guarantees: Feed snapshots that produce different hashes cannot be replayed.
  • Audit trails: Compliance teams require bit-exact reproduction of historical scans.
  • Cross-platform compatibility: Windows/Linux/macOS must produce identical outputs.

2. RFC 8785 JSON Canonicalization Scheme (JCS)

All JSON that participates in digest computation must use RFC 8785 JCS. This includes:

  • Attestation payloads (DSSE)
  • Verdict JSON
  • Policy evaluation results
  • Feed snapshot manifests
  • Proof bundles

2.1 The Rfc8785JsonCanonicalizer

Use the Rfc8785JsonCanonicalizer class for all canonical JSON operations:

using StellaOps.Attestor.ProofChain.Json;

// Create canonicalizer (optionally with NFC normalization)
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);

// Canonicalize JSON
string canonical = canonicalizer.Canonicalize(jsonString);

// Or from a JsonElement (separate variable name so the snippet compiles as one block)
string canonicalFromElement = canonicalizer.Canonicalize(jsonElement);

2.2 JCS Rules Summary

RFC 8785 requires:

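  1. No whitespace between tokens.
  2. Object keys sorted by their UTF-16 code units (ordinal, not culture-aware), applied recursively to nested objects. Array element order is preserved.
  3. Number serialization follows the ECMAScript rules for IEEE 754 doubles: no leading zeros, no trailing zeros after the decimal point, integers without a decimal point.
  4. String escaping: minimal. Only ", \, and control characters (U+0000 through U+001F) are escaped.
  5. UTF-8 encoding without BOM.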
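
Rules 1 and 2 can be illustrated with a small sketch built only on System.Text.Json. This is not the Rfc8785JsonCanonicalizer implementation and it is not a complete RFC 8785 canonicalizer; `KeyOrderSketch` and `SortKeys` are hypothetical names used only for this example. Numbers are copied through as written and string escaping is left to the writer's default encoder.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

// Hypothetical helper for illustration only: sorts object keys and strips whitespace.
public static class KeyOrderSketch
{
    public static string SortKeys(string json)
    {
        using var doc = JsonDocument.Parse(json);
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream)) // not indented by default
        {
            Write(doc.RootElement, writer);
        } // disposing the writer flushes it into the stream
        return Encoding.UTF8.GetString(stream.ToArray());
    }

    private static void Write(JsonElement element, Utf8JsonWriter writer)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                writer.WriteStartObject();
                // StringComparer.Ordinal compares UTF-16 code units.
                foreach (var property in element.EnumerateObject()
                             .OrderBy(p => p.Name, StringComparer.Ordinal))
                {
                    writer.WritePropertyName(property.Name);
                    Write(property.Value, writer);
                }
                writer.WriteEndObject();
                break;
            case JsonValueKind.Array:
                writer.WriteStartArray();
                foreach (var item in element.EnumerateArray())
                {
                    Write(item, writer); // array order is preserved
                }
                writer.WriteEndArray();
                break;
            default:
                element.WriteTo(writer); // scalars are written unchanged
                break;
        }
    }
}
```

Two inputs that differ only in key order and whitespace, such as `{ "b": 1, "a": 2 }` and `{"a":2,"b":1}`, produce the same string from `SortKeys`. The default encoder escapes non-ASCII characters, so the output is not JCS-compliant for such strings; use the project canonicalizer for anything that feeds a digest.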

2.3 Common Mistakes

Wrong: Using JsonSerializer.Serialize() directly for digest input.

// WRONG - not canonical: key order, whitespace, and escaping depend on the type, dictionary contents, and serializer options
var json = JsonSerializer.Serialize(obj);
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(json));

Correct: Canonicalize before hashing.

// CORRECT - serialize, then canonicalize the JSON string before hashing
var canonicalizer = new Rfc8785JsonCanonicalizer();
var canonical = canonicalizer.Canonicalize(JsonSerializer.Serialize(obj));
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));

3. Unicode NFC Normalization

Different platforms may store the same string in different Unicode normalization forms. Enable NFC normalization when:

  • Processing user-supplied strings
  • Aggregating data from multiple sources
  • Working with file paths or identifiers from different systems

// Enable NFC for cross-platform string stability
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);

When NFC is enabled, all strings are normalized via string.Normalize(NormalizationForm.FormC) before serialization.
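
The effect is easy to reproduce without the canonicalizer. The sketch below hashes a precomposed and a decomposed spelling of the same visible text; `NfcSketch` and its methods are hypothetical names for this example, not part of the StellaOps API.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration only.
public static class NfcSketch
{
    // SHA-256 over the UTF-8 bytes of the string, as upper-case hex.
    public static string Sha256Hex(string value) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value)));

    // Same digest computed after normalizing to NFC.
    public static string Sha256HexNfc(string value) =>
        Sha256Hex(value.Normalize(NormalizationForm.FormC));
}
```

With `composed = "caf\u00E9"` (a single code point for é) and `decomposed = "cafe\u0301"` (e followed by a combining acute accent), `Sha256Hex` returns two different digests, while `Sha256HexNfc` returns the same digest for both.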


4. Resolver Boundary Pattern

Key principle: All data entering or leaving a "resolver" (a service that produces verdicts, attestations, or replayable state) must be canonicalized.

4.1 What Is a Resolver Boundary?

A resolver boundary is any point where:

  • Data is serialized for storage, transmission, or signing
  • Data is hashed to produce a digest
  • Data is compared for equality in replay validation

4.2 Boundary Enforcement

At resolver boundaries:

  1. Canonicalize all JSON payloads using Rfc8785JsonCanonicalizer.
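  2. Sort collections deterministically (by key or ID, using ordinal string comparison).
  3. Normalize timestamps to ISO 8601 UTC with Z suffix.
  4. Freeze dictionaries using FrozenDictionary so they cannot change after construction. FrozenDictionary does not guarantee an enumeration order, so sort explicitly wherever iteration order feeds a digest.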

4.3 Example: Feed Snapshot Coordinator

public sealed class FeedSnapshotCoordinatorService : IFeedSnapshotCoordinator
{
    private readonly FrozenDictionary<string, IFeedSourceProvider> _providers;

    public FeedSnapshotCoordinatorService(IEnumerable<IFeedSourceProvider> providers, ...)
    {
        // Index providers by SourceId; digest code sorts explicitly instead of relying on dictionary enumeration order
        _providers = providers
            .OrderBy(p => p.SourceId, StringComparer.Ordinal)
            .ToFrozenDictionary(p => p.SourceId, p => p, StringComparer.OrdinalIgnoreCase);
    }

    private string ComputeCompositeDigest(IReadOnlyList<SourceSnapshot> sources)
    {
        // Re-sort by SourceId (ordinal) so the digest never depends on the caller's ordering
        using var sha256 = SHA256.Create();
        foreach (var source in sources.OrderBy(s => s.SourceId, StringComparer.Ordinal))
        {
            // Append each source digest to the hash computation
            var digestBytes = Encoding.UTF8.GetBytes(source.Digest);
            sha256.TransformBlock(digestBytes, 0, digestBytes.Length, null, 0);
        }
        sha256.TransformFinalBlock([], 0, 0);
        return $"sha256:{Convert.ToHexString(sha256.Hash!).ToLowerInvariant()}";
    }
}

5. Timestamp Handling

5.1 Rules

  1. Always use UTC - never local time.
  2. ISO 8601 format with Z suffix: 2025-12-27T14:30:00Z
  3. Consistent precision - truncate to seconds unless milliseconds are required.
  4. Use TimeProvider for testability.

5.2 Example

// CORRECT - UTC with Z suffix, formatted with the invariant culture
var timestamp = timeProvider.GetUtcNow().ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);

// WRONG - local time
var wrong = DateTime.Now.ToString("o");

// WRONG - default format depends on the current culture
var wrong2 = DateTimeOffset.UtcNow.ToString();
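
Rule 4 is what makes timestamp handling testable. A minimal sketch, assuming .NET 8 or later for System.TimeProvider: `FixedTimeProvider` and `TimestampSketch` are hypothetical names for this example, not existing StellaOps types.

```csharp
using System;
using System.Globalization;

// Hypothetical test double: always reports the same instant.
public sealed class FixedTimeProvider : TimeProvider
{
    private readonly DateTimeOffset _now;

    public FixedTimeProvider(DateTimeOffset now) => _now = now;

    public override DateTimeOffset GetUtcNow() => _now.ToUniversalTime();
}

// Hypothetical helper for illustration only.
public static class TimestampSketch
{
    public static string FormatUtcSeconds(TimeProvider timeProvider)
    {
        // ToUniversalTime guards against providers that return a non-zero offset.
        var utc = timeProvider.GetUtcNow().ToUniversalTime();

        // The custom format stops at seconds, so sub-second digits are dropped, not rounded.
        return utc.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
    }
}
```

Production code would pass `TimeProvider.System`; a test passes a `FixedTimeProvider` and asserts on the exact string.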

6. Numeric Stability

6.1 Avoid Floating Point for Determinism

Floating-point arithmetic can produce different results on different platforms. For deterministic values:

  • Use decimal for scores, percentages, and monetary values.
  • Use int or long for counts and identifiers.
  • If floating-point is unavoidable, document the acceptable epsilon and rounding rules.
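
As a small illustration of the first bullet, the sketch below formats a score from decimal arithmetic with an explicit rounding rule. `ScoreSketch` is a hypothetical name, and the choice of two decimal places with banker's rounding is an assumption made for the example, not a documented StellaOps rule.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class ScoreSketch
{
    // Rounds to two places (midpoints go to the even digit) and formats without
    // trailing zeros, using the invariant culture so the decimal separator is always '.'.
    public static string FormatScore(decimal raw) =>
        Math.Round(raw, 2, MidpointRounding.ToEven)
            .ToString("0.##", CultureInfo.InvariantCulture);
}
```

In decimal arithmetic `0.1m + 0.2m == 0.3m` holds, whereas the double expression `0.1 + 0.2 == 0.3` is false. Stating the rounding mode explicitly matters as much as the type: `FormatScore(0.125m)` yields `0.12` under `MidpointRounding.ToEven` and would yield `0.13` under `MidpointRounding.AwayFromZero`.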

6.2 Number Serialization

RFC 8785 requires specific number formatting:

  • Integers: no decimal point (42, not 42.0)
  • Decimals: no trailing zeros (3.14, not 3.140)
  • No leading zeros (0.5, not 00.5)

The Rfc8785JsonCanonicalizer handles this automatically.


7. Collection Ordering

7.1 Rule

All collections that participate in digest computation must have deterministic order.

7.2 Implementation

// CORRECT - sort into a list so the enumeration order is explicit
// (FrozenDictionary is immutable, but its enumeration order is not guaranteed)
var orderedItems = items
    .OrderBy(x => x.Key, StringComparer.Ordinal)
    .ToList();

// CORRECT - sort before iteration
foreach (var item in items.OrderBy(x => x.Id, StringComparer.Ordinal))
{
    // ...
}

// WRONG - iteration order is undefined
foreach (var item in dictionary)
{
    // Order may vary between runs
}
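
A minimal sketch of an order-independent digest over a collection of identifiers follows. `OrderedDigestSketch` is a hypothetical name, and the newline delimiter is an assumption that only holds if identifiers never contain a newline.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration only.
public static class OrderedDigestSketch
{
    public static string ComputeDigest(IEnumerable<string> ids)
    {
        using var hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

        // Ordinal sort: independent of culture and of the caller's ordering.
        foreach (var id in ids.OrderBy(x => x, StringComparer.Ordinal))
        {
            hash.AppendData(Encoding.UTF8.GetBytes(id));
            // Delimiter keeps ["ab", "c"] distinct from ["a", "bc"].
            hash.AppendData(new byte[] { (byte)'\n' });
        }

        return "sha256:" + Convert.ToHexString(hash.GetHashAndReset()).ToLowerInvariant();
    }
}
```

Calling `ComputeDigest` with `new[] { "b", "a" }` and with `new[] { "a", "b" }` returns the same value, because the sort happens inside the function rather than being left to callers.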

8. Audit Hash Logging

For debugging determinism issues, use the AuditHashLogger:

using StellaOps.Attestor.ProofChain.Audit;

var auditLogger = new AuditHashLogger(logger);

// Log both raw and canonical hashes
auditLogger.LogHashAudit(
    rawContent,
    canonicalContent,
    "sha256:abc...",
    "verdict",
    "scan-123",
    metadata);

This enables post-mortem analysis of canonicalization issues.


9. Testing Determinism

9.1 Required Tests

Every component that produces digests must have tests verifying:

  1. Idempotency: Same input → same digest (multiple calls).
  2. Permutation invariance: Reordering input collections → same digest.
  3. Cross-platform: Windows/Linux/macOS produce identical outputs.

9.2 Example Test

[Fact]
public async Task CreateSnapshot_ProducesDeterministicDigest()
{
    // Arrange
    var sources = CreateTestSources();
    
    // Act - create multiple snapshots with same data
    var bundle1 = await coordinator.CreateSnapshotAsync();
    var bundle2 = await coordinator.CreateSnapshotAsync();
    
    // Assert - digests must be identical
    Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}

[Fact]
public async Task CreateSnapshot_OrderIndependent()
{
    // Arrange - the same sources in opposite orders
    var sources = CreateTestSources();
    var sourcesAscending = sources.OrderBy(s => s.Id);
    var sourcesDescending = sources.OrderByDescending(s => s.Id);
    
    // Act
    var bundle1 = await CreateWithSources(sourcesAscending);
    var bundle2 = await CreateWithSources(sourcesDescending);
    
    // Assert - digest must be identical regardless of input order
    Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}

10. Determinism Manifest Schema

All replayable artifacts must include a determinism manifest conforming to the JSON Schema at:

docs/testing/schemas/determinism-manifest.schema.json

Key fields:

  • schemaVersion: Must be "1.0".
  • artifactType: One of verdict, attestation, snapshot, proof, sbom, vex.
  • hashAlgorithm: One of sha256, sha384, sha512.
  • ordering: One of alphabetical, timestamp, insertion, canonical.
  • determinismGuarantee: One of strict, relaxed, best_effort.
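
The key fields listed above can be spot-checked before running full schema validation. The sketch below is not a substitute for the JSON Schema: it covers only the five fields listed here, assumes each is a JSON string on the root object, and uses the hypothetical name `ManifestFieldSketch`. The real schema may require additional fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper for illustration only.
public static class ManifestFieldSketch
{
    // Allowed values copied from the key-fields list in this section.
    private static readonly (string Field, string[] Allowed)[] Rules =
    {
        ("schemaVersion", new[] { "1.0" }),
        ("artifactType", new[] { "verdict", "attestation", "snapshot", "proof", "sbom", "vex" }),
        ("hashAlgorithm", new[] { "sha256", "sha384", "sha512" }),
        ("ordering", new[] { "alphabetical", "timestamp", "insertion", "canonical" }),
        ("determinismGuarantee", new[] { "strict", "relaxed", "best_effort" }),
    };

    // Returns one message per failing field, in the fixed order of Rules.
    public static IReadOnlyList<string> FindProblems(string manifestJson)
    {
        using var doc = JsonDocument.Parse(manifestJson);
        var problems = new List<string>();

        foreach (var (field, allowed) in Rules)
        {
            if (!doc.RootElement.TryGetProperty(field, out var value) ||
                value.ValueKind != JsonValueKind.String)
            {
                problems.Add($"{field}: missing or not a string");
            }
            else if (!allowed.Contains(value.GetString()!, StringComparer.Ordinal))
            {
                problems.Add($"{field}: unexpected value '{value.GetString()}'");
            }
        }

        return problems;
    }
}
```

The rules are held in an array rather than a dictionary so that the problem list itself comes back in a fixed order.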

11. Checklist for Contributors

Before submitting a PR that involves digests or attestations:

  • JSON is canonicalized via Rfc8785JsonCanonicalizer before hashing.
  • NFC normalization is enabled if user-supplied strings are involved.
  • Collections are sorted deterministically before iteration.
  • Timestamps are UTC with ISO 8601 format and Z suffix.
  • Numeric values avoid floating-point where possible.
  • Unit tests verify digest idempotency and permutation invariance.
  • Determinism manifest schema is validated for new artifact types.


12. Change Log

| Version | Date       | Notes                           |
|---------|------------|---------------------------------|
| 1.0     | 2025-12-27 | Initial version per DET-GAP-20. |