Canonicalization & Determinism Patterns
Version: 1.0
Date: December 2025
Sprint: SPRINT_20251226_007_BE_determinism_gaps (DET-GAP-20)
Audience: All StellaOps contributors working on code that produces digests, attestations, or replayable outputs.
Goal: Ensure byte-identical outputs for identical inputs across platforms, time, and Rust/Go/Node re-implementations.
1. Why Determinism Matters
StellaOps is built on proof-of-state: every verdict, attestation, and replay must be reproducible. Non-determinism breaks:
- Signature verification: Different serialization → different digest → invalid signature.
- Replay guarantees: Feed snapshots that produce different hashes cannot be replayed.
- Audit trails: Compliance teams require bit-exact reproduction of historical scans.
- Cross-platform compatibility: Windows/Linux/macOS must produce identical outputs.
2. RFC 8785 JSON Canonicalization Scheme (JCS)
All JSON that participates in digest computation must use RFC 8785 JCS. This includes:
- Attestation payloads (DSSE)
- Verdict JSON
- Policy evaluation results
- Feed snapshot manifests
- Proof bundles
2.1 The Rfc8785JsonCanonicalizer
Use the Rfc8785JsonCanonicalizer class for all canonical JSON operations:
using StellaOps.Attestor.ProofChain.Json;
// Create canonicalizer (optionally with NFC normalization)
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);
// Canonicalize a JSON string
string canonical = canonicalizer.Canonicalize(jsonString);
// Or canonicalize a JsonElement (separate variable so both calls compile in one scope)
string canonicalFromElement = canonicalizer.Canonicalize(jsonElement);
2.2 JCS Rules Summary
RFC 8785 requires:
- No whitespace between tokens.
- Lexicographic key ordering within objects (keys compared as UTF-16 code units).
- Number serialization: No leading zeros, no trailing zeros after decimal, integers without decimal point.
- String escaping: Minimal escaping (only `"`, `\`, and control chars).
- UTF-8 encoding without BOM.
2.3 Common Mistakes
❌ Wrong: Using JsonSerializer.Serialize() directly for digest input.
// WRONG - non-deterministic ordering
var json = JsonSerializer.Serialize(obj);
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(json));
✅ Correct: Canonicalize before hashing.
// CORRECT - deterministic
var canonicalizer = new Rfc8785JsonCanonicalizer();
var canonical = canonicalizer.Canonicalize(JsonSerializer.Serialize(obj));
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
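The effect of member order on a digest can be reproduced without any StellaOps code. The sketch below is an illustration only: `KeyOrderSketch` is a hypothetical helper, it assumes .NET 8 (`JsonNode.DeepClone`), and it covers only key ordering and whitespace removal. It does not implement RFC 8785 number serialization or string escaping, so it is not a substitute for `Rfc8785JsonCanonicalizer`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

// The same data, written with members in a different order and different whitespace.
string a = "{\"b\": 1, \"a\": {\"y\": true, \"x\": null}}";
string b = "{\"a\":{\"x\":null,\"y\":true},\"b\":1}";

Console.WriteLine(KeyOrderSketch.Sha256Hex(a) == KeyOrderSketch.Sha256Hex(b)); // False: raw bytes differ
Console.WriteLine(KeyOrderSketch.SortKeys(a)); // {"a":{"x":null,"y":true},"b":1}
Console.WriteLine(KeyOrderSketch.Sha256Hex(KeyOrderSketch.SortKeys(a))
    == KeyOrderSketch.Sha256Hex(KeyOrderSketch.SortKeys(b))); // True

// Hypothetical helper for illustration; not the StellaOps implementation.
static class KeyOrderSketch
{
    // Re-emits the JSON compactly with object keys in ordinal (UTF-16 code unit) order.
    public static string SortKeys(string json) =>
        Sort(JsonNode.Parse(json))?.ToJsonString() ?? "null";

    public static string Sha256Hex(string text) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text))).ToLowerInvariant();

    private static JsonNode? Sort(JsonNode? node) => node switch
    {
        // Rebuild objects with sorted keys; children are fresh nodes, so they have no parent yet.
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Sort(p.Value)))),
        // Arrays keep their element order; only nested objects are re-sorted.
        JsonArray arr => new JsonArray(arr.Select(item => Sort(item)).ToArray()),
        // Leaf values (and null) are copied as-is.
        _ => node?.DeepClone(),
    };
}
```

Array element order is left untouched on purpose: JCS treats arrays as ordered, so any sorting of collections has to happen before serialization.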
3. Unicode NFC Normalization
Different platforms may store the same string in different Unicode normalization forms. Enable NFC normalization when:
- Processing user-supplied strings
- Aggregating data from multiple sources
- Working with file paths or identifiers from different systems
// Enable NFC for cross-platform string stability
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);
When NFC is enabled, all strings are normalized via string.Normalize(NormalizationForm.FormC) before serialization.
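As a small illustration of the problem NFC solves, the following sketch uses only standard .NET APIs (the variable names are illustrative). It compares a precomposed and a decomposed spelling of the same visible text:

```csharp
using System;
using System.Text;

// "é" as one precomposed code point (U+00E9) versus "e" plus a combining acute accent (U+0301).
string composed = "caf\u00E9";
string decomposed = "cafe\u0301";

Console.WriteLine(composed == decomposed);                  // False: ordinal comparison sees different code units
Console.WriteLine(Encoding.UTF8.GetByteCount(composed));    // 5
Console.WriteLine(Encoding.UTF8.GetByteCount(decomposed));  // 6

// After NFC normalization both spellings are the same code unit sequence.
string normalized = decomposed.Normalize(NormalizationForm.FormC);
Console.WriteLine(normalized == composed);                  // True
```

Because the UTF-8 byte sequences differ before normalization, a digest over either string would differ as well.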
4. Resolver Boundary Pattern
Key principle: All data entering or leaving a "resolver" (a service that produces verdicts, attestations, or replayable state) must be canonicalized.
4.1 What Is a Resolver Boundary?
A resolver boundary is any point where:
- Data is serialized for storage, transmission, or signing
- Data is hashed to produce a digest
- Data is compared for equality in replay validation
4.2 Boundary Enforcement
At resolver boundaries:
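- Canonicalize all JSON payloads using `Rfc8785JsonCanonicalizer`.
- Sort collections deterministically (alphabetically by key or ID).
- Normalize timestamps to ISO 8601 UTC with `Z` suffix.
- Freeze dictionaries using `FrozenDictionary` so contents cannot change after construction. Because `FrozenDictionary` does not document its enumeration order, still sort keys explicitly wherever iteration order feeds a digest.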
4.3 Example: Feed Snapshot Coordinator
public sealed class FeedSnapshotCoordinatorService : IFeedSnapshotCoordinator
{
    private readonly FrozenDictionary<string, IFeedSourceProvider> _providers;
    public FeedSnapshotCoordinatorService(IEnumerable<IFeedSourceProvider> providers, ...)
    {
        // Sort providers alphabetically for deterministic digest computation
        _providers = providers
            .OrderBy(p => p.SourceId, StringComparer.Ordinal)
            .ToFrozenDictionary(p => p.SourceId, p => p, StringComparer.OrdinalIgnoreCase);
    }
    private string ComputeCompositeDigest(IReadOnlyList<SourceSnapshot> sources)
    {
        // Sort by SourceId here so the digest never depends on the caller's ordering
        using var sha256 = SHA256.Create();
        foreach (var source in sources.OrderBy(s => s.SourceId, StringComparer.Ordinal))
        {
            // Append each source digest to the hash computation
            var digestBytes = Encoding.UTF8.GetBytes(source.Digest);
            sha256.TransformBlock(digestBytes, 0, digestBytes.Length, null, 0);
        }
        sha256.TransformFinalBlock([], 0, 0);
        return $"sha256:{Convert.ToHexString(sha256.Hash!).ToLowerInvariant()}";
    }
}
5. Timestamp Handling
5.1 Rules
- Always use UTC - never local time.
- ISO 8601 format with `Z` suffix: `2025-12-27T14:30:00Z`
- Consistent precision - truncate to seconds unless milliseconds are required.
- Use TimeProvider for testability.
5.2 Example
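// CORRECT - UTC with Z suffix; the invariant culture (System.Globalization) keeps calendar and separators fixed
var timestamp = timeProvider.GetUtcNow().ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);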
// WRONG - local time
var wrong = DateTime.Now.ToString("o");
// WRONG - inconsistent format
var wrong2 = DateTimeOffset.UtcNow.ToString();
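For tests, a fixed clock makes the timestamp rules easy to check. The sketch below assumes .NET 8, where `TimeProvider` is part of the base class library. `FixedTimeProvider` and `TimestampSketch` are hypothetical names introduced for illustration and are not StellaOps types.

```csharp
using System;
using System.Globalization;

var clock = new FixedTimeProvider(new DateTimeOffset(2025, 12, 27, 14, 30, 0, 750, TimeSpan.Zero));

Console.WriteLine(TimestampSketch.Format(clock.GetUtcNow())); // 2025-12-27T14:30:00Z
// The same instant expressed with a +02:00 offset formats identically.
Console.WriteLine(TimestampSketch.Format(clock.GetUtcNow().ToOffset(TimeSpan.FromHours(2)))); // 2025-12-27T14:30:00Z

// Hypothetical test clock that always returns the same instant.
sealed class FixedTimeProvider : TimeProvider
{
    private readonly DateTimeOffset _now;
    public FixedTimeProvider(DateTimeOffset now) => _now = now;
    public override DateTimeOffset GetUtcNow() => _now;
}

static class TimestampSketch
{
    // Converts to UTC and formats to whole seconds; the format has no fractional
    // part, so sub-second precision is dropped rather than rounded.
    public static string Format(DateTimeOffset value) =>
        value.ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
}
```

Passing `CultureInfo.InvariantCulture` matters because the current culture otherwise controls the calendar and the time separator used by custom format strings.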
6. Numeric Stability
6.1 Avoid Floating Point for Determinism
Floating-point arithmetic can produce different results on different platforms. For deterministic values:
- Use `decimal` for scores, percentages, and monetary values.
- Use `int` or `long` for counts and identifiers.
- If floating-point is unavoidable, document the acceptable epsilon and rounding rules.
6.2 Number Serialization
RFC 8785 requires specific number formatting:
- Integers: no decimal point (`42`, not `42.0`)
- Decimals: no trailing zeros (`3.14`, not `3.140`)
- No leading zeros (`0.5`, not `00.5`)
The Rfc8785JsonCanonicalizer handles this automatically.
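The formatting rules are worth keeping in mind even when using `decimal`, because a `decimal` preserves its scale: equal values can print with different trailing zeros. The sketch below uses only standard .NET formatting. `NumberSketch` is a hypothetical helper shown for illustration; it is not how the canonicalizer is implemented.

```csharp
using System;
using System.Globalization;

Console.WriteLine(0.1 + 0.2 == 0.3);    // False: binary floating point cannot represent these values exactly
Console.WriteLine(0.1m + 0.2m == 0.3m); // True: decimal arithmetic is exact here

// Equal decimal values may still carry different scales.
Console.WriteLine(3.14m == 3.140m);                               // True
Console.WriteLine(3.140m.ToString(CultureInfo.InvariantCulture)); // 3.140

Console.WriteLine(NumberSketch.Format(3.140m)); // 3.14
Console.WriteLine(NumberSketch.Format(42.0m));  // 42
Console.WriteLine(NumberSketch.Format(0.50m));  // 0.5

static class NumberSketch
{
    // One mandatory integer digit and up to 28 optional fractional digits
    // (28 is the maximum scale of System.Decimal).
    private const string Pattern = "0.############################";

    public static string Format(decimal value) =>
        value.ToString(Pattern, CultureInfo.InvariantCulture);
}
```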
7. Collection Ordering
7.1 Rule
All collections that participate in digest computation must have deterministic order.
7.2 Implementation
// CORRECT - build from sorted input and freeze so contents cannot change
var orderedDict = items
    .OrderBy(x => x.Key, StringComparer.Ordinal)
    .ToFrozenDictionary(x => x.Key, x => x.Value);
// CORRECT - sort before iteration (also when iterating a FrozenDictionary,
// whose enumeration order is not documented)
foreach (var item in items.OrderBy(x => x.Id, StringComparer.Ordinal))
{
    // ...
}
// WRONG - iteration order is undefined
foreach (var item in dictionary)
{
    // Order may vary between runs
}
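The rule can be checked with a small permutation test. This sketch uses only standard .NET APIs. `OrderingSketch` and the sample identifiers are hypothetical, and the newline delimiter is an assumption made for the example rather than a StellaOps convention.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// The same three identifiers in two different input orders.
var first = new[] { "pkg:npm/b@2.0.0", "pkg:npm/a@1.0.0", "pkg:npm/c@3.0.0" };
var second = new[] { "pkg:npm/c@3.0.0", "pkg:npm/a@1.0.0", "pkg:npm/b@2.0.0" };

Console.WriteLine(OrderingSketch.Digest(first) == OrderingSketch.Digest(second)); // True

static class OrderingSketch
{
    public static string Digest(IEnumerable<string> ids)
    {
        using var hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        // Ordinal sorting is culture-independent, so every platform feeds the same byte order.
        foreach (var id in ids.OrderBy(x => x, StringComparer.Ordinal))
        {
            hash.AppendData(Encoding.UTF8.GetBytes(id));
            // Delimiter so that ("ab", "c") and ("a", "bc") do not hash to the same bytes.
            hash.AppendData(new byte[] { 0x0A });
        }
        return "sha256:" + Convert.ToHexString(hash.GetHashAndReset()).ToLowerInvariant();
    }
}
```

Sorting inside the digest function, rather than trusting callers to pre-sort, keeps the guarantee local to the code that needs it.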
8. Audit Hash Logging
For debugging determinism issues, use the AuditHashLogger:
using StellaOps.Attestor.ProofChain.Audit;
var auditLogger = new AuditHashLogger(logger);
// Log both raw and canonical hashes
auditLogger.LogHashAudit(
    rawContent,
    canonicalContent,
    "sha256:abc...",
    "verdict",
    "scan-123",
    metadata);
This enables post-mortem analysis of canonicalization issues.
9. Testing Determinism
9.1 Required Tests
Every component that produces digests must have tests verifying:
- Idempotency: Same input → same digest (multiple calls).
- Permutation invariance: Reordering input collections → same digest.
- Cross-platform: Windows/Linux/macOS produce identical outputs.
9.2 Example Test
[Fact]
public async Task CreateSnapshot_ProducesDeterministicDigest()
{
    // Arrange
    var sources = CreateTestSources();
    // Act - create multiple snapshots with same data
    var bundle1 = await coordinator.CreateSnapshotAsync();
    var bundle2 = await coordinator.CreateSnapshotAsync();
    // Assert - digests must be identical
    Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}
[Fact]
public async Task CreateSnapshot_OrderIndependent()
{
    // Arrange - sources in different orders
    var sources = CreateTestSources();
    var sourcesAscending = sources.OrderBy(s => s.Id, StringComparer.Ordinal);
    var sourcesDescending = sources.OrderByDescending(s => s.Id, StringComparer.Ordinal);
    // Act
    var bundle1 = await CreateWithSources(sourcesAscending);
    var bundle2 = await CreateWithSources(sourcesDescending);
    // Assert - digest must be identical regardless of input order
    Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}
10. Determinism Manifest Schema
All replayable artifacts must include a determinism manifest conforming to the JSON Schema at:
docs/testing/schemas/determinism-manifest.schema.json
Key fields:
- `schemaVersion`: Must be `"1.0"`.
- `artifactType`: One of `verdict`, `attestation`, `snapshot`, `proof`, `sbom`, `vex`.
- `hashAlgorithm`: One of `sha256`, `sha384`, `sha512`.
- `ordering`: One of `alphabetical`, `timestamp`, `insertion`, `canonical`.
- `determinismGuarantee`: One of `strict`, `relaxed`, `best_effort`.
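A quick pre-check of these key fields could look like the sketch below. It is an illustration under assumptions: `ManifestSketch` is a hypothetical helper, the sample manifest contains only the fields listed here, and the real schema file may require additional fields. Validation against `determinism-manifest.schema.json` remains the authoritative check.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Sample manifest limited to the key fields; values are examples only.
string manifest = """
    {
      "schemaVersion": "1.0",
      "artifactType": "verdict",
      "hashAlgorithm": "sha256",
      "ordering": "canonical",
      "determinismGuarantee": "strict"
    }
    """;

var problems = ManifestSketch.Validate(manifest);
Console.WriteLine(problems.Count == 0 ? "ok" : string.Join("; ", problems)); // ok

static class ManifestSketch
{
    // Allowed values copied from the key-field list; an array keeps the check order fixed.
    private static readonly (string Field, string[] Values)[] Allowed =
    {
        ("schemaVersion", new[] { "1.0" }),
        ("artifactType", new[] { "verdict", "attestation", "snapshot", "proof", "sbom", "vex" }),
        ("hashAlgorithm", new[] { "sha256", "sha384", "sha512" }),
        ("ordering", new[] { "alphabetical", "timestamp", "insertion", "canonical" }),
        ("determinismGuarantee", new[] { "strict", "relaxed", "best_effort" }),
    };

    public static List<string> Validate(string json)
    {
        var errors = new List<string>();
        using var doc = JsonDocument.Parse(json);
        if (doc.RootElement.ValueKind != JsonValueKind.Object)
        {
            errors.Add("manifest must be a JSON object");
            return errors;
        }
        foreach (var (field, values) in Allowed)
        {
            if (!doc.RootElement.TryGetProperty(field, out var value) || value.ValueKind != JsonValueKind.String)
                errors.Add($"{field}: missing or not a string");
            else if (Array.IndexOf(values, value.GetString()) < 0)
                errors.Add($"{field}: unexpected value '{value.GetString()}'");
        }
        return errors;
    }
}
```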
11. Checklist for Contributors
Before submitting a PR that involves digests or attestations:
- JSON is canonicalized via `Rfc8785JsonCanonicalizer` before hashing.
- NFC normalization is enabled if user-supplied strings are involved.
- Collections are sorted deterministically before iteration.
- Timestamps are UTC with ISO 8601 format and `Z` suffix.
- Numeric values avoid floating-point where possible.
- Unit tests verify digest idempotency and permutation invariance.
- Determinism manifest schema is validated for new artifact types.
12. Related Documents
- docs/testing/schemas/determinism-manifest.schema.json - JSON Schema for manifests
- docs/modules/policy/design/policy-determinism-tests.md - Policy engine determinism
- docs/19_TEST_SUITE_OVERVIEW.md - Testing strategy
13. Change Log
| Version | Date | Notes |
|---|---|---|
| 1.0 | 2025-12-27 | Initial version per DET-GAP-20. |