# Canonicalization & Determinism Patterns

**Version:** 1.0
**Date:** December 2025
**Sprint:** SPRINT_20251226_007_BE_determinism_gaps (DET-GAP-20)

> **Audience:** All StellaOps contributors working on code that produces digests, attestations, or replayable outputs.
> **Goal:** Ensure byte-identical outputs for identical inputs across platforms, time, and Rust/Go/Node re-implementations.

---

## 1. Why Determinism Matters

StellaOps is built on **proof-of-state**: every verdict, attestation, and replay must be reproducible. Non-determinism breaks:

- **Signature verification:** Different serialization → different digest → invalid signature.
- **Replay guarantees:** Feed snapshots that produce different hashes cannot be replayed.
- **Audit trails:** Compliance teams require bit-exact reproduction of historical scans.
- **Cross-platform compatibility:** Windows/Linux/macOS must produce identical outputs.

---

## 2. RFC 8785 JSON Canonicalization Scheme (JCS)

All JSON that participates in digest computation **must** use RFC 8785 JCS. This includes:

- Attestation payloads (DSSE)
- Verdict JSON
- Policy evaluation results
- Feed snapshot manifests
- Proof bundles

### 2.1 The Rfc8785JsonCanonicalizer

Use the `Rfc8785JsonCanonicalizer` class for all canonical JSON operations:

```csharp
using StellaOps.Attestor.ProofChain.Json;

// Create canonicalizer (optionally with NFC normalization)
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);

// Canonicalize a JSON string
string canonical = canonicalizer.Canonicalize(jsonString);

// Or canonicalize a JsonElement (separate variable name so the snippet compiles as one unit)
string canonicalFromElement = canonicalizer.Canonicalize(jsonElement);
```

### 2.2 JCS Rules Summary

RFC 8785 requires:

1. **No whitespace** between tokens.
2. **Lexicographic key ordering** within objects, comparing property names by their UTF-16 code units.
3. **Number serialization:** ECMAScript number formatting, so no leading zeros, no trailing zeros after the decimal point, and integers without a decimal point.
4. **String escaping:** Minimal escaping (only `"`, `\`, and control characters).
5. **UTF-8 encoding** without BOM.
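
Rules 1 and 2 can be illustrated with a minimal sketch built only on `System.Text.Json`. This is not the `Rfc8785JsonCanonicalizer` implementation and it is not RFC 8785 compliant: numbers are written as they appear in the input, and the default writer escapes more characters than JCS allows. The helper names `SortKeysAndStripWhitespace` and `WriteSorted` are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

// Illustration only: re-emits JSON with sorted keys and no insignificant whitespace.
static string SortKeysAndStripWhitespace(string json)
{
    using var doc = JsonDocument.Parse(json);
    using var stream = new MemoryStream();
    using (var writer = new Utf8JsonWriter(stream)) // default options: no indentation
    {
        WriteSorted(doc.RootElement, writer);
    } // disposing the writer flushes it into the stream
    return Encoding.UTF8.GetString(stream.ToArray());
}

static void WriteSorted(JsonElement element, Utf8JsonWriter writer)
{
    switch (element.ValueKind)
    {
        case JsonValueKind.Object:
            writer.WriteStartObject();
            // Ordinal comparison orders .NET strings by UTF-16 code unit.
            foreach (var property in element.EnumerateObject()
                         .OrderBy(p => p.Name, StringComparer.Ordinal))
            {
                writer.WritePropertyName(property.Name);
                WriteSorted(property.Value, writer);
            }
            writer.WriteEndObject();
            break;
        case JsonValueKind.Array:
            writer.WriteStartArray();
            foreach (var item in element.EnumerateArray())
            {
                WriteSorted(item, writer); // array order is significant and is preserved
            }
            writer.WriteEndArray();
            break;
        default:
            element.WriteTo(writer); // strings, numbers, true, false, null
            break;
    }
}

Console.WriteLine(SortKeysAndStripWhitespace("{ \"b\": 1, \"a\": [true, null] }"));
// prints {"a":[true,null],"b":1}
```

For digest input, keep using `Rfc8785JsonCanonicalizer`; the sketch only shows why two differently formatted inputs can map to the same bytes.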

### 2.3 Common Mistakes

❌ **Wrong:** Using `JsonSerializer.Serialize()` directly for digest input.

```csharp
// WRONG - member order and formatting follow the type and serializer options, not a canonical form
var json = JsonSerializer.Serialize(obj);
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(json));
```

✅ **Correct:** Canonicalize before hashing.

```csharp
// CORRECT - deterministic: serialize first, then canonicalize the resulting JSON string
var canonicalizer = new Rfc8785JsonCanonicalizer();
var canonical = canonicalizer.Canonicalize(JsonSerializer.Serialize(obj));
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
```

---

## 3. Unicode NFC Normalization

Different platforms may store the same string in different Unicode normalization forms. Enable NFC normalization when:

- Processing user-supplied strings
- Aggregating data from multiple sources
- Working with file paths or identifiers from different systems

```csharp
// Enable NFC for cross-platform string stability
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);
```
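
To see why this matters for digests, the sketch below compares two encodings of the same visible character. It uses only the base class library; the helper name `EqualAfterNfc` is hypothetical and is not part of the canonicalizer.

```csharp
using System;
using System.Text;

// Returns true when both strings are identical once normalized to NFC.
static bool EqualAfterNfc(string left, string right) =>
    string.Equals(
        left.Normalize(NormalizationForm.FormC),
        right.Normalize(NormalizationForm.FormC),
        StringComparison.Ordinal);

string precomposed = "\u00E9";  // "é" as a single code point
string decomposed = "e\u0301";  // "e" followed by a combining acute accent

Console.WriteLine(Encoding.UTF8.GetBytes(precomposed).Length); // 2
Console.WriteLine(Encoding.UTF8.GetBytes(decomposed).Length);  // 3
Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal)); // False
Console.WriteLine(EqualAfterNfc(precomposed, decomposed));     // True
```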

When NFC is enabled, all strings are normalized via `string.Normalize(NormalizationForm.FormC)` before serialization.

---

## 4. Resolver Boundary Pattern

**Key principle:** All data entering or leaving a "resolver" (a service that produces verdicts, attestations, or replayable state) must be canonicalized.

### 4.1 What Is a Resolver Boundary?

A resolver boundary is any point where:

- Data is **serialized** for storage, transmission, or signing
- Data is **hashed** to produce a digest
- Data is **compared** for equality in replay validation

### 4.2 Boundary Enforcement

At resolver boundaries:

1. **Canonicalize** all JSON payloads using `Rfc8785JsonCanonicalizer`.
2. **Sort** collections deterministically (by key or ID, using ordinal comparison).
3. **Normalize** timestamps to ISO 8601 UTC with `Z` suffix.
4. **Freeze** dictionaries using `FrozenDictionary` so they cannot change after construction. Its enumeration order is not documented as stable, so sort explicitly wherever iteration order feeds a digest.

### 4.3 Example: Feed Snapshot Coordinator

```csharp
public sealed class FeedSnapshotCoordinatorService : IFeedSnapshotCoordinator
{
    private readonly FrozenDictionary<string, IFeedSourceProvider> _providers;

    public FeedSnapshotCoordinatorService(IEnumerable<IFeedSourceProvider> providers, ...)
    {
        // Index providers by SourceId. The sort documents intent, but the frozen dictionary
        // does not promise an enumeration order, so digest code below re-sorts explicitly.
        _providers = providers
            .OrderBy(p => p.SourceId, StringComparer.Ordinal)
            .ToFrozenDictionary(p => p.SourceId, p => p, StringComparer.OrdinalIgnoreCase);
    }

    private string ComputeCompositeDigest(IReadOnlyList<SourceSnapshot> sources)
    {
        // Sort by SourceId (ordinal) so the digest does not depend on the caller's ordering
        using var sha256 = SHA256.Create();
        foreach (var source in sources.OrderBy(s => s.SourceId, StringComparer.Ordinal))
        {
            // Append each source digest to the hash computation
            var digestBytes = Encoding.UTF8.GetBytes(source.Digest);
            sha256.TransformBlock(digestBytes, 0, digestBytes.Length, null, 0);
        }
        sha256.TransformFinalBlock([], 0, 0);
        return $"sha256:{Convert.ToHexString(sha256.Hash!).ToLowerInvariant()}";
    }
}
```

---

## 5. Timestamp Handling

### 5.1 Rules

1. **Always use UTC** - never local time.
2. **ISO 8601 format** with `Z` suffix: `2025-12-27T14:30:00Z`
3. **Consistent precision** - truncate to seconds unless milliseconds are required.
4. **Use TimeProvider** for testability.

### 5.2 Example

```csharp
// CORRECT - UTC with Z suffix, formatted with the invariant culture
var timestamp = timeProvider.GetUtcNow()
    .ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);

// WRONG - local time
var wrong = DateTime.Now.ToString("o");

// WRONG - default format depends on the current culture
var wrong2 = DateTimeOffset.UtcNow.ToString();
```
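
The rules in 5.1 can also be wrapped in one helper so every call site formats timestamps the same way. This is a sketch, not an existing StellaOps API; the name `ToCanonicalTimestamp` is hypothetical. It converts any offset to UTC, drops sub-second precision, and pins the culture.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: UTC, second precision, literal 'T' and 'Z', invariant culture.
static string ToCanonicalTimestamp(DateTimeOffset value)
{
    return value.ToUniversalTime()
        .ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
}

// 16:30:00.750 at UTC+02:00 is 14:30:00.750 UTC; the format drops the milliseconds.
var local = new DateTimeOffset(2025, 12, 27, 16, 30, 0, 750, TimeSpan.FromHours(2));
Console.WriteLine(ToCanonicalTimestamp(local)); // 2025-12-27T14:30:00Z
```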

---

## 6. Numeric Stability

### 6.1 Avoid Floating Point for Determinism

Floating-point arithmetic can produce different results on different platforms, and its results can also depend on the order of operations. For deterministic values:

- Use `decimal` for scores, percentages, and monetary values.
- Use `int` or `long` for counts and identifiers.
- If floating-point is unavoidable, document the acceptable epsilon and rounding rules.
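
A small sketch of the order-of-operations problem: summing the same three values in two different orders gives different `double` results but identical `decimal` results. The helper names are hypothetical and the values are chosen only for illustration.

```csharp
using System;

static double SumDoubles(double[] values)
{
    double total = 0;
    foreach (var value in values)
    {
        total += value; // each addition rounds to the nearest representable double
    }
    return total;
}

static decimal SumDecimals(decimal[] values)
{
    decimal total = 0;
    foreach (var value in values)
    {
        total += value; // these base-10 fractions are represented exactly
    }
    return total;
}

Console.WriteLine(SumDoubles(new[] { 0.1, 0.2, 0.3 }) == SumDoubles(new[] { 0.3, 0.2, 0.1 }));         // False
Console.WriteLine(SumDecimals(new[] { 0.1m, 0.2m, 0.3m }) == SumDecimals(new[] { 0.3m, 0.2m, 0.1m })); // True
```

If a score is aggregated from a collection whose order is not fixed, the `double` version can change the serialized value and therefore the digest.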

### 6.2 Number Serialization

RFC 8785 requires specific number formatting:

- Integers: no decimal point (`42`, not `42.0`)
- Decimals: no trailing zeros (`3.14`, not `3.140`)
- No leading zeros (`0.5`, not `00.5`)

The `Rfc8785JsonCanonicalizer` handles this automatically.

---

## 7. Collection Ordering

### 7.1 Rule

All collections that participate in digest computation must have **deterministic order**.

### 7.2 Implementation

```csharp
// CORRECT - freeze for immutability, then iterate keys in sorted order;
// FrozenDictionary does not document a stable enumeration order
var frozen = items.ToFrozenDictionary(x => x.Key, x => x.Value);
foreach (var key in frozen.Keys.OrderBy(k => k, StringComparer.Ordinal))
{
    var value = frozen[key];
    // ...
}

// CORRECT - sort before iteration
foreach (var item in items.OrderBy(x => x.Id, StringComparer.Ordinal))
{
    // ...
}

// WRONG - iteration order is undefined
foreach (var item in dictionary)
{
    // Order may vary between runs
}
```
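
The rule can be checked end to end with a self-contained sketch that hashes a set of identifiers after sorting them. The helper name `ComputeOrderIndependentDigest` and the newline delimiter are assumptions for illustration (the delimiter assumes identifiers never contain a newline); this is not the coordinator's actual digest format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: the same set of ids yields the same digest in any input order.
static string ComputeOrderIndependentDigest(IEnumerable<string> ids)
{
    // Ordinal sort is culture-independent, so it is stable across machines.
    var sorted = ids.OrderBy(id => id, StringComparer.Ordinal);

    // A delimiter keeps ["ab", "c"] distinct from ["a", "bc"].
    var joined = string.Join("\n", sorted);

    var hash = SHA256.HashData(Encoding.UTF8.GetBytes(joined));
    return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
}

var first = ComputeOrderIndependentDigest(new[] { "nvd", "ghsa", "osv" });
var second = ComputeOrderIndependentDigest(new[] { "osv", "nvd", "ghsa" });
Console.WriteLine(first == second); // True
```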

---

## 8. Audit Hash Logging

For debugging determinism issues, use the `AuditHashLogger`:

```csharp
using StellaOps.Attestor.ProofChain.Audit;

var auditLogger = new AuditHashLogger(logger);

// Log both raw and canonical hashes
auditLogger.LogHashAudit(
    rawContent,
    canonicalContent,
    "sha256:abc...",
    "verdict",
    "scan-123",
    metadata);
```

This enables post-mortem analysis of canonicalization issues.

---

## 9. Testing Determinism

### 9.1 Required Tests

Every component that produces digests must have tests verifying:

1. **Idempotency:** Same input → same digest (multiple calls).
2. **Permutation invariance:** Reordering input collections → same digest.
3. **Cross-platform:** Windows/Linux/macOS produce identical outputs.

### 9.2 Example Test

```csharp
[Fact]
public async Task CreateSnapshot_ProducesDeterministicDigest()
{
    // Arrange - `coordinator` is assumed to be a test-class field configured with these sources
    var sources = CreateTestSources();

    // Act - create multiple snapshots with same data
    var bundle1 = await coordinator.CreateSnapshotAsync();
    var bundle2 = await coordinator.CreateSnapshotAsync();

    // Assert - digests must be identical
    Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}

[Fact]
public async Task CreateSnapshot_OrderIndependent()
{
    // Arrange - the same sources in different orders
    var sources = CreateTestSources();
    var sourcesAscending = sources.OrderBy(s => s.Id);
    var sourcesDescending = sources.OrderByDescending(s => s.Id);

    // Act
    var bundle1 = await CreateWithSources(sourcesAscending);
    var bundle2 = await CreateWithSources(sourcesDescending);

    // Assert - digest must be identical regardless of input order
    Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}
```

---

## 10. Determinism Manifest Schema

All replayable artifacts must include a determinism manifest conforming to the JSON Schema at:

`docs/testing/schemas/determinism-manifest.schema.json`

Key fields:

- `schemaVersion`: Must be `"1.0"`.
- `artifactType`: One of `verdict`, `attestation`, `snapshot`, `proof`, `sbom`, `vex`.
- `hashAlgorithm`: One of `sha256`, `sha384`, `sha512`.
- `ordering`: One of `alphabetical`, `timestamp`, `insertion`, `canonical`.
- `determinismGuarantee`: One of `strict`, `relaxed`, `best_effort`.
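
As a quick pre-check before full schema validation, the enumerated values above can be tested in code. The sketch below is an assumption-heavy illustration: it treats the five fields as flat top-level strings and treats a missing field as invalid, neither of which is confirmed here, and the schema file remains the source of truth. The helper name `FindInvalidFields` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pre-check: returns the names of fields that are missing or hold an unlisted value.
static List<string> FindInvalidFields(IReadOnlyDictionary<string, string> manifest)
{
    // Allowed values copied from the field list above, checked in a fixed order.
    var allowed = new (string Field, string[] Values)[]
    {
        ("schemaVersion", new[] { "1.0" }),
        ("artifactType", new[] { "verdict", "attestation", "snapshot", "proof", "sbom", "vex" }),
        ("hashAlgorithm", new[] { "sha256", "sha384", "sha512" }),
        ("ordering", new[] { "alphabetical", "timestamp", "insertion", "canonical" }),
        ("determinismGuarantee", new[] { "strict", "relaxed", "best_effort" }),
    };

    var invalid = new List<string>();
    foreach (var (field, values) in allowed)
    {
        if (!manifest.TryGetValue(field, out var value)
            || !values.Contains(value, StringComparer.Ordinal))
        {
            invalid.Add(field);
        }
    }
    return invalid;
}

var manifest = new Dictionary<string, string>
{
    ["schemaVersion"] = "1.0",
    ["artifactType"] = "verdict",
    ["hashAlgorithm"] = "sha256",
    ["ordering"] = "random", // not one of the listed values
    ["determinismGuarantee"] = "strict",
};
Console.WriteLine(string.Join(",", FindInvalidFields(manifest))); // ordering
```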

---

## 11. Checklist for Contributors

Before submitting a PR that involves digests or attestations:

- [ ] JSON is canonicalized via `Rfc8785JsonCanonicalizer` before hashing.
- [ ] NFC normalization is enabled if user-supplied strings are involved.
- [ ] Collections are sorted deterministically before iteration.
- [ ] Timestamps are UTC with ISO 8601 format and `Z` suffix.
- [ ] Numeric values avoid floating-point where possible.
- [ ] Unit tests verify digest idempotency and permutation invariance.
- [ ] Determinism manifest schema is validated for new artifact types.

---

## 12. Related Documents

- [docs/testing/schemas/determinism-manifest.schema.json](../testing/schemas/determinism-manifest.schema.json) - JSON Schema for manifests
- [docs/modules/policy/design/policy-determinism-tests.md](../modules/policy/design/policy-determinism-tests.md) - Policy engine determinism
- [docs/19_TEST_SUITE_OVERVIEW.md](../19_TEST_SUITE_OVERVIEW.md) - Testing strategy

---

## 13. Change Log

| Version | Date       | Notes                           |
|---------|------------|---------------------------------|
| 1.0     | 2025-12-27 | Initial version per DET-GAP-20. |