Add property-based tests for SBOM/VEX document ordering and Unicode normalization determinism

- Implement `SbomVexOrderingDeterminismProperties` for testing component list and vulnerability metadata hash consistency.
- Create `UnicodeNormalizationDeterminismProperties` to validate NFC normalization and Unicode string handling.
- Add project file for `StellaOps.Testing.Determinism.Properties` with necessary dependencies.
- Introduce CI/CD template validation tests including YAML syntax checks and documentation content verification.
- Create validation script for CI/CD templates ensuring all required files and structures are present.
docs/technical/architecture/determinism-specification.md
# Determinism Specification

> **Status:** Living document
> **Version:** 1.0
> **Created:** 2025-12-26
> **Owners:** Policy Guild, Platform Guild
> **Related:** [`CONSOLIDATED - Deterministic Evidence and Verdict Architecture.md`](../../product-advisories/CONSOLIDATED%20-%20Deterministic%20Evidence%20and%20Verdict%20Architecture.md)

---

## Overview

This specification defines the determinism guarantees for StellaOps verdict computation, including digest algorithms, canonicalization rules, and migration strategies. All services that produce or verify verdicts MUST comply with this specification.

---

## 1. Digest Algorithms

### 1.1 VerdictId

**Purpose:** Uniquely identifies a verdict computation result.

**Algorithm:**
```
VerdictId = SHA256(CanonicalJson(verdict_payload))
```

**Input Structure:**
```json
{
  "_canonVersion": "stella:canon:v1",
  "evidence_refs": ["sha256:..."],
  "explanations": [...],
  "risk_score": 42,
  "status": "pass",
  "unknowns_count": 0
}
```
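
As a rough illustration of the formula above, the sketch below hashes bytes that are already in canonical form and formats the digest with the `sha256:` prefix seen in the payload examples. `DigestSketch` and `ContentId` are hypothetical names, not the `VerdictIdGenerator` API, and using the same prefix for the ID itself is an assumption.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration; not the real VerdictIdGenerator.
public static class DigestSketch
{
    // Hashes canonical UTF-8 bytes and returns "sha256:<lowercase hex>".
    public static string ContentId(byte[] canonicalBytes)
    {
        byte[] digest = SHA256.HashData(canonicalBytes);
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }

    // Convenience overload for a payload string that is already canonical.
    // Encoding.UTF8.GetBytes does not emit a BOM.
    public static string ContentId(string canonicalJson) =>
        ContentId(Encoding.UTF8.GetBytes(canonicalJson));
}
```

Because the hash covers the exact bytes, two payloads that differ only in key order or whitespace yield different IDs unless both are canonicalized first.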

**Implementation:** `StellaOps.Attestor.ProofChain.Identifiers.VerdictIdGenerator`

---

### 1.2 EvidenceId

**Purpose:** Uniquely identifies an evidence artifact (SBOM, VEX, graph, etc.).

**Algorithm:**
```
EvidenceId = SHA256(raw_bytes)
```

**Notes:**
- For JSON artifacts, use JCS-canonical bytes
- For binary artifacts, use raw bytes
- For multi-file bundles, use Merkle root
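
The specification does not say how the Merkle root for a bundle is built, so the following is only one possible construction. It assumes leaves are the SHA-256 of each file's bytes ordered by path (Ordinal), that a parent is `SHA256(left || right)`, and that an unpaired node is promoted unchanged. `MerkleSketch` is a hypothetical name, and the leaves here do not bind file names, which a real scheme may want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical sketch; the real bundle hashing scheme may differ.
public static class MerkleSketch
{
    // files: relative path -> file contents. Returns the root digest.
    public static byte[] Root(IReadOnlyDictionary<string, byte[]> files)
    {
        if (files.Count == 0)
            throw new ArgumentException("Bundle is empty.", nameof(files));

        // Order leaves by path (Ordinal) so the root does not depend on enumeration order.
        List<byte[]> level = files
            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => SHA256.HashData(kv.Value))
            .ToList();

        while (level.Count > 1)
        {
            var next = new List<byte[]>((level.Count + 1) / 2);
            for (int i = 0; i < level.Count; i += 2)
            {
                if (i + 1 == level.Count)
                {
                    next.Add(level[i]); // assumption: odd node is promoted unchanged
                }
                else
                {
                    // Parent = SHA256(left || right)
                    next.Add(SHA256.HashData(level[i].Concat(level[i + 1]).ToArray()));
                }
            }
            level = next;
        }

        return level[0];
    }
}
```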

**Implementation:** `StellaOps.Attestor.ProofChain.Identifiers.EvidenceIdGenerator`

---

### 1.3 GraphRevisionId

**Purpose:** Uniquely identifies a call graph or reachability graph snapshot.

**Algorithm:**
```
GraphRevisionId = SHA256(CanonicalJson({
  nodes: SortedBy(nodes, n => n.id),
  edges: SortedBy(edges, e => (e.source, e.target, e.kind))
}))
```

**Sorting Rules:**
- Nodes: lexicographic by `id` (Ordinal)
- Edges: tuple sort by `(source, target, kind)`
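
The sorting rules above could be expressed in C# roughly as follows. The `Node` and `Edge` records are hypothetical stand-ins for the real graph types, and comparing each edge component with Ordinal is an assumption carried over from the node rule.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes; the real call-graph types are not shown in this document.
public sealed record Node(string Id);
public sealed record Edge(string Source, string Target, string Kind);

public static class GraphOrderingSketch
{
    // Nodes: lexicographic by id, Ordinal comparison.
    public static IReadOnlyList<Node> SortNodes(IEnumerable<Node> nodes) =>
        nodes.OrderBy(n => n.Id, StringComparer.Ordinal).ToList();

    // Edges: tuple sort by (source, target, kind), each compared with Ordinal.
    public static IReadOnlyList<Edge> SortEdges(IEnumerable<Edge> edges) =>
        edges.OrderBy(e => e.Source, StringComparer.Ordinal)
             .ThenBy(e => e.Target, StringComparer.Ordinal)
             .ThenBy(e => e.Kind, StringComparer.Ordinal)
             .ToList();
}
```

Passing `StringComparer.Ordinal` explicitly matters because the default string comparer is culture-sensitive, so the same graph could sort differently on machines with different locales.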

**Implementation:** `StellaOps.Scanner.CallGraph.Identifiers.GraphRevisionIdGenerator`

---

### 1.4 ManifestId

**Purpose:** Uniquely identifies a scan manifest (all inputs for an evaluation).

**Algorithm:**
```
ManifestId = SHA256(CanonicalJson(manifest_payload))
```

**Input Structure:**
```json
{
  "_canonVersion": "stella:canon:v1",
  "engine_version": "1.0.0",
  "feeds_snapshot_sha256": "sha256:...",
  "options_hash": "sha256:...",
  "policy_bundle_sha256": "sha256:...",
  "policy_semver": "2025.12",
  "reach_subgraph_sha256": "sha256:...",
  "sbom_sha256": "sha256:...",
  "vex_set_sha256": ["sha256:..."]
}
```

**Implementation:** `StellaOps.Replay.Core.ManifestIdGenerator`

---

### 1.5 PolicyBundleId

**Purpose:** Uniquely identifies a compiled policy bundle.

**Algorithm:**
```
PolicyBundleId = SHA256(CanonicalJson({
  rules: SortedBy(rules, r => r.id),
  version: semver,
  lattice_config: {...}
}))
```

**Implementation:** `StellaOps.Policy.Engine.PolicyBundleIdGenerator`

---

## 2. Canonicalization Rules

### 2.1 JSON Canonicalization (JCS - RFC 8785)

All JSON artifacts MUST be canonicalized before hashing or signing.

**Rules:**
1. Object keys sorted lexicographically (Ordinal comparison)
2. No whitespace between tokens
3. No trailing commas
4. UTF-8 encoding without BOM
5. Numbers: IEEE 754 double-precision, no unnecessary trailing zeros, no exponent for integers below 10^21 (values of 10^21 and above use exponent notation, e.g. `1e+21`)

**Example:**
```json
// Before
{ "b": 1, "a": 2, "c": { "z": true, "y": false } }

// After (canonical)
{"a":2,"b":1,"c":{"y":false,"z":true}}
```
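
To make the key-sorting and whitespace rules concrete, here is a simplified sketch built on `System.Text.Json`. It is not the `Rfc8785JsonCanonicalizer` and not a complete RFC 8785 implementation: numbers are copied through in their original textual form rather than re-serialized, and string escaping follows the relaxed .NET encoder. `CanonSketch` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.Encodings.Web;
using System.Text.Json;

// Simplified illustration only; number formatting is NOT normalized per RFC 8785.
public static class CanonSketch
{
    public static byte[] Canonicalize(string json)
    {
        using var doc = JsonDocument.Parse(json);
        using var stream = new MemoryStream();
        var options = new JsonWriterOptions
        {
            Indented = false, // no whitespace between tokens
            Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
        };
        using (var writer = new Utf8JsonWriter(stream, options))
        {
            Write(doc.RootElement, writer);
        } // disposing the writer flushes it to the stream
        return stream.ToArray(); // UTF-8, no BOM
    }

    private static void Write(JsonElement element, Utf8JsonWriter writer)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                writer.WriteStartObject();
                // Sort keys with Ordinal comparison, then recurse into each value.
                foreach (var property in element.EnumerateObject()
                             .OrderBy(p => p.Name, StringComparer.Ordinal))
                {
                    writer.WritePropertyName(property.Name);
                    Write(property.Value, writer);
                }
                writer.WriteEndObject();
                break;
            case JsonValueKind.Array:
                writer.WriteStartArray();
                foreach (var item in element.EnumerateArray())
                    Write(item, writer); // array order is preserved
                writer.WriteEndArray();
                break;
            default:
                element.WriteTo(writer); // strings, numbers, true/false/null
                break;
        }
    }
}
```

Array elements keep their original order; only object keys are sorted, which is why list-valued fields such as `evidence_refs` need their own explicit ordering before hashing.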

**Implementation:** `StellaOps.Canonical.Json.Rfc8785JsonCanonicalizer`

---

### 2.2 String Normalization (Unicode NFC)

All string values MUST be normalized to Unicode NFC before canonicalization.

**Why:** Different Unicode representations of the same visual character produce different hashes.

**Example:**
```
// Before: é as e + combining acute (U+0065 U+0301)
// After NFC: é as single codepoint (U+00E9)
```
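
The effect on hashes can be seen with the built-in `string.Normalize` API. This sketch is not the `NfcStringNormalizer` implementation; `NfcSketch` and its members are hypothetical names used only to show the behaviour.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration.
public static class NfcSketch
{
    // Normalizes to NFC; returns the input unchanged when it is already normalized.
    public static string ToNfc(string value) =>
        value.IsNormalized(NormalizationForm.FormC)
            ? value
            : value.Normalize(NormalizationForm.FormC);

    // SHA-256 over the UTF-8 bytes of the string, as uppercase hex.
    public static string Sha256Hex(string value) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value)));
}
```

With `"e\u0301"` (decomposed) and `"\u00E9"` (precomposed), `Sha256Hex` gives different digests for the raw strings and the same digest once both have passed through `ToNfc`.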

**Implementation:** `StellaOps.Resolver.NfcStringNormalizer`

---

### 2.3 Version Markers

All canonical JSON MUST include a version marker for migration safety:

```json
{
  "_canonVersion": "stella:canon:v1",
  ...
}
```

**Current Version:** `stella:canon:v1`

**Migration Path:** When canonicalization rules change:
1. Introduce new version marker (e.g., `stella:canon:v2`)
2. Support both versions during transition period
3. Re-hash legacy artifacts once, store `old_hash → new_hash` mapping
4. Deprecate old version after migration window
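
A verifier that supports several versions during a transition would first need to read the marker and decide whether it can handle it. The sketch below shows one way that check might look; `CanonVersionSketch` and its members are hypothetical, and the document does not describe how the real services dispatch on the marker.

```csharp
using System;
using System.Text.Json;

// Hypothetical sketch of reading and checking the version marker.
public static class CanonVersionSketch
{
    public const string V1 = "stella:canon:v1";

    // Reads `_canonVersion` from the top-level object; throws if it is missing or not a string.
    public static string ReadVersion(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("_canonVersion", out var marker) ||
            marker.ValueKind != JsonValueKind.String)
        {
            throw new FormatException("Missing _canonVersion marker.");
        }
        return marker.GetString()!;
    }

    // During a transition period, every version still accepted would be listed here.
    public static bool IsSupported(string version) =>
        version is V1; // a future "stella:canon:v2" would be added alongside V1
}
```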

---

## 3. Determinism Guards

### 3.1 Forbidden Operations

The following operations are FORBIDDEN during verdict evaluation:

| Operation | Reason | Alternative |
|-----------|--------|-------------|
| `DateTime.Now` / `DateTimeOffset.Now` | Non-deterministic | Use `TimeProvider` from manifest |
| `Random` / `Guid.NewGuid()` | Non-deterministic | Use content-based IDs |
| `Dictionary<K,V>` iteration | Unstable order | Use `SortedDictionary` or explicit ordering |
| `HashSet<T>` iteration | Unstable order | Use `SortedSet` or explicit ordering |
| `Parallel.ForEach` (unordered) | Race conditions | Use ordered parallel with merge |
| HTTP calls | External dependency | Use pre-fetched snapshots |
| File system reads | External dependency | Use CAS-cached blobs |
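
Two of the alternatives in the table can be sketched with standard .NET 8 types. `FixedTimeProvider` and `DeterministicAlternativesSketch` are hypothetical names, and how the pinned instant reaches the evaluator from the manifest is an assumption not covered by this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A TimeProvider that always returns one pinned instant instead of the wall clock.
public sealed class FixedTimeProvider : TimeProvider
{
    private readonly DateTimeOffset _now;

    public FixedTimeProvider(DateTimeOffset now) => _now = now;

    public override DateTimeOffset GetUtcNow() => _now;
}

public static class DeterministicAlternativesSketch
{
    // Enumerates dictionary entries in a stable order (keys compared with Ordinal),
    // regardless of how the dictionary was populated.
    public static IEnumerable<KeyValuePair<string, TValue>> InKeyOrder<TValue>(
        IReadOnlyDictionary<string, TValue> source) =>
        source.OrderBy(kv => kv.Key, StringComparer.Ordinal);
}
```

Evaluation code that takes a `TimeProvider` parameter can then be given `new FixedTimeProvider(pinnedInstant)` during replay and `TimeProvider.System` nowhere on the verdict path.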

### 3.2 Runtime Enforcement

The `DeterminismGuard` class provides runtime enforcement:

```csharp
using StellaOps.Policy.Engine.DeterminismGuard;

// Wraps evaluation in a determinism context
var result = await DeterminismGuard.ExecuteAsync(async () =>
{
    // Any forbidden operation throws DeterminismViolationException
    return await evaluator.EvaluateAsync(manifest);
});
```

**Implementation:** `StellaOps.Policy.Engine.DeterminismGuard.DeterminismGuard`

### 3.3 Compile-Time Enforcement (Planned)

A Roslyn analyzer will flag determinism violations at compile time:

```csharp
// This will produce a compiler warning/error
public Verdict Evaluate(Manifest m)
{
    var now = DateTime.Now; // STELLA001: Forbidden in deterministic context
    ...
}
```

**Status:** Planned for Q1 2026 (SPRINT_20251226_007 DET-GAP-18)

---

## 4. Replay Contract

### 4.1 Requirements

For deterministic replay, the following MUST be pinned and recorded:

| Input | Storage | Notes |
|-------|---------|-------|
| Feed snapshots | CAS by hash | CVE, VEX advisories |
| Scanner version | Manifest | Exact semver |
| Rule packs | CAS by hash | Policy rules |
| Lattice/policy version | Manifest | Semver |
| SBOM generator version | Manifest | For generator-specific quirks |
| Reachability engine settings | Manifest | Language analyzers, depth limits |
| Merge semantics ID | Manifest | Lattice configuration |

### 4.2 Replay Verification

```csharp
// Load the original manifest
var manifest = await manifestStore.GetAsync(manifestId);

// Replay the evaluation
var replayVerdict = await engine.ReplayAsync(manifest);

// Verify determinism. `originalVerdict` is the verdict recorded when the
// manifest was first evaluated; loading it is omitted from this snippet.
var originalHash = CanonJson.Hash(originalVerdict);
var replayHash = CanonJson.Hash(replayVerdict);

if (originalHash != replayHash)
{
    throw new DeterminismViolationException(
        $"Replay produced different verdict: {originalHash} vs {replayHash}");
}
```

### 4.3 Replay API

```
GET /replay?manifest_sha=sha256:...
```

**Response:**
```json
{
  "verdict": {...},
  "replay_manifest_sha": "sha256:...",
  "verdict_sha": "sha256:...",
  "determinism_verified": true
}
```

---

## 5. Testing Requirements

### 5.1 Golden Tests

Every service that produces verdicts MUST maintain golden test fixtures:

```
tests/fixtures/golden/
├── manifest-001.json
├── verdict-001.json (expected)
├── manifest-002.json
├── verdict-002.json (expected)
└── ...
```

**Test Pattern:**
```csharp
[Theory]
[MemberData(nameof(GoldenTestCases))]
public async Task Verdict_MatchesGolden(string manifestPath, string expectedPath)
{
    var manifest = await LoadManifest(manifestPath);
    var actual = await engine.EvaluateAsync(manifest);
    var expected = await File.ReadAllBytesAsync(expectedPath);

    Assert.Equal(expected, CanonJson.Canonicalize(actual));
}
```
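
The test pattern refers to a `GoldenTestCases` member that is not shown. One way such a data source might be written, assuming the `manifest-NNN.json` / `verdict-NNN.json` naming from the fixture tree, is sketched below. `GoldenFixtureSketch` is a hypothetical name, and taking the fixture directory as a parameter is a choice made here to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical data source for the golden-test theory.
public static class GoldenFixtureSketch
{
    // Yields { manifestPath, expectedVerdictPath } pairs in a stable (Ordinal) order.
    public static IEnumerable<object[]> GoldenTestCases(string fixtureDir)
    {
        var manifests = Directory
            .EnumerateFiles(fixtureDir, "manifest-*.json")
            .OrderBy(path => path, StringComparer.Ordinal);

        foreach (var manifestPath in manifests)
        {
            // manifest-001.json -> verdict-001.json
            var suffix = Path.GetFileName(manifestPath).Substring("manifest-".Length);
            var verdictPath = Path.Combine(fixtureDir, "verdict-" + suffix);

            // Fail loudly if a manifest has no matching golden verdict.
            if (!File.Exists(verdictPath))
                throw new FileNotFoundException("Missing golden verdict.", verdictPath);

            yield return new object[] { manifestPath, verdictPath };
        }
    }
}
```

Sorting the file list keeps the test case order itself reproducible, since directory enumeration order is not guaranteed to be the same across file systems.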

### 5.2 Chaos Tests

Chaos tests verify determinism under varying conditions:

```csharp
[Fact]
public async Task Verdict_IsDeterministic_UnderChaos()
{
    var manifest = CreateTestManifest();
    var baseline = await engine.EvaluateAsync(manifest);

    // Vary conditions
    for (int i = 0; i < 100; i++)
    {
        Environment.SetEnvironmentVariable("RANDOM_SEED", i.ToString());
        ThreadPool.SetMinThreads(i % 16 + 1, i % 16 + 1);

        var verdict = await engine.EvaluateAsync(manifest);

        Assert.Equal(
            CanonJson.Hash(baseline),
            CanonJson.Hash(verdict));
    }
}
```
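
Input ordering is another condition worth varying. The sketch below shows the shape of an ordering property in plain C#: shuffle a component list with many seeds and check that the hash of the sorted list never changes. It is not the `SbomVexOrderingDeterminismProperties` suite named in the commit message; `OrderingPropertySketch` and the newline-joined hashing are hypothetical simplifications. A seeded `Random` is used only in the test harness, never in evaluation code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical property check for order-independent hashing.
public static class OrderingPropertySketch
{
    // Hashes a component list after sorting it, so input order should not matter.
    public static string HashComponents(IEnumerable<string> components)
    {
        var sorted = components.OrderBy(c => c, StringComparer.Ordinal);
        var joined = string.Join("\n", sorted);
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(joined)));
    }

    // Fisher-Yates shuffle driven by a fixed seed, so failures are reproducible.
    public static List<string> Shuffle(IEnumerable<string> items, int seed)
    {
        var list = items.ToList();
        var rng = new Random(seed);
        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (list[i], list[j]) = (list[j], list[i]);
        }
        return list;
    }

    // Returns true when every shuffled ordering hashes to the baseline.
    public static bool HoldsForAllSeeds(IReadOnlyList<string> components, int trials)
    {
        var baseline = HashComponents(components);
        return Enumerable.Range(0, trials)
            .All(seed => HashComponents(Shuffle(components, seed)) == baseline);
    }
}
```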

### 5.3 Cross-Platform Tests

Verdicts MUST be identical across:
- Windows / Linux / macOS
- x64 / ARM64
- .NET versions (within major version)

---

## 6. Troubleshooting Guide

### 6.1 "Why are my verdicts different?"

**Symptom:** Same inputs produce different verdict hashes.

**Checklist:**
1. ✅ Are all inputs content-addressed? Check manifest hashes.
2. ✅ Is canonicalization version the same? Check `_canonVersion`.
3. ✅ Is engine version the same? Check `engine_version` in manifest.
4. ✅ Are feeds from the same snapshot? Check `feeds_snapshot_sha256`.
5. ✅ Is policy bundle the same? Check `policy_bundle_sha256`.

**Debug Logging:**
Enable pre-canonical hash logging to compare inputs:
```json
{
  "Logging": {
    "DeterminismDebug": {
      "LogPreCanonicalHashes": true
    }
  }
}
```

### 6.2 Common Causes

| Symptom | Likely Cause | Fix |
|---------|--------------|-----|
| Different verdict hash, same risk score | Explanation order | Sort explanations by template + params |
| Different verdict hash, same findings | Evidence ref order | Sort evidence_refs lexicographically |
| Different graph hash | Node iteration order | Use `SortedDictionary` for nodes |
| Different VEX merge | Feed freshness | Pin feeds to exact snapshot |

### 6.3 Reporting Issues

When reporting determinism issues, include:
1. Both manifest JSONs (canonical form)
2. Both verdict JSONs (canonical form)
3. Engine versions
4. Platform details (OS, architecture, .NET version)
5. Pre-canonical hash logs (if available)

---

## 7. Migration History

### v1 (2025-12-26)
- Initial specification
- RFC 8785 JCS + Unicode NFC
- Version marker: `stella:canon:v1`

---

## Appendix A: Reference Implementations

| Component | Location |
|-----------|----------|
| JCS Canonicalizer | `src/__Libraries/StellaOps.Canonical.Json/` |
| NFC Normalizer | `src/__Libraries/StellaOps.Resolver/NfcStringNormalizer.cs` |
| Determinism Guard | `src/Policy/__Libraries/StellaOps.Policy.Engine/DeterminismGuard/` |
| Content-Addressed IDs | `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/Identifiers/` |
| Replay Core | `src/__Libraries/StellaOps.Replay.Core/` |
| Golden Test Base | `src/__Libraries/StellaOps.TestKit/Determinism/` |

---

## Appendix B: Compliance Checklist

Services producing verdicts MUST complete this checklist:

- [ ] All JSON outputs use JCS canonicalization
- [ ] All strings are NFC-normalized before hashing
- [ ] Version marker included in all canonical JSON
- [ ] Determinism guard enabled for evaluation code
- [ ] Golden tests cover all verdict paths
- [ ] Chaos tests verify multi-threaded determinism
- [ ] Cross-platform tests pass on CI
- [ ] Replay API returns identical verdicts
- [ ] Documentation references this specification