Determinism Specification

Status: Living document
Version: 1.0
Created: 2025-12-26
Owners: Policy Guild, Platform Guild
Related: CONSOLIDATED - Deterministic Evidence and Verdict Architecture.md


Overview

This specification defines the determinism guarantees for StellaOps verdict computation, including digest algorithms, canonicalization rules, and migration strategies. All services that produce or verify verdicts MUST comply with this specification.


1. Digest Algorithms

1.1 VerdictId

Purpose: Uniquely identifies a verdict computation result.

Algorithm:

VerdictId = SHA256(CanonicalJson(verdict_payload))

Input Structure:

{
  "_canonVersion": "stella:canon:v1",
  "evidence_refs": ["sha256:..."],
  "explanations": [...],
  "risk_score": 42,
  "status": "pass",
  "unknowns_count": 0
}

Implementation: StellaOps.Attestor.ProofChain.Identifiers.VerdictIdGenerator
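
As a rough illustration of the formula above, the sketch below hashes a payload that has already been canonicalized. The `ContentId` class is hypothetical, and the `sha256:<lowercase hex>` output format is an assumption borrowed from the `evidence_refs` entries; the actual `VerdictIdGenerator` may differ.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration; not part of the StellaOps code base.
public static class ContentId
{
    // Hashes bytes that are ALREADY canonical; canonicalization itself is out of scope here.
    public static string FromCanonicalBytes(ReadOnlySpan<byte> canonicalBytes)
    {
        byte[] digest = SHA256.HashData(canonicalBytes);
        // Assumed ID format: "sha256:" followed by the lowercase hex digest.
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }

    // Encoding.UTF8.GetBytes never emits a BOM, which matches the canonicalization rules.
    public static string FromCanonicalJson(string canonicalJson) =>
        FromCanonicalBytes(Encoding.UTF8.GetBytes(canonicalJson));
}
```

Because the ID depends only on the canonical bytes, two services that canonicalize the same verdict payload identically would arrive at the same identifier.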


1.2 EvidenceId

Purpose: Uniquely identifies an evidence artifact (SBOM, VEX, graph, etc.).

Algorithm:

EvidenceId = SHA256(raw_bytes)

Notes: the bytes that are hashed depend on the artifact type.

  • For JSON artifacts, use the JCS-canonical bytes (see 2.1)
  • For binary artifacts, use the raw bytes as stored
  • For multi-file bundles, use the Merkle root

Implementation: StellaOps.Attestor.ProofChain.Identifiers.EvidenceIdGenerator
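
This document does not define the Merkle construction for multi-file bundles, so the following is only one possible shape. It assumes leaves are the SHA-256 of each file, ordered by path with ordinal comparison, that sibling digests are concatenated and hashed, and that an odd node is promoted unchanged. The `BundleMerkle` class and all of these choices are assumptions, and how the root is turned into the final EvidenceId is left open.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical sketch; the real bundle layout and tree shape are not specified here.
public static class BundleMerkle
{
    public static byte[] Root(IReadOnlyDictionary<string, byte[]> filesByPath)
    {
        if (filesByPath.Count == 0)
            throw new ArgumentException("Bundle must contain at least one file.", nameof(filesByPath));

        // Leaves: SHA-256 of each file, in ordinal path order so insertion order never matters.
        List<byte[]> level = filesByPath
            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => SHA256.HashData(kv.Value))
            .ToList();

        while (level.Count > 1)
        {
            var next = new List<byte[]>((level.Count + 1) / 2);
            for (int i = 0; i < level.Count; i += 2)
            {
                if (i + 1 == level.Count)
                {
                    next.Add(level[i]); // odd node is promoted unchanged (assumption)
                    continue;
                }
                // Parent = SHA-256(left digest || right digest).
                byte[] pair = new byte[level[i].Length + level[i + 1].Length];
                level[i].CopyTo(pair, 0);
                level[i + 1].CopyTo(pair, level[i].Length);
                next.Add(SHA256.HashData(pair));
            }
            level = next;
        }
        return level[0];
    }
}
```

Sorting the paths before hashing is what makes the root independent of the order in which files were added to the bundle.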


1.3 GraphRevisionId

Purpose: Uniquely identifies a call graph or reachability graph snapshot.

Algorithm:

GraphRevisionId = SHA256(CanonicalJson({
  nodes: SortedBy(nodes, n => n.id),
  edges: SortedBy(edges, e => (e.source, e.target, e.kind))
}))

Sorting Rules:

  • Nodes: lexicographic by id (Ordinal)
  • Edges: tuple sort by (source, target, kind)

Implementation: StellaOps.Scanner.CallGraph.Identifiers.GraphRevisionIdGenerator
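
The sorting rules above could be expressed with LINQ roughly as follows. The `Node` and `Edge` record shapes are hypothetical, and comparing every element of the edge tuple with ordinal comparison is an assumption; the document only states ordinal comparison explicitly for node ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shapes used only for this example.
public sealed record Node(string Id);
public sealed record Edge(string Source, string Target, string Kind);

public static class GraphOrdering
{
    // Nodes: lexicographic by id, ordinal (culture-independent) comparison.
    public static IReadOnlyList<Node> SortNodes(IEnumerable<Node> nodes) =>
        nodes.OrderBy(n => n.Id, StringComparer.Ordinal).ToList();

    // Edges: tuple sort by (source, target, kind), each element compared ordinally.
    public static IReadOnlyList<Edge> SortEdges(IEnumerable<Edge> edges) =>
        edges.OrderBy(e => e.Source, StringComparer.Ordinal)
             .ThenBy(e => e.Target, StringComparer.Ordinal)
             .ThenBy(e => e.Kind, StringComparer.Ordinal)
             .ToList();
}
```

Passing `StringComparer.Ordinal` explicitly matters: the default string comparer is culture-sensitive, so the order could otherwise vary between machines.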


1.4 ManifestId

Purpose: Uniquely identifies a scan manifest (all inputs for an evaluation).

Algorithm:

ManifestId = SHA256(CanonicalJson(manifest_payload))

Input Structure:

{
  "_canonVersion": "stella:canon:v1",
  "engine_version": "1.0.0",
  "feeds_snapshot_sha256": "sha256:...",
  "options_hash": "sha256:...",
  "policy_bundle_sha256": "sha256:...",
  "policy_semver": "2025.12",
  "reach_subgraph_sha256": "sha256:...",
  "sbom_sha256": "sha256:...",
  "vex_set_sha256": ["sha256:..."]
}

Implementation: StellaOps.Replay.Core.ManifestIdGenerator


1.5 PolicyBundleId

Purpose: Uniquely identifies a compiled policy bundle.

Algorithm:

PolicyBundleId = SHA256(CanonicalJson({
  rules: SortedBy(rules, r => r.id),
  version: semver,
  lattice_config: {...}
}))

Implementation: StellaOps.Policy.Engine.PolicyBundleIdGenerator


2. Canonicalization Rules

2.1 JSON Canonicalization (JCS - RFC 8785)

All JSON artifacts MUST be canonicalized before hashing or signing.

Rules:

  1. Object keys sorted lexicographically by UTF-16 code unit (Ordinal comparison)
  2. No whitespace between tokens
  3. No trailing commas
  4. UTF-8 encoding without BOM
  5. Numbers: IEEE 754 double-precision, serialized as ECMAScript does: no unnecessary trailing zeros, no exponent for integers < 10^21

Example:

// Before
{ "b": 1, "a": 2, "c": { "z": true, "y": false } }

// After (canonical)
{"a":2,"b":1,"c":{"y":false,"z":true}}

Implementation: StellaOps.Canonical.Json.Rfc8785JsonCanonicalizer
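
To make rules 1, 2 and 4 concrete, here is a deliberately incomplete sketch built on `System.Text.Json.Nodes`. It only sorts object keys ordinally and emits compact UTF-8 without a BOM. It does not implement RFC 8785 number serialization or string escaping, so it is not a substitute for `Rfc8785JsonCanonicalizer`. The `CanonSketch` class is hypothetical, and .NET 8 is assumed for `DeepClone`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.Json.Nodes;

// Hypothetical, partial illustration: key ordering and whitespace only.
public static class CanonSketch
{
    public static byte[] Canonicalize(string json)
    {
        JsonNode? sorted = SortKeys(JsonNode.Parse(json));
        // ToJsonString() without options writes no indentation or extra whitespace.
        string compact = sorted is null ? "null" : sorted.ToJsonString();
        return Encoding.UTF8.GetBytes(compact); // UTF-8, no BOM
    }

    // Rebuilds the tree so every object lists its keys in ordinal order.
    private static JsonNode? SortKeys(JsonNode? node)
    {
        switch (node)
        {
            case JsonObject obj:
                return new JsonObject(
                    obj.OrderBy(p => p.Key, StringComparer.Ordinal)
                       .Select(p => KeyValuePair.Create(p.Key, SortKeys(p.Value))));
            case JsonArray arr:
                // Array element order is significant and is preserved.
                return new JsonArray(arr.Select(n => SortKeys(n)).ToArray());
            default:
                // Leaf values are cloned because a JsonNode can only have one parent.
                return node?.DeepClone();
        }
    }
}
```

Feeding it the "Before" document from the example above should yield the "After" form. Inputs containing non-ASCII text or non-integer numbers may diverge from true JCS output.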


2.2 String Normalization (Unicode NFC)

All string values MUST be normalized to Unicode NFC before canonicalization.

Why: Different Unicode representations of the same visual character produce different hashes.

Example:

// Before: é as e + combining acute (U+0065 U+0301)
// After NFC: é as single codepoint (U+00E9)

Implementation: StellaOps.Resolver.NfcStringNormalizer
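
The effect can be reproduced with the standard `string.Normalize` API. The `NfcDemo` class below is a hypothetical illustration and is not the `NfcStringNormalizer` implementation.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical demo class; shows why normalization must precede hashing.
public static class NfcDemo
{
    // Normalizes to Unicode Normalization Form C (canonical composition).
    public static string ToNfc(string value) => value.Normalize(NormalizationForm.FormC);

    // SHA-256 over the UTF-8 bytes of the string, as uppercase hex.
    public static string Sha256Hex(string value) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value)));

    // True when two strings hash identically once both are NFC-normalized.
    public static bool SameAfterNfc(string a, string b) =>
        Sha256Hex(ToNfc(a)) == Sha256Hex(ToNfc(b));
}
```

For instance, `"e\u0301"` and `"\u00E9"` render identically but encode to different UTF-8 bytes, so their raw hashes differ. After `ToNfc`, both become U+00E9 and hash the same.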


2.3 Version Markers

All canonical JSON MUST include a version marker for migration safety:

{
  "_canonVersion": "stella:canon:v1",
  ...
}

Current Version: stella:canon:v1

Migration Path: When canonicalization rules change:

  1. Introduce new version marker (e.g., stella:canon:v2)
  2. Support both versions during transition period
  3. Re-hash legacy artifacts once, store old_hash → new_hash mapping
  4. Deprecate old version after migration window
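
A verifier might read the marker before choosing which canonicalization rules to apply, along these lines. The `CanonVersion` class and its behavior on a missing marker are assumptions made for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper; sketches marker lookup and a version allow-list.
public static class CanonVersion
{
    public const string V1 = "stella:canon:v1";

    // Returns the marker, or throws when it is missing or not a string.
    public static string Read(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("_canonVersion", out JsonElement marker) ||
            marker.ValueKind != JsonValueKind.String)
        {
            throw new FormatException("Missing _canonVersion marker.");
        }
        return marker.GetString()!;
    }

    // Only v1 exists today; a transition period would accept two versions here.
    public static bool IsSupported(string version) => version == V1;
}
```

Rejecting unknown markers outright, rather than guessing, keeps a v2 artifact from being silently hashed under v1 rules.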

3. Determinism Guards

3.1 Forbidden Operations

The following operations are FORBIDDEN during verdict evaluation:

| Operation | Reason | Alternative |
|---|---|---|
| DateTime.Now / DateTimeOffset.Now | Non-deterministic | Use TimeProvider from manifest |
| Random / Guid.NewGuid() | Non-deterministic | Use content-based IDs |
| Dictionary<K,V> iteration | Unstable order | Use SortedDictionary or explicit ordering |
| HashSet<T> iteration | Unstable order | Use SortedSet or explicit ordering |
| Parallel.ForEach (unordered) | Race conditions | Use ordered parallel with merge |
| HTTP calls | External dependency | Use pre-fetched snapshots |
| File system reads | External dependency | Use CAS-cached blobs |
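
Two of the alternatives listed above can be sketched briefly. `FixedTimeProvider` and `OrderedIteration` are hypothetical names, and how a manifest actually carries its timestamp is not specified in this document. `TimeProvider` is the standard .NET 8 abstraction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical: a clock frozen at a timestamp supplied by the caller (e.g. from a manifest).
public sealed class FixedTimeProvider : TimeProvider
{
    private readonly DateTimeOffset _utcNow;

    public FixedTimeProvider(DateTimeOffset now) => _utcNow = now.ToUniversalTime();

    // Every call returns the same instant, so evaluation never observes wall-clock time.
    public override DateTimeOffset GetUtcNow() => _utcNow;
}

public static class OrderedIteration
{
    // Dictionary enumeration order is unspecified; sort keys ordinally before iterating.
    public static IEnumerable<KeyValuePair<string, TValue>> InKeyOrder<TValue>(
        IReadOnlyDictionary<string, TValue> source) =>
        source.OrderBy(kv => kv.Key, StringComparer.Ordinal);
}
```

Evaluation code would then take a `TimeProvider` parameter instead of calling `DateTime.Now`, and would iterate dictionaries through `InKeyOrder` (or use `SortedDictionary` with an ordinal comparer from the start).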

3.2 Runtime Enforcement

The DeterminismGuard class provides runtime enforcement:

using StellaOps.Policy.Engine.DeterminismGuard;

// Wraps evaluation in a determinism context
var result = await DeterminismGuard.ExecuteAsync(async () =>
{
    // Any forbidden operation throws DeterminismViolationException
    return await evaluator.EvaluateAsync(manifest);
});

Implementation: StellaOps.Policy.Engine.DeterminismGuard.DeterminismGuard

3.3 Compile-Time Enforcement (Planned)

A Roslyn analyzer will flag determinism violations at compile time:

// This will produce a compiler warning/error
public Verdict Evaluate(Manifest m)
{
    var now = DateTime.Now; // STELLA001: Forbidden in deterministic context
    ...
}

Status: Planned for Q1 2026 (SPRINT_20251226_007 DET-GAP-18)


4. Replay Contract

4.1 Requirements

For deterministic replay, the following MUST be pinned and recorded:

| Input | Storage | Notes |
|---|---|---|
| Feed snapshots | CAS by hash | CVE, VEX advisories |
| Scanner version | Manifest | Exact semver |
| Rule packs | CAS by hash | Policy rules |
| Lattice/policy version | Manifest | Semver |
| SBOM generator version | Manifest | For generator-specific quirks |
| Reachability engine settings | Manifest | Language analyzers, depth limits |
| Merge semantics ID | Manifest | Lattice configuration |

4.2 Replay Verification

// Load original manifest
var manifest = await manifestStore.GetAsync(manifestId);

// Replay evaluation
var replayVerdict = await engine.ReplayAsync(manifest);

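// Verify determinism. originalVerdict is the verdict recorded when this
// manifest was first evaluated; loading it is omitted here.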
var originalHash = CanonJson.Hash(originalVerdict);
var replayHash = CanonJson.Hash(replayVerdict);

if (originalHash != replayHash)
{
    throw new DeterminismViolationException(
        $"Replay produced different verdict: {originalHash} vs {replayHash}");
}

4.3 Replay API

GET /replay?manifest_sha=sha256:...

Response:

{
  "verdict": {...},
  "replay_manifest_sha": "sha256:...",
  "verdict_sha": "sha256:...",
  "determinism_verified": true
}
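
A client could model the response and build the request path roughly as shown below. The `ReplayResponse` and `ReplayClient` types are hypothetical; the field names are taken from the response above, while the host, authentication and error handling are left out.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO mirroring the documented response fields.
public sealed record ReplayResponse(
    [property: JsonPropertyName("verdict")] JsonElement Verdict,
    [property: JsonPropertyName("replay_manifest_sha")] string ReplayManifestSha,
    [property: JsonPropertyName("verdict_sha")] string VerdictSha,
    [property: JsonPropertyName("determinism_verified")] bool DeterminismVerified);

public static class ReplayClient
{
    // Builds the relative request path. Percent-encoding the ':' is optional
    // in a query string; it is applied here so arbitrary input stays well-formed.
    public static string BuildPath(string manifestSha) =>
        "/replay?manifest_sha=" + Uri.EscapeDataString(manifestSha);

    // Parses a response body; throws JsonException on malformed or empty input.
    public static ReplayResponse Parse(string json) =>
        JsonSerializer.Deserialize<ReplayResponse>(json)
        ?? throw new JsonException("Empty replay response.");
}
```

A caller would typically fail the check when `DeterminismVerified` is false, or when `ReplayManifestSha` does not match the digest it asked for.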

5. Testing Requirements

5.1 Golden Tests

Every service that produces verdicts MUST maintain golden test fixtures:

tests/fixtures/golden/
├── manifest-001.json
├── verdict-001.json (expected)
├── manifest-002.json
├── verdict-002.json (expected)
└── ...

Test Pattern:

[Theory]
[MemberData(nameof(GoldenTestCases))]
public async Task Verdict_MatchesGolden(string manifestPath, string expectedPath)
{
    var manifest = await LoadManifest(manifestPath);
    var actual = await engine.EvaluateAsync(manifest);
    var expected = await File.ReadAllBytesAsync(expectedPath);
    
    Assert.Equal(expected, CanonJson.Canonicalize(actual));
}
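
The test pattern refers to a `GoldenTestCases` member that is not shown. One way to supply it, assuming the `manifest-NNN.json` / `verdict-NNN.json` naming from the fixture tree above, is a helper like this hypothetical `GoldenFixtures` class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper that pairs each manifest fixture with its expected verdict.
public static class GoldenFixtures
{
    public static IEnumerable<object[]> Enumerate(string directory)
    {
        // File system enumeration order is unspecified, so sort ordinally for stable test order.
        foreach (string manifestPath in Directory
                     .EnumerateFiles(directory, "manifest-*.json")
                     .OrderBy(p => p, StringComparer.Ordinal))
        {
            // "manifest-001.json" -> "001.json" -> "verdict-001.json"
            string suffix = Path.GetFileName(manifestPath)["manifest-".Length..];
            string verdictPath = Path.Combine(directory, "verdict-" + suffix);
            if (!File.Exists(verdictPath))
                throw new FileNotFoundException("Missing golden verdict.", verdictPath);

            yield return new object[] { manifestPath, verdictPath };
        }
    }
}
```

In an xUnit test class, `GoldenTestCases` could then be a static property returning `GoldenFixtures.Enumerate("tests/fixtures/golden")`. Failing loudly on a missing verdict file avoids silently skipping a case.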

5.2 Chaos Tests

Chaos tests verify determinism under varying conditions:

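[Fact]
public async Task Verdict_IsDeterministic_UnderChaos()
{
    var manifest = CreateTestManifest();
    var baseline = await engine.EvaluateAsync(manifest);
    var baselineHash = CanonJson.Hash(baseline);

    // Remember process-wide settings so this test does not leak state into others
    ThreadPool.GetMinThreads(out var minWorker, out var minIo);
    var originalSeed = Environment.GetEnvironmentVariable("RANDOM_SEED");
    try
    {
        // Vary conditions
        for (int i = 0; i < 100; i++)
        {
            Environment.SetEnvironmentVariable("RANDOM_SEED", i.ToString());
            ThreadPool.SetMinThreads(i % 16 + 1, i % 16 + 1);

            var verdict = await engine.EvaluateAsync(manifest);

            Assert.Equal(baselineHash, CanonJson.Hash(verdict));
        }
    }
    finally
    {
        // Passing null removes the variable if it was not set before the test
        Environment.SetEnvironmentVariable("RANDOM_SEED", originalSeed);
        ThreadPool.SetMinThreads(minWorker, minIo);
    }
}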

5.3 Cross-Platform Tests

Verdicts MUST be identical across:

  • Windows / Linux / macOS
  • x64 / ARM64
  • .NET versions (within the same major version)

6. Troubleshooting Guide

6.1 "Why are my verdicts different?"

Symptom: Same inputs produce different verdict hashes.

Checklist:

  1. Are all inputs content-addressed? Check manifest hashes.
  2. Is the canonicalization version the same? Check _canonVersion.
  3. Is the engine version the same? Check engine_version in the manifest.
  4. Are the feeds from the same snapshot? Check feeds_snapshot_sha256.
  5. Is the policy bundle the same? Check policy_bundle_sha256.

Debug Logging: Enable pre-canonical hash logging to compare inputs:

{
  "Logging": {
    "DeterminismDebug": {
      "LogPreCanonicalHashes": true
    }
  }
}

6.2 Common Causes

| Symptom | Likely Cause | Fix |
|---|---|---|
| Different verdict hash, same risk score | Explanation order | Sort explanations by template + params |
| Different verdict hash, same findings | Evidence ref order | Sort evidence_refs lexicographically |
| Different graph hash | Node iteration order | Use SortedDictionary for nodes |
| Different VEX merge | Feed freshness | Pin feeds to exact snapshot |
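
The first two fixes amount to imposing a total order before canonicalization. A minimal sketch follows, assuming a hypothetical `Explanation` shape with a template name and a list of parameter strings. Treating "lexicographically" as ordinal comparison, and breaking ties by joining the params, are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape; the document only says explanations sort by template + params.
public sealed record Explanation(string Template, IReadOnlyList<string> Params);

public static class VerdictOrdering
{
    // evidence_refs: ordinal lexicographic order.
    public static IReadOnlyList<string> SortEvidenceRefs(IEnumerable<string> refs) =>
        refs.OrderBy(r => r, StringComparer.Ordinal).ToList();

    // explanations: by template, then by params joined with U+001F (unit separator)
    // so that ["ab"] and ["a", "b"] do not collapse into the same sort key.
    public static IReadOnlyList<Explanation> SortExplanations(IEnumerable<Explanation> items) =>
        items.OrderBy(e => e.Template, StringComparer.Ordinal)
             .ThenBy(e => string.Join("\u001F", e.Params), StringComparer.Ordinal)
             .ToList();
}
```

Applying both sorts immediately before building the verdict payload keeps the hash independent of the order in which evidence and explanations were collected.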

6.3 Reporting Issues

When reporting determinism issues, include:

  1. Both manifest JSONs (canonical form)
  2. Both verdict JSONs (canonical form)
  3. Engine versions
  4. Platform details (OS, architecture, .NET version)
  5. Pre-canonical hash logs (if available)

7. Migration History

v1 (2025-12-26)

  • Initial specification
  • RFC 8785 JCS + Unicode NFC
  • Version marker: stella:canon:v1

Appendix A: Reference Implementations

| Component | Location |
|---|---|
| JCS Canonicalizer | src/__Libraries/StellaOps.Canonical.Json/ |
| NFC Normalizer | src/__Libraries/StellaOps.Resolver/NfcStringNormalizer.cs |
| Determinism Guard | src/Policy/__Libraries/StellaOps.Policy.Engine/DeterminismGuard/ |
| Content-Addressed IDs | src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/Identifiers/ |
| Replay Core | src/__Libraries/StellaOps.Replay.Core/ |
| Golden Test Base | src/__Libraries/StellaOps.TestKit/Determinism/ |

Appendix B: Compliance Checklist

Services producing verdicts MUST complete this checklist:

  • All JSON outputs use JCS canonicalization
  • All strings are NFC-normalized before hashing
  • Version marker included in all canonical JSON
  • Determinism guard enabled for evaluation code
  • Golden tests cover all verdict paths
  • Chaos tests verify multi-threaded determinism
  • Cross-platform tests pass on CI
  • Replay API returns identical verdicts
  • Documentation references this specification