# SBOM Determinism Guide
> **Sprint**: SPRINT_20260118_025_ReleaseOrchestrator_sbom_release_association
> **Task**: TASK-025-005
> **Status**: Living Document
This document consolidates all determinism requirements for Stella Ops SBOMs. Deterministic SBOMs are critical for reproducible builds, verifiable release gates, and trust chain integrity.
---
## 1. Why Determinism Matters
### 1.1 Reproducibility
Deterministic SBOMs ensure that scanning the same artifact multiple times produces identical output. This is essential for:
- **CI/CD Reliability**: Re-running a pipeline should produce the same SBOM hash
- **Audit Trails**: Evidence submitted to compliance frameworks must be reproducible
- **Caching**: Content-addressed storage can deduplicate identical SBOMs
- **Debugging**: Engineers can reproduce exact SBOM state from artifact digest
### 1.2 Verifiable Gates
Policy gates rely on SBOM hashes for trust verification:
```plaintext
Artifact Digest → SBOM Generation → Canonical Hash → DSSE Signature → Policy Evaluation
```
If SBOM generation is non-deterministic, the same artifact could produce different hashes, breaking:
- Signature verification (hash mismatch)
- Gate decisions (different vulnerability sets)
- Attestation chains (broken proof lineage)
### 1.3 Trust Chaining
Evidence chains require stable identifiers. A release component's `SbomDigest` must match the SBOM retrieved later for verification. Non-determinism breaks this chain:
```plaintext
Release Finalization: SbomDigest = sha256:abc123...
Later Verification: sha256(regenerated-sbom) = sha256:xyz789... ← BROKEN
```
---
## 2. Canonicalization Rules
Stella Ops uses [RFC 8785 JSON Canonicalization Scheme (JCS)](https://tools.ietf.org/html/rfc8785) for deterministic JSON serialization.
### 2.1 Core JCS Rules
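1. **No Whitespace**: No whitespace between tokens: no indentation, newlines, or padding
2. **Sorted Keys**: Object keys are sorted by their UTF-16 code units (identical to code point order for characters in the Basic Multilingual Plane)
3. **Normalized Numbers**: Numbers are serialized as ECMAScript does: no leading zeros, no trailing fractional zeros, and exponents written as in `1e+30`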
4. **UTF-8 Encoding**: All strings encoded as UTF-8 without BOM
5. **No Duplicate Keys**: Object keys must be unique
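As a rough illustration of the whitespace, key-ordering, and encoding rules, the following Python sketch canonicalizes a parsed JSON value using only the standard library. It is neither the `StellaOps.Canonical.Json` implementation nor a complete RFC 8785 implementation: floating-point values are rejected instead of being serialized with the ECMAScript number algorithm, and the function names are hypothetical.

```python
import hashlib
import json


def _order_keys(value):
    """Recursively rebuild dicts with keys sorted by UTF-16 code units."""
    if isinstance(value, dict):
        ordered = sorted(value, key=lambda k: k.encode("utf-16-be"))
        return {k: _order_keys(value[k]) for k in ordered}
    if isinstance(value, list):
        # Array order is significant in JSON and is preserved as-is.
        return [_order_keys(item) for item in value]
    if isinstance(value, float):
        # Assumption: floats are out of scope for this sketch.
        raise TypeError("float serialization is not covered by this sketch")
    return value


def canonicalize(value) -> bytes:
    """Serialize without whitespace, with ordered keys, as UTF-8 without BOM."""
    text = json.dumps(_order_keys(value), separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Two inputs that differ only in key order and formatting.
a = canonicalize(json.loads('{"b": 1, "a": [2, 1]}'))
b = canonicalize(json.loads('{ "a": [2, 1],\n  "b": 1 }'))
print(a.decode())                        # {"a":[2,1],"b":1}
print(sha256_hex(a) == sha256_hex(b))    # True
```

Because the array `[2, 1]` keeps its order, any array whose order is not meaningful has to be sorted before this step.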
### 2.2 Implementation
```csharp
// Using StellaOps.Canonical.Json
using StellaOps.Canonical.Json;
// Canonicalize raw JSON bytes
byte[] canonical = CanonJson.CanonicalizeParsedJson(jsonBytes);
// Compute SHA-256 of canonical form
string digest = CanonJson.Sha256Hex(canonical);
```
### 2.3 SBOM-Specific Ordering
Beyond JCS, Stella Ops applies additional ordering for SBOM elements:
| Element | Ordering Strategy |
|---------|-------------------|
| `components` | Sorted by `bom-ref` (Ordinal) |
| `dependencies` | Sorted by `ref` (Ordinal) |
| `hashes` | Sorted by `alg` (Ordinal) |
| `licenses` | Sorted by license ID (Ordinal) |
| `dependsOn` | Sorted lexicographically |
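JCS sorts object keys but preserves array element order, so these arrays are sorted explicitly. This ensures that the order in which components are discovered doesn't affect the canonical hash.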
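The ordering table can be sketched in Python as follows. This is an illustration under assumptions, not the Stella Ops writer: it treats "Ordinal" as a comparison of UTF-16 code units, assumes license entries have the shape `{"license": {"id": ...}}`, and only sorts `hashes` and `licenses` found directly on components. The helper names are hypothetical.

```python
import copy


def ordinal(text: str) -> bytes:
    """Sort key that compares strings by UTF-16 code units."""
    return text.encode("utf-16-be")


def license_key(entry: dict) -> bytes:
    # Assumption: entries look like {"license": {"id": "MIT"}};
    # entries without an ID sort first.
    return ordinal(entry.get("license", {}).get("id", ""))


def sort_sbom_arrays(bom: dict) -> dict:
    """Return a copy of the SBOM with order-insensitive arrays sorted."""
    result = copy.deepcopy(bom)
    components = result.get("components", [])
    for component in components:
        if "hashes" in component:
            component["hashes"].sort(key=lambda h: ordinal(h["alg"]))
        if "licenses" in component:
            component["licenses"].sort(key=license_key)
    components.sort(key=lambda c: ordinal(c["bom-ref"]))

    dependencies = result.get("dependencies", [])
    for dependency in dependencies:
        if "dependsOn" in dependency:
            dependency["dependsOn"].sort(key=ordinal)
    dependencies.sort(key=lambda d: ordinal(d["ref"]))
    return result


bom = {
    "components": [
        {"bom-ref": "b", "hashes": [{"alg": "SHA-256"}, {"alg": "MD5"}]},
        {"bom-ref": "a"},
    ],
    "dependencies": [{"ref": "b", "dependsOn": ["z", "a"]}, {"ref": "a"}],
}
print([c["bom-ref"] for c in sort_sbom_arrays(bom)["components"]])  # ['a', 'b']
```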
---
## 3. Identity Field Derivation
### 3.1 serialNumber (CycloneDX)
**Rule**: Use `urn:sha256:<artifact-digest>` format for deterministic identification.
```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "serialNumber": "urn:sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
```
**Benefits**:
- Directly ties SBOM identity to the artifact it describes
- Enables verification: `serialNumber == urn:sha256:$(sha256sum artifact | cut -d' ' -f1)`
- Content-addressed: identical artifacts produce identical serialNumbers
**Fallback**: If the artifact digest is unavailable, a UUIDv5 derived from the sorted components is used for backwards compatibility. This produces a warning during validation (see 7.2).
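A minimal Python sketch of the rule and its fallback is shown below. The UUIDv5 namespace and the way component references are joined into the UUID name are assumptions made for illustration; this document does not specify them, and the function name is hypothetical.

```python
import hashlib
import uuid
from typing import List, Optional

# Hypothetical namespace for the fallback; the real one is not specified here.
FALLBACK_NAMESPACE = uuid.NAMESPACE_URL


def serial_number(artifact_digest: Optional[str], component_refs: List[str]) -> str:
    """Prefer the artifact digest; fall back to a UUIDv5 over sorted refs."""
    if artifact_digest:
        return f"urn:sha256:{artifact_digest.lower()}"
    # Sorting first makes the fallback independent of discovery order.
    name = "\n".join(sorted(component_refs))
    return f"urn:uuid:{uuid.uuid5(FALLBACK_NAMESPACE, name)}"


# The example digest used in this guide is the SHA-256 of empty input.
digest = hashlib.sha256(b"").hexdigest()
print(serial_number(digest, []))
# urn:sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```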
### 3.2 bom-ref
**Rule**: Use deterministic derivation based on purl or component identity.
```plaintext
bom-ref = sha256(purl || name || version)[:12] // truncated hash
```
Or use the package URL directly if available:
```json
{
  "bom-ref": "pkg:npm/lodash@4.17.21",
  "name": "lodash",
  "version": "4.17.21",
  "purl": "pkg:npm/lodash@4.17.21"
}
```
**Anti-pattern**: Random UUIDs or incrementing counters as bom-ref.
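The two derivations can be sketched in Python as follows. The sketch reads `||` as concatenation and joins the fields with a NUL separator so that different field splits cannot produce the same input; that separator is an assumption, not something this guide specifies, and `derive_bom_ref` is a hypothetical name.

```python
import hashlib
from typing import Optional


def derive_bom_ref(name: str, version: str, purl: Optional[str] = None,
                   prefer_purl: bool = True) -> str:
    """Return the purl itself, or a 12-hex-character truncated SHA-256."""
    if purl and prefer_purl:
        return purl
    # Assumption: NUL-separated fields, so ("ab", "c") and ("a", "bc") differ.
    material = "\x00".join([purl or "", name, version]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:12]


print(derive_bom_ref("lodash", "4.17.21", "pkg:npm/lodash@4.17.21"))
# pkg:npm/lodash@4.17.21
print(len(derive_bom_ref("internal-lib", "2.0.0")))  # 12
```

Twelve hex characters carry 48 bits, so collisions are unlikely within a single SBOM but not impossible. Using the purl directly avoids the question when a purl exists.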
### 3.3 SPDX Document Namespace
**Rule**: Use artifact-derived namespace for SPDX documents.
```plaintext
DocumentNamespace: https://stella-ops.org/spdx/sha256/<artifact-digest>
```
---
## 4. Ephemeral Data Policy
Certain SBOM fields are inherently non-deterministic and should be handled carefully.
### 4.1 Prunable Fields
These fields should be omitted or normalized before hashing:
| Field | Treatment |
|-------|-----------|
| `metadata.timestamp` | Use fixed epoch or artifact build time |
| `metadata.tools[].version` | Optional: pin tool versions |
| File paths (absolute) | Convert to relative paths |
| Environment variables | Exclude from SBOM |
### 4.2 Timestamp Strategy
Option 1: **Fixed Epoch** (Recommended)
```json
"timestamp": "1970-01-01T00:00:00Z"
```
Option 2: **Artifact Build Time**
```json
"timestamp": "<artifact-created-at>"
```
Option 3: **Omit Field**
CycloneDX does not require `metadata.timestamp`, so the field can be left out entirely.
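The three timestamp options can be expressed as one normalization step applied before canonicalization. The Python sketch below is illustrative only; the function name and the strategy labels (`"epoch"`, `"build-time"`, `"omit"`) are hypothetical.

```python
import copy
from typing import Optional

FIXED_EPOCH = "1970-01-01T00:00:00Z"


def normalize_timestamp(bom: dict, strategy: str = "epoch",
                        build_time: Optional[str] = None) -> dict:
    """Return a copy of the SBOM with metadata.timestamp made deterministic."""
    result = copy.deepcopy(bom)
    if strategy == "omit":
        # Remove the field if present; do not create an empty metadata object.
        result.get("metadata", {}).pop("timestamp", None)
        return result
    if strategy == "epoch":
        value = FIXED_EPOCH
    elif strategy == "build-time":
        if build_time is None:
            raise ValueError("build_time is required for the build-time strategy")
        value = build_time
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    result.setdefault("metadata", {})["timestamp"] = value
    return result


bom = {"metadata": {"timestamp": "2026-01-19T10:15:30Z"}}
print(normalize_timestamp(bom)["metadata"]["timestamp"])  # 1970-01-01T00:00:00Z
```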
### 4.3 Tool Metadata
Tool information aids debugging but affects hashes:
```json
"tools": [
  {
    "vendor": "Stella Ops",
    "name": "stella-scanner",
    "version": "1.0.0"
  }
]
```
**Recommendation**: Pin tool versions in CI configuration to ensure reproducibility.
---
## 5. Verification Workflow
### 5.1 CLI Commands
**Verify Canonical Form**:
```bash
stella sbom verify input.json --canonical
# Exit 0: Input is canonical
# Exit 1: Input is not canonical (outputs SHA-256 of canonical form)
```
**Canonicalize and Output**:
```bash
stella sbom verify input.json --canonical --output bom.canonical.json
# Writes: bom.canonical.json (canonical SBOM)
# Writes: bom.canonical.json.sha256 (digest sidecar)
```
**Verbose Output**:
```bash
stella sbom verify input.json --canonical --verbose
# SHA-256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
# Canonical: yes
# Input size: 15234 bytes
# Canonical size: 12456 bytes
```
### 5.2 CI Gate Integration
```yaml
# .gitea/workflows/sbom-gate.yaml
steps:
  - name: Generate SBOM
    run: stella sbom generate --artifact ${{ artifact }} --output bom.json
  - name: Verify Canonical
    run: |
      # Test the command directly: with `set -e`, a later `$?` check would never run
      if ! stella sbom verify bom.json --canonical --output bom.canonical.json; then
        echo "SBOM is not in canonical form"
        exit 1
      fi
  - name: Sign SBOM
    run: stella sbom sign bom.canonical.json --key ${{ signing_key }}
  - name: Store Digest
    run: |
      DIGEST=$(cat bom.canonical.json.sha256)
      echo "SBOM_DIGEST=$DIGEST" >> "$GITHUB_ENV"
```
### 5.3 Release Finalization
At release finalization, the SBOM digest is captured:
```plaintext
1. Lookup SBOM for artifact: ISbomService.GetByDigestAsync(artifact.Digest)
2. Extract canonical digest: sbom.SbomSha256
3. Store on ReleaseComponent: component.SbomDigest = sbom.SbomSha256
4. Include in release manifest hash computation
```
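The later verification that this stored digest enables can be sketched as a comparison between `SbomDigest` and the hash of the retrieved canonical bytes. The Python below is a hypothetical illustration: it assumes the `sha256:<hex>` form shown in section 1.3 and is not the release orchestrator's code.

```python
import hashlib
import hmac


def matches_sbom_digest(stored_digest: str, canonical_sbom: bytes) -> bool:
    """Check canonical SBOM bytes against a stored 'sha256:<hex>' digest."""
    algorithm, _, expected = stored_digest.partition(":")
    if algorithm != "sha256" or not expected:
        raise ValueError(f"unsupported digest format: {stored_digest!r}")
    actual = hashlib.sha256(canonical_sbom).hexdigest()
    return hmac.compare_digest(actual, expected.lower())


sbom_bytes = b'{"bomFormat":"CycloneDX"}'
stored = "sha256:" + hashlib.sha256(sbom_bytes).hexdigest()
print(matches_sbom_digest(stored, sbom_bytes))          # True
print(matches_sbom_digest(stored, sbom_bytes + b"\n"))  # False
```

The second call shows why the canonical bytes, not a reformatted copy, must be stored: a single trailing newline changes the digest.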
---
## 6. KPIs and Monitoring
### 6.1 Byte-Identical Rate
**Metric**: Percentage of SBOM regenerations that produce identical bytes.
**Target**: 100% for same artifact + same scanner version
**Alert**: A rate below 99.9% indicates a non-determinism bug
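One way to compute this metric is sketched below in Python. It assumes each scan is recorded as a tuple of artifact digest, scanner version, and SBOM hash, and compares every regeneration with the first SBOM seen for the same artifact and scanner version. The record shape and function name are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def byte_identical_rate(runs: Iterable[Tuple[str, str, str]]) -> float:
    """Percentage of regenerations whose hash equals the first run's hash."""
    groups = defaultdict(list)
    for artifact_digest, scanner_version, sbom_sha256 in runs:
        groups[(artifact_digest, scanner_version)].append(sbom_sha256)

    regenerations = 0
    identical = 0
    for hashes in groups.values():
        baseline = hashes[0]
        for later in hashes[1:]:
            regenerations += 1
            if later == baseline:
                identical += 1
    # With no regenerations there is nothing to contradict determinism.
    return 100.0 if regenerations == 0 else 100.0 * identical / regenerations


runs = [
    ("sha256:aaa", "1.0.0", "h1"),
    ("sha256:aaa", "1.0.0", "h1"),
    ("sha256:aaa", "1.0.0", "h2"),  # one non-identical regeneration
    ("sha256:bbb", "1.0.0", "h9"),  # single run, not a regeneration
]
print(byte_identical_rate(runs))  # 50.0
```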
### 6.2 Stable-Field Coverage
**Metric**: Percentage of SBOM fields that are deterministic.
| Field Type | Target |
|------------|--------|
| Component identifiers | 100% |
| Hashes | 100% |
| Dependencies | 100% |
| Metadata timestamps | 95%+ (fixed epoch) |
| Tool versions | 90%+ (pinned) |
### 6.3 Gate False Positives
**Metric**: Signature verification failures due to hash mismatch.
**Target**: 0% for valid artifacts
**Investigation**: Any mismatch indicates a canonicalization or regeneration issue.
---
## 7. Troubleshooting
### 7.1 Hash Mismatch on Regeneration
**Symptom**: Same artifact produces different SBOM hashes.
**Causes**:
1. **Timestamp drift**: Check if `metadata.timestamp` varies
2. **Tool version change**: Check scanner/tool versions
3. **Ordering instability**: Check component/dependency ordering
4. **Unicode normalization**: Check for composed vs decomposed characters
**Debug**:
```bash
# Compare two SBOMs
stella sbom diff bom1.json bom2.json
# Check canonical form
stella sbom verify bom1.json --canonical --verbose
stella sbom verify bom2.json --canonical --verbose
```
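When the CLI is not at hand, the differing fields can be located with a small structural comparison. The Python sketch below is not the `stella sbom diff` implementation; it walks two parsed SBOMs and yields the JSON paths where they diverge, which usually points at one of the causes listed above.

```python
def diff_paths(left, right, path="$"):
    """Yield JSON paths at which two parsed documents differ."""
    if isinstance(left, dict) and isinstance(right, dict):
        for key in sorted(set(left) | set(right)):
            child = f"{path}.{key}"
            if key not in left or key not in right:
                yield child  # key present on one side only
            else:
                yield from diff_paths(left[key], right[key], child)
    elif isinstance(left, list) and isinstance(right, list):
        if len(left) != len(right):
            yield f"{path} (length {len(left)} vs {len(right)})"
        for index, (l_item, r_item) in enumerate(zip(left, right)):
            yield from diff_paths(l_item, r_item, f"{path}[{index}]")
    elif type(left) is not type(right) or left != right:
        yield path


bom1 = {"metadata": {"timestamp": "2026-01-19T10:00:00Z"},
        "components": [{"name": "lodash"}]}
bom2 = {"metadata": {"timestamp": "2026-01-19T10:05:00Z"},
        "components": [{"name": "lodash"}]}
print(list(diff_paths(bom1, bom2)))  # ['$.metadata.timestamp']
```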
### 7.2 serialNumber Warning
**Symptom**: Warning `CDX_SERIAL_NON_DETERMINISTIC` during validation.
**Cause**: SBOM uses `urn:uuid:` format instead of `urn:sha256:`.
**Fix**: Ensure `ArtifactDigest` is provided when generating the SBOM:
```csharp
var document = new SbomDocument
{
    Name = "my-app",
    ArtifactDigest = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    // ...
};
```
### 7.3 Canonical vs Pretty-Printed
**Symptom**: SBOM appears valid but fails canonical verification.
**Cause**: SBOM was saved with indentation/formatting.
**Fix**:
```bash
# Convert to canonical form
stella sbom verify input.json --canonical --output output.json
# Use output.json for signing and storage
```
### 7.4 Platform-Specific Differences
**Symptom**: Same code produces different SBOMs on Windows vs Linux.
**Causes**:
1. **Line endings**: CR+LF vs LF in embedded content
2. **Path separators**: `\` vs `/` in file paths
3. **Locale differences**: Number formatting, date formatting
**Prevention**:
- Normalize line endings in CI
- Use forward slashes for paths
- Use invariant culture for formatting
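The first two prevention steps, together with the Unicode normalization check from 7.1, can be sketched in Python as below. Whether embedded content should be normalized to NFC is a policy decision this guide does not settle, so that part is an assumption, and the helper names are hypothetical.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Use LF line endings and composed (NFC) Unicode characters."""
    return unicodedata.normalize("NFC", text.replace("\r\n", "\n"))


def normalize_path(path: str) -> str:
    """Use forward slashes regardless of the platform that produced the path."""
    return path.replace("\\", "/")


print(normalize_path("src\\app\\package.json"))   # src/app/package.json
print(normalize_text("line1\r\nline2") == "line1\nline2")  # True
print(normalize_text("e\u0301") == "\u00e9")      # True
```

For the third step, the sketch needs no extra code: Python's default `str()` and `format()` conversions do not consult the locale, which plays the role that the invariant culture plays in .NET.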
---
## References
- [RFC 8785: JSON Canonicalization Scheme](https://tools.ietf.org/html/rfc8785)
- [CycloneDX 1.6 Specification](https://cyclonedx.org/docs/1.6/json/)
- [SPDX 2.3 Specification](https://spdx.github.io/spdx-spec/v2.3/)
- `docs/modules/scanner/signed-sbom-archive-spec.md` - Archive format
- `docs/modules/scanner/deterministic-sbom-compose.md` - Composition rules
- `src/Attestor/__Libraries/StellaOps.Attestor.StandardPredicates/Writers/CycloneDxWriter.cs` - Implementation
- `src/__Libraries/StellaOps.Canonical.Json/CanonJson.cs` - Canonicalization library
---
## Changelog
| Date | Change |
|------|--------|
| 2026-01-19 | Initial creation (TASK-025-005) |