# SBOM Determinism Guide

> **Sprint**: SPRINT_20260118_025_ReleaseOrchestrator_sbom_release_association
> **Task**: TASK-025-005
> **Status**: Living Document

This document consolidates all determinism requirements for Stella Ops SBOMs. Deterministic SBOMs are critical for reproducible builds, verifiable release gates, and trust chain integrity.

---

## 1. Why Determinism Matters

### 1.1 Reproducibility

Deterministic SBOMs ensure that scanning the same artifact multiple times produces identical output. This is essential for:

- **CI/CD Reliability**: Re-running a pipeline should produce the same SBOM hash
- **Audit Trails**: Evidence submitted to compliance frameworks must be reproducible
- **Caching**: Content-addressed storage can deduplicate identical SBOMs
- **Debugging**: Engineers can reproduce exact SBOM state from artifact digest

### 1.2 Verifiable Gates

Policy gates rely on SBOM hashes for trust verification:

```plaintext
Artifact Digest → SBOM Generation → Canonical Hash → DSSE Signature → Policy Evaluation
```

If SBOM generation is non-deterministic, the same artifact could produce different hashes, breaking:
- Signature verification (hash mismatch)
- Gate decisions (different vulnerability sets)
- Attestation chains (broken proof lineage)

### 1.3 Trust Chaining

Evidence chains require stable identifiers. A release component's `SbomDigest` must match the SBOM retrieved later for verification. Non-determinism breaks this chain:

```plaintext
Release Finalization: SbomDigest = sha256:abc123...
Later Verification: sha256(regenerated-sbom) = sha256:xyz789... ← BROKEN
```

---

## 2. Canonicalization Rules

Stella Ops uses [RFC 8785 JSON Canonicalization Scheme (JCS)](https://tools.ietf.org/html/rfc8785) for deterministic JSON serialization.

### 2.1 Core JCS Rules

1. **No Whitespace**: Output contains no insignificant whitespace (no newlines, indentation, or spaces between tokens)
2. **Sorted Keys**: Object keys are sorted lexicographically by their UTF-16 code units
3. **Normalized Numbers**: Numbers are serialized per ECMAScript rules (shortest round-trip form, no leading zeros, no trailing fractional zeros)
4. **UTF-8 Encoding**: All strings encoded as UTF-8 without BOM
5. **No Duplicate Keys**: Object keys must be unique

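
The rules above can be illustrated with a minimal Python sketch. It is not the `StellaOps.Canonical.Json` implementation: it covers only a subset of RFC 8785 (strings, integers within the IEEE 754 safe range, booleans, null, arrays, objects) and rejects floats, because JCS number formatting needs an ECMAScript-compatible serializer. The function names `canonicalize` and `canonical_sha256` are hypothetical.

```python
import hashlib
import json


def _utf16_key(key: str) -> bytes:
    # RFC 8785 orders property names by UTF-16 code units, not code points.
    return key.encode("utf-16-be")


def canonicalize(value) -> str:
    """Serialize a restricted JSON value (no floats) in JCS-style canonical form."""
    if value is None or isinstance(value, (bool, str)):
        # json.dumps emits null/true/false and minimally escaped strings.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, int):
        return str(value)
    if isinstance(value, list):
        return "[" + ",".join(canonicalize(item) for item in value) + "]"
    if isinstance(value, dict):
        # Python dicts cannot hold duplicate keys, so uniqueness is implicit here.
        items = sorted(value.items(), key=lambda kv: _utf16_key(kv[0]))
        body = ",".join(
            json.dumps(key, ensure_ascii=False) + ":" + canonicalize(val)
            for key, val in items
        )
        return "{" + body + "}"
    raise TypeError(f"unsupported type for this sketch: {type(value).__name__}")


def canonical_sha256(value) -> str:
    # Hash the UTF-8 bytes of the canonical form (no BOM).
    return hashlib.sha256(canonicalize(value).encode("utf-8")).hexdigest()
```

Two objects that differ only in key order, such as `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}`, yield the same digest under this sketch.
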
### 2.2 Implementation

```csharp
// Using StellaOps.Canonical.Json
using StellaOps.Canonical.Json;

// Canonicalize raw JSON bytes
byte[] canonical = CanonJson.CanonicalizeParsedJson(jsonBytes);

// Compute SHA-256 of canonical form
string digest = CanonJson.Sha256Hex(canonical);
```

### 2.3 SBOM-Specific Ordering

Beyond JCS, Stella Ops applies additional ordering for SBOM elements:

| Element | Ordering Strategy |
|---------|-------------------|
| `components` | Sorted by `bom-ref` (Ordinal) |
| `dependencies` | Sorted by `ref` (Ordinal) |
| `hashes` | Sorted by `alg` (Ordinal) |
| `licenses` | Sorted by license ID (Ordinal) |
| `dependsOn` | Sorted lexicographically |

This ensures component order doesn't affect the canonical hash.

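
The ordering table can be sketched in Python as follows. This is an illustration under assumptions, not the Stella Ops writer: `sort_sbom`, `ordinal`, and `license_key` are hypothetical names, "Ordinal" is approximated by comparing UTF-16 code units, and the license entry shape (`license.id`, falling back to `license.name` or `expression`) is assumed from the CycloneDX JSON layout.

```python
import copy


def ordinal(text: str) -> bytes:
    # Approximates an ordinal comparison by comparing UTF-16 code units.
    return text.encode("utf-16-be")


def license_key(entry: dict) -> bytes:
    # Assumed shape: {"license": {"id": ...}} with name/expression fallbacks.
    lic = entry.get("license", {})
    return ordinal(lic.get("id") or lic.get("name") or entry.get("expression", ""))


def sort_sbom(bom: dict) -> dict:
    """Return a copy of the BOM with every list from the table in a stable order."""
    out = copy.deepcopy(bom)
    for comp in out.get("components", []):
        comp.get("hashes", []).sort(key=lambda h: ordinal(h["alg"]))
        comp.get("licenses", []).sort(key=license_key)
    out.get("components", []).sort(key=lambda c: ordinal(c["bom-ref"]))
    for dep in out.get("dependencies", []):
        dep.get("dependsOn", []).sort(key=ordinal)
    out.get("dependencies", []).sort(key=lambda d: ordinal(d["ref"]))
    return out
```
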

---

## 3. Identity Field Derivation

### 3.1 serialNumber (CycloneDX)

**Rule**: Use `urn:sha256:<artifact-digest>` format for deterministic identification.

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.7",
  "serialNumber": "urn:sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
```

**Benefits**:
- Directly ties SBOM identity to the artifact it describes
- Enables verification: `serialNumber == urn:sha256:$(sha256sum artifact | cut -d' ' -f1)`
- Content-addressed: identical artifacts produce identical serialNumbers

**Fallback**: If the artifact digest is unavailable, a UUIDv5 derived from the sorted components is used for backwards compatibility. This produces a warning during validation.

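
A minimal Python sketch of the rule and its fallback follows. The function name `derive_serial_number` is hypothetical, and the fallback details are assumptions: the guide does not specify the UUIDv5 namespace or how components are joined, so this sketch uses `uuid.NAMESPACE_URL` and a newline-joined, sorted list of component references.

```python
import string
import uuid


def derive_serial_number(artifact_digest, component_refs=()):
    """Prefer the artifact digest; fall back to a UUIDv5 over sorted components."""
    if artifact_digest:
        hex_digest = artifact_digest.removeprefix("sha256:").lower()
        if len(hex_digest) != 64 or any(ch not in string.hexdigits for ch in hex_digest):
            raise ValueError("expected a 64-character hex SHA-256 digest")
        return f"urn:sha256:{hex_digest}"
    # Assumed fallback: namespace and joining scheme are illustrative only.
    name = "\n".join(sorted(component_refs))
    return f"urn:uuid:{uuid.uuid5(uuid.NAMESPACE_URL, name)}"
```

Because the fallback sorts its input before hashing, the same set of components gives the same `urn:uuid:` value regardless of discovery order.
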
### 3.2 bom-ref

**Rule**: Use deterministic derivation based on purl or component identity.

```plaintext
bom-ref = sha256(purl || name || version)[:12] // truncated hash
```

Or use the package URL directly if available:

```json
{
  "bom-ref": "pkg:npm/lodash@4.17.21",
  "name": "lodash",
  "version": "4.17.21",
  "purl": "pkg:npm/lodash@4.17.21"
}
```

**Anti-pattern**: Random UUIDs or incrementing counters as bom-ref.

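
The derivation can be sketched in Python as shown here. Several details are assumptions rather than documented behavior: `||` is read as concatenation, the truncation takes the first 12 hex characters, inputs are UTF-8 encoded, and a NUL separator is added between fields so that `("ab", "c")` and `("a", "bc")` cannot collide. The function name `derive_bom_ref` is hypothetical.

```python
import hashlib


def derive_bom_ref(name: str, version: str, purl: str = "") -> str:
    """Use the purl directly when present, else a truncated identity hash."""
    if purl:
        return purl
    # Assumed encoding: NUL-separated UTF-8 fields, hex digest cut to 12 chars.
    material = "\x00".join([purl, name, version]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:12]
```

Either branch depends only on the component's own identity, so repeated scans assign the same `bom-ref`.
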
### 3.3 SPDX Document Namespace

**Rule**: Use artifact-derived namespace for SPDX documents.

```plaintext
DocumentNamespace: https://stella-ops.org/spdx/sha256/<artifact-digest>
```

---

## 4. Ephemeral Data Policy

Certain SBOM fields are inherently non-deterministic and should be handled carefully.

### 4.1 Prunable Fields

These fields should be omitted or normalized before hashing:

| Field | Treatment |
|-------|-----------|
| `metadata.timestamp` | Use fixed epoch or artifact build time |
| `metadata.tools[].version` | Optional: pin tool versions |
| File paths (absolute) | Convert to relative paths |
| Environment variables | Exclude from SBOM |

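
Two of the treatments in the table can be sketched in Python. The helper names `normalize_timestamp` and `to_relative` are hypothetical, and where file paths actually appear in a given SBOM is left open; the sketch only shows the transformation itself.

```python
import copy
import posixpath

FIXED_EPOCH = "1970-01-01T00:00:00Z"


def normalize_timestamp(bom: dict, build_time: str = "") -> dict:
    """Return a copy whose metadata.timestamp is the build time or a fixed epoch."""
    out = copy.deepcopy(bom)
    out.setdefault("metadata", {})["timestamp"] = build_time or FIXED_EPOCH
    return out


def to_relative(path: str, root: str) -> str:
    """Rewrite an absolute path as a forward-slash path relative to root."""
    # Normalize separators first so Windows-style input is handled the same way.
    unix_path = path.replace("\\", "/")
    unix_root = root.replace("\\", "/")
    return posixpath.relpath(unix_path, unix_root)
```

For example, `to_relative("/build/ws/src/a.py", "/build/ws")` returns `src/a.py`, so the workspace location no longer leaks into the hashed document.
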
### 4.2 Timestamp Strategy

Option 1: **Fixed Epoch** (Recommended)
```json
"timestamp": "1970-01-01T00:00:00Z"
```

Option 2: **Artifact Build Time**
```json
"timestamp": "<artifact-created-at>"
```

Option 3: **Omit Field**
```json
// No timestamp field - allowed by CycloneDX
```

### 4.3 Tool Metadata

Tool information aids debugging but affects hashes, so the `version` value should stay pinned:

```json
"tools": [
  {
    "vendor": "Stella Ops",
    "name": "stella-scanner",
    "version": "1.0.0"
  }
]
```

**Recommendation**: Pin tool versions in CI configuration to ensure reproducibility.

---

## 5. Verification Workflow

### 5.1 CLI Commands

**Verify Canonical Form**:
```bash
stella sbom verify input.json --canonical
# Exit 0: Input is canonical
# Exit 1: Input is not canonical (outputs SHA-256 of canonical form)
```

**Canonicalize and Output**:
```bash
stella sbom verify input.json --canonical --output bom.canonical.json
# Writes: bom.canonical.json (canonical SBOM)
# Writes: bom.canonical.json.sha256 (digest sidecar)
```

**Verbose Output**:
```bash
stella sbom verify input.json --canonical --verbose
# SHA-256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
# Canonical: yes
# Input size: 15234 bytes
# Canonical size: 12456 bytes
```

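
The check behind `--canonical` can be pictured with a short Python sketch: parse the input, re-serialize it canonically, and compare bytes. This is an assumption about the logic, not the CLI's source. `check_canonical` is a hypothetical name, and `json.dumps` with `sort_keys=True` only approximates RFC 8785 (it matches for ASCII keys and float-free documents).

```python
import hashlib
import json


def check_canonical(raw: bytes):
    """Return (is_canonical, sha256_of_canonical_form) for raw JSON bytes."""
    parsed = json.loads(raw)
    canonical = json.dumps(
        parsed, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    # The input is canonical only if it is byte-identical to its canonical form.
    return raw == canonical, hashlib.sha256(canonical).hexdigest()
```

A pretty-printed document and its compact equivalent return the same digest, but only the compact one reports `True`.
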
### 5.2 CI Gate Integration

```yaml
# .gitea/workflows/sbom-gate.yaml
steps:
  - name: Generate SBOM
    run: stella sbom generate --artifact ${{ artifact }} --output bom.json

  - name: Verify Canonical
    run: |
      # Test the command directly: the default shell exits on the first
      # failing command, so a separate "$?" check would never run.
      if ! stella sbom verify bom.json --canonical --output bom.canonical.json; then
        echo "SBOM is not in canonical form"
        exit 1
      fi

  - name: Sign SBOM
    run: stella sbom sign bom.canonical.json --key ${{ signing_key }}

  - name: Store Digest
    run: |
      DIGEST=$(cat bom.canonical.json.sha256)
      echo "SBOM_DIGEST=$DIGEST" >> "$GITHUB_ENV"
```

### 5.3 Release Finalization

At release finalization, the SBOM digest is captured:

```plaintext
1. Lookup SBOM for artifact: ISbomService.GetByDigestAsync(artifact.Digest)
2. Extract canonical digest: sbom.SbomSha256
3. Store on ReleaseComponent: component.SbomDigest = sbom.SbomSha256
4. Include in release manifest hash computation
```

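
The four steps can be modeled with an in-memory Python sketch. Everything here is illustrative: `SbomRecord`, `ReleaseComponent`, and `finalize` are hypothetical stand-ins for the real services, the dictionary replaces `ISbomService`, and the manifest hash layout (sorted, newline-joined fields) is an assumption.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class SbomRecord:
    artifact_digest: str
    sbom_sha256: str


@dataclass
class ReleaseComponent:
    name: str
    artifact_digest: str
    sbom_digest: Optional[str] = None


def finalize(components, sbom_index) -> str:
    """Attach SBOM digests to components and return a manifest hash."""
    for component in components:
        record = sbom_index[component.artifact_digest]  # step 1: lookup (KeyError if missing)
        component.sbom_digest = record.sbom_sha256      # steps 2-3: extract and store
    # Step 4: hash a sorted listing so component order cannot change the result.
    lines = sorted(
        f"{c.name}\n{c.artifact_digest}\n{c.sbom_digest}" for c in components
    )
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
```
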

---

## 6. KPIs and Monitoring

### 6.1 Byte-Identical Rate

**Metric**: Percentage of SBOM regenerations that produce identical bytes.

**Target**: 100% for the same artifact and the same scanner version

**Alert**: A rate below 99.9% indicates a non-determinism bug

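
One way to compute this metric is sketched below in Python. The input shape and the name `byte_identical_rate` are assumptions: each run is a tuple of artifact digest, scanner version, and SBOM bytes, and every regeneration is compared with the first run for the same artifact and scanner version.

```python
import hashlib
from collections import defaultdict


def byte_identical_rate(runs) -> float:
    """Percentage of regenerations whose bytes match the first run of their group."""
    groups = defaultdict(list)
    for artifact_digest, scanner_version, sbom_bytes in runs:
        key = (artifact_digest, scanner_version)
        groups[key].append(hashlib.sha256(sbom_bytes).hexdigest())

    total = 0
    identical = 0
    for hashes in groups.values():
        baseline = hashes[0]
        for digest in hashes[1:]:  # every run after the first is a regeneration
            total += 1
            identical += digest == baseline
    # With no regenerations there is nothing to contradict the target.
    return 100.0 if total == 0 else 100.0 * identical / total
```

Grouping by scanner version keeps an intentional scanner upgrade from being counted as a determinism failure.
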
### 6.2 Stable-Field Coverage

**Metric**: Percentage of SBOM fields that are deterministic.

| Field Type | Target |
|------------|--------|
| Component identifiers | 100% |
| Hashes | 100% |
| Dependencies | 100% |
| Metadata timestamps | 95%+ (fixed epoch) |
| Tool versions | 90%+ (pinned) |

### 6.3 Gate False Positives

**Metric**: Signature verification failures due to hash mismatch.

**Target**: 0% for valid artifacts

**Investigation**: Any mismatch indicates a canonicalization or regeneration issue.

---

## 7. Troubleshooting

### 7.1 Hash Mismatch on Regeneration

**Symptom**: Same artifact produces different SBOM hashes.

**Causes**:
1. **Timestamp drift**: Check if `metadata.timestamp` varies
2. **Tool version change**: Check scanner/tool versions
3. **Ordering instability**: Check component/dependency ordering
4. **Unicode normalization**: Check for composed vs decomposed characters

**Debug**:
```bash
# Compare two SBOMs
stella sbom diff bom1.json bom2.json

# Check canonical form
stella sbom verify bom1.json --canonical --verbose
stella sbom verify bom2.json --canonical --verbose
```

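
When the CLI is not at hand, a small Python helper can narrow down which cause applies by listing the JSON paths where two parsed SBOMs differ. This is a generic sketch, not the behavior of `stella sbom diff`; the name `diff_paths` and the `$`-rooted path notation are illustrative choices.

```python
def diff_paths(a, b, path="$"):
    """Return the JSON paths at which two parsed documents differ."""
    if type(a) is not type(b):
        return [path]
    if isinstance(a, dict):
        found = []
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}"
            if key not in a or key not in b:
                found.append(child)  # key present on one side only
            else:
                found.extend(diff_paths(a[key], b[key], child))
        return found
    if isinstance(a, list):
        if len(a) != len(b):
            return [path]
        found = []
        for index, (left, right) in enumerate(zip(a, b)):
            found.extend(diff_paths(left, right, f"{path}[{index}]"))
        return found
    return [] if a == b else [path]
```

A lone `$.metadata.timestamp` entry points at timestamp drift, while many `$.components[i]` entries usually point at ordering instability.
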
### 7.2 serialNumber Warning

**Symptom**: Warning `CDX_SERIAL_NON_DETERMINISTIC` during validation.

**Cause**: SBOM uses `urn:uuid:` format instead of `urn:sha256:`.

**Fix**: Ensure `ArtifactDigest` is provided when generating SBOM:

```csharp
var document = new SbomDocument
{
    Name = "my-app",
    ArtifactDigest = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    // ...
};
```

### 7.3 Canonical vs Pretty-Printed

**Symptom**: SBOM appears valid but fails canonical verification.

**Cause**: SBOM was saved with indentation/formatting.

**Fix**:
```bash
# Convert to canonical form
stella sbom verify input.json --canonical --output output.json

# Use output.json for signing and storage
```

### 7.4 Platform-Specific Differences

**Symptom**: Same code produces different SBOMs on Windows vs Linux.

**Causes**:
1. **Line endings**: CR+LF vs LF in embedded content
2. **Path separators**: `\` vs `/` in file paths
3. **Locale differences**: Number formatting, date formatting

**Prevention**:
- Normalize line endings in CI
- Use forward slashes for paths
- Use invariant culture for formatting

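
The first two prevention steps can be sketched as small Python helpers. The names `normalize_text` and `normalize_path` are hypothetical. Applying NFC in `normalize_text` is an extra assumption aimed at the composed vs decomposed cause listed in 7.1; JCS itself leaves string contents untouched, so any such normalization would have to happen before canonicalization.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Convert CR+LF and bare CR to LF, then apply NFC normalization."""
    unix_newlines = text.replace("\r\n", "\n").replace("\r", "\n")
    return unicodedata.normalize("NFC", unix_newlines)


def normalize_path(path: str) -> str:
    """Use forward slashes regardless of the platform that produced the path."""
    return path.replace("\\", "/")
```
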

---

## References

- [RFC 8785: JSON Canonicalization Scheme](https://tools.ietf.org/html/rfc8785)
- [CycloneDX 1.7 Specification](https://cyclonedx.org/docs/1.7/json/)
- [SPDX 2.3 Specification](https://spdx.github.io/spdx-spec/v2.3/)
- `docs/modules/scanner/signed-sbom-archive-spec.md` - Archive format
- `docs/modules/scanner/deterministic-sbom-compose.md` - Composition rules
- `src/Attestor/__Libraries/StellaOps.Attestor.StandardPredicates/Writers/CycloneDxWriter.cs` - Implementation
- `src/__Libraries/StellaOps.Canonical.Json/CanonJson.cs` - Canonicalization library

---

## Changelog

| Date | Change |
|------|--------|
| 2026-01-19 | Initial creation (TASK-025-005) |