Here’s a crisp, practical idea to harden Stella Ops: make the SBOM → VEX pipeline deterministic and verifiable by treating it as a series of signed, hash‑anchored state transitions—so every rebuild yields the same provenance envelope you can mathematically check across air‑gapped nodes.
What this means (plain English)
- SBOM (what’s inside): list of packages, files, and their hashes.
- VEX (what’s affected): statements like “CVE‑2024‑1234 is not exploitable here because X.”
- Deterministic: same inputs → byte‑identical outputs, every time.
- Verifiable transitions: each step (ingest → normalize → resolve → reachability → VEX) emits a signed attestation that pins its inputs/outputs by content hash.
Minimal design you can drop into Stella Ops
- Canonicalize everything
  - Sort JSON keys; normalize whitespace and line endings.
  - Freeze timestamps by recording them only in an outer envelope (not inside payloads used for hashing).
- Edge-level attestations
  - For each dependency edge in the reachability graph (nodeA → nodeB via symbol S), emit a tiny DSSE payload: {edge_id, from_purl, to_purl, rule_id, witness_hashes[]}.
  - The hash is over the canonical payload; sign via DSSE (Sigstore or your Authority PKI).
- Step attestations (pipeline states)
  - For each stage (Sbomer, Scanner, Vexer/Excititor, Concelier):
    - Emit predicateType: stellaops.dev/attestations/<stage>.
    - Include input_digests[], output_digests[], parameters_digest, tool_version.
    - Sign with the stage key; record the public key (or cert chain) in Authority.
- Provenance envelope
  - Build a top-level DSSE that includes:
    - The Merkle root of all edge attestations.
    - Merkle roots of each stage’s outputs.
    - A mapping table of PURL ↔ build-ID (ELF/PE/Mach-O) for stable identity.
- Replay manifest
  - A single, declarative file that pins:
    - Feeds (CPE/CVE/VEX sources + exact digests)
    - Rule/lattice versions and parameters
    - Container images + per-layer SHA-256 digests
    - Platform toggles (e.g., PQC on/off)
  - Running replay on this manifest must reproduce the same Merkle roots.
Air‑gap sync
- Export only the envelopes + Merkle roots + public certs.
- On the target, verify chains and recompute roots from the replay manifest—no internet required.
Slim C# shapes (DTOs) for DSSE predicates
public record EdgeAttestation(
string EdgeId,
string FromPurl,
string ToPurl,
string RuleId,
string[] WitnessHashes, // e.g., CFG slice, symbol tables, lineage JSON
string CanonicalAlgo = "SHA256");
public record StepAttestation(
string Stage, // "Sbomer" | "Scanner" | "Excititor" | "Concelier"
string ToolVersion,
string[] InputDigests,
string[] OutputDigests,
string ParametersDigest, // hash of canonicalized params
DateTimeOffset StartedAt,
DateTimeOffset FinishedAt);
public record ProvenanceEnvelope(
string ReplayManifestDigest,
string EdgeMerkleRoot,
Dictionary<string,string> StageMerkleRoots, // stage -> root
Dictionary<string,string> PurlToBuildId); // stable identity map
Determinism checklist (quick win)
- Canonical JSON (stable key order) everywhere.
- No wall‑clock timestamps inside hashed payloads.
- Only reference inputs by digest, never by URL.
- Lock rule sets / lattice policies by digest.
- Normalize file paths (POSIX style) and line endings.
- Container images by digest, not tags.
Why it’s worth it
- Auditability: every VEX claim is backed by a verifiable graph path with signed edges.
- Reproducibility: regulators (and customers) can replay your exact scan and get identical roots.
- Integrity at scale: air‑gapped sites can validate without trusting your network—just the math.
If you want, I’ll turn this into ready‑to‑paste .proto contracts + a small .NET library (StellaOps.Attestations) with DSSE signing/verification helpers and Merkle builders.
Got it — let’s turn that sketch into a concrete implementation plan your devs can actually execute.
I’ll structure this as:
- Objectives & scope
- High-level architecture
- Workstreams & milestones
- Detailed tasks per workstream
- Rollout, testing, and ops
You can copy/paste this straight into a tracking system and break it into tickets.
1. Objectives & scope
Primary objectives
- Make the SBOM → VEX pipeline deterministic:
  - Same inputs (SBOM, feeds, rules, images) → bit-identical provenance & VEX outputs.
- Make the pipeline verifiable:
  - Each step emits signed attestations with content hashes.
  - Attestations are chainable from the raw SBOM to VEX & reports.
- Make outputs replayable and air-gap friendly:
  - A single Replay Manifest can reconstruct pipeline outputs on another node and verify that the Merkle roots match.
Out of scope (for this phase)
- New vulnerability scanning engines.
- New UI views (beyond minimal “show provenance / verify”).
- Key management redesign (we’ll integrate with existing Authority / PKI).
2. High-level architecture
New shared library
Library name (example): StellaOps.Attestations (or similar)
Provides:
- Canonical serialization:
  - Deterministic JSON encoder (stable key ordering, normalized formatting).
- Hashing utilities:
  - SHA-256 (plus an extension point for future algorithms).
- DSSE wrapper:
  - Sign(payload, keyRef) → DSSE envelope.
  - Verify(dsse, keyResolver) → payload + key metadata.
- Merkle utilities:
  - Build Merkle trees from lists of digests.
- DTOs:
  - EdgeAttestation, StepAttestation, ProvenanceEnvelope, ReplayManifest.
Components that will integrate the library
- Sbomer – outputs SBOM + StepAttestation.
- Scanner – consumes SBOM, produces findings + StepAttestation.
- Excititor / Vexer – takes findings + reachability graph → VEX + EdgeAttestations + StepAttestation.
- Concelier – takes SBOM + VEX → reports + StepAttestation + ProvenanceEnvelope.
- Authority – manages keys and verification (possibly separate microservice or shared module).
3. Workstreams & milestones
Break this into parallel workstreams:
- WS1 – Canonicalization & hashing
- WS2 – DSSE & key integration
- WS3 – Attestation schemas & Merkle envelopes
- WS4 – Pipeline integration (Sbomer, Scanner, Excititor, Concelier)
- WS5 – Replay engine & CLI
- WS6 – Verification / air‑gap support
- WS7 – Testing, observability, and rollout
Each workstream below has concrete tasks + “Definition of Done” (DoD).
4. Detailed tasks per workstream
WS1 – Canonicalization & hashing
Goal: A small, well-tested core that makes everything deterministic.
Tasks
- Define the canonical JSON format
  - Decision doc:
    - Use UTF-8.
    - No insignificant whitespace.
    - Keys always sorted lexicographically.
    - No embedded timestamps or non-deterministic fields inside hashed payloads.
  - Implement: CanonicalJsonSerializer.Serialize<T>(T value) : string/byte[].
- Define deterministic string normalization rules
  - Normalize line endings in any text: \n only.
  - Normalize paths:
    - Use POSIX-style /.
    - Remove trailing slashes (except root).
  - Normalize numeric formatting:
    - No scientific notation.
    - Fixed decimal rules, where relevant.
- Implement a hashing helper
  - Digest type: public record Digest(string Algorithm, string Value); // Algorithm = "SHA256"
  - Hashing.ComputeDigest(byte[] data) : Digest.
  - Hashing.ComputeDigestCanonical<T>(T value) : Digest (serialize canonically, then hash).
- Add unit tests & golden files
  - Golden tests:
    - Same input object → same canonical JSON & digest, regardless of property order, culture, or runtime.
    - The hash of the JSON must match pre-computed values (store .golden files in the repo).
  - Edge cases:
    - Unicode strings.
    - Nested objects.
    - Arrays with different order (order is preserved, but ensure same input → same output).
DoD
- Canonical serializer & hashing utilities available in StellaOps.Attestations.
- Test suite with >95% coverage for serializer + hashing.
- Simple CLI or test harness: stella-attest dump-canonical <json> → prints canonical JSON & digest.
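The canonical serializer plus hashing helper can be sketched as below. This is a minimal illustration, not the StellaOps.Attestations implementation: it canonicalizes a parsed JsonElement by ordinal-sorting object keys and emitting compact output, then hashes the UTF-8 bytes with SHA-256. A production version would also normalize string escapes and number formatting, per the decision doc above.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public static class CanonicalJson
{
    // Serialize a JsonElement with lexicographically sorted object keys,
    // no insignificant whitespace, and UTF-8 output.
    public static byte[] Serialize(JsonElement element)
    {
        var sb = new StringBuilder();
        Write(element, sb);
        return Encoding.UTF8.GetBytes(sb.ToString());
    }

    private static void Write(JsonElement el, StringBuilder sb)
    {
        switch (el.ValueKind)
        {
            case JsonValueKind.Object:
                sb.Append('{');
                bool firstProp = true;
                foreach (var prop in el.EnumerateObject()
                                       .OrderBy(p => p.Name, StringComparer.Ordinal))
                {
                    if (!firstProp) sb.Append(',');
                    firstProp = false;
                    sb.Append(JsonSerializer.Serialize(prop.Name)); // quoted key
                    sb.Append(':');
                    Write(prop.Value, sb);
                }
                sb.Append('}');
                break;
            case JsonValueKind.Array:
                sb.Append('[');
                bool firstItem = true;
                foreach (var item in el.EnumerateArray())
                {
                    if (!firstItem) sb.Append(',');
                    firstItem = false;
                    Write(item, sb);
                }
                sb.Append(']');
                break;
            default:
                // Numbers, strings, bools, null: emitted as-is. A real
                // implementation would also canonicalize escapes and numbers.
                sb.Append(el.GetRawText());
                break;
        }
    }

    public static string Sha256Hex(byte[] data) =>
        Convert.ToHexString(SHA256.HashData(data)).ToLowerInvariant();
}
```

Two objects that differ only in property order then hash to the same digest, which is the golden-test property above.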
WS2 – DSSE & key integration
Goal: Standardize how we sign and verify attestations.
Tasks
- Select the DSSE representation
  - Use a JSON DSSE envelope:
    { "payloadType": "stellaops.dev/attestation/edge@v1", "payload": "<base64 of canonical JSON>", "signatures": [{ "keyid": "...", "sig": "..." }] }
- Implement the DSSE API in the library
  - Interfaces:
    - public interface ISigner { Task<Signature> SignAsync(byte[] payload, string keyRef); }
    - public interface IVerifier { Task<VerificationResult> VerifyAsync(Envelope envelope); }
  - Helpers:
    - Dsse.CreateEnvelope(payloadType, canonicalPayloadBytes, signer, keyRef).
    - Dsse.VerifyEnvelope(envelope, verifier).
- Integrate with Authority / PKI
  - Add AuthoritySigner / AuthorityVerifier implementations:
    - keyRef is an ID understood by Authority (service name, stage name, or explicit key ID).
    - Ensure we can:
      - Request signing of arbitrary bytes.
      - Resolve the public key used to sign.
- Key usage conventions
  - Define the mapping: sbomer key, scanner key, excititor key, concelier key.
  - Optional: use distinct keys per environment (dev/stage/prod), but include the environment in attestation metadata.
- Tests
  - Round-trip: sign, then verify sample payloads.
  - Negative tests:
    - Tampered payload → verification fails.
    - Tampered signature → verification fails.
DoD
- DSSE envelope creation/verification implemented and tested.
- Authority integration with a mock/fake for unit tests.
- Documentation for developers:
  - “How to emit an attestation: a 5-line example.”
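A minimal DSSE sketch under stated assumptions: DSSE signs the pre-authentication encoding (PAE) of the payload type and payload, per the DSSE spec, so the payloadType cannot be swapped after signing. Here an HMAC stands in for the Authority-backed ISigner, purely for illustration; the real pipeline would sign with stage keys.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public record DsseEnvelope(string PayloadType, string Payload, string KeyId, string Sig);

public static class Dsse
{
    // DSSE pre-authentication encoding (PAE), per the DSSE spec:
    // "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body
    public static byte[] Pae(string payloadType, byte[] payload)
    {
        var typeBytes = Encoding.UTF8.GetBytes(payloadType);
        var header = Encoding.UTF8.GetBytes(
            $"DSSEv1 {typeBytes.Length} {payloadType} {payload.Length} ");
        var result = new byte[header.Length + payload.Length];
        header.CopyTo(result, 0);
        payload.CopyTo(result, header.Length);
        return result;
    }

    // Hypothetical HMAC-based signer standing in for an Authority key.
    public static DsseEnvelope CreateEnvelope(
        string payloadType, byte[] payload, byte[] key, string keyId)
    {
        var sig = HMACSHA256.HashData(key, Pae(payloadType, payload));
        return new DsseEnvelope(payloadType, Convert.ToBase64String(payload),
                                keyId, Convert.ToBase64String(sig));
    }

    public static bool VerifyEnvelope(DsseEnvelope env, byte[] key)
    {
        var payload = Convert.FromBase64String(env.Payload);
        var expected = HMACSHA256.HashData(key, Pae(env.PayloadType, payload));
        return CryptographicOperations.FixedTimeEquals(
            expected, Convert.FromBase64String(env.Sig));
    }
}
```

The negative tests above fall out directly: altering either the payload or the signature changes the PAE comparison and verification fails.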
WS3 – Attestation schemas & Merkle envelopes
Goal: Standardize the data models for all attestations and envelopes.
Tasks
- Define the EdgeAttestation schema
  - Fields (concrete draft):

        public record EdgeAttestation(
            string EdgeId,           // deterministic ID
            string FromPurl,         // e.g. pkg:maven/...
            string ToPurl,
            string? FromSymbol,      // optional (symbol, API, entry point)
            string? ToSymbol,
            string RuleId,           // which reachability rule fired
            Digest[] WitnessDigests, // digests of evidence payloads
            string CanonicalAlgo = "SHA256");

  - EdgeId convention (document in an ADR), e.g.:
    - sha256(fromPurl + "→" + toPurl + "|" + ruleId + "|" + fromSymbol + "|" + toSymbol), canonicalizing the strings before hashing.
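The EdgeId convention can be sketched as a few lines of C#. This is one possible encoding of the ADR draft, not a settled format: null symbols are mapped to empty strings here, and the "→" and "|" separators are taken literally from the convention above.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class EdgeIds
{
    // Deterministic EdgeId: SHA-256 over the canonicalized edge tuple.
    // "\u2192" is the literal "→" separator from the convention sketch.
    public static string Compute(string fromPurl, string toPurl, string ruleId,
                                 string? fromSymbol = null, string? toSymbol = null)
    {
        var canonical =
            $"{fromPurl}\u2192{toPurl}|{ruleId}|{fromSymbol ?? ""}|{toSymbol ?? ""}";
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Because the ID is a pure function of the tuple, two pipeline runs that discover the same edge produce the same EdgeId, which is what makes edge attestations diffable across runs.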
- Define the StepAttestation schema

        public record StepAttestation(
            string Stage,            // "Sbomer" | "Scanner" | ...
            string ToolVersion,
            Digest[] InputDigests,   // SBOM digest, feed digests, image digests
            Digest[] OutputDigests,  // outputs of this stage
            Digest ParametersDigest, // hash of canonicalized params (flags, rule sets, etc.)
            DateTimeOffset StartedAt,
            DateTimeOffset FinishedAt,
            string Environment,      // dev/stage/prod/airgap
            string NodeId);          // machine or logical node name

  - Note: StartedAt/FinishedAt are not included in any hashed payload used for determinism; they are fine as metadata but are not part of the Merkle roots.
- Define the ProvenanceEnvelope schema

        public record ProvenanceEnvelope(
            Digest ReplayManifestDigest,
            Digest EdgeMerkleRoot,
            Dictionary<string, Digest> StageMerkleRoots, // stage -> root digest
            Dictionary<string, string> PurlToBuildId);   // PURL -> build-id string
- Define the ReplayManifest schema

        public record ReplayManifest(
            string PipelineVersion,
            Digest SbomDigest,
            Digest[] FeedDigests,           // CVE, CPE, VEX sources
            Digest[] RuleSetDigests,        // reachability + policy rules
            Digest[] ContainerImageDigests,
            string[] PlatformToggles);      // e.g. ["pqc=on", "mode=strict"]
- Implement Merkle utilities
  - Provide: Digest Merkle.BuildRoot(IEnumerable<Digest> leaves).
  - Deterministic rules:
    - Sort leaves by Value (digest hex string) before building.
    - If a level has an odd number of nodes, duplicate the last one (or define another explicit strategy) and document it.
  - Tie into:
    - Edges → EdgeMerkleRoot.
    - Per-stage attestation lists → stage-specific roots.
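The Merkle rules just listed can be sketched directly. This is an illustrative implementation of the stated strategy (ordinal sort of hex leaves, duplicate-last on odd counts, SHA-256 over concatenated child bytes), operating on hex strings rather than the Digest record for brevity:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

public static class Merkle
{
    // Deterministic Merkle root over hex-encoded SHA-256 leaf digests.
    // A single leaf is its own root; an empty leaf set is an error.
    public static string BuildRoot(IEnumerable<string> leafHexDigests)
    {
        var level = leafHexDigests
            .OrderBy(h => h, StringComparer.Ordinal) // sort leaves by hex value
            .Select(Convert.FromHexString)
            .ToList();
        if (level.Count == 0) throw new ArgumentException("no leaves");

        while (level.Count > 1)
        {
            if (level.Count % 2 == 1) level.Add(level[^1]); // duplicate last node
            var next = new List<byte[]>(level.Count / 2);
            for (int i = 0; i < level.Count; i += 2)
            {
                var pair = new byte[level[i].Length + level[i + 1].Length];
                level[i].CopyTo(pair, 0);
                level[i + 1].CopyTo(pair, level[i].Length);
                next.Add(SHA256.HashData(pair)); // parent = SHA256(left || right)
            }
            level = next;
        }
        return Convert.ToHexString(level[0]).ToLowerInvariant();
    }
}
```

Sorting the leaves first is what makes the root independent of emission order, so two replays that produce the same set of edge attestations in a different order still agree on EdgeMerkleRoot.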
- Schema documentation
  - Markdown/ADR file covering:
    - Field definitions.
    - Which fields are hashed vs. metadata-only.
    - How EdgeId, the Merkle roots, and the PURL → build-id mapping are generated.
DoD
- DTOs implemented in shared library.
- Merkle root builder implemented and tested.
- Schema documented and shared across teams.
WS4 – Pipeline integration
Goal: Each stage emits StepAttestations and (for reachability) EdgeAttestations, and Concelier emits ProvenanceEnvelope.
We’ll do this stage by stage.
WS4.A – Sbomer integration
Tasks
- Identify the SBOM hash:
  - After generating the SBOM, serialize it canonically and compute its Digest.
- Collect inputs:
  - Digests of input sources (e.g., image digests, source artifact digests).
- Collect parameters:
  - Gather all relevant configuration into a SbomerParams object (e.g., scanDepth, excludedPaths, sbomFormat).
  - Canonicalize it and compute ParametersDigest.
- Emit the StepAttestation:
  - Create the DTO.
  - Canonicalize & hash it for Merkle-tree use.
  - Wrap it in a DSSE envelope with payloadType = "stellaops.dev/attestation/step@v1".
  - Store the envelope:
    - Append to the standard location (e.g., <artifact-root>/attestations/sbomer-step.dsse.json).
- Add a config flag: --emit-attestations (default: off initially; later: on by default).
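The Sbomer emission flow above can be sketched end-to-end. This is a simplified stand-in: the DTO is trimmed, the params digest uses plain System.Text.Json instead of the shared canonical serializer, and the envelope is written unsigned where the real pipeline would DSSE-sign it with the sbomer key.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public record StepAttestation(
    string Stage, string ToolVersion,
    string[] InputDigests, string[] OutputDigests, string ParametersDigest);

public static class SbomerAttestor
{
    // Emit a (here unsigned) step attestation to the standard location.
    public static string Emit(string artifactRoot, string sbomDigest,
                              string[] inputDigests, object sbomerParams,
                              string toolVersion)
    {
        // ParametersDigest = hash of the serialized params. The real library
        // would canonicalize (sorted keys etc.) before hashing.
        var paramsJson = JsonSerializer.Serialize(sbomerParams);
        var paramsDigest = Hex(SHA256.HashData(Encoding.UTF8.GetBytes(paramsJson)));

        var att = new StepAttestation("Sbomer", toolVersion, inputDigests,
                                      new[] { sbomDigest }, paramsDigest);

        var dir = Path.Combine(artifactRoot, "attestations");
        Directory.CreateDirectory(dir);
        var path = Path.Combine(dir, "sbomer-step.dsse.json");
        // In the real pipeline this payload is canonicalized and DSSE-signed
        // with the sbomer key before writing.
        File.WriteAllText(path, JsonSerializer.Serialize(att));
        return path;
    }

    private static string Hex(byte[] b) => Convert.ToHexString(b).ToLowerInvariant();
}
```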
WS4.B – Scanner integration
Tasks
- Take the SBOM digest as an InputDigest.
- Collect feed digests:
  - Each CVE/CPE/VEX feed file → canonical hash.
- Compute the ScannerParams digest (e.g., severityThreshold, downloaderOptions, scanMode).
- Emit a StepAttestation (same pattern as Sbomer).
- Tag scanner outputs:
  - The vulnerability findings file(s) should be content-addressable (the filename includes the digest, or a meta manifest stores the mapping).
WS4.C – Excititor/Vexer integration
Tasks
- Integrate reachability-graph emission:
  - From the final graph, generate EdgeAttestations:
    - One per edge (from, to, rule).
    - For each edge, compute witness digests:
      - E.g., a serialized CFG slice, symbol-table snippet, or call chain.
      - Witness artifacts should be stored under canonical paths: <artifact-root>/witnesses/<edge-id>/<witness-type>.json.
- Canonicalize & hash each EdgeAttestation.
- Build the Merkle root over all edge-attestation digests.
- Emit the Excititor StepAttestation:
  - Inputs: SBOM, scanner findings, feeds, rule sets.
  - Outputs: VEX document(s), EdgeMerkleRoot digest.
  - Params: reachability flags, rule-definitions digest.
- Store:
  - Edge attestations, either:
    - One DSSE per edge (possibly a lot of files), or
    - A batch file containing a list of attestations wrapped in a single DSSE.
    - Prefer the batch for performance; define an EdgeAttestationBatch DTO.
  - VEX output(s) with deterministic file naming.
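The batching option can be sketched as follows. One possible shape for the EdgeAttestationBatch DTO (its exact fields are not fixed by this plan): the whole list is wrapped in a single DSSE envelope, while the per-edge digests are still computed individually so they can feed the edge Merkle tree.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public record EdgeAttestation(string EdgeId, string FromPurl, string ToPurl, string RuleId);

// Batch DTO: signed once, but each edge keeps its own content digest.
public record EdgeAttestationBatch(EdgeAttestation[] Edges, string[] EdgeDigests);

public static class EdgeBatching
{
    public static EdgeAttestationBatch Build(EdgeAttestation[] edges)
    {
        // Per-edge digests become the Merkle leaves; the batch itself is what
        // gets wrapped in a single DSSE envelope. A real implementation would
        // serialize via the shared canonical serializer, not plain JSON.
        var digests = edges
            .Select(e => Convert.ToHexString(
                    SHA256.HashData(
                        Encoding.UTF8.GetBytes(JsonSerializer.Serialize(e))))
                .ToLowerInvariant())
            .ToArray();
        return new EdgeAttestationBatch(edges, digests);
    }
}
```

This keeps the verification story unchanged (same leaves, same EdgeMerkleRoot) while avoiding one signature operation and one file per edge.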
WS4.D – Concelier integration
Tasks
- Gather all StepAttestations & the EdgeMerkleRoot:
  - Input: references (paths) to stage outputs + their DSSE envelopes.
- Build the PurlToBuildId map:
  - For each component:
    - Extract the PURL from the SBOM.
    - Extract the build-id from binary metadata.
- Build StageMerkleRoots:
  - For each stage, compute the Merkle root of its StepAttestations.
  - In the simplest version (one step attestation per stage), the root is just that attestation’s digest.
- Construct the ReplayManifest:
  - From the final pipeline context (SBOM, feeds, rules, images, toggles).
  - Compute ReplayManifestDigest and store the manifest file (e.g., replay-manifest.json).
- Construct the ProvenanceEnvelope:
  - Fill its fields with digests.
  - Canonicalize and sign with the Concelier key (DSSE).
- Store outputs:
  - provenance-envelope.dsse.json.
  - replay-manifest.json (unsigned) + an optional signed manifest.
WS4 DoD
- All four stages can:
  - Emit StepAttestations (and EdgeAttestations where applicable).
  - Produce a final ProvenanceEnvelope.
- The feature can be toggled via config.
- Pipelines run end-to-end in CI with attestation emission enabled.
WS5 – Replay engine & CLI
Goal: Given a ReplayManifest, re‑run the pipeline and verify that all Merkle roots and digests match.
Tasks
- Implement a Replay Orchestrator library:
  - Input: path/URL to replay-manifest.json.
  - Responsibilities:
    - Verify the manifest’s own digest (if signed).
    - Fetch or confirm the presence of:
      - The SBOM.
      - Feeds.
      - Rule sets.
      - Container images.
    - Spin up each stage with parameters reconstructed from the manifest:
      - Ensure versions and flags match.
  - Implementation: shared orchestration code reusing existing pipeline entry points.
- Implement a CLI tool: stella-attest replay
  - Commands:
    - stella-attest replay run --manifest <path> --out <dir>:
      - Runs the pipeline and emits fresh attestations.
    - stella-attest replay verify --manifest <path> --envelope <path> --attest-dir <dir>:
      - Compares the replayed Merkle roots against the ProvenanceEnvelope:
        - Stage roots.
        - Edge root.
      - Emits a verification report (JSON + human-readable).
- Verification logic:
  - Steps:
    - Parse the ProvenanceEnvelope (verify its DSSE signature).
    - Compute Merkle roots from the new replay’s attestations.
    - Compare:
      - ReplayManifestDigest in the envelope vs. the digest of the manifest used.
      - EdgeMerkleRoot vs. the recalculated root.
      - StageMerkleRoots[stage] vs. the recalculated stage roots.
  - Output:
    - verified = true/false.
    - If false, list the mismatches with digests.
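The comparison step of the verification logic can be sketched as a pure function, which makes it trivial to unit-test. Signature checking and root recomputation are assumed to happen before this point; the names below are illustrative, not the final API.

```csharp
using System.Collections.Generic;
using System.Linq;

public record VerificationReport(bool Verified, List<string> Mismatches);

public static class ReplayVerifier
{
    // Compare the envelope's pinned roots against roots recomputed from a replay.
    public static VerificationReport Compare(
        string envelopeManifestDigest, string manifestDigest,
        string envelopeEdgeRoot, string replayEdgeRoot,
        IReadOnlyDictionary<string, string> envelopeStageRoots,
        IReadOnlyDictionary<string, string> replayStageRoots)
    {
        var mismatches = new List<string>();

        if (envelopeManifestDigest != manifestDigest)
            mismatches.Add(
                $"ReplayManifestDigest: {envelopeManifestDigest} != {manifestDigest}");

        if (envelopeEdgeRoot != replayEdgeRoot)
            mismatches.Add($"EdgeMerkleRoot: {envelopeEdgeRoot} != {replayEdgeRoot}");

        // Deterministic iteration order so the report itself is reproducible.
        foreach (var (stage, root) in envelopeStageRoots.OrderBy(kv => kv.Key))
        {
            if (!replayStageRoots.TryGetValue(stage, out var replayRoot)
                || replayRoot != root)
                mismatches.Add(
                    $"StageMerkleRoots[{stage}]: {root} != {replayRoot ?? "<missing>"}");
        }

        return new VerificationReport(mismatches.Count == 0, mismatches);
    }
}
```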
- Tests:
  - Replay the same pipeline on the same machine → must match.
  - Replay on a different machine (a CI job simulating a different environment) → must match.
  - An injected change in a feed or rule set → the deliberate mismatch is detected.
DoD
- stella-attest replay works locally and in CI.
- Documentation: “How to replay a run and verify determinism.”
WS6 – Verification / air‑gap support
Goal: Allow verification in environments without outward network access.
Tasks
- Define the export bundle format
  - The bundle includes:
    - provenance-envelope.dsse.json.
    - replay-manifest.json.
    - All DSSE attestation files.
    - All witness artifacts (or digests only, if storage is local).
    - The public-key material or certificate chains needed to verify signatures.
  - Represent it as:
    - A tarball or zip, e.g., stella-bundle-<pipeline-id>.tar.gz.
    - A manifest file listing the contents and their digests.
- Implement the exporter
  - CLI: stella-attest export --run-id <id> --out bundle.tar.gz.
  - Internally:
    - Collect the paths to all relevant artifacts for the run.
    - Canonicalize the folder structure (e.g., /sbom, /scanner, /vex, /attestations, /witnesses).
- Implement the offline verifier
  - CLI: stella-attest verify-bundle --bundle <path>.
  - Steps:
    - Unpack the bundle to a temp dir.
    - Verify:
      - Attestation signatures via the included public keys.
      - Merkle roots and digests, as in WS5.
    - Do not attempt network calls.
- Documentation / runbook
  - “How to verify a Stella Ops run in an air-gapped environment.”
  - Include:
    - How to move bundles (e.g., via USB or secure file transfer).
    - What to do if verification fails.
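The content-integrity half of the offline verifier can be sketched as below: every file listed in the bundle manifest must exist and hash to its recorded digest, using only local filesystem access. The manifest is modeled here as a simple path → digest dictionary; the real bundle manifest format is defined in the export-bundle task above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class BundleVerifier
{
    // Offline check: no network access is needed or attempted. Returns a list
    // of failures; an empty list means all listed contents are intact.
    public static List<string> CheckContents(
        string bundleDir, IReadOnlyDictionary<string, string> manifest)
    {
        var failures = new List<string>();
        foreach (var (relPath, expected) in manifest)
        {
            var path = Path.Combine(bundleDir, relPath);
            if (!File.Exists(path))
            {
                failures.Add($"missing: {relPath}");
                continue;
            }
            var actual = Convert.ToHexString(
                    SHA256.HashData(File.ReadAllBytes(path)))
                .ToLowerInvariant();
            if (actual != expected)
                failures.Add($"digest mismatch: {relPath}");
        }
        return failures;
    }
}
```

Signature verification and Merkle-root recomputation (WS5) would run after this pass, using the public keys shipped inside the bundle.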
DoD
- Bundles can be exported from a connected environment and verified in a disconnected environment using only the bundle contents.
WS7 – Testing, observability, and rollout
Goal: Make this robust, observable, and gradually enable in prod.
Tasks
- Integration tests
  - Full pipeline scenario:
    - Start from a known SBOM + feeds + rules.
    - Run the pipeline twice and:
      - Compare the final outputs: ProvenanceEnvelope, VEX doc, final reports.
      - Compare digests & Merkle roots.
  - Edge cases:
    - Different machines (simulate via CI jobs with different runners).
    - A missing or corrupted attestation file → verify that verification fails with a clear error.
- Property-based tests (optional but valuable)
  - Generate random but structured SBOMs and graphs.
  - Ensure:
    - Canonicalization is idempotent.
    - Hashing is consistent.
    - Merkle roots are stable across repeated runs.
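The idempotence property above can be stated as a tiny checkable sketch. Only line-ending normalization from WS1 is shown, for illustration; the full property would cover the whole canonicalization pipeline.

```csharp
public static class DeterminismProps
{
    // WS1 text normalization (line endings only, for illustration).
    public static string NormalizeLineEndings(string text) =>
        text.Replace("\r\n", "\n").Replace('\r', '\n');

    // The property under test: applying normalization twice must equal
    // applying it once, for any input.
    public static bool IsIdempotent(string input) =>
        NormalizeLineEndings(NormalizeLineEndings(input)) == NormalizeLineEndings(input);
}
```

A property-based test would feed this with generated strings (mixed \r, \n, \r\n runs, Unicode) rather than the handful of fixed samples a unit test covers.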
- Observability
  - Add logging around:
    - Attestation creation & signing.
    - Verification failures.
    - Replay runs.
  - Add metrics:
    - Number of attestations per run.
    - Time spent in canonicalization / hashing / signing.
    - Verification success/fail counts.
- Rollout plan
  - Phase 0 (dev only):
    - Attestation emission enabled by default in dev.
    - Verification run in CI only.
  - Phase 1 (staging):
    - Enable the dual path: old behaviour + new attestations.
    - Run replay + verify in the staging pipeline.
  - Phase 2 (production, non-enforced):
    - Enable attestation emission in prod.
    - Verification runs as a side-car but does not block.
  - Phase 3 (production, enforced):
    - CI/CD gates fail if:
      - Signatures are invalid.
      - Merkle roots mismatch.
      - The envelope/manifest is missing.
- Documentation
  - Developer docs:
    - “How to emit a StepAttestation from your service.”
    - “How to add new fields without breaking determinism.”
  - Operator docs:
    - “How to run replay & verification.”
    - “How to interpret failures and debug.”
DoD
- All new functionality covered by automated tests.
- Observability dashboards / alerts configured.
- Rollout phases defined with clear criteria for moving to the next phase.
5. How to turn this into tickets
You can break this down roughly like:
- Epic 1: Attestation core library (WS1 + WS2 + WS3).
- Epic 2: Stage integrations (WS4A–D).
- Epic 3: Replay & verification tooling (WS5 + WS6).
- Epic 4: Testing, observability, rollout (WS7).
If you want, next step I can:
- Turn each epic into Jira-style stories with acceptance criteria.
- Or produce sample code stubs (interfaces + minimal implementations) matching this plan.