Revisiting Determinism in SBOM→VEX Pipeline (25 Nov 2025)

Here's a crisp, practical idea to harden StellaOps: make the SBOM → VEX pipeline deterministic and verifiable by treating it as a series of signed, hash-anchored state transitions, so every rebuild yields the same provenance envelope, one you can mathematically verify across air-gapped nodes.


What this means (plain English)

  • SBOM (what's inside): a list of packages, files, and their hashes.
  • VEX (what's affected): statements like “CVE-2024-1234 is not exploitable here because X.”
  • Deterministic: same inputs → byte-identical outputs, every time.
  • Verifiable transitions: each step (ingest → normalize → resolve → reachability → VEX) emits a signed attestation that pins its inputs/outputs by content hash.
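To make the hash-anchored transition idea concrete, here is a minimal sketch (hypothetical payloads; SHA-256 via .NET) of each state pinning the digest of the previous one, so a change anywhere invalidates every downstream digest:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// SHA-256 over canonical (sorted-key, timestamp-free) payload bytes.
string Sha256Hex(string canonicalPayload) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonicalPayload)))
           .ToLowerInvariant();

// Each stage's payload embeds the previous stage's digest, forming a chain:
// flipping one bit in the SBOM changes every digest downstream of it.
string sbomDigest    = Sha256Hex("{\"components\":[\"pkg:npm/left-pad@1.3.0\"]}");
string resolveDigest = Sha256Hex("{\"input\":\"" + sbomDigest + "\",\"stage\":\"resolve\"}");
string vexDigest     = Sha256Hex("{\"input\":\"" + resolveDigest + "\",\"stage\":\"vex\"}");

// Rebuilding from identical inputs must reproduce the identical final digest.
Console.WriteLine(vexDigest);
```

Determinism is exactly the property that re-running these lines on any machine yields the same final digest; signing each intermediate digest (DSSE) is what turns the chain into verifiable transitions.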

Minimal design you can drop into StellaOps

  1. Canonicalize everything

    • Sort JSON keys, normalize whitespace/line endings.
    • Freeze timestamps by recording them only in an outer envelope (not inside payloads used for hashing).
  2. Edge-level attestations

    • For each dependency edge in the reachability graph (nodeA → nodeB via symbol S), emit a tiny DSSE payload:

      • {edge_id, from_purl, to_purl, rule_id, witness_hashes[]}
    • Hash is over the canonical payload; sign via DSSE (Sigstore or your Authority PKI).

  3. Step attestations (pipeline states)

    • For each stage (Sbomer, Scanner, Vexer/Excititor, Concelier):

      • Emit predicateType: stellaops.dev/attestations/<stage>
      • Include input_digests[], output_digests[], parameters_digest, tool_version
      • Sign with stage key; record the public key (or cert chain) in Authority.
  4. Provenance envelope

    • Build a top-level DSSE that includes:

      • Merkle root of all edge attestations.
      • Merkle roots of each stage's outputs.
      • Mapping table of PURL ↔ build-id (ELF/PE/Mach-O) for stable identity.
  5. Replay manifest

    • A single, declarative file that pins:

      • Feeds (CPE/CVE/VEX sources + exact digests)
      • Rule/lattice versions and parameters
      • Container image + layer SHA-256 digests
      • Platform toggles (e.g., PQC on/off)
    • Running replay on this manifest must reproduce the same Merkle roots.

  6. Air-gap sync

    • Export only the envelopes + Merkle roots + public certs.
    • On the target, verify chains and recompute roots from the replay manifest—no internet required.

Slim C# shapes (DTOs) for DSSE predicates

public record EdgeAttestation(
    string EdgeId,
    string FromPurl,
    string ToPurl,
    string RuleId,
    string[] WitnessHashes, // e.g., CFG slice, symbol tables, lineage JSON
    string CanonicalAlgo = "SHA256");

public record StepAttestation(
    string Stage,               // "Sbomer" | "Scanner" | "Excititor" | "Concelier"
    string ToolVersion,
    string[] InputDigests,
    string[] OutputDigests,
    string ParametersDigest,    // hash of canonicalized params
    DateTimeOffset StartedAt,
    DateTimeOffset FinishedAt);

public record ProvenanceEnvelope(
    string ReplayManifestDigest,
    string EdgeMerkleRoot,
    Dictionary<string,string> StageMerkleRoots, // stage -> root
    Dictionary<string,string> PurlToBuildId);   // stable identity map

Determinism checklist (quick win)

  • Canonical JSON (stable key order) everywhere.
  • No wall-clock timestamps inside hashed payloads.
  • Only reference inputs by digest, never by URL.
  • Lock rule sets / lattice policies by digest.
  • Normalize file paths (POSIX style) and line endings.
  • Container images by digest, not tags.

Why it's worth it

  • Auditability: every VEX claim is backed by a verifiable graph path with signed edges.
  • Reproducibility: regulators (and customers) can replay your exact scan and get identical roots.
  • Integrity at scale: air-gapped sites can validate without trusting your network, just the math.

If you want, I'll turn this into ready-to-paste .proto contracts plus a small .NET library (StellaOps.Attestations) with DSSE signing/verification helpers and Merkle builders.

Got it: let's turn that sketch into a concrete implementation plan your devs can actually execute.

I'll structure this as:

  1. Objectives & scope
  2. High-level architecture
  3. Workstreams & milestones
  4. Detailed tasks per workstream
  5. Rollout, testing, and ops

You can copy/paste this straight into a tracking system and break it into tickets.


1. Objectives & scope

Primary objectives

  • Make the SBOM → VEX pipeline deterministic:

    • Same inputs (SBOM, feeds, rules, images) → bit-identical provenance & VEX outputs.
  • Make the pipeline verifiable:

    • Each step emits signed attestations with content hashes.
    • Attestations are chainable from raw SBOM to VEX & reports.
  • Make outputs replayable and air-gap friendly:

    • A single Replay Manifest can reconstruct pipeline outputs on another node and verify Merkle roots match.

Out of scope (for this phase)

  • New vulnerability scanning engines.
  • New UI views (beyond minimal “show provenance / verify”).
  • Key management redesign (we'll integrate with existing Authority / PKI).

2. High-level architecture

New shared library

Library name (example): StellaOps.Attestations (or similar)

Provides:

  • Canonical serialization:

    • Deterministic JSON encoder (stable key ordering, normalized formatting).
  • Hashing utilities:

    • SHA-256 (with an extension point for future algorithms).
  • DSSE wrapper:

    • Sign(payload, keyRef) → DSSE envelope.
    • Verify(dsse, keyResolver) → payload + key metadata.
  • Merkle utilities:

    • Build Merkle trees from lists of digests.
  • DTOs:

    • EdgeAttestation, StepAttestation, ProvenanceEnvelope, ReplayManifest.

Components that will integrate the library

  • Sbomer outputs SBOM + StepAttestation.
  • Scanner consumes SBOM, produces findings + StepAttestation.
  • Excititor / Vexer takes findings + reachability graph → VEX + EdgeAttestations + StepAttestation.
  • Concelier takes SBOM + VEX → reports + StepAttestation + ProvenanceEnvelope.
  • Authority manages keys and verification (possibly separate microservice or shared module).

3. Workstreams & milestones

Break this into parallel workstreams:

  1. WS1 Canonicalization & hashing
  2. WS2 DSSE & key integration
  3. WS3 Attestation schemas & Merkle envelopes
  4. WS4 Pipeline integration (Sbomer, Scanner, Excititor, Concelier)
  5. WS5 Replay engine & CLI
  6. WS6 Verification / air-gap support
  7. WS7 Testing, observability, and rollout

Each workstream below has concrete tasks + “Definition of Done” (DoD).


4. Detailed tasks per workstream

WS1 Canonicalization & hashing

Goal: A small, well-tested core that makes everything deterministic.

Tasks

  1. Define canonical JSON format

    • Decision doc:

      • Use UTF-8.
      • No insignificant whitespace.
      • Keys always sorted lexicographically.
      • No embedded timestamps or non-deterministic fields inside hashed payloads.
    • Implement:

      • CanonicalJsonSerializer.Serialize<T>(T value) : string/byte[].
  2. Define deterministic string normalization rules

    • Normalize line endings in any text: \n only.

    • Normalize paths:

      • Use POSIX style /.
      • Remove trailing slashes (except root).
    • Normalize numeric formatting:

      • No scientific notation.
      • Fixed decimal rules, if relevant.
  3. Implement hashing helper

    • Digest type:

      public record Digest(string Algorithm, string Value); // Algorithm = "SHA256"
      
    • Hashing.ComputeDigest(byte[] data) : Digest.

    • Hashing.ComputeDigestCanonical<T>(T value) : Digest (serialize canonically then hash).

  4. Add unit tests & golden files

    • Golden tests:

      • Same input object → same canonical JSON & digest, regardless of property order, culture, runtime.
      • Hash of JSON must match precomputed values (store .golden files in repo).
    • Edge cases:

      • Unicode strings.
      • Nested objects.
      • Arrays with different order (order preserved, but ensure same input → same output).
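The canonicalize-then-hash path from tasks 1–3 can be sketched as follows. This is a toy CanonicalJsonSerializer limited to flat string maps with no escaping; the real implementation must cover the full JSON model. Digest and the method names follow the shapes proposed above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Two maps with different insertion order canonicalize to identical bytes,
// so their digests match: the core determinism guarantee of WS1.
var a = new Dictionary<string, string> { ["b"] = "2", ["a"] = "1" };
var b = new Dictionary<string, string> { ["a"] = "1", ["b"] = "2" };
Console.WriteLine(Hashing.ComputeDigest(CanonicalJsonSerializer.Serialize(a)).Value ==
                  Hashing.ComputeDigest(CanonicalJsonSerializer.Serialize(b)).Value); // prints True

public record Digest(string Algorithm, string Value);

public static class Hashing
{
    public static Digest ComputeDigest(byte[] data) =>
        new("SHA256", Convert.ToHexString(SHA256.HashData(data)).ToLowerInvariant());
}

// Toy canonicalizer: UTF-8, no insignificant whitespace, keys sorted ordinally.
// Handles only flat string-to-string maps and performs no string escaping.
public static class CanonicalJsonSerializer
{
    public static byte[] Serialize(IDictionary<string, string> value) =>
        Encoding.UTF8.GetBytes("{" + string.Join(",",
            value.OrderBy(kv => kv.Key, StringComparer.Ordinal)
                 .Select(kv => $"\"{kv.Key}\":\"{kv.Value}\"")) + "}");
}
```

The golden tests then pin the exact canonical bytes and digests of known inputs, so any regression in key ordering or encoding fails loudly.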

DoD

  • Canonical serializer & hashing utilities available in StellaOps.Attestations.

  • Test suite with >95% coverage for serializer + hashing.

  • Simple CLI or test harness:

    • stella-attest dump-canonical <json> → prints canonical JSON & digest.

WS2 DSSE & key integration

Goal: Standardize how we sign and verify attestations.

Tasks

  1. Select DSSE representation

    • Use JSON DSSE envelope:

      {
        "payloadType": "stellaops.dev/attestation/edge@v1",
        "payload": "<base64 of canonical JSON>",
        "signatures": [{ "keyid": "...", "sig": "..." }]
      }
      
  2. Implement DSSE API in library

    • Interfaces:

      public interface ISigner {
          Task<Signature> SignAsync(byte[] payload, string keyRef);
      }
      
      public interface IVerifier {
          Task<VerificationResult> VerifyAsync(Envelope envelope);
      }
      
    • Helpers:

      • Dsse.CreateEnvelope(payloadType, canonicalPayloadBytes, signer, keyRef).
      • Dsse.VerifyEnvelope(envelope, verifier).
  3. Integrate with Authority / PKI

    • Add AuthoritySigner / AuthorityVerifier implementations:

      • keyRef is an ID understood by Authority (service name, stage name, or explicit key ID).

      • Ensure we can:

        • Request signing of arbitrary bytes.
        • Resolve the public key used to sign.
  4. Key usage conventions

    • Define mapping:

      • sbomer key.
      • scanner key.
      • excititor key.
      • concelier key.
    • Optional: use distinct keys per environment (dev/stage/prod) but include environment in attestation metadata.

  5. Tests

    • Round-trip: sign then verify sample payloads.

    • Negative tests:

      • Tampered payload → verification fails.
      • Tampered signatures → verification fails.
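As a sketch of the envelope shape and the round-trip and negative tests, the following uses HMAC as a stand-in for the Authority signer (the real ISigner/IVerifier would call out to the PKI). The PAE prefix is DSSE's standard pre-authentication encoding:

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

var key = Encoding.UTF8.GetBytes("demo-key"); // stand-in for an Authority-held key
var env = Dsse.CreateEnvelope("stellaops.dev/attestation/step@v1",
    Encoding.UTF8.GetBytes("{\"stage\":\"Sbomer\"}"), key, "sbomer-dev");

Console.WriteLine(Dsse.VerifyEnvelope(env, key));                            // prints True
Console.WriteLine(Dsse.VerifyEnvelope(env with { PayloadType = "x" }, key)); // prints False (tampered)

public record Envelope(string PayloadType, string Payload, string KeyId, string Sig);

public static class Dsse
{
    // DSSE pre-authentication encoding: binds payloadType to the payload so a
    // signature cannot be replayed under a different payload type.
    static byte[] Pae(string type, byte[] payload) =>
        Encoding.UTF8.GetBytes($"DSSEv1 {Encoding.UTF8.GetByteCount(type)} {type} {payload.Length} ")
                .Concat(payload).ToArray();

    public static Envelope CreateEnvelope(string type, byte[] canonicalPayload, byte[] key, string keyId) =>
        new(type, Convert.ToBase64String(canonicalPayload), keyId,
            Convert.ToBase64String(HMACSHA256.HashData(key, Pae(type, canonicalPayload))));

    public static bool VerifyEnvelope(Envelope e, byte[] key)
    {
        var payload  = Convert.FromBase64String(e.Payload);
        var expected = HMACSHA256.HashData(key, Pae(e.PayloadType, payload));
        return CryptographicOperations.FixedTimeEquals(expected, Convert.FromBase64String(e.Sig));
    }
}
```

Swapping HMAC for asymmetric signing via Authority changes only CreateEnvelope/VerifyEnvelope internals; the envelope shape and PAE stay the same.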

DoD

  • DSSE envelope creation/verification implemented and tested.

  • Authority integration with mock/fake for unit tests.

  • Documentation for developers:

    • “How to emit an attestation: a five-line example.”

WS3 Attestation schemas & Merkle envelopes

Goal: Standardize the data models for all attestations and envelopes.

Tasks

  1. Define EdgeAttestation schema

    Fields (concrete draft):

    public record EdgeAttestation(
        string EdgeId,              // deterministic ID
        string FromPurl,            // e.g. pkg:maven/...
        string ToPurl,
        string? FromSymbol,         // optional (symbol, API, entry point)
        string? ToSymbol,
        string RuleId,              // which reachability rule fired
        Digest[] WitnessDigests,    // digests of evidence payloads
        string CanonicalAlgo = "SHA256"
    );
    
    • EdgeId convention (document in ADR):

      • E.g. sha256(fromPurl + "→" + toPurl + "|" + ruleId + "|" + fromSymbol + "|" + toSymbol) (before hashing, canonicalize strings).
  2. Define StepAttestation schema

    public record StepAttestation(
        string Stage,               // "Sbomer" | "Scanner" | ...
        string ToolVersion,
        Digest[] InputDigests,      // SBOM digest, feed digests, image digests
        Digest[] OutputDigests,     // outputs of this stage
        Digest ParametersDigest,    // hash of canonicalized params (flags, rule sets, etc.)
        DateTimeOffset StartedAt,
        DateTimeOffset FinishedAt,
        string Environment,         // dev/stage/prod/airgap
        string NodeId               // machine or logical node name
    );
    
    • Note: StartedAt / FinishedAt are not included in any hashed payload used for determinism; they're fine as metadata but not part of Merkle roots.
  3. Define ProvenanceEnvelope schema

    public record ProvenanceEnvelope(
        Digest ReplayManifestDigest,
        Digest EdgeMerkleRoot,
        Dictionary<string, Digest> StageMerkleRoots, // stage -> root digest
        Dictionary<string, string> PurlToBuildId     // PURL -> build-id string
    );
    
  4. Define ReplayManifest schema

    public record ReplayManifest(
        string PipelineVersion,
        Digest SbomDigest,
        Digest[] FeedDigests,       // CVE, CPE, VEX sources
        Digest[] RuleSetDigests,    // reachability + policy rules
        Digest[] ContainerImageDigests,
        string[] PlatformToggles    // e.g. ["pqc=on", "mode=strict"]
    );
    
  5. Implement Merkle utilities

    • Provide:

      • Digest Merkle.BuildRoot(IEnumerable<Digest> leaves).

      • Deterministic rules:

        • Sort leaves by Value (digest hex string) before building.
        • If odd number of leaves, duplicate last leaf or define explicit strategy and document it.
    • Tie into:

      • Edges → EdgeMerkleRoot.
      • Per-stage attestation list → stage-specific root.
  6. Schema documentation

    • Markdown/ADR file:

      • Field definitions.
      • Which fields are hashed vs. metadata only.
      • How EdgeId, Merkle roots, and PURL→BuildId mapping are generated.
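The Merkle rules above (ordinal sort of leaves, duplicate-last-leaf at odd levels) and the EdgeId convention can be sketched as below; hashing the concatenated hex values of child digests is an assumption here, so pick and document one scheme in the ADR:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// EdgeId draft convention: hash of the canonicalized identity fields.
string EdgeId(string fromPurl, string toPurl, string ruleId) =>
    Merkle.Sha256($"{fromPurl}→{toPurl}|{ruleId}").Value;

// Leaf order does not matter: BuildRoot sorts first, so the same edge set
// always yields the same root regardless of emission order.
var leaves   = new[] { Merkle.Sha256("edge-1"), Merkle.Sha256("edge-2"), Merkle.Sha256("edge-3") };
var shuffled = new[] { leaves[2], leaves[0], leaves[1] };
Console.WriteLine(Merkle.BuildRoot(leaves).Value == Merkle.BuildRoot(shuffled).Value); // prints True

public record Digest(string Algorithm, string Value);

public static class Merkle
{
    public static Digest Sha256(string s) =>
        new("SHA256", Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(s))).ToLowerInvariant());

    public static Digest BuildRoot(IEnumerable<Digest> leaves)
    {
        // Deterministic rules: ordinal sort by hex value; duplicate the last
        // leaf when a level has an odd count; hash concatenated child values.
        var level = leaves.OrderBy(d => d.Value, StringComparer.Ordinal).ToList();
        if (level.Count == 0) throw new ArgumentException("at least one leaf required");
        while (level.Count > 1)
        {
            if (level.Count % 2 == 1) level.Add(level[^1]);
            level = Enumerable.Range(0, level.Count / 2)
                              .Select(i => Sha256(level[2 * i].Value + level[2 * i + 1].Value))
                              .ToList();
        }
        return level[0];
    }
}
```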

DoD

  • DTOs implemented in shared library.
  • Merkle root builder implemented and tested.
  • Schema documented and shared across teams.

WS4 Pipeline integration

Goal: Each stage emits StepAttestations and (for reachability) EdgeAttestations, and Concelier emits ProvenanceEnvelope.

Well do this stage by stage.

WS4.A Sbomer integration

Tasks

  1. Identify SBOM hash:

    • After generating SBOM, serialize canonically and compute Digest.
  2. Collect inputs:

    • Input sources digests (e.g., image digests, source artifact digests).
  3. Collect parameters:

    • All relevant configuration into a SbomerParams object:

      • E.g. scanDepth, excludedPaths, sbomFormat.
    • Canonicalize and compute ParametersDigest.

  4. Emit StepAttestation:

    • Create DTO.

    • Canonicalize & hash for Merkle tree use.

    • Wrap in DSSE envelope with payloadType = "stellaops.dev/attestation/step@v1".

    • Store envelope:

      • Append to standard location (e.g. <artifact-root>/attestations/sbomer-step.dsse.json).
  5. Add config flag:

    • --emit-attestations (default: off initially, later: on by default).
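Tasks 1–4 condense into a short sketch like the following (hypothetical SbomerParams and payload layout). Note that a real implementation would use the WS1 canonical serializer rather than System.Text.Json defaults, which are only deterministic for a fixed type layout and build:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

string DigestOf(string canonical) =>
    "sha256:" + Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonical))).ToLowerInvariant();

// 1. Hash the canonicalized SBOM bytes.
string sbomDigest = DigestOf("{\"bomFormat\":\"CycloneDX\",\"components\":[]}");

// 3. Canonicalize the parameters and hash them into ParametersDigest.
var p = new SbomerParams(3, new[] { "/tmp" }, "cyclonedx-json");
string paramsDigest = DigestOf(JsonSerializer.Serialize(p));

// 4. The StepAttestation payload that gets DSSE-wrapped and written under
//    the attestations directory alongside the SBOM artifact.
Console.WriteLine($"{{\"outputs\":[\"{sbomDigest}\"],\"params\":\"{paramsDigest}\",\"stage\":\"Sbomer\"}}");

public record SbomerParams(int ScanDepth, string[] ExcludedPaths, string SbomFormat);
```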

WS4.B Scanner integration

Tasks

  1. Take SBOM digest as an InputDigest.

  2. Collect feed digests:

    • Each CVE/CPE/VEX feed file → canonical hash.
  3. Compute ScannerParams digest:

    • E.g. severityThreshold, downloaderOptions, scanMode.
  4. Emit StepAttestation (same pattern as Sbomer).

  5. Tag scanner outputs:

    • The vulnerability findings file(s) should be content-addressable (include the digest in the filename, or store a meta manifest mapping).

WS4.C Excititor/Vexer integration

Tasks

  1. Integrate reachability graph emission:

    • From final graph, generate EdgeAttestations:

      • One per edge (from, to, rule).

      • For each edge, compute witness digests:

        • E.g. serialized CFG slice, symbol table snippet, call chain.

        • Those witness artifacts should be stored under canonical paths:

          • <artifact-root>/witnesses/<edge-id>/<witness-type>.json.
  2. Canonicalize & hash each EdgeAttestation.

  3. Build Merkle root over all edge attestation digests.

  4. Emit Excititor StepAttestation:

    • Inputs: SBOM, scanner findings, feeds, rule sets.
    • Outputs: VEX document(s), EdgeMerkleRoot digest.
    • Params: reachability flags, rule definitions digest.
  5. Store:

    • Edge attestations:

      • Either:

        • One DSSE per edge (possibly a lot of files).
        • Or a batch file containing a list of attestations wrapped into a single DSSE.
      • Prefer: batch for performance; define EdgeAttestationBatch DTO.

    • VEX output(s) with deterministic file naming.

WS4.D Concelier integration

Tasks

  1. Gather all StepAttestations & EdgeMerkleRoot:

    • Input: references (paths) to stage outputs + their DSSE envelopes.
  2. Build PurlToBuildId map:

    • For each component:

      • Extract PURL from SBOM.
      • Extract build-id from binary metadata.
  3. Build StageMerkleRoots:

    • For each stage, compute Merkle root of its StepAttestations.
    • In simplest version: 1 step attestation per stage → root is just its digest.
  4. Construct ReplayManifest:

    • From final pipeline context (SBOM, feeds, rules, images, toggles).
    • Compute ReplayManifestDigest and store manifest file (e.g. replay-manifest.json).
  5. Construct ProvenanceEnvelope:

    • Fill fields with digests.
    • Canonicalize and sign with Concelier key (DSSE).
  6. Store outputs:

    • provenance-envelope.dsse.json.
    • replay-manifest.json (unsigned) + optional signed manifest.

WS4 DoD

  • All four stages can:

    • Emit StepAttestations (and EdgeAttestations where applicable).
    • Produce a final ProvenanceEnvelope.
  • Feature can be toggled via config.

  • Pipelines run end-to-end in CI with attestation emission enabled.


WS5 Replay engine & CLI

Goal: Given a ReplayManifest, rerun the pipeline and verify that all Merkle roots and digests match.

Tasks

  1. Implement a Replay Orchestrator library:

    • Input:

      • Path/URL to replay-manifest.json.
    • Responsibilities:

      • Verify the manifest's own digest (if signed).

      • Fetch or confirm presence of:

        • SBOM.
        • Feeds.
        • Rule sets.
        • Container images.
      • Spin up each stage with parameters reconstructed from the manifest:

        • Ensure versions and flags match.
    • Implementation: shared orchestration code reusing existing pipeline entrypoints.

  2. Implement CLI tool: stella-attest replay

    • Commands:

      • stella-attest replay run --manifest <path> --out <dir>.

        • Runs pipeline and emits fresh attestations.
      • stella-attest replay verify --manifest <path> --envelope <path> --attest-dir <dir>:

        • Compares:

          • Replay Merkle roots vs. ProvenanceEnvelope.
          • Stage roots.
          • Edge root.
        • Emits a verification report (JSON + human-readable).

  3. Verification logic:

    • Steps:

      1. Parse ProvenanceEnvelope (verify DSSE signature).

      2. Compute Merkle roots from the new replay's attestations.

      3. Compare:

        • ReplayManifestDigest in envelope vs digest of manifest used.
        • EdgeMerkleRoot vs recalculated root.
        • StageMerkleRoots[stage] vs recalculated stage roots.
      4. Output:

        • verified = true/false.
        • If false, list mismatches with digests.
  4. Tests:

    • Replay the same pipeline on same machine → must match.
    • Replay on different machine (CI job simulating different environment) → must match.
    • Injected change in feed or rule set → deliberate mismatch detected.
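The comparison in step 3 reduces to digest equality over the envelope fields. A minimal sketch, with digests shown as plain strings and Mismatch as a hypothetical report record:

```csharp
using System;
using System.Collections.Generic;

var signedEnv = new ProvenanceEnvelopeLite("m1", "e1",
    new Dictionary<string, string> { ["Sbomer"] = "s1" });

// Replay produced identical roots: no mismatches => verified = true.
Console.WriteLine(ReplayVerifier.Compare(signedEnv, "m1", "e1",
    new Dictionary<string, string> { ["Sbomer"] = "s1" }).Count); // prints 0

// A tampered rule set would shift the edge root: one mismatch reported.
Console.WriteLine(ReplayVerifier.Compare(signedEnv, "m1", "tampered",
    new Dictionary<string, string> { ["Sbomer"] = "s1" }).Count); // prints 1

public record ProvenanceEnvelopeLite(
    string ReplayManifestDigest, string EdgeMerkleRoot,
    Dictionary<string, string> StageMerkleRoots);

public record Mismatch(string Field, string Expected, string Actual);

public static class ReplayVerifier
{
    // Compares the signed envelope against roots recomputed from a fresh replay.
    public static IReadOnlyList<Mismatch> Compare(
        ProvenanceEnvelopeLite signed, string manifestDigest, string edgeRoot,
        IReadOnlyDictionary<string, string> stageRoots)
    {
        var mismatches = new List<Mismatch>();
        if (signed.ReplayManifestDigest != manifestDigest)
            mismatches.Add(new("ReplayManifestDigest", signed.ReplayManifestDigest, manifestDigest));
        if (signed.EdgeMerkleRoot != edgeRoot)
            mismatches.Add(new("EdgeMerkleRoot", signed.EdgeMerkleRoot, edgeRoot));
        foreach (var (stage, expected) in signed.StageMerkleRoots)
        {
            stageRoots.TryGetValue(stage, out var actual);
            if (actual != expected)
                mismatches.Add(new($"StageMerkleRoots[{stage}]", expected, actual ?? "<missing>"));
        }
        return mismatches; // empty list => verified = true
    }
}
```

The verification report in step 4 is then just this list serialized as JSON plus a human-readable rendering.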

DoD

  • stella-attest replay works locally and in CI.
  • Documentation: “How to replay a run and verify determinism.”

WS6 Verification / air-gap support

Goal: Allow verification in environments without outward network access.

Tasks

  1. Define export bundle format

    • Bundle includes:

      • provenance-envelope.dsse.json.
      • replay-manifest.json.
      • All DSSE attestation files.
      • All witness artifacts (or digests only if storage is local).
      • Public key material or certificate chains needed to verify signatures.
    • Represent as:

      • Tarball or zip: e.g. stella-bundle-<pipeline-id>.tar.gz.
      • Manifest file listing contents and digests.
  2. Implement exporter

    • CLI: stella-attest export --run-id <id> --out bundle.tar.gz.

    • Internally:

      • Collect paths to all relevant artifacts for the run.
      • Canonicalize folder structure (e.g. /sbom, /scanner, /vex, /attestations, /witnesses).
  3. Implement offline verifier

    • CLI: stella-attest verify-bundle --bundle <path>.

    • Steps:

      • Unpack bundle to temp dir.

      • Verify:

        • Attestation signatures via included public keys.
        • Merkle roots and digests as in WS5.
      • Do not attempt network calls.

  4. Documentation / runbook

    • “How to verify a Stella Ops run in an air-gapped environment.”

    • Include:

      • How to move bundles (e.g. via USB, secure file transfer).
      • What to do if verification fails.

DoD

  • Bundles can be exported from a connected environment and verified in a disconnected environment using only the bundle contents.

WS7 Testing, observability, and rollout

Goal: Make this robust, observable, and gradually enable in prod.

Tasks

  1. Integration tests

    • Full pipeline scenario:

      • Start from known SBOM + feeds + rules.

      • Run pipeline twice and:

        • Compare final outputs: ProvenanceEnvelope, VEX doc, final reports.
        • Compare digests & Merkle roots.
    • Edge cases:

      • Different machines (simulate via CI jobs with different runners).
      • Missing or corrupted attestation file → verify that verification fails with clear error.
  2. Property-based tests (optional but great)

    • Generate random but structured SBOMs and graphs.

    • Ensure:

      • Canonicalization is idempotent.
      • Hashing is consistent.
      • Merkle roots are stable for repeated runs.
  3. Observability

    • Add logging around:

      • Attestation creation & signing.
      • Verification failures.
      • Replay runs.
    • Add metrics:

      • Number of attestations per run.
      • Time spent in canonicalization / hashing / signing.
      • Verification success/fail counts.
  4. Rollout plan

    1. Phase 0 (dev only):

      • Attestation emission enabled by default in dev.
      • Verification run in CI only.
    2. Phase 1 (staging):

      • Enable dual-path:

        • Old behaviour + new attestations.
      • Run replay+verify in staging pipeline.

    3. Phase 2 (production, non-enforced):

      • Enable attestation emission in prod.
      • Verification runs “sidecar” but does not block.
    4. Phase 3 (production, enforced):

      • CI/CD gates:

        • Fails if:

          • Signatures invalid.
          • Merkle roots mismatch.
          • Envelope/manifest missing.
  5. Documentation

    • Developer docs:

      • “How to emit a StepAttestation from your service.”
      • “How to add new fields without breaking determinism.”
    • Operator docs:

      • “How to run replay & verification.”
      • “How to interpret failures and debug.”

DoD

  • All new functionality covered by automated tests.
  • Observability dashboards / alerts configured.
  • Rollout phases defined with clear criteria for moving to the next phase.

5. How to turn this into tickets

You can break this down roughly like:

  • Epic 1: Attestation core library (WS1 + WS2 + WS3).
  • Epic 2: Stage integrations (WS4.A–D).
  • Epic 3: Replay & verification tooling (WS5 + WS6).
  • Epic 4: Testing, observability, rollout (WS7).

If you want, next step I can:

  • Turn each epic into Jira-style stories with acceptance criteria.
  • Or produce sample code stubs (interfaces + minimal implementations) matching this plan.