Here’s a crisp, practical idea to harden Stella Ops: make the SBOM → VEX pipeline **deterministic and verifiable** by treating it as a series of signed, hash‑anchored state transitions—so every rebuild yields the *same* provenance envelope you can mathematically check across air‑gapped nodes.

---

### What this means (plain English)

* **SBOM** (what’s inside): list of packages, files, and their hashes.
* **VEX** (what’s affected): statements like “CVE‑2024‑1234 is **not** exploitable here because X.”
* **Deterministic**: same inputs → byte‑identical outputs, every time.
* **Verifiable transitions**: each step (ingest → normalize → resolve → reachability → VEX) emits a signed attestation that pins its inputs/outputs by content hash.

---

### Minimal design you can drop into Stella Ops

1. **Canonicalize everything**
   * Sort JSON keys, normalize whitespace/line endings.
   * Freeze timestamps by recording them only in an outer envelope (not inside payloads used for hashing).
2. **Edge‑level attestations**
   * For each dependency edge in the reachability graph `(nodeA → nodeB via symbol S)`, emit a tiny DSSE payload:
     * `{edge_id, from_purl, to_purl, rule_id, witness_hashes[]}`
   * Hash is over the canonical payload; sign via DSSE (Sigstore or your Authority PKI).
3. **Step attestations (pipeline states)**
   * For each stage (`Sbomer`, `Scanner`, `Vexer/Excititor`, `Concelier`):
     * Emit `predicateType`: `stellaops.dev/attestations/`
     * Include `input_digests[]`, `output_digests[]`, `parameters_digest`, `tool_version`
     * Sign with the stage key; record the public key (or cert chain) in Authority.
4. **Provenance envelope**
   * Build a top‑level DSSE that includes:
     * Merkle root of **all** edge attestations.
     * Merkle roots of each stage’s outputs.
     * Mapping table of `PURL ↔ build‑ID (ELF/PE/Mach‑O)` for stable identity.
5. **Replay manifest**
   * A single, declarative file that pins:
     * Feeds (CPE/CVE/VEX sources + exact digests)
     * Rule/lattice versions and parameters
     * Container images + layers’ SHA256
     * Platform toggles (e.g., PQC on/off)
   * Running **replay** on this manifest must reproduce the same Merkle roots.
6. **Air‑gap sync**
   * Export only the envelopes + Merkle roots + public certs.
   * On the target, verify chains and recompute roots from the replay manifest—no internet required.

---

### Slim C# shapes (DTOs) for DSSE predicates

```csharp
public record EdgeAttestation(
    string EdgeId,
    string FromPurl,
    string ToPurl,
    string RuleId,
    string[] WitnessHashes,   // e.g., CFG slice, symbol tables, lineage JSON
    string CanonicalAlgo = "SHA256");

public record StepAttestation(
    string Stage,             // "Sbomer" | "Scanner" | "Excititor" | "Concelier"
    string ToolVersion,
    string[] InputDigests,
    string[] OutputDigests,
    string ParametersDigest,  // hash of canonicalized params
    DateTimeOffset StartedAt,
    DateTimeOffset FinishedAt);

public record ProvenanceEnvelope(
    string ReplayManifestDigest,
    string EdgeMerkleRoot,
    Dictionary<string, string> StageMerkleRoots,  // stage -> root
    Dictionary<string, string> PurlToBuildId);    // stable identity map
```

---

### Determinism checklist (quick win)

* Canonical JSON (stable key order) everywhere.
* No wall‑clock timestamps inside hashed payloads.
* Only reference inputs by digest, never by URL.
* Lock rule sets / lattice policies by digest.
* Normalize file paths (POSIX style) and line endings.
* Container images by **digest**, not tags.

---

### Why it’s worth it

* **Auditability:** every VEX claim is backed by a verifiable graph path with signed edges.
* **Reproducibility:** regulators (and customers) can replay your exact scan and get identical roots.
* **Integrity at scale:** air‑gapped sites can validate without trusting your network—just the math.
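The first two checklist items (canonical JSON, digest-only references) are the whole trick: serialize with a stable key order and no insignificant whitespace, then hash the bytes. A minimal, language-agnostic sketch (Python here for brevity; the function names are illustrative, not part of the proposed library):

```python
import hashlib
import json

def canonical_json(obj) -> bytes:
    # Stable key order, compact separators, UTF-8: the same logical
    # object always yields the same byte sequence.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def digest(obj) -> str:
    # Content address: hash of the canonical bytes.
    return "sha256:" + hashlib.sha256(canonical_json(obj)).hexdigest()

# Two payloads that differ only in key order hash identically.
a = {"purl": "pkg:npm/lodash@4.17.21", "rule_id": "R1"}
b = {"rule_id": "R1", "purl": "pkg:npm/lodash@4.17.21"}
assert digest(a) == digest(b)
```

Anything that breaks this property (embedded timestamps, locale-dependent number formatting, unordered maps) breaks replay verification downstream.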
If you want, I’ll turn this into ready‑to‑paste `.proto` contracts + a small .NET library (`StellaOps.Attestations`) with DSSE signing/verification helpers and Merkle builders.

Got it — let’s turn that sketch into a concrete implementation plan your devs can actually execute. I’ll structure this as:

1. **Objectives & scope**
2. **High-level architecture**
3. **Workstreams & milestones**
4. **Detailed tasks per workstream**
5. **Rollout, testing, and ops**

You can copy/paste this straight into a tracking system and break it into tickets.

---

## 1. Objectives & scope

### Primary objectives

* Make the **SBOM → VEX pipeline deterministic**:
  * Same inputs (SBOM, feeds, rules, images) → **bit‑identical** provenance & VEX outputs.
* Make the pipeline **verifiable**:
  * Each step emits **signed attestations** with content hashes.
  * Attestations are **chainable** from raw SBOM to VEX & reports.
* Make outputs **replayable** and **air‑gap friendly**:
  * A single **Replay Manifest** can reconstruct pipeline outputs on another node and verify that Merkle roots match.

### Out of scope (for this phase)

* New vulnerability scanning engines.
* New UI views (beyond minimal “show provenance / verify”).
* Key management redesign (we’ll integrate with existing Authority / PKI).

---

## 2. High-level architecture

### New shared library

**Library name (example):** `StellaOps.Attestations` (or similar)

Provides:

* Canonical serialization:
  * Deterministic JSON encoder (stable key ordering, normalized formatting).
* Hashing utilities:
  * SHA‑256 (and extension point for future algorithms).
* DSSE wrapper:
  * `Sign(payload, keyRef)` → DSSE envelope.
  * `Verify(dsse, keyResolver)` → payload + key metadata.
* Merkle utilities:
  * Build Merkle trees from lists of digests.
* DTOs:
  * `EdgeAttestation`, `StepAttestation`, `ProvenanceEnvelope`, `ReplayManifest`.

### Components that will integrate the library

* **Sbomer** – outputs SBOM + StepAttestation.
* **Scanner** – consumes SBOM, produces findings + StepAttestation.
* **Excititor / Vexer** – takes findings + reachability graph → VEX + EdgeAttestations + StepAttestation.
* **Concelier** – takes SBOM + VEX → reports + StepAttestation + ProvenanceEnvelope.
* **Authority** – manages keys and verification (possibly a separate microservice or shared module).

---

## 3. Workstreams & milestones

Break this into parallel workstreams:

1. **WS1 – Canonicalization & hashing**
2. **WS2 – DSSE & key integration**
3. **WS3 – Attestation schemas & Merkle envelopes**
4. **WS4 – Pipeline integration (Sbomer, Scanner, Excititor, Concelier)**
5. **WS5 – Replay engine & CLI**
6. **WS6 – Verification / air‑gap support**
7. **WS7 – Testing, observability, and rollout**

Each workstream below has concrete tasks + a “Definition of Done” (DoD).

---

## 4. Detailed tasks per workstream

### WS1 – Canonicalization & hashing

**Goal:** A small, well-tested core that makes everything deterministic.

#### Tasks

1. **Define canonical JSON format**
   * Decision doc:
     * Use UTF‑8.
     * No insignificant whitespace.
     * Keys always sorted lexicographically.
     * No embedded timestamps or non-deterministic fields inside hashed payloads.
   * Implement:
     * `CanonicalJsonSerializer.Serialize(T value) : string/byte[]`.
2. **Define deterministic string normalization rules**
   * Normalize line endings in any text: `\n` only.
   * Normalize paths:
     * Use POSIX style `/`.
     * Remove trailing slashes (except root).
   * Normalize numeric formatting:
     * No scientific notation.
     * Fixed decimal rules, if relevant.
3. **Implement hashing helper**
   * `Digest` type:

     ```csharp
     public record Digest(string Algorithm, string Value); // Algorithm = "SHA256"
     ```

   * `Hashing.ComputeDigest(byte[] data) : Digest`.
   * `Hashing.ComputeDigestCanonical(T value) : Digest` (serialize canonically, then hash).
4. **Add unit tests & golden files**
   * Golden tests:
     * Same input object → same canonical JSON & digest, regardless of property order, culture, runtime.
   * Hash of JSON must match pre‑computed values (store `.golden` files in the repo).
   * Edge cases:
     * Unicode strings.
     * Nested objects.
     * Arrays with different order (order preserved, but ensure same input → same output).

#### DoD

* Canonical serializer & hashing utilities available in `StellaOps.Attestations`.
* Test suite with >95% coverage for serializer + hashing.
* Simple CLI or test harness:
  * `stella-attest dump-canonical <file>` → prints canonical JSON & digest.

---

### WS2 – DSSE & key integration

**Goal:** Standardize how we sign and verify attestations.

#### Tasks

1. **Select DSSE representation**
   * Use a JSON DSSE envelope:

     ```json
     {
       "payloadType": "stellaops.dev/attestation/edge@v1",
       "payload": "<base64(canonical payload)>",
       "signatures": [{ "keyid": "...", "sig": "..." }]
     }
     ```

2. **Implement DSSE API in library**
   * Interfaces:

     ```csharp
     public interface ISigner
     {
         Task<Envelope> SignAsync(byte[] payload, string keyRef);
     }

     public interface IVerifier
     {
         Task<VerificationResult> VerifyAsync(Envelope envelope);
     }
     ```

   * Helpers:
     * `Dsse.CreateEnvelope(payloadType, canonicalPayloadBytes, signer, keyRef)`.
     * `Dsse.VerifyEnvelope(envelope, verifier)`.
3. **Integrate with Authority / PKI**
   * Add `AuthoritySigner` / `AuthorityVerifier` implementations:
     * `keyRef` is an ID understood by Authority (service name, stage name, or explicit key ID).
   * Ensure we can:
     * Request signing of arbitrary bytes.
     * Resolve the public key used to sign.
4. **Key usage conventions**
   * Define mapping:
     * `sbomer` key.
     * `scanner` key.
     * `excititor` key.
     * `concelier` key.
   * Optional: use distinct keys per environment (dev/stage/prod) but **include environment** in attestation metadata.
5. **Tests**
   * Round-trip: sign, then verify sample payloads.
   * Negative tests:
     * Tampered payload → verification fails.
     * Tampered signature → verification fails.

#### DoD

* DSSE envelope creation/verification implemented and tested.
* Authority integration with mock/fake for unit tests.
* Documentation for developers:
  * “How to emit an attestation: 5‑line example.”

---

### WS3 – Attestation schemas & Merkle envelopes

**Goal:** Standardize the data models for all attestations and envelopes.

#### Tasks

1. **Define EdgeAttestation schema**

   Fields (concrete draft):

   ```csharp
   public record EdgeAttestation(
       string EdgeId,              // deterministic ID
       string FromPurl,            // e.g. pkg:maven/...
       string ToPurl,
       string? FromSymbol,         // optional (symbol, API, entry point)
       string? ToSymbol,
       string RuleId,              // which reachability rule fired
       Digest[] WitnessDigests,    // digests of evidence payloads
       string CanonicalAlgo = "SHA256"
   );
   ```

   * `EdgeId` convention (document in ADR):
     * E.g. `sha256(fromPurl + "→" + toPurl + "|" + ruleId + "|" + fromSymbol + "|" + toSymbol)` (before hashing, canonicalize the strings).
2. **Define StepAttestation schema**

   ```csharp
   public record StepAttestation(
       string Stage,               // "Sbomer" | "Scanner" | ...
       string ToolVersion,
       Digest[] InputDigests,      // SBOM digest, feed digests, image digests
       Digest[] OutputDigests,     // outputs of this stage
       Digest ParametersDigest,    // hash of canonicalized params (flags, rule sets, etc.)
       DateTimeOffset StartedAt,
       DateTimeOffset FinishedAt,
       string Environment,         // dev/stage/prod/airgap
       string NodeId               // machine or logical node name
   );
   ```

   * Note: `StartedAt` / `FinishedAt` are **not** included in any hashed payload used for determinism; they’re OK as metadata but not part of Merkle roots.
3. **Define ProvenanceEnvelope schema**

   ```csharp
   public record ProvenanceEnvelope(
       Digest ReplayManifestDigest,
       Digest EdgeMerkleRoot,
       Dictionary<string, Digest> StageMerkleRoots,  // stage -> root digest
       Dictionary<string, string> PurlToBuildId      // PURL -> build-id string
   );
   ```

4. **Define ReplayManifest schema**

   ```csharp
   public record ReplayManifest(
       string PipelineVersion,
       Digest SbomDigest,
       Digest[] FeedDigests,            // CVE, CPE, VEX sources
       Digest[] RuleSetDigests,         // reachability + policy rules
       Digest[] ContainerImageDigests,
       string[] PlatformToggles         // e.g. ["pqc=on", "mode=strict"]
   );
   ```

5. **Implement Merkle utilities**
   * Provide:
     * `Digest Merkle.BuildRoot(IEnumerable<Digest> leaves)`.
   * Deterministic rules:
     * Sort leaves by `Value` (digest hex string) before building.
     * If there is an odd number of leaves, duplicate the last leaf or define an explicit strategy and document it.
   * Tie into:
     * Edges → `EdgeMerkleRoot`.
     * Per-stage attestation list → stage‑specific root.
6. **Schema documentation**
   * Markdown/ADR file:
     * Field definitions.
     * Which fields are hashed vs. metadata only.
     * How `EdgeId`, Merkle roots, and the PURL→BuildId mapping are generated.

#### DoD

* DTOs implemented in the shared library.
* Merkle root builder implemented and tested.
* Schema documented and shared across teams.

---

### WS4 – Pipeline integration

**Goal:** Each stage emits StepAttestations and (for reachability) EdgeAttestations, and Concelier emits the ProvenanceEnvelope.

We’ll do this stage by stage.

#### WS4.A – Sbomer integration

**Tasks**

1. Identify the **SBOM hash**:
   * After generating the SBOM, serialize it canonically and compute its `Digest`.
2. Collect **inputs**:
   * Input source digests (e.g., image digests, source artifact digests).
3. Collect **parameters**:
   * All relevant configuration into a `SbomerParams` object:
     * E.g. `scanDepth`, `excludedPaths`, `sbomFormat`.
   * Canonicalize and compute `ParametersDigest`.
4. Emit a **StepAttestation**:
   * Create the DTO.
   * Canonicalize & hash for Merkle tree use.
   * Wrap in a DSSE envelope with `payloadType = "stellaops.dev/attestation/step@v1"`.
   * Store the envelope:
     * Append to a standard location (e.g. `/attestations/sbomer-step.dsse.json`).
5. Add a config flag:
   * `--emit-attestations` (default: off initially, later: on by default).

#### WS4.B – Scanner integration

**Tasks**

1. Take the SBOM digest as an **InputDigest**.
2. Collect feed digests:
   * Each CVE/CPE/VEX feed file → canonical hash.
3. Compute the `ScannerParams` digest:
   * E.g. `severityThreshold`, `downloaderOptions`, `scanMode`.
4. Emit a **StepAttestation** (same pattern as Sbomer).
5. Tag scanner outputs:
   * The vulnerability findings file(s) should be content‑addressable (filename includes the digest, or store a meta manifest mapping).

#### WS4.C – Excititor/Vexer integration

**Tasks**

1. Integrate reachability graph emission:
   * From the final graph, **generate EdgeAttestations**:
     * One per edge `(from, to, rule)`.
   * For each edge, compute witness digests:
     * E.g. serialized CFG slice, symbol table snippet, call chain.
   * Those witness artifacts should be stored under canonical paths:
     * `/witnesses/<edgeId>/<digest>.json`.
2. Canonicalize & hash each EdgeAttestation.
3. Build a **Merkle root** over all edge attestation digests.
4. Emit the **Excititor StepAttestation**:
   * Inputs: SBOM, scanner findings, feeds, rule sets.
   * Outputs: VEX document(s), EdgeMerkleRoot digest.
   * Params: reachability flags, rule definitions digest.
5. Store:
   * Edge attestations, either:
     * One DSSE per edge (possibly a lot of files), or
     * A **batch file** containing a list of attestations wrapped into a single DSSE.
     * Prefer **batch** for performance; define an `EdgeAttestationBatch` DTO.
   * VEX output(s) with deterministic file naming.

#### WS4.D – Concelier integration

**Tasks**

1. Gather all **StepAttestations** & the **EdgeMerkleRoot**:
   * Input: references (paths) to stage outputs + their DSSE envelopes.
2. Build the `PurlToBuildId` map:
   * For each component:
     * Extract the PURL from the SBOM.
     * Extract the build-id from binary metadata.
3. Build **StageMerkleRoots**:
   * For each stage, compute the Merkle root of its StepAttestations.
   * In the simplest version: one step attestation per stage → the root is just its digest.
4. Construct the **ReplayManifest**:
   * From the final pipeline context (SBOM, feeds, rules, images, toggles).
   * Compute `ReplayManifestDigest` and store the manifest file (e.g. `replay-manifest.json`).
5. Construct the **ProvenanceEnvelope**:
   * Fill the fields with digests.
   * Canonicalize and sign with the Concelier key (DSSE).
6. Store outputs:
   * `provenance-envelope.dsse.json`.
   * `replay-manifest.json` (unsigned) + optional signed manifest.
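The Merkle construction used for `EdgeMerkleRoot` and `StageMerkleRoots` above can be sketched concisely. This is a language-agnostic illustration (Python here, not the proposed `StellaOps.Attestations` API) of the WS3 rules: sort the leaf digests first, duplicate the last leaf when a level has an odd count:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaf_digests: list[str]) -> str:
    # WS3 deterministic rules: sort leaves by hex value, then hash
    # pairwise upward; duplicate the last leaf on odd-sized levels.
    if not leaf_digests:
        return sha256_hex(b"")  # empty-tree sentinel (an assumption; document it)
    level = sorted(leaf_digests)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode("ascii"))
                 for i in range(0, len(level), 2)]
    return level[0]

# One StepAttestation per stage: the stage root collapses to that digest,
# matching the WS4.D simplification.
leaf = sha256_hex(b"step-attestation")
assert merkle_root([leaf]) == leaf
```

Because leaves are sorted before hashing, the root is independent of the order in which attestations were emitted, which is exactly what replay verification needs.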
#### WS4 DoD

* All four stages can:
  * Emit StepAttestations (and EdgeAttestations where applicable).
  * Produce a final ProvenanceEnvelope.
* The feature can be toggled via config.
* Pipelines run end‑to‑end in CI with attestation emission enabled.

---

### WS5 – Replay engine & CLI

**Goal:** Given a ReplayManifest, re‑run the pipeline and verify that all Merkle roots and digests match.

#### Tasks

1. Implement a **Replay Orchestrator** library:
   * Input:
     * Path/URL to `replay-manifest.json`.
   * Responsibilities:
     * Verify the manifest’s own digest (if signed).
     * Fetch or confirm the presence of:
       * SBOM.
       * Feeds.
       * Rule sets.
       * Container images.
     * Spin up each stage with parameters reconstructed from the manifest:
       * Ensure versions and flags match.
   * Implementation: shared orchestration code reusing existing pipeline entrypoints.
2. Implement a **CLI tool**: `stella-attest replay`
   * Commands:
     * `stella-attest replay run --manifest <path> --out <dir>`
       * Runs the pipeline and emits fresh attestations.
     * `stella-attest replay verify --manifest <path> --envelope <path> --attest-dir <dir>`
       * Compares:
         * Replay Merkle roots vs. the `ProvenanceEnvelope`.
         * Stage roots.
         * Edge root.
       * Emits a verification report (JSON + human-readable).
3. Verification logic:
   * Steps:
     1. Parse the ProvenanceEnvelope (verify its DSSE signature).
     2. Compute Merkle roots from the new replay’s attestations.
     3. Compare:
        * `ReplayManifestDigest` in the envelope vs. the digest of the manifest used.
        * `EdgeMerkleRoot` vs. the recalculated root.
        * `StageMerkleRoots[stage]` vs. the recalculated stage roots.
     4. Output:
        * `verified = true/false`.
        * If false, list mismatches with digests.
4. Tests:
   * Replay the same pipeline on the same machine → must match.
   * Replay on a different machine (CI job simulating a different environment) → must match.
   * Injected change in a feed or rule set → deliberate mismatch detected.

#### DoD

* `stella-attest replay` works locally and in CI.
* Documentation: “How to replay a run and verify determinism.”

---

### WS6 – Verification / air‑gap support

**Goal:** Allow verification in environments without outward network access.

#### Tasks

1. **Define export bundle format**
   * Bundle includes:
     * `provenance-envelope.dsse.json`.
     * `replay-manifest.json`.
     * All DSSE attestation files.
     * All witness artifacts (or digests only, if storage is local).
     * Public key material or certificate chains needed to verify signatures.
   * Represent as:
     * Tarball or zip: e.g. `stella-bundle-<runId>.tar.gz`.
     * Manifest file listing contents and digests.
2. **Implement exporter**
   * CLI: `stella-attest export --run-id <id> --out bundle.tar.gz`.
   * Internally:
     * Collect paths to all relevant artifacts for the run.
     * Canonicalize the folder structure (e.g. `/sbom`, `/scanner`, `/vex`, `/attestations`, `/witnesses`).
3. **Implement offline verifier**
   * CLI: `stella-attest verify-bundle --bundle <path>`.
   * Steps:
     * Unpack the bundle to a temp dir.
     * Verify:
       * Attestation signatures via the included public keys.
       * Merkle roots and digests as in WS5.
     * Do **not** attempt network calls.
4. **Documentation / runbook**
   * “How to verify a Stella Ops run in an air‑gapped environment.”
   * Include:
     * How to move bundles (e.g. via USB, secure file transfer).
     * What to do if verification fails.

#### DoD

* Bundles can be exported from a connected environment and verified in a disconnected environment using only the bundle contents.

---

### WS7 – Testing, observability, and rollout

**Goal:** Make this robust, observable, and gradually enabled in prod.

#### Tasks

1. **Integration tests**
   * Full pipeline scenario:
     * Start from a known SBOM + feeds + rules.
     * Run the pipeline twice and:
       * Compare final outputs: `ProvenanceEnvelope`, VEX doc, final reports.
       * Compare digests & Merkle roots.
   * Edge cases:
     * Different machines (simulate via CI jobs with different runners).
     * Missing or corrupted attestation file → verify that verification fails with a clear error.
2. **Property-based tests** (optional but great)
   * Generate random but structured SBOMs and graphs.
   * Ensure:
     * Canonicalization is idempotent.
     * Hashing is consistent.
     * Merkle roots are stable for repeated runs.
3. **Observability**
   * Add logging around:
     * Attestation creation & signing.
     * Verification failures.
     * Replay runs.
   * Add metrics:
     * Number of attestations per run.
     * Time spent in canonicalization / hashing / signing.
     * Verification success/fail counts.
4. **Rollout plan**
   1. **Phase 0 (dev only)**:
      * Attestation emission enabled by default in dev.
      * Verification run in CI only.
   2. **Phase 1 (staging)**:
      * Enable dual‑path: old behaviour + new attestations.
      * Run replay+verify in the staging pipeline.
   3. **Phase 2 (production, non‑enforced)**:
      * Enable attestation emission in prod.
      * Verification runs “side‑car” but does not block.
   4. **Phase 3 (production, enforced)**:
      * CI/CD gates fail if:
        * Signatures are invalid.
        * Merkle roots mismatch.
        * Envelope/manifest is missing.
5. **Documentation**
   * Developer docs:
     * “How to emit a StepAttestation from your service.”
     * “How to add new fields without breaking determinism.”
   * Operator docs:
     * “How to run replay & verification.”
     * “How to interpret failures and debug.”

#### DoD

* All new functionality covered by automated tests.
* Observability dashboards / alerts configured.
* Rollout phases defined with clear criteria for moving to the next phase.

---

## 5. How to turn this into tickets

You can break this down roughly like:

* **Epic 1:** Attestation core library (WS1 + WS2 + WS3).
* **Epic 2:** Stage integrations (WS4A–D).
* **Epic 3:** Replay & verification tooling (WS5 + WS6).
* **Epic 4:** Testing, observability, rollout (WS7).

If you want, next step I can:

* Turn each epic into **Jira-style stories** with acceptance criteria.
* Or produce **sample code stubs** (interfaces + minimal implementations) matching this plan.
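As a small taste of such a stub, the WS5 comparison step reduces to diffing two sets of roots. A hedged sketch (Python for illustration; the field names are assumptions, not the final schema):

```python
def compare_roots(envelope: dict, recomputed: dict) -> list[str]:
    # Compare the signed envelope's roots against roots recomputed from
    # a fresh replay. Returns mismatched field names; empty means verified.
    mismatches = []
    for field in ("replay_manifest_digest", "edge_merkle_root"):
        if envelope.get(field) != recomputed.get(field):
            mismatches.append(field)
    for stage, root in envelope.get("stage_merkle_roots", {}).items():
        if recomputed.get("stage_merkle_roots", {}).get(stage) != root:
            mismatches.append(f"stage_merkle_roots[{stage}]")
    return mismatches

env = {"replay_manifest_digest": "d1", "edge_merkle_root": "d2",
       "stage_merkle_roots": {"Sbomer": "d3"}}
assert compare_roots(env, dict(env)) == []  # identical replay verifies
```

Reporting *which* root diverged (rather than a bare pass/fail) is what makes the verification report actionable for operators.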