git.stella-ops.org/docs-archived/product/advisories/2026-03-04 - Smart‑diff and binary provenance chain.md

Here's a compact, practical design for a smart-difference scanner that produces tiny, verifiable binary deltas and plugs cleanly into a release/provenance workflow, explained from the ground up.


What this thing does (in plain words)

It compares two software artifacts (containers, packages, binaries), computes the smallest safe update between them, and emits both:

  • a delta (what to apply),
  • and proof (why it's safe and who built it).

You get faster rollouts, smaller downloads, and auditable provenance, plus a built-in rollback that's just as verifiable.


Core idea

  1. Content-defined chunking (CDC): split files into variable-size chunks using Rabin/CDC, so similar regions line up even if bytes shift. Build a Merkle DAG over the chunks.
  2. Deterministic delta ops: a delta is an ordered list of ops, each either COPY <chunk-id> or ADD <chunk-bytes>. No “magic heuristics”; same inputs → same delta.
  3. Function-level diffs (executables only): for ELF/PE, disassemble and compare by symbol/function to highlight semantic changes (added/removed/modified functions), but still ship chunk-level ops for patching.
  4. Verification & attestation: every delta links to attestations (SLSA/DSSE/cosign/Rekor) so a verifier can check builder identity, materials, and inclusion proofs offline.
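
To make step 1 concrete, here is a minimal chunker sketch. It is not the design's actual implementation: it swaps in a plain polynomial rolling hash for a true Rabin fingerprint, and the window, mask, and size limits (WINDOW, MASK, MIN_SIZE, MAX_SIZE) are assumed values chosen for illustration.

```python
import hashlib

WINDOW = 48             # rolling-hash window in bytes (assumed)
MASK = (1 << 12) - 1    # cut when the low 12 hash bits are zero (assumed)
MIN_SIZE = 1024         # never cut before this many bytes (assumed)
MAX_SIZE = 16 * 1024    # always cut at this size (assumed)
BASE = 257
MOD = (1 << 61) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def split_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Drop the oldest byte so h covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + byte) % MOD
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_ids(data: bytes) -> list[str]:
    """Chunk ids are content hashes, as in the design."""
    return [hashlib.sha256(c).hexdigest() for c in split_chunks(data)]
```

Because a boundary depends only on the last WINDOW bytes, inserting a few bytes near the start of a file changes the first chunk or two and leaves the ids of later chunks intact, which is what lets COPY ops dominate.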

Supported inputs

  • Blobs: OCI layers, .deb/.rpm payloads, zip/jar/war
  • Binaries: ELF/PE segments (per-section CDC first, then optional symbol compare)

Artifacts the scanner emits

delta-manifest.json (deterministic):

  • base_digest, target_digest, artifact_type
  • changed_chunks[] (ids, byte ranges)
  • ops[] (COPY/ADD sequence)
  • functions_changed (added/removed/modified counts; top symbols)
  • materials_delta (new/removed deps & digests)
  • attestations[] (DSSE/cosign refs, Rekor log pointers or embedded CT tile)
  • score_inputs (precomputed metrics to keep scoring reproducible)

The actual delta payload is a compact binary: header + op stream + ADD byte blobs.
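
The text does not pin down the byte layout, so the following is only one possible encoding of "header + op stream + ADD byte blobs". The magic value, version field, op tags, and field widths are all hypothetical.

```python
import struct

MAGIC = b"SDLT"            # hypothetical magic bytes
VERSION = 1                # hypothetical format version
OP_COPY, OP_ADD = 0x01, 0x02
HEADER = ">HII"            # version, op count, op-stream length (big-endian)


def encode_delta(ops):
    """ops: list of ("COPY", 32-byte chunk id) or ("ADD", payload bytes)."""
    stream, blobs = bytearray(), bytearray()
    for kind, value in ops:
        if kind == "COPY":
            if len(value) != 32:
                raise ValueError("chunk id must be 32 bytes")
            stream += bytes([OP_COPY]) + value
        elif kind == "ADD":
            stream += struct.pack(">BI", OP_ADD, len(value))
            blobs += value          # payloads are stored after the op stream
        else:
            raise ValueError(f"unknown op {kind!r}")
    header = MAGIC + struct.pack(HEADER, VERSION, len(ops), len(stream))
    return bytes(header + stream + blobs)


def decode_delta(buf):
    if buf[:4] != MAGIC:
        raise ValueError("bad magic")
    version, n_ops, stream_len = struct.unpack_from(HEADER, buf, 4)
    if version != VERSION:
        raise ValueError("unsupported version")
    pos = 4 + struct.calcsize(HEADER)
    stream_end = pos + stream_len
    blob_pos = stream_end
    ops = []
    for _ in range(n_ops):
        tag = buf[pos]
        if tag == OP_COPY:
            ops.append(("COPY", bytes(buf[pos + 1:pos + 33])))
            pos += 33
        elif tag == OP_ADD:
            (length,) = struct.unpack_from(">I", buf, pos + 1)
            ops.append(("ADD", bytes(buf[blob_pos:blob_pos + length])))
            pos += 5
            blob_pos += length
        else:
            raise ValueError("unknown op tag")
    if pos != stream_end or blob_pos != len(buf):
        raise ValueError("truncated or trailing data")
    return ops
```

Keeping ADD payloads in a separate trailing region means the op stream can be scanned and validated without touching the bulk bytes.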


How verification works (offline-first)

  • Content addressability: chunk ids are hashes; COPY ops verify by recomputing.
  • Attestations: DSSE/cosign bundle includes builder identity and materials[] digests. Rekor inclusion proof (or embedded tile fragment) lets verifiers reassemble the transparency chain without the Internet.
  • Policy: if SLSA predicate present and policy threshold met → “green”; else fall back to vendor signature + content checks and mark provenance gaps.
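
The policy rule can be sketched as a small pure function. Only "green" comes from the text; the "yellow" and "red" labels, the Evidence fields, and the default minimum SLSA level are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    content_ok: bool            # chunk and artifact digests recomputed and matched
    slsa_level: Optional[int]   # None when no SLSA predicate is present
    dsse_valid: bool
    rekor_included: bool
    vendor_sig_valid: bool


def evaluate(e: Evidence, min_slsa_level: int = 2):
    """Return (status, provenance_gaps)."""
    if not e.content_ok:
        return "red", ["content digest mismatch"]
    gaps = []
    if e.slsa_level is None:
        gaps.append("no SLSA predicate")
    elif e.slsa_level < min_slsa_level:
        gaps.append("SLSA level below policy")
    if not e.dsse_valid:
        gaps.append("DSSE envelope not verified")
    if not e.rekor_included:
        gaps.append("no Rekor inclusion proof")
    if not gaps:
        return "green", []
    # Fallback path: vendor signature plus content checks, gaps recorded.
    if e.vendor_sig_valid:
        return "yellow", gaps
    return "red", gaps + ["no valid vendor signature"]
```

Returning the gap list alongside the status keeps the provenance gaps visible to reviewers instead of collapsing them into a single flag.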

Risk scoring (explainable)

Compute a single delta_risk from:

  • provenance_completeness (SLSA level, DSSE validity, Rekor inclusion)
  • delta_entropy (how many new bytes vs copies; unexpected high entropy is riskier)
  • new_deps_count (materials delta)
  • signed_attestation_validity (key/trust chain freshness)
  • function_change_impact (count/criticality of changed symbols)

Expose the breakdown directly in UI so reviewers see why the score is what it is.
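
One way to get a single explainable number is a fixed weighted sum over normalized inputs. The weights, the 0 to 100 scale, the MAX_DEPS cap, and the assumption that each input arrives normalized to [0, 1] (except the dependency count) are illustrative choices, not values from the design.

```python
# Hypothetical weights; they sum to 100 so the score reads as a percentage.
WEIGHTS = {
    "provenance_completeness": 30,
    "delta_entropy": 20,
    "new_deps_count": 15,
    "signed_attestation_validity": 20,
    "function_change_impact": 15,
}
MAX_DEPS = 10  # assumed: this many new dependencies counts as maximum risk


def _clamp(x):
    return max(0.0, min(1.0, float(x)))


def delta_risk(inputs):
    """Return (score, breakdown); higher means riskier."""
    risk = {
        # Completeness and validity are "good" signals, so invert them.
        "provenance_completeness": 1.0 - _clamp(inputs["provenance_completeness"]),
        "delta_entropy": _clamp(inputs["delta_entropy"]),
        "new_deps_count": _clamp(inputs["new_deps_count"] / MAX_DEPS),
        "signed_attestation_validity": 1.0 - _clamp(inputs["signed_attestation_validity"]),
        "function_change_impact": _clamp(inputs["function_change_impact"]),
    }
    # Sorted iteration keeps the summation order, and so the result, stable.
    breakdown = {name: WEIGHTS[name] * risk[name] for name in sorted(WEIGHTS)}
    return sum(breakdown.values()), breakdown
```

The breakdown dictionary is what the UI would render: each entry is that factor's contribution to the total.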


Rollback that's actually safe

  • Rollback is just “apply a delta back to the previous artifact” plus a signed rollback attestation anchored in the transparency log.
  • Verifier refuses rollbacks without matching provenance or if the computed rollback delta doesn't reproduce the earlier artifact's digest.
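
The two refusal conditions can be sketched as a single check. The attestation is modeled here as a plain dict with hypothetical keys (signature_verified, subject_digest); a real verifier would derive those from a checked DSSE envelope and Rekor proof.

```python
import hashlib


def verify_rollback(rebuilt: bytes, previous_digest: str, attestation):
    """Return (accepted, reason) for a rollback whose delta produced `rebuilt`."""
    if attestation is None:
        return False, "missing rollback attestation"
    if not attestation.get("signature_verified"):
        return False, "rollback attestation signature not verified"
    if attestation.get("subject_digest") != previous_digest:
        return False, "attestation does not cover the previous artifact"
    if hashlib.sha256(rebuilt).hexdigest() != previous_digest:
        return False, "rollback delta does not reproduce the previous digest"
    return True, "ok"
```

Both checks are required: a valid attestation with a wrong digest, or a correct digest without an attestation, is rejected.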

Minimal internal data structures (sketch)

Chunk {
  id: sha256(bytes),
  size: u32,
  merkle: sha256(left||right)
}

DeltaOp = COPY {chunk_id} | ADD {len, bytes}

DeltaManifest {
  base_digest, target_digest, artifact_type,
  ops[], changed_chunks[],
  functions_changed: {added[], removed[], modified[]},
  materials_delta: {added[], removed[]},
  attestations: {dsse_bundle_ref, rekor_inclusion[]},
  score_inputs: {provenance, entropy, deps, attestation_validity, fn_impact}
}
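
The merkle field in the sketch hashes a left and right child together. A minimal way to fold an ordered list of chunk ids into one root along those lines is shown below; promoting an unpaired node unchanged, and the absence of leaf/node domain separation, are simplifying assumptions rather than part of the design.

```python
import hashlib


def merkle_root(chunk_ids: list[bytes]) -> bytes:
    """Fold ordered chunk ids (raw SHA-256 digests) into a single root."""
    if not chunk_ids:
        raise ValueError("need at least one chunk id")
    level = list(chunk_ids)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Interior node: sha256(left || right), as in the sketch.
                nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                # Odd node out is promoted unchanged (assumed convention).
                nxt.append(level[i])
        level = nxt
    return level[0]
```

Since the root depends on chunk order, two artifacts with the same chunks in a different order get different roots.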

Pipeline (end-to-end)

  1. Ingest base & target → normalize (strip non-deterministic metadata; preserve signatures).
  2. CDC pass → chunk map → Merkle DAGs.
  3. Delta construction (greedy minimal ADDs, prefer COPY of identical chunk ids).
  4. (Executables) symbol table → lightweight disassembly → function map diff.
  5. Attestation linkage → attach DSSE bundle refs + Rekor proofs.
  6. Scoring → deterministic delta_risk + breakdown.
  7. Emit delta-manifest.json + delta.bin.
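
Step 3 and the matching apply side can be sketched in a few lines. Chunks are taken as already split (step 2); the function names and the tuple representation of ops are assumptions for illustration.

```python
import hashlib


def cid(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def build_delta(base_chunks, target_chunks):
    """Walk the target in order: COPY chunks the base already has, ADD the rest."""
    base_ids = {cid(c) for c in base_chunks}
    ops = []
    for chunk in target_chunks:
        chunk_id = cid(chunk)
        if chunk_id in base_ids:
            ops.append(("COPY", chunk_id))
        else:
            ops.append(("ADD", chunk))
    return ops


def apply_delta(base_chunks, ops, target_digest: str) -> bytes:
    """Rebuild the target and refuse to return it unless the digest matches."""
    store = {cid(c): c for c in base_chunks}
    out = bytearray()
    for kind, value in ops:
        if kind == "COPY":
            if value not in store:
                raise ValueError("COPY references a chunk missing from base")
            out += store[value]
        elif kind == "ADD":
            out += value
        else:
            raise ValueError(f"unknown op {kind!r}")
    result = bytes(out)
    if hashlib.sha256(result).hexdigest() != target_digest:
        raise ValueError("rebuilt artifact does not match target_digest")
    return result
```

The builder has no tunable heuristics, so the same base and target chunk lists always yield the same op list.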

UI: what reviewers see

  • Top changed functions (name, section, size delta, call-fanout hint)
  • Provenance panel (SLSA level, DSSE signer, Rekor entry—click to open)
  • Delta anatomy (COPY/ADD ratio, entropy, bytes added)
  • Dependencies delta (new/removed materials with digests)
  • “Apply” / “Rollback” buttons gated by policy & attestation validity

How this fits your StellaOps stack (drop-in plan)

  • Module: add DeltaScanner service under Evidence/Attestor boundary.
  • Air-gap: store DSSE bundles and Rekor tile fragments alongside artifacts in EvidenceLocker.
  • SBOM/VEX: on delta, also diff SBOM nodes and attach a delta-SBOM for impacted components; feed VEX evaluation to AdvisoryAI for surfaced risk notes.
  • Release gates: block promotion if delta_risk > threshold or provenance_completeness < policy.
  • CLI: stella delta create|verify|apply|rollback --base A --target B --policy policy.yaml.

Implementation notes (concise)

  • CDC: Rabin fingerprinting window 48-64 B; average chunk 4-16 KiB; rolling mask yields boundaries.
  • Hashing: BLAKE3 for speed; SHA-256 for interop (store both if needed).
  • Disassembly: Capstone/llvm-objdump (ELF/PE), symbol map fallback if stripped.
  • Determinism: fix chunk params, hash orderings, and traversal; sort tables prior to emit.
  • Security: validate all COPY targets exist in base; cap ADD size; verify DSSE before score.
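
For the determinism note, a minimal sketch of canonical manifest emission follows. It assumes the manifest is a JSON-compatible dict, that materials_delta holds lists of strings, and that ops[] is order-sensitive and must not be sorted; the function names are hypothetical.

```python
import hashlib
import json


def canonical_manifest(manifest: dict) -> bytes:
    """Serialize with sorted keys, sorted set-like tables, and no whitespace."""
    m = dict(manifest)  # shallow copy so the caller's dict is not mutated
    if "materials_delta" in m:
        # Set-like tables are sorted; ops[] keeps its order because it is a program.
        m["materials_delta"] = {k: sorted(v) for k, v in m["materials_delta"].items()}
    return json.dumps(m, sort_keys=True, separators=(",", ":")).encode("utf-8")


def manifest_digest(manifest: dict) -> str:
    return hashlib.sha256(canonical_manifest(manifest)).hexdigest()
```

With this in place, two runs that build the same manifest in a different insertion order still produce byte-identical output, so the manifest digest can itself be attested.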

Deliverables you can ship quickly

  • delta-scanner lib (CDC + DAG + ops)
  • delta-verify (attestations, Rekor proof check offline)
  • delta-score (pure function over delta-manifest)
  • UI panels: Delta, Provenance, Risk (reuse Stella's style system)
  • CI job: create delta + attach DSSE + upload to EvidenceLocker

Test matrix (essentials)

  • Small edit in large file (ADD minimal)
  • Repacked zip with same payload (COPY dominates)
  • Stripped vs non-stripped ELF (function compare graceful)
  • Added dependency layer in OCI (materials_delta flagged)
  • Missing SLSA but valid vendor sig (gap recorded, lower score)
  • Rollback with/without signed rollback attestation (accept/deny)

If you want, I can generate:

  • a ready-to-commit Go/.NET reference implementation skeleton,
  • a policy.yaml template with thresholds,
  • and UI wireframes (ASCII + Mermaid) for the three panels.