Here’s a compact, practical design for a smart‑difference scanner that produces tiny, verifiable binary deltas and plugs cleanly into a release/provenance workflow—explained from the ground up.
What this thing does (in plain words)
It compares two software artifacts (containers, packages, binaries), computes the smallest safe update between them, and emits both:
- a delta (what to apply),
- and proof (why it’s safe and who built it).
You get faster rollouts, smaller downloads, and auditable provenance—plus a built‑in rollback that’s just as verifiable.
Core idea
- Content‑defined chunking (CDC): Split files into variable‑size chunks using Rabin/CDC, so similar regions line up even if bytes shift. Build a Merkle DAG over the chunks.
- Deterministic delta ops: Delta = ordered ops, each either COPY <chunk-id> or ADD <chunk-bytes>. No “magic heuristics”; same inputs → same delta.
- Function‑level diffs (executables only): For ELF/PE, disassemble and compare by symbol/function to highlight semantic changes (added/removed/modified functions), but still ship chunk‑level ops for patching.
- Verification & attestation: Every delta links to attestations (SLSA/DSSE/cosign/Rekor) so a verifier can check builder identity, materials, and inclusion proofs offline.
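The chunking idea above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes a simple polynomial rolling hash in place of true Rabin fingerprinting, and uses deliberately tiny, hypothetical parameters (`WINDOW`, `MASK`, `MIN_CHUNK`, `MAX_CHUNK`) so the effect shows up on small inputs. The names `cdc_chunks` and `chunk_ids` are made up for this example.

```python
import hashlib
import random

# Hypothetical toy parameters; a real deployment would use a larger window
# and KiB-sized average chunks.
WINDOW = 16          # rolling-window size in bytes
MASK = (1 << 6) - 1  # cut when the low 6 hash bits are zero (~64 B average)
MIN_CHUNK = 16       # kept >= WINDOW so cuts only happen on a full window
MAX_CHUNK = 1024     # hard upper bound on chunk size
BASE = 257
MOD = (1 << 61) - 1
OUT = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a polynomial rolling hash."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            # Drop the byte that slides out of the window.
            h = (h - data[i - WINDOW] * OUT) % MOD
        h = (h * BASE + b) % MOD
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_ids(data: bytes) -> list[str]:
    """Content-addressed ids: the SHA-256 of each chunk."""
    return [hashlib.sha256(c).hexdigest() for c in cdc_chunks(data)]


if __name__ == "__main__":
    rng = random.Random(0)
    base = bytes(rng.getrandbits(8) for _ in range(20_000))
    target = b"inserted-prefix" + base  # shifts every following byte
    shared = set(chunk_ids(base)) & set(chunk_ids(target))
    print(f"{len(shared)} chunk ids survive the insertion")
```

Because a cut depends only on the bytes inside the window, boundaries re-align shortly after the inserted prefix, so most chunk ids are reusable as COPY ops. Fixed-size chunking would shift every boundary and share almost nothing.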
Supported inputs
- Blobs: OCI layers, .deb/.rpm payloads, zip/jar/war
- Binaries: ELF/PE segments (per‑section CDC first, then optional symbol compare)
Artifacts the scanner emits
delta-manifest.json (deterministic):
- base_digest, target_digest, artifact_type
- changed_chunks[] (ids, byte ranges)
- ops[] (COPY/ADD sequence)
- functions_changed (added/removed/modified counts; top symbols)
- materials_delta (new/removed deps & digests)
- attestations[] (DSSE/cosign refs, Rekor log pointers or embedded CT tile)
- score_inputs (pre‑computed metrics to keep scoring reproducible)
The actual delta payload is a compact binary: header + op stream + ADD byte blobs.
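One possible byte layout for that payload, as a hedged sketch: the text only fixes the three parts (header, op stream, ADD blobs), so the magic value `DLT1`, the opcode numbers, the little-endian field widths, and the function names `encode_delta` / `decode_delta` are all assumptions made for illustration.

```python
import hashlib
import struct

MAGIC = b"DLT1"            # hypothetical magic/version tag
OP_COPY, OP_ADD = 0x01, 0x02  # hypothetical opcodes


def encode_delta(ops):
    """ops: list of ("COPY", 32-byte chunk id) or ("ADD", bytes)."""
    stream, blobs = bytearray(), bytearray()
    for kind, payload in ops:
        if kind == "COPY":
            if len(payload) != 32:
                raise ValueError("chunk id must be a 32-byte SHA-256 digest")
            stream += bytes([OP_COPY]) + payload
        elif kind == "ADD":
            # The op stream stores only the length; bytes go to the blob area.
            stream += struct.pack("<BI", OP_ADD, len(payload))
            blobs += payload
        else:
            raise ValueError(f"unknown op {kind!r}")
    header = MAGIC + struct.pack("<II", len(ops), len(stream))
    return bytes(header + stream + blobs)


def decode_delta(buf: bytes):
    """Inverse of encode_delta; rejects malformed or padded input."""
    if buf[:4] != MAGIC:
        raise ValueError("bad magic")
    n_ops, stream_len = struct.unpack_from("<II", buf, 4)
    pos, end = 12, 12 + stream_len
    blob_pos, ops = end, []
    for _ in range(n_ops):
        tag = buf[pos]
        if tag == OP_COPY:
            ops.append(("COPY", buf[pos + 1:pos + 33]))
            pos += 33
        elif tag == OP_ADD:
            (length,) = struct.unpack_from("<I", buf, pos + 1)
            if blob_pos + length > len(buf):
                raise ValueError("truncated ADD blob")
            ops.append(("ADD", buf[blob_pos:blob_pos + length]))
            blob_pos += length
            pos += 5
        else:
            raise ValueError("unknown op tag")
    if pos != end or blob_pos != len(buf):
        raise ValueError("trailing or missing bytes")
    return ops
```

Keeping ADD bytes in a separate trailing area means the op stream can be scanned and validated without touching the (potentially large) new data.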
How verification works (offline‑first)
- Content addressability: chunk ids are hashes; COPY ops verify by recomputing.
- Attestations: DSSE/cosign bundle includes builder identity and materials[] digests. Rekor inclusion proof (or embedded tile fragment) lets verifiers reassemble the transparency chain without the Internet.
- Policy: if SLSA predicate present and policy threshold met → “green”; else fall back to vendor signature + content checks and mark provenance gaps.
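The offline inclusion check can be illustrated with a small Merkle proof verifier. This sketch assumes the log uses RFC 6962-style hashing (the Certificate Transparency scheme: `0x00` prefix for leaves, `0x01` for interior nodes); treat its fit to a specific Rekor deployment as an assumption. It checks only that a leaf is included under a given root. Verifying the signature on the root (the checkpoint) is a separate step not shown here. All function names are illustrative.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(entry: bytes) -> bytes:
    return _h(b"\x00" + entry)


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


def _split(n: int) -> int:
    """Largest power of two strictly less than n (n >= 2)."""
    return 1 << ((n - 1).bit_length() - 1)


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root over a non-empty list of leaf hashes."""
    if len(leaves) == 1:
        return leaves[0]
    k = _split(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_proof(index: int, leaves: list[bytes]) -> list[bytes]:
    """Audit path for leaves[index], ordered from the leaf up to the root."""
    if len(leaves) == 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return inclusion_proof(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return inclusion_proof(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


def verify_inclusion(index, tree_size, leaf, proof, root) -> bool:
    """Recompute the root from a leaf hash and its audit path."""
    if index >= tree_size:
        return False
    fn, sn, r = index, tree_size - 1, leaf
    for p in proof:
        if sn == 0:
            return False
        if fn % 2 == 1 or fn == sn:
            r = node_hash(p, r)
            while fn % 2 == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

A verifier needs only the leaf, the audit path, and a trusted root, all of which can be shipped alongside the artifact, which is what makes the check work in an air gap.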
Risk scoring (explainable)
Compute a single delta_risk from:
- provenance_completeness (SLSA level, DSSE validity, Rekor inclusion)
- delta_entropy (how many new bytes vs copies; unexpected high entropy is riskier)
- new_deps_count (materials delta)
- signed_attestation_validity (key/trust chain freshness)
- function_change_impact (count/criticality of changed symbols)
Expose the breakdown directly in UI so reviewers see why the score is what it is.
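A scoring function of this kind might look like the sketch below. The weights, the 0 to 100 scale, and the convention that every input is pre-normalised to [0, 1] are assumptions for illustration; only the five input names come from the design. `normalised_entropy` is one plausible way to derive the entropy input from the ADD bytes.

```python
import math
from collections import Counter

# Hypothetical weights in percentage points; they sum to 100.
WEIGHTS = {
    "provenance": 30,
    "entropy": 20,
    "deps": 15,
    "attestation_validity": 20,
    "fn_impact": 15,
}
# Inputs where 1.0 means "good"; they are inverted to obtain a risk.
HIGHER_IS_SAFER = {"provenance", "attestation_validity"}


def normalised_entropy(added: bytes) -> float:
    """Shannon entropy of the ADD bytes, scaled from bits/byte to [0, 1]."""
    if not added:
        return 0.0
    n = len(added)
    bits = -sum(c / n * math.log2(c / n) for c in Counter(added).values())
    return bits / 8.0


def delta_risk(score_inputs: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return (total risk on a 0-100 scale, per-input contribution)."""
    breakdown = {}
    for name in sorted(WEIGHTS):  # fixed iteration order keeps output stable
        value = score_inputs[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalised to [0, 1]")
        risk = 1.0 - value if name in HIGHER_IS_SAFER else value
        breakdown[name] = round(WEIGHTS[name] * risk, 6)
    return round(sum(breakdown.values()), 6), breakdown
```

Returning the per-input contributions alongside the total is what lets the UI show why a score came out the way it did, and keeping the function pure (no I/O, no clock) keeps it reproducible.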
Rollback that’s actually safe
- Rollback is just “apply the delta that leads back to the previous artifact” plus a signed rollback attestation anchored in the transparency log.
- Verifier refuses rollbacks without matching provenance or if the computed rollback delta doesn’t reproduce the earlier artifact’s digest.
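The two refusal conditions can be expressed as a small check. In this sketch, `attestation_valid` is a hypothetical boolean standing in for the result of a separate DSSE/transparency-log verification, and chunk ids are hex SHA-256 strings; both are assumptions for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def apply_delta(ops, source_chunks: dict[str, bytes]) -> bytes:
    """Rebuild an artifact from ops, re-hashing every COPY source."""
    out = bytearray()
    for kind, payload in ops:
        if kind == "COPY":
            chunk = source_chunks.get(payload)
            if chunk is None or sha256_hex(chunk) != payload:
                raise ValueError(f"COPY target {payload} missing or corrupt")
            out += chunk
        elif kind == "ADD":
            out += payload
        else:
            raise ValueError(f"unknown op {kind!r}")
    return bytes(out)


def verify_rollback(ops, current_chunks, previous_digest, attestation_valid) -> bool:
    """Accept only if provenance checks out AND the bytes reproduce exactly."""
    if not attestation_valid:
        return False
    try:
        restored = apply_delta(ops, current_chunks)
    except ValueError:
        return False
    return sha256_hex(restored) == previous_digest
```

The digest comparison is the important part: even a correctly signed rollback is rejected if applying it does not yield the earlier artifact byte for byte.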
Minimal internal data structures (sketch)
Chunk {
  id: sha256(bytes),
  size: u32,
  merkle: sha256(left||right)
}
DeltaOp = COPY {chunk_id} | ADD {len, bytes}
DeltaManifest {
  base_digest, target_digest, artifact_type,
  ops[], changed_chunks[],
  functions_changed: {added[], removed[], modified[]},
  materials_delta: {added[], removed[]},
  attestations: {dsse_bundle_ref, rekor_inclusion[]},
  score_inputs: {provenance, entropy, deps, attestation_validity, fn_impact}
}
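To make the "same inputs → same manifest" property concrete, here is a reduced version of these structures with a canonical JSON encoding. It is a sketch covering only a subset of the fields; the class layout, the choice of sorted-key compact JSON, and the helper names are assumptions, not a defined wire format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class DeltaOp:
    kind: str           # "COPY" or "ADD"
    chunk_id: str = ""  # set for COPY
    length: int = 0     # set for ADD (the bytes live in the binary payload)


@dataclass
class DeltaManifest:
    base_digest: str
    target_digest: str
    artifact_type: str
    ops: list[DeltaOp] = field(default_factory=list)
    materials_delta: dict[str, list[str]] = field(
        default_factory=lambda: {"added": [], "removed": []}
    )


def canonical_json(manifest: DeltaManifest) -> bytes:
    """Stable encoding: sorted keys, no whitespace, sorted set-like tables."""
    doc = asdict(manifest)
    # Dependency lists are sets in spirit, so sort them; op order is
    # meaningful and is left untouched.
    for key in ("added", "removed"):
        doc["materials_delta"][key] = sorted(doc["materials_delta"][key])
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def manifest_digest(manifest: DeltaManifest) -> str:
    return hashlib.sha256(canonical_json(manifest)).hexdigest()
```

Two scanners that discover dependencies in a different order still produce byte-identical manifests, so the manifest digest can be signed and compared directly.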
Pipeline (end‑to‑end)
- Ingest base & target → normalize (strip nondeterministic metadata; preserve signatures).
- CDC pass → chunk map → Merkle DAGs.
- Delta construction (greedy minimal ADDs, prefer COPY of identical chunk ids).
- (Executables) symbol table → lightweight disassembly → function map diff.
- Attestation linkage → attach DSSE bundle refs + Rekor proofs.
- Scoring → deterministic delta_risk + breakdown.
- Emit delta.manifest + delta.bin.
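The delta-construction step of the pipeline can be sketched as follows, assuming the chunk lists have already been produced by an upstream CDC pass. The function names and the tuple representation of ops are illustrative choices.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def build_delta(base_chunks: list[bytes], target_chunks: list[bytes]):
    """Walk the target in order: COPY chunks the base already has, ADD the rest."""
    base_ids = {chunk_id(c) for c in base_chunks}
    ops = []
    for chunk in target_chunks:
        cid = chunk_id(chunk)
        if cid in base_ids:
            ops.append(("COPY", cid))
        elif ops and ops[-1][0] == "ADD":
            # Merge neighbouring ADDs so new data travels as one blob.
            ops[-1] = ("ADD", ops[-1][1] + chunk)
        else:
            ops.append(("ADD", chunk))
    return ops


def apply_delta(ops, base_chunks: list[bytes]) -> bytes:
    """Replay ops against the base chunk store to rebuild the target."""
    store = {chunk_id(c): c for c in base_chunks}
    return b"".join(store[p] if kind == "COPY" else p for kind, p in ops)
```

The output depends only on the two chunk lists and the target order, so running the builder twice on the same inputs yields an identical op sequence.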
UI: what reviewers see
- Top changed functions (name, section, size delta, call‑fanout hint)
- Provenance panel (SLSA level, DSSE signer, Rekor entry—click to open)
- Delta anatomy (COPY/ADD ratio, entropy, bytes added)
- Dependencies delta (new/removed materials with digests)
- “Apply” / “Rollback” buttons gated by policy & attestation validity
How this fits your Stella Ops stack (drop‑in plan)
- Module: add DeltaScanner service under Evidence/Attestor boundary.
- Air‑gap: store DSSE bundles and Rekor tile fragments alongside artifacts in EvidenceLocker.
- SBOM/VEX: on delta, also diff SBOM nodes and attach a delta‑SBOM for impacted components; feed VEX evaluation to AdvisoryAI for surfaced risk notes.
- Release gates: block promotion if delta_risk > threshold or provenance_completeness < policy.
- CLI: stella delta create|verify|apply|rollback --base A --target B --policy policy.yaml.
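The release-gate rule reduces to two comparisons. In the sketch below, the `GatePolicy` field names are hypothetical stand-ins for whatever policy.yaml ends up defining; the source does not specify that schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatePolicy:
    # Hypothetical fields; the real policy.yaml schema is not defined here.
    max_delta_risk: float
    min_provenance_completeness: float


def promotion_blockers(delta_risk: float, provenance_completeness: float,
                       policy: GatePolicy) -> list[str]:
    """Return human-readable reasons to block; an empty list means promote."""
    reasons = []
    if delta_risk > policy.max_delta_risk:
        reasons.append(
            f"delta_risk {delta_risk} exceeds threshold {policy.max_delta_risk}"
        )
    if provenance_completeness < policy.min_provenance_completeness:
        reasons.append(
            f"provenance_completeness {provenance_completeness} is below "
            f"policy minimum {policy.min_provenance_completeness}"
        )
    return reasons
```

Returning reasons rather than a bare boolean keeps the gate explainable in the same way the risk breakdown is.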
Implementation notes (concise)
- CDC: Rabin fingerprinting over a 48–64 B sliding window; average chunk 4–16 KiB; a chunk boundary is declared wherever the masked rolling hash matches a fixed pattern.
- Hashing: BLAKE3 for speed; SHA‑256 for interop (store both if needed).
- Disassembly: Capstone/llvm‑objdump (ELF/PE), symbol map fallback if stripped.
- Determinism: fix chunk params, hash orderings, and traversal; sort tables prior to emit.
- Security: validate all COPY targets exist in base; cap ADD size; verify DSSE before score.
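The security checks in the last note could be run as a pre-flight pass before any bytes are applied or scored. This is a sketch under stated assumptions: `MAX_ADD_BYTES` is an arbitrary example cap, and `dsse_verified` is a hypothetical flag carrying the outcome of DSSE verification done elsewhere.

```python
MAX_ADD_BYTES = 1 << 20  # hypothetical 1 MiB cap per ADD op


def validate_ops(ops, base_chunk_ids: set[str], dsse_verified: bool) -> list[str]:
    """Collect every policy violation instead of stopping at the first one."""
    problems = []
    if not dsse_verified:
        problems.append("DSSE envelope not verified; refusing to score")
    for i, (kind, payload) in enumerate(ops):
        if kind == "COPY":
            if payload not in base_chunk_ids:
                problems.append(f"op {i}: COPY target {payload} not in base")
        elif kind == "ADD":
            if len(payload) > MAX_ADD_BYTES:
                problems.append(f"op {i}: ADD of {len(payload)} bytes exceeds cap")
        else:
            problems.append(f"op {i}: unknown op {kind!r}")
    return problems
```

An empty result means the delta is structurally safe to apply; it says nothing about whether the resulting artifact matches the target digest, which is checked separately.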
Deliverables you can ship quickly
- delta-scanner lib (CDC + DAG + ops)
- delta-verify (attestations, Rekor proof check offline)
- delta-score (pure function over delta-manifest)
- UI panels: Delta, Provenance, Risk (reuse Stella’s style system)
- CI job: create delta + attach DSSE + upload to EvidenceLocker
Test matrix (essentials)
- Small edit in large file (ADD minimal)
- Repacked zip with same payload (COPY dominates)
- Stripped vs non‑stripped ELF (function compare degrades gracefully)
- Added dependency layer in OCI (materials_delta flagged)
- Missing SLSA but valid vendor sig (gap recorded, lower provenance score)
- Rollback with/without signed rollback attestation (accept/deny)
If you want, I can generate:
- a ready‑to‑commit Go/.NET reference implementation skeleton,
- a policy.yaml template with thresholds,
- and UI wireframes (ASCII + Mermaid) for the three panels.