git.stella-ops.org/docs/features/unimplemented/binaryindex/byte-level-binary-diffing-with-rolling-hash-windows.md
2026-02-12 10:27:23 +02:00


Byte-Level Binary Diffing with Rolling Hash Windows

Module

BinaryIndex

Status

PARTIALLY_IMPLEMENTED

Description

Byte-level binary comparison that uses rolling hash windows to identify exactly which byte ranges changed between two binary versions. It produces binary proof snippets with section analysis, along with privacy controls that strip raw bytes from those snippets. Both stream-based and file-based comparison are supported.
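The description does not specify the hash function or window size, and the current engine is function-level (see Implementation Gaps), so the following is not the StellaOps code. It is a minimal Python sketch of the technique, assuming an rsync-style scheme: a polynomial (Rabin-Karp style) rolling hash is computed for every w-byte window of the old binary and indexed, and any byte of the new binary not covered by a byte-verified matching window is reported as changed. The names `window_hashes` and `changed_ranges` and the default window size are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

BASE = 257
MOD = (1 << 61) - 1  # large Mersenne prime modulus for the polynomial hash


def window_hashes(data: bytes, w: int) -> List[int]:
    """Rolling hash of every w-byte window, computed in O(len(data))."""
    if len(data) < w:
        return []
    top = pow(BASE, w - 1, MOD)  # weight of the byte that leaves the window
    h = 0
    for b in data[:w]:
        h = (h * BASE + b) % MOD
    hashes = [h]
    for i in range(w, len(data)):
        # Drop data[i - w], shift, and add data[i] in O(1).
        h = ((h - data[i - w] * top) * BASE + data[i]) % MOD
        hashes.append(h)
    return hashes


def changed_ranges(old: bytes, new: bytes, w: int = 4) -> List[Tuple[int, int]]:
    """Half-open [start, end) offsets in `new` not covered by any window found in `old`."""
    index: Dict[int, List[int]] = {}
    for off, h in enumerate(window_hashes(old, w)):
        index.setdefault(h, []).append(off)

    covered = [False] * len(new)
    for off, h in enumerate(window_hashes(new, w)):
        # Compare bytes for every hash hit so collisions never count as matches.
        if any(old[o:o + w] == new[off:off + w] for o in index.get(h, [])):
            for i in range(off, off + w):
                covered[i] = True

    # Collapse runs of uncovered bytes into ranges with exact offsets.
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, is_covered in enumerate(covered):
        if not is_covered and start is None:
            start = i
        elif is_covered and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, len(new)))
    return ranges
```

This reports offsets in the new version only; calling `changed_ranges(new, old)` gives the symmetric view (bytes removed from the old version). Larger windows reduce accidental matches on repetitive data at the cost of coarser range boundaries.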

Implementation Details

  • Modules: src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/
  • Key Classes:
    • PatchDiffEngine (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/PatchDiffEngine.cs) - core diffing engine that compares binary versions at the function level using function fingerprints (not byte ranges; see Implementation Gaps)
    • FunctionDiffer (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/FunctionDiffer.cs) - function-level comparison with optional semantic analysis and call-graph edge diffing
    • FunctionRenameDetector (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/FunctionRenameDetector.cs) - detects function renames between versions using fingerprint similarity
    • VerdictCalculator (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/VerdictCalculator.cs) - computes patch verification verdicts from diff results
    • InMemoryDiffResultStore (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/Storage/InMemoryDiffResultStore.cs) - in-memory store for diff results; content-addressed IDs are the goal, but IDs are currently random GUIDs (see Implementation Gaps)
  • Models: PatchDiffModels, DiffEvidenceModels, BinaryReference (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/Models/)
  • Interfaces: IPatchDiffEngine, IDiffResultStore (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/)
  • Source: SPRINT_20260112_200_004_CHGTRC_byte_diffing.md
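FunctionRenameDetector is described only as using fingerprint similarity. The Python sketch below shows one way such a detector can work, assuming (hypothetically) that each function's fingerprint is a set of feature tokens: functions that disappear from the old version are paired with functions that appear in the new version when their Jaccard similarity meets a threshold, using greedy one-to-one matching. The token-set representation, the Jaccard metric, the 0.8 threshold, and the names `jaccard` and `detect_renames` are assumptions, not the class's actual API.

```python
from typing import Dict, FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Similarity of two fingerprint token sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def detect_renames(
    old: Dict[str, FrozenSet[str]],
    new: Dict[str, FrozenSet[str]],
    threshold: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """Pair removed and added function names whose fingerprints are similar enough."""
    removed = [name for name in old if name not in new]
    added = [name for name in new if name not in old]

    candidates = []
    for o in removed:
        for n in added:
            score = jaccard(old[o], new[n])
            if score >= threshold:
                candidates.append((score, o, n))

    # Best score first; ties broken by name so the result is deterministic.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    used_old, used_new = set(), set()
    renames = []
    for score, o, n in candidates:
        if o in used_old or n in used_new:
            continue  # each function participates in at most one rename
        used_old.add(o)
        used_new.add(n)
        renames.append((o, n, score))
    return renames
```

Sorting by score and then by name keeps the output stable when scores tie, which matters if rename results later feed into content-addressed diff IDs.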

E2E Test Plan

  • Submit two binary versions and verify byte-range differences are identified with correct offsets
  • Verify section analysis identifies which ELF sections changed (.text, .data, .rodata)
  • Verify privacy controls strip raw bytes from proof snippets when configured
  • Verify FunctionRenameDetector correctly identifies renamed functions between versions
  • Verify VerdictCalculator produces the correct patch verification verdict (patched vs. unpatched)
  • Verify diff results are stored with deterministic content-addressed IDs
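Section analysis and the raw-byte privacy control are not yet in the engine (see Implementation Gaps). A hedged Python sketch of one way to combine them: changed file-offset ranges are clipped to each section's file extent, and each resulting snippet records a SHA-256 digest of its bytes, carrying the raw bytes only when `include_raw` is set. The `Section` and `ProofSnippet` types and `build_snippets` are hypothetical. A real implementation would read section names, offsets, and sizes from the ELF section header table (`sh_offset`, `sh_size`) instead of taking them as input, and this sketch ignores bytes that fall outside every listed section.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Section:
    name: str    # e.g. ".text", ".rodata", ".data"
    offset: int  # file offset where the section's bytes start
    size: int    # number of bytes the section occupies in the file


@dataclass(frozen=True)
class ProofSnippet:
    section: str
    start: int             # file offset, inclusive
    end: int               # file offset, exclusive
    sha256: str            # digest of new[start:end], always present
    raw: Optional[bytes]   # None when raw bytes are stripped for privacy


def build_snippets(
    new: bytes,
    ranges: List[Tuple[int, int]],
    sections: List[Section],
    include_raw: bool = False,
) -> List[ProofSnippet]:
    """Attribute changed byte ranges to sections, optionally stripping raw bytes."""
    snippets = []
    for start, end in ranges:
        for sec in sections:
            # Clip the changed range to this section's extent in the file.
            lo = max(start, sec.offset)
            hi = min(end, sec.offset + sec.size)
            if lo >= hi:
                continue  # no overlap with this section
            chunk = new[lo:hi]
            snippets.append(ProofSnippet(
                section=sec.name,
                start=lo,
                end=hi,
                sha256=hashlib.sha256(chunk).hexdigest(),
                raw=chunk if include_raw else None,
            ))
    return snippets
```

A range that straddles a section boundary yields one snippet per section, so a verifier can still check each digest against the published binary without the snippet exposing the bytes themselves.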

Implementation Gaps (QA 2026-02-11)

  • The current diff engine is function/CFG-level (PatchDiffEngine + FunctionDiffer) and does not implement byte-range rolling-window outputs with exact offsets.
  • Section-aware diff outputs (.text/.data/.rodata) and privacy controls that strip raw proof bytes are not exposed by the current models or engine behavior.
  • InMemoryDiffResultStore stores results using Guid.NewGuid() rather than deterministic content-addressed IDs.
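The last gap implies replacing `Guid.NewGuid()` with an ID derived from the result's content. A minimal Python sketch, assuming diff results can be serialized as JSON: the ID is the SHA-256 of a canonical serialization (sorted keys, no insignificant whitespace), so identical results always map to the same ID and storing a result twice is idempotent. The `sha256:` prefix and the names `content_address` and `InMemoryStore` are illustrative assumptions. In the C# library, `SHA256.HashData` over the canonical UTF-8 bytes could play the same role.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def content_address(result: Dict[str, Any]) -> str:
    """Deterministic ID: SHA-256 over a canonical JSON encoding of the result."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryStore:
    """Hypothetical store keyed by content address instead of a random GUID."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def put(self, result: Dict[str, Any]) -> str:
        key = content_address(result)
        self._items.setdefault(key, result)  # re-storing identical content is a no-op
        return key

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        return self._items.get(key)
```

Sorting keys removes field-order differences but does not settle every canonicalization question (float formatting, for example); RFC 8785 (JSON Canonicalization Scheme) defines a complete canonical form if one is needed.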