2026-02-17 00:51:35 +02:00


Hybrid Diff Stack Architecture (Source -> Symbols -> Normalized Bytes)

Status: Planned (advisory translation, 2026-02-16)

Module: BinaryIndex with cross-module contracts (Symbols, EvidenceLocker, Policy, Attestor, ReleaseOrchestrator)

1. Objective

Produce compact, auditable patch artifacts that preserve developer intent and binary truth at the same time:

  • Source-level intent: semantic edit scripts anchored to classes/functions.
  • Build-level mapping: symbol map linked to immutable build identity.
  • Binary-level patching: normalization-first per-symbol deltas.
  • Release evidence: DSSE-signed contract consumed by policy and replay.

2. Current implementation baseline

Implemented today:

  • ELF normalization passes and deterministic delta hash generation.
  • DeltaSig predicate contracts (v1 and v2) with CLI author/sign/verify flows.
  • Symbol manifest model with debug id, code id, source paths, and line data.

Gaps for full advisory scope:

  • No AST semantic edit script artifact pipeline in the current release workflow.
  • No canonical builder output for source-range to symbol-address map as a first-class build artifact contract.
  • No end-to-end "source edits -> symbol patch plan -> normalized deltas" bundle schema consumed by release policy.
  • Existing function delta composition still contains placeholder address/size behavior in parts of DeltaSig generation.

3. Target contracts

3.1 Source semantic edit script (semantic_edit_script.json)

Required fields:

  • schemaVersion
  • sourceTreeDigest
  • edits[] where each edit includes:
    • editType: add|remove|move|update|rename
    • nodeKind: class|method|field|import|statement
    • nodePath: stable language-specific path
    • anchor: symbol-like identifier (for example Namespace.Type.Method)
    • pre and post source spans and digests

Determinism rules:

  • Stable sort by file path, then node path.
  • Stable source digests and normalized paths.
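
The determinism rules above can be illustrated with a minimal Python sketch. It assumes (this is not stated in the contract) that each edit's pre/post span carries a `file` key, that canonical JSON with sorted keys is an acceptable byte form, and that SHA-256 is the digest. The helper names are hypothetical.

```python
import hashlib
import json


def _edit_file(edit: dict) -> str:
    # Assumption: the "post" (or, for removals, "pre") span names the file.
    span = edit.get("post") or edit.get("pre") or {}
    # Normalize path separators so the sort does not depend on the host OS.
    return span.get("file", "").replace("\\", "/")


def canonicalize_edit_script(script: dict) -> bytes:
    """Return deterministic bytes: edits sorted by file path, then node path."""
    edits = sorted(script["edits"], key=lambda e: (_edit_file(e), e["nodePath"]))
    canonical = {**script, "edits": edits}
    # sort_keys plus compact separators give a stable serialization.
    return json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")


def edit_script_digest(script: dict) -> str:
    return "sha256:" + hashlib.sha256(canonicalize_edit_script(script)).hexdigest()
```

With this shape, two edit scripts that contain the same edits in a different order produce the same digest, which is the property the rules are after.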

3.2 Symbol map (symbol_map.json)

Produced during build from DWARF/PDB + build metadata.

Required fields:

  • schemaVersion
  • buildId
  • binaryDigest
  • symbols[]:
    • name
    • kind (function|object|section)
    • addressStart and addressEnd
    • section
    • sourceRanges[] (file, lineStart, lineEnd)

Determinism rules:

  • Symbol ordering by address then name.
  • Build id must match attestation subject.
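
As a rough sketch of how these two rules could be checked, the following Python assumes addresses are integers (a real map might encode them as hex strings) and that the attestation subject's build id is available as a plain string. Function names are illustrative only.

```python
def order_symbols(symbols: list) -> list:
    """Order symbols by start address, then by name, as the rule requires."""
    return sorted(symbols, key=lambda s: (s["addressStart"], s["name"]))


def check_symbol_map(symbol_map: dict, subject_build_id: str) -> list:
    """Return a list of rule violations; an empty list means the map passes."""
    errors = []
    if symbol_map["buildId"] != subject_build_id:
        errors.append("buildId does not match attestation subject")
    if symbol_map["symbols"] != order_symbols(symbol_map["symbols"]):
        errors.append("symbols are not ordered by address then name")
    for sym in symbol_map["symbols"]:
        # Holds whether addressEnd is inclusive or exclusive.
        if sym["addressEnd"] < sym["addressStart"]:
            errors.append("inverted address range: " + sym["name"])
    return errors
```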

3.3 Symbol patch plan (symbol_patch_plan.json)

Joins source edits with concrete symbols.

Required fields:

  • schemaVersion
  • buildIdBefore and buildIdAfter
  • editsDigest
  • symbolMapDigestBefore and symbolMapDigestAfter
  • changes[]:
    • symbol
    • changeType (added|removed|modified|moved)
    • astAnchors[]
    • preHash and postHash
    • deltaRef
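
One plausible way to perform the join is by line-range overlap between an edit's post span and each symbol's sourceRanges. The sketch below assumes spans carry `file`, `lineStart`, and `lineEnd`, and it labels every hit as `modified`; classifying added, removed, and moved symbols, and filling preHash, postHash, and deltaRef, would need both symbol maps and is left out.

```python
def symbols_touched_by(span: dict, symbol_map: dict) -> list:
    """Names of symbols whose source ranges overlap the given edit span."""
    hits = []
    for sym in symbol_map["symbols"]:
        for rng in sym["sourceRanges"]:
            overlaps = (
                rng["file"] == span["file"]
                and rng["lineStart"] <= span["lineEnd"]
                and span["lineStart"] <= rng["lineEnd"]
            )
            if overlaps:
                hits.append(sym["name"])
                break  # one overlapping range is enough for this symbol
    return hits


def build_changes(edits: list, symbol_map_after: dict) -> list:
    """Group AST anchors by the symbol they land in (simplified join)."""
    anchors_by_symbol = {}
    for edit in edits:
        span = edit.get("post")
        if span is None:
            continue  # removed nodes would need the "before" map; omitted here
        for name in symbols_touched_by(span, symbol_map_after):
            anchors_by_symbol.setdefault(name, set()).add(edit["anchor"])
    return [
        {"symbol": name, "changeType": "modified", "astAnchors": sorted(anchors)}
        for name, anchors in sorted(anchors_by_symbol.items())
    ]
```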

3.4 Patch manifest (patch_manifest.json)

Binds per-symbol normalized deltas to evidence and policy.

Required fields:

  • schemaVersion
  • buildId
  • normalizationRecipeId
  • patches[]:
    • symbol
    • addressRange
    • deltaDigest
    • pre (size, hash)
    • post (size, hash)
  • attestation (predicateType, dsseDigest)
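
To make the shape of a `patches[]` entry concrete, here is a small Python sketch. The hash algorithm (SHA-256), the `sha256:` prefix, and the two-element `addressRange` form are assumptions for illustration; the contract above only names the fields.

```python
import hashlib


def describe(blob: bytes) -> dict:
    """Size and hash of a normalized byte range (hash format is assumed)."""
    return {"size": len(blob), "hash": "sha256:" + hashlib.sha256(blob).hexdigest()}


def patch_entry(symbol: str, address_range: tuple, pre_bytes: bytes,
                post_bytes: bytes, delta_blob: bytes) -> dict:
    """Build one patches[] entry from normalized pre/post bytes and a delta blob."""
    return {
        "symbol": symbol,
        "addressRange": list(address_range),  # assumed [start, end] encoding
        "deltaDigest": "sha256:" + hashlib.sha256(delta_blob).hexdigest(),
        "pre": describe(pre_bytes),
        "post": describe(post_bytes),
    }
```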

4. Evidence and policy integration

EvidenceLocker stores four linked artifacts per release comparison:

  1. semantic edit script
  2. symbol maps (before/after)
  3. symbol patch plan
  4. normalized patch manifest + delta blobs
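
One reading of "linked" is that the patch plan's digest and build-id fields must resolve to the stored edit script and symbol maps. The sketch below checks that linkage; the canonical-JSON digest is an assumption, since the document does not say how `editsDigest` or the symbol map digests are computed.

```python
import hashlib
import json


def artifact_digest(artifact: dict) -> str:
    # Assumed digest: SHA-256 over key-sorted, compact JSON.
    data = json.dumps(artifact, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()


def check_links(edit_script: dict, map_before: dict, map_after: dict, plan: dict) -> list:
    """Return the plan fields that fail to match the stored artifacts."""
    expected = {
        "editsDigest": artifact_digest(edit_script),
        "symbolMapDigestBefore": artifact_digest(map_before),
        "symbolMapDigestAfter": artifact_digest(map_after),
        "buildIdBefore": map_before["buildId"],
        "buildIdAfter": map_after["buildId"],
    }
    return [name for name, value in expected.items() if plan.get(name) != value]
```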

Policy hooks:

  • Allowlist/denylist by namespace or symbol path.
  • Max function-count and max byte budget controls.
  • API surface change checks.
  • Hot-path and cryptography namespace protection rules.
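
A minimal sketch of how the denylist and budget hooks might evaluate a patch manifest follows. The `PatchPolicy` type, its field names, and the choice to count the byte budget as the sum of post sizes are all hypothetical; the actual Policy module contract is not defined here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchPolicy:
    # Hypothetical policy shape, for illustration only.
    deny_prefixes: tuple = ()
    max_functions: int = 50
    max_bytes: int = 65536


def evaluate(patches: list, policy: PatchPolicy) -> list:
    """Return human-readable violations; an empty list means the patch set passes."""
    violations = []
    for patch in patches:
        # Denylist by namespace or symbol path prefix.
        if any(patch["symbol"].startswith(p) for p in policy.deny_prefixes):
            violations.append(f"denied symbol: {patch['symbol']}")
    if len(patches) > policy.max_functions:
        violations.append(f"function count {len(patches)} exceeds {policy.max_functions}")
    # Assumption: the byte budget is measured over post-patch symbol sizes.
    total = sum(p["post"]["size"] for p in patches)
    if total > policy.max_bytes:
        violations.append(f"byte total {total} exceeds {policy.max_bytes}")
    return violations
```

A protection rule for a cryptography namespace could be expressed in this sketch as a deny prefix such as `Acme.Crypto.` (an invented namespace).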

5. Verifier contract (Attestor/Doctor)

The verifier must prove all of the following before promotion:

  • Build-id and subject digest alignment.
  • Re-normalization of target binary with matching recipe id.
  • Dry-run delta application succeeds within declared symbol boundaries.
  • Resulting hashes equal manifest post values.
  • AST anchors reconcile to changed symbols in symbol patch plan.
  • DSSE signatures and transparency references validate per policy.
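
The dry-run and hash checks can be sketched as follows. This uses a toy delta encoding, a list of `(offset, bytes)` writes relative to the symbol start, which is an assumption and not the DeltaSig format. It also treats `addressRange` as `[start, end)` offsets into the normalized buffer and leaves out signature, recipe, and AST anchor checks.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def dry_run(normalized: bytes, patch: dict, delta: list) -> list:
    """Apply a toy delta to a copy of one symbol's bytes and report failures."""
    start, end = patch["addressRange"]
    if not 0 <= start <= end <= len(normalized):
        return ["address range outside normalized binary"]
    region = bytearray(normalized[start:end])  # a copy; the input is never mutated
    errors = []
    if sha256_hex(bytes(region)) != patch["pre"]["hash"]:
        errors.append("pre hash mismatch")
    for offset, data in delta:
        # Every write must stay inside the declared symbol boundary.
        if offset < 0 or offset + len(data) > len(region):
            errors.append("delta writes outside declared symbol boundary")
            continue
        region[offset:offset + len(data)] = data
    if sha256_hex(bytes(region)) != patch["post"]["hash"]:
        errors.append("post hash mismatch")
    return errors
```

Returning a list of failures rather than raising on the first one keeps the verifier output auditable, since every failed condition is reported in a single pass.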

6. Integration boundaries

Builder step (CI): emit symbol map and normalized segments.

ReleaseOrchestrator step: combine source edits, symbol maps, and normalized bytes into patch plan and manifest.

BinaryIndex/DeltaSig: own normalization and per-symbol diff generation.

Attestor/Doctor: own verification and attestation checks.

EvidenceLocker: own storage schema and query surfaces.

Policy: consume summarized patch-plan metrics and rule evaluations.

7. Implementation tracker

Execution is tracked in:

  • docs/implplan/SPRINT_20260216_001_BinaryIndex_hybrid_diff_patch_pipeline.md
  • docs/hybrid-diff-patching.md
  • docs/modules/binary-index/semantic-diffing.md
  • docs/modules/binary-index/deltasig-v2-schema.md
  • docs/modules/scanner/binary-diff-attestation.md
  • docs/modules/evidence-locker/guides/evidence-pack-schema.md