git.stella-ops.org/18-Nov-2026 - Patch-Oracles.md at 65b15992292c4b4b3100430a80d289a461e7d16c - git.stella-ops.org


2025-11-20 23:16:02 +02:00


Here’s a simple, cheap way to sanity‑check your vuln function recovery without fancy ground truth: build “patch oracles.”

What it is (in plain words)

Take a known CVE and compile two tiny binaries from the same source:

Vulnerable commit/revision
Fixed commit/revision

Then diff the discovered functions + call edges between the two. If your analyzer can’t see the symbol (or guard) the patch adds/removes/tightens, your recall is suspect.

Why it works

Patches for real CVEs usually:

add/remove a function (e.g., validate_len)
change a call site (new guard before memcpy)
tweak control flow (early return on bounds check)

Those are precisely the things your function recovery / call‑graph pass should surface—even on stripped ELFs. If they don’t move in your graph, you’ve got blind spots.

Minimal workflow (5 steps)

Pick a CVE with a clean, public fix (e.g., OpenSSL/zlib/busybox).
Isolate the patch (git range or cherry‑pick) and craft a tiny harness that calls the affected code path.
Build both with the same toolchain/flags; produce stripped ELFs (-s) to mimic production.
Run your discovery on both:
- function list, demangled where possible
- call edges (A→B), basic blocks (optional)
Diff the graphs: look for the new guard function, removed unsafe call, or altered edge count.
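The diff in the last step is just a pair of set differences. Here is a minimal Python sketch of it, not Scanner's API: it assumes each discovery result has already been reduced to a set of function names plus a set of (caller, callee) pairs, and the names `GraphDiff` and `diff_graphs` are made up for illustration.

```python
from typing import NamedTuple, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


class GraphDiff(NamedTuple):
    functions_added: Set[str]
    functions_removed: Set[str]
    calls_added: Set[Edge]
    calls_removed: Set[Edge]


def diff_graphs(vuln_funcs, vuln_calls, fixed_funcs, fixed_calls) -> GraphDiff:
    """Set difference in both directions, going from vuln to fixed."""
    return GraphDiff(
        functions_added=fixed_funcs - vuln_funcs,
        functions_removed=vuln_funcs - fixed_funcs,
        calls_added=fixed_calls - vuln_calls,
        calls_removed=vuln_calls - fixed_calls,
    )


# Toy graphs mirroring the validate_len example used in this note.
vuln_funcs = {"main", "foo_parse", "memcpy"}
vuln_calls = {("main", "foo_parse"), ("foo_parse", "memcpy")}
fixed_funcs = {"main", "foo_parse", "validate_len", "memcpy"}
fixed_calls = {("main", "foo_parse"), ("foo_parse", "validate_len"),
               ("validate_len", "memcpy")}

diff = diff_graphs(vuln_funcs, vuln_calls, fixed_funcs, fixed_calls)
print("functions added:", sorted(diff.functions_added))  # ['validate_len']
print("calls removed:  ", sorted(diff.calls_removed))    # [('foo_parse', 'memcpy')]
```

If all four sets come back empty for a patch that visibly adds a guard, that is the blind spot you are looking for.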

A tiny “oracle spec” (drop-in YAML for your test runner)

cve: CVE-YYYY-XXXX
target: libfoo 1.2.3
build:
  cc: clang
  cflags: [-O2, -fno-omit-frame-pointer]
  ldflags: []
  strip: true
evidence:
  expect_functions_added:   [validate_len]
  expect_functions_removed: [unsafe_copy]     # optional
  expect_call_added:
    - caller: foo_parse
      callee: validate_len
  expect_call_removed:
    - caller: foo_parse
      callee: memcpy
tolerances:
  allow_unresolved_symbols: 0
  allow_extra_funcs: 2

Quick harness pattern (C)

// before: foo_parse -> memcpy(buf, src, len);
// after : foo_parse -> validate_len(len) -> memcpy(...)
extern int foo_parse(const char*);

int main(int argc, char** argv) {
  const char* in = argc > 1 ? argv[1] : "AAAA";
  return foo_parse(in);
}

What to flag as a failure

Expected function not discovered (e.g., validate_len missing).
Expected edge not present (foo_parse → validate_len absent).
No CFG change where patch clearly adds a guard/early return.
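The three failure conditions above can be checked mechanically. The Python sketch below is one possible shape, under stated assumptions: each graph is a dict with "functions" (set of names), "calls" (set of (caller, callee) pairs) and "blocks" (function name to basic-block count), and the `expect_cfg_change` key is hypothetical (it is not part of the spec shown earlier).

```python
def find_failures(evidence, vuln, fixed):
    """Return human-readable failures for one oracle (empty list = pass)."""
    failures = []

    # 1. Expected function not discovered in the fixed build.
    for name in evidence.get("expect_functions_added", []):
        if name not in fixed["functions"]:
            failures.append(f"expected function not discovered: {name}")

    # 2. Expected edge not present in the fixed build.
    for edge in evidence.get("expect_call_added", []):
        pair = (edge["caller"], edge["callee"])
        if pair not in fixed["calls"]:
            failures.append(f"expected edge not present: {pair[0]} -> {pair[1]}")

    # 3. No CFG change where the patch adds a guard (hypothetical key).
    for name in evidence.get("expect_cfg_change", []):
        if vuln["blocks"].get(name) == fixed["blocks"].get(name):
            failures.append(f"no CFG change in {name}")

    return failures


evidence = {
    "expect_functions_added": ["validate_len"],
    "expect_call_added": [{"caller": "foo_parse", "callee": "validate_len"}],
    "expect_cfg_change": ["foo_parse"],
}
vuln = {"functions": {"main", "foo_parse"},
        "calls": {("main", "foo_parse"), ("foo_parse", "memcpy")},
        "blocks": {"foo_parse": 3}}
# An analyzer with a blind spot: the fixed graph looks just like the old one.
fixed_blind = {"functions": {"main", "foo_parse"},
               "calls": {("main", "foo_parse")},
               "blocks": {"foo_parse": 3}}
# An analyzer that sees the patch.
fixed_ok = {"functions": {"main", "foo_parse", "validate_len"},
            "calls": {("main", "foo_parse"), ("foo_parse", "validate_len")},
            "blocks": {"foo_parse": 5}}

for line in find_failures(evidence, vuln, fixed_blind):
    print("FAIL:", line)
print("failures with a working analyzer:", find_failures(evidence, vuln, fixed_ok))
```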

Where this plugs into Stella Ops

Put these oracles under src/StellaOps.Scanner.Tests/PatchOracles/<language>/, one folder per CVE.
Run them in CI for .NET/JVM/C/C++/Go/Rust analyzers.
Use them to gate any changes to symbolization, demangling, or call‑graph building.
Record per‑analyzer recall deltas when you tweak heuristics or switch disassemblers.
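Recall deltas are easy to track once "recall" has a concrete definition. The Python sketch below assumes one: the fraction of oracle expectations an analyzer satisfies. That definition, the analyzer names and the numbers are all illustrative assumptions, not measurements.

```python
def recall(satisfied: int, total: int) -> float:
    """Fraction of oracle expectations the analyzer satisfied."""
    if total == 0:
        raise ValueError("no expectations to score")
    return satisfied / total


def recall_deltas(before, after):
    """Per-analyzer recall change between two runs.

    Both arguments map analyzer name -> (satisfied, total).
    Analyzers missing from either run are skipped.
    """
    return {
        name: recall(*after[name]) - recall(*before[name])
        for name in sorted(before.keys() & after.keys())
    }


# Made-up numbers: before and after a heuristic tweak.
before = {"native": (9, 12), "dotnet": (8, 8)}
after = {"native": (12, 12), "dotnet": (6, 8)}

deltas = recall_deltas(before, after)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

A negative delta for any analyzer is a good reason to block the change until someone has looked at which expectation was lost.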

If you want, I can scaffold the first three oracles (e.g., zlib overflow fix, OpenSSL length check, BusyBox ash patch) with ready‑to‑run Makefiles and expected graph diffs.

Understood. Let us turn the “patch oracle” idea into something you can actually drop into the Stella Ops repo and CI.

I will walk through:

How to structure this inside the monorepo
How to build one oracle end-to-end (C/C++ example)
How to do the same for .NET/JVM
How to automate running and asserting them
Practical rules and pitfalls so these stay stable and useful

1. Where this lives in Stella Ops

A simple, language-agnostic layout that will scale:

src/
  StellaOps.Scanner/
    ...                               # your scanner code
  StellaOps.Scanner.Tests/            # existing tests (if any)
    PatchOracles/
      c/
        CVE-YYYY-XXXX-<short-name>/
          src/
          build.sh
          oracle.yml
          README.md
      cpp/
        ...
      dotnet/
        CVE-YYYY-XXXX-<short-name>/
          src/
          build.ps1
          oracle.yml
          README.md
      jvm/
        ...
      go/
        ...
      rust/
        ...
  tools/
    scanner-oracle-runner/            # tiny runner (C# console or bash)

Key principles:

Each CVE/test case is self-contained (its own folder with sources, build script, oracle.yml).
Build scripts produce two binaries/artifacts: vuln and fixed.
oracle.yml describes: how to build, what to scan, and what differences to expect in Scanner’s call graph/function list.

2. How to build a single patch oracle (C/C++)

Think of a patch oracle as: “Given these two binaries, Scanner must see specific changes in functions and call edges.”

2.1. Step-by-step workflow

For one C/C++ CVE:

Pick & freeze the patch
- Choose a small, clean CVE in a library with easily buildable code (zlib, OpenSSL, BusyBox, etc.).
- Identify commit A (vulnerable) and commit B (fixed).
- Extract only the minimal sources needed to build the affected function + a harness into src/.

Create a minimal harness

Example: the patch adds validate_len and routes the memcpy through it, so foo_parse no longer calls memcpy directly.

// src/vuln/main.c and src/fixed/main.c (identical harness in both trees)
#include <stdio.h>

int foo_parse(const char* in);  // from the library code under test

int main(int argc, char** argv) {
    const char* in = (argc > 1) ? argv[1] : "AAAA";
    return foo_parse(in);
}

Under src/, you keep two sets of sources:

src/
  vuln/
    foo.c        # vulnerable version
    api.h
    main.c
  fixed/
    foo.c        # fixed version (adds validate_len, changes calls)
    api.h
    main.c

Provide a deterministic build script

Example build.sh:

#!/usr/bin/env bash
set -euo pipefail

CC="${CC:-clang}"
CFLAGS="${CFLAGS:- -O2 -fno-omit-frame-pointer -g0}"
LDFLAGS="${LDFLAGS:- }"

build_one() {
  local name="$1"   # vuln or fixed
  mkdir -p build
  ${CC} ${CFLAGS} src/${name}/*.c ${LDFLAGS} -o build/${name}
  # Strip symbols to simulate production
  strip build/${name}
}

build_one "vuln"
build_one "fixed"

Guidelines:

Fix the toolchain: either run this inside a Docker image (e.g., debian:bookworm with specific clang version) or at least document required versions in README.md.
Always build both artifacts with identical flags; the only difference should be the code change.
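Use strip to ensure Scanner doesn’t accidentally rely on debug symbols. Note that strip also removes the symbol table, so a stripped executable no longer carries names for internal functions such as validate_len; keep an unstripped copy (or a symbol map) of each build so the runner can map recovered function addresses back to the names used in oracle.yml.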

Define the oracle (what must change)

You define expectations based on the patch:

Functions added/removed/renamed.
New call edges (e.g., foo_parse -> validate_len).
Removed call edges (e.g., foo_parse -> memcpy).
Optionally: new basic blocks, conditional branches, or early returns.

A practical oracle.yml for this case:

cve: CVE-YYYY-XXXX
name: zlib_len_guard_example
language: c
toolchain:
  cc: clang
  cflags: "-O2 -fno-omit-frame-pointer -g0"
  ldflags: ""
build:
  script: "./build.sh"
  artifacts:
    vulnerable: "build/vuln"
    fixed: "build/fixed"

scan:
  scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
  # If you have a Dockerized scanner, you could do:
  # scanner_cli: "docker run --rm -v $PWD:/work stellaops/scanner:dev"
  args:
    - "--format=json"
    - "--analyzers=native"
  timeout_seconds: 120

expectations:
  functions:
    must_exist_in_fixed:
      - name: "validate_len"
    must_not_exist_in_vuln:
      - name: "validate_len"
  calls:
    must_add:
      - caller: "foo_parse"
        callee: "validate_len"
    must_remove:
      - caller: "foo_parse"
        callee: "memcpy"
  tolerances:
    allow_unresolved_symbols: 0
    allow_extra_functions: 5
    allow_missing_calls: 0
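The tolerances need a precise meaning before a runner can enforce them. The Python sketch below picks one interpretation for allow_extra_functions, which is an assumption and not something the spec above defines: it caps the number of functions that are new in the fixed build but not listed under must_exist_in_fixed. The helper names are hypothetical.

```python
def extra_functions(vuln_names, fixed_names, expected_new):
    """Functions that are new in the fixed build but that the oracle did not list."""
    return (set(fixed_names) - set(vuln_names)) - set(expected_new)


def within_tolerance(vuln_names, fixed_names, expectations):
    """Return (ok, extras) for the allow_extra_functions tolerance."""
    expected_new = [f["name"]
                    for f in expectations["functions"]["must_exist_in_fixed"]]
    limit = expectations["tolerances"]["allow_extra_functions"]
    extras = extra_functions(vuln_names, fixed_names, expected_new)
    return len(extras) <= limit, sorted(extras)


# Mirrors the relevant part of the oracle.yml above, already parsed to a dict.
expectations = {
    "functions": {"must_exist_in_fixed": [{"name": "validate_len"}]},
    "tolerances": {"allow_extra_functions": 5},
}
vuln_names = ["main", "foo_parse"]
# The two trailing names stand in for compiler- or analyzer-generated noise.
fixed_names = ["main", "foo_parse", "validate_len",
               "__stack_chk_fail", "sub_401200"]

ok, extras = within_tolerance(vuln_names, fixed_names, expectations)
print(ok, extras)
```

Whatever interpretation you settle on, write it down in the oracle README so a tolerance bump is a reviewed decision and not a silent workaround.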

Connect Scanner output to the oracle

Assume your Scanner CLI produces something like:

{
  "binary": "build/fixed",
  "functions": [
    { "name": "foo_parse", "address": "0x401000" },
    { "name": "validate_len", "address": "0x401080" },
    ...
  ],
  "calls": [
    { "caller": "foo_parse", "callee": "validate_len" },
    { "caller": "validate_len", "callee": "memcpy" }
  ]
}

Your oracle-runner will:

Run scanner on vuln → vuln.json
Run scanner on fixed → fixed.json
Compare each expectation in oracle.yml against vuln.json and fixed.json

Pseudo-logic for a function expectation:

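using System.Linq;
using System.Text.Json;

// True if the scanner output lists a function with exactly this name.
static bool HasFunction(JsonElement doc, string name) =>
    doc.GetProperty("functions")
       .EnumerateArray()
       .Any(f => f.GetProperty("name").GetString() == name);

// True if the scanner output contains the call edge caller -> callee.
static bool HasCall(JsonElement doc, string caller, string callee) =>
    doc.GetProperty("calls")
       .EnumerateArray()
       .Any(c =>
            c.GetProperty("caller").GetString() == caller &&
            c.GetProperty("callee").GetString() == callee);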

The runner will produce a small report, per oracle:

[PASS] CVE-YYYY-XXXX zlib_len_guard_example
  + validate_len appears only in fixed → OK
  + foo_parse → validate_len call added → OK
  + foo_parse → memcpy call removed → OK

If anything fails, it prints the mismatches and exits with non-zero code so CI fails.
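To make the comparison step concrete, here is a minimal Python sketch that evaluates the must_* expectations against two scanner documents and prints a report in the style shown above. It assumes the JSON shape from the example output (a "functions" array with "name" and a "calls" array with "caller"/"callee"); the function names and the MISMATCH label are illustrative.

```python
import json


def has_function(doc, name):
    return any(f["name"] == name for f in doc["functions"])


def has_call(doc, caller, callee):
    return any(c["caller"] == caller and c["callee"] == callee
               for c in doc["calls"])


def evaluate(expectations, vuln, fixed):
    """Return a list of (passed, description), one per must_* expectation."""
    results = []
    funcs = expectations.get("functions", {})
    calls = expectations.get("calls", {})
    for fn in funcs.get("must_exist_in_fixed", []):
        results.append((has_function(fixed, fn["name"]),
                        f"{fn['name']} exists in fixed"))
    for fn in funcs.get("must_not_exist_in_vuln", []):
        results.append((not has_function(vuln, fn["name"]),
                        f"{fn['name']} absent from vuln"))
    for c in calls.get("must_add", []):
        ok = (has_call(fixed, c["caller"], c["callee"])
              and not has_call(vuln, c["caller"], c["callee"]))
        results.append((ok, f"{c['caller']} → {c['callee']} call added"))
    for c in calls.get("must_remove", []):
        ok = (has_call(vuln, c["caller"], c["callee"])
              and not has_call(fixed, c["caller"], c["callee"]))
        results.append((ok, f"{c['caller']} → {c['callee']} call removed"))
    return results


vuln_doc = json.loads('{"functions": [{"name": "foo_parse"}],'
                      ' "calls": [{"caller": "foo_parse", "callee": "memcpy"}]}')
fixed_doc = json.loads('{"functions": [{"name": "foo_parse"}, {"name": "validate_len"}],'
                       ' "calls": [{"caller": "foo_parse", "callee": "validate_len"},'
                       '           {"caller": "validate_len", "callee": "memcpy"}]}')
expectations = {
    "functions": {"must_exist_in_fixed": [{"name": "validate_len"}],
                  "must_not_exist_in_vuln": [{"name": "validate_len"}]},
    "calls": {"must_add": [{"caller": "foo_parse", "callee": "validate_len"}],
              "must_remove": [{"caller": "foo_parse", "callee": "memcpy"}]},
}

results = evaluate(expectations, vuln_doc, fixed_doc)
status = "PASS" if all(ok for ok, _ in results) else "FAIL"
print(f"[{status}] CVE-YYYY-XXXX zlib_len_guard_example")
for ok, text in results:
    print(f"  + {text} → {'OK' if ok else 'MISMATCH'}")
```

Note that must_add is checked against both documents: an edge that already existed in the vulnerable build does not count as added.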

3. Implementing the oracle runner (practical variant)

You can implement this either as:

A standalone C# console (StellaOps.Scanner.PatchOracleRunner), or
A set of xUnit tests that read oracle.yml and run dynamically.

3.1. Console runner skeleton (C#)

High-level structure:

src/tools/scanner-oracle-runner/
  Program.cs
  Oracles/
    (symlink or reference to src/StellaOps.Scanner.Tests/PatchOracles)

Core responsibilities:

Discover all oracle.yml files under PatchOracles/.
For each:
- Run the build script.
- Run the scanner on both artifacts.
- Evaluate expectations.
Aggregate results and exit with appropriate status.

Pseudo-code outline:

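using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Inside the runner's Program class; RunOracle is described below.
static int Main(string[] args)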
{
    var root = args.Length > 0 ? args[0] : "src/StellaOps.Scanner.Tests/PatchOracles";
    var oracleFiles = Directory.GetFiles(root, "oracle.yml", SearchOption.AllDirectories);
    var failures = new List<string>();

    foreach (var oracleFile in oracleFiles)
    {
        var result = RunOracle(oracleFile);
        if (!result.Success)
        {
            failures.Add($"{result.Name}: {result.FailureReason}");
        }
    }

    if (failures.Any())
    {
        Console.Error.WriteLine("Patch oracle failures:");
        foreach (var f in failures) Console.Error.WriteLine("  - " + f);
        return 1;
    }

    Console.WriteLine("All patch oracles passed.");
    return 0;
}

RunOracle does:

Deserialize YAML (e.g., via YamlDotNet).
Process.Start for build.script.
Process.Start for scanner_cli twice (vuln/fixed).
Read/parse JSON outputs.
Run checks functions.must_* and calls.must_*.
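The same steps can be sketched in Python to show the control flow before you write the C# version. This is a sketch under stated assumptions: the oracle is already parsed into a dict (YAML parsing is left out), the scanner takes the artifact path as its last argument and writes JSON to stdout, and `run_command` / `run_oracle` are hypothetical names. The command runner is injectable so the example below can do a dry run without building or scanning anything.

```python
import json
import shlex
import subprocess


def run_command(argv, cwd, timeout):
    """Run one command and return its stdout; raises on non-zero exit or timeout."""
    done = subprocess.run(argv, cwd=cwd, timeout=timeout, check=True,
                          capture_output=True, text=True)
    return done.stdout


def run_oracle(oracle, oracle_dir, run=run_command):
    """Build both artifacts, scan them, return (vuln_doc, fixed_doc)."""
    scan = oracle["scan"]
    timeout = scan.get("timeout_seconds", 120)

    # 1. Build both artifacts with the oracle's own script.
    run(shlex.split(oracle["build"]["script"]), oracle_dir, timeout)

    # 2. Scan each artifact (assumed: artifact path last, JSON on stdout).
    scanner = shlex.split(scan["scanner_cli"]) + list(scan.get("args", []))
    docs = {}
    for label in ("vulnerable", "fixed"):
        artifact = oracle["build"]["artifacts"][label]
        docs[label] = json.loads(run(scanner + [artifact], oracle_dir, timeout))
    return docs["vulnerable"], docs["fixed"]


# Dry run with a fake command runner, so nothing is actually built or scanned.
oracle = {
    "build": {"script": "./build.sh",
              "artifacts": {"vulnerable": "build/vuln", "fixed": "build/fixed"}},
    "scan": {"scanner_cli": "dotnet run --project ../../StellaOps.Scanner.Cli",
             "args": ["--format=json", "--analyzers=native"],
             "timeout_seconds": 120},
}
invocations = []


def fake_run(argv, cwd, timeout):
    invocations.append(argv)
    if argv == ["./build.sh"]:
        return ""
    return json.dumps({"binary": argv[-1], "functions": [], "calls": []})


vuln_doc, fixed_doc = run_oracle(oracle, ".", run=fake_run)
print(len(invocations), vuln_doc["binary"], fixed_doc["binary"])
```

Keeping process execution behind one small function also makes the real runner easy to unit-test, whichever language it ends up in.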

This is straightforward plumbing code; once built, adding a new patch oracle is just adding a folder + oracle.yml.

4. Managed (.NET / JVM) patch oracles

Exact same concept, slightly different mechanics.

4.1. .NET example

Directory:

PatchOracles/
  dotnet/
    CVE-2021-XXXXX-systemtextjson/
      src/
        vuln/
          Example.sln
          Api/...
        fixed/
          Example.sln
          Api/...
      build.ps1
      oracle.yml

build.ps1 (PowerShell, simplified):

param(
  [string]$Configuration = "Release"
)

$ErrorActionPreference = "Stop"

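function Build-One([string]$name) {
  Push-Location "src/$name"
  try {
    dotnet clean
    dotnet publish -c $Configuration -p:DebugType=None -p:DebugSymbols=false -o ../../build/$name
    # $ErrorActionPreference does not cover native commands by default, so check the exit code explicitly.
    if ($LASTEXITCODE -ne 0) { throw "dotnet publish failed for '$name' (exit code $LASTEXITCODE)" }
  }
  finally {
    Pop-Location
  }
}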

New-Item -ItemType Directory -Force -Path "build" | Out-Null

Build-One "vuln"
Build-One "fixed"

oracle.yml:

cve: CVE-2021-XXXXX
name: systemtextjson_escape_fix
language: dotnet
build:
  script: "pwsh ./build.ps1"
  artifacts:
    vulnerable: "build/vuln/Api.dll"
    fixed:      "build/fixed/Api.dll"

scan:
  scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
  args:
    - "--format=json"
    - "--analyzers=dotnet"
  timeout_seconds: 120

expectations:
  methods:
    must_exist_in_fixed:
      - "Api.JsonHelper::EscapeString"
    must_not_exist_in_vuln:
      - "Api.JsonHelper::EscapeString"
  calls:
    must_add:
      - caller: "Api.Controller::Handle"
        callee: "Api.JsonHelper::EscapeString"
  tolerances:
    allow_missing_calls: 0
    allow_extra_methods: 10

Scanner’s .NET analyzer should produce method identifiers in a stable format (e.g., Namespace.Type::Method(Signature)), which you then use verbatim in the oracle. The example above omits the signature part for brevity.

4.2. JVM example

Similar structure, but artifacts are JARs:

build:
  script: "./gradlew :app:assemble"
  artifacts:
    vulnerable: "app-vuln.jar"
    fixed: "app-fixed.jar"

scan:
  scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
  args:
    - "--format=json"
    - "--analyzers=jvm"

Expectations then refer to methods like com.example.JsonHelper.escapeString:(Ljava/lang/String;)Ljava/lang/String;.
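If the runner needs to compare such identifiers piecewise (for example, to match on class and method name while ignoring overloads), the identifier can be split into owner class, method name and JVM descriptor. A minimal Python sketch, assuming exactly the `class.method:descriptor` layout shown above; the function name is hypothetical.

```python
def split_jvm_method(identifier):
    """Split 'pkg.Class.method:(args)ret' into (owner, method, descriptor)."""
    qualified, sep, descriptor = identifier.partition(":")
    if not sep or not descriptor.startswith("("):
        raise ValueError(f"missing method descriptor: {identifier!r}")
    owner, dot, method = qualified.rpartition(".")
    if not dot or not owner or not method:
        raise ValueError(f"missing owner class: {identifier!r}")
    return owner, method, descriptor


owner, method, descriptor = split_jvm_method(
    "com.example.JsonHelper.escapeString:(Ljava/lang/String;)Ljava/lang/String;")
print(owner)       # com.example.JsonHelper
print(method)      # escapeString
print(descriptor)  # (Ljava/lang/String;)Ljava/lang/String;
```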

5. Wiring into CI

You can integrate this in your existing pipeline (GitLab Runner / Gitea / etc.) as a separate job.

Example CI job skeleton (GitLab-like YAML for illustration):

patch-oracle-tests:
  stage: test
  image: mcr.microsoft.com/dotnet/sdk:10.0
  script:
    - dotnet build src/StellaOps.Scanner/StellaOps.Scanner.csproj -c Release
    - dotnet build src/tools/scanner-oracle-runner/scanner-oracle-runner.csproj -c Release
    - dotnet run --project src/tools/scanner-oracle-runner/scanner-oracle-runner.csproj -- src/StellaOps.Scanner.Tests/PatchOracles
  artifacts:
    when: on_failure
    paths:
      - src/StellaOps.Scanner.Tests/PatchOracles/**/build
      - oracle-results.log

You can also:

Tag the job (e.g., oracle or reachability) so you can run it nightly or on changes to Scanner analyzers.
Pin Docker images with the exact C/C++/Java toolchains used by patch oracles so results are deterministic.

6. Practical guidelines and pitfalls

Here are concrete rules of thumb for making this robust:

6.1. Choosing good CVE oracles

Prefer cases where:

The patch clearly adds/removes a function or method, or introduces a separate helper such as validate_len, check_bounds, etc.
The patch adds/removes a call that is easy to see even under optimization (e.g., non-inline, non-template).
The project is easy to build and not heavily reliant on obscure toolchains.

Across the languages Scanner supports, target:

3–5 small C or C++ oracles.
3–5 .NET or JVM oracles.
1–3 for Go and Rust once those analyzers exist.

You do not need many; you want sharp, surgical tests, not broad coverage.

6.2. Handle inlining and optimization

Compilers may inline small functions; this can break naive “must have call edge” expectations.

Mitigations:

Choose functions that are “large enough” or mark them __attribute__((noinline)) (GCC/Clang) in your test harness code if necessary.
Alternatively, relax expectations using should_add vs must_add for some edges:

calls:
  must_add: []
  should_add:
    - caller: "foo_parse"
      callee: "validate_len"

In the runner, should_add failures can mark the oracle as “degraded” but not fatal, while must_* failures break the build.
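The must/should split maps onto three outcomes per oracle. A minimal Python sketch of that rule follows; the status strings and function names are illustrative assumptions, and only a FAIL turns into a non-zero exit code.

```python
def classify(results):
    """Map expectation results to one oracle status.

    results is an iterable of (severity, passed), where severity is
    "must" or "should".
    """
    results = list(results)
    if any(sev == "must" and not ok for sev, ok in results):
        return "FAIL"        # breaks the build
    if any(sev == "should" and not ok for sev, ok in results):
        return "DEGRADED"    # reported, but not fatal
    return "PASS"


def exit_code(statuses):
    """Non-zero only when at least one oracle failed a must_* expectation."""
    return 1 if "FAIL" in statuses else 0


# validate_len got inlined: the should_add edge is missing, nothing else is.
inlined = [("must", True), ("should", False)]
# The analyzer lost a must_* expectation.
broken = [("must", False), ("should", True)]

statuses = [classify(inlined), classify(broken), classify([("must", True)])]
print(statuses, exit_code(statuses))
```

Surfacing DEGRADED in the report (instead of hiding it) keeps inlining-related misses visible without making the job flaky.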

6.3. Keep oracles stable over time

To avoid flakiness:

Vendor sources into the repo (or at least snapshot the patch) so upstream changes do not affect builds.
Pin toolchain versions in Docker images for CI.
Capture and pin scanner configuration: analyzers enabled, rules, version. If you support “deterministic scan manifests” later, these oracles are perfect consumers of that.

6.4. What to assert beyond functions/calls

When your Scanner gets more advanced, you can extend oracle.yml:

cfg:
  must_increase_blocks:
    - function: "foo_parse"
  must_add_branch_on:
    - function: "foo_parse"
      operand_pattern: "len <= MAX_LEN"

Initially, I would keep it to:

Function presence/absence
Call edges presence/absence

and add CFG assertions only when your analyzers and JSON model for CFG stabilize.

6.5. How to use failures

When a patch oracle fails, it is a signal that either:

A change in Scanner or a new optimization pattern created a blind spot, or
The oracle is too strict (e.g., relying on a call that got inlined).

You then:

Inspect the disassembly / Scanner JSON for vuln and fixed.
Decide if Scanner is wrong (fix analyzer) or oracle is too rigid (relax to should_*).
Commit both the code change and updated oracle (if needed) in the same merge request.

7. Minimal checklist for adding a new patch oracle

For your future self and your agents, here is a compressed checklist:

Select CVE + patch; copy minimal affected sources into src/…/<lang>/<CVE>/src/{vuln,fixed}.
Add a tiny harness that calls the patched code path.
Write build.sh / build.ps1 to produce build/vuln and build/fixed artifacts, stripped/Release.
Run the scanner manually on both artifacts once; inspect the JSON to find the real symbol names and call edges.
Create oracle.yml with:
- build.script and artifacts.* paths
- scan.scanner_cli + args
- expectations.functions.* and expectations.calls.*
Run scanner-oracle-runner locally; fix any mismatches or over-strict expectations.
Commit, and make sure the patch-oracle-tests CI job runs and is required to pass on merge requests.

If you wish, as a next step we can design the actual JSON schema that Scanner should emit for function/call graphs and write a first C# implementation of scanner-oracle-runner aligned with that schema.
