git.stella-ops.org/docs/product-advisories/18-Nov-2026 - Patch-Oracles.md
2025-11-20 23:16:02 +02:00


Here's a simple, cheap way to sanity-check your vuln function recovery without fancy ground truth: build “patch oracles.”


What it is (in plain words)

Take a known CVE and compile two tiny binaries from the same source:

  • Vulnerable commit/revision
  • Fixed commit/revision

Then diff the discovered functions + call edges between the two. If your analyzer can't see the symbol (or guard) the patch adds/removes/tightens, your recall is suspect.

Why it works

Patches for real CVEs usually:

  • add/remove a function (e.g., validate_len)
  • change a call site (new guard before memcpy)
  • tweak control flow (early return on bounds check)

Those are precisely the things your function recovery / call-graph pass should surface, even on stripped ELFs. If they don't move in your graph, you've got blind spots.


Minimal workflow (5 steps)

  1. Pick a CVE with a clean, public fix (e.g., OpenSSL/zlib/busybox).

  2. Isolate the patch (git range or cherry-pick) and craft a tiny harness that calls the affected code path.

  3. Build both with the same toolchain/flags; produce stripped ELFs (-s) to mimic production.

  4. Run your discovery on both:

    • function list, demangled where possible
    • call edges (A→B), basic blocks (optional)
  5. Diff the graphs: look for the new guard function, removed unsafe call, or altered edge count.
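Step 5 reduces to a set difference over names. The following is a minimal sketch in C, assuming the function lists have already been extracted from the scanner output into string arrays; `contains` and `names_added` are hypothetical helper names, not part of any existing Scanner API.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` occurs in set[0..n), else 0. */
static int contains(const char *const *set, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0)
            return 1;
    return 0;
}

/* Collects the names present in `b` but absent from `a`.
   At most `cap` names are stored in `out`; the return value is the
   total number of such names, which may exceed `cap`. */
size_t names_added(const char *const *a, size_t na,
                   const char *const *b, size_t nb,
                   const char **out, size_t cap) {
    size_t count = 0;
    for (size_t i = 0; i < nb; i++) {
        if (!contains(a, na, b[i])) {
            if (count < cap)
                out[count] = b[i];
            count++;
        }
    }
    return count;
}
```

Calling `names_added(vuln, ..., fixed, ...)` yields the functions the patch added; swapping the two lists yields the functions it removed. The quadratic scan is acceptable here because oracle binaries are deliberately tiny.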


A tiny “oracle spec” (drop-in YAML for your test runner)

cve: CVE-YYYY-XXXX
target: libfoo 1.2.3
build:
  cc: clang
  cflags: [-O2, -fno-omit-frame-pointer]
  ldflags: []
  strip: true
evidence:
  expect_functions_added:   [validate_len]
  expect_functions_removed: [unsafe_copy]     # optional
  expect_call_added:
    - caller: foo_parse
      callee: validate_len
  expect_call_removed:
    - caller: foo_parse
      callee: memcpy
tolerances:
  allow_unresolved_symbols: 0
  allow_extra_funcs: 2

Quick harness pattern (C)

// before: foo_parse -> memcpy(buf, src, len);
// after : foo_parse -> validate_len(len) -> memcpy(...)
extern int foo_parse(const char*);

int main(int argc, char** argv) {
  const char* in = argc > 1 ? argv[1] : "AAAA";
  return foo_parse(in);
}

What to flag as a failure

  • Expected function not discovered (e.g., validate_len missing).
  • Expected edge not present (foo_parse → validate_len absent).
  • No CFG change where patch clearly adds a guard/early return.
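The edge-related failure can be expressed as two predicates over edge lists. This is a minimal sketch in C, assuming the call edges have already been parsed out of the scanner output; `struct edge`, `call_was_added`, and `call_was_removed` are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

struct edge {
    const char *caller;
    const char *callee;
};

/* Returns 1 if the caller -> callee edge is present in edges[0..n). */
static int has_edge(const struct edge *edges, size_t n,
                    const char *caller, const char *callee) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(edges[i].caller, caller) == 0 &&
            strcmp(edges[i].callee, callee) == 0)
            return 1;
    return 0;
}

/* An "added" call must be absent from vuln and present in fixed. */
int call_was_added(const struct edge *vuln, size_t nv,
                   const struct edge *fixed, size_t nf,
                   const char *caller, const char *callee) {
    return !has_edge(vuln, nv, caller, callee) &&
           has_edge(fixed, nf, caller, callee);
}

/* A "removed" call must be present in vuln and absent from fixed. */
int call_was_removed(const struct edge *vuln, size_t nv,
                     const struct edge *fixed, size_t nf,
                     const char *caller, const char *callee) {
    return has_edge(vuln, nv, caller, callee) &&
           !has_edge(fixed, nf, caller, callee);
}
```

Checking both sides matters: an edge that exists in both binaries is not evidence that the analyzer saw the patch.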

Where this plugs into StellaOps

  • Put these oracles under Scanner/tests/patch-oracles/* per language.
  • Run them in CI for .NET/JVM/C/C++/Go/Rust analyzers.
  • Use them to gate any changes to symbolization, demangling, or callgraph building.
  • Record per-analyzer recall deltas when you tweak heuristics or switch disassemblers.

If you want, I can scaffold the first three oracles (e.g., zlib overflow fix, OpenSSL length check, BusyBox ash patch) with ready-to-run Makefiles and expected graph diffs.

Understood. Let us turn the “patch oracle” idea into something you can actually drop into the Stella Ops repo and CI.

I will walk through:

  1. How to structure this inside the monorepo
  2. How to build one oracle end-to-end (C/C++ example)
  3. How to do the same for .NET/JVM
  4. How to automate running and asserting them
  5. Practical rules and pitfalls so these stay stable and useful

1. Where this lives in Stella Ops

A simple, language-agnostic layout that will scale:

src/
  StellaOps.Scanner/
    ...                               # your scanner code
  StellaOps.Scanner.Tests/            # existing tests (if any)
    PatchOracles/
      c/
        CVE-YYYY-XXXX-<short-name>/
          src/
          build.sh
          oracle.yml
          README.md
      cpp/
        ...
      dotnet/
        CVE-YYYY-XXXX-<short-name>/
          src/
          build.ps1
          oracle.yml
          README.md
      jvm/
        ...
      go/
        ...
      rust/
        ...
  tools/
    scanner-oracle-runner/            # tiny runner (C# console or bash)

Key principles:

  • Each CVE/test case is self-contained (its own folder with sources, build script, oracle.yml).
  • Build scripts produce two binaries/artifacts: vuln and fixed.
  • oracle.yml describes: how to build, what to scan, and what differences to expect in Scanner's call graph/function list.

2. How to build a single patch oracle (C/C++)

Think of a patch oracle as: “Given these two binaries, Scanner must see specific changes in functions and call edges.”

2.1. Step-by-step workflow

For one C/C++ CVE:

  1. Pick & freeze the patch

    • Choose a small, clean CVE in a library with easily buildable code (zlib, OpenSSL, BusyBox, etc.).
    • Identify commit A (vulnerable) and commit B (fixed).
    • Extract only the minimal sources needed to build the affected function + a harness into src/.
  2. Create a minimal harness

Example: patch adds validate_len and guards a memcpy in foo_parse.

// src/main.c
#include <stdio.h>

int foo_parse(const char* in);  // from the library code under test

int main(int argc, char** argv) {
    const char* in = (argc > 1) ? argv[1] : "AAAA";
    return foo_parse(in);
}

Under src/, you keep two sets of sources:

src/
  vuln/
    foo.c        # vulnerable version
    api.h
    main.c
  fixed/
    foo.c        # fixed version (adds validate_len, changes calls)
    api.h
    main.c
  3. Provide a deterministic build script

Example build.sh:

#!/usr/bin/env bash
set -euo pipefail

CC="${CC:-clang}"
CFLAGS="${CFLAGS:- -O2 -fno-omit-frame-pointer -g0}"
LDFLAGS="${LDFLAGS:- }"

build_one() {
  local name="$1"   # vuln or fixed
  mkdir -p build
  ${CC} ${CFLAGS} src/${name}/*.c ${LDFLAGS} -o build/${name}
  # Strip symbols to simulate production
  strip build/${name}
}

build_one "vuln"
build_one "fixed"

Guidelines:

  • Fix the toolchain: either run this inside a Docker image (e.g., debian:bookworm with specific clang version) or at least document required versions in README.md.
  • Always build both artifacts with identical flags; the only difference should be the code change.
  • Use strip to ensure Scanner doesn't accidentally rely on debug symbols.
  4. Define the oracle (what must change)

You define expectations based on the patch:

  • Functions added/removed/renamed.
  • New call edges (e.g., foo_parse -> validate_len).
  • Removed call edges (e.g., foo_parse -> memcpy).
  • Optionally: new basic blocks, conditional branches, or early returns.

A practical oracle.yml for this case:

cve: CVE-YYYY-XXXX
name: zlib_len_guard_example
language: c
toolchain:
  cc: clang
  cflags: "-O2 -fno-omit-frame-pointer -g0"
  ldflags: ""
build:
  script: "./build.sh"
  artifacts:
    vulnerable: "build/vuln"
    fixed: "build/fixed"

scan:
  scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
  # If you have a Dockerized scanner, you could do:
  # scanner_cli: "docker run --rm -v $PWD:/work stellaops/scanner:dev"
  args:
    - "--format=json"
    - "--analyzers=native"
  timeout_seconds: 120

expectations:
  functions:
    must_exist_in_fixed:
      - name: "validate_len"
    must_not_exist_in_vuln:
      - name: "validate_len"
  calls:
    must_add:
      - caller: "foo_parse"
        callee: "validate_len"
    must_remove:
      - caller: "foo_parse"
        callee: "memcpy"
  tolerances:
    allow_unresolved_symbols: 0
    allow_extra_functions: 5
    allow_missing_calls: 0
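The document does not pin down how the tolerances are applied, so the following is only one plausible reading, sketched in C: functions that appear in fixed but not in vuln, beyond those the oracle explicitly expects, count as "extra" and are compared against allow_extra_functions. The function name `extras_within_tolerance` and this interpretation are assumptions.

```c
#include <stddef.h>

/* Assumed semantics:
     added       = number of functions seen in fixed but not in vuln
     expected    = number of entries under must_exist_in_fixed
     allow_extra = the allow_extra_functions tolerance
   Returns 1 if the unexpected additions fit within the tolerance. */
int extras_within_tolerance(size_t added, size_t expected,
                            size_t allow_extra) {
    /* Guard the subtraction: size_t is unsigned and would wrap. */
    size_t extra = added > expected ? added - expected : 0;
    return extra <= allow_extra;
}
```

A small non-zero tolerance absorbs compiler-generated helpers that differ between the two builds, while still failing when the analyzer reports a wildly different function set.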
  5. Connect Scanner output to the oracle

Assume your Scanner CLI produces something like:

{
  "binary": "build/fixed",
  "functions": [
    { "name": "foo_parse", "address": "0x401000" },
    { "name": "validate_len", "address": "0x401080" },
    ...
  ],
  "calls": [
    { "caller": "foo_parse", "callee": "validate_len" },
    { "caller": "validate_len", "callee": "memcpy" }
  ]
}

Your oracle-runner will:

  • Run scanner on vuln → vuln.json
  • Run scanner on fixed → fixed.json
  • Compare each expectation in oracle.yml against vuln.json and fixed.json

Pseudo-logic for a function expectation:

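// Requires System.Linq (Any) and System.Text.Json (JsonElement).
using System.Linq;
using System.Text.Json;

bool HasFunction(JsonElement doc, string name) =>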
    doc.GetProperty("functions")
       .EnumerateArray()
       .Any(f => f.GetProperty("name").GetString() == name);

bool HasCall(JsonElement doc, string caller, string callee) =>
    doc.GetProperty("calls")
       .EnumerateArray()
       .Any(c =>
            c.GetProperty("caller").GetString() == caller &&
            c.GetProperty("callee").GetString() == callee);

The runner will produce a small report, per oracle:

[PASS] CVE-YYYY-XXXX zlib_len_guard_example
  + validate_len appears only in fixed → OK
  + foo_parse → validate_len call added → OK
  + foo_parse → memcpy call removed → OK

If anything fails, it prints the mismatches and exits with a non-zero exit code so CI fails.
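The report-plus-exit-status behaviour can be sketched as follows. This is an illustrative C version, assuming each expectation has already been evaluated to a boolean; `struct check` and `report_oracle` are hypothetical names, and the output uses ASCII `->` in place of the arrow shown in the sample report.

```c
#include <stddef.h>
#include <stdio.h>

struct check {
    const char *description; /* e.g. "validate_len appears only in fixed" */
    int ok;                  /* 1 if the expectation held, 0 otherwise */
};

/* Prints one report block for an oracle and returns the exit status
   the runner should use for it: 0 if every check passed, 1 otherwise. */
int report_oracle(FILE *out, const char *cve, const char *name,
                  const struct check *checks, size_t n) {
    int failed = 0;
    for (size_t i = 0; i < n; i++)
        if (!checks[i].ok)
            failed = 1;

    fprintf(out, "[%s] %s %s\n", failed ? "FAIL" : "PASS", cve, name);
    for (size_t i = 0; i < n; i++)
        fprintf(out, "  %c %s -> %s\n",
                checks[i].ok ? '+' : '-',
                checks[i].description,
                checks[i].ok ? "OK" : "MISMATCH");
    return failed;
}
```

Passing `stderr` for failing oracles and `stdout` for passing ones keeps CI logs easy to scan; the runner's overall exit code is then the logical OR of the per-oracle results.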


3. Implementing the oracle runner (practical variant)

You can implement this either as:

  • A standalone C# console (StellaOps.Scanner.PatchOracleRunner), or
  • A set of xUnit tests that read oracle.yml and run dynamically.

3.1. Console runner skeleton (C#)

High-level structure:

src/tools/scanner-oracle-runner/
  Program.cs
  Oracles/
    (symlink or reference to src/StellaOps.Scanner.Tests/PatchOracles)

Core responsibilities:

  1. Discover all oracle.yml files under PatchOracles/.

  2. For each:

    • Run the build script.
    • Run the scanner on both artifacts.
    • Evaluate expectations.
  3. Aggregate results and exit with appropriate status.

Pseudo-code outline:

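// Usings this outline relies on; RunOracle is described after the listing.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static int Main(string[] args)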
{
    var root = args.Length > 0 ? args[0] : "src/StellaOps.Scanner.Tests/PatchOracles";
    var oracleFiles = Directory.GetFiles(root, "oracle.yml", SearchOption.AllDirectories);
    var failures = new List<string>();

    foreach (var oracleFile in oracleFiles)
    {
        var result = RunOracle(oracleFile);
        if (!result.Success)
        {
            failures.Add($"{result.Name}: {result.FailureReason}");
        }
    }

    if (failures.Any())
    {
        Console.Error.WriteLine("Patch oracle failures:");
        foreach (var f in failures) Console.Error.WriteLine("  - " + f);
        return 1;
    }

    Console.WriteLine("All patch oracles passed.");
    return 0;
}

RunOracle does:

  • Deserialize YAML (e.g., via YamlDotNet).
  • Process.Start for build.script.
  • Process.Start for scanner_cli twice (vuln/fixed).
  • Read/parse JSON outputs.
  • Run checks functions.must_* and calls.must_*.
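The "run a step, stop on the first failure" part of RunOracle can be sketched independently of YAML and JSON handling. The version below is in C and uses POSIX `system()`; a C# runner would use Process.Start for the same steps. `run_step` and `run_steps` are hypothetical names, and the sketch deliberately ignores timeout_seconds.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Runs `cmd` through the shell. Returns the command's exit status,
   or -1 if it could not be started or was terminated by a signal.
   POSIX only: relies on WIFEXITED/WEXITSTATUS. */
int run_step(const char *cmd) {
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Runs the commands in order (for example: build script, scan of vuln,
   scan of fixed). Returns 0 if all succeed, otherwise the 1-based index
   of the first step that failed; later steps are not executed. */
int run_steps(const char *const *cmds, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (run_step(cmds[i]) != 0)
            return (int)(i + 1);
    return 0;
}
```

Stopping at the first failing step avoids scanning stale artifacts left behind by an earlier, successful build.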

This is straightforward plumbing code; once built, adding a new patch oracle is just adding a folder + oracle.yml.


4. Managed (.NET / JVM) patch oracles

Exact same concept, slightly different mechanics.

4.1. .NET example

Directory:

PatchOracles/
  dotnet/
    CVE-2021-XXXXX-systemtextjson/
      src/
        vuln/
          Example.sln
          Api/...
        fixed/
          Example.sln
          Api/...
      build.ps1
      oracle.yml

build.ps1 (PowerShell, simplified):

param(
  [string]$Configuration = "Release"
)

$ErrorActionPreference = "Stop"

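function Build-One([string]$name) {
  Push-Location "src/$name"
  try {
    dotnet clean
    dotnet publish -c $Configuration -p:DebugType=None -p:DebugSymbols=false -o ../../build/$name
    # Native commands do not throw on failure by default, so check the exit code.
    if ($LASTEXITCODE -ne 0) { throw "dotnet publish failed for $name" }
  }
  finally {
    Pop-Location
  }
}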

New-Item -ItemType Directory -Force -Path "build" | Out-Null

Build-One "vuln"
Build-One "fixed"

oracle.yml:

cve: CVE-2021-XXXXX
name: systemtextjson_escape_fix
language: dotnet
build:
  script: "pwsh ./build.ps1"
  artifacts:
    vulnerable: "build/vuln/Api.dll"
    fixed:      "build/fixed/Api.dll"

scan:
  scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
  args:
    - "--format=json"
    - "--analyzers=dotnet"
  timeout_seconds: 120

expectations:
  methods:
    must_exist_in_fixed:
      - "Api.JsonHelper::EscapeString"
    must_not_exist_in_vuln:
      - "Api.JsonHelper::EscapeString"
  calls:
    must_add:
      - caller: "Api.Controller::Handle"
        callee: "Api.JsonHelper::EscapeString"
  tolerances:
    allow_missing_calls: 0
    allow_extra_methods: 10

Scanner's .NET analyzer should produce method identifiers in a stable format (e.g., Namespace.Type::Method(Signature)), which you then use in the oracle.

4.2. JVM example

Similar structure, but artifacts are JARs:

build:
  script: "./gradlew :app:assemble"
  artifacts:
    vulnerable: "app-vuln.jar"
    fixed: "app-fixed.jar"

scan:
  scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
  args:
    - "--format=json"
    - "--analyzers=jvm"

Expectations then refer to methods like com.example.JsonHelper.escapeString:(Ljava/lang/String;)Ljava/lang/String;.
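An identifier in that shape has three parts: the fully qualified class, the method name, and the JVM descriptor after the colon. If the runner ever needs to match on the parts separately, the split could look like the sketch below, written in C; `split_jvm_method_id` is a hypothetical helper, and the `Class.method:descriptor` layout is taken from the example above rather than from any confirmed Scanner output format.

```c
#include <stddef.h>
#include <string.h>

/* Splits "pkg.Class.method:(args)ret" into class, method and descriptor.
   Each output buffer must hold at least `cap` bytes.
   Returns 0 on success, -1 if the input is malformed or a part
   does not fit in `cap` bytes including the terminator. */
int split_jvm_method_id(const char *id, char *cls, char *method,
                        char *desc, size_t cap) {
    const char *colon = strchr(id, ':');
    if (colon == NULL)
        return -1;

    /* The last '.' before the colon separates class from method;
       descriptors use '/' in type names, so they are not affected. */
    const char *dot = NULL;
    for (const char *p = id; p < colon; p++)
        if (*p == '.')
            dot = p;
    if (dot == NULL)
        return -1;

    size_t cls_len = (size_t)(dot - id);
    size_t method_len = (size_t)(colon - dot - 1);
    size_t desc_len = strlen(colon + 1);
    if (cls_len == 0 || method_len == 0 || desc_len == 0)
        return -1;
    if (cls_len >= cap || method_len >= cap || desc_len >= cap)
        return -1;

    memcpy(cls, id, cls_len);
    cls[cls_len] = '\0';
    memcpy(method, dot + 1, method_len);
    method[method_len] = '\0';
    memcpy(desc, colon + 1, desc_len + 1); /* copies the terminator too */
    return 0;
}
```

Keeping the descriptor as part of the key distinguishes overloads, which matters when a patch adds a new overload rather than a new method name.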


5. Wiring into CI

You can integrate this in your existing pipeline (GitLab Runner / Gitea / etc.) as a separate job.

Example CI job skeleton (GitLab-like YAML for illustration):

patch-oracle-tests:
  stage: test
  image: mcr.microsoft.com/dotnet/sdk:10.0
  script:
    - dotnet build src/StellaOps.Scanner/StellaOps.Scanner.csproj -c Release
    - dotnet build src/tools/scanner-oracle-runner/scanner-oracle-runner.csproj -c Release
    - dotnet run --project src/tools/scanner-oracle-runner/scanner-oracle-runner.csproj -- src/StellaOps.Scanner.Tests/PatchOracles
  artifacts:
    when: on_failure
    paths:
      - src/StellaOps.Scanner.Tests/PatchOracles/**/build
      - oracle-results.log

You can also:

  • Tag the job (e.g., oracle or reachability) so you can run it nightly or on changes to Scanner analyzers.
  • Pin Docker images with the exact C/C++/Java toolchains used by patch oracles so results are deterministic.

6. Practical guidelines and pitfalls

Here are concrete rules of thumb for making this robust:

6.1. Choosing good CVE oracles

Prefer cases where:

  • The patch clearly adds/removes a function or method, or introduces a separate helper such as validate_len, check_bounds, etc.
  • The patch adds/removes a call that is easy to see even under optimization (e.g., non-inline, non-template).
  • The project is easy to build and not heavily reliant on obscure toolchains.

For each supported language in Scanner, target:

  • 3-5 small C or C++ oracles.
  • 3-5 .NET or JVM oracles.
  • 1-3 for Go and Rust once those analyzers exist.

You do not need many; you want sharp, surgical tests, not coverage.

6.2. Handle inlining and optimization

Compilers may inline small functions; this can break naive “must have call edge” expectations.
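To make the problem concrete, here is a minimal sketch in C of a guard that is small enough to be inlined at -O2, marked with the GCC/Clang `noinline` attribute so it stays a distinct function. `MAX_LEN`, the buffer size, and the function bodies are illustrative assumptions, not code from any real CVE fix.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN 16 /* hypothetical limit for this sketch */

/* Without noinline, an optimizing compiler is free to fold this
   one-line check into its caller, and the foo_parse -> validate_len
   edge disappears from the binary. GCC/Clang-specific attribute. */
__attribute__((noinline))
int validate_len(size_t len) {
    return len <= MAX_LEN;
}

/* Returns the number of bytes copied, or -1 if the input is too long. */
int foo_parse(const char *in) {
    char buf[MAX_LEN + 1];
    size_t len = strlen(in);

    if (!validate_len(len))
        return -1; /* early return introduced by the "patch" */

    memcpy(buf, in, len);
    buf[len] = '\0';
    return (int)len;
}
```

In this sketch foo_parse still calls memcpy itself, so only the added-edge expectation (foo_parse → validate_len) applies to it. The attribute changes the test fixture rather than the analyzer, so it is best reserved for harness-owned code.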

Mitigations:

  • Choose functions that are “large enough” or mark them __attribute__((noinline)) (GCC/Clang) in your test harness code if necessary.
  • Alternatively, relax expectations using should_add vs must_add for some edges:
calls:
  must_add: []
  should_add:
    - caller: "foo_parse"
      callee: "validate_len"

In the runner, should_add failures can mark the oracle as “degraded” but not fatal, while must_* failures break the build.
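That three-way outcome can be sketched as a small classification step. The C version below is illustrative only; `enum oracle_status`, `classify`, and `exit_code_for` are hypothetical names, and the inputs are assumed to be counts of failed must_* and should_* expectations.

```c
enum oracle_status {
    ORACLE_PASS = 0,     /* every expectation held */
    ORACLE_DEGRADED = 1, /* only should_* expectations failed */
    ORACLE_FAIL = 2      /* at least one must_* expectation failed */
};

/* must_* failures dominate: any of them makes the oracle fail,
   regardless of how the should_* expectations fared. */
enum oracle_status classify(int must_failures, int should_failures) {
    if (must_failures > 0)
        return ORACLE_FAIL;
    if (should_failures > 0)
        return ORACLE_DEGRADED;
    return ORACLE_PASS;
}

/* Only hard failures break the build; degraded oracles are reported
   but still map to exit code 0. */
int exit_code_for(enum oracle_status status) {
    return status == ORACLE_FAIL ? 1 : 0;
}
```

Keeping the status separate from the exit code lets the runner log degraded oracles prominently without failing the pipeline.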

6.3. Keep oracles stable over time

To avoid flakiness:

  • Vendor sources into the repo (or at least snapshot the patch) so upstream changes do not affect builds.
  • Pin toolchain versions in Docker images for CI.
  • Capture and pin scanner configuration: analyzers enabled, rules, version. If you support “deterministic scan manifests” later, these oracles are perfect consumers of that.

6.4. What to assert beyond functions/calls

When your Scanner gets more advanced, you can extend oracle.yml:

cfg:
  must_increase_blocks:
    - function: "foo_parse"
  must_add_branch_on:
    - function: "foo_parse"
      operand_pattern: "len <= MAX_LEN"

Initially, I would keep it to:

  • Function presence/absence
  • Call edges presence/absence

and add CFG assertions only when your analyzers and JSON model for CFG stabilize.

6.5. How to use failures

When a patch oracle fails, it is a signal that either:

  • A change in Scanner or a new optimization pattern created a blind spot, or
  • The oracle is too strict (e.g., relying on a call that got inlined).

You then:

  1. Inspect the disassembly / Scanner JSON for vuln and fixed.
  2. Decide if Scanner is wrong (fix analyzer) or oracle is too rigid (relax to should_*).
  3. Commit both the code change and updated oracle (if needed) in the same merge request.

7. Minimal checklist for adding a new patch oracle

For your future self and your agents, here is a compressed checklist:

  1. Select CVE + patch; copy minimal affected sources into src/…/<lang>/<CVE>/src/{vuln,fixed}.

  2. Add a tiny harness that calls the patched code path.

  3. Write build.sh / build.ps1 to produce build/vuln and build/fixed artifacts, stripped/Release.

  4. Run the scanner manually on both artifacts once; inspect JSON to find real symbol names and call edges.

  5. Create oracle.yml with:

    • build.script and artifacts.* paths
    • scan.scanner_cli + args
    • expectations.functions.* and expectations.calls.*
  6. Run scanner-oracle-runner locally; fix any mismatches or over-strict expectations.

  7. Commit, and ensure the CI job patch-oracle-tests runs and passes on the MR.

If you wish, as a next step we can design the actual JSON schema that Scanner should emit for function/call graphs and write a first C# implementation of scanner-oracle-runner aligned with that schema.