Here’s a simple, cheap way to sanity-check your vuln function recovery without fancy ground truth: **build “patch oracles.”**

---

### What it is (in plain words)

Take a known CVE and compile two **tiny** binaries from the same source:

* **Vulnerable** commit/revision
* **Fixed** commit/revision

Then diff the discovered functions + call edges between the two. If your analyzer can’t see the symbol (or guard) the patch adds/removes/tightens, your recall is suspect.

---

### Why it works

Patches for real CVEs usually:

* add/remove a **function** (e.g., `validate_len`)
* change a **call site** (new guard before `memcpy`)
* tweak **control flow** (early return on bounds check)

Those are precisely the things your function recovery / call-graph pass should surface, even on stripped ELFs. If they don’t move in your graph, you’ve got blind spots.

---

### Minimal workflow (5 steps)

1. **Pick a CVE** with a clean, public fix (e.g., OpenSSL/zlib/busybox).
2. **Isolate the patch** (git range or cherry-pick) and craft a *tiny harness* that calls the affected code path.
3. **Build both** with the same toolchain/flags; produce **stripped** ELFs (`-s`) to mimic production.
4. **Run your discovery** on both:

   * function list, demangled where possible
   * call edges (A→B), basic blocks (optional)
5. **Diff the graphs**: look for the new guard function, removed unsafe call, or altered edge count.
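The diff in step 5 can be prototyped with nothing more than `sort` and `comm`. The sketch below is only an illustration under stated assumptions: it assumes each scan has already been flattened into plain text files, one function name per line (`*.funcs`) and one `caller callee` pair per line (`*.edges`). That file format and the helper names `added` and `removed` are hypothetical, not something the Scanner is known to emit.

```bash
#!/usr/bin/env bash
# Sketch only: the *.funcs / *.edges text format is an assumption made
# for this example, not an actual Scanner output format.
set -eu
LC_ALL=C; export LC_ALL   # keep sort and comm collation consistent

# Print lines present in the fixed list ($2) but not in the vulnerable list ($1).
added() {
  _v="$(mktemp)"; _f="$(mktemp)"
  sort -u "$1" > "$_v"; sort -u "$2" > "$_f"
  comm -13 "$_v" "$_f"    # drop "only in vuln" and "in both" columns
  rm -f "$_v" "$_f"
}

# Print lines present in the vulnerable list ($1) but not in the fixed list ($2).
removed() {
  _v="$(mktemp)"; _f="$(mktemp)"
  sort -u "$1" > "$_v"; sort -u "$2" > "$_f"
  comm -23 "$_v" "$_f"    # drop "only in fixed" and "in both" columns
  rm -f "$_v" "$_f"
}

# Demo data mirroring the foo_parse / validate_len example.
demo="$(mktemp -d)"
printf '%s\n' main foo_parse > "$demo/vuln.funcs"
printf '%s\n' main foo_parse validate_len > "$demo/fixed.funcs"
printf '%s\n' 'main foo_parse' 'foo_parse memcpy' > "$demo/vuln.edges"
printf '%s\n' 'main foo_parse' 'foo_parse validate_len' 'validate_len memcpy' > "$demo/fixed.edges"

echo "functions added:"; added   "$demo/vuln.funcs" "$demo/fixed.funcs"
echo "edges added:";     added   "$demo/vuln.edges" "$demo/fixed.edges"
echo "edges removed:";   removed "$demo/vuln.edges" "$demo/fixed.edges"
```

On the demo data, `validate_len` should show up as an added function and `foo_parse memcpy` as a removed edge. If either list comes back empty for a patch that clearly adds a guard, that is the blind spot the oracle is meant to expose.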
---

### A tiny “oracle spec” (drop-in YAML for your test runner)

```yaml
cve: CVE-YYYY-XXXX
target: libfoo 1.2.3
build:
  cc: clang
  cflags: [-O2, -fno-omit-frame-pointer]
  ldflags: []
  strip: true
evidence:
  expect_functions_added: [validate_len]
  expect_functions_removed: [unsafe_copy]  # optional
  expect_call_added:
    - caller: foo_parse
      callee: validate_len
  expect_call_removed:
    - caller: foo_parse
      callee: memcpy
tolerances:
  allow_unresolved_symbols: 0
  allow_extra_funcs: 2
```

---

### Quick harness pattern (C)

```c
// before: foo_parse -> memcpy(buf, src, len);
// after : foo_parse -> validate_len(len) -> memcpy(...)
extern int foo_parse(const char*);

int main(int argc, char** argv) {
    const char* in = argc > 1 ? argv[1] : "AAAA";
    return foo_parse(in);
}
```

---

### What to flag as a failure

* Expected **function not discovered** (e.g., `validate_len` missing).
* Expected **edge not present** (`foo_parse → validate_len` absent).
* **No CFG change** where patch clearly adds a guard/early return.

---

### Where this plugs into Stella Ops

* Put these oracles under `Scanner/tests/patch-oracles/*` per language.
* Run them in CI for **.NET/JVM/C/C++/Go/Rust** analyzers.
* Use them to gate any changes to symbolization, demangling, or call-graph building.
* Record per-analyzer **recall deltas** when you tweak heuristics or switch disassemblers.

---

If you want, I can scaffold the first three oracles (e.g., zlib overflow fix, OpenSSL length check, BusyBox `ash` patch) with ready-to-run Makefiles and expected graph diffs.

Understood. Let us turn the “patch oracle” idea into something you can actually drop into the Stella Ops repo and CI.

I will walk through:

1. How to structure this inside the monorepo
2. How to build one oracle end-to-end (C/C++ example)
3. How to do the same for .NET/JVM
4. How to automate running and asserting them
5. Practical rules and pitfalls so these stay stable and useful

---

## 1. Where this lives in Stella Ops

A simple, language-agnostic layout that will scale:

```text
src/
  StellaOps.Scanner/
    ...                        # your scanner code
  StellaOps.Scanner.Tests/     # existing tests (if any)
    PatchOracles/
      c/
        CVE-YYYY-XXXX-/
          src/
          build.sh
          oracle.yml
          README.md
      cpp/
        ...
      dotnet/
        CVE-YYYY-XXXX-/
          src/
          build.ps1
          oracle.yml
          README.md
      jvm/
        ...
      go/
        ...
      rust/
        ...
  tools/
    scanner-oracle-runner/     # tiny runner (C# console or bash)
```

Key principles:

* Each CVE/test case is **self-contained** (its own folder with sources, build script, `oracle.yml`).
* Build scripts produce **two binaries/artifacts**: `vuln` and `fixed`.
* `oracle.yml` describes how to build, what to scan, and what differences to expect in Scanner’s call graph/function list.

---

## 2. How to build a single patch oracle (C/C++)

Think of a patch oracle as: “Given these two binaries, Scanner must see specific changes in functions and call edges.”

### 2.1. Step-by-step workflow

For one C/C++ CVE:

1. **Pick & freeze the patch**

   * Choose a small, clean CVE in a library with easily buildable code (zlib, OpenSSL, BusyBox, etc.).
   * Identify commit `A` (vulnerable) and commit `B` (fixed).
   * Extract only the minimal sources needed to build the affected function + a harness into `src/`.

2. **Create a minimal harness**

   Example: patch adds `validate_len` and guards a `memcpy` in `foo_parse`.

   ```c
   // src/vuln/main.c and src/fixed/main.c (identical harness)
   int foo_parse(const char* in); // from the library code under test

   int main(int argc, char** argv) {
       const char* in = (argc > 1) ? argv[1] : "AAAA";
       return foo_parse(in);
   }
   ```

   Under `src/`, you keep two sets of sources:

   ```text
   src/
     vuln/
       foo.c        # vulnerable version
       api.h
       main.c
     fixed/
       foo.c        # fixed version (adds validate_len, changes calls)
       api.h
       main.c
   ```

3. **Provide a deterministic build script**

   Example `build.sh`:

   ```bash
   #!/usr/bin/env bash
   set -euo pipefail

   CC="${CC:-clang}"
   CFLAGS="${CFLAGS:- -O2 -fno-omit-frame-pointer -g0}"
   LDFLAGS="${LDFLAGS:- }"

   build_one() {
     local name="$1" # vuln or fixed
     mkdir -p build
     # CFLAGS/LDFLAGS are intentionally unquoted so they split into words
     ${CC} ${CFLAGS} src/${name}/*.c ${LDFLAGS} -o build/${name}
     # Strip symbols to simulate production
     strip build/${name}
   }

   build_one "vuln"
   build_one "fixed"
   ```

   Guidelines:

   * Fix the toolchain: either run this inside a Docker image (e.g., `debian:bookworm` with a specific `clang` version) or at least document required versions in `README.md`.
   * Always build both artifacts with **identical flags**; the only difference should be the code change.
   * Use `strip` to ensure Scanner doesn’t accidentally rely on debug symbols.

4. **Define the oracle (what must change)**

   You define expectations based on the patch:

   * Functions added/removed/renamed.
   * New call edges (e.g., `foo_parse -> validate_len`).
   * Removed call edges (e.g., `foo_parse -> memcpy`).
   * Optionally: new basic blocks, conditional branches, or early returns.

   A practical `oracle.yml` for this case:

   ```yaml
   cve: CVE-YYYY-XXXX
   name: zlib_len_guard_example
   language: c
   toolchain:
     cc: clang
     cflags: "-O2 -fno-omit-frame-pointer -g0"
     ldflags: ""
   build:
     script: "./build.sh"
   artifacts:
     vulnerable: "build/vuln"
     fixed: "build/fixed"
   scan:
     scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
     # If you have a Dockerized scanner, you could do:
     # scanner_cli: "docker run --rm -v $PWD:/work stellaops/scanner:dev"
     args:
       - "--format=json"
       - "--analyzers=native"
     timeout_seconds: 120
   expectations:
     functions:
       must_exist_in_fixed:
         - name: "validate_len"
       must_not_exist_in_vuln:
         - name: "validate_len"
     calls:
       must_add:
         - caller: "foo_parse"
           callee: "validate_len"
       must_remove:
         - caller: "foo_parse"
           callee: "memcpy"
   tolerances:
     allow_unresolved_symbols: 0
     allow_extra_functions: 5
     allow_missing_calls: 0
   ```

5. **Connect Scanner output to the oracle**

   Assume your Scanner CLI produces something like:

   ```json
   {
     "binary": "build/fixed",
     "functions": [
       { "name": "foo_parse", "address": "0x401000" },
       { "name": "validate_len", "address": "0x401080" },
       ...
     ],
     "calls": [
       { "caller": "foo_parse", "callee": "validate_len" },
       { "caller": "validate_len", "callee": "memcpy" }
     ]
   }
   ```

   Your oracle-runner will:

   * Run scanner on `vuln` → `vuln.json`
   * Run scanner on `fixed` → `fixed.json`
   * Compare each expectation in `oracle.yml` against `vuln.json` and `fixed.json`

   Pseudo-logic for the function and call expectations:

   ```csharp
   using System.Linq;
   using System.Text.Json;

   // True if the scan output lists a function with this exact name.
   bool HasFunction(JsonElement doc, string name) =>
       doc.GetProperty("functions")
          .EnumerateArray()
          .Any(f => f.GetProperty("name").GetString() == name);

   // True if the scan output lists a caller -> callee edge.
   bool HasCall(JsonElement doc, string caller, string callee) =>
       doc.GetProperty("calls")
          .EnumerateArray()
          .Any(c => c.GetProperty("caller").GetString() == caller
                 && c.GetProperty("callee").GetString() == callee);
   ```

   The runner will produce a small report, per oracle:

   ```text
   [PASS] CVE-YYYY-XXXX zlib_len_guard_example
     + validate_len appears only in fixed → OK
     + foo_parse → validate_len call added → OK
     + foo_parse → memcpy call removed → OK
   ```

   If anything fails, it prints the mismatches and exits with a non-zero code so CI fails.

---

## 3. Implementing the oracle runner (practical variant)

You can implement this either as:

* A standalone C# console (`StellaOps.Scanner.PatchOracleRunner`), or
* A set of xUnit tests that read `oracle.yml` and run dynamically.

### 3.1. Console runner skeleton (C#)

High-level structure:

```text
src/tools/scanner-oracle-runner/
  Program.cs
  Oracles/   (symlink or reference to src/StellaOps.Scanner.Tests/PatchOracles)
```

Core responsibilities:

1. Discover all `oracle.yml` files under `PatchOracles/`.
2. For each:

   * Run the `build` script.
   * Run the scanner on both artifacts.
   * Evaluate expectations.
3. Aggregate results and exit with appropriate status.
Pseudo-code outline:

```csharp
static int Main(string[] args)
{
    var root = args.Length > 0 ? args[0] : "src/StellaOps.Scanner.Tests/PatchOracles";
    var oracleFiles = Directory.GetFiles(root, "oracle.yml", SearchOption.AllDirectories);
    var failures = new List<string>();

    foreach (var oracleFile in oracleFiles)
    {
        // RunOracle builds, scans both artifacts, and evaluates expectations.
        var result = RunOracle(oracleFile);
        if (!result.Success)
        {
            failures.Add($"{result.Name}: {result.FailureReason}");
        }
    }

    if (failures.Count > 0)
    {
        Console.Error.WriteLine("Patch oracle failures:");
        foreach (var f in failures)
            Console.Error.WriteLine("  - " + f);
        return 1; // non-zero exit fails the CI job
    }

    Console.WriteLine("All patch oracles passed.");
    return 0;
}
```

`RunOracle` does:

* Deserialize YAML (e.g., via `YamlDotNet`).
* `Process.Start` for `build.script`.
* `Process.Start` for `scanner_cli` twice (vuln/fixed).
* Read/parse JSON outputs.
* Run checks `functions.must_*` and `calls.must_*`.

This is straightforward plumbing code; once built, adding a new patch oracle is just adding a folder + `oracle.yml`.

---

## 4. Managed (.NET / JVM) patch oracles

Exact same concept, slightly different mechanics.

### 4.1. .NET example

Directory:

```text
PatchOracles/
  dotnet/
    CVE-2021-XXXXX-systemtextjson/
      src/
        vuln/
          Example.sln
          Api/...
        fixed/
          Example.sln
          Api/...
      build.ps1
      oracle.yml
```

`build.ps1` (PowerShell, simplified):

```powershell
param(
    [string]$Configuration = "Release"
)

$ErrorActionPreference = "Stop"

function Build-One([string]$name) {
    Push-Location "src/$name"
    dotnet clean
    dotnet publish -c $Configuration -p:DebugType=None -p:DebugSymbols=false -o ../../build/$name
    Pop-Location
}

New-Item -ItemType Directory -Force -Path "build" | Out-Null
Build-One "vuln"
Build-One "fixed"
```

`oracle.yml`:

```yaml
cve: CVE-2021-XXXXX
name: systemtextjson_escape_fix
language: dotnet
build:
  script: "pwsh ./build.ps1"
artifacts:
  vulnerable: "build/vuln/Api.dll"
  fixed: "build/fixed/Api.dll"
scan:
  scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
  args:
    - "--format=json"
    - "--analyzers=dotnet"
  timeout_seconds: 120
expectations:
  methods:
    must_exist_in_fixed:
      - "Api.JsonHelper::EscapeString"
    must_not_exist_in_vuln:
      - "Api.JsonHelper::EscapeString"
  calls:
    must_add:
      - caller: "Api.Controller::Handle"
        callee: "Api.JsonHelper::EscapeString"
tolerances:
  allow_missing_calls: 0
  allow_extra_methods: 10
```

Scanner’s .NET analyzer should produce method identifiers in a stable format (e.g., `Namespace.Type::Method(Signature)`), which you then use in the oracle.

### 4.2. JVM example

Similar structure, but artifacts are JARs:

```yaml
build:
  script: "./gradlew :app:assemble"
artifacts:
  vulnerable: "app-vuln.jar"
  fixed: "app-fixed.jar"
scan:
  scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
  args:
    - "--format=json"
    - "--analyzers=jvm"
```

Expectations then refer to methods like `com.example.JsonHelper.escapeString:(Ljava/lang/String;)Ljava/lang/String;`.

---

## 5. Wiring into CI

You can integrate this in your existing pipeline (GitLab Runner / Gitea / etc.) as a separate job.
Example CI job skeleton (GitLab-like YAML for illustration):

```yaml
patch-oracle-tests:
  stage: test
  image: mcr.microsoft.com/dotnet/sdk:10.0
  script:
    - dotnet build src/StellaOps.Scanner/StellaOps.Scanner.csproj -c Release
    - dotnet build src/tools/scanner-oracle-runner/scanner-oracle-runner.csproj -c Release
    # Kept on one line: a trailing backslash inside a plain YAML scalar is not a shell line continuation
    - dotnet run --project src/tools/scanner-oracle-runner/scanner-oracle-runner.csproj -- src/StellaOps.Scanner.Tests/PatchOracles
  artifacts:
    when: on_failure
    paths:
      - src/StellaOps.Scanner.Tests/PatchOracles/**/build
      - oracle-results.log
```

You can also:

* Tag the job (e.g., `oracle` or `reachability`) so you can run it nightly or on changes to Scanner analyzers.
* Pin Docker images with the exact C/C++/Java toolchains used by patch oracles so results are deterministic.

---

## 6. Practical guidelines and pitfalls

Here are concrete rules of thumb for making this robust:

### 6.1. Choosing good CVE oracles

Prefer cases where:

* The patch clearly adds/removes a **function** or **method**, or introduces a separate helper such as `validate_len`, `check_bounds`, etc.
* The patch adds/removes a **call** that is easy to see even under optimization (e.g., non-inline, non-template).
* The project is easy to build and not heavily reliant on obscure toolchains.

For each supported language in Scanner, target:

* 3–5 small C or C++ oracles.
* 3–5 .NET or JVM oracles.
* 1–3 for Go and Rust once those analyzers exist.

You do not need many; you want **sharp, surgical tests**, not coverage.

### 6.2. Handle inlining and optimization

Compilers may inline small functions; this can break naive “must have call edge” expectations.

Mitigations:

* Choose functions that are “large enough” or mark them `__attribute__((noinline))` (GCC/Clang) in your test harness code if necessary.
* Alternatively, relax expectations using `should_add` vs `must_add` for some edges:

  ```yaml
  calls:
    must_add: []
    should_add:
      - caller: "foo_parse"
        callee: "validate_len"
  ```

In the runner, `should_add` failures can mark the oracle as “degraded” but not fatal, while `must_*` failures break the build.

### 6.3. Keep oracles stable over time

To avoid flakiness:

* **Vendor sources** into the repo (or at least snapshot the patch) so upstream changes do not affect builds.
* Pin toolchain versions in Docker images for CI.
* Capture and pin scanner configuration: analyzers enabled, rules, version. If you support “deterministic scan manifests” later, these oracles are perfect consumers of that.

### 6.4. What to assert beyond functions/calls

When your Scanner gets more advanced, you can extend `oracle.yml`:

```yaml
cfg:
  must_increase_blocks:
    - function: "foo_parse"
  must_add_branch_on:
    - function: "foo_parse"
      operand_pattern: "len <= MAX_LEN"
```

Initially, I would keep it to:

* Function presence/absence
* Call edges presence/absence

and add CFG assertions only when your analyzers and JSON model for CFG stabilize.

### 6.5. How to use failures

When a patch oracle fails, it is a **signal** that either:

* A change in Scanner or a new optimization pattern created a blind spot, or
* The oracle is too strict (e.g., relying on a call that got inlined).

You then:

1. Inspect the disassembly / Scanner JSON for `vuln` and `fixed`.
2. Decide if Scanner is wrong (fix analyzer) or oracle is too rigid (relax to `should_*`).
3. Commit both the code change and updated oracle (if needed) in the same merge request.

---

## 7. Minimal checklist for adding a new patch oracle

For your future self and your agents, here is a compressed checklist:

1. Select CVE + patch; copy minimal affected sources into `src/…///src/{vuln,fixed}`.
2. Add a tiny harness that calls the patched code path.
3. Write `build.sh` / `build.ps1` to produce `build/vuln` and `build/fixed` artifacts, stripped/Release.
4. Run `scanner` manually on both artifacts once; inspect the JSON to find the real symbol names and call edges.
5. Create `oracle.yml` with:

   * `build.script` and `artifacts.*` paths
   * `scan.scanner_cli` + args
   * `expectations.functions.*` and `expectations.calls.*`
6. Run `scanner-oracle-runner` locally; fix any mismatches or over-strict expectations.
7. Commit and ensure the CI job `patch-oracle-tests` runs and must pass on MR.

If you wish, as a next step we can design the actual JSON schema that Scanner should emit for function/call graphs and write a first C# implementation of `scanner-oracle-runner` aligned with that schema.