Here’s a simple, cheap way to sanity‑check your vuln function recovery without fancy ground truth: **build “patch oracles.”**

---

### What it is (in plain words)

Take a known CVE and compile two **tiny** binaries from the same source:

* **Vulnerable** commit/revision
* **Fixed** commit/revision

Then diff the discovered functions and call edges between the two. If your analyzer can’t see the symbol (or guard) the patch adds, removes, or tightens, your recall is suspect.

---

### Why it works

Patches for real CVEs usually:

* add/remove a **function** (e.g., `validate_len`)
* change a **call site** (new guard before `memcpy`)
* tweak **control flow** (early return on bounds check)

Those are precisely the things your function recovery / call‑graph pass should surface—even on stripped ELFs. If they don’t move in your graph, you’ve got blind spots.

---

### Minimal workflow (5 steps)

1. **Pick a CVE** with a clean, public fix (e.g., OpenSSL/zlib/busybox).
2. **Isolate the patch** (git range or cherry‑pick) and craft a *tiny harness* that calls the affected code path.
3. **Build both** with the same toolchain/flags; produce **stripped** ELFs (`-s`) to mimic production.
4. **Run your discovery** on both:

   * function list, demangled where possible
   * call edges (A→B), basic blocks (optional)
5. **Diff the graphs**: look for the new guard function, removed unsafe call, or altered edge count.
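
Step 5 can be sketched in a few lines. This is a minimal illustration in Python (chosen only for brevity), assuming each scan has already been reduced to a set of function names and a set of `(caller, callee)` pairs; the `Graph` type and `diff_graphs` helper are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass


# Hypothetical in-memory shape for one scanned binary. Real scanner output
# will look different and must be converted into this form first.
@dataclass(frozen=True)
class Graph:
    functions: frozenset  # function names
    calls: frozenset      # (caller, callee) tuples


def diff_graphs(vuln: Graph, fixed: Graph) -> dict:
    """Return what the fix added and removed, as sorted lists."""
    return {
        "functions_added": sorted(fixed.functions - vuln.functions),
        "functions_removed": sorted(vuln.functions - fixed.functions),
        "calls_added": sorted(fixed.calls - vuln.calls),
        "calls_removed": sorted(vuln.calls - fixed.calls),
    }


vuln = Graph(
    functions=frozenset({"main", "foo_parse"}),
    calls=frozenset({("main", "foo_parse"), ("foo_parse", "memcpy")}),
)
fixed = Graph(
    functions=frozenset({"main", "foo_parse", "validate_len"}),
    calls=frozenset({
        ("main", "foo_parse"),
        ("foo_parse", "validate_len"),
        ("validate_len", "memcpy"),
    }),
)

delta = diff_graphs(vuln, fixed)
for key, values in delta.items():
    print(key, values)
```

If `functions_added` and `calls_added` come back empty for a patch that clearly adds a guard, that is the blind spot this technique is meant to expose.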

---

### A tiny “oracle spec” (drop-in YAML for your test runner)

```yaml
cve: CVE-YYYY-XXXX
target: libfoo 1.2.3
build:
  cc: clang
  cflags: [-O2, -fno-omit-frame-pointer]
  ldflags: []
  strip: true
evidence:
  expect_functions_added:   [validate_len]
  expect_functions_removed: [unsafe_copy]     # optional
  expect_call_added:
    - caller: foo_parse
      callee: validate_len
  expect_call_removed:
    - caller: foo_parse
      callee: memcpy
tolerances:
  allow_unresolved_symbols: 0
  allow_extra_funcs: 2
```
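
To show how a test runner might consume this spec, here is a hedged Python sketch. It assumes the YAML has already been loaded into a plain dict and that each scan is a dict with `functions` (a set of names) and `calls` (a set of `(caller, callee)` tuples). The `check_evidence` helper is hypothetical, and its reading of `allow_extra_funcs` (unexpected functions that appear only in the fixed build) is an assumption; `allow_unresolved_symbols` is left out because the spec does not say how unresolved symbols are reported.

```python
def check_evidence(spec, vuln, fixed):
    """Return a list of readable failures; an empty list means the oracle passes."""
    ev = spec.get("evidence", {})
    tol = spec.get("tolerances", {})
    failures = []

    added = fixed["functions"] - vuln["functions"]
    removed = vuln["functions"] - fixed["functions"]

    for name in ev.get("expect_functions_added", []):
        if name not in added:
            failures.append(f"function not added: {name}")
    for name in ev.get("expect_functions_removed", []):
        if name not in removed:
            failures.append(f"function not removed: {name}")

    for edge in ev.get("expect_call_added", []):
        pair = (edge["caller"], edge["callee"])
        if pair in vuln["calls"] or pair not in fixed["calls"]:
            failures.append(f"call not added: {pair[0]} -> {pair[1]}")
    for edge in ev.get("expect_call_removed", []):
        pair = (edge["caller"], edge["callee"])
        if pair not in vuln["calls"] or pair in fixed["calls"]:
            failures.append(f"call not removed: {pair[0]} -> {pair[1]}")

    # Assumed meaning of the tolerance: additions beyond the expected ones.
    extra = added - set(ev.get("expect_functions_added", []))
    if len(extra) > tol.get("allow_extra_funcs", 0):
        failures.append(f"too many unexpected functions: {sorted(extra)}")
    return failures


spec = {
    "evidence": {
        "expect_functions_added": ["validate_len"],
        "expect_functions_removed": ["unsafe_copy"],
        "expect_call_added": [{"caller": "foo_parse", "callee": "validate_len"}],
        "expect_call_removed": [{"caller": "foo_parse", "callee": "memcpy"}],
    },
    "tolerances": {"allow_unresolved_symbols": 0, "allow_extra_funcs": 2},
}
vuln = {
    "functions": {"main", "foo_parse", "unsafe_copy"},
    "calls": {("main", "foo_parse"), ("foo_parse", "memcpy")},
}
fixed = {
    "functions": {"main", "foo_parse", "validate_len"},
    "calls": {("main", "foo_parse"), ("foo_parse", "validate_len"),
              ("validate_len", "memcpy")},
}
# An analyzer that fails to recover the new guard function at all.
fixed_blind = {"functions": {"main", "foo_parse"}, "calls": {("main", "foo_parse")}}

ok_failures = check_evidence(spec, vuln, fixed)
blind_failures = check_evidence(spec, vuln, fixed_blind)
print(ok_failures)
print(blind_failures)
```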

---

### Quick harness pattern (C)

```c
// before: foo_parse -> memcpy(buf, src, len);
// after : foo_parse -> validate_len(len) -> memcpy(...)
extern int foo_parse(const char*);

int main(int argc, char** argv) {
  const char* in = argc > 1 ? argv[1] : "AAAA";
  return foo_parse(in);
}
```

---

### What to flag as a failure

* Expected **function not discovered** (e.g., `validate_len` missing).
* Expected **edge not present** (`foo_parse → validate_len` absent).
* **No CFG change** where patch clearly adds a guard/early return.

---

### Where this plugs into Stella Ops

* Put these oracles under `Scanner/tests/patch-oracles/*` per language.
* Run them in CI for **.NET/JVM/C/C++/Go/Rust** analyzers.
* Use them to gate any changes to symbolization, demangling, or call‑graph building.
* Record per‑analyzer **recall deltas** when you tweak heuristics or switch disassemblers.

---

If you want, I can scaffold the first three oracles (e.g., zlib overflow fix, OpenSSL length check, BusyBox `ash` patch) with ready‑to‑run Makefiles and expected graph diffs.

---

Understood. Let us turn the “patch oracle” idea into something you can actually drop into the Stella Ops repo and CI.

I will walk through:

1. How to structure this inside the monorepo
2. How to build one oracle end-to-end (C/C++ example)
3. How to implement the runner that executes and asserts them
4. How to do the same for .NET/JVM
5. How to wire it into CI
6. Practical rules and pitfalls so these stay stable and useful
7. A checklist for adding a new oracle

---

## 1. Where this lives in Stella Ops

A simple, language-agnostic layout that will scale:

```text
src/
  StellaOps.Scanner/
    ...                               # your scanner code
  StellaOps.Scanner.Tests/            # existing tests (if any)
    PatchOracles/
      c/
        CVE-YYYY-XXXX-<short-name>/
          src/
          build.sh
          oracle.yml
          README.md
      cpp/
        ...
      dotnet/
        CVE-YYYY-XXXX-<short-name>/
          src/
          build.ps1
          oracle.yml
          README.md
      jvm/
        ...
      go/
        ...
      rust/
        ...
  tools/
    scanner-oracle-runner/            # tiny runner (C# console or bash)
```

Key principles:

* Each CVE/test case is **self-contained** (its own folder with sources, build script, oracle.yml).
* Build scripts produce **two binaries/artifacts**: `vuln` and `fixed`.
* `oracle.yml` describes: how to build, what to scan, and what differences to expect in Scanner’s call graph/function list.
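
The "self-contained folder" rule is easy to enforce mechanically. The sketch below (Python, for brevity) walks a root directory, treats every folder containing `oracle.yml` as one test case, and reports cases that lack `src/` or a build script. The `find_oracles` helper and the demo folder names are hypothetical; only the layout itself comes from the tree above.

```python
import tempfile
from pathlib import Path

BUILD_SCRIPTS = ("build.sh", "build.ps1")


def find_oracles(root):
    """Yield (case_dir, problems) for every folder that contains oracle.yml."""
    for spec in sorted(Path(root).rglob("oracle.yml")):
        case = spec.parent
        problems = []
        if not (case / "src").is_dir():
            problems.append("missing src/")
        if not any((case / script).is_file() for script in BUILD_SCRIPTS):
            problems.append("missing build script")
        yield case, problems


# Demo on a throwaway directory: one well-formed case, one broken case.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "c" / "CVE-YYYY-XXXX-demo"
    (good / "src").mkdir(parents=True)
    (good / "build.sh").write_text("#!/usr/bin/env bash\n")
    (good / "oracle.yml").write_text("cve: CVE-YYYY-XXXX\n")

    bad = Path(tmp) / "dotnet" / "CVE-YYYY-XXXX-broken"
    bad.mkdir(parents=True)
    (bad / "oracle.yml").write_text("cve: CVE-YYYY-XXXX\n")

    results = {case.name: problems for case, problems in find_oracles(tmp)}

print(results)
```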

---

## 2. How to build a single patch oracle (C/C++)

Think of a patch oracle as: “Given these two binaries, Scanner must see specific changes in functions and call edges.”

### 2.1. Step-by-step workflow

For one C/C++ CVE:

1. **Pick & freeze the patch**

   * Choose a small, clean CVE in a library with easily buildable code (zlib, OpenSSL, BusyBox, etc.).
   * Identify commit `A` (vulnerable) and commit `B` (fixed).
   * Extract only the minimal sources needed to build the affected function + a harness into `src/`.

2. **Create a minimal harness**

Example: patch adds `validate_len` and guards a `memcpy` in `foo_parse`.

```c
// src/main.c
#include <stdio.h>

int foo_parse(const char* in);  // from the library code under test

int main(int argc, char** argv) {
    const char* in = (argc > 1) ? argv[1] : "AAAA";
    return foo_parse(in);
}
```

Under `src/`, you keep two sets of sources:

```text
src/
  vuln/
    foo.c        # vulnerable version
    api.h
    main.c
  fixed/
    foo.c        # fixed version (adds validate_len, changes calls)
    api.h
    main.c
```

3. **Provide a deterministic build script**

Example `build.sh`:

```bash
#!/usr/bin/env bash
set -euo pipefail

CC="${CC:-clang}"
CFLAGS="${CFLAGS:- -O2 -fno-omit-frame-pointer -g0}"
LDFLAGS="${LDFLAGS:- }"

build_one() {
  local name="$1"   # vuln or fixed
  mkdir -p build
  ${CC} ${CFLAGS} src/${name}/*.c ${LDFLAGS} -o build/${name}
  # Strip symbols to simulate production
  strip build/${name}
}

build_one "vuln"
build_one "fixed"
```

Guidelines:

* Fix the toolchain: either run this inside a Docker image (e.g., `debian:bookworm` with specific `clang` version) or at least document required versions in `README.md`.
* Always build both artifacts with **identical flags**; the only difference should be the code change.
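* Use `strip` to ensure Scanner doesn’t accidentally rely on debug symbols. Note that `strip` also removes the static symbol table, so local names such as `validate_len` are no longer present in the artifact; name-based expectations then need some way to map recovered addresses back to names (one option is to keep an unstripped copy next to the stripped one for the runner to consult).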

4. **Define the oracle (what must change)**

You define expectations based on the patch:

* Functions added/removed/renamed.
* New call edges (e.g., `foo_parse -> validate_len`).
* Removed call edges (e.g., `foo_parse -> memcpy`).
* Optionally: new basic blocks, conditional branches, or early returns.

A practical `oracle.yml` for this case:

```yaml
cve: CVE-YYYY-XXXX
name: zlib_len_guard_example
language: c
toolchain:
  cc: clang
  cflags: "-O2 -fno-omit-frame-pointer -g0"
  ldflags: ""
build:
  script: "./build.sh"
  artifacts:
    vulnerable: "build/vuln"
    fixed: "build/fixed"

scan:
  scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
  # If you have a Dockerized scanner, you could do:
  # scanner_cli: "docker run --rm -v $PWD:/work stellaops/scanner:dev"
  args:
    - "--format=json"
    - "--analyzers=native"
  timeout_seconds: 120

expectations:
  functions:
    must_exist_in_fixed:
      - name: "validate_len"
    must_not_exist_in_vuln:
      - name: "validate_len"
  calls:
    must_add:
      - caller: "foo_parse"
        callee: "validate_len"
    must_remove:
      - caller: "foo_parse"
        callee: "memcpy"
  tolerances:
    allow_unresolved_symbols: 0
    allow_extra_functions: 5
    allow_missing_calls: 0
```

5. **Connect Scanner output to the oracle**

Assume your Scanner CLI produces something like:

```json
{
  "binary": "build/fixed",
  "functions": [
    { "name": "foo_parse", "address": "0x401000" },
    { "name": "validate_len", "address": "0x401080" },
    ...
  ],
  "calls": [
    { "caller": "foo_parse", "callee": "validate_len" },
    { "caller": "validate_len", "callee": "memcpy" }
  ]
}
```

Your oracle-runner will:

* Run scanner on `vuln` → `vuln.json`
* Run scanner on `fixed` → `fixed.json`
* Compare each expectation in `oracle.yml` against `vuln.json` and `fixed.json`

Pseudo-logic for the function and call-edge checks:

```csharp
using System.Linq;
using System.Text.Json;

// True if the scan output lists a function with exactly this name.
bool HasFunction(JsonElement doc, string name) =>
    doc.GetProperty("functions")
       .EnumerateArray()
       .Any(f => f.GetProperty("name").GetString() == name);

// True if the scan output contains the directed edge caller -> callee.
bool HasCall(JsonElement doc, string caller, string callee) =>
    doc.GetProperty("calls")
       .EnumerateArray()
       .Any(c =>
            c.GetProperty("caller").GetString() == caller &&
            c.GetProperty("callee").GetString() == callee);
```

The runner will produce a small report, per oracle:

```text
[PASS] CVE-YYYY-XXXX zlib_len_guard_example
  + validate_len appears only in fixed → OK
  + foo_parse → validate_len call added → OK
  + foo_parse → memcpy call removed → OK
```

If anything fails, it prints the mismatches and exits with a non-zero code so CI fails.
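
As a rough illustration of that report-and-exit behaviour, here is a small Python sketch. The `format_report` helper and the `MISMATCH` label are hypothetical; the layout imitates the sample report above.

```python
def format_report(cve, name, checks):
    """Render one oracle's result. `checks` is a list of (description, ok) tuples."""
    passed = all(ok for _, ok in checks)
    lines = [f"[{'PASS' if passed else 'FAIL'}] {cve} {name}"]
    for description, ok in checks:
        mark = "+" if ok else "-"
        lines.append(f"  {mark} {description} → {'OK' if ok else 'MISMATCH'}")
    return passed, "\n".join(lines)


checks = [
    ("validate_len appears only in fixed", True),
    ("foo_parse → validate_len call added", False),  # e.g. the edge was not recovered
    ("foo_parse → memcpy call removed", True),
]
passed, text = format_report("CVE-YYYY-XXXX", "zlib_len_guard_example", checks)
print(text)

# A real runner would end with sys.exit(exit_code) so the CI job fails.
exit_code = 0 if passed else 1
```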

---

## 3. Implementing the oracle runner (practical variant)

You can implement this either as:

* A standalone C# console (`StellaOps.Scanner.PatchOracleRunner`), or
* A set of xUnit tests that read `oracle.yml` and run dynamically.

### 3.1. Console runner skeleton (C#)

High-level structure:

```text
src/tools/scanner-oracle-runner/
  Program.cs
  Oracles/
    (symlink or reference to src/StellaOps.Scanner.Tests/PatchOracles)
```

Core responsibilities:

1. Discover all `oracle.yml` files under `PatchOracles/`.
2. For each:

   * Run the `build` script.
   * Run the scanner on both artifacts.
   * Evaluate expectations.
3. Aggregate results and exit with appropriate status.

Pseudo-code outline:

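```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class Program
{
    static int Main(string[] args)
    {
        // Root folder to search; defaults to the in-repo oracle directory.
        var root = args.Length > 0 ? args[0] : "src/StellaOps.Scanner.Tests/PatchOracles";
        var oracleFiles = Directory.GetFiles(root, "oracle.yml", SearchOption.AllDirectories);
        var failures = new List<string>();

        foreach (var oracleFile in oracleFiles)
        {
            var result = RunOracle(oracleFile);
            if (!result.Success)
            {
                failures.Add($"{result.Name}: {result.FailureReason}");
            }
        }

        if (failures.Any())
        {
            Console.Error.WriteLine("Patch oracle failures:");
            foreach (var f in failures) Console.Error.WriteLine("  - " + f);
            return 1; // non-zero exit code fails the CI job
        }

        Console.WriteLine("All patch oracles passed.");
        return 0;
    }

    // RunOracle(string oracleFile) is not shown here; it builds, scans, and
    // evaluates one oracle and returns Name, Success, and FailureReason.
}
```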

`RunOracle` does:

* Deserialize YAML (e.g., via `YamlDotNet`).
* `Process.Start` for `build.script`.
* `Process.Start` for `scanner_cli` twice (vuln/fixed).
* Read/parse JSON outputs.
* Run checks `functions.must_*` and `calls.must_*`.

This is straightforward plumbing code; once built, adding a new patch oracle is just adding a folder + `oracle.yml`.
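
The plumbing inside `RunOracle` can be sketched as follows. This is a Python illustration only (the runner itself is proposed as C#), and it makes two labeled assumptions: the scanner accepts the artifact path as its last argument, and it prints its JSON to stdout. The `run` parameter is injectable so the flow can be exercised without a real toolchain.

```python
import json
import shlex
import subprocess


def run_command(command, cwd, timeout):
    """Run a command line and return its stdout; raises on a non-zero exit."""
    done = subprocess.run(
        shlex.split(command), cwd=cwd, timeout=timeout,
        capture_output=True, text=True, check=True,
    )
    return done.stdout


def run_oracle(spec, case_dir, run=run_command):
    """Build both artifacts, scan each one, and return (vuln_doc, fixed_doc)."""
    timeout = spec["scan"].get("timeout_seconds", 120)
    run(spec["build"]["script"], case_dir, timeout)

    docs = []
    for kind in ("vulnerable", "fixed"):
        artifact = spec["build"]["artifacts"][kind]
        # Assumption: artifact path goes last and JSON arrives on stdout.
        command = " ".join(
            [spec["scan"]["scanner_cli"], *spec["scan"]["args"], shlex.quote(artifact)]
        )
        docs.append(json.loads(run(command, case_dir, timeout)))
    return docs[0], docs[1]


# Stand-in for the real build and scanner, used only for this demo.
def fake_run(command, cwd, timeout):
    if command.endswith("build.sh"):
        return ""
    name = "fixed" if command.endswith("build/fixed") else "vuln"
    return json.dumps({"binary": f"build/{name}", "functions": [], "calls": []})


spec = {
    "build": {
        "script": "./build.sh",
        "artifacts": {"vulnerable": "build/vuln", "fixed": "build/fixed"},
    },
    "scan": {
        "scanner_cli": "dotnet run --project ../../StellaOps.Scanner.Cli",
        "args": ["--format=json", "--analyzers=native"],
        "timeout_seconds": 120,
    },
}
vuln_doc, fixed_doc = run_oracle(spec, ".", run=fake_run)
print(vuln_doc["binary"], fixed_doc["binary"])
```

Keeping process execution behind one injectable function also makes it simple to unit-test the expectation logic separately from the build and scan steps.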

---

## 4. Managed (.NET / JVM) patch oracles

Exact same concept, slightly different mechanics.

### 4.1. .NET example

Directory:

```text
PatchOracles/
  dotnet/
    CVE-2021-XXXXX-systemtextjson/
      src/
        vuln/
          Example.sln
          Api/...
        fixed/
          Example.sln
          Api/...
      build.ps1
      oracle.yml
```

`build.ps1` (PowerShell, simplified):

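```powershell
param(
  [string]$Configuration = "Release"
)

$ErrorActionPreference = "Stop"

function Build-One([string]$name) {
  Push-Location "src/$name"
  try {
    dotnet clean
    dotnet publish -c $Configuration -p:DebugType=None -p:DebugSymbols=false -o ../../build/$name
    # $ErrorActionPreference does not cover native exit codes by default, so check explicitly.
    if ($LASTEXITCODE -ne 0) { throw "dotnet publish failed for $name" }
  }
  finally {
    # Always return to the oracle folder, even if the build throws.
    Pop-Location
  }
}

New-Item -ItemType Directory -Force -Path "build" | Out-Null

Build-One "vuln"
Build-One "fixed"
```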

`oracle.yml`:

```yaml
cve: CVE-2021-XXXXX
name: systemtextjson_escape_fix
language: dotnet
build:
  script: "pwsh ./build.ps1"
  artifacts:
    vulnerable: "build/vuln/Api.dll"
    fixed:      "build/fixed/Api.dll"

scan:
  scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
  args:
    - "--format=json"
    - "--analyzers=dotnet"
  timeout_seconds: 120

expectations:
  methods:
    must_exist_in_fixed:
      - "Api.JsonHelper::EscapeString"
    must_not_exist_in_vuln:
      - "Api.JsonHelper::EscapeString"
  calls:
    must_add:
      - caller: "Api.Controller::Handle"
        callee: "Api.JsonHelper::EscapeString"
  tolerances:
    allow_missing_calls: 0
    allow_extra_methods: 10
```

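Scanner’s .NET analyzer should produce method identifiers in a stable format (e.g., `Namespace.Type::Method(Signature)`). The names in `oracle.yml` must match that format exactly, so if the analyzer emits signatures, the entries above need to include them as well.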

### 4.2. JVM example

Similar structure, but the artifacts are JARs (the build script is assumed to assemble both the vulnerable and the fixed variant):

```yaml
build:
  script: "./gradlew :app:assemble"
  artifacts:
    vulnerable: "app-vuln.jar"
    fixed: "app-fixed.jar"

scan:
  scanner_cli: "dotnet run --project ../../StellaOps.Scanner.Cli"
  args:
    - "--format=json"
    - "--analyzers=jvm"
```

Expectations then refer to methods like `com.example.JsonHelper.escapeString:(Ljava/lang/String;)Ljava/lang/String;`.
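
Identifiers in that form are easier to compare and report once split into parts. The Python sketch below decodes the `owner.method:descriptor` shape shown above, using the standard JVM descriptor letters (`B`, `C`, `D`, `F`, `I`, `J`, `S`, `Z`, `V`, `L...;`, `[`). The `parse_jvm_method` helper is hypothetical, and whether Scanner emits exactly this identifier shape is an assumption.

```python
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _read_type(desc, i):
    """Decode one type starting at desc[i]; return (readable_name, next_index)."""
    dims = 0
    while desc[i] == "[":          # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":             # object type: L<binary/class/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:                          # single-letter primitive or void
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def parse_jvm_method(identifier):
    """Split 'pkg.Class.method:(params)ret' into class, method, params, returns."""
    owner_and_name, descriptor = identifier.split(":", 1)
    owner, method = owner_and_name.rsplit(".", 1)
    if not descriptor.startswith("("):
        raise ValueError(f"not a method descriptor: {descriptor}")
    i, params = 1, []
    while descriptor[i] != ")":
        name, i = _read_type(descriptor, i)
        params.append(name)
    returns, _ = _read_type(descriptor, i + 1)
    return {"class": owner, "method": method, "params": params, "returns": returns}


escape = parse_jvm_method(
    "com.example.JsonHelper.escapeString:(Ljava/lang/String;)Ljava/lang/String;"
)
# Hypothetical second identifier, to exercise arrays and primitives.
copy = parse_jvm_method("com.example.Buf.copy:([BI)V")
print(escape)
print(copy)
```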

---

## 5. Wiring into CI

You can integrate this in your existing pipeline (GitLab Runner / Gitea / etc.) as a separate job.

Example CI job skeleton (GitLab-like YAML for illustration):

```yaml
patch-oracle-tests:
  stage: test
  image: mcr.microsoft.com/dotnet/sdk:10.0
  script:
    - dotnet build src/StellaOps.Scanner/StellaOps.Scanner.csproj -c Release
    - dotnet build src/tools/scanner-oracle-runner/scanner-oracle-runner.csproj -c Release
    - >-
      dotnet run --project src/tools/scanner-oracle-runner/scanner-oracle-runner.csproj --
      src/StellaOps.Scanner.Tests/PatchOracles
  artifacts:
    when: on_failure
    paths:
      - src/StellaOps.Scanner.Tests/PatchOracles/**/build
      - oracle-results.log
```

You can also:

* Tag the job (e.g., `oracle` or `reachability`) so you can run it nightly or on changes to Scanner analyzers.
* Pin Docker images with the exact C/C++/Java toolchains used by patch oracles so results are deterministic.

---

## 6. Practical guidelines and pitfalls

Here are concrete rules of thumb for making this robust:

### 6.1. Choosing good CVE oracles

Prefer cases where:

* The patch clearly adds/removes a **function** or **method**, or introduces a separate helper such as `validate_len`, `check_bounds`, etc.
* The patch adds/removes a **call** that is easy to see even under optimization (e.g., non-inline, non-template).
* The project is easy to build and not heavily reliant on obscure toolchains.

For each supported language in Scanner, target:

* 3–5 small C or C++ oracles.
* 3–5 .NET or JVM oracles.
* 1–3 for Go and Rust once those analyzers exist.

You do not need many; you want **sharp, surgical tests**, not coverage.

### 6.2. Handle inlining and optimization

Compilers may inline small functions; this can break naive “must have call edge” expectations.

Mitigations:

* Choose functions that are “large enough” or mark them `__attribute__((noinline))` (GCC/Clang) in your test harness code if necessary.
* Alternatively, relax expectations using `should_add` vs `must_add` for some edges:

```yaml
calls:
  must_add: []
  should_add:
    - caller: "foo_parse"
      callee: "validate_len"
```

In the runner, `should_add` failures can mark the oracle as “degraded” but not fatal, while `must_*` failures break the build.
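
One way the runner could separate fatal from degraded outcomes is sketched below in Python. The `classify` helper and the three status strings are hypothetical; only the `must_add` and `should_add` keys come from the YAML above.

```python
def classify(calls_spec, vuln_calls, fixed_calls):
    """Return 'pass', 'degraded', or 'fail' for the call-edge expectations."""
    def was_added(edge):
        pair = (edge["caller"], edge["callee"])
        return pair not in vuln_calls and pair in fixed_calls

    if not all(was_added(e) for e in calls_spec.get("must_add", [])):
        return "fail"        # breaks the build
    if not all(was_added(e) for e in calls_spec.get("should_add", [])):
        return "degraded"    # reported, but not fatal
    return "pass"


guard_edge = {"caller": "foo_parse", "callee": "validate_len"}
vuln_calls = {("main", "foo_parse"), ("foo_parse", "memcpy")}

# Case 1: the guard stayed a real call in the fixed build.
kept = {("main", "foo_parse"), ("foo_parse", "validate_len"), ("validate_len", "memcpy")}
# Case 2: the compiler inlined validate_len, so the edge never appears.
inlined = {("main", "foo_parse"), ("foo_parse", "memcpy")}

relaxed = {"must_add": [], "should_add": [guard_edge]}
strict = {"must_add": [guard_edge]}

status_kept = classify(relaxed, vuln_calls, kept)
status_inlined = classify(relaxed, vuln_calls, inlined)
status_strict = classify(strict, vuln_calls, inlined)
print(status_kept, status_inlined, status_strict)
```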

### 6.3. Keep oracles stable over time

To avoid flakiness:

* **Vendor sources** into the repo (or at least snapshot the patch) so upstream changes do not affect builds.
* Pin toolchain versions in Docker images for CI.
* Capture and pin scanner configuration: analyzers enabled, rules, version. If you support “deterministic scan manifests” later, these oracles are perfect consumers of that.
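
A lightweight way to notice configuration drift is to store a fingerprint of the pinned settings next to each oracle and compare it on every run. The Python sketch below hashes a canonical JSON rendering of a config dict; the field names (`scanner_version`, `image`, and so on) are hypothetical placeholders, not an existing Stella Ops format.

```python
import hashlib
import json


def fingerprint(config):
    """Stable SHA-256 hex digest of a JSON-serializable config dict."""
    # sort_keys plus fixed separators make the encoding independent of
    # key order and whitespace, so equal configs hash equally.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


pinned = {
    "toolchain": {"cc": "clang", "cflags": "-O2 -fno-omit-frame-pointer -g0"},
    "scanner": {"analyzers": ["native"], "scanner_version": "dev"},
    "image": "debian:bookworm",
}
# Same content, different key order: must produce the same fingerprint.
reordered = {
    "image": "debian:bookworm",
    "scanner": {"scanner_version": "dev", "analyzers": ["native"]},
    "toolchain": {"cflags": "-O2 -fno-omit-frame-pointer -g0", "cc": "clang"},
}
# A changed optimization level: must produce a different fingerprint.
drifted = {**pinned, "toolchain": {"cc": "clang", "cflags": "-O3 -g0"}}

fp_pinned = fingerprint(pinned)
fp_reordered = fingerprint(reordered)
fp_drifted = fingerprint(drifted)
print(fp_pinned == fp_reordered, fp_pinned == fp_drifted)
```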

### 6.4. What to assert beyond functions/calls

When your Scanner gets more advanced, you can extend `oracle.yml`:

```yaml
cfg:
  must_increase_blocks:
    - function: "foo_parse"
  must_add_branch_on:
    - function: "foo_parse"
      operand_pattern: "len <= MAX_LEN"
```

Initially, I would keep it to:

* Function presence/absence
* Call edges presence/absence

and add CFG assertions only when your analyzers and JSON model for CFG stabilize.
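
For when that time comes, a `must_increase_blocks` check could look roughly like this Python sketch. It assumes each function entry in the scanner JSON carries a `blocks` count, which is a hypothetical field not present in the sample output earlier; `check_block_increase` is likewise an illustrative name.

```python
def block_counts(doc):
    """Map function name -> basic-block count, skipping entries without one."""
    return {f["name"]: f["blocks"] for f in doc["functions"] if "blocks" in f}


def check_block_increase(cfg_spec, vuln_doc, fixed_doc):
    """Return failures for functions whose block count did not grow in the fix."""
    before, after = block_counts(vuln_doc), block_counts(fixed_doc)
    failures = []
    for item in cfg_spec.get("must_increase_blocks", []):
        name = item["function"]
        if name not in before or name not in after:
            failures.append(f"{name}: no block count in one of the scans")
        elif after[name] <= before[name]:
            failures.append(f"{name}: blocks {before[name]} -> {after[name]}")
    return failures


cfg_spec = {"must_increase_blocks": [{"function": "foo_parse"}]}
vuln_doc = {"functions": [{"name": "foo_parse", "blocks": 3}]}
fixed_doc = {"functions": [{"name": "foo_parse", "blocks": 5}]}   # guard added blocks
flat_doc = {"functions": [{"name": "foo_parse", "blocks": 3}]}    # no visible change

grew = check_block_increase(cfg_spec, vuln_doc, fixed_doc)
flat = check_block_increase(cfg_spec, vuln_doc, flat_doc)
print(grew, flat)
```

Block counts shift with optimization level, so a check like this is only meaningful when both artifacts are built with identical flags.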

### 6.5. How to use failures

When a patch oracle fails, it is a **signal** that either:

* A change in Scanner or a new optimization pattern created a blind spot, or
* The oracle is too strict (e.g., relying on a call that got inlined).

You then:

1. Inspect the disassembly / Scanner JSON for `vuln` and `fixed`.
2. Decide if Scanner is wrong (fix analyzer) or oracle is too rigid (relax to `should_*`).
3. Commit both the code change and updated oracle (if needed) in the same merge request.

---

## 7. Minimal checklist for adding a new patch oracle

For your future self and your agents, here is a compressed checklist:

1. Select CVE + patch; copy minimal affected sources into `src/…/<lang>/<CVE>/src/{vuln,fixed}`.
2. Add a tiny harness that calls the patched code path.
3. Write `build.sh` / `build.ps1` to produce `build/vuln` and `build/fixed` artifacts, stripped/Release.
4. Run the scanner manually on both artifacts once; inspect the JSON to find the real symbol names and call edges.
5. Create `oracle.yml` with:

   * `build.script` and `artifacts.*` paths
   * `scan.scanner_cli` + args
   * `expectations.functions.*` and `expectations.calls.*`
6. Run `scanner-oracle-runner` locally; fix any mismatches or over-strict expectations.
7. Commit, and make sure the `patch-oracle-tests` CI job runs on the merge request and is required to pass.

If you wish, next step we can design the actual JSON schema that Scanner should emit for function/call graphs and write a first C# implementation of `scanner-oracle-runner` aligned with that schema.