git.stella-ops.org/docs/product-advisories/30-Nov-2025 - Reachability Benchmark Fixtures Snapshot.md
StellaOps Bot 25254e3831 news advisories
2025-11-30 21:00:38 +02:00


I thought you might want a sharper picture of what's out there, and what's credible, for building a compact “golden set” of redistributable fixtures to drop into StellaOps' deterministic reachability benchmark. Here's a quick breakdown of the main external benchmark sources that map closely to your list, and what you'd get from integrating them.


🔎 Key benchmark suites & what they bring

SV-COMP (C reachability / verification tasks)

  • SV-COMP is the canonical benchmark for software verification tools, providing curated C programs + reachability specs (safety, mem-safety, overflow, termination). (SV-COMP)
  • Tasks are published via tagged git releases (e.g., svcomp25), so you can reproducibly fetch exact sources and property definitions.
  • These tasks give deterministic ground truth about reachability/unreachability.

Why it matters for StellaOps: plug in real-world-like C code + formal specs.

OSS-Fuzz (reproducer corpus for fuzzed bugs)

  • Each OSS-Fuzz issue includes a deterministic reproducer file. (Google GitHub)
  • Replaying the reproducer yields the same crash if the binary & sanitizers are consistent.
  • Public corpora can become golden fixtures.

Why it matters: covers real-world bugs beyond static properties.


Not perfectly covered (Tier-2 options)

  • Juliet / OWASP / Java/Python suites lack a single authoritative, stable distribution.
  • Package snapshots (Debian/Alpine) need manual CVE + harness mapping.
  • Curated container images (Vulhub) require licensing vetting and orchestration.
  • Call-graph corpora (NYXCorpus, SWARM) have no guaranteed stable labels.

Implication: Tier-2 fixtures need frozen versions, harness engineering, and license checks.


Compact “golden set” candidate

| # | Fixture source | Ground truth |
|---|----------------|--------------|
| 1 | SV-COMP ReachSafety / MemSafety / NoOverflows | C programs + formal reachability specs (no reach_error() calls, no overflows). |
| 2 | OSS-Fuzz reproducer corpus (C/C++) | Deterministic crash inputs triggering CVEs. |
| 3 | OSS-Fuzz seed corpus | Known-safe vs bug-triggering inputs for stability comparisons. |
| 4 | Additional SV-COMP categories (Heap, Bitvector, Float) | Broader token coverage. |
| 5 | Placeholder: Debian/Alpine package snapshots | Manual metadata + CVEs; build harnesses. |
| 6 | Placeholder: Java/Python OWASP/Juliet-inspired fixtures | Dynamic languages coverage; custom instrumentation required. |
| 7 | Placeholder: Curated vulnerable container images (Vulhub) | Real-world deployment exposures. |
| 8 | Placeholder: Call-graph corpora (NYXCorpus/SWARM) | Dataflow reachability under complex graphs; requires tooling. |

Start with SV-COMP + OSS-Fuzz as Tier-1; add Tier-2 once harnesses & snapshots are ready.


⚠️ Watch-outs for Tier-2

  • Corpus maintenance (OSS-Fuzz seeds can change) requires freezing sources.
  • Label ambiguity: container/Java fixtures must have precise success/failure definitions.
  • Licensing/distribution for some corpora/containers must be vetted.
  • Call-graph/dynamic cases demand instrumentation to produce deterministic labels.

🎯 Recommendation

Adopt a “Tier-1 Cadre” (SV-COMP + OSS-Fuzz) for deterministic ground truth. Expand Tier-2 (packages, Java/Python, containers, call graphs) as separate projects with frozen fixtures, metadata (CWE/CVE, PURLs, CycloneDX), and verified harnesses.

Want me to draft an 8-12 fixture “starter pack” (paths, metadata, PURLs) you can copy into StellaOps? Let me know.

Good, let's design this as a real “v0.1 Golden Set” spec your team can implement, not just a loose starter pack.

I'll give you:

  1. Design goals and constraints
  2. Repository layout for the benchmark
  3. Fixture metadata schema (YAML)
  4. A concrete Core-10 Golden Fixture set (each with purpose & ground truth)
  5. How to wire this into StellaOps (Scanner/Sbomer/Vexer/UnknownsRegistry/Authority)
  6. A short implementation plan for your team

1. Design goals & constraints

Non-negotiables:

  • Deterministic: same input → same graph, same verdicts, same logs; no network, no time, no randomness.

  • Compact: ~10 tiny fixtures, each buildable and runnable in seconds, small SBOMs.

  • Redistributable: avoid licensing traps by:

    • Preferring synthetic code owned by you (MIT/BSD-style).
    • For “realistic” CVEs, use fake local IDs (e.g. FAKE-CVE-2025-0001) and local feeds, not NVD data.
  • Complete chain: every fixture ships with:

    • Source + build recipe
    • Binary/container
    • CycloneDX SBOM
    • Local vulnerability feed entries (OSV-like or your own schema)
    • Reference VEX document (OpenVEX or CycloneDX VEX)
    • Expected graph revision ID + reachability verdicts
  • Coverage of patterns:

    • Safe vs unsafe variants (Not Affected vs Affected)
    • Intra-procedural, inter-procedural, transitive dep
    • OS package vs app-level deps
    • Multi-language (C, .NET, Java, Python)
    • Containerized vs bare-metal

2. Scope of v0: Core-10 Golden Fixtures

Two tiers, but all small:

  • Core-10 (what you ship and depend on for regression):

    • 5 native/C fixtures (classic reachability & library shape)
    • 1 .NET fixture
    • 1 Java fixture
    • 1 Python fixture
    • 2 container fixtures (OS package style)
  • Extended-X (optional later): fuzz-style repros, dynamic imports, concurrency, etc.

Below I'll detail the Core-10 so your team can actually implement them.


3. Repository layout

Recommended layout inside stella-ops mono-repo:

benchmarks/
  reachability/
    golden-v0/
      fixtures/
        C-REACH-UNSAFE-001/
        C-REACH-SAFE-002/
        C-MEM-BOUNDS-003/
        C-INT-OVERFLOW-004/
        C-LIB-TRANSITIVE-005/
        CONTAINER-OSPKG-SAFE-006/
        CONTAINER-OSPKG-UNSAFE-007/
        JAVA-HTTP-UNSAFE-008/
        DOTNET-LIB-PAIR-009/
        PYTHON-IMPORT-UNSAFE-010/
      feeds/
        osv-golden.json          # “FAKE-CVE-*” style vulnerabilities
      authority/
        vex-reference-index.json # mapping fixture → reference VEX + expected verdict
      README.md

Each fixture folder:

fixtures/<ID>/
  src/                 # source code (C, C#, Java, Python, Dockerfile...)
  build/
    Dockerfile         # if containerised
    build.sh           # deterministic build commands
  artifacts/
    binary/            # final exe/jar/dll/image.tar
    sbom.cdx.json      # CycloneDX SBOM (canonical, normalized)
    vex.openvex.json   # reference VEX verdicts
    manifest.fixture.yaml
    graph.reference.json        # canonical normalized graph
    graph.reference.sha256.txt  # hash = "Graph Revision ID"
  docs/
    explanation.md     # human-readable root-cause + reachability explanation
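A layout like this is easy to check mechanically before any scanner runs. The following is a minimal sketch, assuming the file names shown in the tree above; the helper name `missing_files` and the choice of which files count as mandatory are illustrative assumptions (for example, `build/Dockerfile` is left out because it only applies to containerised fixtures).

```python
from pathlib import Path

# Files every fixture is assumed to ship, relative to fixtures/<ID>/.
# The list mirrors the tree above; treating all of them as mandatory is an assumption.
REQUIRED_FILES = [
    "build/build.sh",
    "artifacts/sbom.cdx.json",
    "artifacts/vex.openvex.json",
    "artifacts/manifest.fixture.yaml",
    "artifacts/graph.reference.json",
    "artifacts/graph.reference.sha256.txt",
    "docs/explanation.md",
]


def missing_files(fixture_dir):
    """Return the required files that are absent, in a stable sorted order."""
    root = Path(fixture_dir)
    return sorted(rel for rel in REQUIRED_FILES if not (root / rel).is_file())
```

An empty list means the fixture folder is structurally complete; anything else can be reported per fixture ID before the heavier build and scan steps run.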

4. Fixture metadata schema (manifest.fixture.yaml)

id: "C-REACH-UNSAFE-001"
name: "Simple reachable error in C"
version: "0.1.0"
language: "c"
domain: "native"
category:
  - "reachability"
  - "control-flow"
ground_truth:
  vulnerable: true
  reachable: true
  cwes: ["CWE-754"]            # Improper Check for Unusual or Exceptional Conditions
  fake_cves: ["FAKE-CVE-2025-0001"]
  verdict:
    property_type: "reach_error_unreachable"
    property_holds: false
    explanation_ref: "docs/explanation.md"

build:
  type: "local"
  environment: "debian:12"
  commands:
    - "gcc -O0 -g -o app main.c"
  outputs:
    binary: "artifacts/binary/app"

run:
  command: "./artifacts/binary/app"
  args: ["42"]         # fixed input that drives execution into reach_error()
  env: {}
  stdin: ""
  expected:
    exit_code: 1
    stdout_contains: ["REACH_ERROR"]
    stderr_contains: []
    # Optional coverage or traces you might add later
    coverage_file: null

sbom:
  path: "artifacts/sbom.cdx.json"
  format: "cyclonedx-1.5"

vex:
  path: "artifacts/vex.openvex.json"
  format: "openvex-0.2"
  statement_ids:
    - "vex-statement-1"

graph:
  reference_path: "artifacts/graph.reference.json"
  revision_id_sha256_path: "artifacts/graph.reference.sha256.txt"

stellaops_tags:
  difficulty: "easy"
  focus:
    - "control-flow"
    - "single-binary"
  used_by:
    - "scanner.webservice"
    - "sbomer"
    - "vexer"
    - "excititor"

Your team can add more, but this is enough to wire the benchmark into the pipeline.
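One thing worth guarding against is a manifest whose ground-truth fields contradict each other. The sketch below works on an already-parsed manifest (a plain dict, so no YAML library is assumed) and checks two things: that the top-level keys from the schema above are present, and that for the `reach_error_unreachable` property, `property_holds` is the negation of `reachable`. The function name `check_manifest` and the exact set of required keys are assumptions for illustration.

```python
# Top-level keys taken from the example manifest; which ones are mandatory is an assumption.
REQUIRED_KEYS = ("id", "version", "language", "ground_truth",
                 "build", "run", "sbom", "vex", "graph")


def check_manifest(manifest):
    """Return a list of problems in a parsed manifest dict (empty list means OK)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in manifest]

    truth = manifest.get("ground_truth", {})
    verdict = truth.get("verdict", {})
    holds = verdict.get("property_holds")
    reachable = truth.get("reachable")

    # "reach_error is unreachable" holds exactly when the sink is NOT reachable.
    if (verdict.get("property_type") == "reach_error_unreachable"
            and isinstance(holds, bool) and isinstance(reachable, bool)
            and holds == reachable):
        problems.append("property_holds must be the negation of reachable")
    return problems
```

For C-REACH-UNSAFE-001 (`reachable: true`, `property_holds: false`) this returns no problems; flipping only one of the two fields is reported.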


5. Core-10 Golden Fixtures (concrete proposal)

I'll describe each with: purpose, pattern, what the code does, and ground truth.

5.1 C-REACH-UNSAFE-001: basic reachable error

  • Purpose: baseline reachability detection on a tiny C program.

  • Pattern: single main, simple branch, error sink function reach_error().

  • Code shape:

    • main(int argc, char** argv) parses integer x.
    • If x == 42, it calls reach_error(), which prints REACH_ERROR and exits 1.
  • Ground truth:

    • Vulnerable: true.
    • Reachable: true if run with x=42 (you fix input in run.args).
    • Property: “reach_error is unreachable” → false (counterexample exists).
  • Why it's valuable:

    • Exercises simple control flow; used as “hello world” of deterministic reachability.
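The `run.expected` block of the manifest can be enforced with a small harness. This is a sketch under stated assumptions: `check_run` is a hypothetical helper, and because no compiled fixture exists here, a Python one-liner (`STAND_IN`) mimics the behaviour described above (print REACH_ERROR and exit 1 when x == 42).

```python
import subprocess
import sys


def check_run(command, args, expected, stdin=""):
    """Run one scenario and return a list of mismatches against `expected`."""
    proc = subprocess.run([command, *args], input=stdin,
                          capture_output=True, text=True, timeout=30)
    mismatches = []
    if proc.returncode != expected["exit_code"]:
        mismatches.append(
            f"exit_code: got {proc.returncode}, want {expected['exit_code']}")
    for needle in expected.get("stdout_contains", []):
        if needle not in proc.stdout:
            mismatches.append(f"stdout missing {needle!r}")
    for needle in expected.get("stderr_contains", []):
        if needle not in proc.stderr:
            mismatches.append(f"stderr missing {needle!r}")
    return mismatches


# Stand-in for the compiled C binary (assumption, for illustration only).
STAND_IN = ("import sys; x = int(sys.argv[1]); "
            "print('REACH_ERROR') if x == 42 else None; "
            "sys.exit(1 if x == 42 else 0)")

EXPECTED = {"exit_code": 1, "stdout_contains": ["REACH_ERROR"], "stderr_contains": []}
result = check_run(sys.executable, ["-c", STAND_IN, "42"], EXPECTED)
```

With the triggering input the mismatch list is empty; with any other integer both the exit code and the stdout check fail, which is the behaviour the safe twin in 5.2 should exhibit.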

5.2 C-REACH-SAFE-002: safe twin of 001

  • Purpose: same SBOM shape, but no reachable error, to test “Not Affected”.

  • Pattern: identical to 001 but with an added guard.

  • Code shape:

    • For example, check x != 42 or remove path to reach_error().
  • Ground truth:

    • Vulnerable: false (no call to reach_error at all) or treat it as “patched”.
    • Reachable: false.
    • Property “reach_error is unreachable” → true.
  • Why:

    • Used to verify that graph revision is different and VEX becomes “Not Affected” for the same fake CVE (if you model it that way).

5.3 C-MEM-BOUNDS-003: out-of-bounds write

  • Purpose: exercise memory-safety property and CWE mapping.

  • Pattern: fixed-size buffer + unchecked copy.

  • Code shape:

    • char buf[16];
    • Copies argv[1] into buf with strcpy or manual loop without bounds check.
  • Ground truth:

    • Vulnerable: true.
    • Reachable: true on any input with length > 15 (you fix a triggering arg).
    • CWEs: ["CWE-119", "CWE-120"].
  • Expected run:

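    • With ASAN or similar, exit non-zero and report a stack buffer overflow (buf lives on the stack, not the heap). For determinism, pin one expected exit code for the chosen build configuration and do not rely on sanitizer text in tests. Note that 139 is what a shell reports for an unsanitized SIGSEGV (128 + 11), whereas ASAN exits with status 1 by default.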

5.4 C-INT-OVERFLOW-004: integer overflow

  • Purpose: test handling of arithmetic / overflow-related vulnerabilities.

  • Pattern: multiplication or addition with insufficient bounds checking.

  • Code shape:

    • Function size_t alloc_size(size_t n) that does n * 16 without overflow checks, then allocates and writes.
  • Ground truth:

    • Vulnerable: true.
    • Reachable: true with crafted large n.
    • CWEs: ["CWE-190", "CWE-680"].
  • Why:

    • Lets you validate that your vulnerability feed (fake CVE) asserts “affected” on this component, and your reachability engine confirms the path.
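To pick the “crafted large n” deterministically, it helps to work the arithmetic out once. The sketch below models `n * 16` in unsigned fixed-width arithmetic; the 64-bit `size_t` width is an assumption about the build target, and the Python function only simulates what the C multiplication would do.

```python
SIZE_T_BITS = 64  # assumption: the fixture is built for a 64-bit target


def alloc_size(n, bits=SIZE_T_BITS):
    """Model `n * 16` as unsigned fixed-width arithmetic (wraps modulo 2**bits)."""
    return (n * 16) % (1 << bits)


def overflows(n, bits=SIZE_T_BITS):
    """True when the mathematically correct product does not fit in `bits` bits."""
    return n * 16 >= (1 << bits)


# Smallest n for which n * 16 no longer fits in 64 bits: 2**64 / 16 = 2**60.
smallest_bad_n = (1 << SIZE_T_BITS) // 16
```

At n = 2**60 the modelled allocation size wraps to 0, and at n = 2**60 + 1 it wraps to 16, so a later write sized by n runs far past the buffer. Either value is a candidate for the fixed `run.args` input.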

5.5 C-LIB-TRANSITIVE-005: vulnerable library, unreachable in app

  • Purpose: test the core SBOM→VEX story: component is vulnerable, but not used.

  • Pattern:

    • libvuln.a with function void do_unsafe(char* input) containing the same OOB bug as 003.
    • app.c links to libvuln but never calls do_unsafe().
  • Code shape:

    • Build static library from libvuln.c.
    • Build app that uses only do_safe() from libvuln.c or that just links but doesn't call anything from the “unsafe” TU.
  • SBOM:

    • SBOM lists component: "pkg:generic/libvuln@1.0.0" with fake_cves: ["FAKE-CVE-2025-0003"].
  • Ground truth:

    • Vulnerable component present in SBOM: yes.
    • Reachable vulnerable function: no.
    • Correct VEX: “Not Affected: vulnerable code not in execution path for this product”.
  • Why:

    • Canonical demonstration of correct VEX semantics on real-world pattern: vulnerable lib, harmless usage.
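The ground truth for this fixture boils down to a graph question: is do_unsafe() reachable from the entrypoint? A minimal sketch follows, assuming a call graph represented as an adjacency dict. The graph contents are hypothetical and hand-written to match the description above; a real graph would come from the scanner.

```python
from collections import deque


def reachable_from(call_graph, entry):
    """Breadth-first search over call edges; returns every function reachable from `entry`."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        # Sorted iteration keeps traversal order deterministic.
        for callee in sorted(call_graph.get(fn, ())):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph for C-LIB-TRANSITIVE-005: do_unsafe exists in libvuln,
# but nothing on a path from main calls it.
CALL_GRAPH = {
    "main": ["do_safe"],
    "do_safe": [],
    "do_unsafe": ["strcpy"],
}

reached = reachable_from(CALL_GRAPH, "main")
```

Here `reached` contains main and do_safe but not do_unsafe, which is the condition that justifies the “Not Affected” verdict even though libvuln appears in the SBOM.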

5.6 CONTAINER-OSPKG-SAFE-006: OS package, unused binary

  • Purpose: simulate vulnerable OS package installed but unused, to test image scanning vs reachability.

  • Pattern:

    • Minimal container (e.g. debian:12-slim or alpine:3.x) with installed package vuln-tool that is never invoked by the entrypoint.
    • Your app is a trivial “hello” binary.
  • SBOM:

    • OS-level components include pkg:generic/vuln-tool@1.0.0.
  • Ground truth:

    • Vulnerable: the package is flagged by local feed.
    • Reachable: false under the specified CMD and test scenario.
    • VEX: “Not Affected: vulnerable code present but not invoked in the product's operational context.”
  • Why:

    • Tests that StellaOps does not over-report image-level CVEs when nothing in the product's execution profile uses them.

5.7 CONTAINER-OSPKG-UNSAFE-007: OS package actually used

  • Purpose: same as 006 but positive case: vulnerability is reachable.

  • Pattern:

    • Same base image and package.
    • Entrypoint script calls vuln-tool with crafted input that triggers the bug.
  • Ground truth:

    • Vulnerable: true.
    • Reachable: true.
    • This should flip the VEX verdict vs 006.
  • Why:

    • Verifies that your reachability engine + runtime behaviour correctly distinguish “installed but unused” from “installed and actively exploited.”
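The 006/007 pair differs only in the reachability bit, so the expected verdict flip can be written down as a tiny function. This is a sketch, not the Vexer's actual logic: the dict is a simplified stand-in rather than a complete OpenVEX statement, `vex_verdict` is a hypothetical name, and FAKE-CVE-2025-9999 is a placeholder ID chosen for illustration. The status and justification strings are the labels OpenVEX defines.

```python
def vex_verdict(vuln_id, fixture_id, reachable):
    """Map a reachability result to a simplified VEX-style statement."""
    statement = {"vulnerability": vuln_id, "product": fixture_id}
    if reachable:
        statement["status"] = "affected"
    else:
        statement["status"] = "not_affected"
        # OpenVEX justification label for "present but never executed".
        statement["justification"] = "vulnerable_code_not_in_execute_path"
    return statement


safe = vex_verdict("FAKE-CVE-2025-9999", "CONTAINER-OSPKG-SAFE-006", reachable=False)
unsafe = vex_verdict("FAKE-CVE-2025-9999", "CONTAINER-OSPKG-UNSAFE-007", reachable=True)
```

The two statements share the same vulnerability ID and differ in status, which is the property the benchmark should assert for this pair.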

5.8 JAVA-HTTP-UNSAFE-008: vulnerable route in minimal Java service

  • Purpose: test JVM + HTTP + transitive dep reachability.

  • Pattern:

    • Small Spring Boot or JAX-RS service with:

      • /safe endpoint using only safe methods.
      • /unsafe endpoint calling a method in vuln-lib that has a simple bug (e.g. path traversal or unsafe deserialization).
  • SBOM:

    • Component pkg:maven/org.stellaops/vuln-lib@1.0.0 linked to FAKE-CVE-2025-0004.
  • Ground truth:

    • For an HTTP call to /unsafe, vulnerability reachable.
    • For /safe, not reachable.
  • Benchmark convention:

    • Fixture defines run.unsafe and run.safe commands in manifest (two separate “scenarios” under one fixture ID, or two sub-cases in manifest.fixture.yaml).
  • Why:

    • Exercises language-level dependency resolution, transitive calls, and HTTP entrypoints.

5.9 DOTNET-LIB-PAIR-009: .NET assembly with safe & unsafe variants

  • Purpose: cover your home turf: .NET 10 / C# pipeline + SBOM & VEX.

  • Pattern:

    • Golden.Banking.Core library with method:

      • public void Process(string iban) → suspicious string parsing / regex or overflow.
    • Two apps:

      • Golden.Banking.App.Unsafe that calls Process() with unsafe behaviour.
      • Golden.Banking.App.Safe that never calls Process() or uses a safe wrapper.
  • SBOM:

    • Component pkg:nuget/Golden.Banking.Core@1.0.0 tied to FAKE-CVE-2025-0005.
  • Ground truth:

    • For App.Unsafe, vulnerability reachable.
    • For App.Safe, not reachable.
  • Why:

    • Validates your .NET tooling (Sbomer, scanner.webservice) and that your graphs respect assembly boundaries and call sites.

5.10 PYTHON-IMPORT-UNSAFE-010: Python optional import pattern

  • Purpose: basic coverage for dynamic / interpreted language with optional module.

  • Pattern:

    • app.py:

      • Imports helper which conditionally imports vuln_mod when ENABLE_VULN=1.
      • When enabled, calling /unsafe function triggers, e.g., eval(user_input).
  • SBOM:

    • Component pkg:pypi/vuln-mod@1.0.0 linked to FAKE-CVE-2025-0006.
  • Ground truth:

    • With ENABLE_VULN=0, vulnerable module not imported → unreachable.
    • With ENABLE_VULN=1, reachable.
  • Why:

    • Simple but realistic test for environment-dependent reachability and Python support.
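The helper at the heart of this fixture could look roughly like the sketch below. It assumes the gate is exactly the string comparison ENABLE_VULN == "1"; the function name `load_optional` and the injectable `env` parameter (added so the behaviour can be tested without touching the process environment) are illustrative choices, not part of the original description.

```python
import importlib
import os


def load_optional(module_name="vuln_mod", env=None):
    """Import `module_name` only when ENABLE_VULN=1; otherwise return None.

    `env` defaults to the process environment; passing a dict keeps tests hermetic.
    """
    env = os.environ if env is None else env
    if env.get("ENABLE_VULN") == "1":
        # Dynamic import: a static scanner sees no plain `import vuln_mod` statement.
        return importlib.import_module(module_name)
    return None
```

Because `vuln_mod` itself is not defined here, the behaviour can be exercised with any importable module name, for example `load_optional("json", {"ENABLE_VULN": "1"})` versus `load_optional("json", {"ENABLE_VULN": "0"})`. The dynamic import is also what makes this fixture interesting: reachability depends on the environment, not only on the source text.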

6. Local vulnerability feed for the Golden Set

To keep everything sovereign and deterministic, define a small internal OSV-like JSON feed, e.g. benchmarks/reachability/golden-v0/feeds/osv-golden.json:

{
  "vulnerabilities": [
    {
      "id": "FAKE-CVE-2025-0001",
      "summary": "Reachable error in sample C program",
      "aliases": [],
      "affected": [
        {
          "package": {
            "ecosystem": "generic",
            "name": "C-REACH-UNSAFE-001"
          },
          "ranges": [{ "type": "SEMVER", "events": [{ "introduced": "0" }] }]
        }
      ],
      "database_specific": {
        "stellaops_fixture_id": "C-REACH-UNSAFE-001"
      }
    }
    // ... more FAKE-CVE defs ...
  ]
}

Scanner/Feedser in “golden mode” should:

  • Use only this feed.
  • Produce deterministic, closed-world graphs and VEX decisions.
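A closed-world lookup against this feed can be very small. The sketch below assumes the feed shape shown above and matches SBOM PURLs against `affected[].package` by treating the feed's `ecosystem` as the PURL type. That mapping, the simplified `purl_key` parser (no namespaces, qualifiers, or version-range evaluation), and the libvuln entry in `FEED_JSON` are assumptions for illustration.

```python
import json

# One feed entry in the shape shown above; FAKE-CVE-2025-0003 is the ID used for libvuln in 5.5.
FEED_JSON = """
{
  "vulnerabilities": [
    {
      "id": "FAKE-CVE-2025-0003",
      "affected": [
        {"package": {"ecosystem": "generic", "name": "libvuln"},
         "ranges": [{"type": "SEMVER", "events": [{"introduced": "0"}]}]}
      ]
    }
  ]
}
"""


def purl_key(purl):
    """Simplified parse of 'pkg:type/name@version' into (type, name). Not a full PURL parser."""
    body = purl[4:] if purl.startswith("pkg:") else purl
    ptype, _, rest = body.partition("/")
    return ptype, rest.split("@", 1)[0]


def match_feed(feed, purls):
    """Return sorted (purl, vulnerability id) pairs for every feed entry naming a component."""
    findings = []
    for vuln in feed["vulnerabilities"]:
        for affected in vuln.get("affected", []):
            pkg = affected["package"]
            for purl in purls:
                if purl_key(purl) == (pkg["ecosystem"], pkg["name"]):
                    findings.append((purl, vuln["id"]))
    return sorted(findings)


findings = match_feed(json.loads(FEED_JSON),
                      ["pkg:generic/libvuln@1.0.0", "pkg:generic/hello@1.0.0"])
```

Sorting the findings keeps the output stable across runs, which matters more here than lookup speed given the size of the feed.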

7. Integration hooks with StellaOps

Make sure each module has a clear use of the golden set:

  • Scanner.Webservice

    • Input: SBOM + local feed for a fixture.
    • Output: canonical graph JSON and SHA-256 revision.
    • For each fixture, compare produced revision_id against graph.reference.sha256.txt.
  • Sbomer

    • Rebuilds SBOM from source/binaries and compares it to artifacts/sbom.cdx.json.
    • Fails test if SBOMs differ in canonicalized form.
  • Vexer / Excititor

    • Ingests graph + local feed and produces VEX.
    • Compare resulting VEX to artifacts/vex.openvex.json.
  • UnknownsRegistry

    • For v0 you can keep unknowns minimal, but:

      • At least one fixture (e.g. Python or container) can contain a “deliberately un-PURL-able” file to confirm it enters Unknowns with expected half-life.
  • Authority

    • Signs the reference artifacts:

      • SBOM
      • Graph
      • VEX
    • Ensures deterministic attestation for the golden set (you can later publish these as public reference proofs).
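Several of these hooks rely on the same primitive: serialize a JSON document canonically, then hash it. A minimal sketch, assuming the canonical form is “sorted keys, compact separators, UTF-8”; the real canonicalization rules for StellaOps graphs are not specified here, and the function names are hypothetical.

```python
import hashlib
import json


def canonical_bytes(doc):
    """Serialize with sorted keys and fixed separators so key order never affects the bytes."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def revision_id(doc):
    """SHA-256 hex digest of the canonical form, usable as a graph revision ID."""
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()


# Same content, different key order: the IDs must match.
a = revision_id({"nodes": ["main", "reach_error"], "edges": [["main", "reach_error"]]})
b = revision_id({"edges": [["main", "reach_error"]], "nodes": ["main", "reach_error"]})
```

Note that `sort_keys` only normalizes object keys. List order (nodes, edges) still affects the hash, so the producer has to emit lists in a defined order before hashing.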


8. Implementation plan for your team

You can drop this straight into a ticket or doc.

  1. Scaffold repo structure

    • Create benchmarks/reachability/golden-v0/... layout as above.
    • Add a top-level README.md describing goals and usage.
  2. Implement the 10 fixtures

    • Each fixture: write minimal code, build scripts, and manifest.fixture.yaml.
    • Keep code tiny (1-3 files) and deterministic (no network, no randomness, no wall time).
  3. Generate SBOMs

    • Use your Sbomer for each artifact.
    • Normalize / canonicalize SBOMs and commit them as artifacts/sbom.cdx.json.
  4. Define FAKE-CVE feed

    • Create feeds/osv-golden.json with 1-2 entries per fixture.
    • Map each entry to PURLs used in SBOMs.
  5. Produce reference graphs

    • Run Scanner in “golden mode” on each fixture's SBOM + feed.
    • Normalize graphs (sorted JSON, deterministic formatting).
    • Compute SHA-256 → store in graph.reference.sha256.txt.
  6. Produce reference VEX documents

    • Run Vexer / Excititor with graphs + feed.
    • Manually review results, edit as needed.
    • Save final accepted VEX as artifacts/vex.openvex.json.
  7. Write explanations

    • For each fixture, add docs/explanation.md:

      • 5-10 lines explaining root cause, path, and why affected / not affected.
  8. Wire into CI

    • Add a GoldenReachabilityTests job that:

      • Builds all fixtures.
      • Regenerates SBOM, graph, and VEX.
      • Compares against reference artifacts.
    • Fail CI if any fixture drifts.

  9. Expose as a developer command

    • Add a CLI command, e.g.:

      • stellaops bench reachability --fixture C-REACH-UNSAFE-001
    • So developers can locally re-run single fixtures during development.
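The comparison in step 8 could be sketched as follows, assuming both the reference and the regenerated artifacts have already been loaded into nested dicts keyed by fixture ID and artifact name. The function `find_drift` and that in-memory shape are assumptions; a real job would read the files from each fixture's artifacts/ folder.

```python
import json


def canonical(doc):
    """Canonical text form: sorted keys, compact separators."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))


def find_drift(reference, regenerated):
    """Return sorted (fixture_id, artifact_name) pairs that are missing or differ.

    Both arguments map fixture_id -> {artifact_name: parsed JSON document}.
    """
    drift = []
    for fixture_id, artifacts in reference.items():
        produced = regenerated.get(fixture_id, {})
        for name, ref_doc in artifacts.items():
            if name not in produced or canonical(produced[name]) != canonical(ref_doc):
                drift.append((fixture_id, name))
    return sorted(drift)


reference = {"C-REACH-UNSAFE-001": {"sbom": {"components": []}, "vex": {"status": "affected"}}}
regenerated = {"C-REACH-UNSAFE-001": {"sbom": {"components": []}, "vex": {"status": "not_affected"}}}
drifted = find_drift(reference, regenerated)
```

A non-empty result is what should fail the CI job, and the (fixture, artifact) pairs tell the developer which single fixture to re-run locally.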


If you want, next step I can:

  • Take 2-3 of these fixtures (for example C-REACH-UNSAFE-001, C-LIB-TRANSITIVE-005, and DOTNET-LIB-PAIR-009) and draft actual code sketches + full manifest.fixture.yaml so your devs can literally copy-paste and start implementing.