git.stella-ops.org/docs/product-advisories/archived/23-Nov-2025 - Publishing a Reachability Benchmark Dataset.md
2025-11-23 23:44:35 +02:00


Here's a crisp plan to publish a small, public “vulnerable binaries” dataset (PHP, JS, C#) and a way to compare reachability results across tools, so you can ship something useful fast, gather feedback, and iterate.


Scope (MVP)

  • Languages: PHP (composer), JavaScript (npm), C# (.NET).

  • Artifacts per sample:

    1) minimal app, 2) lockfile, 3) SBOM (CycloneDX JSON), 4) VEX (OSV/CycloneDX VEX), 5) ground-truth reachability notes, 6) scriptable repro (Docker).
  • Size: 3 to 5 samples per language (9 to 15 total). Keep each sample ≤200 LOC.


Repo layout

vuln-reach-dataset/
  LICENSE
  README.md
  schema/
    ground-truth.schema.json
    run-matrix.schema.json
  runners/
    run_all.sh
    run_all.ps1
  results/
    <tool>/<lang>/<sample>/run.json
  samples/
    php/...
    js/...
    csharp/...

“Ground truth” format (minimal)

{
  "sample_id": "php-001-phar-deserialize",
  "lang": "php",
  "package_manager": "composer",
  "vuln_ids": ["CVE-2019-XXXX","OSV:GHSA-..."],
  "entrypoints": ["public/index.php"],
  "reachable_symbols": [
    {"purl":"pkg:composer/vendor/package@1.2.3","symbol":"Vendor\\Unsafe::unserialize"},
    {"purl":"pkg:composer/monolog/monolog@2.9.0","symbol":"Monolog\\Logger::pushHandler","note":"benign"}
  ],
  "evidence": [
    {"type":"path","file":"public/index.php","line":18,"desc":"tainted input -> unserialize"},
    {"type":"exec","cmd":"curl 'http://localhost/?p=O:...'", "result":"triggered sink"}
  ]
}

Samples to include (suggested)

PHP (composer)

  1. php-001-phar-deserialize

    • Risk: unsafe unserialize() on user input; optional PHAR gadget.
    • Ground truth: reachable sink unserialize.
  2. php-002-xxe-simplexml

    • Risk: XML external entity in simplexml_load_string with libxml options off.
    • Ground truth: reachable XXE sink.
  3. php-003-ssrf-guzzle

    • Risk: user-controlled URL into Guzzle client.
    • Ground truth: SSRF call chain to Client::request.

JavaScript (npm)

  1. js-001-prototype-pollution

    • Risk: lodash.merge (known vulns historically) with user object.
    • Ground truth: polluted {__proto__} path reaches object creation site.
  2. js-002-yaml-unsafe-load

    • Risk: js-yaml load on untrusted text.
    • Ground truth: call to load reachable from HTTP route.
  3. js-003-ssrf-node-fetch

    • Risk: user URL to node-fetch.
    • Ground truth: request issued to attacker-controlled host.

C# (.NET)

  1. cs-001-binaryformatter-deserialize

    • Risk: BinaryFormatter.Deserialize on user input (legacy).
    • Ground truth: reachable call to Deserialize.
  2. cs-002-processstartinfo-injection

    • Risk: Process.Start with unsanitized arg (Windows/Linux).
    • Ground truth: taint to Process.Start.
  3. cs-003-xmlreader-xxe

    • Risk: insecure XmlReader settings (DtdProcessing = Parse).
    • Ground truth: external entity resolved.

Each sample should:

  • Pin a known vulnerable version in lockfile.
  • Provide a positive (reachable) and negative (not reachable) path.
  • Include a tiny HTTP entrypoint to exercise the path.

SBOM & VEX per sample

  • CycloneDX 1.6 JSON SBOM produced via native tool (composer, npm, dotnet) + converter.
  • VEX: one document stating the vulnerability is exploitable (CycloneDX analysis.state = exploitable) for the positive path, and not_affected for the negative path with a justification (e.g., code_not_reachable, i.e. “vulnerable code not invoked”).
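
The negative-path entry is the one most easily gotten wrong, so here is a minimal sketch of building both kinds of entry. It assumes the CycloneDX `analysis.state` values `exploitable` and `not_affected` and the `code_not_reachable` justification; the `vex_entry` helper and the placeholder ID are hypothetical.

```python
import json


def vex_entry(vuln_id: str, purl: str, reachable: bool, detail: str) -> dict:
    """Build one entry for the `vulnerabilities` array of vex.cdx.json.

    Hypothetical helper: field names follow the VEX examples in this
    document; state/justification values are CycloneDX enum values.
    """
    if reachable:
        analysis = {"state": "exploitable", "detail": detail}
    else:
        analysis = {
            "state": "not_affected",
            "justification": "code_not_reachable",
            "detail": detail,
        }
    return {"id": vuln_id, "affects": [{"ref": purl}], "analysis": analysis}


negative = vex_entry(
    "OSV:PLACEHOLDER-js-yaml-unsafe-load",  # placeholder ID, not a real advisory
    "pkg:npm/js-yaml@4.1.0",
    reachable=False,
    detail="Guarded route never calls the vulnerable function.",
)
print(json.dumps(negative, indent=2))
```

Keeping the justification only on the not_affected branch mirrors how the field is meant to be used: it explains why a vulnerability does not apply.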

Runner & result format (tool-agnostic)

  • Runners call each selected tool (e.g., “ToolA”, “ToolB”), then normalize outputs to:
{
  "tool": "ToolA",
  "version": "x.y.z",
  "sample_id": "js-002-yaml-unsafe-load",
  "detected_vulns": ["OSV:GHSA-..."],
  "reachable_symbols_reported": [
    {"purl":"pkg:npm/js-yaml@4.1.0","symbol":"load"}
  ],
  "verdict": {
    "reachable": true,
    "confidence": 0.92
  },
  "raw": "path/to/original/tool/output.json"
}

Comparison metrics

For each sample:

  • TP (tool says reachable & ground truth reachable)
  • FP (tool says reachable but ground truth not reachable)
  • FN (tool says not reachable but ground truth reachable)
  • TN (tool says not reachable & ground truth not reachable)

Aggregate per language & tool:

  • Precision, recall, F1, and Reachability Accuracy = (TP+TN)/All.
  • Optional: Path depth agreement (did tool cite the expected symbol/edge?).
  • Optional: Time-to-result (seconds) and scan mode (static, dynamic, hybrid).
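
As a rough sketch, the per-sample outcomes and the aggregate metrics could be computed as follows. It assumes each scored case is a (tool verdict, ground-truth verdict) pair of booleans, for example one pair for a sample's positive path and one for its negative path; the function names are illustrative, and reporting 0.0 for an undefined ratio is a convention chosen here, not something the plan specifies.

```python
from typing import Dict, Iterable, Tuple


def confusion(pairs: Iterable[Tuple[bool, bool]]) -> Dict[str, int]:
    """Count TP/FP/FN/TN from (tool_says_reachable, truly_reachable) pairs."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for predicted, actual in pairs:
        if predicted:
            counts["TP" if actual else "FP"] += 1
        else:
            counts["FN" if actual else "TN"] += 1
    return counts


def metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Precision, recall, F1, and reachability accuracy = (TP+TN)/All."""

    def ratio(num: float, den: float) -> float:
        # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(c["TP"], c["TP"] + c["FP"])
    recall = ratio(c["TP"], c["TP"] + c["FN"])
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(c["TP"] + c["TN"], sum(c.values())),
    }


# Five scored cases for one tool and one language.
cases = [(True, True), (True, False), (False, True), (False, False), (True, True)]
print(metrics(confusion(cases)))
```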

Minimal example (JS) — samples/js/js-002-yaml-unsafe-load

package.json
package-lock.json
server.js            # express route POST /parse -> js-yaml load(body.text)
README.md
sbom.cdx.json
vex.cdx.json
repro.sh             # npm ci; node server.js; curl -XPOST ...
GROUND_TRUTH.json
  • Positive path: POST {"text":"a: &a 1\nb: *a"} to exercise parser.
  • Negative path: guarded route that rejects user input unless whitelisted.

Publishing checklist

  • License: CC BY 4.0 (dataset) + MIT (runners).

  • Data hygiene: no real secrets; deterministic scripts; pinned versions.

  • Repro: one-command docker compose up per language.

  • Docs:

    • What is “reachability”? (vulnerable code is actually callable from app inputs).
    • How we built ground truth (static review + runnable PoC).
    • How to add a new sample (template folder + PR checklist).

Fast path to first release (1 to 2 days of focused work)

  1. Ship one sample per language with full ground truth + SBOM/VEX.
  2. Include one tool runner (even a noop placeholder) and the result schema.
  3. Add a results/README with the confusion-matrix table filled for these 3 samples.
  4. Open issues inviting contributions: more samples, more tools, more sinks.
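
For step 2, a no-op placeholder runner could look roughly like the sketch below. It only assumes the run.json fields shown in this document and a GROUND_TRUTH.json in each sample folder; the tool name "noop" and the helper names are made up for illustration.

```python
import json
from pathlib import Path


def noop_run(sample_dir: Path) -> dict:
    """Return a run.json-shaped record for a 'tool' that analyses nothing."""
    truth = json.loads((sample_dir / "GROUND_TRUTH.json").read_text(encoding="utf-8"))
    return {
        "tool": "noop",  # hypothetical placeholder tool name
        "version": "0.0.0",
        "sample_id": truth["sample_id"],
        "detected_vulns": [],
        "reachable_symbols_reported": [],
        # A no-op tool never claims reachability.
        "verdict": {"reachable": False, "confidence": 0.0},
    }


def write_run(root: Path, lang: str, sample_dir: Path) -> Path:
    """Write the record to results/noop/<lang>/<sample_id>/run.json."""
    run = noop_run(sample_dir)
    out = root / "results" / run["tool"] / lang / run["sample_id"] / "run.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(run, indent=2) + "\n", encoding="utf-8")
    return out
```

A placeholder like this scores as a false negative on every positive path, which gives the first results table a useful baseline row.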

Why this helps

  • Creates a neutral, reproducible yardstick for reachability.
  • Lets vendors & researchers compare apples to apples.
  • Encourages PRs (small, self-contained samples) and early citations for StellaOps.

If you want, I can generate the repo skeleton (folders, sample stubs, JSON schemas, and runner scripts) so you can push it directly to GitHub.

Here's a “drop-in” developer guide plus concrete samples you can paste into your repo (or split into README.md / docs/DEVELOPER_GUIDE.md). I'll show:

  1. How the project is structured
  2. Very detailed example samples for PHP, JS, C#
  3. How tool authors integrate their reachability tool
  4. How contributors add new samples

You can tweak names/IDs, but everything below is self-consistent.


1. Repository structure (recap)

vuln-reach-dataset/
  README.md
  docs/
    DEVELOPER_GUIDE.md        # this file (or paste sections into README)
  schema/
    ground-truth.schema.json
    run-matrix.schema.json
  samples/
    php/
      php-001-phar-deserialize/
      php-002-xxe-simplexml/
      ...
    js/
      js-002-yaml-unsafe-load/
      ...
    csharp/
      cs-001-binaryformatter-deserialize/
      ...
  runners/
    run_all.sh
    run_all.ps1
    normalize_mytool.py       # example tool integration (output normalizer)
  results/
    mytool/
      php/php-001-phar-deserialize/run.json
      js/js-002-yaml-unsafe-load/run.json
      ...

Core idea: Each samples/<lang>/<sample_id>/ folder is:

  • A minimal runnable app containing a known vulnerability
  • A positive path (vulnerable code reachable) and (ideally) a negative path (package present but not reachable)
  • GROUND_TRUTH.json describing what is actually reachable
  • SBOM + VEX files describing vulnerabilities at the component level
  • A repro.sh script to run the app and trigger the bug

Tool authors plug in by reading each sample folder, running their scanner, and writing normalized results to results/<tool>/<lang>/<sample_id>/run.json.
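
A tool author's harness might discover samples and compute result paths along these lines. This is only a sketch of the layout described above; `iter_samples` and `result_path` are illustrative names, not part of any existing runner.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_samples(root: Path) -> Iterator[Tuple[str, str, Path]]:
    """Yield (lang, sample_id, sample_dir) for each samples/<lang>/<sample_id>/."""
    samples = root / "samples"
    # Skip stray files (e.g. a README) by keeping directories only.
    for lang_dir in sorted(p for p in samples.iterdir() if p.is_dir()):
        for sample_dir in sorted(p for p in lang_dir.iterdir() if p.is_dir()):
            yield lang_dir.name, sample_dir.name, sample_dir


def result_path(root: Path, tool: str, lang: str, sample_id: str) -> Path:
    """Where the normalized result for one (tool, sample) pair belongs."""
    return root / "results" / tool / lang / sample_id / "run.json"
```

Sorting the directories keeps run order deterministic, which fits the dataset's emphasis on reproducibility.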


2. Ground truth schema (what tools are judged against)

Minimal JSON format (you can store a full JSON Schema in schema/ground-truth.schema.json):

{
  "sample_id": "php-001-phar-deserialize",
  "lang": "php",
  "package_manager": "composer",
  "vuln_ids": [
    "OSV:PLACEHOLDER-2019-XXXX"
  ],
  "entrypoints": [
    "public/index.php"
  ],
  "reachable_symbols": [
    {
      "purl": "pkg:composer/example/vendor@1.2.3",
      "symbol": "Example\\Unsafe::unserialize",
      "kind": "sink",
      "note": "User-controlled input can reach this sink in /?mode=unsafe&data=..."
    },
    {
      "purl": "pkg:composer/example/vendor@1.2.3",
      "symbol": "Example\\Unsafe::unserialize",
      "kind": "sink",
      "note": "NOT reached in /?mode=safe (negative path)."
    }
  ],
  "evidence": [
    {
      "type": "path",
      "file": "public/index.php",
      "line": 25,
      "desc": "Tainted $_GET['data'] flows into Example\\Unsafe::unserialize"
    },
    {
      "type": "exec",
      "cmd": "curl 'http://localhost:8000/?mode=unsafe&data=...payload...'",
      "result": "Trigger behavior / exploit / exception"
    }
  ]
}

Fields are intentionally simple:

  • reachable_symbols describes what is reachable and from which package/version.
  • evidence explains why we marked it reachable (code path + repro command).
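
Until a full JSON Schema exists, a lightweight structural check along these lines may be enough to catch malformed files. Which fields are required is an assumption made here (the examples show them but never mark them mandatory), and `validate_ground_truth` is a hypothetical helper.

```python
from typing import List

# Assumption: these fields are required; the guide only shows them by example.
REQUIRED_FIELDS = {
    "sample_id": str,
    "lang": str,
    "package_manager": str,
    "vuln_ids": list,
    "entrypoints": list,
    "reachable_symbols": list,
    "evidence": list,
}


def validate_ground_truth(doc: dict) -> List[str]:
    """Return human-readable problems; an empty list means the doc passes."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    symbols = doc.get("reachable_symbols", [])
    if isinstance(symbols, list):
        for i, entry in enumerate(symbols):
            # Every symbol entry needs at least a package URL and a symbol name.
            for key in ("purl", "symbol"):
                if not isinstance(entry, dict) or key not in entry:
                    errors.append(f"reachable_symbols[{i}]: missing {key}")
    return errors
```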

3. PHP sample (php-001-phar-deserialize)

3.1 Folder layout

samples/php/php-001-phar-deserialize/:

composer.json
composer.lock              # pinned, checked-in
public/
  index.php
src/
  UnsafeDeser.php
sbom.cdx.json
vex.cdx.json
GROUND_TRUTH.json
repro.sh
Dockerfile                 # optional, but recommended
README.md                  # local sample README

3.2 composer.json

Pin a vulnerable (or pretend-vulnerable) version:

{
  "name": "dataset/php-001-phar-deserialize",
  "description": "Minimal PHP app demonstrating unsafe unserialize reachability.",
  "require": {
    "php": "^8.1",
    "example/vendor": "1.2.3"
  },
  "autoload": {
    "psr-4": {
      "Dataset\\Php001\\": "src/"
    }
  }
}

3.3 src/UnsafeDeser.php

<?php
namespace Dataset\Php001;

class UnsafeDeser
{
    /**
     * Vulnerable sink: directly calls unserialize() on user-controlled data.
     */
    public static function unsafeUnserialize(string $data): mixed
    {
        // This is the "vulnerable symbol" tools should flag as reachable.
        return unserialize($data);
    }

    /**
     * Example of a "benign" path: input is compared, not deserialized.
     */
    public static function safeCompare(string $data): bool
    {
        return $data === 'ok';
    }
}

3.4 public/index.php

<?php
declare(strict_types=1);

require __DIR__ . '/../vendor/autoload.php';

use Dataset\Php001\UnsafeDeser;

$mode = $_GET['mode'] ?? 'safe';
$data = $_GET['data'] ?? 's:2:"ok";';

if ($mode === 'unsafe') {
    // POSITIVE PATH: user input -> vulnerable sink
    $result = UnsafeDeser::unsafeUnserialize($data);
    echo "UNSAFE RESULT:\n";
    var_dump($result);
} else {
    // NEGATIVE PATH: package is present but sink not invoked
    $isOk = UnsafeDeser::safeCompare($data);
    echo "SAFE RESULT:\n";
    var_dump($isOk);
}

3.5 GROUND_TRUTH.json

{
  "sample_id": "php-001-phar-deserialize",
  "lang": "php",
  "package_manager": "composer",
  "vuln_ids": [
    "OSV:PLACEHOLDER-2019-UNSERIALIZE"
  ],
  "entrypoints": [
    "public/index.php"
  ],
  "reachable_symbols": [
    {
      "purl": "pkg:composer/example/vendor@1.2.3",
      "symbol": "Dataset\\Php001\\UnsafeDeser::unsafeUnserialize",
      "kind": "sink",
      "note": "Reachable when mode=unsafe (positive path)."
    }
  ],
  "evidence": [
    {
      "type": "path",
      "file": "public/index.php",
      "line": 13,
      "desc": "$_GET['data'] flows into UnsafeDeser::unsafeUnserialize without validation."
    },
    {
      "type": "exec",
      "cmd": "php -S 0.0.0.0:8000 -t public",
      "result": "Dev server started at http://0.0.0.0:8000"
    },
    {
      "type": "exec",
      "cmd": "curl -g 'http://localhost:8000/?mode=unsafe&data=O:4:\"Test\":0:{}'",
      "result": "var_dump shows an object created via unserialize() (__PHP_Incomplete_Class, because class Test is not defined)"
    }
  ]
}

3.6 Minimal SBOM (sbom.cdx.json)

Very small CycloneDX 1.6 example (trim or enrich as needed):

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "version": 1,
  "metadata": {
    "component": {
      "type": "application",
      "name": "php-001-phar-deserialize"
    }
  },
  "components": [
    {
      "type": "library",
      "name": "example/vendor",
      "version": "1.2.3",
      "purl": "pkg:composer/example/vendor@1.2.3"
    }
  ]
}

3.7 Minimal VEX (vex.cdx.json)

CycloneDX VEX example:

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "version": 1,
  "metadata": {
    "component": {
      "type": "application",
      "name": "php-001-phar-deserialize"
    }
  },
  "vulnerabilities": [
    {
      "id": "OSV:PLACEHOLDER-2019-UNSERIALIZE",
      "source": {
        "name": "OSV",
        "url": "https://osv.dev/"
      },
      "affects": [
        {
          "ref": "pkg:composer/example/vendor@1.2.3"
        }
      ],
      "analysis": {
        "state": "exploitable",
        "detail": "UnsafeDeser::unsafeUnserialize is reachable from HTTP query parameter 'data' when mode=unsafe."
      }
    }
  ]
}

3.8 repro.sh

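#!/usr/bin/env bash
set -euxo pipefail

# Install dependencies
composer install --no-interaction --no-progress

# Start built-in PHP server in background; always stop it when the script exits
php -S 0.0.0.0:8000 -t public &
SERVER_PID=$!
trap 'kill "$SERVER_PID" 2>/dev/null || true' EXIT

# Give server a moment to start
sleep 2

# -G sends the url-encoded fields as a query string, so the quotes and braces
# in the payload are not misread by curl's URL globbing.
# -f makes curl (and therefore this script) fail on HTTP errors.
echo "[+] Safe path (should NOT reach vulnerable sink)"
curl -sf -G 'http://localhost:8000/' \
  --data-urlencode 'mode=safe' --data-urlencode 'data=s:2:"ok";'

echo "[+] Unsafe path (should reach vulnerable sink)"
curl -sf -G 'http://localhost:8000/' \
  --data-urlencode 'mode=unsafe' --data-urlencode 'data=O:4:"Test":0:{}'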

4. JavaScript sample (js-002-yaml-unsafe-load)

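This example is intentionally simple: an Express server that calls js-yaml's load() on user input. Note that js-yaml 4.x made load() safe by default (the separate safeLoad() was removed), so the pinned 4.1.0 and the placeholder vulnerability ID only illustrate reachability; a 3.x release would be needed for load() to be genuinely unsafe.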

4.1 Layout

samples/js/js-002-yaml-unsafe-load/:

package.json
package-lock.json
server.js
sbom.cdx.json
vex.cdx.json
GROUND_TRUTH.json
repro.sh
Dockerfile (optional)
README.md

4.2 package.json

{
  "name": "js-002-yaml-unsafe-load",
  "version": "1.0.0",
  "description": "Minimal Node.js sample demonstrating unsafe js-yaml load reachability.",
  "main": "server.js",
  "scripts": {
    "start": "node server.js"
  },
  "dependencies": {
    "express": "^4.19.0",
    "js-yaml": "4.1.0"
  }
}

4.3 server.js

const express = require('express');
const yaml = require('js-yaml');

const app = express();
app.use(express.text({ type: '*/*' }));

// POSITIVE PATH: unsafe load of attacker-controlled YAML
app.post('/parse-unsafe', (req, res) => {
  try {
    const doc = yaml.load(req.body);  // vulnerable symbol
    res.json({ parsed: doc });
  } catch (err) {
    res.status(400).json({ error: String(err) });
  }
});

// NEGATIVE PATH: same dependency, but not reachable as a sink
app.post('/parse-safe', (req, res) => {
  // Pretend we validated and reject anything non-whitelisted
  if (req.body.length > 100) {
    return res.status(400).json({ error: 'Too big' });
  }
  // No call to yaml.load() here; dependency is present but sink not invoked
  res.json({ length: req.body.length });
});

const port = process.env.PORT || 3000;
app.listen(port, () => {
  console.log(`js-002-yaml-unsafe-load listening on http://localhost:${port}`);
});

4.4 GROUND_TRUTH.json

{
  "sample_id": "js-002-yaml-unsafe-load",
  "lang": "javascript",
  "package_manager": "npm",
  "vuln_ids": [
    "OSV:PLACEHOLDER-js-yaml-unsafe-load"
  ],
  "entrypoints": [
    "server.js"
  ],
  "reachable_symbols": [
    {
      "purl": "pkg:npm/js-yaml@4.1.0",
      "symbol": "load",
      "kind": "sink",
      "note": "Reachable from POST /parse-unsafe body."
    }
  ],
  "evidence": [
    {
      "type": "path",
      "file": "server.js",
      "line": 10,
      "desc": "req.body passes directly into yaml.load() without validation."
    },
    {
      "type": "exec",
      "cmd": "node server.js",
      "result": "Server listening on http://localhost:3000"
    },
    {
      "type": "exec",
      "cmd": "curl -XPOST localhost:3000/parse-unsafe -d 'foo: bar'",
      "result": "JSON response with {\"foo\":\"bar\"}"
    }
  ]
}

4.5 SBOM & VEX

Very similar to the PHP example, but with npm purl:

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "version": 1,
  "components": [
    {
      "type": "library",
      "name": "js-yaml",
      "version": "4.1.0",
      "purl": "pkg:npm/js-yaml@4.1.0"
    }
  ]
}

and:

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "version": 1,
  "vulnerabilities": [
    {
      "id": "OSV:PLACEHOLDER-js-yaml-unsafe-load",
      "affects": [
        { "ref": "pkg:npm/js-yaml@4.1.0" }
      ],
      "analysis": {
        "state": "exploitable",
        "detail": "yaml.load() reachable from POST /parse-unsafe."
      }
    }
  ]
}

4.6 repro.sh

#!/usr/bin/env bash
set -euxo pipefail

npm ci
node server.js &
SERVER_PID=$!
# Always stop the server when the script exits, even on failure
trap 'kill "$SERVER_PID" 2>/dev/null || true' EXIT

sleep 2

# -f makes curl (and therefore this script) fail on HTTP errors
echo "[+] Positive (reachable) path"
curl -sf -XPOST localhost:3000/parse-unsafe -d 'foo: bar'

echo "[+] Negative (not reaching sink) path"
curl -sf -XPOST localhost:3000/parse-safe -d 'foo: bar'

5. C# sample (cs-001-binaryformatter-deserialize)

Minimal ASP.NET Core-style sample that uses BinaryFormatter.Deserialize on request data.

5.1 Layout

samples/csharp/cs-001-binaryformatter-deserialize/:

Cs001BinaryFormatter.csproj
Program.cs
sbom.cdx.json
vex.cdx.json
GROUND_TRUTH.json
repro.sh
Dockerfile (optional)
README.md

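5.2 Cs001BinaryFormatter.csproj

<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
    <!-- ASP.NET Core apps disable BinaryFormatter by default; opt back in so the unsafe path runs -->
    <EnableUnsafeBinaryFormatterSerialization>true</EnableUnsafeBinaryFormatterSerialization>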
  </PropertyGroup>
  <ItemGroup>
    <!-- Pretend vulnerable library -->
    <PackageReference Include="Example.VulnerableLib" Version="1.0.0" />
  </ItemGroup>
</Project>

5.3 Program.cs

using System.Runtime.Serialization.Formatters.Binary;
using System.Text;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapPost("/deserialize-unsafe", async (HttpContext ctx) =>
{
    // POSITIVE PATH: body -> BinaryFormatter.Deserialize
    using var ms = new MemoryStream(await ToBytes(ctx.Request.Body));
#pragma warning disable SYSLIB0011
    var formatter = new BinaryFormatter();
    var obj = formatter.Deserialize(ms); // vulnerable symbol
#pragma warning restore SYSLIB0011

    await ctx.Response.WriteAsJsonAsync(new { success = true, type = obj?.GetType().FullName });
});

app.MapPost("/deserialize-safe", async (HttpContext ctx) =>
{
    // NEGATIVE PATH: we read input, but never deserialize
    using var reader = new StreamReader(ctx.Request.Body, Encoding.UTF8);
    var text = await reader.ReadToEndAsync();
    await ctx.Response.WriteAsJsonAsync(new { length = text.Length });
});

app.Run();

static async Task<byte[]> ToBytes(Stream stream)
{
    using var ms = new MemoryStream();
    await stream.CopyToAsync(ms);
    return ms.ToArray();
}

5.4 GROUND_TRUTH.json

{
  "sample_id": "cs-001-binaryformatter-deserialize",
  "lang": "csharp",
  "package_manager": "nuget",
  "vuln_ids": [
    "OSV:PLACEHOLDER-BinaryFormatter"
  ],
  "entrypoints": [
    "Program.cs"
  ],
  "reachable_symbols": [
    {
      "purl": "pkg:nuget/Example.VulnerableLib@1.0.0",
      "symbol": "System.Runtime.Serialization.Formatters.Binary.BinaryFormatter::Deserialize",
      "kind": "sink",
      "note": "Reachable from POST /deserialize-unsafe body."
    }
  ],
  "evidence": [
    {
      "type": "path",
      "file": "Program.cs",
      "line": 13,
      "desc": "Request body copied verbatim into BinaryFormatter.Deserialize."
    },
    {
      "type": "exec",
      "cmd": "dotnet run",
      "result": "App listening on http://localhost:5000"
    },
    {
      "type": "exec",
      "cmd": "curl -XPOST http://localhost:5000/deserialize-unsafe --data-binary @payload.bin",
      "result": "Response includes type name of deserialized object."
    }
  ]
}

The SBOM and VEX files follow the same pattern as the previous examples, with purl: "pkg:nuget/Example.VulnerableLib@1.0.0".


6. Tool output schema and integration

This is the normalized output your runners should produce for each (tool, sample) pair.

6.1 run.json schema (results/<tool>/<lang>/<sample_id>/run.json)

{
  "tool": "mytool",
  "version": "1.2.3",
  "sample_id": "js-002-yaml-unsafe-load",
  "lang": "javascript",
  "detected_vulns": [
    "OSV:PLACEHOLDER-js-yaml-unsafe-load"
  ],
  "reachable_symbols_reported": [
    {
      "purl": "pkg:npm/js-yaml@4.1.0",
      "symbol": "load",
      "kind": "sink",
      "evidence": "Taint flow from POST /parse-unsafe body to js-yaml load()."
    }
  ],
  "verdict": {
    "reachable": true,
    "confidence": 0.92
  },
  "timing": {
    "scan_ms": 2300
  },
  "raw": "tool-output.json"
}

Fields:

  • reachable is your top-level yes/no reachability verdict for the specific vulnerability(ies) listed in the sample.
  • reachable_symbols_reported should map onto GROUND_TRUTH.reachable_symbols where possible.
  • raw is optional: a path to the original, un-normalized tool output.
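
One way to check that mapping is to compare (purl, symbol) pairs from both files, as in the sketch below. Matching on exact string equality is an assumption made for simplicity; real tools may name symbols differently and need a looser comparison. The helper names are illustrative.

```python
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]


def symbol_keys(entries: List[dict]) -> Set[Key]:
    """Reduce symbol entries to comparable (purl, symbol) pairs."""
    # Missing or null values become "" so the pairs stay sortable.
    return {(str(e.get("purl") or ""), str(e.get("symbol") or "")) for e in entries}


def symbol_agreement(ground_truth: dict, run: dict) -> Dict[str, List[Key]]:
    """Split symbols into matched, missed (truth only), and extra (tool only)."""
    expected = symbol_keys(ground_truth.get("reachable_symbols", []))
    reported = symbol_keys(run.get("reachable_symbols_reported", []))
    return {
        "matched": sorted(expected & reported),
        "missed": sorted(expected - reported),
        "extra": sorted(reported - expected),
    }


truth = {"reachable_symbols": [{"purl": "pkg:npm/js-yaml@4.1.0", "symbol": "load", "kind": "sink"}]}
run = {"reachable_symbols_reported": [{"purl": "pkg:npm/js-yaml@4.1.0", "symbol": "load"}]}
print(symbol_agreement(truth, run))
```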

7. Example integration: running a tool against all samples

7.1 Simple Bash runner (runners/run_all.sh)

#!/usr/bin/env bash
set -euo pipefail

TOOL_NAME="${1:-mytool}"

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"

for lang_dir in "$ROOT_DIR/samples"/*; do
  lang="$(basename "$lang_dir")"
  for sample_dir in "$lang_dir"/*; do
    sample_id="$(basename "$sample_dir")"
    echo "[*] Running $TOOL_NAME on $lang/$sample_id"

    sbom="$sample_dir/sbom.cdx.json"
    vex="$sample_dir/vex.cdx.json"

    mkdir -p "$ROOT_DIR/results/$TOOL_NAME/$lang/$sample_id"

    # Example: assume your tool supports a CLI like:
    # mytool scan --sbom sbom.cdx.json --vex vex.cdx.json --project-root .
    "$TOOL_NAME" scan \
      --sbom "$sbom" \
      --vex "$vex" \
      --project-root "$sample_dir" \
      > "$ROOT_DIR/results/$TOOL_NAME/$lang/$sample_id/tool-output.json"

    # Normalize output to run.json via helper script
    python "$ROOT_DIR/runners/normalize_${TOOL_NAME}.py" \
      "$sample_dir" \
      "$ROOT_DIR/results/$TOOL_NAME/$lang/$sample_id/tool-output.json" \
      > "$ROOT_DIR/results/$TOOL_NAME/$lang/$sample_id/run.json"
  done
done

7.2 Python normalizer example (runners/normalize_mytool.py)

This script turns your proprietary tool output into our run.json schema.

#!/usr/bin/env python
import json
import sys
from pathlib import Path

sample_dir = Path(sys.argv[1])
tool_output_path = Path(sys.argv[2])

ground_truth = json.loads((sample_dir / "GROUND_TRUTH.json").read_text())
tool_output = json.loads(tool_output_path.read_text())

# Example: adapt based on your tool's own schema
run = {
    "tool": "mytool",
    "version": tool_output.get("tool_version", "unknown"),
    "sample_id": ground_truth["sample_id"],
    "lang": ground_truth["lang"],
    "detected_vulns": tool_output.get("vuln_ids", []),
    "reachable_symbols_reported": [],
    "verdict": {
        "reachable": bool(tool_output.get("reachable", False)),
        "confidence": float(tool_output.get("confidence", 0.0))
    },
    "timing": {
        "scan_ms": tool_output.get("scan_ms", None)
    },
    "raw": str(tool_output_path.name)
}

for r in tool_output.get("reachable_sinks", []):
    run["reachable_symbols_reported"].append({
        "purl": r.get("purl"),
        "symbol": r.get("symbol"),
        "kind": r.get("kind", "sink"),
        "evidence": r.get("evidence", "")
    })

print(json.dumps(run, indent=2))

So tool authors only need to:

  1. Implement a CLI to scan a project (given SBOM/VEX).
  2. Implement a small normalizer to produce run.json.

8. Adding a new sample (for contributors)

This is what you'd document so others can extend the dataset.

  1. Pick a language & ID

    • Folder: samples/<lang>/<lang-short-id>-NNN-<name>/
    • Example: samples/php/php-004-guzzle-ssrf/
  2. Create a minimal app

    • It must install with one command (composer install, npm ci, dotnet restore, etc.).

    • Include:

      • Positive path: user-controllable data reaches the vulnerable sink.
      • Negative path (if possible): same dependency present but sink not reachable.
  3. Pin dependencies

    • Commit lockfiles (composer.lock, package-lock.json, etc.).
    • Make sure the vulnerable version is used.
  4. Write GROUND_TRUTH.json

    • Fill all required fields from the schema above.
    • Be explicit about which symbol(s) are reachable and how to reproduce.
  5. Generate SBOM

    • Use your preferred SBOM generator and convert to CycloneDX 1.6 JSON (sbom.cdx.json).
    • Ensure PURLs match those you reference in GROUND_TRUTH.json.
  6. Write VEX (vex.cdx.json)

    • At minimum: one vulnerability with analysis.state = exploitable or not_affected (CycloneDX has no "affected" state).
    • Link to the SBOM component via affects.ref.
  7. Add repro.sh

    • Script that:

      • Installs deps.
      • Starts the app.
      • Executes at least one positive and one negative HTTP/CLI call.
    • Must exit nonzero on obvious failure.

  8. Document briefly in local README

    • What vulnerability pattern this sample represents (e.g., SSRF, XXE, unsafe deserialization).
    • Expected tool behavior (what should be marked reachable).
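
Several of the steps above can be checked mechanically before a PR is opened. The sketch below is one possible pre-submit lint, not an existing script: the required-file list is inferred from the sample layouts in this guide, and `lint_sample` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import List

# Assumption: files every sample folder must contain, per the layouts above.
REQUIRED_FILES = ("GROUND_TRUTH.json", "sbom.cdx.json", "vex.cdx.json", "repro.sh", "README.md")


def lint_sample(sample_dir: Path) -> List[str]:
    """Return a list of problems found in one sample folder (empty if clean)."""
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (sample_dir / name).is_file()
    ]
    truth_path = sample_dir / "GROUND_TRUTH.json"
    sbom_path = sample_dir / "sbom.cdx.json"
    if truth_path.is_file() and sbom_path.is_file():
        truth = json.loads(truth_path.read_text(encoding="utf-8"))
        sbom = json.loads(sbom_path.read_text(encoding="utf-8"))
        # The folder name doubles as the sample ID.
        if truth.get("sample_id") != sample_dir.name:
            problems.append(f"sample_id does not match folder: {truth.get('sample_id')}")
        # Every PURL cited in the ground truth must appear in the SBOM.
        sbom_purls = {c.get("purl") for c in sbom.get("components", [])}
        for entry in truth.get("reachable_symbols", []):
            if entry.get("purl") not in sbom_purls:
                problems.append(f"purl not in SBOM: {entry.get('purl')}")
    return problems
```

Running it over every folder under samples/ in CI would catch the most common contribution mistakes (missing lockstep between ground truth and SBOM) without executing any sample code.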

If you want, you can literally copy-paste the code and JSON above as your initial three samples (php-001, js-002, cs-001) and then we can layer in more patterns (XXE, SSRF, prototype pollution, etc.) the same way.