Here’s a crisp plan to **publish a small, public “vulnerable binaries” dataset** (PHP, JS, C#) and a way to **compare reachability results** across tools, so you can ship something useful fast, gather feedback, and iterate.

---

# Scope (MVP)

* **Languages:** PHP (composer), JavaScript (npm), C# (.NET).
* **Artifacts per sample:** 1) minimal app, 2) lockfile, 3) SBOM (CycloneDX JSON), 4) VEX (OSV/CycloneDX VEX), 5) ground‑truth reachability notes, 6) scriptable repro (Docker).
* **Size:** 3–5 samples per language (9–15 total). Keep each sample ≤200 LOC.

---

# Repo layout

```
vuln-reach-dataset/
  LICENSE
  README.md
  schema/
    ground-truth.schema.json
    run-matrix.schema.json
  runners/
    run_all.sh
    run_all.ps1
  results/
    <tool>/<lang>/<sample_id>/run.json
  samples/
    php/...
    js/...
    csharp/...
```

---

# “Ground truth” format (minimal)

```json
{
  "sample_id": "php-001-phar-deserialize",
  "lang": "php",
  "package_manager": "composer",
  "vuln_ids": ["CVE-2019-XXXX", "OSV:GHSA-..."],
  "entrypoints": ["public/index.php"],
  "reachable_symbols": [
    {"purl": "pkg:composer/vendor/package@1.2.3", "symbol": "Vendor\\Unsafe::unserialize"},
    {"purl": "pkg:composer/monolog/monolog@2.9.0", "symbol": "Monolog\\Logger::pushHandler", "note": "benign"}
  ],
  "evidence": [
    {"type": "path", "file": "public/index.php", "line": 18, "desc": "tainted input -> unserialize"},
    {"type": "exec", "cmd": "curl 'http://localhost/?p=O:...'", "result": "triggered sink"}
  ]
}
```

---

# Samples to include (suggested)

## PHP (composer)

1. **php-001-phar-deserialize**
   * Risk: unsafe `unserialize()` on user input; optional PHAR gadget.
   * Ground truth: reachable sink `unserialize`.
2. **php-002-xxe-simplexml**
   * Risk: XML external entity in `simplexml_load_string` when entity substitution is enabled (e.g., `LIBXML_NOENT`).
   * Ground truth: reachable XXE sink.
3. **php-003-ssrf-guzzle**
   * Risk: user‑controlled URL into Guzzle client.
   * Ground truth: SSRF call chain to `Client::request`.

## JavaScript (npm)

1. **js-001-prototype-pollution**
   * Risk: `lodash.merge` (historically vulnerable to prototype pollution) called with a user-supplied object.
   * Ground truth: polluted `{__proto__}` path reaches object creation site.
2. **js-002-yaml-unsafe-load**
   * Risk: `js-yaml` `load` on untrusted text.
   * Ground truth: call to `load` reachable from HTTP route.
3. **js-003-ssrf-node-fetch**
   * Risk: user URL to `node-fetch`.
   * Ground truth: request issued to attacker-controlled host.

## C# (.NET)

1. **cs-001-binaryformatter-deserialize**
   * Risk: `BinaryFormatter.Deserialize` on user input (legacy).
   * Ground truth: reachable call to `Deserialize`.
2. **cs-002-processstartinfo-injection**
   * Risk: `Process.Start` with unsanitized arg (Windows/Linux).
   * Ground truth: taint to `Process.Start`.
3. **cs-003-xmlreader-xxe**
   * Risk: insecure `XmlReader` settings (DtdProcessing = Parse).
   * Ground truth: external entity resolved.

Each sample should:

* Pin a known vulnerable version in the lockfile.
* Provide a **positive** (reachable) and a **negative** (not reachable) path.
* Include a tiny HTTP entrypoint to exercise the path.

---

# SBOM & VEX per sample

* **CycloneDX 1.6 JSON** SBOM produced via native tool (composer, npm, dotnet) + converter.
* **VEX**: one document stating the vulnerability is **exploitable** (CycloneDX `analysis.state`) for the positive path, and **not_affected** for the negative path with a justification such as `code_not_reachable` (i.e., “vulnerable code not invoked”).
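Because each sample's SBOM, VEX, and ground-truth files have to agree on component identifiers, a small consistency check can catch drift early. The sketch below is one way to do that. It assumes the per-sample file names used in this plan (`GROUND_TRUTH.json`, `sbom.cdx.json`, `vex.cdx.json`) and assumes that VEX `affects[].ref` values are the same PURL strings that appear in the SBOM; `check_sample` is a hypothetical helper, not part of any existing tool.

```python
import json
from pathlib import Path


def check_sample(sample_dir: Path) -> list[str]:
    """Return human-readable consistency problems for one sample (empty if none)."""
    truth = json.loads((sample_dir / "GROUND_TRUTH.json").read_text())
    sbom = json.loads((sample_dir / "sbom.cdx.json").read_text())
    vex = json.loads((sample_dir / "vex.cdx.json").read_text())

    # Assumption: components are identified by their PURL everywhere.
    sbom_purls = {c.get("purl") for c in sbom.get("components", [])}
    problems: list[str] = []

    # Every package named in the ground truth must be listed in the SBOM.
    for sym in truth.get("reachable_symbols", []):
        if sym.get("purl") not in sbom_purls:
            problems.append(f"ground-truth purl not in SBOM: {sym.get('purl')}")

    # Every VEX statement must point at an SBOM component.
    vex_ids = set()
    for vuln in vex.get("vulnerabilities", []):
        vex_ids.add(vuln.get("id"))
        for target in vuln.get("affects", []):
            if target.get("ref") not in sbom_purls:
                problems.append(f"VEX ref not in SBOM: {target.get('ref')}")

    # Every vulnerability ID in the ground truth must have a VEX statement.
    for vuln_id in truth.get("vuln_ids", []):
        if vuln_id not in vex_ids:
            problems.append(f"vuln id missing from VEX: {vuln_id}")

    return problems
```

Calling `check_sample(Path("samples/js/js-002-yaml-unsafe-load"))` and failing CI when the returned list is non-empty would be one way to wire this into a PR checklist.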
---

# Runner & result format (tool-agnostic)

* Runners call each selected tool (e.g., “ToolA”, “ToolB”), then normalize outputs to:

```json
{
  "tool": "ToolA",
  "version": "x.y.z",
  "sample_id": "js-002-yaml-unsafe-load",
  "detected_vulns": ["OSV:GHSA-..."],
  "reachable_symbols_reported": [
    {"purl": "pkg:npm/js-yaml@4.1.0", "symbol": "load"}
  ],
  "verdict": { "reachable": true, "confidence": 0.92 },
  "raw": "path/to/original/tool/output.json"
}
```

---

# Comparison metrics

For each sample:

* **TP** (tool says reachable & ground truth reachable)
* **FP** (tool says reachable but ground truth not reachable)
* **FN** (tool says not reachable but ground truth reachable)
* **TN** (tool says not reachable & ground truth not reachable)

Aggregate per language & tool:

* Precision, recall, F1, and **Reachability Accuracy** = (TP+TN)/All.
* Optional: **Path depth** agreement (did tool cite the expected symbol/edge?).
* Optional: **Time-to-result** (seconds) and **scan mode** (static, dynamic, hybrid).

---

# Minimal example (JS): `samples/js/js-002-yaml-unsafe-load`

```
package.json
package-lock.json
server.js          # express route POST /parse -> js-yaml load(body.text)
README.md
sbom.cdx.json
vex.cdx.json
repro.sh           # npm ci; node server.js; curl -XPOST ...
GROUND_TRUTH.json
```

* Positive path: POST `{"text":"a: &a 1\nb: *a"}` to exercise parser.
* Negative path: guarded route that rejects user input unless whitelisted.

---

# Publishing checklist

* **License:** CC BY 4.0 (dataset) + MIT (runners).
* **Data hygiene:** no real secrets; deterministic scripts; pinned versions.
* **Repro:** one‑command `docker compose up` per language.
* **Docs:**
  * What is “reachability”? (vulnerable code is actually callable from app inputs).
  * How we built ground truth (static review + runnable PoC).
  * How to add a new sample (template folder + PR checklist).

---

# Fast path to first release (1–2 days of focused work)

1. Ship **one sample per language** with full ground truth + SBOM/VEX.
2. Include **one tool runner** (even a no‑op placeholder) and the **result schema**.
3. Add a **results/README** with the confusion‑matrix table filled for these 3 samples.
4. Open **issues** inviting contributions: more samples, more tools, more sinks.

---

# Why this helps

* Creates a **neutral, reproducible** yardstick for reachability.
* Lets vendors & researchers **compare apples to apples**.
* Encourages **PRs** (small, self‑contained samples) and **early citations** for Stella Ops.

If you want, I can generate the repo skeleton (folders, sample stubs, JSON schemas, and runner scripts) so you can push it directly to GitHub.

Here’s a “drop in” **developer guide + concrete samples** you can paste into your repo (or split into `README.md` / `docs/DEVELOPER_GUIDE.md`).

I’ll show:

1. How the project is structured
2. **Very detailed** example samples for PHP, JS, C#
3. How tool authors integrate their reachability tool
4. How contributors add new samples

You can tweak names/IDs, but everything below is self‑consistent.

---

## 1. Repository structure (recap)

```text
vuln-reach-dataset/
  README.md
  docs/
    DEVELOPER_GUIDE.md        # this file (or paste sections into README)
  schema/
    ground-truth.schema.json
    run-matrix.schema.json
  samples/
    php/
      php-001-phar-deserialize/
      php-002-xxe-simplexml/
      ...
    js/
      js-002-yaml-unsafe-load/
      ...
    csharp/
      cs-001-binaryformatter-deserialize/
      ...
  runners/
    run_all.sh
    run_all.ps1
    run_with_tool_mytool.py   # example tool integration
  results/
    mytool/
      php/php-001-phar-deserialize/run.json
      js/js-002-yaml-unsafe-load/run.json
      ...
```

**Core idea:** Each `samples/<lang>/<sample_id>/` folder is:

* A minimal runnable app containing a known vulnerability
* A **positive path** (vulnerable code reachable) and (ideally) a **negative path** (package present but not reachable)
* `GROUND_TRUTH.json` describing what is actually reachable
* SBOM + VEX files describing vulnerabilities at the component level
* A `repro.sh` script to run the app and trigger the bug

Tool authors plug in by reading each sample folder, running their scanner, and writing normalized results to `results/<tool>/<lang>/<sample_id>/run.json`.

---

## 2. Ground truth schema (what tools are judged against)

**Minimal JSON format** (you can store a full JSON Schema in `schema/ground-truth.schema.json`):

```jsonc
{
  "sample_id": "php-001-phar-deserialize",
  "lang": "php",
  "package_manager": "composer",
  "vuln_ids": [
    "OSV:PLACEHOLDER-2019-XXXX"
  ],
  "entrypoints": [
    "public/index.php"
  ],
  "reachable_symbols": [
    {
      "purl": "pkg:composer/example/vendor@1.2.3",
      "symbol": "Example\\Unsafe::unserialize",
      "kind": "sink",
      "note": "User-controlled input can reach this sink in /?mode=unsafe&data=..."
    },
    {
      "purl": "pkg:composer/example/vendor@1.2.3",
      "symbol": "Example\\Unsafe::unserialize",
      "kind": "sink",
      "note": "NOT reached in /?mode=safe (negative path)."
    }
  ],
  "evidence": [
    {
      "type": "path",
      "file": "public/index.php",
      "line": 25,
      "desc": "Tainted $_GET['data'] flows into Example\\Unsafe::unserialize"
    },
    {
      "type": "exec",
      "cmd": "curl 'http://localhost:8000/?mode=unsafe&data=...payload...'",
      "result": "Trigger behavior / exploit / exception"
    }
  ]
}
```

Fields are intentionally simple:

* `reachable_symbols` describes **what is reachable** and from which package/version.
* `evidence` explains *why* we marked it reachable (code path + repro command).

---

## 3. PHP sample (php-001-phar-deserialize)

### 3.1 Folder layout

`samples/php/php-001-phar-deserialize/`:

```text
composer.json
composer.lock        # pinned, checked-in
public/
  index.php
src/
  UnsafeDeser.php
sbom.cdx.json
vex.cdx.json
GROUND_TRUTH.json
repro.sh
Dockerfile           # optional, but recommended
README.md            # local sample README
```

### 3.2 `composer.json`

Pin a vulnerable (or pretend-vulnerable) version. Here `example/vendor` 1.2.3 is a placeholder package; swap in a real one so that `composer install` resolves. JSON does not allow comments, so keep notes like this in the README rather than in the file:

```json
{
  "name": "dataset/php-001-phar-deserialize",
  "description": "Minimal PHP app demonstrating unsafe unserialize reachability.",
  "require": {
    "php": "^8.1",
    "example/vendor": "1.2.3"
  },
  "autoload": {
    "psr-4": {
      "Dataset\\Php001\\": "src/"
    }
  }
}
```

### 3.3 `src/UnsafeDeser.php`

A minimal class matching the two calls made from `public/index.php` (a thin wrapper around PHP's `unserialize()` plus a benign helper):

```php
<?php
declare(strict_types=1);

namespace Dataset\Php001;

final class UnsafeDeser
{
    // Sink: deserializes attacker-controlled input.
    public static function unsafeUnserialize(string $data): mixed
    {
        return unserialize($data);
    }

    // Benign helper: compares the input to a constant, never deserializes.
    public static function safeCompare(string $data): bool
    {
        return hash_equals('ok', $data);
    }
}
```

### 3.4 `public/index.php`

```php
<?php
declare(strict_types=1);

require __DIR__ . '/../vendor/autoload.php';

use Dataset\Php001\UnsafeDeser;

header('Content-Type: text/plain');

$mode = $_GET['mode'] ?? 'safe';
$data = $_GET['data'] ?? '';

if ($mode === 'unsafe') {
    // POSITIVE PATH: tainted data -> vulnerable sink
    $result = UnsafeDeser::unsafeUnserialize($data);
    echo "UNSAFE RESULT:\n";
    var_dump($result);
} else {
    // NEGATIVE PATH: package is present but sink not invoked
    $isOk = UnsafeDeser::safeCompare($data);
    echo "SAFE RESULT:\n";
    var_dump($isOk);
}
```

### 3.5 `GROUND_TRUTH.json`

```json
{
  "sample_id": "php-001-phar-deserialize",
  "lang": "php",
  "package_manager": "composer",
  "vuln_ids": [
    "OSV:PLACEHOLDER-2019-UNSERIALIZE"
  ],
  "entrypoints": [
    "public/index.php"
  ],
  "reachable_symbols": [
    {
      "purl": "pkg:composer/example/vendor@1.2.3",
      "symbol": "Dataset\\Php001\\UnsafeDeser::unsafeUnserialize",
      "kind": "sink",
      "note": "Reachable when mode=unsafe (positive path)."
    }
  ],
  "evidence": [
    {
      "type": "path",
      "file": "public/index.php",
      "line": 15,
      "desc": "$_GET['data'] flows into UnsafeDeser::unsafeUnserialize without validation."
    },
    {
      "type": "exec",
      "cmd": "php -S 0.0.0.0:8000 -t public",
      "result": "Dev server started at http://0.0.0.0:8000"
    },
    {
      "type": "exec",
      "cmd": "curl -G 'http://localhost:8000/' --data-urlencode 'mode=unsafe' --data-urlencode 'data=O:4:\"Test\":0:{}'",
      "result": "unserialize() returns an object (a __PHP_Incomplete_Class, since no class Test is defined)"
    }
  ]
}
```

### 3.6 Minimal SBOM (`sbom.cdx.json`)

Very small CycloneDX 1.6 example (trim or enrich as needed). The component carries a `bom-ref` so the VEX file can point at it:

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "version": 1,
  "metadata": {
    "component": {
      "type": "application",
      "name": "php-001-phar-deserialize"
    }
  },
  "components": [
    {
      "type": "library",
      "bom-ref": "pkg:composer/example/vendor@1.2.3",
      "name": "example/vendor",
      "version": "1.2.3",
      "purl": "pkg:composer/example/vendor@1.2.3"
    }
  ]
}
```

### 3.7 Minimal VEX (`vex.cdx.json`)

CycloneDX VEX example. Note that CycloneDX uses `exploitable` (not `affected`) as the analysis state, and `justification` applies only to `not_affected` entries:

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "version": 1,
  "metadata": {
    "component": {
      "type": "application",
      "name": "php-001-phar-deserialize"
    }
  },
  "vulnerabilities": [
    {
      "id": "OSV:PLACEHOLDER-2019-UNSERIALIZE",
      "source": {
        "name": "OSV",
        "url": "https://osv.dev/"
      },
      "affects": [
        {
          "ref": "pkg:composer/example/vendor@1.2.3"
        }
      ],
      "analysis": {
        "state": "exploitable",
        "detail": "UnsafeDeser::unsafeUnserialize is reachable from HTTP query parameter 'data' when mode=unsafe."
      }
    }
  ]
}
```

### 3.8 `repro.sh`

```bash
#!/usr/bin/env bash
set -euxo pipefail

# Install dependencies
composer install --no-interaction --no-progress

# Start built-in PHP server in background
php -S 0.0.0.0:8000 -t public &
SERVER_PID=$!

# Give server a moment to start
sleep 2

echo "[+] Safe path (should NOT reach vulnerable sink)"
curl -s -G 'http://localhost:8000/' \
  --data-urlencode 'mode=safe' \
  --data-urlencode 'data=s:2:"ok";' || true

echo "[+] Unsafe path (should reach vulnerable sink)"
# -G with --data-urlencode avoids curl's URL globbing on the {} in the payload
curl -s -G 'http://localhost:8000/' \
  --data-urlencode 'mode=unsafe' \
  --data-urlencode 'data=O:4:"Test":0:{}' || true

kill "$SERVER_PID"
wait || true
```

---

## 4. JavaScript sample (js-002-yaml-unsafe-load)

This example is intentionally simple: an Express server that calls `js-yaml`’s unsafe `load()` on user input.
### 4.1 Layout

`samples/js/js-002-yaml-unsafe-load/`:

```text
package.json
package-lock.json
server.js
sbom.cdx.json
vex.cdx.json
GROUND_TRUTH.json
repro.sh
Dockerfile (optional)
README.md
```

### 4.2 `package.json`

Note: in `js-yaml` 4.x, `load()` uses a safe schema by default; the JS-specific types such as `!!js/function` were only part of the default `load()` schema in 3.x. The `4.1.0` pin below therefore works as a placeholder for the reachability pattern. For a genuinely exploitable sample, pin a 3.x release and update the PURLs in the ground truth, SBOM, and VEX to match.

```json
{
  "name": "js-002-yaml-unsafe-load",
  "version": "1.0.0",
  "description": "Minimal Node.js sample demonstrating unsafe js-yaml load reachability.",
  "main": "server.js",
  "scripts": {
    "start": "node server.js"
  },
  "dependencies": {
    "express": "^4.19.0",
    "js-yaml": "4.1.0"
  }
}
```

### 4.3 `server.js`

```js
const express = require('express');
const yaml = require('js-yaml');
const app = express();
app.use(express.text({ type: '*/*' }));

// POSITIVE PATH: unsafe load of attacker-controlled YAML
app.post('/parse-unsafe', (req, res) => {
  try {
    const doc = yaml.load(req.body); // vulnerable symbol
    res.json({ parsed: doc });
  } catch (err) {
    res.status(400).json({ error: String(err) });
  }
});

// NEGATIVE PATH: same dependency, but not reachable as a sink
app.post('/parse-safe', (req, res) => {
  // Pretend we validated and reject anything non-whitelisted
  if (req.body.length > 100) {
    return res.status(400).json({ error: 'Too big' });
  }
  // No call to yaml.load() here; dependency is present but sink not invoked
  res.json({ length: req.body.length });
});

const port = process.env.PORT || 3000;
app.listen(port, () => {
  console.log(`js-002-yaml-unsafe-load listening on http://localhost:${port}`);
});
```

### 4.4 `GROUND_TRUTH.json`

```json
{
  "sample_id": "js-002-yaml-unsafe-load",
  "lang": "javascript",
  "package_manager": "npm",
  "vuln_ids": [
    "OSV:PLACEHOLDER-js-yaml-unsafe-load"
  ],
  "entrypoints": [
    "server.js"
  ],
  "reachable_symbols": [
    {
      "purl": "pkg:npm/js-yaml@4.1.0",
      "symbol": "load",
      "kind": "sink",
      "note": "Reachable from POST /parse-unsafe body."
    }
  ],
  "evidence": [
    {
      "type": "path",
      "file": "server.js",
      "line": 9,
      "desc": "req.body passes directly into yaml.load() without validation."
    },
    {
      "type": "exec",
      "cmd": "node server.js",
      "result": "Server listening on http://localhost:3000"
    },
    {
      "type": "exec",
      "cmd": "curl -XPOST localhost:3000/parse-unsafe -d 'foo: bar'",
      "result": "JSON response {\"parsed\":{\"foo\":\"bar\"}}"
    }
  ]
}
```

### 4.5 SBOM & VEX

Very similar to the PHP example, but with an npm purl:

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "version": 1,
  "components": [
    {
      "type": "library",
      "bom-ref": "pkg:npm/js-yaml@4.1.0",
      "name": "js-yaml",
      "version": "4.1.0",
      "purl": "pkg:npm/js-yaml@4.1.0"
    }
  ]
}
```

and (using the CycloneDX analysis state `exploitable`; `justification` is reserved for `not_affected` entries):

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "version": 1,
  "vulnerabilities": [
    {
      "id": "OSV:PLACEHOLDER-js-yaml-unsafe-load",
      "affects": [
        {
          "ref": "pkg:npm/js-yaml@4.1.0"
        }
      ],
      "analysis": {
        "state": "exploitable",
        "detail": "yaml.load() reachable from POST /parse-unsafe."
      }
    }
  ]
}
```

### 4.6 `repro.sh`

```bash
#!/usr/bin/env bash
set -euxo pipefail

npm ci

node server.js &
SERVER_PID=$!

sleep 2

echo "[+] Positive (reachable) path"
curl -s -XPOST localhost:3000/parse-unsafe -d 'foo: bar' || true

echo "[+] Negative (not reaching sink) path"
curl -s -XPOST localhost:3000/parse-safe -d 'foo: bar' || true

kill "$SERVER_PID"
wait || true
```

---

## 5. C# sample (cs-001-binaryformatter-deserialize)

Minimal ASP.NET Core-style sample that uses `BinaryFormatter.Deserialize` on request data.
### 5.1 Layout

`samples/csharp/cs-001-binaryformatter-deserialize/`:

```text
Cs001BinaryFormatter.csproj
Program.cs
sbom.cdx.json
vex.cdx.json
GROUND_TRUTH.json
repro.sh
Dockerfile (optional)
README.md
```

### 5.2 `Cs001BinaryFormatter.csproj`

`BinaryFormatter` is disabled by default in ASP.NET Core apps, so the project opts back in explicitly; without that property, the unsafe endpoint throws `NotSupportedException` instead of deserializing:

```xml
<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
    <!-- Re-enable BinaryFormatter for this deliberately vulnerable sample -->
    <EnableUnsafeBinaryFormatterSerialization>true</EnableUnsafeBinaryFormatterSerialization>
  </PropertyGroup>
</Project>
```

### 5.3 `Program.cs`

```csharp
using System.Runtime.Serialization.Formatters.Binary;
using System.Text;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapPost("/deserialize-unsafe", async (HttpContext ctx) =>
{
    // POSITIVE PATH: body -> BinaryFormatter.Deserialize
    using var ms = new MemoryStream(await ToBytes(ctx.Request.Body));

    // SYSLIB0011 (BinaryFormatter is obsolete) is suppressed on purpose here
#pragma warning disable SYSLIB0011
    var formatter = new BinaryFormatter();
    var obj = formatter.Deserialize(ms); // vulnerable symbol
#pragma warning restore SYSLIB0011

    await ctx.Response.WriteAsJsonAsync(new { success = true, type = obj?.GetType().FullName });
});

app.MapPost("/deserialize-safe", async (HttpContext ctx) =>
{
    // NEGATIVE PATH: we read input, but never deserialize
    using var reader = new StreamReader(ctx.Request.Body, Encoding.UTF8);
    var text = await reader.ReadToEndAsync();
    await ctx.Response.WriteAsJsonAsync(new { length = text.Length });
});

app.Run();

// Buffers the whole request body into memory
static async Task<byte[]> ToBytes(Stream stream)
{
    using var ms = new MemoryStream();
    await stream.CopyToAsync(ms);
    return ms.ToArray();
}
```

### 5.4 `GROUND_TRUTH.json`

```json
{
  "sample_id": "cs-001-binaryformatter-deserialize",
  "lang": "csharp",
  "package_manager": "nuget",
  "vuln_ids": [
    "OSV:PLACEHOLDER-BinaryFormatter"
  ],
  "entrypoints": [
    "Program.cs"
  ],
  "reachable_symbols": [
    {
      "purl": "pkg:nuget/Example.VulnerableLib@1.0.0",
      "symbol": "System.Runtime.Serialization.Formatters.Binary.BinaryFormatter::Deserialize",
      "kind": "sink",
      "note": "Reachable from POST /deserialize-unsafe body."
    }
  ],
  "evidence": [
    {
      "type": "path",
      "file": "Program.cs",
      "line": 15,
      "desc": "Request body copied verbatim into BinaryFormatter.Deserialize."
    },
    {
      "type": "exec",
      "cmd": "dotnet run",
      "result": "App listening on http://localhost:5000"
    },
    {
      "type": "exec",
      "cmd": "curl -XPOST http://localhost:5000/deserialize-unsafe --data-binary @payload.bin",
      "result": "Response includes type name of deserialized object."
    }
  ]
}
```

SBOM/VEX follow the same pattern as the previous examples, with `purl: "pkg:nuget/Example.VulnerableLib@1.0.0"`.

---

## 6. Tool output schema and integration

This is the **normalized output** your runners should produce for each `(tool, sample)` pair.

### 6.1 `run.json` schema (`results/<tool>/<lang>/<sample_id>/run.json`)

```jsonc
{
  "tool": "mytool",
  "version": "1.2.3",
  "sample_id": "js-002-yaml-unsafe-load",
  "lang": "javascript",
  "detected_vulns": [
    "OSV:PLACEHOLDER-js-yaml-unsafe-load"
  ],
  "reachable_symbols_reported": [
    {
      "purl": "pkg:npm/js-yaml@4.1.0",
      "symbol": "load",
      "kind": "sink",
      "evidence": "Taint flow from POST /parse-unsafe body to js-yaml load()."
    }
  ],
  "verdict": {
    "reachable": true,
    "confidence": 0.92
  },
  "timing": {
    "scan_ms": 2300
  },
  "raw": "tool-output.json" // optional path to original tool output
}
```

Fields:

* `verdict.reachable` is your top-level yes/no reachability verdict **for the specific vulnerability(ies) listed in the sample**.
* `reachable_symbols_reported` should map onto `GROUND_TRUTH.reachable_symbols` where possible.

---

## 7. Example integration: running a tool against all samples

### 7.1 Simple Bash runner (`runners/run_all.sh`)

```bash
#!/usr/bin/env bash
set -euo pipefail

TOOL_NAME="${1:-mytool}"
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"

for lang_dir in "$ROOT_DIR/samples"/*; do
  lang="$(basename "$lang_dir")"

  for sample_dir in "$lang_dir"/*; do
    [ -d "$sample_dir" ] || continue   # skip stray files
    sample_id="$(basename "$sample_dir")"
    echo "[*] Running $TOOL_NAME on $lang/$sample_id"

    sbom="$sample_dir/sbom.cdx.json"
    vex="$sample_dir/vex.cdx.json"
    out_dir="$ROOT_DIR/results/$TOOL_NAME/$lang/$sample_id"
    mkdir -p "$out_dir"

    # Example: assume your tool supports a CLI like:
    #   mytool scan --sbom sbom.cdx.json --vex vex.cdx.json --project-root .
    "$TOOL_NAME" scan \
      --sbom "$sbom" \
      --vex "$vex" \
      --project-root "$sample_dir" \
      > "$out_dir/tool-output.json"

    # Normalize output to run.json via helper script
    python3 "$ROOT_DIR/runners/normalize_${TOOL_NAME}.py" \
      "$sample_dir" \
      "$out_dir/tool-output.json" \
      > "$out_dir/run.json"
  done
done
```

### 7.2 Python normalizer example (`runners/normalize_mytool.py`)

This script turns your proprietary tool output into our `run.json` schema.

```python
#!/usr/bin/env python3
import json
import sys
from pathlib import Path

sample_dir = Path(sys.argv[1])
tool_output_path = Path(sys.argv[2])

ground_truth = json.loads((sample_dir / "GROUND_TRUTH.json").read_text())
tool_output = json.loads(tool_output_path.read_text())

# Example: adapt based on your tool's own schema
run = {
    "tool": "mytool",
    "version": tool_output.get("tool_version", "unknown"),
    "sample_id": ground_truth["sample_id"],
    "lang": ground_truth["lang"],
    "detected_vulns": tool_output.get("vuln_ids", []),
    "reachable_symbols_reported": [],
    "verdict": {
        "reachable": bool(tool_output.get("reachable", False)),
        "confidence": float(tool_output.get("confidence", 0.0)),
    },
    "timing": {
        "scan_ms": tool_output.get("scan_ms", None),
    },
    "raw": str(tool_output_path.name),
}

# Copy each reported sink into the normalized shape
for r in tool_output.get("reachable_sinks", []):
    run["reachable_symbols_reported"].append({
        "purl": r.get("purl"),
        "symbol": r.get("symbol"),
        "kind": r.get("kind", "sink"),
        "evidence": r.get("evidence", ""),
    })

print(json.dumps(run, indent=2))
```

So **tool authors** only need to:

1. Implement a CLI to scan a project (given SBOM/VEX).
2. Implement a small normalizer to produce `run.json`.

---

## 8. Adding a new sample (for contributors)

This is what you’d document so others can extend the dataset.

1. **Pick a language & ID**
   * Folder: `samples/<lang>/<lang>-NNN-<slug>/`
   * Example: `samples/php/php-004-guzzle-ssrf/`
2. **Create a minimal app**
   * It must install with one command (`composer install`, `npm ci`, `dotnet restore`, etc.).
   * Include:
     * **Positive path**: user-controllable data reaches the vulnerable sink.
     * **Negative path** (if possible): same dependency present but sink not reachable.
3. **Pin dependencies**
   * Commit lockfiles (`composer.lock`, `package-lock.json`, etc.).
   * Make sure the vulnerable version is used.
4. **Write `GROUND_TRUTH.json`**
   * Fill all required fields from the schema above.
   * Be explicit about which symbol(s) are reachable and how to reproduce.
5. **Generate SBOM**
   * Use your preferred SBOM generator and convert to CycloneDX 1.6 JSON (`sbom.cdx.json`).
   * Ensure PURLs match those you reference in `GROUND_TRUTH.json`.
6. **Write VEX (`vex.cdx.json`)**
   * At minimum: one vulnerability with `analysis.state` set to `exploitable` or `not_affected`.
   * Link to the SBOM component via `affects.ref`.
7. **Add `repro.sh`**
   * Script that:
     * Installs deps.
     * Starts the app.
     * Executes at least one positive and one negative HTTP/CLI call.
   * Must exit non‑zero on obvious failure.
8. **Document briefly in local README**
   * What vulnerability pattern this sample represents (e.g., SSRF, XXE, unsafe deserialization).
   * Expected tool behavior (what should be marked reachable).

---

If you want, you can literally copy-paste the code and JSON above as your initial three samples (`php-001`, `js-002`, `cs-001`) and then we can layer in more patterns (XXE, SSRF, prototype pollution, etc.) the same way.
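
Once samples and `run.json` files exist, the comparison metrics defined earlier (TP/FP/FN/TN, precision, recall, F1, Reachability Accuracy) can be tallied with a few lines. The sketch below assumes each observation is a pair of booleans (ground truth reachable, tool verdict reachable), for example one pair per positive or negative path. How those booleans are derived from `GROUND_TRUTH.json` and `run.json` is left open, returning 0.0 for an undefined ratio is a convention chosen here, and the `Counts` class is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix tallies for one (tool, language) slice."""
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0

    def add(self, truth_reachable: bool, tool_reachable: bool) -> None:
        # Classify a single observation into one of the four cells.
        if truth_reachable and tool_reachable:
            self.tp += 1
        elif tool_reachable:
            self.fp += 1
        elif truth_reachable:
            self.fn += 1
        else:
            self.tn += 1

    def metrics(self) -> dict[str, float]:
        total = self.tp + self.fp + self.fn + self.tn
        # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
        precision = self.tp / (self.tp + self.fp) if (self.tp + self.fp) else 0.0
        recall = self.tp / (self.tp + self.fn) if (self.tp + self.fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        accuracy = (self.tp + self.tn) / total if total else 0.0
        return {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "accuracy": accuracy,  # Reachability Accuracy = (TP+TN)/All
        }
```

A scorer would create one `Counts` per tool and language, call `add()` for every observation, and write the resulting `metrics()` into the confusion-matrix table in `results/README`.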