# Ground-Truth Corpus Specification

> **Version**: 1.0.0
> **Last Updated**: 2025-12-17
> **Source Advisory**: 16-Dec-2025 - Building a Deeper Moat Beyond Reachability

This document specifies the ground-truth corpus for benchmarking StellaOps' binary-only reachability analysis and deterministic scoring.

---

## Overview

A ground-truth corpus is a curated set of binaries with **known** reachable and unreachable vulnerable sinks. It enables:

- Precision/recall measurement for reachability claims
- Regression detection in CI
- Deterministic replay validation

---

## Corpus Structure

### Sample Requirements

Each sample binary must include:

- **Manifest file**: `sample.manifest.json` with ground-truth annotations
- **Binary file**: The target executable (ELF/PE/Mach-O)
- **Source (optional)**: Original source for reproducibility verification

### Manifest Schema

```json
{
  "$schema": "https://stellaops.io/schemas/corpus-sample.v1.json",
  "sampleId": "gt-0001",
  "name": "vulnerable-sink-reachable-from-main",
  "format": "elf64",
  "arch": "x86_64",
  "compiler": "gcc-13.2",
  "compilerFlags": ["-O2", "-fPIE"],
  "stripped": false,
  "obfuscation": "none",
  "pie": true,
  "cfi": false,
  "sinks": [
    {
      "sinkId": "sink-001",
      "signature": "vulnerable_function(char*)",
      "address": "0x401234",
      "cveId": "CVE-2024-XXXXX",
      "expected": "reachable",
      "expectedPaths": [
        ["main", "process_input", "parse_data", "vulnerable_function"]
      ],
      "expectedUnreachableReasons": null
    },
    {
      "sinkId": "sink-002",
      "signature": "dead_code_vulnerable()",
      "address": "0x402000",
      "cveId": "CVE-2024-YYYYY",
      "expected": "unreachable",
      "expectedPaths": null,
      "expectedUnreachableReasons": ["no-caller", "dead-code-elimination"]
    }
  ],
  "entrypoints": [
    {"name": "main", "address": "0x401000"},
    {"name": "_start", "address": "0x400ff0"}
  ],
  "metadata": {
    "createdAt": "2025-12-17T00:00:00Z",
    "author": "StellaOps QA Guild",
    "notes": "Basic reachability test with one true positive and one true negative"
  }
}
```

---

## Starter Corpus (20 Samples)

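Each starter sample ships a `sample.manifest.json` in the manifest schema, so a quick consistency pass over the ground-truth annotations can catch authoring mistakes before a sample is admitted. The sketch below is a hypothetical helper, not part of the StellaOps tooling. The name `check_manifest` and its rules are inferred from the example manifest: reachable sinks carry at least one path and no reasons, while unreachable sinks carry reasons and no paths. The rule that every expected path starts at a declared entrypoint is an additional assumption.

```python
import json


# Hypothetical rules inferred from the example manifest; not an official validator.
def check_manifest(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    entrypoints = {e["name"] for e in manifest.get("entrypoints", [])}
    for sink in manifest.get("sinks", []):
        sink_id = sink.get("sinkId", "<missing sinkId>")
        expected = sink.get("expected")
        paths = sink.get("expectedPaths")
        reasons = sink.get("expectedUnreachableReasons")
        if expected == "reachable":
            if not paths:
                problems.append(f"{sink_id}: reachable sink has no expectedPaths")
            if reasons is not None:
                problems.append(f"{sink_id}: reachable sink lists unreachable reasons")
            for path in paths or []:
                # Assumption: each expected path begins at a declared entrypoint.
                if not path or path[0] not in entrypoints:
                    problems.append(f"{sink_id}: path does not start at an entrypoint")
        elif expected == "unreachable":
            if paths is not None:
                problems.append(f"{sink_id}: unreachable sink lists expectedPaths")
            if not reasons:
                problems.append(f"{sink_id}: unreachable sink gives no reason")
        else:
            problems.append(f"{sink_id}: unexpected 'expected' value {expected!r}")
    return problems


# Trimmed copy of the example manifest (only the fields the check reads).
MANIFEST = json.loads("""
{
  "sinks": [
    {"sinkId": "sink-001", "expected": "reachable",
     "expectedPaths": [["main", "process_input", "parse_data", "vulnerable_function"]],
     "expectedUnreachableReasons": null},
    {"sinkId": "sink-002", "expected": "unreachable",
     "expectedPaths": null,
     "expectedUnreachableReasons": ["no-caller", "dead-code-elimination"]}
  ],
  "entrypoints": [{"name": "main"}, {"name": "_start"}]
}
""")

for problem in check_manifest(MANIFEST):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a corpus author see every inconsistency in a single pass.
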
### Category A: Reachable Sinks (10 samples)

| ID | Description | Format | Stripped | Obfuscation | Expected |
|----|-------------|--------|----------|-------------|----------|
| gt-0001 | Direct call from main | ELF64 | No | None | Reachable |
| gt-0002 | Indirect call via function pointer | ELF64 | No | None | Reachable |
| gt-0003 | Reachable through PLT/GOT | ELF64 | No | None | Reachable |
| gt-0004 | Reachable via vtable dispatch | ELF64 | No | None | Reachable |
| gt-0005 | Reachable with stripped symbols | ELF64 | Yes | None | Reachable |
| gt-0006 | Reachable with partial obfuscation | ELF64 | No | Control-flow | Reachable |
| gt-0007 | Reachable in PIE binary | ELF64 | No | None | Reachable |
| gt-0008 | Reachable in ASLR context | ELF64 | No | None | Reachable |
| gt-0009 | Reachable through shared library | ELF64 | No | None | Reachable |
| gt-0010 | Reachable via callback registration | ELF64 | No | None | Reachable |

### Category B: Unreachable Sinks (10 samples)

| ID | Description | Format | Stripped | Obfuscation | Expected Reason |
|----|-------------|--------|----------|-------------|-----------------|
| gt-0011 | Dead code (never called) | ELF64 | No | None | no-caller |
| gt-0012 | Guarded by impossible condition | ELF64 | No | None | dead-branch |
| gt-0013 | Linked but not used | ELF64 | No | None | unused-import |
| gt-0014 | Behind disabled feature flag | ELF64 | No | None | config-disabled |
| gt-0015 | Requires privilege escalation | ELF64 | No | None | privilege-gate |
| gt-0016 | Behind authentication check | ELF64 | No | None | auth-gate |
| gt-0017 | Unreachable with CFI enabled | ELF64 | No | None | cfi-prevented |
| gt-0018 | Optimized away by compiler | ELF64 | No | None | dce-eliminated |
| gt-0019 | In unreachable exception handler | ELF64 | No | None | exception-only |
| gt-0020 | Test-only code not in production | ELF64 | No | None | test-code-only |

---

## Metrics

### Primary Metrics

| Metric | Definition | Target |
|--------|------------|--------|
| **Precision** | TP / (TP + FP) | ≥ 95% |
| **Recall** | TP / (TP + FN) | ≥ 90% |
| **F1 Score** | 2 × (Precision × Recall) / (Precision + Recall) | ≥ 92% |
| **TTFRP** | Time-to-First-Reachable-Path (ms) | p95 < 500ms |
| **Deterministic Replay** | Identical proofs across runs | 100% |

### Regression Gates

CI gates that **fail the build**:

- Precision drops > 1.0 percentage point vs baseline
- Recall drops > 1.0 percentage point vs baseline
- Deterministic replay drops below 100%
- TTFRP p95 increases > 20% vs baseline

---

## CI Integration

### Benchmark Job

```yaml
# .gitea/workflows/reachability-bench.yaml
name: Reachability Benchmark

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * *'  # Nightly

jobs:
  benchmark:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4

      - name: Run corpus benchmark
        run: |
          stellaops bench run \
            --corpus datasets/reachability/ground-truth/ \
            --output bench/results/$(date +%Y%m%d).json \
            --baseline bench/baselines/current.json

      - name: Check regression gates
        run: |
          stellaops bench check \
            --results bench/results/$(date +%Y%m%d).json \
            --baseline bench/baselines/current.json \
            --precision-threshold 0.95 \
            --recall-threshold 0.90 \
            --determinism-threshold 1.0

      - name: Post results to PR
        if: github.event_name == 'pull_request'
        run: |
          stellaops bench report \
            --results bench/results/$(date +%Y%m%d).json \
            --baseline bench/baselines/current.json \
            --format markdown > bench-report.md
          # Post to PR via API
```

### Result Schema

```json
{
  "runId": "bench-20251217-001",
  "timestamp": "2025-12-17T02:00:00Z",
  "corpusVersion": "1.0.0",
  "scannerVersion": "1.3.0",
  "metrics": {
    "precision": 0.96,
    "recall": 0.91,
    "f1": 0.934,
    "ttfrp_p50_ms": 120,
    "ttfrp_p95_ms": 380,
    "deterministicReplay": 1.0
  },
  "samples": [
    {
      "sampleId": "gt-0001",
      "sinkId": "sink-001",
      "expected": "reachable",
      "actual": "reachable",
      "pathFound": ["main", "process_input", "parse_data", "vulnerable_function"],
"proofHash": "sha256:abc123...", "ttfrpMs": 95 } ], "regressions": [], "improvements": [] } ``` --- ## Corpus Maintenance ### Adding New Samples 1. Create sample binary with known sink reachability 2. Write `sample.manifest.json` with ground-truth annotations 3. Place in `datasets/reachability/ground-truth/{category}/` 4. Update corpus version in `datasets/reachability/corpus.json` 5. Run baseline update: `stellaops bench baseline update` ### Updating Baselines When scanner improvements are validated: ```bash stellaops bench baseline update \ --results bench/results/latest.json \ --output bench/baselines/current.json ``` ### Sample Categories - `basic/` — Simple direct call chains - `indirect/` — Function pointers, vtables, callbacks - `stripped/` — Symbol-stripped binaries - `obfuscated/` — Control-flow obfuscation, packing - `guarded/` — Config/auth/privilege guards - `multiarch/` — ARM64, x86, RISC-V variants --- ## Related Documentation - [Reachability Analysis Technical Reference](../product-advisories/14-Dec-2025%20-%20Reachability%20Analysis%20Technical%20Reference.md) - [Determinism and Reproducibility Technical Reference](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md) - [Scanner Benchmark Submission Guide](submission-guide.md)