# Ground-Truth Corpus Specification

> **Version**: 1.0.0  
> **Last Updated**: 2025-12-17  
> **Source Advisory**: 16-Dec-2025 - Building a Deeper Moat Beyond Reachability

This document specifies the ground-truth corpus for benchmarking StellaOps' binary-only reachability analysis and deterministic scoring.

---

## Overview

A ground-truth corpus is a curated set of binaries with **known** reachable and unreachable vulnerable sinks. It enables:
- Precision/recall measurement for reachability claims
- Regression detection in CI
- Deterministic replay validation

---

## Corpus Structure

### Sample Requirements

Each sample binary must include:
- **Manifest file**: `sample.manifest.json` with ground-truth annotations
- **Binary file**: The target executable (ELF/PE/Mach-O)
- **Source (optional)**: Original source for reproducibility verification
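Each manifest declares a `format` next to the binary, so a harness can check that the two agree before running any analysis. The minimal sketch below classifies a file by its header magic: the ELF `e_ident` class byte, the PE signature found through the DOS header's `e_lfanew`, and the thin Mach-O magic in either byte order. The helper names are hypothetical, and apart from `elf64` the returned labels (`elf32`, `pe32`, `pe64`, `macho32`, `macho64`) are assumptions, not values defined by the corpus schema.

```python
import struct


def detect_format(data: bytes) -> str:
    """Classify an executable by its header magic (hypothetical helper)."""
    # ELF: "\x7fELF", then e_ident[EI_CLASS] at offset 4 (1 = 32-bit, 2 = 64-bit).
    if data[:4] == b"\x7fELF" and len(data) > 4:
        return {1: "elf32", 2: "elf64"}.get(data[4], "elf-unknown")
    # PE: DOS "MZ" stub whose e_lfanew field (offset 0x3C) points at "PE\0\0".
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_off,) = struct.unpack_from("<I", data, 0x3C)
        if len(data) >= pe_off + 26 and data[pe_off:pe_off + 4] == b"PE\x00\x00":
            # Optional-header magic follows the 4-byte signature and 20-byte COFF header.
            (opt_magic,) = struct.unpack_from("<H", data, pe_off + 24)
            return {0x10B: "pe32", 0x20B: "pe64"}.get(opt_magic, "pe-unknown")
        return "unknown"
    # Thin Mach-O in either byte order (fat/universal binaries are not handled).
    if len(data) >= 4:
        (magic,) = struct.unpack_from(">I", data, 0)
        if magic in (0xFEEDFACE, 0xCEFAEDFE):
            return "macho32"
        if magic in (0xFEEDFACF, 0xCFFAEDFE):
            return "macho64"
    return "unknown"


def format_matches(manifest: dict, binary: bytes) -> bool:
    """True when the manifest's declared format agrees with the file header."""
    return detect_format(binary) == manifest["format"]
```

A harness would call `format_matches(manifest, Path(binary_path).read_bytes())` and reject the sample on a mismatch.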

### Manifest Schema

```json
{
  "$schema": "https://stellaops.io/schemas/corpus-sample.v1.json",
  "sampleId": "gt-0001",
  "name": "vulnerable-sink-reachable-from-main",
  "format": "elf64",
  "arch": "x86_64",
  "compiler": "gcc-13.2",
  "compilerFlags": ["-O2", "-fPIE"],
  "stripped": false,
  "obfuscation": "none",
  "pie": true,
  "cfi": false,
  "sinks": [
    {
      "sinkId": "sink-001",
      "signature": "vulnerable_function(char*)",
      "address": "0x401234",
      "cveId": "CVE-2024-XXXXX",
      "expected": "reachable",
      "expectedPaths": [
        ["main", "process_input", "parse_data", "vulnerable_function"]
      ],
      "expectedUnreachableReasons": null
    },
    {
      "sinkId": "sink-002",
      "signature": "dead_code_vulnerable()",
      "address": "0x402000",
      "cveId": "CVE-2024-YYYYY",
      "expected": "unreachable",
      "expectedPaths": null,
      "expectedUnreachableReasons": ["no-caller"]
    }
  ],
  "entrypoints": [
    {"name": "main", "address": "0x401000"},
    {"name": "_start", "address": "0x400ff0"}
  ],
  "metadata": {
    "createdAt": "2025-12-17T00:00:00Z",
    "author": "StellaOps QA Guild",
    "notes": "Basic reachability test with one true positive and one true negative"
  }
}
```

---

## Starter Corpus (20 Samples)

### Category A: Reachable Sinks (10 samples)

| ID | Description | Format | Stripped | Obfuscation | Expected |
|----|-------------|--------|----------|-------------|----------|
| gt-0001 | Direct call from main | ELF64 | No | None | Reachable |
| gt-0002 | Indirect call via function pointer | ELF64 | No | None | Reachable |
| gt-0003 | Reachable through PLT/GOT | ELF64 | No | None | Reachable |
| gt-0004 | Reachable via vtable dispatch | ELF64 | No | None | Reachable |
| gt-0005 | Reachable with stripped symbols | ELF64 | Yes | None | Reachable |
| gt-0006 | Reachable with partial obfuscation | ELF64 | No | Control-flow | Reachable |
| gt-0007 | Reachable in PIE binary | ELF64 | No | None | Reachable |
| gt-0008 | Reachable in ASLR context | ELF64 | No | None | Reachable |
| gt-0009 | Reachable through shared library | ELF64 | No | None | Reachable |
| gt-0010 | Reachable via callback registration | ELF64 | No | None | Reachable |

### Category B: Unreachable Sinks (10 samples)

| ID | Description | Format | Stripped | Obfuscation | Expected Reason |
|----|-------------|--------|----------|-------------|-----------------|
| gt-0011 | Dead code (never called) | ELF64 | No | None | no-caller |
| gt-0012 | Guarded by impossible condition | ELF64 | No | None | dead-branch |
| gt-0013 | Linked but not used | ELF64 | No | None | unused-import |
| gt-0014 | Behind disabled feature flag | ELF64 | No | None | config-disabled |
| gt-0015 | Requires privilege escalation | ELF64 | No | None | privilege-gate |
| gt-0016 | Behind authentication check | ELF64 | No | None | auth-gate |
| gt-0017 | Unreachable with CFI enabled | ELF64 | No | None | cfi-prevented |
| gt-0018 | Optimized away by compiler | ELF64 | No | None | dce-eliminated |
| gt-0019 | In unreachable exception handler | ELF64 | No | None | exception-only |
| gt-0020 | Test-only code not in production | ELF64 | No | None | test-code-only |
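In the example manifest, the sink fields constrain one another: the `reachable` sink carries `expectedPaths` and a null `expectedUnreachableReasons`, and the `unreachable` sink carries the reverse. The minimal validator sketched below enforces that pairing under some stated assumptions: the reason codes in the Category B table are the complete vocabulary, every expected path starts at a declared entrypoint, and every path ends at the sink's function name (the `signature` text before the parenthesis). `validate_manifest` and `REASON_CODES` are hypothetical names, not an existing StellaOps API.

```python
# Reason codes from the Category B table (assumed to be the full vocabulary).
REASON_CODES = {
    "no-caller", "dead-branch", "unused-import", "config-disabled",
    "privilege-gate", "auth-gate", "cfi-prevented", "dce-eliminated",
    "exception-only", "test-code-only",
}


def validate_manifest(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means the manifest passed."""
    errors = []
    entrypoints = {e["name"] for e in manifest.get("entrypoints", [])}
    seen = set()
    for sink in manifest.get("sinks", []):
        sid = sink.get("sinkId", "<no sinkId>")
        if sid in seen:
            errors.append(f"{sid}: duplicate sinkId")
        seen.add(sid)
        paths = sink.get("expectedPaths")
        reasons = sink.get("expectedUnreachableReasons")
        # Function name is the signature text before the parameter list.
        target = sink.get("signature", "").split("(", 1)[0]
        if sink.get("expected") == "reachable":
            if reasons is not None:
                errors.append(f"{sid}: reachable sink must have null expectedUnreachableReasons")
            if not paths:
                errors.append(f"{sid}: reachable sink needs at least one expectedPath")
            for path in paths or []:
                if not path or path[0] not in entrypoints:
                    errors.append(f"{sid}: path does not start at a declared entrypoint")
                elif path[-1] != target:
                    errors.append(f"{sid}: path does not end at {target}")
        elif sink.get("expected") == "unreachable":
            if paths is not None:
                errors.append(f"{sid}: unreachable sink must have null expectedPaths")
            if not reasons:
                errors.append(f"{sid}: unreachable sink needs at least one reason")
            for reason in reasons or []:
                if reason not in REASON_CODES:
                    errors.append(f"{sid}: unknown reason code {reason!r}")
        else:
            errors.append(f"{sid}: expected must be 'reachable' or 'unreachable'")
    return errors
```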

---

## Metrics

### Primary Metrics

| Metric | Definition | Target |
|--------|------------|--------|
| **Precision** | TP / (TP + FP) | ≥ 95% |
| **Recall** | TP / (TP + FN) | ≥ 90% |
| **F1 Score** | 2 × (Precision × Recall) / (Precision + Recall) | ≥ 92% |
| **TTFRP** | Time-to-First-Reachable-Path (ms) | p95 < 500ms |
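| **Deterministic Replay** | Share of samples whose proofs are identical across repeated runs | 100% |

Here a *positive* is a `reachable` verdict: a TP is a sink labeled reachable and reported reachable, an FP is a sink labeled unreachable but reported reachable, and an FN is a sink labeled reachable but reported unreachable.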
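The primary metrics can be computed from the per-sink `expected` and `actual` fields shown in the Result Schema. The minimal sketch below treats a `reachable` verdict as the positive class and uses the nearest-rank method for the TTFRP percentile. This document does not specify a percentile method, so nearest-rank is an assumption, and the function names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def primary_metrics(samples: list[dict]) -> dict:
    """Precision/recall/F1 over per-sink results, treating 'reachable' as positive."""
    tp = fp = fn = 0
    for s in samples:
        predicted = s["actual"] == "reachable"
        truth = s["expected"] == "reachable"
        if predicted and truth:
            tp += 1  # reachable sink correctly reported reachable
        elif predicted:
            fp += 1  # unreachable sink wrongly reported reachable
        elif truth:
            fn += 1  # reachable sink reported unreachable (missed)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1_score(precision, recall)}


def percentile_nearest_rank(values: list[float], p: int) -> float:
    """Nearest-rank percentile: the ceil(p/100 * N)-th smallest value (integer p)."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil avoids float rounding
    return ordered[rank - 1]
```

At the target values, precision 0.95 and recall 0.90 give an F1 of about 0.924. The ≥ 92% F1 target is therefore consistent with the other two targets.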

### Regression Gates

CI gates that **fail the build**:
- Precision drops > 1.0 percentage point vs baseline
- Recall drops > 1.0 percentage point vs baseline
- Deterministic replay drops below 100%
- TTFRP p95 increases > 20% vs baseline
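The regression gates compare the current run's metrics against the baseline. The sketch below applies the four rules as written to `metrics` objects shaped like the Result Schema (`precision`, `recall`, `deterministicReplay`, `ttfrp_p95_ms`). It illustrates the rules only and does not reproduce the logic of `stellaops bench check`. `check_regression_gates` is a hypothetical name.

```python
def check_regression_gates(current: dict, baseline: dict) -> list[str]:
    """Return gate failures for two Result Schema `metrics` objects (empty = pass)."""
    failures = []
    for name in ("precision", "recall"):
        # Metrics are fractions, so one percentage point is 0.01; rounding absorbs
        # float noise so that an exact 1.0 pp drop does not trip the "> 1.0 pp" gate.
        drop_pp = round((baseline[name] - current[name]) * 100, 6)
        if drop_pp > 1.0:
            failures.append(f"{name} dropped {drop_pp:.2f} pp vs baseline")
    if current["deterministicReplay"] < 1.0:
        failures.append("deterministic replay below 100%")
    # "> 20% increase" written as cur * 100 > base * 120 to keep integer ms values exact.
    if current["ttfrp_p95_ms"] * 100 > baseline["ttfrp_p95_ms"] * 120:
        failures.append("TTFRP p95 increased more than 20% vs baseline")
    return failures
```

Because the gates trigger only on drops *greater than* 1.0 pp, a baseline precision of 0.96 followed by a run at 0.95 passes. Without the rounding step, the floating-point value of `0.96 - 0.95` is slightly above 0.01 and would trip the gate.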

---

## CI Integration

### Benchmark Job

```yaml
# .gitea/workflows/reachability-bench.yaml
name: Reachability Benchmark
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * *'  # Nightly

jobs:
  benchmark:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4

      # Fix the results path once so later steps cannot resolve a different
      # date if the job runs past midnight.
      - name: Set results path
        run: echo "RESULTS=bench/results/$(date +%Y%m%d).json" >> "$GITHUB_ENV"

      - name: Run corpus benchmark
        run: |
          stellaops bench run \
            --corpus datasets/reachability/ground-truth/ \
            --output "$RESULTS" \
            --baseline bench/baselines/current.json

      - name: Check regression gates
        run: |
          stellaops bench check \
            --results "$RESULTS" \
            --baseline bench/baselines/current.json \
            --precision-threshold 0.95 \
            --recall-threshold 0.90 \
            --determinism-threshold 1.0

      - name: Post results to PR
        # always() keeps the report running when the gate step fails,
        # which is when reviewers most need it.
        if: always() && github.event_name == 'pull_request'
        run: |
          stellaops bench report \
            --results "$RESULTS" \
            --baseline bench/baselines/current.json \
            --format markdown > bench-report.md
          # Post to PR via API
```

### Result Schema

```json
{
  "runId": "bench-20251217-001",
  "timestamp": "2025-12-17T02:00:00Z",
  "corpusVersion": "1.0.0",
  "scannerVersion": "1.3.0",
  "metrics": {
    "precision": 0.96,
    "recall": 0.91,
    "f1": 0.934,
    "ttfrp_p50_ms": 120,
    "ttfrp_p95_ms": 380,
    "deterministicReplay": 1.0
  },
  "samples": [
    {
      "sampleId": "gt-0001",
      "sinkId": "sink-001",
      "expected": "reachable",
      "actual": "reachable",
      "pathFound": ["main", "process_input", "parse_data", "vulnerable_function"],
      "proofHash": "sha256:abc123...",
      "ttfrpMs": 95
    }
  ],
  "regressions": [],
  "improvements": []
}
```
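To derive `deterministicReplay`, run the benchmark several times on the same corpus and compare the `proofHash` values for each (`sampleId`, `sinkId`) pair. In the minimal sketch below, a sink counts as deterministic only if it appears in every run and always has the same hash. That counting rule and the `deterministic_replay_rate` name are assumptions, not a documented StellaOps definition.

```python
from collections import defaultdict


def deterministic_replay_rate(runs: list[dict]) -> float:
    """Fraction of (sampleId, sinkId) pairs with one identical proofHash in every run."""
    hashes = defaultdict(set)
    appearances = defaultdict(int)
    for run in runs:
        for s in run["samples"]:
            key = (s["sampleId"], s["sinkId"])
            hashes[key].add(s.get("proofHash"))  # a missing hash is recorded as None
            appearances[key] += 1
    if not hashes:
        raise ValueError("no samples to compare")
    stable = sum(
        1 for key, seen in hashes.items()
        if len(seen) == 1 and appearances[key] == len(runs)
    )
    return stable / len(hashes)
```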

---

## Corpus Maintenance

### Adding New Samples

1. Create sample binary with known sink reachability
2. Write `sample.manifest.json` with ground-truth annotations
3. Place in `datasets/reachability/ground-truth/{category}/`
4. Update corpus version in `datasets/reachability/corpus.json`
5. Re-run the benchmark on the updated corpus, then refresh the baseline with `stellaops bench baseline update` (see Updating Baselines)
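Step 4 does not say how the corpus version should change. The small helper below assumes two things: `corpus.json` stores a semantic version under a top-level `version` key, and adding samples warrants a minor bump. Both assumptions are unconfirmed by this document, and the function names are hypothetical.

```python
import json
from pathlib import Path


def bump_minor(version: str) -> str:
    """'1.0.0' -> '1.1.0': new samples treated as a backward-compatible addition."""
    major, minor, _patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


def bump_corpus_version(corpus_file: Path) -> str:
    """Rewrite the assumed top-level 'version' key in corpus.json and return it."""
    data = json.loads(corpus_file.read_text(encoding="utf-8"))
    data["version"] = bump_minor(data["version"])
    # indent=2 plus a trailing newline keeps diffs small; existing key order is preserved.
    corpus_file.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return data["version"]
```

Under these assumptions, it would be invoked as `bump_corpus_version(Path("datasets/reachability/corpus.json"))`.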

### Updating Baselines

When scanner improvements are validated:
```bash
stellaops bench baseline update \
  --results bench/results/latest.json \
  --output bench/baselines/current.json
```

### Sample Categories

- `basic/` — Simple direct call chains
- `indirect/` — Function pointers, vtables, callbacks
- `stripped/` — Symbol-stripped binaries
- `obfuscated/` — Control-flow obfuscation, packing
- `guarded/` — Config/auth/privilege guards
- `multiarch/` — ARM64, x86, RISC-V variants

---

## Related Documentation

- [Reachability Analysis Technical Reference](../product-advisories/14-Dec-2025%20-%20Reachability%20Analysis%20Technical%20Reference.md)
- [Determinism and Reproducibility Technical Reference](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md)
- [Scanner Benchmark Submission Guide](submission-guide.md)