# Ground Truth Schema for Reachability Datasets

> **Status:** Design v1 (Sprint 0401)
> **Owners:** Scanner Guild, Signals Guild, Quality Guild

This document defines the ground truth schema for test datasets used to validate reachability analysis. Ground truth samples provide known-correct answers for benchmarking lattice state calculations, path discovery, and policy gate decisions.

---

## 1. Purpose

Ground truth datasets enable:

1. **Regression testing:** Detect regressions in reachability analysis accuracy
2. **Benchmark scoring:** Measure precision, recall, F1 for path discovery
3. **Lattice validation:** Verify join/meet operations produce expected states
4. **Policy gate testing:** Ensure gates block/allow correct VEX transitions

---

## 2. Dataset Structure

### 2.1 Directory Layout

```
datasets/reachability/
├── samples/
│   ├── java/
│   │   ├── vulnerable-log4j/
│   │   │   ├── manifest.json          # Sample metadata
│   │   │   ├── richgraph-v1.json      # Input callgraph
│   │   │   ├── ground-truth.json      # Expected outcomes
│   │   │   └── artifacts/             # Source binaries/SBOMs
│   │   └── safe-spring-boot/
│   │       └── ...
│   ├── native/
│   │   ├── stripped-elf/
│   │   └── openssl-vuln/
│   └── polyglot/
│       └── node-native-addon/
├── corpus/
│   ├── positive/                      # Known reachable samples
│   ├── negative/                      # Known unreachable samples
│   └── contested/                     # Known conflict samples
└── schema/
    ├── manifest.schema.json
    └── ground-truth.schema.json
```

### 2.2 Sample Manifest (`manifest.json`)

```json
{
  "sampleId": "sample:java:vulnerable-log4j:001",
  "version": "1.0.0",
  "createdAt": "2025-12-13T10:00:00Z",
  "language": "java",
  "category": "positive",
  "description": "Log4Shell CVE-2021-44228 reachable via JNDI lookup in logging path",
  "source": {
    "repository": "https://github.com/example/vuln-app",
    "commit": "abc123...",
    "buildToolchain": "maven:3.9.0,jdk:17"
  },
  "vulnerabilities": [
    {
      "vulnId": "CVE-2021-44228",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "affectedSymbol": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup"
    }
  ],
  "artifacts": [
    {
      "path": "artifacts/app.jar",
      "hash": "sha256:...",
      "type": "application/java-archive"
    },
    {
      "path": "artifacts/sbom.cdx.json",
      "hash": "sha256:...",
      "type": "application/vnd.cyclonedx+json"
    }
  ]
}
```

### 2.3 Ground Truth Document (`ground-truth.json`)

```json
{
  "schema": "ground-truth-v1",
  "sampleId": "sample:java:vulnerable-log4j:001",
  "generatedAt": "2025-12-13T10:00:00Z",
  "generator": {
    "name": "manual-annotation",
    "version": "1.0.0",
    "annotator": "security-team"
  },
  "targets": [
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CR",
        "bucket": "direct",
        "reachable": true,
        "confidence": 0.95,
        "pathLength": 3,
        "path": [
          "sym:java:...main",
          "sym:java:...logInfo",
          "sym:java:...JndiLookup.lookup"
        ]
      },
      "reasoning": "Direct call path from main() through logging framework to vulnerable lookup method"
    },
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.net.JndiManager.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CU",
        "bucket": "unreachable",
        "reachable": false,
        "confidence": 0.90,
        "pathLength": null,
        "path": null
      },
      "reasoning": "JndiManager.lookup is present but not called from any reachable entry point"
    }
  ],
  "entryPoints": [
    {
      "symbolId": "sym:java:...",
      "display": "com.example.app.Main.main",
      "phase": "runtime",
      "source": "manifest"
    }
  ],
  "expectedUncertainty": {
    "states": [],
    "aggregateTier": "T4",
    "riskScore": 0.0
  },
  "expectedGateDecisions": [
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "not_affected",
      "expectedDecision": "block",
      "expectedBlockedBy": "LatticeState",
      "expectedReason": "CR state incompatible with not_affected"
    },
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "affected",
      "expectedDecision": "allow"
    }
  ]
}
```
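
A document of this shape can be read with `System.Text.Json`. The following is a minimal sketch rather than the project's actual loader: the `GroundTruthReader` class and the tuple it returns are hypothetical, and it extracts only the symbol ID, lattice state, and reachability flag of each target.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for illustration; not part of the documented codebase.
public static class GroundTruthReader
{
    public static List<(string SymbolId, string LatticeState, bool Reachable)> ReadTargets(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        // Reject documents that do not declare the schema identifier shown above.
        var schema = root.GetProperty("schema").GetString();
        if (schema != "ground-truth-v1")
            throw new NotSupportedException($"Unsupported schema '{schema}'");

        var targets = new List<(string SymbolId, string LatticeState, bool Reachable)>();
        foreach (var target in root.GetProperty("targets").EnumerateArray())
        {
            var expected = target.GetProperty("expected");
            targets.Add((
                target.GetProperty("symbolId").GetString() ?? string.Empty,
                expected.GetProperty("latticeState").GetString() ?? string.Empty,
                expected.GetProperty("reachable").GetBoolean()));
        }
        return targets;
    }
}
```

`GetProperty` throws `KeyNotFoundException` when a required field is missing, which is acceptable for a test fixture loader; a production loader would more likely deserialize into typed models and report all problems at once.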

---

## 3. Schema Definitions

### 3.1 Ground Truth Target

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `symbolId` | string | Yes | Canonical SymbolID (`sym:{lang}:{hash}`) |
| `display` | string | No | Human-readable symbol name |
| `purl` | string | No | Package URL of containing package |
| `expected.latticeState` | enum | Yes | Expected v1 lattice state: `U`, `SR`, `SU`, `RO`, `RU`, `CR`, `CU`, `X` |
| `expected.bucket` | enum | Yes | Expected v0 bucket (backward compat) |
| `expected.reachable` | boolean | Yes | True if symbol is reachable from any entry point |
| `expected.confidence` | number | Yes | Expected confidence score [0.0-1.0] |
| `expected.pathLength` | number or null | No | Expected path length (null if unreachable) |
| `expected.path` | string[] or null | No | Expected path as an ordered, deterministic list of SymbolIDs from entry point to target (null if unreachable) |
| `reasoning` | string | Yes | Human explanation of expected outcome |
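
The fields in the table above imply a few cross-field invariants that JSON Schema alone does not express well. The sketch below models a target as C# records and checks those invariants. All type and member names are hypothetical, and two rules are assumptions drawn from the example in section 2.3 rather than from a stated requirement: an unreachable target carries neither `pathLength` nor `path`, and `pathLength` equals the number of symbols in `path`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical models mirroring the table above; names are illustrative only.
public sealed record ExpectedOutcome(
    string LatticeState,
    string Bucket,
    bool Reachable,
    double Confidence,
    int? PathLength,
    IReadOnlyList<string>? Path);

public sealed record GroundTruthTarget(
    string SymbolId,
    ExpectedOutcome Expected,
    string Reasoning,
    string? Display = null,
    string? Purl = null);

public static class GroundTruthChecks
{
    // The lattice state codes listed for expected.latticeState.
    private static readonly HashSet<string> LatticeStates =
        new() { "U", "SR", "SU", "RO", "RU", "CR", "CU", "X" };

    // Returns human-readable problems; an empty list means the target is consistent.
    public static IReadOnlyList<string> Check(GroundTruthTarget target)
    {
        var problems = new List<string>();
        var e = target.Expected;

        if (!LatticeStates.Contains(e.LatticeState))
            problems.Add($"unknown latticeState '{e.LatticeState}'");

        // Written as a negated range test so that NaN is rejected as well.
        if (!(e.Confidence >= 0.0 && e.Confidence <= 1.0))
            problems.Add("confidence outside [0.0, 1.0]");

        if (string.IsNullOrWhiteSpace(target.Reasoning))
            problems.Add("reasoning is required");

        // Assumption: unreachable targets carry no path information.
        if (!e.Reachable && (e.PathLength is not null || e.Path is not null))
            problems.Add("unreachable target must have null pathLength and path");

        // Assumption: pathLength counts the symbols listed in path.
        if (e.Path is not null && e.PathLength is not null && e.Path.Count != e.PathLength)
            problems.Add("pathLength does not match the number of path elements");

        return problems;
    }
}
```

Returning a list of problems instead of throwing on the first one lets an annotator fix a sample in a single pass.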

### 3.2 Expected Gate Decision

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `vulnId` | string | Yes | Vulnerability identifier |
| `targetSymbol` | string | Yes | Target SymbolID |
| `requestedStatus` | enum | Yes | VEX status: `affected`, `not_affected`, `under_investigation`, `fixed` |
| `expectedDecision` | enum | Yes | Gate outcome: `allow`, `block`, `warn` |
| `expectedBlockedBy` | string | No | Gate name if blocked |
| `expectedReason` | string | No | Expected reason message |

---

## 4. Sample Categories

### 4.1 Positive Samples (Reachable)

Known-reachable cases where vulnerable code is called:

- **direct-call:** Vulnerable function called directly from entry point
- **transitive:** Multi-hop path from entry point to vulnerable function
- **runtime-observed:** Confirmed reachable via runtime probe
- **init-array:** Reachable via load-time constructor

### 4.2 Negative Samples (Unreachable)

Known-unreachable cases where vulnerable code exists but isn't called:

- **dead-code:** Function present but never invoked
- **conditional-unreachable:** Function behind impossible condition
- **test-only:** Function only reachable from test entry points
- **deprecated-api:** Old API present but replaced by new implementation

### 4.3 Contested Samples

Cases where static and runtime evidence conflict:

- **static-reach-runtime-miss:** Static analysis finds a path, but runtime never observes execution
- **static-miss-runtime-hit:** Static analysis misses the path, but runtime observes execution
- **version-mismatch:** Analysis version differs from runtime version

---

## 5. Benchmark Metrics

### 5.1 Path Discovery Metrics

```
Precision = TruePositive / (TruePositive + FalsePositive)
Recall    = TruePositive / (TruePositive + FalseNegative)
F1        = 2 * (Precision * Recall) / (Precision + Recall)
```
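
The formulas above translate directly into code. This is a minimal sketch with a hypothetical `PathDiscoveryMetrics` class; returning 0 when a denominator is 0 is a convention chosen for the sketch, not something the schema prescribes.

```csharp
using System;

// Hypothetical helper; class and method names are illustrative.
public static class PathDiscoveryMetrics
{
    // Fraction of reported paths that are correct. Returns 0 if nothing was reported.
    public static double Precision(int truePositive, int falsePositive)
    {
        int reported = truePositive + falsePositive;
        return reported == 0 ? 0.0 : (double)truePositive / reported;
    }

    // Fraction of expected paths that were found. Returns 0 if nothing was expected.
    public static double Recall(int truePositive, int falseNegative)
    {
        int expected = truePositive + falseNegative;
        return expected == 0 ? 0.0 : (double)truePositive / expected;
    }

    // Harmonic mean of precision and recall. Returns 0 if both are 0.
    public static double F1(double precision, double recall)
    {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * (precision * recall) / sum;
    }
}
```

For example, 8 true positives with 2 false positives and 2 false negatives give a precision and recall of 0.8 each, and an F1 of approximately 0.8.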

### 5.2 Lattice State Accuracy

```
StateAccuracy = CorrectStates / TotalTargets
BucketAccuracy = CorrectBuckets / TotalTargets  (v0 compatibility)
```

### 5.3 Gate Decision Accuracy

```
GateAccuracy = CorrectDecisions / TotalGateTests
FalseAllow   = AllowedWhenShouldBlock / TotalBlocks  (critical metric)
FalseBlock   = BlockedWhenShouldAllow / TotalAllows
```
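
The gate metrics can be computed from pairs of expected and actual decisions. The sketch below assumes that `TotalBlocks` and `TotalAllows` count the tests whose *expected* decision is `block` or `allow` respectively, which is one reading of the formulas above. The `GateDecision`, `GateTestResult`, and `GateMetrics` names are hypothetical, and the code requires C# 10 or later for `record struct`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only.
public enum GateDecision { Allow, Block, Warn }

public readonly record struct GateTestResult(GateDecision Expected, GateDecision Actual);

public static class GateMetrics
{
    public static (double GateAccuracy, double FalseAllow, double FalseBlock) Compute(
        IReadOnlyCollection<GateTestResult> results)
    {
        int correct = results.Count(r => r.Expected == r.Actual);

        // Assumption: denominators count tests by their expected decision.
        int totalBlocks = results.Count(r => r.Expected == GateDecision.Block);
        int totalAllows = results.Count(r => r.Expected == GateDecision.Allow);

        int falseAllow = results.Count(
            r => r.Expected == GateDecision.Block && r.Actual == GateDecision.Allow);
        int falseBlock = results.Count(
            r => r.Expected == GateDecision.Allow && r.Actual == GateDecision.Block);

        return (
            Ratio(correct, results.Count),
            Ratio(falseAllow, totalBlocks),
            Ratio(falseBlock, totalAllows));
    }

    // Returns 0 for an empty denominator (a convention chosen for this sketch).
    private static double Ratio(int numerator, int denominator) =>
        denominator == 0 ? 0.0 : (double)numerator / denominator;
}
```

A `warn` result for an expected `block` lowers `GateAccuracy` in this sketch but does not count as a false allow; whether that matches the intended policy is left open here.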

---

## 6. Test Harness Integration

### 6.1 xUnit Test Pattern

```csharp
[Theory]
[MemberData(nameof(GetGroundTruthSamples))]
public async Task ReachabilityAnalysis_MatchesGroundTruth(GroundTruthSample sample)
{
    // Arrange
    var graph = await LoadRichGraphAsync(sample.GraphPath);
    var scorer = _serviceProvider.GetRequiredService<ReachabilityScoringService>();

    // Act
    var result = await scorer.ComputeAsync(graph, sample.EntryPoints);

    // Assert
    foreach (var target in sample.Targets)
    {
        // Assert.Single fails with a clear message if the target is missing or duplicated.
        var actual = Assert.Single(result.States, s => s.SymbolId == target.SymbolId);
        Assert.Equal(target.Expected.LatticeState, actual.LatticeState);
        Assert.Equal(target.Expected.Reachable, actual.Reachable);
        // Confidence may deviate from the annotated value by at most 0.05.
        Assert.InRange(actual.Confidence,
            target.Expected.Confidence - 0.05,
            target.Expected.Confidence + 0.05);
    }
}
```
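
The `GetGroundTruthSamples` member referenced by `[MemberData]` is not shown above. One way to supply it is to walk the dataset directory and yield one entry per sample. The sketch below is an assumption about how discovery could work, not the project's implementation: `SamplePaths` and `GroundTruthDiscovery` are hypothetical, and the file names follow the layout in section 2.1.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical record; the real GroundTruthSample type is not defined in this document.
public sealed record SamplePaths(string SampleDirectory, string GraphPath, string GroundTruthPath);

public static class GroundTruthDiscovery
{
    // Yields one object[] per sample directory, in the shape xUnit MemberData expects.
    public static IEnumerable<object[]> EnumerateSamples(string datasetRoot)
    {
        if (!Directory.Exists(datasetRoot))
            yield break;

        // Ordinal ordering keeps the test case order stable across platforms.
        var groundTruthFiles = Directory
            .EnumerateFiles(datasetRoot, "ground-truth.json", SearchOption.AllDirectories)
            .OrderBy(path => path, StringComparer.Ordinal);

        foreach (var file in groundTruthFiles)
        {
            var directory = Path.GetDirectoryName(file) ?? datasetRoot;
            var graph = Path.Combine(directory, "richgraph-v1.json");

            // Samples without an input graph are skipped in this sketch.
            if (!File.Exists(graph))
                continue;

            yield return new object[] { new SamplePaths(directory, graph, file) };
        }
    }
}
```

A test class could expose this through a static `GetGroundTruthSamples()` method that calls `EnumerateSamples("datasets/reachability/samples")` and loads each `SamplePaths` entry into the sample type the test expects.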

### 6.2 Benchmark Runner

```bash
# Run reachability benchmarks
dotnet run --project src/Scanner/__Tests/StellaOps.Scanner.Reachability.Benchmarks \
  --dataset datasets/reachability/samples \
  --output benchmark-results.json \
  --threshold-f1 0.95 \
  --threshold-gate-accuracy 0.99
```

---

## 7. Sample Contribution Guidelines

### 7.1 Adding New Samples

1. Create directory under `datasets/reachability/samples/{language}/{sample-name}/`
2. Add `manifest.json` with sample metadata
3. Add `richgraph-v1.json`, generated by running the scanner on the sample's artifacts
4. Create `ground-truth.json` with manual annotations
5. Include reasoning for each expected outcome
6. Run validation: `dotnet test --filter "GroundTruth"`

### 7.2 Ground Truth Validation

Ground truth files must pass schema validation:

```bash
npx ajv validate -s datasets/reachability/schema/ground-truth.schema.json \
  -d "datasets/reachability/samples/**/ground-truth.json"
```

### 7.3 Review Requirements

- All samples require two independent annotators
- Contested samples require security team review
- Changes to existing samples must pass the regression tests

---

## 8. Related Documents

- [Lattice Model](./lattice.md) — v1 formal 7-state lattice
- [Policy Gates](./policy-gate.md) — Gate rules for VEX decisions
- [Evidence Schema](./evidence-schema.md) — richgraph-v1 schema
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) — Full schema specification

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial design from Sprint 0401 |