# Ground Truth Schema for Reachability Datasets

> **Status:** Design v1 (Sprint 0401)
> **Owners:** Scanner Guild, Signals Guild, Quality Guild

This document defines the ground truth schema for test datasets used to validate reachability analysis. Ground truth samples provide known-correct answers for benchmarking lattice state calculations, path discovery, and policy gate decisions.

---

## 1. Purpose

Ground truth datasets enable:

1. **Regression testing:** Detect regressions in reachability analysis accuracy
2. **Benchmark scoring:** Measure precision, recall, F1 for path discovery
3. **Lattice validation:** Verify join/meet operations produce expected states
4. **Policy gate testing:** Ensure gates block/allow correct VEX transitions

---

## 2. Dataset Structure

### 2.1 Directory Layout

```
datasets/reachability/
├── samples/
│   ├── java/
│   │   ├── vulnerable-log4j/
│   │   │   ├── manifest.json         # Sample metadata
│   │   │   ├── richgraph-v1.json     # Input callgraph
│   │   │   ├── ground-truth.json     # Expected outcomes
│   │   │   └── artifacts/            # Source binaries/SBOMs
│   │   └── safe-spring-boot/
│   │       └── ...
│   ├── native/
│   │   ├── stripped-elf/
│   │   └── openssl-vuln/
│   └── polyglot/
│       └── node-native-addon/
├── corpus/
│   ├── positive/                     # Known reachable samples
│   ├── negative/                     # Known unreachable samples
│   └── contested/                    # Known conflict samples
└── schema/
    ├── manifest.schema.json
    └── ground-truth.schema.json
```

### 2.2 Sample Manifest (`manifest.json`)

```json
{
  "sampleId": "sample:java:vulnerable-log4j:001",
  "version": "1.0.0",
  "createdAt": "2025-12-13T10:00:00Z",
  "language": "java",
  "category": "positive",
  "description": "Log4Shell CVE-2021-44228 reachable via JNDI lookup in logging path",
  "source": {
    "repository": "https://github.com/example/vuln-app",
    "commit": "abc123...",
    "buildToolchain": "maven:3.9.0,jdk:17"
  },
  "vulnerabilities": [
    {
      "vulnId": "CVE-2021-44228",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "affectedSymbol": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup"
    }
  ],
  "artifacts": [
    {
      "path": "artifacts/app.jar",
      "hash": "sha256:...",
      "type": "application/java-archive"
    },
    {
      "path": "artifacts/sbom.cdx.json",
      "hash": "sha256:...",
      "type": "application/vnd.cyclonedx+json"
    }
  ]
}
```

### 2.3 Ground Truth Document (`ground-truth.json`)

```json
{
  "schema": "ground-truth-v1",
  "sampleId": "sample:java:vulnerable-log4j:001",
  "generatedAt": "2025-12-13T10:00:00Z",
  "generator": {
    "name": "manual-annotation",
    "version": "1.0.0",
    "annotator": "security-team"
  },
  "targets": [
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CR",
        "bucket": "direct",
        "reachable": true,
        "confidence": 0.95,
        "pathLength": 3,
        "path": [
          "sym:java:...main",
          "sym:java:...logInfo",
          "sym:java:...JndiLookup.lookup"
        ]
      },
      "reasoning": "Direct call path from main() through logging framework to vulnerable lookup method"
    },
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.net.JndiManager.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CU",
        "bucket": "unreachable",
        "reachable": false,
        "confidence": 0.90,
        "pathLength": null,
        "path": null
      },
      "reasoning": "JndiManager.lookup is present but not called from any reachable entry point"
    }
  ],
  "entryPoints": [
    {
      "symbolId": "sym:java:...",
      "display": "com.example.app.Main.main",
      "phase": "runtime",
      "source": "manifest"
    }
  ],
  "expectedUncertainty": {
    "states": [],
    "aggregateTier": "T4",
    "riskScore": 0.0
  },
  "expectedGateDecisions": [
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "not_affected",
      "expectedDecision": "block",
      "expectedBlockedBy": "LatticeState",
      "expectedReason": "CR state incompatible with not_affected"
    },
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "affected",
      "expectedDecision": "allow"
    }
  ]
}
```

---

## 3. Schema Definitions

### 3.1 Ground Truth Target

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `symbolId` | string | Yes | Canonical SymbolID (`sym:{lang}:{hash}`) |
| `display` | string | No | Human-readable symbol name |
| `purl` | string | No | Package URL of containing package |
| `expected.latticeState` | enum | Yes | Expected v1 lattice state: `U`, `SR`, `SU`, `RO`, `RU`, `CR`, `CU`, `X` |
| `expected.bucket` | enum | Yes | Expected v0 bucket (backward compat) |
| `expected.reachable` | boolean | Yes | True if symbol is reachable from any entry point |
| `expected.confidence` | number | Yes | Expected confidence score [0.0-1.0] |
| `expected.pathLength` | number | No | Expected path length (null if unreachable) |
| `expected.path` | string[] | No | Expected path, ordered from entry point to target (deterministic; null if unreachable) |
| `reasoning` | string | Yes | Human explanation of expected outcome |

### 3.2 Expected Gate Decision

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `vulnId` | string | Yes | Vulnerability identifier |
| `targetSymbol` | string | Yes | Target SymbolID |
| `requestedStatus` | enum | Yes | VEX status: `affected`, `not_affected`, `under_investigation`, `fixed` |
| `expectedDecision` | enum | Yes | Gate outcome: `allow`, `block`, `warn` |
| `expectedBlockedBy` | string | No | Gate name if blocked |
| `expectedReason` | string | No | Expected reason message |

---

## 4. Sample Categories

### 4.1 Positive Samples (Reachable)

Known-reachable cases where vulnerable code is called:

- **direct-call:** Vulnerable function called directly from entry point
- **transitive:** Multi-hop path from entry point to vulnerable function
- **runtime-observed:** Confirmed reachable via runtime probe
- **init-array:** Reachable via load-time constructor

### 4.2 Negative Samples (Unreachable)

Known-unreachable cases where vulnerable code exists but isn't called:

- **dead-code:** Function present but never invoked
- **conditional-unreachable:** Function behind impossible condition
- **test-only:** Function only reachable from test entry points
- **deprecated-api:** Old API present but replaced by new implementation

### 4.3 Contested Samples

Cases where static and runtime evidence conflict:

- **static-reach-runtime-miss:** Static analysis finds path, runtime never observes
- **static-miss-runtime-hit:** Static analysis misses path, runtime observes execution
- **version-mismatch:** Analysis version differs from runtime version

---

## 5. Benchmark Metrics

### 5.1 Path Discovery Metrics

```
Precision = TruePositive / (TruePositive + FalsePositive)
Recall    = TruePositive / (TruePositive + FalseNegative)
F1        = 2 * (Precision * Recall) / (Precision + Recall)
```

### 5.2 Lattice State Accuracy

```
StateAccuracy  = CorrectStates / TotalTargets
BucketAccuracy = CorrectBuckets / TotalTargets   (v0 compatibility)
```

### 5.3 Gate Decision Accuracy

```
GateAccuracy = CorrectDecisions / TotalGateTests
FalseAllow   = AllowedWhenShouldBlock / TotalBlocks   (critical metric)
FalseBlock   = BlockedWhenShouldAllow / TotalAllows
```

---

## 6. Test Harness Integration

### 6.1 xUnit Test Pattern

```csharp
[Theory]
[MemberData(nameof(GetGroundTruthSamples))]
public async Task ReachabilityAnalysis_MatchesGroundTruth(GroundTruthSample sample)
{
    // Arrange
    var graph = await LoadRichGraphAsync(sample.GraphPath);
    // The generic type argument is missing in the source; IReachabilityScorer is an
    // assumed placeholder name. Substitute the scorer type registered in the container.
    var scorer = _serviceProvider.GetRequiredService<IReachabilityScorer>();

    // Act
    var result = await scorer.ComputeAsync(graph, sample.EntryPoints);

    // Assert: state and reachability must match exactly; confidence within ±0.05
    foreach (var target in sample.Targets)
    {
        var actual = result.States.First(s => s.SymbolId == target.SymbolId);
        Assert.Equal(target.Expected.LatticeState, actual.LatticeState);
        Assert.Equal(target.Expected.Reachable, actual.Reachable);
        Assert.InRange(
            actual.Confidence,
            target.Expected.Confidence - 0.05,
            target.Expected.Confidence + 0.05);
    }
}
```

### 6.2 Benchmark Runner

```bash
# Run reachability benchmarks
dotnet run --project src/Scanner/__Tests/StellaOps.Scanner.Reachability.Benchmarks \
  --dataset datasets/reachability/samples \
  --output benchmark-results.json \
  --threshold-f1 0.95 \
  --threshold-gate-accuracy 0.99
```

---

## 7. Sample Contribution Guidelines

### 7.1 Adding New Samples

1. Create directory under `datasets/reachability/samples/{language}/{sample-name}/`
2. Add `manifest.json` with sample metadata
3. Add `richgraph-v1.json` (run scanner on artifacts)
4. Create `ground-truth.json` with manual annotations
5. Include reasoning for each expected outcome
6. Run validation: `dotnet test --filter "GroundTruth"`

### 7.2 Ground Truth Validation

Ground truth files must pass schema validation:

```bash
npx ajv validate -s docs/reachability/ground-truth.schema.json \
  -d datasets/reachability/samples/**/ground-truth.json
```

### 7.3 Review Requirements

- All samples require two independent annotators
- Contested samples require security team review
- Changes to existing samples require regression test pass

---

## 8. Related Documents

- [Lattice Model](./lattice.md): v1 formal 7-state lattice
- [Policy Gates](./policy-gate.md): Gate rules for VEX decisions
- [Evidence Schema](./evidence-schema.md): richgraph-v1 schema
- [richgraph-v1 Contract](../contracts/richgraph-v1.md): Full schema specification

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial design from Sprint 0401 |
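
---

## Appendix: Metric Computation Sketch

The formulas in Section 5 can be illustrated with a short script. This is a minimal sketch, not the benchmark runner's implementation. It rests on several assumptions that the schema does not state: a true positive is a target whose `expected.reachable` is true and which the analyzer also reports as reachable; a ratio with a zero denominator is reported as `0.0`; and the names `TargetResult`, `GateResult`, `path_metrics`, `state_accuracy`, and `gate_metrics` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetResult:
    """One ground-truth target paired with the analyzer's answer (hypothetical type)."""
    expected_reachable: bool
    actual_reachable: bool
    expected_state: str  # e.g. "CR", "CU"
    actual_state: str


@dataclass(frozen=True)
class GateResult:
    """One expected gate decision paired with the gate's answer (hypothetical type)."""
    expected_decision: str  # "allow", "block", or "warn"
    actual_decision: str


def safe_div(numerator, denominator):
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def path_metrics(results):
    """Precision, recall, and F1 over the per-target reachable flag (Section 5.1)."""
    tp = sum(r.expected_reachable and r.actual_reachable for r in results)
    fp = sum((not r.expected_reachable) and r.actual_reachable for r in results)
    fn = sum(r.expected_reachable and (not r.actual_reachable) for r in results)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def state_accuracy(results):
    """CorrectStates / TotalTargets (Section 5.2)."""
    correct = sum(r.expected_state == r.actual_state for r in results)
    return safe_div(correct, len(results))


def gate_metrics(results):
    """GateAccuracy, FalseAllow, and FalseBlock (Section 5.3)."""
    correct = sum(r.expected_decision == r.actual_decision for r in results)
    blocks = [r for r in results if r.expected_decision == "block"]
    allows = [r for r in results if r.expected_decision == "allow"]
    false_allow = sum(r.actual_decision == "allow" for r in blocks)
    false_block = sum(r.actual_decision == "block" for r in allows)
    return {
        "gate_accuracy": safe_div(correct, len(results)),
        "false_allow": safe_div(false_allow, len(blocks)),
        "false_block": safe_div(false_block, len(allows)),
    }
```

In use, each entry of `targets` and `expectedGateDecisions` in a `ground-truth.json` file would be paired with the analyzer's output for the same symbol and passed to these functions. The resulting `f1` and `gate_accuracy` values correspond to the quantities that the `--threshold-f1` and `--threshold-gate-accuracy` options in Section 6.2 are named after.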