Ground Truth Schema for Reachability Datasets
Status: Design v1 (Sprint 0401)
Owners: Scanner Guild, Signals Guild, Quality Guild
This document defines the ground truth schema for test datasets used to validate reachability analysis. Ground truth samples provide known-correct answers for benchmarking lattice state calculations, path discovery, and policy gate decisions.
1. Purpose
Ground truth datasets enable:
- Regression testing: Detect regressions in reachability analysis accuracy
- Benchmark scoring: Measure precision, recall, F1 for path discovery
- Lattice validation: Verify join/meet operations produce expected states
- Policy gate testing: Ensure gates block/allow correct VEX transitions
2. Dataset Structure
2.1 Directory Layout
datasets/reachability/
├── samples/
│   ├── java/
│   │   ├── vulnerable-log4j/
│   │   │   ├── manifest.json        # Sample metadata
│   │   │   ├── richgraph-v1.json    # Input callgraph
│   │   │   ├── ground-truth.json    # Expected outcomes
│   │   │   └── artifacts/           # Source binaries/SBOMs
│   │   └── safe-spring-boot/
│   │       └── ...
│   ├── native/
│   │   ├── stripped-elf/
│   │   └── openssl-vuln/
│   └── polyglot/
│       └── node-native-addon/
├── corpus/
│   ├── positive/                    # Known reachable samples
│   ├── negative/                    # Known unreachable samples
│   └── contested/                   # Known conflict samples
└── schema/
    ├── manifest.schema.json
    └── ground-truth.schema.json
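The layout above can be checked mechanically before any benchmark runs. The following is a minimal Python sketch, not part of the harness: the function name `find_samples` is hypothetical, and it assumes every sample directory sits exactly two levels below `samples/` ({language}/{sample-name}) and must contain the three JSON files shown for `vulnerable-log4j`.

```python
from pathlib import Path

# Files each sample directory is expected to contain (taken from the layout above).
REQUIRED_FILES = ("manifest.json", "richgraph-v1.json", "ground-truth.json")


def find_samples(root):
    """Yield (sample_dir, missing_files) for every samples/{language}/{sample-name}/ directory."""
    samples_root = Path(root) / "samples"
    for language_dir in sorted(p for p in samples_root.iterdir() if p.is_dir()):
        for sample_dir in sorted(p for p in language_dir.iterdir() if p.is_dir()):
            missing = [name for name in REQUIRED_FILES
                       if not (sample_dir / name).is_file()]
            yield sample_dir, missing
```

Calling `find_samples("datasets/reachability")` and reporting every entry with a non-empty `missing` list gives a quick completeness check for newly contributed samples.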
2.2 Sample Manifest (manifest.json)
{
  "sampleId": "sample:java:vulnerable-log4j:001",
  "version": "1.0.0",
  "createdAt": "2025-12-13T10:00:00Z",
  "language": "java",
  "category": "positive",
  "description": "Log4Shell CVE-2021-44228 reachable via JNDI lookup in logging path",
  "source": {
    "repository": "https://github.com/example/vuln-app",
    "commit": "abc123...",
    "buildToolchain": "maven:3.9.0,jdk:17"
  },
  "vulnerabilities": [
    {
      "vulnId": "CVE-2021-44228",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "affectedSymbol": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup"
    }
  ],
  "artifacts": [
    {
      "path": "artifacts/app.jar",
      "hash": "sha256:...",
      "type": "application/java-archive"
    },
    {
      "path": "artifacts/sbom.cdx.json",
      "hash": "sha256:...",
      "type": "application/vnd.cyclonedx+json"
    }
  ]
}
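Because each artifact entry carries a hash, a sample can be checked for tampering or stale binaries before it is used. The sketch below is illustrative only: `verify_artifacts` is a hypothetical helper, and it assumes the `hash` field has the form `sha256:` followed by the lowercase or uppercase hex digest of the file (the example above elides the digest, so the exact encoding is an assumption).

```python
import hashlib
import json
from pathlib import Path


def verify_artifacts(sample_dir):
    """Return (artifact_path, problem) tuples for artifacts that fail verification."""
    sample_dir = Path(sample_dir)
    manifest = json.loads((sample_dir / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for artifact in manifest.get("artifacts", []):
        # Assumed format: "<algorithm>:<hex digest>", e.g. "sha256:ab12...".
        algorithm, _, expected_hex = artifact["hash"].partition(":")
        file_path = sample_dir / artifact["path"]
        if algorithm != "sha256":
            problems.append((artifact["path"], f"unsupported hash algorithm: {algorithm}"))
        elif not file_path.is_file():
            problems.append((artifact["path"], "file missing"))
        elif hashlib.sha256(file_path.read_bytes()).hexdigest() != expected_hex.lower():
            problems.append((artifact["path"], "digest mismatch"))
    return problems
```

An empty return value means every listed artifact exists and matches its recorded digest.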
2.3 Ground Truth Document (ground-truth.json)
{
  "schema": "ground-truth-v1",
  "sampleId": "sample:java:vulnerable-log4j:001",
  "generatedAt": "2025-12-13T10:00:00Z",
  "generator": {
    "name": "manual-annotation",
    "version": "1.0.0",
    "annotator": "security-team"
  },
  "targets": [
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CR",
        "bucket": "direct",
        "reachable": true,
        "confidence": 0.95,
        "pathLength": 3,
        "path": [
          "sym:java:...main",
          "sym:java:...logInfo",
          "sym:java:...JndiLookup.lookup"
        ]
      },
      "reasoning": "Direct call path from main() through logging framework to vulnerable lookup method"
    },
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.net.JndiManager.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CU",
        "bucket": "unreachable",
        "reachable": false,
        "confidence": 0.90,
        "pathLength": null,
        "path": null
      },
      "reasoning": "JndiManager.lookup is present but not called from any reachable entry point"
    }
  ],
  "entryPoints": [
    {
      "symbolId": "sym:java:...",
      "display": "com.example.app.Main.main",
      "phase": "runtime",
      "source": "manifest"
    }
  ],
  "expectedUncertainty": {
    "states": [],
    "aggregateTier": "T4",
    "riskScore": 0.0
  },
  "expectedGateDecisions": [
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "not_affected",
      "expectedDecision": "block",
      "expectedBlockedBy": "LatticeState",
      "expectedReason": "CR state incompatible with not_affected"
    },
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "affected",
      "expectedDecision": "allow"
    }
  ]
}
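The manifest and the ground truth document refer to each other through `sampleId`, `vulnId`, and symbol IDs, so a cross-document check can catch annotation slips that schema validation alone would miss. The sketch below is a suggestion rather than an existing tool: `cross_check` is a hypothetical name, and the rule that every gate decision must reference a declared target and a vulnerability listed in the manifest is an assumption, not something the schema states.

```python
def cross_check(manifest, ground_truth):
    """Return a list of inconsistencies between manifest.json and ground-truth.json."""
    errors = []
    if manifest["sampleId"] != ground_truth["sampleId"]:
        errors.append("sampleId differs between manifest.json and ground-truth.json")

    known_vulns = {v["vulnId"] for v in manifest.get("vulnerabilities", [])}
    known_targets = {t["symbolId"] for t in ground_truth.get("targets", [])}

    for index, gate in enumerate(ground_truth.get("expectedGateDecisions", [])):
        # Assumed rule: gate tests only reference vulnerabilities and targets
        # that the same sample declares.
        if gate["vulnId"] not in known_vulns:
            errors.append(f"gate decision {index}: vulnId not listed in manifest")
        if gate["targetSymbol"] not in known_targets:
            errors.append(f"gate decision {index}: targetSymbol is not a declared target")
    return errors
```

Both arguments are the parsed JSON documents (plain dictionaries), so the function can run right after `json.load` on each file.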
3. Schema Definitions
3.1 Ground Truth Target
| Field | Type | Required | Description |
|---|---|---|---|
| symbolId | string | Yes | Canonical SymbolID (sym:{lang}:{hash}) |
| display | string | No | Human-readable symbol name |
| purl | string | No | Package URL of containing package |
| expected.latticeState | enum | Yes | Expected v1 lattice state: U, SR, SU, RO, RU, CR, CU, X |
| expected.bucket | enum | Yes | Expected v0 bucket (backward compat) |
| expected.reachable | boolean | Yes | True if symbol is reachable from any entry point |
| expected.confidence | number | Yes | Expected confidence score [0.0-1.0] |
| expected.pathLength | number | No | Expected path length (null if unreachable) |
| expected.path | string[] | No | Expected path (sorted, deterministic) |
| reasoning | string | Yes | Human explanation of expected outcome |
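The field rules in this table translate directly into a small validator. The following Python sketch is one possible reading, with a hypothetical function name `validate_target`. Two of its checks are inferences rather than stated rules: that `pathLength` counts the symbols in `path` (the example in section 2.3 has three symbols and `pathLength` 3), and that an unreachable target carries null for both `path` and `pathLength`.

```python
LATTICE_STATES = {"U", "SR", "SU", "RO", "RU", "CR", "CU", "X"}
REQUIRED_EXPECTED = ("latticeState", "bucket", "reachable", "confidence")


def validate_target(target):
    """Return a list of problems; an empty list means the target passed."""
    errors = []
    for field in ("symbolId", "expected", "reasoning"):
        if field not in target:
            errors.append(f"missing required field: {field}")
    expected = target.get("expected", {})
    for field in REQUIRED_EXPECTED:
        if field not in expected:
            errors.append(f"missing required field: expected.{field}")

    state = expected.get("latticeState")
    if state is not None and state not in LATTICE_STATES:
        errors.append(f"unknown latticeState: {state}")

    confidence = expected.get("confidence")
    if confidence is not None and not (
        isinstance(confidence, (int, float)) and 0.0 <= confidence <= 1.0
    ):
        errors.append(f"confidence out of range: {confidence}")

    path = expected.get("path")
    length = expected.get("pathLength")
    # Assumption: pathLength counts symbols in the path, as in the section 2.3 example.
    if path is not None and length is not None and length != len(path):
        errors.append("pathLength does not match the number of symbols in path")
    # Assumption: unreachable targets carry null for both path fields.
    if expected.get("reachable") is False and (path is not None or length is not None):
        errors.append("unreachable target should have null path and pathLength")
    return errors
```

A check like this complements JSON Schema validation, which can enforce types and enums but not the relationship between `reachable`, `path`, and `pathLength`.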
3.2 Expected Gate Decision
| Field | Type | Required | Description |
|---|---|---|---|
| vulnId | string | Yes | Vulnerability identifier |
| targetSymbol | string | Yes | Target SymbolID |
| requestedStatus | enum | Yes | VEX status: affected, not_affected, under_investigation, fixed |
| expectedDecision | enum | Yes | Gate outcome: allow, block, warn |
| expectedBlockedBy | string | No | Gate name if blocked |
| expectedReason | string | No | Expected reason message |
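Gate decision entries can be checked the same way. This is a rough sketch with a hypothetical `validate_gate_decision` helper; the rule that `expectedBlockedBy` should appear only when `expectedDecision` is `block` is an interpretation of "Gate name if blocked", not a documented constraint.

```python
VEX_STATUSES = {"affected", "not_affected", "under_investigation", "fixed"}
GATE_OUTCOMES = {"allow", "block", "warn"}
REQUIRED_FIELDS = ("vulnId", "targetSymbol", "requestedStatus", "expectedDecision")


def validate_gate_decision(decision):
    """Return a list of problems; an empty list means the entry passed."""
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if name not in decision]

    status = decision.get("requestedStatus")
    if status is not None and status not in VEX_STATUSES:
        errors.append(f"unknown requestedStatus: {status}")

    outcome = decision.get("expectedDecision")
    if outcome is not None and outcome not in GATE_OUTCOMES:
        errors.append(f"unknown expectedDecision: {outcome}")

    # Interpretation of "Gate name if blocked": only block outcomes name a gate.
    if "expectedBlockedBy" in decision and outcome != "block":
        errors.append("expectedBlockedBy is set but expectedDecision is not block")
    return errors
```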
4. Sample Categories
4.1 Positive Samples (Reachable)
Known-reachable cases where vulnerable code is called:
- direct-call: Vulnerable function called directly from entry point
- transitive: Multi-hop path from entry point to vulnerable function
- runtime-observed: Confirmed reachable via runtime probe
- init-array: Reachable via load-time constructor
4.2 Negative Samples (Unreachable)
Known-unreachable cases where vulnerable code exists but isn't called:
- dead-code: Function present but never invoked
- conditional-unreachable: Function behind impossible condition
- test-only: Function only reachable from test entry points
- deprecated-api: Old API present but replaced by new implementation
4.3 Contested Samples
Cases where static and runtime evidence conflict:
- static-reach-runtime-miss: Static analysis finds path, runtime never observes
- static-miss-runtime-hit: Static analysis misses path, runtime observes execution
- version-mismatch: Analysis version differs from runtime version
5. Benchmark Metrics
5.1 Path Discovery Metrics
Precision = TruePositive / (TruePositive + FalsePositive)
Recall = TruePositive / (TruePositive + FalseNegative)
F1 = 2 * (Precision * Recall) / (Precision + Recall)
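As a worked illustration of these formulas, the Python sketch below scores reachability at the target level. That granularity is an assumption: the document does not say whether a true positive is a reachable target or an exactly matching path. The function name and the input shape (dictionaries mapping SymbolID to a boolean) are hypothetical.

```python
def path_discovery_metrics(expected, actual):
    """Compute (precision, recall, f1) from symbolId -> reachable mappings.

    Targets missing from `actual` are treated as reported unreachable.
    """
    tp = sum(1 for sym, reach in expected.items() if reach and actual.get(sym, False))
    fn = sum(1 for sym, reach in expected.items() if reach and not actual.get(sym, False))
    fp = sum(1 for sym, reach in expected.items() if not reach and actual.get(sym, False))

    # Guard each ratio so an empty denominator yields 0.0 instead of an exception.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With two reachable and two unreachable targets, one missed path and one spurious path give TruePositive = 1, FalseNegative = 1, and FalsePositive = 1, so precision, recall, and F1 are all 0.5.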
5.2 Lattice State Accuracy
StateAccuracy = CorrectStates / TotalTargets
BucketAccuracy = CorrectBuckets / TotalTargets (v0 compatibility)
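Both accuracy figures share one shape, so a single helper can compute either by field name. This is a minimal sketch: `accuracy` is a hypothetical function, and it assumes analyzer output is available as a dictionary from SymbolID to an object with `latticeState` and `bucket` keys, which the document does not specify.

```python
def accuracy(expected_targets, actual_by_symbol, field):
    """Fraction of targets whose `field` matches the ground truth.

    `field` is "latticeState" for StateAccuracy or "bucket" for BucketAccuracy.
    A target with no analyzer result counts as incorrect.
    """
    total = len(expected_targets)
    if total == 0:
        return 0.0
    correct = sum(
        1 for target in expected_targets
        if actual_by_symbol.get(target["symbolId"], {}).get(field)
        == target["expected"][field]
    )
    return correct / total
```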
5.3 Gate Decision Accuracy
GateAccuracy = CorrectDecisions / TotalGateTests
FalseAllow = AllowedWhenShouldBlock / TotalBlocks (critical metric)
FalseBlock = BlockedWhenShouldAllow / TotalAllows
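The gate metrics can be sketched in the same way. The code below reads TotalBlocks as the number of gate tests whose expected decision is `block`, and TotalAllows as those expected to `allow`; that reading is an assumption, as is the hypothetical `gate_metrics` name and the list-of-pairs input.

```python
def gate_metrics(results):
    """Compute gate metrics from (expected_decision, actual_decision) pairs."""
    total = len(results)
    correct = sum(1 for expected, actual in results if expected == actual)

    # Actual outcomes, grouped by what the ground truth expected.
    when_block_expected = [actual for expected, actual in results if expected == "block"]
    when_allow_expected = [actual for expected, actual in results if expected == "allow"]

    return {
        "gateAccuracy": correct / total if total else 0.0,
        # Critical metric: a gate allowed a transition it should have blocked.
        "falseAllow": (when_block_expected.count("allow") / len(when_block_expected)
                       if when_block_expected else 0.0),
        "falseBlock": (when_allow_expected.count("block") / len(when_allow_expected)
                       if when_allow_expected else 0.0),
    }
```

Under this reading, `warn` outcomes affect GateAccuracy but count toward neither FalseAllow nor FalseBlock.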
6. Test Harness Integration
6.1 xUnit Test Pattern
[Theory]
[MemberData(nameof(GetGroundTruthSamples))]
public async Task ReachabilityAnalysis_MatchesGroundTruth(GroundTruthSample sample)
{
    // Arrange: load the richgraph-v1 input and resolve the scorer under test.
    var graph = await LoadRichGraphAsync(sample.GraphPath);
    var scorer = _serviceProvider.GetRequiredService<ReachabilityScoringService>();

    // Act
    var result = await scorer.ComputeAsync(graph, sample.EntryPoints);

    // Assert
    foreach (var target in sample.Targets)
    {
        // Fail with a clear assertion if the analyzer produced no state for this target.
        Assert.Contains(result.States, s => s.SymbolId == target.SymbolId);
        var actual = result.States.First(s => s.SymbolId == target.SymbolId);

        Assert.Equal(target.Expected.LatticeState, actual.LatticeState);
        Assert.Equal(target.Expected.Reachable, actual.Reachable);

        // Confidence is compared with a tolerance of ±0.05.
        Assert.InRange(actual.Confidence,
            target.Expected.Confidence - 0.05,
            target.Expected.Confidence + 0.05);
    }
}
6.2 Benchmark Runner
# Run reachability benchmarks ("--" passes the remaining arguments to the benchmark app)
dotnet run --project src/Scanner/__Tests/StellaOps.Scanner.Reachability.Benchmarks -- \
  --dataset datasets/reachability/samples \
  --output benchmark-results.json \
  --threshold-f1 0.95 \
  --threshold-gate-accuracy 0.99
7. Sample Contribution Guidelines
7.1 Adding New Samples
- Create directory under datasets/reachability/samples/{language}/{sample-name}/
- Add manifest.json with sample metadata
- Add richgraph-v1.json (run scanner on artifacts)
- Create ground-truth.json with manual annotations
- Include reasoning for each expected outcome
- Run validation: dotnet test --filter "GroundTruth"
7.2 Ground Truth Validation
Ground truth files must pass schema validation:
npx ajv validate -s datasets/reachability/schema/ground-truth.schema.json \
  -d "datasets/reachability/samples/**/ground-truth.json"
7.3 Review Requirements
- All samples require two independent annotators
- Contested samples require security team review
- Changes to existing samples require a passing regression test run
8. Related Documents
- Lattice Model — v1 formal 7-state lattice
- Policy Gates — Gate rules for VEX decisions
- Evidence Schema — richgraph-v1 schema
- richgraph-v1 Contract — Full schema specification
Changelog
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial design from Sprint 0401 |