# Ground Truth Schema for Reachability Datasets

> **Status:** Design v1 (Sprint 0401)
>
> **Owners:** Scanner Guild, Signals Guild, Quality Guild

This document defines the ground truth schema for test datasets used to validate reachability analysis. Ground truth samples provide known-correct answers for benchmarking lattice state calculations, path discovery, and policy gate decisions.

---

## 1. Purpose

Ground truth datasets enable:

1. **Regression testing:** Detect regressions in reachability analysis accuracy
2. **Benchmark scoring:** Measure precision, recall, and F1 for path discovery
3. **Lattice validation:** Verify that join/meet operations produce the expected states
4. **Policy gate testing:** Ensure gates block or allow the correct VEX transitions

---

## 2. Dataset Structure

### 2.1 Directory Layout

```
datasets/reachability/
├── samples/
│   ├── java/
│   │   ├── vulnerable-log4j/
│   │   │   ├── manifest.json          # Sample metadata
│   │   │   ├── richgraph-v1.json      # Input callgraph
│   │   │   ├── ground-truth.json      # Expected outcomes
│   │   │   └── artifacts/             # Source binaries/SBOMs
│   │   └── safe-spring-boot/
│   │       └── ...
│   ├── native/
│   │   ├── stripped-elf/
│   │   └── openssl-vuln/
│   └── polyglot/
│       └── node-native-addon/
├── corpus/
│   ├── positive/                      # Known reachable samples
│   ├── negative/                      # Known unreachable samples
│   └── contested/                     # Samples with conflicting static/runtime evidence
└── schema/
    ├── manifest.schema.json
    └── ground-truth.schema.json
```
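
A minimal sketch of walking this layout, for example to feed the harness's `GetGroundTruthSamples` member or to catch incomplete samples before a benchmark run. It assumes every sample sits exactly two levels below `samples/` (`{language}/{sample-name}/`) and treats `manifest.json`, `richgraph-v1.json`, and `ground-truth.json` as required. `DiscoverSamples` is a hypothetical helper, written as a top-level C# program; directories are visited in ordinal order so the output is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Files assumed to be required in every samples/{language}/{sample-name}/ directory.
string[] requiredFiles = { "manifest.json", "richgraph-v1.json", "ground-truth.json" };

// Hypothetical helper: lists each sample directory (ordinal order, for deterministic
// output) together with any required files it is missing.
static List<(string Language, string Sample, string[] Missing)> DiscoverSamples(
    string samplesDir, string[] required)
{
    var samples = new List<(string Language, string Sample, string[] Missing)>();
    foreach (var langDir in Directory.GetDirectories(samplesDir).OrderBy(p => p, StringComparer.Ordinal))
    {
        foreach (var sampleDir in Directory.GetDirectories(langDir).OrderBy(p => p, StringComparer.Ordinal))
        {
            var missing = required
                .Where(name => !File.Exists(Path.Combine(sampleDir, name)))
                .ToArray();
            samples.Add((Path.GetFileName(langDir), Path.GetFileName(sampleDir), missing));
        }
    }
    return samples;
}

// Usage: report incomplete samples when run from the repository root.
const string datasetSamples = "datasets/reachability/samples";
if (Directory.Exists(datasetSamples))
{
    foreach (var sample in DiscoverSamples(datasetSamples, requiredFiles))
    {
        if (sample.Missing.Length > 0)
            Console.WriteLine($"{sample.Language}/{sample.Sample}: missing {string.Join(", ", sample.Missing)}");
    }
}
```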

### 2.2 Sample Manifest (`manifest.json`)

```json
{
  "sampleId": "sample:java:vulnerable-log4j:001",
  "version": "1.0.0",
  "createdAt": "2025-12-13T10:00:00Z",
  "language": "java",
  "category": "positive",
  "description": "Log4Shell CVE-2021-44228 reachable via JNDI lookup in logging path",
  "source": {
    "repository": "https://github.com/example/vuln-app",
    "commit": "abc123...",
    "buildToolchain": "maven:3.9.0,jdk:17"
  },
  "vulnerabilities": [
    {
      "vulnId": "CVE-2021-44228",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "affectedSymbol": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup"
    }
  ],
  "artifacts": [
    {
      "path": "artifacts/app.jar",
      "hash": "sha256:...",
      "type": "application/java-archive"
    },
    {
      "path": "artifacts/sbom.cdx.json",
      "hash": "sha256:...",
      "type": "application/vnd.cyclonedx+json"
    }
  ]
}
```
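
The manifest lends itself to cheap integrity checks before a sample is used. The sketch below assumes that `sampleId` always follows the `sample:{language}:{sample-name}:{NNN}` shape of the example (inferred, not specified) and that `hash` values are `sha256:` followed by a hex digest. `SampleLanguage` and `ArtifactHashMatches` are hypothetical helpers; the extracted language can be compared with the `language` field and the `{language}` directory. It relies on `SHA256.HashData` and `Convert.ToHexString`, available from .NET 5.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text.RegularExpressions;

// Hypothetical check: returns the {language} segment of a sampleId, or null when the
// id does not match the assumed sample:{language}:{sample-name}:{NNN} shape.
static string? SampleLanguage(string sampleId)
{
    var match = Regex.Match(sampleId, @"^sample:(?<lang>[a-z0-9]+):(?<name>[a-z0-9-]+):(?<seq>\d{3})$");
    return match.Success ? match.Groups["lang"].Value : null;
}

// Hypothetical check: recomputes the SHA-256 of an artifact (path relative to the
// sample directory) and compares it, case-insensitively, with "sha256:<hex>".
static bool ArtifactHashMatches(string sampleDir, string relativePath, string manifestHash)
{
    const string prefix = "sha256:";
    if (!manifestHash.StartsWith(prefix, StringComparison.Ordinal))
        return false; // only SHA-256 digests are assumed here
    byte[] digest = SHA256.HashData(File.ReadAllBytes(Path.Combine(sampleDir, relativePath)));
    return string.Equals(Convert.ToHexString(digest), manifestHash[prefix.Length..],
        StringComparison.OrdinalIgnoreCase);
}
```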

### 2.3 Ground Truth Document (`ground-truth.json`)

```json
{
  "schema": "ground-truth-v1",
  "sampleId": "sample:java:vulnerable-log4j:001",
  "generatedAt": "2025-12-13T10:00:00Z",
  "generator": {
    "name": "manual-annotation",
    "version": "1.0.0",
    "annotator": "security-team"
  },
  "targets": [
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CR",
        "bucket": "direct",
        "reachable": true,
        "confidence": 0.95,
        "pathLength": 3,
        "path": [
          "sym:java:...main",
          "sym:java:...logInfo",
          "sym:java:...JndiLookup.lookup"
        ]
      },
      "reasoning": "Direct call path from main() through logging framework to vulnerable lookup method"
    },
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.net.JndiManager.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CU",
        "bucket": "unreachable",
        "reachable": false,
        "confidence": 0.90,
        "pathLength": null,
        "path": null
      },
      "reasoning": "JndiManager.lookup is present but not called from any reachable entry point"
    }
  ],
  "entryPoints": [
    {
      "symbolId": "sym:java:...",
      "display": "com.example.app.Main.main",
      "phase": "runtime",
      "source": "manifest"
    }
  ],
  "expectedUncertainty": {
    "states": [],
    "aggregateTier": "T4",
    "riskScore": 0.0
  },
  "expectedGateDecisions": [
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "not_affected",
      "expectedDecision": "block",
      "expectedBlockedBy": "LatticeState",
      "expectedReason": "CR state incompatible with not_affected"
    },
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "affected",
      "expectedDecision": "allow"
    }
  ]
}
```
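
Because `ground-truth.json` and `manifest.json` are authored separately, a sample can drift out of sync. A minimal cross-check sketch using `System.Text.Json`, assuming the field names shown in the two examples: it verifies the `schema` tag, that both files name the same `sampleId`, and that every `vulnId` in `expectedGateDecisions` is declared in the manifest's `vulnerabilities`. `CheckGroundTruth` is a hypothetical helper that returns the first problem found, or `null`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: describes the first inconsistency between a manifest.json
// and its ground-truth.json, or returns null when none is found.
static string? CheckGroundTruth(string manifestJson, string groundTruthJson)
{
    using var manifest = JsonDocument.Parse(manifestJson);
    using var truth = JsonDocument.Parse(groundTruthJson);
    JsonElement gt = truth.RootElement;

    if (!gt.TryGetProperty("schema", out var schema) || schema.GetString() != "ground-truth-v1")
        return "missing or unexpected schema tag";

    string? manifestId = manifest.RootElement.GetProperty("sampleId").GetString();
    if (!gt.TryGetProperty("sampleId", out var sampleId) || sampleId.GetString() != manifestId)
        return $"sampleId does not match manifest ({manifestId})";

    // Collect the vulnerability ids declared by the manifest.
    var declared = new HashSet<string>();
    foreach (var vuln in manifest.RootElement.GetProperty("vulnerabilities").EnumerateArray())
        declared.Add(vuln.GetProperty("vulnId").GetString() ?? "");

    if (gt.TryGetProperty("expectedGateDecisions", out var decisions))
    {
        foreach (var decision in decisions.EnumerateArray())
        {
            string vulnId = decision.GetProperty("vulnId").GetString() ?? "";
            if (!declared.Contains(vulnId))
                return $"gate decision references undeclared {vulnId}";
        }
    }
    return null;
}
```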

---

## 3. Schema Definitions

### 3.1 Ground Truth Target

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `symbolId` | string | Yes | Canonical SymbolID (`sym:{lang}:{hash}`) |
| `display` | string | No | Human-readable symbol name |
| `purl` | string | No | Package URL of the containing package |
| `expected.latticeState` | enum | Yes | Expected v1 lattice state: `U`, `SR`, `SU`, `RO`, `RU`, `CR`, `CU`, `X` |
| `expected.bucket` | enum | Yes | Expected v0 bucket, kept for backward compatibility (e.g., `direct`, `unreachable`) |
| `expected.reachable` | boolean | Yes | `true` if the symbol is reachable from any entry point |
| `expected.confidence` | number | Yes | Expected confidence score in the range [0.0, 1.0] |
| `expected.pathLength` | number or null | No | Expected path length; `null` if unreachable |
| `expected.path` | string[] or null | No | Expected path as SymbolIDs ordered from entry point to target; must be deterministic; `null` if unreachable |
| `reasoning` | string | Yes | Human-readable explanation of the expected outcome |
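
The field rules above can be checked mechanically before annotations reach review. A sketch of per-target checks, with two assumptions drawn from the example in section 2.3 rather than from the table: `pathLength` counts the symbols in `path` (three symbols, `pathLength: 3`), and unreachable targets carry `null` for both `pathLength` and `path`. `ValidateTarget` is hypothetical and complements, rather than replaces, JSON Schema validation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-target checks; returns an empty list when the target is consistent.
static List<string> ValidateTarget(
    string symbolId, string latticeState, bool reachable, double confidence,
    int? pathLength, IReadOnlyList<string>? path, string? reasoning)
{
    var latticeStates = new HashSet<string> { "U", "SR", "SU", "RO", "RU", "CR", "CU", "X" };
    var errors = new List<string>();

    if (!symbolId.StartsWith("sym:", StringComparison.Ordinal))
        errors.Add("symbolId must use the sym:{lang}:{hash} form");
    if (!latticeStates.Contains(latticeState))
        errors.Add($"unknown latticeState '{latticeState}'");
    if (confidence is < 0.0 or > 1.0)
        errors.Add("confidence must lie in [0.0, 1.0]");
    if (string.IsNullOrWhiteSpace(reasoning))
        errors.Add("reasoning is required");

    if (reachable)
    {
        // Assumption: pathLength counts the symbols in path, as in the section 2.3 example.
        if (path is not null && pathLength is int length && length != path.Count)
            errors.Add($"pathLength {length} does not match {path.Count} path entries");
    }
    else if (pathLength is not null || path is not null)
    {
        // Assumption: unreachable targets have neither a path nor a path length.
        errors.Add("unreachable targets must have null pathLength and path");
    }
    return errors;
}
```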

### 3.2 Expected Gate Decision

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `vulnId` | string | Yes | Vulnerability identifier (e.g., `CVE-2021-44228`) |
| `targetSymbol` | string | Yes | SymbolID of the target the decision applies to |
| `requestedStatus` | enum | Yes | Requested VEX status: `affected`, `not_affected`, `under_investigation`, `fixed` |
| `expectedDecision` | enum | Yes | Expected gate outcome: `allow`, `block`, `warn` |
| `expectedBlockedBy` | string | No | Name of the blocking gate when the decision is `block` (e.g., `LatticeState`) |
| `expectedReason` | string | No | Expected reason message |
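
A companion sketch for gate decision records, checking the enumerations listed above. It additionally assumes that `expectedBlockedBy` appears only when `expectedDecision` is `block`, which matches the examples but is not stated as a rule. `ValidateGateDecision` is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical checks for one expected gate decision; returns an empty list when valid.
static List<string> ValidateGateDecision(string requestedStatus, string expectedDecision, string? expectedBlockedBy)
{
    var vexStatuses = new HashSet<string> { "affected", "not_affected", "under_investigation", "fixed" };
    var gateOutcomes = new HashSet<string> { "allow", "block", "warn" };
    var errors = new List<string>();

    if (!vexStatuses.Contains(requestedStatus))
        errors.Add($"unknown requestedStatus '{requestedStatus}'");
    if (!gateOutcomes.Contains(expectedDecision))
        errors.Add($"unknown expectedDecision '{expectedDecision}'");
    // Assumption: a blocking gate is named only for "block" decisions.
    if (expectedBlockedBy is not null && expectedDecision != "block")
        errors.Add("expectedBlockedBy is set but the decision is not block");
    return errors;
}
```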

---

## 4. Sample Categories

### 4.1 Positive Samples (Reachable)

Known-reachable cases in which the vulnerable code is called:

- **direct-call:** Vulnerable function called directly from an entry point
- **transitive:** Multi-hop path from an entry point to the vulnerable function
- **runtime-observed:** Reachability confirmed by a runtime probe
- **init-array:** Reachable via a load-time constructor (e.g., an ELF `.init_array` entry)

### 4.2 Negative Samples (Unreachable)

Known-unreachable cases in which the vulnerable code exists but is never called:

- **dead-code:** Function present but never invoked
- **conditional-unreachable:** Function guarded by a condition that can never be true
- **test-only:** Function reachable only from test entry points
- **deprecated-api:** Old API still present but superseded by a new implementation

### 4.3 Contested Samples

Cases where static and runtime evidence conflict:

- **static-reach-runtime-miss:** Static analysis finds a path, but runtime never observes it executing
- **static-miss-runtime-hit:** Static analysis misses a path that runtime observes executing
- **version-mismatch:** The version analyzed statically differs from the version observed at runtime

---

## 5. Benchmark Metrics

### 5.1 Path Discovery Metrics

```
Precision = TruePositive / (TruePositive + FalsePositive)
Recall    = TruePositive / (TruePositive + FalseNegative)
F1        = 2 * (Precision * Recall) / (Precision + Recall)
```
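
The formulas leave open what counts as a positive. This sketch takes one reading and scores at the target level: a target is positive when it is reachable, so true positives are targets both annotated and computed as reachable. Scoring individual paths instead would also fit the section title. `ScoreReachability` is a hypothetical helper, and each ratio is defined as 0 when its denominator is 0, a convention chosen here to avoid division by zero.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical target-level scoring: inputs are the SymbolIDs annotated as reachable
// in ground truth and the SymbolIDs the analysis reported as reachable.
static (double Precision, double Recall, double F1) ScoreReachability(
    IEnumerable<string> expectedReachable, IEnumerable<string> actualReachable)
{
    var expected = new HashSet<string>(expectedReachable);
    var actual = new HashSet<string>(actualReachable);

    int truePositive = actual.Count(id => expected.Contains(id));
    int falsePositive = actual.Count - truePositive;   // reported reachable, not annotated reachable
    int falseNegative = expected.Count - truePositive; // annotated reachable, missed by the analysis

    // Convention: a ratio with a zero denominator is reported as 0.
    double precision = truePositive + falsePositive == 0 ? 0.0 : (double)truePositive / (truePositive + falsePositive);
    double recall = truePositive + falseNegative == 0 ? 0.0 : (double)truePositive / (truePositive + falseNegative);
    double f1 = precision + recall == 0 ? 0.0 : 2 * (precision * recall) / (precision + recall);
    return (precision, recall, f1);
}
```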

### 5.2 Lattice State Accuracy

```
StateAccuracy  = CorrectStates / TotalTargets
BucketAccuracy = CorrectBuckets / TotalTargets   (v0 compatibility)
```
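
Both ratios share one shape: the fraction of annotated targets whose computed value equals the expected one. A sketch, assuming results are keyed by SymbolID and that a target the analysis does not report counts as incorrect. `MatchRate` is a hypothetical helper usable for lattice states and v0 buckets alike.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: fraction of expected entries (SymbolID -> value) whose actual
// value matches. Targets missing from `actual` count as incorrect; an empty
// expectation set yields 0 by convention.
static double MatchRate(IReadOnlyDictionary<string, string> expected, IReadOnlyDictionary<string, string> actual)
{
    if (expected.Count == 0) return 0.0;
    int correct = expected.Count(kv => actual.TryGetValue(kv.Key, out var value) && value == kv.Value);
    return (double)correct / expected.Count;
}

// Usage: StateAccuracy compares lattice states; BucketAccuracy would compare v0 buckets.
var expectedStates = new Dictionary<string, string> { ["sym:a"] = "CR", ["sym:b"] = "CU" };
var actualStates = new Dictionary<string, string> { ["sym:a"] = "CR", ["sym:b"] = "SU" };
Console.WriteLine($"StateAccuracy = {MatchRate(expectedStates, actualStates)}");
```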

### 5.3 Gate Decision Accuracy

```
GateAccuracy = CorrectDecisions / TotalGateTests
FalseAllow   = AllowedWhenShouldBlock / TotalExpectedBlocks   (critical metric)
FalseBlock   = BlockedWhenShouldAllow / TotalExpectedAllows
```
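
A sketch of these three ratios over (expected, actual) decision pairs. As in the formulas, only `allow` and `block` enter the two error rates, so a `warn` where `block` was expected lowers `GateAccuracy` without counting as a false allow. `GateMetrics` is hypothetical, and empty denominators yield 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper over (expected, actual) gate outcomes: "allow", "block" or "warn".
static (double GateAccuracy, double FalseAllow, double FalseBlock) GateMetrics(
    IEnumerable<(string Expected, string Actual)> decisions)
{
    var all = decisions.ToList();
    static double Ratio(int numerator, int denominator) =>
        denominator == 0 ? 0.0 : (double)numerator / denominator;

    int correct = all.Count(d => d.Expected == d.Actual);
    int expectedBlocks = all.Count(d => d.Expected == "block");
    int expectedAllows = all.Count(d => d.Expected == "allow");
    int allowedWhenShouldBlock = all.Count(d => d.Expected == "block" && d.Actual == "allow");
    int blockedWhenShouldAllow = all.Count(d => d.Expected == "allow" && d.Actual == "block");

    return (Ratio(correct, all.Count),
            Ratio(allowedWhenShouldBlock, expectedBlocks),
            Ratio(blockedWhenShouldAllow, expectedAllows));
}
```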

---

## 6. Test Harness Integration

### 6.1 xUnit Test Pattern

```csharp
// Allowed deviation between the annotated and the computed confidence score.
private const double ConfidenceTolerance = 0.05;

// GetGroundTruthSamples must be a static member returning IEnumerable<object[]>
// (or TheoryData<GroundTruthSample>), one entry per sample.
[Theory]
[MemberData(nameof(GetGroundTruthSamples))]
public async Task ReachabilityAnalysis_MatchesGroundTruth(GroundTruthSample sample)
{
    // Arrange: load the input callgraph and resolve the scoring service under test
    var graph = await LoadRichGraphAsync(sample.GraphPath);
    var scorer = _serviceProvider.GetRequiredService<ReachabilityScoringService>();

    // Act
    var result = await scorer.ComputeAsync(graph, sample.EntryPoints);

    // Assert: each annotated target must have exactly one computed state
    foreach (var target in sample.Targets)
    {
        var actual = Assert.Single(result.States, s => s.SymbolId == target.SymbolId);
        Assert.Equal(target.Expected.LatticeState, actual.LatticeState);
        Assert.Equal(target.Expected.Reachable, actual.Reachable);
        Assert.InRange(actual.Confidence,
            target.Expected.Confidence - ConfidenceTolerance,
            target.Expected.Confidence + ConfidenceTolerance);
    }
}
```

### 6.2 Benchmark Runner

```bash
# Run reachability benchmarks; arguments after `--` are passed to the benchmark app
dotnet run --project src/Scanner/__Tests/StellaOps.Scanner.Reachability.Benchmarks -- \
  --dataset datasets/reachability/samples \
  --output benchmark-results.json \
  --threshold-f1 0.95 \
  --threshold-gate-accuracy 0.99
```
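
The runner's internals are not specified here. One plausible reading of the two threshold flags, sketched below as an assumption: the run fails with a non-zero exit code when F1 falls below `--threshold-f1` or gate accuracy falls below `--threshold-gate-accuracy`, so CI can treat an accuracy regression like a failing build. `EvaluateThresholds` is hypothetical, and thresholds are treated as inclusive minimums.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical gate for a benchmark run: compares computed metrics with the
// thresholds passed on the command line and derives a process exit code.
static (int ExitCode, List<string> Failures) EvaluateThresholds(
    double f1, double gateAccuracy, double minF1, double minGateAccuracy)
{
    var failures = new List<string>();
    if (f1 < minF1)
        failures.Add($"F1 {f1:F3} is below threshold {minF1:F3}");
    if (gateAccuracy < minGateAccuracy)
        failures.Add($"gate accuracy {gateAccuracy:F3} is below threshold {minGateAccuracy:F3}");
    return (failures.Count == 0 ? 0 : 1, failures);
}

// Usage with the thresholds from the command above; these sample metrics fail the gate check.
var (runExit, runFailures) = EvaluateThresholds(f1: 0.97, gateAccuracy: 0.985, minF1: 0.95, minGateAccuracy: 0.99);
foreach (var message in runFailures)
    Console.WriteLine(message);
Console.WriteLine($"exit code: {runExit}");
```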

---

## 7. Sample Contribution Guidelines

### 7.1 Adding New Samples

1. Create a directory at `datasets/reachability/samples/{language}/{sample-name}/`
2. Add `manifest.json` with the sample metadata
3. Add `richgraph-v1.json`, generated by running the scanner on the sample's artifacts
4. Create `ground-truth.json` with manual annotations
5. Provide a `reasoning` entry for every expected outcome
6. Run validation: `dotnet test --filter "GroundTruth"`

### 7.2 Ground Truth Validation

Ground truth files must pass schema validation:

```bash
# Quote the glob so ajv-cli expands it (bash does not recurse into ** without globstar)
npx ajv-cli validate -s datasets/reachability/schema/ground-truth.schema.json \
  -d "datasets/reachability/samples/**/ground-truth.json"
```

### 7.3 Review Requirements

- All samples require annotations from two independent annotators
- Contested samples require security team review
- Changes to existing samples must pass the regression test suite

---

## 8. Related Documents

- [Lattice Model](./lattice.md): v1 formal 7-state lattice
- [Policy Gates](./policy-gate.md): gate rules for VEX decisions
- [Evidence Schema](./evidence-schema.md): richgraph-v1 schema
- [richgraph-v1 Contract](../contracts/richgraph-v1.md): full schema specification

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial design from Sprint 0401 |