# Ground Truth Schema for Reachability Datasets

> **Status:** Design v1 (Sprint 0401)
>
> **Owners:** Scanner Guild, Signals Guild, Quality Guild

This document defines the ground truth schema for test datasets used to validate reachability analysis. Ground truth samples provide known-correct answers for benchmarking lattice state calculations, path discovery, and policy gate decisions.

---
## 1. Purpose
Ground truth datasets enable:
1. **Regression testing:** Detect regressions in reachability analysis accuracy
2. **Benchmark scoring:** Measure precision, recall, F1 for path discovery
3. **Lattice validation:** Verify join/meet operations produce expected states
4. **Policy gate testing:** Ensure gates block/allow correct VEX transitions
---
## 2. Dataset Structure
### 2.1 Directory Layout
```
datasets/reachability/
├── samples/
│   ├── java/
│   │   ├── vulnerable-log4j/
│   │   │   ├── manifest.json        # Sample metadata
│   │   │   ├── richgraph-v1.json    # Input callgraph
│   │   │   ├── ground-truth.json    # Expected outcomes
│   │   │   └── artifacts/           # Source binaries/SBOMs
│   │   └── safe-spring-boot/
│   │       └── ...
│   ├── native/
│   │   ├── stripped-elf/
│   │   └── openssl-vuln/
│   └── polyglot/
│       └── node-native-addon/
├── corpus/
│   ├── positive/                    # Known reachable samples
│   ├── negative/                    # Known unreachable samples
│   └── contested/                   # Known conflict samples
└── schema/
    ├── manifest.schema.json
    └── ground-truth.schema.json
```
### 2.2 Sample Manifest (`manifest.json`)
```json
{
  "sampleId": "sample:java:vulnerable-log4j:001",
  "version": "1.0.0",
  "createdAt": "2025-12-13T10:00:00Z",
  "language": "java",
  "category": "positive",
  "description": "Log4Shell CVE-2021-44228 reachable via JNDI lookup in logging path",
  "source": {
    "repository": "https://github.com/example/vuln-app",
    "commit": "abc123...",
    "buildToolchain": "maven:3.9.0,jdk:17"
  },
  "vulnerabilities": [
    {
      "vulnId": "CVE-2021-44228",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "affectedSymbol": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup"
    }
  ],
  "artifacts": [
    {
      "path": "artifacts/app.jar",
      "hash": "sha256:...",
      "type": "application/java-archive"
    },
    {
      "path": "artifacts/sbom.cdx.json",
      "hash": "sha256:...",
      "type": "application/vnd.cyclonedx+json"
    }
  ]
}
```
### 2.3 Ground Truth Document (`ground-truth.json`)
```json
{
  "schema": "ground-truth-v1",
  "sampleId": "sample:java:vulnerable-log4j:001",
  "generatedAt": "2025-12-13T10:00:00Z",
  "generator": {
    "name": "manual-annotation",
    "version": "1.0.0",
    "annotator": "security-team"
  },
  "targets": [
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CR",
        "bucket": "direct",
        "reachable": true,
        "confidence": 0.95,
        "pathLength": 3,
        "path": [
          "sym:java:...main",
          "sym:java:...logInfo",
          "sym:java:...JndiLookup.lookup"
        ]
      },
      "reasoning": "Direct call path from main() through logging framework to vulnerable lookup method"
    },
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.net.JndiManager.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CU",
        "bucket": "unreachable",
        "reachable": false,
        "confidence": 0.90,
        "pathLength": null,
        "path": null
      },
      "reasoning": "JndiManager.lookup is present but not called from any reachable entry point"
    }
  ],
  "entryPoints": [
    {
      "symbolId": "sym:java:...",
      "display": "com.example.app.Main.main",
      "phase": "runtime",
      "source": "manifest"
    }
  ],
  "expectedUncertainty": {
    "states": [],
    "aggregateTier": "T4",
    "riskScore": 0.0
  },
  "expectedGateDecisions": [
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "not_affected",
      "expectedDecision": "block",
      "expectedBlockedBy": "LatticeState",
      "expectedReason": "CR state incompatible with not_affected"
    },
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "affected",
      "expectedDecision": "allow"
    }
  ]
}
```
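
As a rough illustration of how a harness might load such a document, the sketch below deserializes a trimmed ground truth document with `System.Text.Json`. The record names (`GroundTruthDocument`, `GroundTruthTarget`, `ExpectedOutcome`) are hypothetical and not taken from the StellaOps codebase; only the JSON field names come from the example above. It assumes C# 11 or later for the raw string literal.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Trimmed, illustrative document; a real harness would read ground-truth.json from disk.
const string json = """
{
  "schema": "ground-truth-v1",
  "sampleId": "sample:java:vulnerable-log4j:001",
  "targets": [
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.net.JndiManager.lookup",
      "expected": {
        "latticeState": "CU",
        "bucket": "unreachable",
        "reachable": false,
        "confidence": 0.90,
        "pathLength": null,
        "path": null
      },
      "reasoning": "Present but not called from any reachable entry point"
    }
  ]
}
""";

// camelCase JSON names are matched to PascalCase record members case-insensitively.
var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
var doc = JsonSerializer.Deserialize<GroundTruthDocument>(json, options)
          ?? throw new InvalidOperationException("Ground truth document was empty.");

foreach (var target in doc.Targets)
{
    Console.WriteLine($"{target.Display}: state={target.Expected.LatticeState}, reachable={target.Expected.Reachable}");
}

// Hypothetical model types covering only the fields used in this sketch.
public sealed record GroundTruthDocument(string Schema, string SampleId, List<GroundTruthTarget> Targets);

public sealed record GroundTruthTarget(
    string SymbolId, string? Display, string? Purl, ExpectedOutcome Expected, string Reasoning);

public sealed record ExpectedOutcome(
    string LatticeState, string Bucket, bool Reachable, double Confidence,
    int? PathLength, List<string>? Path);
```

Optional fields that are absent from the JSON (here `purl`) are left as `null`, which is why they are declared as nullable in the records.
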
---
## 3. Schema Definitions
### 3.1 Ground Truth Target
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `symbolId` | string | Yes | Canonical SymbolID (`sym:{lang}:{hash}`) |
| `display` | string | No | Human-readable symbol name |
| `purl` | string | No | Package URL of containing package |
| `expected.latticeState` | enum | Yes | Expected v1 lattice state: `U`, `SR`, `SU`, `RO`, `RU`, `CR`, `CU`, `X` |
| `expected.bucket` | enum | Yes | Expected v0 bucket, kept for backward compatibility (e.g. `direct`, `unreachable`) |
| `expected.reachable` | boolean | Yes | True if symbol is reachable from any entry point |
| `expected.confidence` | number | Yes | Expected confidence score in the range 0.0 to 1.0 |
| `expected.pathLength` | number | No | Expected number of symbols on the path, as in the example above (null if unreachable) |
| `expected.path` | string[] | No | Expected path as SymbolIDs ordered from entry point to target, deterministic (null if unreachable) |
| `reasoning` | string | Yes | Human explanation of expected outcome |
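
The constraints in this table can be checked mechanically before a sample is accepted. The following is a minimal sketch, not the project's validator: the `Expected` record and `TargetChecks` class are hypothetical, and the rule that `pathLength` equals the number of entries in `path` is an assumption inferred from the example in section 2.3.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var ok = new Expected("CR", true, 0.95, 3, new[] { "a", "b", "c" });
var bad = new Expected("CU", false, 1.2, 2, null);

Console.WriteLine(TargetChecks.Validate(ok).Count);   // no violations expected
foreach (var error in TargetChecks.Validate(bad))
{
    Console.WriteLine(error);
}

// Hypothetical shape mirroring the expected.* fields of a target.
public sealed record Expected(
    string LatticeState, bool Reachable, double Confidence,
    int? PathLength, IReadOnlyList<string>? Path);

public static class TargetChecks
{
    // The eight state codes listed for expected.latticeState.
    private static readonly HashSet<string> States =
        new(StringComparer.Ordinal) { "U", "SR", "SU", "RO", "RU", "CR", "CU", "X" };

    public static List<string> Validate(Expected e)
    {
        var errors = new List<string>();

        if (!States.Contains(e.LatticeState))
            errors.Add($"unknown latticeState '{e.LatticeState}'");

        if (double.IsNaN(e.Confidence) || e.Confidence is < 0.0 or > 1.0)
            errors.Add("confidence must be within 0.0 and 1.0");

        // Unreachable targets carry neither a path nor a path length.
        if (!e.Reachable && (e.PathLength is not null || e.Path is not null))
            errors.Add("unreachable target must have null pathLength and path");

        // Assumption: pathLength counts the symbols listed in path.
        if (e.PathLength is not null && e.Path is not null && e.PathLength != e.Path.Count)
            errors.Add("pathLength does not match the number of path entries");

        return errors;
    }
}
```

Returning a list of messages rather than throwing on the first problem lets an annotator see every violation in a sample at once.
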
### 3.2 Expected Gate Decision
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `vulnId` | string | Yes | Vulnerability identifier |
| `targetSymbol` | string | Yes | Target SymbolID |
| `requestedStatus` | enum | Yes | VEX status: `affected`, `not_affected`, `under_investigation`, `fixed` |
| `expectedDecision` | enum | Yes | Gate outcome: `allow`, `block`, `warn` |
| `expectedBlockedBy` | string | No | Gate name if blocked |
| `expectedReason` | string | No | Expected reason message |
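
One way a harness could compare an actual gate result with an expected decision is sketched below. The type names are hypothetical, and the matching policy (optional fields are compared only when the ground truth supplies them) is an assumption of this sketch rather than documented behavior.

```csharp
#nullable enable
using System;

var expected = new ExpectedGateDecision(
    "CVE-2021-44228", "sym:java:...JndiLookup.lookup",
    "not_affected", "block", "LatticeState", null);
var actual = new ActualGateDecision("block", "LatticeState", "CR state incompatible with not_affected");

Console.WriteLine(GateMatcher.Matches(expected, actual));

// Hypothetical shapes: the first mirrors the table above, the second stands in for a gate result.
public sealed record ExpectedGateDecision(
    string VulnId, string TargetSymbol, string RequestedStatus,
    string ExpectedDecision, string? ExpectedBlockedBy, string? ExpectedReason);

public sealed record ActualGateDecision(string Decision, string? BlockedBy, string? Reason);

public static class GateMatcher
{
    public static bool Matches(ExpectedGateDecision e, ActualGateDecision a)
    {
        // The decision itself (allow, block, warn) must always agree.
        if (!string.Equals(e.ExpectedDecision, a.Decision, StringComparison.Ordinal))
            return false;

        // Optional expectations are enforced only when present in the ground truth.
        if (e.ExpectedBlockedBy is not null &&
            !string.Equals(e.ExpectedBlockedBy, a.BlockedBy, StringComparison.Ordinal))
            return false;

        if (e.ExpectedReason is not null &&
            !string.Equals(e.ExpectedReason, a.Reason, StringComparison.Ordinal))
            return false;

        return true;
    }
}
```
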
---
## 4. Sample Categories
### 4.1 Positive Samples (Reachable)
Known-reachable cases where vulnerable code is called:
- **direct-call:** Vulnerable function called directly from entry point
- **transitive:** Multi-hop path from entry point to vulnerable function
- **runtime-observed:** Confirmed reachable via runtime probe
- **init-array:** Reachable via load-time constructor
### 4.2 Negative Samples (Unreachable)
Known-unreachable cases where vulnerable code exists but isn't called:
- **dead-code:** Function present but never invoked
- **conditional-unreachable:** Function behind impossible condition
- **test-only:** Function only reachable from test entry points
- **deprecated-api:** Old API present but replaced by new implementation
### 4.3 Contested Samples
Cases where static and runtime evidence conflict:
- **static-reach-runtime-miss:** Static analysis finds a path, but runtime never observes execution
- **static-miss-runtime-hit:** Static analysis misses the path, but runtime observes execution
- **version-mismatch:** Analysis version differs from runtime version
---
## 5. Benchmark Metrics
### 5.1 Path Discovery Metrics
```
Precision = TruePositive / (TruePositive + FalsePositive)
Recall = TruePositive / (TruePositive + FalseNegative)
F1 = 2 * (Precision * Recall) / (Precision + Recall)
```
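
The formulas above can be computed from expected and actual reachability per symbol, as in this sketch. The names `PathMetrics` and `Metrics` are hypothetical. Two choices here are assumptions: a symbol missing from the actual results counts as not reached, and a zero denominator yields 0.0 instead of a division error.

```csharp
using System;
using System.Collections.Generic;

var expected = new Dictionary<string, bool> { ["a"] = true, ["b"] = true, ["c"] = false, ["d"] = false };
var actual = new Dictionary<string, bool> { ["a"] = true, ["b"] = false, ["c"] = true, ["d"] = false };

var m = PathMetrics.Compute(expected, actual);
Console.WriteLine($"P={m.Precision:F2} R={m.Recall:F2} F1={m.F1:F2}");

public readonly record struct Metrics(double Precision, double Recall, double F1);

public static class PathMetrics
{
    public static Metrics Compute(
        IReadOnlyDictionary<string, bool> expected,
        IReadOnlyDictionary<string, bool> actual)
    {
        int tp = 0, fp = 0, fn = 0;

        foreach (var (symbolId, shouldReach) in expected)
        {
            // Assumption: a symbol absent from the results counts as "not reached".
            bool reached = actual.TryGetValue(symbolId, out var value) && value;

            if (reached && shouldReach) tp++;
            else if (reached) fp++;
            else if (shouldReach) fn++;
        }

        // Guard each ratio against an empty denominator.
        double precision = tp + fp == 0 ? 0.0 : (double)tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double)tp / (tp + fn);
        double f1 = precision + recall == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);

        return new Metrics(precision, recall, f1);
    }
}
```

In the sample data there is one true positive (`a`), one false negative (`b`), and one false positive (`c`), so precision, recall, and F1 all come out to 0.5.
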
### 5.2 Lattice State Accuracy
```
StateAccuracy = CorrectStates / TotalTargets
BucketAccuracy = CorrectBuckets / TotalTargets (v0 compatibility)
```
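
A minimal sketch of both accuracy ratios, assuming each target yields one expected and one actual value for the lattice state and for the v0 bucket. `TargetResult` and `Accuracy` are hypothetical names, and returning 0.0 for an empty set is a choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;

var rows = new List<TargetResult>
{
    new("CR", "CR", "direct", "direct"),
    new("CU", "SU", "unreachable", "unreachable"), // wrong state, correct bucket
};

var (stateAccuracy, bucketAccuracy) = Accuracy.Compute(rows);
Console.WriteLine($"StateAccuracy={stateAccuracy:F2} BucketAccuracy={bucketAccuracy:F2}");

public sealed record TargetResult(
    string ExpectedState, string ActualState, string ExpectedBucket, string ActualBucket);

public static class Accuracy
{
    public static (double StateAccuracy, double BucketAccuracy) Compute(
        IReadOnlyCollection<TargetResult> rows)
    {
        if (rows.Count == 0) return (0.0, 0.0);

        int correctStates = 0, correctBuckets = 0;
        foreach (var row in rows)
        {
            if (string.Equals(row.ExpectedState, row.ActualState, StringComparison.Ordinal)) correctStates++;
            if (string.Equals(row.ExpectedBucket, row.ActualBucket, StringComparison.Ordinal)) correctBuckets++;
        }

        return ((double)correctStates / rows.Count, (double)correctBuckets / rows.Count);
    }
}
```

The second row shows why the two ratios are tracked separately: a result can land in the correct v0 bucket while still reporting the wrong v1 state.
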
### 5.3 Gate Decision Accuracy
```
GateAccuracy = CorrectDecisions / TotalGateTests
FalseAllow = AllowedWhenShouldBlock / TotalBlocks (critical metric)
FalseBlock = BlockedWhenShouldAllow / TotalAllows
```
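
The gate metrics can be sketched in the same style. `GateTest`, `GateReport`, and `GateMetrics` are hypothetical names. This sketch assumes `TotalBlocks` and `TotalAllows` count the tests whose expected decision is `block` or `allow` respectively, and that `warn` outcomes contribute only to `GateAccuracy`; the document does not specify how `warn` is scored.

```csharp
using System;
using System.Collections.Generic;

var tests = new List<GateTest>
{
    new("block", "block"),
    new("block", "allow"), // false allow: the critical failure mode
    new("allow", "allow"),
    new("allow", "block"), // false block
};

var report = GateMetrics.Compute(tests);
Console.WriteLine($"GateAccuracy={report.GateAccuracy:F2} FalseAllow={report.FalseAllow:F2} FalseBlock={report.FalseBlock:F2}");

public sealed record GateTest(string Expected, string Actual);

public readonly record struct GateReport(double GateAccuracy, double FalseAllow, double FalseBlock);

public static class GateMetrics
{
    public static GateReport Compute(IReadOnlyCollection<GateTest> tests)
    {
        int correct = 0, totalBlocks = 0, totalAllows = 0, falseAllows = 0, falseBlocks = 0;

        foreach (var t in tests)
        {
            if (t.Expected == t.Actual) correct++;

            if (t.Expected == "block")
            {
                totalBlocks++;
                if (t.Actual == "allow") falseAllows++;
            }
            else if (t.Expected == "allow")
            {
                totalAllows++;
                if (t.Actual == "block") falseBlocks++;
            }
        }

        // Avoid dividing by zero when a category has no tests.
        static double Ratio(int numerator, int denominator) =>
            denominator == 0 ? 0.0 : (double)numerator / denominator;

        return new GateReport(
            Ratio(correct, tests.Count),
            Ratio(falseAllows, totalBlocks),
            Ratio(falseBlocks, totalAllows));
    }
}
```

With the four sample tests, all three ratios evaluate to 0.5.
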
---
## 6. Test Harness Integration
### 6.1 xUnit Test Pattern
```csharp
[Theory]
[MemberData(nameof(GetGroundTruthSamples))]
public async Task ReachabilityAnalysis_MatchesGroundTruth(GroundTruthSample sample)
{
    // Arrange
    var graph = await LoadRichGraphAsync(sample.GraphPath);
    var scorer = _serviceProvider.GetRequiredService<ReachabilityScoringService>();

    // Act
    var result = await scorer.ComputeAsync(graph, sample.EntryPoints);

    // Assert
    foreach (var target in sample.Targets)
    {
        var actual = result.States.First(s => s.SymbolId == target.SymbolId);
        Assert.Equal(target.Expected.LatticeState, actual.LatticeState);
        Assert.Equal(target.Expected.Reachable, actual.Reachable);
        Assert.InRange(actual.Confidence,
            target.Expected.Confidence - 0.05,
            target.Expected.Confidence + 0.05);
    }
}
```
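
The test pattern above depends on a `GetGroundTruthSamples` member that the document does not show. A data source of that kind might look roughly like the sketch below, which walks the dataset directory and yields one row per `ground-truth.json`. The `SamplePaths` record and the `root` parameter are hypothetical; the real `GroundTruthSample` type presumably carries parsed targets and entry points as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

foreach (var row in SampleDiscovery.GetGroundTruthSamples("datasets/reachability/samples"))
{
    Console.WriteLine(((SamplePaths)row[0]).GroundTruthPath);
}

// Hypothetical stand-in for the sample type consumed by the theory.
public sealed record SamplePaths(string SampleDir, string GraphPath, string GroundTruthPath);

public static class SampleDiscovery
{
    // Shaped like an xUnit MemberData source: each object[] holds one theory argument.
    public static IEnumerable<object[]> GetGroundTruthSamples(string root)
    {
        if (!Directory.Exists(root)) yield break;

        // Ordinal sort keeps the test order stable across machines and file systems.
        var files = Directory
            .EnumerateFiles(root, "ground-truth.json", SearchOption.AllDirectories)
            .OrderBy(path => path, StringComparer.Ordinal);

        foreach (var file in files)
        {
            var dir = Path.GetDirectoryName(file) ?? root;
            yield return new object[]
            {
                new SamplePaths(dir, Path.Combine(dir, "richgraph-v1.json"), file),
            };
        }
    }
}
```

Because `[MemberData(nameof(...))]` with no extra arguments expects a parameterless member, a real test class would either wrap this method with a fixed root or pass the root as a `MemberData` argument.
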
### 6.2 Benchmark Runner
```bash
# Run reachability benchmarks ("--" separates dotnet options from benchmark arguments)
dotnet run --project src/Scanner/__Tests/StellaOps.Scanner.Reachability.Benchmarks -- \
  --dataset datasets/reachability/samples \
  --output benchmark-results.json \
  --threshold-f1 0.95 \
  --threshold-gate-accuracy 0.99
```
---
## 7. Sample Contribution Guidelines
### 7.1 Adding New Samples
1. Create directory under `datasets/reachability/samples/{language}/{sample-name}/`
2. Add `manifest.json` with sample metadata
3. Add `richgraph-v1.json` (run scanner on artifacts)
4. Create `ground-truth.json` with manual annotations
5. Include reasoning for each expected outcome
6. Run validation: `dotnet test --filter "GroundTruth"`
### 7.2 Ground Truth Validation
Ground truth files must pass schema validation:
```bash
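# The ajv binary ships in the ajv-cli package; quote the glob so ajv-cli expands it, not the shell
npx ajv-cli validate -s docs/reachability/ground-truth.schema.json \
  -d "datasets/reachability/samples/**/ground-truth.json"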
```
### 7.3 Review Requirements
- All samples require two independent annotators
- Contested samples require security team review
- Changes to existing samples require regression test pass
---
## 8. Related Documents
- [Lattice Model](./lattice.md) — v1 formal 7-state lattice
- [Policy Gates](./policy-gate.md) — Gate rules for VEX decisions
- [Evidence Schema](./evidence-schema.md) — richgraph-v1 schema
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) — Full schema specification
---
## Changelog
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial design from Sprint 0401 |