Ground Truth Schema for Reachability Datasets

Status: Design v1 (Sprint 0401)

Owners: Scanner Guild, Signals Guild, Quality Guild

This document defines the ground truth schema for test datasets used to validate reachability analysis. Ground truth samples provide known-correct answers for benchmarking lattice state calculations, path discovery, and policy gate decisions.


1. Purpose

Ground truth datasets enable:

  1. Regression testing: Detect regressions in reachability analysis accuracy
  2. Benchmark scoring: Measure precision, recall, and F1 for path discovery
  3. Lattice validation: Verify join/meet operations produce expected states
  4. Policy gate testing: Ensure gates block/allow correct VEX transitions

2. Dataset Structure

2.1 Directory Layout

datasets/reachability/
├── samples/
│   ├── java/
│   │   ├── vulnerable-log4j/
│   │   │   ├── manifest.json          # Sample metadata
│   │   │   ├── richgraph-v1.json      # Input callgraph
│   │   │   ├── ground-truth.json      # Expected outcomes
│   │   │   └── artifacts/             # Source binaries/SBOMs
│   │   └── safe-spring-boot/
│   │       └── ...
│   ├── native/
│   │   ├── stripped-elf/
│   │   └── openssl-vuln/
│   └── polyglot/
│       └── node-native-addon/
├── corpus/
│   ├── positive/                      # Known reachable samples
│   ├── negative/                      # Known unreachable samples
│   └── contested/                     # Known conflict samples
└── schema/
    ├── manifest.schema.json
    └── ground-truth.schema.json
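
The layout above can be checked mechanically before any analysis runs. The following is a minimal sketch, assuming every sample directory sits exactly two levels below samples/ and must contain the three JSON files shown for vulnerable-log4j. The function name find_incomplete_samples is hypothetical and not part of the StellaOps tooling.

```python
from pathlib import Path

# File names taken from the directory layout above.
REQUIRED_FILES = ("manifest.json", "richgraph-v1.json", "ground-truth.json")


def find_incomplete_samples(samples_root: Path) -> dict[str, list[str]]:
    """Map each sample directory (relative path) to the required files it lacks."""
    missing: dict[str, list[str]] = {}
    # Assumed layout: samples/{language}/{sample-name}/, so samples sit two levels down.
    for sample_dir in sorted(samples_root.glob("*/*")):
        if not sample_dir.is_dir():
            continue
        absent = [name for name in REQUIRED_FILES if not (sample_dir / name).is_file()]
        if absent:
            missing[sample_dir.relative_to(samples_root).as_posix()] = absent
    return missing
```

An empty result means every sample directory is complete; a non-empty result lists what to add before the sample can be used.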

2.2 Sample Manifest (manifest.json)

{
  "sampleId": "sample:java:vulnerable-log4j:001",
  "version": "1.0.0",
  "createdAt": "2025-12-13T10:00:00Z",
  "language": "java",
  "category": "positive",
  "description": "Log4Shell CVE-2021-44228 reachable via JNDI lookup in logging path",
  "source": {
    "repository": "https://github.com/example/vuln-app",
    "commit": "abc123...",
    "buildToolchain": "maven:3.9.0,jdk:17"
  },
  "vulnerabilities": [
    {
      "vulnId": "CVE-2021-44228",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "affectedSymbol": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup"
    }
  ],
  "artifacts": [
    {
      "path": "artifacts/app.jar",
      "hash": "sha256:...",
      "type": "application/java-archive"
    },
    {
      "path": "artifacts/sbom.cdx.json",
      "hash": "sha256:...",
      "type": "application/vnd.cyclonedx+json"
    }
  ]
}
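
Because each artifact entry carries a hash, a sample can be checked for drift between the manifest and the files on disk. The sketch below assumes the hash field has the form sha256: followed by a hex digest (the example above elides the digest, so this format is an assumption). The helper name verify_artifacts is hypothetical.

```python
import hashlib
import json
from pathlib import Path


def verify_artifacts(sample_dir: Path) -> list[str]:
    """Return artifact paths that are missing or whose digest differs from the manifest."""
    manifest = json.loads((sample_dir / "manifest.json").read_text(encoding="utf-8"))
    mismatches = []
    for artifact in manifest["artifacts"]:
        # Assumed format: "<algorithm>:<hex digest>", with sha256 as the only algorithm.
        algorithm, _, expected = artifact["hash"].partition(":")
        if algorithm != "sha256":
            raise ValueError(f"unsupported hash algorithm: {algorithm}")
        file_path = sample_dir / artifact["path"]
        if not file_path.is_file():
            mismatches.append(artifact["path"])
            continue
        actual = hashlib.sha256(file_path.read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatches.append(artifact["path"])
    return mismatches
```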

2.3 Ground Truth Document (ground-truth.json)

{
  "schema": "ground-truth-v1",
  "sampleId": "sample:java:vulnerable-log4j:001",
  "generatedAt": "2025-12-13T10:00:00Z",
  "generator": {
    "name": "manual-annotation",
    "version": "1.0.0",
    "annotator": "security-team"
  },
  "targets": [
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CR",
        "bucket": "direct",
        "reachable": true,
        "confidence": 0.95,
        "pathLength": 3,
        "path": [
          "sym:java:...main",
          "sym:java:...logInfo",
          "sym:java:...JndiLookup.lookup"
        ]
      },
      "reasoning": "Direct call path from main() through logging framework to vulnerable lookup method"
    },
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.net.JndiManager.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CU",
        "bucket": "unreachable",
        "reachable": false,
        "confidence": 0.90,
        "pathLength": null,
        "path": null
      },
      "reasoning": "JndiManager.lookup is present but not called from any reachable entry point"
    }
  ],
  "entryPoints": [
    {
      "symbolId": "sym:java:...",
      "display": "com.example.app.Main.main",
      "phase": "runtime",
      "source": "manifest"
    }
  ],
  "expectedUncertainty": {
    "states": [],
    "aggregateTier": "T4",
    "riskScore": 0.0
  },
  "expectedGateDecisions": [
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "not_affected",
      "expectedDecision": "block",
      "expectedBlockedBy": "LatticeState",
      "expectedReason": "CR state incompatible with not_affected"
    },
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "affected",
      "expectedDecision": "allow"
    }
  ]
}
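
The example above suggests two cross-field invariants: a reachable target's pathLength equals the number of entries in path, and an unreachable target carries null for both. The sketch below checks them for one entry of targets. Treating these as hard rules is an assumption drawn from the example, and check_target_consistency is a hypothetical helper name.

```python
def check_target_consistency(target: dict) -> list[str]:
    """Return human-readable problems found in one entry of `targets`."""
    expected = target["expected"]
    path = expected.get("path")
    length = expected.get("pathLength")
    problems = []
    if expected["reachable"]:
        # path and pathLength are optional, so compare them only when both are given.
        if path is not None and length is not None and length != len(path):
            problems.append(f"pathLength {length} does not match {len(path)} path entries")
    elif path is not None or length is not None:
        problems.append("unreachable target should have null path and pathLength")
    return problems
```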

3. Schema Definitions

3.1 Ground Truth Target

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| symbolId | string | Yes | Canonical SymbolID (sym:{lang}:{hash}) |
| display | string | No | Human-readable symbol name |
| purl | string | No | Package URL of containing package |
| expected.latticeState | enum | Yes | Expected v1 lattice state: U, SR, SU, RO, RU, CR, CU, X |
| expected.bucket | enum | Yes | Expected v0 bucket (backward compat) |
| expected.reachable | boolean | Yes | True if symbol is reachable from any entry point |
| expected.confidence | number | Yes | Expected confidence score [0.0-1.0] |
| expected.pathLength | number | No | Expected path length (null if unreachable) |
| expected.path | string[] | No | Expected path (sorted, deterministic) |
| reasoning | string | Yes | Human explanation of expected outcome |
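
The field rules in this table can be expressed as a small validator. This is a minimal sketch using only the standard library, not a replacement for ground-truth.schema.json. The function name validate_target is hypothetical, and bucket values are not checked because the document does not list the full v0 enum.

```python
# Lattice states listed in the table above.
LATTICE_STATES = {"U", "SR", "SU", "RO", "RU", "CR", "CU", "X"}
REQUIRED_EXPECTED = ("latticeState", "bucket", "reachable", "confidence")


def validate_target(target: dict) -> list[str]:
    """Return field-level errors for one ground truth target."""
    errors = []
    for field in ("symbolId", "reasoning"):
        if not isinstance(target.get(field), str) or not target[field]:
            errors.append(f"{field}: required non-empty string")
    expected = target.get("expected")
    if not isinstance(expected, dict):
        return errors + ["expected: required object"]
    for field in REQUIRED_EXPECTED:
        if field not in expected:
            errors.append(f"expected.{field}: required")
    state = expected.get("latticeState")
    if state is not None and state not in LATTICE_STATES:
        errors.append(f"expected.latticeState: unknown state {state!r}")
    if "reachable" in expected and not isinstance(expected["reachable"], bool):
        errors.append("expected.reachable: must be a boolean")
    confidence = expected.get("confidence")
    if confidence is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
        if not is_number or not 0.0 <= confidence <= 1.0:
            errors.append("expected.confidence: must be a number in [0.0, 1.0]")
    return errors
```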

3.2 Expected Gate Decision

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| vulnId | string | Yes | Vulnerability identifier |
| targetSymbol | string | Yes | Target SymbolID |
| requestedStatus | enum | Yes | VEX status: affected, not_affected, under_investigation, fixed |
| expectedDecision | enum | Yes | Gate outcome: allow, block, warn |
| expectedBlockedBy | string | No | Gate name if blocked |
| expectedReason | string | No | Expected reason message |
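
Comparing an observed gate outcome against one of these entries could look like the sketch below. It assumes the gate under test reports a decision and, when blocking, a gate name. Those two inputs and the helper name gate_decision_matches are hypothetical. expectedReason is not compared, since matching free-text messages exactly tends to be brittle.

```python
from typing import Optional

# Enum values listed in the table above.
VEX_STATUSES = {"affected", "not_affected", "under_investigation", "fixed"}
GATE_DECISIONS = {"allow", "block", "warn"}


def gate_decision_matches(expected: dict, actual_decision: str,
                          actual_blocked_by: Optional[str] = None) -> bool:
    """Check one observed gate outcome against an expectedGateDecisions entry."""
    if expected["requestedStatus"] not in VEX_STATUSES:
        raise ValueError(f"unknown VEX status: {expected['requestedStatus']}")
    if expected["expectedDecision"] not in GATE_DECISIONS:
        raise ValueError(f"unknown gate decision: {expected['expectedDecision']}")
    if actual_decision != expected["expectedDecision"]:
        return False
    # expectedBlockedBy is optional; compare gate names only when it is given.
    wanted_gate = expected.get("expectedBlockedBy")
    return wanted_gate is None or wanted_gate == actual_blocked_by
```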

4. Sample Categories

4.1 Positive Samples (Reachable)

Known-reachable cases where vulnerable code is called:

  • direct-call: Vulnerable function called directly from entry point
  • transitive: Multi-hop path from entry point to vulnerable function
  • runtime-observed: Confirmed reachable via runtime probe
  • init-array: Reachable via load-time constructor

4.2 Negative Samples (Unreachable)

Known-unreachable cases where vulnerable code exists but isn't called:

  • dead-code: Function present but never invoked
  • conditional-unreachable: Function behind impossible condition
  • test-only: Function only reachable from test entry points
  • deprecated-api: Old API present but replaced by new implementation

4.3 Contested Samples

Cases where static and runtime evidence conflict:

  • static-reach-runtime-miss: Static analysis finds a path, but runtime never observes execution
  • static-miss-runtime-hit: Static analysis misses the path, but runtime observes execution
  • version-mismatch: The analyzed version differs from the version observed at runtime
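
The first two contested subcategories follow directly from comparing the two evidence sources. The sketch below is illustrative only: it assumes runtime evidence is three-valued (observed, not observed, or no runtime data at all), and classify_evidence is a hypothetical helper. version-mismatch needs version metadata and is not covered here.

```python
from typing import Optional


def classify_evidence(static_reachable: bool, runtime_observed: Optional[bool]) -> Optional[str]:
    """Return the contested subcategory, or None when the evidence does not conflict.

    runtime_observed is None when no runtime probe data exists for the symbol;
    missing data is treated as "no conflict" rather than as a runtime miss.
    """
    if runtime_observed is None:
        return None
    if static_reachable and not runtime_observed:
        return "static-reach-runtime-miss"
    if not static_reachable and runtime_observed:
        return "static-miss-runtime-hit"
    return None
```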

5. Benchmark Metrics

5.1 Path Discovery Metrics

Precision = TruePositive / (TruePositive + FalsePositive)
Recall    = TruePositive / (TruePositive + FalseNegative)
F1        = 2 * (Precision * Recall) / (Precision + Recall)
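
As a worked sketch of these formulas, the function below scores a run from two sets of symbol IDs. It assumes true and false positives are counted per target symbol and that a zero denominator yields 0.0. Both choices are conventions picked for the example, not requirements stated in this document.

```python
def path_discovery_metrics(expected_reachable: set[str],
                           predicted_reachable: set[str]) -> dict[str, float]:
    """Compute precision, recall, and F1 over sets of reachable symbol IDs."""
    true_positive = len(expected_reachable & predicted_reachable)
    false_positive = len(predicted_reachable - expected_reachable)
    false_negative = len(expected_reachable - predicted_reachable)

    predicted_total = true_positive + false_positive
    expected_total = true_positive + false_negative
    precision = true_positive / predicted_total if predicted_total else 0.0
    recall = true_positive / expected_total if expected_total else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, with three symbols expected reachable and three predicted, two of them shared, precision, recall, and F1 all come out to 2/3.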

5.2 Lattice State Accuracy

StateAccuracy = CorrectStates / TotalTargets
BucketAccuracy = CorrectBuckets / TotalTargets  (v0 compatibility)
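
A minimal sketch of both accuracy figures, assuming each target yields an (expected, actual) pair of objects carrying latticeState and bucket. The shape of the actual object and the name lattice_accuracy are hypothetical.

```python
def lattice_accuracy(pairs: list[tuple[dict, dict]]) -> dict[str, float]:
    """Compute StateAccuracy and BucketAccuracy over (expected, actual) pairs."""
    if not pairs:
        raise ValueError("at least one target is required")
    total = len(pairs)
    correct_states = sum(1 for exp, act in pairs if exp["latticeState"] == act["latticeState"])
    correct_buckets = sum(1 for exp, act in pairs if exp["bucket"] == act["bucket"])
    return {
        "stateAccuracy": correct_states / total,
        "bucketAccuracy": correct_buckets / total,
    }
```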

5.3 Gate Decision Accuracy

GateAccuracy = CorrectDecisions / TotalGateTests
FalseAllow   = AllowedWhenShouldBlock / TotalBlocks  (critical metric)
FalseBlock   = BlockedWhenShouldAllow / TotalAllows
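
The gate metrics can be sketched as follows. The interpretation is an assumption: TotalBlocks is read as the number of tests whose expected decision is block, and TotalAllows as those whose expected decision is allow. Tests expecting warn count toward GateAccuracy only. The name gate_metrics is hypothetical.

```python
def gate_metrics(results: list[tuple[str, str]]) -> dict[str, float]:
    """Compute gate metrics from (expectedDecision, actualDecision) pairs."""
    total = len(results)
    correct = sum(1 for expected, actual in results if expected == actual)
    # Actual outcomes grouped by what the ground truth expected.
    should_block = [actual for expected, actual in results if expected == "block"]
    should_allow = [actual for expected, actual in results if expected == "allow"]
    false_allow = sum(1 for actual in should_block if actual == "allow")
    false_block = sum(1 for actual in should_allow if actual == "block")
    return {
        "gateAccuracy": correct / total if total else 0.0,
        "falseAllow": false_allow / len(should_block) if should_block else 0.0,
        "falseBlock": false_block / len(should_allow) if should_allow else 0.0,
    }
```

FalseAllow is the figure to watch: a non-zero value means some gate let through a VEX transition that the ground truth says must be blocked.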

6. Test Harness Integration

6.1 xUnit Test Pattern

[Theory]
[MemberData(nameof(GetGroundTruthSamples))]
public async Task ReachabilityAnalysis_MatchesGroundTruth(GroundTruthSample sample)
{
    // Arrange
    var graph = await LoadRichGraphAsync(sample.GraphPath);
    var scorer = _serviceProvider.GetRequiredService<ReachabilityScoringService>();

    // Act
    var result = await scorer.ComputeAsync(graph, sample.EntryPoints);

    // Assert
    foreach (var target in sample.Targets)
    {
        // Assert.Single fails with a clear message if the target is missing or duplicated.
        var actual = Assert.Single(result.States, s => s.SymbolId == target.SymbolId);
        Assert.Equal(target.Expected.LatticeState, actual.LatticeState);
        Assert.Equal(target.Expected.Reachable, actual.Reachable);
        Assert.InRange(actual.Confidence,
            target.Expected.Confidence - 0.05,
            target.Expected.Confidence + 0.05);
    }
}

6.2 Benchmark Runner

# Run reachability benchmarks
dotnet run --project src/Scanner/__Tests/StellaOps.Scanner.Reachability.Benchmarks -- \
  --dataset datasets/reachability/samples \
  --output benchmark-results.json \
  --threshold-f1 0.95 \
  --threshold-gate-accuracy 0.99
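
The two thresholds act as a pass/fail gate on the computed metrics. The sketch below shows that check in isolation. The format of benchmark-results.json is not specified in this document, so the metric keys f1 and gateAccuracy and the helper name failed_thresholds are hypothetical.

```python
def failed_thresholds(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return one message per metric that is missing or below its minimum."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing from results")
        elif value < minimum:
            failures.append(f"{name}: {value:.4f} is below threshold {minimum:.4f}")
    return failures


# Thresholds mirroring the command above; the key names are assumptions.
THRESHOLDS = {"f1": 0.95, "gateAccuracy": 0.99}
```

A CI job could exit non-zero whenever the returned list is non-empty.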

7. Sample Contribution Guidelines

7.1 Adding New Samples

  1. Create a directory under datasets/reachability/samples/{language}/{sample-name}/
  2. Add manifest.json with the sample metadata
  3. Add richgraph-v1.json, generated by running the scanner on the artifacts
  4. Create ground-truth.json with manual annotations
  5. Include reasoning for each expected outcome
  6. Run validation: dotnet test --filter "GroundTruth"

7.2 Ground Truth Validation

Ground truth files must pass schema validation:

npx ajv validate -s docs/reachability/ground-truth.schema.json \
  -d "datasets/reachability/samples/**/ground-truth.json"

7.3 Review Requirements

  • All samples require two independent annotators
  • Contested samples require security team review
  • Changes to existing samples require a passing regression test run


Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial design from Sprint 0401 |