StellaOps Bot 999e26a48e up

2025-12-13 02:22:15 +02:00

Ground Truth Schema for Reachability Datasets

Status: Design v1 (Sprint 0401)
Owners: Scanner Guild, Signals Guild, Quality Guild

This document defines the ground truth schema for test datasets used to validate reachability analysis. Ground truth samples provide known-correct answers for benchmarking lattice state calculations, path discovery, and policy gate decisions.

1. Purpose

Ground truth datasets enable:

Regression testing: Detect regressions in reachability analysis accuracy
Benchmark scoring: Measure precision, recall, F1 for path discovery
Lattice validation: Verify join/meet operations produce expected states
Policy gate testing: Ensure gates block/allow correct VEX transitions

2. Dataset Structure

2.1 Directory Layout

datasets/reachability/
├── samples/
│   ├── java/
│   │   ├── vulnerable-log4j/
│   │   │   ├── manifest.json          # Sample metadata
│   │   │   ├── richgraph-v1.json      # Input callgraph
│   │   │   ├── ground-truth.json      # Expected outcomes
│   │   │   └── artifacts/             # Source binaries/SBOMs
│   │   └── safe-spring-boot/
│   │       └── ...
│   ├── native/
│   │   ├── stripped-elf/
│   │   └── openssl-vuln/
│   └── polyglot/
│       └── node-native-addon/
├── corpus/
│   ├── positive/                      # Known reachable samples
│   ├── negative/                      # Known unreachable samples
│   └── contested/                     # Known conflict samples
└── schema/
    ├── manifest.schema.json
    └── ground-truth.schema.json
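
A minimal sketch of how a tool might walk this layout and flag incomplete samples. It assumes every sample sits exactly two levels below `samples/` (`{language}/{sample-name}/`) and must contain the three JSON files shown above; the function name `find_incomplete_samples` is hypothetical and not part of any StellaOps tooling.

```python
from pathlib import Path

# File names taken from the directory layout above.
REQUIRED_FILES = ("manifest.json", "richgraph-v1.json", "ground-truth.json")


def find_incomplete_samples(samples_root: Path) -> dict[str, list[str]]:
    """Map each incomplete sample (path relative to samples_root) to its missing files."""
    report: dict[str, list[str]] = {}
    language_dirs = sorted(p for p in samples_root.iterdir() if p.is_dir())
    for language_dir in language_dirs:
        sample_dirs = sorted(p for p in language_dir.iterdir() if p.is_dir())
        for sample_dir in sample_dirs:
            missing = [
                name for name in REQUIRED_FILES
                if not (sample_dir / name).is_file()
            ]
            if missing:
                key = sample_dir.relative_to(samples_root).as_posix()
                report[key] = missing
    return report
```

Calling `find_incomplete_samples(Path("datasets/reachability/samples"))` would return an empty dict when every sample directory is complete.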

2.2 Sample Manifest (`manifest.json`)

{
  "sampleId": "sample:java:vulnerable-log4j:001",
  "version": "1.0.0",
  "createdAt": "2025-12-13T10:00:00Z",
  "language": "java",
  "category": "positive",
  "description": "Log4Shell CVE-2021-44228 reachable via JNDI lookup in logging path",
  "source": {
    "repository": "https://github.com/example/vuln-app",
    "commit": "abc123...",
    "buildToolchain": "maven:3.9.0,jdk:17"
  },
  "vulnerabilities": [
    {
      "vulnId": "CVE-2021-44228",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "affectedSymbol": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup"
    }
  ],
  "artifacts": [
    {
      "path": "artifacts/app.jar",
      "hash": "sha256:...",
      "type": "application/java-archive"
    },
    {
      "path": "artifacts/sbom.cdx.json",
      "hash": "sha256:...",
      "type": "application/vnd.cyclonedx+json"
    }
  ]
}
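
The `artifacts` entries pair each file with a hash, which suggests an integrity check before a sample is used. The sketch below assumes the `hash` field has the form `<algorithm>:<hex digest>` (the example above elides the digest, so this format is an assumption) and that `path` is relative to the sample directory. `verify_artifacts` is a hypothetical helper name.

```python
import hashlib
import json
from pathlib import Path


def verify_artifacts(sample_dir: Path) -> list[str]:
    """Return the artifact paths whose content does not match the manifest hash."""
    manifest_text = (sample_dir / "manifest.json").read_text(encoding="utf-8")
    manifest = json.loads(manifest_text)
    mismatches = []
    for artifact in manifest.get("artifacts", []):
        # Assumed format: "sha256:<hex digest>".
        algorithm, _, expected_hex = artifact["hash"].partition(":")
        data = (sample_dir / artifact["path"]).read_bytes()
        actual_hex = hashlib.new(algorithm, data).hexdigest()
        if actual_hex != expected_hex.lower():
            mismatches.append(artifact["path"])
    return mismatches
```

An empty result means every listed artifact matched its recorded digest.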

2.3 Ground Truth Document (`ground-truth.json`)

{
  "schema": "ground-truth-v1",
  "sampleId": "sample:java:vulnerable-log4j:001",
  "generatedAt": "2025-12-13T10:00:00Z",
  "generator": {
    "name": "manual-annotation",
    "version": "1.0.0",
    "annotator": "security-team"
  },
  "targets": [
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CR",
        "bucket": "direct",
        "reachable": true,
        "confidence": 0.95,
        "pathLength": 3,
        "path": [
          "sym:java:...main",
          "sym:java:...logInfo",
          "sym:java:...JndiLookup.lookup"
        ]
      },
      "reasoning": "Direct call path from main() through logging framework to vulnerable lookup method"
    },
    {
      "symbolId": "sym:java:...",
      "display": "org.apache.logging.log4j.core.net.JndiManager.lookup",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
      "expected": {
        "latticeState": "CU",
        "bucket": "unreachable",
        "reachable": false,
        "confidence": 0.90,
        "pathLength": null,
        "path": null
      },
      "reasoning": "JndiManager.lookup is present but not called from any reachable entry point"
    }
  ],
  "entryPoints": [
    {
      "symbolId": "sym:java:...",
      "display": "com.example.app.Main.main",
      "phase": "runtime",
      "source": "manifest"
    }
  ],
  "expectedUncertainty": {
    "states": [],
    "aggregateTier": "T4",
    "riskScore": 0.0
  },
  "expectedGateDecisions": [
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "not_affected",
      "expectedDecision": "block",
      "expectedBlockedBy": "LatticeState",
      "expectedReason": "CR state incompatible with not_affected"
    },
    {
      "vulnId": "CVE-2021-44228",
      "targetSymbol": "sym:java:...JndiLookup.lookup",
      "requestedStatus": "affected",
      "expectedDecision": "allow"
    }
  ]
}
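
The two targets above follow a pattern that can be checked mechanically: the reachable target has a `pathLength` equal to the number of entries in `path`, and the unreachable target has both set to `null`. The sketch below encodes that pattern as an assumption drawn from this example; `check_expected` is a hypothetical name, and a reachable target without a recorded path is deliberately not flagged.

```python
def check_expected(target: dict) -> list[str]:
    """Return consistency problems found in one entry of `targets`."""
    expected = target["expected"]
    path = expected.get("path")
    length = expected.get("pathLength")
    problems = []
    if expected["reachable"]:
        # Only compare when both fields are present; they are optional.
        if path is not None and length is not None and length != len(path):
            problems.append(
                f"pathLength is {length} but path has {len(path)} entries"
            )
    elif path is not None or length is not None:
        problems.append("unreachable target should have null path and pathLength")
    return problems
```

Applied to both targets in the document above, this returns an empty list for each.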

3. Schema Definitions

3.1 Ground Truth Target

Field	Type	Required	Description
`symbolId`	string	Yes	Canonical SymbolID (`sym:{lang}:{hash}`)
`display`	string	No	Human-readable symbol name
`purl`	string	No	Package URL of containing package
`expected.latticeState`	enum	Yes	Expected v1 lattice state: `U`, `SR`, `SU`, `RO`, `RU`, `CR`, `CU`, `X`
`expected.bucket`	enum	Yes	Expected v0 bucket (backward compat)
`expected.reachable`	boolean	Yes	True if symbol is reachable from any entry point
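`expected.confidence`	number	Yes	Expected confidence score in the range [0.0, 1.0]
`expected.pathLength`	number	No	Expected number of symbols on the path, entry point and target included (null if unreachable)
`expected.path`	string[]	No	Expected path in call order from entry point to target; must be deterministic (null if unreachable)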
`reasoning`	string	Yes	Human explanation of expected outcome
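
As a rough illustration of the required fields and value constraints in this table, the sketch below validates a single target in plain Python. It is not a substitute for `ground-truth.schema.json`; the helper name `validate_target` is hypothetical, and because the allowed `bucket` values are not listed here, the sketch only checks that `bucket` is a string.

```python
# Lattice states as listed in the table above.
LATTICE_STATES = {"U", "SR", "SU", "RO", "RU", "CR", "CU", "X"}


def validate_target(target: dict) -> list[str]:
    """Return a list of field-level errors for one ground truth target."""
    errors = []
    for field in ("symbolId", "reasoning"):
        if not isinstance(target.get(field), str):
            errors.append(f"{field}: required string")
    expected = target.get("expected")
    if not isinstance(expected, dict):
        return errors + ["expected: required object"]
    if expected.get("latticeState") not in LATTICE_STATES:
        errors.append("expected.latticeState: not a known lattice state")
    if not isinstance(expected.get("bucket"), str):
        errors.append("expected.bucket: required string")
    if not isinstance(expected.get("reachable"), bool):
        errors.append("expected.reachable: required boolean")
    confidence = expected.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        errors.append("expected.confidence: required number in [0.0, 1.0]")
    return errors
```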

3.2 Expected Gate Decision

Field	Type	Required	Description
`vulnId`	string	Yes	Vulnerability identifier
`targetSymbol`	string	Yes	Target SymbolID
`requestedStatus`	enum	Yes	VEX status: `affected`, `not_affected`, `under_investigation`, `fixed`
`expectedDecision`	enum	Yes	Gate outcome: `allow`, `block`, `warn`
`expectedBlockedBy`	string	No	Gate name if blocked
`expectedReason`	string	No	Expected reason message

4. Sample Categories

4.1 Positive Samples (Reachable)

Known-reachable cases where vulnerable code is called:

direct-call: Vulnerable function called directly from entry point
transitive: Multi-hop path from entry point to vulnerable function
runtime-observed: Confirmed reachable via runtime probe
init-array: Reachable via load-time constructor

4.2 Negative Samples (Unreachable)

Known-unreachable cases where vulnerable code exists but isn't called:

dead-code: Function present but never invoked
conditional-unreachable: Function behind impossible condition
test-only: Function only reachable from test entry points
deprecated-api: Old API present but replaced by new implementation

4.3 Contested Samples

Cases where static and runtime evidence conflict:

static-reach-runtime-miss: Static analysis finds path, runtime never observes
static-miss-runtime-hit: Static analysis misses path, runtime observes execution
version-mismatch: Analysis version differs from runtime version

5. Benchmark Metrics

5.1 Path Discovery Metrics

Precision = TruePositive / (TruePositive + FalsePositive)
Recall    = TruePositive / (TruePositive + FalseNegative)
F1        = 2 * (Precision * Recall) / (Precision + Recall)
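
A small sketch of these formulas, assuming a "positive" is a target whose `expected.reachable` is true and that a target missing from the analysis output counts as not reachable. The function name and the choice to return 0.0 when a denominator is zero are illustrative assumptions.

```python
def path_discovery_metrics(
    expected: dict[str, bool], actual: dict[str, bool]
) -> dict[str, float]:
    """Compute precision, recall, and F1 from symbolId -> reachable maps."""
    tp = fp = fn = 0
    for symbol_id, should_reach in expected.items():
        did_reach = actual.get(symbol_id, False)
        if should_reach and did_reach:
            tp += 1
        elif did_reach:
            fp += 1  # reported reachable, ground truth says unreachable
        elif should_reach:
            fn += 1  # ground truth says reachable, analysis missed it
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, with one true positive, one false positive, and one false negative, all three metrics come out at 0.5.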

5.2 Lattice State Accuracy

StateAccuracy = CorrectStates / TotalTargets
BucketAccuracy = CorrectBuckets / TotalTargets  (v0 compatibility)
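
Both ratios have the same shape, so one helper can serve for lattice states and v0 buckets. The sketch below assumes that a target missing from the analysis output counts as incorrect; `label_accuracy` is a hypothetical name.

```python
def label_accuracy(expected: dict[str, str], actual: dict[str, str]) -> float:
    """Fraction of targets whose actual label equals the expected label."""
    if not expected:
        return 0.0
    correct = sum(
        1 for symbol_id, label in expected.items()
        if actual.get(symbol_id) == label
    )
    return correct / len(expected)
```

Passing `symbolId -> latticeState` maps yields StateAccuracy; passing `symbolId -> bucket` maps yields BucketAccuracy.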

5.3 Gate Decision Accuracy

GateAccuracy = CorrectDecisions / TotalGateTests
FalseAllow   = AllowedWhenShouldBlock / TotalBlocks  (critical metric)
FalseBlock   = BlockedWhenShouldAllow / TotalAllows
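
One possible reading of these formulas, sketched below, takes `TotalBlocks` as the number of gate tests whose `expectedDecision` is `block` and `TotalAllows` as the number whose `expectedDecision` is `allow`. That interpretation, the function name, and the 0.0 result for empty denominators are assumptions.

```python
def gate_metrics(results: list[tuple[str, str]]) -> dict[str, float]:
    """Compute gate metrics from (expectedDecision, actualDecision) pairs."""
    total = len(results)
    correct = sum(1 for expected, actual in results if expected == actual)
    # Actual outcomes grouped by what the ground truth expected.
    when_block_expected = [a for e, a in results if e == "block"]
    when_allow_expected = [a for e, a in results if e == "allow"]
    false_allow = (
        when_block_expected.count("allow") / len(when_block_expected)
        if when_block_expected else 0.0
    )
    false_block = (
        when_allow_expected.count("block") / len(when_allow_expected)
        if when_allow_expected else 0.0
    )
    return {
        "gate_accuracy": correct / total if total else 0.0,
        "false_allow": false_allow,
        "false_block": false_block,
    }
```

Under this reading a `warn` outcome lowers GateAccuracy when it was not expected, but is counted as neither a false allow nor a false block.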

6. Test Harness Integration

6.1 xUnit Test Pattern

[Theory]
[MemberData(nameof(GetGroundTruthSamples))]
public async Task ReachabilityAnalysis_MatchesGroundTruth(GroundTruthSample sample)
{
    // Arrange
    var graph = await LoadRichGraphAsync(sample.GraphPath);
    var scorer = _serviceProvider.GetRequiredService<ReachabilityScoringService>();

    // Act
    var result = await scorer.ComputeAsync(graph, sample.EntryPoints);

    // Assert
    foreach (var target in sample.Targets)
    {
        var actual = result.States.First(s => s.SymbolId == target.SymbolId);
        Assert.Equal(target.Expected.LatticeState, actual.LatticeState);
        Assert.Equal(target.Expected.Reachable, actual.Reachable);
        Assert.InRange(actual.Confidence,
            target.Expected.Confidence - 0.05,
            target.Expected.Confidence + 0.05);
    }
}

6.2 Benchmark Runner

# Run reachability benchmarks
dotnet run --project src/Scanner/__Tests/StellaOps.Scanner.Reachability.Benchmarks \
  --dataset datasets/reachability/samples \
  --output benchmark-results.json \
  --threshold-f1 0.95 \
  --threshold-gate-accuracy 0.99
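
The threshold flags suggest that a run fails when a metric falls below its minimum. The format of `benchmark-results.json` is not specified in this document, so the sketch below works on plain dictionaries with hypothetical keys (`f1`, `gate_accuracy`) and only illustrates the comparison step, not how the runner actually behaves.

```python
def failed_thresholds(
    metrics: dict[str, float], thresholds: dict[str, float]
) -> list[str]:
    """Describe every metric that falls below its minimum threshold."""
    failures = []
    for name, minimum in thresholds.items():
        # A metric that was never reported is treated as 0.0.
        value = metrics.get(name, 0.0)
        if value < minimum:
            failures.append(f"{name}: {value:.3f} < {minimum:.3f}")
    return failures
```

A CI wrapper could exit with a non-zero status whenever the returned list is non-empty.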

7. Sample Contribution Guidelines

7.1 Adding New Samples

Create directory under datasets/reachability/samples/{language}/{sample-name}/
Add manifest.json with sample metadata
Add richgraph-v1.json (run scanner on artifacts)
Create ground-truth.json with manual annotations
Include reasoning for each expected outcome
Run validation: dotnet test --filter "GroundTruth"

7.2 Ground Truth Validation

Ground truth files must pass schema validation:

npx ajv validate -s docs/reachability/ground-truth.schema.json \
  -d "datasets/reachability/samples/**/ground-truth.json"

7.3 Review Requirements

All samples require two independent annotators
Contested samples require security team review
Changes to existing samples require regression test pass

Related Documentation

Lattice Model: v1 formal 7-state lattice
Policy Gates: Gate rules for VEX decisions
Evidence Schema: richgraph-v1 schema
richgraph-v1 Contract: Full schema specification

Changelog

Version	Date	Author	Changes
1.0.0	2025-12-13	Scanner Guild	Initial design from Sprint 0401
