up

2025-12-13 02:22:15 +02:00
parent 564df71bfb
commit 999e26a48e
395 changed files with 25045 additions and 2224 deletions
--- a/datasets/reachability/README.md
+++ b/datasets/reachability/README.md
@@ -0,0 +1,87 @@
+# Reachability Test Datasets
+
+This directory contains ground truth samples for validating reachability analysis accuracy.
+
+## Directory Structure
+
+```
+datasets/reachability/
+├── README.md                     # This file
+├── samples/                      # Test samples by language
+│   ├── csharp/
+│   │   ├── simple-reachable/     # Positive: direct call path
+│   │   └── dead-code/            # Negative: unreachable code
+│   ├── java/
+│   │   └── vulnerable-log4j/     # Positive: Log4Shell CVE
+│   └── native/
+│       └── stripped-elf/         # Positive: stripped binary
+└── schema/
+    ├── manifest.schema.json      # Sample manifest schema
+    └── ground-truth.schema.json  # Ground truth schema
+```
+
+## Sample Categories
+
+### Positive (Reachable)
+Samples in which a confirmed path leads from an entry point to the vulnerable code:
+- `csharp/simple-reachable` - Direct call to vulnerable API
+- `java/vulnerable-log4j` - Log4Shell (CVE-2021-44228) with runtime confirmation
+- `native/stripped-elf` - Stripped ELF with heuristic analysis
+
+### Negative (Unreachable)
+Samples where vulnerable code exists but is never called:
+- `csharp/dead-code` - Deprecated API replaced by safe implementation
+
+## Schema Reference
+
+### manifest.json
+Sample metadata including:
+- `sampleId` - Unique identifier
+- `language` - Primary language (java, csharp, native, etc.)
+- `category` - positive, negative, or contested
+- `vulnerabilities` - CVEs and affected symbols
+- `artifacts` - Binary/SBOM file references
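+
+As a rough illustration, the fields above could appear in a manifest for the `csharp/dead-code` sample as sketched below. This is a Python sketch, not the schema: the nested shapes of `vulnerabilities` and `artifacts`, the placeholder identifiers, the assumption that all five fields are required, and the `missing_manifest_fields` helper are hypothetical. `schema/manifest.schema.json` remains authoritative.
+
+```python
+# Hypothetical manifest contents; nested field shapes and values are assumptions.
+manifest = {
+    "sampleId": "csharp-dead-code",
+    "language": "csharp",
+    "category": "negative",
+    "vulnerabilities": [
+        # Placeholder CVE id and symbol name, for illustration only.
+        {"id": "CVE-0000-0000", "symbols": ["Example.Legacy.Parse"]},
+    ],
+    "artifacts": [{"path": "bin/sample.dll"}],  # placeholder path
+}
+
+# Top-level fields listed in this README; treating all as required is an assumption.
+REQUIRED_FIELDS = ("sampleId", "language", "category", "vulnerabilities", "artifacts")
+ALLOWED_CATEGORIES = {"positive", "negative", "contested"}
+
+
+def missing_manifest_fields(doc):
+    """Return the expected top-level fields that are absent from doc."""
+    return [name for name in REQUIRED_FIELDS if name not in doc]
+```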
+
+### ground-truth.json
+Expected outcomes including:
+- `targets` - Symbols with expected lattice states
+- `entryPoints` - Program entry points
+- `expectedUncertainty` - Expected uncertainty tier
+- `expectedGateDecisions` - Expected policy gate outcomes
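+
+A ground-truth file might look roughly like the following Python sketch. Only the four top-level names and the per-target `reasoning` field come from this README. The `symbol` and `expected` keys, the tier label, and the `targets_missing_reasoning` helper are assumptions made for illustration, so check `schema/ground-truth.schema.json` for the real shape.
+
+```python
+# Hypothetical ground-truth contents; per-target keys and values are assumptions.
+ground_truth = {
+    "targets": [
+        {
+            "symbol": "Example.Legacy.Parse",  # placeholder symbol
+            "expected": "SU",                  # lattice code, see "Lattice States"
+            "reasoning": "Legacy parser is never called from any entry point.",
+        },
+    ],
+    "entryPoints": ["Example.Program.Main"],   # placeholder entry point
+    "expectedUncertainty": "T1",               # placeholder tier label
+    "expectedGateDecisions": [],
+}
+
+
+def targets_missing_reasoning(doc):
+    """Return the symbols of targets whose reasoning is absent or blank."""
+    return [
+        target.get("symbol", "<unnamed>")
+        for target in doc["targets"]
+        if not target.get("reasoning", "").strip()
+    ]
+```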
+
+## Lattice States
+
+| Code | Name | Description |
+|------|------|-------------|
+| U | Unknown | No analysis performed |
+| SR | StaticallyReachable | Static analysis finds a path |
+| SU | StaticallyUnreachable | Static analysis finds no path |
+| RO | RuntimeObserved | Runtime probe observed execution |
+| RU | RuntimeUnobserved | Runtime probe did not observe execution |
+| CR | ConfirmedReachable | Static and runtime analysis both confirm reachability |
+| CU | ConfirmedUnreachable | Static and runtime analysis both confirm unreachability |
+| X | Contested | Static and runtime evidence conflict |
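+
+The table can be mirrored in code as an enumeration plus a small lookup that combines one static verdict with one runtime verdict. The sketch below is an assumption drawn only from the descriptions in the table, not the scanner's actual lattice implementation. The table does not say how `SR` combines with `RU`, so that pair is left undefined here and `combine` returns `None` for it; see the Lattice Model document for the real rules.
+
+```python
+from enum import Enum
+
+
+class LatticeState(Enum):
+    U = "Unknown"
+    SR = "StaticallyReachable"
+    SU = "StaticallyUnreachable"
+    RO = "RuntimeObserved"
+    RU = "RuntimeUnobserved"
+    CR = "ConfirmedReachable"
+    CU = "ConfirmedUnreachable"
+    X = "Contested"
+
+
+# Assumed combination rules, inferred from the table descriptions only.
+_COMBINED = {
+    (LatticeState.SR, LatticeState.RO): LatticeState.CR,  # both say reachable
+    (LatticeState.SU, LatticeState.RU): LatticeState.CU,  # both say unreachable
+    (LatticeState.SU, LatticeState.RO): LatticeState.X,   # evidence conflicts
+}
+
+
+def combine(static, runtime):
+    """Combine a static verdict (SR/SU) with a runtime verdict (RO/RU).
+
+    Returns None for pairs this sketch does not define.
+    """
+    if static not in (LatticeState.SR, LatticeState.SU):
+        raise ValueError(f"not a static verdict: {static}")
+    if runtime not in (LatticeState.RO, LatticeState.RU):
+        raise ValueError(f"not a runtime verdict: {runtime}")
+    return _COMBINED.get((static, runtime))
+```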
+
+## Running Tests
+
+```bash
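+# Validate schemas (paths are relative to datasets/reachability/; the `ajv` binary comes from ajv-cli).
+# The globs are quoted so that ajv-cli expands `**` itself instead of the shell.
+npx ajv validate -s schema/ground-truth.schema.json -d "samples/**/ground-truth.json"
+npx ajv validate -s schema/manifest.schema.json -d "samples/**/manifest.json"
+
+# Run benchmark tests (path is relative to the repository root)
+dotnet test --filter "GroundTruth" src/Scanner/__Tests/StellaOps.Scanner.Reachability.Benchmarks/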
+```
+
+## Adding New Samples
+
+1. Create directory: `samples/{language}/{sample-name}/`
+2. Add `manifest.json` with sample metadata
+3. Add `ground-truth.json` with expected outcomes
+4. Include `reasoning` for each target explaining the expected state
+5. Validate both files against their schemas before committing
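+
+Before committing, a quick layout check can catch a sample directory that lacks one of the two required files. The helper below is a hypothetical sketch, not part of the repository's tooling. It assumes the `samples/{language}/{sample-name}/` layout described above and does not replace schema validation.
+
+```python
+from pathlib import Path
+
+REQUIRED_FILES = ("manifest.json", "ground-truth.json")
+
+
+def incomplete_samples(samples_root):
+    """Map each {language}/{sample-name} directory to the required files it lacks."""
+    problems = {}
+    for sample_dir in sorted(Path(samples_root).glob("*/*")):
+        if not sample_dir.is_dir():
+            continue
+        missing = [name for name in REQUIRED_FILES if not (sample_dir / name).is_file()]
+        if missing:
+            problems[sample_dir.relative_to(samples_root).as_posix()] = missing
+    return problems
+```
+
+Calling `incomplete_samples("samples")` from `datasets/reachability/` returns an empty dictionary when every sample directory contains both files.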
+
+## Related Documentation
+
+- [Ground Truth Schema](../../docs/reachability/ground-truth-schema.md)
+- [Lattice Model](../../docs/reachability/lattice.md)
+- [Policy Gates](../../docs/reachability/policy-gate.md)