# Scanner Reachability Ground-Truth Corpus

This document defines the deterministic toy-service corpus used to validate reachability tier classification quality in Scanner tests.

## Location
- `src/Scanner/__Tests/__Datasets/toys/`

## Service Set
- `svc-01-log4shell-java`
- `svc-02-prototype-pollution-node`
- `svc-03-pickle-deserialization-python`
- `svc-04-text-template-go`
- `svc-05-xmlserializer-dotnet`
- `svc-06-erb-injection-ruby`

Each service contains:
- Minimal source code with a known vulnerability pattern.
- `labels.yaml` with tier ground truth for one or more CVEs.

## labels.yaml Contract (v1)
- Required top-level fields: `schema_version`, `service`, `language`, `entrypoint`, `cves`.
- Each CVE entry requires: `id`, `package`, `tier`, `rationale`.
- Allowed tier values:
  - `R0`: unreachable
  - `R1`: present in dependency only
  - `R2`: imported but not called
  - `R3`: called but not reachable from entrypoint
  - `R4`: reachable from entrypoint

## Deterministic Validation Harness
- Test suite: `src/Scanner/__Tests/StellaOps.Scanner.Reachability.Tests/Benchmarks/ReachabilityTierCorpusTests.cs`
- Harness capabilities:
  - Validates corpus structure and required schema fields.
  - Verifies `R0..R4` coverage across the toy corpus.
  - Maps `R0..R4` into Scanner confidence tiers for compatibility checks.
  - Computes precision, recall, and F1 per tier using deterministic ordering.

## Offline Posture
- No external network access is required for corpus loading or metric computation.
- Dataset files are copied into test output for stable local/CI execution.
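
## Illustrative Sketch: Contract Check and Per-Tier Metrics

The `labels.yaml` contract and the per-tier metric computation described above can be sketched in a few lines of Python. This is not the actual harness, which is the C# test suite listed under Deterministic Validation Harness. The function names `validate_labels` and `per_tier_metrics` are hypothetical, the input is assumed to be an already-parsed `labels.yaml` (a plain dict), and two conventions are assumptions rather than documented behavior: a missing prediction counts as a false negative, and a metric with a zero denominator is reported as `0.0`.

```python
from collections import Counter

# Field and tier names come from the labels.yaml contract (v1).
REQUIRED_TOP = ("schema_version", "service", "language", "entrypoint", "cves")
REQUIRED_CVE = ("id", "package", "tier", "rationale")
TIERS = ("R0", "R1", "R2", "R3", "R4")


def validate_labels(doc):
    """Return a sorted list of contract violations for one parsed labels.yaml.

    Hypothetical helper: `doc` is a dict as a YAML parser would produce.
    """
    errors = [f"missing top-level field: {k}" for k in REQUIRED_TOP if k not in doc]
    for i, cve in enumerate(doc.get("cves", [])):
        errors += [f"cves[{i}] missing field: {k}" for k in REQUIRED_CVE if k not in cve]
        if "tier" in cve and cve["tier"] not in TIERS:
            errors.append(f"cves[{i}] has unknown tier: {cve['tier']}")
    return sorted(errors)


def per_tier_metrics(expected, predicted):
    """Compute (precision, recall, F1) for each tier.

    `expected` and `predicted` map (service, cve_id) -> tier.
    Assumption: a key missing from `predicted` is a false negative
    for the expected tier and a false positive for no tier.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for key in sorted(expected):  # sorted keys give a deterministic order
        want, got = expected[key], predicted.get(key)
        if got == want:
            tp[want] += 1
        else:
            fn[want] += 1
            if got is not None:
                fp[got] += 1

    metrics = {}
    for tier in TIERS:  # fixed tier order keeps the report stable
        p_den = tp[tier] + fp[tier]
        r_den = tp[tier] + fn[tier]
        # Assumption: zero denominators yield 0.0 rather than an error.
        precision = tp[tier] / p_den if p_den else 0.0
        recall = tp[tier] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        metrics[tier] = (precision, recall, f1)
    return metrics
```

Iterating over sorted keys and a fixed tier tuple is one simple way to make the output independent of file-system or dictionary ordering, which is the property the harness description calls deterministic ordering.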