# Scanner Reachability Ground-Truth Corpus

This document defines the deterministic toy-service corpus used to validate
the quality of reachability tier classification in Scanner tests.

## Location
- `src/Scanner/__Tests/__Datasets/toys/`

## Service Set
- `svc-01-log4shell-java`
- `svc-02-prototype-pollution-node`
- `svc-03-pickle-deserialization-python`
- `svc-04-text-template-go`
- `svc-05-xmlserializer-dotnet`
- `svc-06-erb-injection-ruby`

Each service contains:
- Minimal source code with a known vulnerability pattern.
- `labels.yaml` with tier ground truth for one or more CVEs.
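A structural check over the layout above could look like the following minimal sketch. It is not the harness's actual code: the function name `missing_label_files` is hypothetical, and the sketch only assumes that each service directory listed above sits directly under the corpus root and holds its own `labels.yaml`.

```python
from pathlib import Path

# Service directories as listed in this document.
SERVICES = (
    "svc-01-log4shell-java",
    "svc-02-prototype-pollution-node",
    "svc-03-pickle-deserialization-python",
    "svc-04-text-template-go",
    "svc-05-xmlserializer-dotnet",
    "svc-06-erb-injection-ruby",
)


def missing_label_files(corpus_root):
    """Return the services (in listed order) that lack a labels.yaml file.

    Hypothetical helper: assumes <corpus_root>/<service>/labels.yaml.
    """
    root = Path(corpus_root)
    return [
        name
        for name in SERVICES
        if not (root / name / "labels.yaml").is_file()
    ]
```

Calling `missing_label_files("src/Scanner/__Tests/__Datasets/toys")` from the repository root would be expected to return an empty list for a complete corpus.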

## labels.yaml Contract (v1)
- Required top-level fields: `schema_version`, `service`, `language`, `entrypoint`, `cves`.
- Each CVE entry requires: `id`, `package`, `tier`, `rationale`.
- Allowed tier values:
  - `R0`: unreachable
  - `R1`: present in dependency only
  - `R2`: imported but not called
  - `R3`: called but not reachable from entrypoint
  - `R4`: reachable from entrypoint
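The contract above can be expressed as a small validator. This is an illustrative sketch, not the harness's implementation: it operates on a dictionary that has already been parsed from `labels.yaml`, the function name `validate_labels` is hypothetical, and every value in the example document (schema version, entrypoint, tier, rationale) is a placeholder rather than actual corpus content.

```python
REQUIRED_TOP_LEVEL = ("schema_version", "service", "language", "entrypoint", "cves")
REQUIRED_CVE = ("id", "package", "tier", "rationale")
ALLOWED_TIERS = ("R0", "R1", "R2", "R3", "R4")


def validate_labels(doc):
    """Return a sorted list of contract violations; an empty list means valid."""
    errors = []
    for field in REQUIRED_TOP_LEVEL:
        if field not in doc:
            errors.append(f"missing top-level field: {field}")

    cves = doc.get("cves")
    if "cves" in doc and (not isinstance(cves, list) or not cves):
        # Each service carries ground truth for one or more CVEs.
        errors.append("cves must be a non-empty list")
    if not isinstance(cves, list):
        cves = []

    for index, cve in enumerate(cves):
        for field in REQUIRED_CVE:
            if field not in cve:
                errors.append(f"cves[{index}]: missing field: {field}")
        tier = cve.get("tier")
        if tier is not None and tier not in ALLOWED_TIERS:
            errors.append(f"cves[{index}]: invalid tier: {tier}")

    # Sorting keeps the error report deterministic.
    return sorted(errors)


# Placeholder document; values are illustrative, not taken from the corpus.
example = {
    "schema_version": 1,
    "service": "svc-01-log4shell-java",
    "language": "java",
    "entrypoint": "Main.java",
    "cves": [
        {
            "id": "CVE-2021-44228",
            "package": "org.apache.logging.log4j:log4j-core",
            "tier": "R4",
            "rationale": "placeholder rationale",
        }
    ],
}
```

Here `validate_labels(example)` returns an empty list, while a document with a tier outside `R0..R4` or a missing required field yields one message per violation.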

## Deterministic Validation Harness
- Test suite: `src/Scanner/__Tests/StellaOps.Scanner.Reachability.Tests/Benchmarks/ReachabilityTierCorpusTests.cs`
- Harness capabilities:
  - Validates corpus structure and required schema fields.
  - Verifies `R0..R4` coverage across the toy corpus.
  - Maps `R0..R4` to Scanner confidence tiers for compatibility checks.
  - Computes precision, recall, and F1 per tier using deterministic ordering.
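The coverage and metric steps above can be sketched in a few lines. The actual harness is the C# test suite referenced above; this Python version only illustrates the arithmetic under stated assumptions. The names `missing_tiers` and `per_tier_metrics` are hypothetical, inputs are `(expected_tier, predicted_tier)` pairs, and a metric whose denominator is zero is reported as `0.0`, which is a convention chosen here rather than one stated by the source.

```python
TIERS = ("R0", "R1", "R2", "R3", "R4")


def missing_tiers(expected_tiers):
    """Return the tiers, in fixed order, that no label in the corpus uses."""
    seen = set(expected_tiers)
    return [tier for tier in TIERS if tier not in seen]


def per_tier_metrics(pairs):
    """Compute (precision, recall, f1) per tier from (expected, predicted) pairs.

    Tiers are visited in the fixed TIERS order, so the result is
    deterministic regardless of the order of the input pairs.
    """
    pairs = list(pairs)
    metrics = {}
    for tier in TIERS:
        tp = sum(1 for exp, pred in pairs if exp == tier and pred == tier)
        fp = sum(1 for exp, pred in pairs if exp != tier and pred == tier)
        fn = sum(1 for exp, pred in pairs if exp == tier and pred != tier)
        # Assumed convention: an undefined ratio is reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[tier] = (precision, recall, f1)
    return metrics
```

For example, with the pairs `[("R4", "R4"), ("R4", "R3"), ("R3", "R3"), ("R0", "R0")]`, tier `R4` has one true positive and one false negative, giving precision 1.0 and recall 0.5.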

## Offline Posture
- No external network access is required for corpus loading or metric computation.
- Dataset files are copied into the test output directory for stable execution both locally and in CI.