Scanner Reachability Ground-Truth Corpus
This document defines the deterministic toy-service corpus used to validate reachability tier classification quality in Scanner tests.
Location
src/Scanner/__Tests/__Datasets/toys/
Service Set
- svc-01-log4shell-java
- svc-02-prototype-pollution-node
- svc-03-pickle-deserialization-python
- svc-04-text-template-go
- svc-05-xmlserializer-dotnet
- svc-06-erb-injection-ruby
Each service contains:
- Minimal source code with a known vulnerability pattern.
- A labels.yaml file with tier ground truth for one or more CVEs.
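The per-service layout can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the Scanner harness (which is a C# test suite); the function name find_services_missing_labels is hypothetical, and the corpus root is assumed to be supplied by the caller.

```python
from pathlib import Path


def find_services_missing_labels(corpus_root: Path) -> list[str]:
    """Return the names of service directories that lack a labels.yaml file.

    Directories are visited in sorted order so the result is deterministic
    regardless of the filesystem's native listing order.
    """
    missing = []
    for service_dir in sorted(p for p in corpus_root.iterdir() if p.is_dir()):
        if not (service_dir / "labels.yaml").is_file():
            missing.append(service_dir.name)
    return missing
```

An empty return value means every service directory under the given root carries its ground-truth file.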
labels.yaml Contract (v1)
- Required top-level fields: schema_version, service, language, entrypoint, cves.
- Each CVE entry requires: id, package, tier, rationale.
- Allowed tier values:
  - R0: unreachable
  - R1: present in dependency only
  - R2: imported but not called
  - R3: called but not reachable from entrypoint
  - R4: reachable from entrypoint
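The contract above can be expressed as a small validator. This is a sketch in Python under stated assumptions, not the harness implementation: it operates on an already-parsed labels.yaml represented as a dict, the function name validate_labels is hypothetical, and every value in the example document (schema version literal, entrypoint path, package name, rationale) is an illustrative placeholder rather than the content of a real corpus file.

```python
REQUIRED_TOP_LEVEL = ("schema_version", "service", "language", "entrypoint", "cves")
REQUIRED_CVE_FIELDS = ("id", "package", "tier", "rationale")
ALLOWED_TIERS = ("R0", "R1", "R2", "R3", "R4")


def validate_labels(doc: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the document passes."""
    errors = [
        f"missing top-level field: {name}"
        for name in REQUIRED_TOP_LEVEL
        if name not in doc
    ]
    for index, cve in enumerate(doc.get("cves") or []):
        for name in REQUIRED_CVE_FIELDS:
            if name not in cve:
                errors.append(f"cves[{index}]: missing field: {name}")
        tier = cve.get("tier")
        if tier is not None and tier not in ALLOWED_TIERS:
            errors.append(f"cves[{index}]: tier {tier!r} is not one of R0..R4")
    return errors


# Hypothetical example values; the real labels.yaml files may differ.
example = {
    "schema_version": 1,
    "service": "svc-01-log4shell-java",
    "language": "java",
    "entrypoint": "src/main/java/App.java",
    "cves": [
        {
            "id": "CVE-2021-44228",
            "package": "log4j-core",
            "tier": "R4",
            "rationale": "placeholder rationale",
        },
    ],
}

print(validate_labels(example))  # prints []
```

Collecting all violations into a list, rather than raising on the first one, lets a test report every problem in a labels file in a single run.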
Deterministic Validation Harness
- Test suite: src/Scanner/__Tests/StellaOps.Scanner.Reachability.Tests/Benchmarks/ReachabilityTierCorpusTests.cs
- Harness capabilities:
  - Validates corpus structure and required schema fields.
  - Verifies R0..R4 coverage across the toy corpus.
  - Maps R0..R4 into Scanner confidence tiers for compatibility checks.
  - Computes precision, recall, and F1 per tier using deterministic ordering.
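The coverage check and the per-tier metrics can be illustrated as follows. This is a Python sketch, not the code in ReachabilityTierCorpusTests.cs; the function names missing_tiers and per_tier_metrics are hypothetical. It assumes one-vs-rest counting per tier and reports 0.0 whenever a denominator is zero, which is a convention chosen here and not necessarily the one the harness uses. The mapping from R0..R4 to Scanner confidence tiers is not shown because this document does not define it.

```python
TIERS = ("R0", "R1", "R2", "R3", "R4")


def missing_tiers(expected: list[str]) -> list[str]:
    """Return tiers that have no ground-truth example, in R0..R4 order."""
    present = set(expected)
    return [tier for tier in TIERS if tier not in present]


def per_tier_metrics(expected: list[str], predicted: list[str]) -> dict[str, dict[str, float]]:
    """Compute one-vs-rest precision, recall, and F1 for each tier."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    pairs = list(zip(expected, predicted))
    metrics = {}
    for tier in TIERS:  # fixed iteration order keeps the output deterministic
        tp = sum(1 for e, p in pairs if e == tier and p == tier)
        fp = sum(1 for e, p in pairs if e != tier and p == tier)
        fn = sum(1 for e, p in pairs if e == tier and p != tier)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[tier] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

For example, with ground truth R0, R1, R2, R3, R4, R4 and predictions R0, R1, R2, R4, R4, R4, tier R4 has two true positives and one false positive, giving precision 2/3, recall 1.0, and F1 0.8, while R3 scores 0.0 on all three.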
Offline Posture
- No external network access is required for corpus loading or metric computation.
- Dataset files are copied into test output for stable local/CI execution.