master 4bdc298ec1 partly or unimplemented features - now implemented

2026-02-09 08:53:51 +02:00

Scanner Reachability Ground-Truth Corpus

This document defines the deterministic toy-service corpus used to validate reachability tier classification quality in Scanner tests.

Location

src/Scanner/__Tests/__Datasets/toys/

Service Set

svc-01-log4shell-java
svc-02-prototype-pollution-node
svc-03-pickle-deserialization-python
svc-04-text-template-go
svc-05-xmlserializer-dotnet
svc-06-erb-injection-ruby

Each service contains:

- Minimal source code with a known vulnerability pattern.
- labels.yaml with tier ground truth for one or more CVEs.
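
The structural expectation above (every service directory carries a labels.yaml) can be checked with a few lines of code. The following is a minimal Python sketch, not the actual harness, which lives in the C# test suite. The helper names `find_services` and `missing_labels` are hypothetical, and the sketch assumes each service is a direct subdirectory of the corpus root with labels.yaml at its top level.

```python
from pathlib import Path

LABELS_FILE = "labels.yaml"


def find_services(corpus_root):
    """Return (service_name, has_labels) pairs for each service directory.

    Directories are sorted by path so the result does not depend on
    filesystem enumeration order.
    """
    root = Path(corpus_root)
    service_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    return [(p.name, (p / LABELS_FILE).is_file()) for p in service_dirs]


def missing_labels(corpus_root):
    """Return the names of service directories that lack labels.yaml."""
    return [name for name, has_labels in find_services(corpus_root) if not has_labels]
```

Calling `missing_labels("src/Scanner/__Tests/__Datasets/toys")` from the repository root would then return an empty list if every service directory is complete.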

labels.yaml Contract (v1)

Required top-level fields: schema_version, service, language, entrypoint, cves.
Each CVE entry requires: id, package, tier, rationale.
Allowed tier values:
- R0: unreachable
- R1: present in dependency only
- R2: imported but not called
- R3: called but not reachable from entrypoint
- R4: reachable from entrypoint
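
As an illustration of the contract above, the sketch below validates an already-parsed labels.yaml document represented as a Python dict. It is a hedged example rather than the harness's real implementation. The function name `validate_labels`, the error message wording, and the values in `EXAMPLE` (the `schema_version` spelling, the entrypoint path, the tier assignment, and the rationale text) are assumptions made for illustration. Only the field names and the allowed tier values come from this document.

```python
REQUIRED_TOP_LEVEL = ("schema_version", "service", "language", "entrypoint", "cves")
REQUIRED_CVE_FIELDS = ("id", "package", "tier", "rationale")
ALLOWED_TIERS = ("R0", "R1", "R2", "R3", "R4")


def validate_labels(doc):
    """Return a list of contract violations; an empty list means the doc is valid."""
    errors = []
    for field in REQUIRED_TOP_LEVEL:
        if field not in doc:
            errors.append(f"missing top-level field: {field}")

    cves = doc.get("cves")
    if cves is None:
        return errors  # already reported above as a missing field
    if not isinstance(cves, list) or not cves:
        errors.append("cves must be a non-empty list")
        return errors

    for index, entry in enumerate(cves):
        for field in REQUIRED_CVE_FIELDS:
            if field not in entry:
                errors.append(f"cves[{index}]: missing field: {field}")
        if "tier" in entry and entry["tier"] not in ALLOWED_TIERS:
            errors.append(f"cves[{index}]: invalid tier: {entry['tier']}")
    return errors


# Illustrative document only; the values are assumptions, not corpus content.
EXAMPLE = {
    "schema_version": "v1",                   # assumed spelling of the version
    "service": "svc-01-log4shell-java",
    "language": "java",
    "entrypoint": "src/main/java/App.java",   # hypothetical path
    "cves": [
        {
            "id": "CVE-2021-44228",
            "package": "org.apache.logging.log4j:log4j-core",
            "tier": "R4",                     # hypothetical ground truth
            "rationale": "Logger call is reachable from the entrypoint.",
        }
    ],
}
```

Returning a list of violations instead of raising on the first one lets a test report every problem in a file at once.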

Deterministic Validation Harness

Test suite: src/Scanner/__Tests/StellaOps.Scanner.Reachability.Tests/Benchmarks/ReachabilityTierCorpusTests.cs
Harness capabilities:
- Validates corpus structure and required schema fields.
- Verifies R0..R4 coverage across the toy corpus.
- Maps R0..R4 into Scanner confidence tiers for compatibility checks.
- Computes precision, recall, and F1 per tier using deterministic ordering.
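
The coverage and metric steps listed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the C# harness itself. The names `tier_metrics` and `missing_tiers` are hypothetical, the input is assumed to be a list of (expected tier, predicted tier) pairs, and a score with a zero denominator is assumed to be reported as 0.0. The mapping to Scanner confidence tiers is omitted because this document does not specify it.

```python
TIERS = ("R0", "R1", "R2", "R3", "R4")


def missing_tiers(pairs):
    """Return tiers that never appear as an expected label, in R0..R4 order."""
    seen = {expected for expected, _ in pairs}
    return [tier for tier in TIERS if tier not in seen]


def tier_metrics(pairs):
    """Compute one-vs-rest precision, recall, and F1 for each tier.

    Tiers are always visited in the fixed R0..R4 order, so the result
    dict has the same key order on every run.
    """
    pairs = list(pairs)
    results = {}
    for tier in TIERS:
        tp = sum(1 for e, p in pairs if e == tier and p == tier)
        fp = sum(1 for e, p in pairs if e != tier and p == tier)
        fn = sum(1 for e, p in pairs if e == tier and p != tier)
        # Assumed convention: an undefined ratio is reported as 0.0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[tier] = {"precision": precision, "recall": recall, "f1": f1}
    return results
```

Iterating over a fixed tuple rather than over the set of observed labels is what keeps the output order stable across runs and platforms.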

Offline Posture

No external network access is required for corpus loading or metric computation.
Dataset files are copied into the test output directory so that local and CI runs execute against the same inputs.