StellaOps Test Infrastructure
This document describes the test infrastructure for StellaOps, including reachability corpus fixtures, benchmark automation, and CI integration.
Reachability Test Fixtures
Corpus Structure
The reachability corpus is located at tests/reachability/ and contains:
tests/reachability/
├── corpus/
│ ├── manifest.json # SHA-256 hashes for all corpus files
│ ├── java/ # Java test cases
│ │ └── <case-id>/
│ │ ├── project/ # Source code
│ │ ├── callgraph.json # Expected call graph
│ │ └── ground-truth.json
│ ├── dotnet/ # .NET test cases
│ └── native/ # Native (C/C++/Rust) test cases
├── fixtures/
│ └── reachbench-2025-expanded/
│ ├── INDEX.json # Fixture index
│ └── cases/
│ └── <case-id>/
│ └── images/
│ ├── reachable/
│ │ └── reachgraph.truth.json
│ └── unreachable/
│ └── reachgraph.truth.json
└── StellaOps.Reachability.FixtureTests/
├── CorpusFixtureTests.cs
└── ReachbenchFixtureTests.cs
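The layout above says that manifest.json records SHA-256 hashes for the corpus files. A minimal sketch of how such a check could work is shown below. The manifest format is not documented here, so the mapping of relative path to hex digest, and the helper names `sha256_file` and `find_mismatches`, are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large fixtures do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(corpus_root: Path, expected: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose SHA-256 differs.

    `expected` maps relative path -> hex digest (an assumed manifest shape).
    """
    bad = []
    for rel_path, want in sorted(expected.items()):
        target = corpus_root / rel_path
        if not target.is_file() or sha256_file(target) != want.lower():
            bad.append(rel_path)
    return bad
```

An empty result means every listed file exists and matches its recorded digest.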
Ground-Truth Schema
All ground-truth files follow the reachbench.reachgraph.truth/v1 schema:
{
"schema_version": "reachbench.reachgraph.truth/v1",
"case_id": "CVE-2023-38545",
"variant": "reachable",
"paths": [
{
"entry_point": "main",
"vulnerable_function": "curl_easy_perform",
"frames": ["main", "do_http_request", "curl_easy_perform"]
}
],
"metadata": {
"cve_id": "CVE-2023-38545",
"purl": "pkg:generic/curl@8.4.0"
}
}
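As a rough illustration of what structural validation of a ground-truth file might look like, the sketch below checks the fields shown in the example. The rules that `frames` starts at `entry_point` and ends at `vulnerable_function`, and that `variant` is either `reachable` or `unreachable`, are inferred from the example and the fixture layout; they are assumptions, not a statement of the official schema. The function name `check_ground_truth` is hypothetical.

```python
EXPECTED_SCHEMA = "reachbench.reachgraph.truth/v1"


def check_ground_truth(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passed."""
    problems = []
    if doc.get("schema_version") != EXPECTED_SCHEMA:
        problems.append(f"unexpected schema_version: {doc.get('schema_version')!r}")
    # Assumed: only these two variants exist (mirrors the fixture directories).
    if doc.get("variant") not in ("reachable", "unreachable"):
        problems.append("variant must be 'reachable' or 'unreachable'")
    for i, path in enumerate(doc.get("paths", [])):
        frames = path.get("frames", [])
        if not frames:
            problems.append(f"paths[{i}]: frames is empty")
            continue
        # Assumed: a path runs from the entry point to the vulnerable function.
        if frames[0] != path.get("entry_point"):
            problems.append(f"paths[{i}]: first frame is not entry_point")
        if frames[-1] != path.get("vulnerable_function"):
            problems.append(f"paths[{i}]: last frame is not vulnerable_function")
    return problems


sample = {
    "schema_version": "reachbench.reachgraph.truth/v1",
    "case_id": "CVE-2023-38545",
    "variant": "reachable",
    "paths": [
        {
            "entry_point": "main",
            "vulnerable_function": "curl_easy_perform",
            "frames": ["main", "do_http_request", "curl_easy_perform"],
        }
    ],
}
print(check_ground_truth(sample))
```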
Running Fixture Tests
# Run all reachability fixture tests
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests
# Run only corpus tests
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests \
--filter "FullyQualifiedName~CorpusFixtureTests"
# Run only reachbench tests
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests \
--filter "FullyQualifiedName~ReachbenchFixtureTests"
# Cross-platform runner scripts
./scripts/reachability/run_all.sh # Unix
./scripts/reachability/run_all.ps1 # Windows
CI Integration
The reachability corpus is validated in CI via .gitea/workflows/reachability-corpus-ci.yml:
- validate-corpus: Runs fixture tests, verifies SHA-256 hashes
- validate-ground-truths: Validates schema version and structure
- determinism-check: Ensures JSON files have sorted keys
Triggers:
- Push/PR to paths: tests/reachability/**, scripts/reachability/**
- Manual workflow dispatch
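The determinism-check job is described as ensuring that JSON files have sorted keys. The workflow's exact rule is not shown in this document, so the following is only a sketch of one plausible interpretation: object keys must appear in ascending code-point order at every nesting level. The function name `keys_sorted` is hypothetical.

```python
import json


def keys_sorted(value) -> bool:
    """Recursively check that every JSON object lists its keys in sorted order.

    json.loads preserves the key order of the source text, so the order seen
    here is the order in the file.
    """
    if isinstance(value, dict):
        keys = list(value.keys())
        return keys == sorted(keys) and all(keys_sorted(v) for v in value.values())
    if isinstance(value, list):
        return all(keys_sorted(item) for item in value)
    return True  # scalars have no keys to order


print(keys_sorted(json.loads('{"a": 1, "b": {"c": 2, "d": 3}}')))
print(keys_sorted(json.loads('{"b": 1, "a": 2}')))
```

Array element order is left alone here, since reordering a list such as `frames` would change its meaning.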
CAS Layout Reference
Content-Addressable Storage Paths
StellaOps uses BLAKE3 hashes for content-addressable storage:
| Artifact Type | CAS Path Pattern | Example |
|---|---|---|
| Call Graph | cas://reachability/graphs/{blake3} | cas://reachability/graphs/3a7f2b... |
| Runtime Facts | cas://reachability/runtime-facts/{blake3} | cas://reachability/runtime-facts/8c4d1e... |
| Replay Manifest | cas://reachability/replay/{blake3} | cas://reachability/replay/f2e9c8... |
| Evidence Bundle | cas://reachability/evidence/{blake3} | cas://reachability/evidence/a1b2c3... |
| DSSE Envelope | cas://attestation/dsse/{blake3} | cas://attestation/dsse/d4e5f6... |
| Symbol Manifest | cas://symbols/manifests/{blake3} | cas://symbols/manifests/7g8h9i... |
Hash Algorithm
All CAS URIs use BLAKE3 with base16 (hex) encoding:
cas://{namespace}/{artifact-type}/{blake3-hex}
Example hash computation:
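# Use BLAKE3 for CAS hashing (third-party package: pip install blake3)
from blake3 import blake3

# Placeholder input; in practice, read the artifact bytes from disk
file_content = b"example artifact bytes"
content_hash = blake3(file_content).hexdigest()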
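Building and parsing URIs of the form `cas://{namespace}/{artifact-type}/{blake3-hex}` can be sketched with the standard library alone. The helpers `make_cas_uri` and `parse_cas_uri` are hypothetical names, and the sketch assumes the default 32-byte BLAKE3 output (64 lowercase hex characters) and lowercase alphanumeric path segments with hyphens; neither constraint is confirmed by this document.

```python
import re

# Assumed shape: lowercase segments and a 64-character lowercase hex digest.
CAS_URI = re.compile(
    r"cas://(?P<namespace>[a-z0-9-]+)/(?P<artifact_type>[a-z0-9-]+)/(?P<digest>[0-9a-f]{64})"
)


def make_cas_uri(namespace: str, artifact_type: str, digest_hex: str) -> str:
    """Assemble a CAS URI and reject anything that does not match the pattern."""
    uri = f"cas://{namespace}/{artifact_type}/{digest_hex}"
    if CAS_URI.fullmatch(uri) is None:
        raise ValueError(f"not a valid CAS URI: {uri}")
    return uri


def parse_cas_uri(uri: str) -> dict:
    """Split a CAS URI into namespace, artifact_type, and digest."""
    match = CAS_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a valid CAS URI: {uri}")
    return match.groupdict()
```

Validating on both construction and parsing means a malformed digest is caught before it reaches storage lookups.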
Replay Workflow
Replay Manifest v2 Schema
{
"version": 2,
"hashAlg": "blake3",
"hash": "blake3:3a7f2b...",
"created_at": "2025-12-14T00:00:00Z",
"entries": [
{
"type": "callgraph",
"cas_uri": "cas://reachability/graphs/3a7f2b...",
"hash": "blake3:3a7f2b..."
},
{
"type": "runtime-facts",
"cas_uri": "cas://reachability/runtime-facts/8c4d1e...",
"hash": "blake3:8c4d1e..."
}
],
"code_id_coverage": 0.95
}
Replay Steps
- Export replay manifest:
  stella replay export --scan-id <scan-id> --output replay-manifest.json
- Validate manifest integrity:
  stella replay validate --manifest replay-manifest.json
- Fetch CAS artifacts (online):
  stella replay fetch --manifest replay-manifest.json --output ./artifacts/
- Import for replay (air-gapped):
  stella replay import --bundle replay-bundle.tar.gz --verify
- Execute replay:
  stella replay run --manifest replay-manifest.json --compare-to <baseline-hash>
Validation Error Codes
| Code | Description |
|---|---|
| REPLAY_MANIFEST_MISSING_VERSION | Manifest missing version field |
| VERSION_MISMATCH | Unexpected manifest version |
| MISSING_HASH_ALG | Hash algorithm not specified |
| UNSORTED_ENTRIES | CAS entries not sorted (non-deterministic) |
| CAS_NOT_FOUND | Referenced CAS artifact missing |
| HASH_MISMATCH | Computed hash differs from declared |
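To make the error codes above concrete, here is a minimal sketch of a validator that could emit them. It is not the `stella replay validate` implementation. Several details are assumptions: the expected version is 2 (from the v2 schema), entries are ordered by `cas_uri` (the actual sort key is not stated), and hashes are written as `{hashAlg}:{hex}` as in the manifest example. The `fetch` and `digest_hex` callables are hypothetical stand-ins for CAS access and BLAKE3 hashing.

```python
from typing import Callable, Optional


def validate_manifest(
    manifest: dict,
    fetch: Callable[[str], Optional[bytes]],
    digest_hex: Callable[[bytes], str],
) -> list[str]:
    """Return the error codes triggered by a manifest, in check order."""
    errors = []
    if "version" not in manifest:
        errors.append("REPLAY_MANIFEST_MISSING_VERSION")
    elif manifest["version"] != 2:  # assumed expected version
        errors.append("VERSION_MISMATCH")
    hash_alg = manifest.get("hashAlg")
    if not hash_alg:
        errors.append("MISSING_HASH_ALG")
    entries = manifest.get("entries", [])
    uris = [entry.get("cas_uri", "") for entry in entries]
    if uris != sorted(uris):  # assumed sort key: cas_uri
        errors.append("UNSORTED_ENTRIES")
    for entry in entries:
        content = fetch(entry.get("cas_uri", ""))
        if content is None:
            errors.append("CAS_NOT_FOUND")
        elif f"{hash_alg}:{digest_hex(content)}" != entry.get("hash"):
            errors.append("HASH_MISMATCH")
    return errors
```

Passing the hash function in as a parameter keeps the sketch free of third-party dependencies; a real caller would supply something like `lambda b: blake3(b).hexdigest()`.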
Benchmark Automation
Running Benchmarks
# Full benchmark pipeline
./scripts/bench/run-baseline.sh --all
# Individual steps
./scripts/bench/run-baseline.sh --populate # Generate findings from fixtures
./scripts/bench/run-baseline.sh --compute # Compute metrics
# Compare with baseline scanner
./scripts/bench/run-baseline.sh --compare baseline-results.json
Benchmark Outputs
Results are written to bench/results/:
- summary.csv: Per-run metrics (TP, FP, TN, FN, precision, recall, F1)
- metrics.json: Detailed findings with evidence hashes
- replay/: Replay outputs for verification
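For reference, the precision, recall, and F1 columns follow the standard definitions over the confusion-matrix counts. The sketch below shows that arithmetic; the function name `classification_metrics` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not necessarily what the benchmark scripts do.

```python
def classification_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


metrics = classification_metrics(tp=8, fp=2, fn=2)
print({name: round(value, 3) for name, value in metrics.items()})
```

With 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so F1 is 0.8 as well. TN does not enter any of the three formulas.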
Verification Tools
# Online verification (DSSE + Rekor)
./bench/tools/verify.sh <finding-bundle>
# Offline verification
python3 bench/tools/verify.py --bundle <finding-dir> --offline
# Compare scanners
python3 bench/tools/compare.py --baseline <scanner-results> --json