# StellaOps Test Infrastructure

This document describes the test infrastructure for StellaOps, including reachability corpus fixtures, benchmark automation, and CI integration.

## Reachability Test Fixtures

### Corpus Structure

The reachability corpus is located at `tests/reachability/` and contains:

```
tests/reachability/
├── corpus/
│   ├── manifest.json                 # SHA-256 hashes for all corpus files
│   ├── java/                         # Java test cases
│   │   └── <case-id>/
│   │       ├── project/              # Source code
│   │       ├── callgraph.json        # Expected call graph
│   │       └── ground-truth.json
│   ├── dotnet/                       # .NET test cases
│   └── native/                       # Native (C/C++/Rust) test cases
├── fixtures/
│   └── reachbench-2025-expanded/
│       ├── INDEX.json                # Fixture index
│       └── cases/
│           └── <case-id>/
│               └── images/
│                   ├── reachable/
│                   │   └── reachgraph.truth.json
│                   └── unreachable/
│                       └── reachgraph.truth.json
└── StellaOps.Reachability.FixtureTests/
    ├── CorpusFixtureTests.cs
    └── ReachbenchFixtureTests.cs
```

### Ground-Truth Schema

All ground-truth files follow the `reachbench.reachgraph.truth/v1` schema:

```json
{
  "schema_version": "reachbench.reachgraph.truth/v1",
  "case_id": "CVE-2023-38545",
  "variant": "reachable",
  "paths": [
    {
      "entry_point": "main",
      "vulnerable_function": "curl_easy_perform",
      "frames": ["main", "do_http_request", "curl_easy_perform"]
    }
  ],
  "metadata": {
    "cve_id": "CVE-2023-38545",
    "purl": "pkg:generic/curl@8.4.0"
  }
}
```

### Running Fixture Tests

```bash
# Run all reachability fixture tests
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests

# Run only corpus tests
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests \
  --filter "FullyQualifiedName~CorpusFixtureTests"

# Run only reachbench tests
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests \
  --filter "FullyQualifiedName~ReachbenchFixtureTests"

# Cross-platform runner scripts
./scripts/reachability/run_all.sh    # Unix
./scripts/reachability/run_all.ps1   # Windows
```

### CI Integration

The reachability corpus is validated in CI via `.gitea/workflows/reachability-corpus-ci.yml`:

1. **validate-corpus**: Runs fixture tests and verifies SHA-256 hashes (sketched below)
2. **validate-ground-truths**: Validates schema version and structure
3. **determinism-check**: Ensures JSON files have sorted keys (sketched below)

Triggers:

- Push/PR to paths: `tests/reachability/**`, `scripts/reachability/**`
- Manual workflow dispatch
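Both hash-related CI checks can be reproduced locally. Below is a minimal sketch of the `validate-corpus` hash verification; it assumes `manifest.json` is a flat map from corpus-relative paths to lowercase hex SHA-256 digests, which is an assumption about the manifest layout rather than a documented contract.

```python
# Sketch: verify corpus files against manifest.json.
# ASSUMPTION: manifest.json is a flat {relative_path: sha256_hex} map;
# the real manifest layout may differ.
import hashlib
import json
from pathlib import Path

corpus = Path("tests/reachability/corpus")
manifest = json.loads((corpus / "manifest.json").read_text())

for rel_path, expected in manifest.items():
    actual = hashlib.sha256((corpus / rel_path).read_bytes()).hexdigest()
    if actual != expected:
        raise SystemExit(f"hash mismatch: {rel_path}")
print(f"all {len(manifest)} corpus hashes match")
```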
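The `determinism-check` key-ordering rule can be approximated the same way. The sketch below asserts that every JSON object in the tree has its keys in sorted order (Python's `json` module preserves each file's key order on load); recursing into nested objects and arrays is an assumption about how strict the CI check is.

```python
# Sketch: assert sorted keys in every corpus JSON file, mirroring the
# CI determinism-check. Recursing into nested values is an assumption.
import json
from pathlib import Path

def keys_sorted(node) -> bool:
    if isinstance(node, dict):
        keys = list(node)  # json.load preserves on-disk key order
        return keys == sorted(keys) and all(keys_sorted(v) for v in node.values())
    if isinstance(node, list):
        return all(keys_sorted(v) for v in node)
    return True

for path in Path("tests/reachability").rglob("*.json"):
    if not keys_sorted(json.loads(path.read_text())):
        raise SystemExit(f"unsorted keys: {path}")
print("all JSON files have sorted keys")
```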
## CAS Layout Reference

### Content-Addressable Storage Paths

StellaOps uses BLAKE3 hashes for content-addressable storage:

| Artifact Type | CAS Path Pattern | Example |
|---------------|------------------|---------|
| Call Graph | `cas://reachability/graphs/{blake3}` | `cas://reachability/graphs/3a7f2b...` |
| Runtime Facts | `cas://reachability/runtime-facts/{blake3}` | `cas://reachability/runtime-facts/8c4d1e...` |
| Replay Manifest | `cas://reachability/replay/{blake3}` | `cas://reachability/replay/f2e9c8...` |
| Evidence Bundle | `cas://reachability/evidence/{blake3}` | `cas://reachability/evidence/a1b2c3...` |
| DSSE Envelope | `cas://attestation/dsse/{blake3}` | `cas://attestation/dsse/d4e5f6...` |
| Symbol Manifest | `cas://symbols/manifests/{blake3}` | `cas://symbols/manifests/7a8b9c...` |

### Hash Algorithm

All CAS URIs use BLAKE3 with base16 (hex) encoding:

```
cas://{namespace}/{artifact-type}/{blake3-hex}
```

Example hash computation:

```python
# Use BLAKE3 for CAS hashing (requires the third-party `blake3` package)
from blake3 import blake3

with open("callgraph.json", "rb") as f:  # any CAS artifact
    file_content = f.read()

content_hash = blake3(file_content).hexdigest()
```

## Replay Workflow

### Replay Manifest v2 Schema

```json
{
  "version": 2,
  "hashAlg": "blake3",
  "hash": "blake3:3a7f2b...",
  "created_at": "2025-12-14T00:00:00Z",
  "entries": [
    {
      "type": "callgraph",
      "cas_uri": "cas://reachability/graphs/3a7f2b...",
      "hash": "blake3:3a7f2b..."
    },
    {
      "type": "runtime-facts",
      "cas_uri": "cas://reachability/runtime-facts/8c4d1e...",
      "hash": "blake3:8c4d1e..."
    }
  ],
  "code_id_coverage": 0.95
}
```

### Replay Steps

1. **Export replay manifest**:

   ```bash
   stella replay export --scan-id <scan-id> --output replay-manifest.json
   ```

2. **Validate manifest integrity**:

   ```bash
   stella replay validate --manifest replay-manifest.json
   ```

3. **Fetch CAS artifacts** (online):

   ```bash
   stella replay fetch --manifest replay-manifest.json --output ./artifacts/
   ```

4. **Import for replay** (air-gapped):

   ```bash
   stella replay import --bundle replay-bundle.tar.gz --verify
   ```

5. **Execute replay**:

   ```bash
   stella replay run --manifest replay-manifest.json --compare-to <scan-id>
   ```

### Validation Error Codes

| Code | Description |
|------|-------------|
| `REPLAY_MANIFEST_MISSING_VERSION` | Manifest missing the `version` field |
| `VERSION_MISMATCH` | Unexpected manifest version |
| `MISSING_HASH_ALG` | Hash algorithm not specified |
| `UNSORTED_ENTRIES` | CAS entries not sorted (non-deterministic) |
| `CAS_NOT_FOUND` | Referenced CAS artifact missing |
| `HASH_MISMATCH` | Computed hash differs from declared |
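The structural subset of these checks is straightforward to reproduce outside the CLI. The sketch below emits the first four codes from the table above; sorting entries by `cas_uri` is an assumption about what "sorted" means here, and the CAS-dependent checks (`CAS_NOT_FOUND`, `HASH_MISMATCH`) are omitted because they require fetching artifacts. This is not the actual `stella replay validate` implementation.

```python
# Sketch: structural validation of a replay manifest, emitting the
# error codes documented above. Sorting entries by "cas_uri" is an
# assumption; CAS_NOT_FOUND / HASH_MISMATCH need CAS access and are
# out of scope here.
import json

def validate_manifest(path: str) -> list[str]:
    with open(path) as f:
        manifest = json.load(f)

    errors = []
    if "version" not in manifest:
        errors.append("REPLAY_MANIFEST_MISSING_VERSION")
    elif manifest["version"] != 2:
        errors.append("VERSION_MISMATCH")
    if "hashAlg" not in manifest:
        errors.append("MISSING_HASH_ALG")

    uris = [entry["cas_uri"] for entry in manifest.get("entries", [])]
    if uris != sorted(uris):
        errors.append("UNSORTED_ENTRIES")
    return errors

print(validate_manifest("replay-manifest.json") or "manifest OK")
```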
## Benchmark Automation

### Running Benchmarks

```bash
# Full benchmark pipeline
./scripts/bench/run-baseline.sh --all

# Individual steps
./scripts/bench/run-baseline.sh --populate   # Generate findings from fixtures
./scripts/bench/run-baseline.sh --compute    # Compute metrics

# Compare with baseline scanner
./scripts/bench/run-baseline.sh --compare baseline-results.json
```

### Benchmark Outputs

Results are written to `bench/results/`:

- `summary.csv`: Per-run metrics (TP, FP, TN, FN, precision, recall, F1)
- `metrics.json`: Detailed findings with evidence hashes
- `replay/`: Replay outputs for verification

### Verification Tools

```bash
# Online verification (DSSE + Rekor)
./bench/tools/verify.sh

# Offline verification
python3 bench/tools/verify.py --bundle <bundle> --offline

# Compare scanners
python3 bench/tools/compare.py --baseline <baseline> --json
```

## References

- [Function-Level Evidence Guide](../docs/reachability/function-level-evidence.md)
- [Reachability Runtime Runbook](../docs/runbooks/reachability-runtime.md)
- [Replay Manifest Specification](../docs/replay/DETERMINISTIC_REPLAY.md)
- [VEX Evidence Playbook](../docs/benchmarks/vex-evidence-playbook.md)
- [Ground-Truth Schema](../docs/reachability/ground-truth-schema.md)