up
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Signals CI & Image / signals-ci (push) Has been cancelled
Signals Reachability Scoring & Events / reachability-smoke (push) Has been cancelled
Signals Reachability Scoring & Events / sign-and-upload (push) Has been cancelled
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Reachability Corpus Validation / validate-corpus (push) Has been cancelled
Reachability Corpus Validation / validate-ground-truths (push) Has been cancelled
Scanner Analyzers / Discover Analyzers (push) Has been cancelled
Scanner Analyzers / Validate Test Fixtures (push) Has been cancelled
Reachability Corpus Validation / determinism-check (push) Has been cancelled
Scanner Analyzers / Build Analyzers (push) Has been cancelled
Scanner Analyzers / Test Language Analyzers (push) Has been cancelled
Scanner Analyzers / Verify Deterministic Output (push) Has been cancelled
Notify Smoke Test / Notify Unit Tests (push) Has been cancelled
Notify Smoke Test / Notifier Service Tests (push) Has been cancelled
Notify Smoke Test / Notification Smoke Test (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
229
tests/README.md
Normal file
@@ -0,0 +1,229 @@
# StellaOps Test Infrastructure

This document describes the test infrastructure for StellaOps, including reachability corpus fixtures, benchmark automation, and CI integration.

## Reachability Test Fixtures

### Corpus Structure

The reachability corpus is located at `tests/reachability/` and contains:

```
tests/reachability/
├── corpus/
│   ├── manifest.json                # SHA-256 hashes for all corpus files
│   ├── java/                        # Java test cases
│   │   └── <case-id>/
│   │       ├── project/             # Source code
│   │       ├── callgraph.json       # Expected call graph
│   │       └── ground-truth.json
│   ├── dotnet/                      # .NET test cases
│   └── native/                      # Native (C/C++/Rust) test cases
├── fixtures/
│   └── reachbench-2025-expanded/
│       ├── INDEX.json               # Fixture index
│       └── cases/
│           └── <case-id>/
│               └── images/
│                   ├── reachable/
│                   │   └── reachgraph.truth.json
│                   └── unreachable/
│                       └── reachgraph.truth.json
└── StellaOps.Reachability.FixtureTests/
    ├── CorpusFixtureTests.cs
    └── ReachbenchFixtureTests.cs
```
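The `manifest.json` hash check can be sketched as follows. This is a minimal illustration rather than the project's implementation: the manifest layout (`{"files": {"<relative path>": "<sha256 hex>"}}`) and the helper names `sha256_file` and `verify_manifest` are assumptions made for the example and should be adapted to the real manifest format.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(corpus_dir: Path) -> list[str]:
    """Return relative paths that are missing or whose digest differs.

    Assumed (hypothetical) manifest shape:
        {"files": {"<relative path>": "<sha256 hex>"}}
    """
    manifest_path = corpus_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    mismatches = []
    for rel_path, expected in sorted(manifest["files"].items()):
        target = corpus_dir / rel_path
        if not target.is_file() or sha256_file(target) != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

An empty list from `verify_manifest(Path("tests/reachability/corpus"))` would mean every listed file matches its recorded hash. Iterating in sorted order keeps the report stable between runs.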

### Ground-Truth Schema

All ground-truth files follow the `reachbench.reachgraph.truth/v1` schema:

```json
{
  "schema_version": "reachbench.reachgraph.truth/v1",
  "case_id": "CVE-2023-38545",
  "variant": "reachable",
  "paths": [
    {
      "entry_point": "main",
      "vulnerable_function": "curl_easy_perform",
      "frames": ["main", "do_http_request", "curl_easy_perform"]
    }
  ],
  "metadata": {
    "cve_id": "CVE-2023-38545",
    "purl": "pkg:generic/curl@8.4.0"
  }
}
```
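A structural check for a document like the one above might look like this sketch. The function name `validate_ground_truth`, the set of required keys, the allowed `variant` values (taken from the fixture directory names), and the rule that `frames` starts at `entry_point` and ends at `vulnerable_function` are all assumptions inferred from the example; the authoritative rules live in the schema itself.

```python
EXPECTED_SCHEMA = "reachbench.reachgraph.truth/v1"
# Assumption: these top-level keys are mandatory (inferred from the example).
REQUIRED_KEYS = ("schema_version", "case_id", "variant", "paths")


def validate_ground_truth(doc: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in doc]
    if problems:
        return problems

    if doc["schema_version"] != EXPECTED_SCHEMA:
        problems.append(f"unexpected schema_version: {doc['schema_version']}")
    # Assumption: variants mirror the reachable/unreachable fixture folders.
    if doc["variant"] not in ("reachable", "unreachable"):
        problems.append(f"unexpected variant: {doc['variant']}")

    for index, path in enumerate(doc["paths"]):
        frames = path.get("frames", [])
        if not frames:
            problems.append(f"paths[{index}]: empty frames")
            continue
        # Assumption: a path runs from the entry point to the vulnerable function.
        if frames[0] != path.get("entry_point"):
            problems.append(f"paths[{index}]: first frame is not entry_point")
        if frames[-1] != path.get("vulnerable_function"):
            problems.append(f"paths[{index}]: last frame is not vulnerable_function")
    return problems
```

Returning a list of problems instead of raising on the first one lets a CI job report every defect in a file in a single run.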

### Running Fixture Tests

```bash
# Run all reachability fixture tests
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests

# Run only corpus tests
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests \
  --filter "FullyQualifiedName~CorpusFixtureTests"

# Run only reachbench tests
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests \
  --filter "FullyQualifiedName~ReachbenchFixtureTests"

# Cross-platform runner scripts
./scripts/reachability/run_all.sh      # Unix
./scripts/reachability/run_all.ps1     # Windows
```

### CI Integration

The reachability corpus is validated in CI via `.gitea/workflows/reachability-corpus-ci.yml`:

1. **validate-corpus**: Runs fixture tests, verifies SHA-256 hashes
2. **validate-ground-truths**: Validates schema version and structure
3. **determinism-check**: Ensures JSON files have sorted keys

Triggers:
- Push/PR to paths: `tests/reachability/**`, `scripts/reachability/**`
- Manual workflow dispatch
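As an illustration of what a sorted-keys check such as **determinism-check** could do, the sketch below walks a parsed JSON document and reports every object whose keys are out of order. It is not the workflow's actual script; the function name `unsorted_key_paths` and the path notation are hypothetical.

```python
import json


def unsorted_key_paths(text: str) -> list[str]:
    """Return slash-separated paths of JSON objects whose keys are not sorted."""
    offenders = []

    def walk(node, path):
        if isinstance(node, dict):
            # json.loads preserves the key order found in the source text.
            keys = list(node)
            if keys != sorted(keys):
                offenders.append(path or "/")
            for key, value in node.items():
                walk(value, f"{path}/{key}")
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}/{index}")

    walk(json.loads(text), "")
    return offenders
```

For example, `unsorted_key_paths('{"a": 1, "b": {"d": 1, "c": 2}}')` reports the nested object at `/b`, while a fully sorted document yields an empty list.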

## CAS Layout Reference

### Content-Addressable Storage Paths

StellaOps uses BLAKE3 hashes for content-addressable storage:

| Artifact Type | CAS Path Pattern | Example |
|--------------|------------------|---------|
| Call Graph | `cas://reachability/graphs/{blake3}` | `cas://reachability/graphs/3a7f2b...` |
| Runtime Facts | `cas://reachability/runtime-facts/{blake3}` | `cas://reachability/runtime-facts/8c4d1e...` |
| Replay Manifest | `cas://reachability/replay/{blake3}` | `cas://reachability/replay/f2e9c8...` |
| Evidence Bundle | `cas://reachability/evidence/{blake3}` | `cas://reachability/evidence/a1b2c3...` |
| DSSE Envelope | `cas://attestation/dsse/{blake3}` | `cas://attestation/dsse/d4e5f6...` |
| Symbol Manifest | `cas://symbols/manifests/{blake3}` | `cas://symbols/manifests/7a8b9c...` |

### Hash Algorithm

All CAS URIs use BLAKE3 with base16 (hex) encoding:

```
cas://{namespace}/{artifact-type}/{blake3-hex}
```

Example hash computation:
```python
from pathlib import Path

# Use BLAKE3 for CAS hashing (third-party package: pip install blake3)
from blake3 import blake3

# Example input; any artifact file works here
file_content = Path("callgraph.json").read_bytes()
content_hash = blake3(file_content).hexdigest()
```
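Turning a digest into a CAS URI of the form `cas://{namespace}/{artifact-type}/{blake3-hex}`, and parsing one back, can be sketched with the standard library alone. The helper names `build_cas_uri` and `parse_cas_uri` are hypothetical, and the sketch assumes the full 64-character hex form of BLAKE3's default 32-byte digest and lowercase namespace and artifact-type segments.

```python
import re

# Assumption: lowercase segments and a full 64-character lowercase hex digest.
CAS_URI = re.compile(
    r"^cas://(?P<namespace>[a-z0-9-]+)"
    r"/(?P<artifact_type>[a-z0-9-]+)"
    r"/(?P<digest>[0-9a-f]{64})$"
)


def build_cas_uri(namespace: str, artifact_type: str, digest_hex: str) -> str:
    """Compose a CAS URI and reject anything that does not match the pattern."""
    uri = f"cas://{namespace}/{artifact_type}/{digest_hex.lower()}"
    if CAS_URI.match(uri) is None:
        raise ValueError(f"not a valid CAS URI: {uri}")
    return uri


def parse_cas_uri(uri: str) -> dict:
    """Split a CAS URI into namespace, artifact_type, and digest."""
    match = CAS_URI.match(uri)
    if match is None:
        raise ValueError(f"not a valid CAS URI: {uri}")
    return match.groupdict()
```

With the `content_hash` value from the snippet above, `build_cas_uri("reachability", "graphs", content_hash)` would produce a URI for the Call Graph row of the table. Validating on both construction and parsing catches non-hex digests early.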

## Replay Workflow

### Replay Manifest v2 Schema

```json
{
  "version": 2,
  "hashAlg": "blake3",
  "hash": "blake3:3a7f2b...",
  "created_at": "2025-12-14T00:00:00Z",
  "entries": [
    {
      "type": "callgraph",
      "cas_uri": "cas://reachability/graphs/3a7f2b...",
      "hash": "blake3:3a7f2b..."
    },
    {
      "type": "runtime-facts",
      "cas_uri": "cas://reachability/runtime-facts/8c4d1e...",
      "hash": "blake3:8c4d1e..."
    }
  ],
  "code_id_coverage": 0.95
}
```

### Replay Steps

1. **Export replay manifest**:
   ```bash
   stella replay export --scan-id <scan-id> --output replay-manifest.json
   ```

2. **Validate manifest integrity**:
   ```bash
   stella replay validate --manifest replay-manifest.json
   ```

3. **Fetch CAS artifacts** (online):
   ```bash
   stella replay fetch --manifest replay-manifest.json --output ./artifacts/
   ```

4. **Import for replay** (air-gapped):
   ```bash
   stella replay import --bundle replay-bundle.tar.gz --verify
   ```

5. **Execute replay**:
   ```bash
   stella replay run --manifest replay-manifest.json --compare-to <baseline-hash>
   ```

### Validation Error Codes

| Code | Description |
|------|-------------|
| `REPLAY_MANIFEST_MISSING_VERSION` | Manifest missing version field |
| `VERSION_MISMATCH` | Unexpected manifest version |
| `MISSING_HASH_ALG` | Hash algorithm not specified |
| `UNSORTED_ENTRIES` | CAS entries not sorted (non-deterministic) |
| `CAS_NOT_FOUND` | Referenced CAS artifact missing |
| `HASH_MISMATCH` | Computed hash differs from declared |
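To make the structural codes in the table concrete, here is a minimal sketch of a validator that maps checks on a parsed manifest to those codes. It does not reproduce `stella replay validate`: the function name `validate_replay_manifest` is hypothetical, the sort key for entries is assumed to be `cas_uri`, and `CAS_NOT_FOUND` and `HASH_MISMATCH` are left out because they require access to the CAS store.

```python
def validate_replay_manifest(manifest: dict, expected_version: int = 2) -> list[str]:
    """Return the structural error codes that apply to a parsed manifest."""
    errors = []

    if "version" not in manifest:
        errors.append("REPLAY_MANIFEST_MISSING_VERSION")
    elif manifest["version"] != expected_version:
        errors.append("VERSION_MISMATCH")

    if not manifest.get("hashAlg"):
        errors.append("MISSING_HASH_ALG")

    # Assumption: entries are expected to be ordered by their cas_uri value.
    uris = [entry.get("cas_uri", "") for entry in manifest.get("entries", [])]
    if uris != sorted(uris):
        errors.append("UNSORTED_ENTRIES")

    return errors
```

Applied to the v2 example manifest shown earlier, this sketch returns an empty list; swapping the two entries would yield `UNSORTED_ENTRIES` under the assumed ordering.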

## Benchmark Automation

### Running Benchmarks

```bash
# Full benchmark pipeline
./scripts/bench/run-baseline.sh --all

# Individual steps
./scripts/bench/run-baseline.sh --populate   # Generate findings from fixtures
./scripts/bench/run-baseline.sh --compute    # Compute metrics

# Compare with baseline scanner
./scripts/bench/run-baseline.sh --compare baseline-results.json
```

### Benchmark Outputs

Results are written to `bench/results/`:

- `summary.csv`: Per-run metrics (TP, FP, TN, FN, precision, recall, F1)
- `metrics.json`: Detailed findings with evidence hashes
- `replay/`: Replay outputs for verification
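The precision, recall, and F1 columns follow the standard definitions over the confusion counts. The sketch below shows that arithmetic; the function name `classification_metrics` is hypothetical, and treating an undefined ratio (zero denominator) as `0.0` is an assumption that may differ from what the benchmark scripts do.

```python
def classification_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from confusion counts.

    Undefined ratios (zero denominator) are reported as 0.0 by assumption.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

True negatives do not enter any of the three metrics, which is why the function takes only TP, FP, and FN.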

### Verification Tools

```bash
# Online verification (DSSE + Rekor)
./bench/tools/verify.sh <finding-bundle>

# Offline verification
python3 bench/tools/verify.py --bundle <finding-dir> --offline

# Compare scanners
python3 bench/tools/compare.py --baseline <scanner-results> --json
```

## References

- [Function-Level Evidence Guide](../docs/reachability/function-level-evidence.md)
- [Reachability Runtime Runbook](../docs/runbooks/reachability-runtime.md)
- [Replay Manifest Specification](../docs/replay/DETERMINISTIC_REPLAY.md)
- [VEX Evidence Playbook](../docs/benchmarks/vex-evidence-playbook.md)
- [Ground-Truth Schema](../docs/reachability/ground-truth-schema.md)