up

2025-12-14 15:50:38 +02:00
parent f1a39c4ce3
commit 233873f620
249 changed files with 29746 additions and 154 deletions
--- a/tests/README.md
+++ b/tests/README.md
@@ -0,0 +1,229 @@
+# StellaOps Test Infrastructure
+
+This document describes the StellaOps test infrastructure: reachability corpus fixtures, CI integration, the CAS layout, the replay workflow, and benchmark automation.
+
+## Reachability Test Fixtures
+
+### Corpus Structure
+
+The reachability corpus is located at `tests/reachability/` and contains:
+
+```
+tests/reachability/
+├── corpus/
+│   ├── manifest.json          # SHA-256 hashes for all corpus files
+│   ├── java/                  # Java test cases
+│   │   └── <case-id>/
+│   │       ├── project/       # Source code
+│   │       ├── callgraph.json # Expected call graph
+│   │       └── ground-truth.json
+│   ├── dotnet/                # .NET test cases
+│   └── native/                # Native (C/C++/Rust) test cases
+├── fixtures/
+│   └── reachbench-2025-expanded/
+│       ├── INDEX.json         # Fixture index
+│       └── cases/
+│           └── <case-id>/
+│               └── images/
+│                   ├── reachable/
+│                   │   └── reachgraph.truth.json
+│                   └── unreachable/
+│                       └── reachgraph.truth.json
+└── StellaOps.Reachability.FixtureTests/
+    ├── CorpusFixtureTests.cs
+    └── ReachbenchFixtureTests.cs
+```
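A minimal sketch of how the SHA-256 hashes in `corpus/manifest.json` could be checked, using only the Python standard library. The manifest layout is not shown in this README, so the flat `{"relative/path": "<sha256-hex>"}` shape and the function names `sha256_of` and `verify_manifest` are assumptions for illustration only.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(corpus_dir: Path) -> list[str]:
    """Return the relative paths whose file is missing or whose hash differs.

    Assumption: manifest.json is a flat {"relative/path": "<sha256-hex>"} map.
    The real manifest layout may differ.
    """
    manifest_text = (corpus_dir / "manifest.json").read_text(encoding="utf-8")
    manifest = json.loads(manifest_text)
    failures = []
    for rel_path, expected in sorted(manifest.items()):
        target = corpus_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

Called as `verify_manifest(Path("tests/reachability/corpus"))`, an empty list would mean every listed file matched its recorded digest under the assumed layout.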
+
+### Ground-Truth Schema
+
+All ground-truth files follow the `reachbench.reachgraph.truth/v1` schema:
+
+```json
+{
+  "schema_version": "reachbench.reachgraph.truth/v1",
+  "case_id": "CVE-2023-38545",
+  "variant": "reachable",
+  "paths": [
+    {
+      "entry_point": "main",
+      "vulnerable_function": "curl_easy_perform",
+      "frames": ["main", "do_http_request", "curl_easy_perform"]
+    }
+  ],
+  "metadata": {
+    "cve_id": "CVE-2023-38545",
+    "purl": "pkg:generic/curl@8.4.0"
+  }
+}
+```
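The schema example above can be checked with a few structural rules. This is a hedged sketch, not the project's validator: the set of allowed `variant` values is inferred from the `reachable/` and `unreachable/` fixture directories, and the rule that `frames` runs from `entry_point` to `vulnerable_function` is inferred from the single example. `validate_ground_truth` is a hypothetical name.

```python
SCHEMA_VERSION = "reachbench.reachgraph.truth/v1"
# Assumption: variants mirror the reachable/ and unreachable/ fixture directories.
VARIANTS = {"reachable", "unreachable"}


def validate_ground_truth(doc: dict) -> list[str]:
    """Return human-readable problems; an empty list means the document passed."""
    problems = []
    if doc.get("schema_version") != SCHEMA_VERSION:
        problems.append("unexpected schema_version")
    if doc.get("variant") not in VARIANTS:
        problems.append("variant must be 'reachable' or 'unreachable'")
    for index, path in enumerate(doc.get("paths", [])):
        frames = path.get("frames") or []
        # Assumed invariant: frames start at entry_point and end at vulnerable_function.
        if not frames:
            problems.append(f"paths[{index}]: frames is empty")
        elif frames[0] != path.get("entry_point"):
            problems.append(f"paths[{index}]: first frame is not entry_point")
        elif frames[-1] != path.get("vulnerable_function"):
            problems.append(f"paths[{index}]: last frame is not vulnerable_function")
    return problems
```

Returning a list of problems rather than raising on the first one lets a CI job report every defect in a file in a single run.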
+
+### Running Fixture Tests
+
+```bash
+# Run all reachability fixture tests
+dotnet test tests/reachability/StellaOps.Reachability.FixtureTests
+
+# Run only corpus tests
+dotnet test tests/reachability/StellaOps.Reachability.FixtureTests \
+  --filter "FullyQualifiedName~CorpusFixtureTests"
+
+# Run only reachbench tests
+dotnet test tests/reachability/StellaOps.Reachability.FixtureTests \
+  --filter "FullyQualifiedName~ReachbenchFixtureTests"
+
+# Cross-platform runner scripts
+./scripts/reachability/run_all.sh       # Unix
+./scripts/reachability/run_all.ps1      # Windows
+```
+
+### CI Integration
+
+The reachability corpus is validated in CI via `.gitea/workflows/reachability-corpus-ci.yml`:
+
+1. **validate-corpus**: Runs fixture tests, verifies SHA-256 hashes
+2. **validate-ground-truths**: Validates schema version and structure
+3. **determinism-check**: Ensures JSON files have sorted keys
+
+Triggers:
+- Push/PR to paths: `tests/reachability/**`, `scripts/reachability/**`
+- Manual workflow dispatch
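The sorted-keys rule behind **determinism-check** can be sketched as a recursive walk over parsed JSON. The workflow file itself is not reproduced here, so this only illustrates the idea: the function names are hypothetical, and plain code-point ordering (Python's default string sort) is an assumption.

```python
import json


def has_sorted_keys(value) -> bool:
    """Recursively check that every JSON object lists its keys in sorted order."""
    if isinstance(value, dict):
        keys = list(value)
        return keys == sorted(keys) and all(has_sorted_keys(v) for v in value.values())
    if isinstance(value, list):
        return all(has_sorted_keys(item) for item in value)
    return True  # scalars have no keys to check


def is_deterministic_json(text: str) -> bool:
    # json.loads preserves the key order found in the document,
    # because Python dicts keep insertion order.
    return has_sorted_keys(json.loads(text))
```

A file that fails this check can be rewritten with `json.dumps(data, sort_keys=True)`, which sorts keys at every nesting level.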
+
+## CAS Layout Reference
+
+### Content-Addressable Storage Paths
+
+StellaOps addresses stored artifacts by their BLAKE3 hash (the corpus `manifest.json` above uses SHA-256 instead). Path patterns by artifact type:
+
+| Artifact Type | CAS Path Pattern | Example |
+|--------------|------------------|---------|
+| Call Graph | `cas://reachability/graphs/{blake3}` | `cas://reachability/graphs/3a7f2b...` |
+| Runtime Facts | `cas://reachability/runtime-facts/{blake3}` | `cas://reachability/runtime-facts/8c4d1e...` |
+| Replay Manifest | `cas://reachability/replay/{blake3}` | `cas://reachability/replay/f2e9c8...` |
+| Evidence Bundle | `cas://reachability/evidence/{blake3}` | `cas://reachability/evidence/a1b2c3...` |
+| DSSE Envelope | `cas://attestation/dsse/{blake3}` | `cas://attestation/dsse/d4e5f6...` |
+| Symbol Manifest | `cas://symbols/manifests/{blake3}` | `cas://symbols/manifests/7a8b9c...` |
+
+### Hash Algorithm
+
+All CAS URIs use BLAKE3 with base16 (hex) encoding:
+
+```
+cas://{namespace}/{artifact-type}/{blake3-hex}
+```
+
+Example hash computation:
+```python
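+# BLAKE3 is not part of hashlib; use the third-party `blake3` package (pip install blake3)
+from blake3 import blake3
+file_content = b"example bytes"  # placeholder for the artifact's raw bytes
+content_hash = blake3(file_content).hexdigest()  # hex digest, as used in CAS URIs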
+```
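The URI layout above (`cas://{namespace}/{artifact-type}/{blake3-hex}`) can be built and parsed with a single regular expression. This is a sketch under assumptions: the allowed characters for the namespace and artifact type (lowercase letters, digits, hyphens) are inferred from the table, the digest is normalised to lowercase hex, and `make_cas_uri` and `parse_cas_uri` are hypothetical helper names.

```python
import re

# Assumed grammar: lowercase/digit/hyphen segments, then a lowercase hex digest.
_CAS_URI = re.compile(r"cas://([a-z0-9-]+)/([a-z0-9-]+)/([0-9a-f]+)")


def make_cas_uri(namespace: str, artifact_type: str, digest_hex: str) -> str:
    """Build a CAS URI, rejecting anything that does not fit the assumed grammar."""
    uri = f"cas://{namespace}/{artifact_type}/{digest_hex.lower()}"
    if _CAS_URI.fullmatch(uri) is None:
        raise ValueError(f"not a valid CAS URI: {uri}")
    return uri


def parse_cas_uri(uri: str) -> tuple[str, str, str]:
    """Split a CAS URI into (namespace, artifact_type, digest_hex)."""
    match = _CAS_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a valid CAS URI: {uri}")
    return match.group(1), match.group(2), match.group(3)
```

The sketch does not check the digest length. A default BLAKE3 digest is 32 bytes (64 hex characters), so a stricter variant could require exactly that if the store never truncates hashes.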
+
+## Replay Workflow
+
+### Replay Manifest v2 Schema
+
+```json
+{
+  "version": 2,
+  "hashAlg": "blake3",
+  "hash": "blake3:3a7f2b...",
+  "created_at": "2025-12-14T00:00:00Z",
+  "entries": [
+    {
+      "type": "callgraph",
+      "cas_uri": "cas://reachability/graphs/3a7f2b...",
+      "hash": "blake3:3a7f2b..."
+    },
+    {
+      "type": "runtime-facts",
+      "cas_uri": "cas://reachability/runtime-facts/8c4d1e...",
+      "hash": "blake3:8c4d1e..."
+    }
+  ],
+  "code_id_coverage": 0.95
+}
+```
+
+### Replay Steps
+
+1. **Export replay manifest**:
+   ```bash
+   stella replay export --scan-id <scan-id> --output replay-manifest.json
+   ```
+
+2. **Validate manifest integrity**:
+   ```bash
+   stella replay validate --manifest replay-manifest.json
+   ```
+
+3. **Fetch CAS artifacts** (online):
+   ```bash
+   stella replay fetch --manifest replay-manifest.json --output ./artifacts/
+   ```
+
+4. **Import for replay** (air-gapped):
+   ```bash
+   stella replay import --bundle replay-bundle.tar.gz --verify
+   ```
+
+5. **Execute replay**:
+   ```bash
+   stella replay run --manifest replay-manifest.json --compare-to <baseline-hash>
+   ```
+
+### Validation Error Codes
+
+| Code | Description |
+|------|-------------|
+| `REPLAY_MANIFEST_MISSING_VERSION` | Manifest missing version field |
+| `VERSION_MISMATCH` | Unexpected manifest version |
+| `MISSING_HASH_ALG` | Hash algorithm not specified |
+| `UNSORTED_ENTRIES` | CAS entries not sorted (non-deterministic) |
+| `CAS_NOT_FOUND` | Referenced CAS artifact missing |
+| `HASH_MISMATCH` | Computed hash differs from declared |
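The structural codes in the table above can be illustrated with a small checker over an already-parsed manifest. This sketch is not the `stella replay validate` implementation. It assumes version 2 is the only accepted version and that entries are ordered by `cas_uri` (the real sort key is not stated here). `CAS_NOT_FOUND` and `HASH_MISMATCH` are left out because they need access to the CAS store.

```python
EXPECTED_VERSION = 2  # assumption: only the v2 manifest shown above is accepted


def validate_manifest_structure(manifest: dict) -> list[str]:
    """Return error codes for structural problems in a parsed replay manifest."""
    errors = []
    if "version" not in manifest:
        errors.append("REPLAY_MANIFEST_MISSING_VERSION")
    elif manifest["version"] != EXPECTED_VERSION:
        errors.append("VERSION_MISMATCH")
    if not manifest.get("hashAlg"):
        errors.append("MISSING_HASH_ALG")
    # Assumed ordering: entries sorted by cas_uri; the real sort key may differ.
    uris = [entry.get("cas_uri", "") for entry in manifest.get("entries", [])]
    if uris != sorted(uris):
        errors.append("UNSORTED_ENTRIES")
    return errors
```

Collecting all codes instead of stopping at the first keeps the output stable across runs, which suits a determinism-focused pipeline.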
+
+## Benchmark Automation
+
+### Running Benchmarks
+
+```bash
+# Full benchmark pipeline
+./scripts/bench/run-baseline.sh --all
+
+# Individual steps
+./scripts/bench/run-baseline.sh --populate   # Generate findings from fixtures
+./scripts/bench/run-baseline.sh --compute    # Compute metrics
+
+# Compare with baseline scanner
+./scripts/bench/run-baseline.sh --compare baseline-results.json
+```
+
+### Benchmark Outputs
+
+Results are written to `bench/results/`:
+
+- `summary.csv`: Per-run metrics (TP, FP, TN, FN, precision, recall, F1)
+- `metrics.json`: Detailed findings with evidence hashes
+- `replay/`: Replay outputs for verification
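For reference, the derived columns in `summary.csv` follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below is not the benchmark script's code. The function name is hypothetical, and returning 0.0 when a denominator is zero is one common convention that the real pipeline may not share.

```python
def classification_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Compute precision, recall, and F1 from counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, 3 true positives, 1 false positive, and 3 false negatives give precision 0.75, recall 0.5, and F1 0.6. True negatives do not enter any of the three formulas.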
+
+### Verification Tools
+
+```bash
+# Online verification (DSSE + Rekor)
+./bench/tools/verify.sh <finding-bundle>
+
+# Offline verification
+python3 bench/tools/verify.py --bundle <finding-dir> --offline
+
+# Compare scanners
+python3 bench/tools/compare.py --baseline <scanner-results> --json
+```
+
+## References
+
+- [Function-Level Evidence Guide](../docs/reachability/function-level-evidence.md)
+- [Reachability Runtime Runbook](../docs/runbooks/reachability-runtime.md)
+- [Replay Manifest Specification](../docs/replay/DETERMINISTIC_REPLAY.md)
+- [VEX Evidence Playbook](../docs/benchmarks/vex-evidence-playbook.md)
+- [Ground-Truth Schema](../docs/reachability/ground-truth-schema.md)