feat: Add Scanner CI runner and related artifacts
- Implemented `run-scanner-ci.sh` to build and run tests for the Scanner solution with a warmed NuGet cache.
- Created `excititor-vex-traces.json` dashboard for monitoring Excititor VEX observations.
- Added Docker Compose configuration for the OTLP span sink in `docker-compose.spansink.yml`.
- Configured OpenTelemetry collector in `otel-spansink.yaml` to receive and process traces.
- Developed `run-spansink.sh` script to run the OTLP span sink for Excititor traces.
- Introduced `FileSystemRiskBundleObjectStore` for storing risk bundle artifacts in the filesystem.
- Built `RiskBundleBuilder` for creating risk bundles with associated metadata and providers.
- Established `RiskBundleJob` to execute the risk bundle creation and storage process.
- Defined models for risk bundle inputs, entries, and manifests in `RiskBundleModels.cs`.
- Implemented signing functionality for risk bundle manifests with `HmacRiskBundleManifestSigner`.
- Created unit tests for `RiskBundleBuilder`, `RiskBundleJob`, and signing functionality to ensure correctness.
- Added filesystem artifact reader tests to validate manifest parsing and artifact listing.
- Included test manifests for egress scenarios in the task runner tests.
- Developed timeline query service tests to verify tenant and event ID handling.
@@ -13,6 +13,7 @@ Deterministic, reproducible benchmark for reachability analysis tools.
- `benchmark/truth/` — ground-truth labels (hidden/internal split optional).
- `benchmark/submissions/` — sample submissions and format reference.
- `tools/scorer/` — `rb-score` CLI and tests.
- `tools/build/` — `build_all.py` (run all cases) and `validate_builds.py` (run twice and compare hashes).
- `baselines/` — reference runners (Semgrep, CodeQL, Stella) with normalized outputs.
- `ci/` — deterministic CI workflows and scripts.
- `website/` — static site (leaderboard/docs/downloads).
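The "run twice and compare hashes" check described for `validate_builds.py` can be sketched roughly as follows. This is not the script from the repository: `tree_digest`, `is_deterministic`, and `fake_build` are hypothetical names, and the digest scheme (SHA-256 over sorted relative paths and file bytes) is an assumption chosen for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash every file under root in sorted order (relative path + contents)."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h.update(path.relative_to(root).as_posix().encode("utf-8"))
        h.update(b"\0")
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def is_deterministic(build, runs: int = 2) -> bool:
    """Run build(out_dir) into fresh directories and compare the tree digests."""
    digests = set()
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as tmp:
            out = Path(tmp)
            build(out)
            digests.add(tree_digest(out))
    return len(digests) == 1


def fake_build(out: Path) -> None:
    # Hypothetical stand-in for building one benchmark case.
    (out / "bin").mkdir()
    (out / "bin" / "app").write_bytes(b"stub-binary")
    (out / "manifest.json").write_text('{"case": "case-1"}', encoding="utf-8")


if __name__ == "__main__":
    print("deterministic:", is_deterministic(fake_build))
```

Hashing relative paths along with contents means a renamed output file changes the digest, while the temporary directory name itself does not.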
@@ -28,6 +28,8 @@ python -m pip install -r requirements.txt
python -m unittest tests/test_scoring.py
```

Explainability tiers (task 513-009) are covered by `test_explainability_tiers` in `tests/test_scoring.py`.

## Notes
- Predictions for sinks not present in truth count as false positives (strict posture).
- Truth sinks with label `unknown` are ignored for FN/FP counting.
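The two counting rules in the Notes can be illustrated with a small sketch. This is not the `rb-score` implementation: `Counts` and `count_case` are hypothetical names, the `unreachable` label is an assumed counterpart to `reachable`, and the sketch assumes only positive (`reachable`) predictions can become false positives.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int = 0
    fp: int = 0
    fn: int = 0


def count_case(truth_sinks, predicted_sinks) -> Counts:
    """Count TP/FP/FN for one case under the strict posture."""
    truth = {s["sink_id"]: s["label"] for s in truth_sinks}
    predicted = {s["sink_id"]: s["prediction"] for s in predicted_sinks}
    counts = Counts()
    for sink_id, prediction in predicted.items():
        if prediction != "reachable":
            continue
        label = truth.get(sink_id)
        if label is None:
            counts.fp += 1  # sink absent from truth: strict posture
        elif label == "unknown":
            continue  # unknown truth labels are ignored
        elif label == "reachable":
            counts.tp += 1
        else:
            counts.fp += 1
    for sink_id, label in truth.items():
        # Only sinks labelled reachable can be missed; unknown is skipped.
        if label == "reachable" and predicted.get(sink_id) != "reachable":
            counts.fn += 1
    return counts
```

With truth `a` reachable, `b` unreachable, `c` unknown, `d` reachable, and `reachable` predictions for `a`, `b`, `c`, and an extra sink `e`, this yields one true positive (`a`), two false positives (`b`, `e`), and one false negative (`d`); `c` contributes nothing.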
Binary file not shown.
@@ -65,6 +65,37 @@ class TestScoring(unittest.TestCase):
        self.assertEqual(report.f1, 0.0)
        self.assertEqual(report.determinism_rate, 1.0)

    def test_explainability_tiers(self):
        # Build synthetic predictions to exercise explainability tiers 0-3
        preds = [
            {"sink_id": "a", "prediction": "reachable", "explain": {}}, # tier 0
            {"sink_id": "b", "prediction": "reachable", "explain": {"path": ["f1", "f2"]}}, # tier 1
            {"sink_id": "c", "prediction": "reachable", "explain": {"entry": "E", "path": ["f1", "f2", "f3"]}}, # tier 2
            {"sink_id": "d", "prediction": "reachable", "explain": {"guards": ["x"], "path": ["f1", "f2"]}}, # tier 3
        ]
        # Minimal truth to allow scoring
        truth_doc = {
            "version": "1.0.0",
            "cases": [
                {
                    "case_id": "case-1",
                    "sinks": [
                        {"sink_id": s, "label": "reachable"} for s in ["a", "b", "c", "d"]
                    ],
                }
            ],
        }
        submission = {
            "version": "1.0.0",
            "tool": {"name": "t", "version": "1"},
            "run": {"platform": "x"},
            "cases": [{"case_id": "case-1", "sinks": preds}],
        }

        report = rb_score.score(truth_doc, submission)
        # explainability average should be (0+1+2+3)/4 = 1.5
        self.assertAlmostEqual(report.explain_avg, 1.5, places=4)


if __name__ == "__main__":
    unittest.main()
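The tier comments in the test suggest a mapping from the `explain` object to a tier number. The sketch below is inferred only from those four test inputs; `explain_tier` and `explain_average` are hypothetical names, and the real `rb_score` rules may differ, for example in how `entry` and `guards` combine.

```python
def explain_tier(explain: dict) -> int:
    """Assumed mapping: no path -> 0, path -> 1, path + entry -> 2, path + guards -> 3."""
    if not explain.get("path"):
        return 0
    if explain.get("guards"):
        return 3
    if explain.get("entry"):
        return 2
    return 1


def explain_average(sinks: list) -> float:
    """Mean tier over a list of predicted sinks; 0.0 for an empty list."""
    if not sinks:
        return 0.0
    return sum(explain_tier(s.get("explain", {})) for s in sinks) / len(sinks)
```

Applied to the four predictions in the test, this assumed mapping gives tiers 0, 1, 2, and 3, so the mean is 1.5, matching the value the test asserts.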