feat: Add Scanner CI runner and related artifacts

- Implemented `run-scanner-ci.sh` to build and run tests for the Scanner solution with a warmed NuGet cache.
- Created `excititor-vex-traces.json` dashboard for monitoring Excititor VEX observations.
- Added Docker Compose configuration for the OTLP span sink in `docker-compose.spansink.yml`.
- Configured OpenTelemetry collector in `otel-spansink.yaml` to receive and process traces.
- Developed `run-spansink.sh` script to run the OTLP span sink for Excititor traces.
- Introduced `FileSystemRiskBundleObjectStore` for storing risk bundle artifacts in the filesystem.
- Built `RiskBundleBuilder` for creating risk bundles with associated metadata and providers.
- Established `RiskBundleJob` to execute the risk bundle creation and storage process.
- Defined models for risk bundle inputs, entries, and manifests in `RiskBundleModels.cs`.
- Implemented signing functionality for risk bundle manifests with `HmacRiskBundleManifestSigner`.
- Created unit tests for `RiskBundleBuilder`, `RiskBundleJob`, and signing functionality to ensure correctness.
- Added filesystem artifact reader tests to validate manifest parsing and artifact listing.
- Included test manifests for egress scenarios in the task runner tests.
- Developed timeline query service tests to verify tenant and event ID handling.
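The message mentions `HmacRiskBundleManifestSigner`, but its code is not part of this excerpt. As a rough illustration only, here is a minimal Python sketch of HMAC-SHA256 manifest signing. It assumes the manifest is canonicalized as sorted-key, compact JSON before signing. The function names, manifest fields, and canonicalization are hypothetical and are not taken from the actual signer.

```python
import base64
import hashlib
import hmac
import json


def canonical_manifest_bytes(manifest: dict) -> bytes:
    # Assumption: sorted keys plus compact separators give a stable byte
    # encoding, so an identical manifest always yields an identical signature.
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Return a base64-encoded HMAC-SHA256 over the canonical manifest bytes."""
    digest = hmac.new(key, canonical_manifest_bytes(manifest), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_manifest(manifest: dict, key: bytes, signature: str) -> bool:
    expected = sign_manifest(manifest, key)
    # compare_digest runs in constant time with respect to where inputs differ.
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = b"example-key"  # hypothetical key; a real key would come from configuration
    manifest = {"bundleId": "rb-001", "entries": [{"path": "providers/a.json", "sha256": "abc123"}]}
    sig = sign_manifest(manifest, key)
    print(verify_manifest(manifest, key, sig))  # True
```

Canonicalizing before signing means key order in the source manifest cannot change the signature. Any change to a value or to the key makes verification fail.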
2025-11-30 19:12:35 +02:00
parent 17d45a6d30
commit 71e9a56cfd
92 changed files with 2596 additions and 387 deletions
--- a/bench/reachability-benchmark/README.md
+++ b/bench/reachability-benchmark/README.md
@@ -13,6 +13,7 @@ Deterministic, reproducible benchmark for reachability analysis tools.
 - `benchmark/truth/` — ground-truth labels (hidden/internal split optional).
 - `benchmark/submissions/` — sample submissions and format reference.
 - `tools/scorer/` — `rb-score` CLI and tests.
+- `tools/build/` — `build_all.py` (run all cases) and `validate_builds.py` (run twice and compare hashes).
 - `baselines/` — reference runners (Semgrep, CodeQL, Stella) with normalized outputs.
 - `ci/` — deterministic CI workflows and scripts.
 - `website/` — static site (leaderboard/docs/downloads).
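The README says `validate_builds.py` runs the builds twice and compares hashes. Below is a minimal sketch of that comparison. It assumes each run writes its outputs into its own directory. `tree_digest` and `builds_match` are hypothetical helpers, not the script's actual functions.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash every file under root, visiting files in sorted relative-path order."""
    h = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel = path.relative_to(root).as_posix()
        # Mix in the relative path (NUL-terminated) so renames change the digest,
        # then the fixed-length content hash of the file itself.
        h.update(rel.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()


def builds_match(first: Path, second: Path) -> bool:
    """True when two build output trees are byte-for-byte identical."""
    return tree_digest(first) == tree_digest(second)
```

Sorting by relative path keeps the digest independent of filesystem enumeration order. Hashing the path as well as the content catches moved or renamed outputs.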
--- a/bench/reachability-benchmark/tools/scorer/README.md
+++ b/bench/reachability-benchmark/tools/scorer/README.md
@@ -28,6 +28,8 @@ python -m pip install -r requirements.txt
 python -m unittest tests/test_scoring.py
 ```

+Explainability tiers (task 513-009) are covered by `test_explainability_tiers` in `tests/test_scoring.py`.
+
 ## Notes
 - Predictions for sinks not present in truth count as false positives (strict posture).
 - Truth sinks with label `unknown` are ignored for FN/FP counting.
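The two counting rules above can be illustrated with a small sketch. It assumes truth labels of `reachable`, `unreachable`, and `unknown`, and that only `reachable` predictions are counted. `confusion_counts` is a hypothetical helper, not part of `rb-score`.

```python
def confusion_counts(truth: dict[str, str], predicted_reachable: set[str]) -> tuple[int, int, int]:
    """Return (tp, fp, fn) under a strict posture.

    truth maps sink_id -> label ("reachable", "unreachable", or "unknown").
    predicted_reachable holds the sink_ids a tool flagged as reachable.
    """
    tp = fp = fn = 0
    for sink_id in predicted_reachable:
        label = truth.get(sink_id)
        if label == "unknown":
            continue  # unknown truth labels are ignored for FP/FN counting
        if label == "reachable":
            tp += 1
        else:
            # Labeled unreachable, or absent from truth entirely: both are FPs.
            fp += 1
    for sink_id, label in truth.items():
        if label == "reachable" and sink_id not in predicted_reachable:
            fn += 1
    return tp, fp, fn
```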
--- a/bench/reachability-benchmark/tools/scorer/tests/__pycache__/test_scoring.cpython-312.pyc
+++ b/bench/reachability-benchmark/tools/scorer/tests/__pycache__/test_scoring.cpython-312.pyc
--- a/bench/reachability-benchmark/tools/scorer/tests/test_scoring.py
+++ b/bench/reachability-benchmark/tools/scorer/tests/test_scoring.py
@@ -65,6 +65,37 @@ class TestScoring(unittest.TestCase):
        self.assertEqual(report.f1, 0.0)
        self.assertEqual(report.determinism_rate, 1.0)

+    def test_explainability_tiers(self):
+        # Build synthetic predictions to exercise explainability tiers 0-3
+        preds = [
+            {"sink_id": "a", "prediction": "reachable", "explain": {}},  # tier 0
+            {"sink_id": "b", "prediction": "reachable", "explain": {"path": ["f1", "f2"]}},  # tier 1
+            {"sink_id": "c", "prediction": "reachable", "explain": {"entry": "E", "path": ["f1", "f2", "f3"]}},  # tier 2
+            {"sink_id": "d", "prediction": "reachable", "explain": {"guards": ["x"], "path": ["f1", "f2"]}},  # tier 3
+        ]
+        # Minimal truth to allow scoring
+        truth_doc = {
+            "version": "1.0.0",
+            "cases": [
+                {
+                    "case_id": "case-1",
+                    "sinks": [
+                        {"sink_id": s, "label": "reachable"} for s in ["a", "b", "c", "d"]
+                    ],
+                }
+            ],
+        }
+        submission = {
+            "version": "1.0.0",
+            "tool": {"name": "t", "version": "1"},
+            "run": {"platform": "x"},
+            "cases": [{"case_id": "case-1", "sinks": preds}],
+        }
+
+        report = rb_score.score(truth_doc, submission)
+        # explainability average should be (0+1+2+3)/4 = 1.5
+        self.assertAlmostEqual(report.explain_avg, 1.5, places=4)
+

 if __name__ == "__main__":
    unittest.main()
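The new test fixes four tier examples and an expected average of 1.5. Below is a minimal sketch of a tier function that is consistent with those fixtures. The rule is inferred from the test data alone, so `explain_tier` and `explain_avg` are hypothetical and may not match how `rb_score` actually assigns tiers.

```python
def explain_tier(explain: dict) -> int:
    # Hypothetical rule inferred from the test fixtures, not rb_score's code.
    if explain.get("guards"):
        return 3  # guard conditions supplied
    if explain.get("entry") and explain.get("path"):
        return 2  # entry point plus call path
    if explain.get("path"):
        return 1  # call path only
    return 0  # no explanation


def explain_avg(predictions: list[dict]) -> float:
    """Mean tier across predictions; 0.0 for an empty list."""
    if not predictions:
        return 0.0
    return sum(explain_tier(p.get("explain", {})) for p in predictions) / len(predictions)
```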