rb-score
Deterministic scorer for the reachability benchmark.
What it does
- Validates submissions against `schemas/submission.schema.json` and truth against `schemas/truth.schema.json`.
- Computes precision/recall/F1 (micro, sink-level).
- Computes explainability score per prediction (0–3) and averages it.
- Checks duplicate predictions for determinism (inconsistent duplicates lower the rate).
- Surfaces runtime metadata from the submission (`run` block).
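As a rough illustration of what micro-averaged, sink-level metrics mean, the sketch below pools true positives, false positives, and false negatives across all cases before computing precision, recall, and F1. The `Counts` container, the `micro_prf` name, and the choice to return 0.0 on an empty denominator are assumptions made for this example, not necessarily how `rb_score.py` is written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Sink-level tallies for one benchmark case (hypothetical container)."""
    tp: int
    fp: int
    fn: int


def micro_prf(per_case):
    """Pool counts across all cases, then compute precision, recall, F1."""
    tp = sum(c.tp for c in per_case)
    fp = sum(c.fp for c in per_case)
    fn = sum(c.fn for c in per_case)
    # Assumption: an empty denominator yields 0.0 instead of raising.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(micro_prf([Counts(tp=2, fp=1, fn=0), Counts(tp=1, fp=0, fn=2)]))
```

Because the counts are pooled first, a case with many sinks weighs more than a case with few, which is the usual difference between micro and macro averaging.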
Install (offline-friendly)
python -m pip install -r requirements.txt
Usage
./rb_score.py --truth ../../benchmark/truth/public.json --submission ../../benchmark/submissions/sample.json --format json
Output
- `text` (default): short human-readable summary.
- `json`: deterministic JSON with top-level metrics and per-case breakdown.
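One common way to make JSON output deterministic in Python is to sort keys and fix the formatting options, so the same metrics always serialize to the same bytes. The following is a minimal sketch under that assumption; `render_json` and the report keys are hypothetical, and the actual serializer in `rb_score.py` may differ.

```python
import json


def render_json(report):
    """Serialize a report dict with stable key order and fixed formatting."""
    return json.dumps(report, sort_keys=True, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Insertion order differs, but the rendered text is identical.
    a = render_json({"precision": 0.75, "recall": 0.6})
    b = render_json({"recall": 0.6, "precision": 0.75})
    print(a == b)
```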
Tests
python -m unittest tests/test_scoring.py
Explainability tiers (task 513-009) are covered by `test_explainability_tiers` in `tests/test_scoring.py`.
Notes
- Predictions for sinks not present in truth count as false positives (strict posture).
- Truth sinks with label `unknown` are ignored for FN/FP counting.
- Explainability tiering: 0=no context; 1=path>=2 nodes; 2=entry + path>=3; 3=guards present.
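The notes above can be sketched as two small helpers. This is only an illustration under stated assumptions: the field names `entry`, `path`, and `guards`, the label value `reachable`, and the function names are hypothetical, and the tier function assumes the highest matching tier wins (so guards alone yield 3). The real logic lives in `rb_score.py`.

```python
def explainability_tier(explain):
    """Return 0-3 for one prediction's explanation (highest tier wins)."""
    if not explain:
        return 0  # tier 0: no context
    path = explain.get("path") or []
    if explain.get("guards"):
        return 3  # tier 3: guards present
    if explain.get("entry") and len(path) >= 3:
        return 2  # tier 2: entry point plus a path of at least 3 nodes
    if len(path) >= 2:
        return 1  # tier 1: path of at least 2 nodes
    return 0


def count_sinks(truth, predicted):
    """Tally (tp, fp, fn) for one case.

    truth:     dict of sink id -> label (assumed: "reachable", "unknown", other)
    predicted: set of sink ids the submission marks as reachable
    """
    tp = fp = fn = 0
    for sink in predicted:
        label = truth.get(sink)
        if label is None:
            fp += 1  # sink not present in truth: strict false positive
        elif label == "unknown":
            continue  # unknown truth labels are ignored
        elif label == "reachable":
            tp += 1
        else:
            fp += 1
    for sink, label in truth.items():
        if label == "reachable" and sink not in predicted:
            fn += 1  # missed reachable sink
    return tp, fp, fn
```

For example, with truth `{"a": "reachable", "b": "unknown"}` and predictions `{"b", "x"}`, sink `b` is skipped, `x` counts as a false positive, and `a` counts as a false negative.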