Update SPRINT_0513_0001_0001_public_reachability_benchmark.md
This commit is contained in:
@@ -32,7 +32,7 @@
|
||||
| 2 | BENCH-SCHEMA-513-002 | DONE (2025-11-29) | Depends on 513-001. | Bench Guild | Define and publish schemas: `case.schema.yaml` (component, sink, label, evidence), `entrypoints.schema.yaml`, `truth.schema.yaml`, `submission.schema.json`. Include JSON Schema validation. |
|
||||
| 3 | BENCH-CASES-JS-513-003 | DONE (2025-11-30) | Depends on 513-002. | Bench Guild · JS Track (`bench/reachability-benchmark/cases/js`) | Create 5-8 JavaScript/Node.js cases: 2 small (Express), 2 medium (Fastify/Koa), mix of reachable/unreachable. Include Dockerfiles, package-lock.json, unit test oracles, coverage output. Delivered 5 cases: unsafe-eval (reachable), guarded-eval (unreachable), express-eval (reachable), express-guarded (unreachable), fastify-template (reachable). |
|
||||
| 4 | BENCH-CASES-PY-513-004 | DONE (2025-11-30) | Depends on 513-002. | Bench Guild · Python Track (`bench/reachability-benchmark/cases/py`) | Create 5-8 Python cases: Flask, Django, FastAPI. Include requirements.txt pinned, pytest oracles, coverage.py output. Delivered 5 cases: unsafe-exec (reachable), guarded-exec (unreachable), flask-template (reachable), fastapi-guarded (unreachable), django-ssti (reachable). |
|
||||
| 5 | BENCH-CASES-JAVA-513-005 | DONE (2025-12-05) | Vendored Temurin 21 via `tools/java/ensure_jdk.sh`; build_all updated | Bench Guild <20> Java Track (`bench/reachability-benchmark/cases/java`) | Create 5-8 Java cases: Spring Boot, Micronaut. Delivered 5 cases (`spring-deserialize`, `spring-guarded`, `micronaut-deserialize`, `micronaut-guarded`, `spring-reflection`) with coverage/traces and skip-lang aware builds using vendored JDK fallback. |
|
||||
| 5 | BENCH-CASES-JAVA-513-005 | DONE (2025-12-05) | Vendored Temurin 21 via `tools/java/ensure_jdk.sh`; build_all updated | Bench Guild <20> Java Track (`bench/reachability-benchmark/cases/java`) | Create 5-8 Java cases: Spring Boot, Micronaut. Delivered 5 cases (`spring-deserialize`, `spring-guarded`, `micronaut-deserialize`, `micronaut-guarded`, `spring-reflection`) with coverage/traces and skip-lang aware builds using vendored JDK fallback. |
|
||||
| 6 | BENCH-CASES-C-513-006 | DONE (2025-12-01) | Depends on 513-002. | Bench Guild · Native Track (`bench/reachability-benchmark/cases/c`) | Create 3-5 C/ELF cases: small HTTP servers, crypto utilities. Include Makefile, gcov/llvm-cov coverage, deterministic builds (SOURCE_DATE_EPOCH). |
|
||||
| 7 | BENCH-BUILD-513-007 | DONE (2025-12-02) | Depends on 513-003 through 513-006. | Bench Guild · DevOps Guild | Implement `build_all.py` and `validate_builds.py`: deterministic Docker builds, hash verification, SBOM generation (syft), attestation stubs. Progress: scripts now auto-emit deterministic SBOM/attestation stubs from `case.yaml`; validate checks auxiliary artifact determinism; SBOM swap-in for syft still pending. |
|
||||
| 8 | BENCH-SCORER-513-008 | DONE (2025-11-30) | Depends on 513-002. | Bench Guild (`bench/reachability-benchmark/tools/scorer`) | Implement `rb-score` CLI: load cases/truth, validate submissions, compute precision/recall/F1, explainability score (0-3), runtime stats, determinism rate. |
|
||||
@@ -40,7 +40,7 @@
|
||||
| 10 | BENCH-BASELINE-SEMGREP-513-010 | DONE (2025-12-01) | Depends on 513-008 and cases. | Bench Guild | Semgrep baseline runner: added `baselines/semgrep/run_case.sh`, `run_all.sh`, rules, and `normalize.py` to emit benchmark submissions deterministically (telemetry off, schema-compliant). |
|
||||
| 11 | BENCH-BASELINE-CODEQL-513-011 | DONE (2025-12-01) | Depends on 513-008 and cases. | Bench Guild | CodeQL baseline runner: deterministic offline-safe runner producing schema-compliant submissions (fallback unreachable when CodeQL missing). |
|
||||
| 12 | BENCH-BASELINE-STELLA-513-012 | DONE (2025-12-01) | Depends on 513-008 and Sprint 0401 reachability. | Bench Guild · Scanner Guild | Stella Ops baseline runner: deterministic offline runner building submission from truth; stable ordering, no external deps. |
|
||||
| 13 | BENCH-CI-513-013 | DONE (2025-12-01) | Depends on 513-007, 513-008. | Bench Guild <20> DevOps Guild | GitHub Actions-style script: validate schemas, deterministic build_all (vendored JDK; skip-lang flag for missing toolchains), run Semgrep/Stella/CodeQL baselines, produce leaderboard. |
|
||||
| 13 | BENCH-CI-513-013 | DONE (2025-12-01) | Depends on 513-007, 513-008. | Bench Guild <20> DevOps Guild | GitHub Actions-style script: validate schemas, deterministic build_all (vendored JDK; skip-lang flag for missing toolchains), run Semgrep/Stella/CodeQL baselines, produce leaderboard. |
|
||||
| 14 | BENCH-LEADERBOARD-513-014 | DONE (2025-12-01) | Depends on 513-008. | Bench Guild | Implemented `rb-compare` to generate `leaderboard.json` from multiple submissions; deterministic sorting. |
|
||||
| 15 | BENCH-WEBSITE-513-015 | DONE (2025-12-01) | Depends on 513-014. | UI Guild · Bench Guild (`bench/reachability-benchmark/website`) | Static website: home page, leaderboard rendering, docs (how to run, how to submit), download links. Use Docusaurus or plain HTML. |
|
||||
| 16 | BENCH-DOCS-513-016 | DONE (2025-12-01) | Depends on all above. | Docs Guild | CONTRIBUTING.md, submission guide, governance doc (TAC roles, hidden test set rotation), quarterly update cadence. |
|
||||
@@ -55,7 +55,7 @@
|
||||
| W1 Foundation | Bench Guild · DevOps Guild | None | DONE (2025-11-29) | Tasks 1-2 shipped: repo + schemas. |
|
||||
| W2 Dataset | Bench Guild (per language track) | W1 complete | DONE (2025-12-05) | JS/PY/C cases DONE; Java track unblocked via vendored JDK with 5 cases and coverage/traces; builds deterministic with skip-lang option. |
|
||||
| W3 Scoring | Bench Guild | W1 complete | DONE (2025-11-30) | Tasks 8-9 shipped: scorer + explainability tiers/tests. |
|
||||
| W4 Baselines | Bench Guild <20> Scanner Guild | W2, W3 complete | DONE (2025-12-01) | Tasks 10-12 shipped: Semgrep, CodeQL, Stella baselines (offline-safe). |
|
||||
| W4 Baselines | Bench Guild <20> Scanner Guild | W2, W3 complete | DONE (2025-12-01) | Tasks 10-12 shipped: Semgrep, CodeQL, Stella baselines (offline-safe). |
|
||||
| W5 Publish | All Guilds | W4 complete | DONE (2025-12-01) | Tasks 13-17 shipped: CI, leaderboard, website, docs, launch. |
|
||||
|
||||
## Wave Detail Snapshots
|
||||
@@ -95,6 +95,7 @@
|
||||
## Execution Log
|
||||
| Date (UTC) | Update | Owner |
|
||||
| --- | --- | --- |
|
||||
| 2025-12-05 | Verified JS builds with Node shim (`tools/node/node`) and vendored JDK; all cases build individually; build_all covers JS when Node is present (shim included in PATH). | Implementer |
|
||||
| 2025-12-05 | BENCH-CASES-JAVA-513-005 DONE: vendored Temurin 21 via `tools/java/ensure_jdk.sh`, added micronaut-deserialize/guarded + spring-reflection cases with coverage/traces, updated build_all skip-lang + CI comment, and ran `python tools/build/build_all.py --cases cases --skip-lang js` (Java pass; js skipped due to missing Node). | Implementer |
|
||||
| 2025-12-03 | Closed BENCH-GAPS-513-018, DATASET-GAPS-513-019, REACH-FIXTURE-GAPS-513-020: added manifest schema + sample with hashes/SBOM/attestation, coverage/trace schemas, sandbox/redaction fields in case schema, determinism env templates, dataset safety checklist, offline kit packager, semgrep rule hash, and `tools/verify_manifest.py` validation (all cases validated; Java build still blocked on JDK). | Implementer |
|
||||
| 2025-12-02 | BENCH-BUILD-513-007: added optional Syft SBOM path with deterministic fallback stub, attestation/SBOM stub tests, and verified via `python bench/reachability-benchmark/tools/build/test_build_tools.py`. Status set to DONE. | Bench Guild |
|
||||
|
||||
Reference in New Issue
Block a user