feat(scanner): Implement Deno analyzer and associated tests

- Added Deno analyzer with comprehensive metadata and evidence structure.
- Created a detailed implementation plan for Sprint 130 focusing on Deno analyzer.
- Introduced AdvisoryAiGuardrailOptions for managing guardrail configurations.
- Developed GuardrailPhraseLoader for loading blocked phrases from JSON files.
- Implemented tests for AdvisoryGuardrailOptions binding and phrase loading.
- Enhanced telemetry for Advisory AI with metrics tracking.
- Added VexObservationProjectionService for querying VEX observations.
- Created extensive tests for VexObservationProjectionService functionality.
- Introduced Ruby language analyzer with tests for simple and complex workspaces.
- Added Ruby application fixtures for testing purposes.
This commit is contained in:
master
2025-11-12 10:01:54 +02:00
parent 0e8655cbb1
commit babb81af52
75 changed files with 3346 additions and 187 deletions


@@ -36,6 +36,8 @@ This guide translates the deterministic reachability blueprint into concrete wor
| Stream | Owner Guild(s) | Key deliverables |
|--------|----------------|------------------|
| **Native symbols & callgraphs** | Scanner Worker · Symbols Guild | Ship `Scanner.Symbols.Native` + `Scanner.CallGraph.Native`, integrate Symbol Manifest v1, demangle Itanium/MSVC names, emit `FuncNode`/`CallEdge` CAS bundles (task `SCANNER-NATIVE-401-015`). |
| **Reachability store** | Signals · BE-Base Platform | Provision shared Mongo collections (`func_nodes`, `call_edges`, `cve_func_hits`), indexes, and repositories plus REST hooks for reuse (task `SIG-STORE-401-016`). |
| **Language lifters** | Scanner Worker | CLI/hosted lifters for DotNet, Go, Node/Deno, JVM, Rust, Swift, Binary, Shell with CAS uploads and richgraph output |
| **Signals ingestion & scoring** | Signals | `/callgraphs`, `/runtime-facts` (JSON + NDJSON/gzip), `/graphs/{id}`, `/reachability/recompute` GA; CAS-backed storage, runtime dedupe, BFS+predicates scoring |
| **Runtime capture** | Zastava + Runtime Guild | EntryTrace/eBPF samplers, NDJSON batches (symbol IDs + timestamps + counts) |
@@ -104,7 +106,8 @@ Each sprint is two weeks; refer to `docs/implplan/SPRINT_401_reachability_eviden
- Place developer-facing updates here (`docs/reachability`).
- [Function-level evidence guide](function-level-evidence.md) captures the Nov 2025 advisory scope, task references, and schema expectations; keep it in lockstep with sprint status.
- Operator runbooks (`docs/runbooks/reachability-runtime.md`) TODO reference to be added when runtime pipeline lands.
- [Reachability runtime runbook](../runbooks/reachability-runtime.md) now documents ingestion, CAS staging, air-gap handling, and troubleshooting—link every runtime feature PR to this guide.
- [VEX Evidence Playbook](../benchmarks/vex-evidence-playbook.md) defines the bench repo layout, artifact shapes, verifier tooling, and metrics; keep it updated when Policy/Signer/CLI features land.
- Update module dossiers (Scanner, Signals, Replay, Authority, Policy, UI) once each guild lands work.
---


@@ -1,6 +1,6 @@
# Function-Level Evidence Readiness (Nov 2025 Advisory)
_Last updated: 2025-11-09. Owner: Business Analysis Guild._
_Last updated: 2025-11-12. Owner: Business Analysis Guild._
This memo captures the outstanding work required to make StellaOps scanners emit stable, function-level evidence that matches the November 2025 advisory. It does **not** implement any code; instead it enumerates requirements, links them to sprint tasks, and spells out the schema/API updates that the next agent must land.
@@ -62,6 +62,18 @@ Out of scope: implementing disassemblers or symbol servers; those will be handle
* Write CLI/API walkthroughs in `docs/09_API_CLI_REFERENCE.md` and `docs/api/policy.md` showing how to request reachability evidence and verify DSSE chains.
* Produce OpenVEX + replay samples under `samples/reachability/` showing `facts.type = "stella.reachability"` with `graph_hash` and `code_id` arrays.
### 3.6 Native lifter & Reachability Store (SCANNER-NATIVE-401-015 / SIG-STORE-401-016)
* Stand up `Scanner.Symbols.Native` + `Scanner.CallGraph.Native` libraries that:
  * parse ELF (DWARF + `.symtab`/`.dynsym`), PE/COFF (CodeView/PDB), and stripped binaries via probabilistic carving;
  * emit deterministic `FuncNode` + `CallEdge` records with demangled names, language hints, and `{confidence,evidence}` arrays; and
  * attach analyzer + toolchain identifiers consumed by `richgraph-v1`.
* Introduce `Reachability.Store` collections in Mongo:
  * `func_nodes` keyed by `func:<format>:<sha256>:<va>` with `{binDigest,name,addr,size,lang,confidence,sym}`.
  * `call_edges` `{from,to,kind,confidence,evidence[]}` linking internal/external nodes.
  * `cve_func_hits` `{cve,purl,func_id,match_kind,confidence,source}` for advisory alignment.
* Build indexes (`binDigest+name`, `from→to`, `cve+func_id`) and expose repository interfaces so Scanner, Signals, and Policy can reuse the same canonical data without duplicating queries.
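As an illustration of the `func:<format>:<sha256>:<va>` key shape, here is a minimal Python sketch. The helper name `func_node_id`, the upper-cased format tag, and the lower-case hex address without a `0x` prefix are assumptions made for this example (they mirror the `func:ELF:…:4012a0` sample, not a confirmed contract).

```python
import hashlib


def func_node_id(binary_format: str, binary_bytes: bytes, va: int) -> str:
    """Build a key shaped like func:<format>:<sha256>:<va>.

    Hypothetical helper: the casing and address encoding are assumptions.
    """
    if va < 0:
        raise ValueError("virtual address must be non-negative")
    # Digest of the whole binary, so the key is stable across rescans
    # of byte-identical input.
    digest = hashlib.sha256(binary_bytes).hexdigest()
    # Lower-case hex without a 0x prefix, so the same address always
    # produces the same key.
    return f"func:{binary_format.upper()}:{digest}:{va:x}"
```

Because the key is derived only from the binary contents and the address, two producers that scan the same binary would arrive at the same `_id` without coordinating.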
---
## 4. Schema & API Touchpoints
@@ -86,6 +98,50 @@ API contracts to amend:
- Signals dedupes events, merges metadata, and persists the aggregated `RuntimeFacts` onto `ReachabilityFactDocument`. These facts now feed reachability scoring (SIGNALS-24-004/005) as part of the runtime bonus lattice.
- Outstanding work: record CAS URIs for runtime traces, emit provenance events, and expose the enriched context to Policy/Replay consumers.
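The dedupe-and-merge step can be pictured with a small Python sketch. It is not the Signals implementation: the event field names (`symbolId`, `count`, `timestamp`), the `RuntimeFact` shape, and the merge rule (sum counts, keep the latest timestamp) are all assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class RuntimeFact:
    symbol_id: str
    hit_count: int
    last_seen: str  # ISO-8601 UTC strings of one format compare correctly as text


def aggregate_runtime_events(ndjson_text):
    """Collapse an NDJSON batch into one RuntimeFact per symbol ID."""
    facts = {}
    for raw in ndjson_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        event = json.loads(raw)
        sid = event["symbolId"]  # hypothetical field name
        fact = facts.get(sid)
        if fact is None:
            facts[sid] = RuntimeFact(sid, event["count"], event["timestamp"])
        else:
            # Duplicate symbol: sum the counts, keep the most recent timestamp.
            fact.hit_count += event["count"]
            fact.last_seen = max(fact.last_seen, event["timestamp"])
    return facts
```

Keying the aggregate by symbol ID means a replayed batch changes counts but never creates a second record for the same symbol; a real pipeline would also need an idempotency key per batch to avoid double counting.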
### 4.2 Reachability store layout (SIG-STORE-401-016)
All producers **must** persist native function evidence using the shared collections below (names are advisory; exact names live in Mongo options):
```json
// func_nodes
{
"_id": "func:ELF:sha256:4012a0",
"binDigest": "sha256:deadbeef...",
"name": "ssl3_read_bytes",
"addr": "0x4012a0",
"size": 312,
"lang": "c",
"confidence": 0.92,
"symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF" },
"sym": "present"
}
// call_edges
{
"from": "func:ELF:sha256:4012a0",
"to": "func:ELF:sha256:40f0ff",
"kind": "static",
"confidence": 0.88,
"evidence": ["reloc:.plt.got", "bb-target:0x40f0ff"]
}
// cve_func_hits
{
"cve": "CVE-2023-XXXX",
"purl": "pkg:generic/openssl@1.1.1u",
"func_id": "func:ELF:sha256:4012a0",
"match": "name+version",
"confidence": 0.77,
"source": "concelier:openssl-advisory"
}
```
Writers **must**:
1. Upsert `func_nodes` before emitting edges/hits to ensure `_id` lookups remain stable.
2. Serialize evidence arrays in deterministic order (`reloc`, `bb-target`, `import`, …) and normalise hex casing.
3. Attach analyzer fingerprints (`scanner.native@sha256:...`) so Replay/Policy can enforce provenance.
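Rule 2 can be sketched in Python as follows. Only the three kinds named above are ranked. Because the source list is open-ended, this sketch assumes that any other kind sorts alphabetically after them, that "normalise hex casing" means lower-casing `0x…` tokens (as in the sample documents), and that exact duplicates are dropped. The function name is hypothetical.

```python
import re

# Kinds named in the text, in the order given there. Kinds outside this
# tuple are ranked after it (an assumption, since the list is open-ended).
KIND_ORDER = ("reloc", "bb-target", "import")
_HEX_TOKEN = re.compile(r"0[xX][0-9a-fA-F]+")


def normalize_evidence(evidence):
    """Return evidence strings in a deterministic order with lower-case hex."""

    def lower_hex(item):
        return _HEX_TOKEN.sub(lambda m: m.group(0).lower(), item)

    def sort_key(item):
        kind = item.split(":", 1)[0]
        rank = KIND_ORDER.index(kind) if kind in KIND_ORDER else len(KIND_ORDER)
        # Ties fall back to kind name, then the full string, so the result
        # never depends on input order.
        return (rank, kind, item)

    # The set removes exact duplicates after normalisation.
    return sorted({lower_hex(e) for e in evidence}, key=sort_key)
```

Normalising before sorting matters: two entries that differ only in hex casing would otherwise survive as separate items and change the serialized bytes.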
---
## 5. Test & Fixture Expectations