prep docs and service updates
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
@@ -45,6 +45,7 @@ This guide translates the deterministic reachability blueprint into concrete wor
| **Authority attestations** | Authority + Signer | DSSE predicates for SBOM, Graph, Replay, VEX; Rekor mirror alignment |
| **Policy & VEX** | Policy Engine + Web + CLI + UI | Accept reachability states, render “Why safe” call paths, CLI/UI explain flows |
| **QA & Docs** | QA + Docs Guilds | `reachbench-2025-expanded` fixtures wired to CI; operator + developer runbooks |
| **Binary quality guardrails (Nov 2026)** | Scanner · Signals · QA | Build-id capture, init-array roots, purl-resolved edges, unknowns emission, and patch-oracle fixtures; see sections 5.5–5.8 |

---

@@ -90,6 +91,38 @@ Each sprint is two weeks; refer to `docs/implplan/SPRINT_401_reachability_eviden
3. **UI/CLI** – Visual explain drawer/CLI command showing signed call-path, predicates, runtime hits; counterfactual toggles.
4. **VEX emitter** – generate OpenVEX statements with evidence references, DSSE sign via Signer.

### 5.5 Native binaries (build-id + init roots)

- Capture ELF build-id (`.note.gnu.build-id`) alongside soname/path and propagate into `SymbolID`/`code_id` so SBOM/runtime joins stay stable even when paths change.
- Treat `.preinit_array`, `.init_array`, `.ctors`, and `_init` as synthetic graph roots with `phase=load`; include constructors from `DT_NEEDED` deps. Persist the root list in scan evidence.
- Add deterministic tests covering build-id present/absent and init-array edge creation.

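The build-id capture described above can be sketched as follows. This is a minimal illustration, not the Scanner implementation: it assumes the caller has already extracted the raw bytes of the `.note.gnu.build-id` section, and `parse_build_id` and `code_identity` are hypothetical helper names. The fallback mirrors the `build_id_source: "FileHash"` rule from the evidence schema.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # ELF note type carried by .note.gnu.build-id


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (ELF note padding)."""
    return (n + 3) & ~3


def parse_build_id(note: bytes, little_endian: bool = True) -> Optional[str]:
    """Return the build-id as lowercase hex, or None if no GNU build-id note exists."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(note):
        # Each note starts with three 32-bit words: namesz, descsz, type.
        namesz, descsz, ntype = struct.unpack_from(fmt, note, offset)
        offset += 12
        name = note[offset:offset + namesz]
        offset += _align4(namesz)
        desc = note[offset:offset + descsz]
        offset += _align4(descsz)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00" and len(desc) == descsz:
            return desc.hex()
    return None


def code_identity(note: Optional[bytes], file_sha256: str) -> dict:
    """Hypothetical helper: prefer the build-id, else fall back to the file hash."""
    build_id = parse_build_id(note) if note else None
    if build_id:
        return {"build_id": build_id}
    return {"file_hash": f"sha256:{file_sha256}", "build_id_source": "FileHash"}
```

Covering both branches of `code_identity` in tests corresponds to the "build-id present/absent" fixtures called for above.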
### 5.6 PURL-resolved edges

- Annotate every call edge with callee `purl` and `symbol_digest` per `docs/reachability/purl-resolved-edges.md`.
- Update `richgraph-v1` schema, CAS metadata, and CLI/UI explainers to display `purl@version` + demangled name.
- Signals merges graphs by `(purl, symbol_digest)`; Policy uses the same keys when mapping CVE-affected functions.

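A rough sketch of the merge step, assuming each graph is a dict shaped like the `richgraph-v1` payload (`edges[]` with `from`, `purl`, `symbol_digest`, `evidence`, plus a top-level `graph_hash`). The `merge_edges` name and the shape of the `sites` entries are illustrative assumptions, not the Signals storage model.

```python
from typing import Dict, Iterable, Tuple

EdgeKey = Tuple[str, str]  # (purl, symbol_digest) of the callee


def merge_edges(graphs: Iterable[dict]) -> Dict[EdgeKey, dict]:
    """Merge call edges from several graphs, keyed by (purl, symbol_digest)."""
    merged: Dict[EdgeKey, dict] = {}
    for graph in graphs:
        for edge in graph.get("edges", []):
            key = (edge["purl"], edge["symbol_digest"])
            entry = merged.setdefault(
                key,
                {"purl": key[0], "symbol_digest": key[1], "sites": [], "evidence": set()},
            )
            # Keep one site per observed call so callers stay traceable.
            entry["sites"].append({"graph": graph.get("graph_hash"), "from": edge["from"]})
            entry["evidence"].update(edge.get("evidence", []))
    for entry in merged.values():
        # Sort for deterministic output regardless of input order.
        entry["evidence"] = sorted(entry["evidence"])
        entry["sites"].sort(key=lambda s: (str(s["graph"]), s["from"]))
    return merged
```

Because the key contains no binary-specific address, the same callee reached from two different binaries collapses into one entry with two sites.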
### 5.7 Unknowns Registry integration

- Emit structured Unknowns when symbol→purl mapping, edge targets, or hashes are ambiguous; write them via Signals API per `docs/signals/unknowns-registry.md`.
- Scoring adds `unknowns_pressure` so `not_affected` claims cannot bypass unresolved evidence.
- UI/CLI should surface unknown chips and triage actions.

### 5.8 Patch-oracle guardrails

- Add `tests/reachability/patch-oracles/**` with paired vuln/fixed binaries and `oracle.yml` expectations (functions/edges added/removed).
- Scanner binary analyzer tests must fail if expected guard functions or edges are missing; CI job ensures determinism.
- See `docs/reachability/patch-oracles.md` for fixture layout and manifest schema.

### 5.9 JS/PHP framework reachability

- Model framework entrypoints explicitly: Express/Fastify/Nest handlers, Laravel/Symfony routes/commands/hooks. Generate graph roots from route/handler catalogs instead of generic `main` only.
- Represent dynamic import/require/include resolution as graph nodes so ambiguity stays visible (`resolution` edges with confidence).
- Keep multi-layer graphs: source-level (TS/JS/PHP) plus bundled output (Webpack/Vite). Merge with runtime hints when available.
- Status model: `always_reachable`, `conditional`, `not_reachable`, `not_analyzed`, `ambiguous`, each with confidence and evidence tags.
- Deliver language-specific profiles + fixture cases to prove coverage; update CLI/UI explainers to show framework route context.

---

## 6. Acceptance Tests
@@ -109,6 +142,10 @@ Each sprint is two weeks; refer to `docs/implplan/SPRINT_401_reachability_eviden
- [Reachability runtime runbook](../runbooks/reachability-runtime.md) now documents ingestion, CAS staging, air-gap handling, and troubleshooting; link every runtime feature PR to this guide.
- [VEX Evidence Playbook](../benchmarks/vex-evidence-playbook.md) defines the bench repo layout, artifact shapes, verifier tooling, and metrics; keep it updated when Policy/Signer/CLI features land.
- [Reachability lattice](lattice.md) describes the confidence states, evidence/mitigation kinds, scoring policy, event graph schema, and VEX gates; update it when lattices or probes change.
- [PURL-resolved edges spec](purl-resolved-edges.md) defines the purl + symbol-digest annotation rules for graphs and SBOM joins.
- [Patch-oracles QA pattern](patch-oracles.md) describes the fixture layout and expectations for binary reachability guards.
- [Unknowns registry](../signals/unknowns-registry.md) documents how unresolved symbols/edges are recorded and how scoring uses `unknowns_pressure`.
- [Evidence schema](evidence-schema.md) is the canonical field list for richgraph, runtime facts, and Unknowns CAS objects.
- Update module dossiers (Scanner, Signals, Replay, Authority, Policy, UI) once each guild lands work.

---

86
docs/reachability/evidence-schema.md
Normal file
@@ -0,0 +1,86 @@
# Reachability Evidence Schema (Draft v1, Nov 2026)

Purpose: define the canonical fields for reachability graph nodes/edges, runtime facts, and unknowns so Scanner, Signals, Policy, Replay, CLI/UI, and SbomService stay aligned. This replaces scattered notes in advisories.

## 1. Core identifiers

- `symbol_id`: canonical ID for a function/symbol; includes `{format, build_id?, file_hash?, section?, addr, length}` plus optional `code_block_hash`. Always deterministic and lowercase.
- `code_id`: `{format, build_id?, file_hash?, start, length, code_block_hash?}`; used when symbol names are absent.
- `symbol_digest`: sha256 of normalized signature (demangled name + params + return type; strip addresses). For stripped code, combine synthetic name + block hash.
- `purl`: package URL of the owning component (from SBOM resolver); `pkg:unknown` when unresolved.

## 2. Graph payload (`richgraph-v1` additions)

```jsonc
{
  "nodes": [
    {
      "id": "sym:sha256:...",
      "symbol_id": "func:ELF:sha256:...",
      "code_id": "code:ELF:sha256:...",
      "purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64",
      "symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF", "confidence": 0.98 },
      "build_id": "a1b2c3...",
      "lang": "c",
      "evidence": ["dwarf", "dynsym"],
      "analyzer": { "name": "scanner.native", "version": "1.2.0", "toolchain": "ghidra-11" }
    }
  ],
  "edges": [
    {
      "from": "sym:sha256:caller",
      "to": "sym:sha256:callee",
      "kind": "direct|plt|indirect|runtime",
      "purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64", // callee owner
      "symbol_digest": "sha256:...", // callee digest
      "candidates": ["pkg:deb/openssl@3.0.2", "pkg:deb/openssl@3.0.1"],
      "confidence": 0.92,
      "evidence": ["import", "reloc@GOT"]
    }
  ],
  "roots": [
    { "id": "init_array@0x401000", "phase": "load", "source": "DT_INIT_ARRAY" },
    { "id": "main", "phase": "runtime" }
  ],
  "graph_hash": "blake3:..."
}
```

## 3. Runtime facts (Signals ingestion)

Fields per NDJSON event:

- `symbolId` (required), `codeId`, `symbolDigest?`, `purl?`
- `hitCount`, `observedAt`, `loaderBase`, `processId`, `processName`, `containerId`, `socketAddress?`
- `callgraphId` or `scanId`, plus `evidenceUri` (CAS) if trace stored externally
- Determinism: sort keys when persisting; timestamps UTC ISO-8601.

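As an illustration of the determinism rule, the sketch below serializes one runtime fact as an NDJSON line with sorted keys and a UTC ISO-8601 timestamp. `to_ndjson_line` is a hypothetical helper; the field names come from the list above, and the compact separators are an assumption rather than a stated requirement.

```python
import json
from datetime import datetime, timezone


def to_ndjson_line(event: dict) -> str:
    """Serialize one runtime fact as a single deterministic NDJSON line."""
    if not event.get("symbolId"):
        raise ValueError("symbolId is required")
    record = dict(event)  # shallow copy so the caller's dict is untouched
    observed = record.get("observedAt")
    if isinstance(observed, datetime):
        if observed.tzinfo is None:
            raise ValueError("observedAt must be timezone-aware")
        # Normalize to UTC and render as ISO-8601 with a trailing Z.
        record["observedAt"] = observed.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        )
    # sort_keys gives a stable key order; json.dumps emits no raw newlines.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))
```

Rejecting naive datetimes up front avoids silently persisting local time as if it were UTC.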
## 4. Unknowns registry payload

See `docs/signals/unknowns-registry.md`; reachability producers emit Unknowns when:
- symbol→purl unresolved,
- call edge target unresolved,
- build-id missing for ELF and file hash used instead.

Unknowns must include `unknown_type`, `scope`, `provenance`, `confidence.p`, and `labels`.

## 5. CAS layout

- Graphs: `cas://reachability/graphs/{blake3}` (canonical JSON, sorted keys/arrays)
- Runtime traces: `cas://reachability/runtime/{sha256}`
- Unknowns evidence (optional large blobs): `cas://unknowns/{sha256}`

Metadata for each CAS object: `{ schema: "richgraph-v1", analyzer: {name,version}, createdAtUtc, toolchain_digest }`. When analyzer metadata is supplied at ingest (Signals OpenAPI), persist it alongside parsed analyzer fields from the artifact.

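The following sketch shows one way to derive a content address for a runtime trace: canonicalize first, then hash. It is limited to the SHA-256 path because BLAKE3 is not in the Python standard library; graph addressing would follow the same shape with a BLAKE3 implementation. `canonical_json` and `runtime_trace_uri` are hypothetical names, and which arrays get sorted is schema-specific and not modeled here.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    """Encode obj as compact UTF-8 JSON with sorted object keys.

    Assumption: array ordering is handled by the producer before this call.
    """
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def runtime_trace_uri(trace: dict) -> str:
    """Return the CAS URI for a runtime trace, hashing its canonical form."""
    digest = hashlib.sha256(canonical_json(trace)).hexdigest()
    return f"cas://reachability/runtime/{digest}"
```

Hashing the canonical bytes rather than the bytes as received is what keeps the address stable when two producers emit the same trace with different key order.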
## 6. Validation rules

- All edges must carry either `purl` or `candidates[]`; never leave both empty.
- If `build_id` present, `symbol_id` and `code_id` must store it; if absent, record `build_id_source: "FileHash"`.
- Evidence arrays sorted; confidence in [0,1].
- Roots must include load-time constructors when present.

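A minimal sketch of how the edge and node rules above could be checked. `validate_edge` and `validate_node` are hypothetical helpers; placing `build_id_source` on the node is an assumption, since the schema does not say which object carries it, and the root rule is omitted because it needs the binary, not just the graph.

```python
from typing import List


def validate_edge(edge: dict) -> List[str]:
    """Return a list of rule violations for one call edge (empty means valid)."""
    errors = []
    if not edge.get("purl") and not edge.get("candidates"):
        errors.append("edge must carry purl or candidates[]")
    conf = edge.get("confidence")
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    if not is_number or not 0.0 <= conf <= 1.0:
        errors.append("confidence must be a number in [0,1]")
    evidence = list(edge.get("evidence", []))
    if evidence != sorted(evidence):
        errors.append("evidence array must be sorted")
    return errors


def validate_node(node: dict) -> List[str]:
    """Check the build-id rule; assumes build_id_source lives on the node."""
    if not node.get("build_id") and node.get("build_id_source") != "FileHash":
        return ['missing build_id requires build_id_source: "FileHash"']
    return []
```

Returning messages instead of raising lets a fixture runner report every violation in one pass.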
## 7. Acceptance checklist

- Schema reflected in Scanner/Signals DTOs and OpenAPI responses.
- CAS writers enforce canonicalization before hashing.
- Fixtures include: build-id present/absent, init-array roots, purl-resolved imports-only edge, stripped binary with block-hash symbol digest, and an Unknowns case.
@@ -26,6 +26,11 @@ Out of scope: implementing disassemblers or symbol servers; those will be handle
| Replay/DSSE coverage | Replay manifests don’t enforce hash/CAS registration for graphs/traces. | Sprint 400 `REPLAY-REACH-201-005`, Sprint 401 `REPLAY-401-004`, `GAP-REP-004` | Extend manifest v2 with analyzer versions + BLAKE3 digests; add DSSE predicate types. |
| Policy/VEX/UI explainability | Policy uses coarse `reachability:*` tags; UI/CLI cannot show call paths or evidence hashes. | Sprint 401 `POLICY-VEX-401-006`, `UI-CLI-401-007`, `GAP-POL-005`, `GAP-VEX-006`, `EXPERIENCE-GAP-401-012` | Evidence blocks must cite `code_id`, graph hash, runtime CAS URI, analyzer version. |
| Operator documentation & samples | No guide shows how to replay `{build_id,start,len}` across CLI/API. | Sprint 401 `QA-DOCS-401-008`, `GAP-DOC-008` | Produce samples under `samples/reachability/**` plus CLI walkthroughs. |
| Build-id propagation | Build-id not consistently captured or threaded into `SymbolID`/`code_id`; SBOM/runtime joins are brittle. | Sprint 401 `SCANNER-BUILDID-401-035` | Capture `.note.gnu.build-id`, include in code identity, expose in SBOM exports and runtime events. |
| Load-time constructors as roots | Graph roots omit `.preinit_array`/`.init_array`/`_init`, missing load-time edges. | Sprint 401 `SCANNER-INITROOT-401-036` | Add synthetic roots with `phase=load`; include `DT_NEEDED` deps’ constructors. |
| PURL-resolved edges | Call edges do not carry `purl` or `symbol_digest`, slowing SBOM joins. | Sprint 401 `GRAPH-PURL-401-034` | Annotate edges per `docs/reachability/purl-resolved-edges.md`; keep deterministic graph hash. |
| Unknowns handling | Unresolved symbols/edges disappear silently. | Sprint 0400 `SIGNALS-UNKNOWN-201-008` | Emit Unknowns records (see `docs/signals/unknowns-registry.md`) and feed `unknowns_pressure` into scoring. |
| Patch-oracle QA | No guard-rail tests proving binary analyzers see real patch deltas. | Sprint 401 `QA-PORACLE-401-037` | Add paired vuln/fixed fixtures and expectations; wire to CI using `docs/reachability/patch-oracles.md`. |

---

@@ -78,6 +83,8 @@ Out of scope: implementing disassemblers or symbol servers; those will be handle

## 4. Schema & API Touchpoints

Authoritative field list lives in `docs/reachability/evidence-schema.md`; use it for DTOs and CAS writers.

The next implementation pass must cover the following documents/files (create them if missing):

1. `docs/data/evidence-schema.md` – authoritative schema for `{code_id, symbol, tool}` blocks.

69
docs/reachability/patch-oracles.md
Normal file
@@ -0,0 +1,69 @@
# Patch-Oracles QA Pattern (Nov 2026)

Patch oracles are paired vulnerable/fixed binaries that prove our analyzers can see the function and call-edge deltas introduced by real CVE fixes. This file replaces earlier advisory text; use it directly when adding tests.

## 1. Workflow (per CVE)

1) Pick a CVE with a small, clean fix (e.g., OpenSSL, zlib, BusyBox). Identify vulnerable commit `A` and fixed commit `B`.
2) Build two stripped binaries (`vuln`, `fixed`) with identical toolchains/flags; keep a tiny harness that exercises the affected path.
3) Run Scanner binary analyzers to emit `richgraph-v1` for each binary.
4) Diff graphs: expect new/removed functions and edges to match the patch (e.g., `foo_parse -> validate_len` added; `foo_parse -> memcpy` removed).
5) Fail the test if expected functions/edges are absent or unchanged.

## 2. Oracle manifest (YAML)

```yaml
cve: CVE-YYYY-XXXX
target: libfoo 1.2.3
build:
  cc: clang
  cflags: [-O2, -fno-omit-frame-pointer]
  ldflags: []
  strip: true
expect:
  functions_added: [validate_len]
  functions_removed: [unsafe_copy]
  edges_added:
    - { caller: foo_parse, callee: validate_len }
  edges_removed:
    - { caller: foo_parse, callee: memcpy }
tolerances:
  allow_unresolved_symbols: 0
  allow_extra_funcs: 2
```

Place manifests under `tests/reachability/patch-oracles/<cve>/oracle.yml` next to the sources/build scripts.

## 3. Repository layout

```
tests/reachability/patch-oracles/
  CVE-YYYY-XXXX-foo/
    src/        # vuln + fixed sources + harness
    build.sh    # produces ./out/vuln ./out/fixed
    oracle.yml
```

## 4. Harness rules

- Output binaries to `out/vuln` and `out/fixed` with deterministic flags and stripped symbols.
- Record toolchain version in a sidecar `build-meta.json` so Replay captures provenance.
- Never download from the internet during CI; vendor tiny sources into the fixture folder.

## 5. Test runner expectations

- Runs Scanner binary analyzers on both binaries; emits `richgraph-v1` CAS entries.
- Compares graphs against `oracle.yml` expectations (functions/edges added/removed, tolerances).
- Fails when deltas are missing; succeeds when expected guards/edges are present.

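The comparison step can be sketched as below. This is an assumption-laden illustration: graphs are reduced to `functions` (names) and `edges` (`(caller, callee)` pairs), the manifest is passed in already parsed as a dict, `check_oracle` is a hypothetical name, and `allow_extra_funcs` is interpreted as the number of unexpected added functions that may be tolerated. `allow_unresolved_symbols` is not modeled.

```python
from typing import List


def check_oracle(vuln: dict, fixed: dict, oracle: dict) -> List[str]:
    """Compare the vuln->fixed graph delta with oracle expectations.

    Returns human-readable failures; an empty list means the oracle passes.
    """
    expect = oracle.get("expect", {})
    tolerances = oracle.get("tolerances", {})

    v_funcs, f_funcs = set(vuln["functions"]), set(fixed["functions"])
    v_edges = {tuple(e) for e in vuln["edges"]}
    f_edges = {tuple(e) for e in fixed["edges"]}
    added_funcs, removed_funcs = f_funcs - v_funcs, v_funcs - f_funcs
    added_edges, removed_edges = f_edges - v_edges, v_edges - f_edges

    failures = []
    for name in expect.get("functions_added", []):
        if name not in added_funcs:
            failures.append(f"expected function {name} missing")
    for name in expect.get("functions_removed", []):
        if name not in removed_funcs:
            failures.append(f"expected function {name} not removed")
    for e in expect.get("edges_added", []):
        if (e["caller"], e["callee"]) not in added_edges:
            failures.append(f"expected edge {e['caller']}->{e['callee']} missing")
    for e in expect.get("edges_removed", []):
        if (e["caller"], e["callee"]) not in removed_edges:
            failures.append(f"expected edge {e['caller']}->{e['callee']} not removed")

    # Assumed semantics: extra added functions beyond the expected set are
    # tolerated up to allow_extra_funcs.
    extra = added_funcs - set(expect.get("functions_added", []))
    if len(extra) > tolerances.get("allow_extra_funcs", 0):
        failures.append(f"unexpected extra functions: {sorted(extra)}")
    return failures
```

Comparing a binary against itself should produce a failure for every expected delta, which makes a useful negative test for the runner.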
## 6. Integration points

- **Scanner**: add fixture runner under `tests/reachability/StellaOps.Scanner.Binary.PatchOracleTests`.
- **CI**: wire into reachbench/patch-oracles job; ensure artifacts are small and deterministic.
- **Docs**: link this file from reachability delivery guide once tests are live.

## 7. Acceptance criteria

- At least three seed oracles (e.g., zlib overflow, OpenSSL length guard, BusyBox ash fix) committed with passing expectations.
- CI job proves deterministic hashes across reruns.
- Failures emit clear diffs (`expected edge foo->validate_len missing`).
51
docs/reachability/purl-resolved-edges.md
Normal file
@@ -0,0 +1,51 @@
# PURL-Resolved Callgraph Edges (Nov 2026)

This note captures the required behavior for joining binary callgraphs with SBOM components using **purl + symbol digest** annotations. It replaces any pointer to prior advisories; everything needed to ship the feature is here.

## 1. Goal

Annotate every call edge in `richgraph-v1` with:

- `purl` of the component that defines the callee, and
- a stable `symbol_digest` (hash of normalized signature plus optional instruction fingerprint).

This lets graphs from multiple binaries merge naturally and line up with SBOM entries, so reachability answers “is the vulnerable function reachable in my deployment?” without re-identifying components.

## 2. Data model additions

- **Node**: `SymbolNode` gains `purl` and `symbol_digest` fields (sha256 of normalized signature; include demangled name and parameter types; optionally append block hash for stripped code).
- **Edge**: `CallEdge` gains `purl` (callee owner) and `symbol_digest`; keep existing `kind`/`evidence` fields. When callee resolution is ambiguous, include `candidates[]` with ranked purls and set `confidence` accordingly.
- **Provenance**: store analyzer fingerprint (`analyzer`, `version`, `toolchain_digest`) and graph hash in CAS metadata.

## 3. Producer rules

1) **Map callee → file → SBOM component**. Use import tables (ELF DT_NEEDED + reloc, PE IAT, Mach-O stubs) or resolved path. If multiple candidates, emit `candidates[]` and lower confidence.
2) **Compute symbol digest**. Normalize the signature, demangle if possible, lowercase type names, strip addresses, then sha256 the canonical form. For stripped symbols, combine synthetic name and code block hash.
3) **Attach to edges**. For every `call` edge, set `purl` and `symbol_digest`. If callee is external but unresolved, emit `purl:"pkg:unknown"` and also write an Unknowns entry (see signals unknowns registry).
4) **Determinism**. Sort nodes and edges before hashing; keep evidence arrays sorted (`import`, `reloc`, `disasm`, `runtime`). Graph hash uses BLAKE3 over canonical JSON.

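Producer rule 2 could look roughly like the sketch below. The note does not pin down the canonical string, so the `return_type name(params)` layout, the `#block_hash` suffix, and the `symbol_digest` function name are all assumptions made for illustration; only the normalization steps (lowercase type names, strip addresses, sha256) come from the rule itself. Demangling is assumed to have happened upstream.

```python
import hashlib
import re
from typing import Optional, Sequence

_ADDR = re.compile(r"0x[0-9a-fA-F]+")  # literal addresses such as 0x401000
_WS = re.compile(r"\s+")


def _norm_type(type_name: str) -> str:
    """Collapse whitespace and lowercase a type name."""
    return _WS.sub(" ", type_name.strip()).lower()


def symbol_digest(
    name: str,
    params: Sequence[str] = (),
    return_type: str = "",
    block_hash: Optional[str] = None,
) -> str:
    """Hash a normalized signature; append a block hash for stripped code."""
    clean_name = _WS.sub(" ", _ADDR.sub("", name)).strip()
    param_list = ",".join(_norm_type(p) for p in params)
    # Assumed canonical layout: "<return type> <name>(<params>)".
    canonical = f"{_norm_type(return_type)} {clean_name}({param_list})".strip()
    if block_hash:
        canonical += f"#{block_hash.lower()}"
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two spellings of the same signature, for example `SSL *` versus `ssl  *`, should yield the same digest, while adding a block hash for a stripped function should change it.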
## 4. Consumer rules

- **Signals**: merge edges from many binaries by `(purl, symbol_digest)`; keep multiple `site` entries. Store in `call_edges` with `purl` as the join key for SBOM overlays.
- **Policy/VEX**: treat `reachable` if any entrypoint path hits a `symbol_digest` that matches an affected function for the CVE purl.
- **UI/CLI**: display `purl@version` plus demangled name; show site offsets for debugging; show confidence when candidates were present.

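The Policy/VEX rule amounts to a graph search from the roots. The sketch below uses breadth-first search and assumes that root `id` values match the `from` field of outgoing edges, which the schema example does not guarantee; `is_reachable` and the `affected` set of `(purl, symbol_digest)` pairs are hypothetical names.

```python
from collections import deque
from typing import Set, Tuple


def is_reachable(graph: dict, affected: Set[Tuple[str, str]]) -> bool:
    """Return True if any root-to-callee path hits an affected (purl, digest)."""
    adjacency = {}
    for edge in graph["edges"]:
        adjacency.setdefault(edge["from"], []).append(edge)

    queue = deque(root["id"] for root in graph["roots"])
    seen = set(queue)
    while queue:
        node = queue.popleft()
        for edge in adjacency.get(node, []):
            # The edge carries the callee's owner and digest, so the match
            # happens on the edge rather than on the target node.
            if (edge["purl"], edge["symbol_digest"]) in affected:
                return True
            if edge["to"] not in seen:
                seen.add(edge["to"])
                queue.append(edge["to"])
    return False
```

A production check would also weigh `confidence` and `candidates[]`; this sketch treats every edge as certain.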
## 5. SBOM join strategy

1) Use `purl` from component resolver; if absent, fall back to `build_id` plus hash match and emit `purl:"pkg:unknown"`.
2) When multiple SBOM components share a purl, keep all matches but prefer those whose file hash equals the binary under analysis.
3) For runtime traces, attach the same `symbol_digest` so runtime hits boost confidence on the correct edge.

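Step 2 can be expressed as a stable sort that keeps every match but moves hash-equal components to the front. The component dict shape (`purl`, `file_hash`) and the `rank_components` name are assumptions for this sketch.

```python
from typing import List


def rank_components(components: List[dict], purl: str, binary_hash: str) -> List[dict]:
    """Keep all components sharing `purl`, preferring file-hash matches."""
    matches = [c for c in components if c.get("purl") == purl]
    # sorted() is stable: False sorts before True, so hash matches come
    # first and ties keep their original SBOM order.
    return sorted(matches, key=lambda c: c.get("file_hash") != binary_hash)
```

Returning the full ranked list instead of a single winner leaves room to populate `candidates[]` when no hash matches.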
## 6. Acceptance tests

- Imports-only: edge from binary main to `pkg:deb/ubuntu/openssl@3.0.2` `symbol_digest=sha256:...` must appear without running disassembly.
- Disassembly: direct `call` to internal function carries `purl` of the hosting binary’s SBOM entry.
- Ambiguity: when two candidate purls exist, graph stores `candidates[2]` and `confidence < 1`.
- Graph hash stability: reordering analyzer flags does not change BLAKE3 hash.

## 7. Deliverables

- Update `richgraph-v1` schema and DTOs (Scanner + Signals).
- Persist `purl`/`symbol_digest` in Mongo `call_edges` and CAS manifests.
- CLI: extend `stella reachability upload-callgraph` and `stella graph explain` to surface `purl` plus digest.
- Docs: reference this file from Scanner, Signals, and Reachability guides once implemented.