prep docs and service updates
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled

This commit is contained in:
master
2025-11-21 06:56:36 +00:00
parent ca35db9ef4
commit d519782a8f
242 changed files with 17293 additions and 13367 deletions

View File

@@ -45,6 +45,7 @@ This guide translates the deterministic reachability blueprint into concrete wor
| **Authority attestations** | Authority + Signer | DSSE predicates for SBOM, Graph, Replay, VEX; Rekor mirror alignment |
| **Policy & VEX** | Policy Engine + Web + CLI + UI | Accept reachability states, render “Why safe” call paths, CLI/UI explain flows |
| **QA & Docs** | QA + Docs Guilds | `reachbench-2025-expanded` fixtures wired to CI; operator + developer runbooks |
| **Binary quality guardrails (Nov 2026)** | Scanner · Signals · QA | Build-id capture, init-array roots, purl-resolved edges, unknowns emission, and patch-oracle fixtures; see sections 5.75.9 |
---
@@ -90,6 +91,38 @@ Each sprint is two weeks; refer to `docs/implplan/SPRINT_401_reachability_eviden
3. **UI/CLI** Visual explain drawer/CLI command showing signed call-path, predicates, runtime hits; counterfactual toggles.
4. **VEX emitter** generate OpenVEX statements with evidence references, DSSE sign via Signer.
### 5.5 Native binaries (build-id + init roots)
- Capture ELF build-id (`.note.gnu.build-id`) alongside soname/path and propagate into `SymbolID`/`code_id` so SBOM/runtime joins stay stable even when paths change.
- Treat `.preinit_array`, `.init_array`, `.ctors`, and `_init` as synthetic graph roots with `phase=load`; include constructors from `DT_NEEDED` deps. Persist the root list in scan evidence.
- Add deterministic tests covering build-id present/absent and init-array edge creation.
### 5.6 PURL-resolved edges
- Annotate every call edge with callee `purl` and `symbol_digest` per `docs/reachability/purl-resolved-edges.md`.
- Update `richgraph-v1` schema, CAS metadata, and CLI/UI explainers to display `purl@version` + demangled name.
- Signals merges graphs by `(purl, symbol_digest)`; Policy uses the same keys when mapping CVE-affected functions.
### 5.7 Unknowns Registry integration
- Emit structured Unknowns when symbol→purl mapping, edge targets, or hashes are ambiguous; write them via Signals API per `docs/signals/unknowns-registry.md`.
- Scoring adds `unknowns_pressure` so `not_affected` claims cannot bypass unresolved evidence.
- UI/CLI should surface unknown chips and triage actions.
### 5.8 Patch-oracle guardrails
- Add `tests/reachability/patch-oracles/**` with paired vuln/fixed binaries and `oracle.yml` expectations (functions/edges added/removed).
- Scanner binary analyzer tests must fail if expected guard functions or edges are missing; CI job ensures determinism.
- See `docs/reachability/patch-oracles.md` for fixture layout and manifest schema.
### 5.9 JS/PHP framework reachability
- Model framework entrypoints explicitly: Express/Fastify/Nest handlers, Laravel/Symfony routes/commands/hooks. Generate graph roots from route/handler catalogs instead of generic `main` only.
- Represent dynamic import/require/include resolution as graph nodes so ambiguity stays visible (`resolution` edges with confidence).
- Keep multi-layer graphs: source-level (TS/JS/PHP) plus bundled output (Webpack/Vite). Merge with runtime hints when available.
- Status model: `always_reachable`, `conditional`, `not_reachable`, `not_analyzed`, `ambiguous`, each with confidence and evidence tags.
- Deliver language-specific profiles + fixture cases to prove coverage; update CLI/UI explainers to show framework route context.
---
## 6. Acceptance Tests
@@ -109,6 +142,10 @@ Each sprint is two weeks; refer to `docs/implplan/SPRINT_401_reachability_eviden
- [Reachability runtime runbook](../runbooks/reachability-runtime.md) now documents ingestion, CAS staging, air-gap handling, and troubleshooting—link every runtime feature PR to this guide.
- [VEX Evidence Playbook](../benchmarks/vex-evidence-playbook.md) defines the bench repo layout, artifact shapes, verifier tooling, and metrics; keep it updated when Policy/Signer/CLI features land.
- [Reachability lattice](lattice.md) describes the confidence states, evidence/mitigation kinds, scoring policy, event graph schema, and VEX gates; update it when lattices or probes change.
- [PURL-resolved edges spec](purl-resolved-edges.md) defines the purl + symbol-digest annotation rules for graphs and SBOM joins.
- [Patch-oracles QA pattern](patch-oracles.md) describes the fixture layout and expectations for binary reachability guards.
- [Unknowns registry](../signals/unknowns-registry.md) documents how unresolved symbols/edges are recorded and how scoring uses `unknowns_pressure`.
- [Evidence schema](evidence-schema.md) is the canonical field list for richgraph, runtime facts, and Unknowns CAS objects.
- Update module dossiers (Scanner, Signals, Replay, Authority, Policy, UI) once each guild lands work.
---

View File

@@ -0,0 +1,86 @@
# Reachability Evidence Schema (Draft v1, Nov 2026)
Purpose: define the canonical fields for reachability graph nodes/edges, runtime facts, and unknowns so Scanner, Signals, Policy, Replay, CLI/UI, and SbomService stay aligned. This replaces scattered notes in advisories.
## 1. Core identifiers
- `symbol_id`: canonical ID for a function/symbol; includes `{format, build_id?, file_hash?, section?, addr, length}` plus optional `code_block_hash`. Always deterministic and lowercase.
- `code_id`: `{format, build_id?, file_hash?, start, length, code_block_hash?}`; used when symbol names are absent.
- `symbol_digest`: sha256 of normalized signature (demangled name + params + return type; strip addresses). For stripped code, combine synthetic name + block hash.
- `purl`: package URL of the owning component (from SBOM resolver); `pkg:unknown` when unresolved.
## 2. Graph payload (`richgraph-v1` additions)
```jsonc
{
"nodes": [
{
"id": "sym:sha256:...",
"symbol_id": "func:ELF:sha256:...",
"code_id": "code:ELF:sha256:...",
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64",
"symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF", "confidence": 0.98 },
"build_id": "a1b2c3...",
"lang": "c",
"evidence": ["dwarf", "dynsym"],
"analyzer": { "name": "scanner.native", "version": "1.2.0", "toolchain": "ghidra-11" }
}
],
"edges": [
{
"from": "sym:sha256:caller",
"to": "sym:sha256:callee",
"kind": "direct|plt|indirect|runtime",
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64", // callee owner
"symbol_digest": "sha256:...", // callee digest
"candidates": ["pkg:deb/openssl@3.0.2", "pkg:deb/openssl@3.0.1"],
"confidence": 0.92,
"evidence": ["import", "reloc@GOT"]
}
],
"roots": [
{ "id": "init_array@0x401000", "phase": "load", "source": "DT_INIT_ARRAY" },
{ "id": "main", "phase": "runtime" }
],
"graph_hash": "blake3:..."
}
```
## 3. Runtime facts (Signals ingestion)
Fields per NDJSON event:
- `symbolId` (required), `codeId`, `symbolDigest?`, `purl?`
- `hitCount`, `observedAt`, `loaderBase`, `processId`, `processName`, `containerId`, `socketAddress?`
- `callgraphId` or `scanId`, plus `evidenceUri` (CAS) if trace stored externally
- Determinism: sort keys when persisting; timestamps UTC ISO-8601.
## 4. Unknowns registry payload
See `docs/signals/unknowns-registry.md`; reachability producers emit Unknowns when:
- symbol→purl unresolved,
- call edge target unresolved,
- build-id missing for ELF and file hash used instead.
Unknowns must include `unknown_type`, `scope`, `provenance`, `confidence.p`, and `labels`.
## 5. CAS layout
- Graphs: `cas://reachability/graphs/{blake3}` (canonical JSON, sorted keys/arrays)
- Runtime traces: `cas://reachability/runtime/{sha256}`
- Unknowns evidence (optional large blobs): `cas://unknowns/{sha256}`
Metadata for each CAS object: `{ schema: "richgraph-v1", analyzer: {name,version}, createdAtUtc, toolchain_digest }`. When analyzer metadata is supplied at ingest (Signals OpenAPI), persist it alongside parsed analyzer fields from the artifact.
## 6. Validation rules
- All edges must carry either `purl` or `candidates[]`; never leave both empty.
- If `build_id` present, `symbol_id` and `code_id` must store it; if absent, record `build_id_source: "FileHash"`.
- Evidence arrays sorted; confidence in [0,1].
- Roots must include load-time constructors when present.
## 7. Acceptance checklist
- Schema reflected in Scanner/Signals DTOs and OpenAPI responses.
- CAS writers enforce canonicalization before hashing.
- Fixtures include: build-id present/absent, init-array roots, purl-resolved imports-only edge, stripped binary with block-hash symbol digest, and an Unknowns case.

View File

@@ -26,6 +26,11 @@ Out of scope: implementing disassemblers or symbol servers; those will be handle
| Replay/DSSE coverage | Replay manifests dont enforce hash/CAS registration for graphs/traces. | Sprint400 `REPLAY-REACH-201-005`, Sprint401 `REPLAY-401-004`, `GAP-REP-004` | Extend manifest v2 with analyzer versions + BLAKE3 digests; add DSSE predicate types. |
| Policy/VEX/UI explainability | Policy uses coarse `reachability:*` tags; UI/CLI cannot show call paths or evidence hashes. | Sprint401 `POLICY-VEX-401-006`, `UI-CLI-401-007`, `GAP-POL-005`, `GAP-VEX-006`, `EXPERIENCE-GAP-401-012` | Evidence blocks must cite `code_id`, graph hash, runtime CAS URI, analyzer version. |
| Operator documentation & samples | No guide shows how to replay `{build_id,start,len}` across CLI/API. | Sprint401 `QA-DOCS-401-008`, `GAP-DOC-008` | Produce samples under `samples/reachability/**` plus CLI walkthroughs. |
| Build-id propagation | Build-id not consistently captured or threaded into `SymbolID`/`code_id`; SBOM/runtime joins are brittle. | Sprint401 `SCANNER-BUILDID-401-035` | Capture `.note.gnu.build-id`, include in code identity, expose in SBOM exports and runtime events. |
| Load-time constructors as roots | Graph roots omit `.preinit_array`/`.init_array`/`_init`, missing load-time edges. | Sprint401 `SCANNER-INITROOT-401-036` | Add synthetic roots with `phase=load`; include `DT_NEEDED` deps constructors. |
| PURL-resolved edges | Call edges do not carry `purl` or `symbol_digest`, slowing SBOM joins. | Sprint401 `GRAPH-PURL-401-034` | Annotate edges per `docs/reachability/purl-resolved-edges.md`; keep deterministic graph hash. |
| Unknowns handling | Unresolved symbols/edges disappear silently. | Sprint0400 `SIGNALS-UNKNOWN-201-008` | Emit Unknowns records (see `docs/signals/unknowns-registry.md`) and feed `unknowns_pressure` into scoring. |
| Patch-oracle QA | No guard-rail tests proving binary analyzers see real patch deltas. | Sprint401 `QA-PORACLE-401-037` | Add paired vuln/fixed fixtures and expectations; wire to CI using `docs/reachability/patch-oracles.md`. |
---
@@ -78,6 +83,8 @@ Out of scope: implementing disassemblers or symbol servers; those will be handle
## 4. Schema & API Touchpoints
Authoritative field list lives in `docs/reachability/evidence-schema.md`; use it for DTOs and CAS writers.
The next implementation pass must cover the following documents/files (create them if missing):
1. `docs/data/evidence-schema.md` authoritative schema for `{code_id, symbol, tool}` blocks.

View File

@@ -0,0 +1,69 @@
# Patch-Oracles QA Pattern (Nov 2026)
Patch oracles are paired vulnerable/fixed binaries that prove our analyzers can see the function and call-edge deltas introduced by real CVE fixes. This file replaces earlier advisory text; use it directly when adding tests.
## 1. Workflow (per CVE)
1) Pick a CVE with a small, clean fix (e.g., OpenSSL, zlib, BusyBox). Identify vulnerable commit `A` and fixed commit `B`.
2) Build two stripped binaries (`vuln`, `fixed`) with identical toolchains/flags; keep a tiny harness that exercises the affected path.
3) Run Scanner binary analyzers to emit `richgraph-v1` for each binary.
4) Diff graphs: expect new/removed functions and edges to match the patch (e.g., `foo_parse -> validate_len` added; `foo_parse -> memcpy` removed).
5) Fail the test if expected functions/edges are absent or unchanged.
## 2. Oracle manifest (YAML)
```yaml
cve: CVE-YYYY-XXXX
target: libfoo 1.2.3
build:
cc: clang
cflags: [-O2, -fno-omit-frame-pointer]
ldflags: []
strip: true
expect:
functions_added: [validate_len]
functions_removed: [unsafe_copy]
edges_added:
- { caller: foo_parse, callee: validate_len }
edges_removed:
- { caller: foo_parse, callee: memcpy }
tolerances:
allow_unresolved_symbols: 0
allow_extra_funcs: 2
```
Place manifests under `tests/reachability/patch-oracles/<cve>/oracle.yml` next to the sources/build scripts.
## 3. Repository layout
```
tests/reachability/patch-oracles/
CVE-YYYY-XXXX-foo/
src/ # vuln + fixed sources + harness
build.sh # produces ./out/vuln ./out/fixed
oracle.yml
```
## 4. Harness rules
- Output binaries to `out/vuln` and `out/fixed` with deterministic flags and stripped symbols.
- Record toolchain version in a sidecar `build-meta.json` so Replay captures provenance.
- Never download from the internet during CI; vendor tiny sources into the fixture folder.
## 5. Test runner expectations
- Runs Scanner binary analyzers on both binaries; emits `richgraph-v1` CAS entries.
- Compares graphs against `oracle.yml` expectations (functions/edges added/removed, tolerances).
- Fails when deltas are missing; succeeds when expected guards/edges are present.
## 6. Integration points
- **Scanner**: add fixture runner under `tests/reachability/StellaOps.Scanner.Binary.PatchOracleTests`.
- **CI**: wire into reachbench/patch-oracles job; ensure artifacts are small and deterministic.
- **Docs**: link this file from reachability delivery guide once tests are live.
## 7. Acceptance criteria
- At least three seed oracles (e.g., zlib overflow, OpenSSL length guard, BusyBox ash fix) committed with passing expectations.
- CI job proves deterministic hashes across reruns.
- Failures emit clear diffs (`expected edge foo->validate_len missing`).

View File

@@ -0,0 +1,51 @@
# PURL-Resolved Callgraph Edges (Nov 2026)
This note captures the required behavior for joining binary callgraphs with SBOM components using **purl + symbol digest** annotations. It replaces any pointer to prior advisories; everything needed to ship the feature is here.
## 1. Goal
Annotate every call edge in `richgraph-v1` with:
- `purl` of the component that defines the callee, and
- a stable `symbol_digest` (hash of normalized signature plus optional instruction fingerprint).
This lets graphs from multiple binaries merge naturally and line up with SBOM entries, so reachability answers “is the vulnerable function reachable in my deployment?” without re-identifying components.
## 2. Data model additions
- **Node**: `SymbolNode` gains `purl` and `symbol_digest` fields (sha256 of normalized signature; include demangled name and parameter types; optionally append block hash for stripped code).
- **Edge**: `CallEdge` gains `purl` (callee owner) and `symbol_digest`; keep existing `kind`/`evidence` fields. When callee resolution is ambiguous, include `candidates[]` with ranked purls and set `confidence` accordingly.
- **Provenance**: store analyzer fingerprint (`analyzer`, `version`, `toolchain_digest`) and graph hash in CAS metadata.
## 3. Producer rules
1) **Map callee → file → SBOM component**. Use import tables (ELF DT_NEEDED + reloc, PE IAT, Mach-O stubs) or resolved path. If multiple candidates, emit `candidates[]` and lower confidence.
2) **Compute symbol digest**. Normalize the signature, demangle if possible, lowercase type names, strip addresses, then sha256 the canonical form. For stripped symbols, combine synthetic name and code block hash.
3) **Attach to edges**. For every `call` edge, set `purl` and `symbol_digest`. If callee is external but unresolved, emit `purl:"pkg:unknown"` and also write an Unknowns entry (see signals unknowns registry).
4) **Determinism**. Sort nodes and edges before hashing; keep evidence arrays sorted (`import`, `reloc`, `disasm`, `runtime`). Graph hash uses BLAKE3 over canonical JSON.
## 4. Consumer rules
- **Signals**: merge edges from many binaries by `(purl, symbol_digest)`; keep multiple `site` entries. Store in `call_edges` with `purl` as the join key for SBOM overlays.
- **Policy/VEX**: treat `reachable` if any entrypoint path hits a `symbol_digest` that matches an affected function for the CVE purl.
- **UI/CLI**: display `purl@version` plus demangled name; show site offsets for debugging; show confidence when candidates were present.
## 5. SBOM join strategy
1) Use `purl` from component resolver; if absent, fall back to `build_id` plus hash match and emit `purl:"pkg:unknown"`.
2) When multiple SBOM components share a purl, keep all matches but prefer those whose file hash equals the binary under analysis.
3) For runtime traces, attach the same `symbol_digest` so runtime hits boost confidence on the correct edge.
## 6. Acceptance tests
- Imports-only: edge from binary main to `pkg:deb/ubuntu/openssl@3.0.2` `symbol_digest=sha256:...` must appear without running disassembly.
- Disassembly: direct `call` to internal function carries `purl` of the hosting binarys SBOM entry.
- Ambiguity: when two candidate purls exist, graph stores `candidates[2]` and `confidence < 1`.
- Graph hash stability: reordering analyzer flags does not change BLAKE3 hash.
## 7. Deliverables
- Update `richgraph-v1` schema and DTOs (Scanner + Signals).
- Persist `purl`/`symbol_digest` in Mongo `call_edges` and CAS manifests.
- CLI: extend `stella reachability upload-callgraph` and `stella graph explain` to surface `purl` plus digest.
- Docs: reference this file from Scanner, Signals, and Reachability guides once implemented.