docs consolidation and others

docs/modules/reach-graph/guides/DELIVERY_GUIDE.md

# Reachability Evidence Delivery Guide

_Last updated: November 8, 2025. Owner: Reachability Tiger Team (Scanner, Signals, Replay, Policy, Authority, UI)._

This guide translates the deterministic reachability blueprint into concrete work streams that contributors can pick up without re-reading the entire proposal. Use it as the single navigation point when you land a reachability ticket. For a task-centric view of remaining gaps, see `docs/modules/reach-graph/guides/REACHABILITY_GAP_TASKS.md`.

---

## 1. Scope & Principles

**Goal**: ship a verifiable reachability signal for every scan by chaining SBOM → graph → runtime facts → VEX into DSSE-attested, replayable evidence.

**Principles**

1. **Deterministic inputs** – canonical IDs, sorted payloads, normalized timestamps.
2. **Provable facts** – every artifact has a DSSE envelope anchored in Authority + Rekor mirror.
3. **Replay-first** – manifests pin feed snapshots, analyzer digests, and policies so auditors can rerun.
4. **Least surprise** – same API and file layouts across languages; tests run fixture packs at CI time.
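
To illustrate principle 1, the sketch below canonicalizes a JSON payload so that key order, array emission order, and timestamp offsets cannot change its digest. That is the property the hash-lock acceptance test checks. This is a hypothetical Python helper, not the Scanner implementation. It assumes sorted-key compact JSON, UTC timestamps with a `Z` suffix, SHA-256, and a simple `nodes`/`edges` shape with `id`, `from`, and `to` fields. The production graph hash may use a different algorithm and different field rules.

```python
import hashlib
import json
from datetime import datetime, timezone


def normalize_timestamp(value: str) -> str:
    """Convert an ISO-8601 timestamp with an offset to UTC, second precision, 'Z' suffix."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_bytes(payload) -> bytes:
    """Sorted keys, no insignificant whitespace, UTF-8: equal data always yields equal bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def canonical_digest(payload) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(payload)).hexdigest()


def canonicalize_graph(graph: dict) -> dict:
    """Sort node and edge arrays so emission order cannot affect the digest (hypothetical shape)."""
    return {
        **graph,
        "nodes": sorted(graph.get("nodes", []), key=lambda n: n["id"]),
        "edges": sorted(graph.get("edges", []), key=lambda e: (e["from"], e["to"])),
    }
```

Two lifter runs that emit the same graph in a different order then produce the same `canonical_digest(canonicalize_graph(graph))`.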

---

## 2. Evidence Chain Overview

| Stage | Producer | Artifact | Requirements |
|-------|----------|----------|--------------|
| SBOM per layer & composed image | Scanner Worker + Sbomer | `sbom.layer.cdx.json`, `sbom.image.cdx.json` | Deterministic CycloneDX 1.6, DSSE envelope, CAS URI |
| Static reachability graph | Scanner Worker lifters (DotNet, Go, Node/Deno, Rust, Swift, JVM, Binary, Shell) | `richgraph-v1.json` + `sha256` | Canonical SymbolIDs, framework entries, predicates, graph hash |
| Runtime facts | Zastava Observer / runtime probes | `runtime-trace.ndjson` (gzip or JSON) | EntryTrace schema, CAS pointer, process/socket/container metadata, optional compression |
| Replay manifest | Scanner Worker + Replay Core | `replay.yaml` | Contains analyzer versions, feed locks, graph hash, runtime trace digests |
| VEX statements | Scanner WebService + Policy Engine | `reachability.json` + OpenVEX doc | Links SBOM attestation, graph attestation, and runtime evidence IDs |
| Signed bundle | Authority + Signer | DSSE envelope referencing the artifacts above | Supports FIPS and post-quantum (PQ) variants (Dilithium where required) |

---

## 3. Work Streams (modules + hand-offs)

| Stream | Owner Guild(s) | Key deliverables |
|--------|----------------|------------------|
| **Native symbols & callgraphs** | Scanner Worker · Symbols Guild | Ship `Scanner.Symbols.Native` + `Scanner.CallGraph.Native`, integrate Symbol Manifest v1, demangle Itanium/MSVC names, emit `FuncNode`/`CallEdge` CAS bundles (task `SCANNER-NATIVE-401-015`). |
| **Reachability store** | Signals · BE-Base Platform | Provision shared PostgreSQL tables (`func_nodes`, `call_edges`, `cve_func_hits`), indexes, and repositories plus REST hooks for reuse (task `SIG-STORE-401-016`). |
| **Language lifters** | Scanner Worker | CLI/hosted lifters for DotNet, Go, Node/Deno, JVM, Rust, Swift, Binary, Shell with CAS uploads and richgraph output |
| **Signals ingestion & scoring** | Signals | `/callgraphs`, `/runtime-facts` (JSON + NDJSON/gzip), `/graphs/{id}`, `/reachability/recompute` GA; CAS-backed storage, runtime dedupe, BFS+predicates scoring |
| **Runtime capture** | Zastava + Runtime Guild | EntryTrace/eBPF samplers, NDJSON batches (symbol IDs + timestamps + counts) |
| **Replay evidence** | Replay Core + Scanner Worker | Manifest schema v2, `ReachabilityReplayWriter` integration, hash-lock tests |
| **Authority attestations** | Authority + Signer | DSSE predicates for SBOM, Graph, Replay, VEX; Rekor mirror alignment |
| **Policy & VEX** | Policy Engine + Web + CLI + UI | Accept reachability states, render “Why safe” call paths, CLI/UI explain flows |
| **QA & Docs** | QA + Docs Guilds | `reachbench-2025-expanded` fixtures wired to CI; operator + developer runbooks |
| **Binary quality guardrails (Nov 2026)** | Scanner · Signals · QA | Build-id capture, init-array roots, purl-resolved edges, unknowns emission, and patch-oracle fixtures; see sections 5.5–5.8 |

---

## 4. Sprint Targets

| Sprint | Nickname | Focus | Exit Criteria |
|--------|----------|-------|---------------|
| **401** | Evidence Pipeline | Finish static lifters + CAS graph storage + runtime ingestion endpoint | Graph CAS layout documented, lifter fixtures passing, `/runtime-facts` receives NDJSON batches |
| **402** | Replay & Attest | Manifest v2, DSSE envelopes, Authority/Rekor publishing | Replay packs include hashes + analyzer fingerprint; DSSE statements pass integration tests; Rekor mirror updated |
| **403** | Policy & Explain | VEX generation, SPL predicates, UI/CLI explainers | Policy engine uses reachability states, CLI `stella graph explain` returns signed paths, UI shows explain drawer |

Each sprint is two weeks; refer to `docs/implplan/SPRINT_0401_0001_0001_reachability_evidence_chain.md` (new) for per-task tracking.

---

## 5. Task Breakdown Cheat Sheet

### 5.1 Scanner Worker

1. **Lifter SDK** – Define `RichGraphWriter`, canonical SymbolID helpers, analyzer interface updates.
2. **Language passes** – deliverables per language: discovery, graph build, framework wiring, predicate extraction, runtime overlay.
3. **Replay hooks** – plug lifter output + runtime traces into `ReachabilityReplayWriter`; enforce CAS registration before emitting manifest references.
4. **Fixture runs** – add tests under `tests/reachability/StellaOps.ScannerSignals.IntegrationTests` to execute lifter outputs against reachbench A/B cases.

### 5.2 Signals Service

1. **Callgraph CAS layout** – migrate from filesystem to CAS (`cas://reachability/graphs/{hash}`), include metadata doc.
2. **Runtime facts API** – accept NDJSON or gzip, dedupe events, compute hit stats, link to graph nodes.
3. **Scoring engine v2** – support multi-state lattice (`Unknown → Observed`), record predicates, blocked edges, runtime evidence CAS URIs.
4. **API responses** – `/graphs/{scanId}` returns graph CAS refs + manifest pointers; `/reachability/recompute` accepts replay manifest IDs.
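
Scoring combines BFS with predicates that can block edges. The following is a minimal sketch, not the Signals scoring engine. It assumes predicates have already been evaluated into a set of blocked `(caller, callee)` pairs. The function name and the tuple-based edge format are hypothetical. The function returns one shortest root-to-sink path, which is the kind of path an explain view would show.

```python
from collections import deque


def reachable_path(edges, roots, sink, blocked=frozenset()):
    """Return one shortest path from any root to `sink`, or None if predicates cut every path.

    edges:   iterable of (caller, callee) pairs
    blocked: (caller, callee) pairs disabled by evaluated predicates (assumption)
    """
    adjacency = {}
    for src, dst in sorted(edges):  # sorted so the chosen path is deterministic
        if (src, dst) not in blocked:
            adjacency.setdefault(src, []).append(dst)

    parent = {}
    queue = deque()
    for root in sorted(set(roots)):
        parent[root] = None
        queue.append(root)

    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:  # walk parents back to the root
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

Sorting the edges before traversal is a deliberate choice: the same graph always yields the same path, which keeps attested explanations replayable.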

### 5.3 Replay Core & Authority

1. **Manifest schema v2** – YAML + JSON versions, includes feeds/analyzers/policies.
2. **CAS naming** – standardize `cas://reachability/{kind}/{sha256}`.
3. **DSSE predicate types** – `SbomAttestation`, `GraphAttestation`, `VexAttestation`, `ReplayManifest`.
4. **Authority integration** – new endpoints for submitting reachability predicates, rotation tests, Rekor mirror update instructions.
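
The CAS naming convention can be sketched as a pair of hypothetical helpers, one that builds the URI and one that checks it against content. The rule that `{kind}` is a lowercase slug (such as `graphs` or `runtime`) is an assumption. The digest is lowercase hex SHA-256 of the stored bytes.

```python
import hashlib
import re

# Assumption: kinds are lowercase slugs; the digest is 64 lowercase hex characters.
_CAS_URI = re.compile(r"cas://reachability/([a-z][a-z0-9-]*)/([0-9a-f]{64})")


def cas_uri(kind: str, content: bytes) -> str:
    """Build cas://reachability/{kind}/{sha256} for a blob."""
    uri = f"cas://reachability/{kind}/{hashlib.sha256(content).hexdigest()}"
    if _CAS_URI.fullmatch(uri) is None:
        raise ValueError(f"invalid CAS kind: {kind!r}")
    return uri


def verify_cas_uri(uri: str, content: bytes) -> bool:
    """True if the URI is well formed and its digest matches the content."""
    match = _CAS_URI.fullmatch(uri)
    return match is not None and match.group(2) == hashlib.sha256(content).hexdigest()
```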

### 5.4 Policy / Web / UI / CLI

1. **Policy Engine** – ingest reachability facts from Signals, expose them via SPL, produce metrics, integrate into the explanation tree.
2. **Web API** – join reachability fields in vuln responses, add override endpoints, add simulation support.
3. **UI/CLI** – visual explain drawer/CLI command showing the signed call path, predicates, runtime hits; counterfactual toggles.
4. **VEX emitter** – generate OpenVEX statements with evidence references and DSSE-sign them via Signer.

### 5.5 Native binaries (build-id + init roots)

- Capture ELF build-id (`.note.gnu.build-id`) alongside soname/path and propagate into `SymbolID`/`code_id` so SBOM/runtime joins stay stable even when paths change.
- Treat `.preinit_array`, `.init_array`, `.ctors`, and `_init` as synthetic graph roots with `phase=load`; include constructors from `DT_NEEDED` deps. Persist the root list in scan evidence.
- Add deterministic tests covering build-id present/absent and init-array edge creation.
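
The build-id lives in an ELF note. Each note is a 12-byte header (`namesz`, `descsz`, `type`), then the name and the descriptor, each padded to 4 bytes. A GNU build-id note has name `GNU\0` and type `NT_GNU_BUILD_ID` (3). The hypothetical helper below assumes you already hold the raw bytes of the `.note.gnu.build-id` section and know the ELF data encoding. Finding the section through the section header table is omitted. The output uses the `gnu-build-id:{hex}` form from the function-level evidence guide.

```python
import struct

NT_GNU_BUILD_ID = 3


def _align4(n: int) -> int:
    return (n + 3) & ~3


def build_id_from_note(section: bytes, little_endian: bool = True):
    """Scan raw note bytes; return 'gnu-build-id:{hex}' or None when absent."""
    header = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(section):
        namesz, descsz, note_type = struct.unpack_from(header, section, offset)
        offset += 12
        name = section[offset:offset + namesz]
        offset += _align4(namesz)
        desc = section[offset:offset + descsz]
        offset += _align4(descsz)
        # len(desc) check rejects truncated sections instead of returning a short ID
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00" and len(desc) == descsz:
            return "gnu-build-id:" + desc.hex()
    return None
```

Returning `None` instead of raising makes it easy to cover the build-id present/absent cases listed above.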

### 5.6 PURL-resolved edges

- Annotate every call edge with callee `purl` and `symbol_digest` per `docs/modules/reach-graph/guides/purl-resolved-edges.md`.
- Update `richgraph-v1` schema, CAS metadata, and CLI/UI explainers to display `purl@version` + demangled name.
- Signals merges graphs by `(purl, symbol_digest)`; Policy uses the same keys when mapping CVE-affected functions.
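
Merging on `(purl, symbol_digest)` lets two graphs agree on a callee even when their local symbol IDs differ. The following is a hypothetical sketch, not the Signals merge code. It assumes `symbol_digest` is SHA-256 over the UTF-8 `symbol_id` string. It keys edges by caller plus the callee key. Duplicates keep the first `to` value, the highest confidence, and the union of evidence tags. Those tie-break rules are also assumptions.

```python
import hashlib


def symbol_digest(symbol_id: str) -> str:
    # Assumption: the digest covers the UTF-8 bytes of the symbol_id string.
    return "sha256:" + hashlib.sha256(symbol_id.encode("utf-8")).hexdigest()


def merge_edges(*edge_lists):
    """Merge edges from several graphs, keyed by (caller, callee purl, callee symbol_digest)."""
    merged = {}
    for edges in edge_lists:
        for edge in edges:
            key = (edge["from"], edge["purl"], edge["symbol_digest"])
            evidence = set(edge.get("evidence", []))
            if key in merged:
                current = merged[key]
                current["confidence"] = max(current.get("confidence", 0.0),
                                            edge.get("confidence", 0.0))
                current["evidence"] = sorted(set(current["evidence"]) | evidence)
            else:
                merged[key] = {**edge, "evidence": sorted(evidence)}  # copy; inputs stay untouched
    # Sorted keys make the output order independent of input order.
    return [merged[key] for key in sorted(merged)]
```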

### 5.7 Unknowns Registry integration

- Emit structured Unknowns when symbol->purl mapping, edge targets, or hashes are ambiguous; write them via Signals API per `docs/modules/signals/guides/unknowns-registry.md`.
- Scoring adds `unknowns_pressure` so `not_affected` claims cannot bypass unresolved evidence.
- UI/CLI should surface unknown chips and triage actions.

### 5.8 Patch-oracle guardrails

- Add `tests/reachability/patch-oracles/**` with paired vuln/fixed binaries and `oracle.yml` expectations (functions/edges added/removed).
- Scanner binary analyzer tests must fail if expected guard functions or edges are missing; CI job ensures determinism.
- See `docs/modules/reach-graph/guides/patch-oracles.md` for fixture layout and manifest schema.

### 5.9 JS/PHP framework reachability

- Model framework entrypoints explicitly: Express/Fastify/Nest handlers, Laravel/Symfony routes/commands/hooks. Generate graph roots from route/handler catalogs instead of generic `main` only.
- Represent dynamic import/require/include resolution as graph nodes so ambiguity stays visible (`resolution` edges with confidence).
- Keep multi-layer graphs: source-level (TS/JS/PHP) plus bundled output (Webpack/Vite). Merge with runtime hints when available.
- Status model: `always_reachable`, `conditional`, `not_reachable`, `not_analyzed`, `ambiguous`, each with confidence and evidence tags.
- Deliver language-specific profiles + fixture cases to prove coverage; update CLI/UI explainers to show framework route context.

### 5.10 Vulnerability Surfaces (Sprint 3700)

Vulnerability surfaces identify **which specific methods changed** in a security fix, enabling precise reachability analysis:

- **Surface computation**: Download vulnerable and fixed package versions, fingerprint all methods, diff to find changed methods (sinks).
- **Trigger extraction**: Build internal call graphs, reverse BFS from sinks to public APIs (triggers).
- **Per-ecosystem support**:
  - NuGet: Cecil IL fingerprinting
  - npm: Babel AST fingerprinting
  - Maven: ASM bytecode fingerprinting
  - PyPI: Python AST fingerprinting
- **Integration**: `ISurfaceQueryService` queries triggers during scan; use triggers as sinks instead of all package methods.
- **Storage**: `scanner.vuln_surfaces`, `scanner.vuln_surface_sinks`, `scanner.vuln_surface_triggers` tables.
- **Docs**: `docs/contracts/vuln-surface-v1.md` for schema details.
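
The surface and trigger steps can be sketched as follows. This is illustrative Python under stated assumptions:

- Method fingerprints are already computed. The Cecil, Babel, ASM, and Python AST fingerprinting is omitted.
- Methods that exist only in the fixed version are not sinks.
- Every function name is hypothetical.

```python
from collections import deque


def changed_methods(vulnerable: dict, fixed: dict) -> set:
    """Sinks: methods whose fingerprint changed or that were removed in the fix.

    Both maps are method key -> fingerprint (e.g. a hash of normalized IL/AST/bytecode).
    """
    return {method for method, fp in vulnerable.items() if fixed.get(method) != fp}


def find_triggers(call_edges, sinks, public_methods) -> set:
    """Reverse BFS from sinks along callee->caller links; return public methods that reach a sink."""
    callers = {}
    for caller, callee in call_edges:
        callers.setdefault(callee, set()).add(caller)
    seen = set(sinks)
    queue = deque(sorted(sinks))
    while queue:
        node = queue.popleft()
        for caller in sorted(callers.get(node, ())):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen & set(public_methods)
```

A scan then needs to test reachability only of the returned triggers, not of every public method in the package.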

### 5.11 Confidence Tiers

Reachability findings are classified into confidence tiers:

| Tier | Condition | Display | Implications |
|------|-----------|---------|--------------|
| **Confirmed** | Surface exists AND trigger method is reachable | Red badge | Highest confidence: vulnerable code is definitely called |
| **Likely** | No surface but package API is called | Orange badge | Medium confidence: package is used but the specific vulnerable path is unknown |
| **Present** | No call graph, dependency in SBOM | Gray badge | Lowest confidence: cannot determine reachability |
| **Unreachable** | Surface exists AND no trigger reachable | Green badge | High confidence: vulnerability is not exploitable |

- Tier assignment logic lives in `SurfaceAwareReachabilityAnalyzer`
- API responses include `confidenceTier` and `confidenceDisplay`
- UI badges reflect tier colors
- VEX statements reference tier in justification
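
The table reads naturally as a decision function. The hypothetical sketch below is not `SurfaceAwareReachabilityAnalyzer`; it follows the four rows above. The table does not cover one case: a call graph exists, there is no surface, and the package API is not called. The sketch returns `PRESENT` there, which is an assumption.

```python
from enum import Enum


class Tier(Enum):
    CONFIRMED = "Confirmed"
    LIKELY = "Likely"
    PRESENT = "Present"
    UNREACHABLE = "Unreachable"


def assign_tier(has_call_graph: bool, has_surface: bool,
                trigger_reachable: bool, package_api_called: bool) -> Tier:
    """Map analysis facts to a confidence tier per the table above."""
    if not has_call_graph:
        return Tier.PRESENT  # only the SBOM says the dependency is there
    if has_surface:
        # A surface allows the precise question: is any trigger method reachable?
        return Tier.CONFIRMED if trigger_reachable else Tier.UNREACHABLE
    if package_api_called:
        return Tier.LIKELY
    # Not covered by the table (graph exists, no surface, API unused): assumed PRESENT.
    return Tier.PRESENT
```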

### 5.12 Reachability Drift (Sprint 3600)

Track function-level reachability changes between scans:

- **New reachable**: Sinks that became reachable (alert)
- **Mitigated**: Sinks that became unreachable (positive)
- **Causal attribution**: Why the change occurred (guard removed, new route, code change)
- **Components**: `DriftDetectionEngine`, `PathCompressor`, `DriftCauseExplainer`
- **API**: `POST /api/drift/analyze`, `GET /api/drift/{id}`
- **UI**: `PathViewerComponent`, `RiskDriftCardComponent`
- **Attestation**: DSSE-signed drift predicates for the evidence chain
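
At its core, drift detection is a diff of two per-sink reachability maps. The hypothetical sketch below computes only the "new reachable" and "mitigated" buckets; causal attribution is out of scope. It treats a sink missing from one scan as not reachable in that scan, which is an assumption.

```python
def reachability_drift(previous: dict, current: dict) -> dict:
    """Diff two scans' sink -> reachable maps into sorted 'new_reachable' and 'mitigated' lists."""
    new_reachable = sorted(s for s, hit in current.items() if hit and not previous.get(s, False))
    mitigated = sorted(s for s, hit in previous.items() if hit and not current.get(s, False))
    return {"new_reachable": new_reachable, "mitigated": mitigated}
```

For example, if sink `b` is unreachable in the previous scan and reachable in the current one, it lands in `new_reachable` and would raise an alert.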

---

## 6. Acceptance Tests

1. **Hash-lock** – reorder analyzer flags and confirm the graph hash is unchanged.
2. **Replay** – delete caches, replay the manifest, verify DSSE + hash equality.
3. **Tamper** – alter a single edge and expect VEX verification to fail with a specific path mismatch.
4. **Golden corpus** – run all reachbench cases; ensure NotReachable vs Reachable twins align with expectations JSON.
5. **Runtime sanity** – feed staged runtime traces and ensure the confidence bump + `observed=true` path chips propagate to UI.

---

## 7. Documentation & Runbooks

- Place developer-facing updates here (`docs/modules/reach-graph/guides`).
- [Function-level evidence guide](function-level-evidence.md) captures the Nov 2025 advisory scope, task references, and schema expectations; keep it in lockstep with sprint status.
- [Reachability runtime runbook](../runbooks/reachability-runtime.md) documents ingestion, CAS staging, air-gap handling, and troubleshooting; link every runtime feature PR to the runbook.
- [VEX Evidence Playbook](../benchmarks/vex-evidence-playbook.md) defines the bench repo layout, artifact shapes, verifier tooling, and metrics; keep it updated when Policy/Signer/CLI features land.
- [Reachability lattice](lattice.md) describes the confidence states, evidence/mitigation kinds, scoring policy, event graph schema, and VEX gates; update it when lattices or probes change.
- [PURL-resolved edges spec](purl-resolved-edges.md) defines the purl + symbol-digest annotation rules for graphs and SBOM joins.
- [Patch-oracles QA pattern](patch-oracles.md) describes the fixture layout and expectations for binary reachability guards.
- [Unknowns registry](../../signals/guides/unknowns-registry.md) documents how unresolved symbols/edges are recorded and how scoring uses `unknowns_pressure`.
- [Evidence schema](evidence-schema.md) is the canonical field list for richgraph, runtime facts, and Unknowns CAS objects.
- Update module dossiers (Scanner, Signals, Replay, Authority, Policy, UI) once each guild lands work.

---

## 8. Contact & Rituals

- **Daily reachability stand-up** in `#reachability-build`.
- **Fixture sync** every Friday: QA leads run the reachbench matrix, post the report to Confluence, and link it in `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`.
- **Decision log** – Append ADRs under `docs/adr/reachability-*` for schema changes.

Keep this guide updated whenever scope shifts or a new sprint is added.

docs/modules/reach-graph/guides/callgraph-formats.md

# Reachability Callgraph Formats (richgraph-v1)

## Purpose

Normalize static callgraphs across languages so Signals can merge them with runtime traces and replay bundles deterministically.

## Core fields (per node/edge)

- `nodes[].id` — canonical SymbolID (language-specific, stable, lowercase where applicable).
- `nodes[].kind` — e.g., method/function/class/file.
- `edges[].sourceId` / `edges[].targetId` — SymbolIDs; edge types include `call`, `import`, `inherit`, `reference`.
- `artifact` — CAS paths for source graph files; include `sha256`, `uri`, optional `generator` (analyzer name/version).

## Language-specific notes

- **JVM**: use JVM internal names; include signature for overloads.
- **.NET/Roslyn**: fully-qualified method token; include assembly and module for cross-assembly edges.
- **Go SSA**: package path + function; include receiver for methods.
- **Node/Deno TS**: module path + exported symbol; ES module graph only.
- **Rust MIR**: `crate::module::symbol`; monomorphized forms allowed if stable.
- **Swift SIL**: mangled name; the demangled form is kept in metadata only.
- **Shell/binaries**: `SymbolID = sym:binary:{sha256(file)\0section\0addr\0name\0linkage}` via `SymbolId.ForBinaryAddressed`, include `code_id = CodeId.ForBinarySegment(...)` and set `kind=binary`.

## CAS layout

- Store graph bundles under `reachability_graphs/<hh>/<sha>.tar.zst`.
- Bundle SHOULD contain `meta.json` with analyzer, version, language, component, and entry points (array).
- File order inside tar must be lexicographic to keep hashes stable.
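
The layout rules can be illustrated with two hypothetical helpers. The first derives the bundle path, assuming `<hh>` is the first two hex characters of the digest. The second packs files into a tar with lexicographic member order and fixed metadata (mtime 0, uid/gid 0, empty owner names), so the bytes and therefore the hash are reproducible. zstd compression is omitted. Whether the digest covers the tar or the `.tar.zst` is not stated here and stays open.

```python
import io
import tarfile


def bundle_path(sha256_hex: str) -> str:
    # Assumption: <hh> is the first two hex characters of the bundle digest.
    return f"reachability_graphs/{sha256_hex[:2]}/{sha256_hex}.tar.zst"


def deterministic_tar(files: dict) -> bytes:
    """Pack name -> bytes into an uncompressed tar with sorted names and fixed metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in sorted(files):  # lexicographic member order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0          # no wall-clock timestamps
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```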

## Validation rules

- No duplicate node IDs; edges must reference existing nodes.
- Entry points list must be present (even if empty) for Signals recompute.
- Graph SHA256 must match tar content; Signals rejects mismatched SHA.
- Use ASCII where possible; UTF-8 paths are allowed but must be normalized to NFC.
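
A validator for these rules might look like the hypothetical sketch below. The node and edge field names come from the core-field list above. The `entryPoints` key for the entry-point list and the `paths` key for file paths are assumed names. The SHA-256 versus tar-content check is omitted.

```python
import unicodedata


def validate_graph(graph: dict) -> list:
    """Return a list of rule violations; an empty list means the graph passes."""
    errors = []
    seen = set()
    for node in graph.get("nodes", []):
        if node["id"] in seen:
            errors.append(f"duplicate node id: {node['id']}")
        seen.add(node["id"])
    for edge in graph.get("edges", []):  # nodes are collected first, so order does not matter
        for end in ("sourceId", "targetId"):
            if edge[end] not in seen:
                errors.append(f"edge references missing node: {edge[end]}")
    if not isinstance(graph.get("entryPoints"), list):
        errors.append("entryPoints must be present (it may be empty)")
    for path in graph.get("paths", []):
        if unicodedata.normalize("NFC", path) != path:
            errors.append(f"path is not NFC-normalized: {path!r}")
    return errors
```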

## V1 Schema Reference

The `stella.callgraph.v1` schema provides enhanced fields for explainability:

- **Edge Reasons**: 13 reason codes explaining why edges exist
- **Symbol Visibility**: Public/Internal/Protected/Private access levels
- **Typed Entrypoints**: Framework-aware entrypoint detection

See [Callgraph Schema Reference](../../signals/guides/callgraph-formats.md) for complete v1 schema documentation.

## References

- **V1 Schema Reference**: `docs/modules/signals/guides/callgraph-formats.md`
- Union schema: `docs/modules/reach-graph/schemas/runtime-static-union-schema.md`
- Delivery guide: `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`

docs/modules/reach-graph/guides/corpus-plan.md

# Reachability Corpus Plan (QA-CORPUS-401-031)

## Objective

- Maintain deterministic, offline reachability fixtures that validate callgraph ingestion, reachability truth-path handling, and VEX proof workflows.
- Keep the corpus small but multi-runtime (Go/.NET/Python/Rust), and keep a public-friendly mini dataset (PHP/JavaScript/C#) for docs/demos without external repos.

## Corpus Map

### 1) Multi-runtime corpus (internal MVP)

Path: `tests/reachability/corpus/`

Per-case layout: `tests/reachability/corpus/<language>/<case>/`

- `callgraph.static.json` — static call graph sample (stub for MVP).
- `ground-truth.json` — expected reachability outcome and example path(s) (Reachbench truth schema v1; `schema_version=reachbench.reachgraph.truth/v1`).
- `vex.openvex.json` — expected VEX slice for the case.
- Optional (future): `runtime/*.ndjson`, `sbom.*.json`

`tests/reachability/corpus/manifest.json` records deterministic SHA-256 hashes for required files in each case directory.
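
The following is a minimal sketch of deterministic manifest regeneration. It is not `update_corpus_manifest.py`. The output shape (each `<language>/<case>` key mapped to per-file SHA-256 digests) is an assumption, and so is the `REQUIRED` tuple, which is taken from the per-case layout above. Sorting directories and JSON keys keeps the output byte-identical across runs and platforms. A missing required file raises `FileNotFoundError`.

```python
import hashlib
import json
from pathlib import Path

# Assumption: the required files are the non-optional entries of the per-case layout.
REQUIRED = ("callgraph.static.json", "ground-truth.json", "vex.openvex.json")


def build_manifest(corpus_root: Path) -> str:
    """Map '<language>/<case>' -> {file: sha256 hex} and serialize deterministically."""
    cases = {}
    for case_dir in sorted(p for p in corpus_root.glob("*/*") if p.is_dir()):
        key = case_dir.relative_to(corpus_root).as_posix()  # forward slashes on every OS
        cases[key] = {
            name: hashlib.sha256((case_dir / name).read_bytes()).hexdigest()
            for name in REQUIRED
        }
    return json.dumps(cases, indent=2, sort_keys=True) + "\n"
```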

### 2) Public mini dataset (PHP/JS/C#)

Path: `tests/reachability/samples-public/`

Layout:

- `schema/ground-truth.schema.json` — JSON schema for `ground-truth.json` (Reachbench truth schema v1).
- `manifest.json` — deterministic SHA-256 hashes for required files in each sample directory.
- `samples/<lang>/<case-id>/` — per-sample artifacts: `callgraph.static.json`, `ground-truth.json`, `sbom.cdx.json`, `vex.openvex.json`, `repro.sh`.
- `runners/run_all.{sh,ps1}` — deterministic manifest regeneration.

### 3) Reachbench fixture pack (expanded, dual variants)

Path: `tests/reachability/fixtures/reachbench-2025-expanded/`

Each case has two variants (reachable/unreachable) with per-variant `manifest.json` and `reachgraph.truth.json`. Fixture integrity is validated by `tests/reachability/StellaOps.Reachability.FixtureTests`.

## Ground Truth Conventions

- Corpus and public samples use the same truth schema (`reachbench.reachgraph.truth/v1`) but differ in file naming (`ground-truth.json` vs reachbench pack `reachgraph.truth.json`).
- Legacy corpus `expect.yaml` has been retired; prior `state/score` values are preserved under `legacy_expect` in `ground-truth.json`.
- Legacy `conditional` states are represented as `variant=unreachable` plus `legacy_expect.state=conditional` until the truth schema grows a dedicated conditional/contested variant.

## Determinism & Runners

Regenerate all reachability manifests (corpus + public samples + reachbench pack):

- `tests/reachability/runners/run_all.sh`
- `tests/reachability/runners/run_all.ps1`

Individual scripts:

- `python tests/reachability/scripts/update_corpus_manifest.py`
- `python tests/reachability/samples-public/scripts/update_manifest.py`
- `python tests/reachability/fixtures/reachbench-2025-expanded/harness/update_variant_manifests.py`

## CI Gates

- `tests/reachability/StellaOps.Reachability.FixtureTests`
  - validates presence + hashes from manifests for corpus/public samples/reachbench fixtures
  - enforces minimum language-bucket coverage (Go/.NET/Python/Rust + PHP/JS/C#)

## MVP Slice (stub cases)

- Go: `go-ssh-CVE-2020-9283-keyexchange`
- .NET: `dotnet-kestrel-CVE-2023-44487-http2-rapid-reset`
- Python: `python-django-CVE-2019-19844-sqli-like`
- Rust: `rust-axum-header-parsing-TBD`

## Next Work (post-MVP)

- Wire a CI job to run `tests/reachability/StellaOps.Reachability.FixtureTests`.
- Replace stubs with real callgraphs/traces and expand the corpus once CI is stable.

docs/modules/reach-graph/guides/cve-symbol-mapping.md

# CVE-to-Symbol Mapping

_Last updated: 2025-12-22. Owner: Scanner Guild + Concelier Guild._

This document describes how StellaOps maps CVE identifiers to specific binary symbols/functions for reachability slices.

---

## 1. Overview

To determine whether a vulnerability is reachable, StellaOps resolves:

- **CVE identifiers** (e.g., `CVE-2024-1234`)
- **Package coordinates** (e.g., `pkg:npm/lodash@4.17.21`)
- **Affected symbols** (e.g., `lodash.template`, `openssl:EVP_PKEY_decrypt`)

`SliceExtractor` uses the mapping to target the right symbols, and the mapping also feeds downstream VEX decisions.

---

## 2. Data Sources

### 2.1 Patch Diff Surfaces (Preferred)

Highest-fidelity source: compute method-level diffs between vulnerable and fixed versions.

**Implementation**: `StellaOps.Scanner.VulnSurfaces`

### 2.2 Advisory Linksets (Concelier)

Scanner queries Concelier's LNM linksets for package coordinates and optional symbol hints.

**Implementation**: `StellaOps.Scanner.Advisory` -> Concelier `/v1/lnm/linksets/{cveId}` or `/v1/lnm/linksets/search`

### 2.3 Offline Bundles

For air-gapped environments, precomputed bundles map CVEs to packages and symbols.

**Implementation**: `FileAdvisoryBundleStore`

---

## 3. Service Contracts

### 3.1 CVE -> Package/Symbol Mapping

```csharp
using System.Collections.Immutable;
using System.Threading;
using System.Threading.Tasks;

public interface IAdvisoryClient
{
    Task<AdvisorySymbolMapping?> GetCveSymbolsAsync(string cveId, CancellationToken ct = default);
}

public sealed record AdvisorySymbolMapping
{
    public required string CveId { get; init; }
    public ImmutableArray<AdvisoryPackageSymbols> Packages { get; init; }
    public required string Source { get; init; } // "concelier" | "bundle"
}

public sealed record AdvisoryPackageSymbols
{
    public required string Purl { get; init; }
    public ImmutableArray<string> Symbols { get; init; }
}
```

### 3.2 CVE + PURL -> Affected Symbols

```csharp
using System.Collections.Immutable;
using System.Threading;
using System.Threading.Tasks;

public interface IVulnSurfaceService
{
    Task<VulnSurfaceResult> GetAffectedSymbolsAsync(
        string cveId,
        string purl,
        CancellationToken ct = default);
}

public sealed record VulnSurfaceResult
{
    public required string CveId { get; init; }
    public required string Purl { get; init; }
    public required ImmutableArray<AffectedSymbol> Symbols { get; init; }
    public required string Source { get; init; } // "surface" | "package-symbols" | "heuristic"
    public required double Confidence { get; init; }
}

public sealed record AffectedSymbol
{
    public required string SymbolId { get; init; }
    public string? MethodKey { get; init; }
    public string? DisplayName { get; init; }
    public string? ChangeType { get; init; }
    public double Confidence { get; init; }
}
```

---

## 4. Caching Strategy

| Data | TTL | Notes |
|------|-----|-------|
| Advisory linksets | 1 hour | In-memory cache; configurable TTL |
| Offline bundles | Process lifetime | Loaded once from file |
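
The linkset row describes a TTL cache. The hypothetical Python sketch below uses the table's one-hour default and an injectable clock so that expiry can be tested without sleeping. The production cache presumably lives in the C# Scanner code, and its eviction details are not covered here. Entries are stored as `(expiry, value)` tuples, so a cached `None` result also counts as a hit.

```python
import time


class TtlCache:
    """Minimal in-memory cache; each entry expires ttl_seconds after it was loaded."""

    def __init__(self, ttl_seconds: float = 3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader(key)  # e.g. a Concelier linkset lookup
        self._entries[key] = (now + self._ttl, value)
        return value
```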

---

## 5. Offline Bundle Format

```json
{
  "items": [
    {
      "cveId": "CVE-2024-1234",
      "source": "bundle",
      "packages": [
        {
          "purl": "pkg:npm/lodash@4.17.21",
          "symbols": ["template", "templateSettings"]
        }
      ]
    }
  ]
}
```

---

## 6. Fallback Behavior

When no surface or advisory mapping is available, the service returns an empty symbol list with low confidence and `Source = "heuristic"`. Callers may inject an `IPackageSymbolProvider` to supply public-symbol fallbacks.
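
The offline lookup and its fallback can be sketched together. This is a hypothetical Python illustration, not `FileAdvisoryBundleStore`. It indexes the bundle format from section 5 by `cveId`, returns the symbols for an exact purl match, and otherwise returns the heuristic result. The value `0.1` is an assumption, because the text says only "low confidence".

```python
import json

HEURISTIC_CONFIDENCE = 0.1  # assumption: the source only says "low confidence"


def load_bundle(text: str) -> dict:
    """Index an offline bundle by CVE ID."""
    return {item["cveId"]: item for item in json.loads(text)["items"]}


def lookup(index: dict, cve_id: str, purl: str) -> dict:
    item = index.get(cve_id)
    for pkg in (item or {}).get("packages", []):
        if pkg["purl"] == purl:
            return {"source": item.get("source", "bundle"),
                    "symbols": sorted(pkg.get("symbols", []))}
    # Fallback: no mapping -> empty symbol list, low confidence, source "heuristic".
    return {"source": "heuristic", "symbols": [], "confidence": HEURISTIC_CONFIDENCE}
```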

---

## 7. Related Documentation

- [Slice Schema](./slice-schema.md)
- [Patch Oracles](./patch-oracles.md)
- [Concelier Architecture](../../concelier/architecture.md)

---

_Created: 2025-12-22. See Sprint 3810 for implementation details._

docs/modules/reach-graph/guides/function-level-evidence.md

# Function-Level Evidence Guide

_Last updated: 2025-12-13. Owner: Docs Guild._

This guide documents the cross-module function-level evidence chain that enables provable reachability claims. It covers the schema, identifiers, API usage, CLI commands, and integration patterns for Scanner, Signals, Policy, and Replay.

---

## 1. Overview

StellaOps implements a **function-level evidence chain** that anchors every vulnerability finding to immutable identifiers (`code_id`, `symbol_id`, `graph_hash`), enabling:

- **Provable reachability:** Deterministic call-path evidence from entry points to vulnerable functions.
- **Stripped binary support:** `code_id` + `code_block_hash` provides identity when symbols are absent.
- **Evidence replay:** Sealed artifacts with DSSE attestation allow offline verification.
- **Cross-module linking:** Scanner -> Signals -> Policy -> VEX -> UI/CLI evidence chain.

### 1.1 Core Identifiers

| Identifier | Format | Purpose | Example |
|------------|--------|---------|---------|
| `symbol_id` | `sym:{lang}:{base64url}` | Canonical function identity | `sym:java:R3JlZXRpbmc...` |
| `code_id` | `code:{lang}:{base64url}` | Identity for name-less code blocks | `code:binary:YWJjZGVm...` |
| `graph_hash` | `blake3:{hex}` | Content-addressable graph identity | `blake3:a1b2c3d4e5f6...` |
| `symbol_digest` | `sha256:{hex}` | Hash of `symbol_id` for edge linking | `sha256:e5f6a7b8c9d0...` |
| `build_id` | `gnu-build-id:{hex}` | ELF/PE debug identifier | `gnu-build-id:5f0c7c3c...` |

### 1.2 Evidence Chain Flow

```
Scanner -> richgraph-v1 -> Signals -> Scoring -> Policy -> VEX -> UI/CLI
|          |               |          |          |         |      |
|          |               |          |          |         |      +-- stella graph explain
|          |               |          |          |         +-- OpenVEX with call-path proofs
|          |               |          |          +-- Policy gates + reachability.state
|          |               |          +-- Lattice state + confidence + riskScore
|          |               +-- Runtime facts + static paths
|          +-- BLAKE3 graph_hash + DSSE attestation
+-- code_id, symbol_id, build_id per node
```

---

## 2. Schema Reference

### 2.1 SymbolID Construction

Per-language canonical tuple format (NUL-separated, then SHA-256 -> base64url):

| Language | Tuple Components | Example |
|----------|------------------|---------|
| Java | `{package}\0{class}\0{method}\0{descriptor}` | `com.example\0Foo\0bar\0(Ljava/lang/String;)V` |
| .NET | `{assembly}\0{namespace}\0{type}\0{member_signature}` | `MyApp\0Controllers\0UserController\0GetById(int)` |
| Go | `{module}\0{package}\0{receiver}\0{func}` | `github.com/user/repo\0handler\0*Server\0Handle` |
| Node | `{pkg_or_path}\0{export_path}\0{kind}` | `lodash\0get\0function` |
| Binary | `{file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?}` | `sha256:abc...\0.text\00x401000\0ssl3_read\0global\0` |
| Python | `{pkg_or_path}\0{module}\0{qualified_name}` | `requests\0api\0get` |
| Ruby | `{gem_or_path}\0{module}\0{method}` | `rails\0ActionController::Base\0render` |
| PHP | `{composer_pkg}\0{namespace}\0{qualified_name}` | `symfony/http-kernel\0Kernel\0handle` |

### 2.2 CodeID Construction

For stripped binaries or name-less code blocks:

```
code:{lang}:{base64url_sha256(format + file_hash + addr + length + section + code_block_hash)}
```

Example for stripped ELF:

```
code:binary:YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo
```
|
||||
|
||||
### 2.3 Graph Node Schema

Each node in a richgraph-v1 document includes:

```json
{
  "id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
  "symbol_id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
  "code_id": "code:java:...",
  "lang": "java",
  "kind": "method",
  "display": "com.example.GreetingService.greet(String)",
  "purl": "pkg:maven/com.example/greeting-service@1.0.0",
  "build_id": "gnu-build-id:5f0c7c3c...",
  "symbol_digest": "sha256:e5f6a7b8...",
  "code_block_hash": "sha256:deadbeef...",
  "symbol": {
    "mangled": null,
    "demangled": "com.example.GreetingService.greet(String)",
    "source": "DWARF",
    "confidence": 0.98
  },
  "evidence": ["import", "bytecode"],
  "attributes": {}
}
```

### 2.4 Graph Edge Schema

Edges carry callee `purl` and `symbol_digest` for SBOM correlation:

```json
{
  "from": "sym:java:caller...",
  "to": "sym:java:callee...",
  "kind": "call",
  "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
  "symbol_digest": "sha256:f1e2d3c4...",
  "confidence": 0.92,
  "evidence": ["bytecode", "import"],
  "candidates": []
}
```
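
A short sketch of the SBOM correlation these edge fields enable: group edges by callee `purl`, keeping only packages that appear in the SBOM. The SBOM is reduced to a plain list of purl strings for illustration (a real CycloneDX document would be parsed first), and `correlateWithSbom` is a hypothetical helper, not a Signals API.

```typescript
// Minimal edge shape taken from the richgraph-v1 example above.
interface RichGraphEdge {
  from: string;
  to: string;
  kind: string;
  purl?: string;
  symbol_digest?: string;
  confidence: number;
  evidence: string[];
  candidates: unknown[];
}

// Group edges by callee purl, keeping only purls that the SBOM lists.
function correlateWithSbom(
  edges: RichGraphEdge[],
  sbomPurls: string[],
): Record<string, RichGraphEdge[]> {
  // Prototype-free object so arbitrary purl strings are safe keys.
  const byPurl: Record<string, RichGraphEdge[]> = Object.create(null);
  for (const edge of edges) {
    // Edges without a purl cannot be matched to an SBOM component.
    if (!edge.purl || sbomPurls.indexOf(edge.purl) === -1) continue;
    (byPurl[edge.purl] = byPurl[edge.purl] || []).push(edge);
  }
  return byPurl;
}
```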
### 2.5 Evidence Block Schema

Evidence blocks in Policy/VEX responses cite all relevant identifiers:

```json
{
  "evidence": {
    "graph_hash": "blake3:a1b2c3d4e5f6...",
    "graph_cas_uri": "cas://reachability/graphs/a1b2c3d4e5f6...",
    "dsse_uri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
    "path": [
      {"symbol_id": "sym:java:...", "display": "main()"},
      {"symbol_id": "sym:java:...", "display": "processRequest()"},
      {"symbol_id": "sym:java:...", "display": "log4j.error()"}
    ],
    "path_length": 3,
    "confidence": 0.85,
    "runtime_hits": ["probe:jfr:1234"],
    "analyzer": {
      "name": "scanner.java",
      "version": "1.2.0",
      "toolchain_digest": "sha256:..."
    }
  }
}
```

---

## 3. API Usage

### 3.1 Signals Callgraph Ingestion

Submit a callgraph and receive a deterministic `graph_hash`:

```http
POST /signals/callgraphs
Authorization: Bearer <token>
Content-Type: application/json

{
  "schema": "richgraph-v1",
  "analyzer": {"name": "scanner.java", "version": "1.2.0"},
  "nodes": [...],
  "edges": [...],
  "roots": [...]
}
```

**Response:**

```json
{
  "graphHash": "blake3:a1b2c3d4e5f6...",
  "casUri": "cas://reachability/graphs/a1b2c3d4e5f6...",
  "dsseUri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
  "nodeCount": 1247,
  "edgeCount": 3891
}
```

### 3.2 Signals Runtime Facts

Submit runtime observations with `code_id` anchors. The NDJSON body is shown uncompressed for readability; with `Content-Encoding: gzip` it is sent gzip-compressed:

```http
POST /signals/runtime-facts/ndjson?scanId=scan-123&imageDigest=sha256:abc123
Authorization: Bearer <token>
Content-Type: application/x-ndjson
Content-Encoding: gzip

{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":47,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:00Z"}
{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":12,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:01Z"}
```

**Response:**

```json
{
  "accepted": 128,
  "duplicates": 2,
  "evidenceUri": "cas://reachability/runtime/sha256:xyz789..."
}
```
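
The gzip-compressed NDJSON body above can be produced as in this sketch, which uses Node's built-in `zlib`. The `(observedAt, symbolId)` sort order is an assumption added to keep the NDJSON content stable between runs; this guide does not say the endpoint requires it. `encodeRuntimeFacts` and `decodeRuntimeFacts` are hypothetical helper names.

```typescript
import { gzipSync, gunzipSync } from "zlib";

// Fields mirror the NDJSON records in the request example.
interface RuntimeFact {
  symbolId: string;
  codeId: string;
  hitCount: number;
  loaderBase: string;
  processId: number;
  observedAt: string; // ISO-8601 UTC timestamp
}

// Locale-independent string comparison (plain code-unit order).
function cmp(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// One JSON object per line, each newline-terminated, then gzip-compressed.
function encodeRuntimeFacts(facts: RuntimeFact[]): Buffer {
  const sorted = facts
    .slice()
    .sort((a, b) => cmp(a.observedAt, b.observedAt) || cmp(a.symbolId, b.symbolId));
  const ndjson = sorted.map((f) => JSON.stringify(f) + "\n").join("");
  return gzipSync(Buffer.from(ndjson, "utf8"));
}

// Inverse operation, handy for round-trip checks.
function decodeRuntimeFacts(body: Buffer): RuntimeFact[] {
  return gunzipSync(body)
    .toString("utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as RuntimeFact);
}
```

The returned buffer is the request body; send it with `Content-Type: application/x-ndjson` and `Content-Encoding: gzip`, as in the request above.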
### 3.3 Fetch Reachability Facts

Query reachability state for a subject:

```http
GET /signals/facts/{subjectKey}
Authorization: Bearer <token>
```

**Response:**

```json
{
  "subjectKey": "scan:123:pkg:maven/log4j:2.14.1:CVE-2021-44228",
  "metadata": {
    "fact": {
      "digest": "sha256:abc123...",
      "version": 3
    }
  },
  "states": [
    {
      "symbol": "sym:java:...",
      "latticeState": "CR",
      "bucket": "runtime",
      "confidence": 0.92,
      "score": 0.78,
      "path": ["sym:java:main...", "sym:java:process...", "sym:java:log4j..."],
      "evidence": {
        "static": {"graphHash": "blake3:...", "pathLength": 3, "confidence": 0.85},
        "runtime": {"probeId": "probe:jfr:1234", "hitCount": 47, "observedAt": "2025-12-13T10:00:00Z"}
      }
    }
  ],
  "score": 0.78,
  "aggregateTier": "T2",
  "riskScore": 0.65
}
```
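
Subject keys such as `scan:123:pkg:maven/log4j:2.14.1:CVE-2021-44228` contain `/` and `:`. When substituting one into `GET /signals/facts/{subjectKey}`, a client has to keep it a single path segment; this sketch assumes percent-encoding via `encodeURIComponent`. Check the Signals API contract for the exact form the service expects. `factsUrl` and the base URL are placeholders.

```typescript
// Build the facts URL for a subject key, encoding it as one path segment.
// encodeURIComponent escapes "/" as %2F and ":" as %3A.
function factsUrl(baseUrl: string, subjectKey: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash on the base URL
  return `${base}/signals/facts/${encodeURIComponent(subjectKey)}`;
}
```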
### 3.4 Policy Findings with Reachability Evidence

```http
GET /api/policy/findings/{policyId}/{findingId}/explain?mode=verbose
Authorization: Bearer <token>
```

**Response (excerpt):**

```json
{
  "findingId": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
  "reachability": {
    "state": "CR",
    "confidence": 0.92,
    "evidence": {
      "graph_hash": "blake3:a1b2c3d4...",
      "path": [
        {"symbol_id": "sym:java:...", "display": "main()"},
        {"symbol_id": "sym:java:...", "display": "Logger.error()"}
      ],
      "runtime_hits": 47,
      "fact_digest": "sha256:abc123..."
    }
  },
  "steps": [
    {"rule": "reachability_gate", "state": "CR", "allowed": true},
    {"rule": "severity_baseline", "severity": {"normalized": "Critical", "score": 10.0}}
  ]
}
```

---

## 4. CLI Usage

### 4.1 Graph Explain Command

View the call path and evidence for a finding:

```bash
stella graph explain --finding "pkg:maven/log4j@2.14.1:CVE-2021-44228" --scan-id scan-123

# Output:
Finding: CVE-2021-44228 in pkg:maven/log4j@2.14.1
Reachability: CONFIRMED_REACHABLE (CR)
Confidence: 0.92
Graph Hash: blake3:a1b2c3d4e5f6...

Call Path (3 hops):
  1. main() [sym:java:R3JlZXRpbmcuLi4=]
     -> processRequest() [direct call]
  2. processRequest() [sym:java:cHJvY2Vzcy4uLg==]
     -> Logger.error() [virtual call]
  3. Logger.error() [sym:java:bG9nNGouLi4=]
     [VULNERABLE: CVE-2021-44228]

Runtime Evidence:
  - JFR probe hit: 47 times
  - Last observed: 2025-12-13T10:00:00Z

DSSE Attestation: cas://reachability/graphs/a1b2c3d4....dsse
```

### 4.2 Graph Export Command

Export a reachability graph for offline analysis:

```bash
stella graph export --scan-id scan-123 --output ./evidence-bundle/

# Creates:
#   ./evidence-bundle/richgraph-v1.json        # Canonical graph
#   ./evidence-bundle/richgraph-v1.json.dsse   # DSSE envelope
#   ./evidence-bundle/meta.json                # Metadata
#   ./evidence-bundle/runtime-facts.ndjson     # Runtime observations
```

### 4.3 Graph Verify Command

Verify a graph's DSSE signature and Rekor inclusion:

```bash
stella graph verify --graph ./evidence-bundle/richgraph-v1.json \
  --dsse ./evidence-bundle/richgraph-v1.json.dsse \
  --rekor-log

# Output:
Graph Hash: blake3:a1b2c3d4e5f6...
DSSE Signature: VALID (key: scanner-signing-2025)
Rekor Entry: 12345678 (verified)
Timestamp: 2025-12-13T09:30:00Z
```

---

## 5. OpenVEX Integration

### 5.1 OpenVEX with Reachability Evidence

When Policy emits VEX decisions, reachability evidence is included:

```json
{
  "@context": "https://openvex.dev/ns/v0.2.0",
  "@id": "https://stellaops.example/vex/2025-12-13/001",
  "author": "StellaOps Policy Engine",
  "timestamp": "2025-12-13T10:00:00Z",
  "version": 1,
  "statements": [
    {
      "vulnerability": {"@id": "CVE-2021-44228"},
      "products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
      "status": "affected",
      "justification": "vulnerable_code_in_container",
      "impact_statement": "Vulnerable Log4j method reachable from main entry point.",
      "action_statement": "Upgrade to log4j 2.17.1 or later.",
      "stellaops:reachability": {
        "state": "CR",
        "confidence": 0.92,
        "graph_hash": "blake3:a1b2c3d4e5f6...",
        "path_length": 3,
        "evidence_uri": "cas://reachability/graphs/a1b2c3d4..."
      }
    }
  ]
}
```

### 5.2 VEX "not_affected" with Unreachability Evidence

When code is provably unreachable:

```json
{
  "statements": [
    {
      "vulnerability": {"@id": "CVE-2023-XXXXX"},
      "products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
      "status": "not_affected",
      "justification": "vulnerable_code_not_in_execute_path",
      "impact_statement": "Vulnerable function not reachable from any entry point.",
      "stellaops:reachability": {
        "state": "CU",
        "confidence": 0.88,
        "graph_hash": "blake3:d4e5f6a7b8c9...",
        "evidence_uri": "cas://reachability/graphs/d4e5f6a7b8c9...",
        "runtime_observation_window": "72h",
        "runtime_hits": 0
      }
    }
  ]
}
```
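
A hedged sketch of the mapping these two examples illustrate: `CR` yields an `affected` statement and `CU` yields `not_affected` with the `vulnerable_code_not_in_execute_path` justification. The lattice states are the ones Policy gates check in the Signals -> Policy flow (section 7.2); sending every other state to `under_investigation` is an assumption made for this sketch, not documented Policy Engine behavior, and `toVexStatement` is a hypothetical name.

```typescript
type LatticeState = "U" | "SR" | "SU" | "RO" | "RU" | "CR" | "CU" | "X";

interface ReachabilityEvidence {
  state: LatticeState;
  confidence: number;
  graph_hash: string;
  evidence_uri: string;
}

interface VexStatement {
  vulnerability: { "@id": string };
  products: { "@id": string }[];
  status: "affected" | "not_affected" | "under_investigation";
  justification?: string;
  "stellaops:reachability": ReachabilityEvidence;
}

function toVexStatement(cve: string, product: string, ev: ReachabilityEvidence): VexStatement {
  const base = {
    vulnerability: { "@id": cve },
    products: [{ "@id": product }],
    "stellaops:reachability": ev,
  };
  switch (ev.state) {
    case "CR": // confirmed reachable, as in the 5.1 example
      return { ...base, status: "affected" };
    case "CU": // confirmed unreachable, as in the 5.2 example
      return { ...base, status: "not_affected", justification: "vulnerable_code_not_in_execute_path" };
    default: // ASSUMPTION: every other state is left open
      return { ...base, status: "under_investigation" };
  }
}
```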
---

## 6. Replay Manifest v2

### 6.1 Manifest Structure

Replay manifest v2 enforces BLAKE3 hashing and CAS registration:

```json
{
  "schema": "stellaops.replay.manifest@v2",
  "subject": "scan:123",
  "generatedAt": "2025-12-13T10:00:00Z",
  "hashAlg": "blake3",
  "artifacts": [
    {
      "kind": "richgraph",
      "uri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6...",
      "hash": "blake3:a1b2c3d4e5f6...",
      "dsseUri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6....dsse"
    },
    {
      "kind": "runtime-facts",
      "uri": "cas://reachability/runtime/sha256:xyz789...",
      "hash": "sha256:xyz789..."
    },
    {
      "kind": "sbom",
      "uri": "cas://scanner-artifacts/sbom.cdx.json",
      "hash": "sha256:def456..."
    }
  ],
  "analyzer": {
    "name": "scanner.java",
    "version": "1.2.0",
    "toolchain_digest": "sha256:..."
  },
  "code_id_coverage": {
    "total_symbols": 1247,
    "with_code_id": 1189,
    "coverage_pct": 95.3
  }
}
```
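
`code_id_coverage` is simple arithmetic over the graph nodes: 1189 / 1247 ≈ 0.9535, reported as 95.3. This sketch recomputes it from a node list; rounding to one decimal place matches the example, but the manifest writer's exact rounding rule is an assumption, and `codeIdCoverage` is a hypothetical helper.

```typescript
interface NodeLike {
  symbol_id: string;
  code_id?: string | null;
}

// Count nodes carrying a non-empty code_id and express it as a percentage
// rounded to one decimal place (0 when the graph has no nodes).
function codeIdCoverage(nodes: NodeLike[]) {
  const total = nodes.length;
  const withCodeId = nodes.filter(
    (n) => typeof n.code_id === "string" && n.code_id.length > 0,
  ).length;
  const pct = total === 0 ? 0 : Math.round((withCodeId / total) * 1000) / 10;
  return { total_symbols: total, with_code_id: withCodeId, coverage_pct: pct };
}
```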
### 6.2 Determinism Verification

Replay a manifest to verify determinism:

```bash
stella replay verify --manifest ./manifest.json --sealed

# Output:
Manifest: stellaops.replay.manifest@v2
Subject: scan:123
Artifacts: 3

Verifying richgraph...
  Computed: blake3:a1b2c3d4e5f6...
  Expected: blake3:a1b2c3d4e5f6...
  Status: MATCH

Verifying runtime-facts...
  Computed: sha256:xyz789...
  Expected: sha256:xyz789...
  Status: MATCH

Verifying sbom...
  Computed: sha256:def456...
  Expected: sha256:def456...
  Status: MATCH

All artifacts verified. Determinism check PASSED.
```

---

## 7. Module Integration Guide

### 7.1 Scanner -> Signals

Scanner emits richgraph-v1 with `code_id` and `symbol_id`:

1. Scanner analyzes container/artifact
2. Callgraph generators emit nodes with `symbol_id`, `code_id`, `build_id`
3. RichGraphWriter canonicalizes (sorted arrays/keys) and computes `graph_hash` (BLAKE3)
4. DSSE signer wraps canonical JSON
5. CAS store persists body + envelope
6. Signals ingestion API receives URI reference

### 7.2 Signals -> Policy

Signals provides reachability facts to Policy:

1. Policy queries `/signals/facts/{subjectKey}`
2. Response includes `metadata.fact.digest`, `states[]`, `score`
3. Policy gates check `latticeState` (U, SR, SU, RO, RU, CR, CU, X)
4. Evidence blocks in findings reference `graph_hash`, `path[]`, `runtime_hits[]`

### 7.3 Policy -> VEX/UI

Policy emits OpenVEX with evidence:

1. VexDecisionEmitter serializes OpenVEX with `stellaops:reachability` extension
2. UI explain drawer fetches evidence via `/api/policy/findings/{policyId}/{findingId}/explain`
3. CLI `stella graph explain` renders call path and attestation refs

---

## 8. CAS Layout Reference

```
cas://reachability/
  graphs/
    {blake3}/                       # Graph body (canonical JSON)
    {blake3}.dsse                   # DSSE envelope
  edges/
    {graph_hash}/{bundle_id}        # Edge bundle body (optional)
    {graph_hash}/{bundle_id}.dsse
  runtime/
    {sha256}/                       # Runtime facts NDJSON
```
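
Step 3 of the Scanner -> Signals flow (section 7.1) canonicalizes the graph before hashing, and the CAS paths above are keyed by that hash, so serialization must be byte-stable. The sketch below shows one way to do it: sort object keys, emit no insignificant whitespace, and leave array order to the caller (nodes, edges, and roots must already be sorted by a deterministic key). It is an illustrative subset, not the RichGraphWriter implementation; RFC 8785 (JSON Canonicalization Scheme) also pins number formatting and other edge cases this sketch ignores. Node's built-in `crypto` offers no BLAKE3, so SHA-256 stands in for the hash step, and `canonicalize`/`contentHash` are hypothetical names.

```typescript
import { createHash } from "crypto";

// Serialize with object keys sorted by UTF-16 code units and no extra whitespace.
// Array order is preserved as given.
function canonicalize(value: unknown): string {
  if (value === undefined) return "null"; // matches JSON.stringify inside arrays
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj)
    .filter((k) => obj[k] !== undefined) // JSON.stringify drops undefined members
    .sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`).join(",")}}`;
}

// Hash the canonical bytes. SHA-256 is a stand-in for the BLAKE3 graph_hash.
function contentHash(value: unknown): string {
  return "sha256:" + createHash("sha256").update(canonicalize(value), "utf8").digest("hex");
}
```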
---

## 9. Related Documentation

- [Reachability Lattice Model](./lattice.md) - State definitions and join rules
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Schema specification
- [Evidence Schema](./evidence-schema.md) - Detailed field definitions
- [Signals API Contract](../api/signals/reachability-contract.md) - API reference
- [Policy Gates](./policy-gate.md) - Gate configuration
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
- [Ground Truth Schema](./ground-truth-schema.md) - Test fixture format

---

_Last updated: 2025-12-13. See Sprint 0401 GAP-DOC-008 for change history._
206
docs/modules/reach-graph/guides/gates.md
Normal file
@@ -0,0 +1,206 @@
# Gate Detection for Reachability Scoring

> **Sprint:** SPRINT_3405_0001_0001
> **Module:** Scanner Reachability / Signals

## Overview

Gate detection identifies protective controls in code paths that reduce the likelihood of vulnerability exploitation. When a vulnerable function is protected by authentication, feature flags, admin-only checks, or configuration gates, the reachability score is multiplied by the gate's multiplier (for example, an `AuthRequired` gate keeps 30% of the original score).

## Gate Types

| Gate Type | Score Multiplier | Description |
|-----------|------------------|-------------|
| `AuthRequired` | 30% | Code path requires authentication |
| `FeatureFlag` | 20% | Code path behind a feature flag |
| `AdminOnly` | 15% | Code path requires admin/elevated role |
| `NonDefaultConfig` | 50% | Code path requires non-default configuration |

### Multiplier Stacking

Multiple gate types stack multiplicatively. A minimum floor of **5%** (`MinimumMultiplierBps: 500`) keeps the combined multiplier, and therefore the score, from collapsing toward zero:

```
Auth (30%) × Feature Flag (20%) = 6%
Auth (30%) × Admin (15%)        = 4.5% → raised to the 5% floor
All four gates                  = 0.45% → raised to the 5% floor
```
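
The stacking rule can be sketched in TypeScript as below, a minimal illustration that takes its defaults from the `appsettings.json` example later in this guide. Two details are assumptions not stated in the text: each gate *type* counts once even when detected several times, and intermediate products are truncated to whole basis points. `combinedMultiplierBps` and `adjustedScore` are hypothetical names.

```typescript
type GateType = "AuthRequired" | "FeatureFlag" | "AdminOnly" | "NonDefaultConfig";

// Default multipliers in basis points (10000 = 100%).
const DEFAULT_MULTIPLIER_BPS: Record<GateType, number> = {
  AuthRequired: 3000,
  FeatureFlag: 2000,
  AdminOnly: 1500,
  NonDefaultConfig: 5000,
};
const MINIMUM_MULTIPLIER_BPS = 500;

// Multiply the per-type multipliers together, then clamp to the 5% floor.
function combinedMultiplierBps(gateTypes: GateType[]): number {
  // ASSUMPTION: duplicate detections of the same type do not stack.
  const unique = gateTypes.filter((t, i) => gateTypes.indexOf(t) === i);
  let bps = 10000; // no gates: score unchanged
  for (const type of unique) {
    bps = Math.floor((bps * DEFAULT_MULTIPLIER_BPS[type]) / 10000);
  }
  return Math.max(bps, MINIMUM_MULTIPLIER_BPS);
}

// Scale a raw reachability score by a multiplier expressed in basis points.
function adjustedScore(score: number, multiplierBps: number): number {
  return (score * multiplierBps) / 10000;
}
```

With these defaults, `["AuthRequired", "FeatureFlag"]` yields 600 bps, while `["AuthRequired", "AdminOnly"]` (450 before clamping) and all four types (45 before clamping) both land on the 500 bps floor. For the `ReachabilityReport` example, `adjustedScore(7.5, 3000)` gives `2.25`.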
## Detection Methods

### AuthGateDetector

Detects authentication requirements:

**C# Patterns:**
- `[Authorize]` attribute
- `User.Identity.IsAuthenticated` checks
- `HttpContext.User` access
- JWT/Bearer token validation

**Java Patterns:**
- `@PreAuthorize`, `@Secured` annotations
- `SecurityContextHolder.getContext()`
- Spring Security filter chains

**Go Patterns:**
- Middleware patterns (`authMiddleware`, `RequireAuth`)
- Context-based auth checks

**JavaScript/TypeScript Patterns:**
- Express.js `passport` middleware
- JWT verification middleware
- Session checks

### FeatureFlagDetector

Detects feature flag guards:

**Patterns:**
- LaunchDarkly: `ldClient.variation()`, `ld.boolVariation()`
- Split.io: `splitClient.getTreatment()`
- Unleash: `unleash.isEnabled()`
- Custom: `featureFlags.isEnabled()`, `isFeatureEnabled()`

### AdminOnlyDetector

Detects admin/role requirements:

**Patterns:**
- `[Authorize(Roles = "Admin")]`
- `User.IsInRole("Admin")`
- `@RolesAllowed("ADMIN")`
- RBAC middleware checks

### ConfigGateDetector

Detects configuration-based gates:

**Patterns:**
- Environment variable checks (`process.env.ENABLE_FEATURE`)
- Configuration file conditionals
- Runtime feature toggles
- Debug-only code paths

## Output Contract

### DetectedGate

**Note:** In **Signals API outputs**, `type` is serialized as the C# enum name (e.g., `"AuthRequired"`). In **richgraph-v1** JSON, `type` is lowerCamelCase and gate fields are snake_case (see example below).

```typescript
interface DetectedGate {
  type: 'AuthRequired' | 'FeatureFlag' | 'AdminOnly' | 'NonDefaultConfig';
  detail: string;           // Human-readable description
  guardSymbol: string;      // Symbol where gate was detected
  sourceFile?: string;      // Source file location
  lineNumber?: number;      // Line number
  confidence: number;       // 0.0-1.0 confidence score
  detectionMethod: string;  // Detection algorithm used
}
```

### GateDetectionResult

```typescript
interface GateDetectionResult {
  gates: DetectedGate[];
  hasGates: boolean;
  primaryGate?: DetectedGate;     // Highest confidence gate
  combinedMultiplierBps: number;  // Basis points (10000 = 100%)
}
```

## Integration

### RichGraph Edge Annotation

Gates are annotated on `RichGraphEdge` objects:

```csharp
public sealed record RichGraphEdge
{
    // ... existing properties ...

    /// <summary>Gates detected on this edge</summary>
    public IReadOnlyList<DetectedGate> Gates { get; init; } = [];

    /// <summary>Combined gate multiplier in basis points</summary>
    public int GateMultiplierBps { get; init; } = 10000;
}
```

**richgraph-v1 JSON example (edge fragment):**

```json
{
  "gate_multiplier_bps": 3000,
  "gates": [
    {
      "type": "authRequired",
      "detail": "[Authorize] attribute on controller",
      "guard_symbol": "MyController.VulnerableAction",
      "source_file": "src/MyController.cs",
      "line_number": 42,
      "detection_method": "csharp.attribute",
      "confidence": 0.95
    }
  ]
}
```
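
The note on `DetectedGate` describes two serializations of the same gate. A minimal sketch of the conversion from the Signals API shape to the richgraph-v1 shape (lowerCamelCase `type`, snake_case fields), plus the `primaryGate` selection from `GateDetectionResult`. Omitting absent optional fields rather than writing `null`, and letting the first gate win on a confidence tie, are assumptions; `toRichGraphGate` and `primaryGate` are hypothetical helper names.

```typescript
// Signals API shape (from the DetectedGate interface above).
interface DetectedGate {
  type: "AuthRequired" | "FeatureFlag" | "AdminOnly" | "NonDefaultConfig";
  detail: string;
  guardSymbol: string;
  sourceFile?: string;
  lineNumber?: number;
  confidence: number;
  detectionMethod: string;
}

// richgraph-v1 shape, as in the edge fragment above.
interface RichGraphGate {
  type: string;
  detail: string;
  guard_symbol: string;
  source_file?: string;
  line_number?: number;
  detection_method: string;
  confidence: number;
}

function toRichGraphGate(g: DetectedGate): RichGraphGate {
  const out: RichGraphGate = {
    type: g.type.charAt(0).toLowerCase() + g.type.slice(1), // "AuthRequired" -> "authRequired"
    detail: g.detail,
    guard_symbol: g.guardSymbol,
    detection_method: g.detectionMethod,
    confidence: g.confidence,
  };
  // Optional location fields are copied only when present.
  if (g.sourceFile !== undefined) out.source_file = g.sourceFile;
  if (g.lineNumber !== undefined) out.line_number = g.lineNumber;
  return out;
}

// Highest-confidence gate; the earlier gate wins a tie.
function primaryGate(gates: DetectedGate[]): DetectedGate | undefined {
  let best: DetectedGate | undefined;
  for (const g of gates) {
    if (best === undefined || g.confidence > best.confidence) best = g;
  }
  return best;
}
```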
### ReachabilityReport

Gates are included in the reachability report:

```json
{
  "vulnId": "CVE-2024-0001",
  "reachable": true,
  "score": 7.5,
  "adjustedScore": 2.25,
  "gates": [
    {
      "type": "AuthRequired",
      "detail": "[Authorize] attribute on controller",
      "guardSymbol": "MyController.VulnerableAction",
      "confidence": 0.95
    }
  ],
  "gateMultiplierBps": 3000
}
```

## Configuration

### appsettings.json

```json
{
  "Reachability": {
    "GateMultipliers": {
      "AuthRequiredMultiplierBps": 3000,
      "FeatureFlagMultiplierBps": 2000,
      "AdminOnlyMultiplierBps": 1500,
      "NonDefaultConfigMultiplierBps": 5000,
      "MinimumMultiplierBps": 500
    }
  }
}
```

## Metrics

| Metric | Description |
|--------|-------------|
| `scanner.gates_detected_total` | Total gates detected by type |
| `scanner.gate_reduction_applied` | Histogram of multiplier reductions |
| `scanner.gated_vulns_total` | Vulnerabilities with gates detected |

## Related Documentation

- [Reachability Architecture](../modules/scanner/architecture.md)
- [Determinism Technical Reference](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md) - Sections 2.2, 4.3
- [Signals Service](../modules/signals/architecture.md)
508
docs/modules/reach-graph/guides/hybrid-attestation.md
Normal file
@@ -0,0 +1,508 @@
# Hybrid Reachability Attestation (Graph + Edge-Bundle)

> Decision date: 2025-12-11 · Owners: Scanner Guild, Attestor Guild, Signals Guild, Policy Guild

## 0. Context: Four Capabilities

This document supports **Signed Reachability**, one of four capabilities no competitor offers together:

1. **Signed Reachability** – Every reachability graph is sealed with DSSE; optional edge-bundle attestations cover runtime/init/contested paths. Both static call-graph edges and runtime-derived edges can be attested, which is what makes the reachability truly hybrid.
2. **Deterministic Replay** – Scans run bit-for-bit identical from frozen feeds and analyzer manifests.
3. **Explainable Policy (Lattice VEX)** – Evidence-linked VEX decisions with explicit "Unknown" state handling.
4. **Sovereign + Offline Operation** – FIPS/eIDAS/GOST/SM/PQC profiles and offline mirrors as first-class toggles.

All evidence is sealed in **Decision Capsules** for audit-grade reproducibility.

---

## 1. Purpose
- Guarantee replayable, signed reachability evidence with **graph-level DSSE** for every scan while enabling **selective edge-level DSSE bundles** when finer provenance or dispute handling is required.
- Keep CI/offline bundles lean (graph-first), but allow auditors/regulators to quarantine or prove individual edges without regenerating whole graphs.
- Support **hybrid reachability** by attesting both static call-graph edges and runtime-derived edges.

## 2. Attestation levels
- **Level 0 (Graph DSSE) — Required**
  - Payload: canonical `richgraph-v1` (nodes, edges, roots, graph_hash, analyzer metadata, policy_hash).
  - Signature: one DSSE envelope per graph; always submit its digest to Rekor (or a mirror).
  - CAS: `cas://reachability/graphs/{blake3}` (body) + `cas://reachability/graphs/{blake3}.dsse` (envelope).
- **Level 1 (Edge-Bundle DSSE) — Optional/Selective**
  - Payload: batch of edges (size ≤ 512) with per-edge reason, evidence hashes, `symbol_digest`, `purl`, `confidence`, and `phase`.
  - Criteria to emit bundles:
    - Edge reason is `runtime`, `init_array`/constructors/TLS callbacks, or comes from third-party provenance.
    - Edge is contested/flagged in the Unknowns registry or under policy quarantine.
  - Signature: one DSSE envelope per bundle; Rekor submission is **configurable** (default on for contested/high-risk bundles, off for bulk benign bundles in sealed mode).
  - CAS: `cas://reachability/edges/{graph_hash}/{bundle_id}` JSON + `.../{bundle_id}.dsse`.

## 3. Producer responsibilities
- **Scanner**
  - Always emit Level 0 graph + manifest.
  - When criteria match, emit Level 1 bundles; include `bundle_reason` (e.g., `runtime-hit`, `init-root`, `third-party`, `disputed`).
  - Canonicalize JSON (sorted keys/arrays) before hashing; use BLAKE3 for the graph hash and SHA-256 inside bundles.
  - For hybrid reachability: tag edges with `source: static` or `source: runtime` to distinguish call-graph-derived from runtime-observed edges.
- **Attestor/Signer**
  - Apply DSSE for both levels; respect sovereign crypto modes (FIPS/GOST/SM/PQC) from the environment.
  - Rekor: push graph envelope digests; push edge-bundle digests only when `rekor_publish=true` (policy/default for high-risk bundles).

## 4. Consumer responsibilities
- **Signals**
  - Ingest graph DSSE as the canonical source; ingest edge bundles when present and attach them to the same `graph_hash`.
  - Store per-edge DSSE metadata for quarantine/override flows; surface missing edges as Unknowns only when they are absent from both the graph and the bundles.
- **Policy**
  - Default trust path: graph DSSE + CAS object.
  - When an edge is quarantined/contested, drop it from consideration if an edge-bundle DSSE marks it `revoked=true` or if the Unknowns registry lists it with the policy quarantine flag.
  - For "evidence-required" rules, require either (a) graph DSSE + policy_hash match **or** (b) edge-bundle DSSE that covers the vulnerable path edges.
- **Replay/Bench/CLI**
  - `stella graph verify` should accept a graph hash and optional edge bundles to validate deeper provenance offline (see the CLI UX frozen in section 8.3).

## 5. Verification and quarantine flows
- **Happy path**: verify graph DSSE → verify Rekor inclusion (or mirror) → hash graph body → match `graph_hash` in policy/replay manifest → accept.
- **Dispute/quarantine**: mark specific `edge_id` as `revoked` in an edge-bundle DSSE; Policy/Signals exclude it, recompute reachability, and surface the delta in explainers.
- **Offline**: retain graph DSSE and selected edge bundles inside the replay pack; Rekor proofs are cached when available.
- **Sovereign Verification Mode**: Even with no internet, all signatures and transparency proofs can be verified locally using Offline Update Kits.

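
Recomputing reachability after a quarantine amounts to a graph search that skips excluded edges. In this sketch an edge is dropped when it is marked `revoked` or when its ID appears in a quarantine list (standing in for the Unknowns registry flag). The `AttestedEdge` shape and `reachableAfterQuarantine` helper are hypothetical, and a real implementation would also recompute scores and the explainer delta.

```typescript
// Hypothetical edge shape using the `edge_id` / `revoked` terms from this section.
interface AttestedEdge {
  edge_id: string;
  from: string;
  to: string;
  revoked?: boolean;
}

// Breadth-first search from the roots over edges that are neither revoked
// nor quarantined. Returns the reachable symbols in sorted order.
function reachableAfterQuarantine(
  roots: string[],
  edges: AttestedEdge[],
  quarantinedEdgeIds: string[],
): string[] {
  const adjacency: Record<string, string[]> = Object.create(null);
  for (const e of edges) {
    if (e.revoked || quarantinedEdgeIds.indexOf(e.edge_id) !== -1) continue;
    (adjacency[e.from] = adjacency[e.from] || []).push(e.to);
  }

  const seen: Record<string, boolean> = Object.create(null);
  const queue = roots.slice();
  for (const r of roots) seen[r] = true;

  while (queue.length > 0) {
    const node = queue.shift() as string;
    for (const next of adjacency[node] || []) {
      if (!seen[next]) {
        seen[next] = true;
        queue.push(next);
      }
    }
  }
  return Object.keys(seen).sort();
}
```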
## 6. Performance & storage guardrails
- Default: only graph DSSE is mandatory; edge bundles are capped at 512 edges per envelope and emitted only on the criteria above.
- Rekor flood control: cap edge-bundle Rekor submissions per graph (config `reachability.edgeBundles.maxRekorPublishes`, default 5). Others stay CAS-only.
- Determinism: bundle ordering = stable sort by `(bundle_reason, edge_id)`; hash before signing.

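
The 512-edge cap and the `(bundle_reason, edge_id)` ordering can be combined into a deterministic bundling plan, sketched below. The guardrails do not say whether one bundle may mix reasons; keeping one reason per bundle is an assumption, and `BundleEdge` and `planBundles` are hypothetical names.

```typescript
// Hypothetical edge entry; `bundle_reason` values follow section 3.
interface BundleEdge {
  edge_id: string;
  bundle_reason: string;
}

const MAX_EDGES_PER_BUNDLE = 512;

// Locale-independent comparison so the order is identical on every host.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Sort by (bundle_reason, edge_id), then cut into bundles of at most 512 edges,
// starting a new bundle whenever the reason changes (ASSUMPTION).
function planBundles(edges: BundleEdge[]): BundleEdge[][] {
  const sorted = edges
    .slice()
    .sort((a, b) => compareStrings(a.bundle_reason, b.bundle_reason) || compareStrings(a.edge_id, b.edge_id));

  const bundles: BundleEdge[][] = [];
  let current: BundleEdge[] = [];
  for (const edge of sorted) {
    const reasonChanged = current.length > 0 && current[0].bundle_reason !== edge.bundle_reason;
    if (current.length === MAX_EDGES_PER_BUNDLE || reasonChanged) {
      bundles.push(current);
      current = [];
    }
    current.push(edge);
  }
  if (current.length > 0) bundles.push(current);
  return bundles;
}
```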
## 7. Hybrid Reachability Details

Stella Ops provides **true hybrid reachability** by combining:

| Signal Type | Source | Attestation |
|-------------|--------|-------------|
| Static call-graph edges | IL/bytecode analysis, framework routing models, entry-point proximity | Graph DSSE (Level 0) |
| Runtime-observed edges | EventPipe, JFR, Node inspector, Go/Rust probes | Edge-bundle DSSE (Level 1) with `source: runtime` |

**Why hybrid matters:**
- Static analysis catches code paths that may not execute during observed runtime
- Runtime analysis catches dynamic dispatch, reflection, and framework-injected paths
- Combining both provides confidence across build and runtime contexts
- Each edge type is separately attestable for audit and dispute resolution

**Evidence linking:** Each edge in the graph or bundle includes `evidenceRefs` pointing to the underlying proof artifacts (static analysis artifacts, runtime traces), enabling **evidence-linked VEX decisions**.

## 8. Decisions (Frozen 2025-12-13)

### 8.1 DSSE/Rekor Budget by Deployment Tier

| Tier | Graph DSSE | Edge-Bundle DSSE | Rekor Publish | Max Bundles/Graph |
|------|------------|------------------|---------------|-------------------|
| **Regulated** (SOC2, FedRAMP, PCI) | Required | Required for runtime/contested | Required | 10 |
| **Standard** | Required | Optional (criteria-based) | Graph only | 5 |
| **Air-gapped** | Required | Optional | Offline checkpoint | 5 |
| **Dev/Test** | Optional | Optional | Disabled | Unlimited |

**Budget enforcement:**
- Graph DSSE: Always submit digest to Rekor (or offline checkpoint for air-gapped)
- Edge-bundle DSSE: Submit to Rekor only when `bundle_reason` is `disputed`, `runtime-hit`, or `security-critical`
- Cap enforced by `reachability.edgeBundles.maxRekorPublishes` config (per tier defaults above)

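
The budget rules above can be expressed as a selection function: only bundles whose `bundle_reason` is `disputed`, `runtime-hit`, or `security-critical` are eligible, up to the per-tier cap, and Dev/Test publishes nothing. The tier identifiers, `BundleRef`, and `selectRekorBundles` are hypothetical; in the guide the cap comes from `reachability.edgeBundles.maxRekorPublishes`.

```typescript
// Hypothetical identifiers for the four deployment tiers in the table.
type Tier = "regulated" | "standard" | "air-gapped" | "dev-test";

// Per-tier defaults from the "Max Bundles/Graph" column; null = unlimited.
const MAX_REKOR_PUBLISHES: Record<Tier, number | null> = {
  regulated: 10,
  standard: 5,
  "air-gapped": 5, // recorded as an offline checkpoint rather than published online
  "dev-test": null,
};

// Only these reasons qualify for a Rekor (or offline checkpoint) submission.
const REKOR_ELIGIBLE_REASONS = ["disputed", "runtime-hit", "security-critical"];

interface BundleRef {
  bundle_id: string;
  bundle_reason: string;
}

// Returns the IDs of bundles to submit, in input order; the rest stay CAS-only.
function selectRekorBundles(tier: Tier, bundles: BundleRef[]): string[] {
  if (tier === "dev-test") return []; // Rekor publishing is disabled for Dev/Test
  const cap = MAX_REKOR_PUBLISHES[tier];
  const selected: string[] = [];
  for (const bundle of bundles) {
    if (cap !== null && selected.length >= cap) break;
    if (REKOR_ELIGIBLE_REASONS.indexOf(bundle.bundle_reason) !== -1) {
      selected.push(bundle.bundle_id);
    }
  }
  return selected;
}
```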
### 8.2 Signing Layout and CAS Paths
|
||||
|
||||
```
|
||||
cas://reachability/
|
||||
graphs/
|
||||
{blake3}/ # richgraph-v1 body (JSON)
|
||||
{blake3}.dsse # Graph DSSE envelope
|
||||
{blake3}.rekor # Rekor inclusion proof (optional)
|
||||
edges/
|
||||
{graph_hash}/
|
||||
{bundle_id}.json # Edge bundle body
|
||||
{bundle_id}.dsse # Edge bundle DSSE envelope
|
||||
{bundle_id}.rekor # Rekor inclusion proof (if published)
|
||||
revisions/
|
||||
{revision_id}/ # Revision manifest + lineage
|
||||
```
|
||||
|
||||
**Signing workflow:**
|
||||
1. Canonicalize richgraph-v1 JSON (sorted keys, arrays by deterministic key)
|
||||
2. Compute BLAKE3-256 hash -> `graph_hash`
|
||||
3. Create DSSE envelope with `stella.ops/graph@v1` predicate
|
||||
4. Submit digest to Rekor (online) or cache checkpoint (offline)
|
||||
5. Store graph body + envelope + proof in CAS
|
||||
|
||||
### 8.3 CLI UX for Selective Bundle Verification
|
||||
|
||||
```bash
|
||||
# Verify graph DSSE only (default)
|
||||
stella graph verify --hash blake3:a1b2c3d4...
|
||||
|
||||
# Verify graph + all edge bundles
|
||||
stella graph verify --hash blake3:a1b2c3d4... --include-bundles
|
||||
|
||||
# Verify specific edge bundle
|
||||
stella graph verify --hash blake3:a1b2c3d4... --bundle bundle:001
|
||||
|
||||
# Offline verification with local CAS
|
||||
stella graph verify --hash blake3:a1b2c3d4... --cas-root ./offline-cas/
|
||||
|
||||
# Verify Rekor inclusion
|
||||
stella graph verify --hash blake3:a1b2c3d4... --rekor-proof
|
||||
|
||||
# Output formats
|
||||
stella graph verify --hash blake3:a1b2c3d4... --format json|table|summary
|
||||
```
|
||||
|
||||
### 8.4 Golden Fixture Plan
|
||||
|
||||
**Fixture location:** `tests/Reachability/Hybrid/`
|
||||
|
||||
**Required fixtures:**
|
||||
| Fixture | Description | Expected Verification Time |
|
||||
|---------|-------------|---------------------------|
|
||||
| `graph-only.golden.json` | Minimal richgraph-v1 with DSSE | < 100ms |
|
||||
| `graph-with-runtime.golden.json` | Graph + 1 runtime edge bundle | < 200ms |
|
||||
| `graph-with-contested.golden.json` | Graph + 1 contested/revoked edge bundle | < 200ms |
|
||||
| `large-graph.golden.json` | 10K nodes, 50K edges, 5 bundles | < 2s |
|
||||
| `offline-bundle.golden.tgz` | Complete offline replay pack | < 5s |
|
||||
|
||||
**CI integration:**
|
||||
- `.gitea/workflows/hybrid-attestation.yml` runs verification fixtures
|
||||
- Size gate: Graph body < 10MB, individual bundle < 1MB
|
||||
- Time gate: Full verification < 5s for standard tier
|
||||
|
||||
### 8.5 Implementation Status
|
||||
|
||||
| Component | Status | Notes |
|
||||
|-----------|--------|-------|
|
||||
| Graph DSSE predicate | Done | `stella.ops/graph@v1` in PredicateTypes.cs |
|
||||
| Edge-bundle DSSE predicate | Done | `stella.ops/edgeBundle@v1` via EdgeBundlePublisher |
|
||||
| Edge-bundle models | Done | EdgeBundle.cs, EdgeBundleReason, EdgeReason enums |
|
||||
| Edge-bundle CAS publisher | Done | EdgeBundlePublisher.cs with deterministic DSSE |
|
||||
| Edge-bundle ingestion | Done | EdgeBundleIngestionService in Signals |
|
||||
| CAS layout | Done | Per section 8.2 |
|
||||
| Runtime-facts CAS storage | Done | IRuntimeFactsArtifactStore, FileSystemRuntimeFactsArtifactStore |
|
||||
| CLI verify command | Planned | Per section 8.3 |
|
||||
| Golden fixtures | Planned | Per section 8.4 |
|
||||
| Rekor integration | Done | Via Attestor module |
|
||||
| Quarantine enforcement | Done | HasQuarantinedEdges in ReachabilityFactDocument |
|
||||
|
||||
---
|
||||
|
||||
## 9. Verification Runbook

This section provides step-by-step guidance for verifying hybrid attestations in different scenarios.

### 9.1 Graph-Only Verification

Use this workflow when only graph-level attestation is required (default for most use cases).

**Prerequisites:**
- Access to CAS storage (local or remote)
- `stella` CLI installed
- Optional: Rekor instance access for transparency verification

**Steps:**

1. **Retrieve graph DSSE envelope:**
   ```bash
   stella graph fetch --hash blake3:<graph_hash> --output ./verification/
   ```

2. **Verify DSSE signature:**
   ```bash
   stella graph verify --hash blake3:<graph_hash>
   # Output: ✓ Graph signature valid (key: <key_id>)
   ```

3. **Verify content integrity:**
   ```bash
   stella graph verify --hash blake3:<graph_hash> --check-content
   # Output: ✓ Content hash matches BLAKE3:<graph_hash>
   ```

4. **Verify Rekor inclusion (online):**
   ```bash
   stella graph verify --hash blake3:<graph_hash> --rekor-proof
   # Output: ✓ Rekor inclusion verified (log index: <index>)
   ```

5. **Verify policy hash binding:**
   ```bash
   stella graph verify --hash blake3:<graph_hash> --policy-hash sha256:<policy_hash>
   # Output: ✓ Policy hash matches graph metadata
   ```

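The signature check in step 2 follows the DSSE envelope protocol: a signature covers the pre-authentication encoding (PAE) of the payload type and payload, not the raw envelope JSON. The Python sketch below builds the PAE exactly as the DSSE v1 specification defines it and checks an envelope through a caller-supplied `verify_sig` callback. The callback, the function names, and the assumption of standard (not URL-safe) base64 are illustrative only; this is not how the `stella` CLI is implemented.

```python
import base64
from typing import Callable


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding:
    "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body, lengths as ASCII decimals."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def verify_envelope(envelope: dict,
                    verify_sig: Callable[[str, bytes, bytes], bool]) -> bytes:
    """Return the decoded payload if any signature verifies, else raise ValueError.

    `verify_sig(keyid, message, signature)` is a hypothetical hook that checks a
    signature against the trusted key set (e.g. via a crypto library)."""
    payload = base64.b64decode(envelope["payload"])
    message = pae(envelope["payloadType"], payload)
    for sig in envelope.get("signatures", []):
        if verify_sig(sig.get("keyid", ""), message, base64.b64decode(sig["sig"])):
            return payload
    raise ValueError("no valid DSSE signature")
```

Because the payload type is part of the signed bytes, an attacker cannot relabel a graph envelope as a different predicate without invalidating the signature.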
### 9.2 Graph + Edge-Bundle Verification

Use this workflow when finer-grained verification of specific edges is required.

**When to use:**
- Auditing runtime-observed paths
- Investigating contested/disputed edges
- Verifying init-section or TLS callback roots
- Regulatory compliance requiring edge-level attestation

**Steps:**

1. **List available edge bundles:**
   ```bash
   stella graph bundles --hash blake3:<graph_hash>
   # Output:
   # Bundle ID    Reason        Edges  Rekor
   # bundle:001   runtime-hit      42  ✓
   # bundle:002   init-root        15  ✓
   # bundle:003   third-party     128  -
   ```

2. **Verify specific bundle:**
   ```bash
   stella graph verify --hash blake3:<graph_hash> --bundle bundle:001
   # Output:
   # ✓ Bundle DSSE signature valid
   # ✓ All 42 edges link to graph_hash
   # ✓ Rekor inclusion verified
   ```

3. **Verify all bundles:**
   ```bash
   stella graph verify --hash blake3:<graph_hash> --include-bundles
   # Output:
   # ✓ Graph signature valid
   # ✓ 3 bundles verified (185 edges total)
   ```

4. **Check for revoked edges:**
   ```bash
   stella graph verify --hash blake3:<graph_hash> --check-revoked
   # Output:
   # ⚠ 2 edges marked revoked in bundle:002
   #   - edge:func_a→func_b (reason: policy-quarantine)
   #   - edge:func_c→func_d (reason: revoked)
   ```

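Step 4 reports edges whose `revoked` flag is set in an edge-bundle payload (see the abridged payload in the reachability moat guide, `lead.md`, which states that revoked edges must be dropped before scoring). A minimal Python sketch of that partition, assuming each decoded bundle dict also carries a `bundle_id` (taken from its CAS path `.../{bundle_id}`); the dict shapes are hypothetical and the per-edge revocation reason shown by the CLI is not modeled.

```python
from __future__ import annotations

from typing import Iterable


def split_revoked(bundles: Iterable[dict]) -> tuple[list[dict], list[tuple[str, str]]]:
    """Partition edges from decoded edge-bundle payloads into (active, revoked).

    `active` holds edges still usable for reachability scoring; `revoked` holds
    (bundle_id, "from→to") pairs for reporting. Bundles are visited in bundle_id
    order and edges in edge_id order, so the output is deterministic."""
    active: list[dict] = []
    revoked: list[tuple[str, str]] = []
    for bundle in sorted(bundles, key=lambda b: b["bundle_id"]):
        for edge in sorted(bundle["edges"], key=lambda e: e["edge_id"]):
            if edge.get("revoked", False):
                revoked.append((bundle["bundle_id"], f'{edge["from"]}→{edge["to"]}'))
            else:
                active.append(edge)
    return active, revoked
```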
### 9.3 Verification Decision Matrix

| Scenario | Graph DSSE | Edge Bundles | Rekor | Policy Hash |
|----------|------------|--------------|-------|-------------|
| Standard CI/CD | Required | Optional | Recommended | Required |
| Regulated audit | Required | Required | Required | Required |
| Dispute resolution | Required | Required (contested) | Required | Optional |
| Offline replay | Required | As available | Cached proof | Required |
| Dev/test | Optional | Optional | Disabled | Optional |

---

## 10. Rekor Guidance

### 10.1 Rekor Integration Overview

Rekor provides an append-only, tamper-evident transparency log for attestation artifacts. StellaOps integrates with Rekor (or a compatible mirror) to provide verifiable timestamps and inclusion proofs.

### 10.2 What Gets Published to Rekor

| Artifact Type | Rekor Publish | Condition |
|---------------|---------------|-----------|
| Graph DSSE digest | Always | All deployment tiers except dev/test |
| Edge-bundle DSSE digest | Conditional | Only for bundles whose reason is `disputed`, `runtime-hit`, or `security-critical` |
| VEX decision DSSE digest | Always | When VEX decisions are generated |

### 10.3 Rekor Configuration

```yaml
# etc/signals.yaml
reachability:
  rekor:
    enabled: true
    endpoint: "https://rekor.sigstore.dev"  # Or private mirror
    timeout: 30s
    retry:
      attempts: 3
      backoff: exponential
  edgeBundles:
    maxRekorPublishes: 5  # Per graph, configurable by tier
    publishReasons:
      - disputed
      - runtime-hit
      - security-critical
```

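The `edgeBundles` settings above combine the section 10.2 rule with a per-graph cap. The Python sketch below shows one deterministic way a publisher could choose which bundles to log: keep bundles whose reason is in `publishReasons`, order them by `bundle_id`, and stop at `maxRekorPublishes`. The ordering rule, the `EdgeBundleRef` type, and the function name are assumptions; this guide does not say which bundles win when more than the cap qualify.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeBundleRef:
    """Hypothetical minimal view of a stored edge bundle."""
    bundle_id: str
    reason: str


def select_for_rekor(bundles: list[EdgeBundleRef], publish_reasons: set[str],
                     max_publishes: int) -> list[str]:
    """Return bundle ids to publish to Rekor, in deterministic order.

    Assumed policy: filter by reason, sort by bundle_id, truncate at the cap.
    Bundles that are not selected keep their DSSE envelope in CAS but get no
    Rekor entry (like bundle:003 in the section 9.2 listing)."""
    eligible = sorted(b.bundle_id for b in bundles if b.reason in publish_reasons)
    return eligible[:max(0, max_publishes)]
```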
### 10.4 Private Rekor Mirror

For air-gapped or regulated environments:

```yaml
reachability:
  rekor:
    enabled: true
    endpoint: "https://rekor.internal.example.com"
    tls:
      ca: /etc/stellaops/ca.crt
      clientCert: /etc/stellaops/client.crt
      clientKey: /etc/stellaops/client.key
```

### 10.5 Rekor Proof Caching

Inclusion proofs are cached locally for offline verification:

```
cas://reachability/graphs/{blake3}.rekor                  # Graph inclusion proof
cas://reachability/edges/{graph_hash}/{bundle_id}.rekor   # Bundle proof
```

**Proof format:**
```json
{
  "logIndex": 12345678,
  "logId": "c0d23d6ad406973f9559f3ba2d1ca01f84147d8ffc5b8445c224f98b9591801d",
  "integratedTime": 1702492800,
  "inclusionProof": {
    "logIndex": 12345678,
    "rootHash": "abc123...",
    "treeSize": 50000000,
    "hashes": ["def456...", "ghi789..."]
  }
}
```

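Checking a cached proof offline means recomputing the Merkle root from the entry's leaf hash and the `hashes` audit path, then comparing it with `rootHash` (which in turn must be trusted via a signed checkpoint). The Python sketch below implements the inclusion-proof verification algorithm of RFC 9162 section 2.1.3.2 with RFC 6962 hashing (leaf `SHA-256(0x00 || entry)`, interior node `SHA-256(0x01 || left || right)`), the scheme public Rekor uses; that a private mirror uses the same scheme is an assumption, and checkpoint signature verification is out of scope.

```python
from __future__ import annotations

import hashlib


def leaf_hash(entry: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256(0x00 || entry)."""
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256(0x01 || left || right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(entries: list[bytes]) -> bytes:
    """Merkle Tree Hash over raw entries (RFC 6962 section 2.1); handy for building test trees."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = 1 << ((n - 1).bit_length() - 1)  # largest power of two strictly below n
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def verify_inclusion(index: int, tree_size: int, leaf: bytes,
                     proof: list[bytes], root: bytes) -> bool:
    """Recompute the root from `leaf` (a leaf hash) and the audit path; compare with `root`."""
    if index >= tree_size:
        return False
    fn, sn, r = index, tree_size - 1, leaf
    for p in proof:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            # Skip levels where this node has no right sibling.
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

With a cached proof `p = proof["inclusionProof"]`, the call would look like `verify_inclusion(p["logIndex"], p["treeSize"], leaf, [bytes.fromhex(h) for h in p["hashes"]], bytes.fromhex(p["rootHash"]))`, where `leaf` is the leaf hash of the logged entry.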
---

## 11. Offline Replay Steps

### 11.1 Overview

Offline replay enables full verification of reachability attestations without network access. This is essential for air-gapped deployments and regulatory compliance scenarios.

### 11.2 Creating an Offline Replay Pack

**Step 1: Export graph and bundles**
```bash
stella graph export --hash blake3:<graph_hash> \
  --include-bundles \
  --include-rekor-proofs \
  --output ./offline-pack/
```

**Step 2: Include required artifacts**
The export creates:
```
offline-pack/
├── manifest.json                 # Replay manifest v2
├── graphs/
│   └── <blake3>/
│       ├── richgraph-v1.json     # Graph body
│       ├── graph.dsse            # DSSE envelope
│       └── graph.rekor           # Inclusion proof
├── edges/
│   └── <graph_hash>/
│       ├── bundle-001.json
│       ├── bundle-001.dsse
│       └── bundle-001.rekor
├── runtime-facts/
│   └── <hash>/
│       └── runtime-facts.ndjson
└── checkpoints/
    └── rekor-checkpoint.json     # Transparency log checkpoint
```

**Step 3: Bundle for transfer**
```bash
stella offline pack --input ./offline-pack/ --output offline-replay.tgz
```

### 11.3 Verifying an Offline Pack

**Step 1: Extract pack**
```bash
stella offline unpack --input offline-replay.tgz --output ./verify/
```

**Step 2: Verify manifest integrity**
```bash
stella offline verify --manifest ./verify/manifest.json
# Output:
# ✓ Manifest version: 2
# ✓ Hash algorithm: blake3
# ✓ All CAS entries present
# ✓ All hashes verified
```

**Step 3: Verify attestations offline**
```bash
stella graph verify --hash blake3:<graph_hash> \
  --cas-root ./verify/ \
  --offline
# Output:
# ✓ Graph DSSE signature valid (offline mode)
# ✓ Rekor proof verified against checkpoint
# ✓ 3 bundles verified offline
```

### 11.4 Offline Verification Trust Model

```
┌──────────────────────────────────────────────────────────┐
│                       Offline Pack                       │
├──────────────────────────────────────────────────────────┤
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │  Graph DSSE  │   │ Edge Bundle  │   │    Rekor     │  │
│  │   Envelope   │   │     DSSE     │   │  Checkpoint  │  │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘  │
│         │                  │                  │          │
│         ▼                  ▼                  ▼          │
│  ┌────────────────────────────────────────────────────┐  │
│  │             Local Verification Engine              │  │
│  │ 1. Verify DSSE signatures against trusted keys     │  │
│  │ 2. Verify content hashes match DSSE payloads       │  │
│  │ 3. Verify Rekor proofs against checkpoint          │  │
│  │ 4. Verify policy hash binding                      │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
```

### 11.5 Air-Gapped Deployment Checklist

- [ ] Trusted signing keys pre-installed
- [ ] Rekor checkpoint from last sync included
- [ ] All referenced CAS artifacts bundled
- [ ] Policy hash recorded in manifest
- [ ] Analyzer manifests included for replay
- [ ] Runtime-facts artifacts included (if applicable)

---

## 12. Release Notes

### 12.1 Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-11 | Initial hybrid attestation design |
| 1.1 | 2025-12-13 | Added edge-bundle ingestion, CAS storage, verification runbook |

### 12.2 Breaking Changes

None. Hybrid attestation is additive; existing graph-only workflows remain unchanged.

### 12.3 Migration Guide

**From graph-only to hybrid:**
1. No migration required for existing graphs
2. Enable edge-bundle emission in scanner config:
   ```yaml
   scanner:
     reachability:
       edgeBundles:
         enabled: true
         emitRuntime: true
         emitContested: true
   ```
3. Signals automatically ingests edge bundles when present

---

## 13. Cross-References

- **Sprint:** SPRINT_0401_0001_0001_reachability_evidence_chain.md (Tasks 53-56)
- **Contracts:** docs/contracts/richgraph-v1.md, docs/contracts/edge-bundle-v1.md
- **Implementation:**
  - Scanner: `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/EdgeBundle*.cs`
  - Signals: `src/Signals/StellaOps.Signals/Ingestion/EdgeBundleIngestionService.cs`
  - Policy: `src/Policy/StellaOps.Policy.Engine/Gates/PolicyGateEvaluator.cs`
- **Related docs:**
  - docs/modules/reach-graph/guides/function-level-evidence.md
  - docs/modules/reach-graph/guides/lattice.md
  - docs/replay/DETERMINISTIC_REPLAY.md
  - docs/ARCHITECTURE_OVERVIEW.md

254
docs/modules/reach-graph/guides/lattice.md
Normal file
# Reachability Lattice & Scoring Model

> **Status:** Implemented v0 in Signals; this document describes the current deterministic bucket model and its policy-facing implications.
> **Owners:** Scanner Guild · Signals Guild · Policy Guild.

StellaOps models reachability as a deterministic, evidence-linked outcome that can safely represent "unknown" without silently producing false safety. Signals produces a `ReachabilityFactDocument` with per-target `states[]` and a top-level `score` that is stable under replays.

---

## 1. Current model (Signals v0)

Signals scoring (`src/Signals/StellaOps.Signals/Services/ReachabilityScoringService.cs`) computes, for each `target` symbol:

- `reachable`: whether there exists a path from the selected `entryPoints[]` to `target`.
- `bucket`: a coarse classification of *why* the target is (or is not) reachable.
- `confidence` (0..1): a confidence value clamped to `[minConfidence, maxConfidence]` (see §3).
- `weight` (0..1): bucket multiplier.
- `score` (0..1): `confidence * weight`.
- `path[]`: the discovered path (if reachable), deterministically ordered.
- `evidence.runtimeHits[]`: runtime hit symbols that appear on the chosen path.

The fact-level `score` is the average of per-target scores, penalized by unknowns pressure (see §4).

---

## 2. Buckets & default weights

Bucket assignment is deterministic and uses this precedence:

1. `unreachable` — no path exists.
2. `entrypoint` — the `target` itself is an entrypoint.
3. `runtime` — at least one runtime hit overlaps the discovered path.
4. `direct` — reachable and the discovered path is length ≤ 2.
5. `unknown` — reachable but none of the above classifications apply.

Default weights (configurable via `SignalsOptions:Scoring:ReachabilityBuckets`):

| Bucket | Default weight |
|--------|----------------|
| `entrypoint` | `1.0` |
| `direct` | `0.85` |
| `runtime` | `0.45` |
| `unknown` | `0.5` |
| `unreachable` | `0.0` |

---

## 3. Confidence (reachable vs unreachable)

Default confidence values (configurable via `SignalsOptions:Scoring:*`):

| Input | Default |
|-------|---------|
| `reachableConfidence` | `0.75` |
| `unreachableConfidence` | `0.25` |
| `runtimeBonus` | `0.15` |
| `minConfidence` | `0.05` |
| `maxConfidence` | `0.99` |

Rules:

- Base confidence is `reachableConfidence` when `reachable=true`, otherwise `unreachableConfidence`.
- When `reachable=true` and runtime evidence overlaps the selected path, add `runtimeBonus` (bounded by `maxConfidence`).
- The final confidence is clamped to `[minConfidence, maxConfidence]`.

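The §2 precedence and the rules above combine into a per-target score. Below is a minimal Python sketch using the documented default weights and confidence values. It illustrates the arithmetic only and is not a port of `ReachabilityScoringService`; the `TargetInput` shape and the reading of "path length" as the number of nodes in `path[]` are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Defaults from §2 and §3 (configurable in Signals via SignalsOptions:Scoring).
BUCKET_WEIGHTS = {"entrypoint": 1.0, "direct": 0.85, "runtime": 0.45,
                  "unknown": 0.5, "unreachable": 0.0}
REACHABLE_CONF, UNREACHABLE_CONF, RUNTIME_BONUS = 0.75, 0.25, 0.15
MIN_CONF, MAX_CONF = 0.05, 0.99


@dataclass(frozen=True)
class TargetInput:
    """Hypothetical input shape for one scored target."""
    reachable: bool
    is_entrypoint: bool
    path: tuple[str, ...]         # discovered path, entrypoint first (empty if unreachable)
    runtime_hits: frozenset[str]  # symbols observed at runtime


def bucket(t: TargetInput) -> str:
    """Apply the §2 precedence: the first matching rule wins."""
    if not t.reachable:
        return "unreachable"
    if t.is_entrypoint:
        return "entrypoint"
    if t.runtime_hits & set(t.path):
        return "runtime"
    if len(t.path) <= 2:
        return "direct"
    return "unknown"


def confidence(t: TargetInput) -> float:
    """Apply the §3 rules, then clamp to [MIN_CONF, MAX_CONF]."""
    c = REACHABLE_CONF if t.reachable else UNREACHABLE_CONF
    if t.reachable and t.runtime_hits & set(t.path):
        c = min(c + RUNTIME_BONUS, MAX_CONF)
    return max(MIN_CONF, min(c, MAX_CONF))


def target_score(t: TargetInput) -> float:
    """Per-target score = confidence * bucket weight."""
    return confidence(t) * BUCKET_WEIGHTS[bucket(t)]
```

For example, a three-node path with a runtime hit lands in `runtime` with confidence 0.75 + 0.15 = 0.90 and score 0.90 × 0.45 = 0.405, while the same path without runtime evidence lands in `unknown` with score 0.75 × 0.5 = 0.375.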
---

## 4. Unknowns pressure (missing/ambiguous evidence)

Signals tracks unresolved symbols/edges as **Unknowns** (see `docs/modules/signals/guides/unknowns-registry.md`). The number of unknowns for a subject influences the final score:

```
unknownsPressure = unknownsCount / (targetsCount + unknownsCount)
pressurePenalty  = min(unknownsPenaltyCeiling, unknownsPressure)
fact.score       = avg(states[i].score) * (1 - pressurePenalty)
```

Default `unknownsPenaltyCeiling` is `0.35` (configurable).

This keeps the system deterministic while preventing unknown-heavy subjects from appearing "safe" by omission.

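A direct Python transcription of the three formulas above, with the default ceiling. Two edge cases are not specified in this section and are handled here by assumption: with no targets and no unknowns the pressure is taken as 0, and with no per-target states the average is taken as 0. `targetsCount` is also assumed to equal the number of entries in `states[]`.

```python
from __future__ import annotations


def fact_score(state_scores: list[float], unknowns_count: int,
               penalty_ceiling: float = 0.35) -> float:
    """fact.score = avg(states[i].score) * (1 - min(ceiling, unknownsPressure))."""
    targets_count = len(state_scores)  # assumption: one state per target
    total = targets_count + unknowns_count
    pressure = unknowns_count / total if total else 0.0  # assumed for the empty case
    penalty = min(penalty_ceiling, pressure)
    average = sum(state_scores) / targets_count if targets_count else 0.0  # assumed
    return average * (1 - penalty)
```

With four targets averaging 0.6, two unknowns give a pressure and penalty of 2/6 ≈ 0.33 and a fact score of 0.40; six unknowns give a pressure of 0.6, which the ceiling caps at 0.35, for a score of 0.39.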
---

## 5. Evidence references & determinism anchors

Signals produces stable references intended for downstream evidence chains:

- `metadata.fact.digest` — canonical digest of the reachability fact (`sha256:<hex>`).
- `metadata.fact.version` — monotonically increasing integer for the same `subjectKey`.
- Callgraph ingestion returns a deterministic `graphHash` (sha256) for the normalized callgraph.

Downstream services (Policy, UI/CLI explainers, replay tooling) should use these fields as stable evidence references.

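A digest is only a stable reference if every producer serializes the fact identically. This Python sketch shows the idea with one simple canonical form (sorted keys, compact separators, UTF-8, no NaN/Infinity). That form is an assumption for illustration; this section does not specify the canonicalization Signals uses, and RFC 8785 (JCS) is one standardized alternative.

```python
import hashlib
import json


def canonical_digest(document: dict) -> str:
    """Return "sha256:<hex>" over an assumed canonical JSON encoding of `document`.

    Object keys are sorted; array order is preserved, so arrays such as
    states[] and path[] must already be deterministically ordered."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False, allow_nan=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Key order in the input no longer affects the digest, but reordering an array does, which is why §1 requires `path[]` to be deterministically ordered.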
---

## 6. Policy-facing guidance (avoid false "not affected")

Policy should treat `unreachable` (or a low fact score) as **insufficient** to claim "not affected" unless:

- the reachability evidence is present and referenced (`metadata.fact.digest`), and
- confidence is above a high-confidence threshold.

When evidence is missing or confidence is low, the correct output is **under investigation** rather than "not affected".

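Expressed as code, the rule is a guard in front of any "not affected" claim. A minimal Python sketch; the function name, the `sha256:` prefix check, and the 0.9 default threshold are hypothetical, since this section only requires "a high-confidence threshold".

```python
from __future__ import annotations


def vex_status_for_unreachable(fact_digest: str | None, confidence: float,
                               threshold: float = 0.9) -> str:
    """Status Policy may emit for a target that Signals classified as unreachable.

    "not_affected" requires referenced evidence and confidence above the
    (hypothetical) threshold; anything else falls back to "under_investigation"."""
    has_evidence = bool(fact_digest) and fact_digest.startswith("sha256:")
    if has_evidence and confidence > threshold:
        return "not_affected"
    return "under_investigation"
```

Under the v0 defaults in §3, an unreachable target carries confidence 0.25, so this guard alone never permits "not affected" for v0 facts, which is the conservative outcome this section asks for.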
---

## 7. Signals API pointers

- `docs/modules/signals/api/reachability-contract.md`
- `docs/modules/signals/api/samples/facts-sample.json`

---

## 8. Roadmap (tracked in Sprint 0401)

- Introduce first-class uncertainty state lists + entropy-derived `riskScore` (see `uncertainty-entropy.md`).
- Extend evidence refs to include CAS/DSSE pointers for graph-level and edge-bundle attestations.

---

## 9. Formal Lattice Model v1 (design — Sprint 0401)

The v0 bucket model provides coarse classification. The v1 lattice model introduces a formal 8-state lattice with algebraic join/meet operations for monotonic, deterministic reachability analysis across evidence types.

### 9.1 State Definitions

| State | Code | Ordering | Description |
|-------|------|----------|-------------|
| `Unknown` | `U` | ⊥ (bottom) | No evidence available; default state |
| `StaticallyReachable` | `SR` | 1 | Static analysis suggests path exists |
| `StaticallyUnreachable` | `SU` | 1 | Static analysis finds no path |
| `RuntimeObserved` | `RO` | 2 | Runtime probe/hit confirms execution |
| `RuntimeUnobserved` | `RU` | 2 | Runtime probe active but no hit observed |
| `ConfirmedReachable` | `CR` | 3 | Both static + runtime agree reachable |
| `ConfirmedUnreachable` | `CU` | 3 | Both static + runtime agree unreachable |
| `Contested` | `X` | ⊤ (top) | Static and runtime evidence conflict |

### 9.2 Lattice Ordering (Hasse Diagram)

```
                                Contested (X)
                            /                   \
       ConfirmedReachable (CR)                ConfirmedUnreachable (CU)
            /           \                          /             \
StaticallyReachable  RuntimeObserved  StaticallyUnreachable  RuntimeUnobserved
        (SR)              (RO)                (SU)                 (RU)
            \                 \               /                   /
             '-----------------' Unknown (U) '-------------------'
```

### 9.3 Join Rules (⊔ — least upper bound)

When combining evidence from multiple sources, use the join operation:

```
U ⊔ S = S (any evidence beats unknown)
SR ⊔ RO = CR (static reachable + runtime hit = confirmed)
SU ⊔ RU = CU (static unreachable + runtime miss = confirmed)
SR ⊔ RU = X (static reachable but runtime miss = contested)
SU ⊔ RO = X (static unreachable but runtime hit = contested)
CR ⊔ CU = X (conflicting confirmations = contested)
X ⊔ * = X (contested absorbs all)
```

**Full join table:**

| ⊔ | U | SR | SU | RO | RU | CR | CU | X |
|---|---|----|----|----|----|----|----|---|
| **U** | U | SR | SU | RO | RU | CR | CU | X |
| **SR** | SR | SR | X | CR | X | CR | X | X |
| **SU** | SU | X | SU | X | CU | X | CU | X |
| **RO** | RO | CR | X | RO | X | CR | X | X |
| **RU** | RU | X | CU | X | RU | X | CU | X |
| **CR** | CR | CR | X | CR | X | CR | X | X |
| **CU** | CU | X | CU | X | CU | X | CU | X |
| **X** | X | X | X | X | X | X | X | X |

### 9.4 Meet Rules (⊓ — greatest lower bound)

Used for conservative intersection (e.g., multi-entry-point consensus):

```
U ⊓ * = U (unknown is bottom)
CR ⊓ CR = CR (agreement preserved)
X ⊓ S = S (drop contested to either side)
```

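The join table and meet rules above are consistent with a single partial order: `U` below the four single-source states, `SR` and `RO` below `CR`, `SU` and `RU` below `CU`, and both confirmations below `X`. The Python sketch below derives join and meet from that ordering instead of hard-coding the 64-cell table, which keeps the two operations consistent by construction. It is an illustration, not the `ReachabilityLattice.Join` implementation referenced in §9.8; the names `COVERS`, `join_all`, and so on are hypothetical.

```python
from __future__ import annotations

from functools import reduce

# Covering relation read off the §9.3 join table: state -> states directly above it.
COVERS: dict[str, set[str]] = {
    "U": {"SR", "SU", "RO", "RU"},
    "SR": {"CR"}, "RO": {"CR"},
    "SU": {"CU"}, "RU": {"CU"},
    "CR": {"X"}, "CU": {"X"},
    "X": set(),
}
STATES = tuple(COVERS)


def _upper_set(state: str) -> frozenset[str]:
    """All states >= `state` (reflexive-transitive closure of COVERS)."""
    seen, stack = {state}, [state]
    while stack:
        for above in COVERS[stack.pop()]:
            if above not in seen:
                seen.add(above)
                stack.append(above)
    return frozenset(seen)


UP = {s: _upper_set(s) for s in STATES}


def leq(a: str, b: str) -> bool:
    return b in UP[a]


def join(a: str, b: str) -> str:
    """Least upper bound: the common upper bound that lies below all the others."""
    common = UP[a] & UP[b]
    return next(c for c in common if all(leq(c, d) for d in common))


def meet(a: str, b: str) -> str:
    """Greatest lower bound: the common lower bound that lies above all the others."""
    common = [c for c in STATES if leq(c, a) and leq(c, b)]
    return next(c for c in common if all(leq(d, c) for d in common))


def join_all(states: list[str]) -> str:
    """Fold evidence from several sources; U is the identity element."""
    return reduce(join, states, "U")
```

Because evidence is combined with `join`, folding new evidence can only move a state upward, which is the monotonicity property stated in §9.5; moving down requires an explicit reset to `U`.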
### 9.5 Monotonicity Properties

1. **Evidence accumulation is monotonic:** Once state rises in the lattice, it cannot descend without explicit revocation.
2. **Revocation resets to Unknown:** When evidence is invalidated (e.g., graph invalidation), state resets to `U`.
3. **Contested states require human triage:** `X` state triggers policy flags and UI attention.

### 9.6 Mapping v0 Buckets to v1 States

| v0 Bucket | v1 State(s) | Notes |
|-----------|-------------|-------|
| `unreachable` | `SU`, `CU` | Depends on runtime evidence availability |
| `entrypoint` | `CR` | Entry points are by definition reachable |
| `runtime` | `RO`, `CR` | Depends on static analysis agreement |
| `direct` | `SR`, `CR` | Direct paths with/without runtime confirmation |
| `unknown` | `U` | No evidence available |

### 9.7 Policy Decision Matrix

| v1 State | VEX "not_affected" | VEX "affected" | VEX "under_investigation" |
|----------|-------------------|----------------|---------------------------|
| `U` | ❌ blocked | ⚠️ needs evidence | ✅ default |
| `SR` | ❌ blocked | ✅ allowed | ✅ allowed |
| `SU` | ⚠️ low confidence | ❌ contested | ✅ allowed |
| `RO` | ❌ blocked | ✅ allowed | ✅ allowed |
| `RU` | ⚠️ medium confidence | ❌ contested | ✅ allowed |
| `CR` | ❌ blocked | ✅ required | ❌ invalid |
| `CU` | ✅ allowed | ❌ blocked | ❌ invalid |
| `X` | ❌ blocked | ❌ blocked | ✅ required |

### 9.8 Implementation Notes

- **State storage:** `ReachabilityFactDocument.states[].latticeState` field (enum)
- **Join implementation:** `ReachabilityLattice.Join(a, b)` in `src/Signals/StellaOps.Signals/Services/`
- **Backward compatibility:** v0 bucket computed from v1 state for API consumers

### 9.9 Evidence Chain Requirements

Each lattice state transition must be accompanied by evidence references:

```json
{
  "symbol": "sym:java:...",
  "latticeState": "CR",
  "previousState": "SR",
  "evidence": {
    "static": {
      "graphHash": "blake3:...",
      "pathLength": 3,
      "confidence": 0.92
    },
    "runtime": {
      "probeId": "probe:...",
      "hitCount": 47,
      "observedAt": "2025-12-13T10:00:00Z"
    }
  },
  "transitionAt": "2025-12-13T10:00:00Z"
}
```

78
docs/modules/reach-graph/guides/lead.md
Normal file
# Deterministic Reachability — Product Moat (Nov 2025)

Source: internal advisory “23-Nov-2025 - Where Stella Ops Can Truly Lead”. Supersedes/extends archived binary reachability advisories (18-Nov-2025 - Binary-Reachability-Engine, Encoding Binary Reachability with PURL-Resolved Edges, CSharp-Binary-Analyzer). This page is the canonical, high-level articulation of our reachability moat for architects, PMM, and field teams. Detailed schemas live in `docs/modules/reach-graph/guides/evidence-schema.md` and `docs/modules/reach-graph/guides/hybrid-attestation.md`.

## Why it matters
- Most scanners list every CVE; reachability asks whether vulnerable code is actually callable.
- Competitors infer paths and rarely sign evidence; we **prove** paths with deterministic graphs and attestations.
- Outcome targets: ≥40% fewer noisy vulns shown; ≥25% faster triage via explainable “why” paths.

## Moat elements
1) **Deterministic call-graphs per artifact**
   - Stable node IDs: `purl@version!build-id!symbol-signature` (or code offset when stripped).
   - Stable edge IDs: `SHA256(nodeA||nodeB||tool-version||inputs-hash)`.
   - Graph hash: BLAKE3 over canonical JSON; locked by manifest.
2) **Signed evidence**
   - Graph-level DSSE for every scan (mandatory).
   - Optional edge-bundle DSSE (≤512 edges) for runtime/init/contested edges; Rekor publish capped. See `docs/modules/reach-graph/guides/hybrid-attestation.md`.
3) **Explainability**
   - Each finding carries call-chain + per-edge reason + VEX gate decision + layer attribution.
4) **Container layer provenance**
   - Track file-to-layer mapping; show “introduced in layer X from base Y”.
5) **Replayability**
   - Determinism manifest locks feeds, toolchain hashes, analyzer flags; replay yields identical graph and attestations.

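The ≤512-edge limit on edge bundles implies a chunking step, and that step must be deterministic for the bundle DSSE digests to replay. A minimal Python sketch, assuming edges are grouped by bundle reason and ordered by `edge_id` before being cut into bundles; the grouping key, the ordering, and the per-edge `bundle_reason` key are assumptions, not the `EdgeBundlePublisher` behavior.

```python
from __future__ import annotations

from itertools import groupby

MAX_EDGES_PER_BUNDLE = 512


def chunk_edges(edges: list[dict], max_edges: int = MAX_EDGES_PER_BUNDLE) -> list[dict]:
    """Group edges by reason, sort by edge_id, and split into bundles of <= max_edges.

    Each edge dict is assumed to carry `edge_id` and `bundle_reason` keys."""
    ordered = sorted(edges, key=lambda e: (e["bundle_reason"], e["edge_id"]))
    bundles = []
    for reason, group in groupby(ordered, key=lambda e: e["bundle_reason"]):
        group_edges = list(group)
        for start in range(0, len(group_edges), max_edges):
            bundles.append({"bundle_reason": reason,
                            "edges": group_edges[start:start + max_edges]})
    return bundles
```

Sorting before grouping means input order never changes the output, so the same graph always yields the same bundles.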
## Minimal architecture slice
- **Sbomer/Scanner**: emit SBOM + symbol maps + per-layer file index; capture Build-IDs.
- **Cartographer**: build deterministic call-graphs (language + native), output `EdgeList.jsonl` with stable IDs.
- **Attestor**: wrap graph (and edge bundles when emitted) into DSSE; log digests to Rekor/mirror.
- **Vexer/Policy**: evaluate lattice, produce OpenVEX with linked edge proofs.
- **Ledger**: retain manifests and DSSE; mirror to Rekor where allowed.

## Practical spec (condensed)
- **Node fields**: `symbol_id`, `code_id`, `purl`, `build_id`, `symbol_digest`, `lang`, `evidence[]`.
- **Edge fields**: `from`, `to`, `kind` (direct|plt|runtime|init), `purl`, `symbol_digest`, `reason`, `confidence`, `evidence[]`.
- **Roots**: exports, entrypoints, **.init_array/.ctors/TLS callbacks**, plugin hooks.
- **Attestation layout**:
  - Graph: `cas://reachability/graphs/{blake3}` + `{blake3}.dsse` (Rekor always).
  - Edge bundle: `cas://reachability/edges/{graph_hash}/{bundle_id}[.dsse]` (Rekor optional, capped).

### Example: Edge-bundle DSSE payload (abridged)
```json
{
  "graph_hash": "blake3:...",
  "bundle_reason": "runtime-hit",
  "edges": [{
    "edge_id": "sha256:...",
    "from": "sym:...caller",
    "to": "sym:...callee",
    "reason": "plt",
    "purl": "pkg:deb/openssl@3.0.2?arch=amd64",
    "symbol_digest": "sha256:...",
    "revoked": false
  }]
}
```

### Field cheat sheet (for sprint readers)
- `graph_hash` — BLAKE3 of canonical graph JSON.
- `bundle_reason` — `runtime-hit | init-root | contested | third-party`.
- `edge_id` — sha256(from||to||reason||tool-version||inputs-hash).
- `revoked` — when true, policy/Signals must drop this edge before reachability scoring.
- `purl` + `symbol_digest` — bind edge to SBOM component and callee identity.

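The `edge_id` recipe can be sketched in Python as below, following the cheat-sheet field list (which includes `reason`; the moat summary above lists the recipe without it). `||` is read as concatenation, and because plain concatenation of variable-length strings is ambiguous (`"ab" + "c"` equals `"a" + "bc"`), this sketch joins the fields with a NUL separator; the separator is an assumption, not a documented part of the format.

```python
import hashlib


def edge_id(src: str, dst: str, reason: str, tool_version: str, inputs_hash: str) -> str:
    """sha256(from||to||reason||tool-version||inputs-hash), NUL-separated (assumption)."""
    fields = (src, dst, reason, tool_version, inputs_hash)
    if any("\x00" in f for f in fields):
        raise ValueError("edge fields must not contain NUL")
    return "sha256:" + hashlib.sha256("\x00".join(fields).encode("utf-8")).hexdigest()
```

The `sha256:` prefix matches the `edge_id` form in the payload example, and identical inputs always produce the same ID, which is what makes the idempotent attest API possible.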
## Quick wins (ship order)
1) Capture Build-IDs in Scanner and thread into `symbol_id`/`code_id`.
2) Emit Graph Determinism Manifest (feeds + toolchain hashes) per scan.
3) Turn on edge-bundle DSSE for runtime/init edges first; keep Rekor cap low.
4) Surface “why path” + layer attribution in CLI/UI explainers.

## APIs (strawman)
- `POST /graph/edges: attest` — idempotent; same inputs → same edge IDs.
- `GET /findings/:id/proof` — returns call-chain + Rekor inclusion proofs.
- `GET /vex/:artifact` — streams OpenVEX with embedded proofs.

## Links
- Advisory source: `docs/product-advisories/23-Nov-2025 - Where Stella Ops Can Truly Lead.md`
- Schemas: `docs/modules/reach-graph/guides/evidence-schema.md`, `docs/modules/reach-graph/guides/hybrid-attestation.md`
- Sprint tracking: `docs/implplan/SPRINT_0401_0001_0001_reachability_evidence_chain.md`

220
docs/modules/reach-graph/guides/patch-oracles.md
Normal file
# Patch-Oracles QA Pattern

Patch oracles define expected functions and edges that must be present (or absent) in generated reachability graphs. The CI pipeline uses these oracles to ensure that:

1. Critical vulnerability paths are correctly identified as reachable
2. Mitigated paths are correctly identified as unreachable
3. Graph generation remains deterministic and complete

This document covers both the **JSON-based harness** (for reachbench integration) and the **YAML-based format** (for binary patch testing).

---

## Part A: JSON Patch-Oracle Harness (v1)

The JSON-based patch-oracle harness integrates with the reachbench fixture system for CI graph validation.

### A.1 Schema Overview

Patch-oracle fixtures follow the `patch-oracle/v1` schema:

```json
{
  "schema_version": "patch-oracle/v1",
  "id": "curl-CVE-2023-38545-socks5-heap-reachable",
  "case_ref": "curl-CVE-2023-38545-socks5-heap",
  "variant": "reachable",
  "description": "Validates SOCKS5 heap overflow path is reachable",
  "expected_functions": [...],
  "expected_edges": [...],
  "expected_roots": [...],
  "forbidden_functions": [...],
  "forbidden_edges": [...],
  "min_confidence": 0.5,
  "strict_mode": false
}
```

### A.2 Expected Functions

Define functions that MUST be present in the graph:

```json
{
  "symbol_id": "sym://curl:curl.c#sink",
  "lang": "c",
  "kind": "function",
  "purl_pattern": "pkg:github/curl/*",
  "required": true,
  "reason": "Vulnerable buffer handling function"
}
```

### A.3 Expected Edges

Define edges that MUST be present in the graph:

```json
{
  "from": "sym://net:handler#read",
  "to": "sym://curl:curl.c#entry",
  "kind": "call",
  "min_confidence": 0.8,
  "required": true,
  "reason": "Data flows from network to SOCKS5 handler"
}
```

### A.4 Forbidden Elements (for unreachable variants)

```json
{
  "forbidden_functions": [
    {
      "symbol_id": "sym://dangerous#sink",
      "reason": "Should not be reachable when feature disabled"
    }
  ],
  "forbidden_edges": [
    {
      "from": "sym://entry",
      "to": "sym://sink",
      "reason": "Path should be blocked by feature flag"
    }
  ]
}
```

### A.5 Wildcard Patterns

Symbol IDs support `*` wildcards:
- `sym://test#func1` - exact match
- `sym://test#*` - matches any symbol starting with `sym://test#`
- `*` - matches anything

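A minimal Python sketch of these matching rules, assuming `*` matches any run of characters (including an empty one) anywhere in the pattern and every other character matches literally. `fnmatch` is deliberately avoided because it also treats `?` and `[...]` as special, which this format does not document.

```python
from __future__ import annotations

import re
from functools import lru_cache


@lru_cache(maxsize=None)
def _compile(pattern: str) -> re.Pattern[str]:
    # Escape everything, then turn each escaped '*' back into '.*'.
    return re.compile(re.escape(pattern).replace(r"\*", ".*"), re.DOTALL)


def symbol_matches(pattern: str, symbol_id: str) -> bool:
    """True if `symbol_id` matches `pattern`, where only `*` is a wildcard."""
    return _compile(pattern).fullmatch(symbol_id) is not None
```

Using `fullmatch` makes `sym://test#func1` an exact match rather than a prefix match, so it does not also match `sym://test#func10`.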
### A.6 Directory Structure

```
tests/reachability/fixtures/patch-oracles/
├── INDEX.json                    # Oracle index
├── schema/
│   └── patch-oracle-v1.json      # JSON Schema
└── cases/
    ├── curl-CVE-2023-38545-socks5-heap/
    │   ├── reachable.oracle.json
    │   └── unreachable.oracle.json
    └── java-log4j-CVE-2021-44228-log4shell/
        └── reachable.oracle.json
```

### A.7 Usage in Tests

```csharp
// fixtureRoot: the patch-oracle fixture directory (see A.6).
// richGraph: the richgraph-v1 graph generated for the case under test.
var loader = new PatchOracleLoader(fixtureRoot);
var oracle = loader.LoadOracle("curl-CVE-2023-38545-socks5-heap-reachable");

var comparer = new PatchOracleComparer(oracle);
var result = comparer.Compare(richGraph);

if (!result.Success)
{
    // Report each violation; see A.8 for the violation types.
    foreach (var violation in result.Violations)
    {
        Console.WriteLine($"[{violation.Type}] {violation.From} -> {violation.To}");
    }
}
```

### A.8 Violation Types

| Type | Description |
|------|-------------|
| `MissingFunction` | Required function not found |
| `MissingEdge` | Required edge not found |
| `MissingRoot` | Required root not found |
| `ForbiddenFunctionPresent` | Forbidden function found |
| `ForbiddenEdgePresent` | Forbidden edge found |
| `UnexpectedFunction` | Unexpected function in strict mode |
| `UnexpectedEdge` | Unexpected edge in strict mode |

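To make the violation types concrete, here is a minimal Python sketch of the kind of comparison `PatchOracleComparer` performs, under simplifying assumptions: the graph is reduced to a set of function ids and a set of `(from, to)` edges; patterns use the `*` wildcard from A.5; `required` defaults to true; roots, `min_confidence`, and `purl_pattern` are not modeled. It mirrors the violation names in the table but is not the C# implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def _matches(pattern: str, value: str) -> bool:
    # Only '*' is a wildcard (A.5); every other character is literal.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, value, re.DOTALL) is not None


@dataclass(frozen=True)
class Violation:
    type: str     # one of the A.8 violation type names
    subject: str  # offending symbol id or "from->to" edge


def compare(oracle: dict, functions: set[str],
            edges: set[tuple[str, str]]) -> list[Violation]:
    """Check a simplified graph (function ids + (from, to) edges) against an oracle dict."""
    out: list[Violation] = []

    def edge_matches(spec: dict, a: str, b: str) -> bool:
        return _matches(spec["from"], a) and _matches(spec["to"], b)

    # Required elements ("required" assumed to default to true).
    for fn in oracle.get("expected_functions", []):
        if fn.get("required", True) and not any(_matches(fn["symbol_id"], f) for f in functions):
            out.append(Violation("MissingFunction", fn["symbol_id"]))
    for spec in oracle.get("expected_edges", []):
        if spec.get("required", True) and not any(edge_matches(spec, a, b) for a, b in edges):
            out.append(Violation("MissingEdge", f'{spec["from"]}->{spec["to"]}'))

    # Forbidden elements (typically used by unreachable variants).
    for fn in oracle.get("forbidden_functions", []):
        out += [Violation("ForbiddenFunctionPresent", f)
                for f in sorted(functions) if _matches(fn["symbol_id"], f)]
    for spec in oracle.get("forbidden_edges", []):
        out += [Violation("ForbiddenEdgePresent", f"{a}->{b}")
                for a, b in sorted(edges) if edge_matches(spec, a, b)]

    # Strict mode: anything not covered by an expected pattern is unexpected.
    if oracle.get("strict_mode", False):
        fn_patterns = [fn["symbol_id"] for fn in oracle.get("expected_functions", [])]
        edge_specs = oracle.get("expected_edges", [])
        out += [Violation("UnexpectedFunction", f) for f in sorted(functions)
                if not any(_matches(p, f) for p in fn_patterns)]
        out += [Violation("UnexpectedEdge", f"{a}->{b}") for a, b in sorted(edges)
                if not any(edge_matches(s, a, b) for s in edge_specs)]
    return out
```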
---

## Part B: YAML Binary Patch-Oracles

The YAML-based format is used for paired vulnerable/fixed binary testing.

### B.1 Workflow (per CVE)

1) Pick a CVE with a small, clean fix (e.g., OpenSSL, zlib, BusyBox). Identify vulnerable commit `A` and fixed commit `B`.
2) Build two stripped binaries (`vuln`, `fixed`) with identical toolchains/flags; keep a tiny harness that exercises the affected path.
3) Run Scanner binary analyzers to emit `richgraph-v1` for each binary.
4) Diff graphs: expect new/removed functions and edges to match the patch (e.g., `foo_parse -> validate_len` added; `foo_parse -> memcpy` removed).
5) Fail the test if expected functions/edges are absent or unchanged.

### B.2 Oracle manifest (YAML)
|
||||
|
||||
```yaml
|
||||
cve: CVE-YYYY-XXXX
|
||||
target: libfoo 1.2.3
|
||||
build:
|
||||
cc: clang
|
||||
cflags: [-O2, -fno-omit-frame-pointer]
|
||||
ldflags: []
|
||||
strip: true
|
||||
expect:
|
||||
functions_added: [validate_len]
|
||||
functions_removed: [unsafe_copy]
|
||||
edges_added:
|
||||
- { caller: foo_parse, callee: validate_len }
|
||||
edges_removed:
|
||||
- { caller: foo_parse, callee: memcpy }
|
||||
tolerances:
|
||||
allow_unresolved_symbols: 0
|
||||
allow_extra_funcs: 2
|
||||
```
|
||||
|
||||
Place manifests under `tests/reachability/patch-oracles/<cve>/oracle.yml` next to the sources/build scripts.

### B.3 Repository layout

```
tests/reachability/patch-oracles/
  CVE-YYYY-XXXX-foo/
    src/        # vuln + fixed sources + harness
    build.sh    # produces ./out/vuln ./out/fixed
    oracle.yml
```

### B.4 Harness rules

- Output binaries to `out/vuln` and `out/fixed` with deterministic flags and stripped symbols.
- Record toolchain version in a sidecar `build-meta.json` so Replay captures provenance.
- Never download from the internet during CI; vendor tiny sources into the fixture folder.

### B.5 Test runner expectations

- Runs Scanner binary analyzers on both binaries; emits `richgraph-v1` CAS entries.
- Compares graphs against `oracle.yml` expectations (functions/edges added/removed, tolerances).
- Fails when deltas are missing; succeeds when expected guards/edges are present.
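
The comparison step can be sketched in Python as follows. This is a minimal illustration, not the `StellaOps.Scanner.Binary.PatchOracleTests` implementation: `Graph` and `check_oracle` are hypothetical names, `expect`/`tolerances` are the dicts a YAML loader would return for the manifest above, `allow_extra_funcs` is assumed to cap unexpected added functions, and `allow_unresolved_symbols` is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """Hypothetical stand-in for a parsed richgraph-v1 document."""
    functions: frozenset  # function names
    edges: frozenset      # (caller, callee) tuples


def check_oracle(vuln: Graph, fixed: Graph, expect: dict, tolerances: dict) -> list:
    """Compare the vuln -> fixed graph delta with oracle.yml; [] means pass."""
    added_funcs = fixed.functions - vuln.functions
    removed_funcs = vuln.functions - fixed.functions
    added_edges = fixed.edges - vuln.edges
    removed_edges = vuln.edges - fixed.edges
    failures = []

    for name in expect.get("functions_added", []):
        if name not in added_funcs:
            failures.append(f"expected function {name} not added")
    for name in expect.get("functions_removed", []):
        if name not in removed_funcs:
            failures.append(f"expected function {name} not removed")
    for e in expect.get("edges_added", []):
        if (e["caller"], e["callee"]) not in added_edges:
            failures.append(f"expected edge {e['caller']}->{e['callee']} missing")
    for e in expect.get("edges_removed", []):
        if (e["caller"], e["callee"]) not in removed_edges:
            failures.append(f"expected edge {e['caller']}->{e['callee']} still present")

    # Tolerate a bounded number of unexpected new functions (assumed meaning
    # of allow_extra_funcs); allow_unresolved_symbols is not modeled here.
    extra = added_funcs - set(expect.get("functions_added", []))
    if len(extra) > tolerances.get("allow_extra_funcs", 0):
        failures.append(f"unexpected functions added: {sorted(extra)}")
    return sorted(failures)  # stable order keeps CI output deterministic
```

Returning a sorted list of messages (rather than raising on the first problem) lets a failing run report every missing delta at once, in the `expected edge ... missing` style the acceptance criteria call for.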

### B.6 Integration points

- **Scanner**: add fixture runner under `tests/reachability/StellaOps.Scanner.Binary.PatchOracleTests`.
- **CI**: wire into reachbench/patch-oracles job; ensure artifacts are small and deterministic.
- **Docs**: link this file from reachability delivery guide once tests are live.

### B.7 Acceptance criteria

- At least three seed oracles (e.g., zlib overflow, OpenSSL length guard, BusyBox ash fix) committed with passing expectations.
- CI job proves deterministic hashes across reruns.
- Failures emit clear diffs (`expected edge foo->validate_len missing`).

---

## Related Documentation

- [Reachability Evidence Chain](./function-level-evidence.md)
- [RichGraph Schema](../contracts/richgraph-v1.md)
- [Ground Truth Schema](./ground-truth-schema.md)
- [Lattice States](./lattice.md)
- [Reachability Delivery Guide](./DELIVERY_GUIDE.md)
269
docs/modules/reach-graph/guides/policy-gate.md
Normal file
@@ -0,0 +1,269 @@
# Reachability Evidence Policy Gates

> **Status:** Design v1 (Sprint 0401)
> **Owners:** Policy Guild, Signals Guild, VEX Guild

This document defines the policy gates that enforce reachability evidence requirements for VEX decisions. Gates prevent unsafe "not_affected" claims when evidence is insufficient.

---

## 1. Overview

Policy gates act as checkpoints between evidence (reachability lattice state, uncertainty tier) and VEX status transitions. They ensure that:

1. **No false safety:** "not_affected" requires strong evidence of unreachability
2. **Explicit uncertainty:** Missing evidence triggers "under_investigation" rather than silence
3. **Audit trail:** All gate decisions are logged with evidence references

---

## 2. Gate Types

### 2.1 Lattice State Gate

Guards VEX status transitions based on the v1 lattice state (see `docs/modules/reach-graph/guides/lattice.md` §9).

| Requested VEX Status | Required Lattice State | Gate Action |
|---------------------|------------------------|-------------|
| `not_affected` | `CU` (ConfirmedUnreachable) | ✅ Allow |
| `not_affected` | `SU` (StaticallyUnreachable) | ⚠️ Allow with warning, requires `justification` |
| `not_affected` | `RU` (RuntimeUnobserved) | ⚠️ Allow with warning, requires `justification` |
| `not_affected` | `U`, `SR`, `RO`, `CR`, `X` | ❌ Block |
| `affected` | `CR` (ConfirmedReachable) | ✅ Allow |
| `affected` | `SR`, `RO` | ✅ Allow |
| `affected` | `U`, `SU`, `RU`, `CU`, `X` | ⚠️ Warn (potential false positive) |
| `under_investigation` | Any | ✅ Allow (safe default) |
| `fixed` | Any | ✅ Allow (remediation action) |
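
The lattice-state table can be read as a lookup. A minimal Python sketch follows; `Action` and `lattice_gate` are illustrative names, not the Policy Gateway API. Blocking a weak-state (`SU`/`RU`) `not_affected` request that arrives without a `justification` is an assumption about how "requires `justification`" is enforced, and the stricter contested-state rules of §5 are not modeled.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"    # permitted, but flagged for review
    BLOCK = "block"


STRONG_UNREACHABLE = {"CU"}
WEAK_UNREACHABLE = {"SU", "RU"}
REACHABLE = {"CR", "SR", "RO"}


def lattice_gate(status: str, state: str,
                 justification: Optional[str] = None) -> Action:
    """Map a requested VEX status and v1 lattice state to a gate action."""
    if status in ("under_investigation", "fixed"):
        return Action.ALLOW  # safe default / remediation action, any state
    if status == "not_affected":
        if state in STRONG_UNREACHABLE:
            return Action.ALLOW
        if state in WEAK_UNREACHABLE:
            # Allowed with a warning only when a justification is supplied
            # (blocking otherwise is an assumption of this sketch).
            return Action.WARN if justification else Action.BLOCK
        return Action.BLOCK  # U, SR, RO, CR, X
    if status == "affected":
        # U, SU, RU, CU, X only warn: possible false positive.
        return Action.ALLOW if state in REACHABLE else Action.WARN
    raise ValueError(f"unknown VEX status: {status}")
```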

### 2.2 Uncertainty Tier Gate

Guards VEX status transitions based on the uncertainty tier (see `uncertainty-entropy.md` §1.1).

| Requested VEX Status | Uncertainty Tier | Gate Action |
|---------------------|------------------|-------------|
| `not_affected` | T1 (High) | ❌ Block |
| `not_affected` | T2 (Medium) | ⚠️ Warn, require explicit override |
| `not_affected` | T3 (Low) | ⚠️ Allow with advisory note |
| `not_affected` | T4 (Negligible) | ✅ Allow |
| `affected` | T1 (High) | ⚠️ Review required (may be false positive) |
| `affected` | T2-T4 | ✅ Allow |
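
A comparable sketch for the uncertainty-tier rules, returning an action plus an optional note. `tier_gate` and `has_override` are hypothetical names, and treating "require explicit override" as "block unless an override is present" is an assumption.

```python
from typing import Optional, Tuple


def tier_gate(status: str, tier: str,
              has_override: bool = False) -> Tuple[str, Optional[str]]:
    """Return (action, note) for the uncertainty-tier table."""
    if status == "not_affected":
        if tier == "T1":
            return "block", "T1 uncertainty is too high for not_affected"
        if tier == "T2":
            # Assumed: without an explicit override the request cannot proceed.
            if has_override:
                return "warn", "T2 allowed via explicit override"
            return "block", "T2 requires an explicit override"
        if tier == "T3":
            return "allow", "advisory: T3 uncertainty"
        return "allow", None  # T4
    if status == "affected" and tier == "T1":
        return "warn", "review required: possible false positive"
    return "allow", None
```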

### 2.3 Evidence Completeness Gate

Guards based on the presence of required evidence artifacts.

| VEX Status | Required Evidence | Gate Action if Missing |
|------------|-------------------|----------------------|
| `not_affected` | `graphHash` (DSSE-attested) | ❌ Block |
| `not_affected` | `pathAnalysis.pathLength >= 0` | ❌ Block |
| `not_affected` | `confidence >= 0.8` | ⚠️ Warn if < 0.8 |
| `affected` | `graphHash` OR `runtimeProbe` | ⚠️ Warn if neither |
| `under_investigation` | None required | ✅ Allow |

---

## 3. Gate Evaluation Order

Gates are evaluated in this order; first blocking gate stops evaluation:

```
1. Evidence Completeness Gate  → Block if required evidence missing
2. Lattice State Gate          → Block if state incompatible with status
3. Uncertainty Tier Gate       → Block/warn based on tier
4. Confidence Threshold Gate   → Warn if confidence below threshold
```
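
The short-circuiting order can be sketched as a loop over named gate callables. This is an illustration under stated assumptions: `evaluate_gates` and the two gate functions are hypothetical, each gate is heavily simplified (only the `graphHash` rule and the 0.8 confidence threshold), and the returned dict only loosely follows the decision document in §4.

```python
from typing import Callable, List, Tuple

# A gate inspects the request and returns (result, reason). Results reuse the
# vocabulary of the decision document: "pass", "pass_with_note", "warn", "block".
Gate = Callable[[dict], Tuple[str, str]]


def evaluate_gates(request: dict, gates: List[Tuple[str, Gate]]) -> dict:
    """Run gates in order; the first "block" result stops evaluation."""
    results = []
    for name, gate in gates:
        result, reason = gate(request)
        results.append({"name": name, "result": result, "reason": reason})
        if result == "block":
            return {"decision": "block", "blockedBy": name, "gates": results}
    warnings = [r["name"] for r in results if r["result"] == "warn"]
    return {"decision": "allow", "warnings": warnings, "gates": results}


def evidence_completeness(req: dict) -> Tuple[str, str]:
    # Simplified: only the graphHash rule for not_affected.
    if req["status"] == "not_affected" and not req.get("graphHash"):
        return "block", "graphHash missing"
    return "pass", "required evidence present"


def confidence_threshold(req: dict, minimum: float = 0.8) -> Tuple[str, str]:
    if req["status"] == "not_affected" and req.get("confidence", 0.0) < minimum:
        return "warn", f"confidence below {minimum}"
    return "pass", "confidence ok"
```

Usage: pass `[("EvidenceCompleteness", evidence_completeness), ("ConfidenceThreshold", confidence_threshold)]` as `gates`. The lattice and tier gates would slot in between them in the same way.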

---

## 4. Gate Decision Document

Each gate evaluation produces a decision document:

```json
{
  "gateId": "gate:vex:not_affected:2025-12-13T10:00:00Z",
  "requestedStatus": "not_affected",
  "subject": {
    "vulnId": "CVE-2025-12345",
    "purl": "pkg:maven/com.example/foo@1.0.0",
    "symbolId": "sym:java:..."
  },
  "evidence": {
    "latticeState": "CU",
    "uncertaintyTier": "T3",
    "graphHash": "blake3:...",
    "riskScore": 0.25,
    "confidence": 0.92
  },
  "gates": [
    {
      "name": "EvidenceCompleteness",
      "result": "pass",
      "reason": "graphHash present"
    },
    {
      "name": "LatticeState",
      "result": "pass",
      "reason": "CU allows not_affected"
    },
    {
      "name": "UncertaintyTier",
      "result": "pass_with_note",
      "reason": "T3 allows with advisory note",
      "note": "MissingPurl uncertainty at 35% entropy"
    }
  ],
  "decision": "allow",
  "advisory": "VEX status allowed with note: T3 uncertainty from MissingPurl",
  "decidedAt": "2025-12-13T10:00:00Z"
}
```

---

## 5. Contested State Handling

When lattice state is `X` (Contested):

1. **Block all definitive statuses:** Neither "not_affected" nor "affected" allowed
2. **Force "under_investigation":** Auto-assign until triage resolves conflict
3. **Emit triage event:** Notify VEX operators of conflict with evidence links
4. **Evidence overlay:** Show both static and runtime evidence for manual review

### Contested Resolution Workflow

```
1. System detects X state
2. VEX status locked to "under_investigation"
3. Triage event emitted to operator queue
4. Operator reviews:
   a. Static evidence (graph, paths)
   b. Runtime evidence (probes, hits)
5. Operator provides resolution:
   a. Trust static → state becomes SU/SR
   b. Trust runtime → state becomes RU/RO
   c. Add new evidence → recompute lattice
6. Gate re-evaluates with new state
```

---

## 6. Override Mechanism

Operators with `vex:gate:override` permission can bypass gates with mandatory fields:

```json
{
  "override": {
    "gateId": "gate:vex:not_affected:...",
    "operator": "user:alice@example.com",
    "justification": "Manual review confirms code path is dead code",
    "evidence": {
      "type": "ManualReview",
      "reviewId": "review:2025-12-13:001",
      "attachments": ["cas://evidence/review/..."]
    },
    "approvedAt": "2025-12-13T11:00:00Z",
    "expiresAt": "2026-01-13T11:00:00Z"
  }
}
```

Override requirements:

- `justification` is mandatory and logged
- Overrides expire after configurable period (default: 30 days)
- All overrides are auditable and appear in compliance reports
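
Whether an override still applies can be sketched as a pure function of the override record and the current time. `override_is_active` and `parse_utc` are hypothetical helpers. Falling back to `approvedAt` plus the default 30 days when `expiresAt` is absent is an assumption about how the default is applied.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_EXPIRATION_DAYS = 30  # mirrors Override.DefaultExpirationDays


def parse_utc(ts: str) -> datetime:
    """Parse the "...Z" timestamps used in the examples as aware UTC datetimes."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def override_is_active(override: dict, now: datetime) -> bool:
    """An override applies only if it is justified and not yet expired."""
    if not override.get("justification", "").strip():
        return False  # justification is mandatory
    approved = parse_utc(override["approvedAt"])
    expires_raw: Optional[str] = override.get("expiresAt")
    expires = (parse_utc(expires_raw) if expires_raw
               else approved + timedelta(days=DEFAULT_EXPIRATION_DAYS))
    return approved <= now < expires
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the check deterministic and replayable.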

---

## 7. Configuration

Gate thresholds are configurable via `PolicyGatewayOptions`:

```yaml
PolicyGateway:
  Gates:
    LatticeState:
      AllowSUForNotAffected: true   # Allow SU with warning
      AllowRUForNotAffected: true   # Allow RU with warning
      RequireJustificationForWeakStates: true
    UncertaintyTier:
      BlockT1ForNotAffected: true
      WarnT2ForNotAffected: true
    EvidenceCompleteness:
      RequireGraphHashForNotAffected: true
      MinConfidenceForNotAffected: 0.8
      MinConfidenceWarning: 0.6
  Override:
    DefaultExpirationDays: 30
    RequireJustification: true
```

---

## 8. API Integration

### POST `/api/v1/vex/status`

Request:

```json
{
  "vulnId": "CVE-2025-12345",
  "purl": "pkg:maven/com.example/foo@1.0.0",
  "status": "not_affected",
  "justification": "vulnerable_code_not_present",
  "reachabilityEvidence": {
    "factDigest": "sha256:...",
    "graphHash": "blake3:..."
  }
}
```

Response (gate blocked):

```json
{
  "success": false,
  "gateDecision": {
    "decision": "block",
    "blockedBy": "LatticeState",
    "reason": "Lattice state SR (StaticallyReachable) incompatible with not_affected",
    "currentState": "SR",
    "requiredStates": ["CU", "SU", "RU"],
    "suggestion": "Submit runtime probe evidence or change to under_investigation"
  }
}
```

---

## 9. Metrics & Alerts

The policy gateway emits metrics:

| Metric | Labels | Description |
|--------|--------|-------------|
| `stellaops_gate_decisions_total` | `gate`, `result`, `status` | Total gate decisions |
| `stellaops_gate_blocks_total` | `gate`, `reason` | Total blocked requests |
| `stellaops_gate_overrides_total` | `operator` | Total override uses |
| `stellaops_contested_states_total` | `vulnId` | Active contested states |

Alert conditions:
- `stellaops_gate_overrides_total` rate > threshold → Audit review
- `stellaops_contested_states_total` > 10 → Triage backlog alert

---

## 10. Related Documents

- [Lattice Model](./lattice.md) — v1 formal 7-state lattice
- [Uncertainty States](uncertainty-entropy.md) — Tier definitions and risk scoring
- [Evidence Schema](./evidence-schema.md) — richgraph-v1 schema
- [VEX Contract](../contracts/vex-v1.md) — VEX document schema

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Policy Guild | Initial design from Sprint 0401 |
51
docs/modules/reach-graph/guides/purl-resolved-edges.md
Normal file
@@ -0,0 +1,51 @@
# PURL-Resolved Callgraph Edges (Nov 2026)

This note captures the required behavior for joining binary callgraphs with SBOM components using **purl + symbol digest** annotations. It replaces any pointer to prior advisories; everything needed to ship the feature is here.

## 1. Goal

Annotate every call edge in `richgraph-v1` with:

- `purl` of the component that defines the callee, and
- a stable `symbol_digest` (hash of normalized signature plus optional instruction fingerprint).

This lets graphs from multiple binaries merge naturally and line up with SBOM entries, so reachability answers “is the vulnerable function reachable in my deployment?” without re-identifying components.

## 2. Data model additions

- **Node**: `SymbolNode` gains `purl` and `symbol_digest` fields (sha256 of normalized signature; include demangled name and parameter types; optionally append block hash for stripped code).
- **Edge**: `CallEdge` gains `purl` (callee owner) and `symbol_digest`; keep existing `kind`/`evidence` fields. When callee resolution is ambiguous, include `candidates[]` with ranked purls and set `confidence` accordingly.
- **Provenance**: store analyzer fingerprint (`analyzer`, `version`, `toolchain_digest`) and graph hash in CAS metadata.

## 3. Producer rules

1) **Map callee → file → SBOM component**. Use import tables (ELF DT_NEEDED + reloc, PE IAT, Mach-O stubs) or resolved path. If multiple candidates, emit `candidates[]` and lower confidence.
2) **Compute symbol digest**. Normalize the signature, demangle if possible, lowercase type names, strip addresses, then sha256 the canonical form. For stripped symbols, combine synthetic name and code block hash.
3) **Attach to edges**. For every `call` edge, set `purl` and `symbol_digest`. If callee is external but unresolved, emit `purl:"pkg:unknown"` and also write an Unknowns entry (see signals unknowns registry).
4) **Determinism**. Sort nodes and edges before hashing; keep evidence arrays sorted (`import`, `reloc`, `disasm`, `runtime`). Graph hash uses BLAKE3 over canonical JSON.
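
Rule 2 can be sketched with `hashlib`. The exact normalization is not specified above, so the rules here are illustrative assumptions: hexadecimal addresses and `+ 0x..` offsets are stripped, whitespace is collapsed, spaces around punctuation are dropped, and only the parameter list is lowercased (keeping the symbol name as-is). Demangling is assumed to happen upstream, and the `|` separator for the code-block hash is hypothetical.

```python
import hashlib
import re


def normalize_signature(demangled: str) -> str:
    """Canonicalize a demangled signature (illustrative rules, see lead-in)."""
    sig = re.sub(r"\s*\+?\s*0x[0-9a-fA-F]+", "", demangled)  # strip addresses/offsets
    sig = re.sub(r"\s+", " ", sig).strip()                    # collapse whitespace
    sig = re.sub(r"\s*([(),*&<>])\s*", r"\1", sig)           # no spaces around punctuation
    name, paren, params = sig.partition("(")
    return name + paren + params.lower()  # lowercase type names, keep symbol name


def symbol_digest(demangled: str, block_hash: str = "") -> str:
    canonical = normalize_signature(demangled)
    if block_hash:  # stripped code: bind the synthetic name to its code block hash
        canonical = f"{canonical}|{block_hash}"
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With these rules, `foo_parse(Char const*, unsigned long) + 0x1a` and `foo_parse(char const *,unsigned long)` produce the same digest. This is the property that lets edges from different binaries merge.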

## 4. Consumer rules

- **Signals**: merge edges from many binaries by `(purl, symbol_digest)`; keep multiple `site` entries. Store in `call_edges` with `purl` as the join key for SBOM overlays.
- **Policy/VEX**: treat `reachable` if any entrypoint path hits a `symbol_digest` that matches an affected function for the CVE purl.
- **UI/CLI**: display `purl@version` plus demangled name; show site offsets for debugging; show confidence when candidates were present.
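
The Signals merge rule can be sketched as grouping by the `(purl, symbol_digest)` key while accumulating call sites. The flat edge shape `{"purl", "symbol_digest", "site"}` and the `merge_edges` name are assumptions made for illustration. Sites and keys are sorted so the merged result does not depend on the order in which binaries were ingested.

```python
from typing import Dict, Iterable, List, Tuple


def merge_edges(graphs: Iterable[List[dict]]) -> Dict[Tuple[str, str], dict]:
    """Merge call edges from many binaries keyed by (purl, symbol_digest)."""
    merged: Dict[Tuple[str, str], dict] = {}
    for edges in graphs:
        for e in edges:
            key = (e["purl"], e["symbol_digest"])
            entry = merged.setdefault(
                key, {"purl": e["purl"], "symbol_digest": e["symbol_digest"], "sites": []})
            if e["site"] not in entry["sites"]:
                entry["sites"].append(e["site"])  # keep every distinct call site
    for entry in merged.values():
        entry["sites"].sort()  # independent of ingest order
    return dict(sorted(merged.items()))
```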

## 5. SBOM join strategy

1) Use `purl` from component resolver; if absent, fall back to `build_id` plus hash match and emit `purl:"pkg:unknown"`.
2) When multiple SBOM components share a purl, keep all matches but prefer those whose file hash equals the binary under analysis.
3) For runtime traces, attach the same `symbol_digest` so runtime hits boost confidence on the correct edge.

## 6. Acceptance tests

- Imports-only: edge from binary main to `pkg:deb/ubuntu/openssl@3.0.2` `symbol_digest=sha256:...` must appear without running disassembly.
- Disassembly: direct `call` to internal function carries `purl` of the hosting binary’s SBOM entry.
- Ambiguity: when two candidate purls exist, graph stores `candidates[2]` and `confidence < 1`.
- Graph hash stability: reordering analyzer flags does not change BLAKE3 hash.

## 7. Deliverables

- Update `richgraph-v1` schema and DTOs (Scanner + Signals).
- Persist `purl`/`symbol_digest` in Mongo `call_edges` and CAS manifests.
- CLI: extend `stella reachability upload-callgraph` and `stella graph explain` to surface `purl` plus digest.
- Docs: reference this file from Scanner, Signals, and Reachability guides once implemented.
48
docs/modules/reach-graph/guides/reachability.md
Normal file
@@ -0,0 +1,48 @@
# Reachability · Runtime + Static Union (v0.1)

## What this covers
- End-to-end flow for combining static callgraphs (Scanner) and runtime traces (Zastava) into replayable reachability bundles.
- Storage layout (CAS namespaces), manifest fields, and Signals APIs that consume/emit reachability facts.
- How unknowns/pressure and scoring are derived so Policy/UI can explain outcomes.

## Pipeline (at a glance)
1. **Scanner** emits language-specific callgraphs as `richgraph-v1` and packs them into CAS under `reachability_graphs/<digest>.tar.zst` with manifest `meta.json`.
2. **Zastava Observer** streams NDJSON runtime facts (`symbol_id`, `code_id`, `hit_count`, `loader_base`, `cas_uri`) to Signals `POST /signals/runtime-facts` or `/runtime-facts/ndjson`.
3. **Union bundles** (runtime + static) are uploaded as ZIP to `POST /signals/reachability/union` with optional `X-Analysis-Id`; Signals stores them under `reachability_graphs/{analysisId}/`.
4. **Signals scoring** consumes union data + runtime facts, computes per-target states (bucket, weight, confidence, score), fact-level score, unknowns pressure, and publishes `signals.fact.updated@v1` events.
5. **Replay** records provenance: the reachability section in the replay manifest lists CAS URIs (graphs + runtime traces), namespaces, analyzer/version, callgraphIds, and the shared `analysisId`.

## Storage & CAS namespaces
- Static graphs: `cas://reachability_graphs/<hh>/<sha>.tar.zst` (meta.json + graph files).
- Runtime traces: `cas://runtime_traces/<hh>/<sha>.tar.zst` (NDJSON or zipped stream).
- Replay manifest now includes `analysisId` to correlate graphs/traces; each reference also carries `namespace` and `callgraphId` (static) for unambiguous replay.

## Signals API quick reference
- `POST /signals/runtime-facts` — structured request body; recomputes reachability.
- `POST /signals/runtime-facts/ndjson` — streaming NDJSON/gzip; requires `callgraphId`, passed as header/query parameters.
- `POST /signals/reachability/union` — upload ZIP bundle; optional `X-Analysis-Id`.
- `GET /signals/reachability/union/{analysisId}/meta` — returns meta.json.
- `GET /signals/reachability/union/{analysisId}/files/{fileName}` — download bundled graph/trace files.
- `GET /signals/facts/{subjectKey}` — fetch latest reachability fact (includes unknowns counters and targets).

## Scoring and unknowns
- Buckets (default weights): entrypoint 1.0, direct 0.85, runtime 0.45, unknown 0.5, unreachable 0.0.
- Confidence: reachable vs unreachable base, runtime bonus, clamped between Min/Max (defaults 0.05–0.99).
- Unknowns: Signals counts unresolved symbols/edges per subject; `UnknownsPressure = unknowns / (states + unknowns)` (capped). Fact score is reduced by `UnknownsPenaltyCeiling` (default 0.35) × pressure.
- Events: `signals.fact.updated@v1` now emits `unknownsCount` and `unknownsPressure` plus bucket/weight/stateCount/targets.
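
The unknowns arithmetic can be sketched as below. The names mirror the option names above, but the code is not `ReachabilityScoringService`. Two readings are assumptions: the pressure cap defaults to 1.0, and "reduced by ceiling × pressure" is interpreted multiplicatively (`score × (1 − ceiling × pressure)`). A subtractive reading is equally plausible from the text.

```python
def clamp_confidence(value: float, lo: float = 0.05, hi: float = 0.99) -> float:
    """Clamp confidence into the default Min/Max window."""
    return max(lo, min(hi, value))


def unknowns_pressure(unknowns: int, states: int, cap: float = 1.0) -> float:
    """UnknownsPressure = unknowns / (states + unknowns), capped."""
    total = states + unknowns
    if total == 0:
        return 0.0  # nothing observed: no pressure
    return min(unknowns / total, cap)


def penalized_fact_score(score: float, unknowns: int, states: int,
                         penalty_ceiling: float = 0.35) -> float:
    """Apply the unknowns penalty (multiplicative reading, an assumption)."""
    pressure = unknowns_pressure(unknowns, states)
    return score * (1.0 - penalty_ceiling * pressure)
```

Worked example: 2 unknowns against 6 states gives a pressure of 2 / 8 = 0.25. The penalty is then 0.35 × 0.25 = 0.0875, so a fact score of 0.8 drops to 0.8 × 0.9125 = 0.73.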

## Replay contract changes (v0.1 add-ons)
- `reachability.analysisId` (string, optional) — ties to Signals union ingest.
- Graph refs include `namespace`, `callgraphId`, analyzer, version, sha256, casUri.
- Runtime trace refs include `namespace`, recordedAt, sha256, casUri.

## Operator checklist
- Use deterministic CAS paths; never embed absolute file paths.
- When emitting runtime NDJSON, include `loader_base` and `code_id` when available for de-dup.
- Ensure `analysisId` is propagated from Scanner/Zastava into Signals ingest to keep replay manifests linked.
- Keep feeds frozen for reproducibility; avoid external downloads in union preparation.

## References
- Schema: `docs/modules/reach-graph/schemas/runtime-static-union-schema.md`
- Delivery guide: `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`
- Unknowns registry & scoring: Signals code (`ReachabilityScoringService`, `UnknownsIngestionService`) and events doc `docs/modules/signals/guides/events-24-005.md`.
332
docs/modules/reach-graph/guides/replay-verification.md
Normal file
@@ -0,0 +1,332 @@
# Replay Verification

_Last updated: 2025-12-22. Owner: Scanner Guild._

This document describes the **replay verification** workflow that ensures reachability slices are reproducible and tamper-evident.

---

## 1. Overview

Replay verification answers: *"Given the same inputs, do we get the exact same slice?"*

This is critical for:
- **Audit trails**: Prove analysis results are genuine
- **Tamper detection**: Detect modified inputs or results
- **Debugging**: Identify sources of non-determinism
- **Compliance**: Demonstrate reproducible security analysis

---

## 2. Replay Workflow

```
┌─────────────────┐     ┌──────────────────┐     ┌───────────────────┐
│ Original        │     │ Rehydrate        │     │ Recompute         │
│ Slice           │────►│ Inputs           │────►│ Slice             │
│ (with digest)   │     │ from CAS         │     │ (fresh)           │
└─────────────────┘     └──────────────────┘     └───────────────────┘
                                                           │
                                                           ▼
                                                 ┌───────────────────┐
                                                 │ Compare           │
                                                 │ byte-for-byte     │
                                                 └───────────────────┘
                                                           │
                                             ┌─────────────┴─────────────┐
                                             ▼                           ▼
                                        ┌──────────┐                ┌──────────┐
                                        │  MATCH   │                │ MISMATCH │
                                        │    ✓     │                │  + diff  │
                                        └──────────┘                └──────────┘
```

---

## 3. API Reference

### 3.1 Replay Endpoint

```http
POST /api/slices/replay
Content-Type: application/json

{
  "sliceDigest": "blake3:a1b2c3d4..."
}
```

### 3.2 Response Format

**Match Response (200 OK)**:
```json
{
  "match": true,
  "originalDigest": "blake3:a1b2c3d4...",
  "recomputedDigest": "blake3:a1b2c3d4...",
  "replayedAt": "2025-12-22T10:00:00Z",
  "inputsVerified": true
}
```

**Mismatch Response (200 OK)**:
```json
{
  "match": false,
  "originalDigest": "blake3:a1b2c3d4...",
  "recomputedDigest": "blake3:e5f6g7h8...",
  "replayedAt": "2025-12-22T10:00:00Z",
  "diff": {
    "missingNodes": ["node:5"],
    "extraNodes": ["node:6"],
    "missingEdges": [{"from": "node:1", "to": "node:5"}],
    "extraEdges": [{"from": "node:1", "to": "node:6"}],
    "verdictDiff": {
      "original": "unreachable",
      "recomputed": "reachable"
    },
    "confidenceDiff": {
      "original": 0.95,
      "recomputed": 0.72
    }
  },
  "possibleCauses": [
    "Input graph may have been modified",
    "Analyzer version mismatch: 1.2.0 vs 1.2.1",
    "Feed version changed: nvd-2025-12-20 vs nvd-2025-12-22"
  ]
}
```

**Error Response (404 Not Found)**:
```json
{
  "error": "slice_not_found",
  "message": "Slice with digest blake3:a1b2c3d4... not found in CAS",
  "sliceDigest": "blake3:a1b2c3d4..."
}
```

---

## 4. Input Rehydration

All inputs must be CAS-addressed for replay:

### 4.1 Required Inputs

| Input | CAS Key | Description |
|-------|---------|-------------|
| Graph | `cas://graphs/{digest}` | Full RichGraph JSON |
| Binaries | `cas://binaries/{digest}` | Binary file hashes |
| SBOM | `cas://sboms/{digest}` | CycloneDX/SPDX document |
| Policy | `cas://policies/{digest}` | Policy DSL |
| Feeds | `cas://feeds/{version}` | Advisory feed snapshot |

### 4.2 Manifest Contents

```json
{
  "manifest": {
    "analyzerVersion": "scanner.native:1.2.0",
    "rulesetHash": "sha256:abc123...",
    "feedVersions": {
      "nvd": "2025-12-20",
      "osv": "2025-12-20",
      "ghsa": "2025-12-20"
    },
    "createdAt": "2025-12-22T10:00:00Z",
    "toolchain": "iced-x86:1.21.0",
    "environment": {
      "os": "linux",
      "arch": "x86_64"
    }
  }
}
```

---

## 5. Determinism Requirements

For byte-for-byte reproducibility:

### 5.1 JSON Canonicalization

```
1. Keys sorted alphabetically at all levels
2. No whitespace (compact JSON)
3. UTF-8 encoding
4. Lowercase hex for all hashes
5. Numbers: no trailing zeros, scientific notation for large values
```
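
Rules 1 to 3 map directly onto options of Python's standard `json` module, as the sketch below shows (`canonical_json` is a hypothetical helper name). Rules 4 and 5 are not covered. Hash strings are emitted exactly as given, and Python's float formatting (for example `1e16` becomes `1e+16`, while `1e15` becomes `1000000000000000.0`) does not necessarily match rule 5, so a custom number encoder would likely be needed there.

```python
import json


def canonical_json(obj) -> bytes:
    """Serialize obj following rules 1-3 of the canonicalization list."""
    text = json.dumps(
        obj,
        sort_keys=True,         # 1. keys sorted at all levels (by code point)
        separators=(",", ":"),  # 2. compact: no whitespace after , or :
        ensure_ascii=False,     # keep non-ASCII characters as-is ...
        allow_nan=False,        # NaN/Infinity are not valid JSON; fail loudly
    )
    return text.encode("utf-8")  # 3. ... and encode the result as UTF-8
```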

### 5.2 Graph Ordering

```
Nodes: sorted by symbolId (lexicographic)
Edges: sorted by (from, to) tuple (lexicographic)
Paths: sorted by first node, then path length
```
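
A sketch of these ordering rules, assuming nodes and edges are parsed JSON dicts and paths are non-empty lists of node IDs (`order_graph` is a hypothetical name). The rules leave ties between paths with the same first node and length unspecified. This sketch therefore adds the full path as a final tie-breaker, which is an assumption, so that the output never depends on input order.

```python
from typing import List


def order_graph(nodes: List[dict], edges: List[dict], paths: List[List[str]]):
    """Return nodes, edges, and paths in canonical order before serialization."""
    ordered_nodes = sorted(nodes, key=lambda n: n["symbolId"])
    ordered_edges = sorted(edges, key=lambda e: (e["from"], e["to"]))
    # First node, then length; the whole path breaks remaining ties (assumption).
    ordered_paths = sorted(paths, key=lambda p: (p[0], len(p), p))
    return ordered_nodes, ordered_edges, ordered_paths
```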

### 5.3 Timestamp Handling

```
All timestamps: UTC, ISO-8601, with 'Z' suffix
Example: "2025-12-22T10:00:00Z"
No milliseconds unless significant
```

### 5.4 Floating Point

```
Confidence values: round to 6 decimal places
Example: 0.950000, not 0.95 or 0.9500001
```
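
Both rules can be sketched with the standard library (`format_timestamp` and `format_confidence` are hypothetical helper names). For simplicity the timestamp helper always drops sub-second precision rather than deciding when milliseconds are "significant", and it rejects naive datetimes because their UTC instant is ambiguous.

```python
from datetime import datetime, timezone


def format_timestamp(ts: datetime) -> str:
    """UTC, ISO-8601, 'Z' suffix; sub-second precision is dropped."""
    if ts.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before formatting")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def format_confidence(value: float) -> str:
    """Fixed six decimal places, matching the example above."""
    return f"{value:.6f}"
```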

---

## 6. Diff Computation

When slices don't match:

### 6.1 Diff Algorithm

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SliceDiff:
    missing_nodes: list = field(default_factory=list)
    extra_nodes: list = field(default_factory=list)
    missing_edges: list = field(default_factory=list)
    extra_edges: list = field(default_factory=list)
    verdict_diff: Optional[dict] = None


def compute_diff(original: dict, recomputed: dict) -> SliceDiff:
    # Slices are parsed JSON dicts; edges use "from"/"to" keys
    # ("from" is a Python keyword, so attribute access e.from is invalid).
    diff = SliceDiff()

    # Node diff (sorted so the diff itself is deterministic)
    orig_nodes = {n["id"] for n in original["subgraph"]["nodes"]}
    new_nodes = {n["id"] for n in recomputed["subgraph"]["nodes"]}
    diff.missing_nodes = sorted(orig_nodes - new_nodes)
    diff.extra_nodes = sorted(new_nodes - orig_nodes)

    # Edge diff
    orig_edges = {(e["from"], e["to"]) for e in original["subgraph"]["edges"]}
    new_edges = {(e["from"], e["to"]) for e in recomputed["subgraph"]["edges"]}
    diff.missing_edges = sorted(orig_edges - new_edges)
    diff.extra_edges = sorted(new_edges - orig_edges)

    # Verdict diff
    if original["verdict"]["status"] != recomputed["verdict"]["status"]:
        diff.verdict_diff = {
            "original": original["verdict"]["status"],
            "recomputed": recomputed["verdict"]["status"],
        }

    return diff
```

### 6.2 Cause Analysis

```python
def analyze_causes(original, manifest, current_analyzer_version,
                   current_feed_versions, stored_graph_digest):
    # Current versions/digests are passed in explicitly (illustrative names)
    # so the comparison is deterministic and testable.
    causes = []

    if manifest["analyzerVersion"] != current_analyzer_version:
        causes.append(
            f"Analyzer version mismatch: {manifest['analyzerVersion']} "
            f"vs {current_analyzer_version}")

    if manifest["feedVersions"] != current_feed_versions:
        causes.append("Feed version changed")

    if original["inputs"]["graphDigest"] != stored_graph_digest:
        causes.append("Input graph may have been modified")

    return causes
```

---

## 7. CLI Usage

### 7.1 Replay Command

```bash
# Replay and verify a slice
stella slice replay --digest blake3:a1b2c3d4...

# Output:
# ✓ Slice verified: digest matches
#   Original:   blake3:a1b2c3d4...
#   Recomputed: blake3:a1b2c3d4...
```

### 7.2 Verbose Mode

```bash
stella slice replay --digest blake3:a1b2c3d4... --verbose

# Output:
# Fetching slice from CAS...
# Rehydrating inputs:
#   - Graph: cas://graphs/blake3:xyz... ✓
#   - SBOM: cas://sboms/sha256:abc... ✓
#   - Policy: cas://policies/sha256:def... ✓
# Recomputing slice...
# Comparing results...
# ✓ Match confirmed
```

### 7.3 Mismatch Handling

```bash
stella slice replay --digest blake3:a1b2c3d4...

# Output:
# ✗ Slice mismatch detected!
#
# Differences:
#   Nodes: 1 missing, 0 extra
#   Edges: 1 missing, 1 extra
#   Verdict: unreachable → reachable
#
# Possible causes:
#   - Input graph may have been modified
#   - Analyzer version: 1.2.0 → 1.2.1
#
# Run with --diff-file to export detailed diff
```

---

## 8. Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| `slice_not_found` | Slice not in CAS | Check digest, verify upload |
| `input_not_found` | Referenced input missing | Reupload inputs |
| `version_mismatch` | Analyzer version differs | Pin version or accept drift |
| `feed_stale` | Feed snapshot unavailable | Use latest or pin version |

---

## 9. Security Considerations

1. **Input integrity**: Verify CAS digests before replay
2. **Audit logging**: Log all replay attempts
3. **Rate limiting**: Prevent replay DoS
4. **Access control**: Same permissions as slice access

---

## 10. Performance Targets

| Metric | Target |
|--------|--------|
| Replay latency | <5s for typical slice |
| Input fetch | <2s (parallel CAS fetches) |
| Comparison | <100ms |

---

## 11. Related Documentation

- [Slice Schema](./slice-schema.md)
- [Binary Reachability Schema](./binary-reachability-schema.md)
- [Determinism Requirements](../contracts/determinism.md)
- [CAS Architecture](../modules/platform/cas.md)

---

_Created: 2025-12-22. See Sprint 3820 for implementation details._
38
docs/modules/reach-graph/guides/runtime-facts.md
Normal file
@@ -0,0 +1,38 @@
# Runtime Facts (Signals/Zastava) v0.1

## Payload shapes
- **Structured** (`POST /signals/runtime-facts`):
  - `subject` (imageDigest | scanId | component+version)
  - `callgraphId` (required)
  - `events[]`: `{ symbolId, codeId?, purl?, buildId?, loaderBase?, processId?, processName?, socketAddress?, containerId?, evidenceUri?, hitCount, observedAt?, metadata{} }`
- **Streaming NDJSON** (`POST /signals/runtime-facts/ndjson`): one JSON object per line with the same fields; supports `Content-Encoding: gzip`; callgraphId provided via query/header metadata.

## Provenance/metadata
- Signals stamps:
  - `provenance.source` (defaults to `runtime` unless provided in metadata)
  - `provenance.ingestedAt` (ISO-8601 UTC)
  - `provenance.callgraphId`
- Runtime hits are aggregated per `symbolId` (summing hitCount) before persisting and feeding scoring.

## Validation
- `symbolId` required; events list must not be empty.
- `callgraphId` required and must resolve to a stored callgraph/union bundle.
- Subject must yield a non-empty `subjectKey`.
- Empty runtime stream is rejected.
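
The validation rules and the per-`symbolId` aggregation can be sketched together as below. `validate_and_aggregate` and `known_callgraphs` are hypothetical names, and subject/`subjectKey` validation is left out. Aggregated keys are sorted so that persisted output is stable regardless of event order.

```python
from collections import Counter
from typing import Dict, List


def validate_and_aggregate(callgraph_id: str, events: List[dict],
                           known_callgraphs: set) -> Dict[str, int]:
    """Apply the event/callgraph validation rules, then sum hitCount per symbolId."""
    if not events:
        raise ValueError("events list must not be empty")
    if not callgraph_id or callgraph_id not in known_callgraphs:
        raise ValueError("callgraphId must resolve to a stored callgraph/union bundle")
    hits: Counter = Counter()
    for event in events:
        symbol = event.get("symbolId")
        if not symbol:
            raise ValueError("symbolId is required on every event")
        hits[symbol] += int(event["hitCount"])
    return dict(sorted(hits.items()))  # sorted keys for stable persistence
```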
|
||||
|
||||
## Storage and cache
|
||||
- Stored alongside reachability facts in PostgreSQL table `reachability_facts`.
|
||||
- Runtime hits cached in Valkey via `reachability_cache:*` entries; invalidated on ingest.
|
||||
|
||||
## Interaction with scoring
|
||||
- Ingest triggers recompute: runtime hits added to prior facts’ hits, targets set to symbols observed, entryPoints taken from callgraph.
|
||||
- Reachability states include runtime evidence on the path; bucket/weight may be `runtime` when hits are present.
|
||||
- Unknowns registry stays separate; unknowns count still factors into fact score via pressure penalty.
|
||||
|
||||
## Replay alignment
|
||||
- Runtime traces packaged under CAS namespace `runtime_traces`; referenced in replay manifest with `namespace` and `analysisId` to link to static graphs.
|
||||
|
||||
## Determinism rules
|
||||
- Keep NDJSON ordering stable when generating bundles.
|
||||
- Use UTC timestamps; avoid environment-dependent metadata values.
|
||||
- No external network lookups during ingest.
|
||||
461
docs/modules/reach-graph/schemas/binary-reachability-schema.md
Normal file
@@ -0,0 +1,461 @@
|
||||
# Binary Reachability Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Scanner Guild + Attestor Guild._
|
||||
|
||||
This document defines the binary reachability schema addressing gaps BR1-BR10 from the November 2025 product findings. It specifies DSSE predicate formats, edge hash recipes, binary evidence requirements, build-id handling, and Sigstore integration.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Binary reachability extends the function-level evidence chain to native executables (ELF, PE, Mach-O). Key challenges addressed:
|
||||
|
||||
- **Stripped binaries:** Symbol recovery using `code_id` + `code_block_hash`
|
||||
- **Build variants:** Handling multiple builds from the same source
|
||||
- **Large graphs:** Chunking and size limits for DSSE/Rekor
|
||||
- **Offline verification:** Air-gapped attestation workflows
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### BR1: Canonical DSSE/Predicate Schemas
|
||||
|
||||
**Binary graph predicate:**
|
||||
|
||||
```
|
||||
stella.ops/binaryGraph@v1
|
||||
```
|
||||
|
||||
**Predicate schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"_type": "https://stellaops.dev/predicates/binaryGraph/v1",
|
||||
"subject": [
|
||||
{
|
||||
"name": "graph",
|
||||
"digest": {"blake3": "a1b2c3d4e5f6..."}
|
||||
}
|
||||
],
|
||||
"predicate": {
|
||||
"analyzer": {
|
||||
"name": "scanner.native",
|
||||
"version": "1.2.0",
|
||||
"toolchain": "ghidra-11.2"
|
||||
},
|
||||
"binary": {
|
||||
"format": "ELF",
|
||||
"arch": "x86_64",
|
||||
"file_hash": "sha256:...",
|
||||
"build_id": "gnu-build-id:5f0c7c3c..."
|
||||
},
|
||||
"graph_stats": {
|
||||
"node_count": 1247,
|
||||
"edge_count": 3891,
|
||||
"root_count": 5
|
||||
},
|
||||
"evidence": {
|
||||
"symbols_source": "DWARF",
|
||||
"stripped_symbols": 58,
|
||||
"heuristic_symbols": 12
|
||||
},
|
||||
"created_at": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Edge bundle predicate:**
|
||||
|
||||
```
|
||||
stella.ops/binaryEdgeBundle@v1
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"_type": "https://stellaops.dev/predicates/binaryEdgeBundle/v1",
|
||||
"subject": [
|
||||
{
|
||||
"name": "edges",
|
||||
"digest": {"sha256": "..."}
|
||||
}
|
||||
],
|
||||
"predicate": {
|
||||
"graph_hash": "blake3:a1b2c3d4...",
|
||||
"bundle_id": "bundle:001",
|
||||
"bundle_reason": "init_array",
|
||||
"edge_count": 128,
|
||||
"edges": [
|
||||
{
|
||||
"from": "sym:binary:...",
|
||||
"to": "sym:binary:...",
|
||||
"reason": "init-array",
|
||||
"confidence": 0.95
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### BR2: Edge Hash Recipe
|
||||
|
||||
**Binary edge hash computation:**
|
||||
|
||||
```
|
||||
edge_id = "edge:" + sha256(
|
||||
canonical_json({
|
||||
"from": edge.from,
|
||||
"to": edge.to,
|
||||
"kind": edge.kind,
|
||||
"reason": edge.reason,
|
||||
"binary_hash": binary.file_hash // Binary context included
|
||||
})
|
||||
)
|
||||
```
|
||||
|
||||
**Hash includes binary context:**
|
||||
|
||||
Unlike managed code edges, binary edges include `binary_hash` in the hash computation to distinguish edges from different binaries with identical symbol names.
|
||||
|
||||
**Canonicalization:**
|
||||
|
||||
1. Keys: `binary_hash`, `from`, `kind`, `reason`, `to` (alphabetical)
|
||||
2. No whitespace, UTF-8 encoding
|
||||
3. Lowercase hex for all hashes
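A minimal Python sketch of the recipe. It assumes the `edge:sha256:{hex}` identifier layout used by the edge explainability schema, since the pseudocode above does not state whether the `sha256:` label is part of the ID. `json.dumps` with sorted keys and compact separators reproduces the canonicalization rules for these string-only fields, but it is not a general RFC 8785 canonicalizer:

```python
import hashlib
import json


def canonical_json(obj: dict) -> bytes:
    """Sorted keys, no whitespace, UTF-8 (sufficient for flat string-valued objects)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def binary_edge_id(edge: dict, binary_file_hash: str) -> str:
    payload = {
        "from": edge["from"],
        "to": edge["to"],
        "kind": edge["kind"],
        "reason": edge["reason"],
        # Binary context keeps identical symbol pairs in different binaries distinct.
        "binary_hash": binary_file_hash.lower(),
    }
    digest = hashlib.sha256(canonical_json(payload)).hexdigest()  # lowercase hex
    return "edge:sha256:" + digest
```

Two binaries that both contain a `main -> printf@plt` edge therefore yield different edge IDs, because their `binary_hash` values differ.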
|
||||
|
||||
### BR3: Required Binary Evidence with CAS Refs
|
||||
|
||||
**Required evidence per node:**
|
||||
|
||||
| Evidence Type | Required | CAS Storage |
|
||||
|---------------|----------|-------------|
|
||||
| File hash | Yes | N/A (inline) |
|
||||
| Build ID | Conditional | N/A (inline) |
|
||||
| Symbol source | Yes | N/A (inline) |
|
||||
| Code block hash | For stripped | `cas://binary/blocks/{sha256}` |
|
||||
| Disassembly | Optional | `cas://binary/disasm/{sha256}` |
|
||||
| CFG | Optional | `cas://binary/cfg/{sha256}` |
|
||||
|
||||
**Evidence schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"binary_evidence": {
|
||||
"file_hash": "sha256:...",
|
||||
"build_id": "gnu-build-id:5f0c7c3c...",
|
||||
"symbol_source": "DWARF",
|
||||
"symbol_confidence": 0.95,
|
||||
"code_block_hash": "sha256:deadbeef...",
|
||||
"code_block_uri": "cas://binary/blocks/sha256:deadbeef...",
|
||||
"disassembly_uri": "cas://binary/disasm/sha256:...",
|
||||
"cfg_uri": "cas://binary/cfg/sha256:..."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**CAS layout:**
|
||||
|
||||
```
|
||||
cas://binary/
|
||||
blocks/{sha256}/ # Code block bytes
|
||||
disasm/{sha256}/ # Disassembly JSON
|
||||
cfg/{sha256}/ # Control flow graph
|
||||
symbols/{sha256}/ # Symbol table extract
|
||||
```
|
||||
|
||||
### BR4: Build-ID/Variant Rules
|
||||
|
||||
**Build-ID sources:**
|
||||
|
||||
| Format | Build-ID Source | Example |
|
||||
|--------|-----------------|---------|
|
||||
| ELF | `.note.gnu.build-id` | `gnu-build-id:5f0c7c3c...` |
|
||||
| PE | Debug GUID | `pe-guid:12345678-1234-...` |
|
||||
| Mach-O | `LC_UUID` | `macho-uuid:12345678...` |
|
||||
|
||||
**Fallback when build-ID absent:**
|
||||
|
||||
```json
|
||||
{
|
||||
"build_id": null,
|
||||
"build_id_fallback": {
|
||||
"method": "file_hash",
|
||||
"value": "sha256:...",
|
||||
"confidence": 0.7
|
||||
}
|
||||
}
|
||||
```
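Choosing between a native build ID and the file-hash fallback might look like this. The sketch assumes the raw build ID has already been extracted from the ELF note, PE debug directory, or Mach-O load command, and treats the 0.7 confidence from the example as illustrative rather than normative:

```python
import hashlib
from typing import Optional

# Prefixes taken from the build-ID table above.
BUILD_ID_PREFIX = {"ELF": "gnu-build-id", "PE": "pe-guid", "Mach-O": "macho-uuid"}


def build_identity(fmt: str, raw_build_id: Optional[str], file_bytes: bytes) -> dict:
    """Return build_id, or a file-hash fallback when the binary has no build ID."""
    if raw_build_id:
        return {"build_id": f"{BUILD_ID_PREFIX[fmt]}:{raw_build_id.lower()}"}
    file_hash = "sha256:" + hashlib.sha256(file_bytes).hexdigest()
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": "file_hash",
            "value": file_hash,
            # Mirrors the example above; not a normative constant.
            "confidence": 0.7,
        },
    }
```

The evidence schema also asks producers to record an Unknowns entry when an ELF build ID is missing and the file hash is used instead.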
|
||||
|
||||
**Variant handling:**
|
||||
|
||||
Multiple binaries built from the same source (debug/release, different architectures) share a `variant_group`:
|
||||
|
||||
```json
|
||||
{
|
||||
"variant_group": "sha256:source_hash...",
|
||||
"variants": [
|
||||
{"build_id": "gnu-build-id:aaa...", "variant_type": "release-x86_64"},
|
||||
{"build_id": "gnu-build-id:bbb...", "variant_type": "debug-x86_64"},
|
||||
{"build_id": "gnu-build-id:ccc...", "variant_type": "release-aarch64"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### BR5: Policy Hash Governance
|
||||
|
||||
**Policy version binding:**
|
||||
|
||||
Binary reachability graphs are bound to a policy version:
|
||||
|
||||
```json
|
||||
{
|
||||
"policy_binding": {
|
||||
"policy_digest": "sha256:...",
|
||||
"policy_version": "P-7:v4",
|
||||
"bound_at": "2025-12-13T10:00:00Z",
|
||||
"binding_mode": "strict"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Binding modes:**
|
||||
|
||||
| Mode | Behavior |
|
||||
|------|----------|
|
||||
| `strict` | Graph invalid if policy changes |
|
||||
| `forward` | Graph valid with newer policy versions |
|
||||
| `any` | Graph valid with any policy version |
|
||||
|
||||
**Governance rules:**
|
||||
|
||||
1. Production graphs use `strict` binding
|
||||
2. Test graphs may use `forward`
|
||||
3. Policy hash computed from canonical DSL
|
||||
4. Binding stored in graph metadata
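One way to evaluate the three binding modes, sketched under two assumptions: `policy_version` always has the `<policy-id>:v<revision>` shape seen in the example, and "newer" means a higher revision of the same policy ID. Both helper names are hypothetical:

```python
from typing import Tuple


def parse_policy_version(value: str) -> Tuple[str, int]:
    """Split e.g. "P-7:v4" into ("P-7", 4); the format is assumed from the example."""
    policy_id, sep, revision = value.rpartition(":v")
    if not sep or not revision.isdigit():
        raise ValueError(f"unrecognized policy_version {value!r}")
    return policy_id, int(revision)


def binding_accepts(binding: dict, current_digest: str, current_version: str) -> bool:
    """Decide whether a graph's policy_binding still holds for the active policy."""
    mode = binding["binding_mode"]
    if mode == "strict":
        # Any change to the canonical policy DSL changes the digest.
        return current_digest == binding["policy_digest"]
    if mode == "forward":
        bound_id, bound_rev = parse_policy_version(binding["policy_version"])
        cur_id, cur_rev = parse_policy_version(current_version)
        return cur_id == bound_id and cur_rev >= bound_rev
    if mode == "any":
        return True
    raise ValueError(f"unknown binding_mode {mode!r}")
```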
|
||||
|
||||
### BR6: Sigstore Bundle/Log Routing
|
||||
|
||||
**Sigstore integration:**
|
||||
|
||||
```json
|
||||
{
|
||||
"sigstore": {
|
||||
"bundle_type": "hashedrekord",
|
||||
"log_index": 12345678,
|
||||
"log_id": "rekor.sigstore.dev",
|
||||
"inclusion_proof": {
|
||||
"log_index": 12345678,
|
||||
"root_hash": "sha256:...",
|
||||
"tree_size": 98765432,
|
||||
"hashes": ["sha256:...", "sha256:..."]
|
||||
},
|
||||
"signed_entry_timestamp": "base64:..."
|
||||
}
|
||||
}
|
||||
```
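For offline or air-gapped verification, the `inclusion_proof` fields can be checked locally against `root_hash`. The sketch below follows the RFC 6962 hashing rules and the inclusion-proof algorithm in RFC 9162, section 2.1.3.2, which underlie Rekor's Merkle tree. It works on raw 32-byte digests, so a caller would first strip the `sha256:` labels shown above and hex-decode the values (an assumption about how these fields are encoded). It also assumes `index` is the leaf's position in the tree the proof was issued for:

```python
import hashlib
from typing import Sequence


def leaf_hash(entry: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256(0x00 || entry)."""
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256(0x01 || left || right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     proof: Sequence[bytes], root: bytes) -> bool:
    """Check an inclusion proof (RFC 9162, section 2.1.3.2)."""
    if index >= tree_size:
        return False
    fn, sn, r = index, tree_size - 1, leaf
    for p in proof:
        if sn == 0:
            return False  # proof is longer than the path to the root
        if fn & 1 or fn == sn:
            r = node_hash(p, r)  # sibling is on the left
            # Climb past levels where this node is promoted without a right sibling.
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)  # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

For a three-leaf tree the root is `node_hash(node_hash(h0, h1), h2)`, and the proof for leaf 2 is the single hash `node_hash(h0, h1)`.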
|
||||
|
||||
**Log routing:**
|
||||
|
||||
| Evidence Type | Log | Notes |
|
||||
|---------------|-----|-------|
|
||||
| Graph DSSE | Rekor (public) | Always |
|
||||
| Edge bundle DSSE | Rekor (capped) | Configurable limit |
|
||||
| Code block | No log | CAS only |
|
||||
| CFG/Disasm | No log | CAS only |
|
||||
|
||||
**Offline mode:**
|
||||
|
||||
When Rekor is unavailable:
|
||||
|
||||
```json
|
||||
{
|
||||
"sigstore": {
|
||||
"mode": "offline",
|
||||
"checkpoint": {
|
||||
"origin": "rekor.sigstore.dev",
|
||||
"checkpoint_data": "base64:...",
|
||||
"captured_at": "2025-12-13T10:00:00Z"
|
||||
},
|
||||
"deferred_submission": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### BR7: Idempotent Submission Keys
|
||||
|
||||
**Submission key format:**
|
||||
|
||||
```
|
||||
submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}
|
||||
```
|
||||
|
||||
**Idempotency rules:**
|
||||
|
||||
1. Same key returns existing entry (no duplicate)
|
||||
2. Key includes hour-granularity timestamp for rate limiting
|
||||
3. Different graphs from same binary produce different keys
|
||||
4. Retries within the same UTC clock hour reuse the same key
|
||||
|
||||
**Implementation:**
|
||||
|
||||
```json
|
||||
{
|
||||
"submission": {
|
||||
"key": "submit:acme:sha256:abc...:blake3:def...:2025121310",
|
||||
"status": "accepted",
|
||||
"existing_entry": false,
|
||||
"log_index": 12345678
|
||||
}
|
||||
}
|
||||
```
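A sketch of key construction plus an in-memory idempotency check. The `store` dictionary and the `submit` callback are hypothetical stand-ins for the persistent submission index and the real Rekor client:

```python
from datetime import datetime, timezone
from typing import Callable, Dict


def submission_key(tenant: str, binary_hash: str, graph_hash: str, at: datetime) -> str:
    """Build submit:{tenant}:{binary_hash}:{graph_hash}:{YYYYMMDDHH} in UTC."""
    if at.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    hour = at.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"submit:{tenant}:{binary_hash}:{graph_hash}:{hour}"


def submit_once(store: Dict[str, dict], key: str, submit: Callable[[], int]) -> dict:
    """Return the stored entry for key; call submit() only the first time."""
    if key in store:
        return {**store[key], "existing_entry": True}
    entry = {"key": key, "status": "accepted", "existing_entry": False, "log_index": submit()}
    store[key] = entry
    return entry
```

Because the key is bucketed by clock hour, a retry at 10:59 and one at 11:01 produce different keys even though they are only two minutes apart.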
|
||||
|
||||
### BR8: Size/Chunking Limits
|
||||
|
||||
**Size limits:**
|
||||
|
||||
| Element | Limit | Action on Exceed |
|
||||
|---------|-------|------------------|
|
||||
| Graph JSON | 10 MB | Chunk nodes/edges |
|
||||
| Edge bundle | 512 edges | Split bundles |
|
||||
| DSSE payload | 1 MB | Compress/chunk |
|
||||
| Rekor entry | 100 KB | Reference CAS |
|
||||
|
||||
**Chunking strategy:**
|
||||
|
||||
For large graphs (>10 MB):
|
||||
|
||||
```json
|
||||
{
|
||||
"chunked_graph": {
|
||||
"chunk_count": 5,
|
||||
"chunks": [
|
||||
{"chunk_id": "chunk:001", "uri": "cas://graphs/chunks/001", "hash": "blake3:..."},
|
||||
{"chunk_id": "chunk:002", "uri": "cas://graphs/chunks/002", "hash": "blake3:..."}
|
||||
],
|
||||
"assembly_order": ["chunk:001", "chunk:002", ...],
|
||||
"assembled_hash": "blake3:..."
|
||||
}
|
||||
}
|
||||
```
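The manifest bookkeeping can be sketched as follows. For brevity this splits the canonical graph bytes at a fixed size rather than splitting by nodes and edges as the table describes (so individual chunks are not standalone JSON). It also substitutes SHA-256 for BLAKE3 and keeps chunk bytes inline where the real manifest would reference CAS URIs:

```python
import hashlib
from typing import Dict, List

GRAPH_CHUNK_LIMIT = 10 * 1024 * 1024  # 10 MB "Graph JSON" limit from the table


def digest(data: bytes) -> str:
    # Stand-in: the schema specifies BLAKE3, which is not in Python's standard library.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def chunk_graph(body: bytes, limit: int = GRAPH_CHUNK_LIMIT) -> Dict:
    """Split canonical graph bytes into hashed chunks plus an assembly manifest."""
    chunks: List[Dict] = []
    for number, start in enumerate(range(0, len(body), limit), start=1):
        part = body[start:start + limit]
        # "data" stands in for the CAS upload; a real manifest would hold a cas:// URI.
        chunks.append({"chunk_id": f"chunk:{number:03d}", "hash": digest(part), "data": part})
    return {
        "chunk_count": len(chunks),
        "chunks": chunks,
        "assembly_order": [chunk["chunk_id"] for chunk in chunks],
        "assembled_hash": digest(body),
    }


def assemble(manifest: Dict) -> bytes:
    """Reassemble chunks in order, verifying every chunk hash and the final hash."""
    by_id = {chunk["chunk_id"]: chunk for chunk in manifest["chunks"]}
    body = bytearray()
    for chunk_id in manifest["assembly_order"]:
        chunk = by_id[chunk_id]
        if digest(chunk["data"]) != chunk["hash"]:
            raise ValueError(f"{chunk_id}: chunk hash mismatch")
        body += chunk["data"]
    if digest(bytes(body)) != manifest["assembled_hash"]:
        raise ValueError("assembled graph hash mismatch")
    return bytes(body)
```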
|
||||
|
||||
**Compression:**
|
||||
|
||||
- Graph JSON: gzip before DSSE
|
||||
- CAS storage: Raw JSON (indexed)
|
||||
- Rekor payload: DSSE references CAS
|
||||
|
||||
### BR9: API/CLI/UI Surfacing
|
||||
|
||||
**API endpoints:**
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| `POST` | `/api/binary/graphs` | Submit binary graph |
|
||||
| `GET` | `/api/binary/graphs/{hash}` | Get graph details |
|
||||
| `GET` | `/api/binary/graphs/{hash}/edges` | List edges |
|
||||
| `GET` | `/api/binary/symbols/{symbolId}` | Get symbol details |
|
||||
| `POST` | `/api/binary/verify` | Verify graph attestation |
|
||||
|
||||
**CLI commands:**
|
||||
|
||||
```bash
|
||||
# Submit binary graph
|
||||
stella binary submit --graph ./richgraph.json --binary ./app
|
||||
|
||||
# Get graph info
|
||||
stella binary info --hash blake3:a1b2c3d4...
|
||||
|
||||
# List symbols
|
||||
stella binary symbols --hash blake3:... --stripped-only
|
||||
|
||||
# Verify attestation
|
||||
stella binary verify --graph ./richgraph.json --dsse ./richgraph.dsse
|
||||
```
|
||||
|
||||
**UI components:**
|
||||
|
||||
- Binary graph visualization with zoom/pan
|
||||
- Symbol table with search/filter
|
||||
- Edge explorer with confidence highlighting
|
||||
- Attestation status badges
|
||||
- Build variant selector
|
||||
|
||||
### BR10: Binary Fixtures
|
||||
|
||||
**Fixture location:**
|
||||
|
||||
```
|
||||
tests/Binary/
|
||||
fixtures/
|
||||
elf-x86_64-with-debug/
|
||||
binary.elf
|
||||
graph.json
|
||||
expected-hashes.txt
|
||||
elf-stripped/
|
||||
binary.elf
|
||||
graph.json
|
||||
expected-hashes.txt
|
||||
pe-x64-with-pdb/
|
||||
binary.exe
|
||||
graph.json
|
||||
expected-hashes.txt
|
||||
golden/
|
||||
elf-x86_64.golden.json
|
||||
pe-x64.golden.json
|
||||
|
||||
datasets/binary/
|
||||
schema/
|
||||
binary-graph.schema.json
|
||||
binary-edge.schema.json
|
||||
samples/
|
||||
openssl-1.1.1/
|
||||
libssl.so
|
||||
graph.json
|
||||
edges.ndjson
|
||||
```
|
||||
|
||||
**Fixture requirements:**
|
||||
|
||||
1. Each binary format has at least one fixture
|
||||
2. Stripped and debug variants for each format
|
||||
3. Expected hashes verified by CI
|
||||
4. Golden outputs include DSSE envelopes
|
||||
5. Fixtures reproducible from source (where legal)
|
||||
|
||||
**Test categories:**
|
||||
|
||||
1. **Hash stability:** Same binary produces same graph hash
|
||||
2. **Build-ID extraction:** Correct build-ID parsing per format
|
||||
3. **Symbol recovery:** DWARF/PDB parsing accuracy
|
||||
4. **Stripped handling:** Code block hash computation
|
||||
5. **Chunking:** Large graph assembly/disassembly
|
||||
6. **DSSE signing:** Envelope creation and verification
|
||||
7. **Rekor integration:** Submission and verification
|
||||
|
||||
---
|
||||
|
||||
## 3. Implementation Status
|
||||
|
||||
| Component | Location | Status |
|
||||
|-----------|----------|--------|
|
||||
| ELF parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
|
||||
| PE parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
|
||||
| DSSE predicates | `src/Signer/StellaOps.Signer/PredicateTypes.cs` | Implemented |
|
||||
| CAS storage | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability` | Partial |
|
||||
| Rekor integration | `src/Attestor/StellaOps.Attestor` | Implemented |
|
||||
| CLI commands | `src/Cli/StellaOps.Cli` | Planned |
|
||||
| UI components | `src/Web/StellaOps.Web` | Implemented |
|
||||
|
||||
---
|
||||
|
||||
## 4. Related Documentation
|
||||
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [Edge Explainability](./edge-explainability-schema.md) - Edge reason codes
|
||||
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
|
||||
- [Native Analyzer Tests](../../../../src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests/Reachability/) - Test fixtures
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 BINARY-GAPS-401-066 for change history._
|
||||
416
docs/modules/reach-graph/schemas/edge-explainability-schema.md
Normal file
@@ -0,0 +1,416 @@
|
||||
# Edge Explainability Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild._
|
||||
|
||||
This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:
|
||||
|
||||
- **Reason code:** Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
|
||||
- **Confidence score:** Certainty of the edge's existence
|
||||
- **Evidence sources:** Detectors and rules that contributed to edge discovery
|
||||
- **Provenance:** Analyzer version, detection timestamp, and input artifacts
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### EG1: Reason Enum Governance
|
||||
|
||||
**Standard reason codes:**
|
||||
|
||||
| Code | Category | Description | Example |
|
||||
|------|----------|-------------|---------|
|
||||
| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
|
||||
| `bytecode-field` | Static | Field access leading to call | Static initializer |
|
||||
| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
|
||||
| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
|
||||
| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
|
||||
| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
|
||||
| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
|
||||
| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
|
||||
| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
|
||||
| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
|
||||
| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
|
||||
| `user-annotated` | Manual | User-provided edge | Policy override |
|
||||
|
||||
**Governance rules:**
|
||||
|
||||
1. New reason codes require RFC + review by Scanner Guild
|
||||
2. Deprecated codes remain valid for 2 major versions
|
||||
3. Custom codes use `custom:` prefix (e.g., `custom:my-analyzer`)
|
||||
4. Codes are case-insensitive, normalized to lowercase
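Rules 3 and 4 can be enforced at ingest with a small normalizer. In this sketch the registry is hard-coded to the standard table; a real implementation would presumably load it from the code registry document, and deprecated-code handling is omitted:

```python
STANDARD_REASONS = frozenset({
    "bytecode-invoke", "bytecode-field", "import-symbol", "plt-stub",
    "reloc-target", "indirect-target", "init-array", "fini-array",
    "vtable-slot", "reflection-invoke", "runtime-observed", "user-annotated",
})


def normalize_reason(code: str) -> str:
    """Lowercase a reason code and check it against the governance rules."""
    normalized = code.lower()
    if normalized.startswith("custom:"):
        if normalized == "custom:":
            raise ValueError("custom reason codes need a name after 'custom:'")
        return normalized
    if normalized not in STANDARD_REASONS:
        raise ValueError(f"unregistered reason code {code!r}")
    return normalized
```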
|
||||
|
||||
**Code registry:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.edge.reason.registry@v1",
|
||||
"version": "2025-12-13",
|
||||
"reasons": [
|
||||
{
|
||||
"code": "bytecode-invoke",
|
||||
"category": "static",
|
||||
"description": "Bytecode invocation instruction",
|
||||
"languages": ["java", "dotnet"],
|
||||
"confidence_range": [0.9, 1.0],
|
||||
"deprecated": false
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### EG2: Canonical Edge Schema with Hash Rules
|
||||
|
||||
**Edge schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"edge_id": "edge:sha256:{hex}",
|
||||
"from": "sym:java:...",
|
||||
"to": "sym:java:...",
|
||||
"kind": "call",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.95,
|
||||
"evidence": [
|
||||
{
|
||||
"source": "detector:java-bytecode-analyzer",
|
||||
"rule_id": "invoke-virtual",
|
||||
"rule_version": "1.0.0",
|
||||
"location": {
|
||||
"file": "com/example/Foo.class",
|
||||
"offset": 1234,
|
||||
"instruction": "invokevirtual #42"
|
||||
},
|
||||
"timestamp": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
],
|
||||
"attributes": {
|
||||
"virtual": true,
|
||||
"polymorphic_targets": 3
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Hash computation:**
|
||||
|
||||
```
|
||||
edge_id = "edge:" + sha256(
|
||||
canonical_json({
|
||||
"from": edge.from,
|
||||
"to": edge.to,
|
||||
"kind": edge.kind,
|
||||
"reason": edge.reason
|
||||
})
|
||||
)
|
||||
```
|
||||
|
||||
**Canonicalization:**
|
||||
|
||||
1. Use only `from`, `to`, `kind`, `reason` for hash (not confidence or evidence)
|
||||
2. Sort JSON keys alphabetically
|
||||
3. No whitespace, UTF-8 encoding
|
||||
4. Hash is lowercase hex with `sha256:` prefix
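Because only the four identity fields are hashed, a consumer can recompute `edge_id` from a full edge record and confirm that confidence or evidence updates have not changed the edge's identity. A minimal sketch, assuming compact sorted-key `json.dumps` output is an acceptable canonical form for these string fields:

```python
import hashlib
import json

HASHED_FIELDS = ("from", "to", "kind", "reason")


def compute_edge_id(edge: dict) -> str:
    """Hash only from/to/kind/reason; confidence and evidence are excluded."""
    identity = {field: edge[field] for field in HASHED_FIELDS}
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "edge:sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def edge_id_matches(edge: dict) -> bool:
    """True when the stored edge_id agrees with the recomputed one."""
    return edge.get("edge_id") == compute_edge_id(edge)
```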
|
||||
|
||||
### EG3: Evidence Limits/Redaction
|
||||
|
||||
**Evidence limits:**
|
||||
|
||||
| Element | Default Limit | Configurable |
|
||||
|---------|--------------|--------------|
|
||||
| Evidence entries per edge | 10 | Yes |
|
||||
| Location detail fields | 5 | Yes |
|
||||
| Instruction preview length | 100 chars | Yes |
|
||||
| File path depth | 10 segments | No |
|
||||
|
||||
**Redaction rules:**
|
||||
|
||||
| Category | Redaction | Example |
|
||||
|----------|-----------|---------|
|
||||
| File paths | Normalize | `/home/user/...` -> `{PROJECT}/...` |
|
||||
| Bytecode offsets | Keep | Offsets are not PII |
|
||||
| Instruction text | Truncate | First 100 chars |
|
||||
| Source line content | Omit | Not included by default |
|
||||
|
||||
**Truncation behavior:**
|
||||
|
||||
```json
|
||||
{
|
||||
"evidence_truncated": true,
|
||||
"evidence_count": 15,
|
||||
"evidence_shown": 10,
|
||||
"full_evidence_uri": "cas://edges/evidence/sha256:..."
|
||||
}
|
||||
```
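A sketch of applying the default limits and redactions to one edge's evidence list. Assumptions: entries keep their existing order, the caller knows the project root, and the CAS digest is taken over the redacted entries serialized as compact sorted JSON. Only the instruction-length limit and path normalization are shown; the location-field and path-depth limits are omitted:

```python
import hashlib
import json
from typing import List

MAX_EVIDENCE = 10            # default "evidence entries per edge"
MAX_INSTRUCTION_CHARS = 100  # default "instruction preview length"


def normalize_path(path: str, project_root: str) -> str:
    """Replace the project root prefix with the {PROJECT} placeholder."""
    root = project_root.rstrip("/") + "/"
    return "{PROJECT}/" + path[len(root):] if path.startswith(root) else path


def shape_evidence(evidence: List[dict], project_root: str, limit: int = MAX_EVIDENCE) -> dict:
    cleaned = []
    for item in evidence:
        item = dict(item)
        if "location" in item:
            loc = dict(item["location"])
            if "file" in loc:
                loc["file"] = normalize_path(loc["file"], project_root)
            if "instruction" in loc:
                loc["instruction"] = loc["instruction"][:MAX_INSTRUCTION_CHARS]
            item["location"] = loc
        cleaned.append(item)
    result = {"evidence": cleaned[:limit]}
    if len(cleaned) > limit:
        # Full (redacted) evidence goes to CAS; the edge keeps a pointer to it.
        blob = json.dumps(cleaned, sort_keys=True, separators=(",", ":")).encode("utf-8")
        result.update({
            "evidence_truncated": True,
            "evidence_count": len(cleaned),
            "evidence_shown": limit,
            "full_evidence_uri": "cas://edges/evidence/sha256:" + hashlib.sha256(blob).hexdigest(),
        })
    return result
```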
|
||||
|
||||
### EG4: Confidence Rubric
|
||||
|
||||
**Confidence scale:**
|
||||
|
||||
| Level | Range | Description | Typical Sources |
|
||||
|-------|-------|-------------|-----------------|
|
||||
| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
|
||||
| `high` | 0.85-0.99 | Very likely | Import table, PLT |
|
||||
| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
|
||||
| `low` | 0.2-0.49 | Possible | Heuristic carving |
|
||||
| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |
|
||||
|
||||
**Confidence computation:**
|
||||
|
||||
```
|
||||
edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor
|
||||
```
|
||||
|
||||
**Base confidence by reason:**
|
||||
|
||||
| Reason | Base Confidence |
|
||||
|--------|-----------------|
|
||||
| `bytecode-invoke` | 0.98 |
|
||||
| `import-symbol` | 0.95 |
|
||||
| `plt-stub` | 0.92 |
|
||||
| `reloc-target` | 0.90 |
|
||||
| `init-array` | 0.95 |
|
||||
| `vtable-slot` | 0.75 |
|
||||
| `indirect-target` | 0.60 |
|
||||
| `reflection-invoke` | 0.50 |
|
||||
| `runtime-observed` | 0.99 |
|
||||
| `user-annotated` | 0.80 |
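The formula and the two tables can be combined as below. `evidence_boost` is not defined in this document, so the version here (+2% per additional evidence entry, capped at +10%) is purely hypothetical. The result is clamped to [0, 1] because the product could otherwise exceed 1.0. Levels are assigned by lower bound, which also covers values that fall between the table's ranges (for example 0.995). Reasons without a listed base value, such as `fini-array`, raise `KeyError` in this sketch:

```python
BASE_CONFIDENCE = {
    "bytecode-invoke": 0.98, "import-symbol": 0.95, "plt-stub": 0.92,
    "reloc-target": 0.90, "init-array": 0.95, "vtable-slot": 0.75,
    "indirect-target": 0.60, "reflection-invoke": 0.50,
    "runtime-observed": 0.99, "user-annotated": 0.80,
}


def evidence_boost(evidence_count: int) -> float:
    """Hypothetical: +2% per evidence entry beyond the first, capped at +10%."""
    return 1.0 + 0.02 * min(max(evidence_count - 1, 0), 5)


def edge_confidence(reason: str, evidence_count: int,
                    target_resolution_factor: float = 1.0) -> float:
    value = BASE_CONFIDENCE[reason] * evidence_boost(evidence_count) * target_resolution_factor
    return round(min(max(value, 0.0), 1.0), 4)


def confidence_level(value: float) -> str:
    for level, lower_bound in (("certain", 1.0), ("high", 0.85), ("medium", 0.5), ("low", 0.2)):
        if value >= lower_bound:
            return level
    return "unknown"
```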
|
||||
|
||||
### EG5: Detector/Rule Provenance
|
||||
|
||||
**Provenance schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"provenance": {
|
||||
"analyzer": {
|
||||
"name": "scanner.java",
|
||||
"version": "1.2.0",
|
||||
"digest": "sha256:..."
|
||||
},
|
||||
"detector": {
|
||||
"name": "java-bytecode-analyzer",
|
||||
"version": "2.0.0",
|
||||
"rule_set": "default"
|
||||
},
|
||||
"rule": {
|
||||
"id": "invoke-virtual",
|
||||
"version": "1.0.0",
|
||||
"description": "Detect invokevirtual bytecode instructions"
|
||||
},
|
||||
"input_artifacts": [
|
||||
{"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
|
||||
],
|
||||
"detected_at": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Provenance requirements:**
|
||||
|
||||
1. All edges must include analyzer provenance
|
||||
2. Detector/rule provenance required for non-runtime edges
|
||||
3. Input artifact digests enable reproducibility
|
||||
4. Detection timestamp uses UTC ISO-8601
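Requirements 1, 2 and 4 lend themselves to a lint-style check. The sketch treats `runtime-observed` as the only runtime reason code and accepts timestamps of the form `YYYY-MM-DDTHH:MM:SSZ`; both are assumptions, and requirement 3 (artifact digests) is not checked:

```python
from datetime import datetime
from typing import List

RUNTIME_REASONS = {"runtime-observed"}


def provenance_problems(reason: str, provenance: dict) -> List[str]:
    """Return human-readable violations; an empty list means the provenance passes."""
    problems = []
    analyzer = provenance.get("analyzer") or {}
    if not analyzer.get("name") or not analyzer.get("version"):
        problems.append("analyzer name and version are required")
    if reason not in RUNTIME_REASONS:
        for section in ("detector", "rule"):
            if not provenance.get(section):
                problems.append(f"{section} provenance is required for non-runtime edges")
    try:
        # Literal trailing Z enforces UTC; fractional seconds are not accepted here.
        datetime.strptime(provenance.get("detected_at", ""), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        problems.append("detected_at must be a UTC ISO-8601 timestamp ending in 'Z'")
    return problems
```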
|
||||
|
||||
### EG6: API/CLI Parity
|
||||
|
||||
**API endpoints:**
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| `GET` | `/api/edges/{edgeId}` | Get edge details |
|
||||
| `GET` | `/api/edges?graph_hash=...` | List edges for graph |
|
||||
| `GET` | `/api/edges/{edgeId}/evidence` | Get full evidence |
|
||||
| `POST` | `/api/edges/search` | Search edges by criteria |
|
||||
|
||||
**CLI commands:**
|
||||
|
||||
```bash
|
||||
# List edges for a graph
|
||||
stella edge list --graph blake3:a1b2c3d4...
|
||||
|
||||
# Get edge details
|
||||
stella edge show --id edge:sha256:...
|
||||
|
||||
# Search edges
|
||||
stella edge search --from "sym:java:..." --reason bytecode-invoke
|
||||
|
||||
# Export edges
|
||||
stella edge export --graph blake3:... --output ./edges.ndjson
|
||||
```
|
||||
|
||||
**Output parity:**
|
||||
|
||||
- API and CLI return identical JSON structure
|
||||
- CLI supports `--json` for machine-readable output
|
||||
- Both support filtering by reason, confidence, from/to
|
||||
|
||||
### EG7: Deterministic Fixtures
|
||||
|
||||
**Fixture location:**
|
||||
|
||||
```
|
||||
tests/Edge/
|
||||
fixtures/
|
||||
bytecode-invoke.json
|
||||
plt-stub.json
|
||||
vtable-dispatch.json
|
||||
init-array-constructor.json
|
||||
runtime-observed.json
|
||||
golden/
|
||||
bytecode-invoke.golden.json
|
||||
graph-with-edges.golden.json
|
||||
|
||||
datasets/edges/
|
||||
schema/
|
||||
edge.schema.json
|
||||
reason-registry.json
|
||||
samples/
|
||||
java-spring-boot/
|
||||
edges.ndjson
|
||||
expected-hashes.txt
|
||||
```
|
||||
|
||||
**Fixture requirements:**
|
||||
|
||||
1. Each reason code has at least one fixture
|
||||
2. Fixtures include expected `edge_id` hash
|
||||
3. Golden outputs frozen after review
|
||||
4. CI verifies hash stability
|
||||
|
||||
### EG8: Propagation into Explanation Graphs/VEX
|
||||
|
||||
**Explanation graph inclusion:**
|
||||
|
||||
```json
|
||||
{
|
||||
"explanation": {
|
||||
"path": [
|
||||
{
|
||||
"node": "sym:java:main...",
|
||||
"outgoing_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"to": "sym:java:handler...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.98
|
||||
}
|
||||
},
|
||||
{
|
||||
"node": "sym:java:handler...",
|
||||
"outgoing_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"to": "sym:java:log4j...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.95
|
||||
}
|
||||
}
|
||||
],
|
||||
"aggregate_path_confidence": 0.93
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**VEX evidence format:**
|
||||
|
||||
```json
|
||||
{
|
||||
"stellaops:reachability": {
|
||||
"path_edges": [
|
||||
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
|
||||
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
|
||||
],
|
||||
"weakest_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"aggregate_confidence": 0.93
|
||||
}
|
||||
}
|
||||
```
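The example values are consistent with multiplying edge confidences along the path (0.98 × 0.95 = 0.931, reported as 0.93) and reporting the minimum as the weakest edge. The document does not state that rule explicitly, so the sketch below is an assumption that reproduces the example:

```python
from math import prod
from typing import List


def summarize_path(path_edges: List[dict]) -> dict:
    """Build a stellaops:reachability summary for one call path (assumed product rule)."""
    if not path_edges:
        raise ValueError("a path needs at least one edge")
    weakest = min(path_edges, key=lambda edge: edge["confidence"])
    aggregate = prod(edge["confidence"] for edge in path_edges)
    return {
        "path_edges": path_edges,
        "weakest_edge": weakest,
        "aggregate_confidence": round(aggregate, 2),
    }
```

A product rule means every additional uncertain hop lowers the path score, so long heuristic chains are naturally penalized relative to short direct ones.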
|
||||
|
||||
### EG9: Localization Guidance
|
||||
|
||||
**Localizable elements:**
|
||||
|
||||
| Element | Localization | Example |
|
||||
|---------|--------------|---------|
|
||||
| Reason code display | Message catalog | `bytecode-invoke` -> "Bytecode method call" |
|
||||
| Confidence level | Message catalog | `high` -> "High confidence" |
|
||||
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
|
||||
| Error messages | Message catalog | Standard error codes |
|
||||
|
||||
**Message catalog structure:**
|
||||
|
||||
```json
|
||||
{
|
||||
"locale": "en-US",
|
||||
"messages": {
|
||||
"edge.reason.bytecode-invoke": "Bytecode method call",
|
||||
"edge.reason.plt-stub": "PLT/GOT library call",
|
||||
"edge.confidence.high": "High confidence ({0:P0})",
|
||||
"edge.evidence.location": "Detected at offset {offset} in {file}"
|
||||
}
|
||||
}
|
||||
```
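A minimal lookup sketch with `en-US` fallback, assuming named `{placeholder}` templates such as `edge.evidence.location`. The `edge.confidence.high` entry uses .NET composite-format syntax (`{0:P0}`), which Python's `str.format` does not understand, so it is left out here. The function names are hypothetical:

```python
from typing import Dict

CATALOGS: Dict[str, Dict[str, str]] = {
    "en-US": {
        "edge.reason.bytecode-invoke": "Bytecode method call",
        "edge.reason.plt-stub": "PLT/GOT library call",
        "edge.evidence.location": "Detected at offset {offset} in {file}",
    },
}


def localize(key: str, locale: str = "en-US", **params: object) -> str:
    """Resolve key in locale, then en-US; return the raw key if still missing."""
    for candidate in (locale, "en-US"):
        template = CATALOGS.get(candidate, {}).get(key)
        if template is not None:
            return template.format(**params)
    return key


def reason_label(code: str, locale: str = "en-US") -> str:
    key = f"edge.reason.{code}"
    label = localize(key, locale)
    # Unknown or custom codes display as the code itself rather than the catalog key.
    return code if label == key else label
```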
|
||||
|
||||
**Supported locales:**
|
||||
|
||||
- `en-US` (default)
|
||||
- Additional locales via contribution
|
||||
|
||||
### EG10: Backfill Plan
|
||||
|
||||
**Backfill strategy:**
|
||||
|
||||
1. **Phase 1:** Add reason codes to new edges (no backfill needed)
|
||||
2. **Phase 2:** Run detector upgrade on graphs without reason codes
|
||||
3. **Phase 3:** Mark old graphs as `requires_reanalysis` in metadata
|
||||
|
||||
**Migration script:**
|
||||
|
||||
```bash
|
||||
stella edge backfill --graph blake3:... --dry-run
|
||||
|
||||
# Output:
|
||||
Graph: blake3:a1b2c3d4...
|
||||
Edges without reason: 1234
|
||||
Edges to update: 1234
|
||||
|
||||
Dry run - no changes made.
|
||||
|
||||
# Execute:
|
||||
stella edge backfill --graph blake3:... --execute
|
||||
```
|
||||
|
||||
**Backfill metadata:**
|
||||
|
||||
```json
|
||||
{
|
||||
"backfill": {
|
||||
"status": "complete",
|
||||
"original_analyzer_version": "1.0.0",
|
||||
"backfill_analyzer_version": "1.2.0",
|
||||
"backfilled_at": "2025-12-13T10:00:00Z",
|
||||
"edges_updated": 1234
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Related Documentation
|
||||
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [Explainability Schema](./explainability-schema.md) - Explanation format
|
||||
- [Hybrid Attestation](./hybrid-attestation.md) - Edge bundle DSSE
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history._
|
||||
101
docs/modules/reach-graph/schemas/evidence-schema.md
Normal file
@@ -0,0 +1,101 @@
|
||||
# Reachability Evidence Schema (Draft v1, Nov 2026)
|
||||
|
||||
Purpose: define the canonical fields for reachability graph nodes/edges, runtime facts, and unknowns so Scanner, Signals, Policy, Replay, CLI/UI, and SbomService stay aligned. This replaces scattered notes in advisories.
|
||||
|
||||
## 1. Core identifiers
|
||||
|
||||
- `symbol_id`: canonical ID for a function/symbol; includes `{format, build_id?, file_hash?, section?, addr, length}` plus optional `code_block_hash`. Always deterministic and lowercase.
|
||||
- `code_id`: `{format, build_id?, file_hash?, start, length, code_block_hash?}`; used when symbol names are absent.
|
||||
- `symbol_digest`: sha256 of normalized signature (demangled name + params + return type; strip addresses). For stripped code, combine synthetic name + block hash.
|
||||
- `purl`: package URL of the owning component (from SBOM resolver); `pkg:unknown` when unresolved.
|
||||
|
||||
## 2. Graph payload (`richgraph-v1` additions)
|
||||
|
||||
```jsonc
|
||||
{
|
||||
"nodes": [
|
||||
{
|
||||
"id": "sym:sha256:...",
|
||||
"symbol_id": "func:ELF:sha256:...",
|
||||
"code_id": "code:ELF:sha256:...",
|
||||
"code_block_hash": "sha256:deadbeef...",
|
||||
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64",
|
||||
"symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF", "confidence": 0.98 },
|
||||
"build_id": "a1b2c3...",
|
||||
"lang": "c",
|
||||
"evidence": ["dwarf", "dynsym"],
|
||||
"analyzer": { "name": "scanner.native", "version": "1.2.0", "toolchain": "ghidra-11" }
|
||||
}
|
||||
],
|
||||
"edges": [
|
||||
{
|
||||
"from": "sym:sha256:caller",
|
||||
"to": "sym:sha256:callee",
|
||||
"kind": "direct|plt|indirect|runtime",
|
||||
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64", // callee owner
|
||||
"symbol_digest": "sha256:...", // callee digest
|
||||
"candidates": ["pkg:deb/openssl@3.0.2", "pkg:deb/openssl@3.0.1"],
|
||||
"confidence": 0.92,
|
||||
"evidence": ["import", "reloc@GOT"]
|
||||
}
|
||||
],
|
||||
"roots": [
|
||||
{ "id": "init_array@0x401000", "phase": "load", "source": "DT_INIT_ARRAY" },
|
||||
{ "id": "main", "phase": "runtime" }
|
||||
],
|
||||
"graph_hash": "blake3:..."
|
||||
}
|
||||
```
|
||||
|
||||
## 2.5 Attestation levels (hybrid default)
|
||||
|
||||
- **Graph DSSE (required):** one DSSE envelope over the canonical graph JSON (sorted arrays/keys) with `graph_hash` = BLAKE3 of body; Rekor publish always (or mirror when offline).
|
||||
- **Edge-bundle DSSE (optional):** batches of ≤512 edges, emitted only for high-signal cases (`runtime`, `init_array`/TLS roots, contested/third-party edges). Each bundle carries `graph_hash`, `bundle_reason`, per-edge `reason`, `symbol_digest`, `purl`, `confidence`, and optional `revoked=true` for quarantine. Rekor publish is configurable; CAS storage is mandatory.
|
||||
- CAS layout additions:
|
||||
- Graph body: `cas://reachability/graphs/{blake3}`
|
||||
- Graph DSSE: `cas://reachability/graphs/{blake3}.dsse`
|
||||
- Edge bundle: `cas://reachability/edges/{graph_hash}/{bundle_id}` + `.dsse`
|
||||
- Determinism: bundle ordering by `(bundle_reason, edge_id)`; arrays sorted before hashing.
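A sketch of deterministic bundle assembly. It assumes each candidate edge already carries its `edge_id` and the `bundle_reason` that selected it, and that `bundle_id` values are numbered sequentially (`bundle:001`, `bundle:002`, ...) as in the binary reachability example; that numbering scheme is hypothetical:

```python
from itertools import groupby
from typing import List

MAX_EDGES_PER_BUNDLE = 512


def build_edge_bundles(graph_hash: str, edges: List[dict]) -> List[dict]:
    """Order edges by (bundle_reason, edge_id) and cut them into bundles of <= 512."""
    ordered = sorted(edges, key=lambda e: (e["bundle_reason"], e["edge_id"]))
    bundles: List[dict] = []
    for reason, group in groupby(ordered, key=lambda e: e["bundle_reason"]):
        members = list(group)
        for start in range(0, len(members), MAX_EDGES_PER_BUNDLE):
            batch = members[start:start + MAX_EDGES_PER_BUNDLE]
            bundles.append({
                "graph_hash": graph_hash,
                "bundle_id": f"bundle:{len(bundles) + 1:03d}",
                "bundle_reason": reason,
                "edge_count": len(batch),
                "edges": batch,
            })
    return bundles
```

Because sorting happens before grouping and splitting, the same edge set always yields the same bundles regardless of input order.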
|
||||
|
||||
## 3. Runtime facts (Signals ingestion)
|
||||
|
||||
Fields per NDJSON event:
|
||||
|
||||
- `symbolId` (required), `codeId`, `symbolDigest?`, `purl?`
|
||||
- `hitCount`, `observedAt`, `loaderBase`, `processId`, `processName`, `containerId`, `socketAddress?`
|
||||
- `callgraphId` or `scanId`, plus `evidenceUri` (CAS) if trace stored externally
|
||||
- Determinism: sort keys when persisting; timestamps UTC ISO-8601.
|
||||
|
||||
## 4. Unknowns registry payload
|
||||
|
||||
See `docs/modules/signals/guides/unknowns-registry.md`; reachability producers emit Unknowns when:
|
||||
- symbol→purl unresolved,
|
||||
- call edge target unresolved,
|
||||
- build-id missing for ELF and file hash used instead.
|
||||
|
||||
Unknowns must include `unknown_type`, `scope`, `provenance`, `confidence.p`, and `labels`.
|
||||
|
||||
## 5. CAS layout
|
||||
|
||||
- Graphs: `cas://reachability/graphs/{blake3}` (canonical JSON, sorted keys/arrays)
|
||||
- Runtime traces: `cas://reachability/runtime/{sha256}`
|
||||
- Unknowns evidence (optional large blobs): `cas://unknowns/{sha256}`
|
||||
- Edge bundles: `cas://reachability/edges/{graph_hash}/{bundle_id}` (JSON + `.dsse`)
|
||||
|
||||
Metadata for each CAS object: `{ schema: "richgraph-v1", analyzer: {name,version}, createdAtUtc, toolchain_digest }`. When analyzer metadata is supplied at ingest (Signals OpenAPI), persist it alongside parsed analyzer fields from the artifact.
|
||||
|
||||
## 6. Validation rules
|
||||
|
||||
- All edges must carry either `purl` or `candidates[]`; never leave both empty.
|
||||
- If `build_id` present, `symbol_id` and `code_id` must store it; if absent, record `build_id_source: "FileHash"`.
|
||||
- Evidence arrays sorted; confidence in [0,1].
|
||||
- `code_block_hash` (when present) must be lowercase hex with an algorithm prefix (e.g., `sha256:`) and only accompany stripped/heuristic nodes.
|
||||
- Roots must include load-time constructors when present.
|
||||
- When `edge_bundles` are present, each edge in a bundle must also exist in the graph edge set; `revoked=true` bundles override graph edges for policy/scoring.
|
||||
- Graph DSSE is mandatory per scan; edge-bundle DSSEs are optional but must reference `graph_hash` and `bundle_id`.
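
Several of these rules can be checked mechanically. The following Python sketch assumes a simplified graph dict with `nodes`, `edges` (keyed by `from`/`to`/`kind`, matching the graph-revision sort rule), optional string `evidence` arrays, and `edge_bundles`; the field layout is an assumption, and the root and DSSE rules are not covered.

```python
from __future__ import annotations

import re

# Lowercase hex with an algorithm prefix, e.g. "sha256:9f2c...".
CODE_BLOCK_HASH = re.compile(r"[a-z0-9]+:[0-9a-f]+")


def edge_key(edge: dict) -> tuple:
    return (edge.get("from"), edge.get("to"), edge.get("kind"))


def validate_graph(graph: dict) -> list[str]:
    """Return human-readable rule violations; an empty list means the graph passed."""
    errors = []
    graph_edges = set()
    for edge in graph.get("edges", []):
        key = edge_key(edge)
        graph_edges.add(key)
        if not edge.get("purl") and not edge.get("candidates"):
            errors.append(f"edge {key}: needs purl or candidates[]")
        confidence = edge.get("confidence")
        if confidence is not None and not 0.0 <= confidence <= 1.0:
            errors.append(f"edge {key}: confidence {confidence} outside [0,1]")
        evidence = edge.get("evidence", [])
        if evidence != sorted(evidence):
            errors.append(f"edge {key}: evidence array not sorted")
    for node in graph.get("nodes", []):
        block_hash = node.get("code_block_hash")
        if block_hash is not None and not CODE_BLOCK_HASH.fullmatch(block_hash):
            errors.append(f"node {node.get('id')}: malformed code_block_hash")
    # Every bundled edge must also exist in the graph edge set.
    for bundle in graph.get("edge_bundles", []):
        for edge in bundle.get("edges", []):
            if edge_key(edge) not in graph_edges:
                errors.append(f"bundle {bundle.get('bundle_id')}: edge {edge_key(edge)} not in graph")
    return errors
```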
|
||||
|
||||
## 7. Acceptance checklist
|
||||
|
||||
- Schema reflected in Scanner/Signals DTOs and OpenAPI responses.
|
||||
- CAS writers enforce canonicalization before hashing.
|
||||
- Fixtures include: build-id present/absent, init-array roots, purl-resolved imports-only edge, stripped binary with block-hash symbol digest, and an Unknowns case.
|
||||
454
docs/modules/reach-graph/schemas/explainability-schema.md
Normal file
@@ -0,0 +1,454 @@
|
||||
# Explainability Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Policy Guild + Docs Guild._
|
||||
|
||||
This document defines the explainability schema addressing gaps EX1-EX10 from the November 2025 product findings. It specifies the canonical format for vulnerability verdict explanations, DSSE signing policy, CAS storage rules, and export/replay formats.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Explainability provides auditable, machine-readable rationale for every vulnerability verdict. Each explanation includes:
|
||||
|
||||
- **Decision chain:** Ordered list of rules/policies that contributed to the verdict
|
||||
- **Evidence links:** References to graphs, runtime facts, VEX statements, and SBOM components
|
||||
- **Confidence scores:** Per-rule and aggregate confidence values
|
||||
- **Redaction metadata:** PII handling and data classification
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### EX1: Schema/Canonicalization + Hashes
|
||||
|
||||
**Explanation schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.explanation@v1",
|
||||
"explanation_id": "explain:sha256:{hex}",
|
||||
"finding_id": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
|
||||
"verdict": {
|
||||
"status": "affected",
|
||||
"severity": {"normalized": "Critical", "score": 10.0},
|
||||
"confidence": 0.92
|
||||
},
|
||||
"decision_chain": [
|
||||
{
|
||||
"rule_id": "rule:reachability_gate",
|
||||
"rule_version": "1.0.0",
|
||||
"inputs": {
|
||||
"reachability.state": "CR",
|
||||
"reachability.confidence": 0.92
|
||||
},
|
||||
"output": {"allowed": true, "contribution": 0.4},
|
||||
"evidence_refs": ["cas://reachability/graphs/blake3:..."]
|
||||
},
|
||||
{
|
||||
"rule_id": "rule:severity_baseline",
|
||||
"rule_version": "1.0.0",
|
||||
"inputs": {
|
||||
"cvss_base": 10.0,
|
||||
"epss_percentile": 0.95
|
||||
},
|
||||
"output": {"severity": "Critical", "contribution": 0.6},
|
||||
"evidence_refs": ["cas://advisories/CVE-2021-44228.json"]
|
||||
}
|
||||
],
|
||||
"aggregate_confidence": 0.88,
|
||||
"created_at": "2025-12-13T10:00:00Z",
|
||||
"policy_version": "sha256:...",
|
||||
"graph_revision_id": "rev:blake3:..."
|
||||
}
|
||||
```
|
||||
|
||||
**Canonicalization rules:**
|
||||
|
||||
1. JSON keys sorted alphabetically at all levels
|
||||
2. Arrays in `decision_chain` ordered by rule execution sequence
|
||||
3. `evidence_refs` arrays sorted alphabetically
|
||||
4. No insignificant whitespace (compact separators), UTF-8 encoding
|
||||
5. Hash computed over canonical JSON: `sha256(canonical_json)`
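
A Python sketch of these canonicalization rules. One assumption goes beyond the rules as written: because `explanation_id` embeds the hash, it is removed before hashing. The sketch also does not implement every RFC 8785 (JSON Canonicalization Scheme) detail, such as its number serialization.

```python
from __future__ import annotations

import copy
import hashlib
import json


def canonical_json(doc: dict) -> bytes:
    # Rules 1 and 4: keys sorted at every level, compact separators, UTF-8.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def explanation_id(explanation: dict) -> str:
    body = copy.deepcopy(explanation)
    # Assumption: the self-referential explanation_id is excluded from the hashed body.
    body.pop("explanation_id", None)
    # Rule 3: evidence_refs sorted; rule 2: decision_chain keeps execution order.
    for step in body.get("decision_chain", []):
        step["evidence_refs"] = sorted(step.get("evidence_refs", []))
    # Rule 5: sha256 over the canonical bytes.
    return "explain:sha256:" + hashlib.sha256(canonical_json(body)).hexdigest()
```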
|
||||
|
||||
### EX2: DSSE Predicate/Signing Policy
|
||||
|
||||
**DSSE predicate type:**
|
||||
|
||||
```
|
||||
stella.ops/explanation@v1
|
||||
```
|
||||
|
||||
**Signing policy:**
|
||||
|
||||
| Element | Required | Signer |
|
||||
|---------|----------|--------|
|
||||
| Explanation body | Yes | Policy Engine key |
|
||||
| Graph DSSE reference | Yes (if reachability cited) | Scanner key |
|
||||
| VEX DSSE reference | Yes (if VEX cited) | Policy Engine key |
|
||||
|
||||
**DSSE envelope structure:**
|
||||
|
||||
```json
|
||||
{
|
||||
"payloadType": "application/vnd.stellaops.explanation+json",
|
||||
"payload": "<base64(canonical_explanation_json)>",
|
||||
"signatures": [
|
||||
{
|
||||
"keyid": "policy-engine-signing-2025",
|
||||
"sig": "base64:..."
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Signing requirements:**
|
||||
|
||||
- All explanations must be signed before CAS storage
|
||||
- Signing key must be registered in Authority key store
|
||||
- Key rotation triggers re-signing of active explanations (configurable)
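
DSSE signatures cover the pre-authentication encoding (PAE) of the payload type and payload, not the raw JSON. The Python sketch below builds a PAE as defined by the DSSE v1 specification and wraps it in the envelope shape shown above. HMAC-SHA256 is only a stdlib stand-in for the asymmetric Policy Engine key registered in Authority, and `sig` is emitted as plain base64, as DSSE defines it.

```python
import base64
import hashlib
import hmac

PAYLOAD_TYPE = "application/vnd.stellaops.explanation+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: the exact bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def sign_envelope(payload: bytes, keyid: str, key: bytes) -> dict:
    # Stand-in signer: HMAC keeps the sketch stdlib-only; production uses ECDSA/Ed25519-style keys.
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> bool:
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    return any(
        hmac.compare_digest(expected, base64.b64decode(s["sig"]))
        for s in envelope["signatures"]
    )
```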
|
||||
|
||||
### EX3: CAS Storage Rules for Evidence
|
||||
|
||||
**Storage layout:**
|
||||
|
||||
```
|
||||
cas://explanations/
|
||||
{sha256}/ # Explanation body
|
||||
{sha256}.dsse # DSSE envelope
|
||||
by-finding/{finding_id}/ # Index by finding
|
||||
by-policy/{policy_digest}/ # Index by policy version
|
||||
by-graph/{graph_revision_id}/ # Index by graph revision
|
||||
```
|
||||
|
||||
**Storage rules:**
|
||||
|
||||
1. Explanations are immutable after signing
|
||||
2. New verdicts create new explanation documents (no updates)
|
||||
3. Previous explanations are retained per retention policy
|
||||
4. Cross-references validated at write time (graphs, VEX must exist)
|
||||
|
||||
**Deduplication:**
|
||||
|
||||
- Identical canonical JSON produces identical hash
|
||||
- CAS returns existing reference if content matches
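
The immutability and deduplication rules can be sketched with a hypothetical in-memory store. Objects are keyed by the sha256 of the canonical body, there is no update path, and an unsigned body is rejected. The URI shape is simplified from the layout diagram.

```python
import hashlib


class ExplanationStore:
    """Hypothetical in-memory stand-in for the cas://explanations/ namespace."""

    def __init__(self):
        self._objects = {}

    def put(self, canonical_body: bytes, dsse_envelope: bytes) -> str:
        if not dsse_envelope:
            raise ValueError("explanations must be signed before CAS storage")
        uri = "cas://explanations/" + hashlib.sha256(canonical_body).hexdigest()
        if uri not in self._objects:
            # First write wins; there is no update path, so stored objects stay immutable.
            self._objects[uri] = canonical_body
            self._objects[uri + ".dsse"] = dsse_envelope
        # Identical canonical JSON always maps to the same reference (deduplication).
        return uri

    def get(self, uri: str) -> bytes:
        return self._objects[uri]

    def __len__(self) -> int:
        return len(self._objects)
```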
|
||||
|
||||
### EX4: Link to Decision/Policy and graph_revision_id
|
||||
|
||||
**Required links:**
|
||||
|
||||
```json
|
||||
{
|
||||
"links": {
|
||||
"policy_version": "sha256:7e1d...",
|
||||
"policy_uri": "cas://policy/versions/sha256:7e1d...",
|
||||
"graph_revision_id": "rev:blake3:a1b2...",
|
||||
"graph_uri": "cas://reachability/revisions/blake3:a1b2...",
|
||||
"sbom_digest": "sha256:def4...",
|
||||
"sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
|
||||
"vex_digest": "sha256:e5f6...",
|
||||
"vex_uri": "cas://excititor/vex/openvex.json"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Validation:**
|
||||
|
||||
- All linked artifacts must exist at explanation creation time
|
||||
- Links are verified during replay/audit
|
||||
- Broken links cause replay verification failure
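
A Python sketch of the link check, assuming CAS contents are available as a `uri -> bytes` mapping. It reports missing targets and, where a `<name>_digest` field pairs with a `<name>_uri` (here `sbom` and `vex`), a sha256 mismatch. `policy_version` and `graph_revision_id` use different naming and hash schemes, so this sketch only checks that their URIs exist.

```python
import hashlib


def check_links(links: dict, cas_objects: dict) -> list:
    """Return problems for *_uri links that are missing or whose sha256 digest does not match."""
    problems = []
    for name, uri in sorted(links.items()):
        if not name.endswith("_uri"):
            continue
        data = cas_objects.get(uri)
        if data is None:
            problems.append(f"{name}: missing {uri}")
            continue
        expected = links.get(name[: -len("_uri")] + "_digest")
        if expected and expected.startswith("sha256:"):
            actual = "sha256:" + hashlib.sha256(data).hexdigest()
            if actual != expected:
                problems.append(f"{name}: digest mismatch")
    return problems
```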
|
||||
|
||||
### EX5: Export/Replay Bundle Format
|
||||
|
||||
**Export bundle manifest:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.explanation.bundle@v1",
|
||||
"bundle_id": "bundle:explain:2025-12-13",
|
||||
"created_at": "2025-12-13T10:00:00Z",
|
||||
"explanations": [
|
||||
{
|
||||
"explanation_id": "explain:sha256:...",
|
||||
"finding_id": "...",
|
||||
"explanation_uri": "explanations/sha256:....json",
|
||||
"dsse_uri": "explanations/sha256:....dsse"
|
||||
}
|
||||
],
|
||||
"dependencies": {
|
||||
"graphs": [
|
||||
{"revision_id": "rev:blake3:...", "uri": "graphs/blake3:....json"}
|
||||
],
|
||||
"policies": [
|
||||
{"digest": "sha256:...", "uri": "policies/sha256:....json"}
|
||||
],
|
||||
"vex_statements": [
|
||||
{"digest": "sha256:...", "uri": "vex/sha256:....json"}
|
||||
]
|
||||
},
|
||||
"verification": {
|
||||
"bundle_hash": "sha256:...",
|
||||
"signature": "base64:...",
|
||||
"signed_by": "policy-engine-signing-2025"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Replay verification:**
|
||||
|
||||
```bash
|
||||
stella explain verify --bundle ./explanation-bundle.tgz
|
||||
|
||||
# Output:
|
||||
Bundle: bundle:explain:2025-12-13
|
||||
Explanations: 42
|
||||
Dependencies: 5 graphs, 2 policies, 12 VEX
|
||||
|
||||
Verifying explanations...
|
||||
Canonical hashes: 42/42 MATCH
|
||||
DSSE signatures: 42/42 VALID
|
||||
Dependency links: 42/42 RESOLVED
|
||||
|
||||
Replay verification PASSED.
|
||||
```
|
||||
|
||||
### EX6: PII/Redaction Rules
|
||||
|
||||
**Redaction categories:**
|
||||
|
||||
| Category | Redaction | Example |
|
||||
|----------|-----------|---------|
|
||||
| User identifiers | Hash | `user:alice` -> `user:sha256:a1b2...` |
|
||||
| IP addresses | Mask | `192.168.1.100` -> `192.168.x.x` |
|
||||
| File paths | Normalize | `/home/alice/code/...` -> `{HOME}/code/...` |
|
||||
| Email addresses | Hash | `alice@example.com` -> `email:sha256:...` |
|
||||
| API keys/tokens | Omit | `Authorization: Bearer xxx` -> `[REDACTED]` |
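
A Python sketch of the redaction table. It assumes the hash input is the bare identifier (for example `alice` for `user:alice`) and handles IPv4 only. Plain unsalted hashes of low-entropy values such as email addresses can be reversed by dictionary attack, so a keyed hash may be the safer production choice.

```python
import hashlib
import re

IPV4 = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}\b")
HOME_DIR = re.compile(r"/home/[^/\s]+")
BEARER = re.compile(r"Authorization:\s*Bearer\s+\S+")


def _sha256(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def redact_user(user_id: str) -> str:
    # "user:alice" -> "user:sha256:<hex of 'alice'>"
    return "user:sha256:" + _sha256(user_id.split(":", 1)[-1])


def redact_email(email: str) -> str:
    return "email:sha256:" + _sha256(email)


def redact_text(text: str) -> str:
    """Apply the omit, mask, and normalize rules to free text."""
    text = BEARER.sub("[REDACTED]", text)    # tokens are dropped entirely
    text = IPV4.sub(r"\1.\2.x.x", text)      # keep the first two octets
    return HOME_DIR.sub("{HOME}", text)      # /home/<user> -> {HOME}
```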
|
||||
|
||||
**Redaction metadata:**
|
||||
|
||||
```json
|
||||
{
|
||||
"redaction": {
|
||||
"applied": true,
|
||||
"level": "standard",
|
||||
"fields_redacted": ["actor.email", "evidence.file_path"],
|
||||
"redaction_policy": "stellaops.redaction.standard@v1"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Export modes:**
|
||||
|
||||
- `--redacted` (default): Apply standard redaction
|
||||
- `--full`: Include all data (requires `explain:export:full` scope)
|
||||
- `--audit`: Include redaction audit trail
|
||||
|
||||
### EX7: Size Budgets
|
||||
|
||||
**Limits:**
|
||||
|
||||
| Element | Default Limit | Configurable |
|
||||
|---------|--------------|--------------|
|
||||
| Explanation body | 256 KB | Yes |
|
||||
| Decision chain entries | 100 | Yes |
|
||||
| Evidence refs per rule | 20 | Yes |
|
||||
| Total evidence refs | 200 | Yes |
|
||||
| Path entries | 50 | No |
|
||||
|
||||
**Truncation behavior:**
|
||||
|
||||
When limits are exceeded:
|
||||
1. Log warning with truncation details
|
||||
2. Add `truncation` metadata to explanation
|
||||
3. Store full evidence in separate CAS object
|
||||
4. Include `full_evidence_uri` reference
|
||||
|
||||
```json
|
||||
{
|
||||
"truncation": {
|
||||
"applied": true,
|
||||
"elements_truncated": ["decision_chain", "evidence_refs"],
|
||||
"full_evidence_uri": "cas://explanations/full/sha256:..."
|
||||
}
|
||||
}
|
||||
```
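
The truncation steps can be sketched in Python for two of the budgets (decision chain entries and evidence refs per rule). The function returns the trimmed explanation together with the full canonical bytes, which would be stored as the separate CAS object that `full_evidence_uri` points to. The limits are the defaults from the table and are hard-coded here only for illustration.

```python
import hashlib
import json
import logging

log = logging.getLogger("explanations")

MAX_CHAIN = 100          # "Decision chain entries: 100"
MAX_REFS_PER_RULE = 20   # "Evidence refs per rule: 20"


def apply_budgets(explanation: dict):
    """Return (possibly truncated explanation, full canonical bytes for separate CAS storage)."""
    full = json.dumps(explanation, sort_keys=True, separators=(",", ":")).encode("utf-8")
    out = json.loads(full)  # cheap deep copy
    truncated = []
    if len(out.get("decision_chain", [])) > MAX_CHAIN:
        out["decision_chain"] = out["decision_chain"][:MAX_CHAIN]
        truncated.append("decision_chain")
    for step in out.get("decision_chain", []):
        if len(step.get("evidence_refs", [])) > MAX_REFS_PER_RULE:
            step["evidence_refs"] = step["evidence_refs"][:MAX_REFS_PER_RULE]
            if "evidence_refs" not in truncated:
                truncated.append("evidence_refs")
    if truncated:
        log.warning("explanation truncated: %s", ", ".join(truncated))  # step 1
        out["truncation"] = {                                           # steps 2-4
            "applied": True,
            "elements_truncated": truncated,
            "full_evidence_uri": "cas://explanations/full/sha256:" + hashlib.sha256(full).hexdigest(),
        }
    return out, full
```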
|
||||
|
||||
### EX8: Versioning
|
||||
|
||||
**Schema versioning:**
|
||||
|
||||
- Schema version in `schema` field: `stellaops.explanation@v1`
|
||||
- Breaking changes increment major version
|
||||
- Minor changes (additive fields) use v1.x
|
||||
- Backward compatibility maintained for 2 major versions
|
||||
|
||||
**Migration support:**
|
||||
|
||||
```bash
|
||||
stella explain migrate --from v1 --to v2 --input ./explanations/
|
||||
|
||||
# Output:
|
||||
Migrating 1000 explanations from v1 to v2...
|
||||
Migrated: 998
|
||||
Skipped (already v2): 2
|
||||
|
||||
Migration complete.
|
||||
```
|
||||
|
||||
**Version compatibility matrix:**
|
||||
|
||||
| API Version | Schema v1 | Schema v2 |
|
||||
|-------------|-----------|-----------|
|
||||
| 1.0.x | Full | N/A |
|
||||
| 1.1.x | Full | Full |
|
||||
| 2.0.x | Read-only | Full |
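
The compatibility matrix can be encoded as a lookup table. In this Python sketch, the `@v1.2` spelling for minor schema versions is an assumption (the text only says "v1.x"), and any API line missing from the table is treated as `N/A`.

```python
import re

# API (major, minor) line -> access level per schema major version, from the table above.
MATRIX = {
    (1, 0): {1: "Full"},
    (1, 1): {1: "Full", 2: "Full"},
    (2, 0): {1: "Read-only", 2: "Full"},
}


def schema_major(schema: str) -> int:
    match = re.fullmatch(r"stellaops\.explanation@v(\d+)(?:\.\d+)?", schema)
    if match is None:
        raise ValueError(f"unrecognized schema: {schema!r}")
    return int(match.group(1))


def access(api_version: str, schema: str) -> str:
    major, minor = (int(part) for part in api_version.split(".")[:2])
    return MATRIX.get((major, minor), {}).get(schema_major(schema), "N/A")
```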
|
||||
|
||||
### EX9: Golden Fixtures/Tests
|
||||
|
||||
**Test fixture location:**
|
||||
|
||||
```
|
||||
tests/Explanation/
|
||||
fixtures/
|
||||
simple-affected.json
|
||||
simple-not-affected.json
|
||||
with-reachability-evidence.json
|
||||
multi-rule-chain.json
|
||||
truncated-evidence.json
|
||||
redacted-pii.json
|
||||
golden/
|
||||
simple-affected.golden.json
|
||||
simple-affected.golden.dsse
|
||||
|
||||
datasets/explanations/
|
||||
schema/
|
||||
explanation.schema.json
|
||||
samples/
|
||||
log4j-affected/
|
||||
explanation.json
|
||||
expected-hash.txt
|
||||
```
|
||||
|
||||
**Test categories:**
|
||||
|
||||
1. **Canonicalization tests:** Verify hash stability across JSON reordering
|
||||
2. **DSSE signing tests:** Verify signature creation and verification
|
||||
3. **Redaction tests:** Verify PII handling
|
||||
4. **Truncation tests:** Verify size budget enforcement
|
||||
5. **Replay tests:** Verify bundle export/import cycle
|
||||
6. **Migration tests:** Verify version upgrade paths
|
||||
|
||||
**CI integration:**
|
||||
|
||||
```yaml
# .gitea/workflows/explanation-tests.yml
name: explanation-tests
on: [push, pull_request]  # trigger set is illustrative; adjust to the repo's CI policy
jobs:
  explanation-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run explanation tests
        run: dotnet test src/Policy/__Tests/StellaOps.Policy.Explanation.Tests
      - name: Verify golden fixtures
        run: scripts/verify-golden-fixtures.sh tests/Explanation/golden/
```
|
||||
|
||||
### EX10: Determinism Guarantees
|
||||
|
||||
**Determinism requirements:**
|
||||
|
||||
1. Same inputs produce identical `explanation_id` hash
|
||||
2. Decision chain ordering is stable (execution order)
|
||||
3. Evidence refs sorted alphabetically
|
||||
4. Timestamps use UTC ISO-8601 with millisecond precision
|
||||
5. Floating-point values rounded to 6 decimal places
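
Requirements 4 and 5 can be applied as a normalization pass before canonicalization. This Python sketch rounds floats to 6 decimal places and renders aware datetimes as UTC ISO-8601 with milliseconds. Sub-millisecond digits are truncated rather than rounded, which is a choice of the sketch, not of the spec.

```python
from datetime import datetime, timezone


def normalize(value):
    """Recursively round floats and format datetimes for deterministic output."""
    if isinstance(value, float):
        return round(value, 6)
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("naive datetimes are ambiguous; attach a timezone")
        utc = value.astimezone(timezone.utc)
        return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item) for item in value]
    return value
```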
|
||||
|
||||
**Verification:**
|
||||
|
||||
```bash
|
||||
# Run twice with same inputs, verify identical hashes
|
||||
stella explain generate --finding "..." --output a.json
|
||||
stella explain generate --finding "..." --output b.json
|
||||
diff a.json b.json # Should be empty
|
||||
|
||||
# Or use built-in verify
|
||||
stella explain verify-determinism --finding "..." --iterations 3
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. API Reference
|
||||
|
||||
### 3.1 Generate Explanation
|
||||
|
||||
```http
|
||||
POST /api/policy/findings/{findingId}/explain
|
||||
Authorization: Bearer <token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"mode": "full",
|
||||
"include_evidence": true,
|
||||
"redaction_level": "standard"
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Get Explanation
|
||||
|
||||
```http
|
||||
GET /api/explanations/{explanationId}
|
||||
Authorization: Bearer <token>
|
||||
Accept: application/json
|
||||
```
|
||||
|
||||
### 3.3 Export Explanation Bundle
|
||||
|
||||
```http
|
||||
POST /api/explanations/export
|
||||
Authorization: Bearer <token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"finding_ids": ["...", "..."],
|
||||
"include_dependencies": true,
|
||||
"redaction_level": "standard"
|
||||
}
|
||||
```
|
||||
|
||||
### 3.4 Verify Explanation
|
||||
|
||||
```http
|
||||
POST /api/explanations/{explanationId}/verify
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. CLI Reference
|
||||
|
||||
```bash
|
||||
# Generate explanation for a finding
|
||||
stella explain generate --finding "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228"
|
||||
|
||||
# Export explanation bundle
|
||||
stella explain export --findings ./finding-ids.txt --output ./bundle.tgz
|
||||
|
||||
# Verify explanation
|
||||
stella explain verify --explanation ./explanation.json --dsse ./explanation.dsse
|
||||
|
||||
# Verify bundle
|
||||
stella explain verify --bundle ./bundle.tgz
|
||||
|
||||
# Check determinism
|
||||
stella explain verify-determinism --finding "..." --iterations 5
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Related Documentation
|
||||
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [Graph Revision Schema](./graph-revision-schema.md) - Graph versioning
|
||||
- [Policy API](../api/policy.md) - Policy Engine REST API
|
||||
- [DSSE Predicates](../modules/attestor/architecture.md) - Signing specifications
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 EXPLAIN-GAPS-401-064 for change history._
|
||||
377
docs/modules/reach-graph/schemas/graph-revision-schema.md
Normal file
@@ -0,0 +1,377 @@
|
||||
# Graph Revision Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Platform Guild._
|
||||
|
||||
This document defines the graph revision schema addressing gaps GR1-GR10 from the November 2025 product findings. It specifies manifest structure, hash algorithms, storage layout, lineage tracking, and governance rules for deterministic, auditable reachability graphs.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Graph revisions provide content-addressable, append-only versioning for `richgraph-v1` documents. Every graph mutation produces a new immutable revision with:
|
||||
|
||||
- **Deterministic hash:** BLAKE3-256 of canonical JSON
|
||||
- **Lineage metadata:** Parent revision + diff summary
|
||||
- **Cross-artifact digests:** Links to SBOM, VEX, policy, and tool versions
|
||||
- **Audit trail:** Timestamp, actor, tenant, and operation type
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### GR1: Manifest Schema + Canonical Hash Rules
|
||||
|
||||
**Manifest schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.graph.revision@v1",
|
||||
"revision_id": "rev:blake3:a1b2c3d4e5f6...",
|
||||
"graph_hash": "blake3:a1b2c3d4e5f6...",
|
||||
"parent_revision_id": "rev:blake3:9f8e7d6c5b4a...",
|
||||
"created_at": "2025-12-13T10:00:00Z",
|
||||
"created_by": "service:scanner",
|
||||
"tenant_id": "tenant:acme",
|
||||
"shard_id": "shard:01",
|
||||
"operation": "create",
|
||||
"lineage": {
|
||||
"depth": 3,
|
||||
"root_revision_id": "rev:blake3:1a2b3c4d5e6f..."
|
||||
},
|
||||
"cross_artifacts": {
|
||||
"sbom_digest": "sha256:...",
|
||||
"vex_digest": "sha256:...",
|
||||
"policy_digest": "sha256:...",
|
||||
"analyzer_digest": "sha256:..."
|
||||
},
|
||||
"diff_summary": {
|
||||
"nodes_added": 12,
|
||||
"nodes_removed": 3,
|
||||
"edges_added": 24,
|
||||
"edges_removed": 8,
|
||||
"roots_changed": false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Canonical hash rules:**
|
||||
|
||||
1. JSON keys sorted alphabetically at all nesting levels
|
||||
2. No whitespace/indentation (compact JSON)
|
||||
3. UTF-8 encoding, no BOM
|
||||
4. Arrays sorted by deterministic key (nodes by `id`, edges by `from,to,kind`)
|
||||
5. Null/empty values omitted
|
||||
6. Numeric values without trailing zeros
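
A Python sketch of rules 1-6 that produces the canonical bytes to hash. Python's `hashlib` has no BLAKE3, so hashing is left to a BLAKE3 implementation such as the third-party `blake3` package. Rule 6 is handled only for integral floats (`1.0` becomes `1`); exponent forms such as `1e-07` are not rewritten here.

```python
import json


def _prune(value):
    """Drop null/empty values (rule 5) and write integral floats as ints (rule 6)."""
    if isinstance(value, dict):
        pruned = {key: _prune(item) for key, item in value.items()}
        return {key: item for key, item in pruned.items() if item not in (None, "", [], {})}
    if isinstance(value, list):
        return [_prune(item) for item in value]
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def canonical_graph_bytes(graph: dict) -> bytes:
    g = dict(graph)
    # Rule 4: nodes by id, edges by (from, to, kind).
    g["nodes"] = sorted(graph.get("nodes", []), key=lambda n: n["id"])
    g["edges"] = sorted(graph.get("edges", []), key=lambda e: (e["from"], e["to"], e["kind"]))
    # Rules 1-2: sorted keys, compact JSON; rule 3: UTF-8 without BOM.
    body = json.dumps(_prune(g), sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return body.encode("utf-8")
```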
|
||||
|
||||
### GR2: Mandated BLAKE3-256 Encoding
|
||||
|
||||
All graph-level hashes use BLAKE3-256 with the following format:
|
||||
|
||||
```
|
||||
blake3:{64_hex_chars}
|
||||
```
|
||||
|
||||
Example:
|
||||
```
|
||||
blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
|
||||
```
|
||||
|
||||
**Rationale:**
|
||||
- BLAKE3 is 3x+ faster than SHA-256 on modern CPUs
|
||||
- Parallelizable for large graphs (>100K nodes)
|
||||
- Cryptographically secure: 256-bit digest with a 128-bit security level, matching SHA-256 collision resistance
|
||||
- Algorithm prefix enables future migration
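
A small Python validator for the `blake3:{64_hex_chars}` format, assuming lowercase hex as in the example above. The raw digest itself would come from a BLAKE3 library, for example `blake3(data).hexdigest()` with the third-party `blake3` package.

```python
import re

_GRAPH_HASH = re.compile(r"blake3:([0-9a-f]{64})")


def parse_graph_hash(value: str) -> str:
    """Return the 64-char hex part of a 'blake3:<hex>' digest, or raise ValueError."""
    match = _GRAPH_HASH.fullmatch(value)
    if match is None:
        raise ValueError(f"expected blake3:<64 lowercase hex chars>, got {value!r}")
    return match.group(1)


def format_graph_hash(hex_digest: str) -> str:
    """Prefix a raw BLAKE3-256 hex digest; the prefix leaves room for future algorithms."""
    value = "blake3:" + hex_digest.lower()
    parse_graph_hash(value)  # reject wrong lengths or non-hex input
    return value
```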
|
||||
|
||||
### GR3: Append-Only Storage
|
||||
|
||||
Graph revisions are immutable. Operations:
|
||||
|
||||
| Operation | Creates New Revision | Modifies Existing |
|
||||
|-----------|---------------------|-------------------|
|
||||
| `create` | Yes | No |
|
||||
| `update` | Yes | No |
|
||||
| `merge` | Yes | No |
|
||||
| `tombstone` | Yes | No |
|
||||
| `read` | No | No |
|
||||
|
||||
**Storage layout:**
|
||||
|
||||
```
|
||||
cas://reachability/
|
||||
revisions/
|
||||
{blake3}/ # Revision manifest
|
||||
{blake3}.graph # Graph body
|
||||
{blake3}.dsse # DSSE envelope
|
||||
indices/
|
||||
by-tenant/{tenant_id}/ # Tenant index
|
||||
by-sbom/{sbom_digest}/ # SBOM correlation
|
||||
by-root/{root_revision_id}/ # Lineage tree
|
||||
```
|
||||
|
||||
### GR4: Lineage/Diff Metadata
|
||||
|
||||
Every revision tracks its lineage:
|
||||
|
||||
```json
|
||||
{
|
||||
"lineage": {
|
||||
"depth": 5,
|
||||
"root_revision_id": "rev:blake3:...",
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"merge_parents": []
|
||||
},
|
||||
"diff_summary": {
|
||||
"nodes_added": 12,
|
||||
"nodes_removed": 3,
|
||||
"nodes_modified": 0,
|
||||
"edges_added": 24,
|
||||
"edges_removed": 8,
|
||||
"edges_modified": 0,
|
||||
"roots_added": 0,
|
||||
"roots_removed": 0
|
||||
},
|
||||
"diff_detail_uri": "cas://reachability/diffs/{parent_hash}_{child_hash}.ndjson"
|
||||
}
|
||||
```
|
||||
|
||||
**Diff detail format (NDJSON):**
|
||||
|
||||
```ndjson
|
||||
{"op":"add","path":"nodes","value":{"id":"sym:java:...","display":"..."}}
|
||||
{"op":"remove","path":"edges","from":"sym:java:a","to":"sym:java:b"}
|
||||
```
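
The `diff_summary` counts can be derived by keying nodes on `id` and edges on `(from, to, kind)`, the same keys as the GR1 sort rule. This Python sketch also assumes roots are listed in a `roots` array. An element present in both revisions with different content counts as modified.

```python
def diff_summary(parent: dict, child: dict) -> dict:
    summary = {}
    keyed = (
        ("nodes", lambda n: n["id"]),
        ("edges", lambda e: (e["from"], e["to"], e["kind"])),
    )
    for name, key in keyed:
        old = {key(item): item for item in parent.get(name, [])}
        new = {key(item): item for item in child.get(name, [])}
        summary[f"{name}_added"] = len(new.keys() - old.keys())
        summary[f"{name}_removed"] = len(old.keys() - new.keys())
        # Same key in both revisions but different content.
        summary[f"{name}_modified"] = sum(1 for k in old.keys() & new.keys() if old[k] != new[k])
    old_roots, new_roots = set(parent.get("roots", [])), set(child.get("roots", []))
    summary["roots_added"] = len(new_roots - old_roots)
    summary["roots_removed"] = len(old_roots - new_roots)
    return summary
```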
|
||||
|
||||
### GR5: Cross-Artifact Digests (SBOM/VEX/Policy/Tool)
|
||||
|
||||
Every revision links to related artifacts:
|
||||
|
||||
```json
|
||||
{
|
||||
"cross_artifacts": {
|
||||
"sbom_digest": "sha256:...",
|
||||
"sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
|
||||
"sbom_format": "cyclonedx-1.6",
|
||||
"vex_digest": "sha256:...",
|
||||
"vex_uri": "cas://excititor/vex/openvex.json",
|
||||
"policy_digest": "sha256:...",
|
||||
"policy_version": "P-7:v4",
|
||||
"analyzer_digest": "sha256:...",
|
||||
"analyzer_name": "scanner.java",
|
||||
"analyzer_version": "1.2.0"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### GR6: UI/CLI Surfacing of Full/Short IDs
|
||||
|
||||
**Full ID format:**
|
||||
```
|
||||
rev:blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
|
||||
```
|
||||
|
||||
**Short ID format (for display):**
|
||||
```
|
||||
rev:a1b2c3d4
|
||||
```
|
||||
|
||||
**CLI commands:**
|
||||
|
||||
```bash
|
||||
# List revisions (short IDs by default)
stella graph revisions --scan-id scan-123

# Output:
REVISION       CREATED              NODES   EDGES   PARENT
rev:a1b2c3d4   2025-12-13T10:00:00  1247    3891    rev:9f8e7d6c
rev:9f8e7d6c   2025-12-12T15:30:00  1235    3867    rev:1a2b3c4d

# Show full IDs
stella graph revisions --scan-id scan-123 --full
|
||||
```
|
||||
|
||||
**UI display:**
|
||||
|
||||
- Revision chips show short ID with copy-to-clipboard for full ID
|
||||
- Hover tooltip shows full ID and creation timestamp
|
||||
- Lineage tree visualization available in "Revision History" drawer
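
Short IDs are the first 8 hex characters of the BLAKE3 digest. This Python sketch derives them and expands one back to a full ID, refusing ambiguous prefixes. Whether the CLI or UI accepts short IDs as input is an assumption; `resolve_short_id` is a hypothetical helper.

```python
import re

_FULL_ID = re.compile(r"rev:blake3:([0-9a-f]{64})")


def short_revision_id(full_id: str, length: int = 8) -> str:
    """'rev:blake3:<64 hex>' -> 'rev:<first `length` hex chars>' for display."""
    match = _FULL_ID.fullmatch(full_id)
    if match is None:
        raise ValueError(f"not a full revision ID: {full_id!r}")
    return "rev:" + match.group(1)[:length]


def resolve_short_id(short_id: str, known_ids: list) -> str:
    """Expand a short ID back to exactly one full ID (hypothetical helper)."""
    if _FULL_ID.fullmatch(short_id):
        return short_id
    if not short_id.startswith("rev:"):
        raise ValueError(f"not a revision ID: {short_id!r}")
    prefix = "rev:blake3:" + short_id[len("rev:"):]
    matches = [full for full in known_ids if full.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"{short_id} matches {len(matches)} revisions")
    return matches[0]
```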
|
||||
|
||||
### GR7: Shard/Tenant Context
|
||||
|
||||
Every revision includes partition context:
|
||||
|
||||
```json
|
||||
{
|
||||
"tenant_id": "tenant:acme",
|
||||
"shard_id": "shard:01",
|
||||
"namespace": "prod",
|
||||
"workspace_id": "ws:default"
|
||||
}
|
||||
```
|
||||
|
||||
**Tenant isolation:**
|
||||
|
||||
- Revisions are tenant-scoped; cross-tenant access requires explicit grants
|
||||
- Shard ID enables horizontal scaling and data locality
|
||||
- Namespace supports multi-environment deployments
|
||||
|
||||
### GR8: Pin/Audit Governance
|
||||
|
||||
**Pinned revisions:**
|
||||
|
||||
Revisions can be pinned to prevent automatic retention cleanup:
|
||||
|
||||
```json
|
||||
{
|
||||
"pinned": true,
|
||||
"pinned_at": "2025-12-13T10:00:00Z",
|
||||
"pinned_by": "user:alice",
|
||||
"pin_reason": "Audit retention for CVE-2021-44228 investigation",
|
||||
"pin_expires_at": "2026-12-13T10:00:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
**Audit events:**
|
||||
|
||||
All revision operations emit audit events:
|
||||
|
||||
```json
|
||||
{
|
||||
"event_type": "graph.revision.created",
|
||||
"revision_id": "rev:blake3:...",
|
||||
"actor": "service:scanner",
|
||||
"tenant_id": "tenant:acme",
|
||||
"timestamp": "2025-12-13T10:00:00Z",
|
||||
"metadata": {
|
||||
"operation": "create",
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"graph_hash": "blake3:..."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### GR9: Retention/Tombstones
|
||||
|
||||
**Retention policy:**
|
||||
|
||||
| Category | Default Retention | Configurable |
|
||||
|----------|-------------------|--------------|
|
||||
| Latest revision | Forever | No |
|
||||
| Intermediate revisions | 90 days | Yes |
|
||||
| Tombstoned revisions | 30 days | Yes |
|
||||
| Pinned revisions | Until unpin + 7 days | No |
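
One way to read the retention table as a decision function, sketched in Python. The precedence (latest, then pins, then tombstones, then intermediate revisions) and the `unpinned_at` field are assumptions. `pin_expires_at` is treated as the unpin moment when no explicit unpin is recorded.

```python
from datetime import datetime, timedelta, timezone

INTERMEDIATE_TTL = timedelta(days=90)  # default, configurable
TOMBSTONE_TTL = timedelta(days=30)     # default, configurable
UNPIN_GRACE = timedelta(days=7)        # fixed


def _ts(value: str) -> datetime:
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def retention_action(rev: dict, now: datetime, is_latest: bool) -> str:
    """Return 'keep', 'tombstone', or 'purge' for one revision manifest."""
    if is_latest:
        return "keep"  # latest revision is retained forever
    pin_end = rev.get("unpinned_at") or rev.get("pin_expires_at")  # unpinned_at is hypothetical
    if rev.get("pinned") and pin_end is None:
        return "keep"  # pinned with no expiry
    if pin_end is not None and now < _ts(pin_end) + UNPIN_GRACE:
        return "keep"  # still pinned, or within unpin + 7 days
    if rev.get("tombstone"):
        return "purge" if now >= _ts(rev["tombstoned_at"]) + TOMBSTONE_TTL else "keep"
    return "tombstone" if now >= _ts(rev["created_at"]) + INTERMEDIATE_TTL else "keep"
```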
|
||||
|
||||
**Tombstone format:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.graph.revision@v1",
|
||||
"revision_id": "rev:blake3:...",
|
||||
"tombstone": true,
|
||||
"tombstoned_at": "2025-12-13T10:00:00Z",
|
||||
"tombstoned_by": "service:retention-worker",
|
||||
"tombstone_reason": "retention_policy",
|
||||
"successor_revision_id": "rev:blake3:..."
|
||||
}
|
||||
```
|
||||
|
||||
### GR10: Inclusion in Offline Kits
|
||||
|
||||
Offline kits include graph revisions for air-gapped deployments:
|
||||
|
||||
**Offline bundle manifest:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.offline.bundle@v1",
|
||||
"bundle_id": "bundle:2025-12-13",
|
||||
"graph_revisions": [
|
||||
{
|
||||
"revision_id": "rev:blake3:...",
|
||||
"graph_hash": "blake3:...",
|
||||
"included_artifacts": ["graph", "dsse", "diff"]
|
||||
}
|
||||
],
|
||||
"rekor_checkpoints": [
|
||||
{
|
||||
"log_id": "rekor.sigstore.dev",
|
||||
"checkpoint": "...",
|
||||
"verified_at": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
],
|
||||
"signature": {
|
||||
"algorithm": "ecdsa-p256",
|
||||
"value": "base64:...",
|
||||
"public_key_id": "key:offline-signing-2025"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Import verification:**
|
||||
|
||||
```bash
|
||||
stella offline import --bundle ./offline-bundle.tgz --verify
|
||||
|
||||
# Output:
|
||||
Bundle: bundle:2025-12-13
|
||||
Graph Revisions: 5
|
||||
Rekor Checkpoints: 2
|
||||
|
||||
Verifying signatures...
|
||||
Bundle signature: VALID
|
||||
DSSE envelopes: 5/5 VALID
|
||||
Rekor checkpoints: 2/2 VERIFIED
|
||||
|
||||
Import complete.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. API Reference
|
||||
|
||||
### 3.1 Create Revision
|
||||
|
||||
```http
|
||||
POST /api/graph/revisions
|
||||
Authorization: Bearer <token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"graph": { ... richgraph-v1 ... },
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"cross_artifacts": { ... }
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Get Revision
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions/{revision_id}
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
### 3.3 List Revisions
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions?tenant_id=acme&sbom_digest=sha256:...&limit=20
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
### 3.4 Diff Revisions
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions/diff?from={rev_a}&to={rev_b}
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Related Documentation
|
||||
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [CAS Infrastructure](../contracts/cas-infrastructure.md) - Content-addressable storage
|
||||
- [Offline Kit](../OFFLINE_KIT.md) - Air-gap deployment
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 GRAPHREV-GAPS-401-063 for change history._
|
||||
337
docs/modules/reach-graph/schemas/ground-truth-schema.md
Normal file
@@ -0,0 +1,337 @@
|
||||
# Ground Truth Schema for Reachability Datasets
|
||||
|
||||
> **Status:** Design v1 (Sprint 0401)
|
||||
> **Owners:** Scanner Guild, Signals Guild, Quality Guild
|
||||
|
||||
This document defines the ground truth schema for test datasets used to validate reachability analysis. Ground truth samples provide known-correct answers for benchmarking lattice state calculations, path discovery, and policy gate decisions.
|
||||
|
||||
---
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
Ground truth datasets enable:
|
||||
|
||||
1. **Regression testing:** Detect regressions in reachability analysis accuracy
|
||||
2. **Benchmark scoring:** Measure precision, recall, F1 for path discovery
|
||||
3. **Lattice validation:** Verify join/meet operations produce expected states
|
||||
4. **Policy gate testing:** Ensure gates block/allow correct VEX transitions
|
||||
|
||||
---
|
||||
|
||||
## 2. Dataset Structure
|
||||
|
||||
### 2.1 Directory Layout
|
||||
|
||||
```
|
||||
datasets/reachability/
|
||||
├── samples/
|
||||
│ ├── java/
|
||||
│ │ ├── vulnerable-log4j/
|
||||
│ │ │ ├── manifest.json # Sample metadata
|
||||
│ │ │ ├── richgraph-v1.json # Input callgraph
|
||||
│ │ │ ├── ground-truth.json # Expected outcomes
|
||||
│ │ │ └── artifacts/ # Source binaries/SBOMs
|
||||
│ │ └── safe-spring-boot/
|
||||
│ │ └── ...
|
||||
│ ├── native/
|
||||
│ │ ├── stripped-elf/
|
||||
│ │ └── openssl-vuln/
|
||||
│ └── polyglot/
|
||||
│ └── node-native-addon/
|
||||
├── corpus/
|
||||
│ ├── positive/ # Known reachable samples
|
||||
│ ├── negative/ # Known unreachable samples
|
||||
│ └── contested/ # Known conflict samples
|
||||
└── schema/
|
||||
├── manifest.schema.json
|
||||
└── ground-truth.schema.json
|
||||
```
|
||||
|
||||
### 2.2 Sample Manifest (`manifest.json`)
|
||||
|
||||
```json
|
||||
{
|
||||
"sampleId": "sample:java:vulnerable-log4j:001",
|
||||
"version": "1.0.0",
|
||||
"createdAt": "2025-12-13T10:00:00Z",
|
||||
"language": "java",
|
||||
"category": "positive",
|
||||
"description": "Log4Shell CVE-2021-44228 reachable via JNDI lookup in logging path",
|
||||
"source": {
|
||||
"repository": "https://github.com/example/vuln-app",
|
||||
"commit": "abc123...",
|
||||
"buildToolchain": "maven:3.9.0,jdk:17"
|
||||
},
|
||||
"vulnerabilities": [
|
||||
{
|
||||
"vulnId": "CVE-2021-44228",
|
||||
"purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
|
||||
"affectedSymbol": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup"
|
||||
}
|
||||
],
|
||||
"artifacts": [
|
||||
{
|
||||
"path": "artifacts/app.jar",
|
||||
"hash": "sha256:...",
|
||||
"type": "application/java-archive"
|
||||
},
|
||||
{
|
||||
"path": "artifacts/sbom.cdx.json",
|
||||
"hash": "sha256:...",
|
||||
"type": "application/vnd.cyclonedx+json"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### 2.3 Ground Truth Document (`ground-truth.json`)
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "ground-truth-v1",
|
||||
"sampleId": "sample:java:vulnerable-log4j:001",
|
||||
"generatedAt": "2025-12-13T10:00:00Z",
|
||||
"generator": {
|
||||
"name": "manual-annotation",
|
||||
"version": "1.0.0",
|
||||
"annotator": "security-team"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"symbolId": "sym:java:...",
|
||||
"display": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup",
|
||||
"purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
|
||||
"expected": {
|
||||
"latticeState": "CR",
|
||||
"bucket": "direct",
|
||||
"reachable": true,
|
||||
"confidence": 0.95,
|
||||
"pathLength": 3,
|
||||
"path": [
|
||||
"sym:java:...main",
|
||||
"sym:java:...logInfo",
|
||||
"sym:java:...JndiLookup.lookup"
|
||||
]
|
||||
},
|
||||
"reasoning": "Direct call path from main() through logging framework to vulnerable lookup method"
|
||||
},
|
||||
{
|
||||
"symbolId": "sym:java:...",
|
||||
"display": "org.apache.logging.log4j.core.net.JndiManager.lookup",
|
||||
"purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
|
||||
"expected": {
|
||||
"latticeState": "CU",
|
||||
"bucket": "unreachable",
|
||||
"reachable": false,
|
||||
"confidence": 0.90,
|
||||
"pathLength": null,
|
||||
"path": null
|
||||
},
|
||||
"reasoning": "JndiManager.lookup is present but not called from any reachable entry point"
|
||||
}
|
||||
],
|
||||
"entryPoints": [
|
||||
{
|
||||
"symbolId": "sym:java:...",
|
||||
"display": "com.example.app.Main.main",
|
||||
"phase": "runtime",
|
||||
"source": "manifest"
|
||||
}
|
||||
],
|
||||
"expectedUncertainty": {
|
||||
"states": [],
|
||||
"aggregateTier": "T4",
|
||||
"riskScore": 0.0
|
||||
},
|
||||
"expectedGateDecisions": [
|
||||
{
|
||||
"vulnId": "CVE-2021-44228",
|
||||
"targetSymbol": "sym:java:...JndiLookup.lookup",
|
||||
"requestedStatus": "not_affected",
|
||||
"expectedDecision": "block",
|
||||
"expectedBlockedBy": "LatticeState",
|
||||
"expectedReason": "CR state incompatible with not_affected"
|
||||
},
|
||||
{
|
||||
"vulnId": "CVE-2021-44228",
|
||||
"targetSymbol": "sym:java:...JndiLookup.lookup",
|
||||
"requestedStatus": "affected",
|
||||
"expectedDecision": "allow"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Schema Definitions

### 3.1 Ground Truth Target

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `symbolId` | string | Yes | Canonical SymbolID (`sym:{lang}:{hash}`) |
| `display` | string | No | Human-readable symbol name |
| `purl` | string | No | Package URL of the containing package |
| `expected.latticeState` | enum | Yes | Expected v1 lattice state: `U`, `SR`, `SU`, `RO`, `RU`, `CR`, `CU`, `X` |
| `expected.bucket` | enum | Yes | Expected v0 bucket (kept for backward compatibility) |
| `expected.reachable` | boolean | Yes | `true` if the symbol is reachable from any entry point |
| `expected.confidence` | number | Yes | Expected confidence score in [0.0, 1.0] |
| `expected.pathLength` | number | No | Expected path length (`null` if unreachable) |
| `expected.path` | string[] | No | Expected path from entry point to target (deterministic; `null` if unreachable) |
| `reasoning` | string | Yes | Human-readable explanation of the expected outcome |

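The following minimal C# sketch shows how a harness might load one target with `System.Text.Json` and enforce a few rules from the table. The record names and `GroundTruthChecks` are hypothetical rather than the harness's real types, and enum fields stay as plain strings. The sketch checks only constraints stated above: required fields, the confidence range, and `pathLength` being `null` for unreachable targets.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical shapes mirroring the table above (camelCase JSON names).
public sealed record ExpectedOutcome(
    string? LatticeState,
    string? Bucket,
    bool Reachable,
    double Confidence,
    int? PathLength,
    IReadOnlyList<string>? Path);

public sealed record GroundTruthTarget(
    string? SymbolId,
    string? Display,
    string? Purl,
    ExpectedOutcome? Expected,
    string? Reasoning);

public static class GroundTruthChecks
{
    // Web defaults: camelCase property names, case-insensitive matching.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static GroundTruthTarget Parse(string json) =>
        JsonSerializer.Deserialize<GroundTruthTarget>(json, Options)
        ?? throw new JsonException("Target JSON was null.");

    // Returns human-readable violations; an empty list means the target passed.
    public static List<string> Validate(GroundTruthTarget t)
    {
        var errors = new List<string>();
        if (string.IsNullOrEmpty(t.SymbolId) || !t.SymbolId.StartsWith("sym:", StringComparison.Ordinal))
            errors.Add("symbolId must be a canonical SymbolID (sym:{lang}:{hash})");
        if (string.IsNullOrWhiteSpace(t.Reasoning))
            errors.Add("reasoning is required");
        if (t.Expected is null)
        {
            errors.Add("expected is required");
            return errors;
        }
        if (string.IsNullOrEmpty(t.Expected.LatticeState))
            errors.Add("expected.latticeState is required");
        if (string.IsNullOrEmpty(t.Expected.Bucket))
            errors.Add("expected.bucket is required");
        if (t.Expected.Confidence is < 0.0 or > 1.0)
            errors.Add("expected.confidence must be within [0.0, 1.0]");
        if (!t.Expected.Reachable && t.Expected.PathLength is not null)
            errors.Add("expected.pathLength must be null when the target is unreachable");
        return errors;
    }
}
```

A stricter harness could map `latticeState` and `bucket` to enums and reject unknown values. This sketch leaves that to schema validation.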
### 3.2 Expected Gate Decision

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `vulnId` | string | Yes | Vulnerability identifier |
| `targetSymbol` | string | Yes | SymbolID of the target |
| `requestedStatus` | enum | Yes | Requested VEX status: `affected`, `not_affected`, `under_investigation`, `fixed` |
| `expectedDecision` | enum | Yes | Expected gate outcome: `allow`, `block`, `warn` |
| `expectedBlockedBy` | string | No | Name of the blocking gate when the decision is `block` |
| `expectedReason` | string | No | Expected reason message |

---

## 4. Sample Categories

### 4.1 Positive Samples (Reachable)

Known-reachable cases in which the vulnerable code is called:

- **direct-call:** Vulnerable function called directly from an entry point
- **transitive:** Multi-hop path from an entry point to the vulnerable function
- **runtime-observed:** Reachability confirmed by a runtime probe
- **init-array:** Reachable via a load-time constructor (e.g., an ELF `.init_array` entry)

### 4.2 Negative Samples (Unreachable)

Known-unreachable cases in which the vulnerable code is present but never called:

- **dead-code:** Function present but never invoked
- **conditional-unreachable:** Function guarded by a condition that can never be true
- **test-only:** Function reachable only from test entry points
- **deprecated-api:** Old API still present but superseded by a new implementation

### 4.3 Contested Samples

Cases in which static and runtime evidence conflict:

- **static-reach-runtime-miss:** Static analysis finds a path, but runtime never observes it
- **static-miss-runtime-hit:** Static analysis misses the path, but runtime observes execution
- **version-mismatch:** The version that was analyzed differs from the version observed at runtime

---

## 5. Benchmark Metrics

### 5.1 Path Discovery Metrics

```
Precision = TruePositive / (TruePositive + FalsePositive)
Recall = TruePositive / (TruePositive + FalseNegative)
F1 = 2 * (Precision * Recall) / (Precision + Recall)
```

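The following minimal C# sketch implements these formulas. It assumes one `(expected, actual)` reachability pair per ground-truth target:

- A true positive is expected reachable and reported reachable.
- A false positive is reported reachable but expected unreachable.
- A false negative is expected reachable but reported unreachable.

`PathMetrics` is hypothetical. Returning 0 when a denominator is zero is a choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class PathMetrics
{
    // Each pair is (expectedReachable, actualReachable) for one ground-truth target.
    public static (double Precision, double Recall, double F1) Compute(
        IEnumerable<(bool Expected, bool Actual)> outcomes)
    {
        int tp = 0, fp = 0, fn = 0;
        foreach (var (expected, actual) in outcomes)
        {
            if (expected && actual) tp++;
            else if (!expected && actual) fp++;
            else if (expected && !actual) fn++;
            // expected == false && actual == false is a true negative; these formulas ignore it.
        }

        double precision = tp + fp == 0 ? 0.0 : (double)tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double)tp / (tp + fn);
        double f1 = precision + recall == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
        return (precision, recall, f1);
    }
}
```

With 8 true positives, 2 false positives, and 2 false negatives, precision, recall, and F1 each come out at about 0.8.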
### 5.2 Lattice State Accuracy

```
StateAccuracy = CorrectStates / TotalTargets
BucketAccuracy = CorrectBuckets / TotalTargets (v0 compatibility)
```

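Both ratios are the same computation applied to different fields. The following sketch assumes each target yields an `(expected, actual)` pair of lattice states, or of v0 buckets, compared as exact ordinal strings. `AccuracyMetrics` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AccuracyMetrics
{
    // Fraction of pairs whose expected and actual values match exactly (ordinal comparison).
    public static double Accuracy(IReadOnlyCollection<(string Expected, string Actual)> pairs) =>
        pairs.Count == 0
            ? 0.0
            : (double)pairs.Count(p => string.Equals(p.Expected, p.Actual, StringComparison.Ordinal)) / pairs.Count;
}
```

`StateAccuracy` is `Accuracy` over lattice-state pairs, and `BucketAccuracy` is the same call over bucket pairs.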
### 5.3 Gate Decision Accuracy

```
GateAccuracy = CorrectDecisions / TotalGateTests
FalseAllow = AllowedWhenShouldBlock / TotalBlocks (critical metric)
FalseBlock = BlockedWhenShouldAllow / TotalAllows
```

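The following sketch computes the three gate metrics. It assumes that `TotalBlocks` and `TotalAllows` count gate tests whose *expected* decision is `block` or `allow` respectively, and that `warn` cases count only toward `GateAccuracy`. `GateMetrics` is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class GateMetrics
{
    // Decisions are "allow", "block" or "warn"; string == is an ordinal comparison in C#.
    public static (double GateAccuracy, double FalseAllow, double FalseBlock) Compute(
        IReadOnlyList<(string Expected, string Actual)> decisions)
    {
        int correct = 0, totalBlocks = 0, totalAllows = 0, falseAllows = 0, falseBlocks = 0;
        foreach (var (expected, actual) in decisions)
        {
            if (expected == actual) correct++;
            if (expected == "block")
            {
                totalBlocks++;
                if (actual == "allow") falseAllows++; // critical: the gate let a must-block case through
            }
            else if (expected == "allow")
            {
                totalAllows++;
                if (actual == "block") falseBlocks++;
            }
        }

        static double Ratio(int numerator, int denominator) =>
            denominator == 0 ? 0.0 : (double)numerator / denominator;

        return (Ratio(correct, decisions.Count), Ratio(falseAllows, totalBlocks), Ratio(falseBlocks, totalAllows));
    }
}
```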
---

## 6. Test Harness Integration

### 6.1 xUnit Test Pattern

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection; // GetRequiredService<T>()
using Xunit;

// Actual confidence must land within ±0.05 of the annotated value.
private const double ConfidenceTolerance = 0.05;

[Theory]
[MemberData(nameof(GetGroundTruthSamples))]
public async Task ReachabilityAnalysis_MatchesGroundTruth(GroundTruthSample sample)
{
    // Arrange: load the sample's richgraph-v1 and resolve the scorer from DI
    var graph = await LoadRichGraphAsync(sample.GraphPath);
    var scorer = _serviceProvider.GetRequiredService<ReachabilityScoringService>();

    // Act
    var result = await scorer.ComputeAsync(graph, sample.EntryPoints);

    // Assert: every annotated target must match its expected outcome
    foreach (var target in sample.Targets)
    {
        // Assert.Single reports a clear failure if the target is missing (or duplicated)
        var actual = Assert.Single(result.States, s => s.SymbolId == target.SymbolId);
        Assert.Equal(target.Expected.LatticeState, actual.LatticeState);
        Assert.Equal(target.Expected.Reachable, actual.Reachable);
        Assert.InRange(actual.Confidence,
            target.Expected.Confidence - ConfidenceTolerance,
            target.Expected.Confidence + ConfidenceTolerance);
    }
}
```

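`[MemberData]` expects a static member that yields one `object[]` per test case. The following minimal sketch of such a source makes these assumptions:

- It uses the dataset layout from section 7.1 (`datasets/reachability/samples/{language}/{sample-name}/ground-truth.json`).
- `GroundTruthSource` is hypothetical. It yields relative file paths, so a real source would still deserialize each file into the `GroundTruthSample` that the theory above takes.
- It receives the dataset root as a parameter, which `[MemberData]` can supply through its `parameters` argument.

Paths are sorted ordinally so that test cases enumerate in the same order on every host.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class GroundTruthSource
{
    // Yields one xUnit-style row (object[]) per ground-truth.json found under datasetRoot.
    public static IEnumerable<object[]> GetGroundTruthFiles(string datasetRoot) =>
        Directory.EnumerateFiles(datasetRoot, "ground-truth.json", SearchOption.AllDirectories)
            // Relative paths with '/' separators keep the rows identical across operating systems.
            .Select(path => Path.GetRelativePath(datasetRoot, path).Replace('\\', '/'))
            // File-system enumeration order is unspecified, so sort ordinally.
            .OrderBy(relative => relative, StringComparer.Ordinal)
            .Select(relative => new object[] { relative });
}
```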
### 6.2 Benchmark Runner

```bash
# Run reachability benchmarks; "--" separates dotnet run options from the benchmark's own arguments
dotnet run --project src/Scanner/__Tests/StellaOps.Scanner.Reachability.Benchmarks -- \
  --dataset datasets/reachability/samples \
  --output benchmark-results.json \
  --threshold-f1 0.95 \
  --threshold-gate-accuracy 0.99
```

---

## 7. Sample Contribution Guidelines

### 7.1 Adding New Samples

1. Create a directory under `datasets/reachability/samples/{language}/{sample-name}/`.
2. Add a `manifest.json` with the sample metadata.
3. Add `richgraph-v1.json` (produced by running the scanner on the sample artifacts).
4. Create `ground-truth.json` with manual annotations.
5. Include `reasoning` for each expected outcome.
6. Run validation: `dotnet test --filter "GroundTruth"`.

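A small pre-flight check can catch a sample directory that is missing one of the files from steps 2 to 4 before the test run. `SampleLayout.MissingFiles` is a hypothetical helper sketched here. It is not part of the repository.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SampleLayout
{
    // Files every sample directory needs, per the contribution steps above.
    private static readonly string[] RequiredFiles =
        { "manifest.json", "richgraph-v1.json", "ground-truth.json" };

    // Returns the required files absent from sampleDir, in a fixed order (empty when complete).
    public static IReadOnlyList<string> MissingFiles(string sampleDir) =>
        RequiredFiles.Where(name => !File.Exists(Path.Combine(sampleDir, name))).ToList();
}
```

A CI step could run this for each new directory and fail on a non-empty result before invoking `dotnet test`.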
### 7.2 Ground Truth Validation

Ground truth files must pass schema validation:

```bash
# Quote the glob so ajv expands it instead of the shell
npx ajv validate -s docs/modules/reach-graph/schemas/ground-truth.schema.json \
  -d "datasets/reachability/samples/**/ground-truth.json"
```

### 7.3 Review Requirements

- All samples require two independent annotators
- Contested samples require security team review
- Changes to existing samples must pass the regression tests

---

## 8. Related Documents

- [Lattice Model](./lattice.md) — v1 formal 7-state lattice
- [Policy Gates](./policy-gate.md) — Gate rules for VEX decisions
- [Evidence Schema](./evidence-schema.md) — richgraph-v1 schema
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) — Full schema specification

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial design from Sprint 0401 |

129
docs/modules/reach-graph/schemas/runtime-static-union-schema.md
Normal file
@@ -0,0 +1,129 @@
# Runtime + Static Reachability Union Schema (v0.1, 2025-11-23)

## Goals
- Provide a single, deterministic graph shape that merges static lifter output and runtime traces across languages.
- Keep SymbolIDs stable across hosts (independent of path and location) so CAS lookups are reproducible and cacheable.
- Make outputs offline-friendly: line-delimited JSON, UTF-8, sorted, with explicit content hashes.

## File layout (CAS)
- Namespace root: `reachability_graphs/<analysis_id>/` (`analysis_id` is a caller-supplied UUID or hash).
- Files (UTF-8, newline-terminated; the NDJSON files are sorted as noted):
  - `nodes.ndjson` (sorted by `symbol_id`)
  - `edges.ndjson` (sorted by `from`, then `to`, then `edge_type`)
  - `facts_runtime.ndjson` (sorted by `symbol_id`; optional)
  - `meta.json` (a single JSON object: schema version, produced_by, timestamps, tool versions, hashes)
- Hashing: the SHA-256 of each NDJSON file is recorded in `meta.json` under `files[]` with `path`, `sha256`, and `records`.
- Compression and packaging are left to the CAS store; files must first be valid, uncompressed NDJSON.

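The following sketch shows how a writer might produce one `files[]` entry for `meta.json`. The SHA-256 is taken over the exact bytes on disk, and `records` counts newline-terminated lines. The following parts are assumptions of this sketch, because the spec above does not fix them:

- the `FileEntry` and `CasManifest` names
- the lowercase-hex digest encoding
- rejecting a file that lacks its final `\n`

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical shape of one meta.json files[] entry.
public sealed record FileEntry(string Path, string Sha256, int Records);

public static class CasManifest
{
    public static FileEntry Describe(string directory, string fileName)
    {
        byte[] bytes = File.ReadAllBytes(Path.Combine(directory, fileName));

        // NDJSON files must be newline-terminated; an empty file has zero records.
        if (bytes.Length > 0 && bytes[^1] != (byte)'\n')
            throw new InvalidDataException($"{fileName} is not newline-terminated.");

        int records = 0;
        foreach (byte b in bytes)
            if (b == (byte)'\n') records++;

        // Hash the exact bytes on disk: no re-serialization, no newline conversion.
        string sha256 = Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
        return new FileEntry(fileName, sha256, records);
    }
}
```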
## SymbolID (language-agnostic envelope)
```
symbol_id = "sym:" + <lang> + ":" + <stable-fragment>
```
- `lang`: `java|dotnet|go|node|deno|rust|swift|shell|binary`
- `stable-fragment`: the base64url encoding (no padding) of the SHA-256 digest of the canonical tuple for the language:
  - **java**: (`package`, `class`, `method`, `descriptor`), lowercased, with the descriptor in JVM format.
  - **dotnet**: (`assembly_name`, `namespace`, `type`, `member_signature`), using the ECMA-335 signature string.
  - **node/deno**: (`pkg_name_or_path`, `export_path`, `kind`), where `export_path` is the slash-joined ESM/CJS path and `pkg_name_or_path` is the npm package name or a normalized absolute path with the drive letter stripped.
  - **go**: (`module_path`, `package_path`, `receiver`, `func`), with `receiver` empty for plain functions.
  - **rust**: (`crate`, `module_path`, `item_name`, `mangled`)
  - **swift**: (`module`, `type`, `member`, `swift-mangled`)
  - **shell**: (`script_relpath`, `function_or_cmd`)
  - **binary**: (`binary_build_id`, `section`, `symbol_name`)

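The following sketch builds the envelope. The spec fixes the `sym:{lang}:` prefix, SHA-256, and base64url without padding. It does not say how the tuple is serialized before hashing. This sketch therefore assumes the tuple fields are joined with the unit separator U+001F and UTF-8 encoded. It also expects callers to apply the per-language canonicalization (for example, lowercasing for java) beforehand. `SymbolIds.Compute` is hypothetical, not the lifters' actual API.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SymbolIds
{
    // ASSUMPTION: tuple fields are joined with U+001F (unit separator) before hashing.
    private const char FieldSeparator = '\u001f';

    // canonicalTuple must already follow the language rules (e.g. lowercased for java).
    public static string Compute(string lang, params string[] canonicalTuple)
    {
        string joined = string.Join(FieldSeparator, canonicalTuple);
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(joined));

        // base64url without padding: strip '=', then '+' -> '-' and '/' -> '_'.
        string fragment = Convert.ToBase64String(digest)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');

        return $"sym:{lang}:{fragment}";
    }
}
```

For a Go function, the receiver is passed as an empty string, as the go rule above requires: `SymbolIds.Compute("go", modulePath, packagePath, "", funcName)`. A SHA-256 digest always encodes to a 43-character fragment.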
## nodes.ndjson
Each line holds one JSON object (shown pretty-printed here for readability):
```
{
  "symbol_id": "sym:lang:...",
  "lang": "dotnet",
  "kind": "function|method|type|module|package|binary",
  "display": "Human readable name",
  "source": {
    "file": "relative/or/pkg/path",
    "line": 123,
    "col": 1,
    "digest": "sha256:<hex>"
  },
  "attributes": {
    "visibility": "public|internal|private",
    "async": true,
    "static": false,
    "generic_arity": 2
  }
}
```
Fields that do not apply are omitted rather than set to `null`. Additional language-specific fields are allowed inside `attributes` (e.g., `jvm_descriptor`, `dotnet_signature`).

## edges.ndjson
Each line holds one edge, either static or runtime-derived (see `source.origin`); shown pretty-printed here:
```
{
  "from": "sym:...",
  "to": "sym:...",
  "edge_type": "call|import|inherits|loads|dynamic|reflects|dlopen|ffi|wasm|spawn",
  "confidence": "certain|high|medium|low",
  "source": {
    "origin": "static|runtime",
    "provenance": "jvm-bytecode|il|ts-ast|ssa|ebpf|etw|jfr|hook",
    "evidence": "file:path:line"
  }
}
```
- Ordering: primary `from`, secondary `to`, tertiary `edge_type`.
- Duplicate edges with different provenance are allowed; consumers deduplicate by (`from`, `to`, `edge_type`, `source.provenance`).

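The following sketch implements sorting and consumer-side deduplication with ordinal comparisons. `Edge` is a hypothetical flattened view of one line, with `source.origin` and `source.provenance` pulled up. Using provenance as a fourth sort key after `edge_type` is an assumption added here so that duplicates with different provenance still sort deterministically.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened view of one edges.ndjson record.
public sealed record Edge(string From, string To, string EdgeType, string Origin, string Provenance);

public static class EdgeOrdering
{
    // Sort by from, to, edge_type (ordinal); provenance is an extra tie-breaker (assumption).
    public static List<Edge> Sort(IEnumerable<Edge> edges) =>
        edges.OrderBy(e => e.From, StringComparer.Ordinal)
             .ThenBy(e => e.To, StringComparer.Ordinal)
             .ThenBy(e => e.EdgeType, StringComparer.Ordinal)
             .ThenBy(e => e.Provenance, StringComparer.Ordinal)
             .ToList();

    // Consumer-side dedupe on (from, to, edge_type, provenance), keeping the first occurrence.
    public static List<Edge> Dedupe(IEnumerable<Edge> sortedEdges) =>
        sortedEdges.DistinctBy(e => (e.From, e.To, e.EdgeType, e.Provenance)).ToList();
}
```

`DistinctBy` requires .NET 6 or later. It preserves input order, so deduplicating an already-sorted list keeps it sorted.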
## facts_runtime.ndjson (optional)
Runtime-only observations attached to symbols (one object per line, shown pretty-printed here):
```
{
  "symbol_id": "sym:...",
  "samples": {
    "call_count": 14,
    "first_seen_utc": "2025-11-22T18:21:12Z",
    "last_seen_utc": "2025-11-22T18:23:01Z"
  },
  "env": {
    "pid": 1234,
    "image": "sha256:...",
    "entrypoint": "main",
    "tags": ["sealed","offline"]
  }
}
```
Records are sorted by `symbol_id`. Time fields must be UTC ISO-8601 with a `Z` suffix.

## meta.json
```
{
  "schema": "reachability-union@0.1",
  "generated_at": "2025-11-23T00:00:00Z",
  "produced_by": {
    "tool": "StellaOps.Scanner.Worker",
    "version": "0.1.0",
    "analyzers": ["dotnet-11.1.0","jvm-8.0.0","node-6.2.0"]
  },
  "files": [
    {"path":"nodes.ndjson","sha256":"...","records":1234},
    {"path":"edges.ndjson","sha256":"...","records":4567},
    {"path":"facts_runtime.ndjson","sha256":"...","records":89}
  ],
  "options": {
    "dedupe_edges": false,
    "include_runtime": true
  }
}
```

## Determinism rules
- Use the sort orders noted above; emit no `null` values; omit empty objects and arrays.
- All strings are UTF-8 in Unicode NFC; booleans are lowercase; `edge_type` values come only from the enumerated list above.
- Hash inputs are the exact serialized bytes (no trailing spaces; `\n` line endings only).

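The following sketch applies the byte-level rules when assembling one NDJSON file from records that a serializer has already written compactly, with nulls and empty containers omitted. It trims trailing spaces, NFC-normalizes each line, and terminates every line with `\n` only. `NdjsonWriter` is hypothetical. Normalizing the whole line is safe because JSON punctuation is ASCII. Escaped sequences such as `\u0065\u0301` inside a JSON string are *not* normalized by this approach, so the serializer should emit raw UTF-8.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class NdjsonWriter
{
    // Joins already-serialized, single-line JSON records into the exact bytes of an NDJSON file.
    public static byte[] ToNdjsonBytes(IEnumerable<string> compactRecords)
    {
        var sb = new StringBuilder();
        foreach (string record in compactRecords)
        {
            // A record spanning several lines would break the one-object-per-line contract.
            if (record.IndexOfAny(new[] { '\r', '\n' }) >= 0)
                throw new ArgumentException("Each record must be serialized on a single line.");

            // Drop trailing spaces, apply NFC, and terminate with "\n" only (never "\r\n").
            sb.Append(record.TrimEnd(' ').Normalize(NormalizationForm.FormC)).Append('\n');
        }

        // Encoding.UTF8.GetBytes does not prepend a byte-order mark.
        return Encoding.UTF8.GetBytes(sb.ToString());
    }
}
```

The resulting byte array is what gets hashed and stored, so the `sha256` in `meta.json` matches the file byte for byte.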
## Validation
- A JSON Schema (draft 2020-12) at `docs/modules/reach-graph/schemas/runtime-static-union-schema.json` is to be generated from this spec; its allowable values match the enumerations above.
- Minimal required fields: `symbol_id`, `lang`, `kind` (nodes); `from`, `to`, `edge_type`, `source.origin` (edges).

## Integration guidance
- Static lifters must emit SymbolIDs using the per-language rules; runtime probes must map call targets into the same SymbolID space (via demangled names plus package/module resolution).
- CAS writers store each file under the namespace path and return the root manifest path to downstream consumers (Signals, Replay, Policy).
- Consumers should treat runtime edges as additive: when both origins exist, prefer `origin=runtime` for exploitability scoring but keep static edges for coverage.

243
docs/modules/reach-graph/schemas/slice-schema.md
Normal file
@@ -0,0 +1,243 @@
# Reachability Slice Schema

_Last updated: 2025-12-22. Owner: Scanner Guild._

This document defines the **Reachability Slice** schema: a minimal, attestable proof unit that answers whether a vulnerable symbol is reachable from application entrypoints.

---

## 1. Overview

A **slice** is a focused subgraph extracted from a full reachability graph, containing only the nodes and edges relevant to answering a specific reachability query (for example, "Is CVE-2024-1234's vulnerable function reachable?").

### Key Properties

| Property | Description |
|----------|-------------|
| **Minimal** | Contains only the nodes and edges on paths between entrypoints and targets |
| **Attestable** | DSSE-signed with a dedicated slice predicate |
| **Reproducible** | Identical inputs produce byte-identical output (deterministic) |
| **Content-addressed** | Retrieved by BLAKE3 digest |

---

## 2. Predicate Type & Schema

- Predicate type: `stellaops.dev/predicates/reachability-slice@v1`
- JSON schema: `https://stellaops.dev/schemas/stellaops-slice.v1.schema.json`
- DSSE payload type: `application/vnd.stellaops.slice.v1+json`

---

## 3. Schema Structure

### 3.1 ReachabilitySlice

```csharp
using System.Text.Json.Serialization;

public sealed record ReachabilitySlice
{
    [JsonPropertyName("_type")]
    public string Type { get; init; } = "stellaops.dev/predicates/reachability-slice@v1";

    [JsonPropertyName("inputs")]
    public required SliceInputs Inputs { get; init; }

    [JsonPropertyName("query")]
    public required SliceQuery Query { get; init; }

    [JsonPropertyName("subgraph")]
    public required SliceSubgraph Subgraph { get; init; }

    [JsonPropertyName("verdict")]
    public required SliceVerdict Verdict { get; init; }

    [JsonPropertyName("manifest")]
    public required ScanManifest Manifest { get; init; }
}
```

### 3.2 SliceInputs

```csharp
using System.Collections.Immutable;

public sealed record SliceInputs
{
    public required string GraphDigest { get; init; }
    public ImmutableArray<string> BinaryDigests { get; init; }
    public string? SbomDigest { get; init; }
    public ImmutableArray<string> LayerDigests { get; init; }
}
```

### 3.3 SliceQuery

```csharp
using System.Collections.Immutable;

public sealed record SliceQuery
{
    public string? CveId { get; init; }
    public ImmutableArray<string> TargetSymbols { get; init; }
    public ImmutableArray<string> Entrypoints { get; init; }
    public string? PolicyHash { get; init; }
}
```

### 3.4 SliceSubgraph, Nodes, Edges

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;

public sealed record SliceSubgraph
{
    public ImmutableArray<SliceNode> Nodes { get; init; }
    public ImmutableArray<SliceEdge> Edges { get; init; }
}

public sealed record SliceNode
{
    public required string Id { get; init; }
    public required string Symbol { get; init; }
    public required SliceNodeKind Kind { get; init; } // entrypoint | intermediate | target | unknown
    public string? File { get; init; }
    public int? Line { get; init; }
    public string? Purl { get; init; }
    public IReadOnlyDictionary<string, string>? Attributes { get; init; }
}

public sealed record SliceEdge
{
    public required string From { get; init; }
    public required string To { get; init; }
    public SliceEdgeKind Kind { get; init; } // direct | plt | iat | dynamic | unknown
    public double Confidence { get; init; }
    public string? Evidence { get; init; }
    public SliceGateInfo? Gate { get; init; }
    public ObservedEdgeMetadata? Observed { get; init; }
}
```

### 3.5 SliceVerdict

```csharp
using System.Collections.Immutable;

public sealed record SliceVerdict
{
    public required SliceVerdictStatus Status { get; init; }
    public required double Confidence { get; init; }
    public ImmutableArray<string> Reasons { get; init; }
    public ImmutableArray<string> PathWitnesses { get; init; }
    public int UnknownCount { get; init; }
    public ImmutableArray<GatedPath> GatedPaths { get; init; }
}
```

`SliceVerdictStatus` values (serialized in snake_case):
- `reachable`
- `unreachable`
- `unknown`
- `gated`
- `observed_reachable`

### 3.6 ScanManifest

`ScanManifest` is imported from `StellaOps.Scanner.Core` and includes the following required fields for reproducibility:

- `scanId`
- `createdAtUtc`
- `artifactDigest`
- `scannerVersion`
- `workerVersion`
- `concelierSnapshotHash`
- `excititorSnapshotHash`
- `latticePolicyHash`
- `deterministic`
- `seed` (base64-encoded 32-byte seed)
- `knobs` (string map)

`artifactPurl` is optional.

---

## 4. Verdict Computation Rules

```
reachable   := path_exists AND min(path_confidence) > 0.7 AND unknown_edges == 0
unreachable := NOT path_exists AND unknown_edges == 0
unknown     := otherwise
```

`gated` and `observed_reachable` are reserved for feature-gated and runtime-observed paths (see Sprints 3830 and 3840).

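The following C# sketch transcribes the three rules directly. It assumes `path_confidence` means the confidences of the edges on the witness path. This matches the example slice in section 5, which reports confidence 0.9, the minimum of its edge confidences 1.0, 0.95, and 0.9. `SliceVerdicts.Compute` is hypothetical and never returns the reserved `gated` or `observed_reachable` statuses.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SliceVerdicts
{
    // Threshold from the reachable rule; the comparison is strict (> 0.7).
    public const double MinPathConfidence = 0.7;

    // pathEdgeConfidences: confidences of the edges on the witness path (empty when no path exists).
    // Returns the snake_case status string.
    public static string Compute(bool pathExists, IReadOnlyList<double> pathEdgeConfidences, int unknownEdges)
    {
        bool confidentPath = pathEdgeConfidences.Count > 0
            && pathEdgeConfidences.Min() > MinPathConfidence;

        if (pathExists && confidentPath && unknownEdges == 0)
            return "reachable";
        if (!pathExists && unknownEdges == 0)
            return "unreachable";
        return "unknown";
    }
}
```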
---

## 5. Example Slice

```json
{
  "_type": "stellaops.dev/predicates/reachability-slice@v1",
  "inputs": {
    "graphDigest": "blake3:a1b2c3d4e5f6789012345678901234567890123456789012345678901234abcd",
    "binaryDigests": ["sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef"],
    "sbomDigest": "sha256:cafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabe"
  },
  "query": {
    "cveId": "CVE-2024-1234",
    "targetSymbols": ["openssl:EVP_PKEY_decrypt"],
    "entrypoints": ["main", "http_handler"]
  },
  "subgraph": {
    "nodes": [
      {"id": "node:1", "symbol": "main", "kind": "entrypoint", "file": "/app/main.c", "line": 42},
      {"id": "node:2", "symbol": "process_request", "kind": "intermediate", "file": "/app/handler.c", "line": 100},
      {"id": "node:3", "symbol": "decrypt_data", "kind": "intermediate", "file": "/app/crypto.c", "line": 55},
      {"id": "node:4", "symbol": "EVP_PKEY_decrypt", "kind": "target", "purl": "pkg:generic/openssl@3.0.0"}
    ],
    "edges": [
      {"from": "node:1", "to": "node:2", "kind": "direct", "confidence": 1.0},
      {"from": "node:2", "to": "node:3", "kind": "direct", "confidence": 0.95},
      {"from": "node:3", "to": "node:4", "kind": "plt", "confidence": 0.9}
    ]
  },
  "verdict": {
    "status": "reachable",
    "confidence": 0.9,
    "reasons": ["path_exists_high_confidence"],
    "pathWitnesses": ["main -> process_request -> decrypt_data -> EVP_PKEY_decrypt"],
    "unknownCount": 0
  },
  "manifest": {
    "scanId": "scan-1234",
    "createdAtUtc": "2025-12-22T10:00:00Z",
    "artifactDigest": "sha256:00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff",
    "artifactPurl": "pkg:generic/app@1.0.0",
    "scannerVersion": "scanner.native:1.2.0",
    "workerVersion": "scanner.worker:1.2.0",
    "concelierSnapshotHash": "sha256:1111222233334444555566667777888899990000aaaabbbbccccddddeeeeffff",
    "excititorSnapshotHash": "sha256:2222333344445555666677778888999900001111aaaabbbbccccddddeeeeffff",
    "latticePolicyHash": "sha256:3333444455556666777788889999000011112222aaaabbbbccccddddeeeeffff",
    "deterministic": true,
    "seed": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=",
    "knobs": { "maxDepth": "20" }
  }
}
```

---

## 6. Determinism Requirements

For reproducible slices:

1. **Node ordering**: Sort by `id` (ordinal).
2. **Edge ordering**: Sort by `from`, then `to`, then `kind`.
3. **String lists**: Trim entries and remove duplicates in `targetSymbols`, `entrypoints`, and `reasons`.
4. **Timestamps**: Use UTC ISO-8601 with a `Z` suffix.
5. **JSON serialization**: Canonical JSON (sorted keys, no whitespace).

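Rule 5 can be sketched with `System.Text.Json.Nodes` in three steps:

- Rebuild the tree with object keys in ordinal (UTF-16 code unit) order at every depth.
- Keep array order unchanged, so nodes and edges must already be sorted per rules 1 and 2.
- Serialize without indentation.

`CanonicalJson` is hypothetical. Number formatting and string escaping are left to `System.Text.Json` defaults, which may differ from the canonical form the Scanner actually emits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

public static class CanonicalJson
{
    // Returns a new tree with object keys in ordinal order at every depth.
    public static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(obj
            .OrderBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
        // Array order is meaningful (rules 1 and 2 fix it), so it is preserved.
        JsonArray arr => new JsonArray(arr.Select(Canonicalize).ToArray()),
        // Copy leaf values: a JsonNode can belong to only one parent.
        JsonValue val => JsonNode.Parse(val.ToJsonString()),
        _ => null,
    };

    // ToJsonString() writes compact JSON (no indentation) by default.
    public static string Serialize(JsonNode? node) =>
        Canonicalize(node)?.ToJsonString() ?? "null";
}
```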
---

## 7. Related Documentation

- [Binary Reachability Schema](./binary-reachability-schema.md)
- [RichGraph Contract](../contracts/richgraph-v1.md)
- [Function-Level Evidence](./function-level-evidence.md)
- [Replay Verification](./replay-verification.md)

---

_Created: 2025-12-22. See Sprint 3810 for implementation details._