docs consolidation and others

This commit is contained in:
master
2026-01-06 19:02:21 +02:00
parent d7bdca6d97
commit 4789027317
849 changed files with 16551 additions and 66770 deletions


@@ -0,0 +1,202 @@
# Reachability Evidence Delivery Guide
_Last updated: November 8, 2025. Owner: Reachability Tiger Team (Scanner, Signals, Replay, Policy, Authority, UI)._
This guide translates the deterministic reachability blueprint into concrete work streams that average contributors can pick up without re-reading the entire proposal. Use it as the single navigation point when you land a reachability ticket. For a task-centric view of remaining gaps, see `docs/modules/reach-graph/guides/REACHABILITY_GAP_TASKS.md`.
---
## 1. Scope & Principles
**Goal**: ship a verifiable reachability signal for every scan by chaining SBOM → graph → runtime facts → VEX into DSSE-attested, replayable evidence.
**Principles**
1. **Deterministic inputs**: canonical IDs, sorted payloads, normalized timestamps.
2. **Provable facts**: every artifact has a DSSE envelope anchored in Authority + Rekor mirror.
3. **Replay-first**: manifests pin feed snapshots, analyzer digests, and policies so auditors can rerun.
4. **Least surprise**: same API and file layouts across languages; tests run fixture packs at CI time.
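
The "deterministic inputs" principle can be illustrated with a minimal sketch, assuming JSON payloads. The helper names `canonical_bytes` and `payload_digest` are hypothetical and are not StellaOps APIs; the point is only that key order must not influence the digest.

```python
import hashlib
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so key order cannot change the bytes."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def payload_digest(payload: dict) -> str:
    """Hypothetical helper: sha256 over the canonical serialization."""
    return "sha256:" + hashlib.sha256(canonical_bytes(payload)).hexdigest()


# Two payloads with the same content but different key order hash identically.
a = {"nodes": ["a", "b"], "analyzer": "scanner.java"}
b = {"analyzer": "scanner.java", "nodes": ["a", "b"]}
digests_match = payload_digest(a) == payload_digest(b)
```

Sorting keys does not reorder arrays, so a producer would still need to sort list payloads (nodes, edges) before serializing.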
---
## 2. Evidence Chain Overview
| Stage | Producer | Artifact | Requirements |
|-------|----------|----------|--------------|
| SBOM per layer & composed image | Scanner Worker + Sbomer | `sbom.layer.cdx.json`, `sbom.image.cdx.json` | Deterministic CycloneDX 1.6, DSSE envelope, CAS URI |
| Static reachability graph | Scanner Worker lifters (DotNet, Go, Node/Deno, Rust, Swift, JVM, Binary, Shell) | `richgraph-v1.json` + `sha256` | Canonical SymbolIDs, framework entries, predicates, graph hash |
| Runtime facts | Zastava Observer / runtime probes | `runtime-trace.ndjson` (gzip or JSON) | EntryTrace schema, CAS pointer, process/socket/container metadata, optional compression |
| Replay manifest | Scanner Worker + Replay Core | `replay.yaml` | Contains analyzer versions, feed locks, graph hash, runtime trace digests |
| VEX statements | Scanner WebService + Policy Engine | `reachability.json` + OpenVEX doc | Links SBOM attestation, graph attestation, runtime evidence IDs |
| Signed bundle | Authority + Signer | DSSE envelope referencing above | Support FIPS + PQ variants (Dilithium where required) |
---
## 3. Work Streams (modules + hand-offs)
| Stream | Owner Guild(s) | Key deliverables |
|--------|----------------|------------------|
| **Native symbols & callgraphs** | Scanner Worker · Symbols Guild | Ship `Scanner.Symbols.Native` + `Scanner.CallGraph.Native`, integrate Symbol Manifest v1, demangle Itanium/MSVC names, emit `FuncNode`/`CallEdge` CAS bundles (task `SCANNER-NATIVE-401-015`). |
| **Reachability store** | Signals · BE-Base Platform | Provision shared PostgreSQL tables (`func_nodes`, `call_edges`, `cve_func_hits`), indexes, and repositories plus REST hooks for reuse (task `SIG-STORE-401-016`). |
| **Language lifters** | Scanner Worker | CLI/hosted lifters for DotNet, Go, Node/Deno, JVM, Rust, Swift, Binary, Shell with CAS uploads and richgraph output |
| **Signals ingestion & scoring** | Signals | `/callgraphs`, `/runtime-facts` (JSON + NDJSON/gzip), `/graphs/{id}`, `/reachability/recompute` GA; CAS-backed storage, runtime dedupe, BFS+predicates scoring |
| **Runtime capture** | Zastava + Runtime Guild | EntryTrace/eBPF samplers, NDJSON batches (symbol IDs + timestamps + counts) |
| **Replay evidence** | Replay Core + Scanner Worker | Manifest schema v2, `ReachabilityReplayWriter` integration, hash-lock tests |
| **Authority attestations** | Authority + Signer | DSSE predicates for SBOM, Graph, Replay, VEX; Rekor mirror alignment |
| **Policy & VEX** | Policy Engine + Web + CLI + UI | Accept reachability states, render “Why safe” call paths, CLI/UI explain flows |
| **QA & Docs** | QA + Docs Guilds | `reachbench-2025-expanded` fixtures wired to CI; operator + developer runbooks |
| **Binary quality guardrails (Nov 2026)** | Scanner · Signals · QA | Build-id capture, init-array roots, purl-resolved edges, unknowns emission, and patch-oracle fixtures; see sections 5.5 to 5.8 |
---
## 4. Sprint Targets
| Sprint | Nickname | Focus | Exit Criteria |
|--------|----------|-------|---------------|
| **401** | Evidence Pipeline | Finish static lifters + CAS graph storage + runtime ingestion endpoint | Graph CAS layout documented, lifter fixtures passing, `/runtime-facts` receives NDJSON batches |
| **402** | Replay & Attest | Manifest v2, DSSE envelopes, Authority/Rekor publishing | Replay packs include hashes + analyzer fingerprint; DSSE statements passed integration; Rekor mirror updated |
| **403** | Policy & Explain | VEX generation, SPL predicates, UI/CLI explainers | Policy engine uses reachability states, CLI `stella graph explain` returns signed paths, UI shows explain drawer |
Each sprint is two weeks; refer to `docs/implplan/SPRINT_0401_0001_0001_reachability_evidence_chain.md` (new) for per-task tracking.
---
## 5. Task Breakdown Cheat Sheet
### 5.1 Scanner Worker
1. **Lifter SDK**: define `RichGraphWriter`, canonical SymbolID helpers, and analyzer interface updates.
2. **Language passes**: deliverables per language are discovery, graph build, framework wiring, predicate extraction, and runtime overlay.
3. **Replay hooks**: plug lifter output + runtime traces into `ReachabilityReplayWriter`; enforce CAS registration before emitting manifest references.
4. **Fixture runs**: add tests under `tests/reachability/StellaOps.ScannerSignals.IntegrationTests` to execute lifter outputs against reachbench A/B cases.
### 5.2 Signals Service
1. **Callgraph CAS layout**: migrate from filesystem to CAS (`cas://reachability/graphs/{hash}`) and include a metadata doc.
2. **Runtime facts API**: accept NDJSON or gzip, dedupe events, compute hit stats, link to graph nodes.
3. **Scoring engine v2**: support a multi-state lattice (`Unknown → Observed`); record predicates, blocked edges, and runtime evidence CAS URIs.
4. **API responses**: `/graphs/{scanId}` returns graph CAS refs + manifest pointers; `/reachability/recompute` accepts replay manifest IDs.
### 5.3 Replay Core & Authority
1. **Manifest schema v2**: YAML + JSON versions; includes feeds, analyzers, and policies.
2. **CAS naming**: standardize on `cas://reachability/{kind}/{sha256}`.
3. **DSSE predicate types**: `SbomAttestation`, `GraphAttestation`, `VexAttestation`, `ReplayManifest`.
4. **Authority integration**: new endpoints for submitting reachability predicates, rotation tests, Rekor mirror update instructions.
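
A minimal sketch of the CAS naming rule above, assuming the `{sha256}` segment is the lowercase hex digest of the stored bytes. `cas_uri` is a hypothetical helper and the `kind` pattern check is an illustrative assumption, not a documented constraint.

```python
import hashlib
import re


def cas_uri(kind: str, content: bytes) -> str:
    """Build cas://reachability/{kind}/{sha256} for a blob (hypothetical helper)."""
    # ASSUMPTION: kinds are short lowercase tokens such as "graphs" or "runtime".
    if not re.fullmatch(r"[a-z][a-z0-9-]*", kind):
        raise ValueError(f"unexpected CAS kind: {kind!r}")
    digest = hashlib.sha256(content).hexdigest()
    return f"cas://reachability/{kind}/{digest}"


empty_graph_uri = cas_uri("graphs", b"")
```

Because the URI is derived only from the content, registering the same bytes twice yields the same reference.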
### 5.4 Policy / Web / UI / CLI
1. **Policy Engine**: ingest reachability facts from Signals, expose them via SPL, produce metrics, and integrate them into the explanation tree.
2. **Web API**: join reachability fields in vuln responses, add override endpoints and simulate support.
3. **UI/CLI**: visual explain drawer and CLI command showing the signed call path, predicates, and runtime hits; counterfactual toggles.
4. **VEX emitter**: generate OpenVEX statements with evidence references; DSSE-sign via Signer.
### 5.5 Native binaries (build-id + init roots)
- Capture ELF build-id (`.note.gnu.build-id`) alongside soname/path and propagate into `SymbolID`/`code_id` so SBOM/runtime joins stay stable even when paths change.
- Treat `.preinit_array`, `.init_array`, `.ctors`, and `_init` as synthetic graph roots with `phase=load`; include constructors from `DT_NEEDED` deps. Persist the root list in scan evidence.
- Add deterministic tests covering build-id present/absent and init-array edge creation.
### 5.6 PURL-resolved edges
- Annotate every call edge with callee `purl` and `symbol_digest` per `docs/modules/reach-graph/guides/purl-resolved-edges.md`.
- Update `richgraph-v1` schema, CAS metadata, and CLI/UI explainers to display `purl@version` + demangled name.
- Signals merges graphs by `(purl, symbol_digest)`; Policy uses the same keys when mapping CVE-affected functions.
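
As a rough sketch of keying by `(purl, symbol_digest)`, the snippet below groups call edges from several graphs by callee identity. The edge field names follow the edge schema used elsewhere in these guides; the function name and the "collect callers per callee" policy are assumptions for illustration, not the Signals implementation.

```python
from collections import defaultdict


def group_callers_by_callee(graphs):
    """Group edges from several graphs by callee key (purl, symbol_digest)."""
    callers = defaultdict(set)
    for edges in graphs:
        for edge in edges:
            key = (edge["purl"], edge["symbol_digest"])
            callers[key].add(edge["from"])
    # Sort keys and callers so the merged view is deterministic.
    return {key: sorted(froms) for key, froms in sorted(callers.items())}


graph_a = [{"from": "sym:java:a", "purl": "pkg:maven/x@1", "symbol_digest": "sha256:1"}]
graph_b = [
    {"from": "sym:java:b", "purl": "pkg:maven/x@1", "symbol_digest": "sha256:1"},
    {"from": "sym:java:a", "purl": "pkg:maven/y@2", "symbol_digest": "sha256:2"},
]
merged = group_callers_by_callee([graph_a, graph_b])
```

The same key can then be used on the Policy side to look up CVE-affected functions without depending on file paths or display names.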
### 5.7 Unknowns Registry integration
- Emit structured Unknowns when symbol->purl mapping, edge targets, or hashes are ambiguous; write them via Signals API per `docs/modules/signals/guides/unknowns-registry.md`.
- Scoring adds `unknowns_pressure` so `not_affected` claims cannot bypass unresolved evidence.
- UI/CLI should surface unknown chips and triage actions.
### 5.8 Patch-oracle guardrails
- Add `tests/reachability/patch-oracles/**` with paired vuln/fixed binaries and `oracle.yml` expectations (functions/edges added/removed).
- Scanner binary analyzer tests must fail if expected guard functions or edges are missing; CI job ensures determinism.
- See `docs/modules/reach-graph/guides/patch-oracles.md` for fixture layout and manifest schema.
### 5.9 JS/PHP framework reachability
- Model framework entrypoints explicitly: Express/Fastify/Nest handlers, Laravel/Symfony routes/commands/hooks. Generate graph roots from route/handler catalogs instead of generic `main` only.
- Represent dynamic import/require/include resolution as graph nodes so ambiguity stays visible (`resolution` edges with confidence).
- Keep multi-layer graphs: source-level (TS/JS/PHP) plus bundled output (Webpack/Vite). Merge with runtime hints when available.
- Status model: `always_reachable`, `conditional`, `not_reachable`, `not_analyzed`, `ambiguous`, each with confidence and evidence tags.
- Deliver language-specific profiles + fixture cases to prove coverage; update CLI/UI explainers to show framework route context.
### 5.10 Vulnerability Surfaces (Sprint 3700)
Vulnerability surfaces identify **which specific methods changed** in a security fix, enabling precise reachability analysis:
- **Surface computation**: Download vulnerable and fixed package versions, fingerprint all methods, diff to find changed methods (sinks).
- **Trigger extraction**: Build internal call graphs, reverse BFS from sinks to public APIs (triggers).
- **Per-ecosystem support**:
  - NuGet: Cecil IL fingerprinting
  - npm: Babel AST fingerprinting
  - Maven: ASM bytecode fingerprinting
  - PyPI: Python AST fingerprinting
- **Integration**: `ISurfaceQueryService` queries triggers during scan; use triggers as sinks instead of all package methods.
- **Storage**: `scanner.vuln_surfaces`, `scanner.vuln_surface_sinks`, `scanner.vuln_surface_triggers` tables.
- **Docs**: `docs/contracts/vuln-surface-v1.md` for schema details.
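
The trigger-extraction step (reverse BFS from sinks to public APIs) can be sketched as follows. This is an illustration under simplifying assumptions: methods are plain strings, the call graph is a list of `(caller, callee)` pairs, and `find_triggers` is a hypothetical name.

```python
from collections import deque


def find_triggers(call_edges, sinks, public_methods):
    """Walk the call graph backwards from changed methods (sinks) to public entry points."""
    callers = {}
    for caller, callee in call_edges:
        callers.setdefault(callee, set()).add(caller)

    seen = set(sinks)
    queue = deque(sorted(sinks))
    while queue:
        method = queue.popleft()
        for caller in sorted(callers.get(method, ())):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    # Triggers are the public methods from which at least one sink is reachable.
    return sorted(m for m in seen if m in public_methods)


edges = [
    ("Api.parse", "Impl.read"),
    ("Impl.read", "Impl.unsafe"),
    ("Api.other", "Impl.safe"),
]
triggers = find_triggers(edges, {"Impl.unsafe"}, {"Api.parse", "Api.other"})
```

In this sketch a sink that is itself public counts as its own trigger, which may or may not match the production behavior.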
### 5.11 Confidence Tiers
Reachability findings are classified into confidence tiers:
| Tier | Condition | Display | Implications |
|------|-----------|---------|--------------|
| **Confirmed** | Surface exists AND trigger method is reachable | Red badge | Highest confidence: vulnerable code is definitely called |
| **Likely** | No surface but package API is called | Orange badge | Medium confidence: package is used but the specific vulnerable path is unknown |
| **Present** | No call graph, dependency in SBOM | Gray badge | Lowest confidence: reachability cannot be determined |
| **Unreachable** | Surface exists AND no trigger reachable | Green badge | High confidence: vulnerability is not exploitable |
- Tier assignment logic in `SurfaceAwareReachabilityAnalyzer`
- API responses include `confidenceTier` and `confidenceDisplay`
- UI badges reflect tier colors
- VEX statements reference tier in justification
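
The tier table can be read as a small decision function. The sketch below is not the `SurfaceAwareReachabilityAnalyzer` logic; the function name and boolean inputs are assumptions, and the final fallback covers a combination the table does not define.

```python
def confidence_tier(has_call_graph: bool, has_surface: bool,
                    trigger_reachable: bool, package_api_called: bool) -> str:
    """Map the conditions from the tier table to a tier name."""
    if not has_call_graph:
        return "Present"  # dependency is in the SBOM but nothing can be traced
    if has_surface:
        return "Confirmed" if trigger_reachable else "Unreachable"
    if package_api_called:
        return "Likely"
    # ASSUMPTION: call graph exists, no surface, package API never called.
    # The table leaves this case open; this sketch falls back to the lowest tier.
    return "Present"


tier = confidence_tier(has_call_graph=True, has_surface=True,
                       trigger_reachable=True, package_api_called=True)
```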
### 5.12 Reachability Drift (Sprint 3600)
Track function-level reachability changes between scans:
- **New reachable**: Sinks that became reachable (alert)
- **Mitigated**: Sinks that became unreachable (positive)
- **Causal attribution**: Why change occurred (guard removed, new route, code change)
- **Components**: `DriftDetectionEngine`, `PathCompressor`, `DriftCauseExplainer`
- **API**: `POST /api/drift/analyze`, `GET /api/drift/{id}`
- **UI**: `PathViewerComponent`, `RiskDriftCardComponent`
- **Attestation**: DSSE-signed drift predicates for evidence chain
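
At its simplest, the "new reachable" and "mitigated" buckets are set differences between the reachable sinks of two scans. The sketch below assumes sinks are identified by stable string IDs; `reachability_drift` is a hypothetical name and causal attribution is not modeled.

```python
def reachability_drift(previous: set, current: set) -> dict:
    """Compare reachable sinks between two scans."""
    return {
        "new_reachable": sorted(current - previous),  # became reachable: alert
        "mitigated": sorted(previous - current),      # became unreachable: positive
    }


drift = reachability_drift(
    previous={"sym:java:a", "sym:java:b"},
    current={"sym:java:b", "sym:java:c"},
)
```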
---
## 6. Acceptance Tests
1. **Hash-lock**: reorder analyzer flags and confirm the graph hash is unchanged.
2. **Replay**: delete caches, replay the manifest, verify DSSE + hash equality.
3. **Tamper**: alter a single edge and expect VEX verification failure with a specific path mismatch.
4. **Golden corpus**: run all reachbench cases; ensure NotReachable vs Reachable twins align with the expectations JSON.
5. **Runtime sanity**: feed staged runtime traces and ensure the confidence bump + `observed=true` path chips propagate to the UI.
---
## 7. Documentation & Runbooks
- Place developer-facing updates here (`docs/modules/reach-graph/guides`).
- [Function-level evidence guide](function-level-evidence.md) captures the Nov 2025 advisory scope, task references, and schema expectations; keep it in lockstep with sprint status.
- [Reachability runtime runbook](../runbooks/reachability-runtime.md) documents ingestion, CAS staging, air-gap handling, and troubleshooting—link every runtime feature PR to this guide.
- [VEX Evidence Playbook](../benchmarks/vex-evidence-playbook.md) defines the bench repo layout, artifact shapes, verifier tooling, and metrics; keep it updated when Policy/Signer/CLI features land.
- [Reachability lattice](lattice.md) describes the confidence states, evidence/mitigation kinds, scoring policy, event graph schema, and VEX gates; update it when lattices or probes change.
- [PURL-resolved edges spec](purl-resolved-edges.md) defines the purl + symbol-digest annotation rules for graphs and SBOM joins.
- [Patch-oracles QA pattern](patch-oracles.md) describes the fixture layout and expectations for binary reachability guards.
- [Unknowns registry](../signals/unknowns-registry.md) documents how unresolved symbols/edges are recorded and how scoring uses `unknowns_pressure`.
- [Evidence schema](evidence-schema.md) is the canonical field list for richgraph, runtime facts, and Unknowns CAS objects.
- Update module dossiers (Scanner, Signals, Replay, Authority, Policy, UI) once each guild lands work.
---
## 8. Contact & Rituals
- **Daily reachability stand-up** in `#reachability-build`.
- **Fixture sync** every Friday: QA leads run reachbench matrix, post report to Confluence + link in `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`.
- **Decision log**: append ADRs under `docs/adr/reachability-*` for schema changes.
Keep this guide updated whenever scope shifts or a new sprint is added.


@@ -0,0 +1,44 @@
# Reachability Callgraph Formats (richgraph-v1)
## Purpose
Normalize static callgraphs across languages so Signals can merge them with runtime traces and replay bundles deterministically.
## Core fields (per node/edge)
- `nodes[].id` — canonical SymbolID (language-specific, stable, lowercase where applicable).
- `nodes[].kind` — e.g., method/function/class/file.
- `edges[].sourceId` / `edges[].targetId` — SymbolIDs; edge types include `call`, `import`, `inherit`, `reference`.
- `artifact` — CAS paths for source graph files; include `sha256`, `uri`, optional `generator` (analyzer name/version).
## Language-specific notes
- **JVM**: use JVM internal names; include signature for overloads.
- **.NET/Roslyn**: fully-qualified method token; include assembly and module for cross-assembly edges.
- **Go SSA**: package path + function; include receiver for methods.
- **Node/Deno TS**: module path + exported symbol; ES module graph only.
- **Rust MIR**: crate::module::symbol; monomorphized forms allowed if stable.
- **Swift SIL**: mangled name; demangled kept in metadata only.
- **Shell/binaries**: `SymbolID = sym:binary:{sha256(file)\0section\0addr\0name\0linkage}` via `SymbolId.ForBinaryAddressed`, include `code_id = CodeId.ForBinarySegment(...)` and set `kind=binary`.
## CAS layout
- Store graph bundles under `reachability_graphs/<hh>/<sha>.tar.zst`.
- Bundle SHOULD contain `meta.json` with analyzer, version, language, component, and entry points (array).
- File order inside tar must be lexicographic to keep hashes stable.
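
A minimal sketch of a hash-stable bundle, using only the Python standard library. It writes members in lexicographic order with fixed metadata and derives the storage path from the SHA-256 of the tar bytes. Two assumptions: `<hh>` is taken to be the first two hex characters of the digest, and zstd compression is omitted because it is not available in every standard library version.

```python
import hashlib
import io
import tarfile


def build_bundle(files: dict) -> bytes:
    """Write files into a tar in lexicographic name order with fixed metadata."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0  # fixed timestamp keeps the bytes reproducible
            tar.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


def bundle_path(sha256_hex: str) -> str:
    """ASSUMPTION: <hh> is the first two hex characters of the digest."""
    return f"reachability_graphs/{sha256_hex[:2]}/{sha256_hex}.tar.zst"


tar_bytes = build_bundle({"richgraph-v1.json": b"{}", "meta.json": b"{}"})
tar_sha = hashlib.sha256(tar_bytes).hexdigest()
path = bundle_path(tar_sha)
```

Building the same file set in a different insertion order produces identical bytes, which is what keeps the graph hash stable across runs.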
## Validation rules
- No duplicate node IDs; edges must reference existing nodes.
- Entry points list must be present (even if empty) for Signals recompute.
- Graph SHA256 must match tar content; Signals rejects mismatched SHA.
- Prefer ASCII only; UTF-8 paths are allowed but must be normalized to NFC.
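
The validation rules can be sketched as a pre-flight check. The node and edge field names come from the core fields listed earlier; the `entryPoints` key name is an assumption (the guide only says the list must be present), and the NFC check is applied to node IDs here as a stand-in for path normalization.

```python
import unicodedata


def validate_graph(graph: dict) -> list:
    """Return a sorted list of violations; an empty list means the graph passed."""
    errors = []
    node_ids = [node["id"] for node in graph.get("nodes", [])]
    known = set(node_ids)
    for node_id in sorted(known):
        if node_ids.count(node_id) > 1:
            errors.append(f"duplicate node id: {node_id}")
        if not unicodedata.is_normalized("NFC", node_id):
            errors.append(f"node id not NFC-normalized: {node_id!r}")
    for edge in graph.get("edges", []):
        for field in ("sourceId", "targetId"):
            if edge[field] not in known:
                errors.append(f"edge {field} references unknown node: {edge[field]}")
    # ASSUMPTION: the entry point list lives under an "entryPoints" key.
    if not isinstance(graph.get("entryPoints"), list):
        errors.append("entryPoints must be present as a list (it may be empty)")
    return sorted(errors)


good = {
    "nodes": [{"id": "a"}, {"id": "b"}],
    "edges": [{"sourceId": "a", "targetId": "b"}],
    "entryPoints": [],
}
problems = validate_graph(good)
```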
## V1 Schema Reference
The `stella.callgraph.v1` schema provides enhanced fields for explainability:
- **Edge Reasons**: 13 reason codes explaining why edges exist
- **Symbol Visibility**: Public/Internal/Protected/Private access levels
- **Typed Entrypoints**: Framework-aware entrypoint detection
See [Callgraph Schema Reference](../signals/callgraph-formats.md) for complete v1 schema documentation.
## References
- **V1 Schema Reference**: `docs/modules/signals/guides/callgraph-formats.md`
- Union schema: `docs/modules/reach-graph/schemas/runtime-static-union-schema.md`
- Delivery guide: `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`


@@ -0,0 +1,69 @@
# Reachability Corpus Plan (QA-CORPUS-401-031)
## Objective
- Maintain deterministic, offline reachability fixtures that validate callgraph ingestion, reachability truth-path handling, and VEX proof workflows.
- Keep the corpus small but multi-runtime (Go/.NET/Python/Rust), and keep a public-friendly mini dataset (PHP/JavaScript/C#) for docs/demos without external repos.
## Corpus Map
### 1) Multi-runtime corpus (internal MVP)
Path: `tests/reachability/corpus/`
Per-case layout: `tests/reachability/corpus/<language>/<case>/`
- `callgraph.static.json` — static call graph sample (stub for MVP).
- `ground-truth.json` — expected reachability outcome and example path(s) (Reachbench truth schema v1; `schema_version=reachbench.reachgraph.truth/v1`).
- `vex.openvex.json` — expected VEX slice for the case.
- Optional (future): `runtime/*.ndjson`, `sbom.*.json`
`tests/reachability/corpus/manifest.json` records deterministic SHA-256 hashes for required files in each case directory.
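
A minimal sketch of how such a manifest could be regenerated deterministically. It is not the repository's `update_corpus_manifest.py`; the manifest shape (case path mapped to per-file hex digests) and the helper names are assumptions. The demo runs against a throwaway directory so the sketch does not depend on the real corpus.

```python
import hashlib
import json
import tempfile
from pathlib import Path

REQUIRED_FILES = ("callgraph.static.json", "ground-truth.json", "vex.openvex.json")


def build_manifest(corpus_root: Path) -> dict:
    """Hash the required files of every <language>/<case>/ directory."""
    manifest = {}
    case_dirs = sorted(p for p in corpus_root.glob("*/*") if p.is_dir())
    for case_dir in case_dirs:
        key = case_dir.relative_to(corpus_root).as_posix()
        manifest[key] = {
            name: hashlib.sha256((case_dir / name).read_bytes()).hexdigest()
            for name in REQUIRED_FILES
        }
    return manifest


def render_manifest(manifest: dict) -> str:
    """Sorted keys and a trailing newline keep the file byte-stable."""
    return json.dumps(manifest, indent=2, sort_keys=True) + "\n"


with tempfile.TemporaryDirectory() as tmp:
    case = Path(tmp) / "go" / "example-case"
    case.mkdir(parents=True)
    for name in REQUIRED_FILES:
        (case / name).write_bytes(b"{}")
    demo_manifest = build_manifest(Path(tmp))
```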
### 2) Public mini dataset (PHP/JS/C#)
Path: `tests/reachability/samples-public/`
Layout:
- `schema/ground-truth.schema.json` — JSON schema for `ground-truth.json` (Reachbench truth schema v1).
- `manifest.json` — deterministic SHA-256 hashes for required files in each sample directory.
- `samples/<lang>/<case-id>/` — per-sample artifacts: `callgraph.static.json`, `ground-truth.json`, `sbom.cdx.json`, `vex.openvex.json`, `repro.sh`.
- `runners/run_all.{sh,ps1}` — deterministic manifest regeneration.
### 3) Reachbench fixture pack (expanded, dual variants)
Path: `tests/reachability/fixtures/reachbench-2025-expanded/`
Each case has two variants (reachable/unreachable) with per-variant `manifest.json` and `reachgraph.truth.json`. Fixture integrity is validated by `tests/reachability/StellaOps.Reachability.FixtureTests`.
## Ground Truth Conventions
- Corpus and public samples use the same truth schema (`reachbench.reachgraph.truth/v1`) but differ in file naming (`ground-truth.json` vs reachbench pack `reachgraph.truth.json`).
- Legacy corpus `expect.yaml` has been retired; prior `state/score` values are preserved under `legacy_expect` in `ground-truth.json`.
- Legacy `conditional` states are represented as `variant=unreachable` plus `legacy_expect.state=conditional` until the truth schema grows a dedicated conditional/contested variant.
## Determinism & Runners
Regenerate all reachability manifests (corpus + public samples + reachbench pack):
- `tests/reachability/runners/run_all.sh`
- `tests/reachability/runners/run_all.ps1`
Individual scripts:
- `python tests/reachability/scripts/update_corpus_manifest.py`
- `python tests/reachability/samples-public/scripts/update_manifest.py`
- `python tests/reachability/fixtures/reachbench-2025-expanded/harness/update_variant_manifests.py`
## CI Gates
- `tests/reachability/StellaOps.Reachability.FixtureTests`
  - validates presence + hashes from manifests for corpus/public samples/reachbench fixtures
  - enforces minimum language-bucket coverage (Go/.NET/Python/Rust + PHP/JS/C#)
## MVP Slice (stub cases)
- Go: `go-ssh-CVE-2020-9283-keyexchange`
- .NET: `dotnet-kestrel-CVE-2023-44487-http2-rapid-reset`
- Python: `python-django-CVE-2019-19844-sqli-like`
- Rust: `rust-axum-header-parsing-TBD`
## Next Work (post-MVP)
- Wire a CI job to run `tests/reachability/StellaOps.Reachability.FixtureTests`.
- Replace stubs with real callgraphs/traces and expand the corpus once CI is stable.


@@ -0,0 +1,143 @@
# CVE-to-Symbol Mapping
_Last updated: 2025-12-22. Owner: Scanner Guild + Concelier Guild._
This document describes how StellaOps maps CVE identifiers to specific binary symbols/functions for reachability slices.
---
## 1. Overview
To determine if a vulnerability is reachable, StellaOps resolves:
- **CVE identifiers** (e.g., `CVE-2024-1234`)
- **Package coordinates** (e.g., `pkg:npm/lodash@4.17.21`)
- **Affected symbols** (e.g., `lodash.template`, `openssl:EVP_PKEY_decrypt`)
The mapping is used by `SliceExtractor` to target the right symbols and by downstream VEX decisions.
---
## 2. Data Sources
### 2.1 Patch Diff Surfaces (Preferred)
Highest-fidelity source: compute method-level diffs between vulnerable and fixed versions.
**Implementation**: `StellaOps.Scanner.VulnSurfaces`
### 2.2 Advisory Linksets (Concelier)
Scanner queries Concelier's LNM linksets for package coordinates and optional symbol hints.
**Implementation**: `StellaOps.Scanner.Advisory` -> Concelier `/v1/lnm/linksets/{cveId}` or `/v1/lnm/linksets/search`
### 2.3 Offline Bundles
For air-gapped environments, precomputed bundles map CVEs to packages and symbols.
**Implementation**: `FileAdvisoryBundleStore`
---
## 3. Service Contracts
### 3.1 CVE -> Package/Symbol Mapping
```csharp
public interface IAdvisoryClient
{
    Task<AdvisorySymbolMapping?> GetCveSymbolsAsync(string cveId, CancellationToken ct = default);
}

public sealed record AdvisorySymbolMapping
{
    public required string CveId { get; init; }
    public ImmutableArray<AdvisoryPackageSymbols> Packages { get; init; }
    public required string Source { get; init; } // "concelier" | "bundle"
}

public sealed record AdvisoryPackageSymbols
{
    public required string Purl { get; init; }
    public ImmutableArray<string> Symbols { get; init; }
}
```
### 3.2 CVE + PURL -> Affected Symbols
```csharp
public interface IVulnSurfaceService
{
    Task<VulnSurfaceResult> GetAffectedSymbolsAsync(
        string cveId,
        string purl,
        CancellationToken ct = default);
}

public sealed record VulnSurfaceResult
{
    public required string CveId { get; init; }
    public required string Purl { get; init; }
    public required ImmutableArray<AffectedSymbol> Symbols { get; init; }
    public required string Source { get; init; } // "surface" | "package-symbols" | "heuristic"
    public required double Confidence { get; init; }
}

public sealed record AffectedSymbol
{
    public required string SymbolId { get; init; }
    public string? MethodKey { get; init; }
    public string? DisplayName { get; init; }
    public string? ChangeType { get; init; }
    public double Confidence { get; init; }
}
```
---
## 4. Caching Strategy
| Data | TTL | Notes |
|------|-----|------|
| Advisory linksets | 1 hour | In-memory cache; configurable TTL |
| Offline bundles | Process lifetime | Loaded once from file |
---
## 5. Offline Bundle Format
```json
{
  "items": [
    {
      "cveId": "CVE-2024-1234",
      "source": "bundle",
      "packages": [
        {
          "purl": "pkg:npm/lodash@4.17.21",
          "symbols": ["template", "templateSettings"]
        }
      ]
    }
  ]
}
```
---
## 6. Fallback Behavior
When no surface or advisory mapping is available, the service returns an empty symbol list with low confidence and `Source = "heuristic"`. Callers may inject an `IPackageSymbolProvider` to supply public-symbol fallbacks.
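
The resolution order implied by the `Source` values (`surface`, then `package-symbols`, then `heuristic`) can be sketched as below. This is a Python illustration rather than the C# service; the callback parameters and the confidence numbers are placeholders chosen for the example, not documented values.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class SurfaceResult:
    cve_id: str
    purl: str
    symbols: tuple
    source: str
    confidence: float


def resolve_symbols(
    cve_id: str,
    purl: str,
    surface_lookup: Callable[[str, str], Sequence[str]],
    package_symbols: Optional[Callable[[str], Sequence[str]]] = None,
) -> SurfaceResult:
    """Try the patch-diff surface first, then public package symbols, then give up."""
    symbols = surface_lookup(cve_id, purl)
    if symbols:
        return SurfaceResult(cve_id, purl, tuple(symbols), "surface", 0.9)
    if package_symbols is not None:
        public = package_symbols(purl)
        if public:
            return SurfaceResult(cve_id, purl, tuple(public), "package-symbols", 0.5)
    # No mapping at all: empty symbol list, low confidence, heuristic source.
    return SurfaceResult(cve_id, purl, (), "heuristic", 0.1)


result = resolve_symbols(
    "CVE-2024-1234",
    "pkg:npm/lodash@4.17.21",
    surface_lookup=lambda cve, purl: [],
)
```

Callers that receive a `heuristic` result with no symbols should treat reachability as undetermined rather than as evidence of safety.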
---
## 7. Related Documentation
- [Slice Schema](./slice-schema.md)
- [Patch Oracles](./patch-oracles.md)
- [Concelier Architecture](../modules/concelier/architecture.md)
---
_Created: 2025-12-22. See Sprint 3810 for implementation details._


@@ -0,0 +1,535 @@
# Function-Level Evidence Guide
_Last updated: 2025-12-13. Owner: Docs Guild._
This guide documents the cross-module function-level evidence chain that enables provable reachability claims. It covers the schema, identifiers, API usage, CLI commands, and integration patterns for Scanner, Signals, Policy, and Replay.
---
## 1. Overview
StellaOps implements a **function-level evidence chain** that anchors every vulnerability finding to immutable identifiers (`code_id`, `symbol_id`, `graph_hash`) enabling:
- **Provable reachability:** Deterministic call-path evidence from entry points to vulnerable functions.
- **Stripped binary support:** `code_id` + `code_block_hash` provides identity when symbols are absent.
- **Evidence replay:** Sealed artifacts with DSSE attestation allow offline verification.
- **Cross-module linking:** Scanner -> Signals -> Policy -> VEX -> UI/CLI evidence chain.
### 1.1 Core Identifiers
| Identifier | Format | Purpose | Example |
|------------|--------|---------|---------|
| `symbol_id` | `sym:{lang}:{base64url}` | Canonical function identity | `sym:java:R3JlZXRpbmc...` |
| `code_id` | `code:{lang}:{base64url}` | Identity for name-less code blocks | `code:binary:YWJjZGVm...` |
| `graph_hash` | `blake3:{hex}` | Content-addressable graph identity | `blake3:a1b2c3d4e5f6...` |
| `symbol_digest` | `sha256:{hex}` | Hash of symbol_id for edge linking | `sha256:e5f6a7b8c9d0...` |
| `build_id` | `gnu-build-id:{hex}` | ELF/PE debug identifier | `gnu-build-id:5f0c7c3c...` |
### 1.2 Evidence Chain Flow
```
Scanner -> richgraph-v1 -> Signals -> Scoring -> Policy -> VEX -> UI/CLI
   |            |             |          |         |        |       |
   |            |             |          |         |        |       +-- stella graph explain
   |            |             |          |         |        +-- OpenVEX with call-path proofs
   |            |             |          |         +-- Policy gates + reachability.state
   |            |             |          +-- Lattice state + confidence + riskScore
   |            |             +-- Runtime facts + static paths
   |            +-- BLAKE3 graph_hash + DSSE attestation
   +-- code_id, symbol_id, build_id per node
```
---
## 2. Schema Reference
### 2.1 SymbolID Construction
Per-language canonical tuple format (NUL-separated, then SHA-256 -> base64url):
| Language | Tuple Components | Example |
|----------|------------------|---------|
| Java | `{package}\0{class}\0{method}\0{descriptor}` | `com.example\0Foo\0bar\0(Ljava/lang/String;)V` |
| .NET | `{assembly}\0{namespace}\0{type}\0{member_signature}` | `MyApp\0Controllers\0UserController\0GetById(int)` |
| Go | `{module}\0{package}\0{receiver}\0{func}` | `github.com/user/repo\0handler\0*Server\0Handle` |
| Node | `{pkg_or_path}\0{export_path}\0{kind}` | `lodash\0get\0function` |
| Binary | `{file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?}` | `sha256:abc...\0.text\00x401000\0ssl3_read\0global\0` |
| Python | `{pkg_or_path}\0{module}\0{qualified_name}` | `requests\0api\0get` |
| Ruby | `{gem_or_path}\0{module}\0{method}` | `rails\0ActionController::Base\0render` |
| PHP | `{composer_pkg}\0{namespace}\0{qualified_name}` | `symfony/http-kernel\0Kernel\0handle` |
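
The construction described above (NUL-separated tuple, SHA-256, base64url) can be sketched as follows. Two details are assumptions because the guide does not state them: the tuple is encoded as UTF-8, and base64url padding is stripped. `symbol_digest` follows the "hash of symbol_id" description from the identifier table.

```python
import base64
import hashlib


def symbol_id(lang: str, *components: str) -> str:
    """Join the canonical tuple with NUL, hash it, and base64url-encode the digest."""
    tuple_bytes = "\0".join(components).encode("utf-8")  # ASSUMPTION: UTF-8 encoding
    digest = hashlib.sha256(tuple_bytes).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")  # ASSUMPTION: unpadded
    return f"sym:{lang}:{encoded}"


def symbol_digest(sym_id: str) -> str:
    """sha256 over the symbol_id string, used for edge linking."""
    return "sha256:" + hashlib.sha256(sym_id.encode("utf-8")).hexdigest()


java_id = symbol_id("java", "com.example", "Foo", "bar", "(Ljava/lang/String;)V")
java_digest = symbol_digest(java_id)
```

Because the separator is NUL rather than a printable character, tuples such as `("a.b", "c")` and `("a", "b.c")` cannot collide after joining.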
### 2.2 CodeID Construction
For stripped binaries or name-less code blocks:
```
code:{lang}:{base64url_sha256(format + file_hash + addr + length + section + code_block_hash)}
```
Example for stripped ELF:
```
code:binary:YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo
```
### 2.3 Graph Node Schema
Each node in a richgraph-v1 document includes:
```json
{
  "id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
  "symbol_id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
  "code_id": "code:java:...",
  "lang": "java",
  "kind": "method",
  "display": "com.example.GreetingService.greet(String)",
  "purl": "pkg:maven/com.example/greeting-service@1.0.0",
  "build_id": "gnu-build-id:5f0c7c3c...",
  "symbol_digest": "sha256:e5f6a7b8...",
  "code_block_hash": "sha256:deadbeef...",
  "symbol": {
    "mangled": null,
    "demangled": "com.example.GreetingService.greet(String)",
    "source": "DWARF",
    "confidence": 0.98
  },
  "evidence": ["import", "bytecode"],
  "attributes": {}
}
```
### 2.4 Graph Edge Schema
Edges carry callee `purl` and `symbol_digest` for SBOM correlation:
```json
{
  "from": "sym:java:caller...",
  "to": "sym:java:callee...",
  "kind": "call",
  "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
  "symbol_digest": "sha256:f1e2d3c4...",
  "confidence": 0.92,
  "evidence": ["bytecode", "import"],
  "candidates": []
}
```
### 2.5 Evidence Block Schema
Evidence blocks in Policy/VEX responses cite all relevant identifiers:
```json
{
  "evidence": {
    "graph_hash": "blake3:a1b2c3d4e5f6...",
    "graph_cas_uri": "cas://reachability/graphs/a1b2c3d4e5f6...",
    "dsse_uri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
    "path": [
      {"symbol_id": "sym:java:...", "display": "main()"},
      {"symbol_id": "sym:java:...", "display": "processRequest()"},
      {"symbol_id": "sym:java:...", "display": "log4j.error()"}
    ],
    "path_length": 3,
    "confidence": 0.85,
    "runtime_hits": ["probe:jfr:1234"],
    "analyzer": {
      "name": "scanner.java",
      "version": "1.2.0",
      "toolchain_digest": "sha256:..."
    }
  }
}
```
---
## 3. API Usage
### 3.1 Signals Callgraph Ingestion
Submit a callgraph and receive a deterministic `graph_hash`:
```http
POST /signals/callgraphs
Authorization: Bearer <token>
Content-Type: application/json

{
  "schema": "richgraph-v1",
  "analyzer": {"name": "scanner.java", "version": "1.2.0"},
  "nodes": [...],
  "edges": [...],
  "roots": [...]
}
```
**Response:**
```json
{
  "graphHash": "blake3:a1b2c3d4e5f6...",
  "casUri": "cas://reachability/graphs/a1b2c3d4e5f6...",
  "dsseUri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
  "nodeCount": 1247,
  "edgeCount": 3891
}
```
### 3.2 Signals Runtime Facts
Submit runtime observations with `code_id` anchors:
```http
POST /signals/runtime-facts/ndjson?scanId=scan-123&imageDigest=sha256:abc123
Authorization: Bearer <token>
Content-Type: application/x-ndjson
Content-Encoding: gzip

{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":47,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:00Z"}
{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":12,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:01Z"}
```
**Response:**
```json
{
  "accepted": 128,
  "duplicates": 2,
  "evidenceUri": "cas://reachability/runtime/sha256:xyz789..."
}
```
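
To show where the `accepted` and `duplicates` counters could come from, here is a minimal sketch of NDJSON ingestion with dedupe. The duplicate key `(symbolId, codeId, processId, observedAt)` and the choice to exclude duplicates from `accepted` are assumptions for illustration; the guide does not define either.

```python
import gzip
import json


def ingest_runtime_facts(body: bytes, gzipped: bool = False) -> dict:
    """Count unique and duplicate runtime events in an NDJSON batch."""
    if gzipped:
        body = gzip.decompress(body)
    seen = set()
    duplicates = 0
    for line in body.decode("utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        event = json.loads(line)
        # ASSUMPTION: these four fields identify a duplicate observation.
        key = (event["symbolId"], event.get("codeId"), event.get("processId"), event["observedAt"])
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"accepted": len(seen), "duplicates": duplicates}


events = [
    {"symbolId": "sym:java:a", "codeId": "code:java:a", "processId": 1, "observedAt": "2025-12-13T10:00:00Z"},
    {"symbolId": "sym:java:a", "codeId": "code:java:a", "processId": 1, "observedAt": "2025-12-13T10:00:00Z"},
    {"symbolId": "sym:java:b", "codeId": "code:java:b", "processId": 1, "observedAt": "2025-12-13T10:00:01Z"},
]
payload = "\n".join(json.dumps(e) for e in events).encode("utf-8")
summary = ingest_runtime_facts(gzip.compress(payload), gzipped=True)
```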
### 3.3 Fetch Reachability Facts
Query reachability state for a subject:
```http
GET /signals/facts/{subjectKey}
Authorization: Bearer <token>
```
**Response:**
```json
{
  "subjectKey": "scan:123:pkg:maven/log4j:2.14.1:CVE-2021-44228",
  "metadata": {
    "fact": {
      "digest": "sha256:abc123...",
      "version": 3
    }
  },
  "states": [
    {
      "symbol": "sym:java:...",
      "latticeState": "CR",
      "bucket": "runtime",
      "confidence": 0.92,
      "score": 0.78,
      "path": ["sym:java:main...", "sym:java:process...", "sym:java:log4j..."],
      "evidence": {
        "static": {"graphHash": "blake3:...", "pathLength": 3, "confidence": 0.85},
        "runtime": {"probeId": "probe:jfr:1234", "hitCount": 47, "observedAt": "2025-12-13T10:00:00Z"}
      }
    }
  ],
  "score": 0.78,
  "aggregateTier": "T2",
  "riskScore": 0.65
}
```
### 3.4 Policy Findings with Reachability Evidence
```http
GET /api/policy/findings/{policyId}/{findingId}/explain?mode=verbose
Authorization: Bearer <token>
```
**Response (excerpt):**
```json
{
  "findingId": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
  "reachability": {
    "state": "CR",
    "confidence": 0.92,
    "evidence": {
      "graph_hash": "blake3:a1b2c3d4...",
      "path": [
        {"symbol_id": "sym:java:...", "display": "main()"},
        {"symbol_id": "sym:java:...", "display": "Logger.error()"}
      ],
      "runtime_hits": 47,
      "fact_digest": "sha256:abc123..."
    }
  },
  "steps": [
    {"rule": "reachability_gate", "state": "CR", "allowed": true},
    {"rule": "severity_baseline", "severity": {"normalized": "Critical", "score": 10.0}}
  ]
}
```
---
## 4. CLI Usage
### 4.1 Graph Explain Command
View the call path and evidence for a finding:
```bash
stella graph explain --finding "pkg:maven/log4j@2.14.1:CVE-2021-44228" --scan-id scan-123
# Output:
Finding: CVE-2021-44228 in pkg:maven/log4j@2.14.1
Reachability: CONFIRMED_REACHABLE (CR)
Confidence: 0.92
Graph Hash: blake3:a1b2c3d4e5f6...
Call Path (3 hops):
1. main() [sym:java:R3JlZXRpbmcuLi4=]
-> processRequest() [direct call]
2. processRequest() [sym:java:cHJvY2Vzcy4uLg==]
-> Logger.error() [virtual call]
3. Logger.error() [sym:java:bG9nNGouLi4=]
[VULNERABLE: CVE-2021-44228]
Runtime Evidence:
- JFR probe hit: 47 times
- Last observed: 2025-12-13T10:00:00Z
DSSE Attestation: cas://reachability/graphs/a1b2c3d4....dsse
```
### 4.2 Graph Export Command
Export a reachability graph for offline analysis:
```bash
stella graph export --scan-id scan-123 --output ./evidence-bundle/
# Creates:
# ./evidence-bundle/richgraph-v1.json # Canonical graph
# ./evidence-bundle/richgraph-v1.json.dsse # DSSE envelope
# ./evidence-bundle/meta.json # Metadata
# ./evidence-bundle/runtime-facts.ndjson # Runtime observations
```
### 4.3 Graph Verify Command
Verify a graph's DSSE signature and Rekor inclusion:
```bash
stella graph verify --graph ./evidence-bundle/richgraph-v1.json \
--dsse ./evidence-bundle/richgraph-v1.json.dsse \
--rekor-log
# Output:
Graph Hash: blake3:a1b2c3d4e5f6...
DSSE Signature: VALID (key: scanner-signing-2025)
Rekor Entry: 12345678 (verified)
Timestamp: 2025-12-13T09:30:00Z
```
---
## 5. OpenVEX Integration
### 5.1 OpenVEX with Reachability Evidence
When Policy emits VEX decisions, reachability evidence is included:
```json
{
"@context": "https://openvex.dev/ns/v0.2.0",
"@id": "https://stellaops.example/vex/2025-12-13/001",
"author": "StellaOps Policy Engine",
"timestamp": "2025-12-13T10:00:00Z",
"version": 1,
"statements": [
{
"vulnerability": {"@id": "CVE-2021-44228"},
"products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
"status": "affected",
"justification": "vulnerable_code_in_container",
"impact_statement": "Vulnerable Log4j method reachable from main entry point.",
"action_statement": "Upgrade to log4j 2.17.1 or later.",
"stellaops:reachability": {
"state": "CR",
"confidence": 0.92,
"graph_hash": "blake3:a1b2c3d4e5f6...",
"path_length": 3,
"evidence_uri": "cas://reachability/graphs/a1b2c3d4..."
}
}
]
}
```
### 5.2 VEX "not_affected" with Unreachability Evidence
When code is provably unreachable:
```json
{
"statements": [
{
"vulnerability": {"@id": "CVE-2023-XXXXX"},
"products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
"status": "not_affected",
"justification": "vulnerable_code_not_in_execute_path",
"impact_statement": "Vulnerable function not reachable from any entry point.",
"stellaops:reachability": {
"state": "CU",
"confidence": 0.88,
"graph_hash": "blake3:d4e5f6a7b8c9...",
"evidence_uri": "cas://reachability/graphs/d4e5f6a7b8c9...",
"runtime_observation_window": "72h",
"runtime_hits": 0
}
}
]
}
```
---
## 6. Replay Manifest v2
### 6.1 Manifest Structure
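Replay manifests (v2) enforce BLAKE3 hashing for graph artifacts and CAS registration for every artifact. As the example shows, runtime-facts and SBOM entries still carry `sha256:` digests: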
```json
{
"schema": "stellaops.replay.manifest@v2",
"subject": "scan:123",
"generatedAt": "2025-12-13T10:00:00Z",
"hashAlg": "blake3",
"artifacts": [
{
"kind": "richgraph",
"uri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6...",
"hash": "blake3:a1b2c3d4e5f6...",
"dsseUri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6....dsse"
},
{
"kind": "runtime-facts",
"uri": "cas://reachability/runtime/sha256:xyz789...",
"hash": "sha256:xyz789..."
},
{
"kind": "sbom",
"uri": "cas://scanner-artifacts/sbom.cdx.json",
"hash": "sha256:def456..."
}
],
"analyzer": {
"name": "scanner.java",
"version": "1.2.0",
"toolchain_digest": "sha256:..."
},
"code_id_coverage": {
"total_symbols": 1247,
"with_code_id": 1189,
"coverage_pct": 95.3
}
}
```
### 6.2 Determinism Verification
Replay a manifest to verify determinism:
```bash
stella replay verify --manifest ./manifest.json --sealed
# Output:
Manifest: stellaops.replay.manifest@v2
Subject: scan:123
Artifacts: 3
Verifying richgraph...
Computed: blake3:a1b2c3d4e5f6...
Expected: blake3:a1b2c3d4e5f6...
Status: MATCH
Verifying runtime-facts...
Computed: sha256:xyz789...
Expected: sha256:xyz789...
Status: MATCH
Verifying sbom...
Computed: sha256:def456...
Expected: sha256:def456...
Status: MATCH
All artifacts verified. Determinism check PASSED.
```
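
The per-artifact comparison reported above can be sketched as follows. This is an illustration only, not the `stella replay verify` implementation: `verify_artifact` and the `HASHERS` registry are hypothetical names, and only SHA-256 is wired in because BLAKE3 is not part of the Python standard library (a third-party BLAKE3 binding could be registered the same way).

```python
import hashlib

# Hypothetical registry: algorithm prefix -> function returning a hex digest.
# Only sha256 is registered here; BLAKE3 would need a third-party package.
HASHERS = {
    "sha256": lambda data: hashlib.sha256(data).hexdigest(),
}


def verify_artifact(expected, body, hashers=HASHERS):
    """Return MATCH, MISMATCH, or UNSUPPORTED for one manifest artifact.

    `expected` uses the manifest's `<alg>:<hex>` form, e.g. `sha256:ab12...`.
    """
    alg, _, digest = expected.partition(":")
    hasher = hashers.get(alg)
    if hasher is None:
        return "UNSUPPORTED"
    return "MATCH" if hasher(body) == digest.lower() else "MISMATCH"
```

A replay passes only when every artifact listed in the manifest reports `MATCH`.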
---
## 7. Module Integration Guide
### 7.1 Scanner -> Signals
Scanner emits richgraph-v1 with `code_id` and `symbol_id`:
1. Scanner analyzes container/artifact
2. Callgraph generators emit nodes with `symbol_id`, `code_id`, `build_id`
3. RichGraphWriter canonicalizes (sorted arrays/keys) and computes `graph_hash` (BLAKE3)
4. DSSE signer wraps canonical JSON
5. CAS store persists body + envelope
6. Signals ingestion API receives URI reference
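
Step 3 (canonicalize, then hash) can be sketched as below. This is a minimal illustration under stated assumptions, not the `RichGraphWriter` code: the field names `nodes`, `edges`, `from`, and `to` are assumed, and SHA-256 stands in for BLAKE3 because BLAKE3 is not in the Python standard library.

```python
import hashlib
import json


def canonicalize_graph(graph):
    """Serialize a graph so logically equal graphs yield identical bytes."""
    canonical = dict(graph)
    # Assumed field names: nodes carry `symbol_id`, edges carry `from`/`to`.
    canonical["nodes"] = sorted(graph.get("nodes", []), key=lambda n: n["symbol_id"])
    canonical["edges"] = sorted(graph.get("edges", []), key=lambda e: (e["from"], e["to"]))
    # sort_keys orders object keys; compact separators remove whitespace variance.
    text = json.dumps(canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def graph_hash(graph, hasher=hashlib.sha256, prefix="sha256"):
    """Hash the canonical bytes. SHA-256 is a stand-in for BLAKE3 here."""
    return prefix + ":" + hasher(canonicalize_graph(graph)).hexdigest()
```

Because ordering is fixed before hashing, two scans that discover the same nodes and edges in a different order produce the same hash.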
### 7.2 Signals -> Policy
Signals provides reachability facts to Policy:
1. Policy queries `/signals/facts/{subjectKey}`
2. Response includes `metadata.fact.digest`, `states[]`, `score`
3. Policy gates check `latticeState` (U, SR, SU, RO, RU, CR, CU, X)
4. Evidence blocks in findings reference `graph_hash`, `path[]`, `runtime_hits[]`
### 7.3 Policy -> VEX/UI
Policy emits OpenVEX with evidence:
1. VexDecisionEmitter serializes OpenVEX with `stellaops:reachability` extension
2. UI explain drawer fetches evidence via `/api/policy/findings/{id}/explain`
3. CLI `stella graph explain` renders call path and attestation refs
---
## 8. CAS Layout Reference
```
cas://reachability/
graphs/
{blake3}/ # Graph body (canonical JSON)
{blake3}.dsse # DSSE envelope
edges/
{graph_hash}/{bundle_id} # Edge bundle body (optional)
{graph_hash}/{bundle_id}.dsse
runtime/
{sha256}/ # Runtime facts NDJSON
```
---
## 9. Related Documentation
- [Reachability Lattice Model](./lattice.md) - State definitions and join rules
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Schema specification
- [Evidence Schema](./evidence-schema.md) - Detailed field definitions
- [Signals API Contract](../api/signals/reachability-contract.md) - API reference
- [Policy Gates](./policy-gate.md) - Gate configuration
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
- [Ground Truth Schema](./ground-truth-schema.md) - Test fixture format
---
_Last updated: 2025-12-13. See Sprint 0401 GAP-DOC-008 for change history._


@@ -0,0 +1,206 @@
# Gate Detection for Reachability Scoring
> **Sprint:** SPRINT_3405_0001_0001
> **Module:** Scanner Reachability / Signals
## Overview
Gate detection identifies protective controls in code paths that reduce the likelihood of vulnerability exploitation. When a vulnerable function is protected by authentication, feature flags, admin-only checks, or configuration gates, the reachability score is multiplied by the corresponding gate multiplier (see Gate Types below), subject to a minimum floor.
## Gate Types
| Gate Type | Multiplier | Description |
|-----------|------------|-------------|
| `AuthRequired` | 30% | Code path requires authentication |
| `FeatureFlag` | 20% | Code path behind a feature flag |
| `AdminOnly` | 15% | Code path requires admin/elevated role |
| `NonDefaultConfig` | 50% | Code path requires non-default configuration |
### Multiplier Stacking
Multiple gate types stack multiplicatively:
```
Auth (30%) × Feature Flag (20%) = 6%
Auth (30%) × Admin (15%) = 4.5% (floored to 5%)
All four gates = ~0.45% (floored to 5%)
```
A minimum floor of **5%** prevents scores from reaching zero.
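
The stacking rule can be worked through in integer basis points (10000 = 100%). The sketch below is illustrative: `combined_multiplier_bps` is a hypothetical helper, and it assumes each gate type counts once no matter how many gates of that type are detected, with the floor applied after multiplying.

```python
# Default multipliers in basis points, taken from the Gate Types table.
DEFAULT_MULTIPLIERS_BPS = {
    "AuthRequired": 3000,
    "FeatureFlag": 2000,
    "AdminOnly": 1500,
    "NonDefaultConfig": 5000,
}
FULL_BPS = 10000
MINIMUM_BPS = 500  # the 5% floor


def combined_multiplier_bps(gate_types, multipliers=DEFAULT_MULTIPLIERS_BPS, floor=MINIMUM_BPS):
    """Multiply the per-type multipliers and apply the floor."""
    distinct = sorted(set(gate_types))  # assumption: one factor per gate type
    if not distinct:
        return FULL_BPS  # no gates, no reduction
    product = 1
    for gate_type in distinct:
        product *= multipliers[gate_type]
    # Rescale the integer product back to basis points.
    combined = product // (FULL_BPS ** (len(distinct) - 1))
    return max(floor, combined)
```

Under these assumptions, Auth plus Feature Flag gives 600 bps (6%), while Auth plus Admin computes to 450 bps and is raised to the 500 bps floor.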
## Detection Methods
### AuthGateDetector
Detects authentication requirements:
**C# Patterns:**
- `[Authorize]` attribute
- `User.Identity.IsAuthenticated` checks
- `HttpContext.User` access
- JWT/Bearer token validation
**Java Patterns:**
- `@PreAuthorize`, `@Secured` annotations
- `SecurityContextHolder.getContext()`
- Spring Security filter chains
**Go Patterns:**
- Middleware patterns (`authMiddleware`, `RequireAuth`)
- Context-based auth checks
**JavaScript/TypeScript Patterns:**
- Express.js `passport` middleware
- JWT verification middleware
- Session checks
### FeatureFlagDetector
Detects feature flag guards:
**Patterns:**
- LaunchDarkly: `ldClient.variation()`, `ld.boolVariation()`
- Split.io: `splitClient.getTreatment()`
- Unleash: `unleash.isEnabled()`
- Custom: `featureFlags.isEnabled()`, `isFeatureEnabled()`
### AdminOnlyDetector
Detects admin/role requirements:
**Patterns:**
- `[Authorize(Roles = "Admin")]`
- `User.IsInRole("Admin")`
- `@RolesAllowed("ADMIN")`
- RBAC middleware checks
### ConfigGateDetector
Detects configuration-based gates:
**Patterns:**
- Environment variable checks (`process.env.ENABLE_FEATURE`)
- Configuration file conditionals
- Runtime feature toggles
- Debug-only code paths
## Output Contract
### DetectedGate
**Note:** In **Signals API outputs**, `type` is serialized as the C# enum name (e.g., `"AuthRequired"`). In **richgraph-v1** JSON, `type` is lowerCamelCase and gate fields are snake_case (see example below).
```typescript
interface DetectedGate {
type: 'AuthRequired' | 'FeatureFlag' | 'AdminOnly' | 'NonDefaultConfig';
detail: string; // Human-readable description
guardSymbol: string; // Symbol where gate was detected
sourceFile?: string; // Source file location
lineNumber?: number; // Line number
confidence: number; // 0.0-1.0 confidence score
detectionMethod: string; // Detection algorithm used
}
```
### GateDetectionResult
```typescript
interface GateDetectionResult {
gates: DetectedGate[];
hasGates: boolean;
primaryGate?: DetectedGate; // Highest confidence gate
combinedMultiplierBps: number; // Basis points (10000 = 100%)
}
```
## Integration
### RichGraph Edge Annotation
Gates are annotated on `RichGraphEdge` objects:
```csharp
public sealed record RichGraphEdge
{
// ... existing properties ...
/// <summary>Gates detected on this edge</summary>
public IReadOnlyList<DetectedGate> Gates { get; init; } = [];
/// <summary>Combined gate multiplier in basis points</summary>
public int GateMultiplierBps { get; init; } = 10000;
}
```
**richgraph-v1 JSON example (edge fragment):**
```json
{
"gate_multiplier_bps": 3000,
"gates": [
{
"type": "authRequired",
"detail": "[Authorize] attribute on controller",
"guard_symbol": "MyController.VulnerableAction",
"source_file": "src/MyController.cs",
"line_number": 42,
"detection_method": "csharp.attribute",
"confidence": 0.95
}
]
}
```
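
The naming difference between the two serializations (enum-name `type` and camelCase fields in Signals outputs, lowerCamelCase `type` and snake_case fields in richgraph-v1) can be illustrated with a small converter. `to_richgraph_gate` is a hypothetical helper written for this guide, not part of either service.

```python
import re


def to_richgraph_gate(gate):
    """Convert a Signals-style gate dict to richgraph-v1 naming."""

    def snake(name):
        # Insert "_" before every capital that is not at the start, then lowercase.
        return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

    out = {snake(key): value for key, value in gate.items()}
    gate_type = out.get("type")
    if gate_type:
        # "AuthRequired" -> "authRequired"
        out["type"] = gate_type[0].lower() + gate_type[1:]
    return out
```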
### ReachabilityReport
Gates are included in the reachability report:
```json
{
"vulnId": "CVE-2024-0001",
"reachable": true,
"score": 7.5,
"adjustedScore": 2.25,
"gates": [
{
"type": "AuthRequired",
"detail": "[Authorize] attribute on controller",
"guardSymbol": "MyController.VulnerableAction",
"confidence": 0.95
}
],
"gateMultiplierBps": 3000
}
```
## Configuration
### appsettings.json
```json
{
"Reachability": {
"GateMultipliers": {
"AuthRequiredMultiplierBps": 3000,
"FeatureFlagMultiplierBps": 2000,
"AdminOnlyMultiplierBps": 1500,
"NonDefaultConfigMultiplierBps": 5000,
"MinimumMultiplierBps": 500
}
}
}
```
## Metrics
| Metric | Description |
|--------|-------------|
| `scanner.gates_detected_total` | Total gates detected by type |
| `scanner.gate_reduction_applied` | Histogram of multiplier reductions |
| `scanner.gated_vulns_total` | Vulnerabilities with gates detected |
## Related Documentation
- [Reachability Architecture](../modules/scanner/architecture.md)
- [Determinism Technical Reference](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md) - Sections 2.2, 4.3
- [Signals Service](../modules/signals/architecture.md)


@@ -0,0 +1,508 @@
# Hybrid Reachability Attestation (Graph + Edge-Bundle)
> Decision date: 2025-12-11 · Owners: Scanner Guild, Attestor Guild, Signals Guild, Policy Guild
## 0. Context: Four Capabilities
This document supports **Signed Reachability**—one of four capabilities no competitor offers together:
1. **Signed Reachability:** Every reachability graph is sealed with DSSE, with optional edge-bundle attestations for runtime/init/contested paths. Both static call-graph edges and runtime-derived edges can be attested, giving true hybrid reachability.
2. **Deterministic Replay:** Scans replay bit-for-bit identically from frozen feeds and analyzer manifests.
3. **Explainable Policy (Lattice VEX):** Evidence-linked VEX decisions with explicit "Unknown" state handling.
4. **Sovereign + Offline Operation:** FIPS/eIDAS/GOST/SM/PQC profiles and offline mirrors as first-class toggles.
All evidence is sealed in **Decision Capsules** for audit-grade reproducibility.
---
## 1. Purpose
- Guarantee replayable, signed reachability evidence with **graph-level DSSE** for every scan while enabling **selective edge-level DSSE bundles** when finer provenance or dispute handling is required.
- Keep CI/offline bundles lean (graph-first), but allow auditors/regulators to quarantine or prove individual edges without regenerating whole graphs.
- Support **hybrid reachability** by attesting both static call-graph edges and runtime-derived edges.
## 2. Attestation levels
- **Level 0 (Graph DSSE) — Required**
- Payload: canonical `richgraph-v1` (nodes, edges, roots, graph_hash, analyzer metadata, policy_hash).
- Signature: one DSSE envelope per graph; always submit the digest to Rekor (or a mirror).
- CAS: `cas://reachability/graphs/{blake3}` (body) + `cas://reachability/graphs/{blake3}.dsse` (envelope).
- **Level 1 (Edge-Bundle DSSE) — Optional/Selective**
- Payload: batch of edges (size ≤ 512) with per-edge reason, evidence hashes, `symbol_digest`, `purl`, `confidence`, and `phase`.
- Criteria to emit bundles:
- Edge reason is `runtime`, `init_array`/constructors/TLS callbacks, or comes from third-party provenance.
- Edge is contested/flagged in Unknowns registry or under policy quarantine.
- Signature: one DSSE envelope per bundle; Rekor submission **configurable** (default on for contested/high-risk bundles, off for bulk benign bundles in sealed mode).
- CAS: `cas://reachability/edges/{graph_hash}/{bundle_id}` JSON + `.../{bundle_id}.dsse`.
## 3. Producer responsibilities
- **Scanner**
- Always emit Level 0 graph + manifest.
- When criteria match, emit Level 1 bundles; include `bundle_reason` (e.g., `runtime-hit`, `init-root`, `third-party`, `disputed`).
- Canonicalise JSON (sorted keys/arrays) before hashing; BLAKE3 as graph hash, SHA-256 inside bundles.
- For hybrid reachability: tag edges with `source: static` or `source: runtime` to distinguish call-graph derived vs. runtime-observed edges.
- **Attestor/Signer**
- Apply DSSE for both levels; respect sovereign crypto modes (FIPS/GOST/SM/PQC) from environment.
- Rekor: push graph envelope digests; push edge-bundle digests only when `rekor_publish=true` (policy/default for high-risk bundles).
## 4. Consumer responsibilities
- **Signals**
- Ingest graph DSSE as the canonical source; ingest edge-bundles when present and attach to the same `graph_hash`.
- Store per-edge DSSE metadata for quarantine/override flows; surface missing edges as Unknowns only when absent from both graph and bundles.
- **Policy**
- Default trust path: graph DSSE + CAS object.
- When an edge is quarantined/contested, drop it from consideration if an edge-bundle DSSE marks it `revoked=true` or if the Unknowns registry lists it with policy quarantine flag.
- For "evidence-required" rules, require either (a) graph DSSE + policy_hash match **or** (b) edge-bundle DSSE that covers the vulnerable path edges.
- **Replay/Bench/CLI**
- `stella graph verify` should accept `--graph {hash}` and optional `--edge-bundles` to validate deeper provenance offline.
## 5. Verification and quarantine flows
- **Happy path**: verify graph DSSE → verify Rekor inclusion (or mirror) → hash graph body → match `graph_hash` in policy/replay manifest → accept.
- **Dispute/quarantine**: mark specific `edge_id` as `revoked` in an edge-bundle DSSE; Policy/Signals exclude it, recompute reachability, and surface delta in explainers.
- **Offline**: retain graph DSSE and selected edge-bundles inside replay pack; Rekor proofs cached when available.
- **Sovereign Verification Mode**: Even with no internet, all signatures and transparency proofs can be locally verified using Offline Update Kits.
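
The dispute/quarantine flow (exclude revoked edges, recompute reachability, surface the delta) can be sketched with a plain breadth-first search. The tuple shape `(edge_id, source, target)` and the function name are assumptions made for illustration; the real services operate on richgraph-v1 and edge-bundle documents.

```python
from collections import deque


def reachable_targets(entry_points, edges, revoked=frozenset()):
    """Return the symbols reachable from entry_points, ignoring revoked edges.

    edges: iterable of (edge_id, source, target) tuples (hypothetical shape).
    """
    adjacency = {}
    for edge_id, source, target in edges:
        if edge_id in revoked:
            continue  # quarantined edges take no part in the traversal
        adjacency.setdefault(source, []).append(target)

    seen = set(entry_points)
    queue = deque(sorted(seen))  # sorted for a deterministic visit order
    while queue:
        node = queue.popleft()
        for nxt in sorted(adjacency.get(node, [])):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

The delta shown in explainers would then be the set difference between the result without revocations and the result with them.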
## 6. Performance & storage guardrails
- Default: only graph DSSE is mandatory; edge-bundles capped at 512 edges per envelope and emitted only on criteria above.
- Rekor flood control: cap edge-bundle Rekor submissions per graph (config `reachability.edgeBundles.maxRekorPublishes`, default 5). Others stay CAS-only.
- Determinism: bundle ordering = stable sort by `(bundle_reason, edge_id)`; hash before signing.
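
One way to read the guardrails above is sketched here: order edges by `(bundle_reason, edge_id)`, then cut each reason group into chunks of at most 512 edges. This is an interpretation, not the `EdgeBundlePublisher` logic; in particular, carrying `bundle_reason` on each edge dict and keeping one reason per bundle are assumptions.

```python
from itertools import groupby

MAX_EDGES_PER_BUNDLE = 512


def build_bundles(edges, max_edges=MAX_EDGES_PER_BUNDLE):
    """Group edges into bundles of at most max_edges, in a stable order."""
    # Python's sort is stable, so equal keys keep their input order.
    ordered = sorted(edges, key=lambda e: (e["bundle_reason"], e["edge_id"]))
    bundles = []
    for reason, group in groupby(ordered, key=lambda e: e["bundle_reason"]):
        items = list(group)
        for start in range(0, len(items), max_edges):
            bundles.append({"bundle_reason": reason, "edges": items[start:start + max_edges]})
    return bundles
```

Because the output depends only on the sorted edge set, the same input always yields the same bundles and therefore the same bundle hashes.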
## 7. Hybrid Reachability Details
Stella Ops provides **true hybrid reachability** by combining:
| Signal Type | Source | Attestation |
|-------------|--------|-------------|
| Static call-graph edges | IL/bytecode analysis, framework routing models, entry-point proximity | Graph DSSE (Level 0) |
| Runtime-observed edges | EventPipe, JFR, Node inspector, Go/Rust probes | Edge-bundle DSSE (Level 1) with `source: runtime` |
**Why hybrid matters:**
- Static analysis catches code paths that may not execute during observed runtime
- Runtime analysis catches dynamic dispatch, reflection, and framework-injected paths
- Combining both provides confidence across build and runtime contexts
- Each edge type is separately attestable for audit and dispute resolution
**Evidence linking:** Each edge in the graph or bundle includes `evidenceRefs` pointing to the underlying proof artifacts (static analysis artifacts, runtime traces), enabling **evidence-linked VEX decisions**.
## 8. Decisions (Frozen 2025-12-13)
### 8.1 DSSE/Rekor Budget by Deployment Tier
| Tier | Graph DSSE | Edge-Bundle DSSE | Rekor Publish | Max Bundles/Graph |
|------|------------|------------------|---------------|-------------------|
| **Regulated** (SOC2, FedRAMP, PCI) | Required | Required for runtime/contested | Required | 10 |
| **Standard** | Required | Optional (criteria-based) | Graph only | 5 |
| **Air-gapped** | Required | Optional | Offline checkpoint | 5 |
| **Dev/Test** | Optional | Optional | Disabled | Unlimited |
**Budget enforcement:**
- Graph DSSE: Always submit digest to Rekor (or offline checkpoint for air-gapped)
- Edge-bundle DSSE: Submit to Rekor only when `bundle_reason` is `disputed`, `runtime-hit`, or `security-critical`
- Cap enforced by `reachability.edgeBundles.maxRekorPublishes` config (per tier defaults above)
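
The budget rule can be sketched as a selection function. The document does not say which bundles win when more are eligible than the cap allows, so this sketch simply takes the first ones in `(bundle_reason, bundle_id)` order; that tie-breaking rule and the function name are assumptions.

```python
PUBLISH_REASONS = frozenset({"disputed", "runtime-hit", "security-critical"})


def select_rekor_publishes(bundles, max_publishes=5, publish_reasons=PUBLISH_REASONS):
    """Return the bundle IDs to publish to Rekor; all others stay CAS-only."""
    eligible = sorted(
        (b for b in bundles if b["bundle_reason"] in publish_reasons),
        key=lambda b: (b["bundle_reason"], b["bundle_id"]),
    )
    return [b["bundle_id"] for b in eligible[:max_publishes]]
```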
### 8.2 Signing Layout and CAS Paths
```
cas://reachability/
graphs/
{blake3}/ # richgraph-v1 body (JSON)
{blake3}.dsse # Graph DSSE envelope
{blake3}.rekor # Rekor inclusion proof (optional)
edges/
{graph_hash}/
{bundle_id}.json # Edge bundle body
{bundle_id}.dsse # Edge bundle DSSE envelope
{bundle_id}.rekor # Rekor inclusion proof (if published)
revisions/
{revision_id}/ # Revision manifest + lineage
```
**Signing workflow:**
1. Canonicalize richgraph-v1 JSON (keys sorted; arrays ordered by a deterministic key)
2. Compute BLAKE3-256 hash -> `graph_hash`
3. Create DSSE envelope with `stella.ops/graph@v1` predicate
4. Submit digest to Rekor (online) or cache checkpoint (offline)
5. Store graph body + envelope + proof in CAS
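
For step 3, note that a DSSE signature covers the pre-authentication encoding (PAE) of the payload type and payload, not the raw payload bytes. The sketch below shows PAE as defined by the DSSE v1 specification. The payload type StellaOps uses for graph envelopes is not stated in this section, so any value passed in is the caller's assumption.

```python
def pae(payload_type, payload):
    """DSSE v1 pre-authentication encoding: the exact bytes a signer signs.

    Layout: "DSSEv1" SP len(type) SP type SP len(payload) SP payload
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])
```

Signing the PAE output binds the payload type to the canonical graph bytes, so an envelope cannot be reinterpreted under a different type.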
### 8.3 CLI UX for Selective Bundle Verification
```bash
# Verify graph DSSE only (default)
stella graph verify --hash blake3:a1b2c3d4...
# Verify graph + all edge bundles
stella graph verify --hash blake3:a1b2c3d4... --include-bundles
# Verify specific edge bundle
stella graph verify --hash blake3:a1b2c3d4... --bundle bundle:001
# Offline verification with local CAS
stella graph verify --hash blake3:a1b2c3d4... --cas-root ./offline-cas/
# Verify Rekor inclusion
stella graph verify --hash blake3:a1b2c3d4... --rekor-proof
# Output formats
stella graph verify --hash blake3:a1b2c3d4... --format json|table|summary
```
### 8.4 Golden Fixture Plan
**Fixture location:** `tests/Reachability/Hybrid/`
**Required fixtures:**
| Fixture | Description | Expected Verification Time |
|---------|-------------|---------------------------|
| `graph-only.golden.json` | Minimal richgraph-v1 with DSSE | < 100ms |
| `graph-with-runtime.golden.json` | Graph + 1 runtime edge bundle | < 200ms |
| `graph-with-contested.golden.json` | Graph + 1 contested/revoked edge bundle | < 200ms |
| `large-graph.golden.json` | 10K nodes, 50K edges, 5 bundles | < 2s |
| `offline-bundle.golden.tgz` | Complete offline replay pack | < 5s |
**CI integration:**
- `.gitea/workflows/hybrid-attestation.yml` runs verification fixtures
- Size gate: Graph body < 10MB, individual bundle < 1MB
- Time gate: Full verification < 5s for standard tier
### 8.5 Implementation Status
| Component | Status | Notes |
|-----------|--------|-------|
| Graph DSSE predicate | Done | `stella.ops/graph@v1` in PredicateTypes.cs |
| Edge-bundle DSSE predicate | Done | `stella.ops/edgeBundle@v1` via EdgeBundlePublisher |
| Edge-bundle models | Done | EdgeBundle.cs, EdgeBundleReason, EdgeReason enums |
| Edge-bundle CAS publisher | Done | EdgeBundlePublisher.cs with deterministic DSSE |
| Edge-bundle ingestion | Done | EdgeBundleIngestionService in Signals |
| CAS layout | Done | Per section 8.2 |
| Runtime-facts CAS storage | Done | IRuntimeFactsArtifactStore, FileSystemRuntimeFactsArtifactStore |
| CLI verify command | Planned | Per section 8.3 |
| Golden fixtures | Planned | Per section 8.4 |
| Rekor integration | Done | Via Attestor module |
| Quarantine enforcement | Done | HasQuarantinedEdges in ReachabilityFactDocument |
---
## 9. Verification Runbook
This section provides step-by-step guidance for verifying hybrid attestations in different scenarios.
### 9.1 Graph-Only Verification
Use this workflow when only graph-level attestation is required (default for most use cases).
**Prerequisites:**
- Access to CAS storage (local or remote)
- `stella` CLI installed
- Optional: Rekor instance access for transparency verification
**Steps:**
1. **Retrieve graph DSSE envelope:**
```bash
stella graph fetch --hash blake3:<graph_hash> --output ./verification/
```
2. **Verify DSSE signature:**
```bash
stella graph verify --hash blake3:<graph_hash>
# Output: ✓ Graph signature valid (key: <key_id>)
```
3. **Verify content integrity:**
```bash
stella graph verify --hash blake3:<graph_hash> --check-content
# Output: ✓ Content hash matches BLAKE3:<graph_hash>
```
4. **Verify Rekor inclusion (online):**
```bash
stella graph verify --hash blake3:<graph_hash> --rekor-proof
# Output: ✓ Rekor inclusion verified (log index: <index>)
```
5. **Verify policy hash binding:**
```bash
stella graph verify --hash blake3:<graph_hash> --policy-hash sha256:<policy_hash>
# Output: ✓ Policy hash matches graph metadata
```
### 9.2 Graph + Edge-Bundle Verification
Use this workflow when finer-grained verification of specific edges is required.
**When to use:**
- Auditing runtime-observed paths
- Investigating contested/disputed edges
- Verifying init-section or TLS callback roots
- Regulatory compliance requiring edge-level attestation
**Steps:**
1. **List available edge bundles:**
```bash
stella graph bundles --hash blake3:<graph_hash>
# Output:
# Bundle ID Reason Edges Rekor
# bundle:001 runtime-hit 42 ✓
# bundle:002 init-root 15 ✓
# bundle:003 third-party 128 -
```
2. **Verify specific bundle:**
```bash
stella graph verify --hash blake3:<graph_hash> --bundle bundle:001
# Output:
# ✓ Bundle DSSE signature valid
# ✓ All 42 edges link to graph_hash
# ✓ Rekor inclusion verified
```
3. **Verify all bundles:**
```bash
stella graph verify --hash blake3:<graph_hash> --include-bundles
# Output:
# ✓ Graph signature valid
# ✓ 3 bundles verified (185 edges total)
```
4. **Check for revoked edges:**
```bash
stella graph verify --hash blake3:<graph_hash> --check-revoked
# Output:
# ⚠ 2 edges marked revoked in bundle:002
# - edge:func_a→func_b (reason: policy-quarantine)
# - edge:func_c→func_d (reason: revoked)
```
### 9.3 Verification Decision Matrix
| Scenario | Graph DSSE | Edge Bundles | Rekor | Policy Hash |
|----------|------------|--------------|-------|-------------|
| Standard CI/CD | Required | Optional | Recommended | Required |
| Regulated audit | Required | Required | Required | Required |
| Dispute resolution | Required | Required (contested) | Required | Optional |
| Offline replay | Required | As available | Cached proof | Required |
| Dev/test | Optional | Optional | Disabled | Optional |
---
## 10. Rekor Guidance
### 10.1 Rekor Integration Overview
Rekor provides an immutable transparency log for attestation artifacts. StellaOps integrates with Rekor (or compatible mirrors) to provide verifiable timestamps and inclusion proofs.
### 10.2 What Gets Published to Rekor
| Artifact Type | Rekor Publish | Condition |
|---------------|---------------|-----------|
| Graph DSSE digest | Always | All deployment tiers (except dev/test) |
| Edge-bundle DSSE digest | Conditional | Only for `disputed`, `runtime-hit`, `security-critical` reasons |
| VEX decision DSSE digest | Always | When VEX decisions are generated |
### 10.3 Rekor Configuration
```yaml
# etc/signals.yaml
reachability:
rekor:
enabled: true
endpoint: "https://rekor.sigstore.dev" # Or private mirror
timeout: 30s
retry:
attempts: 3
backoff: exponential
edgeBundles:
maxRekorPublishes: 5 # Per graph, configurable by tier
publishReasons:
- disputed
- runtime-hit
- security-critical
```
### 10.4 Private Rekor Mirror
For air-gapped or regulated environments:
```yaml
reachability:
rekor:
enabled: true
endpoint: "https://rekor.internal.example.com"
tls:
ca: /etc/stellaops/ca.crt
clientCert: /etc/stellaops/client.crt
clientKey: /etc/stellaops/client.key
```
### 10.5 Rekor Proof Caching
Inclusion proofs are cached locally for offline verification:
```
cas://reachability/graphs/{blake3}.rekor # Graph inclusion proof
cas://reachability/edges/{graph_hash}/{bundle_id}.rekor # Bundle proof
```
**Proof format:**
```json
{
"logIndex": 12345678,
"logId": "c0d23d6ad406973f9559f3ba2d1ca01f84147d8ffc5b8445c224f98b9591801d",
"integratedTime": 1702492800,
"inclusionProof": {
"logIndex": 12345678,
"rootHash": "abc123...",
"treeSize": 50000000,
"hashes": ["def456...", "ghi789..."]
}
}
```
---
## 11. Offline Replay Steps
### 11.1 Overview
Offline replay enables full verification of reachability attestations without network access. This is essential for air-gapped deployments and regulatory compliance scenarios.
### 11.2 Creating an Offline Replay Pack
**Step 1: Export graph and bundles**
```bash
stella graph export --hash blake3:<graph_hash> \
--include-bundles \
--include-rekor-proofs \
--output ./offline-pack/
```
**Step 2: Include required artifacts**
The export creates:
```
offline-pack/
├── manifest.json # Replay manifest v2
├── graphs/
│ └── <blake3>/
│ ├── richgraph-v1.json # Graph body
│ ├── graph.dsse # DSSE envelope
│ └── graph.rekor # Inclusion proof
├── edges/
│ └── <graph_hash>/
│ ├── bundle-001.json
│ ├── bundle-001.dsse
│ └── bundle-001.rekor
├── runtime-facts/
│ └── <hash>/
│ └── runtime-facts.ndjson
└── checkpoints/
└── rekor-checkpoint.json # Transparency log checkpoint
```
**Step 3: Bundle for transfer**
```bash
stella offline pack --input ./offline-pack/ --output offline-replay.tgz
```
### 11.3 Verifying an Offline Pack
**Step 1: Extract pack**
```bash
stella offline unpack --input offline-replay.tgz --output ./verify/
```
**Step 2: Verify manifest integrity**
```bash
stella offline verify --manifest ./verify/manifest.json
# Output:
# ✓ Manifest version: 2
# ✓ Hash algorithm: blake3
# ✓ All CAS entries present
# ✓ All hashes verified
```
**Step 3: Verify attestations offline**
```bash
stella graph verify --hash blake3:<graph_hash> \
--cas-root ./verify/ \
--offline
# Output:
# ✓ Graph DSSE signature valid (offline mode)
# ✓ Rekor proof verified against checkpoint
# ✓ 3 bundles verified offline
```
### 11.4 Offline Verification Trust Model
```
┌─────────────────────────────────────────────────────────┐
│ Offline Pack │
├─────────────────────────────────────────────────────────┤
│ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐ │
│ │ Graph DSSE │ │ Edge Bundle │ │ Rekor │ │
│ │ Envelope │ │ DSSE │ │ Checkpoint │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Local Verification Engine │ │
│ │ 1. Verify DSSE signatures against trusted keys │ │
│ │ 2. Verify content hashes match DSSE payloads │ │
│ │ 3. Verify Rekor proofs against checkpoint │ │
│ │ 4. Verify policy hash binding │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
```
### 11.5 Air-Gapped Deployment Checklist
- [ ] Trusted signing keys pre-installed
- [ ] Rekor checkpoint from last sync included
- [ ] All referenced CAS artifacts bundled
- [ ] Policy hash recorded in manifest
- [ ] Analyzer manifests included for replay
- [ ] Runtime-facts artifacts included (if applicable)
---
## 12. Release Notes
### 12.1 Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-11 | Initial hybrid attestation design |
| 1.1 | 2025-12-13 | Added edge-bundle ingestion, CAS storage, verification runbook |
### 12.2 Breaking Changes
None. Hybrid attestation is additive; existing graph-only workflows remain unchanged.
### 12.3 Migration Guide
**From graph-only to hybrid:**
1. No migration required for existing graphs
2. Enable edge-bundle emission in scanner config:
```yaml
scanner:
reachability:
edgeBundles:
enabled: true
emitRuntime: true
emitContested: true
```
3. Signals automatically ingests edge bundles when present
---
## 13. Cross-References
- **Sprint:** SPRINT_0401_0001_0001_reachability_evidence_chain.md (Tasks 53-56)
- **Contracts:** docs/contracts/richgraph-v1.md, docs/contracts/edge-bundle-v1.md
- **Implementation:**
- Scanner: `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/EdgeBundle*.cs`
- Signals: `src/Signals/StellaOps.Signals/Ingestion/EdgeBundleIngestionService.cs`
- Policy: `src/Policy/StellaOps.Policy.Engine/Gates/PolicyGateEvaluator.cs`
- **Related docs:**
- docs/modules/reach-graph/guides/function-level-evidence.md
- docs/modules/reach-graph/guides/lattice.md
- docs/replay/DETERMINISTIC_REPLAY.md
- docs/ARCHITECTURE_OVERVIEW.md


@@ -0,0 +1,254 @@
# Reachability Lattice & Scoring Model
> **Status:** Implemented v0 in Signals; this document describes the current deterministic bucket model and its policy-facing implications.
> **Owners:** Scanner Guild · Signals Guild · Policy Guild.
StellaOps models reachability as a deterministic, evidence-linked outcome that can safely represent "unknown" without silently producing false safety. Signals produces a `ReachabilityFactDocument` with per-target `states[]` and a top-level `score` that is stable under replays.
---
## 1. Current model (Signals v0)
Signals scoring (`src/Signals/StellaOps.Signals/Services/ReachabilityScoringService.cs`) computes, for each `target` symbol:
- `reachable`: whether there exists a path from the selected `entryPoints[]` to `target`.
- `bucket`: a coarse classification of *why* the target is/was reachable.
- `confidence` (0..1): a bounded confidence value.
- `weight` (0..1): bucket multiplier.
- `score` (0..1): `confidence * weight`.
- `path[]`: the discovered path (if reachable), deterministically ordered.
- `evidence.runtimeHits[]`: runtime hit symbols that appear on the chosen path.
The fact-level `score` is the average of per-target scores, penalized by unknowns pressure (see §4).
---
## 2. Buckets & default weights
Bucket assignment is deterministic and uses this precedence:
1. `unreachable` — no path exists.
2. `entrypoint` — the `target` itself is an entrypoint.
3. `runtime` — at least one runtime hit overlaps the discovered path.
4. `direct` — reachable and the discovered path is length ≤ 2.
5. `unknown` — reachable but none of the above classifications apply.
Default weights (configurable via `SignalsOptions:Scoring:ReachabilityBuckets`):
| Bucket | Default weight |
|--------|----------------|
| `entrypoint` | `1.0` |
| `direct` | `0.85` |
| `runtime` | `0.45` |
| `unknown` | `0.5` |
| `unreachable` | `0.0` |
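
The precedence order can be written as a short decision function. This is a sketch, not the `ReachabilityScoringService` code. It assumes `path` is empty when no path exists, that an entrypoint target has the one-element path `[target]`, and that "length" means the number of symbols in `path[]`.

```python
# Default bucket weights from the table above.
DEFAULT_WEIGHTS = {
    "entrypoint": 1.0,
    "direct": 0.85,
    "runtime": 0.45,
    "unknown": 0.5,
    "unreachable": 0.0,
}


def assign_bucket(target, entry_points, path, runtime_hits):
    """Apply the documented precedence, first match wins."""
    if not path:
        return "unreachable"
    if target in entry_points:
        return "entrypoint"
    if set(runtime_hits) & set(path):
        return "runtime"
    if len(path) <= 2:  # assumption: length counts symbols in path[]
        return "direct"
    return "unknown"
```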
---
## 3. Confidence (reachable vs unreachable)
Default confidence values (configurable via `SignalsOptions:Scoring:*`):
| Input | Default |
|-------|---------|
| `reachableConfidence` | `0.75` |
| `unreachableConfidence` | `0.25` |
| `runtimeBonus` | `0.15` |
| `minConfidence` | `0.05` |
| `maxConfidence` | `0.99` |
Rules:
- Base confidence is `reachableConfidence` when `reachable=true`, otherwise `unreachableConfidence`.
- When `reachable=true` and runtime evidence overlaps the selected path, add `runtimeBonus` (bounded by `maxConfidence`).
- The final confidence is clamped to `[minConfidence, maxConfidence]`.
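
The three rules translate directly into a small function. The parameter names mirror the configuration keys in the table; the function itself is an illustrative sketch rather than the service implementation.

```python
def confidence(
    reachable,
    runtime_overlap,
    reachable_confidence=0.75,
    unreachable_confidence=0.25,
    runtime_bonus=0.15,
    min_confidence=0.05,
    max_confidence=0.99,
):
    """Compute the bounded confidence for one target."""
    base = reachable_confidence if reachable else unreachable_confidence
    if reachable and runtime_overlap:
        base += runtime_bonus  # the bonus applies only to reachable targets
    # Clamp to [min_confidence, max_confidence].
    return max(min_confidence, min(max_confidence, base))
```

With the defaults, a reachable target with runtime overlap gets 0.75 + 0.15 = 0.90. Multiplied by the `runtime` bucket weight of 0.45, that gives a per-target score of 0.405.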
---
## 4. Unknowns pressure (missing/ambiguous evidence)
Signals tracks unresolved symbols/edges as **Unknowns** (see `docs/modules/signals/guides/unknowns-registry.md`). The number of unknowns for a subject influences the final score:
```
unknownsPressure = unknownsCount / (targetsCount + unknownsCount)
pressurePenalty = min(unknownsPenaltyCeiling, unknownsPressure)
fact.score = avg(states[i].score) * (1 - pressurePenalty)
```
Default `unknownsPenaltyCeiling` is `0.35` (configurable).
This keeps the system deterministic while preventing unknown-heavy subjects from appearing "safe" by omission.
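
The formula can be sketched in Python as follows. The function name is illustrative, and returning `0.0` when there are no targets is an assumption, since the text does not cover that case.

```python
def fact_score(state_scores, unknowns_count, unknowns_penalty_ceiling=0.35):
    """Average the per-target scores and apply the unknowns penalty."""
    targets_count = len(state_scores)
    if targets_count == 0:
        return 0.0  # assumption: no targets means no score
    average = sum(state_scores) / targets_count
    unknowns_pressure = unknowns_count / (targets_count + unknowns_count)
    pressure_penalty = min(unknowns_penalty_ceiling, unknowns_pressure)
    return average * (1 - pressure_penalty)
```

For two targets scoring `0.9` and `0.5` with two unknowns, the pressure is `2 / 4 = 0.5`, the penalty is capped at `0.35`, and the fact score is about `0.7 * 0.65 = 0.455`.
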
---
## 5. Evidence references & determinism anchors
Signals produces stable references intended for downstream evidence chains:
- `metadata.fact.digest` — canonical digest of the reachability fact (`sha256:<hex>`).
- `metadata.fact.version` — monotonically increasing integer for the same `subjectKey`.
- Callgraph ingestion returns a deterministic `graphHash` (sha256) for the normalized callgraph.
Downstream services (Policy, UI/CLI explainers, replay tooling) should use these fields as stable evidence references.
---
## 6. Policy-facing guidance (avoid false "not affected")
Policy should treat `unreachable` (or low fact score) as **insufficient** to claim "not affected" unless:
- the reachability evidence is present and referenced (`metadata.fact.digest`), and
- confidence is above a high-confidence threshold.
When evidence is missing or confidence is low, the correct output is **under investigation** rather than "not affected".
---
## 7. Signals API pointers
- `docs/modules/signals/api/reachability-contract.md`
- `docs/modules/signals/api/samples/facts-sample.json`
---
## 8. Roadmap (tracked in Sprint 0401)
- Introduce first-class uncertainty state lists + entropy-derived `riskScore` (see `uncertainty-entropy.md`).
- Extend evidence refs to include CAS/DSSE pointers for graph-level and edge-bundle attestations.
---
## 9. Formal Lattice Model v1 (design — Sprint 0401)
The v0 bucket model provides coarse classification. The v1 lattice model introduces a formal 8-state lattice (the states listed in §9.1) with algebraic join/meet operations for monotonic, deterministic reachability analysis across evidence types.
### 9.1 State Definitions
| State | Code | Ordering | Description |
|-------|------|----------|-------------|
| `Unknown` | `U` | ⊥ (bottom) | No evidence available; default state |
| `StaticallyReachable` | `SR` | 1 | Static analysis suggests path exists |
| `StaticallyUnreachable` | `SU` | 1 | Static analysis finds no path |
| `RuntimeObserved` | `RO` | 2 | Runtime probe/hit confirms execution |
| `RuntimeUnobserved` | `RU` | 2 | Runtime probe active but no hit observed |
| `ConfirmedReachable` | `CR` | 3 | Both static + runtime agree reachable |
| `ConfirmedUnreachable` | `CU` | 3 | Both static + runtime agree unreachable |
| `Contested` | `X` | ⊤ (top) | Static and runtime evidence conflict |
### 9.2 Lattice Ordering (Hasse Diagram)
```
                    Contested (X)
                   /      |      \
                  /       |       \
    ConfirmedReachable    |    ConfirmedUnreachable
          (CR)            |            (CU)
           |   \          /           /  |
           |    \        /           /   |
           |     \      /           /    |
    RuntimeObserved             RuntimeUnobserved
          (RO)                        (RU)
           |                           |
           |                           |
    StaticallyReachable         StaticallyUnreachable
          (SR)                        (SU)
              \                      /
               \                    /
                   Unknown (U)
```
### 9.3 Join Rules (⊔ — least upper bound)
When combining evidence from multiple sources, use the join operation:
```
U ⊔ S = S (any evidence beats unknown)
SR ⊔ RO = CR (static reachable + runtime hit = confirmed)
SU ⊔ RU = CU (static unreachable + runtime miss = confirmed)
SR ⊔ RU = X (static reachable but runtime miss = contested)
SU ⊔ RO = X (static unreachable but runtime hit = contested)
CR ⊔ CU = X (conflicting confirmations = contested)
X ⊔ * = X (contested absorbs all)
```
**Full join table:**
| ⊔ | U | SR | SU | RO | RU | CR | CU | X |
|---|---|----|----|----|----|----|----|---|
| **U** | U | SR | SU | RO | RU | CR | CU | X |
| **SR** | SR | SR | X | CR | X | CR | X | X |
| **SU** | SU | X | SU | X | CU | X | CU | X |
| **RO** | RO | CR | X | RO | X | CR | X | X |
| **RU** | RU | X | CU | X | RU | X | CU | X |
| **CR** | CR | CR | X | CR | X | CR | X | X |
| **CU** | CU | X | CU | X | CU | X | CU | X |
| **X** | X | X | X | X | X | X | X | X |
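
The join table translates directly into a lookup. The sketch below is illustrative Python, not the planned `ReachabilityLattice.Join` code; the names `JOIN`, `join`, and `join_all` are assumptions.

```python
from functools import reduce

STATES = ["U", "SR", "SU", "RO", "RU", "CR", "CU", "X"]

# One row per left operand, columns in STATES order (copied from the table).
_ROWS = {
    "U":  "U  SR SU RO RU CR CU X",
    "SR": "SR SR X  CR X  CR X  X",
    "SU": "SU X  SU X  CU X  CU X",
    "RO": "RO CR X  RO X  CR X  X",
    "RU": "RU X  CU X  RU X  CU X",
    "CR": "CR CR X  CR X  CR X  X",
    "CU": "CU X  CU X  CU X  CU X",
    "X":  "X  X  X  X  X  X  X  X",
}

JOIN = {
    (left, right): result
    for left, row in _ROWS.items()
    for right, result in zip(STATES, row.split())
}


def join(a, b):
    """Least upper bound of two lattice states."""
    return JOIN[(a, b)]


def join_all(states):
    """Fold any number of states, starting from Unknown (bottom)."""
    return reduce(join, states, "U")
```

Since the table is symmetric and idempotent, the order in which evidence arrives does not change the result of `join_all`.
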
### 9.4 Meet Rules (⊓ — greatest lower bound)
Used for conservative intersection (e.g., multi-entry-point consensus):
```
U ⊓ * = U (unknown is bottom)
CR ⊓ CR = CR (agreement preserved)
X ⊓ S = S (drop contested to either side)
```
### 9.5 Monotonicity Properties
1. **Evidence accumulation is monotonic:** Once state rises in the lattice, it cannot descend without explicit revocation.
2. **Revocation resets to Unknown:** When evidence is invalidated (e.g., graph invalidation), state resets to `U`.
3. **Contested states require human triage:** `X` state triggers policy flags and UI attention.
### 9.6 Mapping v0 Buckets to v1 States
| v0 Bucket | v1 State(s) | Notes |
|-----------|-------------|-------|
| `unreachable` | `SU`, `CU` | Depends on runtime evidence availability |
| `entrypoint` | `CR` | Entry points are by definition reachable |
| `runtime` | `RO`, `CR` | Depends on static analysis agreement |
| `direct` | `SR`, `CR` | Direct paths with/without runtime confirmation |
| `unknown` | `U` | No evidence available |
### 9.7 Policy Decision Matrix
| v1 State | VEX "not_affected" | VEX "affected" | VEX "under_investigation" |
|----------|-------------------|----------------|---------------------------|
| `U` | ❌ blocked | ⚠️ needs evidence | ✅ default |
| `SR` | ❌ blocked | ✅ allowed | ✅ allowed |
| `SU` | ⚠️ low confidence | ❌ contested | ✅ allowed |
| `RO` | ❌ blocked | ✅ allowed | ✅ allowed |
| `RU` | ⚠️ medium confidence | ❌ contested | ✅ allowed |
| `CR` | ❌ blocked | ✅ required | ❌ invalid |
| `CU` | ✅ allowed | ❌ blocked | ❌ invalid |
| `X` | ❌ blocked | ❌ blocked | ✅ required |
### 9.8 Implementation Notes
- **State storage:** `ReachabilityFactDocument.states[].latticeState` field (enum)
- **Join implementation:** `ReachabilityLattice.Join(a, b)` in `src/Signals/StellaOps.Signals/Services/`
- **Backward compatibility:** v0 bucket computed from v1 state for API consumers
### 9.9 Evidence Chain Requirements
Each lattice state transition must be accompanied by evidence references:
```json
{
  "symbol": "sym:java:...",
  "latticeState": "CR",
  "previousState": "SR",
  "evidence": {
    "static": {
      "graphHash": "blake3:...",
      "pathLength": 3,
      "confidence": 0.92
    },
    "runtime": {
      "probeId": "probe:...",
      "hitCount": 47,
      "observedAt": "2025-12-13T10:00:00Z"
    }
  },
  "transitionAt": "2025-12-13T10:00:00Z"
}
```

View File

@@ -0,0 +1,78 @@
# Deterministic Reachability — Product Moat (Nov 2025)
Source: internal advisory “23-Nov-2025 - Where StellaOps Can Truly Lead”. Supersedes/extends archived binary reachability advisories (18-Nov-2025 - Binary-Reachability-Engine, Encoding Binary Reachability with PURL-Resolved Edges, CSharp-Binary-Analyzer). This page is the canonical, high-level articulation of our reachability moat for architects, PMM, and field teams. Detailed schemas live in `docs/modules/reach-graph/guides/evidence-schema.md` and `docs/modules/reach-graph/guides/hybrid-attestation.md`.
## Why it matters
- Most scanners list every CVE; reachability asks whether vulnerable code is actually callable.
- Competitors infer paths and rarely sign evidence; we **prove** paths with deterministic graphs and attestations.
- Outcome targets: ≥40% fewer noisy vulns shown; ≥25% faster triage via explainable “why” paths.
## Moat elements
1) **Deterministic call-graphs per artifact**
   - Stable node IDs: `purl@version!build-id!symbol-signature` (or code offset when stripped).
   - Stable edge IDs: `SHA256(nodeA||nodeB||tool-version||inputs-hash)`.
   - Graph hash: BLAKE3 over canonical JSON; locked by manifest.
2) **Signed evidence**
   - Graph-level DSSE for every scan (mandatory).
   - Optional edge-bundle DSSE (≤512 edges) for runtime/init/contested edges; Rekor publish capped. See `docs/modules/reach-graph/guides/hybrid-attestation.md`.
3) **Explainability**
   - Each finding carries call-chain + per-edge reason + VEX gate decision + layer attribution.
4) **Container layer provenance**
   - Track file-to-layer mapping; show “introduced in layer X from base Y”.
5) **Replayability**
   - Determinism manifest locks feeds, toolchain hashes, analyzer flags; replay yields identical graph and attestations.
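
As a rough illustration of the stable edge ID in item 1, the Python sketch below hashes the four inputs with SHA-256. The NUL separator, the UTF-8 encoding, and the `sha256:` prefix on the result are assumptions of this sketch; the text only specifies concatenation (`||`), so the real byte layout may differ.

```python
import hashlib


def edge_id(node_a, node_b, tool_version, inputs_hash):
    """Derive a stable edge ID from its four inputs.

    A NUL byte separates the fields so that ("ab", "c") and ("a", "bc")
    cannot collide; this separator is an assumption, not part of the spec.
    """
    parts = [node_a, node_b, tool_version, inputs_hash]
    payload = b"\x00".join(part.encode("utf-8") for part in parts)
    return "sha256:" + hashlib.sha256(payload).hexdigest()
```

The same four inputs always produce the same ID, which is the property the determinism manifest relies on.
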
## Minimal architecture slice
- **Sbomer/Scanner**: emit SBOM + symbol maps + per-layer file index; capture Build-IDs.
- **Cartographer**: build deterministic call-graphs (language + native), output `EdgeList.jsonl` with stable IDs.
- **Attestor**: wrap graph (and edge bundles when emitted) into DSSE; log digests to Rekor/mirror.
- **Vexer/Policy**: evaluate lattice, produce OpenVEX with linked edge proofs.
- **Ledger**: retain manifests and DSSE; mirror to Rekor where allowed.
## Practical spec (condensed)
- **Node fields**: `symbol_id`, `code_id`, `purl`, `build_id`, `symbol_digest`, `lang`, `evidence[]`.
- **Edge fields**: `from`, `to`, `kind` (direct|plt|runtime|init), `purl`, `symbol_digest`, `reason`, `confidence`, `evidence[]`.
- **Roots**: exports, entrypoints, **.init_array/.ctors/TLS callbacks**, plugin hooks.
- **Attestation layout**:
  - Graph: `cas://reachability/graphs/{blake3}` + `{blake3}.dsse` (Rekor always).
  - Edge bundle: `cas://reachability/edges/{graph_hash}/{bundle_id}[.dsse]` (Rekor optional, capped).
### Example: Edge-bundle DSSE payload (abridged)
```json
{
  "graph_hash": "blake3:...",
  "bundle_reason": "runtime-hit",
  "edges": [{
    "edge_id": "sha256:...",
    "from": "sym:...caller",
    "to": "sym:...callee",
    "reason": "plt",
    "purl": "pkg:deb/openssl@3.0.2?arch=amd64",
    "symbol_digest": "sha256:...",
    "revoked": false
  }]
}
```
### Field cheat sheet (for sprint readers)
- `graph_hash` — BLAKE3 of canonical graph JSON.
- `bundle_reason` — `runtime-hit | init-root | contested | third-party`.
- `edge_id` — sha256(from||to||reason||tool-version||inputs-hash).
- `revoked` — when true, policy/Signals must drop this edge before reachability scoring.
- `purl` + `symbol_digest` — bind edge to SBOM component and callee identity.
## Quick wins (ship order)
1) Capture Build-IDs in Scanner and thread into `symbol_id`/`code_id`.
2) Emit Graph Determinism Manifest (feeds + toolchain hashes) per scan.
3) Turn on edge-bundle DSSE for runtime/init edges first; keep Rekor cap low.
4) Surface “why path” + layer attribution in CLI/UI explainers.
## APIs (strawman)
- `POST /graph/edges:attest` — idempotent; same inputs → same edge IDs.
- `GET /findings/:id/proof` — returns call-chain + Rekor inclusion proofs.
- `GET /vex/:artifact` — streams OpenVEX with embedded proofs.
## Links
- Advisory source: `docs/product-advisories/23-Nov-2025 - Where StellaOps Can Truly Lead.md`
- Schemas: `docs/modules/reach-graph/guides/evidence-schema.md`, `docs/modules/reach-graph/guides/hybrid-attestation.md`
- Sprint tracking: `docs/implplan/SPRINT_0401_0001_0001_reachability_evidence_chain.md`

View File

@@ -0,0 +1,220 @@
# Patch-Oracles QA Pattern
Patch oracles define expected functions and edges that must be present (or absent) in generated reachability graphs. The CI pipeline uses these oracles to ensure that:
1. Critical vulnerability paths are correctly identified as reachable
2. Mitigated paths are correctly identified as unreachable
3. Graph generation remains deterministic and complete
This document covers both the **JSON-based harness** (for reachbench integration) and the **YAML-based format** (for binary patch testing).
---
## Part A: JSON Patch-Oracle Harness (v1)
The JSON-based patch-oracle harness integrates with the reachbench fixture system for CI graph validation.
### A.1 Schema Overview
Patch-oracle fixtures follow the `patch-oracle/v1` schema:
```json
{
  "schema_version": "patch-oracle/v1",
  "id": "curl-CVE-2023-38545-socks5-heap-reachable",
  "case_ref": "curl-CVE-2023-38545-socks5-heap",
  "variant": "reachable",
  "description": "Validates SOCKS5 heap overflow path is reachable",
  "expected_functions": [...],
  "expected_edges": [...],
  "expected_roots": [...],
  "forbidden_functions": [...],
  "forbidden_edges": [...],
  "min_confidence": 0.5,
  "strict_mode": false
}
```
### A.2 Expected Functions
Define functions that MUST be present in the graph:
```json
{
  "symbol_id": "sym://curl:curl.c#sink",
  "lang": "c",
  "kind": "function",
  "purl_pattern": "pkg:github/curl/*",
  "required": true,
  "reason": "Vulnerable buffer handling function"
}
```
### A.3 Expected Edges
Define edges that MUST be present in the graph:
```json
{
  "from": "sym://net:handler#read",
  "to": "sym://curl:curl.c#entry",
  "kind": "call",
  "min_confidence": 0.8,
  "required": true,
  "reason": "Data flows from network to SOCKS5 handler"
}
```
### A.4 Forbidden Elements (for unreachable variants)
```json
{
  "forbidden_functions": [
    {
      "symbol_id": "sym://dangerous#sink",
      "reason": "Should not be reachable when feature disabled"
    }
  ],
  "forbidden_edges": [
    {
      "from": "sym://entry",
      "to": "sym://sink",
      "reason": "Path should be blocked by feature flag"
    }
  ]
}
```
### A.5 Wildcard Patterns
Symbol IDs support `*` wildcards:
- `sym://test#func1` - exact match
- `sym://test#*` - matches any symbol starting with `sym://test#`
- `*` - matches anything
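
A minimal Python sketch of this matching rule, for illustration only. The function name is hypothetical, and treating `*` as valid anywhere in the pattern (not just at the end) is an assumption; the harness may be stricter.

```python
import re


def symbol_matches(pattern, symbol_id):
    """Return True when `symbol_id` matches `pattern`.

    Every `*` matches any run of characters (including none); all other
    characters are compared literally.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, symbol_id, flags=re.DOTALL) is not None
```

Escaping each literal part keeps characters such as `.` and `#` in symbol IDs from being interpreted as regular-expression syntax.
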
### A.6 Directory Structure
```
tests/reachability/fixtures/patch-oracles/
├── INDEX.json                  # Oracle index
├── schema/
│   └── patch-oracle-v1.json    # JSON Schema
└── cases/
    ├── curl-CVE-2023-38545-socks5-heap/
    │   ├── reachable.oracle.json
    │   └── unreachable.oracle.json
    └── java-log4j-CVE-2021-44228-log4shell/
        └── reachable.oracle.json
```
### A.7 Usage in Tests
```csharp
var loader = new PatchOracleLoader(fixtureRoot);
var oracle = loader.LoadOracle("curl-CVE-2023-38545-socks5-heap-reachable");
var comparer = new PatchOracleComparer(oracle);
var result = comparer.Compare(richGraph);

if (!result.Success)
{
    foreach (var violation in result.Violations)
    {
        Console.WriteLine($"[{violation.Type}] {violation.From} -> {violation.To}");
    }
}
```
### A.8 Violation Types
| Type | Description |
|------|-------------|
| `MissingFunction` | Required function not found |
| `MissingEdge` | Required edge not found |
| `MissingRoot` | Required root not found |
| `ForbiddenFunctionPresent` | Forbidden function found |
| `ForbiddenEdgePresent` | Forbidden edge found |
| `UnexpectedFunction` | Unexpected function in strict mode |
| `UnexpectedEdge` | Unexpected edge in strict mode |
---
## Part B: YAML Binary Patch-Oracles
The YAML-based format is used for paired vulnerable/fixed binary testing.
### B.1 Workflow (per CVE)
1) Pick a CVE with a small, clean fix (e.g., OpenSSL, zlib, BusyBox). Identify vulnerable commit `A` and fixed commit `B`.
2) Build two stripped binaries (`vuln`, `fixed`) with identical toolchains/flags; keep a tiny harness that exercises the affected path.
3) Run Scanner binary analyzers to emit `richgraph-v1` for each binary.
4) Diff graphs: expect new/removed functions and edges to match the patch (e.g., `foo_parse -> validate_len` added; `foo_parse -> memcpy` removed).
5) Fail the test if expected functions/edges are absent or unchanged.
### B.2 Oracle manifest (YAML)
```yaml
cve: CVE-YYYY-XXXX
target: libfoo 1.2.3
build:
  cc: clang
  cflags: [-O2, -fno-omit-frame-pointer]
  ldflags: []
  strip: true
expect:
  functions_added: [validate_len]
  functions_removed: [unsafe_copy]
  edges_added:
    - { caller: foo_parse, callee: validate_len }
  edges_removed:
    - { caller: foo_parse, callee: memcpy }
tolerances:
  allow_unresolved_symbols: 0
  allow_extra_funcs: 2
```
Place manifests under `tests/reachability/patch-oracles/<cve>/oracle.yml` next to the sources/build scripts.
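
The comparison between the two graphs and the manifest can be sketched as below. This is illustrative Python, not the test runner. It assumes the manifest has already been parsed into dictionaries and that each graph is reduced to a set of function names and a set of `(caller, callee)` pairs. The reading of `allow_extra_funcs` as a cap on unexpected added functions is also an assumption.

```python
def check_oracle(expect, tolerances, vuln_graph, fixed_graph):
    """Return a list of failure messages; an empty list means the oracle holds."""
    added_funcs = fixed_graph["functions"] - vuln_graph["functions"]
    removed_funcs = vuln_graph["functions"] - fixed_graph["functions"]
    added_edges = fixed_graph["edges"] - vuln_graph["edges"]
    removed_edges = vuln_graph["edges"] - fixed_graph["edges"]

    failures = []
    for name in expect.get("functions_added", []):
        if name not in added_funcs:
            failures.append(f"expected function {name} added")
    for name in expect.get("functions_removed", []):
        if name not in removed_funcs:
            failures.append(f"expected function {name} removed")
    for edge in expect.get("edges_added", []):
        if (edge["caller"], edge["callee"]) not in added_edges:
            failures.append(
                f"expected edge {edge['caller']}->{edge['callee']} missing")
    for edge in expect.get("edges_removed", []):
        if (edge["caller"], edge["callee"]) not in removed_edges:
            failures.append(
                f"expected edge {edge['caller']}->{edge['callee']} still present")

    # Assumed meaning: cap on added functions the oracle did not list.
    extra = added_funcs - set(expect.get("functions_added", []))
    if len(extra) > tolerances.get("allow_extra_funcs", 0):
        failures.append(f"too many unexpected functions: {sorted(extra)}")
    return failures
```

Comparing the fixed graph against itself produces one failure per expectation, which is the "unchanged graph" case the workflow is meant to catch.
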
### B.3 Repository layout
```
tests/reachability/patch-oracles/
  CVE-YYYY-XXXX-foo/
    src/          # vuln + fixed sources + harness
    build.sh      # produces ./out/vuln ./out/fixed
    oracle.yml
```
### B.4 Harness rules
- Output binaries to `out/vuln` and `out/fixed` with deterministic flags and stripped symbols.
- Record toolchain version in a sidecar `build-meta.json` so Replay captures provenance.
- Never download from the internet during CI; vendor tiny sources into the fixture folder.
### B.5 Test runner expectations
- Runs Scanner binary analyzers on both binaries; emits `richgraph-v1` CAS entries.
- Compares graphs against `oracle.yml` expectations (functions/edges added/removed, tolerances).
- Fails when deltas are missing; succeeds when expected guards/edges are present.
### B.6 Integration points
- **Scanner**: add fixture runner under `tests/reachability/StellaOps.Scanner.Binary.PatchOracleTests`.
- **CI**: wire into reachbench/patch-oracles job; ensure artifacts are small and deterministic.
- **Docs**: link this file from reachability delivery guide once tests are live.
### B.7 Acceptance criteria
- At least three seed oracles (e.g., zlib overflow, OpenSSL length guard, BusyBox ash fix) committed with passing expectations.
- CI job proves deterministic hashes across reruns.
- Failures emit clear diffs (`expected edge foo->validate_len missing`).
---
## Related Documentation
- [Reachability Evidence Chain](./function-level-evidence.md)
- [RichGraph Schema](../contracts/richgraph-v1.md)
- [Ground Truth Schema](./ground-truth-schema.md)
- [Lattice States](./lattice.md)
- [Reachability Delivery Guide](./DELIVERY_GUIDE.md)

View File

@@ -0,0 +1,269 @@
# Reachability Evidence Policy Gates
> **Status:** Design v1 (Sprint 0401)
> **Owners:** Policy Guild, Signals Guild, VEX Guild
This document defines the policy gates that enforce reachability evidence requirements for VEX decisions. Gates prevent unsafe "not_affected" claims when evidence is insufficient.
---
## 1. Overview
Policy gates act as checkpoints between evidence (reachability lattice state, uncertainty tier) and VEX status transitions. They ensure that:
1. **No false safety:** "not_affected" requires strong evidence of unreachability
2. **Explicit uncertainty:** Missing evidence triggers "under_investigation" rather than silence
3. **Audit trail:** All gate decisions are logged with evidence references
---
## 2. Gate Types
### 2.1 Lattice State Gate
Guards VEX status transitions based on the v1 lattice state (see `docs/modules/reach-graph/guides/lattice.md` §9).
| Requested VEX Status | Required Lattice State | Gate Action |
|---------------------|------------------------|-------------|
| `not_affected` | `CU` (ConfirmedUnreachable) | ✅ Allow |
| `not_affected` | `SU` (StaticallyUnreachable) | ⚠️ Allow with warning, requires `justification` |
| `not_affected` | `RU` (RuntimeUnobserved) | ⚠️ Allow with warning, requires `justification` |
| `not_affected` | `U`, `SR`, `RO`, `CR`, `X` | ❌ Block |
| `affected` | `CR` (ConfirmedReachable) | ✅ Allow |
| `affected` | `SR`, `RO` | ✅ Allow |
| `affected` | `U`, `SU`, `RU`, `CU`, `X` | ⚠️ Warn (potential false positive) |
| `under_investigation` | Any | ✅ Allow (safe default) |
| `fixed` | Any | ✅ Allow (remediation action) |
### 2.2 Uncertainty Tier Gate
Guards VEX status transitions based on the uncertainty tier (see `uncertainty-entropy.md` §1.1).
| Requested VEX Status | Uncertainty Tier | Gate Action |
|---------------------|------------------|-------------|
| `not_affected` | T1 (High) | ❌ Block |
| `not_affected` | T2 (Medium) | ⚠️ Warn, require explicit override |
| `not_affected` | T3 (Low) | ⚠️ Allow with advisory note |
| `not_affected` | T4 (Negligible) | ✅ Allow |
| `affected` | T1 (High) | ⚠️ Review required (may be false positive) |
| `affected` | T2-T4 | ✅ Allow |
### 2.3 Evidence Completeness Gate
Guards based on the presence of required evidence artifacts.
| VEX Status | Required Evidence | Gate Action if Missing |
|------------|-------------------|----------------------|
| `not_affected` | `graphHash` (DSSE-attested) | ❌ Block |
| `not_affected` | `pathAnalysis.pathLength >= 0` | ❌ Block |
| `not_affected` | `confidence >= 0.8` | ⚠️ Warn if < 0.8 |
| `affected` | `graphHash` OR `runtimeProbe` | ⚠️ Warn if neither |
| `under_investigation` | None required | ✅ Allow |
---
## 3. Gate Evaluation Order
Gates are evaluated in this order; first blocking gate stops evaluation:
```
1. Evidence Completeness Gate → Block if required evidence missing
2. Lattice State Gate → Block if state incompatible with status
3. Uncertainty Tier Gate → Block/warn based on tier
4. Confidence Threshold Gate → Warn if confidence below threshold
```
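
The evaluation order can be sketched as a single function that returns at the first blocking gate. This is illustrative Python, not the Policy Gateway code. The evidence keys, the return shape, and the choice to apply the confidence warning in step 4 rather than inside the completeness gate are assumptions.

```python
def evaluate_gates(status, evidence, min_confidence=0.8):
    """Run the gates in documented order.

    Returns (decision, blocked_by, warnings). `evidence` is a plain dict
    with hypothetical keys: latticeState, uncertaintyTier, graphHash,
    runtimeProbe, pathLength, confidence.
    """
    warnings = []
    state = evidence.get("latticeState", "U")
    tier = evidence.get("uncertaintyTier")

    # 1. Evidence completeness
    if status == "not_affected":
        path_length = evidence.get("pathLength")
        if not evidence.get("graphHash") or path_length is None or path_length < 0:
            return ("block", "EvidenceCompleteness", warnings)
    elif status == "affected":
        if not (evidence.get("graphHash") or evidence.get("runtimeProbe")):
            warnings.append("EvidenceCompleteness: no graphHash or runtimeProbe")

    # 2. Lattice state
    if status == "not_affected":
        if state in ("SU", "RU"):
            warnings.append(f"LatticeState: {state} requires justification")
        elif state != "CU":
            return ("block", "LatticeState", warnings)
    elif status == "affected" and state not in ("CR", "SR", "RO"):
        warnings.append(f"LatticeState: {state} may be a false positive")

    # 3. Uncertainty tier
    if status == "not_affected":
        if tier == "T1":
            return ("block", "UncertaintyTier", warnings)
        if tier == "T2":
            warnings.append("UncertaintyTier: T2 requires explicit override")
        elif tier == "T3":
            warnings.append("UncertaintyTier: T3 advisory note")
    elif status == "affected" and tier == "T1":
        warnings.append("UncertaintyTier: T1 review required")

    # 4. Confidence threshold (warn only)
    if status == "not_affected" and evidence.get("confidence", 0.0) < min_confidence:
        warnings.append("Confidence: below threshold")

    return ("allow", None, warnings)
```

`under_investigation` and `fixed` fall through every branch and are always allowed, matching the "safe default" rows in the gate tables.
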
---
## 4. Gate Decision Document
Each gate evaluation produces a decision document:
```json
{
  "gateId": "gate:vex:not_affected:2025-12-13T10:00:00Z",
  "requestedStatus": "not_affected",
  "subject": {
    "vulnId": "CVE-2025-12345",
    "purl": "pkg:maven/com.example/foo@1.0.0",
    "symbolId": "sym:java:..."
  },
  "evidence": {
    "latticeState": "CU",
    "uncertaintyTier": "T3",
    "graphHash": "blake3:...",
    "riskScore": 0.25,
    "confidence": 0.92
  },
  "gates": [
    {
      "name": "EvidenceCompleteness",
      "result": "pass",
      "reason": "graphHash present"
    },
    {
      "name": "LatticeState",
      "result": "pass",
      "reason": "CU allows not_affected"
    },
    {
      "name": "UncertaintyTier",
      "result": "pass_with_note",
      "reason": "T3 allows with advisory note",
      "note": "MissingPurl uncertainty at 35% entropy"
    }
  ],
  "decision": "allow",
  "advisory": "VEX status allowed with note: T3 uncertainty from MissingPurl",
  "decidedAt": "2025-12-13T10:00:00Z"
}
```
---
## 5. Contested State Handling
When lattice state is `X` (Contested):
1. **Block all definitive statuses:** Neither "not_affected" nor "affected" allowed
2. **Force "under_investigation":** Auto-assign until triage resolves conflict
3. **Emit triage event:** Notify VEX operators of conflict with evidence links
4. **Evidence overlay:** Show both static and runtime evidence for manual review
### Contested Resolution Workflow
```
1. System detects X state
2. VEX status locked to "under_investigation"
3. Triage event emitted to operator queue
4. Operator reviews:
   a. Static evidence (graph, paths)
   b. Runtime evidence (probes, hits)
5. Operator provides resolution:
   a. Trust static → state becomes SU/SR
   b. Trust runtime → state becomes RU/RO
   c. Add new evidence → recompute lattice
6. Gate re-evaluates with new state
```
---
## 6. Override Mechanism
Operators with `vex:gate:override` permission can bypass gates with mandatory fields:
```json
{
  "override": {
    "gateId": "gate:vex:not_affected:...",
    "operator": "user:alice@example.com",
    "justification": "Manual review confirms code path is dead code",
    "evidence": {
      "type": "ManualReview",
      "reviewId": "review:2025-12-13:001",
      "attachments": ["cas://evidence/review/..."]
    },
    "approvedAt": "2025-12-13T11:00:00Z",
    "expiresAt": "2026-01-13T11:00:00Z"
  }
}
```
Override requirements:
- `justification` is mandatory and logged
- Overrides expire after configurable period (default: 30 days)
- All overrides are auditable and appear in compliance reports
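
A minimal Python sketch of the justification and expiry checks, for illustration. It takes the inner `override` object from the JSON above; the function name and the fallback to `approvedAt` plus the default period when `expiresAt` is absent are assumptions.

```python
from datetime import datetime, timedelta, timezone


def _parse_utc(timestamp):
    """Parse an ISO-8601 timestamp that uses the trailing 'Z' for UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def override_is_active(override, now=None, default_expiration_days=30):
    """Return True when the override is justified and not yet expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    # A missing or blank justification invalidates the override.
    if not override.get("justification", "").strip():
        return False
    approved_at = _parse_utc(override["approvedAt"])
    if "expiresAt" in override:
        expires_at = _parse_utc(override["expiresAt"])
    else:
        expires_at = approved_at + timedelta(days=default_expiration_days)
    return approved_at <= now < expires_at
```

Passing `now` explicitly keeps the check deterministic in tests and replay.
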
---
## 7. Configuration
Gate thresholds are configurable via `PolicyGatewayOptions`:
```yaml
PolicyGateway:
  Gates:
    LatticeState:
      AllowSUForNotAffected: true   # Allow SU with warning
      AllowRUForNotAffected: true   # Allow RU with warning
      RequireJustificationForWeakStates: true
    UncertaintyTier:
      BlockT1ForNotAffected: true
      WarnT2ForNotAffected: true
    EvidenceCompleteness:
      RequireGraphHashForNotAffected: true
      MinConfidenceForNotAffected: 0.8
      MinConfidenceWarning: 0.6
  Override:
    DefaultExpirationDays: 30
    RequireJustification: true
```
---
## 8. API Integration
### POST `/api/v1/vex/status`
Request:
```json
{
  "vulnId": "CVE-2025-12345",
  "purl": "pkg:maven/com.example/foo@1.0.0",
  "status": "not_affected",
  "justification": "vulnerable_code_not_present",
  "reachabilityEvidence": {
    "factDigest": "sha256:...",
    "graphHash": "blake3:..."
  }
}
```
Response (gate blocked):
```json
{
  "success": false,
  "gateDecision": {
    "decision": "block",
    "blockedBy": "LatticeState",
    "reason": "Lattice state SR (StaticallyReachable) incompatible with not_affected",
    "currentState": "SR",
    "requiredStates": ["CU", "SU", "RU"],
    "suggestion": "Submit runtime probe evidence or change to under_investigation"
  }
}
```
---
## 9. Metrics & Alerts
The policy gateway emits metrics:
| Metric | Labels | Description |
|--------|--------|-------------|
| `stellaops_gate_decisions_total` | `gate`, `result`, `status` | Total gate decisions |
| `stellaops_gate_blocks_total` | `gate`, `reason` | Total blocked requests |
| `stellaops_gate_overrides_total` | `operator` | Total override uses |
| `stellaops_contested_states_total` | `vulnId` | Active contested states |
Alert conditions:
- `stellaops_gate_overrides_total` rate > threshold → Audit review
- `stellaops_contested_states_total` > 10 → Triage backlog alert
---
## 10. Related Documents
- [Lattice Model](./lattice.md) — v1 formal 8-state lattice
- [Uncertainty States](uncertainty-entropy.md) — Tier definitions and risk scoring
- [Evidence Schema](./evidence-schema.md) — richgraph-v1 schema
- [VEX Contract](../contracts/vex-v1.md) — VEX document schema
---
## Changelog
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Policy Guild | Initial design from Sprint 0401 |

View File

@@ -0,0 +1,51 @@
# PURL-Resolved Callgraph Edges (Nov 2025)
This note captures the required behavior for joining binary callgraphs with SBOM components using **purl + symbol digest** annotations. It replaces any pointer to prior advisories; everything needed to ship the feature is here.
## 1. Goal
Annotate every call edge in `richgraph-v1` with:
- `purl` of the component that defines the callee, and
- a stable `symbol_digest` (hash of normalized signature plus optional instruction fingerprint).
This lets graphs from multiple binaries merge naturally and line up with SBOM entries, so reachability answers “is the vulnerable function reachable in my deployment?” without re-identifying components.
## 2. Data model additions
- **Node**: `SymbolNode` gains `purl` and `symbol_digest` fields (sha256 of normalized signature; include demangled name and parameter types; optionally append block hash for stripped code).
- **Edge**: `CallEdge` gains `purl` (callee owner) and `symbol_digest`; keep existing `kind`/`evidence` fields. When callee resolution is ambiguous, include `candidates[]` with ranked purls and set `confidence` accordingly.
- **Provenance**: store analyzer fingerprint (`analyzer`, `version`, `toolchain_digest`) and graph hash in CAS metadata.
## 3. Producer rules
1) **Map callee → file → SBOM component**. Use import tables (ELF DT_NEEDED + reloc, PE IAT, Mach-O stubs) or resolved path. If multiple candidates, emit `candidates[]` and lower confidence.
2) **Compute symbol digest**. Normalize the signature, demangle if possible, lowercase type names, strip addresses, then sha256 the canonical form. For stripped symbols, combine synthetic name and code block hash.
3) **Attach to edges**. For every `call` edge, set `purl` and `symbol_digest`. If callee is external but unresolved, emit `purl:"pkg:unknown"` and also write an Unknowns entry (see signals unknowns registry).
4) **Determinism**. Sort nodes and edges before hashing; keep evidence arrays sorted (`import`, `reloc`, `disasm`, `runtime`). Graph hash uses BLAKE3 over canonical JSON.
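
Rule 2 can be sketched in Python as below. This is an illustration under stated assumptions, not the analyzer's normalizer: the function name, the canonical `name(type,type)` layout, and the `sha256:` prefix are hypothetical, and demangling is assumed to have happened before the call.

```python
import hashlib
import re

_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")
_WHITESPACE = re.compile(r"\s+")


def symbol_digest(demangled_name, param_types):
    """Hash a normalized signature.

    Addresses are stripped from the name, whitespace is collapsed, and
    type names are lowercased before hashing the canonical form.
    """
    name = _WHITESPACE.sub(" ", _ADDRESS.sub("", demangled_name)).strip()
    types = [_WHITESPACE.sub(" ", t).strip().lower() for t in param_types]
    canonical = f"{name}({','.join(types)})"
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two analyzers that disagree only on spacing, type-name casing, or embedded addresses then produce the same digest for the same function.
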
## 4. Consumer rules
- **Signals**: merge edges from many binaries by `(purl, symbol_digest)`; keep multiple `site` entries. Store in `call_edges` with `purl` as the join key for SBOM overlays.
- **Policy/VEX**: treat `reachable` if any entrypoint path hits a `symbol_digest` that matches an affected function for the CVE purl.
- **UI/CLI**: display `purl@version` plus demangled name; show site offsets for debugging; show confidence when candidates were present.
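
The Signals merge rule can be illustrated with the Python sketch below. The edge dictionaries, the `site` field as a plain string, and the output shape are assumptions; the real `call_edges` documents may carry more fields.

```python
def merge_call_edges(edges):
    """Merge edges from many binaries by (purl, symbol_digest).

    Each merged entry keeps every distinct call site, and both the
    entries and their sites are sorted so the output is deterministic.
    """
    merged = {}
    for edge in edges:
        key = (edge["purl"], edge["symbol_digest"])
        sites = merged.setdefault(key, set())
        if edge.get("site") is not None:
            sites.add(edge["site"])
    return [
        {"purl": purl, "symbol_digest": digest, "sites": sorted(sites)}
        for (purl, digest), sites in sorted(merged.items(), key=lambda item: item[0])
    ]
```

Sorting by the join key means the merged list does not depend on the order in which binaries were ingested.
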
## 5. SBOM join strategy
1) Use `purl` from component resolver; if absent, fall back to `build_id` plus hash match and emit `purl:"pkg:unknown"`.
2) When multiple SBOM components share a purl, keep all matches but prefer those whose file hash equals the binary under analysis.
3) For runtime traces, attach the same `symbol_digest` so runtime hits boost confidence on the correct edge.
## 6. Acceptance tests
- Imports-only: edge from binary main to `pkg:deb/ubuntu/openssl@3.0.2` `symbol_digest=sha256:...` must appear without running disassembly.
- Disassembly: direct `call` to internal function carries `purl` of the hosting binary's SBOM entry.
- Ambiguity: when two candidate purls exist, graph stores `candidates[2]` and `confidence < 1`.
- Graph hash stability: reordering analyzer flags does not change BLAKE3 hash.
## 7. Deliverables
- Update `richgraph-v1` schema and DTOs (Scanner + Signals).
- Persist `purl`/`symbol_digest` in Mongo `call_edges` and CAS manifests.
- CLI: extend `stella reachability upload-callgraph` and `stella graph explain` to surface `purl` plus digest.
- Docs: reference this file from Scanner, Signals, and Reachability guides once implemented.

View File

@@ -0,0 +1,48 @@
# Reachability · Runtime + Static Union (v0.1)
## What this covers
- End-to-end flow for combining static callgraphs (Scanner) and runtime traces (Zastava) into replayable reachability bundles.
- Storage layout (CAS namespaces), manifest fields, and Signals APIs that consume/emit reachability facts.
- How unknowns/pressure and scoring are derived so Policy/UI can explain outcomes.
## Pipeline (at a glance)
1. **Scanner** emits language-specific callgraphs as `richgraph-v1` and packs them into CAS under `reachability_graphs/<digest>.tar.zst` with manifest `meta.json`.
2. **Zastava Observer** streams NDJSON runtime facts (`symbol_id`, `code_id`, `hit_count`, `loader_base`, `cas_uri`) to Signals `POST /signals/runtime-facts` or `/runtime-facts/ndjson`.
3. **Union bundles** (runtime + static) are uploaded as ZIP to `POST /signals/reachability/union` with optional `X-Analysis-Id`; Signals stores under `reachability_graphs/{analysisId}/`.
4. **Signals scoring** consumes union data + runtime facts, computes per-target states (bucket, weight, confidence, score), fact-level score, unknowns pressure, and publishes `signals.fact.updated@v1` events.
5. **Replay** records provenance: reachability section in replay manifest lists CAS URIs (graphs + runtime traces), namespaces, analyzer/version, callgraphIds, and the shared `analysisId`.
## Storage & CAS namespaces
- Static graphs: `cas://reachability_graphs/<hh>/<sha>.tar.zst` (meta.json + graph files).
- Runtime traces: `cas://runtime_traces/<hh>/<sha>.tar.zst` (NDJSON or zipped stream).
- Replay manifest now includes `analysisId` to correlate graphs/traces; each reference also carries `namespace` and `callgraphId` (static) for unambiguous replay.
## Signals API quick reference
- `POST /signals/runtime-facts` — structured request body; recomputes reachability.
- `POST /signals/runtime-facts/ndjson` — streaming NDJSON/gzip; requires `callgraphId` header params.
- `POST /signals/reachability/union` — upload ZIP bundle; optional `X-Analysis-Id`.
- `GET /signals/reachability/union/{analysisId}/meta` — returns meta.json.
- `GET /signals/reachability/union/{analysisId}/files/{fileName}` — download bundled graph/trace files.
- `GET /signals/facts/{subjectKey}` — fetch latest reachability fact (includes unknowns counters and targets).
## Scoring and unknowns
- Buckets (default weights): entrypoint 1.0, direct 0.85, runtime 0.45, unknown 0.5, unreachable 0.0.
- Confidence: reachable vs unreachable base, runtime bonus, clamped between Min/Max (defaults 0.05–0.99).
- Unknowns: Signals counts unresolved symbols/edges per subject; `UnknownsPressure = unknowns / (states + unknowns)`. The fact score is multiplied by `1 - min(UnknownsPenaltyCeiling, UnknownsPressure)`, so the penalty is capped at `UnknownsPenaltyCeiling` (default 0.35).
- Events: `signals.fact.updated@v1` now emits `unknownsCount` and `unknownsPressure` plus bucket/weight/stateCount/targets.
## Replay contract changes (v0.1 add-ons)
- `reachability.analysisId` (string, optional) — ties to Signals union ingest.
- Graph refs include `namespace`, `callgraphId`, analyzer, version, sha256, casUri.
- Runtime trace refs include `namespace`, recordedAt, sha256, casUri.
## Operator checklist
- Use deterministic CAS paths; never embed absolute file paths.
- When emitting runtime NDJSON, include `loader_base` and `code_id` when available for de-dup.
- Ensure `analysisId` is propagated from Scanner/Zastava into Signals ingest to keep replay manifests linked.
- Keep feeds frozen for reproducibility; avoid external downloads in union preparation.
## References
- Schema: `docs/modules/reach-graph/schemas/runtime-static-union-schema.md`
- Delivery guide: `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`
- Unknowns registry & scoring: Signals code (`ReachabilityScoringService`, `UnknownsIngestionService`) and events doc `docs/modules/signals/guides/events-24-005.md`.


@@ -0,0 +1,332 @@
# Replay Verification
_Last updated: 2025-12-22. Owner: Scanner Guild._
This document describes the **replay verification** workflow that ensures reachability slices are reproducible and tamper-evident.
---
## 1. Overview
Replay verification answers: *"Given the same inputs, do we get the exact same slice?"*
This is critical for:
- **Audit trails**: Prove analysis results are genuine
- **Tamper detection**: Detect modified inputs or results
- **Debugging**: Identify sources of non-determinism
- **Compliance**: Demonstrate reproducible security analysis
---
## 2. Replay Workflow
```
┌─────────────────┐     ┌──────────────────┐     ┌───────────────────┐
│ Original        │     │ Rehydrate        │     │ Recompute         │
│ Slice           │────►│ Inputs           │────►│ Slice             │
│ (with digest)   │     │ from CAS         │     │ (fresh)           │
└─────────────────┘     └──────────────────┘     └───────────────────┘
                                                           │
                                                           ▼
                                                 ┌───────────────────┐
                                                 │ Compare           │
                                                 │ byte-for-byte     │
                                                 └───────────────────┘
                                                           │
                                             ┌─────────────┴─────────────┐
                                             ▼                           ▼
                                        ┌──────────┐                ┌──────────┐
                                        │  MATCH   │                │ MISMATCH │
                                        │    ✓     │                │  + diff  │
                                        └──────────┘                └──────────┘
```
---
## 3. API Reference
### 3.1 Replay Endpoint
```http
POST /api/slices/replay
Content-Type: application/json
{
"sliceDigest": "blake3:a1b2c3d4..."
}
```
### 3.2 Response Format
**Match Response (200 OK)**:
```json
{
"match": true,
"originalDigest": "blake3:a1b2c3d4...",
"recomputedDigest": "blake3:a1b2c3d4...",
"replayedAt": "2025-12-22T10:00:00Z",
"inputsVerified": true
}
```
**Mismatch Response (200 OK)**:
```json
{
"match": false,
"originalDigest": "blake3:a1b2c3d4...",
"recomputedDigest": "blake3:e5f6a7b8...",
"replayedAt": "2025-12-22T10:00:00Z",
"diff": {
"missingNodes": ["node:5"],
"extraNodes": ["node:6"],
"missingEdges": [{"from": "node:1", "to": "node:5"}],
"extraEdges": [{"from": "node:1", "to": "node:6"}],
"verdictDiff": {
"original": "unreachable",
"recomputed": "reachable"
},
"confidenceDiff": {
"original": 0.95,
"recomputed": 0.72
}
},
"possibleCauses": [
"Input graph may have been modified",
"Analyzer version mismatch: 1.2.0 vs 1.2.1",
"Feed version changed: nvd-2025-12-20 vs nvd-2025-12-22"
]
}
```
**Error Response (404 Not Found)**:
```json
{
"error": "slice_not_found",
"message": "Slice with digest blake3:a1b2c3d4... not found in CAS",
"sliceDigest": "blake3:a1b2c3d4..."
}
```
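Because both a match and a mismatch come back as `200 OK`, a caller has to branch on the `match` field, not on the status code. The sketch below builds the request and summarizes a response body; the helper names and the base URL are hypothetical, and the request is only constructed, never sent.

```python
import json
import urllib.request


def build_replay_request(base_url: str, slice_digest: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/slices/replay request."""
    body = json.dumps({"sliceDigest": slice_digest}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/slices/replay",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_replay(result: dict) -> str:
    """Turn a match or mismatch response body into a one-line summary."""
    if result.get("match"):
        return f"match: {result['recomputedDigest']}"
    diff = result.get("diff", {})
    missing = len(diff.get("missingNodes", []))
    extra = len(diff.get("extraNodes", []))
    return f"mismatch: {missing} missing node(s), {extra} extra node(s)"
```

A `404` with `slice_not_found` would surface as an HTTP error when the request is actually sent, so it needs separate handling.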
---
## 4. Input Rehydration
All inputs must be CAS-addressed for replay:
### 4.1 Required Inputs
| Input | CAS Key | Description |
|-------|---------|-------------|
| Graph | `cas://graphs/{digest}` | Full RichGraph JSON |
| Binaries | `cas://binaries/{digest}` | Binary file hashes |
| SBOM | `cas://sboms/{digest}` | CycloneDX/SPDX document |
| Policy | `cas://policies/{digest}` | Policy DSL |
| Feeds | `cas://feeds/{version}` | Advisory feed snapshot |
### 4.2 Manifest Contents
```json
{
"manifest": {
"analyzerVersion": "scanner.native:1.2.0",
"rulesetHash": "sha256:abc123...",
"feedVersions": {
"nvd": "2025-12-20",
"osv": "2025-12-20",
"ghsa": "2025-12-20"
},
"createdAt": "2025-12-22T10:00:00Z",
"toolchain": "iced-x86:1.21.0",
"environment": {
"os": "linux",
"arch": "x86_64"
}
}
}
```
---
## 5. Determinism Requirements
For byte-for-byte reproducibility:
### 5.1 JSON Canonicalization
```
1. Keys sorted alphabetically at all levels
2. No whitespace (compact JSON)
3. UTF-8 encoding
4. Lowercase hex for all hashes
5. Numbers: no trailing zeros, scientific notation for large values (confidence values use the fixed precision in 5.4)
```
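Rules 1 to 3 map directly onto options of Python's standard `json` module. The helper below is a sketch with a hypothetical name; it does not enforce rules 4 and 5 (hash casing and number formatting), which depend on how the payload is built.

```python
import json


def canonical_json(value) -> bytes:
    """Compact, key-sorted, UTF-8 JSON (rules 1 to 3 above)."""
    text = json.dumps(
        value,
        sort_keys=True,          # rule 1: keys sorted at every nesting level
        separators=(",", ":"),   # rule 2: no insignificant whitespace
        ensure_ascii=False,      # rule 3: emit real UTF-8 instead of \u escapes
    )
    return text.encode("utf-8")
```

Hashing the returned bytes, not a re-serialized string, keeps the digest independent of the order in which keys were inserted.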
### 5.2 Graph Ordering
```
Nodes: sorted by symbolId (lexicographic)
Edges: sorted by (from, to) tuple (lexicographic)
Paths: sorted by first node, then path length
```
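The ordering rules can be expressed as three sort keys. The sketch assumes nodes and edges are plain dicts with `symbolId`, `from`, and `to` fields and that each path is a non-empty list of node ids; those shapes are illustrative, not taken from the slice schema.

```python
def order_graph(nodes, edges, paths):
    """Return nodes, edges and paths in the deterministic order described above."""
    ordered_nodes = sorted(nodes, key=lambda n: n["symbolId"])
    ordered_edges = sorted(edges, key=lambda e: (e["from"], e["to"]))
    # Paths are lists of node ids: first node first, then shorter paths first.
    ordered_paths = sorted(paths, key=lambda p: (p[0], len(p)))
    return ordered_nodes, ordered_edges, ordered_paths
```

Two paths with the same first node and length keep their input order here, so a full implementation would probably need a further tie-breaker.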
### 5.3 Timestamp Handling
```
All timestamps: UTC, ISO-8601, with 'Z' suffix
Example: "2025-12-22T10:00:00Z"
No milliseconds unless significant
```
### 5.4 Floating Point
```
Confidence values: round to 6 decimal places
Example: 0.950000, not 0.95 or 0.9500001
```
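The timestamp and floating-point rules in 5.3 and 5.4 reduce to two small formatters. The function names are hypothetical, and the timestamp helper assumes sub-second precision is never significant and always drops it.

```python
from datetime import datetime, timezone


def format_timestamp(moment: datetime) -> str:
    """UTC, ISO-8601, 'Z' suffix, whole seconds (sub-second precision dropped)."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def format_confidence(value: float) -> str:
    """Fixed six decimal places, e.g. 0.95 becomes '0.950000'."""
    return f"{value:.6f}"
```

Serializing confidence as a fixed-width string avoids differences in how runtimes print floats.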
---
## 6. Diff Computation
When slices don't match:
### 6.1 Diff Algorithm
```python
def compute_diff(original, recomputed):
    """Return the node, edge and verdict differences between two slices."""
    diff = SliceDiff()  # result container, assumed to be defined elsewhere

    # Node diff (sorted so the diff itself is deterministic)
    orig_nodes = {n.id for n in original.subgraph.nodes}
    new_nodes = {n.id for n in recomputed.subgraph.nodes}
    diff.missing_nodes = sorted(orig_nodes - new_nodes)
    diff.extra_nodes = sorted(new_nodes - orig_nodes)

    # Edge diff. `from` is a reserved word in Python, so read it with getattr.
    def edge_key(e):
        return (getattr(e, "from"), e.to)

    orig_edges = {edge_key(e) for e in original.subgraph.edges}
    new_edges = {edge_key(e) for e in recomputed.subgraph.edges}
    diff.missing_edges = sorted(orig_edges - new_edges)
    diff.extra_edges = sorted(new_edges - orig_edges)

    # Verdict diff
    if original.verdict.status != recomputed.verdict.status:
        diff.verdict_diff = {
            "original": original.verdict.status,
            "recomputed": recomputed.verdict.status,
        }
    return diff
```
### 6.2 Cause Analysis
```python
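def analyze_causes(original, recomputed, manifest):
    """List likely reasons for a mismatch by comparing pinned and current inputs."""
    # current_version, current_feed_versions and fetch_graph_digest are
    # environment lookups assumed to be provided elsewhere.
    causes = []
    if manifest.analyzerVersion != current_version():
        causes.append(
            f"Analyzer version mismatch: {manifest.analyzerVersion} vs {current_version()}"
        )
    if manifest.feedVersions != current_feed_versions():
        causes.append("Feed version changed")
    if original.inputs.graphDigest != fetch_graph_digest():
        causes.append("Input graph may have been modified")
    return causes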
```
---
## 7. CLI Usage
### 7.1 Replay Command
```bash
# Replay and verify a slice
stella slice replay --digest blake3:a1b2c3d4...
# Output:
# ✓ Slice verified: digest matches
# Original: blake3:a1b2c3d4...
# Recomputed: blake3:a1b2c3d4...
```
### 7.2 Verbose Mode
```bash
stella slice replay --digest blake3:a1b2c3d4... --verbose
# Output:
# Fetching slice from CAS...
# Rehydrating inputs:
# - Graph: cas://graphs/blake3:xyz... ✓
# - SBOM: cas://sboms/sha256:abc... ✓
# - Policy: cas://policies/sha256:def... ✓
# Recomputing slice...
# Comparing results...
# ✓ Match confirmed
```
### 7.3 Mismatch Handling
```bash
stella slice replay --digest blake3:a1b2c3d4...
# Output:
# ✗ Slice mismatch detected!
#
# Differences:
# Nodes: 1 missing, 0 extra
# Edges: 1 missing, 1 extra
# Verdict: unreachable → reachable
#
# Possible causes:
# - Input graph may have been modified
# - Analyzer version: 1.2.0 → 1.2.1
#
# Run with --diff-file to export detailed diff
```
---
## 8. Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| `slice_not_found` | Slice not in CAS | Check digest, verify upload |
| `input_not_found` | Referenced input missing | Reupload inputs |
| `version_mismatch` | Analyzer version differs | Pin version or accept drift |
| `feed_stale` | Feed snapshot unavailable | Use latest or pin version |
---
## 9. Security Considerations
1. **Input integrity**: Verify CAS digests before replay
2. **Audit logging**: Log all replay attempts
3. **Rate limiting**: Prevent replay DoS
4. **Access control**: Same permissions as slice access
---
## 10. Performance Targets
| Metric | Target |
|--------|--------|
| Replay latency | <5s for typical slice |
| Input fetch | <2s (parallel CAS fetches) |
| Comparison | <100ms |
---
## 11. Related Documentation
- [Slice Schema](./slice-schema.md)
- [Binary Reachability Schema](./binary-reachability-schema.md)
- [Determinism Requirements](../contracts/determinism.md)
- [CAS Architecture](../modules/platform/cas.md)
---
_Created: 2025-12-22. See Sprint 3820 for implementation details._


@@ -0,0 +1,38 @@
# Runtime Facts (Signals/Zastava) v0.1
## Payload shapes
- **Structured** (`POST /signals/runtime-facts`):
- `subject` (imageDigest | scanId | component+version)
- `callgraphId` (required)
- `events[]`: `{ symbolId, codeId?, purl?, buildId?, loaderBase?, processId?, processName?, socketAddress?, containerId?, evidenceUri?, hitCount, observedAt?, metadata{} }`
- **Streaming NDJSON** (`POST /signals/runtime-facts/ndjson`): one JSON object per line with the same fields; supports `Content-Encoding: gzip`; callgraphId provided via query/header metadata.
## Provenance/metadata
- Signals stamps:
- `provenance.source` (defaults to `runtime` unless provided in metadata)
- `provenance.ingestedAt` (ISO-8601 UTC)
- `provenance.callgraphId`
- Runtime hits are aggregated per `symbolId` (summing hitCount) before persisting and feeding scoring.
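The per-symbol aggregation can be pictured as a grouped sum. This is a sketch with a hypothetical function name, assuming events are dicts shaped like the payload above; it is not the Signals implementation.

```python
from collections import defaultdict


def aggregate_hits(events):
    """Sum hitCount per symbolId; keys are sorted so the output is stable."""
    totals = defaultdict(int)
    for event in events:
        totals[event["symbolId"]] += int(event.get("hitCount", 0))
    return dict(sorted(totals.items()))
```

Sorting the keys before returning keeps the persisted form independent of event arrival order.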
## Validation
- `symbolId` required; events list must not be empty.
- `callgraphId` required and must resolve to a stored callgraph/union bundle.
- Subject must yield a non-empty `subjectKey`.
- Empty runtime stream is rejected.
## Storage and cache
- Stored alongside reachability facts in PostgreSQL table `reachability_facts`.
- Runtime hits cached in Valkey via `reachability_cache:*` entries; invalidated on ingest.
## Interaction with scoring
- Ingest triggers a recompute: runtime hits are added to the hits already recorded on prior facts, targets are set to the observed symbols, and entryPoints are taken from the callgraph.
- Reachability states include runtime evidence on the path; bucket/weight may be `runtime` when hits are present.
- Unknowns registry stays separate; unknowns count still factors into fact score via pressure penalty.
## Replay alignment
- Runtime traces packaged under CAS namespace `runtime_traces`; referenced in replay manifest with `namespace` and `analysisId` to link to static graphs.
## Determinism rules
- Keep NDJSON ordering stable when generating bundles.
- Use UTC timestamps; avoid environment-dependent metadata values.
- No external network lookups during ingest.


@@ -0,0 +1,461 @@
# Binary Reachability Schema
_Last updated: 2025-12-13. Owner: Scanner Guild + Attestor Guild._
This document defines the binary reachability schema addressing gaps BR1-BR10 from the November 2025 product findings. It specifies DSSE predicate formats, edge hash recipes, binary evidence requirements, build-id handling, and Sigstore integration.
---
## 1. Overview
Binary reachability extends the function-level evidence chain to native executables (ELF, PE, Mach-O). Key challenges addressed:
- **Stripped binaries:** Symbol recovery using `code_id` + `code_block_hash`
- **Build variants:** Handling multiple builds from same source
- **Large graphs:** Chunking and size limits for DSSE/Rekor
- **Offline verification:** Air-gapped attestation workflows
---
## 2. Gap Resolutions
### BR1: Canonical DSSE/Predicate Schemas
**Binary graph predicate:**
```
stella.ops/binaryGraph@v1
```
**Predicate schema:**
```json
{
"_type": "https://stellaops.dev/predicates/binaryGraph/v1",
"subject": [
{
"name": "graph",
"digest": {"blake3": "a1b2c3d4e5f6..."}
}
],
"predicate": {
"analyzer": {
"name": "scanner.native",
"version": "1.2.0",
"toolchain": "ghidra-11.2"
},
"binary": {
"format": "ELF",
"arch": "x86_64",
"file_hash": "sha256:...",
"build_id": "gnu-build-id:5f0c7c3c..."
},
"graph_stats": {
"node_count": 1247,
"edge_count": 3891,
"root_count": 5
},
"evidence": {
"symbols_source": "DWARF",
"stripped_symbols": 58,
"heuristic_symbols": 12
},
"created_at": "2025-12-13T10:00:00Z"
}
}
```
**Edge bundle predicate:**
```
stella.ops/binaryEdgeBundle@v1
```
```json
{
"_type": "https://stellaops.dev/predicates/binaryEdgeBundle/v1",
"subject": [
{
"name": "edges",
"digest": {"sha256": "..."}
}
],
"predicate": {
"graph_hash": "blake3:a1b2c3d4...",
"bundle_id": "bundle:001",
"bundle_reason": "init_array",
"edge_count": 128,
"edges": [
{
"from": "sym:binary:...",
"to": "sym:binary:...",
"reason": "init-array",
"confidence": 0.95
}
]
}
}
```
### BR2: Edge Hash Recipe
**Binary edge hash computation:**
```
edge_id = "edge:" + sha256(
canonical_json({
"from": edge.from,
"to": edge.to,
"kind": edge.kind,
"reason": edge.reason,
"binary_hash": binary.file_hash // Binary context included
})
)
```
**Hash includes binary context:**
Unlike managed code edges, binary edges include `binary_hash` in the hash computation to distinguish edges from different binaries with identical symbol names.
**Canonicalization:**
1. Keys: `binary_hash`, `from`, `kind`, `reason`, `to` (alphabetical)
2. No whitespace, UTF-8 encoding
3. Lowercase hex for all hashes
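The recipe above could be implemented roughly as follows. The function name is hypothetical, and the `edge:sha256:` prefix is an assumption borrowed from the edge ids shown in the CLI examples of the companion edge schema.

```python
import hashlib
import json


def binary_edge_id(edge: dict, binary_file_hash: str) -> str:
    """Hash the identity fields plus the owning binary's file hash."""
    identity = {
        "from": edge["from"],
        "to": edge["to"],
        "kind": edge["kind"],
        "reason": edge["reason"],
        "binary_hash": binary_file_hash.lower(),  # lowercase hex for all hashes
    }
    # sort_keys yields binary_hash, from, kind, reason, to; separators drop whitespace.
    payload = json.dumps(identity, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    return "edge:sha256:" + hashlib.sha256(payload).hexdigest()
```

Two binaries that both contain an edge between identically named symbols end up with different edge ids, which is the stated purpose of including `binary_hash`.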
### BR3: Required Binary Evidence with CAS Refs
**Required evidence per node:**
| Evidence Type | Required | CAS Storage |
|---------------|----------|-------------|
| File hash | Yes | N/A (inline) |
| Build ID | Conditional | N/A (inline) |
| Symbol source | Yes | N/A (inline) |
| Code block hash | For stripped | `cas://binary/blocks/{sha256}` |
| Disassembly | Optional | `cas://binary/disasm/{sha256}` |
| CFG | Optional | `cas://binary/cfg/{sha256}` |
**Evidence schema:**
```json
{
"binary_evidence": {
"file_hash": "sha256:...",
"build_id": "gnu-build-id:5f0c7c3c...",
"symbol_source": "DWARF",
"symbol_confidence": 0.95,
"code_block_hash": "sha256:deadbeef...",
"code_block_uri": "cas://binary/blocks/sha256:deadbeef...",
"disassembly_uri": "cas://binary/disasm/sha256:...",
"cfg_uri": "cas://binary/cfg/sha256:..."
}
}
```
**CAS layout:**
```
cas://binary/
blocks/{sha256}/ # Code block bytes
disasm/{sha256}/ # Disassembly JSON
cfg/{sha256}/ # Control flow graph
symbols/{sha256}/ # Symbol table extract
```
### BR4: Build-ID/Variant Rules
**Build-ID sources:**
| Format | Build-ID Source | Example |
|--------|-----------------|---------|
| ELF | `.note.gnu.build-id` | `gnu-build-id:5f0c7c3c...` |
| PE | Debug GUID | `pe-guid:12345678-1234-...` |
| Mach-O | `LC_UUID` | `macho-uuid:12345678...` |
**Fallback when build-ID absent:**
```json
{
"build_id": null,
"build_id_fallback": {
"method": "file_hash",
"value": "sha256:...",
"confidence": 0.7
}
}
```
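A producer might pick between the native build-ID and the fallback like this. The helper is hypothetical; the 0.7 default simply mirrors the sample above and is not a documented constant.

```python
def resolve_build_id(build_id, file_hash, fallback_confidence=0.7):
    """Return the build-ID fields for a node, falling back to the file hash."""
    if build_id:
        return {"build_id": build_id}
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": "file_hash",
            "value": file_hash,
            "confidence": fallback_confidence,  # 0.7 mirrors the sample above
        },
    }
```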
**Variant handling:**
Multiple binaries from same source (debug/release, different arch):
```json
{
"variant_group": "sha256:source_hash...",
"variants": [
{"build_id": "gnu-build-id:aaa...", "variant_type": "release-x86_64"},
{"build_id": "gnu-build-id:bbb...", "variant_type": "debug-x86_64"},
{"build_id": "gnu-build-id:ccc...", "variant_type": "release-aarch64"}
]
}
```
### BR5: Policy Hash Governance
**Policy version binding:**
Binary reachability graphs are bound to a policy version:
```json
{
"policy_binding": {
"policy_digest": "sha256:...",
"policy_version": "P-7:v4",
"bound_at": "2025-12-13T10:00:00Z",
"binding_mode": "strict"
}
}
```
**Binding modes:**
| Mode | Behavior |
|------|----------|
| `strict` | Graph invalid if policy changes |
| `forward` | Graph valid with newer policy versions |
| `any` | Graph valid with any policy version |
**Governance rules:**
1. Production graphs use `strict` binding
2. Test graphs may use `forward`
3. Policy hash computed from canonical DSL
4. Binding stored in graph metadata
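The three binding modes can be read as a small decision function. This sketch is an interpretation, not the Policy engine's logic: the function is hypothetical, `strict` is taken to mean the policy digest must be unchanged, and policy versions are assumed to be comparable integers (the real `P-7:v4` identifiers would need parsing first).

```python
def binding_is_valid(mode, bound_digest, current_digest, bound_version, current_version):
    """Evaluate a policy binding; versions are assumed to be comparable integers."""
    if mode == "strict":
        # Any change to the policy invalidates the graph.
        return bound_digest == current_digest
    if mode == "forward":
        # The same or a newer policy version keeps the graph valid.
        return current_version >= bound_version
    if mode == "any":
        return True
    raise ValueError(f"unknown binding mode: {mode!r}")
```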
### BR6: Sigstore Bundle/Log Routing
**Sigstore integration:**
```json
{
"sigstore": {
"bundle_type": "hashedrekord",
"log_index": 12345678,
"log_id": "rekor.sigstore.dev",
"inclusion_proof": {
"log_index": 12345678,
"root_hash": "sha256:...",
"tree_size": 98765432,
"hashes": ["sha256:...", "sha256:..."]
},
"signed_entry_timestamp": "base64:..."
}
}
```
**Log routing:**
| Evidence Type | Log | Notes |
|---------------|-----|-------|
| Graph DSSE | Rekor (public) | Always |
| Edge bundle DSSE | Rekor (capped) | Configurable limit |
| Code block | No log | CAS only |
| CFG/Disasm | No log | CAS only |
**Offline mode:**
When Rekor unavailable:
```json
{
"sigstore": {
"mode": "offline",
"checkpoint": {
"origin": "rekor.sigstore.dev",
"checkpoint_data": "base64:...",
"captured_at": "2025-12-13T10:00:00Z"
},
"deferred_submission": true
}
}
```
### BR7: Idempotent Submission Keys
**Submission key format:**
```
submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}
```
**Idempotency rules:**
1. Same key returns existing entry (no duplicate)
2. Key includes hour-granularity timestamp for rate limiting
3. Different graphs from same binary produce different keys
4. A retry within the same clock hour reuses the same key; a retry after the hour rolls over produces a new key
**Implementation:**
```json
{
"submission": {
"key": "submit:acme:sha256:abc...:blake3:def...:2025121310",
"status": "accepted",
"existing_entry": false,
"log_index": 12345678
}
}
```
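A key in the format above can be derived from the submission time by truncating to the hour. The function name is hypothetical, and the use of UTC for the hour bucket is an assumption based on the `2025121310` sample lining up with the `10:00:00Z` timestamps used elsewhere in this document.

```python
from datetime import datetime, timezone


def submission_key(tenant: str, binary_hash: str, graph_hash: str, when: datetime) -> str:
    """submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}, hour bucket in UTC."""
    hour_bucket = when.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"submit:{tenant}:{binary_hash}:{graph_hash}:{hour_bucket}"
```

Two submissions at 10:05 and 10:55 share a key, while one at 11:01 does not, so idempotency holds per clock hour, not per rolling 60 minutes.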
### BR8: Size/Chunking Limits
**Size limits:**
| Element | Limit | Action on Exceed |
|---------|-------|------------------|
| Graph JSON | 10 MB | Chunk nodes/edges |
| Edge bundle | 512 edges | Split bundles |
| DSSE payload | 1 MB | Compress/chunk |
| Rekor entry | 100 KB | Reference CAS |
**Chunking strategy:**
For large graphs (>10MB):
```json
{
"chunked_graph": {
"chunk_count": 5,
"chunks": [
{"chunk_id": "chunk:001", "uri": "cas://graphs/chunks/001", "hash": "blake3:..."},
{"chunk_id": "chunk:002", "uri": "cas://graphs/chunks/002", "hash": "blake3:..."}
],
"assembly_order": ["chunk:001", "chunk:002", ...],
"assembled_hash": "blake3:..."
}
}
```
**Compression:**
- Graph JSON: gzip before DSSE
- CAS storage: Raw JSON (indexed)
- Rekor payload: DSSE references CAS
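The 512-edge bundle limit amounts to slicing the edge list into fixed-size batches. The helper name is hypothetical, and the sketch assumes the edges are already in canonical order before splitting.

```python
def split_edge_bundles(edges, limit=512):
    """Split an edge list into consecutive bundles of at most `limit` edges."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return [edges[i:i + limit] for i in range(0, len(edges), limit)]
```

A graph with 1025 edges would therefore produce three bundles of 512, 512, and 1 edges.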
### BR9: API/CLI/UI Surfacing
**API endpoints:**
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/binary/graphs` | Submit binary graph |
| `GET` | `/api/binary/graphs/{hash}` | Get graph details |
| `GET` | `/api/binary/graphs/{hash}/edges` | List edges |
| `GET` | `/api/binary/symbols/{symbolId}` | Get symbol details |
| `POST` | `/api/binary/verify` | Verify graph attestation |
**CLI commands:**
```bash
# Submit binary graph
stella binary submit --graph ./richgraph.json --binary ./app
# Get graph info
stella binary info --hash blake3:a1b2c3d4...
# List symbols
stella binary symbols --hash blake3:... --stripped-only
# Verify attestation
stella binary verify --graph ./richgraph.json --dsse ./richgraph.dsse
```
**UI components:**
- Binary graph visualization with zoom/pan
- Symbol table with search/filter
- Edge explorer with confidence highlighting
- Attestation status badges
- Build variant selector
### BR10: Binary Fixtures
**Fixture location:**
```
tests/Binary/
fixtures/
elf-x86_64-with-debug/
binary.elf
graph.json
expected-hashes.txt
elf-stripped/
binary.elf
graph.json
expected-hashes.txt
pe-x64-with-pdb/
binary.exe
graph.json
expected-hashes.txt
golden/
elf-x86_64.golden.json
pe-x64.golden.json
datasets/binary/
schema/
binary-graph.schema.json
binary-edge.schema.json
samples/
openssl-1.1.1/
libssl.so
graph.json
edges.ndjson
```
**Fixture requirements:**
1. Each binary format has at least one fixture
2. Stripped and debug variants for each format
3. Expected hashes verified by CI
4. Golden outputs include DSSE envelopes
5. Fixtures reproducible from source (where legal)
**Test categories:**
1. **Hash stability:** Same binary produces same graph hash
2. **Build-ID extraction:** Correct build-ID parsing per format
3. **Symbol recovery:** DWARF/PDB parsing accuracy
4. **Stripped handling:** Code block hash computation
5. **Chunking:** Large graph assembly/disassembly
6. **DSSE signing:** Envelope creation and verification
7. **Rekor integration:** Submission and verification
---
## 3. Implementation Status
| Component | Location | Status |
|-----------|----------|--------|
| ELF parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| PE parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| DSSE predicates | `src/Signer/StellaOps.Signer/PredicateTypes.cs` | Implemented |
| CAS storage | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability` | Partial |
| Rekor integration | `src/Attestor/StellaOps.Attestor` | Implemented |
| CLI commands | `src/Cli/StellaOps.Cli` | Planned |
| UI components | `src/Web/StellaOps.Web` | Implemented |
---
## 4. Related Documentation
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Edge Explainability](./edge-explainability-schema.md) - Edge reason codes
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
- [Native Analyzer Tests](../../src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests/Reachability/) - Test fixtures
---
_Last updated: 2025-12-13. See Sprint 0401 BINARY-GAPS-401-066 for change history._


@@ -0,0 +1,416 @@
# Edge Explainability Schema
_Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild._
This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.
---
## 1. Overview
Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:
- **Reason code:** Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
- **Confidence score:** Certainty of the edge's existence
- **Evidence sources:** Detectors and rules that contributed to edge discovery
- **Provenance:** Analyzer version, detection timestamp, and input artifacts
---
## 2. Gap Resolutions
### EG1: Reason Enum Governance
**Standard reason codes:**
| Code | Category | Description | Example |
|------|----------|-------------|---------|
| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
| `bytecode-field` | Static | Field access leading to call | Static initializer |
| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
| `user-annotated` | Manual | User-provided edge | Policy override |
**Governance rules:**
1. New reason codes require RFC + review by Scanner Guild
2. Deprecated codes remain valid for 2 major versions
3. Custom codes use `custom:` prefix (e.g., `custom:my-analyzer`)
4. Codes are case-insensitive, normalized to lowercase
**Code registry:**
```json
{
"schema": "stellaops.edge.reason.registry@v1",
"version": "2025-12-13",
"reasons": [
{
"code": "bytecode-invoke",
"category": "static",
"description": "Bytecode invocation instruction",
"languages": ["java", "dotnet"],
"confidence_range": [0.9, 1.0],
"deprecated": false
}
]
}
```
### EG2: Canonical Edge Schema with Hash Rules
**Edge schema:**
```json
{
"edge_id": "edge:sha256:{hex}",
"from": "sym:java:...",
"to": "sym:java:...",
"kind": "call",
"reason": "bytecode-invoke",
"confidence": 0.95,
"evidence": [
{
"source": "detector:java-bytecode-analyzer",
"rule_id": "invoke-virtual",
"rule_version": "1.0.0",
"location": {
"file": "com/example/Foo.class",
"offset": 1234,
"instruction": "invokevirtual #42"
},
"timestamp": "2025-12-13T10:00:00Z"
}
],
"attributes": {
"virtual": true,
"polymorphic_targets": 3
}
}
```
**Hash computation:**
```
edge_id = "edge:" + sha256(
canonical_json({
"from": edge.from,
"to": edge.to,
"kind": edge.kind,
"reason": edge.reason
})
)
```
**Canonicalization:**
1. Use only `from`, `to`, `kind`, `reason` for hash (not confidence or evidence)
2. Sort JSON keys alphabetically
3. No whitespace, UTF-8 encoding
4. Hash is lowercase hex with `sha256:` prefix
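The hash rules above can be sketched with the standard library. The function name is hypothetical; the point of the example is that `confidence`, `evidence`, and `attributes` are projected away before hashing.

```python
import hashlib
import json

IDENTITY_FIELDS = ("from", "to", "kind", "reason")


def edge_id(edge: dict) -> str:
    """edge:sha256:{hex} over the canonical JSON of the four identity fields only."""
    identity = {field: edge[field] for field in IDENTITY_FIELDS}
    payload = json.dumps(identity, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    return "edge:sha256:" + hashlib.sha256(payload).hexdigest()
```

Re-analysis that only changes an edge's confidence or adds evidence therefore keeps the same `edge_id`.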
### EG3: Evidence Limits/Redaction
**Evidence limits:**
| Element | Default Limit | Configurable |
|---------|--------------|--------------|
| Evidence entries per edge | 10 | Yes |
| Location detail fields | 5 | Yes |
| Instruction preview length | 100 chars | Yes |
| File path depth | 10 segments | No |
**Redaction rules:**
| Category | Redaction | Example |
|----------|-----------|---------|
| File paths | Normalize | `/home/user/...` -> `{PROJECT}/...` |
| Bytecode offsets | Keep | Offsets are not PII |
| Instruction text | Truncate | First 100 chars |
| Source line content | Omit | Not included by default |
**Truncation behavior:**
```json
{
"evidence_truncated": true,
"evidence_count": 15,
"evidence_shown": 10,
"full_evidence_uri": "cas://edges/evidence/sha256:..."
}
```
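Applying the per-edge evidence limit and emitting the truncation fields might look like this. The helper is a hypothetical sketch: it assumes the full evidence has already been written to CAS and its URI is passed in.

```python
def truncate_evidence(evidence, limit=10, full_evidence_uri=None):
    """Keep the first `limit` entries and describe what was cut."""
    shown = list(evidence[:limit])
    result = {"evidence": shown}
    if len(evidence) > limit:
        result.update({
            "evidence_truncated": True,
            "evidence_count": len(evidence),
            "evidence_shown": len(shown),
            "full_evidence_uri": full_evidence_uri,
        })
    return result
```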
### EG4: Confidence Rubric
**Confidence scale:**
| Level | Range | Description | Typical Sources |
|-------|-------|-------------|-----------------|
| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
| `high` | 0.85-0.99 | Very likely | Import table, PLT |
| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
| `low` | 0.2-0.49 | Possible | Heuristic carving |
| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |
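The scale can be turned into a lookup by treating the lower bound of each range as a threshold. That reading is an assumption made for this sketch (the table leaves values such as 0.995 unassigned), and the function name is hypothetical.

```python
def confidence_level(score: float) -> str:
    """Map a confidence score in [0, 1] to the level names from the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if score >= 1.0:
        return "certain"
    if score >= 0.85:
        return "high"
    if score >= 0.5:
        return "medium"
    if score >= 0.2:
        return "low"
    return "unknown"
```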
**Confidence computation:**
```
edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor
```
**Base confidence by reason:**
| Reason | Base Confidence |
|--------|-----------------|
| `bytecode-invoke` | 0.98 |
| `import-symbol` | 0.95 |
| `plt-stub` | 0.92 |
| `reloc-target` | 0.90 |
| `init-array` | 0.95 |
| `vtable-slot` | 0.75 |
| `indirect-target` | 0.60 |
| `reflection-invoke` | 0.50 |
| `runtime-observed` | 0.99 |
| `user-annotated` | 0.80 |
### EG5: Detector/Rule Provenance
**Provenance schema:**
```json
{
"provenance": {
"analyzer": {
"name": "scanner.java",
"version": "1.2.0",
"digest": "sha256:..."
},
"detector": {
"name": "java-bytecode-analyzer",
"version": "2.0.0",
"rule_set": "default"
},
"rule": {
"id": "invoke-virtual",
"version": "1.0.0",
"description": "Detect invokevirtual bytecode instructions"
},
"input_artifacts": [
{"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
],
"detected_at": "2025-12-13T10:00:00Z"
}
}
```
**Provenance requirements:**
1. All edges must include analyzer provenance
2. Detector/rule provenance required for non-runtime edges
3. Input artifact digests enable reproducibility
4. Detection timestamp uses UTC ISO-8601
### EG6: API/CLI Parity
**API endpoints:**
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/edges/{edgeId}` | Get edge details |
| `GET` | `/api/edges?graph_hash=...` | List edges for graph |
| `GET` | `/api/edges/{edgeId}/evidence` | Get full evidence |
| `POST` | `/api/edges/search` | Search edges by criteria |
**CLI commands:**
```bash
# List edges for a graph
stella edge list --graph blake3:a1b2c3d4...
# Get edge details
stella edge show --id edge:sha256:...
# Search edges
stella edge search --from "sym:java:..." --reason bytecode-invoke
# Export edges
stella edge export --graph blake3:... --output ./edges.ndjson
```
**Output parity:**
- API and CLI return identical JSON structure
- CLI supports `--json` for machine-readable output
- Both support filtering by reason, confidence, from/to
### EG7: Deterministic Fixtures
**Fixture location:**
```
tests/Edge/
fixtures/
bytecode-invoke.json
plt-stub.json
vtable-dispatch.json
init-array-constructor.json
runtime-observed.json
golden/
bytecode-invoke.golden.json
graph-with-edges.golden.json
datasets/edges/
schema/
edge.schema.json
reason-registry.json
samples/
java-spring-boot/
edges.ndjson
expected-hashes.txt
```
**Fixture requirements:**
1. Each reason code has at least one fixture
2. Fixtures include expected `edge_id` hash
3. Golden outputs frozen after review
4. CI verifies hash stability
### EG8: Propagation into Explanation Graphs/VEX
**Explanation graph inclusion:**
```json
{
"explanation": {
"path": [
{
"node": "sym:java:main...",
"outgoing_edge": {
"edge_id": "edge:sha256:...",
"to": "sym:java:handler...",
"reason": "bytecode-invoke",
"confidence": 0.98
}
},
{
"node": "sym:java:handler...",
"outgoing_edge": {
"edge_id": "edge:sha256:...",
"to": "sym:java:log4j...",
"reason": "bytecode-invoke",
"confidence": 0.95
}
}
],
"aggregate_path_confidence": 0.93
}
}
```
**VEX evidence format:**
```json
{
"stellaops:reachability": {
"path_edges": [
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
],
"weakest_edge": {
"edge_id": "edge:sha256:...",
"reason": "bytecode-invoke",
"confidence": 0.95
},
"aggregate_confidence": 0.93
}
}
```
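The sample values are consistent with an aggregate that multiplies the per-edge confidences (0.98 × 0.95 ≈ 0.93) and a weakest edge chosen by minimum confidence. The document does not state the rule, so the sketch below treats the product as an assumption; the function name and the two-decimal rounding are also illustrative.

```python
from math import prod


def summarize_path(path_edges):
    """Return (aggregate_confidence, weakest_edge) for a non-empty list of edges."""
    if not path_edges:
        raise ValueError("a path needs at least one edge")
    # Assumed rule: the aggregate is the product of per-edge confidences.
    aggregate = round(prod(e["confidence"] for e in path_edges), 2)
    weakest = min(path_edges, key=lambda e: e["confidence"])
    return aggregate, weakest
```

Under this rule a single low-confidence edge, such as a 0.50 `reflection-invoke`, pulls the whole path down regardless of how many certain edges surround it.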
### EG9: Localization Guidance
**Localizable elements:**
| Element | Localization | Example |
|---------|--------------|---------|
| Reason code display | Message catalog | `bytecode-invoke` -> "Bytecode method call" |
| Confidence level | Message catalog | `high` -> "High confidence" |
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
| Error messages | Message catalog | Standard error codes |
**Message catalog structure:**
```json
{
"locale": "en-US",
"messages": {
"edge.reason.bytecode-invoke": "Bytecode method call",
"edge.reason.plt-stub": "PLT/GOT library call",
"edge.confidence.high": "High confidence ({0:P0})",
"edge.evidence.location": "Detected at offset {offset} in {file}"
}
}
```
**Supported locales:**
- `en-US` (default)
- Additional locales via contribution
### EG10: Backfill Plan
**Backfill strategy:**
1. **Phase 1:** Add reason codes to new edges (no backfill needed)
2. **Phase 2:** Run detector upgrade on graphs without reason codes
3. **Phase 3:** Mark old graphs as `requires_reanalysis` in metadata
**Migration script:**
```bash
stella edge backfill --graph blake3:... --dry-run
# Output:
# Graph: blake3:a1b2c3d4...
# Edges without reason: 1234
# Edges to update: 1234
# Dry run - no changes made.
# Execute:
stella edge backfill --graph blake3:... --execute
```
**Backfill metadata:**
```json
{
"backfill": {
"status": "complete",
"original_analyzer_version": "1.0.0",
"backfill_analyzer_version": "1.2.0",
"backfilled_at": "2025-12-13T10:00:00Z",
"edges_updated": 1234
}
}
```
---
## 3. Related Documentation
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Explainability Schema](./explainability-schema.md) - Explanation format
- [Hybrid Attestation](./hybrid-attestation.md) - Edge bundle DSSE
---
_Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history._


@@ -0,0 +1,101 @@
# Reachability Evidence Schema (Draft v1, Nov 2025)
Purpose: define the canonical fields for reachability graph nodes/edges, runtime facts, and unknowns so Scanner, Signals, Policy, Replay, CLI/UI, and SbomService stay aligned. This replaces scattered notes in advisories.
## 1. Core identifiers
- `symbol_id`: canonical ID for a function/symbol; includes `{format, build_id?, file_hash?, section?, addr, length}` plus optional `code_block_hash`. Always deterministic and lowercase.
- `code_id`: `{format, build_id?, file_hash?, start, length, code_block_hash?}`; used when symbol names are absent.
- `symbol_digest`: sha256 of normalized signature (demangled name + params + return type; strip addresses). For stripped code, combine synthetic name + block hash.
- `purl`: package URL of the owning component (from SBOM resolver); `pkg:unknown` when unresolved.
## 2. Graph payload (`richgraph-v1` additions)
```jsonc
{
"nodes": [
{
"id": "sym:sha256:...",
"symbol_id": "func:ELF:sha256:...",
"code_id": "code:ELF:sha256:...",
"code_block_hash": "sha256:deadbeef...",
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64",
"symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF", "confidence": 0.98 },
"build_id": "a1b2c3...",
"lang": "c",
"evidence": ["dwarf", "dynsym"],
"analyzer": { "name": "scanner.native", "version": "1.2.0", "toolchain": "ghidra-11" }
}
],
"edges": [
{
"from": "sym:sha256:caller",
"to": "sym:sha256:callee",
"kind": "direct|plt|indirect|runtime",
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64", // callee owner
"symbol_digest": "sha256:...", // callee digest
"candidates": ["pkg:deb/openssl@3.0.2", "pkg:deb/openssl@3.0.1"],
"confidence": 0.92,
"evidence": ["import", "reloc@GOT"]
}
],
"roots": [
{ "id": "init_array@0x401000", "phase": "load", "source": "DT_INIT_ARRAY" },
{ "id": "main", "phase": "runtime" }
],
"graph_hash": "blake3:..."
}
```
## 2.5 Attestation levels (hybrid default)
- **Graph DSSE (required):** one DSSE envelope over the canonical graph JSON (sorted arrays/keys) with `graph_hash` = BLAKE3 of body; Rekor publish always (or mirror when offline).
- **Edge-bundle DSSE (optional):** batches of ≤512 edges, emitted only for high-signal cases (`runtime`, `init_array`/TLS roots, contested/third-party edges). Each bundle carries `graph_hash`, `bundle_reason`, per-edge `reason`, `symbol_digest`, `purl`, `confidence`, and optional `revoked=true` for quarantine. Rekor publish is configurable; CAS storage is mandatory.
- CAS layout additions:
  - Graph body: `cas://reachability/graphs/{blake3}`
  - Graph DSSE: `cas://reachability/graphs/{blake3}.dsse`
  - Edge bundle: `cas://reachability/edges/{graph_hash}/{bundle_id}` + `.dsse`
- Determinism: bundle ordering by `(bundle_reason, edge_id)`; arrays sorted before hashing.
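The bundling and ordering rule above can be sketched in Python. This is an illustration only: the schema does not define an `edge_id` field or say where `bundle_reason` lives on an edge, so both are assumed here to be plain keys on each edge dict, and `build_edge_bundles` is a hypothetical helper name.

```python
from itertools import groupby

MAX_EDGES_PER_BUNDLE = 512  # upper bound stated in the schema


def build_edge_bundles(edges, graph_hash, max_edges=MAX_EDGES_PER_BUNDLE):
    """Split edges into bundles of at most max_edges, ordered by (bundle_reason, edge_id).

    Assumes every edge dict has hypothetical "bundle_reason" and "edge_id" keys.
    """
    ordered = sorted(edges, key=lambda e: (e["bundle_reason"], e["edge_id"]))
    bundles = []
    for reason, group in groupby(ordered, key=lambda e: e["bundle_reason"]):
        group = list(group)
        # Chunk each reason group so no bundle exceeds the size limit.
        for start in range(0, len(group), max_edges):
            bundles.append({
                "graph_hash": graph_hash,
                "bundle_reason": reason,
                "edges": group[start:start + max_edges],
            })
    return bundles
```

Sorting before chunking means two producers that see the same edge set emit the same bundles in the same order, which is what makes the bundle hashes reproducible.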
## 3. Runtime facts (Signals ingestion)
Fields per NDJSON event:
- `symbolId` (required), `codeId`, `symbolDigest?`, `purl?`
- `hitCount`, `observedAt`, `loaderBase`, `processId`, `processName`, `containerId`, `socketAddress?`
- `callgraphId` or `scanId`, plus `evidenceUri` (CAS) if trace stored externally
- Determinism: sort keys when persisting; timestamps UTC ISO-8601.
## 4. Unknowns registry payload
See `docs/modules/signals/guides/unknowns-registry.md`; reachability producers emit Unknowns when:
- symbol→purl unresolved,
- call edge target unresolved,
- build-id missing for ELF and file hash used instead.
Unknowns must include `unknown_type`, `scope`, `provenance`, `confidence.p`, and `labels`.
## 5. CAS layout
- Graphs: `cas://reachability/graphs/{blake3}` (canonical JSON, sorted keys/arrays)
- Runtime traces: `cas://reachability/runtime/{sha256}`
- Unknowns evidence (optional large blobs): `cas://unknowns/{sha256}`
- Edge bundles: `cas://reachability/edges/{graph_hash}/{bundle_id}` (JSON + `.dsse`)
Metadata for each CAS object: `{ schema: "richgraph-v1", analyzer: {name,version}, createdAtUtc, toolchain_digest }`. When analyzer metadata is supplied at ingest (Signals OpenAPI), persist it alongside parsed analyzer fields from the artifact.
## 6. Validation rules
- All edges must carry either `purl` or `candidates[]`; never leave both empty.
- If `build_id` is present, `symbol_id` and `code_id` must store it; if it is absent, record `build_id_source: "FileHash"`.
- Evidence arrays must be sorted; confidence must be in [0,1].
- `code_block_hash` (when present) must be lowercase hex with an algorithm prefix (e.g., `sha256:`) and only accompany stripped/heuristic nodes.
- Roots must include load-time constructors when present.
- When `edge_bundles` are present, each edge in a bundle must also exist in the graph edge set; `revoked=true` bundles override graph edges for policy/scoring.
- Graph DSSE is mandatory per scan; edge-bundle DSSEs are optional but must reference `graph_hash` and `bundle_id`.
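A few of the edge-level rules above lend themselves to a mechanical check. The sketch below covers only three of them (purl-or-candidates, confidence range, sorted evidence). `validate_edge` is a hypothetical helper, and it assumes a missing `confidence` is tolerated, which the rules do not state either way.

```python
def validate_edge(edge):
    """Return a list of rule violations for one richgraph-v1 edge (empty means valid)."""
    errors = []
    # Rule: every edge carries either a purl or a non-empty candidates[] list.
    if not edge.get("purl") and not edge.get("candidates"):
        errors.append("edge must carry purl or candidates[]")
    # Rule: confidence, when present, lies in [0, 1].
    confidence = edge.get("confidence")
    if confidence is not None:
        if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
            errors.append("confidence must be in [0,1]")
    # Rule: evidence arrays are sorted.
    evidence = edge.get("evidence", [])
    if evidence != sorted(evidence):
        errors.append("evidence array must be sorted")
    return errors
```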
## 7. Acceptance checklist
- Schema reflected in Scanner/Signals DTOs and OpenAPI responses.
- CAS writers enforce canonicalization before hashing.
- Fixtures include: build-id present/absent, init-array roots, purl-resolved imports-only edge, stripped binary with block-hash symbol digest, and an Unknowns case.

@@ -0,0 +1,454 @@
# Explainability Schema
_Last updated: 2025-12-13. Owner: Policy Guild + Docs Guild._
This document defines the explainability schema addressing gaps EX1-EX10 from the November 2025 product findings. It specifies the canonical format for vulnerability verdict explanations, DSSE signing policy, CAS storage rules, and export/replay formats.
---
## 1. Overview
Explainability provides auditable, machine-readable rationale for every vulnerability verdict. Each explanation includes:
- **Decision chain:** Ordered list of rules/policies that contributed to the verdict
- **Evidence links:** References to graphs, runtime facts, VEX statements, and SBOM components
- **Confidence scores:** Per-rule and aggregate confidence values
- **Redaction metadata:** PII handling and data classification
---
## 2. Gap Resolutions
### EX1: Schema/Canonicalization + Hashes
**Explanation schema:**
```json
{
"schema": "stellaops.explanation@v1",
"explanation_id": "explain:sha256:{hex}",
"finding_id": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
"verdict": {
"status": "affected",
"severity": {"normalized": "Critical", "score": 10.0},
"confidence": 0.92
},
"decision_chain": [
{
"rule_id": "rule:reachability_gate",
"rule_version": "1.0.0",
"inputs": {
"reachability.state": "CR",
"reachability.confidence": 0.92
},
"output": {"allowed": true, "contribution": 0.4},
"evidence_refs": ["cas://reachability/graphs/blake3:..."]
},
{
"rule_id": "rule:severity_baseline",
"rule_version": "1.0.0",
"inputs": {
"cvss_base": 10.0,
"epss_percentile": 0.95
},
"output": {"severity": "Critical", "contribution": 0.6},
"evidence_refs": ["cas://advisories/CVE-2021-44228.json"]
}
],
"aggregate_confidence": 0.88,
"created_at": "2025-12-13T10:00:00Z",
"policy_version": "sha256:...",
"graph_revision_id": "rev:blake3:..."
}
```
**Canonicalization rules:**
1. JSON keys sorted alphabetically at all levels
2. Arrays in `decision_chain` ordered by rule execution sequence
3. `evidence_refs` arrays sorted alphabetically
4. No whitespace, UTF-8 encoding
5. Hash computed over canonical JSON: `sha256(canonical_json)`
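The canonicalization rules can be sketched in Python as follows. This is a minimal illustration, not the Policy Engine's implementation: the function names are hypothetical, and it assumes that `explanation_id` itself is excluded from the hashed body, since the source does not say how the self-referential ID is handled.

```python
import hashlib
import json


def _canonical_rule(rule):
    """Copy one decision_chain entry with its evidence_refs sorted."""
    rule = dict(rule)
    if "evidence_refs" in rule:
        rule["evidence_refs"] = sorted(rule["evidence_refs"])
    return rule


def canonical_explanation_bytes(explanation):
    """Serialize an explanation as compact, key-sorted UTF-8 JSON."""
    doc = dict(explanation)
    doc.pop("explanation_id", None)  # assumption: the ID is not part of the hashed body
    # decision_chain keeps execution order; only evidence_refs are sorted.
    doc["decision_chain"] = [_canonical_rule(r) for r in doc.get("decision_chain", [])]
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def explanation_id(explanation):
    """Derive the explain:sha256:{hex} identifier from the canonical bytes."""
    digest = hashlib.sha256(canonical_explanation_bytes(explanation)).hexdigest()
    return "explain:sha256:" + digest
```

Key order and evidence-ref order in the input do not affect the result, while reordering `decision_chain` does, matching rules 1 to 3.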
### EX2: DSSE Predicate/Signing Policy
**DSSE predicate type:**
```
stella.ops/explanation@v1
```
**Signing policy:**
| Element | Required | Signer |
|---------|----------|--------|
| Explanation body | Yes | Policy Engine key |
| Graph DSSE reference | Yes (if reachability cited) | Scanner key |
| VEX DSSE reference | Yes (if VEX cited) | Policy Engine key |
**DSSE envelope structure:**
```json
{
"payloadType": "application/vnd.stellaops.explanation+json",
"payload": "<base64(canonical_explanation_json)>",
"signatures": [
{
"keyid": "policy-engine-signing-2025",
"sig": "base64:..."
}
]
}
```
**Signing requirements:**
- All explanations must be signed before CAS storage
- Signing key must be registered in Authority key store
- Key rotation triggers re-signing of active explanations (configurable)
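For orientation, here is a sketch of how a DSSE envelope is assembled. The pre-authentication encoding (PAE) follows the public DSSE specification. The `sign` callable, `build_envelope`, and the key handling are hypothetical stand-ins for whatever the Authority key store provides. The sketch emits plain base64 for `sig` as DSSE prescribes, whereas the sample envelope above shows a `base64:` prefix, so the exact wire form is an open assumption.

```python
import base64


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the bytes that actually get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def build_envelope(payload_type, payload, keyid, sign):
    """Wrap canonical explanation bytes in a DSSE envelope.

    sign is a caller-supplied function (bytes -> signature bytes); hypothetical here.
    """
    signature = sign(pae(payload_type, payload))
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [
            {"keyid": keyid, "sig": base64.b64encode(signature).decode("ascii")}
        ],
    }
```

Signing the PAE rather than the raw payload binds the signature to the payload type, so an envelope cannot be replayed under a different `payloadType`.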
### EX3: CAS Storage Rules for Evidence
**Storage layout:**
```
cas://explanations/
  {sha256}/                      # Explanation body
  {sha256}.dsse                  # DSSE envelope
  by-finding/{finding_id}/       # Index by finding
  by-policy/{policy_digest}/     # Index by policy version
  by-graph/{graph_revision_id}/  # Index by graph revision
```
**Storage rules:**
1. Explanations are immutable after signing
2. New verdicts create new explanation documents (no updates)
3. Previous explanations are retained per retention policy
4. Cross-references validated at write time (graphs, VEX must exist)
**Deduplication:**
- Identical canonical JSON produces identical hash
- CAS returns existing reference if content matches
### EX4: Link to Decision/Policy and graph_revision_id
**Required links:**
```json
{
"links": {
"policy_version": "sha256:7e1d...",
"policy_uri": "cas://policy/versions/sha256:7e1d...",
"graph_revision_id": "rev:blake3:a1b2...",
"graph_uri": "cas://reachability/revisions/blake3:a1b2...",
"sbom_digest": "sha256:def4...",
"sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
"vex_digest": "sha256:e5f6...",
"vex_uri": "cas://excititor/vex/openvex.json"
}
}
```
**Validation:**
- All linked artifacts must exist at explanation creation time
- Links are verified during replay/audit
- Broken links cause replay verification failure
### EX5: Export/Replay Bundle Format
**Export bundle manifest:**
```json
{
"schema": "stellaops.explanation.bundle@v1",
"bundle_id": "bundle:explain:2025-12-13",
"created_at": "2025-12-13T10:00:00Z",
"explanations": [
{
"explanation_id": "explain:sha256:...",
"finding_id": "...",
"explanation_uri": "explanations/sha256:....json",
"dsse_uri": "explanations/sha256:....dsse"
}
],
"dependencies": {
"graphs": [
{"revision_id": "rev:blake3:...", "uri": "graphs/blake3:....json"}
],
"policies": [
{"digest": "sha256:...", "uri": "policies/sha256:....json"}
],
"vex_statements": [
{"digest": "sha256:...", "uri": "vex/sha256:....json"}
]
},
"verification": {
"bundle_hash": "sha256:...",
"signature": "base64:...",
"signed_by": "policy-engine-signing-2025"
}
}
```
**Replay verification:**
```bash
stella explain verify --bundle ./explanation-bundle.tgz
# Output:
Bundle: bundle:explain:2025-12-13
Explanations: 42
Dependencies: 5 graphs, 2 policies, 12 VEX
Verifying explanations...
Canonical hashes: 42/42 MATCH
DSSE signatures: 42/42 VALID
Dependency links: 42/42 RESOLVED
Replay verification PASSED.
```
### EX6: PII/Redaction Rules
**Redaction categories:**
| Category | Redaction | Example |
|----------|-----------|---------|
| User identifiers | Hash | `user:alice` -> `user:sha256:a1b2...` |
| IP addresses | Mask | `192.168.1.100` -> `192.168.x.x` |
| File paths | Normalize | `/home/alice/code/...` -> `{HOME}/code/...` |
| Email addresses | Hash | `alice@example.com` -> `email:sha256:...` |
| API keys/tokens | Omit | `Authorization: Bearer xxx` -> `[REDACTED]` |
**Redaction metadata:**
```json
{
"redaction": {
"applied": true,
"level": "standard",
"fields_redacted": ["actor.email", "evidence.file_path"],
"redaction_policy": "stellaops.redaction.standard@v1"
}
}
```
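The hash, mask, and normalize redactions from the category table might look like this in Python. This is a sketch under assumptions: the helper names are hypothetical, the hash is taken over the raw value without a salt (the source does not specify salting or the exact input), and only IPv4 addresses and `/home/<user>` paths are handled.

```python
import hashlib
import re


def hash_identifier(kind, value):
    """Replace an identifier with '<kind>:sha256:<hex>' (unsalted; an assumption)."""
    return f"{kind}:sha256:{hashlib.sha256(value.encode('utf-8')).hexdigest()}"


def mask_ipv4(address):
    """Keep the first two octets and mask the rest, e.g. 192.168.x.x."""
    first, second, _third, _fourth = address.split(".")
    return f"{first}.{second}.x.x"


def normalize_home_path(path):
    """Replace a leading /home/<user> with the {HOME} placeholder."""
    return re.sub(r"^/home/[^/]+", "{HOME}", path)
```

Unsalted hashes of low-entropy values such as usernames can be reversed with a dictionary, so a real deployment would likely want a keyed or salted hash; that choice is outside what the source specifies.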
**Export modes:**
- `--redacted` (default): Apply standard redaction
- `--full`: Include all data (requires `explain:export:full` scope)
- `--audit`: Include redaction audit trail
### EX7: Size Budgets
**Limits:**
| Element | Default Limit | Configurable |
|---------|--------------|--------------|
| Explanation body | 256 KB | Yes |
| Decision chain entries | 100 | Yes |
| Evidence refs per rule | 20 | Yes |
| Total evidence refs | 200 | Yes |
| Path entries | 50 | No |
**Truncation behavior:**
When limits are exceeded:
1. Log warning with truncation details
2. Add `truncation` metadata to explanation
3. Store full evidence in separate CAS object
4. Include `full_evidence_uri` reference
```json
{
"truncation": {
"applied": true,
"elements_truncated": ["decision_chain", "evidence_refs"],
"full_evidence_uri": "cas://explanations/full/sha256:..."
}
}
```
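A sketch of how the decision-chain and per-rule evidence budgets could be enforced. The defaults mirror the limits table. Keeping the first N entries is an assumption (the source does not say which entries survive truncation), `apply_size_budgets` is a hypothetical name, and writing the full evidence to CAS is left to the caller.

```python
MAX_CHAIN_ENTRIES = 100  # default "Decision chain entries" limit
MAX_REFS_PER_RULE = 20   # default "Evidence refs per rule" limit


def apply_size_budgets(explanation, full_evidence_uri,
                       max_chain=MAX_CHAIN_ENTRIES, max_refs=MAX_REFS_PER_RULE):
    """Return a truncated copy of the explanation plus truncation metadata if needed."""
    doc = dict(explanation)
    truncated = []
    chain = [dict(rule) for rule in doc.get("decision_chain", [])]
    if len(chain) > max_chain:
        chain = chain[:max_chain]  # assumption: keep the earliest entries
        truncated.append("decision_chain")
    refs_cut = False
    for rule in chain:
        refs = rule.get("evidence_refs", [])
        if len(refs) > max_refs:
            rule["evidence_refs"] = refs[:max_refs]
            refs_cut = True
    if refs_cut:
        truncated.append("evidence_refs")
    doc["decision_chain"] = chain
    if truncated:
        doc["truncation"] = {
            "applied": True,
            "elements_truncated": truncated,
            "full_evidence_uri": full_evidence_uri,
        }
    return doc
```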
### EX8: Versioning
**Schema versioning:**
- Schema version in `schema` field: `stellaops.explanation@v1`
- Breaking changes increment major version
- Minor changes (additive fields) use v1.x
- Backward compatibility maintained for 2 major versions
**Migration support:**
```bash
stella explain migrate --from v1 --to v2 --input ./explanations/
# Output:
Migrating 1000 explanations from v1 to v2...
Migrated: 998
Skipped (already v2): 2
Migration complete.
```
**Version compatibility matrix:**
| API Version | Schema v1 | Schema v2 |
|-------------|-----------|-----------|
| 1.0.x | Full | N/A |
| 1.1.x | Full | Full |
| 2.0.x | Read-only | Full |
### EX9: Golden Fixtures/Tests
**Test fixture location:**
```
tests/Explanation/
  fixtures/
    simple-affected.json
    simple-not-affected.json
    with-reachability-evidence.json
    multi-rule-chain.json
    truncated-evidence.json
    redacted-pii.json
  golden/
    simple-affected.golden.json
    simple-affected.golden.dsse
datasets/explanations/
  schema/
    explanation.schema.json
  samples/
    log4j-affected/
      explanation.json
      expected-hash.txt
```
**Test categories:**
1. **Canonicalization tests:** Verify hash stability across JSON reordering
2. **DSSE signing tests:** Verify signature creation and verification
3. **Redaction tests:** Verify PII handling
4. **Truncation tests:** Verify size budget enforcement
5. **Replay tests:** Verify bundle export/import cycle
6. **Migration tests:** Verify version upgrade paths
**CI integration:**
```yaml
# .gitea/workflows/explanation-tests.yml
explanation-tests:
  runs-on: ubuntu-latest
  steps:
    - name: Run explanation tests
      run: dotnet test src/Policy/__Tests/StellaOps.Policy.Explanation.Tests
    - name: Verify golden fixtures
      run: scripts/verify-golden-fixtures.sh tests/Explanation/golden/
```
### EX10: Determinism Guarantees
**Determinism requirements:**
1. Same inputs produce identical `explanation_id` hash
2. Decision chain ordering is stable (execution order)
3. Evidence refs sorted alphabetically
4. Timestamps use UTC ISO-8601 with millisecond precision
5. Floating-point values rounded to 6 decimal places
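Requirements 4 and 5 can be illustrated with two small normalizers. This is a sketch: the helper names are hypothetical, and truncating (rather than rounding) sub-millisecond digits is an assumption the source does not settle.

```python
from datetime import datetime, timezone


def format_timestamp(moment):
    """Render a timezone-aware datetime as UTC ISO-8601 with millisecond precision."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    # Truncate microseconds to milliseconds (assumption: truncation, not rounding).
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def round_floats(value, places=6):
    """Recursively round every float in a JSON-like structure to a fixed precision."""
    if isinstance(value, float):
        return round(value, places)
    if isinstance(value, dict):
        return {key: round_floats(item, places) for key, item in value.items()}
    if isinstance(value, list):
        return [round_floats(item, places) for item in value]
    return value
```

Applying both before canonical serialization removes two common sources of hash drift: platform-dependent float formatting and timestamps with varying precision.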
**Verification:**
```bash
# Run twice with same inputs, verify identical hashes
stella explain generate --finding "..." --output a.json
stella explain generate --finding "..." --output b.json
diff a.json b.json # Should be empty
# Or use built-in verify
stella explain verify-determinism --finding "..." --iterations 3
```
---
## 3. API Reference
### 3.1 Generate Explanation
```http
POST /api/policy/findings/{findingId}/explain
Authorization: Bearer <token>
Content-Type: application/json
{
"mode": "full",
"include_evidence": true,
"redaction_level": "standard"
}
```
### 3.2 Get Explanation
```http
GET /api/explanations/{explanationId}
Authorization: Bearer <token>
Accept: application/json
```
### 3.3 Export Explanation Bundle
```http
POST /api/explanations/export
Authorization: Bearer <token>
Content-Type: application/json
{
"finding_ids": ["...", "..."],
"include_dependencies": true,
"redaction_level": "standard"
}
```
### 3.4 Verify Explanation
```http
POST /api/explanations/{explanationId}/verify
Authorization: Bearer <token>
```
---
## 4. CLI Reference
```bash
# Generate explanation for a finding
stella explain generate --finding "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228"
# Export explanation bundle
stella explain export --findings ./finding-ids.txt --output ./bundle.tgz
# Verify explanation
stella explain verify --explanation ./explanation.json --dsse ./explanation.dsse
# Verify bundle
stella explain verify --bundle ./bundle.tgz
# Check determinism
stella explain verify-determinism --finding "..." --iterations 5
```
---
## 5. Related Documentation
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Graph Revision Schema](./graph-revision-schema.md) - Graph versioning
- [Policy API](../api/policy.md) - Policy Engine REST API
- [DSSE Predicates](../modules/attestor/architecture.md) - Signing specifications
---
_Last updated: 2025-12-13. See Sprint 0401 EXPLAIN-GAPS-401-064 for change history._

@@ -0,0 +1,377 @@
# Graph Revision Schema
_Last updated: 2025-12-13. Owner: Platform Guild._
This document defines the graph revision schema addressing gaps GR1-GR10 from the November 2025 product findings. It specifies manifest structure, hash algorithms, storage layout, lineage tracking, and governance rules for deterministic, auditable reachability graphs.
---
## 1. Overview
Graph revisions provide content-addressable, append-only versioning for `richgraph-v1` documents. Every graph mutation produces a new immutable revision with:
- **Deterministic hash:** BLAKE3-256 of canonical JSON
- **Lineage metadata:** Parent revision + diff summary
- **Cross-artifact digests:** Links to SBOM, VEX, policy, and tool versions
- **Audit trail:** Timestamp, actor, tenant, and operation type
---
## 2. Gap Resolutions
### GR1: Manifest Schema + Canonical Hash Rules
**Manifest schema:**
```json
{
"schema": "stellaops.graph.revision@v1",
"revision_id": "rev:blake3:a1b2c3d4e5f6...",
"graph_hash": "blake3:a1b2c3d4e5f6...",
"parent_revision_id": "rev:blake3:9f8e7d6c5b4a...",
"created_at": "2025-12-13T10:00:00Z",
"created_by": "service:scanner",
"tenant_id": "tenant:acme",
"shard_id": "shard:01",
"operation": "create",
"lineage": {
"depth": 3,
"root_revision_id": "rev:blake3:1a2b3c4d5e6f..."
},
"cross_artifacts": {
"sbom_digest": "sha256:...",
"vex_digest": "sha256:...",
"policy_digest": "sha256:...",
"analyzer_digest": "sha256:..."
},
"diff_summary": {
"nodes_added": 12,
"nodes_removed": 3,
"edges_added": 24,
"edges_removed": 8,
"roots_changed": false
}
}
```
**Canonical hash rules:**
1. JSON keys sorted alphabetically at all nesting levels
2. No whitespace/indentation (compact JSON)
3. UTF-8 encoding, no BOM
4. Arrays sorted by deterministic key (nodes by `id`, edges by `from,to,kind`)
5. Null/empty values omitted
6. Numeric values without trailing zeros
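The six rules can be sketched as a Python canonicalizer that produces the bytes to be hashed. Several points are assumptions: "empty" is read as `None`, `""`, `[]`, or `{}`; "without trailing zeros" is read as emitting integral floats as integers; and `canonical_graph_bytes` is a hypothetical name. BLAKE3 is not in Python's standard library, so the final hashing step is left to whichever BLAKE3 implementation is in use.

```python
import json


def _strip_empty(value):
    """Drop null/empty members and normalize integral floats (rules 5 and 6)."""
    if isinstance(value, dict):
        cleaned = {key: _strip_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items()
                if item is not None and item != "" and item != [] and item != {}}
    if isinstance(value, list):
        return [_strip_empty(item) for item in value]
    if isinstance(value, float) and value.is_integer():
        return int(value)  # 1.0 -> 1, so no trailing zeros are emitted
    return value


def canonical_graph_bytes(graph):
    """Return compact, key-sorted UTF-8 JSON with deterministically ordered arrays."""
    doc = _strip_empty(graph)
    if "nodes" in doc:
        doc["nodes"] = sorted(doc["nodes"], key=lambda n: n["id"])
    if "edges" in doc:
        doc["edges"] = sorted(doc["edges"], key=lambda e: (e["from"], e["to"], e["kind"]))
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")  # UTF-8 without BOM
```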
### GR2: Mandated BLAKE3-256 Encoding
All graph-level hashes use BLAKE3-256 with the following format:
```
blake3:{64_hex_chars}
```
Example:
```
blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
```
**Rationale:**
- BLAKE3 is 3x+ faster than SHA-256 on modern CPUs
- Parallelizable for large graphs (>100K nodes)
- Cryptographically secure (256-bit digest; BLAKE3 targets a 128-bit security level)
- Algorithm prefix enables future migration
### GR3: Append-Only Storage
Graph revisions are immutable. Operations:
| Operation | Creates New Revision | Modifies Existing |
|-----------|---------------------|-------------------|
| `create` | Yes | No |
| `update` | Yes | No |
| `merge` | Yes | No |
| `tombstone` | Yes | No |
| `read` | No | No |
**Storage layout:**
```
cas://reachability/
  revisions/
    {blake3}/                    # Revision manifest
    {blake3}.graph               # Graph body
    {blake3}.dsse                # DSSE envelope
  indices/
    by-tenant/{tenant_id}/       # Tenant index
    by-sbom/{sbom_digest}/       # SBOM correlation
    by-root/{root_revision_id}/  # Lineage tree
```
### GR4: Lineage/Diff Metadata
Every revision tracks its lineage:
```json
{
"lineage": {
"depth": 5,
"root_revision_id": "rev:blake3:...",
"parent_revision_id": "rev:blake3:...",
"merge_parents": []
},
"diff_summary": {
"nodes_added": 12,
"nodes_removed": 3,
"nodes_modified": 0,
"edges_added": 24,
"edges_removed": 8,
"edges_modified": 0,
"roots_added": 0,
"roots_removed": 0
},
"diff_detail_uri": "cas://reachability/diffs/{parent_hash}_{child_hash}.ndjson"
}
```
**Diff detail format (NDJSON):**
```ndjson
{"op":"add","path":"nodes","value":{"id":"sym:java:...","display":"..."}}
{"op":"remove","path":"edges","from":"sym:java:a","to":"sym:java:b"}
```
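The added/removed counts in `diff_summary` can be derived with set arithmetic. The sketch below assumes nodes are identified by `id` and edges by the `(from, to, kind)` triple, which matches the canonical sort keys but is not stated as the identity rule. `diff_summary` here is a hypothetical helper, and the `*_modified` counters are omitted because the source does not define what counts as a modification.

```python
def diff_summary(parent, child):
    """Count nodes and edges added or removed between two richgraph-v1 documents."""
    def node_ids(graph):
        return {node["id"] for node in graph.get("nodes", [])}

    def edge_keys(graph):
        return {(e["from"], e["to"], e["kind"]) for e in graph.get("edges", [])}

    parent_nodes, child_nodes = node_ids(parent), node_ids(child)
    parent_edges, child_edges = edge_keys(parent), edge_keys(child)
    return {
        "nodes_added": len(child_nodes - parent_nodes),
        "nodes_removed": len(parent_nodes - child_nodes),
        "edges_added": len(child_edges - parent_edges),
        "edges_removed": len(parent_edges - child_edges),
    }
```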
### GR5: Cross-Artifact Digests (SBOM/VEX/Policy/Tool)
Every revision links to related artifacts:
```json
{
"cross_artifacts": {
"sbom_digest": "sha256:...",
"sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
"sbom_format": "cyclonedx-1.6",
"vex_digest": "sha256:...",
"vex_uri": "cas://excititor/vex/openvex.json",
"policy_digest": "sha256:...",
"policy_version": "P-7:v4",
"analyzer_digest": "sha256:...",
"analyzer_name": "scanner.java",
"analyzer_version": "1.2.0"
}
}
```
### GR6: UI/CLI Surfacing of Full/Short IDs
**Full ID format:**
```
rev:blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
```
**Short ID format (for display):**
```
rev:a1b2c3d4
```
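Deriving the short form from a full ID is a prefix operation on the hex digest. A minimal sketch follows, assuming lowercase hex and an 8-character prefix as in the example above; `short_revision_id` is a hypothetical helper name.

```python
import re

# Full revision IDs: "rev:blake3:" followed by 64 lowercase hex characters.
FULL_REVISION_ID = re.compile(r"rev:blake3:([0-9a-f]{64})")


def short_revision_id(full_id, length=8):
    """Return the display form 'rev:<first hex chars>' of a full revision ID."""
    match = FULL_REVISION_ID.fullmatch(full_id)
    if match is None:
        raise ValueError(f"not a full revision ID: {full_id!r}")
    return "rev:" + match.group(1)[:length]
```

Short IDs are for display only; an 8-character prefix can collide, so lookups should resolve against the full ID.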
**CLI commands:**
```bash
# List revisions
stella graph revisions --scan-id scan-123
# Show full ID
stella graph revisions --scan-id scan-123 --full
# Output:
REVISION      CREATED              NODES  EDGES  PARENT
rev:a1b2c3d4  2025-12-13T10:00:00  1247   3891   rev:9f8e7d6c
rev:9f8e7d6c  2025-12-12T15:30:00  1235   3867   rev:1a2b3c4d
```
**UI display:**
- Revision chips show short ID with copy-to-clipboard for full ID
- Hover tooltip shows full ID and creation timestamp
- Lineage tree visualization available in "Revision History" drawer
### GR7: Shard/Tenant Context
Every revision includes partition context:
```json
{
"tenant_id": "tenant:acme",
"shard_id": "shard:01",
"namespace": "prod",
"workspace_id": "ws:default"
}
```
**Tenant isolation:**
- Revisions are tenant-scoped; cross-tenant access requires explicit grants
- Shard ID enables horizontal scaling and data locality
- Namespace supports multi-environment deployments
### GR8: Pin/Audit Governance
**Pinned revisions:**
Revisions can be pinned to prevent automatic retention cleanup:
```json
{
"pinned": true,
"pinned_at": "2025-12-13T10:00:00Z",
"pinned_by": "user:alice",
"pin_reason": "Audit retention for CVE-2021-44228 investigation",
"pin_expires_at": "2026-12-13T10:00:00Z"
}
```
**Audit events:**
All revision operations emit audit events:
```json
{
"event_type": "graph.revision.created",
"revision_id": "rev:blake3:...",
"actor": "service:scanner",
"tenant_id": "tenant:acme",
"timestamp": "2025-12-13T10:00:00Z",
"metadata": {
"operation": "create",
"parent_revision_id": "rev:blake3:...",
"graph_hash": "blake3:..."
}
}
```
### GR9: Retention/Tombstones
**Retention policy:**
| Category | Default Retention | Configurable |
|----------|-------------------|--------------|
| Latest revision | Forever | No |
| Intermediate revisions | 90 days | Yes |
| Tombstoned revisions | 30 days | Yes |
| Pinned revisions | Until unpin + 7 days | No |
**Tombstone format:**
```json
{
"schema": "stellaops.graph.revision@v1",
"revision_id": "rev:blake3:...",
"tombstone": true,
"tombstoned_at": "2025-12-13T10:00:00Z",
"tombstoned_by": "service:retention-worker",
"tombstone_reason": "retention_policy",
"successor_revision_id": "rev:blake3:..."
}
```
### GR10: Inclusion in Offline Kits
Offline kits include graph revisions for air-gapped deployments:
**Offline bundle manifest:**
```json
{
"schema": "stellaops.offline.bundle@v1",
"bundle_id": "bundle:2025-12-13",
"graph_revisions": [
{
"revision_id": "rev:blake3:...",
"graph_hash": "blake3:...",
"included_artifacts": ["graph", "dsse", "diff"]
}
],
"rekor_checkpoints": [
{
"log_id": "rekor.sigstore.dev",
"checkpoint": "...",
"verified_at": "2025-12-13T10:00:00Z"
}
],
"signature": {
"algorithm": "ecdsa-p256",
"value": "base64:...",
"public_key_id": "key:offline-signing-2025"
}
}
```
**Import verification:**
```bash
stella offline import --bundle ./offline-bundle.tgz --verify
# Output:
Bundle: bundle:2025-12-13
Graph Revisions: 5
Rekor Checkpoints: 2
Verifying signatures...
Bundle signature: VALID
DSSE envelopes: 5/5 VALID
Rekor checkpoints: 2/2 VERIFIED
Import complete.
```
---
## 3. API Reference
### 3.1 Create Revision
```http
POST /api/graph/revisions
Authorization: Bearer <token>
Content-Type: application/json
{
"graph": { ... richgraph-v1 ... },
"parent_revision_id": "rev:blake3:...",
"cross_artifacts": { ... }
}
```
### 3.2 Get Revision
```http
GET /api/graph/revisions/{revision_id}
Authorization: Bearer <token>
```
### 3.3 List Revisions
```http
GET /api/graph/revisions?tenant_id=acme&sbom_digest=sha256:...&limit=20
Authorization: Bearer <token>
```
### 3.4 Diff Revisions
```http
GET /api/graph/revisions/diff?from={rev_a}&to={rev_b}
Authorization: Bearer <token>
```
---
## 4. Related Documentation
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [CAS Infrastructure](../contracts/cas-infrastructure.md) - Content-addressable storage
- [Offline Kit](../OFFLINE_KIT.md) - Air-gap deployment
---
_Last updated: 2025-12-13. See Sprint 0401 GRAPHREV-GAPS-401-063 for change history._

@@ -0,0 +1,337 @@
# Ground Truth Schema for Reachability Datasets
> **Status:** Design v1 (Sprint 0401)
> **Owners:** Scanner Guild, Signals Guild, Quality Guild
This document defines the ground truth schema for test datasets used to validate reachability analysis. Ground truth samples provide known-correct answers for benchmarking lattice state calculations, path discovery, and policy gate decisions.
---
## 1. Purpose
Ground truth datasets enable:
1. **Regression testing:** Detect regressions in reachability analysis accuracy
2. **Benchmark scoring:** Measure precision, recall, F1 for path discovery
3. **Lattice validation:** Verify join/meet operations produce expected states
4. **Policy gate testing:** Ensure gates block/allow correct VEX transitions
---
## 2. Dataset Structure
### 2.1 Directory Layout
```
datasets/reachability/
├── samples/
│   ├── java/
│   │   ├── vulnerable-log4j/
│   │   │   ├── manifest.json        # Sample metadata
│   │   │   ├── richgraph-v1.json    # Input callgraph
│   │   │   ├── ground-truth.json    # Expected outcomes
│   │   │   └── artifacts/           # Source binaries/SBOMs
│   │   └── safe-spring-boot/
│   │       └── ...
│   ├── native/
│   │   ├── stripped-elf/
│   │   └── openssl-vuln/
│   └── polyglot/
│       └── node-native-addon/
├── corpus/
│   ├── positive/                    # Known reachable samples
│   ├── negative/                    # Known unreachable samples
│   └── contested/                   # Known conflict samples
└── schema/
    ├── manifest.schema.json
    └── ground-truth.schema.json
```
### 2.2 Sample Manifest (`manifest.json`)
```json
{
"sampleId": "sample:java:vulnerable-log4j:001",
"version": "1.0.0",
"createdAt": "2025-12-13T10:00:00Z",
"language": "java",
"category": "positive",
"description": "Log4Shell CVE-2021-44228 reachable via JNDI lookup in logging path",
"source": {
"repository": "https://github.com/example/vuln-app",
"commit": "abc123...",
"buildToolchain": "maven:3.9.0,jdk:17"
},
"vulnerabilities": [
{
"vulnId": "CVE-2021-44228",
"purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
"affectedSymbol": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup"
}
],
"artifacts": [
{
"path": "artifacts/app.jar",
"hash": "sha256:...",
"type": "application/java-archive"
},
{
"path": "artifacts/sbom.cdx.json",
"hash": "sha256:...",
"type": "application/vnd.cyclonedx+json"
}
]
}
```
### 2.3 Ground Truth Document (`ground-truth.json`)
```json
{
"schema": "ground-truth-v1",
"sampleId": "sample:java:vulnerable-log4j:001",
"generatedAt": "2025-12-13T10:00:00Z",
"generator": {
"name": "manual-annotation",
"version": "1.0.0",
"annotator": "security-team"
},
"targets": [
{
"symbolId": "sym:java:...",
"display": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup",
"purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
"expected": {
"latticeState": "CR",
"bucket": "direct",
"reachable": true,
"confidence": 0.95,
"pathLength": 3,
"path": [
"sym:java:...main",
"sym:java:...logInfo",
"sym:java:...JndiLookup.lookup"
]
},
"reasoning": "Direct call path from main() through logging framework to vulnerable lookup method"
},
{
"symbolId": "sym:java:...",
"display": "org.apache.logging.log4j.core.net.JndiManager.lookup",
"purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
"expected": {
"latticeState": "CU",
"bucket": "unreachable",
"reachable": false,
"confidence": 0.90,
"pathLength": null,
"path": null
},
"reasoning": "JndiManager.lookup is present but not called from any reachable entry point"
}
],
"entryPoints": [
{
"symbolId": "sym:java:...",
"display": "com.example.app.Main.main",
"phase": "runtime",
"source": "manifest"
}
],
"expectedUncertainty": {
"states": [],
"aggregateTier": "T4",
"riskScore": 0.0
},
"expectedGateDecisions": [
{
"vulnId": "CVE-2021-44228",
"targetSymbol": "sym:java:...JndiLookup.lookup",
"requestedStatus": "not_affected",
"expectedDecision": "block",
"expectedBlockedBy": "LatticeState",
"expectedReason": "CR state incompatible with not_affected"
},
{
"vulnId": "CVE-2021-44228",
"targetSymbol": "sym:java:...JndiLookup.lookup",
"requestedStatus": "affected",
"expectedDecision": "allow"
}
]
}
```
---
## 3. Schema Definitions
### 3.1 Ground Truth Target
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `symbolId` | string | Yes | Canonical SymbolID (`sym:{lang}:{hash}`) |
| `display` | string | No | Human-readable symbol name |
| `purl` | string | No | Package URL of containing package |
| `expected.latticeState` | enum | Yes | Expected v1 lattice state: `U`, `SR`, `SU`, `RO`, `RU`, `CR`, `CU`, `X` |
| `expected.bucket` | enum | Yes | Expected v0 bucket (backward compat) |
| `expected.reachable` | boolean | Yes | True if symbol is reachable from any entry point |
| `expected.confidence` | number | Yes | Expected confidence score [0.0-1.0] |
| `expected.pathLength` | number | No | Expected path length (null if unreachable) |
| `expected.path` | string[] | No | Expected path, ordered from entry point to target (null if unreachable) |
| `reasoning` | string | Yes | Human explanation of expected outcome |
### 3.2 Expected Gate Decision
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `vulnId` | string | Yes | Vulnerability identifier |
| `targetSymbol` | string | Yes | Target SymbolID |
| `requestedStatus` | enum | Yes | VEX status: `affected`, `not_affected`, `under_investigation`, `fixed` |
| `expectedDecision` | enum | Yes | Gate outcome: `allow`, `block`, `warn` |
| `expectedBlockedBy` | string | No | Gate name if blocked |
| `expectedReason` | string | No | Expected reason message |
---
## 4. Sample Categories
### 4.1 Positive Samples (Reachable)
Known-reachable cases where vulnerable code is called:
- **direct-call:** Vulnerable function called directly from entry point
- **transitive:** Multi-hop path from entry point to vulnerable function
- **runtime-observed:** Confirmed reachable via runtime probe
- **init-array:** Reachable via load-time constructor
### 4.2 Negative Samples (Unreachable)
Known-unreachable cases where vulnerable code exists but isn't called:
- **dead-code:** Function present but never invoked
- **conditional-unreachable:** Function behind impossible condition
- **test-only:** Function only reachable from test entry points
- **deprecated-api:** Old API present but replaced by new implementation
### 4.3 Contested Samples
Cases where static and runtime evidence conflict:
- **static-reach-runtime-miss:** Static analysis finds path, runtime never observes
- **static-miss-runtime-hit:** Static analysis misses path, runtime observes execution
- **version-mismatch:** Analysis version differs from runtime version
---
## 5. Benchmark Metrics
### 5.1 Path Discovery Metrics
```
Precision = TruePositive / (TruePositive + FalsePositive)
Recall = TruePositive / (TruePositive + FalseNegative)
F1 = 2 * (Precision * Recall) / (Precision + Recall)
```
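The three formulas translate directly into code. The sketch below adds one assumption of its own: when a denominator is zero the metric is reported as 0.0 instead of raising. `path_discovery_metrics` is a hypothetical name.

```python
def path_discovery_metrics(true_positive, false_positive, false_negative):
    """Compute precision, recall, and F1 from raw counts (0.0 when undefined)."""
    predicted = true_positive + false_positive
    actual = true_positive + false_negative
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, 9 true positives with 1 false positive and 3 false negatives give precision 0.9 and recall 0.75.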
### 5.2 Lattice State Accuracy
```
StateAccuracy = CorrectStates / TotalTargets
BucketAccuracy = CorrectBuckets / TotalTargets (v0 compatibility)
```
### 5.3 Gate Decision Accuracy
```
GateAccuracy = CorrectDecisions / TotalGateTests
FalseAllow = AllowedWhenShouldBlock / TotalBlocks (critical metric)
FalseBlock = BlockedWhenShouldAllow / TotalAllows
```
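A sketch of the gate metrics over `(expected, actual)` decision pairs. It reads `TotalBlocks` and `TotalAllows` as the number of cases whose expected decision is `block` or `allow` respectively, which is an interpretation of the formulas above; `gate_metrics` is a hypothetical name.

```python
def gate_metrics(results):
    """Compute gate accuracy and false allow/block rates from (expected, actual) pairs."""
    results = list(results)
    correct = sum(1 for expected, actual in results if expected == actual)
    should_block = [actual for expected, actual in results if expected == "block"]
    should_allow = [actual for expected, actual in results if expected == "allow"]

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "gate_accuracy": ratio(correct, len(results)),
        # Critical metric: a gate allowed a transition it should have blocked.
        "false_allow": ratio(should_block.count("allow"), len(should_block)),
        "false_block": ratio(should_allow.count("block"), len(should_allow)),
    }
```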
---
## 6. Test Harness Integration
### 6.1 xUnit Test Pattern
```csharp
[Theory]
[MemberData(nameof(GetGroundTruthSamples))]
public async Task ReachabilityAnalysis_MatchesGroundTruth(GroundTruthSample sample)
{
    // Arrange
    var graph = await LoadRichGraphAsync(sample.GraphPath);
    var scorer = _serviceProvider.GetRequiredService<ReachabilityScoringService>();

    // Act
    var result = await scorer.ComputeAsync(graph, sample.EntryPoints);

    // Assert
    foreach (var target in sample.Targets)
    {
        var actual = result.States.First(s => s.SymbolId == target.SymbolId);
        Assert.Equal(target.Expected.LatticeState, actual.LatticeState);
        Assert.Equal(target.Expected.Reachable, actual.Reachable);
        Assert.InRange(actual.Confidence,
            target.Expected.Confidence - 0.05,
            target.Expected.Confidence + 0.05);
    }
}
```
### 6.2 Benchmark Runner
```bash
# Run reachability benchmarks
dotnet run --project src/Scanner/__Tests/StellaOps.Scanner.Reachability.Benchmarks \
--dataset datasets/reachability/samples \
--output benchmark-results.json \
--threshold-f1 0.95 \
--threshold-gate-accuracy 0.99
```
---
## 7. Sample Contribution Guidelines
### 7.1 Adding New Samples
1. Create directory under `datasets/reachability/samples/{language}/{sample-name}/`
2. Add `manifest.json` with sample metadata
3. Add `richgraph-v1.json` (run scanner on artifacts)
4. Create `ground-truth.json` with manual annotations
5. Include reasoning for each expected outcome
6. Run validation: `dotnet test --filter "GroundTruth"`
### 7.2 Ground Truth Validation
Ground truth files must pass schema validation:
```bash
# Quote the glob so the validator expands it instead of the shell
npx ajv validate -s docs/modules/reach-graph/schemas/ground-truth.schema.json \
  -d "datasets/reachability/samples/**/ground-truth.json"
```
### 7.3 Review Requirements
- All samples require two independent annotators
- Contested samples require security team review
- Changes to existing samples require regression test pass
---
## 8. Related Documents
- [Lattice Model](./lattice.md) — v1 formal 7-state lattice
- [Policy Gates](./policy-gate.md) — Gate rules for VEX decisions
- [Evidence Schema](./evidence-schema.md) — richgraph-v1 schema
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) — Full schema specification
---
## Changelog
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial design from Sprint 0401 |

@@ -0,0 +1,129 @@
# Runtime + Static Reachability Union Schema (v0.1, 2025-11-23)
## Goals
- Provide a single, deterministic graph shape that merges static lifter output and runtime traces across languages.
- Keep SymbolID stable across hosts (path/location independent) so CAS lookups are reproducible and cacheable.
- Make outputs offline-friendly: line-delimited JSON, UTF-8, sorted, with explicit content hashes.
## File layout (CAS)
- Namespace root: `reachability_graphs/<analysis_id>/` (`analysis_id` is a caller-supplied UUID or hash).
- Files (all NDJSON, UTF-8, newline terminated, sorted as noted):
- `nodes.ndjson` (sorted by `symbol_id`)
- `edges.ndjson` (sorted by `from` then `to` then `edge_type`)
- `facts_runtime.ndjson` (sorted by `symbol_id`, optional)
- `meta.json` (single JSON object; schema version, produced_by, timestamps, tool versions, hashes)
- Hashing: SHA-256 of each file recorded in `meta.json` under `files[]` with `path`, `sha256`, `records`.
- Compression/packaging is left to the CAS store; files must be valid uncompressed NDJSON first.
## SymbolID (language-agnostic envelope)
```
symbol_id = "sym:" + <lang> + ":" + <stable-fragment>
```
- `lang`: `java|dotnet|go|node|deno|rust|swift|shell|binary`
- `stable-fragment`: SHA-256 digest of the canonical tuple for the language, encoded as base64url without padding:
- **java**: (`package`, `class`, `method`, `descriptor`) lowercased, descriptor in JVM format.
- **dotnet**: (`assembly_name`, `namespace`, `type`, `member_signature`) using ECMA-335 signature string.
- **node/deno**: (`pkg_name_or_path`, `export_path`, `kind`) where `export_path` is slash-joined ESM/CJS path; `pkg_name_or_path` uses npm name or normalized absolute path with drive stripped.
- **go**: (`module_path`, `package_path`, `receiver`, `func`), with receiver empty for functions.
- **rust**: (`crate`, `module_path`, `item_name`, `mangled`)
- **swift**: (`module`, `type`, `member`, `swift-mangled`)
- **shell**: (`script_relpath`, `function_or_cmd`)
- **binary**: (`binary_build_id`, `section`, `symbol_name`)
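As a minimal sketch of the envelope above, the following Python derives a SymbolID for the Go tuple. The spec does not say how the tuple fields are joined into bytes before hashing, so the NUL separator (`FIELD_SEPARATOR`) and the helper names are assumptions made for illustration only.

```python
import base64
import hashlib
import unicodedata

# Assumption: the spec does not define the byte encoding of the canonical
# tuple, so fields are joined with a NUL separator purely for illustration.
FIELD_SEPARATOR = "\x00"


def stable_fragment(fields):
    """Return the unpadded base64url SHA-256 digest of a canonical tuple."""
    normalized = [unicodedata.normalize("NFC", field) for field in fields]
    payload = FIELD_SEPARATOR.join(normalized).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def symbol_id(lang, fields):
    """Build `sym:<lang>:<stable-fragment>` from an already-canonical tuple."""
    return f"sym:{lang}:{stable_fragment(fields)}"


def go_symbol_id(module_path, package_path, func, receiver=""):
    # Tuple order follows the spec: (module_path, package_path, receiver, func);
    # the receiver stays empty for plain functions.
    return symbol_id("go", [module_path, package_path, receiver, func])
```

Because nothing host-specific (absolute paths, line numbers) enters the hash input, the same symbol yields the same ID on any machine, which is what makes CAS lookups cacheable.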
## nodes.ndjson
Each line is one JSON object (pretty-printed here for readability; serialized on a single line in the file):
```
{
"symbol_id": "sym:lang:...",
"lang": "dotnet",
"kind": "function|method|type|module|package|binary",
"display": "Human readable name",
"source": {
"file": "relative/or/pkg/path",
"line": 123,
"col": 1,
"digest": "sha256:<hex>"
},
"attributes": {
"visibility": "public|internal|private",
"async": true,
"static": false,
"generic_arity": 2
}
}
```
Fields are optional when not applicable; omit them rather than emitting `null`. Additional language-specific fields are allowed inside `attributes` (e.g., `jvm_descriptor`, `dotnet_signature`).
## edges.ndjson
Each line is one JSON object, static or runtime-derived (see `source`); pretty-printed here for readability:
```
{
"from": "sym:...",
"to": "sym:...",
"edge_type": "call|import|inherits|loads|dynamic|reflects|dlopen|ffi|wasm|spawn",
"confidence": "certain|high|medium|low",
"source": {
"origin": "static|runtime",
"provenance": "jvm-bytecode|il|ts-ast|ssa|ebpf|etw|jfr|hook",
"evidence": "file:path:line"
}
}
```
- Ordering: primary `from`, secondary `to`, tertiary `edge_type`.
- Duplicate edges with different provenance are allowed; consumers deduplicate by (`from`,`to`,`edge_type`,`provenance`).
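The ordering and de-duplication rules for edges could be applied roughly as follows. This is a sketch, not the scanner's implementation: the provenance tie-break among edges that share (`from`, `to`, `edge_type`) is an assumption added so the output does not depend on input order, and `normalize_edges` is a hypothetical helper name.

```python
def dedupe_key(edge):
    """Identity used by consumers: (from, to, edge_type, provenance)."""
    provenance = edge.get("source", {}).get("provenance", "")
    return (edge["from"], edge["to"], edge["edge_type"], provenance)


def normalize_edges(edges, dedupe=True):
    """Sort edges by from/to/edge_type and optionally drop exact duplicates.

    Sorting on the full dedupe key keeps the specified primary, secondary,
    and tertiary order; provenance as a fourth key is an assumed tie-break.
    """
    ordered = sorted(edges, key=dedupe_key)
    if not dedupe:
        return ordered
    seen = set()
    unique = []
    for edge in ordered:
        key = dedupe_key(edge)
        if key not in seen:
            seen.add(key)
            unique.append(edge)
    return unique
```

Edges that differ only in provenance survive de-duplication, so a static `ssa` edge and a runtime `ebpf` edge between the same two symbols are both kept.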
## facts_runtime.ndjson (optional)
Runtime-only observations attached to symbols:
```
{
"symbol_id": "sym:...",
"samples": {
"call_count": 14,
"first_seen_utc": "2025-11-22T18:21:12Z",
"last_seen_utc": "2025-11-22T18:23:01Z"
},
"env": {
"pid": 1234,
"image": "sha256:...",
"entrypoint": "main",
"tags": ["sealed","offline"]
}
}
```
Records are sorted by `symbol_id`. Time fields must be UTC ISO-8601 with a `Z` suffix.
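One way to produce conforming time fields is to convert every timestamp to UTC before formatting. The sketch below assumes second precision, matching the examples above; the spec does not state whether fractional seconds are permitted, and `to_utc_z` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def to_utc_z(moment):
    """Format an aware datetime as UTC ISO-8601 with a trailing `Z`."""
    if moment.tzinfo is None:
        # Naive datetimes are ambiguous, so refuse them instead of guessing.
        raise ValueError("naive datetime: an explicit timezone is required")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A probe running at UTC+2 still records the observation in UTC.
local_observation = datetime(2025, 11, 22, 20, 21, 12,
                             tzinfo=timezone(timedelta(hours=2)))
first_seen_utc = to_utc_z(local_observation)  # "2025-11-22T18:21:12Z"
```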
## meta.json
```
{
"schema": "reachability-union@0.1",
"generated_at": "2025-11-23T00:00:00Z",
"produced_by": {
"tool": "StellaOps.Scanner.Worker",
"version": "0.1.0",
"analyzers": ["dotnet-11.1.0","jvm-8.0.0","node-6.2.0"]
},
"files": [
{"path":"nodes.ndjson","sha256":"...","records":1234},
{"path":"edges.ndjson","sha256":"...","records":4567},
{"path":"facts_runtime.ndjson","sha256":"...","records":89}
],
"options": {
"dedupe_edges": false,
"include_runtime": true
}
}
```
## Determinism rules
- Sort order as noted; no nulls; omit empty objects/arrays.
- All strings are UTF-8 and NFC-normalized; booleans are lower-case (`true`/`false`); `edge_type` must be one of the values enumerated above.
- Hash inputs use exact serialized bytes (no trailing spaces, newline `\n` only).
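The rules above might be combined into a writer along these lines. Treat it as a sketch under stated assumptions: sorted object keys and compact separators are not mandated by this spec for the union files, the `sha256` value is emitted as bare hex, and `clean`, `to_ndjson_bytes`, and `file_entry` are hypothetical names.

```python
import hashlib
import json
import unicodedata


def _is_empty(value):
    return value is None or value == {} or value == []


def clean(value):
    """NFC-normalize strings and drop nulls and empty objects/arrays."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, dict):
        cleaned = {unicodedata.normalize("NFC", k): clean(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if not _is_empty(v)}
    if isinstance(value, list):
        return [item for item in (clean(v) for v in value) if not _is_empty(item)]
    return value


def to_ndjson_bytes(records, sort_key):
    """Serialize records as sorted, newline-terminated UTF-8 NDJSON."""
    ordered = sorted((clean(record) for record in records), key=sort_key)
    lines = [
        # Assumption: sorted keys and compact separators for byte stability.
        json.dumps(record, ensure_ascii=False, sort_keys=True, separators=(",", ":"))
        for record in ordered
    ]
    return "".join(line + "\n" for line in lines).encode("utf-8")


def file_entry(path, payload):
    """Build a `files[]` entry for meta.json from the exact serialized bytes."""
    return {
        "path": path,
        "sha256": hashlib.sha256(payload).hexdigest(),
        # json.dumps escapes newlines inside strings, so each "\n" ends a record.
        "records": payload.count(b"\n"),
    }
```

Hashing the bytes that are actually written, rather than re-serializing the records later, avoids any mismatch between the stored file and the digest recorded in `meta.json`.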
## Validation
- JSON Schema draft 2020-12 available at `docs/modules/reach-graph/schemas/runtime-static-union-schema.json` (to be generated from this spec; allowable values match enumerations above).
- Minimal required fields: `symbol_id`, `lang`, `kind` (nodes); `from`, `to`, `edge_type`, `source.origin` (edges).
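Until the JSON Schema is generated, a lightweight pre-check of the minimal required fields could look like the sketch below. The function names are hypothetical, and the check covers only field presence, not the enumerations.

```python
REQUIRED_NODE_FIELDS = ("symbol_id", "lang", "kind")
REQUIRED_EDGE_FIELDS = ("from", "to", "edge_type")


def missing_node_fields(node):
    """Return the required node fields that are absent, in spec order."""
    return [field for field in REQUIRED_NODE_FIELDS if field not in node]


def missing_edge_fields(edge):
    """Return the required edge fields that are absent, including source.origin."""
    missing = [field for field in REQUIRED_EDGE_FIELDS if field not in edge]
    source = edge.get("source")
    if not isinstance(source, dict) or "origin" not in source:
        missing.append("source.origin")
    return missing
```

An empty list means the record meets the minimum; anything else names exactly which fields to fix.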
## Integration guidance
- Static lifters must emit SymbolIDs using the language rules; runtime probes must map call targets to the same SymbolID space (via demangled names + package/module resolution).
- CAS writers store each file under the namespace path and return the root manifest path for downstream consumers (Signals, Replay, Policy).
- Consumers should treat runtime edges as additive; when both origins exist, prefer `origin=runtime` for exploitability scoring but keep static edges for coverage.
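The last point can be illustrated with a small consumer-side sketch. It assumes edges are grouped by (`from`, `to`, `edge_type`) and that the first runtime edge in a group is used for scoring; both the grouping key and the name `select_scoring_edges` are assumptions, not part of the spec.

```python
from collections import defaultdict


def select_scoring_edges(edges):
    """Pick one edge per (from, to, edge_type), preferring origin=runtime.

    The input list is left untouched, so static edges remain available
    for coverage reporting.
    """
    grouped = defaultdict(list)
    for edge in edges:
        grouped[(edge["from"], edge["to"], edge["edge_type"])].append(edge)

    selected = []
    for key in sorted(grouped):
        candidates = grouped[key]
        runtime = [e for e in candidates if e["source"]["origin"] == "runtime"]
        # Fall back to the static evidence when nothing was observed at runtime.
        selected.append((runtime or candidates)[0])
    return selected
```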

@@ -0,0 +1,243 @@
# Reachability Slice Schema
_Last updated: 2025-12-22. Owner: Scanner Guild._
This document defines the **Reachability Slice** schema - a minimal, attestable proof unit that answers whether a vulnerable symbol is reachable from application entrypoints.
---
## 1. Overview
A **slice** is a focused subgraph extracted from a full reachability graph, containing only the nodes and edges relevant to answering a specific reachability query (for example, "Is CVE-2024-1234's vulnerable function reachable?").
### Key Properties
| Property | Description |
|----------|-------------|
| **Minimal** | Contains only nodes/edges on paths between entrypoints and targets |
| **Attestable** | DSSE-signed with a dedicated slice predicate |
| **Reproducible** | Same inputs -> same bytes (deterministic) |
| **Content-addressed** | Retrieved by BLAKE3 digest |
---
## 2. Predicate Type & Schema
- Predicate type: `stellaops.dev/predicates/reachability-slice@v1`
- JSON schema: `https://stellaops.dev/schemas/stellaops-slice.v1.schema.json`
- DSSE payload type: `application/vnd.stellaops.slice.v1+json`
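For orientation, DSSE signs a pre-authentication encoding (PAE) of the payload type and payload rather than the raw payload. The sketch below shows that encoding with the slice payload type; it does not sign anything, and how StellaOps wires keys and signers is not covered here, so `unsigned_envelope` is only a hypothetical placeholder.

```python
import base64

SLICE_PAYLOAD_TYPE = "application/vnd.stellaops.slice.v1+json"


def pae(payload_type, payload):
    """DSSE v1 pre-authentication encoding: the bytes a signer actually signs."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def unsigned_envelope(payload):
    """Envelope skeleton; a real signer would append entries to `signatures`."""
    return {
        "payloadType": SLICE_PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [],
    }
```

Because the payload type is part of the signed bytes, a signature over a slice cannot be replayed as a signature over a different predicate that happens to share the same JSON body.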
---
## 3. Schema Structure
### 3.1 ReachabilitySlice
```csharp
public sealed record ReachabilitySlice
{
[JsonPropertyName("_type")]
public string Type { get; init; } = "stellaops.dev/predicates/reachability-slice@v1";
[JsonPropertyName("inputs")]
public required SliceInputs Inputs { get; init; }
[JsonPropertyName("query")]
public required SliceQuery Query { get; init; }
[JsonPropertyName("subgraph")]
public required SliceSubgraph Subgraph { get; init; }
[JsonPropertyName("verdict")]
public required SliceVerdict Verdict { get; init; }
[JsonPropertyName("manifest")]
public required ScanManifest Manifest { get; init; }
}
```
### 3.2 SliceInputs
```csharp
public sealed record SliceInputs
{
public required string GraphDigest { get; init; }
public ImmutableArray<string> BinaryDigests { get; init; }
public string? SbomDigest { get; init; }
public ImmutableArray<string> LayerDigests { get; init; }
}
```
### 3.3 SliceQuery
```csharp
public sealed record SliceQuery
{
public string? CveId { get; init; }
public ImmutableArray<string> TargetSymbols { get; init; }
public ImmutableArray<string> Entrypoints { get; init; }
public string? PolicyHash { get; init; }
}
```
### 3.4 SliceSubgraph, Nodes, Edges
```csharp
public sealed record SliceSubgraph
{
public ImmutableArray<SliceNode> Nodes { get; init; }
public ImmutableArray<SliceEdge> Edges { get; init; }
}
public sealed record SliceNode
{
public required string Id { get; init; }
public required string Symbol { get; init; }
public required SliceNodeKind Kind { get; init; } // entrypoint | intermediate | target | unknown
public string? File { get; init; }
public int? Line { get; init; }
public string? Purl { get; init; }
public IReadOnlyDictionary<string, string>? Attributes { get; init; }
}
public sealed record SliceEdge
{
public required string From { get; init; }
public required string To { get; init; }
public SliceEdgeKind Kind { get; init; } // direct | plt | iat | dynamic | unknown
public double Confidence { get; init; }
public string? Evidence { get; init; }
public SliceGateInfo? Gate { get; init; }
public ObservedEdgeMetadata? Observed { get; init; }
}
```
### 3.5 SliceVerdict
```csharp
public sealed record SliceVerdict
{
public required SliceVerdictStatus Status { get; init; }
public required double Confidence { get; init; }
public ImmutableArray<string> Reasons { get; init; }
public ImmutableArray<string> PathWitnesses { get; init; }
public int UnknownCount { get; init; }
public ImmutableArray<GatedPath> GatedPaths { get; init; }
}
```
`SliceVerdictStatus` values (snake_case):
- `reachable`
- `unreachable`
- `unknown`
- `gated`
- `observed_reachable`
### 3.6 ScanManifest
`ScanManifest` is imported from `StellaOps.Scanner.Core` and includes required fields for reproducibility:
- `scanId`
- `createdAtUtc`
- `artifactDigest`
- `scannerVersion`
- `workerVersion`
- `concelierSnapshotHash`
- `excititorSnapshotHash`
- `latticePolicyHash`
- `deterministic`
- `seed` (base64-encoded 32-byte seed)
- `knobs` (string map)
`artifactPurl` is optional.
---
## 4. Verdict Computation Rules
```
reachable := path_exists AND min(path_confidence) > 0.7 AND unknown_edges == 0
unreachable := NOT path_exists AND unknown_edges == 0
unknown := otherwise
```
`gated` and `observed_reachable` are reserved for feature-gate and runtime-observed paths (see Sprint 3830 and 3840).
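The three base rules can be read as the following sketch. It makes two assumptions the rules leave open: a path's confidence is the minimum of its edge confidences, and when several paths exist the strongest one decides. `compute_verdict` and its parameters are hypothetical names, and the reserved `gated` and `observed_reachable` states are not modeled.

```python
def compute_verdict(paths, unknown_edges, threshold=0.7):
    """Return "reachable", "unreachable", or "unknown".

    `paths` holds one list of edge confidences per entrypoint-to-target path.
    """
    if unknown_edges != 0:
        # Any unknown edge blocks both definite verdicts.
        return "unknown"
    if not paths:
        return "unreachable"
    # Assumption: score each path by its weakest edge, then take the best path.
    best = max(min(path, default=1.0) for path in paths)
    return "reachable" if best > threshold else "unknown"
```

With the example slice in section 5, the single path has edge confidences 1.0, 0.95, and 0.9, so its weakest edge (0.9) exceeds 0.7 and the verdict is `reachable`.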
---
## 5. Example Slice
```json
{
"_type": "stellaops.dev/predicates/reachability-slice@v1",
"inputs": {
"graphDigest": "blake3:a1b2c3d4e5f6789012345678901234567890123456789012345678901234abcd",
"binaryDigests": ["sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef"],
"sbomDigest": "sha256:cafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabe"
},
"query": {
"cveId": "CVE-2024-1234",
"targetSymbols": ["openssl:EVP_PKEY_decrypt"],
"entrypoints": ["main", "http_handler"]
},
"subgraph": {
"nodes": [
{"id": "node:1", "symbol": "main", "kind": "entrypoint", "file": "/app/main.c", "line": 42},
{"id": "node:2", "symbol": "process_request", "kind": "intermediate", "file": "/app/handler.c", "line": 100},
{"id": "node:3", "symbol": "decrypt_data", "kind": "intermediate", "file": "/app/crypto.c", "line": 55},
{"id": "node:4", "symbol": "EVP_PKEY_decrypt", "kind": "target", "purl": "pkg:generic/openssl@3.0.0"}
],
"edges": [
{"from": "node:1", "to": "node:2", "kind": "direct", "confidence": 1.0},
{"from": "node:2", "to": "node:3", "kind": "direct", "confidence": 0.95},
{"from": "node:3", "to": "node:4", "kind": "plt", "confidence": 0.9}
]
},
"verdict": {
"status": "reachable",
"confidence": 0.9,
"reasons": ["path_exists_high_confidence"],
"pathWitnesses": ["main -> process_request -> decrypt_data -> EVP_PKEY_decrypt"],
"unknownCount": 0
},
"manifest": {
"scanId": "scan-1234",
"createdAtUtc": "2025-12-22T10:00:00Z",
"artifactDigest": "sha256:00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff",
"artifactPurl": "pkg:generic/app@1.0.0",
"scannerVersion": "scanner.native:1.2.0",
"workerVersion": "scanner.worker:1.2.0",
"concelierSnapshotHash": "sha256:1111222233334444555566667777888899990000aaaabbbbccccddddeeeeffff",
"excititorSnapshotHash": "sha256:2222333344445555666677778888999900001111aaaabbbbccccddddeeeeffff",
"latticePolicyHash": "sha256:3333444455556666777788889999000011112222aaaabbbbccccddddeeeeffff",
"deterministic": true,
"seed": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=",
"knobs": { "maxDepth": "20" }
}
}
```
---
## 6. Determinism Requirements
For reproducible slices:
1. **Node ordering**: Sort by `id` (ordinal).
2. **Edge ordering**: Sort by `from`, then `to`, then `kind`.
3. **Strings**: Trim and de-duplicate lists (`targetSymbols`, `entrypoints`, `reasons`).
4. **Timestamps**: Use UTC ISO-8601 with `Z` suffix.
5. **JSON serialization**: Canonical JSON (sorted keys, no whitespace).
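A rough sketch of rules 1, 2, 3, and 5 follows. It uses Python's `json.dumps` with sorted keys and compact separators as a stand-in for canonical JSON; this is not a full canonicalization scheme (number formatting in particular may differ from other serializers), and `canonical_slice_bytes` is a hypothetical helper name.

```python
import copy
import json


def trim_dedupe(values):
    """Trim each string and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(value.strip() for value in values))


def canonical_slice_bytes(slice_doc):
    """Return deterministic bytes for a slice given as a plain dict."""
    doc = copy.deepcopy(slice_doc)  # never mutate the caller's document

    subgraph = doc["subgraph"]
    # Python compares strings by code point, i.e. ordinal ordering.
    subgraph["nodes"] = sorted(subgraph["nodes"], key=lambda n: n["id"])
    subgraph["edges"] = sorted(
        subgraph["edges"], key=lambda e: (e["from"], e["to"], e["kind"])
    )

    for section, field in (
        ("query", "targetSymbols"),
        ("query", "entrypoints"),
        ("verdict", "reasons"),
    ):
        if field in doc.get(section, {}):
            doc[section][field] = trim_dedupe(doc[section][field])

    return json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
```

Two producers that discover the same nodes and edges in a different order should end up with identical bytes, and therefore the same content digest.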
---
## 7. Related Documentation
- [Binary Reachability Schema](./binary-reachability-schema.md)
- [RichGraph Contract](../contracts/richgraph-v1.md)
- [Function-Level Evidence](./function-level-evidence.md)
- [Replay Verification](./replay-verification.md)
---
_Created: 2025-12-22. See Sprint 3810 for implementation details._