docs consolidation and others

This commit is contained in:
master
2026-01-06 19:02:21 +02:00
parent d7bdca6d97
commit 4789027317
849 changed files with 16551 additions and 66770 deletions


@@ -0,0 +1,202 @@
# Reachability Evidence Delivery Guide
_Last updated: November 8, 2025. Owner: Reachability Tiger Team (Scanner, Signals, Replay, Policy, Authority, UI)._
This guide translates the deterministic reachability blueprint into concrete work streams that average contributors can pick up without re-reading the entire proposal. Use it as the single navigation point when you land a reachability ticket. For a task-centric view of remaining gaps, see `docs/modules/reach-graph/guides/REACHABILITY_GAP_TASKS.md`.
---
## 1. Scope & Principles
**Goal**: ship a verifiable reachability signal for every scan by chaining SBOM → graph → runtime facts → VEX into DSSE-attested, replayable evidence.
**Principles**
1. **Deterministic inputs**: canonical IDs, sorted payloads, normalized timestamps.
2. **Provable facts**: every artifact has a DSSE envelope anchored in Authority + Rekor mirror.
3. **Replay-first**: manifests pin feed snapshots, analyzer digests, and policies so auditors can rerun.
4. **Least surprise**: same API and file layouts across languages; tests run fixture packs at CI time.
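
As an illustration of the "deterministic inputs" principle, the following minimal sketch shows how sorting keys and fixing separators makes two logically equal payloads hash identically. It is not the project's canonicalizer: the helper names `canonical_bytes` and `payload_digest` are hypothetical, and SHA-256 is used only because it ships with the Python standard library.

```python
import hashlib
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal payloads give equal bytes."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def payload_digest(payload: dict) -> str:
    """Hash the canonical form (SHA-256 here is an assumption made for the sketch)."""
    return "sha256:" + hashlib.sha256(canonical_bytes(payload)).hexdigest()


# Same content, different key order: the digests must match.
first = {"nodes": ["sym:a", "sym:b"], "analyzer": "scanner.java"}
second = {"analyzer": "scanner.java", "nodes": ["sym:a", "sym:b"]}
assert payload_digest(first) == payload_digest(second)
```

Key order is only one source of nondeterminism; list ordering and timestamp formats would need the same treatment before hashing.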
---
## 2. Evidence Chain Overview
| Stage | Producer | Artifact | Requirements |
|-------|----------|----------|--------------|
| SBOM per layer & composed image | Scanner Worker + Sbomer | `sbom.layer.cdx.json`, `sbom.image.cdx.json` | Deterministic CycloneDX 1.6, DSSE envelope, CAS URI |
| Static reachability graph | Scanner Worker lifters (DotNet, Go, Node/Deno, Rust, Swift, JVM, Binary, Shell) | `richgraph-v1.json` + `sha256` | Canonical SymbolIDs, framework entries, predicates, graph hash |
| Runtime facts | Zastava Observer / runtime probes | `runtime-trace.ndjson` (gzip or JSON) | EntryTrace schema, CAS pointer, process/socket/container metadata, optional compression |
| Replay manifest | Scanner Worker + Replay Core | `replay.yaml` | Contains analyzer versions, feed locks, graph hash, runtime trace digests |
| VEX statements | Scanner WebService + Policy Engine | `reachability.json` + OpenVEX doc | Links SBOM attestation, graph attestation, runtime evidence IDs |
| Signed bundle | Authority + Signer | DSSE envelope referencing above | Support FIPS + PQ variants (Dilithium where required) |
---
## 3. Work Streams (modules + hand-offs)
| Stream | Owner Guild(s) | Key deliverables |
|--------|----------------|------------------|
| **Native symbols & callgraphs** | Scanner Worker · Symbols Guild | Ship `Scanner.Symbols.Native` + `Scanner.CallGraph.Native`, integrate Symbol Manifest v1, demangle Itanium/MSVC names, emit `FuncNode`/`CallEdge` CAS bundles (task `SCANNER-NATIVE-401-015`). |
| **Reachability store** | Signals · BE-Base Platform | Provision shared PostgreSQL tables (`func_nodes`, `call_edges`, `cve_func_hits`), indexes, and repositories plus REST hooks for reuse (task `SIG-STORE-401-016`). |
| **Language lifters** | Scanner Worker | CLI/hosted lifters for DotNet, Go, Node/Deno, JVM, Rust, Swift, Binary, Shell with CAS uploads and richgraph output |
| **Signals ingestion & scoring** | Signals | `/callgraphs`, `/runtime-facts` (JSON + NDJSON/gzip), `/graphs/{id}`, `/reachability/recompute` GA; CAS-backed storage, runtime dedupe, BFS+predicates scoring |
| **Runtime capture** | Zastava + Runtime Guild | EntryTrace/eBPF samplers, NDJSON batches (symbol IDs + timestamps + counts) |
| **Replay evidence** | Replay Core + Scanner Worker | Manifest schema v2, `ReachabilityReplayWriter` integration, hash-lock tests |
| **Authority attestations** | Authority + Signer | DSSE predicates for SBOM, Graph, Replay, VEX; Rekor mirror alignment |
| **Policy & VEX** | Policy Engine + Web + CLI + UI | Accept reachability states, render “Why safe” call paths, CLI/UI explain flows |
| **QA & Docs** | QA + Docs Guilds | `reachbench-2025-expanded` fixtures wired to CI; operator + developer runbooks |
| **Binary quality guardrails (Nov 2026)** | Scanner · Signals · QA | Build-id capture, init-array roots, purl-resolved edges, unknowns emission, and patch-oracle fixtures; see sections 5.7–5.9 |
---
## 4. Sprint Targets
| Sprint | Nickname | Focus | Exit Criteria |
|--------|----------|-------|---------------|
| **401** | Evidence Pipeline | Finish static lifters + CAS graph storage + runtime ingestion endpoint | Graph CAS layout documented, lifter fixtures passing, `/runtime-facts` receives NDJSON batches |
| **402** | Replay & Attest | Manifest v2, DSSE envelopes, Authority/Rekor publishing | Replay packs include hashes + analyzer fingerprint; DSSE statements passed integration; Rekor mirror updated |
| **403** | Policy & Explain | VEX generation, SPL predicates, UI/CLI explainers | Policy engine uses reachability states, CLI `stella graph explain` returns signed paths, UI shows explain drawer |
Each sprint is two weeks; refer to `docs/implplan/SPRINT_0401_0001_0001_reachability_evidence_chain.md` (new) for per-task tracking.
---
## 5. Task Breakdown Cheat Sheet
### 5.1 Scanner Worker
1. **Lifter SDK**: define `RichGraphWriter`, canonical SymbolID helpers, analyzer interface updates.
2. **Language passes**: per-language deliverables are discovery, graph build, framework wiring, predicate extraction, and runtime overlay.
3. **Replay hooks**: plug lifter output + runtime traces into `ReachabilityReplayWriter`; enforce CAS registration before emitting manifest references.
4. **Fixture runs**: add tests under `tests/reachability/StellaOps.ScannerSignals.IntegrationTests` to execute lifter outputs against reachbench A/B cases.
### 5.2 Signals Service
1. **Callgraph CAS layout**: migrate from filesystem to CAS (`cas://reachability/graphs/{hash}`), include metadata doc.
2. **Runtime facts API**: accept NDJSON or gzip, dedupe events, compute hit stats, link to graph nodes.
3. **Scoring engine v2**: support multi-state lattice (`Unknown → Observed`), record predicates, blocked edges, runtime evidence CAS URIs.
4. **API responses**: `/graphs/{scanId}` returns graph CAS refs + manifest pointers; `/reachability/recompute` accepts replay manifest IDs.
### 5.3 Replay Core & Authority
1. **Manifest schema v2**: YAML + JSON versions, includes feeds/analyzers/policies.
2. **CAS naming**: standardize `cas://reachability/{kind}/{sha256}`.
3. **DSSE predicate types**: `SbomAttestation`, `GraphAttestation`, `VexAttestation`, `ReplayManifest`.
4. **Authority integration**: new endpoints for submitting reachability predicates, rotation tests, Rekor mirror update instructions.
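
The CAS naming rule in item 2 can be sketched as below. This is a hedged illustration only: `cas_uri` and `ALLOWED_KINDS` are hypothetical names, and the kind list is inferred from the `graphs` and `runtime` URIs that appear in examples elsewhere in this commit.

```python
import hashlib

# Hypothetical allow-list; "graphs" and "runtime" are the kinds seen in this commit's examples.
ALLOWED_KINDS = {"graphs", "runtime"}


def cas_uri(kind: str, content: bytes) -> str:
    """Build cas://reachability/{kind}/{sha256} from the artifact bytes."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown CAS kind: {kind}")
    return f"cas://reachability/{kind}/{hashlib.sha256(content).hexdigest()}"


uri = cas_uri("graphs", b"")
```

Deriving the URI from the content itself means a reference can be re-verified by anyone holding the bytes, which is what the replay manifests rely on.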
### 5.4 Policy / Web / UI / CLI
1. **Policy Engine**: ingest reachability fact from Signals, expose via SPL, produce metrics, integrate into explanation tree.
2. **Web API**: join reachability fields in vuln responses, add override endpoints, simulate support.
3. **UI/CLI**: visual explain drawer/CLI command showing signed call-path, predicates, runtime hits; counterfactual toggles.
4. **VEX emitter**: generate OpenVEX statements with evidence references, DSSE sign via Signer.
### 5.5 Native binaries (build-id + init roots)
- Capture ELF build-id (`.note.gnu.build-id`) alongside soname/path and propagate into `SymbolID`/`code_id` so SBOM/runtime joins stay stable even when paths change.
- Treat `.preinit_array`, `.init_array`, `.ctors`, and `_init` as synthetic graph roots with `phase=load`; include constructors from `DT_NEEDED` deps. Persist the root list in scan evidence.
- Add deterministic tests covering build-id present/absent and init-array edge creation.
### 5.6 PURL-resolved edges
- Annotate every call edge with callee `purl` and `symbol_digest` per `docs/modules/reach-graph/guides/purl-resolved-edges.md`.
- Update `richgraph-v1` schema, CAS metadata, and CLI/UI explainers to display `purl@version` + demangled name.
- Signals merges graphs by `(purl, symbol_digest)`; Policy uses the same keys when mapping CVE-affected functions.
### 5.7 Unknowns Registry integration
- Emit structured Unknowns when symbol->purl mapping, edge targets, or hashes are ambiguous; write them via Signals API per `docs/modules/signals/guides/unknowns-registry.md`.
- Scoring adds `unknowns_pressure` so `not_affected` claims cannot bypass unresolved evidence.
- UI/CLI should surface unknown chips and triage actions.
### 5.8 Patch-oracle guardrails
- Add `tests/reachability/patch-oracles/**` with paired vuln/fixed binaries and `oracle.yml` expectations (functions/edges added/removed).
- Scanner binary analyzer tests must fail if expected guard functions or edges are missing; CI job ensures determinism.
- See `docs/modules/reach-graph/guides/patch-oracles.md` for fixture layout and manifest schema.
### 5.9 JS/PHP framework reachability
- Model framework entrypoints explicitly: Express/Fastify/Nest handlers, Laravel/Symfony routes/commands/hooks. Generate graph roots from route/handler catalogs instead of generic `main` only.
- Represent dynamic import/require/include resolution as graph nodes so ambiguity stays visible (`resolution` edges with confidence).
- Keep multi-layer graphs: source-level (TS/JS/PHP) plus bundled output (Webpack/Vite). Merge with runtime hints when available.
- Status model: `always_reachable`, `conditional`, `not_reachable`, `not_analyzed`, `ambiguous`, each with confidence and evidence tags.
- Deliver language-specific profiles + fixture cases to prove coverage; update CLI/UI explainers to show framework route context.
### 5.10 Vulnerability Surfaces (Sprint 3700)
Vulnerability surfaces identify **which specific methods changed** in a security fix, enabling precise reachability analysis:
- **Surface computation**: Download vulnerable and fixed package versions, fingerprint all methods, diff to find changed methods (sinks).
- **Trigger extraction**: Build internal call graphs, reverse BFS from sinks to public APIs (triggers).
- **Per-ecosystem support**:
  - NuGet: Cecil IL fingerprinting
  - npm: Babel AST fingerprinting
  - Maven: ASM bytecode fingerprinting
  - PyPI: Python AST fingerprinting
- **Integration**: `ISurfaceQueryService` queries triggers during scan; use triggers as sinks instead of all package methods.
- **Storage**: `scanner.vuln_surfaces`, `scanner.vuln_surface_sinks`, `scanner.vuln_surface_triggers` tables.
- **Docs**: `docs/contracts/vuln-surface-v1.md` for schema details.
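
The trigger-extraction step (reverse BFS from sinks to public APIs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the `StellaOps.Scanner.VulnSurfaces` implementation: the function name `find_triggers`, the `(caller, callee)` edge tuples, and the sample method names are all hypothetical.

```python
from collections import deque


def find_triggers(edges, sinks, public_api):
    """Return the sorted public methods that can reach any sink.

    edges: iterable of (caller, callee) pairs from the package's internal call graph.
    """
    # Invert the graph so we can walk from callee back to callers.
    callers = {}
    for caller, callee in edges:
        callers.setdefault(callee, set()).add(caller)

    seen = set(sinks)
    queue = deque(sorted(sinks))
    while queue:
        node = queue.popleft()
        for caller in sorted(callers.get(node, ())):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)

    # A sink that is itself public also counts as a trigger in this sketch.
    return sorted(seen & set(public_api))


edges = [
    ("Api.parse", "Impl.read"),
    ("Impl.read", "Impl.decode"),
    ("Api.version", "Impl.info"),
]
triggers = find_triggers(edges, sinks={"Impl.decode"}, public_api={"Api.parse", "Api.version"})
```

Here only `Api.parse` can reach the changed method `Impl.decode`, so a scan would look for calls to `Api.parse` rather than to every method in the package.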
### 5.11 Confidence Tiers
Reachability findings are classified into confidence tiers:
| Tier | Condition | Display | Implications |
|------|-----------|---------|--------------|
| **Confirmed** | Surface exists AND trigger method is reachable | Red badge | Highest confidence: vulnerable code is definitely called |
| **Likely** | No surface but package API is called | Orange badge | Medium confidence: package is used but the specific vulnerable path is unknown |
| **Present** | No call graph, dependency in SBOM | Gray badge | Lowest confidence: reachability cannot be determined |
| **Unreachable** | Surface exists AND no trigger reachable | Green badge | High confidence: vulnerability is not exploitable |
- Tier assignment logic in `SurfaceAwareReachabilityAnalyzer`
- API responses include `confidenceTier` and `confidenceDisplay`
- UI badges reflect tier colors
- VEX statements reference tier in justification
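
The tier table can be read as a small decision function. The sketch below is an interpretation of the table, not the logic in `SurfaceAwareReachabilityAnalyzer`; the function and parameter names are hypothetical, and the final fallback covers a case the table does not list.

```python
def confidence_tier(has_call_graph: bool, has_surface: bool,
                    trigger_reachable: bool, package_api_called: bool) -> str:
    """Map the conditions from the tier table to a tier name."""
    if not has_call_graph:
        return "Present"          # dependency is in the SBOM but nothing can be traced
    if has_surface:
        return "Confirmed" if trigger_reachable else "Unreachable"
    if package_api_called:
        return "Likely"
    # Assumption: call graph exists, no surface, package API never called.
    # The table does not cover this case; the sketch falls back to "Present".
    return "Present"


tier = confidence_tier(has_call_graph=True, has_surface=True,
                       trigger_reachable=False, package_api_called=True)
```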
### 5.12 Reachability Drift (Sprint 3600)
Track function-level reachability changes between scans:
- **New reachable**: Sinks that became reachable (alert)
- **Mitigated**: Sinks that became unreachable (positive)
- **Causal attribution**: Why change occurred (guard removed, new route, code change)
- **Components**: `DriftDetectionEngine`, `PathCompressor`, `DriftCauseExplainer`
- **API**: `POST /api/drift/analyze`, `GET /api/drift/{id}`
- **UI**: `PathViewerComponent`, `RiskDriftCardComponent`
- **Attestation**: DSSE-signed drift predicates for evidence chain
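
At its core, the "new reachable" and "mitigated" classification is a set difference between two scans. The following is a minimal sketch under that assumption; `classify_drift` is a hypothetical name and the real `DriftDetectionEngine` also performs path compression and causal attribution, which are not shown.

```python
def classify_drift(previous_reachable: set, current_reachable: set) -> dict:
    """Compare the reachable sinks of two scans."""
    return {
        "new_reachable": sorted(current_reachable - previous_reachable),  # alert
        "mitigated": sorted(previous_reachable - current_reachable),      # positive
    }


drift = classify_drift(
    previous_reachable={"sym:java:a", "sym:java:b"},
    current_reachable={"sym:java:b", "sym:java:c"},
)
```

Sorting the output keeps the result stable across runs, in line with the determinism principle in section 1.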
---
## 6. Acceptance Tests
1. **Hash-lock**: reorder analyzer flags and confirm the graph hash is unchanged.
2. **Replay**: delete caches, replay the manifest, verify DSSE + hash equality.
3. **Tamper**: alter a single edge and expect VEX verification failure with a specific path mismatch.
4. **Golden corpus**: run all reachbench cases; ensure NotReachable vs Reachable twins align with the expectations JSON.
5. **Runtime sanity**: feed staged runtime traces and ensure the confidence bump + `observed=true` path chips propagate to UI.
---
## 7. Documentation & Runbooks
- Place developer-facing updates here (`docs/modules/reach-graph/guides`).
- [Function-level evidence guide](function-level-evidence.md) captures the Nov 2025 advisory scope, task references, and schema expectations; keep it in lockstep with sprint status.
- [Reachability runtime runbook](../runbooks/reachability-runtime.md) documents ingestion, CAS staging, air-gap handling, and troubleshooting; link every runtime feature PR to this guide.
- [VEX Evidence Playbook](../benchmarks/vex-evidence-playbook.md) defines the bench repo layout, artifact shapes, verifier tooling, and metrics; keep it updated when Policy/Signer/CLI features land.
- [Reachability lattice](lattice.md) describes the confidence states, evidence/mitigation kinds, scoring policy, event graph schema, and VEX gates; update it when lattices or probes change.
- [PURL-resolved edges spec](purl-resolved-edges.md) defines the purl + symbol-digest annotation rules for graphs and SBOM joins.
- [Patch-oracles QA pattern](patch-oracles.md) describes the fixture layout and expectations for binary reachability guards.
- [Unknowns registry](../signals/unknowns-registry.md) documents how unresolved symbols/edges are recorded and how scoring uses `unknowns_pressure`.
- [Evidence schema](evidence-schema.md) is the canonical field list for richgraph, runtime facts, and Unknowns CAS objects.
- Update module dossiers (Scanner, Signals, Replay, Authority, Policy, UI) once each guild lands work.
---
## 8. Contact & Rituals
- **Daily reachability stand-up** in `#reachability-build`.
- **Fixture sync** every Friday: QA leads run reachbench matrix, post report to Confluence + link in `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`.
- **Decision log**: append ADRs under `docs/adr/reachability-*` for schema changes.
Keep this guide updated whenever scope shifts or a new sprint is added.


@@ -0,0 +1,44 @@
# Reachability Callgraph Formats (richgraph-v1)
## Purpose
Normalize static callgraphs across languages so Signals can merge them with runtime traces and replay bundles deterministically.
## Core fields (per node/edge)
- `nodes[].id` — canonical SymbolID (language-specific, stable, lowercase where applicable).
- `nodes[].kind` — e.g., method/function/class/file.
- `edges[].sourceId` / `edges[].targetId` — SymbolIDs; edge types include `call`, `import`, `inherit`, `reference`.
- `artifact` — CAS paths for source graph files; include `sha256`, `uri`, optional `generator` (analyzer name/version).
## Language-specific notes
- **JVM**: use JVM internal names; include signature for overloads.
- **.NET/Roslyn**: fully-qualified method token; include assembly and module for cross-assembly edges.
- **Go SSA**: package path + function; include receiver for methods.
- **Node/Deno TS**: module path + exported symbol; ES module graph only.
- **Rust MIR**: crate::module::symbol; monomorphized forms allowed if stable.
- **Swift SIL**: mangled name; demangled kept in metadata only.
- **Shell/binaries**: `SymbolID = sym:binary:{sha256(file)\0section\0addr\0name\0linkage}` via `SymbolId.ForBinaryAddressed`, include `code_id = CodeId.ForBinarySegment(...)` and set `kind=binary`.
## CAS layout
- Store graph bundles under `reachability_graphs/<hh>/<sha>.tar.zst`.
- Bundle SHOULD contain `meta.json` with analyzer, version, language, component, and entry points (array).
- File order inside tar must be lexicographic to keep hashes stable.
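
A minimal sketch of a deterministic bundle and its CAS path follows. Several points are assumptions rather than facts from this document: `<hh>` is taken to be the first two hex characters of the digest, the digest is computed over the uncompressed tar, and zstd compression is omitted because the sketch uses only the Python standard library. The function names are hypothetical.

```python
import hashlib
import io
import tarfile


def build_bundle(files: dict) -> bytes:
    """Pack files into an uncompressed tar with lexicographic order and fixed metadata."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):          # lexicographic order keeps the hash stable
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0                  # fixed timestamp
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


def bundle_path(bundle: bytes) -> str:
    """Derive reachability_graphs/<hh>/<sha>.tar.zst (prefix rule is an assumption)."""
    sha = hashlib.sha256(bundle).hexdigest()
    return f"reachability_graphs/{sha[:2]}/{sha}.tar.zst"


bundle = build_bundle({"meta.json": b"{}", "graph.json": b"[]"})
same_bundle = build_bundle({"graph.json": b"[]", "meta.json": b"{}"})
path = bundle_path(bundle)
```

Because insertion order, timestamps, and ownership are all pinned, `bundle` and `same_bundle` are byte-identical and therefore land on the same path.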
## Validation rules
- No duplicate node IDs; edges must reference existing nodes.
- Entry points list must be present (even if empty) for Signals recompute.
- Graph SHA256 must match tar content; Signals rejects mismatched SHA.
- Prefer ASCII; UTF-8 paths are allowed but must be NFC-normalized.
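
The validation rules above can be sketched as a single checker. This is an illustrative sketch, not the Signals validator: the `entryPoints` key and the per-node `path` attribute are hypothetical field names chosen for the example, and the SHA-256 check against the tar content is left out.

```python
import unicodedata


def validate_graph(graph: dict) -> list:
    """Return a list of human-readable rule violations (empty when the graph is valid)."""
    errors = []
    seen = set()
    for node in graph.get("nodes", []):
        node_id = node["id"]
        if node_id in seen:
            errors.append(f"duplicate node id: {node_id}")
        seen.add(node_id)
        # "path" is a hypothetical optional attribute used to illustrate the NFC rule.
        if not unicodedata.is_normalized("NFC", node.get("path", "")):
            errors.append(f"path not NFC-normalized: {node_id}")
    for edge in graph.get("edges", []):
        for key in ("sourceId", "targetId"):
            if edge.get(key) not in seen:
                errors.append(f"edge {key} references unknown node: {edge.get(key)}")
    # The list must be present even when empty.
    if "entryPoints" not in graph:
        errors.append("entryPoints missing (use an empty list when there are none)")
    return errors


good = {"nodes": [{"id": "a"}, {"id": "b"}],
        "edges": [{"sourceId": "a", "targetId": "b"}],
        "entryPoints": []}
bad = {"nodes": [{"id": "a"}, {"id": "a"}],
       "edges": [{"sourceId": "a", "targetId": "b"}]}
```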
## V1 Schema Reference
The `stella.callgraph.v1` schema provides enhanced fields for explainability:
- **Edge Reasons**: 13 reason codes explaining why edges exist
- **Symbol Visibility**: Public/Internal/Protected/Private access levels
- **Typed Entrypoints**: Framework-aware entrypoint detection
See [Callgraph Schema Reference](../signals/callgraph-formats.md) for complete v1 schema documentation.
## References
- **V1 Schema Reference**: `docs/modules/signals/guides/callgraph-formats.md`
- Union schema: `docs/modules/reach-graph/schemas/runtime-static-union-schema.md`
- Delivery guide: `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`


@@ -0,0 +1,69 @@
# Reachability Corpus Plan (QA-CORPUS-401-031)
## Objective
- Maintain deterministic, offline reachability fixtures that validate callgraph ingestion, reachability truth-path handling, and VEX proof workflows.
- Keep the corpus small but multi-runtime (Go/.NET/Python/Rust), and keep a public-friendly mini dataset (PHP/JavaScript/C#) for docs/demos without external repos.
## Corpus Map
### 1) Multi-runtime corpus (internal MVP)
Path: `tests/reachability/corpus/`
Per-case layout: `tests/reachability/corpus/<language>/<case>/`
- `callgraph.static.json` — static call graph sample (stub for MVP).
- `ground-truth.json` — expected reachability outcome and example path(s) (Reachbench truth schema v1; `schema_version=reachbench.reachgraph.truth/v1`).
- `vex.openvex.json` — expected VEX slice for the case.
- Optional (future): `runtime/*.ndjson`, `sbom.*.json`
`tests/reachability/corpus/manifest.json` records deterministic SHA-256 hashes for required files in each case directory.
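
To make the manifest idea concrete, here is a hedged sketch that hashes the three required files in every `<language>/<case>` directory. It is not `update_corpus_manifest.py`, and the output shape (case path mapped to per-file digests) is an assumption made for illustration; the real manifest schema may differ.

```python
import hashlib
import tempfile
from pathlib import Path

REQUIRED_FILES = ("callgraph.static.json", "ground-truth.json", "vex.openvex.json")


def build_manifest(corpus_root: Path) -> dict:
    """Map each <language>/<case> directory to SHA-256 digests of its required files."""
    manifest = {}
    case_dirs = sorted(p for p in corpus_root.glob("*/*") if p.is_dir())
    for case_dir in case_dirs:
        key = case_dir.relative_to(corpus_root).as_posix()
        # read_bytes raises FileNotFoundError if a required file is missing.
        manifest[key] = {
            name: hashlib.sha256((case_dir / name).read_bytes()).hexdigest()
            for name in REQUIRED_FILES
        }
    return manifest


# Demo against a throwaway corpus with one stub case.
with tempfile.TemporaryDirectory() as tmp:
    case = Path(tmp) / "go" / "demo-case"
    case.mkdir(parents=True)
    for name in REQUIRED_FILES:
        (case / name).write_bytes(b"{}")
    manifest = build_manifest(Path(tmp))
```

Sorting the case directories and using POSIX-style keys keeps the manifest identical across operating systems.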
### 2) Public mini dataset (PHP/JS/C#)
Path: `tests/reachability/samples-public/`
Layout:
- `schema/ground-truth.schema.json` — JSON schema for `ground-truth.json` (Reachbench truth schema v1).
- `manifest.json` — deterministic SHA-256 hashes for required files in each sample directory.
- `samples/<lang>/<case-id>/` — per-sample artifacts: `callgraph.static.json`, `ground-truth.json`, `sbom.cdx.json`, `vex.openvex.json`, `repro.sh`.
- `runners/run_all.{sh,ps1}` — deterministic manifest regeneration.
### 3) Reachbench fixture pack (expanded, dual variants)
Path: `tests/reachability/fixtures/reachbench-2025-expanded/`
Each case has two variants (reachable/unreachable) with per-variant `manifest.json` and `reachgraph.truth.json`. Fixture integrity is validated by `tests/reachability/StellaOps.Reachability.FixtureTests`.
## Ground Truth Conventions
- Corpus and public samples use the same truth schema (`reachbench.reachgraph.truth/v1`) but differ in file naming (`ground-truth.json` vs reachbench pack `reachgraph.truth.json`).
- Legacy corpus `expect.yaml` has been retired; prior `state/score` values are preserved under `legacy_expect` in `ground-truth.json`.
- Legacy `conditional` states are represented as `variant=unreachable` plus `legacy_expect.state=conditional` until the truth schema grows a dedicated conditional/contested variant.
## Determinism & Runners
Regenerate all reachability manifests (corpus + public samples + reachbench pack):
- `tests/reachability/runners/run_all.sh`
- `tests/reachability/runners/run_all.ps1`
Individual scripts:
- `python tests/reachability/scripts/update_corpus_manifest.py`
- `python tests/reachability/samples-public/scripts/update_manifest.py`
- `python tests/reachability/fixtures/reachbench-2025-expanded/harness/update_variant_manifests.py`
## CI Gates
- `tests/reachability/StellaOps.Reachability.FixtureTests`
  - validates presence + hashes from manifests for corpus/public samples/reachbench fixtures
  - enforces minimum language-bucket coverage (Go/.NET/Python/Rust + PHP/JS/C#)
## MVP Slice (stub cases)
- Go: `go-ssh-CVE-2020-9283-keyexchange`
- .NET: `dotnet-kestrel-CVE-2023-44487-http2-rapid-reset`
- Python: `python-django-CVE-2019-19844-sqli-like`
- Rust: `rust-axum-header-parsing-TBD`
## Next Work (post-MVP)
- Wire a CI job to run `tests/reachability/StellaOps.Reachability.FixtureTests`.
- Replace stubs with real callgraphs/traces and expand the corpus once CI is stable.


@@ -0,0 +1,143 @@
# CVE-to-Symbol Mapping
_Last updated: 2025-12-22. Owner: Scanner Guild + Concelier Guild._
This document describes how StellaOps maps CVE identifiers to specific binary symbols/functions for reachability slices.
---
## 1. Overview
To determine if a vulnerability is reachable, StellaOps resolves:
- **CVE identifiers** (e.g., `CVE-2024-1234`)
- **Package coordinates** (e.g., `pkg:npm/lodash@4.17.21`)
- **Affected symbols** (e.g., `lodash.template`, `openssl:EVP_PKEY_decrypt`)
The mapping is used by `SliceExtractor` to target the right symbols and by downstream VEX decisions.
---
## 2. Data Sources
### 2.1 Patch Diff Surfaces (Preferred)
Highest-fidelity source: compute method-level diffs between vulnerable and fixed versions.
**Implementation**: `StellaOps.Scanner.VulnSurfaces`
### 2.2 Advisory Linksets (Concelier)
Scanner queries Concelier's LNM linksets for package coordinates and optional symbol hints.
**Implementation**: `StellaOps.Scanner.Advisory` -> Concelier `/v1/lnm/linksets/{cveId}` or `/v1/lnm/linksets/search`
### 2.3 Offline Bundles
For air-gapped environments, precomputed bundles map CVEs to packages and symbols.
**Implementation**: `FileAdvisoryBundleStore`
---
## 3. Service Contracts
### 3.1 CVE -> Package/Symbol Mapping
```csharp
using System.Collections.Immutable;
using System.Threading;
using System.Threading.Tasks;

// Resolves a CVE to the packages and symbols it affects.
public interface IAdvisoryClient
{
    Task<AdvisorySymbolMapping?> GetCveSymbolsAsync(string cveId, CancellationToken ct = default);
}

public sealed record AdvisorySymbolMapping
{
    public required string CveId { get; init; }
    public ImmutableArray<AdvisoryPackageSymbols> Packages { get; init; }
    public required string Source { get; init; } // "concelier" | "bundle"
}

public sealed record AdvisoryPackageSymbols
{
    public required string Purl { get; init; }
    public ImmutableArray<string> Symbols { get; init; }
}
```
### 3.2 CVE + PURL -> Affected Symbols
```csharp
using System.Collections.Immutable;
using System.Threading;
using System.Threading.Tasks;

// Resolves the symbols affected by a CVE within one specific package version.
public interface IVulnSurfaceService
{
    Task<VulnSurfaceResult> GetAffectedSymbolsAsync(
        string cveId,
        string purl,
        CancellationToken ct = default);
}

public sealed record VulnSurfaceResult
{
    public required string CveId { get; init; }
    public required string Purl { get; init; }
    public required ImmutableArray<AffectedSymbol> Symbols { get; init; }
    public required string Source { get; init; } // "surface" | "package-symbols" | "heuristic"
    public required double Confidence { get; init; }
}

public sealed record AffectedSymbol
{
    public required string SymbolId { get; init; }
    public string? MethodKey { get; init; }
    public string? DisplayName { get; init; }
    public string? ChangeType { get; init; }
    public double Confidence { get; init; }
}
```
---
## 4. Caching Strategy
| Data | TTL | Notes |
|------|-----|------|
| Advisory linksets | 1 hour | In-memory cache; configurable TTL |
| Offline bundles | Process lifetime | Loaded once from file |
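
The linkset row describes a plain time-to-live cache. The sketch below shows the idea with an injectable clock so expiry can be demonstrated without sleeping; `TtlCache`, `get_or_load`, and `load_linkset` are hypothetical names, and the real cache in `StellaOps.Scanner.Advisory` may behave differently (for example around eviction or concurrency).

```python
import time


class TtlCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # still fresh
        value = loader(key)                      # miss or expired: reload
        self._entries[key] = (now + self._ttl, value)
        return value


# Demo with a fake clock.
fake_now = [0.0]
loads = []


def load_linkset(cve_id):
    loads.append(cve_id)
    return {"cveId": cve_id}


cache = TtlCache(ttl_seconds=3600.0, clock=lambda: fake_now[0])
cache.get_or_load("CVE-2024-1234", load_linkset)  # miss: loader runs
cache.get_or_load("CVE-2024-1234", load_linkset)  # hit: served from memory
fake_now[0] = 3601.0
cache.get_or_load("CVE-2024-1234", load_linkset)  # expired: loader runs again
```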
---
## 5. Offline Bundle Format
```json
{
  "items": [
    {
      "cveId": "CVE-2024-1234",
      "source": "bundle",
      "packages": [
        {
          "purl": "pkg:npm/lodash@4.17.21",
          "symbols": ["template", "templateSettings"]
        }
      ]
    }
  ]
}
```
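
A bundle in this shape can be indexed for lookup with a few lines of code. The sketch below is illustrative only and is not `FileAdvisoryBundleStore`; `index_bundle` is a hypothetical helper and the inline JSON simply repeats the example above.

```python
import json

BUNDLE_JSON = """
{
  "items": [
    {
      "cveId": "CVE-2024-1234",
      "source": "bundle",
      "packages": [
        {"purl": "pkg:npm/lodash@4.17.21", "symbols": ["template", "templateSettings"]}
      ]
    }
  ]
}
"""


def index_bundle(text: str) -> dict:
    """Build cveId -> {purl: (symbols, ...)} from the bundle document."""
    index = {}
    for item in json.loads(text)["items"]:
        index[item["cveId"]] = {
            pkg["purl"]: tuple(pkg["symbols"]) for pkg in item["packages"]
        }
    return index


index = index_bundle(BUNDLE_JSON)
symbols = index.get("CVE-2024-1234", {}).get("pkg:npm/lodash@4.17.21", ())
```

Returning an empty tuple for unknown CVEs or purls lets a caller move on to whatever fallback it uses without special-casing missing entries.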
---
## 6. Fallback Behavior
When no surface or advisory mapping is available, the service returns an empty symbol list with low confidence and `Source = "heuristic"`. Callers may inject an `IPackageSymbolProvider` to supply public-symbol fallbacks.
---
## 7. Related Documentation
- [Slice Schema](./slice-schema.md)
- [Patch Oracles](./patch-oracles.md)
- [Concelier Architecture](../modules/concelier/architecture.md)
---
_Created: 2025-12-22. See Sprint 3810 for implementation details._


@@ -0,0 +1,535 @@
# Function-Level Evidence Guide
_Last updated: 2025-12-13. Owner: Docs Guild._
This guide documents the cross-module function-level evidence chain that enables provable reachability claims. It covers the schema, identifiers, API usage, CLI commands, and integration patterns for Scanner, Signals, Policy, and Replay.
---
## 1. Overview
StellaOps implements a **function-level evidence chain** that anchors every vulnerability finding to immutable identifiers (`code_id`, `symbol_id`, `graph_hash`) enabling:
- **Provable reachability:** Deterministic call-path evidence from entry points to vulnerable functions.
- **Stripped binary support:** `code_id` + `code_block_hash` provides identity when symbols are absent.
- **Evidence replay:** Sealed artifacts with DSSE attestation allow offline verification.
- **Cross-module linking:** Scanner -> Signals -> Policy -> VEX -> UI/CLI evidence chain.
### 1.1 Core Identifiers
| Identifier | Format | Purpose | Example |
|------------|--------|---------|---------|
| `symbol_id` | `sym:{lang}:{base64url}` | Canonical function identity | `sym:java:R3JlZXRpbmc...` |
| `code_id` | `code:{lang}:{base64url}` | Identity for name-less code blocks | `code:binary:YWJjZGVm...` |
| `graph_hash` | `blake3:{hex}` | Content-addressable graph identity | `blake3:a1b2c3d4e5f6...` |
| `symbol_digest` | `sha256:{hex}` | Hash of symbol_id for edge linking | `sha256:e5f6a7b8c9d0...` |
| `build_id` | `gnu-build-id:{hex}` | ELF/PE debug identifier | `gnu-build-id:5f0c7c3c...` |
### 1.2 Evidence Chain Flow
```
Scanner -> richgraph-v1 -> Signals -> Scoring -> Policy -> VEX -> UI/CLI
   |            |             |          |         |        |       |
   |            |             |          |         |        |       +-- stella graph explain
   |            |             |          |         |        +-- OpenVEX with call-path proofs
   |            |             |          |         +-- Policy gates + reachability.state
   |            |             |          +-- Lattice state + confidence + riskScore
   |            |             +-- Runtime facts + static paths
   |            +-- BLAKE3 graph_hash + DSSE attestation
   +-- code_id, symbol_id, build_id per node
```
---
## 2. Schema Reference
### 2.1 SymbolID Construction
Per-language canonical tuple format (NUL-separated, then SHA-256 -> base64url):
| Language | Tuple Components | Example |
|----------|------------------|---------|
| Java | `{package}\0{class}\0{method}\0{descriptor}` | `com.example\0Foo\0bar\0(Ljava/lang/String;)V` |
| .NET | `{assembly}\0{namespace}\0{type}\0{member_signature}` | `MyApp\0Controllers\0UserController\0GetById(int)` |
| Go | `{module}\0{package}\0{receiver}\0{func}` | `github.com/user/repo\0handler\0*Server\0Handle` |
| Node | `{pkg_or_path}\0{export_path}\0{kind}` | `lodash\0get\0function` |
| Binary | `{file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?}` | `sha256:abc...\0.text\00x401000\0ssl3_read\0global\0` |
| Python | `{pkg_or_path}\0{module}\0{qualified_name}` | `requests\0api\0get` |
| Ruby | `{gem_or_path}\0{module}\0{method}` | `rails\0ActionController::Base\0render` |
| PHP | `{composer_pkg}\0{namespace}\0{qualified_name}` | `symfony/http-kernel\0Kernel\0handle` |
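
The construction rule (NUL-separated tuple, SHA-256, base64url) can be sketched as follows. This is a hedged illustration, not the project's `SymbolId` helper: the function names are hypothetical, and stripping the base64 padding is an assumption, since the examples in this guide do not settle whether padding is kept.

```python
import base64
import hashlib


def symbol_id(lang: str, *parts: str) -> str:
    """Join the tuple with NUL bytes, hash with SHA-256, encode as base64url."""
    tuple_bytes = "\0".join(parts).encode("utf-8")
    digest = hashlib.sha256(tuple_bytes).digest()
    # Assumption: trailing "=" padding is stripped.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sym:{lang}:{encoded}"


def symbol_digest(sid: str) -> str:
    """Hash of the symbol_id string, as used for edge linking."""
    return "sha256:" + hashlib.sha256(sid.encode("utf-8")).hexdigest()


java_id = symbol_id("java", "com.example", "Foo", "bar", "(Ljava/lang/String;)V")
java_digest = symbol_digest(java_id)
```

The NUL separator matters: without it, the tuples `("ab", "c")` and `("a", "bc")` would concatenate to the same bytes and collide.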
### 2.2 CodeID Construction
For stripped binaries or name-less code blocks:
```
code:{lang}:{base64url_sha256(format + file_hash + addr + length + section + code_block_hash)}
```
Example for stripped ELF:
```
code:binary:YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo
```
### 2.3 Graph Node Schema
Each node in a richgraph-v1 document includes:
```json
{
  "id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
  "symbol_id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
  "code_id": "code:java:...",
  "lang": "java",
  "kind": "method",
  "display": "com.example.GreetingService.greet(String)",
  "purl": "pkg:maven/com.example/greeting-service@1.0.0",
  "build_id": "gnu-build-id:5f0c7c3c...",
  "symbol_digest": "sha256:e5f6a7b8...",
  "code_block_hash": "sha256:deadbeef...",
  "symbol": {
    "mangled": null,
    "demangled": "com.example.GreetingService.greet(String)",
    "source": "DWARF",
    "confidence": 0.98
  },
  "evidence": ["import", "bytecode"],
  "attributes": {}
}
```
### 2.4 Graph Edge Schema
Edges carry callee `purl` and `symbol_digest` for SBOM correlation:
```json
{
  "from": "sym:java:caller...",
  "to": "sym:java:callee...",
  "kind": "call",
  "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
  "symbol_digest": "sha256:f1e2d3c4...",
  "confidence": 0.92,
  "evidence": ["bytecode", "import"],
  "candidates": []
}
```
### 2.5 Evidence Block Schema
Evidence blocks in Policy/VEX responses cite all relevant identifiers:
```json
{
  "evidence": {
    "graph_hash": "blake3:a1b2c3d4e5f6...",
    "graph_cas_uri": "cas://reachability/graphs/a1b2c3d4e5f6...",
    "dsse_uri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
    "path": [
      {"symbol_id": "sym:java:...", "display": "main()"},
      {"symbol_id": "sym:java:...", "display": "processRequest()"},
      {"symbol_id": "sym:java:...", "display": "log4j.error()"}
    ],
    "path_length": 3,
    "confidence": 0.85,
    "runtime_hits": ["probe:jfr:1234"],
    "analyzer": {
      "name": "scanner.java",
      "version": "1.2.0",
      "toolchain_digest": "sha256:..."
    }
  }
}
```
---
## 3. API Usage
### 3.1 Signals Callgraph Ingestion
Submit a callgraph and receive a deterministic `graph_hash`:
```http
POST /signals/callgraphs
Authorization: Bearer <token>
Content-Type: application/json

{
  "schema": "richgraph-v1",
  "analyzer": {"name": "scanner.java", "version": "1.2.0"},
  "nodes": [...],
  "edges": [...],
  "roots": [...]
}
```
**Response:**
```json
{
  "graphHash": "blake3:a1b2c3d4e5f6...",
  "casUri": "cas://reachability/graphs/a1b2c3d4e5f6...",
  "dsseUri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
  "nodeCount": 1247,
  "edgeCount": 3891
}
```
### 3.2 Signals Runtime Facts
Submit runtime observations with `code_id` anchors:
```http
POST /signals/runtime-facts/ndjson?scanId=scan-123&imageDigest=sha256:abc123
Authorization: Bearer <token>
Content-Type: application/x-ndjson
Content-Encoding: gzip
{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":47,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:00Z"}
{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":12,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:01Z"}
```
**Response:**
```json
{
  "accepted": 128,
  "duplicates": 2,
  "evidenceUri": "cas://reachability/runtime/sha256:xyz789..."
}
```
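
On the client side, the request body for this endpoint is a gzip-compressed NDJSON stream. The sketch below only builds that body and does not send anything; `encode_runtime_facts` is a hypothetical helper, the event values repeat the example above, and pinning `mtime=0` is a choice made here so that identical events produce identical bytes.

```python
import gzip
import json


def encode_runtime_facts(events: list) -> bytes:
    """Serialize events as one JSON object per line, then gzip the result."""
    lines = [json.dumps(event, sort_keys=True, separators=(",", ":")) for event in events]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    return gzip.compress(body, mtime=0)  # fixed mtime keeps the payload reproducible


events = [
    {"symbolId": "sym:java:...", "codeId": "code:java:...", "hitCount": 47,
     "loaderBase": "0x7f...", "processId": 1234, "observedAt": "2025-12-13T10:00:00Z"},
    {"symbolId": "sym:java:...", "codeId": "code:java:...", "hitCount": 12,
     "loaderBase": "0x7f...", "processId": 1234, "observedAt": "2025-12-13T10:00:01Z"},
]
payload = encode_runtime_facts(events)
decoded_lines = gzip.decompress(payload).decode("utf-8").splitlines()
```

The resulting `payload` would be posted with the `Content-Type: application/x-ndjson` and `Content-Encoding: gzip` headers shown in the request above.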
### 3.3 Fetch Reachability Facts
Query reachability state for a subject:
```http
GET /signals/facts/{subjectKey}
Authorization: Bearer <token>
```
**Response:**
```json
{
  "subjectKey": "scan:123:pkg:maven/log4j:2.14.1:CVE-2021-44228",
  "metadata": {
    "fact": {
      "digest": "sha256:abc123...",
      "version": 3
    }
  },
  "states": [
    {
      "symbol": "sym:java:...",
      "latticeState": "CR",
      "bucket": "runtime",
      "confidence": 0.92,
      "score": 0.78,
      "path": ["sym:java:main...", "sym:java:process...", "sym:java:log4j..."],
      "evidence": {
        "static": {"graphHash": "blake3:...", "pathLength": 3, "confidence": 0.85},
        "runtime": {"probeId": "probe:jfr:1234", "hitCount": 47, "observedAt": "2025-12-13T10:00:00Z"}
      }
    }
  ],
  "score": 0.78,
  "aggregateTier": "T2",
  "riskScore": 0.65
}
```
### 3.4 Policy Findings with Reachability Evidence
```http
GET /api/policy/findings/{policyId}/{findingId}/explain?mode=verbose
Authorization: Bearer <token>
```
**Response (excerpt):**
```json
{
  "findingId": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
  "reachability": {
    "state": "CR",
    "confidence": 0.92,
    "evidence": {
      "graph_hash": "blake3:a1b2c3d4...",
      "path": [
        {"symbol_id": "sym:java:...", "display": "main()"},
        {"symbol_id": "sym:java:...", "display": "Logger.error()"}
      ],
      "runtime_hits": 47,
      "fact_digest": "sha256:abc123..."
    }
  },
  "steps": [
    {"rule": "reachability_gate", "state": "CR", "allowed": true},
    {"rule": "severity_baseline", "severity": {"normalized": "Critical", "score": 10.0}}
  ]
}
```
---
## 4. CLI Usage
### 4.1 Graph Explain Command
View the call path and evidence for a finding:
```bash
stella graph explain --finding "pkg:maven/log4j@2.14.1:CVE-2021-44228" --scan-id scan-123
# Output:
Finding: CVE-2021-44228 in pkg:maven/log4j@2.14.1
Reachability: CONFIRMED_REACHABLE (CR)
Confidence: 0.92
Graph Hash: blake3:a1b2c3d4e5f6...
Call Path (3 hops):
  1. main() [sym:java:R3JlZXRpbmcuLi4=]
     -> processRequest() [direct call]
  2. processRequest() [sym:java:cHJvY2Vzcy4uLg==]
     -> Logger.error() [virtual call]
  3. Logger.error() [sym:java:bG9nNGouLi4=]
     [VULNERABLE: CVE-2021-44228]
Runtime Evidence:
  - JFR probe hit: 47 times
  - Last observed: 2025-12-13T10:00:00Z
DSSE Attestation: cas://reachability/graphs/a1b2c3d4....dsse
```
### 4.2 Graph Export Command
Export a reachability graph for offline analysis:
```bash
stella graph export --scan-id scan-123 --output ./evidence-bundle/
# Creates:
# ./evidence-bundle/richgraph-v1.json # Canonical graph
# ./evidence-bundle/richgraph-v1.json.dsse # DSSE envelope
# ./evidence-bundle/meta.json # Metadata
# ./evidence-bundle/runtime-facts.ndjson # Runtime observations
```
### 4.3 Graph Verify Command
Verify a graph's DSSE signature and Rekor inclusion:
```bash
stella graph verify --graph ./evidence-bundle/richgraph-v1.json \
--dsse ./evidence-bundle/richgraph-v1.json.dsse \
--rekor-log
# Output:
Graph Hash: blake3:a1b2c3d4e5f6...
DSSE Signature: VALID (key: scanner-signing-2025)
Rekor Entry: 12345678 (verified)
Timestamp: 2025-12-13T09:30:00Z
```
---
## 5. OpenVEX Integration
### 5.1 OpenVEX with Reachability Evidence
When Policy emits VEX decisions, reachability evidence is included:
```json
{
  "@context": "https://openvex.dev/ns/v0.2.0",
  "@id": "https://stellaops.example/vex/2025-12-13/001",
  "author": "StellaOps Policy Engine",
  "timestamp": "2025-12-13T10:00:00Z",
  "version": 1,
  "statements": [
    {
      "vulnerability": {"@id": "CVE-2021-44228"},
      "products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
      "status": "affected",
      "impact_statement": "Vulnerable Log4j method reachable from main entry point.",
      "action_statement": "Upgrade to log4j 2.17.1 or later.",
      "stellaops:reachability": {
        "state": "CR",
        "confidence": 0.92,
        "graph_hash": "blake3:a1b2c3d4e5f6...",
        "path_length": 3,
        "evidence_uri": "cas://reachability/graphs/a1b2c3d4..."
      }
    }
  ]
}
```
### 5.2 VEX "not_affected" with Unreachability Evidence
When code is provably unreachable:
```json
{
  "statements": [
    {
      "vulnerability": {"@id": "CVE-2023-XXXXX"},
      "products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
      "status": "not_affected",
      "justification": "vulnerable_code_not_in_execute_path",
      "impact_statement": "Vulnerable function not reachable from any entry point.",
      "stellaops:reachability": {
        "state": "CU",
        "confidence": 0.88,
        "graph_hash": "blake3:d4e5f6a7b8c9...",
        "evidence_uri": "cas://reachability/graphs/d4e5f6a7b8c9...",
        "runtime_observation_window": "72h",
        "runtime_hits": 0
      }
    }
  ]
}
```
---
## 6. Replay Manifest v2
### 6.1 Manifest Structure
Replay manifests now enforce BLAKE3 hashing and CAS registration:
```json
{
  "schema": "stellaops.replay.manifest@v2",
  "subject": "scan:123",
  "generatedAt": "2025-12-13T10:00:00Z",
  "hashAlg": "blake3",
  "artifacts": [
    {
      "kind": "richgraph",
      "uri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6...",
      "hash": "blake3:a1b2c3d4e5f6...",
      "dsseUri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6....dsse"
    },
    {
      "kind": "runtime-facts",
      "uri": "cas://reachability/runtime/sha256:xyz789...",
      "hash": "sha256:xyz789..."
    },
    {
      "kind": "sbom",
      "uri": "cas://scanner-artifacts/sbom.cdx.json",
      "hash": "sha256:def456..."
    }
  ],
  "analyzer": {
    "name": "scanner.java",
    "version": "1.2.0",
    "toolchain_digest": "sha256:..."
  },
  "code_id_coverage": {
    "total_symbols": 1247,
    "with_code_id": 1189,
    "coverage_pct": 95.3
  }
}
```
### 6.2 Determinism Verification
Replay a manifest to verify determinism:
```bash
stella replay verify --manifest ./manifest.json --sealed
# Output:
Manifest: stellaops.replay.manifest@v2
Subject: scan:123
Artifacts: 3
Verifying richgraph...
Computed: blake3:a1b2c3d4e5f6...
Expected: blake3:a1b2c3d4e5f6...
Status: MATCH
Verifying runtime-facts...
Computed: sha256:xyz789...
Expected: sha256:xyz789...
Status: MATCH
Verifying sbom...
Computed: sha256:def456...
Expected: sha256:def456...
Status: MATCH
All artifacts verified. Determinism check PASSED.
```
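The per-artifact comparison in that output can be approximated in a few lines. The following is a minimal sketch, not the `stella replay verify` implementation: the function name, the `read_artifact` callback, and the in-memory stand-in for CAS are illustrative assumptions, and only SHA-256 is wired in because BLAKE3 is not part of the Python standard library.

```python
import hashlib

# Hash functions keyed by the prefix used in manifest "hash" values.
# Assumption: BLAKE3 is omitted here because it needs a third-party package;
# a real verifier would register it alongside sha256.
HASHERS = {"sha256": lambda data: hashlib.sha256(data).hexdigest()}


def verify_artifacts(manifest, read_artifact, hashers=HASHERS):
    """Return {kind: status}, where status is MATCH, MISMATCH, or UNSUPPORTED."""
    results = {}
    for artifact in manifest["artifacts"]:
        # "sha256:abc..." -> ("sha256", "abc...")
        alg, _, expected = artifact["hash"].partition(":")
        if alg not in hashers:
            results[artifact["kind"]] = "UNSUPPORTED"
            continue
        computed = hashers[alg](read_artifact(artifact["uri"]))
        results[artifact["kind"]] = "MATCH" if computed == expected else "MISMATCH"
    return results


# Usage with a hypothetical in-memory CAS holding one artifact.
cas = {"cas://example/sbom": b'{"bomFormat":"CycloneDX"}'}
manifest = {"artifacts": [{
    "kind": "sbom",
    "uri": "cas://example/sbom",
    "hash": "sha256:" + hashlib.sha256(cas["cas://example/sbom"]).hexdigest(),
}]}
print(verify_artifacts(manifest, cas.__getitem__))
```

Reporting unsupported algorithms explicitly, rather than skipping them, keeps a missing hasher from being mistaken for a passing check.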
---
## 7. Module Integration Guide
### 7.1 Scanner -> Signals
Scanner emits richgraph-v1 with `code_id` and `symbol_id`:
1. Scanner analyzes container/artifact
2. Callgraph generators emit nodes with `symbol_id`, `code_id`, `build_id`
3. RichGraphWriter canonicalizes (sorted arrays/keys) and computes `graph_hash` (BLAKE3)
4. DSSE signer wraps canonical JSON
5. CAS store persists body + envelope
6. Signals ingestion API receives URI reference
### 7.2 Signals -> Policy
Signals provides reachability facts to Policy:
1. Policy queries `/signals/facts/{subjectKey}`
2. Response includes `metadata.fact.digest`, `states[]`, `score`
3. Policy gates check `latticeState` (U, SR, SU, RO, RU, CR, CU, X)
4. Evidence blocks in findings reference `graph_hash`, `path[]`, `runtime_hits[]`
### 7.3 Policy -> VEX/UI
Policy emits OpenVEX with evidence:
1. VexDecisionEmitter serializes OpenVEX with `stellaops:reachability` extension
2. UI explain drawer fetches evidence via `/api/policy/findings/{id}/explain`
3. CLI `stella graph explain` renders call path and attestation refs
---
## 8. CAS Layout Reference
```
cas://reachability/
  graphs/
    {blake3}/                      # Graph body (canonical JSON)
    {blake3}.dsse                  # DSSE envelope
  edges/
    {graph_hash}/{bundle_id}       # Edge bundle body (optional)
    {graph_hash}/{bundle_id}.dsse
  runtime/
    {sha256}/                      # Runtime facts NDJSON
```
---
## 9. Related Documentation
- [Reachability Lattice Model](./lattice.md) - State definitions and join rules
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Schema specification
- [Evidence Schema](./evidence-schema.md) - Detailed field definitions
- [Signals API Contract](../api/signals/reachability-contract.md) - API reference
- [Policy Gates](./policy-gate.md) - Gate configuration
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
- [Ground Truth Schema](./ground-truth-schema.md) - Test fixture format
---
_Last updated: 2025-12-13. See Sprint 0401 GAP-DOC-008 for change history._

@@ -0,0 +1,206 @@
# Gate Detection for Reachability Scoring
> **Sprint:** SPRINT_3405_0001_0001
> **Module:** Scanner Reachability / Signals
## Overview
Gate detection identifies protective controls in code paths that reduce the likelihood of vulnerability exploitation. When a vulnerable function is protected by authentication, feature flags, admin-only checks, or configuration gates, the reachability score is reduced proportionally.
## Gate Types
| Gate Type | Multiplier | Description |
|-----------|------------|-------------|
| `AuthRequired` | 30% | Code path requires authentication |
| `FeatureFlag` | 20% | Code path behind a feature flag |
| `AdminOnly` | 15% | Code path requires admin/elevated role |
| `NonDefaultConfig` | 50% | Code path requires non-default configuration |
### Multiplier Stacking
Multiple gate types stack multiplicatively:
```
Auth (30%) × Feature Flag (20%) = 6%
Auth (30%) × Admin (15%)        = 4.5% (floored to 5%)
All four gates                  = 0.45% (floored to 5%)
```
A minimum floor of **5%** prevents scores from reaching zero.
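The stacking rule can be sketched in integer basis points, the unit used by `combinedMultiplierBps`. This is a minimal illustration rather than the Scanner implementation: the function name, the truncating integer division, and the default floor of 500 bps (5%) are assumptions chosen to reproduce the figures above.

```python
from functools import reduce

BPS_SCALE = 10_000  # 10000 basis points = 100%


def combine_gate_multipliers(multipliers_bps, floor_bps=500):
    """Multiply gate multipliers (in basis points) and apply the minimum floor.

    Hypothetical helper: truncating after each multiplication is an assumption.
    """
    combined = reduce(lambda acc, m: acc * m // BPS_SCALE, multipliers_bps, BPS_SCALE)
    return max(combined, floor_bps)


print(combine_gate_multipliers([3000, 2000]))              # Auth x FeatureFlag
print(combine_gate_multipliers([3000, 1500]))              # Auth x Admin, raised to the floor
print(combine_gate_multipliers([3000, 2000, 1500, 5000]))  # all four gates, raised to the floor
```

With no gates the function returns 10000 bps, so an ungated path keeps its full score.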
## Detection Methods
### AuthGateDetector
Detects authentication requirements:
**C# Patterns:**
- `[Authorize]` attribute
- `User.Identity.IsAuthenticated` checks
- `HttpContext.User` access
- JWT/Bearer token validation
**Java Patterns:**
- `@PreAuthorize`, `@Secured` annotations
- `SecurityContextHolder.getContext()`
- Spring Security filter chains
**Go Patterns:**
- Middleware patterns (`authMiddleware`, `RequireAuth`)
- Context-based auth checks
**JavaScript/TypeScript Patterns:**
- Express.js `passport` middleware
- JWT verification middleware
- Session checks
### FeatureFlagDetector
Detects feature flag guards:
**Patterns:**
- LaunchDarkly: `ldClient.variation()`, `ld.boolVariation()`
- Split.io: `splitClient.getTreatment()`
- Unleash: `unleash.isEnabled()`
- Custom: `featureFlags.isEnabled()`, `isFeatureEnabled()`
### AdminOnlyDetector
Detects admin/role requirements:
**Patterns:**
- `[Authorize(Roles = "Admin")]`
- `User.IsInRole("Admin")`
- `@RolesAllowed("ADMIN")`
- RBAC middleware checks
### ConfigGateDetector
Detects configuration-based gates:
**Patterns:**
- Environment variable checks (`process.env.ENABLE_FEATURE`)
- Configuration file conditionals
- Runtime feature toggles
- Debug-only code paths
## Output Contract
### DetectedGate
**Note:** In **Signals API outputs**, `type` is serialized as the C# enum name (e.g., `"AuthRequired"`). In **richgraph-v1** JSON, `type` is lowerCamelCase and gate fields are snake_case (see example below).
```typescript
interface DetectedGate {
  type: 'AuthRequired' | 'FeatureFlag' | 'AdminOnly' | 'NonDefaultConfig';
  detail: string;           // Human-readable description
  guardSymbol: string;      // Symbol where gate was detected
  sourceFile?: string;      // Source file location
  lineNumber?: number;      // Line number
  confidence: number;       // 0.0-1.0 confidence score
  detectionMethod: string;  // Detection algorithm used
}
```
### GateDetectionResult
```typescript
interface GateDetectionResult {
  gates: DetectedGate[];
  hasGates: boolean;
  primaryGate?: DetectedGate;     // Highest confidence gate
  combinedMultiplierBps: number;  // Basis points (10000 = 100%)
}
```
## Integration
### RichGraph Edge Annotation
Gates are annotated on `RichGraphEdge` objects:
```csharp
public sealed record RichGraphEdge
{
    // ... existing properties ...

    /// <summary>Gates detected on this edge</summary>
    public IReadOnlyList<DetectedGate> Gates { get; init; } = [];

    /// <summary>Combined gate multiplier in basis points</summary>
    public int GateMultiplierBps { get; init; } = 10000;
}
```
**richgraph-v1 JSON example (edge fragment):**
```json
{
  "gate_multiplier_bps": 3000,
  "gates": [
    {
      "type": "authRequired",
      "detail": "[Authorize] attribute on controller",
      "guard_symbol": "MyController.VulnerableAction",
      "source_file": "src/MyController.cs",
      "line_number": 42,
      "detection_method": "csharp.attribute",
      "confidence": 0.95
    }
  ]
}
```
### ReachabilityReport
Gates are included in the reachability report:
```json
{
  "vulnId": "CVE-2024-0001",
  "reachable": true,
  "score": 7.5,
  "adjustedScore": 2.25,
  "gates": [
    {
      "type": "AuthRequired",
      "detail": "[Authorize] attribute on controller",
      "guardSymbol": "MyController.VulnerableAction",
      "confidence": 0.95
    }
  ],
  "gateMultiplierBps": 3000
}
```
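The relationship between `score`, `gateMultiplierBps`, and `adjustedScore` in this report appears to be a plain scaling by basis points. A minimal sketch, assuming that relationship holds (the function name is hypothetical):

```python
def apply_gate_multiplier(score, gate_multiplier_bps):
    """Scale a raw score by a gate multiplier expressed in basis points."""
    return score * gate_multiplier_bps / 10_000


# 7.5 scaled by 3000 bps (30%) gives the adjustedScore shown in the report.
print(apply_gate_multiplier(7.5, 3000))
```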
## Configuration
### appsettings.json
```json
{
  "Reachability": {
    "GateMultipliers": {
      "AuthRequiredMultiplierBps": 3000,
      "FeatureFlagMultiplierBps": 2000,
      "AdminOnlyMultiplierBps": 1500,
      "NonDefaultConfigMultiplierBps": 5000,
      "MinimumMultiplierBps": 500
    }
  }
}
```
## Metrics
| Metric | Description |
|--------|-------------|
| `scanner.gates_detected_total` | Total gates detected by type |
| `scanner.gate_reduction_applied` | Histogram of multiplier reductions |
| `scanner.gated_vulns_total` | Vulnerabilities with gates detected |
## Related Documentation
- [Reachability Architecture](../modules/scanner/architecture.md)
- [Determinism Technical Reference](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md) - Sections 2.2, 4.3
- [Signals Service](../modules/signals/architecture.md)

@@ -0,0 +1,508 @@
# Hybrid Reachability Attestation (Graph + Edge-Bundle)
> Decision date: 2025-12-11 · Owners: Scanner Guild, Attestor Guild, Signals Guild, Policy Guild
## 0. Context: Four Capabilities
This document supports **Signed Reachability**—one of four capabilities no competitor offers together:
1. **Signed Reachability:** Every reachability graph is sealed with DSSE, with optional edge-bundle attestations for runtime/init/contested paths. Both static call-graph edges and runtime-derived edges can be attested, which is what makes the reachability hybrid.
2. **Deterministic Replay:** Scans replay bit-for-bit identically from frozen feeds and analyzer manifests.
3. **Explainable Policy (Lattice VEX):** Evidence-linked VEX decisions with explicit "Unknown" state handling.
4. **Sovereign + Offline Operation:** FIPS/eIDAS/GOST/SM/PQC profiles and offline mirrors as first-class toggles.
All evidence is sealed in **Decision Capsules** for audit-grade reproducibility.
---
## 1. Purpose
- Guarantee replayable, signed reachability evidence with **graph-level DSSE** for every scan while enabling **selective edge-level DSSE bundles** when finer provenance or dispute handling is required.
- Keep CI/offline bundles lean (graph-first), but allow auditors/regulators to quarantine or prove individual edges without regenerating whole graphs.
- Support **hybrid reachability** by attesting both static call-graph edges and runtime-derived edges.
## 2. Attestation levels
- **Level 0 (Graph DSSE): Required**
  - Payload: canonical `richgraph-v1` (nodes, edges, roots, graph_hash, analyzer metadata, policy_hash).
  - Signature: one DSSE envelope per graph; submit digest to Rekor (or mirror) always.
  - CAS: `cas://reachability/graphs/{blake3}` (body) + `cas://reachability/graphs/{blake3}.dsse` (envelope).
- **Level 1 (Edge-Bundle DSSE): Optional/Selective**
  - Payload: batch of edges (size ≤ 512) with per-edge reason, evidence hashes, `symbol_digest`, `purl`, `confidence`, and `phase`.
  - Criteria to emit bundles:
    - Edge reason is `runtime`, `init_array`/constructors/TLS callbacks, or comes from third-party provenance.
    - Edge is contested/flagged in Unknowns registry or under policy quarantine.
  - Signature: one DSSE envelope per bundle; Rekor submission **configurable** (default on for contested/high-risk bundles, off for bulk benign bundles in sealed mode).
  - CAS: `cas://reachability/edges/{graph_hash}/{bundle_id}` JSON + `.../{bundle_id}.dsse`.
## 3. Producer responsibilities
- **Scanner**
  - Always emit Level 0 graph + manifest.
  - When criteria match, emit Level 1 bundles; include `bundle_reason` (e.g., `runtime-hit`, `init-root`, `third-party`, `disputed`).
  - Canonicalise JSON (sorted keys/arrays) before hashing; BLAKE3 as graph hash, SHA-256 inside bundles.
  - For hybrid reachability: tag edges with `source: static` or `source: runtime` to distinguish call-graph derived vs. runtime-observed edges.
- **Attestor/Signer**
  - Apply DSSE for both levels; respect sovereign crypto modes (FIPS/GOST/SM/PQC) from environment.
  - Rekor: push graph envelope digests; push edge-bundle digests only when `rekor_publish=true` (policy/default for high-risk bundles).
## 4. Consumer responsibilities
- **Signals**
  - Ingest graph DSSE as the canonical source; ingest edge-bundles when present and attach to the same `graph_hash`.
  - Store per-edge DSSE metadata for quarantine/override flows; surface missing edges as Unknowns only when absent from both graph and bundles.
- **Policy**
  - Default trust path: graph DSSE + CAS object.
  - When an edge is quarantined/contested, drop it from consideration if an edge-bundle DSSE marks it `revoked=true` or if the Unknowns registry lists it with policy quarantine flag.
  - For "evidence-required" rules, require either (a) graph DSSE + policy_hash match **or** (b) edge-bundle DSSE that covers the vulnerable path edges.
- **Replay/Bench/CLI**
  - `stella graph verify` should accept `--graph {hash}` and optional `--edge-bundles` to validate deeper provenance offline.
## 5. Verification and quarantine flows
- **Happy path**: verify graph DSSE → verify Rekor inclusion (or mirror) → hash graph body → match `graph_hash` in policy/replay manifest → accept.
- **Dispute/quarantine**: mark specific `edge_id` as `revoked` in an edge-bundle DSSE; Policy/Signals exclude it, recompute reachability, and surface delta in explainers.
- **Offline**: retain graph DSSE and selected edge-bundles inside replay pack; Rekor proofs cached when available.
- **Sovereign Verification Mode**: Even with no internet, all signatures and transparency proofs can be locally verified using Offline Update Kits.
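
The dispute/quarantine flow amounts to recomputing reachability with the revoked edges removed. The sketch below illustrates that idea with a breadth-first search; the edge representation as `(caller, callee)` tuples and the function name are assumptions for illustration, not the Signals data model.

```python
from collections import deque


def reachable(edges, entry_points, target, revoked=frozenset()):
    """Breadth-first search over (caller, callee) edges, skipping revoked ones."""
    adjacency = {}
    for caller, callee in edges:
        if (caller, callee) not in revoked:
            adjacency.setdefault(caller, []).append(callee)

    seen, queue = set(entry_points), deque(entry_points)
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


edges = [("main", "process"), ("process", "log_error")]
before = reachable(edges, ["main"], "log_error")
after = reachable(edges, ["main"], "log_error", revoked={("process", "log_error")})
print(before, after)  # the difference is the delta an explainer would surface
```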
## 6. Performance & storage guardrails
- Default: only graph DSSE is mandatory; edge-bundles capped at 512 edges per envelope and emitted only on criteria above.
- Rekor flood control: cap edge-bundle Rekor submissions per graph (config `reachability.edgeBundles.maxRekorPublishes`, default 5). Others stay CAS-only.
- Determinism: bundle ordering = stable sort by `(bundle_reason, edge_id)`; hash before signing.
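
A minimal sketch of the ordering and size cap described above. It assumes each bundle carries a single `bundle_reason` and that edges are plain dictionaries with `bundle_reason` and `edge_id` keys; both are illustrative choices, not the `EdgeBundlePublisher` contract.

```python
MAX_EDGES_PER_BUNDLE = 512  # cap stated in the guardrails above


def build_bundles(edges, max_edges=MAX_EDGES_PER_BUNDLE):
    """Sort edges by (bundle_reason, edge_id), then split into capped bundles."""
    ordered = sorted(edges, key=lambda e: (e["bundle_reason"], e["edge_id"]))
    bundles = []
    for edge in ordered:
        last = bundles[-1] if bundles else None
        needs_new = (
            last is None
            or last["bundle_reason"] != edge["bundle_reason"]
            or len(last["edges"]) >= max_edges
        )
        if needs_new:
            last = {"bundle_reason": edge["bundle_reason"], "edges": []}
            bundles.append(last)
        last["edges"].append(edge["edge_id"])
    return bundles


sample = [
    {"bundle_reason": "runtime-hit", "edge_id": "e3"},
    {"bundle_reason": "disputed", "edge_id": "e2"},
    {"bundle_reason": "runtime-hit", "edge_id": "e1"},
    {"bundle_reason": "runtime-hit", "edge_id": "e5"},
]
print(build_bundles(sample, max_edges=2))
```

Because the sort key is total and the input order is ignored, the same set of edges always yields the same bundles, which is the property the hash-before-signing step relies on.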
## 7. Hybrid Reachability Details
Stella Ops provides **true hybrid reachability** by combining:
| Signal Type | Source | Attestation |
|-------------|--------|-------------|
| Static call-graph edges | IL/bytecode analysis, framework routing models, entry-point proximity | Graph DSSE (Level 0) |
| Runtime-observed edges | EventPipe, JFR, Node inspector, Go/Rust probes | Edge-bundle DSSE (Level 1) with `source: runtime` |
**Why hybrid matters:**
- Static analysis catches code paths that may not execute during observed runtime
- Runtime analysis catches dynamic dispatch, reflection, and framework-injected paths
- Combining both provides confidence across build and runtime contexts
- Each edge type is separately attestable for audit and dispute resolution
**Evidence linking:** Each edge in the graph or bundle includes `evidenceRefs` pointing to the underlying proof artifacts (static analysis artifacts, runtime traces), enabling **evidence-linked VEX decisions**.
## 8. Decisions (Frozen 2025-12-13)
### 8.1 DSSE/Rekor Budget by Deployment Tier
| Tier | Graph DSSE | Edge-Bundle DSSE | Rekor Publish | Max Bundles/Graph |
|------|------------|------------------|---------------|-------------------|
| **Regulated** (SOC2, FedRAMP, PCI) | Required | Required for runtime/contested | Required | 10 |
| **Standard** | Required | Optional (criteria-based) | Graph only | 5 |
| **Air-gapped** | Required | Optional | Offline checkpoint | 5 |
| **Dev/Test** | Optional | Optional | Disabled | Unlimited |
**Budget enforcement:**
- Graph DSSE: Always submit digest to Rekor (or offline checkpoint for air-gapped)
- Edge-bundle DSSE: Submit to Rekor only when `bundle_reason` is `disputed`, `runtime-hit`, or `security-critical`
- Cap enforced by `reachability.edgeBundles.maxRekorPublishes` config (per tier defaults above)
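
The budget rule can be sketched as a filter plus a cap. This is an illustration under assumptions: the bundle dictionaries, the function name, and the tie-breaking order used when more bundles qualify than the cap allows are all hypothetical.

```python
PUBLISH_REASONS = {"disputed", "runtime-hit", "security-critical"}


def select_rekor_publishes(bundles, max_publishes=5):
    """Return the bundle IDs to publish to Rekor; all others stay CAS-only."""
    eligible = [b for b in bundles if b["bundle_reason"] in PUBLISH_REASONS]
    # Assumption: a deterministic order decides which bundles fit under the cap.
    eligible.sort(key=lambda b: (b["bundle_reason"], b["bundle_id"]))
    return [b["bundle_id"] for b in eligible[:max_publishes]]


bundles = [
    {"bundle_id": "bundle:003", "bundle_reason": "third-party"},
    {"bundle_id": "bundle:002", "bundle_reason": "runtime-hit"},
    {"bundle_id": "bundle:001", "bundle_reason": "disputed"},
]
print(select_rekor_publishes(bundles))
print(select_rekor_publishes(bundles, max_publishes=1))
```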
### 8.2 Signing Layout and CAS Paths
```
cas://reachability/
  graphs/
    {blake3}/              # richgraph-v1 body (JSON)
    {blake3}.dsse          # Graph DSSE envelope
    {blake3}.rekor         # Rekor inclusion proof (optional)
  edges/
    {graph_hash}/
      {bundle_id}.json     # Edge bundle body
      {bundle_id}.dsse     # Edge bundle DSSE envelope
      {bundle_id}.rekor    # Rekor inclusion proof (if published)
  revisions/
    {revision_id}/         # Revision manifest + lineage
```
**Signing workflow:**
1. Canonicalize richgraph-v1 JSON (sorted keys, arrays by deterministic key)
2. Compute BLAKE3-256 hash -> `graph_hash`
3. Create DSSE envelope with `stella.ops/graph@v1` predicate
4. Submit digest to Rekor (online) or cache checkpoint (offline)
5. Store graph body + envelope + proof in CAS
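
Steps 1 to 3 can be sketched as follows. This is not the RichGraphWriter or Attestor code: the helper names are hypothetical, array ordering is left to the caller, SHA-256 stands in for BLAKE3-256 (which is not in the Python standard library), and the signing call itself is omitted. The pre-authentication encoding follows the DSSE specification.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def graph_hash(graph: dict) -> str:
    """Stand-in digest: SHA-256 here, whereas the workflow specifies BLAKE3-256."""
    return "sha256:" + hashlib.sha256(canonical_json(graph)).hexdigest()


def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the byte string that gets signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(payload), payload)


# Key order in the input does not affect the digest.
a = {"nodes": [{"id": "n1"}], "edges": []}
b = {"edges": [], "nodes": [{"id": "n1"}]}
print(graph_hash(a) == graph_hash(b))
```

Signing the PAE output rather than the raw payload binds the payload type into the signature, so an envelope cannot be reinterpreted under a different type.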
### 8.3 CLI UX for Selective Bundle Verification
```bash
# Verify graph DSSE only (default)
stella graph verify --hash blake3:a1b2c3d4...
# Verify graph + all edge bundles
stella graph verify --hash blake3:a1b2c3d4... --include-bundles
# Verify specific edge bundle
stella graph verify --hash blake3:a1b2c3d4... --bundle bundle:001
# Offline verification with local CAS
stella graph verify --hash blake3:a1b2c3d4... --cas-root ./offline-cas/
# Verify Rekor inclusion
stella graph verify --hash blake3:a1b2c3d4... --rekor-proof
# Output formats
stella graph verify --hash blake3:a1b2c3d4... --format json|table|summary
```
### 8.4 Golden Fixture Plan
**Fixture location:** `tests/Reachability/Hybrid/`
**Required fixtures:**
| Fixture | Description | Expected Verification Time |
|---------|-------------|---------------------------|
| `graph-only.golden.json` | Minimal richgraph-v1 with DSSE | < 100ms |
| `graph-with-runtime.golden.json` | Graph + 1 runtime edge bundle | < 200ms |
| `graph-with-contested.golden.json` | Graph + 1 contested/revoked edge bundle | < 200ms |
| `large-graph.golden.json` | 10K nodes, 50K edges, 5 bundles | < 2s |
| `offline-bundle.golden.tgz` | Complete offline replay pack | < 5s |
**CI integration:**
- `.gitea/workflows/hybrid-attestation.yml` runs verification fixtures
- Size gate: Graph body < 10MB, individual bundle < 1MB
- Time gate: Full verification < 5s for standard tier
### 8.5 Implementation Status
| Component | Status | Notes |
|-----------|--------|-------|
| Graph DSSE predicate | Done | `stella.ops/graph@v1` in PredicateTypes.cs |
| Edge-bundle DSSE predicate | Done | `stella.ops/edgeBundle@v1` via EdgeBundlePublisher |
| Edge-bundle models | Done | EdgeBundle.cs, EdgeBundleReason, EdgeReason enums |
| Edge-bundle CAS publisher | Done | EdgeBundlePublisher.cs with deterministic DSSE |
| Edge-bundle ingestion | Done | EdgeBundleIngestionService in Signals |
| CAS layout | Done | Per section 8.2 |
| Runtime-facts CAS storage | Done | IRuntimeFactsArtifactStore, FileSystemRuntimeFactsArtifactStore |
| CLI verify command | Planned | Per section 8.3 |
| Golden fixtures | Planned | Per section 8.4 |
| Rekor integration | Done | Via Attestor module |
| Quarantine enforcement | Done | HasQuarantinedEdges in ReachabilityFactDocument |
---
## 9. Verification Runbook
This section provides step-by-step guidance for verifying hybrid attestations in different scenarios.
### 9.1 Graph-Only Verification
Use this workflow when only graph-level attestation is required (default for most use cases).
**Prerequisites:**
- Access to CAS storage (local or remote)
- `stella` CLI installed
- Optional: Rekor instance access for transparency verification
**Steps:**
1. **Retrieve graph DSSE envelope:**
```bash
stella graph fetch --hash blake3:<graph_hash> --output ./verification/
```
2. **Verify DSSE signature:**
```bash
stella graph verify --hash blake3:<graph_hash>
# Output: ✓ Graph signature valid (key: <key_id>)
```
3. **Verify content integrity:**
```bash
stella graph verify --hash blake3:<graph_hash> --check-content
# Output: ✓ Content hash matches BLAKE3:<graph_hash>
```
4. **Verify Rekor inclusion (online):**
```bash
stella graph verify --hash blake3:<graph_hash> --rekor-proof
# Output: ✓ Rekor inclusion verified (log index: <index>)
```
5. **Verify policy hash binding:**
```bash
stella graph verify --hash blake3:<graph_hash> --policy-hash sha256:<policy_hash>
# Output: ✓ Policy hash matches graph metadata
```
### 9.2 Graph + Edge-Bundle Verification
Use this workflow when finer-grained verification of specific edges is required.
**When to use:**
- Auditing runtime-observed paths
- Investigating contested/disputed edges
- Verifying init-section or TLS callback roots
- Regulatory compliance requiring edge-level attestation
**Steps:**
1. **List available edge bundles:**
```bash
stella graph bundles --hash blake3:<graph_hash>
# Output:
# Bundle ID Reason Edges Rekor
# bundle:001 runtime-hit 42 ✓
# bundle:002 init-root 15 ✓
# bundle:003 third-party 128 -
```
2. **Verify specific bundle:**
```bash
stella graph verify --hash blake3:<graph_hash> --bundle bundle:001
# Output:
# ✓ Bundle DSSE signature valid
# ✓ All 42 edges link to graph_hash
# ✓ Rekor inclusion verified
```
3. **Verify all bundles:**
```bash
stella graph verify --hash blake3:<graph_hash> --include-bundles
# Output:
# ✓ Graph signature valid
# ✓ 3 bundles verified (185 edges total)
```
4. **Check for revoked edges:**
```bash
stella graph verify --hash blake3:<graph_hash> --check-revoked
# Output:
# ⚠ 2 edges marked revoked in bundle:002
# - edge:func_a→func_b (reason: policy-quarantine)
# - edge:func_c→func_d (reason: revoked)
```
### 9.3 Verification Decision Matrix
| Scenario | Graph DSSE | Edge Bundles | Rekor | Policy Hash |
|----------|------------|--------------|-------|-------------|
| Standard CI/CD | Required | Optional | Recommended | Required |
| Regulated audit | Required | Required | Required | Required |
| Dispute resolution | Required | Required (contested) | Required | Optional |
| Offline replay | Required | As available | Cached proof | Required |
| Dev/test | Optional | Optional | Disabled | Optional |
---
## 10. Rekor Guidance
### 10.1 Rekor Integration Overview
Rekor provides an immutable transparency log for attestation artifacts. StellaOps integrates with Rekor (or compatible mirrors) to provide verifiable timestamps and inclusion proofs.
### 10.2 What Gets Published to Rekor
| Artifact Type | Rekor Publish | Condition |
|---------------|---------------|-----------|
| Graph DSSE digest | Always | All deployment tiers (except dev/test) |
| Edge-bundle DSSE digest | Conditional | Only for `disputed`, `runtime-hit`, `security-critical` reasons |
| VEX decision DSSE digest | Always | When VEX decisions are generated |
### 10.3 Rekor Configuration
```yaml
# etc/signals.yaml
reachability:
  rekor:
    enabled: true
    endpoint: "https://rekor.sigstore.dev"  # Or private mirror
    timeout: 30s
    retry:
      attempts: 3
      backoff: exponential
  edgeBundles:
    maxRekorPublishes: 5  # Per graph, configurable by tier
    publishReasons:
      - disputed
      - runtime-hit
      - security-critical
```
### 10.4 Private Rekor Mirror
For air-gapped or regulated environments:
```yaml
reachability:
  rekor:
    enabled: true
    endpoint: "https://rekor.internal.example.com"
    tls:
      ca: /etc/stellaops/ca.crt
      clientCert: /etc/stellaops/client.crt
      clientKey: /etc/stellaops/client.key
```
### 10.5 Rekor Proof Caching
Inclusion proofs are cached locally for offline verification:
```
cas://reachability/graphs/{blake3}.rekor # Graph inclusion proof
cas://reachability/edges/{graph_hash}/{bundle_id}.rekor # Bundle proof
```
**Proof format:**
```json
{
  "logIndex": 12345678,
  "logId": "c0d23d6ad406973f9559f3ba2d1ca01f84147d8ffc5b8445c224f98b9591801d",
  "integratedTime": 1702492800,
  "inclusionProof": {
    "logIndex": 12345678,
    "rootHash": "abc123...",
    "treeSize": 50000000,
    "hashes": ["def456...", "ghi789..."]
  }
}
```
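The fields in `inclusionProof` are the inputs to a standard Merkle audit-path check. The sketch below follows the inclusion-proof verification algorithm from RFC 9162 (SHA-256, `0x00` leaf prefix, `0x01` node prefix). It assumes the cached proof uses that tree layout and says nothing about how the `stella` CLI performs the check; the function names are illustrative.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    """Merkle leaf hash: SHA-256 over a 0x00 prefix plus the entry bytes."""
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Merkle interior node hash: SHA-256 over 0x01 plus both children."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, index: int, tree_size: int, path, root: bytes) -> bool:
    """Recompute the root from a leaf hash and its audit path."""
    if not 0 <= index < tree_size:
        return False
    fn, sn, r = index, tree_size - 1, leaf
    for p in path:
        if sn == 0:
            return False
        if (fn & 1) == 1 or fn == sn:
            r = node_hash(p, r)
            # Skip levels where this node has no right sibling.
            while (fn & 1) == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


# Three-leaf tree built by hand to exercise the check.
leaves = [leaf_hash(e) for e in (b"a", b"b", b"c")]
left = node_hash(leaves[0], leaves[1])
root = node_hash(left, leaves[2])
print(verify_inclusion(leaves[2], 2, 3, [left], root))
```

For a cached proof, the `hashes` and `rootHash` strings would first be decoded with `bytes.fromhex`, and the root should itself be checked against a signed checkpoint before it is trusted.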
---
## 11. Offline Replay Steps
### 11.1 Overview
Offline replay enables full verification of reachability attestations without network access. This is essential for air-gapped deployments and regulatory compliance scenarios.
### 11.2 Creating an Offline Replay Pack
**Step 1: Export graph and bundles**
```bash
stella graph export --hash blake3:<graph_hash> \
--include-bundles \
--include-rekor-proofs \
--output ./offline-pack/
```
**Step 2: Include required artifacts**
The export creates:
```
offline-pack/
├── manifest.json                 # Replay manifest v2
├── graphs/
│   └── <blake3>/
│       ├── richgraph-v1.json     # Graph body
│       ├── graph.dsse            # DSSE envelope
│       └── graph.rekor           # Inclusion proof
├── edges/
│   └── <graph_hash>/
│       ├── bundle-001.json
│       ├── bundle-001.dsse
│       └── bundle-001.rekor
├── runtime-facts/
│   └── <hash>/
│       └── runtime-facts.ndjson
└── checkpoints/
    └── rekor-checkpoint.json     # Transparency log checkpoint
```
**Step 3: Bundle for transfer**
```bash
stella offline pack --input ./offline-pack/ --output offline-replay.tgz
```
### 11.3 Verifying an Offline Pack
**Step 1: Extract pack**
```bash
stella offline unpack --input offline-replay.tgz --output ./verify/
```
**Step 2: Verify manifest integrity**
```bash
stella offline verify --manifest ./verify/manifest.json
# Output:
# ✓ Manifest version: 2
# ✓ Hash algorithm: blake3
# ✓ All CAS entries present
# ✓ All hashes verified
```
**Step 3: Verify attestations offline**
```bash
stella graph verify --hash blake3:<graph_hash> \
--cas-root ./verify/ \
--offline
# Output:
# ✓ Graph DSSE signature valid (offline mode)
# ✓ Rekor proof verified against checkpoint
# ✓ 3 bundles verified offline
```
### 11.4 Offline Verification Trust Model
```
┌─────────────────────────────────────────────────────────┐
│                      Offline Pack                       │
├─────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Graph DSSE   │  │ Edge Bundle  │  │ Rekor        │   │
│  │ Envelope     │  │ DSSE         │  │ Checkpoint   │   │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘   │
│         │                 │                 │           │
│         ▼                 ▼                 ▼           │
│  ┌───────────────────────────────────────────────────┐  │
│  │             Local Verification Engine             │  │
│  │  1. Verify DSSE signatures against trusted keys   │  │
│  │  2. Verify content hashes match DSSE payloads     │  │
│  │  3. Verify Rekor proofs against checkpoint        │  │
│  │  4. Verify policy hash binding                    │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
```
### 11.5 Air-Gapped Deployment Checklist
- [ ] Trusted signing keys pre-installed
- [ ] Rekor checkpoint from last sync included
- [ ] All referenced CAS artifacts bundled
- [ ] Policy hash recorded in manifest
- [ ] Analyzer manifests included for replay
- [ ] Runtime-facts artifacts included (if applicable)
---
## 12. Release Notes
### 12.1 Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-11 | Initial hybrid attestation design |
| 1.1 | 2025-12-13 | Added edge-bundle ingestion, CAS storage, verification runbook |
### 12.2 Breaking Changes
None. Hybrid attestation is additive; existing graph-only workflows remain unchanged.
### 12.3 Migration Guide
**From graph-only to hybrid:**
1. No migration required for existing graphs
2. Enable edge-bundle emission in scanner config:
```yaml
scanner:
  reachability:
    edgeBundles:
      enabled: true
      emitRuntime: true
      emitContested: true
```
3. Signals automatically ingests edge bundles when present
---
## 13. Cross-References
- **Sprint:** SPRINT_0401_0001_0001_reachability_evidence_chain.md (Tasks 53-56)
- **Contracts:** docs/contracts/richgraph-v1.md, docs/contracts/edge-bundle-v1.md
- **Implementation:**
  - Scanner: `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/EdgeBundle*.cs`
  - Signals: `src/Signals/StellaOps.Signals/Ingestion/EdgeBundleIngestionService.cs`
  - Policy: `src/Policy/StellaOps.Policy.Engine/Gates/PolicyGateEvaluator.cs`
- **Related docs:**
  - docs/modules/reach-graph/guides/function-level-evidence.md
  - docs/modules/reach-graph/guides/lattice.md
  - docs/replay/DETERMINISTIC_REPLAY.md
  - docs/ARCHITECTURE_OVERVIEW.md

@@ -0,0 +1,254 @@
# Reachability Lattice & Scoring Model
> **Status:** Implemented v0 in Signals; this document describes the current deterministic bucket model and its policy-facing implications.
> **Owners:** Scanner Guild · Signals Guild · Policy Guild.
StellaOps models reachability as a deterministic, evidence-linked outcome that can safely represent "unknown" without silently producing false safety. Signals produces a `ReachabilityFactDocument` with per-target `states[]` and a top-level `score` that is stable under replays.
---
## 1. Current model (Signals v0)
Signals scoring (`src/Signals/StellaOps.Signals/Services/ReachabilityScoringService.cs`) computes, for each `target` symbol:
- `reachable`: whether there exists a path from the selected `entryPoints[]` to `target`.
- `bucket`: a coarse classification of *why* the target is/was reachable.
- `confidence` (0..1): a bounded confidence value.
- `weight` (0..1): bucket multiplier.
- `score` (0..1): `confidence * weight`.
- `path[]`: the discovered path (if reachable), deterministically ordered.
- `evidence.runtimeHits[]`: runtime hit symbols that appear on the chosen path.
The fact-level `score` is the average of per-target scores, penalized by unknowns pressure (see §4).
---
## 2. Buckets & default weights
Bucket assignment is deterministic and uses this precedence:
1. `unreachable` — no path exists.
2. `entrypoint` — the `target` itself is an entrypoint.
3. `runtime` — at least one runtime hit overlaps the discovered path.
4. `direct` — reachable and the discovered path is length ≤ 2.
5. `unknown` — reachable but none of the above classifications apply.
Default weights (configurable via `SignalsOptions:Scoring:ReachabilityBuckets`):
| Bucket | Default weight |
|--------|----------------|
| `entrypoint` | `1.0` |
| `direct` | `0.85` |
| `runtime` | `0.45` |
| `unknown` | `0.5` |
| `unreachable` | `0.0` |
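
The precedence order can be written as a short chain of checks. This is a sketch of the rules as stated, not `ReachabilityScoringService`: the function name is hypothetical, and path length is assumed to be counted in symbols (an empty path meaning no path exists).

```python
DEFAULT_WEIGHTS = {
    "entrypoint": 1.0,
    "direct": 0.85,
    "runtime": 0.45,
    "unknown": 0.5,
    "unreachable": 0.0,
}


def assign_bucket(target, entry_points, path, runtime_hits):
    """Apply the bucket precedence: first matching rule wins."""
    if not path:
        return "unreachable"
    if target in entry_points:
        return "entrypoint"
    if set(path) & set(runtime_hits):
        return "runtime"
    if len(path) <= 2:  # assumption: length counted in symbols on the path
        return "direct"
    return "unknown"


bucket = assign_bucket("t", {"main"}, ["main", "a", "t"], runtime_hits=["a"])
print(bucket, DEFAULT_WEIGHTS[bucket])
```

Note that a runtime hit takes precedence over `direct`, so a short path with runtime evidence is weighted as `runtime`.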
---
## 3. Confidence (reachable vs unreachable)
Default confidence values (configurable via `SignalsOptions:Scoring:*`):
| Input | Default |
|-------|---------|
| `reachableConfidence` | `0.75` |
| `unreachableConfidence` | `0.25` |
| `runtimeBonus` | `0.15` |
| `minConfidence` | `0.05` |
| `maxConfidence` | `0.99` |
Rules:
- Base confidence is `reachableConfidence` when `reachable=true`, otherwise `unreachableConfidence`.
- When `reachable=true` and runtime evidence overlaps the selected path, add `runtimeBonus` (bounded by `maxConfidence`).
- The final confidence is clamped to `[minConfidence, maxConfidence]`.
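
A minimal sketch of these rules using the default values from the table. The function and parameter names are illustrative, not the `SignalsOptions` binding.

```python
def confidence(
    reachable,
    runtime_overlap,
    reachable_confidence=0.75,
    unreachable_confidence=0.25,
    runtime_bonus=0.15,
    min_confidence=0.05,
    max_confidence=0.99,
):
    """Base confidence plus optional runtime bonus, clamped to [min, max]."""
    value = reachable_confidence if reachable else unreachable_confidence
    if reachable and runtime_overlap:
        value += runtime_bonus
    return min(max(value, min_confidence), max_confidence)


# Reachable with runtime evidence: 0.75 + 0.15, then multiplied by the
# bucket weight to obtain the per-target score.
runtime_score = confidence(True, True) * 0.45
print(round(runtime_score, 3))
```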
---
## 4. Unknowns pressure (missing/ambiguous evidence)
Signals tracks unresolved symbols/edges as **Unknowns** (see `docs/modules/signals/guides/unknowns-registry.md`). The number of unknowns for a subject influences the final score:
```
unknownsPressure = unknownsCount / (targetsCount + unknownsCount)
pressurePenalty = min(unknownsPenaltyCeiling, unknownsPressure)
fact.score = avg(states[i].score) * (1 - pressurePenalty)
```
Default `unknownsPenaltyCeiling` is `0.35` (configurable).
This keeps the system deterministic while preventing unknown-heavy subjects from appearing "safe" by omission.
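The formula can be checked with a small Python sketch. The function name is hypothetical, and returning `0.0` for a fact with no targets is an assumption of this example, not documented behaviour.

```python
def fact_score(target_scores, unknowns_count, unknowns_penalty_ceiling=0.35):
    """Average per-target score, reduced by the capped unknowns pressure."""
    targets_count = len(target_scores)
    if targets_count == 0:
        return 0.0  # assumption: nothing to average, so score zero
    pressure = unknowns_count / (targets_count + unknowns_count)
    penalty = min(unknowns_penalty_ceiling, pressure)
    return (sum(target_scores) / targets_count) * (1 - penalty)
```

For two targets scoring 0.9 and 0.5 with two unknowns, the pressure is 0.5, the penalty is capped at 0.35, and the fact score is 0.7 × 0.65 = 0.455.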
---
## 5. Evidence references & determinism anchors
Signals produces stable references intended for downstream evidence chains:
- `metadata.fact.digest` — canonical digest of the reachability fact (`sha256:<hex>`).
- `metadata.fact.version` — monotonically increasing integer for the same `subjectKey`.
- Callgraph ingestion returns a deterministic `graphHash` (sha256) for the normalized callgraph.
Downstream services (Policy, UI/CLI explainers, replay tooling) should use these fields as stable evidence references.
---
## 6. Policy-facing guidance (avoid false "not affected")
Policy should treat `unreachable` (or low fact score) as **insufficient** to claim "not affected" unless:
- the reachability evidence is present and referenced (`metadata.fact.digest`), and
- confidence is above a high-confidence threshold.
When evidence is missing or confidence is low, the correct output is **under investigation** rather than "not affected".
---
## 7. Signals API pointers
- `docs/modules/signals/api/reachability-contract.md`
- `docs/modules/signals/api/samples/facts-sample.json`
---
## 8. Roadmap (tracked in Sprint 0401)
- Introduce first-class uncertainty state lists + entropy-derived `riskScore` (see `uncertainty-entropy.md`).
- Extend evidence refs to include CAS/DSSE pointers for graph-level and edge-bundle attestations.
---
## 9. Formal Lattice Model v1 (design — Sprint 0401)
The v0 bucket model provides coarse classification. The v1 lattice model introduces a formal 8-state lattice with algebraic join/meet operations for monotonic, deterministic reachability analysis across evidence types.
### 9.1 State Definitions
| State | Code | Ordering | Description |
|-------|------|----------|-------------|
| `Unknown` | `U` | ⊥ (bottom) | No evidence available; default state |
| `StaticallyReachable` | `SR` | 1 | Static analysis suggests path exists |
| `StaticallyUnreachable` | `SU` | 1 | Static analysis finds no path |
| `RuntimeObserved` | `RO` | 2 | Runtime probe/hit confirms execution |
| `RuntimeUnobserved` | `RU` | 2 | Runtime probe active but no hit observed |
| `ConfirmedReachable` | `CR` | 3 | Both static + runtime agree reachable |
| `ConfirmedUnreachable` | `CU` | 3 | Both static + runtime agree unreachable |
| `Contested` | `X` | ⊤ (top) | Static and runtime evidence conflict |
### 9.2 Lattice Ordering (Hasse Diagram)
```
                      Contested (X)
                     /      |      \
                    /       |       \
    ConfirmedReachable      |      ConfirmedUnreachable
           (CR)             |              (CU)
            |    \         /          /     |
            |     \       /          /      |
            |      \     /          /       |
    RuntimeObserved                RuntimeUnobserved
           (RO)                           (RU)
            |                              |
            |                              |
   StaticallyReachable            StaticallyUnreachable
           (SR)                           (SU)
              \                          /
               \                        /
                      Unknown (U)
```
### 9.3 Join Rules (⊔ — least upper bound)
When combining evidence from multiple sources, use the join operation:
```
U ⊔ S = S (any evidence beats unknown)
SR ⊔ RO = CR (static reachable + runtime hit = confirmed)
SU ⊔ RU = CU (static unreachable + runtime miss = confirmed)
SR ⊔ RU = X (static reachable but runtime miss = contested)
SU ⊔ RO = X (static unreachable but runtime hit = contested)
CR ⊔ CU = X (conflicting confirmations = contested)
X ⊔ * = X (contested absorbs all)
```
**Full join table:**
| ⊔ | U | SR | SU | RO | RU | CR | CU | X |
|---|---|----|----|----|----|----|----|---|
| **U** | U | SR | SU | RO | RU | CR | CU | X |
| **SR** | SR | SR | X | CR | X | CR | X | X |
| **SU** | SU | X | SU | X | CU | X | CU | X |
| **RO** | RO | CR | X | RO | X | CR | X | X |
| **RU** | RU | X | CU | X | RU | X | CU | X |
| **CR** | CR | CR | X | CR | X | CR | X | X |
| **CU** | CU | X | CU | X | CU | X | CU | X |
| **X** | X | X | X | X | X | X | X | X |
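The join table follows a simple pattern: `U` is the identity, `X` absorbs everything, two different states on the same side (reachable or unreachable) join to the confirmed state for that side, and states from opposite sides join to `X`. The Python below is a hedged illustration of that pattern, not the planned `ReachabilityLattice.Join` implementation; the set names and the `join_all` helper are hypothetical.

```python
from functools import reduce

REACHABLE_SIDE = {"SR", "RO", "CR"}
UNREACHABLE_SIDE = {"SU", "RU", "CU"}


def join(a, b):
    """Join two lattice state codes, matching the full join table above."""
    if a == b:
        return a                      # idempotent
    if a == "U":
        return b                      # unknown is the identity
    if b == "U":
        return a
    if "X" in (a, b):
        return "X"                    # contested absorbs all
    if a in REACHABLE_SIDE and b in REACHABLE_SIDE:
        return "CR"
    if a in UNREACHABLE_SIDE and b in UNREACHABLE_SIDE:
        return "CU"
    return "X"                        # evidence from opposite sides conflicts


def join_all(states):
    """Fold a sequence of states, starting from Unknown."""
    return reduce(join, states, "U")
```

Because the operation is commutative and associative in this table, `join_all` gives the same result regardless of the order in which evidence arrives, which is what makes accumulation deterministic.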
### 9.4 Meet Rules (⊓ — greatest lower bound)
Used for conservative intersection (e.g., multi-entry-point consensus):
```
U ⊓ * = U (unknown is bottom)
CR ⊓ CR = CR (agreement preserved)
X ⊓ S = S (drop contested to either side)
```
### 9.5 Monotonicity Properties
1. **Evidence accumulation is monotonic:** Once state rises in the lattice, it cannot descend without explicit revocation.
2. **Revocation resets to Unknown:** When evidence is invalidated (e.g., graph invalidation), state resets to `U`.
3. **Contested states require human triage:** `X` state triggers policy flags and UI attention.
### 9.6 Mapping v0 Buckets to v1 States
| v0 Bucket | v1 State(s) | Notes |
|-----------|-------------|-------|
| `unreachable` | `SU`, `CU` | Depends on runtime evidence availability |
| `entrypoint` | `CR` | Entry points are by definition reachable |
| `runtime` | `RO`, `CR` | Depends on static analysis agreement |
| `direct` | `SR`, `CR` | Direct paths with/without runtime confirmation |
| `unknown` | `U` | No evidence available |
### 9.7 Policy Decision Matrix
| v1 State | VEX "not_affected" | VEX "affected" | VEX "under_investigation" |
|----------|-------------------|----------------|---------------------------|
| `U` | ❌ blocked | ⚠️ needs evidence | ✅ default |
| `SR` | ❌ blocked | ✅ allowed | ✅ allowed |
| `SU` | ⚠️ low confidence | ❌ contested | ✅ allowed |
| `RO` | ❌ blocked | ✅ allowed | ✅ allowed |
| `RU` | ⚠️ medium confidence | ❌ contested | ✅ allowed |
| `CR` | ❌ blocked | ✅ required | ❌ invalid |
| `CU` | ✅ allowed | ❌ blocked | ❌ invalid |
| `X` | ❌ blocked | ❌ blocked | ✅ required |
### 9.8 Implementation Notes
- **State storage:** `ReachabilityFactDocument.states[].latticeState` field (enum)
- **Join implementation:** `ReachabilityLattice.Join(a, b)` in `src/Signals/StellaOps.Signals/Services/`
- **Backward compatibility:** v0 bucket computed from v1 state for API consumers
### 9.9 Evidence Chain Requirements
Each lattice state transition must be accompanied by evidence references:
```json
{
  "symbol": "sym:java:...",
  "latticeState": "CR",
  "previousState": "SR",
  "evidence": {
    "static": {
      "graphHash": "blake3:...",
      "pathLength": 3,
      "confidence": 0.92
    },
    "runtime": {
      "probeId": "probe:...",
      "hitCount": 47,
      "observedAt": "2025-12-13T10:00:00Z"
    }
  },
  "transitionAt": "2025-12-13T10:00:00Z"
}
```


@@ -0,0 +1,78 @@
# Deterministic Reachability — Product Moat (Nov 2025)
Source: internal advisory “23-Nov-2025 - Where StellaOps Can Truly Lead”. Supersedes/extends archived binary reachability advisories (18-Nov-2025 - Binary-Reachability-Engine, Encoding Binary Reachability with PURL-Resolved Edges, CSharp-Binary-Analyzer). This page is the canonical, high-level articulation of our reachability moat for architects, PMM, and field teams. Detailed schemas live in `docs/modules/reach-graph/guides/evidence-schema.md` and `docs/modules/reach-graph/guides/hybrid-attestation.md`.
## Why it matters
- Most scanners list every CVE; reachability asks whether vulnerable code is actually callable.
- Competitors infer paths and rarely sign evidence; we **prove** paths with deterministic graphs and attestations.
- Outcome targets: ≥40% fewer noisy vulns shown; ≥25% faster triage via explainable “why” paths.
## Moat elements
1) **Deterministic call-graphs per artifact**
- Stable node IDs: `purl@version!build-id!symbol-signature` (or code offset when stripped).
- Stable edge IDs: `SHA256(nodeA||nodeB||tool-version||inputs-hash)`.
- Graph hash: BLAKE3 over canonical JSON; locked by manifest.
2) **Signed evidence**
- Graph-level DSSE for every scan (mandatory).
- Optional edge-bundle DSSE (≤512 edges) for runtime/init/contested edges; Rekor publish capped. See `docs/modules/reach-graph/guides/hybrid-attestation.md`.
3) **Explainability**
- Each finding carries call-chain + per-edge reason + VEX gate decision + layer attribution.
4) **Container layer provenance**
- Track file-to-layer mapping; show “introduced in layer X from base Y”.
5) **Replayability**
- Determinism manifest locks feeds, toolchain hashes, analyzer flags; replay yields identical graph and attestations.
## Minimal architecture slice
- **Sbomer/Scanner**: emit SBOM + symbol maps + per-layer file index; capture Build-IDs.
- **Cartographer**: build deterministic call-graphs (language + native), output `EdgeList.jsonl` with stable IDs.
- **Attestor**: wrap graph (and edge bundles when emitted) into DSSE; log digests to Rekor/mirror.
- **Vexer/Policy**: evaluate lattice, produce OpenVEX with linked edge proofs.
- **Ledger**: retain manifests and DSSE; mirror to Rekor where allowed.
## Practical spec (condensed)
- **Node fields**: `symbol_id`, `code_id`, `purl`, `build_id`, `symbol_digest`, `lang`, `evidence[]`.
- **Edge fields**: `from`, `to`, `kind` (direct|plt|runtime|init), `purl`, `symbol_digest`, `reason`, `confidence`, `evidence[]`.
- **Roots**: exports, entrypoints, **.init_array/.ctors/TLS callbacks**, plugin hooks.
- **Attestation layout**:
- Graph: `cas://reachability/graphs/{blake3}` + `{blake3}.dsse` (Rekor always).
- Edge bundle: `cas://reachability/edges/{graph_hash}/{bundle_id}[.dsse]` (Rekor optional, capped).
### Example: Edge-bundle DSSE payload (abridged)
```json
{
  "graph_hash": "blake3:...",
  "bundle_reason": "runtime-hit",
  "edges": [{
    "edge_id": "sha256:...",
    "from": "sym:...caller",
    "to": "sym:...callee",
    "reason": "plt",
    "purl": "pkg:deb/openssl@3.0.2?arch=amd64",
    "symbol_digest": "sha256:...",
    "revoked": false
  }]
}
```
### Field cheat sheet (for sprint readers)
- `graph_hash` — BLAKE3 of canonical graph JSON.
- `bundle_reason` — `runtime-hit | init-root | contested | third-party`.
- `edge_id` — sha256(from||to||reason||tool-version||inputs-hash).
- `revoked` — when true, policy/Signals must drop this edge before reachability scoring.
- `purl` + `symbol_digest` — bind edge to SBOM component and callee identity.
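To make the cheat sheet concrete, here is a hedged Python sketch of deriving an `edge_id` and dropping revoked edges before scoring. The function names are hypothetical, and the way `||` is encoded (fields joined with a NUL byte) is an assumption; the actual canonical encoding is not specified on this page.

```python
import hashlib


def edge_id(src, dst, reason, tool_version, inputs_hash):
    """sha256(from||to||reason||tool-version||inputs-hash) as a prefixed hex digest."""
    # Assumption: "||" is modelled as a NUL-separated UTF-8 concatenation so
    # that field boundaries stay unambiguous.
    material = "\x00".join([src, dst, reason, tool_version, inputs_hash])
    return "sha256:" + hashlib.sha256(material.encode("utf-8")).hexdigest()


def active_edges(bundle):
    """Return the edges of a bundle payload that are not marked revoked."""
    return [edge for edge in bundle["edges"] if not edge.get("revoked", False)]
```

The same inputs always produce the same ID, which is the property the idempotent attestation API relies on.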
## Quick wins (ship order)
1) Capture Build-IDs in Scanner and thread into `symbol_id`/`code_id`.
2) Emit Graph Determinism Manifest (feeds + toolchain hashes) per scan.
3) Turn on edge-bundle DSSE for runtime/init edges first; keep Rekor cap low.
4) Surface “why path” + layer attribution in CLI/UI explainers.
## APIs (strawman)
- `POST /graph/edges:attest` — idempotent; same inputs → same edge IDs.
- `GET /findings/:id/proof` — returns call-chain + Rekor inclusion proofs.
- `GET /vex/:artifact` — streams OpenVEX with embedded proofs.
## Links
- Advisory source: `docs/product-advisories/23-Nov-2025 - Where StellaOps Can Truly Lead.md`
- Schemas: `docs/modules/reach-graph/guides/evidence-schema.md`, `docs/modules/reach-graph/guides/hybrid-attestation.md`
- Sprint tracking: `docs/implplan/SPRINT_0401_0001_0001_reachability_evidence_chain.md`


@@ -0,0 +1,220 @@
# Patch-Oracles QA Pattern
Patch oracles define expected functions and edges that must be present (or absent) in generated reachability graphs. The CI pipeline uses these oracles to ensure that:
1. Critical vulnerability paths are correctly identified as reachable
2. Mitigated paths are correctly identified as unreachable
3. Graph generation remains deterministic and complete
This document covers both the **JSON-based harness** (for reachbench integration) and the **YAML-based format** (for binary patch testing).
---
## Part A: JSON Patch-Oracle Harness (v1)
The JSON-based patch-oracle harness integrates with the reachbench fixture system for CI graph validation.
### A.1 Schema Overview
Patch-oracle fixtures follow the `patch-oracle/v1` schema:
```json
{
  "schema_version": "patch-oracle/v1",
  "id": "curl-CVE-2023-38545-socks5-heap-reachable",
  "case_ref": "curl-CVE-2023-38545-socks5-heap",
  "variant": "reachable",
  "description": "Validates SOCKS5 heap overflow path is reachable",
  "expected_functions": [...],
  "expected_edges": [...],
  "expected_roots": [...],
  "forbidden_functions": [...],
  "forbidden_edges": [...],
  "min_confidence": 0.5,
  "strict_mode": false
}
```
### A.2 Expected Functions
Define functions that MUST be present in the graph:
```json
{
  "symbol_id": "sym://curl:curl.c#sink",
  "lang": "c",
  "kind": "function",
  "purl_pattern": "pkg:github/curl/*",
  "required": true,
  "reason": "Vulnerable buffer handling function"
}
```
### A.3 Expected Edges
Define edges that MUST be present in the graph:
```json
{
  "from": "sym://net:handler#read",
  "to": "sym://curl:curl.c#entry",
  "kind": "call",
  "min_confidence": 0.8,
  "required": true,
  "reason": "Data flows from network to SOCKS5 handler"
}
```
### A.4 Forbidden Elements (for unreachable variants)
```json
{
  "forbidden_functions": [
    {
      "symbol_id": "sym://dangerous#sink",
      "reason": "Should not be reachable when feature disabled"
    }
  ],
  "forbidden_edges": [
    {
      "from": "sym://entry",
      "to": "sym://sink",
      "reason": "Path should be blocked by feature flag"
    }
  ]
}
```
### A.5 Wildcard Patterns
Symbol IDs support `*` wildcards:
- `sym://test#func1` - exact match
- `sym://test#*` - matches any symbol starting with `sym://test#`
- `*` - matches anything
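A minimal Python sketch of this matching rule follows. It is not the harness implementation; `symbol_matches` is a hypothetical name, and the sketch assumes `*` is the only special character, with everything else compared literally.

```python
import re


def symbol_matches(pattern, symbol_id):
    """Match a symbol ID against a pattern where `*` matches any run of characters."""
    # Escape every literal fragment so characters such as '.', '#' or '/' in
    # symbol IDs are never interpreted as regex syntax.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, symbol_id, flags=re.DOTALL) is not None
```

Escaping the literal fragments matters here because symbol IDs routinely contain `.` (for example `curl.c`), which would otherwise match any character.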
### A.6 Directory Structure
```
tests/reachability/fixtures/patch-oracles/
├── INDEX.json                    # Oracle index
├── schema/
│   └── patch-oracle-v1.json      # JSON Schema
└── cases/
    ├── curl-CVE-2023-38545-socks5-heap/
    │   ├── reachable.oracle.json
    │   └── unreachable.oracle.json
    └── java-log4j-CVE-2021-44228-log4shell/
        └── reachable.oracle.json
```
### A.7 Usage in Tests
```csharp
var loader = new PatchOracleLoader(fixtureRoot);
var oracle = loader.LoadOracle("curl-CVE-2023-38545-socks5-heap-reachable");
var comparer = new PatchOracleComparer(oracle);
var result = comparer.Compare(richGraph);
if (!result.Success)
{
    foreach (var violation in result.Violations)
    {
        Console.WriteLine($"[{violation.Type}] {violation.From} -> {violation.To}");
    }
}
```
### A.8 Violation Types
| Type | Description |
|------|-------------|
| `MissingFunction` | Required function not found |
| `MissingEdge` | Required edge not found |
| `MissingRoot` | Required root not found |
| `ForbiddenFunctionPresent` | Forbidden function found |
| `ForbiddenEdgePresent` | Forbidden edge found |
| `UnexpectedFunction` | Unexpected function in strict mode |
| `UnexpectedEdge` | Unexpected edge in strict mode |
---
## Part B: YAML Binary Patch-Oracles
The YAML-based format is used for paired vulnerable/fixed binary testing.
### B.1 Workflow (per CVE)
1) Pick a CVE with a small, clean fix (e.g., OpenSSL, zlib, BusyBox). Identify vulnerable commit `A` and fixed commit `B`.
2) Build two stripped binaries (`vuln`, `fixed`) with identical toolchains/flags; keep a tiny harness that exercises the affected path.
3) Run Scanner binary analyzers to emit `richgraph-v1` for each binary.
4) Diff graphs: expect new/removed functions and edges to match the patch (e.g., `foo_parse -> validate_len` added; `foo_parse -> memcpy` removed).
5) Fail the test if expected functions/edges are absent or unchanged.
### B.2 Oracle manifest (YAML)
```yaml
cve: CVE-YYYY-XXXX
target: libfoo 1.2.3
build:
  cc: clang
  cflags: [-O2, -fno-omit-frame-pointer]
  ldflags: []
  strip: true
expect:
  functions_added: [validate_len]
  functions_removed: [unsafe_copy]
  edges_added:
    - { caller: foo_parse, callee: validate_len }
  edges_removed:
    - { caller: foo_parse, callee: memcpy }
tolerances:
  allow_unresolved_symbols: 0
  allow_extra_funcs: 2
```
Place manifests under `tests/reachability/patch-oracles/<cve>/oracle.yml` next to the sources/build scripts.
### B.3 Repository layout
```
tests/reachability/patch-oracles/
  CVE-YYYY-XXXX-foo/
    src/            # vuln + fixed sources + harness
    build.sh        # produces ./out/vuln ./out/fixed
    oracle.yml
```
### B.4 Harness rules
- Output binaries to `out/vuln` and `out/fixed` with deterministic flags and stripped symbols.
- Record toolchain version in a sidecar `build-meta.json` so Replay captures provenance.
- Never download from the internet during CI; vendor tiny sources into the fixture folder.
### B.5 Test runner expectations
- Runs Scanner binary analyzers on both binaries; emits `richgraph-v1` CAS entries.
- Compares graphs against `oracle.yml` expectations (functions/edges added/removed, tolerances).
- Fails when deltas are missing; succeeds when expected guards/edges are present.
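The comparison step can be sketched in Python as a set difference between the two graphs, checked against the `expect` section of the manifest. This is an assumption-laden illustration rather than the actual test runner: `check_oracle` is a hypothetical name, graphs are reduced to plain sets, the manifest is passed as an already-parsed dictionary, and `allow_extra_funcs` is interpreted as the number of unexpected new functions tolerated.

```python
def check_oracle(vuln, fixed, expect, tolerances=None):
    """Return a list of failure messages; an empty list means the oracle passes.

    vuln / fixed: {"functions": set of names, "edges": set of (caller, callee)}.
    """
    tolerances = tolerances or {}
    failures = []

    added_funcs = fixed["functions"] - vuln["functions"]
    removed_funcs = vuln["functions"] - fixed["functions"]
    added_edges = fixed["edges"] - vuln["edges"]
    removed_edges = vuln["edges"] - fixed["edges"]

    for name in expect.get("functions_added", []):
        if name not in added_funcs:
            failures.append(f"expected function {name} missing")
    for name in expect.get("functions_removed", []):
        if name not in removed_funcs:
            failures.append(f"expected removal of function {name} missing")
    for edge in expect.get("edges_added", []):
        if (edge["caller"], edge["callee"]) not in added_edges:
            failures.append(f"expected edge {edge['caller']}->{edge['callee']} missing")
    for edge in expect.get("edges_removed", []):
        if (edge["caller"], edge["callee"]) not in removed_edges:
            failures.append(f"expected removal of edge {edge['caller']}->{edge['callee']} missing")

    # Assumed meaning of allow_extra_funcs: cap on unexpected new functions.
    extra = added_funcs - set(expect.get("functions_added", []))
    if len(extra) > tolerances.get("allow_extra_funcs", 0):
        failures.append(f"{len(extra)} unexpected new functions: {sorted(extra)}")
    return failures
```

Returning messages instead of a boolean keeps failures readable, in the spirit of diffs such as `expected edge foo->validate_len missing`.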
### B.6 Integration points
- **Scanner**: add fixture runner under `tests/reachability/StellaOps.Scanner.Binary.PatchOracleTests`.
- **CI**: wire into reachbench/patch-oracles job; ensure artifacts are small and deterministic.
- **Docs**: link this file from reachability delivery guide once tests are live.
### B.7 Acceptance criteria
- At least three seed oracles (e.g., zlib overflow, OpenSSL length guard, BusyBox ash fix) committed with passing expectations.
- CI job proves deterministic hashes across reruns.
- Failures emit clear diffs (`expected edge foo->validate_len missing`).
---
## Related Documentation
- [Reachability Evidence Chain](./function-level-evidence.md)
- [RichGraph Schema](../contracts/richgraph-v1.md)
- [Ground Truth Schema](./ground-truth-schema.md)
- [Lattice States](./lattice.md)
- [Reachability Delivery Guide](./DELIVERY_GUIDE.md)


@@ -0,0 +1,269 @@
# Reachability Evidence Policy Gates
> **Status:** Design v1 (Sprint 0401)
> **Owners:** Policy Guild, Signals Guild, VEX Guild
This document defines the policy gates that enforce reachability evidence requirements for VEX decisions. Gates prevent unsafe "not_affected" claims when evidence is insufficient.
---
## 1. Overview
Policy gates act as checkpoints between evidence (reachability lattice state, uncertainty tier) and VEX status transitions. They ensure that:
1. **No false safety:** "not_affected" requires strong evidence of unreachability
2. **Explicit uncertainty:** Missing evidence triggers "under_investigation" rather than silence
3. **Audit trail:** All gate decisions are logged with evidence references
---
## 2. Gate Types
### 2.1 Lattice State Gate
Guards VEX status transitions based on the v1 lattice state (see `docs/modules/reach-graph/guides/lattice.md` §9).
| Requested VEX Status | Required Lattice State | Gate Action |
|---------------------|------------------------|-------------|
| `not_affected` | `CU` (ConfirmedUnreachable) | ✅ Allow |
| `not_affected` | `SU` (StaticallyUnreachable) | ⚠️ Allow with warning, requires `justification` |
| `not_affected` | `RU` (RuntimeUnobserved) | ⚠️ Allow with warning, requires `justification` |
| `not_affected` | `U`, `SR`, `RO`, `CR`, `X` | ❌ Block |
| `affected` | `CR` (ConfirmedReachable) | ✅ Allow |
| `affected` | `SR`, `RO` | ✅ Allow |
| `affected` | `U`, `SU`, `RU`, `CU`, `X` | ⚠️ Warn (potential false positive) |
| `under_investigation` | Any | ✅ Allow (safe default) |
| `fixed` | Any | ✅ Allow (remediation action) |
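The table translates into a small decision function. The Python below is a sketch under stated assumptions: the result labels (`allow`, `allow_with_warning`, `warn`, `block`) and the function name are hypothetical, and a missing `justification` for `SU`/`RU` is treated as a block, which is one reading of "requires `justification`".

```python
def lattice_state_gate(requested_status, lattice_state, justification=None):
    """Evaluate the lattice state gate for one requested VEX status."""
    if requested_status in ("under_investigation", "fixed"):
        return "allow"
    if requested_status == "not_affected":
        if lattice_state == "CU":
            return "allow"
        if lattice_state in ("SU", "RU"):
            # Weak unreachability evidence: allowed only with a justification.
            return "allow_with_warning" if justification else "block"
        return "block"
    if requested_status == "affected":
        if lattice_state in ("CR", "SR", "RO"):
            return "allow"
        return "warn"  # potential false positive
    raise ValueError(f"unknown VEX status: {requested_status}")
```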
### 2.2 Uncertainty Tier Gate
Guards VEX status transitions based on the uncertainty tier (see `uncertainty-entropy.md` §1.1).
| Requested VEX Status | Uncertainty Tier | Gate Action |
|---------------------|------------------|-------------|
| `not_affected` | T1 (High) | ❌ Block |
| `not_affected` | T2 (Medium) | ⚠️ Warn, require explicit override |
| `not_affected` | T3 (Low) | ⚠️ Allow with advisory note |
| `not_affected` | T4 (Negligible) | ✅ Allow |
| `affected` | T1 (High) | ⚠️ Review required (may be false positive) |
| `affected` | T2-T4 | ✅ Allow |
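A lookup-table sketch of the tier gate is shown below. The result labels and function name are hypothetical, and statuses the table does not mention (`under_investigation`, `fixed`) are assumed to pass.

```python
NOT_AFFECTED_BY_TIER = {
    "T1": "block",
    "T2": "warn_override_required",
    "T3": "allow_with_note",
    "T4": "allow",
}


def uncertainty_tier_gate(requested_status, tier):
    """Evaluate the uncertainty tier gate for one requested VEX status."""
    if requested_status == "not_affected":
        return NOT_AFFECTED_BY_TIER[tier]
    if requested_status == "affected":
        return "review_required" if tier == "T1" else "allow"
    return "allow"  # assumption: the table does not constrain other statuses
```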
### 2.3 Evidence Completeness Gate
Guards based on the presence of required evidence artifacts.
| VEX Status | Required Evidence | Gate Action if Missing |
|------------|-------------------|----------------------|
| `not_affected` | `graphHash` (DSSE-attested) | ❌ Block |
| `not_affected` | `pathAnalysis.pathLength >= 0` | ❌ Block |
| `not_affected` | `confidence >= 0.8` | ⚠️ Warn if < 0.8 |
| `affected` | `graphHash` OR `runtimeProbe` | ⚠️ Warn if neither |
| `under_investigation` | None required | ✅ Allow |
---
## 3. Gate Evaluation Order
Gates are evaluated in this order; first blocking gate stops evaluation:
```
1. Evidence Completeness Gate → Block if required evidence missing
2. Lattice State Gate → Block if state incompatible with status
3. Uncertainty Tier Gate → Block/warn based on tier
4. Confidence Threshold Gate → Warn if confidence below threshold
```
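The short-circuit behaviour can be sketched as a loop over an ordered gate list. This Python is illustrative only: `evaluate_gates`, the `(result, reason)` return shape, and the output keys are assumptions loosely modelled on the decision document in the next section.

```python
def evaluate_gates(gates, request):
    """Run gates in order and stop at the first one that blocks.

    gates: ordered list of (name, callable); each callable takes the request
    and returns a (result, reason) tuple.
    """
    results = []
    for name, gate in gates:
        result, reason = gate(request)
        results.append({"name": name, "result": result, "reason": reason})
        if result == "block":
            # Later gates are not evaluated once a gate blocks.
            return {"decision": "block", "blockedBy": name, "gates": results}
    return {"decision": "allow", "gates": results}
```

Running the cheapest and most decisive check first (evidence completeness) means a request with no attested graph never reaches the lattice or tier gates.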
---
## 4. Gate Decision Document
Each gate evaluation produces a decision document:
```json
{
  "gateId": "gate:vex:not_affected:2025-12-13T10:00:00Z",
  "requestedStatus": "not_affected",
  "subject": {
    "vulnId": "CVE-2025-12345",
    "purl": "pkg:maven/com.example/foo@1.0.0",
    "symbolId": "sym:java:..."
  },
  "evidence": {
    "latticeState": "CU",
    "uncertaintyTier": "T3",
    "graphHash": "blake3:...",
    "riskScore": 0.25,
    "confidence": 0.92
  },
  "gates": [
    {
      "name": "EvidenceCompleteness",
      "result": "pass",
      "reason": "graphHash present"
    },
    {
      "name": "LatticeState",
      "result": "pass",
      "reason": "CU allows not_affected"
    },
    {
      "name": "UncertaintyTier",
      "result": "pass_with_note",
      "reason": "T3 allows with advisory note",
      "note": "MissingPurl uncertainty at 35% entropy"
    }
  ],
  "decision": "allow",
  "advisory": "VEX status allowed with note: T3 uncertainty from MissingPurl",
  "decidedAt": "2025-12-13T10:00:00Z"
}
```
---
## 5. Contested State Handling
When lattice state is `X` (Contested):
1. **Block all definitive statuses:** Neither "not_affected" nor "affected" allowed
2. **Force "under_investigation":** Auto-assign until triage resolves conflict
3. **Emit triage event:** Notify VEX operators of conflict with evidence links
4. **Evidence overlay:** Show both static and runtime evidence for manual review
### Contested Resolution Workflow
```
1. System detects X state
2. VEX status locked to "under_investigation"
3. Triage event emitted to operator queue
4. Operator reviews:
   a. Static evidence (graph, paths)
   b. Runtime evidence (probes, hits)
5. Operator provides resolution:
   a. Trust static → state becomes SU/SR
   b. Trust runtime → state becomes RU/RO
   c. Add new evidence → recompute lattice
6. Gate re-evaluates with new state
```
---
## 6. Override Mechanism
Operators with `vex:gate:override` permission can bypass gates with mandatory fields:
```json
{
  "override": {
    "gateId": "gate:vex:not_affected:...",
    "operator": "user:alice@example.com",
    "justification": "Manual review confirms code path is dead code",
    "evidence": {
      "type": "ManualReview",
      "reviewId": "review:2025-12-13:001",
      "attachments": ["cas://evidence/review/..."]
    },
    "approvedAt": "2025-12-13T11:00:00Z",
    "expiresAt": "2026-01-13T11:00:00Z"
  }
}
```
Override requirements:
- `justification` is mandatory and logged
- Overrides expire after configurable period (default: 30 days)
- All overrides are auditable and appear in compliance reports
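A hedged Python sketch of how a gate might decide whether an override is still usable follows. The helper names are hypothetical, and treating `expiresAt` as exclusive and falling back to `approvedAt` plus the default period when it is absent are assumptions of this example.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse an ISO-8601 timestamp with a trailing 'Z' into an aware datetime."""
    # Older Python versions do not accept "Z" in fromisoformat(), so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def override_is_active(override, now, default_expiration_days=30):
    """True when the override has a justification and has not expired."""
    if not override.get("justification", "").strip():
        return False  # justification is mandatory
    approved = parse_ts(override["approvedAt"])
    if "expiresAt" in override:
        expires = parse_ts(override["expiresAt"])
    else:
        expires = approved + timedelta(days=default_expiration_days)
    return approved <= now < expires
```

Passing `now` explicitly, for example `datetime.now(timezone.utc)`, keeps the check deterministic under test and during replay.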
---
## 7. Configuration
Gate thresholds are configurable via `PolicyGatewayOptions`:
```yaml
PolicyGateway:
  Gates:
    LatticeState:
      AllowSUForNotAffected: true   # Allow SU with warning
      AllowRUForNotAffected: true   # Allow RU with warning
      RequireJustificationForWeakStates: true
    UncertaintyTier:
      BlockT1ForNotAffected: true
      WarnT2ForNotAffected: true
    EvidenceCompleteness:
      RequireGraphHashForNotAffected: true
      MinConfidenceForNotAffected: 0.8
      MinConfidenceWarning: 0.6
  Override:
    DefaultExpirationDays: 30
    RequireJustification: true
```
---
## 8. API Integration
### POST `/api/v1/vex/status`
Request:
```json
{
  "vulnId": "CVE-2025-12345",
  "purl": "pkg:maven/com.example/foo@1.0.0",
  "status": "not_affected",
  "justification": "vulnerable_code_not_present",
  "reachabilityEvidence": {
    "factDigest": "sha256:...",
    "graphHash": "blake3:..."
  }
}
```
Response (gate blocked):
```json
{
  "success": false,
  "gateDecision": {
    "decision": "block",
    "blockedBy": "LatticeState",
    "reason": "Lattice state SR (StaticallyReachable) incompatible with not_affected",
    "currentState": "SR",
    "requiredStates": ["CU", "SU", "RU"],
    "suggestion": "Submit runtime probe evidence or change to under_investigation"
  }
}
```
---
## 9. Metrics & Alerts
The policy gateway emits metrics:
| Metric | Labels | Description |
|--------|--------|-------------|
| `stellaops_gate_decisions_total` | `gate`, `result`, `status` | Total gate decisions |
| `stellaops_gate_blocks_total` | `gate`, `reason` | Total blocked requests |
| `stellaops_gate_overrides_total` | `operator` | Total override uses |
| `stellaops_contested_states_total` | `vulnId` | Active contested states |
Alert conditions:
- `stellaops_gate_overrides_total` rate > threshold → Audit review
- `stellaops_contested_states_total` > 10 → Triage backlog alert
---
## 10. Related Documents
- [Lattice Model](./lattice.md) — v1 formal 8-state lattice
- [Uncertainty States](uncertainty-entropy.md) — Tier definitions and risk scoring
- [Evidence Schema](./evidence-schema.md) — richgraph-v1 schema
- [VEX Contract](../contracts/vex-v1.md) — VEX document schema
---
## Changelog
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Policy Guild | Initial design from Sprint 0401 |


@@ -0,0 +1,51 @@
# PURL-Resolved Callgraph Edges (Nov 2025)
This note captures the required behavior for joining binary callgraphs with SBOM components using **purl + symbol digest** annotations. It replaces any pointer to prior advisories; everything needed to ship the feature is here.
## 1. Goal
Annotate every call edge in `richgraph-v1` with:
- `purl` of the component that defines the callee, and
- a stable `symbol_digest` (hash of normalized signature plus optional instruction fingerprint).
This lets graphs from multiple binaries merge naturally and line up with SBOM entries, so reachability answers “is the vulnerable function reachable in my deployment?” without re-identifying components.
## 2. Data model additions
- **Node**: `SymbolNode` gains `purl` and `symbol_digest` fields (sha256 of normalized signature; include demangled name and parameter types; optionally append block hash for stripped code).
- **Edge**: `CallEdge` gains `purl` (callee owner) and `symbol_digest`; keep existing `kind`/`evidence` fields. When callee resolution is ambiguous, include `candidates[]` with ranked purls and set `confidence` accordingly.
- **Provenance**: store analyzer fingerprint (`analyzer`, `version`, `toolchain_digest`) and graph hash in CAS metadata.
## 3. Producer rules
1) **Map callee → file → SBOM component**. Use import tables (ELF DT_NEEDED + reloc, PE IAT, Mach-O stubs) or resolved path. If multiple candidates, emit `candidates[]` and lower confidence.
2) **Compute symbol digest**. Normalize the signature, demangle if possible, lowercase type names, strip addresses, then sha256 the canonical form. For stripped symbols, combine synthetic name and code block hash.
3) **Attach to edges**. For every `call` edge, set `purl` and `symbol_digest`. If callee is external but unresolved, emit `purl:"pkg:unknown"` and also write an Unknowns entry (see signals unknowns registry).
4) **Determinism**. Sort nodes and edges before hashing; keep evidence arrays sorted (`import`, `reloc`, `disasm`, `runtime`). Graph hash uses BLAKE3 over canonical JSON.
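Rule 2 can be illustrated with a short Python sketch. The exact canonical form is not specified here, so everything about the layout of `canonical` (name, parenthesised comma-separated types, optional `#block_hash` suffix) is an assumption, as is the function name; demangling is assumed to have happened before the call.

```python
import hashlib
import re

_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")


def symbol_digest(name, param_types, block_hash=None):
    """sha256 over a normalized signature, with an optional code-block hash."""
    normalized = []
    for type_name in param_types:
        stripped = _ADDRESS.sub("", type_name)        # strip addresses
        collapsed = " ".join(stripped.split())        # collapse whitespace
        normalized.append(collapsed.lower())          # lowercase type names
    canonical = f"{name.strip()}({','.join(normalized)})"
    if block_hash:
        # For stripped symbols: combine the synthetic name with a block hash.
        canonical += "#" + block_hash
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two analyzers that agree on the name and parameter types then produce the same digest even if they differ in type-name casing or spacing.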
## 4. Consumer rules
- **Signals**: merge edges from many binaries by `(purl, symbol_digest)`; keep multiple `site` entries. Store in `call_edges` with `purl` as the join key for SBOM overlays.
- **Policy/VEX**: treat `reachable` if any entrypoint path hits a `symbol_digest` that matches an affected function for the CVE purl.
- **UI/CLI**: display `purl@version` plus demangled name; show site offsets for debugging; show confidence when candidates were present.
## 5. SBOM join strategy
1) Use `purl` from component resolver; if absent, fall back to `build_id` plus hash match and emit `purl:"pkg:unknown"`.
2) When multiple SBOM components share a purl, keep all matches but prefer those whose file hash equals the binary under analysis.
3) For runtime traces, attach the same `symbol_digest` so runtime hits boost confidence on the correct edge.
## 6. Acceptance tests
- Imports-only: edge from binary main to `pkg:deb/ubuntu/openssl@3.0.2` `symbol_digest=sha256:...` must appear without running disassembly.
- Disassembly: direct `call` to internal function carries `purl` of the hosting binary's SBOM entry.
- Ambiguity: when two candidate purls exist, graph stores `candidates[2]` and `confidence < 1`.
- Graph hash stability: reordering analyzer flags does not change BLAKE3 hash.
## 7. Deliverables
- Update `richgraph-v1` schema and DTOs (Scanner + Signals).
- Persist `purl`/`symbol_digest` in Mongo `call_edges` and CAS manifests.
- CLI: extend `stella reachability upload-callgraph` and `stella graph explain` to surface `purl` plus digest.
- Docs: reference this file from Scanner, Signals, and Reachability guides once implemented.


@@ -0,0 +1,48 @@
# Reachability · Runtime + Static Union (v0.1)
## What this covers
- End-to-end flow for combining static callgraphs (Scanner) and runtime traces (Zastava) into replayable reachability bundles.
- Storage layout (CAS namespaces), manifest fields, and Signals APIs that consume/emit reachability facts.
- How unknowns/pressure and scoring are derived so Policy/UI can explain outcomes.
## Pipeline (at a glance)
1. **Scanner** emits language-specific callgraphs as `richgraph-v1` and packs them into CAS under `reachability_graphs/<digest>.tar.zst` with manifest `meta.json`.
2. **Zastava Observer** streams NDJSON runtime facts (`symbol_id`, `code_id`, `hit_count`, `loader_base`, `cas_uri`) to Signals `POST /signals/runtime-facts` or `/runtime-facts/ndjson`.
3. **Union bundles** (runtime + static) are uploaded as ZIP to `POST /signals/reachability/union` with optional `X-Analysis-Id`; Signals stores under `reachability_graphs/{analysisId}/`.
4. **Signals scoring** consumes union data + runtime facts, computes per-target states (bucket, weight, confidence, score), fact-level score, unknowns pressure, and publishes `signals.fact.updated@v1` events.
5. **Replay** records provenance: reachability section in replay manifest lists CAS URIs (graphs + runtime traces), namespaces, analyzer/version, callgraphIds, and the shared `analysisId`.
## Storage & CAS namespaces
- Static graphs: `cas://reachability_graphs/<hh>/<sha>.tar.zst` (meta.json + graph files).
- Runtime traces: `cas://runtime_traces/<hh>/<sha>.tar.zst` (NDJSON or zipped stream).
- Replay manifest now includes `analysisId` to correlate graphs/traces; each reference also carries `namespace` and `callgraphId` (static) for unambiguous replay.
## Signals API quick reference
- `POST /signals/runtime-facts` — structured request body; recomputes reachability.
- `POST /signals/runtime-facts/ndjson` — streaming NDJSON/gzip; requires `callgraphId` header params.
- `POST /signals/reachability/union` — upload ZIP bundle; optional `X-Analysis-Id`.
- `GET /signals/reachability/union/{analysisId}/meta` — returns meta.json.
- `GET /signals/reachability/union/{analysisId}/files/{fileName}` — download bundled graph/trace files.
- `GET /signals/facts/{subjectKey}` — fetch latest reachability fact (includes unknowns counters and targets).
## Scoring and unknowns
- Buckets (default weights): entrypoint 1.0, direct 0.85, runtime 0.45, unknown 0.5, unreachable 0.0.
- Confidence: reachable vs unreachable base, runtime bonus, clamped between Min/Max (defaults 0.05 to 0.99).
- Unknowns: Signals counts unresolved symbols/edges per subject; `UnknownsPressure = unknowns / (states + unknowns)` (capped). Fact score is reduced by `UnknownsPenaltyCeiling` (default 0.35) × pressure.
- Events: `signals.fact.updated@v1` now emits `unknownsCount` and `unknownsPressure` plus bucket/weight/stateCount/targets.
## Replay contract changes (v0.1 add-ons)
- `reachability.analysisId` (string, optional) — ties to Signals union ingest.
- Graph refs include `namespace`, `callgraphId`, analyzer, version, sha256, casUri.
- Runtime trace refs include `namespace`, recordedAt, sha256, casUri.
## Operator checklist
- Use deterministic CAS paths; never embed absolute file paths.
- When emitting runtime NDJSON, include `loader_base` and `code_id` when available for de-dup.
- Ensure `analysisId` is propagated from Scanner/Zastava into Signals ingest to keep replay manifests linked.
- Keep feeds frozen for reproducibility; avoid external downloads in union preparation.
## References
- Schema: `docs/modules/reach-graph/schemas/runtime-static-union-schema.md`
- Delivery guide: `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`
- Unknowns registry & scoring: Signals code (`ReachabilityScoringService`, `UnknownsIngestionService`) and events doc `docs/modules/signals/guides/events-24-005.md`.

@@ -0,0 +1,332 @@
# Replay Verification
_Last updated: 2025-12-22. Owner: Scanner Guild._
This document describes the **replay verification** workflow that ensures reachability slices are reproducible and tamper-evident.
---
## 1. Overview
Replay verification answers: *"Given the same inputs, do we get the exact same slice?"*
This is critical for:
- **Audit trails**: Prove analysis results are genuine
- **Tamper detection**: Detect modified inputs or results
- **Debugging**: Identify sources of non-determinism
- **Compliance**: Demonstrate reproducible security analysis
---
## 2. Replay Workflow
```
┌─────────────────┐     ┌──────────────────┐     ┌───────────────────┐
│ Original        │     │ Rehydrate        │     │ Recompute         │
│ Slice           │────►│ Inputs           │────►│ Slice             │
│ (with digest)   │     │ from CAS         │     │ (fresh)           │
└─────────────────┘     └──────────────────┘     └───────────────────┘
                                                 ┌───────────────────┐
                                                 │ Compare           │
                                                 │ byte-for-byte     │
                                                 └───────────────────┘
                                             ┌─────────────┴─────────────┐
                                             ▼                           ▼
                                        ┌──────────┐                ┌──────────┐
                                        │  MATCH   │                │ MISMATCH │
                                        │    ✓     │                │  + diff  │
                                        └──────────┘                └──────────┘
```
---
## 3. API Reference
### 3.1 Replay Endpoint
```http
POST /api/slices/replay
Content-Type: application/json

{
  "sliceDigest": "blake3:a1b2c3d4..."
}
```
### 3.2 Response Format
**Match Response (200 OK)**:
```json
{
  "match": true,
  "originalDigest": "blake3:a1b2c3d4...",
  "recomputedDigest": "blake3:a1b2c3d4...",
  "replayedAt": "2025-12-22T10:00:00Z",
  "inputsVerified": true
}
```
**Mismatch Response (200 OK)**:
```json
{
  "match": false,
  "originalDigest": "blake3:a1b2c3d4...",
  "recomputedDigest": "blake3:e5f6a7b8...",
  "replayedAt": "2025-12-22T10:00:00Z",
  "diff": {
    "missingNodes": ["node:5"],
    "extraNodes": ["node:6"],
    "missingEdges": [{"from": "node:1", "to": "node:5"}],
    "extraEdges": [{"from": "node:1", "to": "node:6"}],
    "verdictDiff": {
      "original": "unreachable",
      "recomputed": "reachable"
    },
    "confidenceDiff": {
      "original": 0.95,
      "recomputed": 0.72
    }
  },
  "possibleCauses": [
    "Input graph may have been modified",
    "Analyzer version mismatch: 1.2.0 vs 1.2.1",
    "Feed version changed: nvd-2025-12-20 vs nvd-2025-12-22"
  ]
}
```
**Error Response (404 Not Found)**:
```json
{
  "error": "slice_not_found",
  "message": "Slice with digest blake3:a1b2c3d4... not found in CAS",
  "sliceDigest": "blake3:a1b2c3d4..."
}
```
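
A client has to distinguish three response shapes: match, mismatch, and error. The following is a minimal sketch of that branching, using only the field names shown in the examples above; the `summarize_replay` helper is hypothetical, and the sample body is abbreviated for illustration.

```python
import json


def summarize_replay(body: str) -> str:
    """Turn a replay response body into a one-line summary."""
    data = json.loads(body)
    if "error" in data:
        return f"error: {data['error']}"
    if data["match"]:
        return f"match: {data['originalDigest']}"
    # List only the diff sections that actually contain differences.
    diff = data.get("diff", {})
    changed = sorted(key for key, value in diff.items() if value)
    return "mismatch: " + ", ".join(changed)


example = (
    '{"match": false, "originalDigest": "blake3:a1b2", '
    '"recomputedDigest": "blake3:e5f6", '
    '"diff": {"missingNodes": ["node:5"], "extraNodes": []}}'
)
summary = summarize_replay(example)
```

Because both match and mismatch return 200 OK, the `match` field rather than the HTTP status is what tells them apart.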
---
## 4. Input Rehydration
All inputs must be CAS-addressed for replay:
### 4.1 Required Inputs
| Input | CAS Key | Description |
|-------|---------|-------------|
| Graph | `cas://graphs/{digest}` | Full RichGraph JSON |
| Binaries | `cas://binaries/{digest}` | Binary file hashes |
| SBOM | `cas://sboms/{digest}` | CycloneDX/SPDX document |
| Policy | `cas://policies/{digest}` | Policy DSL |
| Feeds | `cas://feeds/{version}` | Advisory feed snapshot |
### 4.2 Manifest Contents
```json
{
  "manifest": {
    "analyzerVersion": "scanner.native:1.2.0",
    "rulesetHash": "sha256:abc123...",
    "feedVersions": {
      "nvd": "2025-12-20",
      "osv": "2025-12-20",
      "ghsa": "2025-12-20"
    },
    "createdAt": "2025-12-22T10:00:00Z",
    "toolchain": "iced-x86:1.21.0",
    "environment": {
      "os": "linux",
      "arch": "x86_64"
    }
  }
}
```
---
## 5. Determinism Requirements
For byte-for-byte reproducibility:
### 5.1 JSON Canonicalization
```
1. Keys sorted alphabetically at all levels
2. No whitespace (compact JSON)
3. UTF-8 encoding
4. Lowercase hex for all hashes
5. Numbers: no trailing zeros, scientific notation for large values (confidence values follow 5.4 instead)
```
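
Rules 1 to 3 can be sketched with the Python standard library. This is an illustration, not the Scanner implementation: `canonical_json` is a hypothetical name, `json.dumps` sorts keys by Unicode code point (which matches alphabetical order for ASCII keys), and the hash casing and number formatting rules (4 and 5) are not handled here.

```python
import json


def canonical_json(value) -> bytes:
    """Compact JSON with keys sorted at every level, encoded as UTF-8."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


# Two dicts with the same content but different key order serialize identically.
a = canonical_json({"b": 1, "a": {"d": "x", "c": "y"}})
b = canonical_json({"a": {"c": "y", "d": "x"}, "b": 1})
```

Hashing the returned bytes rather than a pretty-printed string is what makes the digest independent of how the producer happened to build the object.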
### 5.2 Graph Ordering
```
Nodes: sorted by symbolId (lexicographic)
Edges: sorted by (from, to) tuple (lexicographic)
Paths: sorted by first node, then path length
```
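
The ordering rules can be sketched as plain sort keys. The shapes are assumptions for illustration: nodes are dicts with a `symbolId`, edges are dicts with `from` and `to`, and paths are non-empty lists of symbol IDs. The final tie-breaker on the full path is also an assumption, added so that two paths with the same first node and length still sort the same way on every run.

```python
def order_graph(nodes, edges, paths):
    """Return nodes, edges and paths in the documented order."""
    ordered_nodes = sorted(nodes, key=lambda n: n["symbolId"])
    ordered_edges = sorted(edges, key=lambda e: (e["from"], e["to"]))
    # First node, then path length; the full path breaks remaining ties (assumption).
    ordered_paths = sorted(paths, key=lambda p: (p[0], len(p), p))
    return ordered_nodes, ordered_edges, ordered_paths


nodes = [{"symbolId": "b"}, {"symbolId": "a"}]
edges = [{"from": "b", "to": "a"}, {"from": "a", "to": "c"}, {"from": "a", "to": "b"}]
paths = [["a", "b", "c"], ["a", "c"], ["a", "b"]]
ordered_nodes, ordered_edges, ordered_paths = order_graph(nodes, edges, paths)
```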
### 5.3 Timestamp Handling
```
All timestamps: UTC, ISO-8601, with 'Z' suffix
Example: "2025-12-22T10:00:00Z"
No milliseconds unless significant
```
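
A minimal formatter for these rules might look as follows. It assumes that "significant" means a non-zero millisecond component and that anything finer than a millisecond is truncated; `format_timestamp` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(dt: datetime) -> str:
    """UTC ISO-8601 with a 'Z' suffix; milliseconds only when non-zero."""
    if dt.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    dt = dt.astimezone(timezone.utc)
    millis = dt.microsecond // 1000
    base = dt.strftime("%Y-%m-%dT%H:%M:%S")
    return f"{base}.{millis:03d}Z" if millis else f"{base}Z"


# 12:00 at UTC+2 is 10:00 UTC.
local = datetime(2025, 12, 22, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
stamp = format_timestamp(local)
```

Rejecting naive datetimes avoids silently interpreting them in the host's local timezone, which would make the output environment-dependent.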
### 5.4 Floating Point
```
Confidence values: round to 6 decimal places
Example: 0.950000, not 0.95 or 0.9500001
```
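
Fixed six-decimal output can be produced with a format specifier, as in this short sketch. `format_confidence` is a hypothetical name, and the rounding mode is whatever Python's float formatting applies; the document does not specify one.

```python
def format_confidence(value: float) -> str:
    """Render a confidence value with exactly six decimal places."""
    return f"{value:.6f}"


rendered = [format_confidence(v) for v in (0.95, 0.9500001, 0.72)]
```

Note that a generic JSON encoder would write `0.95` rather than `0.950000`, so a serializer that follows this rule has to emit the formatted text itself.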
---
## 6. Diff Computation
When slices don't match:
### 6.1 Diff Algorithm
```python
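from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SliceDiff:
    # Minimal container; field names mirror the assignments below.
    missing_nodes: list = field(default_factory=list)
    extra_nodes: list = field(default_factory=list)
    missing_edges: list = field(default_factory=list)
    extra_edges: list = field(default_factory=list)
    verdict_diff: Optional[dict] = None


def compute_diff(original, recomputed):
    diff = SliceDiff()
    # Node diff (sorted so the diff itself is deterministic)
    orig_nodes = {n.id for n in original.subgraph.nodes}
    new_nodes = {n.id for n in recomputed.subgraph.nodes}
    diff.missing_nodes = sorted(orig_nodes - new_nodes)
    diff.extra_nodes = sorted(new_nodes - orig_nodes)
    # Edge diff; `from` is a Python keyword, so it is read via getattr
    orig_edges = {(getattr(e, "from"), e.to) for e in original.subgraph.edges}
    new_edges = {(getattr(e, "from"), e.to) for e in recomputed.subgraph.edges}
    diff.missing_edges = sorted(orig_edges - new_edges)
    diff.extra_edges = sorted(new_edges - orig_edges)
    # Verdict diff
    if original.verdict.status != recomputed.verdict.status:
        diff.verdict_diff = {
            "original": original.verdict.status,
            "recomputed": recomputed.verdict.status,
        }
    return diff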
```
### 6.2 Cause Analysis
```python
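def analyze_causes(original, recomputed, manifest):
    # current_version(), current_feed_versions() and fetch_graph_digest()
    # are helpers supplied by the replay service; they are not defined here.
    causes = []
    if manifest.analyzerVersion != current_version():
        causes.append(
            f"Analyzer version mismatch: {manifest.analyzerVersion} vs {current_version()}"
        )
    if manifest.feedVersions != current_feed_versions():
        causes.append("Feed version changed")
    if original.inputs.graphDigest != fetch_graph_digest():
        causes.append("Input graph may have been modified")
    return causes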
```
---
## 7. CLI Usage
### 7.1 Replay Command
```bash
# Replay and verify a slice
stella slice replay --digest blake3:a1b2c3d4...
# Output:
# ✓ Slice verified: digest matches
# Original: blake3:a1b2c3d4...
# Recomputed: blake3:a1b2c3d4...
```
### 7.2 Verbose Mode
```bash
stella slice replay --digest blake3:a1b2c3d4... --verbose
# Output:
# Fetching slice from CAS...
# Rehydrating inputs:
# - Graph: cas://graphs/blake3:xyz... ✓
# - SBOM: cas://sboms/sha256:abc... ✓
# - Policy: cas://policies/sha256:def... ✓
# Recomputing slice...
# Comparing results...
# ✓ Match confirmed
```
### 7.3 Mismatch Handling
```bash
stella slice replay --digest blake3:a1b2c3d4...
# Output:
# ✗ Slice mismatch detected!
#
# Differences:
# Nodes: 1 missing, 0 extra
# Edges: 1 missing, 1 extra
# Verdict: unreachable → reachable
#
# Possible causes:
# - Input graph may have been modified
# - Analyzer version: 1.2.0 → 1.2.1
#
# Run with --diff-file to export detailed diff
```
---
## 8. Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| `slice_not_found` | Slice not in CAS | Check digest, verify upload |
| `input_not_found` | Referenced input missing | Reupload inputs |
| `version_mismatch` | Analyzer version differs | Pin version or accept drift |
| `feed_stale` | Feed snapshot unavailable | Use latest or pin version |
---
## 9. Security Considerations
1. **Input integrity**: Verify CAS digests before replay
2. **Audit logging**: Log all replay attempts
3. **Rate limiting**: Prevent replay DoS
4. **Access control**: Same permissions as slice access
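
The input integrity check in item 1 can be sketched for `sha256:` digests with the standard library. This is an illustration only: `verify_sha256` is a hypothetical helper, and `blake3:` digests are not covered because BLAKE3 is not part of Python's `hashlib`.

```python
import hashlib


def verify_sha256(content: bytes, expected: str) -> bool:
    """Check content against a 'sha256:<hex>' digest string."""
    algo, _, hex_digest = expected.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo}")
    return hashlib.sha256(content).hexdigest() == hex_digest.lower()


payload = b'{"nodes":[]}'
digest = "sha256:" + hashlib.sha256(payload).hexdigest()
ok = verify_sha256(payload, digest)
tampered = verify_sha256(payload + b" ", digest)
```

Running this check on every rehydrated input before recomputation means a mismatch can be attributed to the analysis itself rather than to a corrupted or substituted input.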
---
## 10. Performance Targets
| Metric | Target |
|--------|--------|
| Replay latency | <5s for typical slice |
| Input fetch | <2s (parallel CAS fetches) |
| Comparison | <100ms |
---
## 11. Related Documentation
- [Slice Schema](./slice-schema.md)
- [Binary Reachability Schema](./binary-reachability-schema.md)
- [Determinism Requirements](../contracts/determinism.md)
- [CAS Architecture](../modules/platform/cas.md)
---
_Created: 2025-12-22. See Sprint 3820 for implementation details._

@@ -0,0 +1,38 @@
# Runtime Facts (Signals/Zastava) v0.1
## Payload shapes
- **Structured** (`POST /signals/runtime-facts`):
- `subject` (imageDigest | scanId | component+version)
- `callgraphId` (required)
- `events[]`: `{ symbolId, codeId?, purl?, buildId?, loaderBase?, processId?, processName?, socketAddress?, containerId?, evidenceUri?, hitCount, observedAt?, metadata{} }`
- **Streaming NDJSON** (`POST /signals/runtime-facts/ndjson`): one JSON object per line with the same fields; supports `Content-Encoding: gzip`; `callgraphId` is provided via query or header metadata.
## Provenance/metadata
- Signals stamps:
- `provenance.source` (defaults to `runtime` unless provided in metadata)
- `provenance.ingestedAt` (ISO-8601 UTC)
- `provenance.callgraphId`
- Runtime hits are aggregated per `symbolId` (summing hitCount) before persisting and feeding scoring.
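
The per-symbol aggregation can be sketched as follows. This is not the Signals implementation: `aggregate_hits` is a hypothetical name, events are assumed to be dicts carrying `symbolId` and `hitCount`, and sorting the result by `symbolId` is an added choice to keep the output order stable.

```python
from collections import Counter


def aggregate_hits(events):
    """Sum hitCount per symbolId; returns a dict ordered by symbolId."""
    totals = Counter()
    for event in events:
        totals[event["symbolId"]] += event["hitCount"]
    return dict(sorted(totals.items()))


events = [
    {"symbolId": "sym:b", "hitCount": 2},
    {"symbolId": "sym:a", "hitCount": 1},
    {"symbolId": "sym:b", "hitCount": 5},
]
hits = aggregate_hits(events)
```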
## Validation
- `symbolId` required; events list must not be empty.
- `callgraphId` required and must resolve to a stored callgraph/union bundle.
- Subject must yield a non-empty `subjectKey`.
- Empty runtime stream is rejected.
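
A client-side pre-check of these rules might look like the sketch below. It assumes the structured request is a dict whose `subject` is itself a dict of identifying fields; `validate_runtime_facts` is a hypothetical helper, and it cannot check that `callgraphId` resolves to a stored callgraph, since that requires the server.

```python
def validate_runtime_facts(request: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the request looks valid."""
    errors = []
    if not request.get("callgraphId"):
        errors.append("callgraphId is required")
    events = request.get("events") or []
    if not events:
        errors.append("events must not be empty")
    for index, event in enumerate(events):
        if not event.get("symbolId"):
            errors.append(f"events[{index}].symbolId is required")
    # Assumption: a subject with no populated field cannot yield a subjectKey.
    subject = request.get("subject") or {}
    if not any(subject.values()):
        errors.append("subject must yield a non-empty subjectKey")
    return errors


good = {
    "subject": {"scanId": "scan-1"},
    "callgraphId": "cg-1",
    "events": [{"symbolId": "sym:a", "hitCount": 1}],
}
bad = {"subject": {}, "events": [{"hitCount": 1}]}
```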
## Storage and cache
- Stored alongside reachability facts in PostgreSQL table `reachability_facts`.
- Runtime hits cached in Valkey via `reachability_cache:*` entries; invalidated on ingest.
## Interaction with scoring
- Ingest triggers a recompute: runtime hits are added to the hits of the prior fact, targets are set to the observed symbols, and entryPoints are taken from the callgraph.
- Reachability states include runtime evidence on the path; bucket/weight may be `runtime` when hits are present.
- Unknowns registry stays separate; unknowns count still factors into fact score via pressure penalty.
## Replay alignment
- Runtime traces packaged under CAS namespace `runtime_traces`; referenced in replay manifest with `namespace` and `analysisId` to link to static graphs.
## Determinism rules
- Keep NDJSON ordering stable when generating bundles.
- Use UTC timestamps; avoid environment-dependent metadata values.
- No external network lookups during ingest.
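
A producer that follows the first two rules could be sketched as below. The sort key (`symbolId`, then `observedAt`) is an assumption, since the rules only require that ordering be stable, and `encode_ndjson` is a hypothetical name. Setting `mtime=0` keeps the current time out of the gzip header, although compressed bytes may still differ between zlib versions.

```python
import gzip
import json


def encode_ndjson(events, compress=True) -> bytes:
    """Serialize events as NDJSON in a stable order, optionally gzip-compressed."""
    ordered = sorted(events, key=lambda e: (e["symbolId"], e.get("observedAt", "")))
    lines = [json.dumps(e, sort_keys=True, separators=(",", ":")) for e in ordered]
    payload = ("\n".join(lines) + "\n").encode("utf-8")
    # mtime=0 avoids embedding the wall-clock time in the gzip header.
    return gzip.compress(payload, mtime=0) if compress else payload


events = [
    {"symbolId": "sym:b", "hitCount": 2, "observedAt": "2025-12-22T10:00:00Z"},
    {"symbolId": "sym:a", "hitCount": 1, "observedAt": "2025-12-22T10:00:01Z"},
]
first = encode_ndjson(events)
second = encode_ndjson(list(reversed(events)))
```

Both calls produce the same bytes on the same machine because the input order is normalized before serialization.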