ReachGraph Module Architecture
Overview
The ReachGraph module provides a unified store for reachability subgraphs, enabling fast, deterministic, audit-ready answers to "exactly why a dependency is reachable." It consolidates data from Scanner, Signals, and Attestor into content-addressed artifacts with edge-level explainability.
Problem Statement
Before ReachGraph, reachability data was scattered across multiple modules:
| Module | Data | Limitation |
|---|---|---|
| Scanner.CallGraph | CallGraphSnapshot | No unified query API |
| Signals | ReachabilityFactDocument | Runtime-focused, not auditable |
| Attestor | PoE JSON | Per-CVE only, no slice queries |
| Graph | Generic nodes/edges | Not optimized for "why reachable?" |
Result: Answering "why is lodash reachable?" required querying multiple systems with no guarantee of consistency or auditability.
Solution
ReachGraph provides:
- Unified Schema: Extends PoE subgraph format with edge explainability
- Content-Addressed Store: All artifacts identified by BLAKE3 digest
- Slice Query API: Fast queries by package, CVE, entrypoint, or file
- Deterministic Replay: Verify that the same inputs produce the same graph
- DSSE Signing: Offline-verifiable proofs
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                            Consumers                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │  Policy  │  │   Web    │  │   CLI    │  │  Export  │         │
│  │  Engine  │  │ Console  │  │          │  │  Center  │         │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘         │
└───────┼─────────────┼─────────────┼─────────────┼───────────────┘
        │             │             │             │
        └─────────────┴──────┬──────┴─────────────┘
                             │
┌────────────────────────────▼────────────────────────────────────┐
│                      ReachGraph WebService                      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                         REST API                          │  │
│  │  POST /v1/reachgraphs          GET /v1/reachgraphs/{d}    │  │
│  │  GET /v1/reachgraphs/{d}/slice POST /v1/reachgraphs/replay│  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Slice Query Engine                     │  │
│  │  - Package slice (by PURL)                                │  │
│  │  - CVE slice (paths to vulnerable sinks)                  │  │
│  │  - Entrypoint slice (reachable from entry)                │  │
│  │  - File slice (changed file impact)                       │  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                       Replay Driver                       │  │
│  │  - Rebuild graph from inputs                              │  │
│  │  - Verify digest matches                                  │  │
│  │  - Log for audit trail                                    │  │
│  └───────────────────────────────────────────────────────────┘  │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────────┐
│                     ReachGraph Core Library                     │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐     │
│  │     Schema     │  │ Serialization  │  │    Signing     │     │
│  │                │  │                │  │                │     │
│  │ ReachGraphMin  │  │ Canonical JSON │  │ DSSE Wrapper   │     │
│  │ EdgeExplanation│  │ BLAKE3 Digest  │  │ Attestor Int.  │     │
│  │ Provenance     │  │ Compression    │  │                │     │
│  └────────────────┘  └────────────────┘  └────────────────┘     │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────────┐
│                        Persistence Layer                        │
│  ┌────────────────────────┐  ┌────────────────────────┐         │
│  │       PostgreSQL       │  │         Valkey         │         │
│  │                        │  │                        │         │
│  │  reachgraph.subgraphs  │  │  Hot slice cache       │         │
│  │  reachgraph.slice_cache│  │  (30min TTL)           │         │
│  │  reachgraph.replay_log │  │                        │         │
│  └────────────────────────┘  └────────────────────────┘         │
└─────────────────────────────────────────────────────────────────┘
                             ▲
                             │
┌────────────────────────────┴────────────────────────────────────┐
│                            Producers                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   Scanner    │  │   Signals    │  │   Attestor   │           │
│  │  CallGraph   │  │ RuntimeFacts │  │     PoE      │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
Data Model
ReachGraphMinimal (v1)
The core schema extends the PoE predicate format:
{
  "schemaVersion": "reachgraph.min@v1",
  "artifact": {
    "name": "svc.payments",
    "digest": "sha256:abc123...",
    "env": ["linux/amd64"]
  },
  "scope": {
    "entrypoints": ["/app/bin/svc"],
    "selectors": ["prod"],
    "cves": ["CVE-2024-1234"]
  },
  "nodes": [...],
  "edges": [...],
  "provenance": {...},
  "signatures": [...]
}
Edge Explainability
Every edge carries metadata explaining why it exists:
| Type | Description | Example Guard |
|---|---|---|
| Import | Static import | - |
| DynamicLoad | Runtime load | - |
| Reflection | Reflective call | - |
| EnvGuard | Env variable check | DEBUG=true |
| FeatureFlag | Feature flag | FEATURE_X=enabled |
| PlatformArch | Platform guard | os=linux |
| LoaderRule | PLT/IAT/GOT | RTLD_LAZY |
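One way to picture an explained edge is as a small typed record. The following is a minimal sketch, not the module's actual schema: the enum members mirror the type names in the table above, while the `Edge` class, its field names, and the node ids are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EdgeExplanationType(Enum):
    """Explanation types, named after the rows of the table above."""
    IMPORT = "Import"
    DYNAMIC_LOAD = "DynamicLoad"
    REFLECTION = "Reflection"
    ENV_GUARD = "EnvGuard"
    FEATURE_FLAG = "FeatureFlag"
    PLATFORM_ARCH = "PlatformArch"
    LOADER_RULE = "LoaderRule"


@dataclass(frozen=True)
class Edge:
    # Hypothetical field names; the real schema may differ.
    source: str
    target: str
    why: EdgeExplanationType
    guard: Optional[str] = None  # e.g. "DEBUG=true"; None when unguarded

    def describe(self) -> str:
        """Render the edge as one human-readable hop."""
        text = f"{self.source} -> {self.target} ({self.why.value})"
        return f"{text} if {self.guard}" if self.guard else text


edge = Edge("main", "debug_dump", EdgeExplanationType.ENV_GUARD, "DEBUG=true")
print(edge.describe())  # main -> debug_dump (EnvGuard) if DEBUG=true
```

Keeping the guard as an optional field lets unguarded edge types (Import, DynamicLoad, Reflection) share the same record shape as guarded ones.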
Content Addressing
All artifacts are identified by BLAKE3-256 digest:
- Computed from canonical JSON (sorted keys, no nulls)
- Signatures excluded from hash computation
- Enables idempotent upserts and cache keying
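The three rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the module's serializer: Python's standard library has no BLAKE3, so BLAKE2b with a 256-bit output stands in for it here, and the function names and the `blake2b-256:` prefix are hypothetical.

```python
import hashlib
import json


def strip_nulls(value):
    """Recursively drop object fields whose value is None (the 'no nulls' rule)."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value


def canonical_bytes(graph: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    # Signatures are excluded so that signing never changes the identity.
    body = {k: v for k, v in graph.items() if k != "signatures"}
    text = json.dumps(strip_nulls(body), sort_keys=True,
                      separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def content_digest(graph: dict) -> str:
    # ASSUMPTION: BLAKE2b-256 is a stand-in for BLAKE3-256 (not in the stdlib).
    digest = hashlib.blake2b(canonical_bytes(graph), digest_size=32)
    return "blake2b-256:" + digest.hexdigest()


unsigned = {"schemaVersion": "reachgraph.min@v1", "nodes": [], "edges": []}
signed = {"edges": [], "nodes": [], "signatures": [{"sig": "..."}],
          "schemaVersion": "reachgraph.min@v1"}
# Key order and signatures do not affect the digest.
print(content_digest(unsigned) == content_digest(signed))  # True
```

Because the digest depends only on canonical content, re-posting the same graph maps to the same key, which is what makes upserts idempotent.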
API Design
Core Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/reachgraphs | Upsert subgraph (idempotent) |
| GET | /v1/reachgraphs/{digest} | Get full subgraph |
| GET | /v1/reachgraphs/{digest}/slice | Query slice |
| POST | /v1/reachgraphs/replay | Verify determinism |
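As a client-side illustration, a slice request URL could be assembled as follows. This is only a sketch: the host name, the `blake3:` digest prefix, and the `slice_url` helper are assumptions, while the path and the query parameter names (`q`, `cve`, `entrypoint`, `file`) are the ones this document lists.

```python
from urllib.parse import quote, urlencode


def slice_url(base_url: str, digest: str, **query: str) -> str:
    """Build a GET /v1/reachgraphs/{digest}/slice URL with encoded parts."""
    # Percent-encode the digest so characters such as ':' are path-safe.
    path = f"/v1/reachgraphs/{quote(digest, safe='')}/slice"
    return f"{base_url.rstrip('/')}{path}?{urlencode(query)}"


# Hypothetical host and digest value.
print(slice_url("https://reachgraph.example.internal",
                "blake3:abc123", cve="CVE-2024-1234"))
```

Encoding matters most for package slices, because a PURL such as `pkg:npm/lodash@4.17.21` contains `:`, `/`, and `@`.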
Slice Query Types
- Package Slice (?q=pkg:npm/lodash@4.17.21)
  - Returns subgraph containing package and neighbors
  - Configurable depth and direction
- CVE Slice (?cve=CVE-2024-1234)
  - Returns all paths from entrypoints to vulnerable sinks
  - Includes edge explanations for each hop
- Entrypoint Slice (?entrypoint=/app/bin/svc)
  - Returns everything reachable from entry
  - Optionally filtered to paths reaching sinks
- File Slice (?file=src/**/*.ts)
  - Returns impact of changed files
  - Useful for PR-based analysis
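To make the slice semantics concrete, here is a minimal in-memory sketch of two of them: a depth-limited neighborhood (package slice) and path enumeration to a sink (CVE slice). The graph, node names, and function names are hypothetical, and the real engine may use a different algorithm.

```python
from collections import deque

# Hypothetical call graph: caller -> callees.
EDGES = {
    "main": ["handler", "util"],
    "handler": ["lodash.template"],
    "util": [],
    "lodash.template": ["lodash.internal"],
    "lodash.internal": [],
}


def package_slice(edges, start, depth, direction="both"):
    """Nodes within `depth` hops of `start`: callees ("out"), callers ("in"), or both."""
    reverse = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # depth limit reached; do not expand further
        neighbors = []
        if direction in ("out", "both"):
            neighbors += edges.get(node, [])
        if direction in ("in", "both"):
            neighbors += reverse.get(node, [])
        for nxt in neighbors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return sorted(seen)  # sorted for deterministic output


def paths_to_sink(edges, entrypoint, sink, path=None):
    """All simple (cycle-free) paths from `entrypoint` to `sink`."""
    path = (path or []) + [entrypoint]
    if entrypoint == sink:
        return [path]
    found = []
    for nxt in edges.get(entrypoint, []):
        if nxt not in path:
            found += paths_to_sink(edges, nxt, sink, path)
    return found


print(package_slice(EDGES, "lodash.template", depth=1))
print(paths_to_sink(EDGES, "main", "lodash.internal"))
```

Returning sorted node lists keeps the slice output stable across runs, in line with the determinism goals of the module.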
Integration Points
Upstream (Data Producers)
- Scanner.CallGraph: Produces nodes and edges with edge explanations
- Signals: Provides runtime confirmation of reachability
- Attestor: DSSE signing integration
Downstream (Data Consumers)
- Policy Engine: ReachabilityRequirementGate queries slices
- Web Console: "Why Reachable?" panel displays paths
- CLI: stella reachgraph slice/replay commands
- ExportCenter: Includes subgraphs in evidence bundles
Determinism Guarantees
- Canonical Serialization
  - Sorted object keys (lexicographic)
  - Sorted arrays by deterministic field
  - UTC ISO-8601 timestamps
  - No null fields (omit when null)
- Replay Verification
  - POST /v1/reachgraphs/replay rebuilds from inputs
  - Returns {match: true} if digests match
  - Logs all attempts for audit trail
- Content Addressing
  - Same content always produces same digest
  - Enables cache keying and deduplication
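The replay check reduces to "rebuild, hash, compare, log". The sketch below illustrates that flow under assumptions: `build_graph` is a hypothetical deterministic builder, BLAKE2b stands in for BLAKE3 (which the Python standard library lacks), and every field name other than `match` is invented for the example.

```python
import hashlib
import json
from datetime import datetime, timezone


def digest_of(graph: dict) -> str:
    # ASSUMPTION: BLAKE2b-256 stand-in; sorted keys approximate canonical JSON.
    data = json.dumps(graph, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def build_graph(inputs: dict) -> dict:
    """Hypothetical builder: sorts arrays so input order cannot leak into the graph."""
    return {
        "nodes": sorted(inputs["symbols"]),
        "edges": sorted([list(call) for call in inputs["calls"]]),
    }


def replay(inputs: dict, expected_digest: str, builder, audit_log: list) -> dict:
    """Rebuild the graph from its inputs and compare digests."""
    computed = digest_of(builder(inputs))
    result = {"match": computed == expected_digest, "computedDigest": computed}
    # Every attempt is recorded, successful or not, with a UTC timestamp.
    audit_log.append({"at": datetime.now(timezone.utc).isoformat(),
                      "expectedDigest": expected_digest, **result})
    return result


log = []
original = {"symbols": ["b", "a"], "calls": [("a", "b")]}
expected = digest_of(build_graph(original))
shuffled = {"symbols": ["a", "b"], "calls": [("a", "b")]}
print(replay(shuffled, expected, build_graph, log)["match"])  # True
```

The shuffled input still matches because the builder sorts its arrays, which is the property the canonical serialization rules are meant to guarantee.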
Performance Characteristics
| Operation | Target Latency | Notes |
|---|---|---|
| Slice query | P95 < 200ms | Cached in Valkey |
| Full graph retrieval | P95 < 500ms | Compressed storage |
| Upsert | P95 < 1s | Idempotent, gzip compression |
| Replay | P95 < 5s | Depends on input size |
Security Considerations
- Tenant Isolation: Row-level security (RLS) policies enforce isolation at the database level
- Rate Limiting: 100 req/min reads, 20 req/min writes
- DSSE Signing: All artifacts verifiable offline
- Input Validation: Schema validation on all requests
Related Documentation
- Sprint 1227.0012.0001 - Core Library
- Sprint 1227.0012.0002 - Store APIs
- Sprint 1227.0012.0003 - Integration
- PoE Predicate Spec
- Module AGENTS.md
Last updated: 2025-12-27