# ReachGraph Module Architecture

## Overview

The **ReachGraph** module provides a unified store for reachability subgraphs, enabling fast, deterministic, audit-ready answers to "*exactly why* a dependency is reachable." It consolidates data from Scanner, Signals, and Attestor into content-addressed artifacts with edge-level explainability.

## Problem Statement

Before ReachGraph, reachability data was scattered across multiple modules:

| Module | Data | Limitation |
|--------|------|------------|
| Scanner.CallGraph | `CallGraphSnapshot` | No unified query API |
| Signals | `ReachabilityFactDocument` | Runtime-focused, not auditable |
| Attestor | PoE JSON | Per-CVE only, no slice queries |
| Graph | Generic nodes/edges | Not optimized for "why reachable?" |

**Result**: Answering "why is lodash reachable?" required querying multiple systems with no guarantee of consistency or auditability.

## Solution

ReachGraph provides:

1. **Unified Schema**: Extends the PoE subgraph format with edge explainability
2. **Content-Addressed Store**: All artifacts are identified by their BLAKE3 digest
3. **Slice Query API**: Fast queries by package, CVE, entrypoint, or file
4. **Deterministic Replay**: Verifies that the same inputs produce the same graph
5. **DSSE Signing**: Offline-verifiable proofs

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                            Consumers                            │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐  │
│   │  Policy  │    │   Web    │    │   CLI    │    │  Export  │  │
│   │  Engine  │    │ Console  │    │          │    │  Center  │  │
│   └────┬─────┘    └────┬─────┘    └────┬─────┘    └────┬─────┘  │
└────────┼───────────────┼───────────────┼───────────────┼────────┘
         │               │               │               │
         └───────────────┴───────┬───────┴───────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│                      ReachGraph WebService                      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ REST API                                                  │  │
│  │ - POST /v1/reachgraphs                                    │  │
│  │ - GET  /v1/reachgraphs/{digest}                           │  │
│  │ - GET  /v1/reachgraphs/{digest}/slice                     │  │
│  │ - POST /v1/reachgraphs/replay                             │  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Slice Query Engine                                        │  │
│  │ - Package slice (by PURL)                                 │  │
│  │ - CVE slice (paths to vulnerable sinks)                   │  │
│  │ - Entrypoint slice (reachable from entry)                 │  │
│  │ - File slice (changed file impact)                        │  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Replay Driver                                             │  │
│  │ - Rebuild graph from inputs                               │  │
│  │ - Verify digest matches                                   │  │
│  │ - Log for audit trail                                     │  │
│  └───────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬────────────────────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│                     ReachGraph Core Library                     │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │ Schema          │  │ Serialization   │  │ Signing         │  │
│  │                 │  │                 │  │                 │  │
│  │ ReachGraphMin   │  │ Canonical JSON  │  │ DSSE Wrapper    │  │
│  │ EdgeExplanation │  │ BLAKE3 Digest   │  │ Attestor Int.   │  │
│  │ Provenance      │  │ Compression     │  │                 │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└────────────────────────────────┬────────────────────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│                        Persistence Layer                        │
│  ┌───────────────────────────┐   ┌───────────────────────────┐  │
│  │ PostgreSQL                │   │ Valkey                    │  │
│  │                           │   │                           │  │
│  │ reachgraph.subgraphs      │   │ Hot slice cache           │  │
│  │ reachgraph.slice_cache    │   │ (30 min TTL)              │  │
│  │ reachgraph.replay_log     │   │                           │  │
│  └───────────────────────────┘   └───────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 ▲
                                 │
┌────────────────────────────────┴────────────────────────────────┐
│                            Producers                            │
│     ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    │
│     │   Scanner    │    │   Signals    │    │   Attestor   │    │
│     │  CallGraph   │    │ RuntimeFacts │    │     PoE      │    │
│     └──────────────┘    └──────────────┘    └──────────────┘    │
└─────────────────────────────────────────────────────────────────┘
```

## Data Model

### ReachGraphMinimal (v1)

The core schema extends the PoE predicate format (`...` marks content elided from this example):

```json
{
  "schemaVersion": "reachgraph.min@v1",
  "artifact": {
    "name": "svc.payments",
    "digest": "sha256:abc123...",
    "env": ["linux/amd64"]
  },
  "scope": {
    "entrypoints": ["/app/bin/svc"],
    "selectors": ["prod"],
    "cves": ["CVE-2024-1234"]
  },
  "nodes": [...],
  "edges": [...],
  "provenance": {...},
  "signatures": [...]
}
```
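
The Python sketch below checks the top-level shape of such a document. It assumes that every field in the example except `signatures` is required and that `schemaVersion` must equal `reachgraph.min@v1`; treating `signatures` as optional is an assumption. `check_top_level` is a hypothetical helper, not the module's validator, and a real validator would also check node and edge contents.

```python
from typing import Any

# Top-level fields from the ReachGraphMinimal (v1) example.
# Assumption: all are required except "signatures".
REQUIRED_FIELDS = ("schemaVersion", "artifact", "scope", "nodes", "edges", "provenance")
SCHEMA_VERSION = "reachgraph.min@v1"


def check_top_level(doc: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return the problems found; an empty list means the shape looks valid."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    version = doc.get("schemaVersion")
    if version is not None and version != SCHEMA_VERSION:
        problems.append(f"unsupported schemaVersion: {version}")
    for name in ("nodes", "edges", "signatures"):
        if name in doc and not isinstance(doc[name], list):
            problems.append(f"{name} must be an array")
    return problems


example = {
    "schemaVersion": SCHEMA_VERSION,
    "artifact": {"name": "svc.payments", "env": ["linux/amd64"]},
    "scope": {"entrypoints": ["/app/bin/svc"], "cves": ["CVE-2024-1234"]},
    "nodes": [],
    "edges": [],
    "provenance": {},
}
print(check_top_level(example))  # []
```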

### Edge Explainability

Every edge carries metadata explaining *why* it exists:

| Type | Description | Example Guard |
|------|-------------|---------------|
| `Import` | Static import | - |
| `DynamicLoad` | Runtime load | - |
| `Reflection` | Reflective call | - |
| `EnvGuard` | Environment variable check | `DEBUG=true` |
| `FeatureFlag` | Feature flag | `FEATURE_X=enabled` |
| `PlatformArch` | Platform guard | `os=linux` |
| `LoaderRule` | Loader binding (PLT/IAT/GOT) | `RTLD_LAZY` |
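
Here is a minimal Python sketch of how an edge might pair one of these explanation types with an optional guard and render a one-line answer. The enum values come from the table. The `ExplainedEdge` class and its field names (`source`, `target`, `why`, `guard`) are hypothetical, because this section does not show the real edge schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EdgeExplanationType(str, Enum):
    """The edge reasons listed in the table above."""

    IMPORT = "Import"
    DYNAMIC_LOAD = "DynamicLoad"
    REFLECTION = "Reflection"
    ENV_GUARD = "EnvGuard"
    FEATURE_FLAG = "FeatureFlag"
    PLATFORM_ARCH = "PlatformArch"
    LOADER_RULE = "LoaderRule"


@dataclass(frozen=True)
class ExplainedEdge:
    """Hypothetical edge shape: field names are illustrative, not the real schema."""

    source: str
    target: str
    why: EdgeExplanationType
    guard: Optional[str] = None  # e.g. "DEBUG=true" for an EnvGuard edge

    def explain(self) -> str:
        """One-line answer to 'why does this edge exist?'."""
        text = f"{self.source} -> {self.target} via {self.why.value}"
        if self.guard is not None:
            text += f" [guard: {self.guard}]"
        return text


edge = ExplainedEdge("app.main", "debug.dump_state", EdgeExplanationType("EnvGuard"), "DEBUG=true")
print(edge.explain())  # app.main -> debug.dump_state via EnvGuard [guard: DEBUG=true]
```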

### Content Addressing

All artifacts are identified by their BLAKE3-256 digest:

- Computed over canonical JSON (sorted keys, null fields omitted)
- Signatures are excluded from the hash computation
- Enables idempotent upserts and cache keying

## API Design

### Core Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/v1/reachgraphs` | Upsert subgraph (idempotent) |
| GET | `/v1/reachgraphs/{digest}` | Get full subgraph |
| GET | `/v1/reachgraphs/{digest}/slice` | Query slice |
| POST | `/v1/reachgraphs/replay` | Verify determinism |
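
Idempotent upserts follow from content addressing. The store key is the digest, so re-posting identical content is a no-op. The sketch below models only that behavior with a Python dict. `InMemoryReachGraphStore`, its method names, and the `(digest, created)` return value are hypothetical. The digest is simplified (sorted keys, signatures excluded, no null handling) and uses stdlib BLAKE2b-256 in place of BLAKE3.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Optional


def digest_of(graph: dict[str, Any]) -> str:
    """Digest of simplified canonical JSON (sorted keys, signatures excluded).

    Stand-in: stdlib BLAKE2b-256 replaces BLAKE3 so the sketch has no dependencies.
    """
    body = {k: v for k, v in graph.items() if k != "signatures"}
    data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.blake2b(data, digest_size=32).hexdigest()


@dataclass
class InMemoryReachGraphStore:
    """Hypothetical in-memory model of the upsert and get endpoints."""

    graphs: dict[str, dict[str, Any]] = field(default_factory=dict)

    def upsert(self, graph: dict[str, Any]) -> tuple[str, bool]:
        """POST /v1/reachgraphs: store under the digest; return (digest, created)."""
        digest = digest_of(graph)
        if digest in self.graphs:
            return digest, False  # identical content already stored: nothing to do
        self.graphs[digest] = graph
        return digest, True

    def get(self, digest: str) -> Optional[dict[str, Any]]:
        """GET /v1/reachgraphs/{digest}: the stored graph, or None if unknown."""
        return self.graphs.get(digest)
```

In this model, a client whose upsert times out can retry without creating a duplicate record.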

### Slice Query Types

1. **Package Slice** (`?q=pkg:npm/lodash@4.17.21`)
   - Returns the subgraph containing the package and its neighbors
   - Configurable depth and direction

2. **CVE Slice** (`?cve=CVE-2024-1234`)
   - Returns all paths from entrypoints to vulnerable sinks
   - Includes edge explanations for each hop

3. **Entrypoint Slice** (`?entrypoint=/app/bin/svc`)
   - Returns everything reachable from the entrypoint
   - Optionally filtered to paths that reach sinks

4. **File Slice** (`?file=src/**/*.ts`)
   - Returns the impact of changed files
   - Useful for PR-based analysis
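
Package and entrypoint slices are both bounded traversals of the graph. The Python sketch below shows a breadth-first slice with a depth limit and a direction switch. The `(caller, callee)` edge tuples, string node IDs, and the `depth`/`direction` names are assumptions, since this section does not define the query parameters. An entrypoint slice corresponds roughly to `direction="forward"` with a large depth. A CVE slice would additionally need path enumeration toward sink nodes, which this sketch does not attempt.

```python
from collections import deque
from typing import Sequence

Edge = tuple[str, str]  # (caller, callee): hypothetical minimal edge shape


def slice_nodes(edges: Sequence[Edge], start: str, depth: int, direction: str = "both") -> set[str]:
    """Nodes within `depth` hops of `start`, found by breadth-first search.

    direction: "forward" follows caller -> callee, "backward" follows callee -> caller,
    and "both" follows either way.
    """
    if direction not in ("forward", "backward", "both"):
        raise ValueError(f"unknown direction: {direction}")
    forward: dict[str, set[str]] = {}
    backward: dict[str, set[str]] = {}
    for src, dst in edges:
        forward.setdefault(src, set()).add(dst)
        backward.setdefault(dst, set()).add(src)

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist >= depth:
            continue  # depth budget used up on this branch
        neighbors: set[str] = set()
        if direction in ("forward", "both"):
            neighbors |= forward.get(node, set())
        if direction in ("backward", "both"):
            neighbors |= backward.get(node, set())
        for nxt in neighbors:
            if nxt not in seen:  # BFS reaches each node first at its shortest distance
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen


def slice_edges(edges: Sequence[Edge], nodes: set[str]) -> list[Edge]:
    """Edges whose endpoints both fall inside the slice."""
    return [(src, dst) for src, dst in edges if src in nodes and dst in nodes]


edges = [("main", "handler"), ("handler", "lodash.merge"),
         ("lodash.merge", "lodash.baseMerge"), ("util", "lodash.merge")]
print(sorted(slice_nodes(edges, "lodash.merge", depth=1, direction="backward")))
# ['handler', 'lodash.merge', 'util']
```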

## Integration Points

### Upstream (Data Producers)

- **Scanner.CallGraph**: Produces nodes and edges with edge explanations
- **Signals**: Provides runtime confirmation of reachability
- **Attestor**: DSSE signing integration

### Downstream (Data Consumers)

- **Policy Engine**: `ReachabilityRequirementGate` queries slices
- **Web Console**: The "Why Reachable?" panel displays paths
- **CLI**: `stella reachgraph slice` and `stella reachgraph replay` commands
- **ExportCenter**: Includes subgraphs in evidence bundles

## Determinism Guarantees

1. **Canonical Serialization**
   - Object keys sorted lexicographically
   - Arrays sorted by a deterministic field
   - Timestamps in UTC ISO-8601
   - Null fields omitted rather than serialized

2. **Replay Verification**
   - `POST /v1/reachgraphs/replay` rebuilds the graph from its inputs
   - Returns `{"match": true}` when the rebuilt digest matches the original
   - Logs every attempt for the audit trail

3. **Content Addressing**
   - The same content always produces the same digest
   - Enables cache keying and deduplication
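
The replay contract is: rebuild, canonicalize, compare, and log. The Python sketch below models it under these assumptions:

- `build` is a caller-supplied stand-in for the real graph builder.
- `nodes` are sorted by a hypothetical `id` field and `edges` by hypothetical `from`/`to` fields.
- Only top-level nulls are dropped.
- Stdlib BLAKE2b-256 substitutes for BLAKE3.

The audit-entry fields are illustrative, not the `reachgraph.replay_log` schema.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Callable

# Deterministic sort keys for arrays; the field names are assumptions.
ARRAY_SORT_KEYS: dict[str, Callable[[dict[str, Any]], Any]] = {
    "nodes": lambda node: node["id"],
    "edges": lambda edge: (edge["from"], edge["to"]),
}


def canonical_digest(graph: dict[str, Any]) -> str:
    """Drop signatures and top-level nulls, sort arrays, then hash compact sorted-key JSON.

    Stand-in: stdlib BLAKE2b-256 replaces BLAKE3.
    """
    body = {k: v for k, v in graph.items() if k != "signatures" and v is not None}
    for name, key in ARRAY_SORT_KEYS.items():
        if name in body:
            body[name] = sorted(body[name], key=key)
    data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def replay(
    expected_digest: str,
    inputs: dict[str, Any],
    build: Callable[[dict[str, Any]], dict[str, Any]],
    audit_log: list[dict[str, Any]],
) -> dict[str, bool]:
    """Rebuild from inputs, compare digests, and record every attempt."""
    actual = canonical_digest(build(inputs))
    match = actual == expected_digest
    audit_log.append({
        "expected": expected_digest,
        "actual": actual,
        "match": match,
        # UTC ISO-8601 timestamp, matching the serialization rule above
        "at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
    return {"match": match}


def toy_build(inputs: dict[str, Any]) -> dict[str, Any]:
    """Toy builder for the example: one node per input name, in input order."""
    return {"schemaVersion": "reachgraph.min@v1",
            "nodes": [{"id": n} for n in inputs["names"]], "edges": []}


log: list[dict[str, Any]] = []
expected = canonical_digest(toy_build({"names": ["b", "a"]}))
print(replay(expected, {"names": ["a", "b"]}, toy_build, log))  # {'match': True}
```

In this example, the replay matches even though the builder emits nodes in a different order, because arrays are sorted before hashing.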

## Performance Characteristics

| Operation | Target Latency | Notes |
|-----------|----------------|-------|
| Slice query | P95 < 200ms | Cached in Valkey |
| Full graph retrieval | P95 < 500ms | Compressed storage |
| Upsert | P95 < 1s | Idempotent, gzip compression |
| Replay | P95 < 5s | Depends on input size |
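
Graphs are immutable and addressed by their digest, so a slice result can be cached under a key built from the digest plus the normalized query. The 30-minute TTL shown for the Valkey hot slice cache therefore presumably bounds cache size rather than guarding against stale graphs. The Python sketch below shows that keying with an in-memory stand-in for Valkey's `SET key value EX 1800` and `GET key`. The `reachgraph:slice:` key prefix and the `TtlCache` class are hypothetical.

```python
import time
from typing import Callable, Optional
from urllib.parse import urlencode

SLICE_TTL_SECONDS = 30 * 60  # 30-minute TTL from the architecture diagram


def slice_cache_key(digest: str, params: dict[str, str]) -> str:
    """Cache key from the graph digest and the query parameters in sorted order.

    The 'reachgraph:slice:' prefix is a hypothetical naming convention.
    """
    return f"reachgraph:slice:{digest}?{urlencode(sorted(params.items()))}"


class TtlCache:
    """In-memory stand-in for Valkey `SET key value EX <ttl>` and `GET key`."""

    def __init__(self, ttl: float = SLICE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def set(self, key: str, value: str) -> None:
        """Store a value that expires `ttl` seconds from now."""
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[str]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict the expired entry
            return None
        return value
```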

## Security Considerations

1. **Tenant Isolation**: Row-level security (RLS) policies enforce isolation at the database level
2. **Rate Limiting**: 100 req/min for reads, 20 req/min for writes
3. **DSSE Signing**: All artifacts are verifiable offline
4. **Input Validation**: All requests are schema-validated
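
This section gives the rate limits but not the algorithm that enforces them. A token bucket is one common way to enforce per-minute limits, so the Python sketch below keeps one bucket per request class. Both choices are assumptions for illustration: the token bucket itself (which permits short bursts up to the full limit) and a single bucket per class rather than per tenant or client.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows `capacity` requests per `period` seconds on average, with bursts up to `capacity`."""

    def __init__(self, capacity: int, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = capacity / period  # tokens refilled per second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Limits from this section; one bucket per class (not per tenant) is an assumption.
LIMITS = {"read": 100, "write": 20}


def make_buckets(clock: Callable[[], float] = time.monotonic) -> dict[str, TokenBucket]:
    """Create one bucket per request class."""
    return {kind: TokenBucket(limit, clock=clock) for kind, limit in LIMITS.items()}
```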

## Related Documentation

- [Sprint 1227.0012.0001 - Core Library](../../implplan/SPRINT_1227_0012_0001_LB_reachgraph_core.md)
- [Sprint 1227.0012.0002 - Store APIs](../../implplan/SPRINT_1227_0012_0002_BE_reachgraph_store.md)
- [Sprint 1227.0012.0003 - Integration](../../implplan/SPRINT_1227_0012_0003_FE_reachgraph_integration.md)
- [PoE Predicate Spec](../../../src/Attestor/POE_PREDICATE_SPEC.md)
- [Module AGENTS.md](../../../src/__Libraries/StellaOps.ReachGraph/AGENTS.md)

---

_Last updated: 2025-12-27_