# StellaOps.ReachGraph Module

## Module Charter

The **ReachGraph** module provides a unified store for reachability subgraphs, enabling fast, deterministic, audit-ready answers to "*exactly why* a dependency is reachable."

### Mission

Consolidate reachability data from Scanner, Signals, and Attestor into a single, content-addressed store with:

- **Edge explainability**: Every edge carries "why" metadata (import, dynamic load, guards)
- **Deterministic replay**: Same inputs produce identical digests
- **Slice queries**: Fast queries by package, CVE, entrypoint, or file
- **Audit-ready proofs**: DSSE-signed artifacts verifiable offline

### Scope

| In Scope | Out of Scope |
|----------|--------------|
| ReachGraph schema and data model | Call graph extraction (handled by Scanner) |
| Content-addressed storage | Runtime signal collection (handled by Signals) |
| Slice query APIs | DSSE signing internals (handled by Attestor) |
| Deterministic serialization | VEX document ingestion (handled by Excititor) |
| Valkey caching | Policy evaluation (handled by Policy module) |
| Replay verification | UI components (handled by Web module) |

---

## Architecture

### Component Diagram

```
┌──────────────────────────────────────────────────────────────────┐
│                        ReachGraph Module                         │
├──────────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐   │
│  │  Schema Layer   │  │  Serialization  │  │  Signing Layer  │   │
│  │                 │  │                 │  │                 │   │
│  │ ReachGraphMin   │  │ Canonical JSON  │  │ DSSE Wrapper    │   │
│  │ EdgeExplanation │  │ BLAKE3 Digest   │  │ Verification    │   │
│  │ Provenance      │  │ Compression     │  │                 │   │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘   │
│           │                    │                    │            │
│  ┌────────▼────────────────────▼────────────────────▼────────┐   │
│  │                        Store Layer                        │   │
│  │                                                           │   │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │   │
│  │  │ Repository   │  │ Slice Engine │  │ Replay Driver│     │   │
│  │  └──────────────┘  └──────────────┘  └──────────────┘     │   │
│  └───────────────────────────────────────────────────────────┘   │
│                                │                                 │
│  ┌─────────────────────────────▼─────────────────────────────┐   │
│  │                     Persistence Layer                     │   │
│  │                                                           │   │
│  │  ┌──────────────┐  ┌──────────────┐                       │   │
│  │  │ PostgreSQL   │  │ Valkey       │                       │   │
│  │  │ (primary)    │  │ (cache)      │                       │   │
│  │  └──────────────┘  └──────────────┘                       │   │
│  └───────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────┘
```

### Project Structure

```
src/__Libraries/StellaOps.ReachGraph/
├── Schema/
│   ├── ReachGraphMinimal.cs      # Top-level graph structure
│   ├── ReachGraphNode.cs         # Node with metadata
│   ├── ReachGraphEdge.cs         # Edge with explanation
│   ├── EdgeExplanation.cs        # Why the edge exists
│   └── ReachGraphProvenance.cs   # Input tracking
├── Serialization/
│   ├── CanonicalReachGraphSerializer.cs
│   ├── SortedKeysJsonConverter.cs
│   └── DeterministicArraySortConverter.cs
├── Hashing/
│   ├── ReachGraphDigestComputer.cs
│   └── Blake3HashProvider.cs
├── Signing/
│   ├── IReachGraphSignerService.cs
│   └── ReachGraphSignerService.cs
├── Store/
│   ├── IReachGraphRepository.cs
│   ├── PostgresReachGraphRepository.cs
│   └── SliceQueryEngine.cs
├── Cache/
│   ├── IReachGraphCache.cs
│   └── ValkeyReachGraphCache.cs
├── Replay/
│   ├── IReplayDriver.cs
│   └── DeterministicReplayDriver.cs
└── StellaOps.ReachGraph.csproj

src/__Libraries/StellaOps.ReachGraph.Persistence/
├── Migrations/
│   └── 001_reachgraph_store.sql
├── Models/
│   └── SubgraphEntity.cs
└── StellaOps.ReachGraph.Persistence.csproj

src/ReachGraph/
├── StellaOps.ReachGraph.WebService/
│   ├── Endpoints/
│   │   ├── ReachGraphEndpoints.cs
│   │   └── SliceQueryEndpoints.cs
│   ├── Contracts/
│   │   ├── UpsertRequest.cs
│   │   ├── SliceQueryRequest.cs
│   │   └── ReplayRequest.cs
│   ├── Program.cs
│   └── openapi.yaml
└── __Tests/
    └── StellaOps.ReachGraph.WebService.Tests/
```

---

## Data Model

### ReachGraphMinimal Schema (v1)

```json
{
  "schemaVersion": "reachgraph.min@v1",
  "artifact": {
    "name": "svc.payments",
    "digest": "sha256:abc123...",
    "env": ["linux/amd64"]
  },
  "scope": {
    "entrypoints": ["/app/bin/svc"],
    "selectors": ["prod"],
    "cves": ["CVE-2024-1234"]
  },
  "nodes": [
    {
      "id": "sha256:nodeHash1",
      "kind": "function",
      "ref": "main()",
      "file": "src/index.ts",
      "line": 1,
      "isEntrypoint": true
    }
  ],
  "edges": [
    {
      "from": "sha256:nodeHash1",
      "to": "sha256:nodeHash2",
      "why": {
        "type": "Import",
        "loc": "src/index.ts:3",
        "confidence": 1.0
      }
    }
  ],
  "provenance": {
    "intoto": ["attestation-1.link"],
    "inputs": {
      "sbom": "sha256:sbomDigest",
      "vex": "sha256:vexDigest",
      "callgraph": "sha256:cgDigest"
    },
    "computedAt": "2025-12-27T10:00:00Z",
    "analyzer": {
      "name": "stellaops-scanner",
      "version": "1.0.0",
      "toolchainDigest": "sha256:..."
    }
  },
  "signatures": [
    {"keyId": "scanner-signing-2025", "sig": "base64..."}
  ]
}
```

### Edge Explanation Types

| Type | Description | Example Guard |
|------|-------------|---------------|
| `Import` | Static import statement | - |
| `DynamicLoad` | Runtime require/import | - |
| `Reflection` | Reflective invocation | - |
| `Ffi` | Foreign function call | - |
| `EnvGuard` | Environment variable check | `DEBUG=true` |
| `FeatureFlag` | Feature flag condition | `FEATURE_X=enabled` |
| `PlatformArch` | Platform/arch guard | `os=linux` |
| `TaintGate` | Sanitization/validation | - |
| `LoaderRule` | PLT/IAT/GOT entry | `RTLD_LAZY` |
| `DirectCall` | Direct function call | - |
| `Unknown` | Cannot determine | - |

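As a rough illustration of how the explanation types in the table might be modelled and checked, the Python sketch below parses a `why` object like the one in the schema example. It is not the module's C# `EdgeExplanation` type: the `guard` field name and the 0.0 to 1.0 range check on `confidence` are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EdgeExplanationType(Enum):
    """Explanation types, using the names listed in the table."""
    IMPORT = "Import"
    DYNAMIC_LOAD = "DynamicLoad"
    REFLECTION = "Reflection"
    FFI = "Ffi"
    ENV_GUARD = "EnvGuard"
    FEATURE_FLAG = "FeatureFlag"
    PLATFORM_ARCH = "PlatformArch"
    TAINT_GATE = "TaintGate"
    LOADER_RULE = "LoaderRule"
    DIRECT_CALL = "DirectCall"
    UNKNOWN = "Unknown"


@dataclass(frozen=True)
class EdgeExplanation:
    type: EdgeExplanationType
    loc: Optional[str] = None
    confidence: float = 1.0
    guard: Optional[str] = None  # hypothetical field, e.g. "DEBUG=true"


def parse_why(raw: dict) -> EdgeExplanation:
    """Build an EdgeExplanation from a decoded `why` JSON object."""
    kind = EdgeExplanationType(raw["type"])  # raises ValueError for unlisted types
    confidence = float(raw.get("confidence", 1.0))
    if not 0.0 <= confidence <= 1.0:  # assumed valid range
        raise ValueError(f"confidence out of range: {confidence}")
    return EdgeExplanation(kind, raw.get("loc"), confidence, raw.get("guard"))


# Usage: the `why` object from the schema example.
why = parse_why({"type": "Import", "loc": "src/index.ts:3", "confidence": 1.0})
print(why.type.value, why.loc)
```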
---

## API Contracts

### Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/v1/reachgraphs` | Upsert subgraph |
| GET | `/v1/reachgraphs/{digest}` | Get full subgraph |
| GET | `/v1/reachgraphs/{digest}/slice` | Query slice |
| POST | `/v1/reachgraphs/replay` | Verify determinism |
| GET | `/v1/reachgraphs/by-artifact/{digest}` | List by artifact |

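To make the path templates concrete, here is a minimal Python sketch that builds a URL for the slice endpoint. The host name is a placeholder, the helper name `slice_url` is hypothetical, and whether the service expects the `:` in a digest to be percent-encoded is not stated in this document; the sketch leaves it unencoded.

```python
from urllib.parse import quote, urlencode


def slice_url(base_url: str, digest: str, **params) -> str:
    """Build a GET URL for /v1/reachgraphs/{digest}/slice.

    Parameters with a value of None are left out of the query string.
    """
    path = f"/v1/reachgraphs/{quote(digest, safe=':')}/slice"
    query = {name: value for name, value in params.items() if value is not None}
    url = base_url.rstrip("/") + path
    return f"{url}?{urlencode(query)}" if query else url


print(slice_url("https://reachgraph.example", "sha256:abc123",
                cve="CVE-2024-1234", depth=3))
# https://reachgraph.example/v1/reachgraphs/sha256:abc123/slice?cve=CVE-2024-1234&depth=3
```

Dropping `None` values keeps the query string limited to the filters the caller actually set, which also helps if the URL is used as part of a cache key.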
### Slice Query Parameters

| Parameter | Description |
|-----------|-------------|
| `q` | PURL pattern for package slice |
| `cve` | CVE ID for vulnerability slice |
| `entrypoint` | Entrypoint path/symbol |
| `file` | File path pattern (glob) |
| `depth` | Maximum traversal depth |
| `direction` | Traversal direction: `upstream`, `downstream`, or `both` |

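The interaction of `depth` and `direction` can be sketched as a bounded breadth-first traversal over the graph's edges. The Python below is illustrative only and is not the `SliceQueryEngine` implementation; it assumes `downstream` follows edges from `from` to `to`, `upstream` follows them in reverse, and `depth` counts hops from the start node.

```python
from collections import deque


def slice_nodes(edges, start, depth, direction="downstream"):
    """Return the sorted node ids within `depth` hops of `start`."""
    if direction not in ("upstream", "downstream", "both"):
        raise ValueError(f"unknown direction: {direction}")

    # Build an adjacency map for the requested direction(s).
    neighbors = {}
    for edge in edges:
        if direction in ("downstream", "both"):
            neighbors.setdefault(edge["from"], set()).add(edge["to"])
        if direction in ("upstream", "both"):
            neighbors.setdefault(edge["to"], set()).add(edge["from"])

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # depth limit reached; do not expand further
        for nxt in sorted(neighbors.get(node, ())):  # sorted for stable order
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return sorted(seen)


# Usage: a -> b -> c, plus d -> b.
edges = [{"from": "a", "to": "b"}, {"from": "b", "to": "c"}, {"from": "d", "to": "b"}]
print(slice_nodes(edges, "b", 1, "downstream"))  # ['b', 'c']
print(slice_nodes(edges, "b", 1, "upstream"))    # ['a', 'b', 'd']
```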
---

## Coding Guidelines

### Determinism Rules

1. **All JSON serialization must use canonical format**
   - Sorted object keys (lexicographic)
   - Sorted arrays by deterministic field
   - UTC ISO-8601 timestamps
   - No null fields (omit when null)

2. **Hash computation excludes signatures**
   - Remove `signatures` field before hashing
   - Use BLAKE3-256 for all digests

3. **Tests must verify determinism**
   - Same input must produce same digest
   - Golden samples for regression testing

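The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the `CanonicalReachGraphSerializer` or `ReachGraphDigestComputer` code. Two things are assumptions: the array sort keys (`id` for nodes, `from` then `to` for edges), and the hash function. Python's standard library has no BLAKE3, so SHA-256 stands in here; a real implementation would plug a BLAKE3-256 binding into the `hasher` parameter.

```python
import hashlib
import json


def _drop_nulls(value):
    """Recursively omit object fields whose value is null."""
    if isinstance(value, dict):
        return {k: _drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [_drop_nulls(v) for v in value]
    return value


def _stable(element):
    """Tie-breaker so equal sort keys never fall back to input order."""
    return json.dumps(element, sort_keys=True)


def canonical_bytes(graph: dict) -> bytes:
    """Serialize a graph canonically, excluding the `signatures` field."""
    doc = _drop_nulls({k: v for k, v in graph.items() if k != "signatures"})
    if "nodes" in doc:  # assumed sort key: node id
        doc["nodes"] = sorted(doc["nodes"], key=lambda n: (n["id"], _stable(n)))
    if "edges" in doc:  # assumed sort key: (from, to)
        doc["edges"] = sorted(
            doc["edges"], key=lambda e: (e["from"], e["to"], _stable(e))
        )
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def compute_digest(graph: dict, hasher=hashlib.sha256, prefix="sha256") -> str:
    """Digest of the canonical form. SHA-256 is a stand-in for BLAKE3-256."""
    return f"{prefix}:{hasher(canonical_bytes(graph)).hexdigest()}"
```

A determinism test in the spirit of rule 3 would build the same graph with shuffled keys, shuffled arrays, and different signatures, then assert that `compute_digest` returns the same value for each variant.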
### Error Handling

- Return structured errors with codes
- Log correlation IDs for tracing
- Never expose internal details in errors

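One way to satisfy all three points at once is to log the underlying exception server-side and return only a code, a safe message, and the correlation ID. The Python sketch below shows the idea; the response shape, the field names, and the `SLICE_TIMEOUT` code are hypothetical and are not taken from the module's contracts.

```python
import logging
import uuid
from typing import Optional

log = logging.getLogger("reachgraph")


def error_response(
    code: str,
    message: str,
    correlation_id: Optional[str] = None,
    exc: Optional[BaseException] = None,
) -> dict:
    """Build a client-safe error body and log the internal details."""
    correlation_id = correlation_id or uuid.uuid4().hex
    # The exception (with traceback) goes to the log only, never to the client.
    log.error(
        "request failed code=%s correlation_id=%s",
        code, correlation_id, exc_info=exc,
    )
    return {
        "error": {
            "code": code,
            "message": message,
            "correlationId": correlation_id,
        }
    }
```

A caller would pass the caught exception as `exc` and a fixed, reviewed string as `message`, so that connection strings, SQL, or stack traces cannot leak into the response.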
### Performance

- Cache hot slices in Valkey (30-minute TTL)
- Compress stored blobs with gzip
- Paginate large results (50 nodes per page)
- Time out long queries (30 s max)

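As a small worked example of the pagination rule, the Python sketch below splits a node list into pages of 50. The response field names (`items`, `page`, `totalPages`, `hasNext`) and the 1-based page numbering are assumptions for illustration, not the service's actual contract.

```python
PAGE_SIZE = 50  # nodes per page, as stated in the guideline


def paginate(nodes: list, page: int, page_size: int = PAGE_SIZE) -> dict:
    """Return one 1-based page of `nodes` plus paging metadata."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    total_pages = max(1, -(-len(nodes) // page_size))  # ceiling division
    return {
        "items": nodes[start:start + page_size],
        "page": page,
        "totalPages": total_pages,
        "hasNext": page < total_pages,
    }


# Usage: 120 nodes yield pages of 50, 50, and 20.
nodes = [f"node-{i}" for i in range(120)]
last = paginate(nodes, 3)
print(len(last["items"]), last["totalPages"], last["hasNext"])  # 20 3 False
```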
---

## Integration Points

### Upstream (Data Producers)

| Module | Data | Integration |
|--------|------|-------------|
| Scanner.CallGraph | Call graph nodes/edges | `ICallGraphExtractor` produces input |
| Signals | Runtime facts | Correlates static + dynamic paths |
| Attestor | DSSE signing | `IReachGraphSignerService` delegates |

### Downstream (Data Consumers)

| Module | Usage | Integration |
|--------|-------|-------------|
| Policy | VEX decisions | `ReachabilityRequirementGate` queries slices |
| Web | UI panel | REST API for "Why Reachable?" |
| CLI | Proof export | `stella reachgraph` commands |
| ExportCenter | Batch reports | Includes subgraphs in evidence bundles |

---

## Testing Requirements

### Unit Tests

- `CanonicalSerializerTests.cs` - Deterministic serialization
- `DigestComputerTests.cs` - BLAKE3 hashing
- `EdgeExplanationTests.cs` - Type coverage
- `SliceEngineTests.cs` - Query correctness

### Integration Tests

- PostgreSQL with Testcontainers
- Valkey cache behavior
- Tenant isolation (RLS)
- Rate limiting enforcement

### Golden Samples

Located in `tests/ReachGraph/Fixtures/`:

- `simple-single-path.reachgraph.min.json`
- `multi-edge-java.reachgraph.min.json`
- `feature-flag-guards.reachgraph.min.json`
- `large-graph-50-nodes.reachgraph.min.json`

---

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `REACHGRAPH_POSTGRES_CONNECTION` | PostgreSQL connection string | - |
| `REACHGRAPH_VALKEY_CONNECTION` | Valkey connection string | - |
| `REACHGRAPH_CACHE_TTL_MINUTES` | Cache TTL for full graphs | 1440 |
| `REACHGRAPH_SLICE_CACHE_TTL_MINUTES` | Cache TTL for slices | 30 |
| `REACHGRAPH_MAX_GRAPH_SIZE_MB` | Max graph size in cache | 10 |

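For readers who want to see the defaults applied, this Python sketch reads the variables from the table and falls back to the listed defaults. The `ReachGraphSettings` type and `load_settings` helper are hypothetical names used for illustration; the service itself is a .NET application and loads its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ReachGraphSettings:
    postgres_connection: Optional[str]
    valkey_connection: Optional[str]
    cache_ttl_minutes: int
    slice_cache_ttl_minutes: int
    max_graph_size_mb: int


def load_settings(env: Mapping[str, str] = os.environ) -> ReachGraphSettings:
    """Read the REACHGRAPH_* variables, applying the documented defaults."""

    def as_int(name: str, default: int) -> int:
        raw = env.get(name)
        return default if raw in (None, "") else int(raw)

    return ReachGraphSettings(
        postgres_connection=env.get("REACHGRAPH_POSTGRES_CONNECTION"),
        valkey_connection=env.get("REACHGRAPH_VALKEY_CONNECTION"),
        cache_ttl_minutes=as_int("REACHGRAPH_CACHE_TTL_MINUTES", 1440),
        slice_cache_ttl_minutes=as_int("REACHGRAPH_SLICE_CACHE_TTL_MINUTES", 30),
        max_graph_size_mb=as_int("REACHGRAPH_MAX_GRAPH_SIZE_MB", 10),
    )
```

Calling `load_settings({})` returns the defaults (1440, 30, and 10) with both connection strings unset, which makes the helper easy to exercise in tests without touching the real environment.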
### YAML Configuration

```yaml
# etc/reachgraph.yaml
reachgraph:
  store:
    maxDepth: 10
    maxPaths: 5
    compressionLevel: 6
  cache:
    enabled: true
    ttlMinutes: 30
  replay:
    enabled: true
    logResults: true
```

---

## Observability

### Metrics

- `reachgraph_upsert_total` - Upsert count by result
- `reachgraph_query_duration_seconds` - Query latency histogram
- `reachgraph_cache_hit_ratio` - Cache hit rate
- `reachgraph_replay_match_total` - Replay verification results
- `reachgraph_slice_size_bytes` - Slice response sizes

### Logging

- Structured JSON logs
- Correlation ID in all entries
- Tenant context
- Query parameters (sanitized)

### Tracing

- OpenTelemetry spans for:
  - Upsert operations
  - Slice queries
  - Cache lookups
  - Replay verification

---

## Related Documentation

- `docs/implplan/SPRINT_1227_0012_0001_LB_reachgraph_core.md`
- `docs/implplan/SPRINT_1227_0012_0002_BE_reachgraph_store.md`
- `docs/implplan/SPRINT_1227_0012_0003_FE_reachgraph_integration.md`
- `src/Attestor/POE_PREDICATE_SPEC.md` (predecessor schema)
- `docs/modules/scanner/architecture.md`
- `docs/modules/signals/architecture.md`

---

_Module created: 2025-12-27. Owner: ReachGraph Guild._