# StellaOps.ReachGraph Module

## Module Charter

The **ReachGraph** module provides a unified store for reachability subgraphs, enabling fast, deterministic, audit-ready answers to "*exactly why* a dependency is reachable."

### Mission

Consolidate reachability data from Scanner, Signals, and Attestor into a single, content-addressed store with:

- **Edge explainability**: Every edge carries "why" metadata (import, dynamic load, guards)
- **Deterministic replay**: Same inputs produce identical digests
- **Slice queries**: Fast queries by package, CVE, entrypoint, or file
- **Audit-ready proofs**: DSSE-signed artifacts verifiable offline

### Scope

| In Scope | Out of Scope |
|----------|--------------|
| ReachGraph schema and data model | Call graph extraction (handled by Scanner) |
| Content-addressed storage | Runtime signal collection (handled by Signals) |
| Slice query APIs | DSSE signing internals (handled by Attestor) |
| Deterministic serialization | VEX document ingestion (handled by Excititor) |
| Valkey caching | Policy evaluation (handled by Policy module) |
| Replay verification | UI components (handled by Web module) |

---

## Architecture

### Component Diagram

```
┌──────────────────────────────────────────────────────────────────┐
│                        ReachGraph Module                         │
├──────────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐   │
│  │  Schema Layer   │  │  Serialization  │  │  Signing Layer  │   │
│  │                 │  │                 │  │                 │   │
│  │ ReachGraphMin   │  │ Canonical JSON  │  │ DSSE Wrapper    │   │
│  │ EdgeExplanation │  │ BLAKE3 Digest   │  │ Verification    │   │
│  │ Provenance      │  │ Compression     │  │                 │   │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘   │
│           │                    │                    │            │
│  ┌────────▼────────────────────▼────────────────────▼─────────┐  │
│  │                        Store Layer                         │  │
│  │                                                            │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │  │
│  │  │ Repository   │  │ Slice Engine │  │ Replay Driver│      │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘      │  │
│  └────────────────────────────────────────────────────────────┘  │
│                              │                                   │
│  ┌───────────────────────────▼────────────────────────────────┐  │
│  │                     Persistence Layer                      │  │
│  │                                                            │  │
│  │  ┌──────────────┐  ┌──────────────┐                        │  │
│  │  │ PostgreSQL   │  │   Valkey     │                        │  │
│  │  │ (primary)    │  │   (cache)    │                        │  │
│  │  └──────────────┘  └──────────────┘                        │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
```

### Project Structure

```
src/__Libraries/StellaOps.ReachGraph/
├── Schema/
│   ├── ReachGraphMinimal.cs          # Top-level graph structure
│   ├── ReachGraphNode.cs             # Node with metadata
│   ├── ReachGraphEdge.cs             # Edge with explanation
│   ├── EdgeExplanation.cs            # Why the edge exists
│   └── ReachGraphProvenance.cs       # Input tracking
├── Serialization/
│   ├── CanonicalReachGraphSerializer.cs
│   ├── SortedKeysJsonConverter.cs
│   └── DeterministicArraySortConverter.cs
├── Hashing/
│   ├── ReachGraphDigestComputer.cs
│   └── Blake3HashProvider.cs
├── Signing/
│   ├── IReachGraphSignerService.cs
│   └── ReachGraphSignerService.cs
├── Store/
│   ├── IReachGraphRepository.cs
│   ├── PostgresReachGraphRepository.cs
│   └── SliceQueryEngine.cs
├── Cache/
│   ├── IReachGraphCache.cs
│   └── ValkeyReachGraphCache.cs
├── Replay/
│   ├── IReplayDriver.cs
│   └── DeterministicReplayDriver.cs
└── StellaOps.ReachGraph.csproj

src/__Libraries/StellaOps.ReachGraph.Persistence/
├── Migrations/
│   └── 001_reachgraph_store.sql
├── Models/
│   └── SubgraphEntity.cs
└── StellaOps.ReachGraph.Persistence.csproj

src/ReachGraph/
├── StellaOps.ReachGraph.WebService/
│   ├── Endpoints/
│   │   ├── ReachGraphEndpoints.cs
│   │   └── SliceQueryEndpoints.cs
│   ├── Contracts/
│   │   ├── UpsertRequest.cs
│   │   ├── SliceQueryRequest.cs
│   │   └── ReplayRequest.cs
│   ├── Program.cs
│   └── openapi.yaml
└── __Tests/
    └── StellaOps.ReachGraph.WebService.Tests/
```

---

## Data Model

### ReachGraphMinimal Schema (v1)

```json
{
  "schemaVersion": "reachgraph.min@v1",
  "artifact": {
    "name": "svc.payments",
    "digest": "sha256:abc123...",
    "env": ["linux/amd64"]
  },
  "scope": {
    "entrypoints": ["/app/bin/svc"],
    "selectors": ["prod"],
    "cves": ["CVE-2024-1234"]
  },
  "nodes": [
    {
      "id": "sha256:nodeHash1",
      "kind": "function",
      "ref": "main()",
      "file": "src/index.ts",
      "line": 1,
      "isEntrypoint": true
    }
  ],
  "edges": [
    {
      "from": "sha256:nodeHash1",
      "to": "sha256:nodeHash2",
      "why": {
        "type": "Import",
        "loc": "src/index.ts:3",
        "confidence": 1.0
      }
    }
  ],
  "provenance": {
    "intoto": ["attestation-1.link"],
    "inputs": {
      "sbom": "sha256:sbomDigest",
      "vex": "sha256:vexDigest",
      "callgraph": "sha256:cgDigest"
    },
    "computedAt": "2025-12-27T10:00:00Z",
    "analyzer": {
      "name": "stellaops-scanner",
      "version": "1.0.0",
      "toolchainDigest": "sha256:..."
    }
  },
  "signatures": [
    {"keyId": "scanner-signing-2025", "sig": "base64..."}
  ]
}
```

### Edge Explanation Types

| Type | Description | Example Guard |
|------|-------------|---------------|
| `Import` | Static import statement | - |
| `DynamicLoad` | Runtime require/import | - |
| `Reflection` | Reflective invocation | - |
| `Ffi` | Foreign function call | - |
| `EnvGuard` | Environment variable check | `DEBUG=true` |
| `FeatureFlag` | Feature flag condition | `FEATURE_X=enabled` |
| `PlatformArch` | Platform/arch guard | `os=linux` |
| `TaintGate` | Sanitization/validation | - |
| `LoaderRule` | PLT/IAT/GOT entry | `RTLD_LAZY` |
| `DirectCall` | Direct function call | - |
| `Unknown` | Cannot determine | - |

---

## API Contracts

### Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/v1/reachgraphs` | Upsert subgraph |
| GET | `/v1/reachgraphs/{digest}` | Get full subgraph |
| GET | `/v1/reachgraphs/{digest}/slice` | Query slice |
| POST | `/v1/reachgraphs/replay` | Verify determinism |
| GET | `/v1/reachgraphs/by-artifact/{digest}` | List by artifact |

### Slice Query Parameters

| Parameter | Description |
|-----------|-------------|
| `q` | PURL pattern for package slice |
| `cve` | CVE ID for vulnerability slice |
| `entrypoint` | Entrypoint path/symbol |
| `file` | File path pattern (glob) |
| `depth` | Maximum traversal depth |
| `direction` | Traversal direction: `upstream`, `downstream`, or `both` |

---

## Coding Guidelines

### Determinism Rules

1. **All JSON serialization must use canonical format**
   - Sorted object keys (lexicographic)
   - Sorted arrays by deterministic field
   - UTC ISO-8601 timestamps
   - No null fields (omit when null)
2. **Hash computation excludes signatures**
   - Remove `signatures` field before hashing
   - Use BLAKE3-256 for all digests
3. **Tests must verify determinism**
   - Same input must produce same digest
   - Golden samples for regression testing

### Error Handling

- Return structured errors with codes
- Log correlation IDs for tracing
- Never expose internal details in errors

### Performance

- Cache hot slices in Valkey (30-minute TTL)
- Compress stored blobs with gzip
- Paginate large results (50 nodes per page)
- Time out long-running queries (30 s maximum)

---

## Integration Points

### Upstream (Data Producers)

| Module | Data | Integration |
|--------|------|-------------|
| Scanner.CallGraph | Call graph nodes/edges | `ICallGraphExtractor` produces input |
| Signals | Runtime facts | Correlates static + dynamic paths |
| Attestor | DSSE signing | `IReachGraphSignerService` delegates signing to Attestor |

### Downstream (Data Consumers)

| Module | Usage | Integration |
|--------|-------|-------------|
| Policy | VEX decisions | `ReachabilityRequirementGate` queries slices |
| Web | UI panel | REST API for "Why Reachable?" |
| CLI | Proof export | `stella reachgraph` commands |
| ExportCenter | Batch reports | Includes subgraphs in evidence bundles |

---

## Testing Requirements

### Unit Tests

- `CanonicalSerializerTests.cs` - Deterministic serialization
- `DigestComputerTests.cs` - BLAKE3 hashing
- `EdgeExplanationTests.cs` - Type coverage
- `SliceEngineTests.cs` - Query correctness

### Integration Tests

- PostgreSQL with Testcontainers
- Valkey cache behavior
- Tenant isolation (RLS)
- Rate limiting enforcement

### Golden Samples

Located in `tests/ReachGraph/Fixtures/`:

- `simple-single-path.reachgraph.min.json`
- `multi-edge-java.reachgraph.min.json`
- `feature-flag-guards.reachgraph.min.json`
- `large-graph-50-nodes.reachgraph.min.json`

---

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `REACHGRAPH_POSTGRES_CONNECTION` | PostgreSQL connection string | - |
| `REACHGRAPH_VALKEY_CONNECTION` | Valkey connection string | - |
| `REACHGRAPH_CACHE_TTL_MINUTES` | Cache TTL for full graphs | 1440 |
| `REACHGRAPH_SLICE_CACHE_TTL_MINUTES` | Cache TTL for slices | 30 |
| `REACHGRAPH_MAX_GRAPH_SIZE_MB` | Max graph size in cache | 10 |

### YAML Configuration

```yaml
# etc/reachgraph.yaml
reachgraph:
  store:
    maxDepth: 10
    maxPaths: 5
    compressionLevel: 6
  cache:
    enabled: true
    ttlMinutes: 30
  replay:
    enabled: true
    logResults: true
```

---

## Observability

### Metrics

- `reachgraph_upsert_total` - Upsert count by result
- `reachgraph_query_duration_seconds` - Query latency histogram
- `reachgraph_cache_hit_ratio` - Cache hit rate
- `reachgraph_replay_match_total` - Replay verification results
- `reachgraph_slice_size_bytes` - Slice response sizes

### Logging

- Structured JSON logs
- Correlation ID in all entries
- Tenant context
- Query parameters (sanitized)

### Tracing

- OpenTelemetry spans for:
  - Upsert operations
  - Slice queries
  - Cache lookups
  - Replay verification

---

## Related Documentation

- `docs/implplan/SPRINT_1227_0012_0001_LB_reachgraph_core.md`
- `docs/implplan/SPRINT_1227_0012_0002_BE_reachgraph_store.md`
- `docs/implplan/SPRINT_1227_0012_0003_FE_reachgraph_integration.md`
- `src/Attestor/POE_PREDICATE_SPEC.md` (predecessor schema)
- `docs/modules/scanner/architecture.md`
- `docs/modules/signals/architecture.md`

---

_Module created: 2025-12-27. Owner: ReachGraph Guild._
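The determinism rules under Coding Guidelines (canonical JSON, signatures excluded from hashing) can be illustrated with a minimal, language-neutral sketch. It is not the C# `CanonicalReachGraphSerializer` or `ReachGraphDigestComputer`, and the helper names `canonicalize` and `compute_digest` are hypothetical. Assumptions: `nodes` are sorted by `id` and `edges` by `(from, to, why.type)` (the rules only say "by deterministic field"), timestamps are already UTC ISO-8601 strings, and the standard-library SHA-256 stands in for BLAKE3-256 so the sketch runs without third-party packages.

```python
import hashlib
import json
from typing import Any


def _drop_nulls(value: Any) -> Any:
    """Recursively omit object fields whose value is null (None)."""
    if isinstance(value, dict):
        return {k: _drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # Array elements are kept; only object *fields* are omitted when null.
        return [_drop_nulls(v) for v in value]
    return value


def canonicalize(graph: dict) -> bytes:
    """Return canonical JSON bytes for a ReachGraph document (sketch)."""
    # Hash computation excludes signatures: drop the field first.
    doc = {k: v for k, v in graph.items() if k != "signatures"}
    doc = _drop_nulls(doc)
    # ASSUMED sort keys for the two large arrays.
    if "nodes" in doc:
        doc["nodes"] = sorted(doc["nodes"], key=lambda n: n["id"])
    if "edges" in doc:
        doc["edges"] = sorted(
            doc["edges"],
            key=lambda e: (e["from"], e["to"], e.get("why", {}).get("type", "")),
        )
    # sort_keys orders object keys by code point; compact separators remove
    # insignificant whitespace so the byte output is stable.
    return json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def compute_digest(graph: dict) -> str:
    # Stand-in hash: hashlib SHA-256. The module specifies BLAKE3-256,
    # which would require a third-party package.
    return "sha256:" + hashlib.sha256(canonicalize(graph)).hexdigest()


if __name__ == "__main__":
    signed = {
        "schemaVersion": "reachgraph.min@v1",
        "nodes": [{"id": "b"}, {"id": "a"}],
        "edges": [],
        "signatures": [{"keyId": "k", "sig": "x"}],
    }
    unsigned = {
        "edges": [],
        "nodes": [{"id": "a"}, {"id": "b"}],
        "schemaVersion": "reachgraph.min@v1",
    }
    # Key order, node order, and signatures do not affect the digest.
    print(compute_digest(signed) == compute_digest(unsigned))  # True
```

Because signatures are stripped before hashing, a graph can be re-signed (or carry several DSSE signatures) without changing its content address.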
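The `depth` and `direction` slice query parameters imply a bounded graph traversal. The sketch below shows one way a slice could be computed with breadth-first search from a set of seed node ids (for example, nodes matched by `entrypoint` or `file`). It is an assumption about behavior, not the `SliceQueryEngine` implementation; `slice_nodes` is a hypothetical name, and it assumes `downstream` follows edges `from -> to` while `upstream` follows them in reverse.

```python
from collections import deque
from typing import Iterable


def slice_nodes(edges: Iterable[dict], seeds: Iterable[str],
                depth: int, direction: str = "downstream") -> set[str]:
    """Return node ids within `depth` hops of `seeds` (hypothetical sketch)."""
    if direction not in ("upstream", "downstream", "both"):
        raise ValueError(f"unsupported direction: {direction}")

    # Adjacency lists in both directions, built once from the edge list.
    forward: dict[str, list[str]] = {}
    backward: dict[str, list[str]] = {}
    for e in edges:
        forward.setdefault(e["from"], []).append(e["to"])
        backward.setdefault(e["to"], []).append(e["from"])

    def neighbours(node: str) -> list[str]:
        out: list[str] = []
        if direction in ("downstream", "both"):
            out.extend(forward.get(node, []))
        if direction in ("upstream", "both"):
            out.extend(backward.get(node, []))
        return out

    # BFS visits nodes in order of distance, so the first visit is the
    # shortest path and the depth cut-off is exact.
    seen = set(seeds)
    queue = deque((s, 0) for s in seen)
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # depth limit reached; do not expand further
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen
```

Keeping the walk bounded by `depth` is what makes the 30 s query timeout and per-page limits practical for large graphs; a real engine would also carry each edge's `why` metadata into the response.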