# ReachGraph Module Architecture

## Overview

The **ReachGraph** module provides a unified store for reachability subgraphs, enabling fast, deterministic, audit-ready answers to "*exactly why* a dependency is reachable." It consolidates data from Scanner, Signals, and Attestor into content-addressed artifacts with edge-level explainability.

## Problem Statement

Before ReachGraph, reachability data was scattered across multiple modules:

| Module | Data | Limitation |
|--------|------|------------|
| Scanner.CallGraph | `CallGraphSnapshot` | No unified query API |
| Signals | `ReachabilityFactDocument` | Runtime-focused, not auditable |
| Attestor | PoE JSON | Per-CVE only, no slice queries |
| Graph | Generic nodes/edges | Not optimized for "why reachable?" |

**Result**: Answering "why is lodash reachable?" required querying multiple systems with no guarantee of consistency or auditability.

## Solution

ReachGraph provides:

1. **Unified Schema**: Extends the PoE subgraph format with edge explainability
2. **Content-Addressed Store**: All artifacts are identified by their BLAKE3 digest
3. **Slice Query API**: Fast queries by package, CVE, entrypoint, or file
4. **Deterministic Replay**: Verifies that the same inputs produce the same graph
5. **DSSE Signing**: Offline-verifiable proofs

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                            Consumers                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │  Policy  │  │   Web    │  │   CLI    │  │  Export  │         │
│  │  Engine  │  │ Console  │  │          │  │  Center  │         │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘         │
└───────┼─────────────┼─────────────┼─────────────┼───────────────┘
        │             │             │             │
        └─────────────┴──────┬──────┴─────────────┘
                             │
┌────────────────────────────▼────────────────────────────────────┐
│                      ReachGraph WebService                      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                         REST API                          │  │
│  │ POST /v1/reachgraphs           GET /v1/reachgraphs/{d}    │  │
│  │ GET /v1/reachgraphs/{d}/slice  POST /v1/reachgraphs/replay│  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Slice Query Engine                     │  │
│  │ - Package slice (by PURL)                                 │  │
│  │ - CVE slice (paths to vulnerable sinks)                   │  │
│  │ - Entrypoint slice (reachable from entry)                 │  │
│  │ - File slice (changed file impact)                        │  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                       Replay Driver                       │  │
│  │ - Rebuild graph from inputs                               │  │
│  │ - Verify digest matches                                   │  │
│  │ - Log for audit trail                                     │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────┬───────────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────────┐
│                     ReachGraph Core Library                     │
│  ┌────────────────┐   ┌────────────────┐   ┌────────────────┐   │
│  │     Schema     │   │ Serialization  │   │    Signing     │   │
│  │                │   │                │   │                │   │
│  │ ReachGraphMin  │   │ Canonical JSON │   │ DSSE Wrapper   │   │
│  │ EdgeExplanation│   │ BLAKE3 Digest  │   │ Attestor Int.  │   │
│  │ Provenance     │   │ Compression    │   │                │   │
│  └────────────────┘   └────────────────┘   └────────────────┘   │
└─────────────────────────────┬───────────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────────┐
│                        Persistence Layer                        │
│  ┌────────────────────────┐    ┌────────────────────────┐       │
│  │       PostgreSQL       │    │         Valkey         │       │
│  │                        │    │                        │       │
│  │ reachgraph.subgraphs   │    │ Hot slice cache        │       │
│  │ reachgraph.slice_cache │    │ (30min TTL)            │       │
│  │ reachgraph.replay_log  │    │                        │       │
│  └────────────────────────┘    └────────────────────────┘       │
└─────────────────────────────────────────────────────────────────┘
                              ▲
                              │
┌─────────────────────────────┴───────────────────────────────────┐
│                            Producers                            │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│  │   Scanner    │   │   Signals    │   │   Attestor   │         │
│  │  CallGraph   │   │ RuntimeFacts │   │     PoE      │         │
│  └──────────────┘   └──────────────┘   └──────────────┘         │
└─────────────────────────────────────────────────────────────────┘
```

## Data Model

### ReachGraphMinimal (v1)

The core schema extends the PoE predicate format:

```json
{
  "schemaVersion": "reachgraph.min@v1",
  "artifact": {
    "name": "svc.payments",
    "digest": "sha256:abc123...",
    "env": ["linux/amd64"]
  },
  "scope": {
    "entrypoints": ["/app/bin/svc"],
    "selectors": ["prod"],
    "cves": ["CVE-2024-1234"]
  },
  "nodes": [...],
  "edges": [...],
  "provenance": {...},
  "signatures": [...]
}
```

### Edge Explainability

Every edge carries metadata explaining *why* it exists:

| Type | Description | Example Guard |
|------|-------------|---------------|
| `Import` | Static import | - |
| `DynamicLoad` | Runtime load | - |
| `Reflection` | Reflective call | - |
| `EnvGuard` | Env variable check | `DEBUG=true` |
| `FeatureFlag` | Feature flag | `FEATURE_X=enabled` |
| `PlatformArch` | Platform guard | `os=linux` |
| `LoaderRule` | PLT/IAT/GOT | `RTLD_LAZY` |

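A minimal Python sketch of how the explanation types above might travel with each edge and be rendered into a "why reachable?" line. The `EdgeKind`, `Edge`, and `describe` names and the `guard` field are hypothetical illustrations, not the ReachGraph schema's actual field names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EdgeKind(Enum):
    """Why an edge exists; values mirror the explanation types in the table above."""
    IMPORT = "Import"
    DYNAMIC_LOAD = "DynamicLoad"
    REFLECTION = "Reflection"
    ENV_GUARD = "EnvGuard"
    FEATURE_FLAG = "FeatureFlag"
    PLATFORM_ARCH = "PlatformArch"
    LOADER_RULE = "LoaderRule"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: EdgeKind
    guard: Optional[str] = None  # hypothetical field: the condition text, e.g. "DEBUG=true"


def describe(edge: Edge) -> str:
    """Render one hop as a human-readable explanation of why the edge is there."""
    reason = edge.kind.value
    if edge.guard:
        reason = f"{reason}: {edge.guard}"
    return f"{edge.source} -> {edge.target} [{reason}]"


if __name__ == "__main__":
    hop = Edge("app.main", "lodash.merge", EdgeKind.ENV_GUARD, guard="DEBUG=true")
    print(describe(hop))  # app.main -> lodash.merge [EnvGuard: DEBUG=true]
```

Keeping the explanation as a closed enum plus an optional free-text guard means unguarded kinds such as `Import` render without a condition, while guarded kinds surface the exact condition an auditor would need to check.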
### Content Addressing

All artifacts are identified by a BLAKE3-256 digest:

- Computed from canonical JSON (sorted keys, no nulls)
- Signatures are excluded from the hash computation
- Enables idempotent upserts and cache keying

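The canonicalization rules above can be sketched in Python as follows. This is an illustration under stated assumptions: the helper names are hypothetical, only the top-level `signatures` field is assumed to be excluded, and because the Python standard library has no BLAKE3 implementation, SHA-256 is swapped in as a stand-in hash (the `sha256:` prefix belongs to the stand-in, not to ReachGraph's digest format).

```python
import hashlib
import json
from typing import Any


def _drop_nulls(value: Any) -> Any:
    """Recursively omit null-valued object fields (null array elements are kept)."""
    if isinstance(value, dict):
        return {k: _drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [_drop_nulls(v) for v in value]
    return value


def canonical_json(doc: dict) -> bytes:
    """Sorted keys, no insignificant whitespace, nulls omitted, signatures excluded."""
    body = {k: v for k, v in doc.items() if k != "signatures"}
    return json.dumps(
        _drop_nulls(body), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def content_digest(doc: dict) -> str:
    # Stand-in hash: SHA-256 from hashlib. ReachGraph itself uses BLAKE3-256,
    # which in Python requires a third-party package.
    return "sha256:" + hashlib.sha256(canonical_json(doc)).hexdigest()


if __name__ == "__main__":
    a = {"schemaVersion": "reachgraph.min@v1", "nodes": [], "provenance": None}
    b = {"nodes": [], "schemaVersion": "reachgraph.min@v1", "signatures": [{"sig": "x"}]}
    print(content_digest(a) == content_digest(b))  # True
```

Because the digest depends only on the canonical bytes, reordering keys, dropping null fields, or attaching signatures leaves it unchanged, which is what lets repeated upserts of the same graph be idempotent.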
## API Design

### Core Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/v1/reachgraphs` | Upsert subgraph (idempotent) |
| GET | `/v1/reachgraphs/{digest}` | Get full subgraph |
| GET | `/v1/reachgraphs/{digest}/slice` | Query slice |
| POST | `/v1/reachgraphs/replay` | Verify determinism |

### Slice Query Types

1. **Package Slice** (`?q=pkg:npm/lodash@4.17.21`)
   - Returns the subgraph containing the package and its neighbors
   - Configurable depth and direction

2. **CVE Slice** (`?cve=CVE-2024-1234`)
   - Returns all paths from entrypoints to vulnerable sinks
   - Includes edge explanations for each hop

3. **Entrypoint Slice** (`?entrypoint=/app/bin/svc`)
   - Returns everything reachable from the entrypoint
   - Optionally filtered to paths that reach sinks

4. **File Slice** (`?file=src/**/*.ts`)
   - Returns the impact of changed files
   - Useful for PR-based analysis

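The package, entrypoint, and CVE slices reduce to standard graph traversals. The Python sketch below illustrates them over a plain `(caller, callee)` edge list; it is not the Slice Query Engine, and the function names, the `depth`/`direction` parameters, and the `"in"`/`"out"`/`"both"` values are assumptions. File slices are left out because they would additionally need a mapping from files to nodes.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def adjacency(edges: Iterable[Edge], reverse: bool = False) -> Dict[str, List[str]]:
    """Build caller->callees lists (callee->callers when reverse=True)."""
    adj: Dict[str, List[str]] = {}
    for src, dst in edges:
        a, b = (dst, src) if reverse else (src, dst)
        adj.setdefault(a, []).append(b)
    return adj


def reach(adj: Dict[str, List[str]], starts: Iterable[str], max_depth: int = -1) -> Set[str]:
    """Breadth-first reachability from `starts`; max_depth < 0 means unbounded."""
    seen = set(starts)
    queue = deque((node, 0) for node in seen)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth budget spent: do not expand this node
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


def package_slice(edges: List[Edge], purl: str, depth: int = 1, direction: str = "both") -> Set[str]:
    """Nodes within `depth` hops of a package: callees ("out"), callers ("in"), or both."""
    nodes = {purl}
    if direction in ("out", "both"):
        nodes |= reach(adjacency(edges), [purl], depth)
    if direction in ("in", "both"):
        nodes |= reach(adjacency(edges, reverse=True), [purl], depth)
    return nodes


def entrypoint_slice(edges: List[Edge], entry: str, sinks: Optional[List[str]] = None) -> Set[str]:
    """Everything reachable from `entry`; with sinks, only nodes that can also reach one."""
    forward = reach(adjacency(edges), [entry])
    if not sinks:
        return forward
    return forward & reach(adjacency(edges, reverse=True), sinks)


def cve_paths(edges: List[Edge], entrypoints: List[str], sinks: List[str]) -> List[List[str]]:
    """All simple entrypoint->sink paths via depth-first search; a path ends at the first sink."""
    adj, targets = adjacency(edges), set(sinks)
    paths: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        node = path[-1]
        if node in targets:
            paths.append(list(path))
            return
        for nxt in adj.get(node, []):
            if nxt not in path:  # keep paths simple (no cycles)
                path.append(nxt)
                dfs(path)
                path.pop()

    for entry in entrypoints:
        dfs([entry])
    return sorted(paths)  # deterministic output order
```

Enumerating every simple path can grow exponentially on dense graphs, so a production CVE slice would presumably bound path count or length; this sketch does neither.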
## Integration Points

### Upstream (Data Producers)

- **Scanner.CallGraph**: Produces nodes and edges with edge explanations
- **Signals**: Provides runtime confirmation of reachability
- **Attestor**: DSSE signing integration

### Downstream (Data Consumers)

- **Policy Engine**: `ReachabilityRequirementGate` queries slices
- **Web Console**: "Why Reachable?" panel displays paths
- **CLI**: `stella reachgraph slice/replay` commands
- **ExportCenter**: Includes subgraphs in evidence bundles

## Determinism Guarantees

1. **Canonical Serialization**
   - Sorted object keys (lexicographic)
   - Arrays sorted by a deterministic field
   - UTC ISO-8601 timestamps
   - No null fields (omitted when null)

2. **Replay Verification**
   - `POST /v1/reachgraphs/replay` rebuilds the graph from its inputs
   - Returns `{"match": true}` if the digests match
   - Logs all attempts for the audit trail

3. **Content Addressing**
   - The same content always produces the same digest
   - Enables cache keying and deduplication

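A hedged sketch of the replay check in Python: rebuild the graph from its recorded inputs, canonicalize it (sorted arrays, sorted keys, UTC timestamps), compare digests, and append every attempt to an audit log. The `build` callback, the `id`/`from`/`to` sort fields, and the log record shape are hypothetical; SHA-256 again stands in for BLAKE3-256.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Callable, List


def utc_iso(ts: datetime) -> str:
    """Render a timezone-aware timestamp as UTC ISO-8601 ('Z' suffix, whole seconds)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_digest(graph: dict) -> str:
    """Digest after sorting arrays by a deterministic field and object keys lexicographically."""
    doc = {k: v for k, v in graph.items() if k != "signatures"}
    doc["nodes"] = sorted(graph.get("nodes", []), key=lambda n: n["id"])
    doc["edges"] = sorted(graph.get("edges", []), key=lambda e: (e["from"], e["to"]))
    data = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # SHA-256 stands in for BLAKE3-256, which is not in the Python standard library.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def replay(inputs: dict, build: Callable[[dict], dict], expected: str, audit_log: List[dict]) -> dict:
    """Rebuild the graph from its recorded inputs and compare digests; log every attempt."""
    actual = canonical_digest(build(inputs))
    match = actual == expected
    audit_log.append({"expected": expected, "actual": actual, "match": match})
    return {"match": match}
```

Sorting arrays before hashing is what makes the check robust to a builder that emits nodes in a different order on each run; only a genuine change in content flips `match` to false.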
## Performance Characteristics

| Operation | Target Latency | Notes |
|-----------|----------------|-------|
| Slice query | P95 < 200ms | Cached in Valkey |
| Full graph retrieval | P95 < 500ms | Compressed storage |
| Upsert | P95 < 1s | Idempotent, gzip compression |
| Replay | P95 < 5s | Depends on input size |

## Security Considerations

1. **Tenant Isolation**: Row-level security (RLS) policies enforce tenant isolation at the database level
2. **Rate Limiting**: 100 req/min for reads, 20 req/min for writes
3. **DSSE Signing**: All artifacts are verifiable offline
4. **Input Validation**: Schema validation on all requests

## Related Documentation

- [Sprint 1227.0012.0001 - Core Library](../../implplan/SPRINT_1227_0012_0001_LB_reachgraph_core.md)
- [Sprint 1227.0012.0002 - Store APIs](../../implplan/SPRINT_1227_0012_0002_BE_reachgraph_store.md)
- [Sprint 1227.0012.0003 - Integration](../../implplan/SPRINT_1227_0012_0003_FE_reachgraph_integration.md)
- [PoE Predicate Spec](../../../src/Attestor/POE_PREDICATE_SPEC.md)
- [Module AGENTS.md](../../../src/__Libraries/StellaOps.ReachGraph/AGENTS.md)

## Unified Query Interface

The ReachGraph module exposes a **Unified Reachability Query API** that provides a single facade for static, runtime, and hybrid queries.

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/reachability/static` | POST | Query static reachability from call graph analysis |
| `/v1/reachability/runtime` | POST | Query runtime reachability from observed execution facts |
| `/v1/reachability/hybrid` | POST | Combine static and runtime results into a best-effort verdict |
| `/v1/reachability/batch` | POST | Batch queries for CVE vulnerability analysis |

### Adapters

The unified query interface is backed by two adapters:

1. **ReachGraphStoreAdapter**: Implements `IReachGraphAdapter` from `StellaOps.Reachability.Core`
   - Queries static reachability from stored call graphs
   - Uses breadth-first search (BFS) from entrypoints to target symbols
   - Returns `StaticReachabilityResult` with distance, path, and evidence URIs

2. **InMemorySignalsAdapter**: Implements `ISignalsAdapter` from `StellaOps.Reachability.Core`
   - Queries runtime observation facts
   - Supports observation-window filtering
   - Returns `RuntimeReachabilityResult` with hit count, contexts, and evidence URIs
   - Note: Production deployments should integrate with the actual Signals runtime service instead of this in-memory adapter

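The static adapter's technique (BFS from entrypoints to a target symbol, reporting distance and path) can be sketched in Python as follows. The adjacency-dict call graph, `StaticResult`, and `static_reachability` are hypothetical simplifications of `StaticReachabilityResult`; evidence URIs and `SymbolRef` matching are omitted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StaticResult:
    """Hypothetical stand-in for StaticReachabilityResult (evidence URIs omitted)."""
    is_reachable: bool
    distance_from_entrypoint: Optional[int] = None
    path: List[str] = field(default_factory=list)


def static_reachability(adj: Dict[str, List[str]], entrypoints: List[str], target: str) -> StaticResult:
    """Multi-source BFS; when `target` is first dequeued it is at minimum hop distance."""
    parent: Dict[str, Optional[str]] = dict.fromkeys(entrypoints)  # entrypoints have no parent
    queue = deque(parent)
    while queue:
        node = queue.popleft()
        if node == target:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk parent links back to an entrypoint
                path.append(cur)
                cur = parent[cur]
            path.reverse()
            return StaticResult(True, len(path) - 1, path)
        for nxt in sorted(adj.get(node, [])):  # sorted for deterministic tie-breaking
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return StaticResult(False)


if __name__ == "__main__":
    graph = {"entry": ["A"], "A": ["B", "X"], "B": ["target"]}
    print(static_reachability(graph, ["entry"], "target").path)  # ['entry', 'A', 'B', 'target']
```

Because BFS visits nodes in order of hop count, the first path found is a shortest one, which is what a single distance-from-entrypoint value implies; sorting neighbors keeps the chosen path stable across runs when several shortest paths exist.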
### Hybrid Query Flow

```
┌────────────────┐
│  Hybrid Query  │
│    Request     │
└───────┬────────┘
        │
        ▼
┌───────────────────────────────────────────┐
│         ReachabilityIndex Facade          │
│       (StellaOps.Reachability.Core)       │
└───────┬───────────────────────┬───────────┘
        │                       │
        ▼                       ▼
┌───────────────┐       ┌───────────────┐
│  ReachGraph   │       │    Signals    │
│ StoreAdapter  │       │    Adapter    │
└───────┬───────┘       └───────┬───────┘
        │                       │
        ▼                       ▼
┌───────────────┐       ┌───────────────┐
│ PostgreSQL +  │       │ Runtime Facts │
│ Valkey Cache  │       │  (In-Memory)  │
└───────────────┘       └───────────────┘
```

### Query Models

**SymbolRef** - Identifies a code symbol:

```json
{
  "namespace": "System.Net.Http",
  "typeName": "HttpClient",
  "memberName": "GetAsync"
}
```

**StaticReachabilityResult**:

```json
{
  "symbol": { "namespace": "...", "typeName": "...", "memberName": "..." },
  "artifactDigest": "sha256:abc123...",
  "isReachable": true,
  "distanceFromEntrypoint": 3,
  "path": ["entry -> A -> B -> target"],
  "evidenceUris": ["stella:evidence/reachgraph/sha256:abc123/symbol:..."]
}
```

**RuntimeReachabilityResult**:

```json
{
  "symbol": { ... },
  "artifactDigest": "sha256:abc123...",
  "wasObserved": true,
  "hitCount": 1250,
  "firstSeen": "2025-06-10T08:00:00Z",
  "lastSeen": "2025-06-15T12:00:00Z",
  "contexts": [{ "environment": "production", "service": "api-gateway" }],
  "evidenceUris": ["stella:evidence/signals/sha256:abc123/symbol:..."]
}
```

**HybridReachabilityResult**:

```json
{
  "symbol": { ... },
  "artifactDigest": "sha256:abc123...",
  "staticResult": { ... },
  "runtimeResult": { ... },
  "confidence": 0.92,
  "verdict": "reachable",
  "reasoning": "Static analysis shows 3-hop path; runtime confirms 1250 observations"
}
```

---

## 14. Lattice Triage Service

### Overview

The Lattice Triage Service provides a workflow-oriented surface on top of the 8-state reachability lattice, enabling operators to visualise lattice states, apply evidence, perform manual overrides, and maintain a full audit trail of every state transition.

- Library: `StellaOps.Reachability.Core`
- Namespace: `StellaOps.Reachability.Core`

### Models

| Type | Purpose |
|------|---------|
| `LatticeTriageEntry` | Per-(component, CVE) snapshot: current state, confidence, VEX status, full transition history. Content-addressed `EntryId` (`triage:sha256:…`). Computed `RequiresReview` and `HasOverride` properties. |
| `LatticeTransitionRecord` | Immutable log entry per state change: from/to state, confidence before/after, trigger, reason, actor, evidence digests, timestamp. Computed `IsManualOverride`. |
| `LatticeTransitionTrigger` | Enum: `StaticAnalysis`, `RuntimeObservation`, `ManualOverride`, `SystemReset`, `AutomatedRule`. Serialised as a string via `JsonStringEnumConverter`. |
| `LatticeOverrideRequest` | Operator request to force a target state with reason, actor, and evidence digests. |
| `LatticeOverrideResult` | Outcome of an override: applied flag, updated entry, transition, optional warning. |
| `LatticeTriageQuery` | Filters: `State?`, `RequiresReview?`, `ComponentPurlPrefix?`, `Cve?`, `Limit` (default 100), `Offset`. |

### Service Interface (`ILatticeTriageService`)

| Method | Description |
|--------|-------------|
| `GetOrCreateEntryAsync(purl, cve)` | Returns the existing entry or creates one in the `Unknown` state. |
| `ApplyEvidenceAsync(purl, cve, evidenceType, digests, actor, reason)` | Delegates to `ReachabilityLattice.ApplyEvidence` and records the transition. |
| `OverrideStateAsync(request)` | Forces the target state via a Reset + ForceState sequence. Warns when overriding `Confirmed*` states. |
| `ListAsync(query)` | Filters and pages entries, ordered by `UpdatedAt` descending. |
| `GetHistoryAsync(purl, cve)` | Returns the full transition log for an entry. |
| `ResetAsync(purl, cve, actor, reason)` | Resets the entry to `Unknown` and records a `SystemReset` transition. |

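The `ListAsync` behaviour can be sketched in Python against the `LatticeTriageQuery` filters listed under Models. This is an assumption-laden illustration: the snake_case names are hypothetical, state and CVE are matched exactly, and the PURL filter is treated as a plain string prefix, as its name suggests.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class TriageEntry:
    """Hypothetical, trimmed-down stand-in for LatticeTriageEntry."""
    component_purl: str
    cve: str
    state: str
    requires_review: bool
    updated_at: datetime


@dataclass
class TriageQuery:
    """Mirrors the LatticeTriageQuery filters; None means 'do not filter'."""
    state: Optional[str] = None
    requires_review: Optional[bool] = None
    component_purl_prefix: Optional[str] = None
    cve: Optional[str] = None
    limit: int = 100
    offset: int = 0


def list_entries(entries: List[TriageEntry], q: TriageQuery) -> List[TriageEntry]:
    """Apply the optional filters, order by updated_at descending, then page."""
    matches = [
        e for e in entries
        if (q.state is None or e.state == q.state)
        and (q.requires_review is None or e.requires_review == q.requires_review)
        and (q.component_purl_prefix is None or e.component_purl.startswith(q.component_purl_prefix))
        and (q.cve is None or e.cve == q.cve)
    ]
    matches.sort(key=lambda e: e.updated_at, reverse=True)  # stable sort keeps input order on ties
    return matches[q.offset : q.offset + q.limit]
```

Filtering before sorting and paging keeps `Offset`/`Limit` meaningful: pages are taken from the filtered, newest-first sequence rather than from the raw store.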
### VEX Status Mapping

| Lattice State | VEX Status |
|---------------|------------|
| `Unknown`, `StaticReachable`, `Contested` | `under_investigation` |
| `StaticUnreachable`, `RuntimeUnobserved`, `ConfirmedUnreachable` | `not_affected` |
| `RuntimeObserved`, `ConfirmedReachable` | `affected` |

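The mapping above is a total function from the eight lattice states to three VEX statuses. A small Python sketch (the `LatticeState` and `vex_status` names are hypothetical) makes that totality explicit by failing at import time if a state has no mapping.

```python
from enum import Enum


class LatticeState(Enum):
    """The eight lattice states named in the mapping table above."""
    UNKNOWN = "Unknown"
    STATIC_REACHABLE = "StaticReachable"
    STATIC_UNREACHABLE = "StaticUnreachable"
    RUNTIME_OBSERVED = "RuntimeObserved"
    RUNTIME_UNOBSERVED = "RuntimeUnobserved"
    CONFIRMED_REACHABLE = "ConfirmedReachable"
    CONFIRMED_UNREACHABLE = "ConfirmedUnreachable"
    CONTESTED = "Contested"


VEX_STATUS = {
    LatticeState.UNKNOWN: "under_investigation",
    LatticeState.STATIC_REACHABLE: "under_investigation",
    LatticeState.CONTESTED: "under_investigation",
    LatticeState.STATIC_UNREACHABLE: "not_affected",
    LatticeState.RUNTIME_UNOBSERVED: "not_affected",
    LatticeState.CONFIRMED_UNREACHABLE: "not_affected",
    LatticeState.RUNTIME_OBSERVED: "affected",
    LatticeState.CONFIRMED_REACHABLE: "affected",
}

# Fail fast if a state is ever added without a VEX mapping.
_missing = set(LatticeState) - set(VEX_STATUS)
if _missing:
    raise RuntimeError(f"lattice states without a VEX status: {_missing}")


def vex_status(state: LatticeState) -> str:
    """Look up the VEX status reported for a lattice state."""
    return VEX_STATUS[state]
```

Note that `StaticReachable` alone maps to `under_investigation` rather than `affected`; only runtime observation or confirmation moves an entry to `affected`.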
### Manual Override Behaviour

When an operator overrides state, the service:

1. Resets the lattice to `Unknown`.
2. Applies the minimal evidence sequence to reach the target state (e.g., `ConfirmedReachable` = `StaticReachable` + `RuntimeObserved`).
3. Sets confidence to the midpoint of the target state's confidence range.
4. Returns a **warning** when overriding from `ConfirmedReachable` or `ConfirmedUnreachable`, since these are high-certainty states.

### DI Registration

`AddReachabilityCore()` registers `ILatticeTriageService → LatticeTriageService` (singleton, via `TryAddSingleton`).

### Observability (OTel Metrics)

Meter: `StellaOps.Reachability.Core.Triage`

| Metric | Type | Description |
|--------|------|-------------|
| `reachability.triage.entries_created` | Counter | Entries created |
| `reachability.triage.evidence_applied` | Counter | Evidence applications |
| `reachability.triage.overrides_applied` | Counter | Manual overrides |
| `reachability.triage.resets` | Counter | Lattice resets |
| `reachability.triage.contested` | Counter | Contested state transitions |

### Test Coverage

22 tests in `StellaOps.Reachability.Core.Tests/LatticeTriageServiceTests.cs`:

- Entry creation (new, idempotent, distinct keys)
- Evidence application (static→reachable, confirmed paths, conflicting→contested, digest recording)
- Override (target state, warnings on confirmed, HasOverride flag)
- Listing with filters (state, review, PURL prefix)
- History retrieval
- Reset transitions
- VEX mapping (theory test)
- Edge-case validation (null PURL, empty reason)

---

_Last updated: 2026-02-08_