# Reachability Drift Detection - Architecture
**Module:** Scanner

**Version:** 1.0

**Status:** Implemented (Sprint 3600.2-3600.3)

**Last Updated:** 2025-12-22

---
## 1. Overview
Reachability Drift Detection tracks function-level reachability changes between scans to identify when code modifications create new paths to vulnerable sinks or mitigate existing risks. This enables security teams to:
- **Detect regressions** when previously unreachable vulnerabilities become exploitable
- **Validate fixes** by confirming vulnerable code paths are removed
- **Prioritize triage** based on actual exploitability rather than theoretical risk
- **Automate VEX** by generating evidence-backed justifications
---
## 2. Key Concepts
### 2.1 Call Graph
A directed graph representing function/method call relationships in source code:
- **Nodes**: Functions, methods, lambdas with metadata (file, line, visibility)
- **Edges**: Call relationships with call kind (direct, virtual, delegate, reflection, dynamic)
- **Entrypoints**: Public-facing functions (HTTP handlers, CLI commands, message consumers)
- **Sinks**: Security-sensitive APIs (command execution, SQL, file I/O, deserialization)
### 2.2 Reachability Analysis
Multi-source BFS traversal from entrypoints determines which sinks are reachable and therefore potentially exploitable:
```
Entrypoints (HTTP handlers, CLI)
        │
        ▼ BFS traversal
[Application Code]
        │
        ▼
Sinks (exec, query, writeFile)
        │
        ▼
Reachable = TRUE if path exists
```
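As an illustration only, the traversal can be sketched in Python (the production analyzer is the C# `ReachabilityAnalyzer`). The adjacency-dict graph shape, the function name `analyze_reachability`, and the node IDs are assumptions made for this example. The sketch seeds the queue with every entrypoint at once and records a parent pointer per node, so each reachable sink comes with one shortest path.

```python
from collections import deque

def analyze_reachability(edges, entrypoints, sinks):
    """Multi-source BFS: return {sink: shortest path from some entrypoint}."""
    # parent[node] is the node it was first reached from; entrypoints map to None.
    parent = {ep: None for ep in entrypoints}
    queue = deque(entrypoints)
    while queue:
        node = queue.popleft()
        for callee in edges.get(node, ()):
            if callee not in parent:  # the first visit in BFS is via a shortest path
                parent[callee] = node
                queue.append(callee)

    paths = {}
    for sink in sinks:
        if sink in parent:  # reachable from at least one entrypoint
            path, cur = [], sink
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            paths[sink] = path[::-1]
    return paths

# Hypothetical graph: the HTTP handler reaches exec(); writeFile() has no caller
# that is itself reachable from an entrypoint.
edges = {
    "http:GET /run": ["app.run_job"],
    "cli:main": ["app.print_help"],
    "app.run_job": ["os.exec"],
    "app.legacy_dump": ["fs.writeFile"],
}
reachable = analyze_reachability(
    edges,
    entrypoints=["http:GET /run", "cli:main"],
    sinks=["os.exec", "fs.writeFile"],
)
# reachable == {"os.exec": ["http:GET /run", "app.run_job", "os.exec"]}
```

Because every entrypoint starts at distance zero, a single pass covers all sources; there is no need to run one BFS per entrypoint.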
### 2.3 Drift Detection
Compares reachability between two scans (base vs head):
| Transition | Direction | Risk Impact |
|------------|-----------|-------------|
| Unreachable → Reachable | `became_reachable` | **Increased** - New exploit path |
| Reachable → Unreachable | `became_unreachable` | **Decreased** - Mitigation applied |
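A minimal Python sketch of the comparison, assuming each scan has already been reduced to a set of reachable sink IDs. The function name `classify_drift` and the sink IDs are hypothetical; only the two direction values come from the table above.

```python
def classify_drift(base_reachable, head_reachable):
    """Return {sink_id: direction} for sinks whose reachability changed."""
    base, head = set(base_reachable), set(head_reachable)
    drift = {sink: "became_reachable" for sink in head - base}
    drift.update({sink: "became_unreachable" for sink in base - head})
    return drift

drift = classify_drift(
    base_reachable={"os.exec", "db.query"},
    head_reachable={"db.query", "fs.writeFile"},
)
# db.query is reachable in both scans, so it is not reported as drift.
```

In this sketch a sink that no longer exists in the head scan also shows up as `became_unreachable`, since it is simply absent from the head set.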
### 2.4 Cause Attribution
Explains *why* drift occurred by correlating with code changes:
| Cause Kind | Description | Example |
|------------|-------------|---------|
| `guard_removed` | Conditional check removed | `if (!authorized)` deleted |
| `guard_added` | New conditional blocks path | Added null check |
| `new_public_route` | New entrypoint created | Added `/api/admin` endpoint |
| `visibility_escalated` | Internal → Public | Method made public |
| `dependency_upgraded` | Library update changed behavior | lodash 3.x → 4.x |
| `symbol_removed` | Function deleted | Removed vulnerable helper |
| `unknown` | Cannot determine | Multiple simultaneous changes |
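One way such a correlation could work, sketched in Python under stated assumptions: code changes are dicts with `symbol` and `kind` keys, change kinds reuse the cause-kind identifiers from the table, and the grouping of causes per direction is invented for this example. None of this is taken from `DriftCauseExplainer` itself.

```python
# Cause kinds from the table that are plausible for each drift direction.
# This grouping is an assumption made for illustration.
PLAUSIBLE_CAUSES = {
    "became_reachable": {"guard_removed", "new_public_route",
                         "visibility_escalated", "dependency_upgraded"},
    "became_unreachable": {"guard_added", "symbol_removed",
                           "dependency_upgraded"},
}

def explain_cause(direction, path_symbols, code_changes):
    """Pick a cause kind from changes that touch symbols on the drifted path."""
    on_path = set(path_symbols)
    kinds = {
        change["kind"]
        for change in code_changes
        if change["symbol"] in on_path
        and change["kind"] in PLAUSIBLE_CAUSES[direction]
    }
    # Exactly one candidate explains the drift; zero or several is ambiguous.
    return kinds.pop() if len(kinds) == 1 else "unknown"

path = ["http:GET /run", "app.run_job", "os.exec"]
changes = [
    {"symbol": "app.run_job", "kind": "guard_removed"},
    {"symbol": "app.unrelated", "kind": "visibility_escalated"},  # not on the path
]
cause = explain_cause("became_reachable", path, changes)

# A second relevant change on the same path makes the attribution ambiguous.
ambiguous = explain_cause(
    "became_reachable",
    path,
    changes + [{"symbol": "http:GET /run", "kind": "new_public_route"}],
)
```

Returning `unknown` when several candidate changes sit on the same path matches the "Multiple simultaneous changes" example in the table.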
---
## 3. Data Flow
```mermaid
flowchart TD
    subgraph Scan["Scan Execution"]
        A[Source Code] --> B[Call Graph Extractor]
        B --> C[CallGraphSnapshot]
    end
    subgraph Analysis["Drift Analysis"]
        C --> D[Reachability Analyzer]
        D --> E[ReachabilityResult]
        F[Base Scan Graph] --> G[Drift Detector]
        E --> G
        H[Code Changes] --> G
        G --> I[ReachabilityDriftResult]
    end
    subgraph Output["Output"]
        I --> J[Path Compressor]
        J --> K[Compressed Paths]
        I --> L[Cause Explainer]
        L --> M[Drift Causes]
        K --> N[Storage/API]
        M --> N
    end
    subgraph Integration["Integration"]
        N --> O[Policy Gates]
        N --> P[VEX Emission]
        N --> Q[Web UI]
    end
```
---
## 4. Component Architecture
### 4.1 Call Graph Extractors
Per-language static analysis (AST, bytecode, or SSA, depending on the language) producing a `CallGraphSnapshot`:
| Language | Extractor | Technology | Status |
|----------|-----------|------------|--------|
| .NET | `DotNetCallGraphExtractor` | Roslyn semantic model | **Done** |
| Java | `JavaCallGraphExtractor` | ASM bytecode analysis | **Done** |
| Go | `GoCallGraphExtractor` | golang.org/x/tools SSA | **Done** |
| Python | `PythonCallGraphExtractor` | Python AST | **Done** |
| Node.js | `NodeCallGraphExtractor` | Babel (planned) | Skeleton |
| PHP | `PhpCallGraphExtractor` | php-parser | **Done** |
| Ruby | `RubyCallGraphExtractor` | parser gem | **Done** |

**Location:** `src/Scanner/__Libraries/StellaOps.Scanner.CallGraph/Extraction/`
### 4.2 Reachability Analyzer
Multi-source BFS from entrypoints to sinks:
```csharp
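using System.Collections.Immutable;

// Signatures only; method bodies are omitted.
public sealed class ReachabilityAnalyzer
{
    public ReachabilityResult Analyze(CallGraphSnapshot graph);
}

public record ReachabilityResult
{
    // IDs of all nodes reachable from at least one entrypoint.
    ImmutableHashSet<string> ReachableNodes { get; }
    // IDs of the sinks that are reachable.
    ImmutableArray<string> ReachableSinks { get; }
    // One shortest path per reachable sink (key assumed to be the sink node ID).
    ImmutableDictionary<string, ImmutableArray<string>> ShortestPaths { get; }
}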
```
**Location:** `src/Scanner/__Libraries/StellaOps.Scanner.CallGraph/Analysis/`
### 4.3 Drift Detector
Compares base and head graphs:
```csharp
public sealed class ReachabilityDriftDetector
{
    public ReachabilityDriftResult Detect(
        CallGraphSnapshot baseGraph,
        CallGraphSnapshot headGraph,
        IReadOnlyList<CodeChangeFact> codeChanges);
}
```
**Location:** `src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/Services/`
### 4.4 Path Compressor
Reduces full paths to key nodes for storage/display:
```
Full Path (20 nodes):
  entrypoint → A → B → C → ... → X → Y → sink

Compressed Path:
  entrypoint → [changed: B] → [changed: X] → sink
  (intermediateCount: 17)
```
**Location:** `src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/Services/PathCompressor.cs`
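A small Python sketch of the idea, not the `PathCompressor` implementation: keep the entrypoint, the sink, and any intermediate node that appears in a set of changed symbols. The output field names and the meaning of `intermediateCount` (here: the number of intermediate nodes left out) are assumptions for this example.

```python
def compress_path(path, changed_symbols):
    """Keep the entrypoint, the sink, and any changed nodes in between."""
    entrypoint, sink = path[0], path[-1]
    intermediates = path[1:-1]
    key_nodes = [node for node in intermediates if node in changed_symbols]
    return {
        "entrypoint": entrypoint,
        "keyNodes": key_nodes,
        "sink": sink,
        # Assumption: the count covers the intermediates that were dropped.
        "intermediateCount": len(intermediates) - len(key_nodes),
    }

full_path = ["entrypoint", "A", "B", "C", "D", "sink"]
compressed = compress_path(full_path, changed_symbols={"B", "D"})
# compressed["keyNodes"] == ["B", "D"]; A and C are the two dropped nodes.
```

Key nodes keep their original order, so the compressed form still reads as a path from entrypoint to sink.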
### 4.5 Cause Explainer
Correlates drift with code changes:
```csharp
public sealed class DriftCauseExplainer
{
    public DriftCause Explain(...);
    public DriftCause ExplainUnreachable(...);
}
```
**Location:** `src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/Services/DriftCauseExplainer.cs`

---
## 5. Language Support Matrix
| Feature | .NET | Java | Go | Python | Node.js | PHP | Ruby |
|---------|------|------|-------|--------|---------|-----|------|
| Function extraction | Yes | Yes | Yes | Yes | Partial | Yes | Yes |
| Call edge extraction | Yes | Yes | Yes | Yes | Partial | Yes | Yes |
| HTTP entrypoints | ASP.NET | Spring | net/http | Flask/Django | Express* | Laravel | Rails |
| gRPC entrypoints | Yes | Yes | Yes | Yes | No | No | No |
| CLI entrypoints | Yes | Yes | Yes | Yes | Partial | Yes | Yes |
| Sink detection | Yes | Yes | Yes | Yes | Partial | Yes | Yes |

\*Requires Sprint 3600.4 completion

---
## 6. Storage Schema
### 6.1 PostgreSQL Tables
**call_graph_snapshots:**
```sql
CREATE TABLE call_graph_snapshots (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL,
    scan_id TEXT NOT NULL,
    language TEXT NOT NULL,
    graph_digest TEXT NOT NULL,
    node_count INT NOT NULL,
    edge_count INT NOT NULL,
    entrypoint_count INT NOT NULL,
    sink_count INT NOT NULL,
    extracted_at TIMESTAMPTZ NOT NULL,
    snapshot_json JSONB NOT NULL
);
```
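How `graph_digest` is computed is not specified here. One common approach, sketched in Python as an assumption rather than a description of the scanner, is to hash a canonical JSON encoding so that logically identical snapshots always get the same digest. The `sha256:` prefix and the snapshot shape below are illustrative.

```python
import hashlib
import json

def graph_digest(snapshot):
    """SHA-256 over a canonical JSON encoding of the snapshot (assumed scheme)."""
    # Sorted keys and fixed separators make the encoding independent of dict order.
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"nodes": ["f", "g"], "edges": [["f", "g"]]}
b = {"edges": [["f", "g"]], "nodes": ["f", "g"]}  # same content, different key order
same = graph_digest(a) == graph_digest(b)  # True
```

A stable digest of this kind makes it cheap to tell whether two scans produced the same graph without comparing `snapshot_json` field by field.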
**reachability_drift_results:**
```sql
CREATE TABLE reachability_drift_results (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL,
    base_scan_id TEXT NOT NULL,
    head_scan_id TEXT NOT NULL,
    language TEXT NOT NULL,
    newly_reachable_count INT NOT NULL,
    newly_unreachable_count INT NOT NULL,
    detected_at TIMESTAMPTZ NOT NULL,
    result_digest TEXT NOT NULL
);
```
**drifted_sinks:**
```sql
CREATE TABLE drifted_sinks (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL,
    drift_result_id UUID NOT NULL REFERENCES reachability_drift_results(id),
    sink_node_id TEXT NOT NULL,
    symbol TEXT NOT NULL,
    sink_category TEXT NOT NULL,
    direction TEXT NOT NULL,
    cause_kind TEXT NOT NULL,
    cause_description TEXT NOT NULL,
    compressed_path JSONB NOT NULL,
    associated_vulns JSONB
);
```
**code_changes:**
```sql
CREATE TABLE code_changes (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL,
    scan_id TEXT NOT NULL,
    base_scan_id TEXT NOT NULL,
    language TEXT NOT NULL,
    file TEXT NOT NULL,
    symbol TEXT NOT NULL,
    change_kind TEXT NOT NULL,
    details JSONB,
    detected_at TIMESTAMPTZ NOT NULL
);
```
### 6.2 Valkey Caching
```
stella:callgraph:{scan_id}:{lang}:{digest} → Compressed CallGraphSnapshot
stella:callgraph:{scan_id}:{lang}:reachable → Set of reachable sink IDs
stella:callgraph:{scan_id}:{lang}:paths:{sink} → Shortest path to sink
```
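The key layout above lends itself to a cache-aside lookup. The Python sketch below is illustrative only: `InMemoryCache` is a stand-in so the example runs without a Valkey server, its `get`/`setex` methods merely mirror the usual client method names, and storing the sink IDs as a JSON list (rather than a native set) is a simplification.

```python
import json

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # 24 hours

class InMemoryCache:
    """Stand-in for a Valkey client so the sketch runs without a server."""
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def setex(self, key, ttl_seconds, value):
        self.store[key] = value  # the TTL is ignored by this stand-in

def reachable_key(scan_id, lang):
    return f"stella:callgraph:{scan_id}:{lang}:reachable"

def get_reachable_sinks(cache, scan_id, lang, compute):
    """Return cached sink IDs, or compute them once and store the result."""
    key = reachable_key(scan_id, lang)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    sinks = sorted(compute())
    cache.setex(key, DEFAULT_TTL_SECONDS, json.dumps(sinks))
    return sinks

calls = []

def compute():
    calls.append(1)  # count how often the expensive path runs
    return {"os.exec", "db.query"}

cache = InMemoryCache()
first = get_reachable_sinks(cache, "scan-1", "dotnet", compute)
second = get_reachable_sinks(cache, "scan-1", "dotnet", compute)
# The second call is served from the cache; compute() runs only once.
```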
TTL: Configurable (default 24h)

Circuit breaker: 5 failures → 30s timeout

---
## 7. API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| GET | `/scans/{scanId}/drift` | Get drift results for a scan |
| GET | `/drift/{driftId}/sinks` | List drifted sinks (paginated) |
| POST | `/scans/{scanId}/compute-reachability` | Trigger reachability computation |
| GET | `/scans/{scanId}/reachability/components` | List components with reachability |
| GET | `/scans/{scanId}/reachability/findings` | Get reachable vulnerable sinks |
| GET | `/scans/{scanId}/reachability/explain` | Explain why a sink is reachable |

See: `docs/api/scanner-drift-api.md`

---
## 8. Integration Points
### 8.1 Policy Module
Drift results feed into policy gates for CI/CD blocking:
```yaml
smart_diff:
  gates:
    - condition: "delta_reachable > 0 AND is_kev = true"
      action: block
```
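To make the gate's effect concrete, here is a hand-translated Python version of that single condition. It does not parse the policy expression language; the `facts` dict, the function name `gate_action`, and the `allow` fallback are assumptions for illustration.

```python
def gate_action(facts):
    """Hand-translated form of: delta_reachable > 0 AND is_kev = true."""
    blocked = facts["delta_reachable"] > 0 and facts["is_kev"] is True
    return "block" if blocked else "allow"  # "allow" is an assumed default

kev_regression = gate_action({"delta_reachable": 2, "is_kev": True})
non_kev_regression = gate_action({"delta_reachable": 2, "is_kev": False})
no_drift = gate_action({"delta_reachable": 0, "is_kev": True})
```

Only the first case blocks: both halves of the condition must hold, so a newly reachable sink with no KEV-listed vulnerability passes this particular gate.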
### 8.2 VEX Emission
Automatic VEX candidate generation on drift:
| Drift Direction | VEX Status | Justification |
|-----------------|------------|---------------|
| became_unreachable | `not_affected` | `vulnerable_code_not_in_execute_path` |
| became_reachable | — | Requires manual review |
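The mapping in the table can be sketched as a small Python function. The status and justification values come from the table; the function name and the shape of the returned dict are hypothetical.

```python
def vex_candidate(direction, vuln_id):
    """Return a VEX candidate for a drifted sink, or None if review is needed."""
    if direction == "became_unreachable":
        return {
            "vulnerability": vuln_id,
            "status": "not_affected",
            "justification": "vulnerable_code_not_in_execute_path",
        }
    # became_reachable: no automatic status; a human has to review it.
    return None

candidate = vex_candidate("became_unreachable", "CVE-0000-0000")  # placeholder ID
needs_review = vex_candidate("became_reachable", "CVE-0000-0000")
```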
### 8.3 Attestation
DSSE-signed drift attestations:
```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "predicateType": "stellaops.dev/predicates/reachability-drift@v1",
  "predicate": {
    "baseScanId": "abc123",
    "headScanId": "def456",
    "newlyReachable": [...],
    "newlyUnreachable": [...],
    "resultDigest": "sha256:..."
  }
}
```
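DSSE does not sign the JSON payload directly; it signs a Pre-Authentication Encoding (PAE) of the payload type and payload bytes. The Python sketch below shows that encoding for a trimmed statement. The statement content is abbreviated for illustration, key handling and the signature step are omitted, and nothing here describes how the scanner itself serializes the payload.

```python
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"

def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE Pre-Authentication Encoding: the bytes that actually get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"), type_bytes,
        str(len(payload)).encode("ascii"), payload,
    ])

# Trimmed statement; the scan IDs are the placeholder values from the example above.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "predicateType": "stellaops.dev/predicates/reachability-drift@v1",
    "predicate": {"baseScanId": "abc123", "headScanId": "def456"},
}
payload = json.dumps(statement, sort_keys=True).encode("utf-8")
to_sign = pae(PAYLOAD_TYPE, payload)  # hand this to the signer, not `payload`
```

Length-prefixing both fields means a verifier can split the signed bytes unambiguously, and the payload is authenticated together with its declared type.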
---
## 9. Performance Characteristics
| Metric | Target | Notes |
|--------|--------|-------|
| Graph extraction (100K LOC) | < 60s | Per language |
| Reachability analysis | < 5s | BFS traversal |
| Drift detection | < 10s | Graph comparison |
| Memory usage | < 2GB | Large projects |
| Cache hit improvement | 10x | Valkey lookup vs recompute |
---
## 10. References
- **Implementation Sprints:**
  - `docs/implplan/SPRINT_3600_0002_0001_call_graph_infrastructure.md`
  - `docs/implplan/SPRINT_3600_0003_0001_drift_detection_engine.md`
- **API Reference:** `docs/api/scanner-drift-api.md`
- **Operations Guide:** `docs/operations/reachability-drift-guide.md`
- **Original Advisory:** `docs/product-advisories/archived/17-Dec-2025 - Reachability Drift Detection.md`
- **Source Code:** `src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/`