# Reachability Drift Detection - Operations Guide

**Module:** Scanner
**Version:** 1.0
**Last Updated:** 2025-12-22

---

## 1. Prerequisites

### 1.1 Infrastructure Requirements

| Component | Minimum | Recommended | Notes |
|-----------|---------|-------------|-------|
| CPU | 4 cores | 8 cores | For call graph extraction |
| Memory | 4 GB | 8 GB | Large projects need more |
| PostgreSQL | 16+ | 16+ | With RLS enabled |
| Valkey/Redis | 7.0+ | 7.0+ | For caching (optional) |
| .NET Runtime | 10.0 | 10.0 | Preview features enabled |

### 1.2 Network Requirements

| Direction | Endpoints | Notes |
|-----------|-----------|-------|
| Inbound | Scanner API (8080) | Load balancer health checks |
| Outbound | PostgreSQL (5432) | Database connections |
| Outbound | Valkey (6379) | Cache connections (optional) |
| Outbound | Signer service | For DSSE attestations |

### 1.3 Dependencies

- Scanner WebService deployed and healthy
- PostgreSQL database with Scanner schema migrations applied
- (Optional) Valkey cluster for caching
- (Optional) Signer service for attestation signing

---

## 2. Configuration

### 2.1 Scanner Service Configuration

**File:** `etc/scanner.yaml`

```yaml
scanner:
  reachability:
    # Enable reachability drift detection
    enabled: true

    # Languages to analyze (empty = all supported)
    languages:
      - dotnet
      - java
      - node
      - python
      - go

    # Call graph extraction options
    extraction:
      max_depth: 100
      max_nodes: 100000
      timeout_seconds: 300
      include_test_code: false
      include_vendored: false

    # Drift detection options
    drift:
      # Auto-compute on scan completion
      auto_compute: true
      # Base scan selection (previous, tagged, specific)
      base_selection: previous
      # Emit VEX candidates for unreachable sinks
      emit_vex_candidates: true

storage:
  postgres:
    connection_string: "Host=localhost;Database=stellaops;Username=scanner;Password=${SCANNER_DB_PASSWORD}"
    schema: scanner
    pool_size: 20

cache:
  valkey:
    enabled: true
    connection: "localhost:6379"
    bucket: "stella-callgraph"
    ttl_hours: 24
    circuit_breaker:
      failure_threshold: 5
      timeout_seconds: 30
```

### 2.2 Valkey Cache Configuration

```yaml
# Valkey-specific settings
cache:
  valkey:
    enabled: true
    connection: "valkey-cluster.internal:6379"
    bucket: "stella-callgraph"
    ttl_hours: 24

    # Circuit breaker prevents cache storms
    circuit_breaker:
      failure_threshold: 5
      timeout_seconds: 30
      half_open_max_attempts: 3

    # Compression reduces memory usage
    compression:
      enabled: true
      algorithm: gzip
      level: fastest
```
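
The three `circuit_breaker` settings above correspond to the usual closed / open / half-open state machine. The following is a minimal Python sketch of that pattern under stated assumptions, not the Scanner's actual implementation: the class, its method names, and the exact transition rules are hypothetical, and only the three parameter names come from the configuration.

```python
import time


class CircuitBreaker:
    """Illustrative closed/open/half-open breaker (hypothetical, not Scanner code)."""

    def __init__(self, failure_threshold=5, timeout_seconds=30,
                 half_open_max_attempts=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout_seconds = timeout_seconds
        self.half_open_max_attempts = half_open_max_attempts
        self._clock = clock  # injectable so the timing can be tested
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0
        self._trial_calls = 0

    def allow_request(self):
        """Return True if the cache may be called right now."""
        if self.state == "open":
            if self._clock() - self._opened_at < self.timeout_seconds:
                return False  # still cooling down
            # Cool-down elapsed: let a limited number of trial calls through.
            self.state = "half_open"
            self._trial_calls = 0
        if self.state == "half_open":
            if self._trial_calls >= self.half_open_max_attempts:
                return False  # trial budget used up
            self._trial_calls += 1
        return True

    def record_success(self):
        """Assumed rule: any success closes the breaker and resets the count."""
        self.state = "closed"
        self._failures = 0

    def record_failure(self):
        """Trip after enough failures, or immediately while half-open."""
        if self.state == "half_open":
            self._trip()
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self):
        self.state = "open"
        self._failures = 0
        self._opened_at = self._clock()
```

In this sketch a caller checks `allow_request()` before each cache call and falls back to the database when it returns `False`, which is how an open breaker keeps a failing Valkey instance from being hammered with retries.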

### 2.3 Policy Gate Configuration

**File:** `etc/policy.yaml`

```yaml
smart_diff:
  gates:
    # Block on KEV becoming reachable
    - id: drift_block_kev
      condition: "delta_reachable > 0 AND is_kev = true"
      action: block
      severity: critical
      message: "Known Exploited Vulnerability now reachable"

    # Block on high-severity sink becoming reachable
    - id: drift_block_critical
      condition: "delta_reachable > 0 AND max_cvss >= 9.0"
      action: block
      severity: critical
      message: "Critical vulnerability now reachable"

    # Warn on any new reachable paths
    - id: drift_warn_new_paths
      condition: "delta_reachable > 0"
      action: warn
      severity: medium
      message: "New reachable paths detected"

    # Auto-allow mitigated paths
    - id: drift_allow_mitigated
      condition: "delta_unreachable > 0 AND delta_reachable = 0"
      action: allow
      auto_approve: true
```
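
To make the interaction between these gates concrete, here is a small Python sketch that mirrors the four conditions as predicates and evaluates them against one drift result. It is an illustration only: the `DriftSummary` and `Gate` types, the rule that the strictest matching action wins, and the default of `allow` when nothing matches are all assumptions, since the guide does not specify how the policy engine resolves multiple matches.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed precedence: the strictest matching action wins.
ACTION_RANK = {"allow": 0, "warn": 1, "block": 2}


@dataclass(frozen=True)
class DriftSummary:
    """Hypothetical container for the fields the gate conditions reference."""
    delta_reachable: int
    delta_unreachable: int
    is_kev: bool
    max_cvss: float


@dataclass(frozen=True)
class Gate:
    id: str
    action: str
    condition: Callable[[DriftSummary], bool]


# Hand-translated from the condition strings in etc/policy.yaml.
GATES = [
    Gate("drift_block_kev", "block",
         lambda d: d.delta_reachable > 0 and d.is_kev),
    Gate("drift_block_critical", "block",
         lambda d: d.delta_reachable > 0 and d.max_cvss >= 9.0),
    Gate("drift_warn_new_paths", "warn",
         lambda d: d.delta_reachable > 0),
    Gate("drift_allow_mitigated", "allow",
         lambda d: d.delta_unreachable > 0 and d.delta_reachable == 0),
]


def evaluate(drift, gates=GATES):
    """Return (action, matched gate ids); assumes 'allow' when nothing matches."""
    matched = [g for g in gates if g.condition(drift)]
    if not matched:
        return "allow", []
    strictest = max(matched, key=lambda g: ACTION_RANK[g.action])
    return strictest.action, [g.id for g in matched]
```

Under these assumptions, a result with one newly reachable KEV sink matches both `drift_block_kev` and `drift_warn_new_paths` and resolves to `block`, while a result that only removes reachable paths matches `drift_allow_mitigated` alone.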

---

## 3. Deployment Modes

### 3.1 Standalone Deployment

```bash
# Run Scanner WebService with drift detection
docker run -d \
  --name scanner \
  -p 8080:8080 \
  -e SCANNER_DB_PASSWORD=secret \
  -v /etc/scanner:/etc/scanner:ro \
  stellaops/scanner:latest

# Verify health
curl http://localhost:8080/health
```

### 3.2 Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scanner
  namespace: stellaops
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scanner
  template:
    metadata:
      labels:
        app: scanner
    spec:
      containers:
        - name: scanner
          image: stellaops/scanner:latest
          ports:
            - containerPort: 8080
          env:
            - name: SCANNER_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: scanner-secrets
                  key: db-password
          volumeMounts:
            - name: config
              mountPath: /etc/scanner
              readOnly: true
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: scanner-config
```

### 3.3 Air-Gapped Deployment

For air-gapped environments:

1. **Disable external lookups:**
   ```yaml
   scanner:
     reachability:
       offline_mode: true
       # No external advisory fetching
   ```

2. **Pre-load call graph caches:**
   ```bash
   # Export from connected environment
   stella cache export --type callgraph --output graphs.tar.gz

   # Import in air-gapped environment
   stella cache import --input graphs.tar.gz
   ```

3. **Use local VEX sources:**
   ```yaml
   excititor:
     sources:
       - type: local
         path: /data/vex-bundles/
   ```

---

## 4. Monitoring & Metrics

### 4.1 Key Metrics

| Metric | Type | Description | Alert Threshold |
|--------|------|-------------|-----------------|
| `scanner_callgraph_extraction_duration_seconds` | histogram | Time to extract call graph | p99 > 300s |
| `scanner_callgraph_node_count` | gauge | Nodes in extracted graph | > 100,000 |
| `scanner_reachability_analysis_duration_seconds` | histogram | BFS analysis time | p99 > 30s |
| `scanner_drift_newly_reachable_total` | counter | Count of newly reachable sinks | > 0 (alert) |
| `scanner_drift_newly_unreachable_total` | counter | Count of mitigated sinks | (info) |
| `scanner_cache_hit_ratio` | gauge | Valkey cache hit rate | < 0.5 |
| `scanner_cache_circuit_breaker_open` | gauge | Circuit breaker state | = 1 (alert) |
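
The two drift counters are easier to interpret with the underlying computation in mind. The sketch below shows one plausible reading in Python: a breadth-first search from the entrypoints over the base and head call graphs, followed by a set difference over the sinks. The adjacency-dictionary graph representation and the function names are assumptions for illustration, not the Scanner's internal data model.

```python
from collections import deque


def reachable(graph, entrypoints):
    """Breadth-first search: every node reachable from any entrypoint."""
    seen = set(entrypoints)
    queue = deque(entrypoints)
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def drift(base_graph, head_graph, entrypoints, sinks):
    """Compare sink reachability between a base scan and a head scan.

    `sinks` is a set of node names; graphs map caller -> list of callees.
    """
    base = reachable(base_graph, entrypoints) & sinks
    head = reachable(head_graph, entrypoints) & sinks
    return {
        "newly_reachable": sorted(head - base),    # would feed the *_newly_reachable_total counter
        "newly_unreachable": sorted(base - head),  # would feed the *_newly_unreachable_total counter
    }
```

For example, if the head scan adds a call from `Main` to a `Legacy` method that already called a vulnerable `Deserialize` sink, `Deserialize` appears under `newly_reachable` even though the sink itself did not change.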

### 4.2 Grafana Dashboard

Import dashboard JSON from: `deploy/grafana/scanner-drift-dashboard.json`

Key panels:
- Drift detection rate over time
- Newly reachable sinks by category
- Call graph extraction latency
- Cache hit/miss ratio
- Circuit breaker state

### 4.3 Alert Rules

```yaml
# Prometheus alerting rules
groups:
  - name: scanner-drift
    rules:
      - alert: KevBecameReachable
        expr: increase(scanner_drift_kev_reachable_total[5m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "KEV vulnerability became reachable"
          description: "A Known Exploited Vulnerability is now reachable from public entrypoints"

      - alert: HighDriftRate
        expr: rate(scanner_drift_newly_reachable_total[1h]) > 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High rate of new reachable vulnerabilities"

      - alert: CacheCircuitOpen
        expr: scanner_cache_circuit_breaker_open == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Valkey cache circuit breaker is open"
```

---

## 5. Troubleshooting

### 5.1 Call Graph Extraction Failures

**Symptom:** `GRAPH_NOT_EXTRACTED` error

**Causes & Solutions:**

| Cause | Solution |
|-------|----------|
| Missing SDK/runtime | Install required SDK (.NET, Node.js, JDK) |
| Build errors in project | Fix compilation errors first |
| Timeout exceeded | Increase `extraction.timeout_seconds` |
| Memory exhaustion | Increase container memory limits |
| Unsupported language | Check language support matrix |

**Debugging:**

```bash
# Check extraction logs
kubectl logs -f deployment/scanner | grep -i extraction

# Manual extraction test
stella scan callgraph \
  --project /path/to/project \
  --language dotnet \
  --verbose
```

### 5.2 Drift Detection Issues

**Symptom:** Drift is not computed, or the results are incorrect

**Causes & Solutions:**

| Cause | Solution |
|-------|----------|
| No base scan available | Ensure a previous scan exists |
| Different languages | Base and head scans must use the same language |
| Graph digest unchanged | No material code changes detected |
| Cache stale | Clear the Valkey cache for the scan |

**Debugging:**

```bash
# Check drift computation status
curl "http://scanner:8080/api/scanner/scans/{scanId}/drift"

# Force recomputation (send the body as JSON, not curl's default form encoding)
curl -X POST \
  "http://scanner:8080/api/scanner/scans/{scanId}/compute-reachability" \
  -H "Content-Type: application/json" \
  -d '{"forceRecompute": true}'

# View graph digests
psql -c "SELECT scan_id, graph_digest FROM scanner.call_graph_snapshots ORDER BY extracted_at DESC LIMIT 10"
```
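
When recomputation has to be triggered for many scans, scripting the call is less error-prone than repeating `curl`. The following Python sketch builds the same POST request using only the standard library. The endpoint path and the `forceRecompute` body come from the commands above; the helper names, the optional `token` parameter (reflecting the Bearer-token requirement in section 8.2), and the assumption that the response body is JSON are illustrative.

```python
import json
import urllib.request


def build_recompute_request(base_url, scan_id, token=None):
    """Build (but do not send) the POST that forces drift recomputation."""
    url = f"{base_url.rstrip('/')}/api/scanner/scans/{scan_id}/compute-reachability"
    body = json.dumps({"forceRecompute": True}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Section 8.2 states that drift endpoints require a Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request, timeout=30):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending means the URL and headers can be inspected or logged before anything reaches the Scanner API, for example `send(build_recompute_request("http://scanner:8080", scan_id, token))` inside a loop over scan IDs.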

### 5.3 Cache Problems

**Symptom:** Slow performance, cache misses, circuit breaker open

**Solutions:**

```bash
# Check Valkey connectivity
redis-cli -h valkey.internal ping

# Check circuit breaker state
curl -s "http://scanner:8080/health/ready" | jq '.checks.cache'

# Clear specific scan cache
# (DEL does not expand wildcards, so list matching keys with SCAN first)
redis-cli -h valkey.internal --scan --pattern "stella-callgraph:scanId:*" \
  | xargs -r -n 1 redis-cli -h valkey.internal DEL

# Reset circuit breaker (restart scanner)
kubectl rollout restart deployment/scanner
```

### 5.4 Common Error Messages

| Error | Meaning | Action |
|-------|---------|--------|
| `ERR_GRAPH_TOO_LARGE` | Graph exceeds 100K nodes | Increase `max_nodes` or split project |
| `ERR_EXTRACTION_TIMEOUT` | Analysis timed out | Increase timeout or reduce scope |
| `ERR_NO_ENTRYPOINTS` | No public entrypoints found | Check framework detection |
| `ERR_BASE_SCAN_MISSING` | Base scan not found | Specify valid `baseScanId` |
| `ERR_CACHE_UNAVAILABLE` | Valkey unreachable | Check network connectivity; the circuit breaker will open |

---

## 6. Performance Tuning

### 6.1 Call Graph Extraction

```yaml
scanner:
  reachability:
    extraction:
      # Exclude test code (reduces graph size)
      include_test_code: false

      # Exclude vendored dependencies
      include_vendored: false

      # Limit analysis depth
      max_depth: 50  # Default: 100

      # Parallel project analysis
      parallelism: 4
```

### 6.2 Caching Strategy

```yaml
cache:
  valkey:
    # Longer TTL for stable projects
    ttl_hours: 72

    # Aggressive compression for large graphs
    compression:
      level: optimal  # vs 'fastest'

    # Larger connection pool
    pool_size: 20
```
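
The trade-off behind `level: fastest` versus `level: optimal` can be checked on representative data before changing production settings. The Python sketch below gzips a synthetic call graph at two levels and lets you compare the sizes. Mapping `fastest` to zlib level 1 and `optimal` to level 9 is an assumption made for illustration, as are the helper names and the synthetic graph; the Scanner's own serialization format and level mapping may differ.

```python
import gzip
import json

# Assumed mapping from the config's level names to zlib levels (illustrative only).
LEVELS = {"fastest": 1, "optimal": 9}


def compress_graph(graph, level="fastest"):
    """Serialize a call graph to JSON and gzip it at the named level."""
    raw = json.dumps(graph, sort_keys=True).encode("utf-8")
    return gzip.compress(raw, compresslevel=LEVELS[level])


def decompress_graph(blob):
    """Inverse of compress_graph."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def sample_graph(node_count=2000):
    """Synthetic graph standing in for real extraction output."""
    return {
        f"Namespace.Class{i}.Method{i % 7}": [
            f"Namespace.Class{(i * 31 + k) % node_count}.Method{k}" for k in range(3)
        ]
        for i in range(node_count)
    }


if __name__ == "__main__":
    graph = sample_graph()
    for name in LEVELS:
        print(name, len(compress_graph(graph, name)), "bytes")
```

Higher levels spend more CPU per write to save cache memory, so they tend to suit large graphs that are written rarely and read often, which matches the longer TTL suggested for stable projects.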

### 6.3 Database Optimization

```sql
-- Ensure indexes exist
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_callgraph_scan_lang
  ON scanner.call_graph_snapshots(scan_id, language);

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_drift_head_scan
  ON scanner.reachability_drift_results(head_scan_id);

-- Vacuum after large imports
VACUUM ANALYZE scanner.call_graph_snapshots;
VACUUM ANALYZE scanner.reachability_drift_results;
```

---

## 7. Backup & Recovery

### 7.1 Database Backup

```bash
# Backup drift-related tables
pg_dump -h postgres.internal -U stellaops \
  -t scanner.call_graph_snapshots \
  -t scanner.reachability_results \
  -t scanner.reachability_drift_results \
  -t scanner.drifted_sinks \
  -t scanner.code_changes \
  > scanner_drift_backup.sql
```

### 7.2 Cache Recovery

```bash
# Export cache to file (if needed)
redis-cli -h valkey.internal --rdb /backup/callgraph-cache.rdb

# Cache is ephemeral - can be regenerated from database
# Recompute after cache loss:
stella scan recompute-reachability --all-pending
```

---

## 8. Security Considerations

### 8.1 Database Access

- Scanner service uses dedicated PostgreSQL user with schema-limited permissions
- Row-Level Security (RLS) enforces tenant isolation
- Connection strings use secrets management (not plaintext)

### 8.2 API Authentication

- All drift endpoints require valid Bearer token
- Scopes: `scanner:read`, `scanner:write`, `scanner:admin`
- Rate limiting prevents abuse

### 8.3 Attestation Signing

- Drift results can be DSSE-signed for audit trails
- Signing keys managed by Signer service
- Optional Rekor transparency logging

---

## 9. References

- **Architecture:** `docs/modules/scanner/reachability-drift.md`
- **API Reference:** `docs/api/scanner-drift-api.md`
- **PostgreSQL Guide:** `docs/operations/postgresql-guide.md`
- **Air-Gap Operations:** `docs/operations/airgap-operations-runbook.md`
- **Reachability Runbook:** `docs/operations/reachability-runbook.md`