feat: Implement distro-native version comparison for RPM, Debian, and Alpine packages

- Add RpmVersionComparer for RPM version comparison with epoch, version, and release handling.
- Introduce DebianVersion for parsing Debian EVR (Epoch:Version-Release) strings.
- Create ApkVersion for parsing Alpine APK version strings with suffix support.
- Define IVersionComparator interface for version comparison with proof-line generation.
- Implement VersionComparisonResult struct to encapsulate comparison results and proof lines.
- Add tests for Debian and RPM version comparers to ensure correct functionality and edge case handling.
- Create project files for the version comparison library and its tests.
519
docs/operations/reachability-drift-guide.md
Normal file
@@ -0,0 +1,519 @@
# Reachability Drift Detection - Operations Guide

**Module:** Scanner
**Version:** 1.0
**Last Updated:** 2025-12-22

---

## 1. Prerequisites

### 1.1 Infrastructure Requirements

| Component | Minimum | Recommended | Notes |
|-----------|---------|-------------|-------|
| CPU | 4 cores | 8 cores | For call graph extraction |
| Memory | 4 GB | 8 GB | Large projects need more |
| PostgreSQL | 16+ | 16+ | With RLS enabled |
| Valkey/Redis | 7.0+ | 7.0+ | For caching (optional) |
| .NET Runtime | 10.0 | 10.0 | Preview features enabled |

### 1.2 Network Requirements

| Direction | Endpoints | Notes |
|-----------|-----------|-------|
| Inbound | Scanner API (8080) | Load balancer health checks |
| Outbound | PostgreSQL (5432) | Database connections |
| Outbound | Valkey (6379) | Cache connections (optional) |
| Outbound | Signer service | For DSSE attestations |

### 1.3 Dependencies

- Scanner WebService deployed and healthy
- PostgreSQL database with Scanner schema migrations applied
- (Optional) Valkey cluster for caching
- (Optional) Signer service for attestation signing

---

## 2. Configuration

### 2.1 Scanner Service Configuration

**File:** `etc/scanner.yaml`

```yaml
scanner:
  reachability:
    # Enable reachability drift detection
    enabled: true

    # Languages to analyze (empty = all supported)
    languages:
      - dotnet
      - java
      - node
      - python
      - go

    # Call graph extraction options
    extraction:
      max_depth: 100
      max_nodes: 100000
      timeout_seconds: 300
      include_test_code: false
      include_vendored: false

    # Drift detection options
    drift:
      # Auto-compute on scan completion
      auto_compute: true
      # Base scan selection (previous, tagged, specific)
      base_selection: previous
      # Emit VEX candidates for unreachable sinks
      emit_vex_candidates: true

  storage:
    postgres:
      connection_string: "Host=localhost;Database=stellaops;Username=scanner;Password=${SCANNER_DB_PASSWORD}"
      schema: scanner
      pool_size: 20

  cache:
    valkey:
      enabled: true
      connection: "localhost:6379"
      bucket: "stella-callgraph"
      ttl_hours: 24
      circuit_breaker:
        failure_threshold: 5
        timeout_seconds: 30
```

### 2.2 Valkey Cache Configuration

```yaml
# Valkey-specific settings
cache:
  valkey:
    enabled: true
    connection: "valkey-cluster.internal:6379"
    bucket: "stella-callgraph"
    ttl_hours: 24

    # Circuit breaker prevents cache storms
    circuit_breaker:
      failure_threshold: 5
      timeout_seconds: 30
      half_open_max_attempts: 3

    # Compression reduces memory usage
    compression:
      enabled: true
      algorithm: gzip
      level: fastest
```
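
The circuit breaker settings above follow the usual closed / open / half-open pattern. The sketch below is a minimal illustration of how the three parameters could interact, not the Scanner's actual implementation: the `CircuitBreaker` class, its method names, and the reading of `half_open_max_attempts` as "number of probe requests allowed while half-open" are all assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker mirroring the config keys shown above."""

    def __init__(self, failure_threshold=5, timeout_seconds=30,
                 half_open_max_attempts=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout_seconds = timeout_seconds
        self.half_open_max_attempts = half_open_max_attempts
        self.clock = clock  # injectable so the behaviour can be tested
        self.state = "closed"
        self.failures = 0
        self.opened_at = None
        self.half_open_attempts = 0

    def allow_request(self):
        """Return True if a cache call may be attempted right now."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.timeout_seconds:
                return False  # still cooling down: skip the cache
            # Cool-down elapsed: let a limited number of probes through.
            self.state = "half_open"
            self.half_open_attempts = 0
        if self.state == "half_open":
            if self.half_open_attempts >= self.half_open_max_attempts:
                return False
            self.half_open_attempts += 1
        return True

    def record_success(self):
        """A successful call closes the breaker and clears the failure count."""
        self.state = "closed"
        self.failures = 0

    def record_failure(self):
        """Count a failure; trip on threshold or on any failed probe."""
        if self.state == "half_open":
            self._trip()
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()

    def _trip(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

While the breaker is open, callers would bypass Valkey and fall back to the database, which is why the cache is described as optional elsewhere in this guide.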

### 2.3 Policy Gate Configuration

**File:** `etc/policy.yaml`

```yaml
smart_diff:
  gates:
    # Block on KEV becoming reachable
    - id: drift_block_kev
      condition: "delta_reachable > 0 AND is_kev = true"
      action: block
      severity: critical
      message: "Known Exploited Vulnerability now reachable"

    # Block on high-severity sink becoming reachable
    - id: drift_block_critical
      condition: "delta_reachable > 0 AND max_cvss >= 9.0"
      action: block
      severity: critical
      message: "Critical vulnerability now reachable"

    # Warn on any new reachable paths
    - id: drift_warn_new_paths
      condition: "delta_reachable > 0"
      action: warn
      severity: medium
      message: "New reachable paths detected"

    # Auto-allow mitigated paths
    - id: drift_allow_mitigated
      condition: "delta_unreachable > 0 AND delta_reachable = 0"
      action: allow
      auto_approve: true
```
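
To see how these gates interact, it can help to model them outside the policy engine. The sketch below restates the four conditions as Python predicates and assumes gates are evaluated top to bottom with the first match winning; that ordering rule, the `DriftFacts` and `Gate` types, and the `evaluate` function are illustrative assumptions, not the policy engine's documented behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DriftFacts:
    """Inputs referenced by the gate conditions above."""
    delta_reachable: int
    delta_unreachable: int
    is_kev: bool
    max_cvss: float


@dataclass(frozen=True)
class Gate:
    id: str
    action: str
    condition: Callable[[DriftFacts], bool]


# Same order and conditions as the YAML, hand-translated to predicates.
GATES = [
    Gate("drift_block_kev", "block",
         lambda f: f.delta_reachable > 0 and f.is_kev),
    Gate("drift_block_critical", "block",
         lambda f: f.delta_reachable > 0 and f.max_cvss >= 9.0),
    Gate("drift_warn_new_paths", "warn",
         lambda f: f.delta_reachable > 0),
    Gate("drift_allow_mitigated", "allow",
         lambda f: f.delta_unreachable > 0 and f.delta_reachable == 0),
]


def evaluate(facts: DriftFacts, gates=GATES) -> Optional[Gate]:
    """Return the first gate whose condition holds (assumed first-match-wins)."""
    for gate in gates:
        if gate.condition(facts):
            return gate
    return None
```

Under this assumption, a scan with one newly reachable KEV sink hits `drift_block_kev` before `drift_warn_new_paths`, so the more specific gates need to stay above the general warning.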

---

## 3. Deployment Modes

### 3.1 Standalone Deployment

```bash
# Run Scanner WebService with drift detection
docker run -d \
  --name scanner \
  -p 8080:8080 \
  -e SCANNER_DB_PASSWORD=secret \
  -v /etc/scanner:/etc/scanner:ro \
  stellaops/scanner:latest

# Verify health
curl http://localhost:8080/health
```

### 3.2 Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scanner
  namespace: stellaops
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scanner
  template:
    metadata:
      labels:
        app: scanner
    spec:
      containers:
        - name: scanner
          image: stellaops/scanner:latest
          ports:
            - containerPort: 8080
          env:
            - name: SCANNER_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: scanner-secrets
                  key: db-password
          volumeMounts:
            - name: config
              mountPath: /etc/scanner
              readOnly: true
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: scanner-config
```

### 3.3 Air-Gapped Deployment

For air-gapped environments:

1. **Disable external lookups:**
   ```yaml
   scanner:
     reachability:
       offline_mode: true
       # No external advisory fetching
   ```

2. **Pre-load call graph caches:**
   ```bash
   # Export from connected environment
   stella cache export --type callgraph --output graphs.tar.gz

   # Import in air-gapped environment
   stella cache import --input graphs.tar.gz
   ```

3. **Use local VEX sources:**
   ```yaml
   excititor:
     sources:
       - type: local
         path: /data/vex-bundles/
   ```

---

## 4. Monitoring & Metrics

### 4.1 Key Metrics

| Metric | Type | Description | Alert Threshold |
|--------|------|-------------|-----------------|
| `scanner_callgraph_extraction_duration_seconds` | histogram | Time to extract call graph | p99 > 300s |
| `scanner_callgraph_node_count` | gauge | Nodes in extracted graph | > 100,000 |
| `scanner_reachability_analysis_duration_seconds` | histogram | BFS analysis time | p99 > 30s |
| `scanner_drift_newly_reachable_total` | counter | Count of newly reachable sinks | > 0 (alert) |
| `scanner_drift_newly_unreachable_total` | counter | Count of mitigated sinks | (info) |
| `scanner_cache_hit_ratio` | gauge | Valkey cache hit rate | < 0.5 |
| `scanner_cache_circuit_breaker_open` | gauge | Circuit breaker state | = 1 (alert) |
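
The two drift counters are easier to interpret with the underlying computation in mind. The sketch below is a simplified model, assuming the call graph is an adjacency map from caller to callees and that drift is the set difference between sinks reachable in the head scan and in the base scan. The function names, the graph representation, and the sample node names are hypothetical; only the BFS traversal and the `max_depth` limit are taken from this guide.

```python
from collections import deque


def reachable_sinks(graph, entrypoints, sinks, max_depth=100):
    """Breadth-first search from the entrypoints; return the sinks reached."""
    seen = set(entrypoints)
    queue = deque((node, 0) for node in entrypoints)
    found = set()
    while queue:
        node, depth = queue.popleft()
        if node in sinks:
            found.add(node)
        if depth >= max_depth:
            continue  # do not expand callees beyond the depth limit
        for callee in graph.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return found


def compute_drift(base_graph, head_graph, entrypoints, sinks, max_depth=100):
    """Compare sink reachability between a base scan and a head scan."""
    base = reachable_sinks(base_graph, entrypoints, sinks, max_depth)
    head = reachable_sinks(head_graph, entrypoints, sinks, max_depth)
    return {
        "newly_reachable": head - base,    # would feed *_newly_reachable_total
        "newly_unreachable": base - head,  # would feed *_newly_unreachable_total
    }


# Hypothetical graphs: the head scan adds a Main -> Render -> Exec path.
base = {"Main": ["Parse"], "Parse": [], "Legacy": ["Exec"]}
head = {"Main": ["Parse", "Render"], "Parse": [], "Render": ["Exec"]}
drift = compute_drift(base, head, entrypoints=["Main"], sinks={"Exec"})
```

In this example `Exec` exists in both graphs but is only reachable from `Main` in the head scan, so it counts as newly reachable even though the sink itself did not change.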

### 4.2 Grafana Dashboard

Import dashboard JSON from: `deploy/grafana/scanner-drift-dashboard.json`

Key panels:
- Drift detection rate over time
- Newly reachable sinks by category
- Call graph extraction latency
- Cache hit/miss ratio
- Circuit breaker state

### 4.3 Alert Rules

```yaml
# Prometheus alerting rules
groups:
  - name: scanner-drift
    rules:
      - alert: KevBecameReachable
        expr: increase(scanner_drift_kev_reachable_total[5m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "KEV vulnerability became reachable"
          description: "A Known Exploited Vulnerability is now reachable from public entrypoints"

      - alert: HighDriftRate
        # Note: rate() yields a per-second average. Use
        # increase(scanner_drift_newly_reachable_total[1h]) > 10 if the
        # intent is "more than 10 newly reachable sinks per hour".
        expr: rate(scanner_drift_newly_reachable_total[1h]) > 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High rate of new reachable vulnerabilities"

      - alert: CacheCircuitOpen
        expr: scanner_cache_circuit_breaker_open == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Valkey cache circuit breaker is open"
```

---

## 5. Troubleshooting

### 5.1 Call Graph Extraction Failures

**Symptom:** `GRAPH_NOT_EXTRACTED` error

**Causes & Solutions:**

| Cause | Solution |
|-------|----------|
| Missing SDK/runtime | Install required SDK (.NET, Node.js, JDK) |
| Build errors in project | Fix compilation errors first |
| Timeout exceeded | Increase `extraction.timeout_seconds` |
| Memory exhaustion | Increase container memory limits |
| Unsupported language | Check language support matrix |

**Debugging:**

```bash
# Check extraction logs
kubectl logs -f deployment/scanner | grep -i extraction

# Manual extraction test
stella scan callgraph \
  --project /path/to/project \
  --language dotnet \
  --verbose
```

### 5.2 Drift Detection Issues

**Symptom:** Drift is not computed, or results are incorrect

**Causes & Solutions:**

| Cause | Solution |
|-------|----------|
| No base scan available | Ensure a previous scan exists |
| Different languages | Base and head scans must use the same language |
| Graph digest unchanged | Expected behavior: no material code changes were detected |
| Cache stale | Clear the Valkey cache for the scan |
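
The "graph digest unchanged" case relies on the digest being a deterministic fingerprint of the call graph. The sketch below shows one way such a digest could be computed, assuming edges are canonicalized by deduplicating and sorting before hashing with SHA-256. The `graph_digest` function and the canonical form are assumptions for illustration; the Scanner's real digest format may differ.

```python
import hashlib
import json


def graph_digest(edges):
    """Hash a call graph so that edge order and duplicates do not matter."""
    # Canonical form: unique (caller, callee) pairs in sorted order.
    canonical = sorted({(caller, callee) for caller, callee in edges})
    payload = json.dumps(canonical, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()


base_edges = [("Main", "Parse"), ("Parse", "Read")]
reordered = [("Parse", "Read"), ("Main", "Parse"), ("Main", "Parse")]
head_edges = base_edges + [("Main", "Exec")]

same = graph_digest(base_edges) == graph_digest(reordered)      # True
changed = graph_digest(base_edges) != graph_digest(head_edges)  # True
```

With a digest of this kind, a rebuild that only reorders or re-emits the same edges produces an identical value, so no drift computation is triggered until an edge is actually added or removed.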

**Debugging:**

```bash
# Check drift computation status
curl "http://scanner:8080/api/scanner/scans/{scanId}/drift"

# Force recomputation (the body is JSON, so declare the content type)
curl -X POST \
  "http://scanner:8080/api/scanner/scans/{scanId}/compute-reachability" \
  -H "Content-Type: application/json" \
  -d '{"forceRecompute": true}'

# View graph digests
psql -c "SELECT scan_id, graph_digest FROM scanner.call_graph_snapshots ORDER BY extracted_at DESC LIMIT 10"
```
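
When recomputation has to be triggered for many scans, the forced recompute call can be scripted. The sketch below builds the same POST request with the Python standard library. The endpoint path and request body come from the command above; the `build_recompute_request` helper, the `token` parameter, and the sample scan ID are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_recompute_request(base_url, scan_id, token=None):
    """Build (but do not send) a forced reachability recompute request."""
    safe_id = urllib.parse.quote(scan_id, safe="")
    url = f"{base_url}/api/scanner/scans/{safe_id}/compute-reachability"
    headers = {"Content-Type": "application/json"}
    if token:
        # Drift endpoints require a Bearer token (see section 8.2).
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"forceRecompute": True}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# "scan-123" is a placeholder scan ID, not a real one.
request = build_recompute_request("http://scanner:8080", "scan-123", token="example-token")
```

Sending is then a matter of `urllib.request.urlopen(request, timeout=10)`; keeping construction separate from sending makes the request easy to inspect or log before it goes out.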

### 5.3 Cache Problems

**Symptom:** Slow performance, cache misses, circuit breaker open

**Solutions:**

```bash
# Check Valkey connectivity
redis-cli -h valkey.internal ping

# Check circuit breaker state
curl "http://scanner:8080/health/ready" | jq '.checks.cache'

# Clear specific scan cache
# (DEL does not expand glob patterns, so list matching keys with --scan first)
redis-cli -h valkey.internal --scan --pattern "stella-callgraph:scanId:*" \
  | xargs -n 1 redis-cli -h valkey.internal DEL

# Reset circuit breaker (restart scanner)
kubectl rollout restart deployment/scanner
```

### 5.4 Common Error Messages

| Error | Meaning | Action |
|-------|---------|--------|
| `ERR_GRAPH_TOO_LARGE` | Graph exceeds 100K nodes | Increase `max_nodes` or split project |
| `ERR_EXTRACTION_TIMEOUT` | Analysis timed out | Increase timeout or reduce scope |
| `ERR_NO_ENTRYPOINTS` | No public entrypoints found | Check framework detection |
| `ERR_BASE_SCAN_MISSING` | Base scan not found | Specify valid `baseScanId` |
| `ERR_CACHE_UNAVAILABLE` | Valkey unreachable | Check network; the circuit breaker will activate |

---

## 6. Performance Tuning

### 6.1 Call Graph Extraction

```yaml
scanner:
  reachability:
    extraction:
      # Exclude test code (reduces graph size)
      include_test_code: false

      # Exclude vendored dependencies
      include_vendored: false

      # Limit analysis depth
      max_depth: 50  # Default: 100

      # Parallel project analysis
      parallelism: 4
```

### 6.2 Caching Strategy

```yaml
cache:
  valkey:
    # Longer TTL for stable projects
    ttl_hours: 72

    # Aggressive compression for large graphs
    compression:
      level: optimal  # vs 'fastest'

    # Larger connection pool
    pool_size: 20
```
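
The trade-off between `fastest` and `optimal` can be estimated offline before changing the setting. The sketch below compresses a synthetic call graph with gzip at two levels. Mapping `fastest` to gzip level 1 and `optimal` to level 9 is an assumption for illustration, as are the `compress_graph` helper, the JSON serialization, and the synthetic node names; the Scanner's actual serialization and level mapping may differ.

```python
import gzip
import json

# Assumed mapping from the config's level names to gzip levels.
LEVELS = {"fastest": 1, "optimal": 9}


def compress_graph(graph, level):
    """Serialize a call graph to JSON and gzip it at the named level."""
    raw = json.dumps(graph, sort_keys=True).encode("utf-8")
    return gzip.compress(raw, compresslevel=LEVELS[level])


# Synthetic graph: 5,000 nodes, each calling the next one.
sample = {
    f"Namespace.Type{i}.Method": [f"Namespace.Type{i + 1}.Method"]
    for i in range(5000)
}

fast = compress_graph(sample, "fastest")
best = compress_graph(sample, "optimal")
print(f"fastest: {len(fast)} bytes, optimal: {len(best)} bytes")
```

Running the same comparison on an exported production graph gives a more realistic picture of how much memory the higher level saves relative to its extra CPU cost.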

### 6.3 Database Optimization

```sql
-- Ensure indexes exist
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_callgraph_scan_lang
  ON scanner.call_graph_snapshots(scan_id, language);

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_drift_head_scan
  ON scanner.reachability_drift_results(head_scan_id);

-- Vacuum after large imports
VACUUM ANALYZE scanner.call_graph_snapshots;
VACUUM ANALYZE scanner.reachability_drift_results;
```

---

## 7. Backup & Recovery

### 7.1 Database Backup

```bash
# Backup drift-related tables
pg_dump -h postgres.internal -U stellaops \
  -t scanner.call_graph_snapshots \
  -t scanner.reachability_results \
  -t scanner.reachability_drift_results \
  -t scanner.drifted_sinks \
  -t scanner.code_changes \
  > scanner_drift_backup.sql
```

### 7.2 Cache Recovery

```bash
# Export cache to file (if needed)
redis-cli -h valkey.internal --rdb /backup/callgraph-cache.rdb

# Cache is ephemeral - can be regenerated from database
# Recompute after cache loss:
stella scan recompute-reachability --all-pending
```

---

## 8. Security Considerations

### 8.1 Database Access

- Scanner service uses dedicated PostgreSQL user with schema-limited permissions
- Row-Level Security (RLS) enforces tenant isolation
- Connection strings use secrets management (not plaintext)

### 8.2 API Authentication

- All drift endpoints require valid Bearer token
- Scopes: `scanner:read`, `scanner:write`, `scanner:admin`
- Rate limiting prevents abuse

### 8.3 Attestation Signing

- Drift results can be DSSE-signed for audit trails
- Signing keys managed by Signer service
- Optional Rekor transparency logging

---

## 9. References

- **Architecture:** `docs/modules/scanner/reachability-drift.md`
- **API Reference:** `docs/api/scanner-drift-api.md`
- **PostgreSQL Guide:** `docs/operations/postgresql-guide.md`
- **Air-Gap Operations:** `docs/operations/airgap-operations-runbook.md`
- **Reachability Runbook:** `docs/operations/reachability-runbook.md`