# Reachability Drift Detection - Operations Guide

**Module:** Scanner
**Version:** 1.0
**Last Updated:** 2025-12-22

---

## 1. Prerequisites

### 1.1 Infrastructure Requirements

| Component | Minimum | Recommended | Notes |
|-----------|---------|-------------|-------|
| CPU | 4 cores | 8 cores | For call graph extraction |
| Memory | 4 GB | 8 GB | Large projects need more |
| PostgreSQL | 16+ | 16+ | With RLS enabled |
| Valkey/Redis | 7.0+ | 7.0+ | For caching (optional) |
| .NET Runtime | 10.0 | 10.0 | Preview features enabled |

### 1.2 Network Requirements

| Direction | Endpoints | Notes |
|-----------|-----------|-------|
| Inbound | Scanner API (8080) | Load balancer health checks |
| Outbound | PostgreSQL (5432) | Database connections |
| Outbound | Valkey (6379) | Cache connections (optional) |
| Outbound | Signer service | For DSSE attestations |

### 1.3 Dependencies

- Scanner WebService deployed and healthy
- PostgreSQL database with Scanner schema migrations applied
- (Optional) Valkey cluster for caching
- (Optional) Signer service for attestation signing
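A preflight check can compare the versions your dependencies report against the minimums in 1.1 before you deploy. The sketch below assumes you have already collected the version strings (for example from PostgreSQL's `SHOW server_version`, or the `redis_version`/`valkey_version` field of `INFO server`). `MINIMUMS`, `parse_version`, and `meets_minimum` are hypothetical helpers, not part of the Scanner tooling.

```python
import re

# Minimum versions from table 1.1 (hypothetical helper, not part of the Scanner CLI).
MINIMUMS = {
    "postgresql": (16,),
    "valkey": (7, 0),
}


def parse_version(reported: str) -> tuple[int, ...]:
    """Extract the leading dotted version, e.g. '16.4 (Debian ...)' -> (16, 4)."""
    match = re.search(r"\d+(?:\.\d+)*", reported)
    if match is None:
        raise ValueError(f"no version number in {reported!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(component: str, reported: str) -> bool:
    """True if the reported version is at or above the documented minimum."""
    minimum = MINIMUMS[component]
    actual = parse_version(reported)
    width = max(len(minimum), len(actual))

    def pad(version: tuple[int, ...]) -> tuple[int, ...]:
        # Pad with zeros so (16,) compares like (16, 0).
        return version + (0,) * (width - len(version))

    return pad(actual) >= pad(minimum)


if __name__ == "__main__":
    print(meets_minimum("postgresql", "16.4 (Debian 16.4-1.pgdg120+1)"))  # True
    print(meets_minimum("valkey", "8.0.1"))  # True
    print(meets_minimum("postgresql", "15.8"))  # False
```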
---

## 2. Configuration

### 2.1 Scanner Service Configuration

**File:** `etc/scanner.yaml`

```yaml
scanner:
  reachability:
    # Enable reachability drift detection
    enabled: true

    # Languages to analyze (empty = all supported)
    languages:
      - dotnet
      - java
      - node
      - python
      - go

    # Call graph extraction options
    extraction:
      max_depth: 100
      max_nodes: 100000
      timeout_seconds: 300
      include_test_code: false
      include_vendored: false

    # Drift detection options
    drift:
      # Auto-compute on scan completion
      auto_compute: true
      # Base scan selection (previous, tagged, specific)
      base_selection: previous
      # Emit VEX candidates for unreachable sinks
      emit_vex_candidates: true

storage:
  postgres:
    connection_string: "Host=localhost;Database=stellaops;Username=scanner;Password=${SCANNER_DB_PASSWORD}"
    schema: scanner
    pool_size: 20

cache:
  valkey:
    enabled: true
    connection: "localhost:6379"
    bucket: "stella-callgraph"
    ttl_hours: 24
    circuit_breaker:
      failure_threshold: 5
      timeout_seconds: 30
```

### 2.2 Valkey Cache Configuration

```yaml
# Valkey-specific settings
cache:
  valkey:
    enabled: true
    connection: "valkey-cluster.internal:6379"
    bucket: "stella-callgraph"
    ttl_hours: 24

    # Circuit breaker prevents cache storms
    circuit_breaker:
      failure_threshold: 5
      timeout_seconds: 30
      half_open_max_attempts: 3

    # Compression reduces memory usage
    compression:
      enabled: true
      algorithm: gzip
      level: fastest
```

### 2.3 Policy Gate Configuration

**File:** `etc/policy.yaml`

```yaml
smart_diff:
  gates:
    # Block on KEV becoming reachable
    - id: drift_block_kev
      condition: "delta_reachable > 0 AND is_kev = true"
      action: block
      severity: critical
      message: "Known Exploited Vulnerability now reachable"

    # Block on high-severity sink becoming reachable
    - id: drift_block_critical
      condition: "delta_reachable > 0 AND max_cvss >= 9.0"
      action: block
      severity: critical
      message: "Critical vulnerability now reachable"

    # Warn on any new reachable paths
    - id: drift_warn_new_paths
      condition: "delta_reachable > 0"
      action: warn
      severity: medium
      message: "New reachable paths detected"

    # Auto-allow mitigated paths
    - id: drift_allow_mitigated
      condition: "delta_unreachable > 0 AND delta_reachable = 0"
      action: allow
      auto_approve: true
```

---

## 3. Deployment Modes

### 3.1 Standalone Deployment

```bash
# Run Scanner WebService with drift detection
docker run -d \
  --name scanner \
  -p 8080:8080 \
  -e SCANNER_DB_PASSWORD=secret \
  -v /etc/scanner:/etc/scanner:ro \
  stellaops/scanner:latest

# Verify health
curl http://localhost:8080/health
```

### 3.2 Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scanner
  namespace: stellaops
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scanner
  template:
    metadata:
      labels:
        app: scanner
    spec:
      containers:
        - name: scanner
          image: stellaops/scanner:latest
          ports:
            - containerPort: 8080
          env:
            - name: SCANNER_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: scanner-secrets
                  key: db-password
          volumeMounts:
            - name: config
              mountPath: /etc/scanner
              readOnly: true
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: scanner-config
```

### 3.3 Air-Gapped Deployment

For air-gapped environments:

1. **Disable external lookups:**

   ```yaml
   scanner:
     reachability:
       offline_mode: true  # No external advisory fetching
   ```

2. **Pre-load call graph caches:**

   ```bash
   # Export from connected environment
   stella cache export --type callgraph --output graphs.tar.gz

   # Import in air-gapped environment
   stella cache import --input graphs.tar.gz
   ```

3. **Use local VEX sources:**

   ```yaml
   excititor:
     sources:
       - type: local
         path: /data/vex-bundles/
   ```
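The gates configured in section 2.3 are conditions over drift counters (`delta_reachable`, `delta_unreachable`) and advisory attributes (`is_kev`, `max_cvss`). The sketch below shows one way such gates could be evaluated. It writes the four conditions as Python predicates instead of parsing the condition strings, and it assumes that when several gates match, the most restrictive action wins (`block` > `warn` > `allow`); this guide does not confirm either choice. `DriftSummary`, `Gate`, and `evaluate_gates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DriftSummary:
    """Hypothetical inputs to the gate conditions in etc/policy.yaml."""
    delta_reachable: int
    delta_unreachable: int
    is_kev: bool
    max_cvss: float


@dataclass(frozen=True)
class Gate:
    gate_id: str
    condition: Callable[[DriftSummary], bool]
    action: str


# The four gates from section 2.3, with conditions written as predicates.
GATES = [
    Gate("drift_block_kev", lambda d: d.delta_reachable > 0 and d.is_kev, "block"),
    Gate("drift_block_critical", lambda d: d.delta_reachable > 0 and d.max_cvss >= 9.0, "block"),
    Gate("drift_warn_new_paths", lambda d: d.delta_reachable > 0, "warn"),
    Gate("drift_allow_mitigated",
         lambda d: d.delta_unreachable > 0 and d.delta_reachable == 0, "allow"),
]

# Assumed precedence: the most restrictive matching action wins.
PRECEDENCE = {"allow": 0, "warn": 1, "block": 2}


def evaluate_gates(drift: DriftSummary) -> tuple[Optional[str], list[str]]:
    """Return (winning action, ids of all matching gates); action is None if none match."""
    matched = [gate for gate in GATES if gate.condition(drift)]
    if not matched:
        return None, []
    winner = max(matched, key=lambda gate: PRECEDENCE[gate.action])
    return winner.action, [gate.gate_id for gate in matched]


if __name__ == "__main__":
    kev = DriftSummary(delta_reachable=1, delta_unreachable=0, is_kev=True, max_cvss=7.5)
    print(evaluate_gates(kev))
    # ('block', ['drift_block_kev', 'drift_warn_new_paths'])
    mitigated = DriftSummary(delta_reachable=0, delta_unreachable=2, is_kev=False, max_cvss=0.0)
    print(evaluate_gates(mitigated))
    # ('allow', ['drift_allow_mitigated'])
```

Returning every matching gate id alongside the winning action keeps the `warn` message visible in reports even when a `block` gate decides the outcome.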
---

## 4. Monitoring & Metrics

### 4.1 Key Metrics

| Metric | Type | Description | Alert Threshold |
|--------|------|-------------|-----------------|
| `scanner_callgraph_extraction_duration_seconds` | histogram | Time to extract call graph | p99 > 300s |
| `scanner_callgraph_node_count` | gauge | Nodes in extracted graph | > 100,000 |
| `scanner_reachability_analysis_duration_seconds` | histogram | BFS analysis time | p99 > 30s |
| `scanner_drift_newly_reachable_total` | counter | Count of newly reachable sinks | > 0 (alert) |
| `scanner_drift_newly_unreachable_total` | counter | Count of mitigated sinks | (info) |
| `scanner_cache_hit_ratio` | gauge | Valkey cache hit rate | < 0.5 |
| `scanner_cache_circuit_breaker_open` | gauge | Circuit breaker state | = 1 (alert) |

### 4.2 Grafana Dashboard

Import the dashboard JSON from `deploy/grafana/scanner-drift-dashboard.json`.

Key panels:

- Drift detection rate over time
- Newly reachable sinks by category
- Call graph extraction latency
- Cache hit/miss ratio
- Circuit breaker state

### 4.3 Alert Rules

```yaml
# Prometheus alerting rules
groups:
  - name: scanner-drift
    rules:
      - alert: KevBecameReachable
        expr: increase(scanner_drift_kev_reachable_total[5m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "KEV vulnerability became reachable"
          description: "A Known Exploited Vulnerability is now reachable from public entrypoints"

      - alert: HighDriftRate
        # increase() counts new sinks over the 1h window; rate() would be a per-second value
        expr: increase(scanner_drift_newly_reachable_total[1h]) > 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High rate of new reachable vulnerabilities"

      - alert: CacheCircuitOpen
        expr: scanner_cache_circuit_breaker_open == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Valkey cache circuit breaker is open"
```
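The `scanner_cache_circuit_breaker_open` gauge and the `CacheCircuitOpen` alert report the state of the cache circuit breaker configured in section 2.2 (`failure_threshold`, `timeout_seconds`, `half_open_max_attempts`). The sketch below is a generic closed/open/half-open breaker driven by those three settings. This guide does not specify how the Scanner interprets `half_open_max_attempts`, so the sketch assumes it caps the trial calls admitted while half-open. The class and method names are hypothetical; the clock is injected so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Closed -> open after N consecutive failures; open -> half-open after a timeout."""

    def __init__(self, failure_threshold: int = 5, timeout_seconds: float = 30.0,
                 half_open_max_attempts: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.timeout_seconds = timeout_seconds
        self.half_open_max_attempts = half_open_max_attempts
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0
        self._trial_calls = 0

    def allow_request(self) -> bool:
        """Decide whether a cache call may proceed; if not, the caller skips the cache."""
        if self.state == "open":
            if self._clock() - self._opened_at < self.timeout_seconds:
                return False
            # Timeout elapsed: admit a limited number of trial calls.
            self.state = "half_open"
            self._trial_calls = 0
        if self.state == "half_open":
            if self._trial_calls >= self.half_open_max_attempts:
                return False
            self._trial_calls += 1
        return True

    def record_success(self) -> None:
        # Any success closes the breaker and resets the failure count.
        self.state = "closed"
        self._failures = 0

    def record_failure(self) -> None:
        if self.state == "open":
            return  # Late failures from calls started before tripping are ignored.
        if self.state == "half_open":
            self._trip()  # A failed trial call reopens the breaker immediately.
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self._opened_at = self._clock()
        self._failures = 0

    @property
    def open_gauge(self) -> int:
        """Value a gauge like scanner_cache_circuit_breaker_open would expose (1 = open)."""
        return 1 if self.state == "open" else 0


if __name__ == "__main__":
    now = [0.0]
    breaker = CircuitBreaker(clock=lambda: now[0])
    for _ in range(5):
        breaker.record_failure()
    print(breaker.state, breaker.allow_request())  # open False
    now[0] = 31.0
    print(breaker.allow_request(), breaker.state)  # True half_open
    breaker.record_success()
    print(breaker.state, breaker.open_gauge)  # closed 0
```

Under these assumptions, a breaker that keeps failing its trial calls reopens every `timeout_seconds`, so a `CacheCircuitOpen` alert with `for: 5m` fires only when the cache stays unhealthy across several probe cycles.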
---

## 5. Troubleshooting

### 5.1 Call Graph Extraction Failures

**Symptom:** `GRAPH_NOT_EXTRACTED` error

**Causes & Solutions:**

| Cause | Solution |
|-------|----------|
| Missing SDK/runtime | Install the SDK for the project's language (for example .NET, Node.js, JDK) |
| Build errors in project | Fix compilation errors first |
| Timeout exceeded | Increase `extraction.timeout_seconds` |
| Memory exhaustion | Increase container memory limits |
| Unsupported language | Check language support matrix |

**Debugging:**

```bash
# Check extraction logs (the Deployment runs in the stellaops namespace)
kubectl -n stellaops logs -f deployment/scanner | grep -i extraction

# Manual extraction test
stella scan callgraph \
  --project /path/to/project \
  --language dotnet \
  --verbose
```

### 5.2 Drift Detection Issues

**Symptom:** Drift not computed or incorrect results

**Causes & Solutions:**

| Cause | Solution |
|-------|----------|
| No base scan available | Ensure a previous scan exists |
| Different languages | Base and head scans must have the same language |
| Graph digest unchanged | No material code changes detected |
| Cache stale | Clear the Valkey cache for the scan |

**Debugging:**

```bash
# Drift endpoints require a Bearer token (see 8.2); $TOKEN is assumed to hold one
# Check drift computation status
curl -H "Authorization: Bearer $TOKEN" \
  "http://scanner:8080/api/scanner/scans/{scanId}/drift"

# Force recomputation (send the body as JSON, not form data)
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "http://scanner:8080/api/scanner/scans/{scanId}/compute-reachability" \
  -d '{"forceRecompute": true}'

# View graph digests
psql -c "SELECT scan_id, graph_digest FROM scanner.call_graph_snapshots ORDER BY extracted_at DESC LIMIT 10"
```

### 5.3 Cache Problems

**Symptom:** Slow performance, cache misses, circuit breaker open

**Solutions:**

```bash
# Check Valkey connectivity
redis-cli -h valkey.internal ping

# Check circuit breaker state
curl -s "http://scanner:8080/health/ready" | jq '.checks.cache'

# Clear the cache for a specific scan.
# DEL does not expand wildcards, so enumerate matching keys with SCAN first.
redis-cli -h valkey.internal --scan --pattern "stella-callgraph:{scanId}:*" \
  | xargs -r redis-cli -h valkey.internal DEL

# Reset circuit breaker (restart scanner)
kubectl -n stellaops rollout restart deployment/scanner
```

### 5.4 Common Error Messages

| Error | Meaning | Action |
|-------|---------|--------|
| `ERR_GRAPH_TOO_LARGE` | Graph exceeds `max_nodes` (default 100,000) | Increase `max_nodes` or split project |
| `ERR_EXTRACTION_TIMEOUT` | Analysis timed out | Increase timeout or reduce scope |
| `ERR_NO_ENTRYPOINTS` | No public entrypoints found | Check framework detection |
| `ERR_BASE_SCAN_MISSING` | Base scan not found | Specify valid `baseScanId` |
| `ERR_CACHE_UNAVAILABLE` | Valkey unreachable | Check network connectivity; the circuit breaker opens after repeated failures |

---

## 6. Performance Tuning

### 6.1 Call Graph Extraction

```yaml
scanner:
  reachability:
    extraction:
      # Exclude test code (reduces graph size)
      include_test_code: false

      # Exclude vendored dependencies
      include_vendored: false

      # Limit analysis depth
      max_depth: 50  # Default: 100

      # Parallel project analysis
      parallelism: 4
```

### 6.2 Caching Strategy

```yaml
cache:
  valkey:
    # Longer TTL for stable projects
    ttl_hours: 72

    # Aggressive compression for large graphs
    compression:
      level: optimal  # vs 'fastest'

    # Larger connection pool
    pool_size: 20
```

### 6.3 Database Optimization

```sql
-- CREATE INDEX CONCURRENTLY and VACUUM cannot run inside a transaction block:
-- execute each statement on its own (not in BEGIN/COMMIT or one multi-statement psql -c).

-- Ensure indexes exist
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_callgraph_scan_lang
    ON scanner.call_graph_snapshots(scan_id, language);

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_drift_head_scan
    ON scanner.reachability_drift_results(head_scan_id);

-- Vacuum after large imports
VACUUM ANALYZE scanner.call_graph_snapshots;
VACUUM ANALYZE scanner.reachability_drift_results;
```

---

## 7. Backup & Recovery

### 7.1 Database Backup

```bash
# Backup drift-related tables
pg_dump -h postgres.internal -U stellaops -d stellaops \
  -t scanner.call_graph_snapshots \
  -t scanner.reachability_results \
  -t scanner.reachability_drift_results \
  -t scanner.drifted_sinks \
  -t scanner.code_changes \
  > scanner_drift_backup.sql
```

### 7.2 Cache Recovery

```bash
# Export cache to file (if needed)
redis-cli -h valkey.internal --rdb /backup/callgraph-cache.rdb

# The cache is ephemeral and can be regenerated from the database.
# Recompute after cache loss:
stella scan recompute-reachability --all-pending
```
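The reachability analysis referenced throughout this guide (the BFS analysis time metric in 4.1, the `max_depth` limit in 6.1) and the drift it feeds can be illustrated on a toy call graph. The sketch below runs a depth-bounded breadth-first search from the entrypoints and then diffs the reachable sinks of a base and a head snapshot. The adjacency-map representation, the `reachable_sinks`/`compute_drift` names, and the language check (mirroring the "Different languages" row in 5.2) are illustrative assumptions, not the Scanner's implementation.

```python
from collections import deque
from dataclasses import dataclass

Graph = dict[str, list[str]]  # caller -> callees


def reachable_sinks(graph: Graph, entrypoints: set[str], sinks: set[str],
                    max_depth: int = 100) -> set[str]:
    """Breadth-first search from all entrypoints, following at most max_depth call edges."""
    seen = set(entrypoints)
    queue = deque((node, 0) for node in entrypoints)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # Do not expand beyond the configured depth.
        for callee in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return seen & sinks


@dataclass(frozen=True)
class Snapshot:
    language: str
    reachable: frozenset[str]


def compute_drift(base: Snapshot, head: Snapshot) -> tuple[set[str], set[str]]:
    """Return (newly_reachable, newly_unreachable) sinks between base and head."""
    if base.language != head.language:
        raise ValueError("base and head must have the same language")
    return set(head.reachable - base.reachable), set(base.reachable - head.reachable)


if __name__ == "__main__":
    base_graph = {"Main": ["Parse"], "Parse": ["LegacyParser"]}
    head_graph = {"Main": ["Parse"], "Parse": ["UnsafeDeserialize"]}
    sinks = {"LegacyParser", "UnsafeDeserialize"}
    base = Snapshot("dotnet", frozenset(reachable_sinks(base_graph, {"Main"}, sinks)))
    head = Snapshot("dotnet", frozenset(reachable_sinks(head_graph, {"Main"}, sinks)))
    print(compute_drift(base, head))  # ({'UnsafeDeserialize'}, {'LegacyParser'})
```

Lowering `max_depth` shrinks the search, but sinks deeper than the limit are then reported as unreachable, which can hide real drift.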
---

## 8. Security Considerations

### 8.1 Database Access

- Scanner service uses dedicated PostgreSQL user with schema-limited permissions
- Row-Level Security (RLS) enforces tenant isolation
- Connection strings use secrets management (not plaintext)

### 8.2 API Authentication

- All drift endpoints require valid Bearer token
- Scopes: `scanner:read`, `scanner:write`, `scanner:admin`
- Rate limiting prevents abuse

### 8.3 Attestation Signing

- Drift results can be DSSE-signed for audit trails
- Signing keys managed by Signer service
- Optional Rekor transparency logging

---

## 9. References

- **Architecture:** `docs/modules/scanner/reachability-drift.md`
- **API Reference:** `docs/api/scanner-drift-api.md`
- **PostgreSQL Guide:** `docs/operations/postgresql-guide.md`
- **Air-Gap Operations:** `docs/operations/airgap-operations-runbook.md`
- **Reachability Runbook:** `docs/operations/reachability-runbook.md`