Reachability Drift Detection - Operations Guide
Module: Scanner
Version: 1.0
Last Updated: 2025-12-22
1. Prerequisites
1.1 Infrastructure Requirements
| Component | Minimum | Recommended | Notes |
|---|---|---|---|
| CPU | 4 cores | 8 cores | For call graph extraction |
| Memory | 4 GB | 8 GB | Large projects need more |
| PostgreSQL | 16+ | 16+ | With RLS enabled |
| Valkey/Redis | 7.0+ | 7.0+ | For caching (optional) |
| .NET Runtime | 10.0 | 10.0 | Preview features enabled |
1.2 Network Requirements
| Direction | Endpoints | Notes |
|---|---|---|
| Inbound | Scanner API (8080) | Load balancer health checks |
| Outbound | PostgreSQL (5432) | Database connections |
| Outbound | Valkey (6379) | Cache connections (optional) |
| Outbound | Signer service | For DSSE attestations |
1.3 Dependencies
- Scanner WebService deployed and healthy
- PostgreSQL database with Scanner schema migrations applied
- (Optional) Valkey cluster for caching
- (Optional) Signer service for attestation signing
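The version floors in section 1.1 can be checked before deployment. The following is a minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils); `version_ge` and `check_minimum` are hypothetical helpers, not part of the stella CLI.

```shell
# version_ge ACTUAL MINIMUM: succeeds when ACTUAL >= MINIMUM.
# Relies on `sort -V` (version sort); hypothetical helper.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# check_minimum NAME ACTUAL MINIMUM: prints a one-line verdict.
check_minimum() {
    if version_ge "$2" "$3"; then
        echo "OK   $1 $2 (>= $3)"
    else
        echo "FAIL $1 $2 (< $3)"
        return 1
    fi
}

# Example (not executed here): feed in the live server version.
#   check_minimum PostgreSQL "$(psql -tAc 'SHOW server_version' | cut -d' ' -f1)" 16
```

The helpers only compare version strings; collecting the actual versions from each component is left to whatever tooling is available on the host.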
2. Configuration
2.1 Scanner Service Configuration
File: etc/scanner.yaml
scanner:
reachability:
# Enable reachability drift detection
enabled: true
# Languages to analyze (empty = all supported)
languages:
- dotnet
- java
- node
- python
- go
# Call graph extraction options
extraction:
max_depth: 100
max_nodes: 100000
timeout_seconds: 300
include_test_code: false
include_vendored: false
# Drift detection options
drift:
# Auto-compute on scan completion
auto_compute: true
# Base scan selection (previous, tagged, specific)
base_selection: previous
# Emit VEX candidates for unreachable sinks
emit_vex_candidates: true
storage:
postgres:
connection_string: "Host=localhost;Database=stellaops;Username=scanner;Password=${SCANNER_DB_PASSWORD}"
schema: scanner
pool_size: 20
cache:
valkey:
enabled: true
connection: "localhost:6379"
bucket: "stella-callgraph"
ttl_hours: 24
circuit_breaker:
failure_threshold: 5
timeout_seconds: 30
2.2 Valkey Cache Configuration
# Valkey-specific settings
cache:
  valkey:
    enabled: true
    connection: "valkey-cluster.internal:6379"
    bucket: "stella-callgraph"
    ttl_hours: 24
    # Circuit breaker prevents cache storms
    circuit_breaker:
      failure_threshold: 5
      timeout_seconds: 30
      half_open_max_attempts: 3
    # Compression reduces memory usage
    compression:
      enabled: true
      algorithm: gzip
      level: fastest
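To make the three circuit breaker settings concrete, the sketch below models a textbook closed / open / half-open breaker with the values from this section. It is an illustration under assumptions, not the Scanner's actual implementation: in particular, treating half_open_max_attempts as the number of trial requests allowed while half-open is a guess, and the `cb_*` names are hypothetical.

```shell
# Settings mirror section 2.2; the state machine is an assumed, generic design.
FAILURE_THRESHOLD=5
TIMEOUT_SECONDS=30
HALF_OPEN_MAX_ATTEMPTS=3

cb_state=closed; cb_failures=0; cb_opened_at=0; cb_trials=0

# cb_allow NOW: succeeds if a cache request may be attempted at time NOW (seconds).
cb_allow() {
    # After the timeout elapses, an open breaker moves to half-open.
    if [ "$cb_state" = open ] && [ $(( $1 - cb_opened_at )) -ge "$TIMEOUT_SECONDS" ]; then
        cb_state=half_open; cb_trials=0
    fi
    case "$cb_state" in
        closed) return 0 ;;
        half_open)
            # Only a limited number of trial requests pass while half-open.
            [ "$cb_trials" -lt "$HALF_OPEN_MAX_ATTEMPTS" ] || return 1
            cb_trials=$(( cb_trials + 1 ))
            return 0 ;;
        *) return 1 ;;
    esac
}

# cb_failure NOW: record a failed cache call at time NOW.
cb_failure() {
    cb_failures=$(( cb_failures + 1 ))
    # A failed trial, or reaching the threshold, (re)opens the breaker.
    if [ "$cb_state" = half_open ] || [ "$cb_failures" -ge "$FAILURE_THRESHOLD" ]; then
        cb_state=open; cb_opened_at=$1
    fi
}

# cb_success: a successful call closes the breaker and clears the count.
cb_success() {
    cb_state=closed; cb_failures=0
}
```

With these values, five recorded failures open the breaker, requests are rejected for 30 seconds, and the first successful trial afterwards closes it again.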
2.3 Policy Gate Configuration
File: etc/policy.yaml
smart_diff:
  gates:
    # Block on KEV becoming reachable
    - id: drift_block_kev
      condition: "delta_reachable > 0 AND is_kev = true"
      action: block
      severity: critical
      message: "Known Exploited Vulnerability now reachable"
    # Block on high-severity sink becoming reachable
    - id: drift_block_critical
      condition: "delta_reachable > 0 AND max_cvss >= 9.0"
      action: block
      severity: critical
      message: "Critical vulnerability now reachable"
    # Warn on any new reachable paths
    - id: drift_warn_new_paths
      condition: "delta_reachable > 0"
      action: warn
      severity: medium
      message: "New reachable paths detected"
    # Auto-allow mitigated paths
    - id: drift_allow_mitigated
      condition: "delta_unreachable > 0 AND delta_reachable = 0"
      action: allow
      auto_approve: true
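The gate conditions above can be traced by hand with a small helper. This is a sketch only: it assumes gates are evaluated in file order and that the first match wins, which this guide does not state, and `evaluate_drift_gates` is a hypothetical name rather than a policy engine command.

```shell
# evaluate_drift_gates DELTA_REACHABLE DELTA_UNREACHABLE IS_KEV MAX_CVSS
# Prints "<action> <gate id>" for the first matching gate, or "none".
# First-match-wins ordering is an assumption of this sketch.
evaluate_drift_gates() {
    delta_reachable=$1; delta_unreachable=$2; is_kev=$3; max_cvss=$4
    if [ "$delta_reachable" -gt 0 ] && [ "$is_kev" = true ]; then
        echo "block drift_block_kev"
    elif [ "$delta_reachable" -gt 0 ] &&
         awk -v c="$max_cvss" 'BEGIN { exit !(c >= 9.0) }'; then
        # awk handles the floating-point CVSS comparison.
        echo "block drift_block_critical"
    elif [ "$delta_reachable" -gt 0 ]; then
        echo "warn drift_warn_new_paths"
    elif [ "$delta_unreachable" -gt 0 ] && [ "$delta_reachable" -eq 0 ]; then
        echo "allow drift_allow_mitigated"
    else
        echo "none"
    fi
}
```

For example, one newly reachable sink with a CVSS score of 7.5 and no KEV flag falls through both block gates and lands on drift_warn_new_paths.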
3. Deployment Modes
3.1 Standalone Deployment
# Run Scanner WebService with drift detection
docker run -d \
--name scanner \
-p 8080:8080 \
-e SCANNER_DB_PASSWORD=secret \
-v /etc/scanner:/etc/scanner:ro \
stellaops/scanner:latest
# Verify health
curl http://localhost:8080/health
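A container that has just started may not answer the health check immediately. The retry loop below is a minimal sketch of how the verification step could be scripted; `wait_until_healthy` is a hypothetical helper, and the attempt count and delay are arbitrary choices.

```shell
# wait_until_healthy ATTEMPTS DELAY_SECONDS COMMAND...
# Re-runs COMMAND until it succeeds or ATTEMPTS is exhausted.
wait_until_healthy() {
    attempts=$1; delay=$2; shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "healthy after $n attempt(s)"
            return 0
        fi
        n=$(( n + 1 ))
        # Sleep only if another attempt will follow.
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    echo "not healthy after $attempts attempt(s)" >&2
    return 1
}

# Example (not executed here): poll the health endpoint for up to a minute.
#   wait_until_healthy 30 2 curl -fsS http://localhost:8080/health
```

Passing the probe as a command keeps the helper independent of the endpoint, so the same loop works for the /health/ready path used in the Kubernetes probes.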
3.2 Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scanner
  namespace: stellaops
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scanner
  template:
    metadata:
      labels:
        app: scanner
    spec:
      containers:
        - name: scanner
          image: stellaops/scanner:latest
          ports:
            - containerPort: 8080
          env:
            - name: SCANNER_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: scanner-secrets
                  key: db-password
          volumeMounts:
            - name: config
              mountPath: /etc/scanner
              readOnly: true
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: scanner-config
3.3 Air-Gapped Deployment
For air-gapped environments:
- Disable external lookups:
  scanner:
    reachability:
      offline_mode: true  # No external advisory fetching
- Pre-load call graph caches:
  # Export from connected environment
  stella cache export --type callgraph --output graphs.tar.gz
  # Import in air-gapped environment
  stella cache import --input graphs.tar.gz
- Use local VEX sources:
  excititor:
    sources:
      - type: local
        path: /data/vex-bundles/
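When an exported cache bundle is carried across the air gap, it can be worth confirming that the file arrived intact before importing it. The sketch below assumes `sha256sum` from GNU coreutils is available on both sides; `bundle_digest` and `verify_bundle` are hypothetical helpers and not part of the stella CLI.

```shell
# bundle_digest FILE: prints the SHA-256 hex digest of FILE.
bundle_digest() {
    sha256sum "$1" | awk '{ print $1 }'
}

# verify_bundle FILE EXPECTED_DIGEST: succeeds only if the digests match.
verify_bundle() {
    actual=$(bundle_digest "$1")
    if [ -n "$actual" ] && [ "$actual" = "$2" ]; then
        echo "digest OK: $1"
    else
        echo "digest MISMATCH: $1" >&2
        return 1
    fi
}

# Example (not executed here):
#   connected side:   bundle_digest graphs.tar.gz   # record the printed digest
#   air-gapped side:  verify_bundle graphs.tar.gz "<recorded digest>" \
#                       && stella cache import --input graphs.tar.gz
```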
4. Monitoring & Metrics
4.1 Key Metrics
| Metric | Type | Description | Alert Threshold |
|---|---|---|---|
| scanner_callgraph_extraction_duration_seconds | histogram | Time to extract call graph | p99 > 300s |
| scanner_callgraph_node_count | gauge | Nodes in extracted graph | > 100,000 |
| scanner_reachability_analysis_duration_seconds | histogram | BFS analysis time | p99 > 30s |
| scanner_drift_newly_reachable_total | counter | Count of newly reachable sinks | > 0 (alert) |
| scanner_drift_newly_unreachable_total | counter | Count of mitigated sinks | (info) |
| scanner_cache_hit_ratio | gauge | Valkey cache hit rate | < 0.5 |
| scanner_cache_circuit_breaker_open | gauge | Circuit breaker state | = 1 (alert) |
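For a quick spot check without Grafana, a gauge can be read straight from Prometheus text exposition output and compared against its threshold. This is a rough sketch: it assumes the Scanner exposes its metrics in that text format (the exact endpoint path is not given in this guide), it does not handle label values containing spaces, and `metric_value` and `below_threshold` are hypothetical helpers.

```shell
# metric_value NAME: reads Prometheus text exposition on stdin and prints
# the value of the first sample named NAME (with or without labels).
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                      # skip HELP/TYPE comment lines
        {
            sample = $1
            sub(/[{].*/, "", sample)       # drop any {label="..."} set
            if (sample == name) { print $2; exit }
        }'
}

# below_threshold VALUE LIMIT: succeeds when VALUE < LIMIT (floating point).
below_threshold() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v < l) }'
}

# Example (not executed here; the /metrics path is an assumption):
#   ratio=$(curl -fsS http://scanner:8080/metrics | metric_value scanner_cache_hit_ratio)
#   below_threshold "$ratio" 0.5 && echo "cache hit ratio below alert threshold"
```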
4.2 Grafana Dashboard
Import dashboard JSON from: deploy/grafana/scanner-drift-dashboard.json
Key panels:
- Drift detection rate over time
- Newly reachable sinks by category
- Call graph extraction latency
- Cache hit/miss ratio
- Circuit breaker state
4.3 Alert Rules
# Prometheus alerting rules
groups:
  - name: scanner-drift
    rules:
      - alert: KevBecameReachable
        expr: increase(scanner_drift_kev_reachable_total[5m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "KEV vulnerability became reachable"
          description: "A Known Exploited Vulnerability is now reachable from public entrypoints"
      - alert: HighDriftRate
        expr: rate(scanner_drift_newly_reachable_total[1h]) > 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High rate of new reachable vulnerabilities"
      - alert: CacheCircuitOpen
        expr: scanner_cache_circuit_breaker_open == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Valkey cache circuit breaker is open"
5. Troubleshooting
5.1 Call Graph Extraction Failures
Symptom: GRAPH_NOT_EXTRACTED error
Causes & Solutions:
| Cause | Solution |
|---|---|
| Missing SDK/runtime | Install required SDK (.NET, Node.js, JDK) |
| Build errors in project | Fix compilation errors first |
| Timeout exceeded | Increase extraction.timeout_seconds |
| Memory exhaustion | Increase container memory limits |
| Unsupported language | Check language support matrix |
Debugging:
# Check extraction logs
kubectl logs -f deployment/scanner | grep -i extraction
# Manual extraction test
stella scan callgraph \
--project /path/to/project \
--language dotnet \
--verbose
5.2 Drift Detection Issues
Symptom: Drift not computed or incorrect results
Causes & Solutions:
| Cause | Solution |
|---|---|
| No base scan available | Ensure previous scan exists |
| Different languages | Base and head scans must use the same language |
| Graph digest unchanged | No material code changes detected |
| Cache stale | Clear Valkey cache for scan |
Debugging:
# Check drift computation status
curl "http://scanner:8080/api/scanner/scans/{scanId}/drift"
# Force recomputation
curl -X POST \
  -H "Content-Type: application/json" \
  "http://scanner:8080/api/scanner/scans/{scanId}/compute-reachability" \
  -d '{"forceRecompute": true}'
# View graph digests
psql -c "SELECT scan_id, graph_digest FROM scanner.call_graph_snapshots ORDER BY extracted_at DESC LIMIT 10"
5.3 Cache Problems
Symptom: Slow performance, cache misses, circuit breaker open
Solutions:
# Check Valkey connectivity
redis-cli -h valkey.internal ping
# Check circuit breaker state
curl "http://scanner:8080/health/ready" | jq '.checks.cache'
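# Clear specific scan cache (DEL does not expand wildcards, so scan for matching keys first)
redis-cli -h valkey.internal --scan --pattern "stella-callgraph:scanId:*" | xargs -n 1 redis-cli -h valkey.internal DEL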
# Reset circuit breaker (restart scanner)
kubectl rollout restart deployment/scanner
5.4 Common Error Messages
| Error | Meaning | Action |
|---|---|---|
| ERR_GRAPH_TOO_LARGE | > 100K nodes | Increase max_nodes or split project |
| ERR_EXTRACTION_TIMEOUT | Analysis timed out | Increase timeout or reduce scope |
| ERR_NO_ENTRYPOINTS | No public entrypoints found | Check framework detection |
| ERR_BASE_SCAN_MISSING | Base scan not found | Specify valid baseScanId |
| ERR_CACHE_UNAVAILABLE | Valkey unreachable | Check network, circuit breaker will activate |
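The error table above lends itself to a small triage helper for log output. The sketch below copies the codes and actions from that table; `suggest_action` and `triage_log` are hypothetical names, and the assumption that error codes appear verbatim in the Scanner logs is untested.

```shell
# suggest_action ERROR_CODE: prints the remediation listed in section 5.4.
suggest_action() {
    case "$1" in
        ERR_GRAPH_TOO_LARGE)    echo "Increase max_nodes or split project" ;;
        ERR_EXTRACTION_TIMEOUT) echo "Increase timeout or reduce scope" ;;
        ERR_NO_ENTRYPOINTS)     echo "Check framework detection" ;;
        ERR_BASE_SCAN_MISSING)  echo "Specify valid baseScanId" ;;
        ERR_CACHE_UNAVAILABLE)  echo "Check network, circuit breaker will activate" ;;
        *)                      echo "Unknown error code: $1" >&2; return 1 ;;
    esac
}

# triage_log: reads log text on stdin and prints each distinct ERR_* code
# once, followed by its suggested action.
triage_log() {
    grep -o 'ERR_[A-Z_]*' | sort -u | while read -r code; do
        action=$(suggest_action "$code" 2>/dev/null) || action="no documented action"
        printf '%s: %s\n' "$code" "$action"
    done
}

# Example (not executed here):
#   kubectl logs deployment/scanner | triage_log
```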
6. Performance Tuning
6.1 Call Graph Extraction
scanner:
  reachability:
    extraction:
      # Exclude test code (reduces graph size)
      include_test_code: false
      # Exclude vendored dependencies
      include_vendored: false
      # Limit analysis depth
      max_depth: 50  # Default: 100
      # Parallel project analysis
      parallelism: 4
6.2 Caching Strategy
cache:
  valkey:
    # Longer TTL for stable projects
    ttl_hours: 72
    # Aggressive compression for large graphs
    compression:
      level: optimal  # vs 'fastest'
    # Larger connection pool
    pool_size: 20
6.3 Database Optimization
-- Ensure indexes exist
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_callgraph_scan_lang
ON scanner.call_graph_snapshots(scan_id, language);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_drift_head_scan
ON scanner.reachability_drift_results(head_scan_id);
-- Vacuum after large imports
VACUUM ANALYZE scanner.call_graph_snapshots;
VACUUM ANALYZE scanner.reachability_drift_results;
7. Backup & Recovery
7.1 Database Backup
# Backup drift-related tables
pg_dump -h postgres.internal -U stellaops \
-t scanner.call_graph_snapshots \
-t scanner.reachability_results \
-t scanner.reachability_drift_results \
-t scanner.drifted_sinks \
-t scanner.code_changes \
> scanner_drift_backup.sql
7.2 Cache Recovery
# Export cache to file (if needed)
redis-cli -h valkey.internal --rdb /backup/callgraph-cache.rdb
# Cache is ephemeral - can be regenerated from database
# Recompute after cache loss:
stella scan recompute-reachability --all-pending
8. Security Considerations
8.1 Database Access
- Scanner service uses dedicated PostgreSQL user with schema-limited permissions
- Row-Level Security (RLS) enforces tenant isolation
- Connection strings use secrets management (not plaintext)
8.2 API Authentication
- All drift endpoints require valid Bearer token
- Scopes: scanner:read, scanner:write, scanner:admin
- Rate limiting prevents abuse
8.3 Attestation Signing
- Drift results can be DSSE-signed for audit trails
- Signing keys managed by Signer service
- Optional Rekor transparency logging
9. References
- Architecture: docs/modules/scanner/reachability-drift.md
- API Reference: docs/api/scanner-drift-api.md
- PostgreSQL Guide: docs/operations/postgresql-guide.md
- Air-Gap Operations: docs/operations/airgap-operations-runbook.md
- Reachability Runbook: docs/operations/reachability-runbook.md