Refactor code structure for improved readability and maintainability; optimize performance in key functions.

This commit is contained in:
master
2025-12-22 19:06:31 +02:00
parent dfaa2079aa
commit 4602ccc3a3
1444 changed files with 109919 additions and 8058 deletions

@@ -1,4 +1,4 @@
# Reachability Drift Detection - Operations Guide
# Reachability Drift Detection - Operations Guide
**Module:** Scanner
**Version:** 1.0
@@ -6,514 +6,142 @@
---
## 1. Prerequisites
## 1. Overview
### 1.1 Infrastructure Requirements
| Component | Minimum | Recommended | Notes |
|-----------|---------|-------------|-------|
| CPU | 4 cores | 8 cores | For call graph extraction |
| Memory | 4 GB | 8 GB | Large projects need more |
| PostgreSQL | 16+ | 16+ | With RLS enabled |
| Valkey/Redis | 7.0+ | 7.0+ | For caching (optional) |
| .NET Runtime | 10.0 | 10.0 | Preview features enabled |
### 1.2 Network Requirements
| Direction | Endpoints | Notes |
|-----------|-----------|-------|
| Inbound | Scanner API (8080) | Load balancer health checks |
| Outbound | PostgreSQL (5432) | Database connections |
| Outbound | Valkey (6379) | Cache connections (optional) |
| Outbound | Signer service | For DSSE attestations |
### 1.3 Dependencies
- Scanner WebService deployed and healthy
- PostgreSQL database with Scanner schema migrations applied
- (Optional) Valkey cluster for caching
- (Optional) Signer service for attestation signing
Reachability Drift Detection compares call graph reachability between two scans and surfaces newly reachable or newly unreachable sinks. The API lives in the Scanner WebService and relies on call graph snapshots stored in PostgreSQL.
---
## 2. Configuration
## 2. Prerequisites
### 2.1 Scanner Service Configuration
### 2.1 Infrastructure Requirements
**File:** `etc/scanner.yaml`
| Component | Minimum | Recommended | Notes |
|---|---|---|---|
| CPU | 4 cores | 8 cores | Call graph extraction is CPU heavy. |
| Memory | 4 GB | 8 GB | Large graphs need more memory. |
| PostgreSQL | 16+ | 16+ | Required for call graph + drift tables. |
| Valkey/Redis | 7.0+ | 7.0+ | Optional call graph cache. |
| .NET Runtime | 10.0 | 10.0 | Scanner WebService runtime. |
### 2.2 Required Services
- Scanner WebService running with storage configured.
- Call graph ingestion pipeline populating `call_graph_snapshots` (Scanner Worker or external ingestion).
- PostgreSQL migrations for call graph and drift tables applied (auto-migrate is enabled by default).
Optional:
- Valkey call graph cache (`CallGraph:Cache`).
- Signer service for drift attestations (if enabled by the integration layer).
---
## 3. Configuration
### 3.1 Scanner WebService
**File:** `etc/scanner.yaml` (path depends on deployment)
```yaml
scanner:
reachability:
# Enable reachability drift detection
enabled: true
# Languages to analyze (empty = all supported)
languages:
- dotnet
- java
- node
- python
- go
# Call graph extraction options
extraction:
max_depth: 100
max_nodes: 100000
timeout_seconds: 300
include_test_code: false
include_vendored: false
# Drift detection options
drift:
# Auto-compute on scan completion
auto_compute: true
# Base scan selection (previous, tagged, specific)
base_selection: previous
# Emit VEX candidates for unreachable sinks
emit_vex_candidates: true
storage:
postgres:
connection_string: "Host=localhost;Database=stellaops;Username=scanner;Password=${SCANNER_DB_PASSWORD}"
schema: scanner
pool_size: 20
dsn: "Host=postgres;Database=stellaops;Username=scanner;Password=${SCANNER_DB_PASSWORD}"
database: "scanner"
commandTimeoutSeconds: 30
autoMigrate: true
cache:
valkey:
enabled: true
connection: "localhost:6379"
bucket: "stella-callgraph"
ttl_hours: 24
circuit_breaker:
failure_threshold: 5
timeout_seconds: 30
api:
basePath: "/api/v1"
scansSegment: "scans"
```
### 2.2 Valkey Cache Configuration
### 3.2 Call Graph Cache (Optional)
```yaml
# Valkey-specific settings
cache:
valkey:
CallGraph:
Cache:
enabled: true
connection: "valkey-cluster.internal:6379"
bucket: "stella-callgraph"
ttl_hours: 24
# Circuit breaker prevents cache storms
connection_string: "valkey:6379"
key_prefix: "callgraph:"
ttl_seconds: 3600
gzip: true
circuit_breaker:
failure_threshold: 5
timeout_seconds: 30
half_open_max_attempts: 3
# Compression reduces memory usage
compression:
enabled: true
algorithm: gzip
level: fastest
half_open_timeout: 10
```
### 2.3 Policy Gate Configuration
**File:** `etc/policy.yaml`
```yaml
smart_diff:
  gates:
    # Block on KEV becoming reachable
    - id: drift_block_kev
      condition: "delta_reachable > 0 AND is_kev = true"
      action: block
      severity: critical
      message: "Known Exploited Vulnerability now reachable"
    # Block on high-severity sink becoming reachable
    - id: drift_block_critical
      condition: "delta_reachable > 0 AND max_cvss >= 9.0"
      action: block
      severity: critical
      message: "Critical vulnerability now reachable"
    # Warn on any new reachable paths
    - id: drift_warn_new_paths
      condition: "delta_reachable > 0"
      action: warn
      severity: medium
      message: "New reachable paths detected"
    # Auto-allow mitigated paths
    - id: drift_allow_mitigated
      condition: "delta_unreachable > 0 AND delta_reachable = 0"
      action: allow
      auto_approve: true
```
---
## 3. Deployment Modes
### 3.1 Standalone Deployment
```bash
# Run Scanner WebService with drift detection
docker run -d \
--name scanner \
-p 8080:8080 \
-e SCANNER_DB_PASSWORD=secret \
-v /etc/scanner:/etc/scanner:ro \
stellaops/scanner:latest
# Verify health
curl http://localhost:8080/health
```
### 3.2 Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scanner
  namespace: stellaops
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scanner
  template:
    metadata:
      labels:
        app: scanner
    spec:
      containers:
        - name: scanner
          image: stellaops/scanner:latest
          ports:
            - containerPort: 8080
          env:
            - name: SCANNER_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: scanner-secrets
                  key: db-password
          volumeMounts:
            - name: config
              mountPath: /etc/scanner
              readOnly: true
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: scanner-config
```
### 3.3 Air-Gapped Deployment
For air-gapped environments:
1. **Disable external lookups:**
```yaml
scanner:
  reachability:
    offline_mode: true
    # No external advisory fetching
```
2. **Pre-load call graph caches:**
```bash
# Export from connected environment
stella cache export --type callgraph --output graphs.tar.gz
# Import in air-gapped environment
stella cache import --input graphs.tar.gz
```
3. **Use local VEX sources:**
```yaml
excititor:
  sources:
    - type: local
      path: /data/vex-bundles/
```
---
## 4. Monitoring & Metrics
### 4.1 Key Metrics
| Metric | Type | Description | Alert Threshold |
|--------|------|-------------|-----------------|
| `scanner_callgraph_extraction_duration_seconds` | histogram | Time to extract call graph | p99 > 300s |
| `scanner_callgraph_node_count` | gauge | Nodes in extracted graph | > 100,000 |
| `scanner_reachability_analysis_duration_seconds` | histogram | BFS analysis time | p99 > 30s |
| `scanner_drift_newly_reachable_total` | counter | Count of newly reachable sinks | > 0 (alert) |
| `scanner_drift_newly_unreachable_total` | counter | Count of mitigated sinks | (info) |
| `scanner_cache_hit_ratio` | gauge | Valkey cache hit rate | < 0.5 |
| `scanner_cache_circuit_breaker_open` | gauge | Circuit breaker state | = 1 (alert) |
### 4.2 Grafana Dashboard
Import dashboard JSON from: `deploy/grafana/scanner-drift-dashboard.json`
Key panels:
- Drift detection rate over time
- Newly reachable sinks by category
- Call graph extraction latency
- Cache hit/miss ratio
- Circuit breaker state
### 4.3 Alert Rules
```yaml
# Prometheus alerting rules
groups:
  - name: scanner-drift
    rules:
      - alert: KevBecameReachable
        expr: increase(scanner_drift_kev_reachable_total[5m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "KEV vulnerability became reachable"
          description: "A Known Exploited Vulnerability is now reachable from public entrypoints"
      - alert: HighDriftRate
        expr: rate(scanner_drift_newly_reachable_total[1h]) > 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High rate of new reachable vulnerabilities"
      - alert: CacheCircuitOpen
        expr: scanner_cache_circuit_breaker_open == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Valkey cache circuit breaker is open"
```
---
## 5. Troubleshooting
### 5.1 Call Graph Extraction Failures
**Symptom:** `GRAPH_NOT_EXTRACTED` error
**Causes & Solutions:**
| Cause | Solution |
|-------|----------|
| Missing SDK/runtime | Install required SDK (.NET, Node.js, JDK) |
| Build errors in project | Fix compilation errors first |
| Timeout exceeded | Increase `extraction.timeout_seconds` |
| Memory exhaustion | Increase container memory limits |
| Unsupported language | Check language support matrix |
**Debugging:**
```bash
# Check extraction logs
kubectl logs -f deployment/scanner | grep -i extraction
# Manual extraction test
stella scan callgraph \
--project /path/to/project \
--language dotnet \
--verbose
```
### 5.2 Drift Detection Issues
**Symptom:** Drift not computed or incorrect results
**Causes & Solutions:**
| Cause | Solution |
|-------|----------|
| No base scan available | Ensure previous scan exists |
| Different languages | Base and head must have same language |
| Graph digest unchanged | No material code changes detected |
| Cache stale | Clear Valkey cache for scan |
**Debugging:**
```bash
# Check drift computation status
curl "http://scanner:8080/api/scanner/scans/{scanId}/drift"
# Force recomputation
curl -X POST \
-H "Content-Type: application/json" \
"http://scanner:8080/api/scanner/scans/{scanId}/compute-reachability" \
-d '{"forceRecompute": true}'
# View graph digests
psql -c "SELECT scan_id, graph_digest FROM scanner.call_graph_snapshots ORDER BY extracted_at DESC LIMIT 10"
```
### 5.3 Cache Problems
**Symptom:** Slow performance, cache misses, circuit breaker open
**Solutions:**
```bash
# Check Valkey connectivity
redis-cli -h valkey.internal ping
# Check circuit breaker state
curl "http://scanner:8080/health/ready" | jq '.checks.cache'
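# Clear specific scan cache (DEL does not expand glob patterns, so scan for matching keys first)
redis-cli -h valkey.internal --scan --pattern "stella-callgraph:scanId:*" | xargs -r redis-cli -h valkey.internal DEL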
# Reset circuit breaker (restart scanner)
kubectl rollout restart deployment/scanner
```
### 5.4 Common Error Messages
| Error | Meaning | Action |
|-------|---------|--------|
| `ERR_GRAPH_TOO_LARGE` | > 100K nodes | Increase `max_nodes` or split project |
| `ERR_EXTRACTION_TIMEOUT` | Analysis timed out | Increase timeout or reduce scope |
| `ERR_NO_ENTRYPOINTS` | No public entrypoints found | Check framework detection |
| `ERR_BASE_SCAN_MISSING` | Base scan not found | Specify valid `baseScanId` |
| `ERR_CACHE_UNAVAILABLE` | Valkey unreachable | Check network, circuit breaker will activate |
---
## 6. Performance Tuning
### 6.1 Call Graph Extraction
### 3.3 Authorization (Optional)
```yaml
scanner:
  reachability:
    extraction:
      # Exclude test code (reduces graph size)
      include_test_code: false
      # Exclude vendored dependencies
      include_vendored: false
      # Limit analysis depth
      max_depth: 50 # Default: 100
      # Parallel project analysis
      parallelism: 4
```
### 6.2 Caching Strategy
```yaml
cache:
  valkey:
    # Longer TTL for stable projects
    ttl_hours: 72
    # Aggressive compression for large graphs
    compression:
      level: optimal # vs 'fastest'
    # Larger connection pool
    pool_size: 20
```
### 6.3 Database Optimization
```sql
-- Ensure indexes exist
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_callgraph_scan_lang
ON scanner.call_graph_snapshots(scan_id, language);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_drift_head_scan
ON scanner.reachability_drift_results(head_scan_id);
-- Vacuum after large imports
VACUUM ANALYZE scanner.call_graph_snapshots;
VACUUM ANALYZE scanner.reachability_drift_results;
authority:
enabled: true
issuer: "https://authority.local"
requiredScopes:
- "scanner.scans.read"
- "scanner.scans.write"
```
---
## 7. Backup & Recovery
## 4. Running Drift Analysis
### 7.1 Database Backup
1. Ensure call graph snapshots exist for base and head scans.
2. Compute drift by providing the base scan ID:
- `GET /api/v1/scans/{scanId}/drift?baseScanId={baseScanId}&language=dotnet`
3. Page through sinks:
- `GET /api/v1/drift/{driftId}/sinks?direction=became_reachable&offset=0&limit=100`
```bash
# Backup drift-related tables
pg_dump -h postgres.internal -U stellaops \
-t scanner.call_graph_snapshots \
-t scanner.reachability_results \
-t scanner.reachability_drift_results \
-t scanner.drifted_sinks \
-t scanner.code_changes \
> scanner_drift_backup.sql
```
### 7.2 Cache Recovery
```bash
# Export cache to file (if needed)
redis-cli -h valkey.internal --rdb /backup/callgraph-cache.rdb
# Cache is ephemeral - can be regenerated from database
# Recompute after cache loss:
stella scan recompute-reachability --all-pending
```
If `baseScanId` is omitted, the API returns the most recent stored drift result for the head scan.
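
The two requests above can be sketched as a pair of URL builders. This is a minimal sketch, assuming the `/api/v1` base path and `scans` segment from the configuration section. The helper names `drift_url` and `sinks_url` and the host `http://scanner:8080` are illustrative and not part of the Scanner API, and treating `language` as optional is an assumption.

```python
from urllib.parse import quote, urlencode

# Direction values accepted by the sinks endpoint, per the troubleshooting table.
DIRECTIONS = ("became_reachable", "became_unreachable")


def drift_url(base, scan_id, base_scan_id=None, language=None):
    """Build the drift URL for a head scan (hypothetical helper)."""
    params = {}
    if base_scan_id is not None:
        # Providing a base scan asks the API to compute drift against it.
        params["baseScanId"] = base_scan_id
    if language is not None:
        params["language"] = language
    url = f"{base.rstrip('/')}/scans/{quote(scan_id, safe='')}/drift"
    return f"{url}?{urlencode(params)}" if params else url


def sinks_url(base, drift_id, direction="became_reachable", offset=0, limit=100):
    """Build the paged sinks URL for a drift result (hypothetical helper)."""
    if direction not in DIRECTIONS:
        raise ValueError(f"direction must be one of {DIRECTIONS}")
    query = urlencode({"direction": direction, "offset": offset, "limit": limit})
    return f"{base.rstrip('/')}/drift/{quote(drift_id, safe='')}/sinks?{query}"


if __name__ == "__main__":
    base = "http://scanner:8080/api/v1"  # assumed host and base path
    print(drift_url(base, "head-scan", "base-scan", "dotnet"))
    print(sinks_url(base, "drift-1"))
```

Validating `direction` on the client side avoids a round trip that would otherwise end in a 400 response.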
---
## 8. Security Considerations
## 5. Deployment Modes
### 8.1 Database Access
### 5.1 Standalone
- Scanner service uses dedicated PostgreSQL user with schema-limited permissions
- Row-Level Security (RLS) enforces tenant isolation
- Connection strings use secrets management (not plaintext)
- Run Scanner WebService with PostgreSQL reachable.
- Provide `scanner.storage.dsn` and `scanner.api.basePath`.
### 8.2 API Authentication
### 5.2 Kubernetes
- All drift endpoints require valid Bearer token
- Scopes: `scanner:read`, `scanner:write`, `scanner:admin`
- Rate limiting prevents abuse
- Configure readiness and liveness probes (`/health/ready`, `/health/live`).
- Mount `scanner.yaml` via ConfigMap or Secret.
- Ensure Postgres connectivity and schema migrations are enabled.
### 8.3 Attestation Signing
### 5.3 Air-Gapped
- Drift results can be DSSE-signed for audit trails
- Signing keys managed by Signer service
- Optional Rekor transparency logging
- Use Offline Kit flows for advisory data and signatures.
- Avoid external endpoints; configure any optional integrations to local services.
---
## 9. References
## 6. Monitoring and Metrics
- **Architecture:** `docs/modules/scanner/reachability-drift.md`
- **API Reference:** `docs/api/scanner-drift-api.md`
- **PostgreSQL Guide:** `docs/operations/postgresql-guide.md`
- **Air-Gap Operations:** `docs/operations/airgap-operations-runbook.md`
- **Reachability Runbook:** `docs/operations/reachability-runbook.md`
There are no drift-specific metrics emitted by the drift endpoints yet. Recommended operational checks:
- API logs for `/api/v1/scans/{scanId}/drift` and `/api/v1/drift/{driftId}/sinks`.
- PostgreSQL table sizes and growth for `call_graph_snapshots`, `reachability_drift_results`, `drifted_sinks`.
- Valkey connectivity and cache hit rates if `CallGraph:Cache` is enabled.
---
## 7. Troubleshooting
| Symptom | Likely Cause | Resolution |
|---|---|---|
| 404 scan not found | Invalid scan ID | Verify scan ID or resolve by image reference. |
| 404 call graph not found | Call graph not ingested | Ingest call graph snapshot before running drift. |
| 404 drift result not found | No stored drift and no base scan provided | Provide `baseScanId` to compute drift. |
| 400 invalid direction | Unsupported direction value | Use `became_reachable` or `became_unreachable`. |
| 409 computation already in progress | Reachability job already running | Wait or retry later. |
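
The "wait or retry later" resolution for the 409 case can be sketched as a bounded retry loop. This is a minimal sketch under stated assumptions: `fetch` is any caller-supplied zero-argument callable returning a `(status_code, body)` tuple, and the helper name, attempt count, and delay are illustrative rather than Scanner defaults.

```python
import time


def fetch_drift_with_retry(fetch, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch() until it stops returning 409, at most `attempts` times.

    Only 409 (computation already in progress) is retried; other statuses,
    including the 404 and 400 cases in the table, are returned immediately
    because retrying does not resolve them.
    """
    status, body = fetch()
    for _ in range(attempts - 1):
        if status != 409:
            break
        sleep(delay_seconds)
        status, body = fetch()
    return status, body


if __name__ == "__main__":
    # Simulated responses: two in-progress replies, then a result.
    replies = iter([(409, None), (409, None), (200, {"driftId": "example"})])
    print(fetch_drift_with_retry(lambda: next(replies), delay_seconds=0))
```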
---
## 8. References
- `docs/modules/scanner/reachability-drift.md`
- `docs/api/scanner-drift-api.md`
- `docs/airgap/reachability-drift-airgap-workflows.md`
- `src/Scanner/__Libraries/StellaOps.Scanner.Storage/Postgres/Migrations/009_call_graph_tables.sql`
- `src/Scanner/__Libraries/StellaOps.Scanner.Storage/Postgres/Migrations/010_reachability_drift_tables.sql`

@@ -576,6 +576,52 @@ stella unknowns report --format email --send-to security-team@example.com
---
## 8. Unknown Budgets
Unknown budgets enforce per-environment caps on unknowns by reason code. Budgets can warn or block when exceeded.
**Configuration**:
```yaml
# etc/policy.unknowns.budgets.yaml
unknownBudgets:
  enforceBudgets: true
  budgets:
    prod:
      environment: prod
      totalLimit: 3
      reasonLimits:
        Reachability: 0
        Provenance: 0
        VexConflict: 1
      action: Block
      exceededMessage: "Production requires zero reachability unknowns"
    stage:
      environment: stage
      totalLimit: 10
      reasonLimits:
        Reachability: 1
      action: WarnUnlessException
    dev:
      environment: dev
      totalLimit: null
      action: Warn
    default:
      environment: default
      totalLimit: 5
      action: Warn
```
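
How a budget like the ones above might be checked can be sketched in a few lines. This is a sketch under stated assumptions, not the policy engine's implementation: it assumes limits are inclusive caps, that `totalLimit: null` means no total cap, and that a reason without an entry in `reasonLimits` is only bounded by the total. The function name `evaluate_budget` is hypothetical.

```python
def evaluate_budget(budget, counts):
    """Return (exceeded, action, violations) for unknown counts keyed by reason code."""
    violations = []
    total = sum(counts.values())
    total_limit = budget.get("totalLimit")
    # Assumption: a null totalLimit disables the total cap.
    if total_limit is not None and total > total_limit:
        violations.append(f"total {total} > {total_limit}")
    for reason, limit in sorted(budget.get("reasonLimits", {}).items()):
        count = counts.get(reason, 0)
        if count > limit:
            violations.append(f"{reason} {count} > {limit}")
    exceeded = bool(violations)
    # The configured action only applies once the budget is exceeded.
    return exceeded, (budget["action"] if exceeded else None), violations


# The prod budget from the configuration above, as a plain dictionary.
PROD = {
    "totalLimit": 3,
    "reasonLimits": {"Reachability": 0, "Provenance": 0, "VexConflict": 1},
    "action": "Block",
}

if __name__ == "__main__":
    print(evaluate_budget(PROD, {"Reachability": 1}))
```

How `WarnUnlessException` interacts with approved exceptions is left out of the sketch, since the source does not spell out that logic.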
**Exception coverage**:
To allow approved exceptions to cover specific unknown reason codes, set exception metadata
`unknown_reason_codes` (comma-separated). Example: `Reachability, U-VEX`.
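
Reading that metadata value can be sketched as follows. This is a minimal sketch, assuming whitespace around each code is insignificant and that codes are compared exactly; the helper names are hypothetical and do not come from the policy engine.

```python
def parse_reason_codes(value):
    """Split a comma-separated unknown_reason_codes value into trimmed codes."""
    return [code.strip() for code in value.split(",") if code.strip()]


def exception_covers(metadata, reason_code):
    """Check whether an exception's metadata lists the given reason code."""
    codes = parse_reason_codes(metadata.get("unknown_reason_codes", ""))
    return reason_code in codes


if __name__ == "__main__":
    metadata = {"unknown_reason_codes": "Reachability, U-VEX"}
    print(parse_reason_codes(metadata["unknown_reason_codes"]))
    print(exception_covers(metadata, "Reachability"))
```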
---
## Related Documentation
- [Unknowns API Reference](../api/score-proofs-reachability-api-reference.md#5-unknowns-api)
@@ -585,6 +631,6 @@ stella unknowns report --format email --send-to security-team@example.com
---
**Last Updated**: 2025-12-20
**Last Updated**: 2025-12-22
**Version**: 1.0.0
**Sprint**: 3500.0004.0004