
Reachability Drift Detection - Operations Guide

Module: Scanner
Version: 1.0
Last Updated: 2025-12-22

1. Prerequisites

1.1 Infrastructure Requirements

Component	Minimum	Recommended	Notes
CPU	4 cores	8 cores	For call graph extraction
Memory	4 GB	8 GB	Large projects need more
PostgreSQL	16+	16+	With RLS enabled
Valkey/Redis	7.0+	7.0+	For caching (optional)
.NET Runtime	10.0	10.0	Preview features enabled

1.2 Network Requirements

Direction	Endpoints	Notes
Inbound	Scanner API (8080)	API clients and load balancer health checks
Outbound	PostgreSQL (5432)	Database connections
Outbound	Valkey (6379)	Cache connections (optional)
Outbound	Signer service	For DSSE attestations

1.3 Dependencies

Scanner WebService deployed and healthy
PostgreSQL database with Scanner schema migrations applied
(Optional) Valkey cluster for caching
(Optional) Signer service for attestation signing

2. Configuration

2.1 Scanner Service Configuration

File: etc/scanner.yaml

scanner:
  reachability:
    # Enable reachability drift detection
    enabled: true

    # Languages to analyze (empty = all supported)
    languages:
      - dotnet
      - java
      - node
      - python
      - go

    # Call graph extraction options
    extraction:
      max_depth: 100
      max_nodes: 100000
      timeout_seconds: 300
      include_test_code: false
      include_vendored: false

    # Drift detection options
    drift:
      # Auto-compute on scan completion
      auto_compute: true
      # Base scan selection (previous, tagged, specific)
      base_selection: previous
      # Emit VEX candidates for unreachable sinks
      emit_vex_candidates: true

  storage:
    postgres:
      connection_string: "Host=localhost;Database=stellaops;Username=scanner;Password=${SCANNER_DB_PASSWORD}"
      schema: scanner
      pool_size: 20

  cache:
    valkey:
      enabled: true
      connection: "localhost:6379"
      bucket: "stella-callgraph"
      ttl_hours: 24
      circuit_breaker:
        failure_threshold: 5
        timeout_seconds: 30
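The `max_depth` and `max_nodes` limits bound how much of the call graph extraction explores. The sketch below shows one plausible reading of them. It assumes that `max_depth` counts call hops from an entrypoint and that exceeding `max_nodes` aborts extraction with the `ERR_GRAPH_TOO_LARGE` error listed in section 5.4. The function and exception names are hypothetical and do not describe the Scanner's implementation.

```python
from collections import deque


class GraphTooLargeError(Exception):
    """Hypothetical stand-in for ERR_GRAPH_TOO_LARGE."""


def bounded_extract(calls, entrypoints, max_depth=100, max_nodes=100_000):
    """Collect methods reachable from entrypoints while honoring both limits.

    calls: dict mapping a method name to the methods it calls (assumed shape).
    A node at max_depth hops is kept, but its callees are not expanded.
    """
    seen = set(entrypoints)
    queue = deque((ep, 0) for ep in entrypoints)
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached: keep node, skip its callees
        for callee in calls.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                if len(seen) > max_nodes:
                    # node limit: abort instead of returning a partial graph
                    raise GraphTooLargeError(f"more than {max_nodes} nodes")
                queue.append((callee, depth + 1))
    return seen
```

Lowering `max_depth` trims deep call chains, so more of the graph is dropped. Lowering `max_nodes` makes large projects fail fast rather than consume memory.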

2.2 Valkey Cache Configuration

# Valkey-specific settings
cache:
  valkey:
    enabled: true
    connection: "valkey-cluster.internal:6379"
    bucket: "stella-callgraph"
    ttl_hours: 24

    # Circuit breaker prevents cache storms
    circuit_breaker:
      failure_threshold: 5
      timeout_seconds: 30
      half_open_max_attempts: 3

    # Compression reduces memory usage
    compression:
      enabled: true
      algorithm: gzip
      level: fastest
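The three `circuit_breaker` settings follow the usual closed, open, and half-open pattern. This minimal sketch rests on three assumptions. First, `failure_threshold` consecutive failures open the breaker. Second, the cache is then bypassed for `timeout_seconds`. Third, up to `half_open_max_attempts` trial requests are let through after that. The class and method names are illustrative only.

```python
import time


class CircuitBreaker:
    """Closed -> open -> half-open breaker (a sketch; semantics assumed)."""

    def __init__(self, failure_threshold=5, timeout_seconds=30,
                 half_open_max_attempts=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout_seconds = timeout_seconds
        self.half_open_max_attempts = half_open_max_attempts
        self.clock = clock  # injectable so the state machine can be tested
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.half_open_attempts = 0

    def allow_request(self):
        """True if the caller may try Valkey; False means bypass the cache."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.timeout_seconds:
                return False
            self.state = "half_open"  # timeout elapsed: start probing
            self.half_open_attempts = 0
        if self.state == "half_open":
            if self.half_open_attempts >= self.half_open_max_attempts:
                return False  # probe budget used up, wait for an outcome
            self.half_open_attempts += 1
        return True

    def record_success(self):
        self.state = "closed"
        self.failures = 0

    def record_failure(self):
        if self.state == "half_open":
            self._trip()  # a failed probe reopens immediately
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()

    def _trip(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

The `clock` parameter lets tests advance time without waiting in real time.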

2.3 Policy Gate Configuration

File: etc/policy.yaml

smart_diff:
  gates:
    # Block on KEV becoming reachable
    - id: drift_block_kev
      condition: "delta_reachable > 0 AND is_kev = true"
      action: block
      severity: critical
      message: "Known Exploited Vulnerability now reachable"

    # Block on high-severity sink becoming reachable
    - id: drift_block_critical
      condition: "delta_reachable > 0 AND max_cvss >= 9.0"
      action: block
      severity: critical
      message: "Critical vulnerability now reachable"

    # Warn on any new reachable paths
    - id: drift_warn_new_paths
      condition: "delta_reachable > 0"
      action: warn
      severity: medium
      message: "New reachable paths detected"

    # Auto-allow mitigated paths
    - id: drift_allow_mitigated
      condition: "delta_unreachable > 0 AND delta_reachable = 0"
      action: allow
      auto_approve: true
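Several gates can match the same drift result. A newly reachable KEV, for instance, satisfies both `drift_block_kev` and `drift_warn_new_paths`. The sketch below evaluates the gates above against a drift summary and assumes that the strictest matching action wins (`block` over `warn` over `allow`). The precedence rule, the `Gate` type, and the hand-translated conditions are all assumptions. They are not the policy engine's documented behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed precedence: the strictest matching action wins.
ACTION_RANK = {"allow": 0, "warn": 1, "block": 2}


@dataclass
class Gate:
    gate_id: str
    condition: Callable[[dict], bool]
    action: str


# The gates from etc/policy.yaml, conditions hand-translated to Python.
GATES = [
    Gate("drift_block_kev",
         lambda d: d["delta_reachable"] > 0 and d["is_kev"], "block"),
    Gate("drift_block_critical",
         lambda d: d["delta_reachable"] > 0 and d["max_cvss"] >= 9.0, "block"),
    Gate("drift_warn_new_paths", lambda d: d["delta_reachable"] > 0, "warn"),
    Gate("drift_allow_mitigated",
         lambda d: d["delta_unreachable"] > 0 and d["delta_reachable"] == 0,
         "allow"),
]


def evaluate(drift: dict) -> Optional[Gate]:
    """Return the strictest matching gate (first one on ties), or None."""
    matched = [g for g in GATES if g.condition(drift)]
    if not matched:
        return None
    return max(matched, key=lambda g: ACTION_RANK[g.action])
```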

3. Deployment Modes

3.1 Standalone Deployment

# Run Scanner WebService with drift detection
docker run -d \
  --name scanner \
  -p 8080:8080 \
  -e SCANNER_DB_PASSWORD=secret \
  -v /etc/scanner:/etc/scanner:ro \
  stellaops/scanner:latest

# Verify health
curl http://localhost:8080/health
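Scripted rollouts often need to wait until the service answers before they send traffic. Below is a minimal standard-library sketch. `wait_for_healthy` is a hypothetical helper that treats any 2xx response as healthy. Point it at `/health` for the standalone container, or at `/health/ready` to mirror the Kubernetes readiness probe below.

```python
import time
import urllib.request


def wait_for_healthy(url, timeout=60.0, interval=2.0):
    """Poll url until it returns HTTP 2xx; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # Covers connection refused, HTTPError (non-2xx), and socket timeouts.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_healthy("http://localhost:8080/health", timeout=120)` waits up to two minutes after `docker run`.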

3.2 Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: scanner
  namespace: stellaops
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scanner
  template:
    metadata:
      labels:
        app: scanner
    spec:
      containers:
        - name: scanner
          image: stellaops/scanner:latest
          ports:
            - containerPort: 8080
          env:
            - name: SCANNER_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: scanner-secrets
                  key: db-password
          volumeMounts:
            - name: config
              mountPath: /etc/scanner
              readOnly: true
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: scanner-config

3.3 Air-Gapped Deployment

For air-gapped environments:

Disable external lookups:

scanner:
  reachability:
    offline_mode: true
    # No external advisory fetching

Pre-load call graph caches:

# Export from connected environment
stella cache export --type callgraph --output graphs.tar.gz

# Import in air-gapped environment
stella cache import --input graphs.tar.gz

Use local VEX sources:

excititor:
  sources:
    - type: local
      path: /data/vex-bundles/

4. Monitoring & Metrics

4.1 Key Metrics

Metric	Type	Description	Alert Threshold
`scanner_callgraph_extraction_duration_seconds`	histogram	Time to extract call graph	p99 > 300s
`scanner_callgraph_node_count`	gauge	Nodes in extracted graph	> 100,000
`scanner_reachability_analysis_duration_seconds`	histogram	BFS analysis time	p99 > 30s
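`scanner_drift_newly_reachable_total`	counter	Count of newly reachable sinks	increase > 0 (alert)
`scanner_drift_kev_reachable_total`	counter	Count of KEV sinks that became reachable	increase > 0 (alert)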
`scanner_drift_newly_unreachable_total`	counter	Count of mitigated sinks	(info)
`scanner_cache_hit_ratio`	gauge	Valkey cache hit rate	< 0.5
`scanner_cache_circuit_breaker_open`	gauge	Circuit breaker state	= 1 (alert)

4.2 Grafana Dashboard

Import dashboard JSON from: deploy/grafana/scanner-drift-dashboard.json

Key panels:

Drift detection rate over time
Newly reachable sinks by category
Call graph extraction latency
Cache hit/miss ratio
Circuit breaker state

4.3 Alert Rules

# Prometheus alerting rules
groups:
  - name: scanner-drift
    rules:
      - alert: KevBecameReachable
        expr: increase(scanner_drift_kev_reachable_total[5m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "KEV vulnerability became reachable"
          description: "A Known Exploited Vulnerability is now reachable from public entrypoints"

      - alert: HighDriftRate
        expr: increase(scanner_drift_newly_reachable_total[1h]) > 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High rate of new reachable vulnerabilities"

      - alert: CacheCircuitOpen
        expr: scanner_cache_circuit_breaker_open == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Valkey cache circuit breaker is open"
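A raw counter is cumulative, so counter thresholds such as "> 0" only make sense over a window. That is why the rules compare `increase()` or `rate()` instead. `rate()` reports a per-second average, so `rate(x[1h]) > 10` fires only above 36,000 events per hour, whereas `increase(x[1h]) > 10` fires above ten. The simplified sketch below shows this arithmetic, including how a counter reset (for example, after a scanner restart) is handled. Real Prometheus also extrapolates to the window edges, which this sketch omits.

```python
def increase(samples):
    """Approximate Prometheus increase() over counter samples in time order.

    A drop in value is treated as a counter reset, so the post-reset value
    counts as fresh increase. No extrapolation to window edges is done.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def rate(samples, window_seconds):
    """Per-second average over the window, which is what rate() reports."""
    return increase(samples) / window_seconds
```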

5. Troubleshooting

5.1 Call Graph Extraction Failures

Symptom: GRAPH_NOT_EXTRACTED error

Causes & Solutions:

Cause	Solution
Missing SDK/runtime	Install required SDK (.NET, Node.js, JDK)
Build errors in project	Fix compilation errors first
Timeout exceeded	Increase `extraction.timeout_seconds`
Memory exhaustion	Increase container memory limits
Unsupported language	Check language support matrix

Debugging:

# Check extraction logs
kubectl logs -n stellaops -f deployment/scanner | grep -i extraction

# Manual extraction test
stella scan callgraph \
  --project /path/to/project \
  --language dotnet \
  --verbose

5.2 Drift Detection Issues

Symptom: Drift not computed or incorrect results

Causes & Solutions:

Cause	Solution
No base scan available	Ensure previous scan exists
Different languages	Base and head must have same language
Graph digest unchanged	No material code changes were detected, so an empty drift result is expected
Cache stale	Clear Valkey cache for scan
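The "graph digest unchanged" case works because identical call graphs hash to the same value. Here is a sketch of an order-independent digest. It assumes the digest covers the de-duplicated, sorted edge list; the real input and encoding behind `scanner.call_graph_snapshots.graph_digest` may differ, and the function name is hypothetical.

```python
import hashlib


def graph_digest(edges):
    """Order-independent SHA-256 digest of a call graph's edge set.

    edges: iterable of (caller, callee) pairs. Sorting and de-duplicating
    keeps the digest stable when extraction emits edges in a different
    order. Assumes method names contain no "->" or newline characters.
    """
    h = hashlib.sha256()
    for caller, callee in sorted(set(edges)):
        h.update(f"{caller}->{callee}\n".encode("utf-8"))
    return "sha256:" + h.hexdigest()
```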

Debugging:

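# Check drift computation status
# ($TOKEN: a Bearer token for the Scanner API, see 8.2)
curl -H "Authorization: Bearer $TOKEN" \
  "http://scanner:8080/api/scanner/scans/{scanId}/drift"

# Force recomputation
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "http://scanner:8080/api/scanner/scans/{scanId}/compute-reachability" \
  -d '{"forceRecompute": true}'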

# View graph digests
psql -c "SELECT scan_id, graph_digest FROM scanner.call_graph_snapshots ORDER BY extracted_at DESC LIMIT 10"
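Conceptually, drift is a set difference between the sinks that are reachable in the base scan and those reachable in the head scan. This minimal sketch assumes each scan is a caller-to-callees map plus its entrypoints, and that reachability is a plain breadth-first search. The function names and data shapes are hypothetical.

```python
from collections import deque


def reachable(calls, entrypoints):
    """Breadth-first search from entrypoints over caller -> callees edges."""
    seen = set(entrypoints)
    queue = deque(entrypoints)
    while queue:
        for callee in calls.get(queue.popleft(), ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def compute_drift(base, head, sinks):
    """Compare sink reachability between two scans.

    base, head: (calls, entrypoints) tuples for the base and head scans.
    sinks: set of vulnerable methods to track.
    """
    base_hit = reachable(*base) & sinks
    head_hit = reachable(*head) & sinks
    return {
        "newly_reachable": sorted(head_hit - base_hit),
        "newly_unreachable": sorted(base_hit - head_hit),
    }
```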

5.3 Cache Problems

Symptom: Slow performance, cache misses, circuit breaker open

Solutions:

# Check Valkey connectivity
redis-cli -h valkey.internal ping

# Check circuit breaker state
curl "http://scanner:8080/health/ready" | jq '.checks.cache'

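# Clear specific scan cache (DEL does not expand wildcards, so find keys with SCAN first)
redis-cli -h valkey.internal --scan --pattern "stella-callgraph:{scanId}:*" \
  | xargs -r redis-cli -h valkey.internal DEL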

# Reset circuit breaker (restart scanner)
kubectl rollout restart -n stellaops deployment/scanner

5.4 Common Error Messages

Error	Meaning	Action
`ERR_GRAPH_TOO_LARGE`	Graph exceeds `max_nodes` (default 100,000)	Increase `max_nodes` or split the project
`ERR_EXTRACTION_TIMEOUT`	Analysis timed out	Increase timeout or reduce scope
`ERR_NO_ENTRYPOINTS`	No public entrypoints found	Check framework detection
`ERR_BASE_SCAN_MISSING`	Base scan not found	Specify valid `baseScanId`
`ERR_CACHE_UNAVAILABLE`	Valkey unreachable	Check network connectivity; the circuit breaker opens after repeated failures

6. Performance Tuning

6.1 Call Graph Extraction

scanner:
  reachability:
    extraction:
      # Exclude test code (reduces graph size)
      include_test_code: false

      # Exclude vendored dependencies
      include_vendored: false

      # Limit analysis depth
      max_depth: 50  # Default: 100

      # Parallel project analysis
      parallelism: 4
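`parallelism` caps how many projects are analyzed at the same time. The sketch below shows that bound with a thread pool. `analyze_projects` is hypothetical. The sketch assumes each worker holds one project's graph in memory, so peak memory grows roughly with `parallelism` and has to stay within the container limits from section 3.2.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_projects(projects, analyze, parallelism=4):
    """Run analyze(project) for every project, at most `parallelism` at once.

    pool.map returns results in input order, so output stays deterministic
    even though projects finish in any order. (Threads keep the sketch
    short; CPU-bound analysis in Python would use processes instead.)
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(analyze, projects))
```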

6.2 Caching Strategy

cache:
  valkey:
    # Longer TTL for stable projects
    ttl_hours: 72

    # Aggressive compression for large graphs
    compression:
      level: optimal  # vs 'fastest'

    # Larger connection pool
    pool_size: 20
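The `fastest` and `optimal` levels trade CPU time for payload size. This sketch uses Python's `gzip` module and assumes that `fastest` maps to zlib level 1 and `optimal` to level 6, zlib's default. The Scanner's exact mapping is an assumption, and so is the helper name.

```python
import gzip

# Assumed mapping of config level names onto zlib compression levels.
LEVELS = {"fastest": 1, "optimal": 6}


def compress_graph(payload: bytes, level: str = "fastest") -> bytes:
    """gzip-compress a serialized graph at the configured level.

    Raises KeyError for a level name not in LEVELS.
    """
    return gzip.compress(payload, compresslevel=LEVELS[level])
```

Comparing `len(compress_graph(data, "fastest"))` and `len(compress_graph(data, "optimal"))` on a representative serialized graph shows whether the extra CPU time pays off for your projects.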

6.3 Database Optimization

-- Ensure indexes exist (CREATE INDEX CONCURRENTLY cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_callgraph_scan_lang
  ON scanner.call_graph_snapshots(scan_id, language);

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_drift_head_scan
  ON scanner.reachability_drift_results(head_scan_id);

-- Vacuum after large imports
VACUUM ANALYZE scanner.call_graph_snapshots;
VACUUM ANALYZE scanner.reachability_drift_results;

7. Backup & Recovery

7.1 Database Backup

# Backup drift-related tables
pg_dump -h postgres.internal -U stellaops -d stellaops \
  -t scanner.call_graph_snapshots \
  -t scanner.reachability_results \
  -t scanner.reachability_drift_results \
  -t scanner.drifted_sinks \
  -t scanner.code_changes \
  > scanner_drift_backup.sql

7.2 Cache Recovery

# Export cache to file (if needed)
redis-cli -h valkey.internal --rdb /backup/callgraph-cache.rdb

# Cache is ephemeral - can be regenerated from database
# Recompute after cache loss:
stella scan recompute-reachability --all-pending

8. Security Considerations

8.1 Database Access

The Scanner service uses a dedicated PostgreSQL user with schema-limited permissions
Row-Level Security (RLS) enforces tenant isolation
Connection strings use secrets management (not plaintext)

8.2 API Authentication

All drift endpoints require a valid Bearer token
Scopes: scanner:read, scanner:write, scanner:admin
Rate limiting prevents abuse

8.3 Attestation Signing

Drift results can be DSSE-signed for audit trails
Signing keys are managed by the Signer service
Optional Rekor transparency logging
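DSSE does not sign the raw payload. It signs a pre-authentication encoding (PAE) of the payload type and payload, which binds the type to the signature. The sketch follows the PAE layout from the DSSE v1 specification but uses HMAC-SHA256 as a stand-in signer, so it runs without any key infrastructure. A real deployment signs with the Signer service's managed keys. The helper names and payload type here are hypothetical.

```python
import base64
import hashlib
import hmac


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: the exact bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(t), t, len(payload), payload)


def sign_envelope(payload: bytes, payload_type: str, key: bytes, keyid: str) -> dict:
    """Wrap payload in a DSSE envelope (HMAC stands in for a real signature)."""
    sig = hmac.new(key, pae(payload_type, payload), hashlib.sha256).digest()
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(env: dict, key: bytes) -> bool:
    """Recompute the PAE signature and compare in constant time."""
    payload = base64.b64decode(env["payload"])
    expected = hmac.new(key, pae(env["payloadType"], payload), hashlib.sha256).digest()
    return any(hmac.compare_digest(expected, base64.b64decode(s["sig"]))
               for s in env["signatures"])
```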

9. References

Architecture: docs/modules/scanner/reachability-drift.md
API Reference: docs/api/scanner-drift-api.md
PostgreSQL Guide: docs/operations/postgresql-guide.md
Air-Gap Operations: docs/operations/airgap-operations-runbook.md
Reachability Runbook: docs/operations/reachability-runbook.md
