# Runtime Linkage Verification - Operational Runbook

> **Audience:** Platform operators, SREs, security engineers
>
> **Related:** [Runtime Linkage Guide](../modules/scanner/guides/runtime-linkage.md), [Function Map V1 Contract](../contracts/function-map-v1.md)

## Overview

This runbook covers production deployment and operation of the runtime linkage verification system. The system uses eBPF probes to observe function calls at runtime and verifies those calls against declared function maps.

---
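## Prerequisites

- Linux kernel 5.8+ (the release that introduced `CAP_BPF` and `CAP_PERFMON`; eBPF CO-RE additionally depends on kernel BTF, below)
- `CAP_BPF` and `CAP_PERFMON` capabilities for the runtime agent
- BTF (BPF Type Format) enabled in the kernel config (`CONFIG_DEBUG_INFO_BTF=y`, exposed as `/sys/kernel/btf/vmlinux`)
- Stella runtime agent deployed as a DaemonSet or sidecar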
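
The prerequisites above can be checked on a node before rolling out the agent. The following is a minimal sketch, not part of the Stella tooling: the function names are hypothetical, and it relies on the Linux capability numbers `CAP_PERFMON` = 38 and `CAP_BPF` = 39. Because it reads the effective capability mask of the current shell from `/proc/<pid>/status`, run it as the same user and in the same container or pod as the agent.

```bash
#!/usr/bin/env bash
# Hypothetical preflight sketch for a node or pod that will run stella-runtime-agent.

# kernel_at_least RELEASE MAJOR MINOR
# Succeeds if a release string such as "5.15.0-91-generic" is >= MAJOR.MINOR.
kernel_at_least() {
  local release=$1 want_major=$2 want_minor=$3 major minor
  major=${release%%.*}
  minor=${release#*.}
  minor=${minor%%[!0-9]*}   # "15.0-91-generic" -> "15"
  (( major > want_major || (major == want_major && minor >= want_minor) ))
}

# cap_bit_set MASK_HEX BIT
# Succeeds if capability BIT is set in a hex mask such as the CapEff field.
cap_bit_set() {
  local mask_hex=$1 bit=$2
  (( (16#$mask_hex >> bit) & 1 ))
}

# preflight: prints one ok/FAIL line per check, returns non-zero if any check failed.
preflight() {
  local rc=0 release capeff bit name bit_name
  release=$(uname -r)
  if kernel_at_least "$release" 5 8; then
    echo "ok: kernel $release"
  else
    echo "FAIL: kernel $release is older than 5.8"; rc=1
  fi
  if [[ -e /sys/kernel/btf/vmlinux ]]; then
    echo "ok: kernel BTF present"
  else
    echo "FAIL: /sys/kernel/btf/vmlinux missing (CONFIG_DEBUG_INFO_BTF)"; rc=1
  fi
  # Effective capability mask of this shell process.
  capeff=$(awk '/^CapEff:/ { print $2 }' "/proc/$$/status")
  for bit_name in "38 CAP_PERFMON" "39 CAP_BPF"; do
    read -r bit name <<< "$bit_name"
    if cap_bit_set "$capeff" "$bit"; then
      echo "ok: $name"
    else
      echo "FAIL: $name not in effective set"; rc=1
    fi
  done
  return "$rc"
}

# Usage (file name hypothetical): source ./stella-preflight.sh && preflight
```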

---

## Deployment

### Runtime Agent Configuration

The Stella runtime agent (`stella-runtime-agent`) attaches eBPF probes based on function map predicates. It can be configured through environment variables or YAML:

```yaml
runtime_agent:
  observation_store:
    type: "memory"          # or "postgres", "valkey"
    retention_hours: 72
    max_batch_size: 1000
  probes:
    max_concurrent: 256
    attach_timeout_ms: 5000
    default_types: ["uprobe", "kprobe"]
  export:
    format: "ndjson"
    flush_interval_ms: 5000
    output_path: "/var/stella/observations/"
```

### Probe Selection Guidance

| Category | Probe Type | Use Case |
|----------|-----------|----------|
| Crypto functions | `uprobe` | OpenSSL/BoringSSL/libsodium calls |
| Network I/O | `kprobe` | `connect`/`sendto`/`recvfrom` syscalls |
| Auth flows | `uprobe` | PAM/LDAP/OAuth library calls |
| File access | `kprobe` | `open`/`read`/`write` on sensitive paths |
| TLS handshake | `uprobe` | `SSL_do_handshake`, TLS negotiation |

**Prioritization:**

1. Start with crypto and auth paths (highest security relevance)
2. Add network I/O for service mesh verification
3. Expand to file access for compliance requirements
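
Before declaring a `uprobe` target such as `SSL_do_handshake`, it can help to confirm that the library on the node actually exports that symbol. The helper below is a sketch, not a Stella command: `has_dynamic_symbol` is a hypothetical name, and it only inspects the dynamic symbol table through binutils `nm -D`, so functions that are not exported (static or internal symbols) will not be found even if the agent could attach to them by other means.

```bash
#!/usr/bin/env bash
# has_dynamic_symbol BINARY SYMBOL
# Succeeds if BINARY defines SYMBOL in its dynamic symbol table (nm -D).
# Versioned entries such as "SSL_do_handshake@@OPENSSL_3.0.0" also match.
has_dynamic_symbol() {
  local binary=$1 symbol=$2
  nm -D --defined-only "$binary" 2>/dev/null |
    awk -v sym="$symbol" '
      { name = $NF; sub(/@.*/, "", name) }  # drop any @VERSION / @@VERSION suffix
      name == sym { found = 1; exit }
      END { exit !found }'
}
```

For example, `has_dynamic_symbol /usr/lib/x86_64-linux-gnu/libssl.so.3 SSL_do_handshake && echo "target exported"`; the library path varies by distribution and base image.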

### Resource Overhead

Expected overhead per probe:

- CPU: ~0.1-0.5% per active uprobe, depending on call rate (per-call overhead ~100ns)
- Memory: ~2KB per attached probe, plus the observation buffer
- Disk: ~100 bytes per observation record (NDJSON)

**Recommended limits:**

- Max 256 concurrent probes per node
- Observation buffer: 64MB
- Flush interval: 5 seconds
- Retention: 72 hours (configurable)
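
The figures above can be turned into a rough sizing check. In this sketch (hypothetical helper names, integer arithmetic, uncompressed NDJSON, load assumed evenly spread), an assumed 1,000 records/second at ~100 bytes each, kept for 72 hours, needs about 25.9 GB of disk, and a 64 MiB buffer flushed every 5 seconds only overflows above roughly 134,000 records/second sustained.

```bash
#!/usr/bin/env bash
# Rough sizing helpers (hypothetical). Integer arithmetic, uncompressed NDJSON.

# disk_bytes RECORDS_PER_SEC BYTES_PER_RECORD RETENTION_HOURS
# Disk needed to retain every record for the whole retention period.
disk_bytes() {
  echo $(( $1 * $2 * 3600 * $3 ))
}

# max_rate_before_overflow BUFFER_BYTES BYTES_PER_RECORD FLUSH_SECONDS
# Highest sustained record rate whose data still fits in the buffer between flushes.
max_rate_before_overflow() {
  echo $(( $1 / ($2 * $3) ))
}

# Example with the figures above:
# disk_bytes 1000 100 72                                 -> 25920000000 (~25.9 GB)
# max_rate_before_overflow $((64 * 1024 * 1024)) 100 5   -> 134217
```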

---

## Operations

### Generating Function Maps

Run generation as part of the CI/CD pipeline, after SBOM generation:

```bash
# In CI, after SBOM generation
stella function-map generate \
  --sbom "${BUILD_DIR}/sbom.cdx.json" \
  --service "${SERVICE_NAME}" \
  --hot-functions "crypto/*" --hot-functions "net/*" --hot-functions "auth/*" \
  --min-rate 0.95 \
  --window 1800 \
  --build-id "${CI_BUILD_ID}" \
  --output "${BUILD_DIR}/function-map.json"
```

Store the function map alongside the container image (as an OCI referrer or in an artifact registry).

### Continuous Verification

Set up periodic verification (a cron job or a controller loop):

```bash
# Every 30 minutes, verify the last hour of observations.
# `date -d` requires GNU date.
stella function-map verify \
  --function-map /etc/stella/function-map.json \
  --from "$(date -d '1 hour ago' -Iseconds)" \
  --to "$(date -Iseconds)" \
  --format json --output /var/stella/verification/latest.json
```
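
A wrapper around the command above can derive both ends of the window from a single timestamp, which keeps `--from` and `--to` consistent and makes the window easy to test. This is a sketch with hypothetical function names (`verify_window`, `run_verification`); like the original command, it assumes GNU `date`.

```bash
#!/usr/bin/env bash
# verify_window NOW_EPOCH LOOKBACK_SECONDS
# Prints "FROM TO" as ISO-8601 UTC timestamps, both derived from NOW_EPOCH.
verify_window() {
  local now=$1 lookback=$2
  printf '%s %s\n' \
    "$(date -u -d "@$(( now - lookback ))" -Iseconds)" \
    "$(date -u -d "@${now}" -Iseconds)"
}

# Hypothetical cron target: verify the last hour of observations.
run_verification() {
  local from to
  read -r from to < <(verify_window "$(date +%s)" 3600)
  stella function-map verify \
    --function-map /etc/stella/function-map.json \
    --from "$from" \
    --to "$to" \
    --format json --output /var/stella/verification/latest.json
}
```

A crontab entry such as `*/30 * * * * flock -n /run/stella-verify.lock /usr/local/bin/stella-verify.sh` (script and lock paths are hypothetical; the script would source these functions and call `run_verification`) runs every 30 minutes, and util-linux `flock -n` skips a run while the previous one is still active. Because each run covers one hour, consecutive runs overlap by 30 minutes, so every observation is checked twice and a single missed run leaves no gap.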

### Monitoring

Key metrics to alert on:

| Metric | Threshold | Action |
|--------|-----------|--------|
| `observation_rate` | < 0.80 | Warning: coverage dropping |
| `observation_rate` | < 0.50 | Critical: significant coverage loss |
| `unexpected_symbols_count` | > 0 | Investigate: undeclared functions executing |
| `probe_attach_failures` | > 5% | Warning: probe attachment issues |
| `observation_buffer_full` | true | Critical: observations being dropped |
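
The threshold logic in the table can be expressed as small shell helpers, for example to gate a cron wrapper or CI job on a verification result. The helper names are hypothetical, and extracting `observation_rate` from a verification report is left out because the report schema is not described here. Values exactly at a threshold pass, matching the strict `<` and `>` comparisons in the table.

```bash
#!/usr/bin/env bash
# coverage_severity RATE
# Maps an observation_rate to OK / WARNING / CRITICAL using the table's thresholds.
coverage_severity() {
  awk -v r="$1" 'BEGIN {
    rate = r + 0
    if (rate < 0.50)      print "CRITICAL"
    else if (rate < 0.80) print "WARNING"
    else                  print "OK"
  }'
}

# probe_failure_severity FAILED ATTEMPTED
# WARNING when more than 5% of probe attachments failed; OK otherwise.
probe_failure_severity() {
  awk -v f="$1" -v n="$2" 'BEGIN {
    if (n + 0 > 0 && (f + 0) / (n + 0) > 0.05) print "WARNING"
    else print "OK"
  }'
}
```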

### Alert Configuration

```yaml
alerts:
  - name: "function-map-coverage-low"
    condition: observation_rate < 0.80
    severity: warning
    description: "Function map coverage below 80% for {service}"
    runbook: "Check probe attachment, verify no binary update without map regeneration"

  - name: "function-map-unexpected-calls"
    condition: unexpected_symbols_count > 0
    severity: info
    description: "Unexpected function calls detected in {service}"
    runbook: "Review unexpected symbols, regenerate function map if benign"

  - name: "function-map-probe-failures"
    condition: probe_attach_failure_rate > 0.05
    severity: warning
    description: "Probe attachment failure rate above 5%"
    runbook: "Check kernel version, verify BTF availability, check CAP_BPF"
```

---

## Performance Tuning

### High-Traffic Services

For services with >10K calls/second on probed functions:

1. **Sampling:** Configure an observation sampling rate:

   ```yaml
   probes:
     sampling_rate: 0.01  # 1% of calls
   ```

2. **Aggregation:** Use count-based observations instead of per-call records:

   ```yaml
   export:
     aggregation_window_ms: 1000  # Aggregate per second
   ```

3. **Selective probing:** Use `--hot-functions` to limit probing to critical paths only
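
Sampling interacts with coverage: a path that runs rarely may never be sampled inside a verification window, which shows up as a missing path and lowers `observation_rate`. Assuming each call is sampled independently with probability `sampling_rate` (the agent's actual sampling strategy is not specified here), the chance that a path called *n* times is never observed is `(1 - p)^n`. The hypothetical helpers below compute that probability and the number of calls needed for a given confidence.

```bash
#!/usr/bin/env bash
# miss_probability SAMPLING_RATE CALLS
# Probability that none of CALLS calls is sampled, assuming independent
# per-call sampling with probability SAMPLING_RATE: (1 - p)^n.
miss_probability() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", (1 - p) ^ n }'
}

# calls_needed SAMPLING_RATE CONFIDENCE
# Smallest n with 1 - (1 - p)^n >= CONFIDENCE, i.e. n >= ln(1 - c) / ln(1 - p).
# SAMPLING_RATE must be strictly between 0 and 1.
calls_needed() {
  awk -v p="$1" -v c="$2" 'BEGIN {
    n = log(1 - c) / log(1 - p)
    printf "%d\n", (n == int(n)) ? n : int(n) + 1
  }'
}
```

At `sampling_rate: 0.01`, a path called 100 times in the window goes unobserved about 37% of the time (`miss_probability 0.01 100` prints `0.366`), and it needs 299 calls for a 95% chance of at least one observation. Rarely called paths are therefore better candidates for `optional: true` than for strict coverage under heavy sampling.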

### Large Function Maps

For maps with >100 expected paths:

1. Tag paths by priority: `crypto` > `auth` > `network` > `general`
2. Mark low-priority paths as `optional: true`
3. Set per-tag minimum rates if needed
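
A per-tag rate is the observed fraction of expected paths within that tag. How tags, `optional` and per-tag minimums are encoded in the function map is defined by the Function Map V1 Contract and is not reproduced here; this sketch assumes a hypothetical flattened input of one `TAG OBSERVED` line per expected path (with `OBSERVED` as 1 or 0), for example extracted from a verification report.

```bash
#!/usr/bin/env bash
# tag_rates: reads hypothetical "TAG OBSERVED" lines (one per expected path,
# OBSERVED = 1 or 0) on stdin; prints "TAG RATE" per tag, sorted by tag.
tag_rates() {
  awk '{ total[$1]++; seen[$1] += $2 }
       END { for (t in total) printf "%s %.2f\n", t, seen[t] / total[t] }' | sort
}

# tags_below TAG=MIN ...: reads "TAG RATE" lines on stdin and prints every tag
# whose rate is below its minimum. Tags without a minimum are ignored.
tags_below() {
  awk -v spec="$*" '
    BEGIN {
      n = split(spec, pairs, " ")
      for (i = 1; i <= n; i++) { split(pairs[i], kv, "="); minimum[kv[1]] = kv[2] + 0 }
    }
    ($1 in minimum) && $2 + 0 < minimum[$1] { print $1 }'
}
```

For example, piping the output of `tag_rates` into `tags_below crypto=1.0 auth=0.95` lists the tags that fall short of those (illustrative) minimums.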

### Storage Optimization

For long-term observation storage:

1. Enable retention pruning: `pruneOlderThanAsync(72h)`
2. Compress archived observations (gzip NDJSON)
3. Use dedicated Postgres partitions by date for query performance
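
For file-based exports under `output_path`, compression and retention can be approximated with `find` and `gzip`, for example from an hourly cron job. This is a sketch, not the observation store's own pruning (`pruneOlderThanAsync`): the function name is hypothetical, and it assumes that an NDJSON file untouched for a while is no longer being written by the agent. Since `gzip` carries the original file's modification time over to the archive, an archive's age still reflects when its observations were written.

```bash
#!/usr/bin/env bash
# rotate_observations DIR COMPRESS_AFTER_MIN RETENTION_MIN
# 1. gzip *.ndjson files not modified for more than COMPRESS_AFTER_MIN minutes
#    (assumed to be closed by the agent).
# 2. delete *.ndjson.gz archives older than RETENTION_MIN minutes.
rotate_observations() {
  local dir=$1 compress_after=$2 retention=$3
  find "$dir" -type f -name '*.ndjson' -mmin "+$compress_after" -exec gzip -- {} +
  find "$dir" -type f -name '*.ndjson.gz' -mmin "+$retention" -delete
}

# Example: compress after 1 hour, keep archives for 72 hours.
# rotate_observations /var/stella/observations 60 $((72 * 60))
```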

---

## Incident Response

### Coverage Dropped After Deployment

1. Check whether the binary was updated without regenerating the function map
2. Verify probes are still attached: `stella observations query --summary`
3. Check for symbol changes (ASLR, different build)
4. Regenerate the function map from the new SBOM and redeploy
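
One way to check step 3 is to compare the dynamic symbols of the previously mapped binary with those of the new build. In this sketch (hypothetical helper names; binutils `nm`, coreutils `comm`), any printed name that the function map also declares can no longer be probed by that name. Only exported symbols are covered.

```bash
#!/usr/bin/env bash
# defined_symbols BINARY: sorted, unique dynamic symbol names defined by BINARY
# (version suffixes such as "@@OPENSSL_3.0.0" removed).
defined_symbols() {
  nm -D --defined-only "$1" 2>/dev/null |
    awk '{ name = $NF; sub(/@.*/, "", name); print name }' |
    sort -u
}

# removed_symbols OLD_BINARY NEW_BINARY: names defined by OLD but not by NEW.
removed_symbols() {
  comm -23 <(defined_symbols "$1") <(defined_symbols "$2")
}
```

For example, `removed_symbols ./libapp.so.previous /usr/lib/libapp.so` (both paths hypothetical) lists symbols that disappeared in the upgrade.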

### Unexpected Symbols Detected

1. Identify the unexpected functions from the verification report
2. Determine whether they are:
   - **Benign:** Dynamic dispatch, plugins, lazy-loaded libraries → add to map
   - **Suspicious:** Unexpected crypto usage, network calls → escalate to security team
3. If benign, regenerate the function map with broader patterns
4. If suspicious, correlate with vulnerability findings and open an incident
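
A first-pass triage can apply the categories above mechanically: crypto- and network-looking names go straight to escalation, and everything else gets a human review before the map is broadened. The patterns are illustrative assumptions (OpenSSL/BoringSSL `SSL_`/`EVP_`/`RSA_`/`BIO_` prefixes, libsodium's `crypto_` prefix, a few socket calls), not an exhaustive or authoritative list, and `triage_symbol` is a hypothetical name.

```bash
#!/usr/bin/env bash
# triage_symbol NAME
# Prints "escalate" for names matching the suspicious categories above,
# otherwise "review" (benign candidates still need a human check).
triage_symbol() {
  case "$1" in
    SSL_*|EVP_*|RSA_*|BIO_*|crypto_*)            echo escalate ;;  # crypto library calls
    connect|sendto|recvfrom|sendmsg|getaddrinfo) echo escalate ;;  # network calls
    *)                                           echo review ;;
  esac
}
```

For example, `while read -r sym; do echo "$(triage_symbol "$sym") $sym"; done < unexpected-symbols.txt`, where the input file (one symbol per line) is hypothetical.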

### Probe Attachment Failures

1. Check kernel version: `uname -r` (need 5.8+)
2. Verify BTF: `ls /sys/kernel/btf/vmlinux`
3. Check capabilities (from inside the agent's container): `capsh --print | grep -E 'cap_(bpf|perfmon)'`
4. Check binary paths: verify that `binary_path` in the function map matches the deployed binary
5. Check whether SELinux/AppArmor policies are blocking BPF operations
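
Step 4 can be automated by listing every `binary_path` in the function map and checking that each one exists. This sketch assumes `binary_path` appears as a string field somewhere in the map JSON (the exact nesting is defined by the Function Map V1 Contract), so the example pipeline in the final comment uses jq's recursive descent instead of a fixed path. For an agent running as a DaemonSet, a container's filesystem is usually reachable through `/proc/<pid>/root` of a process in that container, which the optional ROOT argument accounts for; `missing_paths` is a hypothetical name.

```bash
#!/usr/bin/env bash
# missing_paths [ROOT]
# Reads one path per line on stdin and prints the paths that do not exist
# under ROOT (default: the current root). Returns non-zero if any are missing.
missing_paths() {
  local root=${1:-} path rc=0
  while IFS= read -r path; do
    [[ -z $path ]] && continue
    if [[ ! -e "${root}${path}" ]]; then
      echo "missing: ${path}"
      rc=1
    fi
  done
  return "$rc"
}

# Example (assumes jq is installed; PID is a process in the target container):
# jq -r '.. | .binary_path? // empty' /etc/stella/function-map.json | sort -u \
#   | missing_paths "/proc/${PID}/root"
```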

---

## Air-Gap Considerations

For air-gapped environments:

1. **Bundle generation** (connected side):

   ```bash
   stella function-map generate --sbom app.cdx.json --service my-service --output fm.json
   # Package with observations
   tar czf linkage-bundle.tgz fm.json observations/*.ndjson
   ```

2. **Transfer** via approved media to the air-gapped environment

3. **Offline verification** (air-gapped side):

   ```bash
   stella function-map verify --function-map fm.json --offline --observations obs.ndjson
   ```

4. **Result export** for compliance reporting:

   ```bash
   stella function-map verify ... --format json --output report.json
   # Sign the report
   stella attest sign --input report.json --output report.dsse.json
   ```
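
Because the bundle crosses a trust boundary, it is worth recording checksums before transfer and checking them before verification. A sketch using coreutils `sha256sum` (helper names hypothetical): run `make_manifest` on the staging directory and add `SHA256SUMS` to the `tar` arguments on the connected side, then run `check_manifest` after extraction on the air-gapped side.

```bash
#!/usr/bin/env bash
# make_manifest DIR: write DIR/SHA256SUMS covering every other file under DIR.
make_manifest() {
  ( cd "$1" &&
    find . -type f ! -name SHA256SUMS -print0 | sort -z | xargs -0 -r sha256sum > SHA256SUMS )
}

# check_manifest DIR: fail if any file listed in DIR/SHA256SUMS is missing or altered.
check_manifest() {
  ( cd "$1" && sha256sum --quiet -c SHA256SUMS )
}
```

The offline command above takes a single `obs.ndjson`. Since NDJSON holds one record per line, the bundled files can be combined with `cat observations/*.ndjson > obs.ndjson`, assuming each file ends with a newline.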