finish off sprint advisories and sprints

This commit is contained in:
master
2026-01-24 00:12:43 +02:00
parent 726d70dc7f
commit c70e83719e
266 changed files with 46699 additions and 1328 deletions


@@ -0,0 +1,232 @@
# Runtime Linkage Verification - Operational Runbook
> **Audience:** Platform operators, SREs, security engineers
> **Related:** [Runtime Linkage Guide](../modules/scanner/guides/runtime-linkage.md), [Function Map V1 Contract](../contracts/function-map-v1.md)
## Overview
This runbook covers production deployment and operation of the runtime linkage verification system. The system uses eBPF probes to observe function calls and verifies them against declared function maps.
---
## Prerequisites
- Linux kernel 5.8+ (for eBPF CO-RE support)
- `CAP_BPF` and `CAP_PERFMON` capabilities for the runtime agent
- BTF (BPF Type Format) enabled in kernel config
- Stella runtime agent deployed as a DaemonSet or sidecar
---
## Deployment
### Runtime Agent Configuration
The Stella runtime agent (`stella-runtime-agent`) attaches eBPF probes based on function map predicates. Configure it through environment variables or YAML:
```yaml
runtime_agent:
  observation_store:
    type: "memory"  # or "postgres", "valkey"
    retention_hours: 72
    max_batch_size: 1000
  probes:
    max_concurrent: 256
    attach_timeout_ms: 5000
    default_types: ["uprobe", "kprobe"]
  export:
    format: "ndjson"
    flush_interval_ms: 5000
    output_path: "/var/stella/observations/"
```
### Probe Selection Guidance
| Category | Probe Type | Use Case |
|----------|-----------|----------|
| Crypto functions | `uprobe` | OpenSSL/BoringSSL/libsodium calls |
| Network I/O | `kprobe` | connect/sendto/recvfrom syscalls |
| Auth flows | `uprobe` | PAM/LDAP/OAuth library calls |
| File access | `kprobe` | open/read/write on sensitive paths |
| TLS handshake | `uprobe` | SSL_do_handshake, TLS negotiation |
**Prioritization:**
1. Start with crypto and auth paths (highest security relevance)
2. Add network I/O for service mesh verification
3. Expand to file access for compliance requirements
### Resource Overhead
Expected overhead per probe:
- CPU: ~0.1-0.5% per active uprobe (per-call overhead ~100ns)
- Memory: ~2KB per attached probe + observation buffer
- Disk: ~100 bytes per observation record (NDJSON)
**Recommended limits:**
- Max 256 concurrent probes per node
- Observation buffer: 64MB
- Flush interval: 5 seconds
- Retention: 72 hours (configurable)
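
The figures above allow a rough capacity estimate. The following sketch is illustrative arithmetic only: the function names and the sustained observation rate are assumptions, the 64MB buffer is read as 64 MiB, and the 100-byte record size and 72-hour retention are taken from the figures above.

```python
def estimate_observation_storage(obs_per_second, retention_hours=72, bytes_per_record=100):
    """Return (records, bytes) retained at a sustained observation rate."""
    records = int(obs_per_second * retention_hours * 3600)
    return records, records * bytes_per_record


def buffer_capacity_records(buffer_bytes=64 * 1024 * 1024, bytes_per_record=100):
    """Return how many records of the given size fit in the observation buffer."""
    return buffer_bytes // bytes_per_record
```

At a hypothetical 100 observations per second, this gives about 25.9 million records (roughly 2.6 GB of uncompressed NDJSON) over 72 hours, and a 64 MiB buffer holds about 671,000 records.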
---
## Operations
### Generating Function Maps
Run function map generation in the CI/CD pipeline, after SBOM generation:
```bash
# In CI after SBOM generation
stella function-map generate \
  --sbom "${BUILD_DIR}/sbom.cdx.json" \
  --service "${SERVICE_NAME}" \
  --hot-functions "crypto/*" --hot-functions "net/*" --hot-functions "auth/*" \
  --min-rate 0.95 \
  --window 1800 \
  --build-id "${CI_BUILD_ID}" \
  --output "${BUILD_DIR}/function-map.json"
```
Store the function map alongside the container image (OCI referrer or artifact registry).
### Continuous Verification
Set up periodic verification (cron or controller loop):
```bash
# Every 30 minutes, verify the last hour of observations
stella function-map verify \
  --function-map /etc/stella/function-map.json \
  --from "$(date -d '1 hour ago' -Iseconds)" \
  --to "$(date -Iseconds)" \
  --format json --output /var/stella/verification/latest.json
```
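
For a controller loop rather than cron, the same invocation can be assembled programmatically. This is a minimal sketch, not part of the Stella tooling: `build_verify_command` is a hypothetical helper, and it only uses the flags shown in the command above.

```python
from datetime import datetime, timedelta, timezone


def build_verify_command(function_map, output, now=None, window=timedelta(hours=1)):
    """Build the argv for one verification run over the trailing window."""
    now = now or datetime.now(timezone.utc)
    start = now - window
    return [
        "stella", "function-map", "verify",
        "--function-map", function_map,
        # ISO 8601 with seconds precision, the same shape as `date -Iseconds`
        "--from", start.isoformat(timespec="seconds"),
        "--to", now.isoformat(timespec="seconds"),
        "--format", "json",
        "--output", output,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` so that a non-zero exit status raises an exception in the calling loop.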
### Monitoring
Key metrics to alert on:
| Metric | Threshold | Action |
|--------|-----------|--------|
| `observation_rate` | < 0.80 | Warning: coverage dropping |
| `observation_rate` | < 0.50 | Critical: significant coverage loss |
| `unexpected_symbols_count` | > 0 | Investigate: undeclared functions executing |
| `probe_attach_failures` | > 5% | Warning: probe attachment issues |
| `observation_buffer_full` | true | Critical: observations being dropped |
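
The thresholds in the table can be expressed as a small evaluation function. This is a sketch under assumptions: the metrics are assumed to arrive as a dictionary keyed by the metric names above, `probe_attach_failures` is assumed to be a fraction between 0 and 1, and `evaluate_metrics` is a hypothetical name.

```python
def evaluate_metrics(metrics):
    """Return a list of (severity, message) tuples for one metrics snapshot."""
    findings = []
    rate = metrics.get("observation_rate")
    if rate is not None:
        # Check the critical threshold first so only one coverage finding is raised.
        if rate < 0.50:
            findings.append(("critical", "significant coverage loss"))
        elif rate < 0.80:
            findings.append(("warning", "coverage dropping"))
    if metrics.get("unexpected_symbols_count", 0) > 0:
        findings.append(("investigate", "undeclared functions executing"))
    if metrics.get("probe_attach_failures", 0.0) > 0.05:
        findings.append(("warning", "probe attachment issues"))
    if metrics.get("observation_buffer_full", False):
        findings.append(("critical", "observations being dropped"))
    return findings
```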
### Alert Configuration
```yaml
alerts:
  - name: "function-map-coverage-low"
    condition: observation_rate < 0.80
    severity: warning
    description: "Function map coverage below 80% for {service}"
    runbook: "Check probe attachment, verify no binary update without map regeneration"
  - name: "function-map-unexpected-calls"
    condition: unexpected_symbols_count > 0
    severity: info
    description: "Unexpected function calls detected in {service}"
    runbook: "Review unexpected symbols, regenerate function map if benign"
  - name: "function-map-probe-failures"
    condition: probe_attach_failure_rate > 0.05
    severity: warning
    description: "Probe attachment failure rate above 5%"
    runbook: "Check kernel version, verify BTF availability, check CAP_BPF"
---
## Performance Tuning
### High-Traffic Services
For services with >10K calls/second on probed functions:
1. **Sampling:** Configure observation sampling rate:
   ```yaml
   probes:
     sampling_rate: 0.01  # 1% of calls
   ```
2. **Aggregation:** Use count-based observations instead of per-call:
   ```yaml
   export:
     aggregation_window_ms: 1000  # Aggregate per second
   ```
3. **Selective probing:** Use `--hot-functions` to limit to critical paths only
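
To illustrate what count-based aggregation and sampling mean for downstream consumers, the sketch below collapses per-call records into per-window counts and scales a sampled count back to an estimate. The record fields `symbol` and `timestamp_ms` and both function names are assumptions made for illustration, not the agent's actual schema.

```python
from collections import Counter


def aggregate_observations(observations, window_ms=1000):
    """Collapse per-call records into counts per (symbol, window start)."""
    counts = Counter()
    for obs in observations:
        # Align each timestamp to the start of its aggregation window.
        window_start = obs["timestamp_ms"] - obs["timestamp_ms"] % window_ms
        counts[(obs["symbol"], window_start)] += 1
    return counts


def estimate_total_calls(sampled_count, sampling_rate):
    """Scale a sampled count back to an estimated total call count."""
    return round(sampled_count / sampling_rate)
```

With a 1% sampling rate, a sampled count of 120 corresponds to an estimate of about 12,000 calls. The estimate becomes noisy for rarely called functions, which is one reason to sample only high-traffic paths.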
### Large Function Maps
For maps with >100 expected paths:
1. Tag paths by priority: `crypto` > `auth` > `network` > `general`
2. Mark low-priority paths as `optional: true`
3. Set per-tag minimum rates if needed
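
The interaction between tags, optional paths, and per-tag minimum rates can be sketched as follows. This assumes the observation rate is the fraction of required (non-optional) expected paths that were observed, and that each path carries hypothetical `tag`, `optional`, and `observed` fields; the real function map schema is defined in the Function Map V1 Contract and may differ.

```python
from collections import defaultdict


def coverage_by_tag(paths):
    """Return {tag: observed fraction} over required (non-optional) paths."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for path in paths:
        if path.get("optional", False):
            continue  # optional paths do not count against coverage
        totals[path["tag"]] += 1
        if path["observed"]:
            hits[path["tag"]] += 1
    return {tag: hits[tag] / totals[tag] for tag in totals}


def tags_below_minimum(paths, min_rates, default_min=0.95):
    """Return the sorted tags whose coverage falls below their minimum rate."""
    rates = coverage_by_tag(paths)
    return sorted(tag for tag, rate in rates.items()
                  if rate < min_rates.get(tag, default_min))
```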
### Storage Optimization
For long-term observation storage:
1. Enable retention pruning: `pruneOlderThanAsync(72h)`
2. Compress archived observations (gzip NDJSON)
3. Use dedicated Postgres partitions by date for query performance
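
Compressing archived NDJSON needs nothing beyond the standard library. The helper below is a minimal sketch with a hypothetical name; it assumes archived observations are plain `.ndjson` files on disk and keeps the original unless told otherwise.

```python
import gzip
import shutil
from pathlib import Path


def compress_ndjson(src, remove_original=False):
    """Gzip one NDJSON file next to the original and return the new path."""
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream in chunks, no full read into memory
    if remove_original:
        src.unlink()
    return dst
```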
---
## Incident Response
### Coverage Dropped After Deployment
1. Check if binary was updated without regenerating the function map
2. Verify probes are still attached: `stella observations query --summary`
3. Check for symbol or offset changes (different build, stripped or renamed symbols)
4. Regenerate function map from new SBOM and redeploy
### Unexpected Symbols Detected
1. Identify the unexpected functions from the verification report
2. Determine if they are:
- **Benign:** Dynamic dispatch, plugins, lazy-loaded libraries → add to map
- **Suspicious:** Unexpected crypto usage, network calls → escalate to security team
3. If benign, regenerate function map with broader patterns
4. If suspicious, correlate with vulnerability findings and open incident
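
A first-pass triage of the unexpected symbols can be automated before a human reviews them. The sketch below is an assumption-laden illustration: the sensitive patterns reuse the `--hot-functions` globs from the generation example, the `known_benign` patterns are supplied by the operator, and the symbol naming scheme is hypothetical. It does not replace the manual judgment described in the steps above.

```python
from fnmatch import fnmatch

# Assumed sensitive categories, borrowed from the --hot-functions examples.
SENSITIVE_PATTERNS = ("crypto/*", "net/*", "auth/*")


def triage_unexpected(symbols, known_benign=(), sensitive=SENSITIVE_PATTERNS):
    """Bucket unexpected symbols into benign, suspicious, or needs-review."""
    result = {"benign": [], "suspicious": [], "review": []}
    for symbol in symbols:
        if any(fnmatch(symbol, pattern) for pattern in known_benign):
            result["benign"].append(symbol)
        elif any(fnmatch(symbol, pattern) for pattern in sensitive):
            result["suspicious"].append(symbol)
        else:
            result["review"].append(symbol)
    return result
```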
### Probe Attachment Failures
1. Check kernel version: `uname -r` (need 5.8+)
2. Verify BTF: `ls /sys/kernel/btf/vmlinux`
3. Check capabilities: `capsh --print | grep bpf`
4. Check binary paths: verify `binary_path` in function map matches deployed binary
5. Check for SELinux/AppArmor blocking BPF operations
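
The first two checks can be scripted as a preflight step. This is a minimal sketch with hypothetical function names; it covers only the kernel version and BTF checks, and leaves capabilities, binary paths, and LSM policy to the manual steps above.

```python
import os
import platform
import re


def kernel_at_least(release, minimum=(5, 8)):
    """Return True if a `uname -r` style release string meets the minimum version."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def preflight(btf_path="/sys/kernel/btf/vmlinux"):
    """Check the kernel version and BTF availability on the current node."""
    return {
        "kernel_ok": kernel_at_least(platform.release()),
        "btf_present": os.path.exists(btf_path),
    }
```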
---
## Air-Gap Considerations
For air-gapped environments:
1. **Bundle generation** (connected side):
   ```bash
   stella function-map generate --sbom app.cdx.json --service my-service --output fm.json
   # Package with observations
   tar czf linkage-bundle.tgz fm.json observations/*.ndjson
   ```
2. **Transfer** via approved media to air-gapped environment
3. **Offline verification** (air-gapped side):
   ```bash
   stella function-map verify --function-map fm.json --offline --observations obs.ndjson
   ```
4. **Result export** for compliance reporting:
   ```bash
   stella function-map verify ... --format json --output report.json
   # Sign the report
   stella attest sign --input report.json --output report.dsse.json
   ```