finish off sprint advisories and sprints

2026-01-24 00:12:43 +02:00
parent 726d70dc7f
commit c70e83719e
266 changed files with 46699 additions and 1328 deletions
--- a/docs/runbooks/runtime-linkage-ops.md
+++ b/docs/runbooks/runtime-linkage-ops.md
@@ -0,0 +1,232 @@
+# Runtime Linkage Verification - Operational Runbook
+
+> **Audience:** Platform operators, SREs, security engineers
+> **Related:** [Runtime Linkage Guide](../modules/scanner/guides/runtime-linkage.md), [Function Map V1 Contract](../contracts/function-map-v1.md)
+
+## Overview
+
+This runbook covers production deployment and operation of the runtime linkage verification system. The system uses eBPF probes to observe function calls and verifies them against declared function maps.
+
+---
+
+## Prerequisites
+
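+- Linux kernel 5.8 or later (the first release with `CAP_BPF` and `CAP_PERFMON`, and recent enough for eBPF CO-RE)
+- `CAP_BPF` and `CAP_PERFMON` capabilities for the runtime agent
+- BTF (BPF Type Format) enabled in the kernel config (`CONFIG_DEBUG_INFO_BTF=y`), so that `/sys/kernel/btf/vmlinux` exists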
+- Stella runtime agent deployed as a DaemonSet or sidecar
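+
+As a sketch of how the capability requirement might be met on Kubernetes, the fragment below grants only `BPF` and `PERFMON` to the agent container instead of running it privileged. The DaemonSet name, labels, and image are hypothetical placeholders. It assumes the agent process runs as root inside the container and that the container runtime is recent enough to recognize both capabilities. Depending on what the agent probes, further settings (for example host PID access) may also be needed.
+
+```yaml
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: stella-runtime-agent              # hypothetical name
+spec:
+  selector:
+    matchLabels:
+      app: stella-runtime-agent
+  template:
+    metadata:
+      labels:
+        app: stella-runtime-agent
+    spec:
+      containers:
+        - name: agent
+          image: registry.example.com/stella-runtime-agent:latest  # placeholder image
+          securityContext:
+            privileged: false
+            capabilities:
+              drop: ["ALL"]
+              add: ["BPF", "PERFMON"]     # Kubernetes omits the CAP_ prefix
+```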
+
+---
+
+## Deployment
+
+### Runtime Agent Configuration
+
+The Stella runtime agent (`stella-runtime-agent`) attaches eBPF probes based on function map predicates. Configure it through environment variables or YAML:
+
+```yaml
+runtime_agent:
+  observation_store:
+    type: "memory"  # or "postgres", "valkey"
+    retention_hours: 72
+    max_batch_size: 1000
+  probes:
+    max_concurrent: 256
+    attach_timeout_ms: 5000
+    default_types: ["uprobe", "kprobe"]
+  export:
+    format: "ndjson"
+    flush_interval_ms: 5000
+    output_path: "/var/stella/observations/"
+```
+
+### Probe Selection Guidance
+
+| Category | Probe Type | Use Case |
+|----------|-----------|----------|
+| Crypto functions | `uprobe` | OpenSSL/BoringSSL/libsodium calls |
+| Network I/O | `kprobe` | connect/sendto/recvfrom syscalls |
+| Auth flows | `uprobe` | PAM/LDAP/OAuth library calls |
+| File access | `kprobe` | open/read/write on sensitive paths |
+| TLS handshake | `uprobe` | SSL_do_handshake, TLS negotiation |
+
+**Prioritization:**
+1. Start with crypto and auth paths (highest security relevance)
+2. Add network I/O for service mesh verification
+3. Expand to file access for compliance requirements
+
+### Resource Overhead
+
+Expected overhead (approximate):
+- CPU: ~0.1-0.5% per active uprobe (per-call overhead ~100ns)
+- Memory: ~2KB per attached probe + observation buffer
+- Disk: ~100 bytes per observation record (NDJSON)
+
+**Recommended limits:**
+- Max 256 concurrent probes per node
+- Observation buffer: 64MB
+- Flush interval: 5 seconds
+- Retention: 72 hours (configurable)
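+
+To make these figures concrete, a rough sizing estimate can be derived from them. Here $r$ is an assumed sustained observation rate in records per second (not a value taken from this runbook), and the per-probe and per-record sizes are the approximate numbers listed above:
+
+$$
+\begin{aligned}
+\text{probe memory} &\approx 256 \times 2\,\text{KB} = 512\,\text{KB per node} \\
+\text{disk over 72 h} &\approx r \times 100\,\text{B} \times 259{,}200\,\text{s} \approx 25.9\,\text{MB} \times r
+\end{aligned}
+$$
+
+For example, $r = 100$ records/s yields roughly 2.6 GB of uncompressed NDJSON per node over the retention window. The probe memory figure excludes the observation buffer.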
+
+---
+
+## Operations
+
+### Generating Function Maps
+
+Generate the function map in the CI/CD pipeline, immediately after SBOM generation:
+
+```bash
+# In CI after SBOM generation
+stella function-map generate \
+  --sbom "${BUILD_DIR}/sbom.cdx.json" \
+  --service "${SERVICE_NAME}" \
+  --hot-functions "crypto/*" --hot-functions "net/*" --hot-functions "auth/*" \
+  --min-rate 0.95 \
+  --window 1800 \
+  --build-id "${CI_BUILD_ID}" \
+  --output "${BUILD_DIR}/function-map.json"
+```
+
+Store the function map alongside the container image (OCI referrer or artifact registry).
+
+### Continuous Verification
+
+Set up periodic verification (cron or controller loop):
+
+```bash
+# Every 30 minutes, verify the last hour of observations
+stella function-map verify \
+  --function-map /etc/stella/function-map.json \
+  --from "$(date -d '1 hour ago' -Iseconds)" \
+  --to "$(date -Iseconds)" \
+  --format json --output /var/stella/verification/latest.json
+```
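+
+On Kubernetes, the same schedule could be expressed as a CronJob. This is a minimal sketch rather than a tested manifest: the job name and image are hypothetical, the volumes that provide `/etc/stella` and `/var/stella` are omitted, and the image is assumed to ship GNU `date` (the `-d '1 hour ago'` form is not portable to every `date` implementation).
+
+```yaml
+apiVersion: batch/v1
+kind: CronJob
+metadata:
+  name: function-map-verify               # hypothetical name
+spec:
+  schedule: "*/30 * * * *"                # every 30 minutes
+  concurrencyPolicy: Forbid               # skip a run while the previous one is still active
+  jobTemplate:
+    spec:
+      template:
+        spec:
+          restartPolicy: OnFailure
+          containers:
+            - name: verify
+              image: registry.example.com/stella-cli:latest  # placeholder image
+              command: ["/bin/sh", "-c"]
+              args:
+                - |
+                  stella function-map verify \
+                    --function-map /etc/stella/function-map.json \
+                    --from "$(date -d '1 hour ago' -Iseconds)" \
+                    --to "$(date -Iseconds)" \
+                    --format json --output /var/stella/verification/latest.json
+```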
+
+### Monitoring
+
+Key metrics to alert on:
+
+| Metric | Threshold | Action |
+|--------|-----------|--------|
+| `observation_rate` | < 0.80 | Warning: coverage dropping |
+| `observation_rate` | < 0.50 | Critical: significant coverage loss |
+| `unexpected_symbols_count` | > 0 | Investigate: undeclared functions executing |
+| `probe_attach_failures` | > 5% | Warning: probe attachment issues |
+| `observation_buffer_full` | true | Critical: observations being dropped |
+
+### Alert Configuration
+
+```yaml
+alerts:
+  - name: "function-map-coverage-low"
+    condition: observation_rate < 0.80
+    severity: warning
+    description: "Function map coverage below 80% for {service}"
+    runbook: "Check probe attachment, verify no binary update without map regeneration"
+
+  - name: "function-map-unexpected-calls"
+    condition: unexpected_symbols_count > 0
+    severity: info
+    description: "Unexpected function calls detected in {service}"
+    runbook: "Review unexpected symbols, regenerate function map if benign"
+
+  - name: "function-map-probe-failures"
+    condition: probe_attach_failure_rate > 0.05
+    severity: warning
+    description: "Probe attachment failure rate above 5%"
+    runbook: "Check kernel version, verify BTF availability, check CAP_BPF"
+```
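+
+The monitoring table also lists two critical thresholds that the configuration above does not cover. Assuming the same alert schema, that `critical` is an accepted severity, and that a boolean metric can be compared with `==`, they might be added along these lines. The alert names and runbook text are illustrative only.
+
+```yaml
+alerts:
+  - name: "function-map-coverage-critical"      # hypothetical name
+    condition: observation_rate < 0.50
+    severity: critical
+    description: "Function map coverage below 50% for {service}"
+    runbook: "Treat as significant coverage loss; follow 'Coverage Dropped After Deployment'"
+
+  - name: "function-map-observations-dropped"   # hypothetical name
+    condition: observation_buffer_full == true  # comparison syntax is an assumption
+    severity: critical
+    description: "Observation buffer full in {service}; observations are being dropped"
+    runbook: "Reduce probe load via sampling or aggregation; see Performance Tuning"
+```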
+
+---
+
+## Performance Tuning
+
+### High-Traffic Services
+
+For services with >10K calls/second on probed functions:
+
+1. **Sampling:** Configure observation sampling rate:
+   ```yaml
+   probes:
+     sampling_rate: 0.01  # 1% of calls
+   ```
+
+2. **Aggregation:** Emit aggregated call counts instead of one observation per call:
+   ```yaml
+   export:
+     aggregation_window_ms: 1000  # Aggregate per second
+   ```
+
+3. **Selective probing:** Pass `--hot-functions` when generating the function map to restrict probing to critical paths
+
+### Large Function Maps
+
+For maps with >100 expected paths:
+
+1. Tag paths by priority: `crypto` > `auth` > `network` > `general`
+2. Mark low-priority paths as `optional: true`
+3. Set per-tag minimum rates if needed
+
+### Storage Optimization
+
+For long-term observation storage:
+
+1. Enable retention pruning: `pruneOlderThanAsync(72h)`
+2. Compress archived observations (gzip NDJSON)
+3. Use dedicated Postgres partitions by date for query performance
+
+---
+
+## Incident Response
+
+### Coverage Dropped After Deployment
+
+1. Check whether the binary was updated without regenerating the function map
+2. Verify probes are still attached: `stella observations query --summary`
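+3. Check for symbol changes between builds (renamed, inlined, or stripped functions); ASLR by itself does not move uprobes, which attach by file offset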
+4. Regenerate function map from new SBOM and redeploy
+
+### Unexpected Symbols Detected
+
+1. Identify the unexpected functions from the verification report
+2. Determine if they are:
+   - **Benign:** Dynamic dispatch, plugins, lazy-loaded libraries → add to map
+   - **Suspicious:** Unexpected crypto usage, network calls → escalate to security team
+3. If benign, regenerate function map with broader patterns
+4. If suspicious, correlate with vulnerability findings and open incident
+
+### Probe Attachment Failures
+
+1. Check the kernel version: `uname -r` (5.8+ required)
+2. Verify BTF: `ls /sys/kernel/btf/vmlinux`
+3. Check capabilities from within the agent's context: `capsh --print | grep -iE 'bpf|perfmon'`
+4. Check binary paths: verify `binary_path` in function map matches deployed binary
+5. Check for SELinux/AppArmor blocking BPF operations
+
+---
+
+## Air-Gap Considerations
+
+For air-gapped environments:
+
+1. **Bundle generation** (connected side):
+   ```bash
+   stella function-map generate --sbom app.cdx.json --service my-service --output fm.json
+   # Package with observations
+   tar czf linkage-bundle.tgz fm.json observations/*.ndjson
+   ```
+
+2. **Transfer** via approved media to the air-gapped environment
+
+3. **Offline verification** (air-gapped side):
+   ```bash
+   stella function-map verify --function-map fm.json --offline --observations obs.ndjson
+   ```
+
+4. **Result export** for compliance reporting:
+   ```bash
+   stella function-map verify ... --format json --output report.json
+   # Sign the report
+   stella attest sign --input report.json --output report.dsse.json
+   ```