# Runtime Linkage Verification - Operational Runbook

> **Audience:** Platform operators, SREs, security engineers
> **Related:** [Runtime Linkage Guide](../modules/scanner/guides/runtime-linkage.md), [Function Map V1 Contract](../contracts/function-map-v1.md)

## Overview

This runbook covers production deployment and operation of the runtime linkage verification system. The system uses eBPF probes to observe function calls and verifies them against declared function maps.

---

## Prerequisites

- Linux kernel 5.8+ (eBPF CO-RE support; also the first release that provides `CAP_BPF` and `CAP_PERFMON`)
- `CAP_BPF` and `CAP_PERFMON` capabilities for the runtime agent
- BTF (BPF Type Format) enabled in the kernel config (`CONFIG_DEBUG_INFO_BTF=y`)
- Stella runtime agent deployed as a DaemonSet or sidecar

---

## Deployment

### Runtime Agent Configuration

The Stella runtime agent (`stella-runtime-agent`) attaches eBPF probes based on function map predicates. Configure it through environment variables or YAML:

```yaml
runtime_agent:
  observation_store:
    type: "memory"  # or "postgres", "valkey"
    retention_hours: 72
    max_batch_size: 1000
  probes:
    max_concurrent: 256
    attach_timeout_ms: 5000
    default_types: ["uprobe", "kprobe"]
  export:
    format: "ndjson"
    flush_interval_ms: 5000
    output_path: "/var/stella/observations/"
```

### Probe Selection Guidance

| Category | Probe Type | Use Case |
|----------|-----------|----------|
| Crypto functions | `uprobe` | OpenSSL/BoringSSL/libsodium calls |
| Network I/O | `kprobe` | connect/sendto/recvfrom syscalls |
| Auth flows | `uprobe` | PAM/LDAP/OAuth library calls |
| File access | `kprobe` | open/read/write on sensitive paths |
| TLS handshake | `uprobe` | SSL_do_handshake, TLS negotiation |

**Prioritization:**
1. Start with crypto and auth paths (highest security relevance)
2. Add network I/O for service mesh verification
3. Expand to file access for compliance requirements

### Resource Overhead

Expected overhead per probe:
- CPU: ~0.1-0.5% per active uprobe (per-call overhead ~100ns)
- Memory: ~2KB per attached probe + observation buffer
- Disk: ~100 bytes per observation record (NDJSON)

**Recommended limits:**
- Max 256 concurrent probes per node
- Observation buffer: 64MB
- Flush interval: 5 seconds
- Retention: 72 hours (configurable)
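
As a rough sizing aid, the per-probe and per-record figures above can be turned into a back-of-the-envelope estimate. The sketch below is illustrative only: the function names are hypothetical, the inputs are the approximate figures listed above, and the result ignores sampling, aggregation, and compression.

```bash
# Hypothetical sizing helpers based on the approximate figures above:
# ~2 KB per attached probe and ~100 bytes per NDJSON observation record.

# Memory for attached probes plus the observation buffer, in KB.
# Usage: probe_memory_kb <probe_count> <buffer_mb>
probe_memory_kb() {
  local probes="$1" buffer_mb="$2"
  echo $(( probes * 2 + buffer_mb * 1024 ))
}

# Uncompressed disk used over the retention window, in bytes.
# Usage: retention_disk_bytes <observations_per_sec> <retention_hours>
retention_disk_bytes() {
  local rate="$1" hours="$2"
  echo $(( rate * 100 * 3600 * hours ))
}

probe_memory_kb 256 64        # recommended limits: prints 66048
retention_disk_bytes 1000 72  # 1,000 observations/s for 72 h: prints 25920000000
```

With the recommended limits, probe memory is dominated by the 64MB buffer (about 64.5MB in total), while a sustained 1,000 observations per second would need roughly 26GB of uncompressed disk over 72 hours.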

---

## Operations

### Generating Function Maps

Generate the function map in the CI/CD pipeline, after SBOM generation:

```bash
# In CI after SBOM generation
stella function-map generate \
  --sbom "${BUILD_DIR}/sbom.cdx.json" \
  --service "${SERVICE_NAME}" \
  --hot-functions "crypto/*" --hot-functions "net/*" --hot-functions "auth/*" \
  --min-rate 0.95 \
  --window 1800 \
  --build-id "${CI_BUILD_ID}" \
  --output "${BUILD_DIR}/function-map.json"
```

Store the function map alongside the container image (OCI referrer or artifact registry).

### Continuous Verification

Set up periodic verification (cron or controller loop):

```bash
# Every 30 minutes, verify the last hour of observations
stella function-map verify \
  --function-map /etc/stella/function-map.json \
  --from "$(date -d '1 hour ago' -Iseconds)" \
  --to "$(date -Iseconds)" \
  --format json --output /var/stella/verification/latest.json
```
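
One way to run the command above on a schedule is a small wrapper invoked from cron. This is a sketch rather than a shipped script: the function names and the install path in the crontab comment are assumptions, and only the `stella function-map verify` flags already shown in this runbook are used. It relies on GNU `date`.

```bash
# Hypothetical cron wrapper around the verification command above.
# Example crontab entry (every 30 minutes), assuming an illustrative path:
#   */30 * * * * /usr/local/bin/stella-verify.sh

# Print "<from> <to>" ISO-8601 timestamps for a window ending now (GNU date).
verification_window() {
  local lookback="${1:-1 hour ago}"
  printf '%s %s\n' "$(date -d "$lookback" -Iseconds)" "$(date -Iseconds)"
}

# Run one verification pass over the last hour of observations.
run_verification() {
  local from to
  read -r from to <<<"$(verification_window '1 hour ago')"
  stella function-map verify \
    --function-map /etc/stella/function-map.json \
    --from "$from" \
    --to "$to" \
    --format json --output /var/stella/verification/latest.json
}
```

Appending a `run_verification` call at the end turns this into the script that cron executes. Because the window is one hour and the schedule is 30 minutes, consecutive runs overlap, so an observation near a window boundary is covered by two runs.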

### Monitoring

Key metrics to alert on:

| Metric | Threshold | Action |
|--------|-----------|--------|
| `observation_rate` | < 0.80 | Warning: coverage dropping |
| `observation_rate` | < 0.50 | Critical: significant coverage loss |
| `unexpected_symbols_count` | > 0 | Investigate: undeclared functions executing |
| `probe_attach_failure_rate` | > 0.05 | Warning: probe attachment issues |
| `observation_buffer_full` | true | Critical: observations being dropped |
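
The two `observation_rate` thresholds in the table can be expressed as a small classifier, for example in a script that post-processes verification output. The helper name is hypothetical and how the rate is obtained is left out; only the threshold logic is shown.

```bash
# Hypothetical helper: map an observation_rate value to the alert levels in
# the table above (< 0.50 critical, < 0.80 warning, otherwise ok).
# awk is used because bash arithmetic has no floating-point support.
classify_observation_rate() {
  awk -v rate="$1" 'BEGIN {
    if (rate < 0.50)      print "critical"
    else if (rate < 0.80) print "warning"
    else                  print "ok"
  }'
}

classify_observation_rate 0.92   # prints ok
classify_observation_rate 0.75   # prints warning
classify_observation_rate 0.42   # prints critical
```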

### Alert Configuration

```yaml
alerts:
  - name: "function-map-coverage-low"
    condition: observation_rate < 0.80
    severity: warning
    description: "Function map coverage below 80% for {service}"
    runbook: "Check probe attachment; confirm the binary was not updated without regenerating the function map"

  - name: "function-map-unexpected-calls"
    condition: unexpected_symbols_count > 0
    severity: info
    description: "Unexpected function calls detected in {service}"
    runbook: "Review unexpected symbols, regenerate function map if benign"

  - name: "function-map-probe-failures"
    condition: probe_attach_failure_rate > 0.05
    severity: warning
    description: "Probe attachment failure rate above 5%"
    runbook: "Check kernel version, verify BTF availability, check CAP_BPF"
```

---

## Performance Tuning

### High-Traffic Services

For services with >10K calls/second on probed functions:

1. **Sampling:** Configure observation sampling rate:
   ```yaml
   probes:
     sampling_rate: 0.01  # 1% of calls
   ```

2. **Aggregation:** Use count-based observations instead of per-call:
   ```yaml
   export:
     aggregation_window_ms: 1000  # Aggregate per second
   ```

3. **Selective probing:** Use `--hot-functions` to restrict probing to critical paths
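
To choose a `sampling_rate`, it can help to work backwards from the number of observations per second you are willing to record. The helper below is a hypothetical sketch of that arithmetic; the target budget is an assumption you supply, not a value defined by the agent.

```bash
# Hypothetical helper: pick a sampling_rate that keeps roughly
# <target_obs_per_sec> observations per second on a function called
# <calls_per_sec> times per second. The result is capped at 1.0 (no sampling).
sampling_rate_for() {
  awk -v calls="$1" -v target="$2" 'BEGIN {
    rate = (calls > 0) ? target / calls : 1
    if (rate > 1) rate = 1
    printf "%.4f\n", rate
  }'
}

sampling_rate_for 10000 100   # prints 0.0100, the 1% shown above
sampling_rate_for 50 100      # prints 1.0000: low traffic, keep every call
```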

### Large Function Maps

For maps with >100 expected paths:

1. Tag paths by priority: `crypto` > `auth` > `network` > `general`
2. Mark low-priority paths as `optional: true`
3. Set per-tag minimum rates if needed

### Storage Optimization

For long-term observation storage:

1. Enable retention pruning: `pruneOlderThanAsync(72h)`
2. Compress archived observations (gzip NDJSON)
3. Use dedicated Postgres partitions by date for query performance
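
For the compression step, a minimal sketch using `find` and `gzip` is shown below. The function name and the directory layout are assumptions; the age threshold is there so that files the agent may still be writing are left alone.

```bash
# Hypothetical archival helper. Compresses NDJSON files not modified for more
# than <min_age_minutes> minutes; gzip replaces each file with <name>.ndjson.gz.
# The agent's own retention pruning is separate and not shown here.
compress_old_observations() {
  local dir="$1" min_age_minutes="$2"
  find "$dir" -type f -name '*.ndjson' -mmin "+${min_age_minutes}" \
    -exec gzip -9 {} +
}

# Example: compress files untouched for more than 24 hours.
# compress_old_observations /var/stella/observations 1440
```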

---

## Incident Response

### Coverage Dropped After Deployment

1. Check if binary was updated without regenerating the function map
2. Verify probes are still attached: `stella observations query --summary`
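3. Check for symbol changes from a different build (renamed, inlined, or stripped functions); ASLR alone shifts load addresses but does not change symbols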
4. Regenerate function map from new SBOM and redeploy

### Unexpected Symbols Detected

1. Identify the unexpected functions from the verification report
2. Determine if they are:
   - **Benign:** Dynamic dispatch, plugins, lazy-loaded libraries → add to map
   - **Suspicious:** Unexpected crypto usage, network calls → escalate to security team
3. If benign, regenerate function map with broader patterns
4. If suspicious, correlate with vulnerability findings and open incident

### Probe Attachment Failures

1. Check kernel version: `uname -r` (5.8+ required)
2. Verify BTF: `ls /sys/kernel/btf/vmlinux`
3. Check capabilities from within the agent's container or process context: `capsh --print | grep -E 'cap_bpf|cap_perfmon'`
4. Check binary paths: verify `binary_path` in function map matches deployed binary
5. Check for SELinux/AppArmor blocking BPF operations
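
The first two checks are easy to script as a preflight. The sketch below uses hypothetical function names; it parses the `uname -r` release string and tests for the BTF file, and does not attempt the capability or LSM checks.

```bash
# Hypothetical preflight helpers for the kernel and BTF checks above.

# Succeeds if a kernel release string (e.g. "5.15.0-91-generic") is at least
# <major>.<minor>.
kernel_at_least() {
  local release="$1" want_major="$2" want_minor="$3"
  local major minor
  major="${release%%.*}"        # text before the first dot
  minor="${release#*.}"         # text after the first dot
  minor="${minor%%[!0-9]*}"     # keep only the leading digits
  (( major > want_major || (major == want_major && minor >= want_minor) ))
}

# Report failed prerequisites on the current node; prints nothing if both pass.
preflight() {
  kernel_at_least "$(uname -r)" 5 8 \
    || echo "FAIL: kernel $(uname -r) is older than 5.8"
  [[ -r /sys/kernel/btf/vmlinux ]] \
    || echo "FAIL: /sys/kernel/btf/vmlinux not found (BTF unavailable)"
}
```

Running `preflight` on a node before deploying the agent surfaces the two most common causes of attachment failure without needing the agent's logs.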

---

## Air-Gap Considerations

For air-gapped environments:

1. **Bundle generation** (connected side):
   ```bash
   stella function-map generate --sbom app.cdx.json --service my-service --output fm.json
   # Package with observations
   tar czf linkage-bundle.tgz fm.json observations/*.ndjson
   ```

2. **Transfer** via approved media to air-gapped environment

3. **Offline verification** (air-gapped side):
   ```bash
   stella function-map verify --function-map fm.json --offline --observations obs.ndjson
   ```

4. **Result export** for compliance reporting:
   ```bash
   stella function-map verify ... --format json --output report.json
   # Sign the report
   stella attest sign --input report.json --output report.dsse.json
   ```
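
For the transfer step, it may be worth recording a checksum on the connected side and checking it after the media has crossed the gap. This is an optional addition, not part of the `stella` CLI: the helper names are hypothetical and `sha256sum` comes from GNU coreutils.

```bash
# Hypothetical integrity helpers for the transfer step. File names follow
# the bundle example above.

# Connected side: write <bundle>.sha256 next to the bundle.
write_bundle_checksum() {
  local bundle="$1"
  ( cd "$(dirname "$bundle")" \
      && sha256sum "$(basename "$bundle")" > "$(basename "$bundle").sha256" )
}

# Air-gapped side: succeed only if the bundle matches its recorded checksum.
verify_bundle_checksum() {
  local bundle="$1"
  ( cd "$(dirname "$bundle")" \
      && sha256sum --check --quiet "$(basename "$bundle").sha256" )
}

# Example:
# write_bundle_checksum linkage-bundle.tgz     # before transfer
# verify_bundle_checksum linkage-bundle.tgz    # after transfer
```

The checksum file stores only the bundle's base name, so the pair can be copied to any directory on the air-gapped side and still verify.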