# eBPF Reachability Architecture
## System Overview
The eBPF reachability system captures kernel-level events to provide cryptographic proof of runtime behavior. It uses Linux eBPF (extended Berkeley Packet Filter) with CO-RE (Compile Once, Run Everywhere) for portable deployment across kernel versions.
## Design Principles
1. **Minimal Kernel Footprint**: eBPF programs perform only essential filtering and data capture
2. **User-Space Enrichment**: Complex lookups (symbols, containers, SBOMs) happen in user space
3. **Deterministic Output**: Same inputs produce byte-identical NDJSON output
4. **Chain of Custody**: Every evidence chunk is cryptographically signed and linked
## Component Architecture
### Kernel-Space Components
#### Ring Buffer (`BPF_MAP_TYPE_RINGBUF`)
- Single shared buffer for all event types (default 256 KB)
- Multi-producer, single-consumer design shared across all CPUs
- No blocking in the kernel: when the buffer is full, `bpf_ringbuf_reserve()` returns `NULL` and the event is dropped
#### Tracepoint Probes
| Probe | Event Type | Purpose |
|-------|------------|---------|
| `tracepoint/syscalls/sys_enter_openat` | File access | Track which files are opened |
| `tracepoint/sched/sched_process_exec` | Process execution | Track binary invocations |
| `tracepoint/sock/inet_sock_set_state` | TCP state | Track network connections |
#### Uprobe Probes
| Probe | Library | Purpose |
|-------|---------|---------|
| `uprobe/libc.so:connect` | glibc/musl | Outbound network connections |
| `uprobe/libc.so:accept` | glibc/musl | Inbound connections |
| `uprobe/libssl.so:SSL_read` | OpenSSL | TLS traffic monitoring |
| `uprobe/libssl.so:SSL_write` | OpenSSL | TLS traffic monitoring |
#### BPF Maps for Filtering
```c
// Cgroup filter for container targeting
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, u64);   // cgroup_id
    __type(value, u8);  // 1 = include
} cgroup_filter SEC(".maps");

// Namespace filter for multi-tenant isolation
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 256);
    __type(key, u64);   // namespace inode
    __type(value, u8);  // 1 = include
} namespace_filter SEC(".maps");
```
### User-Space Components
#### CoreProbeLoader
Manages eBPF program lifecycle:
- Loads compiled `.bpf.o` files via libbpf
- Attaches probes to tracepoints/uprobes
- Configures BPF maps for filtering
- Handles graceful detachment and cleanup
#### EventParser
Parses binary events from ring buffer:
- Fixed-size header with event type discriminator
- Type-specific payload parsing
- Timestamp normalization (boot time to wall clock)
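A minimal sketch of the header parsing and timestamp normalization steps listed above. The header layout, the `event_type` values, and both function names are hypothetical, since the actual wire format is not specified in this document. The boot-to-wall offset is assumed to be sampled once in user space, for example as the difference between `CLOCK_REALTIME` and `CLOCK_BOOTTIME`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical discriminator values, one per probe family. */
enum event_type {
    EVENT_FILE_OPEN = 1,
    EVENT_PROCESS_EXEC = 2,
    EVENT_TCP_STATE = 3,
};

/* Hypothetical fixed-size header (24 bytes, no padding). */
struct event_header {
    uint64_t boot_time_ns;  /* timestamp on the boot-time clock */
    uint64_t cgroup_id;
    uint32_t pid;
    uint16_t type;          /* event type discriminator */
    uint16_t payload_len;   /* bytes following the header */
};

/* Returns 0 on success, -1 if the record is truncated or the type is unknown. */
int parse_event_header(const uint8_t *buf, size_t len, struct event_header *out)
{
    if (len < sizeof(*out))
        return -1;
    /* memcpy avoids unaligned reads from the ring buffer record. */
    memcpy(out, buf, sizeof(*out));
    if (out->type < EVENT_FILE_OPEN || out->type > EVENT_TCP_STATE)
        return -1;
    if (len - sizeof(*out) < out->payload_len)
        return -1;
    return 0;
}

/* Converts a boot-time timestamp to wall-clock time using a sampled offset. */
uint64_t boot_ns_to_wall_ns(uint64_t boot_ns, uint64_t boot_to_wall_offset_ns)
{
    return boot_ns + boot_to_wall_offset_ns;
}
```

Validating `payload_len` against the record length before touching the payload keeps the type-specific parsers from reading past the end of a record.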
#### CgroupContainerResolver
Maps kernel cgroup IDs to container identities:
- Parses `/proc/{pid}/cgroup` for container runtime paths
- Supports containerd, Docker, CRI-O path formats
- Caches mappings with configurable TTL
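As an illustration of the path parsing described above, the following sketch pulls a container ID out of a cgroup path by looking for a run of exactly 64 lowercase hex characters. This heuristic and the function name `extract_container_id` are assumptions for illustration; the resolver's real matching rules for each runtime are not shown in this document.

```c
#include <stddef.h>
#include <string.h>

#define CONTAINER_ID_LEN 64

static int is_lower_hex(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
}

/*
 * Copies the last run of exactly 64 lowercase hex characters in `path`
 * into `out` and NUL-terminates it. Returns 0 on success, -1 if the path
 * contains no such run (for example a host process).
 */
int extract_container_id(const char *path, char out[CONTAINER_ID_LEN + 1])
{
    const char *match = NULL;
    const char *p = path;

    while (*p) {
        if (!is_lower_hex(*p)) {
            p++;
            continue;
        }
        const char *start = p;
        while (is_lower_hex(*p))
            p++;
        if ((size_t)(p - start) == CONTAINER_ID_LEN)
            match = start;
    }
    if (!match)
        return -1;
    memcpy(out, match, CONTAINER_ID_LEN);
    out[CONTAINER_ID_LEN] = '\0';
    return 0;
}
```

Taking the last match rather than the first means a nested path resolves to the innermost container, which is usually the scope the process actually runs in.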
#### EnhancedSymbolResolver
Resolves addresses to human-readable symbols:
- Parses `/proc/{pid}/maps` for ASLR offsets
- Reads ELF symbol tables (`.symtab`, `.dynsym`)
- Optional DWARF debug info for line numbers
- LRU cache with bounded memory usage
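The ASLR step above can be sketched as follows: parse one line of `/proc/{pid}/maps`, then translate a runtime address into an offset within the mapped file. The struct and function names are hypothetical, and the sketch stops at the file offset; matching that offset against ELF symbol values (which also involves the segment's program header) is left out.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct map_entry {
    uint64_t start;        /* first address of the mapping */
    uint64_t end;          /* one past the last address */
    uint64_t file_offset;  /* offset of the mapping within the file */
    char perms[5];         /* e.g. "r-xp" */
    char path[256];        /* empty for anonymous mappings */
};

/* Parses "start-end perms offset dev inode [path]". Returns 0 or -1. */
int parse_maps_line(const char *line, struct map_entry *m)
{
    m->path[0] = '\0';
    /* %255s stops at whitespace, so paths containing spaces are truncated. */
    int n = sscanf(line,
                   "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %*s %*s %255s",
                   &m->start, &m->end, m->perms, &m->file_offset, m->path);
    return n >= 4 ? 0 : -1;
}

/* Translates a runtime address into a file offset if it lies inside m. */
int addr_to_file_offset(const struct map_entry *m, uint64_t addr,
                        uint64_t *file_off)
{
    if (addr < m->start || addr >= m->end)
        return -1;
    *file_off = addr - m->start + m->file_offset;
    return 0;
}
```

Because the result is independent of where the loader placed the mapping, it is a natural cache key: the same binary yields the same offsets across processes and restarts.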
#### RuntimeEventEnricher
Decorates events with container and SBOM metadata:
- Container ID and image digest correlation
- SBOM component (PURL) lookup
- Graceful degradation on missing metadata
#### RuntimeEvidenceNdjsonWriter
Produces deterministic NDJSON output:
- Canonical JSON serialization (sorted keys, no whitespace variance)
- Rolling BLAKE3 hash for content verification
- Size and time-based rotation with callbacks
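To make the determinism requirement concrete, here is a minimal sketch of canonical serialization for a flat record: fields are sorted by key in byte order and emitted with no optional whitespace, so field order on input does not affect the output bytes. The `field` struct and `write_canonical_line` are hypothetical names, values are assumed to be strings that need no JSON escaping, and the rolling BLAKE3 hash is omitted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct field {
    const char *key;
    const char *value;  /* assumed to need no JSON escaping in this sketch */
};

static int cmp_field(const void *a, const void *b)
{
    const struct field *fa = a;
    const struct field *fb = b;
    return strcmp(fa->key, fb->key);
}

/*
 * Sorts `fields` by key in place and writes one compact JSON object followed
 * by '\n' into `out`. Returns the number of bytes written (excluding the
 * terminating NUL), or -1 if `out` is too small.
 */
int write_canonical_line(struct field *fields, size_t n, char *out, size_t cap)
{
    size_t used = 0;

    qsort(fields, n, sizeof(*fields), cmp_field);
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, cap - used, "%s\"%s\":\"%s\"",
                         i == 0 ? "{" : ",", fields[i].key, fields[i].value);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    int w = snprintf(out + used, cap - used, "%s}\n", n == 0 ? "{" : "");
    if (w < 0 || (size_t)w >= cap - used)
        return -1;
    return (int)(used + (size_t)w);
}
```

Two records with the same fields in different order serialize to identical lines, which is the property a content hash over the chunk depends on.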
#### EvidenceChunkFinalizer
Signs and links evidence chunks:
- Creates in-toto statements with chunk metadata
- Requests DSSE signatures via Signer service
- Submits to Rekor transparency log
- Maintains chain state (previous_chunk_id linkage)
## Data Flow
```
1. Kernel Event
├─► Tracepoint/Uprobe fires
│ └─► BPF program captures event data
│ └─► Filter by cgroup/namespace (optional)
│ └─► Submit to ring buffer
2. Ring Buffer Drain
├─► EventParser reads binary data
│ └─► Deserialize to typed event struct
│ └─► Validate event integrity
3. Resolution & Enrichment
├─► CgroupResolver: cgroup_id → container_id
├─► SymbolResolver: address → symbol name
├─► StateProvider: container_id → image_ref
├─► DigestResolver: image_ref → image_digest
└─► SbomProvider: image_digest → purls[]
4. Serialization
├─► RuntimeEvidenceNdjsonWriter
│ ├─► Canonical JSON serialization
│ ├─► Append to current chunk file
│ └─► Update rolling hash
5. Rotation & Signing
├─► Size/time threshold reached
│ └─► Close current chunk
│ └─► ChunkFinalizer
│ ├─► Create in-toto statement
│ ├─► Sign with DSSE
│ ├─► Submit to Rekor
│ └─► Link to previous chunk
6. Verification
└─► stella signals verify-chain
├─► Parse DSSE envelopes
├─► Verify signatures
├─► Check chain linkage
└─► Validate time monotonicity
```
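The last two checks in step 6 (chain linkage and time monotonicity) can be sketched as below. Apart from `previous_chunk_id`, the field names, the `chain_status` enum, and `verify_chain` are hypothetical; DSSE parsing and signature verification are assumed to have already succeeded for every chunk.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct chunk_meta {
    const char *chunk_id;
    const char *previous_chunk_id;  /* NULL for the first chunk */
    uint64_t start_ns;              /* time of first event in the chunk */
    uint64_t end_ns;                /* time of last event in the chunk */
};

enum chain_status {
    CHAIN_OK = 0,
    CHAIN_BROKEN_LINK,
    CHAIN_TIME_REGRESSION,
};

/*
 * Walks chunks in order. On failure, stores the index of the offending
 * chunk in *bad_index and returns the reason.
 */
enum chain_status verify_chain(const struct chunk_meta *c, size_t n,
                               size_t *bad_index)
{
    for (size_t i = 0; i < n; i++) {
        if (c[i].end_ns < c[i].start_ns) {
            *bad_index = i;
            return CHAIN_TIME_REGRESSION;
        }
        if (i == 0)
            continue;
        if (!c[i].previous_chunk_id ||
            strcmp(c[i].previous_chunk_id, c[i - 1].chunk_id) != 0) {
            *bad_index = i;
            return CHAIN_BROKEN_LINK;
        }
        if (c[i].start_ns < c[i - 1].end_ns) {
            *bad_index = i;
            return CHAIN_TIME_REGRESSION;
        }
    }
    return CHAIN_OK;
}
```

Reporting the index of the first failing chunk lets an operator see exactly where a chain was cut or reordered.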
## Performance Characteristics
### Kernel-Space
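- Ring buffer absorbs bursts; once it is full, new events are dropped rather than blocking the kernel (see Ring Buffer Overflow under Failure Modes)
- In-kernel filtering reduces user-space processing
- BTF-based CO-RE relocations are resolved at load time, so field reads carry no per-event lookup cost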
### User-Space
| Operation | Target Latency |
|-----------|---------------|
| Cached symbol lookup | < 1ms p99 |
| Uncached symbol lookup | < 10ms p99 |
| Container enrichment | < 10ms p99 |
| NDJSON write | < 1ms p99 |
### Throughput
- Target: 100,000 events/second sustained
- Rate limiting available for resource-constrained environments
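This document does not say how rate limiting is implemented. One common choice is a token bucket, sketched below under that assumption; the struct, the function names, and the parameters are all hypothetical. The caller passes in a monotonic timestamp so the logic stays deterministic.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

struct token_bucket {
    uint64_t rate_per_sec;  /* tokens added per second */
    uint64_t burst;         /* maximum tokens held */
    uint64_t tokens;        /* tokens currently available */
    uint64_t last_ns;       /* time of the last refill */
};

void bucket_init(struct token_bucket *b, uint64_t rate_per_sec,
                 uint64_t burst, uint64_t now_ns)
{
    b->rate_per_sec = rate_per_sec;
    b->burst = burst;
    b->tokens = burst;
    b->last_ns = now_ns;
}

/* Returns 1 if the event may be processed, 0 if it should be shed. */
int bucket_allow(struct token_bucket *b, uint64_t now_ns)
{
    uint64_t elapsed = now_ns > b->last_ns ? now_ns - b->last_ns : 0;
    /* Whole seconds and remainder are scaled separately to avoid overflow
     * at realistic rates. */
    uint64_t refill = (elapsed / NSEC_PER_SEC) * b->rate_per_sec
                    + (elapsed % NSEC_PER_SEC) * b->rate_per_sec / NSEC_PER_SEC;

    if (refill > 0) {
        uint64_t room = b->burst - b->tokens;
        b->tokens += refill < room ? refill : room;
        /* Fractional progress toward the next token is discarded, so the
         * limiter errs on the side of admitting slightly fewer events. */
        b->last_ns = now_ns;
    }
    if (b->tokens == 0)
        return 0;
    b->tokens--;
    return 1;
}
```

With `rate_per_sec` set to the 100,000 events/second target and a modest `burst`, short spikes pass through while sustained overload is shed before it reaches enrichment.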
## Memory Budget
| Component | Default | Configurable |
|-----------|---------|--------------|
| Ring buffer | 256 KB | Yes |
| Symbol cache | 100,000 entries | Yes |
| Container cache | 5 min TTL | Yes |
| Write buffer | 64 KB | Yes |
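When the ring buffer size is made configurable, note that the kernel only accepts a `BPF_MAP_TYPE_RINGBUF` size that is a power of two and a multiple of the page size. A small helper can normalize a requested value before the map is created; `ringbuf_size_for` is a hypothetical name, and the page size is passed in rather than queried so the sketch stays self-contained.

```c
#include <stdint.h>

/*
 * Returns the smallest power of two that is >= requested_bytes and
 * >= page_size. Returns 0 if page_size is not a power of two or the
 * result would not fit in 32 bits.
 */
uint32_t ringbuf_size_for(uint32_t requested_bytes, uint32_t page_size)
{
    if (page_size == 0 || (page_size & (page_size - 1)) != 0)
        return 0;

    uint32_t size = page_size;
    while (size < requested_bytes) {
        if (size > UINT32_MAX / 2)
            return 0;
        size *= 2;
    }
    return size;
}
```

With 4 KiB pages, the 256 KB default is already valid, while a request for 300,000 bytes would be rounded up to 512 KiB.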
## Failure Modes
### Ring Buffer Overflow
- **Symptom**: Events dropped, warning logged
- **Mitigation**: Increase buffer size or enable rate limiting
### Symbol Resolution Failure
- **Symptom**: Address shown as `addr:0x{hex}`
- **Mitigation**: Ensure debug symbols available or accept address-only evidence
### Container Resolution Failure
- **Symptom**: `container_id = "unknown:{cgroup_id}"`
- **Mitigation**: Verify Zastava integration, check cgroup path format support
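The two fallback identifiers shown in the symptoms above (`addr:0x{hex}` and `unknown:{cgroup_id}`) could be produced by helpers like the following. The function names are hypothetical, and the exact formatting (unpadded lowercase hex, decimal cgroup ID) is an assumption not confirmed by this document.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes "addr:0x<hex>" for an unresolved address. Returns 0, or -1 if the
 * buffer is too small. */
int format_unresolved_symbol(uint64_t addr, char *out, size_t cap)
{
    int n = snprintf(out, cap, "addr:0x%" PRIx64, addr);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Writes "unknown:<cgroup_id>" for an unresolved container. Returns 0, or -1
 * if the buffer is too small. */
int format_unknown_container(uint64_t cgroup_id, char *out, size_t cap)
{
    int n = snprintf(out, cap, "unknown:%" PRIu64, cgroup_id);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Keeping the raw address or cgroup ID in the fallback value preserves enough information to resolve the event later, once symbols or container metadata become available.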
### Signing Failure
- **Symptom**: Chunk saved without signature, warning logged
- **Mitigation**: Check Signer service availability, verify Fulcio/KMS connectivity
## CO-RE (Compile Once, Run Everywhere)
The system uses BTF (BPF Type Format) for kernel-version-independent field access:
```c
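// Access kernel struct fields without hardcoded offsets.
// BPF_CORE_READ (from <bpf/bpf_core_read.h>) emits CO-RE relocations that
// libbpf resolves against the running kernel's BTF at load time.
struct task_struct *task = (void *)bpf_get_current_task();
pid_t pid = BPF_CORE_READ(task, pid);    // kernel "pid" is the thread ID
pid_t tgid = BPF_CORE_READ(task, tgid);  // "tgid" is the user-space process ID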
```
**Requirements:**
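- Kernel 5.2+ with built-in BTF (recommended)
- Kernel 4.14+ with external BTF from BTFHub
- Note: `BPF_MAP_TYPE_RINGBUF` was added in kernel 5.8, so the ring buffer transport described in this document needs 5.8 or later regardless of how BTF is supplied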
## Integration Points
### Zastava (Container State)
- `IContainerIdentityResolver` interface
- Container lifecycle events (start/stop)
- Image reference to digest mapping
### Scanner (Reachability Merger)
- `EbpfSignalMerger` combines runtime with static analysis
- Symbol hash correlation via `RuntimeNodeHash`
### Signer (Evidence Signing)
- `IAttestationSigningService` for DSSE signatures
- `IRekorClient` for transparency log submission
### SBOM Service (Component Correlation)
- `ISbomComponentProvider` for PURL lookup
- Image digest to component mapping