# Runtime Instrumentation Architecture
> Technical architecture for the eBPF event adapter bridging Tetragon into Stella Ops.
## Overview
The Runtime Instrumentation module is a stream-processing library that connects to the Tetragon eBPF agent via gRPC, receives raw kernel and user-space events, and converts them into the platform's canonical `RuntimeCallEvent` format. It does not expose HTTP endpoints or maintain a database -- it is consumed as a library by services that need runtime observation data (Signals, Scanner, Policy). The adapter decouples the rest of the platform from Tetragon's wire format and probe semantics.
## Design Principles
1. **Provider abstraction** - Downstream modules consume `RuntimeCallEvent`, not Tetragon-specific types; replacing the eBPF agent requires only a new adapter
2. **Privacy by default** - Sensitive data is filtered at the adapter boundary before events propagate into the platform
3. **Minimal allocation** - Event conversion is designed for high-throughput streaming with minimal object allocation
4. **Deterministic canonicalization** - Stack frame normalization produces stable, comparable output regardless of ASLR or load order
## Components
```
RuntimeInstrumentation/
├── StellaOps.RuntimeInstrumentation.Tetragon/  # Core adapter library
│   ├── TetragonEventAdapter.cs  # Raw event -> RuntimeCallEvent conversion
│   ├── Models/
│   │   ├── TetragonEvent.cs  # Raw Tetragon event representation
│   │   ├── RuntimeCallEvent.cs  # Canonical platform event
│   │   ├── CanonicalStackFrame.cs  # Normalized stack frame
│   │   └── ProbeType.cs  # eBPF probe type enumeration
│   ├── StackCanonicalization/
│   │   ├── StackFrameCanonicalizer.cs  # Symbol resolution and normalization
│   │   └── SymbolResolver.cs  # Address-to-symbol mapping
│   ├── Privacy/
│   │   └── PrivacyFilter.cs  # Sensitive data stripping
│   └── HotSymbol/
│       └── HotSymbolPublisher.cs  # Publishes observed symbols to index
├── StellaOps.Agent.Tetragon/  # gRPC client for Tetragon agent
│   ├── TetragonGrpcClient.cs  # gRPC stream consumer
│   ├── TetragonStreamReader.cs  # Backpressure-aware stream reader
│   └── Proto/  # Tetragon protobuf definitions
└── __Tests/
    └── StellaOps.RuntimeInstrumentation.Tests/  # Unit tests with fixture events
```
## Core Models
### RuntimeCallEvent (canonical output)
```csharp
public sealed record RuntimeCallEvent
{
    public required string EventId { get; init; }
    public required DateTimeOffset Timestamp { get; init; }
    public required ProbeType ProbeType { get; init; }
    public required ProcessInfo Process { get; init; }
    public ThreadInfo? Thread { get; init; }
    public required string Syscall { get; init; }
    public IReadOnlyList<CanonicalStackFrame> StackFrames { get; init; }
    public string? ContainerId { get; init; }
    public string? PodName { get; init; }
    public string? Namespace { get; init; }
}
```
### CanonicalStackFrame
```csharp
public sealed record CanonicalStackFrame
{
    public required string Module { get; init; }
    public required string Symbol { get; init; }
    public ulong Offset { get; init; }
    public bool IsKernelSpace { get; init; }
    public string? SourceFile { get; init; }
    public int? LineNumber { get; init; }
}
```
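
The `Offset` field is what makes frames comparable across runs: an absolute address changes with ASLR, while the offset from the containing module's load base does not. The following is a minimal sketch of that normalization, not the actual `StackFrameCanonicalizer`. `LoadedModule` and `OffsetNormalizer` are hypothetical names introduced for illustration, and the sketch assumes the module map (name, base address, size) is already known.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of a mapped module; not part of the documented API.
public sealed record LoadedModule(string Name, ulong BaseAddress, ulong Size);

public static class OffsetNormalizer
{
    // Returns the module name and module-relative offset for an absolute
    // address, or null when no known module contains the address.
    public static (string Module, ulong Offset)? Normalize(
        ulong address, IReadOnlyList<LoadedModule> modules)
    {
        foreach (var module in modules)
        {
            // Subtract before comparing so BaseAddress + Size cannot overflow.
            if (address >= module.BaseAddress &&
                address - module.BaseAddress < module.Size)
            {
                return (module.Name, address - module.BaseAddress);
            }
        }

        return null;
    }
}
```

Two runs that load the same library at different base addresses then produce the same `(Module, Offset)` pair for the same instruction.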
### ProbeType Enumeration
| Probe Type | Description | Origin |
|------------|-------------|--------|
| `ProcessExec` | New process execution | Tetragon process tracker |
| `ProcessExit` | Process termination | Tetragon process tracker |
| `Kprobe` | Kernel function entry | Kernel dynamic tracing |
| `Kretprobe` | Kernel function return | Kernel dynamic tracing |
| `Uprobe` | User-space function entry | User-space dynamic tracing |
| `Uretprobe` | User-space function return | User-space dynamic tracing |
| `Tracepoint` | Static kernel tracepoint | Kernel static tracing |
| `USDT` | User-space static tracepoint | Application-defined probes |
| `Fentry` | Kernel function entry (BPF trampoline) | Modern kernel tracing (5.5+) |
| `Fexit` | Kernel function exit (BPF trampoline) | Modern kernel tracing (5.5+) |
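
As a rough illustration, the table could map to an enumeration like the one below. The member names come from the table; the member order, the implicit numeric values, and the `IsUserSpaceProbe` helper are assumptions made for this sketch and may not match `ProbeType.cs`.

```csharp
using System;

// Member names follow the table above; order and numeric values are assumed.
public enum ProbeType
{
    ProcessExec,
    ProcessExit,
    Kprobe,
    Kretprobe,
    Uprobe,
    Uretprobe,
    Tracepoint,
    USDT,
    Fentry,
    Fexit,
}

public static class ProbeTypeExtensions
{
    // Hypothetical helper: true for probes that attach to user-space code
    // (the "User-space" and "Application-defined" rows of the table).
    public static bool IsUserSpaceProbe(this ProbeType type) => type switch
    {
        ProbeType.Uprobe or ProbeType.Uretprobe or ProbeType.USDT => true,
        _ => false,
    };
}
```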
## Data Flow
```
[Tetragon Agent]
    │ gRPC stream (protobuf)
    ▼
[TetragonGrpcClient]
    │ TetragonEvent (raw)
    ▼
[TetragonEventAdapter]
    ├── [StackFrameCanonicalizer] ── symbol resolution ──> CanonicalStackFrame[]
    ├── [PrivacyFilter] ── strip sensitive data
    ├── [HotSymbolPublisher] ── publish to hot symbol index
    ▼
[RuntimeCallEvent] (canonical)
    ├──> [Signals] (RTS scoring)
    ├──> [Scanner] (reachability validation)
    └──> [Policy] (runtime evidence)
```
1. **Stream connection:** `TetragonGrpcClient` establishes a persistent gRPC stream to the Tetragon agent running on the same node.
2. **Raw event ingestion:** `TetragonStreamReader` reads events with backpressure handling; if the consumer stays behind despite backpressure, the oldest buffered events are dropped and the `runtime_events_dropped_total` metric is incremented.
3. **Adaptation:** `TetragonEventAdapter` maps the raw `TetragonEvent` to a `RuntimeCallEvent`, invoking the stack canonicalizer and privacy filter.
4. **Stack canonicalization:** `StackFrameCanonicalizer` resolves addresses to symbols using the `SymbolResolver`, normalizes module paths, and separates kernel-space from user-space frames.
5. **Privacy filtering:** `PrivacyFilter` removes or redacts environment variables, sensitive command-line arguments, and file paths matching configurable patterns.
6. **Symbol publishing:** `HotSymbolPublisher` emits observed symbols to the hot symbol index, enabling runtime reachability correlation without requiring full re-analysis.
7. **Downstream consumption:** The resulting `RuntimeCallEvent` stream is consumed by Signals (for RTS scoring), Scanner (for reachability validation), and Policy (for runtime evidence in verdicts).
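
The drop-oldest behaviour in step 2 can be sketched with a bounded channel from `System.Threading.Channels`. This is an illustration under assumptions, not the actual `TetragonStreamReader`: the `DropOldestBuffer<T>` type and its members are hypothetical, and the real reader may buffer differently.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;

// Hypothetical buffer type used only for this sketch.
public sealed class DropOldestBuffer<T>
{
    private readonly Channel<T> _channel;
    private long _dropped;

    public DropOldestBuffer(int capacity)
    {
        var options = new BoundedChannelOptions(capacity)
        {
            // When the buffer is full, evict the oldest item to admit the new one.
            FullMode = BoundedChannelFullMode.DropOldest,
            SingleReader = true,
        };

        // The callback runs once per evicted item; a real reader would
        // increment its drop metric here.
        _channel = Channel.CreateBounded<T>(
            options, _ => Interlocked.Increment(ref _dropped));
    }

    public long DroppedCount => Interlocked.Read(ref _dropped);

    public bool TryWrite(T item) => _channel.Writer.TryWrite(item);

    public bool TryRead(out T item) => _channel.Reader.TryRead(out item);
}
```

With a capacity of 2, writing three items leaves the two newest items in the buffer and a drop count of 1.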
## Security Considerations
- **Privacy filtering:** All events pass through `PrivacyFilter` before leaving the instrumentation boundary. Configurable patterns control what gets redacted (default: environment variables, home directory paths, credential file paths).
- **Kernel vs user-space separation:** The `CanonicalStackFrame.IsKernelSpace` flag lets downstream consumers distinguish privilege levels and avoid conflating kernel internals with application code.
- **No credential exposure:** The gRPC connection to Tetragon uses mTLS when available; connection parameters are configured via environment variables or mounted secrets, not hardcoded.
- **Minimal privilege:** The adapter library itself requires no elevated privileges; only the Tetragon agent (running as a DaemonSet) requires kernel access.
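
Pattern-based redaction of the kind described for `PrivacyFilter` might look like the sketch below. The two rules are illustrative assumptions, not the module's actual defaults, and `RedactionSketch` is a hypothetical name; the real filter reads its patterns from configuration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper used only for this sketch.
public static class RedactionSketch
{
    // Assumed example rules: secret-looking KEY=value pairs and home directories.
    private static readonly (Regex Pattern, string Replacement)[] Rules =
    {
        (new Regex(@"(?i)(password|token|secret|api[_-]?key)=\S+"), "$1=[REDACTED]"),
        (new Regex(@"/home/[^/\s]+"), "/home/[REDACTED]"),
    };

    // Applies every rule in order and returns the redacted text.
    public static string Redact(string input)
    {
        foreach (var (pattern, replacement) in Rules)
        {
            input = pattern.Replace(input, replacement);
        }

        return input;
    }
}
```

Keeping the key name and replacing only the value preserves enough context for debugging without leaking the secret itself.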
## Performance Characteristics
- **Throughput target:** Sustain 50,000 events/second per node without dropping events under normal load
- **Latency:** Event-to-canonical conversion target under 1ms per event
- **Backpressure:** When the consumer cannot keep up, `TetragonStreamReader` first applies backpressure via gRPC flow control; under persistent overload it drops events and increments the `runtime_events_dropped_total` metric
- **Memory:** Pooled buffers for protobuf deserialization to minimize GC pressure
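
Buffer pooling of this kind is commonly done with `ArrayPool<byte>` from `System.Buffers`. The sketch below assumes that approach; the document does not say which pooling mechanism the module uses, and `PooledDecode` is a hypothetical helper. The `decode` delegate stands in for whatever protobuf parser is applied.

```csharp
using System;
using System.Buffers;

// Hypothetical helper used only for this sketch.
public static class PooledDecode
{
    // Copies the payload into a rented buffer, runs the decoder over exactly
    // the payload bytes, and always returns the buffer to the pool.
    public static TResult WithPooledBuffer<TResult>(
        ReadOnlySpan<byte> payload,
        Func<ReadOnlyMemory<byte>, TResult> decode)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(payload.Length);
        try
        {
            payload.CopyTo(rented);

            // Rent may hand back a larger array, so slice to the payload length.
            return decode(rented.AsMemory(0, payload.Length));
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

The decoder must not keep a reference to the memory it receives, because the buffer goes back to the pool as soon as the call returns.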
## Observability
- **Metrics:** `runtime_events_received_total{probe_type}`, `runtime_events_converted_total`, `runtime_events_dropped_total`, `runtime_event_conversion_duration_ms`, `hot_symbols_published_total`
- **Logs:** Structured logs with `eventId`, `probeType`, `containerId`, `processName`
- **Health:** gRPC connection status and stream lag exposed for monitoring
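
One way to emit counters and a histogram with these names is .NET's `System.Diagnostics.Metrics` API, sketched below. The metric names are taken from the list above; the meter name, the `RuntimeMetricsSketch` class, and the choice of this API are assumptions, since the document does not state which metrics library the module uses.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical wrapper used only for this sketch.
public static class RuntimeMetricsSketch
{
    // Assumed meter name; not taken from the module.
    public const string MeterName = "StellaOps.RuntimeInstrumentation";

    private static readonly Meter SketchMeter = new(MeterName);

    private static readonly Counter<long> Received =
        SketchMeter.CreateCounter<long>("runtime_events_received_total");

    private static readonly Counter<long> Dropped =
        SketchMeter.CreateCounter<long>("runtime_events_dropped_total");

    private static readonly Histogram<double> ConversionMs =
        SketchMeter.CreateHistogram<double>(
            "runtime_event_conversion_duration_ms", unit: "ms");

    // Counts one received event, tagged with its probe type.
    public static void RecordReceived(string probeType) =>
        Received.Add(1, new KeyValuePair<string, object?>("probe_type", probeType));

    public static void RecordDropped(long count = 1) => Dropped.Add(count);

    public static void RecordConversion(TimeSpan elapsed) =>
        ConversionMs.Record(elapsed.TotalMilliseconds);
}
```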
## References
- [Module README](./README.md)
- [Signals Architecture](../signals/architecture.md) - RTS scoring consumer
- [Scanner Architecture](../scanner/architecture.md) - Reachability validation