# Runtime Instrumentation Architecture

> Technical architecture for the eBPF event adapter bridging Tetragon into Stella Ops.

## Overview

The Runtime Instrumentation module is a stream-processing library that connects to the Tetragon eBPF agent via gRPC, receives raw kernel and user-space events, and converts them into the platform's canonical `RuntimeCallEvent` format. It does not expose HTTP endpoints or maintain a database -- it is consumed as a library by services that need runtime observation data (Signals, Scanner, Policy). The adapter decouples the rest of the platform from Tetragon's wire format and probe semantics.
## Design Principles

1. **Provider abstraction** - Downstream modules consume `RuntimeCallEvent`, not Tetragon-specific types; replacing the eBPF agent requires only a new adapter
2. **Privacy by default** - Sensitive data is filtered at the adapter boundary before events propagate into the platform
3. **Minimal allocation** - Event conversion is designed for high-throughput streaming with minimal object allocation
4. **Deterministic canonicalization** - Stack frame normalization produces stable, comparable output regardless of ASLR or load order
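
The provider abstraction in principle 1 could be sketched roughly as follows. This is an illustration under assumptions, not the module's actual contract: `IRuntimeEventAdapter`, `CanonicalEventSketch`, `OtherAgentEvent`, `OtherAgentAdapter`, and `DownstreamConsumerSketch` are hypothetical names, and the canonical record is trimmed to two fields for brevity.

```csharp
using System;

// Trimmed, hypothetical stand-in for the canonical event; only two fields are kept.
public sealed record CanonicalEventSketch(string EventId, string Syscall);

// Hypothetical provider-neutral seam: one implementation per runtime agent.
public interface IRuntimeEventAdapter<in TRaw>
{
    CanonicalEventSketch Adapt(TRaw rawEvent);
}

// Hypothetical raw event from some other agent, used only to show the seam.
public sealed record OtherAgentEvent(Guid Id, string FunctionName);

// Swapping agents means adding a class like this one; consumers stay unchanged.
public sealed class OtherAgentAdapter : IRuntimeEventAdapter<OtherAgentEvent>
{
    public CanonicalEventSketch Adapt(OtherAgentEvent rawEvent) =>
        new(rawEvent.Id.ToString("N"), rawEvent.FunctionName);
}

// Downstream code depends only on the canonical type, never on the raw event.
public static class DownstreamConsumerSketch
{
    public static string Describe(CanonicalEventSketch e) => $"{e.EventId}:{e.Syscall}";
}
```

Under this shape, a consumer such as `DownstreamConsumerSketch.Describe` never references a provider-specific type, which is the property the principle describes.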
## Components

```
RuntimeInstrumentation/
├── StellaOps.RuntimeInstrumentation.Tetragon/   # Core adapter library
│   ├── TetragonEventAdapter.cs                  # Raw event -> RuntimeCallEvent conversion
│   ├── Models/
│   │   ├── TetragonEvent.cs                     # Raw Tetragon event representation
│   │   ├── RuntimeCallEvent.cs                  # Canonical platform event
│   │   ├── CanonicalStackFrame.cs               # Normalized stack frame
│   │   └── ProbeType.cs                         # eBPF probe type enumeration
│   ├── StackCanonicalization/
│   │   ├── StackFrameCanonicalizer.cs           # Symbol resolution and normalization
│   │   └── SymbolResolver.cs                    # Address-to-symbol mapping
│   ├── Privacy/
│   │   └── PrivacyFilter.cs                     # Sensitive data stripping
│   └── HotSymbol/
│       └── HotSymbolPublisher.cs                # Publishes observed symbols to index
│
├── StellaOps.Agent.Tetragon/                    # gRPC client for Tetragon agent
│   ├── TetragonGrpcClient.cs                    # gRPC stream consumer
│   ├── TetragonStreamReader.cs                  # Backpressure-aware stream reader
│   └── Proto/                                   # Tetragon protobuf definitions
│
└── __Tests/
    └── StellaOps.RuntimeInstrumentation.Tests/  # Unit tests with fixture events
```
## Core Models

### RuntimeCallEvent (canonical output)

```csharp
public sealed record RuntimeCallEvent
{
    public required string EventId { get; init; }
    public required DateTimeOffset Timestamp { get; init; }
    public required ProbeType ProbeType { get; init; }
    public required ProcessInfo Process { get; init; }
    public ThreadInfo? Thread { get; init; }
    public required string Syscall { get; init; }
    public IReadOnlyList<CanonicalStackFrame> StackFrames { get; init; }
    public string? ContainerId { get; init; }
    public string? PodName { get; init; }
    public string? Namespace { get; init; }
}
```
### CanonicalStackFrame

```csharp
public sealed record CanonicalStackFrame
{
    public required string Module { get; init; }
    public required string Symbol { get; init; }
    public ulong Offset { get; init; }
    public bool IsKernelSpace { get; init; }
    public string? SourceFile { get; init; }
    public int? LineNumber { get; init; }
}
```
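
One plausible reading of the `Module` and `Offset` fields is a module-relative address, which stays the same even when ASLR moves the module between runs. The sketch below illustrates that idea only. `LoadedModule` and `FrameCanonicalizationSketch` are hypothetical names, symbol lookup is omitted, and the real `StackFrameCanonicalizer` may work differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of a mapped module; not one of the documented models.
public sealed record LoadedModule(string FilePath, ulong BaseAddress, ulong Size);

public static class FrameCanonicalizationSketch
{
    // Returns (module file name, module-relative offset),
    // or null when no known module contains the address.
    public static (string Module, ulong Offset)? Canonicalize(
        ulong address, IReadOnlyList<LoadedModule> modules)
    {
        foreach (var module in modules)
        {
            if (address >= module.BaseAddress && address - module.BaseAddress < module.Size)
            {
                // Subtracting the load base removes the ASLR-dependent part of the address.
                // Using only the file name normalizes away the directory the module was loaded from.
                return (System.IO.Path.GetFileName(module.FilePath), address - module.BaseAddress);
            }
        }

        return null;
    }
}
```

Two runs that load the same library at different base addresses yield the same `(Module, Offset)` pair for the same instruction, which makes frames comparable across processes and hosts.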
### ProbeType Enumeration

| Probe Type | Description | Origin |
|------------|-------------|--------|
| `ProcessExec` | New process execution | Tetragon process tracker |
| `ProcessExit` | Process termination | Tetragon process tracker |
| `Kprobe` | Kernel function entry | Kernel dynamic tracing |
| `Kretprobe` | Kernel function return | Kernel dynamic tracing |
| `Uprobe` | User-space function entry | User-space dynamic tracing |
| `Uretprobe` | User-space function return | User-space dynamic tracing |
| `Tracepoint` | Static kernel tracepoint | Kernel static tracing |
| `USDT` | User-space static tracepoint | Application-defined probes |
| `Fentry` | Kernel function entry (BPF trampoline) | Modern kernel tracing (5.5+) |
| `Fexit` | Kernel function exit (BPF trampoline) | Modern kernel tracing (5.5+) |
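
As a rough sketch, the table could map onto an enum like the one below. The member names come from the table, but their order and numeric values are assumptions. The `IsUserSpaceProbe` helper is hypothetical and simply mirrors the rows whose description says "User-space".

```csharp
// Member names follow the table above; ordering and underlying values are assumed.
public enum ProbeType
{
    ProcessExec,
    ProcessExit,
    Kprobe,
    Kretprobe,
    Uprobe,
    Uretprobe,
    Tracepoint,
    USDT,
    Fentry,
    Fexit,
}

public static class ProbeTypeExtensions
{
    // Hypothetical helper: true for the probe types the table describes as user-space.
    public static bool IsUserSpaceProbe(this ProbeType probeType) =>
        probeType is ProbeType.Uprobe or ProbeType.Uretprobe or ProbeType.USDT;
}
```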
## Data Flow

```
[Tetragon Agent]
    │
    │ gRPC stream (protobuf)
    ▼
[TetragonGrpcClient]
    │
    │ TetragonEvent (raw)
    ▼
[TetragonEventAdapter]
    │
    ├── [StackFrameCanonicalizer] ── symbol resolution ──> CanonicalStackFrame[]
    │
    ├── [PrivacyFilter] ── strip sensitive data
    │
    ├── [HotSymbolPublisher] ── publish to hot symbol index
    │
    ▼
[RuntimeCallEvent] (canonical)
    │
    ├──> [Signals] (RTS scoring)
    ├──> [Scanner] (reachability validation)
    └──> [Policy] (runtime evidence)
```
1. **Stream connection:** `TetragonGrpcClient` establishes a persistent gRPC stream to the Tetragon agent running on the same node.
2. **Raw event ingestion:** `TetragonStreamReader` reads events with backpressure handling; if the consumer falls behind, oldest events are dropped with a metric increment.
3. **Adaptation:** `TetragonEventAdapter` maps the raw `TetragonEvent` to a `RuntimeCallEvent`, invoking the stack canonicalizer and privacy filter.
4. **Stack canonicalization:** `StackFrameCanonicalizer` resolves addresses to symbols using the `SymbolResolver`, normalizes module paths, and separates kernel-space from user-space frames.
5. **Privacy filtering:** `PrivacyFilter` removes or redacts environment variables, sensitive command-line arguments, and file paths matching configurable patterns.
6. **Symbol publishing:** `HotSymbolPublisher` emits observed symbols to the hot symbol index, enabling runtime reachability correlation without requiring full re-analysis.
7. **Downstream consumption:** The resulting `RuntimeCallEvent` stream is consumed by Signals (for RTS scoring), Scanner (for reachability validation), and Policy (for runtime evidence in verdicts).
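
The drop-oldest behavior in step 2 can be illustrated with a bounded `System.Threading.Channels` channel. This is a sketch of the technique, not the actual `TetragonStreamReader`: the `DropOldestBuffer` class is hypothetical, and the source does not say which buffering primitive the reader uses.

```csharp
using System.Threading;
using System.Threading.Channels;

// Hypothetical buffer: keeps the newest items and counts the ones it evicts.
public sealed class DropOldestBuffer<T>
{
    private readonly Channel<T> _channel;
    private long _dropped;

    public DropOldestBuffer(int capacity)
    {
        var options = new BoundedChannelOptions(capacity)
        {
            // When the buffer is full, evict the oldest item instead of blocking the writer.
            FullMode = BoundedChannelFullMode.DropOldest,
            SingleReader = true,
        };

        // The callback runs for every evicted item; a real reader would bump a metric here.
        _channel = Channel.CreateBounded<T>(options, _ => Interlocked.Increment(ref _dropped));
    }

    // Number of items evicted so far.
    public long DroppedCount => Interlocked.Read(ref _dropped);

    // Consumer side of the buffer.
    public ChannelReader<T> Reader => _channel.Reader;

    // In DropOldest mode this succeeds even when the buffer is full.
    public bool TryWrite(T item) => _channel.Writer.TryWrite(item);
}
```

With a capacity of 2, writing three items leaves the last two readable and reports one dropped item. The trade-off is that a slow consumer loses the oldest observations rather than stalling the stream.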
## Security Considerations

- **Privacy filtering:** All events pass through `PrivacyFilter` before leaving the instrumentation boundary. Configurable patterns control what is redacted (default: environment variables, home directory paths, credential file paths).
- **Kernel vs user-space separation:** The `CanonicalStackFrame.IsKernelSpace` flag lets downstream consumers distinguish privilege levels and avoid conflating kernel internals with application code.
- **No credential exposure:** The gRPC connection to Tetragon uses mTLS when available; connection parameters are configured via environment variables or mounted secrets, not hardcoded.
- **Minimal privilege:** The adapter library itself requires no elevated privileges; only the Tetragon agent (running as a DaemonSet) requires kernel access.
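
Pattern-based redaction of the kind described above might look like the following sketch. The `RedactionSketch` class, the `[REDACTED]` placeholder, and the example patterns in the usage note are all assumptions; the actual `PrivacyFilter` configuration format is not shown in this document.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical redactor: replaces any value matching a configured pattern.
public sealed class RedactionSketch
{
    private const string Placeholder = "[REDACTED]";
    private readonly IReadOnlyList<Regex> _patterns;

    public RedactionSketch(IEnumerable<string> patterns) =>
        _patterns = patterns
            .Select(p => new Regex(p, RegexOptions.Compiled | RegexOptions.CultureInvariant))
            .ToList();

    // Returns the placeholder when any pattern matches, otherwise the value unchanged.
    public string Redact(string value) =>
        _patterns.Any(p => p.IsMatch(value)) ? Placeholder : value;

    // Applies Redact to each argument, preserving order and count.
    public IReadOnlyList<string> RedactArguments(IEnumerable<string> arguments) =>
        arguments.Select(Redact).ToList();
}
```

For example, with the illustrative patterns `^/home/[^/]+/` and `(?i)(password|token|secret)=`, the argument list `curl --token=abc /usr/bin/ls` keeps `curl` and `/usr/bin/ls` but replaces `--token=abc`. Redacting the whole value rather than only the matched part is a conservative choice made for this sketch.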
## Performance Characteristics

- **Throughput target:** Sustain 50,000 events/second per node without dropping events under normal load.
- **Latency:** Event-to-canonical conversion targets under 1 ms per event.
- **Backpressure:** When the consumer cannot keep up, `TetragonStreamReader` applies backpressure via gRPC flow control; persistent overload triggers event dropping, counted by the `runtime_events_dropped_total` metric.
- **Memory:** Pooled buffers are used for protobuf deserialization to minimize GC pressure.
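
The pooled-buffer idea can be sketched with `ArrayPool<byte>` from `System.Buffers`. This is only an illustration of the pattern: `PooledBufferSketch` and its `parse` callback are hypothetical, and the document does not state which pooling mechanism the module uses.

```csharp
using System;
using System.Buffers;

public static class PooledBufferSketch
{
    // Copies the payload into a rented buffer, runs the parser, then returns the buffer.
    // The parser must not keep a reference to the memory after it returns.
    public static TResult WithPooledCopy<TResult>(
        ReadOnlySpan<byte> payload, Func<ReadOnlyMemory<byte>, TResult> parse)
    {
        // Rent may hand back a larger array than requested, so slice to the payload length.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(payload.Length);
        try
        {
            payload.CopyTo(buffer);
            return parse(buffer.AsMemory(0, payload.Length));
        }
        finally
        {
            // Returning the array lets later events reuse it instead of allocating.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```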
## Observability

- **Metrics:** `runtime_events_received_total{probe_type}`, `runtime_events_converted_total`, `runtime_events_dropped_total`, `runtime_event_conversion_duration_ms`, `hot_symbols_published_total`
- **Logs:** Structured logs with `eventId`, `probeType`, `containerId`, `processName`
- **Health:** gRPC connection status and stream lag exposed for monitoring
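
A few of these metrics could be declared with `System.Diagnostics.Metrics` as sketched below. The metric names are taken from the list above. The wrapper class, the default meter name, and the choice of this API are assumptions, since the document does not say how metrics are emitted.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical wrapper around three of the listed metrics.
public sealed class RuntimeInstrumentationMetricsSketch : IDisposable
{
    private readonly Meter _meter;
    private readonly Counter<long> _received;
    private readonly Counter<long> _dropped;
    private readonly Histogram<double> _conversionMs;

    // The default meter name is an assumption made for this sketch.
    public RuntimeInstrumentationMetricsSketch(string meterName = "StellaOps.RuntimeInstrumentation")
    {
        _meter = new Meter(meterName);
        _received = _meter.CreateCounter<long>("runtime_events_received_total");
        _dropped = _meter.CreateCounter<long>("runtime_events_dropped_total");
        _conversionMs = _meter.CreateHistogram<double>("runtime_event_conversion_duration_ms", unit: "ms");
    }

    // The probe type is attached as the probe_type tag, matching the label in the list.
    public void EventReceived(string probeType) =>
        _received.Add(1, new KeyValuePair<string, object?>("probe_type", probeType));

    public void EventDropped() => _dropped.Add(1);

    public void ConversionCompleted(double milliseconds) => _conversionMs.Record(milliseconds);

    public void Dispose() => _meter.Dispose();
}
```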
## References

- [Module README](./README.md)
- [Signals Architecture](../signals/architecture.md) - RTS scoring consumer
- [Scanner Architecture](../scanner/architecture.md) - Reachability validation