wip: doctor/cli/docs/api to vector db consolidation; api hardening for descriptions, tenant, and scopes; migrations and conversions of all DALs to EF v10
43
docs/modules/runtime-instrumentation/README.md
Normal file
@@ -0,0 +1,43 @@
# Runtime Instrumentation

> Bridges eBPF-based runtime monitoring into the Stella Ops platform, converting kernel-level events into canonical format for reachability validation and signal scoring.

## Purpose

Runtime Instrumentation adapts raw eBPF events from Tetragon into the Stella Ops canonical `RuntimeCallEvent` format. This enables the platform to incorporate live runtime observations (system calls, function probes, process lifecycle) into reachability validation and evidence-weighted vulnerability scoring without coupling downstream modules to any specific eBPF agent.

## Quick Links

- [Architecture](./architecture.md) - Technical design and implementation details

## Status

| Attribute | Value |
|-----------|-------|
| **Maturity** | Beta |
| **Source** | `src/RuntimeInstrumentation/` |

## Key Features

- **Tetragon gRPC client:** Connects to the Tetragon agent's gRPC stream and ingests raw eBPF events in real time
- **eBPF probe type mapping:** Maps the following probe types: Kprobe, Kretprobe, Uprobe, Uretprobe, Tracepoint, USDT, Fentry, Fexit, ProcessExec, ProcessExit
- **Stack frame canonicalization:** Converts raw kernel/user-space stack frames into `CanonicalStackFrame` with symbol resolution and address normalization
- **Hot symbol index updates:** Publishes observed symbols to the hot symbol index for runtime reachability correlation
- **Privacy filtering:** Strips sensitive data (environment variables, command arguments, file paths) before events leave the instrumentation boundary

## Dependencies

### Upstream (this module depends on)

- **Tetragon** - External eBPF agent providing kernel-level event streams via gRPC

### Downstream (modules that depend on this)

- **Signals** - Consumes `RuntimeCallEvent` data for runtime signal scoring (RTS dimension)
- **Scanner** - Uses runtime observations for reachability validation
- **Policy** - Incorporates runtime evidence into policy evaluation and verdicts

## Related Documentation

- [Signals](../signals/) - Runtime signal scoring using RTS dimension
- [Signals eBPF Contract](../signals/contracts/ebpf-micro-witness-determinism-profile.md) - Determinism profile for eBPF witnesses
- [Scanner](../scanner/) - Reachability validation
- [Policy](../policy/) - Runtime evidence in policy decisions
152
docs/modules/runtime-instrumentation/architecture.md
Normal file
@@ -0,0 +1,152 @@
# Runtime Instrumentation Architecture

> Technical architecture for the eBPF event adapter bridging Tetragon into Stella Ops.

## Overview

The Runtime Instrumentation module is a stream-processing library that connects to the Tetragon eBPF agent via gRPC, receives raw kernel and user-space events, and converts them into the platform's canonical `RuntimeCallEvent` format. It does not expose HTTP endpoints or maintain a database -- it is consumed as a library by services that need runtime observation data (Signals, Scanner, Policy). The adapter decouples the rest of the platform from Tetragon's wire format and probe semantics.

## Design Principles

1. **Provider abstraction** - Downstream modules consume `RuntimeCallEvent`, not Tetragon-specific types; replacing the eBPF agent requires only a new adapter
2. **Privacy by default** - Sensitive data is filtered at the adapter boundary before events propagate into the platform
3. **Minimal allocation** - Event conversion is designed for high-throughput streaming with minimal object allocation
4. **Deterministic canonicalization** - Stack frame normalization produces stable, comparable output regardless of ASLR or load order
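
The provider-abstraction principle can be illustrated with a small sketch. Everything named here (`IRuntimeEventSource`, `RuntimeEventSketch`, `InMemoryEventSource`) is hypothetical and not taken from the module's source; the point is only that consumers depend on a provider-neutral stream, so a different eBPF agent would need just another implementation of the same interface.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, trimmed-down stand-in for the canonical event (assumption).
public sealed record RuntimeEventSketch(string EventId, string ProbeType);

// Hypothetical provider contract: consumers see only canonical events.
public interface IRuntimeEventSource
{
    IAsyncEnumerable<RuntimeEventSketch> ReadEventsAsync(
        CancellationToken cancellationToken = default);
}

// Fake provider for illustration. A Tetragon-backed adapter, or an adapter
// for any other agent, would implement the same interface.
public sealed class InMemoryEventSource : IRuntimeEventSource
{
    private readonly IReadOnlyList<RuntimeEventSketch> _events;

    public InMemoryEventSource(IReadOnlyList<RuntimeEventSketch> events) =>
        _events = events;

    public async IAsyncEnumerable<RuntimeEventSketch> ReadEventsAsync(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        foreach (var runtimeEvent in _events)
        {
            cancellationToken.ThrowIfCancellationRequested();
            yield return runtimeEvent;
            await Task.Yield(); // hand control back between events
        }
    }
}
```

A consumer would iterate with `await foreach (var e in source.ReadEventsAsync())` and never reference Tetragon types directly.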

## Components

```
RuntimeInstrumentation/
├── StellaOps.RuntimeInstrumentation.Tetragon/    # Core adapter library
│   ├── TetragonEventAdapter.cs                   # Raw event -> RuntimeCallEvent conversion
│   ├── Models/
│   │   ├── TetragonEvent.cs                      # Raw Tetragon event representation
│   │   ├── RuntimeCallEvent.cs                   # Canonical platform event
│   │   ├── CanonicalStackFrame.cs                # Normalized stack frame
│   │   └── ProbeType.cs                          # eBPF probe type enumeration
│   ├── StackCanonicalization/
│   │   ├── StackFrameCanonicalizer.cs            # Symbol resolution and normalization
│   │   └── SymbolResolver.cs                     # Address-to-symbol mapping
│   ├── Privacy/
│   │   └── PrivacyFilter.cs                      # Sensitive data stripping
│   └── HotSymbol/
│       └── HotSymbolPublisher.cs                 # Publishes observed symbols to index
│
├── StellaOps.Agent.Tetragon/                     # gRPC client for Tetragon agent
│   ├── TetragonGrpcClient.cs                     # gRPC stream consumer
│   ├── TetragonStreamReader.cs                   # Backpressure-aware stream reader
│   └── Proto/                                    # Tetragon protobuf definitions
│
└── __Tests/
    └── StellaOps.RuntimeInstrumentation.Tests/   # Unit tests with fixture events
```

## Core Models

### RuntimeCallEvent (canonical output)

```csharp
public sealed record RuntimeCallEvent
{
    public required string EventId { get; init; }
    public required DateTimeOffset Timestamp { get; init; }
    public required ProbeType ProbeType { get; init; }
    public required ProcessInfo Process { get; init; }
    public ThreadInfo? Thread { get; init; }
    public required string Syscall { get; init; }
    // Defaults to empty so the non-nullable list is never null when no stack was captured.
    public IReadOnlyList<CanonicalStackFrame> StackFrames { get; init; } = [];
    // Container and Kubernetes context; null when the process is not containerized.
    public string? ContainerId { get; init; }
    public string? PodName { get; init; }
    public string? Namespace { get; init; }
}
```

### CanonicalStackFrame

```csharp
public sealed record CanonicalStackFrame
{
    public required string Module { get; init; }
    public required string Symbol { get; init; }
    public ulong Offset { get; init; }
    public bool IsKernelSpace { get; init; }
    public string? SourceFile { get; init; }
    public int? LineNumber { get; init; }
}
```
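
To show why a module/symbol/offset triple stays comparable across runs, here is a minimal sketch of an ASLR-independent frame key. `FrameKeySketch`, its parameters, and the `module!symbol+0xoffset` key format are illustrative assumptions, not the module's actual canonicalization; it also assumes the offset is measured from the module's load base.

```csharp
using System;
using System.IO;

public static class FrameKeySketch
{
    // Builds a stable key from the module file name, the symbol, and the
    // module-relative offset. The absolute address and load base change
    // with ASLR; their difference does not.
    public static string ToKey(
        string modulePath, string symbol, ulong absoluteAddress, ulong moduleBase)
    {
        if (absoluteAddress < moduleBase)
        {
            throw new ArgumentOutOfRangeException(
                nameof(absoluteAddress), "Address lies below the module base.");
        }

        // Drop the directory so differing install prefixes compare equal (assumption).
        string module = Path.GetFileName(modulePath);
        ulong offset = absoluteAddress - moduleBase;
        return $"{module}!{symbol}+0x{offset:x}";
    }
}
```

Two observations of the same call site under different load addresses produce the same key, which is the property the deterministic-canonicalization principle asks for.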

### ProbeType Enumeration

| Probe Type | Description | Origin |
|------------|-------------|--------|
| `ProcessExec` | New process execution | Tetragon process tracker |
| `ProcessExit` | Process termination | Tetragon process tracker |
| `Kprobe` | Kernel function entry | Kernel dynamic tracing |
| `Kretprobe` | Kernel function return | Kernel dynamic tracing |
| `Uprobe` | User-space function entry | User-space dynamic tracing |
| `Uretprobe` | User-space function return | User-space dynamic tracing |
| `Tracepoint` | Static kernel tracepoint | Kernel static tracing |
| `USDT` | User-space static tracepoint | Application-defined probes |
| `Fentry` | Kernel function entry (BPF trampoline) | Modern kernel tracing (5.5+) |
| `Fexit` | Kernel function exit (BPF trampoline) | Modern kernel tracing (5.5+) |
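
The table above could map onto an enumeration along the following lines. This is a sketch only: the member order, the `Usdt` casing, and the two helper methods are assumptions made for illustration, and the type is named `ProbeTypeSketch` to avoid implying it matches `ProbeType.cs`.

```csharp
// Hypothetical mirror of the probe types listed in the table above.
public enum ProbeTypeSketch
{
    ProcessExec,
    ProcessExit,
    Kprobe,
    Kretprobe,
    Uprobe,
    Uretprobe,
    Tracepoint,
    Usdt,
    Fentry,
    Fexit,
}

public static class ProbeTypeSketchExtensions
{
    // True for probes that fire when a function returns or exits.
    public static bool IsFunctionReturn(this ProbeTypeSketch probe) =>
        probe is ProbeTypeSketch.Kretprobe
            or ProbeTypeSketch.Uretprobe
            or ProbeTypeSketch.Fexit;

    // True for probes attached to user-space code, per the Origin column.
    public static bool IsUserSpace(this ProbeTypeSketch probe) =>
        probe is ProbeTypeSketch.Uprobe
            or ProbeTypeSketch.Uretprobe
            or ProbeTypeSketch.Usdt;
}
```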

## Data Flow

```
[Tetragon Agent]
    │
    │ gRPC stream (protobuf)
    ▼
[TetragonGrpcClient]
    │
    │ TetragonEvent (raw)
    ▼
[TetragonEventAdapter]
    │
    ├── [StackFrameCanonicalizer] ── symbol resolution ──> CanonicalStackFrame[]
    │
    ├── [PrivacyFilter] ── strip sensitive data
    │
    ├── [HotSymbolPublisher] ── publish to hot symbol index
    │
    ▼
[RuntimeCallEvent] (canonical)
    │
    ├──> [Signals] (RTS scoring)
    ├──> [Scanner] (reachability validation)
    └──> [Policy] (runtime evidence)
```

1. **Stream connection:** `TetragonGrpcClient` establishes a persistent gRPC stream to the Tetragon agent running on the same node.
2. **Raw event ingestion:** `TetragonStreamReader` reads events with backpressure handling; if the consumer falls behind, oldest events are dropped with a metric increment.
3. **Adaptation:** `TetragonEventAdapter` maps the raw `TetragonEvent` to a `RuntimeCallEvent`, invoking the stack canonicalizer and privacy filter.
4. **Stack canonicalization:** `StackFrameCanonicalizer` resolves addresses to symbols using the `SymbolResolver`, normalizes module paths, and separates kernel-space from user-space frames.
5. **Privacy filtering:** `PrivacyFilter` removes or redacts environment variables, sensitive command-line arguments, and file paths matching configurable patterns.
6. **Symbol publishing:** `HotSymbolPublisher` emits observed symbols to the hot symbol index, enabling runtime reachability correlation without requiring full re-analysis.
7. **Downstream consumption:** The resulting `RuntimeCallEvent` stream is consumed by Signals (for RTS scoring), Scanner (for reachability validation), and Policy (for runtime evidence in verdicts).
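
The drop-oldest behaviour described in step 2 can be approximated with a bounded channel from `System.Threading.Channels`. This is a sketch under assumptions: `DropOldestBuffer<T>` is a hypothetical name, and the real `TetragonStreamReader` may be built quite differently. It only shows how a full buffer can evict the oldest item and count the eviction for a drop metric.

```csharp
using System.Threading;
using System.Threading.Channels;

// Hypothetical bounded buffer: when full, the oldest queued item is evicted.
public sealed class DropOldestBuffer<T>
{
    private readonly Channel<T> _channel;
    private long _dropped;

    public DropOldestBuffer(int capacity)
    {
        var options = new BoundedChannelOptions(capacity)
        {
            // Evict the oldest queued item instead of blocking the writer.
            FullMode = BoundedChannelFullMode.DropOldest,
        };

        // The callback runs once per evicted item; a real reader could
        // increment its drop counter here.
        _channel = Channel.CreateBounded<T>(
            options, _ => Interlocked.Increment(ref _dropped));
    }

    // Number of items evicted so far.
    public long DroppedCount => Interlocked.Read(ref _dropped);

    // Consumers read from here, for example via ReadAllAsync().
    public ChannelReader<T> Reader => _channel.Reader;

    // Never blocks; in DropOldest mode a full buffer makes room by eviction.
    public bool TryWrite(T item) => _channel.Writer.TryWrite(item);
}
```

Dropping the oldest item favours fresh observations over completeness, which suits a live signal feed; a pipeline that needs every event would choose a blocking mode instead.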

## Security Considerations

- **Privacy filtering:** All events pass through `PrivacyFilter` before leaving the instrumentation boundary. Configurable patterns control what gets redacted (default: environment variables, home directory paths, credential file paths).
- **Kernel vs user-space separation:** The `CanonicalStackFrame.IsKernelSpace` flag lets downstream consumers distinguish privilege levels and avoid conflating kernel internals with application code.
- **No credential exposure:** The gRPC connection to Tetragon uses mTLS when available; connection parameters are configured via environment variables or mounted secrets, not hardcoded.
- **Minimal privilege:** The adapter library itself requires no elevated privileges; only the Tetragon agent (running as a DaemonSet) requires kernel access.
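
Pattern-driven redaction of the kind described for `PrivacyFilter` might look like the sketch below. The class name `PrivacyFilterSketch`, the `[REDACTED]` marker, and the sensitive-flag regex are all assumptions for illustration; the module's real patterns and configuration format are not shown in this document.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public sealed class PrivacyFilterSketch
{
    public const string Redacted = "[REDACTED]";

    // Matches flags such as --api-token=value or -password=value (assumed heuristic).
    private static readonly Regex SensitiveFlag = new(
        @"^(--?[\w-]*(password|token|secret|key)[\w-]*)=(.*)$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    private readonly IReadOnlyList<Regex> _pathPatterns;

    // pathPatterns: configurable regular expressions for paths to hide.
    public PrivacyFilterSketch(IEnumerable<string> pathPatterns) =>
        _pathPatterns = pathPatterns
            .Select(p => new Regex(p, RegexOptions.CultureInvariant))
            .ToList();

    // Replaces the whole path when any configured pattern matches it.
    public string FilterPath(string path) =>
        _pathPatterns.Any(pattern => pattern.IsMatch(path)) ? Redacted : path;

    // Keeps the flag name but hides its value; other arguments are
    // treated as potential paths.
    public string FilterArgument(string argument)
    {
        Match match = SensitiveFlag.Match(argument);
        return match.Success
            ? match.Groups[1].Value + "=" + Redacted
            : FilterPath(argument);
    }
}
```

Configured with patterns such as `^/home/[^/]+/`, this redacts home directory paths while leaving system paths like `/usr/bin/curl` untouched.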

## Performance Characteristics

- **Throughput target:** Sustain 50,000 events/second per node without dropping events under normal load
- **Latency:** Event-to-canonical conversion target of under 1 ms per event
- **Backpressure:** When the consumer cannot keep up, `TetragonStreamReader` applies backpressure via gRPC flow control; persistent overload triggers event dropping, counted by the `runtime_events_dropped_total` metric
- **Memory:** Pooled buffers for protobuf deserialization to minimize GC pressure
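
Buffer pooling of the kind mentioned under Memory is commonly done with `ArrayPool<byte>` from `System.Buffers`. The helper below is a sketch, not the module's code: `PooledDecodeSketch` and its `decode` callback are hypothetical, and the callback stands in for whatever protobuf parser is actually used.

```csharp
using System;
using System.Buffers;

public static class PooledDecodeSketch
{
    // Copies the payload into a rented buffer, runs the decoder over it,
    // and returns the buffer to the shared pool even if decoding throws.
    public static TResult WithPooledCopy<TResult>(
        ReadOnlySpan<byte> payload,
        Func<ReadOnlyMemory<byte>, TResult> decode)
    {
        // Rent may hand back a larger array than requested.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(payload.Length);
        try
        {
            payload.CopyTo(buffer);
            // Expose only the bytes that belong to this payload.
            return decode(buffer.AsMemory(0, payload.Length));
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

The decoder must not keep a reference to the memory after it returns, because the array goes back to the pool and may be reused for the next event.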

## Observability

- **Metrics:** `runtime_events_received_total{probe_type}`, `runtime_events_converted_total`, `runtime_events_dropped_total`, `runtime_event_conversion_duration_ms`, `hot_symbols_published_total`
- **Logs:** Structured logs with `eventId`, `probeType`, `containerId`, `processName`
- **Health:** gRPC connection status and stream lag exposed for monitoring
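
The metric names listed above could be registered with `System.Diagnostics.Metrics` roughly as follows. The instrument names come from this document; the class name, the meter name, and the choice of counter versus histogram are assumptions made for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public sealed class RuntimeInstrumentationMetricsSketch : IDisposable
{
    private readonly Meter _meter;
    private readonly Counter<long> _received;
    private readonly Counter<long> _converted;
    private readonly Counter<long> _dropped;
    private readonly Counter<long> _hotSymbols;
    private readonly Histogram<double> _conversionMs;

    // The default meter name is hypothetical.
    public RuntimeInstrumentationMetricsSketch(
        string meterName = "StellaOps.RuntimeInstrumentation.Sketch")
    {
        _meter = new Meter(meterName);
        _received = _meter.CreateCounter<long>("runtime_events_received_total");
        _converted = _meter.CreateCounter<long>("runtime_events_converted_total");
        _dropped = _meter.CreateCounter<long>("runtime_events_dropped_total");
        _hotSymbols = _meter.CreateCounter<long>("hot_symbols_published_total");
        _conversionMs = _meter.CreateHistogram<double>(
            "runtime_event_conversion_duration_ms", unit: "ms");
    }

    // probe_type is the only label shown in the metric list above.
    public void EventReceived(string probeType) =>
        _received.Add(1, new KeyValuePair<string, object?>("probe_type", probeType));

    public void EventConverted(double elapsedMilliseconds)
    {
        _converted.Add(1);
        _conversionMs.Record(elapsedMilliseconds);
    }

    public void EventDropped() => _dropped.Add(1);

    public void HotSymbolPublished() => _hotSymbols.Add(1);

    public void Dispose() => _meter.Dispose();
}
```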

## References

- [Module README](./README.md)
- [Signals Architecture](../signals/architecture.md) - RTS scoring consumer
- [Scanner Architecture](../scanner/architecture.md) - Reachability validation