# eBPF Runtime Signal Integration (Probe Management, Type Granularity, and Tier 5 Evidence)
## Module
Signals (with cross-module touchpoints in Scanner and Zastava)
## Status
PARTIALLY_IMPLEMENTED
## Description
The eBPF signals library project exists, with probe, parser, and enrichment infrastructure in place, and runtime signal ingestion is connected to the Unknowns module. The code structure suggests work in progress rather than a production-ready feature. This is the "Tier 5" runtime evidence layer, complementing the existing Tiers 1-4 (static analysis, binary fingerprinting, SBOM-based evidence). The feature scope also covers probe lifecycle management in Zastava and probe-type-aware confidence scoring in Scanner.
## What's Implemented
- **RuntimeSignalCollector**: `src/Signals/__Libraries/StellaOps.Signals.Ebpf/Services/RuntimeSignalCollector.cs` -- collects runtime signals from eBPF probes
- **RuntimeEvidenceCollector**: `src/Signals/__Libraries/StellaOps.Signals.Ebpf/Services/RuntimeEvidenceCollector.cs` -- collects runtime evidence from eBPF events
- **CoreProbeLoader**: `src/Signals/__Libraries/StellaOps.Signals.Ebpf/Probes/CoreProbeLoader.cs` -- loads core eBPF probes
- **AirGapProbeLoader**: `src/Signals/__Libraries/StellaOps.Signals.Ebpf/Probes/AirGapProbeLoader.cs` -- offline/air-gap compatible probe loading
- **EventParser**: `src/Signals/__Libraries/StellaOps.Signals.Ebpf/Parsers/EventParser.cs` -- parses raw eBPF events into structured models
- **RuntimeEventEnricher**: `src/Signals/__Libraries/StellaOps.Signals.Ebpf/Enrichment/RuntimeEventEnricher.cs` -- enriches runtime events with container/SBOM context
- **CgroupContainerResolver**: `src/Signals/__Libraries/StellaOps.Signals.Ebpf/Cgroup/CgroupContainerResolver.cs` -- resolves cgroup paths to container identities
- **RuntimeEvidenceNdjsonWriter**: `src/Signals/__Libraries/StellaOps.Signals.Ebpf/Output/RuntimeEvidenceNdjsonWriter.cs` -- writes evidence in NDJSON format
- **AttestorEvidenceChunkSigner**: `src/Signals/__Libraries/StellaOps.Signals.Ebpf/Signing/AttestorEvidenceChunkSigner.cs` -- signs evidence chunks for attestation
- **DotNetEventPipeAgent**: `src/Signals/StellaOps.Signals.RuntimeAgent/DotNetEventPipeAgent.cs` -- .NET EventPipe agent (production-ready for .NET)
- **Interfaces**: `IRuntimeSignalCollector`, `IEbpfProbeLoader`, `IContainerIdentityResolver`, `IContainerStateProvider`, `IImageDigestResolver`, `ISbomComponentProvider`
- **Scanner Runtime Trace Ingestion**: `src/Scanner/__Libraries/StellaOps.Scanner.Runtime/Ingestion/TraceIngestionService.cs` -- ingests runtime traces
- **Scanner Witness Infrastructure**:
  - `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/Witnesses/RuntimeObservation.cs` -- runtime-observed function invocations (timestamp, function signature, process context), but currently without a `ProbeType` discriminator
  - `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/Witnesses/PathWitness.cs` -- combines static call-graph paths with runtime observations
  - `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/Witnesses/WitnessDsseSigner.cs` -- signs runtime witness predicates for attestation
  - `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/Witnesses/WitnessPredicateBuilder.cs` -- builds DSSE-signable witness predicates from runtime observations
- **Zastava Probe Manager**: `src/Zastava/StellaOps.Zastava.Observer/Probes/EbpfProbeManager.cs` -- implements `IProbeManager` and `IAsyncDisposable`; manages eBPF probe lifecycle with `OnContainerStartAsync`/stop hooks; uses `IRuntimeSignalCollector` and `ISignalPublisher`; tracks active probe handles via `ConcurrentDictionary<string, SignalCollectionHandle>`; configurable via `EbpfProbeManagerOptions`
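
The probe manager bullet above describes a handle map keyed by container with start/stop hooks. A minimal, language-neutral sketch of that lifecycle follows, written in Python for brevity. `FakeCollector`, `CollectionHandle`, and `ProbeManager` are hypothetical stand-ins for `IRuntimeSignalCollector`, `SignalCollectionHandle`, and `EbpfProbeManager`; the real classes are asynchronous and their exact signatures are not shown in this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class CollectionHandle:
    """Hypothetical stand-in for SignalCollectionHandle."""
    container_id: str
    session_id: int


class FakeCollector:
    """Hypothetical stand-in for IRuntimeSignalCollector."""

    def __init__(self) -> None:
        self._next_id = 0
        self.active: Set[int] = set()

    def start(self, container_id: str) -> CollectionHandle:
        self._next_id += 1
        self.active.add(self._next_id)
        return CollectionHandle(container_id, self._next_id)

    def stop(self, handle: CollectionHandle) -> None:
        self.active.discard(handle.session_id)


class ProbeManager:
    """Tracks one collection handle per container (assumed policy)."""

    def __init__(self, collector: FakeCollector) -> None:
        self._collector = collector
        self._handles: Dict[str, CollectionHandle] = {}

    def on_container_start(self, container_id: str) -> bool:
        # Idempotent: a second start for the same container is ignored.
        if container_id in self._handles:
            return False
        self._handles[container_id] = self._collector.start(container_id)
        return True

    def on_container_stop(self, container_id: str) -> bool:
        # Remove the handle first so a repeated stop is a no-op.
        handle = self._handles.pop(container_id, None)
        if handle is None:
            return False
        self._collector.stop(handle)
        return True

    def dispose(self) -> None:
        # Detach everything that is still tracked.
        for container_id in list(self._handles):
            self.on_container_stop(container_id)

    @property
    def active_containers(self) -> List[str]:
        return sorted(self._handles)
```

Making both hooks idempotent is a design choice in this sketch, not a documented property of `EbpfProbeManager`; it keeps duplicate container events from leaking or double-stopping handles.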
## What's Missing
### Signals (core infrastructure)
- **Production-grade kernel probe deployment**: No production deployment automation (probe installation, lifecycle management, Helm charts, systemd units)
- **Kernel-level function entry/exit tracing**: No BTF-backed function entry/exit tracing with accurate call stacks at scale
- **Performance SLA compliance**: No benchmarking proving low overhead under production workload
- **Kernel version compatibility matrix**: No detection and fallback strategies for different kernel versions
- **Cross-platform runtime agents**: Runtime agents beyond .NET (Java JVMTI, Go delve, Python sys.settrace, Node.js native) are not yet built
- **Runtime backport detection**: No logic comparing runtime traces against known-patched function signatures
- **Integration testing**: No integration tests with multiple container runtimes (containerd, CRI-O, Podman)
- **Production monitoring**: No dashboards and alerting for probe health
### Scanner (probe type granularity)
- **ProbeType Enum**: No `ProbeType` enum (Kprobe, Uprobe, Tracepoint, Usdt, Fentry, RawTracepoint) defined on or associated with `RuntimeObservation`
- **Probe-Aware Confidence Scoring**: Reachability confidence scoring does not differentiate based on probe attachment type (e.g., uprobe on a specific function is higher fidelity than a kprobe on a syscall)
- **ProbeType Propagation**: The Signals.Ebpf pipeline does not tag observations with their originating probe type before forwarding to the scanner
- **Predicate Schema Update**: Witness DSSE predicates do not include probeType in their signed payload schema
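
To illustrate the predicate schema gap above, the sketch below shows one way a `probeType` field could sit inside the signed payload so that the signature covers it. It is written in Python as a language-neutral illustration. The field names `functionSignature` and `observedAt`, the payload type, and the HMAC "signature" are all assumptions made for the example; the actual schema and key handling in `WitnessPredicateBuilder` and `WitnessDsseSigner` are not described in this document. Only the pre-authentication encoding (PAE) follows the DSSE specification.

```python
import hashlib
import hmac
import json
from typing import Optional

# Assumed payload type; the project's real DSSE payload type is not stated here.
PAYLOAD_TYPE = "application/vnd.in-toto+json"


def build_predicate(function_signature: str, observed_at: str,
                    probe_type: Optional[str] = None) -> dict:
    """Builds a hypothetical witness predicate; probeType defaults to Unknown."""
    return {
        "functionSignature": function_signature,
        "observedAt": observed_at,
        "probeType": probe_type if probe_type is not None else "Unknown",
    }


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: DSSEv1 SP len SP type SP len SP body."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


def sign(predicate: dict, key: bytes) -> str:
    """HMAC-SHA256 stand-in for a real signer, computed over the PAE bytes."""
    # Sorted keys and compact separators give a stable byte representation.
    payload = json.dumps(predicate, sort_keys=True,
                         separators=(",", ":")).encode("utf-8")
    return hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).hexdigest()
```

Because `probeType` is part of the serialized payload, changing it from `Uprobe` to `Kprobe` yields a different signature, which is the property the E2E plan asks to verify.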
### Zastava (probe lifecycle management)
- No tests for EbpfProbeManager
- No integration with the Observer's `ContainerLifecycleHostedService` to automatically attach/detach probes
- No eBPF probe configuration UI or CLI
- Limited probe types (needs expansion for different kernel hook points)
- No probe health monitoring or failure recovery
## Implementation Plan
### Phase 1: Core production readiness (Signals)
- Benchmark eBPF probe overhead in production-like environments with performance SLAs
- Implement kernel version detection and compatibility matrix with fallback strategies
- Add integration tests for containerd, CRI-O, and Podman container runtimes
- Implement probe lifecycle management (hot-reload, graceful degradation)
- Production deployment automation with Helm charts and systemd units
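
The kernel version detection and fallback item in Phase 1 could take a shape like the following Python sketch. The strategy names and the version thresholds are illustrative assumptions: 5.5 and 4.1 are commonly cited upstream minimums for fentry and kprobe BPF attachment, but distribution backports make release numbers unreliable, so a real compatibility matrix would need feature probing on target kernels.

```python
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extracts (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def select_strategy(release: str, has_btf: bool) -> str:
    """Picks a hypothetical attachment strategy with graceful fallback."""
    version = parse_kernel_release(release)
    # Assumed threshold: fentry needs BPF trampolines plus kernel BTF.
    if version >= (5, 5) and has_btf:
        return "fentry"
    # Assumed threshold: fall back to kprobes on older or BTF-less kernels.
    if version >= (4, 1):
        return "kprobe"
    # Too old for eBPF probes: disable kernel-level collection.
    return "disabled"
```

On a live host the inputs could come from `platform.release()` and a check for `/sys/kernel/btf/vmlinux`, for example `select_strategy(platform.release(), os.path.exists("/sys/kernel/btf/vmlinux"))`.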
### Phase 2: Probe type granularity (Scanner)
1. Define `ProbeType` enum in `StellaOps.Scanner.Reachability/Witnesses/` with values: Kprobe, Uprobe, Tracepoint, Usdt, Fentry, RawTracepoint, Unknown
2. Add optional `ProbeType` property to `RuntimeObservation`
3. Update `Signals.Ebpf` pipeline to tag observations with their originating probe type
4. Update `WitnessPredicateBuilder` to include probeType in signed predicates
5. Update reachability confidence scoring to apply probe-type-aware weights (uprobe > tracepoint > kprobe)
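
The five steps above can be sketched as follows, in Python as a language-neutral illustration of what would be C# types. The enum values and the ordering uprobe > tracepoint > kprobe come from this plan; the numeric weights, the weights for `Usdt`, `Fentry`, `RawTracepoint`, and `Unknown`, and the max-based aggregation are assumptions made only for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class ProbeType(Enum):
    KPROBE = "Kprobe"
    UPROBE = "Uprobe"
    TRACEPOINT = "Tracepoint"
    USDT = "Usdt"
    FENTRY = "Fentry"
    RAW_TRACEPOINT = "RawTracepoint"
    UNKNOWN = "Unknown"


@dataclass(frozen=True)
class RuntimeObservation:
    """Simplified stand-in for the Scanner's RuntimeObservation."""
    function_signature: str
    observed_at: str
    probe_type: Optional[ProbeType] = None  # optional for backward compatibility

    @property
    def effective_probe_type(self) -> ProbeType:
        # Observations recorded before the field existed default to Unknown.
        return ProbeType.UNKNOWN if self.probe_type is None else self.probe_type


# Hypothetical weights. Only uprobe > tracepoint > kprobe is from the plan;
# the remaining entries borrow the weight of their closest relative.
PROBE_WEIGHTS = {
    ProbeType.UPROBE: 1.0,
    ProbeType.USDT: 1.0,
    ProbeType.TRACEPOINT: 0.8,
    ProbeType.RAW_TRACEPOINT: 0.8,
    ProbeType.KPROBE: 0.6,
    ProbeType.FENTRY: 0.6,
    ProbeType.UNKNOWN: 0.5,
}


def runtime_confidence(observations: Iterable[RuntimeObservation]) -> float:
    """Returns the weight of the strongest observation, or 0.0 if there are none."""
    return max(
        (PROBE_WEIGHTS[o.effective_probe_type] for o in observations),
        default=0.0,
    )
```

Taking the maximum means a single high-fidelity uprobe hit is not diluted by lower-fidelity kprobe hits on the same function; other aggregation rules are equally plausible.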
### Phase 3: Probe management (Zastava)
- Add unit tests for EbpfProbeManager lifecycle (attach/detach/dispose)
- Integrate with ContainerLifecycleHostedService for automatic probe management
- Expand probe types for syscall, network, and filesystem observation
- Add probe health monitoring with automatic reattachment on failure
- Add CLI/API for probe configuration management
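
For the health monitoring item above, one possible shape is a periodic check that retries reattachment a bounded number of times before marking a probe degraded. The Python sketch below is illustrative only: `ProbeHealthMonitor`, its status strings, and the retry limit are hypothetical and do not correspond to existing Zastava types.

```python
from typing import Callable, Dict, Iterable


class ProbeHealthMonitor:
    """Checks probes and reattaches failed ones, up to max_attempts each."""

    def __init__(self, is_healthy: Callable[[str], bool],
                 reattach: Callable[[str], bool], max_attempts: int = 3) -> None:
        self._is_healthy = is_healthy
        self._reattach = reattach
        self._max_attempts = max_attempts
        self._failures: Dict[str, int] = {}

    def check(self, probe_ids: Iterable[str]) -> Dict[str, str]:
        results: Dict[str, str] = {}
        for probe_id in probe_ids:
            if self._is_healthy(probe_id):
                # A healthy probe clears its failure history.
                self._failures.pop(probe_id, None)
                results[probe_id] = "healthy"
                continue
            attempts = self._failures.get(probe_id, 0)
            if attempts >= self._max_attempts:
                # Stop retrying; leave the decision to operators or alerting.
                results[probe_id] = "degraded"
                continue
            self._failures[probe_id] = attempts + 1
            ok = self._reattach(probe_id)
            results[probe_id] = "reattached" if ok else "failed"
        return results
```

Injecting `is_healthy` and `reattach` as callables keeps the retry policy testable without a kernel; a hosted service would call `check` on a timer and publish the resulting statuses.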
### Phase 4: Extended runtime agents
- Add runtime backport detection comparing traces against patched function signatures
- Implement cross-platform runtime agents for Java, Go, Python
- Add production monitoring dashboards and alerting
## E2E Test Plan
- [ ] Collect runtime observations from a uprobe-attached function and verify the ProbeType field is set to `Uprobe`
- [ ] Collect runtime observations from a kprobe-attached syscall and verify the ProbeType field is set to `Kprobe`
- [ ] Verify reachability confidence scoring assigns higher weight to uprobe observations than kprobe observations
- [ ] Verify the witness DSSE predicate payload includes the probeType field and the signature covers it
- [ ] Verify backward compatibility: observations without ProbeType default to `Unknown`
- [ ] Verify ProbeType is preserved through the full pipeline: eBPF collection -> signal forwarding -> scanner ingestion -> witness predicate -> reachability score
- [ ] Verify EbpfProbeManager attaches probes on container start and detaches on container stop
- [ ] Verify probe health monitoring detects failed probes and triggers reattachment
## Related Documentation
- Source: See feature catalog
- Architecture: `docs/modules/scanner/architecture.md`
## Merged From
- `signals/tier-5-runtime-trace-evidence.md` (previously merged)
- `scanner/ebpf-probe-type-granularity.md` (merged -- probe type granularity for scanner witness infrastructure)
- `zastava/ebpf-probe-manager.md` (merged -- eBPF probe lifecycle management in Zastava observer)