Here’s a compact, end‑to‑end design you can drop into a repo: a **cross‑platform call‑stack analyzer** plus an **offline capture/replay pipeline** with provable symbol provenance—built to behave the same on Linux, Windows, and macOS, and to pass strict CI acceptance tests. --- # What this solves (quick context) * **Problem:** stack unwinding differs by OS, binary format, runtime (signals/async/coroutines), and symbol sources—making incident triage noisy and non‑reproducible. * **Goal:** one analyzer that **normalizes unwinding invariants**, **records traces**, **resolves symbols offline**, and **replays** to verify determinism and coverage—useful for Stella Ops evidence capture and air‑gapped flows. --- # Unwinding model (portable) * **Primary CFI:** DWARF `.eh_frame` / `.debug_frame` (Linux/macOS), `.pdata` / unwind info (Windows). * **IDs for symbol lookup:** * Linux: **ELF build‑id** (`.note.gnu.build-id`) * macOS: **Mach‑O UUID** (dSYM) * Windows: **PDB GUID+Age** * **Fallback chain per frame (strict order, record provenance):** 1. CFI/CIE lookup (libunwind/LLVM, DIA on Windows, Apple DWARF tools) 2. **Frame‑pointer** walk if available 3. **Language/runtime helpers** (e.g., Go, Rust, JVM, .NET where present) 4. **Heuristic last‑resort** (conservative unwind, stop on ambiguity) * **Async/signal/coroutines:** stitch segments by reading runtime metadata and signal trampolines, then join on saved contexts; tag boundaries so replay can validate. * **Kernel/eBPF contexts (Linux):** optional BTF‑assisted unwind for kernel frames when traces cross user/kernel boundary. --- # Offline symbol bundles (content‑addressed) **Required bundle contents (per‑OS id map + index):** * **Content‑addressed index** (sha256 keys) * **Per‑OS mapping:** * Linux: **build‑id → path/blob** * Windows: **PDB GUID+Age → PDB blob** * macOS: **UUID → dSYM blob** * **`symbol_index.json`** (addr → file:line + function) * **DSSE signature** (+ signer) * **Rekor inclusion proof** or embedded tile fragment (for transparency) **Acceptance rules:** * `symbol_coverage_pct ≥ 90%` per trace (resolver chain: debuginfod → local bundle → heuristic demangle) * Replay across 5 seeds: `replay_success_ratio ≥ 0.95` * DSSE + Rekor proofs verify **offline** * Platform checks: * **ELF build‑id** matches binary note * **PDB GUID+Age** matches module metadata * **dSYM UUID** matches Mach‑O UUID --- # Minimal Postgres schema (ready to run) ```sql CREATE TABLE traces( trace_id UUID PRIMARY KEY, platform TEXT, captured_at TIMESTAMP, build_id TEXT, symbol_bundle_sha256 TEXT, dsse_ref TEXT ); CREATE TABLE frames( trace_id UUID REFERENCES traces, frame_index INT, ip BIGINT, module_path TEXT, module_build_id TEXT, resolved_symbol TEXT, symbol_offset BIGINT, resolver TEXT, PRIMARY KEY(trace_id, frame_index) ); CREATE TABLE symbol_bundles( sha256 TEXT PRIMARY KEY, os TEXT, bundle_blob BYTEA, index_json JSONB, signer TEXT, rekor_tile_ref TEXT ); CREATE TABLE replays( replay_id UUID PRIMARY KEY, trace_id UUID REFERENCES traces, seed BIGINT, started_at TIMESTAMP, finished_at TIMESTAMP, replay_success_ratio FLOAT, verify_time_ms INT, verifier_version TEXT, notes JSONB ); ``` --- # Event payloads (wire format) ```json {"event":"trace.capture","trace_id":"...","platform":"linux","build_id":"","frames":[{"ip":"0x..","module":"/usr/bin/foo","module_build_id":""}],"symbol_bundle_ref":"sha256:...","dsse_ref":"dsse:..."} {"event":"replay.result","replay_id":"...","trace_id":"...","seed":42,"replay_success_ratio":0.98,"symbol_coverage_pct":93,"verify_time_ms":8423} ``` --- # Resolver policy (per‑OS, enforced) * **Linux:** debuginfod → local bundle (build‑id) → DWARF CFI → FP → heuristic demangle * **Windows:** local bundle (PDB GUID+Age via DIA) → .pdata unwind → FP → demangle * **macOS:** local bundle (dSYM UUID) → DWARF CFI → FP → demangle Record **`resolver`** used on every frame. --- # CI acceptance scripts (tiny but strict) * Run capture → resolve → replay across 5 seeds; fail merge if any SLO unmet. * Verify DSSE signature and Rekor inclusion offline. * Assert per‑platform ID matches (build‑id / GUID+Age / UUID). * Emit a short JUnit‑style report plus `% coverage` and `% success`. --- # Implementation notes (drop‑in) * Use **libunwind/LLVM** (Linux/macOS), **DIA SDK** (Windows). * Add small shims for **signal trampolines** and **runtime helpers** (Go/Rust/JVM/.NET) when present. * Protobuf or JSON Lines for event logs; gzip + content‑address everything (sha256). * Store **provenance per frame** (`resolver`, source, bundle hash). * Provide a tiny **CLI**: * `trace-capture --with-btf --pid ...` * `trace-resolve --bundle sha256:...` * `trace-replay --trace ... --seeds 5` * `trace-verify --bundle sha256:... --dsse --rekor` --- # Why this fits your stack (Stella Ops) * **Air‑gap/attestation first:** DSSE, Rekor tile fragments, offline verification—aligns with your evidence model. * **Deterministic evidence:** replayable traces with SLOs → reliable RCA artifacts you can store beside SBOM/VEX. * **Provenance:** per‑frame resolver trail supports auditor queries (“how was this line derived?”). --- # Next steps (ready‑made tasks) * Add a **SymbolBundleBuilder** job to produce DSSE‑signed bundles per release. * Integrate **Capture→Resolve→Replay** into CI and gate merges on SLOs above. * Expose a **Stella Ops Evidence card**: coverage%, success ratio, verifier version, and links to frames. If you want, I’ll generate a starter repo (CLI skeleton, DSSE/Rekor validators, Postgres migrations, CI workflow, and a tiny sample bundle) so you can try it immediately.