2026-03-04 - Unified call-stack analyzer and micro-witness schema


Here's a compact, end-to-end design you can drop into a repo: a cross-platform call-stack analyzer plus an offline capture/replay pipeline with provable symbol provenance. It is built to behave the same on Linux, Windows, and macOS and to pass strict CI acceptance tests.


What this solves (quick context)

  • Problem: stack unwinding differs by OS, binary format, runtime (signals, async, coroutines), and symbol source, which makes incident triage noisy and non-reproducible.
  • Goal: one analyzer that normalizes unwinding invariants, records traces, resolves symbols offline, and replays traces to verify determinism and coverage. This is useful for Stella Ops evidence capture and air-gapped flows.

Unwinding model (portable)

  • Primary CFI: DWARF .eh_frame / .debug_frame (Linux/macOS), .pdata / unwind info (Windows).

  • IDs for symbol lookup:

    • Linux: ELF build-id (.note.gnu.build-id)
    • macOS: Mach-O UUID (dSYM)
    • Windows: PDB GUID+Age
  • Fallback chain per frame (strict order, record provenance):

    1. CFI/CIE lookup (libunwind/LLVM, DIA on Windows, Apple DWARF tools)
    2. Frame-pointer walk, if available
    3. Language/runtime helpers (e.g., Go, Rust, JVM, .NET where present)
    4. Heuristic last resort (conservative unwind, stop on ambiguity)
  • Async/signal/coroutines: stitch segments by reading runtime metadata and signal trampolines, then join on saved contexts; tag boundaries so replay can validate.

  • Kernel/eBPF contexts (Linux): optional BTF-assisted unwind for kernel frames when traces cross the user/kernel boundary.
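Of the four fallback steps, the frame-pointer walk (step 2) is the simplest to show concretely. The sketch below is illustrative only: it assumes the common x86-64 / AArch64 frame-record layout (caller's saved frame pointer at FP, return address at FP+8), and `read_u64` is a hypothetical accessor over captured stack memory. Following the "stop on ambiguity" rule, it ends the walk on a misaligned or non-increasing frame pointer or on memory that was not captured, and it tags every frame with the step that produced it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Frame:
    ip: int
    resolver: str  # provenance tag recorded for every frame


def fp_walk(ip: int, fp: int, read_u64: Callable[[int], Optional[int]],
            max_frames: int = 256) -> List[Frame]:
    """Walk frame records: saved FP at fp, return address at fp + 8 (assumed layout)."""
    frames = [Frame(ip, "context")]  # innermost IP comes from the captured register context
    while fp != 0 and len(frames) < max_frames:
        if fp % 8 != 0:              # misaligned frame pointer: ambiguous, stop
            break
        saved_fp = read_u64(fp)
        ret_addr = read_u64(fp + 8)
        if saved_fp is None or ret_addr is None or ret_addr == 0:
            break                    # memory not captured, or end-of-chain marker
        frames.append(Frame(ret_addr, "fp"))
        if saved_fp != 0 and saved_fp <= fp:
            break                    # callers live at higher addresses; anything else is ambiguous
        fp = saved_fp
    return frames
```

With a plain dict standing in for captured memory, `fp_walk(ip, fp, memory.get)` returns the frames found before the first ambiguity; each `resolver` tag could then be persisted alongside the frame so replay can check which step produced it.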


Offline symbol bundles (content-addressed)

Required bundle contents (per-OS ID map + index):

  • Content-addressed index (sha256 keys)

  • Per-OS mapping:

    • Linux: build-id → path/blob
    • Windows: PDB GUID+Age → PDB blob
    • macOS: UUID → dSYM blob
  • symbol_index.json (addr → file:line + function)

  • DSSE signature (+ signer)

  • Rekor inclusion proof or embedded tile fragment (for transparency)
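Both proofs a bundle carries can be checked without network access. A minimal sketch, assuming standard DSSE v1 envelopes (`payload`, `payloadType`, `signatures[].keyid/sig`) and a log that uses RFC 6962 / RFC 9162 Merkle hashing, the scheme Rekor's transparency log is built on. Signature checking is delegated to a caller-supplied `verify` callback (a hypothetical hook for pinned offline keys), and verifying the signed checkpoint that vouches for `root` is out of scope here.

```python
import base64
import hashlib
import json
from typing import Callable, List

# --- DSSE: signatures cover the pre-authentication encoding (PAE), not the raw payload ---

def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSEv1 PAE: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


# (keyid, message, signature) -> ok; supplied by the caller from pinned offline keys.
Verifier = Callable[[str, bytes, bytes], bool]


def verify_envelope(envelope_json: str, verify: Verifier, expected_type: str) -> bytes:
    """Return the payload if any signature verifies over PAE, else raise."""
    env = json.loads(envelope_json)
    if env["payloadType"] != expected_type:
        raise ValueError("unexpected payloadType")
    payload = base64.b64decode(env["payload"])
    message = pae(env["payloadType"], payload)
    for sig in env.get("signatures", []):
        if verify(sig.get("keyid", ""), message, base64.b64decode(sig["sig"])):
            return payload
    raise ValueError("no signature verified")

# --- Transparency log: RFC 9162 (section 2.1.3.2) Merkle inclusion proof ---

def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(entry: bytes, leaf_index: int, tree_size: int,
                     proof: List[bytes], root: bytes) -> bool:
    """Recompute the root from a leaf and its audit path; compare to the trusted root."""
    if leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf_hash(entry)
    for p in proof:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            # Climb past levels where the node is rightmost and has no sibling.
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

In a real deployment `verify` would wrap an asymmetric signature check (for example ECDSA or Ed25519 from a crypto library) against keys shipped with the bundle; the inclusion check then ties the log entry to a tree root the verifier already trusts.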

Acceptance rules:

  • symbol_coverage_pct ≥ 90% per trace (resolver chain: debuginfod → local bundle → heuristic demangle)

  • Replay across 5 seeds: replay_success_ratio ≥ 0.95

  • DSSE + Rekor proofs verify offline

  • Platform checks:

    • ELF build-id matches the binary's .note.gnu.build-id
    • PDB GUID+Age matches module metadata
    • dSYM UUID matches the Mach-O UUID
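The ELF platform check reduces to a string comparison once the note is parsed. A sketch, assuming you already have the raw bytes of the binary's `.note.gnu.build-id` section and the usual 4-byte note alignment; locating the section inside the ELF file is left out, and `parse_gnu_build_id` / `build_id_matches` are hypothetical helper names.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type used by .note.gnu.build-id


def parse_gnu_build_id(section: bytes, little_endian: bool = True) -> Optional[str]:
    """Return the build-id (lowercase hex) from raw .note.gnu.build-id bytes, or None."""
    fmt = "<III" if little_endian else ">III"
    off = 0
    while off + 12 <= len(section):
        # ELF note header: n_namesz, n_descsz, n_type (4 bytes each)
        namesz, descsz, ntype = struct.unpack_from(fmt, section, off)
        off += 12
        name = section[off:off + namesz]
        off += (namesz + 3) & ~3   # name is padded to 4-byte alignment
        desc = section[off:off + descsz]
        off += (descsz + 3) & ~3   # so is the descriptor
        if len(desc) != descsz:
            return None            # truncated note
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def build_id_matches(section: bytes, recorded: str) -> bool:
    """Platform check: the bundle's recorded build-id must equal the binary's note."""
    actual = parse_gnu_build_id(section)
    return actual is not None and actual == recorded.strip().lower()
```

The PDB GUID+Age and Mach-O UUID checks follow the same pattern: extract the identifier from the module, normalize its textual form, and compare it with what the bundle recorded.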

Minimal Postgres schema (ready to run)

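```sql
-- One row per captured stack trace.
CREATE TABLE traces(
  trace_id UUID PRIMARY KEY,
  platform TEXT,                 -- 'linux' | 'windows' | 'macos'
  captured_at TIMESTAMPTZ,       -- with time zone, so captures from different hosts compare correctly
  build_id TEXT,                 -- ELF build-id, PDB GUID+Age, or Mach-O UUID of the main module
  symbol_bundle_sha256 TEXT,     -- content address of the bundle used for resolution
  dsse_ref TEXT                  -- reference to the DSSE envelope for this capture
);

-- One row per unwound frame; frame_index preserves stack order.
CREATE TABLE frames(
  trace_id UUID REFERENCES traces,
  frame_index INT,
  ip BIGINT,                     -- unsigned 64-bit address stored as signed two's complement
  module_path TEXT,
  module_build_id TEXT,
  resolved_symbol TEXT,
  symbol_offset BIGINT,
  resolver TEXT,                 -- provenance: which resolver-chain step produced the symbol
  PRIMARY KEY(trace_id, frame_index)
);

-- Content-addressed, signed symbol bundles.
CREATE TABLE symbol_bundles(
  sha256 TEXT PRIMARY KEY,       -- content address referenced by traces.symbol_bundle_sha256
  os TEXT,
  bundle_blob BYTEA,
  index_json JSONB,              -- symbol_index.json (addr -> file:line + function)
  signer TEXT,                   -- DSSE signer identity
  rekor_tile_ref TEXT            -- Rekor inclusion proof or tile fragment reference
);

-- One row per replay run (one per seed).
CREATE TABLE replays(
  replay_id UUID PRIMARY KEY,
  trace_id UUID REFERENCES traces,
  seed BIGINT,
  started_at TIMESTAMPTZ,
  finished_at TIMESTAMPTZ,
  replay_success_ratio FLOAT,    -- FLOAT is double precision in Postgres
  symbol_coverage_pct REAL,      -- mirrors the field in replay.result events
  verify_time_ms INT,
  verifier_version TEXT,
  notes JSONB
);
```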

Event payloads (wire format)

```json
{"event":"trace.capture","trace_id":"...","platform":"linux","build_id":"<gnu-build-id>","frames":[{"ip":"0x..","module":"/usr/bin/foo","module_build_id":"<id>"}],"symbol_bundle_ref":"sha256:...","dsse_ref":"dsse:..."}
{"event":"replay.result","replay_id":"...","trace_id":"...","seed":42,"replay_success_ratio":0.98,"symbol_coverage_pct":93,"verify_time_ms":8423}
```
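Loading a `trace.capture` event into the `frames` table needs one conversion: instruction pointers arrive as hex strings and are unsigned 64-bit values, while Postgres `BIGINT` is signed, so addresses at or above 2^63 (kernel frames on x86-64, for instance) do not fit directly. A sketch, assuming the field names in the payload above and that array order gives `frame_index`; storing the two's-complement reinterpretation is one option, and a `NUMERIC(20,0)` column would be another. `to_signed64` and `frame_rows` are hypothetical names.

```python
import json
from typing import List, Tuple

U64_MASK = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed, so it fits a Postgres BIGINT."""
    value &= U64_MASK
    return value - (1 << 64) if value >= (1 << 63) else value


def frame_rows(event_line: str) -> List[Tuple[str, int, int, str, str]]:
    """Map one trace.capture event to (trace_id, frame_index, ip, module_path, module_build_id)."""
    event = json.loads(event_line)
    if event.get("event") != "trace.capture":
        raise ValueError("expected a trace.capture event")
    rows = []
    for index, frame in enumerate(event["frames"]):
        ip = to_signed64(int(frame["ip"], 16))  # int() accepts the 0x prefix with base 16
        rows.append((event["trace_id"], index, ip, frame["module"], frame["module_build_id"]))
    return rows
```

Reading a row back, `ip & U64_MASK` recovers the original unsigned address.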

Resolver policy (per-OS, enforced)

  • Linux: debuginfod → local bundle (build-id) → DWARF CFI → FP → heuristic demangle
  • Windows: local bundle (PDB GUID+Age via DIA) → .pdata unwind → FP → demangle
  • macOS: local bundle (dSYM UUID) → DWARF CFI → FP → demangle

Record the resolver used on every frame.
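The per-OS policy can be enforced as an ordered table: try each resolver in the platform's order, stop at the first hit, and record which step answered. In this sketch the step names (`debuginfod`, `bundle`, `dwarf_cfi`, `pdata`, `fp`, `demangle`) are hypothetical labels, resolvers are plain callables supplied by the caller, and the local-bundle resolver assumes `symbol_index.json` can be flattened into `(start, size, name)` ranges and searched with `bisect`.

```python
import bisect
from typing import Callable, Dict, List, Optional, Tuple

# Resolver order per platform, mirroring the policy above (step names are illustrative).
POLICY: Dict[str, List[str]] = {
    "linux": ["debuginfod", "bundle", "dwarf_cfi", "fp", "demangle"],
    "windows": ["bundle", "pdata", "fp", "demangle"],
    "macos": ["bundle", "dwarf_cfi", "fp", "demangle"],
}

Resolver = Callable[[int], Optional[str]]


def make_bundle_resolver(symbols: List[Tuple[int, int, str]]) -> Resolver:
    """Build a lookup over (start, size, name) ranges taken from a symbol index."""
    table = sorted(symbols)
    starts = [start for start, _, _ in table]

    def resolve(ip: int) -> Optional[str]:
        i = bisect.bisect_right(starts, ip) - 1  # last range starting at or below ip
        if i >= 0:
            start, size, name = table[i]
            if ip < start + size:
                return f"{name}+{ip - start:#x}"
        return None

    return resolve


def resolve_frame(platform: str, ip: int,
                  resolvers: Dict[str, Resolver]) -> Tuple[Optional[str], str]:
    """Return (symbol, resolver_used); resolvers outside the platform policy are ignored."""
    for step in POLICY[platform]:
        resolver = resolvers.get(step)
        if resolver is None:
            continue
        symbol = resolver(ip)
        if symbol is not None:
            return symbol, step
    return None, "unresolved"
```

Because the Windows policy has no `debuginfod` step, a debuginfod resolver passed in on Windows is never consulted; the returned step name is what would go into the `frames.resolver` column.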

CI acceptance scripts (tiny but strict)

  • Run capture → resolve → replay across 5 seeds; fail the merge if any SLO is unmet.
  • Verify the DSSE signature and Rekor inclusion offline.
  • Assert per-platform ID matches (build-id / GUID+Age / UUID).
  • Emit a short JUnit-style report plus % coverage and % success.
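A CI gate for these SLOs fits in a few lines. This sketch reads the replay rule as "every one of the 5 seeds reaches 0.95" (an interpretation; an average across seeds would be looser), computes coverage as the share of frames whose recorded resolver is not `unresolved` (a hypothetical marker), and emits a minimal JUnit-style `<testsuite>` element. DSSE, Rekor, and platform-ID checks would be added as further test cases in the same way.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

COVERAGE_MIN_PCT = 90.0   # symbol_coverage_pct >= 90% per trace
REPLAY_MIN_RATIO = 0.95   # replay_success_ratio >= 0.95
SEEDS = 5


def coverage_pct(frame_resolvers: List[str]) -> float:
    """Percentage of frames whose recorded resolver is not 'unresolved'."""
    if not frame_resolvers:
        return 0.0
    resolved = sum(1 for r in frame_resolvers if r != "unresolved")
    return 100.0 * resolved / len(frame_resolvers)


def junit_report(trace_id: str, frame_resolvers: List[str],
                 replay_ratios: Dict[int, float]) -> Tuple[str, bool]:
    """Return (JUnit XML, all_passed) for one trace; replay_ratios maps seed -> ratio."""
    cov = coverage_pct(frame_resolvers)
    worst = min(replay_ratios.values(), default=0.0)
    checks = [
        ("symbol_coverage_pct", cov >= COVERAGE_MIN_PCT,
         f"coverage {cov:.1f}% is below {COVERAGE_MIN_PCT}%"),
        ("replay_seed_count", len(replay_ratios) == SEEDS,
         f"expected {SEEDS} seeds, got {len(replay_ratios)}"),
        ("replay_success_ratio", worst >= REPLAY_MIN_RATIO,
         f"worst seed ratio {worst:.2f} is below {REPLAY_MIN_RATIO}"),
    ]
    failures = sum(1 for _, ok, _ in checks if not ok)
    suite = ET.Element("testsuite", name=f"trace-{trace_id}",
                       tests=str(len(checks)), failures=str(failures))
    for name, ok, message in checks:
        case = ET.SubElement(suite, "testcase", name=name)
        if not ok:
            ET.SubElement(case, "failure", message=message)
    return ET.tostring(suite, encoding="unicode"), failures == 0
```

A CI job could call this per trace, write the XML where its test-report collector expects it, and exit non-zero when `all_passed` is false.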

Implementation notes (drop-in)

  • Use libunwind/LLVM (Linux/macOS), DIA SDK (Windows).

  • Add small shims for signal trampolines and runtime helpers (Go/Rust/JVM/.NET) when present.

  • Protobuf or JSON Lines for event logs; gzip and content-address everything (sha256).

  • Store provenance per frame (resolver, source, bundle hash).

  • Provide a tiny CLI:

    • trace-capture --with-btf --pid ...
    • trace-resolve --bundle sha256:...
    • trace-replay --trace ... --seeds 5
    • trace-verify --bundle sha256:... --dsse --rekor
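Content addressing only works if identical content yields identical bytes, and gzip embeds a modification time by default. A minimal sketch, assuming JSON Lines event logs: each event is serialized canonically (sorted keys, compact separators), the address is the sha256 of the uncompressed bytes, and the blob is compressed with `mtime=0`. `pack_events` is a hypothetical name.

```python
import gzip
import hashlib
import json
from typing import Iterable, Tuple


def pack_events(events: Iterable[dict]) -> Tuple[str, bytes]:
    """Return (content address, gzip blob) for a JSON Lines event log."""
    # Canonical form: sorted keys, compact separators, one event per line.
    canonical = "".join(
        json.dumps(event, sort_keys=True, separators=(",", ":")) + "\n" for event in events
    ).encode("utf-8")
    address = "sha256:" + hashlib.sha256(canonical).hexdigest()
    blob = gzip.compress(canonical, mtime=0)  # fixed mtime: no wall-clock bytes in the header
    return address, blob
```

Hashing the uncompressed bytes keeps the address stable even if the compression level or zlib version changes; hashing the compressed blob instead would let storage verify objects without decompressing them. Either works as long as it is applied consistently.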

Why this fits your stack (Stella Ops)

  • Air-gap and attestation first: DSSE, Rekor tile fragments, and offline verification align with your evidence model.
  • Deterministic evidence: replayable traces gated by SLOs become reliable RCA artifacts you can store beside SBOM/VEX.
  • Provenance: the per-frame resolver trail supports auditor queries (“how was this line derived?”).

Next steps (ready-made tasks)

  • Add a SymbolBundleBuilder job to produce DSSE-signed bundles per release.
  • Integrate Capture → Resolve → Replay into CI and gate merges on the SLOs above.
  • Expose a Stella Ops Evidence card: coverage %, success ratio, verifier version, and links to frames.

If you want, I'll generate a starter repo (CLI skeleton, DSSE/Rekor validators, Postgres migrations, CI workflow, and a tiny sample bundle) so you can try it immediately.