Evidence-Gated AI Explanations (product advisory, 30 Dec 2025)

Here's a simple, high-signal pattern you can drop into your security product: gate AI remediation/explanations behind an “Evidence Coverage” badge, and hide suggestions when coverage is weak.


What this solves (plain English)

AI advice is only trustworthy when it's grounded in real evidence. If your scan only sees half the picture, AI “fixes” become noise. A visible coverage badge makes this explicit and keeps the UI quiet until you've got enough facts.


What “Evidence Coverage” means

Score = % of the verdict's required facts that are present, e.g., do we have:

  • Reachability (is the vulnerable code/path actually callable in this artifact/runtime?)
  • VEX (vendor/product statements: affected/not-affected/under-investigation)
  • Runtime (telemetry, process trees, loaded libs, eBPF hooks)
  • Exploit signals (known exploits, KEV, EPSS tier, in-the-wild intel)
  • Patch/backport proof (distro backports, symbols, diff/BuildID match)
  • Provenance (in-toto/DSSE attestations, signer trust)
  • Environment match (kernel/os/distro/package set parity)
  • Differential context (did this change since last release?)

Each fact bucket contributes weighted points → a 0–100% Coverage score.
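
For concreteness, here is how that weighted sum might look; a minimal TypeScript sketch, assuming each bucket carries a present flag and a weight (the bucket shape mirrors the data model further down, but none of these names are a fixed API):

// Earned weight over total weight, scaled to 0-100 and rounded.
type Bucket = { id: string; present: boolean; weight: number };

function coverageScore(buckets: Bucket[]): number {
  const total = buckets.reduce((sum, b) => sum + b.weight, 0);
  const earned = buckets.reduce((sum, b) => sum + (b.present ? b.weight : 0), 0);
  return total > 0 ? Math.round((100 * earned) / total) : 0;
}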


UX rule of thumb

  • <60%: Hide AI suggestions by default. Show a muted badge “Coverage 41% — add sources to unlock guidance.”
  • 60–79%: Collapse AI panel; allow manual expand with a caution label. Every sentence shows its citations.
  • ≥80%: Show AI remediation by default with a green badge and inline evidence chips.
  • 100%: Add a subtle “High confidence” ribbon + “export proof” link.

Minimal UI components

  • A small badge next to each finding: Coverage 82% (click → drawer); see the sketch after this list.
  • Drawer tabs: Sources, Why we think it's reachable, Counter-evidence, Gaps.
  • “Fill the gaps” callouts (e.g., “Attach VEX”, “Enable runtime sensor”, “Upload SBOM”).
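
As a starting point for the badge, a minimal React/TypeScript sketch (prop and class names here are hypothetical; the drawer wiring is left out):

// Badge tone tracks the coverage thresholds from the UX rule of thumb.
function CoverageBadge({ score, onClick }: { score: number; onClick: () => void }) {
  const tone = score >= 80 ? "green" : score >= 60 ? "caution" : "muted";
  return (
    <button className={`coverage-badge ${tone}`} onClick={onClick}>
      Coverage {score}%{score === 100 ? " · High confidence" : ""}
    </button>
  );
}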

Copy you can reuse

  • Collapsed state (low coverage): “We're missing runtime or VEX evidence. Add one source to unlock tailored remediation.”
  • Expanded (medium): “Guidance shown with caution. 3/5 evidence buckets present. See gaps →”

Data model (lean)

coverage:
  score: 0-100
  buckets:
    - id: reachability      # call graph, symbol, entrypoints
      present: true
      weight: 0.22
      evidence_refs: [e1,e7]
    - id: vex               # product/vendor statements
      present: false
      weight: 0.18
      evidence_refs: []
    - id: runtime
      present: true
      weight: 0.20
      evidence_refs: [e3]
    - id: exploit_signals
      present: true
      weight: 0.15
      evidence_refs: [e6]
    - id: patch_backport
      present: false
      weight: 0.15
      evidence_refs: []
    - id: provenance
      present: true
      weight: 0.10
      evidence_refs: [e9]
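
With these example weights (they sum to 1.0 across all six buckets), the present buckets contribute 0.22 + 0.20 + 0.15 + 0.10 = 0.67, so this finding scores 67: squarely in the 60–79% “expand with caution” band.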

Policy in one line (ship this as a guard)

if coverage.score < 60: hide_ai()
elif coverage.score < 80: show_ai(collapsed=True, label="limited evidence")
else: show_ai(collapsed=False, label="evidence-backed")
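
Applied to the example coverage block above (score 67), the guard takes the middle branch: the AI panel renders collapsed with the “limited evidence” label.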

What the AI must output (when shown)

  • Step-by-step remediation with per-step citations to the evidence drawer (a structured-output sketch follows this list).
  • Why this is safe (mentions backports, ABI risk, service impact).
  • Counterfactual: “If VEX says Not Affected → do X instead.”
  • Residual risk and rollback plan.
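
One way to make the citation requirement enforceable rather than aspirational is to demand structured output from the model. A hypothetical TypeScript shape (field names are illustrative only):

// Every step must carry citations that resolve to evidence_refs
// in the coverage drawer, e.g. ["e3", "e6"].
type RemediationStep = { text: string; citations: string[] };

type AiGuidance = {
  steps: RemediationStep[];
  safetyRationale: string;   // backports, ABI risk, service impact
  counterfactual?: string;   // e.g. what to do if VEX says not-affected
  residualRisk: string;
  rollbackPlan: string;
};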

How to reach ≥80% more often

  • Auto-request missing inputs (“Upload maintainer VEX” / “Turn on runtime for 24h”); a gap-ranking sketch follows this list.
  • Fetch distro backport diffs and symbol maps to close the patch/backport bucket.
  • Merge SBOM + call-graph + eBPF to strengthen reachability.
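
A tiny helper can drive those auto-requests by ranking the missing buckets by the points each would add; a sketch using the same hypothetical bucket shape as earlier:

// Ask for the highest-value missing evidence first.
type Bucket = { id: string; present: boolean; weight: number };

function topGaps(buckets: Bucket[], n = 3): Bucket[] {
  return buckets
    .filter((b) => !b.present)
    .sort((a, b) => b.weight - a.weight)
    .slice(0, n);
}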

If you want, I can draft a drop-in React component (Badge + Drawer) and a tiny scoring service (C#/.NET 10) that plugs into your verdict pipeline.