git.stella-ops.org/docs/product-advisories/01-Dec-2025 - Tracking UX Health with Time‑to‑Evidence.md
2025-12-01 17:50:11 +02:00


Here's a simple metric that will make your security UI (and teams) radically better: Time-to-Evidence (TTE), the time from opening a finding to seeing raw proof (a data-flow edge, an SBOM line, or a VEX note), not a summary.


What it is

  • Definition: TTE = t_first_proof_rendered - t_open_finding.
  • Proof = the exact artifact or path that justifies the claim (e.g., package-lock.json: line 214 → openssl@1.1.1, reachability: A → B → C sink, or VEX: not_affected due to unreachable code).
  • Target: P95 ≤ 15s (stretch: P99 ≤ 30s). If 95% of findings show proof within 15 seconds, the UI stays honest: evidence before opinion, low noise, fast explainability.

Why it matters

  • Trust: People accept decisions they can verify quickly.
  • Triage speed: Proof-first UIs cut back-and-forth and guesswork.
  • Noise control: If you can't surface proof fast, you probably shouldn't surface the finding yet.

How to measure (engineering-ready)

  • Emit two stamps per finding view:

    • t_open_finding (on route enter or modal open).
    • t_first_proof_rendered (first DOM paint of SBOM line / path list / VEX clause).
  • Store as tte_ms in a lightweight events table (Postgres) with tags: tenant, finding_id, proof_kind (sbom|reachability|vex), source (local|remote|cache).

  • Nightly rollup: compute P50/P90/P95/P99 by proof_kind and page.

  • Alert when P95 > 15s for 15 minutes.
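The rollup step above can be sketched in plain JavaScript, assuming rows shaped like the events table (`proof_kind`, `tte_ms`). The function names and the `overTarget` flag are illustrative, not part of any existing API. This sketch uses the nearest-rank percentile, so its values can differ slightly from an interpolating database function, and it does not model the "for 15 minutes" alert condition.

```javascript
// Nearest-rank percentile over TTE samples in milliseconds (p in 1..100).
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so that e.g. 95 * 20 / 100 is exactly 19.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Group rows by proof_kind and compute the percentiles named in the rollup bullet.
function rollupByProofKind(rows) {
  const byKind = new Map();
  for (const row of rows) {
    if (!byKind.has(row.proof_kind)) byKind.set(row.proof_kind, []);
    byKind.get(row.proof_kind).push(row.tte_ms);
  }
  const result = {};
  for (const [kind, samples] of byKind) {
    const p95 = percentile(samples, 95);
    result[kind] = {
      p50: percentile(samples, 50),
      p90: percentile(samples, 90),
      p95,
      p99: percentile(samples, 99),
      overTarget: p95 > 15000, // 15s P95 target from this document
    };
  }
  return result;
}

// Example rows (made-up values for illustration only).
const exampleRows = [
  { proof_kind: 'sbom', tte_ms: 1000 },
  { proof_kind: 'sbom', tte_ms: 2000 },
  { proof_kind: 'sbom', tte_ms: 3000 },
  { proof_kind: 'sbom', tte_ms: 4000 },
  { proof_kind: 'sbom', tte_ms: 20000 },
  { proof_kind: 'vex', tte_ms: 500 },
];
console.log(rollupByProofKind(exampleRows));
```

With only five samples a single slow view dominates P95, which is one reason to roll up over a window with enough traffic before alerting.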


UI contract (keeps the UX honest)

  • Above the fold: always show a compact Proof panel first (not hidden behind tabs).
  • Skeletons over spinners: reserve space; render partial proof as soon as any piece is ready.
  • Plain text copy affordance: “Copy SBOM line / path” button right next to the proof.
  • Defer non-proof widgets: CVSS badges, remediation prose, and charts load after proof.
  • Empty-state truth: if no proof exists, say “No proof available yet” and show the loader for that proof type only (don't pretend with summaries).

Backend rules of thumb

  • Pre-index for first paint: cache top N proof items per hot finding (e.g., first SBOM hit + shortest path).
  • Bound queries: proof queries must be O(log n) on indexed columns (pkg name@version, file hash, graph node id).
  • Chunked streaming: send first proof chunk <200ms after backend hit; don't hold for the full set.
  • Timeout budget: 12s backend budget + 3s UI/render margin = 15s P95.

Minimal contract to add in your code

// Frontend: fire on open
metrics.emit('finding_open', { findingId, t: performance.now() });

// When the first real proof node/line hits the DOM:
metrics.emit('proof_rendered', { findingId, proofKind, t: performance.now() });

-- Rollup (hourly)
SELECT
  proof_kind,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY tte_ms) AS p95_ms
FROM tte_events
WHERE ts >= now() - interval '1 hour'
GROUP BY proof_kind;

What to put on the team dashboard

  • TTE P95 by page (Findings list, Finding details).
  • TTE P95 by proof_kind (sbom / reachability / vex).
  • Error budget burn: minutes over target per day.
  • Top regressions: last 7 days vs prior 7.

Acceptance checklist for any finding view

  • First paint shows a real proof snippet (not a summary).
  • “Copy proof” button works within 1 click.
  • TTE P95 in staging ≤ 10s; in prod ≤ 15s.
  • If proof missing, explicit emptystate + retry path.
  • Telemetry sampled ≥ 50% of sessions (or 100% for internal).

If you want, I can turn this into a docs/UX/tte.md spec plus a Grafana panel JSON and a tiny .NET middleware to emit the metrics. Just say the word and I'll drop in ready-to-use snippets.

Perfect, got it. Let's turn Time-to-Evidence (TTE) into a proper set of “how we track UX health” guidelines.

I'll treat this like something you can drop into an internal engineering/UX doc.


1. What exactly is Time-to-Evidence?

Definition

TTE = t_first_proof_rendered - t_open_finding

  • t_open_finding: when the user first opens a “finding” / detail view (e.g., vulnerability, alert, ticket, log event).

  • t_first_proof_rendered: when the UI first paints actual evidence that backs the finding, for example:

    • The SBOM row showing package@version.
    • The call-graph/data-flow path to a sink.
    • A VEX note explaining why something is (not) affected.
    • A raw log snippet that the alert is based on.

Key principle: TTE measures how long users have to trust you blindly before they can see proof with their own eyes.


2. UX health goals & targets

Treat TTE like latency SLOs:

  • Primary SLO:

    • P95 TTE ≤ 15s for all findings in normal conditions.
  • Stretch SLO:

    • P99 TTE ≤ 30s for heavy cases (big graphs, huge SBOMs, cold caches).
  • Guardrail:

    • P50 TTE should be < 3s. If the median creeps up, you're in trouble even if P95 looks OK.

You can refine by feature:

  • “Simple” proof (single SBOM row, small payload):

    • P95 ≤ 5s.
  • “Complex” proof (reachability graph, cross-repo joins):

    • P95 ≤ 15s.

UX rule of thumb

  • < 2s: feels instant.
  • 2-10s: acceptable if clearly loading something heavy.
  • > 10s: needs strong feedback (progress, partial results, explanations).
  • > 30s: the system should probably offer fallback (e.g., “download raw evidence” or “retry”).
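One way to apply the rule of thumb in the UI is to map elapsed wait time to a feedback level. The sketch below assumes the thresholds listed above; the bucket names (`instant`, `loading`, `strong-feedback`, `offer-fallback`) are hypothetical labels chosen for illustration.

```javascript
// Map elapsed milliseconds since finding_open to a UI feedback level.
// Thresholds follow the rule of thumb above; bucket names are hypothetical.
function tteFeedbackLevel(elapsedMs) {
  if (elapsedMs < 2000) return 'instant';           // < 2s
  if (elapsedMs <= 10000) return 'loading';         // 2-10s
  if (elapsedMs <= 30000) return 'strong-feedback'; // > 10s
  return 'offer-fallback';                          // > 30s
}

console.log(tteFeedbackLevel(1500));
console.log(tteFeedbackLevel(45000));
```

A loader component could call this on a timer and switch from a skeleton to a progress message, and finally to a retry or download option.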


3. Instrumentation guidelines

3.1 Event model

Emit two core events per finding view:

  1. finding_open

    • When user opens the finding details (route enter / modal open).

    • Must include:

      • finding_id
      • tenant_id / org_id
      • user_role (admin, dev, triager, etc.)
      • entry_point (list, search, notification, deep link)
      • ui_version / build_sha
  2. proof_rendered

    • First time any qualifying proof element is painted.

    • Must include:

      • finding_id
      • proof_kind (sbom | reachability | vex | logs | other)
      • source (local_cache | backend_api | 3rd_party)
      • proof_height (e.g., pixel offset from top) to ensure it's actually above the fold or very close.

Derived metric

Your telemetry pipeline should compute:

tte_ms = proof_rendered.timestamp - finding_open.timestamp

If there are multiple proof_rendered events for the same finding_open, use:

  • TTE (first proof): the minimum proof_rendered timestamp; this is the primary SLO.
  • Optionally, TTE (full evidence): the last proof in a defined “bundle” (e.g., path + SBOM row).
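A minimal sketch of the derived metric, assuming events carry `findingId`, `proofKind`, and a timestamp `t` in milliseconds. The function name and the reading of "full evidence" (the moment every proof kind in the bundle has appeared at least once) are assumptions made for illustration.

```javascript
// Compute first-proof TTE and, optionally, full-evidence TTE for one finding_open.
// `bundle` is a list of proof kinds that together count as "full evidence".
function computeTte(open, proofEvents, bundle = []) {
  const relevant = proofEvents.filter(
    (e) => e.findingId === open.findingId && e.t >= open.t
  );
  if (relevant.length === 0) {
    return { firstProofMs: null, fullEvidenceMs: null };
  }

  // First proof: earliest proof_rendered after the open.
  const firstT = Math.min(...relevant.map((e) => e.t));

  // Full evidence: the time at which the last bundle member first appeared.
  let fullEvidenceMs = null;
  if (bundle.length > 0) {
    const firstByKind = new Map();
    for (const e of relevant) {
      const prev = firstByKind.get(e.proofKind);
      if (prev === undefined || e.t < prev) firstByKind.set(e.proofKind, e.t);
    }
    if (bundle.every((kind) => firstByKind.has(kind))) {
      const lastT = Math.max(...bundle.map((kind) => firstByKind.get(kind)));
      fullEvidenceMs = lastT - open.t;
    }
  }
  return { firstProofMs: firstT - open.t, fullEvidenceMs };
}

// Example with made-up timestamps.
const exampleOpen = { findingId: 'F-1', t: 100 };
const exampleProofs = [
  { findingId: 'F-1', proofKind: 'sbom', t: 400 },
  { findingId: 'F-1', proofKind: 'reachability', t: 1600 },
  { findingId: 'F-1', proofKind: 'sbom', t: 2000 },
];
console.log(computeTte(exampleOpen, exampleProofs, ['sbom', 'reachability']));
```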

3.2 Implementation notes

Frontend

  • Emit finding_open as soon as:

    • The route is confirmed and
    • You know which finding_id is being displayed.
  • Emit proof_rendered:

    • Not when you fetch data, but when at least one evidence component is visibly rendered.
    • Easiest approach: hook into component lifecycle / intersection observer on the evidence container.

Pseudo-example:

// On route/mount:
metrics.emit('finding_open', {
  findingId,
  entryPoint,
  userRole,
  uiVersion,
  t: performance.now()
});

// In EvidencePanel component, after first render with real data:
if (!hasEmittedProof && hasRealEvidence) {
  metrics.emit('proof_rendered', {
    findingId,
    proofKind: 'sbom',
    source: 'backend_api',
    t: performance.now()
  });
  hasEmittedProof = true;
}
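The once-only guard in the pseudo-example can be pulled out of component code into a small helper. This is a sketch under assumptions: `createProofTracker` is a hypothetical name, and `emit` stands in for whatever the `metrics.emit` call resolves to in your codebase. The clock is injectable so the helper can be tested without a browser.

```javascript
// Tracks which findings have already emitted proof_rendered, so the event
// fires at most once per open. `emit` and `now` are injected dependencies.
function createProofTracker(emit, now = () => performance.now()) {
  const emitted = new Set();
  return {
    open(findingId, fields = {}) {
      emitted.delete(findingId); // re-opening starts a fresh measurement
      emit('finding_open', { findingId, ...fields, t: now() });
    },
    proofRendered(findingId, proofKind, source) {
      if (emitted.has(findingId)) return false; // already measured
      emitted.add(findingId);
      emit('proof_rendered', { findingId, proofKind, source, t: now() });
      return true;
    },
  };
}

// Usage with a console-backed emitter.
const demoTracker = createProofTracker((name, payload) => console.log(name, payload));
demoTracker.open('F-1', { entryPoint: 'list' });
demoTracker.proofRendered('F-1', 'sbom', 'backend_api');
demoTracker.proofRendered('F-1', 'vex', 'backend_api'); // ignored: already emitted
```

In a component, `proofRendered` would be called from the render or visibility hook once real evidence is on screen, not when the fetch resolves.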

Backend

  • No special requirement beyond:

    • Stable IDs (finding_id).
    • Knowing which API endpoints respond with evidence payloads, since you'll want to correlate backend latency with TTE later.

4. Data quality & sampling

If you want TTE to drive decisions, the data must be boringly reliable.

Guidelines

  1. Sample rate

    • Start with 100% in staging.
    • In production, aim for ≥ 25% of sessions for TTE events at minimum; 100% is ideal if volume is reasonable.
  2. Clock skew

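    • Prefer frontend timestamps using performance.now() for TTE; they're monotonic within a tab, provided both events fire in the same page load (the value is relative to that document's time origin).
    • Don't mix backend clocks into the TTE calculation.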
  3. Bot / synthetic traffic

    • Tag synthetic tests (is_synthetic = true) and exclude them from UX health dashboards.
  4. Retry behavior

    • If the proof fails to load and user hits “retry”:

      • Treat it as a separate measurement (retry = true) or
      • Log an additional proof_error event with error class (timeout, 5xx, network, parse, etc.).
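Session-level sampling is easier to reason about when the decision is deterministic, so that every event in a sampled session is kept. The sketch below is one possible approach, not a prescribed one: it hashes a session ID with an FNV-1a-style hash over UTF-16 code units and compares the result against the sample rate. The function names are hypothetical.

```javascript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units.
function fnv1a32(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime, 32-bit multiply
  }
  return h >>> 0; // force unsigned
}

// Deterministic decision: the same sessionId always gets the same answer.
function isSessionSampled(sessionId, rate) {
  if (rate >= 1) return true;
  if (rate <= 0) return false;
  return fnv1a32(sessionId) / 0x100000000 < rate;
}

console.log(isSessionSampled('session-abc', 0.25));
```

Because the decision depends only on the session ID, finding_open and proof_rendered are either both recorded or both dropped, which avoids half-measured views.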

5. Dashboards: how to watch TTE

You want a small, opinionated set of views that answer:

“Is UX getting better or worse for people trying to understand findings?”

5.1 Core widgets

  1. TTE distribution

    • P50 / P90 / P95 / P99 per day (or per release).
    • Split by proof_kind.
  2. TTE by page / surface

    • Finding list → detail.
    • Deep links from notifications.
    • Direct URLs / bookmarks.
  3. TTE by user segment

    • New users vs power users.
    • Different roles (security engineer vs application dev).
  4. Error budget panel

    • “Minutes over SLO per day”, e.g., the sum of all user-minutes where TTE > 15s.
    • Use this to prioritize work.
  5. Correlation with engagement

    • Scatter: TTE vs session length, or TTE vs “user clicked ignore / snooze”.
    • Aim to confirm the obvious: long TTE → worse engagement/completion.
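The error budget panel needs a concrete definition of "minutes over SLO". One possible reading, used in the sketch below, is to sum only the time each view spent beyond the 15s target, grouped by UTC day. The field names (`ts`, `tte_ms`) and the function name are assumptions; counting the whole TTE of each slow view instead would be an equally defensible choice.

```javascript
const TTE_TARGET_MS = 15000; // P95 target from this document

// Sum, per UTC day, the minutes each view spent beyond the target.
// Expects `ts` as an ISO-8601 UTC string and `tte_ms` as a number.
function minutesOverSloByDay(events, targetMs = TTE_TARGET_MS) {
  const perDay = {};
  for (const e of events) {
    const day = e.ts.slice(0, 10); // 'YYYY-MM-DD'
    const overMs = Math.max(0, e.tte_ms - targetMs);
    perDay[day] = (perDay[day] || 0) + overMs / 60000;
  }
  return perDay;
}

// Example with made-up events.
const budgetEvents = [
  { ts: '2025-12-01T09:00:00Z', tte_ms: 20000 }, // 5s over
  { ts: '2025-12-01T10:00:00Z', tte_ms: 75000 }, // 60s over
  { ts: '2025-12-01T11:00:00Z', tte_ms: 10000 }, // within target
  { ts: '2025-12-02T09:00:00Z', tte_ms: 15000 }, // exactly on target
];
console.log(minutesOverSloByDay(budgetEvents));
```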

5.2 Operational details

  • Update granularity: real-time or ≤15 min for on-call/ops panels.

  • Retention: at least 90 days to see trends across big releases.

  • Breakdowns:

    • backend_region (to catch regional issues).
    • build_version (to spot regressions quickly).

6. UX & engineering design rules anchored in TTE

These are the behavior rules for the product that keep TTE healthy.

6.1 “Evidence first” layout rules

  • Evidence above the fold

    • At least one proof element must be visible without scrolling on a typical laptop viewport.
  • Summary second

    • CVSS scores, severity badges, long descriptions: all secondary. Evidence should come before opinion.
  • No fake proof

    • Don't use placeholders that look like evidence but aren't (e.g., “example path” or generic text).
    • If evidence is still loading, show a clear skeleton/loader with “Loading evidence…”.

6.2 Loading strategy rules

  • Start fetching evidence as soon as navigation begins, not after the page is fully mounted.

  • Use lazy loading for non-critical widgets until after proof is shown.

  • If a call is known to be heavy:

    • Consider precomputing and caching the top evidence (shortest path, first SBOM hit).
    • Stream results: render first proof item as soon as it arrives; don't wait for the full list.

6.3 Empty / error state rules

  • If there is genuinely no evidence:

    • Explicitly say “No supporting evidence available yet” and treat TTE as:

      • Either “no value” (excluded), or
      • A special bucket proof_kind = "none".
  • If loading fails:

    • Show a clear error and a retry that re-emits proof_rendered when successful.
    • Log proof_error with reason; track error rate alongside TTE.

7. How to use TTE in practice

7.1 For releases

For any change that affects findings UI or evidence plumbing:

  • Add a release checklist item:

    • “No regression on TTE P95 for [pages X, Y].”
  • During rollout:

    • Compare pre- vs post-release TTE P95 by ui_version.

    • If regression > 20%:

      • Roll back, or
      • Add a follow-up ticket explicitly tagged with the regression.
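The 20% rule can be expressed as a small release-gate check. This sketch assumes the pre- and post-release P95 values have already been computed per ui_version; the function name and return shape are hypothetical.

```javascript
// Relative change in TTE P95 between two releases, flagged against a threshold.
function checkTteRegression(preP95Ms, postP95Ms, threshold = 0.2) {
  if (!(preP95Ms > 0)) {
    throw new Error('preP95Ms must be a positive number');
  }
  const change = (postP95Ms - preP95Ms) / preP95Ms;
  return { change, regressed: change > threshold };
}

console.log(checkTteRegression(10000, 12500)); // 25% slower
console.log(checkTteRegression(10000, 11000)); // 10% slower
```

A CI or rollout step could fail, or open the tagged follow-up ticket, when `regressed` is true.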

7.2 For experiments / A/B tests

When running UI experiments around findings:

  • Always capture TTE per variant.

  • Compare:

    • TTE P50/P95.
    • Task completion rate (e.g., “user changed status”).
    • Subjective UX (CSAT) if you have it.

You're looking for patterns like:

  • Variant B: +5% completion, +8% TTE → maybe OK.
  • Variant C: +2% completion, +70% TTE → probably not acceptable.

7.3 For prioritization

Use TTE as a lever in planning:

  • If P95 TTE is healthy and stable:

    • More room for new features / experiments.
  • If P95 TTE is trending up for 2+ weeks:

    • Time to schedule a “TTE debt” story: caching, query optimization, UI re-layout, etc.
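"Trending up for 2+ weeks" needs an operational definition before it can trigger planning work. A simple sketch, assuming one P95 value per week in chronological order, treats the trend as present when each of the last N week-over-week changes is an increase. The function name is hypothetical, and a real check might also require a minimum size for each increase to ignore noise.

```javascript
// True when P95 rose in each of the last `weeks` week-over-week comparisons.
// `weeklyP95Ms` is ordered oldest to newest.
function isTrendingUp(weeklyP95Ms, weeks = 2) {
  if (weeklyP95Ms.length < weeks + 1) return false; // not enough history
  const recent = weeklyP95Ms.slice(-(weeks + 1));
  for (let i = 1; i < recent.length; i++) {
    if (recent[i] <= recent[i - 1]) return false;
  }
  return true;
}

console.log(isTrendingUp([9000, 9500, 10200])); // two consecutive increases
console.log(isTrendingUp([9000, 9500, 9400]));  // last week improved
```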

8. Quick “TTE-ready” checklist

You're “tracking UX health with TTE” if you can honestly tick these:

  1. Instrumentation

    • finding_open + proof_rendered events exist and are correlated.
    • TTE computed in a stable pipeline (joins, dedupe, etc.).
  2. Targets

    • TTE SLOs defined (P95, P99) and agreed by UX + engineering.
  3. Dashboards

    • A dashboard shows TTE by proof kind, page, and release.
    • On-call / ops can see TTE in near real-time.
  4. UX rules

    • Evidence is visible above the fold for all main finding types.
    • Non-critical widgets load after evidence.
    • Empty/error states are explicit about evidence availability.
  5. Process

    • Major UI changes check TTE pre vs post as part of release acceptance.
    • Regressions in TTE create real tickets, not just “we'll watch it”.

If you tell me what stack you're on (e.g., React + Next.js + OpenTelemetry + X observability tool), I can turn this into concrete code snippets and an example dashboard spec (fields, queries, charts) tailored exactly to your setup.