01-Dec-2025 - Tracking UX Health with Time‑to‑Evidence


Here’s a simple metric for keeping your security UI (and the teams using it) honest: Time‑to‑Evidence (TTE), the time from opening a finding to seeing raw proof (a data‑flow edge, an SBOM line, or a VEX note) rather than a summary.

What it is

Definition: TTE = t_first_proof_rendered − t_open_finding.
Proof = the exact artifact or path that justifies the claim (e.g., package-lock.json: line 214 → openssl@1.1.1, reachability: A → B → C sink, or VEX: not_affected due to unreachable code).
Target: P95 ≤ 15s (stretch: P99 ≤ 30s). If 95% of findings show proof within 15 seconds, the UI stays honest: evidence before opinion, low noise, fast explainability.

Why it matters

Trust: People accept decisions they can verify quickly.
Triage speed: Proof-first UIs cut back-and-forth and guesswork.
Noise control: If you can’t surface proof fast, you probably shouldn’t surface the finding yet.

How to measure (engineering‑ready)

Emit two stamps per finding view:
- t_open_finding (on route enter or modal open).
- t_first_proof_rendered (first DOM paint of SBOM line / path list / VEX clause).
Store as tte_ms in a lightweight events table (Postgres) with tags: tenant, finding_id, proof_kind (sbom|reachability|vex), source (local|remote|cache).
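Scheduled rollup: compute P50/P90/P95/P99 by proof_kind and page. Run it nightly for trend reports, and hourly or finer for alerting, since a nightly job cannot evaluate a 15‑minute alert window.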
Alert when P95 > 15s for 15 minutes.
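
A minimal sketch of the alert rule above, assuming the rollup already produces one P95 value per minute. The function name `shouldAlert` and the input shape are illustrative assumptions, not part of any existing alerting tool.

```javascript
// Hypothetical helper: should the "P95 > 15s for 15 minutes" alert fire?
// Assumed input: p95ByMinute holds one P95 value in ms per minute, oldest first.
const TTE_P95_TARGET_MS = 15000;
const SUSTAINED_MINUTES = 15;

function shouldAlert(p95ByMinute, targetMs = TTE_P95_TARGET_MS, minutes = SUSTAINED_MINUTES) {
  // Without a full window of history we cannot claim a sustained breach.
  if (p95ByMinute.length < minutes) return false;
  // Fire only if every minute in the most recent window is over target.
  return p95ByMinute.slice(-minutes).every((p95) => p95 > targetMs);
}
```

Requiring every minute in the window to breach keeps a single slow minute from paging anyone; a real setup may prefer a softer rule such as "most minutes in the window".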

UI contract (keeps the UX honest)

Above the fold: always show a compact Proof panel first (not hidden behind tabs).
Skeletons over spinners: reserve space; render partial proof as soon as any piece is ready.
Plain text copy affordance: “Copy SBOM line / path” button right next to the proof.
Defer non‑proof widgets: CVSS badges, remediation prose, and charts load after proof.
Empty‑state truth: if no proof exists, say “No proof available yet” and show the loader for that proof type only (don’t pretend with summaries).

Backend rules of thumb

Pre‑index for first paint: cache top N proof items per hot finding (e.g., first SBOM hit + shortest path).
Bound queries: proof queries must be O(log n) on indexed columns (pkg name@version, file hash, graph node id).
Chunked streaming: send first proof chunk <200 ms after backend hit; don’t hold for the full set.
Timeout budget: 12s backend budget + 3s UI/render margin = 15s P95.
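
The pre‑indexing rule can be sketched as a function that picks what to cache for first paint. The input shapes (an ordered array of SBOM hits, and paths as arrays of node names) and the name `pickFirstPaintProof` are assumptions made for illustration.

```javascript
// Hypothetical selection of the proof items worth caching for first paint:
// the first SBOM hit plus the shortest reachability path.
// Assumed shapes: sbomHits is an ordered array; each path is an array of node names.
function pickFirstPaintProof(sbomHits, paths) {
  const firstSbomHit = sbomHits.length > 0 ? sbomHits[0] : null;
  let shortestPath = null;
  for (const path of paths) {
    // Keep the path with the fewest nodes; the earliest one wins ties.
    if (shortestPath === null || path.length < shortestPath.length) {
      shortestPath = path;
    }
  }
  return { firstSbomHit, shortestPath };
}
```

For example, `pickFirstPaintProof(['package-lock.json: line 214'], [['A', 'B', 'C'], ['A', 'C']])` would keep the single SBOM line and the two‑node path.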

Minimal contract to add in your code

// Frontend: fire on open
metrics.emit('finding_open', { findingId, t: performance.now() });

// When the first real proof node/line hits the DOM:
metrics.emit('proof_rendered', { findingId, proofKind, t: performance.now() });

-- Rollup (hourly)
SELECT
  proof_kind,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY tte_ms) AS p95_ms
FROM tte_events
WHERE ts >= now() - interval '1 hour'
GROUP BY proof_kind;
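
For unit tests or client‑side checks, the same rollup can be approximated in JavaScript. This is a sketch that mirrors the linear interpolation of `percentile_cont`; the event shape `{ proofKind, tteMs }` and both function names are assumptions.

```javascript
// Linear-interpolation percentile, intended to mirror percentile_cont.
// Returns null for an empty input, as the SQL aggregate returns NULL.
function percentileCont(values, fraction) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = fraction * (sorted.length - 1); // fractional 0-based position
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
}

// Group assumed { proofKind, tteMs } events and compute P95 per proof kind.
function rollupP95(events) {
  const byKind = new Map();
  for (const e of events) {
    if (!byKind.has(e.proofKind)) byKind.set(e.proofKind, []);
    byKind.get(e.proofKind).push(e.tteMs);
  }
  const result = {};
  for (const [kind, values] of byKind) {
    result[kind] = percentileCont(values, 0.95);
  }
  return result;
}
```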

What to put on the team dashboard

TTE P95 by page (Findings list, Finding details).
TTE P95 by proof_kind (sbom / reachability / vex).
Error budget burn: minutes over target per day.
Top regressions: last 7 days vs prior 7.

Acceptance checklist for any finding view

First paint shows a real proof snippet (not a summary).
“Copy proof” button works within 1 click.
TTE P95 in staging ≤ 10s; in prod ≤ 15s.
If proof missing, explicit empty‑state + retry path.
Telemetry sampled ≥ 50% of sessions (or 100% for internal).

If you want, I can turn this into a docs/UX/tte.md spec plus a Grafana panel JSON and a tiny .NET middleware to emit the metrics. Just say the word and I’ll drop in ready‑to‑use snippets.

Perfect, got it. Let’s turn Time‑to‑Evidence (TTE) into a proper set of “how we track UX health” guidelines.

I’ll treat this like something you can drop into an internal engineering/UX doc.

1. What exactly is Time‑to‑Evidence?

Definition

TTE = t_first_proof_rendered − t_open_finding

t_open_finding – when the user first opens a “finding” / detail view (e.g., vulnerability, alert, ticket, log event).
t_first_proof_rendered – when the UI first paints actual evidence that backs the finding, for example:
- The SBOM row showing package@version.
- The call‑graph/data‑flow path to a sink.
- A VEX note explaining why something is (not) affected.
- A raw log snippet that the alert is based on.

Key principle: TTE measures how long users have to trust you blindly before they can see proof with their own eyes.

2. UX health goals & targets

Treat TTE like latency SLOs:

Primary SLO:
- P95 TTE ≤ 15s for all findings in normal conditions.
Stretch SLO:
- P99 TTE ≤ 30s for heavy cases (big graphs, huge SBOMs, cold caches).
Guardrail:
- P50 TTE should be < 3s. If the median creeps up, you’re in trouble even if P95 looks OK.

You can refine by feature:

“Simple” proof (single SBOM row, small payload):
- P95 ≤ 5s.
“Complex” proof (reachability graph, cross‑repo joins):
- P95 ≤ 15s.

UX rule of thumb

< 2s: feels instant.
2–10s: acceptable if clearly loading something heavy.
> 10s: needs strong feedback (progress, partial results, explanations).
> 30s: the system should probably offer fallback (e.g., “download raw evidence” or “retry”).
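
One way to apply this rule of thumb in the UI is a small classifier. The sketch below reads the bands as under 2s, 2 to 10s, over 10s, and over 30s; the tier names are hypothetical labels, not an established vocabulary.

```javascript
// Map an elapsed wait (ms) to the feedback the UI should show.
// Band edges follow the rule of thumb; tier names are illustrative only.
function feedbackTier(elapsedMs) {
  if (elapsedMs < 2000) return 'instant';
  if (elapsedMs <= 10000) return 'acceptable';
  if (elapsedMs <= 30000) return 'needs_strong_feedback';
  return 'offer_fallback';
}
```

A loading component could call this on a timer and escalate from a skeleton to progress text to a “download raw evidence” link as the tier changes.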

3. Instrumentation guidelines

3.1 Event model

Emit two core events per finding view:

finding_open
- When user opens the finding details (route enter / modal open).
- Must include:
  - finding_id
  - tenant_id / org_id
  - user_role (admin, dev, triager, etc.)
  - entry_point (list, search, notification, deep link)
  - ui_version / build_sha
proof_rendered
- First time any qualifying proof element is painted.
- Must include:
  - finding_id
  - proof_kind (sbom | reachability | vex | logs | other)
  - source (local_cache | backend_api | 3rd_party)
  - proof_height (e.g., pixel offset from top) – to ensure it’s actually above the fold or very close.
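
The required fields above can be enforced before an event leaves the client. This is a sketch only: where the text offers alternatives (tenant_id / org_id, ui_version / build_sha) it arbitrarily assumes tenant_id and ui_version, and `missingFields` is a hypothetical helper name.

```javascript
// Required payload fields per event, taken from the lists above.
// Assumption: tenant_id and ui_version are chosen over org_id and build_sha.
const REQUIRED_FIELDS = {
  finding_open: ['finding_id', 'tenant_id', 'user_role', 'entry_point', 'ui_version'],
  proof_rendered: ['finding_id', 'proof_kind', 'source', 'proof_height'],
};

// Return the names of required fields that are absent (undefined or null).
function missingFields(eventName, payload) {
  const required = REQUIRED_FIELDS[eventName];
  if (!required) throw new Error(`unknown event: ${eventName}`);
  return required.filter((field) => payload[field] === undefined || payload[field] === null);
}
```

Rejecting or flagging incomplete events at emit time is cheaper than discovering gaps later in the rollup.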

Derived metric

Your telemetry pipeline should compute:

tte_ms = proof_rendered.t - finding_open.t (both taken from the client‑side t field that each event carries in the examples)

If there are multiple proof_rendered events for the same finding_open, use:

TTE (first proof) – minimum timestamp; primary SLO.
Optionally: TTE (full evidence) – last proof in a defined “bundle” (e.g., path + SBOM row).
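
A minimal sketch of that derivation over a batch of events. It assumes each event looks like `{ name, findingId, t }` and that a finding is opened once per batch; a real pipeline would likely need a per‑view correlation id, which the text does not define.

```javascript
// Derive first-proof and full-evidence TTE per finding from raw events.
// Assumptions: events are { name, findingId, t }, one finding_open per findingId.
function computeTte(events) {
  const openAt = new Map();       // findingId -> t of finding_open
  const firstProofAt = new Map(); // findingId -> earliest proof_rendered t
  const lastProofAt = new Map();  // findingId -> latest proof_rendered t
  for (const e of events) {
    if (e.name === 'finding_open') {
      openAt.set(e.findingId, e.t);
    } else if (e.name === 'proof_rendered') {
      const first = firstProofAt.get(e.findingId);
      const last = lastProofAt.get(e.findingId);
      if (first === undefined || e.t < first) firstProofAt.set(e.findingId, e.t);
      if (last === undefined || e.t > last) lastProofAt.set(e.findingId, e.t);
    }
  }
  const rows = [];
  for (const [findingId, tOpen] of openAt) {
    if (!firstProofAt.has(findingId)) continue; // no proof rendered: no TTE value
    rows.push({
      findingId,
      tteFirstMs: firstProofAt.get(findingId) - tOpen, // primary SLO metric
      tteFullMs: lastProofAt.get(findingId) - tOpen,   // optional "full evidence"
    });
  }
  return rows;
}
```

Findings with no proof_rendered event are skipped here rather than reported as zero, so they cannot flatter the percentiles.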

3.2 Implementation notes

Frontend

Emit finding_open as soon as:
- The route is confirmed and
- You know which finding_id is being displayed.
Emit proof_rendered:
- Not when you fetch data, but when at least one evidence component is visibly rendered.
- Easiest approach: hook into component lifecycle / intersection observer on the evidence container.

Pseudo‑example:

// On route/mount:
metrics.emit('finding_open', {
  findingId,
  entryPoint,
  userRole,
  uiVersion,
  t: performance.now()
});

// In EvidencePanel component, after first render with real data:
if (!hasEmittedProof && hasRealEvidence) {
  metrics.emit('proof_rendered', {
    findingId,
    proofKind: 'sbom',
    source: 'backend_api',
    t: performance.now()
  });
  hasEmittedProof = true;
}

Backend

No special requirement beyond:
- Stable IDs (finding_id).
- Knowing which API endpoints respond with evidence payloads — you’ll want to correlate backend latency with TTE later.

4. Data quality & sampling

If you want TTE to drive decisions, the data must be boringly reliable.

Guidelines

Sample rate
- Start with 100% in staging.
- In production, aim for ≥ 25% of sessions for TTE events at minimum; 100% is ideal if volume is reasonable.
Clock skew
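- Prefer frontend timestamps using performance.now() for TTE; they’re monotonic within a single page load (the time origin resets on a full reload or cross‑document navigation, so both stamps must come from the same document).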
- Don’t mix backend clocks into the TTE calculation.
Bot / synthetic traffic
- Tag synthetic tests (is_synthetic = true) and exclude them from UX health dashboards.
Retry behavior
- If the proof fails to load and user hits “retry”:
  - Treat it as a separate measurement (retry = true) or
  - Log an additional proof_error event with error class (timeout, 5xx, network, parse, etc.).
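
Session sampling and synthetic filtering could look like the sketch below. It assumes a string session id and an `isSynthetic` flag on each event; the hash is an FNV‑1a‑style function over UTF‑16 code units, chosen here only because it is small and deterministic.

```javascript
// FNV-1a-style 32-bit hash over UTF-16 code units (illustrative choice).
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned 32-bit
}

// Deterministic sampling: a session is either always in or always out,
// so both finding_open and proof_rendered are kept together.
function isSampled(sessionId, rate) {
  return fnv1a32(sessionId) / 0x100000000 < rate;
}

// Drop synthetic traffic before feeding UX health dashboards.
function forUxDashboard(events) {
  return events.filter((e) => !e.isSynthetic);
}
```

Sampling per session rather than per event matters here: dropping only one of the two events would make the TTE for that view impossible to compute.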

5. Dashboards: how to watch TTE

You want a small, opinionated set of views that answer:

“Is UX getting better or worse for people trying to understand findings?”

5.1 Core widgets

TTE distribution
- P50 / P90 / P95 / P99 per day (or per release).
- Split by proof_kind.
TTE by page / surface
- Finding list → detail.
- Deep links from notifications.
- Direct URLs / bookmarks.
TTE by user segment
- New users vs power users.
- Different roles (security engineer vs application dev).
Error budget panel
- “Minutes over SLO per day” – e.g., sum of all user‑minutes where TTE > 15s.
- Use this to prioritize work.
Correlation with engagement
- Scatter: TTE vs session length, or TTE vs “user clicked ‘ignore’ / ‘snooze’”.
- Aim to confirm the obvious: long TTE → worse engagement/completion.
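
The error budget panel needs a precise definition of “minutes over SLO”. The sketch below assumes one reading: for each view that exceeds the target, count only the time beyond the target, then sum across views. Other readings (for example, counting the whole wait) are equally possible.

```javascript
// Sum the time users spent waiting beyond the SLO, in minutes.
// Assumption: only the portion of each TTE above the target counts.
function minutesOverSlo(tteMsValues, sloMs = 15000) {
  let overMs = 0;
  for (const tte of tteMsValues) {
    if (tte > sloMs) overMs += tte - sloMs;
  }
  return overMs / 60000;
}
```

With views of 5s, 15s, 45s, and 75s, the two slow views contribute 30s and 60s, for 1.5 minutes over target.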

5.2 Operational details

Update granularity: real‑time or ≤15 min for on‑call/ops panels.
Retention: at least 90 days to see trends across big releases.
Breakdowns:
- backend_region (to catch regional issues).
- build_version (to spot regressions quickly).

6. UX & engineering design rules anchored in TTE

These are the behavior rules for the product that keep TTE healthy.

6.1 “Evidence first” layout rules

Evidence above the fold
- At least one proof element must be visible without scrolling on a typical laptop viewport.
Summary second
- CVSS scores, severity badges, long descriptions: all secondary. Evidence should come before opinion.
No fake proof
- Don’t use placeholders that look like evidence but aren’t (e.g., “example path” or generic text).
- If evidence is still loading, show a clear skeleton/loader with “Loading evidence…”.

6.2 Loading strategy rules

Start fetching evidence as soon as navigation begins, not after the page is fully mounted.
Use lazy loading for non‑critical widgets until after proof is shown.
If a call is known to be heavy:
- Consider precomputing and caching the top evidence (shortest path, first SBOM hit).
- Stream results: render first proof item as soon as it arrives; don’t wait for the full list.

6.3 Empty / error state rules

If there is genuinely no evidence:
- Explicitly say “No supporting evidence available yet” and treat TTE as:
  - Either “no value” (excluded), or
  - A special bucket proof_kind = "none".
If loading fails:
- Show a clear error and a retry that re‑emits proof_rendered when successful.
- Log proof_error with reason; track error rate alongside TTE.

7. How to use TTE in practice

7.1 For releases

For any change that affects findings UI or evidence plumbing:

Add a release checklist item:
- “No regression on TTE P95 for [pages X, Y].”
During rollout:
- Compare pre‑ vs post‑release TTE P95 by ui_version.
- If regression > 20%:
  - Roll back, or
  - Add a follow‑up ticket explicitly tagged with the regression.
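
The 20% rule can be turned into a small release gate. This sketch assumes the two P95 values are computed elsewhere, for example per ui_version; `tteRegression` is a hypothetical name.

```javascript
// Compare post-release P95 against the pre-release baseline.
// Returns the relative change and whether it exceeds the allowed regression.
function tteRegression(baselineP95Ms, candidateP95Ms, maxRegression = 0.2) {
  if (!(baselineP95Ms > 0)) {
    throw new RangeError('baseline P95 must be a positive number');
  }
  const change = (candidateP95Ms - baselineP95Ms) / baselineP95Ms;
  return { change, regressed: change > maxRegression };
}
```

A rollout script could call this per page and block promotion, or open the follow‑up ticket automatically, when `regressed` is true.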

7.2 For experiments / A/B tests

When running UI experiments around findings:

Always capture TTE per variant.
Compare:
- TTE P50/P95.
- Task completion rate (e.g., “user changed status”).
- Subjective UX (CSAT) if you have it.

You’re looking for patterns like:

Variant B: +5% completion, +8% TTE → maybe OK.
Variant C: +2% completion, +70% TTE → probably not acceptable.

7.3 For prioritization

Use TTE as a lever in planning:

If P95 TTE is healthy and stable:
- More room for new features / experiments.
If P95 TTE is trending up for 2+ weeks:
- Time to schedule a “TTE debt” story: caching, query optimization, UI re‑layout, etc.

8. Quick “TTE‑ready” checklist

You’re “tracking UX health with TTE” if you can honestly tick these:

Instrumentation
- finding_open + proof_rendered events exist and are correlated.
- TTE computed in a stable pipeline (joins, dedupe, etc.).
Targets
- TTE SLOs defined (P95, P99) and agreed by UX + engineering.
Dashboards
- A dashboard shows TTE by proof kind, page, and release.
- On‑call / ops can see TTE in near real‑time.
UX rules
- Evidence is visible above the fold for all main finding types.
- Non‑critical widgets load after evidence.
- Empty/error states are explicit about evidence availability.
Process
- Major UI changes check TTE pre vs post as part of release acceptance.
- Regressions in TTE create real tickets, not just “we’ll watch it”.

If you tell me what stack you’re on (e.g., React + Next.js + OpenTelemetry + X observability tool), I can turn this into concrete code snippets and an example dashboard spec (fields, queries, charts) tailored exactly to your setup.
