add advisories
@@ -0,0 +1,425 @@
Here’s a simple metric that will make your security UI (and teams) radically better: **Time‑to‑Evidence (TTE)**, the time from opening a finding to seeing *raw proof* (a data‑flow edge, an SBOM line, or a VEX note), not a summary.

---

### What it is

* **Definition:** TTE = `t_first_proof_rendered − t_open_finding`.
* **Proof =** the exact artifact or path that justifies the claim (e.g., `package-lock.json: line 214 → openssl@1.1.1`, `reachability: A → B → C sink`, or `VEX: not_affected due to unreachable code`).
* **Target:** **P95 ≤ 15s** (stretch: P99 ≤ 30s). If 95% of findings show proof within 15 seconds, the UI stays honest: evidence before opinion, low noise, fast explainability.

---

### Why it matters

* **Trust:** People accept decisions they can *verify* quickly.
* **Triage speed:** Proof-first UIs cut back-and-forth and guesswork.
* **Noise control:** If you can’t surface proof fast, you probably shouldn’t surface the finding yet.

---

### How to measure (engineering‑ready)

* Emit two stamps per finding view:

  * `t_open_finding` (on route enter or modal open).
  * `t_first_proof_rendered` (first DOM paint of SBOM line / path list / VEX clause).
* Store as `tte_ms` in a lightweight events table (Postgres) with tags: `tenant`, `finding_id`, `proof_kind` (`sbom|reachability|vex`), `source` (`local|remote|cache`).
* Hourly rollup: compute P50/P90/P95/P99 by `proof_kind` and page.
* Alert when **P95 > 15s** is sustained for 15 minutes (evaluate this on a rolling window, since an hourly rollup is too coarse for it).
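
The alert rule above (P95 over 15s for 15 minutes) can be sketched as a check over per-minute P95 values. This is a minimal sketch, assuming the metrics pipeline already produces one P95 value per minute with no gaps; `MinuteP95`, `TARGET_MS`, `SUSTAINED_MINUTES`, and `shouldAlert` are hypothetical names for illustration, not part of any existing API.

```ts
// Hypothetical shape: one rolled-up P95 value per minute (assumed gap-free).
interface MinuteP95 {
  minute: number; // minute index, increasing by 1 per sample
  p95Ms: number;  // P95 TTE for that minute, in milliseconds
}

const TARGET_MS = 15000;      // P95 target from this document (15s)
const SUSTAINED_MINUTES = 15; // the breach must persist this long

// True when each of the most recent SUSTAINED_MINUTES samples exceeds the target.
function shouldAlert(series: MinuteP95[]): boolean {
  if (series.length < SUSTAINED_MINUTES) return false;
  const recent = series.slice(series.length - SUSTAINED_MINUTES);
  return recent.every((s) => s.p95Ms > TARGET_MS);
}
```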

---

### UI contract (keeps the UX honest)

* **Above the fold:** always show a compact **Proof panel** first (not hidden behind tabs).
* **Skeletons over spinners:** reserve space; render partial proof as soon as any piece is ready.
* **Plain text copy affordance:** “Copy SBOM line / path” button right next to the proof.
* **Defer non‑proof widgets:** CVSS badges, remediation prose, and charts load *after* proof.
* **Empty‑state truth:** if no proof exists, say “No proof available yet” and show the loader for *that* proof type only (don’t pretend with summaries).

---

### Backend rules of thumb

* **Pre‑index for first paint:** cache top N proof items per hot finding (e.g., first SBOM hit + shortest path).
* **Bound queries:** proof queries must be *O(log n)* on indexed columns (pkg name@version, file hash, graph node id).
* **Chunked streaming:** send first proof chunk <200 ms after backend hit; don’t hold for the full set.
* **Timeout budget:** 12s backend budget + 3s UI/render margin = 15s P95.
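
The pre‑index rule above (first SBOM hit plus shortest path) could look roughly like the selection step below. This is only a sketch under stated assumptions: the `SbomHit` and `ReachPath` shapes and the `pickFirstPaintProof` helper are hypothetical, and a real cache would sit in front of whatever store holds these records.

```ts
// Hypothetical record shapes for illustration only.
interface SbomHit { file: string; line: number; pkg: string }
interface ReachPath { nodes: string[] }
interface FirstPaintProof { sbom: SbomHit | null; path: ReachPath | null }

// Picks the minimal proof worth caching for first paint:
// the first SBOM hit and the reachability path with the fewest nodes.
function pickFirstPaintProof(sbomHits: SbomHit[], paths: ReachPath[]): FirstPaintProof {
  const sbom = sbomHits.length > 0 ? sbomHits[0] : null;
  let shortest: ReachPath | null = null;
  for (const p of paths) {
    if (shortest === null || p.nodes.length < shortest.nodes.length) {
      shortest = p;
    }
  }
  return { sbom, path: shortest };
}
```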

---

### Minimal contract to add in your code

```ts
// Frontend: fire on open
metrics.emit('finding_open', { findingId, t: performance.now() });

// When the first real proof node/line hits the DOM:
metrics.emit('proof_rendered', { findingId, proofKind, t: performance.now() });
```

```sql
-- Rollup (hourly)
SELECT
  proof_kind,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY tte_ms) AS p95_ms
FROM tte_events
WHERE ts >= now() - interval '1 hour'
GROUP BY proof_kind;
```
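
For spot-checking the SQL rollup against a handful of raw events, the same percentile can be computed in application code. The sketch below assumes the linear-interpolation definition that PostgreSQL's `percentile_cont` uses; `TteEvent`, `percentileCont`, and `p95ByProofKind` are hypothetical names, and this is a cross-check, not a replacement for the database rollup.

```ts
// Hypothetical in-memory shape of a row from tte_events.
interface TteEvent { proofKind: string; tteMs: number }

// Continuous percentile with linear interpolation between neighbouring samples.
function percentileCont(values: number[], fraction: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = values.slice().sort((a, b) => a - b);
  const pos = fraction * (sorted.length - 1); // fractional index into sorted
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
}

// Groups events by proof kind and returns the P95 for each group.
function p95ByProofKind(events: TteEvent[]): Record<string, number> {
  const groups: Record<string, number[]> = {};
  for (const e of events) {
    if (!groups[e.proofKind]) groups[e.proofKind] = [];
    groups[e.proofKind].push(e.tteMs);
  }
  const out: Record<string, number> = {};
  for (const kind of Object.keys(groups)) {
    out[kind] = percentileCont(groups[kind], 0.95);
  }
  return out;
}
```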

---

### What to put on the team dashboard

* **TTE P95 by page** (Findings list, Finding details).
* **TTE P95 by proof_kind** (sbom / reachability / vex).
* **Error budget burn**: minutes over target per day.
* **Top regressions**: last 7 days vs prior 7.

---

### Acceptance checklist for any finding view

* [ ] First paint shows a real proof snippet (not a summary).
* [ ] “Copy proof” button works within 1 click.
* [ ] TTE P95 in staging ≤ 10s; in prod ≤ 15s.
* [ ] If proof missing, explicit empty‑state + retry path.
* [ ] Telemetry sampled ≥ 50% of sessions (or 100% for internal).

---

If you want, I can turn this into a **`docs/UX/tte.md`** spec plus a **Grafana panel JSON** and a tiny **.NET middleware** to emit the metrics. Just say the word and I’ll drop in ready‑to‑use snippets.

Perfect, got it. Let’s turn **Time‑to‑Evidence (TTE)** into a proper set of *“how we track UX health”* guidelines.

I’ll treat this like something you can drop into an internal engineering/UX doc.

---

## 1. What exactly is Time‑to‑Evidence?

**Definition**

> **TTE = t_first_proof_rendered − t_open_finding**

* **t_open_finding**: when the user first opens a “finding” / detail view (e.g., vulnerability, alert, ticket, log event).
* **t_first_proof_rendered**: when the UI first paints **actual evidence** that backs the finding, for example:

  * The SBOM row showing `package@version`.
  * The call‑graph/data‑flow path to a sink.
  * A VEX note explaining why something is (not) affected.
  * A raw log snippet that the alert is based on.

**Key principle:**
TTE measures **how long users have to trust you blindly** before they can see proof with their own eyes.

---

## 2. UX health goals & targets

Treat TTE like latency SLOs:

* **Primary SLO**:

  * **P95 TTE ≤ 15s** for all findings in normal conditions.
* **Stretch SLO**:

  * **P99 TTE ≤ 30s** for heavy cases (big graphs, huge SBOMs, cold caches).
* **Guardrail**:

  * P50 TTE should be **< 3s**. If the median creeps up, you’re in trouble even if P95 looks OK.

You can refine by feature:

* “Simple” proof (single SBOM row, small payload):

  * P95 ≤ 5s.
* “Complex” proof (reachability graph, cross‑repo joins):

  * P95 ≤ 15s.

**UX rule of thumb**

* Under 2s: feels instant.
* 2–10s: acceptable if clearly loading something heavy.
* 10–30s: needs **strong** feedback (progress, partial results, explanations).
* Over 30s: the system should probably **offer fallback** (e.g., “download raw evidence” or “retry”).
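
The rule-of-thumb bands above map naturally onto a small classifier that a UI could use to decide which loading treatment to show. This is a sketch only: the band names and `classifyTte` are hypothetical, and treating exactly 10s as "acceptable" and exactly 30s as "needs feedback" is an assumption, since the source does not say which side the boundaries fall on.

```ts
// Hypothetical band names mirroring the rule of thumb in this section.
type TteBand = "instant" | "acceptable" | "needs_feedback" | "offer_fallback";

// Maps an elapsed time in milliseconds to a UX band.
function classifyTte(tteMs: number): TteBand {
  if (tteMs < 2000) return "instant";          // under 2s
  if (tteMs <= 10000) return "acceptable";     // 2-10s
  if (tteMs <= 30000) return "needs_feedback"; // over 10s, up to 30s
  return "offer_fallback";                     // over 30s
}
```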

---

## 3. Instrumentation guidelines

### 3.1 Event model

Emit two core events per finding view:

1. **`finding_open`**

   * When user opens the finding details (route enter / modal open).
   * Must include:

     * `finding_id`
     * `tenant_id` / `org_id`
     * `user_role` (admin, dev, triager, etc.)
     * `entry_point` (list, search, notification, deep link)
     * `ui_version` / `build_sha`

2. **`proof_rendered`**

   * First time *any* qualifying proof element is painted.
   * Must include:

     * `finding_id`
     * `proof_kind` (`sbom | reachability | vex | logs | other`)
     * `source` (`local_cache | backend_api | 3rd_party`)
     * `proof_height` (e.g., pixel offset from top), to ensure it’s actually above the fold or very close.

**Derived metric**

Your telemetry pipeline should compute:

```text
tte_ms = proof_rendered.timestamp - finding_open.timestamp
```

If there are multiple `proof_rendered` events for the same `finding_open`, use:

* **TTE (first proof)**: minimum timestamp; primary SLO.
* Optionally: **TTE (full evidence)**: last proof in a defined “bundle” (e.g., path + SBOM row).
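
The derivation above can be sketched as a join between the two event streams. This assumes each finding view carries a correlation key, here a hypothetical `viewId`, which the source does not define (it only lists `finding_id`). As a simplification, "full evidence" is taken to be the last proof event of any kind for the view rather than a defined bundle.

```ts
// Hypothetical event shapes; `viewId` is an assumed correlation key.
interface FindingOpen { viewId: string; t: number }
interface ProofRendered { viewId: string; proofKind: string; t: number }
interface TteResult { viewId: string; firstProofMs: number; fullEvidenceMs: number }

// Joins opens with their proof events and derives both TTE variants.
// Views with no proof event yet produce no result (no TTE value).
function deriveTte(opens: FindingOpen[], proofs: ProofRendered[]): TteResult[] {
  const results: TteResult[] = [];
  for (const open of opens) {
    let first = -1; // -1 means "none seen"; timestamps are assumed non-negative
    let last = -1;
    for (const p of proofs) {
      if (p.viewId !== open.viewId || p.t < open.t) continue;
      if (first < 0 || p.t < first) first = p.t;
      if (p.t > last) last = p.t;
    }
    if (first < 0) continue;
    results.push({
      viewId: open.viewId,
      firstProofMs: first - open.t,
      fullEvidenceMs: last - open.t
    });
  }
  return results;
}
```

Only `firstProofMs` would feed the primary SLO; `fullEvidenceMs` is the optional secondary measure.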

### 3.2 Implementation notes

**Frontend**

* Emit `finding_open` as soon as:

  * The route is confirmed and
  * You know which `finding_id` is being displayed.
* Emit `proof_rendered`:

  * **Not** when you *fetch* data, but when at least one evidence component is **visibly rendered**.
  * Easiest approach: hook into component lifecycle / intersection observer on the evidence container.

Pseudo‑example:

```ts
// On route/mount:
metrics.emit('finding_open', {
  findingId,
  entryPoint,
  userRole,
  uiVersion,
  t: performance.now()
});

// In EvidencePanel component, after first render with real data:
if (!hasEmittedProof && hasRealEvidence) {
  metrics.emit('proof_rendered', {
    findingId,
    proofKind: 'sbom',
    source: 'backend_api',
    t: performance.now()
  });
  hasEmittedProof = true;
}
```

**Backend**

* No special requirement beyond:

  * Stable IDs (`finding_id`).
  * Knowing which API endpoints respond with evidence payloads; you’ll want to correlate backend latency with TTE later.

---

## 4. Data quality & sampling

If you want TTE to drive decisions, the data must be boringly reliable.

**Guidelines**

1. **Sample rate**

   * Start with **100%** in staging.
   * In production, aim for **≥ 25% of sessions** for TTE events at minimum; 100% is ideal if volume is reasonable.

2. **Clock skew**

   * Prefer **frontend timestamps** using `performance.now()` for TTE; they’re monotonic within a page.
   * `performance.now()` values are relative to each page load’s time origin, so only subtract two values captured in the same page context (a full reload between open and render invalidates the difference).
   * Don’t mix backend clocks into the TTE calculation.

3. **Bot / synthetic traffic**

   * Tag synthetic tests (`is_synthetic = true`) and exclude them from UX health dashboards.

4. **Retry behavior**

   * If the proof fails to load and user hits “retry”:

     * Treat it as a separate measurement (`retry = true`) or
     * Log an additional `proof_error` event with error class (timeout, 5xx, network, parse, etc.).
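
Sampling "by session" as described above works best when the decision is deterministic, so every event in a sampled session is kept. One way to sketch that is to hash the session ID into a bucket. The hash below is a simple polynomial rolling hash chosen for brevity, not a recommendation; `hashSessionId` and `isSampled` are hypothetical names, and how evenly real session IDs spread across buckets is an assumption to validate on your own data.

```ts
// Simple 32-bit polynomial rolling hash (illustrative, not cryptographic).
function hashSessionId(sessionId: string): number {
  let h = 0;
  for (let i = 0; i < sessionId.length; i++) {
    // `>>> 0` keeps the running value in unsigned 32-bit range.
    h = (h * 31 + sessionId.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Deterministic per-session decision: the same session ID always gets the same answer.
// `rate` is a fraction between 0 and 1, e.g. 0.25 for the 25% floor in this section.
function isSampled(sessionId: string, rate: number): boolean {
  return hashSessionId(sessionId) % 10000 < rate * 10000;
}
```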

---

## 5. Dashboards: how to watch TTE

You want a small, opinionated set of views that answer:

> “Is UX getting better or worse for people trying to understand findings?”

### 5.1 Core widgets

1. **TTE distribution**

   * P50 / P90 / P95 / P99 per day (or per release).
   * Split by `proof_kind`.

2. **TTE by page / surface**

   * Finding list → detail.
   * Deep links from notifications.
   * Direct URLs / bookmarks.

3. **TTE by user segment**

   * New users vs power users.
   * Different roles (security engineer vs application dev).

4. **Error budget panel**

   * “Minutes over SLO per day”, e.g., sum of all user‑minutes where TTE > 15s.
   * Use this to prioritize work.

5. **Correlation with engagement**

   * Scatter: TTE vs session length, or TTE vs “user clicked ‘ignore’ / ‘snooze’”.
   * Aim to confirm the obvious: **long TTE → worse engagement/completion**.
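
The error budget panel in the list above needs a concrete definition of "minutes over SLO". One possible reading, sketched here, counts only the time each view spent beyond the 15s target and sums it per day. That interpretation is an assumption (the source could also mean the full duration of slow views), and `TteSample` and `minutesOverSloByDay` are hypothetical names.

```ts
// Hypothetical sample shape; `day` is a calendar date string such as "2024-01-01".
interface TteSample { day: string; tteMs: number }

const SLO_MS = 15000; // the 15s target used throughout this document

// Sums, per day, the minutes each view spent beyond the SLO.
// Views at or under the SLO contribute zero.
function minutesOverSloByDay(samples: TteSample[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const s of samples) {
    const excessMs = Math.max(0, s.tteMs - SLO_MS);
    out[s.day] = (out[s.day] || 0) + excessMs / 60000;
  }
  return out;
}
```

Counting only the excess keeps the panel at zero when every view meets the target, which makes the burn easy to read at a glance.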

### 5.2 Operational details

* Update granularity: **real‑time or ≤15 min** for on‑call/ops panels.
* Retention: at least **90 days** to see trends across big releases.
* Breakdowns:

  * `backend_region` (to catch regional issues).
  * `build_version` (to spot regressions quickly).

---

## 6. UX & engineering design rules anchored in TTE

These are the **behavior rules** for the product that keep TTE healthy.

### 6.1 “Evidence first” layout rules

* **Evidence above the fold**

  * At least *one* proof element must be visible **without scrolling** on a typical laptop viewport.
* **Summary second**

  * CVSS scores, severity badges, long descriptions: all secondary. Evidence should come *before* opinion.
* **No fake proof**

  * Don’t use placeholders that *look* like evidence but aren’t (e.g., “example path” or generic text).
  * If evidence is still loading, show a clear skeleton/loader with “Loading evidence…”.

### 6.2 Loading strategy rules

* Start fetching evidence **as soon as navigation begins**, not after the page is fully mounted.
* Defer non‑critical widgets with **lazy loading** until after proof is shown.
* If a call is known to be heavy:

  * Consider **precomputing** and caching the top evidence (shortest path, first SBOM hit).
  * Stream results: render first proof item as soon as it arrives; don’t wait for the full list.

### 6.3 Empty / error state rules

* If there is genuinely no evidence:

  * Explicitly say **“No supporting evidence available yet”** and treat TTE as:

    * Either “no value” (excluded), or
    * A special bucket `proof_kind = "none"`.
* If loading fails:

  * Show a clear error and a **retry** that emits `proof_rendered` once it succeeds.
  * Log `proof_error` with reason; track error rate alongside TTE.

---

## 7. How to *use* TTE in practice

### 7.1 For releases

For any change that affects findings UI or evidence plumbing:

* Add a release checklist item:

  * “No regression on TTE P95 for [pages X, Y].”
* During rollout:

  * Compare **pre‑ vs post‑release** TTE P95 by `ui_version`.
  * If regression > 20%:

    * Roll back, or
    * Add a follow‑up ticket explicitly tagged with the regression.
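
The rollout comparison above reduces to a relative-change check between two P95 values. A minimal sketch, assuming both values are already computed per `ui_version`; `checkTteRegression` and its return shape are hypothetical, and whether a flagged regression leads to a rollback or a ticket remains a human decision.

```ts
// Hypothetical result shape for a pre/post release comparison.
interface RegressionCheck {
  change: number;       // relative change, e.g. 0.25 means +25%
  regression: boolean;  // true when the change exceeds the threshold
}

// Compares post-release P95 against pre-release P95.
// The default threshold of 0.2 reflects the "> 20%" rule in this section.
function checkTteRegression(
  preP95Ms: number,
  postP95Ms: number,
  threshold: number = 0.2
): RegressionCheck {
  if (preP95Ms <= 0) throw new Error("pre-release P95 must be positive");
  const change = (postP95Ms - preP95Ms) / preP95Ms;
  return { change, regression: change > threshold };
}
```

For example, going from a 10s to a 12.5s P95 is a 25% increase and would be flagged, while 10s to 11s would not.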

### 7.2 For experiments / A/B tests

When running UI experiments around findings:

* Always capture TTE per variant.
* Compare:

  * TTE P50/P95.
  * Task completion rate (e.g., “user changed status”).
  * Subjective UX (CSAT) if you have it.

You’re looking for patterns like:

* Variant B: **+5% completion**, **+8% TTE** → maybe OK.
* Variant C: **+2% completion**, **+70% TTE** → probably not acceptable.

### 7.3 For prioritization

Use TTE as a lever in planning:

* If P95 TTE is healthy and stable:

  * More room for new features / experiments.
* If P95 TTE is trending up for 2+ weeks:

  * Time to schedule a “TTE debt” story: caching, query optimization, UI re‑layout, etc.

---

## 8. Quick “TTE‑ready” checklist

You’re “tracking UX health with TTE” if you can honestly tick these:

1. **Instrumentation**

   * [ ] `finding_open` + `proof_rendered` events exist and are correlated.
   * [ ] TTE computed in a stable pipeline (joins, dedupe, etc.).
2. **Targets**

   * [ ] TTE SLOs defined (P95, P99) and agreed by UX + engineering.
3. **Dashboards**

   * [ ] A dashboard shows TTE by proof kind, page, and release.
   * [ ] On‑call / ops can see TTE in near real‑time.
4. **UX rules**

   * [ ] Evidence is visible above the fold for all main finding types.
   * [ ] Non‑critical widgets load after evidence.
   * [ ] Empty/error states are explicit about evidence availability.
5. **Process**

   * [ ] Major UI changes check TTE pre vs post as part of release acceptance.
   * [ ] Regressions in TTE create real tickets, not just “we’ll watch it”.

---

If you tell me what stack you’re on (e.g., React + Next.js + OpenTelemetry + X observability tool), I can turn this into concrete code snippets and an example dashboard spec (fields, queries, charts) tailored exactly to your setup.