# Stella Ops Triage UX Guide (Narrative-First + Proof-Linked)
## 0. Scope
This guide specifies the user experience for Stella Ops triage and evidence workflows:
- Narrative-first case view that answers DevOps' three questions quickly.
- Proof-linked evidence surfaces (SBOM/VEX/provenance/reachability/replay).
- Quiet-by-default noise controls with reversible, signed decisions.
- Smart-Diff history that explains meaningful risk changes.

Architecture constraints:
- Lattice/risk evaluation executes in `scanner.webservice`.
- `concelier` and `excititor` must **preserve prune source** (every merged/pruned datum remains traceable to origin).
## 1. UX Contract
Every triage surface must answer, in order:
1) Can I ship this?
2) If not, what exactly blocks me?
3) What's the minimum safe change to unblock?

Everything else is secondary and should be progressively disclosed.
## 2. Primary Objects in the UX
- Finding/Case: a specific vuln/rule tied to an asset (image/artifact/environment).
- Risk Result: deterministic lattice output (score/verdict/lane), computed by `scanner.webservice`.
- Evidence Artifact: signed, hash-addressed proof objects (SBOM slice, VEX doc, provenance, reachability slice, replay manifest).
- Decision: reversible user/system action that changes visibility/gating (mute/ack/exception) and is always signed/auditable.
- Snapshot: immutable record of inputs/outputs hashes enabling Smart-Diff.
## 3. Global UX Principles
### 3.1 Narrative-first, list-second
The default view is a "Case" narrative header plus an evidence rail. Lists exist for scanning and sorting, not as the primary cognitive surface.
### 3.2 Time-to-evidence (TTFS) target
From pipeline alert click → human-readable verdict + first evidence link:
- p95 ≤ 30 seconds (including auth and initial fetch).
- "Evidence" is always one click away (no deep tab chains).
### 3.3 Proof-linking is mandatory
Any chip/badge that asserts a fact must link to the exact evidence object(s) that justify it.
Examples:
- "Reachable: Yes" → call-stack slice (and/or runtime hit record)
- "VEX: not_affected" → effective VEX assertion + signature details
- "Blocked by Policy Gate X" → policy artifact + lattice explanation
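
The proof-linking rule can be checked mechanically. The following is a minimal sketch, assuming a hypothetical chip model in which every chip carries a list of evidence references; the type names, field names, and `kind` values are illustrative and are not taken from the Stella Ops schema.

```typescript
// Hypothetical shapes for illustration only; not the real Stella Ops schema.
interface EvidenceRef {
  kind: "reachability_slice" | "runtime_hit" | "vex_assertion" | "policy_artifact";
  contentHash: string; // hash-addressed evidence object
}

interface FactChip {
  label: string;           // e.g. "Reachable: Yes"
  evidence: EvidenceRef[]; // must be non-empty for any asserted fact
}

// Returns the chips that assert a fact without linking to any evidence object.
function findUnlinkedChips(chips: FactChip[]): FactChip[] {
  return chips.filter((chip) => chip.evidence.length === 0);
}

const sampleChips: FactChip[] = [
  {
    label: "Reachable: Yes",
    evidence: [{ kind: "reachability_slice", contentHash: "sha256:aaa" }],
  },
  { label: "VEX: not_affected", evidence: [] }, // violates the rule
];

// Lists the labels of chips that would fail a proof-linking check.
console.log(findUnlinkedChips(sampleChips).map((chip) => chip.label));
```

A check like this could run in a UI test suite so that a chip without evidence fails the build rather than reaching users.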
### 3.4 Quiet by default, never silent
Muted lanes are hidden by default but surfaced with counts and a toggle.
Muting never deletes; it creates a signed Decision with TTL/reason and is reversible.
### 3.5 Deterministic and replayable
Users must be able to export an evidence bundle containing:
- scan replay manifest (feeds/rules/policies/hashes)
- signed artifacts
- outputs (risk result, snapshots)

The bundle must be complete enough for auditors to replay the evaluation and obtain identical results.
## 4. Information Architecture
### 4.1 Screens
1) Findings Table (global)
   - Purpose: scan, sort, filter, jump into cases
   - Default: muted lanes hidden
   - Banner: shows the count of findings auto-muted by policy, with a "Show" toggle
2) Case View (single-page narrative)
   - Purpose: decision making + proof review
   - Above fold: verdict + chips + deterministic score
   - Right rail: evidence list
   - Tabs (max 3):
     - Evidence (default)
     - Reachability & Impact
     - History (Smart-Diff)
3) Export / Verify Bundle
   - Purpose: offline/audit verification
   - Async export job, then download DSSE-signed zip
   - Verification UI: signature status, hash tree, issuer chain
### 4.2 Lanes (visibility buckets)
Lanes are a UX categorization derived from deterministic risk + decisions:
- ACTIVE
- BLOCKED
- NEEDS_EXCEPTION
- MUTED_REACH (non-reachable)
- MUTED_VEX (effective VEX says not_affected)
- COMPENSATED (controls satisfy policy)

Default: show ACTIVE/BLOCKED/NEEDS_EXCEPTION.
Muted lanes appear behind a toggle and via the banner counts.
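
The default lane filter and the banner counts can be sketched as follows. This is an illustrative sketch, not the actual implementation: the `LaneRow` shape and the function name are hypothetical, and treating COMPENSATED as hidden by default is an assumption (the guide only lists the three lanes shown by default).

```typescript
type TriageLane =
  | "ACTIVE"
  | "BLOCKED"
  | "NEEDS_EXCEPTION"
  | "MUTED_REACH"
  | "MUTED_VEX"
  | "COMPENSATED";

// Lanes shown without any toggle, per the default stated above.
const DEFAULT_VISIBLE_LANES: TriageLane[] = ["ACTIVE", "BLOCKED", "NEEDS_EXCEPTION"];

// Hypothetical row shape; a real finding would carry many more fields.
interface LaneRow {
  id: string;
  lane: TriageLane;
}

// Splits rows into what the table renders and the per-lane counts for the banner.
// Hidden counts are always computed so the banner never goes silent.
function partitionLanes(rows: LaneRow[], showHidden: boolean) {
  const visible = rows.filter(
    (row) => showHidden || DEFAULT_VISIBLE_LANES.includes(row.lane),
  );
  const hiddenCounts: Partial<Record<TriageLane, number>> = {};
  for (const row of rows) {
    if (!DEFAULT_VISIBLE_LANES.includes(row.lane)) {
      hiddenCounts[row.lane] = (hiddenCounts[row.lane] ?? 0) + 1;
    }
  }
  return { visible, hiddenCounts };
}

const laneRows: LaneRow[] = [
  { id: "a", lane: "ACTIVE" },
  { id: "b", lane: "MUTED_REACH" },
  { id: "c", lane: "MUTED_VEX" },
  { id: "d", lane: "MUTED_REACH" },
  { id: "e", lane: "BLOCKED" },
];

console.log(partitionLanes(laneRows, false));
```

Computing `hiddenCounts` independently of the toggle keeps the "quiet by default, never silent" rule intact: the counts stay visible whether or not the muted rows are.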
## 5. Case View Layout (Required)
### 5.1 Top Bar
- Asset name / Image tag / Environment
- Last evaluated time
- Policy profile name (e.g., "Strict CI Gate")
### 5.2 Verdict Banner (Above fold)
Large, unambiguous verdict:
- SHIP
- BLOCKED
- NEEDS EXCEPTION

Below verdict:
- One-line "why" summary (max 140 chars), e.g.:
  - "Reachable path observed; exploit signal present; Policy 'prod-strict' blocks."
### 5.3 Chips (Each chip is clickable)
Minimum set:
- Reachability: Reachable / Not reachable / Unknown (with confidence)
- Effective VEX: affected / not_affected / under_investigation
- Exploit signal: yes/no + source indicator
- Exposure: internet-exposed yes/no (if available)
- Asset tier: tier label
- Gate: allow/block/exception-needed (policy gate name)

Chip click behavior:
- Opens evidence panel anchored to the proof objects
- Shows source chain (concelier/excititor preserved sources)
### 5.4 Evidence Rail (Always visible right side)
List of evidence artifacts with:
- Type icon
- Title
- Issuer
- Signed/verified indicator
- Content hash (short)
- Created timestamp

Actions per item:
- Preview
- Copy hash
- Open raw
- "Show in bundle" marker
### 5.5 Actions Footer (Only primary actions)
- Create work item
- Acknowledge / Mute (opens Decision drawer)
- Propose exception (Decision with TTL + approver chain)
- Export evidence bundle

No more than 4 primary buttons. Secondary actions go into a kebab menu.
## 6. Decision Flows (Mute/Ack/Exception)
### 6.1 Decision Drawer (common UI)
Fields:
- Decision kind: Mute reach / Mute VEX / Acknowledge / Exception
- Reason code (dropdown) + free-text note
- TTL (required for exceptions; optional for mutes)
- Policy ref (auto-filled; editable only by admins)
- "Sign and apply" (server-side DSSE signing; user identity included)

On submit:
- Create Decision (signed)
- Re-evaluate lane/verdict if applicable
- Create Snapshot ("DECISION" trigger)
- Show toast with undo link
### 6.2 Undo
Undo is implemented as "revoke decision" (a signed revoke record or revocation fields on the Decision).
Decisions are never deleted.
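
The TTL and revocation rules can be sketched as below. This assumes the "revocation fields" variant with a hypothetical `revokedAt` field; all type and field names are illustrative, and the server-side DSSE signing step is deliberately left out.

```typescript
type DecisionKind = "MUTE_REACH" | "MUTE_VEX" | "ACK" | "EXCEPTION";

// Hypothetical record shape; signing metadata is omitted from this sketch.
interface TriageDecision {
  id: string;
  kind: DecisionKind;
  reasonCode: string;
  createdAt: string;  // ISO-8601 timestamp
  expiresAt?: string; // TTL: required for exceptions, optional for mutes
  revokedAt?: string; // set by undo; the record itself is never deleted
}

// Returns validation errors; an empty array means the decision may be submitted.
function validateDecision(decision: TriageDecision): string[] {
  const errors: string[] = [];
  if (decision.kind === "EXCEPTION" && decision.expiresAt === undefined) {
    errors.push("TTL is required for exceptions");
  }
  return errors;
}

// Undo: returns a revoked copy and leaves the original record untouched.
function revokeDecision(decision: TriageDecision, nowIso: string): TriageDecision {
  return { ...decision, revokedAt: nowIso };
}

// A decision affects visibility/gating only while it is neither revoked nor expired.
function isDecisionInEffect(decision: TriageDecision, nowIso: string): boolean {
  const now = Date.parse(nowIso);
  if (decision.revokedAt !== undefined && Date.parse(decision.revokedAt) <= now) {
    return false;
  }
  if (decision.expiresAt !== undefined && Date.parse(decision.expiresAt) <= now) {
    return false;
  }
  return true;
}
```

Because revocation produces a new record state instead of removing data, the audit trail keeps both the original decision and the moment it was undone.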
## 7. Smart-Diff UX
### 7.1 Timeline
Chronological snapshots:
- when (timestamp)
- trigger (feed/vex/sbom/policy/runtime/decision/rescan)
- summary (short)
### 7.2 Diff panel
Two-column diff:
- Inputs changed (with proof links): VEX assertion changed, policy version changed, runtime trace arrived, etc.
- Outputs changed: lane, verdict, score, gates
### 7.3 Meaningful change definition
The UI only highlights "meaningful" changes:
- verdict change
- lane change
- score crosses a policy threshold
- reachability state changes
- effective VEX status changes

All other changes remain available in an expandable "details" section.
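
The "meaningful change" rule amounts to a comparison of two snapshots' outputs. A minimal sketch follows, assuming a hypothetical `SnapshotOutputs` shape and a caller-supplied list of policy score thresholds; none of these names come from the actual snapshot schema.

```typescript
type ReachabilityState = "REACHABLE" | "NOT_REACHABLE" | "UNKNOWN";
type VexStatus = "affected" | "not_affected" | "under_investigation";

// Hypothetical subset of snapshot outputs relevant to highlighting.
interface SnapshotOutputs {
  verdict: string;
  lane: string;
  score: number;
  reachability: ReachabilityState;
  effectiveVex: VexStatus;
}

// Returns the list of meaningful changes between two snapshots.
// An empty result means the diff belongs in the collapsed "details" section.
function meaningfulChanges(
  prev: SnapshotOutputs,
  next: SnapshotOutputs,
  policyThresholds: number[],
): string[] {
  const changed: string[] = [];
  if (prev.verdict !== next.verdict) changed.push("verdict");
  if (prev.lane !== next.lane) changed.push("lane");
  // A threshold is crossed when the two scores fall on opposite sides of it.
  const crossed = policyThresholds.some(
    (threshold) => (prev.score >= threshold) !== (next.score >= threshold),
  );
  if (crossed) changed.push("score_threshold");
  if (prev.reachability !== next.reachability) changed.push("reachability");
  if (prev.effectiveVex !== next.effectiveVex) changed.push("effective_vex");
  return changed;
}
```

Note that a score change alone is not highlighted unless it crosses a threshold, which is what keeps routine feed updates out of the timeline's foreground.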
## 8. Performance & UI Engineering Requirements
- Findings table uses virtual scroll and server-side pagination.
- Case view loads in 2 steps:
  1) Header narrative (small payload)
  2) Evidence list + snapshots (lazy)
- Evidence previews are lazy-loaded and cancellable.
- Use ETag/If-None-Match for case and evidence list endpoints.
- UI must remain usable under high latency (air-gapped / offline kits):
  - show cached last-known verdict with clear "stale" marker
  - allow exporting bundles from cached artifacts when permissible
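
The ETag revalidation and the "stale" marker can be combined in one small client-side cache. The sketch below is transport-agnostic on purpose: it only decides which request header to send and how to interpret the response, so the `EtagCache` class and its method names are hypothetical and no real endpoint is assumed.

```typescript
// What the transport layer reports back; undefined means the request failed
// (timeout, offline kit, air-gapped environment).
interface RevalidationResponse<T> {
  status: number; // HTTP status code
  etag?: string;  // value of the ETag response header, if present
  body?: T;       // parsed body for 200 responses
}

interface RenderableResult<T> {
  body: T;
  stale: boolean; // true when served from cache without successful revalidation
}

class EtagCache<T> {
  private entries: Record<string, { etag: string; body: T }> = {};

  // Headers to attach to the next GET for this URL.
  conditionalHeaders(url: string): Record<string, string> {
    const entry = this.entries[url];
    return entry ? { "If-None-Match": entry.etag } : {};
  }

  // Turns a response (or its absence) into something the UI can render.
  resolve(url: string, response?: RevalidationResponse<T>): RenderableResult<T> | undefined {
    const cached = this.entries[url];
    if (response && response.status === 200 && response.body !== undefined) {
      if (response.etag !== undefined) {
        this.entries[url] = { etag: response.etag, body: response.body };
      }
      return { body: response.body, stale: false };
    }
    if (!cached) return undefined; // nothing to show yet
    // 304 Not Modified confirms the cached copy; anything else marks it stale.
    const revalidated = response !== undefined && response.status === 304;
    return { body: cached.body, stale: !revalidated };
  }
}
```

In an Angular client this logic would typically sit in an HTTP interceptor or a data service; the sketch keeps it as a plain class so the decision rules are easy to test in isolation.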
## 9. Accessibility & Operator Usability
- Keyboard navigation: table rows, chips, evidence list
- High contrast mode supported
- All status is conveyed by text + shape (not color only)
- Copy-to-clipboard for hashes, purls, CVE IDs
## 10. Telemetry (Must instrument)
- TTFS: notification click → verdict banner rendered
- Time-to-proof: click chip → proof preview shown
- Mute reversal rate (share of auto-muted findings that later become actionable)
- Bundle export success/latency
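
To compare collected TTFS samples against the p95 ≤ 30 seconds target from section 3.2, a percentile has to be computed over the recorded durations. The sketch below uses the nearest-rank method, which is one common definition and an assumption here; the guide does not say which percentile method the telemetry backend uses, and the function names are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length)); // 1-based
  return sorted[rank - 1];
}

const TTFS_P95_TARGET_MS = 30000; // p95 target from section 3.2, in milliseconds

// True when the p95 of the recorded TTFS durations meets the target.
function meetsTtfsTarget(ttfsSamplesMs: number[]): boolean {
  return percentile(ttfsSamplesMs, 95) <= TTFS_P95_TARGET_MS;
}

// Example durations (ms) from notification click to verdict banner rendered.
const ttfsSamplesMs = [12000, 8000, 31000, 9000, 14000, 7000, 22000, 11000, 10000, 13000];
console.log(percentile(ttfsSamplesMs, 95), meetsTtfsTarget(ttfsSamplesMs));
```

With small sample counts the p95 is dominated by the slowest few interactions, so a production dashboard would usually aggregate over a larger window before alerting.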
## 11. Responsibilities by Service
- `scanner.webservice`:
  - produces reachability results, risk results, snapshots
  - stores/serves case narrative header, evidence indexes, Smart-Diff
- `concelier`:
  - aggregates vuln feeds and preserves per-source provenance ("preserve prune source")
- `excititor`:
  - merges VEX and preserves original assertion sources ("preserve prune source")
- `notify.webservice`:
  - emits first_signal / risk_changed / gate_blocked
- `scheduler.webservice`:
  - re-evaluates existing images on feed/policy updates, triggers snapshots

---

**Document Version**: 1.0

**Target Platform**: .NET 10, PostgreSQL >= 16, Angular v17