stella-ops.org/git.stella-ops.org

master dee252940b SPRINT_3600_0001_0001 - Reachability Drift Detection Master Plan

2025-12-18 00:02:31 +02:00

7.8 KiB

Stella Ops Triage UX Guide (Narrative-First + Proof-Linked)

0. Scope

This guide specifies the user experience for Stella Ops triage and evidence workflows:

Narrative-first case view that answers DevOps' three questions quickly.
Proof-linked evidence surfaces (SBOM/VEX/provenance/reachability/replay).
Quiet-by-default noise controls with reversible, signed decisions.
Smart-Diff history that explains meaningful risk changes.

Architecture constraints:

Lattice/risk evaluation executes in scanner.webservice.
concelier and excititor must preserve prune source (every merged/pruned datum remains traceable to origin).

1. UX Contract

Every triage surface must answer, in order:

Can I ship this?
If not, what exactly blocks me?
What's the minimum safe change to unblock?

Everything else is secondary and should be progressively disclosed.

2. Primary Objects in the UX

Finding/Case: a specific vuln/rule tied to an asset (image/artifact/environment).
Risk Result: deterministic lattice output (score/verdict/lane), computed by scanner.webservice.
Evidence Artifact: signed, hash-addressed proof objects (SBOM slice, VEX doc, provenance, reachability slice, replay manifest).
Decision: reversible user/system action that changes visibility/gating (mute/ack/exception) and is always signed/auditable.
Snapshot: immutable record of input and output hashes that enables Smart-Diff.
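The objects above could be modeled in the UI layer roughly as follows. This is a minimal sketch: every field name is an assumption made for illustration and does not describe the actual scanner.webservice contract. The lane and verdict values are taken from sections 4.2 and 5.2.

```typescript
// Hypothetical shapes for the five primary objects; all field names are assumptions.
type Lane =
  | "ACTIVE" | "BLOCKED" | "NEEDS_EXCEPTION"
  | "MUTED_REACH" | "MUTED_VEX" | "COMPENSATED";
type Verdict = "SHIP" | "BLOCKED" | "NEEDS_EXCEPTION";

interface EvidenceArtifact {
  kind: "sbom_slice" | "vex_doc" | "provenance" | "reachability_slice" | "replay_manifest";
  contentHash: string; // hash address of the artifact
  issuer: string;
  signed: boolean;
}

interface RiskResult {
  score: number;
  verdict: Verdict;
  lane: Lane;
}

interface Decision {
  kind: "mute" | "ack" | "exception";
  reason: string;
  expiresAt?: string; // TTL as an ISO timestamp, when set
  signature: string;  // decisions are always signed
}

interface Snapshot {
  readonly inputHashes: readonly string[];
  readonly outputHashes: readonly string[];
  readonly createdAt: string;
}

interface Finding {
  id: string;
  rule: string; // vuln or rule identifier
  asset: { name: string; kind: "image" | "artifact" | "environment" };
  risk: RiskResult;
  evidence: EvidenceArtifact[];
  decisions: Decision[];
  snapshots: Snapshot[];
}

// Placeholder values only; none of these identifiers are real.
const exampleFinding: Finding = {
  id: "case-1",
  rule: "CVE-0000-0000",
  asset: { name: "registry.example/app:1.0", kind: "image" },
  risk: { score: 8.1, verdict: "BLOCKED", lane: "BLOCKED" },
  evidence: [
    { kind: "reachability_slice", contentHash: "sha256:aaa111", issuer: "scanner", signed: true },
  ],
  decisions: [],
  snapshots: [],
};
```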

3. Global UX Principles

3.1 Narrative-first, list-second

Default view is a "Case" narrative header + evidence rail. Lists exist for scanning and sorting, but not as the primary cognitive surface.

3.2 Time-to-evidence (TTFS) target

From pipeline alert click → human-readable verdict + first evidence link:

p95 ≤ 30 seconds (including auth and initial fetch).
"Evidence" is always one click away (no deep tab chains).
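One way to check the p95 target against collected timings is a nearest-rank percentile over TTFS samples. The sketch below assumes samples are recorded in milliseconds; the function names are hypothetical, and nearest-rank is only one of several common percentile definitions.

```typescript
// Budget taken from the target above: p95 of at most 30 seconds.
const TTFS_P95_BUDGET_MS = 30_000;

// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

function meetsTtfsTarget(samplesMs: number[]): boolean {
  return percentile(samplesMs, 95) <= TTFS_P95_BUDGET_MS;
}
```

With nearest-rank, a single slow outlier in a small sample set is enough to fail the check, so the sample window should be large enough to be representative.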

3.3 Proof-linking is mandatory

Any chip/badge that asserts a fact must link to the exact evidence object(s) that justify it.

Examples:

"Reachable: Yes" → call-stack slice (and/or runtime hit record)
"VEX: not_affected" → effective VEX assertion + signature details
"Blocked by Policy Gate X" → policy artifact + lattice explanation
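The rule can be enforced mechanically, for example in a unit test or a development-time check that flags any chip rendered without evidence. The types and the function name below are assumptions for illustration only.

```typescript
// Hypothetical chip model: each chip carries references to its proof objects.
interface EvidenceRef {
  kind: string;        // e.g. "reachability_slice", "vex_doc", "policy"
  contentHash: string; // hash address of the evidence object
}

interface Chip {
  label: string;
  value: string;
  evidence: EvidenceRef[];
}

// Returns the labels of chips that assert a fact without linking any proof.
function chipsMissingProof(chips: Chip[]): string[] {
  return chips
    .filter((chip) => chip.evidence.length === 0)
    .map((chip) => chip.label);
}
```

An empty result means every chip on the surface is proof-linked; a non-empty result names the chips that violate the rule.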

3.4 Quiet by default, never silent

Muted lanes are hidden by default but surfaced with counts and a toggle. Muting never deletes; it creates a signed Decision with TTL/reason and is reversible.

3.5 Deterministic and replayable

Users must be able to export an evidence bundle containing:

scan replay manifest (feeds/rules/policies/hashes)
signed artifacts
outputs (risk result, snapshots)

This lets auditors replay the evaluation identically.

4. Information Architecture

4.1 Screens

Findings Table (global)

Purpose: scan, sort, filter, jump into cases
Default: muted lanes hidden
Banner: shows count of auto-muted by policy with "Show" toggle

Case View (single-page narrative)

Purpose: decision making + proof review
Above fold: verdict + chips + deterministic score
Right rail: evidence list
Tabs (max 3):
- Evidence (default)
- Reachability & Impact
- History (Smart-Diff)

Export / Verify Bundle

Purpose: offline/audit verification
Async export job, then download DSSE-signed zip
Verification UI: signature status, hash tree, issuer chain

4.2 Lanes (visibility buckets)

Lanes are a UX categorization derived from deterministic risk + decisions:

ACTIVE
BLOCKED
NEEDS_EXCEPTION
MUTED_REACH (non-reachable)
MUTED_VEX (effective VEX says not_affected)
COMPENSATED (controls satisfy policy)

Default: show ACTIVE/BLOCKED/NEEDS_EXCEPTION. Muted lanes appear behind a toggle and via the banner counts.
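The default visibility rule and the banner counts could be computed together in one pass over the findings. The sketch below is illustrative: the row shape and function name are hypothetical, and it assumes that every lane outside the three default lanes (including COMPENSATED) is hidden until the toggle is on.

```typescript
type Lane =
  | "ACTIVE" | "BLOCKED" | "NEEDS_EXCEPTION"
  | "MUTED_REACH" | "MUTED_VEX" | "COMPENSATED";

// Lanes shown without any toggle, per the default above.
const DEFAULT_LANES: Lane[] = ["ACTIVE", "BLOCKED", "NEEDS_EXCEPTION"];

interface FindingRow {
  id: string;
  lane: Lane;
}

interface TableView {
  visible: FindingRow[];
  // Per-lane counts of rows hidden by default; feeds the banner.
  hiddenByDefaultCounts: { [lane: string]: number };
}

function buildTableView(rows: FindingRow[], showMuted: boolean): TableView {
  const visible: FindingRow[] = [];
  const hiddenByDefaultCounts: { [lane: string]: number } = {};
  for (const row of rows) {
    const isDefault = DEFAULT_LANES.indexOf(row.lane) !== -1;
    if (isDefault || showMuted) visible.push(row);
    if (!isDefault) {
      hiddenByDefaultCounts[row.lane] = (hiddenByDefaultCounts[row.lane] || 0) + 1;
    }
  }
  return { visible, hiddenByDefaultCounts };
}
```

The counts are computed regardless of the toggle state, so the banner stays accurate whether muted lanes are shown or hidden.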

5. Case View Layout (Required)

5.1 Top Bar

Asset name / Image tag / Environment
Last evaluated time
Policy profile name (e.g., "Strict CI Gate")

5.2 Verdict Banner (Above fold)

Large, unambiguous verdict:

SHIP
BLOCKED
NEEDS EXCEPTION

Below verdict:

One-line "why" summary (max 140 chars), e.g.:
- "Reachable path observed; exploit signal present; Policy 'prod-strict' blocks."
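A small helper can keep the summary within the 140-character limit. The sketch assumes the summary is assembled from short reason fragments joined with "; " and that overflow is cut with an ellipsis; both the function name and the truncation policy are illustrative choices, not requirements of this guide.

```typescript
const WHY_MAX_CHARS = 140;

// Joins reason fragments and truncates to the limit, ending with an
// ellipsis when text was cut. Length is measured in UTF-16 code units.
function whySummary(reasons: string[]): string {
  const text = reasons.join("; ");
  if (text.length <= WHY_MAX_CHARS) return text;
  const cut = text.slice(0, WHY_MAX_CHARS - 1).replace(/\s+$/, "");
  return cut + "…";
}
```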

5.3 Chips (Each chip is clickable)

Minimum set:

Reachability: Reachable / Not reachable / Unknown (with confidence)
Effective VEX: affected / not_affected / under_investigation
Exploit signal: yes/no + source indicator
Exposure: internet-exposed yes/no (if available)
Asset tier: tier label
Gate: allow/block/exception-needed (policy gate name)

Chip click behavior:

Opens evidence panel anchored to the proof objects
Shows source chain (concelier/excititor preserved sources)

5.4 Evidence Rail (Always visible right side)

List of evidence artifacts with:

Type icon
Title
Issuer
Signed/verified indicator
Content hash (short)
Created timestamp

Actions per item:

Preview
Copy hash
Open raw
"Show in bundle" marker

5.5 Actions Footer (Only primary actions)

Create work item
Acknowledge / Mute (opens Decision drawer)
Propose exception (Decision with TTL + approver chain)
Export evidence bundle

No more than 4 primary buttons. Secondary actions go into kebab menu.
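The four-button cap can be applied at render time by splitting the available actions into a primary group and a kebab-menu group. The sketch assumes each action carries a numeric priority (lower means more important); that field and the function name are hypothetical.

```typescript
interface FooterAction {
  id: string;
  label: string;
  priority: number; // assumption: lower value = more important
}

const MAX_PRIMARY_ACTIONS = 4;

// Splits actions into footer buttons and kebab-menu overflow.
function layoutFooterActions(actions: FooterAction[]): {
  primary: FooterAction[];
  kebab: FooterAction[];
} {
  const ordered = actions.slice().sort((a, b) => a.priority - b.priority);
  return {
    primary: ordered.slice(0, MAX_PRIMARY_ACTIONS),
    kebab: ordered.slice(MAX_PRIMARY_ACTIONS),
  };
}
```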

6. Decision Flows (Mute/Ack/Exception)

6.1 Decision Drawer (common UI)

Fields:

Decision kind: Mute reach / Mute VEX / Acknowledge / Exception
Reason code (dropdown) + free-text note
TTL (required for exceptions; optional for mutes)
Policy ref (auto-filled; editable only by admins)
"Sign and apply" (server-side DSSE signing; user identity included)

On submit:

Create Decision (signed)
Re-evaluate lane/verdict if applicable
Create Snapshot ("DECISION" trigger)
Show toast with undo link
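The drawer rules and the submit sequence can be sketched as a validator plus an orchestrator with injected steps. Everything here is illustrative: the type names, the `SubmitDeps` interface, and the requirement that a reason code be non-empty are assumptions. The steps are written as synchronous calls to keep the ordering visible; in practice they would be asynchronous requests, with signing performed server-side.

```typescript
type DecisionKind = "MUTE_REACH" | "MUTE_VEX" | "ACKNOWLEDGE" | "EXCEPTION";

interface DecisionDraft {
  kind: DecisionKind;
  reasonCode: string;
  note?: string;
  ttlDays?: number; // required for exceptions, optional for mutes
  policyRef: string;
}

function validateDraft(draft: DecisionDraft): string[] {
  const errors: string[] = [];
  // Assumption: a reason code must always be selected.
  if (draft.reasonCode.trim() === "") errors.push("reason code is required");
  if (draft.kind === "EXCEPTION" && draft.ttlDays === undefined) {
    errors.push("TTL is required for exceptions");
  }
  if (draft.ttlDays !== undefined && draft.ttlDays <= 0) {
    errors.push("TTL must be positive");
  }
  return errors;
}

// Hypothetical hooks standing in for the real service calls.
interface SubmitDeps {
  signAndStore(draft: DecisionDraft): string; // returns the new decision id
  reevaluate(): void;
  snapshot(trigger: "DECISION"): void;
  toastWithUndo(decisionId: string): void;
}

// Runs the four on-submit steps in order; returns validation errors, if any.
function submitDecision(draft: DecisionDraft, deps: SubmitDeps): string[] {
  const errors = validateDraft(draft);
  if (errors.length > 0) return errors;
  const decisionId = deps.signAndStore(draft);
  deps.reevaluate();
  deps.snapshot("DECISION");
  deps.toastWithUndo(decisionId);
  return [];
}
```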

6.2 Undo

Undo is implemented as "revoke decision" (a signed revoke record or revocation fields on the Decision). The original Decision is never deleted.
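A minimal sketch of the revocation-fields variant follows. The record shape and function names are assumptions; the point is that revoking returns an updated record and leaves the original in the audit trail.

```typescript
// Hypothetical stored decision with optional revocation fields.
interface DecisionRecord {
  id: string;
  kind: string;
  signedBy: string;
  revokedAt?: string; // ISO timestamp, set once on revoke
  revokedBy?: string;
}

// Returns a revoked copy; the input record is not mutated or removed.
// Revoking an already revoked decision is a no-op.
function revokeDecision(record: DecisionRecord, by: string, at: string): DecisionRecord {
  if (record.revokedAt !== undefined) return record;
  return { ...record, revokedAt: at, revokedBy: by };
}

function isInEffect(record: DecisionRecord): boolean {
  return record.revokedAt === undefined;
}
```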

7. Smart-Diff UX

7.1 Timeline

Chronological snapshots:

when (timestamp)
trigger (feed/vex/sbom/policy/runtime/decision/rescan)
summary (short)

7.2 Diff panel

Two-column diff:

Inputs changed (with proof links): VEX assertion changed, policy version changed, runtime trace arrived, etc.
Outputs changed: lane, verdict, score, gates

7.3 Meaningful change definition

The UI only highlights "meaningful" changes:

verdict change
lane change
score crosses a policy threshold
reachability state changes
effective VEX status changes

Other changes remain in an expandable "details" section.
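The highlight rule can be expressed as a pure comparison between two snapshots. In this sketch the snapshot fields, the returned labels, and the threshold semantics (a score at or above a threshold counts as over it) are assumptions made for illustration.

```typescript
// Hypothetical view of the snapshot fields relevant to highlighting.
interface SnapshotState {
  verdict: string;
  lane: string;
  score: number;
  reachability: string;
  effectiveVex: string;
}

// True when the score moves from one side of any policy threshold to the other.
function crossesThreshold(before: number, after: number, thresholds: number[]): boolean {
  return thresholds.some((t) => (before >= t) !== (after >= t));
}

// Returns the list of meaningful changes; an empty list means "details only".
function meaningfulChanges(
  prev: SnapshotState,
  next: SnapshotState,
  thresholds: number[],
): string[] {
  const changes: string[] = [];
  if (prev.verdict !== next.verdict) changes.push("verdict");
  if (prev.lane !== next.lane) changes.push("lane");
  if (crossesThreshold(prev.score, next.score, thresholds)) changes.push("score-threshold");
  if (prev.reachability !== next.reachability) changes.push("reachability");
  if (prev.effectiveVex !== next.effectiveVex) changes.push("effective-vex");
  return changes;
}
```

A score that moves from 6.5 to 6.9 under a threshold of 7 yields no highlight, while a move from 6.5 to 7.2 does.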

8. Performance & UI Engineering Requirements

Findings table uses virtual scroll and server-side pagination.
Case view loads in 2 steps:
1. Header narrative (small payload)
2. Evidence list + snapshots (lazy)
Evidence previews are lazy-loaded and cancellable.
Use ETag/If-None-Match for case and evidence list endpoints.
UI must remain usable under high latency (air-gapped / offline kits):
- show cached last-known verdict with clear "stale" marker
- allow exporting bundles from cached artifacts when permissible
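The ETag/If-None-Match requirement amounts to a conditional GET: send the cached ETag, and reuse the cached body when the server answers 304 Not Modified. The sketch below keeps the transport out of scope and shows only the header and cache logic; the type names, the in-memory cache shape, and the example URL in the usage note are hypothetical.

```typescript
interface CacheEntry<T> {
  etag: string;
  body: T;
}

interface HttpResult<T> {
  status: number;
  etag?: string; // value of the ETag response header, if any
  body?: T;
}

type EtagCache<T> = { [url: string]: CacheEntry<T> | undefined };

// Headers to attach to the next GET for this URL.
function conditionalHeaders<T>(cache: EtagCache<T>, url: string): { [name: string]: string } {
  const headers: { [name: string]: string } = {};
  const hit = cache[url];
  if (hit) headers["If-None-Match"] = hit.etag;
  return headers;
}

// Resolves the response to a body, updating the cache on 200 and
// falling back to the cached body on 304.
function resolveBody<T>(cache: EtagCache<T>, url: string, res: HttpResult<T>): T {
  if (res.status === 304) {
    const hit = cache[url];
    if (!hit) throw new Error("304 received without a cached entry");
    return hit.body;
  }
  if (res.status === 200 && res.body !== undefined) {
    if (res.etag) cache[url] = { etag: res.etag, body: res.body };
    return res.body;
  }
  throw new Error("unexpected status " + res.status);
}
```

A caller would pass `conditionalHeaders(cache, "/cases/42")` to its HTTP client and feed the response into `resolveBody`. The same cached entry could also back the stale "last-known verdict" display when the request fails.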

9. Accessibility & Operator Usability

Keyboard navigation: table rows, chips, evidence list
High contrast mode supported
All status is conveyed by text + shape (not color only)
Copy-to-clipboard for hashes, purls, CVE IDs

10. Telemetry (Must instrument)

TTFS: notification click → verdict banner rendered
Time-to-proof: click chip → proof preview shown
Mute reversal rate (share of auto-muted findings that later become actionable)
Bundle export success/latency

11. Responsibilities by Service

scanner.webservice:
- produces reachability results, risk results, snapshots
- stores/serves case narrative header, evidence indexes, Smart-Diff
concelier:
- aggregates vuln feeds and preserves per-source provenance ("preserve prune source")
excititor:
- merges VEX and preserves original assertion sources ("preserve prune source")
notify.webservice:
- emits first_signal / risk_changed / gate_blocked
scheduler.webservice:
- re-evaluates existing images on feed/policy updates, triggers snapshots

Document Version: 1.0
Target Platform: .NET 10, PostgreSQL >= 16, Angular v17
