
Dashboard Redesign Proposal — Stella Ops Mission Board

Date: 2026-03-16
Author: First-time user audit + product analysis
Status: Proposal


What Stella Ops Actually Is (The Mental Model)

Stella Ops is a release control plane that answers three questions for every deployment:

  1. Is it safe? — Vulnerabilities, SBOM health, reachability evidence, VEX dispositions
  2. Is it approved? — Policy gates, human approvals, compliance evidence
  3. Is it working? — Deployment health, service connectivity, feed freshness

The dashboard must reflect these three pillars. Currently it's a single vertical scroll of hardcoded fake data that mixes all three with no hierarchy.


Current Dashboard Problems

P1: Everything is hardcoded

  • summary signal: { activePromotions: 3, blockedPromotions: 1, ... } — fake
  • resolveStatusSeed(): generates fake metrics by environment type (dev=healthy, staging=degraded, prod+us-east=blocked) — deterministic lies
  • reachabilityStats: { bCoverage: 72, iCoverage: 88, rCoverage: 61 } — fake
  • nightlyOpsSignals: 4 hardcoded items — fake
  • Alerts section: 3 hardcoded HTML <li> items — fake
  • Activity section: 3 hardcoded HTML cards — fake
  • Zero API calls to real backends

P2: Layout has no hierarchy

Current vertical order: Summary strip → Environment grid → Risk table → 3-card row (SBOM + Reachability + Ops Signals) → Alerts → Activity → Domain nav.

This is 7 sections stacked vertically. A user scrolls through 3+ screens of content with no visual priority. The most critical information (am I blocked?) competes with the least actionable (domain navigation links).

P3: No data source distinction

The dashboard shows "5 critical findings" and "blocked", but the security posture page shows "0 findings". The user can't tell what's real. There's no "last updated" timestamp, no data source indicator, and no "demo data" badge.

P4: Duplicate information

  • Environment grid AND risk table show the same environments with the same metrics
  • SBOM card recalculates stats from the same environment data shown in the grid
  • Reachability percentages are shown per-environment (B/I/R column) AND as aggregate (Reachability card)

Proposed Layout: 3-Column Mission Board

┌─────────────────────────────────────────────────────────────────────┐
│ Dashboard — Mission Board for [Demo Production]          [Refresh] │
│ Last updated: 15 Mar 2026, 23:15 UTC                              │
├───────────────────────┬─────────────────────────────────────────────┤
│                       │                                             │
│  SECURITY POSTURE     │  ENVIRONMENTS & ACTIONS                     │
│  (1/3 width)          │  (2/3 width)                                │
│                       │                                             │
│  ┌─────────────────┐  │  ┌─────────────────────────────────────────┐│
│  │ VULNERABILITY    │  │  │ PROMOTION PIPELINE                     ││
│  │ SUMMARY          │  │  │                                         ││
│  │                  │  │  │ [env cards in promotion order:          ││
│  │ Critical:  12    │  │  │  Dev → Stage → Prod per region]        ││
│  │ High:      34    │  │  │                                         ││
│  │ Medium:    89    │  │  │  Blocked: prod-us-east (5 crit)        ││
│  │ Low:      156    │  │  │  Degraded: staging (stale SBOM)        ││
│  │                  │  │  │  Healthy: dev, prod-eu-west             ││
│  │ [severity donut  │  │  │                                         ││
│  │  or bar chart]   │  │  └─────────────────────────────────────────┘│
│  │                  │  │                                             │
│  │ Reachable: 9     │  │  ┌─────────────────────────────────────────┐│
│  │ Unreachable: 47  │  │  │ NEEDS YOUR ATTENTION                   ││
│  │ Unknown: 23      │  │  │                                         ││
│  │                  │  │  │ ⚠ 3 approvals blocked (evidence stale) ││
│  └─────────────────┘  │  │ ⚠ 2 waivers expiring in 24h            ││
│                       │  │ ⚠ 1 promotion blocked by policy gate    ││
│  ┌─────────────────┐  │  │ 🔴 Feed freshness degraded              ││
│  │ SBOM HEALTH      │  │  │                                         ││
│  │                  │  │  │ [each item links to the action page]    ││
│  │ Components: 247  │  │  └─────────────────────────────────────────┘│
│  │ Fresh: 231       │  │                                             │
│  │ Stale: 12        │  │  ┌─────────────────────────────────────────┐│
│  │ Missing: 4       │  │  │ RECENT ACTIVITY (real events)           ││
│  │                  │  │  │                                         ││
│  │ B/I/R Coverage   │  │  │ • admin sealed "API Gateway v2.1"      ││
│  │ B: 72%           │  │  │ • Policy gate blocked prod-us-east     ││
│  │ I: 88%           │  │  │ • NVD feed synced (142 new advisories) ││
│  │ R: 61%           │  │  │ • Doctor check: 1 fail, 9 warn         ││
│  └─────────────────┘  │  │                                         ││
│                       │  │ [live event stream, not static cards]    ││
│  ┌─────────────────┐  │  └─────────────────────────────────────────┘│
│  │ ADVISORY FEEDS   │  │                                             │
│  │                  │  │                                             │
│  │ Sources: 55/75   │  │                                             │
│  │ Healthy: 55      │  │                                             │
│  │ Failed: 18       │  │                                             │
│  │ Last sync: 2m    │  │                                             │
│  │                  │  │                                             │
│  │ [Configure]      │  │                                             │
│  └─────────────────┘  │                                             │
│                       │                                             │
├───────────────────────┴─────────────────────────────────────────────┤
│ PLATFORM HEALTH (full width footer bar)                             │
│                                                                     │
│ Services: 63/63 ✓  │ DB: healthy │ Events: CONNECTED │ Doctor: 7/1/1│
│ Feed: Live         │ Evidence: ON │ Offline: OK       │ DLQ: 3      │
└─────────────────────────────────────────────────────────────────────┘

Column 1 (1/3): Security Posture At-a-Glance

Purpose: Answer "Is my estate safe?" without leaving the dashboard.

Section 1A: Vulnerability Summary

Data source: GET /api/v1/findings/summary (findings ledger) or GET /api/v1/scanner/summary (scanner service)

What to show:

  • Severity breakdown: Critical / High / Medium / Low counts
  • Reachability breakdown: Reachable / Unreachable / Unknown counts
  • Donut chart or horizontal stacked bar colored by severity
  • Trend arrow (↑/↓/→) compared to 24h ago
  • Link: "Open Findings" → /security/triage
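A minimal TypeScript sketch of the client-side math for this card follows. The FindingsSummary shape is an assumption (the real Findings Ledger or scanner response may differ), and basing the trend arrow on critical + high counts is an illustrative choice, not a decided rule.

```typescript
// Hypothetical response shape for GET /api/v1/findings/summary.
// Field names are assumptions for illustration, not the Findings Ledger contract.
interface FindingsSummary {
  critical: number;
  high: number;
  medium: number;
  low: number;
  reachable: number;
  unreachable: number;
  unknown: number;
}

type Severity = "critical" | "high" | "medium" | "low";
type TrendArrow = "↑" | "↓" | "→";

// Compares the current critical + high count with the snapshot from 24h ago.
// ASSUMPTION: the trend is based on critical + high only.
function trendArrow(current: FindingsSummary, previous: FindingsSummary | null): TrendArrow {
  if (previous === null) {
    return "→"; // no baseline yet: show "flat" instead of inventing a trend
  }
  const now = current.critical + current.high;
  const before = previous.critical + previous.high;
  if (now > before) return "↑";
  if (now < before) return "↓";
  return "→";
}

// Percentages for a horizontal stacked severity bar. Each value is rounded
// independently, so the four may not sum to exactly 100. An empty estate
// yields all zeros rather than dividing by zero.
function severityShares(s: FindingsSummary): Record<Severity, number> {
  const total = s.critical + s.high + s.medium + s.low;
  const pct = (n: number): number => (total === 0 ? 0 : Math.round((n / total) * 100));
  return { critical: pct(s.critical), high: pct(s.high), medium: pct(s.medium), low: pct(s.low) };
}
```

Returning "→" when there is no 24h baseline keeps the card honest on a fresh install instead of implying movement.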

Why this matters:

  • This is the #1 thing a security auditor looks at
  • Currently the dashboard shows "5 critical" per environment, but the numbers are fabricated
  • Real data from the findings ledger makes this trustworthy

Section 1B: SBOM Health

Data source: GET /api/v1/scanner/sbom/summary or computed from environment scan state

What to show:

  • Total components tracked across all environments
  • Freshness breakdown: Fresh / Stale / Missing
  • B/I/R reachability coverage bars (the existing bar chart, but from real data)
  • Link: "View Supply Chain" → /security/supply-chain-data
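Where the SBOM summary is computed from environment scan state rather than a dedicated endpoint, the freshness buckets could be derived roughly as below. ComponentScanState and the 7-day STALE_AFTER_MS cutoff are hypothetical placeholders; a real staleness threshold would come from policy or configuration.

```typescript
// Hypothetical per-component scan record; field names are illustrative only.
interface ComponentScanState {
  componentId: string;
  lastSbomAt: Date | null; // null means no SBOM has been produced yet
}

type Freshness = "fresh" | "stale" | "missing";

// ASSUMPTION: 7 days is a placeholder threshold, not a Stella Ops setting.
const STALE_AFTER_MS = 7 * 24 * 60 * 60 * 1000;

function classify(state: ComponentScanState, now: Date): Freshness {
  if (state.lastSbomAt === null) return "missing";
  const ageMs = now.getTime() - state.lastSbomAt.getTime();
  return ageMs > STALE_AFTER_MS ? "stale" : "fresh";
}

// Counts components per freshness bucket, plus the total tracked.
function freshnessBreakdown(
  states: ComponentScanState[],
  now: Date,
): Record<Freshness, number> & { total: number } {
  const counts = { fresh: 0, stale: 0, missing: 0, total: states.length };
  for (const s of states) {
    counts[classify(s, now)] += 1;
  }
  return counts;
}
```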

Section 1C: Advisory Feed Status

Data source: GET /api/v1/advisory-sources/status (Concelier source management)

What to show:

  • Sources active: X of Y enabled
  • Healthy / Failed count
  • Last sync timestamp
  • Link: "Configure Sources" → /setup/integrations/advisory-vex-sources
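One way to reduce a list of source statuses to this card's numbers is sketched below. The AdvisorySourceStatus fields are assumed, not taken from the Concelier API, and this sketch reads "X of Y" as enabled sources out of all known sources.

```typescript
// Hypothetical shape of one entry from GET /api/v1/advisory-sources/status.
interface AdvisorySourceStatus {
  id: string;
  enabled: boolean;
  health: "healthy" | "failed" | "unknown";
  lastSyncAt: string | null; // ISO-8601 timestamp, null if never synced
}

interface FeedCard {
  active: number;   // enabled sources
  total: number;    // all known sources
  healthy: number;  // enabled and healthy
  failed: number;   // enabled and failed
  lastSync: string; // compact age such as "2m", or "never"
}

// Formats an age as minutes, hours, or days (e.g. "2m", "5h", "3d").
function relativeAge(fromMs: number, nowMs: number): string {
  const minutes = Math.floor((nowMs - fromMs) / 60_000);
  if (minutes < 60) return `${minutes}m`;
  const hours = Math.floor(minutes / 60);
  if (hours < 48) return `${hours}h`;
  return `${Math.floor(hours / 24)}d`;
}

function feedCard(sources: AdvisorySourceStatus[], now: Date): FeedCard {
  const enabled = sources.filter((s) => s.enabled);
  // Most recent successful sync across enabled sources, ignoring unparsable values.
  const syncTimes = enabled
    .map((s) => (s.lastSyncAt === null ? NaN : Date.parse(s.lastSyncAt)))
    .filter((t) => !Number.isNaN(t));
  const newest = syncTimes.length === 0 ? null : Math.max(...syncTimes);
  return {
    active: enabled.length,
    total: sources.length,
    healthy: enabled.filter((s) => s.health === "healthy").length,
    failed: enabled.filter((s) => s.health === "failed").length,
    lastSync: newest === null ? "never" : relativeAge(newest, now.getTime()),
  };
}
```

Counting health only over enabled sources keeps disabled feeds from inflating the "Failed" number.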

Why this matters:

  • Advisory freshness directly affects vulnerability accuracy
  • If feeds are stale, findings are stale, decisions are wrong
  • The audit found only 55 of 75 sources healthy

Column 2 (2/3): Environments & Actions

Purpose: Answer "What needs my attention?" and "What's the deployment state?"

Section 2A: Promotion Pipeline

Data source: GET /api/v1/platform/context/environments (already loaded via PlatformContextStore) plus real scan/promotion state

What to show:

  • Environment cards in promotion order (Dev → Stage → Prod, grouped by region)
  • Each card: name, region, deploy status badge, SBOM freshness, critical findings count, pending approvals
  • Blocked environments highlighted at top with red border
  • Actions per card: Detail, Findings, Promote, Approve
  • Real metrics from scan/promotion APIs, NOT resolveStatusSeed()

Layout: Same card grid as current, but driven by real data.
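The ordering rule for the card grid can be expressed as a small pure function. EnvCard, Tier, and DeployStatus are hypothetical stand-ins for whatever PlatformContextStore and the scan/promotion APIs actually return.

```typescript
// Hypothetical card model; not the PlatformContextStore types.
type Tier = "dev" | "stage" | "prod";
type DeployStatus = "blocked" | "degraded" | "healthy";

interface EnvCard {
  name: string;
  region: string;
  tier: Tier;
  status: DeployStatus;
}

const TIER_ORDER: Record<Tier, number> = { dev: 0, stage: 1, prod: 2 };

// Groups cards by region, orders each group Dev -> Stage -> Prod,
// then moves regions that contain a blocked environment to the top.
function promotionRows(cards: EnvCard[]): EnvCard[][] {
  const byRegion = new Map<string, EnvCard[]>();
  for (const card of cards) {
    const group = byRegion.get(card.region) ?? [];
    group.push(card);
    byRegion.set(card.region, group);
  }
  const rows = Array.from(byRegion.values()).map((group) =>
    group.slice().sort((a, b) => TIER_ORDER[a.tier] - TIER_ORDER[b.tier]),
  );
  const hasBlocked = (row: EnvCard[]): boolean => row.some((c) => c.status === "blocked");
  // Array.prototype.sort is stable (ES2019+), so other regions keep their input order.
  return rows.sort((a, b) => Number(hasBlocked(b)) - Number(hasBlocked(a)));
}
```

Moving whole regions, rather than individual blocked cards, keeps each Dev → Stage → Prod row intact while still putting the blocked environment at the top.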

Section 2B: Needs Your Attention

Data source: Multiple — approvals API, waivers API, promotions API, notifications API

What to show:

  • Actionable items only — things the user MUST do:
    • Pending approvals (with count and reason)
    • Expiring waivers (with countdown)
    • Blocked promotions (with blocking reason: policy gate, evidence freshness, etc.)
    • Feed degradation alerts
  • Each item is a clickable link to the action page
  • NOT static HTML — real data from APIs
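Because this section merges several APIs, a sketch using Promise.allSettled shows how one failing source can be reported without blanking the whole list. AttentionItem, Fetcher, and collectAttention are illustrative names; each fetcher would wrap one of the approvals, waivers, promotions, or notifications calls.

```typescript
// Hypothetical attention item; the real APIs' payloads would be mapped into this.
interface AttentionItem {
  kind: "approval" | "waiver" | "promotion" | "feed";
  label: string; // e.g. "2 waivers expiring in 24h"
  href: string;  // action page the item links to
}

type Fetcher = () => Promise<AttentionItem[]>;

interface AttentionResult {
  items: AttentionItem[];
  failedSources: string[]; // names of sources that errored, shown as a warning
}

// Queries every source in parallel; a rejected fetch is recorded by name
// instead of failing the whole section.
async function collectAttention(sources: Record<string, Fetcher>): Promise<AttentionResult> {
  const names = Object.keys(sources);
  const settled = await Promise.allSettled(names.map((name) => sources[name]()));
  const result: AttentionResult = { items: [], failedSources: [] };
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      result.items.push(...outcome.value);
    } else {
      result.failedSources.push(names[i]);
    }
  });
  return result;
}
```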

Why this matters:

  • The current "Alerts" section is 3 hardcoded <li> elements
  • A real operator needs to see: "You have 3 things to do today"

Section 2C: Recent Activity (Live Stream)

Data source: GET /api/v1/timeline/events or the WebSocket event stream (the topbar already shows "Events: CONNECTED")

What to show:

  • Real events, most recent first:
    • Release sealed/promoted/rolled back
    • Policy gate pass/block
    • Advisory feed sync (with advisory count)
    • Doctor check results
    • User actions (approval, waiver, exception)
  • Live updates via the event stream (already connected)
  • Maximum 10 items, with "View full activity" link → /evidence/audit-log
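A possible merge step for the activity list is shown below, assuming a hypothetical TimelineEvent with a stable id. The same function could take both the initial GET /api/v1/timeline/events?limit=10 page and individual events pushed over the stream; deduplicating by id guards against an event arriving through both paths.

```typescript
// Hypothetical event shape shared by the REST page and the live stream.
interface TimelineEvent {
  id: string;
  occurredAt: string; // ISO-8601 timestamp
  summary: string;    // e.g. 'admin sealed "API Gateway v2.1"'
}

const MAX_ITEMS = 10;

// Merges incoming events into the visible list: newest first,
// no duplicate ids, capped at MAX_ITEMS.
function mergeEvents(current: TimelineEvent[], incoming: TimelineEvent[]): TimelineEvent[] {
  const byId = new Map<string, TimelineEvent>();
  for (const e of current.concat(incoming)) {
    byId.set(e.id, e); // a later copy of the same id replaces the earlier one
  }
  return Array.from(byId.values())
    .sort((a, b) => Date.parse(b.occurredAt) - Date.parse(a.occurredAt)) // newest first
    .slice(0, MAX_ITEMS);
}
```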

Why this matters:

  • Current "Recent Activity" is 3 static cards with text descriptions — not actual activity
  • The topbar already shows "Events: CONNECTED" — the live stream is available

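Footer (full width): Platform Health

Purpose: Answer "Is the platform itself healthy?" at a glance.

Data source: GET /api/v1/platform/health or GET /api/v1/doctor/last-run/summary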

What to show (single horizontal bar, always visible):

  • Services: 63/63 healthy (or X unhealthy)
  • DB: healthy/degraded
  • Events: Connected/Degraded
  • Doctor: 7 pass / 1 warn / 1 fail (link to /ops/operations/doctor)
  • Feed: Live/Stale
  • Evidence: ON/OFF
  • Offline: OK/Sealed
  • DLQ: N items (link to /ops/operations/dead-letter)
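The footer can be driven by a mapping from the health responses to display segments, sketched here under assumed field names (PlatformHealth, DoctorSummary). Only four of the eight segments are shown; Events, Feed, Evidence, and Offline would follow the same pattern.

```typescript
// Hypothetical inputs from GET /api/v1/platform/health and
// GET /api/v1/doctor/last-run/summary; field names are assumptions.
interface PlatformHealth {
  servicesHealthy: number;
  servicesTotal: number;
  database: "healthy" | "degraded";
}

interface DoctorSummary {
  pass: number;
  warn: number;
  fail: number;
}

interface FooterSegment {
  label: string;
  value: string;
  severity: "ok" | "warn" | "error";
  href?: string; // optional drill-down route
}

// No Doctor run yet is a warning; any failure outranks warnings.
function doctorSeverity(d: DoctorSummary | null): FooterSegment["severity"] {
  if (d === null) return "warn";
  if (d.fail > 0) return "error";
  return d.warn > 0 ? "warn" : "ok";
}

function footerSegments(health: PlatformHealth, doctor: DoctorSummary | null, dlqCount: number): FooterSegment[] {
  const unhealthy = health.servicesTotal - health.servicesHealthy;
  return [
    {
      label: "Services",
      value: unhealthy === 0 ? `${health.servicesTotal}/${health.servicesTotal} healthy` : `${unhealthy} unhealthy`,
      severity: unhealthy === 0 ? "ok" : "error",
    },
    { label: "DB", value: health.database, severity: health.database === "healthy" ? "ok" : "warn" },
    {
      label: "Doctor",
      value: doctor === null ? "No data yet" : `${doctor.pass} pass / ${doctor.warn} warn / ${doctor.fail} fail`,
      severity: doctorSeverity(doctor),
      href: "/ops/operations/doctor",
    },
    {
      label: "DLQ",
      value: `${dlqCount} items`,
      severity: dlqCount > 0 ? "warn" : "ok",
      href: "/ops/operations/dead-letter",
    },
  ];
}
```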

Why this matters:

  • The topbar already shows some of these (Events, Policy, Evidence, Feed, Offline)
  • But the dashboard should also show system health — an operator wants to know "is the platform itself healthy?" at a glance
  • Doctor results from the last run are critical operational context

What to Remove

  1. Risk table — duplicate of the environment grid. Merge into a single view.
  2. Domain navigation links (Release Runs, Security & Risk, Platform, Evidence, Platform Setup) — the sidebar already provides this. Dashboard space is premium.
  3. Activity section (3 static cards) — replace with real live activity stream
  4. All hardcoded data — every number must come from an API or show "No data yet"

Empty State (Fresh Install)

When the dashboard has no real data (no environments, no scans, no releases):

┌─────────────────────────────────────────────────────────┐
│  Welcome to Stella Ops                                   │
│                                                          │
│  Let's set up your release control plane.                │
│                                                          │
│  ① Connect a registry    [Setup Integrations →]          │
│  ② Define environments   [Topology Wizard →]             │
│  ③ Scan your first image [Start Scan →]                  │
│  ④ Create a release      [Create Release →]              │
│                                                          │
│  ─────────────────────────────────────────               │
│  Platform Health: 63/63 services ✓                       │
│  Advisory Sources: 55/75 healthy                          │
│  Doctor: Run diagnostics → [Quick Check]                 │
└─────────────────────────────────────────────────────────┘

The empty state should guide the user through setup, not show fake crisis data.
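The switch between the welcome guide and the populated dashboard reduces to a check on three counts. EstateCounts, isFreshInstall, and setupSteps are hypothetical names, and whether a registry is connected is assumed to come from a separate integrations signal.

```typescript
// Hypothetical counts the dashboard already needs for its populated view.
interface EstateCounts {
  environments: number;
  scans: number;
  releases: number;
}

interface SetupStep {
  label: string;
  done: boolean;
}

// Fresh install: no environments, no scans, and no releases.
function isFreshInstall(c: EstateCounts): boolean {
  return c.environments === 0 && c.scans === 0 && c.releases === 0;
}

// Maps estate counts to the welcome checklist. "Connect a registry" has no
// count of its own here, so it takes a separate boolean.
function setupSteps(c: EstateCounts, registryConnected: boolean): SetupStep[] {
  return [
    { label: "Connect a registry", done: registryConnected },
    { label: "Define environments", done: c.environments > 0 },
    { label: "Scan your first image", done: c.scans > 0 },
    { label: "Create a release", done: c.releases > 0 },
  ];
}
```

With this rule the populated layout takes over as soon as any environment, scan, or release exists; sections that still have nothing to show fall back to "No data yet".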


Data Sources Required

| Dashboard Section | API Endpoint | Service |
|---|---|---|
| Vulnerability Summary | GET /api/v1/findings/summary | Findings Ledger |
| SBOM Health | GET /api/v1/scanner/sbom/summary | Scanner |
| B/I/R Reachability | GET /api/v1/reachgraph/coverage | ReachGraph |
| Advisory Feed Status | GET /api/v1/advisory-sources/status | Concelier |
| Environment Cards | GET /api/v1/platform/context | Platform (exists) |
| Environment Scan State | GET /api/v1/scanner/environments/summary | Scanner |
| Pending Approvals | GET /api/v1/approvals?status=pending | JobEngine |
| Expiring Waivers | GET /api/v1/exceptions?expiresWithin=24h | Policy |
| Blocked Promotions | GET /api/v1/promotions?status=blocked | JobEngine |
| Recent Activity | GET /api/v1/timeline/events?limit=10 | Timeline |
| Platform Health | GET /api/v1/doctor/last-run/summary | Platform/Doctor |
| Service Count | GET /api/v1/platform/health | Platform |
| DLQ Count | GET /api/v1/jobengine/dead-letter/summary | JobEngine |

Many of these endpoints already exist (Platform context, advisory sources status, doctor, dead-letter, jobengine). Some may need summary/aggregation endpoints.


Implementation Approach

Phase 1: Honest Empty State (S effort)

  • Replace hardcoded data with API calls that may legitimately return empty results
  • When empty: show the welcome/setup guide instead of fake data
  • When real data exists: show real data

Phase 2: 3-Column Layout (M effort)

  • Restructure the template from a vertical scroll to a 3-column grid (security posture spans one column; environments, actions, and activity span two)
  • Left column: security posture cards
  • Right column: environments + actions + activity
  • Footer: platform health bar

Phase 3: Real API Wiring (L effort)

  • Wire each section to its real API endpoint
  • Add loading skeletons per section
  • Add "last updated" timestamps
  • Handle API errors gracefully (show error state per section, not whole page)
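A per-section state type is one way to get independent loading skeletons, "last updated" stamps, and error states. SectionState, loadSection, and formatLastUpdated are illustrative names, not existing Stella Ops code; the timestamp format mirrors the header mock-up.

```typescript
// Each dashboard section carries its own state, so one failing API
// degrades only its own card.
type SectionState<T> =
  | { kind: "loading" }
  | { kind: "empty"; lastUpdated: Date }
  | { kind: "ready"; data: T; lastUpdated: Date }
  | { kind: "error"; message: string; lastUpdated: Date | null };

// Hypothetical helper: runs one section's fetch and classifies the result.
// On error, lastUpdated is null because this helper does not track earlier loads.
async function loadSection<T>(
  fetchData: () => Promise<T>,
  isEmpty: (data: T) => boolean,
  now: () => Date = () => new Date(),
): Promise<SectionState<T>> {
  try {
    const data = await fetchData();
    return isEmpty(data) ? { kind: "empty", lastUpdated: now() } : { kind: "ready", data, lastUpdated: now() };
  } catch (err) {
    return { kind: "error", message: err instanceof Error ? err.message : String(err), lastUpdated: null };
  }
}

const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Renders timestamps like "15 Mar 2026, 23:15 UTC".
function formatLastUpdated(d: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  return `${d.getUTCDate()} ${MONTHS[d.getUTCMonth()]} ${d.getUTCFullYear()}, ${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())} UTC`;
}
```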

Phase 4: Live Activity Stream (M effort)

  • Replace static activity cards with real event stream
  • Use the existing WebSocket connection (Events: CONNECTED)
  • Show 10 most recent events with live updates

Files to Modify

| File | Change |
|---|---|
| src/Web/StellaOps.Web/src/app/features/dashboard-v3/dashboard-v3.component.ts | Complete rewrite of template + data sources |
| src/Web/StellaOps.Web/src/app/core/api/ | Add dashboard summary API clients |
| src/Platform/StellaOps.Platform.WebService/Endpoints/ | Add dashboard aggregation endpoint |
| src/Web/StellaOps.Web/src/app/features/dashboard-v3/dashboard-v3.component.spec.ts | Update tests |