Polish UI across all route groups + redesign welcome page

- Welcome: split-panel layout with Sign In always above fold, feature cards, trust badges
- Release Control: dashboard, releases, promotions, approvals — design token alignment
- Security: posture, findings, scan submit, unknowns, reports — compact tables, severity badges
- Operations: ops hub, jobengine, scheduler, doctor, notifications, feeds — consistent styling
- Audit & Evidence: evidence overview, audit log, export center, replay — shimmer loading
- Setup & Admin: topology, integrations, identity, trust, system — hover lift, focus rings
- Shared: buttons, tabs, forms, colors — unified design tokens (btn-primary, tab-active, focus-ring)
- Archive 3 completed sprints (SPRINT_20260317_001/002/003)
- Add QA journey reports and route map

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
master
2026-03-18 00:02:25 +02:00
parent b851aa8300
commit 8e6cbeab97
208 changed files with 9012 additions and 2207 deletions


@@ -1,230 +0,0 @@
# Sprint 20260317-003 — Journey Problem Cluster Fixes
## Topic & Scope
- Implement all P0, P1, and P2 fixes identified in the Journey Problem Clusters Action Report (`docs/qa/JOURNEY_PROBLEM_CLUSTERS_ACTION_REPORT_20260317.md`).
- Covers VexHub migration repair, gateway route fixes, scope alignment, audit normalization, stage persistence, posture error tracking, navigation vocabulary, command palette, scan UX, welcome page, and release flow clarity.
- Working directories: `src/VexHub/`, `src/Web/`, `src/Platform/`, `src/Timeline/`, `devops/compose/`.
- Expected evidence: all three C# services build clean (0 warnings), TypeScript compiles clean (no new errors), all journey cluster items addressed.
## Dependencies & Concurrency
- Depends on `docs/implplan/SPRINT_20260317_002_DOCS_journey_problem_clusters_action_report.md` (analysis).
- No upstream sprint blockers — all changes are self-contained.
## Documentation Prerequisites
- `docs/qa/JOURNEY_PROBLEM_CLUSTERS_ACTION_REPORT_20260317.md`
- `AGENTS.md`
## Delivery Tracker
### P0-1 - VexHub migration mismatch repair
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Migration 002 references `vexhub.vex_sources` but 001 creates `vexhub.sources`.
- Added `003_fix_source_backoff_columns.sql` with `IF NOT EXISTS` for idempotency.
- Added `ConsecutiveFailures` and `NextEligiblePollAt` properties to `VexSource.cs`.
- Added EF column mappings in `VexHubDbContext.cs`.
Completion criteria:
- [x] Migration 003 exists and uses correct table name
- [x] EF model has backoff column mappings
- [x] VexHub service builds clean (0 warnings, 0 errors)
### P0-2 - Console-admin gateway route
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Frontend calls `/console/admin/*` but gateway had no explicit route, causing requests to fall through to Platform (404).
- Added `/console/admin` → `authority.stella-ops.local/console/admin` route before the generic `/console` route.
Completion criteria:
- [x] Gateway config has `/console/admin` route with correct specificity ordering
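Why the ordering matters can be shown with a first-match router. This is a minimal sketch, assuming the gateway evaluates routes in declaration order with prefix matching; the real gateway's matching rules are not described here, and `GatewayRoute`, `resolveTarget`, and the target labels are hypothetical.

```typescript
// Hypothetical first-match prefix router; the real gateway's matching rules may differ.
interface GatewayRoute {
  prefix: string;
  target: string; // illustrative service label, not a real config value
}

function resolveTarget(routes: GatewayRoute[], path: string): string | undefined {
  // The first declared route whose prefix matches wins.
  const hit = routes.find(
    (r) => path === r.prefix || path.startsWith(r.prefix + "/"),
  );
  return hit ? hit.target : undefined;
}

const specificFirst: GatewayRoute[] = [
  { prefix: "/console/admin", target: "authority" },
  { prefix: "/console", target: "platform" },
];
const genericFirst: GatewayRoute[] = [...specificFirst].reverse();

// Specific route declared first: admin calls reach Authority.
const okTarget = resolveTarget(specificFirst, "/console/admin/tenants");
// Generic route declared first: it shadows the admin route.
const shadowedTarget = resolveTarget(genericFirst, "/console/admin/tenants");
```

Under this assumption, declaring the generic `/console` prefix first would send admin calls to the wrong service again, which is why the criterion names specificity ordering.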
### P0-3 - Unknowns path fix (client + gateway)
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Web client called `/api/v1/scanner/unknowns` but scanner exposes `/api/v1/unknowns`.
- Changed client base URL to `/api/v1/unknowns`.
- Added gateway route `^/api/v1/unknowns(.*)` → scanner service.
- Updated test script references.
Completion criteria:
- [x] Client uses `/api/v1/unknowns`
- [x] Gateway has explicit unknowns route
- [x] No stale `scanner/unknowns` references in `src/Web/`
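The route pattern can be checked against the old and new client paths. This sketch only exercises the regular expression quoted above; the `/summary` suffix is a hypothetical example path, and how the gateway rewrites the captured suffix is not covered.

```typescript
// Pattern copied from the task description; the capture group keeps any path suffix.
const unknownsRoute = /^\/api\/v1\/unknowns(.*)/;

function matchesUnknownsRoute(path: string): boolean {
  return unknownsRoute.test(path);
}

const newPathMatches = matchesUnknownsRoute("/api/v1/unknowns/summary");
const stalePathMatches = matchesUnknownsRoute("/api/v1/scanner/unknowns");
```

The pattern has no separator after `unknowns`, so it would also match a path such as `/api/v1/unknownsfoo`. Tightening it to `(/.*)?$` is one option if that matters.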
### P0-4 - Identity Providers scope fix
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Backend requires `platform.idp.admin` scope but `stella-ops-ui` client didn't include it.
- Added `platform.idp.read` and `platform.idp.admin` to `allowed_scopes` in `04-authority-schema.sql`.
- Added both scopes to the OIDC `scope` string in `config.json`.
Completion criteria:
- [x] SQL seed includes IDP scopes
- [x] Web config requests IDP scopes during login
### P0-5 - Risk dashboard URL construction
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Client built risk URLs from `authorityBase + '/risk'` → double-pathed `/authority/risk/risk/status`.
- Changed `app.config.ts` to use gateway base and `/api/risk`.
- Removed duplicate `/risk` prefix from all `risk-http.client.ts` endpoint paths.
Completion criteria:
- [x] `RISK_API_BASE_URL` resolves to `/api/risk` via gateway
- [x] No duplicate `/risk/risk` paths in client
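The double-pathing is easiest to see by joining the segments. This is a minimal sketch: `joinUrl` is a hypothetical helper, not a function from `risk-http.client.ts`, and the literal base values mirror the paths quoted above.

```typescript
// Joins two URL segments with exactly one slash between them.
function joinUrl(base: string, path: string): string {
  return base.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

// Before: the base already ends in /risk and the endpoint path repeats it.
const oldBase = joinUrl("/authority", "risk");
const oldStatusUrl = joinUrl(oldBase, "risk/status");

// After: gateway base plus endpoint paths without the duplicate prefix.
const riskApiBase = "/api/risk";
const newStatusUrl = joinUrl(riskApiBase, "status");
```

Here `oldStatusUrl` reproduces the `/authority/risk/risk/status` path from the journey report, and `newStatusUrl` is the gateway-facing `/api/risk/status`.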
### P1-1 - Audit module normalization + Authority source
Status: DONE
Dependency: none
Owners: Developer
Task description:
- `NormalizeModule` mapped "evidencelocker"→"sbom" and "notify"→"integrations" (wrong).
- Fixed to preserve original module names.
- Added `evidencelocker` and `notify` to the known modules catalog.
- Fixed hardcoded module labels in `HttpUnifiedAuditEventProvider`.
- Added Authority audit fetcher (`/console/admin/audit`) as a new source.
- Wired `AuthorityBaseUrl` config in `Program.cs`.
Completion criteria:
- [x] Module names are 1:1 with actual modules
- [x] Authority audit events are fetched
- [x] Timeline service builds clean
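The normalization change can be sketched as follows. The real `NormalizeModule` lives in the Timeline service (C#); this TypeScript version is illustrative only. The catalog contents, the lowercasing step, and the `undefined` result for unknown names are all assumptions.

```typescript
// Illustrative catalog; the real list lives in the Timeline service.
const knownModules = new Set<string>([
  "jobengine",
  "policy",
  "evidencelocker",
  "notify",
  "authority",
]);

// Preserve the module name instead of remapping it (e.g. no "notify" -> "integrations").
// Trimming, lowercasing, and returning undefined for unknown names are assumptions.
function normalizeModule(raw: string): string | undefined {
  const key = raw.trim().toLowerCase();
  return knownModules.has(key) ? key : undefined;
}

const normalizedEvidence = normalizeModule("EvidenceLocker");
const normalizedNotify = normalizeModule("notify");
```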
### P1-2 - Stage persistence full chain
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Stage was tracked in web store but never sent to backend or persisted in DB.
- Added `Stage` to `PlatformContextPreferencesRequest` and `PlatformContextPreferences`.
- Added stage to SQL upsert in `PlatformContextService.cs`.
- Added EF model property and column mapping.
- Added `stage` to `buildPreferencesPayload()` in TypeScript store.
- Created migration `059_UiContextPreferencesStage.sql`.
Completion criteria:
- [x] Stage round-trips: web store → API → DB → API → web store
- [x] Platform service builds clean
- [x] Migration file exists and is embedded
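On the web side, the change to `buildPreferencesPayload()` might look like this. Only `stage` and the function name come from the task description; the interfaces, the `regions` field, and the JSON round-trip standing in for the API and DB hops are assumptions.

```typescript
// Hypothetical shapes; every field other than `stage` is an assumption.
interface ContextState {
  regions: string[];
  stage: string | null;
}

interface PreferencesPayload {
  regions: string[];
  stage: string | null;
}

function buildPreferencesPayload(state: ContextState): PreferencesPayload {
  return {
    regions: [...state.regions],
    stage: state.stage, // previously omitted, so the backend could not persist it
  };
}

// JSON round-trip as a stand-in for web store -> API -> DB -> API -> web store.
const sentPayload = buildPreferencesPayload({ regions: ["eu-west"], stage: "prod" });
const restoredPayload = JSON.parse(JSON.stringify(sentPayload)) as PreferencesPayload;
```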
### P1-3 - Security posture degraded-data tracking
Status: DONE
Dependency: none
Owners: Developer
Task description:
- `SecurityRiskOverviewComponent` used `catchError(() => of([]))`, silently converting API failures to zeros.
- Added 5 per-source error signals and a `hasDegradedData` computed signal.
- Each `catchError` now sets its error signal before returning the fallback.
- Error signals are cleared on each load cycle.
- Added degradation banner in template.
Completion criteria:
- [x] Per-source error tracking in place
- [x] Degradation banner shows when any source fails
- [x] TypeScript compiles clean
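The tracking pattern can be sketched without Angular or RxJS. This is a synchronous stand-in for the `catchError` pipeline, not the component's code: `loadPosture`, `LoadResult`, and the source names are hypothetical, and only three of the five sources are shown.

```typescript
type SourceName = "findings" | "disposition" | "sbom";

interface LoadResult {
  data: Record<SourceName, unknown[]>;
  errors: Partial<Record<SourceName, string>>;
  hasDegradedData: boolean;
}

function loadPosture(fetchers: Record<SourceName, () => unknown[]>): LoadResult {
  // A fresh error map per call mirrors clearing the error signals on each load cycle.
  const errors: Partial<Record<SourceName, string>> = {};
  const guarded = (src: SourceName): unknown[] => {
    try {
      return fetchers[src]();
    } catch (err) {
      // Record the failure before returning the empty fallback.
      errors[src] = err instanceof Error ? err.message : String(err);
      return [];
    }
  };
  const data = {
    findings: guarded("findings"),
    disposition: guarded("disposition"),
    sbom: guarded("sbom"),
  };
  return { data, errors, hasDegradedData: Object.keys(errors).length > 0 };
}

// One failing source: the fallback is still an empty array, but the failure is visible.
const degradedResult = loadPosture({
  findings: () => [1, 2],
  disposition: () => {
    throw new Error("503");
  },
  sbom: () => [],
});
```

The template can then show the banner when `hasDegradedData` is true, so an empty `sbom` list that loaded successfully remains distinguishable from a `disposition` list that is empty because the request failed.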
### P2-1 - Rename Triage to Findings in navigation
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Changed top-level nav group label from "Triage" to "Findings".
- Updated breadcrumb display text for `/triage/` segments.
- Left route paths and internal IDs unchanged.
Completion criteria:
- [x] Navigation shows "Findings" instead of "Triage"
- [x] Breadcrumbs show "Findings"
- [x] No route path changes
### P2-2 - Command palette plain scan search
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Plain text "scan" returned no quick actions (only `>` prefix did).
- Added `inlineMatchedActions` signal for mixed-mode results.
- Plain text queries now show matching quick actions above search results.
- Fixed scan quick action routes: `scan` and `scan-image` now route to `/security/scan` instead of triage pages.
Completion criteria:
- [x] Typing "scan" shows quick actions + search results
- [x] Scan actions route to `/security/scan`
- [x] Keyboard navigation works across both sections
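The mixed-mode matching can be sketched as a pure function. The action ids `scan` and `scan-image` and the `/security/scan` route come from the task description; the labels, the `docs` action, and the substring-matching rule are assumptions, and the real `inlineMatchedActions` is a signal rather than a plain function.

```typescript
interface QuickAction {
  id: string;
  label: string;
  route: string;
}

// Illustrative action list; labels and the "docs" entry are assumed.
const quickActions: QuickAction[] = [
  { id: "scan", label: "Scan", route: "/security/scan" },
  { id: "scan-image", label: "Scan Image", route: "/security/scan" },
  { id: "docs", label: "Open Docs", route: "/docs" },
];

interface PaletteQuery {
  actionMode: boolean; // true when the query starts with ">"
  term: string;
}

function parseQuery(input: string): PaletteQuery {
  const trimmed = input.trim();
  return trimmed.startsWith(">")
    ? { actionMode: true, term: trimmed.slice(1).trim().toLowerCase() }
    : { actionMode: false, term: trimmed.toLowerCase() };
}

// Matching actions are returned in both modes, so plain text also surfaces them.
function inlineMatchedActions(input: string): QuickAction[] {
  const { term } = parseQuery(input);
  if (term === "") return [];
  return quickActions.filter(
    (a) => a.id.includes(term) || a.label.toLowerCase().includes(term),
  );
}
```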
### P2-3 - Scan local-mode limitation messaging
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Scan UI waited 60 polls (~3 minutes) before showing any explanation.
- Added `pollCount` signal, `scanInProgress` and `showQueueHint` computed signals.
- Immediate info banner on scan start explains local-mode queue behavior.
- After 10 polls (~30s), a queue hint banner appears with link to Jobs Engine.
Completion criteria:
- [x] Info banner visible immediately after scan submission
- [x] Queue hint appears after ~30 seconds
- [x] Both banners disappear on scan completion
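The banner timing reduces to two derived flags. In this sketch the 3-second poll interval is inferred from "60 polls (~3 minutes)", the `>=` threshold is one reading of "after 10 polls", and `ScanState` and `bannersFor` are hypothetical names.

```typescript
const POLL_INTERVAL_MS = 3000; // inferred from "60 polls (~3 minutes)"
const QUEUE_HINT_AFTER_POLLS = 10;

type ScanState = "idle" | "running" | "completed";

interface BannerState {
  showInfoBanner: boolean;
  showQueueHint: boolean;
}

function bannersFor(state: ScanState, pollCount: number): BannerState {
  const scanInProgress = state === "running";
  return {
    // Shown immediately on submission, before the first poll returns.
    showInfoBanner: scanInProgress,
    // Shown once enough polls have elapsed (about 30 s at the assumed interval).
    showQueueHint: scanInProgress && pollCount >= QUEUE_HINT_AFTER_POLLS,
  };
}
```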
### P2-4 - Post-seal promotion CTA
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Sealing a release didn't explain that promotion is the next step.
- Added explanation text distinguishing sealing from deployment.
- Added primary "Request Promotion" button linking to `/releases/promotions/create` with `releaseId` pre-filled.
- Demoted secondary links (view promotions, back to versions) to outline style.
Completion criteria:
- [x] Post-seal section explains sealing vs. promotion
- [x] "Request Promotion" CTA with pre-filled release ID
- [x] Visual hierarchy: primary CTA > secondary links
### P2-5 - Welcome page operator adoption rewrite
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Welcome page was brand-heavy with generic chips and did not explain what Stella does for operators.
- Added "Get Started" journey: Connect Registry → Scan Artifact → Governed Release → Promote with Evidence.
- Added "What Stella Replaces" section: manual scripts → policy-gated promotions, scattered scans → unified posture, trust-me deploys → verifiable evidence.
- Kept sign-in button, docs link, auth notice, and existing layout structure.
Completion criteria:
- [x] Welcome page answers "what do I stop scripting?" within 20 seconds
- [x] Four concrete first steps visible
- [x] Before/after value props visible
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-03-17 | Sprint created from Journey Problem Clusters Action Report. | Developer |
| 2026-03-17 | P0 items implemented in parallel (5 agents): VexHub migration, gateway routes, IDP scope, unknowns path, risk URL. All verified — 3 C# services build clean, TS compiles clean. | Developer |
| 2026-03-17 | P1 items implemented in parallel (3 agents): audit normalization + Authority source, stage persistence full chain, posture degraded-data tracking. All verified — builds clean. | Developer |
| 2026-03-17 | P2 items implemented in parallel (5 agents): Triage→Findings rename, command palette scan fix, scan local-mode messaging, post-seal promotion CTA, welcome page rewrite. All verified — TS compiles clean. | Developer |
## Decisions & Risks
- VexHub migration 003 uses `IF NOT EXISTS` for idempotency — safe on both fresh and partially-migrated databases.
- IDP scope changes only take effect on fresh DB (INSERT ON CONFLICT DO NOTHING). Existing deployments need manual `allowed_scopes` update or volume reset.
- Authority audit endpoint (`/console/admin/audit`) response shape was inferred from ConsoleAdminEndpointExtensions — may need runtime verification.
- Risk dashboard: the gateway route exists for `/api/risk/*` but some dashboard summary endpoints (`/api/risk/status`, `/api/risk/aggregated-status`) may not exist in the backend yet. The URL construction is now correct, but 404s may persist until backend endpoints are implemented.
- Welcome page content is operator-focused but may need product review for messaging alignment.
- Pre-existing TS error in `trust-score-config.component.spec.ts:234` is unrelated to this sprint.
## Next Checkpoints
- Rebuild affected Docker images (vexhub, platform, timeline, router-gateway, console).
- Reset DB volume and verify fresh-start VexHub health.
- Run full local journey re-test to confirm fixes resolve the reported issues.
- Product review of welcome page copy and Findings/Triage vocabulary decision.


@@ -0,0 +1,807 @@
# Journey Problem Clusters Action Report - 2026-03-17
## Sources And Intent
This report consolidates three local evaluation reports:
- `docs/qa/JOURNEY_NOTES_20260316.md`
- `docs/qa/DEEP_JOURNEY_UX_FINDINGS_20260316.md`
- `docs/qa/LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md`
Goal: convert route notes and UX observations into a plannable engineering report.
Method:
- Re-read the three reports together.
- Check active route ownership and client code instead of assuming older route names are still live.
- Confirm whether each problem is:
  - a confirmed code defect
  - a confirmed gateway or contract mismatch
  - a product-model or UX problem
  - a likely stale finding from an older UI state
Confidence labels used below:
- High: confirmed by active code path or gateway/runtime config.
- Medium: strongly supported by current code and observed behavior, but still needs one focused runtime retest.
- Low: report finding is likely real historically, but current code suggests it may already be fixed or partially fixed.
## Executive Summary
The three journey reports collapse into seven actionable clusters:
1. Clean-start reliability is not good enough for first evaluation. VexHub fails startup on a fresh local database because of a migration/table-name mismatch. That contaminates VEX and policy-first journeys from the start.
2. Several important route families are broken for different reasons, but they are all the same class of problem: web-to-gateway-to-service contract drift. `/security/risk`, `/security/unknowns`, and console-admin routes each demonstrate a different version of that drift.
3. Scope policy and UI visibility are misaligned. The strongest example is Identity Providers: the page is visible to an admin user, but the backend requires a scope the default admin client does not request.
4. The audit story is structurally incomplete, not just empty. The unified audit UI advertises modules that the Timeline aggregator does not actually ingest, and many services do not appear to emit unified audit events at all.
5. Shell context is unreliable. Stage is implemented in the web store but not persisted by backend preferences, region behaves like a multi-select without saying so, and some route families silently shed scope.
6. Stella still explains itself better in docs than in product. Welcome/onboarding copy is brand-heavy, while the actual operator adoption story sits in `/docs`.
7. A few older report items look partially fixed already. Those should be re-tested before they become backlog items, otherwise planning will mix live defects with already-landed cleanup.
If we want a high-leverage plan, the first sequence should be:
1. Clean boot and first-run blockers
2. Route/contract repairs on risk, unknowns, admin, and identity providers
3. Audit trust foundation
4. Context integrity
5. Adoption/discoverability and workflow honesty
## Cluster 1 - Clean-Start Reliability And Runtime Readiness
### Problem
Fresh local startup is not clean. VEX-backed product surfaces start in a broken state before a first-time evaluator takes any action.
### Observed Symptoms
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` recorded `vexhub` startup failure and visible `503` on `/ops/policy/vex`.
- The same report explicitly called out VEX-dependent surfaces as contaminated from the start.
### Confirmed Technical Cause
Confidence: High
Active code review shows:
- VexHub startup migrations are wired correctly via `AddStartupMigrations(...)`.
- SQL migration files are embedded as resources in the persistence project.
- `001_initial_schema.sql` creates `vexhub.sources`.
- `002_add_source_backoff_columns.sql` alters `vexhub.vex_sources`.
That means startup migration wiring is present, but migration `002` targets the wrong table name.
Relevant files:
- `src/VexHub/StellaOps.VexHub.WebService/Program.cs`
- `src/VexHub/__Libraries/StellaOps.VexHub.Persistence/Extensions/VexHubPersistenceExtensions.cs`
- `src/VexHub/__Libraries/StellaOps.VexHub.Persistence/Migrations/001_initial_schema.sql`
- `src/VexHub/__Libraries/StellaOps.VexHub.Persistence/Migrations/002_add_source_backoff_columns.sql`
- `src/VexHub/__Libraries/StellaOps.VexHub.Persistence/EfCore/Models/VexSource.cs`
### Impact
- First-run trust is broken before onboarding starts.
- `/ops/policy/vex` becomes a false-negative indicator for the whole policy stack.
- Any UX conclusions around VEX, trust, or exception workflows are polluted by startup contamination.
### Solution Options
Option A - Fix migration `002` in place
- Change `ALTER TABLE vexhub.vex_sources` to `ALTER TABLE vexhub.sources`.
- Add a fresh-database integration test that boots VexHub and verifies startup succeeds.
Pros:
- Smallest immediate change.
- Fastest path for local QA recovery.
Cons:
- Risky if the migration history has already escaped into environments where partial state exists.
Option B - Add a repair migration
- Leave `002` untouched.
- Add `003` that conditionally repairs whichever table name exists, then converges on `vexhub.sources`.
Pros:
- Safer for environments that may already have broken/partial migration history.
- Explicitly documents the schema repair.
Cons:
- Slightly more work.
### Recommended Path
Use Option B unless we are certain the current migration history is still effectively local-only. Also add a boot smoke test in compose/CI that fails the stack when VexHub does not become healthy on a fresh database.
## Cluster 2 - Route Contracts, Gateway Drift, And Broken Surface Areas
This cluster contains the highest concentration of "page loads but API contract is wrong" failures.
### 2.1 Risk Dashboard (`/security/risk`)
#### Observed Symptoms
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` recorded multiple `404`s from the page:
- `/authority/risk/risk/status`
- `/authority/risk/risk`
- `/api/risk-budget/kpis`
- `/api/risk-budget/snapshot`
#### Confirmed Technical Cause
Confidence: High
Active code review shows:
- `RISK_API_BASE_URL` is built from `authorityBase + '/risk'`.
- `RiskHttpClient` then calls `${baseUrl}/risk`, `${baseUrl}/risk/status`, `${baseUrl}/risk/aggregated-status`, and related paths.
- That produces paths like `/authority/risk/risk/status`.
- The local gateway expects `/api/risk/*` and `/api/risk-budget/*`.
Relevant files:
- `src/Web/StellaOps.Web/src/app/app.config.ts`
- `src/Web/StellaOps.Web/src/app/core/api/risk-http.client.ts`
- `devops/compose/router-gateway-local.json`
#### Impact
- The risk page is fundamentally wired to the wrong base path.
- Because the page is part of the security and governance narrative, this looks like product immaturity rather than a single client bug.
#### Solution Options
Option A - Canonicalize on `/api/risk` and `/api/risk-budget`
- Remove authority-based risk URL construction from the web app.
- Split risk and risk-budget clients cleanly by actual gateway contract.
Option B - Keep current client shapes but add a gateway alias
- Route `/authority/risk/*` to the policy risk backend.
Pros:
- Faster patch if multiple clients already depend on the wrong path.
Cons:
- Preserves a misleading contract shape.
- Increases long-term route drift.
#### Recommended Path
Use Option A. Also remove duplicate risk-budget client logic so risk and governance do not continue to drift independently.
### 2.2 Unknowns Dashboard (`/security/unknowns`)
#### Observed Symptoms
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` recorded `404`s for scanner unknowns.
#### Confirmed Technical Cause
Confidence: High
Current code and gateway config show:
- Web client calls `/api/v1/scanner/unknowns`.
- Scanner service exposes `/api/v1/unknowns`.
- Local gateway has routes for scans, scan policies, vulnerabilities, triage, secrets, sources, and witnesses, but no route for unknowns.
Relevant files:
- `src/Web/StellaOps.Web/src/app/core/api/unknowns.client.ts`
- `src/Scanner/StellaOps.Scanner.WebService/Endpoints/UnknownsEndpoints.cs`
- `src/Scanner/StellaOps.Scanner.WebService/Program.cs`
- `devops/compose/router-gateway-local.json`
#### Impact
- The page is not just "empty"; it is structurally unreachable.
- The product advertises unknown-component handling but the default local gateway cannot reach it.
#### Solution Options
Option A - Standardize on `/api/v1/unknowns`
- Change the web client to call `/api/v1/unknowns`.
- Add the missing gateway proxy route.
Option B - Preserve `/api/v1/scanner/unknowns` as a public alias
- Add explicit aliasing in gateway or service.
- Deprecate one path after clients converge.
#### Recommended Path
Use Option A and keep a short-lived alias only if other clients already depend on the prefixed form.
### 2.3 Console Admin Loading States
#### Observed Symptoms
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` recorded:
- `Loading tenants...`
- similar non-settling loading states on clients and tokens
#### Confirmed Technical Cause
Confidence: Medium to High
There are three strong clues:
- The web service uses `/console/admin` directly.
- Local gateway config only contains an explicit reverse-proxy for `/authority/console -> /console`.
- Existing e2e tests intercept `**/authority/console/tenants**`, not `**/console/admin/tenants**`.
Relevant files:
- `src/Web/StellaOps.Web/src/app/features/console-admin/services/console-admin-api.service.ts`
- `src/Authority/StellaOps.Authority/StellaOps.Authority/Console/Admin/ConsoleAdminEndpointExtensions.cs`
- `devops/compose/router-gateway-local.json`
- `src/Web/StellaOps.Web/tests/e2e/*cutover.spec.ts`
OpenAPI snapshots do advertise `/console/admin/*`, so there is evidence of intended support. But the local router config and test interception pattern point to path inconsistency.
#### Impact
- Admin evaluation feels broken in exactly the place where platform teams expect reliability.
- Because the UI stays in loading, the failure reads as app hang rather than scoped or contractual failure.
#### Solution Options
Option A - Make `/authority/console/*` the canonical browser-facing route and update the web client.
Option B - Add an explicit `/console/admin/*` gateway proxy and keep the current client.
Option C - Support both, but add a smoke test that proves the path used by the web client really resolves in local compose.
#### Recommended Path
Pick one public browser contract and make tests, router config, and web client all use it. Right now all three disagree.
## Cluster 3 - Scope Policy And UI Visibility Mismatch
### Problem
The UI sometimes exposes pages that the backend will not authorize for the default admin session.
### Identity Providers (`/setup/identity-providers`)
#### Observed Symptoms
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` recorded a visible `403`.
#### Confirmed Technical Cause
Confidence: High
Current code shows:
- Backend endpoint requires `PlatformPolicies.IdentityProviderAdmin`.
- That policy maps to scope `platform.idp.admin`.
- Default web config scopes do not include `platform.idp.admin`.
- Navigation/route visibility appears to rely on broad admin scopes like `ui.admin`, so the page shell remains discoverable.
Relevant files:
- `src/Platform/StellaOps.Platform.WebService/Endpoints/IdentityProviderEndpoints.cs`
- `src/Platform/StellaOps.Platform.WebService/Constants/PlatformScopes.cs`
- `src/Platform/StellaOps.Platform.WebService/Program.cs`
- `src/Web/StellaOps.Web/src/config/config.json`
- `src/Web/StellaOps.Web/src/app/core/navigation/navigation.config.ts`
#### Impact
- First-time setup contains a dead-end in a core platform-admin path.
- The product teaches the wrong lesson: "admin can see it but not use it."
#### Solution Options
Option A - Add `platform.idp.admin` to the default admin client/profile
- Best if Identity Providers are intended to be available in the standard setup flow.
Option B - Keep the stricter scope and gate the route properly
- Hide or downgrade the page unless the token actually carries `platform.idp.admin`.
- Route missing-scope users to the existing insufficient-permissions UX instead of a raw API failure.
Option C - Collapse the scope into an existing admin capability
- Only if the product decision is that Identity Provider administration should not require a distinct scope.
#### Recommended Path
First decide whether Identity Provider management is a standard admin capability or a separately delegated one. Then align token issuance, route guards, and backend policy with that decision.
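Option B's route gating could be sketched as a scope check on the granted token. The scope names `ui.admin`, `platform.idp.read`, and `platform.idp.admin` come from this report; `parseScopes`, `gateRoute`, and the `openid` entry are assumptions, and a real guard would read scopes from the session rather than from a literal string.

```typescript
// OAuth 2.0 scope values are space-delimited strings.
function parseScopes(scopeClaim: string): Set<string> {
  return new Set(scopeClaim.split(/\s+/).filter((s) => s.length > 0));
}

type RouteDecision = "allow" | "insufficient-permissions";

// Gate on the specific scope the backend enforces, not on a broad admin scope.
function gateRoute(grantedScopes: Set<string>, requiredScope: string): RouteDecision {
  return grantedScopes.has(requiredScope) ? "allow" : "insufficient-permissions";
}

const defaultAdminScopes = parseScopes("openid ui.admin");
const idpAdminScopes = parseScopes("openid ui.admin platform.idp.read platform.idp.admin");

const defaultDecision = gateRoute(defaultAdminScopes, "platform.idp.admin");
const idpDecision = gateRoute(idpAdminScopes, "platform.idp.admin");
```

With a check like this, the default admin session would land on the insufficient-permissions UX instead of a raw `403`.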
## Cluster 4 - Audit And Evidence Trust Fragmentation
### Problem
The "audit log is empty" finding is not one bug. It is a stack of incompatible audit paths:
- some services do not emit unified audit events
- the unified aggregator only polls a subset of modules
- the UI advertises more modules than the aggregator supports
- at least one aggregated source is mislabeled
### Observed Symptoms
- `JOURNEY_NOTES_20260316.md` reported 0 audit events after real actions such as creating integrations and sealing a release.
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` repeated the same trust complaint after much broader action testing.
### Confirmed Technical Causes
Confidence: High
#### 4.1 The unified audit aggregator only covers part of the advertised surface
Current Timeline configuration includes only:
- JobEngine
- Policy
- EvidenceLocker
- Notify
It does not configure sources for:
- Authority
- Scanner
- VEX
- Attestor
- Scheduler
Yet the web audit UI advertises those modules.
Relevant files:
- `src/Timeline/StellaOps.Timeline.WebService/Audit/UnifiedAuditContracts.cs`
- `src/Timeline/StellaOps.Timeline.WebService/Program.cs`
- `src/Web/StellaOps.Web/src/app/core/api/audit-log.client.ts`
#### 4.2 The aggregator maps Notify events as `integrations`
`HttpUnifiedAuditEventProvider` fetches `/api/v1/notify/audit` but assigns `Module = "integrations"`.
Relevant file:
- `src/Timeline/StellaOps.Timeline.WebService/Audit/HttpUnifiedAuditEventProvider.cs`
#### 4.3 Authority audit is stored separately
Authority writes auth/admin events through `AuthorityAuditSink` into its own login-attempt store. That is not the same as unified audit ingestion.
Relevant files:
- `src/Authority/StellaOps.Authority/StellaOps.Authority/Audit/AuthorityAuditSink.cs`
- `src/Authority/StellaOps.Authority/StellaOps.Authority/Console/Admin/ConsoleAdminEndpointExtensions.cs`
#### 4.4 Most modules do not appear to use the unified audit emission helpers
Repo search found little to no integration of:
- `AddAuditEmission`
- `AuditActionFilter`
- `AuditActionAttribute`
across the platform/release/scanner/integrations paths that the journeys exercised.
### Impact
- Evidence trust is undermined at the exact moment Stella is trying to differentiate itself on auditability.
- Operators cannot tell whether "0 events" means no activity, no ingestion, or no cross-module support.
### Solution Options
Option A - Make the UI honest immediately
- If unified audit is partial, say so in the dashboard.
- Only show module chips that are actually aggregated.
- Mark unsupported modules as "not connected" instead of "0 events".
Option B - Expand Timeline aggregation to match the UI
- Add Authority, Scanner, VEX, Attestor, Scheduler, and real Integrations sources.
- Fix the Notify -> integrations labeling bug.
Option C - Standardize write-side emission
- Adopt one required audit-emission integration pattern for mutating endpoints.
- Add end-to-end tests: perform action -> event appears in `/api/v1/audit/events`.
Option D - Collapse toward centralized ingest
- Instead of wide polling, push all audit events into the unified audit ingest path and make module polling a temporary compatibility layer.
### Recommended Path
Do A and B first, then C. Right now the product is over-promising the audit surface. Honesty and coverage must come before UX polish.
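Option A's distinction between "not connected" and "0 events" can be modeled as a small tagged union. This is a sketch under assumed names (`ModuleStatus`, `moduleStatus`, `chipLabel`); the aggregated set mirrors the four Timeline sources listed in 4.1, and the label format is illustrative.

```typescript
type ModuleStatus =
  | { kind: "not-connected" }
  | { kind: "connected"; eventCount: number };

function moduleStatus(
  moduleId: string,
  aggregated: ReadonlySet<string>,
  counts: ReadonlyMap<string, number>,
): ModuleStatus {
  // A module the aggregator never polls must not be reported as "0 events".
  if (!aggregated.has(moduleId)) return { kind: "not-connected" };
  return { kind: "connected", eventCount: counts.get(moduleId) ?? 0 };
}

function chipLabel(moduleId: string, st: ModuleStatus): string {
  return st.kind === "connected"
    ? `${moduleId}: ${st.eventCount} events`
    : `${moduleId}: not connected`;
}

const aggregatedModules = new Set<string>(["jobengine", "policy", "evidencelocker", "notify"]);
const eventCounts = new Map<string, number>([["policy", 0]]);

const scannerChip = chipLabel("scanner", moduleStatus("scanner", aggregatedModules, eventCounts));
const policyChip = chipLabel("policy", moduleStatus("policy", aggregatedModules, eventCounts));
```

Here `scannerChip` reads "scanner: not connected" while `policyChip` reads "policy: 0 events", so an operator can tell missing ingestion apart from genuine inactivity.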
## Cluster 5 - Context Integrity And Shell State Reliability
### 5.1 Stage Selector Is A Real State Defect
#### Observed Symptoms
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` recorded that selecting `Prod` left the shell at `STAGE All`.
#### Confirmed Technical Cause
Confidence: High
Current code shows:
- Web context store tracks `stage` in query and local state.
- `buildPreferencesPayload()` does not include `stage`.
- Platform context request/response contracts also omit `stage`.
- Backend preference storage therefore cannot persist it.
Relevant files:
- `src/Web/StellaOps.Web/src/app/core/context/platform-context.store.ts`
- `src/Platform/StellaOps.Platform.WebService/Contracts/ContextModels.cs`
- `src/Platform/StellaOps.Platform.WebService/Services/PlatformContextService.cs`
- `src/Platform/StellaOps.Platform.WebService/Endpoints/ContextEndpoints.cs`
#### Solution Options
Option A - Make stage a real persisted context dimension across web, contracts, storage, and APIs.
Option B - Remove or hide stage from the shell until persistence is implemented.
#### Recommended Path
Option A if stage is part of the intended operating model. Option B if stage is still speculative. The current middle state is deceptive.
### 5.2 Region Selector Semantics Are Under-Explained
#### Observed Symptoms
- The journey reports observed that region behaves like a toggleable set, not a simple picker.
#### Confirmed Technical Cause
Confidence: High for semantics, Medium for UX severity
Current web code shows region is intentionally multi-select. The main issue is that the control summary reads like a single scope selector, not a multi-select control.
Relevant files:
- `src/Web/StellaOps.Web/src/app/layout/context-chips/context-chips.component.ts`
- `src/Web/StellaOps.Web/src/app/core/context/platform-context.store.ts`
#### Recommended Path
- Rename the control label to make multi-select explicit.
- Add "Select all" and "Clear" affordances.
- Reflect selection as chips or count plus checkmarks, not a single picker summary.
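A summary label that makes multi-select visible might be computed like this. The function name, the region ids, and the rule that an empty selection means "all regions" are assumptions, not behavior confirmed in `context-chips.component.ts`.

```typescript
function regionSummary(selected: readonly string[], all: readonly string[]): string {
  // Assumption: an empty selection is treated the same as selecting everything.
  if (selected.length === 0 || selected.length === all.length) return "All regions";
  // Short selections are listed by name so the user sees exactly what is active.
  if (selected.length <= 2) return selected.join(", ");
  // Longer selections show a count, which signals a set rather than a single pick.
  return `${selected.length} of ${all.length} regions`;
}

const allRegions = ["eu-west", "us-east", "ap-south", "sa-east"]; // hypothetical ids
const summaryNone = regionSummary([], allRegions);
const summaryOne = regionSummary(["eu-west"], allRegions);
const summaryThree = regionSummary(["eu-west", "us-east", "ap-south"], allRegions);
```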
### 5.3 Deep-Link Scope Hydration Is Inconsistent
#### Observed Symptoms
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` recorded routes that shed tenant/region scope in URL or topbar.
- Docs, evidence, admin, and some release/promotion routes behaved inconsistently.
#### Technical Cause
Confidence: Medium
One hard defect exists already: stage persistence is incomplete. Beyond that, the route family appears to mix:
- canonical routes that merge query params
- routes/components that rehydrate shell state from incomplete preferences
- routes that are effectively global but do not say so
This cluster needs focused implementation review, but it is already large enough to justify a dedicated "context integrity" sprint.
#### Recommended Path
- Add route-family tests for query preservation across Evidence, Docs, Admin, Release, and Triage.
- Standardize navigation helpers around one context-preserving pattern.
- If a route is intentionally global, show that explicitly instead of silently resetting the shell.
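One context-preserving pattern is a single helper that merges shell context into every navigation's query params. This is a minimal sketch: the key names `tenant`, `regions`, and `stage`, and the rule that explicit target params win, are assumptions about how the shell would want this to behave.

```typescript
type QueryParams = Record<string, string>;

// Assumed names of the shell-context query keys.
const CONTEXT_KEYS: readonly string[] = ["tenant", "regions", "stage"];

function mergeContextParams(current: QueryParams, target: QueryParams): QueryParams {
  const merged: QueryParams = { ...target };
  for (const key of CONTEXT_KEYS) {
    const value = current[key];
    // Explicit params on the target win; otherwise carry the shell context forward.
    if (value !== undefined && !(key in merged)) merged[key] = value;
  }
  return merged;
}

// Context keys are carried over; unrelated params such as "tab" are dropped.
const carried = mergeContextParams(
  { tenant: "acme", stage: "prod", tab: "overview" },
  { releaseId: "r1" },
);
const overridden = mergeContextParams({ stage: "prod" }, { stage: "dev" });
```

Route-family tests can then assert against this one helper instead of each component's own navigation code.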
## Cluster 6 - Adoption Story, Discoverability, And Vocabulary
### 6.1 Welcome Page Explains The Brand Better Than The Product
#### Observed Symptoms
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` concluded that `/docs` explains Stella better than `/welcome` or the dashboard.
#### Confirmed Technical Cause
Confidence: High
Current welcome code is almost entirely presentation-driven:
- headline and brand emphasis
- "Release Control Plane" tagline
- chips like `Encrypted`, `Identity`, `Pipeline`
- sign-in plus docs link
It does not explain:
- what Stella replaces from existing CI/CD scripts
- what stays in CI
- what the first concrete operator action should be
Relevant file:
- `src/Web/StellaOps.Web/src/app/features/welcome/welcome-page.component.ts`
#### Recommended Path
Rewrite `/welcome` around the operator adoption model:
- Connect a registry
- Scan a real artifact
- Create a governed release
- Request promotion and export evidence
The welcome page should answer "what do I stop scripting myself if I adopt Stella?" within 20 seconds.
### 6.2 Vocabulary Still Fights The Security Persona
#### Observed Symptoms
- `DEEP_JOURNEY_UX_FINDINGS_20260316.md` argued that "Triage" is the wrong top-level term for a security engineer looking for vulnerabilities or findings.
#### Confirmed Technical Cause
Confidence: High
Current navigation still exposes a top-level `Triage` group and labels the core workspace `Artifact Workspace`.
Relevant file:
- `src/Web/StellaOps.Web/src/app/core/navigation/navigation.config.ts`
#### Recommended Path
- Rename the top-level label from `Triage` to `Vulnerabilities` or `Findings`.
- Keep "triage" as a workflow concept inside the page, not as the discovery label.
### 6.3 Command Palette And Navigation Still Mislead Scan Discovery
#### Observed Symptoms
- `DEEP_JOURNEY_UX_FINDINGS_20260316.md` reported zero results when searching `scan`.
#### Confirmed Technical Cause
Confidence: High
Current command-palette behavior explains the report:
- Quick actions only appear in "action mode" when query starts with `>`.
- Plain `scan` triggers the search backend, not quick actions.
- Quick actions exist, but some scan-related actions route to triage rather than the actual `/security/scan` page.
Relevant files:
- `src/Web/StellaOps.Web/src/app/shared/components/command-palette/command-palette.component.ts`
- `src/Web/StellaOps.Web/src/app/core/api/search.models.ts`
There is also evidence that scan discoverability improved since the older report:
- current sidebar code contains a `Scan Image` entry
- current security posture page includes a `Scan an Image` CTA
Relevant files:
- `src/Web/StellaOps.Web/src/app/layout/app-sidebar/app-sidebar.component.ts`
- `src/Web/StellaOps.Web/src/app/features/security-risk/security-risk-overview.component.ts`
#### Recommended Path
- Make plain `scan` return route/search results even without `>`.
- Align all scan quick actions to `/security/scan`.
- Keep the sidebar and page CTA, but make command-palette behavior consistent with user expectations.
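
To illustrate the first two recommendations, here is a hedged sketch of query matching that treats the leading `>` as optional. The `QuickAction` shape, the action list, and `matchQuickActions` are assumptions made for illustration; only the routes `/security/scan` and `/ops/operations/doctor` come from the reports.

```typescript
// Hypothetical shapes; the real command-palette models may differ.
interface QuickAction {
  id: string;
  label: string;
  keywords: string[];
  route: string;
}

const QUICK_ACTIONS: QuickAction[] = [
  { id: "scan-image", label: "Scan Image", keywords: ["scan", "image", "artifact"], route: "/security/scan" },
  { id: "run-doctor", label: "Run Doctor", keywords: ["doctor", "diagnostics"], route: "/ops/operations/doctor" },
];

function matchQuickActions(rawQuery: string, actions: QuickAction[] = QUICK_ACTIONS): QuickAction[] {
  // Strip an optional leading ">" so plain "scan" and "> scan" behave the same.
  const query = rawQuery.replace(/^>\s*/, "").trim().toLowerCase();
  if (query === "") {
    return [];
  }
  return actions.filter(
    (action) =>
      action.label.toLowerCase().includes(query) ||
      action.keywords.some((keyword) => keyword.startsWith(query)),
  );
}
```

The palette could show these matches above the search-backend results, so the `>` prefix becomes a filter rather than a prerequisite.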
## Cluster 7 - Workflow Honesty And Trust Gaps
### 7.1 Security Posture Can Silently Convert Failures Into Zeros
#### Observed Symptoms
- `DEEP_JOURNEY_UX_FINDINGS_20260316.md` reported posture zeros while triage had data.
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` repeatedly called out trust problems caused by mixing seeded and real data.
#### Confirmed Technical Cause
Confidence: High
`SecurityRiskOverviewComponent` catches failures on findings, disposition, and SBOM requests and replaces them with empty arrays. The page then renders counts and KPIs from those empty collections.
It only partially falls back to triage stats for a few counts.
Relevant file:
- `src/Web/StellaOps.Web/src/app/features/security-risk/security-risk-overview.component.ts`
#### Impact
- A backend outage or contract mismatch can look identical to "no findings in scope".
- This is worse than a visible error because it erodes trust and is hard for users to detect.
#### Recommended Path
- Distinguish `no data`, `degraded data`, and `zero`.
- Surface a data-quality banner when upstream APIs fail.
- Use one shared read model for posture and triage, or make the mismatch explicit.
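
One way to keep `no data`, `degraded data`, and `zero` apart is to carry the request outcome alongside the count instead of collapsing failures into empty arrays. This is a sketch under assumptions: `SourceResult`, `PostureCount`, and `summarize` are hypothetical names, and the real component's data flow may look different.

```typescript
// Hypothetical types; not the actual SecurityRiskOverviewComponent model.
type SourceResult<T> = { ok: true; items: T[] } | { ok: false; error: string };

type PostureCount =
  | { kind: "value"; count: number } // every source answered, so zero is a real zero
  | { kind: "degraded"; count: number; failed: string[] } // count is only a lower bound
  | { kind: "no-data"; failed: string[] }; // nothing answered; no count can be shown

function summarize(sources: Record<string, SourceResult<unknown>>): PostureCount {
  const failed: string[] = [];
  let count = 0;
  let succeeded = 0;
  for (const name of Object.keys(sources)) {
    const result = sources[name];
    if (result.ok) {
      succeeded += 1;
      count += result.items.length;
    } else {
      failed.push(name);
    }
  }
  if (failed.length === 0) {
    return { kind: "value", count };
  }
  if (succeeded === 0) {
    return { kind: "no-data", failed };
  }
  return { kind: "degraded", count, failed };
}
```

The page would render a KPI only for `value`, and attach the data-quality banner (listing `failed`) for the other two kinds.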
### 7.2 Scan Flow Does Not Explain Pending/Simulation State Early Enough
#### Observed Symptoms
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` saw `alpine:3.19` stay `PENDING`.
#### Confirmed Technical Cause
Confidence: Medium to High
Current scan UI:
- posts to `/api/v1/scans/`
- polls every 3 seconds
- stops after 60 polls
- only then shows a "taking longer than expected" message and a JobEngine hint
Relevant file:
- `src/Web/StellaOps.Web/src/app/features/scanner/scan-submit.component.ts`
This is better than a silent spinner, but it still does not explain local-mode limitations early enough.
#### Recommended Path
- Add an immediate note when scanner engines are simulated, absent, or queue-backed in local compose.
- Show queue/worker health if available.
- If scanning is intentionally incomplete in local mode, say so before submission.
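
The timing above works out to 60 polls x 3 s = 180 s, so the user currently waits three minutes before seeing any explanation. The sketch below shows the notice decision as a pure function. The interval and poll limit come from the report; `ScanNotice`, `noticeFor`, and the `engineSimulated` flag are hypothetical, and how the UI would learn that the engine is simulated is left open.

```typescript
// Values reported for the current scan UI.
const POLL_INTERVAL_MS = 3_000;
const MAX_POLLS = 60;

// Hypothetical notice states for the progress view.
type ScanNotice = "none" | "local-mode-limited" | "taking-longer";

function elapsedMs(polls: number): number {
  return polls * POLL_INTERVAL_MS;
}

function noticeFor(polls: number, status: string, engineSimulated: boolean): ScanNotice {
  if (status !== "PENDING") {
    return "none";
  }
  if (polls >= MAX_POLLS) {
    return "taking-longer";
  }
  // Assumption: the shell can tell when scanner engines are simulated or absent.
  // If so, say it from the first poll instead of after the full polling budget.
  return engineSimulated ? "local-mode-limited" : "none";
}
```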
### 7.3 Sealed Release Versus Approval Queue Is A Product-Model Gap
#### Observed Symptoms
- `JOURNEY_NOTES_20260316.md` observed that sealing a release did not create an approval request.
#### Interpretation
Confidence: Medium
This looks less like a code defect and more like a workflow-model mismatch:
- release sealing creates a sealed release definition
- promotion appears to be the action that enters approval/deployment workflow
The product currently expects the user to infer that distinction.
#### Recommended Path
- Rename or reframe the release wizard end-state to make it clear it seals a release definition, not a deployment request.
- Offer a direct post-seal CTA: `Request promotion`.
- Consider a combined path: `Seal and request promotion`.
### 7.4 Quick Verify Needs Focused Re-Test, Not Blind Backlogging
#### Observed Symptoms
- `LOCAL_DEVOPS_SECURITY_JOURNEY_20260317.md` reported that the visible `Start Verification` CTA was effectively unclickable.
#### Current Technical Read
Confidence: Medium
The active `QuickVerifyDrawerComponent` is a fixed-position right drawer with internal scroll and a footer button. Code review did not reveal a simple, obvious CSS mistake.
Relevant file:
- `src/Web/StellaOps.Web/src/app/shared/components/quick-verify-drawer/quick-verify-drawer.component.ts`
This should be handled as:
- one targeted reproducible UI bug
- plus a browser test that proves the footer CTA is clickable on supported viewports
not as a broad redesign request.
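
The geometry half of such a browser test can be expressed as a pure check, sketched below. `Box`, `Viewport`, `isFullyInViewport`, and the viewport sizes are illustrative assumptions; the box is expected to use viewport-relative coordinates such as those a browser automation tool reports for an element. This only covers geometry, not occlusion by other elements, so it complements an actual click in the test rather than replacing it.

```typescript
// Hypothetical types: a viewport-relative bounding box and a viewport size.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Viewport {
  width: number;
  height: number;
}

// True only when the whole element lies inside the visible viewport.
function isFullyInViewport(box: Box, viewport: Viewport): boolean {
  return (
    box.x >= 0 &&
    box.y >= 0 &&
    box.x + box.width <= viewport.width &&
    box.y + box.height <= viewport.height
  );
}

// Example viewport sizes; the supported set is a product decision, not stated in the report.
const EXAMPLE_VIEWPORTS: Viewport[] = [
  { width: 1280, height: 720 },
  { width: 1920, height: 1080 },
];
```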
## Findings That Now Look Stale Or Partially Fixed
These should be re-tested before they become backlog items.
### Record Decision Dialog Viewport Issue
Status: Low confidence as an active defect
Reason:
- The current `DecisionDrawerComponent` is implemented as a fixed-position overlay modal with bounded height and scrollable body.
- That does not match the older report description of an in-page drawer stuck outside the viewport.
Relevant file:
- `src/Web/StellaOps.Web/src/app/features/triage/components/decision-drawer/decision-drawer.component.ts`
Action:
- Re-test in the current build before creating work.
### Scan Entry Point Completely Missing
Status: Low confidence as a current blanket statement
Reason:
- Current sidebar includes `Scan Image`.
- Current security posture page includes `Scan an Image`.
- The deeper remaining problem is inconsistent discoverability, not total absence.
Relevant files:
- `src/Web/StellaOps.Web/src/app/layout/app-sidebar/app-sidebar.component.ts`
- `src/Web/StellaOps.Web/src/app/features/security-risk/security-risk-overview.component.ts`
Action:
- Reword backlog items from "no scan entry point" to "scan discovery is inconsistent and command/search behavior is misleading."
## Recommended Backlog Shape
### P0 - Must Land Before Another Serious Local Journey
- Repair VexHub migration mismatch and add fresh-db startup test.
- Fix `/security/risk` client routing to canonical risk endpoints.
- Fix `/security/unknowns` path contract and add missing gateway route.
- Align Identity Providers visibility with actual required scope.
- Resolve console-admin public path contract in the web app and local gateway.
### P1 - Trust Foundation
- Make unified audit honest about actual coverage.
- Expand Timeline aggregation to the modules the UI advertises.
- Standardize audit emission on high-value mutating paths.
- Prevent security posture from silently converting backend failures into zero posture.
### P1 - Shell Integrity
- Implement real persisted stage context or remove the control.
- Normalize context preservation across route families.
- Make region multi-select semantics explicit.
### P2 - Adoption And Workflow Clarity
- Rewrite `/welcome` around the operator adoption model.
- Rename `Triage` to an operator-facing term.
- Make command-palette `scan` behavior work without hidden `>` syntax.
- Explain scan local-mode limitations before the user waits three minutes.
- Clarify release sealing vs promotion request in the release flow.
## Suggested Owner Split
- VexHub team: clean-start/migration repair
- Web + Router team: risk, unknowns, console-admin, command palette, context preservation
- Platform team: identity-provider scope decision and context preference contracts
- Timeline/Audit team: unified coverage, module mapping, emission standard
- Product/UX team: welcome copy, vocabulary, workflow framing, data-honesty patterns
## Final Planning Notes
The biggest planning mistake would be to treat these findings as many unrelated UI bugs.
Most of the journey friction comes from five root themes:
- startup contamination
- route contract drift
- scope/visibility misalignment
- incomplete audit plumbing
- product explanation living in docs instead of in product
If those five themes are addressed, a large fraction of the remaining UX issues will either disappear or become much easier to prioritize accurately.


@@ -0,0 +1,752 @@
# Local DevOps/Security Journey - 2026-03-17
**Perspective**: DevOps or security engineer evaluating Stella Ops as a replacement for custom CI/CD scripts and partial deployment-security tooling.
**Method**
- Fresh local compose runtime using `devops/compose/docker-compose.stella-ops.yml`
- Browser-driven walkthrough with local Playwright automation and screenshot capture
- Minimal document pre-read; product understanding driven primarily by the running system and only enough local documentation to frame the intended architecture
**Environment**
- Date: 2026-03-17
- Host: local developer machine
- Login used: `admin / Admin@Stella2026!`
- Screenshot artifacts: `output/playwright/`
---
## Baseline Findings Recorded Before Deeper Pass
### Runtime readiness
- The product shell is reachable at `https://stella-ops.local`.
- The gateway does not reach full ready state on fresh startup:
- `/health` returns `ready:false`
- missing microservice: `vexhub`
- `stellaops-vexhub-web` crashes on startup migration with:
- `relation "vexhub.vex_sources" does not exist`
- This contaminates VEX-dependent surfaces and means the first-run product experience is already partially broken before a user begins meaningful work.
### Welcome and first-run positioning
- Root navigation lands on `/welcome`.
- The welcome page is visually sparse and brand-heavy:
- `RELEASE CONTROL PLANE`
- `ENCRYPTED IDENTITY PIPELINE`
- `Sign In`
- It does not explain:
- what Stella replaces in an existing pipeline
- what the first operator action should be
- how a user with Docker/Compose or script-based delivery should approach adoption
### First authenticated landing
- Successful login lands on `/mission-control/board`.
- The dashboard shell is broad and polished, but the first impression is mixed:
- strong top-level information architecture
- unclear relationship between demo/seed data and real operational state
- limited guidance on the next concrete action for a fresh install
### First-round product observations
- `/setup` is the strongest onboarding surface found so far because it groups the right domains and exposes a first-time path.
- `/setup/integrations` is one of the clearest product areas because it explains setup order in operator language.
- `/setup/integrations/registries` has a concrete onboarding wizard with useful hints such as Harbor health probing and `AuthRef` secret indirection.
- `/security/scan` accepts an image submission (`alpine:3.19`) and shows a scan ID, but the scan remains `PENDING` without enough user-facing explanation.
- `/ops/policy/vex` renders a real user-facing failure:
- `VEX Hub error: 503`
- `/docs` explains Stella's value proposition better than `/welcome` or the dashboard.
### Initial product-shaping concerns
- Internal Stella terminology appears before user value is established.
- Demo-like metrics are mixed with honest empty-state data.
- Some surfaces feel operationally credible; others feel seeded or disconnected.
- The product is easier to understand once inside Setup, Integrations, Release, Evidence, and Docs than it is from the initial welcome and dashboard journey.
---
## Investigation Status
- Baseline recorded.
- Deeper route and workflow pass completed for this session.
- Primary breadth artifact: `output/playwright/route-survey-20260317.json`
- Issue screenshots: `output/playwright/deep-route-survey/`
- Focused flow screenshots: `output/playwright/deep-flows/`
- Action-pack artifacts:
- `output/playwright/action-packs/setup-admin-actions-20260317.json`
- `output/playwright/action-packs/security-release-actions-20260317.json`
- `output/playwright/action-packs/evidence-ops-admin-actions-20260317.json`
- `output/playwright/action-packs/integrations-release-shell-actions-20260317.json`
- `output/playwright/action-packs/topology-policy-actions-20260317.json`
- `output/playwright/action-packs/focused-run-actions-20260317.json`
- `output/playwright/action-packs/micro-run-actions-20260317.json`
---
## Route Survey Summary
### Coverage
- 88 authenticated routes surveyed in a single logged-in browser session.
- Route families covered:
- Mission Control
- Setup and Admin
- Integrations
- Topology
- Security
- Triage
- Releases
- Policy
- Evidence
- Operations
- Console Admin
- User Preferences
- Docs
- 403 and 404 recovery pages
### Survey outcomes
- 88 routes loaded without browser-level navigation exceptions.
- 16 routes mutated the URL after initial render or otherwise failed the simple stability check.
- 4 routes returned hard HTTP errors from product APIs during normal page load:
- `/setup/identity-providers` -> 403
- `/security/risk` -> multiple 404s
- `/security/unknowns` -> 404
- `/ops/policy/vex` -> 503
- Several additional routes completed render while background requests aborted or context requests dropped.
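
The survey's stability check is not shown in this note. As a rough sketch of how URL mutations could be classified, the function below separates a query-only change (the likely context-injection case) from a path change such as a bounce to `/welcome`. `classifyMutation` and its labels are hypothetical, and the tenant value in the usage example is made up.

```typescript
// Hypothetical classification labels for a route survey.
type Mutation = "stable" | "query-only" | "redirect";

function splitUrl(url: string): { path: string; query: string } {
  const index = url.indexOf("?");
  return index === -1
    ? { path: url, query: "" }
    : { path: url.slice(0, index), query: url.slice(index + 1) };
}

function classifyMutation(initialUrl: string, settledUrl: string): Mutation {
  if (initialUrl === settledUrl) {
    return "stable";
  }
  const before = splitUrl(initialUrl);
  const after = splitUrl(settledUrl);
  // Same path with a different query string: scope was probably injected or dropped.
  // A different path means the app sent the user somewhere else.
  return before.path === after.path ? "query-only" : "redirect";
}
```

For example, `classifyMutation("/setup/integrations", "/setup/integrations?tenant=demo&regions=apac")` yields `"query-only"`, while a bounce from `/ops/platform-setup/topology-wizard` to `/welcome` yields `"redirect"`.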
### Important interpretation note
- The route survey used one authenticated browser session because replaying saved auth state did not reliably restore protected-route access.
- When a route looks broken in this note, it reflects either:
- a product/runtime defect confirmed by response codes or clear user-facing error text
- or a UX/data credibility issue visible to a fresh user
- The `vexhub` startup failure remains an environment contaminant for all VEX-backed flows.
---
## What The Product Teaches Well
### Setup is the strongest self-learning area
- `/setup` does the best job of explaining the system in operator-facing chunks:
- identity
- trust
- integrations
- topology
- notifications
- usage
- system settings
- The first-time path is visible and sensible.
- The page explains that setup is a journey, not just a card wall.
### Integrations are close to adoptable
- `/setup/integrations` explains setup order well.
- `/setup/integrations/registries` is particularly strong:
- clear 6-step flow
- concrete hints for Harbor
- good secret-handling guidance via `AuthRef`
- scoped, practical inputs instead of vague platform terminology
- This is one of the few areas that already feels like it could replace bespoke onboarding docs for a real platform team.
### System Settings and Security Data are understandable
- `/setup/system` correctly frames itself as a handoff surface into health, doctor, SLO, and jobs instead of pretending to be the health system itself.
- `/setup/security-data` is concrete and operator-legible:
- advisory sources
- mirror
- Trivy DB
- version locks
- These pages are examples of Stella working well when it uses plain task language.
### Release, evidence, and operational shells are structurally good
- `/releases/promotions/create` clearly communicates the promotion model:
- identity
- target
- inputs
- gates
- approvals
- launch
- `/evidence/verify-replay` is one of the clearest evidence surfaces because it shows:
- request replay
- replay list
- quick verify
- determinism framing
- `/ops/operations/doctor` has good information scent even without interaction depth in this pass.
- `/settings/user-preferences` is polished and easy to understand.
### Docs explain Stella better than the product entrypoint
- `/docs` immediately explains the intended buyer/operator:
- non-Kubernetes container estates
- Compose/hosts/VMs/scripted automation
- security + evidence + release governance
- This page explains Stella's value proposition more effectively than `/welcome` or the dashboard.
---
## Core Product Problems Found In The Deeper Pass
### 1. Startup/readiness defects contaminate the first-run story
- The local runtime never reached clean ready state because `vexhub` failed startup migration.
- The product shell remains usable enough to explore, but the user is already inside a partially broken system.
- This matters because Stella's value proposition is operational trust, and the first experience visibly breaks that promise.
### 2. The old route inventory is not reliable enough to be a learning aid
- One of the old-path routes, `/ops/platform-setup/topology-wizard`, bounced to `/welcome` in a fresh authenticated session.
- That confirms the user's instruction to treat prior route notes as reference material only.
- A fresh operator cannot rely on hidden route knowledge; the product must teach the path from inside the UI.
### 3. Context propagation is inconsistent across the product
- Several routes drop `tenant` and `regions` from the URL entirely:
- `/triage/artifacts`
- `/evidence/verify-replay`
- `/evidence/exports`
- `/ops/operations/notifications`
- `/ops/operations/dead-letter`
- all `/console-admin/*`
- `/settings/user-preferences`
- `/docs`
- Some of those routes also show topbar context drift:
- `REGION All regions`
- `ENV No env defined yet`
- `Policy: No baseline`
- Other pages in the same journey still show:
- `TENANT Demo Production`
- `REGION 4 regions`
- `ENV All environments`
- `Policy: Core Policy Pack latest`
- To a fresh user, this makes the platform feel internally inconsistent even when the page itself is not obviously broken.
### 4. Demo/seed behavior still leaks into critical operator judgments
- The product mixes strong, honest empty states with seeded operational-looking content.
- Example surfaces that look seeded or at least not grounded in this actual local session:
- `/security/vulnerabilities` shows named CVEs with reachability/exceptions
- `/releases/deployments` shows realistic deployment history rows
- `/mission-control/alerts` and `/mission-control/activity` show plausible but unproven operational narratives
- `/ops/operations/signals` shows five events and a 60% error rate
- Example surfaces that are honest-but-empty or clearly disconnected:
- scans stay `PENDING`
- audit log remains `0`
- decision capsules say create a release first
- The inconsistency is the trust problem, not the existence of fixtures by itself.
### 5. Some pages are clearly broken in user-visible ways
#### `/setup/identity-providers`
- User-facing message:
- `Failed to load providers`
- `Identity Provider API error: Unknown error`
- Under the hood this is a 403 from `/api/v1/platform/identity-providers`.
- This is exactly the kind of page a new admin would try early in setup.
#### `/security/risk`
- Multiple 404s during page load:
- `/authority/risk/risk/status`
- `/authority/risk/risk`
- `/api/gate/verdict`
- `/api/risk-budget/kpis`
- `/api/risk-budget/snapshot`
- This is not a subtle UX issue; it looks like an unimplemented or miswired feature surface.
#### `/security/unknowns`
- Still broken with scanner unknowns API 404.
- Fresh users see a real failure instead of a usable explanation or disabled state.
#### `/ops/policy/vex`
- Visible `503` because VEX Hub is down.
- The UI shows the failure, which is honest, but the product path is still broken.
#### `/console-admin/tenants`, `/console-admin/clients`, `/console-admin/tokens`
- These routes render headings and then remain in indefinite loading states:
- `Loading tenants...`
- `Loading OAuth2 clients...`
- `Loading tokens...`
- That makes the whole Console Admin area feel half-connected.
### 6. Report and dashboard surfaces still blur posture, reporting, and action
- `/security/reports` exposes `Export CSV` and `Generate PDF`, which is better than the earlier first pass suggested, but the page still behaves like a security-posture workspace more than a true report center.
- `/security/reports` and `/security` share very similar content structure and headings.
- For a fresh user, the distinction between:
- posture
- report generation
- evidence export
is still not as crisp as it should be.
### 7. The scan flow still does not close the loop
- Submitting `alpine:3.19` creates a scan ID and moves to a progress view.
- The resulting status remained `PENDING` during this session.
- The page does not adequately explain whether:
- the engine is intentionally simulated
- the job is queued
- or the scan path is not fully wired in local compose mode
- A technically competent evaluator reads this as "stuck" unless explicitly told otherwise.
### 8. Audit/evidence trust is undermined by a silent emptiness gap
- `/evidence/audit-log` still reports `0 Total Events (7d)`.
- The page even lists the kinds of actions that should appear automatically:
- release seals
- promotions
- approvals
- policy changes
- VEX decisions
- integration changes
- Given that the local product was actively used during this session, the empty audit log feels suspect, not reassuring.
---
## Route Family Notes
### Mission Control
- `/mission-control/board`
- Broad shell is strong.
- Still suffers from trust problems because its numbers and status cards are not obviously tied to anything the user has just done.
- `/mission-control/alerts`
- Reads like a plausible operational summary.
- Good copy, but not clearly tied to real local state.
- `/mission-control/activity`
- Nicely grouped by release runs, evidence, and audit.
- Again feels more like a prepared story than an earned activity stream.
### Setup And Admin
- `/setup`
- Best first-time self-learning surface.
- `/setup/identity-access`
- Useful and understandable.
- Good explanation of least privilege and canonical clients/tokens.
- `/setup/trust-signing`
- Better than expected in this pass.
- Not empty; shows a meaningful overview with key counts and tabs.
- `/setup/notifications`
- Strong distinction between setup-time configuration and operations-time runtime monitoring.
- `/setup/system`
- Good "handoff, not verdict" framing.
- `/setup/security-data`
- Good operator language and strong task grouping.
- `/setup/identity-providers`
- Broken by 403.
### Integrations
- `/setup/integrations`
- Strong overview and order guidance.
- URL mutated after initial load; likely context injection, not a blocker by itself.
- `/setup/integrations/registries`
- Wizard remains one of the strongest product flows.
- `/setup/integrations/runtime-hosts`
- Late URL mutation, but page renders.
- `/setup/integrations/advisory-vex-sources`
- Loaded with grouped categories and looked materially populated.
- Late URL mutation observed.
### Topology
- `/setup/topology/overview`
- Good information structure but it mutates URL after initial load.
- Content still leans heavily on degraded/unknown status without enough explanation.
- `/setup/topology/map`, `/setup/topology/targets`, `/setup/topology/agents`
- Similar late URL mutation pattern.
- Topology as a concept is present, but a fresh user still does not get a crisp "here is how you model your current Docker/host estate" guided path from the normal navigation alone.
### Security And Triage
- `/security`
- Strong shell, but still fixture-heavy or trust-ambiguous.
- `/security/findings`
- Weak compared with the rest of Security.
- No clear heading in the route survey and the surface starts from a "no baseline recommendations" framing, which is not a first-time user's mental model.
- `/security/vulnerabilities`
- Richer, but clearly showing seeded-looking CVE content.
- `/security/risk`
- Broken.
- `/security/unknowns`
- Broken.
- `/security/disposition`
- Loaded in the route survey; not broken in that pass.
- `/triage/artifacts`
- Still a compelling concept, but it drops URL context and therefore feels slightly detached from the rest of the scoped application shell.
### Releases
- `/releases/versions`
- Good catalog framing.
- `/releases/runs`
- Good concept and filters.
- `/releases/deployments`
- Looks operationally real but likely fixture-backed.
- `/releases/approvals`
- Honest empty state.
- `/releases/promotions`
- Honest and well explained.
- `/releases/promotions/create`
- Strong conceptual wizard.
- `/releases/bundles`
- Good explanation of digest-first identity and validation gates.
- Release management as a category is one of Stella's clearest strengths.
### Policy
- `/ops/policy/overview`
- Still a strong shell.
- `/ops/policy/governance` and `/ops/policy/simulation`
- Loaded cleanly and present a serious operator-facing control plane.
- `/ops/policy/vex`
- Broken by upstream runtime failure.
- Policy pages also show context drift through the topbar's `Policy: No baseline`, which clashes with other shells showing `Core Policy Pack latest`.
### Evidence
- `/evidence/overview`
- Strong overview but context drift is visible.
- `/evidence/capsules`
- Honest empty state: create a release first.
- `/evidence/verify-replay`
- Strong concept surface but drops URL context.
- `/evidence/exports`
- Useful structure, but also drops URL context and shows aborted background requests.
- `/evidence/audit-log`
- Well explained, but the all-zero event story damages trust.
### Operations
- `/ops/operations`
- Good summary shell.
- `/ops/operations/doctor`
- Promising and informative.
- `/ops/operations/signals`
- Reads like realistic ops data but may be seeded; trust ambiguity remains.
- `/ops/operations/feeds-airgap`
- Good operator language and healthy-looking structure.
- `/ops/operations/notifications`
- Strong separation from Setup Notifications, but it drops tenant/region context in the URL and topbar.
- `/ops/operations/data-integrity`
- Interesting concept, but title duplication (`Data Integrity - StellaOps - StellaOps`) and aborted background requests make it feel less finished.
- `/ops/operations/health-slo`
- Weak snapshot in this environment: no services available despite a clearly running system.
- `/ops/operations/dead-letter`
- Functional shell, but context drift is again visible.
### Console Admin, User Preferences, Docs, Recovery Pages
- `/console-admin/*`
- Mixed quality.
- Branding works best.
- Tenants/clients/tokens feel incompletely wired.
- `/settings/user-preferences`
- One of the cleanest and most polished pages in the product.
- `/docs`
- Best explanation of Stella's purpose.
- `/console/profile`
- Good 403 page with sensible recovery links.
- `/nonexistent`
- Good 404 page with recovery links.
---
## Action-Level Findings
### Shared shell controls are more real than the first pass suggested, but not always legible
- Global search behaves more like AI or knowledge retrieval than command navigation. Searching `scan` from User Preferences returned:
- `Best answer`
- `Related Questions`
- `Ask AI`
- knowledge snippets and endpoint references
- That behavior is useful, but it is a surprise if the user expects `Ctrl+K` or the shell search box to act like a route launcher.
- The region control is not a simple single-select picker. Toggling `EU West` from `REGION 4 regions` resulted in:
- URL change to `regions=apac,us-east,us-west`
- topbar change to `REGION 3 regions`
- The control appears to behave as a multi-select toggle, but the shell does not explain that model clearly.
- The window control works well. Selecting `7d` updated the topbar to `WINDOW 7d` and added `timeWindow=7d` to the URL.
- The stage control exposes `All / Dev / Stage / Prod`, but choosing `Prod` left the shell at `STAGE All`. This looks like a real state-application defect.
- Deep-link scope hydration is still inconsistent. In different authenticated runs:
- `/releases/versions/new` loaded with `REGION All regions` and `ENV No env defined yet`
- `/releases/promotions/create` loaded once with `REGION No regions defined`
- later in-session navigation to the same routes kept the active `3 regions / All environments / 7d` scope correctly
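
The region behavior observed above is consistent with a multi-select toggle over the comma-separated `regions` query parameter. The sketch below reproduces that observation; `toggleRegion` and the `eu-west` slug for `EU West` are assumptions, since the note only records the resulting URL value.

```typescript
// Hypothetical model of the region control: toggling adds or removes one region
// from the comma-separated list and keeps the list sorted.
function toggleRegion(regionsParam: string, region: string): string {
  const selected = regionsParam.split(",").filter((name) => name.length > 0);
  const next = selected.includes(region)
    ? selected.filter((name) => name !== region)
    : [...selected, region];
  return next.sort().join(",");
}
```

Under this model, `toggleRegion("apac,eu-west,us-east,us-west", "eu-west")` gives `apac,us-east,us-west`, which matches the URL seen after toggling `EU West`. Making that model explicit in the control (for example with checkboxes and a count) would address the legibility problem.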
### Setup and admin actions reveal some of Stella's strongest product design
- Guided setup entry works and feels approachable:
- `Welcome to Stella Ops`
- autodetection steps
- simple "ready" state
- The weak point is that the autodetection copy is theatrical rather than informative. It does not tell the operator what was actually detected or what it means for next steps.
- Identity & Access is one of the strongest action surfaces in the product:
- `Users` explains enrollment flow clearly
- `Add User` opens an inline create form
- `Roles` exposes built-in permissions in a way an operator can actually inspect
- `OAuth Clients` and `API Tokens` explicitly explain that setup is read-only here and point the user to the canonical admin routes
- `Tenants` shows a real, editable tenant row with suspend and branding actions
- Trust & Signing improves once actions are tried:
- `Watchlist` opens a real workspace with tuning guidance, noisy-rule framing, and a return path to notifications
- `Audit Log` opens and shows example trust events
- The weaker part of Trust & Signing is discoverability inside the overview. `Signing Keys`, `Trusted Issuers`, and `Certificates` did not obviously move the user out of the overview in action testing, while `Watchlist` and `Audit Log` clearly did.
- Setup Notifications and Ops Notifications form one of the best paired workflows in the product:
- setup clearly owns lifecycle, templates, throttles, and routing design
- ops clearly owns live delivery validation and test sends
- both pages link to each other cleanly
- Console Admin Branding is another strong page:
- practical token editing
- live preview
- safe, understandable scope
- Console Admin Tenants remains weak:
- heading loads
- `Create Tenant` button renders
- body still says `Loading tenants...`
### Evidence and operations actions split cleanly into "genuinely useful" and "still suspicious"
- Export Center is one of the strongest true action surfaces in Stella.
- `Create Profile` opens a real modal with:
- profile name
- description
- export format choices
- include/exclude content switches
- schedule options
- `Run Now` is not just decorative. It created a new completed export run and surfaced output details such as:
- `StellaBundle (OCI referrer)`
- completed run ID
- output path under `/exports/...`
- download action
- Audit Log `View All Events` works and exposes:
- module filters
- action filters
- severity filters
- date-range filters
- The trust problem remains: the event table still showed `0 events loaded` after extensive product use in this session.
- Doctor Diagnostics is one of the clearest examples of a real, valuable action path:
- `Quick Check` ran successfully
- result set was not a no-op
- output showed `7 passed / 1 warnings / 1 failed / 5 skipped / 14 total / 1.1s`
- This is important because it proves at least some cross-service operator workflows are wired and useful in local mode.
- Replay & Verify still has a real action-level problem:
- the page surfaces a visible `Start Verification` button
- both locator-based and text-based attempts found the control
- the button remained effectively unclickable because the element sat outside the usable viewport/drawer geometry
- That looks less like a missing feature and more like a layout or interaction defect in the quick-verify panel itself.
- The 403 and 404 recovery pages behave well once authenticated:
- `Go to Dashboard` from `/console/profile` returns to Mission Control
- `Go to Dashboard` from `/nonexistent` does the same
### Integrations, releases, and promotions hold up much better once actions are pushed
- The registry onboarding flow is stronger than the earlier route-only pass suggested.
- The wizard starts directly in `Connection & Credentials`, even though the stepper still shows:
- `1 Provider`
- `2 Connection`
- `3 Scope`
- `4 Schedule`
- `5 Preflight`
- `6 Review`
- That step numbering is slightly confusing because the provider choice is implicit rather than visibly completed.
- After filling:
- endpoint
- `AuthRef URI`
- project / namespace
the `Next` button enabled and advanced successfully into `Discovery Scope`.
- Discovery Scope uses a good guardrail:
- repositories / namespaces / tag patterns are explicit
- creation is blocked until at least one scope is defined
- Release creation also works once the form is driven by actual input placeholders instead of label-based automation:
- filling release name, version, and target environment advanced from `Basic Info` to `Components`
- this confirms the earlier failure was mostly selector/accessibility related, not a broken route
- Promotion creation is similarly real: entering a release ID and clicking `Load Target Environments` advanced the flow to `Select Region and Environment Path`.
- That means the release and promotion shells are not just static demos; they can be driven into later steps by a fresh operator.
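
The stepper confusion noted above (the wizard opens on `Connection & Credentials` while `1 Provider` shows no completion state) could be addressed by deriving each step's status from the current index, so that implicitly satisfied steps render as done. This is a hedged sketch: the step labels come from the report, while `StepStatus`, `StepView`, and `deriveStepper` are hypothetical names and not the wizard's actual implementation:

```typescript
type StepStatus = "done" | "current" | "pending";

interface StepView {
  index: number; // 1-based, as displayed in the stepper
  label: string;
  status: StepStatus;
}

// Labels as observed in the registry onboarding wizard.
const REGISTRY_STEPS = ["Provider", "Connection", "Scope", "Schedule", "Preflight", "Review"];

// Every step before the current one is rendered as done, including a step
// that was satisfied implicitly (such as a preselected provider).
function deriveStepper(labels: string[], currentIndex: number): StepView[] {
  return labels.map((label, i) => {
    const index = i + 1;
    const status: StepStatus =
      index < currentIndex ? "done" : index === currentIndex ? "current" : "pending";
    return { index, label, status };
  });
}

// The wizard was observed starting on step 2.
const stepperView = deriveStepper(REGISTRY_STEPS, 2);
console.log(stepperView.map((s) => `${s.index} ${s.label}: ${s.status}`).join("\n"));
```

Deriving status from a single index keeps the stepper and the visible panel from disagreeing about where the operator is.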
### Topology and policy are more operational than the initial dashboard impression
- Topology Overview -> `Open Regions & Environments` works and lands on a usable region-first inventory page.
- The topology map's zoom and reset controls exist, but they do not all respond consistently:
- `+` worked
- `Reset` worked
- `-` was not reliably clickable by visible text
- That suggests an accessibility or control-labeling issue, not a dead feature.
- Agent Fleet still has a control problem:
- `Groups` and `All Agents` are visible
- `All Agents` was not interactable in the action pass
- Policy studio top-level navigation works well:
- `Governance`
- `Simulation`
- `Release Gates`
- Governance sub-tabs also work and expose meaningful content:
- `Validator`
- `Playground`
- `Audit`
- The Governance page is more mature than it first appears. It exposes:
- current risk-budget utilization (`72%`)
- contributors
- alerts
- budget acknowledgements
- Simulation sub-tabs are also real:
- `Console`
- `Lint`
- `Coverage`
- `Promotion Gate`
- Most simulation actions do real work, with one exception:
- `Run Lint` returned a concrete lint result set with `1 error / 1 warning / 1 info`
- `Run Tests` on Coverage did not materially change visible counters, which makes the action feel more like a refresh over seeded data
- `Check Requirements` on Promotion Gate returned a real blocked state with actionable reasons
- Policy Lint is especially strong. It produced specific issues and suggested fixes, including:
- missing explicit shadow target environment
- coverage floor below the recommended baseline
- operator override missing evidence reference
- Policy Promotion Gate is also credible:
- blocked because shadow mode is not active
- coverage passed
- lint still has a blocking issue
- security review passed
- override path exists, with required reason capture
- Policy Simulation Console can be made to run if the operator provides actual input:
- select policy pack
- select SBOM
- set environment
- After that, `Run Simulation` succeeded and returned:
- `Completed 187ms`
- `4 Total Findings`
- `1 deny`
- `3 warn`
- one changed component under diff-versus-active policy
- Release Gates is less cohesive at the action layer:
- clicking `Reachability` jumps out of the policy shell into `Security / Reachability`
- `VEX` and `Freshness` are not directly clickable from the visible gate summary
- That cross-shell jump can be justified, but it weakens the "one operator shell" promise stated on the Policy overview.
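
The Promotion Gate result described above can be modeled as a set of named checks plus an override that requires a reason. This is a minimal sketch under stated assumptions: the check names mirror the report, but `GateCheck`, `GateResult`, `evaluateGate`, and the override shape are hypothetical and do not describe Stella's actual policy engine:

```typescript
interface GateCheck {
  name: string;
  passed: boolean;
  reason?: string; // shown to the operator when the check blocks
}

interface GateResult {
  allowed: boolean;
  overridden: boolean;
  blockingReasons: string[];
}

function evaluateGate(checks: GateCheck[], overrideReason?: string): GateResult {
  const blockingReasons = checks
    .filter((c) => !c.passed)
    .map((c) => `${c.name}: ${c.reason ?? "failed"}`);
  if (blockingReasons.length === 0) {
    return { allowed: true, overridden: false, blockingReasons };
  }
  // Assumption: an override is only honored when a non-empty reason is captured.
  const hasReason = overrideReason !== undefined && overrideReason.trim().length > 0;
  return { allowed: hasReason, overridden: hasReason, blockingReasons };
}

// Check outcomes as reported for the Promotion Gate in this session.
const observedChecks: GateCheck[] = [
  { name: "Shadow mode", passed: false, reason: "shadow mode is not active" },
  { name: "Coverage", passed: true },
  { name: "Lint", passed: false, reason: "blocking issue present" },
  { name: "Security review", passed: true },
];

console.log(evaluateGate(observedChecks)); // blocked, with two reasons
console.log(evaluateGate(observedChecks, "  ")); // still blocked: the reason is empty
```

Returning the blocking reasons even when an override is applied keeps the evidence trail intact for later audit.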
### Action realism is now easier to separate from seeded behavior
- Actions that felt genuinely live in this session:
- Doctor `Quick Check`
- Export Center `Run Now`
- registry wizard progression
- release wizard progression
- promotion target loading
- policy lint
- policy promotion requirement check
- policy simulation console run
- Actions that still feel seeded, blocked, or trust-ambiguous:
- audit log event counts
- replay verification CTA geometry
- policy coverage rerun
- parts of the dashboard and reachability data
---
## Cross-Cutting UX / Product Findings
### FRESH-01 - Stella hides its adoption story behind docs instead of leading with it
Severity: HIGH
What a new technical user needs to learn immediately:
- what part of the current CI/CD pipeline stays in CI
- what Stella takes over
- how to start if they deploy with Compose, hosts, or scripts
Where Stella explains this well:
- `/docs`
Where Stella should explain this but does not:
- `/welcome`
- first authenticated landing on `/mission-control/board`
### FRESH-02 - Context is not trustworthy enough across shells
Severity: HIGH
Problems observed:
- some routes keep tenant/region query context
- some silently drop it
- some shift topbar state from concrete scope to `All regions / No env defined yet`
- policy indicator changes between `Core Policy Pack latest` and `No baseline`
Impact:
- makes deep links and the operator's mental model of scope unreliable
- especially damaging for evidence, ops, and admin surfaces
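
One way to make scope survive navigation is to copy a fixed set of context parameters onto every in-app link. The sketch below assumes query parameter names `tenant`, `regions`, and `environments`; the report only states that tenant/region query context exists, so those names and the `withScope` helper are hypothetical:

```typescript
// Assumed parameter names; the real shell may use different keys.
const SCOPE_KEYS = ["tenant", "regions", "environments"];

// Returns targetPath with the scope parameters of currentSearch applied,
// without overwriting parameters the target link already sets.
function withScope(targetPath: string, currentSearch: string): string {
  const q = targetPath.indexOf("?");
  const path = q === -1 ? targetPath : targetPath.slice(0, q);
  const target = new URLSearchParams(q === -1 ? "" : targetPath.slice(q + 1));
  const current = new URLSearchParams(currentSearch);
  for (const key of SCOPE_KEYS) {
    const value = current.get(key);
    if (value !== null && !target.has(key)) {
      target.set(key, value);
    }
  }
  const merged = target.toString();
  return merged ? `${path}?${merged}` : path;
}

console.log(withScope("/evidence/audit-log", "?tenant=demo-prod&regions=us-east"));
// /evidence/audit-log?tenant=demo-prod&regions=us-east
```

Letting the target link's own parameters win keeps explicit deep links authoritative while still preventing silent scope loss.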
### FRESH-03 - Broken setup/security pages appear too early in the journey
Severity: HIGH
Broken or partially broken pages a new operator is likely to visit early:
- Identity Providers
- Risk
- Unknowns
- VEX
- Console Admin Tenants
Impact:
- creates the impression of unfinished product edges before the user reaches Stella's strongest capabilities
### FRESH-04 - Seed realism is a double-edged sword
Severity: HIGH
Positive:
- avoids a dead empty product
- helps pages look complete
Negative:
- makes it hard to know what Stella actually did during this session
- blurs the difference between seeded narrative and earned evidence
### FRESH-05 - Several pages mutate URL after initial render
Severity: MEDIUM
Affected route families include:
- integrations
- topology
- security
- operations
Likely cause:
- late context/state injection or route normalization
Impact:
- makes the product feel less deterministic than its messaging promises
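
Late URL mutation can be measured in QA by comparing the URL at first render with the URL once the page settles, ignoring parameter order. This is a hedged sketch: `diffUrl` and its helpers are hypothetical, and the sample URLs and parameter names are illustrative rather than captured from the session:

```typescript
interface UrlDiff {
  pathChanged: boolean;
  added: string[]; // query keys present only after settling
  removed: string[]; // query keys dropped after settling
}

function splitUrl(url: string): { path: string; params: URLSearchParams } {
  const q = url.indexOf("?");
  return q === -1
    ? { path: url, params: new URLSearchParams() }
    : { path: url.slice(0, q), params: new URLSearchParams(url.slice(q + 1)) };
}

// Collects distinct query keys in first-seen order.
function keysOf(params: URLSearchParams): string[] {
  const keys: string[] = [];
  params.forEach((_value, key) => {
    if (keys.indexOf(key) === -1) keys.push(key);
  });
  return keys;
}

function diffUrl(initial: string, settled: string): UrlDiff {
  const a = splitUrl(initial);
  const b = splitUrl(settled);
  return {
    pathChanged: a.path !== b.path,
    added: keysOf(b.params).filter((k) => !a.params.has(k)).sort(),
    removed: keysOf(a.params).filter((k) => !b.params.has(k)).sort(),
  };
}

// Illustrative case: the path is normalized and a scope parameter is injected late.
console.log(diffUrl("/setup/topology", "/setup/topology/overview?tenant=demo-prod"));
```

Recording this diff per route family would turn the observation above into a trackable count rather than an impression.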
### FRESH-06 - Some surfaces overuse Stella-internal wording
Severity: MEDIUM
Examples:
- Decisioning Studio
- canonical shell
- digest-first identity
- mutable VEX actions
- release-control plane
These terms are defensible after onboarding, but they arrive too early and too often for a first evaluator.
---
## Prioritized Backlog From This Session
### Critical / Immediate
1. Fix `vexhub` startup migration so local runtime reaches clean ready state.
2. Repair `/security/risk` so the page is not built on 404s.
3. Repair `/security/unknowns` or replace it with a truthful disabled/empty state.
4. Repair `/setup/identity-providers` so first-time admin setup does not hit a 403 dead end.
5. Fix or intentionally hide broken `/console-admin/tenants`, `/console-admin/clients`, and `/console-admin/tokens` loading states.
### High
1. Rewrite `/welcome` around adoption outcomes, not Stella-brand phrasing.
2. Make the dashboard teach the next action explicitly for first-time users.
3. Normalize tenant/region/policy context behavior across Evidence, Ops, Admin, Docs, and Triage.
4. Make seeded/demo content visibly labeled, or switch more surfaces to an honest empty state.
5. Explain scan simulation or queue semantics directly on `/security/scan` while local engines are incomplete.
6. Ensure audit log reflects actual local actions or clearly explain why it does not.
7. Fix the Replay & Verify quick-verify button so the visible CTA is actually clickable.
8. Clarify whether the shell search and `Ctrl+K` are route navigation, knowledge search, or AI answer surfaces.
9. Make deep-link scope hydration consistent on release and promotion routes.
### Medium
1. Reduce late URL mutation where possible.
2. Make `/security/findings` more legible for first-time use.
3. Sharpen the distinction between Security Posture, Security Reports, and Evidence Export.
4. Reduce duplicate or inconsistent topbar state such as `Policy: No baseline` versus `Core Policy Pack latest`.
5. Clean up low-fidelity details like the duplicated title on Data Integrity.
6. Make region selection semantics explicit; the control currently behaves like a multi-select toggle but reads like a single picker.
7. Fix the stage selector so choosing `Prod` or `Stage` actually applies to shell state.
8. Make the registry wizard's implicit provider step less confusing or reflect the true current step.
9. Ensure the Agent Fleet `All Agents` view switch is keyboard- and click-accessible.
10. Decide whether policy coverage reruns are computed live or fixture-backed and label that behavior clearly.
---
## Session Verdict
Stella Ops has enough depth that the product still looks serious after a much larger investigation. The strongest areas are:
- Setup
- Integrations
- Release Management
- Evidence / Replay
- Operations Doctor
- Docs
But the first-run experience still has too many trust-breaking edges for a fresh evaluator:
- startup not clean
- critical setup/security pages broken
- context drift across shells
- scan flow not closing
- audit story not matching real activity
- ambiguity about which data is real versus seeded
This deeper pass now has action-level evidence behind it. Stella is not failing because it lacks surface area; it is failing the first-time evaluation because too many of its genuinely strong workflows are hidden behind startup defects, inconsistent scope handling, and a few key trust-breaking action gaps. When the actions do work, the strongest parts of Stella are now clear:
- registry onboarding
- release and promotion progression
- export generation
- doctor diagnostics
- policy linting and promotion-readiness checks
- policy simulation with actual result output

@@ -0,0 +1,118 @@
# Stella Ops Route Map (2026-03-17)
## GROUP 1: Release Control
- `/mission-control/board` — Dashboard
- `/mission-control/alerts` — Mission Alerts
- `/mission-control/activity` — Mission Activity
- `/releases/overview` — Release Ops Overview
- `/releases/versions` — Release Versions list
- `/releases/versions/new` — Create Release wizard (4 steps)
- `/releases/versions/:versionId` — Version Detail (Tabs: Overview, Artifacts, Deployments, Security Inputs, Evidence)
- `/releases/runs` — Release Runs
- `/releases/runs/:runId/{summary|graph|timeline|critical-path|replay|evidence}` — Run Workspace tabs
- `/releases/health` — Release Health
- `/releases/deployments` — Deployment History
- `/releases/deployments/:deploymentId` — Deployment Detail
- `/releases/promotions` — Promotions list
- `/releases/promotions/create` — Create Promotion
- `/releases/promotions/:promotionId` — Promotion Detail
- `/releases/approvals` — Approvals inbox
- `/releases/approvals/:id` — Approval Detail (Tabs: Overview, Gates, Security, Reachability, Ops/Data, Evidence, Replay, History)
- `/releases/hotfixes` — Hotfixes queue
- `/releases/hotfixes/:hotfixId` — Hotfix Detail
- `/releases/bundles` — Bundle catalog
- `/releases/bundles/create` — Bundle Builder
- `/releases/bundles/:bundleId` — Bundle Detail
- `/releases/bundles/:bundleId/versions/:versionId` — Bundle Version Detail
- `/releases/investigation/timeline` — Investigation Timeline
- `/releases/investigation/deploy-diff` — Deploy Diff
- `/releases/investigation/change-trace` — Change Trace
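
The `{a|b|c}` shorthand and `:param` segments used in this map can be expanded and matched mechanically, which is useful when checking that a visited URL belongs to a documented route. This is a small sketch, assuming the brace group is shorthand for alternative segments; `expandRoute` and `matchRoute` are hypothetical helpers, not the application's router, and `run-42` is an invented identifier:

```typescript
// Expands a single {a|b|c} group into one pattern per alternative.
function expandRoute(pattern: string): string[] {
  const m = /\{([^}]+)\}/.exec(pattern);
  if (m === null) return [pattern];
  const before = pattern.slice(0, m.index);
  const after = pattern.slice(m.index + m[0].length);
  return m[1].split("|").map((alt) => before + alt + after);
}

// Matches a concrete path against a pattern with :param segments.
// Returns the captured parameters, or null when the path does not match.
function matchRoute(pattern: string, path: string): Record<string, string> | null {
  const pSegs = pattern.split("/");
  const uSegs = path.split("/");
  if (pSegs.length !== uSegs.length) return null;
  const params: Record<string, string> = {};
  for (let i = 0; i < pSegs.length; i++) {
    if (pSegs[i].charAt(0) === ":") {
      if (uSegs[i] === "") return null; // a parameter must not be empty
      params[pSegs[i].slice(1)] = uSegs[i];
    } else if (pSegs[i] !== uSegs[i]) {
      return null;
    }
  }
  return params;
}

const runTabs = expandRoute(
  "/releases/runs/:runId/{summary|graph|timeline|critical-path|replay|evidence}"
);
console.log(runTabs.length); // 6
console.log(matchRoute(runTabs[1], "/releases/runs/run-42/graph")); // runId captured as "run-42"
```
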
## GROUP 2: Security
- `/triage/artifacts` — Artifact Workspace (Vulnerabilities)
- `/triage/artifacts/:artifactId` — Artifact Detail
- `/triage/audit-bundles` — Audit Bundles
- `/security` — Security Posture Overview
- `/security/triage` — Security Triage Chat
- `/security/findings` — Findings Explorer
- `/security/findings/:findingId` — Finding Detail
- `/security/advisories-vex` — Advisories & VEX
- `/security/disposition` — Disposition Center
- `/security/supply-chain-data` — Supply-Chain Data
- `/security/scan` — Scan Image
- `/security/scan-policies` — Scan Policies
- `/security/reports` — Security Reports
- `/security/vulnerabilities` — Vulnerabilities Explorer
- `/security/vulnerabilities/:cveId` — Vulnerability Detail
- `/security/scans/:scanId` — Scan Detail
- `/security/sbom` — SBOM Graph
- `/security/sbom-lake` — SBOM Lake
- `/security/risk` — Risk Overview
- `/security/reachability` — Reachability Center (Sub: coverage, witnesses, poe, gaps)
- `/security/unknowns` — Unknowns Dashboard
- `/security/unknowns/:unknownId` — Unknown Detail
- `/security/unknowns/queue/grey` — Grey Queue
- `/security/lineage` — Lineage Graph
- `/security/advisory-sources` — Advisory Sources
- `/security/patch-map` — Patch Map
- `/security/symbol-sources` — Symbol Sources
- `/security/symbol-marketplace` — Symbol Marketplace
- `/security/remediation` — Remediation Browse
- `/security/secret-detection` — Secret Detection (Sub: settings, findings, exceptions)
## GROUP 3: Operations
- `/ops/operations` — Operations Hub
- `/ops/operations/jobengine` — JobEngine Dashboard (Sub: jobs, quotas)
- `/ops/operations/scheduler` — Scheduler (Sub: runs, schedules, workers)
- `/ops/operations/signals` — Signals Runtime
- `/ops/operations/environments` — Environments
- `/ops/operations/doctor` — Diagnostics
- `/ops/operations/notifications` — Notifications
- `/ops/operations/feeds-airgap` — Feeds & Airgap
- `/ops/operations/offline-kit` — Offline Kit (Sub: dashboard, bundles, verify, jwks)
- `/ops/operations/event-stream` — Event Stream Monitor
- `/ops/operations/data-integrity` — Data Integrity (Sub: nightly-ops, feeds-freshness, scan-pipeline, reachability-ingest, integration-connectivity, dlq, slos)
- `/ops/operations/health-slo` — Health & SLO (Sub: services, incidents)
- `/ops/operations/quotas` — Quotas (Sub: tenants, throttle, alerts, forecast, reports)
- `/ops/operations/dead-letter` — Dead Letter (Sub: queue, entry detail)
- `/ops/operations/aoc` — AOC Compliance (Sub: violations, provenance, ingestion, report)
- `/ops/operations/packs` — Pack Registry
- `/ops/operations/ai-runs` — AI Runs
- `/ops/policy` — Policy Decisioning (Tabs: Overview, Packs, Governance, Simulation, VEX & Exceptions, Release Gates, Audit)
- `/ops/integrations` — Integrations Hub (Sub: registries, scm, ci, runtime-hosts, advisory-vex-sources, secrets, notifications, sbom-sources, activity, registry-admin)
- `/ops/scanner-ops` — Scanner Ops (Sub: offline-kits, baselines, settings, analyzers, performance)
## GROUP 4: Audit & Evidence
- `/evidence` — Evidence Overview
- `/evidence/threads` — Evidence Threads
- `/evidence/capsules` — Decision Capsules
- `/evidence/verify-replay` — Replay & Verify
- `/evidence/proofs` — Proof Chains
- `/evidence/exports` — Export Center (Sub: bundles, replay, proof-chains, provenance)
- `/evidence/audit-log` — Audit Log Dashboard (Sub: events, timeline, correlations, anomalies, export, policy, authority, vex, integrations)
- `/evidence/workspaces/auditor` — Auditor Workspace
- `/evidence/workspaces/developer` — Developer Workspace
## GROUP 5: Setup & Admin
- `/setup` — Setup Overview
- `/setup/topology` — Topology (Sub: overview, map, regions, environments, targets, hosts, agents, connectivity, runtime-drift, promotion-graph, workflows, gate-profiles, readiness, pending-deletions)
- `/setup/integrations` — Integrations Hub (mirrors /ops/integrations)
- `/setup/identity-access` — Identity & Access (Tabs: Users, Roles, Clients, Tokens, Tenants)
- `/setup/identity-providers` — Identity Providers
- `/setup/trust-signing` — Trust & Signing (Tabs: Overview, Keys, Issuers, Certificates, Watchlist, Audit, Airgap, Incidents, Analytics)
- `/setup/tenant-branding` — Tenant & Branding
- `/setup/system` — System Settings
- `/setup/notifications` — Notifications (Sub: rules, channels, templates, delivery, simulator, config)
- `/setup/usage` — Usage & Limits
- `/setup/security-data` — Security Data Settings
- `/setup/workflows` — Workflows
- `/setup/ai-preferences` — AI Preferences
- `/setup/configuration-pane` — Configuration Pane
- `/setup/offline` — Offline Settings
- `/console-admin` — Console Admin (Sub: tenants, users, roles, clients, tokens, audit, branding)
## Utility Routes
- `/settings/user-preferences` — User Preferences
- `/docs/**` — Documentation Viewer
- `/welcome` — Welcome Page
- `/auth/callback` — OIDC Callback