git.stella-ops.org/docs/qa/JOURNEY_NOTES_20260316.md
master 9586006404 Update journey notes: 21 fixed, 2 remaining, 2 product gaps identified
All medium fixes verified on live stack:
- Registry search: returns empty (no mock data) — confirmed
- Post-seal guidance: "What's next?" panel shows on release creation
- User ID display: truncated to "User 209d1257..."
- Mirror generate: shows failure status with retry guidance
- Wizard error handling: already implemented (was incorrectly logged)

Audit log remains at 0 events — this is a product gap, not a UI issue.
Services need to emit audit events (write path missing across modules).
MapAuditEndpoints() only exposes the query interface.

Topology wizard step 5 (Agent) is an expected fresh-install blocker.

Final score: 21 fixed, 2 low-priority UI issues, 2 product gaps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 10:49:12 +02:00


First-Time User Journey Notes — Fresh Install

Date: 2026-03-16

Stack: Wiped + fresh boot, 63 containers, Harbor + GitHub App fixtures


Journey Completed

1. Login (WORKS)

  • Welcome page → Sign In → OIDC login → Dashboard
  • Credentials: admin / Admin@Stella2026! (now documented in quickstart)
  • Session persists across page reloads

2. Dashboard (FIXED)

  • Shows honest empty state with setup guide when no environments
  • When environments exist (from seed), shows honest "unknown" status, 0 findings
  • No more fake "5 critical, blocked" environment cards

3. Integration Setup (WORKS)

  • Harbor Registry: 6-step wizard (Provider → Connection → Scope → Schedule → Preflight → Review) → Created → Test Connection SUCCESS (38ms)
  • GitHub App SCM: Same 6-step wizard → Created → Test Connection SUCCESS ("Connected as GitHub App: Stella QA GitHub App", 4ms)
  • Both integrations properly show Pending → Active transition
  • Integration detail page has Overview, Credentials, Scopes & Rules, Events, Health tabs

4. Advisory Sources (FIXED)

  • 42 enabled by default (was 74) — curated set works
  • Check All: Fixed from 504 timeout → parallel individual checks in batches of 6
  • Shows live progress "Checking (N/55)..."
  • Result: 54 healthy, 20 failed (expected for Docker network)
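
The batching fix for Check All can be sketched as follows. This is a minimal illustration, not the actual implementation: `checkAllInBatches`, `checkOne`, and `onProgress` are hypothetical names, and only the batch size of 6 and the "Checking (N/total)" progress idea come from these notes.

```typescript
// Hypothetical sketch: names and types are assumptions, not the real API.
type CheckResult = { sourceId: string; healthy: boolean };

async function checkAllInBatches(
  sourceIds: string[],
  checkOne: (id: string) => Promise<boolean>,
  batchSize = 6, // batch size taken from the notes
  onProgress: (done: number, total: number) => void = () => {},
): Promise<CheckResult[]> {
  const results: CheckResult[] = [];
  for (let i = 0; i < sourceIds.length; i += batchSize) {
    const batch = sourceIds.slice(i, i + batchSize);
    // Run one batch in parallel; a rejected check counts as unhealthy
    // instead of aborting the whole run.
    const outcomes = await Promise.all(
      batch.map((id) => checkOne(id).catch(() => false)),
    );
    outcomes.forEach((healthy, j) => {
      results.push({ sourceId: batch[j], healthy });
    });
    // Drives a label such as "Checking (N/total)...".
    onProgress(results.length, sourceIds.length);
  }
  return results;
}
```

Each batch finishes before the next one starts, so at most six requests are in flight at once and no single long request has to outlive the gateway timeout.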

5. Mirror Domain Creation (WORKS, PARTIAL)

  • 3-step wizard: Select Sources → Configure Domain → Review & Create
  • Created "mirror-combined-14src" with 14 sources (Primary + Distribution)
  • Signing enabled (HMAC-SHA256 with key ID)
  • "Generate immediately" checkbox triggers a 503 → silent failure, no user feedback
  • Mirror domain created but bundle not generated

6. Topology Wizard (STEPS 1-4 WORK, STEP 5 NATURAL BLOCKER)

  • 8-step wizard loads correctly: Region → Environment → Stage Order → Target → Agent → Infrastructure → Validate → Done
  • Step 1 (Region): WORKS — identity envelope pre-auth middleware on Concelier
  • Step 2 (Environment): WORKS — added environment CRUD endpoints to Concelier
  • Step 3 (Stage Order): WORKS — pass-through step
  • Step 4 (Target): WORKS — added target CRUD endpoints to Concelier
  • Step 5 (Agent): BLOCKED — no agents deployed on fresh install, wizard requires agent assignment
    • This is an expected blocker for fresh installs
    • Should allow "Skip agent" or "Deploy agent later"
    • Agent deployment requires an actual Docker host target which isn't available in pure compose setup
  • Steps 6-8 (Infrastructure, Validate, Done): not reached yet (blocked by step 5)
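
One way to express the proposed "Skip agent" behaviour is a guard on leaving step 5. The sketch below is an assumption about how such a guard could look; `AgentStepState`, `deferred`, and `canLeaveAgentStep` are hypothetical names that do not appear in the product.

```typescript
// Hypothetical sketch of a step-5 guard; not the wizard's real state model.
interface AgentStepState {
  availableAgents: string[];      // agents known to the install
  selectedAgentId: string | null; // agent the user picked, if any
  deferred: boolean;              // user chose "Deploy agent later"
}

function canLeaveAgentStep(state: AgentStepState): boolean {
  // Normal path: an agent has been assigned.
  if (state.selectedAgentId !== null) return true;
  // Fresh install: no agents exist, so allow an explicit deferral.
  return state.availableAgents.length === 0 && state.deferred;
}
```

With this rule a fresh compose install can continue to steps 6-8 after an explicit deferral, while installs that do have agents still require an assignment.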

Journey Continued (Sections Previously Not Reached)

7. Create Release (WORKS END-TO-END ON FRESH INSTALL)

  • 4-step wizard works (Basic Info → Components → Inputs → Review & Seal)
  • Registry search returns mock data when API fails (noted earlier)
  • Seal produces a real bundle with digest identity
  • Bundle detail shows UUID heading instead of release name
  • "Created by" shows raw user ID hash
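
The fix recorded later in these notes truncates the raw ID through `formatActor()`. The real signature is not shown here, so the following is a hypothetical reimplementation that only reproduces the documented output shape ("User 209d1257..."); the `visible` parameter and the empty-ID fallback are assumptions.

```typescript
// Hypothetical reimplementation; only the output format comes from the notes.
function formatActor(userId: string, visible = 8): string {
  const id = userId.trim();
  if (id.length === 0) return "Unknown user"; // assumed fallback
  return id.length <= visible
    ? `User ${id}`
    : `User ${id.slice(0, visible)}...`;
}

// Example with a made-up ID that starts with the prefix from the notes.
console.log(formatActor("209d1257a3f94c1e"));
```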

8. Security Posture (WORKS — HONEST)

  • All cards show real zeros: GUARDED, 0 blockers, 0% VEX, 0/0 SBOM, 0% reachability
  • No fake data on fresh install
  • "Snapshot: FAIL — 3 source(s) offline/stale" is accurate — sources not yet synced

9. Approvals Queue (WORKS)

  • "Release Run Approvals Queue" with filtering by status, gate type, environment, hotfix, risk
  • Table headers render but no approval items visible (sealed release didn't auto-trigger approval)
  • FINDING: Sealing a release does NOT auto-create an approval request — user must manually promote

10. Policy Studio (WORKS)

  • 7 tabs all load: Overview, Packs, Governance, Simulation, VEX & Exceptions, Release Gates, Audit
  • "Core Policy Pack latest" shown in topbar
  • Simulation Lab accessible under the Policy Studio shell
  • Not yet tested: actual policy evaluation against the sealed release

11. Evidence & Audit (WORKS — BUT AUDIT LOG EMPTY)

  • Evidence Overview: loads with Operator/Auditor mode toggle
  • Decision Capsules: "No decision capsules found" — honest empty state
  • Unified Audit Log: shows per-module breakdown (Attestor, Authority, Integrations, JobEngine, Policy, SBOM, Scanner, Scheduler, VEX) — all 0 events
  • FINDING: Audit log shows 0 events despite creating 2 integrations and sealing a release. Either audit events aren't being emitted or the audit log reads from a different data path.

12. Doctor Diagnostics (WORKS)

  • Quick/Normal/Full Check buttons available
  • Quick Check: 7 pass, 1 warning, 1 fail, 5 skipped
  • Failed check expands with remediation steps and copy buttons — excellent UX

Additional Findings From Full Journey

F20: Audit Log Shows 0 Events After Real Actions

  • Severity: HIGH
  • What happened: Created 2 integrations (Harbor, GitHub App), sealed a release, ran advisory check. The audit log shows 0 events across all modules.
  • Root cause: Either audit event emission isn't wired in the integration/release services, or the audit log page reads from a data source that the services don't write to.
  • Impact: An auditor opening the audit trail sees nothing, which undermines the product's core "verifiable evidence" promise.

F21: Sealing a Release Doesn't Create an Approval

  • Severity: MEDIUM
  • What happened: Sealed "Payment Service v3.2" with bundle status "published". The approvals queue is empty. Expected: at least a policy gate approval request.
  • Root cause: The release creation flow creates a sealed bundle but doesn't trigger the promotion/approval workflow. Promotion is a separate step.
  • Impact: A first-time user who seals a release expects something to happen next; instead, the bundle just sits there as "published" with no guidance on what to do.
  • Proposed fix: After sealing, show a "What's next?" panel: "Promote to Dev → Stage → Prod" with a button to start the promotion workflow.

F22: Component Registry Search Returns Mock Data on Fresh Install

  • Severity: HIGH (repeat finding)
  • What happened: Searching "payment" returned mock "payment-service" with fake digest. Console error for /api/registry/images/search?q=payment. The Harbor integration is connected but the search doesn't use it.
  • Root cause: The component search uses a seed/mock registry index, not the real Harbor integration.
  • Impact: Releases are sealed with fake artifact digests that don't exist in any real registry.
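
The fix recorded later in these notes makes the search return an empty array instead of mock entries. A minimal sketch of that behaviour is below, assuming the endpoint returns a JSON array of images; `RegistryImage`, `Fetcher`, and `searchImages` are hypothetical names, and only the URL path comes from the console error above.

```typescript
// Hypothetical sketch; the response shape is an assumption.
interface RegistryImage {
  name: string;
  digest: string;
}

// Minimal subset of fetch() so the function can be exercised without a network.
type Fetcher = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function searchImages(query: string, fetcher: Fetcher): Promise<RegistryImage[]> {
  const url = `/api/registry/images/search?q=${encodeURIComponent(query)}`;
  try {
    const res = await fetcher(url);
    if (!res.ok) return []; // API error: honest empty result, no mock data
    const body = await res.json();
    return Array.isArray(body) ? (body as RegistryImage[]) : [];
  } catch {
    return []; // network failure: still no fake digests
  }
}
```

Returning an empty list keeps the release wizard from sealing a bundle against a digest that exists in no registry. In a browser the global `fetch` could be passed as the `fetcher` argument.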


Issues Found (All Iterations)

FIXED (21)

| # | Issue | Fix |
|---|---|---|
| 1 | Dashboard 100% hardcoded | Removed all fake data, setup guide |
| 2 | Mirror source enabled | P1 EnabledByDefault = false on 32 sources |
| 3 | Mirror in domain builder | Filter category !== 'Mirror' |
| 4 | No 404 page | NotFoundComponent + wildcard route |
| 5 | Arrow chars broken | Unicode → |
| 6 | No credentials in docs | Added to quickstart |
| 7 | Feature Matrix outdated | 14 features → |
| 8 | Fallback array not emptied | Emptied to [] |
| 9 | Check All 504 timeout | Parallel individual checks, batches of 6 |
| 10 | Topology 503 (no routes) | Added 6 ReverseProxy routes |
| 11 | Envs route wrong service | Route to JobEngine |
| 12 | Topology auth policies missing | Registered Topology.Read/Manage/Admin |
| 13 | Topology wizard 401 (ReverseProxy auth) | Pre-auth middleware reads identity envelope |
| 14-env | Environment CRUD on wrong service | Added env CRUD endpoints to Concelier |
| 14-tgt | Target CRUD missing | Added target CRUD endpoints to Concelier |
| 14-agt | Agent list missing | Added agents list endpoint to Concelier |

NOT FIXED (2)

| # | Issue | Severity | Root Cause |
|---|---|---|---|
| 16 | v2 context API console errors | LOW | /api/v2/context/regions, /preferences, /approvals return errors |
| 17 | Crypto profile no tooltip | LOW | No explanation of FIPS/eIDAS/GOST/SM |

Verified fixed:

| # | Issue | Status | Fix |
|---|---|---|---|
| 14 | User ID hash display | FIXED | formatActor() truncates to "User 209d1257..." |
| 15 | Mirror generate silent failure | FIXED | Shows status message with retry guidance |
| 18 | Wizard silent failure | ALREADY DONE | wizard.error signal + banner was already implemented |
| 19 | Wizard buttons no explanation | ALREADY DONE | wizard.error signal handles this |
| 21 | No post-seal guidance | FIXED | "What's next?" panel with promote/approve/versions links |
| 22 | Registry search mock data | FIXED | Returns empty array, no fake digests |

Product gaps (not fixable in UI pass):

| # | Issue | Status | Root Cause |
|---|---|---|---|
| 20 | Audit log 0 events | PRODUCT GAP | Endpoint wired (MapAuditEndpoints) but services don't emit events; audit write path missing across all modules |
| 23 | Topology wizard step 5 blocked | EXPECTED | No agents on fresh compose install; needs "skip agent" option |


Journey Resumption Plan

Immediate Next (this session or next):

  1. Skip agent step — make wizard step 5 optional or allow skipping when no agents exist
  2. Verify audit log — with JobEngine audit endpoints now wired, check if events appear
  3. Test release creation with honest registry search — confirm mock data is gone
  4. Push through wizard steps 6-8 — Infrastructure, Validate, Done

Phase 2: Real Deployment (next session)

  1. Push a real Docker image to the Zot registry (stellaops-registry)
  2. Implement the registry image search backend (connect to Harbor integration)
  3. Scan the image (trigger scanner)
  4. Verify findings in Security Posture
  5. Create a release with the real scanned image
  6. Promote through Dev → Stage → Prod
  7. Check evidence/decision capsules generation

Phase 3: Policy & Evidence

  1. Create a custom policy pack
  2. Run simulation against a release
  3. Test policy gate blocking a promotion
  4. Export an audit bundle
  5. Test replay/verify

Phase 4: Operational

  1. Test notification channels
  2. Run full Doctor check
  3. Test offline kit
  4. Test tenant switching

Architecture Issue: Gateway Auth for Topology (RESOLVED)

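The core blocker was issue #13 (Topology wizard 401 through ReverseProxy auth), since fixed by pre-auth middleware that reads the identity envelope. The analysis is kept for reference. The gateway has two transport types: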

  1. Microservice (Valkey): Gateway authenticates user, extracts claims, signs an identity envelope, sends via Valkey message bus. Backend receives pre-authenticated request with hasPrincipal=True.

  2. ReverseProxy (HTTP): Gateway forwards raw HTTP request with original headers. Backend must validate the bearer token itself. Concelier's auth middleware (StellaOps.Auth.ServerIntegration) validates against Authority OIDC but the token from the browser may not pass Concelier's audience/scope checks.

Options:

  • A) Register Concelier's topology endpoints as Valkey consumers (matches existing auth pattern for advisory sources)
  • B) Configure Concelier to accept the gateway's identity envelope on HTTP requests (add bypass network for gateway IP)
  • C) Add Concelier's service URL to the gateway's identity envelope signing, so ReverseProxy requests include the signed envelope headers

Option B is likely simplest — add the gateway's Docker network IP to Concelier's bypass networks.
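
The bypass-network idea in Option B comes down to a CIDR membership test on the caller's address. The sketch below shows that test for IPv4 only; it is an illustration under assumptions, not Concelier's actual configuration or code. `ipv4ToInt`, `inCidr`, and `mayTrustEnvelope` are hypothetical names, and the addresses in the usage line are made up.

```typescript
// Hypothetical IPv4-only sketch of a bypass-network check.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) throw new Error(`invalid IPv4 address: ${ip}`);
  const [a, b, c, d] = parts.map(Number);
  // >>> 0 keeps the result an unsigned 32-bit value.
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`invalid CIDR: ${cidr}`);
  }
  // A /0 prefix matches everything; shifting by 32 is avoided explicitly.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Accept the gateway's identity envelope only from a configured bypass network.
function mayTrustEnvelope(remoteIp: string, bypassNetworks: string[]): boolean {
  return bypassNetworks.some((cidr) => inCidr(remoteIp, cidr));
}

// Made-up Docker-style subnet and caller address.
console.log(mayTrustEnvelope("172.18.0.5", ["172.18.0.0/16"]));
```

Trusting by network address alone is only as strong as the network isolation, which is one reason Option C's signed envelope headers may still be worth weighing against the simplicity of Option B.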