git.stella-ops.org/docs/qa/JOURNEY_NOTES_20260316.md
master 9586006404 Update journey notes: 21 fixed, 2 remaining, 2 product gaps identified
All medium fixes verified on live stack:
- Registry search: returns empty (no mock data) — confirmed
- Post-seal guidance: "What's next?" panel shows on release creation
- User ID display: truncated to "User 209d1257..."
- Mirror generate: shows failure status with retry guidance
- Wizard error handling: already implemented (was incorrectly logged)

Audit log remains at 0 events — this is a product gap, not a UI issue.
Services need to emit audit events (write path missing across modules).
MapAuditEndpoints() only exposes the query interface.

Topology wizard step 5 (Agent) is an expected fresh-install blocker.

Final score: 21 fixed, 2 low-priority UI issues, 2 product gaps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 10:49:12 +02:00


# First-Time User Journey Notes — Fresh Install
**Date**: 2026-03-16
**Stack**: Wiped + fresh boot, 63 containers, Harbor + GitHub App fixtures
---
## Journey Completed
### 1. Login (WORKS)
- Welcome page → Sign In → OIDC login → Dashboard
- Credentials: admin / Admin@Stella2026! (now documented in quickstart)
- Session persists across page reloads
### 2. Dashboard (FIXED)
- Shows honest empty state with setup guide when no environments
- When environments exist (from seed), shows honest "unknown" status, 0 findings
- No more fake "5 critical, blocked" environment cards
### 3. Integration Setup (WORKS)
- **Harbor Registry**: 6-step wizard (Provider → Connection → Scope → Schedule → Preflight → Review) → Created → Test Connection SUCCESS (38ms)
- **GitHub App SCM**: Same 6-step wizard → Created → Test Connection SUCCESS ("Connected as GitHub App: Stella QA GitHub App", 4ms)
- Both integrations properly show Pending → Active transition
- Integration detail page has Overview, Credentials, Scopes & Rules, Events, Health tabs
### 4. Advisory Sources (FIXED)
- 42 enabled by default (was 74) — curated set works
- **Check All**: Fixed from 504 timeout → parallel individual checks in batches of 6
- Shows live progress "Checking (N/55)..."
- Result: 54 healthy, 20 failed (expected for Docker network)
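
The batching described above can be sketched roughly as follows. This is a minimal illustration, not the actual fix: `checkAllInBatches`, `checkOne`, and `onProgress` are hypothetical names, and only the batch size of 6 comes from these notes.

```typescript
// Hypothetical types and names; only the batch size of 6 comes from the notes.
type CheckResult = { sourceId: string; healthy: boolean };

async function checkAllInBatches(
  sourceIds: string[],
  checkOne: (sourceId: string) => Promise<boolean>,
  batchSize: number = 6,
  onProgress: (done: number, total: number) => void = () => {},
): Promise<CheckResult[]> {
  const results: CheckResult[] = [];
  for (let start = 0; start < sourceIds.length; start += batchSize) {
    const batch = sourceIds.slice(start, start + batchSize);
    // Run one batch concurrently. A rejected check is recorded as unhealthy
    // instead of aborting the whole run.
    const batchResults = await Promise.all(
      batch.map((sourceId) =>
        checkOne(sourceId).then(
          (healthy) => ({ sourceId, healthy }),
          () => ({ sourceId, healthy: false }),
        ),
      ),
    );
    results.push(...batchResults);
    // Report progress after each batch, e.g. to render "Checking (N/total)...".
    onProgress(results.length, sourceIds.length);
  }
  return results;
}
```

Because each batch completes before the next one starts, at most `batchSize` checks are in flight at once, which keeps any single request short enough to avoid a gateway timeout.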
### 5. Mirror Domain Creation (WORKS, PARTIAL)
- 3-step wizard: Select Sources → Configure Domain → Review & Create
- Created "mirror-combined-14src" with 14 sources (Primary + Distribution)
- Signing enabled (HMAC-SHA256 with key ID)
- "Generate immediately" checkbox triggers a 503 → **silent failure, no user feedback** (fixed later as issue #15: now shows a failure status with retry guidance)
- Mirror domain created but bundle not generated
### 6. Topology Wizard (STEPS 1-4 WORK, STEP 5 NATURAL BLOCKER)
- 8-step wizard loads correctly: Region → Environment → Stage Order → Target → Agent → Infrastructure → Validate → Done
- **Step 1 (Region)**: WORKS — identity envelope pre-auth middleware on Concelier
- **Step 2 (Environment)**: WORKS — added environment CRUD endpoints to Concelier
- **Step 3 (Stage Order)**: WORKS — pass-through step
- **Step 4 (Target)**: WORKS — added target CRUD endpoints to Concelier
- **Step 5 (Agent)**: BLOCKED — no agents deployed on fresh install, wizard requires agent assignment
- This is an expected blocker for fresh installs
- Should allow "Skip agent" or "Deploy agent later"
- Agent deployment requires an actual Docker host target which isn't available in pure compose setup
- Steps 6-8 (Infrastructure, Validate, Done): not reached yet (blocked by step 5)
---
## Journey Continued (Previously Not Yet Reached)
### 7. Create Release (WORKS END-TO-END ON FRESH INSTALL)
- 4-step wizard works (Basic Info → Components → Inputs → Review & Seal)
- Registry search returned mock data when the API failed (see F22; fixed later as issue #22, search now returns an empty result)
- Seal produces a real bundle with digest identity
- Bundle detail shows UUID heading instead of release name
- "Created by" shows raw user ID hash (fixed later as issue #14)
### 8. Security Posture (WORKS — HONEST)
- All cards show real zeros: GUARDED, 0 blockers, 0% VEX, 0/0 SBOM, 0% reachability
- No fake data on fresh install
- "Snapshot: FAIL — 3 source(s) offline/stale" is accurate — sources not yet synced
### 9. Approvals Queue (WORKS)
- "Release Run Approvals Queue" with filtering by status, gate type, environment, hotfix, risk
- Table headers render but no approval items visible (sealed release didn't auto-trigger approval)
- **FINDING**: Sealing a release does NOT auto-create an approval request — user must manually promote
### 10. Policy Studio (WORKS)
- 7 tabs all load: Overview, Packs, Governance, Simulation, VEX & Exceptions, Release Gates, Audit
- "Core Policy Pack latest" shown in topbar
- Simulation Lab accessible under the Policy Studio shell
- **Not yet tested**: actual policy evaluation against the sealed release
### 11. Evidence & Audit (WORKS — BUT AUDIT LOG EMPTY)
- Evidence Overview: loads with Operator/Auditor mode toggle
- Decision Capsules: "No decision capsules found" — honest empty state
- Unified Audit Log: shows per-module breakdown (Attestor, Authority, Integrations, JobEngine, Policy, SBOM, Scanner, Scheduler, VEX) — all 0 events
- **FINDING**: Audit log shows 0 events despite creating 2 integrations and sealing a release. Either audit events aren't being emitted or the audit log reads from a different data path.
### 12. Doctor Diagnostics (WORKS)
- Quick/Normal/Full Check buttons available
- Quick Check: 7 pass, 1 warning, 1 fail, 5 skipped
- Failed check expands with remediation steps and copy buttons — excellent UX
---
## Additional Findings From Full Journey
### F20: Audit Log Shows 0 Events After Real Actions
**Severity**: HIGH
**What happened**: Created 2 integrations (Harbor, GitHub App), sealed a release, ran advisory check — audit log shows 0 events across all modules.
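**Root cause**: Initially unclear: either audit event emission isn't wired in the integration/release services, or the audit log page reads from a data source that the services don't write to. Later confirmed as the former: `MapAuditEndpoints()` exposes only the query interface, and the services have no audit write path.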
**Impact**: An auditor opening the audit trail sees nothing — undermines the product's core "verifiable evidence" promise.
### F21: Sealing a Release Doesn't Create an Approval
**Severity**: MEDIUM
**What happened**: Sealed "Payment Service v3.2" with bundle status "published". The approvals queue is empty. Expected: at least a policy gate approval request.
**Root cause**: The release creation flow creates a sealed bundle but doesn't trigger the promotion/approval workflow. Promotion is a separate step.
**Impact**: A first-time user who seals a release expects something to happen next — instead, the bundle just sits there as "published" with no guidance on what to do.
**Proposed fix**: After sealing, show a "What's next?" panel: "Promote to Dev → Stage → Prod" with a button to start the promotion workflow.
### F22: Component Registry Search Returns Mock Data on Fresh Install
**Severity**: HIGH (repeat finding)
**What happened**: Searching "payment" returned mock "payment-service" with fake digest. Console error for `/api/registry/images/search?q=payment`. The Harbor integration is connected but the search doesn't use it.
**Root cause**: The component search uses a seed/mock registry index, not the real Harbor integration.
**Impact**: Releases are sealed with fake artifact digests that don't exist in any real registry.
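
A failure-honest search wrapper might look like the sketch below. The `ImageHit` shape, the `SearchOutcome` type, and the injected `fetchHits` function are assumptions made for illustration; only the endpoint path comes from the console error above. On failure the caller gets zero hits plus an error message to display, never seed data.

```typescript
// Hypothetical shapes; the notes only name the endpoint path.
interface ImageHit {
  name: string;
  digest: string;
}

type SearchOutcome =
  | { ok: true; hits: ImageHit[] }
  | { ok: false; hits: ImageHit[]; error: string };

async function searchImages(
  query: string,
  fetchHits: (url: string) => Promise<ImageHit[]>,
): Promise<SearchOutcome> {
  const url = `/api/registry/images/search?q=${encodeURIComponent(query)}`;
  try {
    return { ok: true, hits: await fetchHits(url) };
  } catch (err) {
    // No seed/mock fallback: a failed lookup yields zero hits and an error
    // the UI can show, so no fabricated digest can reach a sealed release.
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, hits: [], error };
  }
}
```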
---
## Issues Found (All Iterations)
### FIXED (21)
| # | Issue | Fix |
|---|-------|-----|
| 1 | Dashboard 100% hardcoded | Removed all fake data, setup guide |
| 2 | Mirror source enabled P1 | EnabledByDefault = false on 32 sources |
| 3 | Mirror in domain builder | Filter category !== 'Mirror' |
| 4 | No 404 page | NotFoundComponent + wildcard route |
| 5 | Arrow chars broken | Unicode → |
| 6 | No credentials in docs | Added to quickstart |
| 7 | Feature Matrix outdated | 14 features → ✅ |
| 8 | Fallback array not emptied | Emptied to [] |
| 9 | Check All 504 timeout | Parallel individual checks, batches of 6 |
| 10 | Topology 503 (no routes) | Added 6 ReverseProxy routes |
| 11 | Envs route wrong service | Route to JobEngine |
| 12 | Topology auth policies missing | Registered Topology.Read/Manage/Admin |
| 13 | Topology wizard 401 (ReverseProxy auth) | Pre-auth middleware reads identity envelope |
| 14-env | Environment CRUD on wrong service | Added env CRUD endpoints to Concelier |
| 14-tgt | Target CRUD missing | Added target CRUD endpoints to Concelier |
| 14-agt | Agent list missing | Added agents list endpoint to Concelier |
### NOT FIXED (2)
| # | Issue | Severity | Root Cause |
|---|-------|----------|-----------|
| 16 | v2 context API console errors | LOW | /api/v2/context/regions, /preferences, /approvals return errors |
| 17 | Crypto profile no tooltip | LOW | No explanation of FIPS/eIDAS/GOST/SM |

**Verified fixed:**

| # | Issue | Status | Resolution |
|---|-------|--------|------------|
| 14 | User ID hash display | FIXED | formatActor() truncates to "User 209d1257..." |
| 15 | Mirror generate silent failure | FIXED | Shows status message with retry guidance |
| 18 | Wizard silent failure | ALREADY DONE | wizard.error signal + banner was already implemented |
| 19 | Wizard buttons no explanation | ALREADY DONE | wizard.error signal handles this |
| 21 | No post-seal guidance | FIXED | "What's next?" panel with promote/approve/versions links |
| 22 | Registry search mock data | FIXED | Returns empty array, no fake digests |

**Product gaps (not fixable in UI pass):**

| # | Issue | Status | Notes |
|---|-------|--------|-------|
| 20 | Audit log 0 events | PRODUCT GAP | Endpoint wired (MapAuditEndpoints) but services don't emit events; audit write path missing across all modules |
| 23 | Topology wizard step 5 blocked | EXPECTED | No agents on fresh compose install; needs "skip agent" option |
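
For reference, the truncation listed under issue #14 could be implemented along these lines. Only the function name `formatActor()` and the output shape "User 209d1257..." come from these notes; the optional `displayName` parameter, the 8-character cutoff as a rule, the "Unknown user" fallback, and the full ID in the example are assumptions.

```typescript
// Assumption: actor IDs are long opaque strings (hashes or UUIDs) and the
// first 8 characters are enough to tell users apart on screen.
function formatActor(actorId: string, displayName?: string): string {
  // Prefer a human-readable name when one is available (hypothetical parameter).
  if (displayName !== undefined && displayName.trim().length > 0) {
    return displayName.trim();
  }
  const id = actorId.trim();
  if (id.length === 0) {
    return "Unknown user";
  }
  // Short IDs are shown whole; long ones are cut to 8 characters.
  return id.length > 8 ? `User ${id.slice(0, 8)}...` : `User ${id}`;
}

// The full ID below is made up; only its first 8 characters appear in the notes.
const label = formatActor("209d1257c3e44f0a"); // "User 209d1257..."
```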
---
## Journey Resumption Plan
### Immediate Next (this session or next):
1. **Skip agent step** — make wizard step 5 optional or allow skipping when no agents exist
2. **Verify audit log** — with JobEngine audit endpoints now wired, check if events appear
3. **Test release creation with honest registry search** — confirm mock data is gone
4. **Push through wizard steps 6-8** — Infrastructure, Validate, Done
### Phase 2: Real Deployment (next session)
1. Push a real Docker image to the Zot registry (stellaops-registry)
2. Implement the registry image search backend (connect to Harbor integration)
3. Scan the image (trigger scanner)
4. Verify findings in Security Posture
5. Create a release with the real scanned image
6. Promote through Dev → Stage → Prod
7. Check evidence/decision capsules generation
### Phase 3: Policy & Evidence
1. Create a custom policy pack
2. Run simulation against a release
3. Test policy gate blocking a promotion
4. Export an audit bundle
5. Test replay/verify
### Phase 4: Operational
1. Test notification channels
2. Run full Doctor check
3. Test offline kit
4. Test tenant switching
---
## Architecture Issue: Gateway Auth for Topology (RESOLVED)
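The core blocker was **issue #13** (topology wizard 401 behind the ReverseProxy transport), since fixed with pre-auth middleware that reads the identity envelope. The gateway has two transport types: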
1. **Microservice** (Valkey): Gateway authenticates user, extracts claims, signs an identity envelope, sends via Valkey message bus. Backend receives pre-authenticated request with `hasPrincipal=True`.
2. **ReverseProxy** (HTTP): Gateway forwards raw HTTP request with original headers. Backend must validate the bearer token itself. Concelier's auth middleware (`StellaOps.Auth.ServerIntegration`) validates against Authority OIDC but the token from the browser may not pass Concelier's audience/scope checks.
**Options**:
- A) Register Concelier's topology endpoints as Valkey consumers (matches existing auth pattern for advisory sources)
- B) Configure Concelier to accept the gateway's identity envelope on HTTP requests (add bypass network for gateway IP)
- C) Add Concelier's service URL to the gateway's identity envelope signing, so ReverseProxy requests include the signed envelope headers
Option B is likely simplest — add the gateway's Docker network IP to Concelier's bypass networks.
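
To make Option B concrete, the check a backend would run before trusting the gateway's identity envelope can be sketched as an IPv4 CIDR match. This is an illustration only: `ipv4ToInt`, `isInCidr`, and `isBypassed` are hypothetical helpers rather than part of `StellaOps.Auth.ServerIntegration`, and the subnet in the example is made up.

```typescript
// Parse a dotted-quad IPv4 address into an unsigned 32-bit number,
// or return null if the text is malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` falls inside `cidr` (e.g. "172.18.0.0/16").
// A bare address without "/" is treated as a /32.
function isInCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  const base = slash === -1 ? cidr : cidr.slice(0, slash);
  const prefixText = slash === -1 ? "32" : cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(prefixText)) return false;
  const prefix = Number(prefixText);
  if (prefix > 32) return false;
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null) return false;
  // Compare the top `prefix` bits with plain arithmetic, which avoids
  // the signed 32-bit pitfalls of JavaScript bitwise operators.
  const blockSize = Math.pow(2, 32 - prefix);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// A request is eligible for envelope-based auth only if its remote address
// is inside one of the configured bypass networks.
function isBypassed(remoteIp: string, bypassNetworks: string[]): boolean {
  return bypassNetworks.some((cidr) => isInCidr(remoteIp, cidr));
}

// Example subnet is an assumption, not the real compose network.
const fromGateway = isBypassed("172.18.0.5", ["172.18.0.0/16"]); // true
```

With this kind of check, trust rests on network placement, so it fits best when only the gateway can reach the backend on that network.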