Stabilize setup admin onboarding journeys

2026-03-15 03:38:48 +02:00
parent 2661bfefa4
commit 4a5185121d
6 changed files with 392 additions and 22 deletions
--- a/docs/implplan/SPRINT_20260315_001_Platform_setup_admin_operator_journey_audit.md
+++ b/docs/implplan/SPRINT_20260315_001_Platform_setup_admin_operator_journey_audit.md
@@ -0,0 +1,80 @@
+# Sprint 20260315_001 - Platform Setup Administration Operator Journey Audit
+
+## Topic & Scope
+- Use Stella Ops as a first-time platform administrator setting up the product for real organizational use after installation.
+- Drive end-user admin journeys first: identity and access, tenant and branding, notifications administration, trust/signing administration, setup integrations management, and adjacent setup surfaces that an operator would use during onboarding.
+- Treat retained Playwright as evidence and regression coverage, not as a substitute for discovery; every newly discovered manual setup/admin defect must become retained coverage afterward.
+- Group fixes by root cause so the iteration closes full setup/admin behavior slices instead of isolated page patches.
+- Working directory: `.`.
+- Expected evidence: operator journey notes, retained Playwright additions or hardening, grouped defect analysis, focused tests where code changes land, rebuilt-stack retest results, and live aggregate evidence.
+
+## Dependencies & Concurrency
+- Depends on local commit `2661bfefa` as the closed baseline from scratch iteration 013.
+- Safe parallelism: avoid environment resets while live setup/admin journeys are running because the stack is shared.
+
+## Documentation Prerequisites
+- `AGENTS.md`
+- `docs/INSTALL_GUIDE.md`
+- `docs/dev/DEV_ENVIRONMENT_SETUP.md`
+- `docs/qa/feature-checks/FLOW.md`
+
+## Delivery Tracker
+
+### PLATFORM-SETUP-ADMIN-001 - Define and execute setup/admin operator journeys
+Status: DONE
+Dependency: none
+Owners: QA, Product Manager
+Task description:
+- Act as a platform administrator onboarding Stella Ops for real use. Cover identity and access management, tenant and branding changes, notification administration, trust/signing inventory and workflows, setup integrations management, and any adjacent setup surfaces encountered during the journey.
+
+Completion criteria:
+- [x] The primary setup/admin operator journeys are explicitly listed before fixes begin.
+- [x] Playwright is used to execute those journeys as an operator would, not only as route sweeps.
+- [x] Every broken route, page-load, data-load, validation rule, or action encountered on the operator path is recorded before any fix starts.
+
+### PLATFORM-SETUP-ADMIN-002 - Convert newly discovered admin steps into retained coverage
+Status: DONE
+Dependency: PLATFORM-SETUP-ADMIN-001
+Owners: QA, Test Automation
+Task description:
+- Add or deepen retained Playwright coverage for every newly discovered setup/admin step so future iterations recheck the same operator behavior automatically.
+
+Completion criteria:
+- [x] Every newly discovered operator/admin step is mapped to retained Playwright coverage or an explicit backlog gap.
+- [x] Retained coverage additions are organized by user journey, not only by route.
+- [x] The next aggregate run would exercise the newly discovered setup/admin path automatically.
+
+### PLATFORM-SETUP-ADMIN-003 - Repair grouped setup/admin defects and retest
+Status: DONE
+Dependency: PLATFORM-SETUP-ADMIN-002
+Owners: 3rd line support, Architect, Developer
+Task description:
+- Diagnose the grouped failures exposed by the setup/admin journey, choose the cleanest fix that conforms to the product and its architecture, implement it, add retained Playwright coverage for the new behavior when needed, and rerun the affected journeys plus the aggregate audit before committing.
+
+Completion criteria:
+- [x] Root causes are recorded for the grouped failures.
+- [x] Fixes land with focused regression coverage and retained Playwright scenario updates where practical.
+- [x] The live stack is retested through the same setup/admin journeys before the iteration commit.
+
+## Execution Log
+| Date (UTC) | Update | Owner |
+| --- | --- | --- |
+| 2026-03-15 | Sprint created immediately after local commit `2661bfefa` closed the release-confidence operator iteration cleanly at `25/25` suites with `0` retries. | QA |
+| 2026-03-15 | Defined the setup/admin operator path as: identity and access (`/setup/identity-access` users, roles, tenants), tenant and branding (`/setup/tenant-branding`), notifications administration (`/setup/notifications/channels/new`, `/setup/notifications/rules/new`), trust and signing (`/setup/trust-signing/*`), setup integrations management, direct docs navigation, global search from the operator shell, and security reports embedding under the setup-adjacent admin workflow. | QA |
+| 2026-03-15 | Discovery before fixes found one product defect and one retained-coverage defect. Product defect: the canonical `/setup/notifications/channels/new` route did not enter create mode, and the UI allowed an empty `secretRef` even though the notifier contract requires a non-empty secret reference. Retained-coverage defect: the user-reported trust/admin probe misclassified `Signing Keys` and `Audit Log` as blank because it used the wrong selectors and did not wait for routed tab resolution. | QA |
+| 2026-03-15 | Repaired notifications onboarding in `channel-management.component.ts`, added focused Angular regression coverage in `channel-management.component.spec.ts`, added the spec file to `tsconfig.spec.features.json`, and deepened `live-setup-admin-action-sweep.mjs` so setup/admin retained coverage now proves routed channel creation, required `secretRef`, rule-channel visibility, and cleanup. | Developer |
+| 2026-03-15 | Hardened `live-user-reported-admin-trust-check.mjs` to wait for tab-specific resolution, inspect the real key/audit selectors used by trust management, and prove a successful valid user-create path in addition to invalid-email rejection. | Test Automation |
+| 2026-03-15 | Verification: focused Angular `channel-management.component.spec.ts` passed `71/71`; `npm run build` passed; `live-setup-admin-action-sweep.mjs` passed with `failedActionCount=0` and `runtimeIssueCount=0`; `live-user-reported-admin-trust-check.mjs` passed with `failedCheckCount=0`, including valid user creation plus role and tenant creation persistence. | QA |
+| 2026-03-15 | Direct live reruns of the adjacent setup/admin probes confirmed that the remaining aggregate noise was not a reproducible product defect: `live-user-reported-admin-trust-check.mjs` resolved all trust tabs cleanly, and `live-setup-topology-action-sweep.mjs` finished with `failedActionCount=0` and `runtimeIssueCount=0`. | QA |
+
+## Decisions & Risks
+- Decision: this iteration prioritizes first-time administrator behavior over broad route counts.
+- Risk: some setup/admin surfaces are currently covered only through shared checks, so behavior gaps may still exist even when route and aggregate summaries are green.
+- Root cause: the notifications channel create route advertised a canonical create page, but the component ignored route data and stayed in list mode; the same surface also presented `Secret Reference` as optional even though the notifier domain requires it.
+- Root cause: the retained trust/admin probe used generic selectors (`.key-dashboard__loading`, `.trust-audit-log__empty`) that do not exist in the live components and only waited a fixed 1.5 seconds after tab routing, producing false failures on `Signing Keys` and `Audit Log`.
+- Decision: retained Playwright checks for setup/admin surfaces must wait for route-specific resolution selectors rather than infer success from headings or generic shell-level alerts.
+- Investigation note: direct reruns on the live stack showed `Certificates`, `Audit Log`, and the adjacent setup topology journey all converging cleanly, so this iteration did not justify a product-side trust/topology change beyond the retained-check hardening already captured above.
+
+## Next Checkpoints
+- Let the in-flight `live-full-core-audit.mjs` consume the hardened user-reported admin/trust script and confirm the broader aggregate remains clean.
+- Start the next operator-first iteration from a fresh user journey outside setup/admin, carrying forward every newly retained step as regression coverage.