e2e observation fixes

This commit is contained in:
master
2026-02-18 22:47:34 +02:00
parent 1bcab39a2c
commit cb3e361fcf
35 changed files with 1127 additions and 177 deletions

@@ -0,0 +1,119 @@
# Sprint 20260218_004_Platform - Local Setup Usability Hardening
## Topic & Scope
- Restore end-to-end usability of a fresh local Stella Ops installation, starting from welcome sign-in through dashboard and settings workflows.
- Remove high-friction runtime failures (HTTP/HTTPS entry handling, 401/403/404/500 hotspots, no-op UI actions) that block normal operator usage.
- Run deep manual QA across all console pages and visible actions, then fix defects in-place with minimal deterministic changes.
- Working directory: `src/Web/StellaOps.Web`.
- Expected evidence: manual UI walkthrough records, API/network failure inventory, targeted build outputs, and updated implementation/docs/task tracking.
## Dependencies & Concurrency
- Depends on existing local compose stack (`devops/compose/docker-compose.stella-ops.yml`) and local images (`stellaops/*:dev`).
- Safe concurrency: disabled for build/test execution in this sprint (single service build/restart at a time).
- Cross-module edits explicitly allowed for blockers discovered from UI flows:
- `src/Platform/StellaOps.Platform.WebService/`
- `src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Infrastructure/`
- `src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.WebService/`
- `src/Authority/StellaOps.Authority/StellaOps.Auth.ServerIntegration/`
- `devops/compose/` (runtime wiring only if required)
## Documentation Prerequisites
- `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- `docs/modules/platform/architecture-overview.md`
- `docs/modules/platform/architecture.md`
- `docs/modules/orchestrator/architecture.md`
- `docs/modules/authority/architecture.md`
- `docs/qa/feature-checks/FLOW.md`
## Delivery Tracker
### U-001 - Baseline failure inventory from manual UI walk
Status: DONE
Dependency: none
Owners: QA
Task description:
- Navigate the console manually from sign-in through all sidebar sections, collecting page/action-level failures from UI, browser console, and network calls.
- Produce a deterministic list of failing routes/endpoints/actions to drive fix order.
Completion criteria:
- [x] Every sidebar section manually traversed at least once
- [x] Failure inventory captured with endpoint/status details
- [x] Initial blocker list prioritized for implementation
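
One way to keep the failure inventory deterministic is to sort it with a fixed comparator before using it to drive fix order. The sketch below is illustrative only: the `FailureEntry` shape, the function names, and the ranking (server errors, then auth failures, then missing endpoints) are assumptions, not the format of the actual QA report.

```typescript
// Hypothetical shape of one failure-inventory entry; the field names are assumptions.
interface FailureEntry {
  route: string;    // console route where the failure was observed
  endpoint: string; // API endpoint that returned the error
  status: number;   // HTTP status code, e.g. 401, 403, 404, 500
}

// Assumed fix order: server errors first, then auth failures, then missing endpoints.
function statusRank(status: number): number {
  if (status >= 500) return 0;
  if (status === 401 || status === 403) return 1;
  if (status === 404) return 2;
  return 3;
}

// Plain code-unit comparison, so ordering does not depend on the runtime locale.
function compareText(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Returns a sorted copy; the same set of entries always yields the same order.
function prioritizeFailures(entries: readonly FailureEntry[]): FailureEntry[] {
  return [...entries].sort(
    (a, b) =>
      statusRank(a.status) - statusRank(b.status) ||
      a.status - b.status ||
      compareText(a.route, b.route) ||
      compareText(a.endpoint, b.endpoint),
  );
}
```

Because every tie is broken by route and then endpoint, two walkthroughs that capture the same failures in a different order produce the same prioritized list.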
### U-002 - Fix backend blockers surfaced by dashboard/settings/actions
Status: DONE
Dependency: U-001
Owners: Developer, QA
Task description:
- Resolve server-side blockers causing user-visible failures in current local setup, including compatibility auth mismatches, deadletter endpoint gaps, and runtime connection fallback issues.
- Validate fixes with targeted builds and container refresh.
Completion criteria:
- [x] Platform compatibility endpoints no longer fail for authenticated admin console usage
- [x] Orchestrator deadletter pages/actions avoid 500/404 regressions
- [x] Authority scope policy path no longer throws due to a missing explicit bearer scheme
- [x] Sequential builds succeed for all touched backend modules
### U-003 - Fix frontend action/contract mismatches
Status: DONE
Dependency: U-001
Owners: Developer, QA
Task description:
- Repair UI behaviors that break or no-op due to API contract drift and unsafe assumptions (scheduler API surface mismatch, mirror detail route initialization, and related action handlers).
- Keep behavior deterministic and resilient when data is absent or delayed.
Completion criteria:
- [x] Scheduler page actions map to active backend endpoints/contracts
- [x] Feed mirror detail route handles direct navigation without runtime errors
- [x] Policy/settings action buttons trigger expected requests or explicit user feedback
- [x] Frontend build for touched code paths passes
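
The "resilient when data is absent or delayed" requirement can be pictured as a small state resolver for the mirror detail route. This is a minimal sketch, not the component's real code: `MirrorSummary`, `MirrorDetailState`, and `resolveMirrorDetail` are hypothetical names introduced for illustration.

```typescript
// Hypothetical mirror summary; the real feed-mirror model is not shown in this sprint.
interface MirrorSummary {
  id: string;
  name: string;
}

// Every situation the detail page can be in is an explicit state, so the view
// never dereferences data that has not arrived.
type MirrorDetailState =
  | { kind: "missing-id" }                  // route opened without a usable ID
  | { kind: "loading"; mirrorId: string }   // ID known, mirror list not loaded yet
  | { kind: "not-found"; mirrorId: string } // list loaded, ID not present in it
  | { kind: "ready"; mirror: MirrorSummary };

function resolveMirrorDetail(
  routeId: string | null | undefined,
  mirrors: readonly MirrorSummary[] | undefined,
): MirrorDetailState {
  const mirrorId = (routeId ?? "").trim();
  if (mirrorId === "") return { kind: "missing-id" };
  if (mirrors === undefined) return { kind: "loading", mirrorId };
  const mirror = mirrors.find((m) => m.id === mirrorId);
  return mirror ? { kind: "ready", mirror } : { kind: "not-found", mirrorId };
}
```

On direct navigation the mirror list is typically still undefined, so the resolver reports `loading` instead of failing, which is the behavior the second completion criterion asks for.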
### U-004 - HTTP entrypoint handling and transport behavior hardening
Status: DONE
Dependency: U-002
Owners: Developer, QA
Task description:
- Ensure `http://stella-ops.local/*` is handled predictably (redirect or equivalent safe behavior) so sign-in entry does not appear broken.
- Confirm cookies/auth redirects behave correctly after transport normalization.
Completion criteria:
- [x] HTTP welcome page no longer presents dead Sign In flow
- [x] User is transitioned to HTTPS before auth-sensitive navigation
- [x] Behavior validated manually in browser session
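
The transport normalization described above amounts to rewriting the scheme before any auth-sensitive navigation. The sketch below shows only that rewrite as a pure function; `httpsRedirectTarget` is a hypothetical helper, and where the redirect is actually issued (gateway, web server, or client) is not stated in this sprint.

```typescript
// Returns the HTTPS equivalent of a plain-HTTP URL, or null when no redirect is needed.
// Assumption: HTTP and HTTPS are both served on their default ports, so only the
// scheme changes; an explicit port in the URL would need separate handling.
function httpsRedirectTarget(url: string): string | null {
  const httpPrefix = "http://";
  if (url.slice(0, httpPrefix.length).toLowerCase() !== httpPrefix) {
    return null;
  }
  return "https://" + url.slice(httpPrefix.length);
}
```

Under these assumptions, `http://stella-ops.local/welcome` maps to `https://stella-ops.local/welcome`, with path and query preserved, while a URL that is already HTTPS yields `null`.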
### U-005 - Full manual regression pass and remediation plan
Status: DONE
Dependency: U-002
Owners: QA, Project Manager
Task description:
- Re-run deep manual click-through of all pages/actions after fixes, record residual defects, and produce a concrete implementation plan for remaining non-trivial gaps.
- Keep sprint execution log updated with exact dates, scope, and outcomes.
Completion criteria:
- [x] Manual pass covers every visible page and action in current console
- [x] Residual findings recorded with severity and reproduction
- [x] Follow-up implementation plan documented and prioritized
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-02-18 | Sprint created for local setup usability hardening and manual QA-driven remediation. | Project Manager |
| 2026-02-18 | U-001 completed from manual sidebar traversal evidence (`qa-sidebar-manual-report.json`) identifying repeated 401/403/404/500 and no-op UI paths; U-002/U-003 moved to DOING. | QA |
| 2026-02-18 | Implemented initial backend/frontend fixes: platform compatibility auth relaxation for legacy quota/rate-limit routes, orchestrator deadletter export endpoint + DB connection fallback, removal of the hard-bound auth scheme from the authority scope policy, scheduler client contract adaptation, and mirror detail route initialization guard. | Developer |
| 2026-02-18 | Fixed policy route/action mismatches (`/admin/policy/*` -> `/policy/*`) across simulation/governance flows; HTTP welcome sign-in path validated end-to-end (`http://` -> `https://` -> authorize -> callback -> dashboard). | Developer |
| 2026-02-18 | Fixed policy sealed-status contract mismatch by switching policy-engine client from `/policy/system/airgap/status` (HTML response) to `/policy/api/v1/governance/sealed-mode/status` (JSON), removing parsing error banner on policy packs. | Developer |
| 2026-02-18 | Completed non-force manual regression evidence: full sidebar sweep (`qa-sidebar-nonforce-report.json`) covered 30 links and 71 in-page actions with 0 page/action/API/request/console failures; policy deep action checks captured in `qa-policy-action-refined-report.json`. | QA |
| 2026-02-18 | Fixed policy governance profile-ID compatibility in mock API (`default/strict/dev` aliases to canonical `profile-*`) to eliminate `Profile default not found` at `/policy/governance/profiles/default`. | Developer |
| 2026-02-18 | Rebuilt `stellaops/console:dev`, refreshed `console-builder` + `router-gateway`, and re-ran deep checks: policy/settings walkthrough (`qa-policy-manual-final-report.json`) now shows 37 actions, 0 action/API/request/console errors; full sidebar walkthrough (`qa-sidebar-manual-report.json`) reports 30 routes, 139 actions clicked, 0 API/request/console/page/fatal errors. | QA |
| 2026-02-18 | U-002 closed after backend remediation validation and final deep QA pass; all sprint tasks are now DONE and sprint is ready for archival. | Project Manager |
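
The profile-ID compatibility fix in the log can be read as an alias lookup applied before the mock API resolves a profile. The following is a sketch under assumptions: the log names the short IDs `default`, `strict`, and `dev` and a canonical `profile-*` form, but the exact canonical IDs and the `canonicalProfileId` helper are guesses made for illustration.

```typescript
// Assumed alias table; the canonical IDs on the right are not confirmed by the log.
const PROFILE_ALIASES: ReadonlyMap<string, string> = new Map([
  ["default", "profile-default"],
  ["strict", "profile-strict"],
  ["dev", "profile-dev"],
]);

// Maps a short alias to its canonical ID; any other ID passes through unchanged,
// so callers that already use the canonical form are unaffected.
function canonicalProfileId(id: string): string {
  return PROFILE_ALIASES.get(id) ?? id;
}
```

Normalizing at the lookup boundary means a route such as `/policy/governance/profiles/default` and a request that uses the canonical ID resolve to the same profile.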
## Decisions & Risks
- Decision: prioritize runtime usability over strict parity of all legacy permission checks on compatibility endpoints; native platform endpoints keep stricter scope requirements.
- Risk: scheduler frontend/backend contract alignment changes are broad and require full manual action verification to avoid regressions on less-used schedule modes.
- Risk: compose environment may still expose unrelated worker health failures; this sprint scopes only failures that directly break console usability.
- Residual risk: hard reload on deep protected routes still depends on silent-refresh availability at Authority; if iframe-based prompt-none refresh is blocked by browser/security policy, users may be redirected to `/welcome` and need to sign in again.
- Residual risk: automated “click first N buttons” heuristics can produce false action-timeout noise on dynamic pages; acceptance is based on page-specific action checks plus API/request/console/page error telemetry.
- Web fetch audit trail: one accidental web query (`search: "stella ops docs"`) occurred during implementation triage; no external content was opened or used in code or docs decisions.
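
The telemetry half of the acceptance rule in the residual-risk note on click heuristics can be sketched as a check over a walkthrough summary. The schema of the `qa-*-report.json` files is not shown in this sprint, so `WalkthroughSummary` and its field names are hypothetical, and the page-specific action checks are outside this sketch.

```typescript
// Hypothetical summary of one QA walkthrough; all field names are assumptions.
interface WalkthroughSummary {
  routesVisited: number;
  actionsClicked: number;
  apiErrors: number;
  requestErrors: number;
  consoleErrors: number;
  pageErrors: number;
  actionTimeouts: number; // heuristic noise: reported, but not part of acceptance
}

// A run passes the telemetry check when it visited at least one route and
// recorded no API, request, console, or page errors. Action timeouts are ignored.
function passesTelemetryCheck(summary: WalkthroughSummary): boolean {
  const hardErrors =
    summary.apiErrors + summary.requestErrors + summary.consoleErrors + summary.pageErrors;
  return summary.routesVisited > 0 && hardErrors === 0;
}
```

Keeping timeouts out of the pass/fail decision matches the note's point that "click first N buttons" heuristics produce false noise on dynamic pages.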
## Next Checkpoints
- Sprint completed on 2026-02-18 and archived to `docs-archived/implplan`.