Repair live canonical migrations and scanner cache bootstrap

This commit is contained in:
master
2026-03-09 21:56:41 +02:00
parent 00bf2fa99a
commit dfd22281ed
21 changed files with 1018 additions and 12 deletions

@@ -0,0 +1,74 @@
# Sprint 20260309_014 - Live Runtime Fault Repair
## Topic & Scope
- Repair live backend/runtime faults uncovered after the full 60-image rebuild and fresh `stella-ops.local` redeploy.
- Keep the rebuilt stack client-ready underneath the clean UI shell by fixing background workers, runtime contracts, and hardened-container assumptions instead of hiding errors behind empty states.
- Working directory: `src/Platform/**`.
- Cross-module edits allowed for this sprint: `src/JobEngine/**`, `src/Concelier/**`, `src/Scanner/**`, `devops/compose/**`, and linked docs in `docs/**`.
- Expected evidence: targeted `.csproj` test runs, live API verification, live Playwright rechecks on impacted routes, and runtime log validation after redeploy.
## Dependencies & Concurrency
- Depends on the scratch rebuild baseline and live search runtime repair from `SPRINT_20260309_013_AdvisoryAI_live_unified_search_corpus_runtime_repair.md`.
- Safe parallelism: avoid unrelated web/search feature edits already in flight from other agents; stage only the runtime-fault hunks touched here.
## Documentation Prerequisites
- `docs/modules/platform/architecture-overview.md`
- `docs/modules/jobengine/architecture.md`
- `docs/modules/concelier/architecture.md`
- `docs/modules/scanner/architecture.md`
- `docs/code-of-conduct/CODE_OF_CONDUCT.md`
- `docs/code-of-conduct/TESTING_PRACTICES.md`
## Delivery Tracker
### TASK-014-001 - Diagnose live runtime failures from rebuilt stack
Status: DONE
Dependency: none
Owners: QA, Developer
Task description:
- Rebuild all services, redeploy the compose stack, then inspect live route behavior and backend logs to identify runtime faults that persist even though pages render.
Completion criteria:
- [x] Full image matrix rebuild completed.
- [x] Fresh compose recreate completed.
- [x] Live evidence captured for runtime faults and impacted routes.
### TASK-014-002 - Repair scheduler and analytics runtime contract faults
Status: DONE
Dependency: TASK-014-001
Owners: Developer
Task description:
- Fix PostgreSQL type/function mismatches causing scheduler planner loops and platform analytics maintenance to fail after startup.
Completion criteria:
- [x] Scheduler planner queries no longer emit `run_state = text` errors.
- [x] Platform analytics maintenance invokes `analytics.compute_daily_rollups` with the correct PostgreSQL parameter type.
- [x] Focused tests prove the repaired contracts.
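
The two contract repairs above can be illustrated with a minimal sketch. It assumes the fix amounts to giving PostgreSQL explicitly typed parameters: an enum cast for the planner comparison and a calendar date for the rollup call. The table, column, and enum names are hypothetical, and the single-date signature of `analytics.compute_daily_rollups` is an assumption; only the function name comes from this sprint.

```python
from datetime import date, datetime, timezone

# Hypothetical enum type name; the real scheduler schema is not shown in this sprint.
RUN_STATE_ENUM = "scheduler.run_state"


def planner_query(state: str) -> tuple[str, tuple]:
    """Compare an enum column against a parameter cast to the enum type.

    Without the cast, a driver that sends the parameter as text can make
    PostgreSQL reject the comparison as an enum-versus-text operator mismatch.
    """
    # Hypothetical table and column names, used only for illustration.
    sql = f"SELECT id FROM scheduler.runs WHERE state = CAST(%s AS {RUN_STATE_ENUM})"
    return sql, (state,)


def rollup_call(moment: datetime) -> tuple[str, tuple]:
    """Bind a calendar date, not a timestamp, for the rollup function.

    Assumes analytics.compute_daily_rollups takes a single date argument.
    """
    day: date = moment.astimezone(timezone.utc).date()
    return "SELECT analytics.compute_daily_rollups(CAST(%s AS date))", (day,)
```

The functions only build the statement and parameters, so a focused test can assert the typed contract without a live database.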
### TASK-014-003 - Repair canonical advisory DI and scanner cache runtime assumptions
Status: DONE
Dependency: TASK-014-001
Owners: Developer
Task description:
- Restore Concelier canonical advisory service registration under the live WebService and align scanner cache paths with writable hardened-container storage so maintenance jobs stop failing after deploy.
Completion criteria:
- [x] `/api/v1/canonical` resolves through registered services without runtime DI failure.
- [x] Scanner cache maintenance no longer writes into read-only `/app` paths in live containers.
- [x] Focused tests and live verification cover the repaired contracts.
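
The cache-path criterion above can be sketched as a startup check that resolves the cache root from configuration and fails fast when the directory is not writable. This is an illustrative Python sketch, not the Scanner worker's actual bootstrap: the default path is taken from the execution log, while the environment variable name is a hypothetical stand-in.

```python
import tempfile
from pathlib import Path

# The default path comes from the execution log; the environment variable
# name is a hypothetical stand-in for the real Scanner setting.
DEFAULT_CACHE_ROOT = "/var/lib/stellaops/cache/scanner"
CACHE_ROOT_ENV = "SCANNER_CACHE_ROOT"


def is_writable(directory: Path) -> bool:
    """Probe writability by creating a temp file; read-only mounts raise OSError."""
    try:
        directory.mkdir(parents=True, exist_ok=True)
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False


def resolve_cache_root(env: dict[str, str]) -> Path:
    """Return the configured cache root, failing fast if it is not writable."""
    root = Path(env.get(CACHE_ROOT_ENV, DEFAULT_CACHE_ROOT))
    if not is_writable(root):
        raise RuntimeError(f"scanner cache root {root} is not writable")
    return root
```

Failing at startup rather than inside a maintenance job surfaces a read-only path such as `/app` immediately after deploy instead of in a later background run.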
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-03-09 | Sprint created after full rebuild/redeploy exposed live runtime faults in scheduler planning, platform analytics maintenance, Concelier canonical DI, and scanner cache maintenance. | Codex |
| 2026-03-09 | Repaired scheduler enum/query typing and platform analytics date binding; focused `.csproj` verification passed and post-redeploy logs stopped emitting the runtime faults. | Codex |
| 2026-03-09 | Added Concelier startup migration registration, fixed Scanner worker env-prefix bootstrap, and introduced compose cache ownership bootstrap; focused tests passed, `/api/v1/canonical` returned `200`, cache paths resolved to `/var/lib/stellaops/cache/scanner`, and live Playwright rechecks passed (`111/111` routes, changed-surfaces pass). | Codex |
## Decisions & Risks
- This sprint intentionally treats background worker failures as product defects even when the frontdoor UI still renders. A clean route sweep is insufficient if the live services are erroring underneath.
- Cross-module edits are permitted because the faults span runtime contracts across Platform, JobEngine, Concelier, Scanner, and compose deployment wiring.
- Microsoft Testing Platform projects in this sprint require `dotnet test <project>.csproj -- --filter-class ...`; a plain `--filter` against the project silently ran the whole suite, so such runs were rejected as verification evidence.
- Hardened Scanner containers need both a writable cache root and ownership bootstrap. The compose stack now uses `scanner-cache-init` to prepare the named volume for the non-root runtime user.
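
The ownership bootstrap described in the last decision can be sketched as follows. The real `scanner-cache-init` service is not shown in this sprint and is likely a short shell step in the compose file; this Python version only illustrates the idea, and the function name and parameters are hypothetical.

```python
import os
from pathlib import Path


def prepare_cache_volume(root: Path, uid: int, gid: int) -> None:
    """Create the cache root and hand it, with everything under it, to uid:gid."""
    root.mkdir(parents=True, exist_ok=True)
    os.chown(root, uid, gid)
    for current, dirs, files in os.walk(root):
        for name in dirs + files:
            # Do not follow symlinks, so ownership changes stay inside the volume.
            os.chown(os.path.join(current, name), uid, gid, follow_symlinks=False)
```

Running this once as a privileged init step, before the non-root runtime user starts, is what lets the hardened container write to the named volume without relaxing its own permissions.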
## Next Checkpoints
- Land a targeted repair commit once runtime faults are fixed, revalidated live, and staged without unrelated agent changes.