docs(sprint): add sprint 003-005 planning and update sprint 002 log

- SPRINT 003: Router frontdoor contract repair tasks
- SPRINT 004: Notify service and AI runs repair tasks
- SPRINT 005: JobEngine migration and scope repair tasks
- Update sprint 002 execution log with expanded route inventory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:56 +02:00
parent 310e9f84fe
commit 71db8d4386
4 changed files with 229 additions and 0 deletions

@@ -53,6 +53,7 @@ Completion criteria:
| 2026-03-09 | Added `scripts/live-frontdoor-canonical-route-sweep.mjs`, reusing live frontdoor auth/session seeding, canonical route inventory, strict route checks for known-sensitive pages, and structured JSON output under `output/playwright/`. Syntax validation passed before the live rerun. | Developer |
| 2026-03-09 | Fixed a harness defect in the shared auth/session model: the original live sweep restored `sessionStorage` only in the login tab, so every freshly opened route page was unauthenticated and falsely redirected to `/welcome`. Moved session seeding into `createAuthenticatedContext(...)` and reused the helper from the other live scripts. | Developer |
| 2026-03-09 | Ran the authenticated 106-route sweep against the rebuilt stack. After removing redirect/copy false positives, the real live backlog is 19 failing routes: reachability; feeds-airgap; jobengine; quotas; dead-letter; aoc; signals; packs; ai-runs; notifications; status; sbom-sources; policy simulation; policy trust-weights; policy staleness; policy audit; setup/platform trust-signing; and setup notifications. | Developer |
| 2026-03-09 | Expanded the canonical live sweep inventory to include the revived release-investigation, evidence-thread, and registry-admin routes so future frontdoor passes cover those pages as first-class surfaces instead of leaving them to ad hoc follow-up scripts. | Developer |
## Decisions & Risks
- Decision: keep this sprint focused on broad route-level live verification and action inventory, not on fixing specific route defects before the rebuilt stack is actually exercised.

@@ -0,0 +1,77 @@
# Sprint 20260309-003 - Router Live Frontdoor Contract Repair
## Topic & Scope
- Repair the verified live frontdoor contract mismatches from the authenticated canonical route sweep where the gateway is routing to the wrong backend service or the web client is composing impossible frontdoor URLs.
- Keep this iteration focused on the highest-leverage cluster: JobEngine control routes, scanner-owned sources/witnesses routes, AI runs list routes, and console/pack-registry requests that currently self-inflict 404s in the live shell.
- Update the live router manifests that the compose stack actually mounts, keep the source router defaults aligned, and add focused frontend test coverage for the web-side fixes.
- Working directory: `devops/compose`.
- Allowed coordination edits: `src/Router/StellaOps.Gateway.WebService/appsettings.json`, `src/Web/StellaOps.Web/src/app/app.config.ts`, `src/Web/StellaOps.Web/src/app/features/pack-registry/services/pack-registry-browser.service.ts`, `src/Web/StellaOps.Web/src/app/features/pack-registry/services/pack-registry-browser.service.spec.ts`, `src/Web/StellaOps.Web/src/app/features/console/console-status.component.ts`, `src/Web/StellaOps.Web/src/app/features/console/console-status.component.spec.ts`, `docs/implplan/SPRINT_20260309_003_Router_live_frontdoor_contract_repair.md`.
- Expected evidence: live curl probes against the repaired frontdoor contracts, focused Angular specs for the touched client logic, and a rerun of the authenticated route sweep showing the remaining backlog has narrowed.
## Dependencies & Concurrency
- Depends on `SPRINT_20260309_002_FE_live_frontdoor_canonical_route_sweep.md` for the verified route failure inventory and reproduction evidence.
- Safe parallelism: stay within the router manifests and the specifically listed web files; avoid the unrelated search, reachability, and component-revival areas that other agents are changing.
## Documentation Prerequisites
- `AGENTS.md`
- `src/Web/StellaOps.Web/AGENTS.md`
- `docs/qa/feature-checks/FLOW.md`
- `docs/modules/router/webservices-valkey-rollout-matrix.md`
- `docs/modules/ui/v2-rewire/S00_endpoint_contract_ledger_v1.md`
## Delivery Tracker
### ROUTER-LIVE-003-001 - Repair mounted frontdoor route ownership
Status: DOING
Dependency: none
Owners: Developer
Task description:
- Update the mounted compose router manifests so the live gateway sends `/api/v1/jobengine/*` to JobEngine, and both `/api/v1/sources` and `/api/v1/witnesses` to Scanner. Keep AI runs on the existing AdvisoryAI `/api/v1/advisory-ai/*` frontdoor family so they do not collide with release-control `/api/v1/runs/*`.
- Keep the source router appsettings in sync so the repo default matches the live compose manifests.
Completion criteria:
- [ ] `devops/compose/router-gateway-local.json` routes the affected frontdoor paths to the verified owning services.
- [ ] `devops/compose/router-gateway-local.reverseproxy.json` and `src/Router/StellaOps.Gateway.WebService/appsettings.json` are aligned for the same paths.
- [ ] Direct frontdoor probes no longer return `404` for the repaired route families.
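
The ownership rules in this task can be pictured as a longest-prefix match over route families. The sketch below is illustrative only: the `RouteRule` shape, the `resolveOwner` helper, and the service labels are assumptions made for this example and do not reflect the actual schema of `router-gateway-local.json`.

```typescript
// Hypothetical rule shape for illustration; NOT the real router manifest schema.
interface RouteRule {
  prefix: string; // frontdoor path prefix
  service: string; // label for the owning backend service
}

// Route families named in this sprint, expressed as illustrative rules.
const rules: RouteRule[] = [
  { prefix: "/api/v1/jobengine", service: "JobEngine" },
  { prefix: "/api/v1/sources", service: "Scanner" },
  { prefix: "/api/v1/witnesses", service: "Scanner" },
  { prefix: "/api/v1/advisory-ai", service: "AdvisoryAI" },
  { prefix: "/api/v1/runs", service: "release-control" },
];

// Return the owner of the longest matching prefix, or undefined when nothing
// matches (which would surface as a 404 at the frontdoor).
function resolveOwner(path: string, table: RouteRule[] = rules): string | undefined {
  const matches = table.filter(
    (rule) => path === rule.prefix || path.startsWith(rule.prefix + "/"),
  );
  matches.sort((a, b) => b.prefix.length - a.prefix.length);
  return matches[0]?.service;
}
```

Matching on `prefix + "/"` rather than a bare `startsWith(prefix)` keeps a path such as `/api/v1/sourcesx` from being claimed by the `/api/v1/sources` family.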
### ROUTER-LIVE-003-002 - Remove self-inflicted web client 404s
Status: TODO
Dependency: ROUTER-LIVE-003-001
Owners: Developer, QA
Task description:
- Fix the web config/providers and feature clients that currently generate invalid frontdoor URLs or request patterns, specifically JobEngine control consumers, AI runs list routes, the console status page bootstrap, and the pack registry installed probe.
- Add focused frontend specs to lock the repaired behavior.
Completion criteria:
- [ ] The touched web clients use canonical frontdoor bases for the repaired route families.
- [ ] Console status no longer subscribes with the synthetic `last` run id.
- [ ] Pack registry dashboard no longer depends on `/installed`.
- [ ] Focused frontend specs cover the repaired behavior.
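
A minimal sketch of the kind of client-side guard these criteria describe, assuming two hypothetical helpers (`joinFrontdoorUrl` and `isRealRunId`) that are not part of the existing web client. It shows one way to compose a frontdoor URL without doubled slashes and to refuse the synthetic `last` run id before subscribing.

```typescript
// Hypothetical helper: join a frontdoor base and single path segments without
// producing doubled slashes. Each segment is encoded as one path segment.
function joinFrontdoorUrl(base: string, ...segments: string[]): string {
  const cleanBase = base.replace(/\/+$/, "");
  const cleanSegments = segments
    .map((segment) => segment.replace(/^\/+|\/+$/g, ""))
    .filter((segment) => segment.length > 0)
    .map((segment) => encodeURIComponent(segment));
  return [cleanBase, ...cleanSegments].join("/");
}

// Hypothetical guard: treat empty values and the synthetic `last` placeholder
// as "no run selected", so no request is issued for them.
function isRealRunId(runId: string | null | undefined): runId is string {
  return !!runId && runId.trim().length > 0 && runId !== "last";
}
```

A caller would check `isRealRunId(id)` first and only then build `joinFrontdoorUrl("/api/v1/advisory-ai", "runs", id)`.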
### ROUTER-LIVE-003-003 - Rebuild and rerun live verification
Status: TODO
Dependency: ROUTER-LIVE-003-002
Owners: QA
Task description:
- Rebuild the affected web artifact, refresh the live gateway/web deployment, rerun targeted contract probes, and rerun the authenticated canonical route sweep to measure the reduced backlog.
Completion criteria:
- [ ] The router/web changes are deployed into the live compose stack.
- [ ] Targeted curl probes for the repaired route families succeed without `404`.
- [ ] The authenticated live sweep is rerun and the remaining failure inventory is recorded.
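
Recording "the remaining failure inventory" amounts to comparing the failing-route sets of two sweeps. The following is a sketch under the assumption that each sweep can be reduced to a plain list of failing route labels; the `diffBacklog` helper and the `BacklogDelta` shape are hypothetical and do not describe the sweep script's real JSON output.

```typescript
// Hypothetical summary shape for comparing two sweep runs.
interface BacklogDelta {
  fixed: string[]; // failed before, passes now
  remaining: string[]; // failed before, still fails
  regressed: string[]; // passed before, fails now
}

// Compare the failing-route labels of an earlier and a later sweep.
function diffBacklog(before: string[], after: string[]): BacklogDelta {
  const beforeSet = new Set(before);
  const afterSet = new Set(after);
  return {
    fixed: before.filter((route) => !afterSet.has(route)),
    remaining: before.filter((route) => afterSet.has(route)),
    regressed: after.filter((route) => !beforeSet.has(route)),
  };
}
```

Tracking `regressed` separately keeps a newly broken route from hiding behind a shrinking total.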
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-03-09 | Sprint created from the authenticated 19-route failure backlog. Root-cause review confirmed that several failures are true frontdoor ownership mismatches in the mounted compose router manifests, while others are web clients composing impossible URLs on top of those broken routes. | Developer |
## Decisions & Risks
- Decision: treat `devops/compose/router-gateway-local.json` as the live authority for this iteration because the compose stack mounts it directly into the gateway container; source `appsettings.json` is parity work, not the live fix by itself.
- Decision: preserve `/v1/runs/*` and `/api/v1/runs/*` for release control. AdvisoryAI runs belong on `/api/v1/advisory-ai/runs` at the browser frontdoor and `/v1/advisory-ai/runs` inside the service, matching the existing router prefix and avoiding product-boundary collisions.
- Risk: the original AI-runs failure was not only a router mismatch. AdvisoryAI had an incomplete composition: run services were not registered and `RunEndpoints` were not mounted, so exposing the correct frontdoor path still requires a backend rebuild in this iteration.
- Risk: the trust-management routes appear to be a larger contract mismatch between a legacy `/api/v1/trust/*` web client and the documented `/api/v1/administration/trust-signing/*` platform surface, which may require a dedicated follow-on iteration once this narrower router/client cluster is cleared.
## Next Checkpoints
- 2026-03-09: land router manifest and web-client repairs.
- 2026-03-09: rebuild the web bundle and refresh the live stack.
- 2026-03-09: rerun the authenticated canonical route sweep and decide the next highest-leverage backlog slice.

@@ -0,0 +1,78 @@
# Sprint 20260309-004 - Notify Live Notifications And AI Runs Repair
## Topic & Scope
- Repair the authenticated live failures on `/ops/operations/ai-runs`, `/ops/operations/notifications`, and `/setup/notifications` that remain after the frontdoor route ownership pass.
- Normalize legacy Notify channel rows so old persisted channel JSON does not break the channels list on clean restarts or reused volumes.
- Remove the stale web-side tenant override that forces notifications requests onto the wrong tenant, and align AI evidence-pack lookups to the public frontdoor contract.
- Working directory: `src/Notify`.
- Allowed coordination edits: `src/Web/StellaOps.Web/src/app/app.config.ts`, `src/Web/StellaOps.Web/src/app/core/api/evidence-pack.client.ts`, `src/Web/StellaOps.Web/src/app/core/api/evidence-pack.client.spec.ts`, `src/Web/StellaOps.Web/src/app/core/api/notify.client.ts`, `src/Web/StellaOps.Web/src/app/core/api/notify.client.spec.ts`, `docs/implplan/SPRINT_20260309_004_Notify_live_notifications_and_ai_runs_repair.md`.
- Expected evidence: focused Notify contract tests, focused Angular API-client specs, targeted live probes against the repaired contracts, and a rerun of the authenticated canonical route sweep.
## Dependencies & Concurrency
- Depends on `SPRINT_20260309_003_Router_live_frontdoor_contract_repair.md` for the repaired AdvisoryAI runs base route and the current authenticated sweep inventory.
- Safe parallelism: stay within `src/Notify/**` plus the explicitly listed web client files; avoid search/reachability/component-revival areas that were recently active.
## Documentation Prerequisites
- `AGENTS.md`
- `src/Notify/AGENTS.md`
- `src/Web/StellaOps.Web/AGENTS.md`
- `docs/qa/feature-checks/FLOW.md`
- `docs/modules/notify/architecture.md`
## Delivery Tracker
### NOTIFY-LIVE-004-001 - Normalize legacy notify channel rows and restore channel diagnostics
Status: DOING
Dependency: none
Owners: Developer
Task description:
- Diagnose why live `/api/v1/notify/channels` fails and repair the Notify WebService read path so persisted legacy channel JSON without canonical `secretRef` still deserializes into a stable `NotifyChannel` model.
- Restore the missing `/api/v1/notify/channels/{channelId}/health` contract so Notifications Studio can fetch per-channel diagnostics instead of always receiving a `404`.
- Preserve meaningful legacy fields instead of dropping them, and add contract coverage that uses the exact legacy row shape observed in the live database.
Completion criteria:
- [ ] `GET /api/v1/notify/channels` no longer fails when legacy config rows omit `secretRef`.
- [ ] `GET /api/v1/notify/channels/{channelId}/health` returns a stable diagnostics payload for existing channels.
- [ ] Legacy config fields are normalized into the returned `NotifyChannelConfig` instead of discarded.
- [ ] Focused Notify contract coverage locks the regression.
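
The normalize-on-read idea can be sketched as follows. This is an illustration in TypeScript, not the Notify service's implementation: the `NormalizedChannelConfig` shape, the `endpoint`/`target`/`properties` field names, and the placeholder `secretRef` format are all assumptions. Only the legacy field names `smtpHost`, `webhookUrl`, and `channel` come from the observed rows.

```typescript
// Hypothetical legacy row shape; only the named fields are taken from this sprint.
interface LegacyChannelConfig {
  secretRef?: string;
  smtpHost?: string;
  webhookUrl?: string;
  channel?: string;
  [key: string]: unknown;
}

// Hypothetical canonical shape; the real `NotifyChannelConfig` may differ.
interface NormalizedChannelConfig {
  secretRef: string; // always present after normalization
  endpoint?: string; // assumed home for `webhookUrl`
  target?: string; // assumed home for `channel`
  properties: Record<string, string>; // remaining non-empty string fields
}

function normalizeChannelConfig(
  raw: LegacyChannelConfig,
  channelId: string,
): NormalizedChannelConfig {
  const properties: Record<string, string> = {};
  for (const [key, value] of Object.entries(raw)) {
    if (key === "secretRef" || key === "webhookUrl" || key === "channel") continue;
    // Preserve meaningful legacy fields (for example `smtpHost`) instead of dropping them.
    if (typeof value === "string" && value.length > 0) properties[key] = value;
  }
  const normalized: NormalizedChannelConfig = {
    // Placeholder reference format is an assumption made for this sketch.
    secretRef: raw.secretRef ?? `legacy://notify/channels/${channelId}`,
    properties,
  };
  if (raw.webhookUrl) normalized.endpoint = raw.webhookUrl;
  if (raw.channel) normalized.target = raw.channel;
  return normalized;
}
```

The point of the sketch is that a row without `secretRef` still yields a complete object, and that unknown legacy fields survive in `properties` rather than being discarded.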
### NOTIFY-LIVE-004-002 - Repair web-side AI runs and notifications callers
Status: DOING
Dependency: NOTIFY-LIVE-004-001
Owners: Developer, QA
Task description:
- Update the web clients so AI evidence-pack lookups use the public `/v1/evidence-packs?runId=` contract and notification calls default to the active authenticated tenant instead of a hard-coded dev tenant.
- Add focused Angular specs for both repaired callers.
Completion criteria:
- [ ] Evidence-pack run queries no longer call `/v1/runs/{runId}/evidence-packs` from the browser frontdoor.
- [ ] Notification requests resolve the live tenant from session/context when no explicit override is supplied.
- [ ] Focused Angular specs cover both repaired behaviors.
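
Both caller repairs reduce to small pure functions, sketched here with hypothetical names (`evidencePacksUrlForRun`, `resolveTenant`) that are not taken from the existing clients. The sketch assumes the collection route is addressed relative to `/v1/evidence-packs` and that the session exposes the active tenant as a plain string.

```typescript
// Query the public collection route by run id instead of nesting under /v1/runs/{runId}.
function evidencePacksUrlForRun(runId: string, base = "/v1/evidence-packs"): string {
  return `${base}?runId=${encodeURIComponent(runId)}`;
}

// Prefer an explicit override, otherwise fall back to the authenticated
// session tenant. There is deliberately no hard-coded default tenant.
function resolveTenant(
  explicitTenant: string | undefined,
  sessionTenant: string | undefined,
): string {
  const tenant = explicitTenant?.trim() || sessionTenant?.trim();
  if (!tenant) {
    throw new Error("No tenant available from override or session");
  }
  return tenant;
}
```

Failing loudly when no tenant is available is a design choice in this sketch; it avoids silently sending requests to the wrong tenant, which is the defect this task removes.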
### NOTIFY-LIVE-004-003 - Rebuild and reverify live pages
Status: TODO
Dependency: NOTIFY-LIVE-004-002
Owners: QA
Task description:
- Rebuild the Notify service and the web bundle, refresh the live compose services, and rerun direct probes plus the authenticated canonical route sweep to confirm the backlog narrowed on the affected pages.
Completion criteria:
- [ ] The updated Notify image and web bundle are deployed into the compose stack.
- [ ] Direct authenticated probes for AI evidence packs and notifications channels/rules/deliveries succeed.
- [ ] The authenticated route sweep is rerun and the remaining failure inventory is recorded.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-03-09 | Sprint created from the warmed authenticated route sweep. Live diagnosis showed AI runs still calling the old browser-internal evidence-pack route, while notifications failures split between a stale hard-coded tenant header and legacy Notify channel rows persisted without canonical `secretRef`. | Developer |
## Decisions & Risks
- Decision: normalize legacy Notify channel rows on read instead of requiring a manual database cleanup. The live database currently contains pre-canonical JSON payloads that carry fields such as `smtpHost`, `webhookUrl`, and `channel` alongside empty metadata; the product cannot treat reused volumes as unsupported.
- Decision: restore the documented channel-health route in Notify itself instead of teaching the web client to suppress diagnostics. The architecture dossier already treats `/channels/{id}/health` as canonical connector behavior.
- Decision: keep the browser on the public evidence-pack collection route and filter by `runId` query, matching the service's documented `/v1/evidence-packs` contract.
- Risk: notifications pages may still surface a separate `/api/v1/notify/audit` contract issue after channels/rules/deliveries are repaired; if it remains visible in Playwright after this iteration, it needs its own follow-on sprint rather than being hidden.
## Next Checkpoints
- 2026-03-09: land Notify compatibility normalization and web client repairs.
- 2026-03-09: rebuild `notify-web` and the web bundle, then refresh the compose services.
- 2026-03-09: rerun the authenticated route sweep and choose the next highest-leverage failure cluster.

@@ -0,0 +1,73 @@
# Sprint 20260309-005 - JobEngine Live Scratch Reset And Ops Scope Repair
## Topic & Scope
- Repair the clean-reset JobEngine failure where a wiped database starts without the `orchestrator` schema, breaking the live `/ops/operations/jobengine` shell immediately after scratch setup.
- Restore the local authority scope bundle used by the compose installer so quota management and pack registry pages and actions stop failing on authorization gaps in the rebuilt shell.
- Keep this pass limited to scratch-start stability and authenticated ops access needed by the current live route backlog.
- Working directory: `src/JobEngine`.
- Allowed coordination edits: `devops/compose/docker-compose.stella-ops.yml`, `devops/compose/envsettings-override.json`, `docs/implplan/SPRINT_20260309_005_JobEngine_live_scratch_reset_and_ops_scope_repair.md`.
- Expected evidence: focused JobEngine unit coverage for startup migration registration, direct authenticated probes for quota/pack routes, and a rerun of the authenticated frontdoor sweep after redeploy.
## Dependencies & Concurrency
- Depends on `SPRINT_20260309_001_Platform_scratch_setup_bootstrap_restore.md` for the current scratch-reset install path and on `SPRINT_20260309_002_FE_live_frontdoor_canonical_route_sweep.md` for the verified live failure inventory.
- Safe parallelism: stay within `src/JobEngine/**` and the explicitly listed compose config files; avoid unrelated frontend/search/component revival work already merged by other agents.
## Documentation Prerequisites
- `AGENTS.md`
- `src/JobEngine/AGENTS.md`
- `docs/qa/feature-checks/FLOW.md`
- `docs/modules/jobengine/architecture.md`
## Delivery Tracker
### JOBENGINE-LIVE-005-001 - Auto-migrate JobEngine on clean reset
Status: DOING
Dependency: none
Owners: Developer
Task description:
- Wire JobEngine onto the shared startup-migrations host so that, after a compose volume is wiped, the `orchestrator` schema is applied automatically before repositories serve live traffic.
- Add focused regression coverage proving the infrastructure registration includes a hosted startup migration.
Completion criteria:
- [ ] `AddJobEngineInfrastructure` registers startup migrations for the `orchestrator` schema.
- [ ] JobEngine infrastructure references the shared migration library directly instead of relying on manual database bootstrap.
- [ ] Focused JobEngine tests lock the registration behavior.
### JOBENGINE-LIVE-005-002 - Restore compose-local ops scopes for quotas and packs
Status: TODO
Dependency: JOBENGINE-LIVE-005-001
Owners: Developer, QA
Task description:
- Expand the compose-local authority scope bundle so the rebuilt UI token includes the real JobEngine quota and pack-registry scopes required by the current operations pages and their primary actions.
Completion criteria:
- [ ] The compose authority scope string includes `orch:quota`.
- [ ] The compose authority scope string includes pack registry scopes needed by the current operations surfaces (`packs.read`, `packs.write`, `packs.run`, `packs.approve`).
- [ ] Direct authenticated probes no longer fail solely because the token is missing those scopes.
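
A small sketch of the scope check implied by these criteria, assuming the compose authority scope bundle is a space-delimited OAuth-style scope string. The `mergeScopes` and `missingScopes` helpers are hypothetical, and `openid` in the usage note is only a stand-in for whatever the bundle already contains; the five required scopes are the ones listed above.

```typescript
// Scopes this task requires in the compose-local bundle.
const REQUIRED_OPS_SCOPES: string[] = [
  "orch:quota",
  "packs.read",
  "packs.write",
  "packs.run",
  "packs.approve",
];

// Append any required scopes that are not already present, keeping the
// original order and dropping duplicates.
function mergeScopes(current: string, required: string[] = REQUIRED_OPS_SCOPES): string {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const scope of [...current.split(/\s+/), ...required]) {
    if (scope.length > 0 && !seen.has(scope)) {
      seen.add(scope);
      merged.push(scope);
    }
  }
  return merged.join(" ");
}

// List required scopes that a token's scope string does not grant.
function missingScopes(tokenScope: string, required: string[] = REQUIRED_OPS_SCOPES): string[] {
  const granted = new Set(tokenScope.split(/\s+/).filter((s) => s.length > 0));
  return required.filter((scope) => !granted.has(scope));
}
```

For example, `mergeScopes("openid packs.read")` yields `"openid packs.read orch:quota packs.write packs.run packs.approve"`, and `missingScopes` on the merged string returns an empty list.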
### JOBENGINE-LIVE-005-003 - Rebuild and reverify the scratch-reset stack
Status: TODO
Dependency: JOBENGINE-LIVE-005-002
Owners: QA
Task description:
- Rebuild the changed JobEngine/web artifacts, refresh the live compose services, and rerun direct probes plus the authenticated canonical route sweep to confirm the scratch-reset backlog has narrowed.
Completion criteria:
- [ ] The updated JobEngine service and web bundle are deployed into the live compose stack.
- [ ] Direct authenticated probes for `/api/v1/jobengine/jobs/summary`, quota endpoints, and pack registry list requests succeed without schema or scope failures.
- [ ] The authenticated live sweep is rerun and the remaining failure inventory is recorded.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-03-09 | Sprint created from the fresh scratch-reset live sweep. Root-cause review confirmed that JobEngine still starts against a wiped database without auto-applying the `orchestrator` schema, and the compose-local authority scope bundle omits quota and pack-registry scopes required by the active ops shell. | Developer |
## Decisions & Risks
- Decision: fix the clean-reset failure at the module root by registering startup migrations in JobEngine infrastructure. Manual seed SQL is not an acceptable recovery path under the repo-wide auto-migration rule.
- Decision: widen the compose-local scope bundle to match the actual scopes enforced by the current JobEngine endpoints. Hiding those routes in the UI would only mask a broken local install.
- Risk: some remaining `/ops/operations/*` failures may still reflect deeper backend contract gaps after migrations and scopes are repaired. Those should move into follow-on sprints with dedicated ownership instead of being papered over here.
## Next Checkpoints
- 2026-03-09: land JobEngine startup migration registration and scope repairs.
- 2026-03-09: rebuild the changed services and web artifact.
- 2026-03-09: rerun the authenticated route sweep and select the next live failure cluster.