Fix ElkSharp gateway target peer conflict polish

This commit is contained in:
master
2026-03-26 13:57:47 +02:00
parent 71edccd485
commit c210115224
4 changed files with 19758 additions and 340 deletions

@@ -95,6 +95,58 @@ Completion criteria:
- [x] The document-processing render shows the `Set emailDispatchFailed -> End` edge repaired to a direct L-shape instead of the previous deep detour
- [x] Targeted renderer tests and the full workflow renderer test project pass
### TASK-007 - Add gateway-specific polygon boundary landing
Status: DONE
Dependency: TASK-006
Owners: Implementer
Task description:
Replace the remaining rectangle-style landing assumptions for `Decision`, `Fork`, and `Join` nodes with gateway-specific polygon-boundary handling. Off-axis gateway entry and exit should land on the actual boundary with short diagonal stubs, gateway-specific backtracking/entry validation should stop misclassifying those routes, and rectangle-side highway detection should no longer treat gateway targets as ordinary target-side highway groups.
Completion criteria:
- [x] Gateway helper logic can project an off-axis lane to a diagonal boundary landing instead of falling back to rectangular snapping
- [x] Workflow renderer tests cover diagonal gateway exit and diagonal gateway entry against `Decision` nodes
- [x] Document-processing artifact render passes again with zero selected broken short highways and zero selected entry-angle violations
- [x] Workflow docs and module-local guidance mention the gateway-specific landing behavior
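
A minimal Python sketch of the difference between rectangular snapping and polygon-boundary landing for a diamond-shaped `Decision` node. It is illustrative only and is not the ElkSharp helper: the `Diamond` type, the function names, and the "ray toward the center" landing rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diamond:
    """Hypothetical diamond gateway: center plus half extents."""
    cx: float
    cy: float
    half_w: float
    half_h: float


def snap_to_rectangle(node: Diamond, px: float, py: float) -> tuple[float, float]:
    """Rectangle-style landing: clamp the point to the bounding box."""
    x = min(max(px, node.cx - node.half_w), node.cx + node.half_w)
    y = min(max(py, node.cy - node.half_h), node.cy + node.half_h)
    return (x, y)


def land_on_diamond(node: Diamond, px: float, py: float) -> tuple[float, float]:
    """Polygon landing: intersect the ray from the center toward (px, py)
    with the diamond boundary |dx|/half_w + |dy|/half_h = 1."""
    dx, dy = px - node.cx, py - node.cy
    norm = abs(dx) / node.half_w + abs(dy) / node.half_h
    if norm == 0:
        raise ValueError("approach point coincides with the node center")
    scale = 1.0 / norm
    return (node.cx + dx * scale, node.cy + dy * scale)
```

For an off-axis approach point, the rectangular clamp can return a bounding-box corner that is not on the diamond at all, while the polygon landing stays on the real boundary and leaves a short diagonal stub between the lane and the landing point.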
### TASK-008 - Refine gateway diagonals to avoid corner-vertex landings
Status: DONE
Dependency: TASK-007
Owners: Implementer
Task description:
Tighten the gateway-boundary helper so `Decision`, `Fork`, and `Join` nodes still allow short diagonal stubs on their side faces, but no longer land those diagonals directly on a gateway corner vertex. Any remaining corner-diagonal cases must surface as entry-angle defects so the local repair path and the document-processing artifact assertion can catch them.
Completion criteria:
- [x] Side-face gateway diagonals remain valid for off-axis entry and exit
- [x] Corner-vertex gateway diagonals are rejected and shifted onto the adjacent edge interior
- [x] The document-processing artifact test asserts zero selected gateway corner diagonals
- [x] Workflow docs and module-local guidance mention the corner-diagonal exclusion
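
The corner-vertex exclusion can be sketched as follows. This is a hedged Python illustration, not the ElkSharp code: `shift_off_vertex`, the `inset` fraction, and the rule of moving toward the neighbor vertex nearest the approach point are assumptions chosen for the example.

```python
import math

Point = tuple[float, float]


def shift_off_vertex(
    polygon: list[Point],
    landing: Point,
    approach: Point,
    inset: float = 0.25,
    tol: float = 1e-6,
) -> Point:
    """If `landing` sits on a polygon vertex, move it onto the interior of
    the adjacent edge that faces the approach point; otherwise keep it."""
    count = len(polygon)
    for i, vertex in enumerate(polygon):
        if math.dist(vertex, landing) > tol:
            continue
        prev_v = polygon[i - 1]
        next_v = polygon[(i + 1) % count]
        # Pick the neighbor closer to where the lane comes from.
        # Ties resolve to the previous vertex, which keeps the result deterministic.
        neighbor = min((prev_v, next_v), key=lambda q: math.dist(q, approach))
        return (
            vertex[0] + (neighbor[0] - vertex[0]) * inset,
            vertex[1] + (neighbor[1] - vertex[1]) * inset,
        )
    return landing
```

A validator built on the same idea would flag any landing that still matches a vertex after this shift, which corresponds to the entry-angle defect described in the task.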
### TASK-009 - Fix gateway target slots and repeat-corridor node safety
Status: DONE
Dependency: TASK-008
Owners: Implementer
Task description:
Finish the gateway join fix by making gateway target slots polygon-aware instead of rectangular, while keeping local repair scoped to penalized lanes. Also remove the repeat-collector loophole that let a preserved outer corridor skip node-crossing repair when the pre-corridor prefix still crossed a node.
Completion criteria:
- [x] Gateway target slot assignment uses polygon-face intersections so repaired gateway arrivals do not collapse back onto the same face rail
- [x] Restricted local repair computes target-slot spacing against the full peer set on the same target side
- [x] Repeat-collector pre-corridor prefixes can reroute into a preserved corridor when they cross a node
- [x] The document-processing artifact render passes with zero selected node crossings and zero selected target-approach joins
- [x] The full workflow renderer test project passes
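
A rough Python sketch of why restricted local repair needs the full peer set when it picks target slots. The names `pick_slots` and `point_on_face`, the evenly spaced candidate fractions, and the "farthest from occupied slots" rule are assumptions for illustration; the actual slot assignment may differ.

```python
Point = tuple[float, float]


def point_on_face(start: Point, end: Point, t: float) -> Point:
    """Map a fraction t in [0, 1] onto a polygon face (a line segment)."""
    return (start[0] + (end[0] - start[0]) * t, start[1] + (end[1] - start[1]) * t)


def pick_slots(
    occupied: dict[str, float],
    repair_ids: list[str],
    slot_count: int,
) -> dict[str, float]:
    """Choose a face fraction for each repaired edge.

    `occupied` holds the fractions of unchanged peer edges on the same face.
    Each repaired edge takes the candidate farthest from everything already
    placed, so it cannot land on an existing arrival rail while a free slot
    remains."""
    candidates = [(i + 1) / (slot_count + 1) for i in range(slot_count)]
    taken = list(occupied.values())
    result: dict[str, float] = {}
    for edge_id in sorted(repair_ids):  # sorted for deterministic output
        best = max(
            candidates,
            key=lambda t: min((abs(t - o) for o in taken), default=1.0),
        )
        result[edge_id] = best
        taken.append(best)
    return result
```

If `occupied` were left empty because only the penalized subset was inspected, a repaired edge could be handed the same fraction as an unchanged peer, which is the collapse the completion criteria rule out.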
### TASK-010 - Finish source-departure lane separation and placement-grid follow-up
Status: DONE
Dependency: TASK-009
Owners: Implementer
Task description:
Add a dedicated source-side same-lane rule so two edges leaving the same source face cannot silently share the same departure lane unless an explicit corridor/highway exception applies. Separate node placement spacing from the routing lattice by deriving a placement grid from the average non-terminal node size, then revalidate the document-processing artifact so the new source-side rule does not leave late boundary-angle or target-join regressions behind.
Completion criteria:
- [x] Source-departure same-lane conflicts surface as blocking shared-lane issues instead of being treated only as target-side joins or generic proximity
- [x] Left-to-right placement spacing derives from an average-node-size placement grid rather than the edge-routing lattice alone
- [x] The document-processing artifact render revalidates with zero selected shared-lane violations and no new boundary-angle or target-join regressions
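
The source-departure same-lane rule can be illustrated with a small Python sketch. The `Departure` record, the `tolerance` parameter, and the `corridor_exempt` flag are hypothetical names for this example and do not describe the ElkSharp data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Departure:
    edge_id: str
    source_id: str
    face: str                      # "left", "right", "top", or "bottom"
    lane: float                    # coordinate of the first segment leaving the face
    corridor_exempt: bool = False  # explicit corridor/highway exception


def find_shared_lane_conflicts(
    departures: list[Departure],
    tolerance: float = 0.5,
) -> list[tuple[str, str]]:
    """Return pairs of edges that leave the same source face on the same lane."""
    groups: dict[tuple[str, str], list[Departure]] = defaultdict(list)
    for dep in departures:
        if not dep.corridor_exempt:
            groups[(dep.source_id, dep.face)].append(dep)

    conflicts: list[tuple[str, str]] = []
    for items in groups.values():
        items.sort(key=lambda d: (d.lane, d.edge_id))
        # After sorting, lanes that coincide are adjacent in the list.
        for first, second in zip(items, items[1:]):
            if second.lane - first.lane <= tolerance:
                conflicts.append((first.edge_id, second.edge_id))
    return sorted(conflicts)
```

A non-empty result would count as a blocking shared-lane issue in this sketch, independent of any target-side join or proximity scoring.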
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
@@ -112,8 +164,20 @@ Completion criteria:
| 2026-03-23 | Expanded the iterative-router pressure path from the accidental 2-attempt/4-strategy clamp to bounded multi-attempt retries with a wider finite strategy sweep, added stagnation cutoffs to avoid blind repetition, and wired the document-processing artifact test to emit `elksharp.progress.log` plus in-memory progress diagnostics so long-running strategy searches can be inspected while they are still running. A live run confirmed the new path executed `Strategy 1 attempt 1`, `attempt 2`, `attempt 3`, then advanced to `Strategy 2` instead of stopping after two attempts. | Implementer |
| 2026-03-24 | Added per-attempt phase timings and route-pass counters to the iterative diagnostics JSON, regenerated the document-processing artifact with `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~DocumentProcessingWorkflow_WhenRenderedWithElkSharp_ShouldProducePngWithZeroNodeCrossings" -v minimal` (1/1, 50s), and confirmed the runtime hotspot is overwhelmingly `route-all-edges`: for the selected `reverse` strategy the three attempts spent about `45.3s` in `route-all-edges` versus about `15.9ms` in all post-processing/scoring phases combined. The same run still reported `ExcessiveDetourViolations=1` for `edge/33`, so the shortest-path issue remains unresolved and requires a local detour-repair path rather than more full-graph retries. | Implementer |
| 2026-03-24 | Reworked iterative retry attempts to repair only penalized edges after the first full strategy pass, made attempt 2 prioritize shortest-path detours, narrowed the protected-corridor exemption so ordinary forward overshoots still qualify for detour repair, and revalidated with `dotnet build src/__Libraries/StellaOps.ElkSharp/StellaOps.ElkSharp.sln`, `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~DocumentProcessingWorkflow_WhenRenderedWithElkSharp_ShouldProducePngWithZeroNodeCrossings" -v minimal` (1/1, 22s), and `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore` (20/20). The new artifact diagnostics show attempt 2+ `Mode=local-repair` with rerouted-edge counts below the full graph, and the `Set emailDispatchFailed -> End` path is now the direct L-shape instead of the previous deep outer detour. | Implementer |
| 2026-03-24 | Added late local geometry repair for node-side entry/exit angles, repeat-collector return-lane stacking, and target-side slot spacing; narrowed repeat-collector target-join scoring so the shared outer collector column is not miscounted as a target-side join; updated the backward-family regression expectations; and revalidated with `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~ElkSharpWorkflowRenderLayoutEngineTests" -v minimal` (11/11) plus `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~DocumentProcessingWorkflow_WhenRenderedWithElkSharp_ShouldProducePngWithZeroNodeCrossings" -v minimal` (1/1, 32s). The regenerated document-processing artifact now reaches `NodeCrossings=0`, `BrokenShortHighways=0`, `RepeatCollectorCorridorViolations=0`, `EntryAngleViolations=0`, `TargetApproachJoinViolations=0`, and `ExcessiveDetourViolations=0`, with strategy `reverse` becoming a valid selected result. | Implementer |
| 2026-03-24 | Added a blocking target-approach-backtracking metric plus local shortest-path repair so `Execute Batch -> Check Result` no longer curls past the target side before returning, kept attempt 2+ focused on penalized lanes only, and updated the backward-family collector regression to allow the nearest loop to take the new shorter direct return while the remaining outer-loop family still stacks on shared top collector lanes. Revalidated with `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~DocumentProcessingWorkflow_WhenLaidOutWithElkSharp_ShouldNotBacktrackIntoCheckResult" -v minimal` (1/1) and `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~LayoutAsync_WhenBackwardFamilySharesTarget_ShouldStackOuterCollectorLanes" -v minimal` (1/1). | Implementer |
| 2026-03-24 | Optimized the A* hot path by precomputing node-obstacle blocked step masks per route, replacing the closed-set `HashSet` with indexed state flags, and adding cheap soft-obstacle bounding-box rejection before exact intersection/proximity math. Measured document-processing render time dropped from `41s` test duration / `62.34s` wall clock to `3s` / `8.94s`, and the full renderer test project dropped from `1m35s` to `6s` test duration (`17.25s` wall clock). Revalidated with `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~DocumentProcessingWorkflow_WhenRenderedWithElkSharp_ShouldProducePngWithZeroNodeCrossings" -v minimal` and `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore -v minimal` (21/21). | Implementer |
| 2026-03-24 | Added gateway-specific polygon-boundary landing for `Decision`/`Fork`/`Join`, including diagonal-stub projection helpers, gateway-aware boundary-angle/backtracking scoring, and exclusion of gateway targets from rectangle-style short-highway grouping. Added focused gateway regression tests, regenerated the document-processing artifact render, and revalidated with `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~GatewayBoundaryHelpers_WhenDecisionAnchorIsOffAxis_ShouldProjectDiagonalStub|FullyQualifiedName~LayoutAsync_WhenDecisionSourceExitsTowardLowerBranch_ShouldUseDiagonalGatewayExit|FullyQualifiedName~LayoutAsync_WhenDecisionTargetIsReachedOffAxis_ShouldUseDiagonalGatewayEntry" -v minimal`, `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~DocumentProcessingWorkflow_WhenRenderedWithElkSharp_ShouldProducePngWithZeroNodeCrossings" -v minimal`, and `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore -v minimal` (24/24). | Implementer |
| 2026-03-24 | Tightened gateway polygon landing again so 45-degree stubs are kept on gateway side faces but not on gateway corner vertices, added helper and artifact regressions for corner-diagonal rejection, regenerated the document-processing render, visually checked `elksharp.png`, and revalidated with `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~GatewayBoundaryHelpers|FullyQualifiedName~DecisionSourceExitsTowardLowerBranch|FullyQualifiedName~DecisionTargetIsReachedOffAxis|FullyQualifiedName~DocumentProcessingWorkflow_WhenLaidOutWithElkSharp_ShouldNotBacktrackIntoCheckResult" -v minimal`, `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~DocumentProcessingWorkflow_WhenRenderedWithElkSharp_ShouldProducePngWithZeroNodeCrossings" -v minimal`, and `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore -v minimal` (26/26). | Implementer |
| 2026-03-24 | Finished the gateway-face-slot and repeat-corridor safety follow-up: gateway target slots now come from polygon-face intersections instead of rectangular side slots, restricted local repair computes slot spacing against the full peer set, and above-corridor repeat collectors reroute only the pre-corridor prefix when that prefix crosses a node. Revalidated with `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~GatewayBoundaryHelpers|FullyQualifiedName~DecisionSourceExitsTowardLowerBranch|FullyQualifiedName~DecisionTargetIsReachedOffAxis" -v minimal` (7/7), `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~DocumentProcessingWorkflow_WhenRenderedWithElkSharp_ShouldProducePngWithZeroNodeCrossings" -v minimal` (1/1), and `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore -v minimal` (28/28). | Implementer |
| 2026-03-24 | Fixed the latest gateway-source and repeat-return regressions without widening retries back to whole-graph reroutes: gateway-source dominant-axis detours are now only asserted when a clean direct repair opportunity actually exists, the focused blocker regression keeps a right-facing gateway exit that climbs above the blocker before continuing, and the selected document-processing artifact again keeps repeat-return lanes outside the `Load Configuration` clearance band. Revalidated with `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~GatewayBoundaryHelpers_WhenDecisionSourceHeadsMostlyRight_ShouldUseDirectFaceExit|FullyQualifiedName~GatewayBoundaryHelpers_WhenDecisionSourceHasBlockingNode_ShouldRepairOnlyTheLocalExitPrefix|FullyQualifiedName~GatewayBoundaryHelpers_WhenDecisionSourceAlreadyTurnsDownIntoBlocker_ShouldRecoverRightFacingExitFirst|FullyQualifiedName~DocumentProcessingWorkflow_WhenRenderedWithElkSharp_ShouldProducePngWithZeroNodeCrossings" -v minimal` (4/4) and `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore -v minimal` (38/38). | Implementer |
| 2026-03-25 | Reopened ElkSharp follow-up work for the user-reported source-side same-lane conflict between `Internal Notification -> Has Recipients` and `Internal Notification -> Set internalNotificationFailed`. Added source-departure join spreading plus blocking `SharedLaneViolations`, derived placement spacing from an average-node-size placement grid, and verified the focused helper regression with `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~SourceDepartureHelpers_WhenOutgoingEdgesShareTheSameDepartureLane_ShouldSpreadOnlyTheConflictingPeer" -v minimal` (1/1). The current document-processing artifact now reports `SharedLaneViolations=0` for the selected route, but still has unresolved late boundary-angle / target-join regressions (`EntryAngleViolations=6`, `TargetApproachJoinViolations=1`), so TASK-010 remains open. | Implementer |
| 2026-03-25 | Tightened the iterative local-repair planner so attempt 2+ now selects only currently failing edges plus exact conflict peers instead of padding the repair set with generic ranked edges, and added a lock-aware parallel local builder that computes candidates concurrently but serializes overlapping source/target neighborhoods before merging deterministically. Revalidated with `dotnet build src/__Libraries/StellaOps.ElkSharp/StellaOps.ElkSharp.sln -v minimal` and `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~DocumentProcessingWorkflow_WhenRenderedWithElkSharp_ShouldProducePngWithZeroNodeCrossings" -v minimal` (still failing in ~20s on `SharedLaneViolations=1`, `UnderNodeViolations=2`; selected offender cluster remains `edge/15+edge/17`, `edge/9`, `edge/15`). | Implementer |
| 2026-03-26 | Cleared the last document-processing handoff by letting gateway target peer-conflict candidates start from slotted feeder paths and reusing focused target-peer conflict polish during transactional final-detour repair. Revalidated with `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~Debug_DumpDocumentProcessingFinalDetourOffenders" -v normal --logger "console;verbosity=normal"` (1/1) and `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~DocumentProcessingWorkflow_WhenRenderedWithElkSharp_ShouldProducePngWithZeroNodeCrossings" -v minimal` (1/1, 2m42s). | Implementer |
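
The A* hot-path optimization logged on 2026-03-24 (precomputed blocked step masks and indexed closed-state flags) can be sketched in Python on a plain grid. This is an illustration under simplifying assumptions (unit-cost 4-neighbor grid, cell obstacles, Manhattan heuristic) and not the ElkSharp router; all names are hypothetical.

```python
import heapq

DIRS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def build_blocked_masks(width: int, height: int, obstacles: set[tuple[int, int]]) -> bytearray:
    """Precompute, per cell, a bitmask of directions that are blocked."""
    masks = bytearray(width * height)
    for y in range(height):
        for x in range(width):
            mask = 0
            for bit, (dx, dy) in enumerate(DIRS):
                nx, ny = x + dx, y + dy
                outside = not (0 <= nx < width and 0 <= ny < height)
                if outside or (nx, ny) in obstacles:
                    mask |= 1 << bit
            masks[y * width + x] = mask
    return masks


def shortest_path_length(width, height, masks, start, goal):
    """A* that reads the masks instead of rescanning obstacles per neighbor.
    The closed set is a flat bytearray indexed by cell, not a hash set."""
    closed = bytearray(width * height)
    gx, gy = goal
    heap = [(abs(gx - start[0]) + abs(gy - start[1]), 0, start[0], start[1])]
    while heap:
        _, cost, x, y = heapq.heappop(heap)
        idx = y * width + x
        if closed[idx]:
            continue
        closed[idx] = 1
        if (x, y) == goal:
            return cost
        mask = masks[idx]
        for bit, (dx, dy) in enumerate(DIRS):
            if mask & (1 << bit):
                continue  # blocked step, known without any obstacle scan
            nx, ny = x + dx, y + dy
            if not closed[ny * width + nx]:
                estimate = cost + 1 + abs(gx - nx) + abs(gy - ny)
                heapq.heappush(heap, (estimate, cost + 1, nx, ny))
    return None
```

The mask build is paid once, after which every neighbor expansion is a bit test.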
## Decisions & Risks
- 2026-03-26: The remaining document-processing defect was not another retry-budget issue. Gateway target peer-conflict candidate building needed a slotted feeder so focused peer-conflict polish could separate same-face arrivals without restoring the final excessive detour.
- 2026-03-25 follow-up: the selected document-processing artifact now enforces zero below-graph lanes and zero overlong 45-degree segments, and gateway source exits are no longer allowed to leave from fork/join tip vertices. Gateway target join detection/spreading now groups arrivals by their landed boundary band instead of letting gateway arrivals slip through as highway-like exemptions. Targeted evidence: `dotnet test src/Workflow/__Tests/StellaOps.Workflow.Renderer.Tests/StellaOps.Workflow.Renderer.Tests.csproj --no-restore --filter "FullyQualifiedName~DocumentProcessingWorkflow_WhenRenderedWithElkSharp_ShouldProducePngWithZeroNodeCrossings" -v minimal` (1/1 pass, refreshed 20260325 artifact). That checkpoint still left TASK-010 open; the 2026-03-26 peer-conflict fix closes it.
- There was no module-local `AGENTS.md` under `src/__Libraries/StellaOps.ElkSharp/`; this sprint adds one before code changes so the module is no longer undocumented.
- Cross-module edits are limited to workflow renderer tests and workflow engine docs because the implementation changes a shared library used by those surfaces.
- The iterative router must remain deterministic. Seeded-random strategy variants are allowed only when the seed is graph-stable so the same graph yields the same candidate set and final output.
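
One way to obtain a graph-stable seed, sketched in Python under stated assumptions: the seed is derived from a content hash of the sorted node ids and edges, so input ordering does not affect it. The function names and the hashing scheme are hypothetical and are not taken from ElkSharp.

```python
import hashlib
import random


def graph_stable_seed(node_ids: list[str], edges: list[tuple[str, str]]) -> int:
    """Derive a seed from graph content only, independent of input order.
    A content hash is used because Python's built-in hash() of strings is
    salted per process and would not be stable across runs."""
    digest = hashlib.sha256()
    for node_id in sorted(node_ids):
        digest.update(node_id.encode("utf-8"))
        digest.update(b"\x00")
    digest.update(b"\x01")  # separator between the node and edge sections
    for source, target in sorted(edges):
        digest.update(f"{source}->{target}".encode("utf-8"))
        digest.update(b"\x00")
    return int.from_bytes(digest.digest()[:8], "big")


def strategy_order(node_ids, edges, strategies: list[str]) -> list[str]:
    """Shuffle strategy variants with a generator seeded from the graph."""
    rng = random.Random(graph_stable_seed(node_ids, edges))
    order = sorted(strategies)  # normalize before shuffling
    rng.shuffle(order)
    return order
```

With this shape, the same graph yields the same candidate order regardless of how nodes and edges were enumerated.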
@@ -132,6 +196,18 @@ Completion criteria:
- Iterative retries now repair only the penalized subset of edges after the first full-strategy pass. Diagnostics record the route mode and repaired edge ids so the document-processing artifact can prove that attempts 2 and later no longer reroute the whole graph.
- The previous shortest-path exemption for any edge with corridor-like bend points was too broad and hid ordinary forward overshoot artifacts such as `edge/33`. Only protected reverse/corridor routes now keep that exemption; forward overshoots are eligible for local detour repair.
- Small or protected graphs now short-circuit to the baseline route before the iterative sweep. That preserves existing sink-corridor, backward-edge, and port-anchor contracts while still allowing the larger document-processing workflow to use iterative local repair.
- Final local geometry repair now runs after iterative post-processing to enforce node-side entry/exit angles, repeat-return lane separation, and target-slot spacing without sending more edges back through A*. The document-processing artifact test now asserts zero selected broken highways, zero repeat-collector lane collapses, zero node-side angle violations, and zero disallowed target-side joins.
- The shortest-path fix now treats target-side backtracking as a blocking defect. Late local repair trims or reroutes only the offending lane, and attempt 2+ tries a direct orthogonal shortcut before falling back to a low-penalty diagonal A* candidate when another rule blocks the straight repair.
- The A* router now precomputes blocked node-step masks, which removed the repeated obstacle scan from neighbor expansion and cut document-processing `route-all-edges` time from tens of seconds to under a second per strategy attempt. Future A* optimization should extend the same idea to previously committed edge lanes: build lane-occupancy masks or blocked-segment maps for soft obstacles, and derive intermediate grid spacing from roughly one third of the average service-task width/height instead of the current fixed dense intermediate spacing.
- Gateway nodes (`Decision`, `Fork`, `Join`) now use polygon-boundary landing instead of rectangle-side snapping. Off-axis lanes are allowed to finish with short diagonal stubs on the real gateway boundary, gateway-target backtracking detection now checks only the final near-end gateway approach instead of applying rectangle-side overshoot heuristics, and rectangle-style short-highway grouping is skipped for gateway targets because those cases are governed by gateway-boundary spacing rather than shared rectangular arrival rails.
- Gateway diagonals are now restricted to gateway side faces. If a candidate lands on a gateway corner vertex, the helper shifts it onto the adjacent edge interior and the boundary-angle validator rejects any remaining corner diagonal so local repair and artifact assertions can catch it.
- Gateway target repairs now use polygon-face slot projection instead of rectangular side slots. When only a penalized subset of edges is being repaired, target-slot spacing still considers the unchanged peer edges on that same target side so the repaired edge cannot collapse back into the existing arrival rail.
- Repeat-collector edges with preserved outer corridors are no longer exempt from node-crossing repair. If the prefix that leads into the corridor crosses a node, that prefix is rerouted into the preserved corridor while the outer corridor segment remains intact.
- Gateway-source dominant-axis scoring is now opportunity-gated: a gateway source is only treated as leaving on the wrong axis when a clean downstream-facing repair opportunity actually exists. Obstacle-blocked local exits can still take a short dogleg while the document-processing artifact assertions keep them clear of blockers and out of unrelated node clearance bands.
- The user-reported `Internal Notification` overlap was not a target-side highway issue. The previous rule set modeled target-side joins and repeat-corridor sharing, but not two edges leaving the same source face on the same departure lane. TASK-010 adds source-departure join spreading and blocking `SharedLaneViolations` for that case.
- Node placement spacing now uses a separate placement grid derived from the average non-terminal node width/height (`ResolvePlacementGrid`) instead of depending only on the routing lattice. The focused helper/layout checks are green, but the end-to-end document-processing artifact still needs a clean rerun after the late boundary-angle / target-join regressions are resolved.
- Iterative local repair now stays constrained to currently failing lanes and exact conflict peers. The planner no longer fills the repair budget with unrelated high-severity edges once the current failing rule set has been seeded.
- Per-iteration local repair candidate building can now run in parallel, but builds that share a source or target neighborhood acquire the same lock and wait instead of racing through the same local conflict zone. Current measured document-processing renders still finish in about 20 seconds, so the remaining work is repair quality for the `edge/9` / `edge/15` cluster rather than retry churn.
- Optimization plan for the next pass:
1. Build a reusable immutable per-strategy routing context so grid lines, blocked segment masks, and target-slot metadata are computed once per strategy instead of once per edge route.
2. Replace global whole-graph retries for soft penalties with issue-focused repair passes: detour edge repair, target-side join repair, and proximity cluster repair.