# ElkSharp Architectural Decisions

This document records architectural decisions made during the ElkSharp rendering
engine development. Each record follows the ADR (Architecture Decision Record)
format: status, context, decision, consequences.

---

## ADR-1: Short-Stub Exit Normalization

**Status**: Accepted

**Context**:
`NormalizeExitPath` creates a perpendicular stub from the source node boundary to
establish a clean exit direction. By default the stub runs to the anchor X
coordinate and ends at `Math.Max(sourceX + 24, anchorX)`. For edges where the
anchor is far from the source (e.g., a long forward edge), this creates a
horizontal segment of 1000+ pixels that can cross intermediate nodes in the same
Y-band.

The long horizontal stub was originally designed to produce clean orthogonal exits,
but it assumed the Y-band between source and anchor was unoccupied. In dense graphs,
intermediate nodes occupy the same Y-band, and the long stub crosses through them,
creating entry-angle violations and node-crossing penalties.

**Decision**:
When the long stub fails `HasClearSourceExitSegment` (i.e., the horizontal segment
between `sourceX + 24` and `anchorX` crosses a node), try a short stub instead. The
short stub extends only `sourceX +/- 24px` -- just enough to establish the
perpendicular exit direction without reaching into the occupied Y-band.

The short stub is controlled by the `useShortStub` parameter in `NormalizeExitPath`.
It fires ONLY when the default long stub fails clearance. The long stub remains the
default because it produces cleaner, more direct paths when clearance is available.
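The fallback rule can be sketched as follows. This is a minimal illustration in Python, not the ElkSharp implementation: `Rect`, `has_clear_exit_segment`, and `exit_stub_end_x` are hypothetical names, only a rightward exit is modelled, and the clearance test is a simplified stand-in for `HasClearSourceExitSegment`.

```python
from dataclasses import dataclass

STUB = 24.0  # px, the stub length named in this ADR


@dataclass(frozen=True)
class Rect:
    """Hypothetical node bounding box."""
    left: float
    top: float
    right: float
    bottom: float


def has_clear_exit_segment(y, x_from, x_to, obstacles):
    """True if the horizontal segment at height y crosses no obstacle."""
    lo, hi = min(x_from, x_to), max(x_from, x_to)
    return not any(
        r.top < y < r.bottom and r.left < hi and r.right > lo
        for r in obstacles
    )


def exit_stub_end_x(source_x, anchor_x, y, obstacles):
    """Return the X where the exit stub ends (rightward exit only)."""
    long_end = max(source_x + STUB, anchor_x)
    if has_clear_exit_segment(y, source_x + STUB, long_end, obstacles):
        return long_end       # default: long stub to the anchor
    return source_x + STUB    # fallback: short stub
```

The long stub is always tried first, so edges with a clear Y-band keep their existing route.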

**Consequences**:
- Fixes entry-angle violations where intermediate nodes in occupied Y-bands blocked
  the perpendicular exit path.
- The short stub creates a 24px exit segment from which subsequent routing can
  build a clean corridor without crossing obstacles.
- Does not change behavior for edges with clear exit paths (the long stub is still
  preferred when it passes clearance).

---

## ADR-2: Gateway Vertex Entries

**Status**: Accepted

**Context**:
Gateway tips (diamond corner vertices at left and right extremes) were blocked for
all edges because source exits from tips create "pin" visual artifacts -- a thin
spike extending from the corner that looks like a rendering glitch rather than an
intentional connection.

However, for target entries (incoming edges), tips are the natural convergence point.
Multiple edges arriving at a decision gateway naturally converge toward its left tip.
Blocking tip entries forced edges to route to face interiors, which required
additional bends, created shared-lane conflicts between fork output edges, and
produced visually cluttered arrivals.

**Decision**:
Allow left/right tip vertices for target entries via a 3-way coordination mechanism:

1. `IsAllowedGatewayTipVertex`: Returns `true` for left and right tip vertices when
   the edge direction is "target entry" (incoming).
2. `HasValidGatewayBoundaryAngle`: Accepts any external approach angle at allowed
   tip vertices. Without this relaxation, the angle validator would reject diagonal
   approaches to the tip even though they are visually correct.
3. `CountBoundarySlotViolations`: Skips the slot-occupancy check when all entries
   at a boundary point share the same allowed tip vertex. Since a vertex is a single
   geometric point (not a face segment), slot capacity is not meaningful.

Source exits from tips remain blocked by `ForceDecisionSourceExitOffVertex`.
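One way to keep the three checks aligned is to have the angle and slot checks both delegate to the same tip predicate. The sketch below illustrates that idea in Python under stated assumptions: the `Vertex` enum, the function names, the `capacity` default, and the boolean inputs are all hypothetical simplifications, not the signatures of the real methods.

```python
from enum import Enum


class Vertex(Enum):
    LEFT_TIP = "left"
    RIGHT_TIP = "right"
    TOP = "top"
    BOTTOM = "bottom"
    NONE = "none"  # a point on a face interior, not a vertex


TIPS = {Vertex.LEFT_TIP, Vertex.RIGHT_TIP}


def is_allowed_tip(vertex, is_target_entry):
    # Check 1: tips are open for incoming edges only.
    return is_target_entry and vertex in TIPS


def has_valid_angle(vertex, is_target_entry, is_external, face_angle_ok):
    # Check 2: any external approach is accepted at an allowed tip;
    # everywhere else the ordinary face-angle result applies.
    if is_allowed_tip(vertex, is_target_entry):
        return is_external
    return face_angle_ok


def slot_violations(entries, capacity=1):
    # Check 3: entries is a list of (vertex, is_target_entry) pairs that
    # share one boundary point. A shared allowed tip has no slot limit.
    vertices = {v for v, _ in entries}
    if len(vertices) == 1 and all(is_allowed_tip(v, t) for v, t in entries):
        return 0
    return max(0, len(entries) - capacity)
```

Because checks 2 and 3 call `is_allowed_tip`, a change to the tip rule propagates to both automatically in this sketch.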

**Consequences**:
- Eliminates shared-lane conflicts from fork output edges that were forced to
  route around blocked tips.
- Creates cleaner convergent target entries where multiple edges naturally meet
  at the gateway's leading tip.
- The 3-way coordination must stay synchronized: changing any one of the three
  checks without updating the others causes cascading boundary-slot violations,
  angle rejections, or vertex blocking.
- Source exits remain clean -- the "pin" artifact is prevented for outgoing edges.

---

## ADR-3: Y-Gutter Expansion

**Status**: Accepted

**Context**:
After Sugiyama placement and initial edge routing, some edges route through or
alongside nodes because the placement did not leave sufficient vertical space for
routing corridors.

Two prior approaches failed:

1. **Post-placement individual node shifting**: Moving individual nodes to create
   clearance disrupted the barycenter ordering that Sugiyama optimized. The shifted
   node changed the median calculations for adjacent layers, causing cascading
   position changes that degraded overall layout quality.

2. **Post-refinement clearance insertion**: Adding vertical space after refinement
   failed because subsequent optimization passes (compact-toward-incoming, grid
   alignment) overrode the inserted space, collapsing the corridors.

**Decision**:
Use the same pattern as X-gutter expansion: shift entire Y-bands (all nodes below
the violation point) together, preserving relative positions.

- Scan routed edges for horizontal segments with under-node or alongside violations.
- Identify the blocking node and compute the required clearance.
- Shift ALL nodes with Y > violation Y downward by the clearance amount.
- Re-route edges with the expanded corridors.
- Run up to 2 iterations to handle cascading violations.

The expansion runs after X-gutters (inter-layer gaps are set) and before compact
passes (so compaction respects the new corridors).
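The band shift and the bounded loop can be sketched as below. This is an illustrative Python sketch, not the engine code: `Node`, `shift_band`, `expand_y_gutters`, and the `find_violation` / `reroute` callbacks are hypothetical, and violation detection is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: str
    y: float


def shift_band(nodes, violation_y, clearance):
    """Move every node below violation_y down by the same amount.

    Shifting the whole band keeps the relative positions of the
    shifted nodes unchanged.
    """
    for node in nodes:
        if node.y > violation_y:
            node.y += clearance


def expand_y_gutters(nodes, find_violation, reroute, max_iterations=2):
    """Expand, re-route, and repeat; return the number of expansions done.

    find_violation(nodes) returns (violation_y, clearance) or None.
    """
    for iteration in range(max_iterations):
        violation = find_violation(nodes)
        if violation is None:
            return iteration
        violation_y, clearance = violation
        shift_band(nodes, violation_y, clearance)
        reroute()
    return max_iterations
```

Every node below the violation moves by the same amount, so the spacing inside the shifted band is preserved.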

**Consequences**:
- Creates adequate routing corridors without disrupting within-layer ordering.
- Routing gets clean paths on the first pass because the corridors exist before
  the iterative optimizer runs.
- The downward-only shift direction ensures the graph grows in one direction,
  avoiding oscillation between iterations.
- Allowing up to 2 iterations handles the case where fixing one violation exposes
  another (the shifted band may push edges into a new conflict zone).

---

## ADR-4: Corridor Rerouting for Long Sweeps

**Status**: Accepted

**Context**:
Forward edges spanning 10+ layers (e.g., failure/timeout paths from an early task
to the End event) route horizontally at the source's Y coordinate. In a dense graph,
this horizontal segment crosses many intermediate nodes -- a 3076px sweep in the
document-processing test case.

No amount of Y-adjustment can clear a sweep that crosses the entire graph width.
Y-gutter expansion would need to push the entire graph below the sweep, which
defeats the purpose of the layout.

Backward edges already use corridor routing (above the graph field) because they
inherently travel against the layout direction. Forward edges did not have this
treatment.

**Decision**:
Route long forward sweeps (spanning > 40% of the graph width) through the top
corridor at `graphMinY - 56`:

1. Exit the source with a 24px perpendicular stub.
2. Route vertically to `graphMinY - 56`.
3. Route horizontally across the top corridor.
4. Descend to the target.

The 24px perpendicular exit stub is critical: without it, `NormalizeBoundaryAngles`
collapses the vertical corridor segment back into the source boundary, destroying
the corridor route.

Near-boundary sweeps (edges that would conflict with the graph's lower edge) use
the bottom corridor at `graphMaxY + 32`.
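The four routing steps can be sketched as a list of bend points. The Python below is a minimal sketch under assumptions: the function names and the `use_bottom` flag are hypothetical, the source is assumed to exit to the right, and how a near-boundary sweep is detected is not modelled.

```python
STUB = 24.0            # px, perpendicular exit stub
TOP_OFFSET = 56.0      # top corridor sits at graphMinY - 56
BOTTOM_OFFSET = 32.0   # bottom corridor sits at graphMaxY + 32
LONG_SWEEP_RATIO = 0.40


def is_long_sweep(source_x, target_x, graph_min_x, graph_max_x):
    """True if the edge spans more than 40% of the graph width."""
    width = graph_max_x - graph_min_x
    return width > 0 and abs(target_x - source_x) > LONG_SWEEP_RATIO * width


def corridor_path(source, target, graph_min_y, graph_max_y, use_bottom=False):
    """Return orthogonal bend points for a corridor route."""
    sx, sy = source
    tx, ty = target
    if use_bottom:
        corridor_y = graph_max_y + BOTTOM_OFFSET
    else:
        corridor_y = graph_min_y - TOP_OFFSET
    stub_x = sx + STUB
    return [
        (sx, sy),              # source boundary point
        (stub_x, sy),          # 1. perpendicular exit stub
        (stub_x, corridor_y),  # 2. vertical run to the corridor
        (tx, corridor_y),      # 3. horizontal run along the corridor
        (tx, ty),              # 4. vertical run to the target
    ]
```

Keeping the stub as its own point separates the vertical corridor segment from the source boundary, which is the property the normalization step depends on.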

**Consequences**:
- Long-range forward edges route cleanly above the graph field, like backward edges
  (or below it for near-boundary sweeps).
- The graph's visual area remains clear of long horizontal sweeps.
- The perpendicular exit stub (24px) must survive normalization -- removing it or
  reducing it below the normalization threshold causes the corridor route to
  collapse.
- Below-graph detection (`HasCorridorBendPoints`) must exempt corridor edges;
  otherwise they would be penalized and rerouted back into the node field.

---

## ADR-5: FinalScore Adjustment (Search/Display Separation)

**Status**: Accepted

**Context**:
The iterative optimization loop uses the scoring function as both a quality metric
AND a search heuristic. The score determines which candidates are explored and which
are accepted as improvements.

During development, borderline detection patterns were identified -- situations where
the scoring detected a "violation" that was actually a valid layout artifact (e.g.,
a gateway face approach that looks like a boundary-slot conflict but is geometrically
correct).

The initial fix was to update the detection logic to exclude these borderline cases.
However, this changed the scoring function that the search used as its heuristic,
altering the search trajectory and causing a 40-second speed regression (from 12s
to 52s) because the optimizer explored different (and more) candidates.

**Decision**:
Keep the original scoring function unchanged during the iterative search (stable
heuristic trajectory). Apply detection exclusions ONLY in the `FinalScore`
computation (post-search).

The FinalScore excludes:
- Valid gateway face approaches (exterior closer to center than predecessor).
- Gateway-exit under-node (lane within 16px of source bottom).
- Convergent target joins from X-separated sources with > 15px Y-gap.
- Borderline shared lanes (gap within 3px of tolerance).

The search does not need to know about borderline patterns -- it just needs
consistent heuristics to explore the candidate space efficiently.
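The separation can be illustrated with two scoring functions over the same detections. This Python sketch is an assumption-laden simplification: violations are reduced to string labels, the label names in `EXCLUDED_KINDS` are hypothetical stand-ins for the four exclusion patterns above, and the geometric tests that classify a detection as borderline are not modelled.

```python
# Hypothetical labels for the four exclusion patterns listed above.
EXCLUDED_KINDS = frozenset({
    "valid-gateway-face-approach",
    "gateway-exit-under-node",
    "convergent-target-join",
    "borderline-shared-lane",
})


def search_score(violations):
    """Heuristic used inside the search loop: every detection counts."""
    return len(violations)


def final_score(violations):
    """Reported quality: excluded patterns are dropped after the search."""
    return sum(1 for kind in violations if kind not in EXCLUDED_KINDS)


def pick_best(candidates):
    """candidates maps a layout name to its list of detected kinds.

    Selection uses only search_score; final_score is computed once,
    for the winner, and never feeds back into the selection.
    """
    best = min(candidates, key=lambda name: search_score(candidates[name]))
    return best, final_score(candidates[best])
```

Changing `EXCLUDED_KINDS` alters only the reported number, not which candidate the search selects.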

**Consequences**:
- The FinalScore accurately reflects visual quality: 0 hard violations in the
  document-processing test case.
- The search maintains stable 12-15s runtime because the heuristic is unchanged.
- The separation means that the search may "fix" violations that the FinalScore
  would have excluded. This is acceptable: the extra fixes are not harmful, and
  the stable search trajectory is worth the minor redundant work.
- Future scoring changes must decide whether they apply to the search heuristic
  (affects trajectory and speed) or only to the FinalScore (affects reported quality).

---

## ADR-6: Under-Node Alongside Detection

**Status**: Accepted

**Context**:
`CountUnderNodeViolations` detected edges that pass through a node's bounding box
with a gap greater than 0.5px. This threshold was chosen to avoid false positives
from floating-point precision.

However, edges running flush with a node boundary (gap = 0px, e.g., exactly at the
bottom edge of a node) were not detected. These edges are visually "glued" to the
node boundary -- they appear to touch the node even though they technically do not
pass through it.

The 0.5px threshold also missed edges within a few pixels of the boundary. An edge
2px outside the boundary (gap = -2px) is visually indistinguishable from one at
gap = 0px at typical zoom levels, yet neither was detected.

**Decision**:
Extend the under-node detection to include flush and near-flush edges:

- Standard under-node: gap > 0.5px (unchanged).
- Flush bottom (`isFlushBottom`): gap >= -4px and <= 0.5px relative to the node's
  bottom boundary.
- Flush top (`isFlushTop`): gap >= -4px and <= 0.5px relative to the node's top
  boundary.

The -4px to 0.5px range catches edges that are visually "alongside" the node
boundary, even if they are technically outside the bounding box by a few pixels.
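The thresholds can be sketched as a classifier for a horizontal segment. This Python sketch rests on an assumed sign convention that the text implies but does not state: the gap is measured inward from the boundary, so it is positive when the edge lies inside the node and negative when it lies outside. The function name and the omission of the X-overlap test are also simplifications.

```python
def classify_horizontal_edge(edge_y, node_top, node_bottom):
    """Classify a horizontal segment that overlaps the node in X.

    Assumed convention: gaps are positive inside the node and negative
    outside it, with Y growing downward.
    """
    gap_bottom = node_bottom - edge_y  # > 0 when the edge is above the bottom
    gap_top = edge_y - node_top        # > 0 when the edge is below the top

    is_flush_bottom = -4.0 <= gap_bottom <= 0.5
    is_flush_top = -4.0 <= gap_top <= 0.5

    if is_flush_bottom:
        return "flush-bottom"
    if is_flush_top:
        return "flush-top"
    if gap_bottom > 0.5 and gap_top > 0.5:
        return "under-node"  # standard case: more than 0.5px inside
    return "clear"
```

For a node spanning Y = 100 to 160, an edge at Y = 162 is classified as flush-bottom, while an edge at Y = 165 is clear.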

**Consequences**:
- Catches visually "glued" edges that touch or nearly touch node boundaries.
- The Y-gutter expansion then creates clearance for these edges, pushing them
  into a clean routing corridor.
- The -4px lower bound prevents false positives from edges that are merely
  "nearby" but visually separate from the node.
- The detection thresholds (-4px to 0.5px for alongside, > 0.5px for standard)
  should not be changed without sprint-level approval, as they affect which edges
  trigger Y-gutter expansion.