# HLC Queue Ordering Migration Guide

This guide describes how to enable HLC (Hybrid Logical Clock) ordering for the Scheduler queue, transitioning from legacy `(priority, created_at)` ordering to HLC-based ordering with cryptographic chain linking.

## Overview

HLC ordering provides:
- **Deterministic global ordering**: Causal consistency across distributed nodes
- **Cryptographic chain linking**: Audit-safe job sequence proofs
- **Reproducible processing**: The same input always produces the same chain
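
This guide does not spell out how a chain link is derived. As a minimal sketch, assuming each link is a SHA-256 hash over the previous link, the job ID, the HLC timestamp, and the payload hash, chain linking could look like the following. The `ChainLinkSketch` class, the `Guid` job ID, the field order, and the all-zero first link are illustrative assumptions, not the library's actual format.

```csharp
#nullable enable
using System;
using System.Security.Cryptography;
using System.Text;

public static class ChainLinkSketch
{
    // Assumed formula (not confirmed by this guide):
    // link = SHA-256(prev_link || job_id || t_hlc || payload_hash)
    public static byte[] ComputeLink(byte[]? prevLink, Guid jobId, string tHlc, byte[] payloadHash)
    {
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

        // Assumption: the first entry in a chain uses 32 zero bytes as its predecessor.
        hasher.AppendData(prevLink ?? new byte[32]);
        hasher.AppendData(jobId.ToByteArray());
        hasher.AppendData(Encoding.UTF8.GetBytes(tHlc));
        hasher.AppendData(payloadHash);

        return hasher.GetHashAndReset();
    }
}
```

Because every link folds in its predecessor, altering or reordering an earlier entry changes every later link, which is what makes the sequence auditable. Identical inputs always yield identical links, which is the reproducibility property listed above.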

## Prerequisites

1. PostgreSQL 16+ with the scheduler schema
2. HLC library dependency (`StellaOps.HybridLogicalClock`)
3. Schema migration `002_hlc_queue_chain.sql` applied

## Migration Phases

### Phase 1: Deploy with Dual-Write Mode

Enable dual-write to populate the new `scheduler_log` table without affecting existing operations.

```yaml
# appsettings.yaml or environment configuration
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: false  # Keep using legacy ordering for reads
      DualWriteMode: true       # Write to both legacy and HLC tables
```

```csharp
// Program.cs or Startup.cs
services.AddOptions<SchedulerQueueOptions>()
    .Bind(configuration.GetSection("Scheduler:Queue"))
    .ValidateDataAnnotations()
    .ValidateOnStart();

// Register HLC services
services.AddHlcSchedulerServices();

// Register HLC clock
services.AddSingleton<IHybridLogicalClock>(sp =>
{
    var nodeId = Environment.MachineName; // or another identifier that is unique per node and stable across restarts
    return new HybridLogicalClock(nodeId, TimeProvider.System);
});
```

**Verification:**
- Monitor `scheduler_hlc_enqueues_total` metric for dual-write activity
- Verify `scheduler_log` table is being populated
- Check that chain verification passes: `scheduler_chain_verifications_total{result="valid"}`

### Phase 2: Backfill Historical Data (Optional)

If jobs enqueued before dual-write was enabled must appear in the HLC chain, backfill the still-active ones (`pending`, `scheduled`, `running`) from the existing `scheduler.jobs` table:

```sql
-- Backfill script (run during maintenance window)
-- Note: This creates a new chain starting from historical data
-- The chain will not have valid prev_link values for historical entries

INSERT INTO scheduler.scheduler_log (
    tenant_id, t_hlc, partition_key, job_id, payload_hash, prev_link, link
)
SELECT
    tenant_id,
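    -- Generate synthetic HLC timestamps based on created_at
    -- Format: YYYYMMDDHHMMSS-nodeid-counter, with 'backfill' as the node ID
    -- LPAD truncates to 6 characters: widen it if a tenant has more than 999,999 rows
    -- The id tiebreaker keeps the counter deterministic when created_at values collide
    TO_CHAR(created_at AT TIME ZONE 'UTC', 'YYYYMMDDHH24MISS') || '-backfill-' ||
        LPAD(ROW_NUMBER() OVER (PARTITION BY tenant_id ORDER BY created_at, id)::TEXT, 6, '0'),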
    COALESCE(project_id, ''),
    id,
    DECODE(payload_digest, 'hex'),
    NULL,  -- No chain linking for historical data
    DECODE(payload_digest, 'hex')  -- Use payload_digest as link placeholder
FROM scheduler.jobs
WHERE status IN ('pending', 'scheduled', 'running')
  AND NOT EXISTS (
    SELECT 1 FROM scheduler.scheduler_log sl
    WHERE sl.job_id = jobs.id
  )
ORDER BY tenant_id, created_at;
```
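
To make the synthetic timestamp format concrete, here is a small C# sketch that mirrors the SQL expression above. `BackfillHlcSketch` is a hypothetical helper written for illustration; it is not part of `StellaOps.HybridLogicalClock`, and the format of real (non-backfill) HLC timestamps may differ.

```csharp
using System;
using System.Globalization;

public static class BackfillHlcSketch
{
    // Mirrors: TO_CHAR(created_at AT TIME ZONE 'UTC', 'YYYYMMDDHH24MISS')
    //          || '-backfill-' || LPAD(counter::TEXT, 6, '0')
    public static string Format(DateTimeOffset createdAt, long counter)
    {
        // Unlike LPAD, which silently truncates, reject counters that do not fit in six digits.
        if (counter < 1 || counter > 999_999)
            throw new ArgumentOutOfRangeException(nameof(counter), "Counter must fit in six digits.");

        var stamp = createdAt.UtcDateTime.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
        return stamp + "-backfill-" + counter.ToString("D6", CultureInfo.InvariantCulture);
    }
}
```

For example, `BackfillHlcSketch.Format(new DateTimeOffset(2024, 1, 2, 3, 4, 5, TimeSpan.Zero), 7)` returns `20240102030405-backfill-000007`. Because every component is fixed-width, an ordinal string comparison of two backfilled values matches their `created_at` order within a tenant.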

### Phase 3: Enable HLC Ordering for Reads

Once dual-write is stable and backfill (if needed) is complete, switch reads to HLC ordering:

```yaml
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: true   # Use HLC ordering for reads
      DualWriteMode: true       # Keep dual-write during transition
      VerifyOnDequeue: false    # Optional: enable for extra validation
```

**Verification:**
- Monitor dequeue latency (should be similar to legacy)
- Verify job processing order matches HLC order
- Check chain integrity periodically using `scheduler_chain_verifications_total` and `scheduler_chain_verification_issues_total`
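
One way to spot-check that processing order matches HLC order is to collect the `t_hlc` values of processed jobs in the order they were handled and look for the first inversion. The sketch below assumes `t_hlc` is stored as a fixed-width string whose ordinal sort order equals HLC order (true for the backfill format in Phase 2, an assumption for timestamps produced by the library). `HlcOrderCheckSketch` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

public static class HlcOrderCheckSketch
{
    // Returns the index of the first entry that sorts before its predecessor,
    // or -1 when the whole sequence is in non-decreasing HLC order.
    public static int FirstOutOfOrderIndex(IReadOnlyList<string> tHlcValues)
    {
        for (var i = 1; i < tHlcValues.Count; i++)
        {
            // Ordinal comparison avoids culture-specific collation surprises.
            if (string.CompareOrdinal(tHlcValues[i - 1], tHlcValues[i]) > 0)
                return i;
        }

        return -1;
    }
}
```

A result of `-1` means no inversion was observed in the sample; any other value points at the first job that was processed ahead of an earlier HLC timestamp.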

### Phase 4: Disable Dual-Write Mode

Once you are confident in HLC ordering, stop writing to the legacy table:

```yaml
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: true
      DualWriteMode: false      # Stop writing to legacy table
      VerifyOnDequeue: false
```
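
The four phases differ only in two flags, so the combination in effect can be summarized in code. The following sketch maps a flag pair to the state described in this guide; the enum and class names are hypothetical and exist only for illustration.

```csharp
using System;

public enum MigrationStateSketch
{
    LegacyOnly,            // Before migration, or after rollback
    DualWrite,             // Phase 1: legacy reads, writes to both tables
    HlcReadsWithDualWrite, // Phase 3: HLC reads, writes to both tables
    HlcOnly                // Phase 4: HLC reads, no legacy writes
}

public static class MigrationStateDescriber
{
    public static MigrationStateSketch Describe(bool enableHlcOrdering, bool dualWriteMode) =>
        (enableHlcOrdering, dualWriteMode) switch
        {
            (false, false) => MigrationStateSketch.LegacyOnly,
            (false, true) => MigrationStateSketch.DualWrite,
            (true, true) => MigrationStateSketch.HlcReadsWithDualWrite,
            (true, false) => MigrationStateSketch.HlcOnly,
        };
}
```

Logging the resolved state at startup can make it easier to confirm which phase a given deployment is actually running.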

## Configuration Reference

### SchedulerHlcOptions

| Property | Type | Default | Description |
|----------|------|---------|-------------|
| `EnableHlcOrdering` | bool | false | Use HLC ordering for queue reads |
| `DualWriteMode` | bool | false | Write to both legacy and HLC tables |
| `VerifyOnDequeue` | bool | false | Verify chain integrity on each dequeue |
| `MaxClockDriftMs` | int | 60000 | Maximum allowed clock drift in milliseconds |
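
For orientation, the table above could correspond to an options class shaped roughly like the one below. This is a sketch only: the real `SchedulerHlcOptions` type is not shown in this guide, and the `[Range]` attribute and the `Validate` helper are assumptions added to illustrate how data-annotation validation would catch a bad value.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public sealed class SchedulerHlcOptionsSketch
{
    public bool EnableHlcOrdering { get; set; }   // default: false
    public bool DualWriteMode { get; set; }       // default: false
    public bool VerifyOnDequeue { get; set; }     // default: false

    // Assumption: negative drift limits are rejected.
    [Range(0, int.MaxValue)]
    public int MaxClockDriftMs { get; set; } = 60000;

    // Runs the same data-annotation checks that ValidateDataAnnotations() relies on.
    public IReadOnlyList<ValidationResult> Validate()
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(this, new ValidationContext(this), results, validateAllProperties: true);
        return results;
    }
}
```

An instance with default values validates cleanly, while setting `MaxClockDriftMs` to a negative number produces one validation error.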

## Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `scheduler_hlc_enqueues_total` | Counter | Total HLC enqueue operations |
| `scheduler_hlc_enqueue_deduplicated_total` | Counter | Deduplicated enqueue operations |
| `scheduler_hlc_enqueue_duration_seconds` | Histogram | Enqueue operation duration |
| `scheduler_hlc_dequeues_total` | Counter | Total HLC dequeue operations |
| `scheduler_hlc_dequeued_entries_total` | Counter | Total entries dequeued |
| `scheduler_chain_verifications_total` | Counter | Chain verification operations |
| `scheduler_chain_verification_issues_total` | Counter | Chain verification issues found |
| `scheduler_batch_snapshots_created_total` | Counter | Batch snapshots created |

## Troubleshooting

### Chain Verification Failures

If chain verification reports issues:

1. Check `scheduler_chain_verification_issues_total` for issue count
2. Run the chain verifier and log each reported issue:
   ```csharp
   var result = await chainVerifier.VerifyAsync(tenantId);
   foreach (var issue in result.Issues)
   {
       logger.LogError(
           "Chain issue at job {JobId}: {Type} - {Description}",
           issue.JobId, issue.IssueType, issue.Description);
   }
   ```

3. Common causes:
   - Database corruption: Restore from backup
   - Concurrent writes without proper locking: Check transaction isolation
   - Clock drift: Verify `MaxClockDriftMs` setting
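
To illustrate what a drift limit typically guards against, the sketch below rejects a timestamp whose physical component is further ahead of the local clock than `MaxClockDriftMs` allows. This is a common HLC safeguard, but whether `StellaOps.HybridLogicalClock` applies the setting in exactly this way is an assumption; `ClockDriftSketch` is a hypothetical helper.

```csharp
using System;

public static class ClockDriftSketch
{
    // Returns true when the remote physical time is no more than
    // maxClockDriftMs milliseconds ahead of the local clock.
    public static bool IsWithinDrift(DateTimeOffset remotePhysicalTime, DateTimeOffset localNow, int maxClockDriftMs)
    {
        var aheadMs = (remotePhysicalTime - localNow).TotalMilliseconds;

        // Timestamps from the past yield a negative value and always pass.
        return aheadMs <= maxClockDriftMs;
    }
}
```

With the default of 60000 ms, a timestamp 59 seconds ahead of the local clock passes and one 61 seconds ahead does not. If such rejections appear after a deployment, check NTP synchronization on the affected nodes before raising the limit.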

### Performance Considerations

- **Index usage**: Ensure `idx_scheduler_log_tenant_hlc` is being used, for example by inspecting the dequeue query plan with `EXPLAIN`
- **Chain head caching**: The `chain_heads` table provides O(1) access to latest link
- **Batch sizes**: Adjust dequeue batch size based on workload

## Rollback Procedure

To roll back to legacy ordering, disable both HLC flags:

```yaml
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: false
      DualWriteMode: false
```

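The `scheduler_log` table can be retained for audit purposes or dropped if no longer needed. If Phase 4 was already applied, jobs enqueued since then were not written to the legacy table, so confirm how they will be drained before switching reads back.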

## Related Documentation

- [Scheduler Architecture](architecture.md)
- [HLC Library Documentation](../../__Libraries/StellaOps.HybridLogicalClock/README.md)
- [Product Advisory: Audit-safe Job Queue Ordering](../../product/advisories/audit-safe-job-queue-ordering.md)