# HLC Queue Ordering Migration Guide

This guide describes how to enable HLC (Hybrid Logical Clock) ordering for the Scheduler queue, moving from legacy `(priority, created_at)` ordering to HLC-based ordering with cryptographic chain linking.

## Overview

HLC ordering provides:

- **Deterministic global ordering**: Causal consistency across distributed nodes
- **Cryptographic chain linking**: Audit-safe proofs of job sequence
- **Reproducible processing**: The same input produces the same chain

## Prerequisites

1. PostgreSQL 16+ with the scheduler schema
2. HLC library dependency (`StellaOps.HybridLogicalClock`)
3. Schema migration `002_hlc_queue_chain.sql` applied

## Migration Phases

### Phase 1: Deploy with Dual-Write Mode

Enable dual-write to populate the new `scheduler_log` table without affecting existing operations.

```yaml
# appsettings.yaml or environment configuration
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: false  # Keep using legacy ordering for reads
      DualWriteMode: true       # Write to both legacy and HLC tables
```

```csharp
// Program.cs or Startup.cs
// Bind SchedulerHlcOptions to the Scheduler:Queue:Hlc section shown above
services.AddOptions<SchedulerHlcOptions>()
    .Bind(configuration.GetSection("Scheduler:Queue:Hlc"))
    .ValidateDataAnnotations()
    .ValidateOnStart();

// Register HLC services
services.AddHlcSchedulerServices();

// Register the HLC clock as a singleton (one clock per process)
services.AddSingleton(sp =>
{
    // The node ID must be unique per scheduler instance; replace MachineName
    // with a stable node identifier if host names change between deployments
    var nodeId = Environment.MachineName;
    return new HybridLogicalClock(nodeId, TimeProvider.System);
});
```

**Verification:**

- Monitor the `scheduler_hlc_enqueues_total` metric for dual-write activity
- Verify that the `scheduler_log` table is being populated
- Check that chain verification passes via `scheduler_chain_verifications_total{result="valid"}`

### Phase 2: Backfill Historical Data (Optional)

If jobs enqueued before dual-write was enabled must appear in the HLC chain, backfill them from the existing `scheduler.jobs` table. The script copies only jobs whose status is `pending`, `scheduled`, or `running`, and skips jobs that already exist in `scheduler_log`.

```sql
-- Backfill script (run during a maintenance window)
-- Note: This creates a new chain starting from historical data.
-- Backfilled entries are not cryptographically linked: prev_link is NULL
-- and link is only a placeholder.
INSERT INTO scheduler.scheduler_log (
    tenant_id, t_hlc, partition_key, job_id, payload_hash, prev_link, link
)
SELECT
    tenant_id,
    -- Synthetic HLC timestamp derived from created_at
    -- Format: YYYYMMDDHHMMSS-nodeid-counter, with 'backfill' as the node ID
    TO_CHAR(created_at AT TIME ZONE 'UTC', 'YYYYMMDDHH24MISS')
        || '-backfill-'
        || LPAD(ROW_NUMBER() OVER (PARTITION BY tenant_id ORDER BY created_at)::TEXT, 6, '0'),
    COALESCE(project_id, ''),
    id,
    DECODE(payload_digest, 'hex'),
    NULL,                          -- No chain linking for historical data
    DECODE(payload_digest, 'hex')  -- Use payload_digest as link placeholder
FROM scheduler.jobs
WHERE status IN ('pending', 'scheduled', 'running')
  AND NOT EXISTS (
      SELECT 1 FROM scheduler.scheduler_log sl
      WHERE sl.job_id = jobs.id
  )
ORDER BY tenant_id, created_at;
```

### Phase 3: Enable HLC Ordering for Reads

Once dual-write is stable and the backfill (if needed) is complete:

```yaml
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: true   # Use HLC ordering for reads
      DualWriteMode: true       # Keep dual-write during transition
      VerifyOnDequeue: false    # Optional: enable for extra validation
```

**Verification:**

- Compare dequeue latency against the legacy baseline; it should be similar
- Verify that job processing order matches HLC order
- Check chain integrity periodically

### Phase 4: Disable Dual-Write Mode

Once you are confident in HLC ordering:

```yaml
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: true
      DualWriteMode: false      # Stop writing to legacy table
      VerifyOnDequeue: false
```

## Configuration Reference

### SchedulerHlcOptions

| Property | Type | Default | Description |
|----------|------|---------|-------------|
| `EnableHlcOrdering` | bool | false | Use HLC ordering for queue reads |
| `DualWriteMode` | bool | false | Write to both legacy and HLC tables |
| `VerifyOnDequeue` | bool | false | Verify chain integrity on each dequeue |
| `MaxClockDriftMs` | int | 60000 | Maximum allowed clock drift, in milliseconds |

## Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `scheduler_hlc_enqueues_total` | Counter | Total HLC enqueue operations |
| `scheduler_hlc_enqueue_deduplicated_total` | Counter | Deduplicated enqueue operations |
| `scheduler_hlc_enqueue_duration_seconds` | Histogram | Enqueue operation duration |
| `scheduler_hlc_dequeues_total` | Counter | Total HLC dequeue operations |
| `scheduler_hlc_dequeued_entries_total` | Counter | Total entries dequeued |
| `scheduler_chain_verifications_total` | Counter | Chain verification operations |
| `scheduler_chain_verification_issues_total` | Counter | Chain verification issues found |
| `scheduler_batch_snapshots_created_total` | Counter | Batch snapshots created |

## Troubleshooting

### Chain Verification Failures

If chain verification reports issues:

1. Check `scheduler_chain_verification_issues_total` for the issue count
2. Run the chain verifier for the affected tenant and log each reported issue:

   ```csharp
   var result = await chainVerifier.VerifyAsync(tenantId);
   foreach (var issue in result.Issues)
   {
       logger.LogError(
           "Chain issue at job {JobId}: {Type} - {Description}",
           issue.JobId, issue.IssueType, issue.Description);
   }
   ```

3. Common causes:
   - Database corruption: restore from backup
   - Concurrent writes without proper locking: check transaction isolation
   - Clock drift: verify the `MaxClockDriftMs` setting

### Performance Considerations

- **Index usage**: Ensure `idx_scheduler_log_tenant_hlc` is being used
- **Chain head caching**: The `chain_heads` table provides O(1) access to the latest link
- **Batch sizes**: Adjust the dequeue batch size based on workload

## Rollback Procedure

To roll back to legacy ordering:

```yaml
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: false
      DualWriteMode: false
```

The `scheduler_log` table can be retained for audit purposes or dropped if no longer needed.
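If you retain `scheduler_log` for audit, you can spot-check an exported copy of the chain outside the scheduler. The C# sketch below checks structural linkage only: within each tenant, ordered by `t_hlc`, every entry's `prev_link` should equal the previous entry's `link`. It does not recompute link hashes, because this guide does not specify the link formula. The `LogEntry` record and `ChainLinkCheck` class are hypothetical, and the sketch assumes one chain per tenant, `t_hlc` values that sort correctly as ordinal strings, and a `NULL` `prev_link` on each tenant's first entry. Rows inserted by the Phase 2 backfill have a `NULL` `prev_link`, so expect them to be reported.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory shape of one exported scheduler_log row (illustrative only).
public sealed record LogEntry(string TenantId, string THlc, Guid JobId, byte[]? PrevLink, byte[] Link);

public static class ChainLinkCheck
{
    // Returns the job IDs whose prev_link does not match the link of the preceding
    // entry for the same tenant. Assumes one chain per tenant, ordinal-sortable
    // t_hlc strings, and a null prev_link on each tenant's first entry.
    public static List<Guid> FindBrokenLinks(IEnumerable<LogEntry> entries)
    {
        var broken = new List<Guid>();

        foreach (var tenant in entries.GroupBy(e => e.TenantId))
        {
            byte[]? expectedPrev = null; // no predecessor before the first entry

            foreach (var entry in tenant.OrderBy(e => e.THlc, StringComparer.Ordinal))
            {
                bool linked = expectedPrev is null
                    ? entry.PrevLink is null
                    : entry.PrevLink is not null && entry.PrevLink.AsSpan().SequenceEqual(expectedPrev);

                if (!linked)
                {
                    broken.Add(entry.JobId);
                }

                // Check the next entry against this entry's stored link, even if this one was broken.
                expectedPrev = entry.Link;
            }
        }

        return broken;
    }
}
```

Checking linkage rather than recomputing hashes keeps the sketch independent of the link formula. It catches deleted, reordered, or re-linked entries, but not an edited `payload_hash`; detecting that requires recomputing each link with the scheduler's actual formula, which is what `chainVerifier.VerifyAsync` is for.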

## Related Documentation

- [Scheduler Architecture](architecture.md)
- [HLC Library Documentation](../../__Libraries/StellaOps.HybridLogicalClock/README.md)
- [Product Advisory: Audit-safe Job Queue Ordering](../../product/advisories/audit-safe-job-queue-ordering.md)