HLC Queue Ordering Migration Guide
This guide describes how to enable HLC (Hybrid Logical Clock) ordering for the Scheduler queue, transitioning from legacy (priority, created_at) ordering to HLC-based ordering with cryptographic chain linking.
Overview
HLC ordering provides:
- Deterministic global ordering: Causal consistency across distributed nodes
- Cryptographic chain linking: Audit-safe job sequence proofs
- Reproducible processing: Same input produces same chain
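The "same input produces same chain" property can be illustrated with a minimal sketch. It assumes each link is the SHA-256 hash of the previous link concatenated with the job's fields. The formula, the field order, and the names `ChainSketch`, `ComputeLink`, and `ChainHead` are hypothetical and chosen for illustration; the actual link computation in the Scheduler may differ.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration; not part of StellaOps.HybridLogicalClock.
public static class ChainSketch
{
    // Assumed formula: link = SHA-256(prevLink || jobId || tHlc || payloadHash).
    // A production scheme would likely length-prefix each field to avoid ambiguity.
    public static byte[] ComputeLink(byte[]? prevLink, string jobId, string tHlc, byte[] payloadHash)
    {
        byte[] input = (prevLink ?? Array.Empty<byte>())
            .Concat(Encoding.UTF8.GetBytes(jobId))
            .Concat(Encoding.UTF8.GetBytes(tHlc))
            .Concat(payloadHash)
            .ToArray();
        return SHA256.HashData(input);
    }

    // Folds jobs (already sorted by HLC timestamp) into a chain and returns the head link as hex.
    public static string ChainHead(IEnumerable<(string JobId, string THlc, byte[] PayloadHash)> jobs)
    {
        byte[]? head = null;
        foreach (var job in jobs)
        {
            head = ComputeLink(head, job.JobId, job.THlc, job.PayloadHash);
        }
        return head is null ? string.Empty : Convert.ToHexString(head);
    }
}
```

Because every link depends on its predecessor, replaying the same jobs in the same order yields the same head, while reordering or altering any job changes it.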
Prerequisites
- PostgreSQL 16+ with the scheduler schema
- HLC library dependency (StellaOps.HybridLogicalClock)
- Schema migration 002_hlc_queue_chain.sql applied
Migration Phases
Phase 1: Deploy with Dual-Write Mode
Enable dual-write to populate the new scheduler_log table without affecting existing operations.
# appsettings.yaml or environment configuration
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: false  # Keep using legacy ordering for reads
      DualWriteMode: true       # Write to both legacy and HLC tables
// Program.cs or Startup.cs
services.AddOptions<SchedulerQueueOptions>()
    .Bind(configuration.GetSection("Scheduler:Queue"))
    .ValidateDataAnnotations()
    .ValidateOnStart();

// Register HLC services
services.AddHlcSchedulerServices();

// Register HLC clock
services.AddSingleton<IHybridLogicalClock>(sp =>
{
    var nodeId = Environment.MachineName; // or use a stable node identifier
    return new HybridLogicalClock(nodeId, TimeProvider.System);
});
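For readers new to hybrid logical clocks, the following toy class sketches the standard HLC update rules: a timestamp is a pair of physical time and a logical counter, and the counter breaks ties when physical time has not advanced. `ToyHlc` and its members are hypothetical names for illustration only. This is not the `HybridLogicalClock` implementation from StellaOps.HybridLogicalClock, whose internals this guide does not describe.

```csharp
using System;

// Toy hybrid logical clock for illustration; not the StellaOps implementation.
public sealed class ToyHlc
{
    private readonly Func<long> _physicalMs; // wall-clock source in ms, injectable for tests
    private readonly object _gate = new object();
    private long _wall;   // highest physical time observed so far
    private int _counter; // logical counter that breaks ties within the same wall value

    public ToyHlc(Func<long> physicalMs) => _physicalMs = physicalMs;

    // Local or send event: adopt the wall clock if it moved forward, otherwise bump the counter.
    public (long Wall, int Counter) Tick()
    {
        lock (_gate)
        {
            long now = _physicalMs();
            if (now > _wall) { _wall = now; _counter = 0; }
            else { _counter++; }
            return (_wall, _counter);
        }
    }

    // Receive event: merge a remote timestamp so the result exceeds both local and remote clocks.
    public (long Wall, int Counter) Receive(long remoteWall, int remoteCounter)
    {
        lock (_gate)
        {
            long now = _physicalMs();
            long max = Math.Max(Math.Max(_wall, remoteWall), now);
            if (max == _wall && max == remoteWall) _counter = Math.Max(_counter, remoteCounter) + 1;
            else if (max == _wall) _counter++;
            else if (max == remoteWall) _counter = remoteCounter + 1;
            else _counter = 0;
            _wall = max;
            return (_wall, _counter);
        }
    }
}
```

Timestamps produced this way never go backwards on a node, even if the system clock does, which is what makes them usable as a queue ordering key.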
Verification:
- Monitor scheduler_hlc_enqueues_total metric for dual-write activity
- Verify scheduler_log table is being populated
- Check chain verification passes: scheduler_chain_verifications_total{result="valid"}
Phase 2: Backfill Historical Data (Optional)
If you need historical jobs in the HLC chain, backfill from the existing scheduler.jobs table:
-- Backfill script (run during maintenance window)
-- Note: This creates a new chain starting from historical data
-- The chain will not have valid prev_link values for historical entries
INSERT INTO scheduler.scheduler_log (
    tenant_id, t_hlc, partition_key, job_id, payload_hash, prev_link, link
)
SELECT
    tenant_id,
    -- Generate synthetic HLC timestamps based on created_at
    -- Format: YYYYMMDDHHMMSS-nodeid-counter
    TO_CHAR(created_at AT TIME ZONE 'UTC', 'YYYYMMDDHH24MISS') || '-backfill-' ||
        LPAD(ROW_NUMBER() OVER (PARTITION BY tenant_id ORDER BY created_at)::TEXT, 6, '0'),
    COALESCE(project_id, ''),
    id,
    DECODE(payload_digest, 'hex'),
    NULL,                          -- No chain linking for historical data
    DECODE(payload_digest, 'hex')  -- Use payload_digest as link placeholder
FROM scheduler.jobs
WHERE status IN ('pending', 'scheduled', 'running')
  AND NOT EXISTS (
      SELECT 1 FROM scheduler.scheduler_log sl
      WHERE sl.job_id = jobs.id
  )
ORDER BY tenant_id, created_at;
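To show what the synthetic timestamps in the backfill script look like, here is a small C# mirror of the SQL expression. `BackfillSketch.SyntheticHlc` is a hypothetical helper written for this guide, not an existing Scheduler API.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that mirrors the SQL expression in the backfill script.
public static class BackfillSketch
{
    // Builds "<UTC yyyyMMddHHmmss>-backfill-<six-digit row number>".
    public static string SyntheticHlc(DateTimeOffset createdAt, long rowNumber)
    {
        string stamp = createdAt.UtcDateTime.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
        string counter = rowNumber.ToString("D6", CultureInfo.InvariantCulture);
        return stamp + "-backfill-" + counter;
    }
}
```

For a job created at 2024-01-15 10:30:00 UTC with row number 42, this yields `20240115103000-backfill-000042`. One difference worth knowing: PostgreSQL's `LPAD` truncates input longer than the target length, so the SQL version would produce colliding counters for a tenant with more than 999,999 backfilled rows, whereas the `D6` format here only pads.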
Phase 3: Enable HLC Ordering for Reads
Once dual-write is stable and backfill (if needed) is complete:
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: true   # Use HLC ordering for reads
      DualWriteMode: true       # Keep dual-write during transition
      VerifyOnDequeue: false    # Optional: enable for extra validation
Verification:
- Monitor dequeue latency (should be similar to legacy)
- Verify job processing order matches HLC order
- Check chain integrity periodically
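One way to spot-check that processing order matches HLC order is to collect the t_hlc values of processed jobs in the order they were handled and look for the first inversion. The sketch below assumes that t_hlc strings are fixed-width and sort correctly under ordinal comparison, which this guide does not confirm; `OrderCheckSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical spot-check; assumes t_hlc strings compare correctly with ordinal ordering.
public static class OrderCheckSketch
{
    // Returns the index of the first entry that sorts before its predecessor,
    // or -1 when the whole sequence is in non-decreasing HLC order.
    public static int FirstOutOfOrder(IReadOnlyList<string> processedHlc)
    {
        for (int i = 1; i < processedHlc.Count; i++)
        {
            if (string.CompareOrdinal(processedHlc[i - 1], processedHlc[i]) > 0)
            {
                return i;
            }
        }
        return -1;
    }
}
```

A result of -1 over a sample of recent jobs is a reasonable signal that reads are following HLC order; any other value points at the first job to investigate.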
Phase 4: Disable Dual-Write Mode
Once confident in HLC ordering:
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: true
      DualWriteMode: false      # Stop writing to legacy table
      VerifyOnDequeue: false
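The phases in this guide correspond to the four combinations of the two main flags. The following sketch makes that mapping explicit, for example for logging the effective mode at startup. The `MigrationPhase` enum and `PhaseSketch` class are hypothetical and not part of the Scheduler.

```csharp
using System;

// Hypothetical names; the mapping itself follows the phases described in this guide.
public enum MigrationPhase
{
    Legacy,                // EnableHlcOrdering=false, DualWriteMode=false (initial state or rollback)
    DualWrite,             // Phase 1: legacy reads, writes to both tables
    HlcReadsWithDualWrite, // Phase 3: HLC reads, writes to both tables
    HlcOnly                // Phase 4: HLC reads, legacy writes stopped
}

public static class PhaseSketch
{
    public static MigrationPhase FromFlags(bool enableHlcOrdering, bool dualWriteMode) =>
        (enableHlcOrdering, dualWriteMode) switch
        {
            (false, false) => MigrationPhase.Legacy,
            (false, true) => MigrationPhase.DualWrite,
            (true, true) => MigrationPhase.HlcReadsWithDualWrite,
            (true, false) => MigrationPhase.HlcOnly,
        };
}
```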
Configuration Reference
SchedulerHlcOptions
| Property | Type | Default | Description |
|---|---|---|---|
| EnableHlcOrdering | bool | false | Use HLC ordering for queue reads |
| DualWriteMode | bool | false | Write to both legacy and HLC tables |
| VerifyOnDequeue | bool | false | Verify chain integrity on each dequeue |
| MaxClockDriftMs | int | 60000 | Maximum allowed clock drift in milliseconds |
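As a rough illustration of what a drift limit such as MaxClockDriftMs typically guards against, the sketch below accepts a remote timestamp only if its physical component is no more than the limit ahead of the local clock. How the library actually enforces the setting is an assumption here, and `DriftCheckSketch` is a hypothetical name.

```csharp
using System;

// Hypothetical check; the library's real enforcement of MaxClockDriftMs may differ.
public static class DriftCheckSketch
{
    // True when the remote physical time is at most maxClockDriftMs ahead of the local clock.
    // Remote timestamps in the past are always accepted.
    public static bool IsWithinDrift(long remoteWallMs, long localWallMs, int maxClockDriftMs = 60000)
    {
        return remoteWallMs - localWallMs <= maxClockDriftMs;
    }
}
```

With the default of 60000 ms, a timestamp 30 seconds ahead of the local clock passes and one 61 seconds ahead does not.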
Metrics
| Metric | Type | Description |
|---|---|---|
| scheduler_hlc_enqueues_total | Counter | Total HLC enqueue operations |
| scheduler_hlc_enqueue_deduplicated_total | Counter | Deduplicated enqueue operations |
| scheduler_hlc_enqueue_duration_seconds | Histogram | Enqueue operation duration |
| scheduler_hlc_dequeues_total | Counter | Total HLC dequeue operations |
| scheduler_hlc_dequeued_entries_total | Counter | Total entries dequeued |
| scheduler_chain_verifications_total | Counter | Chain verification operations |
| scheduler_chain_verification_issues_total | Counter | Chain verification issues found |
| scheduler_batch_snapshots_created_total | Counter | Batch snapshots created |
Troubleshooting
Chain Verification Failures
If chain verification reports issues:
- Check scheduler_chain_verification_issues_total for the issue count
- Query the log for specific issues:

      var result = await chainVerifier.VerifyAsync(tenantId);
      foreach (var issue in result.Issues)
      {
          logger.LogError(
              "Chain issue at job {JobId}: {Type} - {Description}",
              issue.JobId, issue.IssueType, issue.Description);
      }

- Common causes:
  - Database corruption: Restore from backup
  - Concurrent writes without proper locking: Check transaction isolation
  - Clock drift: Verify MaxClockDriftMs setting
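To make the failure modes above easier to reason about, here is a minimal sketch of what a chain walk might check: each entry's prev_link must equal the previous entry's link, and each link must match a recomputation from the entry's own fields. The link formula (SHA-256 over prev_link, job ID, HLC timestamp, and payload hash) and the names `LogEntry` and `ChainWalkSketch` are assumptions for illustration; the real `chainVerifier` may work differently.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical shape of a scheduler_log row, reduced to the fields the walk needs.
public sealed record LogEntry(string JobId, string THlc, byte[] PayloadHash, byte[]? PrevLink, byte[] Link);

public static class ChainWalkSketch
{
    // Assumed formula: link = SHA-256(prevLink || jobId || tHlc || payloadHash).
    public static byte[] ComputeLink(byte[]? prevLink, string jobId, string tHlc, byte[] payloadHash)
    {
        byte[] input = (prevLink ?? Array.Empty<byte>())
            .Concat(Encoding.UTF8.GetBytes(jobId))
            .Concat(Encoding.UTF8.GetBytes(tHlc))
            .Concat(payloadHash)
            .ToArray();
        return SHA256.HashData(input);
    }

    // Returns the job IDs whose prev_link or link does not match; entries must be in HLC order.
    public static List<string> FindBrokenLinks(IEnumerable<LogEntry> entriesInHlcOrder)
    {
        var broken = new List<string>();
        byte[]? expectedPrev = null;
        foreach (var e in entriesInHlcOrder)
        {
            bool prevOk = expectedPrev is null
                ? e.PrevLink is null
                : e.PrevLink is not null && e.PrevLink.AsSpan().SequenceEqual(expectedPrev);
            byte[] recomputed = ComputeLink(e.PrevLink, e.JobId, e.THlc, e.PayloadHash);
            bool linkOk = e.Link.AsSpan().SequenceEqual(recomputed);
            if (!prevOk || !linkOk) broken.Add(e.JobId);
            expectedPrev = e.Link;
        }
        return broken;
    }
}
```

Under this model, a corrupted payload shows up as a link mismatch on a single entry, while concurrent writers that both read the same chain head show up as two entries sharing one prev_link.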
Performance Considerations
- Index usage: Ensure idx_scheduler_log_tenant_hlc is being used
- Chain head caching: The chain_heads table provides O(1) access to latest link
- Batch sizes: Adjust dequeue batch size based on workload
Rollback Procedure
To roll back to legacy ordering:
Scheduler:
  Queue:
    Hlc:
      EnableHlcOrdering: false
      DualWriteMode: false
The scheduler_log table can be retained for audit purposes or dropped if no longer needed.