Scheduler HLC Ordering Architecture

This document describes the Hybrid Logical Clock (HLC) based ordering system used by the StellaOps Scheduler for audit-safe job queue operations.

Overview

The Scheduler uses HLC timestamps instead of wall-clock time to ensure:

  1. Total ordering of jobs across distributed nodes
  2. Audit-safe sequencing with cryptographic chain linking
  3. Deterministic merge when offline nodes reconnect
  4. Clock skew tolerance in distributed deployments

HLC Timestamp Format

An HLC timestamp consists of three components:

(PhysicalTime, LogicalCounter, NodeId)

| Component | Description | Example |
|---|---|---|
| PhysicalTime | Unix time in milliseconds (UTC) | 1704585600000 (2024-01-07T00:00:00Z) |
| LogicalCounter | Counter that orders events sharing the same physical time | 0, 1, 2, ... |
| NodeId | Unique identifier of the node that issued the timestamp | scheduler-prod-01 |

String format: {physical}:{logical}:{nodeId}

Example: 1704585600000:0:scheduler-prod-01
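The string form above is not fixed-width. As a result, comparing two timestamps as plain text does not always match HLC order. Compared character by character, 1704585600000:10:scheduler-prod-01 sorts before 1704585600000:9:scheduler-prod-01.

The minimal C# sketch below parses the three components and compares them numerically. Some details are assumptions rather than documented behaviour:

- HlcTimestamp is a hypothetical type.
- Breaking ties on NodeId with an ordinal comparison is an assumption. It is the usual way to make HLC order total.
- This document does not state whether the Scheduler pads components before storing them.

```csharp
using System;
using System.Globalization;

// Hypothetical value type for an HLC timestamp; not the Scheduler's actual type.
public readonly record struct HlcTimestamp(long PhysicalTime, int LogicalCounter, string NodeId)
    : IComparable<HlcTimestamp>
{
    // Total order: physical time, then logical counter, then node ID (ordinal) as a tie-breaker.
    public int CompareTo(HlcTimestamp other)
    {
        int byPhysical = PhysicalTime.CompareTo(other.PhysicalTime);
        if (byPhysical != 0) return byPhysical;
        int byLogical = LogicalCounter.CompareTo(other.LogicalCounter);
        if (byLogical != 0) return byLogical;
        return string.CompareOrdinal(NodeId, other.NodeId);
    }

    // Renders the documented {physical}:{logical}:{nodeId} form (no zero padding).
    public override string ToString() =>
        FormattableString.Invariant($"{PhysicalTime}:{LogicalCounter}:{NodeId}");

    // Splits into at most three parts, so everything after the second ':' is the node ID.
    public static HlcTimestamp Parse(string text)
    {
        string[] parts = text.Split(':', 3);
        if (parts.Length != 3)
            throw new FormatException($"Not an HLC timestamp: '{text}'");
        return new HlcTimestamp(
            long.Parse(parts[0], NumberStyles.None, CultureInfo.InvariantCulture),
            int.Parse(parts[1], NumberStyles.None, CultureInfo.InvariantCulture),
            parts[2]);
    }
}
```

With this type, a timestamp with logical counter 9 orders before one with counter 10, even though their string forms sort the other way.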

Database Schema

scheduler_log Table

```sql
CREATE TABLE scheduler.scheduler_log (
    id                  BIGSERIAL PRIMARY KEY,
    t_hlc               TEXT NOT NULL,           -- HLC timestamp
    job_id              TEXT NOT NULL,           -- Job identifier
    action              TEXT NOT NULL,           -- ENQUEUE, DEQUEUE, EXECUTE, COMPLETE, FAIL
    prev_chain_link     TEXT,                    -- Hash of previous entry
    chain_link          TEXT NOT NULL,           -- Hash of this entry
    payload             JSONB NOT NULL,          -- Job metadata
    tenant_id           TEXT NOT NULL,
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_scheduler_log_hlc ON scheduler.scheduler_log (t_hlc);
CREATE INDEX idx_scheduler_log_tenant_hlc ON scheduler.scheduler_log (tenant_id, t_hlc);
CREATE INDEX idx_scheduler_log_job ON scheduler.scheduler_log (job_id);
```

batch_snapshot Table

```sql
CREATE TABLE scheduler.batch_snapshot (
    id                  BIGSERIAL PRIMARY KEY,
    snapshot_hlc        TEXT NOT NULL,           -- HLC at snapshot time
    from_chain_link     TEXT NOT NULL,           -- First entry in batch
    to_chain_link       TEXT NOT NULL,           -- Last entry in batch
    entry_count         INTEGER NOT NULL,
    merkle_root         TEXT NOT NULL,           -- Merkle root of entries
    dsse_envelope       JSONB,                   -- DSSE-signed attestation
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
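The schema does not specify how merkle_root is built. The sketch below assumes the RFC 6962 (Certificate Transparency) Merkle Tree Hash over the batch's chain_link values in log order. Each leaf is the UTF-8 bytes of one link. Both choices are assumptions, and BatchMerkle is a hypothetical name.

RFC 6962 prefixes leaves with 0x00 and interior nodes with 0x01, so a leaf hash can never be passed off as an interior node. It also splits an unbalanced batch at the largest power of two rather than duplicating the last leaf.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; assumes the RFC 6962 Merkle Tree Hash over chain_link values in log order.
public static class BatchMerkle
{
    public static string ComputeRoot(IReadOnlyList<string> chainLinks)
    {
        byte[][] leaves = chainLinks.Select(link => Encoding.UTF8.GetBytes(link)).ToArray();
        return Convert.ToHexString(TreeHash(leaves, 0, leaves.Length)).ToLowerInvariant();
    }

    // MTH(D[start..start+count)) as defined in RFC 6962, section 2.1.
    private static byte[] TreeHash(byte[][] leaves, int start, int count)
    {
        if (count == 0) return SHA256.HashData(Array.Empty<byte>());
        if (count == 1) return HashTagged(0x00, leaves[start]);
        int split = 1;
        while (split * 2 < count) split *= 2;   // largest power of two smaller than count
        return HashTagged(0x01, TreeHash(leaves, start, split),
                                TreeHash(leaves, start + split, count - split));
    }

    // SHA-256(tag || a || b): tag 0x00 marks a leaf, 0x01 an interior node.
    private static byte[] HashTagged(byte tag, byte[] a, byte[]? b = null)
    {
        var buffer = new byte[1 + a.Length + (b?.Length ?? 0)];
        buffer[0] = tag;
        a.CopyTo(buffer, 1);
        b?.CopyTo(buffer, 1 + a.Length);
        return SHA256.HashData(buffer);
    }
}
```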

chain_heads Table

```sql
CREATE TABLE scheduler.chain_heads (
    tenant_id           TEXT PRIMARY KEY,
    head_chain_link     TEXT NOT NULL,           -- Current chain head
    head_hlc            TEXT NOT NULL,           -- HLC of chain head
    updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

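Chain Linking

Each log entry is cryptographically linked to its predecessor. Its chain_link is a lowercase hex SHA-256 hash computed over these fields, in order:

- the entry's HLC timestamp
- the job ID
- the action
- the predecessor's chain_link, or the literal string "genesis" for the first entry in a chain
- a digest of the payload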

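```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SchedulerChain // containing class name is illustrative
{
    // Hashes the fields in this exact order, appended without separators,
    // into a lowercase hex SHA-256 digest. The first entry in a chain has no
    // predecessor, so "genesis" is hashed in place of prevChainLink.
    public static string ComputeChainLink(
        string tHlc,
        string jobId,
        string action,
        string? prevChainLink,
        string payloadDigest)
    {
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        hasher.AppendData(Encoding.UTF8.GetBytes(tHlc));
        hasher.AppendData(Encoding.UTF8.GetBytes(jobId));
        hasher.AppendData(Encoding.UTF8.GetBytes(action));
        hasher.AppendData(Encoding.UTF8.GetBytes(prevChainLink ?? "genesis"));
        hasher.AppendData(Encoding.UTF8.GetBytes(payloadDigest));
        return Convert.ToHexString(hasher.GetHashAndReset()).ToLowerInvariant();
    }
}
```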
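The in-memory stand-in below sketches how the chain_heads table and ComputeChainLink fit together when entries are appended per tenant. For each append it:

1. reads the tenant's current head,
2. computes the new link with that head as prevChainLink,
3. records the entry,
4. advances the head.

InMemoryChain and LogEntry are hypothetical names. The payload digest is taken as an opaque input because its exact derivation is not documented here.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for a scheduler_log row.
public sealed record LogEntry(string TenantId, string THlc, string JobId, string Action,
                              string? PrevChainLink, string ChainLink, string PayloadDigest);

// Hypothetical in-memory model of scheduler_log plus chain_heads.
public sealed class InMemoryChain
{
    private readonly Dictionary<string, string> _heads = new();   // tenant_id -> head_chain_link

    public List<LogEntry> Log { get; } = new();

    public LogEntry Append(string tenantId, string tHlc, string jobId, string action, string payloadDigest)
    {
        // A tenant with no head yet starts a new chain; ComputeChainLink then hashes "genesis".
        string? prev = _heads.TryGetValue(tenantId, out string? head) ? head : null;
        string link = ComputeChainLink(tHlc, jobId, action, prev, payloadDigest);
        var entry = new LogEntry(tenantId, tHlc, jobId, action, prev, link, payloadDigest);
        Log.Add(entry);
        _heads[tenantId] = link;   // advance the tenant's chain head
        return entry;
    }

    // Same field order and encoding as ComputeChainLink above.
    public static string ComputeChainLink(string tHlc, string jobId, string action,
                                          string? prevChainLink, string payloadDigest)
    {
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        foreach (string part in new[] { tHlc, jobId, action, prevChainLink ?? "genesis", payloadDigest })
            hasher.AppendData(Encoding.UTF8.GetBytes(part));
        return Convert.ToHexString(hasher.GetHashAndReset()).ToLowerInvariant();
    }
}
```

Against PostgreSQL, three steps would need to run in one transaction: the head read, the scheduler_log insert, and the chain_heads update. That transaction should lock the tenant's chain_heads row (for example with SELECT ... FOR UPDATE). Otherwise two concurrent appends could both extend the same head. This document does not describe how the Scheduler actually serializes appends.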

Configuration Options

```yaml
# etc/scheduler.yaml
scheduler:
  hlc:
    enabled: true                    # Enable HLC ordering (default: true)
    nodeId: "scheduler-prod-01"      # Unique node identifier
    maxClockSkew: "00:00:05"         # Maximum tolerable clock skew (5 seconds)
    persistenceInterval: "00:01:00"  # HLC state persistence interval (1 minute)

  chain:
    enabled: true                    # Enable chain linking (default: true)
    batchSize: 1000                  # Entries per batch snapshot
    batchInterval: "00:05:00"        # Batch snapshot interval (5 minutes)
    signSnapshots: true              # DSSE-sign batch snapshots
    keyId: "scheduler-signing-key"   # Key for snapshot signing
```
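The duration values appear to use the .NET TimeSpan constant format (hh:mm:ss), which is easy to misread: "00:05:00" is five minutes, not five seconds. The sketch below parses them that way and runs a few start-up checks taken from the comments above.

The record types, SchedulerOptionsSketch, and the specific checks are hypothetical. This document does not describe the Scheduler's real option binding and validation.

```csharp
using System;
using System.Globalization;

// Hypothetical typed views of the scheduler.hlc and scheduler.chain sections.
public sealed record HlcOptions(bool Enabled, string NodeId, TimeSpan MaxClockSkew, TimeSpan PersistenceInterval);
public sealed record ChainOptions(bool Enabled, int BatchSize, TimeSpan BatchInterval, bool SignSnapshots, string? KeyId);

public static class SchedulerOptionsSketch
{
    // Reads "hh:mm:ss" durations with the invariant TimeSpan "c" format: "00:05:00" is five minutes.
    public static TimeSpan ParseDuration(string value) =>
        TimeSpan.ParseExact(value, "c", CultureInfo.InvariantCulture);

    // Illustrative start-up checks derived from the comments in the sample file.
    public static void Validate(HlcOptions hlc, ChainOptions chain)
    {
        if (hlc.Enabled && string.IsNullOrWhiteSpace(hlc.NodeId))
            throw new ArgumentException("scheduler.hlc.nodeId must be set to a unique node identifier.");
        if (chain.Enabled && chain.BatchSize <= 0)
            throw new ArgumentException("scheduler.chain.batchSize must be positive.");
        if (chain.SignSnapshots && string.IsNullOrWhiteSpace(chain.KeyId))
            throw new ArgumentException("scheduler.chain.keyId is required when signSnapshots is true.");
    }
}
```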

Operational Considerations

Clock Skew Handling

The HLC algorithm tolerates clock skew by:

  1. Advancing the logical counter when physical time has not progressed
  2. Rejecting events whose clock skew exceeds maxClockSkew
  3. Emitting the hlc_clock_skew_rejections_total metric for monitoring

Alert: HlcClockSkewExceeded fires when clock skew exceeds the configured tolerance.
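Below is a minimal sketch of an HLC that behaves as described above, using the update rules from the original HLC paper by Kulkarni et al. The sketch makes two assumptions:

- Skew is measured as how far a remote timestamp's physical time runs ahead of the local wall clock.
- Such timestamps are rejected outright rather than clamped.

HybridLogicalClock and its members are hypothetical names. The rejection counter only marks where hlc_clock_skew_rejections_total would be incremented.

```csharp
using System;

// Hypothetical HLC following the published update rules; not the Scheduler's actual API.
public sealed class HybridLogicalClock
{
    private readonly object _gate = new();
    private readonly string _nodeId;
    private readonly long _maxSkewMs;
    private readonly Func<long> _wallClockMs;   // Unix milliseconds (UTC); injectable for tests
    private long _physical;                      // largest physical time observed so far
    private int _logical;
    private long _skewRejections;                // would feed hlc_clock_skew_rejections_total

    public HybridLogicalClock(string nodeId, TimeSpan maxClockSkew, Func<long>? wallClockMs = null)
    {
        _nodeId = nodeId;
        _maxSkewMs = (long)maxClockSkew.TotalMilliseconds;
        _wallClockMs = wallClockMs ?? (() => DateTimeOffset.UtcNow.ToUnixTimeMilliseconds());
    }

    public long SkewRejections { get { lock (_gate) return _skewRejections; } }

    // Local or send event: follow the wall clock when it advances, otherwise bump the counter.
    public (long Physical, int Logical, string NodeId) Tick()
    {
        lock (_gate)
        {
            long now = _wallClockMs();
            if (now > _physical) { _physical = now; _logical = 0; }
            else _logical++;
            return (_physical, _logical, _nodeId);
        }
    }

    // Receive event: merge a remote timestamp unless it is too far ahead of the local wall clock.
    public bool TryReceive(long remotePhysical, int remoteLogical,
                           out (long Physical, int Logical, string NodeId) merged)
    {
        lock (_gate)
        {
            long now = _wallClockMs();
            if (remotePhysical - now > _maxSkewMs)
            {
                _skewRejections++;
                merged = default;
                return false;
            }
            long next = Math.Max(Math.Max(_physical, remotePhysical), now);
            if (next == _physical && next == remotePhysical) _logical = Math.Max(_logical, remoteLogical) + 1;
            else if (next == _physical) _logical++;
            else if (next == remotePhysical) _logical = remoteLogical + 1;
            else _logical = 0;
            _physical = next;
            merged = (_physical, _logical, _nodeId);
            return true;
        }
    }
}
```

Injecting the wall clock as a delegate keeps the sketch deterministic under test. It also shows that the logical counter, not the physical clock, absorbs same-millisecond events.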

Chain Verification

Verify chain integrity on startup and periodically:

```
# CLI command
stella scheduler chain verify --tenant-id <tenant>

# API endpoint
GET /api/v1/scheduler/chain/verify?tenantId=<tenant>
```
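A verification pass can be sketched as follows. Walk a tenant's entries in chain order, and for each entry:

1. confirm its prev_chain_link equals the preceding entry's chain_link (null for the first entry),
2. recompute its chain_link and compare it with the stored value.

Finally, compare the last link with head_chain_link in chain_heads.

ChainVerifier, ChainRow, and the return convention are hypothetical. The sketch trusts the stored payload digest because this document does not say how that digest is derived from payload.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical row shape mirroring the scheduler_log columns used by the chain.
public sealed record ChainRow(string THlc, string JobId, string Action,
                              string? PrevChainLink, string ChainLink, string PayloadDigest);

public static class ChainVerifier
{
    // Returns -1 if the chain verifies, the index of the first bad row otherwise,
    // or rows.Count when every row verifies but the last link differs from the stored head.
    public static int FindFirstBreak(IReadOnlyList<ChainRow> rows, string? headChainLink)
    {
        string? prev = null;
        for (int i = 0; i < rows.Count; i++)
        {
            ChainRow row = rows[i];
            if (row.PrevChainLink != prev) return i;   // predecessor link broken (entry deleted or reordered)
            string expected = ComputeChainLink(row.THlc, row.JobId, row.Action, prev, row.PayloadDigest);
            if (expected != row.ChainLink) return i;    // entry fields altered after it was written
            prev = row.ChainLink;
        }
        return prev == headChainLink ? -1 : rows.Count;
    }

    // Same field order and encoding as ComputeChainLink earlier in this document.
    public static string ComputeChainLink(string tHlc, string jobId, string action,
                                          string? prevChainLink, string payloadDigest)
    {
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        foreach (string part in new[] { tHlc, jobId, action, prevChainLink ?? "genesis", payloadDigest })
            hasher.AppendData(Encoding.UTF8.GetBytes(part));
        return Convert.ToHexString(hasher.GetHashAndReset()).ToLowerInvariant();
    }
}
```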

Offline Merge

When offline nodes reconnect:

  1. Export the local job log as a bundle
  2. Import the bundle on a connected node
  3. The HLC-based merge produces a deterministic ordering
  4. The chain is extended with the merged entries

See docs/operations/airgap-operations-runbook.md for details.
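The determinism in step 3 comes from sorting on a total order that every node evaluates identically. The sketch below:

- unions the local and imported entries,
- drops exact duplicates (for example, a bundle imported twice),
- sorts by physical time, logical counter, and node ID, with job ID and action as extra tie-breakers.

PendingEntry and OfflineMerge are hypothetical names. The duplicate handling and the extra tie-breakers are assumptions. The runbook referenced above is the authority on the real procedure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical log entry awaiting merge; HLC components are kept numeric for ordering.
public sealed record PendingEntry(long Physical, int Logical, string NodeId, string JobId, string Action);

public static class OfflineMerge
{
    // Union both logs, drop exact duplicates, and sort by a total order every node evaluates the same way.
    public static List<PendingEntry> Merge(IEnumerable<PendingEntry> local, IEnumerable<PendingEntry> imported) =>
        local.Concat(imported)
             .Distinct()                                          // record equality: all fields match
             .OrderBy(e => e.Physical)
             .ThenBy(e => e.Logical)
             .ThenBy(e => e.NodeId, StringComparer.Ordinal)
             .ThenBy(e => e.JobId, StringComparer.Ordinal)        // extra tie-breakers (assumption)
             .ThenBy(e => e.Action, StringComparer.Ordinal)
             .ToList();
}
```

Every field takes part in the sort, so the result depends neither on which node performs the merge nor on the order of the inputs. Each merged entry would then be appended to the chain with the usual chain-link computation (step 4).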

Metrics

| Metric | Type | Description |
|---|---|---|
| hlc_ticks_total | Counter | Total HLC tick operations |
| hlc_clock_skew_rejections_total | Counter | Events rejected due to clock skew |
| hlc_physical_offset_seconds | Gauge | Current physical time offset |
| scheduler_chain_entries_total | Counter | Total chain log entries |
| scheduler_chain_verifications_total | Counter | Chain verification operations |
| scheduler_chain_verification_failures_total | Counter | Failed verifications |
| scheduler_batch_snapshots_total | Counter | Batch snapshots created |

Grafana Dashboard

See devops/observability/grafana/hlc-queue-metrics.json for the HLC monitoring dashboard.