# Scheduler HLC Ordering Architecture
This document describes the Hybrid Logical Clock (HLC) based ordering system used by the StellaOps Scheduler for audit-safe job queue operations.
## Overview
The Scheduler uses HLC timestamps instead of wall-clock time to ensure:
1. **Total ordering** of jobs across distributed nodes
2. **Audit-safe sequencing** with cryptographic chain linking
3. **Deterministic merge** when offline nodes reconnect
4. **Clock skew tolerance** in distributed deployments
## HLC Timestamp Format
An HLC timestamp consists of three components:
```
(PhysicalTime, LogicalCounter, NodeId)
```
| Component | Description | Example |
|-----------|-------------|---------|
| PhysicalTime | Unix milliseconds (UTC) | `1704585600000` |
| LogicalCounter | Monotonic counter for same-millisecond events | `0`, `1`, `2`... |
| NodeId | Unique identifier for the node | `scheduler-prod-01` |

**String format:** `{physical}:{logical}:{nodeId}`

Example: `1704585600000:0:scheduler-prod-01`
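
The format above can be modelled as a small value type. The following is a minimal sketch, not the API of the HLC core library: the `HlcTimestamp` type, its `Parse` method, and the comparison order (physical time, then logical counter, then node ID) are assumptions made for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical type for illustration; the real library's API may differ.
public readonly record struct HlcTimestamp(long PhysicalTime, int LogicalCounter, string NodeId)
    : IComparable<HlcTimestamp>
{
    // Renders the documented string form: {physical}:{logical}:{nodeId}
    public override string ToString() =>
        PhysicalTime.ToString(CultureInfo.InvariantCulture) + ":" +
        LogicalCounter.ToString(CultureInfo.InvariantCulture) + ":" +
        NodeId;

    public static HlcTimestamp Parse(string value)
    {
        // Split into at most three parts so a node ID containing ':' stays intact.
        var parts = value.Split(':', 3);
        if (parts.Length != 3)
            throw new FormatException($"Invalid HLC timestamp: '{value}'");

        return new HlcTimestamp(
            long.Parse(parts[0], CultureInfo.InvariantCulture),
            int.Parse(parts[1], CultureInfo.InvariantCulture),
            parts[2]);
    }

    // Assumed ordering: physical time, then logical counter, then node ID (ordinal).
    public int CompareTo(HlcTimestamp other)
    {
        int c = PhysicalTime.CompareTo(other.PhysicalTime);
        if (c != 0) return c;
        c = LogicalCounter.CompareTo(other.LogicalCounter);
        if (c != 0) return c;
        return string.CompareOrdinal(NodeId, other.NodeId);
    }
}
```

Comparing the components numerically matters in this sketch: for the unpadded string form, `1704585600000:10:n` sorts before `1704585600000:9:n` as plain text, while the component-wise comparison orders counter `9` before `10`.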
## Database Schema
### scheduler_log Table
```sql
CREATE TABLE scheduler.scheduler_log (
    id              BIGSERIAL PRIMARY KEY,
    t_hlc           TEXT NOT NULL,   -- HLC timestamp
    job_id          TEXT NOT NULL,   -- Job identifier
    action          TEXT NOT NULL,   -- ENQUEUE, DEQUEUE, EXECUTE, COMPLETE, FAIL
    prev_chain_link TEXT,            -- Hash of previous entry
    chain_link      TEXT NOT NULL,   -- Hash of this entry
    payload         JSONB NOT NULL,  -- Job metadata
    tenant_id       TEXT NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_scheduler_log_hlc ON scheduler.scheduler_log (t_hlc);
CREATE INDEX idx_scheduler_log_tenant_hlc ON scheduler.scheduler_log (tenant_id, t_hlc);
CREATE INDEX idx_scheduler_log_job ON scheduler.scheduler_log (job_id);
```
### batch_snapshot Table
```sql
CREATE TABLE scheduler.batch_snapshot (
    id              BIGSERIAL PRIMARY KEY,
    snapshot_hlc    TEXT NOT NULL,     -- HLC at snapshot time
    from_chain_link TEXT NOT NULL,     -- First entry in batch
    to_chain_link   TEXT NOT NULL,     -- Last entry in batch
    entry_count     INTEGER NOT NULL,
    merkle_root     TEXT NOT NULL,     -- Merkle root of entries
    dsse_envelope   JSONB,             -- DSSE-signed attestation
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
### chain_heads Table
```sql
CREATE TABLE scheduler.chain_heads (
    tenant_id       TEXT PRIMARY KEY,
    head_chain_link TEXT NOT NULL,   -- Current chain head
    head_hlc        TEXT NOT NULL,   -- HLC of chain head
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
## Chain Link Computation
Each log entry is cryptographically linked to its predecessor:
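```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Links a log entry to its predecessor by hashing the entry fields together
// with the previous chain link. Fields are appended without separators.
public static string ComputeChainLink(
    string tHlc,
    string jobId,
    string action,
    string? prevChainLink,
    string payloadDigest)
{
    using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
    hasher.AppendData(Encoding.UTF8.GetBytes(tHlc));
    hasher.AppendData(Encoding.UTF8.GetBytes(jobId));
    hasher.AppendData(Encoding.UTF8.GetBytes(action));
    // A null previous link (an entry with no predecessor) hashes the literal "genesis".
    hasher.AppendData(Encoding.UTF8.GetBytes(prevChainLink ?? "genesis"));
    hasher.AppendData(Encoding.UTF8.GetBytes(payloadDigest));
    // Lowercase hex encoding of the SHA-256 digest.
    return Convert.ToHexString(hasher.GetHashAndReset()).ToLowerInvariant();
}
```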
## Configuration Options
```yaml
# etc/scheduler.yaml
scheduler:
  hlc:
    enabled: true                     # Enable HLC ordering (default: true)
    nodeId: "scheduler-prod-01"       # Unique node identifier
    maxClockSkew: "00:00:05"          # Maximum tolerable clock skew (5 seconds)
    persistenceInterval: "00:01:00"   # HLC state persistence interval
  chain:
    enabled: true                     # Enable chain linking (default: true)
    batchSize: 1000                   # Entries per batch snapshot
    batchInterval: "00:05:00"         # Batch snapshot interval
    signSnapshots: true               # DSSE-sign batch snapshots
    keyId: "scheduler-signing-key"    # Key for snapshot signing
```
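
The duration values above read as `hh:mm:ss` strings. As a rough sketch of how the `hlc` settings might be checked at startup, the following assumes they follow .NET's constant `TimeSpan` format (consistent with the "5 seconds" comment). The `HlcOptionsSketch` type, its `FromStrings` method, and the validation rules are hypothetical and not part of the Scheduler's actual configuration binding.

```csharp
using System;
using System.Globalization;

// Hypothetical options holder; property names mirror the YAML keys under scheduler.hlc.
public sealed record HlcOptionsSketch(
    bool Enabled, string NodeId, TimeSpan MaxClockSkew, TimeSpan PersistenceInterval)
{
    public static HlcOptionsSketch FromStrings(
        bool enabled, string nodeId, string maxClockSkew, string persistenceInterval)
    {
        // Assumed rule: every node needs a non-empty identifier.
        if (string.IsNullOrWhiteSpace(nodeId))
            throw new ArgumentException("nodeId must be set", nameof(nodeId));

        // "c" is the culture-invariant [d.]hh:mm:ss[.fffffff] TimeSpan format.
        var skew = TimeSpan.ParseExact(maxClockSkew, "c", CultureInfo.InvariantCulture);
        var interval = TimeSpan.ParseExact(persistenceInterval, "c", CultureInfo.InvariantCulture);

        // Assumed rule: both durations must be positive.
        if (skew <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(maxClockSkew), "must be positive");
        if (interval <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(persistenceInterval), "must be positive");

        return new HlcOptionsSketch(enabled, nodeId, skew, interval);
    }
}
```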
## Operational Considerations
### Clock Skew Handling
The HLC algorithm tolerates clock skew by:
1. Advancing logical counter when physical time hasn't progressed
2. Rejecting events with excessive clock skew (> `maxClockSkew`)
3. Emitting `hlc_clock_skew_rejections_total` metric for monitoring

**Alert:** `HlcClockSkewExceeded` fires when the observed skew exceeds `maxClockSkew`.
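
The first two behaviours can be illustrated with the standard HLC update rules. This is a sketch under stated assumptions, not the StellaOps implementation: the `HlcClockSketch` class and its members are hypothetical, the wall clock is injected as a delegate, and skew is only checked for remote timestamps that are ahead of the local clock.

```csharp
using System;

// Hypothetical sketch of the standard HLC update rules.
public sealed class HlcClockSketch
{
    private readonly Func<long> _wallClockMs; // injected wall clock (Unix milliseconds)
    private readonly long _maxSkewMs;
    private readonly object _gate = new();
    private long _physical;
    private int _logical;

    public HlcClockSketch(Func<long> wallClockMs, TimeSpan maxClockSkew)
    {
        _wallClockMs = wallClockMs;
        _maxSkewMs = (long)maxClockSkew.TotalMilliseconds;
    }

    // Local event: adopt wall-clock time if it moved forward, otherwise bump the counter.
    public (long Physical, int Logical) Tick()
    {
        lock (_gate)
        {
            long now = _wallClockMs();
            if (now > _physical) { _physical = now; _logical = 0; }
            else { _logical++; }
            return (_physical, _logical);
        }
    }

    // Remote event: reject if the remote clock is too far ahead, otherwise merge.
    public (long Physical, int Logical) Receive(long remotePhysical, int remoteLogical)
    {
        lock (_gate)
        {
            long now = _wallClockMs();
            if (remotePhysical - now > _maxSkewMs)
                throw new InvalidOperationException("Remote HLC exceeds maxClockSkew");

            long max = Math.Max(now, Math.Max(_physical, remotePhysical));
            if (max == _physical && max == remotePhysical)
                _logical = Math.Max(_logical, remoteLogical) + 1;
            else if (max == _physical)
                _logical++;
            else if (max == remotePhysical)
                _logical = remoteLogical + 1;
            else
                _logical = 0; // wall clock is strictly ahead of both
            _physical = max;
            return (_physical, _logical);
        }
    }
}
```

In a real deployment the rejection branch is where a counter such as `hlc_clock_skew_rejections_total` would be incremented; the sketch throws instead to keep the example self-contained.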
### Chain Verification
Verify chain integrity on startup and periodically:
```bash
# CLI command
stella scheduler chain verify --tenant-id <tenant>
# API endpoint
GET /api/v1/scheduler/chain/verify?tenantId=<tenant>
```
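
Conceptually, verification means walking the entries in chain order and recomputing each link with `ComputeChainLink`. The sketch below illustrates that idea only; it is not necessarily what the CLI or API does internally. The `LogEntry` record, the `ChainVerifierSketch` class, and the assumption that a payload digest is available for each row are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical in-memory shape of a scheduler_log row, for illustration only.
public sealed record LogEntry(
    string THlc, string JobId, string Action,
    string? PrevChainLink, string ChainLink, string PayloadDigest);

public static class ChainVerifierSketch
{
    // Same computation as ComputeChainLink in "Chain Link Computation",
    // repeated here so the sketch is self-contained.
    public static string ComputeChainLink(
        string tHlc, string jobId, string action, string? prevChainLink, string payloadDigest)
    {
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        foreach (var part in new[] { tHlc, jobId, action, prevChainLink ?? "genesis", payloadDigest })
            hasher.AppendData(Encoding.UTF8.GetBytes(part));
        return Convert.ToHexString(hasher.GetHashAndReset()).ToLowerInvariant();
    }

    // Returns the index of the first broken entry, or -1 if the chain is intact.
    public static int FindFirstBrokenLink(IReadOnlyList<LogEntry> entriesInChainOrder)
    {
        string? expectedPrev = null; // assumption: the first entry has no predecessor
        for (int i = 0; i < entriesInChainOrder.Count; i++)
        {
            var e = entriesInChainOrder[i];
            // The stored back-pointer must match the previous entry's link.
            if (e.PrevChainLink != expectedPrev) return i;
            // The stored link must match a fresh recomputation over the entry's fields.
            var recomputed = ComputeChainLink(e.THlc, e.JobId, e.Action, e.PrevChainLink, e.PayloadDigest);
            if (!string.Equals(recomputed, e.ChainLink, StringComparison.Ordinal)) return i;
            expectedPrev = e.ChainLink;
        }
        return -1;
    }
}
```

Because each link covers the previous one, changing any field of an entry invalidates that entry's link on recomputation, which is what makes the log tamper-evident.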
### Offline Merge
When offline nodes reconnect:
1. Export local job log as bundle
2. Import on connected node
3. HLC-based merge produces deterministic ordering
4. Chain is extended with merged entries

See `docs/operations/airgap-operations-runbook.md` for details.
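
The deterministic ordering in step 3 can be sketched as a sort over the combined entries. This is an illustration only: the `MergeEntry` record and `OfflineMergeSketch` class are hypothetical, the bundle format is not modelled, and the sketch assumes each entry carries a unique HLC triple compared as physical time, then logical counter, then node ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entry shape; the real bundle format is not described in this document.
public sealed record MergeEntry(
    long PhysicalTime, int LogicalCounter, string NodeId, string JobId, string Action);

public static class OfflineMergeSketch
{
    // Produces the same ordering regardless of which log is passed first,
    // provided every distinct entry has a unique HLC triple.
    public static List<MergeEntry> Merge(
        IEnumerable<MergeEntry> local, IEnumerable<MergeEntry> imported) =>
        local.Concat(imported)
             .Distinct() // records compare by value, so re-imported entries collapse
             .OrderBy(e => e.PhysicalTime)
             .ThenBy(e => e.LogicalCounter)
             .ThenBy(e => e.NodeId, StringComparer.Ordinal)
             .ToList();
}
```

Since the result depends only on the set of entries and not on arrival order, two nodes that import each other's bundles end up with the same sequence to extend the chain from.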
## Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `hlc_ticks_total` | Counter | Total HLC tick operations |
| `hlc_clock_skew_rejections_total` | Counter | Events rejected due to clock skew |
| `hlc_physical_offset_seconds` | Gauge | Current physical time offset |
| `scheduler_chain_entries_total` | Counter | Total chain log entries |
| `scheduler_chain_verifications_total` | Counter | Chain verification operations |
| `scheduler_chain_verification_failures_total` | Counter | Failed verifications |
| `scheduler_batch_snapshots_total` | Counter | Batch snapshots created |
## Grafana Dashboard
See `devops/observability/grafana/hlc-queue-metrics.json` for the HLC monitoring dashboard.
## Related Documentation
- [HLC Core Library](../../../src/__Libraries/StellaOps.HybridLogicalClock/README.md)
- [HLC Migration Guide](./hlc-migration-guide.md)
- [Air-Gap Operations Runbook](../../operations/airgap-operations-runbook.md)
- [HLC Troubleshooting](../../operations/runbooks/hlc-troubleshooting.md)