Files
git.stella-ops.org/docs/modules/airgap/guides/job-sync-offline.md

219 lines
6.5 KiB
Markdown

# HLC Job Sync Offline Operations
Sprint: SPRINT_20260105_002_003_ROUTER
This document describes the offline job synchronization mechanism using Hybrid Logical Clock (HLC) ordering for air-gap scenarios.
## Overview
When nodes operate in disconnected/offline mode, scheduled jobs are enqueued locally with HLC timestamps. Upon reconnection or air-gap transfer, these job logs are merged deterministically to maintain global ordering.
Key features:
- **Deterministic ordering**: Jobs merge by HLC total order `(T_hlc.PhysicalTime, T_hlc.LogicalCounter, NodeId, JobId)`
- **Chain integrity**: Each entry links to the previous via `link = Hash(prev_link || job_id || t_hlc || payload_hash)`
- **Conflict-free**: Same payload = same JobId (deterministic), so duplicates are safely dropped
- **Audit trail**: Source node ID and original links preserved for traceability
## CLI Commands
### Export Job Logs
Export offline job logs to a sync bundle for air-gap transfer:
```bash
# Export job logs for a tenant
stella airgap jobs export --tenant my-tenant -o job-sync-bundle.json
# Export with verbose output
stella airgap jobs export --tenant my-tenant -o bundle.json --verbose
# Export as JSON for automation
stella airgap jobs export --tenant my-tenant --json
```
Options:
- `--tenant, -t` - Tenant ID (defaults to "default")
- `--output, -o` - Output file path
- `--node` - Export specific node only (default: current node)
- `--sign` - Sign bundle with DSSE
- `--json` - Output result as JSON
- `--verbose` - Enable verbose logging
### Import Job Logs
Import a job sync bundle from air-gap transfer:
```bash
# Verify bundle without importing
stella airgap jobs import bundle.json --verify-only
# Import bundle
stella airgap jobs import bundle.json
# Force import despite validation issues
stella airgap jobs import bundle.json --force
# Import with JSON output for automation
stella airgap jobs import bundle.json --json
```
Options:
- `bundle` - Path to job sync bundle file (required)
- `--verify-only` - Only verify the bundle without importing
- `--force` - Force import even if validation fails
- `--json` - Output result as JSON
- `--verbose` - Enable verbose logging
### List Available Bundles
List job sync bundles in a directory:
```bash
# List bundles in current directory
stella airgap jobs list
# List bundles in specific directory
stella airgap jobs list --source /path/to/bundles
# Output as JSON
stella airgap jobs list --json
```
Options:
- `--source, -s` - Source directory (default: current directory)
- `--json` - Output result as JSON
- `--verbose` - Enable verbose logging
## Bundle Format
Job sync bundles are JSON files with the following structure:
```json
{
"bundleId": "guid",
"tenantId": "string",
"createdAt": "ISO8601",
"createdByNodeId": "string",
"manifestDigest": "sha256:hex",
"signature": "base64 (optional)",
"signedBy": "keyId (optional)",
"jobLogs": [
{
"nodeId": "string",
"lastHlc": "HLC timestamp string",
"chainHead": "base64",
"entries": [
{
"nodeId": "string",
"tHlc": "HLC timestamp string",
"jobId": "guid",
"partitionKey": "string (optional)",
"payload": "JSON string",
"payloadHash": "base64",
"prevLink": "base64 (null for first)",
"link": "base64",
"enqueuedAt": "ISO8601"
}
]
}
]
}
```
## Validation
Bundle validation checks:
1. **Manifest digest**: Recomputes digest from job logs and compares
2. **Chain integrity**: Verifies each entry's prev_link matches expected
3. **Link verification**: Recomputes links and verifies against stored values
4. **Chain head**: Verifies last entry link matches node's chain head
## Merge Algorithm
When importing bundles from multiple nodes:
1. **Collect**: Gather all entries from all node logs
2. **Sort**: Order by HLC total order `(PhysicalTime, LogicalCounter, NodeId, JobId)`
3. **Deduplicate**: Same JobId = same payload (drop later duplicates)
4. **Recompute chain**: Build unified chain from merged entries
This produces a deterministic ordering regardless of import sequence.
## Conflict Resolution
| Scenario | Resolution |
|----------|------------|
| Same JobId, same payload, different HLC | Take earliest HLC, drop duplicates |
| Same JobId, different payloads | Error - indicates bug in deterministic ID computation |
## Metrics
The following metrics are emitted:
| Metric | Type | Description |
|--------|------|-------------|
| `airgap_bundles_exported_total` | Counter | Total bundles exported |
| `airgap_bundles_imported_total` | Counter | Total bundles imported |
| `airgap_jobs_synced_total` | Counter | Total jobs synced |
| `airgap_duplicates_dropped_total` | Counter | Duplicates dropped during merge |
| `airgap_merge_conflicts_total` | Counter | Merge conflicts by type |
| `airgap_offline_enqueues_total` | Counter | Offline enqueue operations |
| `airgap_bundle_size_bytes` | Histogram | Bundle size distribution |
| `airgap_sync_duration_seconds` | Histogram | Sync operation duration |
| `airgap_merge_entries_count` | Histogram | Entries per merge operation |
## Service Registration
To use job sync in your application:
```csharp
// Register core services
services.AddAirGapSyncServices(nodeId: "my-node-id");
// Register file-based transport (for air-gap)
services.AddFileBasedJobSyncTransport();
// Or router-based transport (for connected scenarios)
services.AddRouterJobSyncTransport();
// Register sync service (requires ISyncSchedulerLogRepository)
services.AddAirGapSyncImportService();
```
## Operational Runbook
### Pre-Export Checklist
- [ ] Node has offline job logs to export
- [ ] Target path is writable
- [ ] Signing key available (if --sign used)
### Pre-Import Checklist
- [ ] Bundle file accessible
- [ ] Bundle signature verified (if signed)
- [ ] Scheduler database accessible
- [ ] Sufficient disk space
### Recovery Procedures
**Chain validation failure:**
1. Identify which entry has chain break
2. Check for data corruption in bundle
3. Re-export from source node if possible
4. Use `--force` only if data loss is acceptable
**Duplicate conflict:**
1. This is expected - duplicates are safely dropped
2. Check duplicate count in output
3. Verify merged jobs match expected count
**Payload mismatch (same JobId, different payloads):**
1. This indicates a bug - same idempotency key should produce same payload
2. Review job generation logic
3. Do not force import - fix root cause
## See Also
- [Air-Gap Operations](operations.md)
- [Mirror Bundles](mirror-bundles.md)
- [Staleness and Time](staleness-and-time.md)