Checkpoint Divergence Detection and Incident Response
This runbook covers the detection of Rekor checkpoint divergence, anomaly types, alert handling, and incident response procedures.
Overview
Checkpoint divergence detection monitors the integrity of Rekor transparency logs by:
- Comparing root hashes at the same tree size
- Verifying tree size monotonicity (the tree size must never decrease)
- Cross-checking primary logs against mirrors
- Detecting stale or unresponsive logs
Divergence can indicate:
- Split-view attacks (malicious log server showing different trees to different clients)
- Rollback attacks (hiding recent log entries)
- Log compromise or key theft
- Network partitions or operational issues
Detection Rules
| Check | Condition | Severity | Recommended Action |
|---|---|---|---|
| Root hash mismatch | Same tree_size, different root_hash | CRITICAL | Quarantine + immediate investigation |
| Tree size rollback | new_tree_size < stored_tree_size | CRITICAL | Reject checkpoint + alert |
| Cross-log divergence | Primary root ≠ mirror root at same size | WARNING | Alert + investigate |
| Stale checkpoint | Checkpoint age > threshold | WARNING | Alert + monitor |
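The single-log rules in the table can be sketched as a small comparison routine. This is a minimal illustration, not the detector's actual implementation: the `Checkpoint` type, the function names, and the `TreeSizeRollback` label are hypothetical (only `RootHashMismatch` appears elsewhere in this runbook, as a metric label). The cross-log rule is omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical minimal view of a stored or freshly fetched checkpoint."""
    origin: str
    tree_size: int
    root_hash: str


def classify_checkpoint(stored: Checkpoint, new: Checkpoint) -> Optional[str]:
    """Apply the rollback and root-hash rules; return an anomaly name or None."""
    if new.tree_size < stored.tree_size:
        # Rule: new_tree_size < stored_tree_size (label is an assumption)
        return "TreeSizeRollback"
    if new.tree_size == stored.tree_size and new.root_hash != stored.root_hash:
        # Rule: same tree_size, different root_hash
        return "RootHashMismatch"
    return None


def is_stale(age_seconds: float, threshold_seconds: float) -> bool:
    """Rule: checkpoint age > threshold."""
    return age_seconds > threshold_seconds
```

A checkpoint with a larger tree size passes these checks; confirming that it extends the stored tree would require a consistency proof, which this sketch does not attempt.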
Alert Payloads
Root Hash Mismatch Alert
{
  "eventType": "rekor.checkpoint.divergence",
  "severity": "critical",
  "origin": "rekor.sigstore.dev",
  "treeSize": 12345678,
  "expectedRootHash": "sha256:abc123...",
  "actualRootHash": "sha256:def456...",
  "detectedAt": "2026-01-15T12:34:56Z",
  "backend": "sigstore-prod",
  "description": "Checkpoint root hash mismatch detected. Possible split-view attack.",
  "recommendedAction": "Quarantine"
}
Rollback Attempt Alert
{
  "eventType": "rekor.checkpoint.rollback",
  "severity": "critical",
  "origin": "rekor.sigstore.dev",
  "previousTreeSize": 12345678,
  "attemptedTreeSize": 12345600,
  "detectedAt": "2026-01-15T12:34:56Z",
  "description": "Tree size regression detected. Possible rollback attack."
}
Cross-Log Divergence Alert
{
  "eventType": "rekor.checkpoint.cross_log_divergence",
  "severity": "warning",
  "primaryOrigin": "rekor.sigstore.dev",
  "mirrorOrigin": "rekor.mirror.example.com",
  "treeSize": 12345678,
  "primaryRootHash": "sha256:abc123...",
  "mirrorRootHash": "sha256:def456...",
  "description": "Cross-log divergence detected between primary and mirror."
}
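For testing alert consumers, it can help to generate a payload with the same shape as the root hash mismatch alert. The sketch below assumes nothing about how the Attestor service builds its alerts; `build_divergence_alert` is a hypothetical helper that simply mirrors the field names shown in that payload.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_divergence_alert(
    origin: str,
    backend: str,
    tree_size: int,
    expected_root: str,
    actual_root: str,
    detected_at: Optional[datetime] = None,
) -> str:
    """Return a JSON string shaped like the root hash mismatch alert (hypothetical helper)."""
    # detected_at should be timezone-aware; it defaults to the current UTC time.
    when = detected_at or datetime.now(timezone.utc)
    payload = {
        "eventType": "rekor.checkpoint.divergence",
        "severity": "critical",
        "origin": origin,
        "treeSize": tree_size,
        "expectedRootHash": expected_root,
        "actualRootHash": actual_root,
        "detectedAt": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "backend": backend,
        "description": "Checkpoint root hash mismatch detected. Possible split-view attack.",
        "recommendedAction": "Quarantine",
    }
    return json.dumps(payload, indent=2)
```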
Metrics
# Counter: total checkpoint mismatches
attestor_rekor_checkpoint_mismatch_total{backend="sigstore-prod",origin="rekor.sigstore.dev"} 0
# Counter: rollback attempts detected
attestor_rekor_checkpoint_rollback_detected_total{backend="sigstore-prod"} 0
# Counter: cross-log divergences detected
attestor_rekor_cross_log_divergence_total{primary="rekor.sigstore.dev",mirror="mirror.example.com"} 0
# Gauge: seconds since last valid checkpoint
attestor_rekor_checkpoint_age_seconds{backend="sigstore-prod"} 120
# Counter: total anomalies detected (all types)
attestor_rekor_anomalies_detected_total{type="RootHashMismatch",severity="critical"} 0
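During an incident it can be useful to check a scraped metrics dump for counters that have moved off zero. The following is a rough sketch using only the standard library; it handles the simple `name{labels} value` lines shown above and skips anything else (timestamps, for example). The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches "metric_name{label="value",...} 123" or "metric_name 123".
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_samples(text: str) -> List[Sample]:
    """Parse simple exposition-format lines into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        match = _SAMPLE.match(line)
        if match is None:
            continue  # unsupported line shape; ignored in this sketch
        name, labels, value = match.groups()
        samples.append((name, dict(_LABEL.findall(labels or "")), float(value)))
    return samples


def nonzero_counters(text: str) -> List[Sample]:
    """Return samples whose name ends in _total and whose value is above zero."""
    return [s for s in parse_samples(text) if s[0].endswith("_total") and s[2] > 0]
```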
Incident Response Procedures
Level 1: Root Hash Mismatch (CRITICAL)
Symptoms:
- attestor_rekor_checkpoint_mismatch_total increments
- Alert received: "rekor.checkpoint.divergence"
Immediate Actions:
- Quarantine all affected proofs - Do not rely on any inclusion proofs from the affected log until resolved
- Suspend automated verifications - Halt any automated systems that depend on the log
- Preserve evidence - Capture both checkpoints (expected and actual) with full metadata
- Alert security team - This is a potential compromise indicator
Investigation Steps:
- Verify that the mismatch is not caused by local storage corruption:
    stella attestor checkpoint verify --origin rekor.sigstore.dev --tree-size 12345678
- Cross-check with independent sources (other clients, mirrors)
- Check if Sigstore has published any incident reports
- Review network logs for MITM indicators
Resolution:
- If confirmed attack: Follow security incident process
- If local corruption: Resync from trusted source
- If upstream issue: Wait for Sigstore remediation, follow their guidance
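The "preserve evidence" action above could be supported by a small helper that records both checkpoints together with a digest, so later tampering with the record is detectable. This is only a sketch: the record layout, field names, and `build_evidence_record` are assumptions, not part of the Attestor tooling.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_evidence_record(
    expected: Dict[str, Any],
    actual: Dict[str, Any],
    captured_at: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Bundle both checkpoints with a SHA-256 digest of the record (hypothetical layout)."""
    when = captured_at or datetime.now(timezone.utc)
    record = {
        "capturedAt": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "expected": expected,
        "actual": actual,
    }
    # Sorted keys and fixed separators give a stable byte string to hash.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"record": record, "sha256": hashlib.sha256(canonical).hexdigest()}
```

Storing the digest separately from the record (for example, in the incident ticket) makes it possible to confirm later that the captured checkpoints were not altered.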
Level 2: Tree Size Rollback (CRITICAL)
Symptoms:
- attestor_rekor_checkpoint_rollback_detected_total increments
- Alert received: "rekor.checkpoint.rollback"
Immediate Actions:
- Reject the checkpoint - Do not accept or store it
- Log full details for forensic analysis
- Check network path - Could indicate MITM or DNS hijacking
Investigation Steps:
- Verify current log state directly:
    curl -s https://rekor.sigstore.dev/api/v1/log | jq .treeSize
- Compare with stored latest tree size
- Check DNS resolution and TLS certificate chain
Resolution:
- If network attack: Remediate network path, rotate credentials
- If temporary glitch: Monitor for repetition
- If persistent: Escalate to upstream provider
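The comparison between the live tree size and the stored one can also be scripted. The sketch below reads the same `treeSize` field that the `curl | jq` command extracts; `fetch_log_info`, `parse_tree_size`, and `is_rollback` are hypothetical names, and the stored tree size would come from wherever your deployment keeps its latest accepted checkpoint.

```python
import json
import urllib.request

REKOR_LOG_URL = "https://rekor.sigstore.dev/api/v1/log"


def fetch_log_info(url: str = REKOR_LOG_URL, timeout: float = 10.0) -> bytes:
    """Fetch the raw log-info response body over HTTPS."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_tree_size(body: bytes) -> int:
    """Extract treeSize from the log-info JSON body."""
    return int(json.loads(body)["treeSize"])


def is_rollback(stored_tree_size: int, live_tree_size: int) -> bool:
    """True when the live log reports fewer entries than already recorded."""
    return live_tree_size < stored_tree_size
```

For example, `is_rollback(stored, parse_tree_size(fetch_log_info()))` compares the stored value against the live log. Run it from more than one network vantage point if MITM or DNS hijacking is suspected.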
Level 3: Cross-Log Divergence (WARNING)
Symptoms:
- attestor_rekor_cross_log_divergence_total increments
- Alert received: "rekor.checkpoint.cross_log_divergence"
Immediate Actions:
- Do not panic - Mirrors may have legitimate lag
- Check mirror sync status - May be catching up
Investigation Steps:
- Compare tree sizes:
    stella attestor checkpoint list --origins rekor.sigstore.dev,mirror.example.com
- If same tree size with different roots: Escalate to CRITICAL
- If different tree sizes: Allow time for sync
- If persistent: Investigate mirror operator
Resolution:
- Sync lag: Monitor until caught up
- Persistent divergence: Disable mirror, investigate, or remove from trust list
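The triage logic for this level reduces to a small decision function. The sketch below is illustrative only; the function name and the returned labels are hypothetical and do not correspond to any Attestor output.

```python
def triage_cross_log(
    primary_size: int,
    primary_root: str,
    mirror_size: int,
    mirror_root: str,
) -> str:
    """Classify a primary/mirror comparison following the investigation steps."""
    if primary_size == mirror_size:
        if primary_root == mirror_root:
            return "consistent"
        # Same tree size with different roots: treat as CRITICAL.
        return "escalate-critical"
    # Different tree sizes: roots are not comparable yet; allow time for sync.
    return "sync-lag"
```

A repeated `sync-lag` result over a long period corresponds to the "persistent" case above and warrants contacting the mirror operator.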
Level 4: Stale Checkpoint (WARNING)
Symptoms:
- attestor_rekor_checkpoint_age_seconds exceeds threshold
- Log health status: DEGRADED or UNHEALTHY
Immediate Actions:
- Check log service status
- Verify network connectivity to log
Investigation Steps:
- Check Sigstore status page
- Test direct API access:
    curl -I https://rekor.sigstore.dev/api/v1/log
- Review recent checkpoint fetch attempts
Resolution:
- Upstream outage: Wait, rely on cached data
- Local network issue: Restore connectivity
- Persistent: Consider failover to mirror
Configuration
Detector Options
attestor:
  divergenceDetection:
    # Enable checkpoint monitoring
    enabled: true
    # Threshold for "stale checkpoint" warning
    staleCheckpointThreshold: 1h
    # Threshold for "stale tree size" (no growth)
    staleTreeSizeThreshold: 2h
    # Log health thresholds
    degradedCheckpointAgeThreshold: 30m
    unhealthyCheckpointAgeThreshold: 2h
    # Enable cross-log consistency checks
    enableCrossLogChecks: true
    # Mirror origins to check against primary
    mirrorOrigins:
      - rekor.mirror.example.com
      - rekor.mirror2.example.com
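To see how the health thresholds relate to a checkpoint age in seconds, the following sketch converts duration strings such as `30m` and `2h` and derives a status. It assumes durations are a whole number followed by `s`, `m`, `h`, or `d`; the real configuration parser may accept other forms. The `HEALTHY` label and both function names are hypothetical.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a duration like '30m' or '2h' to seconds (assumed format)."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def health_status(age_seconds: float, degraded: str = "30m", unhealthy: str = "2h") -> str:
    """Map checkpoint age to a health status using the configured thresholds."""
    if age_seconds > parse_duration(unhealthy):
        return "UNHEALTHY"
    if age_seconds > parse_duration(degraded):
        return "DEGRADED"
    return "HEALTHY"
```

With the defaults shown in the configuration, a log is reported as degraded well before the one-hour stale checkpoint warning fires, which gives operators some lead time.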
Alert Options
attestor:
  alerts:
    # Enable alert publishing to Notify service
    enabled: true
    # Default tenant for system alerts
    defaultTenant: system
    # Severity thresholds for alerting
    alertOnHighSeverity: true
    alertOnWarning: true
    alertOnInfo: false
    # Alert stream name
    stream: attestor.alerts
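One way to reason about these flags is as a filter applied before an alert is published. The sketch below assumes that `critical` alerts are governed by `alertOnHighSeverity`; that mapping is a guess from the option names and is not confirmed by this runbook. `should_publish` is a hypothetical function.

```python
from typing import Mapping

# Assumed mapping from alert severity to the option that gates it.
_SEVERITY_TO_OPTION = {
    "critical": "alertOnHighSeverity",
    "warning": "alertOnWarning",
    "info": "alertOnInfo",
}


def should_publish(severity: str, alerts: Mapping[str, object]) -> bool:
    """Decide whether an alert of the given severity passes the configured filter."""
    if not alerts.get("enabled", False):
        return False  # publishing is switched off entirely
    option = _SEVERITY_TO_OPTION.get(severity.lower())
    if option is None:
        return False  # unknown severity; dropped in this sketch
    return bool(alerts.get(option, False))
```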
Runbook Checklist
Daily Operations
- Verify attestor_rekor_checkpoint_age_seconds < threshold
- Check for any anomaly counter increments
- Review divergence detector logs for warnings
Weekly Review
- Audit checkpoint storage integrity
- Verify mirror sync status
- Review and tune alerting thresholds
Post-Incident
- Document root cause
- Update detection rules if needed
- Review and improve response procedures
- Share learnings with team