# Checkpoint Divergence Detection and Incident Response

This runbook covers the detection of Rekor checkpoint divergence, anomaly types, alert handling, and incident response procedures.

## Overview

Checkpoint divergence detection monitors the integrity of Rekor transparency logs by:

- Comparing root hashes at the same tree size
- Verifying tree size monotonicity (only increases)
- Cross-checking primary logs against mirrors
- Detecting stale or unresponsive logs

Divergence can indicate:

- Split-view attacks (malicious log server showing different trees to different clients)
- Rollback attacks (hiding recent log entries)
- Log compromise or key theft
- Network partitions or operational issues

## Detection Rules

| Check | Condition | Severity | Recommended Action |
|-------|-----------|----------|-------------------|
| Root hash mismatch | Same tree_size, different root_hash | CRITICAL | Quarantine + immediate investigation |
| Tree size rollback | new_tree_size < stored_tree_size | CRITICAL | Reject checkpoint + alert |
| Cross-log divergence | Primary root ≠ mirror root at same size | WARNING | Alert + investigate |
| Stale checkpoint | Checkpoint age > threshold | WARNING | Alert + monitor |

## Alert Payloads

### Root Hash Mismatch Alert

```json
{
  "eventType": "rekor.checkpoint.divergence",
  "severity": "critical",
  "origin": "rekor.sigstore.dev",
  "treeSize": 12345678,
  "expectedRootHash": "sha256:abc123...",
  "actualRootHash": "sha256:def456...",
  "detectedAt": "2026-01-15T12:34:56Z",
  "backend": "sigstore-prod",
  "description": "Checkpoint root hash mismatch detected. Possible split-view attack.",
  "recommendedAction": "Quarantine"
}
```

### Rollback Attempt Alert

```json
{
  "eventType": "rekor.checkpoint.rollback",
  "severity": "critical",
  "origin": "rekor.sigstore.dev",
  "previousTreeSize": 12345678,
  "attemptedTreeSize": 12345600,
  "detectedAt": "2026-01-15T12:34:56Z",
  "description": "Tree size regression detected. Possible rollback attack."
}
```

### Cross-Log Divergence Alert

```json
{
  "eventType": "rekor.checkpoint.cross_log_divergence",
  "severity": "warning",
  "primaryOrigin": "rekor.sigstore.dev",
  "mirrorOrigin": "rekor.mirror.example.com",
  "treeSize": 12345678,
  "primaryRootHash": "sha256:abc123...",
  "mirrorRootHash": "sha256:def456...",
  "description": "Cross-log divergence detected between primary and mirror."
}
```

## Metrics

```
# Counter: total checkpoint mismatches
attestor_rekor_checkpoint_mismatch_total{backend="sigstore-prod",origin="rekor.sigstore.dev"} 0

# Counter: rollback attempts detected
attestor_rekor_checkpoint_rollback_detected_total{backend="sigstore-prod"} 0

# Counter: cross-log divergences detected
attestor_rekor_cross_log_divergence_total{primary="rekor.sigstore.dev",mirror="mirror.example.com"} 0

# Gauge: seconds since last valid checkpoint
attestor_rekor_checkpoint_age_seconds{backend="sigstore-prod"} 120

# Counter: total anomalies detected (all types)
attestor_rekor_anomalies_detected_total{type="RootHashMismatch",severity="critical"} 0
```

## Incident Response Procedures

### Level 1: Root Hash Mismatch (CRITICAL)

**Symptoms:**

- `attestor_rekor_checkpoint_mismatch_total` increments
- Alert received: "rekor.checkpoint.divergence"

**Immediate Actions:**

1. **Quarantine all affected proofs** - Do not rely on any inclusion proofs from the affected log until resolved
2. **Suspend automated verifications** - Halt any automated systems that depend on the log
3. **Preserve evidence** - Capture both checkpoints (expected and actual) with full metadata
4. **Alert security team** - This is a potential compromise indicator

**Investigation Steps:**

1. Verify the mismatch isn't a local storage corruption
   ```bash
   stella attestor checkpoint verify --origin rekor.sigstore.dev --tree-size 12345678
   ```
2. Cross-check with independent sources (other clients, mirrors)
3. Check if Sigstore has published any incident reports
4. Review network logs for MITM indicators

**Resolution:**

- If confirmed attack: Follow security incident process
- If local corruption: Resync from trusted source
- If upstream issue: Wait for Sigstore remediation, follow their guidance

### Level 2: Tree Size Rollback (CRITICAL)

**Symptoms:**

- `attestor_rekor_checkpoint_rollback_detected_total` increments
- Alert received: "rekor.checkpoint.rollback"

**Immediate Actions:**

1. **Reject the checkpoint** - Do not accept or store it
2. **Log full details** for forensic analysis
3. **Check network path** - Could indicate MITM or DNS hijacking

**Investigation Steps:**

1. Verify current log state directly:
   ```bash
   curl -s https://rekor.sigstore.dev/api/v1/log | jq .treeSize
   ```
2. Compare with stored latest tree size
3. Check DNS resolution and TLS certificate chain

**Resolution:**

- If network attack: Remediate network path, rotate credentials
- If temporary glitch: Monitor for repetition
- If persistent: Escalate to upstream provider

### Level 3: Cross-Log Divergence (WARNING)

**Symptoms:**

- `attestor_rekor_cross_log_divergence_total` increments
- Alert received: "rekor.checkpoint.cross_log_divergence"

**Immediate Actions:**

1. **Do not panic** - Mirrors may have legitimate lag
2. **Check mirror sync status** - May be catching up

**Investigation Steps:**

1. Compare tree sizes:
   ```bash
   stella attestor checkpoint list --origins rekor.sigstore.dev,mirror.example.com
   ```
2. If same tree size with different roots: Escalate to CRITICAL
3. If different tree sizes: Allow time for sync
4. If persistent: Investigate mirror operator

**Resolution:**

- Sync lag: Monitor until caught up
- Persistent divergence: Disable mirror, investigate, or remove from trust list

### Level 4: Stale Checkpoint (WARNING)

**Symptoms:**

- `attestor_rekor_checkpoint_age_seconds` exceeds threshold
- Log health status: DEGRADED or UNHEALTHY

**Immediate Actions:**

1. Check log service status
2. Verify network connectivity to log

**Investigation Steps:**

1. Check Sigstore status page
2. Test direct API access:
   ```bash
   curl -I https://rekor.sigstore.dev/api/v1/log
   ```
3. Review recent checkpoint fetch attempts

**Resolution:**

- Upstream outage: Wait, rely on cached data
- Local network issue: Restore connectivity
- Persistent: Consider failover to mirror

## Configuration

### Detector Options

```yaml
attestor:
  divergenceDetection:
    # Enable checkpoint monitoring
    enabled: true

    # Threshold for "stale checkpoint" warning
    staleCheckpointThreshold: 1h

    # Threshold for "stale tree size" (no growth)
    staleTreeSizeThreshold: 2h

    # Log health thresholds
    degradedCheckpointAgeThreshold: 30m
    unhealthyCheckpointAgeThreshold: 2h

    # Enable cross-log consistency checks
    enableCrossLogChecks: true

    # Mirror origins to check against primary
    mirrorOrigins:
      - rekor.mirror.example.com
      - rekor.mirror2.example.com
```

### Alert Options

```yaml
attestor:
  alerts:
    # Enable alert publishing to Notify service
    enabled: true

    # Default tenant for system alerts
    defaultTenant: system

    # Severity thresholds for alerting
    alertOnHighSeverity: true
    alertOnWarning: true
    alertOnInfo: false

    # Alert stream name
    stream: attestor.alerts
```

## Runbook Checklist

### Daily Operations

- [ ] Verify `attestor_rekor_checkpoint_age_seconds` < threshold
- [ ] Check for any anomaly counter increments
- [ ] Review divergence detector logs for warnings

### Weekly Review

- [ ] Audit checkpoint storage integrity
- [ ] Verify mirror sync status
- [ ] Review and tune alerting thresholds

### Post-Incident

- [ ] Document root cause
- [ ] Update detection rules if needed
- [ ] Review and improve response procedures
- [ ] Share learnings with team

## See Also

- [Rekor Verification Design](../modules/attestor/rekor-verification-design.md)
- [Attestor Architecture](../modules/attestor/architecture.md)
- [Sigstore Rekor Documentation](https://docs.sigstore.dev/rekor/overview/)
- [Certificate Transparency RFC 6962](https://www.rfc-editor.org/rfc/rfc6962)
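
## Appendix: Detection Rule Sketch

The four checks in the Detection Rules table can be read as a small decision function. The following is a minimal Python sketch of that logic, not the Attestor implementation. The `Checkpoint` and `Anomaly` types, the `detect_anomalies` function, and every anomaly name other than `RootHashMismatch` (which appears as a metric label in this runbook) are hypothetical names chosen for illustration. The one-hour default mirrors the `staleCheckpointThreshold` example value from the configuration section.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical in-memory view of a log checkpoint."""
    origin: str
    tree_size: int
    root_hash: str
    fetched_at: datetime


@dataclass(frozen=True)
class Anomaly:
    """Hypothetical result record: one row of the Detection Rules table."""
    kind: str
    severity: str
    action: str


def detect_anomalies(
    stored: Checkpoint,
    new: Checkpoint,
    mirror: Optional[Checkpoint] = None,
    now: Optional[datetime] = None,
    stale_threshold: timedelta = timedelta(hours=1),
) -> List[Anomaly]:
    """Apply the four detection rules to a newly fetched checkpoint."""
    now = now or datetime.now(timezone.utc)
    anomalies: List[Anomaly] = []

    if new.tree_size < stored.tree_size:
        # Tree size rollback: new_tree_size < stored_tree_size
        anomalies.append(
            Anomaly("TreeSizeRollback", "critical", "Reject checkpoint + alert"))
    elif new.tree_size == stored.tree_size and new.root_hash != stored.root_hash:
        # Root hash mismatch: same tree size, different root hash
        anomalies.append(
            Anomaly("RootHashMismatch", "critical",
                    "Quarantine + immediate investigation"))

    if (mirror is not None
            and mirror.tree_size == new.tree_size
            and mirror.root_hash != new.root_hash):
        # Cross-log divergence: primary root differs from mirror root at same size
        anomalies.append(
            Anomaly("CrossLogDivergence", "warning", "Alert + investigate"))

    if now - new.fetched_at > stale_threshold:
        # Stale checkpoint: checkpoint age exceeds the configured threshold
        anomalies.append(
            Anomaly("StaleCheckpoint", "warning", "Alert + monitor"))

    return anomalies


# Usage: two checkpoints at the same tree size with different roots.
t0 = datetime(2026, 1, 15, 12, 34, 56, tzinfo=timezone.utc)
stored = Checkpoint("rekor.sigstore.dev", 12345678, "sha256:abc123", t0)
forked = Checkpoint("rekor.sigstore.dev", 12345678, "sha256:def456", t0)
print([a.kind for a in detect_anomalies(stored, forked, now=t0)])
# → ['RootHashMismatch']
```

A mirror that reports a different tree size is deliberately not flagged here, which matches the Level 3 guidance to allow time for sync when sizes differ.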