# Checkpoint Divergence Detection and Incident Response

This runbook covers the detection of Rekor checkpoint divergence, anomaly types, alert handling, and incident response procedures.

## Overview

Checkpoint divergence detection monitors the integrity of Rekor transparency logs by:

- Comparing root hashes at the same tree size
- Verifying tree size monotonicity (the tree size may grow but must never shrink)
- Cross-checking primary logs against mirrors
- Detecting stale or unresponsive logs

Divergence can indicate:

- Split-view attacks (a malicious log server shows different trees to different clients)
- Rollback attacks (an older, smaller tree is presented to hide recent log entries)
- Log compromise or signing key theft
- Network partitions or operational issues

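Every check above starts from a parsed checkpoint. Rekor checkpoints are signed notes whose body holds the log origin, the decimal tree size, and the base64-encoded root hash on its first three lines, optionally followed by extension lines. The following Python sketch parses only that body; it assumes the note signature has already been verified elsewhere, and `Checkpoint` and `parse_checkpoint_body` are hypothetical names, not the Attestor's actual types.

```python
import base64
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical parsed checkpoint body; the note signature is not kept."""

    origin: str
    tree_size: int
    root_hash: bytes
    extensions: tuple = ()

    @property
    def root_hash_hex(self) -> str:
        # The alert payloads in this runbook render root hashes as "sha256:<hex>".
        return "sha256:" + self.root_hash.hex()


def parse_checkpoint_body(note_text: str) -> Checkpoint:
    """Parse the body of a checkpoint note (origin, tree size, root hash).

    Everything after the first blank line (the signature lines) is ignored, so
    the caller must verify the note signature before trusting the result.
    """
    body = note_text.split("\n\n", 1)[0].rstrip("\n")
    lines = body.split("\n")
    if len(lines) < 3:
        raise ValueError("checkpoint body needs origin, tree size and root hash lines")
    origin, size_line, hash_line, *extra = lines
    if not origin:
        raise ValueError("empty origin line")
    if not (size_line.isascii() and size_line.isdigit()):
        raise ValueError(f"invalid tree size line: {size_line!r}")
    # validate=True rejects non-alphabet characters instead of silently dropping them.
    root_hash = base64.b64decode(hash_line, validate=True)
    if len(root_hash) != 32:
        raise ValueError("root hash must decode to 32 bytes (SHA-256)")
    return Checkpoint(origin, int(size_line), root_hash, tuple(extra))
```

Two parsed checkpoints from the same origin can then be compared directly: equal `tree_size` with different `root_hash` is the mismatch case, and a `tree_size` smaller than the stored one is a rollback.
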
## Detection Rules

| Check | Condition | Severity | Recommended Action |
|-------|-----------|----------|--------------------|
| Root hash mismatch | Same `tree_size`, different `root_hash` | CRITICAL | Quarantine + immediate investigation |
| Tree size rollback | `new_tree_size < stored_tree_size` | CRITICAL | Reject checkpoint + alert |
| Cross-log divergence | Primary root ≠ mirror root at the same tree size | WARNING | Alert + investigate |
| Stale checkpoint | Checkpoint age > threshold | WARNING | Alert + monitor |

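The four rules in the table can be expressed as one evaluation step per incoming checkpoint. This is a minimal Python sketch, not the detector's implementation: `Observation`, `Anomaly`, and `evaluate` are hypothetical, the `timestamp` field is assumed to be the checkpoint's issue time, and staleness is measured from the newest checkpoint that would actually be accepted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """Hypothetical record of one checkpoint as seen by the detector."""

    origin: str
    tree_size: int
    root_hash: str
    timestamp: datetime  # assumed: time the checkpoint was issued


@dataclass(frozen=True)
class Anomaly:
    check: str
    severity: str
    action: str


def evaluate(stored: Observation, new: Observation, now: datetime,
             stale_after: timedelta,
             mirror: Optional[Observation] = None) -> List[Anomaly]:
    """Apply the four detection rules from the table to one new checkpoint."""
    found = []
    # Root hash mismatch: same tree size, different root hash.
    if new.tree_size == stored.tree_size and new.root_hash != stored.root_hash:
        found.append(Anomaly("Root hash mismatch", "CRITICAL",
                             "Quarantine + immediate investigation"))
    # Tree size rollback: the log appears to have shrunk.
    rolled_back = new.tree_size < stored.tree_size
    if rolled_back:
        found.append(Anomaly("Tree size rollback", "CRITICAL",
                             "Reject checkpoint + alert"))
    # Cross-log divergence: the mirror reports another root at the same size.
    if (mirror is not None and mirror.tree_size == new.tree_size
            and mirror.root_hash != new.root_hash):
        found.append(Anomaly("Cross-log divergence", "WARNING",
                             "Alert + investigate"))
    # Stale checkpoint: a rejected (rolled-back) checkpoint does not refresh
    # the age, so measure from the stored one in that case.
    latest = stored if rolled_back else new
    if now - latest.timestamp > stale_after:
        found.append(Anomaly("Stale checkpoint", "WARNING", "Alert + monitor"))
    return found
```
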
## Alert Payloads

### Root Hash Mismatch Alert

```json
{
  "eventType": "rekor.checkpoint.divergence",
  "severity": "critical",
  "origin": "rekor.sigstore.dev",
  "treeSize": 12345678,
  "expectedRootHash": "sha256:abc123...",
  "actualRootHash": "sha256:def456...",
  "detectedAt": "2026-01-15T12:34:56Z",
  "backend": "sigstore-prod",
  "description": "Checkpoint root hash mismatch detected. Possible split-view attack.",
  "recommendedAction": "Quarantine"
}
```

### Rollback Attempt Alert

```json
{
  "eventType": "rekor.checkpoint.rollback",
  "severity": "critical",
  "origin": "rekor.sigstore.dev",
  "previousTreeSize": 12345678,
  "attemptedTreeSize": 12345600,
  "detectedAt": "2026-01-15T12:34:56Z",
  "description": "Tree size regression detected. Possible rollback attack."
}
```

### Cross-Log Divergence Alert

```json
{
  "eventType": "rekor.checkpoint.cross_log_divergence",
  "severity": "warning",
  "primaryOrigin": "rekor.sigstore.dev",
  "mirrorOrigin": "rekor.mirror.example.com",
  "treeSize": 12345678,
  "primaryRootHash": "sha256:abc123...",
  "mirrorRootHash": "sha256:def456...",
  "description": "Cross-log divergence detected between primary and mirror."
}
```

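When an alert arrives, the first triage step is mapping its `eventType` to the matching incident response procedure and confirming the payload carries the fields that procedure needs. The sketch assumes the payload shapes shown above are complete enough to validate against; `route_alert` and the two lookup tables are hypothetical. Stale checkpoints are left out because this runbook defines no alert payload for them.

```python
import json

# Procedure in this runbook that handles each alert eventType.
RUNBOOK_PROCEDURES = {
    "rekor.checkpoint.divergence": "Level 1: Root Hash Mismatch",
    "rekor.checkpoint.rollback": "Level 2: Tree Size Rollback",
    "rekor.checkpoint.cross_log_divergence": "Level 3: Cross-Log Divergence",
}

# Fields taken from the example payloads; treat them as the minimum an
# operator needs before starting the procedure.
REQUIRED_FIELDS = {
    "rekor.checkpoint.divergence": {"origin", "treeSize", "expectedRootHash", "actualRootHash"},
    "rekor.checkpoint.rollback": {"origin", "previousTreeSize", "attemptedTreeSize"},
    "rekor.checkpoint.cross_log_divergence": {
        "primaryOrigin", "mirrorOrigin", "treeSize", "primaryRootHash", "mirrorRootHash",
    },
}


def route_alert(raw: str) -> str:
    """Validate an alert payload and return the procedure that handles it."""
    alert = json.loads(raw)
    event = alert.get("eventType")
    if event not in RUNBOOK_PROCEDURES:
        raise ValueError(f"unknown eventType: {event!r}")
    missing = REQUIRED_FIELDS[event] - set(alert)
    if missing:
        raise ValueError(f"{event} alert is missing fields: {sorted(missing)}")
    # A rollback alert only makes sense if the attempted size is smaller.
    if event == "rekor.checkpoint.rollback" and not (
        alert["attemptedTreeSize"] < alert["previousTreeSize"]
    ):
        raise ValueError("rollback alert does not describe a tree size regression")
    return RUNBOOK_PROCEDURES[event]
```
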
## Metrics

```
# Counter: total checkpoint mismatches
attestor_rekor_checkpoint_mismatch_total{backend="sigstore-prod",origin="rekor.sigstore.dev"} 0

# Counter: rollback attempts detected
attestor_rekor_checkpoint_rollback_detected_total{backend="sigstore-prod"} 0

# Counter: cross-log divergences detected
attestor_rekor_cross_log_divergence_total{primary="rekor.sigstore.dev",mirror="rekor.mirror.example.com"} 0

# Gauge: seconds since last valid checkpoint
attestor_rekor_checkpoint_age_seconds{backend="sigstore-prod"} 120

# Counter: total anomalies detected (all types)
attestor_rekor_anomalies_detected_total{type="RootHashMismatch",severity="critical"} 0
```

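The daily check against these metrics can be scripted from a scrape of the exposition text. The Python sketch below uses a deliberately small line parser instead of a Prometheus client library, assumes label values contain no `}` character, and treats `max_age_seconds` as the configured stale threshold (3600 for the `1h` value in the configuration example). The function name `daily_findings` is hypothetical.

```python
import re

# One sample line of the Prometheus text format: name{labels} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)(?:\s+\S+)?$'
)
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

ANOMALY_COUNTERS = {
    "attestor_rekor_checkpoint_mismatch_total",
    "attestor_rekor_checkpoint_rollback_detected_total",
    "attestor_rekor_cross_log_divergence_total",
    "attestor_rekor_anomalies_detected_total",
}
AGE_GAUGE = "attestor_rekor_checkpoint_age_seconds"


def daily_findings(exposition: str, max_age_seconds: float) -> list:
    """Return (metric, labels, value) for every sample that needs attention."""
    findings = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE/comment lines
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        value = float(raw_value)
        # Counters are cumulative: any non-zero value means at least one
        # anomaly since the process started.
        if name in ANOMALY_COUNTERS and value > 0:
            findings.append((name, labels, value))
        elif name == AGE_GAUGE and value > max_age_seconds:
            findings.append((name, labels, value))
    return findings
```

Because the counters are cumulative, a non-zero value only shows that an anomaly happened at some point since the process started; comparing against the previous day's scrape (or using `increase()` in PromQL) isolates new increments.
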
## Incident Response Procedures

### Level 1: Root Hash Mismatch (CRITICAL)

**Symptoms:**

- `attestor_rekor_checkpoint_mismatch_total` increments
- Alert received: `rekor.checkpoint.divergence`

**Immediate Actions:**

1. **Quarantine all affected proofs** - Do not rely on any inclusion proofs from the affected log until the mismatch is resolved
2. **Suspend automated verifications** - Halt any automated systems that depend on the log
3. **Preserve evidence** - Capture both checkpoints (expected and actual) with full metadata
4. **Alert the security team** - This is a potential indicator of compromise

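Evidence preservation is easiest to defend later if the captured material is serialized deterministically and fingerprinted at capture time. The sketch below bundles both checkpoint notes with caller-supplied metadata, records a SHA-256 digest of the exact bytes written, and refuses to overwrite an existing file. The bundle layout and the names `build_evidence_bundle` and `write_evidence` are hypothetical, not a format the Attestor defines.

```python
import hashlib
import json
from datetime import datetime
from pathlib import Path


def build_evidence_bundle(expected_note: str, actual_note: str,
                          metadata: dict, captured_at: datetime) -> tuple:
    """Serialize both checkpoints and their metadata; return (bytes, sha256 hex)."""
    bundle = {
        "capturedAt": captured_at.isoformat(),
        "expectedCheckpoint": expected_note,  # checkpoint previously stored
        "actualCheckpoint": actual_note,      # checkpoint that triggered the alert
        "metadata": metadata,                 # e.g. origin, backend, alert payload
    }
    # sort_keys keeps the serialization, and therefore the digest, stable.
    data = json.dumps(bundle, sort_keys=True, indent=2).encode("utf-8")
    return data, hashlib.sha256(data).hexdigest()


def write_evidence(out_dir: Path, data: bytes, digest: str) -> Path:
    """Write the bundle under a digest-derived name; never overwrite."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"checkpoint-divergence-{digest[:16]}.json"
    with path.open("xb") as fh:  # mode "x" fails if the file already exists
        fh.write(data)
    return path
```

Record the returned digest in the incident ticket so the file can be re-verified later.
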
**Investigation Steps:**

1. Rule out local storage corruption:

   ```bash
   stella attestor checkpoint verify --origin rekor.sigstore.dev --tree-size 12345678
   ```

2. Cross-check with independent sources (other clients, mirrors)
3. Check whether Sigstore has published any incident reports
4. Review network logs for MITM indicators

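Step 2 is most useful when the results are grouped by the root hash each source reports for the same origin and tree size. The Python sketch below does that and applies a simple heuristic that lines up with the resolution paths: if every independent source agrees and only the stored copy differs, suspect local corruption; if independent sources disagree with each other, treat it as a possible split view. The source names (including the default `local-store`) and both functions are hypothetical, and the heuristic narrows the cause without proving it.

```python
from typing import Dict, List, Tuple


def cross_check(observations: Dict[str, str]) -> Tuple[bool, Dict[str, List[str]]]:
    """Group sources by the root hash they report for one origin and tree size."""
    groups: Dict[str, List[str]] = {}
    for source, root_hash in sorted(observations.items()):
        groups.setdefault(root_hash, []).append(source)
    return len(groups) <= 1, groups


def diagnose(observations: Dict[str, str], local: str = "local-store") -> str:
    """Heuristic reading of a cross-check; it narrows the cause, it proves nothing."""
    agree, _ = cross_check(observations)
    if agree:
        return "consistent"
    independent = {h for source, h in observations.items() if source != local}
    if len(independent) == 1 and observations.get(local) not in independent:
        # Every independent source agrees; only our stored copy differs.
        return "local-corruption-suspected"
    # Independent sources disagree with each other: different clients are
    # being shown different trees.
    return "split-view-suspected"
```
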
**Resolution:**

- If confirmed attack: Follow the security incident process
- If local corruption: Resync from a trusted source
- If upstream issue: Wait for Sigstore remediation and follow their guidance

### Level 2: Tree Size Rollback (CRITICAL)

**Symptoms:**

- `attestor_rekor_checkpoint_rollback_detected_total` increments
- Alert received: `rekor.checkpoint.rollback`

**Immediate Actions:**

1. **Reject the checkpoint** - Do not accept or store it
2. **Log full details** for forensic analysis
3. **Check the network path** - A rollback can indicate MITM interception or DNS hijacking

**Investigation Steps:**

1. Verify the current log state directly:

   ```bash
   curl -s https://rekor.sigstore.dev/api/v1/log | jq .treeSize
   ```

2. Compare it with the latest stored tree size
3. Check DNS resolution and the TLS certificate chain

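Steps 1 and 2 can be combined into one script. This sketch reads `treeSize` from the Rekor log info endpoint, the same field the `curl | jq` command above extracts, and compares it with the stored size. `fetch_tree_size` and `assess_rollback` are hypothetical helper names; the network call uses only the Python standard library, and the result is a reading from this vantage point, so a persistent MITM on the same path would affect it too.

```python
import json
import urllib.request

REKOR_LOG_INFO = "https://rekor.sigstore.dev/api/v1/log"


def fetch_tree_size(url: str = REKOR_LOG_INFO, timeout: float = 10.0) -> int:
    """Read treeSize from the Rekor log info endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return int(json.load(resp)["treeSize"])


def assess_rollback(stored_size: int, live_size: int) -> str:
    """Compare a direct reading of the log with the latest stored size."""
    if live_size >= stored_size:
        # The log, queried directly, is at or beyond our trusted size: the
        # rejected checkpoint did not reflect its current state.
        return "not-reproduced"
    # The direct query also reports a smaller tree than we already trusted.
    return "reproduced"
```

For example, `assess_rollback(12345678, fetch_tree_size())` returning `"reproduced"` points toward the persistent case in the resolution list, while `"not-reproduced"` points toward a network-path problem or a temporary glitch.
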
**Resolution:**

- If network attack: Remediate the network path and rotate credentials
- If temporary glitch: Monitor for repetition
- If persistent: Escalate to the upstream provider

### Level 3: Cross-Log Divergence (WARNING)

**Symptoms:**

- `attestor_rekor_cross_log_divergence_total` increments
- Alert received: `rekor.checkpoint.cross_log_divergence`

**Immediate Actions:**

1. **Do not panic** - Mirrors may legitimately lag behind the primary
2. **Check mirror sync status** - The mirror may still be catching up

**Investigation Steps:**

1. Compare tree sizes:

   ```bash
   stella attestor checkpoint list --origins rekor.sigstore.dev,rekor.mirror.example.com
   ```

2. If the tree sizes match but the roots differ: Escalate to CRITICAL
3. If the tree sizes differ: Allow time for the mirror to sync
4. If the divergence persists: Investigate the mirror and its operator

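The decision rules in steps 2 to 4 can be written as one triage function. In this sketch `Head`, `triage_cross_log`, and the 30-minute `sync_grace` default are assumptions for illustration; the runbook does not define how long a mirror may lag before the divergence counts as persistent.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Head:
    """Hypothetical latest (tree_size, root_hash) reported by one log."""

    tree_size: int
    root_hash: str


def triage_cross_log(primary: Head, mirror: Head, divergent_for: timedelta,
                     sync_grace: timedelta = timedelta(minutes=30)) -> str:
    """Apply investigation steps 2-4 to one primary/mirror comparison."""
    if primary.tree_size == mirror.tree_size:
        # Same size: roots must match, otherwise this is a real divergence.
        if primary.root_hash != mirror.root_hash:
            return "escalate-critical"
        return "consistent"
    # Different sizes: give the mirror time to catch up before acting.
    if divergent_for <= sync_grace:
        return "allow-sync"
    return "investigate-mirror"
```
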
**Resolution:**

- Sync lag: Monitor until the mirror catches up
- Persistent divergence: Disable the mirror, investigate, or remove it from the trust list

### Level 4: Stale Checkpoint (WARNING)

**Symptoms:**

- `attestor_rekor_checkpoint_age_seconds` exceeds the configured threshold
- Log health status: DEGRADED or UNHEALTHY

**Immediate Actions:**

1. Check the log service status
2. Verify network connectivity to the log

**Investigation Steps:**

1. Check the Sigstore status page
2. Test direct API access with a plain GET, reporting the HTTP status and response time:

   ```bash
   curl -sS -o /dev/null -w '%{http_code} %{time_total}s\n' https://rekor.sigstore.dev/api/v1/log
   ```

3. Review recent checkpoint fetch attempts

**Resolution:**

- Upstream outage: Wait and rely on cached data
- Local network issue: Restore connectivity
- Persistent: Consider failing over to a mirror

## Configuration

### Detector Options

```yaml
attestor:
  divergenceDetection:
    # Enable checkpoint monitoring
    enabled: true

    # Threshold for "stale checkpoint" warning
    staleCheckpointThreshold: 1h

    # Threshold for "stale tree size" (no growth)
    staleTreeSizeThreshold: 2h

    # Log health thresholds
    degradedCheckpointAgeThreshold: 30m
    unhealthyCheckpointAgeThreshold: 2h

    # Enable cross-log consistency checks
    enableCrossLogChecks: true

    # Mirror origins to check against primary
    mirrorOrigins:
      - rekor.mirror.example.com
      - rekor.mirror2.example.com
```

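The two log health thresholds translate a checkpoint age into the DEGRADED and UNHEALTHY states referenced in Level 4. This sketch assumes the duration syntax is the compact `30m`/`1h` form used in the example, that an age strictly above a threshold crosses it, and that the state below both thresholds is called `HEALTHY`; `parse_duration` and `log_health` are hypothetical.

```python
import re
from datetime import timedelta

_PART = re.compile(r"(\d+)([smhd])")
_UNIT = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse compact durations such as "30m", "2h" or "1h30m" (assumed syntax)."""
    parts = _PART.findall(text)
    # Reject anything the pattern did not fully consume, e.g. "1x" or "1 h".
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum((timedelta(**{_UNIT[u]: int(n)}) for n, u in parts), timedelta())


def log_health(checkpoint_age: timedelta,
               degraded_after: str = "30m",
               unhealthy_after: str = "2h") -> str:
    """Map a checkpoint age onto the health states used in this runbook."""
    if checkpoint_age > parse_duration(unhealthy_after):
        return "UNHEALTHY"
    if checkpoint_age > parse_duration(degraded_after):
        return "DEGRADED"
    return "HEALTHY"
```
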
### Alert Options

```yaml
attestor:
  alerts:
    # Enable alert publishing to Notify service
    enabled: true

    # Default tenant for system alerts
    defaultTenant: system

    # Severity thresholds for alerting
    alertOnHighSeverity: true
    alertOnWarning: true
    alertOnInfo: false

    # Alert stream name
    stream: attestor.alerts
```

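The three `alertOn*` flags act as a per-severity gate in front of the alert stream. The sketch below assumes `critical` alerts are governed by `alertOnHighSeverity` (the configuration does not name the mapping explicitly), that unknown severities are dropped, and that `enabled: false` suppresses everything; `AlertOptions` and `should_publish` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertOptions:
    """Python mirror of the attestor.alerts keys (hypothetical)."""

    enabled: bool = True
    default_tenant: str = "system"
    alert_on_high_severity: bool = True
    alert_on_warning: bool = True
    alert_on_info: bool = False
    stream: str = "attestor.alerts"


def should_publish(severity: str, options: AlertOptions) -> bool:
    """Decide whether an alert of the given severity is published."""
    if not options.enabled:
        return False
    # Assumption: "critical" alerts are gated by alertOnHighSeverity.
    gates = {
        "critical": options.alert_on_high_severity,
        "warning": options.alert_on_warning,
        "info": options.alert_on_info,
    }
    # Severities outside the three known levels are not published.
    return gates.get(severity.lower(), False)
```
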
## Runbook Checklist

### Daily Operations

- [ ] Verify `attestor_rekor_checkpoint_age_seconds` < threshold
- [ ] Check for any anomaly counter increments
- [ ] Review divergence detector logs for warnings

### Weekly Review

- [ ] Audit checkpoint storage integrity
- [ ] Verify mirror sync status
- [ ] Review and tune alerting thresholds

### Post-Incident

- [ ] Document the root cause
- [ ] Update detection rules if needed
- [ ] Review and improve response procedures
- [ ] Share learnings with the team

## See Also

- [Rekor Verification Design](../modules/attestor/rekor-verification-design.md)
- [Attestor Architecture](../modules/attestor/architecture.md)
- [Sigstore Rekor Documentation](https://docs.sigstore.dev/rekor/overview/)
- [Certificate Transparency RFC 6962](https://www.rfc-editor.org/rfc/rfc6962)