sprints completion. new product advisories prepared

This commit is contained in:
master
2026-01-16 16:30:03 +02:00
parent a927d924e3
commit 4ca3ce8fb4
255 changed files with 42434 additions and 1020 deletions

@@ -0,0 +1,262 @@
# Checkpoint Divergence Detection and Incident Response
This runbook covers the detection of Rekor checkpoint divergence, anomaly types, alert handling, and incident response procedures.
## Overview
Checkpoint divergence detection monitors the integrity of Rekor transparency logs by:
- Comparing root hashes at the same tree size
- Verifying tree size monotonicity (the tree size must never decrease)
- Cross-checking primary logs against mirrors
- Detecting stale or unresponsive logs
Divergence can indicate:
- Split-view attacks (malicious log server showing different trees to different clients)
- Rollback attacks (hiding recent log entries)
- Log compromise or key theft
- Network partitions or operational issues
## Detection Rules
| Check | Condition | Severity | Recommended Action |
|-------|-----------|----------|-------------------|
| Root hash mismatch | Same tree_size, different root_hash | CRITICAL | Quarantine + immediate investigation |
| Tree size rollback | new_tree_size < stored_tree_size | CRITICAL | Reject checkpoint + alert |
| Cross-log divergence | Primary root ≠ mirror root at same size | WARNING | Alert + investigate |
| Stale checkpoint | Checkpoint age > `staleCheckpointThreshold` | WARNING | Alert + monitor |
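
The single-log rules in the table can be sketched as one comparison function. This is a minimal illustration, not the detector's actual implementation: the `Checkpoint` type, the `classify` function, and the anomaly names other than `RootHashMismatch` are hypothetical, and checkpoint age is assumed to be measured from the time the checkpoint was fetched.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical in-memory view of a stored or freshly fetched checkpoint."""
    origin: str
    tree_size: int
    root_hash: str
    fetched_at: datetime


def classify(stored: Optional[Checkpoint], new: Checkpoint,
             now: datetime, stale_after: timedelta) -> Optional[str]:
    """Return an anomaly name, or None if the new checkpoint looks consistent."""
    if stored is not None:
        # Rule: tree size rollback (new_tree_size < stored_tree_size).
        if new.tree_size < stored.tree_size:
            return "TreeSizeRollback"
        # Rule: same tree_size, different root_hash.
        if new.tree_size == stored.tree_size and new.root_hash != stored.root_hash:
            return "RootHashMismatch"
    # Rule: checkpoint age exceeds the stale threshold.
    if now - new.fetched_at > stale_after:
        return "StaleCheckpoint"
    return None
```

The critical checks run before the staleness check so that a stale but also inconsistent checkpoint is reported at the higher severity.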
## Alert Payloads
### Root Hash Mismatch Alert
```json
{
  "eventType": "rekor.checkpoint.divergence",
  "severity": "critical",
  "origin": "rekor.sigstore.dev",
  "treeSize": 12345678,
  "expectedRootHash": "sha256:abc123...",
  "actualRootHash": "sha256:def456...",
  "detectedAt": "2026-01-15T12:34:56Z",
  "backend": "sigstore-prod",
  "description": "Checkpoint root hash mismatch detected. Possible split-view attack.",
  "recommendedAction": "Quarantine"
}
```
### Rollback Attempt Alert
```json
{
  "eventType": "rekor.checkpoint.rollback",
  "severity": "critical",
  "origin": "rekor.sigstore.dev",
  "previousTreeSize": 12345678,
  "attemptedTreeSize": 12345600,
  "detectedAt": "2026-01-15T12:34:56Z",
  "description": "Tree size regression detected. Possible rollback attack."
}
```
### Cross-Log Divergence Alert
```json
{
  "eventType": "rekor.checkpoint.cross_log_divergence",
  "severity": "warning",
  "primaryOrigin": "rekor.sigstore.dev",
  "mirrorOrigin": "rekor.mirror.example.com",
  "treeSize": 12345678,
  "primaryRootHash": "sha256:abc123...",
  "mirrorRootHash": "sha256:def456...",
  "description": "Cross-log divergence detected between primary and mirror."
}
```
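
A consumer of these alerts might map each `eventType` to the matching procedure in this runbook. The sketch below assumes the alert arrives as a JSON string shaped like the payloads above; `RUNBOOK_SECTIONS` and `route_alert` are hypothetical names used only for illustration.

```python
import json

# Event types taken from the payloads above, mapped to this runbook's sections.
RUNBOOK_SECTIONS = {
    "rekor.checkpoint.divergence": "Level 1: Root Hash Mismatch",
    "rekor.checkpoint.rollback": "Level 2: Tree Size Rollback",
    "rekor.checkpoint.cross_log_divergence": "Level 3: Cross-Log Divergence",
}


def route_alert(raw: str) -> str:
    """Return the runbook section for an alert, or a note if the type is unknown."""
    alert = json.loads(raw)
    event_type = alert.get("eventType", "")
    return RUNBOOK_SECTIONS.get(event_type, "unrecognized event type: " + event_type)
```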
## Metrics
```
# Counter: total checkpoint mismatches
attestor_rekor_checkpoint_mismatch_total{backend="sigstore-prod",origin="rekor.sigstore.dev"} 0
# Counter: rollback attempts detected
attestor_rekor_checkpoint_rollback_detected_total{backend="sigstore-prod"} 0
# Counter: cross-log divergences detected
attestor_rekor_cross_log_divergence_total{primary="rekor.sigstore.dev",mirror="rekor.mirror.example.com"} 0
# Gauge: seconds since last valid checkpoint
attestor_rekor_checkpoint_age_seconds{backend="sigstore-prod"} 120
# Counter: total anomalies detected (all types)
attestor_rekor_anomalies_detected_total{type="RootHashMismatch",severity="critical"} 0
```
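
For a quick manual check without a monitoring stack, scraped metrics text can be scanned for nonzero anomaly counters and an excessive checkpoint age. This is a simplified sketch: `scan_metrics` is a hypothetical helper, the regular expression handles only samples shaped like the ones above (no timestamps, no `}` inside label values), and it treats every metric ending in `_total` as an anomaly counter.

```python
import re
from typing import List

# Matches "name{labels} value" or "name value"; a deliberate simplification
# of the Prometheus text exposition format.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)$')


def scan_metrics(text: str, max_age_seconds: float) -> List[str]:
    """Return one human-readable finding per suspicious sample."""
    findings = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        name, labels, value = match.group(1), match.group(2) or "", float(match.group(3))
        if name.endswith("_total") and value > 0:
            findings.append(f"{name}{labels} = {value:g}")
        elif name == "attestor_rekor_checkpoint_age_seconds" and value > max_age_seconds:
            findings.append(f"{name}{labels} = {value:g}s exceeds {max_age_seconds:g}s")
    return findings
```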
## Incident Response Procedures
### Level 1: Root Hash Mismatch (CRITICAL)
**Symptoms:**
- `attestor_rekor_checkpoint_mismatch_total` increments
- Alert received: "rekor.checkpoint.divergence"
**Immediate Actions:**
1. **Quarantine all affected proofs** - Do not rely on any inclusion proofs from the affected log until resolved
2. **Suspend automated verifications** - Halt any automated systems that depend on the log
3. **Preserve evidence** - Capture both checkpoints (expected and actual) with full metadata
4. **Alert security team** - This is a potential compromise indicator
**Investigation Steps:**
1. Verify that the mismatch is not caused by local storage corruption:
   ```bash
   stella attestor checkpoint verify --origin rekor.sigstore.dev --tree-size 12345678
   ```
2. Cross-check with independent sources (other clients, mirrors)
3. Check if Sigstore has published any incident reports
4. Review network logs for MITM indicators
**Resolution:**
- If confirmed attack: Follow security incident process
- If local corruption: Resync from trusted source
- If upstream issue: Wait for Sigstore remediation, follow their guidance
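
For the "preserve evidence" action, one possible approach is to bundle the alert with both raw checkpoints and a digest of each, so later tampering with the captured files is detectable. The bundle layout and the `build_evidence_bundle` name are assumptions for illustration, not a format defined by the Attestor.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(text: str) -> str:
    """SHA-256 of the UTF-8 encoded text, in the sha256:<hex> style used by the alerts."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_evidence_bundle(alert: dict, expected_raw: str, actual_raw: str,
                          captured_at: datetime) -> str:
    """Serialize the alert and both raw checkpoints into one JSON document.

    captured_at must be timezone-aware; it is normalized to UTC.
    """
    bundle = {
        "capturedAt": captured_at.astimezone(timezone.utc).isoformat(),
        "alert": alert,
        "expected": {"raw": expected_raw, "digest": _digest(expected_raw)},
        "actual": {"raw": actual_raw, "digest": _digest(actual_raw)},
    }
    return json.dumps(bundle, indent=2, sort_keys=True)
```

Store the resulting document somewhere the affected host cannot modify, since the host itself may be part of the investigation.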
### Level 2: Tree Size Rollback (CRITICAL)
**Symptoms:**
- `attestor_rekor_checkpoint_rollback_detected_total` increments
- Alert received: "rekor.checkpoint.rollback"
**Immediate Actions:**
1. **Reject the checkpoint** - Do not accept or store it
2. **Log full details** for forensic analysis
3. **Check network path** - Could indicate MITM or DNS hijacking
**Investigation Steps:**
1. Verify current log state directly:
   ```bash
   curl -s https://rekor.sigstore.dev/api/v1/log | jq .treeSize
   ```
2. Compare with stored latest tree size
3. Check DNS resolution and TLS certificate chain
**Resolution:**
- If network attack: Remediate network path, rotate credentials
- If temporary glitch: Monitor for repetition
- If persistent: Escalate to upstream provider
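
The comparison in investigation steps 1 and 2 can be scripted. The sketch below assumes the log info response is a JSON object with a `treeSize` field, as the `jq .treeSize` command above implies; `check_live_tree_size` is a hypothetical helper, and the stored value must come from your own checkpoint store.

```python
import json


def check_live_tree_size(log_info_json: str, stored_tree_size: int) -> str:
    """Compare the tree size reported by the log with the locally stored one."""
    info = json.loads(log_info_json)
    live = int(info["treeSize"])
    if live < stored_tree_size:
        return f"rollback: live tree size {live} < stored {stored_tree_size}"
    return f"ok: live tree size {live} >= stored {stored_tree_size}"
```

If a direct query returns `ok` while the detector still reports rollbacks, the regressed checkpoint may have reached the detector over a different network path, which points toward the DNS and TLS checks in step 3.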
### Level 3: Cross-Log Divergence (WARNING)
**Symptoms:**
- `attestor_rekor_cross_log_divergence_total` increments
- Alert received: "rekor.checkpoint.cross_log_divergence"
**Immediate Actions:**
1. **Do not panic** - Mirrors may have legitimate lag
2. **Check mirror sync status** - May be catching up
**Investigation Steps:**
1. Compare tree sizes:
   ```bash
   stella attestor checkpoint list --origins rekor.sigstore.dev,rekor.mirror.example.com
   ```
2. If same tree size with different roots: Escalate to CRITICAL
3. If different tree sizes: Allow time for sync
4. If persistent: Investigate mirror operator
**Resolution:**
- Sync lag: Monitor until caught up
- Persistent divergence: Disable mirror, investigate, or remove from trust list
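
The triage decision in the investigation steps can be expressed as a small function. `LogHead`, `triage_cross_log`, and the returned labels are hypothetical names for illustration.

```python
from typing import NamedTuple


class LogHead(NamedTuple):
    """Latest known checkpoint values for one log origin."""
    origin: str
    tree_size: int
    root_hash: str


def triage_cross_log(primary: LogHead, mirror: LogHead) -> str:
    """Decide the next step for a primary/mirror pair."""
    if primary.tree_size == mirror.tree_size:
        if primary.root_hash != mirror.root_hash:
            # Same size, different roots: the two logs disagree about history.
            return "escalate-critical"
        return "consistent"
    # Different sizes: root hashes are not directly comparable, so allow time for sync.
    return "wait-for-sync"
```

Root hashes at different tree sizes are expected to differ, which is why the sketch only compares them at equal sizes. Checking two different sizes against each other requires a consistency proof, as described in RFC 6962.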
### Level 4: Stale Checkpoint (WARNING)
**Symptoms:**
- `attestor_rekor_checkpoint_age_seconds` exceeds threshold
- Log health status: DEGRADED or UNHEALTHY
**Immediate Actions:**
1. Check log service status
2. Verify network connectivity to log
**Investigation Steps:**
1. Check Sigstore status page
2. Test direct API access:
   ```bash
   curl -I https://rekor.sigstore.dev/api/v1/log
   ```
3. Review recent checkpoint fetch attempts
**Resolution:**
- Upstream outage: Wait, rely on cached data
- Local network issue: Restore connectivity
- Persistent: Consider failover to mirror
## Configuration
### Detector Options
```yaml
attestor:
  divergenceDetection:
    # Enable checkpoint monitoring
    enabled: true
    # Threshold for "stale checkpoint" warning
    staleCheckpointThreshold: 1h
    # Threshold for "stale tree size" (no growth)
    staleTreeSizeThreshold: 2h
    # Log health thresholds
    degradedCheckpointAgeThreshold: 30m
    unhealthyCheckpointAgeThreshold: 2h
    # Enable cross-log consistency checks
    enableCrossLogChecks: true
    # Mirror origins to check against primary
    mirrorOrigins:
      - rekor.mirror.example.com
      - rekor.mirror2.example.com
```
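
To show how the two health thresholds might interact, the sketch below maps a checkpoint age to a health status. It makes several assumptions: durations use a single number followed by `s`, `m`, or `h` as in the example values, thresholds are inclusive, and the healthy state is named `HEALTHY`. The functions `parse_duration` and `health_status` are hypothetical.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    """Parse durations such as '30m' or '2h' (assumed format)."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def health_status(age: timedelta, degraded_after: str = "30m",
                  unhealthy_after: str = "2h") -> str:
    """Classify log health from checkpoint age, checking the stricter state first."""
    if age >= parse_duration(unhealthy_after):
        return "UNHEALTHY"
    if age >= parse_duration(degraded_after):
        return "DEGRADED"
    return "HEALTHY"
```

With the example values, a checkpoint that is 45 minutes old would be `DEGRADED` even though it has not yet crossed the 1h `staleCheckpointThreshold`.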
### Alert Options
```yaml
attestor:
  alerts:
    # Enable alert publishing to Notify service
    enabled: true
    # Default tenant for system alerts
    defaultTenant: system
    # Severity thresholds for alerting
    alertOnHighSeverity: true
    alertOnWarning: true
    alertOnInfo: false
    # Alert stream name
    stream: attestor.alerts
```
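
One plausible reading of the severity flags is a per-severity publish filter. The mapping below, in particular treating `critical` as covered by `alertOnHighSeverity`, is an assumption rather than documented behavior, and `should_publish` is a hypothetical helper.

```python
# Assumed mapping from alert severity to the option that controls it.
_SEVERITY_FLAGS = {
    "critical": "alertOnHighSeverity",
    "high": "alertOnHighSeverity",
    "warning": "alertOnWarning",
    "info": "alertOnInfo",
}


def should_publish(severity: str, options: dict) -> bool:
    """Return True if an alert of this severity would be published."""
    if not options.get("enabled", False):
        return False
    flag = _SEVERITY_FLAGS.get(severity.lower())
    # Unknown severities are not published in this sketch.
    return flag is not None and bool(options.get(flag, False))
```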
## Runbook Checklist
### Daily Operations
- [ ] Verify `attestor_rekor_checkpoint_age_seconds` < threshold
- [ ] Check for any anomaly counter increments
- [ ] Review divergence detector logs for warnings
### Weekly Review
- [ ] Audit checkpoint storage integrity
- [ ] Verify mirror sync status
- [ ] Review and tune alerting thresholds
### Post-Incident
- [ ] Document root cause
- [ ] Update detection rules if needed
- [ ] Review and improve response procedures
- [ ] Share learnings with team
## See Also
- [Rekor Verification Design](../modules/attestor/rekor-verification-design.md)
- [Attestor Architecture](../modules/attestor/architecture.md)
- [Sigstore Rekor Documentation](https://docs.sigstore.dev/rekor/overview/)
- [Certificate Transparency RFC 6962](https://www.rfc-editor.org/rfc/rfc6962)