Identity Watchlist Monitoring Runbook
This runbook covers operational procedures for the Stella Ops identity watchlist monitoring system.
Service Overview
The identity watchlist monitor is a background service that:
- Monitors new Attestor entries in real time (or via polling)
- Matches signer identities against configured watchlist patterns
- Emits alerts through the notification system
- Applies deduplication to prevent alert storms
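The match-then-deduplicate flow described above can be sketched as follows. This is a minimal illustration, not the service's implementation: it assumes glob-style patterns, an in-memory dedup map, and hypothetical names (`match_entries`, `should_alert`, `process`) that do not appear in the product.

```python
import fnmatch
from datetime import datetime, timedelta, timezone

# Hypothetical in-memory stand-ins; the real service's types are not shown in this runbook.
DEDUP_WINDOW = timedelta(minutes=60)  # mirrors DefaultDedupWindowMinutes


def match_entries(identity, watchlist):
    """Return ids of watchlist entries whose glob pattern matches the identity."""
    return [e["id"] for e in watchlist if fnmatch.fnmatchcase(identity, e["pattern"])]


def should_alert(last_alert_at, key, now, window=DEDUP_WINDOW):
    """Allow at most one alert per (entry, identity) key within the window."""
    previous = last_alert_at.get(key)
    if previous is not None and now - previous < window:
        return False  # suppressed as a duplicate
    last_alert_at[key] = now
    return True


def process(identity, watchlist, last_alert_at, now):
    """Match one signer identity and return the entry ids that should alert."""
    return [
        entry_id
        for entry_id in match_entries(identity, watchlist)
        if should_alert(last_alert_at, (entry_id, identity), now)
    ]
```

Calling `process` twice for the same identity within the window returns the matching entry id only the first time; the second call is suppressed, which is the behaviour that prevents alert storms.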
Configuration
// appsettings.json
{
  "Attestor": {
    "Watchlist": {
      "Enabled": true,
      "Mode": "ChangeFeed",              // or "Polling" for air-gap
      "PollingInterval": "00:00:05",     // 5 seconds
      "MaxEventsPerSecond": 100,
      "DefaultDedupWindowMinutes": 60,
      "RegexTimeoutMs": 100,
      "MaxWatchlistEntriesPerTenant": 1000,
      "PatternCacheSize": 1000,
      "InitialDelay": "00:00:10",
      "NotifyChannelName": "attestor_entries_inserted"
    }
  }
}
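Before rolling out a configuration change, it can help to sanity-check the values. The sketch below is an assumption-laden illustration: the validation rules and the function names `parse_timespan` and `check_watchlist_settings` are hypothetical, and the parser handles only the plain `hh:mm:ss` form used above.

```python
from datetime import timedelta


def parse_timespan(value):
    """Parse an "hh:mm:ss" string such as "00:00:05" into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def check_watchlist_settings(settings):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if settings.get("Mode") not in ("ChangeFeed", "Polling"):
        problems.append("Mode must be ChangeFeed or Polling")
    if parse_timespan(settings["PollingInterval"]) <= timedelta(0):
        problems.append("PollingInterval must be positive")
    # Assumed rule: every numeric limit should be greater than zero.
    for key in (
        "MaxEventsPerSecond",
        "DefaultDedupWindowMinutes",
        "RegexTimeoutMs",
        "MaxWatchlistEntriesPerTenant",
        "PatternCacheSize",
    ):
        if settings.get(key, 0) <= 0:
            problems.append(f"{key} must be a positive number")
    return problems
```

Pass in the object found under `Attestor.Watchlist`; the function reports problems instead of raising so that all issues surface in one pass.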
Alert Triage Procedures
Critical Severity Alert
Response Time: Immediate (< 15 minutes)
- Acknowledge the alert in your incident management system
- Verify the matched identity in Rekor:
  rekor-cli get --uuid <rekor-uuid>
- Determine impact:
  - What artifact was signed?
  - Is this a known/expected signer?
  - What systems consume this artifact?
- Escalate if malicious activity is confirmed
- Document findings in incident record
Warning Severity Alert
Response Time: Within 1 hour
- Review the alert details
- Check context:
  - Is this a new legitimate workflow?
  - Is the pattern too broad?
- Adjust watchlist entry if needed:
  stella watchlist update <id> --severity info
  # or
  stella watchlist update <id> --enabled false
- Document decision rationale
Info Severity Alert
Response Time: Next business day
- Review for patterns or trends
- Consider if alert should be disabled or tuned
- Archive after review
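The response times for the three severities can be turned into concrete deadlines for an incident tracker. This is a rough sketch under stated assumptions: "next business day" is taken to mean the same clock time on the next Monday-to-Friday date, holidays are ignored, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Response windows taken from the triage procedures above.
RESPONSE_WINDOWS = {
    "critical": timedelta(minutes=15),
    "warning": timedelta(hours=1),
}


def next_business_day(moment):
    """Return the same clock time on the next Monday-to-Friday date."""
    candidate = moment + timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


def response_deadline(severity, raised_at):
    """Compute the latest acceptable response time for an alert."""
    severity = severity.lower()
    if severity == "info":
        return next_business_day(raised_at)
    return raised_at + RESPONSE_WINDOWS[severity]
```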
Performance Tuning
High Scan Latency
Symptom: attestor.watchlist.scan_latency_seconds > 10ms
Investigation:
- Check pattern cache hit rate, starting with how many enabled entries compete for the cache:
  SELECT COUNT(*) FROM attestor.identity_watchlist WHERE enabled = true;
- Review regex patterns for complexity
- Check tenant watchlist count
Resolution:
- Increase PatternCacheSize if cache misses are high
- Simplify complex regex patterns
- Consider splitting overly broad patterns
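To see why cache size matters for scan latency, here is a minimal sketch of a bounded compiled-pattern cache with a hit-rate readout. It uses Python's `functools.lru_cache` as a stand-in; the service's actual cache and eviction policy are not documented here, and `compiled` and `cache_hit_rate` are hypothetical names.

```python
import re
from functools import lru_cache

PATTERN_CACHE_SIZE = 1000  # mirrors PatternCacheSize


@lru_cache(maxsize=PATTERN_CACHE_SIZE)
def compiled(pattern):
    """Compile a regex once and reuse it while it stays in the LRU cache."""
    return re.compile(pattern)


def cache_hit_rate():
    """Fraction of lookups served from the cache; 0.0 before any lookup."""
    info = compiled.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0
```

When the number of distinct enabled patterns exceeds the cache size, entries are evicted and recompiled on the next use, so the hit rate falls and latency rises.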
High Alert Volume
Symptom: attestor.watchlist.alerts_emitted_total growing rapidly
Investigation:
- Identify top-triggering entries:
  stella watchlist alerts --since 1h --format json | jq 'group_by(.watchlistEntryId) | map({id: .[0].watchlistEntryId, count: length}) | sort_by(-.count)'
- Check if pattern is too broad
Resolution:
- Narrow pattern scope
- Increase dedup window
- Reduce severity if appropriate
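Where `jq` is unavailable, the same grouping can be done with a short script. The sketch assumes the alert export is a JSON array whose objects carry a `watchlistEntryId` field, as the `jq` filter in the investigation step implies; `top_triggering_entries` is a hypothetical helper name.

```python
import json
from collections import Counter


def top_triggering_entries(alerts_json, limit=10):
    """Count alerts per watchlist entry and return the busiest entries first."""
    alerts = json.loads(alerts_json)
    counts = Counter(alert["watchlistEntryId"] for alert in alerts)
    return counts.most_common(limit)
```

Feed it the text produced by the alert export command; each result is an `(entry id, alert count)` pair.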
Database Performance
Symptom: Slow list/match queries
Investigation:
EXPLAIN ANALYZE
SELECT * FROM attestor.identity_watchlist
WHERE enabled = true AND (tenant_id = 'tenant-1' OR scope IN ('Global', 'System'));
Resolution:
- Verify indexes exist:
  SELECT indexname FROM pg_indexes WHERE tablename = 'identity_watchlist';
- Run VACUUM ANALYZE if needed
- Consider partitioning for large deployments
Deduplication Table Maintenance
Cleanup Expired Records
Run periodically (daily recommended):
DELETE FROM attestor.identity_alert_dedup
WHERE last_alert_at < NOW() - INTERVAL '7 days';
Check Dedup Effectiveness
SELECT
    watchlist_id,
    COUNT(*) AS suppressed_identities,
    SUM(alert_count) AS total_suppressions
FROM attestor.identity_alert_dedup
GROUP BY watchlist_id
ORDER BY total_suppressions DESC
LIMIT 10;
Air-Gap Operation
For environments without network access to PostgreSQL LISTEN/NOTIFY:
- Set Mode: Polling in configuration
- Adjust PollingInterval based on acceptable delay (default: 5s)
- Ensure sufficient database connection pool size
- Monitor for missed entries during polling gaps
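One common way to reduce missed entries in polling mode is to re-read a small overlap behind the checkpoint and skip anything already processed. The following is an illustrative sketch of that idea, not the service's documented behaviour: `fetch_since` is a hypothetical callable returning rows newer than a timestamp, and the row fields `id` and `integrated_time` are assumed names.

```python
from datetime import datetime, timedelta, timezone


def poll_once(fetch_since, checkpoint, seen_ids, overlap=timedelta(seconds=5)):
    """Fetch rows newer than (checkpoint - overlap); return (fresh_rows, new_checkpoint)."""
    rows = fetch_since(checkpoint - overlap)
    # Rows inside the overlap may have been handled by an earlier poll.
    fresh = [row for row in rows if row["id"] not in seen_ids]
    for row in fresh:
        seen_ids.add(row["id"])
    newest = max((row["integrated_time"] for row in rows), default=checkpoint)
    # The checkpoint never moves backwards.
    return fresh, max(checkpoint, newest)
```

In a long-running process `seen_ids` would need pruning (for example, dropping ids older than the overlap); that is omitted here for brevity.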
Disaster Recovery
Service Restart
- Entries are processed based on IntegratedTimeUtc
- On restart, the service resumes from last checkpoint
- Some duplicate alerts may occur during recovery (handled by dedup)
Database Failover
- Service will retry connections automatically
- Pattern cache is held in memory and survives brief outages
- Long outages may require service restart
Watchlist Export/Import
Export:
stella watchlist list --include-global --format json > watchlist-backup.json
Import (manual):
# Process each entry and recreate
jq -c '.[]' watchlist-backup.json | while IFS= read -r entry; do
  # Extract fields from "$entry" and call stella watchlist add
  :
done
Metrics Reference
| Metric | Description | Alert Threshold |
|---|---|---|
| attestor.watchlist.entries_scanned_total | Processing volume | N/A (informational) |
| attestor.watchlist.matches_total | Match frequency | > 100/min (review patterns) |
| attestor.watchlist.alerts_emitted_total | Alert volume | > 50/min (check notification capacity) |
| attestor.watchlist.alerts_suppressed_total | Dedup effectiveness | High ratio indicates dedup is working |
| attestor.watchlist.scan_latency_seconds | Performance | p99 > 50ms (tune cache/patterns) |
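The numeric thresholds in this table can be checked mechanically. The sketch below assumes the caller has already derived per-minute rates for the counters and a p99 in seconds for the latency metric; how those values are obtained from your metrics backend is outside its scope, and `breached_thresholds` is a hypothetical helper.

```python
# Thresholds copied from the metrics table; rates are per minute, latency is p99 in seconds.
THRESHOLDS = {
    "attestor.watchlist.matches_total": (100.0, "review patterns"),
    "attestor.watchlist.alerts_emitted_total": (50.0, "check notification capacity"),
    "attestor.watchlist.scan_latency_seconds": (0.050, "tune cache/patterns"),
}


def breached_thresholds(readings):
    """Return (metric, suggested action) pairs for every reading above its threshold."""
    return [
        (name, action)
        for name, (limit, action) in THRESHOLDS.items()
        if readings.get(name, 0.0) > limit
    ]
```

Metrics missing from `readings` are treated as zero, so a partial scrape produces no false alarms.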
Escalation Contacts
| Severity | Contact | Response SLA |
|---|---|---|
| Critical | On-call Security | 15 minutes |
| Warning | Security Team | 1 hour |
| Info | Security Analyst | Next business day |