Test fixes and some product advisory tune-ups

This commit is contained in:
master
2026-01-30 07:57:43 +02:00
parent 644887997c
commit 55744f6a39
345 changed files with 26290 additions and 2267 deletions

@@ -0,0 +1,214 @@
# Identity Watchlist Monitoring Runbook
This runbook covers operational procedures for the Stella Ops identity watchlist monitoring system.
## Service Overview
The identity watchlist monitor is a background service that:
1. Monitors new Attestor entries in real time (or via polling)
2. Matches signer identities against configured watchlist patterns
3. Emits alerts through the notification system
4. Applies deduplication to prevent alert storms
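
The four steps above can be sketched in a few lines. This is a toy model under stated assumptions, not the service's implementation: the class name `WatchlistMonitor`, the use of full-string regex matching, and the `(watchlist_id, identity)` dedup key are all hypothetical choices made for illustration.

```python
import re
from datetime import datetime, timedelta, timezone


class WatchlistMonitor:
    """Toy model of the match -> dedup -> alert flow (hypothetical, not the real service)."""

    def __init__(self, patterns, dedup_window=timedelta(minutes=60)):
        # patterns: {watchlist_id: regex string}; compiled once, like a pattern cache.
        self._patterns = {wid: re.compile(p) for wid, p in patterns.items()}
        self._dedup_window = dedup_window
        self._last_alert = {}  # (watchlist_id, identity) -> time of last emitted alert
        self.suppressed = 0

    def process(self, identity, now):
        """Return the watchlist IDs that should alert for this signer identity."""
        alerts = []
        for wid, rx in self._patterns.items():
            if not rx.fullmatch(identity):
                continue
            key = (wid, identity)
            last = self._last_alert.get(key)
            if last is not None and now - last < self._dedup_window:
                self.suppressed += 1  # same identity matched again inside the window
                continue
            self._last_alert[key] = now
            alerts.append(wid)
        return alerts


monitor = WatchlistMonitor({"wl-1": r".*@example\.com"})
t0 = datetime(2026, 1, 30, tzinfo=timezone.utc)
first = monitor.process("dev@example.com", t0)                           # alert emitted
repeat = monitor.process("dev@example.com", t0 + timedelta(minutes=30))  # suppressed
```

The second call falls inside the 60-minute window (the `DefaultDedupWindowMinutes` value shown in the configuration), so it increments the suppression counter instead of emitting another alert.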
### Configuration
```jsonc
// appsettings.json
{
  "Attestor": {
    "Watchlist": {
      "Enabled": true,
      "Mode": "ChangeFeed",            // or "Polling" for air-gap
      "PollingInterval": "00:00:05",   // 5 seconds
      "MaxEventsPerSecond": 100,
      "DefaultDedupWindowMinutes": 60,
      "RegexTimeoutMs": 100,
      "MaxWatchlistEntriesPerTenant": 1000,
      "PatternCacheSize": 1000,
      "InitialDelay": "00:00:10",
      "NotifyChannelName": "attestor_entries_inserted"
    }
  }
}
```
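
Before rolling out a configuration change, a quick sanity check can catch obvious mistakes. The sketch below is only an illustration: the helper names `parse_timespan` and `check_watchlist_settings` are hypothetical, the three checks are assumptions about what counts as invalid, and the input must be comment-free JSON because Python's `json` module rejects comments.

```python
import json
from datetime import timedelta


def parse_timespan(value: str) -> timedelta:
    """Parse a plain 'hh:mm:ss' string (no day part, no fractional seconds)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def check_watchlist_settings(raw_json: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious was found."""
    settings = json.loads(raw_json)["Attestor"]["Watchlist"]
    problems = []
    # Only the two modes named in this runbook are accepted here (assumption).
    if settings["Mode"] not in ("ChangeFeed", "Polling"):
        problems.append(f"unknown Mode: {settings['Mode']!r}")
    if parse_timespan(settings["PollingInterval"]) <= timedelta(0):
        problems.append("PollingInterval must be positive")
    if settings["RegexTimeoutMs"] <= 0:
        problems.append("RegexTimeoutMs must be positive")
    return problems
```

An empty result does not prove the configuration is valid for the service; it only rules out the specific mistakes checked here.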
## Alert Triage Procedures
### Critical Severity Alert
**Response Time**: Immediate (< 15 minutes)
1. **Acknowledge** the alert in your incident management system
2. **Verify** the matched identity in Rekor:
```bash
rekor-cli get --uuid <rekor-uuid>
```
3. **Determine impact**:
- What artifact was signed?
- Is this a known/expected signer?
- What systems consume this artifact?
4. **Escalate** if malicious activity is confirmed
5. **Document** findings in incident record
### Warning Severity Alert
**Response Time**: Within 1 hour
1. **Review** the alert details
2. **Check context**:
- Is this a new legitimate workflow?
- Is the pattern too broad?
3. **Adjust** watchlist entry if needed:
```bash
stella watchlist update <id> --severity info
# or
stella watchlist update <id> --enabled false
```
4. **Document** decision rationale
### Info Severity Alert
**Response Time**: Next business day
1. **Review** for patterns or trends
2. **Consider** if alert should be disabled or tuned
3. **Archive** after review
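
The response times above can be turned into a concrete deadline per alert, for example to annotate tickets. This is a minimal sketch with assumptions labeled: the function name `response_deadline` is hypothetical, "next business day" is interpreted as the end of the next weekday, and public holidays are ignored.

```python
from datetime import datetime, timedelta


def response_deadline(severity: str, raised_at: datetime) -> datetime:
    """Map an alert severity to the latest acceptable response time."""
    severity = severity.lower()
    if severity == "critical":
        return raised_at + timedelta(minutes=15)
    if severity == "warning":
        return raised_at + timedelta(hours=1)
    if severity == "info":
        # Assumption: end of the next weekday; holidays are not considered.
        day = raised_at + timedelta(days=1)
        while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            day += timedelta(days=1)
        return day.replace(hour=23, minute=59, second=59, microsecond=0)
    raise ValueError(f"unknown severity: {severity!r}")
```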
## Performance Tuning
### High Scan Latency
**Symptom**: `attestor.watchlist.scan_latency_seconds` exceeds 0.010 (10 ms)
**Investigation**:
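1. Count enabled watchlist entries and compare the result with `PatternCacheSize` (more enabled entries than cache slots may indicate cache misses):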
```sql
SELECT COUNT(*) FROM attestor.identity_watchlist WHERE enabled = true;
```
2. Review regex patterns for complexity
3. Check tenant watchlist count
**Resolution**:
- Increase `PatternCacheSize` if cache misses are high
- Simplify complex regex patterns
- Consider splitting overly broad patterns
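
To find which regex patterns deserve simplification, one option is to time each pattern against representative signer identities offline. The sketch below is an approximation only: `slow_patterns` is a hypothetical helper, Python's `re` engine differs from the engine the service uses, and the default budget of 100 ms simply mirrors `RegexTimeoutMs`. It measures time but does not enforce a timeout, so keep sample strings short when testing suspicious patterns.

```python
import re
import time


def slow_patterns(patterns, samples, budget_ms=100.0):
    """Return (pattern, worst_ms) pairs whose slowest search exceeded budget_ms."""
    flagged = []
    for pattern in patterns:
        compiled = re.compile(pattern)
        worst_ms = 0.0
        for sample in samples:
            start = time.perf_counter()
            compiled.search(sample)
            elapsed_ms = (time.perf_counter() - start) * 1000
            worst_ms = max(worst_ms, elapsed_ms)
        if worst_ms > budget_ms:
            flagged.append((pattern, worst_ms))
    return flagged
```

Patterns with nested quantifiers such as `(a+)+$` are typical candidates to check first, since they can backtrack heavily on near-miss input.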
### High Alert Volume
**Symptom**: `attestor.watchlist.alerts_emitted_total` growing rapidly
**Investigation**:
1. Identify top-triggering entries:
```bash
stella watchlist alerts --since 1h --format json | jq 'group_by(.watchlistEntryId) | map({id: .[0].watchlistEntryId, count: length}) | sort_by(-.count)'
```
2. Check if pattern is too broad
**Resolution**:
- Narrow pattern scope
- Increase dedup window
- Reduce severity if appropriate
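
On hosts where `jq` is not installed, the same grouping can be done with a short script. This sketch assumes the alert export is a JSON array whose objects carry a `watchlistEntryId` field, as the `jq` filter above implies; the function name `top_triggering` is hypothetical.

```python
import json
from collections import Counter


def top_triggering(alerts_json: str, limit: int = 10):
    """Return (watchlistEntryId, count) pairs, most frequent first."""
    alerts = json.loads(alerts_json)
    counts = Counter(alert["watchlistEntryId"] for alert in alerts)
    return counts.most_common(limit)
```

Feed it the output of the alert export command, for example by reading the saved JSON file and passing its contents as a string.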
### Database Performance
**Symptom**: Slow list/match queries
**Investigation**:
```sql
EXPLAIN ANALYZE
SELECT * FROM attestor.identity_watchlist
WHERE enabled = true AND (tenant_id = 'tenant-1' OR scope IN ('Global', 'System'));
```
**Resolution**:
- Verify indexes exist:
```sql
SELECT indexname FROM pg_indexes WHERE schemaname = 'attestor' AND tablename = 'identity_watchlist';
```
- Run `VACUUM ANALYZE attestor.identity_watchlist` if table statistics are stale
- Consider partitioning for large deployments
## Deduplication Table Maintenance
### Cleanup Expired Records
Run periodically (daily recommended):
```sql
DELETE FROM attestor.identity_alert_dedup
WHERE last_alert_at < NOW() - INTERVAL '7 days';
```
### Check Dedup Effectiveness
```sql
SELECT
    watchlist_id,
    COUNT(*) AS suppressed_identities,
    SUM(alert_count) AS total_suppressions
FROM attestor.identity_alert_dedup
GROUP BY watchlist_id
ORDER BY total_suppressions DESC
LIMIT 10;
```
## Air-Gap Operation
For air-gapped environments where PostgreSQL LISTEN/NOTIFY is not available to the service:
1. Set `Mode` to `Polling` in configuration
2. Adjust `PollingInterval` based on acceptable delay (default: 5s)
3. Ensure sufficient database connection pool size
4. Monitor for missed entries during polling gaps
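
One way to reduce the risk of missed entries in polling mode is to re-read a small overlap before the last checkpoint on every poll and let deduplication absorb the repeats. The sketch below illustrates that idea only; `poll_once`, the `integrated_time` key, and the one-second overlap are hypothetical and do not describe how the service itself polls.

```python
from datetime import datetime, timedelta, timezone


def poll_once(fetch_since, handle, checkpoint, overlap=timedelta(seconds=1)):
    """Process entries at or after (checkpoint - overlap); return the new checkpoint.

    fetch_since(ts) must return entries whose integrated_time is >= ts.
    Entries inside the overlap are handled again, so handle() must tolerate
    duplicates (for example through the alert dedup window).
    """
    entries = sorted(fetch_since(checkpoint - overlap), key=lambda e: e["integrated_time"])
    for entry in entries:
        handle(entry)
        checkpoint = max(checkpoint, entry["integrated_time"])
    return checkpoint
```

A larger overlap tolerates later-arriving rows at the cost of more repeated work per poll.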
## Disaster Recovery
### Service Restart
1. Entries are processed based on `IntegratedTimeUtc`
2. On restart, the service resumes from last checkpoint
3. Some duplicate alerts may occur during recovery (handled by dedup)
### Database Failover
1. The service retries database connections automatically
2. The in-memory pattern cache remains available during brief outages
3. Long outages may require a service restart
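
Automatic connection retry is commonly implemented with exponential backoff. The following is a generic sketch of that technique, not the service's actual retry policy; the function name `connect_with_retry` and the attempt count and delay values are illustrative assumptions.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # retries exhausted: surface the failure to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Capping the delay keeps recovery time bounded once the database returns, and the final re-raise is the point where an operator restart becomes the fallback.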
### Watchlist Export/Import
Export:
```bash
stella watchlist list --include-global --format json > watchlist-backup.json
```
Import (manual):
```bash
# Process each entry and recreate it
jq -c '.[]' watchlist-backup.json | while IFS= read -r entry; do
  # Extract fields and call `stella watchlist add`; the placeholder below only prints the entry
  printf '%s\n' "$entry"
done
```
## Metrics Reference
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `attestor.watchlist.entries_scanned_total` | Processing volume | N/A (informational) |
| `attestor.watchlist.matches_total` | Match frequency | > 100/min (review patterns) |
| `attestor.watchlist.alerts_emitted_total` | Alert volume | > 50/min (check notification capacity) |
| `attestor.watchlist.alerts_suppressed_total` | Dedup effectiveness | N/A (a high suppressed-to-emitted ratio means dedup is working) |
| `attestor.watchlist.scan_latency_seconds` | Performance | p99 > 50ms (tune cache/patterns) |
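
The thresholds in the table can be evaluated together in one pass. The sketch below assumes the per-minute rates and the p99 latency have already been computed by your metrics backend; the function name `evaluate_metrics` and its parameter names are hypothetical.

```python
def evaluate_metrics(matches_per_min, alerts_per_min, p99_latency_s,
                     suppressed_total, emitted_total):
    """Return (findings, suppression_ratio) using the thresholds from the table."""
    findings = []
    if matches_per_min > 100:
        findings.append("matches_total above 100/min: review patterns")
    if alerts_per_min > 50:
        findings.append("alerts_emitted_total above 50/min: check notification capacity")
    if p99_latency_s > 0.050:  # 50 ms expressed in seconds
        findings.append("scan_latency_seconds p99 above 50 ms: tune cache/patterns")
    total = suppressed_total + emitted_total
    # Share of matched alerts that dedup suppressed; 0.0 when there is no data yet.
    ratio = suppressed_total / total if total else 0.0
    return findings, ratio
```

The suppression ratio is informational: a value close to 1.0 indicates that most repeated matches are absorbed by the dedup window.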
## Escalation Contacts
| Severity | Contact | Response SLA |
|----------|---------|--------------|
| Critical | On-call Security | 15 minutes |
| Warning | Security Team | 1 hour |
| Info | Security Analyst | Next business day |
## Related Documents
- [Identity Watchlist User Guide](../modules/attestor/guides/identity-watchlist.md)
- [Attestor Architecture](../modules/attestor/architecture.md)
- [Notification System](../modules/notify/architecture.md)