synergy moats product advisory implementations
docs/operations/runbooks/scanner-registry-auth.md
Normal file
@@ -0,0 +1,195 @@
# Runbook: Scanner - Registry Authentication Failures

> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
> **Task:** RUN-002 - Scanner Runbooks

## Metadata

| Field | Value |
|-------|-------|
| **Component** | Scanner |
| **Severity** | High |
| **On-call scope** | Platform team, Security team |
| **Last updated** | 2026-01-17 |
| **Doctor check** | `check.scanner.registry-auth` |

---

## Symptoms

- [ ] Scans failing with "401 Unauthorized" or "403 Forbidden"
- [ ] Alert `ScannerRegistryAuthFailed` firing
- [ ] Error: "failed to authenticate with registry"
- [ ] Error: "failed to pull image manifest"
- [ ] Scans work for public images but fail for private images

---

## Impact

| Impact Type | Description |
|-------------|-------------|
| **User-facing** | Cannot scan private images; release pipeline blocked |
| **Data integrity** | No data loss; authentication issue only |
| **SLA impact** | All scans for affected registry blocked |

---

## Diagnosis

### Quick checks

1. **Check Doctor diagnostics:**
   ```bash
   stella doctor --check check.scanner.registry-auth
   ```

2. **List configured registries:**
   ```bash
   stella registry list --show-status
   ```
   Look for: Registries with "auth_failed" status

3. **Test registry authentication:**
   ```bash
   stella registry test <registry-url>
   ```
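
The three quick checks can be chained into one helper that keeps going after a failure and reports how many steps failed. This is a minimal sketch: it uses only the commands shown above and assumes (this runbook does not confirm it) that each `stella` command exits non-zero when its check fails. The function name `quick_checks` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: run the three quick checks in order and summarize failures.
# Assumption: each stella command exits non-zero when its check fails.

quick_checks() {
  local registry_url="$1"
  local failed=0

  if ! stella doctor --check check.scanner.registry-auth; then
    echo "FAILED: doctor check" >&2
    failed=$((failed + 1))
  fi

  if ! stella registry list --show-status; then
    echo "FAILED: registry list" >&2
    failed=$((failed + 1))
  fi

  if ! stella registry test "$registry_url"; then
    echo "FAILED: registry test for ${registry_url}" >&2
    failed=$((failed + 1))
  fi

  echo "quick checks failed: ${failed}/3"
  # Return success only when every check passed.
  [ "$failed" -eq 0 ]
}

# Usage (hypothetical registry URL):
#   quick_checks registry.example.com
```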

### Deep diagnosis

1. **Check credential expiration:**
   ```bash
   stella registry credentials show <registry-name>
   ```
   Look for: Expiration date, token type

2. **Test with verbose output:**
   ```bash
   stella registry test <registry-url> --verbose
   ```
   Look for: Specific auth error message, HTTP status code

3. **Check scanner logs for registry auth errors:**
   ```bash
   stella scanner logs --filter "registry auth" --last 30m
   ```

4. **Verify IAM/OIDC configuration (for cloud registries):**
   ```bash
   stella registry iam-status <registry-name>
   ```
   Problem if: IAM role not assumable, OIDC token expired
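
When the verbose test reports an HTTP status code, the code itself narrows the cause. The helper below is a sketch based on general HTTP semantics (401: the request was not authenticated; 403: authenticated but not permitted; 429: rate limited), not on Stella-specific behavior. The function name `explain_registry_status` and its messages are hypothetical.

```bash
# Sketch: map an HTTP status code from a registry to a likely cause.
# The wording of each hint is illustrative, not output of the stella CLI.
explain_registry_status() {
  case "$1" in
    401) echo "credentials missing, invalid, or expired" ;;
    403) echo "credentials accepted but lack permission for this repository" ;;
    429) echo "rate limited by the registry" ;;
    2??) echo "authentication succeeded" ;;
    *)   echo "not an auth status; check network, TLS, or registry health" ;;
  esac
}

# Usage:
#   explain_registry_status 403
```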

---

## Resolution

### Immediate mitigation

1. **Refresh credentials (for token-based auth):**
   ```bash
   stella registry refresh-credentials <registry-name>
   ```

2. **Update static credentials:**
   ```bash
   stella registry update-credentials <registry-name> \
     --username <user> \
     --password <token>
   ```

3. **For Docker Hub rate limiting, configure authenticated pulls:**
   ```bash
   stella registry configure docker-hub \
     --username <user> \
     --access-token <token>
   ```

### Root cause fix

**If credentials expired:**

1. Generate a new access token in the registry (ECR, GCR, ACR, etc.)

2. Update credentials:
   ```bash
   stella registry update-credentials <registry-name> --from-env
   ```

3. Configure automatic token refresh:
   ```bash
   stella registry config set <registry-name>.auto_refresh true
   stella registry config set <registry-name>.refresh_interval 11h
   ```
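
The 11h interval presumably leaves a safety margin before the token expires; AWS ECR authorization tokens, for example, are valid for 12 hours. The arithmetic behind refreshing early can be sketched as follows. The function name `needs_refresh` and its epoch-second arguments are assumptions for illustration and are not part of the `stella` CLI.

```bash
# Sketch: decide whether a token should be refreshed now.
# Args: expiry (epoch seconds), now (epoch seconds), margin (seconds).
# Returns 0 (true) when at most `margin` seconds of lifetime remain.
needs_refresh() {
  local expiry="$1" now="$2" margin="$3"
  [ $((expiry - now)) -le "$margin" ]
}

# Example: a 12h token issued at t=0 with a 1h margin is due at t=11h.
issued=0
expiry=$((issued + 12 * 3600))
if needs_refresh "$expiry" $((issued + 11 * 3600)) 3600; then
  echo "refresh now"
fi
```

Refreshing on a fixed interval shorter than the token lifetime gives the same effect without tracking expiry times, which appears to be the design behind `refresh_interval`.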

**If IAM role/policy changed (AWS ECR):**

1. Verify IAM role permissions:
   ```bash
   stella registry iam verify <registry-name>
   ```

2. Update IAM role ARN if changed:
   ```bash
   stella registry configure ecr \
     --region <region> \
     --role-arn <arn>
   ```

**If OIDC federation changed (GCP Artifact Registry):**

1. Verify service account:
   ```bash
   stella registry oidc verify <registry-name>
   ```

2. Update workload identity configuration:
   ```bash
   stella registry configure gcr \
     --project <project> \
     --workload-identity-provider <provider>
   ```

**If certificate changed (self-hosted registries):**

1. Update CA certificate:
   ```bash
   stella registry configure <registry-name> \
     --ca-cert /path/to/ca.crt
   ```

2. Or skip TLS verification (not recommended for production):
   ```bash
   stella registry configure <registry-name> \
     --insecure-skip-verify
   ```

### Verification

```bash
# Test authentication
stella registry test <registry-url>

# Test scanning a private image
stella scan image --image <registry-url>/<image>:<tag> --dry-run

# Verify no auth failures in recent logs
stella scanner logs --filter "auth" --level error --last 30m
```
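
The last verification step can be turned into a pass/fail check by counting the lines the log query returns. This is a minimal sketch, assuming (unconfirmed by this runbook) that `stella scanner logs` prints one line per matching entry and prints nothing when there are no matches. `assert_no_auth_errors` is a hypothetical helper name.

```bash
# Sketch: read log lines on stdin and succeed only if none are non-empty.
# Assumption: one log entry per line, no output when nothing matches.
assert_no_auth_errors() {
  local count
  # grep -c prints the number of non-empty lines; it exits 1 when the
  # count is 0, so "|| true" keeps that case from being treated as an error.
  count="$(grep -c . || true)"
  if [ "$count" -gt 0 ]; then
    echo "found ${count} auth error line(s)" >&2
    return 1
  fi
  echo "no auth errors"
}

# Usage:
#   stella scanner logs --filter "auth" --level error --last 30m | assert_no_auth_errors
```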

---

## Prevention

- [ ] **Credentials:** Use service accounts/workload identity instead of static tokens
- [ ] **Rotation:** Configure automatic token refresh before expiration
- [ ] **Monitoring:** Alert on authentication failure rate > 0
- [ ] **Documentation:** Document registry credential management procedures

---

## Related Resources

- **Architecture:** `docs/modules/scanner/registry-auth.md`
- **Related runbooks:** `scanner-worker-stuck.md`, `scanner-timeout.md`
- **Registry setup:** `docs/operations/registry-configuration.md`