Runbook: Scanner - Registry Authentication Failures

Sprint: SPRINT_20260117_029_DOCS_runbook_coverage
Task: RUN-002 - Scanner Runbooks

Metadata

Field          Value
Component      Scanner
Severity       High
On-call scope  Platform team, Security team
Last updated   2026-01-17
Doctor check   check.scanner.registry-auth

Symptoms

  • Scans failing with "401 Unauthorized" or "403 Forbidden"
  • Alert ScannerRegistryAuthFailed firing
  • Error: "failed to authenticate with registry"
  • Error: "failed to pull image manifest"
  • Scans work for public images but fail for private images
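
The two status codes in the first symptom usually point in different directions: 401 means the registry could not authenticate the request at all, while 403 means the request was understood but the identity is not allowed to access the repository. A minimal sketch of that distinction follows; `classify_registry_status` is a hypothetical helper for illustration and is not part of the stella CLI.

```shell
# Map an HTTP status code from a failed registry request to its usual meaning.
# classify_registry_status is a hypothetical helper, not part of the stella CLI.
classify_registry_status() {
  case "${1:-}" in
    401) echo "unauthenticated: credentials missing, invalid, or expired" ;;
    403) echo "forbidden: identity lacks permission for this repository" ;;
    *)   echo "not an auth failure: HTTP ${1:-unknown}" ;;
  esac
}
```

A 401 typically leads to the credential checks under Diagnosis, and a 403 to the IAM role and policy checks.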

Impact

Impact Type     Description
User-facing     Cannot scan private images; release pipeline blocked
Data integrity  No data loss; authentication issue only
SLA impact      All scans for affected registry blocked

Diagnosis

Quick checks

  1. Check Doctor diagnostics:

    stella doctor --check check.scanner.registry-auth
    
  2. List configured registries:

    stella registry list --show-status
    

    Look for: Registries with "auth_failed" status

  3. Test registry authentication:

    stella registry test <registry-url>
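
The three quick checks can be wrapped in one function so that all of them run even when an early one fails. This is a sketch only: `quick_checks` is a hypothetical name, and it assumes (the runbook does not confirm this) that each stella command exits non-zero when it detects a problem.

```shell
# Run all three quick checks and report failure if any of them failed.
# Assumption: each stella command exits non-zero when it finds a problem.
quick_checks() {
  local registry_url="${1:-}" failed=0
  if [ -z "$registry_url" ]; then
    echo "usage: quick_checks <registry-url>" >&2
    return 2
  fi
  stella doctor --check check.scanner.registry-auth || failed=1
  stella registry list --show-status || failed=1
  stella registry test "$registry_url" || failed=1
  return "$failed"
}
```

Running every check instead of stopping at the first failure keeps the full picture in one terminal capture, which helps when handing the incident over.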
    

Deep diagnosis

  1. Check credential expiration:

    stella registry credentials show <registry-name>
    

    Look for: Expiration date, token type

  2. Test with verbose output:

    stella registry test <registry-url> --verbose
    

    Look for: Specific auth error message, HTTP status code

  3. Check scanner logs for registry authentication errors:

    stella scanner logs --filter "registry auth" --last 30m
    
  4. Verify IAM/OIDC configuration (for cloud registries):

    stella registry iam-status <registry-name>
    

    Problem if: IAM role not assumable, OIDC token expired
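
When the credential check reports an expiration date, it can help to turn it into time remaining. The sketch below assumes GNU date (for the `-d` flag) and an ISO 8601 timestamp copied by hand from the credentials output; the actual output format of `stella registry credentials show` is not specified here, and `seconds_until_expiry` is a hypothetical helper.

```shell
# Print the seconds from NOW until EXPIRY; zero or negative means expired.
# Requires GNU date. EXPIRY is an ISO 8601 timestamp supplied by the operator;
# NOW is an optional epoch value and defaults to the current time.
seconds_until_expiry() {
  local expiry="${1:-}" now_epoch="${2:-}" expiry_epoch
  [ -n "$expiry" ] || { echo "usage: seconds_until_expiry <iso-timestamp> [now-epoch]" >&2; return 2; }
  [ -n "$now_epoch" ] || now_epoch=$(date -u +%s)
  expiry_epoch=$(date -u -d "$expiry" +%s) || return 2
  echo $(( expiry_epoch - now_epoch ))
}
```

For example, `seconds_until_expiry 2026-01-18T09:00:00Z` prints a negative number once that time has passed.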


Resolution

Immediate mitigation

  1. Refresh credentials (for token-based auth):

    stella registry refresh-credentials <registry-name>
    
  2. Update static credentials:

    stella registry update-credentials <registry-name> \
      --username <user> \
      --password <token>
    
  3. For Docker Hub rate limiting, configure authenticated pulls (authenticated accounts get higher pull limits):

    stella registry configure docker-hub \
      --username <user> \
      --access-token <token>
    

Root cause fix

If credentials expired:

  1. Generate a new access token in the registry (ECR, GCR, ACR, etc.)

  2. Update credentials:

    stella registry update-credentials <registry-name> --from-env
    
  3. Configure automatic token refresh:

    stella registry config set <registry-name>.auto_refresh true
    stella registry config set <registry-name>.refresh_interval 11h
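
The refresh interval only helps if it is shorter than the token lifetime. The 11h value above fits a 12-hour lifetime, which is what Amazon ECR authorization tokens have; other registries may differ, so check the lifetime for yours. A small sketch of the comparison, with `refresh_interval_ok` as a hypothetical helper that takes whole hours:

```shell
# Succeed only if refreshing every INTERVAL hours leaves at least MARGIN hours
# before a token with the given LIFETIME expires.
# Arguments (whole hours): interval, lifetime, optional margin (default 1).
refresh_interval_ok() {
  local interval_h="$1" lifetime_h="$2" margin_h="${3:-1}"
  [ $(( lifetime_h - interval_h )) -ge "$margin_h" ]
}

# Example: an 11h refresh against a 12h token lifetime leaves a 1h margin.
refresh_interval_ok 11 12 && echo "interval ok"
```

A margin of at least one hour gives a failed refresh some room to be retried or noticed before scans start failing.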
    

If IAM role/policy changed (AWS ECR):

  1. Verify IAM role permissions:

    stella registry iam verify <registry-name>
    
  2. Update IAM role ARN if changed:

    stella registry configure ecr \
      --region <region> \
      --role-arn <arn>
    

If OIDC federation changed (GCP Artifact Registry):

  1. Verify service account:

    stella registry oidc verify <registry-name>
    
  2. Update workload identity configuration:

    stella registry configure gcr \
      --project <project> \
      --workload-identity-provider <provider>
    

If certificate changed (self-hosted registries):

  1. Update CA certificate:

    stella registry configure <registry-name> \
      --ca-cert /path/to/ca.crt
    
  2. Or skip TLS certificate verification (not recommended for production):

    stella registry configure <registry-name> \
      --insecure-skip-verify
    

Verification

# Test authentication
stella registry test <registry-url>

# Test scanning a private image
stella scan image --image <registry-url>/<image>:<tag> --dry-run

# Verify no auth failures in recent logs
stella scanner logs --filter "auth" --level error --last 30m
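
The first two verification commands can be chained so that a failure is reported clearly. This is a sketch under the assumption that stella exits non-zero when a test or dry-run scan fails; `verify_registry_auth` is a hypothetical name. The log check is left as a manual step because the format of its output is not specified here.

```shell
# Run the registry test and a dry-run scan; stop at the first failure.
# Assumption: stella exits non-zero when a check fails.
# Arguments: registry URL, full image reference (<registry-url>/<image>:<tag>).
verify_registry_auth() {
  local registry_url="${1:-}" image_ref="${2:-}"
  if [ -z "$registry_url" ] || [ -z "$image_ref" ]; then
    echo "usage: verify_registry_auth <registry-url> <image-ref>" >&2
    return 2
  fi
  if ! stella registry test "$registry_url"; then
    echo "FAIL: registry test for $registry_url" >&2
    return 1
  fi
  if ! stella scan image --image "$image_ref" --dry-run; then
    echo "FAIL: dry-run scan of $image_ref" >&2
    return 1
  fi
  echo "OK: registry test and dry-run scan passed for $image_ref"
}
```

After it reports OK, review the scanner logs with the third command above to confirm that no new authentication errors appear.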

Prevention

  • Credentials: Use service accounts/workload identity instead of static tokens
  • Rotation: Configure automatic token refresh before expiration
  • Monitoring: Alert on authentication failure rate > 0
  • Documentation: Document registry credential management procedures

Related Resources

  • Architecture: docs/modules/scanner/registry-auth.md
  • Related runbooks: scanner-worker-stuck.md, scanner-timeout.md
  • Registry setup: docs/operations/registry-configuration.md