2026-01-28 02:30:48 +02:00

Registry Referrer Discovery Troubleshooting

Sprint: SPRINT_0127_001_0001_oci_referrer_bundle_export
Module: ExportCenter, AirGap

This runbook covers diagnosing and resolving OCI referrer discovery issues during mirror bundle exports.

Quick Reference

| Symptom | Likely Cause | Solution |
|---|---|---|
| No referrers discovered | Registry doesn't support referrers API | Check registry compatibility |
| Discovery timeout | Network issues or slow registry | Increase timeout, check connectivity |
| Partial referrers | Rate limiting or auth issues | Check credentials and rate limits |
| Checksum mismatch | Referrer modified after discovery | Re-export bundle |

Registry Compatibility Quick Reference

| Registry | OCI 1.1 API | Fallback | Notes |
|---|---|---|---|
| Docker Hub | Partial | Yes | Rate limits may affect discovery |
| GHCR | No | Yes | Uses tag-based discovery only |
| GCR | Yes | Yes | Full OCI 1.1 support |
| ECR | Yes | Yes | Requires proper IAM permissions |
| ACR | Yes | Yes | Full OCI 1.1 support |
| Harbor 2.0+ | Yes | Yes | Full OCI 1.1 support |
| Quay | Partial | Yes | Varies by version |
| JFrog Artifactory | Partial | Yes | Requires OCI layout repository |

See Registry Compatibility Matrix for detailed information.

Diagnosing Issues

1. Check Export Logs

Look for capability probing and discovery logs:

# Look for probing logs
grep "Probing.*registries for OCI referrer" /var/log/stellaops/export-center.log

# Check individual registry results
grep "Registry.*OCI 1" /var/log/stellaops/export-center.log

# Example output:
# [INFO] Probing 2 registries for OCI referrer capabilities before export
# [INFO] Registry gcr.io: OCI 1.1 (referrers API supported, version=OCI-Distribution/2.1, probe_ms=42)
# [WARN] Registry ghcr.io: OCI 1.0 (using fallback tag discovery, version=registry/2.0, probe_ms=85)
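
When many registries are probed, a one-line-per-registry summary is easier to scan than raw grep output. The following is a minimal sketch that assumes the log lines look exactly like the example output above; `summarize_probes` and the `native`/`fallback` labels are illustrative names, not part of StellaOps.

```shell
# Summarize capability-probe results from an export-center log.
# Assumes lines shaped like the example output above:
#   [INFO] Registry <host>: OCI <version> (..., probe_ms=<n>)
summarize_probes() {
  # $1: path to the export-center log file
  sed -n 's/.*Registry \([^ ]*\): OCI \([0-9.]*\) (.*probe_ms=\([0-9]*\)).*/\1 \2 \3/p' "$1" |
    awk '{
      # OCI 1.1 means the native referrers API was detected; anything else uses tag fallback
      mode = ($2 == "1.1") ? "native" : "fallback"
      print $1, mode, $3 "ms"
    }'
}
```

Usage: `summarize_probes /var/log/stellaops/export-center.log` prints one line per probed registry, for example `ghcr.io fallback 85ms`.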

2. Check Telemetry Metrics

Query Prometheus for referrer discovery metrics:

# Capability probes by registry and support status
sum by (registry, api_supported) (
  rate(export_registry_capabilities_probed_total[5m])
)

# Discovery method breakdown
sum by (registry, method) (
  rate(export_referrer_discovery_method_total[5m])
)

# Failure rate by registry
sum by (registry) (
  rate(export_referrer_discovery_failures_total[5m])
)

3. Test Registry Connectivity

Manually probe registry capabilities:

# Test OCI referrers API (OCI 1.1)
curl -H "Accept: application/vnd.oci.image.index.v1+json" \
  "https://registry.example.com/v2/myrepo/referrers/sha256:abc123..."

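# Expected responses:
# - 200 OK with an image index: Registry supports referrers API (an empty "manifests" array means no referrers exist)
# - 404 Not Found: Referrers API unavailable for this repository; the OCI 1.1 spec directs clients to fall back to tag discovery
# - 501 Not Implemented: Registry doesn't support referrers API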

# Check distribution version
curl -I "https://registry.example.com/v2/"
# Look for: OCI-Distribution-API-Version header
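
The manual probe can be wrapped so that the HTTP status is translated into a next step. This is a sketch under assumptions: the function names and result labels are hypothetical, and the 401/403/429 branches are standard HTTP semantics added here because they map to the auth and rate-limit causes listed in the Quick Reference. A 404 is reported as `check-fallback` because tag-based discovery is worth checking in that case.

```shell
# Map an HTTP status from the referrers endpoint to a suggested next step.
classify_referrers_status() {
  case "$1" in
    200)     echo "supported" ;;       # image index returned (may be empty)
    404)     echo "check-fallback" ;;  # try tag-based discovery
    401|403) echo "auth-problem" ;;    # credentials missing or insufficient
    429)     echo "rate-limited" ;;    # back off and retry later
    501)     echo "unsupported" ;;     # referrers API not implemented
    *)       echo "unknown" ;;         # includes 000 when curl cannot connect
  esac
}

# Probe a registry and print the classification.
# $1: registry host, $2: repository, $3: digest (for example sha256:...)
probe_referrers() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Accept: application/vnd.oci.image.index.v1+json" \
    "https://$1/v2/$2/referrers/$3")
  classify_referrers_status "$code"
}
```

Usage: `probe_referrers registry.example.com myrepo sha256:abc123...` prints a single label that can be pasted into a ticket.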

4. Test Fallback Tag Discovery

If the native referrers API is not supported:

# List tags matching fallback pattern
curl -s "https://registry.example.com/v2/myrepo/tags/list" | \
  jq '.tags | map(select(startswith("sha256-")))'

# Expected: Tags like "sha256-abc123.sbom", "sha256-abc123.att"
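
To narrow the tag list to a single image, the digest can be converted to the fallback tag prefix (the `:` becomes `-`) and used as a filter. A minimal sketch; both function names are illustrative.

```shell
# Convert a digest such as sha256:abc123 into its fallback tag prefix sha256-abc123.
digest_to_fallback_tag() {
  printf '%s-%s\n' "${1%%:*}" "${1#*:}"
}

# Read tags on stdin (one per line) and keep those that start with the
# fallback prefix for the given digest.
# $1: digest of the subject image
filter_fallback_tags() {
  prefix=$(digest_to_fallback_tag "$1")
  awk -v p="$prefix" 'index($0, p) == 1'
}
```

Usage: pipe the tag list through it, for example `curl -s "https://registry.example.com/v2/myrepo/tags/list" | jq -r '.tags[]' | filter_fallback_tags sha256:abc123`.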

Common Issues and Solutions

Issue: "Failed to probe capabilities for registry"

Symptoms:

  • Warning logs about probe failures
  • Referrer discovery falls back to tag-based discovery or is skipped

Causes:

  1. Network connectivity issues
  2. Authentication failures
  3. Registry rate limiting
  4. TLS certificate issues

Solutions:

# Check network connectivity
curl -v "https://registry.example.com/v2/"

# Verify authentication
docker login registry.example.com

# Check TLS certificates
openssl s_client -connect registry.example.com:443 -servername registry.example.com
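
curl's exit code already separates several of the causes listed above (DNS, connection, timeout, TLS). The sketch below maps the documented curl exit codes to a short label; the function names and labels are illustrative. Because `-f` is not used, an HTTP 401 still yields exit code 0, so `ok` only means the registry was reachable at the network and TLS level.

```shell
# Translate a curl exit code into a likely cause category.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "network: could not connect" ;;
    28) echo "timeout" ;;
    35) echo "tls: handshake failed" ;;
    60) echo "tls: certificate not trusted" ;;
    *)  echo "other" ;;
  esac
}

# Hit the registry base endpoint and report the category.
# $1: registry host
check_registry() {
  curl -sS -o /dev/null --max-time 10 "https://$1/v2/"
  explain_curl_exit "$?"
}
```

Usage: `check_registry registry.example.com`.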

Issue: "No referrers found for image"

Symptoms:

  • Discovery succeeds but returns empty list
  • Bundle missing expected SBOMs/attestations

Causes:

  1. No referrers actually attached to image
  2. Referrers attached to different digest (tag vs digest mismatch)
  3. Referrers pruned by registry retention policy

Solutions:

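# Inspect a referrer manifest and confirm its subject points at the expected image digest
# (run this against the referrer's own digest, not the image's)
crane manifest registry.example.com/repo@sha256:<referrer-digest> | \
  jq '.subject.digest'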

# List referrers using oras
oras discover registry.example.com/repo@sha256:abc123

# Filter referrers by artifact type (repeat with other types as needed)
curl "https://registry.example.com/v2/repo/referrers/sha256:abc123?artifactType=application/vnd.cyclonedx%2Bjson"
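
A tag vs digest mismatch is easiest to rule out by always querying referrers with a digest. The sketch below resolves a reference to its digest, using `crane digest` when a tag is given; the function names are illustrative, and the tag branch needs `crane` plus registry access.

```shell
# Report whether an image reference is pinned by digest or by tag.
ref_kind() {
  case "$1" in
    *@sha256:*) echo "digest" ;;
    *)          echo "tag" ;;
  esac
}

# Print the digest for a reference so referrer queries use the exact subject.
# $1: image reference (repo@sha256:... or repo:tag)
resolve_digest() {
  case "$(ref_kind "$1")" in
    digest) echo "${1##*@}" ;;     # already pinned: strip everything up to '@'
    tag)    crane digest "$1" ;;   # ask the registry what the tag points to
  esac
}
```

Usage: `oras discover "registry.example.com/repo@$(resolve_digest registry.example.com/repo:v1)"` queries referrers for the digest the tag currently points to.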

Issue: "Referrer checksum mismatch during import"

Symptoms:

  • ImportValidator reports ReferrerChecksumMismatch
  • Bundle verification fails

Causes:

  1. Referrer artifact modified after export
  2. Registry replaced artifact
  3. Bundle corruption during transfer

Solutions:

  1. Re-export the bundle to get fresh referrer content
  2. Verify bundle integrity by comparing the output of sha256sum bundle.tgz against the checksum recorded at export time
  3. Check if referrer was intentionally updated upstream
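
The integrity check in step 2 can be scripted so that it returns a non-zero status on mismatch. A minimal sketch; `verify_bundle_checksum` is an illustrative name and the expected value is whatever checksum was recorded when the bundle was exported.

```shell
# Compare a bundle's SHA-256 against a known-good value.
# $1: bundle path, $2: expected hex SHA-256
verify_bundle_checksum() {
  actual=$(sha256sum "$1" | awk '{print $1}')
  if [ "$actual" = "$2" ]; then
    echo "OK"
  else
    echo "MISMATCH expected=$2 actual=$actual"
    return 1
  fi
}
```

Usage: `verify_bundle_checksum bundle.tgz <expected-sha256>`; a mismatch on the transferred copy but not on the source copy points to corruption during transfer.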

Issue: Slow referrer discovery

Symptoms:

  • Export takes much longer than expected
  • Timeout warnings in logs

Causes:

  1. Large number of referrers per image
  2. Slow registry responses
  3. No capability caching (cache miss)

Solutions:

# Increase timeout in export config
export:
  referrer_discovery:
    timeout_seconds: 120
    max_concurrent_discoveries: 4

Validation Commands

Verify Bundle Referrers

# Extract and list referrer structure
tar -tzf bundle.tgz | grep "^referrers/"

# Check manifest for referrer counts
tar -xzf bundle.tgz -O manifest.yaml | grep -A5 "referrers:"

# Validate a specific referrer checksum
tar -xzf bundle.tgz -O referrers/sha256-abc123/sha256-def456.json | sha256sum
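
The single-file check can be extended to every referrer in the bundle. This sketch assumes that each file under `referrers/` is named `sha256-<hex>.json` after the SHA-256 of its own content, as the example path above suggests; if the bundle layout differs, compare against the checksums in the manifest instead. `check_bundle_referrers` is an illustrative name.

```shell
# Recompute the SHA-256 of every referrer file in a bundle and compare it
# with the digest encoded in the file name (assumed naming convention).
# $1: bundle path
check_bundle_referrers() {
  bundle=$1
  rc=0
  for entry in $(tar -tzf "$bundle" | grep '^referrers/.*/sha256-[0-9a-f]*\.json$'); do
    name=${entry##*/}            # sha256-<hex>.json
    expected=${name#sha256-}     # <hex>.json
    expected=${expected%.json}   # <hex>
    actual=$(tar -xzf "$bundle" -O "$entry" | sha256sum | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
      echo "OK $entry"
    else
      echo "MISMATCH $entry"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_bundle_referrers bundle.tgz` prints one line per referrer and returns non-zero if any file does not match its name.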

CLI Validation

# Validate bundle referrers
stellaops bundle validate --file bundle.tgz --check-referrers

# Import with strict referrer validation
stellaops bundle import --file bundle.tgz --strict-referrer-validation

Escalation

If issues persist after following this runbook:

  1. Collect diagnostic information:

    • Export logs with DEBUG level enabled
    • Telemetry metrics for the affected time window
    • Registry type and version
    • Network trace if applicable
  2. Check known issues

  3. Open a support ticket with:

    • Environment details (StellaOps version, registry type)
    • Error messages and logs
    • Steps to reproduce
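
The log portion of the diagnostic collection can be scripted. The sketch below is one possible approach, not a StellaOps tool: `collect_diagnostics`, the file names, and the fields written to `environment.txt` are all hypothetical, and the StellaOps version, registry details, and metrics still need to be added by hand.

```shell
# Gather referrer-related log lines and basic host details into a tarball.
# $1: export-center log path, $2: output directory (created if missing)
collect_diagnostics() {
  log=$1
  out=$2
  mkdir -p "$out"
  # Keep only referrer-related lines; an empty excerpt is not an error
  grep -i "referrer" "$log" > "$out/referrer-log-excerpt.txt" || true
  {
    echo "collected_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "host=$(uname -n)"
    echo "log_source=$log"
  } > "$out/environment.txt"
  tar -czf "$out.tgz" -C "$(dirname "$out")" "$(basename "$out")"
  echo "$out.tgz"
}
```

Usage: `collect_diagnostics /var/log/stellaops/export-center.log /tmp/referrer-diag` prints the path of the archive to attach to the ticket.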