Files
git.stella-ops.org/docs/runbooks/registry-referrer-troubleshooting.md
2026-01-28 02:30:48 +02:00

240 lines
6.7 KiB
Markdown

# Registry Referrer Discovery Troubleshooting
> Sprint: SPRINT_0127_001_0001_oci_referrer_bundle_export
> Module: ExportCenter, AirGap
This runbook covers diagnosing and resolving OCI referrer discovery issues during mirror bundle exports.
## Quick Reference
| Symptom | Likely Cause | Solution |
|---------|--------------|----------|
| No referrers discovered | Registry doesn't support referrers API | Check [registry compatibility](#registry-compatibility-quick-reference) |
| Discovery timeout | Network issues or slow registry | Increase timeout, check connectivity |
| Partial referrers | Rate limiting or auth issues | Check credentials and rate limits |
| Checksum mismatch | Referrer modified after discovery | Re-export bundle |
## Registry Compatibility Quick Reference
| Registry | OCI 1.1 API | Fallback | Notes |
|----------|-------------|----------|-------|
| Docker Hub | Partial | Yes | Rate limits may affect discovery |
| GHCR | No | Yes | Uses tag-based discovery only |
| GCR | Yes | Yes | Full OCI 1.1 support |
| ECR | Yes | Yes | Requires proper IAM permissions |
| ACR | Yes | Yes | Full OCI 1.1 support |
| Harbor 2.0+ | Yes | Yes | Full OCI 1.1 support |
| Quay | Partial | Yes | Varies by version |
| JFrog Artifactory | Partial | Yes | Requires OCI layout repository |
See [Registry Compatibility Matrix](../modules/export-center/registry-compatibility.md) for detailed information.
## Diagnosing Issues
### 1. Check Export Logs
Look for capability probing and discovery logs:
```bash
# Look for probing logs
grep "Probing.*registries for OCI referrer" /var/log/stellaops/export-center.log
# Check individual registry results
grep "Registry.*OCI 1" /var/log/stellaops/export-center.log
# Example output:
# [INFO] Probing 2 registries for OCI referrer capabilities before export
# [INFO] Registry gcr.io: OCI 1.1 (referrers API supported, version=OCI-Distribution/2.1, probe_ms=42)
# [WARN] Registry ghcr.io: OCI 1.0 (using fallback tag discovery, version=registry/2.0, probe_ms=85)
```
### 2. Check Telemetry Metrics
Query Prometheus for referrer discovery metrics:
```promql
# Capability probes by registry and support status
sum by (registry, api_supported) (
rate(export_registry_capabilities_probed_total[5m])
)
# Discovery method breakdown
sum by (registry, method) (
rate(export_referrer_discovery_method_total[5m])
)
# Failure rate by registry
sum by (registry) (
rate(export_referrer_discovery_failures_total[5m])
)
```
### 3. Test Registry Connectivity
Manually probe registry capabilities:
```bash
# Test OCI referrers API (OCI 1.1)
curl -H "Accept: application/vnd.oci.image.index.v1+json" \
"https://registry.example.com/v2/myrepo/referrers/sha256:abc123..."
# Expected responses:
# - 200 OK with manifest list: Registry supports referrers API
# - 404 Not Found: No referrers exist (API supported)
# - 501 Not Implemented: Registry doesn't support referrers API
# Check distribution version
curl -I "https://registry.example.com/v2/"
# Look for: OCI-Distribution-API-Version header
```
### 4. Test Fallback Tag Discovery
If native API is not supported:
```bash
# List tags matching fallback pattern
curl "https://registry.example.com/v2/myrepo/tags/list" | \
jq '.tags | map(select(startswith("sha256-")))'
# Expected: Tags like "sha256-abc123.sbom", "sha256-abc123.att"
```
## Common Issues and Solutions
### Issue: "Failed to probe capabilities for registry"
**Symptoms:**
- Warning logs about probe failures
- Referrer discovery using fallback or skipped
**Causes:**
1. Network connectivity issues
2. Authentication failures
3. Registry rate limiting
4. TLS certificate issues
**Solutions:**
```bash
# Check network connectivity
curl -v "https://registry.example.com/v2/"
# Verify authentication
docker login registry.example.com
# Check TLS certificates
openssl s_client -connect registry.example.com:443 -servername registry.example.com
```
### Issue: "No referrers found for image"
**Symptoms:**
- Discovery succeeds but returns empty list
- Bundle missing expected SBOMs/attestations
**Causes:**
1. No referrers actually attached to image
2. Referrers attached to different digest (tag vs digest mismatch)
3. Referrers pruned by registry retention policy
**Solutions:**
```bash
# Verify referrers exist for the specific digest
crane manifest registry.example.com/repo@sha256:abc123 | \
jq '.subject.digest'
# List referrers using oras
oras discover registry.example.com/repo@sha256:abc123
# Check if referrers exist with different artifact types
curl "https://registry.example.com/v2/repo/referrers/sha256:abc123?artifactType=application/vnd.cyclonedx%2Bjson"
```
### Issue: "Referrer checksum mismatch during import"
**Symptoms:**
- `ImportValidator` reports `ReferrerChecksumMismatch`
- Bundle verification fails
**Causes:**
1. Referrer artifact modified after export
2. Registry replaced artifact
3. Bundle corruption during transfer
**Solutions:**
1. Re-export the bundle to get fresh referrer content
2. Verify bundle integrity: `sha256sum bundle.tgz`
3. Check if referrer was intentionally updated upstream
### Issue: Slow referrer discovery
**Symptoms:**
- Export takes much longer than expected
- Timeout warnings in logs
**Causes:**
1. Large number of referrers per image
2. Slow registry responses
3. No capability caching (cache miss)
**Solutions:**
```yaml
# Increase timeout in export config
export:
referrer_discovery:
timeout_seconds: 120
max_concurrent_discoveries: 4
```
## Validation Commands
### Verify Bundle Referrers
```bash
# Extract and list referrer structure
tar -tzf bundle.tgz | grep "^referrers/"
# Check manifest for referrer counts
tar -xzf bundle.tgz -O manifest.yaml | grep -A5 "referrers:"
# Validate a specific referrer checksum
tar -xzf bundle.tgz -O referrers/sha256-abc123/sha256-def456.json | sha256sum
```
### CLI Validation
```bash
# Validate bundle referrers
stellaops bundle validate --file bundle.tgz --check-referrers
# Import with strict referrer validation
stellaops bundle import --file bundle.tgz --strict-referrer-validation
```
## Escalation
If issues persist after following this runbook:
1. Collect diagnostic information:
- Export logs with DEBUG level enabled
- Telemetry metrics for the affected time window
- Registry type and version
- Network trace if applicable
2. Check [known issues](https://github.com/stella-ops/issues?q=label:referrer-discovery)
3. Open a support ticket with:
- Environment details (StellaOps version, registry type)
- Error messages and logs
- Steps to reproduce
## Related Documentation
- [Export Center Architecture](../modules/export-center/architecture.md#oci-referrer-discovery)
- [Registry Compatibility Matrix](../modules/export-center/registry-compatibility.md)
- [Offline Bundle Format](../modules/airgap/guides/offline-bundle-format.md#oci-referrer-artifacts)