| checkId | plugin | severity | tags |
|---|---|---|---|
| check.servicegraph.circuitbreaker | stellaops.doctor.servicegraph | warn | |
Circuit Breaker Status
What It Checks
Reads Resilience:Enabled or HttpClient:Resilience:Enabled and, when enabled, validates BreakDurationSeconds, FailureThreshold, and SamplingDurationSeconds.
The check reports info when resilience is not configured, warns when BreakDurationSeconds < 5 or FailureThreshold < 2, and passes otherwise.
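The decision rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the plugin's implementation: the function name `evaluate_circuit_breaker`, the flat colon-delimited key layout, the `CircuitBreaker` segment under the `HttpClient:Resilience` prefix, and the handling of missing values are all assumptions. Only the two thresholds (5 seconds, 2 failures) and the info/warn/pass outcomes come from the description.

```python
from typing import List, Mapping, Optional, Tuple

# Thresholds from the check description; everything else here is hypothetical.
MIN_BREAK_DURATION_SECONDS = 5
MIN_FAILURE_THRESHOLD = 2


def _read_int(config: Mapping[str, str], key: str) -> Optional[int]:
    """Return the integer value stored under key, or None when the key is absent."""
    raw = config.get(key)
    return int(raw) if raw is not None else None


def evaluate_circuit_breaker(config: Mapping[str, str]) -> Tuple[str, List[str]]:
    """Return (severity, issues) for a flat, colon-delimited configuration mapping."""
    # Resilience counts as enabled if either documented flag is "true".
    prefix = next(
        (p for p in ("Resilience", "HttpClient:Resilience")
         if str(config.get(f"{p}:Enabled", "")).lower() == "true"),
        None,
    )
    if prefix is None:
        return ("info", ["resilience is not configured"])

    issues: List[str] = []
    break_seconds = _read_int(config, f"{prefix}:CircuitBreaker:BreakDurationSeconds")
    failures = _read_int(config, f"{prefix}:CircuitBreaker:FailureThreshold")

    # Assumption: a missing value is not flagged; the real check may apply defaults.
    if break_seconds is not None and break_seconds < MIN_BREAK_DURATION_SECONDS:
        issues.append(f"BreakDurationSeconds={break_seconds} is below {MIN_BREAK_DURATION_SECONDS}")
    if failures is not None and failures < MIN_FAILURE_THRESHOLD:
        issues.append(f"FailureThreshold={failures} is below {MIN_FAILURE_THRESHOLD}")

    return ("warn", issues) if issues else ("pass", [])
```

The sketch does not evaluate SamplingDurationSeconds, because the description names no threshold for it.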
Why It Matters
Circuit breakers protect external dependencies from retry storms. Bad thresholds either trip too aggressively or never trip when a downstream service is failing.
Common Causes
- Resilience policies were never enabled on outgoing HTTP clients
- Thresholds were copied from a benchmark profile into production
- Multiple services use different resilience defaults, making failures unpredictable
How to Fix
Docker Compose
services:
  doctor-web:
    environment:
      Resilience__Enabled: "true"
      Resilience__CircuitBreaker__BreakDurationSeconds: "30"
      Resilience__CircuitBreaker__FailureThreshold: "5"
      Resilience__CircuitBreaker__SamplingDurationSeconds: "60"
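The environment variable names above use a double underscore where the configuration keys in "What It Checks" use a colon. Assuming the service follows the common convention of treating `__` as the section separator, the mapping can be sketched as follows. The helper name `env_to_config_key` is hypothetical and exists only for illustration.

```python
def env_to_config_key(name: str) -> str:
    """Translate a double-underscore environment variable name to a colon-delimited key."""
    return name.replace("__", ":")


# Values copied from the Docker Compose example.
compose_env = {
    "Resilience__Enabled": "true",
    "Resilience__CircuitBreaker__BreakDurationSeconds": "30",
    "Resilience__CircuitBreaker__FailureThreshold": "5",
    "Resilience__CircuitBreaker__SamplingDurationSeconds": "60",
}

# The resulting keys are the ones the check description refers to.
config = {env_to_config_key(k): v for k, v in compose_env.items()}
```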
Bare Metal / systemd
Keep breaker settings in the same configuration source used for HTTP client registration so the service and Doctor observe the same values.
Kubernetes / Helm
Standardize resilience values across backend-facing workloads instead of per-pod overrides.
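One way to spot per-workload drift before standardizing is to compare the effective breaker settings side by side. The sketch below assumes you have already collected each workload's settings into a dictionary (for example from rendered Helm values); the function name `find_drift` and the workload names in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Mapping


def find_drift(workloads: Mapping[str, Mapping[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return each setting whose value differs between workloads, with the per-workload values."""
    values_by_key: Dict[str, Dict[str, str]] = defaultdict(dict)
    for workload, settings in workloads.items():
        for key, value in settings.items():
            values_by_key[key][workload] = value
    # A setting has drifted when more than one distinct value is in use.
    # Workloads that omit a setting entirely are not reported by this sketch.
    return {
        key: per_workload
        for key, per_workload in values_by_key.items()
        if len(set(per_workload.values())) > 1
    }


# Hypothetical workloads used only to show the shape of the input.
example = {
    "orders-api": {"BreakDurationSeconds": "30", "FailureThreshold": "5"},
    "billing-api": {"BreakDurationSeconds": "30", "FailureThreshold": "2"},
}
drift = find_drift(example)
```

Here `drift` contains only `FailureThreshold`, since `BreakDurationSeconds` is identical across both workloads.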
Verification
stella doctor --check check.servicegraph.circuitbreaker
Related Checks
- check.servicegraph.backend - breaker policy protects this path when the backend degrades
- check.servicegraph.timeouts - timeout settings and breaker settings should be tuned together