---
checkId: check.servicegraph.circuitbreaker
plugin: stellaops.doctor.servicegraph
severity: warn
tags: [servicegraph, resilience, circuit-breaker]
---

# Circuit Breaker Status

## What It Checks

Reads `Resilience:Enabled` or `HttpClient:Resilience:Enabled` and, when enabled, validates `BreakDurationSeconds`, `FailureThreshold`, and `SamplingDurationSeconds`. The check reports info when resilience is not configured, warns when `BreakDurationSeconds < 5` or `FailureThreshold < 2`, and passes otherwise.

## Why It Matters

Circuit breakers protect external dependencies from retry storms. Badly tuned thresholds either trip too aggressively or never trip at all when a downstream service is failing.

## Common Causes

- Resilience policies were never enabled on outgoing HTTP clients
- Thresholds were copied from a benchmark profile into production
- Multiple services use different resilience defaults, making failure behavior unpredictable

## How to Fix

### Docker Compose

```yaml
services:
  doctor-web:
    environment:
      Resilience__Enabled: "true"
      Resilience__CircuitBreaker__BreakDurationSeconds: "30"
      Resilience__CircuitBreaker__FailureThreshold: "5"
      Resilience__CircuitBreaker__SamplingDurationSeconds: "60"
```

### Bare Metal / systemd

Keep breaker settings in the same configuration source used for HTTP client registration so the service and Doctor observe the same values.

### Kubernetes / Helm

Standardize resilience values across backend-facing workloads instead of relying on per-pod overrides.

## Verification

```bash
stella doctor --check check.servicegraph.circuitbreaker
```

## Related Checks

- `check.servicegraph.backend` - breaker policy protects this path when the backend degrades
- `check.servicegraph.timeouts` - timeout settings and breaker settings should be tuned together
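One way to standardize values across Helm-managed workloads, sketched as a hypothetical values-file fragment: a single shared environment block referenced by each backend-facing workload via a YAML anchor, so no pod carries its own override. The key names mirror the Compose example above; the `resilienceEnv` name and the `doctor-web`/`doctor-worker` workload names are illustrative assumptions about your chart layout.

```yaml
# Shared resilience settings, defined once (anchor name is hypothetical)
resilienceEnv: &resilienceEnv
  Resilience__Enabled: "true"
  Resilience__CircuitBreaker__BreakDurationSeconds: "30"
  Resilience__CircuitBreaker__FailureThreshold: "5"
  Resilience__CircuitBreaker__SamplingDurationSeconds: "60"

# Each backend-facing workload references the same block
doctor-web:
  env: *resilienceEnv
doctor-worker:
  env: *resilienceEnv
```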
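## Appendix: Check Logic Sketch

The pass/warn/info rules from "What It Checks" can be sketched as follows. This is an illustrative Python sketch, not the actual `stellaops.doctor.servicegraph` implementation; the function name `check_circuit_breaker` and the flat `Section:Key` config keys are assumptions modeled on the settings named above.

```python
def check_circuit_breaker(config: dict) -> str:
    """Return 'info', 'warn', or 'pass' per the rules in 'What It Checks'.

    `config` is assumed to be a flat mapping of colon-delimited keys to
    string values, mirroring the settings this check reads.
    """
    enabled = (
        config.get("Resilience:Enabled")
        or config.get("HttpClient:Resilience:Enabled")
    )
    if not enabled:
        # Resilience not configured at all: informational, not a failure.
        return "info"

    break_duration = float(
        config.get("Resilience:CircuitBreaker:BreakDurationSeconds", 0)
    )
    failure_threshold = int(
        config.get("Resilience:CircuitBreaker:FailureThreshold", 0)
    )

    if break_duration < 5 or failure_threshold < 2:
        # Breaker would reopen almost immediately, or trip on a single failure.
        return "warn"
    return "pass"
```

For example, an empty config yields `info`, while the Docker Compose values above (`BreakDurationSeconds=30`, `FailureThreshold=5`) yield `pass`.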