git.stella-ops.org/docs/doctor/articles/servicegraph/servicegraph-circuitbreaker.md
2026-03-31 23:26:24 +03:00

checkId: check.servicegraph.circuitbreaker
plugin: stellaops.doctor.servicegraph
severity: warn
tags: servicegraph, resilience, circuit-breaker

Circuit Breaker Status

What It Checks

Reads Resilience:Enabled or HttpClient:Resilience:Enabled and, when enabled, validates BreakDurationSeconds, FailureThreshold, and SamplingDurationSeconds.

The check reports info when resilience is not configured, warns when BreakDurationSeconds < 5 or FailureThreshold < 2, and passes otherwise.
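
The decision rule above can be sketched in a few lines. This is a minimal illustration, not the Doctor plugin's actual code: the function name, its parameters, and the choice to treat a missing or false Enabled flag the same way are assumptions. Only the two thresholds come from the rule described above.

```python
from typing import Optional

# Thresholds taken from the rule described above.
MIN_BREAK_DURATION_SECONDS = 5
MIN_FAILURE_THRESHOLD = 2


def evaluate_circuit_breaker(
    enabled: Optional[bool],
    break_duration_seconds: int,
    failure_threshold: int,
) -> str:
    """Return "info", "warn", or "pass" for already-parsed settings (hypothetical helper)."""
    # Assumption: an absent (None) or false Enabled flag both count as "not configured".
    if not enabled:
        return "info"
    # Warn when either value falls below its minimum.
    if (
        break_duration_seconds < MIN_BREAK_DURATION_SECONDS
        or failure_threshold < MIN_FAILURE_THRESHOLD
    ):
        return "warn"
    return "pass"
```

Under these assumptions, a break duration of 4 seconds or a failure threshold of 1 produces a warning, while the boundary values 5 and 2 pass.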

Why It Matters

Circuit breakers protect external dependencies from retry storms. Badly chosen thresholds either trip the breaker too aggressively or never trip it while a downstream service is failing.

Common Causes

  • Resilience policies were never enabled on outgoing HTTP clients
  • Thresholds were copied from a benchmark profile into production
  • Multiple services use different resilience defaults, making failures unpredictable

How to Fix

Docker Compose

services:
  doctor-web:
    environment:
      Resilience__Enabled: "true"
      Resilience__CircuitBreaker__BreakDurationSeconds: "30"
      Resilience__CircuitBreaker__FailureThreshold: "5"
      Resilience__CircuitBreaker__SamplingDurationSeconds: "60"
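
The double underscores in the variable names above stand in for the colon used in keys such as Resilience:Enabled. The sketch below shows that mapping, assuming the service loads its settings through the standard .NET environment-variable configuration provider, which treats `__` as the hierarchy separator. The helper name is hypothetical.

```python
def env_to_config_key(name: str) -> str:
    """Translate a double-underscore environment variable name into a colon-delimited key."""
    return name.replace("__", ":")


# Values copied from the Compose example above.
compose_environment = {
    "Resilience__Enabled": "true",
    "Resilience__CircuitBreaker__BreakDurationSeconds": "30",
    "Resilience__CircuitBreaker__FailureThreshold": "5",
    "Resilience__CircuitBreaker__SamplingDurationSeconds": "60",
}

# Keys as they would appear in a colon-delimited configuration lookup.
config = {env_to_config_key(k): v for k, v in compose_environment.items()}
```

Here `config` holds the key `Resilience:Enabled` with the value `"true"`, and the three breaker values sit under `Resilience:CircuitBreaker:`.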

Bare Metal / systemd

Keep breaker settings in the same configuration source used for HTTP client registration so the service and Doctor observe the same values.

Kubernetes / Helm

Standardize resilience values across backend-facing workloads instead of relying on per-pod overrides.
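
One way to spot diverging values before standardizing them is to compare the settings each workload receives. The sketch below assumes the settings have already been collected into plain dictionaries, for example from rendered manifests. The workload names and the helper are hypothetical.

```python
from typing import Dict, Mapping, Set


def find_drift(workloads: Mapping[str, Mapping[str, str]]) -> Dict[str, Set[str]]:
    """Return each setting that has more than one distinct value across workloads."""
    seen: Dict[str, Set[str]] = {}
    for settings in workloads.values():
        for key, value in settings.items():
            seen.setdefault(key, set()).add(value)
    # Note: a setting that is simply absent from a workload is not reported.
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical workloads; only the second overrides the break duration.
example = {
    "scanner": {"Resilience__CircuitBreaker__BreakDurationSeconds": "30"},
    "notifier": {"Resilience__CircuitBreaker__BreakDurationSeconds": "1"},
}
drift = find_drift(example)
```

In this example `drift` contains a single entry for the break duration with both observed values, which points at the workload that needs to be brought back in line.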

Verification

stella doctor --check check.servicegraph.circuitbreaker

Related Checks

  • check.servicegraph.backend - breaker policy protects this path when the backend degrades
  • check.servicegraph.timeouts - timeout settings and breaker settings should be tuned together