master 152c1b1357 doctor: complete runtime check documentation sprint

Signed-off-by: master <>

2026-03-31 23:26:24 +03:00

Service Timeouts

What It Checks

Validates HttpClient:Timeout, Database:CommandTimeout, Cache:OperationTimeout, and HealthChecks:Timeout.

The check warns when HTTP timeout is below 5s or above 300s, database timeout is below 5s or above 120s, cache timeout exceeds 30s, or health-check timeout exceeds the HTTP timeout.
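The thresholds above can be read as a small rule set. The Python sketch below is illustrative only: the function name `timeout_warnings` is hypothetical, the values are assumed to be in seconds, and none of it is taken from the actual check implementation.

```python
def timeout_warnings(http, database, cache, health):
    """Return warning strings for the four timeout values.

    Assumption: every value is expressed in seconds.
    """
    warnings = []
    # HTTP timeout must stay within 5s..300s.
    if http < 5 or http > 300:
        warnings.append("HttpClient:Timeout outside 5-300s")
    # Database command timeout must stay within 5s..120s.
    if database < 5 or database > 120:
        warnings.append("Database:CommandTimeout outside 5-120s")
    # Cache timeout only has an upper bound.
    if cache > 30:
        warnings.append("Cache:OperationTimeout above 30s")
    # Health checks should not outlast the HTTP timeout.
    if health > http:
        warnings.append("HealthChecks:Timeout exceeds HttpClient:Timeout")
    return warnings
```

Called with the values from the Docker Compose example on this page (100, 30, 5, 10), the sketch returns an empty list.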

Why It Matters

Timeouts define how quickly failures surface and how long stuck work ties up resources. Values that are too low cause premature failures; values that are too high let stuck work hold connections and threads until resources run out.

Common Causes

Defaults from one environment were copied into another with very different latency
Health-check timeout was set higher than the main request timeout
Cache or database timeouts were raised to hide underlying performance problems

How to Fix

Docker Compose

services:
  doctor-web:
    environment:
      HttpClient__Timeout: "100"
      Database__CommandTimeout: "30"
      Cache__OperationTimeout: "5"
      HealthChecks__Timeout: "10"

Bare Metal / systemd

Tune timeouts from measured service latencies, not from guesswork. Raise values only after understanding the slower dependency.
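One way to turn measured latencies into a starting value is to take a high percentile and add headroom. This is a minimal sketch, not a recommendation from the check itself: the helper `suggest_timeout`, the choice of the 99th percentile, and the 2x headroom factor are all assumptions, and the default floor and ceiling simply reuse the 5s and 300s HTTP bounds described above.

```python
import math


def suggest_timeout(latencies_s, headroom=2.0, floor=5.0, ceiling=300.0):
    """Suggest a timeout in seconds from measured latencies in seconds.

    Assumptions: nearest-rank p99 and a 2x headroom factor; the floor and
    ceiling default to the HTTP bounds this check warns about.
    """
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # Nearest-rank 99th percentile: smallest value with >= 99% of samples at or below it.
    rank = max(1, math.ceil(len(ordered) * 99 / 100))
    p99 = ordered[rank - 1]
    # Add headroom, then clamp into the accepted range.
    return min(max(p99 * headroom, floor), ceiling)
```

If the suggested value lands on the ceiling, that is a hint to investigate the slow dependency before raising the timeout further.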

Kubernetes / Helm

Keep application timeouts lower than ingress, service-mesh, and job-level deadlines so failures happen in the component that owns the retry policy.
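The ordering rule above can be checked mechanically. In this sketch the function `deadline_violations` and the layer names in the usage note are hypothetical; it only compares numbers that an operator has already collected from ingress, mesh, and job configuration.

```python
def deadline_violations(app_timeout_s, outer_deadlines_s):
    """Return the names of outer layers whose deadline is not strictly
    greater than the application timeout (all values in seconds)."""
    return sorted(
        name
        for name, deadline in outer_deadlines_s.items()
        if deadline <= app_timeout_s
    )
```

For example, with an application timeout of 100 and outer deadlines of `{"ingress": 120, "mesh": 90, "job": 100}`, the sketch flags `job` and `mesh`, because either could cut the request off before the application's own timeout fires.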

Verification

stella doctor --check check.servicegraph.timeouts

Related Checks

check.servicegraph.backend - timeout misconfiguration often shows up as backend failures first
check.db.latency - high database latency can force operators to revisit timeout values
