doctor: complete runtime check documentation sprint

Signed-off-by: master <>
master
2026-03-31 23:26:24 +03:00
parent 404d50bcb7
commit 152c1b1357
54 changed files with 2210 additions and 258 deletions


@@ -0,0 +1,56 @@
---
checkId: check.servicegraph.backend
plugin: stellaops.doctor.servicegraph
severity: fail
tags: [servicegraph, backend, api, connectivity]
---
# Backend API Connectivity
## What It Checks
Reads `StellaOps:BackendUrl` or `BackendUrl`, appends `/health`, and performs an HTTP GET through `IHttpClientFactory`.
The check passes on a successful response, warns when latency exceeds `2000ms`, and fails on non-success status codes or connection errors.
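The pass/warn/fail rules above can be approximated from a shell for manual triage. This is a minimal sketch, not the Doctor plugin's implementation: `classify_health` and `probe_backend` are hypothetical helper names, "successful" is assumed to mean a `2xx` status, and the sketch relies on `curl` reporting `000` as the status code when no HTTP response was received.

```bash
#!/usr/bin/env bash
# Illustrative only: mirrors the documented outcome rules, not the real check.

# classify_health STATUS LATENCY_MS -> prints pass, warn, or fail.
# Assumption: "successful" means an HTTP 2xx status code.
classify_health() {
  local status="$1" latency_ms="$2"
  if [ "$status" -lt 200 ] || [ "$status" -gt 299 ]; then
    # Covers non-success codes and curl's 000 (no response received).
    echo fail
  elif [ "$latency_ms" -gt 2000 ]; then
    # Reachable, but slower than the documented 2000ms threshold.
    echo warn
  else
    echo pass
  fi
}

# probe_backend BASE_URL -> GETs BASE_URL/health and classifies the result.
probe_backend() {
  local base_url="$1" out status seconds latency_ms
  # -w prints "<status> <seconds>" even when the connection fails.
  out=$(curl -s -o /dev/null -w '%{http_code} %{time_total}' "${base_url%/}/health")
  status="${out%% *}"
  seconds="${out##* }"
  # Convert curl's fractional seconds to whole milliseconds.
  latency_ms=$(awk -v s="${seconds:-0}" 'BEGIN { printf "%d", s * 1000 }')
  classify_health "${status:-000}" "$latency_ms"
}
```

For example, `probe_backend http://platform-web:8080` run from inside the Doctor container should print `pass` when the backend answers quickly with a `2xx` status. Latency measured this way includes `curl` startup and DNS time, so it will not match the value the check itself records.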
## Why It Matters
The backend API is the control plane entry point for many Stella Ops flows. If it is unreachable, UI features and cross-service orchestration degrade quickly.
## Common Causes
- `StellaOps__BackendUrl` points to the wrong host, port, or scheme
- The backend service is down or returning `5xx`
- DNS, proxy, or network rules block access from the Doctor service
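To rule out the first cause quickly, it can help to print the scheme, host, and port that the configured URL actually resolves to. The helper below is a hypothetical sketch (`describe_backend_url` is not part of Stella Ops); it assumes a plain `http://` or `https://` URL and does not handle IPv6 literals or embedded credentials.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a backend URL into scheme, host, and port
# so a wrong value stands out. Not part of the Stella Ops tooling.
describe_backend_url() {
  local url="$1" scheme default_port rest authority host port
  case "$url" in
    http://*)  scheme=http;  default_port=80 ;;
    https://*) scheme=https; default_port=443 ;;
    *)
      echo "unsupported or missing scheme: '$url'" >&2
      return 1
      ;;
  esac
  rest="${url#*://}"        # strip the scheme
  authority="${rest%%/*}"   # strip any path
  host="${authority%%:*}"
  case "$authority" in
    *:*) port="${authority##*:}" ;;  # explicit port
    *)   port="$default_port" ;;     # fall back to the scheme default
  esac
  echo "scheme=$scheme host=$host port=$port"
}
```

A typical call is `describe_backend_url "${StellaOps__BackendUrl:-${BackendUrl:-}}"`. The fallback order shown here is an assumption based on the order in which the two keys are listed under What It Checks.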
## How to Fix
### Docker Compose
```yaml
services:
  doctor-web:
    environment:
      StellaOps__BackendUrl: http://platform-web:8080
```
```bash
docker compose -f devops/compose/docker-compose.stella-ops.yml exec doctor-web curl -fsS http://platform-web:8080/health
docker compose -f devops/compose/docker-compose.stella-ops.yml logs --tail 100 platform-web
```
### Bare Metal / systemd
```bash
curl -fsS http://<backend-host>:<port>/health
journalctl -u <backend-service> -n 200
```
### Kubernetes / Helm
```bash
kubectl exec deploy/doctor-web -n <namespace> -- curl -fsS http://<backend-service>.<namespace>.svc.cluster.local:<port>/health
kubectl logs deploy/<backend-service> -n <namespace> --tail=200
```
## Verification
```bash
stella doctor --check check.servicegraph.backend
```
## Related Checks
- `check.servicegraph.endpoints` - validates the rest of the service graph after the main backend is reachable
- `check.servicegraph.timeouts` - slow backend responses often trace back to timeout tuning