doctor: complete runtime check documentation sprint

Signed-off-by: master <>
master
2026-03-31 23:26:24 +03:00
parent 404d50bcb7
commit 152c1b1357
54 changed files with 2210 additions and 258 deletions


@@ -0,0 +1,53 @@
---
checkId: check.servicegraph.endpoints
plugin: stellaops.doctor.servicegraph
severity: fail
tags: [servicegraph, services, endpoints, connectivity]
---
# Service Endpoints
## What It Checks
Collects configured service URLs for Authority, Scanner, Concelier, Excititor, Attestor, VexLens, and Gateway, appends `/health`, and probes each endpoint.
The check fails when any configured endpoint is unreachable or returns a non-success status. If no endpoints are configured, the check is skipped.
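The behaviour described above can be approximated from a shell. This is a minimal sketch, not the plugin's implementation: the function names `health_url` and `probe_endpoints`, the trailing-slash handling, and the 5-second timeout are all assumptions made for illustration.
```bash
# Sketch only: approximates the documented behaviour, not the plugin's code.

# Build the probe URL: drop one trailing slash (assumed normalisation),
# then append /health.
health_url() {
  local base="${1%/}"
  printf '%s/health\n' "$base"
}

# Probe every base URL passed as an argument.
# Returns non-zero if any probe fails; reports a skip when no URLs are given.
probe_endpoints() {
  if [ "$#" -eq 0 ]; then
    echo "skip: no endpoints configured"
    return 0
  fi
  local failed=0 base url
  for base in "$@"; do
    url="$(health_url "$base")"
    # -f makes curl exit non-zero on HTTP status >= 400;
    # --max-time bounds how long an unreachable endpoint can block.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "ok:   $url"
    else
      echo "fail: $url"
      failed=1
    fi
  done
  return "$failed"
}
```
For example, `probe_endpoints http://authority-web:8080 http://scanner-web:8080` would probe both URLs and exit non-zero if either one is unreachable or returns an error status.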
## Why It Matters
Stella Ops is a multi-service platform. A single broken internal endpoint can stall release orchestration, evidence generation, or advisory workflows even when the main web process is alive.
## Common Causes
- One or more `StellaOps:*Url` values are missing or point to the wrong internal service name
- Internal DNS or network routing is broken
- The target workload is up but not exposing `/health`
## How to Fix
### Docker Compose
Set the internal URLs explicitly:
```yaml
StellaOps__AuthorityUrl: http://authority-web:8080
StellaOps__ScannerUrl: http://scanner-web:8080
StellaOps__GatewayUrl: http://web:8080
```
Probe each endpoint from the Doctor container:
```bash
docker compose -f devops/compose/docker-compose.stella-ops.yml exec doctor-web curl -fsS http://authority-web:8080/health
docker compose -f devops/compose/docker-compose.stella-ops.yml exec doctor-web curl -fsS http://scanner-web:8080/health
```
### Bare Metal / systemd
Confirm that the service-discovery or reverse-proxy names used in the configured URLs resolve from the Doctor host, and that each one answers on `/health`.
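One way to run that confirmation on the Doctor host is sketched below. The helper names `url_host` and `check_from_host` are hypothetical, the host `authority.internal` in the usage note is a placeholder, and `getent` is assumed to be available (as it is on most glibc-based Linux systems).
```bash
# Sketch only: helper names and hosts are illustrative assumptions.

# Extract the host from a URL such as http://authority.internal:8080/health.
# Does not handle IPv6 literals in brackets.
url_host() {
  local rest="${1#*://}"       # drop the scheme
  rest="${rest%%/*}"           # drop the path
  printf '%s\n' "${rest%%:*}"  # drop the port
}

# Resolve the host through the system resolver, then probe /health.
check_from_host() {
  local base="${1%/}"
  if ! getent hosts "$(url_host "$base")" > /dev/null; then
    echo "unresolved: $(url_host "$base")"
    return 1
  fi
  curl -fsS --max-time 5 -o /dev/null "$base/health"
}
```
Running `check_from_host http://authority.internal:8080` separates the two failure modes: an `unresolved:` message points at DNS or service discovery, while a curl error points at routing or the service itself.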
### Kubernetes / Helm
Use cluster-local service DNS names, and check that each workload serves `/health` on the same port the URL references.
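As a rough sketch of both steps, the helpers below compose a cluster-local URL and inspect the Service behind it. The function names, the `stellaops` namespace and the `authority-web` Service name in the usage note are assumptions, and `svc.cluster.local` is the default cluster domain, which a cluster may override.
```bash
# Sketch only: names are illustrative; adjust to the actual release.

# Compose http://<service>.<namespace>.svc.cluster.local:<port>
# (assumes the default cluster domain).
cluster_url() {
  printf 'http://%s.%s.svc.cluster.local:%s\n' "$1" "$2" "$3"
}

# Print the ports the Service exposes, then its backing endpoints.
# An empty endpoints list means no ready pod sits behind the Service.
check_service() {
  local svc="$1" ns="$2"
  kubectl -n "$ns" get service "$svc" -o jsonpath='{.spec.ports[*].port}'
  echo
  kubectl -n "$ns" get endpoints "$svc"
}
```
For instance, `cluster_url authority-web stellaops 8080` yields a value suitable for `StellaOps__AuthorityUrl`, and `check_service authority-web stellaops` shows whether port 8080 is actually exposed.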
## Verification
```bash
stella doctor --check check.servicegraph.endpoints
```
## Related Checks
- `check.servicegraph.backend` - the backend is usually the first endpoint operators validate
- `check.servicegraph.mq` - asynchronous workflows also depend on messaging, not only HTTP endpoints