git.stella-ops.org/docs/doctor/articles/auth/token-service.md

checkId: check.auth.token-service
plugin: stellaops.doctor.auth
severity: fail
tags: auth, service, health

Token Service Health

What It Checks

Verifies the availability and performance of the token service endpoint (/connect/token). The check evaluates four conditions:

  1. Service unavailable -- token endpoint is not responding. Result: Fail with the endpoint URL and error message.
  2. Critically slow -- response time exceeds 2000ms. Result: Fail with actual response time and threshold.
  3. Slow -- response time exceeds 500ms but does not exceed 2000ms. Result: Warn with response time, threshold, and token issuance count.
  4. Healthy -- service is available and response time does not exceed 500ms. Result: Pass with response time, tokens issued in last 24 hours, and active session count.
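
The four conditions above amount to a small decision ladder. The following is a minimal sketch in shell, not the Doctor plugin's actual implementation. The function name `classify_token_service` is hypothetical, and treating a response time of exactly 500 ms or 2000 ms as not exceeding the threshold is an assumption.

```shell
# Hypothetical helper: maps a probe result to one of the check's outcomes.
# Thresholds are the ones stated in this article.
WARNING_THRESHOLD_MS=500
CRITICAL_THRESHOLD_MS=2000

# Usage: classify_token_service <YES|NO> <response_time_ms>
classify_token_service() {
  local available="$1" response_ms="$2"
  if [ "$available" != "YES" ]; then
    echo "Fail"   # 1. service unavailable
  elif [ "$response_ms" -gt "$CRITICAL_THRESHOLD_MS" ]; then
    echo "Fail"   # 2. critically slow
  elif [ "$response_ms" -gt "$WARNING_THRESHOLD_MS" ]; then
    echo "Warn"   # 3. slow
  else
    echo "Pass"   # 4. healthy (boundary handling is an assumption)
  fi
}
```

Availability is tested first because a response time is only meaningful once the endpoint has answered.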

Evidence collected: ServiceAvailable (YES/NO), Endpoint, ResponseTimeMs, CriticalThreshold (2000), WarningThreshold (500), TokensIssuedLast24h, ActiveSessions, Error.

The check always runs (CanRun returns true).

Why It Matters

The token service is the single point through which all access tokens are issued. If it is unavailable, no user can log in, no service can obtain a token, and API calls start failing with 401 once callers lack a valid token. Even when the service is up but slow, logins lag, automated integrations time out, and the platform feels unresponsive. This check is typically the first to detect Authority database issues or resource starvation.

Common Causes

  • Authority service not running (container stopped, process crashed)
  • Token endpoint misconfigured (wrong path, wrong port)
  • Database connectivity issue (Authority cannot query clients/keys)
  • Database performance issues (slow queries for token validation)
  • Service overloaded (high authentication request volume)
  • Resource contention (CPU/memory pressure on Authority host)
  • Higher than normal load (warning-level)
  • Database query performance degraded (warning-level)

How to Fix

Docker Compose

# Check Authority service status
docker compose -f devops/compose/docker-compose.stella-ops.yml ps authority

# View Authority service logs
docker compose -f devops/compose/docker-compose.stella-ops.yml logs authority --tail 200

# Restart Authority service
docker compose -f devops/compose/docker-compose.stella-ops.yml restart authority

# Test token endpoint directly
docker compose -f devops/compose/docker-compose.stella-ops.yml exec authority \
  curl -s -o /dev/null -w "%{http_code} %{time_total}s" http://localhost:80/connect/token

# Check database connectivity
docker compose -f devops/compose/docker-compose.stella-ops.yml exec authority \
  stella doctor run --check check.storage.postgres
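
The direct endpoint test above prints curl's total time in seconds, while the check's thresholds are in milliseconds. As a rough sketch, the helpers below wrap the same curl call and convert the timing so it can be compared against 500 ms and 2000 ms by eye. The function names `to_ms` and `probe_token_endpoint` and the 5-second timeout are assumptions made for illustration, not part of the `stella` tooling.

```shell
# Hypothetical helpers for probing the token endpoint by hand.

# Convert curl's %{time_total} (seconds, e.g. 0.734512) to rounded milliseconds.
to_ms() {
  awk -v s="$1" 'BEGIN { printf "%d", int(s * 1000 + 0.5) }'
}

# Usage: probe_token_endpoint <url>
probe_token_endpoint() {
  local url="$1" out code seconds
  # -s: silent, -o /dev/null: discard body, -w: print status code and total time.
  # --max-time 5 is an assumed timeout, not a value taken from the check.
  if ! out=$(curl -s -o /dev/null --max-time 5 \
      -w '%{http_code} %{time_total}' "$url"); then
    echo "unavailable (curl could not reach $url)"
    return 1
  fi
  code=${out%% *}      # text before the first space: HTTP status code
  seconds=${out##* }   # text after the last space: elapsed seconds
  echo "HTTP $code in $(to_ms "$seconds") ms"
}
```

For example, `probe_token_endpoint http://localhost:80/connect/token` can be run wherever that URL is reachable, such as inside the Authority container. A plain GET will usually return a 4xx status because OAuth 2.0 token endpoints expect POST; any HTTP response still shows that the endpoint is reachable.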

Bare Metal / systemd

# Check authority service status
stella auth status

# Restart authority service
stella service restart authority

# Check database connectivity
stella doctor run --check check.storage.postgres

# Monitor service metrics
stella auth metrics --period 1h

# Review database performance
stella doctor run --check check.storage.performance

# Watch metrics in real-time (warning-level slowness)
stella auth metrics --watch

Kubernetes / Helm

# Check authority pod status
kubectl get pods -l app.kubernetes.io/component=authority -n stellaops

# View pod logs
kubectl logs -l app.kubernetes.io/component=authority -n stellaops --tail=200

# Check resource usage
kubectl top pods -l app.kubernetes.io/component=authority -n stellaops

# Restart authority pods
kubectl rollout restart deployment/stellaops-authority -n stellaops

# Scale up if under load
kubectl scale deployment stellaops-authority --replicas=3 -n stellaops

# Check liveness/readiness probe status
kubectl describe pod -l app.kubernetes.io/component=authority -n stellaops | grep -A5 "Liveness\|Readiness"
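
After a restart or scale-up, it can help to wait for the rollout to finish before re-running the check. This is a small sketch using `kubectl rollout status`; the wrapper name `wait_for_authority_rollout` and the 120-second default timeout are assumptions, while the deployment name and namespace are the ones used in the commands above.

```shell
# Hypothetical wrapper: block until the Authority deployment has rolled out.
# Usage: wait_for_authority_rollout [namespace] [timeout]
wait_for_authority_rollout() {
  local namespace="${1:-stellaops}" timeout="${2:-120s}"
  # Exits non-zero if the rollout does not complete within the timeout.
  kubectl rollout status deployment/stellaops-authority \
    -n "$namespace" --timeout="$timeout"
}
```

A non-zero exit status suggests that the pods are still not becoming ready, in which case the pod logs and probe status above are the next places to look.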

Verification

stella doctor run --check check.auth.token-service

Related Checks

  • check.auth.config -- auth must be configured before the token service can function
  • check.auth.signing-key -- token issuance requires a valid signing key
  • check.auth.oidc -- if delegating to external OIDC, that provider must also be healthy