Doctor plugin checks: implement health check classes and documentation
Implement remediation-aware health checks across all Doctor plugin modules (Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment, EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release, Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation, Authority, Core, Cryptography, Database, Docker, Integration, Notify, Observability, Security, ServiceGraph, Sources, Verification). Each check now emits structured remediation metadata (severity, category, runbook links, and fix suggestions) consumed by the Doctor dashboard remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
71
docs/doctor/articles/integration/ci-system-connectivity.md
Normal file
@@ -0,0 +1,71 @@
---
checkId: check.integration.ci.system
plugin: stellaops.doctor.integration
severity: warn
tags: [integration, ci, cd, jenkins, gitlab, github]
---
# CI System Connectivity

## What It Checks
Iterates over all CI/CD systems defined under `CI:Systems` (or the legacy `CI:Url` single-system key). For each system it sends an HTTP GET to a type-specific health endpoint (Jenkins `/api/json`, GitLab `/api/v4/version`, GitHub `/rate_limit`, Azure DevOps `/_apis/connectionData`, or generic `/health`), sets the appropriate auth header (Bearer for GitHub/generic, `PRIVATE-TOKEN` for GitLab), and records reachability, authentication success, and latency. If the system is reachable and authenticated, it optionally queries runner/agent status (Jenkins `/computer/api/json`, GitLab `/api/v4/runners?status=online`). The check **fails** when any system is unreachable or returns 401/403, **warns** when all systems are reachable but one or more has zero available runners (out of a non-zero total), and **passes** otherwise.
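
The outcome rules above can be sketched as a small shell helper. This is an illustration only, not the check's actual implementation: `ci_outcome` is a hypothetical name, and it assumes `000` stands for "no HTTP response", which is what curl's `%{http_code}` prints when the connection fails.

```bash
# Hypothetical helper: map one CI system's probe result to a check outcome.
#   $1 = HTTP status of the health endpoint ("000" = no response received)
#   $2 = available runners (optional, default 0)
#   $3 = total runners (optional, default 0)
ci_outcome() {
  local status="$1" available="${2:-0}" total="${3:-0}"
  case "$status" in
    000|401|403) echo fail; return ;;   # unreachable or rejected credentials
  esac
  # Runner exhaustion only counts when the system reports a non-zero total.
  if [ "$total" -gt 0 ] && [ "$available" -eq 0 ]; then
    echo warn
  else
    echo pass
  fi
}
```

To try it against a real endpoint, feed it the status from `curl -s -o /dev/null -w '%{http_code}' https://ci.example.com/api/json`.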

## Why It Matters
CI/CD systems are the trigger point for automated builds, tests, and release pipelines. If a CI system is unreachable or its credentials have expired, new commits will not be built, security scans will not run, and promotions will stall. Runner exhaustion has the same effect: pipelines queue indefinitely, delaying releases and blocking evidence collection.

## Common Causes
- CI system is down or undergoing maintenance
- Network connectivity issue between Stella Ops and the CI host
- API credentials (token or password) have expired or been rotated
- Firewall or security group blocking the CI API port
- All CI runners/agents are offline or busy

## How to Fix

### Docker Compose
```bash
# Verify the CI URL is correct in your environment file
grep -E '^CI__' .env

# Test connectivity from within the Docker network
docker compose exec gateway curl -sv https://ci.example.com/api/json

# Rotate or set a new API token
echo 'CI__Systems__0__ApiToken=<new-token>' >> .env
docker compose restart gateway
```

### Bare Metal / systemd
```bash
# Check config in appsettings
jq '.CI' /etc/stellaops/appsettings.Production.json

# Test connectivity
curl -H "Authorization: Bearer $CI_TOKEN" https://ci.example.com/api/json

# Update the token
sudo nano /etc/stellaops/appsettings.Production.json
sudo systemctl restart stellaops-platform
```

### Kubernetes / Helm
```yaml
# values.yaml
ci:
  systems:
    - name: jenkins-prod
      url: https://ci.example.com
      type: jenkins
      apiToken: <token>  # or use existingSecret
```
```bash
helm upgrade stellaops ./chart -f values.yaml
```

## Verification
```
stella doctor run --check check.integration.ci.system
```

## Related Checks
- `check.integration.webhooks` -- validates webhook delivery from CI events
- `check.integration.git` -- validates Git provider reachability (often same host as CI)
66
docs/doctor/articles/integration/git-provider-api.md
Normal file
@@ -0,0 +1,66 @@
---
checkId: check.integration.git
plugin: stellaops.doctor.integration
severity: warn
tags: [connectivity, git, scm]
---
# Git Provider API

## What It Checks
Resolves the configured Git provider URL from `Git:Url`, `Scm:Url`, `GitHub:Url`, `GitLab:Url`, or `Gitea:Url`. Auto-detects the provider type (GitHub, GitLab, Gitea, Bitbucket, Azure DevOps) from the URL and sends an HTTP GET to the corresponding API endpoint (e.g., GitHub -> `api.github.com`, GitLab -> `/api/v4/version`, Gitea -> `/api/v1/version`, Bitbucket -> `/rest/api/1.0/application-properties`). The check **passes** if the response is 2xx, 401, or 403 (reachable even if auth is needed), **warns** on other non-error status codes, and **fails** on connection errors or exceptions.
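
A rough sketch of that status mapping, for reproducing the result by hand. `git_outcome` is a hypothetical helper, not part of Stella Ops. It assumes `000` means the connection failed (curl's `%{http_code}` convention) and, as a simplification, treats every remaining status as a warning.

```bash
# Hypothetical helper: map the Git API probe's HTTP status to a check outcome.
git_outcome() {
  case "$1" in
    000)         echo fail ;;  # no HTTP response: connection error
    2??|401|403) echo pass ;;  # reachable, even if authentication is required
    *)           echo warn ;;  # simplification: any other status is a warning
  esac
}
```

For example, `git_outcome "$(curl -s -o /dev/null -w '%{http_code}' https://git.example.com/api/v4/version)"` prints the outcome for a GitLab-style endpoint.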

## Why It Matters
Git provider connectivity is essential for source-code scanning, SBOM ingestion, webhook event reception, and commit-status reporting. A misconfigured or unreachable Git URL silently breaks SCM-triggered workflows and prevents evidence collection from source repositories.

## Common Causes
- Git provider URL is incorrect or has a trailing-path typo
- Network connectivity issues or DNS failure
- Git provider service is down or undergoing maintenance
- Provider uses a non-standard API path

## How to Fix

### Docker Compose
```bash
# Check current Git URL (case-insensitive, so Git__Url and GIT__URL both match)
grep -iE 'GIT__URL|SCM__URL|GITHUB__URL' .env

# Test from inside the network
docker compose exec gateway curl -sv https://git.example.com/api/v4/version

# Update the URL
echo 'Git__Url=https://git.example.com' >> .env
docker compose restart gateway
```

### Bare Metal / systemd
```bash
# Verify configuration
jq '.Git' /etc/stellaops/appsettings.Production.json

# Test connectivity
curl -v https://git.example.com/api/v4/version

# Fix the URL
sudo nano /etc/stellaops/appsettings.Production.json
sudo systemctl restart stellaops-platform
```

### Kubernetes / Helm
```yaml
# values.yaml
git:
  url: https://git.example.com
```
```bash
helm upgrade stellaops ./chart -f values.yaml
```

## Verification
```
stella doctor run --check check.integration.git
```

## Related Checks
- `check.integration.ci.system` -- CI systems often share the same Git host
- `check.integration.webhooks` -- webhook endpoints receive events from Git providers
72
docs/doctor/articles/integration/ldap-connectivity.md
Normal file
@@ -0,0 +1,72 @@
---
checkId: check.integration.ldap
plugin: stellaops.doctor.integration
severity: warn
tags: [connectivity, ldap, directory, auth]
---
# LDAP/AD Connectivity

## What It Checks
Reads the LDAP host from `Ldap:Host`, `ActiveDirectory:Host`, or `Authority:Ldap:Host` and the port from the corresponding `:Port` key (defaulting to 389, or 636 when `UseSsl` is true). Opens a raw TCP connection to the host and port with a 5-second timeout. The check **passes** if the TCP connection succeeds and **fails** on timeout, socket error, or connection refusal.
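
The port defaulting and the TCP probe can be approximated in shell as follows. Both function names are hypothetical. The probe assumes bash (for `/dev/tcp`) and the coreutils `timeout` command are available on the machine running it.

```bash
# Hypothetical helper: pick the LDAP port the way the check describes.
#   $1 = UseSsl ("true" or "false"), $2 = explicitly configured port (optional)
ldap_port() {
  local use_ssl="${1:-false}" port="${2:-}"
  if [ -n "$port" ]; then
    echo "$port"          # an explicit :Port value wins
  elif [ "$use_ssl" = "true" ]; then
    echo 636              # LDAPS default
  else
    echo 389              # plain LDAP default
  fi
}

# Hypothetical helper: raw TCP connect with a 5-second limit.
#   $1 = host, $2 = port
tcp_probe() {
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; then
    echo pass
  else
    echo fail             # timeout, refusal, or resolution failure
  fi
}
```

Usage: `tcp_probe ldap.example.com "$(ldap_port true)"` tests the LDAPS port. Like the check itself, this only proves that the port accepts connections, not that a bind would succeed.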

## Why It Matters
LDAP or Active Directory integration is used for user authentication, group synchronization, and role mapping. If the LDAP server is unreachable, users cannot log in via directory credentials, group-based access policies cannot be evaluated, and new user provisioning stops. This directly impacts operator access to the control plane.

## Common Causes
- LDAP/AD server is not running or is being restarted
- Firewall blocking LDAP port (389) or LDAPS port (636)
- DNS resolution failure for the LDAP hostname
- Network unreachable between Stella Ops and the directory server
- Incorrect host or port in configuration

## How to Fix

### Docker Compose
```bash
# Check LDAP configuration (case-insensitive, so Ldap__Host and LDAP__HOST both match)
grep -iE 'LDAP__|ACTIVEDIRECTORY__' .env

# Test TCP connectivity from the gateway container
docker compose exec gateway bash -c "echo > /dev/tcp/ldap.example.com/389 && echo OK || echo FAIL"

# Update LDAP host/port
echo 'Ldap__Host=ldap.example.com' >> .env
echo 'Ldap__Port=636' >> .env
echo 'Ldap__UseSsl=true' >> .env
docker compose restart gateway
```

### Bare Metal / systemd
```bash
# Verify configuration
jq '.Ldap' /etc/stellaops/appsettings.Production.json

# Test connectivity
telnet ldap.example.com 389
# If the connection fails, check that the hostname resolves
nslookup ldap.example.com

# Update configuration
sudo nano /etc/stellaops/appsettings.Production.json
sudo systemctl restart stellaops-platform
```

### Kubernetes / Helm
```yaml
# values.yaml
ldap:
  host: ldap.example.com
  port: 636
  useSsl: true
```
```bash
helm upgrade stellaops ./chart -f values.yaml
```

## Verification
```
stella doctor run --check check.integration.ldap
```

## Related Checks
- `check.integration.oidc` -- OIDC provider connectivity (alternative auth mechanism)
73
docs/doctor/articles/integration/object-storage.md
Normal file
@@ -0,0 +1,73 @@
---
checkId: check.integration.s3.storage
plugin: stellaops.doctor.integration
severity: warn
tags: [connectivity, s3, storage]
---
# Object Storage Connectivity

## What It Checks
Reads the S3 endpoint from `S3:Endpoint`, `Storage:S3:Endpoint`, or `AWS:S3:ServiceURL`. Parses the URI to extract host and port (defaulting to 443 for HTTPS, 80 for HTTP). Opens a raw TCP connection with a 5-second timeout. The check **passes** if the TCP connection succeeds and **fails** on timeout, socket error, invalid URI format, or connection refusal.
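
To see which host and port an endpoint value resolves to, the parsing step can be imitated with plain parameter expansion. `endpoint_host_port` is a hypothetical helper written for this article. It is deliberately simple: it handles `http://` and `https://` URLs only and does not understand IPv6 literals or embedded credentials.

```bash
# Hypothetical helper: print "host port" for an S3 endpoint URL.
# Returns non-zero for anything that is not an http:// or https:// URL.
endpoint_host_port() {
  local url="$1" scheme rest hostport host port
  case "$url" in
    https://*) scheme=https ;;
    http://*)  scheme=http ;;
    *)         return 1 ;;            # invalid URI format
  esac
  rest="${url#*://}"                  # drop the scheme
  hostport="${rest%%/*}"              # drop any path
  host="${hostport%%:*}"
  if [ "$hostport" != "$host" ]; then
    port="${hostport##*:}"            # explicit port
  elif [ "$scheme" = https ]; then
    port=443
  else
    port=80
  fi
  [ -n "$host" ] || return 1
  echo "$host $port"
}
```

An endpoint written without a scheme, such as `minio:9000`, is rejected here, which corresponds to the "invalid URI format" failure mode.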

## Why It Matters
S3-compatible object storage is used for evidence packet archival, SBOM storage, offline kit distribution, and large artifact persistence. If the storage endpoint is unreachable, evidence export fails, SBOM uploads are rejected, and offline kit generation cannot complete. This blocks audit compliance workflows and air-gap distribution.

## Common Causes
- S3 endpoint (MinIO, AWS S3, or compatible) is unreachable
- Network connectivity issues or DNS failure
- Firewall blocking the storage port
- Invalid endpoint URL format in configuration
- MinIO or S3-compatible service is not running

## How to Fix

### Docker Compose
```bash
# Check S3 configuration (case-insensitive, so Storage__S3__Endpoint also matches)
grep -iE 'S3__|STORAGE__S3' .env

# Test connectivity to MinIO
docker compose exec gateway curl -v http://minio:9000/minio/health/live

# Restart MinIO if stopped
docker compose up -d minio

# Update endpoint
echo 'S3__Endpoint=http://minio:9000' >> .env
docker compose restart platform
```

### Bare Metal / systemd
```bash
# Verify S3 configuration
jq '.S3' /etc/stellaops/appsettings.Production.json

# Test connectivity
curl -v http://minio.example.com:9000/minio/health/live

# Check if MinIO is running
sudo systemctl status minio

# Update configuration
sudo nano /etc/stellaops/appsettings.Production.json
sudo systemctl restart stellaops-platform
```

### Kubernetes / Helm
```yaml
# values.yaml
s3:
  endpoint: http://minio.storage.svc.cluster.local:9000
  bucket: stellaops-evidence
```
```bash
helm upgrade stellaops ./chart -f values.yaml
```

## Verification
```
stella doctor run --check check.integration.s3.storage
```

## Related Checks
- `check.integration.oci.registry` -- OCI registries may also store artifacts
@@ -0,0 +1,70 @@
---
checkId: check.integration.oci.registry
plugin: stellaops.doctor.integration
severity: warn
tags: [connectivity, oci, registry]
---
# OCI Registry Connectivity

## What It Checks
Reads the registry URL from `OCI:RegistryUrl` or `Registry:Url`. Sends an HTTP GET to `<registryUrl>/v2/` (the OCI Distribution Spec base endpoint). The check **passes** if the response is 200 (open registry) or 401 (registry reachable, auth required), **warns** on any other status code, and **fails** on connection errors.
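
The probe URL and the outcome mapping can be reproduced with two small helpers. Both names are hypothetical. Stripping a trailing slash before appending `/v2/` is an assumption about how the URL is normalised, not something the check is documented to do, and `000` again stands for "no HTTP response".

```bash
# Hypothetical helper: build the base-endpoint URL from the configured registry URL.
registry_probe_url() {
  echo "${1%/}/v2/"        # "${1%/}" drops one trailing slash, if present
}

# Hypothetical helper: map the HTTP status of GET /v2/ to a check outcome.
registry_outcome() {
  case "$1" in
    000)     echo fail ;;  # connection error
    200|401) echo pass ;;  # open registry, or reachable with auth required
    *)       echo warn ;;
  esac
}
```

A 404 from `/v2/` therefore shows up as a warning, which usually points at a path prefix problem or a server that does not implement the Distribution API.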

## Why It Matters
The OCI registry is the central artifact store for container images, SBOMs, attestations, and signatures. If the registry is unreachable, image pulls fail during deployment, SBOM scans cannot fetch manifests, attestation verification cannot retrieve signatures, and promotions are blocked. This is a foundational dependency for nearly every Stella Ops workflow.

## Common Causes
- Registry URL is incorrect (typo, wrong port, wrong scheme)
- Network connectivity issues between Stella Ops and the registry
- Registry service is down or restarting
- Registry does not support the OCI Distribution spec at `/v2/`
- Registry endpoint is misconfigured (path prefix required)

## How to Fix

### Docker Compose
```bash
# Check registry configuration (case-insensitive, so OCI__RegistryUrl also matches)
grep -iE 'OCI__REGISTRYURL|REGISTRY__URL' .env

# Test the /v2/ endpoint from inside the network
docker compose exec gateway curl -sv https://registry.example.com/v2/

# Update registry URL
echo 'OCI__RegistryUrl=https://registry.example.com' >> .env
docker compose restart platform
```

### Bare Metal / systemd
```bash
# Verify configuration
jq '.OCI' /etc/stellaops/appsettings.Production.json

# Test connectivity
curl -v https://registry.example.com/v2/

# Fix configuration
sudo nano /etc/stellaops/appsettings.Production.json
sudo systemctl restart stellaops-platform
```

### Kubernetes / Helm
```yaml
# values.yaml
oci:
  registryUrl: https://registry.example.com
```
```bash
helm upgrade stellaops ./chart -f values.yaml
```

## Verification
```
stella doctor run --check check.integration.oci.registry
```

## Related Checks
- `check.integration.oci.credentials` -- validates registry credentials
- `check.integration.oci.pull` -- verifies pull authorization
- `check.integration.oci.push` -- verifies push authorization
- `check.integration.oci.referrers` -- checks OCI 1.1 referrers API support
- `check.integration.oci.capabilities` -- probes full capability matrix
75
docs/doctor/articles/integration/oidc-provider.md
Normal file
@@ -0,0 +1,75 @@
---
checkId: check.integration.oidc
plugin: stellaops.doctor.integration
severity: warn
tags: [connectivity, oidc, auth, identity]
---
# OIDC Provider

## What It Checks
Reads the OIDC issuer URL from `Oidc:Issuer`, `Authentication:Oidc:Issuer`, or `Authority:Oidc:Issuer`. Fetches the OpenID Connect discovery document at `<issuer>/.well-known/openid-configuration`. On a successful response, parses the JSON for three required endpoints: `authorization_endpoint`, `token_endpoint`, and `jwks_uri`. The check **passes** if all three are present, **warns** if the discovery document is incomplete (missing one or more endpoints), and **fails** if the discovery endpoint returns a non-success status code or the connection fails.
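
The completeness test on the discovery document can be approximated without extra tooling. `discovery_outcome` is a hypothetical helper. It uses plain text matching rather than a JSON parser, so it is only a rough stand-in for the real check: a key with the same name nested elsewhere in the document would also count as present.

```bash
# Hypothetical helper: read a discovery document on stdin and report whether
# the three required endpoint keys appear in it.
discovery_outcome() {
  local doc missing=0 key
  doc="$(cat)"
  for key in authorization_endpoint token_endpoint jwks_uri; do
    # Look for the quoted key followed by a colon.
    printf '%s' "$doc" | grep -q "\"$key\"[[:space:]]*:" || missing=$((missing + 1))
  done
  if [ "$missing" -eq 0 ]; then
    echo pass
  else
    echo warn             # incomplete discovery document
  fi
}
```

Usage: `curl -s https://auth.example.com/.well-known/openid-configuration | discovery_outcome`. A failed fetch is a separate case (the check fails) and is not modelled here.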

## Why It Matters
OIDC authentication is the primary identity mechanism for Stella Ops operators and API clients. If the OIDC provider is unreachable or misconfigured, users cannot log in, API tokens cannot be validated, and all authenticated workflows halt. An incomplete discovery document causes subtle failures where some auth flows work but others (e.g., token refresh) silently break.

## Common Causes
- OIDC issuer URL is incorrect or has a trailing slash issue
- OIDC provider (Authority, Keycloak, Azure AD, etc.) is down
- Network connectivity issues between Stella Ops and the identity provider
- Provider does not support OpenID Connect discovery
- Discovery document is missing required endpoints

## How to Fix

### Docker Compose
```bash
# Check OIDC configuration (case-insensitive, so Oidc__Issuer also matches)
grep -iE 'OIDC__ISSUER|AUTHENTICATION__OIDC' .env

# Test discovery endpoint
docker compose exec gateway curl -sv \
  https://auth.example.com/.well-known/openid-configuration

# Verify the Authority service is running
docker compose ps authority

# Update issuer URL
echo 'Oidc__Issuer=https://auth.example.com' >> .env
docker compose restart gateway platform
```

### Bare Metal / systemd
```bash
# Verify configuration
jq '.Oidc' /etc/stellaops/appsettings.Production.json

# Test discovery
curl -v https://auth.example.com/.well-known/openid-configuration

# Check required fields in the response
curl -s https://auth.example.com/.well-known/openid-configuration \
  | jq '{authorization_endpoint, token_endpoint, jwks_uri}'

# Fix configuration
sudo nano /etc/stellaops/appsettings.Production.json
sudo systemctl restart stellaops-platform
```

### Kubernetes / Helm
```yaml
# values.yaml
oidc:
  issuer: https://auth.example.com
  clientId: stellaops-ui
```
```bash
helm upgrade stellaops ./chart -f values.yaml
```

## Verification
```
stella doctor run --check check.integration.oidc
```

## Related Checks
- `check.integration.ldap` -- alternative directory-based authentication
@@ -0,0 +1,89 @@
---
checkId: check.integration.oci.capabilities
plugin: stellaops.doctor.integration
severity: info
tags: [registry, oci, capabilities, compatibility]
---
# OCI Registry Capability Matrix

## What It Checks
Probes the configured OCI registry for five capabilities using a test repository (`OCI:TestRepository`, default `library/alpine`):

1. **Distribution version** -- GET `/v2/`, reads `OCI-Distribution-API-Version` or `Docker-Distribution-API-Version` header.
2. **Referrers API** -- GET `/v2/<repo>/referrers/<digest>` with OCI accept header; passes if 200 or if a 404 response contains OCI index JSON.
3. **Chunked upload** -- POST `/v2/<repo>/blobs/uploads/`; passes on 202 Accepted (upload session is immediately cancelled).
4. **Cross-repo mount** -- POST `/v2/<repo>/blobs/uploads/?mount=<digest>&from=library/alpine`; passes on 201 Created or 202 Accepted.
5. **Delete support** (manifests and blobs) -- OPTIONS request to check if `DELETE` appears in the `Allow` header.

Calculates a capability score (N/5). The check **warns** if the referrers API is unsupported, returns **info** if any other capability is missing, and **passes** if all 5 are supported. It **fails** on connection errors.
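
The scoring rule can be written down as a short sketch. `capability_outcome` is a hypothetical helper. It takes five flags (1 = supported, 0 = unsupported) in the order listed above, with the referrers API in second position.

```bash
# Hypothetical helper: turn five capability flags into "<outcome> <score>/5".
# Argument order: distribution-version referrers chunked-upload cross-repo-mount delete
capability_outcome() {
  local referrers="$2" score=0 flag
  for flag in "$@"; do
    score=$((score + flag))
  done
  if [ "$referrers" -eq 0 ]; then
    echo "warn $score/5"   # a missing referrers API takes precedence
  elif [ "$score" -lt 5 ]; then
    echo "info $score/5"   # some other capability is missing
  else
    echo "pass $score/5"
  fi
}
```

For instance, a registry with deletes disabled but everything else available gives `info 4/5`, while one without the referrers API gives `warn 4/5`.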

## Why It Matters
Different OCI registries support different subsets of the OCI Distribution Spec. Stella Ops uses referrers for attestation linking, chunked uploads for large SBOMs, cross-repo mounts for efficient promotion, and deletes for garbage collection. Knowing the capability matrix upfront prevents mysterious failures during release operations and allows operators to configure appropriate fallbacks.

## Common Causes
- Registry does not implement OCI Distribution Spec v1.1 (no referrers API)
- Registry has delete operations disabled by policy
- Chunked upload is disabled in registry configuration
- Cross-repo mount is not supported by the registry implementation
- Registry version is too old for newer OCI features

## How to Fix

### Docker Compose
```bash
# Check registry type and version
docker compose exec gateway curl -sv https://registry.example.com/v2/ \
  -o /dev/null 2>&1 | grep -i 'distribution-api-version'

# If referrers API is missing, consider upgrading the registry
# Harbor 2.6+, Quay 3.12+, ACR, ECR, GCR/Artifact Registry support referrers

# Enable delete in Harbor
# Update harbor.yml: delete_enabled: true
# Restart Harbor
```

### Bare Metal / systemd
```bash
# Test referrers API directly
curl -H "Accept: application/vnd.oci.image.index.v1+json" \
  https://registry.example.com/v2/library/alpine/referrers/sha256:abc...

# Test chunked upload
curl -X POST https://registry.example.com/v2/test/blobs/uploads/

# Enable delete in Docker Distribution
# In /etc/docker/registry/config.yml:
# storage:
#   delete:
#     enabled: true
sudo systemctl restart docker-registry
```

### Kubernetes / Helm
```yaml
# values.yaml (for Harbor)
harbor:
  registry:
    deleteEnabled: true

# values.yaml (for Stella Ops)
oci:
  registryUrl: https://registry.example.com
  testRepository: library/alpine
```
```bash
helm upgrade stellaops ./chart -f values.yaml
```

## Verification
```
stella doctor run --check check.integration.oci.capabilities
```

## Related Checks
- `check.integration.oci.registry` -- basic registry connectivity
- `check.integration.oci.referrers` -- focused referrers API check with digest resolution
- `check.integration.oci.credentials` -- credential validation
- `check.integration.oci.pull` -- pull authorization
- `check.integration.oci.push` -- push authorization
76
docs/doctor/articles/integration/registry-credentials.md
Normal file
@@ -0,0 +1,76 @@
---
checkId: check.integration.oci.credentials
plugin: stellaops.doctor.integration
severity: fail
tags: [registry, oci, credentials, secrets, auth]
---
# OCI Registry Credentials

## What It Checks
Determines the authentication method from configuration: bearer token (`OCI:Token` / `Registry:Token`), basic auth (`OCI:Username` + `OCI:Password` / `Registry:Username` + `Registry:Password`), or anonymous. Immediately **fails** if a username is provided without a password. Then validates credentials by sending an authenticated HTTP GET to `<registryUrl>/v2/`. The check **passes** on 200 OK, or on 401 if the response includes a `WWW-Authenticate: Bearer` challenge and basic credentials are configured (OAuth2 token exchange scenario). It **fails** on any other 401 (invalid credentials), on 403 (forbidden), and on connection errors or timeouts.
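
Because the 401 handling is the subtle part, here is the decision order as a sketch. `credentials_outcome` is a hypothetical helper. Treating every status other than 200 and the special-cased 401 as a failure is an assumption made for this sketch, since the description above only names 401, 403, and connection errors.

```bash
# Hypothetical helper: decide the outcome from configuration and probe result.
#   $1 = username, $2 = password, $3 = bearer token (any may be empty)
#   $4 = HTTP status of GET /v2/, $5 = WWW-Authenticate header value (optional)
credentials_outcome() {
  local user="${1:-}" pass="${2:-}" token="${3:-}" status="${4:-}" challenge="${5:-}"
  # Username without password fails before any request is made.
  if [ -n "$user" ] && [ -z "$pass" ]; then
    echo fail
    return
  fi
  case "$status" in
    200) echo pass ;;
    401)
      case "$challenge" in
        [Bb]earer*)
          # Bearer challenge plus basic credentials: token exchange scenario.
          if [ -n "$user" ]; then echo pass; else echo fail; fi ;;
        *) echo fail ;;
      esac ;;
    *) echo fail ;;        # 403, no response, and (by assumption) anything else
  esac
}
```

The token argument is accepted only to mirror the three configuration modes. In this sketch a bearer token that draws a 401 is always a failure.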

## Why It Matters
Invalid or expired registry credentials cause image pull/push failures across all deployment pipelines. Because credentials are often rotated on a schedule, this check provides early detection of expired tokens before they silently break promotions, SBOM ingestion, or attestation storage. A username-without-password misconfiguration indicates a secret reference that failed to resolve.

## Common Causes
- Credentials are invalid or have been rotated without updating the configuration
- Token has been revoked by the registry administrator
- Username provided without a corresponding password (broken secret reference)
- Service account token expired
- IP address or network not in the registry's allowlist

## How to Fix

### Docker Compose
```bash
# Check credential configuration (case-insensitive; note this prints secrets to the terminal)
grep -iE 'OCI__USERNAME|OCI__PASSWORD|OCI__TOKEN|REGISTRY__' .env

# Test credentials manually
docker login registry.example.com

# Rotate credentials
echo 'OCI__Username=stellaops-svc' >> .env
echo 'OCI__Password=<new-password>' >> .env
docker compose restart platform
```

### Bare Metal / systemd
```bash
# Check credential configuration (secrets are masked)
jq '.OCI | {Username, Password: (if .Password then "****" else null end), Token: (if .Token then "****" else null end)}' \
  /etc/stellaops/appsettings.Production.json

# Test with curl (quote the argument so the shell does not treat < and > as redirections)
curl -u 'stellaops-svc:<password>' https://registry.example.com/v2/

# Update credentials
sudo nano /etc/stellaops/appsettings.Production.json
sudo systemctl restart stellaops-platform
```

### Kubernetes / Helm
```yaml
# values.yaml
oci:
  registryUrl: https://registry.example.com
  existingSecret: stellaops-registry-creds  # Secret with username/password keys
```
```bash
# Create or update the secret
kubectl create secret generic stellaops-registry-creds \
  --from-literal=username=stellaops-svc \
  --from-literal=password='<new-password>' \
  --dry-run=client -o yaml | kubectl apply -f -

helm upgrade stellaops ./chart -f values.yaml
```

## Verification
```
stella doctor run --check check.integration.oci.credentials
```

## Related Checks
- `check.integration.oci.registry` -- basic connectivity (does not test auth)
- `check.integration.oci.pull` -- verifies pull authorization with these credentials
- `check.integration.oci.push` -- verifies push authorization with these credentials
@@ -0,0 +1,72 @@
---
checkId: check.integration.oci.pull
plugin: stellaops.doctor.integration
severity: fail
tags: [registry, oci, pull, authorization, credentials]
---
# OCI Registry Pull Authorization

## What It Checks
Sends an authenticated HTTP HEAD request to `<registryUrl>/v2/<testRepo>/manifests/<testTag>` with OCI and Docker manifest accept headers. Uses the test repository from `OCI:TestRepository` (default `library/alpine`) and test tag from `OCI:TestTag` (default `latest`). The check **passes** on 2xx (records manifest digest and content type), returns **info** on 404 (test image not found -- cannot verify), **fails** on 401 (invalid credentials), **fails** on 403 (valid credentials but no pull permission), and **fails** on connection errors or timeouts.
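
When reproducing the probe with `curl -I`, the manifest digest can be read from the response headers. The sketch below assumes the registry returns it in the `Docker-Content-Digest` header, which registries commonly do but which this article does not state. Both helper names are hypothetical, and treating every status other than 2xx and 404 as a failure is a simplification.

```bash
# Hypothetical helper: map the HEAD request's HTTP status to a check outcome.
pull_outcome() {
  case "$1" in
    2??) echo pass ;;
    404) echo info ;;      # test image not found, cannot verify
    *)   echo fail ;;      # 401, 403, no response; other codes by assumption
  esac
}

# Hypothetical helper: extract the digest from response headers on stdin.
manifest_digest() {
  tr -d '\r' \
    | grep -i '^docker-content-digest:' \
    | sed 's/^[^:]*:[[:space:]]*//'    # strip the header name, keep "sha256:..."
}
```

Usage: pipe the output of the `curl -I` command from the Bare Metal section into `manifest_digest`.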

## Why It Matters
Pull authorization is the most fundamental registry operation. Stella Ops pulls images for scanning, SBOM extraction, attestation verification, and deployment. If pull authorization fails, the entire image-based workflow is blocked. This check tests actual pull permissions rather than just credential validity, catching permission misconfigurations that `check.integration.oci.credentials` cannot detect.

## Common Causes
- Credentials are invalid or expired
- Token has been revoked
- Anonymous pull is not allowed and no credentials are configured
- Service account has been removed from the repository's access list
- Repository access restricted by IP, network, or organization policy
- Test image does not exist in the registry (404 -- configure `OCI:TestRepository`)

## How to Fix

### Docker Compose
```bash
# Test pull manually
docker pull registry.example.com/library/alpine:latest

# Check configured test repository (case-insensitive, so OCI__TestRepository also matches)
grep -iE 'OCI__TESTREPOSITORY|REGISTRY__TESTREPOSITORY' .env

# Set a valid test image that exists in your registry
echo 'OCI__TestRepository=myorg/base-image' >> .env
echo 'OCI__TestTag=latest' >> .env
docker compose restart platform
```

### Bare Metal / systemd
```bash
# Test pull authorization with curl (quote the credentials so < and > are not redirections)
curl -I -H "Accept: application/vnd.oci.image.manifest.v1+json" \
  -u 'stellaops-svc:<password>' \
  https://registry.example.com/v2/library/alpine/manifests/latest

# Configure a test image that exists in your registry
sudo nano /etc/stellaops/appsettings.Production.json
# Set OCI:TestRepository and OCI:TestTag
sudo systemctl restart stellaops-platform
```

### Kubernetes / Helm
```yaml
# values.yaml
oci:
  registryUrl: https://registry.example.com
  testRepository: myorg/base-image
  testTag: latest
```
```bash
helm upgrade stellaops ./chart -f values.yaml
```

## Verification
```
stella doctor run --check check.integration.oci.pull
```

## Related Checks
- `check.integration.oci.credentials` -- validates credential configuration and token validity
- `check.integration.oci.push` -- verifies push authorization
- `check.integration.oci.registry` -- basic registry connectivity
@@ -0,0 +1,74 @@
---
checkId: check.integration.oci.push
plugin: stellaops.doctor.integration
severity: fail
tags: [registry, oci, push, authorization, credentials]
---
# OCI Registry Push Authorization

## What It Checks
Sends an authenticated HTTP POST to `<registryUrl>/v2/<testRepo>/blobs/uploads/` to initiate a blob upload session. Uses the test repository from `OCI:TestRepository` or `OCI:PushTestRepository` (default `stellaops/doctor-test`). Only runs if credentials are configured. The check **passes** on 202 Accepted (the upload session is immediately cancelled by sending a DELETE to the returned Location header), **fails** on 401 (invalid credentials), **fails** on 403 (valid credentials but no push permission), and **fails** on connection errors or timeouts. No data is actually written to the registry.
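
When repeating the probe by hand, remember to cancel the upload session afterwards. Registries may return the `Location` header as either an absolute URL or a path, so the cancel step needs to handle both. `resolve_location` is a hypothetical helper for that. How the check itself resolves the header is not documented here.

```bash
# Hypothetical helper: turn a Location header value into an absolute URL.
#   $1 = registry base URL, $2 = Location header value
resolve_location() {
  case "$2" in
    http://*|https://*) echo "$2" ;;          # already absolute
    /*)                 echo "${1%/}$2" ;;    # absolute path on the registry host
    *)                  echo "${1%/}/$2" ;;   # relative path
  esac
}
```

With the resolved URL in hand, `curl -X DELETE -u '<user>:<password>' "$url"` cancels the session so that nothing is left behind in the registry.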

## Why It Matters
Push authorization is required for storing attestations, SBOMs, signatures, and promoted images in the registry. Without push access, Stella Ops cannot attach evidence artifacts to releases, sign images, or complete promotion workflows. This check verifies the actual push permission grant, not just credential validity, using a non-destructive probe that leaves no artifacts behind.

## Common Causes
- Credentials are valid but lack push (write) permissions
- Repository does not exist and the registry does not support auto-creation
- Service account has read-only access
- Organization or team policy restricts push to specific accounts
- Token has been revoked or expired
- IP or network restrictions prevent write operations
## How to Fix
|
||||
|
||||
### Docker Compose
|
||||
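```bash
# Test push manually (docker push takes an image reference, not stdin,
# so tag a local image first; alpine:latest is only an example)
docker tag alpine:latest registry.example.com/stellaops/doctor-test:probe
docker push registry.example.com/stellaops/doctor-test:probe

# Grant push permissions to the service account in your registry UI

# Set a writable test repository
echo 'OCI__PushTestRepository=myorg/stellaops-test' >> .env
# Recreate the container so the updated .env is applied (restart alone does not reload it)
docker compose up -d platform
```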
|
||||
|
||||
### Bare Metal / systemd
|
||||
```bash
# Test push authorization with curl (-i prints the status line and headers)
curl -i -X POST \
  -u stellaops-svc:<password> \
  https://registry.example.com/v2/stellaops/doctor-test/blobs/uploads/

# Expected: 202 Accepted with Location header

# Fix permissions in registry
# Harbor: Add stellaops-svc as Developer/Admin to the project
# GitLab: Grant the Developer role or higher to the service account (Reporter can only pull)
# ECR: Allow ecr:InitiateLayerUpload (real pushes also need actions such as
#      ecr:UploadLayerPart, ecr:CompleteLayerUpload, and ecr:PutImage)

sudo systemctl restart stellaops-platform
```
|
||||
|
||||
### Kubernetes / Helm
|
||||
```yaml
# values.yaml
oci:
  registryUrl: https://registry.example.com
  pushTestRepository: myorg/stellaops-test
  existingSecret: stellaops-registry-creds
```
|
||||
```bash
|
||||
helm upgrade stellaops ./chart -f values.yaml
|
||||
```
|
||||
|
||||
## Verification
|
||||
```
|
||||
stella doctor run --check check.integration.oci.push
|
||||
```
|
||||
|
||||
## Related Checks
|
||||
- `check.integration.oci.credentials` -- validates credential configuration and token validity
|
||||
- `check.integration.oci.pull` -- verifies pull authorization
|
||||
- `check.integration.oci.registry` -- basic registry connectivity
|
||||
82
docs/doctor/articles/integration/registry-referrers-api.md
Normal file
@@ -0,0 +1,82 @@
|
||||
---
|
||||
checkId: check.integration.oci.referrers
|
||||
plugin: stellaops.doctor.integration
|
||||
severity: warn
|
||||
tags: [registry, oci, referrers, compatibility, oci-1.1]
|
||||
---
|
||||
# OCI Registry Referrers API Support
|
||||
|
||||
## What It Checks
|
||||
First resolves the manifest digest for the test image (`OCI:TestRepository`:`OCI:TestTag`, defaults to `library/alpine:latest`) by sending a HEAD request to the manifests endpoint and reading the `Docker-Content-Digest` header. Then probes the referrers API at `<registryUrl>/v2/<repo>/referrers/<digest>` with the `application/vnd.oci.image.index.v1+json` accept header. The check **passes** on 200 OK or on 404 if the response body contains OCI index JSON (valid response meaning no referrers exist yet). It **warns** on 404 without OCI index (API not supported, tag-based fallback required) or 405 Method Not Allowed. Returns **info** if the test image is not found (cannot verify). **Fails** on connection errors.
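
The decision table above can be sketched in shell. This is a simplified illustration rather than the check's real code: `classify_referrers_probe` is a hypothetical name, and detecting "OCI index JSON" by looking for the index media type string in the body is an assumption made for brevity.

```bash
# Hypothetical helper: $1 = HTTP status of the referrers request, $2 = response body.
classify_referrers_probe() {
  case "$1" in
    200) echo "pass" ;;
    404)
      # A 404 that still carries an OCI image index means "supported, no referrers yet".
      case "$2" in
        *application/vnd.oci.image.index.v1+json*) echo "pass" ;;
        *) echo "warn: referrers API not supported, tag-based fallback required" ;;
      esac ;;
    405) echo "warn: method not allowed" ;;
    000) echo "fail: connection error" ;;   # curl prints 000 when no response arrived
    *)   echo "unknown: status $1 is not covered by this article" ;;
  esac
}
```

The **info** result for a missing test image is decided earlier, during digest resolution, so it is not modeled here.
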
|
||||
|
||||
## Why It Matters
|
||||
The OCI 1.1 referrers API enables artifact linking: attaching SBOMs, signatures, attestations, and VEX documents directly to container image manifests. Without it, Stella Ops must fall back to the tag-based referrer pattern (`sha256-{digest}.{artifactType}`), which is less efficient, harder to discover, and may conflict with registry tag naming policies. Knowing referrers API availability determines which linking strategy is used.
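
As a rough illustration of the tag-based pattern mentioned above, the fallback tag can be derived from a manifest digest with plain string manipulation. The helper name, the shortened digest, and the `sbom` suffix are all hypothetical; the exact suffixes Stella Ops uses are not stated in this article.

```bash
# Hypothetical helper: $1 = manifest digest (sha256:<hex>), $2 = artifact type suffix.
fallback_referrer_tag() {
  # Replace the ":" in the digest with "-" and append ".<artifactType>".
  printf '%s.%s\n' "$(printf '%s' "$1" | tr ':' '-')" "$2"
}

fallback_referrer_tag "sha256:0123abcd" "sbom"   # prints sha256-0123abcd.sbom
```

Because the result is an ordinary tag, it is subject to whatever tag naming or immutability policies the registry enforces, which is one reason the referrers API is preferred when available.
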
|
||||
|
||||
## Common Causes
|
||||
- Registry does not implement OCI Distribution Spec v1.1
|
||||
- Registry version is too old (pre-referrers API)
|
||||
- Referrers API disabled in registry configuration
|
||||
- Test image does not exist in registry (cannot resolve digest to probe)
|
||||
- Credentials lack pull permissions for the test image
|
||||
|
||||
## How to Fix
|
||||
|
||||
### Docker Compose
|
||||
```bash
# Check registry version and referrers support
docker compose exec gateway curl -sv \
  -H "Accept: application/vnd.oci.image.index.v1+json" \
  https://registry.example.com/v2/library/alpine/referrers/sha256:abc...

# Upgrade registry to a version supporting OCI 1.1 referrers:
# - Harbor 2.6+
# - Quay 3.12+
# - ACR (default)
# - ECR (default)
# - GCR/Artifact Registry (default)
# - Distribution 2.8+
```
|
||||
|
||||
### Bare Metal / systemd
|
||||
```bash
# Check the distribution API version header advertised by the registry
curl -I https://registry.example.com/v2/ 2>&1 | grep -i distribution

# Test referrers API (header names are matched case-insensitively)
DIGEST=$(curl -sI -H "Accept: application/vnd.oci.image.manifest.v1+json" \
  https://registry.example.com/v2/library/alpine/manifests/latest \
  | grep -i docker-content-digest | awk '{print $2}' | tr -d '\r')

curl -H "Accept: application/vnd.oci.image.index.v1+json" \
  "https://registry.example.com/v2/library/alpine/referrers/$DIGEST"

# Upgrade the registry package
sudo apt upgrade docker-registry # or equivalent
sudo systemctl restart docker-registry
```
|
||||
|
||||
### Kubernetes / Helm
|
||||
```yaml
# Upgrade Harbor chart (shell command, shown as a comment so this block stays valid YAML):
# helm upgrade harbor harbor/harbor --set registry.referrers.enabled=true

# Or configure Stella Ops with a test image that exists
# values.yaml
oci:
  registryUrl: https://registry.example.com
  testRepository: myorg/base-image
  testTag: latest
```
|
||||
```bash
|
||||
helm upgrade stellaops ./chart -f values.yaml
|
||||
```
|
||||
|
||||
## Verification
|
||||
```
|
||||
stella doctor run --check check.integration.oci.referrers
|
||||
```
|
||||
|
||||
## Related Checks
|
||||
- `check.integration.oci.capabilities` -- broader capability matrix including referrers
|
||||
- `check.integration.oci.registry` -- basic registry connectivity
|
||||
- `check.integration.oci.pull` -- pull authorization (needed to resolve test image digest)
|
||||
@@ -0,0 +1,89 @@
|
||||
---
|
||||
checkId: check.integration.secrets.manager
|
||||
plugin: stellaops.doctor.integration
|
||||
severity: fail
|
||||
tags: [integration, secrets, vault, security, keyvault]
|
||||
---
|
||||
# Secrets Manager Connectivity
|
||||
|
||||
## What It Checks
|
||||
Iterates over all secrets managers defined under `Secrets:Managers` (or the legacy `Secrets:Vault:Url` / `Vault:Url` single-manager key). For each manager it sends an HTTP GET to a type-specific health endpoint: Vault uses `/v1/sys/health?standbyok=true&sealedcode=200&uninitcode=200`, Azure Key Vault uses `/healthstatus`, and others use `/health`. Sets the appropriate auth header (`X-Vault-Token` for Vault, `Bearer` for others). Records reachability, authentication success, and latency. For Vault, parses the response JSON for `sealed`, `initialized`, and `version` fields. The check **fails** if any manager is unreachable or returns 401/403, **fails** if any Vault instance is sealed, and **passes** if all managers are healthy and unsealed.
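
The per-manager outcome described above can be sketched as follows. This is a hedged illustration, not the plugin's implementation: `classify_secrets_manager` is a hypothetical name, the `000` status is what `curl -w '%{http_code}'` reports when no response arrives, and the usage comment assumes `jq` is installed.

```bash
# Hypothetical helper: $1 = HTTP status of the health request,
# $2 = value of the Vault "sealed" field ("true"/"false"; leave empty for non-Vault managers).
classify_secrets_manager() {
  case "$1" in
    000)     echo "fail: unreachable" ;;
    401|403) echo "fail: authentication rejected" ;;
    2??)
      if [ "$2" = "true" ]; then
        echo "fail: vault is sealed"
      else
        echo "pass"
      fi ;;
    *)       echo "unknown: status $1 is not covered by this article" ;;
  esac
}

# Example usage against Vault (VAULT_ADDR and VAULT_TOKEN are placeholders):
# url="$VAULT_ADDR/v1/sys/health?standbyok=true&sealedcode=200&uninitcode=200"
# status=$(curl -s -o /tmp/vault-health.json -w '%{http_code}' \
#   -H "X-Vault-Token: $VAULT_TOKEN" "$url")
# classify_secrets_manager "$status" "$(jq -r '.sealed' /tmp/vault-health.json)"
```

Because the query string forces a 200 response even for sealed or uninitialized instances, the sealed state has to be read from the JSON body rather than inferred from the status code.
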
|
||||
|
||||
## Why It Matters
|
||||
Secrets managers store registry credentials, signing keys, API tokens, and encryption keys. If a secrets manager is unreachable, Stella Ops cannot retrieve credentials for deployments, cannot sign attestations, and cannot decrypt sensitive configuration. A sealed Vault is equally critical: all secret reads fail until it is manually unsealed. This is a hard blocker for any release operation.
|
||||
|
||||
## Common Causes
|
||||
- Secrets manager service is down or restarting
|
||||
- Network connectivity issue between Stella Ops and the secrets manager
|
||||
- Authentication token has expired or been revoked
|
||||
- TLS certificate issue (expired, untrusted CA)
|
||||
- Vault was restarted and needs manual unseal
|
||||
- Vault auto-unseal could not complete because the HSM or KMS seal backend is unreachable
|
||||
|
||||
## How to Fix
|
||||
|
||||
### Docker Compose
|
||||
```bash
# Check secrets manager configuration (-i because keys may be mixed case)
grep -i 'SECRETS__\|VAULT__' .env

# Test Vault health
docker compose exec gateway curl -sv \
  http://vault:8200/v1/sys/health

# Unseal Vault if sealed
docker compose exec vault vault operator unseal <key1>
docker compose exec vault vault operator unseal <key2>
docker compose exec vault vault operator unseal <key3>

# Refresh Vault token
docker compose exec vault vault token create -policy=stellaops
echo 'Secrets__Managers__0__Token=<new-token>' >> .env
# Recreate the container so the updated .env is applied (restart alone does not reload it)
docker compose up -d platform
```
|
||||
|
||||
### Bare Metal / systemd
|
||||
```bash
|
||||
# Check Vault status
|
||||
vault status
|
||||
|
||||
# Unseal if needed
|
||||
vault operator unseal
|
||||
|
||||
# Renew the Vault token
|
||||
vault token renew
|
||||
|
||||
# Check Azure Key Vault health
|
||||
curl -v https://myvault.vault.azure.net/healthstatus
|
||||
|
||||
# Update configuration
|
||||
sudo nano /etc/stellaops/appsettings.Production.json
|
||||
sudo systemctl restart stellaops-platform
|
||||
```
|
||||
|
||||
### Kubernetes / Helm
|
||||
```yaml
# values.yaml
secrets:
  managers:
    - name: vault-prod
      url: http://vault.vault.svc.cluster.local:8200
      type: vault
      existingSecret: stellaops-vault-token
```
|
||||
```bash
# Update Vault token secret
kubectl create secret generic stellaops-vault-token \
  --from-literal=token=<new-token> \
  --dry-run=client -o yaml | kubectl apply -f -

helm upgrade stellaops ./chart -f values.yaml
```
|
||||
|
||||
## Verification
|
||||
```
|
||||
stella doctor run --check check.integration.secrets.manager
|
||||
```
|
||||
|
||||
## Related Checks
|
||||
- `check.integration.oci.credentials` -- registry credentials that may be sourced from the secrets manager
|
||||
74
docs/doctor/articles/integration/slack-webhook.md
Normal file
@@ -0,0 +1,74 @@
|
||||
---
|
||||
checkId: check.integration.slack
|
||||
plugin: stellaops.doctor.integration
|
||||
severity: info
|
||||
tags: [notification, slack, webhook]
|
||||
---
|
||||
# Slack Webhook
|
||||
|
||||
## What It Checks
|
||||
Reads the Slack webhook URL from `Slack:WebhookUrl` or `Notify:Slack:WebhookUrl`. First validates the URL format: **warns** if the URL does not start with `https://hooks.slack.com/`. Then tests host reachability by sending an HTTP GET to the base URL (`https://hooks.slack.com`). The check **passes** if the Slack host is reachable, **warns** if the host is unreachable or if the URL format is suspicious. Does not send an actual webhook payload to avoid generating noise in the Slack channel.
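
The URL format rule above is a simple prefix test, which can be reproduced locally before running the check. A minimal sketch, assuming a hypothetical helper name `check_slack_webhook_format`:

```bash
# Hypothetical helper: warn unless the URL starts with the Slack webhook prefix.
check_slack_webhook_format() {
  case "$1" in
    https://hooks.slack.com/*) echo "ok" ;;
    *) echo "warn: URL does not start with https://hooks.slack.com/" ;;
  esac
}

# Example usage:
# check_slack_webhook_format "$SLACK_WEBHOOK_URL"
```

Note that the prefix includes the scheme, so an `http://` URL on the correct host would still be flagged.
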
|
||||
|
||||
## Why It Matters
|
||||
Slack notifications keep operators informed about deployment status, policy violations, security findings, and approval requests in real time. A misconfigured or unreachable Slack webhook means critical alerts go undelivered, potentially delaying incident response, approval workflows, or security remediation.
|
||||
|
||||
## Common Causes
|
||||
- Network connectivity issues between Stella Ops and Slack
|
||||
- Firewall blocking outbound HTTPS to `hooks.slack.com`
|
||||
- Proxy misconfiguration preventing external HTTPS
|
||||
- Webhook URL is malformed or points to the wrong service
|
||||
- Slack webhook URL has been regenerated (old URL invalidated)
|
||||
|
||||
## How to Fix
|
||||
|
||||
### Docker Compose
|
||||
```bash
# Check Slack webhook configuration (-i because keys may be mixed case)
grep -i 'SLACK__WEBHOOKURL\|NOTIFY__SLACK' .env

# Test connectivity to Slack
docker compose exec gateway curl -sv https://hooks.slack.com/ -o /dev/null

# Update webhook URL
echo 'Slack__WebhookUrl=https://hooks.slack.com/services/T.../B.../xxx' >> .env
# Recreate the container so the updated .env is applied (restart alone does not reload it)
docker compose up -d platform

# If behind a proxy
echo 'HTTP_PROXY=http://proxy:8080' >> .env
echo 'HTTPS_PROXY=http://proxy:8080' >> .env
docker compose up -d platform
```
|
||||
|
||||
### Bare Metal / systemd
|
||||
```bash
# Verify configuration
jq '.Slack' /etc/stellaops/appsettings.Production.json

# Test connectivity
curl -sv https://hooks.slack.com/ -o /dev/null

# Update webhook URL
sudo nano /etc/stellaops/appsettings.Production.json
sudo systemctl restart stellaops-platform
```
|
||||
|
||||
### Kubernetes / Helm
|
||||
```yaml
# values.yaml
slack:
  webhookUrl: https://hooks.slack.com/services/T.../B.../xxx
  # or use an existing secret
  existingSecret: stellaops-slack-webhook
```
|
||||
```bash
|
||||
helm upgrade stellaops ./chart -f values.yaml
|
||||
```
|
||||
|
||||
## Verification
|
||||
```
|
||||
stella doctor run --check check.integration.slack
|
||||
```
|
||||
|
||||
## Related Checks
|
||||
- `check.integration.teams` -- Microsoft Teams webhook (alternative notification channel)
|
||||
- `check.integration.webhooks` -- general webhook health monitoring
|
||||
76
docs/doctor/articles/integration/smtp-connectivity.md
Normal file
@@ -0,0 +1,76 @@
|
||||
---
|
||||
checkId: check.integration.smtp
|
||||
plugin: stellaops.doctor.integration
|
||||
severity: warn
|
||||
tags: [connectivity, email, smtp]
|
||||
---
|
||||
# SMTP Email Connectivity
|
||||
|
||||
## What It Checks
|
||||
Reads the SMTP host from `Smtp:Host`, `Email:Smtp:Host`, or `Notify:Email:Host` and the port from the corresponding `:Port` key (defaulting to 587). Opens a raw TCP connection to the SMTP server with a 5-second timeout. The check **passes** if the TCP connection succeeds, **fails** on timeout, socket error, DNS failure, or connection refusal.
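
A comparable probe can be run by hand. The sketch below is not the check's implementation; it assumes bash (for the `/dev/tcp` pseudo-device) and the coreutils `timeout` command, and `smtp_tcp_probe` is a hypothetical name.

```bash
# Hypothetical helper: $1 = SMTP host, $2 = port (defaults to 587, as in this article).
# Opens a raw TCP connection and gives up after 5 seconds.
smtp_tcp_probe() {
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "${2:-587}" 2>/dev/null; then
    echo "pass"
  else
    echo "fail: connection refused, timed out, or host not resolved"
    return 1
  fi
}

# Example usage:
# smtp_tcp_probe smtp.example.com 587
```

Like the check itself, this only proves that the port accepts TCP connections; it does not exercise STARTTLS, authentication, or mail delivery.
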
|
||||
|
||||
## Why It Matters
|
||||
Email notifications deliver approval requests, security alerts, deployment summaries, and audit reports to operators who may not be monitoring Slack or the web UI. If the SMTP server is unreachable, these notifications silently fail. For organizations with compliance requirements, email delivery may be the mandated audit notification channel.
|
||||
|
||||
## Common Causes
|
||||
- SMTP server is not running or is being restarted
|
||||
- Firewall blocking SMTP port (25, 465, or 587)
|
||||
- DNS resolution failure for the SMTP hostname
|
||||
- Network unreachable between Stella Ops and the mail server
|
||||
- Incorrect host or port in configuration
|
||||
- ISP/cloud provider blocking outbound SMTP
|
||||
|
||||
## How to Fix
|
||||
|
||||
### Docker Compose
|
||||
```bash
# Check SMTP configuration (-i because keys may be mixed case)
grep -i 'SMTP__\|EMAIL__SMTP\|NOTIFY__EMAIL' .env

# Test TCP connectivity
docker compose exec gateway bash -c \
  "echo > /dev/tcp/smtp.example.com/587 && echo OK || echo FAIL"

# Update SMTP settings
echo 'Smtp__Host=smtp.example.com' >> .env
echo 'Smtp__Port=587' >> .env
echo 'Smtp__UseSsl=true' >> .env
# Recreate the container so the updated .env is applied (restart alone does not reload it)
docker compose up -d platform
```
|
||||
|
||||
### Bare Metal / systemd
|
||||
```bash
# Verify configuration
jq '.Smtp' /etc/stellaops/appsettings.Production.json

# Test connectivity
telnet smtp.example.com 587
# If the connection fails, check DNS resolution
nslookup smtp.example.com

# Update configuration
sudo nano /etc/stellaops/appsettings.Production.json
sudo systemctl restart stellaops-platform
```
|
||||
|
||||
### Kubernetes / Helm
|
||||
```yaml
# values.yaml
smtp:
  host: smtp.example.com
  port: 587
  useSsl: true
  existingSecret: stellaops-smtp-creds # Secret with username/password
```
|
||||
```bash
|
||||
helm upgrade stellaops ./chart -f values.yaml
|
||||
```
|
||||
|
||||
## Verification
|
||||
```
|
||||
stella doctor run --check check.integration.smtp
|
||||
```
|
||||
|
||||
## Related Checks
|
||||
- `check.integration.slack` -- Slack notifications (alternative channel)
|
||||
- `check.integration.teams` -- Teams notifications (alternative channel)
|
||||
75
docs/doctor/articles/integration/teams-webhook.md
Normal file
@@ -0,0 +1,75 @@
|
||||
---
|
||||
checkId: check.integration.teams
|
||||
plugin: stellaops.doctor.integration
|
||||
severity: info
|
||||
tags: [notification, teams, webhook]
|
||||
---
|
||||
# Teams Webhook
|
||||
|
||||
## What It Checks
|
||||
Reads the Microsoft Teams webhook URL from `Teams:WebhookUrl` or `Notify:Teams:WebhookUrl`. First validates the URL format: **warns** if the URL does not contain `webhook.office.com` or `teams.microsoft.com`. Then tests host reachability by sending an HTTP GET to the base URL of the webhook host. The check **passes** if the Teams host is reachable, **warns** if the host is unreachable or if the URL format is suspicious. Does not send an actual webhook payload to avoid generating noise in the Teams channel.
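
Unlike a prefix test, the Teams format rule above is a substring match on two known domains. A minimal sketch, assuming a hypothetical helper name `check_teams_webhook_format`:

```bash
# Hypothetical helper: warn unless the URL contains one of the known Teams webhook domains.
check_teams_webhook_format() {
  case "$1" in
    *webhook.office.com*|*teams.microsoft.com*) echo "ok" ;;
    *) echo "warn: URL does not contain webhook.office.com or teams.microsoft.com" ;;
  esac
}

# Example usage:
# check_teams_webhook_format "$TEAMS_WEBHOOK_URL"
```

A substring match accepts tenant-specific subdomains of `webhook.office.com`, but it will also warn on any webhook domain Microsoft introduces that is not in this list.
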
|
||||
|
||||
## Why It Matters
|
||||
Microsoft Teams notifications keep operators informed about deployment status, policy violations, security findings, and approval requests. A misconfigured or unreachable Teams webhook means critical alerts go undelivered, potentially delaying incident response and approval workflows. For organizations standardized on Microsoft 365, Teams may be the primary notification channel.
|
||||
|
||||
## Common Causes
|
||||
- Network connectivity issues between Stella Ops and Microsoft services
|
||||
- Firewall blocking outbound HTTPS to `webhook.office.com`
|
||||
- Proxy misconfiguration preventing external HTTPS
|
||||
- Webhook URL is malformed or was copied incorrectly
|
||||
- Teams webhook connector has been removed or regenerated
|
||||
- Microsoft has migrated to a new webhook URL domain
|
||||
|
||||
## How to Fix
|
||||
|
||||
### Docker Compose
|
||||
```bash
# Check Teams webhook configuration (-i because keys may be mixed case)
grep -i 'TEAMS__WEBHOOKURL\|NOTIFY__TEAMS' .env

# Test connectivity to Teams webhook host
docker compose exec gateway curl -sv https://webhook.office.com/ -o /dev/null

# Update webhook URL
echo 'Teams__WebhookUrl=https://webhook.office.com/webhookb2/...' >> .env
# Recreate the container so the updated .env is applied (restart alone does not reload it)
docker compose up -d platform

# If behind a proxy
echo 'HTTP_PROXY=http://proxy:8080' >> .env
echo 'HTTPS_PROXY=http://proxy:8080' >> .env
docker compose up -d platform
```
|
||||
|
||||
### Bare Metal / systemd
|
||||
```bash
# Verify configuration
jq '.Teams' /etc/stellaops/appsettings.Production.json

# Test connectivity
curl -sv https://webhook.office.com/ -o /dev/null

# Update webhook URL
sudo nano /etc/stellaops/appsettings.Production.json
sudo systemctl restart stellaops-platform
```
|
||||
|
||||
### Kubernetes / Helm
|
||||
```yaml
# values.yaml
teams:
  webhookUrl: https://webhook.office.com/webhookb2/...
  # or use an existing secret
  existingSecret: stellaops-teams-webhook
```
|
||||
```bash
|
||||
helm upgrade stellaops ./chart -f values.yaml
|
||||
```
|
||||
|
||||
## Verification
|
||||
```
|
||||
stella doctor run --check check.integration.teams
|
||||
```
|
||||
|
||||
## Related Checks
|
||||
- `check.integration.slack` -- Slack webhook (alternative notification channel)
|
||||
- `check.integration.webhooks` -- general webhook health monitoring
|
||||
77
docs/doctor/articles/integration/webhook-health.md
Normal file
@@ -0,0 +1,77 @@
|
||||
---
|
||||
checkId: check.integration.webhooks
|
||||
plugin: stellaops.doctor.integration
|
||||
severity: warn
|
||||
tags: [integration, webhooks, notifications, events]
|
||||
---
|
||||
# Integration Webhook Health
|
||||
|
||||
## What It Checks
|
||||
Iterates over all webhook endpoints defined under `Webhooks:Endpoints`. For **outbound** webhooks it sends an HTTP HEAD request to the target URL and considers the endpoint reachable if the response status code is below 500. For **inbound** webhooks it marks reachability as true (endpoint is local). It then calculates the delivery failure rate from `TotalDeliveries` and `SuccessfulDeliveries` counters. The check **fails** if any outbound endpoint is unreachable or if any webhook's failure rate exceeds 20%, **warns** if any webhook's failure rate is between 5% and 20%, and **passes** otherwise.
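
The failure-rate thresholds above can be worked through with integer arithmetic. This is a sketch under stated assumptions, not the check's code: `classify_webhook_deliveries` is a hypothetical name, a rate of exactly 5% or exactly 20% is treated as a warning, and an endpoint with no deliveries is treated as passing. The article does not specify those edge cases.

```bash
# Hypothetical helper: $1 = TotalDeliveries, $2 = SuccessfulDeliveries.
# Avoids floating point: rate > 20%  <=>  failed * 100 > total * 20.
classify_webhook_deliveries() {
  total=$1
  failed=$(( $1 - $2 ))
  if [ "$total" -eq 0 ]; then
    echo "pass"   # assumption: no deliveries yet means nothing to flag
  elif [ $(( failed * 100 )) -gt $(( total * 20 )) ]; then
    echo "fail"
  elif [ $(( failed * 100 )) -ge $(( total * 5 )) ]; then
    echo "warn"   # assumption: the 5% and 20% boundaries count as warn
  else
    echo "pass"
  fi
}

classify_webhook_deliveries 200 170   # 30 of 200 failed (15%), prints warn
```

Reachability of outbound endpoints is a separate condition: an unreachable endpoint fails the check regardless of its delivery counters.
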
|
||||
|
||||
## Why It Matters
|
||||
Webhooks are the primary event-driven communication channel between Stella Ops and external systems. Unreachable outbound endpoints mean notifications, CI triggers, and audit event deliveries silently fail. A rising failure rate is an early warning of endpoint degradation that can cascade into missed alerts, delayed approvals, and incomplete audit trails.
|
||||
|
||||
## Common Causes
|
||||
- Webhook endpoint is down or returning 5xx errors
|
||||
- Network connectivity issue or DNS resolution failure
|
||||
- TLS certificate expired or untrusted
|
||||
- Payload format changed causing receiver to reject events
|
||||
- Rate limiting by the receiving service
|
||||
- Intermittent timeouts under load
|
||||
|
||||
## How to Fix
|
||||
|
||||
### Docker Compose
|
||||
```bash
# List configured webhooks (-i because keys may be mixed case)
grep -i 'WEBHOOKS__' .env

# Test an outbound webhook endpoint
docker compose exec gateway curl -I https://hooks.example.com/stellaops

# View webhook delivery logs
docker compose logs platform | grep -i webhook

# Update a webhook URL
echo 'Webhooks__Endpoints__0__Url=https://hooks.example.com/v2/stellaops' >> .env
# Recreate the container so the updated .env is applied (restart alone does not reload it)
docker compose up -d platform
```
|
||||
|
||||
### Bare Metal / systemd
|
||||
```bash
# Check webhook configuration
jq '.Webhooks' /etc/stellaops/appsettings.Production.json

# Test endpoint connectivity
curl -I https://hooks.example.com/stellaops

# Review delivery history
stella webhooks logs <webhook-name> --status failed

# Retry failed deliveries
stella webhooks retry <webhook-name>
```
|
||||
|
||||
### Kubernetes / Helm
|
||||
```yaml
# values.yaml
webhooks:
  endpoints:
    - name: slack-releases
      url: https://hooks.example.com/stellaops
      direction: outbound
```
|
||||
```bash
|
||||
helm upgrade stellaops ./chart -f values.yaml
|
||||
```
|
||||
|
||||
## Verification
|
||||
```
|
||||
stella doctor run --check check.integration.webhooks
|
||||
```
|
||||
|
||||
## Related Checks
|
||||
- `check.integration.slack` -- Slack-specific webhook validation
|
||||
- `check.integration.teams` -- Teams-specific webhook validation
|
||||
- `check.integration.ci.system` -- CI systems that receive webhook events
|
||||