Runbook: Attestor - HSM Connection Issues
Sprint: SPRINT_20260117_029_DOCS_runbook_coverage
Task: RUN-005 - Attestor Runbooks
Metadata
| Field | Value |
|---|---|
| Component | Attestor / Cryptography |
| Severity | Critical |
| On-call scope | Platform team, Security team |
| Last updated | 2026-01-17 |
| Doctor check | check.crypto.hsm-availability |
Symptoms
- Signing operations failing with "HSM unavailable"
- Alert `AttestorHsmConnectionFailed` firing
- Error: "PKCS#11 operation failed" or "HSM session timeout"
- Attestations cannot be created
- Key operations (sign, verify) failing
Impact
| Impact Type | Description |
|---|---|
| User-facing | No attestations can be signed; releases blocked |
| Data integrity | Keys are safe in HSM; operations resume when connection restored |
| SLA impact | All signing operations blocked; compliance posture at risk |
Diagnosis
Quick checks
- Check Doctor diagnostics:
  `stella doctor --check check.crypto.hsm-availability`
- Check HSM connection status:
  `stella crypto hsm status`
- Test HSM connectivity:
  `stella crypto hsm test`
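
The three quick checks can be chained so that the first failing one is obvious. The following is a minimal sketch, not part of the stella tooling: the function names `run_step` and `hsm_quick_checks` are hypothetical, and it assumes each `stella` command exits non-zero when its check fails, which this runbook does not confirm.

```shell
# Sketch only. Assumption: stella commands exit non-zero on failure.

# Run one labelled command and report whether it succeeded.
run_step() {
    label=$1
    shift
    echo "==> $label"
    if "$@"; then
        return 0
    fi
    echo "FAILED: $label" >&2
    return 1
}

# Run the quick checks in runbook order; stop at the first failure.
# Return code: 0 = all passed, 1/2/3 = index of the first failing check.
hsm_quick_checks() {
    run_step "Doctor diagnostics" \
        stella doctor --check check.crypto.hsm-availability || return 1
    run_step "HSM connection status" \
        stella crypto hsm status || return 2
    run_step "HSM connectivity test" \
        stella crypto hsm test || return 3
}
```

After sourcing the file, call `hsm_quick_checks`; a non-zero return code identifies which check failed first, which helps decide where to start in the deep diagnosis.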
Deep diagnosis
- Check PKCS#11 library status:
  `stella crypto hsm pkcs11-status`
  Look for: Library loaded, slot available, session active
- Check HSM network connectivity:
  `stella crypto hsm ping`
- Check HSM session logs:
  `stella crypto hsm logs --last 30m`
  Look for: Session errors, timeout, authentication failures
- Check HSM slot status:
  `stella crypto hsm slots list`
  Problem if: Slot not found, slot busy, token not present
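
During an incident it can help to capture the output of all four deep-diagnosis commands in one file for the ticket. The sketch below does that under stated assumptions: `hsm_collect_diagnostics` and the default file name `hsm-diagnostics.txt` are hypothetical, and the commands are run exactly as listed above with no extra flags.

```shell
# Sketch only. Collects deep-diagnosis output into a single file.
# A failing command is recorded and does not stop the collection.
hsm_collect_diagnostics() {
    out_file=${1:-hsm-diagnostics.txt}
    {
        echo "collected: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        echo "## PKCS#11 library status"
        stella crypto hsm pkcs11-status || echo "(command failed, exit $?)"
        echo "## HSM network connectivity"
        stella crypto hsm ping || echo "(command failed, exit $?)"
        echo "## HSM session logs, last 30m"
        stella crypto hsm logs --last 30m || echo "(command failed, exit $?)"
        echo "## HSM slot status"
        stella crypto hsm slots list || echo "(command failed, exit $?)"
    } > "$out_file" 2>&1
    echo "wrote $out_file"
}
```

Call it as `hsm_collect_diagnostics /tmp/hsm-incident.txt`, then review the file against the "Look for" and "Problem if" hints above before attaching it.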
Resolution
Immediate mitigation
- Attempt HSM reconnection:
  `stella crypto hsm reconnect`
- If HSM unreachable, switch to software signing (if permitted):
  `stella attest config set signing.mode software`
  `stella attest reload`
  Warning: Software signing may not meet compliance requirements
- Use backup HSM if configured:
  `stella crypto hsm failover --to backup`
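
The mitigations can be read as an escalation ladder. The sketch below is one possible ordering, not an official procedure: it tries the backup HSM before software signing because of the compliance warning above, whereas the list itself names software signing second. `hsm_mitigate` and the `ALLOW_SOFTWARE_SIGNING` variable are hypothetical names for this sketch, and it assumes each `stella` command exits non-zero on failure.

```shell
# Sketch only. Escalate through the immediate mitigations.
# ALLOW_SOFTWARE_SIGNING=yes is an explicit opt-in invented for this
# sketch; set it only if policy permits software signing.
hsm_mitigate() {
    # 1. Try to re-establish the existing HSM session.
    if stella crypto hsm reconnect; then
        echo "mitigated: reconnect"
        return 0
    fi
    # 2. Fail over to the backup HSM (fails if none is configured).
    if stella crypto hsm failover --to backup; then
        echo "mitigated: failover"
        return 0
    fi
    # 3. Last resort: software signing, only with explicit opt-in.
    if [ "${ALLOW_SOFTWARE_SIGNING:-no}" = "yes" ]; then
        if stella attest config set signing.mode software &&
            stella attest reload; then
            echo "mitigated: software-signing"
            return 0
        fi
    fi
    echo "not mitigated: continue with root cause fix" >&2
    return 1
}
```

Keeping software signing behind an explicit opt-in means an automated run cannot silently weaken the compliance posture.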
Root cause fix
If network connectivity issue:
- Check HSM network path:
  `stella crypto hsm connectivity --verbose`
- Verify firewall rules allow HSM port (typically 1792 for Luna, 2225 for SafeNet)
- Check HSM server status with vendor tools
If session timeout:
- Increase session timeout:
  `stella crypto hsm config set session.timeout 300s`
  `stella crypto hsm reconnect`
- Enable session keep-alive:
  `stella crypto hsm config set session.keepalive true`
  `stella crypto hsm config set session.keepalive_interval 60s`
If authentication failed:
- Verify HSM credentials:
  `stella crypto hsm auth verify`
- Update HSM PIN if changed:
  `stella crypto hsm auth update --slot <slot-id>`
If PKCS#11 library issue:
- Verify library path:
  `stella crypto hsm config get pkcs11.library_path`
- Reload PKCS#11 library:
  `stella crypto hsm pkcs11-reload`
- Check library compatibility:
  `stella crypto hsm pkcs11-info`
Verification
# Test HSM connectivity
stella crypto hsm test
# Test signing operation
stella attest test-sign
# Verify key access
stella keys verify <key-id> --operation sign
# Check no errors in logs
stella crypto hsm logs --level error --last 30m
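
The verification commands can also be run as a single pass that reports how many steps failed. This is a hedged sketch: `hsm_verify_recovery` is a hypothetical name, the first three commands are assumed to exit non-zero on failure, and because the exit behaviour of the log query is not documented here, its output is only printed for manual review.

```shell
# Sketch only. Run every verification step and count failures.
hsm_verify_recovery() {
    key_id=${1:-}
    if [ -z "$key_id" ]; then
        echo "usage: hsm_verify_recovery <key-id>" >&2
        return 64
    fi
    failures=0
    # Steps assumed to signal failure through their exit code.
    stella crypto hsm test || failures=$((failures + 1))
    stella attest test-sign || failures=$((failures + 1))
    stella keys verify "$key_id" --operation sign || failures=$((failures + 1))
    # Log query: shown for review, never counted as a failure.
    echo "--- HSM error log, last 30m (expect no entries) ---"
    stella crypto hsm logs --level error --last 30m || true
    echo "failed steps: $failures"
    [ "$failures" -eq 0 ]
}
```

Unlike a stop-at-first-failure script, this runs all steps so that a single pass shows whether connectivity, signing, and key access have each recovered.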
Prevention
- Redundancy: Configure backup HSM for failover
- Monitoring: Alert on HSM connection failures immediately
- Keep-alive: Enable session keep-alive to prevent timeouts
- Testing: Include HSM health in regular health checks
Related Resources
- Architecture: `docs/modules/cryptography/hsm-integration.md`
- Related runbooks: `attestor-signing-failed.md`, `crypto-ops.md`
- Doctor check: `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Crypto/`
- HSM setup: `docs/operations/hsm-configuration.md`