docs/operations/runbooks/attestor-hsm-connection.md

# Runbook: Attestor - HSM Connection Issues

> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
>
> **Task:** RUN-005 - Attestor Runbooks

## Metadata

| Field | Value |
|-------|-------|
| **Component** | Attestor / Cryptography |
| **Severity** | Critical |
| **On-call scope** | Platform team, Security team |
| **Last updated** | 2026-01-17 |
| **Doctor check** | `check.crypto.hsm-availability` |

---

## Symptoms

- [ ] Signing operations failing with "HSM unavailable"
- [ ] Alert `AttestorHsmConnectionFailed` firing
- [ ] Errors such as "PKCS#11 operation failed" or "HSM session timeout"
- [ ] Attestations cannot be created
- [ ] Key operations (sign, verify) failing

---

## Impact

| Impact Type | Description |
|-------------|-------------|
| **User-facing** | No attestations can be signed; releases are blocked |
| **Data integrity** | Keys remain safe in the HSM; operations resume once the connection is restored |
| **SLA impact** | All signing operations are blocked; compliance posture is at risk |

---

## Diagnosis

### Quick checks

1. **Check Doctor diagnostics:**
   ```bash
   stella doctor --check check.crypto.hsm-availability
   ```

2. **Check HSM connection status:**
   ```bash
   stella crypto hsm status
   ```

3. **Test HSM connectivity:**
   ```bash
   stella crypto hsm test
   ```
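
The three quick checks can be chained so that the first failing step is reported immediately. This is a minimal sketch: `hsm_quick_checks` and `_hsm_step` are hypothetical helper names, and it assumes each `stella` command exits non-zero on failure (confirm this against the CLI before relying on it).

```bash
#!/usr/bin/env bash
# Hypothetical helpers: run the quick checks in order and stop at the first
# failure. Assumes each stella command exits non-zero when the check fails.

_hsm_step() {
  echo ">>> stella $*"
  if ! stella "$@"; then
    echo "FAILED: stella $*" >&2
    return 1
  fi
}

hsm_quick_checks() {
  _hsm_step doctor --check check.crypto.hsm-availability || return 1
  _hsm_step crypto hsm status || return 1
  _hsm_step crypto hsm test || return 1
  echo "All quick checks passed"
}
```

Source the file in an on-call shell and run `hsm_quick_checks`; a non-zero return means you should continue with the deep diagnosis.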

### Deep diagnosis

1. **Check PKCS#11 library status:**
   ```bash
   stella crypto hsm pkcs11-status
   ```
   Look for: library loaded, slot available, session active.

2. **Check HSM network connectivity:**
   ```bash
   stella crypto hsm ping
   ```

3. **Check HSM session logs:**
   ```bash
   stella crypto hsm logs --last 30m
   ```
   Look for: session errors, timeouts, authentication failures.

4. **Check HSM slot status:**
   ```bash
   stella crypto hsm slots list
   ```
   Problem if: slot not found, slot busy, or token not present.
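
Before changing anything, it can help to preserve the deep-diagnosis output for the incident record. The sketch below only wraps the four `stella` commands listed above; the function names, directory layout, and file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: save deep-diagnosis output to a timestamped directory.
# Each command's stdout and stderr go to its own file, followed by the
# command's exit code; a failing command does not stop the collection.

_hsm_capture() {
  local out="$1"; shift
  local rc=0
  stella "$@" > "$out" 2>&1 || rc=$?
  echo "exit code: $rc" >> "$out"
}

collect_hsm_diagnostics() {
  local dir
  dir="${1:-.}/hsm-diag-$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$dir" || return 1
  _hsm_capture "$dir/pkcs11-status.txt" crypto hsm pkcs11-status
  _hsm_capture "$dir/ping.txt"          crypto hsm ping
  _hsm_capture "$dir/logs-30m.txt"      crypto hsm logs --last 30m
  _hsm_capture "$dir/slots.txt"         crypto hsm slots list
  # Print the directory so callers can attach it to the incident.
  echo "$dir"
}
```

For example, `collect_hsm_diagnostics /tmp` prints the path of the new directory, which can then be attached to the incident ticket.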

---

## Resolution

### Immediate mitigation

1. **Attempt HSM reconnection:**
   ```bash
   stella crypto hsm reconnect
   ```

2. **If the HSM is unreachable, switch to software signing (only if policy permits):**
   ```bash
   stella attest config set signing.mode software
   stella attest reload
   ```
   **Warning:** Software signing may not meet compliance requirements.

3. **Fail over to the backup HSM, if one is configured:**
   ```bash
   stella crypto hsm failover --to backup
   ```
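
These mitigations can be chained into one escalation script. The sketch below deliberately tries the backup HSM before software signing, so keys stay in hardware whenever possible. `hsm_mitigate` and `ALLOW_SOFTWARE_SIGNING` are hypothetical. The variable stands in for whatever compliance approval your organization requires and is not a stella setting. The script also assumes each command exits non-zero on failure.

```bash
#!/usr/bin/env bash
# Hypothetical escalation helper. ALLOW_SOFTWARE_SIGNING is an illustrative
# variable (not a stella setting) representing explicit compliance approval.

hsm_mitigate() {
  # 1. Try to re-establish the session with the primary HSM.
  if stella crypto hsm reconnect; then
    echo "reconnected to primary HSM"
    return 0
  fi
  # 2. Prefer the backup HSM so signing keys stay in hardware.
  if stella crypto hsm failover --to backup; then
    echo "failed over to backup HSM"
    return 0
  fi
  # 3. Fall back to software signing only with explicit approval.
  if [ "${ALLOW_SOFTWARE_SIGNING:-0}" = "1" ]; then
    stella attest config set signing.mode software || return 1
    stella attest reload || return 1
    echo "WARNING: software signing enabled"
    return 0
  fi
  echo "HSM unavailable and software signing not approved; escalate" >&2
  return 1
}
```

Run `ALLOW_SOFTWARE_SIGNING=1 hsm_mitigate` only after the approval is recorded. Without it, the function stops after the failover attempt and returns non-zero.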

### Root cause fix

**If network connectivity issue:**

1. Check the HSM network path:
   ```bash
   stella crypto hsm connectivity --verbose
   ```

2. Verify that firewall rules allow traffic to the HSM port (typically 1792 for Luna, 2225 for SafeNet).

3. Check the HSM server status with the vendor's tools.
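
To separate a network problem from a CLI or HSM-side problem, you can check whether a TCP connection to the HSM port opens at all from the Attestor host. This sketch uses bash's built-in `/dev/tcp` redirection and the coreutils `timeout` command. `hsm_port_open` is a hypothetical name, and the host and port must come from your own HSM configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: test raw TCP reachability of the HSM endpoint,
# independent of the stella CLI. Requires bash with /dev/tcp support and
# the coreutils `timeout` command.

hsm_port_open() {
  local host="$1" port="$2" secs="${3:-5}"
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "usage: hsm_port_open HOST PORT [TIMEOUT_SECONDS]" >&2
    return 2
  fi
  local rc=0
  # The child bash opens fd 3 to host:port and exits, closing the socket.
  # `timeout` kills it (exit 124) if the connect attempt hangs.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null || rc=$?
  case "$rc" in
    0)   echo "open: ${host}:${port}" ;;
    124) echo "timed out after ${secs}s: ${host}:${port} (packets possibly dropped)" >&2
         return 1 ;;
    *)   echo "connection failed: ${host}:${port} (refused or unresolvable)" >&2
         return 1 ;;
  esac
}
```

For example, `hsm_port_open hsm01.example.internal 1792` (the hostname is a placeholder). A timeout rather than an immediate refusal often points to a firewall silently dropping packets, whereas a refusal means the host answered but nothing is listening on that port.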

**If session timeout:**

1. Increase the session timeout:
   ```bash
   stella crypto hsm config set session.timeout 300s
   stella crypto hsm reconnect
   ```

2. Enable session keep-alive:
   ```bash
   stella crypto hsm config set session.keepalive true
   stella crypto hsm config set session.keepalive_interval 60s
   ```
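
A keep-alive only prevents expiry if it fires more often than the session timeout. The helper below checks that relationship for the two values set above; with 60s against 300s, the check passes. It is a sketch. The `<number><s|m|h>` duration format it accepts is an assumption based on the values in this runbook, not a documented stella format, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. The "<digits><s|m|h>" duration format is an assumption.

# Convert a duration such as 300s, 5m, or 1h to whole seconds.
to_seconds() {
  if [[ "$1" =~ ^([0-9]+)([smh])$ ]]; then
    # 10# forces base 10 so values with leading zeros are not read as octal.
    local n="$((10#${BASH_REMATCH[1]}))"
    case "${BASH_REMATCH[2]}" in
      s) echo "$n" ;;
      m) echo "$((n * 60))" ;;
      h) echo "$((n * 3600))" ;;
    esac
  else
    echo "unrecognized duration: $1" >&2
    return 1
  fi
}

# Succeed only if 0 < keep-alive interval < session timeout.
check_keepalive() {
  local interval limit
  interval="$(to_seconds "$1")" || return 2
  limit="$(to_seconds "$2")" || return 2
  if (( interval > 0 && interval < limit )); then
    echo "ok: keep-alive every ${interval}s, session timeout ${limit}s"
  else
    echo "keep-alive (${interval}s) must be > 0 and shorter than the session timeout (${limit}s)" >&2
    return 1
  fi
}
```

Usage: `check_keepalive 60s 300s`. In practice, leave a comfortable margin below the timeout rather than setting the interval just under it.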

**If authentication failed:**

1. Verify HSM credentials:
   ```bash
   stella crypto hsm auth verify
   ```

2. If the HSM PIN has changed, update it:
   ```bash
   stella crypto hsm auth update --slot <slot-id>
   ```

**If PKCS#11 library issue:**

1. Verify the library path:
   ```bash
   stella crypto hsm config get pkcs11.library_path
   ```

2. Reload the PKCS#11 library:
   ```bash
   stella crypto hsm pkcs11-reload
   ```

3. Check library compatibility:
   ```bash
   stella crypto hsm pkcs11-info
   ```
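
Before reloading, a quick local check can rule out a missing or wrong file at the configured library path. This hypothetical helper assumes a Linux host, where PKCS#11 modules are ELF shared objects. It only inspects the file and does not load the module or call any PKCS#11 function, so it is no substitute for `stella crypto hsm pkcs11-info`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: basic file checks on a PKCS#11 module path (Linux).
# Does not load the module; it only checks existence, readability, and the
# ELF magic bytes at the start of the file.

check_pkcs11_module() {
  local lib="$1"
  if [ ! -e "$lib" ]; then
    echo "missing: $lib" >&2
    return 1
  fi
  if [ ! -f "$lib" ] || [ ! -r "$lib" ]; then
    echo "not a readable regular file: $lib" >&2
    return 1
  fi
  # ELF files start with 0x7f 'E' 'L' 'F'; compare the three printable bytes.
  if [ "$(head -c 4 "$lib" | tail -c 3)" != "ELF" ]; then
    echo "not an ELF shared object: $lib" >&2
    return 1
  fi
  echo "ok: $lib"
}
```

Usage, assuming the `config get` command prints only the path: `check_pkcs11_module "$(stella crypto hsm config get pkcs11.library_path)"`.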

### Verification

```bash
# Test HSM connectivity
stella crypto hsm test

# Test signing operation
stella attest test-sign

# Verify key access
stella keys verify <key-id> --operation sign

# Confirm there are no errors in the logs
stella crypto hsm logs --level error --last 30m
```
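
The verification commands can be combined into a single pass/fail gate, for example as the last step before closing the incident. This is a sketch. `verify_hsm_recovery` is hypothetical, and it assumes that each `stella` command exits non-zero on failure and that the error-log query prints nothing when there are no errors.

```bash
#!/usr/bin/env bash
# Hypothetical gate over the verification steps. Assumes non-zero exit codes
# on failure and empty output from the error-log query when there are no errors.

verify_hsm_recovery() {
  local key_id="$1"
  if [ -z "$key_id" ]; then
    echo "usage: verify_hsm_recovery <key-id>" >&2
    return 2
  fi
  stella crypto hsm test || { echo "FAILED: hsm test" >&2; return 1; }
  stella attest test-sign || { echo "FAILED: test-sign" >&2; return 1; }
  stella keys verify "$key_id" --operation sign || { echo "FAILED: key verify" >&2; return 1; }
  local errors
  errors="$(stella crypto hsm logs --level error --last 30m)" || { echo "FAILED: log query" >&2; return 1; }
  if [ -n "$errors" ]; then
    echo "Errors present in the last 30m:" >&2
    printf '%s\n' "$errors" >&2
    return 1
  fi
  echo "HSM recovery verified"
}
```

The 30-minute window can still contain errors logged during the outage itself, so a failure immediately after recovery may simply need a second run once the window has moved past the incident.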

---

## Prevention

- [ ] **Redundancy:** Configure backup HSM for failover
- [ ] **Monitoring:** Alert on HSM connection failures immediately
- [ ] **Keep-alive:** Enable session keep-alive to prevent timeouts
- [ ] **Testing:** Include HSM health in regular health checks

---

## Related Resources

- **Architecture:** `docs/modules/cryptography/hsm-integration.md`
- **Related runbooks:** `attestor-signing-failed.md`, `crypto-ops.md`
- **Doctor check:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Crypto/`
- **HSM setup:** `docs/operations/hsm-configuration.md`