
Runbook: Attestor - HSM Connection Issues

Sprint: SPRINT_20260117_029_DOCS_runbook_coverage Task: RUN-005 - Attestor Runbooks

Metadata

| Field         | Value                         |
|---------------|-------------------------------|
| Component     | Attestor / Cryptography       |
| Severity      | Critical                      |
| On-call scope | Platform team, Security team  |
| Last updated  | 2026-01-17                    |
| Doctor check  | check.crypto.hsm-availability |

Symptoms

  • Signing operations failing with "HSM unavailable"
  • Alert AttestorHsmConnectionFailed firing
  • Error: "PKCS#11 operation failed" or "HSM session timeout"
  • Attestations cannot be created
  • Key operations (sign, verify) failing

Impact

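| Impact type    | Description                                                                     |
|----------------|---------------------------------------------------------------------------------|
| User-facing    | No attestations can be signed; releases are blocked                             |
| Data integrity | Keys remain safe in the HSM; operations resume once the connection is restored |
| SLA impact     | All signing operations are blocked; compliance posture is at risk              |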

Diagnosis

Quick checks

  1. Check Doctor diagnostics:

    stella doctor --check check.crypto.hsm-availability
    
  2. Check HSM connection status:

    stella crypto hsm status
    
  3. Test HSM connectivity:

    stella crypto hsm test
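The quick checks can be chained so the first failing one stands out. Below is a minimal POSIX sh sketch. It assumes each stella command exits non-zero when its check fails; this runbook does not state exit codes, so confirm them for your version. run_check and hsm_quick_checks are hypothetical helper names, not part of the stella CLI.

```sh
#!/bin/sh
# Sketch: run the three quick checks in order and stop at the first failure.
# ASSUMPTION: each stella command exits non-zero when its check fails.

# Print a command, run it, and report it on stderr if it fails.
run_check() {
  echo "==> $*"
  if ! "$@"; then
    echo "FAILED: $*" >&2
    return 1
  fi
}

hsm_quick_checks() {
  run_check stella doctor --check check.crypto.hsm-availability || return 1
  run_check stella crypto hsm status || return 1
  run_check stella crypto hsm test || return 1
  echo "All quick checks passed."
}
```

Source the file and run hsm_quick_checks. Stopping at the first failure keeps the output short during an incident. If you need the full picture, run the remaining checks by hand.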
    

Deep diagnosis

  1. Check PKCS#11 library status:

    stella crypto hsm pkcs11-status
    

    Healthy output shows: library loaded, slot available, session active

  2. Check HSM network connectivity:

    stella crypto hsm ping
    
  3. Check HSM session logs:

    stella crypto hsm logs --last 30m
    

    Look for: session errors, timeouts, authentication failures

  4. Check HSM slot status:

    stella crypto hsm slots list
    

    Problem if: Slot not found, slot busy, token not present
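The error strings under Symptoms and the log patterns above correspond to the "Root cause fix" branches under Resolution. Below is a minimal sketch of that triage. It assumes each HSM log line contains either this runbook's error text or a standard PKCS#11 return-value name such as CKR_PIN_INCORRECT. The CKR_* names come from the PKCS#11 specification; whether Stella's logs include them is an assumption. classify_hsm_error and summarize_hsm_errors are hypothetical helpers. Treating device errors as network problems is a heuristic for network-attached HSMs.

```sh
#!/bin/sh
# Sketch: map one HSM error line to a "Root cause fix" branch of this runbook.
# ASSUMPTION: log lines contain the runbook's error text or a PKCS#11 CKR_* name.
# HEURISTIC: on a network-attached HSM, CKR_DEVICE_* usually means lost connectivity.
classify_hsm_error() {
  case "$1" in
    *CKR_PIN_INCORRECT*|*CKR_PIN_EXPIRED*|*CKR_PIN_LOCKED*|*CKR_USER_NOT_LOGGED_IN*|*[Aa]uthentication*)
      echo authentication ;;
    *CKR_SESSION_HANDLE_INVALID*|*CKR_SESSION_CLOSED*|*[Ss]"ession timeout"*)
      echo session-timeout ;;
    *CKR_CRYPTOKI_NOT_INITIALIZED*|*CKR_LIBRARY_LOAD_FAILED*)
      echo pkcs11-library ;;
    *CKR_TOKEN_NOT_PRESENT*|*CKR_SLOT_ID_INVALID*)
      echo slot ;;
    *CKR_DEVICE_ERROR*|*CKR_DEVICE_REMOVED*|*"HSM unavailable"*)
      echo network ;;
    *)
      echo unknown ;;
  esac
}

# Count classified lines read from stdin, dropping lines that match nothing.
summarize_hsm_errors() {
  local line
  while IFS= read -r line; do
    classify_hsm_error "$line"
  done | grep -v '^unknown$' | sort | uniq -c
}
```

To summarize the last 30 minutes, pipe the log command into the helper:

    stella crypto hsm logs --last 30m | summarize_hsm_errors

The helper drops lines that match no pattern, including the generic "PKCS#11 operation failed". An empty summary therefore means the log text still needs a manual read.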


Resolution

Immediate mitigation

  1. Attempt HSM reconnection:

    stella crypto hsm reconnect
    
  2. Use the backup HSM if one is configured:

    stella crypto hsm failover --to backup
    
  3. If no HSM is reachable, switch to software signing (only if permitted):

    stella attest config set signing.mode software
    stella attest reload
    

    Warning: Software signing may not meet compliance requirements. Switch signing.mode back once HSM signing is restored.
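The mitigation steps form an escalation ladder. Below is a minimal sketch that tries reconnection first, then the backup HSM, and verifies each step with stella crypto hsm test. It deliberately stops short of software signing, because that step needs a compliance decision. It assumes the stella commands exit non-zero on failure and that a backup HSM is configured. hsm_mitigate is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch of the immediate-mitigation ladder: reconnect, then fail over.
# ASSUMPTIONS: stella exits non-zero on failure, a backup HSM is configured,
# and "stella crypto hsm test" passes only when HSM signing works.
# Software signing is deliberately NOT automated: it may not meet compliance
# requirements and needs an explicit human decision.
hsm_mitigate() {
  echo "Step 1: reconnecting to the primary HSM"
  if stella crypto hsm reconnect && stella crypto hsm test; then
    echo "Primary HSM restored."
    return 0
  fi

  echo "Step 2: failing over to the backup HSM"
  if stella crypto hsm failover --to backup && stella crypto hsm test; then
    echo "Backup HSM active; continue root-cause work on the primary."
    return 0
  fi

  echo "No HSM available. Software signing requires explicit approval:" >&2
  echo "  stella attest config set signing.mode software" >&2
  echo "  stella attest reload" >&2
  return 1
}
```

Run it from an operator shell rather than an unattended job, because a failover changes which HSM production signing uses.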
    

Root cause fix

If network connectivity issue:

  1. Check HSM network path:

    stella crypto hsm connectivity --verbose
    
  2. Verify that firewall rules allow traffic to the HSM port (typically 1792 for Luna, 2225 for SafeNet; confirm the port for your model in the vendor documentation)

  3. Check HSM server status with vendor tools
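Before escalating to the network team, confirm from the Attestor host whether the HSM port accepts TCP connections at all. Below is a minimal sketch. It assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are installed. probe_hsm_port is a hypothetical helper. Take the host and port you pass from your HSM configuration and vendor documentation.

```sh
#!/bin/sh
# Sketch: test whether the HSM service port accepts TCP connections from here.
# ASSUMPTIONS: bash (for /dev/tcp) and coreutils `timeout` are installed.
probe_hsm_port() {
  local host="$1" port="$2" secs="${3:-5}" rc
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 2 ;;
  esac
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "port out of range: $port" >&2
    return 2
  fi
  # Open and immediately close a TCP connection in a child bash; `timeout`
  # kills it if a firewall silently drops the connection attempt.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' probe "$host" "$port" 2>/dev/null
  rc=$?
  case "$rc" in
    0)   echo "open: $host:$port"; return 0 ;;
    124) echo "timed out after ${secs}s: $host:$port (likely filtered by a firewall)" >&2 ;;
    127) echo "bash or timeout not found on this host" >&2; return 2 ;;
    *)   echo "failed: $host:$port (connection refused or host not resolvable)" >&2 ;;
  esac
  return 1
}
```

Example: probe_hsm_port hsm01.example.internal 1792 (the hostname is hypothetical). A refused connection usually means nothing is listening or a firewall is actively rejecting traffic. A timeout usually means packets are being silently dropped.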

If session timeout:

  1. Increase session timeout:

    stella crypto hsm config set session.timeout 300s
    stella crypto hsm reconnect
    
  2. Enable session keep-alive:

    stella crypto hsm config set session.keepalive true
    stella crypto hsm config set session.keepalive_interval 60s
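A keep-alive only helps if it fires before the session expires, so keepalive_interval must be shorter than session.timeout (60s against 300s above). Below is a minimal sketch of that sanity check. It assumes session.timeout is an idle timeout that each keep-alive resets. It also assumes durations are written as an integer with an optional s, m or h suffix; only s appears in this runbook. to_seconds and check_keepalive are hypothetical helpers.

```sh
#!/bin/sh
# Sketch: check that the keep-alive fires before an idle session expires.
# ASSUMPTIONS: session.timeout is an idle timeout reset by each keep-alive, and
# durations are an integer with an optional s/m/h suffix (only "s" appears in
# this runbook). Leading zeros are rejected to avoid octal arithmetic.

# Convert "300s", "5m", "1h" or "300" to seconds; fail on anything else.
to_seconds() {
  local num unit
  num="${1%[smh]}"
  unit="${1#"$num"}"
  case "$num" in
    ''|*[!0-9]*|0?*) return 1 ;;
  esac
  case "$unit" in
    h) echo $((num * 3600)) ;;
    m) echo $((num * 60)) ;;
    *) echo "$num" ;;
  esac
}

check_keepalive() {
  local timeout_s interval_s
  timeout_s=$(to_seconds "$1") || { echo "invalid session.timeout: $1" >&2; return 2; }
  interval_s=$(to_seconds "$2") || { echo "invalid keepalive_interval: $2" >&2; return 2; }
  if [ "$interval_s" -ge "$timeout_s" ]; then
    echo "keepalive_interval ($2) must be shorter than session.timeout ($1)" >&2
    return 1
  fi
  echo "ok: keep-alive every ${interval_s}s, session times out after ${timeout_s}s idle"
}
```

Run check_keepalive 300s 60s before applying the config set commands. Reversing the arguments makes it fail.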
    

If authentication failed:

  1. Verify HSM credentials:

    stella crypto hsm auth verify
    
  2. Update HSM PIN if changed:

    stella crypto hsm auth update --slot <slot-id>
    

If PKCS#11 library issue:

  1. Verify library path:

    stella crypto hsm config get pkcs11.library_path
    
  2. Reload PKCS#11 library:

    stella crypto hsm pkcs11-reload
    
  3. Check library compatibility:

    stella crypto hsm pkcs11-info
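A wrong or unreadable pkcs11.library_path is quick to rule out locally. Below is a minimal sketch for a Linux host. It assumes stella crypto hsm config get pkcs11.library_path prints the bare path. Run it as the account the Attestor runs under, because file permissions differ per user. check_pkcs11_library is a hypothetical helper. It only checks that the file exists, is readable, and starts with the ELF magic bytes; it does not load the library.

```sh
#!/bin/sh
# Sketch (Linux): sanity-check the configured PKCS#11 module file.
# Checks existence, readability for the current user, and the ELF magic bytes.
# It does NOT load the library or call C_Initialize.
check_pkcs11_library() {
  local lib="$1" magic
  if [ -z "$lib" ]; then
    echo "pkcs11.library_path is empty" >&2
    return 1
  fi
  if [ ! -f "$lib" ]; then
    echo "not found: $lib" >&2
    return 1
  fi
  if [ ! -r "$lib" ]; then
    echo "not readable by $(id -un): $lib" >&2
    return 1
  fi
  # ELF shared objects start with the four bytes 7f 45 4c 46 ("\177ELF").
  magic=$(od -An -tx1 -N4 "$lib" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not an ELF shared object: $lib" >&2
    return 1
  fi
  echo "ok: $lib"
}
```

Usage, assuming the config command prints only the path:

    check_pkcs11_library "$(stella crypto hsm config get pkcs11.library_path)"

On glibc systems, ldd on the same path followed by grep 'not found' additionally lists missing shared-library dependencies.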
    

Verification

    # Test HSM connectivity
    stella crypto hsm test

    # Test signing operation
    stella attest test-sign

    # Verify key access
    stella keys verify <key-id> --operation sign

    # Check that no new errors appear in the logs (errors from before the fix can still fall inside the 30m window)
    stella crypto hsm logs --level error --last 30m
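The verification commands can be wrapped so the result is a single pass or fail. Below is a minimal sketch. It assumes each stella command exits non-zero on failure, that stella crypto hsm logs --level error prints nothing when no errors were logged, and that --last accepts other durations such as 5m. hsm_verify is a hypothetical helper. Its optional second argument sets the --last window; pick one that starts after recovery so errors from the incident itself do not fail the check.

```sh
#!/bin/sh
# Sketch: run the runbook's verification commands and fail on the first problem.
# ASSUMPTIONS: stella exits non-zero on failure; the error-log query prints
# nothing when no errors were logged; --last accepts durations like 5m.
hsm_verify() {
  local key_id="$1" window="${2:-30m}" errors
  if [ -z "$key_id" ]; then
    echo "usage: hsm_verify <key-id> [log-window]" >&2
    return 2
  fi
  stella crypto hsm test || { echo "FAILED: HSM connectivity" >&2; return 1; }
  stella attest test-sign || { echo "FAILED: test signing" >&2; return 1; }
  stella keys verify "$key_id" --operation sign || { echo "FAILED: key access" >&2; return 1; }
  errors=$(stella crypto hsm logs --level error --last "$window") ||
    { echo "FAILED: could not read HSM logs" >&2; return 1; }
  if [ -n "$errors" ]; then
    echo "HSM errors logged in the last $window:" >&2
    printf '%s\n' "$errors" >&2
    return 1
  fi
  echo "Verification passed."
}
```

Example: hsm_verify <key-id> 5m checks only the five minutes after recovery.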

Prevention

  • Redundancy: Configure backup HSM for failover
  • Monitoring: Alert on HSM connection failures immediately
  • Keep-alive: Enable session keep-alive to prevent timeouts
  • Testing: Run the check.crypto.hsm-availability Doctor check as part of regular health checks

Related documentation

  • Architecture: docs/modules/cryptography/hsm-integration.md
  • Related runbooks: attestor-signing-failed.md, crypto-ops.md
  • Doctor check: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Crypto/
  • HSM setup: docs/operations/hsm-configuration.md