Key Escrow and Recovery Runbook
This runbook documents the key escrow and recovery procedures in Stella Ops, which are based on Shamir's Secret Sharing.
Sprint: SPRINT_20260112_018_CRYPTO_key_escrow_shamir
Overview
Key escrow ensures critical cryptographic keys can be recovered if primary access is lost. Stella Ops uses Shamir's Secret Sharing to split keys into shares distributed among trusted custodians.
Key features:
- M-of-N threshold recovery (any M shares reconstruct the key)
- Share encryption at rest
- Custodian-based share distribution
- Integration with dual-control ceremonies
- Full audit trail
When to Use Key Escrow
| Scenario | Escrow Required |
|---|---|
| Root signing keys | Yes |
| HSM master keys | Yes |
| Trust anchor keys | Yes |
| Service signing keys | Recommended |
| User signing keys | Optional |
| Ephemeral keys | No |
Shamir Secret Sharing
How It Works
Shamir's Secret Sharing splits a secret into N shares where any M shares can reconstruct the original:
Secret S → Split(S, M, N) → [Share₁, Share₂, ..., Shareₙ]
Any M shares → Combine → Secret S
Fewer than M shares → Cannot reconstruct (and reveal nothing about S)
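The split and combine steps above can be sketched in a few lines of Python. This is a minimal illustration of the general technique (a random polynomial over a prime field, recovered by Lagrange interpolation), not the Stella Ops implementation. The field prime, the integer encoding of the secret, and the function names `split` and `combine` are assumptions made for the example.

```python
import secrets

# Assumption: an illustrative prime field. 2**127 - 1 is a Mersenne prime.
# A real deployment must use a field larger than the secret, or split the
# key in smaller chunks.
PRIME = 2**127 - 1


def split(secret: int, threshold: int, total: int) -> list[tuple[int, int]]:
    """Split `secret` into `total` shares; any `threshold` of them recover it."""
    if not (2 <= threshold <= total < PRIME):
        raise ValueError("need 2 <= threshold <= total")
    if not (0 <= secret < PRIME):
        raise ValueError("secret must fit in the field")
    # Random polynomial of degree threshold - 1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, total + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split(secret=123456789, threshold=3, total=5)
recovered = combine([shares[0], shares[2], shares[4]])  # any 3 of the 5 shares
```

Any three of the five shares passed to `combine` return the original value. With only two shares, the interpolation yields an unrelated field element.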
Configuration Parameters
| Parameter | Description | Recommended |
|---|---|---|
| Threshold (M) | Minimum shares needed | 2-3, depending on key type (see Threshold Guidelines) |
| Total Shares (N) | Total shares created | M + 2 where practical; at least M + 1 |
| Share Encryption | Encrypt shares at rest | Always enabled |
Threshold Guidelines
| Key Type | Minimum M | Recommended N | Rationale |
|---|---|---|---|
| Root keys | 3 | 5 | High assurance |
| HSM keys | 2 | 4 | Availability + security |
| Service keys | 2 | 3 | Operational recovery |
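The trade-off behind these numbers is simple arithmetic: an M-of-N scheme survives the loss of N - M shares, and M custodians must cooperate (or collude) to reconstruct the key. The sketch below works this out for the three rows of the table. The class and attribute names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdPolicy:
    key_type: str
    threshold: int      # M
    total_shares: int   # N

    @property
    def loss_tolerance(self) -> int:
        """Shares that can be lost while recovery is still possible."""
        return self.total_shares - self.threshold

    @property
    def collusion_size(self) -> int:
        """Custodians who must cooperate to reconstruct the key."""
        return self.threshold


# Values taken from the Threshold Guidelines table.
POLICIES = [
    ThresholdPolicy("root", 3, 5),
    ThresholdPolicy("hsm", 2, 4),
    ThresholdPolicy("service", 2, 3),
]

for p in POLICIES:
    print(
        f"{p.key_type}: {p.threshold}-of-{p.total_shares}, "
        f"tolerates {p.loss_tolerance} lost share(s), "
        f"needs {p.collusion_size} custodians to reconstruct"
    )
```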
Escrowing a Key
Via CLI
stella escrow create \
  --key-id root-signing-key-2026 \
  --threshold 3 \
  --shares 5 \
  --custodians custodian-1,custodian-2,custodian-3,custodian-4,custodian-5 \
  --expires-in 365d \
  --reason "Annual key escrow for root signing key"
Via API
curl -X POST https://signer.example.com/api/v1/escrow \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "keyId": "root-signing-key-2026",
    "threshold": 3,
    "totalShares": 5,
    "custodianIds": [
      "custodian-1", "custodian-2", "custodian-3",
      "custodian-4", "custodian-5"
    ],
    "expirationDays": 365,
    "reason": "Annual key escrow for root signing key"
  }'
Escrow Response
{
  "escrowId": "esc-abc123",
  "keyId": "root-signing-key-2026",
  "threshold": 3,
  "totalShares": 5,
  "status": "Active",
  "createdAt": "2026-01-16T10:00:00Z",
  "expiresAt": "2027-01-16T10:00:00Z",
  "shares": [
    { "shareId": "shr-001", "custodianId": "custodian-1", "distributed": true },
    { "shareId": "shr-002", "custodianId": "custodian-2", "distributed": true },
    { "shareId": "shr-003", "custodianId": "custodian-3", "distributed": true },
    { "shareId": "shr-004", "custodianId": "custodian-4", "distributed": true },
    { "shareId": "shr-005", "custodianId": "custodian-5", "distributed": true }
  ]
}
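When escrow creation is scripted, it can help to confirm that every share in the response is marked as distributed before treating the escrow as complete. The following is a minimal sketch that relies only on the field names shown in the response above. The helper name `undistributed_custodians` is hypothetical.

```python
import json


def undistributed_custodians(response_body: str) -> list[str]:
    """Return custodian IDs whose share is not yet marked as distributed."""
    escrow = json.loads(response_body)
    shares = escrow["shares"]
    if len(shares) != escrow["totalShares"]:
        raise ValueError(
            f"expected {escrow['totalShares']} shares, got {len(shares)}"
        )
    return [s["custodianId"] for s in shares if not s.get("distributed", False)]


# Trimmed-down sample in the shape of the response above. custodian-2 is
# deliberately marked as undistributed to show the failure case.
sample = json.dumps({
    "escrowId": "esc-abc123",
    "totalShares": 2,
    "shares": [
        {"shareId": "shr-001", "custodianId": "custodian-1", "distributed": True},
        {"shareId": "shr-002", "custodianId": "custodian-2", "distributed": False},
    ],
})
pending = undistributed_custodians(sample)
print("pending:", pending)
```

An empty list means all shares were delivered. Anything else should be followed up with the listed custodians.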
Share Distribution
Distribution Methods
| Method | Security | Use Case |
|---|---|---|
| Direct API delivery | High | Automated systems |
| Encrypted email | Medium | Remote custodians |
| In-person ceremony | Highest | Root keys |
| Hardware token | Highest | HSM keys |
Custodian Requirements
Each custodian must:
- Have verified identity in Authority
- Complete escrow custodian training
- Have secure share storage capability
- Be geographically distributed (recommended)
Verifying Share Distribution
stella escrow status --escrow-id esc-abc123
# Output:
# Escrow: esc-abc123
# Key: root-signing-key-2026
# Status: Active
# Threshold: 3 of 5
# Shares:
# [1] custodian-1: Distributed ✓
# [2] custodian-2: Distributed ✓
# [3] custodian-3: Distributed ✓
# [4] custodian-4: Distributed ✓
# [5] custodian-5: Distributed ✓
Key Recovery
Prerequisites
Recovery requires:
- Valid recovery request (incident, key loss, rotation)
- Dual-control ceremony approval (if configured)
- Minimum M custodians available with shares
- Secure recovery environment
Recovery Workflow
1. Initiate recovery request
2. (If required) Dual-control ceremony approval
3. Collect shares from M custodians
4. Verify share checksums
5. Reconstruct key
6. Verify reconstructed key
7. Log recovery event
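Step 4 of the workflow (verifying share checksums) can be sketched as follows. The runbook shows checksums in the form `sha256:<hex>` but does not state which bytes are hashed. This example assumes the digest covers the base64-decoded encrypted share, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def checksum_of(encrypted_share_b64: str) -> str:
    """Compute a 'sha256:<hex>' checksum over the decoded share bytes.

    Assumption: the checksum covers the raw (base64-decoded) ciphertext.
    """
    raw = base64.b64decode(encrypted_share_b64, validate=True)
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def verify_checksum(encrypted_share_b64: str, expected: str) -> bool:
    """Compare the computed and expected checksums in constant time."""
    return hmac.compare_digest(checksum_of(encrypted_share_b64), expected.lower())


# Round trip on placeholder data (not a real share).
blob = base64.b64encode(b"placeholder ciphertext").decode("ascii")
recorded = checksum_of(blob)
print(verify_checksum(blob, recorded))
```

A share whose checksum does not verify should be rejected before reconstruction is attempted, as described under Troubleshooting.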
Via CLI
# Step 1: Initiate recovery
stella escrow recover init \
  --escrow-id esc-abc123 \
  --reason "HSM failure - emergency key recovery" \
  --ceremony-required
# Step 2: Submit shares (each participating custodian runs this against the recovery ID of the request opened in step 1)
stella escrow recover submit-share \
  --recovery-id rec-xyz789 \
  --share-file /secure/my-share.enc \
  --passphrase-file /secure/passphrase
# Step 3: Execute recovery (once at least M shares have been submitted)
stella escrow recover execute \
  --recovery-id rec-xyz789 \
  --output-key-file /secure/recovered-key.pem
Via API
# Initiate recovery
curl -X POST https://signer.example.com/api/v1/escrow/esc-abc123/recover \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "HSM failure - emergency key recovery",
    "requireCeremony": true
  }'
# Submit share
curl -X POST https://signer.example.com/api/v1/recovery/rec-xyz789/shares \
  -H "Authorization: Bearer $CUSTODIAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "shareId": "shr-001",
    "encryptedShare": "base64-encoded-share",
    "checksum": "sha256:abc123..."
  }'
# Execute recovery (after threshold)
curl -X POST https://signer.example.com/api/v1/recovery/rec-xyz789/execute \
  -H "Authorization: Bearer $TOKEN"
Recovery Response
{
  "recoveryId": "rec-xyz789",
  "status": "Completed",
  "keyId": "root-signing-key-2026",
  "sharesCollected": 3,
  "threshold": 3,
  "completedAt": "2026-01-16T15:30:00Z",
  "keyFingerprint": "SHA256:xyz789...",
  "verified": true
}
Share Management
Custodian Share Storage
Custodians should store their shares using one of the following methods:
| Storage | Security Level | Notes |
|---|---|---|
| HSM | Highest | Preferred for root keys |
| Hardware token | High | YubiKey, smart card |
| Encrypted file | Medium | AES-256-GCM minimum |
| Password manager | Medium | Enterprise vault only |
Share Format
{
  "shareId": "shr-001",
  "escrowId": "esc-abc123",
  "index": 1,
  "threshold": 3,
  "totalShares": 5,
  "encryptedData": "base64-encoded-aes-256-gcm-ciphertext",
  "checksum": "sha256:abc123...",
  "createdAt": "2026-01-16T10:00:00Z",
  "expiresAt": "2027-01-16T10:00:00Z"
}
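A custodian-side tool might load a share file in this format and run basic sanity checks before submitting it. The sketch below uses only the field names shown above. The `ShareRecord` class, the `parse_share` helper, and the specific checks are illustrative assumptions, not part of Stella Ops.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ShareRecord:
    share_id: str
    escrow_id: str
    index: int
    threshold: int
    total_shares: int
    expires_at: datetime

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at


def parse_share(text: str) -> ShareRecord:
    """Parse a share document and reject obviously inconsistent metadata."""
    doc = json.loads(text)
    record = ShareRecord(
        share_id=doc["shareId"],
        escrow_id=doc["escrowId"],
        index=int(doc["index"]),
        threshold=int(doc["threshold"]),
        total_shares=int(doc["totalShares"]),
        # Replace the trailing "Z" so older Python versions can parse it.
        expires_at=datetime.fromisoformat(doc["expiresAt"].replace("Z", "+00:00")),
    )
    if not 1 <= record.index <= record.total_shares:
        raise ValueError("share index outside 1..totalShares")
    if not 1 <= record.threshold <= record.total_shares:
        raise ValueError("threshold outside 1..totalShares")
    return record


example = """{
  "shareId": "shr-001", "escrowId": "esc-abc123", "index": 1,
  "threshold": 3, "totalShares": 5,
  "expiresAt": "2027-01-16T10:00:00Z"
}"""
share = parse_share(example)
print(share.is_expired(datetime.now(timezone.utc)))
```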
Share Rotation
Re-escrow keys periodically:
stella escrow re-escrow \
  --escrow-id esc-abc123 \
  --new-custodians custodian-1,custodian-2,custodian-6,custodian-7,custodian-8 \
  --reason "Annual share rotation"
This creates a new set of shares for the listed custodians and revokes the old ones.
Audit Trail
Audit Events
| Event | Description |
|---|---|
| escrow.created | Key escrowed |
| escrow.share.distributed | Share sent to custodian |
| escrow.share.accessed | Custodian accessed share |
| recovery.initiated | Recovery started |
| recovery.share.submitted | Share submitted for recovery |
| recovery.completed | Key reconstructed |
| recovery.failed | Recovery failed |
| escrow.revoked | Escrow revoked |
Query Audit Logs
stella audit query \
  --event-type "escrow.*,recovery.*" \
  --escrow-id esc-abc123 \
  --since 30d
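The `--event-type` value above looks like a comma-separated list of glob patterns. The runbook does not specify the exact matching rules, so the sketch below only illustrates one plausible reading, using Python's `fnmatch` module. The same filter could be applied to exported audit records. `ceremony.approved` is a made-up event name used as a negative example.

```python
from fnmatch import fnmatchcase


def matches(event: str, pattern_list: str) -> bool:
    """Return True if `event` matches any comma-separated glob pattern.

    Assumption: glob semantics, where '*' also matches dots.
    """
    patterns = [p.strip() for p in pattern_list.split(",") if p.strip()]
    return any(fnmatchcase(event, p) for p in patterns)


FILTER = "escrow.*,recovery.*"
events = [
    "escrow.created",
    "escrow.share.distributed",
    "recovery.failed",
    "ceremony.approved",  # hypothetical event, not listed in this runbook
]
selected = [e for e in events if matches(e, FILTER)]
print(selected)
```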
Configuration
Escrow Settings
# escrow-config.yaml
escrow:
  enabled: true
  defaultThreshold: 2
  minimumThreshold: 2
  maximumShares: 10
  shareEncryption:
    algorithm: AES-256-GCM
    keyDerivation: HKDF-SHA256
  requireDualControlForRecovery: true
  maxRecoveryAttempts: 3
  recoveryTimeoutHours: 24
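The threshold-related settings imply a few constraints on any escrow request. The sketch below shows how such a request could be pre-checked against them before calling the CLI or API. How the service itself enforces these limits is not described in this runbook, so the rules in `check_request` (a hypothetical helper) are assumptions derived from the setting names.

```python
from typing import Optional

# Values copied from the escrow settings above.
SETTINGS = {"defaultThreshold": 2, "minimumThreshold": 2, "maximumShares": 10}


def check_request(
    shares: int,
    threshold: Optional[int] = None,
    settings: dict = SETTINGS,
) -> list[str]:
    """Return a list of problems with an M-of-N request (empty if none)."""
    # Assumption: the default threshold applies when none is given.
    m = settings["defaultThreshold"] if threshold is None else threshold
    problems = []
    if m < settings["minimumThreshold"]:
        problems.append(f"threshold {m} is below minimumThreshold")
    if shares > settings["maximumShares"]:
        problems.append(f"{shares} shares exceeds maximumShares")
    if m > shares:
        problems.append(f"threshold {m} exceeds total shares {shares}")
    return problems


print(check_request(shares=5, threshold=3))
print(check_request(shares=12, threshold=1))
```

The first call returns an empty list. The second reports both a threshold below the minimum and too many shares.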
Custodian Configuration
# custodians.yaml
custodians:
  - id: custodian-1
    name: "Security Lead"
    email: security-lead@company.com
    publicKey: "-----BEGIN PUBLIC KEY-----..."
    location: "US-East"
  - id: custodian-2
    name: "Key Officer A"
    email: key-officer-a@company.com
    publicKey: "-----BEGIN PUBLIC KEY-----..."
    location: "EU-West"
Security Considerations
Share Security
- Never transmit shares in plaintext
- Encrypt shares with custodian's public key
- Verify checksums before and after storage
- Use secure channels for distribution
Recovery Security
- Require dual-control ceremonies for critical keys
- Limit recovery time window
- Verify recovered key fingerprint
- Audit all recovery attempts
Custodian Security
- Verify custodian identity before share access
- Geographic distribution reduces collusion risk
- Rotate custodians periodically
- Train custodians on secure handling
Troubleshooting
Common Issues
| Issue | Cause | Resolution |
|---|---|---|
| Share checksum mismatch | Corrupted share | Request re-distribution |
| Cannot decrypt share | Wrong passphrase | Verify passphrase |
| Recovery timeout | Shares not collected in time | Restart recovery |
| Key verification failed | Wrong shares combined | Verify share indices |
Verification Failures
# Verify share integrity
stella escrow verify-share --share-file share.enc
# Test reconstruction with subset
stella escrow test-recovery \
  --escrow-id esc-abc123 \
  --share-files share1.enc,share2.enc,share3.enc
Emergency Procedures
Lost Share
If a custodian loses their share:
- Verify at least M shares remain accessible
- Re-escrow with new share set
- Revoke the old escrow once the new share set is in place
- Document incident
Compromised Custodian
If a custodian is compromised:
- Do NOT use their share for any recovery
- Re-escrow immediately with new custodians
- Revoke old escrow
- Consider rotating the key itself if M or more shares may have been exposed
Multiple Lost Shares
If fewer than M shares are available:
- Key cannot be recovered via escrow
- Use backup key if available
- Generate new key and re-establish trust
- Document as key loss incident
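The three emergency scenarios above reduce to a small decision rule over the state of each share. The sketch below encodes one reading of that rule. The state names and the `assess` helper are hypothetical, and "threshold was exposed" is interpreted here as M or more shares being compromised.

```python
from enum import Enum


class ShareState(Enum):
    AVAILABLE = "available"
    LOST = "lost"
    COMPROMISED = "compromised"


def assess(threshold: int, states: list[ShareState]) -> dict[str, bool]:
    """Summarise what the emergency procedures call for."""
    usable = sum(s is ShareState.AVAILABLE for s in states)
    compromised = sum(s is ShareState.COMPROMISED for s in states)
    return {
        # Compromised shares are never counted towards recovery.
        "recoverable": usable >= threshold,
        # Any lost or compromised share calls for a fresh share set.
        "re_escrow_needed": usable < len(states),
        # Interpretation: the key may be exposed once M shares are compromised.
        "consider_key_rotation": compromised >= threshold,
    }


A, L, C = ShareState.AVAILABLE, ShareState.LOST, ShareState.COMPROMISED
print(assess(3, [A, A, A, L, C]))   # one lost, one compromised, 3-of-5
print(assess(3, [A, A, L, L, L]))   # fewer than M shares remain
```

In the second case the key cannot be recovered via escrow, and the "Multiple Lost Shares" procedure applies.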