# Key Escrow and Recovery Runbook

This runbook documents key escrow and recovery procedures in Stella Ops based on Shamir's Secret Sharing.

> **Sprint:** SPRINT_20260112_018_CRYPTO_key_escrow_shamir

## Overview

Key escrow ensures critical cryptographic keys can be recovered if primary access is lost. Stella Ops uses Shamir's Secret Sharing to split keys into shares distributed among trusted custodians.

Key features:

- M-of-N threshold recovery (any M shares reconstruct the key)
- Share encryption at rest
- Custodian-based share distribution
- Integration with dual-control ceremonies
- Full audit trail

## When to Use Key Escrow

| Scenario | Escrow Required |
|----------|-----------------|
| Root signing keys | Yes |
| HSM master keys | Yes |
| Trust anchor keys | Yes |
| Service signing keys | Recommended |
| User signing keys | Optional |
| Ephemeral keys | No |

## Shamir Secret Sharing

### How It Works

Shamir's Secret Sharing splits a secret into N shares so that any M of them can reconstruct the original, and fewer than M reveal nothing about it:

```
Secret S → Split(S, M, N) → [Share₁, Share₂, ..., Shareₙ]

Any M shares → Combine → Secret S
Fewer than M shares → Cannot reconstruct S
```

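The scheme fits in a few lines of Python over a prime field. The secret becomes the constant term of a random polynomial of degree M - 1. Each share is one point on that polynomial, and Lagrange interpolation at x = 0 recovers the secret from any M points. This is a minimal illustration, not the Stella Ops implementation. The prime `2**127 - 1`, the integer encoding of the key, and the names `split` and `combine` are assumptions for this example. Production escrow should rely on a vetted library.

```python
import secrets

# Assumption: a Mersenne prime field; secrets must be integers below it.
PRIME = 2**127 - 1


def _eval_poly(coeffs, x, prime=PRIME):
    """Evaluate a polynomial (constant term first) at x with Horner's rule."""
    result = 0
    for coeff in reversed(coeffs):
        result = (result * x + coeff) % prime
    return result


def split(secret, threshold, total, prime=PRIME):
    """Split an integer secret into `total` points; any `threshold` recover it."""
    if not 0 <= secret < prime:
        raise ValueError("secret must be in [0, prime)")
    if not 1 < threshold <= total:
        raise ValueError("require 1 < threshold <= total")
    # Random polynomial of degree threshold - 1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]
    return [(x, _eval_poly(coeffs, x, prime)) for x in range(1, total + 1)]


def combine(shares, prime=PRIME):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        num, den = 1, 1
        for j, (x_j, _) in enumerate(shares):
            if i != j:
                num = (num * -x_j) % prime
                den = (den * (x_i - x_j)) % prime
        # den is invertible because share x-coordinates are distinct.
        secret = (secret + y_i * num * pow(den, prime - 2, prime)) % prime
    return secret


if __name__ == "__main__":
    key = secrets.randbits(120)  # stand-in for real key material
    shares = split(key, threshold=3, total=5)
    print(combine(shares[:3]) == key, combine(shares[2:]) == key)
```

Passing only two of the five shares to `combine` still returns a number. With overwhelming probability, that number is unrelated to the key, which is the "cannot reconstruct" case above.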
### Configuration Parameters

| Parameter | Description | Recommended |
|-----------|-------------|-------------|
| Threshold (M) | Minimum shares needed | 2-3 for keys |
| Total Shares (N) | Total shares created | M + 2 minimum |
| Share Encryption | Encrypt shares at rest | Always enabled |

### Threshold Guidelines

| Key Type | Minimum M | Recommended N | Rationale |
|----------|-----------|---------------|-----------|
| Root keys | 3 | 5 | High assurance |
| HSM keys | 2 | 4 | Availability + security |
| Service keys | 2 | 3 | Operational recovery |

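The trade-off behind the table comes down to two numbers:

- An M-of-N escrow can still be recovered after losing N - M shares.
- Any M - 1 custodians who pool their shares learn nothing about the key.

The Python sketch below computes both for each row of the Threshold Guidelines table. `ThresholdPolicy` is a hypothetical helper, not part of Stella Ops.

```python
from typing import NamedTuple


class ThresholdPolicy(NamedTuple):
    key_type: str
    threshold: int     # M
    total_shares: int  # N

    @property
    def tolerable_losses(self) -> int:
        """Shares that can be lost while recovery is still possible (N - M)."""
        return self.total_shares - self.threshold

    @property
    def max_safe_collusion(self) -> int:
        """Custodians who can pool shares without learning the key (M - 1)."""
        return self.threshold - 1


# Rows copied from the Threshold Guidelines table.
GUIDELINES = [
    ThresholdPolicy("Root keys", 3, 5),
    ThresholdPolicy("HSM keys", 2, 4),
    ThresholdPolicy("Service keys", 2, 3),
]

for policy in GUIDELINES:
    print(
        f"{policy.key_type}: {policy.threshold}-of-{policy.total_shares}, "
        f"tolerates {policy.tolerable_losses} lost share(s), "
        f"{policy.max_safe_collusion} colluding custodian(s) learn nothing"
    )
```

For the service-key row (2-of-3), losing a single share already leaves no margin for a second loss.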
## Escrowing a Key

### Via CLI

```bash
stella escrow create \
  --key-id root-signing-key-2026 \
  --threshold 3 \
  --shares 5 \
  --custodians custodian-1,custodian-2,custodian-3,custodian-4,custodian-5 \
  --expires-in 365d \
  --reason "Annual key escrow for root signing key"
```

### Via API

```bash
curl -X POST https://signer.example.com/api/v1/escrow \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "keyId": "root-signing-key-2026",
    "threshold": 3,
    "totalShares": 5,
    "custodianIds": [
      "custodian-1", "custodian-2", "custodian-3",
      "custodian-4", "custodian-5"
    ],
    "expirationDays": 365,
    "reason": "Annual key escrow for root signing key"
  }'
```

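For scripted escrow creation, the same request can be assembled in Python with only the standard library. The sketch below builds the request without sending it. `build_escrow_request` is a hypothetical helper, and the field names are copied from the curl example. Deriving `totalShares` from the custodian list assumes one share per custodian, as in the examples in this runbook.

```python
import json
import urllib.request


def build_escrow_request(base_url, token, key_id, threshold, custodian_ids,
                         expiration_days, reason):
    """Build (but do not send) a POST /api/v1/escrow request."""
    if not 1 < threshold <= len(custodian_ids):
        raise ValueError("threshold must be between 2 and the number of custodians")
    body = {
        "keyId": key_id,
        "threshold": threshold,
        # Assumption: one share per custodian, so N is the custodian count.
        "totalShares": len(custodian_ids),
        "custodianIds": list(custodian_ids),
        "expirationDays": expiration_days,
        "reason": reason,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/escrow",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_escrow_request(
    "https://signer.example.com", "example-token", "root-signing-key-2026",
    3, [f"custodian-{i}" for i in range(1, 6)], 365,
    "Annual key escrow for root signing key",
)
# Send with urllib.request.urlopen(req) once a real token is configured.
```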
### Escrow Response

```json
{
  "escrowId": "esc-abc123",
  "keyId": "root-signing-key-2026",
  "threshold": 3,
  "totalShares": 5,
  "status": "Active",
  "createdAt": "2026-01-16T10:00:00Z",
  "expiresAt": "2027-01-16T10:00:00Z",
  "shares": [
    { "shareId": "shr-001", "custodianId": "custodian-1", "distributed": true },
    { "shareId": "shr-002", "custodianId": "custodian-2", "distributed": true },
    { "shareId": "shr-003", "custodianId": "custodian-3", "distributed": true },
    { "shareId": "shr-004", "custodianId": "custodian-4", "distributed": true },
    { "shareId": "shr-005", "custodianId": "custodian-5", "distributed": true }
  ]
}
```

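Before closing out an escrow, confirm that every share reached its custodian. The following Python sketch reads a response body shaped like the example above and reports:

- which custodians are still pending;
- whether enough shares are out for recovery to be possible at all.

`distribution_report` is a hypothetical helper and assumes the field names shown in the example response.

```python
import json


def distribution_report(response_body: str) -> dict:
    """Summarise share distribution from an escrow response (hypothetical helper)."""
    escrow = json.loads(response_body)
    pending = [s["custodianId"] for s in escrow["shares"] if not s["distributed"]]
    distributed = len(escrow["shares"]) - len(pending)
    return {
        "escrowId": escrow["escrowId"],
        "distributed": distributed,
        "pending": pending,
        # Recovery is only possible once at least `threshold` shares are out.
        "recoverable": distributed >= escrow["threshold"],
        "complete": not pending and distributed == escrow["totalShares"],
    }


example = """{"escrowId": "esc-abc123", "threshold": 3, "totalShares": 5, "shares": [
  {"shareId": "shr-001", "custodianId": "custodian-1", "distributed": true},
  {"shareId": "shr-002", "custodianId": "custodian-2", "distributed": false}]}"""
print(distribution_report(example))
```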
## Share Distribution

### Distribution Methods

| Method | Security | Use Case |
|--------|----------|----------|
| Direct API delivery | High | Automated systems |
| Encrypted email | Medium | Remote custodians |
| In-person ceremony | Highest | Root keys |
| Hardware token | Highest | HSM keys |

### Custodian Requirements

Each custodian must:

1. Have a verified identity in Authority
2. Complete escrow custodian training
3. Have secure share storage capability
4. Be geographically distributed (recommended)

### Verifying Share Distribution

```bash
stella escrow status --escrow-id esc-abc123

# Output:
# Escrow: esc-abc123
# Key: root-signing-key-2026
# Status: Active
# Threshold: 3 of 5
# Shares:
#   [1] custodian-1: Distributed ✓
#   [2] custodian-2: Distributed ✓
#   [3] custodian-3: Distributed ✓
#   [4] custodian-4: Distributed ✓
#   [5] custodian-5: Distributed ✓
```

## Key Recovery

### Prerequisites

Recovery requires:

1. A valid recovery request (incident, key loss, or rotation)
2. Dual-control ceremony approval (if configured)
3. At least M custodians available with their shares
4. A secure recovery environment

### Recovery Workflow

```
1. Initiate recovery request
2. (If required) Dual-control ceremony approval
3. Collect shares from M custodians
4. Verify share checksums
5. Reconstruct key
6. Verify reconstructed key
7. Log recovery event
```

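The workflow imposes two ordering constraints:

- Approval comes before share collection.
- A share counts toward the threshold only after its checksum is verified.

Both can be modeled as a small state object. This Python sketch is hypothetical: `RecoverySession` is not a Stella Ops type. It assumes a checksum is `sha256:` followed by the hex SHA-256 of the submitted share bytes, mirroring the format in the examples. Reconstruction and key verification (steps 5 and 6) are left out.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RecoverySession:
    """Hypothetical model of the recovery workflow's ordering rules."""
    escrow_id: str
    threshold: int
    ceremony_required: bool = True
    ceremony_approved: bool = False
    shares: dict = field(default_factory=dict)  # shareId -> share bytes
    events: list = field(default_factory=list)  # stand-in for the audit log

    def approve_ceremony(self) -> None:
        # Step 2: dual-control approval gates share collection.
        self.ceremony_approved = True

    def submit_share(self, share_id: str, data: bytes, checksum: str) -> None:
        if self.ceremony_required and not self.ceremony_approved:
            raise PermissionError("dual-control approval required before collecting shares")
        # Step 4: reject shares whose checksum does not match (assumed format).
        actual = "sha256:" + hashlib.sha256(data).hexdigest()
        if actual != checksum:
            raise ValueError(f"checksum mismatch for {share_id}")
        self.shares[share_id] = data  # resubmitting a share does not double-count
        self.events.append(f"recovery.share.submitted:{share_id}")

    @property
    def ready(self) -> bool:
        """True once enough distinct shares are collected for step 5."""
        return len(self.shares) >= self.threshold
```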
### Via CLI

```bash
# Step 1: Initiate recovery
stella escrow recover init \
  --escrow-id esc-abc123 \
  --reason "HSM failure - emergency key recovery" \
  --ceremony-required

# Step 2: Submit shares (each custodian runs this with the recovery ID
# for this request; rec-xyz789 in this example)
stella escrow recover submit-share \
  --recovery-id rec-xyz789 \
  --share-file /secure/my-share.enc \
  --passphrase-file /secure/passphrase

# Step 3: Execute recovery (after the threshold is reached)
stella escrow recover execute \
  --recovery-id rec-xyz789 \
  --output-key-file /secure/recovered-key.pem
```

### Via API

```bash
# Initiate recovery
curl -X POST https://signer.example.com/api/v1/escrow/esc-abc123/recover \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "HSM failure - emergency key recovery",
    "requireCeremony": true
  }'

# Submit share
curl -X POST https://signer.example.com/api/v1/recovery/rec-xyz789/shares \
  -H "Authorization: Bearer $CUSTODIAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "shareId": "shr-001",
    "encryptedShare": "base64-encoded-share",
    "checksum": "sha256:abc123..."
  }'

# Execute recovery (after the threshold is reached)
curl -X POST https://signer.example.com/api/v1/recovery/rec-xyz789/execute \
  -H "Authorization: Bearer $TOKEN"
```

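Custodians who submit through the API need to base64-encode their encrypted share and supply a checksum. The following standard-library Python sketch builds that request body. `share_submission` is a hypothetical helper. Computing the checksum as the SHA-256 of the encrypted bytes is an assumption based on the `sha256:` prefix in the example.

```python
import base64
import hashlib
import json


def share_submission(share_id: str, encrypted_share: bytes) -> str:
    """Build the JSON body for POST /api/v1/recovery/{recoveryId}/shares.

    Assumption: the checksum is the hex SHA-256 of the encrypted share
    bytes with a "sha256:" prefix, matching the example's format.
    """
    return json.dumps({
        "shareId": share_id,
        "encryptedShare": base64.b64encode(encrypted_share).decode("ascii"),
        "checksum": "sha256:" + hashlib.sha256(encrypted_share).hexdigest(),
    })


# A custodian would read their share file, for example:
# with open("/secure/my-share.enc", "rb") as f:
#     body = share_submission("shr-001", f.read())
```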
### Recovery Response

```json
{
  "recoveryId": "rec-xyz789",
  "status": "Completed",
  "keyId": "root-signing-key-2026",
  "sharesCollected": 3,
  "threshold": 3,
  "completedAt": "2026-01-16T15:30:00Z",
  "keyFingerprint": "SHA256:xyz789...",
  "verified": true
}
```

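The `"verified": true` field reports the service's own check. Operators may also want to compare the recovered key against a fingerprint recorded independently at escrow time. The `SHA256:` prefix in the example resembles the OpenSSH fingerprint format, which is the unpadded base64 of a SHA-256 digest. This Python sketch makes two assumptions that this runbook does not confirm:

- the fingerprint uses that format;
- the digest is taken over the DER-encoded public key.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_der: bytes) -> str:
    """Format a SHA-256 digest as "SHA256:" plus unpadded base64 (assumed format)."""
    digest = hashlib.sha256(public_key_der).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def fingerprint_matches(public_key_der: bytes, recorded: str) -> bool:
    """Compare a recovered key's fingerprint with one recorded at escrow time."""
    return sha256_fingerprint(public_key_der) == recorded
```

To get DER bytes for the public half of the recovered PEM key, use `openssl pkey -in recovered-key.pem -pubout -outform DER`.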
## Share Management

### Custodian Share Storage

Custodians should store their shares using one of the following options:

| Storage | Security Level | Notes |
|---------|----------------|-------|
| HSM | Highest | Preferred for root keys |
| Hardware token | High | YubiKey, smart card |
| Encrypted file | Medium | AES-256-GCM minimum |
| Password manager | Medium | Enterprise vault only |

### Share Format

```json
{
  "shareId": "shr-001",
  "escrowId": "esc-abc123",
  "index": 1,
  "threshold": 3,
  "totalShares": 5,
  "encryptedData": "base64-encoded-aes-256-gcm-ciphertext",
  "checksum": "sha256:abc123...",
  "createdAt": "2026-01-16T10:00:00Z",
  "expiresAt": "2027-01-16T10:00:00Z"
}
```

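Custodians can catch many storage or copy errors by checking a share document's internal consistency before a recovery is needed. The following Python sketch checks four things:

- the index lies within `totalShares`;
- the threshold is consistent with `totalShares`;
- the checksum looks like a full SHA-256 hex digest;
- the share has not passed `expiresAt`.

`validate_share` is hypothetical. The `sha256:<64 lowercase hex digits>` checksum pattern is an assumption, since the example above shows only a truncated placeholder.

```python
import json
import re
from datetime import datetime, timezone
from typing import List, Optional

CHECKSUM_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def validate_share(share_json: str, now: Optional[datetime] = None) -> List[str]:
    """Return problems found in a share document; an empty list means none."""
    share = json.loads(share_json)
    problems = []
    if not 1 <= share["index"] <= share["totalShares"]:
        problems.append("index out of range")
    if not 1 < share["threshold"] <= share["totalShares"]:
        problems.append("threshold inconsistent with totalShares")
    if not CHECKSUM_RE.match(share["checksum"]):
        problems.append("checksum is not sha256:<64 hex digits>")
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    expires = datetime.fromisoformat(share["expiresAt"].replace("Z", "+00:00"))
    if (now or datetime.now(timezone.utc)) >= expires:
        problems.append("share has expired")
    return problems
```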
### Share Rotation

Re-escrow keys periodically to refresh shares and rotate custodians:

```bash
stella escrow re-escrow \
  --escrow-id esc-abc123 \
  --new-custodians custodian-1,custodian-2,custodian-6,custodian-7,custodian-8 \
  --reason "Annual share rotation"
```

This creates a new share set for the listed custodians and revokes the previous shares.

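If re-escrow re-splits the same key with a fresh random polynomial, the old shares are not mathematically invalidated. Any M of them would still reconstruct the key. Revocation is therefore only as strong as the guarantee that old share material is destroyed or can no longer be decrypted.

The Python sketch below compares the custodian sets from the example command. It flags when the departing custodians alone hold at least M old shares. `rotation_report` is hypothetical, and it assumes one share per custodian and an unchanged threshold of 3.

```python
def rotation_report(old_custodians, new_custodians, threshold):
    """Compare custodian sets before and after re-escrow (hypothetical helper)."""
    old_set, new_set = set(old_custodians), set(new_custodians)
    removed = sorted(old_set - new_set)
    return {
        "retained": sorted(old_set & new_set),
        "added": sorted(new_set - old_set),
        "removed": removed,
        # Assumes one old share per departing custodian: if together they
        # hold at least M old shares, those shares alone could rebuild the key.
        "removed_reach_threshold": len(removed) >= threshold,
    }


report = rotation_report(
    [f"custodian-{i}" for i in range(1, 6)],
    ["custodian-1", "custodian-2", "custodian-6", "custodian-7", "custodian-8"],
    threshold=3,
)
print(report)
```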
## Audit Trail

### Audit Events

| Event | Description |
|-------|-------------|
| `escrow.created` | Key escrowed |
| `escrow.share.distributed` | Share sent to custodian |
| `escrow.share.accessed` | Custodian accessed share |
| `recovery.initiated` | Recovery started |
| `recovery.share.submitted` | Share submitted for recovery |
| `recovery.completed` | Key reconstructed |
| `recovery.failed` | Recovery failed |
| `escrow.revoked` | Escrow revoked |

### Query Audit Logs

```bash
stella audit query \
  --event-type "escrow.*,recovery.*" \
  --escrow-id esc-abc123 \
  --since 30d
```

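This runbook does not specify how `--event-type` matches event names. One plausible reading is that each comma-separated entry is a shell-style glob, so `escrow.*` also matches multi-segment names such as `escrow.share.distributed`. The Python sketch below implements that assumed matching, which can help when post-filtering exported audit records. `matches_event_types` is hypothetical.

```python
from fnmatch import fnmatchcase

# Event names from the Audit Events table.
EVENTS = [
    "escrow.created", "escrow.share.distributed", "escrow.share.accessed",
    "recovery.initiated", "recovery.share.submitted", "recovery.completed",
    "recovery.failed", "escrow.revoked",
]


def matches_event_types(event_type: str, patterns: str) -> bool:
    """Match against a comma-separated glob list such as "escrow.*,recovery.*"."""
    return any(
        fnmatchcase(event_type, p.strip()) for p in patterns.split(",") if p.strip()
    )


print([e for e in EVENTS if matches_event_types(e, "recovery.*")])
```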
## Configuration

### Escrow Settings

```yaml
# escrow-config.yaml
escrow:
  enabled: true
  defaultThreshold: 2
  minimumThreshold: 2
  maximumShares: 10
  shareEncryption:
    algorithm: AES-256-GCM
    keyDerivation: HKDF-SHA256
  requireDualControlForRecovery: true
  maxRecoveryAttempts: 3
  recoveryTimeoutHours: 24
```

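The limits in `escrow-config.yaml` constrain which M-of-N requests are acceptable. The Python sketch below shows one way they could be checked against a request. To stay dependency-free, it mirrors the YAML as a dict; parsing the file itself would need a YAML library such as PyYAML. The exact enforcement rules in `check_escrow_request` are assumptions, not documented Stella Ops behavior.

```python
# Values mirrored from escrow-config.yaml.
ESCROW_CONFIG = {
    "enabled": True,
    "defaultThreshold": 2,
    "minimumThreshold": 2,
    "maximumShares": 10,
    "shareEncryption": {"algorithm": "AES-256-GCM", "keyDerivation": "HKDF-SHA256"},
    "requireDualControlForRecovery": True,
    "maxRecoveryAttempts": 3,
    "recoveryTimeoutHours": 24,
}


def check_escrow_request(config: dict, threshold: int, total_shares: int) -> list:
    """Return the config limits a requested M-of-N would violate (assumed rules)."""
    errors = []
    if not config["enabled"]:
        errors.append("escrow is disabled")
    if threshold < config["minimumThreshold"]:
        errors.append(f"threshold {threshold} is below minimumThreshold {config['minimumThreshold']}")
    if total_shares > config["maximumShares"]:
        errors.append(f"{total_shares} shares exceeds maximumShares {config['maximumShares']}")
    if threshold > total_shares:
        errors.append("threshold exceeds the number of shares")
    return errors


print(check_escrow_request(ESCROW_CONFIG, threshold=3, total_shares=5))   # root-key example
print(check_escrow_request(ESCROW_CONFIG, threshold=1, total_shares=12))
```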
### Custodian Configuration

```yaml
# custodians.yaml
custodians:
  - id: custodian-1
    name: "Security Lead"
    email: security-lead@company.com
    publicKey: "-----BEGIN PUBLIC KEY-----..."
    location: "US-East"

  - id: custodian-2
    name: "Key Officer A"
    email: key-officer-a@company.com
    publicKey: "-----BEGIN PUBLIC KEY-----..."
    location: "EU-West"
```

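Geographic distribution is only a recommendation in Custodian Requirements, so it is worth checking that no single location can reach the threshold on its own. If one location can, a site-level incident or colluding custodians at that site would be enough to rebuild the key. The Python sketch below runs this check over data mirroring `custodians.yaml`. `custodian-3` and `concentrated_locations` are hypothetical, and the sketch assumes one share per custodian.

```python
from collections import Counter

# Mirrors custodians.yaml; custodian-3 is a hypothetical third entry.
CUSTODIANS = {
    "custodian-1": {"name": "Security Lead", "location": "US-East"},
    "custodian-2": {"name": "Key Officer A", "location": "EU-West"},
    "custodian-3": {"name": "Key Officer B", "location": "US-East"},
}


def concentrated_locations(custodian_ids, custodians, threshold):
    """Locations whose custodians alone hold at least `threshold` shares."""
    per_location = Counter(custodians[c]["location"] for c in custodian_ids)
    return sorted(loc for loc, count in per_location.items() if count >= threshold)


print(concentrated_locations(CUSTODIANS, CUSTODIANS, threshold=2))
```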
## Security Considerations

### Share Security

- Never transmit shares in plaintext
- Encrypt shares with the custodian's public key
- Verify checksums before and after storage
- Use secure channels for distribution

### Recovery Security

- Require dual-control ceremonies for critical keys
- Limit the recovery time window
- Verify the recovered key fingerprint
- Audit all recovery attempts

### Custodian Security

- Verify custodian identity before share access
- Geographic distribution reduces collusion risk
- Rotate custodians periodically
- Train custodians on secure handling

## Troubleshooting

### Common Issues

| Issue | Cause | Resolution |
|-------|-------|------------|
| Share checksum mismatch | Corrupted share | Request re-distribution |
| Cannot decrypt share | Wrong passphrase | Verify passphrase |
| Recovery timeout | Shares not collected in time | Restart recovery |
| Key verification failed | Wrong shares combined | Verify share indices |

### Verification Failures

```bash
# Verify share integrity
stella escrow verify-share --share-file share.enc

# Test reconstruction with a subset of shares
stella escrow test-recovery \
  --escrow-id esc-abc123 \
  --share-files share1.enc,share2.enc,share3.enc
```

## Emergency Procedures

### Lost Share

If a custodian loses their share:

1. Verify that at least M shares remain accessible
2. Re-escrow with a new share set
3. Revoke the old (potentially compromised) escrow
4. Document the incident

### Compromised Custodian

If a custodian is compromised:

1. Do NOT use their share for any recovery
2. Re-escrow immediately with new custodians
3. Revoke the old escrow
4. Consider rotating the key if M or more shares may have been exposed

### Multiple Lost Shares

If fewer than M shares are available:

1. The key cannot be recovered via escrow
2. Use a backup key if available
3. Generate a new key and re-establish trust
4. Document the event as a key loss incident

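The three emergency cases reduce to share arithmetic with two comparisons:

- Compare the shares that remain usable with M.
- Separately, compare the shares that may be exposed with M.

The Python sketch below encodes that decision. `escrow_status` and its return labels are hypothetical. Following the procedure above, it treats shares held by a compromised custodian as unusable.

```python
def escrow_status(threshold: int, total_shares: int, lost: int = 0, compromised: int = 0) -> str:
    """Map share losses to the emergency procedures above (hypothetical helper)."""
    if lost < 0 or compromised < 0 or lost + compromised > total_shares:
        raise ValueError("invalid share counts")
    # Shares held by compromised custodians must not be used for recovery.
    usable = total_shares - lost - compromised
    if compromised >= threshold:
        return "key-exposed"     # M or more shares exposed: rotate the key
    if usable < threshold:
        return "unrecoverable"   # Multiple Lost Shares: backup key or new key
    if lost or compromised:
        return "re-escrow"       # Lost Share / Compromised Custodian procedures
    return "healthy"


for lost, compromised in [(0, 0), (1, 0), (3, 0), (0, 3)]:
    print(lost, compromised, escrow_status(3, 5, lost, compromised))
```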
## Related Documentation

- [Dual-Control Ceremony Runbook](./dual-control-ceremony-runbook.md)
- [Key Rotation Runbook](./key-rotation-runbook.md)
- [HSM Setup Runbook](./hsm-setup-runbook.md)
- [Cryptography Architecture](../modules/cryptography/architecture.md)