sprints completion. new product advisories prepared
docs/operations/key-escrow-runbook.md
@@ -0,0 +1,417 @@
# Key Escrow and Recovery Runbook

This runbook documents the key escrow and recovery procedures in Stella Ops, which are built on Shamir's Secret Sharing.

> **Sprint:** SPRINT_20260112_018_CRYPTO_key_escrow_shamir

## Overview

Key escrow ensures critical cryptographic keys can be recovered if primary access is lost. Stella Ops uses Shamir's Secret Sharing to split keys into shares distributed among trusted custodians.

Key features:
- M-of-N threshold recovery (any M of the N shares reconstruct the key)
- Share encryption at rest
- Custodian-based share distribution
- Integration with dual-control ceremonies
- Full audit trail

## When to Use Key Escrow

| Scenario | Escrow Required |
|----------|-----------------|
| Root signing keys | Yes |
| HSM master keys | Yes |
| Trust anchor keys | Yes |
| Service signing keys | Recommended |
| User signing keys | Optional |
| Ephemeral keys | No |

## Shamir Secret Sharing

### How It Works

Shamir's Secret Sharing splits a secret into N shares where any M shares can reconstruct the original:

```
Secret S → Split(S, M, N) → [Share₁, Share₂, ..., Shareₙ]

Any M shares → Combine → Secret S
Fewer than M shares → Cannot reconstruct
```
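
The split and combine steps above can be illustrated with a minimal Python sketch. It is not the Stella Ops implementation: the prime field, the integer encoding of the secret, and the function names `split` and `combine` are all assumptions made for illustration. The secret is the constant term of a random polynomial of degree M - 1, each share is one point on that polynomial, and any M points recover the constant term by Lagrange interpolation.

```python
import secrets

# Assumption: a prime field large enough for a 256-bit secret. The field
# actually used by Stella Ops is not stated in this runbook.
PRIME = 2**521 - 1


def split(secret: int, threshold: int, total: int) -> list[tuple[int, int]]:
    """Split `secret` into `total` shares; any `threshold` of them recover it."""
    if not 1 < threshold <= total:
        raise ValueError("need 1 < threshold <= total")
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the field")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, total + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbits(256)
shares = split(key, threshold=3, total=5)
assert combine(shares[:3]) == key                          # any 3 shares work
assert combine([shares[0], shares[2], shares[4]]) == key   # in any combination
```

Which three shares are used does not matter, which is why recovery only needs M custodians to be reachable rather than a specific set.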

### Configuration Parameters

| Parameter | Description | Recommended |
|-----------|-------------|-------------|
| Threshold (M) | Minimum shares needed | 2-3 for keys |
| Total Shares (N) | Total shares created | M + 2 minimum |
| Share Encryption | Encrypt shares at rest | Always enabled |

### Threshold Guidelines

| Key Type | Minimum M | Recommended N | Rationale |
|----------|-----------|---------------|-----------|
| Root keys | 3 | 5 | High assurance |
| HSM keys | 2 | 4 | Availability + security |
| Service keys | 2 | 3 | Operational recovery |

## Escrowing a Key

### Via CLI

```bash
stella escrow create \
  --key-id root-signing-key-2026 \
  --threshold 3 \
  --shares 5 \
  --custodians custodian-1,custodian-2,custodian-3,custodian-4,custodian-5 \
  --expires-in 365d \
  --reason "Annual key escrow for root signing key"
```

### Via API

```bash
curl -X POST https://signer.example.com/api/v1/escrow \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "keyId": "root-signing-key-2026",
    "threshold": 3,
    "totalShares": 5,
    "custodianIds": [
      "custodian-1", "custodian-2", "custodian-3",
      "custodian-4", "custodian-5"
    ],
    "expirationDays": 365,
    "reason": "Annual key escrow for root signing key"
  }'
```

### Escrow Response

```json
{
  "escrowId": "esc-abc123",
  "keyId": "root-signing-key-2026",
  "threshold": 3,
  "totalShares": 5,
  "status": "Active",
  "createdAt": "2026-01-16T10:00:00Z",
  "expiresAt": "2027-01-16T10:00:00Z",
  "shares": [
    { "shareId": "shr-001", "custodianId": "custodian-1", "distributed": true },
    { "shareId": "shr-002", "custodianId": "custodian-2", "distributed": true },
    { "shareId": "shr-003", "custodianId": "custodian-3", "distributed": true },
    { "shareId": "shr-004", "custodianId": "custodian-4", "distributed": true },
    { "shareId": "shr-005", "custodianId": "custodian-5", "distributed": true }
  ]
}
```
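
A script that consumes this response might check that every share has reached its custodian before treating the escrow as complete. The sketch below assumes only the field names shown in the response above; the helper name `undistributed` and the sample payload (with one share deliberately marked as not distributed) are hypothetical.

```python
import json


def undistributed(escrow_response: str) -> list[str]:
    """Return the custodian IDs whose share is not marked as distributed."""
    escrow = json.loads(escrow_response)
    return [
        share["custodianId"]
        for share in escrow["shares"]
        if not share.get("distributed", False)
    ]


# Hypothetical, trimmed response in which custodian-2 has not received a share.
sample = """
{
  "escrowId": "esc-abc123",
  "threshold": 3,
  "totalShares": 5,
  "shares": [
    { "shareId": "shr-001", "custodianId": "custodian-1", "distributed": true },
    { "shareId": "shr-002", "custodianId": "custodian-2", "distributed": false }
  ]
}
"""

pending = undistributed(sample)
if pending:
    print("Shares still pending for:", ", ".join(pending))
```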

## Share Distribution

### Distribution Methods

| Method | Security | Use Case |
|--------|----------|----------|
| Direct API delivery | High | Automated systems |
| Encrypted email | Medium | Remote custodians |
| In-person ceremony | Highest | Root keys |
| Hardware token | Highest | HSM keys |

### Custodian Requirements

Each custodian must:
1. Have a verified identity in Authority
2. Complete escrow custodian training
3. Have secure share storage capability
4. Be located in a different geographic region from the other custodians (recommended)

### Verifying Share Distribution

```bash
stella escrow status --escrow-id esc-abc123

# Output:
# Escrow: esc-abc123
# Key: root-signing-key-2026
# Status: Active
# Threshold: 3 of 5
# Shares:
#   [1] custodian-1: Distributed ✓
#   [2] custodian-2: Distributed ✓
#   [3] custodian-3: Distributed ✓
#   [4] custodian-4: Distributed ✓
#   [5] custodian-5: Distributed ✓
```

## Key Recovery

### Prerequisites

Recovery requires:
1. A valid recovery request (incident, key loss, rotation)
2. Dual-control ceremony approval (if configured)
3. At least M custodians available with their shares
4. A secure recovery environment

### Recovery Workflow

```
1. Initiate recovery request
2. (If required) Dual-control ceremony approval
3. Collect shares from M custodians
4. Verify share checksums
5. Reconstruct key
6. Verify reconstructed key
7. Log recovery event
```

### Via CLI

```bash
# Step 1: Initiate recovery
stella escrow recover init \
  --escrow-id esc-abc123 \
  --reason "HSM failure - emergency key recovery" \
  --ceremony-required

# Step 2: Submit shares (each custodian runs this with their own share)
stella escrow recover submit-share \
  --recovery-id rec-xyz789 \
  --share-file /secure/my-share.enc \
  --passphrase-file /secure/passphrase

# Step 3: Execute recovery (once the threshold of M shares has been submitted)
stella escrow recover execute \
  --recovery-id rec-xyz789 \
  --output-key-file /secure/recovered-key.pem
```

### Via API

```bash
# Initiate recovery
curl -X POST https://signer.example.com/api/v1/escrow/esc-abc123/recover \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "HSM failure - emergency key recovery",
    "requireCeremony": true
  }'

# Submit share
curl -X POST https://signer.example.com/api/v1/recovery/rec-xyz789/shares \
  -H "Authorization: Bearer $CUSTODIAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "shareId": "shr-001",
    "encryptedShare": "base64-encoded-share",
    "checksum": "sha256:abc123..."
  }'

# Execute recovery (after threshold)
curl -X POST https://signer.example.com/api/v1/recovery/rec-xyz789/execute \
  -H "Authorization: Bearer $TOKEN"
```

### Recovery Response

```json
{
  "recoveryId": "rec-xyz789",
  "status": "Completed",
  "keyId": "root-signing-key-2026",
  "sharesCollected": 3,
  "threshold": 3,
  "completedAt": "2026-01-16T15:30:00Z",
  "keyFingerprint": "SHA256:xyz789...",
  "verified": true
}
```
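
Automation that acts on this response should not rely on `status` alone. The sketch below shows one possible acceptance check, assuming the field names in the response above and assuming the operator holds a previously recorded fingerprint for the key; the function name `recovery_ok` and the fingerprint values are hypothetical.

```python
import hmac


def recovery_ok(response: dict, expected_fingerprint: str) -> bool:
    """Accept a recovery only if it completed, met its threshold, was
    verified, and matches the fingerprint recorded for the key."""
    collected = response.get("sharesCollected", 0)
    threshold = response.get("threshold")
    return (
        response.get("status") == "Completed"
        and response.get("verified") is True
        and isinstance(threshold, int)
        and collected >= threshold
        # Constant-time comparison of the two fingerprint strings.
        and hmac.compare_digest(
            response.get("keyFingerprint", ""), expected_fingerprint
        )
    )


# Hypothetical values for illustration only.
response = {
    "recoveryId": "rec-xyz789",
    "status": "Completed",
    "sharesCollected": 3,
    "threshold": 3,
    "keyFingerprint": "SHA256:example-fingerprint",
    "verified": True,
}
print(recovery_ok(response, "SHA256:example-fingerprint"))
```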

## Share Management

### Custodian Share Storage

Custodians should store shares using one of the following:

| Storage | Security Level | Notes |
|---------|----------------|-------|
| HSM | Highest | Preferred for root keys |
| Hardware token | High | YubiKey, smart card |
| Encrypted file | Medium | AES-256-GCM minimum |
| Password manager | Medium | Enterprise vault only |

### Share Format

```json
{
  "shareId": "shr-001",
  "escrowId": "esc-abc123",
  "index": 1,
  "threshold": 3,
  "totalShares": 5,
  "encryptedData": "base64-encoded-aes-256-gcm-ciphertext",
  "checksum": "sha256:abc123...",
  "createdAt": "2026-01-16T10:00:00Z",
  "expiresAt": "2027-01-16T10:00:00Z"
}
```
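
A custodian can check a stored share for corruption by recomputing its checksum. This runbook does not specify what the checksum covers, so the sketch below assumes it is the hex SHA-256 digest of the base64-decoded `encryptedData`, prefixed with `sha256:`. The helper names are hypothetical; confirm the real checksum definition before relying on this.

```python
import base64
import hashlib
import hmac


def checksum_of(encrypted_data_b64: str) -> str:
    """Assumed format: 'sha256:' + hex digest of the decoded ciphertext."""
    raw = base64.b64decode(encrypted_data_b64, validate=True)
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def checksum_matches(share: dict) -> bool:
    """Compare the recomputed checksum with the one stored in the share."""
    expected = checksum_of(share["encryptedData"])
    return hmac.compare_digest(expected, share["checksum"])


# Illustrative share with a stand-in ciphertext.
ciphertext = base64.b64encode(b"stand-in ciphertext bytes").decode("ascii")
share = {
    "shareId": "shr-001",
    "encryptedData": ciphertext,
    "checksum": checksum_of(ciphertext),
}
print(checksum_matches(share))
```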

### Share Rotation

Re-escrow keys periodically:

```bash
stella escrow re-escrow \
  --escrow-id esc-abc123 \
  --new-custodians custodian-1,custodian-2,custodian-6,custodian-7,custodian-8 \
  --reason "Annual share rotation"
```

This creates a new set of shares and revokes the old ones.

## Audit Trail

### Audit Events

| Event | Description |
|-------|-------------|
| `escrow.created` | Key escrowed |
| `escrow.share.distributed` | Share sent to custodian |
| `escrow.share.accessed` | Custodian accessed share |
| `recovery.initiated` | Recovery started |
| `recovery.share.submitted` | Share submitted for recovery |
| `recovery.completed` | Key reconstructed |
| `recovery.failed` | Recovery failed |
| `escrow.revoked` | Escrow revoked |

### Query Audit Logs

```bash
stella audit query \
  --event-type "escrow.*,recovery.*" \
  --escrow-id esc-abc123 \
  --since 30d
```
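
When post-processing exported audit records, the same event filter can be reproduced client-side. The sketch below assumes the `--event-type` value is a comma-separated list of shell-style glob patterns; the runbook does not confirm this, and the function name `event_matches` is hypothetical.

```python
from fnmatch import fnmatchcase


def event_matches(event: str, patterns: str) -> bool:
    """True if `event` matches any pattern in a comma-separated glob list."""
    return any(fnmatchcase(event, p.strip()) for p in patterns.split(","))


events = [
    "escrow.created",
    "escrow.share.distributed",
    "recovery.failed",
    "other.event",  # hypothetical event outside the escrow and recovery families
]
selected = [e for e in events if event_matches(e, "escrow.*,recovery.*")]
print(selected)
```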

## Configuration

### Escrow Settings

```yaml
# escrow-config.yaml
escrow:
  enabled: true
  defaultThreshold: 2
  minimumThreshold: 2
  maximumShares: 10
  shareEncryption:
    algorithm: AES-256-GCM
    keyDerivation: HKDF-SHA256
  requireDualControlForRecovery: true
  maxRecoveryAttempts: 3
  recoveryTimeoutHours: 24
```
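
The limits above constrain which threshold and share counts an escrow request can use. As a rough illustration of how those limits interact, the sketch below pre-checks a request before it is submitted. It assumes `minimumThreshold` and `maximumShares` are enforced as simple bounds; the class and function names are hypothetical, and the server's actual validation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscrowLimits:
    """Mirrors the relevant values from escrow-config.yaml."""
    minimum_threshold: int = 2
    maximum_shares: int = 10


def validate_request(threshold: int, total_shares: int,
                     limits: EscrowLimits = EscrowLimits()) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if threshold < limits.minimum_threshold:
        problems.append(
            f"threshold {threshold} is below the minimum {limits.minimum_threshold}"
        )
    if total_shares > limits.maximum_shares:
        problems.append(
            f"{total_shares} shares exceeds the maximum {limits.maximum_shares}"
        )
    if threshold > total_shares:
        problems.append("threshold cannot exceed the total number of shares")
    return problems


print(validate_request(threshold=3, total_shares=5))
print(validate_request(threshold=1, total_shares=12))
```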

### Custodian Configuration

```yaml
# custodians.yaml
custodians:
  - id: custodian-1
    name: "Security Lead"
    email: security-lead@company.com
    publicKey: "-----BEGIN PUBLIC KEY-----..."
    location: "US-East"

  - id: custodian-2
    name: "Key Officer A"
    email: key-officer-a@company.com
    publicKey: "-----BEGIN PUBLIC KEY-----..."
    location: "EU-West"
```
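
Because geographic distribution of custodians is recommended, it can be useful to flag locations shared by more than one custodian. The sketch below works on custodian entries already loaded into Python dictionaries with the `id` and `location` fields shown above; the function name `shared_locations` and the third custodian entry are hypothetical.

```python
def shared_locations(custodians: list[dict]) -> dict[str, list[str]]:
    """Map each location used by more than one custodian to their IDs."""
    by_location: dict[str, list[str]] = {}
    for custodian in custodians:
        by_location.setdefault(custodian["location"], []).append(custodian["id"])
    return {loc: ids for loc, ids in by_location.items() if len(ids) > 1}


custodians = [
    {"id": "custodian-1", "location": "US-East"},
    {"id": "custodian-2", "location": "EU-West"},
    {"id": "custodian-3", "location": "US-East"},  # hypothetical extra entry
]
print(shared_locations(custodians))
```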

## Security Considerations

### Share Security

- Never transmit shares in plaintext
- Encrypt shares with custodian's public key
- Verify checksums before and after storage
- Use secure channels for distribution

### Recovery Security

- Require dual-control ceremonies for critical keys
- Limit recovery time window
- Verify recovered key fingerprint
- Audit all recovery attempts

### Custodian Security

- Verify custodian identity before share access
- Geographic distribution reduces collusion risk
- Rotate custodians periodically
- Train custodians on secure handling

## Troubleshooting

### Common Issues

| Issue | Cause | Resolution |
|-------|-------|------------|
| Share checksum mismatch | Corrupted share | Request re-distribution |
| Cannot decrypt share | Wrong passphrase | Verify passphrase |
| Recovery timeout | Shares not collected in time | Restart recovery |
| Key verification failed | Wrong shares combined | Verify share indices |

### Verification Failures

```bash
# Verify share integrity
stella escrow verify-share --share-file share.enc

# Test reconstruction with subset
stella escrow test-recovery \
  --escrow-id esc-abc123 \
  --share-files share1.enc,share2.enc,share3.enc
```

## Emergency Procedures

### Lost Share

If a custodian loses their share:

1. Verify at least M shares remain accessible
2. Re-escrow with a new share set
3. Revoke the old escrow
4. Document the incident

### Compromised Custodian

If a custodian is compromised:

1. Do NOT use their share for any recovery
2. Re-escrow immediately with new custodians
3. Revoke the old escrow
4. Consider rotating the key itself if M or more shares may have been exposed

### Multiple Lost Shares

If fewer than M shares are available:

1. The key cannot be recovered via escrow
2. Use a backup key if available
3. Generate a new key and re-establish trust
4. Document as a key loss incident

## Related Documentation

- [Dual-Control Ceremony Runbook](./dual-control-ceremony-runbook.md)
- [Key Rotation Runbook](./key-rotation-runbook.md)
- [HSM Setup Runbook](./hsm-setup-runbook.md)
- [Cryptography Architecture](../modules/cryptography/architecture.md)