Key Escrow and Recovery Runbook

This runbook documents the key escrow and recovery procedures in Stella Ops, which are built on Shamir's Secret Sharing.

Sprint: SPRINT_20260112_018_CRYPTO_key_escrow_shamir

Overview

Key escrow ensures critical cryptographic keys can be recovered if primary access is lost. Stella Ops uses Shamir's Secret Sharing to split keys into shares distributed among trusted custodians.

Key features:

  • M-of-N threshold recovery (any M shares reconstruct the key)
  • Share encryption at rest
  • Custodian-based share distribution
  • Integration with dual-control ceremonies
  • Full audit trail

When to Use Key Escrow

| Scenario | Escrow Required |
|----------|-----------------|
| Root signing keys | Yes |
| HSM master keys | Yes |
| Trust anchor keys | Yes |
| Service signing keys | Recommended |
| User signing keys | Optional |
| Ephemeral keys | No |

Shamir Secret Sharing

How It Works

Shamir's Secret Sharing splits a secret into N shares where any M shares can reconstruct the original:

Secret S → Split(S, M, N) → [Share₁, Share₂, ..., Shareₙ]

Any M shares → Combine → Secret S
Fewer than M shares → Cannot reconstruct (and reveal nothing about S)
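
The split and combine steps above can be sketched in a few lines of Python. This is a minimal illustration of the textbook scheme (a random polynomial of degree M-1 over a prime field, recovered by Lagrange interpolation at x = 0), not the Stella Ops implementation. The field modulus `PRIME` and the function names `split_secret` and `combine_shares` are assumptions made for this example.

```python
import secrets

# Assumption: a Mersenne prime larger than any 256-bit secret. The field that
# Stella Ops actually uses is not stated in this runbook.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, total_shares: int) -> list[tuple[int, int]]:
    """Return total_shares points on a random polynomial of degree threshold - 1."""
    if not 1 < threshold <= total_shares:
        raise ValueError("need 2 <= threshold <= total_shares")
    if not 0 <= secret < PRIME:
        raise ValueError("secret does not fit in the field")
    # coefficients[0] is the secret; the remaining ones are uniformly random.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, total_shares + 1):
        y = 0
        for c in reversed(coefficients):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine_shares(shares: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With a 3-of-5 split, `combine_shares` returns the original value for any three of the five shares, while two shares interpolate to an unrelated value.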

Configuration Parameters

| Parameter | Description | Recommended |
|-----------|-------------|-------------|
| Threshold (M) | Minimum shares needed | 2-3 for keys |
| Total Shares (N) | Total shares created | M + 2 minimum |
| Share Encryption | Encrypt shares at rest | Always enabled |

Threshold Guidelines

| Key Type | Minimum M | Recommended N | Rationale |
|----------|-----------|---------------|-----------|
| Root keys | 3 | 5 | High assurance |
| HSM keys | 2 | 4 | Availability + security |
| Service keys | 2 | 3 | Operational recovery |
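
The trade-off behind these numbers is simple arithmetic: an M-of-N escrow survives the loss of N - M shares, and needs M colluding custodians to be broken. The sketch below encodes the three rows of the table; the `EscrowPolicy` class and the `GUIDELINES` mapping are illustrative names, not part of Stella Ops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscrowPolicy:
    threshold: int      # M
    total_shares: int   # N

    @property
    def tolerated_losses(self) -> int:
        """Shares that can be lost while recovery is still possible."""
        return self.total_shares - self.threshold

    @property
    def colluders_needed(self) -> int:
        """Custodians who must cooperate (or collude) to rebuild the key."""
        return self.threshold


# Values copied from the Threshold Guidelines table.
GUIDELINES = {
    "root": EscrowPolicy(threshold=3, total_shares=5),
    "hsm": EscrowPolicy(threshold=2, total_shares=4),
    "service": EscrowPolicy(threshold=2, total_shares=3),
}

for key_type, policy in GUIDELINES.items():
    print(key_type, policy.tolerated_losses, policy.colluders_needed)
```

Root and HSM keys tolerate two lost shares. Service keys tolerate only one, because that row uses N = M + 1.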

Escrowing a Key

Via CLI

stella escrow create \
  --key-id root-signing-key-2026 \
  --threshold 3 \
  --shares 5 \
  --custodians custodian-1,custodian-2,custodian-3,custodian-4,custodian-5 \
  --expires-in 365d \
  --reason "Annual key escrow for root signing key"

Via API

curl -X POST https://signer.example.com/api/v1/escrow \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "keyId": "root-signing-key-2026",
    "threshold": 3,
    "totalShares": 5,
    "custodianIds": [
      "custodian-1", "custodian-2", "custodian-3",
      "custodian-4", "custodian-5"
    ],
    "expirationDays": 365,
    "reason": "Annual key escrow for root signing key"
  }'
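
For scripted use, the same request can be assembled in Python with the standard library. This is a sketch that only builds the request object; the endpoint path and field names are taken from the curl example above, while the helper name `build_escrow_request` and the up-front threshold check are assumptions of this example.

```python
import json
import urllib.request


def build_escrow_request(base_url: str, token: str, key_id: str, threshold: int,
                         custodian_ids: list[str], expiration_days: int,
                         reason: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    if threshold > len(custodian_ids):
        raise ValueError("threshold exceeds the number of custodians")
    payload = {
        "keyId": key_id,
        "threshold": threshold,
        "totalShares": len(custodian_ids),  # one share per custodian
        "custodianIds": custodian_ids,
        "expirationDays": expiration_days,
        "reason": reason,
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/escrow",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Deriving `totalShares` from the custodian list keeps the two fields from drifting apart.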

Escrow Response

{
  "escrowId": "esc-abc123",
  "keyId": "root-signing-key-2026",
  "threshold": 3,
  "totalShares": 5,
  "status": "Active",
  "createdAt": "2026-01-16T10:00:00Z",
  "expiresAt": "2027-01-16T10:00:00Z",
  "shares": [
    { "shareId": "shr-001", "custodianId": "custodian-1", "distributed": true },
    { "shareId": "shr-002", "custodianId": "custodian-2", "distributed": true },
    { "shareId": "shr-003", "custodianId": "custodian-3", "distributed": true },
    { "shareId": "shr-004", "custodianId": "custodian-4", "distributed": true },
    { "shareId": "shr-005", "custodianId": "custodian-5", "distributed": true }
  ]
}
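
Before treating an escrow as complete, it is worth checking the response for internal consistency. The function below is a minimal sketch of such a check against the response shape shown above; the specific rules (one share per custodian, every share distributed) are assumptions drawn from this runbook rather than a documented validation routine.

```python
def check_escrow_response(escrow: dict) -> list[str]:
    """Return a list of problems found in an escrow response (empty if none)."""
    problems = []
    shares = escrow["shares"]
    if escrow["status"] != "Active":
        problems.append(f"escrow status is {escrow['status']}, expected Active")
    if len(shares) != escrow["totalShares"]:
        problems.append("number of shares does not match totalShares")
    if escrow["threshold"] > escrow["totalShares"]:
        problems.append("threshold exceeds totalShares")
    custodians = [s["custodianId"] for s in shares]
    if len(set(custodians)) != len(custodians):
        problems.append("a custodian holds more than one share")
    pending = [s["custodianId"] for s in shares if not s.get("distributed")]
    if pending:
        problems.append("shares not yet distributed to: " + ", ".join(pending))
    return problems
```

An empty list means the response is consistent; any entry is a reason to hold off before relying on the escrow.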

Share Distribution

Distribution Methods

| Method | Security | Use Case |
|--------|----------|----------|
| Direct API delivery | High | Automated systems |
| Encrypted email | Medium | Remote custodians |
| In-person ceremony | Highest | Root keys |
| Hardware token | Highest | HSM keys |

Custodian Requirements

Each custodian must:

  1. Have verified identity in Authority
  2. Complete escrow custodian training
  3. Have secure share storage capability
  4. Be geographically distributed (recommended)

Verifying Share Distribution

stella escrow status --escrow-id esc-abc123

# Output:
# Escrow: esc-abc123
# Key: root-signing-key-2026
# Status: Active
# Threshold: 3 of 5
# Shares:
#   [1] custodian-1: Distributed ✓
#   [2] custodian-2: Distributed ✓
#   [3] custodian-3: Distributed ✓
#   [4] custodian-4: Distributed ✓
#   [5] custodian-5: Distributed ✓

Key Recovery

Prerequisites

Recovery requires:

  1. Valid recovery request (incident, key loss, rotation)
  2. Dual-control ceremony approval (if configured)
  3. At least M custodians available with their shares
  4. Secure recovery environment

Recovery Workflow

1. Initiate recovery request
2. (If required) Dual-control ceremony approval
3. Collect shares from M custodians
4. Verify share checksums
5. Reconstruct key
6. Verify reconstructed key
7. Log recovery event

Via CLI

# Step 1: Initiate recovery
stella escrow recover init \
  --escrow-id esc-abc123 \
  --reason "HSM failure - emergency key recovery" \
  --ceremony-required

# Step 2: Collect shares (each custodian runs this, using the recovery ID from step 1)
stella escrow recover submit-share \
  --recovery-id rec-xyz789 \
  --share-file /secure/my-share.enc \
  --passphrase-file /secure/passphrase

# Step 3: Execute recovery (after threshold reached)
stella escrow recover execute \
  --recovery-id rec-xyz789 \
  --output-key-file /secure/recovered-key.pem

Via API

# Initiate recovery
curl -X POST https://signer.example.com/api/v1/escrow/esc-abc123/recover \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "HSM failure - emergency key recovery",
    "requireCeremony": true
  }'

# Submit share
curl -X POST https://signer.example.com/api/v1/recovery/rec-xyz789/shares \
  -H "Authorization: Bearer $CUSTODIAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "shareId": "shr-001",
    "encryptedShare": "base64-encoded-share",
    "checksum": "sha256:abc123..."
  }'

# Execute recovery (after threshold)
curl -X POST https://signer.example.com/api/v1/recovery/rec-xyz789/execute \
  -H "Authorization: Bearer $TOKEN"

Recovery Response

{
  "recoveryId": "rec-xyz789",
  "status": "Completed",
  "keyId": "root-signing-key-2026",
  "sharesCollected": 3,
  "threshold": 3,
  "completedAt": "2026-01-16T15:30:00Z",
  "keyFingerprint": "SHA256:xyz789...",
  "verified": true
}

Share Management

Custodian Share Storage

Custodians should store shares using one of the following options:

| Storage | Security Level | Notes |
|---------|----------------|-------|
| HSM | Highest | Preferred for root keys |
| Hardware token | High | YubiKey, smart card |
| Encrypted file | Medium | AES-256-GCM minimum |
| Password manager | Medium | Enterprise vault only |

Share Format

{
  "shareId": "shr-001",
  "escrowId": "esc-abc123",
  "index": 1,
  "threshold": 3,
  "totalShares": 5,
  "encryptedData": "base64-encoded-aes-256-gcm-ciphertext",
  "checksum": "sha256:abc123...",
  "createdAt": "2026-01-16T10:00:00Z",
  "expiresAt": "2027-01-16T10:00:00Z"
}
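
A custodian can check a stored share against its `checksum` field before submitting it. The sketch below assumes the checksum is the SHA-256 digest of the base64-decoded `encryptedData`, written as `sha256:` followed by lowercase hex. The runbook does not specify what the checksum covers, so treat that convention as an assumption.

```python
import base64
import hashlib
import hmac


def share_checksum(encrypted_data_b64: str) -> str:
    """Assumed convention: SHA-256 over the decoded ciphertext, 'sha256:' + hex."""
    raw = base64.b64decode(encrypted_data_b64, validate=True)
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def verify_share(share: dict) -> bool:
    """Compare the stored checksum with one recomputed from encryptedData."""
    actual = share_checksum(share["encryptedData"])
    # Constant-time comparison; both values are ASCII strings.
    return hmac.compare_digest(share["checksum"], actual)
```

Running this both when a share is received and again just before recovery matches the "verify checksums before and after storage" guidance later in this runbook.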

Share Rotation

Re-escrow keys periodically:

stella escrow re-escrow \
  --escrow-id esc-abc123 \
  --new-custodians custodian-1,custodian-2,custodian-6,custodian-7,custodian-8 \
  --reason "Annual share rotation"

This creates new shares and revokes old ones.
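
One property of the scheme is worth keeping in mind when rotating shares. If the underlying key is unchanged, re-splitting it produces a new polynomial, so old and new shares cannot be mixed, but any M old shares still reconstruct the same key. Revocation of old shares therefore has to be enforced by process (destruction, access control) or by rotating the key itself. The following sketch demonstrates this with a compact textbook implementation; the modulus and the helper names `split` and `combine` are assumptions for illustration.

```python
import secrets

PRIME = 2**521 - 1  # assumed field modulus, for illustration only


def split(secret: int, m: int, n: int) -> list[tuple[int, int]]:
    """Evaluate a random degree m-1 polynomial with constant term `secret` at x = 1..n."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    return [
        (x, sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME)
        for x in range(1, n + 1)
    ]


def combine(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0."""
    total = 0
    for xi, yi in shares:
        weight = 1
        for xj, _ in shares:
            if xj != xi:
                inverse = pow((xi - xj) % PRIME, -1, PRIME)
                weight = (weight * (-xj) * inverse) % PRIME
        total = (total + yi * weight) % PRIME
    return total


key = secrets.randbits(256)
old_shares = split(key, 3, 5)
new_shares = split(key, 3, 5)  # re-escrow of the same key: fresh random polynomial

new_set_works = combine(new_shares[:3]) == key
old_set_still_works = combine(old_shares[:3]) == key
mixed_set_works = combine([old_shares[0], new_shares[1], new_shares[2]]) == key
print(new_set_works, old_set_still_works, mixed_set_works)
```

The old set still recovering the key is the reason departing custodians should be required to destroy their shares, and why a suspected exposure of M old shares calls for key rotation rather than re-escrow alone.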

Audit Trail

Audit Events

| Event | Description |
|-------|-------------|
| escrow.created | Key escrowed |
| escrow.share.distributed | Share sent to custodian |
| escrow.share.accessed | Custodian accessed share |
| recovery.initiated | Recovery started |
| recovery.share.submitted | Share submitted for recovery |
| recovery.completed | Key reconstructed |
| recovery.failed | Recovery failed |
| escrow.revoked | Escrow revoked |

Query Audit Logs

stella audit query \
  --event-type "escrow.*,recovery.*" \
  --escrow-id esc-abc123 \
  --since 30d
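
When audit events are exported and processed offline, the same comma-separated wildcard filter can be reproduced with shell-style matching. This sketch assumes glob semantics for `--event-type`, which the runbook does not confirm; `ceremony.approved` is a made-up event name used only to show a non-match.

```python
from fnmatch import fnmatchcase


def matches_event_filter(event_type: str, patterns: str) -> bool:
    """True if event_type matches any comma-separated glob in patterns."""
    return any(fnmatchcase(event_type, p.strip()) for p in patterns.split(","))


EVENTS = [
    "escrow.created",
    "escrow.share.distributed",
    "recovery.initiated",
    "ceremony.approved",  # hypothetical event, outside the filter
]

selected = [e for e in EVENTS if matches_event_filter(e, "escrow.*,recovery.*")]
print(selected)
```

In `fnmatch`, `*` also matches dots, so `escrow.*` covers nested names such as `escrow.share.distributed`.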

Configuration

Escrow Settings

# escrow-config.yaml
escrow:
  enabled: true
  defaultThreshold: 2
  minimumThreshold: 2
  maximumShares: 10
  shareEncryption:
    algorithm: AES-256-GCM
    keyDerivation: HKDF-SHA256
  requireDualControlForRecovery: true
  maxRecoveryAttempts: 3
  recoveryTimeoutHours: 24
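
These limits suggest a pre-flight check on every escrow request. The function below is a minimal sketch of how `defaultThreshold`, `minimumThreshold` and `maximumShares` might be applied; the settings are held in a plain dictionary mirroring the YAML, and the function name and error messages are assumptions of this example.

```python
from typing import Optional

# Mirrors the relevant keys of escrow-config.yaml shown above.
SETTINGS = {"defaultThreshold": 2, "minimumThreshold": 2, "maximumShares": 10}


def validate_escrow_request(total_shares: int, settings: dict,
                            threshold: Optional[int] = None) -> list[str]:
    """Return configuration violations for a requested M-of-N escrow."""
    m = settings["defaultThreshold"] if threshold is None else threshold
    errors = []
    if m < settings["minimumThreshold"]:
        errors.append(f"threshold {m} is below minimumThreshold")
    if total_shares > settings["maximumShares"]:
        errors.append(f"{total_shares} shares exceeds maximumShares")
    if m > total_shares:
        errors.append(f"threshold {m} exceeds total shares {total_shares}")
    return errors
```

A 3-of-5 request passes with these settings, while a 1-of-12 request fails both the minimum-threshold and maximum-shares checks.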

Custodian Configuration

# custodians.yaml
custodians:
  - id: custodian-1
    name: "Security Lead"
    email: security-lead@company.com
    publicKey: "-----BEGIN PUBLIC KEY-----..."
    location: "US-East"
    
  - id: custodian-2
    name: "Key Officer A"
    email: key-officer-a@company.com
    publicKey: "-----BEGIN PUBLIC KEY-----..."
    location: "EU-West"
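
The `location` field makes it possible to check geographic distribution mechanically. The sketch below flags two conditions for a given threshold M: a single location that holds M or more shares (one site could reconstruct the key alone), and a location whose loss would leave fewer than M shares. The function name and the sample custodian list are illustrative assumptions.

```python
from collections import Counter


def location_risks(custodians: list[dict], threshold: int) -> list[str]:
    """Flag locations that concentrate too many shares for an M-of-N escrow."""
    per_location = Counter(c["location"] for c in custodians)
    total = len(custodians)
    risks = []
    for location, count in sorted(per_location.items()):
        if count >= threshold:
            risks.append(f"{location}: holds {count} shares, enough to reconstruct alone")
        if total - count < threshold:
            risks.append(f"{location}: losing this site leaves fewer than {threshold} shares")
    return risks


# Hypothetical 5-custodian layout; only the first two locations appear in custodians.yaml.
layout = [
    {"id": "custodian-1", "location": "US-East"},
    {"id": "custodian-2", "location": "EU-West"},
    {"id": "custodian-3", "location": "US-East"},
    {"id": "custodian-4", "location": "EU-West"},
    {"id": "custodian-5", "location": "AP-South"},
]
print(location_risks(layout, threshold=3))
```

For a 3-of-5 escrow, a 2/2/1 split across three locations raises no flags, whereas placing three custodians in one location raises both.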

Security Considerations

Share Security

  • Never transmit shares in plaintext
  • Encrypt shares with custodian's public key
  • Verify checksums before and after storage
  • Use secure channels for distribution

Recovery Security

  • Require dual-control ceremonies for critical keys
  • Limit recovery time window
  • Verify recovered key fingerprint
  • Audit all recovery attempts

Custodian Security

  • Verify custodian identity before share access
  • Geographic distribution reduces collusion risk
  • Rotate custodians periodically
  • Train custodians on secure handling

Troubleshooting

Common Issues

| Issue | Cause | Resolution |
|-------|-------|------------|
| Share checksum mismatch | Corrupted share | Request re-distribution |
| Cannot decrypt share | Wrong passphrase | Verify passphrase |
| Recovery timeout | Shares not collected in time | Restart recovery |
| Key verification failed | Wrong shares combined | Verify share indices |
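
Most "wrong shares combined" failures can be caught before reconstruction by inspecting share metadata. The sketch below checks a set of share records (using the fields from the Share Format section) for mixed escrows, duplicate or out-of-range indices, and an insufficient count. The function name and the exact rules are assumptions for illustration.

```python
def preflight_share_set(shares: list[dict]) -> list[str]:
    """Return problems that would make reconstruction fail or yield a wrong key."""
    if not shares:
        return ["no shares supplied"]
    problems = []
    escrow_ids = sorted({s["escrowId"] for s in shares})
    if len(escrow_ids) > 1:
        problems.append("shares come from different escrows: " + ", ".join(escrow_ids))
    indices = [s["index"] for s in shares]
    duplicates = sorted({i for i in indices if indices.count(i) > 1})
    if duplicates:
        problems.append(f"duplicate share indices: {duplicates}")
    total = shares[0]["totalShares"]
    out_of_range = sorted({i for i in indices if not 1 <= i <= total})
    if out_of_range:
        problems.append(f"indices outside 1..{total}: {out_of_range}")
    threshold = shares[0]["threshold"]
    if len(set(indices)) < threshold:
        problems.append(f"only {len(set(indices))} distinct shares, need {threshold}")
    return problems
```

Shares left over from an earlier escrow of the same key are a typical source of the mixed-escrow case, since they carry a different `escrowId`.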

Verification Failures

# Verify share integrity
stella escrow verify-share --share-file share.enc

# Test reconstruction with subset
stella escrow test-recovery \
  --escrow-id esc-abc123 \
  --share-files share1.enc,share2.enc,share3.enc

Emergency Procedures

Lost Share

If a custodian loses their share:

  1. Verify at least M shares remain accessible
  2. Re-escrow with new share set
  3. Revoke the old escrow
  4. Document incident

Compromised Custodian

If a custodian is compromised:

  1. Do NOT use their share for any recovery
  2. Re-escrow immediately with new custodians
  3. Revoke old escrow
  4. Consider key rotation if M or more shares may have been exposed

Multiple Lost Shares

If fewer than M shares are available:

  1. Key cannot be recovered via escrow
  2. Use backup key if available
  3. Generate new key and re-establish trust
  4. Document as key loss incident
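
The three emergency procedures above reduce to a comparison between the threshold and the number of shares that are still usable or possibly exposed. The function below is a hedged sketch of that decision; the return labels and the ordering of the checks are assumptions for illustration, not an official policy.

```python
def triage(threshold: int, total_shares: int, lost: int, compromised: int) -> str:
    """Suggest a next step from counts of lost and compromised shares."""
    usable = total_shares - lost - compromised  # compromised shares must not be used
    if compromised >= threshold:
        return "rotate-key"   # enough shares exposed to rebuild the key
    if usable < threshold:
        return "key-loss"     # escrow recovery is no longer possible
    if lost or compromised:
        return "re-escrow"    # recover margin with a new share set, revoke the old one
    return "no-action"


print(triage(threshold=3, total_shares=5, lost=1, compromised=0))
```

For a 3-of-5 escrow, one lost share leads to re-escrow, three lost shares to a key-loss incident, and three compromised shares to key rotation.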