sprints completion. new product advisories prepared
331
docs/operations/break-glass-runbook.md
Normal file
@@ -0,0 +1,331 @@
# Break-Glass Account Runbook

This runbook documents emergency access procedures using the break-glass account system when standard authentication is unavailable.

> **Sprint:** SPRINT_20260112_018_AUTH_local_rbac_fallback

## Overview

Break-glass accounts provide emergency administrative access when:
- PostgreSQL database is unavailable
- OIDC/OAuth2 identity provider is unreachable
- Authority service is degraded
- Network isolation prevents standard authentication

Break-glass access is fully audited and time-limited by design.

## When to Use Break-Glass Access

| Scenario | Standard Auth | Break-Glass |
|----------|---------------|-------------|
| Database maintenance | N/A | Use |
| IdP outage | Unavailable | Use |
| Network partition | Unavailable | Use |
| Routine operations | Available | Do NOT use |
| Security incident response | May be unavailable | Use with incident code |

**CRITICAL:** Break-glass access should only be used when standard authentication is genuinely unavailable. All usage is logged and auditable.

## Prerequisites

### Configuration Requirements

Break-glass must be explicitly enabled in local policy:

```yaml
# /etc/stellaops/authority/local-policy.yaml
breakGlass:
  enabled: true
  sessionTimeoutMinutes: 15
  maxExtensions: 2
  allowedReasonCodes:
    - database_maintenance
    - idp_outage
    - network_partition
    - security_incident
    - disaster_recovery
  accounts:
    - id: "break-glass-admin"
      passwordHash: "$argon2id$v=19$m=65536,t=3,p=4$..."
      roles: ["admin"]
```

### Password Hash Generation

Generate password hashes using Argon2id:

```bash
# Using argon2 CLI tool
# -m 16 sets memory to 2^16 KiB (65536 KiB), matching m=65536 in the policy file
echo -n "your-secure-password" | argon2 "$(openssl rand -base64 16)" -id -t 3 -m 16 -p 4 -l 32 -e

# Or using stella CLI
stella auth hash-password --algorithm argon2id
```
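
Before a hash is committed to `local-policy.yaml`, it can help to confirm that its encoded parameters match the intended cost settings. The following is a minimal Python sketch, not part of the `stella` tooling: the function name `parse_argon2_params` is hypothetical, and the sample hash uses placeholder salt and digest segments.

```python
def parse_argon2_params(phc: str) -> dict:
    """Extract variant, version and cost parameters from an Argon2 PHC string."""
    parts = phc.split("$")
    # Expected layout: ['', variant, 'v=19', 'm=...,t=...,p=...', salt, digest]
    if len(parts) < 4 or parts[0] != "":
        raise ValueError("not a PHC-formatted hash")
    costs = dict(item.split("=", 1) for item in parts[3].split(","))
    return {
        "variant": parts[1],
        "version": int(parts[2].split("=", 1)[1]),
        "memory_kib": int(costs["m"]),
        "iterations": int(costs["t"]),
        "parallelism": int(costs["p"]),
    }


# Hypothetical hash: salt and digest are placeholders, not real values
sample = "$argon2id$v=19$m=65536,t=3,p=4$c2FsdHNhbHQ$aGFzaGhhc2g"
params = parse_argon2_params(sample)
print(params["variant"], params["memory_kib"])
```

A check like this only inspects the declared parameters; it does not verify a password against the hash.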

## Break-Glass Login Procedure

### Step 1: Verify Standard Auth is Unavailable

Before using break-glass, confirm standard authentication is genuinely unavailable:

```bash
# Check Authority health
curl -s https://authority.example.com/health | jq .

# Check OIDC endpoint
curl -s https://idp.example.com/.well-known/openid-configuration

# Check database connectivity
stella doctor check --component postgres
```

### Step 2: Access Break-Glass Login

Navigate to the break-glass endpoint:

```
https://authority.example.com/break-glass/login
```

Or use the CLI:

```bash
stella auth break-glass login \
  --account break-glass-admin \
  --reason database_maintenance
```

### Step 3: Provide Credentials and Reason

| Field | Description | Required |
|-------|-------------|----------|
| Account ID | Break-glass account identifier | Yes |
| Password | Account password | Yes |
| Reason Code | Pre-approved reason code | Yes |
| Reason Details | Free-text explanation | Recommended |

**Approved Reason Codes:**

| Code | Description |
|------|-------------|
| `database_maintenance` | Scheduled or emergency database work |
| `idp_outage` | Identity provider unavailable |
| `network_partition` | Network connectivity issues |
| `security_incident` | Active security incident response |
| `disaster_recovery` | DR/BCP activation |

### Step 4: Session Created

On successful authentication:

- Session token issued with limited TTL (default: 15 minutes)
- Audit event logged: `breakglass.session.created`
- All subsequent actions are tagged with break-glass context

## Session Management

### Session Timeout

Break-glass sessions have strict time limits:

| Setting | Default | Description |
|---------|---------|-------------|
| `sessionTimeoutMinutes` | 15 | Session lifetime |
| `maxExtensions` | 2 | Maximum session extensions |
| Extension period | 15 min | Time added per extension |
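
The defaults above imply an upper bound on how long a single break-glass session can last. The sketch below works through that arithmetic in Python; it assumes each extension adds a full extension period to the session, and the function name is illustrative rather than part of any Stella Ops API.

```python
def max_session_minutes(session_timeout: int = 15,
                        max_extensions: int = 2,
                        extension_minutes: int = 15) -> int:
    """Upper bound on one session's lifetime if every extension is used."""
    return session_timeout + max_extensions * extension_minutes


# With the documented defaults: 15 + 2 * 15
print(max_session_minutes())  # 45
```

Work expected to run longer than this bound needs a fresh login with a new reason once the session ends.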

### Extending a Session

If additional time is needed:

```bash
# CLI
stella auth break-glass extend \
  --session-id <session-id> \
  --reason "database migration still running"

# UI
# Click "Extend Session" button in break-glass banner
```

Extension requires:
1. Re-entering password
2. Providing extension reason
3. Not exceeding `maxExtensions` limit

### Session Termination

Sessions end when:
- User explicitly logs out
- Session timeout expires
- Max extensions reached
- Administrator force-terminates

```bash
# Explicit logout
stella auth break-glass logout --session-id <session-id>

# Force terminate (admin)
stella auth break-glass terminate --session-id <session-id> --reason "normal auth restored"
```

## Audit Trail

### Audit Events

All break-glass activity is logged:

| Event | Description |
|-------|-------------|
| `breakglass.session.created` | Session started |
| `breakglass.session.extended` | Session extended |
| `breakglass.session.terminated` | User logout |
| `breakglass.session.expired` | Timeout reached |
| `breakglass.auth.failed` | Authentication failed |
| `breakglass.reason.invalid` | Invalid reason code |
| `breakglass.extensions.exceeded` | Max extensions reached |

### Audit Event Structure

```json
{
  "eventType": "breakglass.session.created",
  "timestamp": "2026-01-16T10:30:00Z",
  "accountId": "break-glass-admin",
  "sessionId": "bg-sess-abc123",
  "reasonCode": "database_maintenance",
  "reasonDetails": "PostgreSQL major version upgrade",
  "sourceIp": "10.0.1.50",
  "userAgent": "stella-cli/2027.Q1"
}
```

### Querying Audit Logs

```bash
# List all break-glass events
stella audit query --event-type "breakglass.*" --since "24h"

# Export for compliance
stella audit export \
  --event-type "breakglass.*" \
  --start 2026-01-01 \
  --end 2026-01-31 \
  --format json \
  --output break-glass-audit-jan2026.json
```

## Fallback Policy Store

### Automatic Failover

When PostgreSQL becomes unavailable:

1. Authority detects health check failures.
2. After `failureThreshold` (default: 3) consecutive failures, Authority switches to the local policy store.
3. The mode changes to `Fallback`.
4. The event `authority.mode.changed` is logged.

### Policy Store Modes

| Mode | Description | Available Features |
|------|-------------|-------------------|
| `Primary` | PostgreSQL available | Full RBAC, user management |
| `Fallback` | Using local policy | Break-glass only |
| `Degraded` | Both degraded | Emergency access only |

### Recovery

When PostgreSQL recovers:

1. Health checks pass.
2. After the `minFallbackDurationMs` cooldown (default: 30s), Authority switches back to `Primary`.
3. Fallback sessions can continue until expiry.
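
The failover and recovery rules above can be sketched as a small state tracker. This is an illustrative Python model, not Authority's implementation: the class name `PolicyStoreMode` is hypothetical, the cooldown is assumed to be measured from the moment fallback was entered, and the `Degraded` mode is left out.

```python
from dataclasses import dataclass


@dataclass
class PolicyStoreMode:
    failure_threshold: int = 3     # consecutive failures before fallback
    min_fallback_ms: int = 30_000  # minimum time spent in Fallback
    mode: str = "Primary"
    consecutive_failures: int = 0
    fallback_since_ms: int = 0

    def observe(self, healthy: bool, now_ms: int) -> str:
        """Feed one health-check result and return the resulting mode."""
        if healthy:
            self.consecutive_failures = 0
            cooled_down = now_ms - self.fallback_since_ms >= self.min_fallback_ms
            if self.mode == "Fallback" and cooled_down:
                self.mode = "Primary"
        else:
            self.consecutive_failures += 1
            if (self.mode == "Primary"
                    and self.consecutive_failures >= self.failure_threshold):
                self.mode = "Fallback"
                self.fallback_since_ms = now_ms
        return self.mode


store = PolicyStoreMode()
during_outage = [store.observe(False, t) for t in (0, 1_000, 2_000)]
inside_cooldown = store.observe(True, 10_000)
after_cooldown = store.observe(True, 40_000)
print(during_outage, inside_cooldown, after_cooldown)
```

The cooldown keeps the mode from flapping when PostgreSQL recovers only briefly.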

## Security Considerations

### Password Policy

Break-glass account passwords should:
- Be at least 20 characters
- Include upper, lower, numbers, symbols
- Be stored securely (HSM, Vault, split custody)
- Be rotated on a schedule (quarterly recommended)
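
The length and character-class rules above are easy to check mechanically before a password is hashed. The following Python sketch shows one way to do so; the function name is hypothetical, and "symbols" is interpreted here as ASCII punctuation, which is an assumption.

```python
import string


def meets_break_glass_policy(password: str, min_length: int = 20) -> bool:
    """True when the password satisfies the length and character-class rules."""
    return (
        len(password) >= min_length
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


print(meets_break_glass_policy("Correct-Horse-Battery-9!"))  # True
print(meets_break_glass_policy("short1A!"))                  # False
```

Storage and rotation requirements are procedural and cannot be verified by a check like this.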

### Access Control

- Limit break-glass accounts to essential personnel
- Use separate accounts per operator when possible
- Review access list quarterly
- Disable unused accounts immediately

### Monitoring

Set up alerts for break-glass activity:

```yaml
# Alert rule example
- alert: BreakGlassSessionCreated
  # increase() fires only when new sessions appear within the window;
  # comparing the raw counter to 0 would keep firing after the first session.
  # The 5m window is an example value.
  expr: increase(stellaops_breakglass_sessions_created_total[5m]) > 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Break-glass session created
    description: A break-glass session was created. Verify this is expected.
```

## Troubleshooting

### Login Failures

| Error | Cause | Resolution |
|-------|-------|------------|
| `invalid_credentials` | Wrong password | Verify password |
| `invalid_reason_code` | Reason not in allowed list | Use approved reason code |
| `account_disabled` | Account explicitly disabled | Contact administrator |
| `break_glass_disabled` | Feature disabled in config | Enable in local-policy.yaml |

### Session Issues

| Issue | Cause | Resolution |
|-------|-------|------------|
| Session expired immediately | Clock skew | Sync server time |
| Cannot extend | Max extensions reached | Log out and re-authenticate |
| Actions failing | Insufficient roles | Verify account has required roles |

### Policy Store Issues

```bash
# Check policy store status
stella doctor check --component authority

# Verify local policy file
stella auth policy validate --file /etc/stellaops/authority/local-policy.yaml

# Force reload policy
stella auth policy reload
```

## Compliance Notes

Break-glass usage must be:
- Documented in incident reports
- Reviewed during security audits
- Reported in compliance dashboards
- Justified for each session

Retain audit logs for:
- SOC 2: 1 year minimum
- HIPAA: 6 years
- PCI-DSS: 1 year
- Internal policy: As defined

## Related Documentation

- [Local RBAC Policy Schema](../modules/authority/local-policy-schema.md)
- [Authority Architecture](../modules/authority/architecture.md)
- [Offline Operations](../operations/airgap-operations-runbook.md)
- [Audit System](../modules/audit/architecture.md)
262
docs/operations/checkpoint-divergence-runbook.md
Normal file
@@ -0,0 +1,262 @@
# Checkpoint Divergence Detection and Incident Response

This runbook covers the detection of Rekor checkpoint divergence, anomaly types, alert handling, and incident response procedures.

## Overview

Checkpoint divergence detection monitors the integrity of Rekor transparency logs by:
- Comparing root hashes at the same tree size
- Verifying tree size monotonicity (only increases)
- Cross-checking primary logs against mirrors
- Detecting stale or unresponsive logs

Divergence can indicate:
- Split-view attacks (malicious log server showing different trees to different clients)
- Rollback attacks (hiding recent log entries)
- Log compromise or key theft
- Network partitions or operational issues

## Detection Rules

| Check | Condition | Severity | Recommended Action |
|-------|-----------|----------|-------------------|
| Root hash mismatch | Same tree_size, different root_hash | CRITICAL | Quarantine + immediate investigation |
| Tree size rollback | new_tree_size < stored_tree_size | CRITICAL | Reject checkpoint + alert |
| Cross-log divergence | Primary root ≠ mirror root at same size | WARNING | Alert + investigate |
| Stale checkpoint | Checkpoint age > threshold | WARNING | Alert + monitor |
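
The first two rules in the table compare a newly fetched checkpoint with the one already stored. The Python sketch below illustrates that comparison only; the `Checkpoint` type, the `classify` function, and the `TreeSizeRollback` label are hypothetical names, and this is not the Attestor detector itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    tree_size: int
    root_hash: str


def classify(stored: Checkpoint, new: Checkpoint) -> Optional[str]:
    """Return an anomaly label for a new checkpoint, or None if none applies."""
    if new.tree_size < stored.tree_size:
        return "TreeSizeRollback"   # CRITICAL: reject checkpoint + alert
    if new.tree_size == stored.tree_size and new.root_hash != stored.root_hash:
        return "RootHashMismatch"   # CRITICAL: quarantine + investigate
    return None                     # same checkpoint, or the log has grown


stored = Checkpoint(12345678, "sha256:abc123")
print(classify(stored, Checkpoint(12345678, "sha256:def456")))
print(classify(stored, Checkpoint(12345600, "sha256:abc123")))
```

A larger tree with a different root is expected as the log grows; confirming that the new tree extends the old one requires a consistency proof, which this sketch does not attempt.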

## Alert Payloads

### Root Hash Mismatch Alert
```json
{
  "eventType": "rekor.checkpoint.divergence",
  "severity": "critical",
  "origin": "rekor.sigstore.dev",
  "treeSize": 12345678,
  "expectedRootHash": "sha256:abc123...",
  "actualRootHash": "sha256:def456...",
  "detectedAt": "2026-01-15T12:34:56Z",
  "backend": "sigstore-prod",
  "description": "Checkpoint root hash mismatch detected. Possible split-view attack.",
  "recommendedAction": "Quarantine"
}
```

### Rollback Attempt Alert
```json
{
  "eventType": "rekor.checkpoint.rollback",
  "severity": "critical",
  "origin": "rekor.sigstore.dev",
  "previousTreeSize": 12345678,
  "attemptedTreeSize": 12345600,
  "detectedAt": "2026-01-15T12:34:56Z",
  "description": "Tree size regression detected. Possible rollback attack."
}
```

### Cross-Log Divergence Alert
```json
{
  "eventType": "rekor.checkpoint.cross_log_divergence",
  "severity": "warning",
  "primaryOrigin": "rekor.sigstore.dev",
  "mirrorOrigin": "rekor.mirror.example.com",
  "treeSize": 12345678,
  "primaryRootHash": "sha256:abc123...",
  "mirrorRootHash": "sha256:def456...",
  "description": "Cross-log divergence detected between primary and mirror."
}
```

## Metrics

```
# Counter: total checkpoint mismatches
attestor_rekor_checkpoint_mismatch_total{backend="sigstore-prod",origin="rekor.sigstore.dev"} 0

# Counter: rollback attempts detected
attestor_rekor_checkpoint_rollback_detected_total{backend="sigstore-prod"} 0

# Counter: cross-log divergences detected
attestor_rekor_cross_log_divergence_total{primary="rekor.sigstore.dev",mirror="rekor.mirror.example.com"} 0

# Gauge: seconds since last valid checkpoint
attestor_rekor_checkpoint_age_seconds{backend="sigstore-prod"} 120

# Counter: total anomalies detected (all types)
attestor_rekor_anomalies_detected_total{type="RootHashMismatch",severity="critical"} 0
```

## Incident Response Procedures

### Level 1: Root Hash Mismatch (CRITICAL)

**Symptoms:**
- `attestor_rekor_checkpoint_mismatch_total` increments
- Alert received: "rekor.checkpoint.divergence"

**Immediate Actions:**
1. **Quarantine all affected proofs** - Do not rely on any inclusion proofs from the affected log until resolved
2. **Suspend automated verifications** - Halt any automated systems that depend on the log
3. **Preserve evidence** - Capture both checkpoints (expected and actual) with full metadata
4. **Alert security team** - This is a potential compromise indicator

**Investigation Steps:**
1. Verify the mismatch isn't a local storage corruption
   ```bash
   stella attestor checkpoint verify --origin rekor.sigstore.dev --tree-size 12345678
   ```
2. Cross-check with independent sources (other clients, mirrors)
3. Check if Sigstore has published any incident reports
4. Review network logs for MITM indicators

**Resolution:**
- If confirmed attack: Follow security incident process
- If local corruption: Resync from trusted source
- If upstream issue: Wait for Sigstore remediation, follow their guidance

### Level 2: Tree Size Rollback (CRITICAL)

**Symptoms:**
- `attestor_rekor_checkpoint_rollback_detected_total` increments
- Alert received: "rekor.checkpoint.rollback"

**Immediate Actions:**
1. **Reject the checkpoint** - Do not accept or store it
2. **Log full details** for forensic analysis
3. **Check network path** - Could indicate MITM or DNS hijacking

**Investigation Steps:**
1. Verify current log state directly:
   ```bash
   curl -s https://rekor.sigstore.dev/api/v1/log | jq .treeSize
   ```
2. Compare with stored latest tree size
3. Check DNS resolution and TLS certificate chain

**Resolution:**
- If network attack: Remediate network path, rotate credentials
- If temporary glitch: Monitor for repetition
- If persistent: Escalate to upstream provider

### Level 3: Cross-Log Divergence (WARNING)

**Symptoms:**
- `attestor_rekor_cross_log_divergence_total` increments
- Alert received: "rekor.checkpoint.cross_log_divergence"

**Immediate Actions:**
1. **Do not panic** - Mirrors may have legitimate lag
2. **Check mirror sync status** - May be catching up

**Investigation Steps:**
1. Compare tree sizes:
   ```bash
   stella attestor checkpoint list --origins rekor.sigstore.dev,rekor.mirror.example.com
   ```
2. If same tree size with different roots: Escalate to CRITICAL
3. If different tree sizes: Allow time for sync
4. If persistent: Investigate mirror operator

**Resolution:**
- Sync lag: Monitor until caught up
- Persistent divergence: Disable mirror, investigate, or remove from trust list

### Level 4: Stale Checkpoint (WARNING)

**Symptoms:**
- `attestor_rekor_checkpoint_age_seconds` exceeds threshold
- Log health status: DEGRADED or UNHEALTHY

**Immediate Actions:**
1. Check log service status
2. Verify network connectivity to log

**Investigation Steps:**
1. Check Sigstore status page
2. Test direct API access:
   ```bash
   curl -I https://rekor.sigstore.dev/api/v1/log
   ```
3. Review recent checkpoint fetch attempts

**Resolution:**
- Upstream outage: Wait, rely on cached data
- Local network issue: Restore connectivity
- Persistent: Consider failover to mirror

## Configuration

### Detector Options

```yaml
attestor:
  divergenceDetection:
    # Enable checkpoint monitoring
    enabled: true

    # Threshold for "stale checkpoint" warning
    staleCheckpointThreshold: 1h

    # Threshold for "stale tree size" (no growth)
    staleTreeSizeThreshold: 2h

    # Log health thresholds
    degradedCheckpointAgeThreshold: 30m
    unhealthyCheckpointAgeThreshold: 2h

    # Enable cross-log consistency checks
    enableCrossLogChecks: true

    # Mirror origins to check against primary
    mirrorOrigins:
      - rekor.mirror.example.com
      - rekor.mirror2.example.com
```
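
The two log health thresholds map a checkpoint's age onto a health status. The Python sketch below illustrates that mapping with the example values from the configuration; the function name, the `HEALTHY` label, and the use of strict comparisons are assumptions rather than documented detector behavior.

```python
from datetime import timedelta


def log_health(checkpoint_age: timedelta,
               degraded_after: timedelta = timedelta(minutes=30),
               unhealthy_after: timedelta = timedelta(hours=2)) -> str:
    """Map the age of the last valid checkpoint to a health status."""
    if checkpoint_age > unhealthy_after:
        return "UNHEALTHY"
    if checkpoint_age > degraded_after:
        return "DEGRADED"
    return "HEALTHY"


print(log_health(timedelta(seconds=120)))  # HEALTHY
print(log_health(timedelta(minutes=45)))   # DEGRADED
print(log_health(timedelta(hours=3)))      # UNHEALTHY
```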

### Alert Options

```yaml
attestor:
  alerts:
    # Enable alert publishing to Notify service
    enabled: true

    # Default tenant for system alerts
    defaultTenant: system

    # Severity thresholds for alerting
    alertOnHighSeverity: true
    alertOnWarning: true
    alertOnInfo: false

    # Alert stream name
    stream: attestor.alerts
```

## Runbook Checklist

### Daily Operations
- [ ] Verify `attestor_rekor_checkpoint_age_seconds` < threshold
- [ ] Check for any anomaly counter increments
- [ ] Review divergence detector logs for warnings

### Weekly Review
- [ ] Audit checkpoint storage integrity
- [ ] Verify mirror sync status
- [ ] Review and tune alerting thresholds

### Post-Incident
- [ ] Document root cause
- [ ] Update detection rules if needed
- [ ] Review and improve response procedures
- [ ] Share learnings with team

## See Also

- [Rekor Verification Design](../modules/attestor/rekor-verification-design.md)
- [Attestor Architecture](../modules/attestor/architecture.md)
- [Sigstore Rekor Documentation](https://docs.sigstore.dev/rekor/overview/)
- [Certificate Transparency RFC 6962](https://www.rfc-editor.org/rfc/rfc6962)
443
docs/operations/dual-control-ceremony-runbook.md
Normal file
@@ -0,0 +1,443 @@
# Dual-Control Ceremony Runbook

This runbook documents M-of-N threshold signing ceremonies for high-assurance key operations in Stella Ops.

> **Sprint:** SPRINT_20260112_018_SIGNER_dual_control_ceremonies

## Overview

Dual-control ceremonies ensure critical cryptographic operations require approval from multiple authorized individuals before execution. This prevents single points of compromise for sensitive operations like:

- Root key rotation
- Trust anchor updates
- Emergency key revocation
- HSM key generation
- Recovery key activation

## When Ceremonies Are Required

| Operation | Default Threshold | Configurable |
|-----------|------------------|--------------|
| Root signing key rotation | 2-of-3 | Yes |
| Trust anchor update | 2-of-3 | Yes |
| Key revocation | 2-of-3 | Yes |
| HSM key generation | 2-of-4 | Yes |
| Recovery key activation | 3-of-5 | Yes |

## Ceremony Lifecycle

### State Machine

```
     +------------------+
     |     Pending      |
     +--------+---------+
              |
              | Approvals collected
              v
+-------------+-------------+
|     PartiallyApproved     |
+-------------+-------------+
              |
              | Threshold reached
              v
     +--------+---------+
     |     Approved     |
     +--------+---------+
              |
              | Execute
              v
     +--------+---------+
     |     Executed     |
     +------------------+

Alternative paths:
- Pending -> Expired (timeout)
- Pending -> Cancelled (initiator cancel)
- PartiallyApproved -> Expired (timeout)
- PartiallyApproved -> Cancelled
```

### State Descriptions

| State | Description |
|-------|-------------|
| `Pending` | Ceremony created, awaiting first approval |
| `PartiallyApproved` | At least one approval, threshold not reached |
| `Approved` | Threshold reached, ready for execution |
| `Executed` | Operation completed successfully |
| `Expired` | Timeout reached without execution |
| `Cancelled` | Explicitly cancelled before execution |
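
The lifecycle can be expressed as a transition table, which makes it easy to reject an out-of-order action such as executing a ceremony that is still `Pending`. The Python sketch below encodes only the transitions drawn in the state machine diagram; the names `TRANSITIONS` and `can_transition` are illustrative and not taken from the Signer codebase.

```python
# Transitions exactly as drawn in the state machine diagram
TRANSITIONS = {
    "Pending": {"PartiallyApproved", "Expired", "Cancelled"},
    "PartiallyApproved": {"Approved", "Expired", "Cancelled"},
    "Approved": {"Executed"},
    "Executed": set(),    # terminal
    "Expired": set(),     # terminal
    "Cancelled": set(),   # terminal
}


def can_transition(current: str, target: str) -> bool:
    """True when the diagram allows moving from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())


print(can_transition("Pending", "PartiallyApproved"))  # True
print(can_transition("Pending", "Executed"))           # False
```

The state descriptions suggest that an `Approved` ceremony may also expire or be cancelled before execution; the diagram does not list those paths, so they are left out here.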

## Creating a Ceremony

### Via CLI

```bash
stella ceremony create \
  --type key-rotation \
  --subject "Root signing key Q1-2026" \
  --threshold 2 \
  --required-approvers 3 \
  --expires-in 24h \
  --payload '{"keyId": "root-2026-q1", "algorithm": "ecdsa-p384"}'
```

### Via API

```bash
curl -X POST https://signer.example.com/api/v1/ceremonies \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "key-rotation",
    "subject": "Root signing key Q1-2026",
    "threshold": 2,
    "requiredApprovers": 3,
    "expiresAt": "2026-01-17T10:00:00Z",
    "payload": {
      "keyId": "root-2026-q1",
      "algorithm": "ecdsa-p384"
    }
  }'
```

### Response

```json
{
  "ceremonyId": "cer-abc123",
  "type": "key-rotation",
  "state": "Pending",
  "threshold": 2,
  "requiredApprovers": 3,
  "currentApprovals": 0,
  "createdAt": "2026-01-16T10:00:00Z",
  "expiresAt": "2026-01-17T10:00:00Z",
  "initiator": "admin@company.com"
}
```

## Approving a Ceremony

### Prerequisites

Approvers must:
1. Be in the ceremony's allowed approvers list
2. Have the `ceremony:approve` scope
3. Have valid authentication (OIDC or break-glass)
4. Not have already approved this ceremony

### Via CLI

```bash
stella ceremony approve \
  --ceremony-id cer-abc123 \
  --reason "Reviewed rotation plan, verified key parameters" \
  --sign
```

The `--sign` flag creates a DSSE signature over the approval using the approver's signing key.

### Via API

```bash
curl -X POST https://signer.example.com/api/v1/ceremonies/cer-abc123/approve \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "Reviewed rotation plan, verified key parameters",
    "signature": "base64-encoded-dsse-signature"
  }'
```

### Approval Response

```json
{
  "ceremonyId": "cer-abc123",
  "state": "PartiallyApproved",
  "currentApprovals": 1,
  "threshold": 2,
  "approval": {
    "approvalId": "apr-def456",
    "approver": "security-lead@company.com",
    "approvedAt": "2026-01-16T11:30:00Z",
    "reason": "Reviewed rotation plan, verified key parameters",
    "signatureValid": true
  }
}
```

## Executing a Ceremony

Once the approval threshold is reached:

### Via CLI

```bash
stella ceremony execute --ceremony-id cer-abc123
```

### Via API

```bash
curl -X POST https://signer.example.com/api/v1/ceremonies/cer-abc123/execute \
  -H "Authorization: Bearer $TOKEN"
```

### Execution Response

```json
{
  "ceremonyId": "cer-abc123",
  "state": "Executed",
  "executedAt": "2026-01-16T14:00:00Z",
  "result": {
    "keyId": "root-2026-q1",
    "publicKey": "-----BEGIN PUBLIC KEY-----...",
    "fingerprint": "SHA256:abc123...",
    "activatedAt": "2026-01-16T14:00:00Z"
  }
}
```

## Monitoring Ceremonies

### List Active Ceremonies

```bash
# CLI
stella ceremony list --state pending,partially-approved

# API (bearer token included for consistency with the other API calls)
curl -H "Authorization: Bearer $TOKEN" \
  "https://signer.example.com/api/v1/ceremonies?state=pending,partially-approved"
```

### Check Ceremony Status

```bash
# CLI
stella ceremony status --ceremony-id cer-abc123

# API (bearer token included for consistency with the other API calls)
curl -H "Authorization: Bearer $TOKEN" \
  "https://signer.example.com/api/v1/ceremonies/cer-abc123"
```

## Cancelling a Ceremony

Ceremonies can be cancelled before execution:

```bash
# CLI
stella ceremony cancel \
  --ceremony-id cer-abc123 \
  --reason "Postponed due to schedule conflict"

# API
curl -X DELETE https://signer.example.com/api/v1/ceremonies/cer-abc123 \
  -H "Authorization: Bearer $TOKEN"
```

Only the initiator or users with the `ceremony:cancel` scope can cancel.

## Audit Events

All ceremony actions are logged:

| Event | Description |
|-------|-------------|
| `signer.ceremony.initiated` | Ceremony created |
| `signer.ceremony.approved` | Approval submitted |
| `signer.ceremony.approval_rejected` | Approval rejected (invalid signature, unauthorized) |
| `signer.ceremony.executed` | Operation executed |
| `signer.ceremony.expired` | Timeout reached |
| `signer.ceremony.cancelled` | Explicitly cancelled |

### Audit Event Structure

```json
{
  "eventType": "signer.ceremony.approved",
  "timestamp": "2026-01-16T11:30:00Z",
  "ceremonyId": "cer-abc123",
  "ceremonyType": "key-rotation",
  "actor": "security-lead@company.com",
  "approvalId": "apr-def456",
  "currentApprovals": 1,
  "threshold": 2,
  "signatureAlgorithm": "ecdsa-p256",
  "signatureKeyId": "user-signing-key-456"
}
```

### Query Audit Logs

```bash
stella audit query \
  --event-type "signer.ceremony.*" \
  --since 7d \
  --ceremony-id cer-abc123
```

## Configuration

### Ceremony Settings

```yaml
# signer-config.yaml
ceremonies:
  enabled: true
  defaultTimeout: 24h
  maxTimeout: 168h  # 7 days
  requireSignedApprovals: true

  thresholds:
    key-rotation:
      minimum: 2
      default: 2
      maximum: 5
    key-revocation:
      minimum: 2
      default: 3
      maximum: 5
    trust-anchor-update:
      minimum: 2
      default: 2
      maximum: 4
```
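
Each ceremony type carries a minimum, default, and maximum threshold. The Python sketch below shows how a requested threshold might be resolved against those bounds; the `THRESHOLDS` dictionary mirrors the example configuration, while the function name and the error handling are assumptions, not the Signer's actual validation logic.

```python
from typing import Optional

# Mirrors the `thresholds` section of the example signer-config.yaml
THRESHOLDS = {
    "key-rotation": {"minimum": 2, "default": 2, "maximum": 5},
    "key-revocation": {"minimum": 2, "default": 3, "maximum": 5},
    "trust-anchor-update": {"minimum": 2, "default": 2, "maximum": 4},
}


def resolve_threshold(ceremony_type: str, requested: Optional[int] = None) -> int:
    """Return the threshold to use, falling back to the configured default."""
    limits = THRESHOLDS[ceremony_type]
    if requested is None:
        return limits["default"]
    if not limits["minimum"] <= requested <= limits["maximum"]:
        raise ValueError(
            f"{ceremony_type}: threshold {requested} outside "
            f"{limits['minimum']}..{limits['maximum']}"
        )
    return requested


print(resolve_threshold("key-revocation"))   # 3
print(resolve_threshold("key-rotation", 4))  # 4
```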

### Approver Configuration

```yaml
# approvers.yaml
approverGroups:
  - name: key-custodians
    members:
      - security-lead@company.com
      - ciso@company.com
      - key-officer-1@company.com
      - key-officer-2@company.com
    operations:
      - key-rotation
      - key-revocation

  - name: trust-admins
    members:
      - trust-admin@company.com
      - security-lead@company.com
    operations:
      - trust-anchor-update
```

## Notifications

Ceremonies trigger notifications to approvers:

| Event | Notification |
|-------|-------------|
| Ceremony created | Email/Slack to all eligible approvers |
| Approval submitted | Email/Slack to remaining approvers |
| Threshold reached | Email/Slack to initiator |
| Approaching expiry | Email/Slack at 75% and 90% of timeout |
| Expired | Email/Slack to initiator and approvers |
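
The "approaching expiry" reminders fall at fixed fractions of the ceremony's lifetime. The Python sketch below works out those instants for the 24-hour example ceremony used elsewhere in this runbook; the function name is hypothetical, and it assumes the percentages are measured from creation time to expiry.

```python
from datetime import datetime, timezone


def expiry_reminders(created_at: datetime, expires_at: datetime,
                     fractions=(0.75, 0.90)) -> list:
    """Instants at which 'approaching expiry' notifications would be due."""
    lifetime = expires_at - created_at
    return [created_at + lifetime * f for f in fractions]


created = datetime(2026, 1, 16, 10, 0, tzinfo=timezone.utc)
expires = datetime(2026, 1, 17, 10, 0, tzinfo=timezone.utc)
for when in expiry_reminders(created, expires):
    print(when.isoformat())
```

For a 24-hour timeout this gives reminders 18 hours and 21 hours 36 minutes after creation.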

Configure notifications in `notifier-config.yaml`:

```yaml
notifications:
  ceremonies:
    enabled: true
    channels:
      - type: email
        recipients: "@approverGroup"
      - type: slack
        webhook: ${SLACK_CEREMONY_WEBHOOK}
        channel: "#key-ceremonies"
```

## Security Best Practices

### Approver Requirements

- Maintain at least one more eligible approver than the threshold requires (at least M+1 approvers for an M-of-N ceremony)
- Distribute approvers across security domains
- Require hardware tokens for signing keys
- Rotate approver list quarterly

### Ceremony Hygiene

- Use descriptive subjects for audit clarity
- Set timeouts long enough for approvers to respond but no longer than needed (the example configuration uses a 24h default and a 168h maximum)
- Document approval reasons thoroughly
- Review executed ceremonies monthly

### Monitoring

Set up alerts for:

```yaml
alerts:
  - name: CeremonyPendingTooLong
    condition: ceremony.pending_duration > 12h
    severity: warning

  - name: CeremonyApprovalRejected
    condition: ceremony.approval_rejected
    severity: critical

  - name: UnauthorizedCeremonyAttempt
    condition: ceremony.unauthorized_access
    severity: critical
```

## Troubleshooting

### Common Issues

| Issue | Cause | Resolution |
|-------|-------|------------|
| Approval rejected | Invalid signature | Re-sign with correct key |
| Cannot approve | Already approved | Different approver must approve |
| Cannot execute | Threshold not met | Collect more approvals |
| Ceremony expired | Timeout reached | Create new ceremony |

### Signature Verification Failures

```bash
# Verify signing key is accessible
stella auth keys list

# Test signature
echo "test" | stella sign --key-id my-signing-key | stella verify

# Check key permissions
stella auth keys info --key-id my-signing-key
```

## Emergency Procedures

### Stuck Ceremony

If a ceremony is stuck (approvers unavailable):

1. Cancel the stuck ceremony
2. Create new ceremony with available approvers
3. Document the situation in audit notes

### Compromised Approver

If an approver's credentials are compromised:

1. Revoke approver's signing key immediately
2. Cancel any pending ceremonies they created
3. Review recent approvals for anomalies
4. Remove from approver groups
5. Document in incident report

## Related Documentation

- [Key Rotation Runbook](./key-rotation-runbook.md)
- [HSM Setup Runbook](./hsm-setup-runbook.md)
- [Signer Architecture](../modules/signer/architecture.md)
- [Break-Glass Runbook](./break-glass-runbook.md)
278
docs/operations/evidence-migration.md
Normal file
@@ -0,0 +1,278 @@
# Evidence Migration Guide

This guide covers evidence-specific migration procedures during upgrades, schema changes, or disaster recovery scenarios.

## Overview

Evidence bundles are cryptographically linked data structures that must maintain integrity across upgrades. This guide ensures chain-of-custody is preserved during migrations.
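
To illustrate why a single root hash can stand in for the whole evidence store, here is a minimal Merkle-root sketch. It is an assumption-laden illustration, not the Evidence Locker's actual construction: the SHA-256 hash, the leaf/node prefixes, the duplicate-last-node padding, and the name `merkle_root` are all hypothetical choices.

```python
import hashlib

def merkle_root(records):
    """Compute a hex Merkle root over a list of byte strings (illustrative layout only)."""
    if not records:
        raise ValueError("at least one record is required")
    # Leaves and interior nodes use different prefixes so a leaf cannot pose as a node.
    level = [hashlib.sha256(b"\x00" + record).digest() for record in records]
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last node
            level.append(level[-1])
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()

records = [b"bundle-1", b"bundle-2", b"bundle-3"]
root_before = merkle_root(records)
root_after = merkle_root([b"bundle-1", b"bundle-2 (modified)", b"bundle-3"])
print(root_before != root_after)
```

Changing any one record changes the root, which is why the procedures below record the root before and after a migration.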

## Quick Reference

| Scenario | CLI Command | Risk Level | Downtime |
|----------|-------------|------------|----------|
| Schema upgrade | `stella evidence migrate` | Medium | Minutes |
| Reindex after algorithm change | `stella evidence reindex` | Low | None |
| Cross-version continuity check | `stella evidence verify-continuity` | None | None |
| Full evidence export | `stella evidence export --all` | None | None |

## Pre-Migration Checklist

### 1. Capture Current State

```bash
# Record current evidence statistics
stella evidence stats --detailed > pre-migration-stats.json

# Export Merkle roots for all tenants
stella evidence roots-export --all > pre-migration-roots.json

# Verify existing evidence integrity
if ! stella evidence verify-all --output pre-migration-verify.json; then
  echo "ABORT: Evidence integrity check failed"
  exit 1
fi
```

### 2. Create Evidence Backup

```bash
# Full evidence bundle export
stella evidence export \
  --all \
  --include-attestations \
  --include-proofs \
  --output /backup/evidence-$(date +%Y%m%d)/

# Verify export integrity
stella evidence verify-bundle /backup/evidence-*/
```

### 3. Document Chain-of-Custody

```bash
# Record the current root hashes
OLD_MERKLE_ROOT=$(stella evidence roots-export --format json | jq -r '.globalRoot')
echo "Pre-migration Merkle root: ${OLD_MERKLE_ROOT}" > custody-log.txt
date >> custody-log.txt
```

## Migration Procedures

### Schema Migration (Version Upgrade)

When upgrading between versions with schema changes:

```bash
# Step 1: Assess migration impact (dry-run)
stella evidence migrate \
  --from-version 1.0 \
  --to-version 2.0 \
  --dry-run

# Step 2: Review migration plan output
# Ensure all changes are expected

# Step 3: Execute migration
stella evidence migrate \
  --from-version 1.0 \
  --to-version 2.0

# Step 4: Verify migration
stella evidence verify-all
```

### Evidence Reindex (Algorithm Change)

When the hashing algorithm or Merkle tree structure changes:

```bash
# Step 1: Assess reindex impact
stella evidence reindex \
  --dry-run \
  --output reindex-plan.json

# Review reindex-plan.json for:
# - Total records affected
# - Estimated duration
# - New schema version

# Step 2: Execute reindex with batching
stella evidence reindex \
  --batch-size 100 \
  --since 2026-01-01

# Step 3: Capture new root
NEW_MERKLE_ROOT=$(stella evidence roots-export --format json | jq -r '.globalRoot')
echo "Post-migration Merkle root: ${NEW_MERKLE_ROOT}" >> custody-log.txt
date >> custody-log.txt
```

### Chain-of-Custody Verification

After any evidence migration, verify continuity:

```bash
# Verify that old proofs remain valid
stella evidence verify-continuity \
  --old-root "${OLD_MERKLE_ROOT}" \
  --new-root "${NEW_MERKLE_ROOT}" \
  --output continuity-report.html \
  --format html

# Check verification results
if grep -q "FAIL" continuity-report.html; then
  echo "ERROR: Chain-of-custody verification failed!"
  echo "Review continuity-report.html for details"
  exit 1
fi
```

## Rollback Procedures

### Immediate Rollback (Within Migration Window)

```bash
# If migration fails mid-way, rollback is automatic
# Check current migration state
stella evidence migrate --status

# Force rollback if needed
stella evidence migrate \
  --rollback \
  --from-version 2.0
```

### Restore from Backup

```bash
# Step 1: Stop evidence-related services
kubectl scale deployment evidence-locker --replicas=0

# Step 2: Restore the PostgreSQL evidence schema
# (pg_restore --table does not accept wildcards, so select by schema)
pg_restore -d stellaops \
  --schema=evidence \
  /backup/stellaops-backup.dump

# Step 3: Restore evidence files
stella evidence import /backup/evidence-*/

# Step 4: Verify restoration
stella evidence verify-all

# Step 5: Restart services
kubectl scale deployment evidence-locker --replicas=3
```

## Air-Gap Migration

For air-gapped environments without network access:

### Export Phase (Online Environment)

```bash
# Create portable evidence bundle
stella evidence export \
  --all \
  --portable \
  --include-schemas \
  --output /media/airgap-evidence.tar.gz

# Generate checksums
sha256sum /media/airgap-evidence.tar.gz > /media/checksums.txt
```

### Transfer Phase

1. Copy to removable media
2. Verify checksums at destination
3. Scan the media for malware before importing

### Import Phase (Air-Gap Environment)

```bash
# Verify transfer integrity
sha256sum -c /media/checksums.txt

# Import evidence bundle
stella evidence import \
  --portable \
  /media/airgap-evidence.tar.gz

# Verify import
stella evidence verify-all
```
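
Where `sha256sum` is not available on the air-gapped host, the same transfer check can be approximated in a few lines. This is a sketch only: the helper names `sha256_file` and `failed_entries` are hypothetical, it assumes the plain `<hex digest>  <path>` format that `sha256sum` writes, and it ignores the escaped-filename variant of that format. The demo at the bottom uses a temporary directory in place of `/media`.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large bundles are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def failed_entries(checksum_file):
    """Return the paths listed in a sha256sum-style file whose digest no longer matches."""
    failures = []
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary-mode entries with '*'
        if sha256_file(name) != expected.lower():
            failures.append(name)
    return failures

# Demo with a stand-in bundle; real usage would point at /media/checksums.txt.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp) / "airgap-evidence.tar.gz"
    bundle.write_bytes(b"example bundle contents")
    checksums = Path(tmp) / "checksums.txt"
    checksums.write_text(f"{sha256_file(bundle)}  {bundle}\n")
    intact = failed_entries(checksums)
    bundle.write_bytes(b"tampered")
    tampered = failed_entries(checksums)

print(intact, len(tampered))
```

An empty result means every listed file matched; any returned path should block the import.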

## Troubleshooting

### Migration Stuck or Timeout

```bash
# Check migration status
stella evidence migrate --status

# View migration logs
stella evidence migrate --logs

# Resume from last checkpoint
stella evidence migrate --resume
```

### Root Hash Mismatch

If verification reports a root hash mismatch:

1. **Do not proceed** with the upgrade
2. Check for data corruption:
   ```bash
   stella evidence integrity-check --deep
   ```
3. Review recent changes to the evidence store
4. Contact support with the integrity report

### Missing Evidence Records

```bash
# Count records by type
stella evidence stats --by-type

# Find orphaned records
stella evidence orphans --list

# Reconcile with source systems
stella evidence reconcile --source attestor
```

### Performance Issues

For large evidence stores (>1M records):

```bash
# Run reindex in parallel batches
stella evidence reindex \
  --parallel 4 \
  --batch-size 500 \
  --since 2026-01-01

# Monitor progress
stella evidence reindex --progress
```

## Audit Trail Requirements

All evidence migrations must maintain an audit trail:

| Event | Required Data | Retention |
|-------|---------------|-----------|
| Migration Start | Timestamp, version, operator | Permanent |
| Schema Change | Before/after schema versions | Permanent |
| Root Hash Change | Old root, new root, cross-reference | Permanent |
| Verification | Pass/fail, anomalies, timestamps | 7 years |
| Rollback | Reason, restored version | Permanent |

## Related Documents

- [Upgrade Runbook](upgrade-runbook.md) - Overall upgrade procedures
- [Blue-Green Deployment](blue-green-deployment.md) - Zero-downtime deployment
- [Evidence Locker Architecture](../modules/evidencelocker/architecture.md) - Technical design
- [Air-Gap Operations](airgap-operations-runbook.md) - Offline deployment guide
@@ -34,6 +34,8 @@ pkcs11-tool --version
|
||||
|
||||
## SoftHSM2 Setup (Development)
|
||||
|
||||
See [docs/operations/softhsm2-test-environment.md](operations/softhsm2-test-environment.md) for a focused test environment setup.
|
||||
|
||||
### Step 1: Initialize SoftHSM
|
||||
|
||||
```bash
|
||||
@@ -197,7 +199,7 @@ stringData:
|
||||
|
||||
```bash
|
||||
# Run HSM connectivity doctor check
|
||||
stella doctor --check hsm
|
||||
stella doctor --check check.crypto.hsm
|
||||
|
||||
# Expected output:
|
||||
# [PASS] HSM Connectivity
|
||||
|
||||
417
docs/operations/key-escrow-runbook.md
Normal file
@@ -0,0 +1,417 @@
# Key Escrow and Recovery Runbook

This runbook documents Shamir secret sharing key escrow and recovery procedures in Stella Ops.

> **Sprint:** SPRINT_20260112_018_CRYPTO_key_escrow_shamir

## Overview

Key escrow ensures critical cryptographic keys can be recovered if primary access is lost. Stella Ops uses Shamir's Secret Sharing to split keys into shares distributed among trusted custodians.

Key features:
- M-of-N threshold recovery (any M shares reconstruct the key)
- Share encryption at rest
- Custodian-based share distribution
- Integration with dual-control ceremonies
- Full audit trail

## When to Use Key Escrow

| Scenario | Escrow Required |
|----------|-----------------|
| Root signing keys | Yes |
| HSM master keys | Yes |
| Trust anchor keys | Yes |
| Service signing keys | Recommended |
| User signing keys | Optional |
| Ephemeral keys | No |

## Shamir Secret Sharing

### How It Works

Shamir's Secret Sharing splits a secret into N shares where any M shares can reconstruct the original:

```
Secret S → Split(S, M, N) → [Share₁, Share₂, ..., Shareₙ]

Any M shares → Combine → Secret S
Fewer than M shares → Cannot reconstruct
```
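
The split and combine steps above can be sketched with polynomial interpolation over a prime field. This is a teaching sketch under stated assumptions, not the Stella Ops implementation: the field prime `2**127 - 1`, the integer encoding of the secret, and the function names `split` and `combine` are all hypothetical, and production implementations commonly work byte-wise over GF(256) instead.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this

def split(secret, threshold, total_shares):
    """Split an integer secret into (x, y) shares; any `threshold` of them recover it."""
    if not 2 <= threshold <= total_shares:
        raise ValueError("need 2 <= threshold <= total_shares")
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the field")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for coeff in reversed(coeffs):  # Horner's rule
            acc = (acc * x + coeff) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, total_shares + 1)]

def combine(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

secret = 123456789
shares = split(secret, threshold=3, total_shares=5)
recovered_first = combine(shares[:3])
recovered_other = combine([shares[0], shares[2], shares[4]])
print(recovered_first == secret, recovered_other == secret)
```

Any three of the five shares give back the same value, while two shares leave the polynomial underdetermined, which is the M-of-N property the rest of this runbook relies on.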

### Configuration Parameters

| Parameter | Description | Recommended |
|-----------|-------------|-------------|
| Threshold (M) | Minimum shares needed | 2-3, depending on key type (see Threshold Guidelines) |
| Total Shares (N) | Total shares created | M + 2 minimum |
| Share Encryption | Encrypt shares at rest | Always enabled |

### Threshold Guidelines

| Key Type | Minimum M | Recommended N | Rationale |
|----------|-----------|---------------|-----------|
| Root keys | 3 | 5 | High assurance |
| HSM keys | 2 | 4 | Availability + security |
| Service keys | 2 | 3 | Operational recovery |

## Escrowing a Key

### Via CLI

```bash
stella escrow create \
  --key-id root-signing-key-2026 \
  --threshold 3 \
  --shares 5 \
  --custodians custodian-1,custodian-2,custodian-3,custodian-4,custodian-5 \
  --expires-in 365d \
  --reason "Annual key escrow for root signing key"
```

### Via API

```bash
curl -X POST https://signer.example.com/api/v1/escrow \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "keyId": "root-signing-key-2026",
    "threshold": 3,
    "totalShares": 5,
    "custodianIds": [
      "custodian-1", "custodian-2", "custodian-3",
      "custodian-4", "custodian-5"
    ],
    "expirationDays": 365,
    "reason": "Annual key escrow for root signing key"
  }'
```

### Escrow Response

```json
{
  "escrowId": "esc-abc123",
  "keyId": "root-signing-key-2026",
  "threshold": 3,
  "totalShares": 5,
  "status": "Active",
  "createdAt": "2026-01-16T10:00:00Z",
  "expiresAt": "2027-01-16T10:00:00Z",
  "shares": [
    { "shareId": "shr-001", "custodianId": "custodian-1", "distributed": true },
    { "shareId": "shr-002", "custodianId": "custodian-2", "distributed": true },
    { "shareId": "shr-003", "custodianId": "custodian-3", "distributed": true },
    { "shareId": "shr-004", "custodianId": "custodian-4", "distributed": true },
    { "shareId": "shr-005", "custodianId": "custodian-5", "distributed": true }
  ]
}
```

## Share Distribution

### Distribution Methods

| Method | Security | Use Case |
|--------|----------|----------|
| Direct API delivery | High | Automated systems |
| Encrypted email | Medium | Remote custodians |
| In-person ceremony | Highest | Root keys |
| Hardware token | Highest | HSM keys |

### Custodian Requirements

Each custodian must:
1. Have verified identity in Authority
2. Complete escrow custodian training
3. Have secure share storage capability
4. Be geographically distributed (recommended)

### Verifying Share Distribution

```bash
stella escrow status --escrow-id esc-abc123

# Output:
# Escrow: esc-abc123
# Key: root-signing-key-2026
# Status: Active
# Threshold: 3 of 5
# Shares:
#   [1] custodian-1: Distributed ✓
#   [2] custodian-2: Distributed ✓
#   [3] custodian-3: Distributed ✓
#   [4] custodian-4: Distributed ✓
#   [5] custodian-5: Distributed ✓
```

## Key Recovery

### Prerequisites

Recovery requires:
1. Valid recovery request (incident, key loss, rotation)
2. Dual-control ceremony approval (if configured)
3. At least M custodians available with their shares
4. Secure recovery environment

### Recovery Workflow

```
1. Initiate recovery request
2. (If required) Dual-control ceremony approval
3. Collect shares from M custodians
4. Verify share checksums
5. Reconstruct key
6. Verify reconstructed key
7. Log recovery event
```

### Via CLI

```bash
# Step 1: Initiate recovery
stella escrow recover init \
  --escrow-id esc-abc123 \
  --reason "HSM failure - emergency key recovery" \
  --ceremony-required

# Step 2: Collect shares (each custodian runs this)
stella escrow recover submit-share \
  --recovery-id rec-xyz789 \
  --share-file /secure/my-share.enc \
  --passphrase-file /secure/passphrase

# Step 3: Execute recovery (after threshold reached)
stella escrow recover execute \
  --recovery-id rec-xyz789 \
  --output-key-file /secure/recovered-key.pem
```

### Via API

```bash
# Initiate recovery
curl -X POST https://signer.example.com/api/v1/escrow/esc-abc123/recover \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "HSM failure - emergency key recovery",
    "requireCeremony": true
  }'

# Submit share
curl -X POST https://signer.example.com/api/v1/recovery/rec-xyz789/shares \
  -H "Authorization: Bearer $CUSTODIAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "shareId": "shr-001",
    "encryptedShare": "base64-encoded-share",
    "checksum": "sha256:abc123..."
  }'

# Execute recovery (after threshold)
curl -X POST https://signer.example.com/api/v1/recovery/rec-xyz789/execute \
  -H "Authorization: Bearer $TOKEN"
```

### Recovery Response

```json
{
  "recoveryId": "rec-xyz789",
  "status": "Completed",
  "keyId": "root-signing-key-2026",
  "sharesCollected": 3,
  "threshold": 3,
  "completedAt": "2026-01-16T15:30:00Z",
  "keyFingerprint": "SHA256:xyz789...",
  "verified": true
}
```

## Share Management

### Custodian Share Storage

Custodians should store shares using one of the following:

| Storage | Security Level | Notes |
|---------|----------------|-------|
| HSM | Highest | Preferred for root keys |
| Hardware token | High | YubiKey, smart card |
| Encrypted file | Medium | AES-256-GCM minimum |
| Password manager | Medium | Enterprise vault only |

### Share Format

```json
{
  "shareId": "shr-001",
  "escrowId": "esc-abc123",
  "index": 1,
  "threshold": 3,
  "totalShares": 5,
  "encryptedData": "base64-encoded-aes-256-gcm-ciphertext",
  "checksum": "sha256:abc123...",
  "createdAt": "2026-01-16T10:00:00Z",
  "expiresAt": "2027-01-16T10:00:00Z"
}
```
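
A custodian can check a stored share against its `checksum` field before submitting it. The sketch below is hedged: it assumes the checksum is SHA-256 over the base64-decoded `encryptedData` bytes, written as `sha256:<hex>`, which this runbook does not actually specify. The helper names `share_checksum` and `checksum_matches` are hypothetical.

```python
import base64
import hashlib
import hmac

def share_checksum(encrypted_data_b64):
    """Assumed scheme: 'sha256:' + hex digest of the decoded ciphertext bytes."""
    raw = base64.b64decode(encrypted_data_b64, validate=True)
    return "sha256:" + hashlib.sha256(raw).hexdigest()

def checksum_matches(share):
    """Compare the recomputed checksum with the stored one in constant time."""
    return hmac.compare_digest(share_checksum(share["encryptedData"]), share["checksum"])

# Demo share built from placeholder bytes rather than a real encrypted share.
ciphertext = base64.b64encode(b"example ciphertext").decode("ascii")
share = {
    "shareId": "shr-001",
    "encryptedData": ciphertext,
    "checksum": share_checksum(ciphertext),
}
intact_ok = checksum_matches(share)

corrupted = dict(share, encryptedData=base64.b64encode(b"corrupted bytes").decode("ascii"))
corrupted_ok = checksum_matches(corrupted)
print(intact_ok, corrupted_ok)
```

A mismatch corresponds to the "Share checksum mismatch" row in the troubleshooting table, where the resolution is to request re-distribution.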

### Share Rotation

Re-escrow keys periodically:

```bash
stella escrow re-escrow \
  --escrow-id esc-abc123 \
  --new-custodians custodian-1,custodian-2,custodian-6,custodian-7,custodian-8 \
  --reason "Annual share rotation"
```

This creates new shares and revokes old ones.

## Audit Trail

### Audit Events

| Event | Description |
|-------|-------------|
| `escrow.created` | Key escrowed |
| `escrow.share.distributed` | Share sent to custodian |
| `escrow.share.accessed` | Custodian accessed share |
| `recovery.initiated` | Recovery started |
| `recovery.share.submitted` | Share submitted for recovery |
| `recovery.completed` | Key reconstructed |
| `recovery.failed` | Recovery failed |
| `escrow.revoked` | Escrow revoked |

### Query Audit Logs

```bash
stella audit query \
  --event-type "escrow.*,recovery.*" \
  --escrow-id esc-abc123 \
  --since 30d
```

## Configuration

### Escrow Settings

```yaml
# escrow-config.yaml
escrow:
  enabled: true
  defaultThreshold: 2
  minimumThreshold: 2
  maximumShares: 10
  shareEncryption:
    algorithm: AES-256-GCM
    keyDerivation: HKDF-SHA256
  requireDualControlForRecovery: true
  maxRecoveryAttempts: 3
  recoveryTimeoutHours: 24
```

### Custodian Configuration

```yaml
# custodians.yaml
custodians:
  - id: custodian-1
    name: "Security Lead"
    email: security-lead@company.com
    publicKey: "-----BEGIN PUBLIC KEY-----..."
    location: "US-East"

  - id: custodian-2
    name: "Key Officer A"
    email: key-officer-a@company.com
    publicKey: "-----BEGIN PUBLIC KEY-----..."
    location: "EU-West"
```

## Security Considerations

### Share Security

- Never transmit shares in plaintext
- Encrypt shares with custodian's public key
- Verify checksums before and after storage
- Use secure channels for distribution

### Recovery Security

- Require dual-control ceremonies for critical keys
- Limit recovery time window
- Verify recovered key fingerprint
- Audit all recovery attempts

### Custodian Security

- Verify custodian identity before share access
- Geographic distribution reduces collusion risk
- Rotate custodians periodically
- Train custodians on secure handling

## Troubleshooting

### Common Issues

| Issue | Cause | Resolution |
|-------|-------|------------|
| Share checksum mismatch | Corrupted share | Request re-distribution |
| Cannot decrypt share | Wrong passphrase | Verify passphrase |
| Recovery timeout | Shares not collected in time | Restart recovery |
| Key verification failed | Wrong shares combined | Verify share indices |

### Verification Failures

```bash
# Verify share integrity
stella escrow verify-share --share-file share.enc

# Test reconstruction with subset
stella escrow test-recovery \
  --escrow-id esc-abc123 \
  --share-files share1.enc,share2.enc,share3.enc
```

## Emergency Procedures

### Lost Share

If a custodian loses their share:

1. Verify at least M shares remain accessible
2. Re-escrow with new share set
3. Revoke the old escrow
4. Document incident

### Compromised Custodian

If a custodian is compromised:

1. Do NOT use their share for any recovery
2. Re-escrow immediately with new custodians
3. Revoke old escrow
4. Consider key rotation if M or more shares may have been exposed

### Multiple Lost Shares

If fewer than M shares are available:

1. Key cannot be recovered via escrow
2. Use backup key if available
3. Generate new key and re-establish trust
4. Document as key loss incident

## Related Documentation

- [Dual-Control Ceremony Runbook](./dual-control-ceremony-runbook.md)
- [Key Rotation Runbook](./key-rotation-runbook.md)
- [HSM Setup Runbook](./hsm-setup-runbook.md)
- [Cryptography Architecture](../modules/cryptography/architecture.md)
362
docs/operations/rekor-sync-guide.md
Normal file
@@ -0,0 +1,362 @@
# Rekor Checkpoint Sync Configuration and Operations

This guide covers the configuration and operational procedures for the Rekor periodic checkpoint synchronization service.

## Overview

The Rekor sync service maintains a local mirror of Rekor transparency log checkpoints and tiles. This enables:

- **Offline verification**: Verify attestations without network access to Sigstore
- **Air-gapped operation**: Run in environments without internet connectivity
- **Performance**: Reduce latency by using local checkpoint data
- **Auditability**: Maintain local evidence of log state at verification time

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                   RekorSyncBackgroundService                    │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     │
│  │  Checkpoint  │     │  Signature   │     │    Tile      │     │
│  │   Fetcher    │────▶│   Verifier   │────▶│   Syncer     │     │
│  └──────────────┘     └──────────────┘     └──────────────┘     │
└─────────────────────────────────────────────────────────────────┘
           │                    │                    │
           ▼                    ▼                    ▼
   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
   │  HTTP Tile   │     │  Checkpoint  │     │    Tile      │
   │   Client     │     │    Store     │     │    Cache     │
   └──────────────┘     │ (PostgreSQL) │     │(File System) │
           │            └──────────────┘     └──────────────┘
           ▼
   ┌──────────────┐
   │    Rekor     │
   │   Server     │
   └──────────────┘
```

## Configuration

### Basic Configuration

```yaml
attestor:
  rekorSync:
    # Enable or disable sync service
    enabled: true

    # How often to fetch new checkpoints
    syncInterval: 5m

    # Delay before first sync after startup
    initialDelay: 30s

    # Enable tile synchronization for full offline support
    enableTileSync: true

    # Maximum tiles to fetch per sync cycle
    maxTilesPerSync: 100

    # Backend configurations
    backends:
      - id: sigstore-prod
        origin: rekor.sigstore.dev
        baseUrl: https://rekor.sigstore.dev
        publicKeyPath: /etc/stella/keys/rekor-sigstore-prod.pub

      - id: sigstore-staging
        origin: rekor.sigstage.dev
        baseUrl: https://rekor.sigstage.dev
        publicKeyPath: /etc/stella/keys/rekor-sigstore-staging.pub
```

### Checkpoint Store Configuration (PostgreSQL)

```yaml
attestor:
  checkpointStore:
    connectionString: "Host=localhost;Database=stella;Username=stella;Password=secret"
    schema: attestor
    autoInitializeSchema: true
```

### Tile Cache Configuration (File System)

```yaml
attestor:
  tileCache:
    # Base directory for tile storage
    basePath: /var/lib/stella/attestor/tiles

    # Maximum cache size (0 = unlimited)
    maxCacheSizeBytes: 10737418240  # 10 GiB

    # Auto-prune tiles older than this
    autoPruneAfter: 720h  # 30 days
```
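
The literal values in the tile cache settings are easier to adjust when derived rather than typed by hand. This short sketch shows the arithmetic behind the two values above; the variable names are illustrative only and are not configuration keys.

```python
GIB = 1024 ** 3  # bytes in one gibibyte

max_cache_size_bytes = 10 * GIB  # value used for maxCacheSizeBytes
auto_prune_hours = 30 * 24       # 30 days expressed in hours for autoPruneAfter

print(max_cache_size_bytes)
print(f"{auto_prune_hours}h")
```

The byte count is a binary (GiB) quantity, so a 10 GB disk quota measured in decimal units would be slightly too small for it.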

## Operational Procedures

### Initial Setup

1. **Initialize the checkpoint store schema**:
   ```bash
   stella attestor checkpoint-store init --connection "Host=localhost;..."
   ```

2. **Configure backend(s)**:
   ```bash
   stella attestor backend add sigstore-prod \
     --origin rekor.sigstore.dev \
     --url https://rekor.sigstore.dev \
     --public-key /path/to/rekor.pub
   ```

3. **Perform initial sync**:
   ```bash
   stella attestor sync --backend sigstore-prod --full
   ```

### Manual Sync Operations

**Force immediate sync**:
```bash
stella attestor sync --backend sigstore-prod
```

**Sync all backends**:
```bash
stella attestor sync --all
```

**Full tile sync** (for offline kit preparation):
```bash
stella attestor sync --backend sigstore-prod --full-tiles
```

### Monitoring

**Check sync status**:
```bash
stella attestor sync-status
```

Output:
```
Backend           Origin               Tree Size    Last Sync            Age
sigstore-prod     rekor.sigstore.dev   45,678,901   2026-01-15 12:34:56  2m 15s
sigstore-staging  rekor.sigstage.dev   1,234,567    2026-01-15 12:30:00  6m 30s
```

**Check checkpoint history**:
```bash
stella attestor checkpoints list --backend sigstore-prod --last 10
```

**Check tile cache status**:
```bash
stella attestor tiles stats --backend sigstore-prod
```

Output:
```
Origin: rekor.sigstore.dev
Total Tiles: 45,678
Cache Size: 1.4 GB
Coverage: 100% (tree size 45,678,901)
Oldest Tile: 2026-01-01 00:00:00
Newest Tile: 2026-01-15 12:34:56
```

### Metrics

The sync service exposes the following Prometheus metrics:

```
# Counter: checkpoints fetched from remote
attestor_rekor_sync_checkpoints_fetched_total{backend="sigstore-prod"} 1234

# Counter: checkpoints stored locally
attestor_rekor_sync_checkpoints_stored_total{backend="sigstore-prod"} 1234

# Counter: tiles fetched from remote
attestor_rekor_sync_tiles_fetched_total{backend="sigstore-prod"} 56789

# Counter: tiles cached locally
attestor_rekor_sync_tiles_cached_total{backend="sigstore-prod"} 56789

# Histogram: checkpoint age at sync time (seconds)
attestor_rekor_sync_checkpoint_age_seconds{backend="sigstore-prod"}

# Gauge: total tiles cached
attestor_rekor_sync_tiles_cached{backend="sigstore-prod"} 45678

# Gauge: time since last successful sync (seconds)
attestor_rekor_sync_last_success_seconds{backend="sigstore-prod"} 135

# Counter: sync errors
attestor_rekor_sync_errors_total{backend="sigstore-prod",error_type="network"} 5
```

### Alerting Recommendations

```yaml
groups:
  - name: attestor-rekor-sync
    rules:
      - alert: RekorSyncStale
        expr: attestor_rekor_sync_last_success_seconds > 900
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Rekor sync is stale
          description: "No successful sync in {{ $value }}s for {{ $labels.backend }}"

      - alert: RekorSyncFailing
        expr: rate(attestor_rekor_sync_errors_total[5m]) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Rekor sync experiencing errors
          description: "Sync errors detected for {{ $labels.backend }}"
```

### Maintenance Tasks

**Prune old checkpoints**:
```bash
# Keep only last 30 days of checkpoints
stella attestor checkpoints prune --older-than 720h --keep-latest
```

**Prune old tiles**:
```bash
# Remove tiles for entries no longer needed
stella attestor tiles prune --older-than 720h
```

**Verify checkpoint store integrity**:
```bash
stella attestor checkpoints verify --backend sigstore-prod
```

**Export checkpoints for air-gap**:
```bash
stella attestor export \
  --backend sigstore-prod \
  --output /mnt/airgap/attestor-bundle.tar.gz \
  --include-tiles
```
|
||||
|
||||
## Troubleshooting

### Sync Not Running

1. Check service logs:
   ```bash
   journalctl -u stella-attestor -f
   ```

2. Verify configuration:
   ```bash
   stella attestor config validate
   ```

3. Check database connectivity:
   ```bash
   stella attestor checkpoint-store test
   ```

### Signature Verification Failing

1. Verify public key is correct:
   ```bash
   stella attestor backend verify-key sigstore-prod
   ```

2. Check for key rotation:
   - Monitor Sigstore announcements
   - Update public key if rotated

3. Compare with direct fetch:
   ```bash
   curl -s https://rekor.sigstore.dev/api/v1/log | jq
   ```

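The direct-fetch comparison in step 3 could be scripted along these lines. This is a sketch under assumptions: it relies on the `treeSize` and `rootHash` fields of the Rekor v1 log info response, and the local tree size and root hash are values the operator is assumed to have obtained separately (this runbook does not specify how). All function names are hypothetical.

```python
import json
import urllib.request

# Same endpoint as the curl command in step 3.
REKOR_LOG_URL = "https://rekor.sigstore.dev/api/v1/log"


def fetch_log_info(url=REKOR_LOG_URL, timeout=10):
    """Fetch the log info document from Rekor (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_log_info(info):
    """Keep only the fields needed for a checkpoint comparison."""
    return {"tree_size": int(info["treeSize"]), "root_hash": info["rootHash"]}


def compare_with_local(info, local_tree_size, local_root_hash):
    """Compare the remote log state with a locally stored checkpoint.

    Root hashes are only comparable at the same tree size, so a size
    difference is reported separately rather than as a mismatch.
    """
    remote = summarize_log_info(info)
    if remote["tree_size"] != local_tree_size:
        return "different-size"
    if remote["root_hash"].lower() == local_root_hash.lower():
        return "match"
    return "mismatch"
```

A `different-size` result is expected whenever the local store simply lags the public log; only a `mismatch` at an identical tree size points to a real inconsistency.
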
### Tile Cache Issues

1. Check disk space:
   ```bash
   df -h /var/lib/stella/attestor/tiles
   ```

2. Verify permissions:
   ```bash
   ls -la /var/lib/stella/attestor/tiles
   ```

3. Clear and resync:
   ```bash
   stella attestor tiles clear --backend sigstore-prod
   stella attestor sync --backend sigstore-prod --full-tiles
   ```

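The disk-space check in step 1 can also be automated. The following is a minimal sketch using only the Python standard library; the 10% headroom default is an assumption chosen for illustration, not a documented StellaOps requirement, and the helper names are hypothetical.

```python
import shutil


def free_fraction(path):
    """Return the fraction of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # defensive: avoid dividing by zero on unusual filesystems
    return usage.free / usage.total


def has_headroom(path, min_free_fraction=0.10):
    """True when at least min_free_fraction of the filesystem is free."""
    return free_fraction(path) >= min_free_fraction
```

For the tile cache this would be called as `has_headroom("/var/lib/stella/attestor/tiles")`, using the path from the `df` command above.
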
### Database Issues

1. Check PostgreSQL connectivity:
   ```bash
   psql -h localhost -U stella -d stella -c "SELECT 1"
   ```

2. Verify schema exists:
   ```sql
   SELECT * FROM attestor.rekor_checkpoints LIMIT 1;
   ```

3. Reinitialize schema if needed:
   ```bash
   stella attestor checkpoint-store init --force
   ```

## Air-Gap Operations

### Preparing an Offline Bundle

1. Sync to latest checkpoint:
   ```bash
   stella attestor sync --backend sigstore-prod --full-tiles
   ```

2. Export bundle:
   ```bash
   stella attestor export \
     --backend sigstore-prod \
     --output offline-attestor-bundle.tar.gz \
     --include-tiles \
     --checkpoints-only-verified
   ```

3. Transfer bundle to air-gapped environment

### Importing in Air-Gapped Environment

1. Import the bundle:
   ```bash
   stella attestor import offline-attestor-bundle.tar.gz
   ```

2. Verify import:
   ```bash
   stella attestor sync-status
   ```

3. Checkpoints and tiles are now available for offline verification

## See Also

- [Rekor Verification Design](../modules/attestor/rekor-verification-design.md)
- [Checkpoint Divergence Detection](./checkpoint-divergence-runbook.md)
- [Offline Kit Preparation](./offline-kit-guide.md)
- [Sigstore Rekor Documentation](https://docs.sigstore.dev/rekor/overview/)
70
docs/operations/softhsm2-test-environment.md
Normal file
@@ -0,0 +1,70 @@
# SoftHSM2 Test Environment Setup

This guide describes how to configure SoftHSM2 for PKCS#11 integration tests and local validation.

## Install SoftHSM2

```bash
# Ubuntu/Debian
sudo apt-get install softhsm2 opensc

# Verify installation
softhsm2-util --version
pkcs11-tool --version
```

## Initialize Token

```bash
# Create token directory
mkdir -p /var/lib/softhsm/tokens
chmod 700 /var/lib/softhsm/tokens

# Initialize token
softhsm2-util --init-token \
  --slot 0 \
  --label "StellaOps-Dev" \
  --so-pin 12345678 \
  --pin 87654321

# Verify token
softhsm2-util --show-slots
```

## Create a Test Key

```bash
# Generate RSA keypair
pkcs11-tool --module /usr/lib/softhsm/libsofthsm2.so \
  --login --pin 87654321 \
  --keypairgen \
  --key-type rsa:2048 \
  --id 01 \
  --label "stellaops-hsm-test"

# List objects
pkcs11-tool --module /usr/lib/softhsm/libsofthsm2.so \
  --login --pin 87654321 \
  --list-objects
```

## Environment Variables for Tests

```bash
export STELLAOPS_SOFTHSM_LIB="/usr/lib/softhsm/libsofthsm2.so"
export STELLAOPS_SOFTHSM_SLOT="0"
export STELLAOPS_SOFTHSM_PIN="87654321"
export STELLAOPS_SOFTHSM_KEY_ID="stellaops-hsm-test"
export STELLAOPS_SOFTHSM_MECHANISM="RsaSha256"
```

## Run Integration Tests

```bash
dotnet test src/Cryptography/__Tests/StellaOps.Cryptography.Tests/StellaOps.Cryptography.Tests.csproj \
  --filter FullyQualifiedName~Pkcs11HsmClientIntegrationTests
```

## Notes
- The integration tests skip automatically if SoftHSM2 variables are not configured.
- Use a dedicated test token; never reuse production tokens.
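
Before running the suite, it can help to confirm that the variables listed under "Environment Variables for Tests" are actually set, since unset variables cause the tests to skip silently. A minimal pre-flight sketch follows; whether the tests require all five variables is an assumption, and `missing_softhsm_vars` is a hypothetical helper that is not part of the test project.

```python
import os

# Variable names taken from the "Environment Variables for Tests" section.
REQUIRED_VARS = (
    "STELLAOPS_SOFTHSM_LIB",
    "STELLAOPS_SOFTHSM_SLOT",
    "STELLAOPS_SOFTHSM_PIN",
    "STELLAOPS_SOFTHSM_KEY_ID",
    "STELLAOPS_SOFTHSM_MECHANISM",
)


def missing_softhsm_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty return value means every variable is present; otherwise the list names what still needs to be exported.
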
@@ -628,9 +628,150 @@ To allow approved exceptions to cover specific unknown reason codes, set excepti
- [Triage Technical Reference](../product/advisories/14-Dec-2025%20-%20Triage%20and%20Unknowns%20Technical%20Reference.md)
- [Score Proofs Runbook](./score-proofs-runbook.md)
- [Policy Engine](../modules/policy/architecture.md)
- [Determinization API](../modules/policy/determinization-api.md)
- [VEX Consensus Guide](../VEX_CONSENSUS_GUIDE.md)

---

**Last Updated**: 2025-12-22
**Version**: 1.0.0
**Sprint**: 3500.0004.0004

## 8. Grey Queue Operations

> **Sprint**: SPRINT_20260112_010_CLI_unknowns_grey_queue_cli

The Grey Queue handles observations with uncertain status requiring operator attention or additional evidence. These are distinct from standard HOT/WARM/COLD band unknowns.

### 8.1 Grey Queue Overview

Grey Queue items have:
- **Observation state**: `PendingDeterminization`, `Disputed`, or `GuardedPass`
- **Reanalysis fingerprint**: Deterministic ID for reproducible replays
- **Triggers**: Events that caused reanalysis
- **Conflicts**: Detected evidence disagreements
- **Next actions**: Suggested resolution paths

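To illustrate what a deterministic ID means in practice, the sketch below hashes a canonical JSON encoding of the reanalysis inputs so that the same inputs always produce the same `sha256:` value. This is an assumption-laden illustration only: the actual StellaOps fingerprint algorithm and its inputs are not described in this runbook, and both the function and the example field names are hypothetical.

```python
import hashlib
import json


def reanalysis_fingerprint(inputs):
    """Derive a stable ID from a JSON-serializable mapping of inputs.

    Keys are sorted and whitespace is removed so that logically equal
    inputs serialize to identical bytes before hashing.
    """
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "sha256:" + digest


# Hypothetical inputs; key order does not affect the result.
a = reanalysis_fingerprint({"policy": "p1", "triggers": ["epss.updated@1"]})
b = reanalysis_fingerprint({"triggers": ["epss.updated@1"], "policy": "p1"})
```

Here `a` and `b` are equal, which is the property that makes replays reproducible: any change to the inputs changes the fingerprint, while reordering alone does not.
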
### 8.2 List Grey Queue Items

```bash
# List all grey queue items
stella unknowns list --state grey

# List by observation state
stella unknowns list --observation-state pending-determinization
stella unknowns list --observation-state disputed
stella unknowns list --observation-state guarded-pass

# List with fingerprint details
stella unknowns list --state grey --show-fingerprint

# List with conflict summary
stella unknowns list --state grey --show-conflicts
```

### 8.3 View Grey Queue Details

```bash
# Show grey queue item with full details
stella unknowns show unk-12345678-... --grey

# Output:
# ID: unk-12345678-...
# Observation State: Disputed
#
# Reanalysis Fingerprint:
#   ID: sha256:abc123...
#   Computed At: 2026-01-15T10:00:00Z
#   Policy Config Hash: sha256:def456...
#
# Triggers (2):
#   - epss.updated@1 (2026-01-15T09:55:00Z) delta=0.15
#   - vex.updated@1 (2026-01-15T09:50:00Z)
#
# Conflicts (1):
#   - VexStatusConflict: vendor-a reports 'not_affected', vendor-b reports 'affected'
#     Severity: high
#     Adjudication: manual_review
#
# Next Actions:
#   - trust_resolution: Resolve issuer trust conflict
#   - manual_review: Escalate to security team

# Show fingerprint only
stella unknowns fingerprint unk-12345678-...

# Show triggers only
stella unknowns triggers unk-12345678-...
```

### 8.4 Grey Queue Triage Actions

```bash
# Resolve a grey queue item (operator determination)
stella unknowns resolve unk-12345678-... \
  --status not_affected \
  --justification "Verified vendor VEX is authoritative" \
  --evidence-ref "vex-observation-id-123"

# Escalate for manual review
stella unknowns escalate unk-12345678-... \
  --priority P1 \
  --reason "Conflicting VEX requires security team decision"

# Defer pending additional evidence
stella unknowns defer unk-12345678-... \
  --await vex \
  --reason "Waiting for upstream vendor VEX statement"
```

### 8.5 Grey Queue Conflict Resolution

```bash
# List items with conflicts
stella unknowns list --has-conflicts

# Filter by conflict type
stella unknowns list --conflict-type vex-status-conflict
stella unknowns list --conflict-type vex-reachability-contradiction
stella unknowns list --conflict-type trust-tie

# Resolve a conflict manually
stella unknowns resolve-conflict unk-12345678-... \
  --winner vendor-a \
  --reason "vendor-a is the upstream maintainer"
```

### 8.6 Grey Queue Summary

```bash
# Get grey queue summary
stella unknowns summary --grey

# Output:
# Grey Queue: 23 items
#
# By State:
#   PendingDeterminization: 15 (65%)
#   Disputed: 5 (22%)
#   GuardedPass: 3 (13%)
#
# Conflicts: 8 items have conflicts
# Avg. Triggers: 2.3 per item
# Oldest: 7 days
```

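The percentages in the summary output follow from the per-state counts. A short sketch of the arithmetic is shown below, assuming shares are rounded to the nearest whole percent; how the CLI actually rounds is not documented here, and `state_shares` is a hypothetical helper.

```python
def state_shares(counts):
    """Convert per-state item counts into whole-number percentages."""
    total = sum(counts.values())
    if total == 0:
        return {state: 0 for state in counts}
    return {state: round(100 * n / total) for state, n in counts.items()}


# Counts taken from the example summary output (23 items in total).
shares = state_shares(
    {"PendingDeterminization": 15, "Disputed": 5, "GuardedPass": 3}
)
```

With these counts, `shares` holds 65, 22, and 13, matching the figures in the example output. Rounded shares do not always sum to exactly 100.
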
### 8.7 Grey Queue Export

```bash
# Export grey queue for analysis
stella unknowns export --state grey --format json --output grey-queue.json

# Export with full fingerprints and triggers
stella unknowns export --state grey --verbose --output grey-full.json

# Export conflicts only
stella unknowns export --has-conflicts --format csv --output conflicts.csv
```

---

**Last Updated**: 2026-01-16
**Version**: 1.1.0
**Sprint**: SPRINT_20260112_010_CLI_unknowns_grey_queue_cli