sprints completion. new product advisories prepared
331
docs/operations/break-glass-runbook.md
Normal file
@@ -0,0 +1,331 @@
# Break-Glass Account Runbook

This runbook documents emergency access procedures using the break-glass account system when standard authentication is unavailable.

> **Sprint:** SPRINT_20260112_018_AUTH_local_rbac_fallback

## Overview

Break-glass accounts provide emergency administrative access when:

- PostgreSQL database is unavailable
- OIDC/OAuth2 identity provider is unreachable
- Authority service is degraded
- Network isolation prevents standard authentication

Break-glass access is fully audited and time-limited by design.

## When to Use Break-Glass Access

| Scenario | Standard Auth | Break-Glass |
|----------|---------------|-------------|
| Database maintenance | N/A | Use |
| IdP outage | Unavailable | Use |
| Network partition | Unavailable | Use |
| Routine operations | Available | Do NOT use |
| Security incident response | May be unavailable | Use with incident code |

**CRITICAL:** Break-glass access should only be used when standard authentication is genuinely unavailable. All usage is logged and auditable.

## Prerequisites

### Configuration Requirements

Break-glass must be explicitly enabled in local policy:

```yaml
# /etc/stellaops/authority/local-policy.yaml
breakGlass:
  enabled: true
  sessionTimeoutMinutes: 15
  maxExtensions: 2
  allowedReasonCodes:
    - database_maintenance
    - idp_outage
    - network_partition
    - security_incident
    - disaster_recovery
  accounts:
    - id: "break-glass-admin"
      passwordHash: "$argon2id$v=19$m=65536,t=3,p=4$..."
      roles: ["admin"]
```

### Password Hash Generation

Generate password hashes using Argon2id:

```bash
# Using argon2 CLI tool (-m 16 means 2^16 KiB = 65536 KiB, matching m=65536 in the policy file)
echo -n "your-secure-password" | argon2 "$(openssl rand -base64 16)" -id -t 3 -m 16 -p 4 -l 32 -e

# Or using stella CLI
stella auth hash-password --algorithm argon2id
```

## Break-Glass Login Procedure

### Step 1: Verify Standard Auth is Unavailable

Before using break-glass, confirm standard authentication is genuinely unavailable:

```bash
# Check Authority health
curl -s https://authority.example.com/health | jq .

# Check OIDC endpoint
curl -s https://idp.example.com/.well-known/openid-configuration

# Check database connectivity
stella doctor check --component postgres
```
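
The three checks above can be folded into a single go/no-go decision. The following is a minimal sketch and not part of the `stella` tooling: `probe` and `break_glass_justified` are hypothetical helper names, and the commented wiring assumes that `curl -sf` and `stella doctor check` exit non-zero when the dependency is unhealthy.

```bash
#!/usr/bin/env bash
# Sketch: decide whether break-glass is justified from the three checks above.
# probe and break_glass_justified are hypothetical helpers, not stella commands.

# Run a command quietly; print "ok" if it succeeds, "fail" otherwise.
probe() {
  if "$@" >/dev/null 2>&1; then echo ok; else echo fail; fi
}

# Succeed (exit 0) when at least one dependency reported "fail".
break_glass_justified() {
  local status
  for status in "$@"; do
    [ "$status" = "fail" ] && return 0
  done
  return 1
}

# Example wiring with the checks from this runbook (not executed here):
# authority=$(probe curl -sf https://authority.example.com/health)
# idp=$(probe curl -sf https://idp.example.com/.well-known/openid-configuration)
# db=$(probe stella doctor check --component postgres)
# if break_glass_justified "$authority" "$idp" "$db"; then
#   echo "Standard auth degraded: break-glass may be justified"
# else
#   echo "Standard auth is healthy: do NOT use break-glass"
# fi
```

Keeping the decision separate from the probes makes it easy to record each individual result in the incident report before logging in.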

### Step 2: Access Break-Glass Login

Navigate to the break-glass endpoint:

```
https://authority.example.com/break-glass/login
```

Or use the CLI:

```bash
stella auth break-glass login \
  --account break-glass-admin \
  --reason database_maintenance
```
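
Because the reason must be one of the `allowedReasonCodes` from the local policy, a wrapper can reject an unapproved code before the login attempt is made. This is a sketch under assumptions: `ALLOWED_REASON_CODES` is a hand-maintained copy of the configured list and `is_allowed_reason` is a hypothetical helper, not a `stella` subcommand.

```bash
#!/usr/bin/env bash
# Sketch: validate a reason code locally before calling the login command.
# ALLOWED_REASON_CODES mirrors allowedReasonCodes in local-policy.yaml (assumed copy).
ALLOWED_REASON_CODES="database_maintenance idp_outage network_partition security_incident disaster_recovery"

# Succeed (exit 0) only if the candidate matches one approved code exactly.
is_allowed_reason() {
  local candidate="$1" code
  for code in $ALLOWED_REASON_CODES; do
    [ "$code" = "$candidate" ] && return 0
  done
  return 1
}

# Example wiring (not executed here):
# reason="idp_outage"
# if is_allowed_reason "$reason"; then
#   stella auth break-glass login --account break-glass-admin --reason "$reason"
# else
#   echo "reason code '$reason' is not in allowedReasonCodes" >&2
# fi
```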

### Step 3: Provide Credentials and Reason

| Field | Description | Required |
|-------|-------------|----------|
| Account ID | Break-glass account identifier | Yes |
| Password | Account password | Yes |
| Reason Code | Pre-approved reason code | Yes |
| Reason Details | Free-text explanation | Recommended |

**Approved Reason Codes:**

| Code | Description |
|------|-------------|
| `database_maintenance` | Scheduled or emergency database work |
| `idp_outage` | Identity provider unavailable |
| `network_partition` | Network connectivity issues |
| `security_incident` | Active security incident response |
| `disaster_recovery` | DR/BCP activation |

### Step 4: Session Created

On successful authentication:

- Session token issued with limited TTL (default: 15 minutes)
- Audit event logged: `breakglass.session.created`
- All subsequent actions are tagged with break-glass context

## Session Management

### Session Timeout

Break-glass sessions have strict time limits:

| Setting | Default | Description |
|---------|---------|-------------|
| `sessionTimeoutMinutes` | 15 | Session lifetime in minutes |
| `maxExtensions` | 2 | Maximum session extensions |
| Extension period | 15 min | Time added per extension |

### Extending a Session

If additional time is needed:

```bash
# CLI
stella auth break-glass extend \
  --session-id <session-id> \
  --reason "database migration still running"

# UI
# Click "Extend Session" button in break-glass banner
```

Extension requires:
1. Re-entering password
2. Providing extension reason
3. Not exceeding `maxExtensions` limit

### Session Termination

Sessions end when:
- User explicitly logs out
- Session timeout expires
- Max extensions reached
- Administrator force-terminates

```bash
# Explicit logout
stella auth break-glass logout --session-id <session-id>

# Force terminate (admin)
stella auth break-glass terminate --session-id <session-id> --reason "normal auth restored"
```
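
Taken together, the timeout and extension settings in this section bound how long a single break-glass session can last. The arithmetic can be sketched as below, assuming each extension adds one full extension period to the session; `max_session_minutes` is a hypothetical helper used only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: upper bound on one break-glass session, in minutes.
# Assumes every extension adds one full extension period (an assumption, not confirmed behavior).
max_session_minutes() {
  local timeout="$1" max_extensions="$2" extension_period="$3"
  echo $(( timeout + max_extensions * extension_period ))
}

# Defaults from the Session Timeout table: 15 + 2 * 15
max_session_minutes 15 2 15   # prints 45
```

With the defaults, work expected to take longer than 45 minutes needs a fresh login with a new reason once the extensions are used up.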

## Audit Trail

### Audit Events

All break-glass activity is logged:

| Event | Description |
|-------|-------------|
| `breakglass.session.created` | Session started |
| `breakglass.session.extended` | Session extended |
| `breakglass.session.terminated` | User logout |
| `breakglass.session.expired` | Timeout reached |
| `breakglass.auth.failed` | Authentication failed |
| `breakglass.reason.invalid` | Invalid reason code |
| `breakglass.extensions.exceeded` | Max extensions reached |

### Audit Event Structure

```json
{
  "eventType": "breakglass.session.created",
  "timestamp": "2026-01-16T10:30:00Z",
  "accountId": "break-glass-admin",
  "sessionId": "bg-sess-abc123",
  "reasonCode": "database_maintenance",
  "reasonDetails": "PostgreSQL major version upgrade",
  "sourceIp": "10.0.1.50",
  "userAgent": "stella-cli/2027.Q1"
}
```

### Querying Audit Logs

```bash
# List all break-glass events
stella audit query --event-type "breakglass.*" --since "24h"

# Export for compliance
stella audit export \
  --event-type "breakglass.*" \
  --start 2026-01-01 \
  --end 2026-01-31 \
  --format json \
  --output break-glass-audit-jan2026.json
```

## Fallback Policy Store

### Automatic Failover

When PostgreSQL becomes unavailable:

1. Authority detects health check failures
2. The failure count reaches `failureThreshold` (default: 3) consecutive failures
3. Authority switches to local policy store
4. Mode changes to `Fallback`
5. Event logged: `authority.mode.changed`

### Policy Store Modes

| Mode | Description | Available Features |
|------|-------------|-------------------|
| `Primary` | PostgreSQL available | Full RBAC, user management |
| `Fallback` | Using local policy | Break-glass only |
| `Degraded` | Both PostgreSQL and local policy store degraded | Emergency access only |

### Recovery

When PostgreSQL recovers:

1. Health checks pass
2. The `minFallbackDurationMs` cooldown (default: 30s) elapses
3. Authority switches back to Primary
4. Fallback sessions can continue until expiry

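The failover and recovery sequences above amount to a small state machine. The sketch below models it in shell purely for illustration: the variable and function names are hypothetical, it assumes the cooldown is measured from the moment Authority entered `Fallback`, and it leaves out the `Degraded` mode. It does not reflect how Authority is actually implemented.

```bash
#!/usr/bin/env bash
# Sketch: Primary <-> Fallback transitions driven by health check results.
# All names are hypothetical; defaults are taken from this runbook.
MODE="Primary"
FAILURES=0
FALLBACK_SINCE_MS=0
FAILURE_THRESHOLD=3
MIN_FALLBACK_DURATION_MS=30000

# record_health_check <ok|fail> <now_ms>
record_health_check() {
  local result="$1" now_ms="$2"
  if [ "$result" = "fail" ]; then
    FAILURES=$(( FAILURES + 1 ))
    # Switch to Fallback after the configured number of consecutive failures.
    if [ "$MODE" = "Primary" ] && [ "$FAILURES" -ge "$FAILURE_THRESHOLD" ]; then
      MODE="Fallback"
      FALLBACK_SINCE_MS="$now_ms"
    fi
  else
    # A passing check resets the consecutive failure count.
    FAILURES=0
    # Return to Primary only once the cooldown has elapsed.
    if [ "$MODE" = "Fallback" ] && [ $(( now_ms - FALLBACK_SINCE_MS )) -ge "$MIN_FALLBACK_DURATION_MS" ]; then
      MODE="Primary"
    fi
  fi
}
```

The cooldown keeps the mode from flapping when PostgreSQL recovers briefly and then fails again.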
## Security Considerations

### Password Policy

Break-glass account passwords should:
- Be at least 20 characters
- Include upper, lower, numbers, symbols
- Be stored securely (HSM, Vault, split custody)
- Be rotated on a schedule (quarterly recommended)

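The length and character-class rules above can be checked mechanically before a password is hashed. The function below is a minimal sketch: `password_meets_policy` is a hypothetical helper, it treats "symbols" as POSIX punctuation characters, and it does not cover the storage or rotation requirements.

```bash
#!/usr/bin/env bash
# Sketch: check a candidate password against the length and character-class rules.
# password_meets_policy is a hypothetical helper, not part of the stella CLI.
password_meets_policy() {
  local pw="$1"
  # At least 20 characters
  [ "${#pw}" -ge 20 ] || return 1
  # At least one upper-case letter, lower-case letter, digit, and symbol
  case "$pw" in *[[:upper:]]*) ;; *) return 1 ;; esac
  case "$pw" in *[[:lower:]]*) ;; *) return 1 ;; esac
  case "$pw" in *[[:digit:]]*) ;; *) return 1 ;; esac
  case "$pw" in *[[:punct:]]*) ;; *) return 1 ;; esac
  return 0
}

# Example wiring (not executed here):
# if password_meets_policy "$candidate"; then echo "policy ok"; else echo "policy violation" >&2; fi
```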
### Access Control

- Limit break-glass accounts to essential personnel
- Use separate accounts per operator when possible
- Review access list quarterly
- Disable unused accounts immediately

### Monitoring

Set up alerts for break-glass activity:

```yaml
# Alert rule example
- alert: BreakGlassSessionCreated
  # increase() fires only on newly created sessions; a bare "counter > 0" would stay true
  # after the first session. The 5m window is an illustrative choice.
  expr: increase(stellaops_breakglass_sessions_created_total[5m]) > 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Break-glass session created
    description: A break-glass session was created. Verify this is expected.
```

## Troubleshooting

### Login Failures

| Error | Cause | Resolution |
|-------|-------|------------|
| `invalid_credentials` | Wrong password | Verify password |
| `invalid_reason_code` | Reason not in allowed list | Use approved reason code |
| `account_disabled` | Account explicitly disabled | Contact administrator |
| `break_glass_disabled` | Feature disabled in config | Enable in `local-policy.yaml` |

### Session Issues

| Issue | Cause | Resolution |
|-------|-------|------------|
| Session expired immediately | Clock skew | Sync server time |
| Cannot extend | Max extensions reached | Log out and re-authenticate |
| Actions failing | Insufficient roles | Verify account has required roles |

### Policy Store Issues

```bash
# Check policy store status
stella doctor check --component authority

# Verify local policy file
stella auth policy validate --file /etc/stellaops/authority/local-policy.yaml

# Force reload policy
stella auth policy reload
```

## Compliance Notes

Break-glass usage must be:
- Documented in incident reports
- Reviewed during security audits
- Reported in compliance dashboards
- Justified for each session

Retain audit logs for:
- SOC 2: 1 year minimum
- HIPAA: 6 years
- PCI-DSS: 1 year
- Internal policy: As defined

## Related Documentation

- [Local RBAC Policy Schema](../modules/authority/local-policy-schema.md)
- [Authority Architecture](../modules/authority/architecture.md)
- [Offline Operations](../operations/airgap-operations-runbook.md)
- [Audit System](../modules/audit/architecture.md)