Break-Glass Account Runbook

This runbook documents emergency access procedures using the break-glass account system when standard authentication is unavailable.

Sprint: SPRINT_20260112_018_AUTH_local_rbac_fallback

Overview

Break-glass accounts provide emergency administrative access when:

  • PostgreSQL database is unavailable
  • OIDC/OAuth2 identity provider is unreachable
  • Authority service is degraded
  • Network isolation prevents standard authentication

Break-glass access is fully audited and time-limited by design.

When to Use Break-Glass Access

| Scenario | Standard Auth | Break-Glass |
|----------|---------------|-------------|
| Database maintenance | N/A | Use |
| IdP outage | Unavailable | Use |
| Network partition | Unavailable | Use |
| Routine operations | Available | Do NOT use |
| Security incident response | May be unavailable | Use with incident code |

CRITICAL: Break-glass access should only be used when standard authentication is genuinely unavailable. All usage is logged and auditable.

Prerequisites

Configuration Requirements

Break-glass must be explicitly enabled in local policy:

# /etc/stellaops/authority/local-policy.yaml
breakGlass:
  enabled: true
  sessionTimeoutMinutes: 15
  maxExtensions: 2
  allowedReasonCodes:
    - database_maintenance
    - idp_outage
    - network_partition
    - security_incident
    - disaster_recovery
  accounts:
    - id: "break-glass-admin"
      passwordHash: "$argon2id$v=19$m=65536,t=3,p=4$..."
      roles: ["admin"]

Password Hash Generation

Generate password hashes using Argon2id:

# Using the argon2 CLI tool (-m 16 means 2^16 KiB = 65536 KiB, matching m=65536 above)
echo -n "your-secure-password" | argon2 "$(openssl rand -base64 16)" -id -t 3 -m 16 -p 4 -l 32 -e

# Or using stella CLI
stella auth hash-password --algorithm argon2id
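
Before a generated hash is pasted into local-policy.yaml, it can help to confirm that its parameters match the ones shown in the configuration example. The following is a minimal sketch, not part of the stella tooling: the function names `parse_argon2_phc` and `matches_runbook_parameters` are hypothetical, and the sketch only reads the parameter section of the encoded string; it does not verify a password.

```python
import re

# Layout of an encoded Argon2 hash (PHC string format):
#   $argon2id$v=19$m=<memory KiB>,t=<iterations>,p=<parallelism>$<salt>$<digest>
_PHC = re.compile(
    r"^\$(?P<variant>argon2(?:id|i|d))\$v=(?P<version>\d+)"
    r"\$m=(?P<m>\d+),t=(?P<t>\d+),p=(?P<p>\d+)"
    r"\$(?P<salt>[A-Za-z0-9+/]+)\$(?P<digest>[A-Za-z0-9+/]+)$"
)


def parse_argon2_phc(encoded: str) -> dict:
    """Extract the cost parameters from an encoded Argon2 hash (hypothetical helper)."""
    match = _PHC.match(encoded)
    if match is None:
        raise ValueError("not a complete Argon2 PHC string")
    fields = match.groupdict()
    return {
        "variant": fields["variant"],
        "version": int(fields["version"]),
        "memory_kib": int(fields["m"]),
        "iterations": int(fields["t"]),
        "parallelism": int(fields["p"]),
    }


def matches_runbook_parameters(encoded: str) -> bool:
    """Check the hash against the parameters used in this runbook's example."""
    params = parse_argon2_phc(encoded)
    return (
        params["variant"] == "argon2id"
        and params["memory_kib"] == 2 ** 16  # the CLI flag `-m 16` is log2 of KiB
        and params["iterations"] == 3        # `-t 3`
        and params["parallelism"] == 4       # `-p 4`
    )
```

A truncated value such as the `...` placeholder in the configuration example is rejected with `ValueError`, which is the intended behavior for a pre-deployment check.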

Break-Glass Login Procedure

Step 1: Verify Standard Auth is Unavailable

Before using break-glass, confirm standard authentication is genuinely unavailable:

# Check Authority health
curl -s https://authority.example.com/health | jq .

# Check OIDC endpoint
curl -s https://idp.example.com/.well-known/openid-configuration

# Check database connectivity
stella doctor check --component postgres
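
The same pre-check can be scripted so that the decision is recorded consistently. This is a rough sketch under stated assumptions: the helper names are hypothetical, the URLs are the example hosts used above, and "unavailable" is taken to mean "no HTTP 2xx response within the timeout", which may be stricter or looser than your own definition.

```python
import urllib.request


def endpoint_is_healthy(url: str, timeout_seconds: float = 5.0) -> bool:
    """Return True only if the URL answers with an HTTP 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def break_glass_justified(check_results: dict[str, bool]) -> bool:
    """Break-glass is justified only when at least one dependency check failed."""
    return not all(check_results.values())


if __name__ == "__main__":
    results = {
        "authority": endpoint_is_healthy("https://authority.example.com/health"),
        "idp": endpoint_is_healthy(
            "https://idp.example.com/.well-known/openid-configuration"
        ),
    }
    print(results, "break-glass justified:", break_glass_justified(results))
```

An empty result set counts as "not justified", so a script that fails to run any check cannot be used as evidence for emergency access.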

Step 2: Access Break-Glass Login

Navigate to the break-glass endpoint:

https://authority.example.com/break-glass/login

Or use the CLI:

stella auth break-glass login \
  --account break-glass-admin \
  --reason database_maintenance

Step 3: Provide Credentials and Reason

| Field | Description | Required |
|-------|-------------|----------|
| Account ID | Break-glass account identifier | Yes |
| Password | Account password | Yes |
| Reason Code | Pre-approved reason code | Yes |
| Reason Details | Free-text explanation | Recommended |

Approved Reason Codes:

| Code | Description |
|------|-------------|
| database_maintenance | Scheduled or emergency database work |
| idp_outage | Identity provider unavailable |
| network_partition | Network connectivity issues |
| security_incident | Active security incident response |
| disaster_recovery | DR/BCP activation |
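
The field and reason-code rules above can be expressed as a small pre-submission check. This is an illustrative sketch only: `LoginRequest` and `validate_login_request` are hypothetical names, and the real Authority service may validate in a different order or report different messages (only `invalid_reason_code` is taken from the error list in this runbook).

```python
from dataclasses import dataclass

# Reason codes from the allowedReasonCodes list in local-policy.yaml.
ALLOWED_REASON_CODES = frozenset({
    "database_maintenance",
    "idp_outage",
    "network_partition",
    "security_incident",
    "disaster_recovery",
})


@dataclass(frozen=True)
class LoginRequest:
    account_id: str
    password: str
    reason_code: str
    reason_details: str = ""  # recommended, but not required


def validate_login_request(request: LoginRequest) -> list[str]:
    """Return a list of problems; an empty list means the request is well-formed."""
    problems = []
    if not request.account_id:
        problems.append("account_id is required")
    if not request.password:
        problems.append("password is required")
    if request.reason_code not in ALLOWED_REASON_CODES:
        problems.append("invalid_reason_code")
    return problems
```

For example, `validate_login_request(LoginRequest("break-glass-admin", "secret", "routine_ops"))` reports `invalid_reason_code` because routine operations are not an approved reason.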

Step 4: Session Created

On successful authentication:

  • Session token issued with limited TTL (default: 15 minutes)
  • Audit event logged: breakglass.session.created
  • All subsequent actions are tagged with break-glass context

Session Management

Session Timeout

Break-glass sessions have strict time limits:

| Setting | Default | Description |
|---------|---------|-------------|
| sessionTimeoutMinutes | 15 | Session lifetime |
| maxExtensions | 2 | Maximum session extensions |
| Extension period | 15 min | Time added per extension |

Extending a Session

If additional time is needed:

# CLI
stella auth break-glass extend \
  --session-id <session-id> \
  --reason "database migration still running"

# UI
# Click "Extend Session" button in break-glass banner

Extension requires:

  1. Re-entering password
  2. Providing extension reason
  3. Not exceeding maxExtensions limit
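
To make the timing concrete, the sketch below models a session with the default settings. It is a simplified model, not the Authority implementation: the class name is hypothetical, and it assumes each extension adds 15 minutes to the current expiry rather than restarting the clock from the moment of extension. Under that assumption the longest possible session is 15 + 2 × 15 = 45 minutes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BreakGlassSession:
    started_at: datetime
    timeout_minutes: int = 15      # sessionTimeoutMinutes
    max_extensions: int = 2        # maxExtensions
    extension_minutes: int = 15    # time added per extension
    extensions_used: int = 0

    @property
    def expires_at(self) -> datetime:
        minutes = self.timeout_minutes + self.extensions_used * self.extension_minutes
        return self.started_at + timedelta(minutes=minutes)

    def extend(self, now: datetime, password_verified: bool, reason: str) -> None:
        """Apply one extension if all three requirements are met."""
        if now >= self.expires_at:
            raise RuntimeError("session already expired")
        if not password_verified:
            raise PermissionError("password must be re-entered")
        if not reason.strip():
            raise ValueError("an extension reason is required")
        if self.extensions_used >= self.max_extensions:
            raise RuntimeError("maxExtensions limit reached")
        self.extensions_used += 1


if __name__ == "__main__":
    start = datetime(2026, 1, 16, 10, 30, tzinfo=timezone.utc)
    session = BreakGlassSession(started_at=start)
    session.extend(start + timedelta(minutes=10), True, "database migration still running")
    print(session.expires_at.isoformat())
```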

Session Termination

Sessions end when:

  • User explicitly logs out
  • Session timeout expires
  • Session timeout expires with no extensions remaining
  • Administrator force-terminates

# Explicit logout
stella auth break-glass logout --session-id <session-id>

# Force terminate (admin)
stella auth break-glass terminate --session-id <session-id> --reason "normal auth restored"

Audit Trail

Audit Events

All break-glass activity is logged:

| Event | Description |
|-------|-------------|
| breakglass.session.created | Session started |
| breakglass.session.extended | Session extended |
| breakglass.session.terminated | User logout |
| breakglass.session.expired | Timeout reached |
| breakglass.auth.failed | Authentication failed |
| breakglass.reason.invalid | Invalid reason code |
| breakglass.extensions.exceeded | Max extensions reached |

Audit Event Structure

{
  "eventType": "breakglass.session.created",
  "timestamp": "2026-01-16T10:30:00Z",
  "accountId": "break-glass-admin",
  "sessionId": "bg-sess-abc123",
  "reasonCode": "database_maintenance",
  "reasonDetails": "PostgreSQL major version upgrade",
  "sourceIp": "10.0.1.50",
  "userAgent": "stella-cli/2027.Q1"
}
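
When audit events are consumed by another tool, a small parser can reject malformed records early. The sketch below is an assumption-laden example: the function name is hypothetical, and treating `reasonDetails` and `userAgent` as optional is a guess based on the login form above, not a documented schema.

```python
import json
from datetime import datetime

# Fields assumed to be mandatory for this sketch; the real schema may differ.
REQUIRED_FIELDS = (
    "eventType", "timestamp", "accountId", "sessionId", "reasonCode", "sourceIp",
)


def parse_audit_event(raw: str) -> dict:
    """Parse one break-glass audit event and convert its timestamp to datetime."""
    event = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in event]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if not event["eventType"].startswith("breakglass."):
        raise ValueError("not a break-glass event")
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so it is rewritten as an explicit UTC offset for older interpreters.
    event["timestamp"] = datetime.fromisoformat(
        event["timestamp"].replace("Z", "+00:00")
    )
    return event
```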

Querying Audit Logs

# List all break-glass events
stella audit query --event-type "breakglass.*" --since "24h"

# Export for compliance
stella audit export \
  --event-type "breakglass.*" \
  --start 2026-01-01 \
  --end 2026-01-31 \
  --format json \
  --output break-glass-audit-jan2026.json
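
For a compliance review, the exported file can be grouped by session to see each session's lifecycle at a glance. This sketch assumes the export is a JSON array of event objects shaped like the example in "Audit Event Structure"; the actual export layout is not specified in this runbook, and both function names are hypothetical.

```python
import json
from collections import defaultdict
from fnmatch import fnmatchcase


def load_export(path: str) -> list[dict]:
    """Read an exported audit file, assumed here to be a JSON array of events."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def events_by_session(
    events: list[dict], pattern: str = "breakglass.*"
) -> dict[str, list[str]]:
    """Group matching event types by sessionId, in timestamp order."""
    grouped = defaultdict(list)
    # ISO-8601 UTC timestamps in one format sort correctly as plain strings.
    for event in sorted(events, key=lambda item: item["timestamp"]):
        if fnmatchcase(event["eventType"], pattern):
            session_id = event.get("sessionId", "<no session>")
            grouped[session_id].append(event["eventType"])
    return dict(grouped)
```

A session whose list ends in `breakglass.session.expired` rather than `breakglass.session.terminated` was left to time out, which may be worth a follow-up question during review.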

Fallback Policy Store

Automatic Failover

When PostgreSQL becomes unavailable:

  1. Authority detects health check failures.
  2. After failureThreshold (default: 3) consecutive failures, Authority switches to the local policy store.
  3. The mode changes to Fallback.
  4. The event authority.mode.changed is logged.

Policy Store Modes

| Mode | Description | Available Features |
|------|-------------|--------------------|
| Primary | PostgreSQL available | Full RBAC, user management |
| Fallback | Using local policy | Break-glass only |
| Degraded | Both degraded | Emergency access only |

Recovery

When PostgreSQL recovers:

  1. Health checks pass again.
  2. After the minFallbackDurationMs cooldown (default: 30 s) has elapsed, Authority switches back to Primary.
  3. Sessions started in Fallback mode can continue until they expire.
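
The failover and recovery rules can be sketched as a small state machine. This is a simplified illustration, not the Authority code: the class and method names are hypothetical, the Degraded mode is not modeled, and the cooldown is assumed to be measured from the moment Fallback mode began.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyStoreMonitor:
    failure_threshold: int = 3              # failureThreshold
    min_fallback_duration_ms: int = 30_000  # minFallbackDurationMs (30 s)
    mode: str = "Primary"
    consecutive_failures: int = 0
    fallback_since_ms: Optional[int] = None

    def record_health_check(self, healthy: bool, now_ms: int) -> str:
        """Feed one PostgreSQL health check result and return the resulting mode."""
        if self.mode == "Primary":
            # Any success resets the streak; only consecutive failures count.
            self.consecutive_failures = 0 if healthy else self.consecutive_failures + 1
            if self.consecutive_failures >= self.failure_threshold:
                self.mode = "Fallback"
                self.fallback_since_ms = now_ms
        elif healthy and now_ms - self.fallback_since_ms >= self.min_fallback_duration_ms:
            # Healthy again and the cooldown has elapsed: return to Primary.
            self.mode = "Primary"
            self.consecutive_failures = 0
            self.fallback_since_ms = None
        return self.mode
```

The cooldown prevents rapid flapping between modes when PostgreSQL recovers briefly and then fails again.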

Security Considerations

Password Policy

Break-glass account passwords should:

  • Be at least 20 characters
  • Include upper, lower, numbers, symbols
  • Be stored securely (HSM, Vault, split custody)
  • Be rotated on a schedule (quarterly recommended)
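
The length and character-class rules above are easy to check before a password is hashed. The following is a minimal sketch with a hypothetical function name; it treats "symbols" as the ASCII punctuation set, which is an assumption, and it does not measure entropy or check against breached-password lists.

```python
import string


def password_policy_problems(password: str, min_length: int = 20) -> list[str]:
    """Return the policy rules a candidate password fails; empty means compliant."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if not any(char.isupper() for char in password):
        problems.append("no uppercase letter")
    if not any(char.islower() for char in password):
        problems.append("no lowercase letter")
    if not any(char.isdigit() for char in password):
        problems.append("no digit")
    if not any(char in string.punctuation for char in password):
        problems.append("no symbol")
    return problems
```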

Access Control

  • Limit break-glass accounts to essential personnel
  • Use separate accounts per operator when possible
  • Review access list quarterly
  • Disable unused accounts immediately

Monitoring

Set up alerts for break-glass activity:

# Alert rule example
- alert: BreakGlassSessionCreated
  # increase() fires only for new sessions; a bare counter stays above 0 forever
  expr: increase(stellaops_breakglass_sessions_created_total[5m]) > 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Break-glass session created
    description: A break-glass session was created. Verify this is expected.

Troubleshooting

Login Failures

| Error | Cause | Resolution |
|-------|-------|------------|
| invalid_credentials | Wrong password | Verify password |
| invalid_reason_code | Reason not in allowed list | Use approved reason code |
| account_disabled | Account explicitly disabled | Contact administrator |
| break_glass_disabled | Feature disabled in config | Enable in local-policy.yaml |

Session Issues

| Issue | Cause | Resolution |
|-------|-------|------------|
| Session expired immediately | Clock skew | Sync server time |
| Cannot extend | Max extensions reached | Log out and re-authenticate |
| Actions failing | Insufficient roles | Verify account has required roles |

Policy Store Issues

# Check policy store status
stella doctor check --component authority

# Verify local policy file
stella auth policy validate --file /etc/stellaops/authority/local-policy.yaml

# Force reload policy
stella auth policy reload

Compliance Notes

Break-glass usage must be:

  • Documented in incident reports
  • Reviewed during security audits
  • Reported in compliance dashboards
  • Justified for each session

Retain audit logs for:

  • SOC 2: 1 year minimum
  • HIPAA: 6 years
  • PCI-DSS: 1 year minimum, with the most recent 3 months immediately available
  • Internal policy: As defined