Authority Backup & Restore Runbook

Scope

  • Applies to: StellaOps Authority deployments running the official ops/authority/docker-compose.authority.yaml stack or equivalent Kubernetes packaging.
  • Artifacts covered: PostgreSQL (stellaops-authority database), Authority configuration (etc/authority.yaml), plugin manifests under etc/authority.plugins/, and signing key material stored in the authority-keys volume (defaults to /app/keys inside the container).
  • Frequency: Run the full procedure prior to upgrades, before rotating keys, and at least once per 24 h in production. Store snapshots in an encrypted, access-controlled vault.
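
The 24 h cadence above can be monitored with a small freshness check. The following is only a sketch: backup_is_fresh is a hypothetical helper (not part of StellaOps), and it assumes backups land as regular files somewhere under a single directory on the host.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if BACKUP_DIR contains at least one
# regular file modified within the last MAX_AGE_HOURS (default 24).
backup_is_fresh() {
  backup_dir=$1
  max_age_hours=${2:-24}
  max_age_minutes=$((max_age_hours * 60))
  # find prints files modified less than max_age_minutes ago;
  # grep -q . succeeds when at least one path was printed.
  find "$backup_dir" -type f -mmin "-$max_age_minutes" 2>/dev/null | grep -q .
}
```

A scheduler could call it as `backup_is_fresh backup 24 || echo "Authority backup is stale" >&2` and raise an alert on failure.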

Inventory Checklist

| Component | Location (compose default) | Notes |
| --- | --- | --- |
| PostgreSQL data | postgres-data volume (/var/lib/docker/volumes/.../postgres-data) | Contains all Authority tables (authority_user, authority_client, authority_token, etc.). |
| Configuration | etc/authority.yaml | Mounted read-only into the container at /etc/authority.yaml. |
| Plugin manifests | etc/authority.plugins/*.yaml | Includes standard.yaml with tokenSigning.keyDirectory. |
| Signing keys | authority-keys volume -> /app/keys | Path is derived from tokenSigning.keyDirectory (defaults to ../keys relative to the manifest). |

TIP: Confirm the deployed key directory via tokenSigning.keyDirectory in etc/authority.plugins/standard.yaml; some installations relocate keys to /var/lib/stellaops/authority/keys.
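
One way to follow the tip is to read the setting straight from the manifest. This is a sketch under assumptions: show_key_directory is a hypothetical helper, and it assumes keyDirectory appears on its own line as a plain `keyDirectory: value` pair. It does no real YAML parsing and prints any quotes around the value as written.

```shell
#!/bin/sh
# Hypothetical helper: print each keyDirectory value found in a plugin manifest.
show_key_directory() {
  manifest=$1
  # Match lines such as "  keyDirectory: ../keys"; commented-out lines
  # (starting with "#") do not match. Then strip the key and surrounding spaces.
  grep -E '^[[:space:]]*keyDirectory[[:space:]]*:' "$manifest" \
    | sed -E 's/^[[:space:]]*keyDirectory[[:space:]]*:[[:space:]]*//; s/[[:space:]]*$//'
}
```

For example, `show_key_directory etc/authority.plugins/standard.yaml` would print the configured directory, which tells you which path or volume the key backup must cover.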

Hot Backup (no downtime)

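  1. Create output directory: mkdir -p backup/$(date +%Y-%m-%d) on the host. The commands below write to backup/ directly; point them at the dated subdirectory if you keep one folder per day.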
  2. Dump PostgreSQL:
    # Capture the UTC timestamp once so the dump and the copy use the same file name.
    TS=$(date -u +%Y%m%dT%H%M%SZ)
    docker compose -f ops/authority/docker-compose.authority.yaml exec postgres \
      pg_dump -Fc -d stellaops-authority \
      -f /dump/authority-${TS}.dump
    docker compose -f ops/authority/docker-compose.authority.yaml cp \
      postgres:/dump/authority-${TS}.dump backup/
    
    The pg_dump archive preserves indexes and can be restored with pg_restore.
  3. Capture configuration + manifests:
    cp etc/authority.yaml backup/
    rsync -a etc/authority.plugins/ backup/authority.plugins/
    
  4. Export signing keys: the compose file maps authority-keys to a local Docker volume. Snapshot it without stopping the service:
    docker run --rm \
      -v authority-keys:/keys \
      -v "$(pwd)/backup:/backup" \
      busybox tar czf /backup/authority-keys-$(date -u +%Y%m%dT%H%M%SZ).tar.gz -C /keys .
    
  5. Checksum: generate SHA-256 digests for every file and store them alongside the artefacts.
  6. Encrypt & upload: wrap the backup folder using your secrets management standard (e.g., age, GPG) and upload to the designated offline vault.
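
Step 5 can be sketched as follows. The helper names and the SHA256SUMS manifest name are assumptions chosen for illustration, and the options used (sort -z, xargs -0 -r, sha256sum --quiet) assume GNU coreutils and findutils on the host.

```shell
#!/bin/sh
# Hypothetical helper: write SHA-256 digests for every file under BACKUP_DIR
# into BACKUP_DIR/SHA256SUMS, using paths relative to that directory.
write_checksums() {
  backup_dir=$1
  (
    cd "$backup_dir" || exit 1
    # Exclude the manifest itself so reruns do not hash a previous manifest.
    find . -type f ! -name SHA256SUMS -print0 | sort -z \
      | xargs -0 -r sha256sum > SHA256SUMS
  )
}

# Hypothetical helper: re-check every digest; fails if any file changed or is missing.
verify_checksums() {
  backup_dir=$1
  ( cd "$backup_dir" && sha256sum --quiet -c SHA256SUMS )
}
```

Running `write_checksums backup` before encryption and `verify_checksums backup` after retrieving and decrypting an archive gives a quick integrity check ahead of a restore.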

Cold Backup (planned downtime)

  1. Notify stakeholders and drain traffic (CLI clients should refresh tokens afterwards).
  2. Stop services:
    docker compose -f ops/authority/docker-compose.authority.yaml down
    
  3. Back up volumes directly using tar:
    docker run --rm -v postgres-data:/data -v "$(pwd)/backup:/backup" \
      busybox tar czf /backup/postgres-data-$(date +%Y%m%d).tar.gz -C /data .
    docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \
      busybox tar czf /backup/authority-keys-$(date +%Y%m%d).tar.gz -C /keys .
    
  4. Copy configuration + manifests as in the hot backup (steps 3-6).
  5. Restart services and verify health:
    docker compose -f ops/authority/docker-compose.authority.yaml up -d
    curl -fsS http://localhost:8080/ready
    
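
Services may need some time after restart before the readiness probe passes, so a single curl call can fail spuriously. A minimal retry sketch, assuming a generic wait_until helper (hypothetical, not shipped with StellaOps):

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the probe quietly; a zero exit status means it passed.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause only when another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

With the probe from step 5 this would read `wait_until 30 2 curl -fsS http://localhost:8080/ready`, which allows roughly a minute for the stack to come up.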

Restore Procedure

  1. Provision clean volumes: remove existing volumes if you're rebuilding a node (docker volume rm postgres-data authority-keys), then recreate the compose stack so empty volumes exist.
  2. Restore PostgreSQL:
    docker compose -f ops/authority/docker-compose.authority.yaml exec -T postgres \
      pg_restore -d stellaops-authority --clean < backup/authority-YYYYMMDDTHHMMSSZ.dump
    
    Use --clean to drop existing objects before restoring; omit if doing a partial restore.
  3. Restore configuration/manifests: copy authority.yaml and authority.plugins/* into place before starting the Authority container.
  4. Restore signing keys: untar into the mounted volume:
    docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \
      busybox tar xzf /backup/authority-keys-YYYYMMDD.tar.gz -C /keys
    
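    Ensure private key files remain mode 600. Avoid chmod -R 600 on the key directory: it also clears the search (execute) bit on directories, which makes the files inside unreachable. Apply the mode to files only, e.g. find <key directory> -type f -exec chmod 600 {} +.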
  5. Start services & validate:
    docker compose -f ops/authority/docker-compose.authority.yaml up -d
    curl -fsS http://localhost:8080/health
    
  6. Validate JWKS and tokens: call /jwks and issue a short-lived token via the CLI to confirm key material matches expectations. If the restored environment requires a fresh signing key, follow the rotation SOP in docs/11_AUTHORITY.md using ops/authority/key-rotation.sh to invoke /internal/signing/rotate.
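
The permission check in step 4 can be sketched as a small helper that treats files and directories separately. secure_key_dir is a hypothetical name. Mode 700 for directories is an assumption (the runbook only specifies 600 for private keys), and the sketch assumes the account that runs Authority owns the restored files.

```shell
#!/bin/sh
# Hypothetical helper: restrict restored key material to its owner.
secure_key_dir() {
  key_dir=$1
  # Directories keep the search bit so the owner can still reach the files.
  find "$key_dir" -type d -exec chmod 700 {} +
  # Key files become readable and writable by the owner only.
  find "$key_dir" -type f -exec chmod 600 {} +
}
```

It can be run against the directory that backs the authority-keys volume, or from inside a helper container that mounts the volume.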

Disaster Recovery Notes

  • Air-gapped replication: replicate archives via the Offline Update Kit transport channels; never attach USB devices without scanning.
  • Retention: maintain 30 daily snapshots + 12 monthly archival copies. Rotate encryption keys annually.
  • Key compromise: if signing keys are suspected compromised, restore from the latest clean backup, rotate via OPS3 (see ops/authority/key-rotation.sh and docs/11_AUTHORITY.md), and publish a revocation notice.
  • PostgreSQL version: keep dump/restore images pinned to the deployment version (compose uses postgres:16). Npgsql 8.x requires PostgreSQL 12+; clusters still on older versions must be upgraded before restore.
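
The retention rule above could be applied with a script along these lines. It is one possible interpretation, not a documented policy implementation: it assumes snapshots live in directories named YYYY-MM-DD (as created in hot backup step 1), keeps the newest 30 of them, and additionally keeps the earliest snapshot of each of the 12 most recent months. list_expired is a hypothetical helper that only prints candidates and deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print snapshot directories that fall outside the
# retention window. Usage: list_expired ROOT [DAILY] [MONTHLY]
list_expired() {
  root=$1
  daily=${2:-30}
  monthly=${3:-12}
  # YYYY-MM-DD names sort chronologically, so a reverse sort is newest-first.
  ls -1 "$root" | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' | sort -r |
    awk -v daily="$daily" -v monthly="$monthly" '
      { name[NR] = $0 }
      END {
        # Keep the newest "daily" snapshots outright.
        for (i = 1; i <= NR && i <= daily; i++) keep[name[i]] = 1
        # Walk oldest-first: the earliest snapshot of a month represents it.
        for (i = NR; i >= 1; i--) {
          m = substr(name[i], 1, 7)
          if (!(m in rep)) rep[m] = name[i]
        }
        # Walk newest-first: keep representatives of the newest "monthly" months.
        count = 0
        for (i = 1; i <= NR; i++) {
          m = substr(name[i], 1, 7)
          if (!(m in seen)) {
            seen[m] = 1
            if (count < monthly) { keep[rep[m]] = 1; count++ }
          }
        }
        # Print everything else, oldest first.
        for (i = NR; i >= 1; i--) if (!(name[i] in keep)) print name[i]
      }'
}
```

Review the output of `list_expired backup` before removing anything. Keeping the listing separate from deletion leaves room for a manual check or a legal-hold exception.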

Verification Checklist

  • /ready reports all identity providers ready.
  • OAuth flows issue tokens signed by the restored keys.
  • PluginRegistrationSummary logs expected providers on startup.
  • Revocation manifest export (dotnet run --project src/Authority/StellaOps.Authority) succeeds.
  • Monitoring dashboards show metrics resuming (see OPS5 deliverables).