git.stella-ops.org/devops/services/aoc/supersedes-rollout.md
2025-12-26 18:11:06 +02:00

Supersedes backfill rollout plan (DEVOPS-AOC-19-101)

Scope: Concelier Link-Not-Merge (LNM) backfill and supersedes processing, to run once the advisory_raw idempotency index is in staging.

Preconditions

  • Idempotency index verified in staging (advisory_raw duplicate inserts rejected; log hash recorded).
  • LNM migrations 21-101/102 applied (shards, TTL, tombstones).
  • Event transport to NATS/Redis disabled during backfill to avoid noisy downstream replays.
  • Offline kit mirror includes current hashes for advisory_raw and backfill bundle.
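
The last precondition can be checked mechanically. The following is a minimal sketch, assuming the mirror's expected hashes are available as a mapping from file name to hex SHA256 digest; that manifest layout and the names `sha256_of` and `stale_entries` are illustrative assumptions, not the offline kit's actual format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_entries(manifest: dict[str, str], root: Path) -> list[str]:
    """List manifest entries whose file is missing or whose hash differs.

    `manifest` maps a relative file name to its expected hex digest. This
    layout is an assumption, not the offline kit's real format.
    """
    stale = []
    for name, expected in sorted(manifest.items()):
        target = root / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            stale.append(name)
    return stale
```

An empty result means every listed file is present with the expected hash. Any returned name should block the freeze window until the mirror is refreshed.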

Rollout steps (staging → prod)

  1. Freeze window (announce 24h prior)
    • Pause Concelier ingest workers (CONCELIER_INGEST_ENABLED=false).
    • Stop the outbox publisher, or point it at a blackhole NATS subject.
  2. Dry-run (staging)
    • Run backfill job with --dry-run to emit counts only.
    • Verify: count of new supersedes records matches the expected count; no write errors; idempotency violations = 0.
    • Capture logs + SHA256 of generated report.
  3. Prod execution
    • Run backfill job with --batch-size=500 and --stop-on-error.
    • Monitor: insert rate, error rate, Mongo oplog lag; target <5% CPU on primary.
  4. Validation
    • Run consistency check:
      • advisory_observations count stable (no drop).
      • Supersedes edges present for all prior conflicts.
      • Idempotency index hit rate <0.1%.
    • Run an API spot check: /advisories/summary returns supersedes metadata, and no advisory.linkset.updated events were emitted during the freeze.
  5. Unfreeze
    • Re-enable ingest + outbox publisher.
    • Trigger single advisory.observation.updated@1 replay to confirm event path is healthy.
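
The consistency check in step 4 can be expressed as a small gate over collected counts. This is a sketch under stated assumptions: the field names are hypothetical, "stable" is read as "did not decrease", and the idempotency index hit rate is taken as rejected writes divided by attempted writes. None of these definitions come from the backfill job itself.

```python
from dataclasses import dataclass

MAX_IDEMPOTENCY_HIT_RATE = 0.001  # 0.1%, from the validation step


@dataclass(frozen=True)
class ValidationInput:
    observations_before: int   # advisory_observations count before the backfill
    observations_after: int    # advisory_observations count after the backfill
    prior_conflicts: int       # conflicts that should each have a supersedes edge
    conflicts_with_edge: int   # conflicts for which an edge was found
    idempotency_hits: int      # writes rejected by the idempotency index
    total_writes: int          # writes attempted by the backfill


def validation_failures(v: ValidationInput) -> list[str]:
    """Return a description of each failed check; empty means all passed."""
    failures = []
    if v.observations_after < v.observations_before:
        failures.append("advisory_observations count dropped")
    if v.conflicts_with_edge < v.prior_conflicts:
        failures.append("supersedes edges missing for some prior conflicts")
    # Assumed definition: hit rate = rejected writes / attempted writes.
    hit_rate = v.idempotency_hits / v.total_writes if v.total_writes else 0.0
    if hit_rate >= MAX_IDEMPOTENCY_HIT_RATE:
        failures.append("idempotency index hit rate at or above 0.1%")
    return failures
```

Returning a list of failures rather than a single boolean keeps the output usable as part of the post-run consistency report.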

Rollback

  • If any errors occur or any idempotency violations are observed:
    • Stop job, keep ingest paused.
    • Run rollback script ops/devops/scripts/rollback-lnm-backfill.js to remove supersedes/tombstones inserted in current window.
    • Restore Mongo from last checkpointed snapshot if rollback script fails.
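
The rollback decision above can be sketched as a pure function that returns the ordered actions for a given job outcome. The `run_script` hook is hypothetical: this plan does not say how the rollback script is invoked, so the caller supplies a function that runs it and reports success.

```python
from typing import Callable

ROLLBACK_SCRIPT = "ops/devops/scripts/rollback-lnm-backfill.js"


def rollback_plan(errors: int, idempotency_violations: int,
                  run_script: Callable[[str], bool]) -> list[str]:
    """Return the ordered actions taken for a given job outcome.

    `run_script` is a hypothetical hook that runs the rollback script and
    returns True on success; the real invocation is not specified here.
    """
    if errors == 0 and idempotency_violations == 0:
        return []  # clean run, nothing to roll back
    actions = ["stop job", "keep ingest paused", f"run {ROLLBACK_SCRIPT}"]
    if not run_script(ROLLBACK_SCRIPT):
        # The snapshot restore is the fallback only when the script fails.
        actions.append("restore Mongo from last checkpointed snapshot")
    return actions
```

Injecting the script runner lets the decision logic be exercised in a dry run without touching the database.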

Evidence to capture

  • Job command + arguments.
  • SHA256 of backfill bundle and report.
  • Idempotency violation count.
  • Post-run consistency report (JSON) stored under ops/devops/artifacts/aoc-supersedes/<timestamp>/.
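
One way to collect this evidence is a small helper that hashes the bundle and report and writes both under a timestamped directory. This is a sketch only: the file names `consistency-report.json` and `evidence.json`, the UTC timestamp format, and the function name `write_evidence` are assumptions. Only the artifact root path and the evidence items come from this plan.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

ARTIFACT_ROOT = Path("ops/devops/artifacts/aoc-supersedes")


def write_evidence(command: list[str], bundle: bytes, report: dict,
                   idempotency_violations: int,
                   root: Path = ARTIFACT_ROOT) -> Path:
    """Write the consistency report and an evidence summary; return its path."""
    # Serialize the report deterministically so its hash is reproducible.
    report_bytes = json.dumps(report, sort_keys=True, indent=2).encode("utf-8")
    # Assumed timestamp format; the plan only says <timestamp>.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_dir = root / stamp
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "consistency-report.json").write_bytes(report_bytes)
    evidence = {
        "command": command,
        "bundle_sha256": hashlib.sha256(bundle).hexdigest(),
        "report_sha256": hashlib.sha256(report_bytes).hexdigest(),
        "idempotency_violations": idempotency_violations,
    }
    path = out_dir / "evidence.json"
    path.write_text(json.dumps(evidence, sort_keys=True, indent=2),
                    encoding="utf-8")
    return path
```

Hashing the exact bytes written to disk means the recorded report digest can be re-verified later from the stored file alone.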

Monitoring/Alerts

  • Add temporary Grafana panel for idempotency violations and Mongo ops/sec during job.
  • Alert if job runtime exceeds 2h or if oplog lag > 60s.
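
The two alert conditions reduce to simple threshold checks. The sketch below assumes runtime and oplog lag are already available in seconds; the function name `fired_alerts` and the message strings are illustrative, and wiring into Grafana or an alert manager is out of scope.

```python
MAX_RUNTIME_SECONDS = 2 * 60 * 60  # 2h job runtime limit
MAX_OPLOG_LAG_SECONDS = 60         # 60s oplog lag limit


def fired_alerts(runtime_seconds: float, oplog_lag_seconds: float) -> list[str]:
    """Return the alerts that fire for the given measurements."""
    fired = []
    if runtime_seconds > MAX_RUNTIME_SECONDS:
        fired.append("job runtime exceeds 2h")
    if oplog_lag_seconds > MAX_OPLOG_LAG_SECONDS:
        fired.append("oplog lag exceeds 60s")
    return fired
```

Both comparisons are strict, so a job at exactly 2h or a lag of exactly 60s does not alert.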

Owners

  • Run: DevOps Guild
  • Approvals: Concelier Storage Guild + Platform Security