CI/CD consolidation

StellaOps Bot
2025-12-26 17:32:23 +02:00
parent a866eb6277
commit c786faae84
638 changed files with 3821 additions and 181 deletions


@@ -0,0 +1,50 @@
# Supersedes backfill rollout plan (DEVOPS-AOC-19-101)
Scope: Concelier Link-Not-Merge backfill and supersedes processing once advisory_raw idempotency index is in staging.
## Preconditions
- Idempotency index verified in staging (`advisory_raw` duplicate inserts rejected; log hash recorded).
- LNM migrations 21-101/102 applied (shards, TTL, tombstones).
- Event transport to NATS/Redis disabled during backfill to avoid noisy downstream replays.
- Offline kit mirror includes current hashes for `advisory_raw` and backfill bundle.
## Rollout steps (staging → prod)
1) **Freeze window** (announce 24h prior)
   - Pause Concelier ingest workers (`CONCELIER_INGEST_ENABLED=false`).
   - Stop outbox publisher or point to blackhole NATS subject.
2) **Dry-run (staging)**
   - Run backfill job with `--dry-run` to emit counts only.
   - Verify: new supersedes records count == expected; no write errors; idempotency violations = 0.
   - Capture logs + SHA256 of generated report.
3) **Prod execution**
   - Run backfill job with `--batch-size=500` and `--stop-on-error`.
   - Monitor: insert rate, error rate, Mongo oplog lag; target <5% CPU on primary.
4) **Validation**
   - Run consistency check:
     - `advisory_observations` count stable (no drop).
     - Supersedes edges present for all prior conflicts.
     - Idempotency index hit rate <0.1%.
   - Run API spot check: `/advisories/summary` returns supersedes metadata; `advisory.linkset.updated` events absent during freeze.
5) **Unfreeze**
   - Re-enable ingest + outbox publisher.
   - Trigger single `advisory.observation.updated@1` replay to confirm event path is healthy.
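
The dry-run and validation checks in steps 2 and 4 could be encoded as a go/no-go gate. The following is a minimal sketch, not the backfill job's actual interface: the `DryRunReport` fields and both function names are hypothetical, and the hit rate is assumed to mean index rejections divided by attempted inserts, which this plan does not define.

```python
from dataclasses import dataclass


@dataclass
class DryRunReport:
    """Counts emitted by a dry run. Field names are hypothetical."""
    supersedes_new: int
    write_errors: int
    idempotency_violations: int


def dry_run_failures(report: DryRunReport, expected_supersedes: int) -> list[str]:
    """Return the reasons the dry run fails the gate; an empty list means go."""
    failures = []
    if report.supersedes_new != expected_supersedes:
        failures.append(
            f"supersedes count {report.supersedes_new} != expected {expected_supersedes}"
        )
    if report.write_errors != 0:
        failures.append(f"write errors: {report.write_errors}")
    if report.idempotency_violations != 0:
        failures.append(f"idempotency violations: {report.idempotency_violations}")
    return failures


def validation_failures(obs_before: int, obs_after: int,
                        index_hits: int, attempted_inserts: int) -> list[str]:
    """Post-run checks: observation count must not drop, hit rate must stay below 0.1%."""
    failures = []
    if obs_after < obs_before:
        failures.append(f"advisory_observations dropped: {obs_before} -> {obs_after}")
    # Assumption: hit rate = index_hits / attempted_inserts.
    # Integer arithmetic avoids float rounding at the 0.1% boundary.
    if attempted_inserts > 0 and index_hits * 1000 >= attempted_inserts:
        failures.append(f"idempotency hit rate >= 0.1% ({index_hits}/{attempted_inserts})")
    return failures
```

Returning a list of reasons rather than a boolean keeps the failing condition visible in the captured logs.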
## Rollback
- If errors >0 or idempotency violations observed:
  - Stop job, keep ingest paused.
  - Run rollback script `ops/devops/scripts/rollback-lnm-backfill.js` to remove supersedes/tombstones inserted in current window.
  - Restore Mongo from last checkpointed snapshot if rollback script fails.
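
The rollback decision above can be written down as an ordered action list. This sketch only mirrors the bullets; the function name and the `rollback_script_ok` flag are hypothetical, and it does not invoke the script or touch Mongo.

```python
from typing import Optional

ROLLBACK_SCRIPT = "ops/devops/scripts/rollback-lnm-backfill.js"


def rollback_actions(errors: int, idempotency_violations: int,
                     rollback_script_ok: Optional[bool] = None) -> list[str]:
    """Return the ordered rollback steps, or an empty list if no rollback is needed.

    rollback_script_ok is None while the script has not run yet, and
    False once it has run and failed.
    """
    if errors == 0 and idempotency_violations == 0:
        return []
    actions = [
        "stop backfill job",
        "keep ingest paused",
        f"run {ROLLBACK_SCRIPT}",
    ]
    if rollback_script_ok is False:
        # Fallback only when the script itself failed.
        actions.append("restore Mongo from last checkpointed snapshot")
    return actions
```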
## Evidence to capture
- Job command + arguments.
- SHA256 of backfill bundle and report.
- Idempotency violation count.
- Post-run consistency report (JSON) stored under `ops/devops/artifacts/aoc-supersedes/<timestamp>/`.
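
One way to collect this evidence is to hash the bundle and report, then write a small manifest into a timestamped directory. This is a sketch under assumptions: the manifest file name `evidence.json`, its keys, and the UTC timestamp format are illustrative choices, not something this plan specifies.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large bundles are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_evidence(artifacts_root: Path, command: list[str], bundle: Path,
                   report: Path, idempotency_violations: int) -> Path:
    """Write a manifest under <artifacts_root>/<timestamp>/ and return its path."""
    # Assumed timestamp format; the plan only says <timestamp>.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_dir = artifacts_root / stamp
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "command": command,
        "bundle_sha256": sha256_file(bundle),
        "report_sha256": sha256_file(report),
        "idempotency_violations": idempotency_violations,
    }
    manifest_path = out_dir / "evidence.json"  # hypothetical file name
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
    return manifest_path
```

In this layout `artifacts_root` would be `ops/devops/artifacts/aoc-supersedes`, and the post-run consistency report would be copied into the same directory.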
## Monitoring/Alerts
- Add temporary Grafana panel for idempotency violations and Mongo ops/sec during job.
- Alert if job runtime exceeds 2h or if oplog lag > 60s.
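
The two alert thresholds can be expressed as a simple predicate, for example in a watchdog script that polls job state. The function and constant names here are hypothetical; how runtime and oplog lag are measured is left to the monitoring stack.

```python
from datetime import timedelta

MAX_RUNTIME = timedelta(hours=2)        # alert if job runtime exceeds 2h
MAX_OPLOG_LAG = timedelta(seconds=60)   # alert if oplog lag > 60s


def fired_alerts(runtime: timedelta, oplog_lag: timedelta) -> list[str]:
    """Return the alerts that should fire for the given measurements."""
    fired = []
    if runtime > MAX_RUNTIME:
        fired.append("job runtime exceeds 2h")
    if oplog_lag > MAX_OPLOG_LAG:
        fired.append("oplog lag > 60s")
    return fired
```

Both comparisons are strict, so a reading of exactly 2h or exactly 60s does not fire.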
## Owners
- Run: DevOps Guild
- Approvals: Concelier Storage Guild + Platform Security