# Supersedes backfill rollout plan (DEVOPS-AOC-19-101)

Scope: Concelier Link-Not-Merge backfill and supersedes processing once the `advisory_raw` idempotency index is in staging.

## Preconditions

- Idempotency index verified in staging (`advisory_raw` duplicate inserts rejected; log hash recorded).
- LNM migrations 21-101/102 applied (shards, TTL, tombstones).
- Event transport to NATS/Redis disabled during backfill to avoid noisy downstream replays.
- Offline kit mirror includes current hashes for `advisory_raw` and backfill bundle.

## Rollout steps (staging → prod)

1) **Freeze window** (announce 24h prior)
   - Pause Concelier ingest workers (`CONCELIER_INGEST_ENABLED=false`).
   - Stop outbox publisher or point to blackhole NATS subject.
2) **Dry-run (staging)**
   - Run backfill job with `--dry-run` to emit counts only.
   - Verify: new supersedes records count == expected; no write errors; idempotency violations = 0.
   - Capture logs + SHA256 of generated report.
3) **Prod execution**
   - Run backfill job with `--batch-size=500` and `--stop-on-error`.
   - Monitor: insert rate, error rate, Mongo oplog lag; target <5% CPU on primary.
4) **Validation**
   - Run consistency check:
     - `advisory_observations` count stable (no drop).
     - Supersedes edges present for all prior conflicts.
     - Idempotency index hit rate <0.1%.
   - Run API spot check: `/advisories/summary` returns supersedes metadata; `advisory.linkset.updated` events absent during freeze.
5) **Unfreeze**
   - Re-enable ingest + outbox publisher.
   - Trigger single `advisory.observation.updated@1` replay to confirm event path is healthy.

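
The dry-run gate (step 2) and the consistency check (step 4) can be written down as go/no-go functions so the thresholds are applied the same way in staging and prod. The following is a minimal sketch, not the project's actual tooling: every function and parameter name is hypothetical, "count stable" is read as "did not drop", and "hit rate" is assumed to mean rejected duplicate inserts divided by attempted inserts.

```python
from typing import List


def dry_run_failures(
    supersedes_count: int,
    expected_count: int,
    write_errors: int,
    idempotency_violations: int,
) -> List[str]:
    """Step 2 gate: return the reasons to stop, or an empty list to proceed."""
    failures = []
    if supersedes_count != expected_count:
        failures.append(
            f"supersedes count {supersedes_count} != expected {expected_count}"
        )
    if write_errors > 0:
        failures.append(f"{write_errors} write errors")
    if idempotency_violations != 0:
        failures.append(f"{idempotency_violations} idempotency violations")
    return failures


def validation_failures(
    observations_before: int,
    observations_after: int,
    conflicts_missing_edges: int,
    duplicate_rejections: int,
    attempted_inserts: int,
) -> List[str]:
    """Step 4 consistency check: return failed checks, empty list means pass."""
    failures = []
    # "Stable (no drop)" is interpreted here as: the count must not decrease.
    if observations_after < observations_before:
        failures.append(
            f"advisory_observations dropped {observations_before} -> {observations_after}"
        )
    if conflicts_missing_edges > 0:
        failures.append(f"{conflicts_missing_edges} prior conflicts lack supersedes edges")
    # Assumed definition of hit rate: rejected duplicates / attempted inserts.
    hit_rate = duplicate_rejections / attempted_inserts if attempted_inserts else 0.0
    if hit_rate >= 0.001:  # plan requires < 0.1%
        failures.append(f"idempotency index hit rate {hit_rate:.4%} is not below 0.1%")
    return failures
```

An empty list from both functions corresponds to the "proceed" outcome; any entry is a reason to stop and follow the rollback section.
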
## Rollback

- If errors >0 or idempotency violations observed:
  - Stop job, keep ingest paused.
  - Run rollback script `ops/devops/scripts/rollback-lnm-backfill.js` to remove supersedes/tombstones inserted in current window.
  - Restore Mongo from last checkpointed snapshot if rollback script fails.

## Evidence to capture

- Job command + arguments.
- SHA256 of backfill bundle and report.
- Idempotency violation count.
- Post-run consistency report (JSON) stored under `ops/devops/artifacts/aoc-supersedes/<timestamp>/`.

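
The evidence items above can be collected into one manifest per run. This is a minimal sketch under stated assumptions: the manifest file name `evidence.json`, its keys, and the UTC timestamp format used for the `<timestamp>` directory are all hypothetical, since the plan does not specify them.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large bundles do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_evidence(
    artifact_root: Path,
    command: List[str],
    bundle: Path,
    report: Path,
    idempotency_violations: int,
) -> Path:
    """Write a manifest under <artifact_root>/<timestamp>/ and return its path."""
    # Timestamp format is an assumption; the plan only says "<timestamp>".
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_dir = artifact_root / stamp
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "command": command,
        "bundle_sha256": sha256_file(bundle),
        "report_sha256": sha256_file(report),
        "idempotency_violations": idempotency_violations,
    }
    manifest_path = out_dir / "evidence.json"  # hypothetical file name
    manifest_path.write_text(
        json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return manifest_path
```

In this sketch `artifact_root` would be `ops/devops/artifacts/aoc-supersedes`, and `command` would carry the exact job invocation, for example the arguments `--batch-size=500` and `--stop-on-error` from step 3.
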
## Monitoring/Alerts

- Add temporary Grafana panel for idempotency violations and Mongo ops/sec during job.
- Alert if job runtime exceeds 2h or if oplog lag > 60s.

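
The two alert conditions reduce to simple threshold comparisons. As an illustration only (in practice they would more likely be expressed as alert rules in the monitoring stack), the sketch below evaluates them in Python; the function and constant names are hypothetical, and how runtime and oplog lag are measured is left to the caller.

```python
from datetime import timedelta
from typing import List

MAX_RUNTIME = timedelta(hours=2)       # alert if job runtime exceeds 2h
MAX_OPLOG_LAG = timedelta(seconds=60)  # alert if oplog lag > 60s


def active_alerts(runtime: timedelta, oplog_lag: timedelta) -> List[str]:
    """Return the alerts that should fire for the current measurements."""
    alerts = []
    if runtime > MAX_RUNTIME:
        alerts.append(f"job runtime {runtime} exceeds {MAX_RUNTIME}")
    if oplog_lag > MAX_OPLOG_LAG:
        alerts.append(f"oplog lag {oplog_lag} exceeds {MAX_OPLOG_LAG}")
    return alerts
```

Both comparisons are strict, so a run of exactly 2h or a lag of exactly 60s does not alert.
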
## Owners

- Run: DevOps Guild
- Approvals: Concelier Storage Guild + Platform Security