work
32
ops/devops/lnm/backfill-plan.md
Normal file
@@ -0,0 +1,32 @@
# LNM Backfill Plan (DEVOPS-LNM-22-001)

## Goal
Run staging backfill for advisory observations/linksets, validate counts/conflicts, and document rollout steps for production.

## Prereqs
- Concelier API CCLN0102 available (advisory/linkset endpoints stable).
- Staging Mongo snapshot taken (pre-backfill) and stored at `s3://staging-backups/concelier-pre-lnmbf.gz`.
- NATS/Redis staging brokers reachable.

## Steps
1) Seed snapshot
   - Restore staging Mongo from pre-backfill snapshot.
2) Run backfill job
   - `dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- --mode=observations --batch-size=500 --max-conflicts=0`
   - `dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- --mode=linksets --batch-size=500 --max-conflicts=0`
3) Validate counts
   - Compare `advisory_observations_total` and `linksets_total` vs expected inventory; export to `.artifacts/lnm-counts.json`.
   - Check conflict log `.artifacts/lnm-conflicts.ndjson` (must be empty).
4) Events/NATS smoke
   - Ensure `concelier.lnm.backfill.completed` emitted; verify Redis/NATS queues drained.
5) Roll-forward checklist
   - Promote batch size to 2000 for prod, keep `--max-conflicts=0`.
   - Schedule maintenance window, ensure snapshot available for rollback.
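
Step 3 can be turned into a pass/fail gate. The following is a minimal sketch, not part of the commit: `validate_backfill` is a hypothetical helper, and the expected counts are assumed to come from whatever inventory the team maintains. It fails when either count differs from its expected value or when the conflict log is missing or non-empty.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical gate for step 3: compare actual vs expected counts and
# require an existing, empty conflict log. Reports every failure it finds.
validate_backfill() {
  local actual_obs=$1 expected_obs=$2 actual_lnk=$3 expected_lnk=$4 conflicts_log=$5
  local rc=0
  [[ "$actual_obs" -eq "$expected_obs" ]] || {
    echo "observations: got $actual_obs, expected $expected_obs" >&2; rc=1; }
  [[ "$actual_lnk" -eq "$expected_lnk" ]] || {
    echo "linksets: got $actual_lnk, expected $expected_lnk" >&2; rc=1; }
  if [[ ! -f "$conflicts_log" ]]; then
    echo "conflict log $conflicts_log not found" >&2; rc=1
  elif [[ -s "$conflicts_log" ]]; then
    # -s is true when the file has a size greater than zero.
    echo "conflict log $conflicts_log is not empty" >&2; rc=1
  fi
  return "$rc"
}
```

A call such as `validate_backfill "$OBS" "$EXPECTED_OBS" "$LNK" "$EXPECTED_LNK" .artifacts/lnm-conflicts.ndjson` would then stop a pipeline before the roll-forward step; the `EXPECTED_*` variables are placeholders.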

## Outputs
- `.artifacts/lnm-counts.json`
- `.artifacts/lnm-conflicts.ndjson` (empty)
- Log of job runtime + throughput.
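
The runtime and throughput log can be derived from bash's built-in `SECONDS` counter. This is a rough sketch under assumptions: `throughput_per_sec` is a hypothetical helper, and the processed-document count is assumed to be known from the job's own output or from the counts file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: integer documents-per-second from a processed count
# and an elapsed time in whole seconds.
throughput_per_sec() {
  local processed=$1 elapsed=$2
  (( elapsed > 0 )) || elapsed=1   # sub-second runs: avoid dividing by zero
  echo $(( processed / elapsed ))
}

# Usage idea (commented out so the sketch runs without the backfill project):
#   start=$SECONDS
#   dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- --mode=observations --batch-size=500 --max-conflicts=0
#   elapsed=$(( SECONDS - start ))
#   echo "runtime=${elapsed}s throughput=$(throughput_per_sec "$PROCESSED" "$elapsed")/s"
```

Integer division is enough for a coarse log line; `PROCESSED` in the usage comment is a placeholder.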

## Acceptance
- Zero conflicts; counts match expected; events emitted; rollback plan documented.
24
ops/devops/lnm/backfill-validation.sh
Normal file
@@ -0,0 +1,24 @@
#!/usr/bin/env bash
set -euo pipefail

# The script lives in ops/devops/lnm, so the repo root is three levels up.
# ROOT and ARTifacts can be overridden from the environment.
ROOT=${ROOT:-$(cd "$(dirname "$0")/../../.." && pwd)}
ARTifacts=${ARTifacts:-$ROOT/.artifacts}
COUNTS=$ARTifacts/lnm-counts.json
CONFLICTS=$ARTifacts/lnm-conflicts.ndjson
mkdir -p "$ARTifacts"

# Export both collections; by default mongoexport writes one JSON document per line.
mongoexport --uri "${STAGING_MONGO_URI:?set STAGING_MONGO_URI}" --collection advisoryObservations --db concelier --type=json --query '{}' --out "$ARTifacts/obs.json" >/dev/null
mongoexport --uri "${STAGING_MONGO_URI:?set STAGING_MONGO_URI}" --collection linksets --db concelier --type=json --query '{}' --out "$ARTifacts/linksets.json" >/dev/null

# Slurp (-s) the per-line documents into one array so `length` returns a single document count.
OBS=$(jq -s length "$ARTifacts/obs.json")
LNK=$(jq -s length "$ARTifacts/linksets.json")

cat > "$COUNTS" <<JSON
{
  "observations": $OBS,
  "linksets": $LNK,
  "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
JSON

# Make sure the conflict log exists even if the backfill job wrote none.
touch "$CONFLICTS"
echo "Counts written to $COUNTS; conflicts at $CONFLICTS"
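
Since `mongoexport` emits one document per line unless `--jsonArray` is passed, a plain line count is a cheap cross-check on the document totals that needs no `jq`. The sketch below is illustrative only; `count_ndjson` is a hypothetical helper, not part of the committed script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count non-blank lines (one exported document each).
count_ndjson() {
  local file=$1
  [[ -r "$file" ]] || { echo "cannot read $file" >&2; return 2; }
  # grep -c still prints 0 on no match but exits 1; tolerate that under `set -e`.
  grep -c '[^[:space:]]' "$file" || true
}
```

For example, `count_ndjson "$ARTifacts/obs.json"` should agree with the `observations` value written to the counts file.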
11
ops/devops/lnm/metrics-ci-check.sh
Normal file
@@ -0,0 +1,11 @@
#!/usr/bin/env bash
set -euo pipefail
DASHBOARD=${1:-ops/devops/lnm/metrics-dashboard.json}
# Fail early if the dashboard is not valid JSON.
jq . "$DASHBOARD" >/dev/null
REQUIRED=("advisory_observations_total" "linksets_total" "ingest_api_latency_seconds_bucket" "lnm_backfill_processed_total")
for metric in "${REQUIRED[@]}"; do
  # Fixed-string match: metric names are literals, not patterns.
  if ! grep -qF "$metric" "$DASHBOARD"; then
    echo "::error::metric $metric missing from dashboard"; exit 1
  fi
done
echo "dashboard metrics present"
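
The loop above stops at the first missing metric. A variant that reports every missing metric in one run might look like the sketch below; `missing_metrics` is a hypothetical helper, and it uses the same plain substring match as the check script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each required metric name that does not
# appear anywhere in the dashboard file, one per line.
missing_metrics() {
  local dashboard=$1; shift
  local metric
  for metric in "$@"; do
    grep -qF -- "$metric" "$dashboard" || echo "$metric"
  done
}
```

A CI step could capture the output and fail when it is non-empty, which surfaces all gaps in a single run.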
9
ops/devops/lnm/metrics-dashboard.json
Normal file
@@ -0,0 +1,9 @@
{
  "title": "LNM Backfill Metrics",
  "panels": [
    {"type": "stat", "title": "Observations", "targets": [{"expr": "advisory_observations_total"}]},
    {"type": "stat", "title": "Linksets", "targets": [{"expr": "linksets_total"}]},
    {"type": "graph", "title": "Ingest→API latency p95", "targets": [{"expr": "histogram_quantile(0.95, rate(ingest_api_latency_seconds_bucket[5m]))"}]},
    {"type": "graph", "title": "Backfill throughput", "targets": [{"expr": "rate(lnm_backfill_processed_total[5m])"}]}
  ]
}
20
ops/devops/lnm/vex-backfill-plan.md
Normal file
@@ -0,0 +1,20 @@
# VEX Backfill Plan (DEVOPS-LNM-22-002)

## Goal
Run VEX observation/linkset backfill with monitoring, ensure events flow via NATS/Redis, and capture run artifacts.

## Steps
1) Pre-checks
   - Confirm DEVOPS-LNM-22-001 counts baseline (`.artifacts/lnm-counts.json`).
   - Ensure `STAGING_MONGO_URI`, `NATS_URL`, `REDIS_URL` available (read-only or test brokers).
2) Run VEX backfill
   - `dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- --mode=vex --batch-size=500 --max-conflicts=0 --mongo $STAGING_MONGO_URI --nats $NATS_URL --redis $REDIS_URL`
3) Metrics capture
   - Export per-run metrics to `.artifacts/vex-backfill-metrics.json` (duration, processed, conflicts, events emitted).
4) Event verification
   - Subscribe to `concelier.vex.backfill.completed` and `concelier.linksets.vex.upserted`; ensure queues drained.
5) Roll-forward checklist
   - Increase batch size to 2000 for prod; keep conflicts = 0; schedule maintenance window.
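
The plan names the fields of the step 3 metrics file but not its schema. As a sketch only, the file could be written as below; `write_vex_metrics` and the JSON key names (`duration_seconds`, `processed`, `conflicts`, `events_emitted`, `timestamp`) are assumptions, not the backfill job's actual output format.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write per-run metrics as JSON. The key names are
# assumed; the plan only lists duration, processed, conflicts, events emitted.
write_vex_metrics() {
  local out=$1 duration=$2 processed=$3 conflicts=$4 events=$5
  mkdir -p "$(dirname "$out")"
  # %d makes printf reject non-numeric values instead of emitting bad JSON.
  printf '{\n  "duration_seconds": %d,\n  "processed": %d,\n  "conflicts": %d,\n  "events_emitted": %d,\n  "timestamp": "%s"\n}\n' \
    "$duration" "$processed" "$conflicts" "$events" \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$out"
}
```

Calling `write_vex_metrics .artifacts/vex-backfill-metrics.json 93 1200 0 1200` after the run would produce the file the Acceptance section asks for; the numbers here are placeholders.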

## Acceptance
- Zero conflicts; events observed; metrics file present; rollback plan documented.