StellaOps Bot
2025-11-23 23:40:10 +02:00
parent c13355923f
commit 029002ad05
93 changed files with 2160 additions and 285 deletions

@@ -0,0 +1,32 @@
# LNM Backfill Plan (DEVOPS-LNM-22-001)
## Goal
Run staging backfill for advisory observations/linksets, validate counts/conflicts, and document rollout steps for production.
## Prereqs
- Concelier API CCLN0102 available (advisory/linkset endpoints stable).
- Staging Mongo snapshot taken (pre-backfill) and stored at `s3://staging-backups/concelier-pre-lnmbf.gz`.
- NATS/Redis staging brokers reachable.
## Steps
1) Seed snapshot
- Restore staging Mongo from pre-backfill snapshot.
2) Run backfill job
- `dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- --mode=observations --batch-size=500 --max-conflicts=0`
- `dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- --mode=linksets --batch-size=500 --max-conflicts=0`
3) Validate counts
- Compare `advisory_observations_total` and `linksets_total` vs expected inventory; export to `.artifacts/lnm-counts.json`.
- Check conflict log `.artifacts/lnm-conflicts.ndjson` (must be empty).
4) Events/NATS smoke
- Confirm `concelier.lnm.backfill.completed` was emitted and that the Redis/NATS queues have drained.
5) Roll-forward checklist
- Raise batch size to 2000 for prod; keep `--max-conflicts=0`.
- Schedule a maintenance window and ensure the pre-backfill snapshot is available for rollback.
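
The two invocations in step 2 could be wrapped so that each run's runtime is logged for the Outputs section. This is a minimal sketch, assuming a hypothetical `run_timed` helper (not part of the repository) and bash's built-in `SECONDS` counter; throughput would still have to come from the job's own logs or metrics.

```shell
#!/usr/bin/env bash
# Sketch only: run_timed is a hypothetical helper, not part of the repo.
# It runs a command, then logs the label, exit status, and elapsed seconds.
run_timed() {
  local label=$1; shift
  local start=$SECONDS rc=0
  "$@" || rc=$?
  echo "[$label] exit=$rc elapsed=$((SECONDS - start))s" >&2
  return "$rc"
}

# Example use with the commands from step 2 (not executed here):
#   run_timed observations dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- \
#     --mode=observations --batch-size=500 --max-conflicts=0
#   run_timed linksets dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- \
#     --mode=linksets --batch-size=500 --max-conflicts=0
```

The helper returns the wrapped command's exit status, so a failed observations run can still stop the linksets run when the caller uses `set -e` or `&&`.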
## Outputs
- `.artifacts/lnm-counts.json`
- `.artifacts/lnm-conflicts.ndjson` (empty)
- Log of job runtime + throughput.
## Acceptance
- Zero conflicts; counts match expected; events emitted; rollback plan documented.
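
The zero-conflict and count-match criteria above can be checked mechanically. This is a minimal sketch, assuming a hypothetical `check_lnm_acceptance` helper; the expected counts are passed in by the caller because the plan does not say where the expected inventory is stored.

```shell
#!/usr/bin/env bash
# Sketch only: check_lnm_acceptance and its argument order are hypothetical.
# Returns 0 only if the conflict log exists and is empty and both counts match.
check_lnm_acceptance() {
  local conflicts_file=$1 actual_obs=$2 expected_obs=$3 actual_lnk=$4 expected_lnk=$5
  local failed=0
  if [[ ! -f $conflicts_file ]]; then
    # A missing log is treated as a failure: the run should always produce it.
    echo "conflict log $conflicts_file not found" >&2
    failed=1
  elif [[ -s $conflicts_file ]]; then
    echo "conflict log $conflicts_file is not empty" >&2
    failed=1
  fi
  if [[ $actual_obs != "$expected_obs" ]]; then
    echo "observations: got $actual_obs, expected $expected_obs" >&2
    failed=1
  fi
  if [[ $actual_lnk != "$expected_lnk" ]]; then
    echo "linksets: got $actual_lnk, expected $expected_lnk" >&2
    failed=1
  fi
  return "$failed"
}
```

The actual values could be read with `jq .observations .artifacts/lnm-counts.json` and `jq .linksets .artifacts/lnm-counts.json` before calling the helper.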

@@ -0,0 +1,24 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT=${ROOT:-$(cd "$(dirname "$0")/../.." && pwd)}
ARTIFACTS=${ARTIFACTS:-$ROOT/.artifacts}
COUNTS=$ARTIFACTS/lnm-counts.json
CONFLICTS=$ARTIFACTS/lnm-conflicts.ndjson
mkdir -p "$ARTIFACTS"
# --jsonArray makes mongoexport write a single JSON array per file. Without it the
# output is one document per line, and `jq length` prints a value per document
# instead of the document count.
mongoexport --uri "${STAGING_MONGO_URI:?set STAGING_MONGO_URI}" --collection advisoryObservations --db concelier --type=json --jsonArray --query '{}' --out "$ARTIFACTS/obs.json" >/dev/null
mongoexport --uri "${STAGING_MONGO_URI:?set STAGING_MONGO_URI}" --collection linksets --db concelier --type=json --jsonArray --query '{}' --out "$ARTIFACTS/linksets.json" >/dev/null
OBS=$(jq length "$ARTIFACTS/obs.json")
LNK=$(jq length "$ARTIFACTS/linksets.json")
cat > "$COUNTS" <<JSON
{
  "observations": $OBS,
  "linksets": $LNK,
  "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
JSON
touch "$CONFLICTS"
echo "Counts written to $COUNTS; conflicts at $CONFLICTS"
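
`mongoexport` writes one document per line unless `--jsonArray` is given, and `jq length` yields a document count only when the file holds a single top-level array. For line-delimited exports the count can be taken from the lines instead, which also avoids loading a large export into memory. This is a minimal sketch, assuming a hypothetical `count_ndjson` helper:

```shell
#!/usr/bin/env bash
# Sketch only: count_ndjson is a hypothetical helper, not part of the repo.
# Counts non-blank lines, i.e. documents in a one-document-per-line export.
count_ndjson() {
  awk 'NF { n++ } END { print n + 0 }' "$1"
}
```

The same helper would work for a line count of `.artifacts/lnm-conflicts.ndjson`, where the expected result is 0.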

@@ -0,0 +1,11 @@
#!/usr/bin/env bash
set -euo pipefail
DASHBOARD=${1:-ops/devops/lnm/metrics-dashboard.json}
jq . "$DASHBOARD" >/dev/null
REQUIRED=("advisory_observations_total" "linksets_total" "ingest_api_latency_seconds_bucket" "lnm_backfill_processed_total")
for metric in "${REQUIRED[@]}"; do
  if ! grep -qF -- "$metric" "$DASHBOARD"; then
    echo "::error::metric $metric missing from dashboard"; exit 1
  fi
done
echo "dashboard metrics present"
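
A plain substring search would also accept a longer metric name that merely contains a required one (for example a hypothetical `vex_linksets_total` satisfying `linksets_total`). A stricter variant is sketched below, assuming a hypothetical `has_metric` helper and relying on grep's `-w` and `-F` options:

```shell
#!/usr/bin/env bash
# Sketch only: has_metric is a hypothetical helper, not part of the repo.
# -F matches the name literally; -w requires a non-word character (or a line
# edge) on both sides of the match, and grep treats "_" as a word character.
has_metric() {
  local dashboard=$1 metric=$2
  grep -qwF -- "$metric" "$dashboard"
}
```

Expressions such as `rate(lnm_backfill_processed_total[5m])` still match, because `(` and `[` are not word characters.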

@@ -0,0 +1,9 @@
{
  "title": "LNM Backfill Metrics",
  "panels": [
    {"type": "stat", "title": "Observations", "targets": [{"expr": "advisory_observations_total"}]},
    {"type": "stat", "title": "Linksets", "targets": [{"expr": "linksets_total"}]},
    {"type": "graph", "title": "Ingest→API latency p95", "targets": [{"expr": "histogram_quantile(0.95, rate(ingest_api_latency_seconds_bucket[5m]))"}]},
    {"type": "graph", "title": "Backfill throughput", "targets": [{"expr": "rate(lnm_backfill_processed_total[5m])"}]}
  ]
}

@@ -0,0 +1,20 @@
# VEX Backfill Plan (DEVOPS-LNM-22-002)
## Goal
Run VEX observation/linkset backfill with monitoring, ensure events flow via NATS/Redis, and capture run artifacts.
## Steps
1) Pre-checks
- Confirm DEVOPS-LNM-22-001 counts baseline (`.artifacts/lnm-counts.json`).
- Ensure `STAGING_MONGO_URI`, `NATS_URL`, and `REDIS_URL` are set (read-only or test brokers).
2) Run VEX backfill
- `dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- --mode=vex --batch-size=500 --max-conflicts=0 --mongo "$STAGING_MONGO_URI" --nats "$NATS_URL" --redis "$REDIS_URL"`
3) Metrics capture
- Export per-run metrics to `.artifacts/vex-backfill-metrics.json` (duration, processed, conflicts, events emitted).
4) Event verification
- Subscribe to `concelier.vex.backfill.completed` and `concelier.linksets.vex.upserted`; confirm both are observed and that the queues have drained.
5) Roll-forward checklist
- Raise batch size to 2000 for prod; keep `--max-conflicts=0`; schedule a maintenance window.
## Acceptance
- Zero conflicts; events observed; metrics file present; rollback plan documented.
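
The metrics capture in step 3 names four quantities but no schema. This is a minimal sketch of writing them to `.artifacts/vex-backfill-metrics.json`, assuming a hypothetical `write_vex_metrics` helper and hypothetical field names; the values themselves would have to come from the job's output.

```shell
#!/usr/bin/env bash
# Sketch only: write_vex_metrics and the JSON field names are hypothetical;
# the plan names the quantities but does not define a schema.
write_vex_metrics() {
  local out=$1 duration=$2 processed=$3 conflicts=$4 events=$5
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<JSON
{
  "duration_seconds": $duration,
  "processed": $processed,
  "conflicts": $conflicts,
  "events_emitted": $events,
  "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
JSON
}

# Example (values are placeholders):
#   write_vex_metrics .artifacts/vex-backfill-metrics.json 120 5000 0 2
```

The layout follows the counts file written for DEVOPS-LNM-22-001, so both artifacts can be read with the same tooling.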