CD/CD consolidation
57
devops/tools/lnm/alerts/lnm-alerts.yaml
Normal file
@@ -0,0 +1,57 @@
# LNM Migration Alert Rules
# Prometheus alerting rules for linkset/advisory migrations

groups:
  - name: lnm-migration
    rules:
      - alert: LnmMigrationErrorRate
        expr: rate(lnm_migration_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
          team: concelier
        annotations:
          summary: "LNM migration error rate elevated"
          description: "Migration errors: {{ $value | printf \"%.2f\" }}/s"

      - alert: LnmBackfillStalled
        expr: increase(lnm_backfill_processed_total[10m]) == 0 and lnm_backfill_running == 1
        for: 10m
        labels:
          severity: critical
          team: concelier
        annotations:
          summary: "LNM backfill stalled"
          description: "No progress in 10 minutes while backfill is running"

      - alert: LnmLinksetCountMismatch
        expr: abs(lnm_linksets_total - lnm_linksets_expected) > 100
        for: 15m
        labels:
          severity: warning
          team: concelier
        annotations:
          summary: "Linkset count mismatch"
          description: "Linkset count differs from expected by {{ $value }}"

      - alert: LnmObservationsBacklogHigh
        expr: lnm_observations_backlog > 10000
        for: 5m
        labels:
          severity: warning
          team: excititor
        annotations:
          summary: "Advisory observations backlog high"
          description: "Backlog: {{ $value }} items"

  - name: lnm-sla
    rules:
      - alert: LnmIngestToApiLatencyHigh
        expr: histogram_quantile(0.95, rate(lnm_ingest_to_api_latency_seconds_bucket[5m])) > 30
        for: 10m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "Ingest to API latency exceeds SLA"
          description: "P95 latency: {{ $value | printf \"%.1f\" }}s (SLA: 30s)"
32
devops/tools/lnm/backfill-plan.md
Normal file
@@ -0,0 +1,32 @@
# LNM Backfill Plan (DEVOPS-LNM-22-001)

## Goal
Run staging backfill for advisory observations/linksets, validate counts/conflicts, and document rollout steps for production.

## Prereqs
- Concelier API CCLN0102 available (advisory/linkset endpoints stable).
- Staging Mongo snapshot taken (pre-backfill) and stored at `s3://staging-backups/concelier-pre-lnmbf.gz`.
- NATS/Redis staging brokers reachable.

## Steps
1) Seed snapshot
   - Restore staging Mongo from pre-backfill snapshot.
2) Run backfill job
   - `dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- --mode=observations --batch-size=500 --max-conflicts=0`
   - `dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- --mode=linksets --batch-size=500 --max-conflicts=0`
3) Validate counts
   - Compare `advisory_observations_total` and `linksets_total` vs expected inventory; export to `.artifacts/lnm-counts.json`.
   - Check conflict log `.artifacts/lnm-conflicts.ndjson` (must be empty).
4) Events/NATS smoke
   - Ensure `concelier.lnm.backfill.completed` emitted; verify Redis/NATS queues drained.
5) Roll-forward checklist
   - Promote batch size to 2000 for prod, keep `--max-conflicts=0`.
   - Schedule maintenance window, ensure snapshot available for rollback.

## Outputs
- `.artifacts/lnm-counts.json`
- `.artifacts/lnm-conflicts.ndjson` (empty)
- Log of job runtime + throughput.

## Acceptance
- Zero conflicts; counts match expected; events emitted; rollback plan documented.
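
The count and conflict checks in step 3 could be turned into a small gate. The sketch below is an illustration only and not part of this commit: the function names `conflicts_empty` and `counts_match` are hypothetical, and the expected inventory counts are assumed to be supplied by the operator.

```bash
#!/usr/bin/env bash
# Hypothetical gate for step 3; function names are illustrative, not part of the repo.
set -euo pipefail

# Succeed only when the conflict log is absent or empty (zero conflicts).
conflicts_empty() {
  [[ ! -s "$1" ]]
}

# Succeed only when an actual count equals the expected inventory count.
counts_match() {
  local label="$1" actual="$2" expected="$3"
  if (( actual != expected )); then
    echo "mismatch: $label actual=$actual expected=$expected" >&2
    return 1
  fi
  echo "ok: $label=$actual"
}
```

A possible invocation, assuming `lnm-counts.json` carries an `observations` key and `EXPECTED_OBS` is a hypothetical variable set by the operator: `counts_match observations "$(jq .observations .artifacts/lnm-counts.json)" "$EXPECTED_OBS" && conflicts_empty .artifacts/lnm-conflicts.ndjson`.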
24
devops/tools/lnm/backfill-validation.sh
Normal file
@@ -0,0 +1,24 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT=${ROOT:-$(cd "$(dirname "$0")/../.." && pwd)}
ARTifacts=${ARTifacts:-$ROOT/.artifacts}
COUNTS=$ARTifacts/lnm-counts.json
CONFLICTS=$ARTifacts/lnm-conflicts.ndjson
mkdir -p "$ARTifacts"

mongoexport --uri "${STAGING_MONGO_URI:?set STAGING_MONGO_URI}" --collection advisoryObservations --db concelier --type=json --query '{}' --out "$ARTifacts/obs.json" >/dev/null
mongoexport --uri "${STAGING_MONGO_URI:?set STAGING_MONGO_URI}" --collection linksets --db concelier --type=json --query '{}' --out "$ARTifacts/linksets.json" >/dev/null

OBS=$(jq -s length "$ARTifacts/obs.json")  # export is one document per line; -s slurps them so length is the document count
LNK=$(jq -s length "$ARTifacts/linksets.json")

cat > "$COUNTS" <<JSON
{
  "observations": $OBS,
  "linksets": $LNK,
  "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
JSON

touch "$CONFLICTS"
echo "Counts written to $COUNTS; conflicts at $CONFLICTS"
51
devops/tools/lnm/dashboards/lnm-migration.json
Normal file
@@ -0,0 +1,51 @@
{
  "dashboard": {
    "title": "LNM Migration Dashboard",
    "uid": "lnm-migration",
    "tags": ["lnm", "migration", "concelier", "excititor"],
    "timezone": "utc",
    "refresh": "30s",
    "panels": [
      {
        "title": "Migration Progress",
        "type": "stat",
        "gridPos": {"x": 0, "y": 0, "w": 6, "h": 4},
        "targets": [
          {"expr": "lnm_backfill_processed_total", "legendFormat": "Processed"}
        ]
      },
      {
        "title": "Error Rate",
        "type": "graph",
        "gridPos": {"x": 6, "y": 0, "w": 12, "h": 4},
        "targets": [
          {"expr": "rate(lnm_migration_errors_total[5m])", "legendFormat": "Errors/s"}
        ]
      },
      {
        "title": "Linksets Total",
        "type": "stat",
        "gridPos": {"x": 18, "y": 0, "w": 6, "h": 4},
        "targets": [
          {"expr": "lnm_linksets_total", "legendFormat": "Total"}
        ]
      },
      {
        "title": "Observations Backlog",
        "type": "graph",
        "gridPos": {"x": 0, "y": 4, "w": 12, "h": 6},
        "targets": [
          {"expr": "lnm_observations_backlog", "legendFormat": "Backlog"}
        ]
      },
      {
        "title": "Ingest to API Latency (P95)",
        "type": "graph",
        "gridPos": {"x": 12, "y": 4, "w": 12, "h": 6},
        "targets": [
          {"expr": "histogram_quantile(0.95, rate(lnm_ingest_to_api_latency_seconds_bucket[5m]))", "legendFormat": "P95"}
        ]
      }
    ]
  }
}
11
devops/tools/lnm/metrics-ci-check.sh
Normal file
@@ -0,0 +1,11 @@
#!/usr/bin/env bash
set -euo pipefail
DASHBOARD=${1:-"$(dirname "$0")/metrics-dashboard.json"}  # default: the dashboard that sits next to this script
jq . "$DASHBOARD" >/dev/null
REQUIRED=("advisory_observations_total" "linksets_total" "ingest_api_latency_seconds_bucket" "lnm_backfill_processed_total")
for metric in "${REQUIRED[@]}"; do
  if ! grep -qF -- "$metric" "$DASHBOARD"; then
    echo "::error::metric $metric missing from dashboard"; exit 1
  fi
done
echo "dashboard metrics present"
9
devops/tools/lnm/metrics-dashboard.json
Normal file
@@ -0,0 +1,9 @@
{
  "title": "LNM Backfill Metrics",
  "panels": [
    {"type": "stat", "title": "Observations", "targets": [{"expr": "advisory_observations_total"}]},
    {"type": "stat", "title": "Linksets", "targets": [{"expr": "linksets_total"}]},
    {"type": "graph", "title": "Ingest→API latency p95", "targets": [{"expr": "histogram_quantile(0.95, rate(ingest_api_latency_seconds_bucket[5m]))"}]},
    {"type": "graph", "title": "Backfill throughput", "targets": [{"expr": "rate(lnm_backfill_processed_total[5m])"}]}
  ]
}
92
devops/tools/lnm/package-runner.sh
Normal file
@@ -0,0 +1,92 @@
#!/usr/bin/env bash
# Package LNM migration runner for release/offline kit
# Usage: ./package-runner.sh
# Dev mode: COSIGN_ALLOW_DEV_KEY=1 COSIGN_PASSWORD=stellaops-dev ./package-runner.sh

set -euo pipefail

ROOT=$(cd "$(dirname "$0")/../../.." && pwd)
OUT_DIR="${OUT_DIR:-$ROOT/out/lnm}"
CREATED="${CREATED:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"

mkdir -p "$OUT_DIR/runner"

echo "==> LNM Migration Runner Packaging"

# Key resolution
resolve_key() {
  if [[ -n "${COSIGN_PRIVATE_KEY_B64:-}" ]]; then
    local tmp_key="$OUT_DIR/.cosign.key"
    echo "$COSIGN_PRIVATE_KEY_B64" | base64 -d > "$tmp_key"
    chmod 600 "$tmp_key"
    echo "$tmp_key"
  elif [[ -f "$ROOT/tools/cosign/cosign.key" ]]; then
    echo "$ROOT/tools/cosign/cosign.key"
  elif [[ "${COSIGN_ALLOW_DEV_KEY:-0}" == "1" && -f "$ROOT/tools/cosign/cosign.dev.key" ]]; then
    echo "[info] Using development key" >&2
    echo "$ROOT/tools/cosign/cosign.dev.key"
  else
    echo ""
  fi
}

# Build migration runner if project exists
MIGRATION_PROJECT="$ROOT/src/Concelier/__Libraries/StellaOps.Concelier.Migrations/StellaOps.Concelier.Migrations.csproj"
if [[ -f "$MIGRATION_PROJECT" ]]; then
  echo "==> Building migration runner..."
  dotnet publish "$MIGRATION_PROJECT" -c Release -o "$OUT_DIR/runner" --no-restore 2>/dev/null || \
    echo "[info] Build skipped (may need restore or project doesn't exist yet)"
else
  echo "[info] Migration project not found; creating placeholder"
  cat > "$OUT_DIR/runner/README.txt" <<EOF
LNM Migration Runner Placeholder
Build from: src/Concelier/__Libraries/StellaOps.Concelier.Migrations/
Created: $CREATED
Status: Awaiting upstream migration project
EOF
fi

# Create runner bundle
echo "==> Creating runner bundle..."
RUNNER_TAR="$OUT_DIR/lnm-migration-runner.tar.gz"
tar -czf "$RUNNER_TAR" -C "$OUT_DIR/runner" .

# Compute hash
sha256() { sha256sum "$1" | awk '{print $1}'; }
RUNNER_HASH=$(sha256 "$RUNNER_TAR")

# Generate manifest
MANIFEST="$OUT_DIR/lnm-migration-runner.manifest.json"
cat > "$MANIFEST" <<EOF
{
  "schemaVersion": "1.0.0",
  "created": "$CREATED",
  "runner": {
    "path": "lnm-migration-runner.tar.gz",
    "sha256": "$RUNNER_HASH"
  },
  "migrations": {
    "22-001": {"status": "infrastructure-ready", "description": "Advisory observations/linksets staging"},
    "22-002": {"status": "infrastructure-ready", "description": "VEX observation/linkset backfill"},
    "22-003": {"status": "infrastructure-ready", "description": "Metrics monitoring"}
  }
}
EOF

# Sign if key available
KEY_FILE=$(resolve_key)
if [[ -n "$KEY_FILE" ]] && command -v cosign &>/dev/null; then
  echo "==> Signing bundle..."
  COSIGN_PASSWORD="${COSIGN_PASSWORD:-}" cosign sign-blob \
    --key "$KEY_FILE" \
    --bundle "$OUT_DIR/lnm-migration-runner.dsse.json" \
    --tlog-upload=false --yes "$RUNNER_TAR" 2>/dev/null || true
fi

# Generate checksums
cd "$OUT_DIR"
sha256sum lnm-migration-runner.tar.gz lnm-migration-runner.manifest.json > SHA256SUMS

echo "==> LNM runner packaging complete"
echo " Bundle: $RUNNER_TAR"
echo " Manifest: $MANIFEST"
53
devops/tools/lnm/tooling-infrastructure.md
Normal file
@@ -0,0 +1,53 @@
# LNM (Link-Not-Merge) Tooling Infrastructure

## Scope (DEVOPS-LNM-TOOLING-22-000)
Packaging and tooling for linkset/advisory migrations across Concelier and Excititor.

## Components

### 1. Migration Runner
Location: `src/Concelier/__Libraries/StellaOps.Concelier.Migrations/`

```bash
# Build migration runner
dotnet publish src/Concelier/__Libraries/StellaOps.Concelier.Migrations \
  -c Release -o out/lnm/runner

# Package
./devops/tools/lnm/package-runner.sh
```

### 2. Backfill Tool
Location: `src/Concelier/StellaOps.Concelier.Backfill/` (when available)

```bash
# Dev mode backfill with sample data
COSIGN_ALLOW_DEV_KEY=1 ./ops/devops/lnm/run-backfill.sh --dry-run

# Production backfill
./ops/devops/lnm/run-backfill.sh --batch-size=500
```

### 3. Monitoring Dashboard
- Grafana dashboard: `devops/tools/lnm/dashboards/lnm-migration.json`
- Alert rules: `devops/tools/lnm/alerts/lnm-alerts.yaml`

## CI Workflows

| Workflow | Purpose |
|----------|---------|
| `lnm-migration-ci.yml` | Build/test migration runner |
| `lnm-backfill-staging.yml` | Run backfill in staging |
| `lnm-metrics-ci.yml` | Validate migration metrics |

## Outputs
- `out/lnm/runner/` - Migration runner binaries
- `out/lnm/backfill-report.json` - Backfill results
- `out/lnm/SHA256SUMS` - Checksums
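
A consumer of the offline kit could check the packaged files against `SHA256SUMS` before use. The helper below is a minimal sketch, not something shipped in this commit: `verify_bundle` is a hypothetical name, and it assumes GNU `sha256sum` is available, as `package-runner.sh` already does.

```bash
#!/usr/bin/env bash
# Hypothetical helper; verify_bundle is an illustrative name, not part of the repo.
set -euo pipefail

# Verify every file listed in SHA256SUMS inside the given output directory.
verify_bundle() {
  local out_dir="${1:-out/lnm}"
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$out_dir" && sha256sum --check --quiet SHA256SUMS )
}
```

Calling `verify_bundle out/lnm` returns non-zero when any listed file is missing or has been modified since packaging.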

## Status
- [x] Infrastructure plan created
- [ ] Migration runner project (awaiting upstream)
- [ ] Backfill tool (awaiting upstream)
- [x] CI workflow templates ready
- [x] Monitoring templates ready
20
devops/tools/lnm/vex-backfill-plan.md
Normal file
@@ -0,0 +1,20 @@
# VEX Backfill Plan (DEVOPS-LNM-22-002)

## Goal
Run VEX observation/linkset backfill with monitoring, ensure events flow via NATS/Redis, and capture run artifacts.

## Steps
1) Pre-checks
   - Confirm DEVOPS-LNM-22-001 counts baseline (`.artifacts/lnm-counts.json`).
   - Ensure `STAGING_MONGO_URI`, `NATS_URL`, `REDIS_URL` available (read-only or test brokers).
2) Run VEX backfill
   - `dotnet run --project src/Concelier/StellaOps.Concelier.Backfill -- --mode=vex --batch-size=500 --max-conflicts=0 --mongo "$STAGING_MONGO_URI" --nats "$NATS_URL" --redis "$REDIS_URL"`
3) Metrics capture
   - Export per-run metrics to `.artifacts/vex-backfill-metrics.json` (duration, processed, conflicts, events emitted).
4) Event verification
   - Subscribe to `concelier.vex.backfill.completed` and `concelier.linksets.vex.upserted`; ensure queues drained.
5) Roll-forward checklist
   - Increase batch size to 2000 for prod; keep conflicts = 0; schedule maintenance window.

## Acceptance
- Zero conflicts; events observed; metrics file present; rollback plan documented.
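
The metrics capture in step 3 might be as simple as writing one JSON record per run. The following is a sketch under stated assumptions rather than the backfill tool's actual output: `write_vex_metrics` is a hypothetical function, and the field names `durationSeconds`, `processed`, `conflicts` and `eventsEmitted` are invented here to mirror the four values the plan lists.

```bash
#!/usr/bin/env bash
# Hypothetical sketch for step 3; the JSON field names are assumptions, not a schema from the repo.
set -euo pipefail

# Write one per-run metrics record as a single JSON object.
write_vex_metrics() {
  local out="$1" duration_s="$2" processed="$3" conflicts="$4" events="$5"
  mkdir -p "$(dirname "$out")"
  # %d makes printf return non-zero on non-numeric input, so the caller sees a failure.
  printf '{"durationSeconds": %d, "processed": %d, "conflicts": %d, "eventsEmitted": %d}\n' \
    "$duration_s" "$processed" "$conflicts" "$events" > "$out"
}
```

For example, a wrapper around the backfill command could end with `write_vex_metrics .artifacts/vex-backfill-metrics.json "$SECONDS" "$processed" "$conflicts" "$events"`, where `SECONDS` is the bash builtin timer and the other three variables are hypothetical values parsed from the job log.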