Files
git.stella-ops.org/docs/ops/concelier-nkcki-operations.md

3.2 KiB
Raw Blame History

NKCKI Connector Operations Guide

Overview

The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each *.json.zip attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.

Configuration

Key options exposed through concelier:sources:ru-nkcki:http:

  • maxBulletinsPerFetch limits new bulletin downloads in a single run (default 5).
  • maxListingPagesPerFetch maximum listing pages visited during pagination (default 3).
  • listingCacheDuration minimum interval between listing fetches before falling back to cached artefacts (default 00:10:00).
  • cacheDirectory optional path for persisted bulletin archives used during offline or failure scenarios.
  • requestDelay delay inserted between bulletin downloads to respect upstream politeness.

When operating in offline-first mode, set cacheDirectory to a writable path (e.g. /var/lib/concelier/cache/ru-nkcki) and pre-populate bulletin archives via the offline kit.

Telemetry

RuNkckiDiagnostics emits the following metrics under meter StellaOps.Concelier.Connector.Ru.Nkcki:

  • nkcki.listing.fetch.attempts / nkcki.listing.fetch.success / nkcki.listing.fetch.failures
  • nkcki.listing.pages.visited (histogram, pages)
  • nkcki.listing.attachments.discovered / nkcki.listing.attachments.new
  • nkcki.bulletin.fetch.success / nkcki.bulletin.fetch.cached / nkcki.bulletin.fetch.failures
  • nkcki.entries.processed (histogram, entries)

Integrate these counters into standard Concelier observability dashboards to track crawl coverage and cache hit rates.

Archive Backfill Strategy

Bitrix pagination surfaces archives via ?PAGEN_1=n. The connector now walks up to maxListingPagesPerFetch pages, deduplicating bulletin IDs and maintaining a rolling knownBulletins window. Backfill strategy:

  1. Enumerate pages from newest to oldest, respecting maxListingPagesPerFetch and listingCacheDuration to avoid refetch storms.
  2. Persist every *.json.zip attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
  3. During archive replay, ProcessCachedBulletinsAsync enqueues missing documents while respecting maxVulnerabilitiesPerFetch.
  4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in docs/concelier-connector-research-20251011.md).

For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.

Failure Handling

  • Listing failures mark the source state with exponential backoff while attempting cache replay.
  • Bulletin fetches fall back to cached copies before surfacing an error.
  • Mongo integration tests rely on bundled OpenSSL 1.1 libraries (tools/openssl/linux-x64) to keep Mongo2Go operational on modern distros.

Refer to ru-nkcki entries in src/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md for outstanding items.