3.2 KiB
NKCKI Connector Operations Guide
Overview
The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each *.json.zip attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.
Configuration
Key options exposed through concelier:sources:ru-nkcki:http:
maxBulletinsPerFetch– limits new bulletin downloads in a single run (default5).maxListingPagesPerFetch– maximum listing pages visited during pagination (default3).listingCacheDuration– minimum interval between listing fetches before falling back to cached artefacts (default00:10:00).cacheDirectory– optional path for persisted bulletin archives used during offline or failure scenarios.requestDelay– delay inserted between bulletin downloads to respect upstream politeness.
When operating in offline-first mode, set cacheDirectory to a writable path (e.g. /var/lib/concelier/cache/ru-nkcki) and pre-populate bulletin archives via the offline kit.
Telemetry
RuNkckiDiagnostics emits the following metrics under meter StellaOps.Concelier.Connector.Ru.Nkcki:
nkcki.listing.fetch.attempts/nkcki.listing.fetch.success/nkcki.listing.fetch.failuresnkcki.listing.pages.visited(histogram,pages)nkcki.listing.attachments.discovered/nkcki.listing.attachments.newnkcki.bulletin.fetch.success/nkcki.bulletin.fetch.cached/nkcki.bulletin.fetch.failuresnkcki.entries.processed(histogram,entries)
Integrate these counters into standard Concelier observability dashboards to track crawl coverage and cache hit rates.
Archive Backfill Strategy
Bitrix pagination surfaces archives via ?PAGEN_1=n. The connector now walks up to maxListingPagesPerFetch pages, deduplicating bulletin IDs and maintaining a rolling knownBulletins window. Backfill strategy:
- Enumerate pages from newest to oldest, respecting
maxListingPagesPerFetchandlistingCacheDurationto avoid refetch storms. - Persist every
*.json.zipattachment to the configured cache directory. This enables replay when listing access is temporarily blocked. - During archive replay,
ProcessCachedBulletinsAsyncenqueues missing documents while respectingmaxVulnerabilitiesPerFetch. - For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in
docs/concelier-connector-research-20251011.md).
For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.
Failure Handling
- Listing failures mark the source state with exponential backoff while attempting cache replay.
- Bulletin fetches fall back to cached copies before surfacing an error.
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (
tools/openssl/linux-x64) to keepMongo2Gooperational on modern distros.
Refer to ru-nkcki entries in src/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md for outstanding items.