# NKCKI Connector Operations Guide ## Overview The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring. ## Configuration Key options exposed through `concelier:sources:ru-nkcki:http`: - `maxBulletinsPerFetch` – limits new bulletin downloads in a single run (default `5`). - `maxListingPagesPerFetch` – maximum listing pages visited during pagination (default `3`). - `listingCacheDuration` – minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`). - `cacheDirectory` – optional path for persisted bulletin archives used during offline or failure scenarios. - `requestDelay` – delay inserted between bulletin downloads to respect upstream politeness. When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/concelier/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit. ## Telemetry `RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Concelier.Connector.Ru.Nkcki`: - `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures` - `nkcki.listing.pages.visited` (histogram, `pages`) - `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new` - `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures` - `nkcki.entries.processed` (histogram, `entries`) Integrate these counters into standard Concelier observability dashboards to track crawl coverage and cache hit rates. ## Archive Backfill Strategy Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy: 1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms. 2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked. 3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`. 4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/concelier-connector-research-20251011.md`). For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs. ## Failure Handling - Listing failures mark the source state with exponential backoff while attempting cache replay. - Bulletin fetches fall back to cached copies before surfacing an error. - Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros. Refer to `ru-nkcki` entries in `src/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md` for outstanding items.