git.stella-ops.org/docs/ops/feedser-nkcki-operations.md

# NKCKI Connector Operations Guide

## Overview

The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.

## Configuration

Key options exposed through `feedser:sources:ru-nkcki:http`:

- `maxBulletinsPerFetch` – limits new bulletin downloads in a single run (default `5`).
- `maxListingPagesPerFetch` – maximum listing pages visited during pagination (default `3`).
- `listingCacheDuration` – minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`).
- `cacheDirectory` – optional path for persisted bulletin archives used during offline or failure scenarios.
- `requestDelay` – delay inserted between bulletin downloads to respect upstream politeness.

When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/feedser/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit.

## Telemetry

`RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Feedser.Source.Ru.Nkcki`:

- `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures`
- `nkcki.listing.pages.visited` (histogram, `pages`)
- `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new`
- `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures`
- `nkcki.entries.processed` (histogram, `entries`)

Integrate these counters into standard Feedser observability dashboards to track crawl coverage and cache hit rates.

## Archive Backfill Strategy

Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy:

1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms.
2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`.
4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/feedser-connector-research-20251011.md`).

For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.

## Failure Handling

- Listing failures mark the source state with exponential backoff while attempting cache replay.
- Bulletin fetches fall back to cached copies before surfacing an error.
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros.

Refer to `ru-nkcki` entries in `src/StellaOps.Feedser.Source.Ru.Nkcki/TASKS.md` for outstanding items.