# NKCKI Connector Operations Guide

## Overview

The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.
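
To illustrate the expansion step, here is a minimal Python sketch of turning one `*.json.zip` attachment into individual records. It is not the connector's own code: the function name `expand_bulletin_archive` is hypothetical, and the sketch assumes each archive member is a UTF-8 JSON file holding either a single object or a list of objects.

```python
import io
import json
import zipfile


def expand_bulletin_archive(archive_bytes: bytes) -> list:
    """Return every JSON record found in a *.json.zip archive (illustrative only)."""
    records = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # Sort member names so the output order is deterministic across runs.
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".json"):
                continue
            payload = json.loads(archive.read(name).decode("utf-8"))
            # Assumption: a member holds either one object or a list of objects.
            if isinstance(payload, list):
                records.extend(payload)
            else:
                records.append(payload)
    return records
```

Sorting the member names is one simple way to keep the output order stable, which matters if downstream mapping is expected to be deterministic.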

## Configuration

Key options exposed through `concelier:sources:ru-nkcki:http`:

- `maxBulletinsPerFetch` – limits new bulletin downloads in a single run (default `5`).
- `maxListingPagesPerFetch` – maximum listing pages visited during pagination (default `3`).
- `listingCacheDuration` – minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`).
- `cacheDirectory` – optional path for persisted bulletin archives used during offline or failure scenarios.
- `requestDelay` – delay inserted between bulletin downloads to respect upstream politeness.

When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/concelier/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit.
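
The options and documented defaults above can be modelled as follows. This is a Python sketch for illustration, not the connector's actual options class: `NkckiHttpOptions`, `parse_duration`, and `load_options` are hypothetical names, and the default for `requestDelay` is left unset because this guide does not state it.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


def parse_duration(text: str) -> timedelta:
    """Parse an hh:mm:ss string such as the documented 00:10:00."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


@dataclass(frozen=True)
class NkckiHttpOptions:
    # Defaults below are the ones stated in this guide.
    max_bulletins_per_fetch: int = 5
    max_listing_pages_per_fetch: int = 3
    listing_cache_duration: timedelta = timedelta(minutes=10)
    cache_directory: Optional[str] = None
    # The guide does not give a default for requestDelay, so none is assumed.
    request_delay: Optional[timedelta] = None


def load_options(raw: dict) -> NkckiHttpOptions:
    """Build options from a flat dict keyed by the documented option names."""
    defaults = NkckiHttpOptions()
    duration = raw.get("listingCacheDuration")
    delay = raw.get("requestDelay")
    return NkckiHttpOptions(
        max_bulletins_per_fetch=int(
            raw.get("maxBulletinsPerFetch", defaults.max_bulletins_per_fetch)
        ),
        max_listing_pages_per_fetch=int(
            raw.get("maxListingPagesPerFetch", defaults.max_listing_pages_per_fetch)
        ),
        listing_cache_duration=(
            parse_duration(duration) if duration else defaults.listing_cache_duration
        ),
        cache_directory=raw.get("cacheDirectory"),
        request_delay=parse_duration(delay) if delay else None,
    )
```

For an offline-first setup, `load_options({"cacheDirectory": "/var/lib/concelier/cache/ru-nkcki"})` keeps the documented defaults and only sets the cache path.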

## Telemetry

`RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Concelier.Connector.Ru.Nkcki`:

- `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures`
- `nkcki.listing.pages.visited` (histogram, `pages`)
- `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new`
- `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures`
- `nkcki.entries.processed` (histogram, `entries`)

Integrate these metrics into standard Concelier observability dashboards to track crawl coverage and cache hit rates.
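
As one possible dashboard derivation, a cache hit rate can be computed from the three `nkcki.bulletin.fetch.*` values. The Python sketch below assumes the `success`, `cached`, and `failures` counters are disjoint (each bulletin fetch increments exactly one of them); this guide does not confirm that, so check the emitted values before relying on the ratio. The function name is hypothetical.

```python
from typing import Optional


def bulletin_cache_hit_rate(success: int, cached: int, failures: int) -> Optional[float]:
    """Share of bulletin fetches served from cache, or None when nothing was fetched.

    Assumption: the three counters are disjoint, so their sum is the total
    number of bulletin fetch outcomes in the measured window.
    """
    total = success + cached + failures
    if total == 0:
        return None
    return cached / total
```

Returning `None` for an empty window avoids reporting a misleading 0% hit rate when no bulletins were fetched at all.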

## Archive Backfill Strategy

Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy:

1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms.
2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`.
4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/concelier-connector-research-20251011.md`).

For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.
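
The page walk and deduplication described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the connector's implementation: `walk_listing`, `listing_page_url`, the injected `fetch_page` callable, and the `window_size` parameter are all hypothetical, the base URL is assumed to carry no existing query string, and page 1 is assumed to be the newest listing page.

```python
from collections import OrderedDict
from typing import Callable, Iterable, List


def listing_page_url(base_url: str, page: int) -> str:
    """Build a Bitrix-style listing URL using the PAGEN_1 query parameter."""
    return f"{base_url}?PAGEN_1={page}"


def walk_listing(
    fetch_page: Callable[[str], Iterable[str]],
    base_url: str,
    max_pages: int,
    known: "OrderedDict[str, None]",
    window_size: int,
) -> List[str]:
    """Visit up to max_pages pages and return bulletin IDs not seen before.

    `known` plays the role of a rolling window of known bulletin IDs: new IDs
    are appended and the oldest are evicted once window_size is exceeded.
    """
    new_ids: List[str] = []
    seen_this_run = set()
    for page in range(1, max_pages + 1):  # assumption: page 1 is the newest
        for bulletin_id in fetch_page(listing_page_url(base_url, page)):
            if bulletin_id in known or bulletin_id in seen_this_run:
                continue  # duplicate across pages or across runs
            seen_this_run.add(bulletin_id)
            new_ids.append(bulletin_id)
            known[bulletin_id] = None
            while len(known) > window_size:
                known.popitem(last=False)  # evict the oldest known ID
    return new_ids
```

Injecting `fetch_page` keeps the walk testable without network access, which also suits the offline replay scenario.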

## Failure Handling

- Listing failures mark the source state with exponential backoff while attempting cache replay.
- Bulletin fetches fall back to cached copies before surfacing an error.
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros.
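
The first two behaviours above can be sketched in Python as follows. This is not the connector's code: the function names, the injected `download` callable, and the backoff base and cap values are hypothetical, since this guide states only that the backoff is exponential and that cached copies are tried before an error surfaces.

```python
from pathlib import Path
from typing import Callable, Tuple


def backoff_delay_seconds(
    consecutive_failures: int,
    base_seconds: float = 60.0,   # hypothetical base delay
    cap_seconds: float = 3600.0,  # hypothetical upper bound
) -> float:
    """Exponential backoff: the delay doubles per consecutive failure, up to a cap."""
    if consecutive_failures <= 0:
        return 0.0
    return min(cap_seconds, base_seconds * 2 ** (consecutive_failures - 1))


def fetch_bulletin(
    download: Callable[[str], bytes],
    cache_dir: Path,
    file_name: str,
) -> Tuple[bytes, bool]:
    """Return (archive bytes, served_from_cache).

    A successful download is persisted to the cache directory. On a download
    error the cached copy is returned if one exists; otherwise the error
    propagates to the caller.
    """
    cached_path = cache_dir / file_name
    try:
        data = download(file_name)
    except OSError:
        if cached_path.exists():
            return cached_path.read_bytes(), True
        raise
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached_path.write_bytes(data)
    return data, False
```

Capping the delay keeps a long outage from pushing the next listing attempt arbitrarily far into the future.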

Refer to `ru-nkcki` entries in `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md` for outstanding items.