# NKCKI Connector Operations Guide
## Overview
The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.
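The archive expansion step described in the overview can be sketched in Python as follows. This is a minimal illustration, not the connector's implementation: the function name `iter_bulletin_entries` is hypothetical, and it assumes each `.json` member of the archive holds either a list of entry objects or a single entry object.

```python
import io
import json
import zipfile
from typing import Any, Dict, Iterator


def iter_bulletin_entries(archive_bytes: bytes) -> Iterator[Dict[str, Any]]:
    """Yield one dict per vulnerability entry found in a *.json.zip archive.

    Assumption: every .json member contains a list of entry objects or a
    single entry object. The real bulletin schema is not described here.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # Sort member names so the output order is deterministic.
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".json"):
                continue
            # utf-8-sig tolerates an optional byte-order mark.
            payload = json.loads(archive.read(name).decode("utf-8-sig"))
            entries = payload if isinstance(payload, list) else [payload]
            for entry in entries:
                if isinstance(entry, dict):
                    yield entry
```

Sorting member names keeps the entry order stable across runs, which matters when downstream mapping is expected to be deterministic.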
## Configuration
Key options exposed through `concelier:sources:ru-nkcki:http`:
- `maxBulletinsPerFetch` – limits new bulletin downloads in a single run (default `5`).
- `maxListingPagesPerFetch` – maximum listing pages visited during pagination (default `3`).
- `listingCacheDuration` – minimum interval between live listing fetches; within this interval the connector falls back to cached artefacts (default `00:10:00`).
- `cacheDirectory` – optional path for persisted bulletin archives used during offline or failure scenarios.
- `requestDelay` – delay inserted between bulletin downloads to avoid overloading the upstream server.

When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/concelier/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit.
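To make the option set and its documented defaults concrete, here is a small Python model of the settings. It is only a sketch: the class name `NkckiHttpOptions`, the helper `parse_timespan`, and the snake_case field names are hypothetical, and the zero default for `request_delay` is an assumption because no default is stated above.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


def parse_timespan(value: str) -> timedelta:
    """Parse an hh:mm:ss string such as 00:10:00 into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


@dataclass(frozen=True)
class NkckiHttpOptions:
    # Defaults for the first three fields come from the option list above.
    max_bulletins_per_fetch: int = 5
    max_listing_pages_per_fetch: int = 3
    listing_cache_duration: timedelta = parse_timespan("00:10:00")
    # Optional; None means no persisted bulletin cache.
    cache_directory: Optional[str] = None
    # Assumption: the guide does not state a default for requestDelay.
    request_delay: timedelta = timedelta(0)

    def validate(self) -> None:
        """Reject values that would make a fetch run meaningless."""
        if self.max_bulletins_per_fetch <= 0:
            raise ValueError("max_bulletins_per_fetch must be positive")
        if self.max_listing_pages_per_fetch <= 0:
            raise ValueError("max_listing_pages_per_fetch must be positive")
        if self.listing_cache_duration < timedelta(0):
            raise ValueError("listing_cache_duration must not be negative")
        if self.request_delay < timedelta(0):
            raise ValueError("request_delay must not be negative")
```

For an offline-first setup, the same model would be constructed with `cache_directory="/var/lib/concelier/cache/ru-nkcki"`.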
## Telemetry
`RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Concelier.Connector.Ru.Nkcki`:
- `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures`
- `nkcki.listing.pages.visited` (histogram, `pages`)
- `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new`
- `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures`
- `nkcki.entries.processed` (histogram, `entries`)

Integrate these metrics into the standard Concelier observability dashboards to track crawl coverage and cache hit rates.
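As one example of a dashboard calculation, a bulletin cache hit rate could be derived from the three `nkcki.bulletin.fetch.*` counters. The sketch below assumes the counters record disjoint outcomes (a fetch is counted as exactly one of success, cached, or failure); that is an assumption, not something stated above, and the function name is hypothetical.

```python
from typing import Mapping, Optional


def bulletin_cache_hit_rate(counters: Mapping[str, int]) -> Optional[float]:
    """Return the share of bulletin fetches served from cache, or None.

    Assumption: success, cached, and failures are mutually exclusive
    outcomes, so their sum is the total number of bulletin fetches.
    """
    success = counters.get("nkcki.bulletin.fetch.success", 0)
    cached = counters.get("nkcki.bulletin.fetch.cached", 0)
    failures = counters.get("nkcki.bulletin.fetch.failures", 0)
    total = success + cached + failures
    if total == 0:
        # No fetches recorded in this window; a rate would be meaningless.
        return None
    return cached / total
```

If `success` turns out to include cached fetches, the denominator would need to change accordingly.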
## Archive Backfill Strategy
Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy:
1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms.
2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`.
4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/concelier-connector-research-20251011.md`).

For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.
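The page walk and deduplication described in this section can be sketched as follows. This is an illustrative outline under stated assumptions, not the connector's code: `KnownBulletins`, `listing_page_url`, and `discover_new_bulletins` are hypothetical names, the listing fetch is injected as a callable, and page 1 is assumed to be the newest page.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Set


class KnownBulletins:
    """Rolling window of bulletin IDs; the oldest IDs are evicted first."""

    def __init__(self, max_size: int) -> None:
        self._order: Deque[str] = deque()
        self._ids: Set[str] = set()
        self._max_size = max_size

    def __contains__(self, bulletin_id: object) -> bool:
        return bulletin_id in self._ids

    def add(self, bulletin_id: str) -> None:
        if bulletin_id in self._ids:
            return
        self._order.append(bulletin_id)
        self._ids.add(bulletin_id)
        while len(self._order) > self._max_size:
            self._ids.discard(self._order.popleft())


def listing_page_url(base_url: str, page: int) -> str:
    """Append the Bitrix pagination parameter to a listing URL."""
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}PAGEN_1={page}"


def discover_new_bulletins(
    base_url: str,
    fetch_ids: Callable[[str], Iterable[str]],
    known: KnownBulletins,
    max_pages: int = 3,
    max_bulletins: int = 5,
) -> List[str]:
    """Walk listing pages in order and return unseen bulletin IDs."""
    seen_this_run: Set[str] = set()
    new_ids: List[str] = []
    for page in range(1, max_pages + 1):
        if len(new_ids) >= max_bulletins:
            break
        for bulletin_id in fetch_ids(listing_page_url(base_url, page)):
            if bulletin_id in known or bulletin_id in seen_this_run:
                continue
            seen_this_run.add(bulletin_id)
            new_ids.append(bulletin_id)
            if len(new_ids) >= max_bulletins:
                break
    # Update the rolling window only after the walk, so eviction cannot
    # cause an ID to be rediscovered within the same run.
    for bulletin_id in new_ids:
        known.add(bulletin_id)
    return new_ids
```

Injecting `fetch_ids` keeps the walk testable without network access, which also suits replay from a pre-populated cache.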
## Failure Handling
- On a listing failure, the connector records exponential backoff in the source state and attempts cache replay.
- Bulletin fetches fall back to cached copies before surfacing an error.
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros.

Refer to `ru-nkcki` entries in `src/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md` for outstanding items.
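The backoff and cache fallback behaviour listed under Failure Handling can be illustrated with a short Python sketch. The function names, the one-minute base delay, and the one-hour cap are all assumptions chosen for the example; the connector's actual backoff parameters are not given in this guide.

```python
from datetime import timedelta
from pathlib import Path
from typing import Callable, Optional


def backoff_delay(
    failure_count: int,
    base: timedelta = timedelta(minutes=1),  # assumed base delay
    cap: timedelta = timedelta(hours=1),     # assumed upper bound
) -> timedelta:
    """Double the delay for each consecutive failure, up to a cap."""
    if failure_count <= 0:
        return timedelta(0)
    # Bound the exponent so the multiplication stays small.
    exponent = min(failure_count - 1, 16)
    return min(base * (2 ** exponent), cap)


def fetch_bulletin(
    name: str,
    download: Callable[[str], bytes],
    cache_dir: Optional[Path],
) -> bytes:
    """Download a bulletin, falling back to a cached copy on failure."""
    cached = cache_dir / name if cache_dir is not None else None
    try:
        data = download(name)
    except OSError:
        # Serve the cached archive if one exists; otherwise surface the error.
        if cached is not None and cached.is_file():
            return cached.read_bytes()
        raise
    if cached is not None:
        # Persist the fresh archive so later runs can replay it.
        cached.parent.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)
    return data
```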