# NKCKI Connector Operations Guide
## Overview
The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.
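As a rough illustration of the expansion step, the sketch below unpacks a `*.json.zip` archive in memory and produces one record per vulnerability entry. It assumes each archive member is a JSON array of objects (or a single object); the function name `expand_bulletin` and the `source_file` field are hypothetical and not taken from the connector.

```python
import io
import json
import zipfile


def expand_bulletin(archive_bytes: bytes) -> list[dict]:
    """Expand a bulletin archive into one dict per vulnerability entry.

    Assumption: every *.json member holds a JSON array of objects,
    or a single JSON object.
    """
    entries: list[dict] = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # Sort member names so the output order is deterministic.
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".json"):
                continue  # skip anything that is not a JSON document
            # utf-8-sig tolerates an optional byte-order mark.
            payload = json.loads(archive.read(name).decode("utf-8-sig"))
            if isinstance(payload, dict):
                payload = [payload]
            for item in payload:
                entries.append({"source_file": name, **item})
    return entries
```

Sorting the member names is one simple way to keep the per-entry order stable across runs, which matters if downstream mapping is expected to be deterministic.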
## Configuration
Key options exposed through `concelier:sources:ru-nkcki:http`:
- `maxBulletinsPerFetch` – limits new bulletin downloads in a single run (default `5`).
- `maxListingPagesPerFetch` – maximum listing pages visited during pagination (default `3`).
- `listingCacheDuration` – minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`).
- `cacheDirectory` – optional path for persisted bulletin archives used during offline or failure scenarios.
- `requestDelay` – delay inserted between bulletin downloads to avoid overloading the upstream server.

When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/concelier/cache/ru-nkcki`) and pre-populate it with bulletin archives via the offline kit.
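To make the option set concrete, here is a minimal Python model of the settings listed above. It is only a sketch: the class name `NkckiHttpOptions`, the snake_case field names, the placeholder defaults for `cacheDirectory` and `requestDelay`, and the validation rules are assumptions for illustration. Only the three documented defaults come from this guide.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


def parse_timespan(value: str) -> timedelta:
    """Parse an hh:mm:ss string such as 00:10:00."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


@dataclass(frozen=True)
class NkckiHttpOptions:
    # Defaults stated in this guide.
    max_bulletins_per_fetch: int = 5
    max_listing_pages_per_fetch: int = 3
    listing_cache_duration: timedelta = parse_timespan("00:10:00")
    # No defaults are documented for these two; the values are placeholders.
    cache_directory: Optional[str] = None
    request_delay: timedelta = timedelta(0)

    def validate(self) -> None:
        """Reject obviously unusable values (assumed rules, not the connector's)."""
        if self.max_bulletins_per_fetch <= 0:
            raise ValueError("maxBulletinsPerFetch must be positive")
        if self.max_listing_pages_per_fetch <= 0:
            raise ValueError("maxListingPagesPerFetch must be positive")
        if self.listing_cache_duration < timedelta(0):
            raise ValueError("listingCacheDuration must not be negative")
        if self.request_delay < timedelta(0):
            raise ValueError("requestDelay must not be negative")


# Offline-first setup using the example path from this guide.
options = NkckiHttpOptions(cache_directory="/var/lib/concelier/cache/ru-nkcki")
options.validate()
```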
## Telemetry
`RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Concelier.Connector.Ru.Nkcki`:
- `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures`
- `nkcki.listing.pages.visited` (histogram, `pages`)
- `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new`
- `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures`
- `nkcki.entries.processed` (histogram, `entries`)

Integrate these metrics into standard Concelier observability dashboards to track crawl coverage and cache hit rates.
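One possible way to turn the raw counter totals into dashboard ratios is sketched below. The formulas are assumptions: in particular, the sketch treats `nkcki.bulletin.fetch.success` and `nkcki.bulletin.fetch.cached` as disjoint counts, which this guide does not confirm. The function names and result keys are hypothetical.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when there is no data yet."""
    return numerator / denominator if denominator else 0.0


def summarize(counters: dict[str, int]) -> dict[str, float]:
    """Derive coverage and cache ratios from counter totals."""
    fresh = counters.get("nkcki.bulletin.fetch.success", 0)
    cached = counters.get("nkcki.bulletin.fetch.cached", 0)
    return {
        # Share of listing attempts that succeeded.
        "listing_success_rate": ratio(
            counters.get("nkcki.listing.fetch.success", 0),
            counters.get("nkcki.listing.fetch.attempts", 0),
        ),
        # Share of served bulletins that came from the cache
        # (assumes success and cached do not overlap).
        "cache_hit_rate": ratio(cached, fresh + cached),
        # Share of discovered attachments that were not seen before.
        "new_attachment_ratio": ratio(
            counters.get("nkcki.listing.attachments.new", 0),
            counters.get("nkcki.listing.attachments.discovered", 0),
        ),
    }
```

In practice these ratios would normally be computed in the dashboard's query language over a time window rather than in application code; the Python form only spells out the arithmetic.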
## Archive Backfill Strategy
Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy:
1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms.
2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`.
4. For historical HTML-only advisories, collect page URLs and metadata while offline. Extracting their content is future work; the planned HTML and PDF extraction pipeline is documented in `docs/concelier-connector-research-20251011.md`.

For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.
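The listing walk described in step 1 can be sketched roughly as follows. This is not the connector's implementation: `discover_new_bulletins`, the `fetch_page` callback, and the `window_size` value are hypothetical, and the rolling window is modelled with an insertion-ordered dictionary as one plausible approach.

```python
from collections import OrderedDict
from typing import Callable, Iterable


def discover_new_bulletins(
    fetch_page: Callable[[str], Iterable[str]],
    base_url: str,
    known: OrderedDict,
    max_pages: int = 3,      # mirrors the maxListingPagesPerFetch default
    window_size: int = 256,  # hypothetical size of the rolling window
) -> list[str]:
    """Walk listing pages newest-first and return bulletin IDs not seen before."""
    new_ids: list[str] = []
    seen: set[str] = set()
    for page in range(1, max_pages + 1):
        # Bitrix exposes listing pages through the PAGEN_1 query parameter.
        for bulletin_id in fetch_page(f"{base_url}?PAGEN_1={page}"):
            if bulletin_id in known or bulletin_id in seen:
                continue  # duplicate across pages or across runs
            seen.add(bulletin_id)
            new_ids.append(bulletin_id)
    # Remember IDs oldest-first so that trimming evicts the oldest entries.
    for bulletin_id in reversed(new_ids):
        known[bulletin_id] = None
    while len(known) > window_size:
        known.popitem(last=False)
    return new_ids
```

If the window is smaller than the number of IDs visible on the walked pages, evicted IDs are rediscovered on the next run, so the window should comfortably exceed `maxListingPagesPerFetch` times the page size.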
## Failure Handling
- A listing failure puts the source state into exponential backoff; the connector attempts cache replay in the meantime.
- Bulletin fetches fall back to cached copies before surfacing an error.
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros.

Refer to the `ru-nkcki` entries in `src/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md` for outstanding items.
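The backoff and cache-fallback behaviour in the failure handling list above could look roughly like the sketch below. The base delay, the cap, the function names, and the choice of `OSError` as the failure signal are all assumptions; this guide only states that backoff is exponential and that cached copies are tried before an error surfaces.

```python
from datetime import timedelta
from pathlib import Path
from typing import Callable


def backoff_delay(
    failures: int,
    base: timedelta = timedelta(minutes=1),  # hypothetical base delay
    cap: timedelta = timedelta(hours=1),     # hypothetical upper bound
) -> timedelta:
    """Delay before the next listing attempt: base * 2^(failures - 1), capped."""
    if failures <= 0:
        return timedelta(0)
    exponent = min(failures - 1, 20)  # bound the exponent to avoid overflow
    return min(base * (2 ** exponent), cap)


def fetch_bulletin(
    download: Callable[[str], bytes], url: str, cache_file: Path
) -> tuple[bytes, bool]:
    """Return (content, served_from_cache); raise only if both sources fail."""
    try:
        content = download(url)
    except OSError:
        if cache_file.is_file():
            return cache_file.read_bytes(), True  # fall back to the cached copy
        raise  # nothing cached, so surface the original error
    # Persist the fresh copy so that later failures can be replayed from disk.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(content)
    return content, False
```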