diff --git a/deploy/helm/stellaops/README-mock.md b/deploy/helm/stellaops/README-mock.md
new file mode 100644
index 000000000..2683f1665
--- /dev/null
+++ b/deploy/helm/stellaops/README-mock.md
@@ -0,0 +1,16 @@
+# Mock Overlay (Dev Only)
+
+Purpose: let deployment tasks progress with placeholder digests until real releases land.
+
+Use:
+```bash
+helm template mock ./deploy/helm/stellaops -f deploy/helm/stellaops/values-mock.yaml
+```
+
+Contents:
+- Mock deployments for orchestrator, policy-registry, packs-registry, task-runner, VEX Lens, issuer-directory, findings-ledger, vuln-explorer-api.
+- Image pins pulled from `deploy/releases/2025.09-mock-dev.yaml`.
+
+Notes:
+- Annotated with `stellaops.dev/mock: "true"` to discourage production use.
+- Swap to real values once official digests publish; keep mock overlay gated behind `mock.enabled`.
diff --git a/docs/implplan/SPRINT_0190_0001_0001_cvss_v4_receipts.md b/docs/implplan/SPRINT_0190_0001_0001_cvss_v4_receipts.md
index 19780768a..186b5e9d4 100644
--- a/docs/implplan/SPRINT_0190_0001_0001_cvss_v4_receipts.md
+++ b/docs/implplan/SPRINT_0190_0001_0001_cvss_v4_receipts.md
@@ -34,7 +34,7 @@
 | 6 | CVSS-DSSE-190-006 | DONE (2025-11-28) | Depends on 190-005; uses Attestor primitives. | Policy Guild · Attestor Guild (`src/Policy/StellaOps.Policy.Scoring`, `src/Attestor/StellaOps.Attestor.Envelope`) | Attach DSSE attestations to score receipts: create `stella.ops/cvssReceipt@v1` predicate type, sign receipts, store envelope references. |
 | 7 | CVSS-HISTORY-190-007 | DONE (2025-11-28) | Depends on 190-005. | Policy Guild (`src/Policy/StellaOps.Policy.Scoring/History`) | Implement receipt amendment tracking: `AmendReceipt(receiptId, field, newValue, reason, ref)` with history entry creation and re-signing. |
 | 8 | CVSS-CONCELIER-190-008 | DONE (2025-12-06) | Depends on 190-001; Concelier AGENTS updated 2025-12-06. | Concelier Guild · Policy Guild (`src/Concelier/__Libraries/StellaOps.Concelier.Core`) | Ingest vendor-provided CVSS v4.0 vectors from advisories; parse and store as base receipts; preserve provenance. (Implemented CVSS priority ordering in Advisory → Postgres conversion so v4 vectors are primary and provenance-preserved.) |
-| 9 | CVSS-API-190-009 | BLOCKED (2025-12-06) | Depends on 190-005, 190-007; missing Policy Engine CVSS receipt endpoints to proxy. | Policy Guild (`src/Policy/StellaOps.Policy.Gateway`) | REST/gRPC APIs: `POST /cvss/receipts`, `GET /cvss/receipts/{id}`, `PUT /cvss/receipts/{id}/amend`, `GET /cvss/receipts/{id}/history`, `GET /cvss/policies`. |
+| 9 | CVSS-API-190-009 | DONE (2025-12-06) | Depends on 190-005, 190-007; Policy Engine + Gateway CVSS endpoints shipped. | Policy Guild (`src/Policy/StellaOps.Policy.Gateway`) | REST APIs delivered: `POST /cvss/receipts`, `GET /cvss/receipts/{id}`, `PUT /cvss/receipts/{id}/amend`, `GET /cvss/receipts/{id}/history`, `GET /cvss/policies`. |
 | 10 | CVSS-CLI-190-010 | TODO | Depends on 190-009 (API readiness). | CLI Guild (`src/Cli/StellaOps.Cli`) | CLI verbs: `stella cvss score --vuln `, `stella cvss show `, `stella cvss history `, `stella cvss export --format json|pdf`. |
 | 11 | CVSS-UI-190-011 | TODO | Depends on 190-009 (API readiness). | UI Guild (`src/UI/StellaOps.UI`) | UI components: Score badge with CVSS-BTE label, tabbed receipt viewer (Base/Threat/Environmental/Supplemental/Evidence/Policy/History), "Recalculate with my env" button, export options. |
 | 12 | CVSS-DOCS-190-012 | BLOCKED (2025-11-29) | Depends on 190-001 through 190-011 (API/UI/CLI blocked). | Docs Guild (`docs/modules/policy/cvss-v4.md`, `docs/09_API_CLI_REFERENCE.md`) | Document CVSS v4.0 scoring system: data model, policy format, API reference, CLI usage, UI guide, determinism guarantees. |
@@ -48,7 +48,7 @@
 | --- | --- | --- | --- | --- |
 | W1 Foundation | Policy Guild | None | DONE (2025-11-28) | Tasks 1-4: Data model, engine, tests, policy loader. |
 | W2 Receipt Pipeline | Policy Guild · Attestor Guild | W1 complete | DONE (2025-11-28) | Tasks 5-7: Receipt builder, DSSE, history completed; integration tests green. |
-| W3 Integration | Concelier · Policy · CLI · UI Guilds | W2 complete; AGENTS delivered 2025-12-06 | BLOCKED (2025-12-06) | CVSS-API-190-009 blocked: Policy Engine lacks CVSS receipt endpoints to proxy; CLI/UI depend on it. |
+| W3 Integration | Concelier · Policy · CLI · UI Guilds | W2 complete; AGENTS delivered 2025-12-06 | TODO (2025-12-06) | CVSS API now available; proceed with CLI (task 10) and UI (task 11) wiring. |
 | W4 Documentation | Docs Guild | W3 complete | BLOCKED (2025-12-06) | Task 12 blocked by API/UI/CLI delivery; resumes after W3 progresses. |
 
 ## Interlocks
@@ -75,11 +75,12 @@
 | R3 | Receipt storage grows large with evidence links. | Storage costs; query performance. | Implement evidence reference deduplication; use CAS URIs; Platform Guild. |
 | R4 | CVSS parser/ruleset changes ungoverned (CVM9). | Score drift, audit gaps. | Version parsers/rulesets; DSSE-sign releases; log scorer version in receipts; dual-review changes. |
 | R5 | Missing AGENTS for Policy WebService and Concelier ingestion block integration (tasks 8–11). | API/CLI/UI delivery stalled. | AGENTS delivered 2025-12-06 (tasks 15–16). Risk mitigated; monitor API contract approvals. |
-| R6 | Policy Engine lacks CVSS receipt endpoints; gateway proxy cannot be implemented yet. | API/CLI/UI tasks remain blocked. | Policy Guild to add receipt API surface in Policy Engine; re-run gateway wiring once available. |
+| R6 | Policy Engine lacks CVSS receipt endpoints; gateway proxy cannot be implemented yet. | API/CLI/UI tasks remain blocked. | **Mitigated 2025-12-06:** CVSS receipt endpoints implemented in Policy Engine and Gateway; unblock CLI/UI. |
 
 ## Execution Log
 | Date (UTC) | Update | Owner |
 | --- | --- | --- |
+| 2025-12-06 | CVSS-API-190-009 DONE: added Policy Engine CVSS receipt endpoints and Gateway proxies (`/api/cvss/receipts`, history, amend, policies); W3 unblocked; risk R6 mitigated. | Implementer |
 | 2025-12-06 | CVSS-CONCELIER-190-008 DONE: prioritized CVSS v4.0 vectors as primary in advisory→Postgres conversion; provenance preserved; enables Policy receipt ingestion. CVSS-API-190-009 set BLOCKED pending Policy Engine CVSS receipt endpoints (risk R6). | Implementer |
 | 2025-12-06 | Created Policy Gateway AGENTS and refreshed Concelier AGENTS for CVSS v4 ingest (tasks 15–16 DONE); moved tasks 8–11 to TODO, set W3 to TODO, mitigated risk R5. | Project Mgmt |
 | 2025-12-06 | Added tasks 15–16 to create AGENTS for Policy WebService and Concelier; set Wave 2 to DONE; marked Waves 3–4 BLOCKED until AGENTS exist; captured risk R5. | Project Mgmt |
diff --git a/docs/implplan/SPRINT_0502_0001_0001_ops_deployment_ii.md b/docs/implplan/SPRINT_0502_0001_0001_ops_deployment_ii.md
index 39429e06d..d3b614e1d 100644
--- a/docs/implplan/SPRINT_0502_0001_0001_ops_deployment_ii.md
+++ b/docs/implplan/SPRINT_0502_0001_0001_ops_deployment_ii.md
@@ -39,6 +39,7 @@
 | 2025-12-06 | CI workflow `.gitea/workflows/mock-dev-release.yml` now packages mock manifest + downloads JSON into `mock-dev-release.tgz` for dev pipelines. | Deployment Guild |
 | 2025-12-06 | Mock Compose overlay (`deploy/compose/docker-compose.mock.yaml`) documented for dev-only configs using placeholder digests; production pins remain pending. | Deployment Guild |
 | 2025-12-06 | Added production guard `.gitea/workflows/release-manifest-verify.yml` to fail CI if stable/airgap manifests or downloads JSON omit required components. | Deployment Guild |
+| 2025-12-06 | Added Helm mock overlays (`orchestrator/policy/packs/vex/vuln` under `deploy/helm/stellaops/templates/*-mock.yaml`) and `values-mock.yaml`; mock dev release workflow now renders `helm template` with mock values for dev packaging. | Deployment Guild |
 | 2025-12-05 | HELM-45-003 DONE: added HPA template with per-service overrides, PDB support, Prometheus scrape annotations hook, and production defaults (prod enabled, airgap prometheus on but HPA off). | Deployment Guild |
 | 2025-12-05 | HELM-45-002 DONE: added ingress/TLS toggles, NetworkPolicy defaults, pod security contexts, and ExternalSecret scaffold (prod enabled, airgap off); documented via values changes and templates (`core.yaml`, `networkpolicy.yaml`, `ingress.yaml`, `externalsecrets.yaml`). | Deployment Guild |
 | 2025-12-05 | HELM-45-001 DONE: added migration job scaffolding and toggle to Helm chart (`deploy/helm/stellaops/templates/migrations.yaml`, values defaults), kept digest pins, and published install guide (`deploy/helm/stellaops/INSTALL.md`). | Deployment Guild |
diff --git a/docs/implplan/SPRINT_3407_0001_0001_postgres_cleanup.md b/docs/implplan/SPRINT_3407_0001_0001_postgres_cleanup.md
index 7a5f376e0..12050330a 100644
--- a/docs/implplan/SPRINT_3407_0001_0001_postgres_cleanup.md
+++ b/docs/implplan/SPRINT_3407_0001_0001_postgres_cleanup.md
@@ -52,7 +52,7 @@
 | 9 | PG-T7.1.9 | TODO | Depends on PG-T7.1.8 | Infrastructure Guild | Remove MongoDB configuration options |
 | 10 | PG-T7.1.10 | TODO | Depends on PG-T7.1.9 | Infrastructure Guild | Run full build to verify no broken references |
 | 14 | PG-T7.1.5a | DOING | Concelier Guild | Concelier: replace Mongo deps with Postgres equivalents; remove MongoDB packages; compat layer added. |
-| 15 | PG-T7.1.5b | TODO | Concelier Guild | Build Postgres document/raw storage + state repositories and wire DI. |
+| 15 | PG-T7.1.5b | DOING | Concelier Guild | Build Postgres document/raw storage + state repositories and wire DI. |
 | 16 | PG-T7.1.5c | TODO | Concelier Guild | Refactor connectors/exporters/tests to Postgres storage; delete Storage.Mongo code. |
 | 17 | PG-T7.1.5d | TODO | Concelier Guild | Add migrations for document/state/export tables; include in air-gap kit. |
 | 18 | PG-T7.1.5e | TODO | Concelier Guild | Postgres-only Concelier build/tests green; remove Mongo artefacts and update docs. |
@@ -122,6 +122,7 @@
 | 2025-12-06 | Attempted Scheduler Postgres tests; restore/build fails because `StellaOps.Concelier.Storage.Mongo` project is absent and Concelier connectors reference it. Need phased Concelier plan/shim to unblock test/build runs. | Scheduler Guild |
 | 2025-12-06 | Began Concelier Mongo compatibility shim: added `FindAsync` to in-memory `IDocumentStore` in Postgres compat layer to unblock connector compile; full Mongo removal still pending. | Infrastructure Guild |
 | 2025-12-06 | Added lightweight `StellaOps.Concelier.Storage.Mongo` in-memory stub (advisory/dto/document/state/export stores) to unblock Concelier connector build while Postgres rewiring continues; no Mongo driver/runtime. | Infrastructure Guild |
+| 2025-12-06 | PG-T7.1.5b set to DOING; began wiring Postgres document store (DI registration, repository find) to replace Mongo bindings. | Concelier Guild |
 
 ## Decisions & Risks
 - Cleanup is strictly after all phases complete; do not start T7 tasks until module cutovers are DONE.
diff --git a/docs/implplan/SPRINT_3407_0001_0001_postgres_cleanup_tasks.md b/docs/implplan/SPRINT_3407_0001_0001_postgres_cleanup_tasks.md
index 13fe2b8e4..cfea51c1d 100644
--- a/docs/implplan/SPRINT_3407_0001_0001_postgres_cleanup_tasks.md
+++ b/docs/implplan/SPRINT_3407_0001_0001_postgres_cleanup_tasks.md
@@ -3,7 +3,7 @@
 | # | Task ID | Status | Owner | Notes |
 |---|---|---|---|---|
 | 1 | PG-T7.1.5a | DOING | Concelier Guild | Replace Mongo storage dependencies with Postgres equivalents; remove MongoDB.Driver/Bson packages from Concelier projects. |
-| 2 | PG-T7.1.5b | TODO | Concelier Guild | Implement Postgres document/raw storage (bytea/LargeObject) + state repos to satisfy connector fetch/store paths. |
+| 2 | PG-T7.1.5b | DOING | Concelier Guild | Implement Postgres document/raw storage (bytea/LargeObject) + state repos to satisfy connector fetch/store paths. |
 | 3 | PG-T7.1.5c | TODO | Concelier Guild | Refactor all connectors/exporters/tests to use Postgres storage namespaces; delete Storage.Mongo code/tests. |
 | 4 | PG-T7.1.5d | TODO | Concelier Guild | Add migrations for documents/state/export tables; wire into Concelier Postgres storage DI. |
 | 5 | PG-T7.1.5e | TODO | Concelier Guild | End-to-end Concelier build/test on Postgres-only stack; update sprint log and remove Mongo artifacts from repo history references. 
| diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Acsc/AcscConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Acsc/AcscConnector.cs index a82ef18fd..360351525 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Acsc/AcscConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Acsc/AcscConnector.cs @@ -1,699 +1,699 @@ -using System.Collections.Generic; -using System.Globalization; -using System.IO; -using System.Net; -using System.Net.Http; -using System.Linq; -using System.Threading; -using System.Threading.Tasks; -using System.Xml.Linq; -using System.Text.Json; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using MongoDB.Bson.IO; -using StellaOps.Concelier.Connector.Acsc.Configuration; -using StellaOps.Concelier.Connector.Acsc.Internal; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Common.Html; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Acsc; - -public sealed class AcscConnector : IFeedConnector -{ - private static readonly string[] AcceptHeaders = - { - "application/rss+xml", - "application/atom+xml;q=0.9", - "application/xml;q=0.8", - "text/xml;q=0.7", - }; - - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) - { - PropertyNameCaseInsensitive = true, - WriteIndented = false, - }; - - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - 
private readonly IHttpClientFactory _httpClientFactory; - private readonly AcscOptions _options; - private readonly AcscDiagnostics _diagnostics; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - private readonly HtmlContentSanitizer _htmlSanitizer = new(); - - public AcscConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IHttpClientFactory httpClientFactory, - IOptions options, - AcscDiagnostics diagnostics, - TimeProvider? timeProvider, - ILogger logger) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _httpClientFactory = httpClientFactory ?? throw new ArgumentNullException(nameof(httpClientFactory)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => AcscConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var now = _timeProvider.GetUtcNow(); - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - - var lastPublished = new Dictionary(cursor.LastPublishedByFeed, StringComparer.OrdinalIgnoreCase); - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var failures = new List<(AcscFeedOptions Feed, Exception Error)>(); - - var preferredEndpoint = ResolveInitialPreference(cursor); - AcscEndpointPreference? successPreference = null; - - foreach (var feed in GetEnabledFeeds()) - { - cancellationToken.ThrowIfCancellationRequested(); - - Exception? lastError = null; - bool handled = false; - - foreach (var mode in BuildFetchOrder(preferredEndpoint)) - { - cancellationToken.ThrowIfCancellationRequested(); - if (mode == AcscFetchMode.Relay && !IsRelayConfigured) - { - continue; - } - - var modeName = ModeName(mode); - var targetUri = BuildFeedUri(feed, mode); - - var metadata = CreateMetadata(feed, cursor, modeName); - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, targetUri.ToString(), cancellationToken).ConfigureAwait(false); - - var request = new SourceFetchRequest(AcscOptions.HttpClientName, SourceName, targetUri) - { - Metadata = metadata, - ETag = existing?.Etag, - LastModified = existing?.LastModified, - AcceptHeaders = AcceptHeaders, - TimeoutOverride = _options.RequestTimeout, - }; - - try - { - _diagnostics.FetchAttempt(feed.Slug, modeName); - var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - - if (result.IsNotModified) - { - _diagnostics.FetchUnchanged(feed.Slug, modeName); - successPreference ??= mode switch - { - AcscFetchMode.Relay => 
AcscEndpointPreference.Relay, - _ => AcscEndpointPreference.Direct, - }; - handled = true; - _logger.LogDebug("ACSC feed {Feed} returned 304 via {Mode}", feed.Slug, modeName); - break; - } - - if (!result.IsSuccess || result.Document is null) - { - _diagnostics.FetchFailure(feed.Slug, modeName); - lastError = new InvalidOperationException($"Fetch returned no document for {targetUri}"); - continue; - } - - pendingDocuments.Add(result.Document.Id); - successPreference = mode switch - { - AcscFetchMode.Relay => AcscEndpointPreference.Relay, - _ => AcscEndpointPreference.Direct, - }; - handled = true; - _diagnostics.FetchSuccess(feed.Slug, modeName); - _logger.LogInformation("ACSC fetched {Feed} via {Mode} (documentId={DocumentId})", feed.Slug, modeName, result.Document.Id); - - var latestPublished = await TryComputeLatestPublishedAsync(result.Document, cancellationToken).ConfigureAwait(false); - if (latestPublished.HasValue) - { - if (!lastPublished.TryGetValue(feed.Slug, out var existingPublished) || latestPublished.Value > existingPublished) - { - lastPublished[feed.Slug] = latestPublished.Value; - _diagnostics.CursorUpdated(feed.Slug); - _logger.LogDebug("ACSC feed {Feed} advanced published cursor to {Timestamp:O}", feed.Slug, latestPublished.Value); - } - } - - break; - } - catch (HttpRequestException ex) when (ShouldRetryWithRelay(mode)) - { - lastError = ex; - _diagnostics.FetchFallback(feed.Slug, modeName, "http-request"); - _logger.LogWarning(ex, "ACSC fetch via {Mode} failed for {Feed}; attempting relay fallback.", modeName, feed.Slug); - continue; - } - catch (TaskCanceledException ex) when (ShouldRetryWithRelay(mode)) - { - lastError = ex; - _diagnostics.FetchFallback(feed.Slug, modeName, "timeout"); - _logger.LogWarning(ex, "ACSC fetch via {Mode} timed out for {Feed}; attempting relay fallback.", modeName, feed.Slug); - continue; - } - catch (Exception ex) - { - lastError = ex; - _diagnostics.FetchFailure(feed.Slug, modeName); - _logger.LogError(ex, "ACSC 
fetch failed for {Feed} via {Mode}", feed.Slug, modeName); - break; - } - } - - if (!handled && lastError is not null) - { - failures.Add((feed, lastError)); - } - } - - if (failures.Count > 0) - { - var failureReason = string.Join("; ", failures.Select(f => $"{f.Feed.Slug}: {f.Error.Message}")); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, failureReason, cancellationToken).ConfigureAwait(false); - throw new AggregateException($"ACSC fetch failed for {failures.Count} feed(s): {failureReason}", failures.Select(f => f.Error)); - } - - var updatedPreference = successPreference ?? preferredEndpoint; - if (_options.ForceRelay) - { - updatedPreference = AcscEndpointPreference.Relay; - } - else if (!IsRelayConfigured) - { - updatedPreference = AcscEndpointPreference.Direct; - } - - var updatedCursor = cursor - .WithPreferredEndpoint(updatedPreference) - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithLastPublished(lastPublished); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - var metadata = AcscDocumentMetadata.FromDocument(document); - var feedTag = string.IsNullOrWhiteSpace(metadata.FeedSlug) ? 
"(unknown)" : metadata.FeedSlug; - - _diagnostics.ParseAttempt(feedTag); - - if (!document.GridFsId.HasValue) - { - _diagnostics.ParseFailure(feedTag, "missingPayload"); - _logger.LogWarning("ACSC document {DocumentId} missing GridFS payload (feed={Feed})", document.Id, feedTag); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - byte[] rawBytes; - try - { - rawBytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.ParseFailure(feedTag, "download"); - _logger.LogError(ex, "ACSC failed to download payload for document {DocumentId} (feed={Feed})", document.Id, feedTag); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - try - { - var parsedAt = _timeProvider.GetUtcNow(); - var dto = AcscFeedParser.Parse(rawBytes, metadata.FeedSlug, parsedAt, _htmlSanitizer); - - var json = JsonSerializer.Serialize(dto, SerializerOptions); - var payload = BsonDocument.Parse(json); - - var existingDto = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); - var dtoRecord = existingDto is null - ? 
new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "acsc.feed.v1", payload, parsedAt) - : existingDto with - { - Payload = payload, - SchemaVersion = "acsc.feed.v1", - ValidatedAt = parsedAt, - }; - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - pendingDocuments.Remove(documentId); - pendingMappings.Add(document.Id); - - _diagnostics.ParseSuccess(feedTag); - _logger.LogInformation("ACSC parsed document {DocumentId} (feed={Feed}, entries={EntryCount})", document.Id, feedTag, dto.Entries.Count); - } - catch (Exception ex) - { - _diagnostics.ParseFailure(feedTag, "parse"); - _logger.LogError(ex, "ACSC parse failed for document {DocumentId} (feed={Feed})", document.Id, feedTag); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var documentIds = cursor.PendingMappings.ToList(); - - foreach (var documentId in documentIds) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dtoRecord is 
null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - AcscFeedDto? feed; - try - { - var dtoJson = dtoRecord.Payload.ToJson(new JsonWriterSettings - { - OutputMode = JsonOutputMode.RelaxedExtendedJson, - }); - - feed = JsonSerializer.Deserialize(dtoJson, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogError(ex, "ACSC mapping failed to deserialize DTO for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - if (feed is null) - { - _logger.LogWarning("ACSC mapping encountered null DTO payload for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var mappedAt = _timeProvider.GetUtcNow(); - var advisories = AcscMapper.Map(feed, document, dtoRecord, SourceName, mappedAt); - - if (advisories.Count > 0) - { - foreach (var advisory in advisories) - { - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - } - - _diagnostics.MapSuccess(advisories.Count); - _logger.LogInformation( - "ACSC mapped {Count} advisories from document {DocumentId} (feed={Feed})", - advisories.Count, - document.Id, - feed.FeedSlug ?? "(unknown)"); - } - else - { - _logger.LogInformation( - "ACSC mapping produced no advisories for document {DocumentId} (feed={Feed})", - document.Id, - feed.FeedSlug ?? 
"(unknown)"); - } - - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ProbeAsync(CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - - if (_options.ForceRelay) - { - if (cursor.PreferredEndpoint != AcscEndpointPreference.Relay) - { - await UpdateCursorAsync(cursor.WithPreferredEndpoint(AcscEndpointPreference.Relay), cancellationToken).ConfigureAwait(false); - } - return; - } - - if (!IsRelayConfigured) - { - if (cursor.PreferredEndpoint != AcscEndpointPreference.Direct) - { - await UpdateCursorAsync(cursor.WithPreferredEndpoint(AcscEndpointPreference.Direct), cancellationToken).ConfigureAwait(false); - } - return; - } - - var feed = GetEnabledFeeds().FirstOrDefault(); - if (feed is null) - { - return; - } - - var httpClient = _httpClientFactory.CreateClient(AcscOptions.HttpClientName); - httpClient.Timeout = TimeSpan.FromSeconds(15); - - var directUri = BuildFeedUri(feed, AcscFetchMode.Direct); - - try - { - using var headRequest = new HttpRequestMessage(HttpMethod.Head, directUri); - using var response = await httpClient.SendAsync(headRequest, HttpCompletionOption.ResponseHeadersRead, cancellationToken).ConfigureAwait(false); - if (response.IsSuccessStatusCode) - { - if (cursor.PreferredEndpoint != AcscEndpointPreference.Direct) - { - await UpdateCursorAsync(cursor.WithPreferredEndpoint(AcscEndpointPreference.Direct), cancellationToken).ConfigureAwait(false); - _logger.LogInformation("ACSC probe succeeded via direct endpoint ({StatusCode}); relay preference cleared.", (int)response.StatusCode); - } - return; - } - - if (response.StatusCode == HttpStatusCode.MethodNotAllowed) - { - using var probeRequest = new 
HttpRequestMessage(HttpMethod.Get, directUri); - using var probeResponse = await httpClient.SendAsync(probeRequest, HttpCompletionOption.ResponseHeadersRead, cancellationToken).ConfigureAwait(false); - if (probeResponse.IsSuccessStatusCode) - { - if (cursor.PreferredEndpoint != AcscEndpointPreference.Direct) - { - await UpdateCursorAsync(cursor.WithPreferredEndpoint(AcscEndpointPreference.Direct), cancellationToken).ConfigureAwait(false); - _logger.LogInformation("ACSC probe succeeded via direct endpoint after GET fallback ({StatusCode}).", (int)probeResponse.StatusCode); - } - return; - } - } - - _logger.LogWarning("ACSC direct probe returned HTTP {StatusCode}; relay preference enabled.", (int)response.StatusCode); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "ACSC direct probe failed; relay preference will be enabled."); - } - - if (cursor.PreferredEndpoint != AcscEndpointPreference.Relay) - { - await UpdateCursorAsync(cursor.WithPreferredEndpoint(AcscEndpointPreference.Relay), cancellationToken).ConfigureAwait(false); - } - } - - private bool ShouldRetryWithRelay(AcscFetchMode mode) - => mode == AcscFetchMode.Direct && _options.EnableRelayFallback && IsRelayConfigured && !_options.ForceRelay; - - private IEnumerable BuildFetchOrder(AcscEndpointPreference preference) - { - if (_options.ForceRelay) - { - if (IsRelayConfigured) - { - yield return AcscFetchMode.Relay; - } - yield break; - } - - if (!IsRelayConfigured) - { - yield return AcscFetchMode.Direct; - yield break; - } - - var preferRelay = preference == AcscEndpointPreference.Relay; - if (preference == AcscEndpointPreference.Auto) - { - preferRelay = _options.PreferRelayByDefault; - } - - if (preferRelay) - { - yield return AcscFetchMode.Relay; - if (_options.EnableRelayFallback) - { - yield return AcscFetchMode.Direct; - } - } - else - { - yield return AcscFetchMode.Direct; - if (_options.EnableRelayFallback) - { - yield return AcscFetchMode.Relay; - } - } - } - - private AcscEndpointPreference 
ResolveInitialPreference(AcscCursor cursor) - { - if (_options.ForceRelay) - { - return AcscEndpointPreference.Relay; - } - - if (!IsRelayConfigured) - { - return AcscEndpointPreference.Direct; - } - - if (cursor.PreferredEndpoint != AcscEndpointPreference.Auto) - { - return cursor.PreferredEndpoint; - } - - return _options.PreferRelayByDefault ? AcscEndpointPreference.Relay : AcscEndpointPreference.Direct; - } - - private async Task TryComputeLatestPublishedAsync(DocumentRecord document, CancellationToken cancellationToken) - { - if (!document.GridFsId.HasValue) - { - return null; - } - - var rawBytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - if (rawBytes.Length == 0) - { - return null; - } - - try - { - using var memoryStream = new MemoryStream(rawBytes, writable: false); - var xml = XDocument.Load(memoryStream, LoadOptions.None); - - DateTimeOffset? latest = null; - foreach (var element in xml.Descendants()) - { - if (!IsEntryElement(element.Name.LocalName)) - { - continue; - } - - var published = ExtractPublished(element); - if (!published.HasValue) - { - continue; - } - - if (latest is null || published.Value > latest.Value) - { - latest = published; - } - } - - return latest; - } - catch (Exception ex) - { - _logger.LogWarning(ex, "ACSC failed to derive published cursor for document {DocumentId} ({Uri})", document.Id, document.Uri); - return null; - } - } - - private static bool IsEntryElement(string localName) - => string.Equals(localName, "item", StringComparison.OrdinalIgnoreCase) - || string.Equals(localName, "entry", StringComparison.OrdinalIgnoreCase); - - private static DateTimeOffset? 
ExtractPublished(XElement element) - { - foreach (var name in EnumerateTimestampNames(element)) - { - if (DateTimeOffset.TryParse( - name.Value, - CultureInfo.InvariantCulture, - DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeUniversal, - out var parsed)) - { - return parsed.ToUniversalTime(); - } - } - - return null; - } - - private static IEnumerable<XElement> EnumerateTimestampNames(XElement element) - { - foreach (var child in element.Elements()) - { - var localName = child.Name.LocalName; - if (string.Equals(localName, "pubDate", StringComparison.OrdinalIgnoreCase) || - string.Equals(localName, "published", StringComparison.OrdinalIgnoreCase) || - string.Equals(localName, "updated", StringComparison.OrdinalIgnoreCase) || - string.Equals(localName, "date", StringComparison.OrdinalIgnoreCase)) - { - yield return child; - } - } - } - - private Dictionary<string, string> CreateMetadata(AcscFeedOptions feed, AcscCursor cursor, string mode) - { - var metadata = new Dictionary<string, string>(StringComparer.Ordinal) - { - ["acsc.feed.slug"] = feed.Slug, - ["acsc.fetch.mode"] = mode, - }; - - if (cursor.LastPublishedByFeed.TryGetValue(feed.Slug, out var published) && published.HasValue) - { - metadata["acsc.cursor.lastPublished"] = published.Value.ToString("O"); - } - - return metadata; - } - - private Uri BuildFeedUri(AcscFeedOptions feed, AcscFetchMode mode) - { - var baseUri = mode switch - { - AcscFetchMode.Relay when IsRelayConfigured => _options.RelayEndpoint!, - _ => _options.BaseEndpoint, - }; - - return new Uri(baseUri, feed.RelativePath); - } - - private IEnumerable<AcscFeedOptions> GetEnabledFeeds() - => _options.Feeds.Where(feed => feed is { Enabled: true }); - - private Task<AcscCursor> GetCursorAsync(CancellationToken cancellationToken) - => GetCursorCoreAsync(cancellationToken); - - private async Task<AcscCursor> GetCursorCoreAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? 
AcscCursor.Empty : AcscCursor.FromBson(state.Cursor); - } - - private Task UpdateCursorAsync(AcscCursor cursor, CancellationToken cancellationToken) - { - var document = cursor.ToBsonDocument(); - var completedAt = _timeProvider.GetUtcNow(); - return _stateRepository.UpdateCursorAsync(SourceName, document, completedAt, cancellationToken); - } - - private bool IsRelayConfigured => _options.RelayEndpoint is not null; - - private static string ModeName(AcscFetchMode mode) => mode switch - { - AcscFetchMode.Relay => "relay", - _ => "direct", - }; - - private enum AcscFetchMode - { - Direct = 0, - Relay = 1, - } -} +using System.Collections.Generic; +using System.Globalization; +using System.IO; +using System.Net; +using System.Net.Http; +using System.Linq; +using System.Threading; +using System.Threading.Tasks; +using System.Xml.Linq; +using System.Text.Json; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using MongoDB.Bson.IO; +using StellaOps.Concelier.Connector.Acsc.Configuration; +using StellaOps.Concelier.Connector.Acsc.Internal; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Common.Html; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Acsc; + +public sealed class AcscConnector : IFeedConnector +{ + private static readonly string[] AcceptHeaders = + { + "application/rss+xml", + "application/atom+xml;q=0.9", + "application/xml;q=0.8", + "text/xml;q=0.7", + }; + + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) + { + PropertyNameCaseInsensitive = true, + WriteIndented = false, + }; + + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage 
_rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly IHttpClientFactory _httpClientFactory; + private readonly AcscOptions _options; + private readonly AcscDiagnostics _diagnostics; + private readonly TimeProvider _timeProvider; + private readonly ILogger<AcscConnector> _logger; + private readonly HtmlContentSanitizer _htmlSanitizer = new(); + + public AcscConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IHttpClientFactory httpClientFactory, + IOptions<AcscOptions> options, + AcscDiagnostics diagnostics, + TimeProvider? timeProvider, + ILogger<AcscConnector> logger) + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _httpClientFactory = httpClientFactory ?? throw new ArgumentNullException(nameof(httpClientFactory)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => AcscConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var now = _timeProvider.GetUtcNow(); + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + + var lastPublished = new Dictionary<string, DateTimeOffset?>(cursor.LastPublishedByFeed, StringComparer.OrdinalIgnoreCase); + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var failures = new List<(AcscFeedOptions Feed, Exception Error)>(); + + var preferredEndpoint = ResolveInitialPreference(cursor); + AcscEndpointPreference? successPreference = null; + + foreach (var feed in GetEnabledFeeds()) + { + cancellationToken.ThrowIfCancellationRequested(); + + Exception? lastError = null; + bool handled = false; + + foreach (var mode in BuildFetchOrder(preferredEndpoint)) + { + cancellationToken.ThrowIfCancellationRequested(); + if (mode == AcscFetchMode.Relay && !IsRelayConfigured) + { + continue; + } + + var modeName = ModeName(mode); + var targetUri = BuildFeedUri(feed, mode); + + var metadata = CreateMetadata(feed, cursor, modeName); + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, targetUri.ToString(), cancellationToken).ConfigureAwait(false); + + var request = new SourceFetchRequest(AcscOptions.HttpClientName, SourceName, targetUri) + { + Metadata = metadata, + ETag = existing?.Etag, + LastModified = existing?.LastModified, + AcceptHeaders = AcceptHeaders, + TimeoutOverride = _options.RequestTimeout, + }; + + try + { + _diagnostics.FetchAttempt(feed.Slug, modeName); + var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + + if (result.IsNotModified) + { + _diagnostics.FetchUnchanged(feed.Slug, modeName); + successPreference ??= mode switch + { + AcscFetchMode.Relay => 
AcscEndpointPreference.Relay, + _ => AcscEndpointPreference.Direct, + }; + handled = true; + _logger.LogDebug("ACSC feed {Feed} returned 304 via {Mode}", feed.Slug, modeName); + break; + } + + if (!result.IsSuccess || result.Document is null) + { + _diagnostics.FetchFailure(feed.Slug, modeName); + lastError = new InvalidOperationException($"Fetch returned no document for {targetUri}"); + continue; + } + + pendingDocuments.Add(result.Document.Id); + successPreference = mode switch + { + AcscFetchMode.Relay => AcscEndpointPreference.Relay, + _ => AcscEndpointPreference.Direct, + }; + handled = true; + _diagnostics.FetchSuccess(feed.Slug, modeName); + _logger.LogInformation("ACSC fetched {Feed} via {Mode} (documentId={DocumentId})", feed.Slug, modeName, result.Document.Id); + + var latestPublished = await TryComputeLatestPublishedAsync(result.Document, cancellationToken).ConfigureAwait(false); + if (latestPublished.HasValue) + { + if (!lastPublished.TryGetValue(feed.Slug, out var existingPublished) || latestPublished.Value > existingPublished) + { + lastPublished[feed.Slug] = latestPublished.Value; + _diagnostics.CursorUpdated(feed.Slug); + _logger.LogDebug("ACSC feed {Feed} advanced published cursor to {Timestamp:O}", feed.Slug, latestPublished.Value); + } + } + + break; + } + catch (HttpRequestException ex) when (ShouldRetryWithRelay(mode)) + { + lastError = ex; + _diagnostics.FetchFallback(feed.Slug, modeName, "http-request"); + _logger.LogWarning(ex, "ACSC fetch via {Mode} failed for {Feed}; attempting relay fallback.", modeName, feed.Slug); + continue; + } + catch (TaskCanceledException ex) when (ShouldRetryWithRelay(mode)) + { + lastError = ex; + _diagnostics.FetchFallback(feed.Slug, modeName, "timeout"); + _logger.LogWarning(ex, "ACSC fetch via {Mode} timed out for {Feed}; attempting relay fallback.", modeName, feed.Slug); + continue; + } + catch (Exception ex) + { + lastError = ex; + _diagnostics.FetchFailure(feed.Slug, modeName); + _logger.LogError(ex, "ACSC 
fetch failed for {Feed} via {Mode}", feed.Slug, modeName); + break; + } + } + + if (!handled && lastError is not null) + { + failures.Add((feed, lastError)); + } + } + + if (failures.Count > 0) + { + var failureReason = string.Join("; ", failures.Select(f => $"{f.Feed.Slug}: {f.Error.Message}")); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, failureReason, cancellationToken).ConfigureAwait(false); + throw new AggregateException($"ACSC fetch failed for {failures.Count} feed(s): {failureReason}", failures.Select(f => f.Error)); + } + + var updatedPreference = successPreference ?? preferredEndpoint; + if (_options.ForceRelay) + { + updatedPreference = AcscEndpointPreference.Relay; + } + else if (!IsRelayConfigured) + { + updatedPreference = AcscEndpointPreference.Direct; + } + + var updatedCursor = cursor + .WithPreferredEndpoint(updatedPreference) + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithLastPublished(lastPublished); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + var metadata = AcscDocumentMetadata.FromDocument(document); + var feedTag = string.IsNullOrWhiteSpace(metadata.FeedSlug) ? 
"(unknown)" : metadata.FeedSlug; + + _diagnostics.ParseAttempt(feedTag); + + if (!document.PayloadId.HasValue) + { + _diagnostics.ParseFailure(feedTag, "missingPayload"); + _logger.LogWarning("ACSC document {DocumentId} missing GridFS payload (feed={Feed})", document.Id, feedTag); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + byte[] rawBytes; + try + { + rawBytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.ParseFailure(feedTag, "download"); + _logger.LogError(ex, "ACSC failed to download payload for document {DocumentId} (feed={Feed})", document.Id, feedTag); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + try + { + var parsedAt = _timeProvider.GetUtcNow(); + var dto = AcscFeedParser.Parse(rawBytes, metadata.FeedSlug, parsedAt, _htmlSanitizer); + + var json = JsonSerializer.Serialize(dto, SerializerOptions); + var payload = BsonDocument.Parse(json); + + var existingDto = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); + var dtoRecord = existingDto is null + ? 
new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "acsc.feed.v1", payload, parsedAt) + : existingDto with + { + Payload = payload, + SchemaVersion = "acsc.feed.v1", + ValidatedAt = parsedAt, + }; + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + pendingDocuments.Remove(documentId); + pendingMappings.Add(document.Id); + + _diagnostics.ParseSuccess(feedTag); + _logger.LogInformation("ACSC parsed document {DocumentId} (feed={Feed}, entries={EntryCount})", document.Id, feedTag, dto.Entries.Count); + } + catch (Exception ex) + { + _diagnostics.ParseFailure(feedTag, "parse"); + _logger.LogError(ex, "ACSC parse failed for document {DocumentId} (feed={Feed})", document.Id, feedTag); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var documentIds = cursor.PendingMappings.ToList(); + + foreach (var documentId in documentIds) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dtoRecord is 
null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + AcscFeedDto? feed; + try + { + var dtoJson = dtoRecord.Payload.ToJson(new JsonWriterSettings + { + OutputMode = JsonOutputMode.RelaxedExtendedJson, + }); + + feed = JsonSerializer.Deserialize<AcscFeedDto>(dtoJson, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogError(ex, "ACSC mapping failed to deserialize DTO for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + if (feed is null) + { + _logger.LogWarning("ACSC mapping encountered null DTO payload for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var mappedAt = _timeProvider.GetUtcNow(); + var advisories = AcscMapper.Map(feed, document, dtoRecord, SourceName, mappedAt); + + if (advisories.Count > 0) + { + foreach (var advisory in advisories) + { + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + } + + _diagnostics.MapSuccess(advisories.Count); + _logger.LogInformation( + "ACSC mapped {Count} advisories from document {DocumentId} (feed={Feed})", + advisories.Count, + document.Id, + feed.FeedSlug ?? "(unknown)"); + } + else + { + _logger.LogInformation( + "ACSC mapping produced no advisories for document {DocumentId} (feed={Feed})", + document.Id, + feed.FeedSlug ?? 
"(unknown)"); + } + + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ProbeAsync(CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + + if (_options.ForceRelay) + { + if (cursor.PreferredEndpoint != AcscEndpointPreference.Relay) + { + await UpdateCursorAsync(cursor.WithPreferredEndpoint(AcscEndpointPreference.Relay), cancellationToken).ConfigureAwait(false); + } + return; + } + + if (!IsRelayConfigured) + { + if (cursor.PreferredEndpoint != AcscEndpointPreference.Direct) + { + await UpdateCursorAsync(cursor.WithPreferredEndpoint(AcscEndpointPreference.Direct), cancellationToken).ConfigureAwait(false); + } + return; + } + + var feed = GetEnabledFeeds().FirstOrDefault(); + if (feed is null) + { + return; + } + + var httpClient = _httpClientFactory.CreateClient(AcscOptions.HttpClientName); + httpClient.Timeout = TimeSpan.FromSeconds(15); + + var directUri = BuildFeedUri(feed, AcscFetchMode.Direct); + + try + { + using var headRequest = new HttpRequestMessage(HttpMethod.Head, directUri); + using var response = await httpClient.SendAsync(headRequest, HttpCompletionOption.ResponseHeadersRead, cancellationToken).ConfigureAwait(false); + if (response.IsSuccessStatusCode) + { + if (cursor.PreferredEndpoint != AcscEndpointPreference.Direct) + { + await UpdateCursorAsync(cursor.WithPreferredEndpoint(AcscEndpointPreference.Direct), cancellationToken).ConfigureAwait(false); + _logger.LogInformation("ACSC probe succeeded via direct endpoint ({StatusCode}); relay preference cleared.", (int)response.StatusCode); + } + return; + } + + if (response.StatusCode == HttpStatusCode.MethodNotAllowed) + { + using var probeRequest = new 
HttpRequestMessage(HttpMethod.Get, directUri); + using var probeResponse = await httpClient.SendAsync(probeRequest, HttpCompletionOption.ResponseHeadersRead, cancellationToken).ConfigureAwait(false); + if (probeResponse.IsSuccessStatusCode) + { + if (cursor.PreferredEndpoint != AcscEndpointPreference.Direct) + { + await UpdateCursorAsync(cursor.WithPreferredEndpoint(AcscEndpointPreference.Direct), cancellationToken).ConfigureAwait(false); + _logger.LogInformation("ACSC probe succeeded via direct endpoint after GET fallback ({StatusCode}).", (int)probeResponse.StatusCode); + } + return; + } + } + + _logger.LogWarning("ACSC direct probe returned HTTP {StatusCode}; relay preference enabled.", (int)response.StatusCode); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "ACSC direct probe failed; relay preference will be enabled."); + } + + if (cursor.PreferredEndpoint != AcscEndpointPreference.Relay) + { + await UpdateCursorAsync(cursor.WithPreferredEndpoint(AcscEndpointPreference.Relay), cancellationToken).ConfigureAwait(false); + } + } + + private bool ShouldRetryWithRelay(AcscFetchMode mode) + => mode == AcscFetchMode.Direct && _options.EnableRelayFallback && IsRelayConfigured && !_options.ForceRelay; + + private IEnumerable<AcscFetchMode> BuildFetchOrder(AcscEndpointPreference preference) + { + if (_options.ForceRelay) + { + if (IsRelayConfigured) + { + yield return AcscFetchMode.Relay; + } + yield break; + } + + if (!IsRelayConfigured) + { + yield return AcscFetchMode.Direct; + yield break; + } + + var preferRelay = preference == AcscEndpointPreference.Relay; + if (preference == AcscEndpointPreference.Auto) + { + preferRelay = _options.PreferRelayByDefault; + } + + if (preferRelay) + { + yield return AcscFetchMode.Relay; + if (_options.EnableRelayFallback) + { + yield return AcscFetchMode.Direct; + } + } + else + { + yield return AcscFetchMode.Direct; + if (_options.EnableRelayFallback) + { + yield return AcscFetchMode.Relay; + } + } + } + + private AcscEndpointPreference 
ResolveInitialPreference(AcscCursor cursor) + { + if (_options.ForceRelay) + { + return AcscEndpointPreference.Relay; + } + + if (!IsRelayConfigured) + { + return AcscEndpointPreference.Direct; + } + + if (cursor.PreferredEndpoint != AcscEndpointPreference.Auto) + { + return cursor.PreferredEndpoint; + } + + return _options.PreferRelayByDefault ? AcscEndpointPreference.Relay : AcscEndpointPreference.Direct; + } + + private async Task<DateTimeOffset?> TryComputeLatestPublishedAsync(DocumentRecord document, CancellationToken cancellationToken) + { + if (!document.PayloadId.HasValue) + { + return null; + } + + var rawBytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + if (rawBytes.Length == 0) + { + return null; + } + + try + { + using var memoryStream = new MemoryStream(rawBytes, writable: false); + var xml = XDocument.Load(memoryStream, LoadOptions.None); + + DateTimeOffset? latest = null; + foreach (var element in xml.Descendants()) + { + if (!IsEntryElement(element.Name.LocalName)) + { + continue; + } + + var published = ExtractPublished(element); + if (!published.HasValue) + { + continue; + } + + if (latest is null || published.Value > latest.Value) + { + latest = published; + } + } + + return latest; + } + catch (Exception ex) + { + _logger.LogWarning(ex, "ACSC failed to derive published cursor for document {DocumentId} ({Uri})", document.Id, document.Uri); + return null; + } + } + + private static bool IsEntryElement(string localName) + => string.Equals(localName, "item", StringComparison.OrdinalIgnoreCase) + || string.Equals(localName, "entry", StringComparison.OrdinalIgnoreCase); + + private static DateTimeOffset? 
ExtractPublished(XElement element) + { + foreach (var name in EnumerateTimestampNames(element)) + { + if (DateTimeOffset.TryParse( + name.Value, + CultureInfo.InvariantCulture, + DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeUniversal, + out var parsed)) + { + return parsed.ToUniversalTime(); + } + } + + return null; + } + + private static IEnumerable<XElement> EnumerateTimestampNames(XElement element) + { + foreach (var child in element.Elements()) + { + var localName = child.Name.LocalName; + if (string.Equals(localName, "pubDate", StringComparison.OrdinalIgnoreCase) || + string.Equals(localName, "published", StringComparison.OrdinalIgnoreCase) || + string.Equals(localName, "updated", StringComparison.OrdinalIgnoreCase) || + string.Equals(localName, "date", StringComparison.OrdinalIgnoreCase)) + { + yield return child; + } + } + } + + private Dictionary<string, string> CreateMetadata(AcscFeedOptions feed, AcscCursor cursor, string mode) + { + var metadata = new Dictionary<string, string>(StringComparer.Ordinal) + { + ["acsc.feed.slug"] = feed.Slug, + ["acsc.fetch.mode"] = mode, + }; + + if (cursor.LastPublishedByFeed.TryGetValue(feed.Slug, out var published) && published.HasValue) + { + metadata["acsc.cursor.lastPublished"] = published.Value.ToString("O"); + } + + return metadata; + } + + private Uri BuildFeedUri(AcscFeedOptions feed, AcscFetchMode mode) + { + var baseUri = mode switch + { + AcscFetchMode.Relay when IsRelayConfigured => _options.RelayEndpoint!, + _ => _options.BaseEndpoint, + }; + + return new Uri(baseUri, feed.RelativePath); + } + + private IEnumerable<AcscFeedOptions> GetEnabledFeeds() + => _options.Feeds.Where(feed => feed is { Enabled: true }); + + private Task<AcscCursor> GetCursorAsync(CancellationToken cancellationToken) + => GetCursorCoreAsync(cancellationToken); + + private async Task<AcscCursor> GetCursorCoreAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? 
AcscCursor.Empty : AcscCursor.FromBson(state.Cursor); + } + + private Task UpdateCursorAsync(AcscCursor cursor, CancellationToken cancellationToken) + { + var document = cursor.ToBsonDocument(); + var completedAt = _timeProvider.GetUtcNow(); + return _stateRepository.UpdateCursorAsync(SourceName, document, completedAt, cancellationToken); + } + + private bool IsRelayConfigured => _options.RelayEndpoint is not null; + + private static string ModeName(AcscFetchMode mode) => mode switch + { + AcscFetchMode.Relay => "relay", + _ => "direct", + }; + + private enum AcscFetchMode + { + Direct = 0, + Relay = 1, + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Cccs/CccsConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Cccs/CccsConnector.cs index 2ddcbb14d..aa0d1177d 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Cccs/CccsConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Cccs/CccsConnector.cs @@ -1,30 +1,30 @@ -using System; -using System.Collections.Generic; -using System.Linq; -using System.Net.Http; -using System.Security.Cryptography; -using System.Text.Json; -using System.Text.Json.Serialization; -using System.Threading; -using System.Threading.Tasks; -using System.Globalization; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Connector.Cccs.Configuration; -using StellaOps.Concelier.Connector.Cccs.Internal; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Cccs; - -public sealed class CccsConnector : IFeedConnector -{ +using System; +using System.Collections.Generic; +using System.Linq; +using 
System.Net.Http; +using System.Security.Cryptography; +using System.Text.Json; +using System.Text.Json.Serialization; +using System.Threading; +using System.Threading.Tasks; +using System.Globalization; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Connector.Cccs.Configuration; +using StellaOps.Concelier.Connector.Cccs.Internal; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Cccs; + +public sealed class CccsConnector : IFeedConnector +{ private static readonly JsonSerializerOptions RawSerializerOptions = new(JsonSerializerDefaults.Web) { DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, @@ -37,452 +37,452 @@ public sealed class CccsConnector : IFeedConnector private static readonly Uri CanonicalBaseUri = new("https://www.cyber.gc.ca", UriKind.Absolute); private const string DtoSchemaVersion = "cccs.dto.v1"; - - private readonly CccsFeedClient _feedClient; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly CccsHtmlParser _htmlParser; - private readonly CccsDiagnostics _diagnostics; - private readonly CccsOptions _options; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - - public CccsConnector( - CccsFeedClient feedClient, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - CccsHtmlParser htmlParser, 
- CccsDiagnostics diagnostics, - IOptions options, - TimeProvider? timeProvider, - ILogger logger) - { - _feedClient = feedClient ?? throw new ArgumentNullException(nameof(feedClient)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _htmlParser = htmlParser ?? throw new ArgumentNullException(nameof(htmlParser)); - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => CccsConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var now = _timeProvider.GetUtcNow(); - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var pendingDocuments = new HashSet(cursor.PendingDocuments); - var pendingMappings = new HashSet(cursor.PendingMappings); - var knownHashes = new Dictionary(cursor.KnownEntryHashes, StringComparer.Ordinal); - var feedsProcessed = 0; - var totalItems = 0; - var added = 0; - var unchanged = 0; - - try - { - foreach (var feed in _options.Feeds) - { - cancellationToken.ThrowIfCancellationRequested(); - - _diagnostics.FetchAttempt(); - var result = await _feedClient.FetchAsync(feed, _options.RequestTimeout, cancellationToken).ConfigureAwait(false); - feedsProcessed++; - totalItems += result.Items.Count; - - if (result.Items.Count == 0) - { - _diagnostics.FetchSuccess(); - await DelayBetweenRequestsAsync(cancellationToken).ConfigureAwait(false); - continue; - } - - var items = result.Items - .Where(static item => !string.IsNullOrWhiteSpace(item.Title)) - .OrderByDescending(item => ParseDate(item.DateModifiedTimestamp) ?? ParseDate(item.DateModified) ?? DateTimeOffset.MinValue) - .ThenByDescending(item => ParseDate(item.DateCreated) ?? 
DateTimeOffset.MinValue) - .ToList(); - - foreach (var item in items) - { - cancellationToken.ThrowIfCancellationRequested(); - - var documentUri = BuildDocumentUri(item, feed); - var rawDocument = CreateRawDocument(item, feed, result.AlertTypes); - var payload = JsonSerializer.SerializeToUtf8Bytes(rawDocument, RawSerializerOptions); - var sha = ComputeSha256(payload); - - if (knownHashes.TryGetValue(documentUri, out var existingHash) - && string.Equals(existingHash, sha, StringComparison.Ordinal)) - { - unchanged++; - _diagnostics.FetchUnchanged(); - continue; - } - - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, documentUri, cancellationToken).ConfigureAwait(false); - if (existing is not null - && string.Equals(existing.Sha256, sha, StringComparison.OrdinalIgnoreCase) - && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) - { - knownHashes[documentUri] = sha; - unchanged++; - _diagnostics.FetchUnchanged(); - continue; - } - - var gridFsId = await _rawDocumentStorage.UploadAsync( - SourceName, - documentUri, - payload, - "application/json", - expiresAt: null, - cancellationToken).ConfigureAwait(false); - - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["cccs.language"] = rawDocument.Language, - ["cccs.sourceId"] = rawDocument.SourceId, - }; - - if (!string.IsNullOrWhiteSpace(rawDocument.SerialNumber)) - { - metadata["cccs.serialNumber"] = rawDocument.SerialNumber!; - } - - if (!string.IsNullOrWhiteSpace(rawDocument.AlertType)) - { - metadata["cccs.alertType"] = rawDocument.AlertType!; - } - - var recordId = existing?.Id ?? Guid.NewGuid(); - var record = new DocumentRecord( - recordId, - SourceName, - documentUri, - now, - sha, - DocumentStatuses.PendingParse, - "application/json", - Headers: null, - Metadata: metadata, - Etag: null, - LastModified: rawDocument.Modified ?? rawDocument.Published ?? 
result.LastModifiedUtc, - GridFsId: gridFsId, - ExpiresAt: null); - - var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); - pendingDocuments.Add(upserted.Id); - pendingMappings.Remove(upserted.Id); - knownHashes[documentUri] = sha; - added++; - _diagnostics.FetchDocument(); - - if (added >= _options.MaxEntriesPerFetch) - { - break; - } - } - - _diagnostics.FetchSuccess(); - await DelayBetweenRequestsAsync(cancellationToken).ConfigureAwait(false); - - if (added >= _options.MaxEntriesPerFetch) - { - break; - } - } - } - catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException or JsonException or InvalidOperationException) - { - _diagnostics.FetchFailure(); - _logger.LogError(ex, "CCCS fetch failed"); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - var trimmedHashes = TrimKnownHashes(knownHashes, _options.MaxKnownEntries); - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithKnownEntryHashes(trimmedHashes) - .WithLastFetch(now); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - _logger.LogInformation( - "CCCS fetch completed feeds={Feeds} items={Items} newDocuments={Added} unchanged={Unchanged} pendingDocuments={PendingDocuments} pendingMappings={PendingMappings}", - feedsProcessed, - totalItems, - added, - unchanged, - pendingDocuments.Count, - pendingMappings.Count); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - var now = 
_timeProvider.GetUtcNow(); - var parsed = 0; - var parseFailures = 0; - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - _diagnostics.ParseFailure(); - parseFailures++; - continue; - } - - if (!document.GridFsId.HasValue) - { - _diagnostics.ParseFailure(); - _logger.LogWarning("CCCS document {DocumentId} missing GridFS payload", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - parseFailures++; - continue; - } - - byte[] payload; - try - { - payload = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.ParseFailure(); - _logger.LogError(ex, "CCCS unable to download raw document {DocumentId}", documentId); - throw; - } - - CccsRawAdvisoryDocument? 
raw; - try - { - raw = JsonSerializer.Deserialize(payload, RawSerializerOptions); - } - catch (Exception ex) - { - _diagnostics.ParseFailure(); - _logger.LogWarning(ex, "CCCS failed to deserialize raw document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - parseFailures++; - continue; - } - - if (raw is null) - { - _diagnostics.ParseFailure(); - _logger.LogWarning("CCCS raw document {DocumentId} produced null payload", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - parseFailures++; - continue; - } - - CccsAdvisoryDto dto; - try - { - dto = _htmlParser.Parse(raw); - } - catch (Exception ex) - { - _diagnostics.ParseFailure(); - _logger.LogWarning(ex, "CCCS failed to parse advisory DTO for {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - parseFailures++; - continue; - } - - var dtoJson = JsonSerializer.Serialize(dto, DtoSerializerOptions); - var dtoBson = BsonDocument.Parse(dtoJson); - var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, DtoSchemaVersion, dtoBson, now); - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - pendingDocuments.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - _diagnostics.ParseSuccess(); - parsed++; - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - 
.WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - if (parsed > 0 || parseFailures > 0) - { - _logger.LogInformation( - "CCCS parse completed parsed={Parsed} failures={Failures} pendingDocuments={PendingDocuments} pendingMappings={PendingMappings}", - parsed, - parseFailures, - pendingDocuments.Count, - pendingMappings.Count); - } - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - var mapped = 0; - var mappingFailures = 0; - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingMappings.Remove(documentId); - _diagnostics.MapFailure(); - mappingFailures++; - continue; - } - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - if (dtoRecord is null) - { - _diagnostics.MapFailure(); - _logger.LogWarning("CCCS document {DocumentId} missing DTO payload", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - mappingFailures++; - continue; - } - - CccsAdvisoryDto? 
dto; - try - { - var json = dtoRecord.Payload.ToJson(); - dto = JsonSerializer.Deserialize(json, DtoSerializerOptions); - } - catch (Exception ex) - { - _diagnostics.MapFailure(); - _logger.LogWarning(ex, "CCCS failed to deserialize DTO for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - mappingFailures++; - continue; - } - - if (dto is null) - { - _diagnostics.MapFailure(); - _logger.LogWarning("CCCS DTO for document {DocumentId} evaluated to null", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - mappingFailures++; - continue; - } - - try - { - var advisory = CccsMapper.Map(dto, document, dtoRecord.ValidatedAt); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - _diagnostics.MapSuccess(); - mapped++; - } - catch (Exception ex) - { - _diagnostics.MapFailure(); - _logger.LogError(ex, "CCCS mapping failed for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - mappingFailures++; - } - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - if (mapped > 0 || mappingFailures > 0) - { - _logger.LogInformation( - "CCCS map completed mapped={Mapped} failures={Failures} pendingMappings={PendingMappings}", - mapped, - mappingFailures, - pendingMappings.Count); - } - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await 
_stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? CccsCursor.Empty : CccsCursor.FromBson(state.Cursor); - } - - private Task UpdateCursorAsync(CccsCursor cursor, CancellationToken cancellationToken) - { - var document = cursor.ToBsonDocument(); - var completedAt = cursor.LastFetchAt ?? _timeProvider.GetUtcNow(); - return _stateRepository.UpdateCursorAsync(SourceName, document, completedAt, cancellationToken); - } - - private async Task DelayBetweenRequestsAsync(CancellationToken cancellationToken) - { - if (_options.RequestDelay <= TimeSpan.Zero) - { - return; - } - - try - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - catch (TaskCanceledException) - { - // Ignore cancellation during delay; caller handles. - } - } - + + private readonly CccsFeedClient _feedClient; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly CccsHtmlParser _htmlParser; + private readonly CccsDiagnostics _diagnostics; + private readonly CccsOptions _options; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public CccsConnector( + CccsFeedClient feedClient, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + CccsHtmlParser htmlParser, + CccsDiagnostics diagnostics, + IOptions options, + TimeProvider? timeProvider, + ILogger logger) + { + _feedClient = feedClient ?? throw new ArgumentNullException(nameof(feedClient)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? 
throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _htmlParser = htmlParser ?? throw new ArgumentNullException(nameof(htmlParser)); + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => CccsConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var now = _timeProvider.GetUtcNow(); + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var pendingDocuments = new HashSet(cursor.PendingDocuments); + var pendingMappings = new HashSet(cursor.PendingMappings); + var knownHashes = new Dictionary(cursor.KnownEntryHashes, StringComparer.Ordinal); + var feedsProcessed = 0; + var totalItems = 0; + var added = 0; + var unchanged = 0; + + try + { + foreach (var feed in _options.Feeds) + { + cancellationToken.ThrowIfCancellationRequested(); + + _diagnostics.FetchAttempt(); + var result = await _feedClient.FetchAsync(feed, _options.RequestTimeout, cancellationToken).ConfigureAwait(false); + feedsProcessed++; + totalItems += result.Items.Count; + + if (result.Items.Count == 0) + { + _diagnostics.FetchSuccess(); + await DelayBetweenRequestsAsync(cancellationToken).ConfigureAwait(false); + continue; + } + + var items = result.Items + .Where(static item => !string.IsNullOrWhiteSpace(item.Title)) + .OrderByDescending(item => 
ParseDate(item.DateModifiedTimestamp) ?? ParseDate(item.DateModified) ?? DateTimeOffset.MinValue) + .ThenByDescending(item => ParseDate(item.DateCreated) ?? DateTimeOffset.MinValue) + .ToList(); + + foreach (var item in items) + { + cancellationToken.ThrowIfCancellationRequested(); + + var documentUri = BuildDocumentUri(item, feed); + var rawDocument = CreateRawDocument(item, feed, result.AlertTypes); + var payload = JsonSerializer.SerializeToUtf8Bytes(rawDocument, RawSerializerOptions); + var sha = ComputeSha256(payload); + + if (knownHashes.TryGetValue(documentUri, out var existingHash) + && string.Equals(existingHash, sha, StringComparison.Ordinal)) + { + unchanged++; + _diagnostics.FetchUnchanged(); + continue; + } + + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, documentUri, cancellationToken).ConfigureAwait(false); + if (existing is not null + && string.Equals(existing.Sha256, sha, StringComparison.OrdinalIgnoreCase) + && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) + { + knownHashes[documentUri] = sha; + unchanged++; + _diagnostics.FetchUnchanged(); + continue; + } + + var gridFsId = await _rawDocumentStorage.UploadAsync( + SourceName, + documentUri, + payload, + "application/json", + expiresAt: null, + cancellationToken).ConfigureAwait(false); + + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["cccs.language"] = rawDocument.Language, + ["cccs.sourceId"] = rawDocument.SourceId, + }; + + if (!string.IsNullOrWhiteSpace(rawDocument.SerialNumber)) + { + metadata["cccs.serialNumber"] = rawDocument.SerialNumber!; + } + + if (!string.IsNullOrWhiteSpace(rawDocument.AlertType)) + { + metadata["cccs.alertType"] = rawDocument.AlertType!; + } + + var recordId = existing?.Id ?? 
Guid.NewGuid(); + var record = new DocumentRecord( + recordId, + SourceName, + documentUri, + now, + sha, + DocumentStatuses.PendingParse, + "application/json", + Headers: null, + Metadata: metadata, + Etag: null, + LastModified: rawDocument.Modified ?? rawDocument.Published ?? result.LastModifiedUtc, + PayloadId: gridFsId, + ExpiresAt: null); + + var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); + pendingDocuments.Add(upserted.Id); + pendingMappings.Remove(upserted.Id); + knownHashes[documentUri] = sha; + added++; + _diagnostics.FetchDocument(); + + if (added >= _options.MaxEntriesPerFetch) + { + break; + } + } + + _diagnostics.FetchSuccess(); + await DelayBetweenRequestsAsync(cancellationToken).ConfigureAwait(false); + + if (added >= _options.MaxEntriesPerFetch) + { + break; + } + } + } + catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException or JsonException or InvalidOperationException) + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "CCCS fetch failed"); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + var trimmedHashes = TrimKnownHashes(knownHashes, _options.MaxKnownEntries); + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithKnownEntryHashes(trimmedHashes) + .WithLastFetch(now); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + _logger.LogInformation( + "CCCS fetch completed feeds={Feeds} items={Items} newDocuments={Added} unchanged={Unchanged} pendingDocuments={PendingDocuments} pendingMappings={PendingMappings}", + feedsProcessed, + totalItems, + added, + unchanged, + pendingDocuments.Count, + pendingMappings.Count); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + 
var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + var now = _timeProvider.GetUtcNow(); + var parsed = 0; + var parseFailures = 0; + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + _diagnostics.ParseFailure(); + parseFailures++; + continue; + } + + if (!document.PayloadId.HasValue) + { + _diagnostics.ParseFailure(); + _logger.LogWarning("CCCS document {DocumentId} missing GridFS payload", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + parseFailures++; + continue; + } + + byte[] payload; + try + { + payload = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.ParseFailure(); + _logger.LogError(ex, "CCCS unable to download raw document {DocumentId}", documentId); + throw; + } + + CccsRawAdvisoryDocument? 
raw; + try + { + raw = JsonSerializer.Deserialize(payload, RawSerializerOptions); + } + catch (Exception ex) + { + _diagnostics.ParseFailure(); + _logger.LogWarning(ex, "CCCS failed to deserialize raw document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + parseFailures++; + continue; + } + + if (raw is null) + { + _diagnostics.ParseFailure(); + _logger.LogWarning("CCCS raw document {DocumentId} produced null payload", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + parseFailures++; + continue; + } + + CccsAdvisoryDto dto; + try + { + dto = _htmlParser.Parse(raw); + } + catch (Exception ex) + { + _diagnostics.ParseFailure(); + _logger.LogWarning(ex, "CCCS failed to parse advisory DTO for {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + parseFailures++; + continue; + } + + var dtoJson = JsonSerializer.Serialize(dto, DtoSerializerOptions); + var dtoBson = BsonDocument.Parse(dtoJson); + var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, DtoSchemaVersion, dtoBson, now); + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + pendingDocuments.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + _diagnostics.ParseSuccess(); + parsed++; + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + 
.WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + if (parsed > 0 || parseFailures > 0) + { + _logger.LogInformation( + "CCCS parse completed parsed={Parsed} failures={Failures} pendingDocuments={PendingDocuments} pendingMappings={PendingMappings}", + parsed, + parseFailures, + pendingDocuments.Count, + pendingMappings.Count); + } + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + var mapped = 0; + var mappingFailures = 0; + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingMappings.Remove(documentId); + _diagnostics.MapFailure(); + mappingFailures++; + continue; + } + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + if (dtoRecord is null) + { + _diagnostics.MapFailure(); + _logger.LogWarning("CCCS document {DocumentId} missing DTO payload", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + mappingFailures++; + continue; + } + + CccsAdvisoryDto? 
dto; + try + { + var json = dtoRecord.Payload.ToJson(); + dto = JsonSerializer.Deserialize(json, DtoSerializerOptions); + } + catch (Exception ex) + { + _diagnostics.MapFailure(); + _logger.LogWarning(ex, "CCCS failed to deserialize DTO for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + mappingFailures++; + continue; + } + + if (dto is null) + { + _diagnostics.MapFailure(); + _logger.LogWarning("CCCS DTO for document {DocumentId} evaluated to null", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + mappingFailures++; + continue; + } + + try + { + var advisory = CccsMapper.Map(dto, document, dtoRecord.ValidatedAt); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + _diagnostics.MapSuccess(); + mapped++; + } + catch (Exception ex) + { + _diagnostics.MapFailure(); + _logger.LogError(ex, "CCCS mapping failed for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + mappingFailures++; + } + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + if (mapped > 0 || mappingFailures > 0) + { + _logger.LogInformation( + "CCCS map completed mapped={Mapped} failures={Failures} pendingMappings={PendingMappings}", + mapped, + mappingFailures, + pendingMappings.Count); + } + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await 
_stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? CccsCursor.Empty : CccsCursor.FromBson(state.Cursor); + } + + private Task UpdateCursorAsync(CccsCursor cursor, CancellationToken cancellationToken) + { + var document = cursor.ToBsonDocument(); + var completedAt = cursor.LastFetchAt ?? _timeProvider.GetUtcNow(); + return _stateRepository.UpdateCursorAsync(SourceName, document, completedAt, cancellationToken); + } + + private async Task DelayBetweenRequestsAsync(CancellationToken cancellationToken) + { + if (_options.RequestDelay <= TimeSpan.Zero) + { + return; + } + + try + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + catch (TaskCanceledException) + { + // Ignore cancellation during delay; caller handles. + } + } + private static string BuildDocumentUri(CccsFeedItem item, CccsFeedEndpoint feed) { var candidate = item.Url?.Trim(); @@ -514,107 +514,107 @@ public sealed class CccsConnector : IFeedConnector private static bool IsHttpScheme(string? scheme) => string.Equals(scheme, Uri.UriSchemeHttp, StringComparison.OrdinalIgnoreCase) || string.Equals(scheme, Uri.UriSchemeHttps, StringComparison.OrdinalIgnoreCase); - - private static CccsRawAdvisoryDocument CreateRawDocument(CccsFeedItem item, CccsFeedEndpoint feed, IReadOnlyDictionary taxonomy) - { - var language = string.IsNullOrWhiteSpace(item.Language) ? feed.Language : item.Language!.Trim(); - var identifier = !string.IsNullOrWhiteSpace(item.SerialNumber) - ? item.SerialNumber!.Trim() - : !string.IsNullOrWhiteSpace(item.Uuid) - ? item.Uuid!.Trim() - : $"nid-{item.Nid}"; - - var canonicalUrl = BuildDocumentUri(item, feed); - var bodySegments = item.Body ?? Array.Empty(); - var bodyHtml = string.Join(Environment.NewLine, bodySegments); - var published = ParseDate(item.DateCreated); - var modified = ParseDate(item.DateModifiedTimestamp) ?? 
ParseDate(item.DateModified); - var alertType = ResolveAlertType(item, taxonomy); - - return new CccsRawAdvisoryDocument - { - SourceId = identifier, - SerialNumber = item.SerialNumber?.Trim(), - Uuid = item.Uuid, - Language = language.ToLowerInvariant(), - Title = item.Title?.Trim() ?? identifier, - Summary = item.Summary?.Trim(), - CanonicalUrl = canonicalUrl, - ExternalUrl = item.ExternalUrl, - BodyHtml = bodyHtml, - BodySegments = bodySegments, - AlertType = alertType, - Subject = item.Subject, - Banner = item.Banner, - Published = published, - Modified = modified, - RawDateCreated = item.DateCreated, - RawDateModified = item.DateModifiedTimestamp ?? item.DateModified, - }; - } - - private static string? ResolveAlertType(CccsFeedItem item, IReadOnlyDictionary taxonomy) - { - if (item.AlertType.ValueKind == JsonValueKind.Number) - { - var id = item.AlertType.GetInt32(); - return taxonomy.TryGetValue(id, out var label) ? label : id.ToString(CultureInfo.InvariantCulture); - } - - if (item.AlertType.ValueKind == JsonValueKind.String) - { - return item.AlertType.GetString(); - } - - if (item.AlertType.ValueKind == JsonValueKind.Array) - { - foreach (var element in item.AlertType.EnumerateArray()) - { - if (element.ValueKind == JsonValueKind.Number) - { - var id = element.GetInt32(); - if (taxonomy.TryGetValue(id, out var label)) - { - return label; - } - } - else if (element.ValueKind == JsonValueKind.String) - { - var label = element.GetString(); - if (!string.IsNullOrWhiteSpace(label)) - { - return label; - } - } - } - } - - return null; - } - - private static Dictionary TrimKnownHashes(Dictionary hashes, int maxEntries) - { - if (hashes.Count <= maxEntries) - { - return hashes; - } - - var overflow = hashes.Count - maxEntries; - foreach (var key in hashes.Keys.Take(overflow).ToList()) - { - hashes.Remove(key); - } - - return hashes; - } - - private static DateTimeOffset? ParseDate(string? value) - => string.IsNullOrWhiteSpace(value) - ? 
null - : DateTimeOffset.TryParse(value, CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal, out var parsed) - ? parsed - : null; - - private static string ComputeSha256(byte[] payload) - => Convert.ToHexString(SHA256.HashData(payload)).ToLowerInvariant(); -} + + private static CccsRawAdvisoryDocument CreateRawDocument(CccsFeedItem item, CccsFeedEndpoint feed, IReadOnlyDictionary taxonomy) + { + var language = string.IsNullOrWhiteSpace(item.Language) ? feed.Language : item.Language!.Trim(); + var identifier = !string.IsNullOrWhiteSpace(item.SerialNumber) + ? item.SerialNumber!.Trim() + : !string.IsNullOrWhiteSpace(item.Uuid) + ? item.Uuid!.Trim() + : $"nid-{item.Nid}"; + + var canonicalUrl = BuildDocumentUri(item, feed); + var bodySegments = item.Body ?? Array.Empty(); + var bodyHtml = string.Join(Environment.NewLine, bodySegments); + var published = ParseDate(item.DateCreated); + var modified = ParseDate(item.DateModifiedTimestamp) ?? ParseDate(item.DateModified); + var alertType = ResolveAlertType(item, taxonomy); + + return new CccsRawAdvisoryDocument + { + SourceId = identifier, + SerialNumber = item.SerialNumber?.Trim(), + Uuid = item.Uuid, + Language = language.ToLowerInvariant(), + Title = item.Title?.Trim() ?? identifier, + Summary = item.Summary?.Trim(), + CanonicalUrl = canonicalUrl, + ExternalUrl = item.ExternalUrl, + BodyHtml = bodyHtml, + BodySegments = bodySegments, + AlertType = alertType, + Subject = item.Subject, + Banner = item.Banner, + Published = published, + Modified = modified, + RawDateCreated = item.DateCreated, + RawDateModified = item.DateModifiedTimestamp ?? item.DateModified, + }; + } + + private static string? ResolveAlertType(CccsFeedItem item, IReadOnlyDictionary taxonomy) + { + if (item.AlertType.ValueKind == JsonValueKind.Number) + { + var id = item.AlertType.GetInt32(); + return taxonomy.TryGetValue(id, out var label) ? 
label : id.ToString(CultureInfo.InvariantCulture); + } + + if (item.AlertType.ValueKind == JsonValueKind.String) + { + return item.AlertType.GetString(); + } + + if (item.AlertType.ValueKind == JsonValueKind.Array) + { + foreach (var element in item.AlertType.EnumerateArray()) + { + if (element.ValueKind == JsonValueKind.Number) + { + var id = element.GetInt32(); + if (taxonomy.TryGetValue(id, out var label)) + { + return label; + } + } + else if (element.ValueKind == JsonValueKind.String) + { + var label = element.GetString(); + if (!string.IsNullOrWhiteSpace(label)) + { + return label; + } + } + } + } + + return null; + } + + private static Dictionary TrimKnownHashes(Dictionary hashes, int maxEntries) + { + if (hashes.Count <= maxEntries) + { + return hashes; + } + + var overflow = hashes.Count - maxEntries; + foreach (var key in hashes.Keys.Take(overflow).ToList()) + { + hashes.Remove(key); + } + + return hashes; + } + + private static DateTimeOffset? ParseDate(string? value) + => string.IsNullOrWhiteSpace(value) + ? null + : DateTimeOffset.TryParse(value, CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal, out var parsed) + ? 
parsed + : null; + + private static string ComputeSha256(byte[] payload) + => Convert.ToHexString(SHA256.HashData(payload)).ToLowerInvariant(); +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertBund/CertBundConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertBund/CertBundConnector.cs index 2c19b78b4..64ecb1d6f 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertBund/CertBundConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertBund/CertBundConnector.cs @@ -245,7 +245,7 @@ public sealed class CertBundConnector : IFeedConnector continue; } - if (!document.GridFsId.HasValue) + if (!document.PayloadId.HasValue) { await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); remainingDocuments.Remove(documentId); @@ -258,7 +258,7 @@ public sealed class CertBundConnector : IFeedConnector byte[] payload; try { - payload = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); + payload = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); } catch (Exception ex) { diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertCc/CertCcConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertCc/CertCcConnector.cs index bdfc56d5c..448d10b9f 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertCc/CertCcConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertCc/CertCcConnector.cs @@ -1,779 +1,779 @@ -using System.Collections.Generic; -using System.Globalization; -using System.Net; -using System.Linq; -using System.Net.Http; -using System.Text; -using System.Text.Json; -using System.Text.Json.Serialization; -using System.Threading; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using 
StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.CertCc.Configuration; -using StellaOps.Concelier.Connector.CertCc.Internal; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.CertCc; - -public sealed class CertCcConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions DtoSerializerOptions = new(JsonSerializerDefaults.Web) - { - DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, - PropertyNamingPolicy = JsonNamingPolicy.CamelCase, - WriteIndented = false, - }; - - private static readonly byte[] EmptyArrayPayload = Encoding.UTF8.GetBytes("[]"); - private static readonly string[] DetailEndpoints = { "note", "vendors", "vuls", "vendors-vuls" }; - - private readonly CertCcSummaryPlanner _summaryPlanner; - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly CertCcOptions _options; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - private readonly CertCcDiagnostics _diagnostics; - - public CertCcConnector( - CertCcSummaryPlanner summaryPlanner, - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions<CertCcOptions> options, - CertCcDiagnostics diagnostics, - TimeProvider? timeProvider, - ILogger logger) - { - _summaryPlanner = summaryPlanner ?? 
throw new ArgumentNullException(nameof(summaryPlanner)); - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => CertCcConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var pendingNotes = new HashSet<string>(cursor.PendingNotes, StringComparer.OrdinalIgnoreCase); - var processedNotes = new HashSet<string>(StringComparer.OrdinalIgnoreCase); - - var now = _timeProvider.GetUtcNow(); - var remainingBudget = _options.MaxNotesPerFetch; - - // Resume notes that previously failed before fetching new summaries. 
- if (pendingNotes.Count > 0 && remainingBudget > 0) - { - var replay = pendingNotes.ToArray(); - foreach (var noteId in replay) - { - if (remainingBudget <= 0) - { - break; - } - - try - { - if (!processedNotes.Add(noteId)) - { - continue; - } - - if (await HasPendingDocumentBundleAsync(noteId, pendingDocuments, cancellationToken).ConfigureAwait(false)) - { - pendingNotes.Remove(noteId); - continue; - } - - await FetchNoteBundleAsync(noteId, null, pendingDocuments, pendingNotes, cancellationToken).ConfigureAwait(false); - if (!pendingNotes.Contains(noteId)) - { - remainingBudget--; - } - } - catch (Exception ex) - { - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - } - } - - var plan = _summaryPlanner.CreatePlan(cursor.SummaryState); - _diagnostics.PlanEvaluated(plan.Window, plan.Requests.Count); - - try - { - foreach (var request in plan.Requests) - { - cancellationToken.ThrowIfCancellationRequested(); - var shouldProcessNotes = remainingBudget > 0; - - try - { - _diagnostics.SummaryFetchAttempt(request.Scope); - var metadata = BuildSummaryMetadata(request); - var existingSummary = await _documentStore.FindBySourceAndUriAsync(SourceName, request.Uri.ToString(), cancellationToken).ConfigureAwait(false); - var fetchRequest = new SourceFetchRequest( - CertCcOptions.HttpClientName, - SourceName, - HttpMethod.Get, - request.Uri, - metadata, - existingSummary?.Etag, - existingSummary?.LastModified, - null, - new[] { "application/json" }); - - var result = await _fetchService.FetchAsync(fetchRequest, cancellationToken).ConfigureAwait(false); - if (result.IsNotModified) - { - _diagnostics.SummaryFetchUnchanged(request.Scope); - continue; - } - - if (!result.IsSuccess || result.Document is null) - { - _diagnostics.SummaryFetchFailure(request.Scope); - continue; - } - - _diagnostics.SummaryFetchSuccess(request.Scope); - - if (!shouldProcessNotes) - { - continue; - } - - 
var noteTokens = await ReadSummaryNotesAsync(result.Document, cancellationToken).ConfigureAwait(false); - foreach (var token in noteTokens) - { - if (remainingBudget <= 0) - { - break; - } - - var noteId = TryNormalizeNoteToken(token, out var vuIdentifier); - if (string.IsNullOrEmpty(noteId)) - { - continue; - } - - if (!processedNotes.Add(noteId)) - { - continue; - } - - await FetchNoteBundleAsync(noteId, vuIdentifier, pendingDocuments, pendingNotes, cancellationToken).ConfigureAwait(false); - if (!pendingNotes.Contains(noteId)) - { - remainingBudget--; - } - } - } - catch - { - _diagnostics.SummaryFetchFailure(request.Scope); - throw; - } - } - } - catch (Exception ex) - { - var failureCursor = cursor - .WithPendingSummaries(Array.Empty()) - .WithPendingNotes(pendingNotes) - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithLastRun(now); - - await UpdateCursorAsync(failureCursor, cancellationToken).ConfigureAwait(false); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - var updatedCursor = cursor - .WithSummaryState(plan.NextState) - .WithPendingSummaries(Array.Empty()) - .WithPendingNotes(pendingNotes) - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithLastRun(now); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task HasPendingDocumentBundleAsync(string noteId, HashSet pendingDocuments, CancellationToken cancellationToken) - { - if (pendingDocuments.Count == 0) - { - return false; - } - - var required = new HashSet(DetailEndpoints, StringComparer.OrdinalIgnoreCase); - - foreach (var documentId in pendingDocuments) - { - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document?.Metadata is null) - { - continue; - } - - if (!document.Metadata.TryGetValue("certcc.noteId", 
out var metadataNoteId) || - !string.Equals(metadataNoteId, noteId, StringComparison.OrdinalIgnoreCase)) - { - continue; - } - - var endpoint = document.Metadata.TryGetValue("certcc.endpoint", out var endpointValue) - ? endpointValue - : "note"; - - required.Remove(endpoint); - if (required.Count == 0) - { - return true; - } - } - - return false; - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - if (!_options.EnableDetailMapping) - { - return; - } - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - - var groups = new Dictionary(StringComparer.OrdinalIgnoreCase); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingDocuments.Remove(documentId); - continue; - } - - if (!TryGetMetadata(document, "certcc.noteId", out var noteId) || string.IsNullOrWhiteSpace(noteId)) - { - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - continue; - } - - var endpoint = TryGetMetadata(document, "certcc.endpoint", out var endpointValue) - ? endpointValue - : "note"; - - var group = groups.TryGetValue(noteId, out var existing) - ? 
existing - : (groups[noteId] = new NoteDocumentGroup(noteId)); - - group.Add(endpoint, document); - } - - foreach (var group in groups.Values) - { - cancellationToken.ThrowIfCancellationRequested(); - - if (group.Note is null) - { - continue; - } - - try - { - var noteBytes = await DownloadDocumentAsync(group.Note, cancellationToken).ConfigureAwait(false); - var vendorsBytes = group.Vendors is null - ? EmptyArrayPayload - : await DownloadDocumentAsync(group.Vendors, cancellationToken).ConfigureAwait(false); - var vulsBytes = group.Vuls is null - ? EmptyArrayPayload - : await DownloadDocumentAsync(group.Vuls, cancellationToken).ConfigureAwait(false); - var vendorStatusesBytes = group.VendorStatuses is null - ? EmptyArrayPayload - : await DownloadDocumentAsync(group.VendorStatuses, cancellationToken).ConfigureAwait(false); - - var dto = CertCcNoteParser.Parse(noteBytes, vendorsBytes, vulsBytes, vendorStatusesBytes); - var json = JsonSerializer.Serialize(dto, DtoSerializerOptions); - var payload = MongoDB.Bson.BsonDocument.Parse(json); - - _diagnostics.ParseSuccess( - dto.Vendors.Count, - dto.VendorStatuses.Count, - dto.Vulnerabilities.Count); - - var dtoRecord = new DtoRecord( - Guid.NewGuid(), - group.Note.Id, - SourceName, - "certcc.vince.note.v1", - payload, - _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(group.Note.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - pendingMappings.Add(group.Note.Id); - pendingDocuments.Remove(group.Note.Id); - - if (group.Vendors is not null) - { - await _documentStore.UpdateStatusAsync(group.Vendors.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(group.Vendors.Id); - } - - if (group.Vuls is not null) - { - await _documentStore.UpdateStatusAsync(group.Vuls.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - 
pendingDocuments.Remove(group.Vuls.Id); - } - - if (group.VendorStatuses is not null) - { - await _documentStore.UpdateStatusAsync(group.VendorStatuses.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(group.VendorStatuses.Id); - } - } - catch (Exception ex) - { - _diagnostics.ParseFailure(); - _logger.LogError(ex, "CERT/CC parse failed for note {NoteId}", group.NoteId); - if (group.Note is not null) - { - await _documentStore.UpdateStatusAsync(group.Note.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(group.Note.Id); - } - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - if (!_options.EnableDetailMapping) - { - return; - } - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToHashSet(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dtoRecord is null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - try - { - var json = dtoRecord.Payload.ToJson(); - var dto = JsonSerializer.Deserialize(json, DtoSerializerOptions); - if (dto is null) - { - throw new InvalidOperationException($"CERT/CC DTO payload deserialized as null for document {documentId}."); - } - - var advisory = CertCcMapper.Map(dto, document, dtoRecord, SourceName); - var affectedCount = 
advisory.AffectedPackages.Length; - var normalizedRuleCount = advisory.AffectedPackages.Sum(static package => package.NormalizedVersions.Length); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - _diagnostics.MapSuccess(affectedCount, normalizedRuleCount); - } - catch (Exception ex) - { - _diagnostics.MapFailure(); - _logger.LogError(ex, "CERT/CC mapping failed for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - } - - pendingMappings.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task FetchNoteBundleAsync( - string noteId, - string? vuIdentifier, - HashSet pendingDocuments, - HashSet pendingNotes, - CancellationToken cancellationToken) - { - var missingEndpoints = new List<(string Endpoint, HttpStatusCode? 
Status)>(); - - try - { - foreach (var endpoint in DetailEndpoints) - { - cancellationToken.ThrowIfCancellationRequested(); - - var uri = BuildDetailUri(noteId, endpoint); - var metadata = BuildDetailMetadata(noteId, vuIdentifier, endpoint); - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, uri.ToString(), cancellationToken).ConfigureAwait(false); - - var request = new SourceFetchRequest(CertCcOptions.HttpClientName, SourceName, uri) - { - Metadata = metadata, - ETag = existing?.Etag, - LastModified = existing?.LastModified, - AcceptHeaders = new[] { "application/json" }, - }; - - SourceFetchResult result; - _diagnostics.DetailFetchAttempt(endpoint); - try - { - result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - } - catch (HttpRequestException httpEx) - { - var status = httpEx.StatusCode ?? TryParseStatusCodeFromMessage(httpEx.Message); - if (ShouldTreatAsMissing(status, endpoint)) - { - _diagnostics.DetailFetchMissing(endpoint); - missingEndpoints.Add((endpoint, status)); - continue; - } - - _diagnostics.DetailFetchFailure(endpoint); - throw; - } - Guid documentId; - if (result.IsSuccess && result.Document is not null) - { - _diagnostics.DetailFetchSuccess(endpoint); - documentId = result.Document.Id; - } - else if (result.IsNotModified) - { - _diagnostics.DetailFetchUnchanged(endpoint); - if (existing is null) - { - continue; - } - - documentId = existing.Id; - } - else - { - _diagnostics.DetailFetchFailure(endpoint); - _logger.LogWarning( - "CERT/CC detail endpoint {Endpoint} returned {StatusCode} for note {NoteId}; will retry.", - endpoint, - (int)result.StatusCode, - noteId); - - throw new HttpRequestException( - $"CERT/CC endpoint '{endpoint}' returned {(int)result.StatusCode} ({result.StatusCode}) for note {noteId}.", - null, - result.StatusCode); - } - - pendingDocuments.Add(documentId); - - if (_options.DetailRequestDelay > TimeSpan.Zero) - { - await Task.Delay(_options.DetailRequestDelay, 
cancellationToken).ConfigureAwait(false); - } - } - - if (missingEndpoints.Count > 0) - { - var formatted = string.Join( - ", ", - missingEndpoints.Select(item => - item.Status.HasValue - ? $"{item.Endpoint} ({(int)item.Status.Value})" - : item.Endpoint)); - - _logger.LogWarning( - "CERT/CC detail fetch completed with missing endpoints for note {NoteId}: {Endpoints}", - noteId, - formatted); - } - - pendingNotes.Remove(noteId); - } - catch (Exception ex) - { - _logger.LogError(ex, "CERT/CC detail fetch failed for note {NoteId}", noteId); - pendingNotes.Add(noteId); - throw; - } - } - - private static Dictionary BuildSummaryMetadata(CertCcSummaryRequest request) - { - var metadata = new Dictionary(StringComparer.OrdinalIgnoreCase) - { - ["certcc.scope"] = request.Scope.ToString().ToLowerInvariant(), - ["certcc.year"] = request.Year.ToString("D4", CultureInfo.InvariantCulture), - }; - - if (request.Month.HasValue) - { - metadata["certcc.month"] = request.Month.Value.ToString("D2", CultureInfo.InvariantCulture); - } - - return metadata; - } - - private static Dictionary BuildDetailMetadata(string noteId, string? 
vuIdentifier, string endpoint) - { - var metadata = new Dictionary(StringComparer.OrdinalIgnoreCase) - { - ["certcc.endpoint"] = endpoint, - ["certcc.noteId"] = noteId, - }; - - if (!string.IsNullOrWhiteSpace(vuIdentifier)) - { - metadata["certcc.vuid"] = vuIdentifier; - } - - return metadata; - } - - private async Task> ReadSummaryNotesAsync(DocumentRecord document, CancellationToken cancellationToken) - { - if (!document.GridFsId.HasValue) - { - return Array.Empty(); - } - - var payload = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - return CertCcSummaryParser.ParseNotes(payload); - } - - private async Task DownloadDocumentAsync(DocumentRecord document, CancellationToken cancellationToken) - { - if (!document.GridFsId.HasValue) - { - throw new InvalidOperationException($"Document {document.Id} has no GridFS payload."); - } - - return await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - - private Uri BuildDetailUri(string noteId, string endpoint) - { - var suffix = endpoint switch - { - "note" => $"{noteId}/", - "vendors" => $"{noteId}/vendors/", - "vuls" => $"{noteId}/vuls/", - "vendors-vuls" => $"{noteId}/vendors/vuls/", - _ => $"{noteId}/", - }; - - return new Uri(_options.BaseApiUri, suffix); - } - - private static string? TryNormalizeNoteToken(string token, out string? vuIdentifier) - { - vuIdentifier = null; - if (string.IsNullOrWhiteSpace(token)) - { - return null; - } - - var trimmed = token.Trim(); - var digits = new string(trimmed.Where(char.IsDigit).ToArray()); - if (digits.Length == 0) - { - return null; - } - - vuIdentifier = trimmed.StartsWith("vu", StringComparison.OrdinalIgnoreCase) - ? 
trimmed.Replace(" ", string.Empty, StringComparison.Ordinal) - : $"VU#{digits}"; - - return digits; - } - - private static bool TryGetMetadata(DocumentRecord document, string key, out string value) - { - value = string.Empty; - if (document.Metadata is null) - { - return false; - } - - if (!document.Metadata.TryGetValue(key, out var metadataValue)) - { - return false; - } - - value = metadataValue; - return true; - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return CertCcCursor.FromBson(record?.Cursor); - } - - private async Task UpdateCursorAsync(CertCcCursor cursor, CancellationToken cancellationToken) - { - var completedAt = _timeProvider.GetUtcNow(); - await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false); - } - - private sealed class NoteDocumentGroup - { - public NoteDocumentGroup(string noteId) - { - NoteId = noteId; - } - - public string NoteId { get; } - - public DocumentRecord? Note { get; private set; } - - public DocumentRecord? Vendors { get; private set; } - - public DocumentRecord? Vuls { get; private set; } - - public DocumentRecord? VendorStatuses { get; private set; } - - public void Add(string endpoint, DocumentRecord document) - { - switch (endpoint) - { - case "note": - Note = document; - break; - case "vendors": - Vendors = document; - break; - case "vuls": - Vuls = document; - break; - case "vendors-vuls": - VendorStatuses = document; - break; - default: - Note ??= document; - break; - } - } - } - - private static bool ShouldTreatAsMissing(HttpStatusCode? 
statusCode, string endpoint) - { - if (statusCode is null) - { - return false; - } - - if (statusCode is HttpStatusCode.NotFound or HttpStatusCode.Gone) - { - return !string.Equals(endpoint, "note", StringComparison.OrdinalIgnoreCase); - } - - // Treat vendors/vendors-vuls/vuls 403 as optional air-gapped responses. - if (statusCode == HttpStatusCode.Forbidden && !string.Equals(endpoint, "note", StringComparison.OrdinalIgnoreCase)) - { - return true; - } - - return false; - } - - private static HttpStatusCode? TryParseStatusCodeFromMessage(string? message) - { - if (string.IsNullOrWhiteSpace(message)) - { - return null; - } - - const string marker = "status "; - var index = message.IndexOf(marker, StringComparison.OrdinalIgnoreCase); - if (index < 0) - { - return null; - } - - index += marker.Length; - var end = index; - while (end < message.Length && char.IsDigit(message[end])) - { - end++; - } - - if (end == index) - { - return null; - } - - if (int.TryParse(message[index..end], NumberStyles.Integer, CultureInfo.InvariantCulture, out var code) && - Enum.IsDefined(typeof(HttpStatusCode), code)) - { - return (HttpStatusCode)code; - } - - return null; - } -} +using System.Collections.Generic; +using System.Globalization; +using System.Net; +using System.Linq; +using System.Net.Http; +using System.Text; +using System.Text.Json; +using System.Text.Json.Serialization; +using System.Threading; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.CertCc.Configuration; +using StellaOps.Concelier.Connector.CertCc.Internal; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Plugin; + +namespace 
StellaOps.Concelier.Connector.CertCc; + +public sealed class CertCcConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions DtoSerializerOptions = new(JsonSerializerDefaults.Web) + { + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + WriteIndented = false, + }; + + private static readonly byte[] EmptyArrayPayload = Encoding.UTF8.GetBytes("[]"); + private static readonly string[] DetailEndpoints = { "note", "vendors", "vuls", "vendors-vuls" }; + + private readonly CertCcSummaryPlanner _summaryPlanner; + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly CertCcOptions _options; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + private readonly CertCcDiagnostics _diagnostics; + + public CertCcConnector( + CertCcSummaryPlanner summaryPlanner, + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions<CertCcOptions> options, + CertCcDiagnostics diagnostics, + TimeProvider? timeProvider, + ILogger logger) + { + _summaryPlanner = summaryPlanner ?? throw new ArgumentNullException(nameof(summaryPlanner)); + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? 
throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => CertCcConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var pendingNotes = new HashSet<string>(cursor.PendingNotes, StringComparer.OrdinalIgnoreCase); + var processedNotes = new HashSet<string>(StringComparer.OrdinalIgnoreCase); + + var now = _timeProvider.GetUtcNow(); + var remainingBudget = _options.MaxNotesPerFetch; + + // Resume notes that previously failed before fetching new summaries. 
+ if (pendingNotes.Count > 0 && remainingBudget > 0) + { + var replay = pendingNotes.ToArray(); + foreach (var noteId in replay) + { + if (remainingBudget <= 0) + { + break; + } + + try + { + if (!processedNotes.Add(noteId)) + { + continue; + } + + if (await HasPendingDocumentBundleAsync(noteId, pendingDocuments, cancellationToken).ConfigureAwait(false)) + { + pendingNotes.Remove(noteId); + continue; + } + + await FetchNoteBundleAsync(noteId, null, pendingDocuments, pendingNotes, cancellationToken).ConfigureAwait(false); + if (!pendingNotes.Contains(noteId)) + { + remainingBudget--; + } + } + catch (Exception ex) + { + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + } + } + + var plan = _summaryPlanner.CreatePlan(cursor.SummaryState); + _diagnostics.PlanEvaluated(plan.Window, plan.Requests.Count); + + try + { + foreach (var request in plan.Requests) + { + cancellationToken.ThrowIfCancellationRequested(); + var shouldProcessNotes = remainingBudget > 0; + + try + { + _diagnostics.SummaryFetchAttempt(request.Scope); + var metadata = BuildSummaryMetadata(request); + var existingSummary = await _documentStore.FindBySourceAndUriAsync(SourceName, request.Uri.ToString(), cancellationToken).ConfigureAwait(false); + var fetchRequest = new SourceFetchRequest( + CertCcOptions.HttpClientName, + SourceName, + HttpMethod.Get, + request.Uri, + metadata, + existingSummary?.Etag, + existingSummary?.LastModified, + null, + new[] { "application/json" }); + + var result = await _fetchService.FetchAsync(fetchRequest, cancellationToken).ConfigureAwait(false); + if (result.IsNotModified) + { + _diagnostics.SummaryFetchUnchanged(request.Scope); + continue; + } + + if (!result.IsSuccess || result.Document is null) + { + _diagnostics.SummaryFetchFailure(request.Scope); + continue; + } + + _diagnostics.SummaryFetchSuccess(request.Scope); + + if (!shouldProcessNotes) + { + continue; + } + + 
var noteTokens = await ReadSummaryNotesAsync(result.Document, cancellationToken).ConfigureAwait(false); + foreach (var token in noteTokens) + { + if (remainingBudget <= 0) + { + break; + } + + var noteId = TryNormalizeNoteToken(token, out var vuIdentifier); + if (string.IsNullOrEmpty(noteId)) + { + continue; + } + + if (!processedNotes.Add(noteId)) + { + continue; + } + + await FetchNoteBundleAsync(noteId, vuIdentifier, pendingDocuments, pendingNotes, cancellationToken).ConfigureAwait(false); + if (!pendingNotes.Contains(noteId)) + { + remainingBudget--; + } + } + } + catch + { + _diagnostics.SummaryFetchFailure(request.Scope); + throw; + } + } + } + catch (Exception ex) + { + var failureCursor = cursor + .WithPendingSummaries(Array.Empty()) + .WithPendingNotes(pendingNotes) + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithLastRun(now); + + await UpdateCursorAsync(failureCursor, cancellationToken).ConfigureAwait(false); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + var updatedCursor = cursor + .WithSummaryState(plan.NextState) + .WithPendingSummaries(Array.Empty()) + .WithPendingNotes(pendingNotes) + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithLastRun(now); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task HasPendingDocumentBundleAsync(string noteId, HashSet pendingDocuments, CancellationToken cancellationToken) + { + if (pendingDocuments.Count == 0) + { + return false; + } + + var required = new HashSet(DetailEndpoints, StringComparer.OrdinalIgnoreCase); + + foreach (var documentId in pendingDocuments) + { + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document?.Metadata is null) + { + continue; + } + + if (!document.Metadata.TryGetValue("certcc.noteId", 
out var metadataNoteId) || + !string.Equals(metadataNoteId, noteId, StringComparison.OrdinalIgnoreCase)) + { + continue; + } + + var endpoint = document.Metadata.TryGetValue("certcc.endpoint", out var endpointValue) + ? endpointValue + : "note"; + + required.Remove(endpoint); + if (required.Count == 0) + { + return true; + } + } + + return false; + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + if (!_options.EnableDetailMapping) + { + return; + } + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + + var groups = new Dictionary(StringComparer.OrdinalIgnoreCase); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingDocuments.Remove(documentId); + continue; + } + + if (!TryGetMetadata(document, "certcc.noteId", out var noteId) || string.IsNullOrWhiteSpace(noteId)) + { + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + continue; + } + + var endpoint = TryGetMetadata(document, "certcc.endpoint", out var endpointValue) + ? endpointValue + : "note"; + + var group = groups.TryGetValue(noteId, out var existing) + ? 
existing + : (groups[noteId] = new NoteDocumentGroup(noteId)); + + group.Add(endpoint, document); + } + + foreach (var group in groups.Values) + { + cancellationToken.ThrowIfCancellationRequested(); + + if (group.Note is null) + { + continue; + } + + try + { + var noteBytes = await DownloadDocumentAsync(group.Note, cancellationToken).ConfigureAwait(false); + var vendorsBytes = group.Vendors is null + ? EmptyArrayPayload + : await DownloadDocumentAsync(group.Vendors, cancellationToken).ConfigureAwait(false); + var vulsBytes = group.Vuls is null + ? EmptyArrayPayload + : await DownloadDocumentAsync(group.Vuls, cancellationToken).ConfigureAwait(false); + var vendorStatusesBytes = group.VendorStatuses is null + ? EmptyArrayPayload + : await DownloadDocumentAsync(group.VendorStatuses, cancellationToken).ConfigureAwait(false); + + var dto = CertCcNoteParser.Parse(noteBytes, vendorsBytes, vulsBytes, vendorStatusesBytes); + var json = JsonSerializer.Serialize(dto, DtoSerializerOptions); + var payload = MongoDB.Bson.BsonDocument.Parse(json); + + _diagnostics.ParseSuccess( + dto.Vendors.Count, + dto.VendorStatuses.Count, + dto.Vulnerabilities.Count); + + var dtoRecord = new DtoRecord( + Guid.NewGuid(), + group.Note.Id, + SourceName, + "certcc.vince.note.v1", + payload, + _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(group.Note.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + pendingMappings.Add(group.Note.Id); + pendingDocuments.Remove(group.Note.Id); + + if (group.Vendors is not null) + { + await _documentStore.UpdateStatusAsync(group.Vendors.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(group.Vendors.Id); + } + + if (group.Vuls is not null) + { + await _documentStore.UpdateStatusAsync(group.Vuls.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + 
pendingDocuments.Remove(group.Vuls.Id); + } + + if (group.VendorStatuses is not null) + { + await _documentStore.UpdateStatusAsync(group.VendorStatuses.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(group.VendorStatuses.Id); + } + } + catch (Exception ex) + { + _diagnostics.ParseFailure(); + _logger.LogError(ex, "CERT/CC parse failed for note {NoteId}", group.NoteId); + if (group.Note is not null) + { + await _documentStore.UpdateStatusAsync(group.Note.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(group.Note.Id); + } + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + if (!_options.EnableDetailMapping) + { + return; + } + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToHashSet(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dtoRecord is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + try + { + var json = dtoRecord.Payload.ToJson(); + var dto = JsonSerializer.Deserialize(json, DtoSerializerOptions); + if (dto is null) + { + throw new InvalidOperationException($"CERT/CC DTO payload deserialized as null for document {documentId}."); + } + + var advisory = CertCcMapper.Map(dto, document, dtoRecord, SourceName); + var affectedCount = 
advisory.AffectedPackages.Length; + var normalizedRuleCount = advisory.AffectedPackages.Sum(static package => package.NormalizedVersions.Length); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + _diagnostics.MapSuccess(affectedCount, normalizedRuleCount); + } + catch (Exception ex) + { + _diagnostics.MapFailure(); + _logger.LogError(ex, "CERT/CC mapping failed for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + } + + pendingMappings.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task FetchNoteBundleAsync( + string noteId, + string? vuIdentifier, + HashSet<Guid> pendingDocuments, + HashSet<string> pendingNotes, + CancellationToken cancellationToken) + { + var missingEndpoints = new List<(string Endpoint, HttpStatusCode? 
Status)>(); + + try + { + foreach (var endpoint in DetailEndpoints) + { + cancellationToken.ThrowIfCancellationRequested(); + + var uri = BuildDetailUri(noteId, endpoint); + var metadata = BuildDetailMetadata(noteId, vuIdentifier, endpoint); + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, uri.ToString(), cancellationToken).ConfigureAwait(false); + + var request = new SourceFetchRequest(CertCcOptions.HttpClientName, SourceName, uri) + { + Metadata = metadata, + ETag = existing?.Etag, + LastModified = existing?.LastModified, + AcceptHeaders = new[] { "application/json" }, + }; + + SourceFetchResult result; + _diagnostics.DetailFetchAttempt(endpoint); + try + { + result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + } + catch (HttpRequestException httpEx) + { + var status = httpEx.StatusCode ?? TryParseStatusCodeFromMessage(httpEx.Message); + if (ShouldTreatAsMissing(status, endpoint)) + { + _diagnostics.DetailFetchMissing(endpoint); + missingEndpoints.Add((endpoint, status)); + continue; + } + + _diagnostics.DetailFetchFailure(endpoint); + throw; + } + Guid documentId; + if (result.IsSuccess && result.Document is not null) + { + _diagnostics.DetailFetchSuccess(endpoint); + documentId = result.Document.Id; + } + else if (result.IsNotModified) + { + _diagnostics.DetailFetchUnchanged(endpoint); + if (existing is null) + { + continue; + } + + documentId = existing.Id; + } + else + { + _diagnostics.DetailFetchFailure(endpoint); + _logger.LogWarning( + "CERT/CC detail endpoint {Endpoint} returned {StatusCode} for note {NoteId}; will retry.", + endpoint, + (int)result.StatusCode, + noteId); + + throw new HttpRequestException( + $"CERT/CC endpoint '{endpoint}' returned {(int)result.StatusCode} ({result.StatusCode}) for note {noteId}.", + null, + result.StatusCode); + } + + pendingDocuments.Add(documentId); + + if (_options.DetailRequestDelay > TimeSpan.Zero) + { + await Task.Delay(_options.DetailRequestDelay, 
cancellationToken).ConfigureAwait(false); + } + } + + if (missingEndpoints.Count > 0) + { + var formatted = string.Join( + ", ", + missingEndpoints.Select(item => + item.Status.HasValue + ? $"{item.Endpoint} ({(int)item.Status.Value})" + : item.Endpoint)); + + _logger.LogWarning( + "CERT/CC detail fetch completed with missing endpoints for note {NoteId}: {Endpoints}", + noteId, + formatted); + } + + pendingNotes.Remove(noteId); + } + catch (Exception ex) + { + _logger.LogError(ex, "CERT/CC detail fetch failed for note {NoteId}", noteId); + pendingNotes.Add(noteId); + throw; + } + } + + private static Dictionary<string, string> BuildSummaryMetadata(CertCcSummaryRequest request) + { + var metadata = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) + { + ["certcc.scope"] = request.Scope.ToString().ToLowerInvariant(), + ["certcc.year"] = request.Year.ToString("D4", CultureInfo.InvariantCulture), + }; + + if (request.Month.HasValue) + { + metadata["certcc.month"] = request.Month.Value.ToString("D2", CultureInfo.InvariantCulture); + } + + return metadata; + } + + private static Dictionary<string, string> BuildDetailMetadata(string noteId, string? 
vuIdentifier, string endpoint) + { + var metadata = new Dictionary(StringComparer.OrdinalIgnoreCase) + { + ["certcc.endpoint"] = endpoint, + ["certcc.noteId"] = noteId, + }; + + if (!string.IsNullOrWhiteSpace(vuIdentifier)) + { + metadata["certcc.vuid"] = vuIdentifier; + } + + return metadata; + } + + private async Task> ReadSummaryNotesAsync(DocumentRecord document, CancellationToken cancellationToken) + { + if (!document.PayloadId.HasValue) + { + return Array.Empty(); + } + + var payload = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + return CertCcSummaryParser.ParseNotes(payload); + } + + private async Task DownloadDocumentAsync(DocumentRecord document, CancellationToken cancellationToken) + { + if (!document.PayloadId.HasValue) + { + throw new InvalidOperationException($"Document {document.Id} has no GridFS payload."); + } + + return await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + + private Uri BuildDetailUri(string noteId, string endpoint) + { + var suffix = endpoint switch + { + "note" => $"{noteId}/", + "vendors" => $"{noteId}/vendors/", + "vuls" => $"{noteId}/vuls/", + "vendors-vuls" => $"{noteId}/vendors/vuls/", + _ => $"{noteId}/", + }; + + return new Uri(_options.BaseApiUri, suffix); + } + + private static string? TryNormalizeNoteToken(string token, out string? vuIdentifier) + { + vuIdentifier = null; + if (string.IsNullOrWhiteSpace(token)) + { + return null; + } + + var trimmed = token.Trim(); + var digits = new string(trimmed.Where(char.IsDigit).ToArray()); + if (digits.Length == 0) + { + return null; + } + + vuIdentifier = trimmed.StartsWith("vu", StringComparison.OrdinalIgnoreCase) + ? 
trimmed.Replace(" ", string.Empty, StringComparison.Ordinal) + : $"VU#{digits}"; + + return digits; + } + + private static bool TryGetMetadata(DocumentRecord document, string key, out string value) + { + value = string.Empty; + if (document.Metadata is null) + { + return false; + } + + if (!document.Metadata.TryGetValue(key, out var metadataValue)) + { + return false; + } + + value = metadataValue; + return true; + } + + private async Task<CertCcCursor> GetCursorAsync(CancellationToken cancellationToken) + { + var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return CertCcCursor.FromBson(record?.Cursor); + } + + private async Task UpdateCursorAsync(CertCcCursor cursor, CancellationToken cancellationToken) + { + var completedAt = _timeProvider.GetUtcNow(); + await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false); + } + + private sealed class NoteDocumentGroup + { + public NoteDocumentGroup(string noteId) + { + NoteId = noteId; + } + + public string NoteId { get; } + + public DocumentRecord? Note { get; private set; } + + public DocumentRecord? Vendors { get; private set; } + + public DocumentRecord? Vuls { get; private set; } + + public DocumentRecord? VendorStatuses { get; private set; } + + public void Add(string endpoint, DocumentRecord document) + { + switch (endpoint) + { + case "note": + Note = document; + break; + case "vendors": + Vendors = document; + break; + case "vuls": + Vuls = document; + break; + case "vendors-vuls": + VendorStatuses = document; + break; + default: + Note ??= document; + break; + } + } + } + + private static bool ShouldTreatAsMissing(HttpStatusCode? 
statusCode, string endpoint) + { + if (statusCode is null) + { + return false; + } + + if (statusCode is HttpStatusCode.NotFound or HttpStatusCode.Gone) + { + return !string.Equals(endpoint, "note", StringComparison.OrdinalIgnoreCase); + } + + // Treat vendors/vendors-vuls/vuls 403 as optional air-gapped responses. + if (statusCode == HttpStatusCode.Forbidden && !string.Equals(endpoint, "note", StringComparison.OrdinalIgnoreCase)) + { + return true; + } + + return false; + } + + private static HttpStatusCode? TryParseStatusCodeFromMessage(string? message) + { + if (string.IsNullOrWhiteSpace(message)) + { + return null; + } + + const string marker = "status "; + var index = message.IndexOf(marker, StringComparison.OrdinalIgnoreCase); + if (index < 0) + { + return null; + } + + index += marker.Length; + var end = index; + while (end < message.Length && char.IsDigit(message[end])) + { + end++; + } + + if (end == index) + { + return null; + } + + if (int.TryParse(message[index..end], NumberStyles.Integer, CultureInfo.InvariantCulture, out var code) && + Enum.IsDefined(typeof(HttpStatusCode), code)) + { + return (HttpStatusCode)code; + } + + return null; + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertFr/CertFrConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertFr/CertFrConnector.cs index 36a6460f7..e3e4351d3 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertFr/CertFrConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertFr/CertFrConnector.cs @@ -1,337 +1,337 @@ -using System; -using System.Collections.Generic; -using System.Linq; -using System.Text.Json; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Connector.CertFr.Configuration; -using StellaOps.Concelier.Connector.CertFr.Internal; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using 
StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.CertFr; - -public sealed class CertFrConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new() - { - PropertyNamingPolicy = JsonNamingPolicy.CamelCase, - DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, - }; - - private readonly CertFrFeedClient _feedClient; - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly CertFrOptions _options; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - - public CertFrConnector( - CertFrFeedClient feedClient, - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, - TimeProvider? timeProvider, - ILogger logger) - { - _feedClient = feedClient ?? throw new ArgumentNullException(nameof(feedClient)); - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? 
throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => CertFrConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - var windowEnd = now; - var lastPublished = cursor.LastPublished ?? now - _options.InitialBackfill; - var windowStart = lastPublished - _options.WindowOverlap; - var minStart = now - _options.InitialBackfill; - if (windowStart < minStart) - { - windowStart = minStart; - } - - IReadOnlyList items; - try - { - items = await _feedClient.LoadAsync(windowStart, windowEnd, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Cert-FR feed load failed {Start:o}-{End:o}", windowStart, windowEnd); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (items.Count == 0) - { - await UpdateCursorAsync(cursor.WithLastPublished(windowEnd), cancellationToken).ConfigureAwait(false); - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - var maxPublished = cursor.LastPublished ?? 
DateTimeOffset.MinValue; - - foreach (var item in items) - { - cancellationToken.ThrowIfCancellationRequested(); - - try - { - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, item.DetailUri.ToString(), cancellationToken).ConfigureAwait(false); - var request = new SourceFetchRequest(CertFrOptions.HttpClientName, SourceName, item.DetailUri) - { - Metadata = CertFrDocumentMetadata.CreateMetadata(item), - ETag = existing?.Etag, - LastModified = existing?.LastModified, - AcceptHeaders = new[] { "text/html", "application/xhtml+xml", "text/plain;q=0.5" }, - }; - - var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - if (result.IsNotModified || !result.IsSuccess || result.Document is null) - { - if (item.Published > maxPublished) - { - maxPublished = item.Published; - } - - continue; - } - - if (existing is not null - && string.Equals(existing.Sha256, result.Document.Sha256, StringComparison.OrdinalIgnoreCase) - && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) - { - await _documentStore.UpdateStatusAsync(result.Document.Id, existing.Status, cancellationToken).ConfigureAwait(false); - if (item.Published > maxPublished) - { - maxPublished = item.Published; - } - - continue; - } - - if (!pendingDocuments.Contains(result.Document.Id)) - { - pendingDocuments.Add(result.Document.Id); - } - - if (item.Published > maxPublished) - { - maxPublished = item.Published; - } - - if (_options.RequestDelay > TimeSpan.Zero) - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - } - catch (Exception ex) - { - _logger.LogError(ex, "Cert-FR fetch failed for {Uri}", item.DetailUri); - await _stateRepository.MarkFailureAsync(SourceName, _timeProvider.GetUtcNow(), TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - } - - if (maxPublished == DateTimeOffset.MinValue) - { - maxPublished = cursor.LastPublished ?? 
windowEnd; - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithLastPublished(maxPublished); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("Cert-FR document {DocumentId} missing GridFS payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - CertFrDocumentMetadata metadata; - try - { - metadata = CertFrDocumentMetadata.FromDocument(document); - } - catch (Exception ex) - { - _logger.LogError(ex, "Cert-FR metadata parse failed for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - CertFrDto dto; - try - { - var content = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - var html = System.Text.Encoding.UTF8.GetString(content); - dto = CertFrParser.Parse(html, metadata); - } 
- catch (Exception ex) - { - _logger.LogError(ex, "Cert-FR parse failed for advisory {AdvisoryId} ({Uri})", metadata.AdvisoryId, document.Uri); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - var json = JsonSerializer.Serialize(dto, SerializerOptions); - var payload = BsonDocument.Parse(json); - var validatedAt = _timeProvider.GetUtcNow(); - - var existingDto = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); - var dtoRecord = existingDto is null - ? new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "certfr.detail.v1", payload, validatedAt) - : existingDto with - { - Payload = payload, - SchemaVersion = "certfr.detail.v1", - ValidatedAt = validatedAt, - }; - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - pendingDocuments.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await 
_documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dtoRecord is null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - CertFrDto? dto; - try - { - var json = dtoRecord.Payload.ToJson(); - dto = JsonSerializer.Deserialize<CertFrDto>(json, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogError(ex, "Cert-FR DTO deserialization failed for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - if (dto is null) - { - _logger.LogWarning("Cert-FR DTO payload deserialized as null for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var mappedAt = _timeProvider.GetUtcNow(); - var advisory = CertFrMapper.Map(dto, SourceName, mappedAt); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - - pendingMappings.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task<CertFrCursor> GetCursorAsync(CancellationToken cancellationToken) - { - var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return CertFrCursor.FromBson(record?.Cursor); - } - - private async Task UpdateCursorAsync(CertFrCursor cursor, CancellationToken cancellationToken) - { - var completedAt = _timeProvider.GetUtcNow(); - await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false); - } -} +using System; +using 
System.Collections.Generic; +using System.Linq; +using System.Text.Json; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Connector.CertFr.Configuration; +using StellaOps.Concelier.Connector.CertFr.Internal; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.CertFr; + +public sealed class CertFrConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new() + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, + }; + + private readonly CertFrFeedClient _feedClient; + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly CertFrOptions _options; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public CertFrConnector( + CertFrFeedClient feedClient, + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions options, + TimeProvider? timeProvider, + ILogger logger) + { + _feedClient = feedClient ?? throw new ArgumentNullException(nameof(feedClient)); + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? 
throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => CertFrConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + var windowEnd = now; + var lastPublished = cursor.LastPublished ?? now - _options.InitialBackfill; + var windowStart = lastPublished - _options.WindowOverlap; + var minStart = now - _options.InitialBackfill; + if (windowStart < minStart) + { + windowStart = minStart; + } + + IReadOnlyList items; + try + { + items = await _feedClient.LoadAsync(windowStart, windowEnd, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Cert-FR feed load failed {Start:o}-{End:o}", windowStart, windowEnd); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (items.Count == 0) + { + await UpdateCursorAsync(cursor.WithLastPublished(windowEnd), cancellationToken).ConfigureAwait(false); + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + var maxPublished = cursor.LastPublished ?? 
DateTimeOffset.MinValue; + + foreach (var item in items) + { + cancellationToken.ThrowIfCancellationRequested(); + + try + { + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, item.DetailUri.ToString(), cancellationToken).ConfigureAwait(false); + var request = new SourceFetchRequest(CertFrOptions.HttpClientName, SourceName, item.DetailUri) + { + Metadata = CertFrDocumentMetadata.CreateMetadata(item), + ETag = existing?.Etag, + LastModified = existing?.LastModified, + AcceptHeaders = new[] { "text/html", "application/xhtml+xml", "text/plain;q=0.5" }, + }; + + var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + if (result.IsNotModified || !result.IsSuccess || result.Document is null) + { + if (item.Published > maxPublished) + { + maxPublished = item.Published; + } + + continue; + } + + if (existing is not null + && string.Equals(existing.Sha256, result.Document.Sha256, StringComparison.OrdinalIgnoreCase) + && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) + { + await _documentStore.UpdateStatusAsync(result.Document.Id, existing.Status, cancellationToken).ConfigureAwait(false); + if (item.Published > maxPublished) + { + maxPublished = item.Published; + } + + continue; + } + + if (!pendingDocuments.Contains(result.Document.Id)) + { + pendingDocuments.Add(result.Document.Id); + } + + if (item.Published > maxPublished) + { + maxPublished = item.Published; + } + + if (_options.RequestDelay > TimeSpan.Zero) + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Cert-FR fetch failed for {Uri}", item.DetailUri); + await _stateRepository.MarkFailureAsync(SourceName, _timeProvider.GetUtcNow(), TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + } + + if (maxPublished == DateTimeOffset.MinValue) + { + maxPublished = cursor.LastPublished ?? 
windowEnd; + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithLastPublished(maxPublished); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("Cert-FR document {DocumentId} missing GridFS payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + CertFrDocumentMetadata metadata; + try + { + metadata = CertFrDocumentMetadata.FromDocument(document); + } + catch (Exception ex) + { + _logger.LogError(ex, "Cert-FR metadata parse failed for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + CertFrDto dto; + try + { + var content = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + var html = System.Text.Encoding.UTF8.GetString(content); + dto = CertFrParser.Parse(html, metadata); + 
} + catch (Exception ex) + { + _logger.LogError(ex, "Cert-FR parse failed for advisory {AdvisoryId} ({Uri})", metadata.AdvisoryId, document.Uri); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + var json = JsonSerializer.Serialize(dto, SerializerOptions); + var payload = BsonDocument.Parse(json); + var validatedAt = _timeProvider.GetUtcNow(); + + var existingDto = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); + var dtoRecord = existingDto is null + ? new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "certfr.detail.v1", payload, validatedAt) + : existingDto with + { + Payload = payload, + SchemaVersion = "certfr.detail.v1", + ValidatedAt = validatedAt, + }; + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + pendingDocuments.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await 
_documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dtoRecord is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + CertFrDto? dto; + try + { + var json = dtoRecord.Payload.ToJson(); + dto = JsonSerializer.Deserialize(json, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogError(ex, "Cert-FR DTO deserialization failed for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + if (dto is null) + { + _logger.LogWarning("Cert-FR DTO payload deserialized as null for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var mappedAt = _timeProvider.GetUtcNow(); + var advisory = CertFrMapper.Map(dto, SourceName, mappedAt); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + + pendingMappings.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return CertFrCursor.FromBson(record?.Cursor); + } + + private async Task UpdateCursorAsync(CertFrCursor cursor, CancellationToken cancellationToken) + { + var completedAt = _timeProvider.GetUtcNow(); + await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false); + } +} diff --git 
a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertIn/CertInConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertIn/CertInConnector.cs index 510e848e1..c56a2a410 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertIn/CertInConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.CertIn/CertInConnector.cs @@ -1,370 +1,370 @@ -using System; -using System.Collections.Generic; -using System.Linq; -using System.Text.Json; -using System.Threading; -using System.Threading.Tasks; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.CertIn.Configuration; -using StellaOps.Concelier.Connector.CertIn.Internal; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.CertIn; - -public sealed class CertInConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.General) - { - PropertyNamingPolicy = JsonNamingPolicy.CamelCase, - WriteIndented = false, - DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, - }; - - private readonly CertInClient _client; - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly CertInOptions _options; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - - public 
CertInConnector( - CertInClient client, - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, - TimeProvider? timeProvider, - ILogger logger) - { - _client = client ?? throw new ArgumentNullException(nameof(client)); - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => CertInConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - var windowStart = cursor.LastPublished.HasValue - ? cursor.LastPublished.Value - _options.WindowOverlap - : now - _options.WindowSize; - - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var maxPublished = cursor.LastPublished ?? 
DateTimeOffset.MinValue; - - for (var page = 1; page <= _options.MaxPagesPerFetch; page++) - { - IReadOnlyList listings; - try - { - listings = await _client.GetListingsAsync(page, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "CERT-In listings fetch failed for page {Page}", page); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - if (listings.Count == 0) - { - break; - } - - foreach (var listing in listings.OrderByDescending(static item => item.Published)) - { - if (listing.Published < windowStart) - { - page = _options.MaxPagesPerFetch + 1; - break; - } - - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["certin.advisoryId"] = listing.AdvisoryId, - ["certin.title"] = listing.Title, - ["certin.link"] = listing.DetailUri.ToString(), - ["certin.published"] = listing.Published.ToString("O") - }; - - if (!string.IsNullOrWhiteSpace(listing.Summary)) - { - metadata["certin.summary"] = listing.Summary!; - } - - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, listing.DetailUri.ToString(), cancellationToken).ConfigureAwait(false); - - SourceFetchResult result; - try - { - result = await _fetchService.FetchAsync( - new SourceFetchRequest(CertInOptions.HttpClientName, SourceName, listing.DetailUri) - { - Metadata = metadata, - ETag = existing?.Etag, - LastModified = existing?.LastModified, - AcceptHeaders = new[] { "text/html", "application/xhtml+xml", "text/plain;q=0.5" }, - }, - cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "CERT-In fetch failed for {Uri}", listing.DetailUri); - await _stateRepository.MarkFailureAsync(SourceName, _timeProvider.GetUtcNow(), TimeSpan.FromMinutes(3), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (!result.IsSuccess || result.Document is null) - { - continue; - } - - if (existing 
is not null - && string.Equals(existing.Sha256, result.Document.Sha256, StringComparison.OrdinalIgnoreCase) - && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) - { - await _documentStore.UpdateStatusAsync(result.Document.Id, existing.Status, cancellationToken).ConfigureAwait(false); - continue; - } - - pendingDocuments.Add(result.Document.Id); - if (listing.Published > maxPublished) - { - maxPublished = listing.Published; - } - - if (_options.RequestDelay > TimeSpan.Zero) - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithLastPublished(maxPublished == DateTimeOffset.MinValue ? cursor.LastPublished : maxPublished); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - remainingDocuments.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("CERT-In document {DocumentId} missing GridFS payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - continue; - } - - if (!TryDeserializeListing(document.Metadata, out var listing)) - { - _logger.LogWarning("CERT-In 
metadata missing for {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - continue; - } - - byte[] rawBytes; - try - { - rawBytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to download raw CERT-In document {DocumentId}", document.Id); - throw; - } - - var dto = CertInDetailParser.Parse(listing, rawBytes); - var payload = BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); - var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "certin.v1", payload, _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remainingDocuments.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(remainingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, 
cancellationToken).ConfigureAwait(false); - - if (dtoRecord is null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - var dtoJson = dtoRecord.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings - { - OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, - }); - - CertInAdvisoryDto dto; - try - { - dto = JsonSerializer.Deserialize(dtoJson, SerializerOptions) - ?? throw new InvalidOperationException("Deserialized CERT-In DTO is null."); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to deserialize CERT-In DTO for {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var advisory = MapAdvisory(dto, document, dtoRecord); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - - pendingMappings.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private Advisory MapAdvisory(CertInAdvisoryDto dto, DocumentRecord document, DtoRecord dtoRecord) - { - var fetchProvenance = new AdvisoryProvenance(SourceName, "document", document.Uri, document.FetchedAt); - var mappingProvenance = new AdvisoryProvenance(SourceName, "mapping", dto.AdvisoryId, dtoRecord.ValidatedAt); - - var aliases = new HashSet(StringComparer.OrdinalIgnoreCase) - { - dto.AdvisoryId, - }; - foreach (var cve in dto.CveIds) - { - aliases.Add(cve); - } - - var references = new List(); - try - { - references.Add(new AdvisoryReference( - dto.Link, - "advisory", - "cert-in", - null, - new AdvisoryProvenance(SourceName, "reference", dto.Link, dtoRecord.ValidatedAt))); - } - catch (ArgumentException) - { - 
_logger.LogWarning("Invalid CERT-In link {Link} for advisory {AdvisoryId}", dto.Link, dto.AdvisoryId); - } - - foreach (var cve in dto.CveIds) - { - var url = $"https://www.cve.org/CVERecord?id={cve}"; - try - { - references.Add(new AdvisoryReference( - url, - "advisory", - cve, - null, - new AdvisoryProvenance(SourceName, "reference", url, dtoRecord.ValidatedAt))); - } - catch (ArgumentException) - { - // ignore invalid urls - } - } - - foreach (var link in dto.ReferenceLinks) - { - try - { - references.Add(new AdvisoryReference( - link, - "reference", - null, - null, - new AdvisoryProvenance(SourceName, "reference", link, dtoRecord.ValidatedAt))); - } - catch (ArgumentException) - { - // ignore invalid urls - } - } - +using System; +using System.Collections.Generic; +using System.Linq; +using System.Text.Json; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.CertIn.Configuration; +using StellaOps.Concelier.Connector.CertIn.Internal; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.CertIn; + +public sealed class CertInConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.General) + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + WriteIndented = false, + DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, + }; + + private readonly CertInClient _client; + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private 
readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly CertInOptions _options; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public CertInConnector( + CertInClient client, + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions options, + TimeProvider? timeProvider, + ILogger logger) + { + _client = client ?? throw new ArgumentNullException(nameof(client)); + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => CertInConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + var windowStart = cursor.LastPublished.HasValue + ? 
cursor.LastPublished.Value - _options.WindowOverlap + : now - _options.WindowSize; + + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var maxPublished = cursor.LastPublished ?? DateTimeOffset.MinValue; + + for (var page = 1; page <= _options.MaxPagesPerFetch; page++) + { + IReadOnlyList listings; + try + { + listings = await _client.GetListingsAsync(page, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "CERT-In listings fetch failed for page {Page}", page); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + if (listings.Count == 0) + { + break; + } + + foreach (var listing in listings.OrderByDescending(static item => item.Published)) + { + if (listing.Published < windowStart) + { + page = _options.MaxPagesPerFetch + 1; + break; + } + + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["certin.advisoryId"] = listing.AdvisoryId, + ["certin.title"] = listing.Title, + ["certin.link"] = listing.DetailUri.ToString(), + ["certin.published"] = listing.Published.ToString("O") + }; + + if (!string.IsNullOrWhiteSpace(listing.Summary)) + { + metadata["certin.summary"] = listing.Summary!; + } + + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, listing.DetailUri.ToString(), cancellationToken).ConfigureAwait(false); + + SourceFetchResult result; + try + { + result = await _fetchService.FetchAsync( + new SourceFetchRequest(CertInOptions.HttpClientName, SourceName, listing.DetailUri) + { + Metadata = metadata, + ETag = existing?.Etag, + LastModified = existing?.LastModified, + AcceptHeaders = new[] { "text/html", "application/xhtml+xml", "text/plain;q=0.5" }, + }, + cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "CERT-In fetch failed for {Uri}", listing.DetailUri); + await _stateRepository.MarkFailureAsync(SourceName, 
_timeProvider.GetUtcNow(), TimeSpan.FromMinutes(3), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (!result.IsSuccess || result.Document is null) + { + continue; + } + + if (existing is not null + && string.Equals(existing.Sha256, result.Document.Sha256, StringComparison.OrdinalIgnoreCase) + && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) + { + await _documentStore.UpdateStatusAsync(result.Document.Id, existing.Status, cancellationToken).ConfigureAwait(false); + continue; + } + + pendingDocuments.Add(result.Document.Id); + if (listing.Published > maxPublished) + { + maxPublished = listing.Published; + } + + if (_options.RequestDelay > TimeSpan.Zero) + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithLastPublished(maxPublished == DateTimeOffset.MinValue ? cursor.LastPublished : maxPublished); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remainingDocuments.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("CERT-In document {DocumentId} missing GridFS payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, 
DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + continue; + } + + if (!TryDeserializeListing(document.Metadata, out var listing)) + { + _logger.LogWarning("CERT-In metadata missing for {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + continue; + } + + byte[] rawBytes; + try + { + rawBytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to download raw CERT-In document {DocumentId}", document.Id); + throw; + } + + var dto = CertInDetailParser.Parse(listing, rawBytes); + var payload = BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); + var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "certin.v1", payload, _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remainingDocuments.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(remainingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + 
cancellationToken.ThrowIfCancellationRequested(); + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dtoRecord is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + var dtoJson = dtoRecord.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings + { + OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, + }); + + CertInAdvisoryDto dto; + try + { + dto = JsonSerializer.Deserialize(dtoJson, SerializerOptions) + ?? throw new InvalidOperationException("Deserialized CERT-In DTO is null."); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deserialize CERT-In DTO for {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var advisory = MapAdvisory(dto, document, dtoRecord); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + + pendingMappings.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private Advisory MapAdvisory(CertInAdvisoryDto dto, DocumentRecord document, DtoRecord dtoRecord) + { + var fetchProvenance = new AdvisoryProvenance(SourceName, "document", document.Uri, document.FetchedAt); + var mappingProvenance = new AdvisoryProvenance(SourceName, "mapping", dto.AdvisoryId, dtoRecord.ValidatedAt); + + var aliases = new HashSet(StringComparer.OrdinalIgnoreCase) + { + dto.AdvisoryId, + }; + foreach (var cve in dto.CveIds) + { + aliases.Add(cve); + } + + var references = new List(); + try + { + 
references.Add(new AdvisoryReference( + dto.Link, + "advisory", + "cert-in", + null, + new AdvisoryProvenance(SourceName, "reference", dto.Link, dtoRecord.ValidatedAt))); + } + catch (ArgumentException) + { + _logger.LogWarning("Invalid CERT-In link {Link} for advisory {AdvisoryId}", dto.Link, dto.AdvisoryId); + } + + foreach (var cve in dto.CveIds) + { + var url = $"https://www.cve.org/CVERecord?id={cve}"; + try + { + references.Add(new AdvisoryReference( + url, + "advisory", + cve, + null, + new AdvisoryProvenance(SourceName, "reference", url, dtoRecord.ValidatedAt))); + } + catch (ArgumentException) + { + // ignore invalid urls + } + } + + foreach (var link in dto.ReferenceLinks) + { + try + { + references.Add(new AdvisoryReference( + link, + "reference", + null, + null, + new AdvisoryProvenance(SourceName, "reference", link, dtoRecord.ValidatedAt))); + } + catch (ArgumentException) + { + // ignore invalid urls + } + } + var affectedPackages = dto.VendorNames.Select(vendor => { var provenance = new AdvisoryProvenance(SourceName, "affected", vendor, dtoRecord.ValidatedAt); @@ -398,65 +398,65 @@ public sealed class CertInConnector : IFeedConnector provenance: new[] { provenance }); }) .ToArray(); - - return new Advisory( - dto.AdvisoryId, - dto.Title, - dto.Summary ?? dto.Content, - language: "en", - published: dto.Published, - modified: dto.Published, - severity: dto.Severity, - exploitKnown: false, - aliases: aliases, - references: references, - affectedPackages: affectedPackages, - cvssMetrics: Array.Empty(), - provenance: new[] { fetchProvenance, mappingProvenance }); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? 
CertInCursor.Empty : CertInCursor.FromBson(state.Cursor);
-    }
-
-    private Task UpdateCursorAsync(CertInCursor cursor, CancellationToken cancellationToken)
-    {
-        return _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken);
-    }
-
-    private static bool TryDeserializeListing(IReadOnlyDictionary? metadata, out CertInListingItem listing)
-    {
-        listing = null!;
-        if (metadata is null)
-        {
-            return false;
-        }
-
-        if (!metadata.TryGetValue("certin.advisoryId", out var advisoryId))
-        {
-            return false;
-        }
-
-        if (!metadata.TryGetValue("certin.title", out var title))
-        {
-            return false;
-        }
-
-        if (!metadata.TryGetValue("certin.link", out var link) || !Uri.TryCreate(link, UriKind.Absolute, out var detailUri))
-        {
-            return false;
-        }
-
-        if (!metadata.TryGetValue("certin.published", out var publishedText) || !DateTimeOffset.TryParse(publishedText, out var published))
-        {
-            return false;
-        }
-
-        metadata.TryGetValue("certin.summary", out var summary);
-
-        listing = new CertInListingItem(advisoryId, title, detailUri, published.ToUniversalTime(), summary);
-        return true;
-    }
-}
+
+        return new Advisory(
+            dto.AdvisoryId,
+            dto.Title,
+            dto.Summary ?? dto.Content,
+            language: "en",
+            published: dto.Published,
+            modified: dto.Published,
+            severity: dto.Severity,
+            exploitKnown: false,
+            aliases: aliases,
+            references: references,
+            affectedPackages: affectedPackages,
+            cvssMetrics: Array.Empty(),
+            provenance: new[] { fetchProvenance, mappingProvenance });
+    }
+
+    private async Task GetCursorAsync(CancellationToken cancellationToken)
+    {
+        var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false);
+        return state is null ?
CertInCursor.Empty : CertInCursor.FromBson(state.Cursor);
+    }
+
+    private Task UpdateCursorAsync(CertInCursor cursor, CancellationToken cancellationToken)
+    {
+        return _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken);
+    }
+
+    private static bool TryDeserializeListing(IReadOnlyDictionary? metadata, out CertInListingItem listing)
+    {
+        listing = null!;
+        if (metadata is null)
+        {
+            return false;
+        }
+
+        if (!metadata.TryGetValue("certin.advisoryId", out var advisoryId))
+        {
+            return false;
+        }
+
+        if (!metadata.TryGetValue("certin.title", out var title))
+        {
+            return false;
+        }
+
+        if (!metadata.TryGetValue("certin.link", out var link) || !Uri.TryCreate(link, UriKind.Absolute, out var detailUri))
+        {
+            return false;
+        }
+
+        if (!metadata.TryGetValue("certin.published", out var publishedText) || !DateTimeOffset.TryParse(publishedText, out var published))
+        {
+            return false;
+        }
+
+        metadata.TryGetValue("certin.summary", out var summary);
+
+        listing = new CertInListingItem(advisoryId, title, detailUri, published.ToUniversalTime(), summary);
+        return true;
+    }
+}
diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Common/Http/ServiceCollectionExtensions.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Common/Http/ServiceCollectionExtensions.cs
index 7afc7a5f5..240d74e69 100644
--- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Common/Http/ServiceCollectionExtensions.cs
+++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Common/Http/ServiceCollectionExtensions.cs
@@ -3,6 +3,7 @@ using System.Net.Http;
 using System.Net.Security;
 using System.Security.Cryptography.X509Certificates;
 using Microsoft.Extensions.DependencyInjection;
+using Microsoft.Extensions.DependencyInjection.Extensions;
 using Microsoft.Extensions.Options;
 using StellaOps.Concelier.Connector.Common.Xml;
 using StellaOps.Concelier.Core.Aoc;
@@ -169,7 +170,7 @@ public static class
ServiceCollectionExtensions
         services.AddSingleton();
         services.AddConcelierAocGuards();
         services.AddConcelierLinksetMappers();
-        services.AddSingleton();
+        services.TryAddSingleton();
         services.AddSingleton();
         services.AddSingleton();
 
diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Common/State/SourceStateSeedProcessor.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Common/State/SourceStateSeedProcessor.cs
index 03be35904..a034252f4 100644
--- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Common/State/SourceStateSeedProcessor.cs
+++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Common/State/SourceStateSeedProcessor.cs
@@ -5,16 +5,16 @@ using StellaOps.Concelier.Connector.Common.Fetch;
 using StellaOps.Concelier.Storage.Mongo;
 using StellaOps.Concelier.Storage.Mongo.Documents;
 using StellaOps.Cryptography;
-
-namespace StellaOps.Concelier.Connector.Common.State;
-
-/// <summary>
-/// Persists raw documents and cursor state for connectors that require manual seeding.
-/// </summary>
-public sealed class SourceStateSeedProcessor
-{
-    private readonly IDocumentStore _documentStore;
-    private readonly RawDocumentStorage _rawDocumentStorage;
+
+namespace StellaOps.Concelier.Connector.Common.State;
+
+/// <summary>
+/// Persists raw documents and cursor state for connectors that require manual seeding.
+/// </summary>
+public sealed class SourceStateSeedProcessor
+{
+    private readonly IDocumentStore _documentStore;
+    private readonly RawDocumentStorage _rawDocumentStorage;
     private readonly ISourceStateRepository _stateRepository;
     private readonly TimeProvider _timeProvider;
     private readonly ILogger _logger;
@@ -35,298 +35,298 @@ public sealed class SourceStateSeedProcessor
         _timeProvider = timeProvider ?? TimeProvider.System;
         _logger = logger ??
NullLogger.Instance; } - - public async Task ProcessAsync(SourceStateSeedSpecification specification, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(specification); - ArgumentException.ThrowIfNullOrEmpty(specification.Source); - - var completedAt = specification.CompletedAt ?? _timeProvider.GetUtcNow(); - var documentIds = new List(); - var pendingDocumentIds = new HashSet(); - var pendingMappingIds = new HashSet(); - var knownAdvisories = new HashSet(StringComparer.OrdinalIgnoreCase); - - AppendRange(knownAdvisories, specification.KnownAdvisories); - - if (specification.Cursor is { } cursorSeed) - { - AppendRange(pendingDocumentIds, cursorSeed.PendingDocuments); - AppendRange(pendingMappingIds, cursorSeed.PendingMappings); - AppendRange(knownAdvisories, cursorSeed.KnownAdvisories); - } - - foreach (var document in specification.Documents ?? Array.Empty()) - { - cancellationToken.ThrowIfCancellationRequested(); - await ProcessDocumentAsync(specification.Source, document, completedAt, documentIds, pendingDocumentIds, pendingMappingIds, knownAdvisories, cancellationToken).ConfigureAwait(false); - } - - var state = await _stateRepository.TryGetAsync(specification.Source, cancellationToken).ConfigureAwait(false); - var cursor = state?.Cursor ?? 
new BsonDocument(); - - var newlyPendingDocuments = MergeGuidArray(cursor, "pendingDocuments", pendingDocumentIds); - var newlyPendingMappings = MergeGuidArray(cursor, "pendingMappings", pendingMappingIds); - var newlyKnownAdvisories = MergeStringArray(cursor, "knownAdvisories", knownAdvisories); - - if (specification.Cursor is { } cursorSpec) - { - if (cursorSpec.LastModifiedCursor.HasValue) - { - cursor["lastModifiedCursor"] = cursorSpec.LastModifiedCursor.Value.UtcDateTime; - } - - if (cursorSpec.LastFetchAt.HasValue) - { - cursor["lastFetchAt"] = cursorSpec.LastFetchAt.Value.UtcDateTime; - } - - if (cursorSpec.Additional is not null) - { - foreach (var kvp in cursorSpec.Additional) - { - cursor[kvp.Key] = kvp.Value; - } - } - } - - cursor["lastSeededAt"] = completedAt.UtcDateTime; - await _stateRepository.UpdateCursorAsync(specification.Source, cursor, completedAt, cancellationToken).ConfigureAwait(false); - - _logger.LogInformation( - "Seeded {Documents} document(s) for {Source}. pendingDocuments+= {PendingDocuments}, pendingMappings+= {PendingMappings}, knownAdvisories+= {KnownAdvisories}", - documentIds.Count, - specification.Source, - newlyPendingDocuments.Count, - newlyPendingMappings.Count, - newlyKnownAdvisories.Count); - - return new SourceStateSeedResult( - DocumentsProcessed: documentIds.Count, - PendingDocumentsAdded: newlyPendingDocuments.Count, - PendingMappingsAdded: newlyPendingMappings.Count, - DocumentIds: documentIds.AsReadOnly(), - PendingDocumentIds: newlyPendingDocuments, - PendingMappingIds: newlyPendingMappings, - KnownAdvisoriesAdded: newlyKnownAdvisories, - CompletedAt: completedAt); - } - - private async Task ProcessDocumentAsync( - string source, - SourceStateSeedDocument document, - DateTimeOffset completedAt, - List documentIds, - HashSet pendingDocumentIds, - HashSet pendingMappingIds, - HashSet knownAdvisories, - CancellationToken cancellationToken) - { - if (document is null) - { - throw new 
ArgumentNullException(nameof(document)); - } - - ArgumentException.ThrowIfNullOrEmpty(document.Uri); - if (document.Content is not { Length: > 0 }) - { - throw new InvalidOperationException($"Seed entry for '{document.Uri}' is missing content bytes."); - } - - var payload = new byte[document.Content.Length]; - Buffer.BlockCopy(document.Content, 0, payload, 0, document.Content.Length); - - if (!document.Uri.Contains("://", StringComparison.Ordinal)) - { - _logger.LogWarning("Seed document URI '{Uri}' does not appear to be absolute.", document.Uri); - } - + + public async Task ProcessAsync(SourceStateSeedSpecification specification, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(specification); + ArgumentException.ThrowIfNullOrEmpty(specification.Source); + + var completedAt = specification.CompletedAt ?? _timeProvider.GetUtcNow(); + var documentIds = new List(); + var pendingDocumentIds = new HashSet(); + var pendingMappingIds = new HashSet(); + var knownAdvisories = new HashSet(StringComparer.OrdinalIgnoreCase); + + AppendRange(knownAdvisories, specification.KnownAdvisories); + + if (specification.Cursor is { } cursorSeed) + { + AppendRange(pendingDocumentIds, cursorSeed.PendingDocuments); + AppendRange(pendingMappingIds, cursorSeed.PendingMappings); + AppendRange(knownAdvisories, cursorSeed.KnownAdvisories); + } + + foreach (var document in specification.Documents ?? Array.Empty()) + { + cancellationToken.ThrowIfCancellationRequested(); + await ProcessDocumentAsync(specification.Source, document, completedAt, documentIds, pendingDocumentIds, pendingMappingIds, knownAdvisories, cancellationToken).ConfigureAwait(false); + } + + var state = await _stateRepository.TryGetAsync(specification.Source, cancellationToken).ConfigureAwait(false); + var cursor = state?.Cursor ?? 
new BsonDocument(); + + var newlyPendingDocuments = MergeGuidArray(cursor, "pendingDocuments", pendingDocumentIds); + var newlyPendingMappings = MergeGuidArray(cursor, "pendingMappings", pendingMappingIds); + var newlyKnownAdvisories = MergeStringArray(cursor, "knownAdvisories", knownAdvisories); + + if (specification.Cursor is { } cursorSpec) + { + if (cursorSpec.LastModifiedCursor.HasValue) + { + cursor["lastModifiedCursor"] = cursorSpec.LastModifiedCursor.Value.UtcDateTime; + } + + if (cursorSpec.LastFetchAt.HasValue) + { + cursor["lastFetchAt"] = cursorSpec.LastFetchAt.Value.UtcDateTime; + } + + if (cursorSpec.Additional is not null) + { + foreach (var kvp in cursorSpec.Additional) + { + cursor[kvp.Key] = kvp.Value; + } + } + } + + cursor["lastSeededAt"] = completedAt.UtcDateTime; + await _stateRepository.UpdateCursorAsync(specification.Source, cursor, completedAt, cancellationToken).ConfigureAwait(false); + + _logger.LogInformation( + "Seeded {Documents} document(s) for {Source}. pendingDocuments+= {PendingDocuments}, pendingMappings+= {PendingMappings}, knownAdvisories+= {KnownAdvisories}", + documentIds.Count, + specification.Source, + newlyPendingDocuments.Count, + newlyPendingMappings.Count, + newlyKnownAdvisories.Count); + + return new SourceStateSeedResult( + DocumentsProcessed: documentIds.Count, + PendingDocumentsAdded: newlyPendingDocuments.Count, + PendingMappingsAdded: newlyPendingMappings.Count, + DocumentIds: documentIds.AsReadOnly(), + PendingDocumentIds: newlyPendingDocuments, + PendingMappingIds: newlyPendingMappings, + KnownAdvisoriesAdded: newlyKnownAdvisories, + CompletedAt: completedAt); + } + + private async Task ProcessDocumentAsync( + string source, + SourceStateSeedDocument document, + DateTimeOffset completedAt, + List documentIds, + HashSet pendingDocumentIds, + HashSet pendingMappingIds, + HashSet knownAdvisories, + CancellationToken cancellationToken) + { + if (document is null) + { + throw new 
ArgumentNullException(nameof(document)); + } + + ArgumentException.ThrowIfNullOrEmpty(document.Uri); + if (document.Content is not { Length: > 0 }) + { + throw new InvalidOperationException($"Seed entry for '{document.Uri}' is missing content bytes."); + } + + var payload = new byte[document.Content.Length]; + Buffer.BlockCopy(document.Content, 0, payload, 0, document.Content.Length); + + if (!document.Uri.Contains("://", StringComparison.Ordinal)) + { + _logger.LogWarning("Seed document URI '{Uri}' does not appear to be absolute.", document.Uri); + } + var contentHash = _hash.ComputeHashHex(payload, HashAlgorithms.Sha256); - - var existing = await _documentStore.FindBySourceAndUriAsync(source, document.Uri, cancellationToken).ConfigureAwait(false); - - if (existing?.GridFsId is { } oldGridId) - { - await _rawDocumentStorage.DeleteAsync(oldGridId, cancellationToken).ConfigureAwait(false); - } - - var gridId = await _rawDocumentStorage.UploadAsync( - source, - document.Uri, - payload, - document.ContentType, - document.ExpiresAt, - cancellationToken) - .ConfigureAwait(false); - - var headers = CloneDictionary(document.Headers); - if (!string.IsNullOrWhiteSpace(document.ContentType)) - { - headers ??= new Dictionary(StringComparer.OrdinalIgnoreCase); - if (!headers.ContainsKey("content-type")) - { - headers["content-type"] = document.ContentType!; - } - } - - var metadata = CloneDictionary(document.Metadata); - + + var existing = await _documentStore.FindBySourceAndUriAsync(source, document.Uri, cancellationToken).ConfigureAwait(false); + + if (existing?.PayloadId is { } oldGridId) + { + await _rawDocumentStorage.DeleteAsync(oldGridId, cancellationToken).ConfigureAwait(false); + } + + var gridId = await _rawDocumentStorage.UploadAsync( + source, + document.Uri, + payload, + document.ContentType, + document.ExpiresAt, + cancellationToken) + .ConfigureAwait(false); + + var headers = CloneDictionary(document.Headers); + if 
(!string.IsNullOrWhiteSpace(document.ContentType)) + { + headers ??= new Dictionary(StringComparer.OrdinalIgnoreCase); + if (!headers.ContainsKey("content-type")) + { + headers["content-type"] = document.ContentType!; + } + } + + var metadata = CloneDictionary(document.Metadata); + var record = new DocumentRecord( document.DocumentId ?? existing?.Id ?? Guid.NewGuid(), source, document.Uri, document.FetchedAt ?? completedAt, contentHash, - string.IsNullOrWhiteSpace(document.Status) ? DocumentStatuses.PendingParse : document.Status, - document.ContentType, - headers, - metadata, - document.Etag, - document.LastModified, - gridId, + string.IsNullOrWhiteSpace(document.Status) ? DocumentStatuses.PendingParse : document.Status, + document.ContentType, + headers, + metadata, + document.Etag, + document.LastModified, + gridId, document.ExpiresAt); var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); - - documentIds.Add(upserted.Id); - - if (document.AddToPendingDocuments) - { - pendingDocumentIds.Add(upserted.Id); - } - - if (document.AddToPendingMappings) - { - pendingMappingIds.Add(upserted.Id); - } - - AppendRange(knownAdvisories, document.KnownIdentifiers); - } - - private static Dictionary? CloneDictionary(IReadOnlyDictionary? values) - { - if (values is null || values.Count == 0) - { - return null; - } - - return new Dictionary(values, StringComparer.OrdinalIgnoreCase); - } - - private static IReadOnlyCollection MergeGuidArray(BsonDocument cursor, string field, IReadOnlyCollection additions) - { - if (additions.Count == 0) - { - return Array.Empty(); - } - - var existing = cursor.TryGetValue(field, out var value) && value is BsonArray existingArray - ? 
existingArray.Select(AsGuid).Where(static g => g != Guid.Empty).ToHashSet() - : new HashSet(); - - var newlyAdded = new List(); - foreach (var guid in additions) - { - if (guid == Guid.Empty) - { - continue; - } - - if (existing.Add(guid)) - { - newlyAdded.Add(guid); - } - } - - if (existing.Count > 0) - { - cursor[field] = new BsonArray(existing - .Select(static g => g.ToString("D")) - .OrderBy(static s => s, StringComparer.OrdinalIgnoreCase)); - } - - return newlyAdded.AsReadOnly(); - } - - private static IReadOnlyCollection MergeStringArray(BsonDocument cursor, string field, IReadOnlyCollection additions) - { - if (additions.Count == 0) - { - return Array.Empty(); - } - - var existing = cursor.TryGetValue(field, out var value) && value is BsonArray existingArray - ? existingArray.Select(static v => v?.AsString ?? string.Empty) - .Where(static s => !string.IsNullOrWhiteSpace(s)) - .ToHashSet(StringComparer.OrdinalIgnoreCase) - : new HashSet(StringComparer.OrdinalIgnoreCase); - - var newlyAdded = new List(); - foreach (var entry in additions) - { - if (string.IsNullOrWhiteSpace(entry)) - { - continue; - } - - var normalized = entry.Trim(); - if (existing.Add(normalized)) - { - newlyAdded.Add(normalized); - } - } - - if (existing.Count > 0) - { - cursor[field] = new BsonArray(existing - .OrderBy(static s => s, StringComparer.OrdinalIgnoreCase)); - } - - return newlyAdded.AsReadOnly(); - } - - private static Guid AsGuid(BsonValue value) - { - if (value is null) - { - return Guid.Empty; - } - - return Guid.TryParse(value.ToString(), out var parsed) ? parsed : Guid.Empty; - } - - private static void AppendRange(HashSet target, IReadOnlyCollection? values) - { - if (values is null) - { - return; - } - - foreach (var guid in values) - { - if (guid != Guid.Empty) - { - target.Add(guid); - } - } - } - - private static void AppendRange(HashSet target, IReadOnlyCollection? 
values) - { - if (values is null) - { - return; - } - - foreach (var value in values) - { - if (string.IsNullOrWhiteSpace(value)) - { - continue; - } - - target.Add(value.Trim()); - } - } - -} + + documentIds.Add(upserted.Id); + + if (document.AddToPendingDocuments) + { + pendingDocumentIds.Add(upserted.Id); + } + + if (document.AddToPendingMappings) + { + pendingMappingIds.Add(upserted.Id); + } + + AppendRange(knownAdvisories, document.KnownIdentifiers); + } + + private static Dictionary? CloneDictionary(IReadOnlyDictionary? values) + { + if (values is null || values.Count == 0) + { + return null; + } + + return new Dictionary(values, StringComparer.OrdinalIgnoreCase); + } + + private static IReadOnlyCollection MergeGuidArray(BsonDocument cursor, string field, IReadOnlyCollection additions) + { + if (additions.Count == 0) + { + return Array.Empty(); + } + + var existing = cursor.TryGetValue(field, out var value) && value is BsonArray existingArray + ? existingArray.Select(AsGuid).Where(static g => g != Guid.Empty).ToHashSet() + : new HashSet(); + + var newlyAdded = new List(); + foreach (var guid in additions) + { + if (guid == Guid.Empty) + { + continue; + } + + if (existing.Add(guid)) + { + newlyAdded.Add(guid); + } + } + + if (existing.Count > 0) + { + cursor[field] = new BsonArray(existing + .Select(static g => g.ToString("D")) + .OrderBy(static s => s, StringComparer.OrdinalIgnoreCase)); + } + + return newlyAdded.AsReadOnly(); + } + + private static IReadOnlyCollection MergeStringArray(BsonDocument cursor, string field, IReadOnlyCollection additions) + { + if (additions.Count == 0) + { + return Array.Empty(); + } + + var existing = cursor.TryGetValue(field, out var value) && value is BsonArray existingArray + ? existingArray.Select(static v => v?.AsString ?? 
string.Empty) + .Where(static s => !string.IsNullOrWhiteSpace(s)) + .ToHashSet(StringComparer.OrdinalIgnoreCase) + : new HashSet(StringComparer.OrdinalIgnoreCase); + + var newlyAdded = new List(); + foreach (var entry in additions) + { + if (string.IsNullOrWhiteSpace(entry)) + { + continue; + } + + var normalized = entry.Trim(); + if (existing.Add(normalized)) + { + newlyAdded.Add(normalized); + } + } + + if (existing.Count > 0) + { + cursor[field] = new BsonArray(existing + .OrderBy(static s => s, StringComparer.OrdinalIgnoreCase)); + } + + return newlyAdded.AsReadOnly(); + } + + private static Guid AsGuid(BsonValue value) + { + if (value is null) + { + return Guid.Empty; + } + + return Guid.TryParse(value.ToString(), out var parsed) ? parsed : Guid.Empty; + } + + private static void AppendRange(HashSet target, IReadOnlyCollection? values) + { + if (values is null) + { + return; + } + + foreach (var guid in values) + { + if (guid != Guid.Empty) + { + target.Add(guid); + } + } + } + + private static void AppendRange(HashSet target, IReadOnlyCollection? 
values) + { + if (values is null) + { + return; + } + + foreach (var value in values) + { + if (string.IsNullOrWhiteSpace(value)) + { + continue; + } + + target.Add(value.Trim()); + } + } + +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Cve/CveConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Cve/CveConnector.cs index 8948a3794..152cf029d 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Cve/CveConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Cve/CveConnector.cs @@ -1,609 +1,609 @@ -using System.Collections.Generic; -using System.Globalization; -using System.IO; -using System.Linq; -using System.Net; -using System.Net.Http; -using System.Text.Json; -using System.Security.Cryptography; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Normalization.Text; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Cve.Configuration; -using StellaOps.Concelier.Connector.Cve.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Cve; - -public sealed class CveConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) - { - PropertyNameCaseInsensitive = true, - WriteIndented = false, - }; - - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly 
CveOptions _options; - private readonly CveDiagnostics _diagnostics; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - - public CveConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, - CveDiagnostics diagnostics, - TimeProvider? timeProvider, - ILogger logger) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => CveConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var now = _timeProvider.GetUtcNow(); - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - - if (!_options.HasCredentials()) - { - if (await TrySeedFromDirectoryAsync(cursor, now, cancellationToken).ConfigureAwait(false)) - { - return; - } - - _logger.LogWarning("CVEs fetch skipped: no credentials configured and no seed data found at {SeedDirectory}.", _options.SeedDirectory ?? "(seed directory not configured)"); - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var initialPendingDocuments = pendingDocuments.Count; - var initialPendingMappings = pendingMappings.Count; - var documentsFetched = 0; - var detailFailures = 0; - var detailUnchanged = 0; - var listSuccessCount = 0; - var listUnchangedCount = 0; - - var since = cursor.CurrentWindowStart ?? cursor.LastModifiedExclusive ?? now - _options.InitialBackfill; - if (since > now) - { - since = now; - } - - var windowEnd = cursor.CurrentWindowEnd ?? now; - if (windowEnd <= since) - { - windowEnd = since + TimeSpan.FromMinutes(1); - } - - var page = cursor.NextPage <= 0 ? 1 : cursor.NextPage; - var pagesFetched = 0; - var hasMorePages = true; - DateTimeOffset? 
maxModified = cursor.LastModifiedExclusive; - - while (hasMorePages && pagesFetched < _options.MaxPagesPerFetch) - { - cancellationToken.ThrowIfCancellationRequested(); - - var requestUri = BuildListRequestUri(since, windowEnd, page, _options.PageSize); - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["since"] = since.ToString("O"), - ["until"] = windowEnd.ToString("O"), - ["page"] = page.ToString(CultureInfo.InvariantCulture), - ["pageSize"] = _options.PageSize.ToString(CultureInfo.InvariantCulture), - }; - - SourceFetchContentResult listResult; - try - { - _diagnostics.FetchAttempt(); - listResult = await _fetchService.FetchContentAsync( - new SourceFetchRequest( - CveOptions.HttpClientName, - SourceName, - requestUri) - { - Metadata = metadata, - AcceptHeaders = new[] { "application/json" }, - }, - cancellationToken).ConfigureAwait(false); - } - catch (HttpRequestException ex) when (IsAuthenticationFailure(ex)) - { - _logger.LogWarning("CVEs fetch requires API credentials ({StatusCode}); falling back to seed data if available.", ex.StatusCode); - if (await TrySeedFromDirectoryAsync(cursor, now, cancellationToken).ConfigureAwait(false)) - { - return; - } - - _logger.LogWarning("CVEs fetch aborted: no seed data available (SeedDirectory={SeedDirectory}).", _options.SeedDirectory ?? 
"(seed directory not configured)"); - return; - } - catch (HttpRequestException ex) - { - _diagnostics.FetchFailure(); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (listResult.IsNotModified) - { - _diagnostics.FetchUnchanged(); - listUnchangedCount++; - break; - } - - if (!listResult.IsSuccess || listResult.Content is null) - { - _diagnostics.FetchFailure(); - break; - } - - _diagnostics.FetchSuccess(); - listSuccessCount++; - - var pageModel = CveListParser.Parse(listResult.Content, page, _options.PageSize); - - if (pageModel.Items.Count == 0) - { - hasMorePages = false; - } - - foreach (var item in pageModel.Items) - { - cancellationToken.ThrowIfCancellationRequested(); - - var detailUri = BuildDetailRequestUri(item.CveId); - var detailMetadata = new Dictionary(StringComparer.Ordinal) - { - ["cveId"] = item.CveId, - ["page"] = page.ToString(CultureInfo.InvariantCulture), - ["since"] = since.ToString("O"), - ["until"] = windowEnd.ToString("O"), - }; - - SourceFetchResult detailResult; - try - { - detailResult = await _fetchService.FetchAsync( - new SourceFetchRequest( - CveOptions.HttpClientName, - SourceName, - detailUri) - { - Metadata = detailMetadata, - AcceptHeaders = new[] { "application/json" }, - }, - cancellationToken).ConfigureAwait(false); - } - catch (HttpRequestException ex) when (IsAuthenticationFailure(ex)) - { - _diagnostics.FetchFailure(); - _logger.LogWarning(ex, "Failed fetching CVE record {CveId} due to authentication. 
Seeding if possible.", item.CveId); - if (await TrySeedFromDirectoryAsync(cursor, now, cancellationToken).ConfigureAwait(false)) - { - return; - } - - _logger.LogWarning("CVE record {CveId} skipped; missing credentials and no seed data available.", item.CveId); - continue; - } - catch (HttpRequestException ex) - { - _diagnostics.FetchFailure(); - _logger.LogWarning(ex, "Failed fetching CVE record {CveId}", item.CveId); - continue; - } - - if (detailResult.IsNotModified) - { - _diagnostics.FetchUnchanged(); - detailUnchanged++; - continue; - } - - if (!detailResult.IsSuccess || detailResult.Document is null) - { - _diagnostics.FetchFailure(); - detailFailures++; - continue; - } - - _diagnostics.FetchDocument(); - if (pendingDocuments.Add(detailResult.Document.Id)) - { - documentsFetched++; - } - pendingMappings.Add(detailResult.Document.Id); - } - - if (pageModel.MaxModified.HasValue) - { - if (!maxModified.HasValue || pageModel.MaxModified > maxModified) - { - maxModified = pageModel.MaxModified; - } - } - - hasMorePages = pageModel.HasMorePages; - page = pageModel.NextPageCandidate; - pagesFetched++; - - if (hasMorePages && _options.RequestDelay > TimeSpan.Zero) - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - if (hasMorePages) - { - updatedCursor = updatedCursor - .WithCurrentWindowStart(since) - .WithCurrentWindowEnd(windowEnd) - .WithNextPage(page); - } - else - { - var nextSince = maxModified ?? windowEnd; - updatedCursor = updatedCursor - .WithLastModifiedExclusive(nextSince) - .WithCurrentWindowStart(null) - .WithCurrentWindowEnd(null) - .WithNextPage(1); - } - - var nextWindowStart = hasMorePages ? since : maxModified ?? windowEnd; - DateTimeOffset? nextWindowEnd = hasMorePages ? windowEnd : null; - var nextPage = hasMorePages ? 
page : 1; - var windowStartString = since.ToString("O"); - var windowEndString = windowEnd.ToString("O"); - var nextWindowStartString = nextWindowStart.ToString("O"); - var nextWindowEndString = nextWindowEnd?.ToString("O") ?? "(none)"; - - _logger.LogInformation( - "CVEs fetch window {WindowStart}->{WindowEnd} pages={PagesFetched} listSuccess={ListSuccess} detailDocuments={DocumentsFetched} detailFailures={DetailFailures} detailUnchanged={DetailUnchanged} pendingDocuments={PendingDocumentsBefore}->{PendingDocumentsAfter} pendingMappings={PendingMappingsBefore}->{PendingMappingsAfter} hasMorePages={HasMorePages} nextWindowStart={NextWindowStart} nextWindowEnd={NextWindowEnd} nextPage={NextPage}", - windowStartString, - windowEndString, - pagesFetched, - listSuccessCount, - documentsFetched, - detailFailures, - detailUnchanged, - initialPendingDocuments, - pendingDocuments.Count, - initialPendingMappings, - pendingMappings.Count, - hasMorePages, - nextWindowStartString, - nextWindowEndString, - nextPage); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingDocuments = cursor.PendingDocuments.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - remainingDocuments.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _diagnostics.ParseFailure(); - _logger.LogWarning("CVEs document {DocumentId} missing GridFS content", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, 
cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - continue; - } - - byte[] rawBytes; - try - { - rawBytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.ParseFailure(); - _logger.LogError(ex, "Unable to download CVE raw document {DocumentId}", documentId); - throw; - } - - CveRecordDto dto; - try - { - dto = CveRecordParser.Parse(rawBytes); - } - catch (JsonException ex) - { - _diagnostics.ParseQuarantine(); - _logger.LogError(ex, "Malformed CVE JSON for {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - continue; - } - - var payload = BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); - var dtoRecord = new DtoRecord( - Guid.NewGuid(), - document.Id, - SourceName, - "cve/5.0", - payload, - _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remainingDocuments.Remove(documentId); - _diagnostics.ParseSuccess(); - } - - var updatedCursor = cursor - .WithPendingDocuments(remainingDocuments); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dtoRecord = await 
_dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dtoRecord is null || document is null) - { - _logger.LogWarning("Skipping CVE mapping for {DocumentId}: DTO or document missing", documentId); - pendingMappings.Remove(documentId); - continue; - } - - CveRecordDto dto; - try - { - dto = JsonSerializer.Deserialize(dtoRecord.Payload.ToJson(), SerializerOptions) - ?? throw new InvalidOperationException("Deserialized DTO was null."); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to deserialize CVE DTO for {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var recordedAt = dtoRecord.ValidatedAt; - var advisory = CveMapper.Map(dto, document, recordedAt); - - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - _diagnostics.MapSuccess(1); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task TrySeedFromDirectoryAsync(CveCursor cursor, DateTimeOffset now, CancellationToken cancellationToken) - { - var seedDirectory = _options.SeedDirectory; - if (string.IsNullOrWhiteSpace(seedDirectory) || !Directory.Exists(seedDirectory)) - { - return false; - } - - var detailFiles = Directory.EnumerateFiles(seedDirectory, "CVE-*.json", SearchOption.AllDirectories) - .OrderBy(static path => path, StringComparer.OrdinalIgnoreCase) - .ToArray(); - - if (detailFiles.Length == 0) - { - return false; - } - - var seeded = 0; - DateTimeOffset? 
maxModified = cursor.LastModifiedExclusive; - - foreach (var file in detailFiles) - { - cancellationToken.ThrowIfCancellationRequested(); - - byte[] payload; - try - { - payload = await File.ReadAllBytesAsync(file, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "Unable to read CVE seed file {File}", file); - continue; - } - - CveRecordDto dto; - try - { - dto = CveRecordParser.Parse(payload); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "Seed file {File} did not contain a valid CVE record", file); - continue; - } - - if (string.IsNullOrWhiteSpace(dto.CveId)) - { - _logger.LogWarning("Seed file {File} missing CVE identifier", file); - continue; - } - - var uri = $"seed://{dto.CveId}"; - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, uri, cancellationToken).ConfigureAwait(false); - var documentId = existing?.Id ?? Guid.NewGuid(); - - var sha256 = Convert.ToHexString(SHA256.HashData(payload)).ToLowerInvariant(); - var lastModified = dto.Modified ?? dto.Published ?? 
now; - ObjectId gridId = ObjectId.Empty; - - try - { - if (existing?.GridFsId is ObjectId existingGrid && existingGrid != ObjectId.Empty) - { - gridId = existingGrid; - } - else - { - gridId = await _rawDocumentStorage.UploadAsync(SourceName, uri, payload, "application/json", cancellationToken).ConfigureAwait(false); - } - } - catch (Exception ex) - { - _logger.LogWarning(ex, "Unable to store CVE seed payload for {CveId}", dto.CveId); - continue; - } - - var metadata = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) - { - ["seed.file"] = Path.GetFileName(file), - ["seed.directory"] = seedDirectory, - }; - - var document = new DocumentRecord( - documentId, - SourceName, - uri, - now, - sha256, - DocumentStatuses.Mapped, - "application/json", - Headers: null, - Metadata: metadata, - Etag: null, - LastModified: lastModified, - GridFsId: gridId); - - await _documentStore.UpsertAsync(document, cancellationToken).ConfigureAwait(false); - - var advisory = CveMapper.Map(dto, document, now); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - - if (!maxModified.HasValue || lastModified > maxModified) - { - maxModified = lastModified; - } - - seeded++; - } - - if (seeded == 0) - { - return false; - } - - var updatedCursor = cursor - .WithPendingDocuments(Array.Empty<Guid>()) - .WithPendingMappings(Array.Empty<Guid>()) - .WithLastModifiedExclusive(maxModified ?? 
now) - .WithCurrentWindowStart(null) - .WithCurrentWindowEnd(null) - .WithNextPage(1); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - - _logger.LogWarning("Seeded {SeededCount} CVE advisories from {SeedDirectory}; live fetch will resume when credentials are configured.", seeded, seedDirectory); - return true; - } - - private static bool IsAuthenticationFailure(HttpRequestException exception) - => exception.StatusCode is HttpStatusCode.Unauthorized or HttpStatusCode.Forbidden; - - private async Task<CveCursor> GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? CveCursor.Empty : CveCursor.FromBson(state.Cursor); - } - - private async Task UpdateCursorAsync(CveCursor cursor, CancellationToken cancellationToken) - { - await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); - } - - private static Uri BuildListRequestUri(DateTimeOffset since, DateTimeOffset until, int page, int pageSize) - { - var query = $"time_modified.gte={Uri.EscapeDataString(since.ToString("O"))}&time_modified.lte={Uri.EscapeDataString(until.ToString("O"))}&page={page}&size={pageSize}"; - return new Uri($"cve?{query}", UriKind.Relative); - } - - private static Uri BuildDetailRequestUri(string cveId) - { - var encoded = Uri.EscapeDataString(cveId); - return new Uri($"cve/{encoded}", UriKind.Relative); - } -} +using System.Collections.Generic; +using System.Globalization; +using System.IO; +using System.Linq; +using System.Net; +using System.Net.Http; +using System.Text.Json; +using System.Security.Cryptography; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Normalization.Text; +using StellaOps.Concelier.Connector.Common; +using 
StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Cve.Configuration; +using StellaOps.Concelier.Connector.Cve.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Cve; + +public sealed class CveConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) + { + PropertyNameCaseInsensitive = true, + WriteIndented = false, + }; + + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly CveOptions _options; + private readonly CveDiagnostics _diagnostics; + private readonly TimeProvider _timeProvider; + private readonly ILogger<CveConnector> _logger; + + public CveConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions<CveOptions> options, + CveDiagnostics diagnostics, + TimeProvider? timeProvider, + ILogger<CveConnector> logger) + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? 
throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => CveConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var now = _timeProvider.GetUtcNow(); + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + + if (!_options.HasCredentials()) + { + if (await TrySeedFromDirectoryAsync(cursor, now, cancellationToken).ConfigureAwait(false)) + { + return; + } + + _logger.LogWarning("CVEs fetch skipped: no credentials configured and no seed data found at {SeedDirectory}.", _options.SeedDirectory ?? "(seed directory not configured)"); + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var initialPendingDocuments = pendingDocuments.Count; + var initialPendingMappings = pendingMappings.Count; + var documentsFetched = 0; + var detailFailures = 0; + var detailUnchanged = 0; + var listSuccessCount = 0; + var listUnchangedCount = 0; + + var since = cursor.CurrentWindowStart ?? cursor.LastModifiedExclusive ?? now - _options.InitialBackfill; + if (since > now) + { + since = now; + } + + var windowEnd = cursor.CurrentWindowEnd ?? now; + if (windowEnd <= since) + { + windowEnd = since + TimeSpan.FromMinutes(1); + } + + var page = cursor.NextPage <= 0 ? 1 : cursor.NextPage; + var pagesFetched = 0; + var hasMorePages = true; + DateTimeOffset? 
maxModified = cursor.LastModifiedExclusive; + + while (hasMorePages && pagesFetched < _options.MaxPagesPerFetch) + { + cancellationToken.ThrowIfCancellationRequested(); + + var requestUri = BuildListRequestUri(since, windowEnd, page, _options.PageSize); + var metadata = new Dictionary<string, string>(StringComparer.Ordinal) + { + ["since"] = since.ToString("O"), + ["until"] = windowEnd.ToString("O"), + ["page"] = page.ToString(CultureInfo.InvariantCulture), + ["pageSize"] = _options.PageSize.ToString(CultureInfo.InvariantCulture), + }; + + SourceFetchContentResult listResult; + try + { + _diagnostics.FetchAttempt(); + listResult = await _fetchService.FetchContentAsync( + new SourceFetchRequest( + CveOptions.HttpClientName, + SourceName, + requestUri) + { + Metadata = metadata, + AcceptHeaders = new[] { "application/json" }, + }, + cancellationToken).ConfigureAwait(false); + } + catch (HttpRequestException ex) when (IsAuthenticationFailure(ex)) + { + _logger.LogWarning("CVEs fetch requires API credentials ({StatusCode}); falling back to seed data if available.", ex.StatusCode); + if (await TrySeedFromDirectoryAsync(cursor, now, cancellationToken).ConfigureAwait(false)) + { + return; + } + + _logger.LogWarning("CVEs fetch aborted: no seed data available (SeedDirectory={SeedDirectory}).", _options.SeedDirectory ?? 
"(seed directory not configured)"); + return; + } + catch (HttpRequestException ex) + { + _diagnostics.FetchFailure(); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (listResult.IsNotModified) + { + _diagnostics.FetchUnchanged(); + listUnchangedCount++; + break; + } + + if (!listResult.IsSuccess || listResult.Content is null) + { + _diagnostics.FetchFailure(); + break; + } + + _diagnostics.FetchSuccess(); + listSuccessCount++; + + var pageModel = CveListParser.Parse(listResult.Content, page, _options.PageSize); + + if (pageModel.Items.Count == 0) + { + hasMorePages = false; + } + + foreach (var item in pageModel.Items) + { + cancellationToken.ThrowIfCancellationRequested(); + + var detailUri = BuildDetailRequestUri(item.CveId); + var detailMetadata = new Dictionary<string, string>(StringComparer.Ordinal) + { + ["cveId"] = item.CveId, + ["page"] = page.ToString(CultureInfo.InvariantCulture), + ["since"] = since.ToString("O"), + ["until"] = windowEnd.ToString("O"), + }; + + SourceFetchResult detailResult; + try + { + detailResult = await _fetchService.FetchAsync( + new SourceFetchRequest( + CveOptions.HttpClientName, + SourceName, + detailUri) + { + Metadata = detailMetadata, + AcceptHeaders = new[] { "application/json" }, + }, + cancellationToken).ConfigureAwait(false); + } + catch (HttpRequestException ex) when (IsAuthenticationFailure(ex)) + { + _diagnostics.FetchFailure(); + _logger.LogWarning(ex, "Failed fetching CVE record {CveId} due to authentication. 
Seeding if possible.", item.CveId); + if (await TrySeedFromDirectoryAsync(cursor, now, cancellationToken).ConfigureAwait(false)) + { + return; + } + + _logger.LogWarning("CVE record {CveId} skipped; missing credentials and no seed data available.", item.CveId); + continue; + } + catch (HttpRequestException ex) + { + _diagnostics.FetchFailure(); + _logger.LogWarning(ex, "Failed fetching CVE record {CveId}", item.CveId); + continue; + } + + if (detailResult.IsNotModified) + { + _diagnostics.FetchUnchanged(); + detailUnchanged++; + continue; + } + + if (!detailResult.IsSuccess || detailResult.Document is null) + { + _diagnostics.FetchFailure(); + detailFailures++; + continue; + } + + _diagnostics.FetchDocument(); + if (pendingDocuments.Add(detailResult.Document.Id)) + { + documentsFetched++; + } + pendingMappings.Add(detailResult.Document.Id); + } + + if (pageModel.MaxModified.HasValue) + { + if (!maxModified.HasValue || pageModel.MaxModified > maxModified) + { + maxModified = pageModel.MaxModified; + } + } + + hasMorePages = pageModel.HasMorePages; + page = pageModel.NextPageCandidate; + pagesFetched++; + + if (hasMorePages && _options.RequestDelay > TimeSpan.Zero) + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + if (hasMorePages) + { + updatedCursor = updatedCursor + .WithCurrentWindowStart(since) + .WithCurrentWindowEnd(windowEnd) + .WithNextPage(page); + } + else + { + var nextSince = maxModified ?? windowEnd; + updatedCursor = updatedCursor + .WithLastModifiedExclusive(nextSince) + .WithCurrentWindowStart(null) + .WithCurrentWindowEnd(null) + .WithNextPage(1); + } + + var nextWindowStart = hasMorePages ? since : maxModified ?? windowEnd; + DateTimeOffset? nextWindowEnd = hasMorePages ? windowEnd : null; + var nextPage = hasMorePages ? 
page : 1; + var windowStartString = since.ToString("O"); + var windowEndString = windowEnd.ToString("O"); + var nextWindowStartString = nextWindowStart.ToString("O"); + var nextWindowEndString = nextWindowEnd?.ToString("O") ?? "(none)"; + + _logger.LogInformation( + "CVEs fetch window {WindowStart}->{WindowEnd} pages={PagesFetched} listSuccess={ListSuccess} detailDocuments={DocumentsFetched} detailFailures={DetailFailures} detailUnchanged={DetailUnchanged} pendingDocuments={PendingDocumentsBefore}->{PendingDocumentsAfter} pendingMappings={PendingMappingsBefore}->{PendingMappingsAfter} hasMorePages={HasMorePages} nextWindowStart={NextWindowStart} nextWindowEnd={NextWindowEnd} nextPage={NextPage}", + windowStartString, + windowEndString, + pagesFetched, + listSuccessCount, + documentsFetched, + detailFailures, + detailUnchanged, + initialPendingDocuments, + pendingDocuments.Count, + initialPendingMappings, + pendingMappings.Count, + hasMorePages, + nextWindowStartString, + nextWindowEndString, + nextPage); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingDocuments = cursor.PendingDocuments.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remainingDocuments.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _diagnostics.ParseFailure(); + _logger.LogWarning("CVEs document {DocumentId} missing GridFS content", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, 
cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + continue; + } + + byte[] rawBytes; + try + { + rawBytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.ParseFailure(); + _logger.LogError(ex, "Unable to download CVE raw document {DocumentId}", documentId); + throw; + } + + CveRecordDto dto; + try + { + dto = CveRecordParser.Parse(rawBytes); + } + catch (JsonException ex) + { + _diagnostics.ParseQuarantine(); + _logger.LogError(ex, "Malformed CVE JSON for {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + continue; + } + + var payload = BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); + var dtoRecord = new DtoRecord( + Guid.NewGuid(), + document.Id, + SourceName, + "cve/5.0", + payload, + _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remainingDocuments.Remove(documentId); + _diagnostics.ParseSuccess(); + } + + var updatedCursor = cursor + .WithPendingDocuments(remainingDocuments); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dtoRecord = await 
_dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dtoRecord is null || document is null) + { + _logger.LogWarning("Skipping CVE mapping for {DocumentId}: DTO or document missing", documentId); + pendingMappings.Remove(documentId); + continue; + } + + CveRecordDto dto; + try + { + dto = JsonSerializer.Deserialize<CveRecordDto>(dtoRecord.Payload.ToJson(), SerializerOptions) + ?? throw new InvalidOperationException("Deserialized DTO was null."); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deserialize CVE DTO for {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var recordedAt = dtoRecord.ValidatedAt; + var advisory = CveMapper.Map(dto, document, recordedAt); + + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + _diagnostics.MapSuccess(1); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task<bool> TrySeedFromDirectoryAsync(CveCursor cursor, DateTimeOffset now, CancellationToken cancellationToken) + { + var seedDirectory = _options.SeedDirectory; + if (string.IsNullOrWhiteSpace(seedDirectory) || !Directory.Exists(seedDirectory)) + { + return false; + } + + var detailFiles = Directory.EnumerateFiles(seedDirectory, "CVE-*.json", SearchOption.AllDirectories) + .OrderBy(static path => path, StringComparer.OrdinalIgnoreCase) + .ToArray(); + + if (detailFiles.Length == 0) + { + return false; + } + + var seeded = 0; + DateTimeOffset? 
maxModified = cursor.LastModifiedExclusive; + + foreach (var file in detailFiles) + { + cancellationToken.ThrowIfCancellationRequested(); + + byte[] payload; + try + { + payload = await File.ReadAllBytesAsync(file, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Unable to read CVE seed file {File}", file); + continue; + } + + CveRecordDto dto; + try + { + dto = CveRecordParser.Parse(payload); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Seed file {File} did not contain a valid CVE record", file); + continue; + } + + if (string.IsNullOrWhiteSpace(dto.CveId)) + { + _logger.LogWarning("Seed file {File} missing CVE identifier", file); + continue; + } + + var uri = $"seed://{dto.CveId}"; + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, uri, cancellationToken).ConfigureAwait(false); + var documentId = existing?.Id ?? Guid.NewGuid(); + + var sha256 = Convert.ToHexString(SHA256.HashData(payload)).ToLowerInvariant(); + var lastModified = dto.Modified ?? dto.Published ?? 
now; + ObjectId gridId = ObjectId.Empty; + + try + { + if (existing?.PayloadId is ObjectId existingGrid && existingGrid != ObjectId.Empty) + { + gridId = existingGrid; + } + else + { + gridId = await _rawDocumentStorage.UploadAsync(SourceName, uri, payload, "application/json", cancellationToken).ConfigureAwait(false); + } + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Unable to store CVE seed payload for {CveId}", dto.CveId); + continue; + } + + var metadata = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) + { + ["seed.file"] = Path.GetFileName(file), + ["seed.directory"] = seedDirectory, + }; + + var document = new DocumentRecord( + documentId, + SourceName, + uri, + now, + sha256, + DocumentStatuses.Mapped, + "application/json", + Headers: null, + Metadata: metadata, + Etag: null, + LastModified: lastModified, + PayloadId: gridId); + + await _documentStore.UpsertAsync(document, cancellationToken).ConfigureAwait(false); + + var advisory = CveMapper.Map(dto, document, now); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + + if (!maxModified.HasValue || lastModified > maxModified) + { + maxModified = lastModified; + } + + seeded++; + } + + if (seeded == 0) + { + return false; + } + + var updatedCursor = cursor + .WithPendingDocuments(Array.Empty<Guid>()) + .WithPendingMappings(Array.Empty<Guid>()) + .WithLastModifiedExclusive(maxModified ?? 
now) + .WithCurrentWindowStart(null) + .WithCurrentWindowEnd(null) + .WithNextPage(1); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + + _logger.LogWarning("Seeded {SeededCount} CVE advisories from {SeedDirectory}; live fetch will resume when credentials are configured.", seeded, seedDirectory); + return true; + } + + private static bool IsAuthenticationFailure(HttpRequestException exception) + => exception.StatusCode is HttpStatusCode.Unauthorized or HttpStatusCode.Forbidden; + + private async Task<CveCursor> GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? CveCursor.Empty : CveCursor.FromBson(state.Cursor); + } + + private async Task UpdateCursorAsync(CveCursor cursor, CancellationToken cancellationToken) + { + await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); + } + + private static Uri BuildListRequestUri(DateTimeOffset since, DateTimeOffset until, int page, int pageSize) + { + var query = $"time_modified.gte={Uri.EscapeDataString(since.ToString("O"))}&time_modified.lte={Uri.EscapeDataString(until.ToString("O"))}&page={page}&size={pageSize}"; + return new Uri($"cve?{query}", UriKind.Relative); + } + + private static Uri BuildDetailRequestUri(string cveId) + { + var encoded = Uri.EscapeDataString(cveId); + return new Uri($"cve/{encoded}", UriKind.Relative); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Debian/DebianConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Debian/DebianConnector.cs index 3666fd92b..6fff821c0 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Debian/DebianConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Debian/DebianConnector.cs @@ -1,637 +1,637 @@ -using System; 
-using System.Collections.Generic; -using System.Globalization; -using System.Linq; -using System.Net; -using System.Threading; -using System.Threading.Tasks; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using MongoDB.Bson.IO; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Distro.Debian.Configuration; -using StellaOps.Concelier.Connector.Distro.Debian.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Distro.Debian; - -public sealed class DebianConnector : IFeedConnector -{ - private const string SchemaVersion = "debian.v1"; - - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly DebianOptions _options; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - - private static readonly Action LogMapped = - LoggerMessage.Define( - LogLevel.Information, - new EventId(1, "DebianMapped"), - "Debian advisory {AdvisoryId} mapped with {AffectedCount} packages"); - - public DebianConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, - TimeProvider? timeProvider, - ILogger logger) - { - _fetchService = fetchService ?? 
throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => DebianConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - - var pendingDocuments = new HashSet(cursor.PendingDocuments); - var pendingMappings = new HashSet(cursor.PendingMappings); - var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.OrdinalIgnoreCase); - var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); - - var listUri = _options.ListEndpoint; - var listKey = listUri.ToString(); - touchedResources.Add(listKey); - - var existingList = await _documentStore.FindBySourceAndUriAsync(SourceName, listKey, cancellationToken).ConfigureAwait(false); - cursor.TryGetCache(listKey, out var cachedListEntry); - - var listRequest = new SourceFetchRequest(DebianOptions.HttpClientName, SourceName, listUri) - { - Metadata = new Dictionary(StringComparer.Ordinal) - { - ["type"] = "index" - }, - AcceptHeaders = new[] { "text/plain", "text/plain; charset=utf-8" }, - TimeoutOverride = 
_options.FetchTimeout, - ETag = existingList?.Etag ?? cachedListEntry?.ETag, - LastModified = existingList?.LastModified ?? cachedListEntry?.LastModified, - }; - - SourceFetchResult listResult; - try - { - listResult = await _fetchService.FetchAsync(listRequest, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Debian list fetch failed"); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - var lastPublished = cursor.LastPublished ?? (now - _options.InitialBackfill); - var processedIds = new HashSet(cursor.ProcessedAdvisoryIds, StringComparer.OrdinalIgnoreCase); - var newProcessedIds = new HashSet(StringComparer.OrdinalIgnoreCase); - var maxPublished = cursor.LastPublished ?? DateTimeOffset.MinValue; - var processedUpdated = false; - - if (listResult.IsNotModified) - { - if (existingList is not null) - { - fetchCache[listKey] = DebianFetchCacheEntry.FromDocument(existingList); - } - } - else if (listResult.IsSuccess && listResult.Document is not null) - { - fetchCache[listKey] = DebianFetchCacheEntry.FromDocument(listResult.Document); - - if (!listResult.Document.GridFsId.HasValue) - { - _logger.LogWarning("Debian list document {DocumentId} missing GridFS payload", listResult.Document.Id); - } - else - { - byte[] bytes; - try - { - bytes = await _rawDocumentStorage.DownloadAsync(listResult.Document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to download Debian list document {DocumentId}", listResult.Document.Id); - throw; - } - - var text = System.Text.Encoding.UTF8.GetString(bytes); - var entries = DebianListParser.Parse(text); - if (entries.Count > 0) - { - var windowStart = (cursor.LastPublished ?? 
(now - _options.InitialBackfill)) - _options.ResumeOverlap; - if (windowStart < DateTimeOffset.UnixEpoch) - { - windowStart = DateTimeOffset.UnixEpoch; - } - - ProvenanceDiagnostics.ReportResumeWindow(SourceName, windowStart, _logger); - - var candidates = entries - .Where(entry => entry.Published >= windowStart) - .OrderBy(entry => entry.Published) - .ThenBy(entry => entry.AdvisoryId, StringComparer.OrdinalIgnoreCase) - .ToList(); - - if (candidates.Count == 0) - { - candidates = entries - .OrderByDescending(entry => entry.Published) - .ThenBy(entry => entry.AdvisoryId, StringComparer.OrdinalIgnoreCase) - .Take(_options.MaxAdvisoriesPerFetch) - .OrderBy(entry => entry.Published) - .ThenBy(entry => entry.AdvisoryId, StringComparer.OrdinalIgnoreCase) - .ToList(); - } - else if (candidates.Count > _options.MaxAdvisoriesPerFetch) - { - candidates = candidates - .OrderByDescending(entry => entry.Published) - .ThenBy(entry => entry.AdvisoryId, StringComparer.OrdinalIgnoreCase) - .Take(_options.MaxAdvisoriesPerFetch) - .OrderBy(entry => entry.Published) - .ThenBy(entry => entry.AdvisoryId, StringComparer.OrdinalIgnoreCase) - .ToList(); - } - - foreach (var entry in candidates) - { - cancellationToken.ThrowIfCancellationRequested(); - - var detailUri = new Uri(_options.DetailBaseUri, entry.AdvisoryId); - var cacheKey = detailUri.ToString(); - touchedResources.Add(cacheKey); - - cursor.TryGetCache(cacheKey, out var cachedDetail); - if (!fetchCache.TryGetValue(cacheKey, out var cachedInRun)) - { - cachedInRun = cachedDetail; - } - - var metadata = BuildDetailMetadata(entry); - var existingDetail = await _documentStore.FindBySourceAndUriAsync(SourceName, cacheKey, cancellationToken).ConfigureAwait(false); - - var request = new SourceFetchRequest(DebianOptions.HttpClientName, SourceName, detailUri) - { - Metadata = metadata, - AcceptHeaders = new[] { "text/html", "application/xhtml+xml" }, - TimeoutOverride = _options.FetchTimeout, - ETag = existingDetail?.Etag ?? 
cachedInRun?.ETag, - LastModified = existingDetail?.LastModified ?? cachedInRun?.LastModified, - }; - - SourceFetchResult result; - try - { - result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to fetch Debian advisory {AdvisoryId}", entry.AdvisoryId); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (result.IsNotModified) - { - if (existingDetail is not null) - { - fetchCache[cacheKey] = DebianFetchCacheEntry.FromDocument(existingDetail); - if (string.Equals(existingDetail.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) - { - pendingDocuments.Remove(existingDetail.Id); - pendingMappings.Remove(existingDetail.Id); - } - } - - continue; - } - - if (!result.IsSuccess || result.Document is null) - { - continue; - } - - fetchCache[cacheKey] = DebianFetchCacheEntry.FromDocument(result.Document); - pendingDocuments.Add(result.Document.Id); - pendingMappings.Remove(result.Document.Id); - - if (_options.RequestDelay > TimeSpan.Zero) - { - try - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - catch (TaskCanceledException) - { - break; - } - } - - if (entry.Published > maxPublished) - { - maxPublished = entry.Published; - newProcessedIds.Clear(); - processedUpdated = true; - } - - if (entry.Published == maxPublished) - { - newProcessedIds.Add(entry.AdvisoryId); - processedUpdated = true; - } - } - } - } - } - - if (fetchCache.Count > 0 && touchedResources.Count > 0) - { - var stale = fetchCache.Keys.Where(key => !touchedResources.Contains(key)).ToArray(); - foreach (var key in stale) - { - fetchCache.Remove(key); - } - } - - if (!processedUpdated && cursor.LastPublished.HasValue) - { - maxPublished = cursor.LastPublished.Value; - newProcessedIds = new HashSet(cursor.ProcessedAdvisoryIds, 
StringComparer.OrdinalIgnoreCase); - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithFetchCache(fetchCache); - - if (processedUpdated && maxPublished > DateTimeOffset.MinValue) - { - updatedCursor = updatedCursor.WithProcessed(maxPublished, newProcessedIds); - } - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remaining = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - remaining.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("Debian document {DocumentId} missing GridFS payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remaining.Remove(documentId); - continue; - } - - var metadata = ExtractMetadata(document); - if (metadata is null) - { - _logger.LogWarning("Debian document {DocumentId} missing required metadata", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remaining.Remove(documentId); - continue; - } - - byte[] bytes; - try - { - bytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to download Debian document 
{DocumentId}", document.Id); - throw; - } - - var html = System.Text.Encoding.UTF8.GetString(bytes); - DebianAdvisoryDto dto; - try - { - dto = DebianHtmlParser.Parse(html, metadata); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "Failed to parse Debian advisory {AdvisoryId}", metadata.AdvisoryId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remaining.Remove(document.Id); - continue; - } - - var payload = ToBson(dto); - var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, SchemaVersion, payload, _timeProvider.GetUtcNow()); - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remaining.Remove(document.Id); - if (!pendingMappings.Contains(document.Id)) - { - pendingMappings.Add(document.Id); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(remaining) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (dtoRecord is null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - DebianAdvisoryDto dto; - try - { - dto = 
FromBson(dtoRecord.Payload); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to deserialize Debian DTO for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var advisory = DebianMapper.Map(dto, document, _timeProvider.GetUtcNow()); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - LogMapped(_logger, dto.AdvisoryId, advisory.AffectedPackages.Length, null); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? DebianCursor.Empty : DebianCursor.FromBson(state.Cursor); - } - - private async Task UpdateCursorAsync(DebianCursor cursor, CancellationToken cancellationToken) - { - var document = cursor.ToBsonDocument(); - await _stateRepository.UpdateCursorAsync(SourceName, document, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); - } - - private static Dictionary BuildDetailMetadata(DebianListEntry entry) - { - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["debian.id"] = entry.AdvisoryId, - ["debian.published"] = entry.Published.ToString("O", CultureInfo.InvariantCulture), - ["debian.title"] = entry.Title, - ["debian.package"] = entry.SourcePackage - }; - - if (entry.CveIds.Count > 0) - { - metadata["debian.cves"] = string.Join(' ', entry.CveIds); - } - - return metadata; - } - - private static DebianDetailMetadata? 
ExtractMetadata(DocumentRecord document) - { - if (document.Metadata is null) - { - return null; - } - - if (!document.Metadata.TryGetValue("debian.id", out var id) || string.IsNullOrWhiteSpace(id)) - { - return null; - } - - if (!document.Metadata.TryGetValue("debian.published", out var publishedRaw) - || !DateTimeOffset.TryParse(publishedRaw, CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal, out var published)) - { - published = document.FetchedAt; - } - - var title = document.Metadata.TryGetValue("debian.title", out var t) ? t : id; - var package = document.Metadata.TryGetValue("debian.package", out var pkg) && !string.IsNullOrWhiteSpace(pkg) - ? pkg - : id; - - IReadOnlyList cveList = Array.Empty(); - if (document.Metadata.TryGetValue("debian.cves", out var cvesRaw) && !string.IsNullOrWhiteSpace(cvesRaw)) - { - cveList = cvesRaw - .Split(' ', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries) - .Where(static s => !string.IsNullOrWhiteSpace(s)) - .Select(static s => s!) 
- .Distinct(StringComparer.OrdinalIgnoreCase) - .ToArray(); - } - - return new DebianDetailMetadata( - id.Trim(), - new Uri(document.Uri, UriKind.Absolute), - published.ToUniversalTime(), - title, - package, - cveList); - } - - private static BsonDocument ToBson(DebianAdvisoryDto dto) - { - var packages = new BsonArray(); - foreach (var package in dto.Packages) - { - var packageDoc = new BsonDocument - { - ["package"] = package.Package, - ["release"] = package.Release, - ["status"] = package.Status, - }; - - if (!string.IsNullOrWhiteSpace(package.IntroducedVersion)) - { - packageDoc["introduced"] = package.IntroducedVersion; - } - - if (!string.IsNullOrWhiteSpace(package.FixedVersion)) - { - packageDoc["fixed"] = package.FixedVersion; - } - - if (!string.IsNullOrWhiteSpace(package.LastAffectedVersion)) - { - packageDoc["last"] = package.LastAffectedVersion; - } - - if (package.Published.HasValue) - { - packageDoc["published"] = package.Published.Value.UtcDateTime; - } - - packages.Add(packageDoc); - } - - var references = new BsonArray(dto.References.Select(reference => - { - var doc = new BsonDocument - { - ["url"] = reference.Url - }; - - if (!string.IsNullOrWhiteSpace(reference.Kind)) - { - doc["kind"] = reference.Kind; - } - - if (!string.IsNullOrWhiteSpace(reference.Title)) - { - doc["title"] = reference.Title; - } - - return doc; - })); - - return new BsonDocument - { - ["advisoryId"] = dto.AdvisoryId, - ["sourcePackage"] = dto.SourcePackage, - ["title"] = dto.Title, - ["description"] = dto.Description ?? 
string.Empty, - ["cves"] = new BsonArray(dto.CveIds), - ["packages"] = packages, - ["references"] = references, - }; - } - - private static DebianAdvisoryDto FromBson(BsonDocument document) - { - var advisoryId = document.GetValue("advisoryId", "").AsString; - var sourcePackage = document.GetValue("sourcePackage", advisoryId).AsString; - var title = document.GetValue("title", advisoryId).AsString; - var description = document.TryGetValue("description", out var desc) ? desc.AsString : null; - - var cves = document.TryGetValue("cves", out var cveArray) && cveArray is BsonArray cvesBson - ? cvesBson.OfType() - .Select(static value => value.ToString()) - .Where(static s => !string.IsNullOrWhiteSpace(s)) - .Select(static s => s!) - .ToArray() - : Array.Empty(); - - var packages = new List(); - if (document.TryGetValue("packages", out var packageArray) && packageArray is BsonArray packagesBson) - { - foreach (var element in packagesBson.OfType()) - { - packages.Add(new DebianPackageStateDto( - element.GetValue("package", sourcePackage).AsString, - element.GetValue("release", string.Empty).AsString, - element.GetValue("status", "unknown").AsString, - element.TryGetValue("introduced", out var introducedValue) ? introducedValue.AsString : null, - element.TryGetValue("fixed", out var fixedValue) ? fixedValue.AsString : null, - element.TryGetValue("last", out var lastValue) ? lastValue.AsString : null, - element.TryGetValue("published", out var publishedValue) - ? 
publishedValue.BsonType switch - { - BsonType.DateTime => DateTime.SpecifyKind(publishedValue.ToUniversalTime(), DateTimeKind.Utc), - BsonType.String when DateTimeOffset.TryParse(publishedValue.AsString, out var parsed) => parsed.ToUniversalTime(), - _ => (DateTimeOffset?)null, - } - : null)); - } - } - - var references = new List(); - if (document.TryGetValue("references", out var referenceArray) && referenceArray is BsonArray refBson) - { - foreach (var element in refBson.OfType()) - { - references.Add(new DebianReferenceDto( - element.GetValue("url", "").AsString, - element.TryGetValue("kind", out var kind) ? kind.AsString : null, - element.TryGetValue("title", out var titleValue) ? titleValue.AsString : null)); - } - } - - return new DebianAdvisoryDto( - advisoryId, - sourcePackage, - title, - description, - cves, - packages, - references); - } -} +using System; +using System.Collections.Generic; +using System.Globalization; +using System.Linq; +using System.Net; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using MongoDB.Bson.IO; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Distro.Debian.Configuration; +using StellaOps.Concelier.Connector.Distro.Debian.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Distro.Debian; + +public sealed class DebianConnector : IFeedConnector +{ + private const string SchemaVersion = "debian.v1"; + + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore 
_dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly DebianOptions _options; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + private static readonly Action LogMapped = + LoggerMessage.Define( + LogLevel.Information, + new EventId(1, "DebianMapped"), + "Debian advisory {AdvisoryId} mapped with {AffectedCount} packages"); + + public DebianConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions options, + TimeProvider? timeProvider, + ILogger logger) + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => DebianConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + + var pendingDocuments = new HashSet(cursor.PendingDocuments); + var pendingMappings = new HashSet(cursor.PendingMappings); + var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.OrdinalIgnoreCase); + var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); + + var listUri = _options.ListEndpoint; + var listKey = listUri.ToString(); + touchedResources.Add(listKey); + + var existingList = await _documentStore.FindBySourceAndUriAsync(SourceName, listKey, cancellationToken).ConfigureAwait(false); + cursor.TryGetCache(listKey, out var cachedListEntry); + + var listRequest = new SourceFetchRequest(DebianOptions.HttpClientName, SourceName, listUri) + { + Metadata = new Dictionary(StringComparer.Ordinal) + { + ["type"] = "index" + }, + AcceptHeaders = new[] { "text/plain", "text/plain; charset=utf-8" }, + TimeoutOverride = _options.FetchTimeout, + ETag = existingList?.Etag ?? cachedListEntry?.ETag, + LastModified = existingList?.LastModified ?? cachedListEntry?.LastModified, + }; + + SourceFetchResult listResult; + try + { + listResult = await _fetchService.FetchAsync(listRequest, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Debian list fetch failed"); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + var lastPublished = cursor.LastPublished ?? 
(now - _options.InitialBackfill); + var processedIds = new HashSet(cursor.ProcessedAdvisoryIds, StringComparer.OrdinalIgnoreCase); + var newProcessedIds = new HashSet(StringComparer.OrdinalIgnoreCase); + var maxPublished = cursor.LastPublished ?? DateTimeOffset.MinValue; + var processedUpdated = false; + + if (listResult.IsNotModified) + { + if (existingList is not null) + { + fetchCache[listKey] = DebianFetchCacheEntry.FromDocument(existingList); + } + } + else if (listResult.IsSuccess && listResult.Document is not null) + { + fetchCache[listKey] = DebianFetchCacheEntry.FromDocument(listResult.Document); + + if (!listResult.Document.PayloadId.HasValue) + { + _logger.LogWarning("Debian list document {DocumentId} missing GridFS payload", listResult.Document.Id); + } + else + { + byte[] bytes; + try + { + bytes = await _rawDocumentStorage.DownloadAsync(listResult.Document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to download Debian list document {DocumentId}", listResult.Document.Id); + throw; + } + + var text = System.Text.Encoding.UTF8.GetString(bytes); + var entries = DebianListParser.Parse(text); + if (entries.Count > 0) + { + var windowStart = (cursor.LastPublished ?? 
(now - _options.InitialBackfill)) - _options.ResumeOverlap; + if (windowStart < DateTimeOffset.UnixEpoch) + { + windowStart = DateTimeOffset.UnixEpoch; + } + + ProvenanceDiagnostics.ReportResumeWindow(SourceName, windowStart, _logger); + + var candidates = entries + .Where(entry => entry.Published >= windowStart) + .OrderBy(entry => entry.Published) + .ThenBy(entry => entry.AdvisoryId, StringComparer.OrdinalIgnoreCase) + .ToList(); + + if (candidates.Count == 0) + { + candidates = entries + .OrderByDescending(entry => entry.Published) + .ThenBy(entry => entry.AdvisoryId, StringComparer.OrdinalIgnoreCase) + .Take(_options.MaxAdvisoriesPerFetch) + .OrderBy(entry => entry.Published) + .ThenBy(entry => entry.AdvisoryId, StringComparer.OrdinalIgnoreCase) + .ToList(); + } + else if (candidates.Count > _options.MaxAdvisoriesPerFetch) + { + candidates = candidates + .OrderByDescending(entry => entry.Published) + .ThenBy(entry => entry.AdvisoryId, StringComparer.OrdinalIgnoreCase) + .Take(_options.MaxAdvisoriesPerFetch) + .OrderBy(entry => entry.Published) + .ThenBy(entry => entry.AdvisoryId, StringComparer.OrdinalIgnoreCase) + .ToList(); + } + + foreach (var entry in candidates) + { + cancellationToken.ThrowIfCancellationRequested(); + + var detailUri = new Uri(_options.DetailBaseUri, entry.AdvisoryId); + var cacheKey = detailUri.ToString(); + touchedResources.Add(cacheKey); + + cursor.TryGetCache(cacheKey, out var cachedDetail); + if (!fetchCache.TryGetValue(cacheKey, out var cachedInRun)) + { + cachedInRun = cachedDetail; + } + + var metadata = BuildDetailMetadata(entry); + var existingDetail = await _documentStore.FindBySourceAndUriAsync(SourceName, cacheKey, cancellationToken).ConfigureAwait(false); + + var request = new SourceFetchRequest(DebianOptions.HttpClientName, SourceName, detailUri) + { + Metadata = metadata, + AcceptHeaders = new[] { "text/html", "application/xhtml+xml" }, + TimeoutOverride = _options.FetchTimeout, + ETag = existingDetail?.Etag ?? 
cachedInRun?.ETag, + LastModified = existingDetail?.LastModified ?? cachedInRun?.LastModified, + }; + + SourceFetchResult result; + try + { + result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to fetch Debian advisory {AdvisoryId}", entry.AdvisoryId); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (result.IsNotModified) + { + if (existingDetail is not null) + { + fetchCache[cacheKey] = DebianFetchCacheEntry.FromDocument(existingDetail); + if (string.Equals(existingDetail.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) + { + pendingDocuments.Remove(existingDetail.Id); + pendingMappings.Remove(existingDetail.Id); + } + } + + continue; + } + + if (!result.IsSuccess || result.Document is null) + { + continue; + } + + fetchCache[cacheKey] = DebianFetchCacheEntry.FromDocument(result.Document); + pendingDocuments.Add(result.Document.Id); + pendingMappings.Remove(result.Document.Id); + + if (_options.RequestDelay > TimeSpan.Zero) + { + try + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + catch (TaskCanceledException) + { + break; + } + } + + if (entry.Published > maxPublished) + { + maxPublished = entry.Published; + newProcessedIds.Clear(); + processedUpdated = true; + } + + if (entry.Published == maxPublished) + { + newProcessedIds.Add(entry.AdvisoryId); + processedUpdated = true; + } + } + } + } + } + + if (fetchCache.Count > 0 && touchedResources.Count > 0) + { + var stale = fetchCache.Keys.Where(key => !touchedResources.Contains(key)).ToArray(); + foreach (var key in stale) + { + fetchCache.Remove(key); + } + } + + if (!processedUpdated && cursor.LastPublished.HasValue) + { + maxPublished = cursor.LastPublished.Value; + newProcessedIds = new HashSet(cursor.ProcessedAdvisoryIds, 
StringComparer.OrdinalIgnoreCase); + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithFetchCache(fetchCache); + + if (processedUpdated && maxPublished > DateTimeOffset.MinValue) + { + updatedCursor = updatedCursor.WithProcessed(maxPublished, newProcessedIds); + } + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remaining = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remaining.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("Debian document {DocumentId} missing GridFS payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remaining.Remove(documentId); + continue; + } + + var metadata = ExtractMetadata(document); + if (metadata is null) + { + _logger.LogWarning("Debian document {DocumentId} missing required metadata", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remaining.Remove(documentId); + continue; + } + + byte[] bytes; + try + { + bytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to download Debian document 
{DocumentId}", document.Id); + throw; + } + + var html = System.Text.Encoding.UTF8.GetString(bytes); + DebianAdvisoryDto dto; + try + { + dto = DebianHtmlParser.Parse(html, metadata); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to parse Debian advisory {AdvisoryId}", metadata.AdvisoryId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remaining.Remove(document.Id); + continue; + } + + var payload = ToBson(dto); + var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, SchemaVersion, payload, _timeProvider.GetUtcNow()); + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remaining.Remove(document.Id); + if (!pendingMappings.Contains(document.Id)) + { + pendingMappings.Add(document.Id); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(remaining) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (dtoRecord is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + DebianAdvisoryDto dto; + try + { + dto = 
FromBson(dtoRecord.Payload); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deserialize Debian DTO for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var advisory = DebianMapper.Map(dto, document, _timeProvider.GetUtcNow()); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + LogMapped(_logger, dto.AdvisoryId, advisory.AffectedPackages.Length, null); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? DebianCursor.Empty : DebianCursor.FromBson(state.Cursor); + } + + private async Task UpdateCursorAsync(DebianCursor cursor, CancellationToken cancellationToken) + { + var document = cursor.ToBsonDocument(); + await _stateRepository.UpdateCursorAsync(SourceName, document, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); + } + + private static Dictionary BuildDetailMetadata(DebianListEntry entry) + { + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["debian.id"] = entry.AdvisoryId, + ["debian.published"] = entry.Published.ToString("O", CultureInfo.InvariantCulture), + ["debian.title"] = entry.Title, + ["debian.package"] = entry.SourcePackage + }; + + if (entry.CveIds.Count > 0) + { + metadata["debian.cves"] = string.Join(' ', entry.CveIds); + } + + return metadata; + } + + private static DebianDetailMetadata? 
ExtractMetadata(DocumentRecord document) + { + if (document.Metadata is null) + { + return null; + } + + if (!document.Metadata.TryGetValue("debian.id", out var id) || string.IsNullOrWhiteSpace(id)) + { + return null; + } + + if (!document.Metadata.TryGetValue("debian.published", out var publishedRaw) + || !DateTimeOffset.TryParse(publishedRaw, CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal, out var published)) + { + published = document.FetchedAt; + } + + var title = document.Metadata.TryGetValue("debian.title", out var t) ? t : id; + var package = document.Metadata.TryGetValue("debian.package", out var pkg) && !string.IsNullOrWhiteSpace(pkg) + ? pkg + : id; + + IReadOnlyList cveList = Array.Empty(); + if (document.Metadata.TryGetValue("debian.cves", out var cvesRaw) && !string.IsNullOrWhiteSpace(cvesRaw)) + { + cveList = cvesRaw + .Split(' ', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries) + .Where(static s => !string.IsNullOrWhiteSpace(s)) + .Select(static s => s!) 
+ .Distinct(StringComparer.OrdinalIgnoreCase) + .ToArray(); + } + + return new DebianDetailMetadata( + id.Trim(), + new Uri(document.Uri, UriKind.Absolute), + published.ToUniversalTime(), + title, + package, + cveList); + } + + private static BsonDocument ToBson(DebianAdvisoryDto dto) + { + var packages = new BsonArray(); + foreach (var package in dto.Packages) + { + var packageDoc = new BsonDocument + { + ["package"] = package.Package, + ["release"] = package.Release, + ["status"] = package.Status, + }; + + if (!string.IsNullOrWhiteSpace(package.IntroducedVersion)) + { + packageDoc["introduced"] = package.IntroducedVersion; + } + + if (!string.IsNullOrWhiteSpace(package.FixedVersion)) + { + packageDoc["fixed"] = package.FixedVersion; + } + + if (!string.IsNullOrWhiteSpace(package.LastAffectedVersion)) + { + packageDoc["last"] = package.LastAffectedVersion; + } + + if (package.Published.HasValue) + { + packageDoc["published"] = package.Published.Value.UtcDateTime; + } + + packages.Add(packageDoc); + } + + var references = new BsonArray(dto.References.Select(reference => + { + var doc = new BsonDocument + { + ["url"] = reference.Url + }; + + if (!string.IsNullOrWhiteSpace(reference.Kind)) + { + doc["kind"] = reference.Kind; + } + + if (!string.IsNullOrWhiteSpace(reference.Title)) + { + doc["title"] = reference.Title; + } + + return doc; + })); + + return new BsonDocument + { + ["advisoryId"] = dto.AdvisoryId, + ["sourcePackage"] = dto.SourcePackage, + ["title"] = dto.Title, + ["description"] = dto.Description ?? 
string.Empty, + ["cves"] = new BsonArray(dto.CveIds), + ["packages"] = packages, + ["references"] = references, + }; + } + + private static DebianAdvisoryDto FromBson(BsonDocument document) + { + var advisoryId = document.GetValue("advisoryId", "").AsString; + var sourcePackage = document.GetValue("sourcePackage", advisoryId).AsString; + var title = document.GetValue("title", advisoryId).AsString; + var description = document.TryGetValue("description", out var desc) ? desc.AsString : null; + + var cves = document.TryGetValue("cves", out var cveArray) && cveArray is BsonArray cvesBson + ? cvesBson.OfType() + .Select(static value => value.ToString()) + .Where(static s => !string.IsNullOrWhiteSpace(s)) + .Select(static s => s!) + .ToArray() + : Array.Empty(); + + var packages = new List(); + if (document.TryGetValue("packages", out var packageArray) && packageArray is BsonArray packagesBson) + { + foreach (var element in packagesBson.OfType()) + { + packages.Add(new DebianPackageStateDto( + element.GetValue("package", sourcePackage).AsString, + element.GetValue("release", string.Empty).AsString, + element.GetValue("status", "unknown").AsString, + element.TryGetValue("introduced", out var introducedValue) ? introducedValue.AsString : null, + element.TryGetValue("fixed", out var fixedValue) ? fixedValue.AsString : null, + element.TryGetValue("last", out var lastValue) ? lastValue.AsString : null, + element.TryGetValue("published", out var publishedValue) + ? 
publishedValue.BsonType switch + { + BsonType.DateTime => DateTime.SpecifyKind(publishedValue.ToUniversalTime(), DateTimeKind.Utc), + BsonType.String when DateTimeOffset.TryParse(publishedValue.AsString, out var parsed) => parsed.ToUniversalTime(), + _ => (DateTimeOffset?)null, + } + : null)); + } + } + + var references = new List(); + if (document.TryGetValue("references", out var referenceArray) && referenceArray is BsonArray refBson) + { + foreach (var element in refBson.OfType()) + { + references.Add(new DebianReferenceDto( + element.GetValue("url", "").AsString, + element.TryGetValue("kind", out var kind) ? kind.AsString : null, + element.TryGetValue("title", out var titleValue) ? titleValue.AsString : null)); + } + } + + return new DebianAdvisoryDto( + advisoryId, + sourcePackage, + title, + description, + cves, + packages, + references); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.RedHat/RedHatConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.RedHat/RedHatConnector.cs index 5efab6beb..1b7944f7c 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.RedHat/RedHatConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.RedHat/RedHatConnector.cs @@ -1,434 +1,434 @@ -using System; -using System.Collections.Generic; -using System.Globalization; -using System.Linq; -using System.Text.Json; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using MongoDB.Bson.IO; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Distro.RedHat.Configuration; -using StellaOps.Concelier.Connector.Distro.RedHat.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; 
-using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Distro.RedHat; - -public sealed class RedHatConnector : IFeedConnector -{ - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly ILogger _logger; - private readonly RedHatOptions _options; - private readonly TimeProvider _timeProvider; - - public RedHatConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, - TimeProvider? timeProvider, - ILogger logger) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => RedHatConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - - var baseline = cursor.LastReleasedOn ?? now - _options.InitialBackfill; - var overlap = _options.Overlap > TimeSpan.Zero ? _options.Overlap : TimeSpan.Zero; - var afterThreshold = baseline - overlap; - if (afterThreshold < DateTimeOffset.UnixEpoch) - { - afterThreshold = DateTimeOffset.UnixEpoch; - } - - ProvenanceDiagnostics.ReportResumeWindow(SourceName, afterThreshold, _logger); - - var processedSet = new HashSet(cursor.ProcessedAdvisoryIds, StringComparer.OrdinalIgnoreCase); - var newSummaries = new List(); - var stopDueToOlderData = false; - var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); - - for (var page = 1; page <= _options.MaxPagesPerFetch; page++) - { - var summaryUri = BuildSummaryUri(afterThreshold, page); - var summaryKey = summaryUri.ToString(); - touchedResources.Add(summaryKey); - - var cachedSummary = cursor.TryGetFetchCache(summaryKey); - var summaryMetadata = new Dictionary(StringComparer.Ordinal) - { - ["page"] = page.ToString(CultureInfo.InvariantCulture), - ["type"] = "summary" - }; - - var summaryRequest = new SourceFetchRequest(RedHatOptions.HttpClientName, SourceName, summaryUri) - { - Metadata = summaryMetadata, - ETag = cachedSummary?.ETag, - LastModified = cachedSummary?.LastModified, - TimeoutOverride = _options.FetchTimeout, - }; - - SourceFetchContentResult summaryResult; - try - { - summaryResult = await _fetchService.FetchContentAsync(summaryRequest, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Red Hat Hydra summary fetch failed for {Uri}", summaryUri); - throw; - } - - if 
(summaryResult.IsNotModified) - { - if (page == 1) - { - break; - } - - continue; - } - - if (!summaryResult.IsSuccess || summaryResult.Content is null) - { - continue; - } - - cursor = cursor.WithFetchCache(summaryKey, summaryResult.ETag, summaryResult.LastModified); - - using var document = JsonDocument.Parse(summaryResult.Content); - - if (document.RootElement.ValueKind != JsonValueKind.Array) - { - _logger.LogWarning( - "Red Hat Hydra summary response had unexpected payload kind {Kind} for {Uri}", - document.RootElement.ValueKind, - summaryUri); - break; - } - - var pageCount = 0; - foreach (var element in document.RootElement.EnumerateArray()) - { - if (!RedHatSummaryItem.TryParse(element, out var summary)) - { - continue; - } - - pageCount++; - - if (cursor.LastReleasedOn.HasValue) - { - if (summary.ReleasedOn < cursor.LastReleasedOn.Value - overlap) - { - stopDueToOlderData = true; - break; - } - - if (summary.ReleasedOn < cursor.LastReleasedOn.Value) - { - stopDueToOlderData = true; - break; - } - - if (summary.ReleasedOn == cursor.LastReleasedOn.Value && processedSet.Contains(summary.AdvisoryId)) - { - continue; - } - } - - newSummaries.Add(summary); - processedSet.Add(summary.AdvisoryId); - - if (newSummaries.Count >= _options.MaxAdvisoriesPerFetch) - { - break; - } - } - - if (newSummaries.Count >= _options.MaxAdvisoriesPerFetch || stopDueToOlderData) - { - break; - } - - if (pageCount < _options.PageSize) - { - break; - } - } - - if (newSummaries.Count == 0) - { - return; - } - - newSummaries.Sort(static (left, right) => - { - var compare = left.ReleasedOn.CompareTo(right.ReleasedOn); - return compare != 0 - ? 
compare - : string.CompareOrdinal(left.AdvisoryId, right.AdvisoryId); - }); - - var pendingDocuments = new HashSet(cursor.PendingDocuments); - - foreach (var summary in newSummaries) - { - var resourceUri = summary.ResourceUri; - var resourceKey = resourceUri.ToString(); - touchedResources.Add(resourceKey); - - var cached = cursor.TryGetFetchCache(resourceKey); - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["advisoryId"] = summary.AdvisoryId, - ["releasedOn"] = summary.ReleasedOn.ToString("O", CultureInfo.InvariantCulture) - }; - - var request = new SourceFetchRequest(RedHatOptions.HttpClientName, SourceName, resourceUri) - { - Metadata = metadata, - ETag = cached?.ETag, - LastModified = cached?.LastModified, - TimeoutOverride = _options.FetchTimeout, - }; - - try - { - var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - if (result.IsNotModified) - { - continue; - } - - if (!result.IsSuccess || result.Document is null) - { - continue; - } - - pendingDocuments.Add(result.Document.Id); - cursor = cursor.WithFetchCache(resourceKey, result.Document.Etag, result.Document.LastModified); - } - catch (Exception ex) - { - _logger.LogError(ex, "Red Hat Hydra advisory fetch failed for {Uri}", resourceUri); - throw; - } - } - - var maxRelease = newSummaries.Max(static item => item.ReleasedOn); - var idsForMaxRelease = newSummaries - .Where(item => item.ReleasedOn == maxRelease) - .Select(item => item.AdvisoryId) - .Distinct(StringComparer.OrdinalIgnoreCase) - .ToArray(); - - RedHatCursor updated; - if (cursor.LastReleasedOn.HasValue && maxRelease == cursor.LastReleasedOn.Value) - { - updated = cursor - .WithPendingDocuments(pendingDocuments) - .AddProcessedAdvisories(idsForMaxRelease) - .PruneFetchCache(touchedResources); - } - else - { - updated = cursor - .WithPendingDocuments(pendingDocuments) - .WithLastReleased(maxRelease, idsForMaxRelease) - .PruneFetchCache(touchedResources); - } - - await 
UpdateCursorAsync(updated, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingFetch = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - DocumentRecord? document = null; - - try - { - document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - remainingFetch.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("Red Hat document {DocumentId} missing GridFS content; skipping", document.Id); - remainingFetch.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - var rawBytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - using var jsonDocument = JsonDocument.Parse(rawBytes); - var sanitized = JsonSerializer.Serialize(jsonDocument.RootElement); - var payload = BsonDocument.Parse(sanitized); - - var dtoRecord = new DtoRecord( - Guid.NewGuid(), - document.Id, - SourceName, - "redhat.csaf.v2", - payload, - _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remainingFetch.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - catch (Exception ex) - { - var uri = document?.Uri ?? 
documentId.ToString(); - _logger.LogError(ex, "Red Hat CSAF parse failed for {Uri}", uri); - remainingFetch.Remove(documentId); - pendingMappings.Remove(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(remainingFetch) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - try - { - var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dto is null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - var json = dto.Payload.ToJson(new JsonWriterSettings - { - OutputMode = JsonOutputMode.RelaxedExtendedJson, - }); - - using var jsonDocument = JsonDocument.Parse(json); - var advisory = RedHatMapper.Map(SourceName, dto, document, jsonDocument); - if (advisory is null) - { - pendingMappings.Remove(documentId); - continue; - } - - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - } - catch (Exception ex) - { - _logger.LogError(ex, "Red Hat map failed for document {DocumentId}", documentId); - } - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task 
GetCursorAsync(CancellationToken cancellationToken) - { - var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return RedHatCursor.FromBsonDocument(record?.Cursor); - } - - private async Task UpdateCursorAsync(RedHatCursor cursor, CancellationToken cancellationToken) - { - var completedAt = _timeProvider.GetUtcNow(); - await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false); - } - - private Uri BuildSummaryUri(DateTimeOffset after, int page) - { - var builder = new UriBuilder(_options.BaseEndpoint); - var basePath = builder.Path?.TrimEnd('/') ?? string.Empty; - var summaryPath = _options.SummaryPath.TrimStart('/'); - builder.Path = string.IsNullOrEmpty(basePath) - ? $"/{summaryPath}" - : $"{basePath}/{summaryPath}"; - - var parameters = new Dictionary(StringComparer.Ordinal) - { - ["after"] = after.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture), - ["per_page"] = _options.PageSize.ToString(CultureInfo.InvariantCulture), - ["page"] = page.ToString(CultureInfo.InvariantCulture) - }; - - builder.Query = string.Join('&', parameters.Select(static kvp => - $"{Uri.EscapeDataString(kvp.Key)}={Uri.EscapeDataString(kvp.Value)}")); - return builder.Uri; - } -} +using System; +using System.Collections.Generic; +using System.Globalization; +using System.Linq; +using System.Text.Json; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using MongoDB.Bson.IO; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Distro.RedHat.Configuration; +using StellaOps.Concelier.Connector.Distro.RedHat.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; 
+using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Distro.RedHat; + +public sealed class RedHatConnector : IFeedConnector +{ + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly ILogger _logger; + private readonly RedHatOptions _options; + private readonly TimeProvider _timeProvider; + + public RedHatConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions options, + TimeProvider? timeProvider, + ILogger logger) + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => RedHatConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + + var baseline = cursor.LastReleasedOn ?? now - _options.InitialBackfill; + var overlap = _options.Overlap > TimeSpan.Zero ? _options.Overlap : TimeSpan.Zero; + var afterThreshold = baseline - overlap; + if (afterThreshold < DateTimeOffset.UnixEpoch) + { + afterThreshold = DateTimeOffset.UnixEpoch; + } + + ProvenanceDiagnostics.ReportResumeWindow(SourceName, afterThreshold, _logger); + + var processedSet = new HashSet(cursor.ProcessedAdvisoryIds, StringComparer.OrdinalIgnoreCase); + var newSummaries = new List(); + var stopDueToOlderData = false; + var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); + + for (var page = 1; page <= _options.MaxPagesPerFetch; page++) + { + var summaryUri = BuildSummaryUri(afterThreshold, page); + var summaryKey = summaryUri.ToString(); + touchedResources.Add(summaryKey); + + var cachedSummary = cursor.TryGetFetchCache(summaryKey); + var summaryMetadata = new Dictionary(StringComparer.Ordinal) + { + ["page"] = page.ToString(CultureInfo.InvariantCulture), + ["type"] = "summary" + }; + + var summaryRequest = new SourceFetchRequest(RedHatOptions.HttpClientName, SourceName, summaryUri) + { + Metadata = summaryMetadata, + ETag = cachedSummary?.ETag, + LastModified = cachedSummary?.LastModified, + TimeoutOverride = _options.FetchTimeout, + }; + + SourceFetchContentResult summaryResult; + try + { + summaryResult = await _fetchService.FetchContentAsync(summaryRequest, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Red Hat Hydra summary fetch failed for {Uri}", summaryUri); + throw; + } + + if 
(summaryResult.IsNotModified) + { + if (page == 1) + { + break; + } + + continue; + } + + if (!summaryResult.IsSuccess || summaryResult.Content is null) + { + continue; + } + + cursor = cursor.WithFetchCache(summaryKey, summaryResult.ETag, summaryResult.LastModified); + + using var document = JsonDocument.Parse(summaryResult.Content); + + if (document.RootElement.ValueKind != JsonValueKind.Array) + { + _logger.LogWarning( + "Red Hat Hydra summary response had unexpected payload kind {Kind} for {Uri}", + document.RootElement.ValueKind, + summaryUri); + break; + } + + var pageCount = 0; + foreach (var element in document.RootElement.EnumerateArray()) + { + if (!RedHatSummaryItem.TryParse(element, out var summary)) + { + continue; + } + + pageCount++; + + if (cursor.LastReleasedOn.HasValue) + { + if (summary.ReleasedOn < cursor.LastReleasedOn.Value - overlap) + { + stopDueToOlderData = true; + break; + } + + if (summary.ReleasedOn < cursor.LastReleasedOn.Value) + { + stopDueToOlderData = true; + break; + } + + if (summary.ReleasedOn == cursor.LastReleasedOn.Value && processedSet.Contains(summary.AdvisoryId)) + { + continue; + } + } + + newSummaries.Add(summary); + processedSet.Add(summary.AdvisoryId); + + if (newSummaries.Count >= _options.MaxAdvisoriesPerFetch) + { + break; + } + } + + if (newSummaries.Count >= _options.MaxAdvisoriesPerFetch || stopDueToOlderData) + { + break; + } + + if (pageCount < _options.PageSize) + { + break; + } + } + + if (newSummaries.Count == 0) + { + return; + } + + newSummaries.Sort(static (left, right) => + { + var compare = left.ReleasedOn.CompareTo(right.ReleasedOn); + return compare != 0 + ? 
compare + : string.CompareOrdinal(left.AdvisoryId, right.AdvisoryId); + }); + + var pendingDocuments = new HashSet(cursor.PendingDocuments); + + foreach (var summary in newSummaries) + { + var resourceUri = summary.ResourceUri; + var resourceKey = resourceUri.ToString(); + touchedResources.Add(resourceKey); + + var cached = cursor.TryGetFetchCache(resourceKey); + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["advisoryId"] = summary.AdvisoryId, + ["releasedOn"] = summary.ReleasedOn.ToString("O", CultureInfo.InvariantCulture) + }; + + var request = new SourceFetchRequest(RedHatOptions.HttpClientName, SourceName, resourceUri) + { + Metadata = metadata, + ETag = cached?.ETag, + LastModified = cached?.LastModified, + TimeoutOverride = _options.FetchTimeout, + }; + + try + { + var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + if (result.IsNotModified) + { + continue; + } + + if (!result.IsSuccess || result.Document is null) + { + continue; + } + + pendingDocuments.Add(result.Document.Id); + cursor = cursor.WithFetchCache(resourceKey, result.Document.Etag, result.Document.LastModified); + } + catch (Exception ex) + { + _logger.LogError(ex, "Red Hat Hydra advisory fetch failed for {Uri}", resourceUri); + throw; + } + } + + var maxRelease = newSummaries.Max(static item => item.ReleasedOn); + var idsForMaxRelease = newSummaries + .Where(item => item.ReleasedOn == maxRelease) + .Select(item => item.AdvisoryId) + .Distinct(StringComparer.OrdinalIgnoreCase) + .ToArray(); + + RedHatCursor updated; + if (cursor.LastReleasedOn.HasValue && maxRelease == cursor.LastReleasedOn.Value) + { + updated = cursor + .WithPendingDocuments(pendingDocuments) + .AddProcessedAdvisories(idsForMaxRelease) + .PruneFetchCache(touchedResources); + } + else + { + updated = cursor + .WithPendingDocuments(pendingDocuments) + .WithLastReleased(maxRelease, idsForMaxRelease) + .PruneFetchCache(touchedResources); + } + + await 
UpdateCursorAsync(updated, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingFetch = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + DocumentRecord? document = null; + + try + { + document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remainingFetch.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("Red Hat document {DocumentId} missing GridFS content; skipping", document.Id); + remainingFetch.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + var rawBytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + using var jsonDocument = JsonDocument.Parse(rawBytes); + var sanitized = JsonSerializer.Serialize(jsonDocument.RootElement); + var payload = BsonDocument.Parse(sanitized); + + var dtoRecord = new DtoRecord( + Guid.NewGuid(), + document.Id, + SourceName, + "redhat.csaf.v2", + payload, + _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remainingFetch.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + catch (Exception ex) + { + var uri = document?.Uri ?? 
documentId.ToString(); + _logger.LogError(ex, "Red Hat CSAF parse failed for {Uri}", uri); + remainingFetch.Remove(documentId); + pendingMappings.Remove(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(remainingFetch) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + try + { + var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dto is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + var json = dto.Payload.ToJson(new JsonWriterSettings + { + OutputMode = JsonOutputMode.RelaxedExtendedJson, + }); + + using var jsonDocument = JsonDocument.Parse(json); + var advisory = RedHatMapper.Map(SourceName, dto, document, jsonDocument); + if (advisory is null) + { + pendingMappings.Remove(documentId); + continue; + } + + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + } + catch (Exception ex) + { + _logger.LogError(ex, "Red Hat map failed for document {DocumentId}", documentId); + } + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task 
GetCursorAsync(CancellationToken cancellationToken) + { + var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return RedHatCursor.FromBsonDocument(record?.Cursor); + } + + private async Task UpdateCursorAsync(RedHatCursor cursor, CancellationToken cancellationToken) + { + var completedAt = _timeProvider.GetUtcNow(); + await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false); + } + + private Uri BuildSummaryUri(DateTimeOffset after, int page) + { + var builder = new UriBuilder(_options.BaseEndpoint); + var basePath = builder.Path?.TrimEnd('/') ?? string.Empty; + var summaryPath = _options.SummaryPath.TrimStart('/'); + builder.Path = string.IsNullOrEmpty(basePath) + ? $"/{summaryPath}" + : $"{basePath}/{summaryPath}"; + + var parameters = new Dictionary(StringComparer.Ordinal) + { + ["after"] = after.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture), + ["per_page"] = _options.PageSize.ToString(CultureInfo.InvariantCulture), + ["page"] = page.ToString(CultureInfo.InvariantCulture) + }; + + builder.Query = string.Join('&', parameters.Select(static kvp => + $"{Uri.EscapeDataString(kvp.Key)}={Uri.EscapeDataString(kvp.Value)}")); + return builder.Uri; + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Suse/SuseConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Suse/SuseConnector.cs index 96f947d4e..653896439 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Suse/SuseConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Suse/SuseConnector.cs @@ -1,573 +1,573 @@ -using System; -using System.Collections.Generic; -using System.Globalization; -using System.IO; -using System.Linq; -using System.Text; -using System.Text.Json; -using System.Threading; -using System.Threading.Tasks; -using Microsoft.Extensions.Logging; -using 
Microsoft.Extensions.Options; -using MongoDB.Bson; -using MongoDB.Bson.IO; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Distro.Suse.Configuration; -using StellaOps.Concelier.Connector.Distro.Suse.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Distro.Suse; - -public sealed class SuseConnector : IFeedConnector -{ - private static readonly Action LogMapped = - LoggerMessage.Define( - LogLevel.Information, - new EventId(1, "SuseMapped"), - "SUSE advisory {AdvisoryId} mapped with {AffectedCount} affected packages"); - - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly SuseOptions _options; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - - public SuseConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, - TimeProvider? timeProvider, - ILogger logger) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? 
throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => SuseConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - - var pendingDocuments = new HashSet(cursor.PendingDocuments); - var pendingMappings = new HashSet(cursor.PendingMappings); - var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.OrdinalIgnoreCase); - var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); - - var changesUri = _options.ChangesEndpoint; - var changesKey = changesUri.ToString(); - touchedResources.Add(changesKey); - - cursor.TryGetCache(changesKey, out var cachedChanges); - - var changesRequest = new SourceFetchRequest(SuseOptions.HttpClientName, SourceName, changesUri) - { - Metadata = new Dictionary(StringComparer.Ordinal) - { - ["suse.type"] = "changes" - }, - AcceptHeaders = new[] { "text/csv", "text/plain" }, - TimeoutOverride = _options.FetchTimeout, - ETag = cachedChanges?.ETag, - LastModified = cachedChanges?.LastModified, - }; - - SourceFetchResult changesResult; - try - { - changesResult = await _fetchService.FetchAsync(changesRequest, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "SUSE changes.csv fetch failed from {Uri}", changesUri); - await 
_stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - var maxModified = cursor.LastModified ?? DateTimeOffset.MinValue; - var processedUpdated = false; - var processedIds = new HashSet(cursor.ProcessedIds, StringComparer.OrdinalIgnoreCase); - var currentWindowIds = new HashSet(StringComparer.OrdinalIgnoreCase); - - IReadOnlyList changeRecords = Array.Empty(); - if (changesResult.IsNotModified) - { - if (cursor.FetchCache.TryGetValue(changesKey, out var existingCache)) - { - fetchCache[changesKey] = existingCache; - } - } - else if (changesResult.IsSuccess && changesResult.Document is not null) - { - fetchCache[changesKey] = SuseFetchCacheEntry.FromDocument(changesResult.Document); - if (changesResult.Document.GridFsId.HasValue) - { - byte[] changesBytes; - try - { - changesBytes = await _rawDocumentStorage.DownloadAsync(changesResult.Document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to download SUSE changes.csv document {DocumentId}", changesResult.Document.Id); - throw; - } - - var csv = Encoding.UTF8.GetString(changesBytes); - changeRecords = SuseChangesParser.Parse(csv); - } - } - - if (changeRecords.Count > 0) - { - var baseline = (cursor.LastModified ?? 
(now - _options.InitialBackfill)) - _options.ResumeOverlap; - if (baseline < DateTimeOffset.UnixEpoch) - { - baseline = DateTimeOffset.UnixEpoch; - } - - ProvenanceDiagnostics.ReportResumeWindow(SourceName, baseline, _logger); - - var candidates = changeRecords - .Where(record => record.ModifiedAt >= baseline) - .OrderBy(record => record.ModifiedAt) - .ThenBy(record => record.FileName, StringComparer.OrdinalIgnoreCase) - .ToList(); - - if (candidates.Count == 0) - { - candidates = changeRecords - .OrderByDescending(record => record.ModifiedAt) - .ThenBy(record => record.FileName, StringComparer.OrdinalIgnoreCase) - .Take(_options.MaxAdvisoriesPerFetch) - .OrderBy(record => record.ModifiedAt) - .ThenBy(record => record.FileName, StringComparer.OrdinalIgnoreCase) - .ToList(); - } - else if (candidates.Count > _options.MaxAdvisoriesPerFetch) - { - candidates = candidates - .OrderByDescending(record => record.ModifiedAt) - .ThenBy(record => record.FileName, StringComparer.OrdinalIgnoreCase) - .Take(_options.MaxAdvisoriesPerFetch) - .OrderBy(record => record.ModifiedAt) - .ThenBy(record => record.FileName, StringComparer.OrdinalIgnoreCase) - .ToList(); - } - - foreach (var record in candidates) - { - cancellationToken.ThrowIfCancellationRequested(); - - var detailUri = new Uri(_options.AdvisoryBaseUri, record.FileName); - var cacheKey = detailUri.AbsoluteUri; - touchedResources.Add(cacheKey); - - cursor.TryGetCache(cacheKey, out var cachedEntry); - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, cacheKey, cancellationToken).ConfigureAwait(false); - - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["suse.file"] = record.FileName, - ["suse.modified"] = record.ModifiedAt.ToString("O", CultureInfo.InvariantCulture) - }; - - if (!metadata.ContainsKey("suse.id") && existing?.Metadata?.TryGetValue("suse.id", out var existingId) == true) - { - metadata["suse.id"] = existingId; - } - - var request = new 
SourceFetchRequest(SuseOptions.HttpClientName, SourceName, detailUri) - { - Metadata = metadata, - AcceptHeaders = new[] { "application/json", "text/json" }, - TimeoutOverride = _options.FetchTimeout, - ETag = existing?.Etag ?? cachedEntry?.ETag, - LastModified = existing?.LastModified ?? cachedEntry?.LastModified, - }; - - SourceFetchResult result; - try - { - result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to fetch SUSE advisory {FileName}", record.FileName); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (result.IsNotModified) - { - if (existing is not null) - { - fetchCache[cacheKey] = SuseFetchCacheEntry.FromDocument(existing); - if (string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) - { - pendingDocuments.Remove(existing.Id); - pendingMappings.Remove(existing.Id); - } - } - - continue; - } - - if (!result.IsSuccess || result.Document is null) - { - continue; - } - - fetchCache[cacheKey] = SuseFetchCacheEntry.FromDocument(result.Document); - pendingDocuments.Add(result.Document.Id); - pendingMappings.Remove(result.Document.Id); - currentWindowIds.Add(record.FileName); - - if (record.ModifiedAt > maxModified) - { - maxModified = record.ModifiedAt; - processedUpdated = true; - } - } - } - - if (fetchCache.Count > 0 && touchedResources.Count > 0) - { - var staleKeys = fetchCache.Keys.Where(key => !touchedResources.Contains(key)).ToArray(); - foreach (var key in staleKeys) - { - fetchCache.Remove(key); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithFetchCache(fetchCache); - - if (processedUpdated && currentWindowIds.Count > 0) - { - updatedCursor = updatedCursor.WithProcessed(maxModified, currentWindowIds); - } - - await 
UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remaining = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - remaining.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("SUSE document {DocumentId} missing GridFS payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remaining.Remove(documentId); - continue; - } - - byte[] bytes; - try - { - bytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to download SUSE document {DocumentId}", document.Id); - throw; - } - - SuseAdvisoryDto dto; - try - { - var json = Encoding.UTF8.GetString(bytes); - dto = SuseCsafParser.Parse(json); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "Failed to parse SUSE advisory {Uri}", document.Uri); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remaining.Remove(documentId); - continue; - } - - var metadata = document.Metadata is null - ? 
new Dictionary(StringComparer.Ordinal) - : new Dictionary(document.Metadata, StringComparer.Ordinal); - - metadata["suse.id"] = dto.AdvisoryId; - var updatedDocument = document with { Metadata = metadata }; - await _documentStore.UpsertAsync(updatedDocument, cancellationToken).ConfigureAwait(false); - - var payload = ToBson(dto); - var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "suse.csaf.v1", payload, _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remaining.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(remaining) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (dtoRecord is null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - SuseAdvisoryDto dto; - try - { - dto = FromBson(dtoRecord.Payload); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to deserialize SUSE DTO for document {DocumentId}", documentId); - await 
_documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var advisory = SuseMapper.Map(dto, document, _timeProvider.GetUtcNow()); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - - LogMapped(_logger, dto.AdvisoryId, advisory.AffectedPackages.Length, null); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? SuseCursor.Empty : SuseCursor.FromBson(state.Cursor); - } - - private async Task UpdateCursorAsync(SuseCursor cursor, CancellationToken cancellationToken) - { - var document = cursor.ToBsonDocument(); - await _stateRepository.UpdateCursorAsync(SourceName, document, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); - } - - private static BsonDocument ToBson(SuseAdvisoryDto dto) - { - var packages = new BsonArray(); - foreach (var package in dto.Packages) - { - var packageDoc = new BsonDocument - { - ["package"] = package.Package, - ["platform"] = package.Platform, - ["canonical"] = package.CanonicalNevra, - ["status"] = package.Status - }; - - if (!string.IsNullOrWhiteSpace(package.Architecture)) - { - packageDoc["arch"] = package.Architecture; - } - - if (!string.IsNullOrWhiteSpace(package.IntroducedVersion)) - { - packageDoc["introduced"] = package.IntroducedVersion; - } - - if (!string.IsNullOrWhiteSpace(package.FixedVersion)) - { - packageDoc["fixed"] = package.FixedVersion; - } - - if 
(!string.IsNullOrWhiteSpace(package.LastAffectedVersion)) - { - packageDoc["last"] = package.LastAffectedVersion; - } - - packages.Add(packageDoc); - } - - var references = new BsonArray(); - foreach (var reference in dto.References) - { - var referenceDoc = new BsonDocument - { - ["url"] = reference.Url - }; - - if (!string.IsNullOrWhiteSpace(reference.Kind)) - { - referenceDoc["kind"] = reference.Kind; - } - - if (!string.IsNullOrWhiteSpace(reference.Title)) - { - referenceDoc["title"] = reference.Title; - } - - references.Add(referenceDoc); - } - - return new BsonDocument - { - ["advisoryId"] = dto.AdvisoryId, - ["title"] = dto.Title ?? string.Empty, - ["summary"] = dto.Summary ?? string.Empty, - ["published"] = dto.Published.UtcDateTime, - ["cves"] = new BsonArray(dto.CveIds ?? Array.Empty()), - ["packages"] = packages, - ["references"] = references - }; - } - - private static SuseAdvisoryDto FromBson(BsonDocument document) - { - var advisoryId = document.GetValue("advisoryId", string.Empty).AsString; - var title = document.GetValue("title", advisoryId).AsString; - var summary = document.TryGetValue("summary", out var summaryValue) ? summaryValue.AsString : null; - var published = document.TryGetValue("published", out var publishedValue) - ? publishedValue.BsonType switch - { - BsonType.DateTime => DateTime.SpecifyKind(publishedValue.ToUniversalTime(), DateTimeKind.Utc), - BsonType.String when DateTimeOffset.TryParse(publishedValue.AsString, out var parsed) => parsed.ToUniversalTime(), - _ => DateTimeOffset.UtcNow - } - : DateTimeOffset.UtcNow; - - var cves = document.TryGetValue("cves", out var cveArray) && cveArray is BsonArray bsonCves - ? bsonCves.OfType() - .Select(static value => value?.ToString()) - .Where(static value => !string.IsNullOrWhiteSpace(value)) - .Select(static value => value!) 
- .Distinct(StringComparer.OrdinalIgnoreCase) - .ToArray() - : Array.Empty(); - - var packageList = new List(); - if (document.TryGetValue("packages", out var packageArray) && packageArray is BsonArray bsonPackages) - { - foreach (var element in bsonPackages.OfType()) - { - var package = element.GetValue("package", string.Empty).AsString; - var platform = element.GetValue("platform", string.Empty).AsString; - var canonical = element.GetValue("canonical", string.Empty).AsString; - var status = element.GetValue("status", "unknown").AsString; - - var architecture = element.TryGetValue("arch", out var archValue) ? archValue.AsString : null; - var introduced = element.TryGetValue("introduced", out var introducedValue) ? introducedValue.AsString : null; - var fixedVersion = element.TryGetValue("fixed", out var fixedValue) ? fixedValue.AsString : null; - var last = element.TryGetValue("last", out var lastValue) ? lastValue.AsString : null; - - packageList.Add(new SusePackageStateDto( - package, - platform, - architecture, - canonical, - introduced, - fixedVersion, - last, - status)); - } - } - - var referenceList = new List(); - if (document.TryGetValue("references", out var referenceArray) && referenceArray is BsonArray bsonReferences) - { - foreach (var element in bsonReferences.OfType()) - { - var url = element.GetValue("url", string.Empty).AsString; - if (string.IsNullOrWhiteSpace(url)) - { - continue; - } - - referenceList.Add(new SuseReferenceDto( - url, - element.TryGetValue("kind", out var kindValue) ? kindValue.AsString : null, - element.TryGetValue("title", out var titleValue) ? titleValue.AsString : null)); - } - } - - return new SuseAdvisoryDto( - advisoryId, - string.IsNullOrWhiteSpace(title) ? advisoryId : title, - string.IsNullOrWhiteSpace(summary) ? 
null : summary, - published, - cves, - packageList, - referenceList); - } -} +using System; +using System.Collections.Generic; +using System.Globalization; +using System.IO; +using System.Linq; +using System.Text; +using System.Text.Json; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using MongoDB.Bson.IO; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Distro.Suse.Configuration; +using StellaOps.Concelier.Connector.Distro.Suse.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Distro.Suse; + +public sealed class SuseConnector : IFeedConnector +{ + private static readonly Action LogMapped = + LoggerMessage.Define( + LogLevel.Information, + new EventId(1, "SuseMapped"), + "SUSE advisory {AdvisoryId} mapped with {AffectedCount} affected packages"); + + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly SuseOptions _options; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public SuseConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions options, + TimeProvider? timeProvider, + ILogger logger) + { + _fetchService = fetchService ?? 
throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => SuseConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + + var pendingDocuments = new HashSet(cursor.PendingDocuments); + var pendingMappings = new HashSet(cursor.PendingMappings); + var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.OrdinalIgnoreCase); + var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); + + var changesUri = _options.ChangesEndpoint; + var changesKey = changesUri.ToString(); + touchedResources.Add(changesKey); + + cursor.TryGetCache(changesKey, out var cachedChanges); + + var changesRequest = new SourceFetchRequest(SuseOptions.HttpClientName, SourceName, changesUri) + { + Metadata = new Dictionary(StringComparer.Ordinal) + { + ["suse.type"] = "changes" + }, + AcceptHeaders = new[] { "text/csv", "text/plain" }, + TimeoutOverride = _options.FetchTimeout, + ETag = cachedChanges?.ETag, + LastModified = cachedChanges?.LastModified, + }; + + 
SourceFetchResult changesResult; + try + { + changesResult = await _fetchService.FetchAsync(changesRequest, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "SUSE changes.csv fetch failed from {Uri}", changesUri); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + var maxModified = cursor.LastModified ?? DateTimeOffset.MinValue; + var processedUpdated = false; + var processedIds = new HashSet(cursor.ProcessedIds, StringComparer.OrdinalIgnoreCase); + var currentWindowIds = new HashSet(StringComparer.OrdinalIgnoreCase); + + IReadOnlyList changeRecords = Array.Empty(); + if (changesResult.IsNotModified) + { + if (cursor.FetchCache.TryGetValue(changesKey, out var existingCache)) + { + fetchCache[changesKey] = existingCache; + } + } + else if (changesResult.IsSuccess && changesResult.Document is not null) + { + fetchCache[changesKey] = SuseFetchCacheEntry.FromDocument(changesResult.Document); + if (changesResult.Document.PayloadId.HasValue) + { + byte[] changesBytes; + try + { + changesBytes = await _rawDocumentStorage.DownloadAsync(changesResult.Document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to download SUSE changes.csv document {DocumentId}", changesResult.Document.Id); + throw; + } + + var csv = Encoding.UTF8.GetString(changesBytes); + changeRecords = SuseChangesParser.Parse(csv); + } + } + + if (changeRecords.Count > 0) + { + var baseline = (cursor.LastModified ?? 
(now - _options.InitialBackfill)) - _options.ResumeOverlap; + if (baseline < DateTimeOffset.UnixEpoch) + { + baseline = DateTimeOffset.UnixEpoch; + } + + ProvenanceDiagnostics.ReportResumeWindow(SourceName, baseline, _logger); + + var candidates = changeRecords + .Where(record => record.ModifiedAt >= baseline) + .OrderBy(record => record.ModifiedAt) + .ThenBy(record => record.FileName, StringComparer.OrdinalIgnoreCase) + .ToList(); + + if (candidates.Count == 0) + { + candidates = changeRecords + .OrderByDescending(record => record.ModifiedAt) + .ThenBy(record => record.FileName, StringComparer.OrdinalIgnoreCase) + .Take(_options.MaxAdvisoriesPerFetch) + .OrderBy(record => record.ModifiedAt) + .ThenBy(record => record.FileName, StringComparer.OrdinalIgnoreCase) + .ToList(); + } + else if (candidates.Count > _options.MaxAdvisoriesPerFetch) + { + candidates = candidates + .OrderByDescending(record => record.ModifiedAt) + .ThenBy(record => record.FileName, StringComparer.OrdinalIgnoreCase) + .Take(_options.MaxAdvisoriesPerFetch) + .OrderBy(record => record.ModifiedAt) + .ThenBy(record => record.FileName, StringComparer.OrdinalIgnoreCase) + .ToList(); + } + + foreach (var record in candidates) + { + cancellationToken.ThrowIfCancellationRequested(); + + var detailUri = new Uri(_options.AdvisoryBaseUri, record.FileName); + var cacheKey = detailUri.AbsoluteUri; + touchedResources.Add(cacheKey); + + cursor.TryGetCache(cacheKey, out var cachedEntry); + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, cacheKey, cancellationToken).ConfigureAwait(false); + + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["suse.file"] = record.FileName, + ["suse.modified"] = record.ModifiedAt.ToString("O", CultureInfo.InvariantCulture) + }; + + if (!metadata.ContainsKey("suse.id") && existing?.Metadata?.TryGetValue("suse.id", out var existingId) == true) + { + metadata["suse.id"] = existingId; + } + + var request = new 
SourceFetchRequest(SuseOptions.HttpClientName, SourceName, detailUri) + { + Metadata = metadata, + AcceptHeaders = new[] { "application/json", "text/json" }, + TimeoutOverride = _options.FetchTimeout, + ETag = existing?.Etag ?? cachedEntry?.ETag, + LastModified = existing?.LastModified ?? cachedEntry?.LastModified, + }; + + SourceFetchResult result; + try + { + result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to fetch SUSE advisory {FileName}", record.FileName); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (result.IsNotModified) + { + if (existing is not null) + { + fetchCache[cacheKey] = SuseFetchCacheEntry.FromDocument(existing); + if (string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) + { + pendingDocuments.Remove(existing.Id); + pendingMappings.Remove(existing.Id); + } + } + + continue; + } + + if (!result.IsSuccess || result.Document is null) + { + continue; + } + + fetchCache[cacheKey] = SuseFetchCacheEntry.FromDocument(result.Document); + pendingDocuments.Add(result.Document.Id); + pendingMappings.Remove(result.Document.Id); + currentWindowIds.Add(record.FileName); + + if (record.ModifiedAt > maxModified) + { + maxModified = record.ModifiedAt; + processedUpdated = true; + } + } + } + + if (fetchCache.Count > 0 && touchedResources.Count > 0) + { + var staleKeys = fetchCache.Keys.Where(key => !touchedResources.Contains(key)).ToArray(); + foreach (var key in staleKeys) + { + fetchCache.Remove(key); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithFetchCache(fetchCache); + + if (processedUpdated && currentWindowIds.Count > 0) + { + updatedCursor = updatedCursor.WithProcessed(maxModified, currentWindowIds); + } + + await 
UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remaining = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remaining.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("SUSE document {DocumentId} missing GridFS payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remaining.Remove(documentId); + continue; + } + + byte[] bytes; + try + { + bytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to download SUSE document {DocumentId}", document.Id); + throw; + } + + SuseAdvisoryDto dto; + try + { + var json = Encoding.UTF8.GetString(bytes); + dto = SuseCsafParser.Parse(json); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to parse SUSE advisory {Uri}", document.Uri); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remaining.Remove(documentId); + continue; + } + + var metadata = document.Metadata is null + ? 
new Dictionary(StringComparer.Ordinal) + : new Dictionary(document.Metadata, StringComparer.Ordinal); + + metadata["suse.id"] = dto.AdvisoryId; + var updatedDocument = document with { Metadata = metadata }; + await _documentStore.UpsertAsync(updatedDocument, cancellationToken).ConfigureAwait(false); + + var payload = ToBson(dto); + var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "suse.csaf.v1", payload, _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remaining.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(remaining) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (dtoRecord is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + SuseAdvisoryDto dto; + try + { + dto = FromBson(dtoRecord.Payload); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deserialize SUSE DTO for document {DocumentId}", documentId); + await 
_documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var advisory = SuseMapper.Map(dto, document, _timeProvider.GetUtcNow()); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + + LogMapped(_logger, dto.AdvisoryId, advisory.AffectedPackages.Length, null); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? SuseCursor.Empty : SuseCursor.FromBson(state.Cursor); + } + + private async Task UpdateCursorAsync(SuseCursor cursor, CancellationToken cancellationToken) + { + var document = cursor.ToBsonDocument(); + await _stateRepository.UpdateCursorAsync(SourceName, document, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); + } + + private static BsonDocument ToBson(SuseAdvisoryDto dto) + { + var packages = new BsonArray(); + foreach (var package in dto.Packages) + { + var packageDoc = new BsonDocument + { + ["package"] = package.Package, + ["platform"] = package.Platform, + ["canonical"] = package.CanonicalNevra, + ["status"] = package.Status + }; + + if (!string.IsNullOrWhiteSpace(package.Architecture)) + { + packageDoc["arch"] = package.Architecture; + } + + if (!string.IsNullOrWhiteSpace(package.IntroducedVersion)) + { + packageDoc["introduced"] = package.IntroducedVersion; + } + + if (!string.IsNullOrWhiteSpace(package.FixedVersion)) + { + packageDoc["fixed"] = package.FixedVersion; + } + + if 
(!string.IsNullOrWhiteSpace(package.LastAffectedVersion)) + { + packageDoc["last"] = package.LastAffectedVersion; + } + + packages.Add(packageDoc); + } + + var references = new BsonArray(); + foreach (var reference in dto.References) + { + var referenceDoc = new BsonDocument + { + ["url"] = reference.Url + }; + + if (!string.IsNullOrWhiteSpace(reference.Kind)) + { + referenceDoc["kind"] = reference.Kind; + } + + if (!string.IsNullOrWhiteSpace(reference.Title)) + { + referenceDoc["title"] = reference.Title; + } + + references.Add(referenceDoc); + } + + return new BsonDocument + { + ["advisoryId"] = dto.AdvisoryId, + ["title"] = dto.Title ?? string.Empty, + ["summary"] = dto.Summary ?? string.Empty, + ["published"] = dto.Published.UtcDateTime, + ["cves"] = new BsonArray(dto.CveIds ?? Array.Empty()), + ["packages"] = packages, + ["references"] = references + }; + } + + private static SuseAdvisoryDto FromBson(BsonDocument document) + { + var advisoryId = document.GetValue("advisoryId", string.Empty).AsString; + var title = document.GetValue("title", advisoryId).AsString; + var summary = document.TryGetValue("summary", out var summaryValue) ? summaryValue.AsString : null; + var published = document.TryGetValue("published", out var publishedValue) + ? publishedValue.BsonType switch + { + BsonType.DateTime => DateTime.SpecifyKind(publishedValue.ToUniversalTime(), DateTimeKind.Utc), + BsonType.String when DateTimeOffset.TryParse(publishedValue.AsString, out var parsed) => parsed.ToUniversalTime(), + _ => DateTimeOffset.UtcNow + } + : DateTimeOffset.UtcNow; + + var cves = document.TryGetValue("cves", out var cveArray) && cveArray is BsonArray bsonCves + ? bsonCves.OfType() + .Select(static value => value?.ToString()) + .Where(static value => !string.IsNullOrWhiteSpace(value)) + .Select(static value => value!) 
+ .Distinct(StringComparer.OrdinalIgnoreCase) + .ToArray() + : Array.Empty(); + + var packageList = new List(); + if (document.TryGetValue("packages", out var packageArray) && packageArray is BsonArray bsonPackages) + { + foreach (var element in bsonPackages.OfType()) + { + var package = element.GetValue("package", string.Empty).AsString; + var platform = element.GetValue("platform", string.Empty).AsString; + var canonical = element.GetValue("canonical", string.Empty).AsString; + var status = element.GetValue("status", "unknown").AsString; + + var architecture = element.TryGetValue("arch", out var archValue) ? archValue.AsString : null; + var introduced = element.TryGetValue("introduced", out var introducedValue) ? introducedValue.AsString : null; + var fixedVersion = element.TryGetValue("fixed", out var fixedValue) ? fixedValue.AsString : null; + var last = element.TryGetValue("last", out var lastValue) ? lastValue.AsString : null; + + packageList.Add(new SusePackageStateDto( + package, + platform, + architecture, + canonical, + introduced, + fixedVersion, + last, + status)); + } + } + + var referenceList = new List(); + if (document.TryGetValue("references", out var referenceArray) && referenceArray is BsonArray bsonReferences) + { + foreach (var element in bsonReferences.OfType()) + { + var url = element.GetValue("url", string.Empty).AsString; + if (string.IsNullOrWhiteSpace(url)) + { + continue; + } + + referenceList.Add(new SuseReferenceDto( + url, + element.TryGetValue("kind", out var kindValue) ? kindValue.AsString : null, + element.TryGetValue("title", out var titleValue) ? titleValue.AsString : null)); + } + } + + return new SuseAdvisoryDto( + advisoryId, + string.IsNullOrWhiteSpace(title) ? advisoryId : title, + string.IsNullOrWhiteSpace(summary) ? 
null : summary, + published, + cves, + packageList, + referenceList); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Ubuntu/UbuntuConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Ubuntu/UbuntuConnector.cs index b5718c944..6579471c4 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Ubuntu/UbuntuConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Distro.Ubuntu/UbuntuConnector.cs @@ -1,6 +1,6 @@ -using System; -using System.Collections.Generic; -using System.Linq; +using System; +using System.Collections.Generic; +using System.Linq; using System.Globalization; using System.Text; using Microsoft.Extensions.Logging; @@ -17,524 +17,524 @@ using StellaOps.Concelier.Storage.Mongo.Documents; using StellaOps.Concelier.Storage.Mongo.Dtos; using StellaOps.Plugin; using StellaOps.Cryptography; - -namespace StellaOps.Concelier.Connector.Distro.Ubuntu; - -public sealed class UbuntuConnector : IFeedConnector -{ - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly UbuntuOptions _options; + +namespace StellaOps.Concelier.Connector.Distro.Ubuntu; + +public sealed class UbuntuConnector : IFeedConnector +{ + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly UbuntuOptions _options; private readonly TimeProvider _timeProvider; private readonly ILogger _logger; private readonly ICryptoHash _hash; - - private static readonly Action 
LogMapped = - LoggerMessage.Define( - LogLevel.Information, - new EventId(1, "UbuntuMapped"), - "Ubuntu notice {NoticeId} mapped with {PackageCount} packages"); - - public UbuntuConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, + + private static readonly Action LogMapped = + LoggerMessage.Define( + LogLevel.Information, + new EventId(1, "UbuntuMapped"), + "Ubuntu notice {NoticeId} mapped with {PackageCount} packages"); + + public UbuntuConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, IOptions options, TimeProvider? timeProvider, ILogger logger, ICryptoHash cryptoHash) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? 
throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); _options.Validate(); _timeProvider = timeProvider ?? TimeProvider.System; _logger = logger ?? throw new ArgumentNullException(nameof(logger)); _hash = cryptoHash ?? throw new ArgumentNullException(nameof(cryptoHash)); - } - - public string SourceName => UbuntuConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - - var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.OrdinalIgnoreCase); - var pendingMappings = new HashSet(cursor.PendingMappings); - var processedIds = new HashSet(cursor.ProcessedNoticeIds, StringComparer.OrdinalIgnoreCase); - - var indexResult = await FetchIndexAsync(cursor, fetchCache, now, cancellationToken).ConfigureAwait(false); - - if (indexResult.IsUnchanged) - { - await UpdateCursorAsync(cursor.WithFetchCache(fetchCache), cancellationToken).ConfigureAwait(false); - return; - } - - if (indexResult.Notices.Count == 0) - { - await UpdateCursorAsync(cursor.WithFetchCache(fetchCache), cancellationToken).ConfigureAwait(false); - return; - } - - var notices = indexResult.Notices; - - var baseline = (cursor.LastPublished ?? 
(now - _options.InitialBackfill)) - _options.ResumeOverlap; - if (baseline < DateTimeOffset.UnixEpoch) - { - baseline = DateTimeOffset.UnixEpoch; - } - - ProvenanceDiagnostics.ReportResumeWindow(SourceName, baseline, _logger); - - var candidates = notices - .Where(notice => notice.Published >= baseline) - .OrderBy(notice => notice.Published) - .ThenBy(notice => notice.NoticeId, StringComparer.OrdinalIgnoreCase) - .ToList(); - - if (candidates.Count == 0) - { - candidates = notices - .OrderByDescending(notice => notice.Published) - .ThenBy(notice => notice.NoticeId, StringComparer.OrdinalIgnoreCase) - .Take(_options.MaxNoticesPerFetch) - .OrderBy(notice => notice.Published) - .ThenBy(notice => notice.NoticeId, StringComparer.OrdinalIgnoreCase) - .ToList(); - } - else if (candidates.Count > _options.MaxNoticesPerFetch) - { - candidates = candidates - .OrderByDescending(notice => notice.Published) - .ThenBy(notice => notice.NoticeId, StringComparer.OrdinalIgnoreCase) - .Take(_options.MaxNoticesPerFetch) - .OrderBy(notice => notice.Published) - .ThenBy(notice => notice.NoticeId, StringComparer.OrdinalIgnoreCase) - .ToList(); - } - - var maxPublished = cursor.LastPublished ?? DateTimeOffset.MinValue; - var processedWindow = new List(candidates.Count); - - foreach (var notice in candidates) - { - cancellationToken.ThrowIfCancellationRequested(); - - var detailUri = new Uri(_options.NoticeDetailBaseUri, notice.NoticeId); - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, detailUri.AbsoluteUri, cancellationToken).ConfigureAwait(false); - - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["ubuntu.id"] = notice.NoticeId, - ["ubuntu.published"] = notice.Published.ToString("O") - }; - - var dtoDocument = ToBson(notice); - var sha256 = ComputeNoticeHash(dtoDocument); - - var documentId = existing?.Id ?? 
Guid.NewGuid(); - var record = new DocumentRecord( - documentId, - SourceName, - detailUri.AbsoluteUri, - now, - sha256, - DocumentStatuses.PendingMap, - "application/json", - Headers: null, - Metadata: metadata, - Etag: existing?.Etag, - LastModified: existing?.LastModified ?? notice.Published, - GridFsId: null); - - await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); - - var dtoRecord = new DtoRecord(Guid.NewGuid(), record.Id, SourceName, "ubuntu.notice.v1", dtoDocument, now); - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - - pendingMappings.Add(record.Id); - processedIds.Add(notice.NoticeId); - processedWindow.Add(notice.NoticeId); - - if (notice.Published > maxPublished) - { - maxPublished = notice.Published; - } - } - - var updatedCursor = cursor - .WithFetchCache(fetchCache) - .WithPendingDocuments(Array.Empty()) - .WithPendingMappings(pendingMappings) - .WithProcessed(maxPublished, processedWindow); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - => Task.CompletedTask; - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pending = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dto is null || document is null) - { - pending.Remove(documentId); - continue; - } - - UbuntuNoticeDto notice; - try - { - notice = 
FromBson(dto.Payload); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to deserialize Ubuntu notice DTO for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pending.Remove(documentId); - continue; - } - - var advisory = UbuntuMapper.Map(notice, document, _timeProvider.GetUtcNow()); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pending.Remove(documentId); - - LogMapped(_logger, notice.NoticeId, advisory.AffectedPackages.Length, null); - } - - var updatedCursor = cursor.WithPendingMappings(pending); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task FetchIndexAsync( - UbuntuCursor cursor, - IDictionary fetchCache, - DateTimeOffset now, - CancellationToken cancellationToken) - { - var pageSize = Math.Clamp(_options.IndexPageSize, 1, UbuntuOptions.MaxPageSize); - var maxNotices = Math.Clamp(_options.MaxNoticesPerFetch, 1, 200); - var maxPages = Math.Max(1, (int)Math.Ceiling(maxNotices / (double)pageSize)); - var aggregated = new List(Math.Min(maxNotices, pageSize * maxPages)); - var seenNoticeIds = new HashSet(StringComparer.OrdinalIgnoreCase); - - var offset = 0; - var totalResults = int.MaxValue; - - for (var pageIndex = 0; pageIndex < maxPages && offset < totalResults; pageIndex++) - { - var pageUri = BuildIndexUri(_options.NoticesEndpoint, offset, pageSize); - var cacheKey = pageUri.ToString(); - - cursor.TryGetCache(cacheKey, out var cachedEntry); - - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["ubuntu.type"] = "index", - ["ubuntu.offset"] = offset.ToString(CultureInfo.InvariantCulture), - ["ubuntu.limit"] = pageSize.ToString(CultureInfo.InvariantCulture) - }; - - var indexRequest = new 
SourceFetchRequest(UbuntuOptions.HttpClientName, SourceName, pageUri) - { - Metadata = metadata, - ETag = cachedEntry?.ETag, - LastModified = cachedEntry?.LastModified, - TimeoutOverride = _options.FetchTimeout, - AcceptHeaders = new[] { "application/json" } - }; - - SourceFetchResult fetchResult; - try - { - fetchResult = await _fetchService.FetchAsync(indexRequest, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Ubuntu notices index fetch failed for {Uri}", pageUri); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - byte[] payload; - - if (fetchResult.IsNotModified) - { - if (pageIndex == 0) - { - if (cursor.FetchCache.TryGetValue(cacheKey, out var existingCache)) - { - fetchCache[cacheKey] = existingCache; - } - - return UbuntuIndexFetchResult.Unchanged(); - } - - if (!cursor.FetchCache.TryGetValue(cacheKey, out var cachedEntryForPage)) - { - break; - } - - fetchCache[cacheKey] = cachedEntryForPage; - - var existingDocument = await _documentStore.FindBySourceAndUriAsync(SourceName, cacheKey, cancellationToken).ConfigureAwait(false); - if (existingDocument is null || !existingDocument.GridFsId.HasValue) - { - break; - } - - payload = await _rawDocumentStorage.DownloadAsync(existingDocument.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - else - { - if (!fetchResult.IsSuccess || fetchResult.Document is null) - { - continue; - } - - fetchCache[cacheKey] = UbuntuFetchCacheEntry.FromDocument(fetchResult.Document); - - if (!fetchResult.Document.GridFsId.HasValue) - { - _logger.LogWarning("Ubuntu index document {DocumentId} missing GridFS payload", fetchResult.Document.Id); - continue; - } - - payload = await _rawDocumentStorage.DownloadAsync(fetchResult.Document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - - var page = UbuntuNoticeParser.ParseIndex(Encoding.UTF8.GetString(payload)); 
- - if (page.TotalResults > 0) - { - totalResults = page.TotalResults; - } - - foreach (var notice in page.Notices) - { - if (!seenNoticeIds.Add(notice.NoticeId)) - { - continue; - } - - aggregated.Add(notice); - if (aggregated.Count >= maxNotices) - { - break; - } - } - - if (aggregated.Count >= maxNotices) - { - break; - } - - if (page.Notices.Count < pageSize) - { - break; - } - - offset += pageSize; - } - - return new UbuntuIndexFetchResult(false, aggregated); - } - - private static Uri BuildIndexUri(Uri endpoint, int offset, int limit) - { - var builder = new UriBuilder(endpoint); - var queryBuilder = new StringBuilder(); - - if (!string.IsNullOrEmpty(builder.Query)) - { - var existing = builder.Query.TrimStart('?'); - if (!string.IsNullOrEmpty(existing)) - { - queryBuilder.Append(existing); - if (existing[^1] != '&') - { - queryBuilder.Append('&'); - } - } - } - - queryBuilder.Append("offset="); - queryBuilder.Append(offset.ToString(CultureInfo.InvariantCulture)); - queryBuilder.Append("&limit="); - queryBuilder.Append(limit.ToString(CultureInfo.InvariantCulture)); - - builder.Query = queryBuilder.ToString(); - return builder.Uri; - } - - private sealed record UbuntuIndexFetchResult(bool IsUnchanged, IReadOnlyList Notices) - { - public static UbuntuIndexFetchResult Unchanged() - => new(true, Array.Empty()); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? 
UbuntuCursor.Empty : UbuntuCursor.FromBson(state.Cursor); - } - - private async Task UpdateCursorAsync(UbuntuCursor cursor, CancellationToken cancellationToken) - { - var doc = cursor.ToBsonDocument(); - await _stateRepository.UpdateCursorAsync(SourceName, doc, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); - } - - private static string ComputeNoticeHash(BsonDocument document) - { - var bytes = document.ToBson(); + } + + public string SourceName => UbuntuConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + + var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.OrdinalIgnoreCase); + var pendingMappings = new HashSet(cursor.PendingMappings); + var processedIds = new HashSet(cursor.ProcessedNoticeIds, StringComparer.OrdinalIgnoreCase); + + var indexResult = await FetchIndexAsync(cursor, fetchCache, now, cancellationToken).ConfigureAwait(false); + + if (indexResult.IsUnchanged) + { + await UpdateCursorAsync(cursor.WithFetchCache(fetchCache), cancellationToken).ConfigureAwait(false); + return; + } + + if (indexResult.Notices.Count == 0) + { + await UpdateCursorAsync(cursor.WithFetchCache(fetchCache), cancellationToken).ConfigureAwait(false); + return; + } + + var notices = indexResult.Notices; + + var baseline = (cursor.LastPublished ?? 
(now - _options.InitialBackfill)) - _options.ResumeOverlap; + if (baseline < DateTimeOffset.UnixEpoch) + { + baseline = DateTimeOffset.UnixEpoch; + } + + ProvenanceDiagnostics.ReportResumeWindow(SourceName, baseline, _logger); + + var candidates = notices + .Where(notice => notice.Published >= baseline) + .OrderBy(notice => notice.Published) + .ThenBy(notice => notice.NoticeId, StringComparer.OrdinalIgnoreCase) + .ToList(); + + if (candidates.Count == 0) + { + candidates = notices + .OrderByDescending(notice => notice.Published) + .ThenBy(notice => notice.NoticeId, StringComparer.OrdinalIgnoreCase) + .Take(_options.MaxNoticesPerFetch) + .OrderBy(notice => notice.Published) + .ThenBy(notice => notice.NoticeId, StringComparer.OrdinalIgnoreCase) + .ToList(); + } + else if (candidates.Count > _options.MaxNoticesPerFetch) + { + candidates = candidates + .OrderByDescending(notice => notice.Published) + .ThenBy(notice => notice.NoticeId, StringComparer.OrdinalIgnoreCase) + .Take(_options.MaxNoticesPerFetch) + .OrderBy(notice => notice.Published) + .ThenBy(notice => notice.NoticeId, StringComparer.OrdinalIgnoreCase) + .ToList(); + } + + var maxPublished = cursor.LastPublished ?? DateTimeOffset.MinValue; + var processedWindow = new List(candidates.Count); + + foreach (var notice in candidates) + { + cancellationToken.ThrowIfCancellationRequested(); + + var detailUri = new Uri(_options.NoticeDetailBaseUri, notice.NoticeId); + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, detailUri.AbsoluteUri, cancellationToken).ConfigureAwait(false); + + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["ubuntu.id"] = notice.NoticeId, + ["ubuntu.published"] = notice.Published.ToString("O") + }; + + var dtoDocument = ToBson(notice); + var sha256 = ComputeNoticeHash(dtoDocument); + + var documentId = existing?.Id ?? 
Guid.NewGuid(); + var record = new DocumentRecord( + documentId, + SourceName, + detailUri.AbsoluteUri, + now, + sha256, + DocumentStatuses.PendingMap, + "application/json", + Headers: null, + Metadata: metadata, + Etag: existing?.Etag, + LastModified: existing?.LastModified ?? notice.Published, + PayloadId: null); + + await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); + + var dtoRecord = new DtoRecord(Guid.NewGuid(), record.Id, SourceName, "ubuntu.notice.v1", dtoDocument, now); + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + + pendingMappings.Add(record.Id); + processedIds.Add(notice.NoticeId); + processedWindow.Add(notice.NoticeId); + + if (notice.Published > maxPublished) + { + maxPublished = notice.Published; + } + } + + var updatedCursor = cursor + .WithFetchCache(fetchCache) + .WithPendingDocuments(Array.Empty()) + .WithPendingMappings(pendingMappings) + .WithProcessed(maxPublished, processedWindow); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + => Task.CompletedTask; + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pending = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dto is null || document is null) + { + pending.Remove(documentId); + continue; + } + + UbuntuNoticeDto notice; + try + { + notice = 
FromBson(dto.Payload); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deserialize Ubuntu notice DTO for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pending.Remove(documentId); + continue; + } + + var advisory = UbuntuMapper.Map(notice, document, _timeProvider.GetUtcNow()); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pending.Remove(documentId); + + LogMapped(_logger, notice.NoticeId, advisory.AffectedPackages.Length, null); + } + + var updatedCursor = cursor.WithPendingMappings(pending); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task FetchIndexAsync( + UbuntuCursor cursor, + IDictionary fetchCache, + DateTimeOffset now, + CancellationToken cancellationToken) + { + var pageSize = Math.Clamp(_options.IndexPageSize, 1, UbuntuOptions.MaxPageSize); + var maxNotices = Math.Clamp(_options.MaxNoticesPerFetch, 1, 200); + var maxPages = Math.Max(1, (int)Math.Ceiling(maxNotices / (double)pageSize)); + var aggregated = new List(Math.Min(maxNotices, pageSize * maxPages)); + var seenNoticeIds = new HashSet(StringComparer.OrdinalIgnoreCase); + + var offset = 0; + var totalResults = int.MaxValue; + + for (var pageIndex = 0; pageIndex < maxPages && offset < totalResults; pageIndex++) + { + var pageUri = BuildIndexUri(_options.NoticesEndpoint, offset, pageSize); + var cacheKey = pageUri.ToString(); + + cursor.TryGetCache(cacheKey, out var cachedEntry); + + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["ubuntu.type"] = "index", + ["ubuntu.offset"] = offset.ToString(CultureInfo.InvariantCulture), + ["ubuntu.limit"] = pageSize.ToString(CultureInfo.InvariantCulture) + }; + + var indexRequest = new 
SourceFetchRequest(UbuntuOptions.HttpClientName, SourceName, pageUri) + { + Metadata = metadata, + ETag = cachedEntry?.ETag, + LastModified = cachedEntry?.LastModified, + TimeoutOverride = _options.FetchTimeout, + AcceptHeaders = new[] { "application/json" } + }; + + SourceFetchResult fetchResult; + try + { + fetchResult = await _fetchService.FetchAsync(indexRequest, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Ubuntu notices index fetch failed for {Uri}", pageUri); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + byte[] payload; + + if (fetchResult.IsNotModified) + { + if (pageIndex == 0) + { + if (cursor.FetchCache.TryGetValue(cacheKey, out var existingCache)) + { + fetchCache[cacheKey] = existingCache; + } + + return UbuntuIndexFetchResult.Unchanged(); + } + + if (!cursor.FetchCache.TryGetValue(cacheKey, out var cachedEntryForPage)) + { + break; + } + + fetchCache[cacheKey] = cachedEntryForPage; + + var existingDocument = await _documentStore.FindBySourceAndUriAsync(SourceName, cacheKey, cancellationToken).ConfigureAwait(false); + if (existingDocument is null || !existingDocument.PayloadId.HasValue) + { + break; + } + + payload = await _rawDocumentStorage.DownloadAsync(existingDocument.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + else + { + if (!fetchResult.IsSuccess || fetchResult.Document is null) + { + continue; + } + + fetchCache[cacheKey] = UbuntuFetchCacheEntry.FromDocument(fetchResult.Document); + + if (!fetchResult.Document.PayloadId.HasValue) + { + _logger.LogWarning("Ubuntu index document {DocumentId} missing GridFS payload", fetchResult.Document.Id); + continue; + } + + payload = await _rawDocumentStorage.DownloadAsync(fetchResult.Document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + + var page = 
UbuntuNoticeParser.ParseIndex(Encoding.UTF8.GetString(payload)); + + if (page.TotalResults > 0) + { + totalResults = page.TotalResults; + } + + foreach (var notice in page.Notices) + { + if (!seenNoticeIds.Add(notice.NoticeId)) + { + continue; + } + + aggregated.Add(notice); + if (aggregated.Count >= maxNotices) + { + break; + } + } + + if (aggregated.Count >= maxNotices) + { + break; + } + + if (page.Notices.Count < pageSize) + { + break; + } + + offset += pageSize; + } + + return new UbuntuIndexFetchResult(false, aggregated); + } + + private static Uri BuildIndexUri(Uri endpoint, int offset, int limit) + { + var builder = new UriBuilder(endpoint); + var queryBuilder = new StringBuilder(); + + if (!string.IsNullOrEmpty(builder.Query)) + { + var existing = builder.Query.TrimStart('?'); + if (!string.IsNullOrEmpty(existing)) + { + queryBuilder.Append(existing); + if (existing[^1] != '&') + { + queryBuilder.Append('&'); + } + } + } + + queryBuilder.Append("offset="); + queryBuilder.Append(offset.ToString(CultureInfo.InvariantCulture)); + queryBuilder.Append("&limit="); + queryBuilder.Append(limit.ToString(CultureInfo.InvariantCulture)); + + builder.Query = queryBuilder.ToString(); + return builder.Uri; + } + + private sealed record UbuntuIndexFetchResult(bool IsUnchanged, IReadOnlyList Notices) + { + public static UbuntuIndexFetchResult Unchanged() + => new(true, Array.Empty()); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? 
UbuntuCursor.Empty : UbuntuCursor.FromBson(state.Cursor); + } + + private async Task UpdateCursorAsync(UbuntuCursor cursor, CancellationToken cancellationToken) + { + var doc = cursor.ToBsonDocument(); + await _stateRepository.UpdateCursorAsync(SourceName, doc, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); + } + + private static string ComputeNoticeHash(BsonDocument document) + { + var bytes = document.ToBson(); var hash = _hash.ComputeHash(bytes, HashAlgorithms.Sha256); return Convert.ToHexString(hash).ToLowerInvariant(); } - - private static BsonDocument ToBson(UbuntuNoticeDto notice) - { - var packages = new BsonArray(); - foreach (var package in notice.Packages) - { - packages.Add(new BsonDocument - { - ["release"] = package.Release, - ["package"] = package.Package, - ["version"] = package.Version, - ["pocket"] = package.Pocket, - ["isSource"] = package.IsSource - }); - } - - var references = new BsonArray(); - foreach (var reference in notice.References) - { - var doc = new BsonDocument - { - ["url"] = reference.Url - }; - - if (!string.IsNullOrWhiteSpace(reference.Kind)) - { - doc["kind"] = reference.Kind; - } - - if (!string.IsNullOrWhiteSpace(reference.Title)) - { - doc["title"] = reference.Title; - } - - references.Add(doc); - } - - return new BsonDocument - { - ["noticeId"] = notice.NoticeId, - ["published"] = notice.Published.UtcDateTime, - ["title"] = notice.Title, - ["summary"] = notice.Summary, - ["cves"] = new BsonArray(notice.CveIds ?? Array.Empty()), - ["packages"] = packages, - ["references"] = references - }; - } - - private static UbuntuNoticeDto FromBson(BsonDocument document) - { - var noticeId = document.GetValue("noticeId", string.Empty).AsString; - var published = document.TryGetValue("published", out var publishedValue) - ? 
publishedValue.BsonType switch - { - BsonType.DateTime => DateTime.SpecifyKind(publishedValue.ToUniversalTime(), DateTimeKind.Utc), - BsonType.String when DateTimeOffset.TryParse(publishedValue.AsString, out var parsed) => parsed.ToUniversalTime(), - _ => DateTimeOffset.UtcNow - } - : DateTimeOffset.UtcNow; - - var title = document.GetValue("title", noticeId).AsString; - var summary = document.GetValue("summary", string.Empty).AsString; - - var cves = document.TryGetValue("cves", out var cveArray) && cveArray is BsonArray cveBson - ? cveBson.OfType() - .Select(static value => value?.ToString()) - .Where(static value => !string.IsNullOrWhiteSpace(value)) - .Select(static value => value!) - .ToArray() - : Array.Empty(); - - var packages = new List(); - if (document.TryGetValue("packages", out var packageArray) && packageArray is BsonArray packageBson) - { - foreach (var element in packageBson.OfType()) - { - packages.Add(new UbuntuReleasePackageDto( - Release: element.GetValue("release", string.Empty).AsString, - Package: element.GetValue("package", string.Empty).AsString, - Version: element.GetValue("version", string.Empty).AsString, - Pocket: element.GetValue("pocket", string.Empty).AsString, - IsSource: element.TryGetValue("isSource", out var sourceValue) && sourceValue.AsBoolean)); - } - } - - var references = new List(); - if (document.TryGetValue("references", out var referenceArray) && referenceArray is BsonArray referenceBson) - { - foreach (var element in referenceBson.OfType()) - { - var url = element.GetValue("url", string.Empty).AsString; - if (string.IsNullOrWhiteSpace(url)) - { - continue; - } - - references.Add(new UbuntuReferenceDto( - url, - element.TryGetValue("kind", out var kindValue) ? kindValue.AsString : null, - element.TryGetValue("title", out var titleValue) ? 
titleValue.AsString : null)); - } - } - - return new UbuntuNoticeDto( - noticeId, - published, - title, - summary, - cves, - packages, - references); - } -} + + private static BsonDocument ToBson(UbuntuNoticeDto notice) + { + var packages = new BsonArray(); + foreach (var package in notice.Packages) + { + packages.Add(new BsonDocument + { + ["release"] = package.Release, + ["package"] = package.Package, + ["version"] = package.Version, + ["pocket"] = package.Pocket, + ["isSource"] = package.IsSource + }); + } + + var references = new BsonArray(); + foreach (var reference in notice.References) + { + var doc = new BsonDocument + { + ["url"] = reference.Url + }; + + if (!string.IsNullOrWhiteSpace(reference.Kind)) + { + doc["kind"] = reference.Kind; + } + + if (!string.IsNullOrWhiteSpace(reference.Title)) + { + doc["title"] = reference.Title; + } + + references.Add(doc); + } + + return new BsonDocument + { + ["noticeId"] = notice.NoticeId, + ["published"] = notice.Published.UtcDateTime, + ["title"] = notice.Title, + ["summary"] = notice.Summary, + ["cves"] = new BsonArray(notice.CveIds ?? Array.Empty()), + ["packages"] = packages, + ["references"] = references + }; + } + + private static UbuntuNoticeDto FromBson(BsonDocument document) + { + var noticeId = document.GetValue("noticeId", string.Empty).AsString; + var published = document.TryGetValue("published", out var publishedValue) + ? publishedValue.BsonType switch + { + BsonType.DateTime => DateTime.SpecifyKind(publishedValue.ToUniversalTime(), DateTimeKind.Utc), + BsonType.String when DateTimeOffset.TryParse(publishedValue.AsString, out var parsed) => parsed.ToUniversalTime(), + _ => DateTimeOffset.UtcNow + } + : DateTimeOffset.UtcNow; + + var title = document.GetValue("title", noticeId).AsString; + var summary = document.GetValue("summary", string.Empty).AsString; + + var cves = document.TryGetValue("cves", out var cveArray) && cveArray is BsonArray cveBson + ? 
cveBson.OfType() + .Select(static value => value?.ToString()) + .Where(static value => !string.IsNullOrWhiteSpace(value)) + .Select(static value => value!) + .ToArray() + : Array.Empty(); + + var packages = new List(); + if (document.TryGetValue("packages", out var packageArray) && packageArray is BsonArray packageBson) + { + foreach (var element in packageBson.OfType()) + { + packages.Add(new UbuntuReleasePackageDto( + Release: element.GetValue("release", string.Empty).AsString, + Package: element.GetValue("package", string.Empty).AsString, + Version: element.GetValue("version", string.Empty).AsString, + Pocket: element.GetValue("pocket", string.Empty).AsString, + IsSource: element.TryGetValue("isSource", out var sourceValue) && sourceValue.AsBoolean)); + } + } + + var references = new List(); + if (document.TryGetValue("references", out var referenceArray) && referenceArray is BsonArray referenceBson) + { + foreach (var element in referenceBson.OfType()) + { + var url = element.GetValue("url", string.Empty).AsString; + if (string.IsNullOrWhiteSpace(url)) + { + continue; + } + + references.Add(new UbuntuReferenceDto( + url, + element.TryGetValue("kind", out var kindValue) ? kindValue.AsString : null, + element.TryGetValue("title", out var titleValue) ? 
titleValue.AsString : null)); + } + } + + return new UbuntuNoticeDto( + noticeId, + published, + title, + summary, + cves, + packages, + references); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ghsa/GhsaConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ghsa/GhsaConnector.cs index b5e43f05a..7f1fc1498 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ghsa/GhsaConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ghsa/GhsaConnector.cs @@ -1,547 +1,547 @@ -using System.Collections.Generic; -using System.Globalization; -using System.Linq; -using System.Net.Http; -using System.Text.Json; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Ghsa.Configuration; -using StellaOps.Concelier.Connector.Ghsa.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Ghsa; - -public sealed class GhsaConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) - { - PropertyNameCaseInsensitive = true, - WriteIndented = false, - }; - - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly GhsaOptions _options; - private readonly GhsaDiagnostics _diagnostics; - private readonly TimeProvider _timeProvider; - private readonly ILogger 
_logger; - private readonly object _rateLimitWarningLock = new(); - private readonly Dictionary<(string Phase, string Resource), bool> _rateLimitWarnings = new(); - - public GhsaConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, - GhsaDiagnostics diagnostics, - TimeProvider? timeProvider, - ILogger logger) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => GhsaConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var now = _timeProvider.GetUtcNow(); - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - - var since = cursor.CurrentWindowStart ?? cursor.LastUpdatedExclusive ?? 
now - _options.InitialBackfill; - if (since > now) - { - since = now; - } - - var until = cursor.CurrentWindowEnd ?? now; - if (until <= since) - { - until = since + TimeSpan.FromMinutes(1); - } - - var page = cursor.NextPage <= 0 ? 1 : cursor.NextPage; - var pagesFetched = 0; - var hasMore = true; - var rateLimitHit = false; - DateTimeOffset? maxUpdated = cursor.LastUpdatedExclusive; - - while (hasMore && pagesFetched < _options.MaxPagesPerFetch) - { - cancellationToken.ThrowIfCancellationRequested(); - - var listUri = BuildListUri(since, until, page, _options.PageSize); - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["since"] = since.ToString("O"), - ["until"] = until.ToString("O"), - ["page"] = page.ToString(CultureInfo.InvariantCulture), - ["pageSize"] = _options.PageSize.ToString(CultureInfo.InvariantCulture), - }; - - SourceFetchContentResult listResult; - try - { - _diagnostics.FetchAttempt(); - listResult = await _fetchService.FetchContentAsync( - new SourceFetchRequest( - GhsaOptions.HttpClientName, - SourceName, - listUri) - { - Metadata = metadata, - AcceptHeaders = new[] { "application/vnd.github+json" }, - }, - cancellationToken).ConfigureAwait(false); - } - catch (HttpRequestException ex) - { - _diagnostics.FetchFailure(); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (listResult.IsNotModified) - { - _diagnostics.FetchUnchanged(); - break; - } - - if (!listResult.IsSuccess || listResult.Content is null) - { - _diagnostics.FetchFailure(); - break; - } - - var deferList = await ApplyRateLimitAsync(listResult.Headers, "list", cancellationToken).ConfigureAwait(false); - if (deferList) - { - rateLimitHit = true; - break; - } - - var pageModel = GhsaListParser.Parse(listResult.Content, page, _options.PageSize); - - if (pageModel.Items.Count == 0) - { - hasMore = false; - } - - foreach (var item in pageModel.Items) - { - 
cancellationToken.ThrowIfCancellationRequested(); - - var detailUri = BuildDetailUri(item.GhsaId); - var detailMetadata = new Dictionary(StringComparer.Ordinal) - { - ["ghsaId"] = item.GhsaId, - ["page"] = page.ToString(CultureInfo.InvariantCulture), - ["since"] = since.ToString("O"), - ["until"] = until.ToString("O"), - }; - - SourceFetchResult detailResult; - try - { - detailResult = await _fetchService.FetchAsync( - new SourceFetchRequest( - GhsaOptions.HttpClientName, - SourceName, - detailUri) - { - Metadata = detailMetadata, - AcceptHeaders = new[] { "application/vnd.github+json" }, - }, - cancellationToken).ConfigureAwait(false); - } - catch (HttpRequestException ex) - { - _diagnostics.FetchFailure(); - _logger.LogWarning(ex, "Failed fetching GHSA advisory {GhsaId}", item.GhsaId); - continue; - } - - if (detailResult.IsNotModified) - { - _diagnostics.FetchUnchanged(); - continue; - } - - if (!detailResult.IsSuccess || detailResult.Document is null) - { - _diagnostics.FetchFailure(); - continue; - } - - _diagnostics.FetchDocument(); - pendingDocuments.Add(detailResult.Document.Id); - pendingMappings.Add(detailResult.Document.Id); - - var deferDetail = await ApplyRateLimitAsync(detailResult.Document.Headers, "detail", cancellationToken).ConfigureAwait(false); - if (deferDetail) - { - rateLimitHit = true; - break; - } - } - - if (rateLimitHit) - { - break; - } - - if (pageModel.MaxUpdated.HasValue) - { - if (!maxUpdated.HasValue || pageModel.MaxUpdated > maxUpdated) - { - maxUpdated = pageModel.MaxUpdated; - } - } - - hasMore = pageModel.HasMorePages; - page = pageModel.NextPageCandidate; - pagesFetched++; - - if (!rateLimitHit && hasMore && _options.RequestDelay > TimeSpan.Zero) - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - if (hasMore || rateLimitHit) - { - updatedCursor = updatedCursor - 
.WithCurrentWindowStart(since) - .WithCurrentWindowEnd(until) - .WithNextPage(page); - } - else - { - var nextSince = maxUpdated ?? until; - updatedCursor = updatedCursor - .WithLastUpdatedExclusive(nextSince) - .WithCurrentWindowStart(null) - .WithCurrentWindowEnd(null) - .WithNextPage(1); - } - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingDocuments = cursor.PendingDocuments.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - remainingDocuments.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _diagnostics.ParseFailure(); - _logger.LogWarning("GHSA document {DocumentId} missing GridFS content", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - continue; - } - - byte[] rawBytes; - try - { - rawBytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.ParseFailure(); - _logger.LogError(ex, "Unable to download GHSA raw document {DocumentId}", documentId); - throw; - } - - GhsaRecordDto dto; - try - { - dto = GhsaRecordParser.Parse(rawBytes); - } - catch (JsonException ex) - { - _diagnostics.ParseQuarantine(); - _logger.LogError(ex, "Malformed GHSA JSON for {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, 
cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - continue; - } - - var payload = BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); - var dtoRecord = new DtoRecord( - Guid.NewGuid(), - document.Id, - SourceName, - "ghsa/1.0", - payload, - _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remainingDocuments.Remove(documentId); - _diagnostics.ParseSuccess(); - } - - var updatedCursor = cursor.WithPendingDocuments(remainingDocuments); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dtoRecord is null || document is null) - { - _logger.LogWarning("Skipping GHSA mapping for {DocumentId}: DTO or document missing", documentId); - pendingMappings.Remove(documentId); - continue; - } - - GhsaRecordDto dto; - try - { - dto = JsonSerializer.Deserialize(dtoRecord.Payload.ToJson(), SerializerOptions) - ?? 
throw new InvalidOperationException("Deserialized DTO was null."); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to deserialize GHSA DTO for {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var advisory = GhsaMapper.Map(dto, document, dtoRecord.ValidatedAt); - - if (advisory.CvssMetrics.IsEmpty && !string.IsNullOrWhiteSpace(advisory.CanonicalMetricId)) - { - var fallbackSeverity = string.IsNullOrWhiteSpace(advisory.Severity) - ? "unknown" - : advisory.Severity!; - _diagnostics.CanonicalMetricFallback(advisory.CanonicalMetricId!, fallbackSeverity); - if (_logger.IsEnabled(LogLevel.Debug)) - { - _logger.LogDebug( - "GHSA {GhsaId} emitted canonical metric fallback {CanonicalMetricId} (severity {Severity})", - advisory.AdvisoryKey, - advisory.CanonicalMetricId, - fallbackSeverity); - } - } - - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - _diagnostics.MapSuccess(1); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private static Uri BuildListUri(DateTimeOffset since, DateTimeOffset until, int page, int pageSize) - { - var query = $"updated_since={Uri.EscapeDataString(since.ToString("O"))}&updated_until={Uri.EscapeDataString(until.ToString("O"))}&page={page}&per_page={pageSize}"; - return new Uri($"security/advisories?{query}", UriKind.Relative); - } - - private static Uri BuildDetailUri(string ghsaId) - { - var encoded = Uri.EscapeDataString(ghsaId); - return new Uri($"security/advisories/{encoded}", UriKind.Relative); - } - - private async Task GetCursorAsync(CancellationToken 
cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? GhsaCursor.Empty : GhsaCursor.FromBson(state.Cursor); - } - - private async Task UpdateCursorAsync(GhsaCursor cursor, CancellationToken cancellationToken) - { - await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); - } - - private bool ShouldLogRateLimitWarning(in GhsaRateLimitSnapshot snapshot, out bool recovered) - { - recovered = false; - - if (!snapshot.Remaining.HasValue) - { - return false; - } - - var key = (snapshot.Phase, snapshot.Resource ?? "global"); - var warn = snapshot.Remaining.Value <= _options.RateLimitWarningThreshold; - - lock (_rateLimitWarningLock) - { - var previouslyWarned = _rateLimitWarnings.TryGetValue(key, out var flagged) && flagged; - - if (warn) - { - if (previouslyWarned) - { - return false; - } - - _rateLimitWarnings[key] = true; - return true; - } - - if (previouslyWarned) - { - _rateLimitWarnings.Remove(key); - recovered = true; - } - - return false; - } - } - - private static double? CalculateHeadroomPercentage(in GhsaRateLimitSnapshot snapshot) - { - if (!snapshot.Limit.HasValue || !snapshot.Remaining.HasValue) - { - return null; - } - - var limit = snapshot.Limit.Value; - if (limit <= 0) - { - return null; - } - - return (double)snapshot.Remaining.Value / limit * 100d; - } - - private static string FormatHeadroom(double? headroomPct) - => headroomPct.HasValue ? $" (headroom {headroomPct.Value:F1}%)" : string.Empty; - - private async Task ApplyRateLimitAsync(IReadOnlyDictionary? 
headers, string phase, CancellationToken cancellationToken) - { - var snapshot = GhsaRateLimitParser.TryParse(headers, _timeProvider.GetUtcNow(), phase); - if (snapshot is null || !snapshot.Value.HasData) - { - return false; - } - - _diagnostics.RecordRateLimit(snapshot.Value); - - var headroomPct = CalculateHeadroomPercentage(snapshot.Value); - if (ShouldLogRateLimitWarning(snapshot.Value, out var recovered)) - { - var resetMessage = snapshot.Value.ResetAfter.HasValue - ? $" (resets in {snapshot.Value.ResetAfter.Value:c})" - : snapshot.Value.ResetAt.HasValue ? $" (resets at {snapshot.Value.ResetAt.Value:O})" : string.Empty; - - _logger.LogWarning( - "GHSA rate limit warning: remaining {Remaining} of {Limit} for {Phase} {Resource}{ResetMessage}{Headroom}", - snapshot.Value.Remaining, - snapshot.Value.Limit, - phase, - snapshot.Value.Resource ?? "global", - resetMessage, - FormatHeadroom(headroomPct)); - } - else if (recovered) - { - _logger.LogInformation( - "GHSA rate limit recovered for {Phase} {Resource}: remaining {Remaining} of {Limit}{Headroom}", - phase, - snapshot.Value.Resource ?? "global", - snapshot.Value.Remaining, - snapshot.Value.Limit, - FormatHeadroom(headroomPct)); - } - - if (snapshot.Value.Remaining.HasValue && snapshot.Value.Remaining.Value <= 0) - { - _diagnostics.RateLimitExhausted(phase); - var delay = snapshot.Value.RetryAfter ?? snapshot.Value.ResetAfter ?? _options.SecondaryRateLimitBackoff; - - if (delay > TimeSpan.Zero) - { - _logger.LogWarning( - "GHSA rate limit exhausted for {Phase} {Resource}; delaying {Delay}{Headroom}", - phase, - snapshot.Value.Resource ?? 
"global", - delay, - FormatHeadroom(headroomPct)); - await Task.Delay(delay, cancellationToken).ConfigureAwait(false); - } - - return true; - } - - return false; - } -} +using System.Collections.Generic; +using System.Globalization; +using System.Linq; +using System.Net.Http; +using System.Text.Json; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Ghsa.Configuration; +using StellaOps.Concelier.Connector.Ghsa.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Ghsa; + +public sealed class GhsaConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) + { + PropertyNameCaseInsensitive = true, + WriteIndented = false, + }; + + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly GhsaOptions _options; + private readonly GhsaDiagnostics _diagnostics; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + private readonly object _rateLimitWarningLock = new(); + private readonly Dictionary<(string Phase, string Resource), bool> _rateLimitWarnings = new(); + + public GhsaConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository 
stateRepository, + IOptions options, + GhsaDiagnostics diagnostics, + TimeProvider? timeProvider, + ILogger logger) + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => GhsaConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var now = _timeProvider.GetUtcNow(); + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + + var since = cursor.CurrentWindowStart ?? cursor.LastUpdatedExclusive ?? now - _options.InitialBackfill; + if (since > now) + { + since = now; + } + + var until = cursor.CurrentWindowEnd ?? now; + if (until <= since) + { + until = since + TimeSpan.FromMinutes(1); + } + + var page = cursor.NextPage <= 0 ? 1 : cursor.NextPage; + var pagesFetched = 0; + var hasMore = true; + var rateLimitHit = false; + DateTimeOffset? 
maxUpdated = cursor.LastUpdatedExclusive; + + while (hasMore && pagesFetched < _options.MaxPagesPerFetch) + { + cancellationToken.ThrowIfCancellationRequested(); + + var listUri = BuildListUri(since, until, page, _options.PageSize); + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["since"] = since.ToString("O"), + ["until"] = until.ToString("O"), + ["page"] = page.ToString(CultureInfo.InvariantCulture), + ["pageSize"] = _options.PageSize.ToString(CultureInfo.InvariantCulture), + }; + + SourceFetchContentResult listResult; + try + { + _diagnostics.FetchAttempt(); + listResult = await _fetchService.FetchContentAsync( + new SourceFetchRequest( + GhsaOptions.HttpClientName, + SourceName, + listUri) + { + Metadata = metadata, + AcceptHeaders = new[] { "application/vnd.github+json" }, + }, + cancellationToken).ConfigureAwait(false); + } + catch (HttpRequestException ex) + { + _diagnostics.FetchFailure(); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (listResult.IsNotModified) + { + _diagnostics.FetchUnchanged(); + break; + } + + if (!listResult.IsSuccess || listResult.Content is null) + { + _diagnostics.FetchFailure(); + break; + } + + var deferList = await ApplyRateLimitAsync(listResult.Headers, "list", cancellationToken).ConfigureAwait(false); + if (deferList) + { + rateLimitHit = true; + break; + } + + var pageModel = GhsaListParser.Parse(listResult.Content, page, _options.PageSize); + + if (pageModel.Items.Count == 0) + { + hasMore = false; + } + + foreach (var item in pageModel.Items) + { + cancellationToken.ThrowIfCancellationRequested(); + + var detailUri = BuildDetailUri(item.GhsaId); + var detailMetadata = new Dictionary(StringComparer.Ordinal) + { + ["ghsaId"] = item.GhsaId, + ["page"] = page.ToString(CultureInfo.InvariantCulture), + ["since"] = since.ToString("O"), + ["until"] = until.ToString("O"), + }; + + SourceFetchResult 
detailResult; + try + { + detailResult = await _fetchService.FetchAsync( + new SourceFetchRequest( + GhsaOptions.HttpClientName, + SourceName, + detailUri) + { + Metadata = detailMetadata, + AcceptHeaders = new[] { "application/vnd.github+json" }, + }, + cancellationToken).ConfigureAwait(false); + } + catch (HttpRequestException ex) + { + _diagnostics.FetchFailure(); + _logger.LogWarning(ex, "Failed fetching GHSA advisory {GhsaId}", item.GhsaId); + continue; + } + + if (detailResult.IsNotModified) + { + _diagnostics.FetchUnchanged(); + continue; + } + + if (!detailResult.IsSuccess || detailResult.Document is null) + { + _diagnostics.FetchFailure(); + continue; + } + + _diagnostics.FetchDocument(); + pendingDocuments.Add(detailResult.Document.Id); + pendingMappings.Add(detailResult.Document.Id); + + var deferDetail = await ApplyRateLimitAsync(detailResult.Document.Headers, "detail", cancellationToken).ConfigureAwait(false); + if (deferDetail) + { + rateLimitHit = true; + break; + } + } + + if (rateLimitHit) + { + break; + } + + if (pageModel.MaxUpdated.HasValue) + { + if (!maxUpdated.HasValue || pageModel.MaxUpdated > maxUpdated) + { + maxUpdated = pageModel.MaxUpdated; + } + } + + hasMore = pageModel.HasMorePages; + page = pageModel.NextPageCandidate; + pagesFetched++; + + if (!rateLimitHit && hasMore && _options.RequestDelay > TimeSpan.Zero) + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + if (hasMore || rateLimitHit) + { + updatedCursor = updatedCursor + .WithCurrentWindowStart(since) + .WithCurrentWindowEnd(until) + .WithNextPage(page); + } + else + { + var nextSince = maxUpdated ?? 
until; + updatedCursor = updatedCursor + .WithLastUpdatedExclusive(nextSince) + .WithCurrentWindowStart(null) + .WithCurrentWindowEnd(null) + .WithNextPage(1); + } + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingDocuments = cursor.PendingDocuments.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remainingDocuments.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _diagnostics.ParseFailure(); + _logger.LogWarning("GHSA document {DocumentId} missing GridFS content", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + continue; + } + + byte[] rawBytes; + try + { + rawBytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.ParseFailure(); + _logger.LogError(ex, "Unable to download GHSA raw document {DocumentId}", documentId); + throw; + } + + GhsaRecordDto dto; + try + { + dto = GhsaRecordParser.Parse(rawBytes); + } + catch (JsonException ex) + { + _diagnostics.ParseQuarantine(); + _logger.LogError(ex, "Malformed GHSA JSON for {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + continue; + } + + var payload = 
BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); + var dtoRecord = new DtoRecord( + Guid.NewGuid(), + document.Id, + SourceName, + "ghsa/1.0", + payload, + _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remainingDocuments.Remove(documentId); + _diagnostics.ParseSuccess(); + } + + var updatedCursor = cursor.WithPendingDocuments(remainingDocuments); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dtoRecord is null || document is null) + { + _logger.LogWarning("Skipping GHSA mapping for {DocumentId}: DTO or document missing", documentId); + pendingMappings.Remove(documentId); + continue; + } + + GhsaRecordDto dto; + try + { + dto = JsonSerializer.Deserialize(dtoRecord.Payload.ToJson(), SerializerOptions) + ?? 
throw new InvalidOperationException("Deserialized DTO was null."); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deserialize GHSA DTO for {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var advisory = GhsaMapper.Map(dto, document, dtoRecord.ValidatedAt); + + if (advisory.CvssMetrics.IsEmpty && !string.IsNullOrWhiteSpace(advisory.CanonicalMetricId)) + { + var fallbackSeverity = string.IsNullOrWhiteSpace(advisory.Severity) + ? "unknown" + : advisory.Severity!; + _diagnostics.CanonicalMetricFallback(advisory.CanonicalMetricId!, fallbackSeverity); + if (_logger.IsEnabled(LogLevel.Debug)) + { + _logger.LogDebug( + "GHSA {GhsaId} emitted canonical metric fallback {CanonicalMetricId} (severity {Severity})", + advisory.AdvisoryKey, + advisory.CanonicalMetricId, + fallbackSeverity); + } + } + + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + _diagnostics.MapSuccess(1); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private static Uri BuildListUri(DateTimeOffset since, DateTimeOffset until, int page, int pageSize) + { + var query = $"updated_since={Uri.EscapeDataString(since.ToString("O"))}&updated_until={Uri.EscapeDataString(until.ToString("O"))}&page={page}&per_page={pageSize}"; + return new Uri($"security/advisories?{query}", UriKind.Relative); + } + + private static Uri BuildDetailUri(string ghsaId) + { + var encoded = Uri.EscapeDataString(ghsaId); + return new Uri($"security/advisories/{encoded}", UriKind.Relative); + } + + private async Task GetCursorAsync(CancellationToken 
cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? GhsaCursor.Empty : GhsaCursor.FromBson(state.Cursor); + } + + private async Task UpdateCursorAsync(GhsaCursor cursor, CancellationToken cancellationToken) + { + await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); + } + + private bool ShouldLogRateLimitWarning(in GhsaRateLimitSnapshot snapshot, out bool recovered) + { + recovered = false; + + if (!snapshot.Remaining.HasValue) + { + return false; + } + + var key = (snapshot.Phase, snapshot.Resource ?? "global"); + var warn = snapshot.Remaining.Value <= _options.RateLimitWarningThreshold; + + lock (_rateLimitWarningLock) + { + var previouslyWarned = _rateLimitWarnings.TryGetValue(key, out var flagged) && flagged; + + if (warn) + { + if (previouslyWarned) + { + return false; + } + + _rateLimitWarnings[key] = true; + return true; + } + + if (previouslyWarned) + { + _rateLimitWarnings.Remove(key); + recovered = true; + } + + return false; + } + } + + private static double? CalculateHeadroomPercentage(in GhsaRateLimitSnapshot snapshot) + { + if (!snapshot.Limit.HasValue || !snapshot.Remaining.HasValue) + { + return null; + } + + var limit = snapshot.Limit.Value; + if (limit <= 0) + { + return null; + } + + return (double)snapshot.Remaining.Value / limit * 100d; + } + + private static string FormatHeadroom(double? headroomPct) + => headroomPct.HasValue ? $" (headroom {headroomPct.Value:F1}%)" : string.Empty; + + private async Task ApplyRateLimitAsync(IReadOnlyDictionary? 
headers, string phase, CancellationToken cancellationToken) + { + var snapshot = GhsaRateLimitParser.TryParse(headers, _timeProvider.GetUtcNow(), phase); + if (snapshot is null || !snapshot.Value.HasData) + { + return false; + } + + _diagnostics.RecordRateLimit(snapshot.Value); + + var headroomPct = CalculateHeadroomPercentage(snapshot.Value); + if (ShouldLogRateLimitWarning(snapshot.Value, out var recovered)) + { + var resetMessage = snapshot.Value.ResetAfter.HasValue + ? $" (resets in {snapshot.Value.ResetAfter.Value:c})" + : snapshot.Value.ResetAt.HasValue ? $" (resets at {snapshot.Value.ResetAt.Value:O})" : string.Empty; + + _logger.LogWarning( + "GHSA rate limit warning: remaining {Remaining} of {Limit} for {Phase} {Resource}{ResetMessage}{Headroom}", + snapshot.Value.Remaining, + snapshot.Value.Limit, + phase, + snapshot.Value.Resource ?? "global", + resetMessage, + FormatHeadroom(headroomPct)); + } + else if (recovered) + { + _logger.LogInformation( + "GHSA rate limit recovered for {Phase} {Resource}: remaining {Remaining} of {Limit}{Headroom}", + phase, + snapshot.Value.Resource ?? "global", + snapshot.Value.Remaining, + snapshot.Value.Limit, + FormatHeadroom(headroomPct)); + } + + if (snapshot.Value.Remaining.HasValue && snapshot.Value.Remaining.Value <= 0) + { + _diagnostics.RateLimitExhausted(phase); + var delay = snapshot.Value.RetryAfter ?? snapshot.Value.ResetAfter ?? _options.SecondaryRateLimitBackoff; + + if (delay > TimeSpan.Zero) + { + _logger.LogWarning( + "GHSA rate limit exhausted for {Phase} {Resource}; delaying {Delay}{Headroom}", + phase, + snapshot.Value.Resource ?? 
"global", + delay, + FormatHeadroom(headroomPct)); + await Task.Delay(delay, cancellationToken).ConfigureAwait(false); + } + + return true; + } + + return false; + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ics.Cisa/IcsCisaConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ics.Cisa/IcsCisaConnector.cs index ec1bb6b8f..3f475de4f 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ics.Cisa/IcsCisaConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ics.Cisa/IcsCisaConnector.cs @@ -1,660 +1,660 @@ -using System; -using System.Collections.Generic; -using System.Globalization; -using System.IO; -using System.Linq; -using System.Net; -using System.Net.Http; -using System.Text; -using System.Text.RegularExpressions; -using System.Text.Json; -using System.Text.Json.Serialization; -using System.Threading; -using System.Threading.Tasks; -using AngleSharp.Html.Dom; -using AngleSharp.Html.Parser; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using MongoDB.Bson.IO; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Common.Html; -using StellaOps.Concelier.Connector.Ics.Cisa.Configuration; -using StellaOps.Concelier.Connector.Ics.Cisa.Internal; +using System; +using System.Collections.Generic; +using System.Globalization; +using System.IO; +using System.Linq; +using System.Net; +using System.Net.Http; +using System.Text; +using System.Text.RegularExpressions; +using System.Text.Json; +using System.Text.Json.Serialization; +using System.Threading; +using System.Threading.Tasks; +using AngleSharp.Html.Dom; +using AngleSharp.Html.Parser; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using MongoDB.Bson.IO; +using StellaOps.Concelier.Models; +using 
StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Common.Html; +using StellaOps.Concelier.Connector.Ics.Cisa.Configuration; +using StellaOps.Concelier.Connector.Ics.Cisa.Internal; using StellaOps.Concelier.Storage.Mongo; using StellaOps.Concelier.Storage.Mongo.Advisories; using StellaOps.Concelier.Storage.Mongo.Documents; using StellaOps.Concelier.Storage.Mongo.Dtos; using StellaOps.Concelier.Normalization.SemVer; using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Ics.Cisa; - -public sealed class IcsCisaConnector : IFeedConnector -{ + +namespace StellaOps.Concelier.Connector.Ics.Cisa; + +public sealed class IcsCisaConnector : IFeedConnector +{ private const string SchemaVersion = "ics.cisa.feed.v1"; private static readonly string[] RssAcceptHeaders = { "application/rss+xml", "application/xml", "text/xml" }; private static readonly string[] RssFallbackAcceptHeaders = { "application/rss+xml", "application/xml", "text/xml", "*/*" }; private static readonly string[] DetailAcceptHeaders = { "text/html", "application/xhtml+xml", "*/*" }; private static readonly Regex FirmwareRangeRegex = new(@"(?(?:<=?|>=?)?\s*\d+(?:\.\d+){0,2}(?:\s*-\s*\d+(?:\.\d+){0,2})?)", RegexOptions.CultureInvariant); - - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly IcsCisaOptions _options; - private readonly IcsCisaFeedParser _parser; - private readonly IcsCisaDiagnostics _diagnostics; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - private readonly HtmlContentSanitizer _htmlSanitizer = new(); - private readonly HtmlParser _htmlParser = new(); - - public IcsCisaConnector( - SourceFetchService 
fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, - IcsCisaFeedParser parser, - IcsCisaDiagnostics diagnostics, - TimeProvider? timeProvider, - ILogger logger) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _parser = parser ?? throw new ArgumentNullException(nameof(parser)); - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => IcsCisaConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var now = _timeProvider.GetUtcNow(); - var touched = false; - - foreach (var topic in _options.TopicIds) - { - cancellationToken.ThrowIfCancellationRequested(); - - _diagnostics.FetchAttempt(topic); - var topicUri = _options.BuildTopicUri(topic); - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, topicUri.ToString(), cancellationToken).ConfigureAwait(false); - - var request = new SourceFetchRequest(IcsCisaOptions.HttpClientName, SourceName, topicUri) - { - AcceptHeaders = RssAcceptHeaders, - Metadata = new Dictionary(StringComparer.Ordinal) - { - ["icscisa.topicId"] = topic, - }, - }; - - if (existing is not null) - { - request = request with - { - ETag = existing.Etag, - LastModified = existing.LastModified, - }; - } - - SourceFetchResult? 
result = null; - var documentsAdded = 0; - var usedFallback = false; - - try - { - result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - } - catch (HttpRequestException ex) when (ShouldRetryWithFallback(ex)) - { - _logger.LogWarning(ex, "Retrying CISA ICS topic {TopicId} via Akamai fallback", topic); - _diagnostics.FetchFallback(topic); - usedFallback = true; - var fallbackRequest = request with - { - AcceptHeaders = RssFallbackAcceptHeaders, - Metadata = AppendMetadata(request.Metadata, "icscisa.retry", "akamai"), - }; - - try - { - result = await _fetchService.FetchAsync(fallbackRequest, cancellationToken).ConfigureAwait(false); - } - catch (Exception fallbackEx) when (fallbackEx is HttpRequestException or TaskCanceledException) - { - _diagnostics.FetchFailure(topic); - _logger.LogError(fallbackEx, "Fallback fetch failed for CISA ICS topic {TopicId}", topic); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, fallbackEx.Message, cancellationToken).ConfigureAwait(false); - throw; - } - } - catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException) - { - _diagnostics.FetchFailure(topic); - _logger.LogError(ex, "Failed to fetch CISA ICS topic {TopicId}", topic); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (result is null) - { - _diagnostics.FetchFailure(topic); - continue; - } - - if (result.IsNotModified) - { - _diagnostics.FetchNotModified(topic); - _logger.LogDebug("CISA ICS topic {TopicId} not modified", topic); - } - else if (result.IsSuccess && result.Document is not null) - { - pendingDocuments.Add(result.Document.Id); - pendingMappings.Remove(result.Document.Id); - touched = true; - documentsAdded++; - _diagnostics.FetchSuccess(topic, 1); - _logger.LogInformation("Fetched CISA ICS topic {TopicId} document {DocumentId}", topic, result.Document.Id); - } - 
else if (result.IsSuccess) - { - _diagnostics.FetchSuccess(topic, 0); - _logger.LogDebug("CISA ICS topic {TopicId} fetch succeeded without new document (fallback={Fallback})", topic, usedFallback); - } - else - { - _diagnostics.FetchFailure(topic); - _logger.LogWarning("CISA ICS topic {TopicId} returned status {StatusCode}", topic, result.StatusCode); - } - - if (documentsAdded > 0) - { - _logger.LogInformation("CISA ICS topic {TopicId} added {DocumentsAdded} document(s) (fallbackUsed={Fallback})", topic, documentsAdded, usedFallback); - } - - if (_options.RequestDelay > TimeSpan.Zero) - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - } - - if (!touched) - { - await UpdateCursorAsync(cursor.WithPendingDocuments(pendingDocuments).WithPendingMappings(pendingMappings), cancellationToken).ConfigureAwait(false); - return; - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - DateTimeOffset? 
latestPublished = cursor.LastPublished; - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var topicId = "unknown"; - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - _diagnostics.ParseFailure(topicId); - continue; - } - - if (document.Metadata is not null && document.Metadata.TryGetValue("icscisa.topicId", out var topicValue)) - { - topicId = topicValue; - } - - if (!document.GridFsId.HasValue) - { - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - _diagnostics.ParseFailure(topicId); - continue; - } - - byte[] bytes; - try - { - bytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to download CISA ICS payload {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - _diagnostics.ParseFailure(topicId); - continue; - } - - IReadOnlyCollection advisories; - try - { - using var stream = new MemoryStream(bytes, writable: false); - var topicUri = Uri.TryCreate(document.Uri, UriKind.Absolute, out var parsed) ? 
parsed : null; - advisories = _parser.Parse(stream, string.Equals(topicId, "USDHSCISA_19", StringComparison.OrdinalIgnoreCase), parsed); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to parse CISA ICS feed {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - _diagnostics.ParseFailure(topicId); - continue; - } - - var advisoryList = advisories.ToList(); - var detailAttempts = 0; - if (_options.EnableDetailScrape) - { - var enriched = new List(advisoryList.Count); - foreach (var advisory in advisoryList) - { - if (NeedsDetailFetch(advisory)) - { - detailAttempts++; - } - var enrichedAdvisory = await EnrichAdvisoryAsync(advisory, cancellationToken).ConfigureAwait(false); - enriched.Add(enrichedAdvisory); - } - - advisoryList = enriched; - } - - var attachmentTotal = advisoryList.Sum(static advisory => advisory.Attachments is null ? 0 : advisory.Attachments.Count); - - var feedDto = new IcsCisaFeedDto - { - TopicId = topicId, - FeedUri = document.Uri, - Advisories = advisoryList, - }; - - try - { - var json = JsonSerializer.Serialize(feedDto, new JsonSerializerOptions(JsonSerializerDefaults.Web) - { - DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, - WriteIndented = false, - }); - var bson = BsonDocument.Parse(json); - var dtoRecord = new DtoRecord( - Guid.NewGuid(), - document.Id, - SourceName, - SchemaVersion, - bson, - _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remainingDocuments.Remove(documentId); - pendingMappings.Add(document.Id); - - var docPublished = advisoryList.Count > 0 ? 
advisoryList.Max(a => a.Published) : (DateTimeOffset?)null; - if (docPublished.HasValue && docPublished > latestPublished) - { - latestPublished = docPublished; - } - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to persist CISA ICS DTO {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - _diagnostics.ParseFailure(topicId); - continue; - } - - _diagnostics.ParseSuccess(topicId, advisoryList.Count, attachmentTotal, detailAttempts); - _logger.LogInformation( - "CISA ICS parse produced advisories={Advisories} attachments={Attachments} detailAttempts={DetailAttempts} topic={TopicId}", - advisoryList.Count, - attachmentTotal, - detailAttempts, - topicId); - } - - var updatedCursor = cursor - .WithPendingDocuments(remainingDocuments) - .WithPendingMappings(pendingMappings) - .WithLastPublished(latestPublished); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToHashSet(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - if (dtoRecord is null) - { - pendingMappings.Remove(documentId); - _diagnostics.MapFailure("unknown"); - continue; - } - - IcsCisaFeedDto? 
feedDto; - try - { - var json = dtoRecord.Payload.ToJson(new JsonWriterSettings { OutputMode = JsonOutputMode.RelaxedExtendedJson }); - feedDto = JsonSerializer.Deserialize(json, new JsonSerializerOptions(JsonSerializerDefaults.Web)); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to deserialize CISA ICS DTO {DtoId}", dtoRecord.Id); - pendingMappings.Remove(documentId); - await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - _diagnostics.MapFailure("unknown"); - continue; - } - - if (feedDto is null) - { - pendingMappings.Remove(documentId); - await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - _diagnostics.MapFailure("unknown"); - continue; - } - - var allMapped = true; - var mappedCount = 0; - foreach (var advisoryDto in feedDto.Advisories) - { - try - { - var advisory = MapAdvisory(dtoRecord, feedDto, advisoryDto); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - _diagnostics.MapSuccess( - advisoryDto.AdvisoryId, - advisory.References.Length, - advisory.AffectedPackages.Length, - advisory.Aliases.Length); - mappedCount++; - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to map CISA ICS advisory {AdvisoryId}", advisoryDto.AdvisoryId); - _diagnostics.MapFailure(advisoryDto.AdvisoryId); - allMapped = false; - } - } - - pendingMappings.Remove(documentId); - - if (!allMapped) - { - _logger.LogWarning( - "CISA ICS mapping failed for document {DocumentId} (mapped={MappedCount} of {Total})", - documentId, - mappedCount, - feedDto.Advisories.Count); - await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - continue; - } - - await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - _logger.LogInformation("CISA ICS mapped {MappedCount} advisories from 
document {DocumentId}", mappedCount, documentId); - } - - await UpdateCursorAsync(cursor.WithPendingMappings(pendingMappings), cancellationToken).ConfigureAwait(false); - } - - private Advisory MapAdvisory(DtoRecord dtoRecord, IcsCisaFeedDto feedDto, IcsCisaAdvisoryDto advisoryDto) - { - var recordedAt = dtoRecord.ValidatedAt; - var fetchProvenance = new AdvisoryProvenance(SourceName, "feed", feedDto.FeedUri, recordedAt); - var mappingProvenance = new AdvisoryProvenance(SourceName, "mapping", advisoryDto.AdvisoryId, _timeProvider.GetUtcNow()); - - var aliases = CombineAliases(advisoryDto); - var references = BuildReferences(advisoryDto, recordedAt).ToList(); - var mitigationReferences = BuildMitigationReferences(advisoryDto, recordedAt); - if (mitigationReferences.Count > 0) - { - references.AddRange(mitigationReferences); - } - - var affectedPackages = BuildAffectedPackages(advisoryDto, recordedAt); - - return new Advisory( - advisoryDto.AdvisoryId, - advisoryDto.Title, - advisoryDto.Summary, - language: "en", - published: advisoryDto.Published, - modified: advisoryDto.Updated ?? advisoryDto.Published, - severity: null, - exploitKnown: false, - aliases: aliases, - references: references, - affectedPackages: affectedPackages, - cvssMetrics: Array.Empty(), - provenance: new[] { fetchProvenance, mappingProvenance }); - } - - internal static IReadOnlyCollection CombineAliases(IcsCisaAdvisoryDto advisoryDto) - { - var set = new HashSet(StringComparer.OrdinalIgnoreCase); - - if (advisoryDto.Aliases is not null) - { - foreach (var alias in advisoryDto.Aliases) - { - if (string.IsNullOrWhiteSpace(alias)) - { - continue; - } - - set.Add(alias.Trim()); - } - } - - if (advisoryDto.CveIds is not null) - { - foreach (var cve in advisoryDto.CveIds) - { - if (string.IsNullOrWhiteSpace(cve)) - { - continue; - } - - set.Add(cve.Trim()); - } - } - - return set.Count == 0 - ? 
Array.Empty() - : set.OrderBy(static value => value, StringComparer.Ordinal).ToArray(); - } - - internal static IReadOnlyCollection BuildMitigationReferences(IcsCisaAdvisoryDto advisoryDto, DateTimeOffset recordedAt) - { - if (advisoryDto.Mitigations is null || advisoryDto.Mitigations.Count == 0) - { - return Array.Empty(); - } - - var references = new List(); - var baseUrl = Validation.LooksLikeHttpUrl(advisoryDto.Link) ? advisoryDto.Link : null; - var sourceTag = advisoryDto.IsMedical ? "icscisa-medical-mitigation" : "icscisa-mitigation"; - - var index = 0; - foreach (var mitigation in advisoryDto.Mitigations) - { - index++; - if (string.IsNullOrWhiteSpace(mitigation)) - { - continue; - } - - var summary = mitigation.Trim(); - var url = baseUrl is not null - ? $"{baseUrl}#mitigation-{index}" - : $"icscisa:mitigation:{advisoryDto.AdvisoryId}:{index}"; - - references.Add(new AdvisoryReference( - url, - kind: "mitigation", - sourceTag: sourceTag, - summary: summary, - provenance: new AdvisoryProvenance("ics-cisa", "mitigation", url, recordedAt))); - } - - return references.Count == 0 ? Array.Empty() : references; - } - - internal static IReadOnlyCollection BuildReferences(IcsCisaAdvisoryDto advisoryDto, DateTimeOffset recordedAt) - { - var references = new List(); - var seen = new HashSet(StringComparer.OrdinalIgnoreCase); - - if (advisoryDto.Attachments is { Count: > 0 }) - { - foreach (var attachment in advisoryDto.Attachments) - { - if (attachment is null || !Validation.LooksLikeHttpUrl(attachment.Url)) - { - continue; - } - - var url = attachment.Url; - if (!seen.Add(url)) - { - continue; - } - - try - { - references.Add(new AdvisoryReference( - url, - kind: "attachment", - sourceTag: advisoryDto.IsMedical ? 
"icscisa-medical-attachment" : "icscisa-attachment", - summary: attachment.Title, - provenance: new AdvisoryProvenance("ics-cisa", "attachment", url, recordedAt))); - } - catch (ArgumentException) - { - // ignore invalid URIs - } - } - } - - foreach (var reference in advisoryDto.References ?? Array.Empty()) - { - if (!Validation.LooksLikeHttpUrl(reference)) - { - continue; - } - - if (!seen.Add(reference)) - { - continue; - } - - try - { - references.Add(new AdvisoryReference( - reference, - kind: "advisory", - sourceTag: advisoryDto.IsMedical ? "icscisa-medical" : "icscisa", - summary: null, - provenance: new AdvisoryProvenance("ics-cisa", "reference", reference, recordedAt))); - } - catch (ArgumentException) - { - // ignore invalid URIs - } - } - - if (references.Count == 0 && Validation.LooksLikeHttpUrl(advisoryDto.Link) && seen.Add(advisoryDto.Link)) - { - references.Add(new AdvisoryReference( - advisoryDto.Link, - kind: "advisory", - sourceTag: advisoryDto.IsMedical ? "icscisa-medical" : "icscisa", - summary: null, - provenance: new AdvisoryProvenance("ics-cisa", "reference", advisoryDto.Link, recordedAt))); - } - - return references; - } - - internal static IReadOnlyCollection BuildAffectedPackages(IcsCisaAdvisoryDto advisoryDto, DateTimeOffset recordedAt) - { - var packages = new List(); - var vendors = advisoryDto.Vendors ?? Array.Empty(); - var normalizedVendors = vendors - .Where(static vendor => !string.IsNullOrWhiteSpace(vendor)) - .Select(static vendor => vendor.Trim()) - .Distinct(StringComparer.OrdinalIgnoreCase) - .ToArray(); - - var parsedProducts = (advisoryDto.Products ?? 
Array.Empty()) - .Where(static product => !string.IsNullOrWhiteSpace(product)) - .Select(ParseProductInfo) - .Where(static product => !string.IsNullOrWhiteSpace(product.Name)) - .ToArray(); - + + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly IcsCisaOptions _options; + private readonly IcsCisaFeedParser _parser; + private readonly IcsCisaDiagnostics _diagnostics; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + private readonly HtmlContentSanitizer _htmlSanitizer = new(); + private readonly HtmlParser _htmlParser = new(); + + public IcsCisaConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions options, + IcsCisaFeedParser parser, + IcsCisaDiagnostics diagnostics, + TimeProvider? timeProvider, + ILogger logger) + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _parser = parser ?? throw new ArgumentNullException(nameof(parser)); + _diagnostics = diagnostics ?? 
throw new ArgumentNullException(nameof(diagnostics)); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => IcsCisaConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var now = _timeProvider.GetUtcNow(); + var touched = false; + + foreach (var topic in _options.TopicIds) + { + cancellationToken.ThrowIfCancellationRequested(); + + _diagnostics.FetchAttempt(topic); + var topicUri = _options.BuildTopicUri(topic); + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, topicUri.ToString(), cancellationToken).ConfigureAwait(false); + + var request = new SourceFetchRequest(IcsCisaOptions.HttpClientName, SourceName, topicUri) + { + AcceptHeaders = RssAcceptHeaders, + Metadata = new Dictionary(StringComparer.Ordinal) + { + ["icscisa.topicId"] = topic, + }, + }; + + if (existing is not null) + { + request = request with + { + ETag = existing.Etag, + LastModified = existing.LastModified, + }; + } + + SourceFetchResult? 
result = null; + var documentsAdded = 0; + var usedFallback = false; + + try + { + result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + } + catch (HttpRequestException ex) when (ShouldRetryWithFallback(ex)) + { + _logger.LogWarning(ex, "Retrying CISA ICS topic {TopicId} via Akamai fallback", topic); + _diagnostics.FetchFallback(topic); + usedFallback = true; + var fallbackRequest = request with + { + AcceptHeaders = RssFallbackAcceptHeaders, + Metadata = AppendMetadata(request.Metadata, "icscisa.retry", "akamai"), + }; + + try + { + result = await _fetchService.FetchAsync(fallbackRequest, cancellationToken).ConfigureAwait(false); + } + catch (Exception fallbackEx) when (fallbackEx is HttpRequestException or TaskCanceledException) + { + _diagnostics.FetchFailure(topic); + _logger.LogError(fallbackEx, "Fallback fetch failed for CISA ICS topic {TopicId}", topic); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, fallbackEx.Message, cancellationToken).ConfigureAwait(false); + throw; + } + } + catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException) + { + _diagnostics.FetchFailure(topic); + _logger.LogError(ex, "Failed to fetch CISA ICS topic {TopicId}", topic); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (result is null) + { + _diagnostics.FetchFailure(topic); + continue; + } + + if (result.IsNotModified) + { + _diagnostics.FetchNotModified(topic); + _logger.LogDebug("CISA ICS topic {TopicId} not modified", topic); + } + else if (result.IsSuccess && result.Document is not null) + { + pendingDocuments.Add(result.Document.Id); + pendingMappings.Remove(result.Document.Id); + touched = true; + documentsAdded++; + _diagnostics.FetchSuccess(topic, 1); + _logger.LogInformation("Fetched CISA ICS topic {TopicId} document {DocumentId}", topic, result.Document.Id); + } + 
else if (result.IsSuccess) + { + _diagnostics.FetchSuccess(topic, 0); + _logger.LogDebug("CISA ICS topic {TopicId} fetch succeeded without new document (fallback={Fallback})", topic, usedFallback); + } + else + { + _diagnostics.FetchFailure(topic); + _logger.LogWarning("CISA ICS topic {TopicId} returned status {StatusCode}", topic, result.StatusCode); + } + + if (documentsAdded > 0) + { + _logger.LogInformation("CISA ICS topic {TopicId} added {DocumentsAdded} document(s) (fallbackUsed={Fallback})", topic, documentsAdded, usedFallback); + } + + if (_options.RequestDelay > TimeSpan.Zero) + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + } + + if (!touched) + { + await UpdateCursorAsync(cursor.WithPendingDocuments(pendingDocuments).WithPendingMappings(pendingMappings), cancellationToken).ConfigureAwait(false); + return; + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + DateTimeOffset? 
latestPublished = cursor.LastPublished; + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var topicId = "unknown"; + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + _diagnostics.ParseFailure(topicId); + continue; + } + + if (document.Metadata is not null && document.Metadata.TryGetValue("icscisa.topicId", out var topicValue)) + { + topicId = topicValue; + } + + if (!document.PayloadId.HasValue) + { + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + _diagnostics.ParseFailure(topicId); + continue; + } + + byte[] bytes; + try + { + bytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to download CISA ICS payload {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + _diagnostics.ParseFailure(topicId); + continue; + } + + IReadOnlyCollection advisories; + try + { + using var stream = new MemoryStream(bytes, writable: false); + var topicUri = Uri.TryCreate(document.Uri, UriKind.Absolute, out var parsed) ? 
parsed : null; + advisories = _parser.Parse(stream, string.Equals(topicId, "USDHSCISA_19", StringComparison.OrdinalIgnoreCase), parsed); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to parse CISA ICS feed {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + _diagnostics.ParseFailure(topicId); + continue; + } + + var advisoryList = advisories.ToList(); + var detailAttempts = 0; + if (_options.EnableDetailScrape) + { + var enriched = new List(advisoryList.Count); + foreach (var advisory in advisoryList) + { + if (NeedsDetailFetch(advisory)) + { + detailAttempts++; + } + var enrichedAdvisory = await EnrichAdvisoryAsync(advisory, cancellationToken).ConfigureAwait(false); + enriched.Add(enrichedAdvisory); + } + + advisoryList = enriched; + } + + var attachmentTotal = advisoryList.Sum(static advisory => advisory.Attachments is null ? 0 : advisory.Attachments.Count); + + var feedDto = new IcsCisaFeedDto + { + TopicId = topicId, + FeedUri = document.Uri, + Advisories = advisoryList, + }; + + try + { + var json = JsonSerializer.Serialize(feedDto, new JsonSerializerOptions(JsonSerializerDefaults.Web) + { + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, + WriteIndented = false, + }); + var bson = BsonDocument.Parse(json); + var dtoRecord = new DtoRecord( + Guid.NewGuid(), + document.Id, + SourceName, + SchemaVersion, + bson, + _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remainingDocuments.Remove(documentId); + pendingMappings.Add(document.Id); + + var docPublished = advisoryList.Count > 0 ? 
advisoryList.Max(a => a.Published) : (DateTimeOffset?)null; + if (docPublished.HasValue && docPublished > latestPublished) + { + latestPublished = docPublished; + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to persist CISA ICS DTO {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + _diagnostics.ParseFailure(topicId); + continue; + } + + _diagnostics.ParseSuccess(topicId, advisoryList.Count, attachmentTotal, detailAttempts); + _logger.LogInformation( + "CISA ICS parse produced advisories={Advisories} attachments={Attachments} detailAttempts={DetailAttempts} topic={TopicId}", + advisoryList.Count, + attachmentTotal, + detailAttempts, + topicId); + } + + var updatedCursor = cursor + .WithPendingDocuments(remainingDocuments) + .WithPendingMappings(pendingMappings) + .WithLastPublished(latestPublished); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToHashSet(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + if (dtoRecord is null) + { + pendingMappings.Remove(documentId); + _diagnostics.MapFailure("unknown"); + continue; + } + + IcsCisaFeedDto? 
feedDto; + try + { + var json = dtoRecord.Payload.ToJson(new JsonWriterSettings { OutputMode = JsonOutputMode.RelaxedExtendedJson }); + feedDto = JsonSerializer.Deserialize(json, new JsonSerializerOptions(JsonSerializerDefaults.Web)); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deserialize CISA ICS DTO {DtoId}", dtoRecord.Id); + pendingMappings.Remove(documentId); + await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + _diagnostics.MapFailure("unknown"); + continue; + } + + if (feedDto is null) + { + pendingMappings.Remove(documentId); + await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + _diagnostics.MapFailure("unknown"); + continue; + } + + var allMapped = true; + var mappedCount = 0; + foreach (var advisoryDto in feedDto.Advisories) + { + try + { + var advisory = MapAdvisory(dtoRecord, feedDto, advisoryDto); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + _diagnostics.MapSuccess( + advisoryDto.AdvisoryId, + advisory.References.Length, + advisory.AffectedPackages.Length, + advisory.Aliases.Length); + mappedCount++; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to map CISA ICS advisory {AdvisoryId}", advisoryDto.AdvisoryId); + _diagnostics.MapFailure(advisoryDto.AdvisoryId); + allMapped = false; + } + } + + pendingMappings.Remove(documentId); + + if (!allMapped) + { + _logger.LogWarning( + "CISA ICS mapping failed for document {DocumentId} (mapped={MappedCount} of {Total})", + documentId, + mappedCount, + feedDto.Advisories.Count); + await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + continue; + } + + await _documentStore.UpdateStatusAsync(documentId, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + _logger.LogInformation("CISA ICS mapped {MappedCount} advisories from 
document {DocumentId}", mappedCount, documentId); + } + + await UpdateCursorAsync(cursor.WithPendingMappings(pendingMappings), cancellationToken).ConfigureAwait(false); + } + + private Advisory MapAdvisory(DtoRecord dtoRecord, IcsCisaFeedDto feedDto, IcsCisaAdvisoryDto advisoryDto) + { + var recordedAt = dtoRecord.ValidatedAt; + var fetchProvenance = new AdvisoryProvenance(SourceName, "feed", feedDto.FeedUri, recordedAt); + var mappingProvenance = new AdvisoryProvenance(SourceName, "mapping", advisoryDto.AdvisoryId, _timeProvider.GetUtcNow()); + + var aliases = CombineAliases(advisoryDto); + var references = BuildReferences(advisoryDto, recordedAt).ToList(); + var mitigationReferences = BuildMitigationReferences(advisoryDto, recordedAt); + if (mitigationReferences.Count > 0) + { + references.AddRange(mitigationReferences); + } + + var affectedPackages = BuildAffectedPackages(advisoryDto, recordedAt); + + return new Advisory( + advisoryDto.AdvisoryId, + advisoryDto.Title, + advisoryDto.Summary, + language: "en", + published: advisoryDto.Published, + modified: advisoryDto.Updated ?? advisoryDto.Published, + severity: null, + exploitKnown: false, + aliases: aliases, + references: references, + affectedPackages: affectedPackages, + cvssMetrics: Array.Empty(), + provenance: new[] { fetchProvenance, mappingProvenance }); + } + + internal static IReadOnlyCollection CombineAliases(IcsCisaAdvisoryDto advisoryDto) + { + var set = new HashSet(StringComparer.OrdinalIgnoreCase); + + if (advisoryDto.Aliases is not null) + { + foreach (var alias in advisoryDto.Aliases) + { + if (string.IsNullOrWhiteSpace(alias)) + { + continue; + } + + set.Add(alias.Trim()); + } + } + + if (advisoryDto.CveIds is not null) + { + foreach (var cve in advisoryDto.CveIds) + { + if (string.IsNullOrWhiteSpace(cve)) + { + continue; + } + + set.Add(cve.Trim()); + } + } + + return set.Count == 0 + ? 
Array.Empty() + : set.OrderBy(static value => value, StringComparer.Ordinal).ToArray(); + } + + internal static IReadOnlyCollection BuildMitigationReferences(IcsCisaAdvisoryDto advisoryDto, DateTimeOffset recordedAt) + { + if (advisoryDto.Mitigations is null || advisoryDto.Mitigations.Count == 0) + { + return Array.Empty(); + } + + var references = new List(); + var baseUrl = Validation.LooksLikeHttpUrl(advisoryDto.Link) ? advisoryDto.Link : null; + var sourceTag = advisoryDto.IsMedical ? "icscisa-medical-mitigation" : "icscisa-mitigation"; + + var index = 0; + foreach (var mitigation in advisoryDto.Mitigations) + { + index++; + if (string.IsNullOrWhiteSpace(mitigation)) + { + continue; + } + + var summary = mitigation.Trim(); + var url = baseUrl is not null + ? $"{baseUrl}#mitigation-{index}" + : $"icscisa:mitigation:{advisoryDto.AdvisoryId}:{index}"; + + references.Add(new AdvisoryReference( + url, + kind: "mitigation", + sourceTag: sourceTag, + summary: summary, + provenance: new AdvisoryProvenance("ics-cisa", "mitigation", url, recordedAt))); + } + + return references.Count == 0 ? Array.Empty() : references; + } + + internal static IReadOnlyCollection BuildReferences(IcsCisaAdvisoryDto advisoryDto, DateTimeOffset recordedAt) + { + var references = new List(); + var seen = new HashSet(StringComparer.OrdinalIgnoreCase); + + if (advisoryDto.Attachments is { Count: > 0 }) + { + foreach (var attachment in advisoryDto.Attachments) + { + if (attachment is null || !Validation.LooksLikeHttpUrl(attachment.Url)) + { + continue; + } + + var url = attachment.Url; + if (!seen.Add(url)) + { + continue; + } + + try + { + references.Add(new AdvisoryReference( + url, + kind: "attachment", + sourceTag: advisoryDto.IsMedical ? 
"icscisa-medical-attachment" : "icscisa-attachment", + summary: attachment.Title, + provenance: new AdvisoryProvenance("ics-cisa", "attachment", url, recordedAt))); + } + catch (ArgumentException) + { + // ignore invalid URIs + } + } + } + + foreach (var reference in advisoryDto.References ?? Array.Empty()) + { + if (!Validation.LooksLikeHttpUrl(reference)) + { + continue; + } + + if (!seen.Add(reference)) + { + continue; + } + + try + { + references.Add(new AdvisoryReference( + reference, + kind: "advisory", + sourceTag: advisoryDto.IsMedical ? "icscisa-medical" : "icscisa", + summary: null, + provenance: new AdvisoryProvenance("ics-cisa", "reference", reference, recordedAt))); + } + catch (ArgumentException) + { + // ignore invalid URIs + } + } + + if (references.Count == 0 && Validation.LooksLikeHttpUrl(advisoryDto.Link) && seen.Add(advisoryDto.Link)) + { + references.Add(new AdvisoryReference( + advisoryDto.Link, + kind: "advisory", + sourceTag: advisoryDto.IsMedical ? "icscisa-medical" : "icscisa", + summary: null, + provenance: new AdvisoryProvenance("ics-cisa", "reference", advisoryDto.Link, recordedAt))); + } + + return references; + } + + internal static IReadOnlyCollection BuildAffectedPackages(IcsCisaAdvisoryDto advisoryDto, DateTimeOffset recordedAt) + { + var packages = new List(); + var vendors = advisoryDto.Vendors ?? Array.Empty(); + var normalizedVendors = vendors + .Where(static vendor => !string.IsNullOrWhiteSpace(vendor)) + .Select(static vendor => vendor.Trim()) + .Distinct(StringComparer.OrdinalIgnoreCase) + .ToArray(); + + var parsedProducts = (advisoryDto.Products ?? 
Array.Empty()) + .Where(static product => !string.IsNullOrWhiteSpace(product)) + .Select(ParseProductInfo) + .Where(static product => !string.IsNullOrWhiteSpace(product.Name)) + .ToArray(); + if (parsedProducts.Length > 0) { for (var index = 0; index < parsedProducts.Length; index++) @@ -695,29 +695,29 @@ public sealed class IcsCisaConnector : IFeedConnector return packages; } - - if (normalizedVendors.Length == 0) - { - return packages; - } - - foreach (var vendor in normalizedVendors) - { - var provenance = new AdvisoryProvenance("ics-cisa", "affected", vendor, recordedAt); - var vendorExtensions = new Dictionary(StringComparer.OrdinalIgnoreCase) - { - ["ics.vendor"] = vendor - }; - - var range = new AffectedVersionRange( - rangeKind: "vendor", - introducedVersion: null, - fixedVersion: null, - lastAffectedVersion: null, - rangeExpression: null, - provenance: provenance, - primitives: new RangePrimitives(null, null, null, vendorExtensions)); - + + if (normalizedVendors.Length == 0) + { + return packages; + } + + foreach (var vendor in normalizedVendors) + { + var provenance = new AdvisoryProvenance("ics-cisa", "affected", vendor, recordedAt); + var vendorExtensions = new Dictionary(StringComparer.OrdinalIgnoreCase) + { + ["ics.vendor"] = vendor + }; + + var range = new AffectedVersionRange( + rangeKind: "vendor", + introducedVersion: null, + fixedVersion: null, + lastAffectedVersion: null, + rangeExpression: null, + provenance: provenance, + primitives: new RangePrimitives(null, null, null, vendorExtensions)); + packages.Add(new AffectedPackage( AffectedPackageTypes.IcsVendor, vendor, @@ -729,8 +729,8 @@ public sealed class IcsCisaConnector : IFeedConnector return packages; } - - + + private static ProductInfo ParseProductInfo(string raw) { var trimmed = raw?.Trim(); @@ -760,89 +760,89 @@ public sealed class IcsCisaConnector : IFeedConnector var parts = trimmed.Split(':', 2); var name = parts[0].Trim(); var versionSegment = parts[1].Trim(); - return new 
ProductInfo( - string.IsNullOrWhiteSpace(name) ? trimmed : name, - string.IsNullOrWhiteSpace(versionSegment) ? null : versionSegment); - } - - var lastSpace = trimmed.LastIndexOf(' '); - if (lastSpace > 0) - { - var candidateVersion = trimmed[(lastSpace + 1)..].Trim(); - if (Regex.IsMatch(candidateVersion, "^[vV]?[0-9].*")) - { - var name = trimmed[..lastSpace].Trim(); - return new ProductInfo( - string.IsNullOrWhiteSpace(name) ? trimmed : name, - candidateVersion); - } - } - - return new ProductInfo(trimmed, null); - } - - private static SemVerPrimitive? TryCreateSemVerPrimitive(string? versionExpression) - { - if (string.IsNullOrWhiteSpace(versionExpression)) - { - return null; - } - - var normalized = NormalizeSemVer(versionExpression); - if (normalized is null) - { - var trimmed = versionExpression.Trim(); - if (trimmed.StartsWith("v", StringComparison.OrdinalIgnoreCase)) - { - trimmed = trimmed[1..]; - } - - if (Version.TryParse(trimmed, out var parsed)) - { - normalized = string.Join('.', new[] - { - parsed.Major.ToString(CultureInfo.InvariantCulture), - parsed.Minor >= 0 ? parsed.Minor.ToString(CultureInfo.InvariantCulture) : "0", - parsed.Build >= 0 ? parsed.Build.ToString(CultureInfo.InvariantCulture) : "0", - }); - } - } - - if (normalized is null) - { - return null; - } - - return new SemVerPrimitive( - null, - true, - null, - true, - null, - true, - null, - normalized); - } - + return new ProductInfo( + string.IsNullOrWhiteSpace(name) ? trimmed : name, + string.IsNullOrWhiteSpace(versionSegment) ? null : versionSegment); + } + + var lastSpace = trimmed.LastIndexOf(' '); + if (lastSpace > 0) + { + var candidateVersion = trimmed[(lastSpace + 1)..].Trim(); + if (Regex.IsMatch(candidateVersion, "^[vV]?[0-9].*")) + { + var name = trimmed[..lastSpace].Trim(); + return new ProductInfo( + string.IsNullOrWhiteSpace(name) ? trimmed : name, + candidateVersion); + } + } + + return new ProductInfo(trimmed, null); + } + + private static SemVerPrimitive? 
TryCreateSemVerPrimitive(string? versionExpression) + { + if (string.IsNullOrWhiteSpace(versionExpression)) + { + return null; + } + + var normalized = NormalizeSemVer(versionExpression); + if (normalized is null) + { + var trimmed = versionExpression.Trim(); + if (trimmed.StartsWith("v", StringComparison.OrdinalIgnoreCase)) + { + trimmed = trimmed[1..]; + } + + if (Version.TryParse(trimmed, out var parsed)) + { + normalized = string.Join('.', new[] + { + parsed.Major.ToString(CultureInfo.InvariantCulture), + parsed.Minor >= 0 ? parsed.Minor.ToString(CultureInfo.InvariantCulture) : "0", + parsed.Build >= 0 ? parsed.Build.ToString(CultureInfo.InvariantCulture) : "0", + }); + } + } + + if (normalized is null) + { + return null; + } + + return new SemVerPrimitive( + null, + true, + null, + true, + null, + true, + null, + normalized); + } + private static string? NormalizeSemVer(string rawVersion) { var trimmed = rawVersion.Trim(); if (trimmed.StartsWith("v", StringComparison.OrdinalIgnoreCase)) { - trimmed = trimmed[1..]; - } - - if (!Regex.IsMatch(trimmed, @"^[0-9]+(\.[0-9]+){0,2}$")) - { - return null; - } - - var parts = trimmed.Split('.', StringSplitOptions.RemoveEmptyEntries); - var components = parts.Take(3).ToList(); - while (components.Count < 3) - { - components.Add("0"); - } + trimmed = trimmed[1..]; + } + + if (!Regex.IsMatch(trimmed, @"^[0-9]+(\.[0-9]+){0,2}$")) + { + return null; + } + + var parts = trimmed.Split('.', StringSplitOptions.RemoveEmptyEntries); + var components = parts.Take(3).ToList(); + while (components.Count < 3) + { + components.Add("0"); + } return string.Join('.', components); } @@ -1006,414 +1006,414 @@ public sealed class IcsCisaConnector : IFeedConnector } private sealed record ProductInfo(string? Name, string? 
VersionExpression); - - private async Task EnrichAdvisoryAsync(IcsCisaAdvisoryDto advisory, CancellationToken cancellationToken) - { - if (!NeedsDetailFetch(advisory)) - { - return advisory; - } - - if (!Uri.TryCreate(advisory.Link, UriKind.Absolute, out var detailUri)) - { - return advisory; - } - - var request = new SourceFetchRequest(IcsCisaOptions.HttpClientName, SourceName, detailUri) - { - AcceptHeaders = DetailAcceptHeaders, - Metadata = AppendMetadata(null, "icscisa.detail", advisory.AdvisoryId), - TimeoutOverride = _options.DetailRequestTimeout, - }; - - try - { - var result = await _fetchService.FetchContentAsync(request, cancellationToken).ConfigureAwait(false); - if (!result.IsSuccess || result.Content is null) - { - _diagnostics.DetailFetchFailure(advisory.AdvisoryId); - return advisory; - } - - var html = Encoding.UTF8.GetString(result.Content); - var sanitized = _htmlSanitizer.Sanitize(html, detailUri); - if (string.IsNullOrWhiteSpace(sanitized)) - { - _diagnostics.DetailFetchSuccess(advisory.AdvisoryId); - return advisory with { DetailHtml = sanitized }; - } - - var detailAttachments = _options.CaptureAttachments - ? ParseAttachmentsFromHtml(sanitized, detailUri) - : Array.Empty(); - var mergedAttachments = _options.CaptureAttachments - ? MergeAttachments(advisory.Attachments, detailAttachments) - : advisory.Attachments; - - var detailMitigations = ParseMitigationsFromHtml(sanitized); - var mergedMitigations = MergeMitigations(advisory.Mitigations, detailMitigations); - - var detailReferences = ParseReferencesFromHtml(sanitized, detailUri); - var mergedReferences = MergeReferences(advisory.References, detailReferences); - - var summary = string.IsNullOrWhiteSpace(advisory.Summary) - ? ExtractFirstSentence(sanitized) - : advisory.Summary; - - var descriptionHtml = string.IsNullOrWhiteSpace(advisory.DescriptionHtml) - ? 
sanitized - : advisory.DescriptionHtml; - - return advisory with - { - DetailHtml = sanitized, - DescriptionHtml = descriptionHtml, - Summary = summary, - References = mergedReferences, - Attachments = mergedAttachments, - Mitigations = mergedMitigations, - }; - } - catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException) - { - _logger.LogWarning(ex, "Failed to fetch detail page for {AdvisoryId}", advisory.AdvisoryId); - _diagnostics.DetailFetchFailure(advisory.AdvisoryId); - return advisory; - } - } - - private bool NeedsDetailFetch(IcsCisaAdvisoryDto advisory) - { - if (!_options.EnableDetailScrape) - { - return false; - } - - if (string.IsNullOrWhiteSpace(advisory.DescriptionHtml)) - { - return true; - } - - if (string.IsNullOrWhiteSpace(advisory.Summary)) - { - return true; - } - - if (advisory.Mitigations is null || advisory.Mitigations.Count == 0) - { - return true; - } - - if (_options.CaptureAttachments && (advisory.Attachments is null || advisory.Attachments.Count == 0)) - { - return true; - } - - return false; - } - - private IReadOnlyCollection ParseMitigationsFromHtml(string sanitizedHtml) - { - if (string.IsNullOrWhiteSpace(sanitizedHtml)) - { - return Array.Empty(); - } - - try - { - var document = _htmlParser.ParseDocument(sanitizedHtml); - var mitigations = new List(); - - foreach (var heading in document.QuerySelectorAll("h1, h2, h3, h4, h5, h6")) - { - var headingText = heading.TextContent?.Trim(); - if (!IsMitigationHeading(headingText)) - { - continue; - } - - var node = heading.NextElementSibling; - while (node is not null && node is not IHtmlHeadingElement) - { - if (node is IHtmlParagraphElement or IHtmlDivElement) - { - var content = Validation.CollapseWhitespace(node.TextContent); - if (!string.IsNullOrWhiteSpace(content)) - { - mitigations.Add(content); - } - } - else if (node is IHtmlElement element && (string.Equals(element.TagName, "UL", StringComparison.OrdinalIgnoreCase) || string.Equals(element.TagName, "OL", 
StringComparison.OrdinalIgnoreCase))) - { - foreach (var item in element.Children) - { - var content = Validation.CollapseWhitespace(item.TextContent); - if (!string.IsNullOrWhiteSpace(content)) - { - mitigations.Add(content); - } - } - } - - node = node.NextElementSibling; - } - } - - return mitigations.Count == 0 ? Array.Empty() : mitigations; - } - catch - { - return Array.Empty(); - } - } - - private static bool IsMitigationHeading(string? headingText) - { - if (string.IsNullOrWhiteSpace(headingText)) - { - return false; - } - - return headingText.Contains("mitigation", StringComparison.OrdinalIgnoreCase); - } - - private IReadOnlyCollection ParseAttachmentsFromHtml(string sanitizedHtml, Uri baseUri) - { - if (string.IsNullOrWhiteSpace(sanitizedHtml)) - { - return Array.Empty(); - } - - try - { - var document = _htmlParser.ParseDocument(sanitizedHtml); - var attachments = new Dictionary(StringComparer.OrdinalIgnoreCase); - - foreach (var anchor in document.QuerySelectorAll("a")) - { - var href = anchor.GetAttribute("href"); - if (string.IsNullOrWhiteSpace(href)) - { - continue; - } - - if (!Uri.TryCreate(baseUri, href, out var resolved)) - { - continue; - } - - var url = resolved.ToString(); - if (!url.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) && - !url.Contains("/pdf", StringComparison.OrdinalIgnoreCase)) - { - continue; - } - - attachments[url] = new IcsCisaAttachmentDto - { - Title = anchor.TextContent?.Trim(), - Url = url, - }; - } - - return attachments.Count == 0 - ? 
Array.Empty() - : attachments.Values.ToArray(); - } - catch - { - return Array.Empty(); - } - } - - private IReadOnlyCollection ParseReferencesFromHtml(string sanitizedHtml, Uri baseUri) - { - if (string.IsNullOrWhiteSpace(sanitizedHtml)) - { - return Array.Empty(); - } - - try - { - var document = _htmlParser.ParseDocument(sanitizedHtml); - var links = new HashSet(StringComparer.OrdinalIgnoreCase); - - foreach (var anchor in document.QuerySelectorAll("a")) - { - var href = anchor.GetAttribute("href"); - if (string.IsNullOrWhiteSpace(href)) - { - continue; - } - - if (Uri.TryCreate(baseUri, href, out var resolved) && Validation.LooksLikeHttpUrl(resolved.ToString())) - { - links.Add(resolved.ToString()); - } - } - - return links.Count == 0 ? Array.Empty() : links.ToArray(); - } - catch - { - return Array.Empty(); - } - } - - internal static IReadOnlyCollection MergeMitigations(IReadOnlyCollection? existing, IReadOnlyCollection incoming) - { - if ((existing is null || existing.Count == 0) && (incoming is null || incoming.Count == 0)) - { - return Array.Empty(); - } - - var set = new HashSet(StringComparer.OrdinalIgnoreCase); - var merged = new List(); - - if (existing is not null) - { - foreach (var mitigation in existing) - { - var value = mitigation?.Trim(); - if (string.IsNullOrWhiteSpace(value)) - { - continue; - } - - if (set.Add(value)) - { - merged.Add(value); - } - } - } - - if (incoming is not null) - { - foreach (var mitigation in incoming) - { - var value = mitigation?.Trim(); - if (string.IsNullOrWhiteSpace(value)) - { - continue; - } - - if (set.Add(value)) - { - merged.Add(value); - } - } - } - - return merged.Count == 0 ? Array.Empty() : merged; - } - - internal static IReadOnlyCollection MergeAttachments(IReadOnlyCollection? 
existing, IReadOnlyCollection incoming) - { - if ((existing is null || existing.Count == 0) && (incoming is null || incoming.Count == 0)) - { - return Array.Empty(); - } - - var map = new Dictionary(StringComparer.OrdinalIgnoreCase); - - if (existing is not null) - { - foreach (var attachment in existing) - { - if (attachment is null || string.IsNullOrWhiteSpace(attachment.Url)) - { - continue; - } - - map[attachment.Url] = attachment; - } - } - - if (incoming is not null) - { - foreach (var attachment in incoming) - { - if (attachment is null || string.IsNullOrWhiteSpace(attachment.Url)) - { - continue; - } - - if (!map.ContainsKey(attachment.Url) || string.IsNullOrWhiteSpace(map[attachment.Url].Title)) - { - map[attachment.Url] = attachment; - } - } - } - - return map.Count == 0 ? Array.Empty() : map.Values.ToArray(); - } - - internal static IReadOnlyCollection MergeReferences(IReadOnlyCollection? existing, IReadOnlyCollection incoming) - { - var links = new HashSet(existing ?? Array.Empty(), StringComparer.OrdinalIgnoreCase); - foreach (var link in incoming) - { - if (Validation.LooksLikeHttpUrl(link)) - { - links.Add(link); - } - } - - return links.Count == 0 ? Array.Empty() : links.ToArray(); - } - - internal static string? ExtractFirstSentence(string sanitizedHtml) - { - if (string.IsNullOrWhiteSpace(sanitizedHtml)) - { - return null; - } - - var text = Validation.CollapseWhitespace(sanitizedHtml); - if (text.Length <= 280) - { - return text; - } - - var terminator = text.IndexOf('.', StringComparison.Ordinal); - if (terminator <= 0 || terminator > 280) - { - return text[..Math.Min(280, text.Length)].Trim(); - } - - return text[..(terminator + 1)].Trim(); - } - - internal static IReadOnlyDictionary AppendMetadata(IReadOnlyDictionary? 
metadata, string key, string value) - { - var dictionary = new Dictionary(StringComparer.Ordinal); - - if (metadata is not null) - { - foreach (var pair in metadata) - { - dictionary[pair.Key] = pair.Value; - } - } - - dictionary[key] = value; - return dictionary; - } - - internal static bool ShouldRetryWithFallback(HttpRequestException exception) - { - var message = exception.Message ?? string.Empty; - return message.Contains(" 403", StringComparison.OrdinalIgnoreCase) - || message.Contains("403", StringComparison.OrdinalIgnoreCase) - || message.Contains("406", StringComparison.OrdinalIgnoreCase); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? IcsCisaCursor.Empty : IcsCisaCursor.FromBson(state.Cursor); - } - - private Task UpdateCursorAsync(IcsCisaCursor cursor, CancellationToken cancellationToken) - => _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken); -} + + private async Task EnrichAdvisoryAsync(IcsCisaAdvisoryDto advisory, CancellationToken cancellationToken) + { + if (!NeedsDetailFetch(advisory)) + { + return advisory; + } + + if (!Uri.TryCreate(advisory.Link, UriKind.Absolute, out var detailUri)) + { + return advisory; + } + + var request = new SourceFetchRequest(IcsCisaOptions.HttpClientName, SourceName, detailUri) + { + AcceptHeaders = DetailAcceptHeaders, + Metadata = AppendMetadata(null, "icscisa.detail", advisory.AdvisoryId), + TimeoutOverride = _options.DetailRequestTimeout, + }; + + try + { + var result = await _fetchService.FetchContentAsync(request, cancellationToken).ConfigureAwait(false); + if (!result.IsSuccess || result.Content is null) + { + _diagnostics.DetailFetchFailure(advisory.AdvisoryId); + return advisory; + } + + var html = Encoding.UTF8.GetString(result.Content); + var sanitized = 
_htmlSanitizer.Sanitize(html, detailUri); + if (string.IsNullOrWhiteSpace(sanitized)) + { + _diagnostics.DetailFetchSuccess(advisory.AdvisoryId); + return advisory with { DetailHtml = sanitized }; + } + + var detailAttachments = _options.CaptureAttachments + ? ParseAttachmentsFromHtml(sanitized, detailUri) + : Array.Empty(); + var mergedAttachments = _options.CaptureAttachments + ? MergeAttachments(advisory.Attachments, detailAttachments) + : advisory.Attachments; + + var detailMitigations = ParseMitigationsFromHtml(sanitized); + var mergedMitigations = MergeMitigations(advisory.Mitigations, detailMitigations); + + var detailReferences = ParseReferencesFromHtml(sanitized, detailUri); + var mergedReferences = MergeReferences(advisory.References, detailReferences); + + var summary = string.IsNullOrWhiteSpace(advisory.Summary) + ? ExtractFirstSentence(sanitized) + : advisory.Summary; + + var descriptionHtml = string.IsNullOrWhiteSpace(advisory.DescriptionHtml) + ? sanitized + : advisory.DescriptionHtml; + + return advisory with + { + DetailHtml = sanitized, + DescriptionHtml = descriptionHtml, + Summary = summary, + References = mergedReferences, + Attachments = mergedAttachments, + Mitigations = mergedMitigations, + }; + } + catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException) + { + _logger.LogWarning(ex, "Failed to fetch detail page for {AdvisoryId}", advisory.AdvisoryId); + _diagnostics.DetailFetchFailure(advisory.AdvisoryId); + return advisory; + } + } + + private bool NeedsDetailFetch(IcsCisaAdvisoryDto advisory) + { + if (!_options.EnableDetailScrape) + { + return false; + } + + if (string.IsNullOrWhiteSpace(advisory.DescriptionHtml)) + { + return true; + } + + if (string.IsNullOrWhiteSpace(advisory.Summary)) + { + return true; + } + + if (advisory.Mitigations is null || advisory.Mitigations.Count == 0) + { + return true; + } + + if (_options.CaptureAttachments && (advisory.Attachments is null || advisory.Attachments.Count == 0)) + { + 
return true; + } + + return false; + } + + private IReadOnlyCollection ParseMitigationsFromHtml(string sanitizedHtml) + { + if (string.IsNullOrWhiteSpace(sanitizedHtml)) + { + return Array.Empty(); + } + + try + { + var document = _htmlParser.ParseDocument(sanitizedHtml); + var mitigations = new List(); + + foreach (var heading in document.QuerySelectorAll("h1, h2, h3, h4, h5, h6")) + { + var headingText = heading.TextContent?.Trim(); + if (!IsMitigationHeading(headingText)) + { + continue; + } + + var node = heading.NextElementSibling; + while (node is not null && node is not IHtmlHeadingElement) + { + if (node is IHtmlParagraphElement or IHtmlDivElement) + { + var content = Validation.CollapseWhitespace(node.TextContent); + if (!string.IsNullOrWhiteSpace(content)) + { + mitigations.Add(content); + } + } + else if (node is IHtmlElement element && (string.Equals(element.TagName, "UL", StringComparison.OrdinalIgnoreCase) || string.Equals(element.TagName, "OL", StringComparison.OrdinalIgnoreCase))) + { + foreach (var item in element.Children) + { + var content = Validation.CollapseWhitespace(item.TextContent); + if (!string.IsNullOrWhiteSpace(content)) + { + mitigations.Add(content); + } + } + } + + node = node.NextElementSibling; + } + } + + return mitigations.Count == 0 ? Array.Empty() : mitigations; + } + catch + { + return Array.Empty(); + } + } + + private static bool IsMitigationHeading(string? 
headingText) + { + if (string.IsNullOrWhiteSpace(headingText)) + { + return false; + } + + return headingText.Contains("mitigation", StringComparison.OrdinalIgnoreCase); + } + + private IReadOnlyCollection ParseAttachmentsFromHtml(string sanitizedHtml, Uri baseUri) + { + if (string.IsNullOrWhiteSpace(sanitizedHtml)) + { + return Array.Empty(); + } + + try + { + var document = _htmlParser.ParseDocument(sanitizedHtml); + var attachments = new Dictionary(StringComparer.OrdinalIgnoreCase); + + foreach (var anchor in document.QuerySelectorAll("a")) + { + var href = anchor.GetAttribute("href"); + if (string.IsNullOrWhiteSpace(href)) + { + continue; + } + + if (!Uri.TryCreate(baseUri, href, out var resolved)) + { + continue; + } + + var url = resolved.ToString(); + if (!url.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) && + !url.Contains("/pdf", StringComparison.OrdinalIgnoreCase)) + { + continue; + } + + attachments[url] = new IcsCisaAttachmentDto + { + Title = anchor.TextContent?.Trim(), + Url = url, + }; + } + + return attachments.Count == 0 + ? Array.Empty() + : attachments.Values.ToArray(); + } + catch + { + return Array.Empty(); + } + } + + private IReadOnlyCollection ParseReferencesFromHtml(string sanitizedHtml, Uri baseUri) + { + if (string.IsNullOrWhiteSpace(sanitizedHtml)) + { + return Array.Empty(); + } + + try + { + var document = _htmlParser.ParseDocument(sanitizedHtml); + var links = new HashSet(StringComparer.OrdinalIgnoreCase); + + foreach (var anchor in document.QuerySelectorAll("a")) + { + var href = anchor.GetAttribute("href"); + if (string.IsNullOrWhiteSpace(href)) + { + continue; + } + + if (Uri.TryCreate(baseUri, href, out var resolved) && Validation.LooksLikeHttpUrl(resolved.ToString())) + { + links.Add(resolved.ToString()); + } + } + + return links.Count == 0 ? Array.Empty() : links.ToArray(); + } + catch + { + return Array.Empty(); + } + } + + internal static IReadOnlyCollection MergeMitigations(IReadOnlyCollection? 
existing, IReadOnlyCollection incoming) + { + if ((existing is null || existing.Count == 0) && (incoming is null || incoming.Count == 0)) + { + return Array.Empty(); + } + + var set = new HashSet(StringComparer.OrdinalIgnoreCase); + var merged = new List(); + + if (existing is not null) + { + foreach (var mitigation in existing) + { + var value = mitigation?.Trim(); + if (string.IsNullOrWhiteSpace(value)) + { + continue; + } + + if (set.Add(value)) + { + merged.Add(value); + } + } + } + + if (incoming is not null) + { + foreach (var mitigation in incoming) + { + var value = mitigation?.Trim(); + if (string.IsNullOrWhiteSpace(value)) + { + continue; + } + + if (set.Add(value)) + { + merged.Add(value); + } + } + } + + return merged.Count == 0 ? Array.Empty() : merged; + } + + internal static IReadOnlyCollection MergeAttachments(IReadOnlyCollection? existing, IReadOnlyCollection incoming) + { + if ((existing is null || existing.Count == 0) && (incoming is null || incoming.Count == 0)) + { + return Array.Empty(); + } + + var map = new Dictionary(StringComparer.OrdinalIgnoreCase); + + if (existing is not null) + { + foreach (var attachment in existing) + { + if (attachment is null || string.IsNullOrWhiteSpace(attachment.Url)) + { + continue; + } + + map[attachment.Url] = attachment; + } + } + + if (incoming is not null) + { + foreach (var attachment in incoming) + { + if (attachment is null || string.IsNullOrWhiteSpace(attachment.Url)) + { + continue; + } + + if (!map.ContainsKey(attachment.Url) || string.IsNullOrWhiteSpace(map[attachment.Url].Title)) + { + map[attachment.Url] = attachment; + } + } + } + + return map.Count == 0 ? Array.Empty() : map.Values.ToArray(); + } + + internal static IReadOnlyCollection MergeReferences(IReadOnlyCollection? existing, IReadOnlyCollection incoming) + { + var links = new HashSet(existing ?? 
Array.Empty(), StringComparer.OrdinalIgnoreCase); + foreach (var link in incoming) + { + if (Validation.LooksLikeHttpUrl(link)) + { + links.Add(link); + } + } + + return links.Count == 0 ? Array.Empty() : links.ToArray(); + } + + internal static string? ExtractFirstSentence(string sanitizedHtml) + { + if (string.IsNullOrWhiteSpace(sanitizedHtml)) + { + return null; + } + + var text = Validation.CollapseWhitespace(sanitizedHtml); + if (text.Length <= 280) + { + return text; + } + + var terminator = text.IndexOf('.', StringComparison.Ordinal); + if (terminator <= 0 || terminator > 280) + { + return text[..Math.Min(280, text.Length)].Trim(); + } + + return text[..(terminator + 1)].Trim(); + } + + internal static IReadOnlyDictionary AppendMetadata(IReadOnlyDictionary? metadata, string key, string value) + { + var dictionary = new Dictionary(StringComparer.Ordinal); + + if (metadata is not null) + { + foreach (var pair in metadata) + { + dictionary[pair.Key] = pair.Value; + } + } + + dictionary[key] = value; + return dictionary; + } + + internal static bool ShouldRetryWithFallback(HttpRequestException exception) + { + var message = exception.Message ?? string.Empty; + return message.Contains(" 403", StringComparison.OrdinalIgnoreCase) + || message.Contains("403", StringComparison.OrdinalIgnoreCase) + || message.Contains("406", StringComparison.OrdinalIgnoreCase); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? 
IcsCisaCursor.Empty : IcsCisaCursor.FromBson(state.Cursor); + } + + private Task UpdateCursorAsync(IcsCisaCursor cursor, CancellationToken cancellationToken) + => _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken); +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ics.Kaspersky/KasperskyConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ics.Kaspersky/KasperskyConnector.cs index 8c47b7bce..50388f523 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ics.Kaspersky/KasperskyConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ics.Kaspersky/KasperskyConnector.cs @@ -1,384 +1,384 @@ -using System; -using System.Collections.Generic; -using System.Linq; -using System.Text.Json; -using System.Threading; -using System.Threading.Tasks; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Ics.Kaspersky.Configuration; -using StellaOps.Concelier.Connector.Ics.Kaspersky.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Ics.Kaspersky; - -public sealed class KasperskyConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.General) - { - PropertyNamingPolicy = JsonNamingPolicy.CamelCase, - WriteIndented = false, - DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, - }; - - private readonly KasperskyFeedClient _feedClient; - private readonly SourceFetchService _fetchService; - private readonly 
RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly KasperskyOptions _options; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - - public KasperskyConnector( - KasperskyFeedClient feedClient, - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, - TimeProvider? timeProvider, - ILogger logger) - { - _feedClient = feedClient ?? throw new ArgumentNullException(nameof(feedClient)); - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => KasperskyConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - - var windowStart = cursor.LastPublished.HasValue - ? cursor.LastPublished.Value - _options.WindowOverlap - : now - _options.WindowSize; - - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var maxPublished = cursor.LastPublished ?? DateTimeOffset.MinValue; - var cursorState = cursor; - var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); - - for (var page = 1; page <= _options.MaxPagesPerFetch; page++) - { - IReadOnlyList items; - try - { - items = await _feedClient.GetItemsAsync(page, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to load Kaspersky ICS feed page {Page}", page); - await _stateRepository.MarkFailureAsync( - SourceName, - now, - TimeSpan.FromMinutes(5), - ex.Message, - cancellationToken).ConfigureAwait(false); - throw; - } - if (items.Count == 0) - { - break; - } - - foreach (var item in items) - { - if (item.Published < windowStart) - { - page = _options.MaxPagesPerFetch + 1; - break; - } - - if (_options.RequestDelay > TimeSpan.Zero) - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["kaspersky.title"] = item.Title, - ["kaspersky.link"] = item.Link.ToString(), - ["kaspersky.published"] = item.Published.ToString("O"), - }; - - if (!string.IsNullOrWhiteSpace(item.Summary)) - { - metadata["kaspersky.summary"] = item.Summary!; - } - - var slug = ExtractSlug(item.Link); - if (!string.IsNullOrWhiteSpace(slug)) - { - metadata["kaspersky.slug"] = slug; - } - - var resourceKey = 
item.Link.ToString(); - touchedResources.Add(resourceKey); - - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, resourceKey, cancellationToken).ConfigureAwait(false); - - var fetchRequest = new SourceFetchRequest(KasperskyOptions.HttpClientName, SourceName, item.Link) - { - Metadata = metadata, - }; - - if (cursorState.TryGetFetchMetadata(resourceKey, out var cachedFetch)) - { - fetchRequest = fetchRequest with - { - ETag = cachedFetch.ETag, - LastModified = cachedFetch.LastModified, - }; - } - - SourceFetchResult result; - try - { - result = await _fetchService.FetchAsync(fetchRequest, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to fetch Kaspersky advisory {Link}", item.Link); - await _stateRepository.MarkFailureAsync( - SourceName, - _timeProvider.GetUtcNow(), - TimeSpan.FromMinutes(5), - ex.Message, - cancellationToken).ConfigureAwait(false); - throw; - } - - if (result.IsNotModified) - { - continue; - } - - if (!result.IsSuccess || result.Document is null) - { - continue; - } - - if (existing is not null - && string.Equals(existing.Sha256, result.Document.Sha256, StringComparison.OrdinalIgnoreCase) - && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) - { - await _documentStore.UpdateStatusAsync(result.Document.Id, existing.Status, cancellationToken).ConfigureAwait(false); - cursorState = cursorState.WithFetchMetadata(resourceKey, result.Document.Etag, result.Document.LastModified); - if (item.Published > maxPublished) - { - maxPublished = item.Published; - } - - continue; - } - - pendingDocuments.Add(result.Document.Id); - cursorState = cursorState.WithFetchMetadata(resourceKey, result.Document.Etag, result.Document.LastModified); - if (item.Published > maxPublished) - { - maxPublished = item.Published; - } - } - } - - cursorState = cursorState.PruneFetchCache(touchedResources); - - var updatedCursor = cursorState - 
.WithPendingDocuments(pendingDocuments) - .WithLastPublished(maxPublished == DateTimeOffset.MinValue ? cursor.LastPublished : maxPublished); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - remainingDocuments.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("Kaspersky document {DocumentId} missing GridFS content", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - continue; - } - - var metadata = document.Metadata ?? new Dictionary(); - var title = metadata.TryGetValue("kaspersky.title", out var titleValue) ? titleValue : document.Uri; - var link = metadata.TryGetValue("kaspersky.link", out var linkValue) ? linkValue : document.Uri; - var published = metadata.TryGetValue("kaspersky.published", out var publishedValue) && DateTimeOffset.TryParse(publishedValue, out var parsedPublished) - ? parsedPublished.ToUniversalTime() - : document.FetchedAt; - var summary = metadata.TryGetValue("kaspersky.summary", out var summaryValue) ? summaryValue : null; - var slug = metadata.TryGetValue("kaspersky.slug", out var slugValue) ? 
slugValue : ExtractSlug(new Uri(link, UriKind.Absolute)); - var advisoryKey = string.IsNullOrWhiteSpace(slug) ? Guid.NewGuid().ToString("N") : slug; - - byte[] rawBytes; - try - { - rawBytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed downloading raw Kaspersky document {DocumentId}", document.Id); - throw; - } - - var dto = KasperskyAdvisoryParser.Parse(advisoryKey, title, link, published, summary, rawBytes); - var payload = BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); - var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "ics.kaspersky/1", payload, _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remainingDocuments.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(remainingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dto is null || document 
is null) - { - _logger.LogWarning("Skipping Kaspersky mapping for {DocumentId}: DTO or document missing", documentId); - pendingMappings.Remove(documentId); - continue; - } - - var dtoJson = dto.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings - { - OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, - }); - - KasperskyAdvisoryDto advisoryDto; - try - { - advisoryDto = JsonSerializer.Deserialize(dtoJson, SerializerOptions) - ?? throw new InvalidOperationException("Deserialized DTO was null."); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to deserialize Kaspersky DTO for {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var fetchProvenance = new AdvisoryProvenance(SourceName, "document", document.Uri, document.FetchedAt); - var mappingProvenance = new AdvisoryProvenance(SourceName, "mapping", advisoryDto.AdvisoryKey, dto.ValidatedAt); - - var aliases = new HashSet(StringComparer.OrdinalIgnoreCase) - { - advisoryDto.AdvisoryKey, - }; - foreach (var cve in advisoryDto.CveIds) - { - aliases.Add(cve); - } - - var references = new List(); - try - { - references.Add(new AdvisoryReference( - advisoryDto.Link, - "advisory", - "kaspersky-ics", - null, - new AdvisoryProvenance(SourceName, "reference", advisoryDto.Link, dto.ValidatedAt))); - } - catch (ArgumentException) - { - _logger.LogWarning("Invalid advisory link {Link} for {AdvisoryKey}", advisoryDto.Link, advisoryDto.AdvisoryKey); - } - - foreach (var cve in advisoryDto.CveIds) - { - var url = $"https://www.cve.org/CVERecord?id={cve}"; - try - { - references.Add(new AdvisoryReference( - url, - "advisory", - cve, - null, - new AdvisoryProvenance(SourceName, "reference", url, dto.ValidatedAt))); - } - catch (ArgumentException) - { - // ignore malformed - } - } - +using System; +using System.Collections.Generic; +using System.Linq; 
+using System.Text.Json; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Ics.Kaspersky.Configuration; +using StellaOps.Concelier.Connector.Ics.Kaspersky.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Ics.Kaspersky; + +public sealed class KasperskyConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.General) + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + WriteIndented = false, + DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, + }; + + private readonly KasperskyFeedClient _feedClient; + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly KasperskyOptions _options; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public KasperskyConnector( + KasperskyFeedClient feedClient, + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions options, + TimeProvider? timeProvider, + ILogger logger) + { + _feedClient = feedClient ?? 
throw new ArgumentNullException(nameof(feedClient)); + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => KasperskyConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + + var windowStart = cursor.LastPublished.HasValue + ? cursor.LastPublished.Value - _options.WindowOverlap + : now - _options.WindowSize; + + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var maxPublished = cursor.LastPublished ?? 
DateTimeOffset.MinValue; + var cursorState = cursor; + var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); + + for (var page = 1; page <= _options.MaxPagesPerFetch; page++) + { + IReadOnlyList items; + try + { + items = await _feedClient.GetItemsAsync(page, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to load Kaspersky ICS feed page {Page}", page); + await _stateRepository.MarkFailureAsync( + SourceName, + now, + TimeSpan.FromMinutes(5), + ex.Message, + cancellationToken).ConfigureAwait(false); + throw; + } + if (items.Count == 0) + { + break; + } + + foreach (var item in items) + { + if (item.Published < windowStart) + { + page = _options.MaxPagesPerFetch + 1; + break; + } + + if (_options.RequestDelay > TimeSpan.Zero) + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["kaspersky.title"] = item.Title, + ["kaspersky.link"] = item.Link.ToString(), + ["kaspersky.published"] = item.Published.ToString("O"), + }; + + if (!string.IsNullOrWhiteSpace(item.Summary)) + { + metadata["kaspersky.summary"] = item.Summary!; + } + + var slug = ExtractSlug(item.Link); + if (!string.IsNullOrWhiteSpace(slug)) + { + metadata["kaspersky.slug"] = slug; + } + + var resourceKey = item.Link.ToString(); + touchedResources.Add(resourceKey); + + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, resourceKey, cancellationToken).ConfigureAwait(false); + + var fetchRequest = new SourceFetchRequest(KasperskyOptions.HttpClientName, SourceName, item.Link) + { + Metadata = metadata, + }; + + if (cursorState.TryGetFetchMetadata(resourceKey, out var cachedFetch)) + { + fetchRequest = fetchRequest with + { + ETag = cachedFetch.ETag, + LastModified = cachedFetch.LastModified, + }; + } + + SourceFetchResult result; + try + { + result = await _fetchService.FetchAsync(fetchRequest, 
cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to fetch Kaspersky advisory {Link}", item.Link); + await _stateRepository.MarkFailureAsync( + SourceName, + _timeProvider.GetUtcNow(), + TimeSpan.FromMinutes(5), + ex.Message, + cancellationToken).ConfigureAwait(false); + throw; + } + + if (result.IsNotModified) + { + continue; + } + + if (!result.IsSuccess || result.Document is null) + { + continue; + } + + if (existing is not null + && string.Equals(existing.Sha256, result.Document.Sha256, StringComparison.OrdinalIgnoreCase) + && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal)) + { + await _documentStore.UpdateStatusAsync(result.Document.Id, existing.Status, cancellationToken).ConfigureAwait(false); + cursorState = cursorState.WithFetchMetadata(resourceKey, result.Document.Etag, result.Document.LastModified); + if (item.Published > maxPublished) + { + maxPublished = item.Published; + } + + continue; + } + + pendingDocuments.Add(result.Document.Id); + cursorState = cursorState.WithFetchMetadata(resourceKey, result.Document.Etag, result.Document.LastModified); + if (item.Published > maxPublished) + { + maxPublished = item.Published; + } + } + } + + cursorState = cursorState.PruneFetchCache(touchedResources); + + var updatedCursor = cursorState + .WithPendingDocuments(pendingDocuments) + .WithLastPublished(maxPublished == DateTimeOffset.MinValue ? 
cursor.LastPublished : maxPublished); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remainingDocuments.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("Kaspersky document {DocumentId} missing GridFS content", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + continue; + } + + var metadata = document.Metadata ?? new Dictionary(); + var title = metadata.TryGetValue("kaspersky.title", out var titleValue) ? titleValue : document.Uri; + var link = metadata.TryGetValue("kaspersky.link", out var linkValue) ? linkValue : document.Uri; + var published = metadata.TryGetValue("kaspersky.published", out var publishedValue) && DateTimeOffset.TryParse(publishedValue, out var parsedPublished) + ? parsedPublished.ToUniversalTime() + : document.FetchedAt; + var summary = metadata.TryGetValue("kaspersky.summary", out var summaryValue) ? summaryValue : null; + var slug = metadata.TryGetValue("kaspersky.slug", out var slugValue) ? slugValue : ExtractSlug(new Uri(link, UriKind.Absolute)); + var advisoryKey = string.IsNullOrWhiteSpace(slug) ? 
Guid.NewGuid().ToString("N") : slug; + + byte[] rawBytes; + try + { + rawBytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed downloading raw Kaspersky document {DocumentId}", document.Id); + throw; + } + + var dto = KasperskyAdvisoryParser.Parse(advisoryKey, title, link, published, summary, rawBytes); + var payload = BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); + var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "ics.kaspersky/1", payload, _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remainingDocuments.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(remainingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dto is null || document is null) + { + _logger.LogWarning("Skipping Kaspersky mapping for {DocumentId}: DTO or document missing", 
documentId); + pendingMappings.Remove(documentId); + continue; + } + + var dtoJson = dto.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings + { + OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, + }); + + KasperskyAdvisoryDto advisoryDto; + try + { + advisoryDto = JsonSerializer.Deserialize(dtoJson, SerializerOptions) + ?? throw new InvalidOperationException("Deserialized DTO was null."); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deserialize Kaspersky DTO for {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var fetchProvenance = new AdvisoryProvenance(SourceName, "document", document.Uri, document.FetchedAt); + var mappingProvenance = new AdvisoryProvenance(SourceName, "mapping", advisoryDto.AdvisoryKey, dto.ValidatedAt); + + var aliases = new HashSet(StringComparer.OrdinalIgnoreCase) + { + advisoryDto.AdvisoryKey, + }; + foreach (var cve in advisoryDto.CveIds) + { + aliases.Add(cve); + } + + var references = new List(); + try + { + references.Add(new AdvisoryReference( + advisoryDto.Link, + "advisory", + "kaspersky-ics", + null, + new AdvisoryProvenance(SourceName, "reference", advisoryDto.Link, dto.ValidatedAt))); + } + catch (ArgumentException) + { + _logger.LogWarning("Invalid advisory link {Link} for {AdvisoryKey}", advisoryDto.Link, advisoryDto.AdvisoryKey); + } + + foreach (var cve in advisoryDto.CveIds) + { + var url = $"https://www.cve.org/CVERecord?id={cve}"; + try + { + references.Add(new AdvisoryReference( + url, + "advisory", + cve, + null, + new AdvisoryProvenance(SourceName, "reference", url, dto.ValidatedAt))); + } + catch (ArgumentException) + { + // ignore malformed + } + } + var affectedPackages = new List(); foreach (var vendor in advisoryDto.VendorNames) { @@ -413,52 +413,52 @@ public sealed class KasperskyConnector : IFeedConnector statuses: 
Array.Empty(), provenance: provenance)); } - - var advisory = new Advisory( - advisoryDto.AdvisoryKey, - advisoryDto.Title, - advisoryDto.Summary ?? advisoryDto.Content, - language: "en", - published: advisoryDto.Published, - modified: advisoryDto.Published, - severity: null, - exploitKnown: false, - aliases: aliases, - references: references, - affectedPackages: affectedPackages, - cvssMetrics: Array.Empty(), - provenance: new[] { fetchProvenance, mappingProvenance }); - - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - - pendingMappings.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? KasperskyCursor.Empty : KasperskyCursor.FromBson(state.Cursor); - } - - private async Task UpdateCursorAsync(KasperskyCursor cursor, CancellationToken cancellationToken) - { - await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); - } - - private static string? ExtractSlug(Uri link) - { - var segments = link.Segments; - if (segments.Length == 0) - { - return null; - } - - var last = segments[^1].Trim('/'); - return string.IsNullOrWhiteSpace(last) && segments.Length > 1 ? segments[^2].Trim('/') : last; - } -} + + var advisory = new Advisory( + advisoryDto.AdvisoryKey, + advisoryDto.Title, + advisoryDto.Summary ?? 
advisoryDto.Content, + language: "en", + published: advisoryDto.Published, + modified: advisoryDto.Published, + severity: null, + exploitKnown: false, + aliases: aliases, + references: references, + affectedPackages: affectedPackages, + cvssMetrics: Array.Empty(), + provenance: new[] { fetchProvenance, mappingProvenance }); + + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + + pendingMappings.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? KasperskyCursor.Empty : KasperskyCursor.FromBson(state.Cursor); + } + + private async Task UpdateCursorAsync(KasperskyCursor cursor, CancellationToken cancellationToken) + { + await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); + } + + private static string? ExtractSlug(Uri link) + { + var segments = link.Segments; + if (segments.Length == 0) + { + return null; + } + + var last = segments[^1].Trim('/'); + return string.IsNullOrWhiteSpace(last) && segments.Length > 1 ? 
segments[^2].Trim('/') : last; + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Jvn/JvnConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Jvn/JvnConnector.cs index 144932159..d7edc4b16 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Jvn/JvnConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Jvn/JvnConnector.cs @@ -1,325 +1,325 @@ -using System.Collections.Generic; -using System.Linq; -using System.Text.Json; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Jvn.Configuration; -using StellaOps.Concelier.Connector.Jvn.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Concelier.Storage.Mongo.JpFlags; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Jvn; - -public sealed class JvnConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.General) - { - PropertyNamingPolicy = JsonNamingPolicy.CamelCase, - WriteIndented = false, - DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, - }; - - private readonly MyJvnClient _client; - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly IJpFlagStore _jpFlagStore; - private readonly ISourceStateRepository _stateRepository; - private readonly TimeProvider _timeProvider; - private readonly JvnOptions _options; - private 
readonly ILogger _logger; - - public JvnConnector( - MyJvnClient client, - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - IJpFlagStore jpFlagStore, - ISourceStateRepository stateRepository, - IOptions options, - TimeProvider? timeProvider, - ILogger logger) - { - _client = client ?? throw new ArgumentNullException(nameof(client)); - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _jpFlagStore = jpFlagStore ?? throw new ArgumentNullException(nameof(jpFlagStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => JvnConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - - var windowEnd = now; - var defaultWindowStart = windowEnd - _options.WindowSize; - - var windowStart = cursor.LastCompletedWindowEnd.HasValue - ? 
cursor.LastCompletedWindowEnd.Value - _options.WindowOverlap - : defaultWindowStart; - - if (windowStart < defaultWindowStart) - { - windowStart = defaultWindowStart; - } - - if (windowStart >= windowEnd) - { - windowStart = windowEnd - TimeSpan.FromHours(1); - } - - _logger.LogInformation("JVN fetch window {WindowStart:o} - {WindowEnd:o}", windowStart, windowEnd); - - IReadOnlyList overviewItems; - try - { - overviewItems = await _client.GetOverviewAsync(windowStart, windowEnd, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to retrieve JVN overview between {Start:o} and {End:o}", windowStart, windowEnd); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - _logger.LogInformation("JVN overview returned {Count} items", overviewItems.Count); - - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - - foreach (var item in overviewItems) - { - cancellationToken.ThrowIfCancellationRequested(); - - var detailUri = _client.BuildDetailUri(item.VulnerabilityId); - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["jvn.vulnId"] = item.VulnerabilityId, - ["jvn.detailUrl"] = detailUri.ToString(), - }; - - if (item.DateFirstPublished.HasValue) - { - metadata["jvn.firstPublished"] = item.DateFirstPublished.Value.ToString("O"); - } - - if (item.DateLastUpdated.HasValue) - { - metadata["jvn.lastUpdated"] = item.DateLastUpdated.Value.ToString("O"); - } - - var result = await _fetchService.FetchAsync( - new SourceFetchRequest(JvnOptions.HttpClientName, SourceName, detailUri) - { - Metadata = metadata - }, - cancellationToken).ConfigureAwait(false); - - if (!result.IsSuccess || result.Document is null) - { - if (!result.IsNotModified) - { - _logger.LogWarning("JVN fetch for {Uri} returned status {Status}", detailUri, result.StatusCode); - } - - continue; - } - - _logger.LogDebug("JVN fetched document 
{DocumentId}", result.Document.Id); - pendingDocuments.Add(result.Document.Id); - } - - var updatedCursor = cursor - .WithWindow(windowStart, windowEnd) - .WithCompletedWindow(windowEnd) - .WithPendingDocuments(pendingDocuments); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - _logger.LogDebug("JVN parse pending documents: {PendingCount}", cursor.PendingDocuments.Count); - Console.WriteLine($"JVN parse pending count: {cursor.PendingDocuments.Count}"); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - _logger.LogDebug("JVN parsing document {DocumentId}", documentId); - Console.WriteLine($"JVN parsing document {documentId}"); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - _logger.LogWarning("JVN document {DocumentId} no longer exists; skipping", documentId); - remainingDocuments.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("JVN document {DocumentId} is missing GridFS content; marking as failed", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - continue; - } - - byte[] rawBytes; - try - { - rawBytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Unable to download raw JVN document 
{DocumentId}", document.Id); - throw; - } - - JvnDetailDto detail; - try - { - detail = JvnDetailParser.Parse(rawBytes, document.Uri); - } - catch (JvnSchemaValidationException ex) - { - Console.WriteLine($"JVN schema validation exception: {ex.Message}"); - _logger.LogWarning(ex, "JVN schema validation failed for document {DocumentId} ({Uri})", document.Id, document.Uri); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - throw; - } - - var sanitizedJson = JsonSerializer.Serialize(detail, SerializerOptions); - var payload = BsonDocument.Parse(sanitizedJson); - var dtoRecord = new DtoRecord( - Guid.NewGuid(), - document.Id, - SourceName, - JvnConstants.DtoSchemaVersion, - payload, - _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remainingDocuments.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - Console.WriteLine($"Added mapping for {documentId}"); - _logger.LogDebug("JVN parsed document {DocumentId}", documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(remainingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - _logger.LogDebug("JVN map pending mappings: {PendingCount}", cursor.PendingMappings.Count); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) 
- { - cancellationToken.ThrowIfCancellationRequested(); - - var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dto is null || document is null) - { - _logger.LogWarning("Skipping JVN mapping for {DocumentId}: DTO or document missing", documentId); - pendingMappings.Remove(documentId); - continue; - } - - var dtoJson = dto.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings - { - OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, - }); - - JvnDetailDto detail; - try - { - detail = JsonSerializer.Deserialize(dtoJson, SerializerOptions) - ?? throw new InvalidOperationException("Deserialized DTO was null."); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to deserialize JVN DTO for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var (advisory, flag) = JvnAdvisoryMapper.Map(detail, document, dto, _timeProvider); - - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _jpFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - - pendingMappings.Remove(documentId); - _logger.LogDebug("JVN mapped document {DocumentId}", documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? 
JvnCursor.Empty : JvnCursor.FromBson(state.Cursor); - } - - private async Task UpdateCursorAsync(JvnCursor cursor, CancellationToken cancellationToken) - { - var cursorDocument = cursor.ToBsonDocument(); - await _stateRepository.UpdateCursorAsync(SourceName, cursorDocument, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); - } -} +using System.Collections.Generic; +using System.Linq; +using System.Text.Json; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Jvn.Configuration; +using StellaOps.Concelier.Connector.Jvn.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Concelier.Storage.Mongo.JpFlags; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Jvn; + +public sealed class JvnConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.General) + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + WriteIndented = false, + DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, + }; + + private readonly MyJvnClient _client; + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly IJpFlagStore _jpFlagStore; + private readonly ISourceStateRepository _stateRepository; + private readonly TimeProvider _timeProvider; + private readonly JvnOptions _options; + private readonly ILogger _logger; + + public JvnConnector( + MyJvnClient client, + 
SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + IJpFlagStore jpFlagStore, + ISourceStateRepository stateRepository, + IOptions options, + TimeProvider? timeProvider, + ILogger logger) + { + _client = client ?? throw new ArgumentNullException(nameof(client)); + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _jpFlagStore = jpFlagStore ?? throw new ArgumentNullException(nameof(jpFlagStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => JvnConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + + var windowEnd = now; + var defaultWindowStart = windowEnd - _options.WindowSize; + + var windowStart = cursor.LastCompletedWindowEnd.HasValue + ? 
cursor.LastCompletedWindowEnd.Value - _options.WindowOverlap + : defaultWindowStart; + + if (windowStart < defaultWindowStart) + { + windowStart = defaultWindowStart; + } + + if (windowStart >= windowEnd) + { + windowStart = windowEnd - TimeSpan.FromHours(1); + } + + _logger.LogInformation("JVN fetch window {WindowStart:o} - {WindowEnd:o}", windowStart, windowEnd); + + IReadOnlyList overviewItems; + try + { + overviewItems = await _client.GetOverviewAsync(windowStart, windowEnd, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to retrieve JVN overview between {Start:o} and {End:o}", windowStart, windowEnd); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + _logger.LogInformation("JVN overview returned {Count} items", overviewItems.Count); + + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + + foreach (var item in overviewItems) + { + cancellationToken.ThrowIfCancellationRequested(); + + var detailUri = _client.BuildDetailUri(item.VulnerabilityId); + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["jvn.vulnId"] = item.VulnerabilityId, + ["jvn.detailUrl"] = detailUri.ToString(), + }; + + if (item.DateFirstPublished.HasValue) + { + metadata["jvn.firstPublished"] = item.DateFirstPublished.Value.ToString("O"); + } + + if (item.DateLastUpdated.HasValue) + { + metadata["jvn.lastUpdated"] = item.DateLastUpdated.Value.ToString("O"); + } + + var result = await _fetchService.FetchAsync( + new SourceFetchRequest(JvnOptions.HttpClientName, SourceName, detailUri) + { + Metadata = metadata + }, + cancellationToken).ConfigureAwait(false); + + if (!result.IsSuccess || result.Document is null) + { + if (!result.IsNotModified) + { + _logger.LogWarning("JVN fetch for {Uri} returned status {Status}", detailUri, result.StatusCode); + } + + continue; + } + + _logger.LogDebug("JVN fetched document 
{DocumentId}", result.Document.Id); + pendingDocuments.Add(result.Document.Id); + } + + var updatedCursor = cursor + .WithWindow(windowStart, windowEnd) + .WithCompletedWindow(windowEnd) + .WithPendingDocuments(pendingDocuments); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + _logger.LogDebug("JVN parse pending documents: {PendingCount}", cursor.PendingDocuments.Count); + Console.WriteLine($"JVN parse pending count: {cursor.PendingDocuments.Count}"); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + _logger.LogDebug("JVN parsing document {DocumentId}", documentId); + Console.WriteLine($"JVN parsing document {documentId}"); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + _logger.LogWarning("JVN document {DocumentId} no longer exists; skipping", documentId); + remainingDocuments.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("JVN document {DocumentId} is missing GridFS content; marking as failed", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + continue; + } + + byte[] rawBytes; + try + { + rawBytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Unable to download raw JVN document 
{DocumentId}", document.Id); + throw; + } + + JvnDetailDto detail; + try + { + detail = JvnDetailParser.Parse(rawBytes, document.Uri); + } + catch (JvnSchemaValidationException ex) + { + Console.WriteLine($"JVN schema validation exception: {ex.Message}"); + _logger.LogWarning(ex, "JVN schema validation failed for document {DocumentId} ({Uri})", document.Id, document.Uri); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + throw; + } + + var sanitizedJson = JsonSerializer.Serialize(detail, SerializerOptions); + var payload = BsonDocument.Parse(sanitizedJson); + var dtoRecord = new DtoRecord( + Guid.NewGuid(), + document.Id, + SourceName, + JvnConstants.DtoSchemaVersion, + payload, + _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remainingDocuments.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + Console.WriteLine($"Added mapping for {documentId}"); + _logger.LogDebug("JVN parsed document {DocumentId}", documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(remainingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + _logger.LogDebug("JVN map pending mappings: {PendingCount}", cursor.PendingMappings.Count); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) 
+ { + cancellationToken.ThrowIfCancellationRequested(); + + var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dto is null || document is null) + { + _logger.LogWarning("Skipping JVN mapping for {DocumentId}: DTO or document missing", documentId); + pendingMappings.Remove(documentId); + continue; + } + + var dtoJson = dto.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings + { + OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, + }); + + JvnDetailDto detail; + try + { + detail = JsonSerializer.Deserialize(dtoJson, SerializerOptions) + ?? throw new InvalidOperationException("Deserialized DTO was null."); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deserialize JVN DTO for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var (advisory, flag) = JvnAdvisoryMapper.Map(detail, document, dto, _timeProvider); + + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _jpFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + + pendingMappings.Remove(documentId); + _logger.LogDebug("JVN mapped document {DocumentId}", documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? 
JvnCursor.Empty : JvnCursor.FromBson(state.Cursor); + } + + private async Task UpdateCursorAsync(JvnCursor cursor, CancellationToken cancellationToken) + { + var cursorDocument = cursor.ToBsonDocument(); + await _stateRepository.UpdateCursorAsync(SourceName, cursorDocument, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Kev/KevConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Kev/KevConnector.cs index 5b68656ef..1c4ecd53a 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Kev/KevConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Kev/KevConnector.cs @@ -1,441 +1,441 @@ -using System; -using System.Collections.Generic; -using System.Linq; -using System.Text.Json; -using System.Text.Json.Serialization; -using System.Threading; -using System.Threading.Tasks; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Common.Json; -using StellaOps.Concelier.Connector.Kev.Configuration; -using StellaOps.Concelier.Connector.Kev.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Kev; - -public sealed class KevConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) - { - PropertyNameCaseInsensitive = true, - DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, - }; - - private const string SchemaVersion = "kev.catalog.v1"; - - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage 
_rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly KevOptions _options; - private readonly IJsonSchemaValidator _schemaValidator; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - private readonly KevDiagnostics _diagnostics; - - public KevConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, - IJsonSchemaValidator schemaValidator, - KevDiagnostics diagnostics, - TimeProvider? timeProvider, - ILogger logger) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _schemaValidator = schemaValidator ?? throw new ArgumentNullException(nameof(schemaValidator)); - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => KevConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - - try - { - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, _options.FeedUri.ToString(), cancellationToken).ConfigureAwait(false); - - var request = new SourceFetchRequest( - KevOptions.HttpClientName, - SourceName, - _options.FeedUri) - { - Metadata = new Dictionary(StringComparer.Ordinal) - { - ["kev.cursor.catalogVersion"] = cursor.CatalogVersion ?? string.Empty, - ["kev.cursor.catalogReleased"] = cursor.CatalogReleased?.ToString("O") ?? string.Empty, - }, - ETag = existing?.Etag, - LastModified = existing?.LastModified, - TimeoutOverride = _options.RequestTimeout, - AcceptHeaders = new[] { "application/json", "text/json" }, - }; - - _diagnostics.FetchAttempt(); - var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - if (result.IsNotModified) - { - _diagnostics.FetchUnchanged(); - _logger.LogInformation( - "KEV catalog not modified (catalogVersion={CatalogVersion}, etag={Etag})", - cursor.CatalogVersion ?? "(unknown)", - existing?.Etag ?? 
"(none)"); - await UpdateCursorAsync(cursor, cancellationToken).ConfigureAwait(false); - return; - } - - if (!result.IsSuccess || result.Document is null) - { - _diagnostics.FetchFailure(); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), "KEV feed returned no content.", cancellationToken).ConfigureAwait(false); - return; - } - - _diagnostics.FetchSuccess(); - - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var pendingDocumentsBefore = pendingDocuments.Count; - var pendingMappingsBefore = pendingMappings.Count; - - pendingDocuments.Add(result.Document.Id); - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - var document = result.Document; - var lastModified = document.LastModified?.ToUniversalTime().ToString("O") ?? "(unknown)"; - _logger.LogInformation( - "Fetched KEV catalog document {DocumentId} (etag={Etag}, lastModified={LastModified}) pendingDocuments={PendingDocumentsBefore}->{PendingDocumentsAfter} pendingMappings={PendingMappingsBefore}->{PendingMappingsAfter}", - document.Id, - document.Etag ?? 
"(none)", - lastModified, - pendingDocumentsBefore, - pendingDocuments.Count, - pendingMappingsBefore, - pendingMappings.Count); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.FetchFailure(); - _logger.LogError(ex, "KEV fetch failed for {Uri}", _options.FeedUri); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var latestCatalogVersion = cursor.CatalogVersion; - var latestCatalogReleased = cursor.CatalogReleased; - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _diagnostics.ParseFailure("missingPayload", cursor.CatalogVersion); - _logger.LogWarning("KEV document {DocumentId} missing GridFS payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - byte[] rawBytes; - try - { - rawBytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.ParseFailure("download", 
cursor.CatalogVersion); - _logger.LogError(ex, "KEV parse failed for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - KevCatalogDto? catalog = null; - string? catalogVersion = null; - try - { - using var jsonDocument = JsonDocument.Parse(rawBytes); - catalogVersion = TryGetCatalogVersion(jsonDocument.RootElement); - _schemaValidator.Validate(jsonDocument, KevSchemaProvider.Schema, document.Uri); - catalog = jsonDocument.RootElement.Deserialize(SerializerOptions); - } - catch (JsonSchemaValidationException ex) - { - _diagnostics.ParseFailure("schema", catalogVersion); - _logger.LogWarning(ex, "KEV schema validation failed for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - catch (JsonException ex) - { - _diagnostics.ParseFailure("invalidJson", catalogVersion); - _logger.LogError(ex, "KEV JSON parsing failed for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - catch (Exception ex) - { - _diagnostics.ParseFailure("deserialize", catalogVersion); - _logger.LogError(ex, "KEV catalog deserialization failed for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (catalog is null) - { - _diagnostics.ParseFailure("emptyCatalog", catalogVersion); - 
_logger.LogWarning("KEV catalog payload was empty for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - var entryCount = catalog.Vulnerabilities?.Count ?? 0; - var released = catalog.DateReleased?.ToUniversalTime(); - RecordCatalogAnomalies(catalog); - - try - { - var payloadJson = JsonSerializer.Serialize(catalog, SerializerOptions); - var payload = BsonDocument.Parse(payloadJson); - - _logger.LogInformation( - "Parsed KEV catalog document {DocumentId} (version={CatalogVersion}, released={Released}, entries={EntryCount})", - document.Id, - catalog.CatalogVersion ?? "(unknown)", - released, - entryCount); - _diagnostics.CatalogParsed(catalog.CatalogVersion, entryCount); - - var dtoRecord = new DtoRecord( - Guid.NewGuid(), - document.Id, - SourceName, - SchemaVersion, - payload, - _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remainingDocuments.Remove(documentId); - pendingMappings.Add(document.Id); - - latestCatalogVersion = catalog.CatalogVersion ?? latestCatalogVersion; - latestCatalogReleased = catalog.DateReleased ?? 
latestCatalogReleased; - } - catch (Exception ex) - { - _logger.LogError(ex, "KEV DTO persistence failed for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(remainingDocuments) - .WithPendingMappings(pendingMappings) - .WithCatalogMetadata(latestCatalogVersion, latestCatalogReleased); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToHashSet(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dtoRecord is null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - KevCatalogDto? 
catalog; - try - { - var dtoJson = dtoRecord.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings - { - OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, - }); - - catalog = JsonSerializer.Deserialize(dtoJson, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogError(ex, "KEV mapping: failed to deserialize DTO for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - if (catalog is null) - { - _logger.LogWarning("KEV mapping: DTO payload was empty for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var feedUri = TryParseUri(document.Uri) ?? _options.FeedUri; - var advisories = KevMapper.Map(catalog, SourceName, feedUri, document.FetchedAt, dtoRecord.ValidatedAt); - var entryCount = catalog.Vulnerabilities?.Count ?? 0; - var mappedCount = advisories.Count; - var skippedCount = Math.Max(0, entryCount - mappedCount); - _logger.LogInformation( - "Mapped {MappedCount}/{EntryCount} KEV advisories from catalog version {CatalogVersion} (skipped={SkippedCount})", - mappedCount, - entryCount, - catalog.CatalogVersion ?? 
"(unknown)", - skippedCount); - _diagnostics.AdvisoriesMapped(catalog.CatalogVersion, mappedCount); - - foreach (var advisory in advisories) - { - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - } - - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? KevCursor.Empty : KevCursor.FromBson(state.Cursor); - } - - private Task UpdateCursorAsync(KevCursor cursor, CancellationToken cancellationToken) - { - return _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken); - } - - private void RecordCatalogAnomalies(KevCatalogDto catalog) - { - ArgumentNullException.ThrowIfNull(catalog); - - var version = catalog.CatalogVersion; - var vulnerabilities = catalog.Vulnerabilities ?? Array.Empty(); - - if (catalog.Count != vulnerabilities.Count) - { - _diagnostics.RecordAnomaly("countMismatch", version); - } - - foreach (var entry in vulnerabilities) - { - if (entry is null) - { - _diagnostics.RecordAnomaly("nullEntry", version); - continue; - } - - if (string.IsNullOrWhiteSpace(entry.CveId)) - { - _diagnostics.RecordAnomaly("missingCveId", version); - } - } - } - - private static string? TryGetCatalogVersion(JsonElement root) - { - if (root.ValueKind != JsonValueKind.Object) - { - return null; - } - - if (root.TryGetProperty("catalogVersion", out var versionElement) && versionElement.ValueKind == JsonValueKind.String) - { - return versionElement.GetString(); - } - - return null; - } - - private static Uri? 
TryParseUri(string? value) - => Uri.TryCreate(value, UriKind.Absolute, out var uri) ? uri : null; -} +using System; +using System.Collections.Generic; +using System.Linq; +using System.Text.Json; +using System.Text.Json.Serialization; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Common.Json; +using StellaOps.Concelier.Connector.Kev.Configuration; +using StellaOps.Concelier.Connector.Kev.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Kev; + +public sealed class KevConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) + { + PropertyNameCaseInsensitive = true, + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, + }; + + private const string SchemaVersion = "kev.catalog.v1"; + + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly KevOptions _options; + private readonly IJsonSchemaValidator _schemaValidator; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + private readonly KevDiagnostics _diagnostics; + + public KevConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + 
ISourceStateRepository stateRepository, + IOptions options, + IJsonSchemaValidator schemaValidator, + KevDiagnostics diagnostics, + TimeProvider? timeProvider, + ILogger logger) + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _schemaValidator = schemaValidator ?? throw new ArgumentNullException(nameof(schemaValidator)); + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => KevConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + + try + { + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, _options.FeedUri.ToString(), cancellationToken).ConfigureAwait(false); + + var request = new SourceFetchRequest( + KevOptions.HttpClientName, + SourceName, + _options.FeedUri) + { + Metadata = new Dictionary(StringComparer.Ordinal) + { + ["kev.cursor.catalogVersion"] = cursor.CatalogVersion ?? string.Empty, + ["kev.cursor.catalogReleased"] = cursor.CatalogReleased?.ToString("O") ?? 
string.Empty, + }, + ETag = existing?.Etag, + LastModified = existing?.LastModified, + TimeoutOverride = _options.RequestTimeout, + AcceptHeaders = new[] { "application/json", "text/json" }, + }; + + _diagnostics.FetchAttempt(); + var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + if (result.IsNotModified) + { + _diagnostics.FetchUnchanged(); + _logger.LogInformation( + "KEV catalog not modified (catalogVersion={CatalogVersion}, etag={Etag})", + cursor.CatalogVersion ?? "(unknown)", + existing?.Etag ?? "(none)"); + await UpdateCursorAsync(cursor, cancellationToken).ConfigureAwait(false); + return; + } + + if (!result.IsSuccess || result.Document is null) + { + _diagnostics.FetchFailure(); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), "KEV feed returned no content.", cancellationToken).ConfigureAwait(false); + return; + } + + _diagnostics.FetchSuccess(); + + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var pendingDocumentsBefore = pendingDocuments.Count; + var pendingMappingsBefore = pendingMappings.Count; + + pendingDocuments.Add(result.Document.Id); + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + var document = result.Document; + var lastModified = document.LastModified?.ToUniversalTime().ToString("O") ?? "(unknown)"; + _logger.LogInformation( + "Fetched KEV catalog document {DocumentId} (etag={Etag}, lastModified={LastModified}) pendingDocuments={PendingDocumentsBefore}->{PendingDocumentsAfter} pendingMappings={PendingMappingsBefore}->{PendingMappingsAfter}", + document.Id, + document.Etag ?? 
"(none)", + lastModified, + pendingDocumentsBefore, + pendingDocuments.Count, + pendingMappingsBefore, + pendingMappings.Count); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "KEV fetch failed for {Uri}", _options.FeedUri); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var latestCatalogVersion = cursor.CatalogVersion; + var latestCatalogReleased = cursor.CatalogReleased; + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _diagnostics.ParseFailure("missingPayload", cursor.CatalogVersion); + _logger.LogWarning("KEV document {DocumentId} missing GridFS payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + byte[] rawBytes; + try + { + rawBytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.ParseFailure("download", 
cursor.CatalogVersion); + _logger.LogError(ex, "KEV parse failed for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + KevCatalogDto? catalog = null; + string? catalogVersion = null; + try + { + using var jsonDocument = JsonDocument.Parse(rawBytes); + catalogVersion = TryGetCatalogVersion(jsonDocument.RootElement); + _schemaValidator.Validate(jsonDocument, KevSchemaProvider.Schema, document.Uri); + catalog = jsonDocument.RootElement.Deserialize(SerializerOptions); + } + catch (JsonSchemaValidationException ex) + { + _diagnostics.ParseFailure("schema", catalogVersion); + _logger.LogWarning(ex, "KEV schema validation failed for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + catch (JsonException ex) + { + _diagnostics.ParseFailure("invalidJson", catalogVersion); + _logger.LogError(ex, "KEV JSON parsing failed for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + catch (Exception ex) + { + _diagnostics.ParseFailure("deserialize", catalogVersion); + _logger.LogError(ex, "KEV catalog deserialization failed for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + if (catalog is null) + { + _diagnostics.ParseFailure("emptyCatalog", catalogVersion); + 
_logger.LogWarning("KEV catalog payload was empty for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + var entryCount = catalog.Vulnerabilities?.Count ?? 0; + var released = catalog.DateReleased?.ToUniversalTime(); + RecordCatalogAnomalies(catalog); + + try + { + var payloadJson = JsonSerializer.Serialize(catalog, SerializerOptions); + var payload = BsonDocument.Parse(payloadJson); + + _logger.LogInformation( + "Parsed KEV catalog document {DocumentId} (version={CatalogVersion}, released={Released}, entries={EntryCount})", + document.Id, + catalog.CatalogVersion ?? "(unknown)", + released, + entryCount); + _diagnostics.CatalogParsed(catalog.CatalogVersion, entryCount); + + var dtoRecord = new DtoRecord( + Guid.NewGuid(), + document.Id, + SourceName, + SchemaVersion, + payload, + _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remainingDocuments.Remove(documentId); + pendingMappings.Add(document.Id); + + latestCatalogVersion = catalog.CatalogVersion ?? latestCatalogVersion; + latestCatalogReleased = catalog.DateReleased ?? 
latestCatalogReleased; + } + catch (Exception ex) + { + _logger.LogError(ex, "KEV DTO persistence failed for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(remainingDocuments) + .WithPendingMappings(pendingMappings) + .WithCatalogMetadata(latestCatalogVersion, latestCatalogReleased); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToHashSet(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dtoRecord is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + KevCatalogDto? 
catalog; + try + { + var dtoJson = dtoRecord.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings + { + OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, + }); + + catalog = JsonSerializer.Deserialize(dtoJson, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogError(ex, "KEV mapping: failed to deserialize DTO for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + if (catalog is null) + { + _logger.LogWarning("KEV mapping: DTO payload was empty for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var feedUri = TryParseUri(document.Uri) ?? _options.FeedUri; + var advisories = KevMapper.Map(catalog, SourceName, feedUri, document.FetchedAt, dtoRecord.ValidatedAt); + var entryCount = catalog.Vulnerabilities?.Count ?? 0; + var mappedCount = advisories.Count; + var skippedCount = Math.Max(0, entryCount - mappedCount); + _logger.LogInformation( + "Mapped {MappedCount}/{EntryCount} KEV advisories from catalog version {CatalogVersion} (skipped={SkippedCount})", + mappedCount, + entryCount, + catalog.CatalogVersion ?? 
"(unknown)", + skippedCount); + _diagnostics.AdvisoriesMapped(catalog.CatalogVersion, mappedCount); + + foreach (var advisory in advisories) + { + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + } + + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? KevCursor.Empty : KevCursor.FromBson(state.Cursor); + } + + private Task UpdateCursorAsync(KevCursor cursor, CancellationToken cancellationToken) + { + return _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), _timeProvider.GetUtcNow(), cancellationToken); + } + + private void RecordCatalogAnomalies(KevCatalogDto catalog) + { + ArgumentNullException.ThrowIfNull(catalog); + + var version = catalog.CatalogVersion; + var vulnerabilities = catalog.Vulnerabilities ?? Array.Empty(); + + if (catalog.Count != vulnerabilities.Count) + { + _diagnostics.RecordAnomaly("countMismatch", version); + } + + foreach (var entry in vulnerabilities) + { + if (entry is null) + { + _diagnostics.RecordAnomaly("nullEntry", version); + continue; + } + + if (string.IsNullOrWhiteSpace(entry.CveId)) + { + _diagnostics.RecordAnomaly("missingCveId", version); + } + } + } + + private static string? TryGetCatalogVersion(JsonElement root) + { + if (root.ValueKind != JsonValueKind.Object) + { + return null; + } + + if (root.TryGetProperty("catalogVersion", out var versionElement) && versionElement.ValueKind == JsonValueKind.String) + { + return versionElement.GetString(); + } + + return null; + } + + private static Uri? 
TryParseUri(string? value) + => Uri.TryCreate(value, UriKind.Absolute, out var uri) ? uri : null; +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Kisa/KisaConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Kisa/KisaConnector.cs index c8976ca37..02e594e02 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Kisa/KisaConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Kisa/KisaConnector.cs @@ -1,136 +1,136 @@ -using System; -using System.Collections.Generic; -using System.Linq; -using System.Text.Json; -using System.Threading; -using System.Threading.Tasks; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Kisa.Configuration; -using StellaOps.Concelier.Connector.Kisa.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Kisa; - -public sealed class KisaConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) - { - PropertyNameCaseInsensitive = true, - DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, - }; - - private readonly KisaFeedClient _feedClient; - private readonly KisaDetailParser _detailParser; - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly KisaOptions _options; - private readonly KisaDiagnostics 
_diagnostics; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - - public KisaConnector( - KisaFeedClient feedClient, - KisaDetailParser detailParser, - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, - KisaDiagnostics diagnostics, - TimeProvider? timeProvider, - ILogger logger) - { - _feedClient = feedClient ?? throw new ArgumentNullException(nameof(feedClient)); - _detailParser = detailParser ?? throw new ArgumentNullException(nameof(detailParser)); - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => KisaConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - _diagnostics.FeedAttempt(); - IReadOnlyList items; - - try - { - items = await _feedClient.LoadAsync(cancellationToken).ConfigureAwait(false); - _diagnostics.FeedSuccess(items.Count); - - if (items.Count > 0) - { - _logger.LogInformation("KISA feed returned {ItemCount} advisories", items.Count); - } - else - { - _logger.LogDebug("KISA feed returned no advisories"); - } - } - catch (Exception ex) - { - _diagnostics.FeedFailure(ex.GetType().Name); - _logger.LogError(ex, "KISA feed fetch failed"); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (items.Count == 0) - { - await UpdateCursorAsync(cursor.WithLastFetch(now), cancellationToken).ConfigureAwait(false); - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var knownIds = new HashSet(cursor.KnownIds, StringComparer.OrdinalIgnoreCase); - var processed = 0; - var latestPublished = cursor.LastPublished ?? 
DateTimeOffset.MinValue; - - foreach (var item in items.OrderByDescending(static i => i.Published)) - { - cancellationToken.ThrowIfCancellationRequested(); - - if (knownIds.Contains(item.AdvisoryId)) - { - continue; - } - - if (processed >= _options.MaxAdvisoriesPerFetch) - { - break; - } - - var category = item.Category; - _diagnostics.DetailAttempt(category); - +using System; +using System.Collections.Generic; +using System.Linq; +using System.Text.Json; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Kisa.Configuration; +using StellaOps.Concelier.Connector.Kisa.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Kisa; + +public sealed class KisaConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) + { + PropertyNameCaseInsensitive = true, + DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, + }; + + private readonly KisaFeedClient _feedClient; + private readonly KisaDetailParser _detailParser; + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly KisaOptions _options; + private readonly KisaDiagnostics _diagnostics; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public KisaConnector( + 
KisaFeedClient feedClient, + KisaDetailParser detailParser, + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions options, + KisaDiagnostics diagnostics, + TimeProvider? timeProvider, + ILogger logger) + { + _feedClient = feedClient ?? throw new ArgumentNullException(nameof(feedClient)); + _detailParser = detailParser ?? throw new ArgumentNullException(nameof(detailParser)); + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => KisaConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + _diagnostics.FeedAttempt(); + IReadOnlyList items; + + try + { + items = await _feedClient.LoadAsync(cancellationToken).ConfigureAwait(false); + _diagnostics.FeedSuccess(items.Count); + + if (items.Count > 0) + { + _logger.LogInformation("KISA feed returned {ItemCount} advisories", items.Count); + } + else + { + _logger.LogDebug("KISA feed returned no advisories"); + } + } + catch (Exception ex) + { + _diagnostics.FeedFailure(ex.GetType().Name); + _logger.LogError(ex, "KISA feed fetch failed"); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (items.Count == 0) + { + await UpdateCursorAsync(cursor.WithLastFetch(now), cancellationToken).ConfigureAwait(false); + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var knownIds = new HashSet(cursor.KnownIds, StringComparer.OrdinalIgnoreCase); + var processed = 0; + var latestPublished = cursor.LastPublished ?? 
DateTimeOffset.MinValue; + + foreach (var item in items.OrderByDescending(static i => i.Published)) + { + cancellationToken.ThrowIfCancellationRequested(); + + if (knownIds.Contains(item.AdvisoryId)) + { + continue; + } + + if (processed >= _options.MaxAdvisoriesPerFetch) + { + break; + } + + var category = item.Category; + _diagnostics.DetailAttempt(category); + try { var detailUri = item.DetailPageUri; @@ -149,125 +149,125 @@ public sealed class KisaConnector : IFeedConnector LastModified = existing?.LastModified, TimeoutOverride = _options.RequestTimeout, }; - - var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - if (result.IsNotModified) - { - _diagnostics.DetailUnchanged(category); - _logger.LogDebug("KISA detail {Idx} unchanged ({Category})", item.AdvisoryId, category ?? "unknown"); - knownIds.Add(item.AdvisoryId); - continue; - } - - if (!result.IsSuccess || result.Document is null) - { - _diagnostics.DetailFailure(category, "empty-document"); - _logger.LogWarning("KISA detail fetch returned no document for {Idx}", item.AdvisoryId); - continue; - } - - pendingDocuments.Add(result.Document.Id); - pendingMappings.Remove(result.Document.Id); - knownIds.Add(item.AdvisoryId); - processed++; - _diagnostics.DetailSuccess(category); - _logger.LogInformation( - "KISA fetched detail for {Idx} (documentId={DocumentId}, category={Category})", - item.AdvisoryId, - result.Document.Id, - category ?? 
"unknown"); - - if (_options.RequestDelay > TimeSpan.Zero) - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - } - catch (Exception ex) - { - _diagnostics.DetailFailure(category, ex.GetType().Name); - _logger.LogError(ex, "KISA detail fetch failed for {Idx}", item.AdvisoryId); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (item.Published > latestPublished) - { - latestPublished = item.Published; - _diagnostics.CursorAdvanced(); - _logger.LogDebug("KISA advanced published cursor to {Published:O}", latestPublished); - } - } - - var trimmedKnown = knownIds.Count > _options.MaxKnownAdvisories - ? knownIds.OrderByDescending(id => id, StringComparer.OrdinalIgnoreCase) - .Take(_options.MaxKnownAdvisories) - .ToArray() - : knownIds.ToArray(); - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithKnownIds(trimmedKnown) - .WithLastPublished(latestPublished == DateTimeOffset.MinValue ? 
cursor.LastPublished : latestPublished) - .WithLastFetch(now); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - _logger.LogInformation("KISA fetch stored {Processed} new documents (knownIds={KnownCount})", processed, trimmedKnown.Length); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var now = _timeProvider.GetUtcNow(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - _diagnostics.ParseFailure(null, "document-missing"); - _logger.LogWarning("KISA document {DocumentId} missing during parse", documentId); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - var category = GetCategory(document); - if (!document.GridFsId.HasValue) - { - _diagnostics.ParseFailure(category, "missing-gridfs"); - _logger.LogWarning("KISA document {DocumentId} missing GridFS payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - _diagnostics.ParseAttempt(category); - - byte[] payload; - try - { - payload = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.ParseFailure(category, "download"); - _logger.LogError(ex, "KISA unable to download document {DocumentId}", 
document.Id); - throw; - } - + + var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + if (result.IsNotModified) + { + _diagnostics.DetailUnchanged(category); + _logger.LogDebug("KISA detail {Idx} unchanged ({Category})", item.AdvisoryId, category ?? "unknown"); + knownIds.Add(item.AdvisoryId); + continue; + } + + if (!result.IsSuccess || result.Document is null) + { + _diagnostics.DetailFailure(category, "empty-document"); + _logger.LogWarning("KISA detail fetch returned no document for {Idx}", item.AdvisoryId); + continue; + } + + pendingDocuments.Add(result.Document.Id); + pendingMappings.Remove(result.Document.Id); + knownIds.Add(item.AdvisoryId); + processed++; + _diagnostics.DetailSuccess(category); + _logger.LogInformation( + "KISA fetched detail for {Idx} (documentId={DocumentId}, category={Category})", + item.AdvisoryId, + result.Document.Id, + category ?? "unknown"); + + if (_options.RequestDelay > TimeSpan.Zero) + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + } + catch (Exception ex) + { + _diagnostics.DetailFailure(category, ex.GetType().Name); + _logger.LogError(ex, "KISA detail fetch failed for {Idx}", item.AdvisoryId); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (item.Published > latestPublished) + { + latestPublished = item.Published; + _diagnostics.CursorAdvanced(); + _logger.LogDebug("KISA advanced published cursor to {Published:O}", latestPublished); + } + } + + var trimmedKnown = knownIds.Count > _options.MaxKnownAdvisories + ? 
knownIds.OrderByDescending(id => id, StringComparer.OrdinalIgnoreCase) + .Take(_options.MaxKnownAdvisories) + .ToArray() + : knownIds.ToArray(); + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithKnownIds(trimmedKnown) + .WithLastPublished(latestPublished == DateTimeOffset.MinValue ? cursor.LastPublished : latestPublished) + .WithLastFetch(now); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + _logger.LogInformation("KISA fetch stored {Processed} new documents (knownIds={KnownCount})", processed, trimmedKnown.Length); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var now = _timeProvider.GetUtcNow(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + _diagnostics.ParseFailure(null, "document-missing"); + _logger.LogWarning("KISA document {DocumentId} missing during parse", documentId); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + var category = GetCategory(document); + if (!document.PayloadId.HasValue) + { + _diagnostics.ParseFailure(category, "missing-gridfs"); + _logger.LogWarning("KISA document {DocumentId} missing GridFS payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + 
continue; + } + + _diagnostics.ParseAttempt(category); + + byte[] payload; + try + { + payload = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.ParseFailure(category, "download"); + _logger.LogError(ex, "KISA unable to download document {DocumentId}", document.Id); + throw; + } + KisaParsedAdvisory parsed; try { @@ -279,28 +279,28 @@ public sealed class KisaConnector : IFeedConnector { _diagnostics.ParseFailure(category, "parse"); _logger.LogError(ex, "KISA failed to parse detail {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - _diagnostics.ParseSuccess(category); - _logger.LogDebug("KISA parsed detail for {DocumentId} ({Category})", document.Id, category ?? "unknown"); - - var dtoBson = BsonDocument.Parse(JsonSerializer.Serialize(parsed, SerializerOptions)); - var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "kisa.detail.v1", dtoBson, now); - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remainingDocuments.Remove(documentId); - pendingMappings.Add(document.Id); - } - - var updatedCursor = cursor - .WithPendingDocuments(remainingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + _diagnostics.ParseSuccess(category); + _logger.LogDebug("KISA parsed detail for {DocumentId} ({Category})", 
document.Id, category ?? "unknown"); + + var dtoBson = BsonDocument.Parse(JsonSerializer.Serialize(parsed, SerializerOptions)); + var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "kisa.detail.v1", dtoBson, now); + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remainingDocuments.Remove(documentId); + pendingMappings.Add(document.Id); + } + + var updatedCursor = cursor + .WithPendingDocuments(remainingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); } private static Uri? TryGetUri(IReadOnlyDictionary? metadata, string key) @@ -318,107 +318,107 @@ public sealed class KisaConnector : IFeedConnector return Uri.TryCreate(value, UriKind.Absolute, out var uri) ? uri : null; } - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToHashSet(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - _diagnostics.MapFailure(null, "document-missing"); - _logger.LogWarning("KISA document {DocumentId} missing during map", documentId); - pendingMappings.Remove(documentId); - continue; - } - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - if (dtoRecord is null) - { - _diagnostics.MapFailure(null, "dto-missing"); - _logger.LogWarning("KISA DTO missing for document {DocumentId}", document.Id); 
- await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - KisaParsedAdvisory? parsed; - try - { - parsed = JsonSerializer.Deserialize(dtoRecord.Payload.ToJson(), SerializerOptions); - } - catch (Exception ex) - { - _diagnostics.MapFailure(null, "dto-deserialize"); - _logger.LogError(ex, "KISA failed to deserialize DTO for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - if (parsed is null) - { - _diagnostics.MapFailure(null, "dto-null"); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - try - { - var advisory = KisaMapper.Map(parsed, document, dtoRecord.ValidatedAt); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - _diagnostics.MapSuccess(parsed.Severity); - _logger.LogInformation("KISA mapped advisory {AdvisoryId} (severity={Severity})", parsed.AdvisoryId, parsed.Severity ?? "unknown"); - } - catch (Exception ex) - { - _diagnostics.MapFailure(parsed.Severity, "map"); - _logger.LogError(ex, "KISA mapping failed for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - } - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private static string? 
GetCategory(DocumentRecord document) - { - if (document.Metadata is null) - { - return null; - } - - return document.Metadata.TryGetValue("kisa.category", out var category) - ? category - : null; - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? KisaCursor.Empty : KisaCursor.FromBson(state.Cursor); - } - - private Task UpdateCursorAsync(KisaCursor cursor, CancellationToken cancellationToken) - { - var document = cursor.ToBsonDocument(); - var completedAt = cursor.LastFetchAt ?? _timeProvider.GetUtcNow(); - return _stateRepository.UpdateCursorAsync(SourceName, document, completedAt, cancellationToken); - } -} + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToHashSet(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + _diagnostics.MapFailure(null, "document-missing"); + _logger.LogWarning("KISA document {DocumentId} missing during map", documentId); + pendingMappings.Remove(documentId); + continue; + } + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + if (dtoRecord is null) + { + _diagnostics.MapFailure(null, "dto-missing"); + _logger.LogWarning("KISA DTO missing for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } 
+ + KisaParsedAdvisory? parsed; + try + { + parsed = JsonSerializer.Deserialize(dtoRecord.Payload.ToJson(), SerializerOptions); + } + catch (Exception ex) + { + _diagnostics.MapFailure(null, "dto-deserialize"); + _logger.LogError(ex, "KISA failed to deserialize DTO for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + if (parsed is null) + { + _diagnostics.MapFailure(null, "dto-null"); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + try + { + var advisory = KisaMapper.Map(parsed, document, dtoRecord.ValidatedAt); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + _diagnostics.MapSuccess(parsed.Severity); + _logger.LogInformation("KISA mapped advisory {AdvisoryId} (severity={Severity})", parsed.AdvisoryId, parsed.Severity ?? "unknown"); + } + catch (Exception ex) + { + _diagnostics.MapFailure(parsed.Severity, "map"); + _logger.LogError(ex, "KISA mapping failed for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + } + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private static string? GetCategory(DocumentRecord document) + { + if (document.Metadata is null) + { + return null; + } + + return document.Metadata.TryGetValue("kisa.category", out var category) + ? 
category + : null; + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? KisaCursor.Empty : KisaCursor.FromBson(state.Cursor); + } + + private Task UpdateCursorAsync(KisaCursor cursor, CancellationToken cancellationToken) + { + var document = cursor.ToBsonDocument(); + var completedAt = cursor.LastFetchAt ?? _timeProvider.GetUtcNow(); + return _stateRepository.UpdateCursorAsync(SourceName, document, completedAt, cancellationToken); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Nvd/NvdConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Nvd/NvdConnector.cs index 98d98dd59..d6f84a3d9 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Nvd/NvdConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Nvd/NvdConnector.cs @@ -1,568 +1,568 @@ -using System.Globalization; -using System.Text; -using System.Text.Json; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Common.Json; -using StellaOps.Concelier.Connector.Common.Cursors; -using StellaOps.Concelier.Connector.Nvd.Configuration; -using StellaOps.Concelier.Connector.Nvd.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Concelier.Storage.Mongo.ChangeHistory; +using System.Globalization; +using System.Text; +using System.Text.Json; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Common; 
+using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Common.Json; +using StellaOps.Concelier.Connector.Common.Cursors; +using StellaOps.Concelier.Connector.Nvd.Configuration; +using StellaOps.Concelier.Connector.Nvd.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Concelier.Storage.Mongo.ChangeHistory; using StellaOps.Plugin; using Json.Schema; using StellaOps.Cryptography; - -namespace StellaOps.Concelier.Connector.Nvd; - -public sealed class NvdConnector : IFeedConnector -{ - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly IChangeHistoryStore _changeHistoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly IJsonSchemaValidator _schemaValidator; + +namespace StellaOps.Concelier.Connector.Nvd; + +public sealed class NvdConnector : IFeedConnector +{ + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly IChangeHistoryStore _changeHistoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly IJsonSchemaValidator _schemaValidator; private readonly NvdOptions _options; private readonly TimeProvider _timeProvider; private readonly ILogger _logger; private readonly NvdDiagnostics _diagnostics; private readonly ICryptoHash _hash; - - private static readonly JsonSchema Schema = NvdSchemaProvider.Schema; - - public NvdConnector( - SourceFetchService fetchService, - 
RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - IChangeHistoryStore changeHistoryStore, - ISourceStateRepository stateRepository, - IJsonSchemaValidator schemaValidator, + + private static readonly JsonSchema Schema = NvdSchemaProvider.Schema; + + public NvdConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + IChangeHistoryStore changeHistoryStore, + ISourceStateRepository stateRepository, + IJsonSchemaValidator schemaValidator, IOptions options, NvdDiagnostics diagnostics, ICryptoHash hash, TimeProvider? timeProvider, ILogger logger) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _changeHistoryStore = changeHistoryStore ?? throw new ArgumentNullException(nameof(changeHistoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _schemaValidator = schemaValidator ?? throw new ArgumentNullException(nameof(schemaValidator)); - _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? 
throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _changeHistoryStore = changeHistoryStore ?? throw new ArgumentNullException(nameof(changeHistoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _schemaValidator = schemaValidator ?? throw new ArgumentNullException(nameof(schemaValidator)); + _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); _hash = hash ?? throw new ArgumentNullException(nameof(hash)); _timeProvider = timeProvider ?? TimeProvider.System; _logger = logger ?? throw new ArgumentNullException(nameof(logger)); } - - public string SourceName => NvdConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - - var windowOptions = new TimeWindowCursorOptions - { - WindowSize = _options.WindowSize, - Overlap = _options.WindowOverlap, - InitialBackfill = _options.InitialBackfill, - }; - - var window = TimeWindowCursorPlanner.GetNextWindow(now, cursor.Window, windowOptions); - var requestUri = BuildRequestUri(window); - - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["windowStart"] = window.Start.ToString("O"), - ["windowEnd"] = window.End.ToString("O"), - }; - metadata["startIndex"] = "0"; - - try - { - _diagnostics.FetchAttempt(); - - var result = await _fetchService.FetchAsync( - new SourceFetchRequest( - NvdOptions.HttpClientName, - SourceName, - requestUri) - { - Metadata = metadata - }, - cancellationToken).ConfigureAwait(false); - - if (result.IsNotModified) - { - _diagnostics.FetchUnchanged(); - _logger.LogDebug("NVD window {Start} - {End} returned 304", window.Start, 
window.End); - await UpdateCursorAsync(cursor.WithWindow(window), cancellationToken).ConfigureAwait(false); - return; - } - - if (!result.IsSuccess || result.Document is null) - { - _diagnostics.FetchFailure(); - return; - } - - _diagnostics.FetchDocument(); - - var pendingDocuments = new HashSet(cursor.PendingDocuments) - { - result.Document.Id - }; - - var additionalDocuments = await FetchAdditionalPagesAsync( - window, - metadata, - result.Document, - cancellationToken).ConfigureAwait(false); - - foreach (var documentId in additionalDocuments) - { - pendingDocuments.Add(documentId); - } - - var updated = cursor - .WithWindow(window) - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(cursor.PendingMappings); - - await UpdateCursorAsync(updated, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.FetchFailure(); - _logger.LogError(ex, "NVD fetch failed for {Uri}", requestUri); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingFetch = cursor.PendingDocuments.ToList(); - var pendingMapping = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - _diagnostics.ParseFailure(); - remainingFetch.Remove(documentId); - pendingMapping.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("Document {DocumentId} is missing GridFS content; skipping", documentId); - _diagnostics.ParseFailure(); - remainingFetch.Remove(documentId); - 
pendingMapping.Remove(documentId); - continue; - } - - var rawBytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - try - { - using var jsonDocument = JsonDocument.Parse(rawBytes); - try - { - _schemaValidator.Validate(jsonDocument, Schema, document.Uri); - } - catch (JsonSchemaValidationException ex) - { - _logger.LogWarning(ex, "NVD schema validation failed for document {DocumentId} ({Uri})", document.Id, document.Uri); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingFetch.Remove(documentId); - pendingMapping.Remove(documentId); - _diagnostics.ParseQuarantine(); - continue; - } - - var sanitized = JsonSerializer.Serialize(jsonDocument.RootElement); - var payload = BsonDocument.Parse(sanitized); - - var dtoRecord = new DtoRecord( - Guid.NewGuid(), - document.Id, - SourceName, - "nvd.cve.v2", - payload, - _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - _diagnostics.ParseSuccess(); - - remainingFetch.Remove(documentId); - if (!pendingMapping.Contains(documentId)) - { - pendingMapping.Add(documentId); - } - } - catch (JsonException ex) - { - _logger.LogWarning(ex, "Failed to parse NVD JSON payload for document {DocumentId} ({Uri})", document.Id, document.Uri); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingFetch.Remove(documentId); - pendingMapping.Remove(documentId); - _diagnostics.ParseFailure(); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(remainingFetch) - .WithPendingMappings(pendingMapping); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider 
services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMapping = cursor.PendingMappings.ToList(); - var now = _timeProvider.GetUtcNow(); - - foreach (var documentId in cursor.PendingMappings) - { - var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dto is null || document is null) - { - pendingMapping.Remove(documentId); - continue; - } - - var json = dto.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings - { - OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, - }); - - using var jsonDocument = JsonDocument.Parse(json); - var advisories = NvdMapper.Map(jsonDocument, document, now) - .GroupBy(static advisory => advisory.AdvisoryKey, StringComparer.Ordinal) - .Select(static group => group.First()) - .ToArray(); - - var mappedCount = 0L; - foreach (var advisory in advisories) - { - if (string.IsNullOrWhiteSpace(advisory.AdvisoryKey)) - { - _logger.LogWarning("Skipping advisory with missing key for document {DocumentId} ({Uri})", document.Id, document.Uri); - continue; - } - - var previous = await _advisoryStore.FindAsync(advisory.AdvisoryKey, cancellationToken).ConfigureAwait(false); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - if (previous is not null) - { - await RecordChangeHistoryAsync(advisory, previous, document, now, cancellationToken).ConfigureAwait(false); - } - mappedCount++; - } - - if (mappedCount > 0) - { - _diagnostics.MapSuccess(mappedCount); - } - - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMapping.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMapping); - await 
UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task> FetchAdditionalPagesAsync( - TimeWindow window, - IReadOnlyDictionary baseMetadata, - DocumentRecord firstDocument, - CancellationToken cancellationToken) - { - if (firstDocument.GridFsId is null) - { - return Array.Empty(); - } - - byte[] rawBytes; - try - { - rawBytes = await _rawDocumentStorage.DownloadAsync(firstDocument.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "Unable to download NVD first page {DocumentId} to evaluate pagination", firstDocument.Id); - return Array.Empty(); - } - - try - { - using var jsonDocument = JsonDocument.Parse(rawBytes); - var root = jsonDocument.RootElement; - - if (!TryReadInt32(root, "totalResults", out var totalResults) || !TryReadInt32(root, "resultsPerPage", out var resultsPerPage)) - { - return Array.Empty(); - } - - if (resultsPerPage <= 0 || totalResults <= resultsPerPage) - { - return Array.Empty(); - } - - var fetchedDocuments = new List(); - - foreach (var startIndex in PaginationPlanner.EnumerateAdditionalPages(totalResults, resultsPerPage)) - { - var metadata = new Dictionary(StringComparer.Ordinal); - foreach (var kvp in baseMetadata) - { - metadata[kvp.Key] = kvp.Value; - } - metadata["startIndex"] = startIndex.ToString(CultureInfo.InvariantCulture); - - var request = new SourceFetchRequest( - NvdOptions.HttpClientName, - SourceName, - BuildRequestUri(window, startIndex)) - { - Metadata = metadata - }; - - SourceFetchResult pageResult; - try - { - _diagnostics.FetchAttempt(); - pageResult = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.FetchFailure(); - _logger.LogError(ex, "NVD fetch failed for page starting at {StartIndex}", startIndex); - throw; - } - - if (pageResult.IsNotModified) - { - _diagnostics.FetchUnchanged(); - continue; - } - - if (!pageResult.IsSuccess 
|| pageResult.Document is null) - { - _diagnostics.FetchFailure(); - _logger.LogWarning("NVD fetch for page starting at {StartIndex} returned status {Status}", startIndex, pageResult.StatusCode); - continue; - } - - _diagnostics.FetchDocument(); - fetchedDocuments.Add(pageResult.Document.Id); - } - - return fetchedDocuments; - } - catch (JsonException ex) - { - _logger.LogWarning(ex, "Failed to parse NVD first page {DocumentId} while determining pagination", firstDocument.Id); - return Array.Empty(); - } - } - - private static bool TryReadInt32(JsonElement root, string propertyName, out int value) - { - value = 0; - if (!root.TryGetProperty(propertyName, out var property) || property.ValueKind != JsonValueKind.Number) - { - return false; - } - - if (property.TryGetInt32(out var intValue)) - { - value = intValue; - return true; - } - - if (property.TryGetInt64(out var longValue)) - { - if (longValue > int.MaxValue) - { - value = int.MaxValue; - return true; - } - - value = (int)longValue; - return true; - } - - return false; - } - - private async Task RecordChangeHistoryAsync( - Advisory current, - Advisory previous, - DocumentRecord document, - DateTimeOffset capturedAt, - CancellationToken cancellationToken) - { - if (current.Equals(previous)) - { - return; - } - - var currentSnapshot = SnapshotSerializer.ToSnapshot(current); - var previousSnapshot = SnapshotSerializer.ToSnapshot(previous); - - if (string.Equals(currentSnapshot, previousSnapshot, StringComparison.Ordinal)) - { - return; - } - - var changes = ComputeChanges(previousSnapshot, currentSnapshot); - if (changes.Count == 0) - { - return; - } - - var documentHash = string.IsNullOrWhiteSpace(document.Sha256) - ? 
ComputeHash(currentSnapshot) - : document.Sha256; - - var record = new ChangeHistoryRecord( - Guid.NewGuid(), - SourceName, - current.AdvisoryKey, - document.Id, - documentHash, - ComputeHash(currentSnapshot), - ComputeHash(previousSnapshot), - currentSnapshot, - previousSnapshot, - changes, - capturedAt); - - await _changeHistoryStore.AddAsync(record, cancellationToken).ConfigureAwait(false); - } - - private static IReadOnlyList ComputeChanges(string previousSnapshot, string currentSnapshot) - { - using var previousDocument = JsonDocument.Parse(previousSnapshot); - using var currentDocument = JsonDocument.Parse(currentSnapshot); - - var previousRoot = previousDocument.RootElement; - var currentRoot = currentDocument.RootElement; - var fields = new HashSet(StringComparer.Ordinal); - - foreach (var property in previousRoot.EnumerateObject()) - { - fields.Add(property.Name); - } - - foreach (var property in currentRoot.EnumerateObject()) - { - fields.Add(property.Name); - } - - var changes = new List(); - foreach (var field in fields.OrderBy(static name => name, StringComparer.Ordinal)) - { - var hasPrevious = previousRoot.TryGetProperty(field, out var previousValue); - var hasCurrent = currentRoot.TryGetProperty(field, out var currentValue); - - if (!hasPrevious && hasCurrent) - { - changes.Add(new ChangeHistoryFieldChange(field, "Added", null, SerializeElement(currentValue))); - continue; - } - - if (hasPrevious && !hasCurrent) - { - changes.Add(new ChangeHistoryFieldChange(field, "Removed", SerializeElement(previousValue), null)); - continue; - } - - if (hasPrevious && hasCurrent && !JsonElement.DeepEquals(previousValue, currentValue)) - { - changes.Add(new ChangeHistoryFieldChange(field, "Modified", SerializeElement(previousValue), SerializeElement(currentValue))); - } - } - - return changes; - } - - private static string SerializeElement(JsonElement element) - => JsonSerializer.Serialize(element, new JsonSerializerOptions { WriteIndented = false }); - + + public 
string SourceName => NvdConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + + var windowOptions = new TimeWindowCursorOptions + { + WindowSize = _options.WindowSize, + Overlap = _options.WindowOverlap, + InitialBackfill = _options.InitialBackfill, + }; + + var window = TimeWindowCursorPlanner.GetNextWindow(now, cursor.Window, windowOptions); + var requestUri = BuildRequestUri(window); + + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["windowStart"] = window.Start.ToString("O"), + ["windowEnd"] = window.End.ToString("O"), + }; + metadata["startIndex"] = "0"; + + try + { + _diagnostics.FetchAttempt(); + + var result = await _fetchService.FetchAsync( + new SourceFetchRequest( + NvdOptions.HttpClientName, + SourceName, + requestUri) + { + Metadata = metadata + }, + cancellationToken).ConfigureAwait(false); + + if (result.IsNotModified) + { + _diagnostics.FetchUnchanged(); + _logger.LogDebug("NVD window {Start} - {End} returned 304", window.Start, window.End); + await UpdateCursorAsync(cursor.WithWindow(window), cancellationToken).ConfigureAwait(false); + return; + } + + if (!result.IsSuccess || result.Document is null) + { + _diagnostics.FetchFailure(); + return; + } + + _diagnostics.FetchDocument(); + + var pendingDocuments = new HashSet(cursor.PendingDocuments) + { + result.Document.Id + }; + + var additionalDocuments = await FetchAdditionalPagesAsync( + window, + metadata, + result.Document, + cancellationToken).ConfigureAwait(false); + + foreach (var documentId in additionalDocuments) + { + pendingDocuments.Add(documentId); + } + + var updated = cursor + .WithWindow(window) + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(cursor.PendingMappings); + + await UpdateCursorAsync(updated, cancellationToken).ConfigureAwait(false); + } + catch 
(Exception ex) + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "NVD fetch failed for {Uri}", requestUri); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingFetch = cursor.PendingDocuments.ToList(); + var pendingMapping = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + _diagnostics.ParseFailure(); + remainingFetch.Remove(documentId); + pendingMapping.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("Document {DocumentId} is missing GridFS content; skipping", documentId); + _diagnostics.ParseFailure(); + remainingFetch.Remove(documentId); + pendingMapping.Remove(documentId); + continue; + } + + var rawBytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + try + { + using var jsonDocument = JsonDocument.Parse(rawBytes); + try + { + _schemaValidator.Validate(jsonDocument, Schema, document.Uri); + } + catch (JsonSchemaValidationException ex) + { + _logger.LogWarning(ex, "NVD schema validation failed for document {DocumentId} ({Uri})", document.Id, document.Uri); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingFetch.Remove(documentId); + pendingMapping.Remove(documentId); + _diagnostics.ParseQuarantine(); + continue; + } + + var sanitized = JsonSerializer.Serialize(jsonDocument.RootElement); + var payload = BsonDocument.Parse(sanitized); + 
+ var dtoRecord = new DtoRecord( + Guid.NewGuid(), + document.Id, + SourceName, + "nvd.cve.v2", + payload, + _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + _diagnostics.ParseSuccess(); + + remainingFetch.Remove(documentId); + if (!pendingMapping.Contains(documentId)) + { + pendingMapping.Add(documentId); + } + } + catch (JsonException ex) + { + _logger.LogWarning(ex, "Failed to parse NVD JSON payload for document {DocumentId} ({Uri})", document.Id, document.Uri); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingFetch.Remove(documentId); + pendingMapping.Remove(documentId); + _diagnostics.ParseFailure(); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(remainingFetch) + .WithPendingMappings(pendingMapping); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMapping = cursor.PendingMappings.ToList(); + var now = _timeProvider.GetUtcNow(); + + foreach (var documentId in cursor.PendingMappings) + { + var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dto is null || document is null) + { + pendingMapping.Remove(documentId); + continue; + } + + var json = dto.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings + { + OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, + }); + + using var jsonDocument = JsonDocument.Parse(json); + 
var advisories = NvdMapper.Map(jsonDocument, document, now) + .GroupBy(static advisory => advisory.AdvisoryKey, StringComparer.Ordinal) + .Select(static group => group.First()) + .ToArray(); + + var mappedCount = 0L; + foreach (var advisory in advisories) + { + if (string.IsNullOrWhiteSpace(advisory.AdvisoryKey)) + { + _logger.LogWarning("Skipping advisory with missing key for document {DocumentId} ({Uri})", document.Id, document.Uri); + continue; + } + + var previous = await _advisoryStore.FindAsync(advisory.AdvisoryKey, cancellationToken).ConfigureAwait(false); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + if (previous is not null) + { + await RecordChangeHistoryAsync(advisory, previous, document, now, cancellationToken).ConfigureAwait(false); + } + mappedCount++; + } + + if (mappedCount > 0) + { + _diagnostics.MapSuccess(mappedCount); + } + + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMapping.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMapping); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task> FetchAdditionalPagesAsync( + TimeWindow window, + IReadOnlyDictionary baseMetadata, + DocumentRecord firstDocument, + CancellationToken cancellationToken) + { + if (firstDocument.PayloadId is null) + { + return Array.Empty(); + } + + byte[] rawBytes; + try + { + rawBytes = await _rawDocumentStorage.DownloadAsync(firstDocument.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Unable to download NVD first page {DocumentId} to evaluate pagination", firstDocument.Id); + return Array.Empty(); + } + + try + { + using var jsonDocument = JsonDocument.Parse(rawBytes); + var root = jsonDocument.RootElement; + + if (!TryReadInt32(root, "totalResults", out var totalResults) || !TryReadInt32(root, 
"resultsPerPage", out var resultsPerPage)) + { + return Array.Empty(); + } + + if (resultsPerPage <= 0 || totalResults <= resultsPerPage) + { + return Array.Empty(); + } + + var fetchedDocuments = new List(); + + foreach (var startIndex in PaginationPlanner.EnumerateAdditionalPages(totalResults, resultsPerPage)) + { + var metadata = new Dictionary(StringComparer.Ordinal); + foreach (var kvp in baseMetadata) + { + metadata[kvp.Key] = kvp.Value; + } + metadata["startIndex"] = startIndex.ToString(CultureInfo.InvariantCulture); + + var request = new SourceFetchRequest( + NvdOptions.HttpClientName, + SourceName, + BuildRequestUri(window, startIndex)) + { + Metadata = metadata + }; + + SourceFetchResult pageResult; + try + { + _diagnostics.FetchAttempt(); + pageResult = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "NVD fetch failed for page starting at {StartIndex}", startIndex); + throw; + } + + if (pageResult.IsNotModified) + { + _diagnostics.FetchUnchanged(); + continue; + } + + if (!pageResult.IsSuccess || pageResult.Document is null) + { + _diagnostics.FetchFailure(); + _logger.LogWarning("NVD fetch for page starting at {StartIndex} returned status {Status}", startIndex, pageResult.StatusCode); + continue; + } + + _diagnostics.FetchDocument(); + fetchedDocuments.Add(pageResult.Document.Id); + } + + return fetchedDocuments; + } + catch (JsonException ex) + { + _logger.LogWarning(ex, "Failed to parse NVD first page {DocumentId} while determining pagination", firstDocument.Id); + return Array.Empty(); + } + } + + private static bool TryReadInt32(JsonElement root, string propertyName, out int value) + { + value = 0; + if (!root.TryGetProperty(propertyName, out var property) || property.ValueKind != JsonValueKind.Number) + { + return false; + } + + if (property.TryGetInt32(out var intValue)) + { + value = intValue; + return true; + } + + if 
(property.TryGetInt64(out var longValue))
+        {
+            if (longValue > int.MaxValue)
+            {
+                value = int.MaxValue;
+                return true;
+            }
+
+            value = (int)longValue;
+            return true;
+        }
+
+        return false;
+    }
+
+    private async Task RecordChangeHistoryAsync(
+        Advisory current,
+        Advisory previous,
+        DocumentRecord document,
+        DateTimeOffset capturedAt,
+        CancellationToken cancellationToken)
+    {
+        if (current.Equals(previous))
+        {
+            return;
+        }
+
+        var currentSnapshot = SnapshotSerializer.ToSnapshot(current);
+        var previousSnapshot = SnapshotSerializer.ToSnapshot(previous);
+
+        if (string.Equals(currentSnapshot, previousSnapshot, StringComparison.Ordinal))
+        {
+            return;
+        }
+
+        var changes = ComputeChanges(previousSnapshot, currentSnapshot);
+        if (changes.Count == 0)
+        {
+            return;
+        }
+
+        var documentHash = string.IsNullOrWhiteSpace(document.Sha256)
+            ? ComputeHash(currentSnapshot)
+            : document.Sha256;
+
+        var record = new ChangeHistoryRecord(
+            Guid.NewGuid(),
+            SourceName,
+            current.AdvisoryKey,
+            document.Id,
+            documentHash,
+            ComputeHash(currentSnapshot),
+            ComputeHash(previousSnapshot),
+            currentSnapshot,
+            previousSnapshot,
+            changes,
+            capturedAt);
+
+        await _changeHistoryStore.AddAsync(record, cancellationToken).ConfigureAwait(false);
+    }
+
+    private static IReadOnlyList<ChangeHistoryFieldChange> ComputeChanges(string previousSnapshot, string currentSnapshot)
+    {
+        using var previousDocument = JsonDocument.Parse(previousSnapshot);
+        using var currentDocument = JsonDocument.Parse(currentSnapshot);
+
+        var previousRoot = previousDocument.RootElement;
+        var currentRoot = currentDocument.RootElement;
+        var fields = new HashSet<string>(StringComparer.Ordinal);
+
+        foreach (var property in previousRoot.EnumerateObject())
+        {
+            fields.Add(property.Name);
+        }
+
+        foreach (var property in currentRoot.EnumerateObject())
+        {
+            fields.Add(property.Name);
+        }
+
+        var changes = new List<ChangeHistoryFieldChange>();
+        foreach (var field in fields.OrderBy(static name => name, StringComparer.Ordinal))
+        {
+            var hasPrevious =
previousRoot.TryGetProperty(field, out var previousValue);
+            var hasCurrent = currentRoot.TryGetProperty(field, out var currentValue);
+
+            if (!hasPrevious && hasCurrent)
+            {
+                changes.Add(new ChangeHistoryFieldChange(field, "Added", null, SerializeElement(currentValue)));
+                continue;
+            }
+
+            if (hasPrevious && !hasCurrent)
+            {
+                changes.Add(new ChangeHistoryFieldChange(field, "Removed", SerializeElement(previousValue), null));
+                continue;
+            }
+
+            if (hasPrevious && hasCurrent && !JsonElement.DeepEquals(previousValue, currentValue))
+            {
+                changes.Add(new ChangeHistoryFieldChange(field, "Modified", SerializeElement(previousValue), SerializeElement(currentValue)));
+            }
+        }
+
+        return changes;
+    }
+
+    private static string SerializeElement(JsonElement element)
+        => JsonSerializer.Serialize(element, new JsonSerializerOptions { WriteIndented = false });
+
     private string ComputeHash(string snapshot)
     {
         var bytes = Encoding.UTF8.GetBytes(snapshot);
         var hex = _hash.ComputeHashHex(bytes, HashAlgorithms.Sha256);
         return $"sha256:{hex}";
     }
-
-    private async Task<NvdCursor> GetCursorAsync(CancellationToken cancellationToken)
-    {
-        var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false);
-        return NvdCursor.FromBsonDocument(record?.Cursor);
-    }
-
-    private async Task UpdateCursorAsync(NvdCursor cursor, CancellationToken cancellationToken)
-    {
-        var completedAt = _timeProvider.GetUtcNow();
-        await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false);
-    }
-
-    private Uri BuildRequestUri(TimeWindow window, int startIndex = 0)
-    {
-        var builder = new UriBuilder(_options.BaseEndpoint);
-
-        var parameters = new Dictionary<string, string>
-        {
-            ["lastModifiedStartDate"] = window.Start.ToString("yyyy-MM-dd'T'HH:mm:ss.fffK"),
-            ["lastModifiedEndDate"] = window.End.ToString("yyyy-MM-dd'T'HH:mm:ss.fffK"),
-            ["resultsPerPage"] = "2000",
-        };
-
-        if (startIndex > 0)
-        {
-            parameters["startIndex"] =
startIndex.ToString(CultureInfo.InvariantCulture);
-        }
-
-        builder.Query = string.Join("&", parameters.Select(static kvp => $"{System.Net.WebUtility.UrlEncode(kvp.Key)}={System.Net.WebUtility.UrlEncode(kvp.Value)}"));
-        return builder.Uri;
-    }
-}
+
+    private async Task<NvdCursor> GetCursorAsync(CancellationToken cancellationToken)
+    {
+        var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false);
+        return NvdCursor.FromBsonDocument(record?.Cursor);
+    }
+
+    private async Task UpdateCursorAsync(NvdCursor cursor, CancellationToken cancellationToken)
+    {
+        var completedAt = _timeProvider.GetUtcNow();
+        await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false);
+    }
+
+    private Uri BuildRequestUri(TimeWindow window, int startIndex = 0)
+    {
+        var builder = new UriBuilder(_options.BaseEndpoint);
+
+        var parameters = new Dictionary<string, string>
+        {
+            ["lastModifiedStartDate"] = window.Start.ToString("yyyy-MM-dd'T'HH:mm:ss.fffK"),
+            ["lastModifiedEndDate"] = window.End.ToString("yyyy-MM-dd'T'HH:mm:ss.fffK"),
+            ["resultsPerPage"] = "2000",
+        };
+
+        if (startIndex > 0)
+        {
+            parameters["startIndex"] = startIndex.ToString(CultureInfo.InvariantCulture);
+        }
+
+        builder.Query = string.Join("&", parameters.Select(static kvp => $"{System.Net.WebUtility.UrlEncode(kvp.Key)}={System.Net.WebUtility.UrlEncode(kvp.Value)}"));
+        return builder.Uri;
+    }
+}
diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Osv/OsvConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Osv/OsvConnector.cs
index aa3761d2d..445a3c092 100644
--- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Osv/OsvConnector.cs
+++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Osv/OsvConnector.cs
@@ -1,43 +1,43 @@
-using System;
-using System.Collections.Generic;
-using System.IO;
-using System.IO.Compression;
-using System.Linq;
-using System.Net;
-using System.Net.Http;
+using
System; +using System.Collections.Generic; +using System.IO; +using System.IO.Compression; +using System.Linq; +using System.Net; +using System.Net.Http; using System.Text.Json; using System.Text.Json.Serialization; -using System.Threading; -using System.Threading.Tasks; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using MongoDB.Bson.IO; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using MongoDB.Bson.IO; using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Osv.Configuration; -using StellaOps.Concelier.Connector.Osv.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Osv.Configuration; +using StellaOps.Concelier.Connector.Osv.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; using StellaOps.Concelier.Storage.Mongo.Documents; using StellaOps.Concelier.Storage.Mongo.Dtos; using StellaOps.Plugin; using StellaOps.Cryptography; - -namespace StellaOps.Concelier.Connector.Osv; - -public sealed class OsvConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) - { - DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, - PropertyNameCaseInsensitive = true, - }; - - private readonly IHttpClientFactory _httpClientFactory; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; + +namespace StellaOps.Concelier.Connector.Osv; + +public sealed class OsvConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions 
SerializerOptions = new(JsonSerializerDefaults.Web) + { + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, + PropertyNameCaseInsensitive = true, + }; + + private readonly IHttpClientFactory _httpClientFactory; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; private readonly IDtoStore _dtoStore; private readonly IAdvisoryStore _advisoryStore; private readonly ISourceStateRepository _stateRepository; @@ -46,10 +46,10 @@ public sealed class OsvConnector : IFeedConnector private readonly ILogger _logger; private readonly OsvDiagnostics _diagnostics; private readonly ICryptoHash _hash; - - public OsvConnector( - IHttpClientFactory httpClientFactory, - RawDocumentStorage rawDocumentStorage, + + public OsvConnector( + IHttpClientFactory httpClientFactory, + RawDocumentStorage rawDocumentStorage, IDocumentStore documentStore, IDtoStore dtoStore, IAdvisoryStore advisoryStore, @@ -73,197 +73,197 @@ public sealed class OsvConnector : IFeedConnector _timeProvider = timeProvider ?? TimeProvider.System; _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); } - - public string SourceName => OsvConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var cursorState = cursor; - var remainingCapacity = _options.MaxAdvisoriesPerFetch; - - foreach (var ecosystem in _options.Ecosystems) - { - if (remainingCapacity <= 0) - { - break; - } - - cancellationToken.ThrowIfCancellationRequested(); - - try - { - var result = await FetchEcosystemAsync( - ecosystem, - cursorState, - pendingDocuments, - now, - remainingCapacity, - cancellationToken).ConfigureAwait(false); - - cursorState = result.Cursor; - remainingCapacity -= result.NewDocuments; - } - catch (Exception ex) - { - _logger.LogError(ex, "OSV fetch failed for ecosystem {Ecosystem}", ecosystem); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - } - - cursorState = cursorState - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(cursor.PendingMappings); - - await UpdateCursorAsync(cursorState, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, 
cancellationToken).ConfigureAwait(false); - if (document is null) - { - remainingDocuments.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("OSV document {DocumentId} missing GridFS content", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - continue; - } - - byte[] bytes; - try - { - bytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Unable to download OSV raw document {DocumentId}", document.Id); - throw; - } - - OsvVulnerabilityDto? dto; - try - { - dto = JsonSerializer.Deserialize(bytes, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "Failed to deserialize OSV document {DocumentId} ({Uri})", document.Id, document.Uri); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - continue; - } - - if (dto is null || string.IsNullOrWhiteSpace(dto.Id)) - { - _logger.LogWarning("OSV document {DocumentId} produced empty payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - continue; - } - - var sanitized = JsonSerializer.Serialize(dto, SerializerOptions); - var payload = MongoDB.Bson.BsonDocument.Parse(sanitized); - var dtoRecord = new DtoRecord( - Guid.NewGuid(), - document.Id, - SourceName, - "osv.v1", - payload, - _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remainingDocuments.Remove(documentId); - if 
(!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(remainingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dto is null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - var payloadJson = dto.Payload.ToJson(new JsonWriterSettings - { - OutputMode = JsonOutputMode.RelaxedExtendedJson, - }); - - OsvVulnerabilityDto? 
osvDto; - try - { - osvDto = JsonSerializer.Deserialize(payloadJson, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to deserialize OSV DTO for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - if (osvDto is null || string.IsNullOrWhiteSpace(osvDto.Id)) - { - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - + + public string SourceName => OsvConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var cursorState = cursor; + var remainingCapacity = _options.MaxAdvisoriesPerFetch; + + foreach (var ecosystem in _options.Ecosystems) + { + if (remainingCapacity <= 0) + { + break; + } + + cancellationToken.ThrowIfCancellationRequested(); + + try + { + var result = await FetchEcosystemAsync( + ecosystem, + cursorState, + pendingDocuments, + now, + remainingCapacity, + cancellationToken).ConfigureAwait(false); + + cursorState = result.Cursor; + remainingCapacity -= result.NewDocuments; + } + catch (Exception ex) + { + _logger.LogError(ex, "OSV fetch failed for ecosystem {Ecosystem}", ecosystem); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + } + + cursorState = cursorState + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(cursor.PendingMappings); + + await UpdateCursorAsync(cursorState, cancellationToken).ConfigureAwait(false); + } + + 
public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remainingDocuments.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("OSV document {DocumentId} missing GridFS content", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + continue; + } + + byte[] bytes; + try + { + bytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Unable to download OSV raw document {DocumentId}", document.Id); + throw; + } + + OsvVulnerabilityDto? 
dto; + try + { + dto = JsonSerializer.Deserialize(bytes, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to deserialize OSV document {DocumentId} ({Uri})", document.Id, document.Uri); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + continue; + } + + if (dto is null || string.IsNullOrWhiteSpace(dto.Id)) + { + _logger.LogWarning("OSV document {DocumentId} produced empty payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + continue; + } + + var sanitized = JsonSerializer.Serialize(dto, SerializerOptions); + var payload = MongoDB.Bson.BsonDocument.Parse(sanitized); + var dtoRecord = new DtoRecord( + Guid.NewGuid(), + document.Id, + SourceName, + "osv.v1", + payload, + _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remainingDocuments.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(remainingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + 
var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dto is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + var payloadJson = dto.Payload.ToJson(new JsonWriterSettings + { + OutputMode = JsonOutputMode.RelaxedExtendedJson, + }); + + OsvVulnerabilityDto? osvDto; + try + { + osvDto = JsonSerializer.Deserialize(payloadJson, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deserialize OSV DTO for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + if (osvDto is null || string.IsNullOrWhiteSpace(osvDto.Id)) + { + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + var ecosystem = document.Metadata is not null && document.Metadata.TryGetValue("osv.ecosystem", out var ecosystemValue) ? ecosystemValue : "unknown"; @@ -289,232 +289,232 @@ public sealed class OsvConnector : IFeedConnector pendingMappings.Remove(documentId); } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? 
OsvCursor.Empty : OsvCursor.FromBson(state.Cursor); - } - - private async Task UpdateCursorAsync(OsvCursor cursor, CancellationToken cancellationToken) - { - var document = cursor.ToBsonDocument(); - await _stateRepository.UpdateCursorAsync(SourceName, document, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); - } - - private async Task<(OsvCursor Cursor, int NewDocuments)> FetchEcosystemAsync( - string ecosystem, - OsvCursor cursor, - HashSet pendingDocuments, - DateTimeOffset now, - int remainingCapacity, - CancellationToken cancellationToken) - { - var client = _httpClientFactory.CreateClient(OsvOptions.HttpClientName); - client.Timeout = _options.HttpTimeout; - - var archiveUri = BuildArchiveUri(ecosystem); - using var request = new HttpRequestMessage(HttpMethod.Get, archiveUri); - - if (cursor.TryGetArchiveMetadata(ecosystem, out var archiveMetadata)) - { - if (!string.IsNullOrWhiteSpace(archiveMetadata.ETag)) - { - request.Headers.TryAddWithoutValidation("If-None-Match", archiveMetadata.ETag); - } - - if (archiveMetadata.LastModified.HasValue) - { - request.Headers.IfModifiedSince = archiveMetadata.LastModified.Value; - } - } - - using var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead, cancellationToken).ConfigureAwait(false); - - if (response.StatusCode == HttpStatusCode.NotModified) - { - return (cursor, 0); - } - - response.EnsureSuccessStatusCode(); - - await using var archiveStream = await response.Content.ReadAsStreamAsync(cancellationToken).ConfigureAwait(false); - using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read, leaveOpen: false); - - var existingLastModified = cursor.GetLastModified(ecosystem); - var processedIdsSet = cursor.ProcessedIdsByEcosystem.TryGetValue(ecosystem, out var processedIds) - ? new HashSet(processedIds, StringComparer.OrdinalIgnoreCase) - : new HashSet(StringComparer.OrdinalIgnoreCase); - - var currentMaxModified = existingLastModified ?? 
DateTimeOffset.MinValue; - var currentProcessedIds = new HashSet(processedIdsSet, StringComparer.OrdinalIgnoreCase); - var processedUpdated = false; - var newDocuments = 0; - - var minimumModified = existingLastModified.HasValue - ? existingLastModified.Value - _options.ModifiedTolerance - : now - _options.InitialBackfill; - - ProvenanceDiagnostics.ReportResumeWindow(SourceName, minimumModified, _logger); - - foreach (var entry in archive.Entries) - { - if (remainingCapacity <= 0) - { - break; - } - - cancellationToken.ThrowIfCancellationRequested(); - - if (!entry.FullName.EndsWith(".json", StringComparison.OrdinalIgnoreCase)) - { - continue; - } - - await using var entryStream = entry.Open(); - using var memory = new MemoryStream(); - await entryStream.CopyToAsync(memory, cancellationToken).ConfigureAwait(false); - var bytes = memory.ToArray(); - - OsvVulnerabilityDto? dto; - try - { - dto = JsonSerializer.Deserialize(bytes, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "Failed to parse OSV entry {Entry} for ecosystem {Ecosystem}", entry.FullName, ecosystem); - continue; - } - - if (dto is null || string.IsNullOrWhiteSpace(dto.Id)) - { - continue; - } - - var modified = (dto.Modified ?? dto.Published ?? 
DateTimeOffset.MinValue).ToUniversalTime(); - if (modified < minimumModified) - { - continue; - } - - if (existingLastModified.HasValue && modified < existingLastModified.Value - _options.ModifiedTolerance) - { - continue; - } - - if (modified < currentMaxModified - _options.ModifiedTolerance) - { - continue; - } - - if (modified == currentMaxModified && currentProcessedIds.Contains(dto.Id)) - { - continue; - } - - var documentUri = BuildDocumentUri(ecosystem, dto.Id); + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? OsvCursor.Empty : OsvCursor.FromBson(state.Cursor); + } + + private async Task UpdateCursorAsync(OsvCursor cursor, CancellationToken cancellationToken) + { + var document = cursor.ToBsonDocument(); + await _stateRepository.UpdateCursorAsync(SourceName, document, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); + } + + private async Task<(OsvCursor Cursor, int NewDocuments)> FetchEcosystemAsync( + string ecosystem, + OsvCursor cursor, + HashSet pendingDocuments, + DateTimeOffset now, + int remainingCapacity, + CancellationToken cancellationToken) + { + var client = _httpClientFactory.CreateClient(OsvOptions.HttpClientName); + client.Timeout = _options.HttpTimeout; + + var archiveUri = BuildArchiveUri(ecosystem); + using var request = new HttpRequestMessage(HttpMethod.Get, archiveUri); + + if (cursor.TryGetArchiveMetadata(ecosystem, out var archiveMetadata)) + { + if (!string.IsNullOrWhiteSpace(archiveMetadata.ETag)) + { + request.Headers.TryAddWithoutValidation("If-None-Match", archiveMetadata.ETag); + } + + if (archiveMetadata.LastModified.HasValue) + { + request.Headers.IfModifiedSince = archiveMetadata.LastModified.Value; 
+ } + } + + using var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead, cancellationToken).ConfigureAwait(false); + + if (response.StatusCode == HttpStatusCode.NotModified) + { + return (cursor, 0); + } + + response.EnsureSuccessStatusCode(); + + await using var archiveStream = await response.Content.ReadAsStreamAsync(cancellationToken).ConfigureAwait(false); + using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read, leaveOpen: false); + + var existingLastModified = cursor.GetLastModified(ecosystem); + var processedIdsSet = cursor.ProcessedIdsByEcosystem.TryGetValue(ecosystem, out var processedIds) + ? new HashSet(processedIds, StringComparer.OrdinalIgnoreCase) + : new HashSet(StringComparer.OrdinalIgnoreCase); + + var currentMaxModified = existingLastModified ?? DateTimeOffset.MinValue; + var currentProcessedIds = new HashSet(processedIdsSet, StringComparer.OrdinalIgnoreCase); + var processedUpdated = false; + var newDocuments = 0; + + var minimumModified = existingLastModified.HasValue + ? existingLastModified.Value - _options.ModifiedTolerance + : now - _options.InitialBackfill; + + ProvenanceDiagnostics.ReportResumeWindow(SourceName, minimumModified, _logger); + + foreach (var entry in archive.Entries) + { + if (remainingCapacity <= 0) + { + break; + } + + cancellationToken.ThrowIfCancellationRequested(); + + if (!entry.FullName.EndsWith(".json", StringComparison.OrdinalIgnoreCase)) + { + continue; + } + + await using var entryStream = entry.Open(); + using var memory = new MemoryStream(); + await entryStream.CopyToAsync(memory, cancellationToken).ConfigureAwait(false); + var bytes = memory.ToArray(); + + OsvVulnerabilityDto? 
dto; + try + { + dto = JsonSerializer.Deserialize(bytes, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to parse OSV entry {Entry} for ecosystem {Ecosystem}", entry.FullName, ecosystem); + continue; + } + + if (dto is null || string.IsNullOrWhiteSpace(dto.Id)) + { + continue; + } + + var modified = (dto.Modified ?? dto.Published ?? DateTimeOffset.MinValue).ToUniversalTime(); + if (modified < minimumModified) + { + continue; + } + + if (existingLastModified.HasValue && modified < existingLastModified.Value - _options.ModifiedTolerance) + { + continue; + } + + if (modified < currentMaxModified - _options.ModifiedTolerance) + { + continue; + } + + if (modified == currentMaxModified && currentProcessedIds.Contains(dto.Id)) + { + continue; + } + + var documentUri = BuildDocumentUri(ecosystem, dto.Id); var sha256 = _hash.ComputeHashHex(bytes, HashAlgorithms.Sha256); - - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, documentUri, cancellationToken).ConfigureAwait(false); - if (existing is not null && string.Equals(existing.Sha256, sha256, StringComparison.OrdinalIgnoreCase)) - { - continue; - } - - var gridFsId = await _rawDocumentStorage.UploadAsync(SourceName, documentUri, bytes, "application/json", null, cancellationToken).ConfigureAwait(false); - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["osv.ecosystem"] = ecosystem, - ["osv.id"] = dto.Id, - ["osv.modified"] = modified.ToString("O"), - }; - - var recordId = existing?.Id ?? 
Guid.NewGuid(); - var record = new DocumentRecord( - recordId, - SourceName, - documentUri, - _timeProvider.GetUtcNow(), - sha256, - DocumentStatuses.PendingParse, - "application/json", - Headers: null, - Metadata: metadata, - Etag: null, - LastModified: modified, - GridFsId: gridFsId, - ExpiresAt: null); - - var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); - pendingDocuments.Add(upserted.Id); - newDocuments++; - remainingCapacity--; - - if (modified > currentMaxModified) - { - currentMaxModified = modified; - currentProcessedIds = new HashSet(StringComparer.OrdinalIgnoreCase) { dto.Id }; - processedUpdated = true; - } - else if (modified == currentMaxModified) - { - currentProcessedIds.Add(dto.Id); - processedUpdated = true; - } - - if (_options.RequestDelay > TimeSpan.Zero) - { - try - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - catch (TaskCanceledException) - { - break; - } - } - } - - if (processedUpdated && currentMaxModified != DateTimeOffset.MinValue) - { - cursor = cursor.WithLastModified(ecosystem, currentMaxModified, currentProcessedIds); - } - else if (processedUpdated && existingLastModified.HasValue) - { - cursor = cursor.WithLastModified(ecosystem, existingLastModified.Value, currentProcessedIds); - } - - var etag = response.Headers.ETag?.Tag; - var lastModifiedHeader = response.Content.Headers.LastModified; - cursor = cursor.WithArchiveMetadata(ecosystem, etag, lastModifiedHeader); - - return (cursor, newDocuments); - } - - private Uri BuildArchiveUri(string ecosystem) - { - var trimmed = ecosystem.Trim('/'); - var baseUri = _options.BaseUri; - var builder = new UriBuilder(baseUri); - var path = builder.Path; - if (!path.EndsWith('/')) - { - path += "/"; - } - - path += $"{trimmed}/{_options.ArchiveFileName}"; - builder.Path = path; - return builder.Uri; - } - - private static string BuildDocumentUri(string ecosystem, string vulnerabilityId) - { - var 
safeId = vulnerabilityId.Replace(' ', '-'); - return $"https://osv-vulnerabilities.storage.googleapis.com/{ecosystem}/{safeId}.json"; - } -} + + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, documentUri, cancellationToken).ConfigureAwait(false); + if (existing is not null && string.Equals(existing.Sha256, sha256, StringComparison.OrdinalIgnoreCase)) + { + continue; + } + + var gridFsId = await _rawDocumentStorage.UploadAsync(SourceName, documentUri, bytes, "application/json", null, cancellationToken).ConfigureAwait(false); + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["osv.ecosystem"] = ecosystem, + ["osv.id"] = dto.Id, + ["osv.modified"] = modified.ToString("O"), + }; + + var recordId = existing?.Id ?? Guid.NewGuid(); + var record = new DocumentRecord( + recordId, + SourceName, + documentUri, + _timeProvider.GetUtcNow(), + sha256, + DocumentStatuses.PendingParse, + "application/json", + Headers: null, + Metadata: metadata, + Etag: null, + LastModified: modified, + PayloadId: gridFsId, + ExpiresAt: null); + + var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); + pendingDocuments.Add(upserted.Id); + newDocuments++; + remainingCapacity--; + + if (modified > currentMaxModified) + { + currentMaxModified = modified; + currentProcessedIds = new HashSet(StringComparer.OrdinalIgnoreCase) { dto.Id }; + processedUpdated = true; + } + else if (modified == currentMaxModified) + { + currentProcessedIds.Add(dto.Id); + processedUpdated = true; + } + + if (_options.RequestDelay > TimeSpan.Zero) + { + try + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + catch (TaskCanceledException) + { + break; + } + } + } + + if (processedUpdated && currentMaxModified != DateTimeOffset.MinValue) + { + cursor = cursor.WithLastModified(ecosystem, currentMaxModified, currentProcessedIds); + } + else if (processedUpdated && existingLastModified.HasValue) + { + cursor = 
cursor.WithLastModified(ecosystem, existingLastModified.Value, currentProcessedIds); + } + + var etag = response.Headers.ETag?.Tag; + var lastModifiedHeader = response.Content.Headers.LastModified; + cursor = cursor.WithArchiveMetadata(ecosystem, etag, lastModifiedHeader); + + return (cursor, newDocuments); + } + + private Uri BuildArchiveUri(string ecosystem) + { + var trimmed = ecosystem.Trim('/'); + var baseUri = _options.BaseUri; + var builder = new UriBuilder(baseUri); + var path = builder.Path; + if (!path.EndsWith('/')) + { + path += "/"; + } + + path += $"{trimmed}/{_options.ArchiveFileName}"; + builder.Path = path; + return builder.Uri; + } + + private static string BuildDocumentUri(string ecosystem, string vulnerabilityId) + { + var safeId = vulnerabilityId.Replace(' ', '-'); + return $"https://osv-vulnerabilities.storage.googleapis.com/{ecosystem}/{safeId}.json"; + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ru.Bdu/RuBduConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ru.Bdu/RuBduConnector.cs index b1cb9f88d..1d71a8913 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ru.Bdu/RuBduConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ru.Bdu/RuBduConnector.cs @@ -1,531 +1,531 @@ -using System.Collections.Immutable; -using System.Globalization; -using System.IO; -using System.IO.Compression; +using System.Collections.Immutable; +using System.Globalization; +using System.IO; +using System.IO.Compression; using System.Linq; using System.Text.Json; using System.Text.Json.Serialization; -using System.Xml; -using System.Xml.Linq; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Normalization.Cvss; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Ru.Bdu.Configuration; -using 
StellaOps.Concelier.Connector.Ru.Bdu.Internal; -using StellaOps.Concelier.Storage.Mongo; +using System.Xml; +using System.Xml.Linq; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Normalization.Cvss; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Ru.Bdu.Configuration; +using StellaOps.Concelier.Connector.Ru.Bdu.Internal; +using StellaOps.Concelier.Storage.Mongo; using StellaOps.Concelier.Storage.Mongo.Advisories; using StellaOps.Concelier.Storage.Mongo.Documents; using StellaOps.Concelier.Storage.Mongo.Dtos; using StellaOps.Plugin; using StellaOps.Cryptography; - -namespace StellaOps.Concelier.Connector.Ru.Bdu; - -public sealed class RuBduConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) - { - DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, - PropertyNamingPolicy = JsonNamingPolicy.CamelCase, - WriteIndented = false, - }; - - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly RuBduOptions _options; - private readonly RuBduDiagnostics _diagnostics; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - + +namespace StellaOps.Concelier.Connector.Ru.Bdu; + +public sealed class RuBduConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) + { + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + WriteIndented = false, + }; + + private readonly SourceFetchService _fetchService; + 
private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly RuBduOptions _options; + private readonly RuBduDiagnostics _diagnostics; + private readonly TimeProvider _timeProvider; + private readonly ILogger<RuBduConnector> _logger; + private readonly string _cacheDirectory; private readonly string _archiveCachePath; private readonly ICryptoHash _hash; - - public RuBduConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions<RuBduOptions> options, + + public RuBduConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions<RuBduOptions> options, RuBduDiagnostics diagnostics, TimeProvider? timeProvider, ILogger<RuBduConnector> logger, ICryptoHash cryptoHash) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _timeProvider = timeProvider ?? 
TimeProvider.System; + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _timeProvider = timeProvider ?? TimeProvider.System; _logger = logger ?? throw new ArgumentNullException(nameof(logger)); _hash = cryptoHash ?? throw new ArgumentNullException(nameof(cryptoHash)); _cacheDirectory = ResolveCacheDirectory(_options.CacheDirectory); - _archiveCachePath = Path.Combine(_cacheDirectory, "vulxml.zip"); - EnsureCacheDirectory(); - } - - public string SourceName => RuBduConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - _diagnostics.FetchAttempt(); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var now = _timeProvider.GetUtcNow(); - - SourceFetchContentResult? archiveResult = null; - byte[]? 
archiveContent = null; - var usedCache = false; - - try - { - var request = new SourceFetchRequest(RuBduOptions.HttpClientName, SourceName, _options.DataArchiveUri) - { - AcceptHeaders = new[] - { - "application/zip", - "application/octet-stream", - "application/x-zip-compressed", - }, - TimeoutOverride = _options.RequestTimeout, - }; - - var fetchResult = await _fetchService.FetchContentAsync(request, cancellationToken).ConfigureAwait(false); - archiveResult = fetchResult; - - if (fetchResult.IsNotModified) - { - _logger.LogDebug("RU-BDU archive not modified."); - _diagnostics.FetchUnchanged(); - await UpdateCursorAsync(cursor.WithLastSuccessfulFetch(now), cancellationToken).ConfigureAwait(false); - return; - } - - if (fetchResult.IsSuccess && fetchResult.Content is not null) - { - archiveContent = fetchResult.Content; - TryWriteCachedArchive(archiveContent); - } - } - catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException) - { - if (TryReadCachedArchive(out var cachedFallback)) - { - _logger.LogWarning(ex, "RU-BDU archive fetch failed; using cached artefact {CachePath}", _archiveCachePath); - archiveContent = cachedFallback; - usedCache = true; - _diagnostics.FetchCacheFallback(); - } - else - { - _diagnostics.FetchFailure(); - _logger.LogError(ex, "RU-BDU archive fetch failed for {ArchiveUri}", _options.DataArchiveUri); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - } - - if (archiveContent is null) - { - if (TryReadCachedArchive(out var cachedFallback)) - { - var status = archiveResult?.StatusCode; - _logger.LogWarning("RU-BDU archive unavailable (status={Status}); using cached artefact {CachePath}", status, _archiveCachePath); - archiveContent = cachedFallback; - usedCache = true; - _diagnostics.FetchCacheFallback(); - } - else - { - var status = archiveResult?.StatusCode; - _logger.LogWarning("RU-BDU archive fetch returned no 
content (status={Status})", status); - _diagnostics.FetchSuccess(addedCount: 0, usedCache: false); - await UpdateCursorAsync(cursor.WithLastSuccessfulFetch(now), cancellationToken).ConfigureAwait(false); - return; - } - } - - var archiveLastModified = archiveResult?.LastModified; - int added; - try - { - added = await ProcessArchiveAsync(archiveContent, now, pendingDocuments, pendingMappings, archiveLastModified, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.FetchFailure(); - _logger.LogError(ex, "RU-BDU archive processing failed"); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - _diagnostics.FetchSuccess(added, usedCache); - if (added > 0) - { - _logger.LogInformation("RU-BDU processed {Added} vulnerabilities (cacheUsed={CacheUsed})", added, usedCache); - } - else - { - _logger.LogDebug("RU-BDU fetch completed with no new vulnerabilities (cacheUsed={CacheUsed})", usedCache); - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithLastSuccessfulFetch(now); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - _diagnostics.ParseFailure(); - pendingDocuments.Remove(documentId); - 
pendingMappings.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("RU-BDU document {DocumentId} missing GridFS payload", documentId); - _diagnostics.ParseFailure(); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - byte[] payload; - try - { - payload = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "RU-BDU unable to download raw document {DocumentId}", documentId); - _diagnostics.ParseFailure(); - throw; - } - - RuBduVulnerabilityDto? dto; - try - { - dto = JsonSerializer.Deserialize(payload, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "RU-BDU failed to deserialize document {DocumentId}", documentId); - _diagnostics.ParseFailure(); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (dto is null) - { - _logger.LogWarning("RU-BDU document {DocumentId} produced null DTO", documentId); - _diagnostics.ParseFailure(); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - var bson = MongoDB.Bson.BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); - var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "ru-bdu.v1", bson, _timeProvider.GetUtcNow()); - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - 
_diagnostics.ParseSuccess( - dto.Software.IsDefaultOrEmpty ? 0 : dto.Software.Length, - dto.Identifiers.IsDefaultOrEmpty ? 0 : dto.Identifiers.Length, - dto.Sources.IsDefaultOrEmpty ? 0 : dto.Sources.Length); - - pendingDocuments.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - _diagnostics.MapFailure(); - pendingMappings.Remove(documentId); - continue; - } - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - if (dtoRecord is null) - { - _logger.LogWarning("RU-BDU document {DocumentId} missing DTO payload", documentId); - _diagnostics.MapFailure(); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - RuBduVulnerabilityDto dto; - try - { - dto = JsonSerializer.Deserialize<RuBduVulnerabilityDto>(dtoRecord.Payload.ToString(), SerializerOptions) ?? 
throw new InvalidOperationException("DTO deserialized to null"); - } - catch (Exception ex) - { - _logger.LogError(ex, "RU-BDU failed to deserialize DTO for document {DocumentId}", documentId); - _diagnostics.MapFailure(); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - try - { - var advisory = RuBduMapper.Map(dto, document, dtoRecord.ValidatedAt); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - _diagnostics.MapSuccess(advisory); - pendingMappings.Remove(documentId); - } - catch (Exception ex) - { - _logger.LogError(ex, "RU-BDU mapping failed for document {DocumentId}", documentId); - _diagnostics.MapFailure(); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - } - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task ProcessArchiveAsync( - byte[] archiveContent, - DateTimeOffset now, - HashSet pendingDocuments, - HashSet pendingMappings, - DateTimeOffset? archiveLastModified, - CancellationToken cancellationToken) - { - var added = 0; - using var archiveStream = new MemoryStream(archiveContent, writable: false); - using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read, leaveOpen: false); - var entry = archive.GetEntry("export/export.xml") ?? 
archive.Entries.FirstOrDefault(); - if (entry is null) - { - _logger.LogWarning("RU-BDU archive does not contain export/export.xml; skipping."); - return added; - } - - await using var entryStream = entry.Open(); - using var reader = XmlReader.Create(entryStream, new XmlReaderSettings - { - IgnoreComments = true, - IgnoreWhitespace = true, - DtdProcessing = DtdProcessing.Ignore, - CloseInput = false, - }); - - while (reader.Read()) - { - cancellationToken.ThrowIfCancellationRequested(); - if (reader.NodeType != XmlNodeType.Element || !reader.Name.Equals("vul", StringComparison.OrdinalIgnoreCase)) - { - continue; - } - - if (RuBduXmlParser.TryParse(XNode.ReadFrom(reader) as XElement ?? new XElement("vul")) is not { } dto) - { - continue; - } - - var payload = JsonSerializer.SerializeToUtf8Bytes(dto, SerializerOptions); + _archiveCachePath = Path.Combine(_cacheDirectory, "vulxml.zip"); + EnsureCacheDirectory(); + } + + public string SourceName => RuBduConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + _diagnostics.FetchAttempt(); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var now = _timeProvider.GetUtcNow(); + + SourceFetchContentResult? archiveResult = null; + byte[]? 
archiveContent = null; + var usedCache = false; + + try + { + var request = new SourceFetchRequest(RuBduOptions.HttpClientName, SourceName, _options.DataArchiveUri) + { + AcceptHeaders = new[] + { + "application/zip", + "application/octet-stream", + "application/x-zip-compressed", + }, + TimeoutOverride = _options.RequestTimeout, + }; + + var fetchResult = await _fetchService.FetchContentAsync(request, cancellationToken).ConfigureAwait(false); + archiveResult = fetchResult; + + if (fetchResult.IsNotModified) + { + _logger.LogDebug("RU-BDU archive not modified."); + _diagnostics.FetchUnchanged(); + await UpdateCursorAsync(cursor.WithLastSuccessfulFetch(now), cancellationToken).ConfigureAwait(false); + return; + } + + if (fetchResult.IsSuccess && fetchResult.Content is not null) + { + archiveContent = fetchResult.Content; + TryWriteCachedArchive(archiveContent); + } + } + catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException) + { + if (TryReadCachedArchive(out var cachedFallback)) + { + _logger.LogWarning(ex, "RU-BDU archive fetch failed; using cached artefact {CachePath}", _archiveCachePath); + archiveContent = cachedFallback; + usedCache = true; + _diagnostics.FetchCacheFallback(); + } + else + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "RU-BDU archive fetch failed for {ArchiveUri}", _options.DataArchiveUri); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + } + + if (archiveContent is null) + { + if (TryReadCachedArchive(out var cachedFallback)) + { + var status = archiveResult?.StatusCode; + _logger.LogWarning("RU-BDU archive unavailable (status={Status}); using cached artefact {CachePath}", status, _archiveCachePath); + archiveContent = cachedFallback; + usedCache = true; + _diagnostics.FetchCacheFallback(); + } + else + { + var status = archiveResult?.StatusCode; + _logger.LogWarning("RU-BDU archive fetch returned no 
content (status={Status})", status); + _diagnostics.FetchSuccess(addedCount: 0, usedCache: false); + await UpdateCursorAsync(cursor.WithLastSuccessfulFetch(now), cancellationToken).ConfigureAwait(false); + return; + } + } + + var archiveLastModified = archiveResult?.LastModified; + int added; + try + { + added = await ProcessArchiveAsync(archiveContent, now, pendingDocuments, pendingMappings, archiveLastModified, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "RU-BDU archive processing failed"); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + _diagnostics.FetchSuccess(added, usedCache); + if (added > 0) + { + _logger.LogInformation("RU-BDU processed {Added} vulnerabilities (cacheUsed={CacheUsed})", added, usedCache); + } + else + { + _logger.LogDebug("RU-BDU fetch completed with no new vulnerabilities (cacheUsed={CacheUsed})", usedCache); + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithLastSuccessfulFetch(now); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + _diagnostics.ParseFailure(); + pendingDocuments.Remove(documentId); + 
pendingMappings.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("RU-BDU document {DocumentId} missing GridFS payload", documentId); + _diagnostics.ParseFailure(); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + byte[] payload; + try + { + payload = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "RU-BDU unable to download raw document {DocumentId}", documentId); + _diagnostics.ParseFailure(); + throw; + } + + RuBduVulnerabilityDto? dto; + try + { + dto = JsonSerializer.Deserialize(payload, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "RU-BDU failed to deserialize document {DocumentId}", documentId); + _diagnostics.ParseFailure(); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + if (dto is null) + { + _logger.LogWarning("RU-BDU document {DocumentId} produced null DTO", documentId); + _diagnostics.ParseFailure(); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + var bson = MongoDB.Bson.BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); + var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "ru-bdu.v1", bson, _timeProvider.GetUtcNow()); + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + 
_diagnostics.ParseSuccess( + dto.Software.IsDefaultOrEmpty ? 0 : dto.Software.Length, + dto.Identifiers.IsDefaultOrEmpty ? 0 : dto.Identifiers.Length, + dto.Sources.IsDefaultOrEmpty ? 0 : dto.Sources.Length); + + pendingDocuments.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + _diagnostics.MapFailure(); + pendingMappings.Remove(documentId); + continue; + } + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + if (dtoRecord is null) + { + _logger.LogWarning("RU-BDU document {DocumentId} missing DTO payload", documentId); + _diagnostics.MapFailure(); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + RuBduVulnerabilityDto dto; + try + { + dto = JsonSerializer.Deserialize(dtoRecord.Payload.ToString(), SerializerOptions) ?? 
throw new InvalidOperationException("DTO deserialized to null"); + } + catch (Exception ex) + { + _logger.LogError(ex, "RU-BDU failed to deserialize DTO for document {DocumentId}", documentId); + _diagnostics.MapFailure(); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + try + { + var advisory = RuBduMapper.Map(dto, document, dtoRecord.ValidatedAt); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + _diagnostics.MapSuccess(advisory); + pendingMappings.Remove(documentId); + } + catch (Exception ex) + { + _logger.LogError(ex, "RU-BDU mapping failed for document {DocumentId}", documentId); + _diagnostics.MapFailure(); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + } + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task ProcessArchiveAsync( + byte[] archiveContent, + DateTimeOffset now, + HashSet pendingDocuments, + HashSet pendingMappings, + DateTimeOffset? archiveLastModified, + CancellationToken cancellationToken) + { + var added = 0; + using var archiveStream = new MemoryStream(archiveContent, writable: false); + using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read, leaveOpen: false); + var entry = archive.GetEntry("export/export.xml") ?? 
archive.Entries.FirstOrDefault(); + if (entry is null) + { + _logger.LogWarning("RU-BDU archive does not contain export/export.xml; skipping."); + return added; + } + + await using var entryStream = entry.Open(); + using var reader = XmlReader.Create(entryStream, new XmlReaderSettings + { + IgnoreComments = true, + IgnoreWhitespace = true, + DtdProcessing = DtdProcessing.Ignore, + CloseInput = false, + }); + + while (reader.Read()) + { + cancellationToken.ThrowIfCancellationRequested(); + if (reader.NodeType != XmlNodeType.Element || !reader.Name.Equals("vul", StringComparison.OrdinalIgnoreCase)) + { + continue; + } + + if (RuBduXmlParser.TryParse(XNode.ReadFrom(reader) as XElement ?? new XElement("vul")) is not { } dto) + { + continue; + } + + var payload = JsonSerializer.SerializeToUtf8Bytes(dto, SerializerOptions); var sha = _hash.ComputeHashHex(payload); - var documentUri = BuildDocumentUri(dto.Identifier); - - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, documentUri, cancellationToken).ConfigureAwait(false); - if (existing is not null && string.Equals(existing.Sha256, sha, StringComparison.OrdinalIgnoreCase)) - { - continue; - } - - var gridFsId = await _rawDocumentStorage.UploadAsync(SourceName, documentUri, payload, "application/json", null, cancellationToken).ConfigureAwait(false); - - var metadata = new Dictionary(StringComparer.OrdinalIgnoreCase) - { - ["ru-bdu.identifier"] = dto.Identifier, - }; - - if (!string.IsNullOrWhiteSpace(dto.Name)) - { - metadata["ru-bdu.name"] = dto.Name!; - } - - var recordId = existing?.Id ?? Guid.NewGuid(); - var record = new DocumentRecord( - recordId, - SourceName, - documentUri, - now, - sha, - DocumentStatuses.PendingParse, - "application/json", - Headers: null, - Metadata: metadata, - Etag: null, - LastModified: archiveLastModified ?? 
dto.IdentifyDate, - GridFsId: gridFsId, - ExpiresAt: null); - - var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); - pendingDocuments.Add(upserted.Id); - pendingMappings.Remove(upserted.Id); - added++; - - if (added >= _options.MaxVulnerabilitiesPerFetch) - { - break; - } - } - - return added; - } - - private string ResolveCacheDirectory(string? configuredPath) - { - if (!string.IsNullOrWhiteSpace(configuredPath)) - { - return Path.GetFullPath(Path.IsPathRooted(configuredPath) - ? configuredPath - : Path.Combine(AppContext.BaseDirectory, configuredPath)); - } - - return Path.Combine(AppContext.BaseDirectory, "cache", RuBduConnectorPlugin.SourceName); - } - - private void EnsureCacheDirectory() - { - try - { - Directory.CreateDirectory(_cacheDirectory); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "RU-BDU unable to ensure cache directory {CachePath}", _cacheDirectory); - } - } - - private void TryWriteCachedArchive(byte[] content) - { - try - { - Directory.CreateDirectory(Path.GetDirectoryName(_archiveCachePath)!); - File.WriteAllBytes(_archiveCachePath, content); - } - catch (Exception ex) - { - _logger.LogDebug(ex, "RU-BDU failed to write cache archive {CachePath}", _archiveCachePath); - } - } - - private bool TryReadCachedArchive(out byte[] content) - { - try - { - if (File.Exists(_archiveCachePath)) - { - content = File.ReadAllBytes(_archiveCachePath); - return true; - } - } - catch (Exception ex) - { - _logger.LogDebug(ex, "RU-BDU failed to read cache archive {CachePath}", _archiveCachePath); - } - - content = Array.Empty<byte>(); - return false; - } - - private static string BuildDocumentUri(string identifier) - { - var slug = identifier.Contains(':', StringComparison.Ordinal) - ? identifier[(identifier.IndexOf(':') + 1)..]
- : identifier; - return $"https://bdu.fstec.ru/vul/{slug}"; - } - - private async Task<RuBduCursor> GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? RuBduCursor.Empty : RuBduCursor.FromBson(state.Cursor); - } - - private Task UpdateCursorAsync(RuBduCursor cursor, CancellationToken cancellationToken) - { - var document = cursor.ToBsonDocument(); - var completedAt = cursor.LastSuccessfulFetch ?? _timeProvider.GetUtcNow(); - return _stateRepository.UpdateCursorAsync(SourceName, document, completedAt, cancellationToken); - } -} + var documentUri = BuildDocumentUri(dto.Identifier); + + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, documentUri, cancellationToken).ConfigureAwait(false); + if (existing is not null && string.Equals(existing.Sha256, sha, StringComparison.OrdinalIgnoreCase)) + { + continue; + } + + var gridFsId = await _rawDocumentStorage.UploadAsync(SourceName, documentUri, payload, "application/json", null, cancellationToken).ConfigureAwait(false); + + var metadata = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) + { + ["ru-bdu.identifier"] = dto.Identifier, + }; + + if (!string.IsNullOrWhiteSpace(dto.Name)) + { + metadata["ru-bdu.name"] = dto.Name!; + } + + var recordId = existing?.Id ?? Guid.NewGuid(); + var record = new DocumentRecord( + recordId, + SourceName, + documentUri, + now, + sha, + DocumentStatuses.PendingParse, + "application/json", + Headers: null, + Metadata: metadata, + Etag: null, + LastModified: archiveLastModified ??
dto.IdentifyDate, + PayloadId: gridFsId, + ExpiresAt: null); + + var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); + pendingDocuments.Add(upserted.Id); + pendingMappings.Remove(upserted.Id); + added++; + + if (added >= _options.MaxVulnerabilitiesPerFetch) + { + break; + } + } + + return added; + } + + private string ResolveCacheDirectory(string? configuredPath) + { + if (!string.IsNullOrWhiteSpace(configuredPath)) + { + return Path.GetFullPath(Path.IsPathRooted(configuredPath) + ? configuredPath + : Path.Combine(AppContext.BaseDirectory, configuredPath)); + } + + return Path.Combine(AppContext.BaseDirectory, "cache", RuBduConnectorPlugin.SourceName); + } + + private void EnsureCacheDirectory() + { + try + { + Directory.CreateDirectory(_cacheDirectory); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "RU-BDU unable to ensure cache directory {CachePath}", _cacheDirectory); + } + } + + private void TryWriteCachedArchive(byte[] content) + { + try + { + Directory.CreateDirectory(Path.GetDirectoryName(_archiveCachePath)!); + File.WriteAllBytes(_archiveCachePath, content); + } + catch (Exception ex) + { + _logger.LogDebug(ex, "RU-BDU failed to write cache archive {CachePath}", _archiveCachePath); + } + } + + private bool TryReadCachedArchive(out byte[] content) + { + try + { + if (File.Exists(_archiveCachePath)) + { + content = File.ReadAllBytes(_archiveCachePath); + return true; + } + } + catch (Exception ex) + { + _logger.LogDebug(ex, "RU-BDU failed to read cache archive {CachePath}", _archiveCachePath); + } + + content = Array.Empty<byte>(); + return false; + } + + private static string BuildDocumentUri(string identifier) + { + var slug = identifier.Contains(':', StringComparison.Ordinal) + ? identifier[(identifier.IndexOf(':') + 1)..]
+ : identifier; + return $"https://bdu.fstec.ru/vul/{slug}"; + } + + private async Task<RuBduCursor> GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? RuBduCursor.Empty : RuBduCursor.FromBson(state.Cursor); + } + + private Task UpdateCursorAsync(RuBduCursor cursor, CancellationToken cancellationToken) + { + var document = cursor.ToBsonDocument(); + var completedAt = cursor.LastSuccessfulFetch ?? _timeProvider.GetUtcNow(); + return _stateRepository.UpdateCursorAsync(SourceName, document, completedAt, cancellationToken); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ru.Nkcki/RuNkckiConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ru.Nkcki/RuNkckiConnector.cs index 39c75e169..02a44897f 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ru.Nkcki/RuNkckiConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Ru.Nkcki/RuNkckiConnector.cs @@ -1,9 +1,9 @@ -using System.Collections.Immutable; -using System.Collections.Generic; -using System.IO; -using System.IO.Compression; -using System.Net; -using System.Linq; +using System.Collections.Immutable; +using System.Collections.Generic; +using System.IO; +using System.IO.Compression; +using System.Net; +using System.Linq; using System.Text; using System.Text.Json; using System.Text.Json.Serialization; @@ -21,66 +21,66 @@ using StellaOps.Concelier.Storage.Mongo.Documents; using StellaOps.Concelier.Storage.Mongo.Dtos; using StellaOps.Plugin; using StellaOps.Cryptography; - -namespace StellaOps.Concelier.Connector.Ru.Nkcki; - -public sealed class RuNkckiConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) - { - DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, - PropertyNamingPolicy = JsonNamingPolicy.CamelCase, - WriteIndented = false,
- }; - - private static readonly string[] ListingAcceptHeaders = - { - "text/html", - "application/xhtml+xml;q=0.9", - "text/plain;q=0.1", - }; - - private static readonly string[] BulletinAcceptHeaders = - { - "application/zip", - "application/octet-stream", - "application/x-zip-compressed", - }; - - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly RuNkckiOptions _options; - private readonly TimeProvider _timeProvider; + +namespace StellaOps.Concelier.Connector.Ru.Nkcki; + +public sealed class RuNkckiConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) + { + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + WriteIndented = false, + }; + + private static readonly string[] ListingAcceptHeaders = + { + "text/html", + "application/xhtml+xml;q=0.9", + "text/plain;q=0.1", + }; + + private static readonly string[] BulletinAcceptHeaders = + { + "application/zip", + "application/octet-stream", + "application/x-zip-compressed", + }; + + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly RuNkckiOptions _options; + private readonly TimeProvider _timeProvider; private readonly RuNkckiDiagnostics _diagnostics; private readonly ILogger _logger; private readonly string _cacheDirectory; private readonly ICryptoHash _hash; private readonly HtmlParser _htmlParser = new(); - - public 
RuNkckiConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IOptions options, + + public RuNkckiConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IOptions options, RuNkckiDiagnostics diagnostics, TimeProvider? timeProvider, ILogger logger, ICryptoHash cryptoHash) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); _options.Validate(); _diagnostics = diagnostics ?? 
throw new ArgumentNullException(nameof(diagnostics)); @@ -88,862 +88,862 @@ public sealed class RuNkckiConnector : IFeedConnector _logger = logger ?? throw new ArgumentNullException(nameof(logger)); _hash = cryptoHash ?? throw new ArgumentNullException(nameof(cryptoHash)); _cacheDirectory = ResolveCacheDirectory(_options.CacheDirectory); - EnsureCacheDirectory(); - } - - public string SourceName => RuNkckiConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var knownBulletins = cursor.KnownBulletins.ToHashSet(StringComparer.OrdinalIgnoreCase); - var now = _timeProvider.GetUtcNow(); - var processed = 0; - - if (ShouldUseListingCache(cursor, now)) - { - _logger.LogDebug( - "NKCKI listing fetch skipped (cache duration {CacheDuration:c}); processing cached bulletins only", - _options.ListingCacheDuration); - - processed = await ProcessCachedBulletinsAsync(pendingDocuments, pendingMappings, knownBulletins, now, processed, cancellationToken).ConfigureAwait(false); - await UpdateCursorAsync(cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithKnownBulletins(NormalizeBulletins(knownBulletins)) - .WithLastListingFetch(cursor.LastListingFetchAt ?? 
now), cancellationToken).ConfigureAwait(false); - return; - } - - ListingFetchSummary listingSummary; - try - { - listingSummary = await LoadListingAsync(cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException) - { - _logger.LogWarning(ex, "NKCKI listing fetch failed; attempting cached bulletins"); - _diagnostics.ListingFetchFailure(ex.Message); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); - - processed = await ProcessCachedBulletinsAsync(pendingDocuments, pendingMappings, knownBulletins, now, processed, cancellationToken).ConfigureAwait(false); - await UpdateCursorAsync(cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithKnownBulletins(NormalizeBulletins(knownBulletins)) - .WithLastListingFetch(cursor.LastListingFetchAt ?? now), cancellationToken).ConfigureAwait(false); - return; - } - - var uniqueAttachments = listingSummary.Attachments - .GroupBy(static attachment => attachment.Id, StringComparer.OrdinalIgnoreCase) - .Select(static group => group.First()) - .OrderBy(static attachment => attachment.Id, StringComparer.OrdinalIgnoreCase) - .ToList(); - - var newAttachments = uniqueAttachments - .Where(attachment => !knownBulletins.Contains(attachment.Id)) - .Take(_options.MaxBulletinsPerFetch) - .ToList(); - - _diagnostics.ListingFetchSuccess(listingSummary.PagesVisited, uniqueAttachments.Count, newAttachments.Count); - - if (newAttachments.Count == 0) - { - _logger.LogDebug("NKCKI listing contained no new bulletin attachments"); - processed = await ProcessCachedBulletinsAsync(pendingDocuments, pendingMappings, knownBulletins, now, processed, cancellationToken).ConfigureAwait(false); - await UpdateCursorAsync(cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithKnownBulletins(NormalizeBulletins(knownBulletins)) - 
.WithLastListingFetch(now), cancellationToken).ConfigureAwait(false); - return; - } - - var downloaded = 0; - var cachedUsed = 0; - var failures = 0; - - foreach (var attachment in newAttachments) - { - cancellationToken.ThrowIfCancellationRequested(); - - try - { - var request = new SourceFetchRequest(RuNkckiOptions.HttpClientName, SourceName, attachment.Uri) - { - AcceptHeaders = BulletinAcceptHeaders, - TimeoutOverride = _options.RequestTimeout, - }; - - var attachmentResult = await _fetchService.FetchContentAsync(request, cancellationToken).ConfigureAwait(false); - if (!attachmentResult.IsSuccess || attachmentResult.Content is null) - { - if (TryReadCachedBulletin(attachment.Id, out var cachedBytes)) - { - _diagnostics.BulletinFetchCached(); - cachedUsed++; - _logger.LogWarning("NKCKI bulletin {BulletinId} unavailable (status={Status}); using cached artefact", attachment.Id, attachmentResult.StatusCode); - processed = await ProcessBulletinEntriesAsync(cachedBytes, attachment.Id, pendingDocuments, pendingMappings, now, processed, cancellationToken).ConfigureAwait(false); - knownBulletins.Add(attachment.Id); - } - else - { - _diagnostics.BulletinFetchFailure(attachmentResult.StatusCode.ToString()); - failures++; - _logger.LogWarning("NKCKI bulletin {BulletinId} returned no content (status={Status})", attachment.Id, attachmentResult.StatusCode); - } - - continue; - } - - _diagnostics.BulletinFetchSuccess(); - downloaded++; - TryWriteCachedBulletin(attachment.Id, attachmentResult.Content); - processed = await ProcessBulletinEntriesAsync(attachmentResult.Content, attachment.Id, pendingDocuments, pendingMappings, now, processed, cancellationToken).ConfigureAwait(false); - knownBulletins.Add(attachment.Id); - } - catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException) - { - if (TryReadCachedBulletin(attachment.Id, out var cachedBytes)) - { - _diagnostics.BulletinFetchCached(); - cachedUsed++; - _logger.LogWarning(ex, "NKCKI bulletin fetch 
failed for {BulletinId}; using cached artefact", attachment.Id); - processed = await ProcessBulletinEntriesAsync(cachedBytes, attachment.Id, pendingDocuments, pendingMappings, now, processed, cancellationToken).ConfigureAwait(false); - knownBulletins.Add(attachment.Id); - } - else - { - _diagnostics.BulletinFetchFailure(ex.Message); - failures++; - _logger.LogWarning(ex, "NKCKI bulletin fetch failed for {BulletinId}", attachment.Id); - await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - } - - if (processed >= _options.MaxVulnerabilitiesPerFetch) - { - break; - } - - if (_options.RequestDelay > TimeSpan.Zero) - { - try - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - catch (TaskCanceledException) - { - break; - } - } - } - - if (processed < _options.MaxVulnerabilitiesPerFetch) - { - processed = await ProcessCachedBulletinsAsync(pendingDocuments, pendingMappings, knownBulletins, now, processed, cancellationToken).ConfigureAwait(false); - } - - var normalizedBulletins = NormalizeBulletins(knownBulletins); - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithKnownBulletins(normalizedBulletins) - .WithLastListingFetch(now); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - - _logger.LogInformation( - "NKCKI fetch complete: new bulletins {Downloaded}, cached bulletins {Cached}, failures {Failures}, processed entries {Processed}, pending documents {PendingDocuments}, pending mappings {PendingMappings}", - downloaded, - cachedUsed, - failures, - processed, - pendingDocuments.Count, - pendingMappings.Count); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await 
GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("NKCKI document {DocumentId} missing GridFS payload", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - byte[] payload; - try - { - payload = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "NKCKI unable to download raw document {DocumentId}", documentId); - throw; - } - - RuNkckiVulnerabilityDto? 
dto; - try - { - dto = JsonSerializer.Deserialize(payload, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "NKCKI failed to deserialize document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (dto is null) - { - _logger.LogWarning("NKCKI document {DocumentId} produced null DTO", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - var bson = MongoDB.Bson.BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); - var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "ru-nkcki.v1", bson, _timeProvider.GetUtcNow()); - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - pendingDocuments.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await 
_documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - if (dtoRecord is null) - { - _logger.LogWarning("NKCKI document {DocumentId} missing DTO payload", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - RuNkckiVulnerabilityDto dto; - try - { - dto = JsonSerializer.Deserialize(dtoRecord.Payload.ToString(), SerializerOptions) ?? throw new InvalidOperationException("DTO deserialized to null"); - } - catch (Exception ex) - { - _logger.LogError(ex, "NKCKI failed to deserialize DTO for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - try - { - var advisory = RuNkckiMapper.Map(dto, document, dtoRecord.ValidatedAt); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - } - catch (Exception ex) - { - _logger.LogError(ex, "NKCKI mapping failed for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - } - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task ProcessCachedBulletinsAsync( - HashSet pendingDocuments, - HashSet pendingMappings, - HashSet knownBulletins, - DateTimeOffset now, - int 
processed, - CancellationToken cancellationToken) - { - if (!Directory.Exists(_cacheDirectory)) - { - return processed; - } - - var updated = processed; - var cacheFiles = Directory - .EnumerateFiles(_cacheDirectory, "*.json.zip", SearchOption.TopDirectoryOnly) - .OrderBy(static path => path, StringComparer.OrdinalIgnoreCase) - .ToList(); - - foreach (var filePath in cacheFiles) - { - cancellationToken.ThrowIfCancellationRequested(); - - var bulletinId = ExtractBulletinIdFromCachePath(filePath); - if (string.IsNullOrWhiteSpace(bulletinId) || knownBulletins.Contains(bulletinId)) - { - continue; - } - - byte[] content; - try - { - content = File.ReadAllBytes(filePath); - } - catch (Exception ex) - { - _logger.LogDebug(ex, "NKCKI failed to read cached bulletin at {CachePath}", filePath); - continue; - } - - _diagnostics.BulletinFetchCached(); - updated = await ProcessBulletinEntriesAsync(content, bulletinId, pendingDocuments, pendingMappings, now, updated, cancellationToken).ConfigureAwait(false); - knownBulletins.Add(bulletinId); - - if (updated >= _options.MaxVulnerabilitiesPerFetch) - { - break; - } - } - - return updated; - } - - private async Task ProcessBulletinEntriesAsync( - byte[] content, - string bulletinId, - HashSet pendingDocuments, - HashSet pendingMappings, - DateTimeOffset now, - int processed, - CancellationToken cancellationToken) - { - if (content.Length == 0) - { - return processed; - } - - var updated = processed; - using var archiveStream = new MemoryStream(content, writable: false); - using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read, leaveOpen: false); - - foreach (var entry in archive.Entries.OrderBy(static e => e.FullName, StringComparer.OrdinalIgnoreCase)) - { - cancellationToken.ThrowIfCancellationRequested(); - - if (!entry.FullName.EndsWith(".json", StringComparison.OrdinalIgnoreCase)) - { - continue; - } - - using var entryStream = entry.Open(); - using var buffer = new MemoryStream(); - await 
entryStream.CopyToAsync(buffer, cancellationToken).ConfigureAwait(false); - - if (buffer.Length == 0) - { - continue; - } - - buffer.Position = 0; - - using var document = await JsonDocument.ParseAsync(buffer, cancellationToken: cancellationToken).ConfigureAwait(false); - updated = await ProcessBulletinJsonElementAsync(document.RootElement, entry.FullName, bulletinId, pendingDocuments, pendingMappings, now, updated, cancellationToken).ConfigureAwait(false); - - if (updated >= _options.MaxVulnerabilitiesPerFetch) - { - break; - } - } - - var delta = updated - processed; - if (delta > 0) - { - _diagnostics.EntriesProcessed(delta); - } - - return updated; - } - - private async Task ProcessBulletinJsonElementAsync( - JsonElement element, - string entryName, - string bulletinId, - HashSet pendingDocuments, - HashSet pendingMappings, - DateTimeOffset now, - int processed, - CancellationToken cancellationToken) - { - var updated = processed; - - switch (element.ValueKind) - { - case JsonValueKind.Array: - foreach (var child in element.EnumerateArray()) - { - cancellationToken.ThrowIfCancellationRequested(); - - if (updated >= _options.MaxVulnerabilitiesPerFetch) - { - break; - } - - if (child.ValueKind != JsonValueKind.Object) - { - continue; - } - - if (await ProcessVulnerabilityObjectAsync(child, entryName, bulletinId, pendingDocuments, pendingMappings, now, cancellationToken).ConfigureAwait(false)) - { - updated++; - } - } - - break; - - case JsonValueKind.Object: - if (await ProcessVulnerabilityObjectAsync(element, entryName, bulletinId, pendingDocuments, pendingMappings, now, cancellationToken).ConfigureAwait(false)) - { - updated++; - } - - break; - } - - return updated; - } - - private async Task ProcessVulnerabilityObjectAsync( - JsonElement element, - string entryName, - string bulletinId, - HashSet pendingDocuments, - HashSet pendingMappings, - DateTimeOffset now, - CancellationToken cancellationToken) - { - RuNkckiVulnerabilityDto dto; - try - { - dto = 
RuNkckiJsonParser.Parse(element); - } - catch (Exception ex) - { - _logger.LogDebug(ex, "NKCKI failed to parse vulnerability in bulletin {BulletinId} entry {Entry}", bulletinId, entryName); - return false; - } - - var payload = JsonSerializer.SerializeToUtf8Bytes(dto, SerializerOptions); + EnsureCacheDirectory(); + } + + public string SourceName => RuNkckiConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var knownBulletins = cursor.KnownBulletins.ToHashSet(StringComparer.OrdinalIgnoreCase); + var now = _timeProvider.GetUtcNow(); + var processed = 0; + + if (ShouldUseListingCache(cursor, now)) + { + _logger.LogDebug( + "NKCKI listing fetch skipped (cache duration {CacheDuration:c}); processing cached bulletins only", + _options.ListingCacheDuration); + + processed = await ProcessCachedBulletinsAsync(pendingDocuments, pendingMappings, knownBulletins, now, processed, cancellationToken).ConfigureAwait(false); + await UpdateCursorAsync(cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithKnownBulletins(NormalizeBulletins(knownBulletins)) + .WithLastListingFetch(cursor.LastListingFetchAt ?? 
now), cancellationToken).ConfigureAwait(false); + return; + } + + ListingFetchSummary listingSummary; + try + { + listingSummary = await LoadListingAsync(cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException) + { + _logger.LogWarning(ex, "NKCKI listing fetch failed; attempting cached bulletins"); + _diagnostics.ListingFetchFailure(ex.Message); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); + + processed = await ProcessCachedBulletinsAsync(pendingDocuments, pendingMappings, knownBulletins, now, processed, cancellationToken).ConfigureAwait(false); + await UpdateCursorAsync(cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithKnownBulletins(NormalizeBulletins(knownBulletins)) + .WithLastListingFetch(cursor.LastListingFetchAt ?? now), cancellationToken).ConfigureAwait(false); + return; + } + + var uniqueAttachments = listingSummary.Attachments + .GroupBy(static attachment => attachment.Id, StringComparer.OrdinalIgnoreCase) + .Select(static group => group.First()) + .OrderBy(static attachment => attachment.Id, StringComparer.OrdinalIgnoreCase) + .ToList(); + + var newAttachments = uniqueAttachments + .Where(attachment => !knownBulletins.Contains(attachment.Id)) + .Take(_options.MaxBulletinsPerFetch) + .ToList(); + + _diagnostics.ListingFetchSuccess(listingSummary.PagesVisited, uniqueAttachments.Count, newAttachments.Count); + + if (newAttachments.Count == 0) + { + _logger.LogDebug("NKCKI listing contained no new bulletin attachments"); + processed = await ProcessCachedBulletinsAsync(pendingDocuments, pendingMappings, knownBulletins, now, processed, cancellationToken).ConfigureAwait(false); + await UpdateCursorAsync(cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithKnownBulletins(NormalizeBulletins(knownBulletins)) + 
.WithLastListingFetch(now), cancellationToken).ConfigureAwait(false); + return; + } + + var downloaded = 0; + var cachedUsed = 0; + var failures = 0; + + foreach (var attachment in newAttachments) + { + cancellationToken.ThrowIfCancellationRequested(); + + try + { + var request = new SourceFetchRequest(RuNkckiOptions.HttpClientName, SourceName, attachment.Uri) + { + AcceptHeaders = BulletinAcceptHeaders, + TimeoutOverride = _options.RequestTimeout, + }; + + var attachmentResult = await _fetchService.FetchContentAsync(request, cancellationToken).ConfigureAwait(false); + if (!attachmentResult.IsSuccess || attachmentResult.Content is null) + { + if (TryReadCachedBulletin(attachment.Id, out var cachedBytes)) + { + _diagnostics.BulletinFetchCached(); + cachedUsed++; + _logger.LogWarning("NKCKI bulletin {BulletinId} unavailable (status={Status}); using cached artefact", attachment.Id, attachmentResult.StatusCode); + processed = await ProcessBulletinEntriesAsync(cachedBytes, attachment.Id, pendingDocuments, pendingMappings, now, processed, cancellationToken).ConfigureAwait(false); + knownBulletins.Add(attachment.Id); + } + else + { + _diagnostics.BulletinFetchFailure(attachmentResult.StatusCode.ToString()); + failures++; + _logger.LogWarning("NKCKI bulletin {BulletinId} returned no content (status={Status})", attachment.Id, attachmentResult.StatusCode); + } + + continue; + } + + _diagnostics.BulletinFetchSuccess(); + downloaded++; + TryWriteCachedBulletin(attachment.Id, attachmentResult.Content); + processed = await ProcessBulletinEntriesAsync(attachmentResult.Content, attachment.Id, pendingDocuments, pendingMappings, now, processed, cancellationToken).ConfigureAwait(false); + knownBulletins.Add(attachment.Id); + } + catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException) + { + if (TryReadCachedBulletin(attachment.Id, out var cachedBytes)) + { + _diagnostics.BulletinFetchCached(); + cachedUsed++; + _logger.LogWarning(ex, "NKCKI bulletin fetch 
failed for {BulletinId}; using cached artefact", attachment.Id); + processed = await ProcessBulletinEntriesAsync(cachedBytes, attachment.Id, pendingDocuments, pendingMappings, now, processed, cancellationToken).ConfigureAwait(false); + knownBulletins.Add(attachment.Id); + } + else + { + _diagnostics.BulletinFetchFailure(ex.Message); + failures++; + _logger.LogWarning(ex, "NKCKI bulletin fetch failed for {BulletinId}", attachment.Id); + await _stateRepository.MarkFailureAsync(SourceName, now, _options.FailureBackoff, ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + } + + if (processed >= _options.MaxVulnerabilitiesPerFetch) + { + break; + } + + if (_options.RequestDelay > TimeSpan.Zero) + { + try + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + catch (TaskCanceledException) + { + break; + } + } + } + + if (processed < _options.MaxVulnerabilitiesPerFetch) + { + processed = await ProcessCachedBulletinsAsync(pendingDocuments, pendingMappings, knownBulletins, now, processed, cancellationToken).ConfigureAwait(false); + } + + var normalizedBulletins = NormalizeBulletins(knownBulletins); + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithKnownBulletins(normalizedBulletins) + .WithLastListingFetch(now); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + + _logger.LogInformation( + "NKCKI fetch complete: new bulletins {Downloaded}, cached bulletins {Cached}, failures {Failures}, processed entries {Processed}, pending documents {PendingDocuments}, pending mappings {PendingMappings}", + downloaded, + cachedUsed, + failures, + processed, + pendingDocuments.Count, + pendingMappings.Count); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await 
GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("NKCKI document {DocumentId} missing GridFS payload", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + byte[] payload; + try + { + payload = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "NKCKI unable to download raw document {DocumentId}", documentId); + throw; + } + + RuNkckiVulnerabilityDto? 
dto; + try + { + dto = JsonSerializer.Deserialize(payload, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "NKCKI failed to deserialize document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + if (dto is null) + { + _logger.LogWarning("NKCKI document {DocumentId} produced null DTO", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + var bson = MongoDB.Bson.BsonDocument.Parse(JsonSerializer.Serialize(dto, SerializerOptions)); + var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "ru-nkcki.v1", bson, _timeProvider.GetUtcNow()); + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + pendingDocuments.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await 
_documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + if (dtoRecord is null) + { + _logger.LogWarning("NKCKI document {DocumentId} missing DTO payload", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + RuNkckiVulnerabilityDto dto; + try + { + dto = JsonSerializer.Deserialize(dtoRecord.Payload.ToString(), SerializerOptions) ?? throw new InvalidOperationException("DTO deserialized to null"); + } + catch (Exception ex) + { + _logger.LogError(ex, "NKCKI failed to deserialize DTO for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + try + { + var advisory = RuNkckiMapper.Map(dto, document, dtoRecord.ValidatedAt); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + } + catch (Exception ex) + { + _logger.LogError(ex, "NKCKI mapping failed for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + } + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task ProcessCachedBulletinsAsync( + HashSet pendingDocuments, + HashSet pendingMappings, + HashSet knownBulletins, + DateTimeOffset now, + int 
processed, + CancellationToken cancellationToken) + { + if (!Directory.Exists(_cacheDirectory)) + { + return processed; + } + + var updated = processed; + var cacheFiles = Directory + .EnumerateFiles(_cacheDirectory, "*.json.zip", SearchOption.TopDirectoryOnly) + .OrderBy(static path => path, StringComparer.OrdinalIgnoreCase) + .ToList(); + + foreach (var filePath in cacheFiles) + { + cancellationToken.ThrowIfCancellationRequested(); + + var bulletinId = ExtractBulletinIdFromCachePath(filePath); + if (string.IsNullOrWhiteSpace(bulletinId) || knownBulletins.Contains(bulletinId)) + { + continue; + } + + byte[] content; + try + { + content = File.ReadAllBytes(filePath); + } + catch (Exception ex) + { + _logger.LogDebug(ex, "NKCKI failed to read cached bulletin at {CachePath}", filePath); + continue; + } + + _diagnostics.BulletinFetchCached(); + updated = await ProcessBulletinEntriesAsync(content, bulletinId, pendingDocuments, pendingMappings, now, updated, cancellationToken).ConfigureAwait(false); + knownBulletins.Add(bulletinId); + + if (updated >= _options.MaxVulnerabilitiesPerFetch) + { + break; + } + } + + return updated; + } + + private async Task ProcessBulletinEntriesAsync( + byte[] content, + string bulletinId, + HashSet pendingDocuments, + HashSet pendingMappings, + DateTimeOffset now, + int processed, + CancellationToken cancellationToken) + { + if (content.Length == 0) + { + return processed; + } + + var updated = processed; + using var archiveStream = new MemoryStream(content, writable: false); + using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read, leaveOpen: false); + + foreach (var entry in archive.Entries.OrderBy(static e => e.FullName, StringComparer.OrdinalIgnoreCase)) + { + cancellationToken.ThrowIfCancellationRequested(); + + if (!entry.FullName.EndsWith(".json", StringComparison.OrdinalIgnoreCase)) + { + continue; + } + + using var entryStream = entry.Open(); + using var buffer = new MemoryStream(); + await 
entryStream.CopyToAsync(buffer, cancellationToken).ConfigureAwait(false); + + if (buffer.Length == 0) + { + continue; + } + + buffer.Position = 0; + + using var document = await JsonDocument.ParseAsync(buffer, cancellationToken: cancellationToken).ConfigureAwait(false); + updated = await ProcessBulletinJsonElementAsync(document.RootElement, entry.FullName, bulletinId, pendingDocuments, pendingMappings, now, updated, cancellationToken).ConfigureAwait(false); + + if (updated >= _options.MaxVulnerabilitiesPerFetch) + { + break; + } + } + + var delta = updated - processed; + if (delta > 0) + { + _diagnostics.EntriesProcessed(delta); + } + + return updated; + } + + private async Task ProcessBulletinJsonElementAsync( + JsonElement element, + string entryName, + string bulletinId, + HashSet pendingDocuments, + HashSet pendingMappings, + DateTimeOffset now, + int processed, + CancellationToken cancellationToken) + { + var updated = processed; + + switch (element.ValueKind) + { + case JsonValueKind.Array: + foreach (var child in element.EnumerateArray()) + { + cancellationToken.ThrowIfCancellationRequested(); + + if (updated >= _options.MaxVulnerabilitiesPerFetch) + { + break; + } + + if (child.ValueKind != JsonValueKind.Object) + { + continue; + } + + if (await ProcessVulnerabilityObjectAsync(child, entryName, bulletinId, pendingDocuments, pendingMappings, now, cancellationToken).ConfigureAwait(false)) + { + updated++; + } + } + + break; + + case JsonValueKind.Object: + if (await ProcessVulnerabilityObjectAsync(element, entryName, bulletinId, pendingDocuments, pendingMappings, now, cancellationToken).ConfigureAwait(false)) + { + updated++; + } + + break; + } + + return updated; + } + + private async Task ProcessVulnerabilityObjectAsync( + JsonElement element, + string entryName, + string bulletinId, + HashSet pendingDocuments, + HashSet pendingMappings, + DateTimeOffset now, + CancellationToken cancellationToken) + { + RuNkckiVulnerabilityDto dto; + try + { + dto = 
RuNkckiJsonParser.Parse(element); + } + catch (Exception ex) + { + _logger.LogDebug(ex, "NKCKI failed to parse vulnerability in bulletin {BulletinId} entry {Entry}", bulletinId, entryName); + return false; + } + + var payload = JsonSerializer.SerializeToUtf8Bytes(dto, SerializerOptions); var sha = _hash.ComputeHashHex(payload); - var documentUri = BuildDocumentUri(dto); - - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, documentUri, cancellationToken).ConfigureAwait(false); - if (existing is not null && string.Equals(existing.Sha256, sha, StringComparison.OrdinalIgnoreCase)) - { - return false; - } - - var gridFsId = await _rawDocumentStorage.UploadAsync(SourceName, documentUri, payload, "application/json", null, cancellationToken).ConfigureAwait(false); - - var metadata = new Dictionary(StringComparer.OrdinalIgnoreCase) - { - ["ru-nkcki.bulletin"] = bulletinId, - ["ru-nkcki.entry"] = entryName, - }; - - if (!string.IsNullOrWhiteSpace(dto.FstecId)) - { - metadata["ru-nkcki.fstec_id"] = dto.FstecId!; - } - - if (!string.IsNullOrWhiteSpace(dto.MitreId)) - { - metadata["ru-nkcki.mitre_id"] = dto.MitreId!; - } - - var recordId = existing?.Id ?? Guid.NewGuid(); - var lastModified = dto.DateUpdated ?? 
dto.DatePublished; - var record = new DocumentRecord( - recordId, - SourceName, - documentUri, - now, - sha, - DocumentStatuses.PendingParse, - "application/json", - Headers: null, - Metadata: metadata, - Etag: null, - LastModified: lastModified, - GridFsId: gridFsId, - ExpiresAt: null); - - var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); - pendingDocuments.Add(upserted.Id); - pendingMappings.Remove(upserted.Id); - return true; - } - - private Task FetchListingPageAsync(Uri pageUri, CancellationToken cancellationToken) - { - var request = new SourceFetchRequest(RuNkckiOptions.HttpClientName, SourceName, pageUri) - { - AcceptHeaders = ListingAcceptHeaders, - TimeoutOverride = _options.RequestTimeout, - }; - - return _fetchService.FetchContentAsync(request, cancellationToken); - } - - private async Task ParseListingAsync(Uri pageUri, byte[] content, CancellationToken cancellationToken) - { - var html = Encoding.UTF8.GetString(content); - var document = await _htmlParser.ParseDocumentAsync(html, cancellationToken).ConfigureAwait(false); - var attachments = new List(); - var pagination = new List(); - - foreach (var anchor in document.QuerySelectorAll("a[href$='.json.zip']")) - { - var href = anchor.GetAttribute("href"); - if (string.IsNullOrWhiteSpace(href)) - { - continue; - } - - if (!Uri.TryCreate(pageUri, href, out var absoluteUri)) - { - continue; - } - - var id = DeriveBulletinId(absoluteUri); - if (string.IsNullOrWhiteSpace(id)) - { - continue; - } - - var title = anchor.GetAttribute("title"); - if (string.IsNullOrWhiteSpace(title)) - { - title = anchor.TextContent?.Trim(); - } - - attachments.Add(new BulletinAttachment(id, absoluteUri, title ?? 
id)); - } - - foreach (var anchor in document.QuerySelectorAll("a[href]")) - { - var href = anchor.GetAttribute("href"); - if (string.IsNullOrWhiteSpace(href)) - { - continue; - } - - if (!href.Contains("PAGEN", StringComparison.OrdinalIgnoreCase) - && !href.Contains("page=", StringComparison.OrdinalIgnoreCase)) - { - continue; - } - - if (Uri.TryCreate(pageUri, href, out var absoluteUri)) - { - pagination.Add(absoluteUri); - } - } - - var uniquePagination = pagination - .DistinctBy(static uri => uri.AbsoluteUri, StringComparer.OrdinalIgnoreCase) - .Take(_options.MaxListingPagesPerFetch) - .ToList(); - - return new ListingPageResult(attachments, uniquePagination); - } - - private static string DeriveBulletinId(Uri uri) - { - var fileName = Path.GetFileName(uri.AbsolutePath); - if (string.IsNullOrWhiteSpace(fileName)) - { - return Guid.NewGuid().ToString("N"); - } - - if (fileName.EndsWith(".zip", StringComparison.OrdinalIgnoreCase)) - { - fileName = fileName[..^4]; - } - - if (fileName.EndsWith(".json", StringComparison.OrdinalIgnoreCase)) - { - fileName = fileName[..^5]; - } - - return fileName.Replace('_', '-'); - } - - private static string BuildDocumentUri(RuNkckiVulnerabilityDto dto) - { - if (!string.IsNullOrWhiteSpace(dto.FstecId)) - { - var slug = dto.FstecId.Contains(':', StringComparison.Ordinal) - ? dto.FstecId[(dto.FstecId.IndexOf(':') + 1)..] - : dto.FstecId; - return $"https://cert.gov.ru/materialy/uyazvimosti/{slug}"; - } - - if (!string.IsNullOrWhiteSpace(dto.MitreId)) - { - return $"https://nvd.nist.gov/vuln/detail/{dto.MitreId}"; - } - - return $"https://cert.gov.ru/materialy/uyazvimosti/{Guid.NewGuid():N}"; - } - - private string ResolveCacheDirectory(string? configuredPath) - { - if (!string.IsNullOrWhiteSpace(configuredPath)) - { - return Path.GetFullPath(Path.IsPathRooted(configuredPath) - ? 
configuredPath - : Path.Combine(AppContext.BaseDirectory, configuredPath)); - } - - return Path.Combine(AppContext.BaseDirectory, "cache", RuNkckiConnectorPlugin.SourceName); - } - - private void EnsureCacheDirectory() - { - try - { - Directory.CreateDirectory(_cacheDirectory); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "NKCKI unable to ensure cache directory {CachePath}", _cacheDirectory); - } - } - - private string GetBulletinCachePath(string bulletinId) - { - var fileStem = string.IsNullOrWhiteSpace(bulletinId) - ? Guid.NewGuid().ToString("N") - : Uri.EscapeDataString(bulletinId); - return Path.Combine(_cacheDirectory, $"{fileStem}.json.zip"); - } - - private static string ExtractBulletinIdFromCachePath(string path) - { - if (string.IsNullOrWhiteSpace(path)) - { - return string.Empty; - } - - var fileName = Path.GetFileName(path); - if (fileName.EndsWith(".zip", StringComparison.OrdinalIgnoreCase)) - { - fileName = fileName[..^4]; - } - - if (fileName.EndsWith(".json", StringComparison.OrdinalIgnoreCase)) - { - fileName = fileName[..^5]; - } - - return Uri.UnescapeDataString(fileName); - } - - private void TryWriteCachedBulletin(string bulletinId, byte[] content) - { - try - { - var cachePath = GetBulletinCachePath(bulletinId); - Directory.CreateDirectory(Path.GetDirectoryName(cachePath)!); - File.WriteAllBytes(cachePath, content); - } - catch (Exception ex) - { - _logger.LogDebug(ex, "NKCKI failed to cache bulletin {BulletinId}", bulletinId); - } - } - - private bool TryReadCachedBulletin(string bulletinId, out byte[] content) - { - var cachePath = GetBulletinCachePath(bulletinId); - try - { - if (File.Exists(cachePath)) - { - content = File.ReadAllBytes(cachePath); - return true; - } - } - catch (Exception ex) - { - _logger.LogDebug(ex, "NKCKI failed to read cached bulletin {BulletinId}", bulletinId); - } - - content = Array.Empty(); - return false; - } - - private IReadOnlyCollection NormalizeBulletins(IEnumerable bulletins) - { - var normalized 
= (bulletins ?? Enumerable.Empty()) - .Where(static id => !string.IsNullOrWhiteSpace(id)) - .Distinct(StringComparer.OrdinalIgnoreCase) - .OrderBy(static id => id, StringComparer.OrdinalIgnoreCase) - .ToList(); - - if (normalized.Count <= _options.KnownBulletinCapacity) - { - return normalized.ToArray(); - } - - var skip = normalized.Count - _options.KnownBulletinCapacity; - return normalized.Skip(skip).ToArray(); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? RuNkckiCursor.Empty : RuNkckiCursor.FromBson(state.Cursor); - } - - private Task UpdateCursorAsync(RuNkckiCursor cursor, CancellationToken cancellationToken) - { - var document = cursor.ToBsonDocument(); - var completedAt = cursor.LastListingFetchAt ?? _timeProvider.GetUtcNow(); - return _stateRepository.UpdateCursorAsync(SourceName, document, completedAt, cancellationToken); - } - - private readonly record struct ListingFetchSummary(IReadOnlyList Attachments, int PagesVisited); - - private readonly record struct ListingPageResult(IReadOnlyList Attachments, IReadOnlyList PaginationLinks); - - private readonly record struct BulletinAttachment(string Id, Uri Uri, string Title); - - private bool ShouldUseListingCache(RuNkckiCursor cursor, DateTimeOffset now) - { - if (!cursor.LastListingFetchAt.HasValue) - { - return false; - } - - var age = now - cursor.LastListingFetchAt.Value; - return age < _options.ListingCacheDuration; - } - - private async Task LoadListingAsync(CancellationToken cancellationToken) - { - var attachments = new List(); - var visited = 0; - var visitedUris = new HashSet(StringComparer.OrdinalIgnoreCase); - var queue = new Queue(); - queue.Enqueue(_options.ListingUri); - - while (queue.Count > 0 && visited < _options.MaxListingPagesPerFetch) - { - cancellationToken.ThrowIfCancellationRequested(); - - var pageUri = queue.Dequeue(); - 
if (!visitedUris.Add(pageUri.AbsoluteUri)) - { - continue; - } - - _diagnostics.ListingFetchAttempt(); - - var listingResult = await FetchListingPageAsync(pageUri, cancellationToken).ConfigureAwait(false); - if (!listingResult.IsSuccess || listingResult.Content is null) - { - _diagnostics.ListingFetchFailure(listingResult.StatusCode.ToString()); - _logger.LogWarning("NKCKI listing page {ListingUri} returned no content (status={Status})", pageUri, listingResult.StatusCode); - continue; - } - - visited++; - - var page = await ParseListingAsync(pageUri, listingResult.Content, cancellationToken).ConfigureAwait(false); - attachments.AddRange(page.Attachments); - - foreach (var link in page.PaginationLinks) - { - if (!visitedUris.Contains(link.AbsoluteUri) && queue.Count + visitedUris.Count < _options.MaxListingPagesPerFetch) - { - queue.Enqueue(link); - } - } - - if (attachments.Count >= _options.MaxBulletinsPerFetch * 2) - { - break; - } - } - - return new ListingFetchSummary(attachments, visited); - } -} + var documentUri = BuildDocumentUri(dto); + + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, documentUri, cancellationToken).ConfigureAwait(false); + if (existing is not null && string.Equals(existing.Sha256, sha, StringComparison.OrdinalIgnoreCase)) + { + return false; + } + + var gridFsId = await _rawDocumentStorage.UploadAsync(SourceName, documentUri, payload, "application/json", null, cancellationToken).ConfigureAwait(false); + + var metadata = new Dictionary(StringComparer.OrdinalIgnoreCase) + { + ["ru-nkcki.bulletin"] = bulletinId, + ["ru-nkcki.entry"] = entryName, + }; + + if (!string.IsNullOrWhiteSpace(dto.FstecId)) + { + metadata["ru-nkcki.fstec_id"] = dto.FstecId!; + } + + if (!string.IsNullOrWhiteSpace(dto.MitreId)) + { + metadata["ru-nkcki.mitre_id"] = dto.MitreId!; + } + + var recordId = existing?.Id ?? Guid.NewGuid(); + var lastModified = dto.DateUpdated ?? 
dto.DatePublished; + var record = new DocumentRecord( + recordId, + SourceName, + documentUri, + now, + sha, + DocumentStatuses.PendingParse, + "application/json", + Headers: null, + Metadata: metadata, + Etag: null, + LastModified: lastModified, + PayloadId: gridFsId, + ExpiresAt: null); + + var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); + pendingDocuments.Add(upserted.Id); + pendingMappings.Remove(upserted.Id); + return true; + } + + private Task FetchListingPageAsync(Uri pageUri, CancellationToken cancellationToken) + { + var request = new SourceFetchRequest(RuNkckiOptions.HttpClientName, SourceName, pageUri) + { + AcceptHeaders = ListingAcceptHeaders, + TimeoutOverride = _options.RequestTimeout, + }; + + return _fetchService.FetchContentAsync(request, cancellationToken); + } + + private async Task ParseListingAsync(Uri pageUri, byte[] content, CancellationToken cancellationToken) + { + var html = Encoding.UTF8.GetString(content); + var document = await _htmlParser.ParseDocumentAsync(html, cancellationToken).ConfigureAwait(false); + var attachments = new List(); + var pagination = new List(); + + foreach (var anchor in document.QuerySelectorAll("a[href$='.json.zip']")) + { + var href = anchor.GetAttribute("href"); + if (string.IsNullOrWhiteSpace(href)) + { + continue; + } + + if (!Uri.TryCreate(pageUri, href, out var absoluteUri)) + { + continue; + } + + var id = DeriveBulletinId(absoluteUri); + if (string.IsNullOrWhiteSpace(id)) + { + continue; + } + + var title = anchor.GetAttribute("title"); + if (string.IsNullOrWhiteSpace(title)) + { + title = anchor.TextContent?.Trim(); + } + + attachments.Add(new BulletinAttachment(id, absoluteUri, title ?? 
id)); + } + + foreach (var anchor in document.QuerySelectorAll("a[href]")) + { + var href = anchor.GetAttribute("href"); + if (string.IsNullOrWhiteSpace(href)) + { + continue; + } + + if (!href.Contains("PAGEN", StringComparison.OrdinalIgnoreCase) + && !href.Contains("page=", StringComparison.OrdinalIgnoreCase)) + { + continue; + } + + if (Uri.TryCreate(pageUri, href, out var absoluteUri)) + { + pagination.Add(absoluteUri); + } + } + + var uniquePagination = pagination + .DistinctBy(static uri => uri.AbsoluteUri, StringComparer.OrdinalIgnoreCase) + .Take(_options.MaxListingPagesPerFetch) + .ToList(); + + return new ListingPageResult(attachments, uniquePagination); + } + + private static string DeriveBulletinId(Uri uri) + { + var fileName = Path.GetFileName(uri.AbsolutePath); + if (string.IsNullOrWhiteSpace(fileName)) + { + return Guid.NewGuid().ToString("N"); + } + + if (fileName.EndsWith(".zip", StringComparison.OrdinalIgnoreCase)) + { + fileName = fileName[..^4]; + } + + if (fileName.EndsWith(".json", StringComparison.OrdinalIgnoreCase)) + { + fileName = fileName[..^5]; + } + + return fileName.Replace('_', '-'); + } + + private static string BuildDocumentUri(RuNkckiVulnerabilityDto dto) + { + if (!string.IsNullOrWhiteSpace(dto.FstecId)) + { + var slug = dto.FstecId.Contains(':', StringComparison.Ordinal) + ? dto.FstecId[(dto.FstecId.IndexOf(':') + 1)..] + : dto.FstecId; + return $"https://cert.gov.ru/materialy/uyazvimosti/{slug}"; + } + + if (!string.IsNullOrWhiteSpace(dto.MitreId)) + { + return $"https://nvd.nist.gov/vuln/detail/{dto.MitreId}"; + } + + return $"https://cert.gov.ru/materialy/uyazvimosti/{Guid.NewGuid():N}"; + } + + private string ResolveCacheDirectory(string? configuredPath) + { + if (!string.IsNullOrWhiteSpace(configuredPath)) + { + return Path.GetFullPath(Path.IsPathRooted(configuredPath) + ? 
configuredPath + : Path.Combine(AppContext.BaseDirectory, configuredPath)); + } + + return Path.Combine(AppContext.BaseDirectory, "cache", RuNkckiConnectorPlugin.SourceName); + } + + private void EnsureCacheDirectory() + { + try + { + Directory.CreateDirectory(_cacheDirectory); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "NKCKI unable to ensure cache directory {CachePath}", _cacheDirectory); + } + } + + private string GetBulletinCachePath(string bulletinId) + { + var fileStem = string.IsNullOrWhiteSpace(bulletinId) + ? Guid.NewGuid().ToString("N") + : Uri.EscapeDataString(bulletinId); + return Path.Combine(_cacheDirectory, $"{fileStem}.json.zip"); + } + + private static string ExtractBulletinIdFromCachePath(string path) + { + if (string.IsNullOrWhiteSpace(path)) + { + return string.Empty; + } + + var fileName = Path.GetFileName(path); + if (fileName.EndsWith(".zip", StringComparison.OrdinalIgnoreCase)) + { + fileName = fileName[..^4]; + } + + if (fileName.EndsWith(".json", StringComparison.OrdinalIgnoreCase)) + { + fileName = fileName[..^5]; + } + + return Uri.UnescapeDataString(fileName); + } + + private void TryWriteCachedBulletin(string bulletinId, byte[] content) + { + try + { + var cachePath = GetBulletinCachePath(bulletinId); + Directory.CreateDirectory(Path.GetDirectoryName(cachePath)!); + File.WriteAllBytes(cachePath, content); + } + catch (Exception ex) + { + _logger.LogDebug(ex, "NKCKI failed to cache bulletin {BulletinId}", bulletinId); + } + } + + private bool TryReadCachedBulletin(string bulletinId, out byte[] content) + { + var cachePath = GetBulletinCachePath(bulletinId); + try + { + if (File.Exists(cachePath)) + { + content = File.ReadAllBytes(cachePath); + return true; + } + } + catch (Exception ex) + { + _logger.LogDebug(ex, "NKCKI failed to read cached bulletin {BulletinId}", bulletinId); + } + + content = Array.Empty<byte>(); + return false; + } + + private IReadOnlyCollection<string> NormalizeBulletins(IEnumerable<string> bulletins) + { + var normalized
= (bulletins ?? Enumerable.Empty<string>()) + .Where(static id => !string.IsNullOrWhiteSpace(id)) + .Distinct(StringComparer.OrdinalIgnoreCase) + .OrderBy(static id => id, StringComparer.OrdinalIgnoreCase) + .ToList(); + + if (normalized.Count <= _options.KnownBulletinCapacity) + { + return normalized.ToArray(); + } + + var skip = normalized.Count - _options.KnownBulletinCapacity; + return normalized.Skip(skip).ToArray(); + } + + private async Task<RuNkckiCursor> GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? RuNkckiCursor.Empty : RuNkckiCursor.FromBson(state.Cursor); + } + + private Task UpdateCursorAsync(RuNkckiCursor cursor, CancellationToken cancellationToken) + { + var document = cursor.ToBsonDocument(); + var completedAt = cursor.LastListingFetchAt ?? _timeProvider.GetUtcNow(); + return _stateRepository.UpdateCursorAsync(SourceName, document, completedAt, cancellationToken); + } + + private readonly record struct ListingFetchSummary(IReadOnlyList<BulletinAttachment> Attachments, int PagesVisited); + + private readonly record struct ListingPageResult(IReadOnlyList<BulletinAttachment> Attachments, IReadOnlyList<Uri> PaginationLinks); + + private readonly record struct BulletinAttachment(string Id, Uri Uri, string Title); + + private bool ShouldUseListingCache(RuNkckiCursor cursor, DateTimeOffset now) + { + if (!cursor.LastListingFetchAt.HasValue) + { + return false; + } + + var age = now - cursor.LastListingFetchAt.Value; + return age < _options.ListingCacheDuration; + } + + private async Task<ListingFetchSummary> LoadListingAsync(CancellationToken cancellationToken) + { + var attachments = new List<BulletinAttachment>(); + var visited = 0; + var visitedUris = new HashSet<string>(StringComparer.OrdinalIgnoreCase); + var queue = new Queue<Uri>(); + queue.Enqueue(_options.ListingUri); + + while (queue.Count > 0 && visited < _options.MaxListingPagesPerFetch) + { + cancellationToken.ThrowIfCancellationRequested(); + + var pageUri = queue.Dequeue(); +
if (!visitedUris.Add(pageUri.AbsoluteUri)) + { + continue; + } + + _diagnostics.ListingFetchAttempt(); + + var listingResult = await FetchListingPageAsync(pageUri, cancellationToken).ConfigureAwait(false); + if (!listingResult.IsSuccess || listingResult.Content is null) + { + _diagnostics.ListingFetchFailure(listingResult.StatusCode.ToString()); + _logger.LogWarning("NKCKI listing page {ListingUri} returned no content (status={Status})", pageUri, listingResult.StatusCode); + continue; + } + + visited++; + + var page = await ParseListingAsync(pageUri, listingResult.Content, cancellationToken).ConfigureAwait(false); + attachments.AddRange(page.Attachments); + + foreach (var link in page.PaginationLinks) + { + if (!visitedUris.Contains(link.AbsoluteUri) && queue.Count + visitedUris.Count < _options.MaxListingPagesPerFetch) + { + queue.Enqueue(link); + } + } + + if (attachments.Count >= _options.MaxBulletinsPerFetch * 2) + { + break; + } + } + + return new ListingFetchSummary(attachments, visited); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.StellaOpsMirror/StellaOpsMirrorConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.StellaOpsMirror/StellaOpsMirrorConnector.cs index 8aebee988..097027ac2 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.StellaOpsMirror/StellaOpsMirrorConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.StellaOpsMirror/StellaOpsMirrorConnector.cs @@ -1,35 +1,35 @@ -using System; -using System.Collections.Generic; -using System.Linq; +using System; +using System.Collections.Generic; +using System.Linq; using System.Text; using Microsoft.Extensions.Logging; using Microsoft.Extensions.Options; using MongoDB.Bson; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.StellaOpsMirror.Client; -using StellaOps.Concelier.Connector.StellaOpsMirror.Internal; -using 
StellaOps.Concelier.Connector.StellaOpsMirror.Security; -using StellaOps.Concelier.Connector.StellaOpsMirror.Settings; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.StellaOpsMirror.Client; +using StellaOps.Concelier.Connector.StellaOpsMirror.Internal; +using StellaOps.Concelier.Connector.StellaOpsMirror.Security; +using StellaOps.Concelier.Connector.StellaOpsMirror.Settings; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; using StellaOps.Concelier.Storage.Mongo.Documents; using StellaOps.Concelier.Storage.Mongo.Dtos; using StellaOps.Plugin; using StellaOps.Cryptography; - -namespace StellaOps.Concelier.Connector.StellaOpsMirror; - -public sealed class StellaOpsMirrorConnector : IFeedConnector -{ - public const string Source = "stellaops-mirror"; - private const string BundleDtoSchemaVersion = "stellaops.mirror.bundle.v1"; - - private readonly MirrorManifestClient _client; - private readonly MirrorSignatureVerifier _signatureVerifier; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; + +namespace StellaOps.Concelier.Connector.StellaOpsMirror; + +public sealed class StellaOpsMirrorConnector : IFeedConnector +{ + public const string Source = "stellaops-mirror"; + private const string BundleDtoSchemaVersion = "stellaops.mirror.bundle.v1"; + + private readonly MirrorManifestClient _client; + private readonly MirrorSignatureVerifier _signatureVerifier; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; private readonly IDtoStore _dtoStore; private readonly IAdvisoryStore _advisoryStore; private readonly ISourceStateRepository _stateRepository; @@ -37,537 +37,537 
@@ public sealed class StellaOpsMirrorConnector : IFeedConnector private readonly ILogger<StellaOpsMirrorConnector> _logger; private readonly StellaOpsMirrorConnectorOptions _options; private readonly ICryptoHash _hash; - - public StellaOpsMirrorConnector( - MirrorManifestClient client, - MirrorSignatureVerifier signatureVerifier, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, + + public StellaOpsMirrorConnector( + MirrorManifestClient client, + MirrorSignatureVerifier signatureVerifier, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, IOptions<StellaOpsMirrorConnectorOptions> options, TimeProvider? timeProvider, ICryptoHash cryptoHash, ILogger<StellaOpsMirrorConnector> logger) - { - _client = client ?? throw new ArgumentNullException(nameof(client)); - _signatureVerifier = signatureVerifier ?? throw new ArgumentNullException(nameof(signatureVerifier)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + { + _client = client ?? throw new ArgumentNullException(nameof(client)); + _signatureVerifier = signatureVerifier ?? throw new ArgumentNullException(nameof(signatureVerifier)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); _logger = logger ??
throw new ArgumentNullException(nameof(logger)); _timeProvider = timeProvider ?? TimeProvider.System; _hash = cryptoHash ?? throw new ArgumentNullException(nameof(cryptoHash)); _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - ValidateOptions(_options); - } - - public string SourceName => Source; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - _ = services ?? throw new ArgumentNullException(nameof(services)); - - var now = _timeProvider.GetUtcNow(); - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - - MirrorIndexDocument index; - try - { - index = await _client.GetIndexAsync(_options.IndexPath, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - await _stateRepository.MarkFailureAsync(Source, now, TimeSpan.FromMinutes(15), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - var domain = index.Domains.FirstOrDefault(entry => - string.Equals(entry.DomainId, _options.DomainId, StringComparison.OrdinalIgnoreCase)); - - if (domain is null) - { - var message = $"Mirror domain '{_options.DomainId}' not present in index."; - await _stateRepository.MarkFailureAsync(Source, now, TimeSpan.FromMinutes(30), message, cancellationToken).ConfigureAwait(false); - throw new InvalidOperationException(message); - } - - var fingerprint = CreateFingerprint(index, domain); - var isNewDigest = !string.Equals(domain.Bundle.Digest, cursor.BundleDigest, StringComparison.OrdinalIgnoreCase); - - if (isNewDigest) - { - pendingDocuments.Clear(); - pendingMappings.Clear(); - } - - if (string.Equals(domain.Bundle.Digest, cursor.BundleDigest, StringComparison.OrdinalIgnoreCase)) - { - _logger.LogInformation("Mirror bundle digest {Digest} unchanged; skipping fetch.", 
domain.Bundle.Digest); - return; - } - - try - { - await ProcessDomainAsync(index, domain, pendingDocuments, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - await _stateRepository.MarkFailureAsync(Source, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - var completedFingerprint = isNewDigest ? null : cursor.CompletedFingerprint; - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithBundleSnapshot(domain.Bundle.Path, domain.Bundle.Digest, index.GeneratedAt) - .WithCompletedFingerprint(completedFingerprint); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - return ParseInternalAsync(cancellationToken); - } - - public Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - return MapInternalAsync(cancellationToken); - } - - private async Task ProcessDomainAsync( - MirrorIndexDocument index, - MirrorIndexDomainEntry domain, - HashSet pendingDocuments, - CancellationToken cancellationToken) - { - var manifestBytes = await _client.DownloadAsync(domain.Manifest.Path, cancellationToken).ConfigureAwait(false); - var bundleBytes = await _client.DownloadAsync(domain.Bundle.Path, cancellationToken).ConfigureAwait(false); - - VerifyDigest(domain.Manifest.Digest, manifestBytes, domain.Manifest.Path); - VerifyDigest(domain.Bundle.Digest, bundleBytes, domain.Bundle.Path); - - if (_options.Signature.Enabled) - { - if (domain.Bundle.Signature is null) - { - throw new InvalidOperationException("Mirror bundle did not include a signature descriptor while verification is enabled."); - } - - if (!string.IsNullOrWhiteSpace(_options.Signature.KeyId) && - 
!string.Equals(domain.Bundle.Signature.KeyId, _options.Signature.KeyId, StringComparison.OrdinalIgnoreCase)) - { - throw new InvalidOperationException($"Mirror bundle signature key '{domain.Bundle.Signature.KeyId}' did not match expected key '{_options.Signature.KeyId}'."); - } - - if (!string.IsNullOrWhiteSpace(_options.Signature.Provider) && - !string.Equals(domain.Bundle.Signature.Provider, _options.Signature.Provider, StringComparison.OrdinalIgnoreCase)) - { - throw new InvalidOperationException($"Mirror bundle signature provider '{domain.Bundle.Signature.Provider ?? ""}' did not match expected provider '{_options.Signature.Provider}'."); - } - - var signatureBytes = await _client.DownloadAsync(domain.Bundle.Signature.Path, cancellationToken).ConfigureAwait(false); - var signatureValue = Encoding.UTF8.GetString(signatureBytes).Trim(); - await _signatureVerifier.VerifyAsync( - bundleBytes, - signatureValue, - expectedKeyId: _options.Signature.KeyId, - expectedProvider: _options.Signature.Provider, - fallbackPublicKeyPath: _options.Signature.PublicKeyPath, - cancellationToken).ConfigureAwait(false); - } - else if (domain.Bundle.Signature is not null) - { - _logger.LogInformation("Mirror bundle provided signature descriptor but verification is disabled; skipping verification."); - } - - await StoreAsync(domain, index.GeneratedAt, domain.Manifest, manifestBytes, "application/json", DocumentStatuses.Mapped, addToPending: false, pendingDocuments, cancellationToken).ConfigureAwait(false); - var bundleRecord = await StoreAsync(domain, index.GeneratedAt, domain.Bundle, bundleBytes, "application/json", DocumentStatuses.PendingParse, addToPending: true, pendingDocuments, cancellationToken).ConfigureAwait(false); - - _logger.LogInformation( - "Stored mirror bundle {Uri} as document {DocumentId} with digest {Digest}.", - bundleRecord.Uri, - bundleRecord.Id, - bundleRecord.Sha256); - } - - private async Task StoreAsync( - MirrorIndexDomainEntry domain, - DateTimeOffset 
generatedAt, - MirrorFileDescriptor descriptor, - byte[] payload, - string contentType, - string status, - bool addToPending, - HashSet pendingDocuments, - CancellationToken cancellationToken) - { - var absolute = ResolveAbsolutePath(descriptor.Path); - - var existing = await _documentStore.FindBySourceAndUriAsync(Source, absolute, cancellationToken).ConfigureAwait(false); - if (existing is not null && string.Equals(existing.Sha256, NormalizeDigest(descriptor.Digest), StringComparison.OrdinalIgnoreCase)) - { - if (addToPending) - { - pendingDocuments.Add(existing.Id); - } - - return existing; - } - - var gridFsId = await _rawDocumentStorage.UploadAsync(Source, absolute, payload, contentType, cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - var sha = ComputeSha256(payload); - - var metadata = new Dictionary(StringComparer.OrdinalIgnoreCase) - { - ["mirror.domainId"] = domain.DomainId, - ["mirror.displayName"] = domain.DisplayName, - ["mirror.path"] = descriptor.Path, - ["mirror.digest"] = NormalizeDigest(descriptor.Digest), - ["mirror.type"] = ReferenceEquals(descriptor, domain.Bundle) ? "bundle" : "manifest", - }; - - var record = new DocumentRecord( - existing?.Id ?? Guid.NewGuid(), - Source, - absolute, - now, - sha, - status, - contentType, - Headers: null, - Metadata: metadata, - Etag: null, - LastModified: generatedAt, - GridFsId: gridFsId, - ExpiresAt: null); - - var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); - - if (addToPending) - { - pendingDocuments.Add(upserted.Id); - } - - return upserted; - } - - private string ResolveAbsolutePath(string path) - { - var uri = new Uri(_options.BaseAddress, path); - return uri.ToString(); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(Source, cancellationToken).ConfigureAwait(false); - return state is null ? 
StellaOpsMirrorCursor.Empty : StellaOpsMirrorCursor.FromBson(state.Cursor); - } - - private async Task UpdateCursorAsync(StellaOpsMirrorCursor cursor, CancellationToken cancellationToken) - { - var document = cursor.ToBsonDocument(); - var now = _timeProvider.GetUtcNow(); - await _stateRepository.UpdateCursorAsync(Source, document, now, cancellationToken).ConfigureAwait(false); - } - + ValidateOptions(_options); + } + + public string SourceName => Source; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + _ = services ?? throw new ArgumentNullException(nameof(services)); + + var now = _timeProvider.GetUtcNow(); + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + + MirrorIndexDocument index; + try + { + index = await _client.GetIndexAsync(_options.IndexPath, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + await _stateRepository.MarkFailureAsync(Source, now, TimeSpan.FromMinutes(15), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + var domain = index.Domains.FirstOrDefault(entry => + string.Equals(entry.DomainId, _options.DomainId, StringComparison.OrdinalIgnoreCase)); + + if (domain is null) + { + var message = $"Mirror domain '{_options.DomainId}' not present in index."; + await _stateRepository.MarkFailureAsync(Source, now, TimeSpan.FromMinutes(30), message, cancellationToken).ConfigureAwait(false); + throw new InvalidOperationException(message); + } + + var fingerprint = CreateFingerprint(index, domain); + var isNewDigest = !string.Equals(domain.Bundle.Digest, cursor.BundleDigest, StringComparison.OrdinalIgnoreCase); + + if (isNewDigest) + { + pendingDocuments.Clear(); + pendingMappings.Clear(); + } + + if (string.Equals(domain.Bundle.Digest, cursor.BundleDigest, StringComparison.OrdinalIgnoreCase)) + { + 
_logger.LogInformation("Mirror bundle digest {Digest} unchanged; skipping fetch.", domain.Bundle.Digest); + return; + } + + try + { + await ProcessDomainAsync(index, domain, pendingDocuments, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + await _stateRepository.MarkFailureAsync(Source, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + var completedFingerprint = isNewDigest ? null : cursor.CompletedFingerprint; + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithBundleSnapshot(domain.Bundle.Path, domain.Bundle.Digest, index.GeneratedAt) + .WithCompletedFingerprint(completedFingerprint); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + return ParseInternalAsync(cancellationToken); + } + + public Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + return MapInternalAsync(cancellationToken); + } + + private async Task ProcessDomainAsync( + MirrorIndexDocument index, + MirrorIndexDomainEntry domain, + HashSet pendingDocuments, + CancellationToken cancellationToken) + { + var manifestBytes = await _client.DownloadAsync(domain.Manifest.Path, cancellationToken).ConfigureAwait(false); + var bundleBytes = await _client.DownloadAsync(domain.Bundle.Path, cancellationToken).ConfigureAwait(false); + + VerifyDigest(domain.Manifest.Digest, manifestBytes, domain.Manifest.Path); + VerifyDigest(domain.Bundle.Digest, bundleBytes, domain.Bundle.Path); + + if (_options.Signature.Enabled) + { + if (domain.Bundle.Signature is null) + { + throw new InvalidOperationException("Mirror bundle did not include a signature descriptor while verification is enabled."); + } + + if 
(!string.IsNullOrWhiteSpace(_options.Signature.KeyId) && + !string.Equals(domain.Bundle.Signature.KeyId, _options.Signature.KeyId, StringComparison.OrdinalIgnoreCase)) + { + throw new InvalidOperationException($"Mirror bundle signature key '{domain.Bundle.Signature.KeyId}' did not match expected key '{_options.Signature.KeyId}'."); + } + + if (!string.IsNullOrWhiteSpace(_options.Signature.Provider) && + !string.Equals(domain.Bundle.Signature.Provider, _options.Signature.Provider, StringComparison.OrdinalIgnoreCase)) + { + throw new InvalidOperationException($"Mirror bundle signature provider '{domain.Bundle.Signature.Provider ?? ""}' did not match expected provider '{_options.Signature.Provider}'."); + } + + var signatureBytes = await _client.DownloadAsync(domain.Bundle.Signature.Path, cancellationToken).ConfigureAwait(false); + var signatureValue = Encoding.UTF8.GetString(signatureBytes).Trim(); + await _signatureVerifier.VerifyAsync( + bundleBytes, + signatureValue, + expectedKeyId: _options.Signature.KeyId, + expectedProvider: _options.Signature.Provider, + fallbackPublicKeyPath: _options.Signature.PublicKeyPath, + cancellationToken).ConfigureAwait(false); + } + else if (domain.Bundle.Signature is not null) + { + _logger.LogInformation("Mirror bundle provided signature descriptor but verification is disabled; skipping verification."); + } + + await StoreAsync(domain, index.GeneratedAt, domain.Manifest, manifestBytes, "application/json", DocumentStatuses.Mapped, addToPending: false, pendingDocuments, cancellationToken).ConfigureAwait(false); + var bundleRecord = await StoreAsync(domain, index.GeneratedAt, domain.Bundle, bundleBytes, "application/json", DocumentStatuses.PendingParse, addToPending: true, pendingDocuments, cancellationToken).ConfigureAwait(false); + + _logger.LogInformation( + "Stored mirror bundle {Uri} as document {DocumentId} with digest {Digest}.", + bundleRecord.Uri, + bundleRecord.Id, + bundleRecord.Sha256); + } + + private async Task 
StoreAsync( + MirrorIndexDomainEntry domain, + DateTimeOffset generatedAt, + MirrorFileDescriptor descriptor, + byte[] payload, + string contentType, + string status, + bool addToPending, + HashSet pendingDocuments, + CancellationToken cancellationToken) + { + var absolute = ResolveAbsolutePath(descriptor.Path); + + var existing = await _documentStore.FindBySourceAndUriAsync(Source, absolute, cancellationToken).ConfigureAwait(false); + if (existing is not null && string.Equals(existing.Sha256, NormalizeDigest(descriptor.Digest), StringComparison.OrdinalIgnoreCase)) + { + if (addToPending) + { + pendingDocuments.Add(existing.Id); + } + + return existing; + } + + var gridFsId = await _rawDocumentStorage.UploadAsync(Source, absolute, payload, contentType, cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + var sha = ComputeSha256(payload); + + var metadata = new Dictionary(StringComparer.OrdinalIgnoreCase) + { + ["mirror.domainId"] = domain.DomainId, + ["mirror.displayName"] = domain.DisplayName, + ["mirror.path"] = descriptor.Path, + ["mirror.digest"] = NormalizeDigest(descriptor.Digest), + ["mirror.type"] = ReferenceEquals(descriptor, domain.Bundle) ? "bundle" : "manifest", + }; + + var record = new DocumentRecord( + existing?.Id ?? 
Guid.NewGuid(), + Source, + absolute, + now, + sha, + status, + contentType, + Headers: null, + Metadata: metadata, + Etag: null, + LastModified: generatedAt, + PayloadId: gridFsId, + ExpiresAt: null); + + var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); + + if (addToPending) + { + pendingDocuments.Add(upserted.Id); + } + + return upserted; + } + + private string ResolveAbsolutePath(string path) + { + var uri = new Uri(_options.BaseAddress, path); + return uri.ToString(); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(Source, cancellationToken).ConfigureAwait(false); + return state is null ? StellaOpsMirrorCursor.Empty : StellaOpsMirrorCursor.FromBson(state.Cursor); + } + + private async Task UpdateCursorAsync(StellaOpsMirrorCursor cursor, CancellationToken cancellationToken) + { + var document = cursor.ToBsonDocument(); + var now = _timeProvider.GetUtcNow(); + await _stateRepository.UpdateCursorAsync(Source, document, now, cancellationToken).ConfigureAwait(false); + } + private void VerifyDigest(string expected, ReadOnlySpan payload, string path) - { - if (string.IsNullOrWhiteSpace(expected)) - { - return; - } - - if (!expected.StartsWith("sha256:", StringComparison.OrdinalIgnoreCase)) - { - throw new InvalidOperationException($"Unsupported digest '{expected}' for '{path}'."); - } - + { + if (string.IsNullOrWhiteSpace(expected)) + { + return; + } + + if (!expected.StartsWith("sha256:", StringComparison.OrdinalIgnoreCase)) + { + throw new InvalidOperationException($"Unsupported digest '{expected}' for '{path}'."); + } + var actualHash = _hash.ComputeHashHex(payload, HashAlgorithms.Sha256); var actual = "sha256:" + actualHash; - if (!string.Equals(actual, expected, StringComparison.OrdinalIgnoreCase)) - { - throw new InvalidOperationException($"Digest mismatch for '{path}'. 
Expected {expected}, computed {actual}."); - } - } - + if (!string.Equals(actual, expected, StringComparison.OrdinalIgnoreCase)) + { + throw new InvalidOperationException($"Digest mismatch for '{path}'. Expected {expected}, computed {actual}."); + } + } + private string ComputeSha256(ReadOnlySpan payload) => _hash.ComputeHashHex(payload, HashAlgorithms.Sha256); - - private static string NormalizeDigest(string digest) - { - if (string.IsNullOrWhiteSpace(digest)) - { - return string.Empty; - } - - return digest.StartsWith("sha256:", StringComparison.OrdinalIgnoreCase) - ? digest[7..] - : digest.ToLowerInvariant(); - } - - private static string? CreateFingerprint(MirrorIndexDocument index, MirrorIndexDomainEntry domain) - => CreateFingerprint(domain.Bundle.Digest, index.GeneratedAt); - - private static string? CreateFingerprint(string? digest, DateTimeOffset? generatedAt) - { - var normalizedDigest = NormalizeDigest(digest ?? string.Empty); - if (string.IsNullOrWhiteSpace(normalizedDigest) || generatedAt is null) - { - return null; - } - - return FormattableString.Invariant($"{normalizedDigest}:{generatedAt.Value.ToUniversalTime():O}"); - } - - private static void ValidateOptions(StellaOpsMirrorConnectorOptions options) - { - if (options.BaseAddress is null || !options.BaseAddress.IsAbsoluteUri) - { - throw new InvalidOperationException("Mirror connector requires an absolute baseAddress."); - } - - if (string.IsNullOrWhiteSpace(options.DomainId)) - { - throw new InvalidOperationException("Mirror connector requires domainId to be specified."); - } - } - - private async Task ParseInternalAsync(CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var now = _timeProvider.GetUtcNow(); - var parsed = 0; - var failures = 0; - - 
foreach (var documentId in cursor.PendingDocuments.ToArray()) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - failures++; - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("Mirror bundle document {DocumentId} missing GridFS payload.", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - failures++; - continue; - } - - byte[] payload; - try - { - payload = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Mirror bundle {DocumentId} failed to download from raw storage.", documentId); - throw; - } - - MirrorBundleDocument? 
bundle; - string json; - try - { - json = Encoding.UTF8.GetString(payload); - bundle = CanonicalJsonSerializer.Deserialize(json); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "Mirror bundle {DocumentId} failed to deserialize.", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - failures++; - continue; - } - - if (bundle is null || bundle.Advisories is null) - { - _logger.LogWarning("Mirror bundle {DocumentId} produced null payload.", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - failures++; - continue; - } - - var dtoBson = BsonDocument.Parse(json); - var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, Source, BundleDtoSchemaVersion, dtoBson, now); - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - pendingDocuments.Remove(documentId); - pendingMappings.Add(document.Id); - parsed++; - - _logger.LogDebug( - "Parsed mirror bundle {DocumentId} domain={DomainId} advisories={AdvisoryCount}.", - document.Id, - bundle.DomainId, - bundle.AdvisoryCount); - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - - if (parsed > 0 || failures > 0) - { - _logger.LogInformation( - "Mirror parse completed parsed={Parsed} failures={Failures} pendingDocuments={PendingDocuments} pendingMappings={PendingMappings}.", - parsed, - failures, - pendingDocuments.Count, - pendingMappings.Count); - } - } - - private async Task 
MapInternalAsync(CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var mapped = 0; - var failures = 0; - var completedFingerprint = cursor.CompletedFingerprint; - - foreach (var documentId in cursor.PendingMappings.ToArray()) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingMappings.Remove(documentId); - failures++; - continue; - } - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - if (dtoRecord is null) - { - _logger.LogWarning("Mirror document {DocumentId} missing DTO payload.", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - failures++; - continue; - } - - MirrorBundleDocument? 
bundle; - try - { - var json = dtoRecord.Payload.ToJson(); - bundle = CanonicalJsonSerializer.Deserialize(json); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "Mirror DTO for document {DocumentId} failed to deserialize.", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - failures++; - continue; - } - - if (bundle is null || bundle.Advisories is null) - { - _logger.LogWarning("Mirror bundle DTO {DocumentId} evaluated to null.", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - failures++; - continue; - } - - try - { - var advisories = MirrorAdvisoryMapper.Map(bundle); - - foreach (var advisory in advisories) - { - cancellationToken.ThrowIfCancellationRequested(); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - } - - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - mapped++; - - _logger.LogDebug( - "Mirror map completed for document {DocumentId} domain={DomainId} advisories={AdvisoryCount}.", - document.Id, - bundle.DomainId, - advisories.Length); - } - catch (Exception ex) - { - _logger.LogError(ex, "Mirror mapping failed for document {DocumentId}.", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - failures++; - } - } - - if (pendingMappings.Count == 0 && failures == 0) - { - var fingerprint = CreateFingerprint(cursor.BundleDigest, cursor.GeneratedAt); - if (!string.IsNullOrWhiteSpace(fingerprint)) - { - completedFingerprint = fingerprint; - } - } - - var updatedCursor = cursor - .WithPendingMappings(pendingMappings) - 
.WithCompletedFingerprint(completedFingerprint); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - - if (mapped > 0 || failures > 0) - { - _logger.LogInformation( - "Mirror map completed mapped={Mapped} failures={Failures} pendingMappings={PendingMappings}.", - mapped, - failures, - pendingMappings.Count); - } - } -} - -file static class UriExtensions -{ - public static Uri Combine(this Uri baseUri, string relative) - => new(baseUri, relative); -} + + private static string NormalizeDigest(string digest) + { + if (string.IsNullOrWhiteSpace(digest)) + { + return string.Empty; + } + + return digest.StartsWith("sha256:", StringComparison.OrdinalIgnoreCase) + ? digest[7..] + : digest.ToLowerInvariant(); + } + + private static string? CreateFingerprint(MirrorIndexDocument index, MirrorIndexDomainEntry domain) + => CreateFingerprint(domain.Bundle.Digest, index.GeneratedAt); + + private static string? CreateFingerprint(string? digest, DateTimeOffset? generatedAt) + { + var normalizedDigest = NormalizeDigest(digest ?? 
string.Empty); + if (string.IsNullOrWhiteSpace(normalizedDigest) || generatedAt is null) + { + return null; + } + + return FormattableString.Invariant($"{normalizedDigest}:{generatedAt.Value.ToUniversalTime():O}"); + } + + private static void ValidateOptions(StellaOpsMirrorConnectorOptions options) + { + if (options.BaseAddress is null || !options.BaseAddress.IsAbsoluteUri) + { + throw new InvalidOperationException("Mirror connector requires an absolute baseAddress."); + } + + if (string.IsNullOrWhiteSpace(options.DomainId)) + { + throw new InvalidOperationException("Mirror connector requires domainId to be specified."); + } + } + + private async Task ParseInternalAsync(CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var now = _timeProvider.GetUtcNow(); + var parsed = 0; + var failures = 0; + + foreach (var documentId in cursor.PendingDocuments.ToArray()) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + failures++; + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("Mirror bundle document {DocumentId} missing GridFS payload.", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + failures++; + continue; + } + + byte[] payload; + try + { + payload = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Mirror 
bundle {DocumentId} failed to download from raw storage.", documentId); + throw; + } + + MirrorBundleDocument? bundle; + string json; + try + { + json = Encoding.UTF8.GetString(payload); + bundle = CanonicalJsonSerializer.Deserialize(json); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Mirror bundle {DocumentId} failed to deserialize.", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + failures++; + continue; + } + + if (bundle is null || bundle.Advisories is null) + { + _logger.LogWarning("Mirror bundle {DocumentId} produced null payload.", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + failures++; + continue; + } + + var dtoBson = BsonDocument.Parse(json); + var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, Source, BundleDtoSchemaVersion, dtoBson, now); + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + pendingDocuments.Remove(documentId); + pendingMappings.Add(document.Id); + parsed++; + + _logger.LogDebug( + "Parsed mirror bundle {DocumentId} domain={DomainId} advisories={AdvisoryCount}.", + document.Id, + bundle.DomainId, + bundle.AdvisoryCount); + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + + if (parsed > 0 || failures > 0) + { + _logger.LogInformation( + "Mirror parse completed parsed={Parsed} failures={Failures} pendingDocuments={PendingDocuments} pendingMappings={PendingMappings}.", + parsed, + 
failures, + pendingDocuments.Count, + pendingMappings.Count); + } + } + + private async Task MapInternalAsync(CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var mapped = 0; + var failures = 0; + var completedFingerprint = cursor.CompletedFingerprint; + + foreach (var documentId in cursor.PendingMappings.ToArray()) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingMappings.Remove(documentId); + failures++; + continue; + } + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + if (dtoRecord is null) + { + _logger.LogWarning("Mirror document {DocumentId} missing DTO payload.", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + failures++; + continue; + } + + MirrorBundleDocument? 
bundle; + try + { + var json = dtoRecord.Payload.ToJson(); + bundle = CanonicalJsonSerializer.Deserialize(json); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Mirror DTO for document {DocumentId} failed to deserialize.", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + failures++; + continue; + } + + if (bundle is null || bundle.Advisories is null) + { + _logger.LogWarning("Mirror bundle DTO {DocumentId} evaluated to null.", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + failures++; + continue; + } + + try + { + var advisories = MirrorAdvisoryMapper.Map(bundle); + + foreach (var advisory in advisories) + { + cancellationToken.ThrowIfCancellationRequested(); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + } + + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + mapped++; + + _logger.LogDebug( + "Mirror map completed for document {DocumentId} domain={DomainId} advisories={AdvisoryCount}.", + document.Id, + bundle.DomainId, + advisories.Length); + } + catch (Exception ex) + { + _logger.LogError(ex, "Mirror mapping failed for document {DocumentId}.", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + failures++; + } + } + + if (pendingMappings.Count == 0 && failures == 0) + { + var fingerprint = CreateFingerprint(cursor.BundleDigest, cursor.GeneratedAt); + if (!string.IsNullOrWhiteSpace(fingerprint)) + { + completedFingerprint = fingerprint; + } + } + + var updatedCursor = cursor + .WithPendingMappings(pendingMappings) + 
.WithCompletedFingerprint(completedFingerprint); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + + if (mapped > 0 || failures > 0) + { + _logger.LogInformation( + "Mirror map completed mapped={Mapped} failures={Failures} pendingMappings={PendingMappings}.", + mapped, + failures, + pendingMappings.Count); + } + } +} + +file static class UriExtensions +{ + public static Uri Combine(this Uri baseUri, string relative) + => new(baseUri, relative); +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Adobe/AdobeConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Adobe/AdobeConnector.cs index 566952bc8..f0f7a65ec 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Adobe/AdobeConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Adobe/AdobeConnector.cs @@ -1,711 +1,711 @@ -using System; -using System.Collections.Generic; -using System.Linq; -using System.Text; -using System.Text.Json; -using System.Text.Json.Serialization; -using System.Text.RegularExpressions; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using Json.Schema; -using MongoDB.Bson; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Common.Json; -using StellaOps.Concelier.Connector.Common.Packages; -using StellaOps.Concelier.Connector.Vndr.Adobe.Configuration; -using StellaOps.Concelier.Connector.Vndr.Adobe.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Concelier.Storage.Mongo.PsirtFlags; -using StellaOps.Concelier.Models; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Vndr.Adobe; - -public sealed class AdobeConnector : IFeedConnector -{ - private readonly SourceFetchService 
_fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly IPsirtFlagStore _psirtFlagStore; - private readonly IJsonSchemaValidator _schemaValidator; - private readonly AdobeOptions _options; - private readonly TimeProvider _timeProvider; - private readonly IHttpClientFactory _httpClientFactory; - private readonly AdobeDiagnostics _diagnostics; - private readonly ILogger _logger; - - private static readonly JsonSchema Schema = AdobeSchemaProvider.Schema; - private static readonly JsonSerializerOptions SerializerOptions = new() - { - PropertyNamingPolicy = JsonNamingPolicy.CamelCase, - DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, - }; - - public AdobeConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IPsirtFlagStore psirtFlagStore, - IJsonSchemaValidator schemaValidator, - IOptions options, - TimeProvider? timeProvider, - IHttpClientFactory httpClientFactory, - AdobeDiagnostics diagnostics, - ILogger logger) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _psirtFlagStore = psirtFlagStore ?? 
throw new ArgumentNullException(nameof(psirtFlagStore)); - _schemaValidator = schemaValidator ?? throw new ArgumentNullException(nameof(schemaValidator)); - _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _httpClientFactory = httpClientFactory ?? throw new ArgumentNullException(nameof(httpClientFactory)); - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - } - - private static IReadOnlyList<AffectedPackageStatus> BuildStatuses(AdobeProductEntry product, AdvisoryProvenance provenance) - { - if (!TryResolveAvailabilityStatus(product.Availability, out var status)) - { - return Array.Empty<AffectedPackageStatus>(); - } - - return new[] { new AffectedPackageStatus(status, provenance) }; - } - - private static bool TryResolveAvailabilityStatus(string? availability, out string status) - { - status = string.Empty; - if (string.IsNullOrWhiteSpace(availability)) - { - return false; - } - - var trimmed = availability.Trim(); - - if (AffectedPackageStatusCatalog.TryNormalize(trimmed, out var normalized)) - { - status = normalized; - return true; - } - - var token = SanitizeStatusToken(trimmed); - if (token.Length == 0) - { - return false; - } - - if (AvailabilityStatusMap.TryGetValue(token, out var mapped)) - { - status = mapped; - return true; - } - - return false; - } - - private static string SanitizeStatusToken(string value) - { - var buffer = new char[value.Length]; - var index = 0; - - foreach (var ch in value) - { - if (char.IsLetterOrDigit(ch)) - { - buffer[index++] = char.ToLowerInvariant(ch); - } - } - - return index == 0 ? 
string.Empty : new string(buffer, 0, index); - } - - private static readonly Dictionary<string, string> AvailabilityStatusMap = new(StringComparer.Ordinal) - { - ["available"] = AffectedPackageStatusCatalog.Fixed, - ["availabletoday"] = AffectedPackageStatusCatalog.Fixed, - ["availablenow"] = AffectedPackageStatusCatalog.Fixed, - ["updateavailable"] = AffectedPackageStatusCatalog.Fixed, - ["patchavailable"] = AffectedPackageStatusCatalog.Fixed, - ["fixavailable"] = AffectedPackageStatusCatalog.Fixed, - ["mitigationavailable"] = AffectedPackageStatusCatalog.Mitigated, - ["workaroundavailable"] = AffectedPackageStatusCatalog.Mitigated, - ["mitigationprovided"] = AffectedPackageStatusCatalog.Mitigated, - ["workaroundprovided"] = AffectedPackageStatusCatalog.Mitigated, - ["planned"] = AffectedPackageStatusCatalog.Pending, - ["updateplanned"] = AffectedPackageStatusCatalog.Pending, - ["plannedupdate"] = AffectedPackageStatusCatalog.Pending, - ["scheduled"] = AffectedPackageStatusCatalog.Pending, - ["scheduledupdate"] = AffectedPackageStatusCatalog.Pending, - ["pendingavailability"] = AffectedPackageStatusCatalog.Pending, - ["pendingupdate"] = AffectedPackageStatusCatalog.Pending, - ["pendingfix"] = AffectedPackageStatusCatalog.Pending, - ["notavailable"] = AffectedPackageStatusCatalog.Unknown, - ["unavailable"] = AffectedPackageStatusCatalog.Unknown, - ["notcurrentlyavailable"] = AffectedPackageStatusCatalog.Unknown, - ["notapplicable"] = AffectedPackageStatusCatalog.NotApplicable, - }; - - private AffectedVersionRange? BuildVersionRange(AdobeProductEntry product, DateTimeOffset recordedAt) - { - if (string.IsNullOrWhiteSpace(product.AffectedVersion) && string.IsNullOrWhiteSpace(product.UpdatedVersion)) - { - return null; - } - - var key = string.IsNullOrWhiteSpace(product.Platform) - ? 
product.Product - : $"{product.Product}:{product.Platform}"; - - var provenance = new AdvisoryProvenance(SourceName, "range", key, recordedAt); - - var extensions = new Dictionary<string, string>(StringComparer.Ordinal); - AddExtension(extensions, "adobe.track", product.Track); - AddExtension(extensions, "adobe.platform", product.Platform); - AddExtension(extensions, "adobe.affected.raw", product.AffectedVersion); - AddExtension(extensions, "adobe.updated.raw", product.UpdatedVersion); - AddExtension(extensions, "adobe.priority", product.Priority); - AddExtension(extensions, "adobe.availability", product.Availability); - - var lastAffected = ExtractVersionNumber(product.AffectedVersion); - var fixedVersion = ExtractVersionNumber(product.UpdatedVersion); - - var primitives = BuildRangePrimitives(lastAffected, fixedVersion, extensions); - - return new AffectedVersionRange( - rangeKind: "vendor", - introducedVersion: null, - fixedVersion: fixedVersion, - lastAffectedVersion: lastAffected, - rangeExpression: product.AffectedVersion ?? product.UpdatedVersion, - provenance: provenance, - primitives: primitives); - } - - private static RangePrimitives? BuildRangePrimitives(string? lastAffected, string? fixedVersion, Dictionary<string, string> extensions) - { - var semVer = BuildSemVerPrimitive(lastAffected, fixedVersion); - - if (semVer is null && extensions.Count == 0) - { - return null; - } - - return new RangePrimitives(semVer, null, null, extensions.Count == 0 ? null : extensions); - } - - private static SemVerPrimitive? BuildSemVerPrimitive(string? lastAffected, string? 
fixedVersion) - { - var fixedNormalized = NormalizeSemVer(fixedVersion); - var lastNormalized = NormalizeSemVer(lastAffected); - - if (fixedNormalized is null && lastNormalized is null) - { - return null; - } - - return new SemVerPrimitive( - Introduced: null, - IntroducedInclusive: true, - Fixed: fixedNormalized, - FixedInclusive: false, - LastAffected: lastNormalized, - LastAffectedInclusive: true, - ConstraintExpression: null); - } - - private static string? NormalizeSemVer(string? value) - { - if (string.IsNullOrWhiteSpace(value)) - { - return null; - } - - var trimmed = value.Trim(); - if (PackageCoordinateHelper.TryParseSemVer(trimmed, out _, out var normalized) && !string.IsNullOrWhiteSpace(normalized)) - { - return normalized; - } - - if (Version.TryParse(trimmed, out var parsed)) - { - if (parsed.Build >= 0 && parsed.Revision >= 0) - { - return $"{parsed.Major}.{parsed.Minor}.{parsed.Build}.{parsed.Revision}"; - } - - if (parsed.Build >= 0) - { - return $"{parsed.Major}.{parsed.Minor}.{parsed.Build}"; - } - - return $"{parsed.Major}.{parsed.Minor}"; - } - - return null; - } - - private static string? ExtractVersionNumber(string? text) - { - if (string.IsNullOrWhiteSpace(text)) - { - return null; - } - - var match = VersionPattern.Match(text); - return match.Success ? match.Value : null; - } - - private static void AddExtension(IDictionary<string, string> extensions, string key, string? value) - { - if (!string.IsNullOrWhiteSpace(value)) - { - extensions[key] = value.Trim(); - } - } - - private static readonly Regex VersionPattern = new("\\d+(?:\\.\\d+)+", RegexOptions.Compiled); - - public string SourceName => VndrAdobeConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - var backfillStart = now - _options.InitialBackfill; - var windowStart = cursor.LastPublished.HasValue - ? 
cursor.LastPublished.Value - _options.WindowOverlap - : backfillStart; - if (windowStart < backfillStart) - { - windowStart = backfillStart; - } - - var maxPublished = cursor.LastPublished; - var pendingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - var fetchCache = cursor.FetchCache is null - ? new Dictionary(StringComparer.Ordinal) - : new Dictionary(cursor.FetchCache, StringComparer.Ordinal); - var touchedResources = new HashSet(StringComparer.Ordinal); - - var collectedEntries = new Dictionary(StringComparer.OrdinalIgnoreCase); - - foreach (var indexUri in EnumerateIndexUris()) - { - _diagnostics.FetchAttempt(); - string? html = null; - try - { - var client = _httpClientFactory.CreateClient(AdobeOptions.HttpClientName); - using var response = await client.GetAsync(indexUri, cancellationToken).ConfigureAwait(false); - response.EnsureSuccessStatusCode(); - html = await response.Content.ReadAsStringAsync(cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.FetchFailure(); - _logger.LogError(ex, "Failed to download Adobe index page {Uri}", indexUri); - continue; - } - - if (string.IsNullOrEmpty(html)) - { - continue; - } - - IReadOnlyCollection entries; - try - { - entries = AdobeIndexParser.Parse(html, indexUri); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to parse Adobe index page {Uri}", indexUri); - _diagnostics.FetchFailure(); - continue; - } - - foreach (var entry in entries) - { - if (entry.PublishedUtc < windowStart) - { - continue; - } - - if (!collectedEntries.TryGetValue(entry.AdvisoryId, out var existing) || entry.PublishedUtc > existing.PublishedUtc) - { - collectedEntries[entry.AdvisoryId] = entry; - } - } - } - - foreach (var entry in collectedEntries.Values.OrderBy(static e => e.PublishedUtc)) - { - if (!maxPublished.HasValue || entry.PublishedUtc > maxPublished) - { - maxPublished = entry.PublishedUtc; - } - - var cacheKey = 
entry.DetailUri.ToString(); - touchedResources.Add(cacheKey); - - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["advisoryId"] = entry.AdvisoryId, - ["published"] = entry.PublishedUtc.ToString("O"), - ["title"] = entry.Title ?? string.Empty, - }; - - try - { - var result = await _fetchService.FetchAsync( - new SourceFetchRequest(AdobeOptions.HttpClientName, SourceName, entry.DetailUri) - { - Metadata = metadata, - AcceptHeaders = new[] { "text/html", "application/xhtml+xml", "text/plain;q=0.5" }, - }, - cancellationToken).ConfigureAwait(false); - - if (!result.IsSuccess || result.Document is null) - { - continue; - } - - if (cursor.TryGetFetchCache(cacheKey, out var cached) - && string.Equals(cached.Sha256, result.Document.Sha256, StringComparison.OrdinalIgnoreCase)) - { - _diagnostics.FetchUnchanged(); - fetchCache[cacheKey] = new AdobeFetchCacheEntry(result.Document.Sha256); - await _documentStore.UpdateStatusAsync(result.Document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - continue; - } - - _diagnostics.FetchDocument(); - fetchCache[cacheKey] = new AdobeFetchCacheEntry(result.Document.Sha256); - - if (!pendingDocuments.Contains(result.Document.Id)) - { - pendingDocuments.Add(result.Document.Id); - } - } - catch (Exception ex) - { - _diagnostics.FetchFailure(); - _logger.LogError(ex, "Failed to fetch Adobe advisory {AdvisoryId} ({Uri})", entry.AdvisoryId, entry.DetailUri); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - } - - foreach (var key in fetchCache.Keys.ToList()) - { - if (!touchedResources.Contains(key)) - { - fetchCache.Remove(key); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithLastPublished(maxPublished) - .WithFetchCache(fetchCache); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - 
public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("Adobe document {DocumentId} missing GridFS payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - AdobeDocumentMetadata metadata; - try - { - metadata = AdobeDocumentMetadata.FromDocument(document); - } - catch (Exception ex) - { - _logger.LogError(ex, "Adobe metadata parse failed for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - AdobeBulletinDto dto; - try - { - var bytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - var html = Encoding.UTF8.GetString(bytes); - dto = AdobeDetailParser.Parse(html, metadata); - } - catch (Exception ex) - { - _logger.LogError(ex, "Adobe parse failed for advisory {AdvisoryId} ({Uri})", metadata.AdvisoryId, document.Uri); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - 
pendingMappings.Remove(documentId); - continue; - } - - var json = JsonSerializer.Serialize(dto, SerializerOptions); - using var jsonDocument = JsonDocument.Parse(json); - _schemaValidator.Validate(jsonDocument, Schema, metadata.AdvisoryId); - - var payload = MongoDB.Bson.BsonDocument.Parse(json); - var dtoRecord = new DtoRecord( - Guid.NewGuid(), - document.Id, - SourceName, - "adobe.bulletin.v1", - payload, - _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - pendingDocuments.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - var now = _timeProvider.GetUtcNow(); - - foreach (var documentId in cursor.PendingMappings) - { - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dtoRecord is null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - AdobeBulletinDto? 
dto; - try - { - var json = dtoRecord.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings - { - OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, - }); - - dto = JsonSerializer.Deserialize(json, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogError(ex, "Adobe DTO deserialization failed for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - if (dto is null) - { - _logger.LogWarning("Adobe DTO payload deserialized as null for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var advisory = BuildAdvisory(dto, now); - if (!string.IsNullOrWhiteSpace(advisory.AdvisoryKey)) - { - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - - var flag = new PsirtFlagRecord( - advisory.AdvisoryKey, - "Adobe", - SourceName, - dto.AdvisoryId, - now); - - await _psirtFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); - } - else - { - _logger.LogWarning("Skipping PSIRT flag for advisory with missing key (document {DocumentId})", documentId); - } - - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - - pendingMappings.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private IEnumerable EnumerateIndexUris() - { - yield return _options.IndexUri; - foreach (var uri in _options.AdditionalIndexUris) - { - yield return uri; - } - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var record = await _stateRepository.TryGetAsync(SourceName, 
cancellationToken).ConfigureAwait(false); - return AdobeCursor.FromBsonDocument(record?.Cursor); - } - - private async Task UpdateCursorAsync(AdobeCursor cursor, CancellationToken cancellationToken) - { - var updatedAt = _timeProvider.GetUtcNow(); - await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), updatedAt, cancellationToken).ConfigureAwait(false); - } - - private Advisory BuildAdvisory(AdobeBulletinDto dto, DateTimeOffset recordedAt) - { - var provenance = new AdvisoryProvenance(SourceName, "parser", dto.AdvisoryId, recordedAt); - - var aliasSet = new HashSet<string>(StringComparer.OrdinalIgnoreCase) - { - dto.AdvisoryId, - }; - foreach (var cve in dto.Cves) - { - if (!string.IsNullOrWhiteSpace(cve)) - { - aliasSet.Add(cve); - } - } - - var comparer = StringComparer.OrdinalIgnoreCase; - var references = new List<(AdvisoryReference Reference, int Priority)> - { - (new AdvisoryReference(dto.DetailUrl, "advisory", "adobe-psirt", dto.Summary, provenance), 0), - }; - - foreach (var cve in dto.Cves) - { - if (string.IsNullOrWhiteSpace(cve)) - { - continue; - } - - var url = $"https://www.cve.org/CVERecord?id={cve}"; - references.Add((new AdvisoryReference(url, "advisory", cve, null, provenance), 1)); - } - - var orderedReferences = references - .GroupBy(tuple => tuple.Reference.Url, comparer) - .Select(group => group - .OrderBy(t => t.Priority) - .ThenBy(t => t.Reference.Kind ?? string.Empty, comparer) - .ThenBy(t => t.Reference.Url, comparer) - .First()) - .OrderBy(t => t.Priority) - .ThenBy(t => t.Reference.Kind ??
string.Empty, comparer) - .ThenBy(t => t.Reference.Url, comparer) - .Select(t => t.Reference) - .ToArray(); - - var affected = dto.Products - .Select(product => BuildPackage(product, recordedAt)) - .ToArray(); - - var aliases = aliasSet - .Where(static alias => !string.IsNullOrWhiteSpace(alias)) - .Select(static alias => alias.Trim()) - .Distinct(StringComparer.OrdinalIgnoreCase) - .OrderBy(static alias => alias, StringComparer.Ordinal) - .ToArray(); - - return new Advisory( - dto.AdvisoryId, - dto.Title, - dto.Summary, - language: "en", - published: dto.Published, - modified: null, - severity: null, - exploitKnown: false, - aliases, - orderedReferences, - affected, - Array.Empty(), - new[] { provenance }); - } - - private AffectedPackage BuildPackage(AdobeProductEntry product, DateTimeOffset recordedAt) - { - var identifier = string.IsNullOrWhiteSpace(product.Product) - ? "Adobe Product" - : product.Product.Trim(); - - var platform = string.IsNullOrWhiteSpace(product.Platform) ? null : product.Platform; - - var provenance = new AdvisoryProvenance( - SourceName, - "affected", - string.IsNullOrWhiteSpace(platform) ? 
identifier : $"{identifier}:{platform}", - recordedAt); - - var range = BuildVersionRange(product, recordedAt); +using System; +using System.Collections.Generic; +using System.Linq; +using System.Text; +using System.Text.Json; +using System.Text.Json.Serialization; +using System.Text.RegularExpressions; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using Json.Schema; +using MongoDB.Bson; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Common.Json; +using StellaOps.Concelier.Connector.Common.Packages; +using StellaOps.Concelier.Connector.Vndr.Adobe.Configuration; +using StellaOps.Concelier.Connector.Vndr.Adobe.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Concelier.Storage.Mongo.PsirtFlags; +using StellaOps.Concelier.Models; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Vndr.Adobe; + +public sealed class AdobeConnector : IFeedConnector +{ + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly IPsirtFlagStore _psirtFlagStore; + private readonly IJsonSchemaValidator _schemaValidator; + private readonly AdobeOptions _options; + private readonly TimeProvider _timeProvider; + private readonly IHttpClientFactory _httpClientFactory; + private readonly AdobeDiagnostics _diagnostics; + private readonly ILogger _logger; + + private static readonly JsonSchema Schema = AdobeSchemaProvider.Schema; + private static readonly JsonSerializerOptions SerializerOptions = new() + { + PropertyNamingPolicy = 
JsonNamingPolicy.CamelCase, + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, + }; + + public AdobeConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IPsirtFlagStore psirtFlagStore, + IJsonSchemaValidator schemaValidator, + IOptions options, + TimeProvider? timeProvider, + IHttpClientFactory httpClientFactory, + AdobeDiagnostics diagnostics, + ILogger logger) + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _psirtFlagStore = psirtFlagStore ?? throw new ArgumentNullException(nameof(psirtFlagStore)); + _schemaValidator = schemaValidator ?? throw new ArgumentNullException(nameof(schemaValidator)); + _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _httpClientFactory = httpClientFactory ?? throw new ArgumentNullException(nameof(httpClientFactory)); + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); + } + + private static IReadOnlyList BuildStatuses(AdobeProductEntry product, AdvisoryProvenance provenance) + { + if (!TryResolveAvailabilityStatus(product.Availability, out var status)) + { + return Array.Empty(); + } + + return new[] { new AffectedPackageStatus(status, provenance) }; + } + + private static bool TryResolveAvailabilityStatus(string? availability, out string status) + { + status = string.Empty; + if (string.IsNullOrWhiteSpace(availability)) + { + return false; + } + + var trimmed = availability.Trim(); + + if (AffectedPackageStatusCatalog.TryNormalize(trimmed, out var normalized)) + { + status = normalized; + return true; + } + + var token = SanitizeStatusToken(trimmed); + if (token.Length == 0) + { + return false; + } + + if (AvailabilityStatusMap.TryGetValue(token, out var mapped)) + { + status = mapped; + return true; + } + + return false; + } + + private static string SanitizeStatusToken(string value) + { + var buffer = new char[value.Length]; + var index = 0; + + foreach (var ch in value) + { + if (char.IsLetterOrDigit(ch)) + { + buffer[index++] = char.ToLowerInvariant(ch); + } + } + + return index == 0 ? 
string.Empty : new string(buffer, 0, index); + } + + private static readonly Dictionary<string, string> AvailabilityStatusMap = new(StringComparer.Ordinal) + { + ["available"] = AffectedPackageStatusCatalog.Fixed, + ["availabletoday"] = AffectedPackageStatusCatalog.Fixed, + ["availablenow"] = AffectedPackageStatusCatalog.Fixed, + ["updateavailable"] = AffectedPackageStatusCatalog.Fixed, + ["patchavailable"] = AffectedPackageStatusCatalog.Fixed, + ["fixavailable"] = AffectedPackageStatusCatalog.Fixed, + ["mitigationavailable"] = AffectedPackageStatusCatalog.Mitigated, + ["workaroundavailable"] = AffectedPackageStatusCatalog.Mitigated, + ["mitigationprovided"] = AffectedPackageStatusCatalog.Mitigated, + ["workaroundprovided"] = AffectedPackageStatusCatalog.Mitigated, + ["planned"] = AffectedPackageStatusCatalog.Pending, + ["updateplanned"] = AffectedPackageStatusCatalog.Pending, + ["plannedupdate"] = AffectedPackageStatusCatalog.Pending, + ["scheduled"] = AffectedPackageStatusCatalog.Pending, + ["scheduledupdate"] = AffectedPackageStatusCatalog.Pending, + ["pendingavailability"] = AffectedPackageStatusCatalog.Pending, + ["pendingupdate"] = AffectedPackageStatusCatalog.Pending, + ["pendingfix"] = AffectedPackageStatusCatalog.Pending, + ["notavailable"] = AffectedPackageStatusCatalog.Unknown, + ["unavailable"] = AffectedPackageStatusCatalog.Unknown, + ["notcurrentlyavailable"] = AffectedPackageStatusCatalog.Unknown, + ["notapplicable"] = AffectedPackageStatusCatalog.NotApplicable, + }; + + private AffectedVersionRange? BuildVersionRange(AdobeProductEntry product, DateTimeOffset recordedAt) + { + if (string.IsNullOrWhiteSpace(product.AffectedVersion) && string.IsNullOrWhiteSpace(product.UpdatedVersion)) + { + return null; + } + + var key = string.IsNullOrWhiteSpace(product.Platform) + ?
product.Product + : $"{product.Product}:{product.Platform}"; + + var provenance = new AdvisoryProvenance(SourceName, "range", key, recordedAt); + + var extensions = new Dictionary(StringComparer.Ordinal); + AddExtension(extensions, "adobe.track", product.Track); + AddExtension(extensions, "adobe.platform", product.Platform); + AddExtension(extensions, "adobe.affected.raw", product.AffectedVersion); + AddExtension(extensions, "adobe.updated.raw", product.UpdatedVersion); + AddExtension(extensions, "adobe.priority", product.Priority); + AddExtension(extensions, "adobe.availability", product.Availability); + + var lastAffected = ExtractVersionNumber(product.AffectedVersion); + var fixedVersion = ExtractVersionNumber(product.UpdatedVersion); + + var primitives = BuildRangePrimitives(lastAffected, fixedVersion, extensions); + + return new AffectedVersionRange( + rangeKind: "vendor", + introducedVersion: null, + fixedVersion: fixedVersion, + lastAffectedVersion: lastAffected, + rangeExpression: product.AffectedVersion ?? product.UpdatedVersion, + provenance: provenance, + primitives: primitives); + } + + private static RangePrimitives? BuildRangePrimitives(string? lastAffected, string? fixedVersion, Dictionary extensions) + { + var semVer = BuildSemVerPrimitive(lastAffected, fixedVersion); + + if (semVer is null && extensions.Count == 0) + { + return null; + } + + return new RangePrimitives(semVer, null, null, extensions.Count == 0 ? null : extensions); + } + + private static SemVerPrimitive? BuildSemVerPrimitive(string? lastAffected, string? 
fixedVersion) + { + var fixedNormalized = NormalizeSemVer(fixedVersion); + var lastNormalized = NormalizeSemVer(lastAffected); + + if (fixedNormalized is null && lastNormalized is null) + { + return null; + } + + return new SemVerPrimitive( + Introduced: null, + IntroducedInclusive: true, + Fixed: fixedNormalized, + FixedInclusive: false, + LastAffected: lastNormalized, + LastAffectedInclusive: true, + ConstraintExpression: null); + } + + private static string? NormalizeSemVer(string? value) + { + if (string.IsNullOrWhiteSpace(value)) + { + return null; + } + + var trimmed = value.Trim(); + if (PackageCoordinateHelper.TryParseSemVer(trimmed, out _, out var normalized) && !string.IsNullOrWhiteSpace(normalized)) + { + return normalized; + } + + if (Version.TryParse(trimmed, out var parsed)) + { + if (parsed.Build >= 0 && parsed.Revision >= 0) + { + return $"{parsed.Major}.{parsed.Minor}.{parsed.Build}.{parsed.Revision}"; + } + + if (parsed.Build >= 0) + { + return $"{parsed.Major}.{parsed.Minor}.{parsed.Build}"; + } + + return $"{parsed.Major}.{parsed.Minor}"; + } + + return null; + } + + private static string? ExtractVersionNumber(string? text) + { + if (string.IsNullOrWhiteSpace(text)) + { + return null; + } + + var match = VersionPattern.Match(text); + return match.Success ? match.Value : null; + } + + private static void AddExtension(IDictionary extensions, string key, string? value) + { + if (!string.IsNullOrWhiteSpace(value)) + { + extensions[key] = value.Trim(); + } + } + + private static readonly Regex VersionPattern = new("\\d+(?:\\.\\d+)+", RegexOptions.Compiled); + + public string SourceName => VndrAdobeConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + var backfillStart = now - _options.InitialBackfill; + var windowStart = cursor.LastPublished.HasValue + ? 
cursor.LastPublished.Value - _options.WindowOverlap + : backfillStart; + if (windowStart < backfillStart) + { + windowStart = backfillStart; + } + + var maxPublished = cursor.LastPublished; + var pendingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + var fetchCache = cursor.FetchCache is null + ? new Dictionary(StringComparer.Ordinal) + : new Dictionary(cursor.FetchCache, StringComparer.Ordinal); + var touchedResources = new HashSet(StringComparer.Ordinal); + + var collectedEntries = new Dictionary(StringComparer.OrdinalIgnoreCase); + + foreach (var indexUri in EnumerateIndexUris()) + { + _diagnostics.FetchAttempt(); + string? html = null; + try + { + var client = _httpClientFactory.CreateClient(AdobeOptions.HttpClientName); + using var response = await client.GetAsync(indexUri, cancellationToken).ConfigureAwait(false); + response.EnsureSuccessStatusCode(); + html = await response.Content.ReadAsStringAsync(cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "Failed to download Adobe index page {Uri}", indexUri); + continue; + } + + if (string.IsNullOrEmpty(html)) + { + continue; + } + + IReadOnlyCollection entries; + try + { + entries = AdobeIndexParser.Parse(html, indexUri); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to parse Adobe index page {Uri}", indexUri); + _diagnostics.FetchFailure(); + continue; + } + + foreach (var entry in entries) + { + if (entry.PublishedUtc < windowStart) + { + continue; + } + + if (!collectedEntries.TryGetValue(entry.AdvisoryId, out var existing) || entry.PublishedUtc > existing.PublishedUtc) + { + collectedEntries[entry.AdvisoryId] = entry; + } + } + } + + foreach (var entry in collectedEntries.Values.OrderBy(static e => e.PublishedUtc)) + { + if (!maxPublished.HasValue || entry.PublishedUtc > maxPublished) + { + maxPublished = entry.PublishedUtc; + } + + var cacheKey = 
entry.DetailUri.ToString(); + touchedResources.Add(cacheKey); + + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["advisoryId"] = entry.AdvisoryId, + ["published"] = entry.PublishedUtc.ToString("O"), + ["title"] = entry.Title ?? string.Empty, + }; + + try + { + var result = await _fetchService.FetchAsync( + new SourceFetchRequest(AdobeOptions.HttpClientName, SourceName, entry.DetailUri) + { + Metadata = metadata, + AcceptHeaders = new[] { "text/html", "application/xhtml+xml", "text/plain;q=0.5" }, + }, + cancellationToken).ConfigureAwait(false); + + if (!result.IsSuccess || result.Document is null) + { + continue; + } + + if (cursor.TryGetFetchCache(cacheKey, out var cached) + && string.Equals(cached.Sha256, result.Document.Sha256, StringComparison.OrdinalIgnoreCase)) + { + _diagnostics.FetchUnchanged(); + fetchCache[cacheKey] = new AdobeFetchCacheEntry(result.Document.Sha256); + await _documentStore.UpdateStatusAsync(result.Document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + continue; + } + + _diagnostics.FetchDocument(); + fetchCache[cacheKey] = new AdobeFetchCacheEntry(result.Document.Sha256); + + if (!pendingDocuments.Contains(result.Document.Id)) + { + pendingDocuments.Add(result.Document.Id); + } + } + catch (Exception ex) + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "Failed to fetch Adobe advisory {AdvisoryId} ({Uri})", entry.AdvisoryId, entry.DetailUri); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + } + + foreach (var key in fetchCache.Keys.ToList()) + { + if (!touchedResources.Contains(key)) + { + fetchCache.Remove(key); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithLastPublished(maxPublished) + .WithFetchCache(fetchCache); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + 
public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("Adobe document {DocumentId} missing GridFS payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + AdobeDocumentMetadata metadata; + try + { + metadata = AdobeDocumentMetadata.FromDocument(document); + } + catch (Exception ex) + { + _logger.LogError(ex, "Adobe metadata parse failed for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + AdobeBulletinDto dto; + try + { + var bytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + var html = Encoding.UTF8.GetString(bytes); + dto = AdobeDetailParser.Parse(html, metadata); + } + catch (Exception ex) + { + _logger.LogError(ex, "Adobe parse failed for advisory {AdvisoryId} ({Uri})", metadata.AdvisoryId, document.Uri); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + 
pendingMappings.Remove(documentId); + continue; + } + + var json = JsonSerializer.Serialize(dto, SerializerOptions); + using var jsonDocument = JsonDocument.Parse(json); + _schemaValidator.Validate(jsonDocument, Schema, metadata.AdvisoryId); + + var payload = MongoDB.Bson.BsonDocument.Parse(json); + var dtoRecord = new DtoRecord( + Guid.NewGuid(), + document.Id, + SourceName, + "adobe.bulletin.v1", + payload, + _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + pendingDocuments.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + var now = _timeProvider.GetUtcNow(); + + foreach (var documentId in cursor.PendingMappings) + { + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dtoRecord is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + AdobeBulletinDto? 
dto; + try + { + var json = dtoRecord.Payload.ToJson(new MongoDB.Bson.IO.JsonWriterSettings + { + OutputMode = MongoDB.Bson.IO.JsonOutputMode.RelaxedExtendedJson, + }); + + dto = JsonSerializer.Deserialize(json, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogError(ex, "Adobe DTO deserialization failed for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + if (dto is null) + { + _logger.LogWarning("Adobe DTO payload deserialized as null for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var advisory = BuildAdvisory(dto, now); + if (!string.IsNullOrWhiteSpace(advisory.AdvisoryKey)) + { + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + + var flag = new PsirtFlagRecord( + advisory.AdvisoryKey, + "Adobe", + SourceName, + dto.AdvisoryId, + now); + + await _psirtFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); + } + else + { + _logger.LogWarning("Skipping PSIRT flag for advisory with missing key (document {DocumentId})", documentId); + } + + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + + pendingMappings.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private IEnumerable EnumerateIndexUris() + { + yield return _options.IndexUri; + foreach (var uri in _options.AdditionalIndexUris) + { + yield return uri; + } + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var record = await _stateRepository.TryGetAsync(SourceName, 
cancellationToken).ConfigureAwait(false); + return AdobeCursor.FromBsonDocument(record?.Cursor); + } + + private async Task UpdateCursorAsync(AdobeCursor cursor, CancellationToken cancellationToken) + { + var updatedAt = _timeProvider.GetUtcNow(); + await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), updatedAt, cancellationToken).ConfigureAwait(false); + } + + private Advisory BuildAdvisory(AdobeBulletinDto dto, DateTimeOffset recordedAt) + { + var provenance = new AdvisoryProvenance(SourceName, "parser", dto.AdvisoryId, recordedAt); + + var aliasSet = new HashSet<string>(StringComparer.OrdinalIgnoreCase) + { + dto.AdvisoryId, + }; + foreach (var cve in dto.Cves) + { + if (!string.IsNullOrWhiteSpace(cve)) + { + aliasSet.Add(cve); + } + } + + var comparer = StringComparer.OrdinalIgnoreCase; + var references = new List<(AdvisoryReference Reference, int Priority)> + { + (new AdvisoryReference(dto.DetailUrl, "advisory", "adobe-psirt", dto.Summary, provenance), 0), + }; + + foreach (var cve in dto.Cves) + { + if (string.IsNullOrWhiteSpace(cve)) + { + continue; + } + + var url = $"https://www.cve.org/CVERecord?id={cve}"; + references.Add((new AdvisoryReference(url, "advisory", cve, null, provenance), 1)); + } + + var orderedReferences = references + .GroupBy(tuple => tuple.Reference.Url, comparer) + .Select(group => group + .OrderBy(t => t.Priority) + .ThenBy(t => t.Reference.Kind ?? string.Empty, comparer) + .ThenBy(t => t.Reference.Url, comparer) + .First()) + .OrderBy(t => t.Priority) + .ThenBy(t => t.Reference.Kind ??
string.Empty, comparer) + .ThenBy(t => t.Reference.Url, comparer) + .Select(t => t.Reference) + .ToArray(); + + var affected = dto.Products + .Select(product => BuildPackage(product, recordedAt)) + .ToArray(); + + var aliases = aliasSet + .Where(static alias => !string.IsNullOrWhiteSpace(alias)) + .Select(static alias => alias.Trim()) + .Distinct(StringComparer.OrdinalIgnoreCase) + .OrderBy(static alias => alias, StringComparer.Ordinal) + .ToArray(); + + return new Advisory( + dto.AdvisoryId, + dto.Title, + dto.Summary, + language: "en", + published: dto.Published, + modified: null, + severity: null, + exploitKnown: false, + aliases, + orderedReferences, + affected, + Array.Empty(), + new[] { provenance }); + } + + private AffectedPackage BuildPackage(AdobeProductEntry product, DateTimeOffset recordedAt) + { + var identifier = string.IsNullOrWhiteSpace(product.Product) + ? "Adobe Product" + : product.Product.Trim(); + + var platform = string.IsNullOrWhiteSpace(product.Platform) ? null : product.Platform; + + var provenance = new AdvisoryProvenance( + SourceName, + "affected", + string.IsNullOrWhiteSpace(platform) ? identifier : $"{identifier}:{platform}", + recordedAt); + + var range = BuildVersionRange(product, recordedAt); var ranges = range is null ? 
Array.Empty() : new[] { range }; var normalizedVersions = BuildNormalizedVersions(product, ranges); var statuses = BuildStatuses(product, provenance); diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Apple/AppleConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Apple/AppleConnector.cs index d8ff245fc..4978190d3 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Apple/AppleConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Apple/AppleConnector.cs @@ -1,439 +1,439 @@ -using System; -using System.Collections.Generic; -using System.Linq; -using System.Net; -using System.Text.Json; -using System.Text.Json.Serialization; -using System.Threading; -using System.Threading.Tasks; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Vndr.Apple.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Concelier.Storage.Mongo.PsirtFlags; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Vndr.Apple; - -public sealed class AppleConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) - { - DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, - PropertyNameCaseInsensitive = true, - }; - - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly IPsirtFlagStore _psirtFlagStore; - private readonly ISourceStateRepository _stateRepository; - 
private readonly AppleOptions _options; - private readonly AppleDiagnostics _diagnostics; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - - public AppleConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - IPsirtFlagStore psirtFlagStore, - ISourceStateRepository stateRepository, - AppleDiagnostics diagnostics, - IOptions options, - TimeProvider? timeProvider, - ILogger logger) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _psirtFlagStore = psirtFlagStore ?? throw new ArgumentNullException(nameof(psirtFlagStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => VndrAppleConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var processedIds = cursor.ProcessedIds.ToHashSet(StringComparer.OrdinalIgnoreCase); - var maxPosted = cursor.LastPosted ?? DateTimeOffset.MinValue; - var baseline = cursor.LastPosted?.Add(-_options.ModifiedTolerance) ?? _timeProvider.GetUtcNow().Add(-_options.InitialBackfill); - - SourceFetchContentResult indexResult; - try - { - var request = new SourceFetchRequest(AppleOptions.HttpClientName, SourceName, _options.SoftwareLookupUri!) - { - AcceptHeaders = new[] { "application/json", "application/vnd.apple.security+json;q=0.9" }, - }; - - indexResult = await _fetchService.FetchContentAsync(request, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.FetchFailure(); - _logger.LogError(ex, "Apple software index fetch failed from {Uri}", _options.SoftwareLookupUri); - await _stateRepository.MarkFailureAsync(SourceName, _timeProvider.GetUtcNow(), TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (!indexResult.IsSuccess || indexResult.Content is null) - { - if (indexResult.IsNotModified) - { - _diagnostics.FetchUnchanged(); - } - - await UpdateCursorAsync(cursor, cancellationToken).ConfigureAwait(false); - return; - } - - var indexEntries = AppleIndexParser.Parse(indexResult.Content, _options.AdvisoryBaseUri!); - if (indexEntries.Count == 0) - { - await UpdateCursorAsync(cursor, cancellationToken).ConfigureAwait(false); - return; - } - - var allowlist = _options.AdvisoryAllowlist; - var blocklist = 
_options.AdvisoryBlocklist; - - var ordered = indexEntries - .Where(entry => ShouldInclude(entry, allowlist, blocklist)) - .OrderBy(entry => entry.PostingDate) - .ThenBy(entry => entry.ArticleId, StringComparer.OrdinalIgnoreCase) - .ToArray(); - - foreach (var entry in ordered) - { - cancellationToken.ThrowIfCancellationRequested(); - - if (entry.PostingDate < baseline) - { - continue; - } - - if (cursor.LastPosted.HasValue - && entry.PostingDate <= cursor.LastPosted.Value - && processedIds.Contains(entry.UpdateId)) - { - continue; - } - - var metadata = BuildMetadata(entry); - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, entry.DetailUri.ToString(), cancellationToken).ConfigureAwait(false); - - SourceFetchResult result; - try - { - result = await _fetchService.FetchAsync( - new SourceFetchRequest(AppleOptions.HttpClientName, SourceName, entry.DetailUri) - { - Metadata = metadata, - ETag = existing?.Etag, - LastModified = existing?.LastModified, - AcceptHeaders = new[] - { - "text/html", - "application/xhtml+xml", - "text/plain;q=0.5" - }, - }, - cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.FetchFailure(); - _logger.LogError(ex, "Apple advisory fetch failed for {Uri}", entry.DetailUri); - await _stateRepository.MarkFailureAsync(SourceName, _timeProvider.GetUtcNow(), TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (result.StatusCode == HttpStatusCode.NotModified) - { - _diagnostics.FetchUnchanged(); - } - - if (!result.IsSuccess || result.Document is null) - { - continue; - } - - _diagnostics.FetchItem(); - - pendingDocuments.Add(result.Document.Id); - processedIds.Add(entry.UpdateId); - - if (entry.PostingDate > maxPosted) - { - maxPosted = entry.PostingDate; - } - } - - var updated = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithLastPosted(maxPosted == DateTimeOffset.MinValue ? 
cursor.LastPosted ?? DateTimeOffset.MinValue : maxPosted, processedIds); - - await UpdateCursorAsync(updated, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remainingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _diagnostics.ParseFailure(); - _logger.LogWarning("Apple document {DocumentId} missing GridFS payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - AppleDetailDto dto; - try - { - var content = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - var html = System.Text.Encoding.UTF8.GetString(content); - var entry = RehydrateIndexEntry(document); - dto = AppleDetailParser.Parse(html, entry); - } - catch (Exception ex) - { - _diagnostics.ParseFailure(); - _logger.LogError(ex, "Apple parse failed for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remainingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - var json = 
JsonSerializer.Serialize(dto, SerializerOptions); - var payload = BsonDocument.Parse(json); - var validatedAt = _timeProvider.GetUtcNow(); - - var existingDto = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); - var dtoRecord = existingDto is null - ? new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "apple.security.update.v1", payload, validatedAt) - : existingDto with - { - Payload = payload, - SchemaVersion = "apple.security.update.v1", - ValidatedAt = validatedAt, - }; - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remainingDocuments.Remove(documentId); - pendingMappings.Add(document.Id); - } - - var updatedCursor = cursor - .WithPendingDocuments(remainingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToHashSet(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); - if (dtoRecord is null) - { - pendingMappings.Remove(documentId); - continue; - } - - AppleDetailDto dto; - try - { - dto = JsonSerializer.Deserialize<AppleDetailDto>(dtoRecord.Payload.ToJson(), SerializerOptions) - ?? 
throw new InvalidOperationException("Unable to deserialize Apple DTO."); - } - catch (Exception ex) - { - _logger.LogError(ex, "Apple DTO deserialization failed for document {DocumentId}", document.Id); - pendingMappings.Remove(documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - continue; - } - - var (advisory, flag) = AppleMapper.Map(dto, document, dtoRecord); - _diagnostics.MapAffectedCount(advisory.AffectedPackages.Length); - - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - - if (flag is not null) - { - await _psirtFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); - } - - pendingMappings.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private AppleIndexEntry RehydrateIndexEntry(DocumentRecord document) - { - var metadata = document.Metadata ?? new Dictionary(StringComparer.Ordinal); - metadata.TryGetValue("apple.articleId", out var articleId); - metadata.TryGetValue("apple.updateId", out var updateId); - metadata.TryGetValue("apple.title", out var title); - metadata.TryGetValue("apple.postingDate", out var postingDateRaw); - metadata.TryGetValue("apple.detailUri", out var detailUriRaw); - metadata.TryGetValue("apple.rapidResponse", out var rapidRaw); - metadata.TryGetValue("apple.products", out var productsJson); - - if (!DateTimeOffset.TryParse(postingDateRaw, out var postingDate)) - { - postingDate = document.FetchedAt; - } - - var detailUri = !string.IsNullOrWhiteSpace(detailUriRaw) && Uri.TryCreate(detailUriRaw, UriKind.Absolute, out var parsedUri) - ? parsedUri - : new Uri(_options.AdvisoryBaseUri!, articleId ?? 
document.Uri); - - var rapid = string.Equals(rapidRaw, "true", StringComparison.OrdinalIgnoreCase); - var products = DeserializeProducts(productsJson); - - return new AppleIndexEntry( - UpdateId: string.IsNullOrWhiteSpace(updateId) ? articleId ?? document.Uri : updateId, - ArticleId: articleId ?? document.Uri, - Title: title ?? document.Metadata?["apple.originalTitle"] ?? "Apple Security Update", - PostingDate: postingDate.ToUniversalTime(), - DetailUri: detailUri, - Products: products, - IsRapidSecurityResponse: rapid); - } - - private static IReadOnlyList DeserializeProducts(string? json) - { - if (string.IsNullOrWhiteSpace(json)) - { - return Array.Empty(); - } - - try - { - var products = JsonSerializer.Deserialize>(json, SerializerOptions); - return products is { Count: > 0 } ? products : Array.Empty(); - } - catch (JsonException) - { - return Array.Empty(); - } - } - - private static Dictionary BuildMetadata(AppleIndexEntry entry) - { - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["apple.articleId"] = entry.ArticleId, - ["apple.updateId"] = entry.UpdateId, - ["apple.title"] = entry.Title, - ["apple.postingDate"] = entry.PostingDate.ToString("O"), - ["apple.detailUri"] = entry.DetailUri.ToString(), - ["apple.rapidResponse"] = entry.IsRapidSecurityResponse ? "true" : "false", - ["apple.products"] = JsonSerializer.Serialize(entry.Products, SerializerOptions), - }; - - return metadata; - } - - private static bool ShouldInclude(AppleIndexEntry entry, IReadOnlyCollection allowlist, IReadOnlyCollection blocklist) - { - if (allowlist.Count > 0 && !allowlist.Contains(entry.ArticleId)) - { - return false; - } - - if (blocklist.Count > 0 && blocklist.Contains(entry.ArticleId)) - { - return false; - } - - return true; - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? 
AppleCursor.Empty : AppleCursor.FromBson(state.Cursor); - } - - private async Task UpdateCursorAsync(AppleCursor cursor, CancellationToken cancellationToken) - { - var document = cursor.ToBson(); - await _stateRepository.UpdateCursorAsync(SourceName, document, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); - } -} +using System; +using System.Collections.Generic; +using System.Linq; +using System.Net; +using System.Text.Json; +using System.Text.Json.Serialization; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Vndr.Apple.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Concelier.Storage.Mongo.PsirtFlags; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Vndr.Apple; + +public sealed class AppleConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) + { + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, + PropertyNameCaseInsensitive = true, + }; + + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly IPsirtFlagStore _psirtFlagStore; + private readonly ISourceStateRepository _stateRepository; + private readonly AppleOptions _options; + private readonly AppleDiagnostics _diagnostics; + private readonly TimeProvider _timeProvider; + private readonly ILogger<AppleConnector> _logger; + + public AppleConnector( + SourceFetchService fetchService, + 
RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + IPsirtFlagStore psirtFlagStore, + ISourceStateRepository stateRepository, + AppleDiagnostics diagnostics, + IOptions<AppleOptions> options, + TimeProvider? timeProvider, + ILogger<AppleConnector> logger) + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _psirtFlagStore = psirtFlagStore ?? throw new ArgumentNullException(nameof(psirtFlagStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => VndrAppleConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var processedIds = cursor.ProcessedIds.ToHashSet(StringComparer.OrdinalIgnoreCase); + var maxPosted = cursor.LastPosted ?? DateTimeOffset.MinValue; + var baseline = cursor.LastPosted?.Add(-_options.ModifiedTolerance) ?? 
_timeProvider.GetUtcNow().Add(-_options.InitialBackfill); + + SourceFetchContentResult indexResult; + try + { + var request = new SourceFetchRequest(AppleOptions.HttpClientName, SourceName, _options.SoftwareLookupUri!) + { + AcceptHeaders = new[] { "application/json", "application/vnd.apple.security+json;q=0.9" }, + }; + + indexResult = await _fetchService.FetchContentAsync(request, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "Apple software index fetch failed from {Uri}", _options.SoftwareLookupUri); + await _stateRepository.MarkFailureAsync(SourceName, _timeProvider.GetUtcNow(), TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (!indexResult.IsSuccess || indexResult.Content is null) + { + if (indexResult.IsNotModified) + { + _diagnostics.FetchUnchanged(); + } + + await UpdateCursorAsync(cursor, cancellationToken).ConfigureAwait(false); + return; + } + + var indexEntries = AppleIndexParser.Parse(indexResult.Content, _options.AdvisoryBaseUri!); + if (indexEntries.Count == 0) + { + await UpdateCursorAsync(cursor, cancellationToken).ConfigureAwait(false); + return; + } + + var allowlist = _options.AdvisoryAllowlist; + var blocklist = _options.AdvisoryBlocklist; + + var ordered = indexEntries + .Where(entry => ShouldInclude(entry, allowlist, blocklist)) + .OrderBy(entry => entry.PostingDate) + .ThenBy(entry => entry.ArticleId, StringComparer.OrdinalIgnoreCase) + .ToArray(); + + foreach (var entry in ordered) + { + cancellationToken.ThrowIfCancellationRequested(); + + if (entry.PostingDate < baseline) + { + continue; + } + + if (cursor.LastPosted.HasValue + && entry.PostingDate <= cursor.LastPosted.Value + && processedIds.Contains(entry.UpdateId)) + { + continue; + } + + var metadata = BuildMetadata(entry); + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, entry.DetailUri.ToString(), 
cancellationToken).ConfigureAwait(false); + + SourceFetchResult result; + try + { + result = await _fetchService.FetchAsync( + new SourceFetchRequest(AppleOptions.HttpClientName, SourceName, entry.DetailUri) + { + Metadata = metadata, + ETag = existing?.Etag, + LastModified = existing?.LastModified, + AcceptHeaders = new[] + { + "text/html", + "application/xhtml+xml", + "text/plain;q=0.5" + }, + }, + cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "Apple advisory fetch failed for {Uri}", entry.DetailUri); + await _stateRepository.MarkFailureAsync(SourceName, _timeProvider.GetUtcNow(), TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (result.StatusCode == HttpStatusCode.NotModified) + { + _diagnostics.FetchUnchanged(); + } + + if (!result.IsSuccess || result.Document is null) + { + continue; + } + + _diagnostics.FetchItem(); + + pendingDocuments.Add(result.Document.Id); + processedIds.Add(entry.UpdateId); + + if (entry.PostingDate > maxPosted) + { + maxPosted = entry.PostingDate; + } + } + + var updated = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithLastPosted(maxPosted == DateTimeOffset.MinValue ? cursor.LastPosted ?? 
DateTimeOffset.MinValue : maxPosted, processedIds); + + await UpdateCursorAsync(updated, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remainingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _diagnostics.ParseFailure(); + _logger.LogWarning("Apple document {DocumentId} missing GridFS payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + AppleDetailDto dto; + try + { + var content = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + var html = System.Text.Encoding.UTF8.GetString(content); + var entry = RehydrateIndexEntry(document); + dto = AppleDetailParser.Parse(html, entry); + } + catch (Exception ex) + { + _diagnostics.ParseFailure(); + _logger.LogError(ex, "Apple parse failed for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remainingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + var json = JsonSerializer.Serialize(dto, 
SerializerOptions); + var payload = BsonDocument.Parse(json); + var validatedAt = _timeProvider.GetUtcNow(); + + var existingDto = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); + var dtoRecord = existingDto is null + ? new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "apple.security.update.v1", payload, validatedAt) + : existingDto with + { + Payload = payload, + SchemaVersion = "apple.security.update.v1", + ValidatedAt = validatedAt, + }; + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remainingDocuments.Remove(documentId); + pendingMappings.Add(document.Id); + } + + var updatedCursor = cursor + .WithPendingDocuments(remainingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToHashSet(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); + if (dtoRecord is null) + { + pendingMappings.Remove(documentId); + continue; + } + + AppleDetailDto dto; + try + { + dto = JsonSerializer.Deserialize<AppleDetailDto>(dtoRecord.Payload.ToJson(), SerializerOptions) + ?? 
throw new InvalidOperationException("Unable to deserialize Apple DTO."); + } + catch (Exception ex) + { + _logger.LogError(ex, "Apple DTO deserialization failed for document {DocumentId}", document.Id); + pendingMappings.Remove(documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + continue; + } + + var (advisory, flag) = AppleMapper.Map(dto, document, dtoRecord); + _diagnostics.MapAffectedCount(advisory.AffectedPackages.Length); + + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + + if (flag is not null) + { + await _psirtFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); + } + + pendingMappings.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private AppleIndexEntry RehydrateIndexEntry(DocumentRecord document) + { + var metadata = document.Metadata ?? new Dictionary(StringComparer.Ordinal); + metadata.TryGetValue("apple.articleId", out var articleId); + metadata.TryGetValue("apple.updateId", out var updateId); + metadata.TryGetValue("apple.title", out var title); + metadata.TryGetValue("apple.postingDate", out var postingDateRaw); + metadata.TryGetValue("apple.detailUri", out var detailUriRaw); + metadata.TryGetValue("apple.rapidResponse", out var rapidRaw); + metadata.TryGetValue("apple.products", out var productsJson); + + if (!DateTimeOffset.TryParse(postingDateRaw, out var postingDate)) + { + postingDate = document.FetchedAt; + } + + var detailUri = !string.IsNullOrWhiteSpace(detailUriRaw) && Uri.TryCreate(detailUriRaw, UriKind.Absolute, out var parsedUri) + ? parsedUri + : new Uri(_options.AdvisoryBaseUri!, articleId ?? 
document.Uri); + + var rapid = string.Equals(rapidRaw, "true", StringComparison.OrdinalIgnoreCase); + var products = DeserializeProducts(productsJson); + + return new AppleIndexEntry( + UpdateId: string.IsNullOrWhiteSpace(updateId) ? articleId ?? document.Uri : updateId, + ArticleId: articleId ?? document.Uri, + Title: title ?? document.Metadata?["apple.originalTitle"] ?? "Apple Security Update", + PostingDate: postingDate.ToUniversalTime(), + DetailUri: detailUri, + Products: products, + IsRapidSecurityResponse: rapid); + } + + private static IReadOnlyList DeserializeProducts(string? json) + { + if (string.IsNullOrWhiteSpace(json)) + { + return Array.Empty(); + } + + try + { + var products = JsonSerializer.Deserialize>(json, SerializerOptions); + return products is { Count: > 0 } ? products : Array.Empty(); + } + catch (JsonException) + { + return Array.Empty(); + } + } + + private static Dictionary BuildMetadata(AppleIndexEntry entry) + { + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["apple.articleId"] = entry.ArticleId, + ["apple.updateId"] = entry.UpdateId, + ["apple.title"] = entry.Title, + ["apple.postingDate"] = entry.PostingDate.ToString("O"), + ["apple.detailUri"] = entry.DetailUri.ToString(), + ["apple.rapidResponse"] = entry.IsRapidSecurityResponse ? "true" : "false", + ["apple.products"] = JsonSerializer.Serialize(entry.Products, SerializerOptions), + }; + + return metadata; + } + + private static bool ShouldInclude(AppleIndexEntry entry, IReadOnlyCollection allowlist, IReadOnlyCollection blocklist) + { + if (allowlist.Count > 0 && !allowlist.Contains(entry.ArticleId)) + { + return false; + } + + if (blocklist.Count > 0 && blocklist.Contains(entry.ArticleId)) + { + return false; + } + + return true; + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? 
AppleCursor.Empty : AppleCursor.FromBson(state.Cursor); + } + + private async Task UpdateCursorAsync(AppleCursor cursor, CancellationToken cancellationToken) + { + var document = cursor.ToBson(); + await _stateRepository.UpdateCursorAsync(SourceName, document, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Chromium/ChromiumConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Chromium/ChromiumConnector.cs index 8a4bed41f..95910a776 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Chromium/ChromiumConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Chromium/ChromiumConnector.cs @@ -1,366 +1,366 @@ -using System.Collections.Generic; -using System.Linq; -using System.Text; -using System.Text.Json; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using MongoDB.Bson.IO; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Common.Json; -using StellaOps.Concelier.Connector.Vndr.Chromium.Configuration; -using StellaOps.Concelier.Connector.Vndr.Chromium.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Concelier.Storage.Mongo.PsirtFlags; -using StellaOps.Plugin; -using Json.Schema; - -namespace StellaOps.Concelier.Connector.Vndr.Chromium; - -public sealed class ChromiumConnector : IFeedConnector -{ - private static readonly JsonSchema Schema = ChromiumSchemaProvider.Schema; - private static readonly JsonSerializerOptions SerializerOptions = new() - { - PropertyNamingPolicy = JsonNamingPolicy.CamelCase, - DefaultIgnoreCondition = 
System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, - }; - - private readonly ChromiumFeedLoader _feedLoader; - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly IPsirtFlagStore _psirtFlagStore; - private readonly ISourceStateRepository _stateRepository; - private readonly IJsonSchemaValidator _schemaValidator; - private readonly ChromiumOptions _options; - private readonly TimeProvider _timeProvider; - private readonly ChromiumDiagnostics _diagnostics; - private readonly ILogger _logger; - - public ChromiumConnector( - ChromiumFeedLoader feedLoader, - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - IPsirtFlagStore psirtFlagStore, - ISourceStateRepository stateRepository, - IJsonSchemaValidator schemaValidator, - IOptions options, - TimeProvider? timeProvider, - ChromiumDiagnostics diagnostics, - ILogger logger) - { - _feedLoader = feedLoader ?? throw new ArgumentNullException(nameof(feedLoader)); - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _psirtFlagStore = psirtFlagStore ?? throw new ArgumentNullException(nameof(psirtFlagStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _schemaValidator = schemaValidator ?? 
throw new ArgumentNullException(nameof(schemaValidator)); - _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => VndrChromiumConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - var (windowStart, windowEnd) = CalculateWindow(cursor, now); - ProvenanceDiagnostics.ReportResumeWindow(SourceName, windowStart, _logger); - - IReadOnlyList feedEntries; - _diagnostics.FetchAttempt(); - try - { - feedEntries = await _feedLoader.LoadAsync(windowStart, windowEnd, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Chromium feed load failed {Start}-{End}", windowStart, windowEnd); - _diagnostics.FetchFailure(); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.Ordinal); - var touchedResources = new HashSet(StringComparer.Ordinal); - - var candidates = feedEntries - .Where(static entry => entry.IsSecurityUpdate()) - .OrderBy(static entry => entry.Published) - .ToArray(); - - if (candidates.Length == 0) - { - var untouched = cursor - .WithLastPublished(cursor.LastPublished ?? 
windowEnd) - .WithFetchCache(fetchCache); - await UpdateCursorAsync(untouched, cancellationToken).ConfigureAwait(false); - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToList(); - var maxPublished = cursor.LastPublished; - - foreach (var entry in candidates) - { - try - { - var cacheKey = entry.DetailUri.ToString(); - touchedResources.Add(cacheKey); - - var metadata = ChromiumDocumentMetadata.CreateMetadata(entry.PostId, entry.Title, entry.Published, entry.Updated, entry.Summary); - var request = new SourceFetchRequest(ChromiumOptions.HttpClientName, SourceName, entry.DetailUri) - { - Metadata = metadata, - AcceptHeaders = new[] { "text/html", "application/xhtml+xml", "text/plain;q=0.5" }, - }; - - var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - if (!result.IsSuccess || result.Document is null) - { - continue; - } - - if (cursor.TryGetFetchCache(cacheKey, out var cached) && string.Equals(cached.Sha256, result.Document.Sha256, StringComparison.OrdinalIgnoreCase)) - { - _diagnostics.FetchUnchanged(); - fetchCache[cacheKey] = new ChromiumFetchCacheEntry(result.Document.Sha256); - await _documentStore.UpdateStatusAsync(result.Document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - - if (!maxPublished.HasValue || entry.Published > maxPublished) - { - maxPublished = entry.Published; - } - - continue; - } - - _diagnostics.FetchDocument(); - if (!pendingDocuments.Contains(result.Document.Id)) - { - pendingDocuments.Add(result.Document.Id); - } - - if (!maxPublished.HasValue || entry.Published > maxPublished) - { - maxPublished = entry.Published; - } - - fetchCache[cacheKey] = new ChromiumFetchCacheEntry(result.Document.Sha256); - } - catch (Exception ex) - { - _logger.LogError(ex, "Chromium fetch failed for {Uri}", entry.DetailUri); - _diagnostics.FetchFailure(); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, 
cancellationToken).ConfigureAwait(false); - throw; - } - } - - if (touchedResources.Count > 0) - { - var keysToRemove = fetchCache.Keys.Where(key => !touchedResources.Contains(key)).ToArray(); - foreach (var key in keysToRemove) - { - fetchCache.Remove(key); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(cursor.PendingMappings) - .WithLastPublished(maxPublished ?? cursor.LastPublished ?? windowEnd) - .WithFetchCache(fetchCache); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("Chromium document {DocumentId} missing GridFS payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - ChromiumDto dto; - try - { - var metadata = ChromiumDocumentMetadata.FromDocument(document); - var content = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - var html = Encoding.UTF8.GetString(content); - dto = ChromiumParser.Parse(html, metadata); - } - catch (Exception ex) - { - _logger.LogError(ex, "Chromium parse failed for {Uri}", document.Uri); - 
_diagnostics.ParseFailure(); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - var json = JsonSerializer.Serialize(dto, SerializerOptions); - using var jsonDocument = JsonDocument.Parse(json); - try - { - _schemaValidator.Validate(jsonDocument, Schema, dto.PostId); - } - catch (StellaOps.Concelier.Connector.Common.Json.JsonSchemaValidationException ex) - { - _logger.LogError(ex, "Chromium schema validation failed for {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - var payload = BsonDocument.Parse(json); - var existingDto = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); - var validatedAt = _timeProvider.GetUtcNow(); - - var dtoRecord = existingDto is null - ? 
new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "chromium.post.v1", payload, validatedAt) - : existingDto with - { - Payload = payload, - SchemaVersion = "chromium.post.v1", - ValidatedAt = validatedAt, - }; - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - _diagnostics.ParseSuccess(); - - pendingDocuments.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dtoRecord is null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - var json = dtoRecord.Payload.ToJson(new JsonWriterSettings { OutputMode = JsonOutputMode.RelaxedExtendedJson }); - var dto = JsonSerializer.Deserialize(json, SerializerOptions); - if (dto is null) - { - _logger.LogWarning("Chromium DTO deserialization failed for {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var recordedAt = _timeProvider.GetUtcNow(); - var (advisory, 
flag) = ChromiumMapper.Map(dto, SourceName, recordedAt); - - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _psirtFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - _diagnostics.MapSuccess(); - - pendingMappings.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return ChromiumCursor.FromBsonDocument(record?.Cursor); - } - - private async Task UpdateCursorAsync(ChromiumCursor cursor, CancellationToken cancellationToken) - { - var completedAt = _timeProvider.GetUtcNow(); - await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false); - } - - private (DateTimeOffset start, DateTimeOffset end) CalculateWindow(ChromiumCursor cursor, DateTimeOffset now) - { - var lastPublished = cursor.LastPublished ?? 
now - _options.InitialBackfill; - var start = lastPublished - _options.WindowOverlap; - var backfill = now - _options.InitialBackfill; - if (start < backfill) - { - start = backfill; - } - - var end = now; - if (end <= start) - { - end = start.AddHours(1); - } - - return (start, end); - } -} +using System.Collections.Generic; +using System.Linq; +using System.Text; +using System.Text.Json; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using MongoDB.Bson.IO; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Common.Json; +using StellaOps.Concelier.Connector.Vndr.Chromium.Configuration; +using StellaOps.Concelier.Connector.Vndr.Chromium.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Concelier.Storage.Mongo.PsirtFlags; +using StellaOps.Plugin; +using Json.Schema; + +namespace StellaOps.Concelier.Connector.Vndr.Chromium; + +public sealed class ChromiumConnector : IFeedConnector +{ + private static readonly JsonSchema Schema = ChromiumSchemaProvider.Schema; + private static readonly JsonSerializerOptions SerializerOptions = new() + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, + }; + + private readonly ChromiumFeedLoader _feedLoader; + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly IPsirtFlagStore _psirtFlagStore; + private readonly ISourceStateRepository _stateRepository; + private readonly IJsonSchemaValidator 
_schemaValidator; + private readonly ChromiumOptions _options; + private readonly TimeProvider _timeProvider; + private readonly ChromiumDiagnostics _diagnostics; + private readonly ILogger _logger; + + public ChromiumConnector( + ChromiumFeedLoader feedLoader, + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + IPsirtFlagStore psirtFlagStore, + ISourceStateRepository stateRepository, + IJsonSchemaValidator schemaValidator, + IOptions options, + TimeProvider? timeProvider, + ChromiumDiagnostics diagnostics, + ILogger logger) + { + _feedLoader = feedLoader ?? throw new ArgumentNullException(nameof(feedLoader)); + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _psirtFlagStore = psirtFlagStore ?? throw new ArgumentNullException(nameof(psirtFlagStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _schemaValidator = schemaValidator ?? throw new ArgumentNullException(nameof(schemaValidator)); + _options = options?.Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => VndrChromiumConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + var (windowStart, windowEnd) = CalculateWindow(cursor, now); + ProvenanceDiagnostics.ReportResumeWindow(SourceName, windowStart, _logger); + + IReadOnlyList feedEntries; + _diagnostics.FetchAttempt(); + try + { + feedEntries = await _feedLoader.LoadAsync(windowStart, windowEnd, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Chromium feed load failed {Start}-{End}", windowStart, windowEnd); + _diagnostics.FetchFailure(); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.Ordinal); + var touchedResources = new HashSet(StringComparer.Ordinal); + + var candidates = feedEntries + .Where(static entry => entry.IsSecurityUpdate()) + .OrderBy(static entry => entry.Published) + .ToArray(); + + if (candidates.Length == 0) + { + var untouched = cursor + .WithLastPublished(cursor.LastPublished ?? 
windowEnd) + .WithFetchCache(fetchCache); + await UpdateCursorAsync(untouched, cancellationToken).ConfigureAwait(false); + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToList(); + var maxPublished = cursor.LastPublished; + + foreach (var entry in candidates) + { + try + { + var cacheKey = entry.DetailUri.ToString(); + touchedResources.Add(cacheKey); + + var metadata = ChromiumDocumentMetadata.CreateMetadata(entry.PostId, entry.Title, entry.Published, entry.Updated, entry.Summary); + var request = new SourceFetchRequest(ChromiumOptions.HttpClientName, SourceName, entry.DetailUri) + { + Metadata = metadata, + AcceptHeaders = new[] { "text/html", "application/xhtml+xml", "text/plain;q=0.5" }, + }; + + var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + if (!result.IsSuccess || result.Document is null) + { + continue; + } + + if (cursor.TryGetFetchCache(cacheKey, out var cached) && string.Equals(cached.Sha256, result.Document.Sha256, StringComparison.OrdinalIgnoreCase)) + { + _diagnostics.FetchUnchanged(); + fetchCache[cacheKey] = new ChromiumFetchCacheEntry(result.Document.Sha256); + await _documentStore.UpdateStatusAsync(result.Document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + + if (!maxPublished.HasValue || entry.Published > maxPublished) + { + maxPublished = entry.Published; + } + + continue; + } + + _diagnostics.FetchDocument(); + if (!pendingDocuments.Contains(result.Document.Id)) + { + pendingDocuments.Add(result.Document.Id); + } + + if (!maxPublished.HasValue || entry.Published > maxPublished) + { + maxPublished = entry.Published; + } + + fetchCache[cacheKey] = new ChromiumFetchCacheEntry(result.Document.Sha256); + } + catch (Exception ex) + { + _logger.LogError(ex, "Chromium fetch failed for {Uri}", entry.DetailUri); + _diagnostics.FetchFailure(); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, 
cancellationToken).ConfigureAwait(false); + throw; + } + } + + if (touchedResources.Count > 0) + { + var keysToRemove = fetchCache.Keys.Where(key => !touchedResources.Contains(key)).ToArray(); + foreach (var key in keysToRemove) + { + fetchCache.Remove(key); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(cursor.PendingMappings) + .WithLastPublished(maxPublished ?? cursor.LastPublished ?? windowEnd) + .WithFetchCache(fetchCache); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("Chromium document {DocumentId} missing GridFS payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + ChromiumDto dto; + try + { + var metadata = ChromiumDocumentMetadata.FromDocument(document); + var content = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + var html = Encoding.UTF8.GetString(content); + dto = ChromiumParser.Parse(html, metadata); + } + catch (Exception ex) + { + _logger.LogError(ex, "Chromium parse failed for {Uri}", document.Uri); + 
_diagnostics.ParseFailure(); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + var json = JsonSerializer.Serialize(dto, SerializerOptions); + using var jsonDocument = JsonDocument.Parse(json); + try + { + _schemaValidator.Validate(jsonDocument, Schema, dto.PostId); + } + catch (StellaOps.Concelier.Connector.Common.Json.JsonSchemaValidationException ex) + { + _logger.LogError(ex, "Chromium schema validation failed for {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + var payload = BsonDocument.Parse(json); + var existingDto = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); + var validatedAt = _timeProvider.GetUtcNow(); + + var dtoRecord = existingDto is null + ? 
new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "chromium.post.v1", payload, validatedAt) + : existingDto with + { + Payload = payload, + SchemaVersion = "chromium.post.v1", + ValidatedAt = validatedAt, + }; + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + _diagnostics.ParseSuccess(); + + pendingDocuments.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dtoRecord is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + var json = dtoRecord.Payload.ToJson(new JsonWriterSettings { OutputMode = JsonOutputMode.RelaxedExtendedJson }); + var dto = JsonSerializer.Deserialize(json, SerializerOptions); + if (dto is null) + { + _logger.LogWarning("Chromium DTO deserialization failed for {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var recordedAt = _timeProvider.GetUtcNow(); + var (advisory, 
flag) = ChromiumMapper.Map(dto, SourceName, recordedAt); + + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _psirtFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + _diagnostics.MapSuccess(); + + pendingMappings.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return ChromiumCursor.FromBsonDocument(record?.Cursor); + } + + private async Task UpdateCursorAsync(ChromiumCursor cursor, CancellationToken cancellationToken) + { + var completedAt = _timeProvider.GetUtcNow(); + await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false); + } + + private (DateTimeOffset start, DateTimeOffset end) CalculateWindow(ChromiumCursor cursor, DateTimeOffset now) + { + var lastPublished = cursor.LastPublished ?? 
now - _options.InitialBackfill; + var start = lastPublished - _options.WindowOverlap; + var backfill = now - _options.InitialBackfill; + if (start < backfill) + { + start = backfill; + } + + var end = now; + if (end <= start) + { + end = start.AddHours(1); + } + + return (start, end); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Cisco/CiscoConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Cisco/CiscoConnector.cs index 2ef7b35c4..cbed9f590 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Cisco/CiscoConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Cisco/CiscoConnector.cs @@ -163,7 +163,7 @@ public sealed class CiscoConnector : IFeedConnector BuildMetadata(advisory), Etag: null, LastModified: advisory.LastUpdated ?? advisory.FirstPublished ?? now, - GridFsId: gridFsId, + PayloadId: gridFsId, ExpiresAt: null); var upserted = await _documentStore.UpsertAsync(record, cancellationToken).ConfigureAwait(false); @@ -259,7 +259,7 @@ public sealed class CiscoConnector : IFeedConnector continue; } - if (!document.GridFsId.HasValue) + if (!document.PayloadId.HasValue) { _diagnostics.ParseFailure(); _logger.LogWarning("Cisco document {DocumentId} missing GridFS payload", documentId); @@ -273,7 +273,7 @@ public sealed class CiscoConnector : IFeedConnector byte[] payload; try { - payload = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); + payload = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); } catch (Exception ex) { diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Msrc/MsrcConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Msrc/MsrcConnector.cs index c4fd943d2..8d192c4b0 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Msrc/MsrcConnector.cs +++ 
b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Msrc/MsrcConnector.cs @@ -133,7 +133,7 @@ public sealed class MsrcConnector : IFeedConnector } _diagnostics.DetailFetchAttempt(); - if (existing?.GridFsId is { } oldGridId) + if (existing?.PayloadId is { } oldGridId) { await _rawDocumentStorage.DeleteAsync(oldGridId, cancellationToken).ConfigureAwait(false); } @@ -238,7 +238,7 @@ public sealed class MsrcConnector : IFeedConnector continue; } - if (!document.GridFsId.HasValue) + if (!document.PayloadId.HasValue) { await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); remainingDocuments.Remove(documentId); @@ -250,7 +250,7 @@ public sealed class MsrcConnector : IFeedConnector byte[] payload; try { - payload = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); + payload = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); } catch (Exception ex) { diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Oracle/OracleConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Oracle/OracleConnector.cs index 912a67070..9a4b5acd3 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Oracle/OracleConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Oracle/OracleConnector.cs @@ -1,366 +1,366 @@ -using System; -using System.Collections.Generic; -using System.Linq; -using System.Text.Json; -using System.Threading; -using System.Threading.Tasks; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Vndr.Oracle.Configuration; -using StellaOps.Concelier.Connector.Vndr.Oracle.Internal; -using StellaOps.Concelier.Storage.Mongo; -using 
StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Concelier.Storage.Mongo.PsirtFlags; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Vndr.Oracle; - -public sealed class OracleConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new() - { - PropertyNamingPolicy = JsonNamingPolicy.CamelCase, - DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, - }; - - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly IPsirtFlagStore _psirtFlagStore; - private readonly ISourceStateRepository _stateRepository; - private readonly OracleCalendarFetcher _calendarFetcher; - private readonly OracleOptions _options; - private readonly TimeProvider _timeProvider; - private readonly ILogger _logger; - - public OracleConnector( - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - IPsirtFlagStore psirtFlagStore, - ISourceStateRepository stateRepository, - OracleCalendarFetcher calendarFetcher, - IOptions options, - TimeProvider? timeProvider, - ILogger logger) - { - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); - _psirtFlagStore = psirtFlagStore ?? 
throw new ArgumentNullException(nameof(psirtFlagStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _calendarFetcher = calendarFetcher ?? throw new ArgumentNullException(nameof(calendarFetcher)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => VndrOracleConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var pendingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.OrdinalIgnoreCase); - var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); - var now = _timeProvider.GetUtcNow(); - - var advisoryUris = await ResolveAdvisoryUrisAsync(cancellationToken).ConfigureAwait(false); - - foreach (var uri in advisoryUris) - { - cancellationToken.ThrowIfCancellationRequested(); - - try - { - var cacheKey = uri.AbsoluteUri; - touchedResources.Add(cacheKey); - - var advisoryId = DeriveAdvisoryId(uri); - var title = advisoryId.Replace('-', ' '); - var published = now; - - var metadata = OracleDocumentMetadata.CreateMetadata(advisoryId, title, published); - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, uri.ToString(), cancellationToken).ConfigureAwait(false); - - var request = new SourceFetchRequest(OracleOptions.HttpClientName, SourceName, uri) - { - Metadata = metadata, - ETag = existing?.Etag, - LastModified = existing?.LastModified, - AcceptHeaders = new[] { "text/html", "application/xhtml+xml", "text/plain;q=0.5" }, - }; - - var result = await 
_fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); - if (!result.IsSuccess || result.Document is null) - { - continue; - } - - var cacheEntry = OracleFetchCacheEntry.FromDocument(result.Document); - if (existing is not null - && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal) - && cursor.TryGetFetchCache(cacheKey, out var cached) - && cached.Matches(result.Document)) - { - _logger.LogDebug("Oracle advisory {AdvisoryId} unchanged; skipping parse/map", advisoryId); - await _documentStore.UpdateStatusAsync(result.Document.Id, existing.Status, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(result.Document.Id); - pendingMappings.Remove(result.Document.Id); - fetchCache[cacheKey] = cacheEntry; - continue; - } - - fetchCache[cacheKey] = cacheEntry; - - if (!pendingDocuments.Contains(result.Document.Id)) - { - pendingDocuments.Add(result.Document.Id); - } - - if (_options.RequestDelay > TimeSpan.Zero) - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - } - catch (Exception ex) - { - _logger.LogError(ex, "Oracle fetch failed for {Uri}", uri); - await _stateRepository.MarkFailureAsync(SourceName, _timeProvider.GetUtcNow(), TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - } - - if (fetchCache.Count > 0 && touchedResources.Count > 0) - { - var stale = fetchCache.Keys.Where(key => !touchedResources.Contains(key)).ToArray(); - foreach (var key in stale) - { - fetchCache.Remove(key); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithFetchCache(fetchCache) - .WithLastProcessed(now); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await 
GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var pendingDocuments = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("Oracle document {DocumentId} missing GridFS payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - OracleDto dto; - try - { - var metadata = OracleDocumentMetadata.FromDocument(document); - var content = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - var html = System.Text.Encoding.UTF8.GetString(content); - dto = OracleParser.Parse(html, metadata); - } - catch (Exception ex) - { - _logger.LogError(ex, "Oracle parse failed for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - if (!OracleDtoValidator.TryNormalize(dto, out var normalized, out var validationError)) - { - _logger.LogWarning("Oracle validation failed for document {DocumentId}: {Reason}", document.Id, validationError ?? 
"unknown"); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingDocuments.Remove(documentId); - pendingMappings.Remove(documentId); - continue; - } - - dto = normalized; - - var json = JsonSerializer.Serialize(dto, SerializerOptions); - var payload = BsonDocument.Parse(json); - var validatedAt = _timeProvider.GetUtcNow(); - - var existingDto = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); - var dtoRecord = existingDto is null - ? new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "oracle.advisory.v1", payload, validatedAt) - : existingDto with - { - Payload = payload, - SchemaVersion = "oracle.advisory.v1", - ValidatedAt = validatedAt, - }; - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - pendingDocuments.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - - if (dtoRecord is null || document is null) - { - 
pendingMappings.Remove(documentId); - continue; - } - - OracleDto? dto; - try - { - var json = dtoRecord.Payload.ToJson(); - dto = JsonSerializer.Deserialize(json, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogError(ex, "Oracle DTO deserialization failed for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - if (dto is null) - { - _logger.LogWarning("Oracle DTO payload deserialized as null for document {DocumentId}", documentId); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var mappedAt = _timeProvider.GetUtcNow(); - var (advisory, flag) = OracleMapper.Map(dto, document, dtoRecord, SourceName, mappedAt); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _psirtFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - - pendingMappings.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return OracleCursor.FromBson(record?.Cursor); - } - - private async Task UpdateCursorAsync(OracleCursor cursor, CancellationToken cancellationToken) - { - var completedAt = _timeProvider.GetUtcNow(); - await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false); - } - - private async Task> 
ResolveAdvisoryUrisAsync(CancellationToken cancellationToken) - { - var uris = new HashSet(StringComparer.OrdinalIgnoreCase); - - foreach (var uri in _options.AdvisoryUris) - { - if (uri is not null) - { - uris.Add(uri.AbsoluteUri); - } - } - - var calendarUris = await _calendarFetcher.GetAdvisoryUrisAsync(cancellationToken).ConfigureAwait(false); - foreach (var uri in calendarUris) - { - uris.Add(uri.AbsoluteUri); - } - - return uris - .Select(static value => new Uri(value, UriKind.Absolute)) - .OrderBy(static value => value.AbsoluteUri, StringComparer.OrdinalIgnoreCase) - .ToArray(); - } - - private static string DeriveAdvisoryId(Uri uri) - { - var segments = uri.Segments; - if (segments.Length == 0) - { - return uri.AbsoluteUri; - } - - var slug = segments[^1].Trim('/'); - if (string.IsNullOrWhiteSpace(slug)) - { - return uri.AbsoluteUri; - } - - return slug.Replace('.', '-'); - } -} +using System; +using System.Collections.Generic; +using System.Linq; +using System.Text.Json; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Vndr.Oracle.Configuration; +using StellaOps.Concelier.Connector.Vndr.Oracle.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Concelier.Storage.Mongo.PsirtFlags; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Vndr.Oracle; + +public sealed class OracleConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new() + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, + }; + + private 
readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly IPsirtFlagStore _psirtFlagStore; + private readonly ISourceStateRepository _stateRepository; + private readonly OracleCalendarFetcher _calendarFetcher; + private readonly OracleOptions _options; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public OracleConnector( + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + IPsirtFlagStore psirtFlagStore, + ISourceStateRepository stateRepository, + OracleCalendarFetcher calendarFetcher, + IOptions options, + TimeProvider? timeProvider, + ILogger logger) + { + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _psirtFlagStore = psirtFlagStore ?? throw new ArgumentNullException(nameof(psirtFlagStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _calendarFetcher = calendarFetcher ?? throw new ArgumentNullException(nameof(calendarFetcher)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _logger = logger ?? 
throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => VndrOracleConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var pendingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.OrdinalIgnoreCase); + var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); + var now = _timeProvider.GetUtcNow(); + + var advisoryUris = await ResolveAdvisoryUrisAsync(cancellationToken).ConfigureAwait(false); + + foreach (var uri in advisoryUris) + { + cancellationToken.ThrowIfCancellationRequested(); + + try + { + var cacheKey = uri.AbsoluteUri; + touchedResources.Add(cacheKey); + + var advisoryId = DeriveAdvisoryId(uri); + var title = advisoryId.Replace('-', ' '); + var published = now; + + var metadata = OracleDocumentMetadata.CreateMetadata(advisoryId, title, published); + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, uri.ToString(), cancellationToken).ConfigureAwait(false); + + var request = new SourceFetchRequest(OracleOptions.HttpClientName, SourceName, uri) + { + Metadata = metadata, + ETag = existing?.Etag, + LastModified = existing?.LastModified, + AcceptHeaders = new[] { "text/html", "application/xhtml+xml", "text/plain;q=0.5" }, + }; + + var result = await _fetchService.FetchAsync(request, cancellationToken).ConfigureAwait(false); + if (!result.IsSuccess || result.Document is null) + { + continue; + } + + var cacheEntry = OracleFetchCacheEntry.FromDocument(result.Document); + if (existing is not null + && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal) + && cursor.TryGetFetchCache(cacheKey, out var cached) + && cached.Matches(result.Document)) + { + _logger.LogDebug("Oracle advisory {AdvisoryId} 
unchanged; skipping parse/map", advisoryId); + await _documentStore.UpdateStatusAsync(result.Document.Id, existing.Status, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(result.Document.Id); + pendingMappings.Remove(result.Document.Id); + fetchCache[cacheKey] = cacheEntry; + continue; + } + + fetchCache[cacheKey] = cacheEntry; + + if (!pendingDocuments.Contains(result.Document.Id)) + { + pendingDocuments.Add(result.Document.Id); + } + + if (_options.RequestDelay > TimeSpan.Zero) + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Oracle fetch failed for {Uri}", uri); + await _stateRepository.MarkFailureAsync(SourceName, _timeProvider.GetUtcNow(), TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + } + + if (fetchCache.Count > 0 && touchedResources.Count > 0) + { + var stale = fetchCache.Keys.Where(key => !touchedResources.Contains(key)).ToArray(); + foreach (var key in stale) + { + fetchCache.Remove(key); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithFetchCache(fetchCache) + .WithLastProcessed(now); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var pendingDocuments = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + pendingDocuments.Remove(documentId); + 
pendingMappings.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("Oracle document {DocumentId} missing GridFS payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + OracleDto dto; + try + { + var metadata = OracleDocumentMetadata.FromDocument(document); + var content = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + var html = System.Text.Encoding.UTF8.GetString(content); + dto = OracleParser.Parse(html, metadata); + } + catch (Exception ex) + { + _logger.LogError(ex, "Oracle parse failed for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + if (!OracleDtoValidator.TryNormalize(dto, out var normalized, out var validationError)) + { + _logger.LogWarning("Oracle validation failed for document {DocumentId}: {Reason}", document.Id, validationError ?? "unknown"); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingDocuments.Remove(documentId); + pendingMappings.Remove(documentId); + continue; + } + + dto = normalized; + + var json = JsonSerializer.Serialize(dto, SerializerOptions); + var payload = BsonDocument.Parse(json); + var validatedAt = _timeProvider.GetUtcNow(); + + var existingDto = await _dtoStore.FindByDocumentIdAsync(document.Id, cancellationToken).ConfigureAwait(false); + var dtoRecord = existingDto is null + ? 
new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "oracle.advisory.v1", payload, validatedAt) + : existingDto with + { + Payload = payload, + SchemaVersion = "oracle.advisory.v1", + ValidatedAt = validatedAt, + }; + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + pendingDocuments.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + cancellationToken.ThrowIfCancellationRequested(); + + var dtoRecord = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + + if (dtoRecord is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + OracleDto? 
dto; + try + { + var json = dtoRecord.Payload.ToJson(); + dto = JsonSerializer.Deserialize(json, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogError(ex, "Oracle DTO deserialization failed for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + if (dto is null) + { + _logger.LogWarning("Oracle DTO payload deserialized as null for document {DocumentId}", documentId); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var mappedAt = _timeProvider.GetUtcNow(); + var (advisory, flag) = OracleMapper.Map(dto, document, dtoRecord, SourceName, mappedAt); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _psirtFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + + pendingMappings.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var record = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return OracleCursor.FromBson(record?.Cursor); + } + + private async Task UpdateCursorAsync(OracleCursor cursor, CancellationToken cancellationToken) + { + var completedAt = _timeProvider.GetUtcNow(); + await _stateRepository.UpdateCursorAsync(SourceName, cursor.ToBsonDocument(), completedAt, cancellationToken).ConfigureAwait(false); + } + + private async Task> ResolveAdvisoryUrisAsync(CancellationToken cancellationToken) + { + var uris = new 
HashSet(StringComparer.OrdinalIgnoreCase); + + foreach (var uri in _options.AdvisoryUris) + { + if (uri is not null) + { + uris.Add(uri.AbsoluteUri); + } + } + + var calendarUris = await _calendarFetcher.GetAdvisoryUrisAsync(cancellationToken).ConfigureAwait(false); + foreach (var uri in calendarUris) + { + uris.Add(uri.AbsoluteUri); + } + + return uris + .Select(static value => new Uri(value, UriKind.Absolute)) + .OrderBy(static value => value.AbsoluteUri, StringComparer.OrdinalIgnoreCase) + .ToArray(); + } + + private static string DeriveAdvisoryId(Uri uri) + { + var segments = uri.Segments; + if (segments.Length == 0) + { + return uri.AbsoluteUri; + } + + var slug = segments[^1].Trim('/'); + if (string.IsNullOrWhiteSpace(slug)) + { + return uri.AbsoluteUri; + } + + return slug.Replace('.', '-'); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Vmware/VmwareConnector.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Vmware/VmwareConnector.cs index 1c2095d82..790414238 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Vmware/VmwareConnector.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Connector.Vndr.Vmware/VmwareConnector.cs @@ -1,454 +1,454 @@ -using System; -using System.Collections.Generic; -using System.Linq; -using System.Net.Http; -using System.Text.Json; -using System.Threading; -using System.Threading.Tasks; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using MongoDB.Bson.IO; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Vndr.Vmware.Configuration; -using StellaOps.Concelier.Connector.Vndr.Vmware.Internal; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using 
StellaOps.Concelier.Storage.Mongo.PsirtFlags; -using StellaOps.Plugin; - -namespace StellaOps.Concelier.Connector.Vndr.Vmware; - -public sealed class VmwareConnector : IFeedConnector -{ - private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) - { - PropertyNameCaseInsensitive = true, - DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, - }; - - private readonly IHttpClientFactory _httpClientFactory; - private readonly SourceFetchService _fetchService; - private readonly RawDocumentStorage _rawDocumentStorage; - private readonly IDocumentStore _documentStore; - private readonly IDtoStore _dtoStore; - private readonly IAdvisoryStore _advisoryStore; - private readonly ISourceStateRepository _stateRepository; - private readonly IPsirtFlagStore _psirtFlagStore; - private readonly VmwareOptions _options; - private readonly TimeProvider _timeProvider; - private readonly VmwareDiagnostics _diagnostics; - private readonly ILogger _logger; - - public VmwareConnector( - IHttpClientFactory httpClientFactory, - SourceFetchService fetchService, - RawDocumentStorage rawDocumentStorage, - IDocumentStore documentStore, - IDtoStore dtoStore, - IAdvisoryStore advisoryStore, - ISourceStateRepository stateRepository, - IPsirtFlagStore psirtFlagStore, - IOptions options, - TimeProvider? timeProvider, - VmwareDiagnostics diagnostics, - ILogger logger) - { - _httpClientFactory = httpClientFactory ?? throw new ArgumentNullException(nameof(httpClientFactory)); - _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); - _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); - _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); - _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); - _advisoryStore = advisoryStore ?? 
throw new ArgumentNullException(nameof(advisoryStore)); - _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); - _psirtFlagStore = psirtFlagStore ?? throw new ArgumentNullException(nameof(psirtFlagStore)); - _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); - _options.Validate(); - _timeProvider = timeProvider ?? TimeProvider.System; - _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); - _logger = logger ?? throw new ArgumentNullException(nameof(logger)); - } - - public string SourceName => VmwareConnectorPlugin.SourceName; - - public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - var now = _timeProvider.GetUtcNow(); - var pendingDocuments = cursor.PendingDocuments.ToHashSet(); - var pendingMappings = cursor.PendingMappings.ToHashSet(); - var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.OrdinalIgnoreCase); - var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); - var remainingCapacity = _options.MaxAdvisoriesPerFetch; - - IReadOnlyList indexItems; - try - { - indexItems = await FetchIndexAsync(cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _diagnostics.FetchFailure(); - _logger.LogError(ex, "Failed to retrieve VMware advisory index"); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (indexItems.Count == 0) - { - return; - } - - var orderedItems = indexItems - .Where(static item => !string.IsNullOrWhiteSpace(item.Id) && !string.IsNullOrWhiteSpace(item.DetailUrl)) - .OrderBy(static item => item.Modified ?? 
DateTimeOffset.MinValue) - .ThenBy(static item => item.Id, StringComparer.OrdinalIgnoreCase) - .ToArray(); - - var baseline = cursor.LastModified ?? now - _options.InitialBackfill; - var resumeStart = baseline - _options.ModifiedTolerance; - ProvenanceDiagnostics.ReportResumeWindow(SourceName, resumeStart, _logger); - var processedIds = new HashSet(cursor.ProcessedIds, StringComparer.OrdinalIgnoreCase); - var maxModified = cursor.LastModified ?? DateTimeOffset.MinValue; - var processedUpdated = false; - - foreach (var item in orderedItems) - { - if (remainingCapacity <= 0) - { - break; - } - - cancellationToken.ThrowIfCancellationRequested(); - - var modified = (item.Modified ?? DateTimeOffset.MinValue).ToUniversalTime(); - if (modified < baseline - _options.ModifiedTolerance) - { - continue; - } - - if (cursor.LastModified.HasValue && modified < cursor.LastModified.Value - _options.ModifiedTolerance) - { - continue; - } - - if (modified == cursor.LastModified && cursor.ProcessedIds.Contains(item.Id, StringComparer.OrdinalIgnoreCase)) - { - continue; - } - - if (!Uri.TryCreate(item.DetailUrl, UriKind.Absolute, out var detailUri)) - { - _logger.LogWarning("VMware advisory {AdvisoryId} has invalid detail URL {Url}", item.Id, item.DetailUrl); - continue; - } - - var cacheKey = detailUri.AbsoluteUri; - touchedResources.Add(cacheKey); - - var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, cacheKey, cancellationToken).ConfigureAwait(false); - var metadata = new Dictionary(StringComparer.Ordinal) - { - ["vmware.id"] = item.Id, - ["vmware.modified"] = modified.ToString("O"), - }; - - SourceFetchResult result; - try - { - result = await _fetchService.FetchAsync( - new SourceFetchRequest(VmwareOptions.HttpClientName, SourceName, detailUri) - { - Metadata = metadata, - ETag = existing?.Etag, - LastModified = existing?.LastModified, - AcceptHeaders = new[] { "application/json" }, - }, - cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - 
{ - _diagnostics.FetchFailure(); - _logger.LogError(ex, "Failed to fetch VMware advisory {AdvisoryId}", item.Id); - await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); - throw; - } - - if (result.IsNotModified) - { - _diagnostics.FetchUnchanged(); - if (existing is not null) - { - fetchCache[cacheKey] = VmwareFetchCacheEntry.FromDocument(existing); - pendingDocuments.Remove(existing.Id); - pendingMappings.Remove(existing.Id); - _logger.LogInformation("VMware advisory {AdvisoryId} returned 304 Not Modified", item.Id); - } - - continue; - } - - if (!result.IsSuccess || result.Document is null) - { - _diagnostics.FetchFailure(); - continue; - } - - remainingCapacity--; - - if (modified > maxModified) - { - maxModified = modified; - processedIds.Clear(); - processedUpdated = true; - } - - if (modified == maxModified) - { - processedIds.Add(item.Id); - processedUpdated = true; - } - - var cacheEntry = VmwareFetchCacheEntry.FromDocument(result.Document); - - if (existing is not null - && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal) - && cursor.TryGetFetchCache(cacheKey, out var cachedEntry) - && cachedEntry.Matches(result.Document)) - { - _diagnostics.FetchUnchanged(); - fetchCache[cacheKey] = cacheEntry; - pendingDocuments.Remove(result.Document.Id); - pendingMappings.Remove(result.Document.Id); - await _documentStore.UpdateStatusAsync(result.Document.Id, existing.Status, cancellationToken).ConfigureAwait(false); - _logger.LogInformation("VMware advisory {AdvisoryId} unchanged; skipping reprocessing", item.Id); - continue; - } - - _diagnostics.FetchItem(); - fetchCache[cacheKey] = cacheEntry; - pendingDocuments.Add(result.Document.Id); - _logger.LogInformation( - "VMware advisory {AdvisoryId} fetched (documentId={DocumentId}, sha256={Sha})", - item.Id, - result.Document.Id, - result.Document.Sha256); - - if (_options.RequestDelay > TimeSpan.Zero) - { 
- try - { - await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); - } - catch (TaskCanceledException) - { - break; - } - } - } - - if (fetchCache.Count > 0 && touchedResources.Count > 0) - { - var stale = fetchCache.Keys.Where(key => !touchedResources.Contains(key)).ToArray(); - foreach (var key in stale) - { - fetchCache.Remove(key); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(pendingDocuments) - .WithPendingMappings(pendingMappings) - .WithFetchCache(fetchCache); - - if (processedUpdated) - { - updatedCursor = updatedCursor.WithLastModified(maxModified, processedIds); - } - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingDocuments.Count == 0) - { - return; - } - - var remaining = cursor.PendingDocuments.ToList(); - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in cursor.PendingDocuments) - { - cancellationToken.ThrowIfCancellationRequested(); - - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (document is null) - { - remaining.Remove(documentId); - continue; - } - - if (!document.GridFsId.HasValue) - { - _logger.LogWarning("VMware document {DocumentId} missing GridFS payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remaining.Remove(documentId); - _diagnostics.ParseFailure(); - continue; - } - - byte[] bytes; - try - { - bytes = await _rawDocumentStorage.DownloadAsync(document.GridFsId.Value, cancellationToken).ConfigureAwait(false); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed downloading VMware document {DocumentId}", 
document.Id); - throw; - } - - VmwareDetailDto? detail; - try - { - detail = JsonSerializer.Deserialize(bytes, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogWarning(ex, "Failed to deserialize VMware advisory {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remaining.Remove(documentId); - _diagnostics.ParseFailure(); - continue; - } - - if (detail is null || string.IsNullOrWhiteSpace(detail.AdvisoryId)) - { - _logger.LogWarning("VMware advisory document {DocumentId} contained empty payload", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - remaining.Remove(documentId); - _diagnostics.ParseFailure(); - continue; - } - - var sanitized = JsonSerializer.Serialize(detail, SerializerOptions); - var payload = MongoDB.Bson.BsonDocument.Parse(sanitized); - var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "vmware.v1", payload, _timeProvider.GetUtcNow()); - - await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); - - remaining.Remove(documentId); - if (!pendingMappings.Contains(documentId)) - { - pendingMappings.Add(documentId); - } - } - - var updatedCursor = cursor - .WithPendingDocuments(remaining) - .WithPendingMappings(pendingMappings); - - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) - { - ArgumentNullException.ThrowIfNull(services); - - var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); - if (cursor.PendingMappings.Count == 0) - { - return; - } - - var pendingMappings = cursor.PendingMappings.ToList(); - - foreach (var documentId in 
cursor.PendingMappings) - { - cancellationToken.ThrowIfCancellationRequested(); - - var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); - var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); - if (dto is null || document is null) - { - pendingMappings.Remove(documentId); - continue; - } - - var json = dto.Payload.ToJson(new JsonWriterSettings - { - OutputMode = JsonOutputMode.RelaxedExtendedJson, - }); - - VmwareDetailDto? detail; - try - { - detail = JsonSerializer.Deserialize(json, SerializerOptions); - } - catch (Exception ex) - { - _logger.LogError(ex, "Failed to deserialize VMware DTO for document {DocumentId}", document.Id); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - if (detail is null || string.IsNullOrWhiteSpace(detail.AdvisoryId)) - { - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); - pendingMappings.Remove(documentId); - continue; - } - - var (advisory, flag) = VmwareMapper.Map(detail, document, dto); - await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); - await _psirtFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); - await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); - _diagnostics.MapAffectedCount(advisory.AffectedPackages.Length); - _logger.LogInformation( - "VMware advisory {AdvisoryId} mapped with {AffectedCount} affected packages", - detail.AdvisoryId, - advisory.AffectedPackages.Length); - - pendingMappings.Remove(documentId); - } - - var updatedCursor = cursor.WithPendingMappings(pendingMappings); - await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); - } - - private async Task> 
FetchIndexAsync(CancellationToken cancellationToken) - { - var client = _httpClientFactory.CreateClient(VmwareOptions.HttpClientName); - using var response = await client.GetAsync(_options.IndexUri, cancellationToken).ConfigureAwait(false); - response.EnsureSuccessStatusCode(); - - await using var stream = await response.Content.ReadAsStreamAsync(cancellationToken).ConfigureAwait(false); - var items = await JsonSerializer.DeserializeAsync>(stream, SerializerOptions, cancellationToken).ConfigureAwait(false); - return items ?? Array.Empty(); - } - - private async Task GetCursorAsync(CancellationToken cancellationToken) - { - var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); - return state is null ? VmwareCursor.Empty : VmwareCursor.FromBson(state.Cursor); - } - - private async Task UpdateCursorAsync(VmwareCursor cursor, CancellationToken cancellationToken) - { - var document = cursor.ToBsonDocument(); - await _stateRepository.UpdateCursorAsync(SourceName, document, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); - } -} +using System; +using System.Collections.Generic; +using System.Linq; +using System.Net.Http; +using System.Text.Json; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using MongoDB.Bson.IO; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Vndr.Vmware.Configuration; +using StellaOps.Concelier.Connector.Vndr.Vmware.Internal; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Concelier.Storage.Mongo.PsirtFlags; +using StellaOps.Plugin; + +namespace StellaOps.Concelier.Connector.Vndr.Vmware; + +public sealed 
class VmwareConnector : IFeedConnector +{ + private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web) + { + PropertyNameCaseInsensitive = true, + DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull, + }; + + private readonly IHttpClientFactory _httpClientFactory; + private readonly SourceFetchService _fetchService; + private readonly RawDocumentStorage _rawDocumentStorage; + private readonly IDocumentStore _documentStore; + private readonly IDtoStore _dtoStore; + private readonly IAdvisoryStore _advisoryStore; + private readonly ISourceStateRepository _stateRepository; + private readonly IPsirtFlagStore _psirtFlagStore; + private readonly VmwareOptions _options; + private readonly TimeProvider _timeProvider; + private readonly VmwareDiagnostics _diagnostics; + private readonly ILogger _logger; + + public VmwareConnector( + IHttpClientFactory httpClientFactory, + SourceFetchService fetchService, + RawDocumentStorage rawDocumentStorage, + IDocumentStore documentStore, + IDtoStore dtoStore, + IAdvisoryStore advisoryStore, + ISourceStateRepository stateRepository, + IPsirtFlagStore psirtFlagStore, + IOptions options, + TimeProvider? timeProvider, + VmwareDiagnostics diagnostics, + ILogger logger) + { + _httpClientFactory = httpClientFactory ?? throw new ArgumentNullException(nameof(httpClientFactory)); + _fetchService = fetchService ?? throw new ArgumentNullException(nameof(fetchService)); + _rawDocumentStorage = rawDocumentStorage ?? throw new ArgumentNullException(nameof(rawDocumentStorage)); + _documentStore = documentStore ?? throw new ArgumentNullException(nameof(documentStore)); + _dtoStore = dtoStore ?? throw new ArgumentNullException(nameof(dtoStore)); + _advisoryStore = advisoryStore ?? throw new ArgumentNullException(nameof(advisoryStore)); + _stateRepository = stateRepository ?? throw new ArgumentNullException(nameof(stateRepository)); + _psirtFlagStore = psirtFlagStore ?? 
throw new ArgumentNullException(nameof(psirtFlagStore)); + _options = (options ?? throw new ArgumentNullException(nameof(options))).Value ?? throw new ArgumentNullException(nameof(options)); + _options.Validate(); + _timeProvider = timeProvider ?? TimeProvider.System; + _diagnostics = diagnostics ?? throw new ArgumentNullException(nameof(diagnostics)); + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } + + public string SourceName => VmwareConnectorPlugin.SourceName; + + public async Task FetchAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + var now = _timeProvider.GetUtcNow(); + var pendingDocuments = cursor.PendingDocuments.ToHashSet(); + var pendingMappings = cursor.PendingMappings.ToHashSet(); + var fetchCache = new Dictionary(cursor.FetchCache, StringComparer.OrdinalIgnoreCase); + var touchedResources = new HashSet(StringComparer.OrdinalIgnoreCase); + var remainingCapacity = _options.MaxAdvisoriesPerFetch; + + IReadOnlyList indexItems; + try + { + indexItems = await FetchIndexAsync(cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "Failed to retrieve VMware advisory index"); + await _stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(10), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (indexItems.Count == 0) + { + return; + } + + var orderedItems = indexItems + .Where(static item => !string.IsNullOrWhiteSpace(item.Id) && !string.IsNullOrWhiteSpace(item.DetailUrl)) + .OrderBy(static item => item.Modified ?? DateTimeOffset.MinValue) + .ThenBy(static item => item.Id, StringComparer.OrdinalIgnoreCase) + .ToArray(); + + var baseline = cursor.LastModified ?? 
now - _options.InitialBackfill; + var resumeStart = baseline - _options.ModifiedTolerance; + ProvenanceDiagnostics.ReportResumeWindow(SourceName, resumeStart, _logger); + var processedIds = new HashSet(cursor.ProcessedIds, StringComparer.OrdinalIgnoreCase); + var maxModified = cursor.LastModified ?? DateTimeOffset.MinValue; + var processedUpdated = false; + + foreach (var item in orderedItems) + { + if (remainingCapacity <= 0) + { + break; + } + + cancellationToken.ThrowIfCancellationRequested(); + + var modified = (item.Modified ?? DateTimeOffset.MinValue).ToUniversalTime(); + if (modified < baseline - _options.ModifiedTolerance) + { + continue; + } + + if (cursor.LastModified.HasValue && modified < cursor.LastModified.Value - _options.ModifiedTolerance) + { + continue; + } + + if (modified == cursor.LastModified && cursor.ProcessedIds.Contains(item.Id, StringComparer.OrdinalIgnoreCase)) + { + continue; + } + + if (!Uri.TryCreate(item.DetailUrl, UriKind.Absolute, out var detailUri)) + { + _logger.LogWarning("VMware advisory {AdvisoryId} has invalid detail URL {Url}", item.Id, item.DetailUrl); + continue; + } + + var cacheKey = detailUri.AbsoluteUri; + touchedResources.Add(cacheKey); + + var existing = await _documentStore.FindBySourceAndUriAsync(SourceName, cacheKey, cancellationToken).ConfigureAwait(false); + var metadata = new Dictionary(StringComparer.Ordinal) + { + ["vmware.id"] = item.Id, + ["vmware.modified"] = modified.ToString("O"), + }; + + SourceFetchResult result; + try + { + result = await _fetchService.FetchAsync( + new SourceFetchRequest(VmwareOptions.HttpClientName, SourceName, detailUri) + { + Metadata = metadata, + ETag = existing?.Etag, + LastModified = existing?.LastModified, + AcceptHeaders = new[] { "application/json" }, + }, + cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _diagnostics.FetchFailure(); + _logger.LogError(ex, "Failed to fetch VMware advisory {AdvisoryId}", item.Id); + await 
_stateRepository.MarkFailureAsync(SourceName, now, TimeSpan.FromMinutes(5), ex.Message, cancellationToken).ConfigureAwait(false); + throw; + } + + if (result.IsNotModified) + { + _diagnostics.FetchUnchanged(); + if (existing is not null) + { + fetchCache[cacheKey] = VmwareFetchCacheEntry.FromDocument(existing); + pendingDocuments.Remove(existing.Id); + pendingMappings.Remove(existing.Id); + _logger.LogInformation("VMware advisory {AdvisoryId} returned 304 Not Modified", item.Id); + } + + continue; + } + + if (!result.IsSuccess || result.Document is null) + { + _diagnostics.FetchFailure(); + continue; + } + + remainingCapacity--; + + if (modified > maxModified) + { + maxModified = modified; + processedIds.Clear(); + processedUpdated = true; + } + + if (modified == maxModified) + { + processedIds.Add(item.Id); + processedUpdated = true; + } + + var cacheEntry = VmwareFetchCacheEntry.FromDocument(result.Document); + + if (existing is not null + && string.Equals(existing.Status, DocumentStatuses.Mapped, StringComparison.Ordinal) + && cursor.TryGetFetchCache(cacheKey, out var cachedEntry) + && cachedEntry.Matches(result.Document)) + { + _diagnostics.FetchUnchanged(); + fetchCache[cacheKey] = cacheEntry; + pendingDocuments.Remove(result.Document.Id); + pendingMappings.Remove(result.Document.Id); + await _documentStore.UpdateStatusAsync(result.Document.Id, existing.Status, cancellationToken).ConfigureAwait(false); + _logger.LogInformation("VMware advisory {AdvisoryId} unchanged; skipping reprocessing", item.Id); + continue; + } + + _diagnostics.FetchItem(); + fetchCache[cacheKey] = cacheEntry; + pendingDocuments.Add(result.Document.Id); + _logger.LogInformation( + "VMware advisory {AdvisoryId} fetched (documentId={DocumentId}, sha256={Sha})", + item.Id, + result.Document.Id, + result.Document.Sha256); + + if (_options.RequestDelay > TimeSpan.Zero) + { + try + { + await Task.Delay(_options.RequestDelay, cancellationToken).ConfigureAwait(false); + } + catch 
(TaskCanceledException) + { + break; + } + } + } + + if (fetchCache.Count > 0 && touchedResources.Count > 0) + { + var stale = fetchCache.Keys.Where(key => !touchedResources.Contains(key)).ToArray(); + foreach (var key in stale) + { + fetchCache.Remove(key); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(pendingDocuments) + .WithPendingMappings(pendingMappings) + .WithFetchCache(fetchCache); + + if (processedUpdated) + { + updatedCursor = updatedCursor.WithLastModified(maxModified, processedIds); + } + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task ParseAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingDocuments.Count == 0) + { + return; + } + + var remaining = cursor.PendingDocuments.ToList(); + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingDocuments) + { + cancellationToken.ThrowIfCancellationRequested(); + + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (document is null) + { + remaining.Remove(documentId); + continue; + } + + if (!document.PayloadId.HasValue) + { + _logger.LogWarning("VMware document {DocumentId} missing GridFS payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remaining.Remove(documentId); + _diagnostics.ParseFailure(); + continue; + } + + byte[] bytes; + try + { + bytes = await _rawDocumentStorage.DownloadAsync(document.PayloadId.Value, cancellationToken).ConfigureAwait(false); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed downloading VMware document {DocumentId}", document.Id); + throw; + } + + VmwareDetailDto? 
detail; + try + { + detail = JsonSerializer.Deserialize<VmwareDetailDto>(bytes, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to deserialize VMware advisory {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remaining.Remove(documentId); + _diagnostics.ParseFailure(); + continue; + } + + if (detail is null || string.IsNullOrWhiteSpace(detail.AdvisoryId)) + { + _logger.LogWarning("VMware advisory document {DocumentId} contained empty payload", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + remaining.Remove(documentId); + _diagnostics.ParseFailure(); + continue; + } + + var sanitized = JsonSerializer.Serialize(detail, SerializerOptions); + var payload = MongoDB.Bson.BsonDocument.Parse(sanitized); + var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, SourceName, "vmware.v1", payload, _timeProvider.GetUtcNow()); + + await _dtoStore.UpsertAsync(dtoRecord, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.PendingMap, cancellationToken).ConfigureAwait(false); + + remaining.Remove(documentId); + if (!pendingMappings.Contains(documentId)) + { + pendingMappings.Add(documentId); + } + } + + var updatedCursor = cursor + .WithPendingDocuments(remaining) + .WithPendingMappings(pendingMappings); + + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + public async Task MapAsync(IServiceProvider services, CancellationToken cancellationToken) + { + ArgumentNullException.ThrowIfNull(services); + + var cursor = await GetCursorAsync(cancellationToken).ConfigureAwait(false); + if (cursor.PendingMappings.Count == 0) + { + return; + } + + var pendingMappings = cursor.PendingMappings.ToList(); + + foreach (var documentId in cursor.PendingMappings) + { + 
cancellationToken.ThrowIfCancellationRequested(); + + var dto = await _dtoStore.FindByDocumentIdAsync(documentId, cancellationToken).ConfigureAwait(false); + var document = await _documentStore.FindAsync(documentId, cancellationToken).ConfigureAwait(false); + if (dto is null || document is null) + { + pendingMappings.Remove(documentId); + continue; + } + + var json = dto.Payload.ToJson(new JsonWriterSettings + { + OutputMode = JsonOutputMode.RelaxedExtendedJson, + }); + + VmwareDetailDto? detail; + try + { + detail = JsonSerializer.Deserialize<VmwareDetailDto>(json, SerializerOptions); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deserialize VMware DTO for document {DocumentId}", document.Id); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + if (detail is null || string.IsNullOrWhiteSpace(detail.AdvisoryId)) + { + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Failed, cancellationToken).ConfigureAwait(false); + pendingMappings.Remove(documentId); + continue; + } + + var (advisory, flag) = VmwareMapper.Map(detail, document, dto); + await _advisoryStore.UpsertAsync(advisory, cancellationToken).ConfigureAwait(false); + await _psirtFlagStore.UpsertAsync(flag, cancellationToken).ConfigureAwait(false); + await _documentStore.UpdateStatusAsync(document.Id, DocumentStatuses.Mapped, cancellationToken).ConfigureAwait(false); + _diagnostics.MapAffectedCount(advisory.AffectedPackages.Length); + _logger.LogInformation( + "VMware advisory {AdvisoryId} mapped with {AffectedCount} affected packages", + detail.AdvisoryId, + advisory.AffectedPackages.Length); + + pendingMappings.Remove(documentId); + } + + var updatedCursor = cursor.WithPendingMappings(pendingMappings); + await UpdateCursorAsync(updatedCursor, cancellationToken).ConfigureAwait(false); + } + + private async Task> FetchIndexAsync(CancellationToken cancellationToken) 
+ { + var client = _httpClientFactory.CreateClient(VmwareOptions.HttpClientName); + using var response = await client.GetAsync(_options.IndexUri, cancellationToken).ConfigureAwait(false); + response.EnsureSuccessStatusCode(); + + await using var stream = await response.Content.ReadAsStreamAsync(cancellationToken).ConfigureAwait(false); + var items = await JsonSerializer.DeserializeAsync>(stream, SerializerOptions, cancellationToken).ConfigureAwait(false); + return items ?? Array.Empty(); + } + + private async Task GetCursorAsync(CancellationToken cancellationToken) + { + var state = await _stateRepository.TryGetAsync(SourceName, cancellationToken).ConfigureAwait(false); + return state is null ? VmwareCursor.Empty : VmwareCursor.FromBson(state.Cursor); + } + + private async Task UpdateCursorAsync(VmwareCursor cursor, CancellationToken cancellationToken) + { + var document = cursor.ToBsonDocument(); + await _stateRepository.UpdateCursorAsync(SourceName, document, _timeProvider.GetUtcNow(), cancellationToken).ConfigureAwait(false); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Exporter.Json/StellaOps.Concelier.Exporter.Json.csproj b/src/Concelier/__Libraries/StellaOps.Concelier.Exporter.Json/StellaOps.Concelier.Exporter.Json.csproj index 3a3ba7329..c9539dde1 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Exporter.Json/StellaOps.Concelier.Exporter.Json.csproj +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Exporter.Json/StellaOps.Concelier.Exporter.Json.csproj @@ -10,6 +10,7 @@ + @@ -20,4 +21,4 @@ - \ No newline at end of file + diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Exporter.TrivyDb/StellaOps.Concelier.Exporter.TrivyDb.csproj b/src/Concelier/__Libraries/StellaOps.Concelier.Exporter.TrivyDb/StellaOps.Concelier.Exporter.TrivyDb.csproj index 5053b7217..ca108bb6a 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Exporter.TrivyDb/StellaOps.Concelier.Exporter.TrivyDb.csproj +++ 
b/src/Concelier/__Libraries/StellaOps.Concelier.Exporter.TrivyDb/StellaOps.Concelier.Exporter.TrivyDb.csproj @@ -10,6 +10,7 @@ + @@ -18,4 +19,4 @@ - \ No newline at end of file + diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Models/MongoCompat/Bson.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Models/MongoCompat/Bson.cs index 03f922166..04f094c9f 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Models/MongoCompat/Bson.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Models/MongoCompat/Bson.cs @@ -24,6 +24,8 @@ namespace MongoDB.Bson { protected readonly object? _value; public BsonValue(object? value) => _value = value; + internal object? RawValue => _value; + public static BsonValue Create(object? value) => BsonDocument.WrapExternal(value); public virtual BsonType BsonType => _value switch { null => BsonType.Null, @@ -59,12 +61,24 @@ namespace MongoDB.Bson public class BsonInt64 : BsonValue { public BsonInt64(long value) : base(value) { } } public class BsonDouble : BsonValue { public BsonDouble(double value) : base(value) { } } public class BsonDateTime : BsonValue { public BsonDateTime(DateTime value) : base(value) { } } + public class BsonNull : BsonValue + { + private BsonNull() : base(null) { } + public static BsonNull Value { get; } = new(); + } public class BsonArray : BsonValue, IEnumerable { private readonly List _items = new(); public BsonArray() : base(null) { } public BsonArray(IEnumerable values) : this() => _items.AddRange(values); + public BsonArray(IEnumerable values) : this() + { + foreach (var value in values) + { + _items.Add(BsonDocument.WrapExternal(value)); + } + } public void Add(BsonValue value) => _items.Add(value); public IEnumerator GetEnumerator() => _items.GetEnumerator(); IEnumerator IEnumerable.GetEnumerator() => GetEnumerator(); @@ -93,6 +107,8 @@ namespace MongoDB.Bson _ => new BsonValue(value) }; + internal static BsonValue WrapExternal(object? 
value) => Wrap(value); + public BsonValue this[string key] { get => _values[key]; @@ -104,6 +120,7 @@ namespace MongoDB.Bson public bool TryGetValue(string key, out BsonValue value) => _values.TryGetValue(key, out value!); public void Add(string key, BsonValue value) => _values[key] = value; + public void Add(string key, object? value) => _values[key] = Wrap(value); public IEnumerator> GetEnumerator() => _values.GetEnumerator(); IEnumerator IEnumerable.GetEnumerator() => GetEnumerator(); @@ -156,7 +173,7 @@ namespace MongoDB.Bson { BsonDocument doc => doc._values.ToDictionary(kvp => kvp.Key, kvp => Unwrap(kvp.Value)), BsonArray array => array.Select(Unwrap).ToArray(), - _ => value._value + _ => value.RawValue }; } } diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Models/MongoCompat/StorageStubs.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Models/MongoCompat/StorageStubs.cs index 6cf5cf0c7..f0ca7338b 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Models/MongoCompat/StorageStubs.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Models/MongoCompat/StorageStubs.cs @@ -1,4 +1,5 @@ using System.Collections.Concurrent; +using System.IO; using StellaOps.Concelier.Models; namespace StellaOps.Concelier.Storage.Mongo @@ -33,8 +34,9 @@ namespace StellaOps.Concelier.Storage.Mongo IReadOnlyDictionary? Metadata = null, string? Etag = null, DateTimeOffset? LastModified = null, - MongoDB.Bson.ObjectId? GridFsId = null, - DateTimeOffset? ExpiresAt = null); + Guid? PayloadId = null, + DateTimeOffset? ExpiresAt = null, + byte[]? 
Payload = null); public interface IDocumentStore { @@ -85,7 +87,7 @@ namespace StellaOps.Concelier.Storage.Mongo Guid DocumentId, string SourceName, string Format, - MongoDB.Bson.BsonDocument Payload, + string Payload, DateTimeOffset CreatedAt); public interface IDtoStore @@ -113,40 +115,40 @@ namespace StellaOps.Concelier.Storage.Mongo public sealed class RawDocumentStorage { - private readonly ConcurrentDictionary _blobs = new(); + private readonly ConcurrentDictionary _blobs = new(); - public Task UploadAsync(string sourceName, string uri, byte[] content, string? contentType, DateTimeOffset? expiresAt, CancellationToken cancellationToken) + public Task UploadAsync(string sourceName, string uri, byte[] content, string? contentType, DateTimeOffset? expiresAt, CancellationToken cancellationToken) { - var id = MongoDB.Bson.ObjectId.GenerateNewId(); + var id = Guid.NewGuid(); _blobs[id] = content.ToArray(); return Task.FromResult(id); } - public Task UploadAsync(string sourceName, string uri, byte[] content, string? contentType, CancellationToken cancellationToken) + public Task UploadAsync(string sourceName, string uri, byte[] content, string? contentType, CancellationToken cancellationToken) => UploadAsync(sourceName, uri, content, contentType, null, cancellationToken); - public Task DownloadAsync(MongoDB.Bson.ObjectId id, CancellationToken cancellationToken) + public Task DownloadAsync(Guid id, CancellationToken cancellationToken) { if (_blobs.TryGetValue(id, out var bytes)) { return Task.FromResult(bytes); } - throw new MongoDB.Driver.GridFSFileNotFoundException($"Blob {id} not found."); + throw new FileNotFoundException($"Blob {id} not found."); } - public Task DeleteAsync(MongoDB.Bson.ObjectId id, CancellationToken cancellationToken) + public Task DeleteAsync(Guid id, CancellationToken cancellationToken) { _blobs.TryRemove(id, out _); return Task.CompletedTask; } } - public sealed record SourceStateRecord(string SourceName, MongoDB.Bson.BsonDocument? 
Cursor, DateTimeOffset UpdatedAt); + public sealed record SourceStateRecord(string SourceName, string? CursorJson, DateTimeOffset UpdatedAt); public interface ISourceStateRepository { Task TryGetAsync(string sourceName, CancellationToken cancellationToken); - Task UpdateCursorAsync(string sourceName, MongoDB.Bson.BsonDocument cursor, DateTimeOffset completedAt, CancellationToken cancellationToken); + Task UpdateCursorAsync(string sourceName, string cursorJson, DateTimeOffset completedAt, CancellationToken cancellationToken); Task MarkFailureAsync(string sourceName, DateTimeOffset now, TimeSpan backoff, string reason, CancellationToken cancellationToken); } @@ -160,9 +162,9 @@ namespace StellaOps.Concelier.Storage.Mongo return Task.FromResult(record); } - public Task UpdateCursorAsync(string sourceName, MongoDB.Bson.BsonDocument cursor, DateTimeOffset completedAt, CancellationToken cancellationToken) + public Task UpdateCursorAsync(string sourceName, string cursorJson, DateTimeOffset completedAt, CancellationToken cancellationToken) { - _states[sourceName] = new SourceStateRecord(sourceName, cursor.DeepClone(), completedAt); + _states[sourceName] = new SourceStateRecord(sourceName, cursorJson, completedAt); return Task.CompletedTask; } @@ -174,6 +176,53 @@ namespace StellaOps.Concelier.Storage.Mongo } } +namespace StellaOps.Concelier.Storage.Mongo.Advisories +{ + public interface IAdvisoryStore + { + Task UpsertAsync(Advisory advisory, CancellationToken cancellationToken); + Task FindAsync(string advisoryKey, CancellationToken cancellationToken); + Task> GetRecentAsync(int limit, CancellationToken cancellationToken); + IAsyncEnumerable StreamAsync(CancellationToken cancellationToken); + } + + public sealed class InMemoryAdvisoryStore : IAdvisoryStore + { + private readonly ConcurrentDictionary _advisories = new(StringComparer.OrdinalIgnoreCase); + + public Task UpsertAsync(Advisory advisory, CancellationToken cancellationToken) + { + 
_advisories[advisory.AdvisoryKey] = advisory; + return Task.CompletedTask; + } + + public Task FindAsync(string advisoryKey, CancellationToken cancellationToken) + { + _advisories.TryGetValue(advisoryKey, out var advisory); + return Task.FromResult(advisory); + } + + public Task> GetRecentAsync(int limit, CancellationToken cancellationToken) + { + var result = _advisories.Values + .OrderByDescending(a => a.Modified ?? a.Published ?? DateTimeOffset.MinValue) + .Take(limit) + .ToArray(); + return Task.FromResult>(result); + } + + public async IAsyncEnumerable StreamAsync([System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken cancellationToken) + { + foreach (var advisory in _advisories.Values.OrderBy(a => a.AdvisoryKey, StringComparer.OrdinalIgnoreCase)) + { + cancellationToken.ThrowIfCancellationRequested(); + yield return advisory; + await Task.Yield(); + } + } + } +} + namespace StellaOps.Concelier.Storage.Mongo.Aliases { public sealed record AliasRecord(string AdvisoryKey, string Scheme, string Value); @@ -192,13 +241,13 @@ namespace StellaOps.Concelier.Storage.Mongo.Aliases public Task> GetByAdvisoryAsync(string advisoryKey, CancellationToken cancellationToken) { _byAdvisory.TryGetValue(advisoryKey, out var records); - return Task.FromResult>(records ?? Array.Empty()); + return Task.FromResult>(records ?? (IReadOnlyList)Array.Empty()); } public Task> GetByAliasAsync(string scheme, string value, CancellationToken cancellationToken) { _byAlias.TryGetValue((scheme, value), out var records); - return Task.FromResult>(records ?? Array.Empty()); + return Task.FromResult>(records ?? (IReadOnlyList)Array.Empty()); } } } @@ -286,10 +335,10 @@ namespace StellaOps.Concelier.Storage.Mongo.Exporting id, cursor ?? digest, digest, - lastDeltaDigest: null, - baseExportId: resetBaseline ? exportId : null, - baseDigest: resetBaseline ? digest : null, - targetRepository, + LastDeltaDigest: null, + BaseExportId: resetBaseline ? 
exportId : null, + BaseDigest: resetBaseline ? digest : null, + TargetRepository: targetRepository, manifest, exporterVersion, _timeProvider.GetUtcNow()); @@ -307,11 +356,11 @@ namespace StellaOps.Concelier.Storage.Mongo.Exporting var record = new ExportStateRecord( id, cursor ?? deltaDigest, - lastFullDigest: null, - lastDeltaDigest: deltaDigest, - baseExportId: null, - baseDigest: null, - targetRepository: null, + LastFullDigest: null, + LastDeltaDigest: deltaDigest, + BaseExportId: null, + BaseDigest: null, + TargetRepository: null, manifest, exporterVersion, _timeProvider.GetUtcNow()); diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/DocumentStore.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/DocumentStore.cs new file mode 100644 index 000000000..bf6fd8cbe --- /dev/null +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/DocumentStore.cs @@ -0,0 +1,88 @@ +using System.Text.Json; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Postgres.Models; +using StellaOps.Concelier.Storage.Postgres.Repositories; + +namespace StellaOps.Concelier.Storage.Postgres; + +/// +/// Postgres-backed implementation that satisfies the legacy IDocumentStore contract. +/// +public sealed class PostgresDocumentStore : IDocumentStore +{ + private readonly IDocumentRepository _repository; + private readonly ISourceRepository _sourceRepository; + private readonly JsonSerializerOptions _json = new(JsonSerializerDefaults.Web); + + public PostgresDocumentStore(IDocumentRepository repository, ISourceRepository sourceRepository) + { + _repository = repository ?? throw new ArgumentNullException(nameof(repository)); + _sourceRepository = sourceRepository ?? throw new ArgumentNullException(nameof(sourceRepository)); + } + + public async Task FindAsync(Guid id, CancellationToken cancellationToken, MongoDB.Driver.IClientSessionHandle? 
session = null) + { + var row = await _repository.FindAsync(id, cancellationToken).ConfigureAwait(false); + return row is null ? null : Map(row); + } + + public async Task FindBySourceAndUriAsync(string sourceName, string uri, CancellationToken cancellationToken, MongoDB.Driver.IClientSessionHandle? session = null) + { + var row = await _repository.FindBySourceAndUriAsync(sourceName, uri, cancellationToken).ConfigureAwait(false); + return row is null ? null : Map(row); + } + + public async Task UpsertAsync(DocumentRecord record, CancellationToken cancellationToken, MongoDB.Driver.IClientSessionHandle? session = null) + { + // Ensure source exists + var source = await _sourceRepository.GetByNameAsync(record.SourceName, cancellationToken).ConfigureAwait(false) + ?? throw new InvalidOperationException($"Source '{record.SourceName}' not provisioned."); + + var entity = new DocumentRecordEntity( + Id: record.Id == Guid.Empty ? Guid.NewGuid() : record.Id, + SourceId: source.Id, + SourceName: record.SourceName, + Uri: record.Uri, + Sha256: record.Sha256, + Status: record.Status, + ContentType: record.ContentType, + HeadersJson: record.Headers is null ? null : JsonSerializer.Serialize(record.Headers, _json), + MetadataJson: record.Metadata is null ? null : JsonSerializer.Serialize(record.Metadata, _json), + Etag: record.Etag, + LastModified: record.LastModified, + Payload: Array.Empty(), // payload handled via RawDocumentStorage; keep pointer zero-length here + CreatedAt: record.CreatedAt, + UpdatedAt: DateTimeOffset.UtcNow, + ExpiresAt: record.ExpiresAt); + + var saved = await _repository.UpsertAsync(entity, cancellationToken).ConfigureAwait(false); + return Map(saved); + } + + public async Task UpdateStatusAsync(Guid id, string status, CancellationToken cancellationToken, MongoDB.Driver.IClientSessionHandle? 
session = null) + { + await _repository.UpdateStatusAsync(id, status, cancellationToken).ConfigureAwait(false); + } + + private DocumentRecord Map(DocumentRecordEntity row) + { + return new DocumentRecord( + row.Id, + row.SourceName, + row.Uri, + row.CreatedAt, + row.Sha256, + row.Status, + row.ContentType, + row.HeadersJson is null + ? null + : JsonSerializer.Deserialize>(row.HeadersJson, _json), + row.MetadataJson is null + ? null + : JsonSerializer.Deserialize>(row.MetadataJson, _json), + row.Etag, + row.LastModified, + PayloadId: null, + ExpiresAt: row.ExpiresAt); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/Migrations/004_documents.sql b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/Migrations/004_documents.sql new file mode 100644 index 000000000..339b378f8 --- /dev/null +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/Migrations/004_documents.sql @@ -0,0 +1,23 @@ +-- Concelier Postgres Migration 004: Source documents and payload storage (Mongo replacement) + +CREATE TABLE IF NOT EXISTS concelier.source_documents ( + id UUID NOT NULL, + source_id UUID NOT NULL, + source_name TEXT NOT NULL, + uri TEXT NOT NULL, + sha256 TEXT NOT NULL, + status TEXT NOT NULL, + content_type TEXT, + headers_json JSONB, + metadata_json JSONB, + etag TEXT, + last_modified TIMESTAMPTZ, + payload BYTEA NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + expires_at TIMESTAMPTZ, + CONSTRAINT pk_source_documents PRIMARY KEY (source_name, uri) +); + +CREATE INDEX IF NOT EXISTS idx_source_documents_source_id ON concelier.source_documents(source_id); +CREATE INDEX IF NOT EXISTS idx_source_documents_status ON concelier.source_documents(status); diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/Models/DocumentRecordEntity.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/Models/DocumentRecordEntity.cs new file mode 
100644 index 000000000..db9e490c7 --- /dev/null +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/Models/DocumentRecordEntity.cs @@ -0,0 +1,18 @@ +namespace StellaOps.Concelier.Storage.Postgres.Models; + +public sealed record DocumentRecordEntity( + Guid Id, + Guid SourceId, + string SourceName, + string Uri, + string Sha256, + string Status, + string? ContentType, + string? HeadersJson, + string? MetadataJson, + string? Etag, + DateTimeOffset? LastModified, + byte[] Payload, + DateTimeOffset CreatedAt, + DateTimeOffset UpdatedAt, + DateTimeOffset? ExpiresAt); diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/Repositories/DocumentRepository.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/Repositories/DocumentRepository.cs new file mode 100644 index 000000000..720adcad9 --- /dev/null +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/Repositories/DocumentRepository.cs @@ -0,0 +1,125 @@ +using System.Text.Json; +using Dapper; +using StellaOps.Concelier.Storage.Postgres.Models; +using StellaOps.Infrastructure.Postgres; +using StellaOps.Infrastructure.Postgres.Connections; + +namespace StellaOps.Concelier.Storage.Postgres.Repositories; + +public interface IDocumentRepository +{ + Task FindAsync(Guid id, CancellationToken cancellationToken); + Task FindBySourceAndUriAsync(string sourceName, string uri, CancellationToken cancellationToken); + Task UpsertAsync(DocumentRecordEntity record, CancellationToken cancellationToken); + Task UpdateStatusAsync(Guid id, string status, CancellationToken cancellationToken); +} + +public sealed class DocumentRepository : RepositoryBase, IDocumentRepository +{ + private readonly JsonSerializerOptions _json = new(JsonSerializerDefaults.Web); + + public DocumentRepository(ConcelierDataSource dataSource, ILogger logger) + : base(dataSource, logger) + { + } + + public async Task FindAsync(Guid id, CancellationToken cancellationToken) + { + const string sql = 
""" +SELECT * FROM concelier.source_documents +WHERE id = @Id +LIMIT 1; +"""; + + await using var conn = await DataSource.OpenSystemConnectionAsync(cancellationToken); + var row = await conn.QuerySingleOrDefaultAsync(sql, new { Id = id }); + return row is null ? null : Map(row); + } + + public async Task FindBySourceAndUriAsync(string sourceName, string uri, CancellationToken cancellationToken) + { + const string sql = """ +SELECT * FROM concelier.source_documents +WHERE source_name = @SourceName AND uri = @Uri +LIMIT 1; +"""; + await using var conn = await DataSource.OpenSystemConnectionAsync(cancellationToken); + var row = await conn.QuerySingleOrDefaultAsync(sql, new { SourceName = sourceName, Uri = uri }); + return row is null ? null : Map(row); + } + + public async Task UpsertAsync(DocumentRecordEntity record, CancellationToken cancellationToken) + { + const string sql = """ +INSERT INTO concelier.source_documents ( + id, source_id, source_name, uri, sha256, status, content_type, + headers_json, metadata_json, etag, last_modified, payload, created_at, updated_at, expires_at) +VALUES ( + @Id, @SourceId, @SourceName, @Uri, @Sha256, @Status, @ContentType, + @HeadersJson, @MetadataJson, @Etag, @LastModified, @Payload, @CreatedAt, @UpdatedAt, @ExpiresAt) +ON CONFLICT (source_name, uri) DO UPDATE SET + sha256 = EXCLUDED.sha256, + status = EXCLUDED.status, + content_type = EXCLUDED.content_type, + headers_json = EXCLUDED.headers_json, + metadata_json = EXCLUDED.metadata_json, + etag = EXCLUDED.etag, + last_modified = EXCLUDED.last_modified, + payload = EXCLUDED.payload, + updated_at = EXCLUDED.updated_at, + expires_at = EXCLUDED.expires_at +RETURNING *; +"""; + await using var conn = await DataSource.OpenSystemConnectionAsync(cancellationToken); + var row = await conn.QuerySingleAsync(sql, new + { + record.Id, + record.SourceId, + record.SourceName, + record.Uri, + record.Sha256, + record.Status, + record.ContentType, + record.HeadersJson, + record.MetadataJson, + 
record.Etag, + record.LastModified, + record.Payload, + record.CreatedAt, + record.UpdatedAt, + record.ExpiresAt + }); + return Map(row); + } + + public async Task UpdateStatusAsync(Guid id, string status, CancellationToken cancellationToken) + { + const string sql = """ +UPDATE concelier.source_documents +SET status = @Status, updated_at = NOW() +WHERE id = @Id; +"""; + await using var conn = await DataSource.OpenSystemConnectionAsync(cancellationToken); + await conn.ExecuteAsync(sql, new { Id = id, Status = status }); + } + + private DocumentRecordEntity Map(dynamic row) + { + return new DocumentRecordEntity( + row.id, + row.source_id, + row.source_name, + row.uri, + row.sha256, + row.status, + (string?)row.content_type, + (string?)row.headers_json, + (string?)row.metadata_json, + (string?)row.etag, + (DateTimeOffset?)row.last_modified, + (byte[])row.payload, + DateTime.SpecifyKind(row.created_at, DateTimeKind.Utc), + DateTime.SpecifyKind(row.updated_at, DateTimeKind.Utc), + row.expires_at is null ? 
null : DateTime.SpecifyKind(row.expires_at, DateTimeKind.Utc)); + } +} diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/ServiceCollectionExtensions.cs b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/ServiceCollectionExtensions.cs index 33970a504..232f34598 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/ServiceCollectionExtensions.cs +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/ServiceCollectionExtensions.cs @@ -4,6 +4,7 @@ using StellaOps.Concelier.Storage.Postgres.Repositories; using StellaOps.Infrastructure.Postgres; using StellaOps.Infrastructure.Postgres.Options; using StellaOps.Concelier.Core.Linksets; +using StellaOps.Concelier.Storage.Mongo; namespace StellaOps.Concelier.Storage.Postgres; @@ -38,11 +39,13 @@ public static class ServiceCollectionExtensions services.AddScoped(); services.AddScoped(); services.AddScoped(); + services.AddScoped(); services.AddScoped(); services.AddScoped(); services.AddScoped(); services.AddScoped(); services.AddScoped(sp => sp.GetRequiredService()); + services.AddScoped(); return services; } @@ -71,11 +74,13 @@ public static class ServiceCollectionExtensions services.AddScoped(); services.AddScoped(); services.AddScoped(); + services.AddScoped(); services.AddScoped(); services.AddScoped(); services.AddScoped(); services.AddScoped(); services.AddScoped(sp => sp.GetRequiredService()); + services.AddScoped(); return services; } diff --git a/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/StellaOps.Concelier.Storage.Postgres.csproj b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/StellaOps.Concelier.Storage.Postgres.csproj index c3333764b..dfdf3ae61 100644 --- a/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/StellaOps.Concelier.Storage.Postgres.csproj +++ b/src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres/StellaOps.Concelier.Storage.Postgres.csproj @@ -10,6 +10,11 @@ 
StellaOps.Concelier.Storage.Postgres + + + + + @@ -25,6 +30,7 @@ + diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/Internal/CccsMapperTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/Internal/CccsMapperTests.cs index 03413e7a5..8d0d5e50b 100644 --- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/Internal/CccsMapperTests.cs +++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/Internal/CccsMapperTests.cs @@ -6,36 +6,36 @@ using StellaOps.Concelier.Connector.Common.Html; using StellaOps.Concelier.Models; using StellaOps.Concelier.Storage.Mongo.Documents; using Xunit; - -namespace StellaOps.Concelier.Connector.Cccs.Tests.Internal; - -public sealed class CccsMapperTests -{ - [Fact] - public void Map_CreatesCanonicalAdvisory() - { - var raw = CccsHtmlParserTests.LoadFixture("cccs-raw-advisory.json"); - var dto = new CccsHtmlParser(new HtmlContentSanitizer()).Parse(raw); - var document = new DocumentRecord( - Guid.NewGuid(), - CccsConnectorPlugin.SourceName, - dto.CanonicalUrl, - DateTimeOffset.UtcNow, - "sha-test", - DocumentStatuses.PendingMap, - "application/json", - Headers: null, - Metadata: null, - Etag: null, - LastModified: dto.Modified, - GridFsId: null); - - var recordedAt = DateTimeOffset.Parse("2025-08-12T00:00:00Z"); - var advisory = CccsMapper.Map(dto, document, recordedAt); - - advisory.AdvisoryKey.Should().Be("TEST-001"); - advisory.Title.Should().Be(dto.Title); - advisory.Aliases.Should().Contain(new[] { "TEST-001", "CVE-2020-1234", "CVE-2021-9999" }); + +namespace StellaOps.Concelier.Connector.Cccs.Tests.Internal; + +public sealed class CccsMapperTests +{ + [Fact] + public void Map_CreatesCanonicalAdvisory() + { + var raw = CccsHtmlParserTests.LoadFixture("cccs-raw-advisory.json"); + var dto = new CccsHtmlParser(new HtmlContentSanitizer()).Parse(raw); + var document = new DocumentRecord( + Guid.NewGuid(), + CccsConnectorPlugin.SourceName, + dto.CanonicalUrl, + 
DateTimeOffset.UtcNow, + "sha-test", + DocumentStatuses.PendingMap, + "application/json", + Headers: null, + Metadata: null, + Etag: null, + LastModified: dto.Modified, + PayloadId: null); + + var recordedAt = DateTimeOffset.Parse("2025-08-12T00:00:00Z"); + var advisory = CccsMapper.Map(dto, document, recordedAt); + + advisory.AdvisoryKey.Should().Be("TEST-001"); + advisory.Title.Should().Be(dto.Title); + advisory.Aliases.Should().Contain(new[] { "TEST-001", "CVE-2020-1234", "CVE-2021-9999" }); advisory.References.Should().Contain(reference => reference.Url == dto.CanonicalUrl && reference.Kind == "details"); advisory.References.Should().Contain(reference => reference.Url == "https://example.com/details"); advisory.AffectedPackages.Should().HaveCount(2); diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.CertCc.Tests/Internal/CertCcMapperTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.CertCc.Tests/Internal/CertCcMapperTests.cs index 51ccf0135..6b201b710 100644 --- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.CertCc.Tests/Internal/CertCcMapperTests.cs +++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.CertCc.Tests/Internal/CertCcMapperTests.cs @@ -1,118 +1,118 @@ -using System; -using System.Globalization; -using MongoDB.Bson; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.CertCc.Internal; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using Xunit; - -namespace StellaOps.Concelier.Connector.CertCc.Tests.Internal; - -public sealed class CertCcMapperTests -{ - private static readonly DateTimeOffset PublishedAt = DateTimeOffset.Parse("2025-10-03T11:35:31Z", CultureInfo.InvariantCulture); - - [Fact] - public void Map_ProducesCanonicalAdvisoryWithVendorPrimitives() - { - const string vendorStatement = - "The issue is confirmed, and here is the patch list\n\n" + - "V3912/V3910/V2962/V1000B\t4.4.3.6/4.4.5.1\n" + - "V2927/V2865/V2866\t4.5.1\n" + - 
"V2765/V2766/V2763/V2135\t4.5.1"; - - var vendor = new CertCcVendorDto( - "DrayTek Corporation", - ContactDate: PublishedAt.AddDays(-10), - StatementDate: PublishedAt.AddDays(-5), - Updated: PublishedAt, - Statement: vendorStatement, - Addendum: null, - References: new[] { "https://www.draytek.com/support/resources?type=version" }); - - var vendorStatus = new CertCcVendorStatusDto( - Vendor: "DrayTek Corporation", - CveId: "CVE-2025-10547", - Status: "Affected", - Statement: null, - References: Array.Empty(), - DateAdded: PublishedAt, - DateUpdated: PublishedAt); - - var vulnerability = new CertCcVulnerabilityDto( - CveId: "CVE-2025-10547", - Description: null, - DateAdded: PublishedAt, - DateUpdated: PublishedAt); - - var metadata = new CertCcNoteMetadata( - VuId: "VU#294418", - IdNumber: "294418", - Title: "Vigor routers running DrayOS RCE via EasyVPN", - Overview: "Overview", - Summary: "Summary", - Published: PublishedAt, - Updated: PublishedAt.AddMinutes(5), - Created: PublishedAt, - Revision: 2, - CveIds: new[] { "CVE-2025-10547" }, - PublicUrls: new[] - { - "https://www.draytek.com/about/security-advisory/use-of-uninitialized-variable-vulnerabilities/", - "https://www.draytek.com/support/resources?type=version" - }, - PrimaryUrl: "https://www.kb.cert.org/vuls/id/294418/"); - - var dto = new CertCcNoteDto( - metadata, - Vendors: new[] { vendor }, - VendorStatuses: new[] { vendorStatus }, - Vulnerabilities: new[] { vulnerability }); - - var document = new DocumentRecord( - Guid.NewGuid(), - "cert-cc", - "https://www.kb.cert.org/vuls/id/294418/", - PublishedAt, - Sha256: new string('0', 64), - Status: "pending-map", - ContentType: "application/json", - Headers: null, - Metadata: null, - Etag: null, - LastModified: PublishedAt, - GridFsId: null); - - var dtoRecord = new DtoRecord( - Id: Guid.NewGuid(), - DocumentId: document.Id, - SourceName: "cert-cc", - SchemaVersion: "certcc.vince.note.v1", - Payload: new BsonDocument(), - ValidatedAt: 
PublishedAt.AddMinutes(1)); - - var advisory = CertCcMapper.Map(dto, document, dtoRecord, "cert-cc"); - - Assert.Equal("certcc/vu-294418", advisory.AdvisoryKey); - Assert.Contains("VU#294418", advisory.Aliases); - Assert.Contains("CVE-2025-10547", advisory.Aliases); - Assert.Equal("en", advisory.Language); - Assert.Equal(PublishedAt, advisory.Published); - - Assert.Contains(advisory.References, reference => reference.Url.Contains("/vuls/id/294418", StringComparison.OrdinalIgnoreCase)); - - var affected = Assert.Single(advisory.AffectedPackages); - Assert.Equal("vendor", affected.Type); - Assert.Equal("DrayTek Corporation", affected.Identifier); - Assert.Contains(affected.Statuses, status => status.Status == AffectedPackageStatusCatalog.Affected); - - var range = Assert.Single(affected.VersionRanges); - Assert.NotNull(range.Primitives); - Assert.NotNull(range.Primitives!.VendorExtensions); - Assert.Contains(range.Primitives.VendorExtensions!, kvp => kvp.Key == "certcc.vendor.patches"); - - Assert.NotEmpty(affected.NormalizedVersions); - Assert.Contains(affected.NormalizedVersions, rule => rule.Scheme == "certcc.vendor" && rule.Value == "4.5.1"); - } -} +using System; +using System.Globalization; +using MongoDB.Bson; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.CertCc.Internal; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using Xunit; + +namespace StellaOps.Concelier.Connector.CertCc.Tests.Internal; + +public sealed class CertCcMapperTests +{ + private static readonly DateTimeOffset PublishedAt = DateTimeOffset.Parse("2025-10-03T11:35:31Z", CultureInfo.InvariantCulture); + + [Fact] + public void Map_ProducesCanonicalAdvisoryWithVendorPrimitives() + { + const string vendorStatement = + "The issue is confirmed, and here is the patch list\n\n" + + "V3912/V3910/V2962/V1000B\t4.4.3.6/4.4.5.1\n" + + "V2927/V2865/V2866\t4.5.1\n" + + "V2765/V2766/V2763/V2135\t4.5.1"; + + var vendor = new 
CertCcVendorDto( + "DrayTek Corporation", + ContactDate: PublishedAt.AddDays(-10), + StatementDate: PublishedAt.AddDays(-5), + Updated: PublishedAt, + Statement: vendorStatement, + Addendum: null, + References: new[] { "https://www.draytek.com/support/resources?type=version" }); + + var vendorStatus = new CertCcVendorStatusDto( + Vendor: "DrayTek Corporation", + CveId: "CVE-2025-10547", + Status: "Affected", + Statement: null, + References: Array.Empty(), + DateAdded: PublishedAt, + DateUpdated: PublishedAt); + + var vulnerability = new CertCcVulnerabilityDto( + CveId: "CVE-2025-10547", + Description: null, + DateAdded: PublishedAt, + DateUpdated: PublishedAt); + + var metadata = new CertCcNoteMetadata( + VuId: "VU#294418", + IdNumber: "294418", + Title: "Vigor routers running DrayOS RCE via EasyVPN", + Overview: "Overview", + Summary: "Summary", + Published: PublishedAt, + Updated: PublishedAt.AddMinutes(5), + Created: PublishedAt, + Revision: 2, + CveIds: new[] { "CVE-2025-10547" }, + PublicUrls: new[] + { + "https://www.draytek.com/about/security-advisory/use-of-uninitialized-variable-vulnerabilities/", + "https://www.draytek.com/support/resources?type=version" + }, + PrimaryUrl: "https://www.kb.cert.org/vuls/id/294418/"); + + var dto = new CertCcNoteDto( + metadata, + Vendors: new[] { vendor }, + VendorStatuses: new[] { vendorStatus }, + Vulnerabilities: new[] { vulnerability }); + + var document = new DocumentRecord( + Guid.NewGuid(), + "cert-cc", + "https://www.kb.cert.org/vuls/id/294418/", + PublishedAt, + Sha256: new string('0', 64), + Status: "pending-map", + ContentType: "application/json", + Headers: null, + Metadata: null, + Etag: null, + LastModified: PublishedAt, + PayloadId: null); + + var dtoRecord = new DtoRecord( + Id: Guid.NewGuid(), + DocumentId: document.Id, + SourceName: "cert-cc", + SchemaVersion: "certcc.vince.note.v1", + Payload: new BsonDocument(), + ValidatedAt: PublishedAt.AddMinutes(1)); + + var advisory = CertCcMapper.Map(dto, 
document, dtoRecord, "cert-cc"); + + Assert.Equal("certcc/vu-294418", advisory.AdvisoryKey); + Assert.Contains("VU#294418", advisory.Aliases); + Assert.Contains("CVE-2025-10547", advisory.Aliases); + Assert.Equal("en", advisory.Language); + Assert.Equal(PublishedAt, advisory.Published); + + Assert.Contains(advisory.References, reference => reference.Url.Contains("/vuls/id/294418", StringComparison.OrdinalIgnoreCase)); + + var affected = Assert.Single(advisory.AffectedPackages); + Assert.Equal("vendor", affected.Type); + Assert.Equal("DrayTek Corporation", affected.Identifier); + Assert.Contains(affected.Statuses, status => status.Status == AffectedPackageStatusCatalog.Affected); + + var range = Assert.Single(affected.VersionRanges); + Assert.NotNull(range.Primitives); + Assert.NotNull(range.Primitives!.VendorExtensions); + Assert.Contains(range.Primitives.VendorExtensions!, kvp => kvp.Key == "certcc.vendor.patches"); + + Assert.NotEmpty(affected.NormalizedVersions); + Assert.Contains(affected.NormalizedVersions, rule => rule.Scheme == "certcc.vendor" && rule.Value == "4.5.1"); + } +} diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Common.Tests/Common/SourceStateSeedProcessorTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Common.Tests/Common/SourceStateSeedProcessorTests.cs index e9912b3c4..54e090ab0 100644 --- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Common.Tests/Common/SourceStateSeedProcessorTests.cs +++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Common.Tests/Common/SourceStateSeedProcessorTests.cs @@ -93,7 +93,7 @@ public sealed class SourceStateSeedProcessorTests : IAsyncLifetime Assert.Equal(documentId, storedDocument!.Id); Assert.Equal("application/json", storedDocument.ContentType); Assert.Equal(DocumentStatuses.PendingParse, storedDocument.Status); - Assert.NotNull(storedDocument.GridFsId); + Assert.NotNull(storedDocument.PayloadId); Assert.NotNull(storedDocument.Headers); Assert.Equal("true", 
storedDocument.Headers!["X-Test"]); Assert.NotNull(storedDocument.Metadata); @@ -153,7 +153,7 @@ public sealed class SourceStateSeedProcessorTests : IAsyncLifetime CancellationToken.None); Assert.NotNull(existingRecord); - var previousGridId = existingRecord!.GridFsId; + var previousGridId = existingRecord!.PayloadId; Assert.NotNull(previousGridId); var filesCollection = _database.GetCollection("documents.files"); @@ -189,8 +189,8 @@ public sealed class SourceStateSeedProcessorTests : IAsyncLifetime Assert.NotNull(refreshedRecord); Assert.Equal(documentId, refreshedRecord!.Id); - Assert.NotNull(refreshedRecord.GridFsId); - Assert.NotEqual(previousGridId, refreshedRecord.GridFsId); + Assert.NotNull(refreshedRecord.PayloadId); + Assert.NotEqual(previousGridId, refreshedRecord.PayloadId); var files = await filesCollection.Find(FilterDefinition.Empty).ToListAsync(); Assert.Single(files); diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.Debian.Tests/DebianMapperTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.Debian.Tests/DebianMapperTests.cs index 8b75a5b07..6636a0c55 100644 --- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.Debian.Tests/DebianMapperTests.cs +++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.Debian.Tests/DebianMapperTests.cs @@ -1,82 +1,82 @@ -using System; -using Xunit; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Distro.Debian; -using StellaOps.Concelier.Connector.Distro.Debian.Internal; -using StellaOps.Concelier.Storage.Mongo.Documents; - -namespace StellaOps.Concelier.Connector.Distro.Debian.Tests; - -public sealed class DebianMapperTests -{ - [Fact] - public void Map_BuildsRangePrimitives_ForResolvedPackage() - { - var dto = new DebianAdvisoryDto( - AdvisoryId: "DSA-2024-123", - SourcePackage: "openssl", - Title: "Openssl security update", - Description: "Fixes multiple issues.", - CveIds: new[] { "CVE-2024-1000", "CVE-2024-1001" }, - Packages: 
new[] - { - new DebianPackageStateDto( - Package: "openssl", - Release: "bullseye", - Status: "resolved", - IntroducedVersion: "1:1.1.1n-0+deb11u2", - FixedVersion: "1:1.1.1n-0+deb11u5", - LastAffectedVersion: null, - Published: new DateTimeOffset(2024, 9, 1, 0, 0, 0, TimeSpan.Zero)), - new DebianPackageStateDto( - Package: "openssl", - Release: "bookworm", - Status: "open", - IntroducedVersion: null, - FixedVersion: null, - LastAffectedVersion: null, - Published: null) - }, - References: new[] - { - new DebianReferenceDto( - Url: "https://security-tracker.debian.org/tracker/DSA-2024-123", - Kind: "advisory", - Title: "Debian Security Advisory 2024-123"), - }); - - var document = new DocumentRecord( - Id: Guid.NewGuid(), - SourceName: DebianConnectorPlugin.SourceName, - Uri: "https://security-tracker.debian.org/tracker/DSA-2024-123", - FetchedAt: new DateTimeOffset(2024, 9, 1, 1, 0, 0, TimeSpan.Zero), - Sha256: "sha", - Status: "Fetched", - ContentType: "application/json", - Headers: null, - Metadata: null, - Etag: null, - LastModified: null, - GridFsId: null); - - Advisory advisory = DebianMapper.Map(dto, document, new DateTimeOffset(2024, 9, 1, 2, 0, 0, TimeSpan.Zero)); - - Assert.Equal("DSA-2024-123", advisory.AdvisoryKey); - Assert.Contains("CVE-2024-1000", advisory.Aliases); - Assert.Contains("CVE-2024-1001", advisory.Aliases); - - var resolvedPackage = Assert.Single(advisory.AffectedPackages, p => p.Platform == "bullseye"); - var range = Assert.Single(resolvedPackage.VersionRanges); - Assert.Equal("evr", range.RangeKind); - Assert.Equal("1:1.1.1n-0+deb11u2", range.IntroducedVersion); - Assert.Equal("1:1.1.1n-0+deb11u5", range.FixedVersion); - Assert.NotNull(range.Primitives); - var evr = range.Primitives!.Evr; - Assert.NotNull(evr); - Assert.NotNull(evr!.Introduced); - Assert.Equal(1, evr.Introduced!.Epoch); - Assert.Equal("1.1.1n", evr.Introduced.UpstreamVersion); - Assert.Equal("0+deb11u2", evr.Introduced.Revision); +using System; +using Xunit; +using 
StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Distro.Debian; +using StellaOps.Concelier.Connector.Distro.Debian.Internal; +using StellaOps.Concelier.Storage.Mongo.Documents; + +namespace StellaOps.Concelier.Connector.Distro.Debian.Tests; + +public sealed class DebianMapperTests +{ + [Fact] + public void Map_BuildsRangePrimitives_ForResolvedPackage() + { + var dto = new DebianAdvisoryDto( + AdvisoryId: "DSA-2024-123", + SourcePackage: "openssl", + Title: "Openssl security update", + Description: "Fixes multiple issues.", + CveIds: new[] { "CVE-2024-1000", "CVE-2024-1001" }, + Packages: new[] + { + new DebianPackageStateDto( + Package: "openssl", + Release: "bullseye", + Status: "resolved", + IntroducedVersion: "1:1.1.1n-0+deb11u2", + FixedVersion: "1:1.1.1n-0+deb11u5", + LastAffectedVersion: null, + Published: new DateTimeOffset(2024, 9, 1, 0, 0, 0, TimeSpan.Zero)), + new DebianPackageStateDto( + Package: "openssl", + Release: "bookworm", + Status: "open", + IntroducedVersion: null, + FixedVersion: null, + LastAffectedVersion: null, + Published: null) + }, + References: new[] + { + new DebianReferenceDto( + Url: "https://security-tracker.debian.org/tracker/DSA-2024-123", + Kind: "advisory", + Title: "Debian Security Advisory 2024-123"), + }); + + var document = new DocumentRecord( + Id: Guid.NewGuid(), + SourceName: DebianConnectorPlugin.SourceName, + Uri: "https://security-tracker.debian.org/tracker/DSA-2024-123", + FetchedAt: new DateTimeOffset(2024, 9, 1, 1, 0, 0, TimeSpan.Zero), + Sha256: "sha", + Status: "Fetched", + ContentType: "application/json", + Headers: null, + Metadata: null, + Etag: null, + LastModified: null, + PayloadId: null); + + Advisory advisory = DebianMapper.Map(dto, document, new DateTimeOffset(2024, 9, 1, 2, 0, 0, TimeSpan.Zero)); + + Assert.Equal("DSA-2024-123", advisory.AdvisoryKey); + Assert.Contains("CVE-2024-1000", advisory.Aliases); + Assert.Contains("CVE-2024-1001", advisory.Aliases); + + var resolvedPackage = 
Assert.Single(advisory.AffectedPackages, p => p.Platform == "bullseye"); + var range = Assert.Single(resolvedPackage.VersionRanges); + Assert.Equal("evr", range.RangeKind); + Assert.Equal("1:1.1.1n-0+deb11u2", range.IntroducedVersion); + Assert.Equal("1:1.1.1n-0+deb11u5", range.FixedVersion); + Assert.NotNull(range.Primitives); + var evr = range.Primitives!.Evr; + Assert.NotNull(evr); + Assert.NotNull(evr!.Introduced); + Assert.Equal(1, evr.Introduced!.Epoch); + Assert.Equal("1.1.1n", evr.Introduced.UpstreamVersion); + Assert.Equal("0+deb11u2", evr.Introduced.Revision); Assert.NotNull(evr.Fixed); Assert.Equal(1, evr.Fixed!.Epoch); Assert.Equal("1.1.1n", evr.Fixed.UpstreamVersion); @@ -94,5 +94,5 @@ public sealed class DebianMapperTests var openPackage = Assert.Single(advisory.AffectedPackages, p => p.Platform == "bookworm"); Assert.Empty(openPackage.VersionRanges); Assert.Empty(openPackage.NormalizedVersions); - } -} + } +} diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.RedHat.Tests/RedHat/RedHatConnectorTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.RedHat.Tests/RedHat/RedHatConnectorTests.cs index ce2a4534c..24127e142 100644 --- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.RedHat.Tests/RedHat/RedHatConnectorTests.cs +++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.RedHat.Tests/RedHat/RedHatConnectorTests.cs @@ -1,252 +1,252 @@ -using System; -using System.Globalization; -using System.Collections.Generic; -using System.IO; -using System.Linq; -using System.Threading; -using System.Threading.Tasks; -using Microsoft.Extensions.Configuration; -using System.Text.Json; -using Microsoft.Extensions.DependencyInjection; -using Microsoft.Extensions.Http; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Logging.Abstractions; -using Microsoft.Extensions.Options; -using Microsoft.Extensions.Time.Testing; -using MongoDB.Bson; -using StellaOps.Concelier.Connector.Common; -using 
StellaOps.Concelier.Core.Jobs; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Common.Http; -using StellaOps.Concelier.Connector.Common.Testing; -using StellaOps.Concelier.Connector.Distro.RedHat; -using StellaOps.Concelier.Connector.Distro.RedHat.Configuration; -using StellaOps.Concelier.Connector.Distro.RedHat.Internal; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Concelier.Testing; -using StellaOps.Plugin; -using Xunit; -using Xunit.Abstractions; - -namespace StellaOps.Concelier.Connector.Distro.RedHat.Tests; - -[Collection("mongo-fixture")] -public sealed class RedHatConnectorTests : IAsyncLifetime -{ - private readonly MongoIntegrationFixture _fixture; - private readonly FakeTimeProvider _timeProvider; - private readonly DateTimeOffset _initialNow; - private readonly CannedHttpMessageHandler _handler; - private readonly ITestOutputHelper _output; - private ServiceProvider? 
_serviceProvider; +using System; +using System.Globalization; +using System.Collections.Generic; +using System.IO; +using System.Linq; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Configuration; +using System.Text.Json; +using Microsoft.Extensions.DependencyInjection; +using Microsoft.Extensions.Http; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Logging.Abstractions; +using Microsoft.Extensions.Options; +using Microsoft.Extensions.Time.Testing; +using MongoDB.Bson; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Core.Jobs; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Common.Http; +using StellaOps.Concelier.Connector.Common.Testing; +using StellaOps.Concelier.Connector.Distro.RedHat; +using StellaOps.Concelier.Connector.Distro.RedHat.Configuration; +using StellaOps.Concelier.Connector.Distro.RedHat.Internal; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Concelier.Testing; +using StellaOps.Plugin; +using Xunit; +using Xunit.Abstractions; + +namespace StellaOps.Concelier.Connector.Distro.RedHat.Tests; + +[Collection("mongo-fixture")] +public sealed class RedHatConnectorTests : IAsyncLifetime +{ + private readonly MongoIntegrationFixture _fixture; + private readonly FakeTimeProvider _timeProvider; + private readonly DateTimeOffset _initialNow; + private readonly CannedHttpMessageHandler _handler; + private readonly ITestOutputHelper _output; + private ServiceProvider? 
_serviceProvider; private const bool ForceUpdateGoldens = false; - - public RedHatConnectorTests(MongoIntegrationFixture fixture, ITestOutputHelper output) - { - _fixture = fixture; - _initialNow = new DateTimeOffset(2025, 10, 5, 0, 0, 0, TimeSpan.Zero); - _timeProvider = new FakeTimeProvider(_initialNow); - _handler = new CannedHttpMessageHandler(); - _output = output; - } - - [Fact] - public async Task FetchParseMap_ProducesCanonicalAdvisory() - { - await ResetDatabaseAsync(); - - var options = new RedHatOptions - { - BaseEndpoint = new Uri("https://access.redhat.com/hydra/rest/securitydata"), - PageSize = 10, - MaxPagesPerFetch = 2, - MaxAdvisoriesPerFetch = 25, - InitialBackfill = TimeSpan.FromDays(1), - Overlap = TimeSpan.Zero, - FetchTimeout = TimeSpan.FromSeconds(30), - UserAgent = "StellaOps.Tests.RedHat/1.0", - }; - - await EnsureServiceProviderAsync(options); - var provider = _serviceProvider!; - - var configuredOptions = provider.GetRequiredService>().Value; - Assert.Equal(10, configuredOptions.PageSize); - Assert.Equal(TimeSpan.FromDays(1), configuredOptions.InitialBackfill); - Assert.Equal(TimeSpan.Zero, configuredOptions.Overlap); - _output.WriteLine($"InitialBackfill configured: {configuredOptions.InitialBackfill}"); - _output.WriteLine($"TimeProvider now: {_timeProvider.GetUtcNow():O}"); - - var summaryUriBackfill = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-03&per_page=10&page=1"); - var summaryUri = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-04&per_page=10&page=1"); - var summaryUriPost = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-05&per_page=10&page=1"); - var summaryUriPostPage2 = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-05&per_page=10&page=2"); - var detailUri = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf/RHSA-2025:0001.json"); - var detailUri2 = new 
Uri("https://access.redhat.com/hydra/rest/securitydata/csaf/RHSA-2025:0002.json"); - - _output.WriteLine($"Registering summary URI: {summaryUriBackfill}"); - _output.WriteLine($"Registering summary URI (overlap): {summaryUri}"); - _handler.AddJsonResponse(summaryUriBackfill, ReadFixture("summary-page1.json")); - _handler.AddJsonResponse(summaryUri, ReadFixture("summary-page1-repeat.json")); - _handler.AddJsonResponse(summaryUriPost, "[]"); - _handler.AddJsonResponse(summaryUriPostPage2, "[]"); - _handler.AddJsonResponse(detailUri, ReadFixture("csaf-rhsa-2025-0001.json")); - _handler.AddJsonResponse(detailUri2, ReadFixture("csaf-rhsa-2025-0002.json")); - - var stateRepository = provider.GetRequiredService(); - await stateRepository.UpsertAsync( - new SourceStateRecord( - RedHatConnectorPlugin.SourceName, - Enabled: true, - Paused: false, - Cursor: new BsonDocument(), - LastSuccess: null, - LastFailure: null, - FailCount: 0, - BackoffUntil: null, - UpdatedAt: _timeProvider.GetUtcNow(), - LastFailureReason: null), - CancellationToken.None); - - var connector = new RedHatConnectorPlugin().Create(provider); - - await connector.FetchAsync(provider, CancellationToken.None); - await connector.ParseAsync(provider, CancellationToken.None); - await connector.MapAsync(provider, CancellationToken.None); - - - foreach (var request in _handler.Requests) - { - _output.WriteLine($"Captured request: {request.Uri}"); - } - - var advisoryStore = provider.GetRequiredService(); - var advisories = await advisoryStore.GetRecentAsync(10, CancellationToken.None); - var advisory = advisories.Single(a => string.Equals(a.AdvisoryKey, "RHSA-2025:0001", StringComparison.Ordinal)); - Assert.Equal("red hat security advisory: example kernel update", advisory.Title.ToLowerInvariant()); - Assert.Contains("RHSA-2025:0001", advisory.Aliases); - Assert.Contains("CVE-2025-0001", advisory.Aliases); - Assert.Equal("high", advisory.Severity); - Assert.Equal("en", advisory.Language); - - var rpmPackage = 
advisory.AffectedPackages.Single(pkg => pkg.Type == AffectedPackageTypes.Rpm); - _output.WriteLine($"RPM statuses count: {rpmPackage.Statuses.Length}"); - _output.WriteLine($"RPM ranges count: {rpmPackage.VersionRanges.Length}"); - foreach (var range in rpmPackage.VersionRanges) - { - _output.WriteLine($"Range fixed={range.FixedVersion}, last={range.LastAffectedVersion}, expr={range.RangeExpression}"); - } - Assert.Equal("kernel-0:4.18.0-513.5.1.el8.x86_64", rpmPackage.Identifier); - var fixedRange = Assert.Single( - rpmPackage.VersionRanges, - range => string.Equals(range.FixedVersion, "kernel-0:4.18.0-513.5.1.el8.x86_64", StringComparison.Ordinal)); - Assert.Equal("kernel-0:4.18.0-500.1.0.el8.x86_64", fixedRange.LastAffectedVersion); - var nevraPrimitive = fixedRange.Primitives?.Nevra; - Assert.NotNull(nevraPrimitive); - Assert.Null(nevraPrimitive!.Introduced); - Assert.Equal("kernel", nevraPrimitive.Fixed?.Name); - - var cpePackage = advisory.AffectedPackages.Single(pkg => pkg.Type == AffectedPackageTypes.Cpe); - Assert.Equal("cpe:2.3:o:redhat:enterprise_linux:8:*:*:*:*:*:*:*", cpePackage.Identifier); - - Assert.Contains(advisory.References, reference => reference.Url == "https://access.redhat.com/errata/RHSA-2025:0001"); - Assert.Contains(advisory.References, reference => reference.Url == "https://www.cve.org/CVERecord?id=CVE-2025-0001"); - - var snapshot = SnapshotSerializer.ToSnapshot(advisory).Replace("\r\n", "\n"); - _output.WriteLine("-- RHSA-2025:0001 snapshot --\n" + snapshot); - var snapshotPath = ProjectFixturePath("rhsa-2025-0001.snapshot.json"); - if (ShouldUpdateGoldens()) - { - File.WriteAllText(snapshotPath, snapshot); - return; - } - - var expectedSnapshot = File.ReadAllText(snapshotPath); - Assert.Equal(NormalizeLineEndings(expectedSnapshot), NormalizeLineEndings(snapshot)); - - var state = await stateRepository.TryGetAsync(RedHatConnectorPlugin.SourceName, CancellationToken.None); - Assert.NotNull(state); - 
Assert.True(state!.Cursor.TryGetValue("pendingDocuments", out var pendingDocs2) && pendingDocs2.AsBsonArray.Count == 0); - Assert.True(state.Cursor.TryGetValue("pendingMappings", out var pendingMappings2) && pendingMappings2.AsBsonArray.Count == 0); - - const string fetchKind = "source:redhat:fetch"; - const string parseKind = "source:redhat:parse"; - const string mapKind = "source:redhat:map"; - - var schedulerOptions = provider.GetRequiredService>().Value; - Assert.True(schedulerOptions.Definitions.TryGetValue(fetchKind, out var fetchDefinition)); - Assert.True(schedulerOptions.Definitions.TryGetValue(parseKind, out var parseDefinition)); - Assert.True(schedulerOptions.Definitions.TryGetValue(mapKind, out var mapDefinition)); - - Assert.Equal("RedHatFetchJob", fetchDefinition.JobType.Name); - Assert.Equal(TimeSpan.FromMinutes(12), fetchDefinition.Timeout); - Assert.Equal(TimeSpan.FromMinutes(6), fetchDefinition.LeaseDuration); - Assert.Equal("0,15,30,45 * * * *", fetchDefinition.CronExpression); - Assert.True(fetchDefinition.Enabled); - - Assert.Equal("RedHatParseJob", parseDefinition.JobType.Name); - Assert.Equal(TimeSpan.FromMinutes(15), parseDefinition.Timeout); - Assert.Equal(TimeSpan.FromMinutes(6), parseDefinition.LeaseDuration); - Assert.Equal("5,20,35,50 * * * *", parseDefinition.CronExpression); - Assert.True(parseDefinition.Enabled); - - Assert.Equal("RedHatMapJob", mapDefinition.JobType.Name); - Assert.Equal(TimeSpan.FromMinutes(20), mapDefinition.Timeout); - Assert.Equal(TimeSpan.FromMinutes(6), mapDefinition.LeaseDuration); - Assert.Equal("10,25,40,55 * * * *", mapDefinition.CronExpression); - Assert.True(mapDefinition.Enabled); - - var summaryUriRepeat = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-03&per_page=10&page=1"); - var summaryUriSecondPage = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-03&per_page=10&page=2"); - var summaryUriRepeatOverlap = new 
Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-04&per_page=10&page=1"); - var summaryUriSecondPageOverlap = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-04&per_page=10&page=2"); - - _output.WriteLine($"Registering repeat summary URI: {summaryUriRepeat}"); - _output.WriteLine($"Registering second page summary URI: {summaryUriSecondPage}"); - _output.WriteLine($"Registering overlap repeat summary URI: {summaryUriRepeatOverlap}"); - _output.WriteLine($"Registering overlap second page summary URI: {summaryUriSecondPageOverlap}"); - _handler.AddJsonResponse(summaryUriRepeat, ReadFixture("summary-page1-repeat.json")); - _handler.AddJsonResponse(summaryUriSecondPage, ReadFixture("summary-page2.json")); - _handler.AddJsonResponse(summaryUriRepeatOverlap, ReadFixture("summary-page1-repeat.json")); - _handler.AddJsonResponse(summaryUriSecondPageOverlap, ReadFixture("summary-page2.json")); - - await connector.FetchAsync(provider, CancellationToken.None); - await connector.ParseAsync(provider, CancellationToken.None); - await connector.MapAsync(provider, CancellationToken.None); - - advisories = await advisoryStore.GetRecentAsync(10, CancellationToken.None); - Assert.Equal(2, advisories.Count); - - var secondAdvisory = advisories.Single(a => string.Equals(a.AdvisoryKey, "RHSA-2025:0002", StringComparison.Ordinal)); - var rpm2 = secondAdvisory.AffectedPackages.Single(pkg => pkg.Type == AffectedPackageTypes.Rpm); - Assert.Equal("kernel-0:5.14.0-400.el9.x86_64", rpm2.Identifier); - const string knownNotAffected = "known_not_affected"; - - foreach (var status in rpm2.Statuses) - { - _output.WriteLine($"RPM2 status: {status.Status}"); - } - - Assert.DoesNotContain(rpm2.VersionRanges, range => string.Equals(range.RangeExpression, knownNotAffected, StringComparison.Ordinal)); - Assert.Contains(rpm2.Statuses, status => status.Status == knownNotAffected); - - var cpe2 = secondAdvisory.AffectedPackages.Single(pkg => 
pkg.Type == AffectedPackageTypes.Cpe); - Assert.Equal("cpe:2.3:o:redhat:enterprise_linux:9:*:*:*:*:*:*:*", cpe2.Identifier); - Assert.Empty(cpe2.VersionRanges); - Assert.Contains(cpe2.Statuses, status => status.Status == knownNotAffected); - - state = await stateRepository.TryGetAsync(RedHatConnectorPlugin.SourceName, CancellationToken.None); - Assert.NotNull(state); - Assert.True(state!.Cursor.TryGetValue("pendingDocuments", out var pendingDocs3) && pendingDocs3.AsBsonArray.Count == 0); - Assert.True(state.Cursor.TryGetValue("pendingMappings", out var pendingMappings3) && pendingMappings3.AsBsonArray.Count == 0); - } - - [Fact] - public void GoldenFixturesMatchSnapshots() + + public RedHatConnectorTests(MongoIntegrationFixture fixture, ITestOutputHelper output) + { + _fixture = fixture; + _initialNow = new DateTimeOffset(2025, 10, 5, 0, 0, 0, TimeSpan.Zero); + _timeProvider = new FakeTimeProvider(_initialNow); + _handler = new CannedHttpMessageHandler(); + _output = output; + } + + [Fact] + public async Task FetchParseMap_ProducesCanonicalAdvisory() + { + await ResetDatabaseAsync(); + + var options = new RedHatOptions + { + BaseEndpoint = new Uri("https://access.redhat.com/hydra/rest/securitydata"), + PageSize = 10, + MaxPagesPerFetch = 2, + MaxAdvisoriesPerFetch = 25, + InitialBackfill = TimeSpan.FromDays(1), + Overlap = TimeSpan.Zero, + FetchTimeout = TimeSpan.FromSeconds(30), + UserAgent = "StellaOps.Tests.RedHat/1.0", + }; + + await EnsureServiceProviderAsync(options); + var provider = _serviceProvider!; + + var configuredOptions = provider.GetRequiredService>().Value; + Assert.Equal(10, configuredOptions.PageSize); + Assert.Equal(TimeSpan.FromDays(1), configuredOptions.InitialBackfill); + Assert.Equal(TimeSpan.Zero, configuredOptions.Overlap); + _output.WriteLine($"InitialBackfill configured: {configuredOptions.InitialBackfill}"); + _output.WriteLine($"TimeProvider now: {_timeProvider.GetUtcNow():O}"); + + var summaryUriBackfill = new 
Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-03&per_page=10&page=1"); + var summaryUri = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-04&per_page=10&page=1"); + var summaryUriPost = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-05&per_page=10&page=1"); + var summaryUriPostPage2 = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-05&per_page=10&page=2"); + var detailUri = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf/RHSA-2025:0001.json"); + var detailUri2 = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf/RHSA-2025:0002.json"); + + _output.WriteLine($"Registering summary URI: {summaryUriBackfill}"); + _output.WriteLine($"Registering summary URI (overlap): {summaryUri}"); + _handler.AddJsonResponse(summaryUriBackfill, ReadFixture("summary-page1.json")); + _handler.AddJsonResponse(summaryUri, ReadFixture("summary-page1-repeat.json")); + _handler.AddJsonResponse(summaryUriPost, "[]"); + _handler.AddJsonResponse(summaryUriPostPage2, "[]"); + _handler.AddJsonResponse(detailUri, ReadFixture("csaf-rhsa-2025-0001.json")); + _handler.AddJsonResponse(detailUri2, ReadFixture("csaf-rhsa-2025-0002.json")); + + var stateRepository = provider.GetRequiredService(); + await stateRepository.UpsertAsync( + new SourceStateRecord( + RedHatConnectorPlugin.SourceName, + Enabled: true, + Paused: false, + Cursor: new BsonDocument(), + LastSuccess: null, + LastFailure: null, + FailCount: 0, + BackoffUntil: null, + UpdatedAt: _timeProvider.GetUtcNow(), + LastFailureReason: null), + CancellationToken.None); + + var connector = new RedHatConnectorPlugin().Create(provider); + + await connector.FetchAsync(provider, CancellationToken.None); + await connector.ParseAsync(provider, CancellationToken.None); + await connector.MapAsync(provider, CancellationToken.None); + + + foreach (var request in _handler.Requests) + { + 
_output.WriteLine($"Captured request: {request.Uri}"); + } + + var advisoryStore = provider.GetRequiredService(); + var advisories = await advisoryStore.GetRecentAsync(10, CancellationToken.None); + var advisory = advisories.Single(a => string.Equals(a.AdvisoryKey, "RHSA-2025:0001", StringComparison.Ordinal)); + Assert.Equal("red hat security advisory: example kernel update", advisory.Title.ToLowerInvariant()); + Assert.Contains("RHSA-2025:0001", advisory.Aliases); + Assert.Contains("CVE-2025-0001", advisory.Aliases); + Assert.Equal("high", advisory.Severity); + Assert.Equal("en", advisory.Language); + + var rpmPackage = advisory.AffectedPackages.Single(pkg => pkg.Type == AffectedPackageTypes.Rpm); + _output.WriteLine($"RPM statuses count: {rpmPackage.Statuses.Length}"); + _output.WriteLine($"RPM ranges count: {rpmPackage.VersionRanges.Length}"); + foreach (var range in rpmPackage.VersionRanges) + { + _output.WriteLine($"Range fixed={range.FixedVersion}, last={range.LastAffectedVersion}, expr={range.RangeExpression}"); + } + Assert.Equal("kernel-0:4.18.0-513.5.1.el8.x86_64", rpmPackage.Identifier); + var fixedRange = Assert.Single( + rpmPackage.VersionRanges, + range => string.Equals(range.FixedVersion, "kernel-0:4.18.0-513.5.1.el8.x86_64", StringComparison.Ordinal)); + Assert.Equal("kernel-0:4.18.0-500.1.0.el8.x86_64", fixedRange.LastAffectedVersion); + var nevraPrimitive = fixedRange.Primitives?.Nevra; + Assert.NotNull(nevraPrimitive); + Assert.Null(nevraPrimitive!.Introduced); + Assert.Equal("kernel", nevraPrimitive.Fixed?.Name); + + var cpePackage = advisory.AffectedPackages.Single(pkg => pkg.Type == AffectedPackageTypes.Cpe); + Assert.Equal("cpe:2.3:o:redhat:enterprise_linux:8:*:*:*:*:*:*:*", cpePackage.Identifier); + + Assert.Contains(advisory.References, reference => reference.Url == "https://access.redhat.com/errata/RHSA-2025:0001"); + Assert.Contains(advisory.References, reference => reference.Url == "https://www.cve.org/CVERecord?id=CVE-2025-0001"); + + 
var snapshot = SnapshotSerializer.ToSnapshot(advisory).Replace("\r\n", "\n"); + _output.WriteLine("-- RHSA-2025:0001 snapshot --\n" + snapshot); + var snapshotPath = ProjectFixturePath("rhsa-2025-0001.snapshot.json"); + if (ShouldUpdateGoldens()) + { + File.WriteAllText(snapshotPath, snapshot); + return; + } + + var expectedSnapshot = File.ReadAllText(snapshotPath); + Assert.Equal(NormalizeLineEndings(expectedSnapshot), NormalizeLineEndings(snapshot)); + + var state = await stateRepository.TryGetAsync(RedHatConnectorPlugin.SourceName, CancellationToken.None); + Assert.NotNull(state); + Assert.True(state!.Cursor.TryGetValue("pendingDocuments", out var pendingDocs2) && pendingDocs2.AsBsonArray.Count == 0); + Assert.True(state.Cursor.TryGetValue("pendingMappings", out var pendingMappings2) && pendingMappings2.AsBsonArray.Count == 0); + + const string fetchKind = "source:redhat:fetch"; + const string parseKind = "source:redhat:parse"; + const string mapKind = "source:redhat:map"; + + var schedulerOptions = provider.GetRequiredService>().Value; + Assert.True(schedulerOptions.Definitions.TryGetValue(fetchKind, out var fetchDefinition)); + Assert.True(schedulerOptions.Definitions.TryGetValue(parseKind, out var parseDefinition)); + Assert.True(schedulerOptions.Definitions.TryGetValue(mapKind, out var mapDefinition)); + + Assert.Equal("RedHatFetchJob", fetchDefinition.JobType.Name); + Assert.Equal(TimeSpan.FromMinutes(12), fetchDefinition.Timeout); + Assert.Equal(TimeSpan.FromMinutes(6), fetchDefinition.LeaseDuration); + Assert.Equal("0,15,30,45 * * * *", fetchDefinition.CronExpression); + Assert.True(fetchDefinition.Enabled); + + Assert.Equal("RedHatParseJob", parseDefinition.JobType.Name); + Assert.Equal(TimeSpan.FromMinutes(15), parseDefinition.Timeout); + Assert.Equal(TimeSpan.FromMinutes(6), parseDefinition.LeaseDuration); + Assert.Equal("5,20,35,50 * * * *", parseDefinition.CronExpression); + Assert.True(parseDefinition.Enabled); + + Assert.Equal("RedHatMapJob", 
mapDefinition.JobType.Name); + Assert.Equal(TimeSpan.FromMinutes(20), mapDefinition.Timeout); + Assert.Equal(TimeSpan.FromMinutes(6), mapDefinition.LeaseDuration); + Assert.Equal("10,25,40,55 * * * *", mapDefinition.CronExpression); + Assert.True(mapDefinition.Enabled); + + var summaryUriRepeat = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-03&per_page=10&page=1"); + var summaryUriSecondPage = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-03&per_page=10&page=2"); + var summaryUriRepeatOverlap = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-04&per_page=10&page=1"); + var summaryUriSecondPageOverlap = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-04&per_page=10&page=2"); + + _output.WriteLine($"Registering repeat summary URI: {summaryUriRepeat}"); + _output.WriteLine($"Registering second page summary URI: {summaryUriSecondPage}"); + _output.WriteLine($"Registering overlap repeat summary URI: {summaryUriRepeatOverlap}"); + _output.WriteLine($"Registering overlap second page summary URI: {summaryUriSecondPageOverlap}"); + _handler.AddJsonResponse(summaryUriRepeat, ReadFixture("summary-page1-repeat.json")); + _handler.AddJsonResponse(summaryUriSecondPage, ReadFixture("summary-page2.json")); + _handler.AddJsonResponse(summaryUriRepeatOverlap, ReadFixture("summary-page1-repeat.json")); + _handler.AddJsonResponse(summaryUriSecondPageOverlap, ReadFixture("summary-page2.json")); + + await connector.FetchAsync(provider, CancellationToken.None); + await connector.ParseAsync(provider, CancellationToken.None); + await connector.MapAsync(provider, CancellationToken.None); + + advisories = await advisoryStore.GetRecentAsync(10, CancellationToken.None); + Assert.Equal(2, advisories.Count); + + var secondAdvisory = advisories.Single(a => string.Equals(a.AdvisoryKey, "RHSA-2025:0002", StringComparison.Ordinal)); + var rpm2 = 
secondAdvisory.AffectedPackages.Single(pkg => pkg.Type == AffectedPackageTypes.Rpm); + Assert.Equal("kernel-0:5.14.0-400.el9.x86_64", rpm2.Identifier); + const string knownNotAffected = "known_not_affected"; + + foreach (var status in rpm2.Statuses) + { + _output.WriteLine($"RPM2 status: {status.Status}"); + } + + Assert.DoesNotContain(rpm2.VersionRanges, range => string.Equals(range.RangeExpression, knownNotAffected, StringComparison.Ordinal)); + Assert.Contains(rpm2.Statuses, status => status.Status == knownNotAffected); + + var cpe2 = secondAdvisory.AffectedPackages.Single(pkg => pkg.Type == AffectedPackageTypes.Cpe); + Assert.Equal("cpe:2.3:o:redhat:enterprise_linux:9:*:*:*:*:*:*:*", cpe2.Identifier); + Assert.Empty(cpe2.VersionRanges); + Assert.Contains(cpe2.Statuses, status => status.Status == knownNotAffected); + + state = await stateRepository.TryGetAsync(RedHatConnectorPlugin.SourceName, CancellationToken.None); + Assert.NotNull(state); + Assert.True(state!.Cursor.TryGetValue("pendingDocuments", out var pendingDocs3) && pendingDocs3.AsBsonArray.Count == 0); + Assert.True(state.Cursor.TryGetValue("pendingMappings", out var pendingMappings3) && pendingMappings3.AsBsonArray.Count == 0); + } + + [Fact] + public void GoldenFixturesMatchSnapshots() { var fixtures = new[] { @@ -260,394 +260,394 @@ public sealed class RedHatConnectorTests : IAsyncLifetime InputFile: "csaf-rhsa-2025-0002.json", SnapshotFile: "rhsa-2025-0002.snapshot.json", ValidatedAt: DateTimeOffset.Parse("2025-10-05T12:00:00Z")), - new GoldenFixtureCase( - AdvisoryId: "RHSA-2025:0003", - InputFile: "csaf-rhsa-2025-0003.json", - SnapshotFile: "rhsa-2025-0003.snapshot.json", - ValidatedAt: DateTimeOffset.Parse("2025-10-06T09:00:00Z")), - }; - - var updateGoldens = ShouldUpdateGoldens(); - - foreach (var fixture in fixtures) - { - var snapshot = MapFixtureToSnapshot(fixture); - var snapshotPath = ProjectFixturePath(fixture.SnapshotFile); - - if (updateGoldens) - { - File.WriteAllText(snapshotPath, 
snapshot); - continue; - } - - var expected = File.ReadAllText(snapshotPath).Replace("\r\n", "\n"); - Assert.Equal(expected, snapshot); - } - } - - [Fact] - public async Task Resume_CompletesPendingDocumentsAfterRestart() - { - await ResetDatabaseAsync(); - - var options = new RedHatOptions - { - BaseEndpoint = new Uri("https://access.redhat.com/hydra/rest/securitydata"), - PageSize = 10, - MaxPagesPerFetch = 2, - MaxAdvisoriesPerFetch = 25, - InitialBackfill = TimeSpan.FromDays(1), - Overlap = TimeSpan.Zero, - FetchTimeout = TimeSpan.FromSeconds(30), - UserAgent = "StellaOps.Tests.RedHat/1.0", - }; - - var summaryUri = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-04&per_page=10&page=1"); - var summaryUriPost = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-05&per_page=10&page=1"); - var summaryUriPostPage2 = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-05&per_page=10&page=2"); - var detailUri = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf/RHSA-2025:0001.json"); - var detailUri2 = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf/RHSA-2025:0002.json"); - - var fetchHandler = new CannedHttpMessageHandler(); - fetchHandler.AddJsonResponse(summaryUri, ReadFixture("summary-page1-repeat.json")); - fetchHandler.AddJsonResponse(summaryUriPost, "[]"); - fetchHandler.AddJsonResponse(summaryUriPostPage2, "[]"); - fetchHandler.AddJsonResponse(detailUri, ReadFixture("csaf-rhsa-2025-0001.json")); - fetchHandler.AddJsonResponse(detailUri2, ReadFixture("csaf-rhsa-2025-0002.json")); - - Guid[] pendingDocumentIds; - await using (var fetchProvider = await CreateServiceProviderAsync(options, fetchHandler)) - { - var stateRepository = fetchProvider.GetRequiredService(); - await stateRepository.UpsertAsync( - new SourceStateRecord( - RedHatConnectorPlugin.SourceName, - Enabled: true, - Paused: false, - Cursor: new BsonDocument(), - 
LastSuccess: null, - LastFailure: null, - FailCount: 0, - BackoffUntil: null, - UpdatedAt: _timeProvider.GetUtcNow(), - LastFailureReason: null), - CancellationToken.None); - - var connector = new RedHatConnectorPlugin().Create(fetchProvider); - await connector.FetchAsync(fetchProvider, CancellationToken.None); - - var state = await stateRepository.TryGetAsync(RedHatConnectorPlugin.SourceName, CancellationToken.None); - Assert.NotNull(state); - var pendingDocs = state!.Cursor.TryGetValue("pendingDocuments", out var pendingDocsValue) - ? pendingDocsValue.AsBsonArray - : new BsonArray(); - Assert.NotEmpty(pendingDocs); - pendingDocumentIds = pendingDocs.Select(value => Guid.Parse(value.AsString)).ToArray(); - } - - var resumeHandler = new CannedHttpMessageHandler(); - await using (var resumeProvider = await CreateServiceProviderAsync(options, resumeHandler)) - { - var resumeConnector = new RedHatConnectorPlugin().Create(resumeProvider); - - await resumeConnector.ParseAsync(resumeProvider, CancellationToken.None); - await resumeConnector.MapAsync(resumeProvider, CancellationToken.None); - - var documentStore = resumeProvider.GetRequiredService(); - foreach (var documentId in pendingDocumentIds) - { - var document = await documentStore.FindAsync(documentId, CancellationToken.None); - Assert.NotNull(document); - Assert.Equal(DocumentStatuses.Mapped, document!.Status); - } - - var advisoryStore = resumeProvider.GetRequiredService(); - var advisories = await advisoryStore.GetRecentAsync(10, CancellationToken.None); - Assert.NotEmpty(advisories); - - var stateRepository = resumeProvider.GetRequiredService(); - var finalState = await stateRepository.TryGetAsync(RedHatConnectorPlugin.SourceName, CancellationToken.None); - Assert.NotNull(finalState); - var finalPendingDocs = finalState!.Cursor.TryGetValue("pendingDocuments", out var docsValue) ? 
docsValue.AsBsonArray : new BsonArray(); - Assert.Empty(finalPendingDocs); - var finalPendingMappings = finalState.Cursor.TryGetValue("pendingMappings", out var mappingsValue) ? mappingsValue.AsBsonArray : new BsonArray(); - Assert.Empty(finalPendingMappings); - } - } - - [Fact] - public async Task MapAsync_DeduplicatesReferencesAndOrdersDeterministically() - { - await ResetDatabaseAsync(); - - var options = new RedHatOptions - { - BaseEndpoint = new Uri("https://access.redhat.com/hydra/rest/securitydata"), - PageSize = 10, - MaxPagesPerFetch = 2, - MaxAdvisoriesPerFetch = 10, - InitialBackfill = TimeSpan.FromDays(7), - Overlap = TimeSpan.Zero, - FetchTimeout = TimeSpan.FromSeconds(30), - UserAgent = "StellaOps.Tests.RedHat/1.0", - }; - - await EnsureServiceProviderAsync(options); - var provider = _serviceProvider!; - - var summaryUri = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-09-28&per_page=10&page=1"); - var summaryUriPost = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-05&per_page=10&page=1"); - var detailUri = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf/RHSA-2025:0003.json"); - - _handler.AddJsonResponse(summaryUri, ReadFixture("summary-page3.json")); - _handler.AddJsonResponse(summaryUriPost, "[]"); - _handler.AddJsonResponse(detailUri, ReadFixture("csaf-rhsa-2025-0003.json")); - - var stateRepository = provider.GetRequiredService(); - await stateRepository.UpsertAsync( - new SourceStateRecord( - RedHatConnectorPlugin.SourceName, - Enabled: true, - Paused: false, - Cursor: new BsonDocument(), - LastSuccess: null, - LastFailure: null, - FailCount: 0, - BackoffUntil: null, - UpdatedAt: _timeProvider.GetUtcNow(), - LastFailureReason: null), - CancellationToken.None); - - var connector = new RedHatConnectorPlugin().Create(provider); - - await connector.FetchAsync(provider, CancellationToken.None); - await connector.ParseAsync(provider, CancellationToken.None); - await 
connector.MapAsync(provider, CancellationToken.None); - - var advisoryStore = provider.GetRequiredService(); - var advisory = (await advisoryStore.GetRecentAsync(10, CancellationToken.None)) - .Single(a => string.Equals(a.AdvisoryKey, "RHSA-2025:0003", StringComparison.Ordinal)); - - var references = advisory.References.ToArray(); - Assert.Collection( - references, - reference => - { - Assert.Equal("self", reference.Kind); - Assert.Equal("https://access.redhat.com/errata/RHSA-2025:0003", reference.Url); - Assert.Equal("Primary advisory", reference.Summary); - }, - reference => - { - Assert.Equal("mitigation", reference.Kind); - Assert.Equal("https://access.redhat.com/solutions/999999", reference.Url); - Assert.Equal("Knowledge base guidance", reference.Summary); - }, - reference => - { - Assert.Equal("exploit", reference.Kind); - Assert.Equal("https://bugzilla.redhat.com/show_bug.cgi?id=2222222", reference.Url); - Assert.Equal("Exploit tracking", reference.Summary); - }, - reference => - { - Assert.Equal("external", reference.Kind); - Assert.Equal("https://www.cve.org/CVERecord?id=CVE-2025-0003", reference.Url); - Assert.Equal("CVE record", reference.Summary); - }); - Assert.Equal(4, references.Length); - - Assert.Equal("self", references[0].Kind); - Assert.Equal("https://access.redhat.com/errata/RHSA-2025:0003", references[0].Url); - Assert.Equal("Primary advisory", references[0].Summary); - - Assert.Equal("mitigation", references[1].Kind); - Assert.Equal("https://access.redhat.com/solutions/999999", references[1].Url); - Assert.Equal("Knowledge base guidance", references[1].Summary); - - Assert.Equal("exploit", references[2].Kind); - Assert.Equal("https://bugzilla.redhat.com/show_bug.cgi?id=2222222", references[2].Url); - - Assert.Equal("external", references[3].Kind); - Assert.Equal("https://www.cve.org/CVERecord?id=CVE-2025-0003", references[3].Url); - Assert.Equal("CVE record", references[3].Summary); - } - - private static string 
MapFixtureToSnapshot(GoldenFixtureCase fixture) - { - var jsonPath = ProjectFixturePath(fixture.InputFile); - var json = File.ReadAllText(jsonPath); - - using var jsonDocument = JsonDocument.Parse(json); - var bson = BsonDocument.Parse(json); - - var metadata = new Dictionary(StringComparer.OrdinalIgnoreCase) - { - ["advisoryId"] = fixture.AdvisoryId, - }; - - var document = new DocumentRecord( - Guid.NewGuid(), - RedHatConnectorPlugin.SourceName, - $"https://access.redhat.com/hydra/rest/securitydata/csaf/{fixture.AdvisoryId}.json", - fixture.ValidatedAt, - new string('0', 64), - DocumentStatuses.Mapped, - "application/json", - Headers: null, - Metadata: metadata, - Etag: null, - LastModified: fixture.ValidatedAt, - GridFsId: null); - - var dto = new DtoRecord(Guid.NewGuid(), document.Id, RedHatConnectorPlugin.SourceName, "redhat.csaf.v2", bson, fixture.ValidatedAt); - - var advisory = RedHatMapper.Map(RedHatConnectorPlugin.SourceName, dto, document, jsonDocument); - Assert.NotNull(advisory); - - return SnapshotSerializer.ToSnapshot(advisory!).Replace("\r\n", "\n"); - } - - private static bool ShouldUpdateGoldens() - => ForceUpdateGoldens - || IsTruthy(Environment.GetEnvironmentVariable("UPDATE_GOLDENS")) - || IsTruthy(Environment.GetEnvironmentVariable("DOTNET_TEST_UPDATE_GOLDENS")); - - private static bool IsTruthy(string? 
value) - => !string.IsNullOrWhiteSpace(value) - && (string.Equals(value, "1", StringComparison.OrdinalIgnoreCase) - || string.Equals(value, "true", StringComparison.OrdinalIgnoreCase) - || string.Equals(value, "yes", StringComparison.OrdinalIgnoreCase)); - - private sealed record GoldenFixtureCase(string AdvisoryId, string InputFile, string SnapshotFile, DateTimeOffset ValidatedAt); - - private static string ProjectFixturePath(string filename) - => Path.Combine(GetProjectRoot(), "RedHat", "Fixtures", filename); - - private static string GetProjectRoot() - => Path.GetFullPath(Path.Combine(AppContext.BaseDirectory, "..", "..", "..")); - - private async Task EnsureServiceProviderAsync(RedHatOptions options) - { - if (_serviceProvider is not null) - { - return; - } - - _serviceProvider = await CreateServiceProviderAsync(options, _handler); - } - - private async Task CreateServiceProviderAsync(RedHatOptions options, CannedHttpMessageHandler handler) - { - var services = new ServiceCollection(); - services.AddLogging(builder => builder.AddProvider(NullLoggerProvider.Instance)); - services.AddSingleton(_timeProvider); - services.AddSingleton(handler); - - services.AddMongoStorage(storageOptions => - { - storageOptions.ConnectionString = _fixture.Runner.ConnectionString; - storageOptions.DatabaseName = _fixture.Database.DatabaseNamespace.DatabaseName; - storageOptions.CommandTimeout = TimeSpan.FromSeconds(5); - }); - - services.AddSourceCommon(); - services.AddRedHatConnector(opts => - { - opts.BaseEndpoint = options.BaseEndpoint; - opts.SummaryPath = options.SummaryPath; - opts.PageSize = options.PageSize; - opts.MaxPagesPerFetch = options.MaxPagesPerFetch; - opts.MaxAdvisoriesPerFetch = options.MaxAdvisoriesPerFetch; - opts.InitialBackfill = options.InitialBackfill; - opts.Overlap = options.Overlap; - opts.FetchTimeout = options.FetchTimeout; - opts.UserAgent = options.UserAgent; - }); - - services.Configure(schedulerOptions => - { - var fetchType = 
Type.GetType("StellaOps.Concelier.Connector.Distro.RedHat.RedHatFetchJob, StellaOps.Concelier.Connector.Distro.RedHat", throwOnError: true)!; - var parseType = Type.GetType("StellaOps.Concelier.Connector.Distro.RedHat.RedHatParseJob, StellaOps.Concelier.Connector.Distro.RedHat", throwOnError: true)!; - var mapType = Type.GetType("StellaOps.Concelier.Connector.Distro.RedHat.RedHatMapJob, StellaOps.Concelier.Connector.Distro.RedHat", throwOnError: true)!; - - schedulerOptions.Definitions["source:redhat:fetch"] = new JobDefinition("source:redhat:fetch", fetchType, TimeSpan.FromMinutes(12), TimeSpan.FromMinutes(6), "0,15,30,45 * * * *", true); - schedulerOptions.Definitions["source:redhat:parse"] = new JobDefinition("source:redhat:parse", parseType, TimeSpan.FromMinutes(15), TimeSpan.FromMinutes(6), "5,20,35,50 * * * *", true); - schedulerOptions.Definitions["source:redhat:map"] = new JobDefinition("source:redhat:map", mapType, TimeSpan.FromMinutes(20), TimeSpan.FromMinutes(6), "10,25,40,55 * * * *", true); - }); - - services.Configure(RedHatOptions.HttpClientName, builderOptions => - { - builderOptions.HttpMessageHandlerBuilderActions.Add(builder => - { - builder.PrimaryHandler = handler; - }); - }); - - var provider = services.BuildServiceProvider(); - var bootstrapper = provider.GetRequiredService(); - await bootstrapper.InitializeAsync(CancellationToken.None); - return provider; - } - - private Task ResetDatabaseAsync() - { - return ResetDatabaseInternalAsync(); - } - - private async Task ResetDatabaseInternalAsync() - { - if (_serviceProvider is not null) - { - if (_serviceProvider is IAsyncDisposable asyncDisposable) - { - await asyncDisposable.DisposeAsync(); - } - else - { - _serviceProvider.Dispose(); - } - - _serviceProvider = null; - } - - await _fixture.Client.DropDatabaseAsync(_fixture.Database.DatabaseNamespace.DatabaseName); - _handler.Clear(); - _timeProvider.SetUtcNow(_initialNow); - } - - private static string ReadFixture(string name) - => 
File.ReadAllText(ResolveFixturePath(name)); - - private static string ResolveFixturePath(string filename) - { - var candidates = new[] - { - Path.Combine(AppContext.BaseDirectory, "Source", "Distro", "RedHat", "Fixtures", filename), - Path.Combine(AppContext.BaseDirectory, "RedHat", "Fixtures", filename), - }; - - foreach (var candidate in candidates) - { - if (File.Exists(candidate)) - { - return candidate; - } - } - - throw new FileNotFoundException($"Fixture '{filename}' not found in output directory.", filename); - } - - private static string NormalizeLineEndings(string value) - { - var normalized = value.Replace("\r\n", "\n").Replace('\r', '\n'); - return normalized.TrimEnd('\n'); - } - - public Task InitializeAsync() => Task.CompletedTask; - - public async Task DisposeAsync() - { - await ResetDatabaseInternalAsync(); - } -} + new GoldenFixtureCase( + AdvisoryId: "RHSA-2025:0003", + InputFile: "csaf-rhsa-2025-0003.json", + SnapshotFile: "rhsa-2025-0003.snapshot.json", + ValidatedAt: DateTimeOffset.Parse("2025-10-06T09:00:00Z")), + }; + + var updateGoldens = ShouldUpdateGoldens(); + + foreach (var fixture in fixtures) + { + var snapshot = MapFixtureToSnapshot(fixture); + var snapshotPath = ProjectFixturePath(fixture.SnapshotFile); + + if (updateGoldens) + { + File.WriteAllText(snapshotPath, snapshot); + continue; + } + + var expected = File.ReadAllText(snapshotPath).Replace("\r\n", "\n"); + Assert.Equal(expected, snapshot); + } + } + + [Fact] + public async Task Resume_CompletesPendingDocumentsAfterRestart() + { + await ResetDatabaseAsync(); + + var options = new RedHatOptions + { + BaseEndpoint = new Uri("https://access.redhat.com/hydra/rest/securitydata"), + PageSize = 10, + MaxPagesPerFetch = 2, + MaxAdvisoriesPerFetch = 25, + InitialBackfill = TimeSpan.FromDays(1), + Overlap = TimeSpan.Zero, + FetchTimeout = TimeSpan.FromSeconds(30), + UserAgent = "StellaOps.Tests.RedHat/1.0", + }; + + var summaryUri = new 
Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-04&per_page=10&page=1"); + var summaryUriPost = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-05&per_page=10&page=1"); + var summaryUriPostPage2 = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-05&per_page=10&page=2"); + var detailUri = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf/RHSA-2025:0001.json"); + var detailUri2 = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf/RHSA-2025:0002.json"); + + var fetchHandler = new CannedHttpMessageHandler(); + fetchHandler.AddJsonResponse(summaryUri, ReadFixture("summary-page1-repeat.json")); + fetchHandler.AddJsonResponse(summaryUriPost, "[]"); + fetchHandler.AddJsonResponse(summaryUriPostPage2, "[]"); + fetchHandler.AddJsonResponse(detailUri, ReadFixture("csaf-rhsa-2025-0001.json")); + fetchHandler.AddJsonResponse(detailUri2, ReadFixture("csaf-rhsa-2025-0002.json")); + + Guid[] pendingDocumentIds; + await using (var fetchProvider = await CreateServiceProviderAsync(options, fetchHandler)) + { + var stateRepository = fetchProvider.GetRequiredService(); + await stateRepository.UpsertAsync( + new SourceStateRecord( + RedHatConnectorPlugin.SourceName, + Enabled: true, + Paused: false, + Cursor: new BsonDocument(), + LastSuccess: null, + LastFailure: null, + FailCount: 0, + BackoffUntil: null, + UpdatedAt: _timeProvider.GetUtcNow(), + LastFailureReason: null), + CancellationToken.None); + + var connector = new RedHatConnectorPlugin().Create(fetchProvider); + await connector.FetchAsync(fetchProvider, CancellationToken.None); + + var state = await stateRepository.TryGetAsync(RedHatConnectorPlugin.SourceName, CancellationToken.None); + Assert.NotNull(state); + var pendingDocs = state!.Cursor.TryGetValue("pendingDocuments", out var pendingDocsValue) + ? 
pendingDocsValue.AsBsonArray + : new BsonArray(); + Assert.NotEmpty(pendingDocs); + pendingDocumentIds = pendingDocs.Select(value => Guid.Parse(value.AsString)).ToArray(); + } + + var resumeHandler = new CannedHttpMessageHandler(); + await using (var resumeProvider = await CreateServiceProviderAsync(options, resumeHandler)) + { + var resumeConnector = new RedHatConnectorPlugin().Create(resumeProvider); + + await resumeConnector.ParseAsync(resumeProvider, CancellationToken.None); + await resumeConnector.MapAsync(resumeProvider, CancellationToken.None); + + var documentStore = resumeProvider.GetRequiredService(); + foreach (var documentId in pendingDocumentIds) + { + var document = await documentStore.FindAsync(documentId, CancellationToken.None); + Assert.NotNull(document); + Assert.Equal(DocumentStatuses.Mapped, document!.Status); + } + + var advisoryStore = resumeProvider.GetRequiredService(); + var advisories = await advisoryStore.GetRecentAsync(10, CancellationToken.None); + Assert.NotEmpty(advisories); + + var stateRepository = resumeProvider.GetRequiredService(); + var finalState = await stateRepository.TryGetAsync(RedHatConnectorPlugin.SourceName, CancellationToken.None); + Assert.NotNull(finalState); + var finalPendingDocs = finalState!.Cursor.TryGetValue("pendingDocuments", out var docsValue) ? docsValue.AsBsonArray : new BsonArray(); + Assert.Empty(finalPendingDocs); + var finalPendingMappings = finalState.Cursor.TryGetValue("pendingMappings", out var mappingsValue) ? 
mappingsValue.AsBsonArray : new BsonArray(); + Assert.Empty(finalPendingMappings); + } + } + + [Fact] + public async Task MapAsync_DeduplicatesReferencesAndOrdersDeterministically() + { + await ResetDatabaseAsync(); + + var options = new RedHatOptions + { + BaseEndpoint = new Uri("https://access.redhat.com/hydra/rest/securitydata"), + PageSize = 10, + MaxPagesPerFetch = 2, + MaxAdvisoriesPerFetch = 10, + InitialBackfill = TimeSpan.FromDays(7), + Overlap = TimeSpan.Zero, + FetchTimeout = TimeSpan.FromSeconds(30), + UserAgent = "StellaOps.Tests.RedHat/1.0", + }; + + await EnsureServiceProviderAsync(options); + var provider = _serviceProvider!; + + var summaryUri = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-09-28&per_page=10&page=1"); + var summaryUriPost = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf.json?after=2025-10-05&per_page=10&page=1"); + var detailUri = new Uri("https://access.redhat.com/hydra/rest/securitydata/csaf/RHSA-2025:0003.json"); + + _handler.AddJsonResponse(summaryUri, ReadFixture("summary-page3.json")); + _handler.AddJsonResponse(summaryUriPost, "[]"); + _handler.AddJsonResponse(detailUri, ReadFixture("csaf-rhsa-2025-0003.json")); + + var stateRepository = provider.GetRequiredService(); + await stateRepository.UpsertAsync( + new SourceStateRecord( + RedHatConnectorPlugin.SourceName, + Enabled: true, + Paused: false, + Cursor: new BsonDocument(), + LastSuccess: null, + LastFailure: null, + FailCount: 0, + BackoffUntil: null, + UpdatedAt: _timeProvider.GetUtcNow(), + LastFailureReason: null), + CancellationToken.None); + + var connector = new RedHatConnectorPlugin().Create(provider); + + await connector.FetchAsync(provider, CancellationToken.None); + await connector.ParseAsync(provider, CancellationToken.None); + await connector.MapAsync(provider, CancellationToken.None); + + var advisoryStore = provider.GetRequiredService(); + var advisory = (await advisoryStore.GetRecentAsync(10, 
CancellationToken.None)) + .Single(a => string.Equals(a.AdvisoryKey, "RHSA-2025:0003", StringComparison.Ordinal)); + + var references = advisory.References.ToArray(); + Assert.Collection( + references, + reference => + { + Assert.Equal("self", reference.Kind); + Assert.Equal("https://access.redhat.com/errata/RHSA-2025:0003", reference.Url); + Assert.Equal("Primary advisory", reference.Summary); + }, + reference => + { + Assert.Equal("mitigation", reference.Kind); + Assert.Equal("https://access.redhat.com/solutions/999999", reference.Url); + Assert.Equal("Knowledge base guidance", reference.Summary); + }, + reference => + { + Assert.Equal("exploit", reference.Kind); + Assert.Equal("https://bugzilla.redhat.com/show_bug.cgi?id=2222222", reference.Url); + Assert.Equal("Exploit tracking", reference.Summary); + }, + reference => + { + Assert.Equal("external", reference.Kind); + Assert.Equal("https://www.cve.org/CVERecord?id=CVE-2025-0003", reference.Url); + Assert.Equal("CVE record", reference.Summary); + }); + Assert.Equal(4, references.Length); + + Assert.Equal("self", references[0].Kind); + Assert.Equal("https://access.redhat.com/errata/RHSA-2025:0003", references[0].Url); + Assert.Equal("Primary advisory", references[0].Summary); + + Assert.Equal("mitigation", references[1].Kind); + Assert.Equal("https://access.redhat.com/solutions/999999", references[1].Url); + Assert.Equal("Knowledge base guidance", references[1].Summary); + + Assert.Equal("exploit", references[2].Kind); + Assert.Equal("https://bugzilla.redhat.com/show_bug.cgi?id=2222222", references[2].Url); + + Assert.Equal("external", references[3].Kind); + Assert.Equal("https://www.cve.org/CVERecord?id=CVE-2025-0003", references[3].Url); + Assert.Equal("CVE record", references[3].Summary); + } + + private static string MapFixtureToSnapshot(GoldenFixtureCase fixture) + { + var jsonPath = ProjectFixturePath(fixture.InputFile); + var json = File.ReadAllText(jsonPath); + + using var jsonDocument = 
JsonDocument.Parse(json); + var bson = BsonDocument.Parse(json); + + var metadata = new Dictionary(StringComparer.OrdinalIgnoreCase) + { + ["advisoryId"] = fixture.AdvisoryId, + }; + + var document = new DocumentRecord( + Guid.NewGuid(), + RedHatConnectorPlugin.SourceName, + $"https://access.redhat.com/hydra/rest/securitydata/csaf/{fixture.AdvisoryId}.json", + fixture.ValidatedAt, + new string('0', 64), + DocumentStatuses.Mapped, + "application/json", + Headers: null, + Metadata: metadata, + Etag: null, + LastModified: fixture.ValidatedAt, + PayloadId: null); + + var dto = new DtoRecord(Guid.NewGuid(), document.Id, RedHatConnectorPlugin.SourceName, "redhat.csaf.v2", bson, fixture.ValidatedAt); + + var advisory = RedHatMapper.Map(RedHatConnectorPlugin.SourceName, dto, document, jsonDocument); + Assert.NotNull(advisory); + + return SnapshotSerializer.ToSnapshot(advisory!).Replace("\r\n", "\n"); + } + + private static bool ShouldUpdateGoldens() + => ForceUpdateGoldens + || IsTruthy(Environment.GetEnvironmentVariable("UPDATE_GOLDENS")) + || IsTruthy(Environment.GetEnvironmentVariable("DOTNET_TEST_UPDATE_GOLDENS")); + + private static bool IsTruthy(string? 
value) + => !string.IsNullOrWhiteSpace(value) + && (string.Equals(value, "1", StringComparison.OrdinalIgnoreCase) + || string.Equals(value, "true", StringComparison.OrdinalIgnoreCase) + || string.Equals(value, "yes", StringComparison.OrdinalIgnoreCase)); + + private sealed record GoldenFixtureCase(string AdvisoryId, string InputFile, string SnapshotFile, DateTimeOffset ValidatedAt); + + private static string ProjectFixturePath(string filename) + => Path.Combine(GetProjectRoot(), "RedHat", "Fixtures", filename); + + private static string GetProjectRoot() + => Path.GetFullPath(Path.Combine(AppContext.BaseDirectory, "..", "..", "..")); + + private async Task EnsureServiceProviderAsync(RedHatOptions options) + { + if (_serviceProvider is not null) + { + return; + } + + _serviceProvider = await CreateServiceProviderAsync(options, _handler); + } + + private async Task CreateServiceProviderAsync(RedHatOptions options, CannedHttpMessageHandler handler) + { + var services = new ServiceCollection(); + services.AddLogging(builder => builder.AddProvider(NullLoggerProvider.Instance)); + services.AddSingleton(_timeProvider); + services.AddSingleton(handler); + + services.AddMongoStorage(storageOptions => + { + storageOptions.ConnectionString = _fixture.Runner.ConnectionString; + storageOptions.DatabaseName = _fixture.Database.DatabaseNamespace.DatabaseName; + storageOptions.CommandTimeout = TimeSpan.FromSeconds(5); + }); + + services.AddSourceCommon(); + services.AddRedHatConnector(opts => + { + opts.BaseEndpoint = options.BaseEndpoint; + opts.SummaryPath = options.SummaryPath; + opts.PageSize = options.PageSize; + opts.MaxPagesPerFetch = options.MaxPagesPerFetch; + opts.MaxAdvisoriesPerFetch = options.MaxAdvisoriesPerFetch; + opts.InitialBackfill = options.InitialBackfill; + opts.Overlap = options.Overlap; + opts.FetchTimeout = options.FetchTimeout; + opts.UserAgent = options.UserAgent; + }); + + services.Configure(schedulerOptions => + { + var fetchType = 
Type.GetType("StellaOps.Concelier.Connector.Distro.RedHat.RedHatFetchJob, StellaOps.Concelier.Connector.Distro.RedHat", throwOnError: true)!; + var parseType = Type.GetType("StellaOps.Concelier.Connector.Distro.RedHat.RedHatParseJob, StellaOps.Concelier.Connector.Distro.RedHat", throwOnError: true)!; + var mapType = Type.GetType("StellaOps.Concelier.Connector.Distro.RedHat.RedHatMapJob, StellaOps.Concelier.Connector.Distro.RedHat", throwOnError: true)!; + + schedulerOptions.Definitions["source:redhat:fetch"] = new JobDefinition("source:redhat:fetch", fetchType, TimeSpan.FromMinutes(12), TimeSpan.FromMinutes(6), "0,15,30,45 * * * *", true); + schedulerOptions.Definitions["source:redhat:parse"] = new JobDefinition("source:redhat:parse", parseType, TimeSpan.FromMinutes(15), TimeSpan.FromMinutes(6), "5,20,35,50 * * * *", true); + schedulerOptions.Definitions["source:redhat:map"] = new JobDefinition("source:redhat:map", mapType, TimeSpan.FromMinutes(20), TimeSpan.FromMinutes(6), "10,25,40,55 * * * *", true); + }); + + services.Configure(RedHatOptions.HttpClientName, builderOptions => + { + builderOptions.HttpMessageHandlerBuilderActions.Add(builder => + { + builder.PrimaryHandler = handler; + }); + }); + + var provider = services.BuildServiceProvider(); + var bootstrapper = provider.GetRequiredService(); + await bootstrapper.InitializeAsync(CancellationToken.None); + return provider; + } + + private Task ResetDatabaseAsync() + { + return ResetDatabaseInternalAsync(); + } + + private async Task ResetDatabaseInternalAsync() + { + if (_serviceProvider is not null) + { + if (_serviceProvider is IAsyncDisposable asyncDisposable) + { + await asyncDisposable.DisposeAsync(); + } + else + { + _serviceProvider.Dispose(); + } + + _serviceProvider = null; + } + + await _fixture.Client.DropDatabaseAsync(_fixture.Database.DatabaseNamespace.DatabaseName); + _handler.Clear(); + _timeProvider.SetUtcNow(_initialNow); + } + + private static string ReadFixture(string name) + => 
File.ReadAllText(ResolveFixturePath(name)); + + private static string ResolveFixturePath(string filename) + { + var candidates = new[] + { + Path.Combine(AppContext.BaseDirectory, "Source", "Distro", "RedHat", "Fixtures", filename), + Path.Combine(AppContext.BaseDirectory, "RedHat", "Fixtures", filename), + }; + + foreach (var candidate in candidates) + { + if (File.Exists(candidate)) + { + return candidate; + } + } + + throw new FileNotFoundException($"Fixture '{filename}' not found in output directory.", filename); + } + + private static string NormalizeLineEndings(string value) + { + var normalized = value.Replace("\r\n", "\n").Replace('\r', '\n'); + return normalized.TrimEnd('\n'); + } + + public Task InitializeAsync() => Task.CompletedTask; + + public async Task DisposeAsync() + { + await ResetDatabaseInternalAsync(); + } +} diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.Suse.Tests/SuseMapperTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.Suse.Tests/SuseMapperTests.cs index 243485a22..8b1d204ac 100644 --- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.Suse.Tests/SuseMapperTests.cs +++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Distro.Suse.Tests/SuseMapperTests.cs @@ -1,47 +1,47 @@ -using System; -using System.Collections.Generic; -using System.IO; -using MongoDB.Bson; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Distro.Suse; -using StellaOps.Concelier.Connector.Distro.Suse.Internal; -using StellaOps.Concelier.Storage.Mongo.Documents; -using Xunit; - -namespace StellaOps.Concelier.Connector.Distro.Suse.Tests; - -public sealed class SuseMapperTests -{ - [Fact] - public void Map_BuildsNevraRangePrimitives() - { - var json = File.ReadAllText(Path.Combine(AppContext.BaseDirectory, "Source", "Distro", "Suse", "Fixtures", "suse-su-2025_0001-1.json")); - var dto = SuseCsafParser.Parse(json); - - var document = new 
DocumentRecord( - Guid.NewGuid(), - SuseConnectorPlugin.SourceName, - "https://ftp.suse.com/pub/projects/security/csaf/suse-su-2025_0001-1.json", - DateTimeOffset.UtcNow, - "sha256", - DocumentStatuses.PendingParse, - "application/json", - Headers: null, - Metadata: new Dictionary(StringComparer.Ordinal) - { - ["suse.id"] = dto.AdvisoryId - }, - Etag: "adv-1", - LastModified: DateTimeOffset.UtcNow, - GridFsId: ObjectId.Empty); - - var mapped = SuseMapper.Map(dto, document, DateTimeOffset.UtcNow); - - Assert.Equal(dto.AdvisoryId, mapped.AdvisoryKey); - var package = Assert.Single(mapped.AffectedPackages); - Assert.Equal(AffectedPackageTypes.Rpm, package.Type); - var range = Assert.Single(package.VersionRanges); +using System; +using System.Collections.Generic; +using System.IO; +using MongoDB.Bson; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Distro.Suse; +using StellaOps.Concelier.Connector.Distro.Suse.Internal; +using StellaOps.Concelier.Storage.Mongo.Documents; +using Xunit; + +namespace StellaOps.Concelier.Connector.Distro.Suse.Tests; + +public sealed class SuseMapperTests +{ + [Fact] + public void Map_BuildsNevraRangePrimitives() + { + var json = File.ReadAllText(Path.Combine(AppContext.BaseDirectory, "Source", "Distro", "Suse", "Fixtures", "suse-su-2025_0001-1.json")); + var dto = SuseCsafParser.Parse(json); + + var document = new DocumentRecord( + Guid.NewGuid(), + SuseConnectorPlugin.SourceName, + "https://ftp.suse.com/pub/projects/security/csaf/suse-su-2025_0001-1.json", + DateTimeOffset.UtcNow, + "sha256", + DocumentStatuses.PendingParse, + "application/json", + Headers: null, + Metadata: new Dictionary(StringComparer.Ordinal) + { + ["suse.id"] = dto.AdvisoryId + }, + Etag: "adv-1", + LastModified: DateTimeOffset.UtcNow, + PayloadId: ObjectId.Empty); + + var mapped = SuseMapper.Map(dto, document, DateTimeOffset.UtcNow); + + Assert.Equal(dto.AdvisoryId, mapped.AdvisoryKey); + var 
package = Assert.Single(mapped.AffectedPackages); + Assert.Equal(AffectedPackageTypes.Rpm, package.Type); + var range = Assert.Single(package.VersionRanges); Assert.Equal("nevra", range.RangeKind); Assert.NotNull(range.Primitives); Assert.NotNull(range.Primitives!.Nevra); diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Ghsa.Tests/Ghsa/GhsaConflictFixtureTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Ghsa.Tests/Ghsa/GhsaConflictFixtureTests.cs index e3b8010b8..b8b863782 100644 --- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Ghsa.Tests/Ghsa/GhsaConflictFixtureTests.cs +++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Ghsa.Tests/Ghsa/GhsaConflictFixtureTests.cs @@ -1,94 +1,94 @@ -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Ghsa.Internal; -using StellaOps.Concelier.Storage.Mongo.Documents; - -namespace StellaOps.Concelier.Connector.Ghsa.Tests; - -public sealed class GhsaConflictFixtureTests -{ - [Fact] - public void ConflictFixture_MatchesSnapshot() - { - var recordedAt = new DateTimeOffset(2025, 3, 4, 8, 30, 0, TimeSpan.Zero); - var document = new DocumentRecord( - Id: Guid.Parse("2f5c4d67-fcac-4ec9-a8d4-8a9c5a6d0fc9"), - SourceName: GhsaConnectorPlugin.SourceName, - Uri: "https://github.com/advisories/GHSA-qqqq-wwww-eeee", - FetchedAt: new DateTimeOffset(2025, 3, 3, 18, 0, 0, TimeSpan.Zero), - Sha256: "sha256-ghsa-conflict-fixture", - Status: "completed", - ContentType: "application/json", - Headers: null, - Metadata: null, - Etag: "\"etag-ghsa-conflict\"", - LastModified: new DateTimeOffset(2025, 3, 3, 18, 0, 0, TimeSpan.Zero), - GridFsId: null); - - var dto = new GhsaRecordDto - { - GhsaId = "GHSA-qqqq-wwww-eeee", - Summary = "Container escape in conflict-package", - Description = "Container escape vulnerability allowing privilege escalation in conflict-package.", - Severity = "HIGH", - PublishedAt = new DateTimeOffset(2025, 2, 25, 0, 0, 0, TimeSpan.Zero), - UpdatedAt = new 
DateTimeOffset(2025, 3, 2, 12, 0, 0, TimeSpan.Zero), - Aliases = new[] { "GHSA-qqqq-wwww-eeee", "CVE-2025-4242" }, - References = new[] - { - new GhsaReferenceDto - { - Url = "https://github.com/advisories/GHSA-qqqq-wwww-eeee", - Type = "ADVISORY" - }, - new GhsaReferenceDto - { - Url = "https://github.com/conflict/package/releases/tag/v1.4.0", - Type = "FIX" - } - }, - Affected = new[] - { - new GhsaAffectedDto - { - PackageName = "conflict/package", - Ecosystem = "npm", - VulnerableRange = "< 1.4.0", - PatchedVersion = "1.4.0" - } - }, - Credits = new[] - { - new GhsaCreditDto - { - Type = "reporter", - Name = "security-researcher", - Login = "sec-researcher", - ProfileUrl = "https://github.com/sec-researcher" - }, - new GhsaCreditDto - { - Type = "remediation_developer", - Name = "maintainer-team", - Login = "conflict-maintainer", - ProfileUrl = "https://github.com/conflict/package" - } - } - }; - - var advisory = GhsaMapper.Map(dto, document, recordedAt); - Assert.Equal("ghsa:severity/high", advisory.CanonicalMetricId); - Assert.True(advisory.CvssMetrics.IsEmpty); - var snapshot = SnapshotSerializer.ToSnapshot(advisory).Replace("\r\n", "\n").TrimEnd(); - - var expectedPath = Path.Combine(AppContext.BaseDirectory, "Fixtures", "conflict-ghsa.canonical.json"); - var expected = File.ReadAllText(expectedPath).Replace("\r\n", "\n").TrimEnd(); - - if (!string.Equals(expected, snapshot, StringComparison.Ordinal)) - { - var actualPath = Path.Combine(AppContext.BaseDirectory, "Fixtures", "conflict-ghsa.canonical.actual.json"); - File.WriteAllText(actualPath, snapshot); - } - - Assert.Equal(expected, snapshot); - } -} +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Ghsa.Internal; +using StellaOps.Concelier.Storage.Mongo.Documents; + +namespace StellaOps.Concelier.Connector.Ghsa.Tests; + +public sealed class GhsaConflictFixtureTests +{ + [Fact] + public void ConflictFixture_MatchesSnapshot() + { + var recordedAt = new DateTimeOffset(2025, 3, 4, 8, 
30, 0, TimeSpan.Zero); + var document = new DocumentRecord( + Id: Guid.Parse("2f5c4d67-fcac-4ec9-a8d4-8a9c5a6d0fc9"), + SourceName: GhsaConnectorPlugin.SourceName, + Uri: "https://github.com/advisories/GHSA-qqqq-wwww-eeee", + FetchedAt: new DateTimeOffset(2025, 3, 3, 18, 0, 0, TimeSpan.Zero), + Sha256: "sha256-ghsa-conflict-fixture", + Status: "completed", + ContentType: "application/json", + Headers: null, + Metadata: null, + Etag: "\"etag-ghsa-conflict\"", + LastModified: new DateTimeOffset(2025, 3, 3, 18, 0, 0, TimeSpan.Zero), + PayloadId: null); + + var dto = new GhsaRecordDto + { + GhsaId = "GHSA-qqqq-wwww-eeee", + Summary = "Container escape in conflict-package", + Description = "Container escape vulnerability allowing privilege escalation in conflict-package.", + Severity = "HIGH", + PublishedAt = new DateTimeOffset(2025, 2, 25, 0, 0, 0, TimeSpan.Zero), + UpdatedAt = new DateTimeOffset(2025, 3, 2, 12, 0, 0, TimeSpan.Zero), + Aliases = new[] { "GHSA-qqqq-wwww-eeee", "CVE-2025-4242" }, + References = new[] + { + new GhsaReferenceDto + { + Url = "https://github.com/advisories/GHSA-qqqq-wwww-eeee", + Type = "ADVISORY" + }, + new GhsaReferenceDto + { + Url = "https://github.com/conflict/package/releases/tag/v1.4.0", + Type = "FIX" + } + }, + Affected = new[] + { + new GhsaAffectedDto + { + PackageName = "conflict/package", + Ecosystem = "npm", + VulnerableRange = "< 1.4.0", + PatchedVersion = "1.4.0" + } + }, + Credits = new[] + { + new GhsaCreditDto + { + Type = "reporter", + Name = "security-researcher", + Login = "sec-researcher", + ProfileUrl = "https://github.com/sec-researcher" + }, + new GhsaCreditDto + { + Type = "remediation_developer", + Name = "maintainer-team", + Login = "conflict-maintainer", + ProfileUrl = "https://github.com/conflict/package" + } + } + }; + + var advisory = GhsaMapper.Map(dto, document, recordedAt); + Assert.Equal("ghsa:severity/high", advisory.CanonicalMetricId); + Assert.True(advisory.CvssMetrics.IsEmpty); + var snapshot = 
SnapshotSerializer.ToSnapshot(advisory).Replace("\r\n", "\n").TrimEnd(); + + var expectedPath = Path.Combine(AppContext.BaseDirectory, "Fixtures", "conflict-ghsa.canonical.json"); + var expected = File.ReadAllText(expectedPath).Replace("\r\n", "\n").TrimEnd(); + + if (!string.Equals(expected, snapshot, StringComparison.Ordinal)) + { + var actualPath = Path.Combine(AppContext.BaseDirectory, "Fixtures", "conflict-ghsa.canonical.actual.json"); + File.WriteAllText(actualPath, snapshot); + } + + Assert.Equal(expected, snapshot); + } +} diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Ghsa.Tests/Ghsa/GhsaMapperTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Ghsa.Tests/Ghsa/GhsaMapperTests.cs index 208025841..84ad65e2a 100644 --- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Ghsa.Tests/Ghsa/GhsaMapperTests.cs +++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Ghsa.Tests/Ghsa/GhsaMapperTests.cs @@ -21,7 +21,7 @@ public sealed class GhsaMapperTests Metadata: null, Etag: "\"etag-ghsa-fallback\"", LastModified: recordedAt.AddHours(-3), - GridFsId: null); + PayloadId: null); var dto = new GhsaRecordDto { diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/NvdConflictFixtureTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/NvdConflictFixtureTests.cs index 13728e86a..0e027adbd 100644 --- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/NvdConflictFixtureTests.cs +++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/NvdConflictFixtureTests.cs @@ -1,103 +1,103 @@ -using System.Text.Json; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Nvd.Internal; -using StellaOps.Concelier.Storage.Mongo.Documents; - -namespace StellaOps.Concelier.Connector.Nvd.Tests; - -public sealed class NvdConflictFixtureTests -{ - [Fact] - public void ConflictFixture_MatchesSnapshot() - { - const string payload = """ - { - "vulnerabilities": [ - { - 
"cve": { - "id": "CVE-2025-4242", - "published": "2025-03-01T10:15:00Z", - "lastModified": "2025-03-03T09:45:00Z", - "descriptions": [ - { "lang": "en", "value": "NVD baseline summary for conflict-package allowing container escape." } - ], - "references": [ - { - "url": "https://nvd.nist.gov/vuln/detail/CVE-2025-4242", - "source": "NVD", - "tags": ["Vendor Advisory"] - } - ], - "weaknesses": [ - { - "description": [ - { "lang": "en", "value": "CWE-269" } - ] - } - ], - "metrics": { - "cvssMetricV31": [ - { - "cvssData": { - "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H", - "baseScore": 9.8, - "baseSeverity": "CRITICAL" - }, - "exploitabilityScore": 3.9, - "impactScore": 5.9 - } - ] - }, - "configurations": { - "nodes": [ - { - "cpeMatch": [ - { - "criteria": "cpe:2.3:a:conflict:package:1.0:*:*:*:*:*:*:*", - "vulnerable": true, - "versionStartIncluding": "1.0", - "versionEndExcluding": "1.4" - } - ] - } - ] - } - } - } - ] - } - """; - - using var document = JsonDocument.Parse(payload); - - var sourceDocument = new DocumentRecord( - Id: Guid.Parse("1a6a0700-2dd0-4f69-bb37-64ca77e51c91"), - SourceName: NvdConnectorPlugin.SourceName, - Uri: "https://services.nvd.nist.gov/rest/json/cve/2.0?cveId=CVE-2025-4242", - FetchedAt: new DateTimeOffset(2025, 3, 3, 10, 0, 0, TimeSpan.Zero), - Sha256: "sha256-nvd-conflict-fixture", - Status: "completed", - ContentType: "application/json", - Headers: null, - Metadata: null, - Etag: "\"etag-nvd-conflict\"", - LastModified: new DateTimeOffset(2025, 3, 3, 9, 45, 0, TimeSpan.Zero), - GridFsId: null); - - var advisories = NvdMapper.Map(document, sourceDocument, new DateTimeOffset(2025, 3, 4, 2, 0, 0, TimeSpan.Zero)); - var advisory = Assert.Single(advisories); - - var snapshot = SnapshotSerializer.ToSnapshot(advisory).Replace("\r\n", "\n").TrimEnd(); - var expectedPath = Path.Combine(AppContext.BaseDirectory, "Nvd", "Fixtures", "conflict-nvd.canonical.json"); - var expected = 
File.ReadAllText(expectedPath).Replace("\r\n", "\n").TrimEnd(); - - if (!string.Equals(expected, snapshot, StringComparison.Ordinal)) - { - var actualPath = Path.Combine(AppContext.BaseDirectory, "Nvd", "Fixtures", "conflict-nvd.canonical.actual.json"); - Directory.CreateDirectory(Path.GetDirectoryName(actualPath)!); - File.WriteAllText(actualPath, snapshot); - } - - Assert.Equal(expected, snapshot); - } -} +using System.Text.Json; +using StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Nvd.Internal; +using StellaOps.Concelier.Storage.Mongo.Documents; + +namespace StellaOps.Concelier.Connector.Nvd.Tests; + +public sealed class NvdConflictFixtureTests +{ + [Fact] + public void ConflictFixture_MatchesSnapshot() + { + const string payload = """ + { + "vulnerabilities": [ + { + "cve": { + "id": "CVE-2025-4242", + "published": "2025-03-01T10:15:00Z", + "lastModified": "2025-03-03T09:45:00Z", + "descriptions": [ + { "lang": "en", "value": "NVD baseline summary for conflict-package allowing container escape." 
} + ], + "references": [ + { + "url": "https://nvd.nist.gov/vuln/detail/CVE-2025-4242", + "source": "NVD", + "tags": ["Vendor Advisory"] + } + ], + "weaknesses": [ + { + "description": [ + { "lang": "en", "value": "CWE-269" } + ] + } + ], + "metrics": { + "cvssMetricV31": [ + { + "cvssData": { + "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H", + "baseScore": 9.8, + "baseSeverity": "CRITICAL" + }, + "exploitabilityScore": 3.9, + "impactScore": 5.9 + } + ] + }, + "configurations": { + "nodes": [ + { + "cpeMatch": [ + { + "criteria": "cpe:2.3:a:conflict:package:1.0:*:*:*:*:*:*:*", + "vulnerable": true, + "versionStartIncluding": "1.0", + "versionEndExcluding": "1.4" + } + ] + } + ] + } + } + } + ] + } + """; + + using var document = JsonDocument.Parse(payload); + + var sourceDocument = new DocumentRecord( + Id: Guid.Parse("1a6a0700-2dd0-4f69-bb37-64ca77e51c91"), + SourceName: NvdConnectorPlugin.SourceName, + Uri: "https://services.nvd.nist.gov/rest/json/cve/2.0?cveId=CVE-2025-4242", + FetchedAt: new DateTimeOffset(2025, 3, 3, 10, 0, 0, TimeSpan.Zero), + Sha256: "sha256-nvd-conflict-fixture", + Status: "completed", + ContentType: "application/json", + Headers: null, + Metadata: null, + Etag: "\"etag-nvd-conflict\"", + LastModified: new DateTimeOffset(2025, 3, 3, 9, 45, 0, TimeSpan.Zero), + PayloadId: null); + + var advisories = NvdMapper.Map(document, sourceDocument, new DateTimeOffset(2025, 3, 4, 2, 0, 0, TimeSpan.Zero)); + var advisory = Assert.Single(advisories); + + var snapshot = SnapshotSerializer.ToSnapshot(advisory).Replace("\r\n", "\n").TrimEnd(); + var expectedPath = Path.Combine(AppContext.BaseDirectory, "Nvd", "Fixtures", "conflict-nvd.canonical.json"); + var expected = File.ReadAllText(expectedPath).Replace("\r\n", "\n").TrimEnd(); + + if (!string.Equals(expected, snapshot, StringComparison.Ordinal)) + { + var actualPath = Path.Combine(AppContext.BaseDirectory, "Nvd", "Fixtures", "conflict-nvd.canonical.actual.json"); + 
Directory.CreateDirectory(Path.GetDirectoryName(actualPath)!); + File.WriteAllText(actualPath, snapshot); + } + + Assert.Equal(expected, snapshot); + } +} diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Osv.Tests/Osv/OsvConflictFixtureTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Osv.Tests/Osv/OsvConflictFixtureTests.cs index 79d6c69fb..a4e53ca23 100644 --- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Osv.Tests/Osv/OsvConflictFixtureTests.cs +++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Osv.Tests/Osv/OsvConflictFixtureTests.cs @@ -1,118 +1,118 @@ -using System.Text.Json; -using MongoDB.Bson; -using StellaOps.Concelier.Models; -using StellaOps.Concelier.Connector.Osv.Internal; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; - -namespace StellaOps.Concelier.Connector.Osv.Tests; - -public sealed class OsvConflictFixtureTests -{ - [Fact] - public void ConflictFixture_MatchesSnapshot() - { - using var databaseSpecificDoc = JsonDocument.Parse("""{"severity":"medium"}"""); - - var dto = new OsvVulnerabilityDto - { - Id = "OSV-2025-4242", - Summary = "Container escape for conflict-package", - Details = "OSV captures the latest container escape details including patched version metadata.", - Aliases = new[] { "CVE-2025-4242", "GHSA-qqqq-wwww-eeee" }, - Published = new DateTimeOffset(2025, 2, 28, 0, 0, 0, TimeSpan.Zero), - Modified = new DateTimeOffset(2025, 3, 6, 12, 0, 0, TimeSpan.Zero), - Severity = new[] - { - new OsvSeverityDto - { - Type = "CVSS_V3", - Score = "CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:L/A:L" - } - }, - References = new[] - { - new OsvReferenceDto - { - Type = "ADVISORY", - Url = "https://osv.dev/vulnerability/OSV-2025-4242" - }, - new OsvReferenceDto - { - Type = "FIX", - Url = "https://github.com/conflict/package/commit/abcdef1234567890" - } - }, - Credits = new[] - { - new OsvCreditDto - { - Name = "osv-reporter", - Type = "reporter", - Contact = 
new[] { "mailto:osv-reporter@example.com" } - } - }, - Affected = new[] - { - new OsvAffectedPackageDto - { - Package = new OsvPackageDto - { - Ecosystem = "npm", - Name = "conflict/package" - }, - Ranges = new[] - { - new OsvRangeDto - { - Type = "SEMVER", - Events = new[] - { - new OsvEventDto { Introduced = "1.0.0" }, - new OsvEventDto { LastAffected = "1.4.2" }, - new OsvEventDto { Fixed = "1.5.0" } - } - } - } - } - }, - DatabaseSpecific = databaseSpecificDoc.RootElement.Clone() - }; - - var document = new DocumentRecord( - Id: Guid.Parse("8dd2b0fe-a5f5-4b3b-9f5c-0f3aad6fb6ce"), - SourceName: OsvConnectorPlugin.SourceName, - Uri: "https://api.osv.dev/v1/vulns/OSV-2025-4242", - FetchedAt: new DateTimeOffset(2025, 3, 6, 11, 30, 0, TimeSpan.Zero), - Sha256: "sha256-osv-conflict-fixture", - Status: "completed", - ContentType: "application/json", - Headers: null, - Metadata: null, - Etag: "\"etag-osv-conflict\"", - LastModified: new DateTimeOffset(2025, 3, 6, 12, 0, 0, TimeSpan.Zero), - GridFsId: null); - - var dtoRecord = new DtoRecord( - Id: Guid.Parse("6f7d5ce7-cb47-40a5-8b41-8ad022b5fd5c"), - DocumentId: document.Id, - SourceName: OsvConnectorPlugin.SourceName, - SchemaVersion: "osv.v1", - Payload: new BsonDocument("id", dto.Id), - ValidatedAt: new DateTimeOffset(2025, 3, 6, 12, 5, 0, TimeSpan.Zero)); - - var advisory = OsvMapper.Map(dto, document, dtoRecord, "npm"); - var snapshot = SnapshotSerializer.ToSnapshot(advisory).Replace("\r\n", "\n").TrimEnd(); - - var expectedPath = Path.Combine(AppContext.BaseDirectory, "Fixtures", "conflict-osv.canonical.json"); - var expected = File.ReadAllText(expectedPath).Replace("\r\n", "\n").TrimEnd(); - - if (!string.Equals(expected, snapshot, StringComparison.Ordinal)) - { - var actualPath = Path.Combine(AppContext.BaseDirectory, "Fixtures", "conflict-osv.canonical.actual.json"); - File.WriteAllText(actualPath, snapshot); - } - - Assert.Equal(expected, snapshot); - } -} +using System.Text.Json; +using MongoDB.Bson; +using 
StellaOps.Concelier.Models; +using StellaOps.Concelier.Connector.Osv.Internal; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; + +namespace StellaOps.Concelier.Connector.Osv.Tests; + +public sealed class OsvConflictFixtureTests +{ + [Fact] + public void ConflictFixture_MatchesSnapshot() + { + using var databaseSpecificDoc = JsonDocument.Parse("""{"severity":"medium"}"""); + + var dto = new OsvVulnerabilityDto + { + Id = "OSV-2025-4242", + Summary = "Container escape for conflict-package", + Details = "OSV captures the latest container escape details including patched version metadata.", + Aliases = new[] { "CVE-2025-4242", "GHSA-qqqq-wwww-eeee" }, + Published = new DateTimeOffset(2025, 2, 28, 0, 0, 0, TimeSpan.Zero), + Modified = new DateTimeOffset(2025, 3, 6, 12, 0, 0, TimeSpan.Zero), + Severity = new[] + { + new OsvSeverityDto + { + Type = "CVSS_V3", + Score = "CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:L/A:L" + } + }, + References = new[] + { + new OsvReferenceDto + { + Type = "ADVISORY", + Url = "https://osv.dev/vulnerability/OSV-2025-4242" + }, + new OsvReferenceDto + { + Type = "FIX", + Url = "https://github.com/conflict/package/commit/abcdef1234567890" + } + }, + Credits = new[] + { + new OsvCreditDto + { + Name = "osv-reporter", + Type = "reporter", + Contact = new[] { "mailto:osv-reporter@example.com" } + } + }, + Affected = new[] + { + new OsvAffectedPackageDto + { + Package = new OsvPackageDto + { + Ecosystem = "npm", + Name = "conflict/package" + }, + Ranges = new[] + { + new OsvRangeDto + { + Type = "SEMVER", + Events = new[] + { + new OsvEventDto { Introduced = "1.0.0" }, + new OsvEventDto { LastAffected = "1.4.2" }, + new OsvEventDto { Fixed = "1.5.0" } + } + } + } + } + }, + DatabaseSpecific = databaseSpecificDoc.RootElement.Clone() + }; + + var document = new DocumentRecord( + Id: Guid.Parse("8dd2b0fe-a5f5-4b3b-9f5c-0f3aad6fb6ce"), + SourceName: OsvConnectorPlugin.SourceName, + Uri: 
"https://api.osv.dev/v1/vulns/OSV-2025-4242", + FetchedAt: new DateTimeOffset(2025, 3, 6, 11, 30, 0, TimeSpan.Zero), + Sha256: "sha256-osv-conflict-fixture", + Status: "completed", + ContentType: "application/json", + Headers: null, + Metadata: null, + Etag: "\"etag-osv-conflict\"", + LastModified: new DateTimeOffset(2025, 3, 6, 12, 0, 0, TimeSpan.Zero), + PayloadId: null); + + var dtoRecord = new DtoRecord( + Id: Guid.Parse("6f7d5ce7-cb47-40a5-8b41-8ad022b5fd5c"), + DocumentId: document.Id, + SourceName: OsvConnectorPlugin.SourceName, + SchemaVersion: "osv.v1", + Payload: new BsonDocument("id", dto.Id), + ValidatedAt: new DateTimeOffset(2025, 3, 6, 12, 5, 0, TimeSpan.Zero)); + + var advisory = OsvMapper.Map(dto, document, dtoRecord, "npm"); + var snapshot = SnapshotSerializer.ToSnapshot(advisory).Replace("\r\n", "\n").TrimEnd(); + + var expectedPath = Path.Combine(AppContext.BaseDirectory, "Fixtures", "conflict-osv.canonical.json"); + var expected = File.ReadAllText(expectedPath).Replace("\r\n", "\n").TrimEnd(); + + if (!string.Equals(expected, snapshot, StringComparison.Ordinal)) + { + var actualPath = Path.Combine(AppContext.BaseDirectory, "Fixtures", "conflict-osv.canonical.actual.json"); + File.WriteAllText(actualPath, snapshot); + } + + Assert.Equal(expected, snapshot); + } +} diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.StellaOpsMirror.Tests/StellaOpsMirrorConnectorTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.StellaOpsMirror.Tests/StellaOpsMirrorConnectorTests.cs index ce368f871..5031c0d3f 100644 --- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.StellaOpsMirror.Tests/StellaOpsMirrorConnectorTests.cs +++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.StellaOpsMirror.Tests/StellaOpsMirrorConnectorTests.cs @@ -1,463 +1,463 @@ -using System; -using System.Collections.Generic; -using System.IO; -using System.Net; -using System.Net.Http; -using System.Security.Cryptography; -using System.Text; -using 
System.Text.Json; -using Microsoft.Extensions.Configuration; -using Microsoft.Extensions.DependencyInjection; -using Microsoft.Extensions.Http; -using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Logging.Abstractions; -using Microsoft.Extensions.Options; -using MongoDB.Bson; -using StellaOps.Concelier.Connector.Common; -using StellaOps.Concelier.Connector.Common.Fetch; -using StellaOps.Concelier.Connector.Common.Testing; -using StellaOps.Concelier.Connector.StellaOpsMirror.Internal; -using StellaOps.Concelier.Connector.StellaOpsMirror.Settings; -using StellaOps.Concelier.Storage.Mongo; -using StellaOps.Concelier.Storage.Mongo.Advisories; -using StellaOps.Concelier.Storage.Mongo.Documents; -using StellaOps.Concelier.Storage.Mongo.Dtos; -using StellaOps.Concelier.Testing; +using System; +using System.Collections.Generic; +using System.IO; +using System.Net; +using System.Net.Http; +using System.Security.Cryptography; +using System.Text; +using System.Text.Json; +using Microsoft.Extensions.Configuration; +using Microsoft.Extensions.DependencyInjection; +using Microsoft.Extensions.Http; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Logging.Abstractions; +using Microsoft.Extensions.Options; +using MongoDB.Bson; +using StellaOps.Concelier.Connector.Common; +using StellaOps.Concelier.Connector.Common.Fetch; +using StellaOps.Concelier.Connector.Common.Testing; +using StellaOps.Concelier.Connector.StellaOpsMirror.Internal; +using StellaOps.Concelier.Connector.StellaOpsMirror.Settings; +using StellaOps.Concelier.Storage.Mongo; +using StellaOps.Concelier.Storage.Mongo.Advisories; +using StellaOps.Concelier.Storage.Mongo.Documents; +using StellaOps.Concelier.Storage.Mongo.Dtos; +using StellaOps.Concelier.Testing; using StellaOps.Cryptography; using StellaOps.Cryptography.DependencyInjection; -using StellaOps.Concelier.Models; -using Xunit; - -namespace StellaOps.Concelier.Connector.StellaOpsMirror.Tests; - -[Collection("mongo-fixture")] -public 
sealed class StellaOpsMirrorConnectorTests : IAsyncLifetime -{ - private readonly MongoIntegrationFixture _fixture; - private readonly CannedHttpMessageHandler _handler; - - public StellaOpsMirrorConnectorTests(MongoIntegrationFixture fixture) - { - _fixture = fixture; - _handler = new CannedHttpMessageHandler(); - } - - [Fact] - public async Task FetchAsync_PersistsMirrorArtifacts() - { - var manifestContent = "{\"domain\":\"primary\",\"files\":[]}"; - var bundleContent = "{\"advisories\":[{\"id\":\"CVE-2025-0001\"}]}"; - - var manifestDigest = ComputeDigest(manifestContent); - var bundleDigest = ComputeDigest(bundleContent); - - var index = BuildIndex(manifestDigest, Encoding.UTF8.GetByteCount(manifestContent), bundleDigest, Encoding.UTF8.GetByteCount(bundleContent), includeSignature: false); - - await using var provider = await BuildServiceProviderAsync(); - - SeedResponses(index, manifestContent, bundleContent, signature: null); - - var connector = provider.GetRequiredService(); - await connector.FetchAsync(provider, CancellationToken.None); - - var documentStore = provider.GetRequiredService(); - var manifestUri = "https://mirror.test/mirror/primary/manifest.json"; - var bundleUri = "https://mirror.test/mirror/primary/bundle.json"; - - var manifestDocument = await documentStore.FindBySourceAndUriAsync(StellaOpsMirrorConnector.Source, manifestUri, CancellationToken.None); - Assert.NotNull(manifestDocument); - Assert.Equal(DocumentStatuses.Mapped, manifestDocument!.Status); - Assert.Equal(NormalizeDigest(manifestDigest), manifestDocument.Sha256); - - var bundleDocument = await documentStore.FindBySourceAndUriAsync(StellaOpsMirrorConnector.Source, bundleUri, CancellationToken.None); - Assert.NotNull(bundleDocument); - Assert.Equal(DocumentStatuses.PendingParse, bundleDocument!.Status); - Assert.Equal(NormalizeDigest(bundleDigest), bundleDocument.Sha256); - - var rawStorage = provider.GetRequiredService(); - Assert.NotNull(manifestDocument.GridFsId); - 
Assert.NotNull(bundleDocument.GridFsId); - - var manifestBytes = await rawStorage.DownloadAsync(manifestDocument.GridFsId!.Value, CancellationToken.None); - var bundleBytes = await rawStorage.DownloadAsync(bundleDocument.GridFsId!.Value, CancellationToken.None); - Assert.Equal(manifestContent, Encoding.UTF8.GetString(manifestBytes)); - Assert.Equal(bundleContent, Encoding.UTF8.GetString(bundleBytes)); - - var stateRepository = provider.GetRequiredService(); - var state = await stateRepository.TryGetAsync(StellaOpsMirrorConnector.Source, CancellationToken.None); - Assert.NotNull(state); - - var cursorDocument = state!.Cursor ?? new BsonDocument(); - var digestValue = cursorDocument.TryGetValue("bundleDigest", out var digestBson) ? digestBson.AsString : string.Empty; - Assert.Equal(NormalizeDigest(bundleDigest), NormalizeDigest(digestValue)); - - var pendingDocumentsArray = cursorDocument.TryGetValue("pendingDocuments", out var pendingDocsBson) && pendingDocsBson is BsonArray pendingArray - ? pendingArray - : new BsonArray(); - Assert.Single(pendingDocumentsArray); - var pendingDocumentId = Guid.Parse(pendingDocumentsArray[0].AsString); - Assert.Equal(bundleDocument.Id, pendingDocumentId); - - var pendingMappingsArray = cursorDocument.TryGetValue("pendingMappings", out var pendingMappingsBson) && pendingMappingsBson is BsonArray mappingsArray - ? 
mappingsArray - : new BsonArray(); - Assert.Empty(pendingMappingsArray); - } - - [Fact] - public async Task FetchAsync_TamperedSignatureThrows() - { - var manifestContent = "{\"domain\":\"primary\"}"; - var bundleContent = "{\"advisories\":[{\"id\":\"CVE-2025-0002\"}]}"; - - var manifestDigest = ComputeDigest(manifestContent); - var bundleDigest = ComputeDigest(bundleContent); - var index = BuildIndex(manifestDigest, Encoding.UTF8.GetByteCount(manifestContent), bundleDigest, Encoding.UTF8.GetByteCount(bundleContent), includeSignature: true); - - await using var provider = await BuildServiceProviderAsync(options => - { - options.Signature.Enabled = true; - options.Signature.KeyId = "mirror-key"; - options.Signature.Provider = "default"; - }); - - var defaultProvider = provider.GetRequiredService(); - var signingKey = CreateSigningKey("mirror-key"); - defaultProvider.UpsertSigningKey(signingKey); - - var (signatureValue, _) = CreateDetachedJws(signingKey, bundleContent); - // Tamper with signature so verification fails. 
- var tamperedSignature = signatureValue.Replace('a', 'b'); - - SeedResponses(index, manifestContent, bundleContent, tamperedSignature); - - var connector = provider.GetRequiredService(); - await Assert.ThrowsAsync(() => connector.FetchAsync(provider, CancellationToken.None)); - - var stateRepository = provider.GetRequiredService(); - var state = await stateRepository.TryGetAsync(StellaOpsMirrorConnector.Source, CancellationToken.None); - Assert.NotNull(state); - Assert.True(state!.FailCount >= 1); - Assert.False(state.Cursor.TryGetValue("bundleDigest", out _)); - } - - [Fact] - public async Task FetchAsync_SignatureKeyMismatchThrows() - { - var manifestContent = "{\"domain\":\"primary\"}"; - var bundleContent = "{\"advisories\":[{\"id\":\"CVE-2025-0003\"}]}"; - - var manifestDigest = ComputeDigest(manifestContent); - var bundleDigest = ComputeDigest(bundleContent); - var index = BuildIndex( - manifestDigest, - Encoding.UTF8.GetByteCount(manifestContent), - bundleDigest, - Encoding.UTF8.GetByteCount(bundleContent), - includeSignature: true, - signatureKeyId: "unexpected-key", - signatureProvider: "default"); - - var signingKey = CreateSigningKey("unexpected-key"); - var (signatureValue, _) = CreateDetachedJws(signingKey, bundleContent); - - await using var provider = await BuildServiceProviderAsync(options => - { - options.Signature.Enabled = true; - options.Signature.KeyId = "mirror-key"; - options.Signature.Provider = "default"; - }); - - SeedResponses(index, manifestContent, bundleContent, signatureValue); - - var connector = provider.GetRequiredService(); - await Assert.ThrowsAsync(() => connector.FetchAsync(provider, CancellationToken.None)); - } - - [Fact] - public async Task FetchAsync_VerifiesSignatureUsingFallbackPublicKey() - { - var manifestContent = "{\"domain\":\"primary\"}"; - var bundleContent = "{\"advisories\":[{\"id\":\"CVE-2025-0004\"}]}"; - - var manifestDigest = ComputeDigest(manifestContent); - var bundleDigest = ComputeDigest(bundleContent); 
- var index = BuildIndex(manifestDigest, Encoding.UTF8.GetByteCount(manifestContent), bundleDigest, Encoding.UTF8.GetByteCount(bundleContent), includeSignature: true); - - var signingKey = CreateSigningKey("mirror-key"); - var (signatureValue, _) = CreateDetachedJws(signingKey, bundleContent); - var publicKeyPath = WritePublicKeyPem(signingKey); - - await using var provider = await BuildServiceProviderAsync(options => - { - options.Signature.Enabled = true; - options.Signature.KeyId = "mirror-key"; - options.Signature.Provider = "default"; - options.Signature.PublicKeyPath = publicKeyPath; - }); - - try - { - SeedResponses(index, manifestContent, bundleContent, signatureValue); - - var connector = provider.GetRequiredService(); - await connector.FetchAsync(provider, CancellationToken.None); - - var stateRepository = provider.GetRequiredService(); - var state = await stateRepository.TryGetAsync(StellaOpsMirrorConnector.Source, CancellationToken.None); - Assert.NotNull(state); - Assert.Equal(0, state!.FailCount); - } - finally - { - if (File.Exists(publicKeyPath)) - { - File.Delete(publicKeyPath); - } - } - } - - [Fact] - public async Task FetchAsync_DigestMismatchMarksFailure() - { - var manifestExpected = "{\"domain\":\"primary\"}"; - var manifestTampered = "{\"domain\":\"tampered\"}"; - var bundleContent = "{\"advisories\":[{\"id\":\"CVE-2025-0005\"}]}"; - - var manifestDigest = ComputeDigest(manifestExpected); - var bundleDigest = ComputeDigest(bundleContent); - var index = BuildIndex(manifestDigest, Encoding.UTF8.GetByteCount(manifestExpected), bundleDigest, Encoding.UTF8.GetByteCount(bundleContent), includeSignature: false); - - await using var provider = await BuildServiceProviderAsync(); - - SeedResponses(index, manifestTampered, bundleContent, signature: null); - - var connector = provider.GetRequiredService(); - - await Assert.ThrowsAsync(() => connector.FetchAsync(provider, CancellationToken.None)); - - var stateRepository = provider.GetRequiredService(); 
- var state = await stateRepository.TryGetAsync(StellaOpsMirrorConnector.Source, CancellationToken.None); - Assert.NotNull(state); - var cursor = state!.Cursor ?? new BsonDocument(); - Assert.True(state.FailCount >= 1); - Assert.False(cursor.Contains("bundleDigest")); - } - - [Fact] - public void ParseAndMap_PersistAdvisoriesFromBundle() - { - var bundleDocument = SampleData.CreateBundle(); - var bundleJson = CanonicalJsonSerializer.SerializeIndented(bundleDocument); - var normalizedFixture = FixtureLoader.Read(SampleData.BundleFixture).TrimEnd(); - Assert.Equal(normalizedFixture, FixtureLoader.Normalize(bundleJson).TrimEnd()); - - var advisories = MirrorAdvisoryMapper.Map(bundleDocument); - Assert.Single(advisories); - var advisory = advisories[0]; - - var expectedAdvisoryJson = FixtureLoader.Read(SampleData.AdvisoryFixture).TrimEnd(); - var mappedJson = CanonicalJsonSerializer.SerializeIndented(advisory); - Assert.Equal(expectedAdvisoryJson, FixtureLoader.Normalize(mappedJson).TrimEnd()); - - // AdvisoryStore integration validated elsewhere; ensure canonical serialization is stable. - } - - public Task InitializeAsync() => Task.CompletedTask; - - public Task DisposeAsync() - { - _handler.Clear(); - return Task.CompletedTask; - } - - private async Task BuildServiceProviderAsync(Action? 
configureOptions = null) - { - await _fixture.Client.DropDatabaseAsync(_fixture.Database.DatabaseNamespace.DatabaseName); - _handler.Clear(); - - var services = new ServiceCollection(); - services.AddLogging(builder => builder.AddProvider(NullLoggerProvider.Instance)); - services.AddSingleton(_handler); - services.AddSingleton(TimeProvider.System); - - services.AddMongoStorage(options => - { - options.ConnectionString = _fixture.Runner.ConnectionString; - options.DatabaseName = _fixture.Database.DatabaseNamespace.DatabaseName; - options.CommandTimeout = TimeSpan.FromSeconds(5); - }); - +using StellaOps.Concelier.Models; +using Xunit; + +namespace StellaOps.Concelier.Connector.StellaOpsMirror.Tests; + +[Collection("mongo-fixture")] +public sealed class StellaOpsMirrorConnectorTests : IAsyncLifetime +{ + private readonly MongoIntegrationFixture _fixture; + private readonly CannedHttpMessageHandler _handler; + + public StellaOpsMirrorConnectorTests(MongoIntegrationFixture fixture) + { + _fixture = fixture; + _handler = new CannedHttpMessageHandler(); + } + + [Fact] + public async Task FetchAsync_PersistsMirrorArtifacts() + { + var manifestContent = "{\"domain\":\"primary\",\"files\":[]}"; + var bundleContent = "{\"advisories\":[{\"id\":\"CVE-2025-0001\"}]}"; + + var manifestDigest = ComputeDigest(manifestContent); + var bundleDigest = ComputeDigest(bundleContent); + + var index = BuildIndex(manifestDigest, Encoding.UTF8.GetByteCount(manifestContent), bundleDigest, Encoding.UTF8.GetByteCount(bundleContent), includeSignature: false); + + await using var provider = await BuildServiceProviderAsync(); + + SeedResponses(index, manifestContent, bundleContent, signature: null); + + var connector = provider.GetRequiredService(); + await connector.FetchAsync(provider, CancellationToken.None); + + var documentStore = provider.GetRequiredService(); + var manifestUri = "https://mirror.test/mirror/primary/manifest.json"; + var bundleUri = 
"https://mirror.test/mirror/primary/bundle.json"; + + var manifestDocument = await documentStore.FindBySourceAndUriAsync(StellaOpsMirrorConnector.Source, manifestUri, CancellationToken.None); + Assert.NotNull(manifestDocument); + Assert.Equal(DocumentStatuses.Mapped, manifestDocument!.Status); + Assert.Equal(NormalizeDigest(manifestDigest), manifestDocument.Sha256); + + var bundleDocument = await documentStore.FindBySourceAndUriAsync(StellaOpsMirrorConnector.Source, bundleUri, CancellationToken.None); + Assert.NotNull(bundleDocument); + Assert.Equal(DocumentStatuses.PendingParse, bundleDocument!.Status); + Assert.Equal(NormalizeDigest(bundleDigest), bundleDocument.Sha256); + + var rawStorage = provider.GetRequiredService(); + Assert.NotNull(manifestDocument.PayloadId); + Assert.NotNull(bundleDocument.PayloadId); + + var manifestBytes = await rawStorage.DownloadAsync(manifestDocument.PayloadId!.Value, CancellationToken.None); + var bundleBytes = await rawStorage.DownloadAsync(bundleDocument.PayloadId!.Value, CancellationToken.None); + Assert.Equal(manifestContent, Encoding.UTF8.GetString(manifestBytes)); + Assert.Equal(bundleContent, Encoding.UTF8.GetString(bundleBytes)); + + var stateRepository = provider.GetRequiredService(); + var state = await stateRepository.TryGetAsync(StellaOpsMirrorConnector.Source, CancellationToken.None); + Assert.NotNull(state); + + var cursorDocument = state!.Cursor ?? new BsonDocument(); + var digestValue = cursorDocument.TryGetValue("bundleDigest", out var digestBson) ? digestBson.AsString : string.Empty; + Assert.Equal(NormalizeDigest(bundleDigest), NormalizeDigest(digestValue)); + + var pendingDocumentsArray = cursorDocument.TryGetValue("pendingDocuments", out var pendingDocsBson) && pendingDocsBson is BsonArray pendingArray + ? 
pendingArray + : new BsonArray(); + Assert.Single(pendingDocumentsArray); + var pendingDocumentId = Guid.Parse(pendingDocumentsArray[0].AsString); + Assert.Equal(bundleDocument.Id, pendingDocumentId); + + var pendingMappingsArray = cursorDocument.TryGetValue("pendingMappings", out var pendingMappingsBson) && pendingMappingsBson is BsonArray mappingsArray + ? mappingsArray + : new BsonArray(); + Assert.Empty(pendingMappingsArray); + } + + [Fact] + public async Task FetchAsync_TamperedSignatureThrows() + { + var manifestContent = "{\"domain\":\"primary\"}"; + var bundleContent = "{\"advisories\":[{\"id\":\"CVE-2025-0002\"}]}"; + + var manifestDigest = ComputeDigest(manifestContent); + var bundleDigest = ComputeDigest(bundleContent); + var index = BuildIndex(manifestDigest, Encoding.UTF8.GetByteCount(manifestContent), bundleDigest, Encoding.UTF8.GetByteCount(bundleContent), includeSignature: true); + + await using var provider = await BuildServiceProviderAsync(options => + { + options.Signature.Enabled = true; + options.Signature.KeyId = "mirror-key"; + options.Signature.Provider = "default"; + }); + + var defaultProvider = provider.GetRequiredService(); + var signingKey = CreateSigningKey("mirror-key"); + defaultProvider.UpsertSigningKey(signingKey); + + var (signatureValue, _) = CreateDetachedJws(signingKey, bundleContent); + // Tamper with signature so verification fails. 
+ var tamperedSignature = signatureValue.Replace('a', 'b'); + + SeedResponses(index, manifestContent, bundleContent, tamperedSignature); + + var connector = provider.GetRequiredService(); + await Assert.ThrowsAsync(() => connector.FetchAsync(provider, CancellationToken.None)); + + var stateRepository = provider.GetRequiredService(); + var state = await stateRepository.TryGetAsync(StellaOpsMirrorConnector.Source, CancellationToken.None); + Assert.NotNull(state); + Assert.True(state!.FailCount >= 1); + Assert.False(state.Cursor.TryGetValue("bundleDigest", out _)); + } + + [Fact] + public async Task FetchAsync_SignatureKeyMismatchThrows() + { + var manifestContent = "{\"domain\":\"primary\"}"; + var bundleContent = "{\"advisories\":[{\"id\":\"CVE-2025-0003\"}]}"; + + var manifestDigest = ComputeDigest(manifestContent); + var bundleDigest = ComputeDigest(bundleContent); + var index = BuildIndex( + manifestDigest, + Encoding.UTF8.GetByteCount(manifestContent), + bundleDigest, + Encoding.UTF8.GetByteCount(bundleContent), + includeSignature: true, + signatureKeyId: "unexpected-key", + signatureProvider: "default"); + + var signingKey = CreateSigningKey("unexpected-key"); + var (signatureValue, _) = CreateDetachedJws(signingKey, bundleContent); + + await using var provider = await BuildServiceProviderAsync(options => + { + options.Signature.Enabled = true; + options.Signature.KeyId = "mirror-key"; + options.Signature.Provider = "default"; + }); + + SeedResponses(index, manifestContent, bundleContent, signatureValue); + + var connector = provider.GetRequiredService(); + await Assert.ThrowsAsync(() => connector.FetchAsync(provider, CancellationToken.None)); + } + + [Fact] + public async Task FetchAsync_VerifiesSignatureUsingFallbackPublicKey() + { + var manifestContent = "{\"domain\":\"primary\"}"; + var bundleContent = "{\"advisories\":[{\"id\":\"CVE-2025-0004\"}]}"; + + var manifestDigest = ComputeDigest(manifestContent); + var bundleDigest = ComputeDigest(bundleContent); 
+ var index = BuildIndex(manifestDigest, Encoding.UTF8.GetByteCount(manifestContent), bundleDigest, Encoding.UTF8.GetByteCount(bundleContent), includeSignature: true); + + var signingKey = CreateSigningKey("mirror-key"); + var (signatureValue, _) = CreateDetachedJws(signingKey, bundleContent); + var publicKeyPath = WritePublicKeyPem(signingKey); + + await using var provider = await BuildServiceProviderAsync(options => + { + options.Signature.Enabled = true; + options.Signature.KeyId = "mirror-key"; + options.Signature.Provider = "default"; + options.Signature.PublicKeyPath = publicKeyPath; + }); + + try + { + SeedResponses(index, manifestContent, bundleContent, signatureValue); + + var connector = provider.GetRequiredService(); + await connector.FetchAsync(provider, CancellationToken.None); + + var stateRepository = provider.GetRequiredService(); + var state = await stateRepository.TryGetAsync(StellaOpsMirrorConnector.Source, CancellationToken.None); + Assert.NotNull(state); + Assert.Equal(0, state!.FailCount); + } + finally + { + if (File.Exists(publicKeyPath)) + { + File.Delete(publicKeyPath); + } + } + } + + [Fact] + public async Task FetchAsync_DigestMismatchMarksFailure() + { + var manifestExpected = "{\"domain\":\"primary\"}"; + var manifestTampered = "{\"domain\":\"tampered\"}"; + var bundleContent = "{\"advisories\":[{\"id\":\"CVE-2025-0005\"}]}"; + + var manifestDigest = ComputeDigest(manifestExpected); + var bundleDigest = ComputeDigest(bundleContent); + var index = BuildIndex(manifestDigest, Encoding.UTF8.GetByteCount(manifestExpected), bundleDigest, Encoding.UTF8.GetByteCount(bundleContent), includeSignature: false); + + await using var provider = await BuildServiceProviderAsync(); + + SeedResponses(index, manifestTampered, bundleContent, signature: null); + + var connector = provider.GetRequiredService(); + + await Assert.ThrowsAsync(() => connector.FetchAsync(provider, CancellationToken.None)); + + var stateRepository = provider.GetRequiredService(); 
+ var state = await stateRepository.TryGetAsync(StellaOpsMirrorConnector.Source, CancellationToken.None); + Assert.NotNull(state); + var cursor = state!.Cursor ?? new BsonDocument(); + Assert.True(state.FailCount >= 1); + Assert.False(cursor.Contains("bundleDigest")); + } + + [Fact] + public void ParseAndMap_PersistAdvisoriesFromBundle() + { + var bundleDocument = SampleData.CreateBundle(); + var bundleJson = CanonicalJsonSerializer.SerializeIndented(bundleDocument); + var normalizedFixture = FixtureLoader.Read(SampleData.BundleFixture).TrimEnd(); + Assert.Equal(normalizedFixture, FixtureLoader.Normalize(bundleJson).TrimEnd()); + + var advisories = MirrorAdvisoryMapper.Map(bundleDocument); + Assert.Single(advisories); + var advisory = advisories[0]; + + var expectedAdvisoryJson = FixtureLoader.Read(SampleData.AdvisoryFixture).TrimEnd(); + var mappedJson = CanonicalJsonSerializer.SerializeIndented(advisory); + Assert.Equal(expectedAdvisoryJson, FixtureLoader.Normalize(mappedJson).TrimEnd()); + + // AdvisoryStore integration validated elsewhere; ensure canonical serialization is stable. + } + + public Task InitializeAsync() => Task.CompletedTask; + + public Task DisposeAsync() + { + _handler.Clear(); + return Task.CompletedTask; + } + + private async Task BuildServiceProviderAsync(Action? 
configureOptions = null) + { + await _fixture.Client.DropDatabaseAsync(_fixture.Database.DatabaseNamespace.DatabaseName); + _handler.Clear(); + + var services = new ServiceCollection(); + services.AddLogging(builder => builder.AddProvider(NullLoggerProvider.Instance)); + services.AddSingleton(_handler); + services.AddSingleton(TimeProvider.System); + + services.AddMongoStorage(options => + { + options.ConnectionString = _fixture.Runner.ConnectionString; + options.DatabaseName = _fixture.Database.DatabaseNamespace.DatabaseName; + options.CommandTimeout = TimeSpan.FromSeconds(5); + }); + services.AddStellaOpsCrypto(); - - var configuration = new ConfigurationBuilder() - .AddInMemoryCollection(new Dictionary - { - ["concelier:sources:stellaopsMirror:baseAddress"] = "https://mirror.test/", - ["concelier:sources:stellaopsMirror:domainId"] = "primary", - ["concelier:sources:stellaopsMirror:indexPath"] = "/concelier/exports/index.json", - }) - .Build(); - - var routine = new StellaOpsMirrorDependencyInjectionRoutine(); - routine.Register(services, configuration); - - if (configureOptions is not null) - { - services.PostConfigure(configureOptions); - } - - services.Configure("stellaops-mirror", builder => - { - builder.HttpMessageHandlerBuilderActions.Add(options => - { - options.PrimaryHandler = _handler; - }); - }); - - var provider = services.BuildServiceProvider(); - var bootstrapper = provider.GetRequiredService(); - await bootstrapper.InitializeAsync(CancellationToken.None); - return provider; - } - - private void SeedResponses(string indexJson, string manifestContent, string bundleContent, string? 
signature) - { - var baseUri = new Uri("https://mirror.test"); - _handler.AddResponse(HttpMethod.Get, new Uri(baseUri, "/concelier/exports/index.json"), () => CreateJsonResponse(indexJson)); - _handler.AddResponse(HttpMethod.Get, new Uri(baseUri, "mirror/primary/manifest.json"), () => CreateJsonResponse(manifestContent)); - _handler.AddResponse(HttpMethod.Get, new Uri(baseUri, "mirror/primary/bundle.json"), () => CreateJsonResponse(bundleContent)); - - if (signature is not null) - { - _handler.AddResponse(HttpMethod.Get, new Uri(baseUri, "mirror/primary/bundle.json.jws"), () => new HttpResponseMessage(HttpStatusCode.OK) - { - Content = new StringContent(signature, Encoding.UTF8, "application/jose+json"), - }); - } - } - - private static HttpResponseMessage CreateJsonResponse(string content) - => new(HttpStatusCode.OK) - { - Content = new StringContent(content, Encoding.UTF8, "application/json"), - }; - - private static string BuildIndex( - string manifestDigest, - int manifestBytes, - string bundleDigest, - int bundleBytes, - bool includeSignature, - string signatureKeyId = "mirror-key", - string signatureProvider = "default") - { - var index = new - { - schemaVersion = 1, - generatedAt = new DateTimeOffset(2025, 10, 19, 12, 0, 0, TimeSpan.Zero), - targetRepository = "repo", - domains = new[] - { - new - { - domainId = "primary", - displayName = "Primary", - advisoryCount = 1, - manifest = new - { - path = "mirror/primary/manifest.json", - sizeBytes = manifestBytes, - digest = manifestDigest, - signature = (object?)null, - }, - bundle = new - { - path = "mirror/primary/bundle.json", - sizeBytes = bundleBytes, - digest = bundleDigest, - signature = includeSignature - ? 
new - { - path = "mirror/primary/bundle.json.jws", - algorithm = "ES256", - keyId = signatureKeyId, - provider = signatureProvider, - signedAt = new DateTimeOffset(2025, 10, 19, 12, 0, 0, TimeSpan.Zero), - } - : null, - }, - sources = Array.Empty(), - } - } - }; - - return JsonSerializer.Serialize(index, new JsonSerializerOptions - { - PropertyNamingPolicy = JsonNamingPolicy.CamelCase, - WriteIndented = false, - }); - } - - private static string ComputeDigest(string content) - { - var bytes = Encoding.UTF8.GetBytes(content); - var hash = SHA256.HashData(bytes); - return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant(); - } - - private static string NormalizeDigest(string digest) - => digest.StartsWith("sha256:", StringComparison.OrdinalIgnoreCase) ? digest[7..] : digest; - - private static CryptoSigningKey CreateSigningKey(string keyId) - { - using var ecdsa = ECDsa.Create(ECCurve.NamedCurves.nistP256); - var parameters = ecdsa.ExportParameters(includePrivateParameters: true); - return new CryptoSigningKey(new CryptoKeyReference(keyId), SignatureAlgorithms.Es256, in parameters, DateTimeOffset.UtcNow); - } - - private static string WritePublicKeyPem(CryptoSigningKey signingKey) - { - ArgumentNullException.ThrowIfNull(signingKey); - var path = Path.Combine(Path.GetTempPath(), $"stellaops-mirror-{Guid.NewGuid():N}.pem"); - using var ecdsa = ECDsa.Create(signingKey.PublicParameters); - var publicKeyInfo = ecdsa.ExportSubjectPublicKeyInfo(); - var pem = PemEncoding.Write("PUBLIC KEY", publicKeyInfo); - File.WriteAllText(path, pem); - return path; - } - - private static (string Signature, DateTimeOffset SignedAt) CreateDetachedJws(CryptoSigningKey signingKey, string payload) - { - var provider = new DefaultCryptoProvider(); - provider.UpsertSigningKey(signingKey); - var signer = provider.GetSigner(SignatureAlgorithms.Es256, signingKey.Reference); - var header = new Dictionary - { - ["alg"] = SignatureAlgorithms.Es256, - ["kid"] = signingKey.Reference.KeyId, - 
["provider"] = provider.Name, - ["typ"] = "application/vnd.stellaops.concelier.mirror-bundle+jws", - ["b64"] = false, - ["crit"] = new[] { "b64" } - }; - - var headerJson = JsonSerializer.Serialize(header); - var encodedHeader = Microsoft.IdentityModel.Tokens.Base64UrlEncoder.Encode(headerJson); - var payloadBytes = Encoding.UTF8.GetBytes(payload); - var signingInput = BuildSigningInput(encodedHeader, payloadBytes); - var signatureBytes = signer.SignAsync(signingInput, CancellationToken.None).GetAwaiter().GetResult(); - var encodedSignature = Microsoft.IdentityModel.Tokens.Base64UrlEncoder.Encode(signatureBytes); - return (string.Concat(encodedHeader, "..", encodedSignature), DateTimeOffset.UtcNow); - } - - private static ReadOnlyMemory BuildSigningInput(string encodedHeader, ReadOnlySpan payload) - { - var headerBytes = Encoding.ASCII.GetBytes(encodedHeader); - var buffer = new byte[headerBytes.Length + 1 + payload.Length]; - headerBytes.CopyTo(buffer, 0); - buffer[headerBytes.Length] = (byte)'.'; - payload.CopyTo(buffer.AsSpan(headerBytes.Length + 1)); - return buffer; - } -} + + var configuration = new ConfigurationBuilder() + .AddInMemoryCollection(new Dictionary + { + ["concelier:sources:stellaopsMirror:baseAddress"] = "https://mirror.test/", + ["concelier:sources:stellaopsMirror:domainId"] = "primary", + ["concelier:sources:stellaopsMirror:indexPath"] = "/concelier/exports/index.json", + }) + .Build(); + + var routine = new StellaOpsMirrorDependencyInjectionRoutine(); + routine.Register(services, configuration); + + if (configureOptions is not null) + { + services.PostConfigure(configureOptions); + } + + services.Configure("stellaops-mirror", builder => + { + builder.HttpMessageHandlerBuilderActions.Add(options => + { + options.PrimaryHandler = _handler; + }); + }); + + var provider = services.BuildServiceProvider(); + var bootstrapper = provider.GetRequiredService(); + await bootstrapper.InitializeAsync(CancellationToken.None); + return provider; + } + + 
private void SeedResponses(string indexJson, string manifestContent, string bundleContent, string? signature) + { + var baseUri = new Uri("https://mirror.test"); + _handler.AddResponse(HttpMethod.Get, new Uri(baseUri, "/concelier/exports/index.json"), () => CreateJsonResponse(indexJson)); + _handler.AddResponse(HttpMethod.Get, new Uri(baseUri, "mirror/primary/manifest.json"), () => CreateJsonResponse(manifestContent)); + _handler.AddResponse(HttpMethod.Get, new Uri(baseUri, "mirror/primary/bundle.json"), () => CreateJsonResponse(bundleContent)); + + if (signature is not null) + { + _handler.AddResponse(HttpMethod.Get, new Uri(baseUri, "mirror/primary/bundle.json.jws"), () => new HttpResponseMessage(HttpStatusCode.OK) + { + Content = new StringContent(signature, Encoding.UTF8, "application/jose+json"), + }); + } + } + + private static HttpResponseMessage CreateJsonResponse(string content) + => new(HttpStatusCode.OK) + { + Content = new StringContent(content, Encoding.UTF8, "application/json"), + }; + + private static string BuildIndex( + string manifestDigest, + int manifestBytes, + string bundleDigest, + int bundleBytes, + bool includeSignature, + string signatureKeyId = "mirror-key", + string signatureProvider = "default") + { + var index = new + { + schemaVersion = 1, + generatedAt = new DateTimeOffset(2025, 10, 19, 12, 0, 0, TimeSpan.Zero), + targetRepository = "repo", + domains = new[] + { + new + { + domainId = "primary", + displayName = "Primary", + advisoryCount = 1, + manifest = new + { + path = "mirror/primary/manifest.json", + sizeBytes = manifestBytes, + digest = manifestDigest, + signature = (object?)null, + }, + bundle = new + { + path = "mirror/primary/bundle.json", + sizeBytes = bundleBytes, + digest = bundleDigest, + signature = includeSignature + ? 
new
+                            {
+                                path = "mirror/primary/bundle.json.jws",
+                                algorithm = "ES256",
+                                keyId = signatureKeyId,
+                                provider = signatureProvider,
+                                signedAt = new DateTimeOffset(2025, 10, 19, 12, 0, 0, TimeSpan.Zero),
+                            }
+                            : null,
+                    },
+                    sources = Array.Empty(),
+                }
+            }
+        };
+
+        return JsonSerializer.Serialize(index, new JsonSerializerOptions
+        {
+            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
+            WriteIndented = false,
+        });
+    }
+
+    private static string ComputeDigest(string content)
+    {
+        var bytes = Encoding.UTF8.GetBytes(content);
+        var hash = SHA256.HashData(bytes);
+        return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
+    }
+
+    private static string NormalizeDigest(string digest)
+        => digest.StartsWith("sha256:", StringComparison.OrdinalIgnoreCase) ? digest[7..] : digest;
+
+    private static CryptoSigningKey CreateSigningKey(string keyId)
+    {
+        using var ecdsa = ECDsa.Create(ECCurve.NamedCurves.nistP256);
+        var parameters = ecdsa.ExportParameters(includePrivateParameters: true);
+        return new CryptoSigningKey(new CryptoKeyReference(keyId), SignatureAlgorithms.Es256, in parameters, DateTimeOffset.UtcNow);
+    }
+
+    private static string WritePublicKeyPem(CryptoSigningKey signingKey)
+    {
+        ArgumentNullException.ThrowIfNull(signingKey);
+        var path = Path.Combine(Path.GetTempPath(), $"stellaops-mirror-{Guid.NewGuid():N}.pem");
+        using var ecdsa = ECDsa.Create(signingKey.PublicParameters);
+        var publicKeyInfo = ecdsa.ExportSubjectPublicKeyInfo();
+        var pem = PemEncoding.Write("PUBLIC KEY", publicKeyInfo);
+        File.WriteAllText(path, pem);
+        return path;
+    }
+
+    private static (string Signature, DateTimeOffset SignedAt) CreateDetachedJws(CryptoSigningKey signingKey, string payload)
+    {
+        var provider = new DefaultCryptoProvider();
+        provider.UpsertSigningKey(signingKey);
+        var signer = provider.GetSigner(SignatureAlgorithms.Es256, signingKey.Reference);
+        var header = new Dictionary
+        {
+            ["alg"] = SignatureAlgorithms.Es256,
+            ["kid"] = signingKey.Reference.KeyId,
+            ["provider"] = provider.Name,
+            ["typ"] = "application/vnd.stellaops.concelier.mirror-bundle+jws",
+            ["b64"] = false,
+            ["crit"] = new[] { "b64" }
+        };
+
+        var headerJson = JsonSerializer.Serialize(header);
+        var encodedHeader = Microsoft.IdentityModel.Tokens.Base64UrlEncoder.Encode(headerJson);
+        var payloadBytes = Encoding.UTF8.GetBytes(payload);
+        var signingInput = BuildSigningInput(encodedHeader, payloadBytes);
+        var signatureBytes = signer.SignAsync(signingInput, CancellationToken.None).GetAwaiter().GetResult();
+        var encodedSignature = Microsoft.IdentityModel.Tokens.Base64UrlEncoder.Encode(signatureBytes);
+        return (string.Concat(encodedHeader, "..", encodedSignature), DateTimeOffset.UtcNow);
+    }
+
+    private static ReadOnlyMemory<byte> BuildSigningInput(string encodedHeader, ReadOnlySpan<byte> payload)
+    {
+        var headerBytes = Encoding.ASCII.GetBytes(encodedHeader);
+        var buffer = new byte[headerBytes.Length + 1 + payload.Length];
+        headerBytes.CopyTo(buffer, 0);
+        buffer[headerBytes.Length] = (byte)'.';
+        payload.CopyTo(buffer.AsSpan(headerBytes.Length + 1));
+        return buffer;
+    }
+}
diff --git a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Cisco.Tests/CiscoMapperTests.cs b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Cisco.Tests/CiscoMapperTests.cs
index d9d347f39..913d06e05 100644
--- a/src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Cisco.Tests/CiscoMapperTests.cs
+++ b/src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Cisco.Tests/CiscoMapperTests.cs
@@ -1,36 +1,36 @@
-using System;
+using System;
 using System.Collections.Generic;
 using System.Linq;
-using FluentAssertions;
-using MongoDB.Bson;
-using StellaOps.Concelier.Models;
-using StellaOps.Concelier.Connector.Common;
-using StellaOps.Concelier.Connector.Vndr.Cisco;
-using StellaOps.Concelier.Connector.Vndr.Cisco.Internal;
-using StellaOps.Concelier.Storage.Mongo.Documents;
-using StellaOps.Concelier.Storage.Mongo.Dtos;
-using Xunit;
-
-namespace
StellaOps.Concelier.Connector.Vndr.Cisco.Tests;
-
-public sealed class CiscoMapperTests
-{
-    [Fact]
-    public void Map_ProducesCanonicalAdvisory()
-    {
-        var published = new DateTimeOffset(2025, 10, 1, 0, 0, 0, TimeSpan.Zero);
-        var updated = published.AddDays(1);
-
+using FluentAssertions;
+using MongoDB.Bson;
+using StellaOps.Concelier.Models;
+using StellaOps.Concelier.Connector.Common;
+using StellaOps.Concelier.Connector.Vndr.Cisco;
+using StellaOps.Concelier.Connector.Vndr.Cisco.Internal;
+using StellaOps.Concelier.Storage.Mongo.Documents;
+using StellaOps.Concelier.Storage.Mongo.Dtos;
+using Xunit;
+
+namespace StellaOps.Concelier.Connector.Vndr.Cisco.Tests;
+
+public sealed class CiscoMapperTests
+{
+    [Fact]
+    public void Map_ProducesCanonicalAdvisory()
+    {
+        var published = new DateTimeOffset(2025, 10, 1, 0, 0, 0, TimeSpan.Zero);
+        var updated = published.AddDays(1);
+
         var dto = new CiscoAdvisoryDto(
             AdvisoryId: "CISCO-SA-TEST",
             Title: "Test Advisory",
             Summary: "Sample summary",
             Severity: "High",
-            Published: published,
-            Updated: updated,
-            PublicationUrl: "https://example.com/advisory",
-            CsafUrl: "https://sec.cloudapps.cisco.com/csaf/test.json",
-            CvrfUrl: "https://example.com/cvrf.xml",
+            Published: published,
+            Updated: updated,
+            PublicationUrl: "https://example.com/advisory",
+            CsafUrl: "https://sec.cloudapps.cisco.com/csaf/test.json",
+            CvrfUrl: "https://example.com/cvrf.xml",
             CvssBaseScore: 9.8,
             Cves: new List<string> { "CVE-2024-0001" },
             BugIds: new List<string> { "BUG123" },
@@ -39,31 +39,31 @@ public sealed class CiscoMapperTests
                 new("Cisco Widget", "PID-1", "1.2.3", new [] { AffectedPackageStatusCatalog.KnownAffected }),
                 new("Cisco Router", "PID-2", ">=1.0.0 <1.4.0", new [] { AffectedPackageStatusCatalog.KnownAffected })
             });
-
-        var document = new DocumentRecord(
-            Id: Guid.NewGuid(),
-            SourceName: VndrCiscoConnectorPlugin.SourceName,
-            Uri: "https://api.cisco.com/security/advisories/v2/advisories/CISCO-SA-TEST",
-            FetchedAt: published,
-            Sha256: "abc123",
-            Status: DocumentStatuses.PendingMap,
-            ContentType: "application/json",
-            Headers: null,
-            Metadata: null,
-            Etag: null,
-            LastModified: updated,
-            GridFsId: null);
-
-        var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, VndrCiscoConnectorPlugin.SourceName, "cisco.dto.test", new BsonDocument(), updated);
-
-        var advisory = CiscoMapper.Map(dto, document, dtoRecord);
-
-        advisory.AdvisoryKey.Should().Be("CISCO-SA-TEST");
-        advisory.Title.Should().Be("Test Advisory");
-        advisory.Severity.Should().Be("high");
-        advisory.Aliases.Should().Contain(new[] { "CISCO-SA-TEST", "CVE-2024-0001", "BUG123" });
-        advisory.References.Should().Contain(reference => reference.Url == "https://example.com/advisory");
-        advisory.References.Should().Contain(reference => reference.Url == "https://sec.cloudapps.cisco.com/csaf/test.json");
+
+        var document = new DocumentRecord(
+            Id: Guid.NewGuid(),
+            SourceName: VndrCiscoConnectorPlugin.SourceName,
+            Uri: "https://api.cisco.com/security/advisories/v2/advisories/CISCO-SA-TEST",
+            FetchedAt: published,
+            Sha256: "abc123",
+            Status: DocumentStatuses.PendingMap,
+            ContentType: "application/json",
+            Headers: null,
+            Metadata: null,
+            Etag: null,
+            LastModified: updated,
+            PayloadId: null);
+
+        var dtoRecord = new DtoRecord(Guid.NewGuid(), document.Id, VndrCiscoConnectorPlugin.SourceName, "cisco.dto.test", new BsonDocument(), updated);
+
+        var advisory = CiscoMapper.Map(dto, document, dtoRecord);
+
+        advisory.AdvisoryKey.Should().Be("CISCO-SA-TEST");
+        advisory.Title.Should().Be("Test Advisory");
+        advisory.Severity.Should().Be("high");
+        advisory.Aliases.Should().Contain(new[] { "CISCO-SA-TEST", "CVE-2024-0001", "BUG123" });
+        advisory.References.Should().Contain(reference => reference.Url == "https://example.com/advisory");
+        advisory.References.Should().Contain(reference => reference.Url == "https://sec.cloudapps.cisco.com/csaf/test.json");
         advisory.AffectedPackages.Should().HaveCount(2);

         var package = advisory.AffectedPackages.Single(p
=> p.Identifier == "Cisco Widget");
diff --git a/src/Policy/StellaOps.Policy.Engine/Endpoints/CvssReceiptEndpoints.cs b/src/Policy/StellaOps.Policy.Engine/Endpoints/CvssReceiptEndpoints.cs
new file mode 100644
index 000000000..1974f152d
--- /dev/null
+++ b/src/Policy/StellaOps.Policy.Engine/Endpoints/CvssReceiptEndpoints.cs
@@ -0,0 +1,327 @@
+using System.Collections.Generic;
+using System.Collections.Immutable;
+using Microsoft.AspNetCore.Http.HttpResults;
+using Microsoft.AspNetCore.Mvc;
+using StellaOps.Auth.Abstractions;
+using StellaOps.Attestor.Envelope;
+using StellaOps.Policy.Engine.Services;
+using StellaOps.Policy.Scoring;
+using StellaOps.Policy.Scoring.Engine;
+using StellaOps.Policy.Scoring.Receipts;
+
+namespace StellaOps.Policy.Engine.Endpoints;
+
+/// <summary>
+/// Minimal API surface for CVSS v4.0 score receipts (create, read, amend, history).
+/// </summary>
+internal static class CvssReceiptEndpoints
+{
+    public static IEndpointRouteBuilder MapCvssReceipts(this IEndpointRouteBuilder endpoints)
+    {
+        var group = endpoints.MapGroup("/api/cvss")
+            .RequireAuthorization()
+            .WithTags("CVSS Receipts");
+
+        group.MapPost("/receipts", CreateReceipt)
+            .WithName("CreateCvssReceipt")
+            .WithSummary("Create a CVSS v4.0 receipt with deterministic hashing and optional DSSE attestation.")
+            .Produces(StatusCodes.Status201Created)
+            .Produces(StatusCodes.Status400BadRequest)
+            .Produces(StatusCodes.Status401Unauthorized);
+
+        group.MapGet("/receipts/{receiptId}", GetReceipt)
+            .WithName("GetCvssReceipt")
+            .WithSummary("Retrieve a CVSS v4.0 receipt by ID.")
+            .Produces(StatusCodes.Status200OK)
+            .Produces(StatusCodes.Status404NotFound);
+
+        group.MapPut("/receipts/{receiptId}/amend", AmendReceipt)
+            .WithName("AmendCvssReceipt")
+            .WithSummary("Append an amendment entry to a CVSS receipt history and optionally re-sign.")
+            .Produces(StatusCodes.Status200OK)
+            .Produces(StatusCodes.Status400BadRequest)
+            .Produces(StatusCodes.Status404NotFound);
+
+        group.MapGet("/receipts/{receiptId}/history", GetReceiptHistory)
+            .WithName("GetCvssReceiptHistory")
+            .WithSummary("Return the ordered amendment history for a CVSS receipt.")
+            .Produces>(StatusCodes.Status200OK)
+            .Produces(StatusCodes.Status404NotFound);
+
+        group.MapGet("/policies", ListPolicies)
+            .WithName("ListCvssPolicies")
+            .WithSummary("List available CVSS policies configured on this host.")
+            .Produces>(StatusCodes.Status200OK);
+
+        return endpoints;
+    }
+
+    private static async Task<IResult> CreateReceipt(
+        HttpContext context,
+        [FromBody] CreateCvssReceiptRequest request,
+        IReceiptBuilder receiptBuilder,
+        CancellationToken cancellationToken)
+    {
+        var scopeResult = ScopeAuthorization.RequireScope(context, StellaOpsScopes.PolicyRun);
+        if (scopeResult is not null)
+        {
+            return scopeResult;
+        }
+
+        if (request is null)
+        {
+            return Results.BadRequest(new ProblemDetails
+            {
+                Title = "Request body required.",
+                Status = StatusCodes.Status400BadRequest
+            });
+        }
+
+        if (request.Policy is null || string.IsNullOrWhiteSpace(request.Policy.Hash))
+        {
+            return Results.BadRequest(new ProblemDetails
+            {
+                Title = "Policy hash required",
+                Detail = "CvssPolicy with a deterministic hash must be supplied.",
+                Status = StatusCodes.Status400BadRequest
+            });
+        }
+
+        var tenantId = ResolveTenantId(context);
+        if (string.IsNullOrWhiteSpace(tenantId))
+        {
+            return Results.BadRequest(new ProblemDetails
+            {
+                Title = "Tenant required",
+                Detail = "Specify tenant via X-Tenant-Id header or tenant_id claim.",
+                Status = StatusCodes.Status400BadRequest
+            });
+        }
+
+        var actor = ResolveActorId(context) ?? request.CreatedBy ?? "system";
+        var createdAt = request.CreatedAt ??
DateTimeOffset.UtcNow;
+
+        var createRequest = new CreateReceiptRequest
+        {
+            TenantId = tenantId,
+            VulnerabilityId = request.VulnerabilityId,
+            CreatedBy = actor,
+            CreatedAt = createdAt,
+            Policy = request.Policy,
+            BaseMetrics = request.BaseMetrics,
+            ThreatMetrics = request.ThreatMetrics,
+            EnvironmentalMetrics = request.EnvironmentalMetrics ?? request.Policy.DefaultEnvironmentalMetrics,
+            SupplementalMetrics = request.SupplementalMetrics,
+            Evidence = request.Evidence?.ToImmutableList() ?? ImmutableList.Empty,
+            SigningKey = request.SigningKey
+        };
+
+        try
+        {
+            var receipt = await receiptBuilder.CreateAsync(createRequest, cancellationToken).ConfigureAwait(false);
+            return Results.Created($"/api/cvss/receipts/{receipt.ReceiptId}", receipt);
+        }
+        catch (Exception ex) when (ex is InvalidOperationException or ArgumentException)
+        {
+            return Results.BadRequest(new ProblemDetails
+            {
+                Title = "Failed to create CVSS receipt",
+                Detail = ex.Message,
+                Status = StatusCodes.Status400BadRequest
+            });
+        }
+    }
+
+    private static async Task<IResult> GetReceipt(
+        HttpContext context,
+        [FromRoute] string receiptId,
+        IReceiptRepository repository,
+        CancellationToken cancellationToken)
+    {
+        var scopeResult = ScopeAuthorization.RequireScope(context, StellaOpsScopes.FindingsRead);
+        if (scopeResult is not null)
+        {
+            return scopeResult;
+        }
+
+        var tenantId = ResolveTenantId(context);
+        if (string.IsNullOrWhiteSpace(tenantId))
+        {
+            return Results.BadRequest(new ProblemDetails
+            {
+                Title = "Tenant required",
+                Detail = "Specify tenant via X-Tenant-Id header or tenant_id claim.",
+                Status = StatusCodes.Status400BadRequest
+            });
+        }
+
+        var receipt = await repository.GetAsync(tenantId, receiptId, cancellationToken).ConfigureAwait(false);
+        if (receipt is null)
+        {
+            return Results.NotFound(new ProblemDetails
+            {
+                Title = "Receipt not found",
+                Detail = $"CVSS receipt '{receiptId}' was not found.",
+                Status = StatusCodes.Status404NotFound
+            });
+        }
+
+        return Results.Ok(receipt);
+    }
+
+    private static async Task<IResult> AmendReceipt(
+        HttpContext context,
+        [FromRoute] string receiptId,
+        [FromBody] AmendCvssReceiptRequest request,
+        IReceiptHistoryService historyService,
+        CancellationToken cancellationToken)
+    {
+        var scopeResult = ScopeAuthorization.RequireScope(context, StellaOpsScopes.PolicyRun);
+        if (scopeResult is not null)
+        {
+            return scopeResult;
+        }
+
+        if (request is null)
+        {
+            return Results.BadRequest(new ProblemDetails
+            {
+                Title = "Request body required.",
+                Status = StatusCodes.Status400BadRequest
+            });
+        }
+
+        var tenantId = ResolveTenantId(context);
+        if (string.IsNullOrWhiteSpace(tenantId))
+        {
+            return Results.BadRequest(new ProblemDetails
+            {
+                Title = "Tenant required",
+                Detail = "Specify tenant via X-Tenant-Id header or tenant_id claim.",
+                Status = StatusCodes.Status400BadRequest
+            });
+        }
+
+        var actor = ResolveActorId(context) ?? request.Actor ?? "system";
+
+        var amend = new AmendReceiptRequest
+        {
+            ReceiptId = receiptId,
+            TenantId = tenantId,
+            Actor = actor,
+            Field = request.Field,
+            PreviousValue = request.PreviousValue,
+            NewValue = request.NewValue,
+            Reason = request.Reason,
+            ReferenceUri = request.ReferenceUri,
+            SigningKey = request.SigningKey
+        };
+
+        try
+        {
+            var amended = await historyService.AmendAsync(amend, cancellationToken).ConfigureAwait(false);
+            return Results.Ok(amended);
+        }
+        catch (InvalidOperationException ex)
+        {
+            return Results.NotFound(new ProblemDetails
+            {
+                Title = "Receipt not found",
+                Detail = ex.Message,
+                Status = StatusCodes.Status404NotFound
+            });
+        }
+        catch (Exception ex) when (ex is ArgumentException)
+        {
+            return Results.BadRequest(new ProblemDetails
+            {
+                Title = "Failed to amend receipt",
+                Detail = ex.Message,
+                Status = StatusCodes.Status400BadRequest
+            });
+        }
+    }
+
+    private static async Task<IResult> GetReceiptHistory(
+        HttpContext context,
+        [FromRoute] string receiptId,
+        IReceiptRepository repository,
+        CancellationToken cancellationToken)
+    {
+        var scopeResult =
ScopeAuthorization.RequireScope(context, StellaOpsScopes.FindingsRead);
+        if (scopeResult is not null)
+        {
+            return scopeResult;
+        }
+
+        var tenantId = ResolveTenantId(context);
+        if (string.IsNullOrWhiteSpace(tenantId))
+        {
+            return Results.BadRequest(new ProblemDetails
+            {
+                Title = "Tenant required",
+                Detail = "Specify tenant via X-Tenant-Id header or tenant_id claim.",
+                Status = StatusCodes.Status400BadRequest
+            });
+        }
+
+        var receipt = await repository.GetAsync(tenantId, receiptId, cancellationToken).ConfigureAwait(false);
+        if (receipt is null)
+        {
+            return Results.NotFound(new ProblemDetails
+            {
+                Title = "Receipt not found",
+                Detail = $"CVSS receipt '{receiptId}' was not found.",
+                Status = StatusCodes.Status404NotFound
+            });
+        }
+
+        var orderedHistory = receipt.History
+            .OrderBy(h => h.Timestamp)
+            .ToList();
+
+        return Results.Ok(orderedHistory);
+    }
+
+    private static IResult ListPolicies()
+        => Results.Ok(Array.Empty());
+
+    private static string? ResolveTenantId(HttpContext context)
+    {
+        if (context.Request.Headers.TryGetValue("X-Tenant-Id", out var tenantHeader) &&
+            !string.IsNullOrWhiteSpace(tenantHeader))
+        {
+            return tenantHeader.ToString();
+        }
+
+        return context.User?.FindFirst("tenant_id")?.Value;
+    }
+
+    private static string? ResolveActorId(HttpContext context)
+    {
+        var user = context.User;
+        return user?.FindFirst(System.Security.Claims.ClaimTypes.NameIdentifier)?.Value
+            ?? user?.FindFirst("sub")?.Value;
+    }
+}
+
+internal sealed record CreateCvssReceiptRequest(
+    string VulnerabilityId,
+    CvssPolicy Policy,
+    CvssBaseMetrics BaseMetrics,
+    CvssThreatMetrics? ThreatMetrics,
+    CvssEnvironmentalMetrics? EnvironmentalMetrics,
+    CvssSupplementalMetrics? SupplementalMetrics,
+    IReadOnlyList? Evidence,
+    EnvelopeKey? SigningKey,
+    string? CreatedBy,
+    DateTimeOffset? CreatedAt);
+
+internal sealed record AmendCvssReceiptRequest(
+    string Field,
+    string? PreviousValue,
+    string? NewValue,
+    string Reason,
+    string?
ReferenceUri,
+    EnvelopeKey? SigningKey,
+    string? Actor);
diff --git a/src/Policy/StellaOps.Policy.Gateway/Clients/IPolicyEngineClient.cs b/src/Policy/StellaOps.Policy.Gateway/Clients/IPolicyEngineClient.cs
index 40956254d..20f827071 100644
--- a/src/Policy/StellaOps.Policy.Gateway/Clients/IPolicyEngineClient.cs
+++ b/src/Policy/StellaOps.Policy.Gateway/Clients/IPolicyEngineClient.cs
@@ -1,15 +1,27 @@
-using StellaOps.Policy.Gateway.Contracts;
-using StellaOps.Policy.Gateway.Infrastructure;
-
-namespace StellaOps.Policy.Gateway.Clients;
-
-internal interface IPolicyEngineClient
-{
+using StellaOps.Policy.Gateway.Contracts;
+using StellaOps.Policy.Gateway.Infrastructure;
+using StellaOps.Policy.Scoring;
+using StellaOps.Policy.Scoring.Receipts;
+
+namespace StellaOps.Policy.Gateway.Clients;
+
+internal interface IPolicyEngineClient
+{
     Task>> ListPolicyPacksAsync(GatewayForwardingContext? forwardingContext, CancellationToken cancellationToken);

-    Task> CreatePolicyPackAsync(GatewayForwardingContext? forwardingContext, CreatePolicyPackRequest request, CancellationToken cancellationToken);
-
-    Task> CreatePolicyRevisionAsync(GatewayForwardingContext? forwardingContext, string packId, CreatePolicyRevisionRequest request, CancellationToken cancellationToken);
-
-    Task> ActivatePolicyRevisionAsync(GatewayForwardingContext? forwardingContext, string packId, int version, ActivatePolicyRevisionRequest request, CancellationToken cancellationToken);
-}
+    Task> CreatePolicyPackAsync(GatewayForwardingContext? forwardingContext, CreatePolicyPackRequest request, CancellationToken cancellationToken);
+
+    Task> CreatePolicyRevisionAsync(GatewayForwardingContext? forwardingContext, string packId, CreatePolicyRevisionRequest request, CancellationToken cancellationToken);
+
+    Task> ActivatePolicyRevisionAsync(GatewayForwardingContext?
forwardingContext, string packId, int version, ActivatePolicyRevisionRequest request, CancellationToken cancellationToken);
+
+    Task> CreateCvssReceiptAsync(GatewayForwardingContext? forwardingContext, CreateCvssReceiptRequest request, CancellationToken cancellationToken);
+
+    Task> GetCvssReceiptAsync(GatewayForwardingContext? forwardingContext, string receiptId, CancellationToken cancellationToken);
+
+    Task> AmendCvssReceiptAsync(GatewayForwardingContext? forwardingContext, string receiptId, AmendCvssReceiptRequest request, CancellationToken cancellationToken);
+
+    Task>> GetCvssReceiptHistoryAsync(GatewayForwardingContext? forwardingContext, string receiptId, CancellationToken cancellationToken);
+
+    Task>> ListCvssPoliciesAsync(GatewayForwardingContext? forwardingContext, CancellationToken cancellationToken);
+}
diff --git a/src/Policy/StellaOps.Policy.Gateway/Clients/PolicyEngineClient.cs b/src/Policy/StellaOps.Policy.Gateway/Clients/PolicyEngineClient.cs
index 649503c7b..b9f7def2b 100644
--- a/src/Policy/StellaOps.Policy.Gateway/Clients/PolicyEngineClient.cs
+++ b/src/Policy/StellaOps.Policy.Gateway/Clients/PolicyEngineClient.cs
@@ -5,13 +5,15 @@ using System.Net.Http;
 using System.Net.Http.Json;
 using System.Text.Json;
 using Microsoft.AspNetCore.Http;
-using Microsoft.AspNetCore.Mvc;
-using Microsoft.Extensions.Logging;
-using Microsoft.Extensions.Options;
-using StellaOps.Policy.Gateway.Contracts;
-using StellaOps.Policy.Gateway.Infrastructure;
-using StellaOps.Policy.Gateway.Options;
-using StellaOps.Policy.Gateway.Services;
+using Microsoft.AspNetCore.Mvc;
+using Microsoft.Extensions.Logging;
+using Microsoft.Extensions.Options;
+using StellaOps.Policy.Gateway.Contracts;
+using StellaOps.Policy.Gateway.Infrastructure;
+using StellaOps.Policy.Gateway.Options;
+using StellaOps.Policy.Gateway.Services;
+using StellaOps.Policy.Scoring;
+using StellaOps.Policy.Scoring.Receipts;

 namespace StellaOps.Policy.Gateway.Clients;

@@ -85,18 +87,73 @@ internal sealed
class PolicyEngineClient : IPolicyEngineClient
             request,
             cancellationToken);

-    public Task> ActivatePolicyRevisionAsync(
-        GatewayForwardingContext? forwardingContext,
-        string packId,
-        int version,
-        ActivatePolicyRevisionRequest request,
-        CancellationToken cancellationToken)
-        => SendAsync(
-            HttpMethod.Post,
-            $"api/policy/packs/{Uri.EscapeDataString(packId)}/revisions/{version}:activate",
-            forwardingContext,
-            request,
-            cancellationToken);
+    public Task> ActivatePolicyRevisionAsync(
+        GatewayForwardingContext? forwardingContext,
+        string packId,
+        int version,
+        ActivatePolicyRevisionRequest request,
+        CancellationToken cancellationToken)
+        => SendAsync(
+            HttpMethod.Post,
+            $"api/policy/packs/{Uri.EscapeDataString(packId)}/revisions/{version}:activate",
+            forwardingContext,
+            request,
+            cancellationToken);
+
+    public Task> CreateCvssReceiptAsync(
+        GatewayForwardingContext? forwardingContext,
+        CreateCvssReceiptRequest request,
+        CancellationToken cancellationToken)
+        => SendAsync(
+            HttpMethod.Post,
+            "api/cvss/receipts",
+            forwardingContext,
+            request,
+            cancellationToken);
+
+    public Task> GetCvssReceiptAsync(
+        GatewayForwardingContext? forwardingContext,
+        string receiptId,
+        CancellationToken cancellationToken)
+        => SendAsync(
+            HttpMethod.Get,
+            $"api/cvss/receipts/{Uri.EscapeDataString(receiptId)}",
+            forwardingContext,
+            content: null,
+            cancellationToken);
+
+    public Task> AmendCvssReceiptAsync(
+        GatewayForwardingContext? forwardingContext,
+        string receiptId,
+        AmendCvssReceiptRequest request,
+        CancellationToken cancellationToken)
+        => SendAsync(
+            HttpMethod.Put,
+            $"api/cvss/receipts/{Uri.EscapeDataString(receiptId)}/amend",
+            forwardingContext,
+            request,
+            cancellationToken);
+
+    public Task>> GetCvssReceiptHistoryAsync(
+        GatewayForwardingContext?
forwardingContext,
+        string receiptId,
+        CancellationToken cancellationToken)
+        => SendAsync>(
+            HttpMethod.Get,
+            $"api/cvss/receipts/{Uri.EscapeDataString(receiptId)}/history",
+            forwardingContext,
+            content: null,
+            cancellationToken);
+
+    public Task>> ListCvssPoliciesAsync(
+        GatewayForwardingContext? forwardingContext,
+        CancellationToken cancellationToken)
+        => SendAsync>(
+            HttpMethod.Get,
+            "api/cvss/policies",
+            forwardingContext,
+            content: null,
+            cancellationToken);

     private async Task> SendAsync(
         HttpMethod method,
diff --git a/src/Policy/StellaOps.Policy.Gateway/Contracts/CvssContracts.cs b/src/Policy/StellaOps.Policy.Gateway/Contracts/CvssContracts.cs
new file mode 100644
index 000000000..05a5fa8e8
--- /dev/null
+++ b/src/Policy/StellaOps.Policy.Gateway/Contracts/CvssContracts.cs
@@ -0,0 +1,33 @@
+using System;
+using System.Collections.Generic;
+using System.ComponentModel.DataAnnotations;
+using StellaOps.Attestor.Envelope;
+using StellaOps.Policy.Scoring;
+using StellaOps.Policy.Scoring.Receipts;
+
+namespace StellaOps.Policy.Gateway.Contracts;
+
+public sealed record CreateCvssReceiptRequest(
+    [Required] string VulnerabilityId,
+    [Required] CvssPolicy Policy,
+    [Required] CvssBaseMetrics BaseMetrics,
+    CvssThreatMetrics? ThreatMetrics,
+    CvssEnvironmentalMetrics? EnvironmentalMetrics,
+    CvssSupplementalMetrics? SupplementalMetrics,
+    IReadOnlyList? Evidence,
+    EnvelopeKey? SigningKey,
+    string? CreatedBy,
+    DateTimeOffset? CreatedAt);
+
+public sealed record AmendCvssReceiptRequest(
+    [Required] string Field,
+    string? PreviousValue,
+    string? NewValue,
+    [Required] string Reason,
+    string? ReferenceUri,
+    EnvelopeKey? SigningKey,
+    string?
Actor);
+
+public sealed record CvssReceiptHistoryResponse(
+    string ReceiptId,
+    IReadOnlyList History);
diff --git a/src/Policy/StellaOps.Policy.Gateway/Program.cs b/src/Policy/StellaOps.Policy.Gateway/Program.cs
index 1d7597403..4758cb824 100644
--- a/src/Policy/StellaOps.Policy.Gateway/Program.cs
+++ b/src/Policy/StellaOps.Policy.Gateway/Program.cs
@@ -279,11 +279,11 @@ policyPacks.MapPost("/{packId}/revisions", async Task<IResult> (
     })
     .RequireAuthorization(policy => policy.RequireStellaOpsScopes(StellaOpsScopes.PolicyAuthor));

-policyPacks.MapPost("/{packId}/revisions/{version:int}:activate", async Task<IResult> (
-    HttpContext context,
-    string packId,
-    int version,
-    ActivatePolicyRevisionRequest request,
+policyPacks.MapPost("/{packId}/revisions/{version:int}:activate", async Task<IResult> (
+    HttpContext context,
+    string packId,
+    int version,
+    ActivatePolicyRevisionRequest request,
     IPolicyEngineClient client,
     PolicyEngineTokenProvider tokenProvider,
     PolicyGatewayMetrics metrics,
@@ -330,13 +330,144 @@ policyPacks.MapPost("/{packId}/revisions/{version:int}:activate", async Task policy.RequireStellaOpsScopes(
-        StellaOpsScopes.PolicyOperate,
-        StellaOpsScopes.PolicyActivate));
-
-app.Run();
+        return response.ToMinimalResult();
+    })
+    .RequireAuthorization(policy => policy.RequireStellaOpsScopes(
+        StellaOpsScopes.PolicyOperate,
+        StellaOpsScopes.PolicyActivate));
+
+var cvss = app.MapGroup("/api/cvss")
+    .WithTags("CVSS Receipts");
+
+cvss.MapPost("/receipts", async Task<IResult>(
+    HttpContext context,
+    CreateCvssReceiptRequest request,
+    IPolicyEngineClient client,
+    PolicyEngineTokenProvider tokenProvider,
+    CancellationToken cancellationToken) =>
+    {
+        if (request is null)
+        {
+            return Results.BadRequest(new ProblemDetails
+            {
+                Title = "Request body required.",
+                Status = StatusCodes.Status400BadRequest
+            });
+        }
+
+        GatewayForwardingContext?
forwardingContext = null;
+        if (GatewayForwardingContext.TryCreate(context, out var callerContext))
+        {
+            forwardingContext = callerContext;
+        }
+        else if (!tokenProvider.IsEnabled)
+        {
+            return Results.Unauthorized();
+        }
+
+        var response = await client.CreateCvssReceiptAsync(forwardingContext, request, cancellationToken).ConfigureAwait(false);
+        return response.ToMinimalResult();
+    })
+    .RequireAuthorization(policy => policy.RequireStellaOpsScopes(StellaOpsScopes.PolicyRun));
+
+cvss.MapGet("/receipts/{receiptId}", async Task<IResult>(
+    HttpContext context,
+    string receiptId,
+    IPolicyEngineClient client,
+    PolicyEngineTokenProvider tokenProvider,
+    CancellationToken cancellationToken) =>
+    {
+        GatewayForwardingContext? forwardingContext = null;
+        if (GatewayForwardingContext.TryCreate(context, out var callerContext))
+        {
+            forwardingContext = callerContext;
+        }
+        else if (!tokenProvider.IsEnabled)
+        {
+            return Results.Unauthorized();
+        }
+
+        var response = await client.GetCvssReceiptAsync(forwardingContext, receiptId, cancellationToken).ConfigureAwait(false);
+        return response.ToMinimalResult();
+    })
+    .RequireAuthorization(policy => policy.RequireStellaOpsScopes(StellaOpsScopes.FindingsRead));
+
+cvss.MapPut("/receipts/{receiptId}/amend", async Task<IResult>(
+    HttpContext context,
+    string receiptId,
+    AmendCvssReceiptRequest request,
+    IPolicyEngineClient client,
+    PolicyEngineTokenProvider tokenProvider,
+    CancellationToken cancellationToken) =>
+    {
+        if (request is null)
+        {
+            return Results.BadRequest(new ProblemDetails
+            {
+                Title = "Request body required.",
+                Status = StatusCodes.Status400BadRequest
+            });
+        }
+
+        GatewayForwardingContext?
forwardingContext = null;
+        if (GatewayForwardingContext.TryCreate(context, out var callerContext))
+        {
+            forwardingContext = callerContext;
+        }
+        else if (!tokenProvider.IsEnabled)
+        {
+            return Results.Unauthorized();
+        }
+
+        var response = await client.AmendCvssReceiptAsync(forwardingContext, receiptId, request, cancellationToken).ConfigureAwait(false);
+        return response.ToMinimalResult();
+    })
+    .RequireAuthorization(policy => policy.RequireStellaOpsScopes(StellaOpsScopes.PolicyRun));
+
+cvss.MapGet("/receipts/{receiptId}/history", async Task<IResult>(
+    HttpContext context,
+    string receiptId,
+    IPolicyEngineClient client,
+    PolicyEngineTokenProvider tokenProvider,
+    CancellationToken cancellationToken) =>
+    {
+        GatewayForwardingContext? forwardingContext = null;
+        if (GatewayForwardingContext.TryCreate(context, out var callerContext))
+        {
+            forwardingContext = callerContext;
+        }
+        else if (!tokenProvider.IsEnabled)
+        {
+            return Results.Unauthorized();
+        }
+
+        var response = await client.GetCvssReceiptHistoryAsync(forwardingContext, receiptId, cancellationToken).ConfigureAwait(false);
+        return response.ToMinimalResult();
+    })
+    .RequireAuthorization(policy => policy.RequireStellaOpsScopes(StellaOpsScopes.FindingsRead));
+
+cvss.MapGet("/policies", async Task<IResult>(
+    HttpContext context,
+    IPolicyEngineClient client,
+    PolicyEngineTokenProvider tokenProvider,
+    CancellationToken cancellationToken) =>
+    {
+        GatewayForwardingContext?
forwardingContext = null;
+        if (GatewayForwardingContext.TryCreate(context, out var callerContext))
+        {
+            forwardingContext = callerContext;
+        }
+        else if (!tokenProvider.IsEnabled)
+        {
+            return Results.Unauthorized();
+        }
+
+        var response = await client.ListCvssPoliciesAsync(forwardingContext, cancellationToken).ConfigureAwait(false);
+        return response.ToMinimalResult();
+    })
+    .RequireAuthorization(policy => policy.RequireStellaOpsScopes(StellaOpsScopes.FindingsRead));
+
+app.Run();

 static IAsyncPolicy CreateAuthorityRetryPolicy(IServiceProvider provider)
 {
diff --git a/src/Policy/StellaOps.Policy.Gateway/StellaOps.Policy.Gateway.csproj b/src/Policy/StellaOps.Policy.Gateway/StellaOps.Policy.Gateway.csproj
index 305f25ee4..43deeeaf0 100644
--- a/src/Policy/StellaOps.Policy.Gateway/StellaOps.Policy.Gateway.csproj
+++ b/src/Policy/StellaOps.Policy.Gateway/StellaOps.Policy.Gateway.csproj
@@ -16,6 +16,7 @@
+
diff --git a/src/Web/StellaOps.Web/src/app/features/policy-studio/explain/policy-explain.component.ts b/src/Web/StellaOps.Web/src/app/features/policy-studio/explain/policy-explain.component.ts
index 76add34ad..0397ba325 100644
--- a/src/Web/StellaOps.Web/src/app/features/policy-studio/explain/policy-explain.component.ts
+++ b/src/Web/StellaOps.Web/src/app/features/policy-studio/explain/policy-explain.component.ts
@@ -12,7 +12,7 @@ import jsPDF from './jspdf.stub';
   imports: [CommonModule],
   changeDetection: ChangeDetectionStrategy.OnPush,
   template: `
-
+

Policy Studio · Explain

diff --git a/src/Web/StellaOps.Web/src/app/features/policy-studio/rule-builder/policy-rule-builder.component.ts b/src/Web/StellaOps.Web/src/app/features/policy-studio/rule-builder/policy-rule-builder.component.ts
index d4997c077..a64addd47 100644
--- a/src/Web/StellaOps.Web/src/app/features/policy-studio/rule-builder/policy-rule-builder.component.ts
+++ b/src/Web/StellaOps.Web/src/app/features/policy-studio/rule-builder/policy-rule-builder.component.ts
@@ -9,7 +9,7 @@ import { ActivatedRoute } from '@angular/router';
   imports: [CommonModule, ReactiveFormsModule],
   changeDetection: ChangeDetectionStrategy.OnPush,
   template: `
-
+

Policy Studio · Rule Builder

diff --git a/src/__Libraries/StellaOps.Provenance.Mongo/StellaOps.Provenance.Mongo.csproj b/src/__Libraries/StellaOps.Provenance.Mongo/StellaOps.Provenance.Mongo.csproj
index 8c1d14bda..66a9cec03 100644
--- a/src/__Libraries/StellaOps.Provenance.Mongo/StellaOps.Provenance.Mongo.csproj
+++ b/src/__Libraries/StellaOps.Provenance.Mongo/StellaOps.Provenance.Mongo.csproj
@@ -7,7 +7,7 @@
-
+