git.stella-ops.org/docs/ARCHITECTURE_FEEDSER.md
master b97fc7685a
Initial commit (history squashed)
2025-10-11 23:28:35 +03:00

ARCHITECTURE.md — StellaOps.Feedser

Goal: Build a sovereign-ready, self-hostable feed-merge service that ingests authoritative vulnerability sources, normalizes and de-duplicates them into MongoDB, and exports JSON and Trivy-compatible DB artifacts.

Form factor: Long-running Web Service with REST APIs (health, status, control) and an embedded internal cron scheduler.

Controllable by StellaOps.Cli (# stella db ...).

No signing inside Feedser (signing is a separate pipeline step).

Runtime SDK baseline: .NET 10 Preview 7 (SDK 10.0.100-preview.7.25380.108) targeting net10.0, aligned with the deployed api.stella-ops.org service.

Four explicit stages:

  1. Source Download → raw documents.
  2. Parse & Normalize → schema-validated DTOs enriched with canonical identifiers.
  3. Merge & Deduplicate → precedence-aware canonical records persisted to MongoDB.
  4. Export → JSON or TrivyDB (full or delta), then (externally) sign/publish.

1) Naming & Solution Layout

Source connectors namespace prefix: StellaOps.Feedser.Source.*

Exporters:

  • StellaOps.Feedser.Exporter.Json
  • StellaOps.Feedser.Exporter.TrivyDb

Projects (/src):

StellaOps.Feedser.WebService/        # ASP.NET Core (Minimal API, net10.0 preview) WebService + embedded scheduler
StellaOps.Feedser.Core/              # Domain models, pipelines, merge/dedupe engine, jobs orchestration
StellaOps.Feedser.Models/            # Canonical POCOs, JSON Schemas, enums
StellaOps.Feedser.Storage.Mongo/     # Mongo repositories, GridFS access, indexes, resume "flags"
StellaOps.Feedser.Source.Common/     # HTTP clients, rate limiters, schema validators, parser utilities
StellaOps.Feedser.Source.Cve/
StellaOps.Feedser.Source.Nvd/
StellaOps.Feedser.Source.Ghsa/
StellaOps.Feedser.Source.Osv/
StellaOps.Feedser.Source.Jvn/
StellaOps.Feedser.Source.CertCc/
StellaOps.Feedser.Source.Kev/
StellaOps.Feedser.Source.Kisa/
StellaOps.Feedser.Source.CertIn/
StellaOps.Feedser.Source.CertFr/
StellaOps.Feedser.Source.CertBund/
StellaOps.Feedser.Source.Acsc/
StellaOps.Feedser.Source.Cccs/
StellaOps.Feedser.Source.Ru.Bdu/     # HTML→schema with LLM fallback (gated)
StellaOps.Feedser.Source.Ru.Nkcki/   # PDF/HTML bulletins → structured
StellaOps.Feedser.Source.Vndr.Msrc/
StellaOps.Feedser.Source.Vndr.Cisco/
StellaOps.Feedser.Source.Vndr.Oracle/
StellaOps.Feedser.Source.Vndr.Adobe/   # APSB ingest; emits vendor RangePrimitives with adobe.track/platform/priority telemetry + fixed-status provenance.
StellaOps.Feedser.Source.Vndr.Apple/
StellaOps.Feedser.Source.Vndr.Chromium/
StellaOps.Feedser.Source.Vndr.Vmware/
StellaOps.Feedser.Source.Distro.RedHat/
StellaOps.Feedser.Source.Distro.Debian/    # Fetches DSA list + detail HTML, emits EVR RangePrimitives with per-release provenance and telemetry.
StellaOps.Feedser.Source.Distro.Ubuntu/   # Ubuntu Security Notices connector (JSON index → EVR ranges with ubuntu.pocket telemetry).
StellaOps.Feedser.Source.Distro.Suse/     # CSAF fetch pipeline emitting NEVRA RangePrimitives with suse.status vendor telemetry.
StellaOps.Feedser.Source.Ics.Cisa/
StellaOps.Feedser.Source.Ics.Kaspersky/
StellaOps.Feedser.Normalization/     # Canonical mappers, validators, version-range normalization
StellaOps.Feedser.Merge/             # Identity graph, precedence, deterministic merge
StellaOps.Feedser.Exporter.Json/
StellaOps.Feedser.Exporter.TrivyDb/
StellaOps.Feedser.<Component>.Tests/  # Component-scoped unit/integration suites (Core, Storage.Mongo, Source.*, Exporter.*, WebService, etc.)

2) Runtime Shape

Process: single service (StellaOps.Feedser.WebService)

  • Program.cs: top-level entry using Generic Host, DI, Options binding from appsettings.json + environment + optional feedser.yaml.
  • Built-in scheduler (cron-like) + job manager with distributed locks in Mongo to prevent overlaps, enforce timeouts, allow cancel/kill.
  • REST APIs for health/readiness/progress/trigger/kill/status.

Key NuGet concepts (indicative): MongoDB.Driver, Polly (retry/backoff), System.Threading.Channels, Microsoft.Extensions.Http, Microsoft.Extensions.Hosting, Serilog, OpenTelemetry.


3) Data Storage — MongoDB (single source of truth)

Database: feedser

Write concern: majority for merge/export state, acknowledged for raw docs.

Collections (with “flags”/resume points):

  • source
    • _id, name, type, baseUrl, auth, notes.
  • source_state
    • Keys: sourceName (unique), enabled, cursor, lastSuccess, failCount, backoffUntil, paceOverrides, paused.
    • Drives incremental fetch/parse/map resume and operator pause/pace controls.
  • document
    • _id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId, etag, lastModified.
    • Index {sourceName:1, uri:1} unique; optional TTL for superseded versions.
  • dto
    • _id, sourceName, documentId, schemaVer, payload (BSON), validatedAt.
    • Index {sourceName:1, documentId:1}.
  • advisory
    • _id, advisoryKey, title, summary, lang, published, modified, severity, exploitKnown.
    • Unique {advisoryKey:1} plus indexes on modified and published.
  • alias
    • advisoryId, scheme, value with index {scheme:1, value:1}.
  • affected
    • advisoryId, platform, name, versionRange, cpe, purl, fixedBy, introducedVersion.
    • Index {platform:1, name:1}, {advisoryId:1}.
  • reference
    • advisoryId, url, kind (e.g., advisory/patch/kb), sourceTag.
  • Flags collections: kev_flag, ru_flags, jp_flags, psirt_flags keyed by advisoryId.
  • merge_event
    • _id, advisoryKey, beforeHash, afterHash, mergedAt, inputs (document ids).
  • export_state
    • _id (json/trivydb), baseExportId, baseDigest, lastFullDigest, lastDeltaDigest, exportCursor, targetRepo, exporterVersion.
  • locks
    • _id (jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt (TTL index cleans dead locks).
  • jobs
    • _id, type, args, state, startedAt, endedAt, error, owner, heartbeatAt, timeoutMs.

GridFS buckets: fs.documents for raw large payloads; referenced by document.gridFsId.
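
The etag and lastModified fields stored on each document row are what a conditional GET needs on the next fetch. The following is a minimal sketch of that idea, not the project's actual code: the StoredDocument record and the ConditionalGet helper are hypothetical names, and only the standard System.Net.Http types are used.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical usage: revalidate a document that was fetched earlier.
var previous = new StoredDocument(
    new Uri("https://example.test/feed.json"),
    "\"abc123\"",
    new DateTimeOffset(2025, 10, 1, 0, 0, 0, TimeSpan.Zero));

using var request = ConditionalGet.Build(previous);
Console.WriteLine(request.Headers);

// Assumed subset of the `document` fields listed above.
public sealed record StoredDocument(Uri Uri, string? Etag, DateTimeOffset? LastModified);

public static class ConditionalGet
{
    // Builds a GET that lets the server answer 304 Not Modified when nothing changed.
    public static HttpRequestMessage Build(StoredDocument doc)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, doc.Uri);

        // Only send If-None-Match when the stored value is a well-formed entity tag.
        if (!string.IsNullOrEmpty(doc.Etag) &&
            EntityTagHeaderValue.TryParse(doc.Etag, out var tag) && tag is not null)
        {
            request.Headers.IfNoneMatch.Add(tag);
        }

        if (doc.LastModified is { } lastModified)
        {
            request.Headers.IfModifiedSince = lastModified;
        }

        return request;
    }
}
```

On a 304 response a connector would presumably keep the existing row and skip re-parsing. That handling is left out here.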


4) Job & Scheduler Model

  • Scheduler stores cron expressions per source/exporter in config; persists next-run pointers in Mongo.
  • Jobs acquire locks (locks collection) to ensure singleton execution per source/exporter.
  • Supports manual triggers via API endpoints (POST /jobs/{type}) and pause/resume toggles per source.
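
The lease semantics implied by the locks fields (holder, heartbeatAt, leaseMs) can be sketched with an in-memory stand-in. This is an illustration under assumptions, not Feedser's implementation: InMemoryLockStore and LockRow are hypothetical names, and in MongoDB the same check would have to be a single atomic operation on the locks collection.

```csharp
using System;
using System.Collections.Generic;

var store = new InMemoryLockStore();
var t0 = DateTimeOffset.UnixEpoch;

Console.WriteLine(store.TryAcquire("source:nvd", "node-a", t0, 30_000));                // True: no holder yet
Console.WriteLine(store.TryAcquire("source:nvd", "node-b", t0.AddSeconds(10), 30_000)); // False: lease still live
Console.WriteLine(store.TryAcquire("source:nvd", "node-b", t0.AddSeconds(31), 30_000)); // True: lease expired

// Mirrors a subset of the `locks` fields; the record shape is an assumption.
public sealed record LockRow(string JobKey, string Holder, DateTimeOffset AcquiredAt,
                             DateTimeOffset HeartbeatAt, int LeaseMs)
{
    public bool IsExpired(DateTimeOffset now) =>
        now - HeartbeatAt > TimeSpan.FromMilliseconds(LeaseMs);
}

public sealed class InMemoryLockStore
{
    private readonly Dictionary<string, LockRow> _rows = new();
    private readonly object _gate = new();

    // Succeeds when the key is free, already ours, or held under an expired lease.
    public bool TryAcquire(string jobKey, string holder, DateTimeOffset now, int leaseMs)
    {
        lock (_gate)
        {
            if (_rows.TryGetValue(jobKey, out var row) && row.Holder != holder && !row.IsExpired(now))
            {
                return false;
            }

            _rows[jobKey] = new LockRow(jobKey, holder, now, now, leaseMs);
            return true;
        }
    }

    // Extends the lease; fails if another holder has taken over in the meantime.
    public bool Heartbeat(string jobKey, string holder, DateTimeOffset now)
    {
        lock (_gate)
        {
            if (!_rows.TryGetValue(jobKey, out var row) || row.Holder != holder)
            {
                return false;
            }

            _rows[jobKey] = row with { HeartbeatAt = now };
            return true;
        }
    }
}
```

A job that stops heartbeating loses its lease after leaseMs, which is what lets another instance take over a dead job's key.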

5) Connector Contracts

Connectors implement:

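using System;
using System.Threading;
using System.Threading.Tasks;

// Each stage receives the DI container and a cancellation token.
public interface IFeedConnector {
    string SourceName { get; }                                   // connector's source name
    Task FetchAsync(IServiceProvider sp, CancellationToken ct);  // download raw documents
    Task ParseAsync(IServiceProvider sp, CancellationToken ct);  // validate and write DTOs
    Task MapAsync(IServiceProvider sp, CancellationToken ct);    // emit canonical advisories
}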
  • Fetch populates document rows respecting rate limits, conditional GET, and source_state.cursor.
  • Parse validates schema (JSON Schema, XSD) and writes sanitized DTO payloads.
  • Map produces canonical advisory rows + provenance entries; must be idempotent.
  • Base helpers in StellaOps.Feedser.Source.Common provide HTTP clients, retry policies, and watermark utilities.
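
As an illustration of the contract, here is a toy connector and a runner that calls the three stages in order. It is a sketch under assumptions: DemoConnector, EmptyServices, and ConnectorRunner are hypothetical names, the interface is repeated only so the example compiles on its own, and the real job orchestration may well schedule the stages separately.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var connector = new DemoConnector();
await ConnectorRunner.RunAsync(connector, new EmptyServices(), CancellationToken.None);
Console.WriteLine(string.Join(" > ", connector.Calls)); // fetch > parse > map

// Repeated from the contract above so the sketch is self-contained.
public interface IFeedConnector
{
    string SourceName { get; }
    Task FetchAsync(IServiceProvider sp, CancellationToken ct);
    Task ParseAsync(IServiceProvider sp, CancellationToken ct);
    Task MapAsync(IServiceProvider sp, CancellationToken ct);
}

// Toy connector that only records which stage ran.
public sealed class DemoConnector : IFeedConnector
{
    public List<string> Calls { get; } = new();
    public string SourceName => "demo";

    public Task FetchAsync(IServiceProvider sp, CancellationToken ct) { Calls.Add("fetch"); return Task.CompletedTask; }
    public Task ParseAsync(IServiceProvider sp, CancellationToken ct) { Calls.Add("parse"); return Task.CompletedTask; }
    public Task MapAsync(IServiceProvider sp, CancellationToken ct) { Calls.Add("map"); return Task.CompletedTask; }
}

// Stand-in for the DI container; resolves nothing.
public sealed class EmptyServices : IServiceProvider
{
    public object? GetService(Type serviceType) => null;
}

public static class ConnectorRunner
{
    // Runs the stages sequentially and stops between stages once cancellation is requested.
    public static async Task RunAsync(IFeedConnector connector, IServiceProvider sp, CancellationToken ct)
    {
        await connector.FetchAsync(sp, ct);
        ct.ThrowIfCancellationRequested();
        await connector.ParseAsync(sp, ct);
        ct.ThrowIfCancellationRequested();
        await connector.MapAsync(sp, ct);
    }
}
```

Because Map must be idempotent, re-running the whole sequence after a crash should be safe in this model.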

6) Merge & Normalization

  • Canonical model stored in StellaOps.Feedser.Models with serialization contracts used by storage/export layers.
  • StellaOps.Feedser.Normalization handles NEVRA/EVR/PURL range parsing, CVSS normalization, localization.
  • StellaOps.Feedser.Merge builds alias graphs keyed by CVE first, then falls back to vendor/regional IDs.
  • Precedence rules: PSIRT/OVAL override generic ranges; KEV only toggles exploitation; regional feeds enrich severity but don't override vendor truth.
  • Determinism enforced via canonical JSON hashing logged in merge_event.
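
One way to obtain a canonical JSON hash is to sort object keys before hashing, so that property order does not affect the digest. The sketch below assumes SHA-256 and ordinal key ordering; the exact canonical form Feedser uses is not specified here, CanonicalHash is a hypothetical name, and number formatting is not normalized.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

var a = CanonicalHash.Compute(@"{""advisoryKey"":""CVE-2025-0001"",""aliases"":[""GHSA-x""],""severity"":""high""}");
var b = CanonicalHash.Compute(@"{""severity"":""high"",""aliases"":[""GHSA-x""],""advisoryKey"":""CVE-2025-0001""}");
Console.WriteLine(a == b); // True: only the key order differs

public static class CanonicalHash
{
    // Returns the lowercase hex SHA-256 of the key-sorted, whitespace-free JSON text.
    public static string Compute(string json)
    {
        JsonNode? canonical = Canonicalize(JsonNode.Parse(json));
        string text = canonical?.ToJsonString() ?? "null";
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(text));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }

    // Rebuilds the tree with object keys in ordinal order; array order is preserved.
    private static JsonNode? Canonicalize(JsonNode? node)
    {
        switch (node)
        {
            case null:
                return null;
            case JsonObject obj:
                var sorted = new JsonObject();
                foreach (var property in obj.OrderBy(kv => kv.Key, StringComparer.Ordinal))
                {
                    sorted[property.Key] = Canonicalize(property.Value);
                }
                return sorted;
            case JsonArray array:
                var copy = new JsonArray();
                foreach (var item in array)
                {
                    copy.Add(Canonicalize(item));
                }
                return copy;
            default:
                // Leaf values are cloned because a JsonNode can only have one parent.
                return node.DeepClone();
        }
    }
}
```

Comparing the hash before and after a merge is then enough to tell whether the merge changed the canonical record, which is presumably what beforeHash and afterHash in merge_event capture.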

7) Exporters

  • JSON exporter mirrors aquasecurity/vuln-list layout with deterministic ordering and reproducible timestamps.
  • Trivy DB exporter shells out to trivy-db build, produces Bolt archives, and reuses unchanged blobs from the last full baseline when running in delta mode. The exporter annotates metadata.json with mode, baseExportId, baseManifestDigest, resetBaseline, and delta.changedFiles[]/delta.removedPaths[], and honours publishFull / publishDelta (ORAS) plus includeFull / includeDelta (offline bundle) toggles.
  • StellaOps.Feedser.Storage.Mongo provides cursors for delta exports based on export_state.exportCursor and the persisted per-file manifest (export_state.files).
  • Export jobs produce OCI tarballs (layer media type application/vnd.aquasec.trivy.db.layer.v1.tar+gzip) and optionally push via ORAS; metadata.json accompanies each layout so mirrors can decide between full refreshes and deltas.
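
The delta.changedFiles[] and delta.removedPaths[] entries can be derived by comparing the persisted per-file manifest with the files of the current export. A minimal sketch, assuming the manifest is a path-to-digest map; ExportDelta is a hypothetical name and the paths and digests are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var baseline = new Dictionary<string, string>
{
    ["2023/CVE-2023-0009.json"] = "sha256:ccc",
    ["2024/CVE-2024-0001.json"] = "sha256:aaa",
    ["2024/CVE-2024-0002.json"] = "sha256:bbb",
};
var current = new Dictionary<string, string>
{
    ["2024/CVE-2024-0001.json"] = "sha256:aaa", // unchanged, reused from the baseline
    ["2024/CVE-2024-0002.json"] = "sha256:ddd", // content changed
    ["2025/CVE-2025-0003.json"] = "sha256:eee", // new file
};

var delta = ExportDelta.Compute(baseline, current);
Console.WriteLine(string.Join(", ", delta.ChangedFiles)); // 2024/CVE-2024-0002.json, 2025/CVE-2025-0003.json
Console.WriteLine(string.Join(", ", delta.RemovedPaths)); // 2023/CVE-2023-0009.json

public sealed record ExportDelta(IReadOnlyList<string> ChangedFiles, IReadOnlyList<string> RemovedPaths)
{
    // A file is "changed" when it is new or its digest differs from the baseline.
    public static ExportDelta Compute(
        IReadOnlyDictionary<string, string> baseline,
        IReadOnlyDictionary<string, string> current)
    {
        var changed = current
            .Where(file => !baseline.TryGetValue(file.Key, out var oldDigest) || oldDigest != file.Value)
            .Select(file => file.Key)
            .OrderBy(path => path, StringComparer.Ordinal) // stable order keeps output deterministic
            .ToList();

        var removed = baseline.Keys
            .Where(path => !current.ContainsKey(path))
            .OrderBy(path => path, StringComparer.Ordinal)
            .ToList();

        return new ExportDelta(changed, removed);
    }
}
```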

8) Observability

  • Serilog structured logging with enrichment fields (source, uri, stage, durationMs).
  • OpenTelemetry traces around fetch/parse/map/export; metrics for rate limit hits, schema failures, dedupe ratios, package size. Connector HTTP metrics are emitted via the shared feedser.source.http.* instruments tagged with feedser.source=<connector> so per-source dashboards slice on that label instead of bespoke metric names.
  • Prometheus scraping endpoint served by WebService.
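
The shared-instrument pattern can be sketched with System.Diagnostics.Metrics: one counter, tagged per connector, rather than one metric name per source. The meter name and the instrument name feedser.source.http.requests are assumptions chosen to fit the feedser.source.http.* prefix; the MeterListener only stands in for a real exporter so the example prints something.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

var seen = new List<string>();

// Listener used here instead of an OpenTelemetry exporter, purely for demonstration.
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == SourceHttpMetrics.MeterName)
    {
        l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    foreach (var tag in tags)
    {
        seen.Add($"{instrument.Name} {tag.Key}={tag.Value} +{value}");
    }
});
listener.Start();

SourceHttpMetrics.RecordRequest("nvd");
SourceHttpMetrics.RecordRequest("ghsa");
seen.ForEach(Console.WriteLine);

public static class SourceHttpMetrics
{
    public const string MeterName = "StellaOps.Feedser.Source.Http"; // assumed meter name

    private static readonly Meter SharedMeter = new(MeterName);

    // Hypothetical instrument under the shared feedser.source.http.* prefix.
    private static readonly Counter<long> Requests =
        SharedMeter.CreateCounter<long>("feedser.source.http.requests");

    // Every connector records into the same counter and differs only by tag value.
    public static void RecordRequest(string connector) =>
        Requests.Add(1, new KeyValuePair<string, object?>("feedser.source", connector));
}
```

Dashboards can then filter or group on the feedser.source label without needing to know about each connector in advance.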

9) Security Considerations

  • Offline-first: connectors only reach allowlisted hosts.
  • BDU LLM fallback gated by config flag; logs audit trail with confidence score.
  • No secrets written to logs; secrets loaded via environment or mounted files.
  • Signing handled outside Feedser pipeline.
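
Host allowlisting can be enforced at the HTTP handler level, so that no connector bypasses it by accident. The following is a sketch under assumptions: HostAllowlist and AllowlistHandler are hypothetical names, the host list is only an example, and the HTTPS-only requirement is an extra assumption rather than something stated above.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

var allowlist = new HostAllowlist(new[] { "services.nvd.nist.gov", "api.github.com" }); // example hosts
Console.WriteLine(allowlist.IsAllowed(new Uri("https://api.github.com/advisories"))); // True
Console.WriteLine(allowlist.IsAllowed(new Uri("https://evil.example/feed")));         // False

// A client built this way refuses any request to a host outside the list.
using var client = new HttpClient(new AllowlistHandler(allowlist, new HttpClientHandler()));

public sealed class HostAllowlist
{
    private readonly HashSet<string> _hosts;

    public HostAllowlist(IEnumerable<string> hosts) =>
        _hosts = new HashSet<string>(hosts, StringComparer.OrdinalIgnoreCase);

    // Exact host match only; requiring HTTPS is an assumption of this sketch.
    public bool IsAllowed(Uri uri) =>
        uri.IsAbsoluteUri && uri.Scheme == Uri.UriSchemeHttps && _hosts.Contains(uri.Host);
}

public sealed class AllowlistHandler : DelegatingHandler
{
    private readonly HostAllowlist _allowlist;

    public AllowlistHandler(HostAllowlist allowlist, HttpMessageHandler inner) : base(inner) =>
        _allowlist = allowlist;

    // Fails fast before any network I/O when the target is not allowlisted.
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        if (request.RequestUri is null || !_allowlist.IsAllowed(request.RequestUri))
        {
            throw new HttpRequestException($"Target not allowlisted: {request.RequestUri}");
        }

        return base.SendAsync(request, ct);
    }
}
```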

10) Deployment Notes

  • Default storage is MongoDB; for air-gapped deployments, bundle the Mongo image plus a seeded data backup.
  • Horizontal scale is achieved by running multiple web service instances that share the Mongo locks.
  • Provide feedser.yaml template describing sources, rate limits, and export settings.