	ARCHITECTURE.md — StellaOps.Feedser
Goal: Build a sovereign-ready, self-hostable feed-merge service that ingests authoritative vulnerability sources, normalizes and de-duplicates them into MongoDB, and exports JSON and Trivy-compatible DB artifacts.
Form factor: Long-running Web Service with REST APIs (health, status, control) and an embedded internal cron scheduler.
Controllable by StellaOps.Cli (# stella db ...).
No signing inside Feedser (signing is a separate pipeline step).
Runtime SDK baseline: .NET 10 Preview 7 (SDK 10.0.100-preview.7.25380.108) targeting net10.0, aligned with the deployed api.stella-ops.org service.
Four explicit stages:
- Source Download → raw documents.
- Parse & Normalize → schema-validated DTOs enriched with canonical identifiers.
- Merge & Deduplicate → precedence-aware canonical records persisted to MongoDB.
- Export → JSON or TrivyDB (full or delta), then (externally) sign/publish.
1) Naming & Solution Layout
Source connectors namespace prefix: StellaOps.Feedser.Source.*
Exporters:
- StellaOps.Feedser.Exporter.Json
- StellaOps.Feedser.Exporter.TrivyDb
Projects (/src):
StellaOps.Feedser.WebService/        # ASP.NET Core (Minimal API, net10.0 preview) WebService + embedded scheduler
StellaOps.Feedser.Core/              # Domain models, pipelines, merge/dedupe engine, jobs orchestration
StellaOps.Feedser.Models/            # Canonical POCOs, JSON Schemas, enums
StellaOps.Feedser.Storage.Mongo/     # Mongo repositories, GridFS access, indexes, resume "flags"
StellaOps.Feedser.Source.Common/     # HTTP clients, rate-limiters, schema validators, parsers utils
StellaOps.Feedser.Source.Cve/
StellaOps.Feedser.Source.Nvd/
StellaOps.Feedser.Source.Ghsa/
StellaOps.Feedser.Source.Osv/
StellaOps.Feedser.Source.Jvn/
StellaOps.Feedser.Source.CertCc/
StellaOps.Feedser.Source.Kev/
StellaOps.Feedser.Source.Kisa/
StellaOps.Feedser.Source.CertIn/
StellaOps.Feedser.Source.CertFr/
StellaOps.Feedser.Source.CertBund/
StellaOps.Feedser.Source.Acsc/
StellaOps.Feedser.Source.Cccs/
StellaOps.Feedser.Source.Ru.Bdu/     # HTML→schema with LLM fallback (gated)
StellaOps.Feedser.Source.Ru.Nkcki/   # PDF/HTML bulletins → structured
StellaOps.Feedser.Source.Vndr.Msrc/
StellaOps.Feedser.Source.Vndr.Cisco/
StellaOps.Feedser.Source.Vndr.Oracle/
StellaOps.Feedser.Source.Vndr.Adobe/   # APSB ingest; emits vendor RangePrimitives with adobe.track/platform/priority telemetry + fixed-status provenance.
StellaOps.Feedser.Source.Vndr.Apple/
StellaOps.Feedser.Source.Vndr.Chromium/
StellaOps.Feedser.Source.Vndr.Vmware/
StellaOps.Feedser.Source.Distro.RedHat/
StellaOps.Feedser.Source.Distro.Debian/    # Fetches DSA list + detail HTML, emits EVR RangePrimitives with per-release provenance and telemetry.
StellaOps.Feedser.Source.Distro.Ubuntu/   # Ubuntu Security Notices connector (JSON index → EVR ranges with ubuntu.pocket telemetry).
StellaOps.Feedser.Source.Distro.Suse/     # CSAF fetch pipeline emitting NEVRA RangePrimitives with suse.status vendor telemetry.
StellaOps.Feedser.Source.Ics.Cisa/
StellaOps.Feedser.Source.Ics.Kaspersky/
StellaOps.Feedser.Normalization/     # Canonical mappers, validators, version-range normalization
StellaOps.Feedser.Merge/             # Identity graph, precedence, deterministic merge
StellaOps.Feedser.Exporter.Json/
StellaOps.Feedser.Exporter.TrivyDb/
StellaOps.Feedser.<Component>.Tests/  # Component-scoped unit/integration suites (Core, Storage.Mongo, Source.*, Exporter.*, WebService, etc.)
2) Runtime Shape
Process: single service (StellaOps.Feedser.WebService)
- Program.cs: top-level entry using Generic Host, DI, and Options binding from appsettings.json + environment + optional feedser.yaml.
- Built-in scheduler (cron-like) + job manager with distributed locks in Mongo to prevent overlaps, enforce timeouts, allow cancel/kill.
- REST APIs for health/readiness/progress/trigger/kill/status.
Key NuGet concepts (indicative): MongoDB.Driver, Polly (retry/backoff), System.Threading.Channels, Microsoft.Extensions.Http, Microsoft.Extensions.Hosting, Serilog, OpenTelemetry.
3) Data Storage — MongoDB (single source of truth)
Database: feedser
Write concern: majority for merge/export state, acknowledged for raw docs.
Collections (with “flags”/resume points):
- source: _id, name, type, baseUrl, auth, notes.
- source_state: keys sourceName (unique), enabled, cursor, lastSuccess, failCount, backoffUntil, paceOverrides, paused.
  - Drives incremental fetch/parse/map resume and operator pause/pace controls.
- document: _id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId, etag, lastModified.
  - Index {sourceName:1, uri:1} unique; optional TTL for superseded versions.
- dto: _id, sourceName, documentId, schemaVer, payload (BSON), validatedAt.
  - Index {sourceName:1, documentId:1}.
- advisory: _id, advisoryKey, title, summary, lang, published, modified, severity, exploitKnown.
  - Unique {advisoryKey:1} plus indexes on modified and published.
- alias: advisoryId, scheme, value, with index {scheme:1, value:1}.
- affected: advisoryId, platform, name, versionRange, cpe, purl, fixedBy, introducedVersion.
  - Indexes {platform:1, name:1}, {advisoryId:1}.
- reference: advisoryId, url, kind, sourceTag (e.g., advisory/patch/kb).
- Flags collections: kev_flag, ru_flags, jp_flags, psirt_flags, keyed by advisoryId.
- merge_event: _id, advisoryKey, beforeHash, afterHash, mergedAt, inputs (document ids).
- export_state: _id (json/trivydb), baseExportId, baseDigest, lastFullDigest, lastDeltaDigest, exportCursor, targetRepo, exporterVersion.
- locks: _id (jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt (TTL index cleans dead locks).
- jobs: _id, type, args, state, startedAt, endedAt, error, owner, heartbeatAt, timeoutMs.
GridFS buckets: fs.documents for raw large payloads; referenced by document.gridFsId.
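The source_state fields failCount and backoffUntil suggest a per-source backoff window, but this document does not state the formula. The sketch below assumes a capped exponential policy purely for illustration; SourceBackoff, ComputeBackoffUntil, IsDue, and the baseDelay/maxDelay parameters are hypothetical names, not part of Feedser.

```csharp
using System;

// Hypothetical helper: the doubling policy is an assumption, not the documented behavior.
public static class SourceBackoff
{
    // Delay doubles with each consecutive failure and is capped at maxDelay.
    public static DateTimeOffset ComputeBackoffUntil(
        DateTimeOffset now, int failCount, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (failCount <= 0) return now; // no failures: eligible immediately

        // Clamp the exponent so the shift cannot overflow.
        int exponent = Math.Min(failCount - 1, 20);
        double seconds = baseDelay.TotalSeconds * (1L << exponent);
        double capped = Math.Min(seconds, maxDelay.TotalSeconds);
        return now + TimeSpan.FromSeconds(capped);
    }

    // A source is fetchable when enabled, not paused, and past its backoff window.
    public static bool IsDue(bool enabled, bool paused, DateTimeOffset? backoffUntil, DateTimeOffset now)
        => enabled && !paused && (backoffUntil is null || backoffUntil <= now);
}
```

With a 30-second base delay, a third consecutive failure would push backoffUntil two minutes out under this assumed policy; an operator-set paused flag blocks the source regardless of the window.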
4) Job & Scheduler Model
- Scheduler stores cron expressions per source/exporter in config; persists next-run pointers in Mongo.
- Jobs acquire locks (locks collection) to ensure singleton execution per source/exporter.
- Supports manual triggers via API endpoints (POST /jobs/{type}) and pause/resume toggles per source.
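The lease semantics implied by the locks fields (holder, heartbeatAt, leaseMs) can be sketched without a database. The following is a minimal in-memory stand-in, assuming a lease is live while heartbeatAt + leaseMs lies in the future; InMemoryLockStore and LockEntry are hypothetical names, and the real service would need an atomic Mongo operation rather than a process-local dictionary.

```csharp
using System;
using System.Collections.Generic;

// Field names mirror the locks collection; the store itself is a hypothetical
// in-memory stand-in for what would be an atomic Mongo update in the service.
public sealed record LockEntry(string Holder, DateTimeOffset AcquiredAt, DateTimeOffset HeartbeatAt, int LeaseMs)
{
    // Assumption: a lease is live while the last heartbeat plus leaseMs is in the future.
    public bool IsLive(DateTimeOffset now) => HeartbeatAt.AddMilliseconds(LeaseMs) > now;
}

public sealed class InMemoryLockStore
{
    private readonly Dictionary<string, LockEntry> _locks = new();
    private readonly object _gate = new();

    // Succeeds when the key is free, already held by this holder, or the lease expired.
    public bool TryAcquire(string jobKey, string holder, int leaseMs, DateTimeOffset now)
    {
        lock (_gate)
        {
            if (_locks.TryGetValue(jobKey, out var current) && current.Holder != holder && current.IsLive(now))
                return false; // another holder still has a live lease
            _locks[jobKey] = new LockEntry(holder, now, now, leaseMs);
            return true;
        }
    }

    // Extends the lease; only the current holder may heartbeat.
    public bool Heartbeat(string jobKey, string holder, DateTimeOffset now)
    {
        lock (_gate)
        {
            if (!_locks.TryGetValue(jobKey, out var current) || current.Holder != holder)
                return false;
            _locks[jobKey] = current with { HeartbeatAt = now };
            return true;
        }
    }

    // Removes the lock if, and only if, the caller holds it.
    public bool Release(string jobKey, string holder)
    {
        lock (_gate)
        {
            return _locks.TryGetValue(jobKey, out var current)
                && current.Holder == holder
                && _locks.Remove(jobKey);
        }
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic under test; a crashed holder simply stops heartbeating and a competing instance takes over once the lease lapses.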
5) Connector Contracts
Connectors implement:
public interface IFeedConnector {
    string SourceName { get; }
    Task FetchAsync(IServiceProvider sp, CancellationToken ct);
    Task ParseAsync(IServiceProvider sp, CancellationToken ct);
    Task MapAsync(IServiceProvider sp, CancellationToken ct);
}
- Fetch populates document rows respecting rate limits, conditional GET, and source_state.cursor.
- Parse validates schema (JSON Schema, XSD) and writes sanitized DTO payloads.
- Map produces canonical advisory rows + provenance entries; must be idempotent.
- Base helpers in StellaOps.Feedser.Source.Common provide HTTP clients, retry policies, and watermark utilities.
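To illustrate how the three stages might be driven in order, here is a small sketch. The interface is repeated from above so the block compiles on its own; RecordingConnector, EmptyServices, and ConnectorPipeline are hypothetical names invented for the example, and the real job runner may sequence stages differently (for instance as separate scheduled jobs).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Repeated from the contract above so the sketch is self-contained.
public interface IFeedConnector
{
    string SourceName { get; }
    Task FetchAsync(IServiceProvider sp, CancellationToken ct);
    Task ParseAsync(IServiceProvider sp, CancellationToken ct);
    Task MapAsync(IServiceProvider sp, CancellationToken ct);
}

// Hypothetical connector that only records which stages ran.
public sealed class RecordingConnector : IFeedConnector
{
    public string SourceName => "example";
    public List<string> Stages { get; } = new();

    public Task FetchAsync(IServiceProvider sp, CancellationToken ct) => Record("fetch", ct);
    public Task ParseAsync(IServiceProvider sp, CancellationToken ct) => Record("parse", ct);
    public Task MapAsync(IServiceProvider sp, CancellationToken ct) => Record("map", ct);

    private Task Record(string stage, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested(); // honor cancel/kill requests between stages
        Stages.Add(stage);
        return Task.CompletedTask;
    }
}

// Hypothetical empty service provider; a real run would pass the host's container.
public sealed class EmptyServices : IServiceProvider
{
    public object? GetService(Type serviceType) => null;
}

public static class ConnectorPipeline
{
    // Runs Fetch, Parse, Map strictly in order; an exception in one stage stops the rest.
    public static async Task RunAsync(IFeedConnector connector, IServiceProvider sp, CancellationToken ct)
    {
        await connector.FetchAsync(sp, ct);
        await connector.ParseAsync(sp, ct);
        await connector.MapAsync(sp, ct);
    }
}
```

Because each stage reads its input from storage rather than from the previous call's return value, a failed run can resume at the stage that failed, which is consistent with the idempotency requirement on Map.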
6) Merge & Normalization
- Canonical model stored in StellaOps.Feedser.Models with serialization contracts used by storage/export layers.
- StellaOps.Feedser.Normalization handles NEVRA/EVR/PURL range parsing, CVSS normalization, localization.
- StellaOps.Feedser.Merge builds alias graphs keyed by CVE first, then falls back to vendor/regional IDs.
- Precedence rules: PSIRT/OVAL overrides generic ranges; KEV only toggles exploitation; regional feeds enrich severity but don’t override vendor truth.
- Determinism enforced via canonical JSON hashing logged in merge_event.
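One way to obtain stable beforeHash/afterHash values is to hash a canonical serialization. The sketch below assumes that "canonical" means object keys sorted ordinally, array order preserved, and compact UTF-8 output hashed with SHA-256; the document does not specify the exact canonicalization rules, and CanonicalJson is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

// Hypothetical helper; the canonicalization rules here are assumptions.
public static class CanonicalJson
{
    // Rebuilds the tree with object keys in ordinal order; arrays keep their order.
    public static JsonNode? Canonicalize(JsonNode? node)
    {
        switch (node)
        {
            case null:
                return null;
            case JsonObject obj:
                return new JsonObject(
                    obj.OrderBy(p => p.Key, StringComparer.Ordinal)
                       .Select(p => new KeyValuePair<string, JsonNode?>(p.Key, Canonicalize(p.Value))));
            case JsonArray arr:
                return new JsonArray(arr.Select(item => Canonicalize(item)).ToArray());
            default:
                return node.DeepClone(); // scalar value: copy so it can be re-parented
        }
    }

    // Compact canonical text, suitable for hashing or byte-wise comparison.
    public static string ToCanonicalString(JsonNode? node)
        => Canonicalize(node)?.ToJsonString() ?? "null";

    // Lowercase hex SHA-256 of the canonical UTF-8 bytes.
    public static string Sha256Hex(JsonNode? node)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(ToCanonicalString(node)));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Two payloads that differ only in property order then hash identically, so a merge that changes nothing yields beforeHash equal to afterHash. Number formatting and Unicode escaping would also need to be pinned down for cross-implementation determinism; the sketch relies on System.Text.Json defaults for both.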
7) Exporters
- JSON exporter mirrors aquasecurity/vuln-list layout with deterministic ordering and reproducible timestamps.
- Trivy DB exporter initially shells out to trivy-db builder; later will emit BoltDB directly.
- StellaOps.Feedser.Storage.Mongo provides cursors for delta exports based on export_state.exportCursor.
- Export jobs produce OCI tarballs (layer media type application/vnd.aquasec.trivy.db.layer.v1.tar+gzip) and optionally push via ORAS.
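A delta export needs to pick out what changed since the last run. The sketch below assumes exportCursor holds the highest advisory modified timestamp already exported; the document does not say what the cursor actually encodes, and AdvisoryRow and DeltaExport are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical projection of an advisory row; only the fields the sketch needs.
public sealed record AdvisoryRow(string AdvisoryKey, DateTimeOffset Modified);

public static class DeltaExport
{
    // Assumption: exportCursor is the highest `modified` value already exported.
    public static (IReadOnlyList<AdvisoryRow> Batch, DateTimeOffset NextCursor) SelectDelta(
        IEnumerable<AdvisoryRow> advisories, DateTimeOffset exportCursor)
    {
        // Strictly newer rows only, in a deterministic order (modified, then key).
        var batch = advisories
            .Where(a => a.Modified > exportCursor)
            .OrderBy(a => a.Modified)
            .ThenBy(a => a.AdvisoryKey, StringComparer.Ordinal)
            .ToList();

        // The cursor only advances when something was exported.
        var next = batch.Count == 0 ? exportCursor : batch[^1].Modified;
        return (batch, next);
    }
}
```

Sorting by modified and then by key keeps the batch order reproducible across runs. A timestamp-only cursor can miss rows that share the cursor's exact timestamp but were written later, so a production cursor would likely need a tiebreaker.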
8) Observability
- Serilog structured logging with enrichment fields (source, uri, stage, durationMs).
- OpenTelemetry traces around fetch/parse/map/export; metrics for rate limit hits, schema failures, dedupe ratios, package size. Connector HTTP metrics are emitted via the shared feedser.source.http.* instruments tagged with feedser.source=<connector> so per-source dashboards slice on that label instead of bespoke metric names.
- Prometheus scraping endpoint served by WebService.
9) Security Considerations
- Offline-first: connectors only reach allowlisted hosts.
- BDU LLM fallback gated by config flag; logs audit trail with confidence score.
- No secrets written to logs; secrets loaded via environment or mounted files.
- Signing handled outside Feedser pipeline.
10) Deployment Notes
- Default storage is MongoDB; for air-gapped deployments, bundle the Mongo image plus a seeded data backup.
- Horizontal scale achieved via multiple web service instances sharing Mongo locks.
- Provide feedser.yaml template describing sources, rate limits, and export settings.