docs/ARCHITECTURE_FEEDSER.md
# ARCHITECTURE.md — **StellaOps.Feedser**

> **Goal**: Build a sovereign-ready, self-hostable **feed-merge service** that ingests authoritative vulnerability sources, normalizes and de-duplicates them into **MongoDB**, and exports **JSON** and **Trivy-compatible DB** artifacts.
>
> **Form factor**: Long-running **Web Service** with **REST APIs** (health, status, control) and an embedded **internal cron scheduler**. Controllable via StellaOps.Cli (`stella db ...`).
>
> **No signing inside Feedser** (signing is a separate pipeline step).
>
> **Runtime SDK baseline**: .NET 10 Preview 7 (SDK 10.0.100-preview.7.25380.108) targeting `net10.0`, aligned with the deployed api.stella-ops.org service.
>
> **Four explicit stages**:
>
> 1. **Source Download** → raw documents.
> 2. **Parse & Normalize** → schema-validated DTOs enriched with canonical identifiers.
> 3. **Merge & Deduplicate** → precedence-aware canonical records persisted to MongoDB.
> 4. **Export** → JSON or TrivyDB (full or delta), then (externally) sign/publish.

---

## 1) Naming & Solution Layout

**Source connectors**: namespace prefix `StellaOps.Feedser.Source.*`.

**Exporters**:

* `StellaOps.Feedser.Exporter.Json`
* `StellaOps.Feedser.Exporter.TrivyDb`

**Projects** (`/src`):

```
StellaOps.Feedser.WebService/        # ASP.NET Core (Minimal API, net10.0 preview) WebService + embedded scheduler
StellaOps.Feedser.Core/              # Domain models, pipelines, merge/dedupe engine, jobs orchestration
StellaOps.Feedser.Models/            # Canonical POCOs, JSON Schemas, enums
StellaOps.Feedser.Storage.Mongo/     # Mongo repositories, GridFS access, indexes, resume "flags"
StellaOps.Feedser.Source.Common/     # HTTP clients, rate limiters, schema validators, parser utilities
StellaOps.Feedser.Source.Cve/
StellaOps.Feedser.Source.Nvd/
StellaOps.Feedser.Source.Ghsa/
StellaOps.Feedser.Source.Osv/
StellaOps.Feedser.Source.Jvn/
StellaOps.Feedser.Source.CertCc/
StellaOps.Feedser.Source.Kev/
StellaOps.Feedser.Source.Kisa/
StellaOps.Feedser.Source.CertIn/
StellaOps.Feedser.Source.CertFr/
StellaOps.Feedser.Source.CertBund/
StellaOps.Feedser.Source.Acsc/
StellaOps.Feedser.Source.Cccs/
StellaOps.Feedser.Source.Ru.Bdu/     # HTML→schema with LLM fallback (gated)
StellaOps.Feedser.Source.Ru.Nkcki/   # PDF/HTML bulletins → structured
StellaOps.Feedser.Source.Vndr.Msrc/
StellaOps.Feedser.Source.Vndr.Cisco/
StellaOps.Feedser.Source.Vndr.Oracle/
StellaOps.Feedser.Source.Vndr.Adobe/
StellaOps.Feedser.Source.Vndr.Apple/
StellaOps.Feedser.Source.Vndr.Chromium/
StellaOps.Feedser.Source.Vndr.Vmware/
StellaOps.Feedser.Source.Distro.RedHat/
StellaOps.Feedser.Source.Distro.Ubuntu/
StellaOps.Feedser.Source.Distro.Debian/
StellaOps.Feedser.Source.Distro.Suse/
StellaOps.Feedser.Source.Ics.Cisa/
StellaOps.Feedser.Source.Ics.Kaspersky/
StellaOps.Feedser.Normalization/     # Canonical mappers, validators, version-range normalization
StellaOps.Feedser.Merge/             # Identity graph, precedence, deterministic merge
StellaOps.Feedser.Exporter.Json/
StellaOps.Feedser.Exporter.TrivyDb/
StellaOps.Feedser.<Component>.Tests/  # Component-scoped unit/integration suites (Core, Storage.Mongo, Source.*, Exporter.*, WebService, etc.)
```

---

## 2) Runtime Shape

**Process**: a single service (`StellaOps.Feedser.WebService`).

* `Program.cs`: top-level entry point using the **Generic Host**, **DI**, and **Options** binding from `appsettings.json`, environment variables, and an optional `feedser.yaml`.
* Built-in **scheduler** (cron-like) plus a **job manager** that uses **distributed locks** in Mongo to prevent overlapping runs, enforce timeouts, and allow cancel/kill.
* **REST APIs** for health, readiness, progress, trigger, kill, and status.

**Key NuGet packages** (indicative): `MongoDB.Driver`, `Polly` (retry/backoff), `System.Threading.Channels`, `Microsoft.Extensions.Http`, `Microsoft.Extensions.Hosting`, `Serilog`, `OpenTelemetry`.
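The job manager's timeout and cancel/kill behavior can be modeled with linked cancellation tokens. The following is a minimal sketch that assumes a job body is a `Func<CancellationToken, Task>` and that the returned strings would map onto the `state` field of the `jobs` collection (section 3). `RunJobAsync` and the state names are hypothetical, and lock handling is omitted.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical runner: applies a timeout (cf. `jobs.timeoutMs`) and honors an external kill token.
async Task<string> RunJobAsync(Func<CancellationToken, Task> body, TimeSpan timeout, CancellationToken kill)
{
    // The linked source is cancelled by either the kill token or the timeout.
    using var cts = CancellationTokenSource.CreateLinkedTokenSource(kill);
    cts.CancelAfter(timeout);
    try
    {
        await body(cts.Token);
        return "Succeeded";
    }
    catch (OperationCanceledException) when (kill.IsCancellationRequested)
    {
        return "Killed";   // an operator cancelled the job
    }
    catch (OperationCanceledException)
    {
        return "TimedOut"; // only the timeout fired
    }
}

var ok = await RunJobAsync(ct => Task.Delay(10, ct), TimeSpan.FromSeconds(5), CancellationToken.None);
var slow = await RunJobAsync(ct => Task.Delay(Timeout.Infinite, ct), TimeSpan.FromMilliseconds(50), CancellationToken.None);
using var killSwitch = new CancellationTokenSource();
killSwitch.CancelAfter(50);
var killed = await RunJobAsync(ct => Task.Delay(Timeout.Infinite, ct), TimeSpan.FromSeconds(5), killSwitch.Token);
Console.WriteLine($"{ok} {slow} {killed}"); // Succeeded TimedOut Killed
```

Checking the kill token first lets the job distinguish an operator kill from a timeout, even though both surface as the same exception type.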

---

## 3) Data Storage — **MongoDB** (single source of truth)

**Database**: `feedser`

**Write concern**: `majority` for merge/export state, `acknowledged` for raw documents.

**Collections** (with “flags”/resume points):

* `source`
  * `_id`, `name`, `type`, `baseUrl`, `auth`, `notes`.
* `source_state`
  * Keys: `sourceName` (unique), `enabled`, `cursor`, `lastSuccess`, `failCount`, `backoffUntil`, `paceOverrides`, `paused`.
  * Drives incremental fetch/parse/map resume and operator pause/pace controls.
* `document`
  * `_id`, `sourceName`, `uri`, `fetchedAt`, `sha256`, `contentType`, `status`, `metadata`, `gridFsId`, `etag`, `lastModified`.
  * Unique index `{sourceName:1, uri:1}`; optional TTL for superseded versions.
* `dto`
  * `_id`, `sourceName`, `documentId`, `schemaVer`, `payload` (BSON), `validatedAt`.
  * Index `{sourceName:1, documentId:1}`.
* `advisory`
  * `_id`, `advisoryKey`, `title`, `summary`, `lang`, `published`, `modified`, `severity`, `exploitKnown`.
  * Unique index `{advisoryKey:1}`, plus indexes on `modified` and `published`.
* `alias`
  * `advisoryId`, `scheme`, `value`, with index `{scheme:1, value:1}`.
* `affected`
  * `advisoryId`, `platform`, `name`, `versionRange`, `cpe`, `purl`, `fixedBy`, `introducedVersion`.
  * Indexes `{platform:1, name:1}` and `{advisoryId:1}`.
* `reference`
  * `advisoryId`, `url`, `kind`, `sourceTag` (e.g., advisory/patch/kb).
* Flag collections: `kev_flag`, `ru_flags`, `jp_flags`, `psirt_flags`, keyed by `advisoryId`.
* `merge_event`
  * `_id`, `advisoryKey`, `beforeHash`, `afterHash`, `mergedAt`, `inputs` (document ids).
* `export_state`
  * `_id` (`json`/`trivydb`), `baseExportId`, `baseDigest`, `lastFullDigest`, `lastDeltaDigest`, `exportCursor`, `targetRepo`, `exporterVersion`.
* `locks`
  * `_id` (`jobKey`), `holder`, `acquiredAt`, `heartbeatAt`, `leaseMs`, `ttlAt` (a TTL index cleans up dead locks).
* `jobs`
  * `_id`, `type`, `args`, `state`, `startedAt`, `endedAt`, `error`, `owner`, `heartbeatAt`, `timeoutMs`.
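The `source_state` fields suggest a simple gate that decides whether a connector may run now and how long it should back off after failures. The following is a minimal sketch under stated assumptions. The 30-second base delay, the 6-hour cap, and the helper names `CanRun`/`NextBackoffUntil` are hypothetical, and `paceOverrides` is not modeled.

```csharp
using System;

// Hypothetical gate; parameter names mirror `source_state` fields.
bool CanRun(bool enabled, bool paused, DateTimeOffset? backoffUntil, DateTimeOffset now) =>
    enabled && !paused && (backoffUntil is null || now >= backoffUntil.Value);

// Assumed policy: exponential backoff of 30s, 60s, 120s, ... after `failCount` failures, capped at 6h.
DateTimeOffset NextBackoffUntil(int failCount, DateTimeOffset now)
{
    var baseDelay = TimeSpan.FromSeconds(30);
    var cap = TimeSpan.FromHours(6);
    // Clamp the exponent so the shift cannot overflow for large failure counts.
    int exponent = Math.Clamp(failCount - 1, 0, 20);
    var delay = TimeSpan.FromTicks(baseDelay.Ticks * (1L << exponent));
    return now + (delay < cap ? delay : cap);
}

var t0 = new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero);
var until = NextBackoffUntil(failCount: 3, t0);    // 30s * 2^2
Console.WriteLine(until - t0);                     // 00:02:00
Console.WriteLine(CanRun(true, false, until, t0)); // False
```

On success, a connector would reset `failCount` and clear `backoffUntil`. That step is also an assumption, not something this document specifies.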

**GridFS bucket**: `fs.documents` stores large raw payloads, referenced by `document.gridFsId`.
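Each `document` row stores a `sha256`, so a re-fetch can skip storing a new payload when the bytes are unchanged. The following is a minimal sketch. It assumes the digest covers the raw response bytes and is stored as lowercase hex (both assumptions). `Sha256Hex` and `ShouldStore` are hypothetical helpers.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Lowercase hex SHA-256 of the raw payload (hex casing is an assumption).
string Sha256Hex(byte[] payload) =>
    Convert.ToHexString(SHA256.HashData(payload)).ToLowerInvariant();

// Store a new payload only when there is no prior digest or the digest changed.
bool ShouldStore(string? storedSha256, byte[] fetched) =>
    storedSha256 is null || !string.Equals(storedSha256, Sha256Hex(fetched), StringComparison.Ordinal);

var body = Encoding.UTF8.GetBytes("{\"id\":\"CVE-2024-0001\"}");
var stored = Sha256Hex(body);
Console.WriteLine(ShouldStore(stored, body));                                 // False
Console.WriteLine(ShouldStore(stored, Encoding.UTF8.GetBytes("{\"id\":2}"))); // True
```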
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Job & Scheduler Model | ||||
|  | ||||
| * Scheduler stores cron expressions per source/exporter in config; persists next-run pointers in Mongo. | ||||
| * Jobs acquire locks (`locks` collection) to ensure singleton execution per source/exporter. | ||||
| * Supports manual triggers via API endpoints (`POST /jobs/{type}`) and pause/resume toggles per source. | ||||
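The `locks` fields (`holder`, `heartbeatAt`, `leaseMs`) describe a lease. A job may take a lock that is free or whose lease has expired, and it holds the lock only while it keeps heartbeating. The in-memory sketch below models only that decision. In Mongo, it would presumably be a single conditional update on `_id`, which is not shown. The dictionary store, `TryAcquire`, and the job key format are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the `locks` collection: jobKey -> (holder, heartbeatAt, leaseMs).
var locks = new Dictionary<string, (string Holder, DateTimeOffset HeartbeatAt, long LeaseMs)>();

// Acquire when the key is free, already held by this holder (acts as a heartbeat), or the lease expired.
bool TryAcquire(string jobKey, string holder, long leaseMs, DateTimeOffset now)
{
    if (locks.TryGetValue(jobKey, out var current)
        && current.Holder != holder
        && now < current.HeartbeatAt.AddMilliseconds(current.LeaseMs))
    {
        return false; // another instance holds a live lease
    }
    locks[jobKey] = (holder, now, leaseMs);
    return true;
}

var t0 = DateTimeOffset.UnixEpoch;
Console.WriteLine(TryAcquire("source:nvd:fetch", "node-a", 60_000, t0));                // True
Console.WriteLine(TryAcquire("source:nvd:fetch", "node-b", 60_000, t0.AddSeconds(30))); // False
Console.WriteLine(TryAcquire("source:nvd:fetch", "node-b", 60_000, t0.AddSeconds(61))); // True (lease expired)
```

The TTL index on `ttlAt` complements this check by eventually deleting rows left behind by crashed holders. The lease comparison is what prevents overlap in the meantime.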

---

## 5) Connector Contracts

Connectors implement:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface IFeedConnector {
    string SourceName { get; }                                   // matches source_state.sourceName
    Task FetchAsync(IServiceProvider sp, CancellationToken ct);  // raw documents -> `document`
    Task ParseAsync(IServiceProvider sp, CancellationToken ct);  // validated DTOs -> `dto`
    Task MapAsync(IServiceProvider sp, CancellationToken ct);    // DTOs -> canonical advisories (idempotent)
}
```

* Fetch populates `document` rows while respecting rate limits, conditional GET (`etag`/`lastModified`), and `source_state.cursor`.
* Parse validates the schema (JSON Schema, XSD) and writes sanitized DTO payloads.
* Map produces canonical advisory rows plus provenance entries and must be idempotent.
* Base helpers in `StellaOps.Feedser.Source.Common` provide HTTP clients, retry policies, and watermark utilities.
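Conditional GET lets Fetch reuse the `etag` and `lastModified` values saved on a `document` row, so the server can return `304 Not Modified` for unchanged upstream files. The following sketch uses the standard `System.Net.Http` request headers. `BuildConditionalGet` is a hypothetical helper and the URL is a placeholder. The sketch builds the request but does not send it.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Attach the validators saved from the previous fetch of the same URI.
HttpRequestMessage BuildConditionalGet(Uri uri, string? etag, DateTimeOffset? lastModified)
{
    var request = new HttpRequestMessage(HttpMethod.Get, uri);
    if (!string.IsNullOrEmpty(etag))
    {
        // EntityTagHeaderValue.Parse expects a quoted tag, e.g. "\"abc\"" or W/"abc".
        request.Headers.IfNoneMatch.Add(EntityTagHeaderValue.Parse(etag));
    }
    if (lastModified is not null)
    {
        request.Headers.IfModifiedSince = lastModified;
    }
    return request;
}

var lastSeen = new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero);
using var req = BuildConditionalGet(new Uri("https://feeds.example.org/advisories.json"), "\"v42\"", lastSeen);
Console.WriteLine(req.Headers.IfNoneMatch);
Console.WriteLine(req.Headers.IfModifiedSince);
```

If the server responds with `304`, the stored `document` row is still current, so Fetch can move on without writing a new payload. That response handling is not shown.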

---

## 6) Merge & Normalization

* The canonical model lives in `StellaOps.Feedser.Models`, with serialization contracts used by the storage and export layers.
* `StellaOps.Feedser.Normalization` handles NEVRA/EVR/PURL range parsing, CVSS normalization, and localization.
* `StellaOps.Feedser.Merge` builds alias graphs keyed by CVE first, falling back to vendor/regional IDs.
* Precedence rules: PSIRT/OVAL overrides generic ranges; KEV only toggles exploitation; regional feeds enrich severity but don’t override vendor truth.
* Determinism is enforced via canonical JSON hashing, logged in `merge_event`.
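Canonical JSON hashing makes `beforeHash`/`afterHash` comparable only if semantically equal records always serialize to the same bytes. The following is a minimal sketch of one canonical form: object keys are sorted ordinally, output is compact, and the result is hashed with SHA-256. It is built on `System.Text.Json.Nodes`. Feedser's actual canonical form is not specified here, so treat this as an assumption. For example, it does not normalize number formatting the way RFC 8785 does. `Canonicalize` and `CanonicalHash` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

// Rebuild the tree with object properties sorted by ordinal key; array order is preserved.
JsonNode? Canonicalize(JsonNode? node) => node switch
{
    JsonObject obj => new JsonObject(obj
        .OrderBy(p => p.Key, StringComparer.Ordinal)
        .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
    JsonArray arr => new JsonArray(arr.Select(Canonicalize).ToArray()),
    null => null,
    // Leaf values are cloned because a JsonNode can belong to only one parent.
    _ => node.DeepClone(),
};

// Compact serialization of the canonical tree, hashed as lowercase hex SHA-256.
string CanonicalHash(string json)
{
    var canonical = Canonicalize(JsonNode.Parse(json))?.ToJsonString() ?? "null";
    return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonical))).ToLowerInvariant();
}

var before = CanonicalHash("{\"advisoryKey\":\"CVE-2024-0001\",\"severity\":\"high\"}");
var after  = CanonicalHash("{\"severity\":\"high\",\"advisoryKey\":\"CVE-2024-0001\"}");
Console.WriteLine(before == after); // True: key order does not affect the hash
```

Array order is deliberately kept significant. Any list whose order carries no meaning (for example, aliases) would need to be sorted before hashing for the result to stay deterministic.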

---

## 7) Exporters

* The JSON exporter mirrors the `aquasecurity/vuln-list` layout with deterministic ordering and reproducible timestamps.
* The Trivy DB exporter initially shells out to the `trivy-db` builder; a later iteration will emit BoltDB directly.
* `StellaOps.Feedser.Storage.Mongo` provides cursors for delta exports based on `export_state.exportCursor`.
* Export jobs produce OCI tarballs (layer media type `application/vnd.aquasec.trivy.db.layer.v1.tar+gzip`) and can optionally push them via ORAS.
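A delta export needs a monotonic cursor so each run picks up only the records changed since the previous run. The following is a minimal sketch. It assumes `export_state.exportCursor` holds the last exported `modified` timestamp (the document does not fix its format) and that ties are broken by `advisoryKey` so output order is deterministic. `SelectDelta` and the in-memory list are hypothetical stand-ins for the Mongo cursor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

DateTimeOffset Day(int d) => new(2025, 1, d, 0, 0, 0, TimeSpan.Zero);

// Hypothetical advisories: (advisoryKey, modified).
var advisories = new List<(string Key, DateTimeOffset Modified)>
{
    ("CVE-2024-0003", Day(3)),
    ("CVE-2024-0001", Day(1)),
    ("CVE-2024-0002", Day(3)),
};

// Records modified after the cursor, in deterministic order, plus the advanced cursor.
(List<string> Keys, DateTimeOffset? NextCursor) SelectDelta(DateTimeOffset? cursor)
{
    var delta = advisories
        .Where(a => cursor is null || a.Modified > cursor.Value)
        .OrderBy(a => a.Modified)
        .ThenBy(a => a.Key, StringComparer.Ordinal)
        .ToList();
    // An empty delta leaves the cursor where it was.
    return (delta.Select(a => a.Key).ToList(), delta.Count > 0 ? delta[^1].Modified : cursor);
}

var full = SelectDelta(null);                   // no cursor yet: behaves like a full export
Console.WriteLine(string.Join(",", full.Keys)); // CVE-2024-0001,CVE-2024-0002,CVE-2024-0003
var next = SelectDelta(full.NextCursor);
Console.WriteLine(next.Keys.Count);             // 0
```

A strict `>` on a bare timestamp can miss a record written later with the same `modified` value. A production cursor would need a tiebreaker or a write-ordered field, which this sketch does not attempt.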

---

## 8) Observability

* Serilog structured logging with enrichment fields (`source`, `uri`, `stage`, `durationMs`).
* OpenTelemetry traces around fetch/parse/map/export; metrics for rate-limit hits, schema failures, dedupe ratios, and package size.
* A Prometheus scraping endpoint served by the WebService.
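On the metrics side, the OpenTelemetry .NET SDK collects instruments created with the built-in `System.Diagnostics.Metrics` API. The following is a minimal sketch of a rate-limit-hit counter tagged by source. The meter name `StellaOps.Feedser` and the instrument name `feedser.source.rate_limit_hits` are hypothetical. A local `MeterListener` stands in for the exporter so the sketch runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

using var meter = new Meter("StellaOps.Feedser");
var rateLimitHits = meter.CreateCounter<long>("feedser.source.rate_limit_hits");

// Local listener in place of an OpenTelemetry exporter: sums measurements per `source` tag.
var totals = new Dictionary<string, long>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "StellaOps.Feedser") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    foreach (var tag in tags)
    {
        if (tag.Key == "source" && tag.Value is string source)
        {
            totals[source] = totals.GetValueOrDefault(source) + value;
        }
    }
});
listener.Start();

// A connector might record one hit each time an upstream returns HTTP 429.
rateLimitHits.Add(1, new KeyValuePair<string, object?>("source", "nvd"));
rateLimitHits.Add(1, new KeyValuePair<string, object?>("source", "nvd"));
rateLimitHits.Add(1, new KeyValuePair<string, object?>("source", "ghsa"));

Console.WriteLine($"nvd={totals["nvd"]}, ghsa={totals["ghsa"]}"); // nvd=2, ghsa=1
```

In the service, the listener would be replaced by registering the meter name with the OpenTelemetry metrics pipeline and its Prometheus exporter.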

---

## 9) Security Considerations

* Offline-first: connectors only reach allowlisted hosts.
* The BDU LLM fallback is gated by a config flag and logs an audit trail with a confidence score.
* No secrets are written to logs; secrets are loaded via environment variables or mounted files.
* Signing is handled outside the Feedser pipeline.
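The host allowlist amounts to a check applied to every outbound URI before a request leaves the process. The following is a minimal sketch that assumes exact, case-insensitive host matching and HTTPS only (both assumptions). The two host entries are examples only, and `IsAllowed` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Example allowlist entries; exact host match, case-insensitive.
var allowedHosts = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "services.nvd.nist.gov",
    "api.github.com",
};

// Reject relative or non-HTTPS URIs and any host not listed (no implicit subdomain matching).
bool IsAllowed(Uri uri) =>
    uri.IsAbsoluteUri
    && uri.Scheme == Uri.UriSchemeHttps
    && allowedHosts.Contains(uri.IdnHost);

Console.WriteLine(IsAllowed(new Uri("https://services.nvd.nist.gov/rest/json/cves/2.0"))); // True
Console.WriteLine(IsAllowed(new Uri("https://evil.example.com/")));                          // False
Console.WriteLine(IsAllowed(new Uri("http://api.github.com/")));                             // False (not HTTPS)
```

Exact host matching avoids suffix tricks such as `api.github.com.attacker.test`, at the cost of listing every required host explicitly.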

---

## 10) Deployment Notes

* Default storage is MongoDB; for air-gapped deployments, bundle the Mongo image plus a seeded data backup.
* Horizontal scaling is achieved by running multiple web service instances that share Mongo locks.
* Provide a `feedser.yaml` template describing sources, rate limits, and export settings.