# ARCHITECTURE.md — **StellaOps.Feedser**

> **Goal**: Build a sovereign-ready, self-hostable **feed-merge service** that ingests authoritative vulnerability sources, normalizes and de-duplicates them into **MongoDB**, and exports **JSON** and **Trivy-compatible DB** artifacts.

> **Form factor**: Long-running **Web Service** with **REST APIs** (health, status, control) and an embedded **internal cron scheduler**. Controllable via StellaOps.Cli (`stella db ...`).

> **No signing inside Feedser** (signing is a separate pipeline step).

> **Runtime SDK baseline**: .NET 10 Preview 7 (SDK 10.0.100-preview.7.25380.108) targeting `net10.0`, aligned with the deployed api.stella-ops.org service.

> **Four explicit stages**:
>
> 1. **Source Download** → raw documents.
> 2. **Parse & Normalize** → schema-validated DTOs enriched with canonical identifiers.
> 3. **Merge & Deduplicate** → precedence-aware canonical records persisted to MongoDB.
> 4. **Export** → JSON or TrivyDB (full or delta), then (externally) sign/publish.

---
## 1) Naming & Solution Layout

**Source connectors** namespace prefix: `StellaOps.Feedser.Source.*`

**Exporters**:

* `StellaOps.Feedser.Exporter.Json`
* `StellaOps.Feedser.Exporter.TrivyDb`

**Projects** (`/src`):

```
StellaOps.Feedser.WebService/ # ASP.NET Core (Minimal API, net10.0 preview) WebService + embedded scheduler
StellaOps.Feedser.Core/ # Domain models, pipelines, merge/dedupe engine, jobs orchestration
StellaOps.Feedser.Models/ # Canonical POCOs, JSON Schemas, enums
StellaOps.Feedser.Storage.Mongo/ # Mongo repositories, GridFS access, indexes, resume "flags"
StellaOps.Feedser.Source.Common/ # HTTP clients, rate limiters, schema validators, parser utilities
StellaOps.Feedser.Source.Cve/
StellaOps.Feedser.Source.Nvd/
StellaOps.Feedser.Source.Ghsa/
StellaOps.Feedser.Source.Osv/
StellaOps.Feedser.Source.Jvn/
StellaOps.Feedser.Source.CertCc/
StellaOps.Feedser.Source.Kev/
StellaOps.Feedser.Source.Kisa/
StellaOps.Feedser.Source.CertIn/
StellaOps.Feedser.Source.CertFr/
StellaOps.Feedser.Source.CertBund/
StellaOps.Feedser.Source.Acsc/
StellaOps.Feedser.Source.Cccs/
StellaOps.Feedser.Source.Ru.Bdu/ # HTML→schema with LLM fallback (gated)
StellaOps.Feedser.Source.Ru.Nkcki/ # PDF/HTML bulletins → structured
StellaOps.Feedser.Source.Vndr.Msrc/
StellaOps.Feedser.Source.Vndr.Cisco/
StellaOps.Feedser.Source.Vndr.Oracle/
StellaOps.Feedser.Source.Vndr.Adobe/
StellaOps.Feedser.Source.Vndr.Apple/
StellaOps.Feedser.Source.Vndr.Chromium/
StellaOps.Feedser.Source.Vndr.Vmware/
StellaOps.Feedser.Source.Distro.RedHat/
StellaOps.Feedser.Source.Distro.Ubuntu/
StellaOps.Feedser.Source.Distro.Debian/
StellaOps.Feedser.Source.Distro.Suse/
StellaOps.Feedser.Source.Ics.Cisa/
StellaOps.Feedser.Source.Ics.Kaspersky/
StellaOps.Feedser.Normalization/ # Canonical mappers, validators, version-range normalization
StellaOps.Feedser.Merge/ # Identity graph, precedence, deterministic merge
StellaOps.Feedser.Exporter.Json/
StellaOps.Feedser.Exporter.TrivyDb/
StellaOps.Feedser.<Component>.Tests/ # Component-scoped unit/integration suites (Core, Storage.Mongo, Source.*, Exporter.*, WebService, etc.)
```

---
## 2) Runtime Shape

**Process**: single service (`StellaOps.Feedser.WebService`).

* `Program.cs`: top-level entry point using the **Generic Host**, **DI**, and **Options** binding from `appsettings.json`, environment variables, and an optional `feedser.yaml`.
* Built-in **scheduler** (cron-like) and **job manager**, using **distributed locks** in Mongo to prevent overlapping runs, with enforced timeouts and cancel/kill support.
* **REST APIs** for health, readiness, progress, trigger, kill, and status.

**Key NuGet packages** (indicative): `MongoDB.Driver`, `Polly` (retry/backoff), `System.Threading.Channels`, `Microsoft.Extensions.Http`, `Microsoft.Extensions.Hosting`, `Serilog`, `OpenTelemetry`.

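A minimal sketch of how the job manager could enforce timeouts and operator kills, assuming one in-process cancellation source per running job. `JobRunner`, `JobOutcome`, and the job key format are hypothetical. The sketch leaves out persisting state to the `jobs` collection and taking the Mongo lock. Linking the kill source with `CancelAfter` gives each job a single token to observe, and the exception filters still tell a kill apart from a timeout.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Demo: a job that outlives its timeout is reported as TimedOut.
var runner = new JobRunner();
var outcome = await runner.RunAsync(
    "source:nvd:fetch", // hypothetical job key
    ct => Task.Delay(TimeSpan.FromSeconds(5), ct),
    timeout: TimeSpan.FromMilliseconds(100));
Console.WriteLine(outcome); // TimedOut

enum JobOutcome { Succeeded, TimedOut, Killed, Failed }

// Hypothetical in-memory job runner; `jobs` persistence and Mongo locks are omitted.
sealed class JobRunner
{
    // One kill switch per running job key.
    private readonly ConcurrentDictionary<string, CancellationTokenSource> _kill = new();

    public async Task<JobOutcome> RunAsync(string jobKey, Func<CancellationToken, Task> job, TimeSpan timeout)
    {
        using var killSource = new CancellationTokenSource();
        if (!_kill.TryAdd(jobKey, killSource))
            throw new InvalidOperationException($"Job '{jobKey}' is already running.");

        // The job observes one token that fires on either a kill or the timeout.
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(killSource.Token);
        linked.CancelAfter(timeout);
        try
        {
            await job(linked.Token);
            return JobOutcome.Succeeded;
        }
        catch (OperationCanceledException) when (killSource.IsCancellationRequested)
        {
            return JobOutcome.Killed;
        }
        catch (OperationCanceledException) when (linked.IsCancellationRequested)
        {
            return JobOutcome.TimedOut;
        }
        catch (Exception)
        {
            return JobOutcome.Failed; // a real manager would record the error on the job
        }
        finally
        {
            _kill.TryRemove(jobKey, out _);
        }
    }

    // Operator "kill": returns false if no job with this key is running.
    public bool Kill(string jobKey)
    {
        if (!_kill.TryGetValue(jobKey, out var source)) return false;
        try { source.Cancel(); } catch (ObjectDisposedException) { return false; }
        return true;
    }
}
```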
---
## 3) Data Storage — **MongoDB** (single source of truth)

**Database**: `feedser`

**Write concern**: `majority` for merge/export state, `acknowledged` for raw documents.

**Collections** (with “flags”/resume points):

* `source`
  * `_id`, `name`, `type`, `baseUrl`, `auth`, `notes`.
* `source_state`
  * Fields: `sourceName` (unique), `enabled`, `cursor`, `lastSuccess`, `failCount`, `backoffUntil`, `paceOverrides`, `paused`.
  * Drives incremental fetch/parse/map resume and operator pause/pace controls.
* `document`
  * `_id`, `sourceName`, `uri`, `fetchedAt`, `sha256`, `contentType`, `status`, `metadata`, `gridFsId`, `etag`, `lastModified`.
  * Unique index `{sourceName:1, uri:1}`; optional TTL for superseded versions.
* `dto`
  * `_id`, `sourceName`, `documentId`, `schemaVer`, `payload` (BSON), `validatedAt`.
  * Index `{sourceName:1, documentId:1}`.
* `advisory`
  * `_id`, `advisoryKey`, `title`, `summary`, `lang`, `published`, `modified`, `severity`, `exploitKnown`.
  * Unique index `{advisoryKey:1}`, plus indexes on `modified` and `published`.
* `alias`
  * `advisoryId`, `scheme`, `value`; index `{scheme:1, value:1}`.
* `affected`
  * `advisoryId`, `platform`, `name`, `versionRange`, `cpe`, `purl`, `fixedBy`, `introducedVersion`.
  * Indexes `{platform:1, name:1}` and `{advisoryId:1}`.
* `reference`
  * `advisoryId`, `url`, `kind`, `sourceTag` (e.g., advisory/patch/kb).
* Flags collections: `kev_flag`, `ru_flags`, `jp_flags`, `psirt_flags`, keyed by `advisoryId`.
* `merge_event`
  * `_id`, `advisoryKey`, `beforeHash`, `afterHash`, `mergedAt`, `inputs` (document IDs).
* `export_state`
  * `_id` (`json`/`trivydb`), `baseExportId`, `baseDigest`, `lastFullDigest`, `lastDeltaDigest`, `exportCursor`, `targetRepo`, `exporterVersion`.
* `locks`
  * `_id` (`jobKey`), `holder`, `acquiredAt`, `heartbeatAt`, `leaseMs`, `ttlAt` (a TTL index on `ttlAt` cleans up dead locks).
* `jobs`
  * `_id`, `type`, `args`, `state`, `startedAt`, `endedAt`, `error`, `owner`, `heartbeatAt`, `timeoutMs`.

**GridFS buckets**: `fs.documents` for large raw payloads, referenced by `document.gridFsId`.

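The `failCount` and `backoffUntil` fields in `source_state` imply a retry policy that the text does not spell out. The sketch below assumes exponential backoff that resets on success and respects the operator `paused` flag. The delay starts at 1 minute, doubles per consecutive failure, and is capped at 6 hours. The constants and the `SourceState`/`BackoffPolicy` names are assumptions, not documented Feedser values.

```csharp
using System;

// Demo: two consecutive failures push the next attempt out by 2 minutes.
var t0 = new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero);
var state = new SourceState("nvd");
state = BackoffPolicy.RecordFailure(state, t0); // failCount 1 → wait 1 min
state = BackoffPolicy.RecordFailure(state, t0); // failCount 2 → wait 2 min
Console.WriteLine(BackoffPolicy.IsDue(state, t0.AddMinutes(1))); // False
Console.WriteLine(BackoffPolicy.IsDue(state, t0.AddMinutes(2))); // True

// Subset of the `source_state` fields relevant to backoff (hypothetical shape).
sealed record SourceState(
    string SourceName,
    int FailCount = 0,
    DateTimeOffset? BackoffUntil = null,
    DateTimeOffset? LastSuccess = null,
    bool Paused = false);

// Hypothetical policy: 1 min × 2^(failCount - 1), capped at 6 h, reset on success.
static class BackoffPolicy
{
    private static readonly TimeSpan BaseDelay = TimeSpan.FromMinutes(1);
    private static readonly TimeSpan MaxDelay = TimeSpan.FromHours(6);

    public static SourceState RecordFailure(SourceState s, DateTimeOffset now)
    {
        int failCount = s.FailCount + 1;
        double minutes = BaseDelay.TotalMinutes * Math.Pow(2, failCount - 1);
        // Math.Min also absorbs +Infinity from Math.Pow on very long failure streaks.
        var delay = TimeSpan.FromMinutes(Math.Min(minutes, MaxDelay.TotalMinutes));
        return s with { FailCount = failCount, BackoffUntil = now + delay };
    }

    public static SourceState RecordSuccess(SourceState s, DateTimeOffset now) =>
        s with { FailCount = 0, BackoffUntil = null, LastSuccess = now };

    // A source runs only when not paused by an operator and past its backoff window.
    public static bool IsDue(SourceState s, DateTimeOffset now) =>
        !s.Paused && (s.BackoffUntil is null || now >= s.BackoffUntil.Value);
}
```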
---
## 4) Job & Scheduler Model

* Cron expressions per source/exporter live in configuration; the scheduler persists next-run pointers in Mongo.
* Jobs acquire locks (`locks` collection) to ensure singleton execution per source/exporter.
* Manual triggers are available via API endpoints (`POST /jobs/{type}`), along with per-source pause/resume toggles.

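The singleton guarantee can be read as a lease. A holder owns `jobKey` until `ttlAt` and extends it by heartbeating, and any node may take over an expired lease. The sketch below is an in-memory stand-in for the `locks` collection; `LeaseLockTable` and `LockDoc` are hypothetical names.

Against Mongo, one common way to get the same atomicity is a single upsert filtered on `{ _id: jobKey, $or: [ { holder: me }, { ttlAt: { $lte: now } } ] }`, where a duplicate-key error means another holder has the lock. That query is an assumption, not taken from Feedser. The TTL monitor only deletes expired documents periodically (roughly once a minute), so the acquire path still has to compare `ttlAt` itself.

```csharp
using System;
using System.Collections.Generic;

// Demo: node-b cannot take the lock while node-a's lease is live, but can after it lapses.
var t0 = new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero);
var lease = TimeSpan.FromSeconds(30);
var table = new LeaseLockTable();
bool a = table.TryAcquire("source:nvd", "node-a", t0, lease);
bool b = table.TryAcquire("source:nvd", "node-b", t0.AddSeconds(10), lease);
bool c = table.TryAcquire("source:nvd", "node-b", t0.AddSeconds(31), lease); // node-a stopped heartbeating
Console.WriteLine($"{a} {b} {c}"); // True False True

// Mirrors the `locks` document fields (leaseMs modeled as a TimeSpan).
sealed record LockDoc(string Id, string Holder, DateTimeOffset AcquiredAt, DateTimeOffset HeartbeatAt, TimeSpan Lease, DateTimeOffset TtlAt);

// Hypothetical in-memory stand-in for the `locks` collection.
sealed class LeaseLockTable
{
    private readonly Dictionary<string, LockDoc> _docs = new();
    private readonly object _gate = new();

    public bool TryAcquire(string jobKey, string holder, DateTimeOffset now, TimeSpan lease)
    {
        lock (_gate)
        {
            _docs.TryGetValue(jobKey, out var current);
            if (current is not null && current.Holder != holder && current.TtlAt > now)
                return false; // another holder still has a live lease
            var acquiredAt = current is not null && current.Holder == holder ? current.AcquiredAt : now;
            _docs[jobKey] = new LockDoc(jobKey, holder, acquiredAt, now, lease, now + lease);
            return true;
        }
    }

    // Renews the lease; false means the lease was lost and the job should stop.
    public bool Heartbeat(string jobKey, string holder, DateTimeOffset now)
    {
        lock (_gate)
        {
            if (!_docs.TryGetValue(jobKey, out var current) || current.Holder != holder || current.TtlAt <= now)
                return false;
            _docs[jobKey] = current with { HeartbeatAt = now, TtlAt = now + current.Lease };
            return true;
        }
    }

    public void Release(string jobKey, string holder)
    {
        lock (_gate)
        {
            if (_docs.TryGetValue(jobKey, out var current) && current.Holder == holder)
                _docs.Remove(jobKey);
        }
    }
}
```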
---
## 5) Connector Contracts

Connectors implement:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface IFeedConnector
{
    // Source key this connector owns.
    string SourceName { get; }
    // Download raw documents into `document` (rate limits, conditional GET, `source_state.cursor`).
    Task FetchAsync(IServiceProvider sp, CancellationToken ct);
    // Validate against the source schema and write sanitized DTO payloads.
    Task ParseAsync(IServiceProvider sp, CancellationToken ct);
    // Produce canonical advisory records plus provenance; must be idempotent.
    Task MapAsync(IServiceProvider sp, CancellationToken ct);
}
```

* Fetch writes `document` records, honoring rate limits, using conditional GET, and resuming from `source_state.cursor`.
* Parse validates each document against its schema (JSON Schema or XSD) and writes sanitized DTO payloads.
* Map produces canonical advisory records plus provenance entries; it must be idempotent.
* Base helpers in `StellaOps.Feedser.Source.Common` provide HTTP clients, retry policies, and watermark utilities.

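A sketch of the conditional-GET and change-detection part of Fetch, built on the `etag`, `lastModified`, and `sha256` fields of `document`. `StoredDocument` and `ConditionalFetch` are hypothetical helpers rather than the `StellaOps.Feedser.Source.Common` API, and the URL is a placeholder. Either of two responses means there is nothing new to parse for that URI: a `304 Not Modified`, or a `200` whose body hashes to the stored `sha256`.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Security.Cryptography;
using System.Text;

// Demo with a placeholder URL; no network call is made.
var body = Encoding.UTF8.GetBytes("{\"items\":[]}");
var previous = new StoredDocument(
    "nvd", "https://example.invalid/feed.json", "\"abc123\"",
    new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero), ConditionalFetch.Sha256Hex(body));

using var request = ConditionalFetch.BuildRequest(previous.Uri, previous);
// The request carries If-None-Match: "abc123" and If-Modified-Since: Wed, 01 Jan 2025 00:00:00 GMT
Console.WriteLine(ConditionalFetch.HasChanged(HttpStatusCode.NotModified, Array.Empty<byte>(), previous)); // False
Console.WriteLine(ConditionalFetch.HasChanged(HttpStatusCode.OK, body, previous)); // False: same bytes, same hash

// The `document` fields this sketch relies on (hypothetical shape).
sealed record StoredDocument(string SourceName, string Uri, string? ETag, DateTimeOffset? LastModified, string Sha256);

// Hypothetical helpers, not the StellaOps.Feedser.Source.Common API.
static class ConditionalFetch
{
    public static HttpRequestMessage BuildRequest(string uri, StoredDocument? previous)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, uri);
        // Validators from the last fetch let the server answer 304 Not Modified.
        if (previous?.ETag is { } etag)
            request.Headers.IfNoneMatch.Add(EntityTagHeaderValue.Parse(etag));
        if (previous?.LastModified is { } lastModified)
            request.Headers.IfModifiedSince = lastModified;
        return request;
    }

    public static string Sha256Hex(byte[] content) =>
        Convert.ToHexString(SHA256.HashData(content)).ToLowerInvariant();

    // True when there is new content to store and parse.
    public static bool HasChanged(HttpStatusCode status, byte[] body, StoredDocument? previous)
    {
        if (status == HttpStatusCode.NotModified) return false;
        // Servers that ignore validators still return 200; the stored sha256 catches unchanged bodies.
        return previous is null || Sha256Hex(body) != previous.Sha256;
    }
}
```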
---
## 6) Merge & Normalization

* The canonical model lives in `StellaOps.Feedser.Models`, with serialization contracts used by the storage and export layers.
* `StellaOps.Feedser.Normalization` handles NEVRA/EVR/PURL range parsing, CVSS normalization, and localization.
* `StellaOps.Feedser.Merge` builds alias graphs keyed by CVE first, falling back to vendor/regional IDs.
* Precedence rules: PSIRT/OVAL data overrides generic ranges; KEV only toggles the exploitation flag; regional feeds enrich severity but don’t override vendor truth.
* Determinism is enforced via canonical JSON hashing; the before/after hashes are logged in `merge_event`.

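The CVE-first alias rule can be pictured as connected components. IDs reported together by any source record are linked, and each component becomes one advisory. The sketch uses union-find and picks the component's CVE as `advisoryKey`, falling back to the ordinally smallest vendor/regional ID. The following are assumptions for illustration, not the `StellaOps.Feedser.Merge` implementation: the tie-break (including the case of several CVEs in one component), the `AliasGraph` name, and the sample IDs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each inner array is the alias set reported by one source record (made-up IDs).
var records = new[]
{
    new[] { "GHSA-xxxx-yyyy-zzzz", "CVE-2024-0001" },
    new[] { "JVNDB-2024-000001", "CVE-2024-0001" },
    new[] { "BDU:2024-00002" },
};
foreach (var (key, aliases) in AliasGraph.Resolve(records))
    Console.WriteLine($"{key}: {string.Join(", ", aliases)}");
// BDU:2024-00002: BDU:2024-00002
// CVE-2024-0001: CVE-2024-0001, GHSA-xxxx-yyyy-zzzz, JVNDB-2024-000001

// Hypothetical sketch, not the StellaOps.Feedser.Merge implementation.
static class AliasGraph
{
    // Returns advisoryKey → sorted aliases, one entry per connected component.
    public static SortedDictionary<string, string[]> Resolve(IEnumerable<IReadOnlyCollection<string>> aliasSets)
    {
        var parent = new Dictionary<string, string>(StringComparer.Ordinal);

        string Find(string x)
        {
            while (parent[x] != x)
            {
                parent[x] = parent[parent[x]]; // path halving keeps trees shallow
                x = parent[x];
            }
            return x;
        }

        // Link every alias in a record to the record's first alias.
        foreach (var set in aliasSets)
        {
            string? first = null;
            foreach (var alias in set)
            {
                parent.TryAdd(alias, alias);
                if (first is null) { first = alias; continue; }
                string ra = Find(first), rb = Find(alias);
                if (ra != rb) parent[ra] = rb;
            }
        }

        var result = new SortedDictionary<string, string[]>(StringComparer.Ordinal);
        foreach (var component in parent.Keys.ToList().GroupBy(x => Find(x)))
        {
            var members = component.OrderBy(m => m, StringComparer.Ordinal).ToArray();
            // CVE first; otherwise fall back to the ordinally smallest vendor/regional ID.
            var key = members.FirstOrDefault(m => m.StartsWith("CVE-", StringComparison.Ordinal)) ?? members[0];
            result[key] = members;
        }
        return result;
    }
}
```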
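The precedence rules can be expressed as a per-field merge. This sketch encodes one reading of them:

* PSIRT/OVAL ranges replace generic ranges when present.
* KEV contributes only `exploitKnown`.
* Regional severity is used only when no vendor or generic severity exists.

The `SourceKind` buckets, the generic-over-regional ordering, and the first-in-input-order tie-break within a tier are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var merged = PrecedenceMerger.Merge(new[]
{
    new Observation(SourceKind.Generic, Severity: "high", Ranges: new[] { "< 2.17.0" }),
    new Observation(SourceKind.Psirt, Ranges: new[] { "< 2.17.1" }),
    new Observation(SourceKind.Regional, Severity: "critical"),
    new Observation(SourceKind.Kev),
});
Console.WriteLine($"{merged.Severity} | {string.Join("; ", merged.Ranges)} | {merged.ExploitKnown}");
// high | < 2.17.1 | True

// Hypothetical classification of inputs; Feedser's real source model is richer.
enum SourceKind { Generic, Regional, Psirt, Oval, Kev }

sealed record Observation(SourceKind Kind, string? Severity = null, IReadOnlyList<string>? Ranges = null);

sealed record MergedAdvisory(string? Severity, IReadOnlyList<string> Ranges, bool ExploitKnown);

static class PrecedenceMerger
{
    public static MergedAdvisory Merge(IEnumerable<Observation> observations)
    {
        var list = observations.ToList();

        // Ranges: PSIRT/OVAL replace generic ranges whenever they supply any.
        var vendor = list.Where(o => (o.Kind is SourceKind.Psirt or SourceKind.Oval) && o.Ranges is { Count: > 0 }).ToList();
        var rangeSources = vendor.Count > 0
            ? vendor
            : list.Where(o => o.Kind == SourceKind.Generic && o.Ranges is { Count: > 0 }).ToList();
        var ranges = rangeSources.SelectMany(o => o.Ranges!).Distinct().OrderBy(r => r, StringComparer.Ordinal).ToList();

        // Severity: vendor, then generic; regional only fills a gap (assumed ordering).
        var severity = FirstSeverity(list, SourceKind.Psirt, SourceKind.Oval)
            ?? FirstSeverity(list, SourceKind.Generic)
            ?? FirstSeverity(list, SourceKind.Regional);

        // KEV never changes ranges or severity; it only marks known exploitation.
        bool exploitKnown = list.Any(o => o.Kind == SourceKind.Kev);
        return new MergedAdvisory(severity, ranges, exploitKnown);
    }

    // First non-null severity from the given tier, in input order.
    private static string? FirstSeverity(List<Observation> list, params SourceKind[] tier) =>
        list.FirstOrDefault(o => o.Severity is not null && tier.Contains(o.Kind))?.Severity;
}
```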
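`beforeHash` and `afterHash` in `merge_event` are only stable if the merged record is serialized canonically. This sketch:

* sorts object keys ordinally at every level;
* keeps array order, assuming the merge step already sorts arrays;
* hashes the compact UTF-8 JSON with SHA-256.

It does not normalize number formatting or string escaping the way RFC 8785 (JCS) does, so treat it as an approximation. `CanonicalJson` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

var left = JsonNode.Parse("""{"b":1,"a":{"y":true,"x":[2,1]}}""");
var right = JsonNode.Parse("""{"a":{"x":[2,1],"y":true},"b":1}""");
Console.WriteLine(CanonicalJson.Canonicalize(left)!.ToJsonString()); // {"a":{"x":[2,1],"y":true},"b":1}
Console.WriteLine(CanonicalJson.Hash(left) == CanonicalJson.Hash(right)); // True

// Hypothetical helper; an approximation of canonical JSON, not full RFC 8785.
static class CanonicalJson
{
    // Rebuilds the tree with object keys in ordinal order; array order is preserved.
    public static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(obj
            .OrderBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(n => Canonicalize(n)).ToArray()),
        _ => node?.DeepClone(), // leaves are cloned because a node can have only one parent
    };

    // Lowercase hex SHA-256 of the compact canonical UTF-8 JSON.
    public static string Hash(JsonNode? node)
    {
        string json = Canonicalize(node)?.ToJsonString() ?? "null";
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(json))).ToLowerInvariant();
    }
}
```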
---
## 7) Exporters

* The JSON exporter mirrors the `aquasecurity/vuln-list` layout, with deterministic ordering and reproducible timestamps.
* The Trivy DB exporter initially shells out to the `trivy-db` builder; a later version will emit BoltDB directly.
* `StellaOps.Feedser.Storage.Mongo` provides cursors for delta exports based on `export_state.exportCursor`.
* Export jobs produce OCI tarballs (layer media type `application/vnd.aquasec.trivy.db.layer.v1.tar+gzip`) and can optionally push them via ORAS.

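Deterministic ordering and reproducible timestamps carry over to the archive step. This sketch:

* writes a gzipped tar with entries sorted ordinally by path;
* fixes the modification time and permissions;
* reports an OCI-style `sha256:<hex>` digest.

`ReproducibleArchive`, the fixed timestamp, and the sample paths are assumptions. Building the OCI manifest and pushing via ORAS are out of scope. Byte-for-byte stability also depends on the compressor, so matching digests should only be expected for the same runtime version and platform.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// The same files supplied in a different order must yield the same digest.
var files = new Dictionary<string, string>
{
    ["nvd/CVE-2024-0001.json"] = "{\"id\":\"CVE-2024-0001\"}",
    ["ghsa/GHSA-xxxx-yyyy-zzzz.json"] = "{\"id\":\"GHSA-xxxx-yyyy-zzzz\"}",
};
var reordered = files.Reverse().ToDictionary(kv => kv.Key, kv => kv.Value);
Console.WriteLine(ReproducibleArchive.Digest(ReproducibleArchive.Build(files))
    == ReproducibleArchive.Digest(ReproducibleArchive.Build(reordered))); // True

// Hypothetical helper; OCI manifest creation and ORAS push are not covered.
static class ReproducibleArchive
{
    // Fixed metadata so rebuilding identical content yields identical bytes.
    private static readonly DateTimeOffset FixedMtime = new(2000, 1, 1, 0, 0, 0, TimeSpan.Zero);
    private const UnixFileMode FileMode =
        UnixFileMode.UserRead | UnixFileMode.UserWrite | UnixFileMode.GroupRead | UnixFileMode.OtherRead;

    public static byte[] Build(IReadOnlyDictionary<string, string> files)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
        using (var tar = new TarWriter(gzip, TarEntryFormat.Ustar, leaveOpen: true))
        {
            // Ordinal path order, independent of dictionary enumeration order.
            foreach (var (path, content) in files.OrderBy(f => f.Key, StringComparer.Ordinal))
            {
                tar.WriteEntry(new UstarTarEntry(TarEntryType.RegularFile, path)
                {
                    ModificationTime = FixedMtime,
                    Mode = FileMode,
                    DataStream = new MemoryStream(Encoding.UTF8.GetBytes(content)),
                });
            }
        } // disposing the writer appends end-of-archive blocks; disposing gzip flushes the trailer
        return output.ToArray();
    }

    // OCI descriptors identify blobs as "sha256:<lowercase hex>".
    public static string Digest(byte[] blob) =>
        "sha256:" + Convert.ToHexString(SHA256.HashData(blob)).ToLowerInvariant();
}
```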
---
## 8) Observability

* Serilog structured logging with enrichment fields (`source`, `uri`, `stage`, `durationMs`).
* OpenTelemetry traces around fetch/parse/map/export; metrics for rate-limit hits, schema failures, dedupe ratios, and package size.
* A Prometheus scrape endpoint served by the WebService.

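OpenTelemetry for .NET builds on the BCL `ActivitySource` (traces) and `Meter` (metrics) APIs, so stage spans and counters can be declared without referencing an exporter. The source/meter name, the instrument names, and `FeedserTelemetry` are hypothetical. The in-process listeners stand in for the OpenTelemetry SDK and the Prometheus exporter so the sketch runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// In-process listeners stand in for the OpenTelemetry SDK / Prometheus exporter.
var stoppedSpans = new List<string>();
using var spanListener = new ActivityListener
{
    ShouldListenTo = source => source.Name == FeedserTelemetry.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity => stoppedSpans.Add($"{activity.OperationName} source={activity.GetTagItem("source")}"),
};
ActivitySource.AddActivityListener(spanListener);

long schemaFailures = 0;
using var meterListener = new MeterListener
{
    InstrumentPublished = (instrument, listener) =>
    {
        if (instrument.Meter.Name == FeedserTelemetry.Name) listener.EnableMeasurementEvents(instrument);
    },
};
meterListener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    if (instrument.Name == "feedser.parse.schema_failures") schemaFailures += value;
});
meterListener.Start();

// A parse stage that records one schema failure.
using (FeedserTelemetry.StartStage("parse", "nvd"))
{
    FeedserTelemetry.SchemaFailures.Add(1, new KeyValuePair<string, object?>("source", "nvd"));
}
Console.WriteLine(stoppedSpans[0]); // feedser.parse source=nvd
Console.WriteLine(schemaFailures);  // 1

// Hypothetical names; only the BCL types are real.
static class FeedserTelemetry
{
    public const string Name = "StellaOps.Feedser";
    public static readonly ActivitySource Source = new(Name);
    public static readonly Meter Metrics = new(Name);
    public static readonly Counter<long> SchemaFailures = Metrics.CreateCounter<long>("feedser.parse.schema_failures");
    public static readonly Counter<long> RateLimitHits = Metrics.CreateCounter<long>("feedser.fetch.rate_limit_hits");

    // Starts a span for one pipeline stage; returns null when nobody is listening.
    public static Activity? StartStage(string stage, string source) =>
        Source.StartActivity($"feedser.{stage}")?.SetTag("stage", stage).SetTag("source", source);
}
```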
---
## 9) Security Considerations

* Offline-first: connectors only reach allowlisted hosts.
* The BDU LLM fallback is gated by a config flag and logs an audit trail with a confidence score.
* No secrets are written to logs; secrets are loaded from the environment or mounted files.
* Signing is handled outside the Feedser pipeline.

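One way to enforce the host allowlist is a `DelegatingHandler` in each connector's HTTP pipeline that refuses requests to unlisted hosts. `AllowlistHandler`, the sample hosts, and the offline stub are illustrative assumptions. `HttpClientHandler`/`SocketsHttpHandler` follow redirects internally, so an outer handler only sees the first hop. For the allowlist to hold, automatic redirects would need to be disabled and each hop re-checked.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical allowlist; in practice it would come from configuration.
var allowedHosts = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "services.nvd.nist.gov", "api.github.com" };
using var client = new HttpClient(new AllowlistHandler(allowedHosts) { InnerHandler = new OfflineStubHandler() });

var response = await client.GetAsync("https://services.nvd.nist.gov/rest/json/cves/2.0");
Console.WriteLine(response.StatusCode); // OK (answered by the stub, not the network)
try
{
    await client.GetAsync("https://unlisted.example/");
}
catch (HttpRequestException ex)
{
    Console.WriteLine($"blocked: {ex.Message}");
}

sealed class AllowlistHandler : DelegatingHandler
{
    private readonly IReadOnlySet<string> _allowedHosts;

    public AllowlistHandler(IReadOnlySet<string> allowedHosts) => _allowedHosts = allowedHosts;

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var uri = request.RequestUri;
        // Reject relative URIs and any host not explicitly allowed (IdnHost is the ASCII form).
        if (uri is null || !uri.IsAbsoluteUri || !_allowedHosts.Contains(uri.IdnHost))
            throw new HttpRequestException($"Request to '{uri}' blocked: host is not on the allowlist.");
        return base.SendAsync(request, cancellationToken);
    }
}

// Answers every request with 200 so the sketch runs without network access.
sealed class OfflineStubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) =>
        Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK) { RequestMessage = request });
}
```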
---
## 10) Deployment Notes

* MongoDB is the default storage; for air-gapped deployments, bundle the Mongo image plus a seeded data backup.
* Horizontal scaling is achieved by running multiple WebService instances that share Mongo locks.
* Provide a `feedser.yaml` template describing sources, rate limits, and export settings.