Rewrite architecture docs and add Vexer connector template
		| @@ -1,190 +1,433 @@ | ||||
| # ARCHITECTURE.md — **StellaOps.Feedser** | ||||
| # component_architecture_feedser.md — **Stella Ops Feedser** (2025Q4) | ||||
|  | ||||
| > **Goal**: Build a sovereign-ready, self-hostable **feed-merge service** that ingests authoritative vulnerability sources, normalizes and de-duplicates them into **MongoDB**, and exports **JSON** and **Trivy-compatible DB** artifacts. | ||||
| > **Form factor**: Long-running **Web Service** with **REST APIs** (health, status, control) and an embedded **internal cron scheduler**. Controllable via StellaOps.Cli (`stella db ...`). | ||||
| > **No signing inside Feedser** (signing is a separate pipeline step). | ||||
| > **Runtime SDK baseline**: .NET 10 Preview 7 (SDK 10.0.100-preview.7.25380.108) targeting `net10.0`, aligned with the deployed api.stella-ops.org service. | ||||
| > **Four explicit stages**: | ||||
| > | ||||
| > 1. **Source Download** → raw documents. | ||||
| > 2. **Parse & Normalize** → schema-validated DTOs enriched with canonical identifiers. | ||||
| > 3. **Merge & Deduplicate** → precedence-aware canonical records persisted to MongoDB. | ||||
| > 4. **Export** → JSON or TrivyDB (full or delta), then (externally) sign/publish. | ||||
| > **Scope.** Implementation‑ready architecture for **Feedser**: the vulnerability ingest/normalize/merge/export subsystem that produces deterministic advisory data for the Scanner + Policy + Vexer pipeline. Covers domain model, connectors, merge rules, storage schema, exports, APIs, performance, security, and test matrices. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Naming & Solution Layout | ||||
| ## 0) Mission & boundaries | ||||
|  | ||||
| **Source connectors** namespace prefix: `StellaOps.Feedser.Source.*` | ||||
| **Exporters**: | ||||
| **Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), normalize them into a **canonical model**, reconcile aliases and version ranges, and export **deterministic artifacts** (JSON, Trivy DB) for fast backend joins. | ||||
|  | ||||
| * `StellaOps.Feedser.Exporter.Json` | ||||
| * `StellaOps.Feedser.Exporter.TrivyDb` | ||||
| **Boundaries.** | ||||
|  | ||||
| **Projects** (`/src`): | ||||
| * Feedser **does not** sign with private keys. When attestation is required, the export artifact is handed to the **Signer**/**Attestor** pipeline (out‑of‑process). | ||||
| * Feedser **does not** decide PASS/FAIL; it provides data to the **Policy** engine. | ||||
| * Online operation is **allowlist‑only**; air‑gapped deployments use the **Offline Kit**. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Topology & processes | ||||
|  | ||||
| **Process shape:** single ASP.NET Core service `StellaOps.Feedser.WebService` hosting: | ||||
|  | ||||
| * **Scheduler** with distributed locks (Mongo backed). | ||||
| * **Connectors** (fetch/parse/map). | ||||
| * **Merger** (canonical record assembly + precedence). | ||||
| * **Exporters** (JSON, Trivy DB). | ||||
| * **Minimal REST** for health/status/trigger/export. | ||||
|  | ||||
| **Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2) Canonical domain model | ||||
|  | ||||
| > Stored in MongoDB (database `feedser`), serialized with a **canonical JSON** writer (stable order, camelCase, normalized timestamps). | ||||
|  | ||||
| ### 2.1 Core entities | ||||
|  | ||||
| **Advisory** | ||||
|  | ||||
| ``` | ||||
| StellaOps.Feedser.WebService/        # ASP.NET Core (Minimal API, net10.0 preview) WebService + embedded scheduler | ||||
| StellaOps.Feedser.Core/              # Domain models, pipelines, merge/dedupe engine, jobs orchestration | ||||
| StellaOps.Feedser.Models/            # Canonical POCOs, JSON Schemas, enums | ||||
| StellaOps.Feedser.Storage.Mongo/     # Mongo repositories, GridFS access, indexes, resume "flags" | ||||
| StellaOps.Feedser.Source.Common/     # HTTP clients, rate-limiters, schema validators, parsers utils | ||||
| StellaOps.Feedser.Source.Cve/ | ||||
| StellaOps.Feedser.Source.Nvd/ | ||||
| StellaOps.Feedser.Source.Ghsa/ | ||||
| StellaOps.Feedser.Source.Osv/ | ||||
| StellaOps.Feedser.Source.Jvn/ | ||||
| StellaOps.Feedser.Source.CertCc/ | ||||
| StellaOps.Feedser.Source.Kev/ | ||||
| StellaOps.Feedser.Source.Kisa/ | ||||
| StellaOps.Feedser.Source.CertIn/ | ||||
| StellaOps.Feedser.Source.CertFr/ | ||||
| StellaOps.Feedser.Source.CertBund/ | ||||
| StellaOps.Feedser.Source.Acsc/ | ||||
| StellaOps.Feedser.Source.Cccs/ | ||||
| StellaOps.Feedser.Source.Ru.Bdu/     # HTML→schema with LLM fallback (gated) | ||||
| StellaOps.Feedser.Source.Ru.Nkcki/   # PDF/HTML bulletins → structured | ||||
| StellaOps.Feedser.Source.Vndr.Msrc/ | ||||
| StellaOps.Feedser.Source.Vndr.Cisco/ | ||||
| StellaOps.Feedser.Source.Vndr.Oracle/ | ||||
| StellaOps.Feedser.Source.Vndr.Adobe/   # APSB ingest; emits vendor RangePrimitives with adobe.track/platform/priority telemetry + fixed-status provenance. | ||||
| StellaOps.Feedser.Source.Vndr.Apple/ | ||||
| StellaOps.Feedser.Source.Vndr.Chromium/ | ||||
| StellaOps.Feedser.Source.Vndr.Vmware/ | ||||
| StellaOps.Feedser.Source.Distro.RedHat/ | ||||
| StellaOps.Feedser.Source.Distro.Debian/    # Fetches DSA list + detail HTML, emits EVR RangePrimitives with per-release provenance and telemetry. | ||||
| StellaOps.Feedser.Source.Distro.Ubuntu/   # Ubuntu Security Notices connector (JSON index → EVR ranges with ubuntu.pocket telemetry). | ||||
| StellaOps.Feedser.Source.Distro.Suse/     # CSAF fetch pipeline emitting NEVRA RangePrimitives with suse.status vendor telemetry. | ||||
| StellaOps.Feedser.Source.Ics.Cisa/ | ||||
| StellaOps.Feedser.Source.Ics.Kaspersky/ | ||||
| StellaOps.Feedser.Normalization/     # Canonical mappers, validators, version-range normalization | ||||
| StellaOps.Feedser.Merge/             # Identity graph, precedence, deterministic merge | ||||
| StellaOps.Feedser.Exporter.Json/ | ||||
| StellaOps.Feedser.Exporter.TrivyDb/ | ||||
| StellaOps.Feedser.<Component>.Tests/  # Component-scoped unit/integration suites (Core, Storage.Mongo, Source.*, Exporter.*, WebService, etc.) | ||||
| advisoryId          // internal GUID | ||||
| advisoryKey         // stable string key (e.g., CVE-2025-12345 or vendor ID) | ||||
| title               // short title (best-of from sources) | ||||
| summary             // normalized summary (English; i18n optional) | ||||
| published           // earliest source timestamp | ||||
| modified            // latest source timestamp | ||||
| severity            // normalized {none, low, medium, high, critical} | ||||
| cvss                // {v2?, v3?, v4?} objects (vector, baseScore, severity, source) | ||||
| exploitKnown        // bool (e.g., KEV/active exploitation flags) | ||||
| references[]        // typed links (advisory, kb, patch, vendor, exploit, blog) | ||||
| sources[]           // provenance for traceability (doc digests, URIs) | ||||
| ``` | ||||
|  | ||||
| --- | ||||
| **Alias** | ||||
|  | ||||
| ## 2) Runtime Shape | ||||
| ``` | ||||
| advisoryId | ||||
| scheme              // CVE, GHSA, RHSA, DSA, USN, MSRC, etc. | ||||
| value               // e.g., "CVE-2025-12345" | ||||
| ``` | ||||
|  | ||||
| **Process**: single service (`StellaOps.Feedser.WebService`) | ||||
| **Affected** | ||||
|  | ||||
| * `Program.cs`: top-level entry using **Generic Host**, **DI**, **Options** binding from `appsettings.json` + environment + optional `feedser.yaml`. | ||||
| * Built-in **scheduler** (cron-like) + **job manager** with **distributed locks** in Mongo to prevent overlaps, enforce timeouts, allow cancel/kill. | ||||
| * **REST APIs** for health/readiness/progress/trigger/kill/status. | ||||
| ``` | ||||
| advisoryId | ||||
| productKey          // canonical product identity (see 2.2) | ||||
| rangeKind           // semver | evr | nvra | apk | rpm | deb | generic | exact | ||||
| introduced?         // string (format depends on rangeKind) | ||||
| fixed?              // string (format depends on rangeKind) | ||||
| lastKnownSafe?      // optional explicit safe floor | ||||
| arch?               // arch or platform qualifier if source declares (x86_64, aarch64) | ||||
| distro?             // distro qualifier when applicable (rhel:9, debian:12, alpine:3.19) | ||||
| ecosystem?          // npm|pypi|maven|nuget|golang|… | ||||
| notes?              // normalized notes per source | ||||
| ``` | ||||
|  | ||||
| **Key NuGet concepts** (indicative): `MongoDB.Driver`, `Polly` (retry/backoff), `System.Threading.Channels`, `Microsoft.Extensions.Http`, `Microsoft.Extensions.Hosting`, `Serilog`, `OpenTelemetry`. | ||||
| **Reference** | ||||
|  | ||||
| ``` | ||||
| advisoryId | ||||
| url | ||||
| kind                // advisory | patch | kb | exploit | mitigation | blog | cvrf | csaf | ||||
| sourceTag           // e.g., vendor/redhat, distro/debian, oss/ghsa | ||||
| ``` | ||||
|  | ||||
| **MergeEvent** | ||||
|  | ||||
| ``` | ||||
| advisoryKey | ||||
| beforeHash          // canonical JSON hash before merge | ||||
| afterHash           // canonical JSON hash after merge | ||||
| mergedAt | ||||
| inputs[]            // source doc digests that contributed | ||||
| ``` | ||||
|  | ||||
| **ExportState** | ||||
|  | ||||
| ``` | ||||
| exportKind          // json | trivydb | ||||
| baseExportId?       // last full baseline | ||||
| baseDigest?         // digest of last full baseline | ||||
| lastFullDigest?     // digest of last full export | ||||
| lastDeltaDigest?    // digest of last delta export | ||||
| cursor              // per-kind incremental cursor | ||||
| files[]             // last manifest snapshot (path → sha256) | ||||
| ``` | ||||
|  | ||||
| ### 2.2 Product identity (`productKey`) | ||||
|  | ||||
| * **Primary:** `purl` (Package URL). | ||||
| * **OS packages:** RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with **EVR/NVRA** preserved. | ||||
| * **Secondary:** `cpe` retained for compatibility; advisory records may carry both. | ||||
| * **Image/platform:** `oci:<registry>/<repo>@<digest>` for image‑level advisories (rare). | ||||
| * **Unmappable:** if a source's product naming cannot be mapped deterministically, keep the native string under `productKey="native:<provider>:<id>"` and mark the record **non‑joinable**. | ||||
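The fallback rule for unmappable products can be illustrated with a minimal sketch. It assumes the connector has already tried to derive a purl; `ResolveProductKey`, its parameters, and the sample inputs are hypothetical names for illustration and are not taken from the Feedser codebase.

```csharp
#nullable enable
using System;

// Hypothetical helper (not Feedser's actual API): choose a productKey for a
// source-declared product. A purl is used when the connector could derive one;
// otherwise the native identifier is kept and the record is marked non-joinable.
static (string ProductKey, bool Joinable) ResolveProductKey(
    string provider, string nativeId, string? purl)
{
    if (!string.IsNullOrWhiteSpace(purl) && purl.StartsWith("pkg:", StringComparison.Ordinal))
        return (purl, true);
    return ($"native:{provider}:{nativeId}", false);
}

// Sample inputs are made up for the example.
var mapped = ResolveProductKey("redhat", "openssl", "pkg:rpm/redhat/openssl");
var unmapped = ResolveProductKey("examplevendor", "Widget Server 4.x", null);
Console.WriteLine($"{mapped.ProductKey} joinable={mapped.Joinable}");
Console.WriteLine($"{unmapped.ProductKey} joinable={unmapped.Joinable}");
```

Returning the joinable flag alongside the key keeps the decision in one place, so downstream joins can skip `native:` keys without parsing the string.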
|  | ||||
| --- | ||||
|  | ||||
| ## 3) Data Storage — **MongoDB** (single source of truth) | ||||
| ## 3) Source families & precedence | ||||
|  | ||||
| **Database**: `feedser` | ||||
| **Write concern**: `majority` for merge/export state, `acknowledged` for raw docs. | ||||
| **Collections** (with “flags”/resume points): | ||||
| ### 3.1 Families | ||||
|  | ||||
| * `source` | ||||
|   * `_id`, `name`, `type`, `baseUrl`, `auth`, `notes`. | ||||
| * `source_state` | ||||
|   * Keys: `sourceName` (unique), `enabled`, `cursor`, `lastSuccess`, `failCount`, `backoffUntil`, `paceOverrides`, `paused`. | ||||
|   * Drives incremental fetch/parse/map resume and operator pause/pace controls. | ||||
| * `document` | ||||
|   * `_id`, `sourceName`, `uri`, `fetchedAt`, `sha256`, `contentType`, `status`, `metadata`, `gridFsId`, `etag`, `lastModified`. | ||||
|   * Index `{sourceName:1, uri:1}` unique; optional TTL for superseded versions. | ||||
| * `dto` | ||||
|   * `_id`, `sourceName`, `documentId`, `schemaVer`, `payload` (BSON), `validatedAt`. | ||||
|   * Index `{sourceName:1, documentId:1}`. | ||||
| * `advisory` | ||||
|   * `_id`, `advisoryKey`, `title`, `summary`, `lang`, `published`, `modified`, `severity`, `exploitKnown`. | ||||
|   * Unique `{advisoryKey:1}` plus indexes on `modified` and `published`. | ||||
| * `alias` | ||||
|   * `advisoryId`, `scheme`, `value` with index `{scheme:1, value:1}`. | ||||
| * `affected` | ||||
|   * `advisoryId`, `platform`, `name`, `versionRange`, `cpe`, `purl`, `fixedBy`, `introducedVersion`. | ||||
|   * Index `{platform:1, name:1}`, `{advisoryId:1}`. | ||||
| * `reference` | ||||
|   * `advisoryId`, `url`, `kind`, `sourceTag` (e.g., advisory/patch/kb). | ||||
| * Flags collections: `kev_flag`, `ru_flags`, `jp_flags`, `psirt_flags` keyed by `advisoryId`. | ||||
| * `merge_event` | ||||
|   * `_id`, `advisoryKey`, `beforeHash`, `afterHash`, `mergedAt`, `inputs` (document ids). | ||||
| * `export_state` | ||||
|   * `_id` (`json`/`trivydb`), `baseExportId`, `baseDigest`, `lastFullDigest`, `lastDeltaDigest`, `exportCursor`, `targetRepo`, `exporterVersion`. | ||||
| * `locks` | ||||
|   * `_id` (`jobKey`), `holder`, `acquiredAt`, `heartbeatAt`, `leaseMs`, `ttlAt` (TTL index cleans dead locks). | ||||
| * `jobs` | ||||
|   * `_id`, `type`, `args`, `state`, `startedAt`, `endedAt`, `error`, `owner`, `heartbeatAt`, `timeoutMs`. | ||||
| * **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium… | ||||
| * **Linux distros**: Red Hat, SUSE, Ubuntu, Debian, Alpine… | ||||
| * **OSS ecosystems**: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go. | ||||
| * **CERTs / national CSIRTs**: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERT‑FR, CERT‑Bund, etc. | ||||
|  | ||||
| **GridFS buckets**: `fs.documents` for raw large payloads; referenced by `document.gridFsId`. | ||||
| ### 3.2 Precedence (when claims conflict) | ||||
|  | ||||
| 1. **Vendor PSIRT** (authoritative for their product). | ||||
| 2. **Distro** (authoritative for packages they ship, including backports). | ||||
| 3. **Ecosystem** (OSV/GHSA) for library semantics. | ||||
| 4. **CERTs/aggregators** for enrichment (KEV/known exploited). | ||||
|  | ||||
| > Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**. | ||||
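One way to read the precedence list is as a sort order over competing claims, where lower-ranked claims are kept rather than discarded. The sketch below assumes that reading; the numeric ranks, the tuple shape, and the sample claims (including the `cert/example` tag) are illustrative and not part of the documented model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Family names mirror the precedence list; the numeric ranks are an assumed encoding.
var rank = new Dictionary<string, int>(StringComparer.Ordinal)
{
    ["vendor"] = 0, ["distro"] = 1, ["ecosystem"] = 2, ["cert"] = 3,
};

// Hypothetical conflicting "fixed" claims for one product.
var claims = new List<(string Family, string SourceTag, string Fixed)>
{
    ("ecosystem", "oss/ghsa", "3.0.8"),
    ("distro", "distro/debian", "3.0.7-1+deb12u1"),
    ("cert", "cert/example", "3.0.8"),
};

// Order by precedence, then by source tag so ties resolve deterministically.
var ordered = claims
    .OrderBy(c => rank[c.Family])
    .ThenBy(c => c.SourceTag, StringComparer.Ordinal)
    .ToList();

// The first claim is preferred; every claim is retained with its provenance.
var preferred = ordered[0];
Console.WriteLine($"preferred: {preferred.SourceTag} fixed={preferred.Fixed}");
Console.WriteLine($"retained claims: {ordered.Count}");
```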
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Job & Scheduler Model | ||||
| ## 4) Connectors & normalization | ||||
|  | ||||
| * Scheduler stores cron expressions per source/exporter in config; persists next-run pointers in Mongo. | ||||
| * Jobs acquire locks (`locks` collection) to ensure singleton execution per source/exporter. | ||||
| * Supports manual triggers via API endpoints (`POST /jobs/{type}`) and pause/resume toggles per source. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Connector Contracts | ||||
|  | ||||
| Connectors implement: | ||||
| ### 4.1 Connector contract | ||||
|  | ||||
| ```csharp | ||||
| public interface IFeedConnector { | ||||
|     string SourceName { get; } | ||||
|     Task FetchAsync(IServiceProvider sp, CancellationToken ct); | ||||
|     Task ParseAsync(IServiceProvider sp, CancellationToken ct); | ||||
|     Task MapAsync(IServiceProvider sp, CancellationToken ct); | ||||
|   string SourceName { get; } | ||||
|   Task FetchAsync(IServiceProvider sp, CancellationToken ct);   // -> document collection | ||||
|   Task ParseAsync(IServiceProvider sp, CancellationToken ct);   // -> dto collection (validated) | ||||
|   Task MapAsync(IServiceProvider sp, CancellationToken ct);     // -> advisory/alias/affected/reference | ||||
| } | ||||
| ``` | ||||
|  | ||||
| * Fetch populates `document` rows respecting rate limits, conditional GET, and `source_state.cursor`. | ||||
| * Parse validates schema (JSON Schema, XSD) and writes sanitized DTO payloads. | ||||
| * Map produces canonical advisory rows + provenance entries; must be idempotent. | ||||
| * Base helpers in `StellaOps.Feedser.Source.Common` provide HTTP clients, retry policies, and watermark utilities. | ||||
| * **Fetch**: windowed (cursor), conditional GET (ETag/Last‑Modified), retry/backoff, rate limiting. | ||||
| * **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing. | ||||
| * **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors). | ||||
|  | ||||
| ### 4.2 Version range normalization | ||||
|  | ||||
| * **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges; range operators such as `~`, `^`, `<`, and `>=` are canonicalized to intervals. | ||||
| * **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query. | ||||
| * **DEB**: dpkg version comparison semantics mirrored; store computed keys. | ||||
| * **APK**: Alpine version semantics; compute order keys. | ||||
| * **Generic**: if provider uses text, retain raw; do **not** invent ranges. | ||||
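As an illustration of canonicalizing range operators to intervals, the sketch below expands npm-style `^` and `~` ranges on a full `major.minor.patch` version. Treating the exclusive upper bound as `fixed` is an assumption made for the example, `ToInterval` is a hypothetical helper, and pre-release tags and partial versions are deliberately out of scope.

```csharp
using System;
using System.Linq;

// Hypothetical helper: expand "^x.y.z" or "~x.y.z" (npm semantics) into an
// inclusive lower bound and an exclusive upper bound.
static (string Introduced, string Fixed) ToInterval(string range)
{
    if (range.Length < 2 || (range[0] != '^' && range[0] != '~'))
        throw new ArgumentException("Expected a range starting with '^' or '~'.", nameof(range));

    int[] v = range.Substring(1).Split('.').Select(s => int.Parse(s)).ToArray();
    if (v.Length != 3)
        throw new ArgumentException("Expected major.minor.patch.", nameof(range));

    int major = v[0], minor = v[1], patch = v[2];
    string upper;
    if (range[0] == '~') upper = $"{major}.{minor + 1}.0";   // tilde: patch-level changes only
    else if (major > 0) upper = $"{major + 1}.0.0";          // caret: keep leftmost non-zero part
    else if (minor > 0) upper = $"0.{minor + 1}.0";
    else upper = $"0.0.{patch + 1}";

    return ($"{major}.{minor}.{patch}", upper);
}

foreach (var r in new[] { "^1.2.3", "^0.2.3", "^0.0.3", "~1.2.3" })
{
    var (introduced, fixedAt) = ToInterval(r);
    Console.WriteLine($"{r} => >= {introduced}, < {fixedAt}");
}
```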
|  | ||||
| ### 4.3 Severity & CVSS | ||||
|  | ||||
| * Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity). | ||||
| * If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable). | ||||
| * **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date). | ||||
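The default "max" policy for effective severity can be sketched as follows. The severity scale comes from the canonical model (`none` through `critical`); `EffectiveSeverity` is a hypothetical helper, and returning `none` for an empty input is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pick the highest severity among all recorded sources.
static string EffectiveSeverity(IEnumerable<string> severities)
{
    string[] scale = { "none", "low", "medium", "high", "critical" };
    int max = 0; // assumption: no input yields "none"
    foreach (var s in severities)
    {
        int i = Array.IndexOf(scale, s.ToLowerInvariant());
        if (i < 0) throw new ArgumentException($"Unknown severity '{s}'.");
        max = Math.Max(max, i);
    }
    return scale[max];
}

// Sample values are made up: three sources disagree on severity.
var effective = EffectiveSeverity(new[] { "medium", "High", "low" });
Console.WriteLine(effective);
```

A `vendorPreferred` or `distroPreferred` policy would replace the maximum with a lookup by source family, while the individual CVSS sets stay recorded either way.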
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Merge & Normalization | ||||
| ## 5) Merge engine | ||||
|  | ||||
| * Canonical model stored in `StellaOps.Feedser.Models` with serialization contracts used by storage/export layers. | ||||
| * `StellaOps.Feedser.Normalization` handles NEVRA/EVR/PURL range parsing, CVSS normalization, localization. | ||||
| * `StellaOps.Feedser.Merge` builds alias graphs keyed by CVE first, then falls back to vendor/regional IDs. | ||||
| * Precedence rules: PSIRT/OVAL overrides generic ranges; KEV only toggles exploitation; regional feeds enrich severity but don’t override vendor truth. | ||||
| * Determinism enforced via canonical JSON hashing logged in `merge_event`. | ||||
| ### 5.1 Keying & identity | ||||
|  | ||||
| * Identity graph: **CVE** is primary node; vendor/distro IDs resolved via **Alias** edges (from connectors and Feedser’s alias tables). | ||||
| * `advisoryKey` is the canonical primary key (CVE if present, else vendor/distro key). | ||||
|  | ||||
| ### 5.2 Merge algorithm (deterministic) | ||||
|  | ||||
| 1. **Gather** all rows for `advisoryKey` (across sources). | ||||
| 2. **Select title/summary** from the highest-precedence source (vendor > distro > ecosystem > cert). | ||||
| 3. **Union aliases** (dedupe by scheme+value). | ||||
| 4. **Merge `Affected`** with rules: | ||||
|  | ||||
|    * Prefer **vendor** ranges for vendor products; prefer **distro** for **distro‑shipped** packages. | ||||
|    * If both exist for same `productKey`, keep **both**; mark `sourceTag` and `precedence` so **Policy** can decide. | ||||
|    * Never collapse range semantics across different families (e.g., rpm EVR vs semver). | ||||
| 5. **CVSS/severity**: record all CVSS sets; compute **effectiveSeverity** = max (unless policy override). | ||||
| 6. **References**: union with type precedence (advisory > patch > kb > exploit > blog); dedupe by URL; preserve `sourceTag`. | ||||
| 7. Produce **canonical JSON**; compute **afterHash**; store **MergeEvent** with inputs and hashes. | ||||
|  | ||||
| > The merge is **pure** given inputs. Any change in inputs or precedence matrices changes the **hash** predictably. | ||||
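The hashing step can be illustrated with a minimal sketch of a canonical JSON hash. It assumes a flat record, ordinal key ordering, and SHA‑256; `CanonicalHash` is a hypothetical helper, and a real canonical writer would also need to order nested objects and normalize timestamps.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Hypothetical helper: serialize with a stable key order, then hash the bytes.
static string CanonicalHash(IDictionary<string, object?> record)
{
    // SortedDictionary enumerates keys in ordinal order regardless of insertion order.
    var sorted = new SortedDictionary<string, object?>(record, StringComparer.Ordinal);
    string json = JsonSerializer.Serialize(sorted);
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(json));
    return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
}

// The same logical record built in two different insertion orders.
var a = new Dictionary<string, object?> { ["title"] = "Example", ["advisoryKey"] = "CVE-2025-12345" };
var b = new Dictionary<string, object?> { ["advisoryKey"] = "CVE-2025-12345", ["title"] = "Example" };
Console.WriteLine(CanonicalHash(a) == CanonicalHash(b));
```

Comparing `beforeHash` and `afterHash` computed this way tells whether a merge changed the canonical record without diffing the documents themselves.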
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Storage schema (MongoDB) | ||||
|  | ||||
| **Collections & indexes** | ||||
|  | ||||
| * `source` `{_id, type, baseUrl, enabled, notes}` | ||||
| * `source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}` | ||||
| * `document` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}` | ||||
|  | ||||
|   * Index: `{sourceName:1, uri:1}` unique, `{fetchedAt:-1}` | ||||
| * `dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}` | ||||
|  | ||||
|   * Index: `{sourceName:1, documentId:1}` | ||||
| * `advisory` `{_id, advisoryKey, title, summary, published, modified, severity, cvss, exploitKnown, sources[]}` | ||||
|  | ||||
|   * Index: `{advisoryKey:1}` unique, `{modified:-1}`, `{severity:1}`, text index (title, summary) | ||||
| * `alias` `{advisoryId, scheme, value}` | ||||
|  | ||||
|   * Index: `{scheme:1,value:1}`, `{advisoryId:1}` | ||||
| * `affected` `{advisoryId, productKey, rangeKind, introduced?, fixed?, arch?, distro?, ecosystem?}` | ||||
|  | ||||
|   * Index: `{productKey:1}`, `{advisoryId:1}`, `{productKey:1, rangeKind:1}` | ||||
| * `reference` `{advisoryId, url, kind, sourceTag}` | ||||
|  | ||||
|   * Index: `{advisoryId:1}`, `{kind:1}` | ||||
| * `merge_event` `{advisoryKey, beforeHash, afterHash, mergedAt, inputs[]}` | ||||
|  | ||||
|   * Index: `{advisoryKey:1, mergedAt:-1}` | ||||
| * `export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}` | ||||
| * `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (a TTL index on `ttlAt` cleans up dead locks) | ||||
| * `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}` | ||||
|  | ||||
| **GridFS buckets**: `fs.documents` for raw payloads. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7) Exporters | ||||
|  | ||||
| * JSON exporter mirrors `aquasecurity/vuln-list` layout with deterministic ordering and reproducible timestamps. | ||||
| * Trivy DB exporter shells out to `trivy-db build`, produces Bolt archives, and reuses unchanged blobs from the last full baseline when running in delta mode. The exporter annotates `metadata.json` with `mode`, `baseExportId`, `baseManifestDigest`, `resetBaseline`, and `delta.changedFiles[]`/`delta.removedPaths[]`, and honours `publishFull` / `publishDelta` (ORAS) plus `includeFull` / `includeDelta` (offline bundle) toggles. | ||||
| * `StellaOps.Feedser.Storage.Mongo` provides cursors for delta exports based on `export_state.exportCursor` and the persisted per-file manifest (`export_state.files`). | ||||
| * Export jobs produce OCI tarballs (layer media type `application/vnd.aquasec.trivy.db.layer.v1.tar+gzip`) and optionally push via ORAS; `metadata.json` accompanies each layout so mirrors can decide between full refreshes and deltas. | ||||
| ### 7.1 Deterministic JSON (vuln‑list style) | ||||
|  | ||||
| * Folder structure mirroring `/<scheme>/<first-two>/<rest>/…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace. | ||||
| * `manifest.json` lists all files with SHA‑256 and a top‑level **export digest**. | ||||
|  | ||||
| ### 7.2 Trivy DB exporter | ||||
|  | ||||
| * Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes. | ||||
| * In delta, unchanged blobs are reused from the base; metadata captures: | ||||
|  | ||||
|   ``` | ||||
|   { | ||||
|     "mode": "delta|full", | ||||
|     "baseExportId": "...", | ||||
|     "baseManifestDigest": "sha256:...", | ||||
|     "changed": ["path1", "path2"], | ||||
|     "removed": ["path3"] | ||||
|   } | ||||
|   ``` | ||||
| * Optional ORAS push (OCI layout) for registries. | ||||
| * Offline kit bundles include Trivy DB + JSON tree + export manifest. | ||||
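The `changed` and `removed` lists in the delta metadata can be derived by comparing two manifest snapshots (path → sha256). The sketch below assumes that approach; `DiffManifests` and the sample paths and digests are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: compare the base manifest with the current one.
static (List<string> Changed, List<string> Removed) DiffManifests(
    IReadOnlyDictionary<string, string> baseFiles,
    IReadOnlyDictionary<string, string> currentFiles)
{
    // New paths and paths whose digest differs both count as changed.
    var changed = currentFiles
        .Where(kv => !baseFiles.TryGetValue(kv.Key, out var old) || old != kv.Value)
        .Select(kv => kv.Key)
        .OrderBy(p => p, StringComparer.Ordinal)
        .ToList();

    // Paths present in the base but absent now are removed.
    var removed = baseFiles.Keys
        .Where(p => !currentFiles.ContainsKey(p))
        .OrderBy(p => p, StringComparer.Ordinal)
        .ToList();

    return (changed, removed);
}

// Made-up manifests with shortened placeholder digests.
var baseManifest = new Dictionary<string, string>
{
    ["path1"] = "sha256:aaa", ["path2"] = "sha256:bbb", ["path3"] = "sha256:ccc",
};
var currentManifest = new Dictionary<string, string>
{
    ["path1"] = "sha256:aaa", ["path2"] = "sha256:ddd", ["path4"] = "sha256:eee",
};

var delta = DiffManifests(baseManifest, currentManifest);
Console.WriteLine($"changed: {string.Join(", ", delta.Changed)}");
Console.WriteLine($"removed: {string.Join(", ", delta.Removed)}");
```

Sorting both lists keeps the metadata byte-stable across runs, in line with the determinism goal for exports.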
|  | ||||
| ### 7.3 Hand‑off to Signer/Attestor (optional) | ||||
|  | ||||
| * On export completion, if `attest: true` is set in job args, Feedser **posts** the artifact metadata to **Signer**/**Attestor**; Feedser itself **does not** hold signing keys. | ||||
| * Export record stores returned `{ uuid, index, url }` from **Rekor v2**. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 8) Observability | ||||
| ## 8) REST APIs | ||||
|  | ||||
| * Serilog structured logging with enrichment fields (`source`, `uri`, `stage`, `durationMs`). | ||||
| * OpenTelemetry traces around fetch/parse/map/export; metrics for rate limit hits, schema failures, dedupe ratios, package size. Connector HTTP metrics are emitted via the shared `feedser.source.http.*` instruments tagged with `feedser.source=<connector>` so per-source dashboards slice on that label instead of bespoke metric names. | ||||
| * Prometheus scraping endpoint served by WebService. | ||||
| All under `/api/v1/feedser`. | ||||
|  | ||||
| **Health & status** | ||||
|  | ||||
| ``` | ||||
| GET  /healthz | /readyz | ||||
| GET  /status                              → sources, last runs, export cursors | ||||
| ``` | ||||
|  | ||||
| **Sources & jobs** | ||||
|  | ||||
| ``` | ||||
| GET  /sources                              → list of configured sources | ||||
| POST /sources/{name}/trigger               → { jobId } | ||||
| POST /sources/{name}/pause | /resume       → toggle | ||||
| GET  /jobs/{id}                            → job status | ||||
| ``` | ||||
|  | ||||
| **Exports** | ||||
|  | ||||
| ``` | ||||
| POST /exports/json   { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? } | ||||
| POST /exports/trivy  { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? } | ||||
| GET  /exports/{id}   → export metadata (kind, digest, createdAt, rekor?) | ||||
| ``` | ||||
|  | ||||
| **Search (operator debugging)** | ||||
|  | ||||
| ``` | ||||
| GET  /advisories/{key} | ||||
| GET  /advisories?scheme=CVE&value=CVE-2025-12345 | ||||
| GET  /affected?productKey=pkg:rpm/openssl&limit=100 | ||||
| ``` | ||||
|  | ||||
| **AuthN/Z:** Authority tokens (OpTok) with roles: `feedser.read`, `feedser.admin`, `feedser.export`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 9) Security Considerations | ||||
| ## 9) Configuration (YAML) | ||||
|  | ||||
| * Offline-first: connectors only reach allowlisted hosts. | ||||
| * BDU LLM fallback gated by config flag; logs audit trail with confidence score. | ||||
| * No secrets written to logs; secrets loaded via environment or mounted files. | ||||
| * Signing handled outside Feedser pipeline. | ||||
| ```yaml | ||||
| feedser: | ||||
|   mongo: { uri: "mongodb://mongo/feedser" } | ||||
|   s3: | ||||
|     endpoint: "http://minio:9000" | ||||
|     bucket: "stellaops-feedser" | ||||
|   scheduler: | ||||
|     windowSeconds: 30 | ||||
|     maxParallelSources: 4 | ||||
|   sources: | ||||
|     - name: redhat | ||||
|       kind: csaf | ||||
|       baseUrl: https://access.redhat.com/security/data/csaf/v2/ | ||||
|       signature: { type: pgp, keys: [ "…redhat PGP…" ] } | ||||
|       enabled: true | ||||
|       windowDays: 7 | ||||
|     - name: suse | ||||
|       kind: csaf | ||||
|       baseUrl: https://ftp.suse.com/pub/projects/security/csaf/ | ||||
|       signature: { type: pgp, keys: [ "…suse PGP…" ] } | ||||
|     - name: ubuntu | ||||
|       kind: usn-json | ||||
|       baseUrl: https://ubuntu.com/security/notices.json | ||||
|       signature: { type: none } | ||||
|     - name: osv | ||||
|       kind: osv | ||||
|       baseUrl: https://api.osv.dev/v1/ | ||||
|       signature: { type: none } | ||||
|     - name: ghsa | ||||
|       kind: ghsa | ||||
|       baseUrl: https://api.github.com/graphql | ||||
|       auth: { tokenRef: "env:GITHUB_TOKEN" } | ||||
|   exporters: | ||||
|     json: | ||||
|       enabled: true | ||||
|       output: s3://stellaops-feedser/json/ | ||||
|     trivy: | ||||
|       enabled: true | ||||
|       mode: full | ||||
|       output: s3://stellaops-feedser/trivy/ | ||||
|       oras: | ||||
|         enabled: false | ||||
|         repo: ghcr.io/org/feedser | ||||
|   precedence: | ||||
|     vendorWinsOverDistro: true | ||||
|     distroWinsOverOsv: true | ||||
|   severity: | ||||
|     policy: max    # or 'vendorPreferred' / 'distroPreferred' | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 10) Deployment Notes | ||||
| ## 10) Security & compliance | ||||
|  | ||||
| * **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible. | ||||
| * **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; **merge** can down‑weight or ignore them by config. | ||||
| * **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers. | ||||
| * **Multi‑tenant**: per‑tenant DBs or prefixes; per‑tenant S3 prefixes; tenant‑scoped API tokens. | ||||
| * **Determinism**: canonical JSON writer; export digests stable across runs given same inputs. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 11) Performance targets & scale | ||||
|  | ||||
| * **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON). | ||||
| * **Normalize/map**: ≥ 50k `Affected` rows/min on 4 cores. | ||||
| * **Merge**: ≤ 10 ms P95 per advisory at steady‑state updates. | ||||
| * **Export**: 1M advisories JSON in ≤ 90 s (streamed, zstd), Trivy DB in ≤ 60 s on 8 cores. | ||||
| * **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes. | ||||
|  | ||||
| **Scale pattern**: add Feedser replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 12) Observability | ||||
|  | ||||
| * **Metrics** | ||||
|  | ||||
|   * `feedser.fetch.docs_total{source}` | ||||
|   * `feedser.fetch.bytes_total{source}` | ||||
|   * `feedser.parse.failures_total{source}` | ||||
|   * `feedser.map.affected_total{source}` | ||||
|   * `feedser.merge.changed_total` | ||||
|   * `feedser.export.bytes{kind}` | ||||
|   * `feedser.export.duration_seconds{kind}` | ||||
| * **Tracing** around fetch/parse/map/merge/export. | ||||
| * **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 13) Testing matrix | ||||
|  | ||||
| * **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail). | ||||
| * **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, pre‑releases). | ||||
| * **Merge:** conflicting sources (vendor vs distro vs OSV); verify precedence & dual retention. | ||||
| * **Export determinism:** byte‑for‑byte stable outputs across runs; digest equality. | ||||
| * **Performance:** soak tests with 1M advisories; cap memory; verify backpressure. | ||||
| * **API:** pagination, filters, RBAC, error envelopes (RFC 7807). | ||||
| * **Offline kit:** bundle build & import correctness. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 14) Failure modes & recovery | ||||
|  | ||||
| * **Source outages:** the scheduler backs off with exponential delay, recorded in `source_state.backoffUntil`, and alerts on staleness. | ||||
| * **Schema drift:** the parse stage marks the DTO invalid; the job fails with clear diagnostics; connector version flags track supported schema ranges. | ||||
| * **Partial exports:** exporters write to temp prefix; **manifest commit** is atomic; only then move to final prefix and update `export_state`. | ||||
| * **Resume:** all stages idempotent; `source_state.cursor` supports window resume. | ||||
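The exponential back-off for source outages might be computed along these lines. This is a sketch under assumptions: the consecutive-failure counter, the base and maximum delays, and the `NextBackoffUntil` helper are all hypothetical; only the `backoffUntil` field comes from the schema.

```csharp
using System;

// Hypothetical helper: compute the next backoffUntil timestamp from the
// number of consecutive failures, doubling the delay each time up to a cap.
static DateTimeOffset NextBackoffUntil(
    DateTimeOffset now, int failCount, TimeSpan baseDelay, TimeSpan maxDelay)
{
    // Clamp the exponent so the multiplication cannot overflow.
    int exponent = Math.Min(Math.Max(failCount - 1, 0), 16);
    double seconds = baseDelay.TotalSeconds * Math.Pow(2, exponent);
    var delay = TimeSpan.FromSeconds(Math.Min(seconds, maxDelay.TotalSeconds));
    return now + delay;
}

var start = new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero);
for (int failures = 1; failures <= 5; failures++)
{
    var until = NextBackoffUntil(start, failures, TimeSpan.FromSeconds(30), TimeSpan.FromHours(1));
    Console.WriteLine($"failures={failures} delay={(until - start).TotalSeconds}s");
}
```

Capping the delay bounds how stale a recovering source can become before the next attempt; adding jitter would be a reasonable extension when many replicas share the same schedule.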
|  | ||||
| --- | ||||
|  | ||||
| ## 15) Operator runbook (quick) | ||||
|  | ||||
| * **Trigger all sources:** `POST /api/v1/feedser/sources/*/trigger` | ||||
| * **Force full export JSON:** `POST /api/v1/feedser/exports/json { "full": true, "force": true }` | ||||
| * **Force Trivy DB delta publish:** `POST /api/v1/feedser/exports/trivy { "full": false, "publish": true }` | ||||
| * **Inspect advisory:** `GET /api/v1/feedser/advisories?scheme=CVE&value=CVE-2025-12345` | ||||
| * **Pause noisy source:** `POST /api/v1/feedser/sources/osv/pause` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 16) Rollout plan | ||||
|  | ||||
| 1. **MVP**: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export. | ||||
| 2. **Add**: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export. | ||||
| 3. **Attestation hand‑off**: integrate with **Signer/Attestor** (optional). | ||||
| 4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse. | ||||
| 5. **Offline kit**: end‑to‑end verified bundles for air‑gap. | ||||
|  | ||||
| * Default storage MongoDB; for air-gapped, bundle Mongo image + seeded data backup. | ||||
| * Horizontal scale achieved via multiple web service instances sharing Mongo locks. | ||||
| * Provide `feedser.yaml` template describing sources, rate limits, and export settings. | ||||
|   | ||||