Rewrite architecture docs and add Vexer connector template

This commit is contained in:
2025-10-17 19:34:43 +03:00
parent 29a7d51e41
commit fbd1826ef3
25 changed files with 4885 additions and 777 deletions


@@ -1,190 +1,433 @@
# ARCHITECTURE.md — **StellaOps.Feedser**
# component_architecture_feedser.md — **StellaOps Feedser** (2025Q4)
> **Goal**: Build a sovereign-ready, self-hostable **feed-merge service** that ingests authoritative vulnerability sources, normalizes and de-duplicates them into **MongoDB**, and exports **JSON** and **Trivy-compatible DB** artifacts.
> **Form factor**: Long-running **Web Service** with **REST APIs** (health, status, control) and an embedded **internal cron scheduler**, controllable via StellaOps.Cli (`stella db ...`).
> **No signing inside Feedser** (signing is a separate pipeline step).
> **Runtime SDK baseline**: .NET 10 Preview 7 (SDK 10.0.100-preview.7.25380.108) targeting `net10.0`, aligned with the deployed api.stella-ops.org service.
> **Four explicit stages**:
>
> 1. **Source Download** → raw documents.
> 2. **Parse & Normalize** → schema-validated DTOs enriched with canonical identifiers.
> 3. **Merge & Deduplicate** → precedence-aware canonical records persisted to MongoDB.
> 4. **Export** → JSON or TrivyDB (full or delta), then (externally) sign/publish.
> **Scope.** Implementation-ready architecture for **Feedser**: the vulnerability ingest/normalize/merge/export subsystem that produces deterministic advisory data for the Scanner + Policy + Vexer pipeline. Covers domain model, connectors, merge rules, storage schema, exports, APIs, performance, security, and test matrices.
---
## 1) Naming & Solution Layout
## 0) Mission & boundaries
**Source connectors** namespace prefix: `StellaOps.Feedser.Source.*`
**Exporters**:
**Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), normalize them into a **canonical model**, reconcile aliases and version ranges, and export **deterministic artifacts** (JSON, Trivy DB) for fast backend joins.
* `StellaOps.Feedser.Exporter.Json`
* `StellaOps.Feedser.Exporter.TrivyDb`
**Boundaries.**
* Feedser **does not** sign with private keys. When attestation is required, the export artifact is handed to the **Signer**/**Attestor** pipeline (out-of-process).
* Feedser **does not** decide PASS/FAIL; it provides data to the **Policy** engine.
* Online operation is **allowlist-only**; air-gapped deployments use the **Offline Kit**.
---
## 1) Topology & processes
**Process shape:** single ASP.NET Core service `StellaOps.Feedser.WebService` hosting:
* **Scheduler** with distributed locks (Mongo-backed).
* **Connectors** (fetch/parse/map).
* **Merger** (canonical record assembly + precedence).
* **Exporters** (JSON, Trivy DB).
* **Minimal REST** for health/status/trigger/export.
**Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter.
---
## 2) Canonical domain model
> Stored in MongoDB (database `feedser`), serialized with a **canonical JSON** writer (stable order, camelCase, normalized timestamps).
### 2.1 Core entities
**Advisory**
```
advisoryId // internal GUID
advisoryKey // stable string key (e.g., CVE-2025-12345 or vendor ID)
title // short title (best-of from sources)
summary // normalized summary (English; i18n optional)
published // earliest source timestamp
modified // latest source timestamp
severity // normalized {none, low, medium, high, critical}
cvss // {v2?, v3?, v4?} objects (vector, baseScore, severity, source)
exploitKnown // bool (e.g., KEV/active exploitation flags)
references[] // typed links (advisory, kb, patch, vendor, exploit, blog)
sources[] // provenance for traceability (doc digests, URIs)
```
**Projects** (`/src`):
```
StellaOps.Feedser.WebService/ # ASP.NET Core (Minimal API, net10.0 preview) WebService + embedded scheduler
StellaOps.Feedser.Core/ # Domain models, pipelines, merge/dedupe engine, jobs orchestration
StellaOps.Feedser.Models/ # Canonical POCOs, JSON Schemas, enums
StellaOps.Feedser.Storage.Mongo/ # Mongo repositories, GridFS access, indexes, resume "flags"
StellaOps.Feedser.Source.Common/ # HTTP clients, rate-limiters, schema validators, parser utils
StellaOps.Feedser.Source.Cve/
StellaOps.Feedser.Source.Nvd/
StellaOps.Feedser.Source.Ghsa/
StellaOps.Feedser.Source.Osv/
StellaOps.Feedser.Source.Jvn/
StellaOps.Feedser.Source.CertCc/
StellaOps.Feedser.Source.Kev/
StellaOps.Feedser.Source.Kisa/
StellaOps.Feedser.Source.CertIn/
StellaOps.Feedser.Source.CertFr/
StellaOps.Feedser.Source.CertBund/
StellaOps.Feedser.Source.Acsc/
StellaOps.Feedser.Source.Cccs/
StellaOps.Feedser.Source.Ru.Bdu/ # HTML→schema with LLM fallback (gated)
StellaOps.Feedser.Source.Ru.Nkcki/ # PDF/HTML bulletins → structured
StellaOps.Feedser.Source.Vndr.Msrc/
StellaOps.Feedser.Source.Vndr.Cisco/
StellaOps.Feedser.Source.Vndr.Oracle/
StellaOps.Feedser.Source.Vndr.Adobe/ # APSB ingest; emits vendor RangePrimitives with adobe.track/platform/priority telemetry + fixed-status provenance.
StellaOps.Feedser.Source.Vndr.Apple/
StellaOps.Feedser.Source.Vndr.Chromium/
StellaOps.Feedser.Source.Vndr.Vmware/
StellaOps.Feedser.Source.Distro.RedHat/
StellaOps.Feedser.Source.Distro.Debian/ # Fetches DSA list + detail HTML, emits EVR RangePrimitives with per-release provenance and telemetry.
StellaOps.Feedser.Source.Distro.Ubuntu/ # Ubuntu Security Notices connector (JSON index → EVR ranges with ubuntu.pocket telemetry).
StellaOps.Feedser.Source.Distro.Suse/ # CSAF fetch pipeline emitting NEVRA RangePrimitives with suse.status vendor telemetry.
StellaOps.Feedser.Source.Ics.Cisa/
StellaOps.Feedser.Source.Ics.Kaspersky/
StellaOps.Feedser.Normalization/ # Canonical mappers, validators, version-range normalization
StellaOps.Feedser.Merge/ # Identity graph, precedence, deterministic merge
StellaOps.Feedser.Exporter.Json/
StellaOps.Feedser.Exporter.TrivyDb/
StellaOps.Feedser.<Component>.Tests/ # Component-scoped unit/integration suites (Core, Storage.Mongo, Source.*, Exporter.*, WebService, etc.)
```
---
**Alias**
## 2) Runtime Shape
```
advisoryId
scheme // CVE, GHSA, RHSA, DSA, USN, MSRC, etc.
value // e.g., "CVE-2025-12345"
```
**Process**: single service (`StellaOps.Feedser.WebService`)
**Affected**
* `Program.cs`: top-level entry using **Generic Host**, **DI**, **Options** binding from `appsettings.json` + environment + optional `feedser.yaml`.
* Built-in **scheduler** (cron-like) + **job manager** with **distributed locks** in Mongo to prevent overlaps, enforce timeouts, allow cancel/kill.
* **REST APIs** for health/readiness/progress/trigger/kill/status.
```
advisoryId
productKey // canonical product identity (see 2.2)
rangeKind // semver | evr | nvra | apk | rpm | deb | generic | exact
introduced? // string (format depends on rangeKind)
fixed? // string (format depends on rangeKind)
lastKnownSafe? // optional explicit safe floor
arch? // arch or platform qualifier if source declares (x86_64, aarch64)
distro? // distro qualifier when applicable (rhel:9, debian:12, alpine:3.19)
ecosystem? // npm|pypi|maven|nuget|golang|…
notes? // normalized notes per source
```
**Key NuGet concepts** (indicative): `MongoDB.Driver`, `Polly` (retry/backoff), `System.Threading.Channels`, `Microsoft.Extensions.Http`, `Microsoft.Extensions.Hosting`, `Serilog`, `OpenTelemetry`.
**Reference**
```
advisoryId
url
kind // advisory | patch | kb | exploit | mitigation | blog | cvrf | csaf
sourceTag // e.g., vendor/redhat, distro/debian, oss/ghsa
```
**MergeEvent**
```
advisoryKey
beforeHash // canonical JSON hash before merge
afterHash // canonical JSON hash after merge
mergedAt
inputs[] // source doc digests that contributed
```
**ExportState**
```
exportKind // json | trivydb
baseExportId? // last full baseline
baseDigest? // digest of last full baseline
lastFullDigest? // digest of last full export
lastDeltaDigest? // digest of last delta export
cursor // per-kind incremental cursor
files[] // last manifest snapshot (path → sha256)
```
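For orientation, the entities above might map onto canonical records roughly as follows. This is an illustrative sketch only; the shipped `StellaOps.Feedser.Models` contracts will differ in naming, nullability, and nesting.

```csharp
using System;
using System.Collections.Generic;

// Illustrative canonical records; the real StellaOps.Feedser.Models types may
// differ in shape, naming, and validation attributes.
public sealed record Advisory(
    Guid AdvisoryId,
    string AdvisoryKey,                 // e.g. "CVE-2025-12345" or a vendor/distro key
    string Title,
    string? Summary,
    DateTimeOffset? Published,          // earliest source timestamp
    DateTimeOffset? Modified,           // latest source timestamp
    string Severity,                    // none | low | medium | high | critical
    IReadOnlyList<CvssMetric> Cvss,
    bool ExploitKnown,
    IReadOnlyList<AdvisoryReference> References,
    IReadOnlyList<SourceProvenance> Sources);

public sealed record Affected(
    Guid AdvisoryId,
    string ProductKey,                  // purl / cpe / oci:<...> / native:<provider>:<id>
    string RangeKind,                   // semver | evr | nvra | apk | rpm | deb | generic | exact
    string? Introduced,
    string? Fixed,
    string? Arch,
    string? Distro,
    string? Ecosystem);

public sealed record CvssMetric(string Version, string Vector, double BaseScore, string Severity, string Source);
public sealed record AdvisoryReference(Uri Url, string Kind, string SourceTag);
public sealed record SourceProvenance(string SourceName, Uri DocumentUri, string DocumentSha256, DateTimeOffset FetchedAt);
```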
### 2.2 Product identity (`productKey`)
* **Primary:** `purl` (Package URL).
* **OS packages:** RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with **EVR/NVRA** preserved.
* **Secondary:** `cpe` retained for compatibility; advisory records may carry both.
* **Image/platform:** `oci:<registry>/<repo>@<digest>` for image-level advisories (rare).
* **Unmappable:** if a source is non-deterministic, keep the native string under `productKey="native:<provider>:<id>"` and mark it **non-joinable**.
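As an example of the purl-first identity, a tiny helper that renders an RPM NEVRA as a `productKey`. The namespace and qualifier layout follow common `pkg:rpm` purl conventions and are an assumption here, not the normative Feedser mapping.

```csharp
// Hypothetical helper: render an RPM NEVRA as a purl-style productKey.
// Namespace/qualifier layout follows common pkg:rpm conventions (assumption).
public static class ProductKeys
{
    public static string FromRpm(string distroNamespace, string name, int epoch, string version, string release, string arch)
    {
        var epochQualifier = epoch > 0 ? $"&epoch={epoch}" : string.Empty;
        return $"pkg:rpm/{distroNamespace}/{name}@{version}-{release}?arch={arch}{epochQualifier}";
    }
}

// Example:
//   ProductKeys.FromRpm("redhat", "openssl", 1, "3.0.7", "18.el9_2", "x86_64")
//   => "pkg:rpm/redhat/openssl@3.0.7-18.el9_2?arch=x86_64&epoch=1"
```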
---
## 3) Data Storage — **MongoDB** (single source of truth)
## 3) Source families & precedence
**Database**: `feedser`
**Write concern**: `majority` for merge/export state, `acknowledged` for raw docs.
**Collections** (with “flags”/resume points):
### 3.1 Families
* `source`
* `_id`, `name`, `type`, `baseUrl`, `auth`, `notes`.
* `source_state`
* Keys: `sourceName` (unique), `enabled`, `cursor`, `lastSuccess`, `failCount`, `backoffUntil`, `paceOverrides`, `paused`.
* Drives incremental fetch/parse/map resume and operator pause/pace controls.
* `document`
* `_id`, `sourceName`, `uri`, `fetchedAt`, `sha256`, `contentType`, `status`, `metadata`, `gridFsId`, `etag`, `lastModified`.
* Index `{sourceName:1, uri:1}` unique; optional TTL for superseded versions.
* `dto`
* `_id`, `sourceName`, `documentId`, `schemaVer`, `payload` (BSON), `validatedAt`.
* Index `{sourceName:1, documentId:1}`.
* `advisory`
* `_id`, `advisoryKey`, `title`, `summary`, `lang`, `published`, `modified`, `severity`, `exploitKnown`.
* Unique `{advisoryKey:1}` plus indexes on `modified` and `published`.
* `alias`
* `advisoryId`, `scheme`, `value` with index `{scheme:1, value:1}`.
* `affected`
* `advisoryId`, `platform`, `name`, `versionRange`, `cpe`, `purl`, `fixedBy`, `introducedVersion`.
* Index `{platform:1, name:1}`, `{advisoryId:1}`.
* `reference`
* `advisoryId`, `url`, `kind`, `sourceTag` (e.g., advisory/patch/kb).
* Flags collections: `kev_flag`, `ru_flags`, `jp_flags`, `psirt_flags` keyed by `advisoryId`.
* `merge_event`
* `_id`, `advisoryKey`, `beforeHash`, `afterHash`, `mergedAt`, `inputs` (document ids).
* `export_state`
* `_id` (`json`/`trivydb`), `baseExportId`, `baseDigest`, `lastFullDigest`, `lastDeltaDigest`, `exportCursor`, `targetRepo`, `exporterVersion`.
* `locks`
* `_id` (`jobKey`), `holder`, `acquiredAt`, `heartbeatAt`, `leaseMs`, `ttlAt` (TTL index cleans dead locks).
* `jobs`
* `_id`, `type`, `args`, `state`, `startedAt`, `endedAt`, `error`, `owner`, `heartbeatAt`, `timeoutMs`.
* **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium…
* **Linux distros**: Red Hat, SUSE, Ubuntu, Debian, Alpine…
* **OSS ecosystems**: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go.
* **CERTs / national CSIRTs**: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERT-FR, CERT-Bund, etc.
**GridFS buckets**: `fs.documents` for raw large payloads; referenced by `document.gridFsId`.
### 3.2 Precedence (when claims conflict)
1. **Vendor PSIRT** (authoritative for their product).
2. **Distro** (authoritative for packages they ship, including backports).
3. **Ecosystem** (OSV/GHSA) for library semantics.
4. **CERTs/aggregators** for enrichment (KEV/known exploited).
> Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**.
---
## 4) Job & Scheduler Model
## 4) Connectors & normalization
* Scheduler stores cron expressions per source/exporter in config; persists next-run pointers in Mongo.
* Jobs acquire locks (`locks` collection) to ensure singleton execution per source/exporter.
* Supports manual triggers via API endpoints (`POST /jobs/{type}`) and pause/resume toggles per source.
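A Mongo-backed lease acquisition for the `locks` collection might look like the sketch below. Collection and field names mirror the storage schema in this document; the class shape and the lease-stealing filter are assumptions, not the shipped scheduler code.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

// Illustrative lease-based lock over the `locks` collection: a job runs only if
// no live lease exists for its jobKey; dead documents are also reaped by the
// TTL index on `ttlAt`.
public sealed class JobLock
{
    private readonly IMongoCollection<BsonDocument> _locks;
    public JobLock(IMongoDatabase db) => _locks = db.GetCollection<BsonDocument>("locks");

    public bool TryAcquire(string jobKey, string holder, TimeSpan lease)
    {
        var now = DateTime.UtcNow;
        var filter = Builders<BsonDocument>.Filter.And(
            Builders<BsonDocument>.Filter.Eq("_id", jobKey),
            Builders<BsonDocument>.Filter.Lt("ttlAt", now));   // only take over expired leases
        var update = Builders<BsonDocument>.Update
            .Set("holder", holder)
            .Set("acquiredAt", now)
            .Set("heartbeatAt", now)
            .Set("leaseMs", (long)lease.TotalMilliseconds)
            .Set("ttlAt", now.Add(lease));
        try
        {
            // Upsert: inserts when no lock document exists, updates when the lease has expired.
            var result = _locks.UpdateOne(filter, update, new UpdateOptions { IsUpsert = true });
            return result.ModifiedCount == 1 || result.UpsertedId != null;
        }
        catch (MongoWriteException e) when (e.WriteError.Category == ServerErrorCategory.DuplicateKey)
        {
            return false; // someone else holds a live lease for this jobKey
        }
    }
}
```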
---
## 5) Connector Contracts
Connectors implement:
### 4.1 Connector contract
```csharp
public interface IFeedConnector {
  string SourceName { get; }
  Task FetchAsync(IServiceProvider sp, CancellationToken ct); // -> document collection
  Task ParseAsync(IServiceProvider sp, CancellationToken ct); // -> dto collection (validated)
  Task MapAsync(IServiceProvider sp, CancellationToken ct); // -> advisory/alias/affected/reference
}
```
* Fetch populates `document` rows respecting rate limits, conditional GET, and `source_state.cursor`.
* Parse validates schema (JSON Schema, XSD) and writes sanitized DTO payloads.
* Map produces canonical advisory rows + provenance entries; must be idempotent.
* Base helpers in `StellaOps.Feedser.Source.Common` provide HTTP clients, retry policies, and watermark utilities.
* **Fetch**: windowed (cursor), conditional GET (ETag/LastModified), retry/backoff, rate limiting.
* **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing.
* **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors).
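A hedged skeleton of a connector satisfying the contract above; the endpoint URL is a placeholder and the persistence calls are intentionally elided, since real connectors use the `StellaOps.Feedser.Source.Common` and `Storage.Mongo` helpers.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

// Skeleton connector wired to the IFeedConnector contract above; storage and
// cursor handling are elided on purpose.
public sealed class ExampleConnector : IFeedConnector
{
    public string SourceName => "example";

    public async Task FetchAsync(IServiceProvider sp, CancellationToken ct)
    {
        var http = sp.GetRequiredService<IHttpClientFactory>().CreateClient(SourceName);
        // Windowed fetch with conditional GET (ETag/Last-Modified) would be driven by
        // source_state.cursor; raw payloads land in `document` with their sha256.
        using var response = await http.GetAsync("https://example.invalid/advisories.json", ct);
        response.EnsureSuccessStatusCode();
        _ = await response.Content.ReadAsByteArrayAsync(ct);
    }

    public Task ParseAsync(IServiceProvider sp, CancellationToken ct) =>
        Task.CompletedTask; // validate raw documents against the provider schema, persist sanitized DTOs

    public Task MapAsync(IServiceProvider sp, CancellationToken ct) =>
        Task.CompletedTask; // project DTOs into advisory/alias/affected/reference rows with provenance (idempotent)
}
```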
### 4.2 Version range normalization
* **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges (use `~`, `^`, `<`, `>=` canonicalized to intervals).
* **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query.
* **DEB**: dpkg version comparison semantics mirrored; store computed keys.
* **APK**: Alpine version semantics; compute order keys.
* **Generic**: if provider uses text, retain raw; do **not** invent ranges.
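As a worked example of canonicalizing ranges to intervals, a simplified sketch for caret/tilde SemVer constraints. It handles plain `MAJOR.MINOR.PATCH` only; prereleases, wildcards, and comparison operators are out of scope for the sketch.

```csharp
using System;

// Simplified caret/tilde canonicalization to [introduced, fixedExclusive) intervals.
public static class SemVerRange
{
    public static (string Introduced, string FixedExclusive) Normalize(string range)
    {
        var op = range[0];
        var parts = range.TrimStart('^', '~').Split('.');
        int major = int.Parse(parts[0]), minor = int.Parse(parts[1]), patch = int.Parse(parts[2]);

        string upper = op switch
        {
            // ^1.2.3 -> <2.0.0, ^0.2.3 -> <0.3.0, ^0.0.3 -> <0.0.4
            '^' when major > 0 => $"{major + 1}.0.0",
            '^' when minor > 0 => $"0.{minor + 1}.0",
            '^'                => $"0.0.{patch + 1}",
            // ~1.2.3 -> <1.3.0
            '~'                => $"{major}.{minor + 1}.0",
            _ => throw new ArgumentException($"Unsupported range operator in '{range}'.")
        };
        return ($"{major}.{minor}.{patch}", upper);
    }
}

// Example: SemVerRange.Normalize("^1.2.3") => ("1.2.3", "2.0.0")
```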
### 4.3 Severity & CVSS
* Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity).
* If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable).
* **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date).
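A minimal sketch of the default "max wins" policy over normalized severity labels; the label ordering and helper shape are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the default effective-severity policy: the maximum credible
// severity across all recorded labels wins, unless policy overrides.
public static class SeverityPolicy
{
    private static readonly string[] Order = { "none", "low", "medium", "high", "critical" };

    public static string EffectiveSeverity(IEnumerable<string> severities)
    {
        var maxIndex = severities
            .Select(s => Array.IndexOf(Order, s.Trim().ToLowerInvariant()))
            .Where(i => i >= 0)        // ignore labels we do not recognize
            .DefaultIfEmpty(0)         // no credible severity -> "none"
            .Max();
        return Order[maxIndex];
    }
}

// Example: SeverityPolicy.EffectiveSeverity(new[] { "medium", "critical", "high" }) => "critical"
```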
---
## 6) Merge & Normalization
## 5) Merge engine
* Canonical model stored in `StellaOps.Feedser.Models` with serialization contracts used by storage/export layers.
* `StellaOps.Feedser.Normalization` handles NEVRA/EVR/PURL range parsing, CVSS normalization, localization.
* `StellaOps.Feedser.Merge` builds alias graphs keyed by CVE first, then falls back to vendor/regional IDs.
* Precedence rules: PSIRT/OVAL overrides generic ranges; KEV only toggles exploitation; regional feeds enrich severity but don't override vendor truth.
* Determinism enforced via canonical JSON hashing logged in `merge_event`.
### 5.1 Keying & identity
* Identity graph: **CVE** is the primary node; vendor/distro IDs are resolved via **Alias** edges (from connectors and Feedser's alias tables).
* `advisoryKey` is the canonical primary key (CVE if present, else vendor/distro key).
### 5.2 Merge algorithm (deterministic)
1. **Gather** all rows for `advisoryKey` (across sources).
2. **Select title/summary** by precedence source (vendor > distro > ecosystem > cert).
3. **Union aliases** (dedupe by scheme+value).
4. **Merge `Affected`** with rules:
* Prefer **vendor** ranges for vendor products; prefer **distro** for **distro-shipped** packages.
* If both exist for same `productKey`, keep **both**; mark `sourceTag` and `precedence` so **Policy** can decide.
* Never collapse range semantics across different families (e.g., rpm EVR vs semver).
5. **CVSS/severity**: record all CVSS sets; compute **effectiveSeverity** = max (unless policy override).
6. **References**: union with type precedence (advisory > patch > kb > exploit > blog); dedupe by URL; preserve `sourceTag`.
7. Produce **canonical JSON**; compute **afterHash**; store **MergeEvent** with inputs and hashes.
> The merge is **pure** given inputs. Any change in inputs or precedence matrices changes the **hash** predictably.
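To make the determinism guarantee concrete, here is a sketch of a canonical-JSON digest that sorts object keys recursively over `System.Text.Json` nodes. The shipped canonical writer also normalizes casing and timestamps, which this sketch omits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

// Canonical-JSON hash sketch: the same logical advisory always yields the same
// digest (the beforeHash/afterHash recorded in merge_event).
public static class CanonicalHash
{
    public static string Sha256Of(JsonNode node)
    {
        var canonicalJson = Canonicalize(node)!.ToJsonString();
        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }

    private static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        null => null,                                        // JSON null stays null
        JsonObject obj => new JsonObject(obj
            .OrderBy(p => p.Key, StringComparer.Ordinal)     // stable key order
            .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Canonicalize).ToArray()),
        _ => node.DeepClone()                                // leaf values copied as-is
    };
}

// Example:
//   var a = JsonNode.Parse("{\"b\":1,\"a\":2}");
//   var b = JsonNode.Parse("{\"a\":2,\"b\":1}");
//   CanonicalHash.Sha256Of(a!) == CanonicalHash.Sha256Of(b!)   // true
```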
---
## 6) Storage schema (MongoDB)
**Collections & indexes**
* `source` `{_id, type, baseUrl, enabled, notes}`
* `source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}`
* `document` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}`
* Index: `{sourceName:1, uri:1}` unique, `{fetchedAt:-1}`
* `dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}`
* Index: `{sourceName:1, documentId:1}`
* `advisory` `{_id, advisoryKey, title, summary, published, modified, severity, cvss, exploitKnown, sources[]}`
* Index: `{advisoryKey:1}` unique, `{modified:-1}`, `{severity:1}`, text index (title, summary)
* `alias` `{advisoryId, scheme, value}`
* Index: `{scheme:1,value:1}`, `{advisoryId:1}`
* `affected` `{advisoryId, productKey, rangeKind, introduced?, fixed?, arch?, distro?, ecosystem?}`
* Index: `{productKey:1}`, `{advisoryId:1}`, `{productKey:1, rangeKind:1}`
* `reference` `{advisoryId, url, kind, sourceTag}`
* Index: `{advisoryId:1}`, `{kind:1}`
* `merge_event` `{advisoryKey, beforeHash, afterHash, mergedAt, inputs[]}`
* Index: `{advisoryKey:1, mergedAt:-1}`
* `export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}`
* `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks)
* `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}`
**GridFS buckets**: `fs.documents` for raw payloads.
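An illustrative index bootstrap for a few of the collections above, using `MongoDB.Driver`; the real `Storage.Mongo` migrations may declare more indexes and different options.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

// Illustrative index bootstrap for a subset of the collections.
public static class FeedserIndexes
{
    public static void Ensure(IMongoDatabase db)
    {
        var advisory = db.GetCollection<BsonDocument>("advisory");
        advisory.Indexes.CreateMany(new[]
        {
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Ascending("advisoryKey"),
                new CreateIndexOptions { Unique = true }),
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Descending("modified")),
        });

        var document = db.GetCollection<BsonDocument>("document");
        document.Indexes.CreateOne(new CreateIndexModel<BsonDocument>(
            Builders<BsonDocument>.IndexKeys.Ascending("sourceName").Ascending("uri"),
            new CreateIndexOptions { Unique = true }));

        // TTL index on locks.ttlAt: Mongo removes dead lock documents shortly after expiry.
        var locks = db.GetCollection<BsonDocument>("locks");
        locks.Indexes.CreateOne(new CreateIndexModel<BsonDocument>(
            Builders<BsonDocument>.IndexKeys.Ascending("ttlAt"),
            new CreateIndexOptions { ExpireAfter = TimeSpan.Zero }));
    }
}
```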
---
## 7) Exporters
* JSON exporter mirrors `aquasecurity/vuln-list` layout with deterministic ordering and reproducible timestamps.
* Trivy DB exporter shells out to `trivy-db build`, produces Bolt archives, and reuses unchanged blobs from the last full baseline when running in delta mode. The exporter annotates `metadata.json` with `mode`, `baseExportId`, `baseManifestDigest`, `resetBaseline`, and `delta.changedFiles[]`/`delta.removedPaths[]`, and honours `publishFull` / `publishDelta` (ORAS) plus `includeFull` / `includeDelta` (offline bundle) toggles.
* `StellaOps.Feedser.Storage.Mongo` provides cursors for delta exports based on `export_state.exportCursor` and the persisted per-file manifest (`export_state.files`).
* Export jobs produce OCI tarballs (layer media type `application/vnd.aquasec.trivy.db.layer.v1.tar+gzip`) and optionally push via ORAS; `metadata.json` accompanies each layout so mirrors can decide between full refreshes and deltas.
### 7.1 Deterministic JSON (vuln-list style)
* Folder structure mirroring `/<scheme>/<first-two>/<rest>/…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace.
* `manifest.json` lists all files with SHA256 and a top-level **export digest**.
### 7.2 Trivy DB exporter
* Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes.
* In delta, unchanged blobs are reused from the base; metadata captures:
```
{
"mode": "delta|full",
"baseExportId": "...",
"baseManifestDigest": "sha256:...",
"changed": ["path1", "path2"],
"removed": ["path3"]
}
```
* Optional ORAS push (OCI layout) for registries.
* Offline kit bundles include Trivy DB + JSON tree + export manifest.
### 7.3 Handoff to Signer/Attestor (optional)
* On export completion, if `attest: true` is set in job args, Feedser **posts** the artifact metadata to **Signer**/**Attestor**; Feedser itself **does not** hold signing keys.
* Export record stores returned `{ uuid, index, url }` from **Rekor v2**.
---
## 8) Observability
## 8) REST APIs
* Serilog structured logging with enrichment fields (`source`, `uri`, `stage`, `durationMs`).
* OpenTelemetry traces around fetch/parse/map/export; metrics for rate limit hits, schema failures, dedupe ratios, package size. Connector HTTP metrics are emitted via the shared `feedser.source.http.*` instruments tagged with `feedser.source=<connector>` so per-source dashboards slice on that label instead of bespoke metric names.
* Prometheus scraping endpoint served by WebService.
All under `/api/v1/feedser`.
**Health & status**
```
GET /healthz | /readyz
GET /status → sources, last runs, export cursors
```
**Sources & jobs**
```
GET /sources → list of configured sources
POST /sources/{name}/trigger → { jobId }
POST /sources/{name}/pause | /resume → toggle
GET /jobs/{id} → job status
```
**Exports**
```
POST /exports/json { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? }
POST /exports/trivy { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? }
GET /exports/{id} → export metadata (kind, digest, createdAt, rekor?)
```
**Search (operator debugging)**
```
GET /advisories/{key}
GET /advisories?scheme=CVE&value=CVE-2025-12345
GET /affected?productKey=pkg:rpm/openssl&limit=100
```
**AuthN/Z:** Authority tokens (OpTok) with roles: `feedser.read`, `feedser.admin`, `feedser.export`.
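A minimal client-side sketch of triggering a full JSON export with an Authority token carrying `feedser.export`; the endpoint path and payload follow the API sketch above, while token acquisition and error handling are out of scope.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Threading.Tasks;

// Illustrative caller for POST /api/v1/feedser/exports/json.
public static class FeedserClient
{
    public static async Task TriggerJsonExportAsync(Uri baseAddress, string opTok)
    {
        using var http = new HttpClient { BaseAddress = baseAddress };
        http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", opTok);

        var response = await http.PostAsJsonAsync("/api/v1/feedser/exports/json",
            new { full = true, force = false, attest = false });
        response.EnsureSuccessStatusCode();

        Console.WriteLine(await response.Content.ReadAsStringAsync()); // { exportId, digest, rekor? }
    }
}
```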
---
## 9) Security Considerations
## 9) Configuration (YAML)
* Offline-first: connectors only reach allowlisted hosts.
* BDU LLM fallback gated by config flag; logs audit trail with confidence score.
* No secrets written to logs; secrets loaded via environment or mounted files.
* Signing handled outside Feedser pipeline.
```yaml
feedser:
mongo: { uri: "mongodb://mongo/feedser" }
s3:
endpoint: "http://minio:9000"
bucket: "stellaops-feedser"
scheduler:
windowSeconds: 30
maxParallelSources: 4
sources:
- name: redhat
kind: csaf
baseUrl: https://access.redhat.com/security/data/csaf/v2/
signature: { type: pgp, keys: [ "…redhat PGP…" ] }
enabled: true
windowDays: 7
- name: suse
kind: csaf
baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
signature: { type: pgp, keys: [ "…suse PGP…" ] }
- name: ubuntu
kind: usn-json
baseUrl: https://ubuntu.com/security/notices.json
signature: { type: none }
- name: osv
kind: osv
baseUrl: https://api.osv.dev/v1/
signature: { type: none }
- name: ghsa
kind: ghsa
baseUrl: https://api.github.com/graphql
auth: { tokenRef: "env:GITHUB_TOKEN" }
exporters:
json:
enabled: true
output: s3://stellaops-feedser/json/
trivy:
enabled: true
mode: full
output: s3://stellaops-feedser/trivy/
oras:
enabled: false
repo: ghcr.io/org/feedser
precedence:
vendorWinsOverDistro: true
distroWinsOverOsv: true
severity:
policy: max # or 'vendorPreferred' / 'distroPreferred'
```
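At startup, the `feedser:` section could be bound to strongly typed options roughly as sketched below; the option class names and defaults are illustrative, not the shipped configuration contract.

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

// Illustrative options classes mirroring a slice of the YAML above.
public sealed class FeedserOptions
{
    public MongoOptions Mongo { get; set; } = new();
    public SchedulerOptions Scheduler { get; set; } = new();

    public sealed class MongoOptions
    {
        public string Uri { get; set; } = "mongodb://mongo/feedser";
    }

    public sealed class SchedulerOptions
    {
        public int WindowSeconds { get; set; } = 30;
        public int MaxParallelSources { get; set; } = 4;
    }
}

public static class FeedserOptionsSetup
{
    // Called from Program.cs after configuration (appsettings + env + feedser.yaml) is composed.
    public static IServiceCollection AddFeedserOptions(this IServiceCollection services, IConfiguration config) =>
        services.Configure<FeedserOptions>(config.GetSection("feedser"));
}
```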
---
## 10) Deployment Notes
## 10) Security & compliance
* **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible.
* **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; **merge** can downweight or ignore them by config.
* **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers.
* **Multi-tenant**: per-tenant DBs or prefixes; per-tenant S3 prefixes; tenant-scoped API tokens.
* **Determinism**: canonical JSON writer; export digests stable across runs given same inputs.
---
## 11) Performance targets & scale
* **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
* **Normalize/map**: ≥ 50k `Affected` rows/min on 4 cores.
* **Merge**: ≤ 10 ms P95 per advisory at steady-state updates.
* **Export**: 1M advisories JSON in ≤ 90s (streamed, zstd), Trivy DB in ≤ 60s on 8 cores.
* **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.
**Scale pattern**: add Feedser replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs.
---
## 12) Observability
* **Metrics**
* `feedser.fetch.docs_total{source}`
* `feedser.fetch.bytes_total{source}`
* `feedser.parse.failures_total{source}`
* `feedser.map.affected_total{source}`
* `feedser.merge.changed_total`
* `feedser.export.bytes{kind}`
* `feedser.export.duration_seconds{kind}`
* **Tracing** around fetch/parse/map/merge/export.
* **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`.
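The counters above could be registered with `System.Diagnostics.Metrics` roughly as follows; instrument shapes and tag usage are illustrative, and the service exports them via OpenTelemetry.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Illustrative instrument registration for the metric names listed above.
public static class FeedserMetrics
{
    private static readonly Meter FeedserMeter = new("StellaOps.Feedser");

    public static readonly Counter<long> FetchDocs =
        FeedserMeter.CreateCounter<long>("feedser.fetch.docs_total", description: "Documents fetched, tagged by source");
    public static readonly Counter<long> ParseFailures =
        FeedserMeter.CreateCounter<long>("feedser.parse.failures_total", description: "Schema/parse failures, tagged by source");
    public static readonly Histogram<double> ExportDuration =
        FeedserMeter.CreateHistogram<double>("feedser.export.duration_seconds", unit: "s");

    public static void RecordFetched(string source, long count) =>
        FetchDocs.Add(count, new KeyValuePair<string, object?>("source", source));
}
```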
---
## 13) Testing matrix
* **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail).
* **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, prereleases).
* **Merge:** conflicting sources (vendor vs distro vs OSV); verify precedence & dual retention.
* **Export determinism:** byte-for-byte stable outputs across runs; digest equality.
* **Performance:** soak tests with 1M advisories; cap memory; verify backpressure.
* **API:** pagination, filters, RBAC, error envelopes (RFC 7807).
* **Offline kit:** bundle build & import correctness.
---
## 14) Failure modes & recovery
* **Source outages:** scheduler backs off with exponential delay; `source_state.backoffUntil`; alerts on staleness.
* **Schema drifts:** parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges.
* **Partial exports:** exporters write to temp prefix; **manifest commit** is atomic; only then move to final prefix and update `export_state`.
* **Resume:** all stages idempotent; `source_state.cursor` supports window resume.
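For a filesystem export target, the temp-prefix and atomic-commit pattern might look like the sketch below; paths and checks are assumptions, and object-store targets would stage under a temporary prefix and publish the manifest last instead.

```csharp
using System;
using System.IO;

// Sketch: all artifact files, including manifest.json, are written under the
// staging directory first; the export becomes visible only when the whole
// directory is moved into place.
public static class ExportCommit
{
    public static void Commit(string stagingDir, string finalDir)
    {
        if (Directory.Exists(finalDir))
            throw new IOException($"Export target '{finalDir}' already exists.");
        if (!File.Exists(Path.Combine(stagingDir, "manifest.json")))
            throw new InvalidOperationException("Refusing to commit an export without a manifest.");

        Directory.Move(stagingDir, finalDir);   // atomic on the same volume; readers never see partial exports
        // export_state (baseDigest, lastFullDigest, cursor, files[]) is updated only after this point.
    }
}
```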
---
## 15) Operator runbook (quick)
* **Trigger all sources:** `POST /api/v1/feedser/sources/*/trigger`
* **Force full export JSON:** `POST /api/v1/feedser/exports/json { "full": true, "force": true }`
* **Force Trivy DB delta publish:** `POST /api/v1/feedser/exports/trivy { "full": false, "publish": true }`
* **Inspect advisory:** `GET /api/v1/feedser/advisories?scheme=CVE&value=CVE-2025-12345`
* **Pause noisy source:** `POST /api/v1/feedser/sources/osv/pause`
---
## 16) Rollout plan
1. **MVP**: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export.
2. **Add**: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export.
3. **Attestation handoff**: integrate with **Signer/Attestor** (optional).
4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse.
5. **Offline kit**: end-to-end verified bundles for air-gapped deployments.
* Default storage is MongoDB; for air-gapped installs, bundle the Mongo image plus a seeded data backup.
* Horizontal scale is achieved by running multiple web service instances that share Mongo locks.
* A `feedser.yaml` template describes sources, rate limits, and export settings.