component_architecture_feedser.md — Stella Ops Feedser (2025Q4)
Scope. Implementation‑ready architecture for Feedser: the vulnerability ingest/normalize/merge/export subsystem that produces deterministic advisory data for the Scanner + Policy + Excititor pipeline. Covers domain model, connectors, merge rules, storage schema, exports, APIs, performance, security, and test matrices.
0) Mission & boundaries
Mission. Acquire authoritative vulnerability advisories (vendor PSIRTs, distros, OSS ecosystems, CERTs), normalize them into a canonical model, reconcile aliases and version ranges, and export deterministic artifacts (JSON, Trivy DB) for fast backend joins.
Boundaries.
- Feedser does not sign with private keys. When attestation is required, the export artifact is handed to the Signer/Attestor pipeline (out‑of‑process).
- Feedser does not decide PASS/FAIL; it provides data to the Policy engine.
- Online operation is allowlist‑only; air‑gapped deployments use the Offline Kit.
1) Topology & processes
Process shape: single ASP.NET Core service StellaOps.Feedser.WebService hosting:
- Scheduler with distributed locks (Mongo backed).
- Connectors (fetch/parse/map).
- Merger (canonical record assembly + precedence).
- Exporters (JSON, Trivy DB).
- Minimal REST for health/status/trigger/export.
Scale: HA by running N replicas; locks prevent overlapping jobs per source/exporter.
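The lease behaviour behind those locks can be illustrated with a minimal in-memory sketch (Python for brevity; the service itself is .NET). `LeaseTable` and its method names are hypothetical. A real implementation would presumably perform each step as a single atomic update against the Mongo `locks` collection rather than on a Python dict.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lease:
    holder: str
    heartbeat_at: float  # seconds on the caller-supplied clock
    lease_s: float       # lease length in seconds


class LeaseTable:
    """In-memory stand-in for the Mongo-backed lock store (hypothetical)."""

    def __init__(self) -> None:
        self._locks: Dict[str, Lease] = {}

    def try_acquire(self, job_key: str, holder: str, now: float, lease_s: float) -> bool:
        current = self._locks.get(job_key)
        # A lease is considered dead once no heartbeat arrived within lease_s.
        expired = current is not None and now - current.heartbeat_at >= current.lease_s
        if current is None or expired or current.holder == holder:
            self._locks[job_key] = Lease(holder, now, lease_s)
            return True
        return False

    def heartbeat(self, job_key: str, holder: str, now: float) -> bool:
        current = self._locks.get(job_key)
        if current is None or current.holder != holder:
            return False  # lock was lost to another replica
        current.heartbeat_at = now
        return True

    def release(self, job_key: str, holder: str) -> None:
        current = self._locks.get(job_key)
        if current is not None and current.holder == holder:
            del self._locks[job_key]
```

A replica that loses its lease (its `heartbeat` returns `False`) should stop the job, since another replica may already be running it.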
2) Canonical domain model
Stored in MongoDB (database feedser), serialized with a canonical JSON writer (stable order, camelCase, normalized timestamps).
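A minimal sketch of what a canonical writer could look like, in Python for brevity (the service itself is .NET). The function names are hypothetical, and the choices of sorted keys, compact separators, second-precision UTC timestamps, and treating naive datetimes as UTC are assumptions, not the documented format.

```python
import hashlib
import json
from datetime import datetime, timezone


def normalize(value):
    """Recursively convert datetimes to UTC ISO-8601 strings with a 'Z' suffix."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    return value


def canonical_json(record: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(normalize(record), sort_keys=True,
                      separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Digest of the canonical bytes; equal records always hash equally."""
    return "sha256:" + hashlib.sha256(canonical_json(record)).hexdigest()
```

Because key order and timestamp form are fixed before hashing, two records that differ only in field order or timezone representation produce the same digest.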
2.1 Core entities
Advisory
    advisoryId // internal GUID
    advisoryKey // stable string key (e.g., CVE-2025-12345 or vendor ID)
    title // short title (best-of from sources)
    summary // normalized summary (English; i18n optional)
    published // earliest source timestamp
    modified // latest source timestamp
    severity // normalized {none, low, medium, high, critical}
    cvss // {v2?, v3?, v4?} objects (vector, baseScore, severity, source)
    exploitKnown // bool (e.g., KEV/active exploitation flags)
    references[] // typed links (advisory, kb, patch, vendor, exploit, blog)
    sources[] // provenance for traceability (doc digests, URIs)
Alias
    advisoryId
    scheme // CVE, GHSA, RHSA, DSA, USN, MSRC, etc.
    value // e.g., "CVE-2025-12345"
Affected
    advisoryId
    productKey // canonical product identity (see 2.2)
    rangeKind // semver | evr | nvra | apk | rpm | deb | generic | exact
    introduced? // string (format depends on rangeKind)
    fixed? // string (format depends on rangeKind)
    lastKnownSafe? // optional explicit safe floor
    arch? // arch or platform qualifier if source declares (x86_64, aarch64)
    distro? // distro qualifier when applicable (rhel:9, debian:12, alpine:3.19)
    ecosystem? // npm|pypi|maven|nuget|golang|…
    notes? // normalized notes per source
Reference
    advisoryId
    url
    kind // advisory | patch | kb | exploit | mitigation | blog | cvrf | csaf
    sourceTag // e.g., vendor/redhat, distro/debian, oss/ghsa
MergeEvent
    advisoryKey
    beforeHash // canonical JSON hash before merge
    afterHash // canonical JSON hash after merge
    mergedAt
    inputs[] // source doc digests that contributed
ExportState
    exportKind // json | trivydb
    baseExportId? // last full baseline
    baseDigest? // digest of last full baseline
    lastFullDigest? // digest of last full export
    lastDeltaDigest? // digest of last delta export
    cursor // per-kind incremental cursor
    files[] // last manifest snapshot (path → sha256)
2.2 Product identity (productKey)
- Primary: purl (Package URL).
- OS packages: RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with EVR/NVRA preserved.
- Secondary: cpe retained for compatibility; advisory records may carry both.
- Image/platform: oci:<registry>/<repo>@<digest> for image‑level advisories (rare).
- Unmappable: if a source's product identifier cannot be mapped deterministically, keep the native string under productKey="native:<provider>:<id>" and mark it non‑joinable.
3) Source families & precedence
3.1 Families
- Vendor PSIRTs: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium…
- Linux distros: Red Hat, SUSE, Ubuntu, Debian, Alpine…
- OSS ecosystems: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go.
- CERTs / national CSIRTs: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERT‑FR/BUND, etc.
3.2 Precedence (when claims conflict)
Ordered from highest to lowest precedence:
- Vendor PSIRT (authoritative for their product).
- Distro (authoritative for packages they ship, including backports).
- Ecosystem (OSV/GHSA) for library semantics.
- CERTs/aggregators for enrichment (KEV/known exploited).
Precedence affects Affected ranges and fixed info; severity is normalized to the maximum credible severity unless policy overrides. Conflicts are retained with source provenance.
4) Connectors & normalization
4.1 Connector contract
using System;
using System.Threading;
using System.Threading.Tasks;

public interface IFeedConnector {
    string SourceName { get; }
    Task FetchAsync(IServiceProvider sp, CancellationToken ct); // -> document collection
    Task ParseAsync(IServiceProvider sp, CancellationToken ct); // -> dto collection (validated)
    Task MapAsync(IServiceProvider sp, CancellationToken ct); // -> advisory/alias/affected/reference
}
- Fetch: windowed (cursor), conditional GET (ETag/Last‑Modified), retry/backoff, rate limiting.
- Parse: schema validation (JSON Schema, XSD/CSAF), content type checks; write DTO with normalized casing.
- Map: build canonical records; all outputs carry provenance (doc digest, URI, anchors).
4.2 Version range normalization
- SemVer ecosystems (npm, pypi, maven, nuget, golang): normalize to introduced/fixed semver ranges (use ~, ^, <, >= canonicalized to intervals).
- RPM EVR: epoch:version-release with rpmvercmp semantics; store raw EVR strings and also computed order keys for query.
- DEB: dpkg version comparison semantics mirrored; store computed keys.
- APK: Alpine version semantics; compute order keys.
- Generic: if provider uses text, retain raw; do not invent ranges.
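As an illustration of canonicalizing range operators to intervals, here is a minimal sketch in Python (the service itself is .NET). It assumes npm-style semantics for `^` and `~`, and handles only full `major.minor.patch` versions without pre-release tags. `to_interval` is a hypothetical helper name, and the result is read as the half-open interval [introduced, fixed).

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse(v: str) -> Version:
    """Parse 'major.minor.patch'; raises ValueError on anything else."""
    major, minor, patch = (int(p) for p in v.split("."))
    return (major, minor, patch)


def fmt(v: Version) -> str:
    return ".".join(str(p) for p in v)


def to_interval(spec: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (introduced, fixed); None means unbounded on that side."""
    spec = spec.strip()
    if spec.startswith("^"):
        major, minor, patch = parse(spec[1:])
        if major > 0:
            upper = (major + 1, 0, 0)   # ^1.2.3 allows up to the next major
        elif minor > 0:
            upper = (0, minor + 1, 0)   # ^0.2.3 allows up to the next minor
        else:
            upper = (0, 0, patch + 1)   # ^0.0.3 allows only that patch
        return fmt((major, minor, patch)), fmt(upper)
    if spec.startswith("~"):
        major, minor, patch = parse(spec[1:])
        return fmt((major, minor, patch)), fmt((major, minor + 1, 0))
    if spec.startswith(">="):
        return fmt(parse(spec[2:])), None
    if spec.startswith("<") and not spec.startswith("<="):
        return None, fmt(parse(spec[1:]))
    raise ValueError(f"unsupported range spec: {spec!r}")
```

For example, `to_interval("^0.2.3")` yields `("0.2.3", "0.3.0")`. Anything the sketch cannot parse raises an error rather than producing a guessed range, in line with the "do not invent ranges" rule.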
4.3 Severity & CVSS
- Normalize CVSS v2/v3/v4 where available (vector, baseScore, severity).
- If multiple CVSS sources exist, track them all; effective severity defaults to max by policy (configurable).
- ExploitKnown toggled by KEV and equivalent sources; store evidence (source, date).
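The effective-severity rule can be sketched as follows, in Python for brevity (the service itself is .NET). The policy names mirror the `severity.policy` values in the configuration section. The claim shape (`family`, `severity`) and the fallback to max when the preferred family made no claim are assumptions.

```python
from typing import Optional, Sequence, Tuple

SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"]
PREFERRED_FAMILY = {"vendorPreferred": "vendor", "distroPreferred": "distro"}


def effective_severity(claims: Sequence[Tuple[str, str]],
                       policy: str = "max") -> Optional[str]:
    """claims are (sourceFamily, severity) pairs; returns None if there are none."""
    if policy != "max" and policy not in PREFERRED_FAMILY:
        raise ValueError(f"unknown severity policy: {policy!r}")
    if not claims:
        return None
    preferred = PREFERRED_FAMILY.get(policy)
    if preferred is not None:
        pool = [sev for family, sev in claims if family == preferred]
        if pool:
            return max(pool, key=SEVERITY_ORDER.index)
    # Default (and fallback): the highest severity any source claimed.
    return max((sev for _, sev in claims), key=SEVERITY_ORDER.index)
```

All individual claims would still be stored alongside the result, so a policy change only requires recomputing the effective value.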
5) Merge engine
5.1 Keying & identity
- Identity graph: CVE is primary node; vendor/distro IDs resolved via Alias edges (from connectors and Feedser’s alias tables).
- advisoryKey is the canonical primary key (CVE if present, else vendor/distro key).
5.2 Merge algorithm (deterministic)
- Gather all rows for advisoryKey (across sources).
- Select title/summary by precedence source (vendor>distro>ecosystem>cert).
- Union aliases (dedupe by scheme+value).
- Merge Affected with rules:
  - Prefer vendor ranges for vendor products; prefer distro for distro‑shipped packages.
  - If both exist for same productKey, keep both; mark sourceTag and precedence so Policy can decide.
  - Never collapse range semantics across different families (e.g., rpm EVR vs semver).
- CVSS/severity: record all CVSS sets; compute effectiveSeverity = max (unless policy override).
- References: union with type precedence (advisory > patch > kb > exploit > blog); dedupe by URL; preserve sourceTag.
- Produce canonical JSON; compute afterHash; store MergeEvent with inputs and hashes.
The merge is pure given inputs. Any change in inputs or precedence matrices changes the hash predictably.
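A reduced sketch of the title, alias, and reference steps shows how purity and determinism can be obtained (Python for brevity; the service itself is .NET). The row shape, the `family` and `docDigest` field names, and the tie-break by document digest are assumptions. Affected-range and CVSS merging are omitted.

```python
import hashlib
import json

FAMILY_RANK = {"vendor": 0, "distro": 1, "ecosystem": 2, "cert": 3}
KIND_RANK = {"advisory": 0, "patch": 1, "kb": 2, "exploit": 3, "blog": 4}


def kind_rank(kind: str) -> int:
    # Kinds outside the precedence list sort last (assumption).
    return KIND_RANK.get(kind, len(KIND_RANK))


def merge(advisory_key: str, rows: list):
    """Merge source rows into one record; returns (record, afterHash)."""
    # Sort first so the result never depends on input order.
    ordered = sorted(rows, key=lambda r: (FAMILY_RANK[r["family"]], r["docDigest"]))
    title = next((r["title"] for r in ordered if r.get("title")), None)
    aliases = sorted({(s, v) for r in ordered for s, v in r.get("aliases", [])})
    refs = {}
    for r in ordered:
        for ref in r.get("references", []):
            kept = refs.get(ref["url"])
            if kept is None or kind_rank(ref["kind"]) < kind_rank(kept["kind"]):
                refs[ref["url"]] = ref  # dedupe by URL, keep the best kind
    merged = {
        "advisoryKey": advisory_key,
        "title": title,
        "aliases": [{"scheme": s, "value": v} for s, v in aliases],
        "references": [refs[url] for url in sorted(refs)],
        "inputs": sorted(r["docDigest"] for r in rows),
    }
    payload = json.dumps(merged, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return merged, "sha256:" + hashlib.sha256(payload).hexdigest()
```

Calling `merge` with the same rows in any order returns the same hash, which is the property the MergeEvent before/after hashes rely on.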
6) Storage schema (MongoDB)
Collections & indexes
- source {_id, type, baseUrl, enabled, notes}
- source_state {sourceName (unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}
- document {_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}
  - Index: {sourceName:1, uri:1} unique, {fetchedAt:-1}
- dto {_id, sourceName, documentId, schemaVer, payload, validatedAt}
  - Index: {sourceName:1, documentId:1}
- advisory {_id, advisoryKey, title, summary, published, modified, severity, cvss, exploitKnown, sources[]}
  - Index: {advisoryKey:1} unique, {modified:-1}, {severity:1}, text index (title, summary)
- alias {advisoryId, scheme, value}
  - Index: {scheme:1, value:1}, {advisoryId:1}
- affected {advisoryId, productKey, rangeKind, introduced?, fixed?, arch?, distro?, ecosystem?}
  - Index: {productKey:1}, {advisoryId:1}, {productKey:1, rangeKind:1}
- reference {advisoryId, url, kind, sourceTag}
  - Index: {advisoryId:1}, {kind:1}
- merge_event {advisoryKey, beforeHash, afterHash, mergedAt, inputs[]}
  - Index: {advisoryKey:1, mergedAt:-1}
- export_state {_id (exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}
- locks {_id (jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt} (TTL cleans dead locks)
- jobs {_id, type, args, state, startedAt, heartbeatAt, endedAt, error}
GridFS buckets: fs.documents for raw payloads.
7) Exporters
7.1 Deterministic JSON (vuln‑list style)
- Folder structure mirroring /<scheme>/<first-two>/<rest>/… with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace.
- manifest.json lists all files with SHA‑256 and a top‑level export digest.
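One way to derive such a manifest is sketched below in Python (the service itself is .NET). The manifest shape and the rule for the top-level digest (SHA-256 over path and file hash pairs in sorted path order) are assumptions, and `build_manifest` is a hypothetical name.

```python
import hashlib
from typing import Dict


def build_manifest(files: Dict[str, bytes]) -> dict:
    """Map each export path to its SHA-256 and derive one export digest."""
    entries = {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}
    export = hashlib.sha256()
    for path in sorted(entries):
        # Sorted iteration makes the export digest independent of write order.
        export.update(f"{path}\0{entries[path]}\n".encode("utf-8"))
    return {
        "files": dict(sorted(entries.items())),
        "digest": "sha256:" + export.hexdigest(),
    }
```

Two exports with identical file contents then report the same digest regardless of the order in which files were produced, and any changed byte changes it.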
7.2 Trivy DB exporter
- Builds Bolt DB archives compatible with Trivy; supports full and delta modes.
- In delta, unchanged blobs are reused from the base; metadata captures:
  { "mode": "delta|full", "baseExportId": "...", "baseManifestDigest": "sha256:...", "changed": ["path1", "path2"], "removed": ["path3"] }
- Optional ORAS push (OCI layout) for registries.
- Offline kit bundles include Trivy DB + JSON tree + export manifest.
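The delta metadata shown above could be computed from two manifest snapshots (path → sha256) roughly as follows. This is a Python sketch for brevity (the service itself is .NET), and `delta_metadata` is a hypothetical helper name.

```python
from typing import Dict


def delta_metadata(base_export_id: str, base_manifest_digest: str,
                   base_files: Dict[str, str], new_files: Dict[str, str]) -> dict:
    """Compare path -> sha256 snapshots and describe the delta export."""
    # New or modified paths: absent from the base, or present with another hash.
    changed = sorted(p for p, sha in new_files.items() if base_files.get(p) != sha)
    # Paths that existed in the base but are gone now.
    removed = sorted(p for p in base_files if p not in new_files)
    return {
        "mode": "delta",
        "baseExportId": base_export_id,
        "baseManifestDigest": base_manifest_digest,
        "changed": changed,
        "removed": removed,
    }
```

Paths that appear in neither list are the unchanged blobs that a delta export can reuse from the base.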
7.3 Hand‑off to Signer/Attestor (optional)
- On export completion, if attest: true is set in job args, Feedser posts the artifact metadata to Signer/Attestor; Feedser itself does not hold signing keys.
- Export record stores returned { uuid, index, url } from Rekor v2.
8) REST APIs
All under /api/v1/feedser.
Health & status
GET /healthz | /readyz
GET /status → sources, last runs, export cursors
Sources & jobs
GET /sources → list of configured sources
POST /sources/{name}/trigger → { jobId }
POST /sources/{name}/pause | /resume → toggle
GET /jobs/{id} → job status
Exports
POST /exports/json { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? }
POST /exports/trivy { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? }
GET /exports/{id} → export metadata (kind, digest, createdAt, rekor?)
Search (operator debugging)
GET /advisories/{key}
GET /advisories?scheme=CVE&value=CVE-2025-12345
GET /affected?productKey=pkg:rpm/openssl&limit=100
AuthN/Z: Authority tokens (OpTok) with roles: feedser.read, feedser.admin, feedser.export.
9) Configuration (YAML)
feedser:
  mongo: { uri: "mongodb://mongo/feedser" }
  s3:
    endpoint: "http://minio:9000"
    bucket: "stellaops-feedser"
  scheduler:
    windowSeconds: 30
    maxParallelSources: 4
  sources:
    - name: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signature: { type: pgp, keys: [ "…redhat PGP…" ] }
      enabled: true
      windowDays: 7
    - name: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signature: { type: pgp, keys: [ "…suse PGP…" ] }
    - name: ubuntu
      kind: usn-json
      baseUrl: https://ubuntu.com/security/notices.json
      signature: { type: none }
    - name: osv
      kind: osv
      baseUrl: https://api.osv.dev/v1/
      signature: { type: none }
    - name: ghsa
      kind: ghsa
      baseUrl: https://api.github.com/graphql
      auth: { tokenRef: "env:GITHUB_TOKEN" }
  exporters:
    json:
      enabled: true
      output: s3://stellaops-feedser/json/
    trivy:
      enabled: true
      mode: full
      output: s3://stellaops-feedser/trivy/
      oras:
        enabled: false
        repo: ghcr.io/org/feedser
  precedence:
    vendorWinsOverDistro: true
    distroWinsOverOsv: true
  severity:
    policy: max # or 'vendorPreferred' / 'distroPreferred'
10) Security & compliance
- Outbound allowlist per connector (domains, protocols); proxy support; TLS pinning where possible.
- Signature verification for raw docs (PGP/cosign/x509) with results stored in document.metadata.sig. Docs failing verification may still be ingested but flagged; merge can down‑weight or ignore them by config.
- No secrets in logs; auth material via env: or mounted files; HTTP redaction of Authorization headers.
- Multi‑tenant: per‑tenant DBs or prefixes; per‑tenant S3 prefixes; tenant‑scoped API tokens.
- Determinism: canonical JSON writer; export digests stable across runs given same inputs.
11) Performance targets & scale
- Ingest: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
- Normalize/map: ≥ 50k Affected rows/min on 4 cores.
- Merge: ≤ 10 ms P95 per advisory at steady‑state updates.
- Export: 1M advisories JSON in ≤ 90 s (streamed, zstd), Trivy DB in ≤ 60 s on 8 cores.
- Memory: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.
Scale pattern: add Feedser replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs.
12) Observability
- Metrics
  - feedser.fetch.docs_total{source}
  - feedser.fetch.bytes_total{source}
  - feedser.parse.failures_total{source}
  - feedser.map.affected_total{source}
  - feedser.merge.changed_total
  - feedser.export.bytes{kind}
  - feedser.export.duration_seconds{kind}
- Tracing around fetch/parse/map/merge/export.
- Logs: structured with source, uri, docDigest, advisoryKey, exportId.
13) Testing matrix
- Connectors: fixture suites for each provider/format (happy path; malformed; signature fail).
- Version semantics: EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, pre‑releases).
- Merge: conflicting sources (vendor vs distro vs OSV); verify precedence & dual retention.
- Export determinism: byte‑for‑byte stable outputs across runs; digest equality.
- Performance: soak tests with 1M advisories; cap memory; verify backpressure.
- API: pagination, filters, RBAC, error envelopes (RFC 7807).
- Offline kit: bundle build & import correctness.
14) Failure modes & recovery
- Source outages: scheduler backs off with exponential delay, recorded in source_state.backoffUntil; alerts on staleness.
- Schema drifts: parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges.
- Partial exports: exporters write to temp prefix; manifest commit is atomic; only then move to final prefix and update export_state.
- Resume: all stages idempotent; source_state.cursor supports window resume.
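The exponential backoff for source outages might be computed as in this sketch (Python for brevity; the service itself is .NET). The base delay, the cap, and the function names are assumptions, not documented defaults.

```python
def next_backoff_until(now: float, consecutive_failures: int,
                       base_s: float = 60.0, cap_s: float = 3600.0) -> float:
    """Return the timestamp before which the source should not be retried."""
    # Delay doubles with each consecutive failure, starting at base_s.
    exponent = max(0, consecutive_failures - 1)
    delay = min(cap_s, base_s * (2 ** exponent))
    return now + delay


def is_due(now: float, backoff_until: float) -> bool:
    """A source is eligible for scheduling once its backoff has elapsed."""
    return now >= backoff_until
```

The scheduler would store the returned value in `source_state.backoffUntil` after a failed run and reset the failure count after a successful one. Capping the delay keeps a long outage from pushing the next attempt out indefinitely.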
15) Operator runbook (quick)
- Trigger all sources: POST /api/v1/feedser/sources/*/trigger
- Force full export JSON: POST /api/v1/feedser/exports/json { "full": true, "force": true }
- Force Trivy DB delta publish: POST /api/v1/feedser/exports/trivy { "full": false, "publish": true }
- Inspect advisory: GET /api/v1/feedser/advisories?scheme=CVE&value=CVE-2025-12345
- Pause noisy source: POST /api/v1/feedser/sources/osv/pause
16) Rollout plan
- MVP: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export.
- Add: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export.
- Attestation hand‑off: integrate with Signer/Attestor (optional).
- Scale & diagnostics: provider dashboards, staleness alerts, export cache reuse.
- Offline kit: end‑to‑end verified bundles for air‑gap.