# component_architecture_concelier.md — **Stella Ops Concelier** (2025Q4)

> **Scope.** Implementation‑ready architecture for **Concelier**: the vulnerability ingest/normalize/merge/export subsystem that produces deterministic advisory data for the Scanner + Policy + Excititor pipeline. Covers domain model, connectors, merge rules, storage schema, exports, APIs, performance, security, and test matrices.

---

## 0) Mission & boundaries

**Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), normalize them into a **canonical model**, reconcile aliases and version ranges, and export **deterministic artifacts** (JSON, Trivy DB) for fast backend joins.

**Boundaries.**

* Concelier **does not** sign with private keys. When attestation is required, the export artifact is handed to the **Signer**/**Attestor** pipeline (out‑of‑process).
* Concelier **does not** decide PASS/FAIL; it provides data to the **Policy** engine.
* Online operation is **allowlist‑only**; air‑gapped deployments use the **Offline Kit**.

---

## 1) Topology & processes

**Process shape:** single ASP.NET Core service `StellaOps.Concelier.WebService` hosting:

* **Scheduler** with distributed locks (Mongo backed).
* **Connectors** (fetch/parse/map).
* **Merger** (canonical record assembly + precedence).
* **Exporters** (JSON, Trivy DB).
* **Minimal REST** for health/status/trigger/export.

**Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter.

---

## 2) Canonical domain model

> Stored in MongoDB (database `concelier`), serialized with a **canonical JSON** writer (stable order, camelCase, normalized timestamps).
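
As a minimal sketch of what "stable order" could mean in practice, the snippet below re-serializes a JSON document with object keys sorted ordinally at every level, using `System.Text.Json.Nodes`. The class name `CanonicalJsonSketch` is hypothetical, and camelCase naming and timestamp normalization are assumed to have been applied before this step; the actual Concelier writer may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical helper for illustration; not the actual Concelier writer.
public static class CanonicalJsonSketch
{
    // Parses the input and writes it back compactly with sorted object keys.
    public static string Canonicalize(string json) =>
        Sort(JsonNode.Parse(json))?.ToJsonString() ?? "null";

    // Rebuilds the tree so that every object lists its properties in ordinal key order.
    private static JsonNode? Sort(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Sort(p.Value)))),
        // Array order is meaningful, so elements keep their positions.
        JsonArray arr => new JsonArray(arr.Select(n => Sort(n)).ToArray()),
        null => null,
        // Leaf values are cloned because a node cannot have two parents.
        _ => JsonNode.Parse(node.ToJsonString()),
    };
}
```

Two inputs that differ only in key order produce the same string under this sketch, which is the property the hashes and export digests in this document rely on.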
### 2.1 Core entities

**Advisory**

```
advisoryId        // internal GUID
advisoryKey       // stable string key (e.g., CVE-2025-12345 or vendor ID)
title             // short title (best-of from sources)
summary           // normalized summary (English; i18n optional)
published         // earliest source timestamp
modified          // latest source timestamp
severity          // normalized {none, low, medium, high, critical}
cvss              // {v2?, v3?, v4?} objects (vector, baseScore, severity, source)
exploitKnown      // bool (e.g., KEV/active exploitation flags)
references[]      // typed links (advisory, kb, patch, vendor, exploit, blog)
sources[]         // provenance for traceability (doc digests, URIs)
```

**Alias**

```
advisoryId
scheme            // CVE, GHSA, RHSA, DSA, USN, MSRC, etc.
value             // e.g., "CVE-2025-12345"
```

**Affected**

```
advisoryId
productKey        // canonical product identity (see 2.2)
rangeKind         // semver | evr | nvra | apk | rpm | deb | generic | exact
introduced?       // string (format depends on rangeKind)
fixed?            // string (format depends on rangeKind)
lastKnownSafe?    // optional explicit safe floor
arch?             // arch or platform qualifier if source declares (x86_64, aarch64)
distro?           // distro qualifier when applicable (rhel:9, debian:12, alpine:3.19)
ecosystem?        // npm|pypi|maven|nuget|golang|…
notes?            // normalized notes per source
```

**Reference**

```
advisoryId
url
kind              // advisory | patch | kb | exploit | mitigation | blog | cvrf | csaf
sourceTag         // e.g., vendor/redhat, distro/debian, oss/ghsa
```

**MergeEvent**

```
advisoryKey
beforeHash        // canonical JSON hash before merge
afterHash         // canonical JSON hash after merge
mergedAt
inputs[]          // source doc digests that contributed
```

**AdvisoryStatement (event log)**

```
statementId        // GUID (immutable)
vulnerabilityKey   // canonical advisory key (e.g., CVE-2025-12345)
advisoryKey        // merge snapshot advisory key (may reference variant)
statementHash      // canonical hash of advisory payload
asOf               // timestamp of snapshot (UTC)
recordedAt         // persistence timestamp (UTC)
inputDocuments[]   // document IDs contributing to the snapshot
payload            // canonical advisory document (BSON / canonical JSON)
```

**AdvisoryConflict**

```
conflictId         // GUID
vulnerabilityKey   // canonical advisory key
conflictHash       // deterministic hash of conflict payload
asOf               // timestamp aligned with originating statement set
recordedAt         // persistence timestamp
statementIds[]     // related advisoryStatement identifiers
details            // structured conflict explanation / merge reasoning
```

- `AdvisoryEventLog` (Concelier.Core) provides the public API for appending immutable statements/conflicts and querying replay history. Inputs are normalized by trimming and lower-casing `vulnerabilityKey`, serializing advisories with `CanonicalJsonSerializer`, and computing SHA-256 hashes (`statementHash`, `conflictHash`) over the canonical JSON payloads. Consumers can replay by key with an optional `asOf` filter to obtain deterministic snapshots ordered by `asOf` then `recordedAt`.
- Concelier.WebService exposes the immutable log via `GET /concelier/advisories/{vulnerabilityKey}/replay[?asOf=UTC_ISO8601]`, returning the latest statements (with hex-encoded hashes) and any conflict explanations for downstream exporters and APIs.
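
The normalization, hashing, and replay ordering described for the event log can be sketched as follows. This is an illustration under stated assumptions, not the `AdvisoryEventLog` API: the type names `StatementSketch` and `EventLogSketch` are hypothetical, the payload is assumed to be canonical JSON already, lower-case hex is assumed for the hash encoding, and the `asOf` filter is assumed to mean "at or before" with ascending order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical, trimmed-down statement record for illustration.
public sealed record StatementSketch(
    string VulnerabilityKey, string StatementHash,
    DateTimeOffset AsOf, DateTimeOffset RecordedAt);

public static class EventLogSketch
{
    // Trim and lower-case the key, as the text describes for vulnerabilityKey.
    public static string NormalizeKey(string key) => key.Trim().ToLowerInvariant();

    // SHA-256 over the UTF-8 bytes of an already-canonical JSON payload, hex-encoded.
    public static string HashHex(string canonicalJson) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson)))
               .ToLowerInvariant();

    // Replay by key with an optional asOf cut-off, ordered by asOf then recordedAt.
    public static IReadOnlyList<StatementSketch> Replay(
        IEnumerable<StatementSketch> log, string key, DateTimeOffset? asOf = null)
    {
        var normalized = NormalizeKey(key);
        return log.Where(s => s.VulnerabilityKey == normalized)
                  .Where(s => asOf is null || s.AsOf <= asOf.Value)
                  .OrderBy(s => s.AsOf)
                  .ThenBy(s => s.RecordedAt)
                  .ToList();
    }
}
```

Because the hash is taken over canonical JSON, two statements with the same advisory content hash identically regardless of how the source serialized its fields.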
**ExportState**

```
exportKind        // json | trivydb
baseExportId?     // last full baseline
baseDigest?       // digest of last full baseline
lastFullDigest?   // digest of last full export
lastDeltaDigest?  // digest of last delta export
cursor            // per-kind incremental cursor
files[]           // last manifest snapshot (path → sha256)
```

### 2.2 Product identity (`productKey`)

* **Primary:** `purl` (Package URL).
* **OS packages:** RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with **EVR/NVRA** preserved.
* **Secondary:** `cpe` retained for compatibility; advisory records may carry both.
* **Image/platform:** `oci:/@` for image‑level advisories (rare).
* **Unmappable:** if a source is non‑deterministic, keep native string under `productKey="native::"` and mark **non‑joinable**.

---

## 3) Source families & precedence

### 3.1 Families

* **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium…
* **Linux distros**: Red Hat, SUSE, Ubuntu, Debian, Alpine…
* **OSS ecosystems**: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go.
* **CERTs / national CSIRTs**: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERT‑FR/BUND, etc.

### 3.2 Precedence (when claims conflict)

1. **Vendor PSIRT** (authoritative for their product).
2. **Distro** (authoritative for packages they ship, including backports).
3. **Ecosystem** (OSV/GHSA) for library semantics.
4. **CERTs/aggregators** for enrichment (KEV/known exploited).

> Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**.
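
The precedence order and the max-severity default above can be sketched as below. All type and member names (`SourceFamily`, `Claim`, `PrecedenceSketch`) are hypothetical, and the sketch deliberately ignores the product scoping in items 1 and 2 (a vendor is authoritative only for its own products, a distro only for packages it ships) as well as policy overrides; it only shows the ranking and the maximum.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Lower number = higher precedence, following the numbered list in 3.2.
public enum SourceFamily { Vendor = 1, Distro = 2, Ecosystem = 3, Cert = 4 }

// Declared in ascending order so that Max() yields the most severe value.
public enum Severity { None = 0, Low, Medium, High, Critical }

// Hypothetical claim shape: one source's view of severity and fixed version.
public sealed record Claim(string SourceTag, SourceFamily Family, Severity Severity, string? Fixed);

public static class PrecedenceSketch
{
    // Picks the claim whose range/fixed info wins; ties break on sourceTag for determinism.
    public static Claim? PreferredClaim(IEnumerable<Claim> claims) =>
        claims.OrderBy(c => (int)c.Family)
              .ThenBy(c => c.SourceTag, StringComparer.Ordinal)
              .FirstOrDefault();

    // Default effective severity: the maximum over all claims (None when there are none).
    public static Severity EffectiveSeverity(IEnumerable<Claim> claims) =>
        claims.Select(c => c.Severity).DefaultIfEmpty(Severity.None).Max();
}
```

In this sketch the losing claims are not discarded; the caller would keep them alongside the preferred one so that conflicts remain visible with their provenance.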
---

## 4) Connectors & normalization

### 4.1 Connector contract

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface IFeedConnector
{
    string SourceName { get; }

    Task FetchAsync(IServiceProvider sp, CancellationToken ct); // -> document collection
    Task ParseAsync(IServiceProvider sp, CancellationToken ct); // -> dto collection (validated)
    Task MapAsync(IServiceProvider sp, CancellationToken ct);   // -> advisory/alias/affected/reference
}
```

* **Fetch**: windowed (cursor), conditional GET (ETag/Last‑Modified), retry/backoff, rate limiting.
* **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing.
* **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors).

### 4.2 Version range normalization

* **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges (use `~`, `^`, `<`, `>=` canonicalized to intervals).
* **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query.
* **DEB**: dpkg version comparison semantics mirrored; store computed keys.
* **APK**: Alpine version semantics; compute order keys.
* **Generic**: if provider uses text, retain raw; do **not** invent ranges.

### 4.3 Severity & CVSS

* Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity).
* If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable).
* **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date).

---

## 5) Merge engine

### 5.1 Keying & identity

* Identity graph: **CVE** is primary node; vendor/distro IDs resolved via **Alias** edges (from connectors and Concelier’s alias tables).
* `advisoryKey` is the canonical primary key (CVE if present, else vendor/distro key).

### 5.2 Merge algorithm (deterministic)

1. **Gather** all rows for `advisoryKey` (across sources).
2. **Select title/summary** by precedence source (vendor>distro>ecosystem>cert).
3. **Union aliases** (dedupe by scheme+value).
4. **Merge `Affected`** with rules:
   * Prefer **vendor** ranges for vendor products; prefer **distro** for **distro‑shipped** packages.
   * If both exist for same `productKey`, keep **both**; mark `sourceTag` and `precedence` so **Policy** can decide.
   * Never collapse range semantics across different families (e.g., rpm EVR vs semver).
5. **CVSS/severity**: record all CVSS sets; compute **effectiveSeverity** = max (unless policy override).
6. **References**: union with type precedence (advisory > patch > kb > exploit > blog); dedupe by URL; preserve `sourceTag`.
7. Produce **canonical JSON**; compute **afterHash**; store **MergeEvent** with inputs and hashes.

> The merge is **pure** given inputs. Any change in inputs or precedence matrices changes the **hash** predictably.

---

## 6) Storage schema (MongoDB)

**Collections & indexes**

* `source` `{_id, type, baseUrl, enabled, notes}`
* `source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}`
* `document` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}`
  * Index: `{sourceName:1, uri:1}` unique, `{fetchedAt:-1}`
* `dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}`
  * Index: `{sourceName:1, documentId:1}`
* `advisory` `{_id, advisoryKey, title, summary, published, modified, severity, cvss, exploitKnown, sources[]}`
  * Index: `{advisoryKey:1}` unique, `{modified:-1}`, `{severity:1}`, text index (title, summary)
* `alias` `{advisoryId, scheme, value}`
  * Index: `{scheme:1,value:1}`, `{advisoryId:1}`
* `affected` `{advisoryId, productKey, rangeKind, introduced?, fixed?, arch?, distro?, ecosystem?}`
  * Index: `{productKey:1}`, `{advisoryId:1}`, `{productKey:1, rangeKind:1}`
* `reference` `{advisoryId, url, kind, sourceTag}`
  * Index: `{advisoryId:1}`, `{kind:1}`
* `merge_event` `{advisoryKey, beforeHash, afterHash, mergedAt, inputs[]}`
  * Index: `{advisoryKey:1, mergedAt:-1}`
* `export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}`
* `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks)
* `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}`

**GridFS buckets**: `fs.documents` for raw payloads.

---

## 7) Exporters

### 7.1 Deterministic JSON (vuln‑list style)

* Folder structure mirroring `////…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace.
* `manifest.json` lists all files with SHA‑256 and a top‑level **export digest**.

### 7.2 Trivy DB exporter

* Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes.
* In delta, unchanged blobs are reused from the base; metadata captures:

  ```
  {
    "mode": "delta|full",
    "baseExportId": "...",
    "baseManifestDigest": "sha256:...",
    "changed": ["path1", "path2"],
    "removed": ["path3"]
  }
  ```
* Optional ORAS push (OCI layout) for registries.
* Offline kit bundles include Trivy DB + JSON tree + export manifest.
* Mirror-ready bundles: when `concelier.trivy.mirror` defines domains, the exporter emits `mirror/index.json` plus per-domain `manifest.json`, `metadata.json`, and `db.tar.gz` files with SHA-256 digests so Concelier mirrors can expose domain-scoped download endpoints.

### 7.3 Hand‑off to Signer/Attestor (optional)

* On export completion, if `attest: true` is set in job args, Concelier **posts** the artifact metadata to **Signer**/**Attestor**; Concelier itself **does not** hold signing keys.
* Export record stores returned `{ uuid, index, url }` from **Rekor v2**.

---

## 8) REST APIs

All under `/api/v1/concelier`.
**Health & status**

```
GET  /healthz | /readyz
GET  /status                              → sources, last runs, export cursors
```

**Sources & jobs**

```
GET  /sources                             → list of configured sources
POST /sources/{name}/trigger              → { jobId }
POST /sources/{name}/pause | /resume      → toggle
GET  /jobs/{id}                           → job status
```

**Exports**

```
POST /exports/json   { full?:bool, force?:bool, attest?:bool }                 → { exportId, digest, rekor? }
POST /exports/trivy  { full?:bool, force?:bool, publish?:bool, attest?:bool }  → { exportId, digest, rekor? }
GET  /exports/{id}                        → export metadata (kind, digest, createdAt, rekor?)
GET  /concelier/exports/index.json        → mirror index describing available domains/bundles
GET  /concelier/exports/mirror/{domain}/manifest.json
GET  /concelier/exports/mirror/{domain}/bundle.json
GET  /concelier/exports/mirror/{domain}/bundle.json.jws
```

**Search (operator debugging)**

```
GET  /advisories/{key}
GET  /advisories?scheme=CVE&value=CVE-2025-12345
GET  /affected?productKey=pkg:rpm/openssl&limit=100
```

**AuthN/Z:** Authority tokens (OpTok) with roles: `concelier.read`, `concelier.admin`, `concelier.export`.
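
As a hedged illustration of calling the search endpoint listed above, the sketch below builds a `GET /advisories?scheme=…&value=…` request with `HttpClient` types. The helper name `ConcelierRequestSketch`, the host `concelier.example`, and the use of an `Authorization: Bearer` header to carry the Authority token are assumptions made for the example; the document does not specify how the token is transmitted.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical client-side helper; only builds the request, does not send it.
public static class ConcelierRequestSketch
{
    public static HttpRequestMessage AdvisoryByAlias(
        Uri baseUri, string scheme, string value, string token)
    {
        // Query values are escaped so that unusual alias values cannot break the URL.
        var relative = "api/v1/concelier/advisories"
            + "?scheme=" + Uri.EscapeDataString(scheme)
            + "&value=" + Uri.EscapeDataString(value);

        var request = new HttpRequestMessage(HttpMethod.Get, new Uri(baseUri, relative));

        // Assumption: the token is presented as a bearer credential.
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);
        return request;
    }
}
```

A caller would pass the result to `HttpClient.SendAsync`; a token carrying only `concelier.read` would presumably suffice for this read-only lookup, though the role-to-endpoint mapping is not spelled out here.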
---

## 9) Configuration (YAML)

```yaml
concelier:
  mongo: { uri: "mongodb://mongo/concelier" }
  s3:
    endpoint: "http://minio:9000"
    bucket: "stellaops-concelier"
  scheduler:
    windowSeconds: 30
    maxParallelSources: 4
  sources:
    - name: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signature: { type: pgp, keys: [ "…redhat PGP…" ] }
      enabled: true
      windowDays: 7
    - name: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signature: { type: pgp, keys: [ "…suse PGP…" ] }
    - name: ubuntu
      kind: usn-json
      baseUrl: https://ubuntu.com/security/notices.json
      signature: { type: none }
    - name: osv
      kind: osv
      baseUrl: https://api.osv.dev/v1/
      signature: { type: none }
    - name: ghsa
      kind: ghsa
      baseUrl: https://api.github.com/graphql
      auth: { tokenRef: "env:GITHUB_TOKEN" }
  exporters:
    json:
      enabled: true
      output: s3://stellaops-concelier/json/
    trivy:
      enabled: true
      mode: full
      output: s3://stellaops-concelier/trivy/
      oras:
        enabled: false
        repo: ghcr.io/org/concelier
  precedence:
    vendorWinsOverDistro: true
    distroWinsOverOsv: true
  severity:
    policy: max  # or 'vendorPreferred' / 'distroPreferred'
```

---

## 10) Security & compliance

* **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible.
* **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; **merge** can down‑weight or ignore them by config.
* **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers.
* **Multi‑tenant**: per‑tenant DBs or prefixes; per‑tenant S3 prefixes; tenant‑scoped API tokens.
* **Determinism**: canonical JSON writer; export digests stable across runs given same inputs.

---

## 11) Performance targets & scale

* **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
* **Normalize/map**: ≥ 50k `Affected` rows/min on 4 cores.
* **Merge**: ≤ 10 ms P95 per advisory at steady‑state updates.
* **Export**: 1M advisories JSON in ≤ 90 s (streamed, zstd), Trivy DB in ≤ 60 s on 8 cores.
* **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.

**Scale pattern**: add Concelier replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs.

---

## 12) Observability

* **Metrics**
  * `concelier.fetch.docs_total{source}`
  * `concelier.fetch.bytes_total{source}`
  * `concelier.parse.failures_total{source}`
  * `concelier.map.affected_total{source}`
  * `concelier.merge.changed_total`
  * `concelier.export.bytes{kind}`
  * `concelier.export.duration_seconds{kind}`
* **Tracing** around fetch/parse/map/merge/export.
* **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`.

---

## 13) Testing matrix

* **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail).
* **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, pre‑releases).
* **Merge:** conflicting sources (vendor vs distro vs OSV); verify precedence & dual retention.
* **Export determinism:** byte‑for‑byte stable outputs across runs; digest equality.
* **Performance:** soak tests with 1M advisories; cap memory; verify backpressure.
* **API:** pagination, filters, RBAC, error envelopes (RFC 7807).
* **Offline kit:** bundle build & import correctness.

---

## 14) Failure modes & recovery

* **Source outages:** scheduler backs off with exponential delay; `source_state.backoffUntil`; alerts on staleness.
* **Schema drifts:** parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges.
* **Partial exports:** exporters write to temp prefix; **manifest commit** is atomic; only then move to final prefix and update `export_state`.
* **Resume:** all stages idempotent; `source_state.cursor` supports window resume.
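
The exponential backoff mentioned for source outages could be computed along the following lines. This is a sketch under assumptions: the helper name `BackoffSketch`, the doubling factor, the cap, and the absence of jitter are all illustrative choices, and only the `backoffUntil` field itself comes from the schema above.

```csharp
using System;

// Hypothetical helper that derives the next value for source_state.backoffUntil.
public static class BackoffSketch
{
    public static DateTimeOffset NextBackoffUntil(
        DateTimeOffset now, int consecutiveFailures, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        // First failure waits baseDelay; each further failure doubles the wait.
        // The exponent is clamped so that Math.Pow stays in a sane range.
        var exponent = Math.Clamp(consecutiveFailures - 1, 0, 30);

        // Cap the delay so a long outage does not push the source out indefinitely.
        var seconds = Math.Min(
            baseDelay.TotalSeconds * Math.Pow(2, exponent),
            maxDelay.TotalSeconds);

        return now + TimeSpan.FromSeconds(seconds);
    }
}
```

With a 30-second base and a one-hour cap, three consecutive failures would give a two-minute delay, and the cap is reached after eight failures. A production scheduler would likely add jitter so that replicas do not retry in lockstep.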
---

## 15) Operator runbook (quick)

* **Trigger all sources:** `POST /api/v1/concelier/sources/*/trigger`
* **Force full export JSON:** `POST /api/v1/concelier/exports/json { "full": true, "force": true }`
* **Force Trivy DB delta publish:** `POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }`
* **Inspect advisory:** `GET /api/v1/concelier/advisories?scheme=CVE&value=CVE-2025-12345`
* **Pause noisy source:** `POST /api/v1/concelier/sources/osv/pause`

---

## 16) Rollout plan

1. **MVP**: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export.
2. **Add**: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export.
3. **Attestation hand‑off**: integrate with **Signer/Attestor** (optional).
4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse.
5. **Offline kit**: end‑to‑end verified bundles for air‑gap.