- Added `/concelier/advisories/{vulnerabilityKey}/replay` endpoint to return conflict summaries and explainers.
- Introduced `MergeConflictExplainerPayload` to structure conflict details including type, reason, and source rankings.
- Enhanced `MergeConflictSummary` to include structured explainer payloads and hashes for persisted conflicts.
- Updated `MirrorEndpointExtensions` to enforce rate limits and cache headers for mirror distribution endpoints.
- Refactored tests to cover new replay endpoint functionality and validate conflict explainers.
- Documented changes in TASKS.md, noting completion of mirror distribution endpoints and updated operational runbook.
	component_architecture_concelier.md — Stella Ops Concelier (2025Q4)
Scope. Implementation‑ready architecture for Concelier: the vulnerability ingest/normalize/merge/export subsystem that produces deterministic advisory data for the Scanner + Policy + Excititor pipeline. Covers domain model, connectors, merge rules, storage schema, exports, APIs, performance, security, and test matrices.
0) Mission & boundaries
Mission. Acquire authoritative vulnerability advisories (vendor PSIRTs, distros, OSS ecosystems, CERTs), normalize them into a canonical model, reconcile aliases and version ranges, and export deterministic artifacts (JSON, Trivy DB) for fast backend joins.
Boundaries.
- Concelier does not sign with private keys. When attestation is required, the export artifact is handed to the Signer/Attestor pipeline (out‑of‑process).
- Concelier does not decide PASS/FAIL; it provides data to the Policy engine.
- Online operation is allowlist‑only; air‑gapped deployments use the Offline Kit.
1) Topology & processes
Process shape: single ASP.NET Core service StellaOps.Concelier.WebService hosting:
- Scheduler with distributed locks (Mongo backed).
- Connectors (fetch/parse/map).
- Merger (canonical record assembly + precedence).
- Exporters (JSON, Trivy DB).
- Minimal REST for health/status/trigger/export.
Scale: HA by running N replicas; locks prevent overlapping jobs per source/exporter.
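A minimal sketch of how a lease-based lock could keep replicas from running the same job twice. It uses an in-memory table in place of the Mongo `locks` collection; the field names mirror that collection (holder, acquiredAt, heartbeatAt, leaseMs), while the `LockTable` class, its methods, and the expiry rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lock:
    holder: str
    acquired_at_ms: int
    heartbeat_at_ms: int
    lease_ms: int

    def expired(self, now_ms: int) -> bool:
        # Assumed rule: a lock is dead once a full lease passes without a heartbeat.
        return now_ms - self.heartbeat_at_ms > self.lease_ms


class LockTable:
    """In-memory stand-in for the `locks` collection (hypothetical helper)."""

    def __init__(self) -> None:
        self._locks: Dict[str, Lock] = {}

    def try_acquire(self, job_key: str, holder: str, now_ms: int, lease_ms: int) -> bool:
        current = self._locks.get(job_key)
        if current is not None and current.holder != holder and not current.expired(now_ms):
            return False  # another replica holds a live lease for this job
        self._locks[job_key] = Lock(holder, now_ms, now_ms, lease_ms)
        return True

    def heartbeat(self, job_key: str, holder: str, now_ms: int) -> bool:
        current = self._locks.get(job_key)
        if current is None or current.holder != holder:
            return False
        current.heartbeat_at_ms = now_ms
        return True
```

In a real deployment the acquire step would need to be a single atomic database operation; the sketch only shows the decision logic.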
2) Canonical domain model
Stored in MongoDB (database concelier), serialized with a canonical JSON writer (stable order, camelCase, normalized timestamps).
2.1 Core entities
Advisory
advisoryId          // internal GUID
advisoryKey         // stable string key (e.g., CVE-2025-12345 or vendor ID)
title               // short title (best-of from sources)
summary             // normalized summary (English; i18n optional)
published           // earliest source timestamp
modified            // latest source timestamp
severity            // normalized {none, low, medium, high, critical}
cvss                // {v2?, v3?, v4?} objects (vector, baseScore, severity, source)
exploitKnown        // bool (e.g., KEV/active exploitation flags)
references[]        // typed links (advisory, kb, patch, vendor, exploit, blog)
sources[]           // provenance for traceability (doc digests, URIs)
Alias
advisoryId
scheme              // CVE, GHSA, RHSA, DSA, USN, MSRC, etc.
value               // e.g., "CVE-2025-12345"
Affected
advisoryId
productKey          // canonical product identity (see 2.2)
rangeKind           // semver | evr | nvra | apk | rpm | deb | generic | exact
introduced?         // string (format depends on rangeKind)
fixed?              // string (format depends on rangeKind)
lastKnownSafe?      // optional explicit safe floor
arch?               // arch or platform qualifier if source declares (x86_64, aarch64)
distro?             // distro qualifier when applicable (rhel:9, debian:12, alpine:3.19)
ecosystem?          // npm|pypi|maven|nuget|golang|…
notes?              // normalized notes per source
Reference
advisoryId
url
kind                // advisory | patch | kb | exploit | mitigation | blog | cvrf | csaf
sourceTag           // e.g., vendor/redhat, distro/debian, oss/ghsa
MergeEvent
advisoryKey
beforeHash          // canonical JSON hash before merge
afterHash           // canonical JSON hash after merge
mergedAt
inputs[]            // source doc digests that contributed
AdvisoryStatement (event log)
statementId         // GUID (immutable)
vulnerabilityKey    // canonical advisory key (e.g., CVE-2025-12345)
advisoryKey         // merge snapshot advisory key (may reference variant)
statementHash       // canonical hash of advisory payload
asOf                // timestamp of snapshot (UTC)
recordedAt          // persistence timestamp (UTC)
inputDocuments[]    // document IDs contributing to the snapshot
payload             // canonical advisory document (BSON / canonical JSON)
AdvisoryConflict
conflictId          // GUID
vulnerabilityKey    // canonical advisory key
conflictHash        // deterministic hash of conflict payload
asOf                // timestamp aligned with originating statement set
recordedAt          // persistence timestamp
statementIds[]      // related advisoryStatement identifiers
details             // structured conflict explanation / merge reasoning
- AdvisoryEventLog (Concelier.Core) provides the public API for appending immutable statements/conflicts and querying replay history. Inputs are normalized by trimming and lower-casing vulnerabilityKey, serializing advisories with CanonicalJsonSerializer, and computing SHA-256 hashes (statementHash, conflictHash) over the canonical JSON payloads. Consumers can replay by key with an optional asOf filter to obtain deterministic snapshots ordered by asOf then recordedAt.
- Conflict explainers are serialized as deterministic MergeConflictExplainerPayload records (type, reason, source ranks, winning values); replay clients can parse the payload to render human-readable rationales without re-computing precedence.
- Concelier.WebService exposes the immutable log via GET /concelier/advisories/{vulnerabilityKey}/replay[?asOf=UTC_ISO8601], returning the latest statements (with hex-encoded hashes) and any conflict explanations for downstream exporters and APIs.
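The hashing and replay behaviour described above can be sketched as follows. This is an illustration, not the Concelier.Core implementation: `canonical_json` is a stand-in for CanonicalJsonSerializer (sorted keys, compact separators), ascending sort order is assumed, and timestamps are assumed to be same-format UTC ISO 8601 strings so that string comparison matches time order.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def canonical_json(payload: Dict[str, Any]) -> bytes:
    # Stand-in for CanonicalJsonSerializer: stable key order, no extra whitespace.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def statement_hash(payload: Dict[str, Any]) -> str:
    # SHA-256 over the canonical bytes, hex-encoded as in the replay response.
    return hashlib.sha256(canonical_json(payload)).hexdigest()


def normalize_key(vulnerability_key: str) -> str:
    return vulnerability_key.strip().lower()


def replay(statements: List[Dict[str, Any]], vulnerability_key: str,
           as_of: Optional[str] = None) -> List[Dict[str, Any]]:
    key = normalize_key(vulnerability_key)
    selected = [
        s for s in statements
        if s["vulnerabilityKey"] == key and (as_of is None or s["asOf"] <= as_of)
    ]
    # Deterministic order: asOf first, recordedAt as the tie-breaker.
    return sorted(selected, key=lambda s: (s["asOf"], s["recordedAt"]))
```

Because the hash is taken over canonical bytes, two payloads that differ only in key order produce the same statementHash.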
ExportState
exportKind          // json | trivydb
baseExportId?       // last full baseline
baseDigest?         // digest of last full baseline
lastFullDigest?     // digest of last full export
lastDeltaDigest?    // digest of last delta export
cursor              // per-kind incremental cursor
files[]             // last manifest snapshot (path → sha256)
2.2 Product identity (productKey)
- Primary: purl (Package URL).
- OS packages: RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with EVR/NVRA preserved.
- Secondary: cpe retained for compatibility; advisory records may carry both.
- Image/platform: oci:<registry>/<repo>@<digest> for image‑level advisories (rare).
- Unmappable: if a source is non‑deterministic, keep native string under productKey="native:<provider>:<id>" and mark non‑joinable.
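As a rough illustration of the fallback rule, the sketch below prefers a purl and otherwise builds the native key with a non-joinable flag. The `ProductKey` type and `product_key` function are hypothetical names; only the `native:<provider>:<id>` shape comes from the text above.

```python
from typing import NamedTuple, Optional


class ProductKey(NamedTuple):
    value: str
    joinable: bool  # False when the key cannot be joined against scanner output


def product_key(purl: Optional[str], provider: str, native_id: str) -> ProductKey:
    # A Package URL always starts with the "pkg:" scheme.
    if purl and purl.startswith("pkg:"):
        return ProductKey(purl, True)
    # Unmappable source: keep the provider's own string and mark it non-joinable.
    return ProductKey(f"native:{provider}:{native_id}", False)
```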
3) Source families & precedence
3.1 Families
- Vendor PSIRTs: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium…
- Linux distros: Red Hat, SUSE, Ubuntu, Debian, Alpine…
- OSS ecosystems: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go.
- CERTs / national CSIRTs: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERT‑FR/BUND, etc.
3.2 Precedence (when claims conflict)
- Vendor PSIRT (authoritative for their product).
- Distro (authoritative for packages they ship, including backports).
- Ecosystem (OSV/GHSA) for library semantics.
- CERTs/aggregators for enrichment (KEV/known exploited).
Precedence affects Affected ranges and fixed info; severity is normalized to the maximum credible severity unless policy overrides. Conflicts are retained with source provenance.
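One way to picture the precedence order is as a rank applied to each claim's sourceTag, with every claim retained. The rank table follows the order listed above; the `cert` prefix and the mapping of `oss/…` tags to the ecosystem family are assumptions, since the text only shows `vendor/…`, `distro/…`, and `oss/…` examples.

```python
from typing import Any, Dict, List

# Lower rank wins; order taken from the precedence list above.
FAMILY_RANK = {"vendor": 0, "distro": 1, "ecosystem": 2, "cert": 3}


def family_of(source_tag: str) -> str:
    # sourceTag examples: "vendor/redhat", "distro/debian", "oss/ghsa".
    prefix = source_tag.split("/", 1)[0]
    return "ecosystem" if prefix == "oss" else prefix


def order_claims(claims: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Conflicting claims are sorted, never dropped, so provenance is preserved.
    return sorted(
        claims,
        key=lambda c: (FAMILY_RANK.get(family_of(c["sourceTag"]), len(FAMILY_RANK)),
                       c["sourceTag"]),
    )
```

The first element is the winning claim; the rest stay available so Policy can decide differently.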
4) Connectors & normalization
4.1 Connector contract
public interface IFeedConnector {
  string SourceName { get; }
  Task FetchAsync(IServiceProvider sp, CancellationToken ct);   // -> document collection
  Task ParseAsync(IServiceProvider sp, CancellationToken ct);   // -> dto collection (validated)
  Task MapAsync(IServiceProvider sp, CancellationToken ct);     // -> advisory/alias/affected/reference
}
- Fetch: windowed (cursor), conditional GET (ETag/Last‑Modified), retry/backoff, rate limiting.
- Parse: schema validation (JSON Schema, XSD/CSAF), content type checks; write DTO with normalized casing.
- Map: build canonical records; all outputs carry provenance (doc digest, URI, anchors).
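The conditional GET step of Fetch could look roughly like this. The header names are standard HTTP; the helper names and the idea of reading the validators from the stored document's etag/lastModified fields are assumptions.

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> Dict[str, str]:
    """Build validators from the previously stored document, if any."""
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def is_unchanged(status: int) -> bool:
    # 304 Not Modified: the stored document is current, so Parse can be skipped.
    return status == 304
```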
4.2 Version range normalization
- SemVer ecosystems (npm, pypi, maven, nuget, golang): normalize to introduced/fixed semver ranges (~, ^, <, >= canonicalized to intervals).
- RPM EVR: epoch:version-release with rpmvercmp semantics; store raw EVR strings and also computed order keys for query.
- DEB: dpkg version comparison semantics mirrored; store computed keys.
- APK: Alpine version semantics; compute order keys.
- Generic: if provider uses text, retain raw; do not invent ranges.
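For the SemVer case, canonicalizing an operator to an interval might be sketched as below. It follows npm's caret and tilde rules for plain `major.minor.patch` versions; pre-release tags, build metadata, and partial versions are deliberately left out, and the function names are illustrative.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse(v: str) -> Version:
    major, minor, patch = (int(p) for p in v.split("."))
    return (major, minor, patch)


def fmt(v: Version) -> str:
    return ".".join(str(p) for p in v)


def to_interval(spec: str) -> Tuple[str, Optional[str]]:
    """Return (inclusive lower bound, exclusive upper bound or None)."""
    if spec.startswith("^"):
        lo = parse(spec[1:])
        major, minor, patch = lo
        if major > 0:
            hi = (major + 1, 0, 0)      # ^1.2.3 allows anything below 2.0.0
        elif minor > 0:
            hi = (0, minor + 1, 0)      # ^0.2.3 allows anything below 0.3.0
        else:
            hi = (0, 0, patch + 1)      # ^0.0.3 allows only 0.0.3
        return fmt(lo), fmt(hi)
    if spec.startswith("~"):
        lo = parse(spec[1:])
        return fmt(lo), fmt((lo[0], lo[1] + 1, 0))
    if spec.startswith(">="):
        return fmt(parse(spec[2:])), None
    raise ValueError(f"unsupported range spec: {spec}")
```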
4.3 Severity & CVSS
- Normalize CVSS v2/v3/v4 where available (vector, baseScore, severity).
- If multiple CVSS sources exist, track them all; effective severity defaults to max by policy (configurable).
- ExploitKnown toggled by KEV and equivalent sources; store evidence (source, date).
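The default "max" policy can be sketched in a few lines. The severity scale is the normalized one from the Advisory entity; the function name and the shape of the CVSS entries are assumptions.

```python
from typing import Dict, List, Optional

# Normalized scale from the Advisory entity, lowest to highest.
SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"]


def effective_severity(cvss_entries: List[Dict[str, str]]) -> Optional[str]:
    # All entries stay on the record; only the effective value is reduced to one.
    if not cvss_entries:
        return None
    return max((e["severity"] for e in cvss_entries), key=SEVERITY_ORDER.index)
```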
5) Merge engine
5.1 Keying & identity
- Identity graph: CVE is primary node; vendor/distro IDs resolved via Alias edges (from connectors and Concelier’s alias tables).
- advisoryKey is the canonical primary key (CVE if present, else vendor/distro key).
5.2 Merge algorithm (deterministic)
- Gather all rows for advisoryKey (across sources).
- Select title/summary by precedence source (vendor>distro>ecosystem>cert).
- Union aliases (dedupe by scheme+value).
- Merge Affected with rules:
  - Prefer vendor ranges for vendor products; prefer distro for distro‑shipped packages.
  - If both exist for same productKey, keep both; mark sourceTag and precedence so Policy can decide.
  - Never collapse range semantics across different families (e.g., rpm EVR vs semver).
- CVSS/severity: record all CVSS sets; compute effectiveSeverity = max (unless policy override).
- References: union with type precedence (advisory > patch > kb > exploit > blog); dedupe by URL; preserve sourceTag.
- Produce canonical JSON; compute afterHash; store MergeEvent with inputs and hashes.
The merge is pure given inputs. Any change in inputs or precedence matrices changes the hash predictably.
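A reduced sketch of the alias and reference steps, ending in a hash, shows what "pure given inputs" means in practice: the same rows in any order give the same afterHash. The row shape, the sourceTag tie-breaker, and the use of `json.dumps` in place of the canonical writer are assumptions; Affected and CVSS handling are omitted.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple

# Reference type precedence from the algorithm above (lower wins).
KIND_RANK = {"advisory": 0, "patch": 1, "kb": 2, "exploit": 3, "blog": 4}


def union_aliases(rows: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    seen = {(a["scheme"], a["value"]) for row in rows for a in row.get("aliases", [])}
    return [{"scheme": s, "value": v} for s, v in sorted(seen)]


def _ref_key(ref: Dict[str, str]) -> Tuple[int, str]:
    # Tie-break on sourceTag so the winner does not depend on input order.
    return (KIND_RANK.get(ref["kind"], len(KIND_RANK)), ref.get("sourceTag", ""))


def union_references(rows: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    best: Dict[str, Dict[str, str]] = {}
    for row in rows:
        for ref in row.get("references", []):
            current = best.get(ref["url"])
            if current is None or _ref_key(ref) < _ref_key(current):
                best[ref["url"]] = ref
    return [best[url] for url in sorted(best)]


def merge(advisory_key: str, rows: List[Dict[str, Any]]) -> Tuple[Dict[str, Any], str]:
    merged = {
        "advisoryKey": advisory_key,
        "aliases": union_aliases(rows),
        "references": union_references(rows),
    }
    canonical = json.dumps(merged, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return merged, hashlib.sha256(canonical).hexdigest()
```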
6) Storage schema (MongoDB)
Collections & indexes
- source {_id, type, baseUrl, enabled, notes}
- source_state {sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}
- document {_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}
  - Index: {sourceName:1, uri:1} unique, {fetchedAt:-1}
- dto {_id, sourceName, documentId, schemaVer, payload, validatedAt}
  - Index: {sourceName:1, documentId:1}
- advisory {_id, advisoryKey, title, summary, published, modified, severity, cvss, exploitKnown, sources[]}
  - Index: {advisoryKey:1} unique, {modified:-1}, {severity:1}, text index (title, summary)
- alias {advisoryId, scheme, value}
  - Index: {scheme:1,value:1}, {advisoryId:1}
- affected {advisoryId, productKey, rangeKind, introduced?, fixed?, arch?, distro?, ecosystem?}
  - Index: {productKey:1}, {advisoryId:1}, {productKey:1, rangeKind:1}
- reference {advisoryId, url, kind, sourceTag}
  - Index: {advisoryId:1}, {kind:1}
- merge_event {advisoryKey, beforeHash, afterHash, mergedAt, inputs[]}
  - Index: {advisoryKey:1, mergedAt:-1}
- export_state {_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}
- locks {_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt} (TTL cleans dead locks)
- jobs {_id, type, args, state, startedAt, heartbeatAt, endedAt, error}
GridFS buckets: fs.documents for raw payloads.
7) Exporters
7.1 Deterministic JSON (vuln‑list style)
- Folder structure mirroring /<scheme>/<first-two>/<rest>/… with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace.
- manifest.json lists all files with SHA‑256 and a top‑level export digest.
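A manifest of this kind could be built along the following lines. The per-file SHA-256 comes from the text; how the top-level export digest is derived is not specified, so hashing the sorted "path sha256" lines is an assumption, as are the function name and the output field names.

```python
import hashlib
from typing import Any, Dict


def build_manifest(files: Dict[str, bytes]) -> Dict[str, Any]:
    # Sort paths so the manifest does not depend on filesystem or dict order.
    entries = [
        {"path": path, "sha256": hashlib.sha256(files[path]).hexdigest()}
        for path in sorted(files)
    ]
    # Assumed derivation of the export digest: hash of the sorted entry lines.
    digest_input = "\n".join(f'{e["path"]} {e["sha256"]}' for e in entries).encode("utf-8")
    return {
        "files": entries,
        "digest": "sha256:" + hashlib.sha256(digest_input).hexdigest(),
    }
```

Two runs over identical file contents yield identical manifests, which is the property the export determinism tests rely on.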
7.2 Trivy DB exporter
- Builds Bolt DB archives compatible with Trivy; supports full and delta modes.
- In delta, unchanged blobs are reused from the base; metadata captures: { "mode": "delta|full", "baseExportId": "...", "baseManifestDigest": "sha256:...", "changed": ["path1", "path2"], "removed": ["path3"] }
- Optional ORAS push (OCI layout) for registries.
- Offline kit bundles include Trivy DB + JSON tree + export manifest.
- Mirror-ready bundles: when concelier.trivy.mirror defines domains, the exporter emits mirror/index.json plus per-domain manifest.json, metadata.json, and db.tar.gz files with SHA-256 digests so Concelier mirrors can expose domain-scoped download endpoints.
- Concelier.WebService serves /concelier/exports/index.json and /concelier/exports/mirror/{domain}/… directly from the export tree with cache budgets (index: 60 s, bundles: 300 s, immutable) and per-domain rate limiting; the endpoints honour Stella Ops Authority or CIDR bypass lists depending on mirror topology.
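The delta metadata can be derived by comparing the base manifest with the new one. The sketch below assumes both manifests are available as path → digest maps (as in export_state.files[]); the function name is illustrative and the output keys follow the metadata shape shown above.

```python
from typing import Any, Dict


def delta_metadata(base_export_id: str, base_manifest_digest: str,
                   base_files: Dict[str, str], new_files: Dict[str, str]) -> Dict[str, Any]:
    # A path counts as changed when it is new or its digest differs from the base.
    changed = sorted(p for p, digest in new_files.items() if base_files.get(p) != digest)
    removed = sorted(p for p in base_files if p not in new_files)
    return {
        "mode": "delta",
        "baseExportId": base_export_id,
        "baseManifestDigest": base_manifest_digest,
        "changed": changed,
        "removed": removed,
    }
```

Paths absent from both lists are the unchanged blobs that a delta export can reuse from the base.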
7.3 Hand‑off to Signer/Attestor (optional)
- On export completion, if attest: true is set in job args, Concelier posts the artifact metadata to Signer/Attestor; Concelier itself does not hold signing keys.
- Export record stores returned { uuid, index, url } from Rekor v2.
8) REST APIs
All under /api/v1/concelier.
Health & status
GET  /healthz | /readyz
GET  /status                              → sources, last runs, export cursors
Sources & jobs
GET  /sources                              → list of configured sources
POST /sources/{name}/trigger               → { jobId }
POST /sources/{name}/pause | /resume       → toggle
GET  /jobs/{id}                            → job status
Exports
POST /exports/json   { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? }
POST /exports/trivy  { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? }
GET  /exports/{id}   → export metadata (kind, digest, createdAt, rekor?)
GET  /concelier/exports/index.json        → mirror index describing available domains/bundles
GET  /concelier/exports/mirror/{domain}/manifest.json
GET  /concelier/exports/mirror/{domain}/bundle.json
GET  /concelier/exports/mirror/{domain}/bundle.json.jws
Search (operator debugging)
GET  /advisories/{key}
GET  /advisories?scheme=CVE&value=CVE-2025-12345
GET  /affected?productKey=pkg:rpm/openssl&limit=100
AuthN/Z: Authority tokens (OpTok) with roles: concelier.read, concelier.admin, concelier.export.
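As an example of calling the export endpoints, the sketch below builds (but does not send) a POST request with the Python standard library. The host name is a placeholder, and passing the Authority token as a Bearer header is an assumption; the text does not state how OpTok is presented.

```python
import json
import urllib.request


def export_request(base_url: str, kind: str, token: str, *,
                   full: bool = False, force: bool = False,
                   attest: bool = False) -> urllib.request.Request:
    # Body fields follow the POST /exports/{json|trivy} contract listed above.
    body = json.dumps({"full": full, "force": force, "attest": attest}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/concelier/exports/{kind}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed scheme for OpTok
        },
    )


# Building the request has no side effects; urllib.request.urlopen(req) would send it.
req = export_request("https://concelier.example.internal", "json", "optok", full=True, force=True)
```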
9) Configuration (YAML)
concelier:
  mongo: { uri: "mongodb://mongo/concelier" }
  s3:
    endpoint: "http://minio:9000"
    bucket: "stellaops-concelier"
  scheduler:
    windowSeconds: 30
    maxParallelSources: 4
  sources:
    - name: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signature: { type: pgp, keys: [ "…redhat PGP…" ] }
      enabled: true
      windowDays: 7
    - name: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signature: { type: pgp, keys: [ "…suse PGP…" ] }
    - name: ubuntu
      kind: usn-json
      baseUrl: https://ubuntu.com/security/notices.json
      signature: { type: none }
    - name: osv
      kind: osv
      baseUrl: https://api.osv.dev/v1/
      signature: { type: none }
    - name: ghsa
      kind: ghsa
      baseUrl: https://api.github.com/graphql
      auth: { tokenRef: "env:GITHUB_TOKEN" }
  exporters:
    json:
      enabled: true
      output: s3://stellaops-concelier/json/
    trivy:
      enabled: true
      mode: full
      output: s3://stellaops-concelier/trivy/
      oras:
        enabled: false
        repo: ghcr.io/org/concelier
  precedence:
    vendorWinsOverDistro: true
    distroWinsOverOsv: true
  severity:
    policy: max    # or 'vendorPreferred' / 'distroPreferred'
10) Security & compliance
- Outbound allowlist per connector (domains, protocols); proxy support; TLS pinning where possible.
- Signature verification for raw docs (PGP/cosign/x509) with results stored in document.metadata.sig. Docs failing verification may still be ingested but flagged; merge can down‑weight or ignore them by config.
- No secrets in logs; auth material via env: or mounted files; HTTP redaction of Authorization headers.
- Multi‑tenant: per‑tenant DBs or prefixes; per‑tenant S3 prefixes; tenant‑scoped API tokens.
- Determinism: canonical JSON writer; export digests stable across runs given same inputs.
11) Performance targets & scale
- Ingest: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
- Normalize/map: ≥ 50k Affected rows/min on 4 cores.
- Merge: ≤ 10 ms P95 per advisory at steady‑state updates.
- Export: 1M advisories JSON in ≤ 90 s (streamed, zstd), Trivy DB in ≤ 60 s on 8 cores.
- Memory: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.
Scale pattern: add Concelier replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs.
12) Observability
- Metrics
  - concelier.fetch.docs_total{source}
  - concelier.fetch.bytes_total{source}
  - concelier.parse.failures_total{source}
  - concelier.map.affected_total{source}
  - concelier.merge.changed_total
  - concelier.export.bytes{kind}
  - concelier.export.duration_seconds{kind}
- Tracing around fetch/parse/map/merge/export.
- Logs: structured with source, uri, docDigest, advisoryKey, exportId.
13) Testing matrix
- Connectors: fixture suites for each provider/format (happy path; malformed; signature fail).
- Version semantics: EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, pre‑releases).
- Merge: conflicting sources (vendor vs distro vs OSV); verify precedence & dual retention.
- Export determinism: byte‑for‑byte stable outputs across runs; digest equality.
- Performance: soak tests with 1M advisories; cap memory; verify backpressure.
- API: pagination, filters, RBAC, error envelopes (RFC 7807).
- Offline kit: bundle build & import correctness.
14) Failure modes & recovery
- Source outages: scheduler backs off with exponential delay, recording the resume time in source_state.backoffUntil; alerts on staleness.
- Schema drifts: parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges.
- Partial exports: exporters write to temp prefix; manifest commit is atomic; only then move to final prefix and update export_state.
- Resume: all stages idempotent; source_state.cursor supports window resume.
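The exponential backoff for source outages might be computed as in this sketch. The base delay, the cap, and the function signature are assumptions chosen for illustration; only the idea of storing the result in source_state.backoffUntil comes from the document.

```python
from datetime import datetime, timedelta, timezone


def backoff_until(now: datetime, consecutive_failures: int,
                  base_seconds: int = 30, cap_seconds: int = 3600) -> datetime:
    # Delay doubles with each consecutive failure and is capped at cap_seconds.
    exponent = max(0, consecutive_failures - 1)
    delay = min(cap_seconds, base_seconds * 2 ** exponent)
    return now + timedelta(seconds=delay)


# Third consecutive failure with the assumed defaults: 30 s * 2^2 = 120 s.
now = datetime(2025, 10, 1, tzinfo=timezone.utc)
resume_at = backoff_until(now, 3)
```

The scheduler would skip the source until the stored time has passed and reset the failure count after a successful run.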
15) Operator runbook (quick)
- Trigger all sources: POST /api/v1/concelier/sources/*/trigger
- Force full export JSON: POST /api/v1/concelier/exports/json { "full": true, "force": true }
- Force Trivy DB delta publish: POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }
- Inspect advisory: GET /api/v1/concelier/advisories?scheme=CVE&value=CVE-2025-12345
- Pause noisy source: POST /api/v1/concelier/sources/osv/pause
16) Rollout plan
- MVP: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export.
- Add: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export.
- Attestation hand‑off: integrate with Signer/Attestor (optional).
- Scale & diagnostics: provider dashboards, staleness alerts, export cache reuse.
- Offline kit: end‑to‑end verified bundles for air‑gap.