component_architecture_concelier.md — Stella Ops Concelier (Sprint 22)
Derived from Epic 1 – AOC enforcement and aligned with the Export Center evidence interfaces first scoped in Epic 10.
Scope. Implementation-ready architecture for Concelier: the advisory ingestion and Link-Not-Merge (LNM) observation pipeline that produces deterministic raw observations, correlation linksets, and evidence events consumed by Policy Engine, Console, CLI, and Export centers. Covers domain models, connectors, observation/linkset builders, storage schema, events, APIs, performance, security, and test matrices.
0) Mission & boundaries
Mission. Acquire authoritative vulnerability advisories (vendor PSIRTs, distros, OSS ecosystems, CERTs), persist them as immutable observations under the Aggregation-Only Contract (AOC), construct linksets that correlate observations without merging or precedence, and export deterministic evidence bundles (JSON, Trivy DB, Offline Kit) for downstream policy evaluation and operator tooling.
Boundaries.
- Concelier does not sign with private keys. When attestation is required, the export artifact is handed to the Signer/Attestor pipeline (out‑of‑process).
- Concelier does not decide PASS/FAIL; it provides data to the Policy engine.
- Online operation is allowlist‑only; air‑gapped deployments use the Offline Kit.
1) Aggregation-Only Contract guardrails
Epic 1 distilled — the service itself is the enforcement point for AOC. The guardrail checklist is embedded in code (AOCWriteGuard) and must be satisfied before any advisory hits PostgreSQL:
- No derived semantics in ingestion. The DTOs produced by connectors cannot contain severity, consensus, reachability, merged status, or fix hints. Roslyn analyzers (`StellaOps.AOC.Analyzers`) scan connectors and fail builds if forbidden properties appear.
- Immutable raw rows. Every upstream advisory is persisted in `advisory_raw` with append-only semantics. Revisions produce new IDs via version suffix (`:v2`, `:v3`), linking back through `supersedes`.
- Mandatory provenance. Collectors record `source`, `upstream` metadata (`document_version`, `fetched_at`, `received_at`, `content_hash`), and signature presence before writing.
- Linkset only. Derived joins (aliases, PURLs, CPEs, references) are stored inside `linkset` and never mutate `content.raw`.
- Deterministic canonicalisation. Writers use canonical JSON (sorted object keys, lexicographic arrays) ensuring identical inputs yield the same hashes/diff-friendly outputs.
- Idempotent upserts. `(source.vendor, upstream.upstream_id, upstream.content_hash)` uniquely identify a document. Duplicate hashes short-circuit; new hashes create a new version.
- Verifier & CI. `StellaOps.AOC.Verifier` processes observation batches in CI and at runtime, rejecting writes lacking provenance, introducing unordered collections, or violating the schema.

Feature toggle: set `concelier:features:noMergeEnabled=true` to disable the legacy Merge module and its `merge:reconcile` job once Link-Not-Merge adoption is complete (MERGE-LNM-21-002). Analyzer `CONCELIER0002` prevents new references to Merge DI helpers when this flag is enabled.
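
The forbidden-field and provenance items of the checklist can be sketched as below. This is a minimal illustration that assumes a dictionary-shaped DTO; the deny-list, the required-field tuple, and the `find_violations` helper are hypothetical names chosen for the example and are not the actual `AOCWriteGuard` API.

```python
from typing import Any

# Assumed deny-list, derived from the guardrail wording above; the property
# names used by the real analyzers may differ.
FORBIDDEN_KEYS = {"severity", "consensus", "reachability", "merged", "fix_hints"}
# Provenance fields the checklist says must be recorded before writing.
REQUIRED_UPSTREAM = ("document_version", "fetched_at", "received_at", "content_hash")


def find_violations(doc: dict[str, Any]) -> list[str]:
    """Return guardrail violations for one raw advisory document (empty = OK)."""
    violations = []
    # Only the DTO's own top-level keys are checked; content.raw is the
    # unmodified upstream payload and may legitimately contain e.g. "severity".
    for key in sorted(FORBIDDEN_KEYS.intersection(doc)):
        violations.append(f"forbidden field: {key}")
    if not doc.get("source"):
        violations.append("missing provenance: source")
    upstream = doc.get("upstream") or {}
    for key in REQUIRED_UPSTREAM:
        if not upstream.get(key):
            violations.append(f"missing provenance: upstream.{key}")
    # Signature *presence* must be recorded, whether it is true or false.
    if "present" not in (upstream.get("signature") or {}):
        violations.append("missing provenance: upstream.signature.present")
    return violations
```

A write would proceed only when the returned list is empty; how violations map onto the `ERR_AOC_00x` codes is not modelled here.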
1.1 Advisory raw document shape
```jsonc
{
  "_id": "advisory_raw:osv:GHSA-xxxx-....:v3",
  "source": {
    "vendor": "OSV",
    "stream": "github",
    "api": "https://api.osv.dev/v1/.../GHSA-...",
    "collector_version": "concelier/1.7.3"
  },
  "upstream": {
    "upstream_id": "GHSA-xxxx-....",
    "document_version": "2025-09-01T12:13:14Z",
    "fetched_at": "2025-09-01T13:04:05Z",
    "received_at": "2025-09-01T13:04:06Z",
    "content_hash": "sha256:...",
    "signature": {
      "present": true,
      "format": "dsse",
      "key_id": "rekor:.../key/abc",
      "sig": "base64..."
    }
  },
  "content": {
    "format": "OSV",
    "spec_version": "1.6",
    "raw": { /* unmodified upstream document */ }
  },
  "identifiers": {
    "primary": "GHSA-xxxx-....",
    "aliases": ["CVE-2025-12345", "GHSA-xxxx-...."]
  },
  "linkset": {
    "purls": ["pkg:npm/lodash@4.17.21"],
    "cpes": ["cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:*:*:*"],
    "references": [
      {"type":"advisory","url":"https://..."},
      {"type":"fix","url":"https://..."}
    ],
    "reconciled_from": ["content.raw.affected.ranges", "content.raw.pkg"]
  },
  "advisory_key": "CVE-2025-12345",
  "links": [
    {"scheme":"CVE","value":"CVE-2025-12345"},
    {"scheme":"GHSA","value":"GHSA-XXXX-...."},
    {"scheme":"PRIMARY","value":"CVE-2025-12345"}
  ],
  "supersedes": "advisory_raw:osv:GHSA-xxxx-....:v2",
  "tenant": "default"
}
```
1.2 Connector lifecycle
- Snapshot stage: connectors fetch signed feeds or use offline mirrors keyed by `{vendor, stream, snapshot_date}`.
- Parse stage: upstream payloads are normalised into strongly-typed DTOs with UTC timestamps.
- Guard stage: DTOs run through `AOCWriteGuard`, which performs schema validation, forbidden-field checks, provenance validation, deterministic sorting, and `_id` computation.
- Write stage: append-only PostgreSQL insert; a duplicate hash is ignored, a changed hash creates a new version and emits a `supersedes` pointer.
- Event stage: DSSE-backed events `advisory.observation.updated` and `advisory.linkset.updated` notify downstream services (Policy, Export Center, CLI).
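
The write stage's duplicate/changed-hash behaviour can be sketched with an in-memory stand-in for the table. `RawStore` and its `write` method are hypothetical names for illustration; the `:v1` suffix for a first revision is also an assumption, since the text only shows `:v2` and `:v3`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RawStore:
    """In-memory stand-in for the append-only raw advisory table."""
    rows: list = field(default_factory=list)

    def write(self, vendor: str, upstream_id: str, content_hash: str) -> Optional[dict]:
        """Append a new version, or return None when the hash is already stored."""
        history = [r for r in self.rows
                   if r["vendor"] == vendor and r["upstream_id"] == upstream_id]
        if any(r["content_hash"] == content_hash for r in history):
            return None  # duplicate hash: ignored, nothing is written
        version = len(history) + 1  # assumption: the first revision is ":v1"
        row = {
            "_id": f"advisory_raw:{vendor.lower()}:{upstream_id}:v{version}",
            "vendor": vendor,
            "upstream_id": upstream_id,
            "content_hash": content_hash,
            # Changed hash: point back at the previous revision.
            "supersedes": history[-1]["_id"] if history else None,
        }
        self.rows.append(row)  # existing rows are never updated in place
        return row
```

Re-ingesting the same document is therefore a no-op, while a changed payload yields a new row whose `supersedes` field names the revision it replaces.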
1.3 Export readiness
Concelier feeds Export Center profiles (Epic 10) by:
- Maintaining canonical JSON exports with deterministic manifests (`export.json`) listing content hashes, counts, and `supersedes` chains.
- Producing Trivy DB-compatible artifacts (Bolt DB + metadata) packaged under `db/` with hash manifests.
- Surfacing mirror manifests that reference PostgreSQL snapshot digests, enabling Offline Kit bundle verification.

Running the same export job twice against the same snapshot must yield byte-identical archives and manifest hashes.
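
One way to obtain that byte-for-byte stability is to serialise with sorted keys and fixed separators, then hash a path-sorted file list. The sketch below assumes a simple manifest layout (`files`, `count`, `exportDigest`); these field names and both helper functions are illustrative, not the actual `export.json` schema.

```python
import hashlib
import json


def canonical_json(value) -> bytes:
    """Serialise with sorted object keys and no insignificant whitespace."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def build_manifest(files: dict) -> dict:
    """Build a manifest from {path: bytes}; output is independent of dict order."""
    entries = [
        {"path": path, "sha256": hashlib.sha256(files[path]).hexdigest()}
        for path in sorted(files)  # stable ordering of the file list
    ]
    digest = hashlib.sha256(canonical_json(entries)).hexdigest()
    return {"files": entries, "count": len(entries), "exportDigest": f"sha256:{digest}"}
```

Because neither function reads the clock or depends on iteration order, two runs over the same inputs produce the same digest.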
2) Topology & processes
Process shape: single ASP.NET Core service StellaOps.Concelier.WebService hosting:
- Scheduler with distributed locks (PostgreSQL backed).
- Connectors (fetch/parse/map) that emit immutable observation candidates.
- Observation writer enforcing AOC invariants via `AOCWriteGuard`.
- Linkset builder that correlates observations into `advisory_linksets` and annotates conflicts.
- Event publisher emitting `advisory.observation.updated` and `advisory.linkset.updated` messages.
- Exporters (JSON, Trivy DB, Offline Kit slices) fed from observation/linkset stores.
- Minimal REST for health/status/trigger/export, raw observation reads, and evidence retrieval (`GET /vuln/evidence/advisories/{advisory_key}`).

Scale: HA by running N replicas; locks prevent overlapping jobs per source/exporter.
3) Canonical domain model
Stored in PostgreSQL (database `concelier`), serialized with a canonical JSON writer (stable order, camelCase, normalized timestamps).
### 2.1 Core entities
#### AdvisoryObservation
```jsonc
observationId // deterministic id: {tenant}:{source.vendor}:{upstreamId}:{revision}
tenant // issuing tenant (lower-case)
source{
vendor, stream, api, collectorVersion
}
upstream{
upstreamId, documentVersion, fetchedAt, receivedAt,
contentHash, signature{present, format?, keyId?, signature?}
}
content{
format, specVersion, raw, metadata?
}
identifiers{
cve?, ghsa?, vendorIds[], aliases[]
}
linkset{
purls[], cpes[], aliases[], references[{type,url}],
reconciledFrom[]
}
createdAt // when Concelier recorded the observation
attributes // optional provenance metadata (batch ids, ingest cursor)
```
#### AdvisoryLinkset
```jsonc
linksetId // sha256 over sorted (tenant, product/vuln tuple, observation ids)
tenant
key{
vulnerabilityId,
productKey,
confidence // low|medium|high
}
observations[] = [
{
observationId,
sourceVendor,
statement{
status?, severity?, references?, notes?
},
collectedAt
}
]
aliases{
primary,
others[]
}
purls[]
cpes[]
conflicts[]? // see AdvisoryLinksetConflict
createdAt
updatedAt
```
#### AdvisoryLinksetConflict
```jsonc
conflictId // deterministic hash
type // severity-mismatch | affected-range-divergence | reference-clash | alias-inconsistency | metadata-gap
field? // optional JSON pointer (e.g., /statement/severity/vector)
observations[] // per-source values contributing to the conflict
confidence // low|medium|high (heuristic weight)
detectedAt
```
#### ObservationEvent / LinksetEvent
```jsonc
eventId // ULID
tenant
type // advisory.observation.updated | advisory.linkset.updated
key{
observationId? // on observation event
linksetId? // on linkset event
vulnerabilityId?,
productKey?
}
delta{
added[], removed[], changed[] // normalized summary for consumers
}
hash // canonical hash of serialized delta payload
occurredAt
```
#### ExportState
```jsonc
exportKind // json | trivydb
baseExportId? // last full baseline
baseDigest? // digest of last full baseline
lastFullDigest? // digest of last full export
lastDeltaDigest? // digest of last delta export
cursor // per-kind incremental cursor
files[] // last manifest snapshot (path → sha256)
```
Legacy `Advisory`, `Affected`, and merge-centric entities remain in the repository for historical exports and replay but are being phased out as Link-Not-Merge takes over. New code paths must interact with `AdvisoryObservation` / `AdvisoryLinkset` exclusively and emit conflicts through the structured payloads described above.
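
The `linksetId` comment above ("sha256 over sorted (tenant, product/vuln tuple, observation ids)") can be read as the following sketch. The newline separator, the `sha256:` prefix, and the `linkset_id` helper are assumptions made for the example; the real encoding may differ.

```python
import hashlib


def linkset_id(tenant: str, vulnerability_id: str, product_key: str,
               observation_ids: list) -> str:
    """Derive an order-independent id from the key tuple and its observations."""
    parts = [
        tenant.lower(),          # tenants are stored lower-case
        vulnerability_id,
        product_key,
        *sorted(set(observation_ids)),  # sorting makes input order irrelevant
    ]
    payload = "\n".join(parts).encode("utf-8")  # assumed separator
    return "sha256:" + hashlib.sha256(payload).hexdigest()
```

Two builders that see the same observations in a different order arrive at the same identifier, which is what makes linkset writes idempotent.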
### 2.2 Product identity (`productKey`)
* **Primary:** `purl` (Package URL).
* **OS packages:** RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with **EVR/NVRA** preserved.
* **Secondary:** `cpe` retained for compatibility; advisory records may carry both.
* **Image/platform:** `oci:<registry>/<repo>@<digest>` for image‑level advisories (rare).
* **Unmappable:** if a source is non‑deterministic, keep native string under `productKey="native:<provider>:<id>"` and mark **non‑joinable**.
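
A small sketch of the preference order above, covering only the primary (`purl`) and unmappable (`native:`) cases. The `product_key` helper and its return shape are hypothetical; CPE and OCI handling are intentionally left out.

```python
from typing import Optional, Tuple


def product_key(purl: Optional[str] = None, provider: Optional[str] = None,
                native_id: Optional[str] = None) -> Tuple[str, bool]:
    """Return (productKey, joinable) for one affected product."""
    if purl and purl.startswith("pkg:"):
        return purl, True  # primary identity: Package URL
    if provider and native_id:
        # Unmappable source: keep the native string and mark it non-joinable.
        return f"native:{provider}:{native_id}", False
    raise ValueError("need a purl or a provider/native id pair")
```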
---
## 4) Source families & precedence
The source catalog contains **75 definitions** across **14 categories**. The authoritative definition lives in `src/Concelier/__Libraries/StellaOps.Concelier.Core/Sources/SourceDefinitions.cs`; for the full connector index see `docs/modules/concelier/connectors.md`.
### 3.1 Families
* **Primary databases**: NVD, OSV, GHSA, CVE.org (MITRE).
* **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Apple, VMware, Fortinet, Juniper, Palo Alto, plus cloud providers (AWS, Azure, GCP).
* **Linux distros**: Debian, Ubuntu, Alpine, SUSE, RHEL, CentOS, Fedora, Arch, Gentoo, Astra Linux.
* **OSS ecosystems**: npm, PyPI, Go, RubyGems, NuGet, Maven, Crates.io, Packagist, Hex.pm.
* **Package manager native**: RustSec (cargo-audit), PyPA (pip-audit), Go Vuln DB (govulncheck), Ruby Advisory DB (bundler-audit).
* **CSAF/VEX**: CSAF Aggregator, CSAF TC Trusted Publishers, VEX Hub.
* **Exploit databases**: Exploit-DB, PoC-in-GitHub, Metasploit Modules.
* **Container**: Docker Official CVEs, Chainguard Advisories.
* **Hardware/firmware**: Intel PSIRT, AMD Security, ARM Security Center.
* **ICS/SCADA**: Siemens ProductCERT, Kaspersky ICS-CERT.
* **CERTs / national CSIRTs**: CERT-FR, CERT-Bund, CERT.at, CERT.be, NCSC-CH, CERT-EU, JPCERT/CC, CISA (US-CERT), CERT-UA, CERT.PL, AusCERT, KrCERT/CC, CERT-In.
* **Russian/CIS**: FSTEC BDU, NKCKI (both promoted to stable).
* **Threat intelligence**: EPSS (FIRST), CISA KEV, MITRE ATT&CK, MITRE D3FEND.
* **StellaOps Mirror**: Pre-aggregated advisory mirror for offline/air-gap deployments.
### Source category enum
Primary, Vendor, Distribution, Ecosystem, Cert, Csaf, Threat, Exploit, Container, Hardware, Ics, PackageManager, Mirror, Other
### 3.2 Precedence (when claims conflict)
1. **Vendor PSIRT** (authoritative for their product).
2. **Distro** (authoritative for packages they ship, including backports).
3. **Ecosystem** (OSV/GHSA) for library semantics.
4. **CERTs/aggregators** for enrichment (KEV/known exploited).
> Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**.
---
## 5) Connectors & normalization
### 4.1 Connector contract
```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface IFeedConnector {
    string SourceName { get; }
    Task FetchAsync(IServiceProvider sp, CancellationToken ct); // -> document collection
    Task ParseAsync(IServiceProvider sp, CancellationToken ct); // -> dto collection (validated)
    Task MapAsync(IServiceProvider sp, CancellationToken ct);   // -> advisory/alias/affected/reference
}
```
* **Fetch**: windowed (cursor), conditional GET (ETag/Last‑Modified), retry/backoff, rate limiting.
* **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing.
* **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors). KEV references use `reference` provenance anchored to the catalog search URL.
### 4.2 Version range normalization
* **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges (use `~`, `^`, `<`, `>=` canonicalized to intervals).
* **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query.
* **DEB**: dpkg version comparison semantics mirrored; store computed keys.
* **APK**: Alpine version semantics; compute order keys.
* **Generic**: if provider uses text, retain raw; do **not** invent ranges.
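
As an illustration of canonicalizing range operators to intervals, the sketch below expands `^` and `~` using npm's range semantics on plain `x.y.z` versions. Storing the exclusive upper bound under `fixed` is an assumption about field naming, and `to_interval` is a hypothetical helper; pre-release tags and other operators are out of scope.

```python
import re

_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def to_interval(spec: str) -> dict:
    """Expand '^x.y.z' or '~x.y.z' into a half-open [introduced, fixed) interval."""
    op, version = spec[:1], spec[1:]
    match = _VERSION.match(version)
    if op not in ("^", "~") or match is None:
        raise ValueError(f"unsupported range: {spec!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if op == "~":
        upper = (major, minor + 1, 0)      # ~1.2.3 allows patch-level changes
    elif major > 0:
        upper = (major + 1, 0, 0)          # ^1.2.3 allows minor and patch changes
    elif minor > 0:
        upper = (0, minor + 1, 0)          # ^0.2.3 is pinned to the 0.2 line
    else:
        upper = (0, 0, patch + 1)          # ^0.0.3 matches only 0.0.3
    return {"introduced": version, "fixed": ".".join(str(n) for n in upper)}
```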
### 4.3 Severity & CVSS
* Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity).
* If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable).
* **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date).
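
The default `max` policy can be sketched as a selection over all recorded CVSS claims. The claim shape (`source`, `baseScore`, `severity`) and the tie-break on source name are assumptions for the example; other policies are not modelled.

```python
from typing import Optional


def effective_severity(claims: list) -> Optional[dict]:
    """Pick the claim with the highest baseScore; ties break on source name."""
    if not claims:
        return None
    # Every claim stays stored; this only selects which one is reported as
    # effective, and the tuple key keeps the choice deterministic.
    return max(claims, key=lambda claim: (claim["baseScore"], claim["source"]))
```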
---
## 6) Observation & linkset pipeline
> **Goal:** deterministically ingest raw documents into immutable observations, correlate them into evidence-rich linksets, and broadcast changes without precedence or mutation.
### 5.1 Observation flow
1. **Connector fetch/parse/map** — connectors download upstream payloads, validate signatures, and map to DTOs (identifiers, references, raw payload, provenance).
2. **AOC guard** — `AOCWriteGuard` verifies forbidden keys, provenance completeness, tenant claims, timestamp normalization, and content hash idempotency. Violations raise `ERR_AOC_00x` mapped to structured logs and metrics.
3. **Append-only write** — observations insert into `advisory_observations`; duplicates by `(tenant, source.vendor, upstream.upstreamId, upstream.contentHash)` become no-ops; new content for same upstream id creates a supersedes chain.
4. **Replication + event** — PostgreSQL logical replication triggers `advisory.observation.updated@1` events with deterministic payloads (IDs, hash, supersedes pointer, linkset summary). Policy Engine, Offline Kit builder, and guard dashboards subscribe.
### 5.2 Linkset correlation
1. **Queue** — observation deltas enqueue correlation jobs keyed by `(tenant, vulnerabilityId, productKey)` candidates derived from identifiers + alias graph.
2. **Canonical grouping** — builder resolves aliases using Concelier's alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores.
3. **Linkset materialization** — `advisory_linksets` documents store sorted observation references, alias sets, product keys, range metadata, and conflict payloads. Writes are idempotent; unchanged hashes skip updates.
4. **Conflict detection** — builder emits structured conflicts with typed severities (Hard/Soft/Info). Conflicts carry per-observation values for explainability.
5. **Event emission** — `advisory.linkset.updated@1` summarizes deltas (`added`, `removed`, `changed` observation IDs, conflict updates, confidence changes) and includes a canonical hash for replay validation.
#### Correlation Algorithm (v2)
The v2 correlation algorithm (see `linkset-correlation-v2.md`) replaces intersection-based scoring with graph-based connectivity and adds new signals:
| Signal | Weight | Description |
|--------|--------|-------------|
| Alias connectivity | 0.30 | LCC ratio from bipartite graph (transitive bridging) |
| Alias authority | 0.10 | Scope hierarchy (CVE > GHSA > VND > DST) |
| Package coverage | 0.20 | Pairwise + IDF-weighted overlap |
| Version compatibility | 0.10 | Equivalent/Overlapping/Disjoint classification |
| CPE match | 0.10 | Exact or vendor/product overlap |
| Patch lineage | 0.10 | Shared commit SHA from fix references |
| Reference overlap | 0.05 | Positive-only URL matching |
| Freshness | 0.05 | Fetch timestamp spread |
Conflict penalties are typed:
- **Hard** (`distinct-cves`, `disjoint-version-ranges`): -0.30 to -0.40
- **Soft** (`affected-range-divergence`, `severity-mismatch`): -0.05 to -0.10
- **Info** (`reference-clash` on simple disjoint sets): no penalty
Configure via `concelier:correlation:version` (v1 or v2) and optional weight overrides.
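
The weights in the table sum to 1.0, so one plausible reading is a weighted sum of per-signal scores in `[0, 1]` with conflict penalties subtracted afterwards. The sketch below assumes exactly that combination rule, which the text does not spell out; the signal key names and `correlation_confidence` are hypothetical, and `linkset-correlation-v2.md` remains the authority.

```python
# Weights copied from the table above; key names are illustrative.
WEIGHTS = {
    "alias_connectivity": 0.30,
    "alias_authority": 0.10,
    "package_coverage": 0.20,
    "version_compatibility": 0.10,
    "cpe_match": 0.10,
    "patch_lineage": 0.10,
    "reference_overlap": 0.05,
    "freshness": 0.05,
}


def correlation_confidence(signals: dict, penalties=()) -> float:
    """Weighted sum of signals minus conflict penalties, clamped to [0, 1]."""
    unknown = set(signals) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    weighted = sum(
        weight * min(max(signals.get(name, 0.0), 0.0), 1.0)  # missing signal = 0
        for name, weight in WEIGHTS.items()
    )
    # Penalties are accepted as magnitudes or as negative numbers.
    score = weighted - sum(abs(p) for p in penalties)
    return round(min(max(score, 0.0), 1.0), 4)
```

Under this reading, a linkset with perfect signals but one hard conflict at -0.30 lands at 0.7, and how the numeric score maps onto `low|medium|high` is left open.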
### 5.3 Event contract
| Event | Schema | Notes |
|-------|--------|-------|
| `advisory.observation.updated@1` | `events/advisory.observation.updated@1.json` | Fired on new or superseded observations. Includes `observationId`, source metadata, `linksetSummary` (aliases/purls), supersedes pointer (if any), SHA-256 hash, and `traceId`. |
| `advisory.linkset.updated@1` | `events/advisory.linkset.updated@1.json` | Fired when correlation changes. Includes `linksetId`, `key{vulnerabilityId, productKey, confidence}`, observation deltas, conflicts, `updatedAt`, and canonical hash. |
Events are emitted via Valkey Streams. Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures event streams during bundle creation for air-gapped replay.
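
Hash-based idempotent consumption can be sketched as follows. The in-memory `set` stands in for whatever durable store a real consumer would use, and `IdempotentConsumer` is a hypothetical class; stream reads and acknowledgements are not shown.

```python
class IdempotentConsumer:
    """Invoke the handler at most once per (event type, canonical hash)."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # stand-in for a durable dedupe store

    def handle(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        key = (event["type"], event["hash"])
        if key in self._seen:
            return False  # redelivery: safe to acknowledge without reprocessing
        self._handler(event)
        self._seen.add(key)  # recorded only after the handler succeeds
        return True
```

Recording the hash after the handler returns means a crash mid-handler leads to a retry and never to a silently dropped event.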
---
## 7) Storage schema (PostgreSQL)
### Tables & indexes (LNM path)
* `concelier.sources` `{_id, type, baseUrl, enabled, notes}` — connector catalog.
* `concelier.source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}` — run-state (TTL indexes on `backoffUntil`).
* `concelier.documents` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}` — raw payload registry.
* Indexes: `{sourceName:1, uri:1}` unique; `{fetchedAt:-1}` for recent fetches.
* `concelier.dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}` — normalized connector DTOs used for replay.
* Index: `{sourceName:1, documentId:1}`.
* `concelier.advisory_observations`
{ _id: "tenant:vendor:upstreamId:revision", tenant, source: { vendor, stream, api, collectorVersion }, upstream: { upstreamId, documentVersion, fetchedAt, receivedAt, contentHash, signature }, content: { format, specVersion, raw, metadata? }, identifiers: { cve?, ghsa?, vendorIds[], aliases[] }, linkset: { purls[], cpes[], aliases[], references[], reconciledFrom[] }, rawLinkset: { aliases[], purls[], cpes[], references[], reconciledFrom[], notes? }, supersedes?: "prevObservationId", createdAt, attributes?: object }
* Indexes: `{tenant:1, upstream.upstreamId:1}`, `{tenant:1, source.vendor:1, linkset.purls:1}`, `{tenant:1, linkset.aliases:1}`, `{tenant:1, createdAt:-1}`.
* `concelier.advisory_linksets`
{ _id: "sha256:...", tenant, key: { vulnerabilityId, productKey, confidence }, observations: [ { observationId, sourceVendor, statement, collectedAt } ], aliases: { primary, others: [] }, purls: [], cpes: [], conflicts: [], createdAt, updatedAt }
* Indexes: `{tenant:1, key.vulnerabilityId:1, key.productKey:1}`, `{tenant:1, purls:1}`, `{tenant:1, aliases.primary:1}`, `{tenant:1, updatedAt:-1}`.
* `concelier.advisory_events`
{ _id: ObjectId, tenant, type: "advisory.observation.updated" | "advisory.linkset.updated", key, delta, hash, occurredAt }
* TTL index on `occurredAt` (configurable retention), `{type:1, occurredAt:-1}` for replay.
* `concelier.export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}`
* `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks)
* `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}`
**Legacy tables** (`advisory`, `alias`, `affected`, `reference`, `merge_event`) remain read-only during the migration window to support back-compat exports. New code must not write to them; scheduled cleanup removes them after Link-Not-Merge GA.
**Object storage**: `documents` for raw payloads (immutable); `exports` for historical JSON/Trivy archives.
---
## 8) Exporters
### 7.1 Deterministic JSON (vuln‑list style)
* Folder structure mirroring `/<scheme>/<first-two>/<rest>/…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace.
* `manifest.json` lists all files with SHA‑256 and a top‑level **export digest**.
### 7.2 Trivy DB exporter
* Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes.
* In delta, unchanged blobs are reused from the base; metadata captures:
```json
{
  "mode": "delta|full",
  "baseExportId": "...",
  "baseManifestDigest": "sha256:...",
  "changed": ["path1", "path2"],
  "removed": ["path3"]
}
```
* Optional ORAS push (OCI layout) for registries.
* Offline kit bundles include Trivy DB + JSON tree + export manifest.
* Mirror-ready bundles: when `concelier.trivy.mirror` defines domains, the exporter emits `mirror/index.json` plus per-domain `manifest.json`, `metadata.json`, and `db.tar.gz` files with SHA-256 digests so Concelier mirrors can expose domain-scoped download endpoints.
* Concelier.WebService serves `/concelier/exports/index.json` and `/concelier/exports/mirror/{domain}/…` directly from the export tree with cache budgets (index: 60 s, bundles: 300 s, immutable) and per-domain rate limiting; the endpoints honour Stella Ops Authority or CIDR bypass lists depending on mirror topology.
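
The `changed` and `removed` lists in the delta metadata can be derived by comparing two path-to-digest maps. This is a sketch under the assumption that both manifests are available as plain dictionaries; `delta_metadata` is a hypothetical helper and does not touch Bolt DB contents.

```python
def delta_metadata(base_export_id: str, base_manifest_digest: str,
                   base_files: dict, new_files: dict) -> dict:
    """Compare {path: sha256} maps and describe what a delta export must carry."""
    # A path is "changed" when it is new or its digest differs from the base.
    changed = sorted(path for path, digest in new_files.items()
                     if base_files.get(path) != digest)
    # A path is "removed" when the base had it and the new export does not.
    removed = sorted(path for path in base_files if path not in new_files)
    return {
        "mode": "delta",
        "baseExportId": base_export_id,
        "baseManifestDigest": base_manifest_digest,
        "changed": changed,
        "removed": removed,
    }
```

Paths that appear in neither list are the unchanged blobs that can be reused from the base export.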
7.3 Hand‑off to Signer/Attestor (optional)
- On export completion, if `attest: true` is set in job args, Concelier posts the artifact metadata to Signer/Attestor; Concelier itself does not hold signing keys.
- Export record stores returned `{ uuid, index, url }` from Rekor v2.

9) REST APIs
All under /api/v1/concelier.
Health & status
GET /healthz | /readyz
GET /status → sources, last runs, export cursors
Sources & jobs
GET /sources → list of configured sources
POST /sources/{name}/trigger → { jobId }
POST /sources/{name}/pause | /resume → toggle
GET /jobs/{id} → job status
Exports
POST /exports/json { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? }
POST /exports/trivy { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? }
GET /exports/{id} → export metadata (kind, digest, createdAt, rekor?)
GET /concelier/exports/index.json → mirror index describing available domains/bundles
GET /concelier/exports/mirror/{domain}/manifest.json
GET /concelier/exports/mirror/{domain}/bundle.json
GET /concelier/exports/mirror/{domain}/bundle.json.jws
Search (operator debugging)
GET /advisories/{key}
GET /advisories?scheme=CVE&value=CVE-2025-12345
GET /affected?productKey=pkg:rpm/openssl&limit=100
Mirror domain management (under /api/v1/mirror)
GET /config → current mirror config (mode, signing, refresh interval)
PUT /config → update mirror mode/signing/refresh settings
GET /domains → list all mirror domains with export counts
POST /domains → create a new mirror domain with exports/filters
GET /domains/{domainId} → domain detail (exports, status)
PUT /domains/{domainId} → update domain (name, auth, rate limits, exports)
DELETE /domains/{domainId} → remove a mirror domain
POST /domains/{domainId}/exports → add an export to a domain
DELETE /domains/{domainId}/exports/{exportKey} → remove an export from a domain
POST /domains/{domainId}/generate → trigger on-demand bundle generation
GET /domains/{domainId}/status → domain sync status (last generate, staleness)
POST /test → test mirror endpoint connectivity
Mirror consumer configuration (under /api/v1/mirror)
GET /consumer → current consumer connector configuration (base address, domain, signature, timeout, connection status, last sync)
PUT /consumer → update consumer connector config (base address, domain ID, index path, HTTP timeout, signature settings)
POST /consumer/discover → fetch mirror index from base address, return available domains with metadata (domain ID, display name, advisory count, bundle size, export formats, signed status, last generated)
POST /consumer/verify-signature → fetch JWS header from selected domain's bundle, return detected algorithm, key ID, and provider
The consumer endpoints configure the StellaOpsMirrorConnector at runtime without requiring service restarts. Configuration is persisted via IMirrorConsumerConfigStore (in-memory, with planned DB backend). The /consumer/discover endpoint enables the UI setup wizard to present operators with a list of available domains before committing to a configuration.
Air-gap bundle import (under /api/v1/mirror)
POST /import → import a mirror bundle from a local filesystem path { bundlePath, verifyChecksums, verifyDsse, trustRootsPath? }
GET /import/status → import progress and result (exports imported, total size, errors, warnings)
The import endpoint triggers an async import of a mirror bundle directory accessible to the Concelier container. It parses the bundle manifest, verifies SHA-256 checksums (when verifyChecksums is true), detects DSSE envelopes (when verifyDsse is true), and copies artifacts into the local data store. Import state is tracked by IMirrorBundleImportStore. This exposes the same functionality as the CLI MirrorBundleImportService via HTTP.
Mirror domains group export plans with shared rate limits and authentication rules. Exports support multi-value filter shorthands: sourceCategory (e.g., "Distribution" resolves to all distro sources), sourceTag (e.g., "linux"), and comma-separated sourceVendor values. Domain configuration is persisted in excititor.mirror_domains / excititor.mirror_exports tables, with env-var config as fallback. The MirrorExportScheduler background service periodically regenerates stale bundles (configurable via RefreshIntervalMinutes, default 60 minutes).
AuthN/Z: Authority tokens (OpTok) with roles: concelier.read, concelier.admin, concelier.export.
10) Configuration (YAML)
```yaml
concelier:
  postgres:
    connectionString: "Host=postgres;Port=5432;Database=concelier;Username=stellaops;Password=stellaops"
  s3:
    endpoint: "http://rustfs:8080"
    bucket: "stellaops-concelier"
  scheduler:
    windowSeconds: 30
    maxParallelSources: 4
  sources:
    - name: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signature: { type: pgp, keys: [ "…redhat PGP…" ] }
      enabled: true
      windowDays: 7
    - name: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signature: { type: pgp, keys: [ "…suse PGP…" ] }
    - name: ubuntu
      kind: usn-json
      baseUrl: https://ubuntu.com/security/notices.json
      signature: { type: none }
    - name: osv
      kind: osv
      baseUrl: https://api.osv.dev/v1/
      signature: { type: none }
    - name: ghsa
      kind: ghsa
      baseUrl: https://api.github.com/graphql
      auth: { tokenRef: "env:GITHUB_TOKEN" }
  exporters:
    json:
      enabled: true
      output: s3://stellaops-concelier/json/
    trivy:
      enabled: true
      mode: full
      output: s3://stellaops-concelier/trivy/
      oras:
        enabled: false
        repo: ghcr.io/org/concelier
  precedence:
    vendorWinsOverDistro: true
    distroWinsOverOsv: true
  severity:
    policy: max # or 'vendorPreferred' / 'distroPreferred'
```
11) Security & compliance
- Outbound allowlist per connector (domains, protocols); proxy support; TLS pinning where possible.
- Signature verification for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; Policy Engine or downstream policy can down-weight them.
- No secrets in logs; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers.
- Multi‑tenant: per‑tenant DBs or prefixes; per‑tenant S3 prefixes; tenant‑scoped API tokens.
- Determinism: canonical JSON writer; export digests stable across runs given same inputs.
12) Performance targets & scale
- Ingest: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
- Normalize/map: ≥ 50k observation statements/min on 4 cores.
- Observation write: ≤ 5 ms P95 per row (including guard + PostgreSQL write).
- Linkset build: ≤ 15 ms P95 per `(vulnerabilityId, productKey)` update, even with 20+ contributing observations.
- Export: 1M advisories JSON in ≤ 90 s (streamed, zstd), Trivy DB in ≤ 60 s on 8 cores.
- Memory: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.
Scale pattern: add Concelier replicas; PostgreSQL scaling via indices and read/write connection pooling; object storage for oversized docs.
13) Observability
- Metrics
  - `concelier.fetch.docs_total{source}`
  - `concelier.fetch.bytes_total{source}`
  - `concelier.parse.failures_total{source}`
  - `concelier.map.statements_total{source}`
  - `concelier.observations.write_total{result=ok|noop|error}`
  - `concelier.linksets.updated_total{result=ok|skip|error}`
  - `concelier.linksets.conflicts_total{type}`
  - `concelier.export.bytes{kind}`
  - `concelier.export.duration_seconds{kind}`
  - `advisory_ai_chunk_requests_total{tenant,result,cache}` and `advisory_ai_guardrail_blocks_total{tenant,reason,cache}` instrument the `/advisories/{key}/chunks` surfaces that Advisory AI consumes. Cache hits now emit the same guardrail counters so operators can see blocked segments even when responses are served from cache.
- Tracing around fetch/parse/map/observe/linkset/export.
- Logs: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`.

14) Testing matrix
- Connectors: fixture suites for each provider/format (happy path; malformed; signature fail).
- Version semantics: EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, pre‑releases).
- Linkset correlation: multi-source conflicts (severity, range, alias) produce deterministic conflict payloads; ensure confidence scoring stable.
- Export determinism: byte‑for‑byte stable outputs across runs; digest equality.
- Performance: soak tests with 1M advisories; cap memory; verify backpressure.
- API: pagination, filters, RBAC, error envelopes (RFC 7807).
- Offline kit: bundle build & import correctness.
15) Failure modes & recovery
- Source outages: the scheduler backs off with exponential delay, recorded in `source_state.backoffUntil`; alerts fire on staleness.
- Schema drifts: the parse stage marks the DTO invalid; the job fails with clear diagnostics; connector version flags track supported schema ranges.
- Partial exports: exporters write to a temp prefix; the manifest commit is atomic; only then are files moved to the final prefix and `export_state` updated.
- Resume: all stages are idempotent; `source_state.cursor` supports window resume.
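
The exponential backoff for source outages can be sketched as a pure function that yields the next `backoffUntil` value. The 60-second base, the one-hour cap, and the `backoff_until` helper are illustrative assumptions; the actual scheduler parameters are not stated here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backoff_until(now: datetime, consecutive_failures: int,
                  base_seconds: int = 60, cap_seconds: int = 3600) -> Optional[datetime]:
    """Return when the source may run again, or None if it is healthy."""
    if consecutive_failures <= 0:
        return None  # no outage: no backoff window
    # Double the delay per consecutive failure, but never exceed the cap.
    delay = min(cap_seconds, base_seconds * 2 ** (consecutive_failures - 1))
    return now + timedelta(seconds=delay)


# Example: the third consecutive failure waits 240 s from a UTC timestamp.
example = backoff_until(datetime(2025, 9, 1, 12, 0, tzinfo=timezone.utc), 3)
```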
16) Operator runbook (quick)
- Trigger all sources: `POST /api/v1/concelier/sources/*/trigger`
- Force full export JSON: `POST /api/v1/concelier/exports/json { "full": true, "force": true }`
- Force Trivy DB delta publish: `POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }`
- Inspect observation: `GET /api/v1/concelier/observations/{observationId}`
- Query linkset: `GET /api/v1/concelier/linksets?vulnerabilityId=CVE-2025-12345&productKey=pkg:rpm/redhat/openssl`
- Pause noisy source: `POST /api/v1/concelier/sources/osv/pause`
17) Rollout plan
- MVP: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export.
- Add: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export.
- Attestation hand‑off: integrate with Signer/Attestor (optional).
  - Advisory evidence attestation parameters and path rules are documented in `docs/modules/concelier/attestation.md`.
- Scale & diagnostics: provider dashboards, staleness alerts, export cache reuse.
- Offline kit: end‑to‑end verified bundles for air‑gap.
ADR: Advisory Domain Source Consolidation (Sprint 203, 2026-03-04)
Decision
Absorb src/Feedser/ (4 projects) and src/Excititor/ (38+ projects) into src/Concelier/ as a source-only consolidation. No namespace renames. No database schema merge. No service identity changes.
Context
The advisory domain spans three service-level source directories (Concelier, Feedser, Excititor) that all contribute to the same logical pipeline: raw advisory ingestion, proof evidence generation, and VEX observation correlation. Keeping them as separate top-level directories created confusion about domain ownership and complicated cross-module reference tracking for 17+ dependent projects.
Rationale for no DB merge
All three DbContexts (ConcelierDbContext, ExcititorDbContext, ProofServiceDbContext) connect to the same PostgreSQL database (stellaops_platform) but own distinct schemas (vuln/concelier, vex/excititor, vuln/feedser). The 49 entities across 5 schemas have distinct write lifecycles (raw ingestion vs. proof generation vs. VEX processing). Merging DbContexts would couple unrelated write patterns for zero operational benefit. Schema isolation is a feature.
Consequences
- `src/Concelier/` is now the single domain root for all advisory-related source code.
- Feedser projects live at `src/Concelier/StellaOps.Feedser.*` and `src/Concelier/__Tests/StellaOps.Feedser.*`.
- Excititor projects live at `src/Concelier/StellaOps.Excititor.*`, `src/Concelier/__Libraries/StellaOps.Excititor.*`, and `src/Concelier/__Tests/StellaOps.Excititor.*`.
- Runtime service identities are unchanged: Excititor WebService and Worker deploy as separate containers with the same Docker image names and HTTP paths.
- Deployment boundary is frozen: Concelier and Excititor remain independently deployable services.
- CI path-filters updated: `excititor` section replaced with comment pointing to `concelier` paths.
- `src/Feedser/` and `src/Excititor/` top-level directories have been deleted.