Files
git.stella-ops.org/docs/ARCHITECTURE_CONCELIER.md

467 lines
19 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# component_architecture_concelier.md — **StellaOps Concelier** (2025Q4)
> **Scope.** Implementationready architecture for **Concelier**: the vulnerability ingest/normalize/merge/export subsystem that produces deterministic advisory data for the Scanner + Policy + Excititor pipeline. Covers domain model, connectors, merge rules, storage schema, exports, APIs, performance, security, and test matrices.
---
## 0) Mission & boundaries
**Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), normalize them into a **canonical model**, reconcile aliases and version ranges, and export **deterministic artifacts** (JSON, Trivy DB) for fast backend joins.
**Boundaries.**
* Concelier **does not** sign with private keys. When attestation is required, the export artifact is handed to the **Signer**/**Attestor** pipeline (outofprocess).
* Concelier **does not** decide PASS/FAIL; it provides data to the **Policy** engine.
* Online operation is **allowlistonly**; airgapped deployments use the **Offline Kit**.
---
## 1) Topology & processes
**Process shape:** single ASP.NET Core service `StellaOps.Concelier.WebService` hosting:
* **Scheduler** with distributed locks (Mongo backed).
* **Connectors** (fetch/parse/map).
* **Merger** (canonical record assembly + precedence).
* **Exporters** (JSON, Trivy DB).
* **Minimal REST** for health/status/trigger/export.
**Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter.
---
## 2) Canonical domain model
> Stored in MongoDB (database `concelier`), serialized with a **canonical JSON** writer (stable order, camelCase, normalized timestamps).
### 2.1 Core entities
**Advisory**
```
advisoryId // internal GUID
advisoryKey // stable string key (e.g., CVE-2025-12345 or vendor ID)
title // short title (best-of from sources)
summary // normalized summary (English; i18n optional)
published // earliest source timestamp
modified // latest source timestamp
severity // normalized {none, low, medium, high, critical}
cvss // {v2?, v3?, v4?} objects (vector, baseScore, severity, source)
exploitKnown // bool (e.g., KEV/active exploitation flags)
references[] // typed links (advisory, kb, patch, vendor, exploit, blog)
sources[] // provenance for traceability (doc digests, URIs)
```
**Alias**
```
advisoryId
scheme // CVE, GHSA, RHSA, DSA, USN, MSRC, etc.
value // e.g., "CVE-2025-12345"
```
**Affected**
```
advisoryId
productKey // canonical product identity (see 2.2)
rangeKind // semver | evr | nvra | apk | rpm | deb | generic | exact
introduced? // string (format depends on rangeKind)
fixed? // string (format depends on rangeKind)
lastKnownSafe? // optional explicit safe floor
arch? // arch or platform qualifier if source declares (x86_64, aarch64)
distro? // distro qualifier when applicable (rhel:9, debian:12, alpine:3.19)
ecosystem? // npm|pypi|maven|nuget|golang|…
notes? // normalized notes per source
```
**Reference**
```
advisoryId
url
kind // advisory | patch | kb | exploit | mitigation | blog | cvrf | csaf
sourceTag // e.g., vendor/redhat, distro/debian, oss/ghsa
```
**MergeEvent**
```
advisoryKey
beforeHash // canonical JSON hash before merge
afterHash // canonical JSON hash after merge
mergedAt
inputs[] // source doc digests that contributed
```
**AdvisoryStatement (event log)**
```
statementId // GUID (immutable)
vulnerabilityKey // canonical advisory key (e.g., CVE-2025-12345)
advisoryKey // merge snapshot advisory key (may reference variant)
statementHash // canonical hash of advisory payload
asOf // timestamp of snapshot (UTC)
recordedAt // persistence timestamp (UTC)
inputDocuments[] // document IDs contributing to the snapshot
payload // canonical advisory document (BSON / canonical JSON)
```
**AdvisoryConflict**
```
conflictId // GUID
vulnerabilityKey // canonical advisory key
conflictHash // deterministic hash of conflict payload
asOf // timestamp aligned with originating statement set
recordedAt // persistence timestamp
statementIds[] // related advisoryStatement identifiers
details // structured conflict explanation / merge reasoning
```
- `AdvisoryEventLog` (Concelier.Core) provides the public API for appending immutable statements/conflicts and querying replay history. Inputs are normalized by trimming and lower-casing `vulnerabilityKey`, serializing advisories with `CanonicalJsonSerializer`, and computing SHA-256 hashes (`statementHash`, `conflictHash`) over the canonical JSON payloads. Consumers can replay by key with an optional `asOf` filter to obtain deterministic snapshots ordered by `asOf` then `recordedAt`.
- Concelier.WebService exposes the immutable log via `GET /concelier/advisories/{vulnerabilityKey}/replay[?asOf=UTC_ISO8601]`, returning the latest statements (with hex-encoded hashes) and any conflict explanations for downstream exporters and APIs.
**ExportState**
```
exportKind // json | trivydb
baseExportId? // last full baseline
baseDigest? // digest of last full baseline
lastFullDigest? // digest of last full export
lastDeltaDigest? // digest of last delta export
cursor // per-kind incremental cursor
files[] // last manifest snapshot (path → sha256)
```
### 2.2 Product identity (`productKey`)
* **Primary:** `purl` (Package URL).
* **OS packages:** RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with **EVR/NVRA** preserved.
* **Secondary:** `cpe` retained for compatibility; advisory records may carry both.
* **Image/platform:** `oci:<registry>/<repo>@<digest>` for imagelevel advisories (rare).
* **Unmappable:** if a source is nondeterministic, keep native string under `productKey="native:<provider>:<id>"` and mark **nonjoinable**.
---
## 3) Source families & precedence
### 3.1 Families
* **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium…
* **Linux distros**: Red Hat, SUSE, Ubuntu, Debian, Alpine…
* **OSS ecosystems**: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go.
* **CERTs / national CSIRTs**: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERTFR/BUND, etc.
### 3.2 Precedence (when claims conflict)
1. **Vendor PSIRT** (authoritative for their product).
2. **Distro** (authoritative for packages they ship, including backports).
3. **Ecosystem** (OSV/GHSA) for library semantics.
4. **CERTs/aggregators** for enrichment (KEV/known exploited).
> Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**.
---
## 4) Connectors & normalization
### 4.1 Connector contract
```csharp
public interface IFeedConnector {
string SourceName { get; }
Task FetchAsync(IServiceProvider sp, CancellationToken ct); // -> document collection
Task ParseAsync(IServiceProvider sp, CancellationToken ct); // -> dto collection (validated)
Task MapAsync(IServiceProvider sp, CancellationToken ct); // -> advisory/alias/affected/reference
}
```
* **Fetch**: windowed (cursor), conditional GET (ETag/LastModified), retry/backoff, rate limiting.
* **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing.
* **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors).
### 4.2 Version range normalization
* **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges (use `~`, `^`, `<`, `>=` canonicalized to intervals).
* **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query.
* **DEB**: dpkg version comparison semantics mirrored; store computed keys.
* **APK**: Alpine version semantics; compute order keys.
* **Generic**: if provider uses text, retain raw; do **not** invent ranges.
### 4.3 Severity & CVSS
* Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity).
* If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable).
* **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date).
---
## 5) Merge engine
### 5.1 Keying & identity
* Identity graph: **CVE** is primary node; vendor/distro IDs resolved via **Alias** edges (from connectors and Conceliers alias tables).
* `advisoryKey` is the canonical primary key (CVE if present, else vendor/distro key).
### 5.2 Merge algorithm (deterministic)
1. **Gather** all rows for `advisoryKey` (across sources).
2. **Select title/summary** by precedence source (vendor>distro>ecosystem>cert).
3. **Union aliases** (dedupe by scheme+value).
4. **Merge `Affected`** with rules:
* Prefer **vendor** ranges for vendor products; prefer **distro** for **distroshipped** packages.
* If both exist for same `productKey`, keep **both**; mark `sourceTag` and `precedence` so **Policy** can decide.
* Never collapse range semantics across different families (e.g., rpm EVR vs semver).
5. **CVSS/severity**: record all CVSS sets; compute **effectiveSeverity** = max (unless policy override).
6. **References**: union with type precedence (advisory > patch > kb > exploit > blog); dedupe by URL; preserve `sourceTag`.
7. Produce **canonical JSON**; compute **afterHash**; store **MergeEvent** with inputs and hashes.
> The merge is **pure** given inputs. Any change in inputs or precedence matrices changes the **hash** predictably.
---
## 6) Storage schema (MongoDB)
**Collections & indexes**
* `source` `{_id, type, baseUrl, enabled, notes}`
* `source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}`
* `document` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}`
* Index: `{sourceName:1, uri:1}` unique, `{fetchedAt:-1}`
* `dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}`
* Index: `{sourceName:1, documentId:1}`
* `advisory` `{_id, advisoryKey, title, summary, published, modified, severity, cvss, exploitKnown, sources[]}`
* Index: `{advisoryKey:1}` unique, `{modified:-1}`, `{severity:1}`, text index (title, summary)
* `alias` `{advisoryId, scheme, value}`
* Index: `{scheme:1,value:1}`, `{advisoryId:1}`
* `affected` `{advisoryId, productKey, rangeKind, introduced?, fixed?, arch?, distro?, ecosystem?}`
* Index: `{productKey:1}`, `{advisoryId:1}`, `{productKey:1, rangeKind:1}`
* `reference` `{advisoryId, url, kind, sourceTag}`
* Index: `{advisoryId:1}`, `{kind:1}`
* `merge_event` `{advisoryKey, beforeHash, afterHash, mergedAt, inputs[]}`
* Index: `{advisoryKey:1, mergedAt:-1}`
* `export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}`
* `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks)
* `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}`
**GridFS buckets**: `fs.documents` for raw payloads.
---
## 7) Exporters
### 7.1 Deterministic JSON (vulnlist style)
* Folder structure mirroring `/<scheme>/<first-two>/<rest>/…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace.
* `manifest.json` lists all files with SHA256 and a toplevel **export digest**.
### 7.2 Trivy DB exporter
* Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes.
* In delta, unchanged blobs are reused from the base; metadata captures:
```
{
"mode": "delta|full",
"baseExportId": "...",
"baseManifestDigest": "sha256:...",
"changed": ["path1", "path2"],
"removed": ["path3"]
}
```
* Optional ORAS push (OCI layout) for registries.
* Offline kit bundles include Trivy DB + JSON tree + export manifest.
* Mirror-ready bundles: when `concelier.trivy.mirror` defines domains, the exporter emits `mirror/index.json` plus per-domain `manifest.json`, `metadata.json`, and `db.tar.gz` files with SHA-256 digests so Concelier mirrors can expose domain-scoped download endpoints.
### 7.3 Handoff to Signer/Attestor (optional)
* On export completion, if `attest: true` is set in job args, Concelier **posts** the artifact metadata to **Signer**/**Attestor**; Concelier itself **does not** hold signing keys.
* Export record stores returned `{ uuid, index, url }` from **Rekor v2**.
---
## 8) REST APIs
All under `/api/v1/concelier`.
**Health & status**
```
GET /healthz | /readyz
GET /status → sources, last runs, export cursors
```
**Sources & jobs**
```
GET /sources → list of configured sources
POST /sources/{name}/trigger → { jobId }
POST /sources/{name}/pause | /resume → toggle
GET /jobs/{id} → job status
```
**Exports**
```
POST /exports/json { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? }
POST /exports/trivy { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? }
GET /exports/{id} → export metadata (kind, digest, createdAt, rekor?)
GET /concelier/exports/index.json → mirror index describing available domains/bundles
GET /concelier/exports/mirror/{domain}/manifest.json
GET /concelier/exports/mirror/{domain}/bundle.json
GET /concelier/exports/mirror/{domain}/bundle.json.jws
```
**Search (operator debugging)**
```
GET /advisories/{key}
GET /advisories?scheme=CVE&value=CVE-2025-12345
GET /affected?productKey=pkg:rpm/openssl&limit=100
```
**AuthN/Z:** Authority tokens (OpTok) with roles: `concelier.read`, `concelier.admin`, `concelier.export`.
---
## 9) Configuration (YAML)
```yaml
concelier:
mongo: { uri: "mongodb://mongo/concelier" }
s3:
endpoint: "http://minio:9000"
bucket: "stellaops-concelier"
scheduler:
windowSeconds: 30
maxParallelSources: 4
sources:
- name: redhat
kind: csaf
baseUrl: https://access.redhat.com/security/data/csaf/v2/
signature: { type: pgp, keys: [ "…redhat PGP…" ] }
enabled: true
windowDays: 7
- name: suse
kind: csaf
baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
signature: { type: pgp, keys: [ "…suse PGP…" ] }
- name: ubuntu
kind: usn-json
baseUrl: https://ubuntu.com/security/notices.json
signature: { type: none }
- name: osv
kind: osv
baseUrl: https://api.osv.dev/v1/
signature: { type: none }
- name: ghsa
kind: ghsa
baseUrl: https://api.github.com/graphql
auth: { tokenRef: "env:GITHUB_TOKEN" }
exporters:
json:
enabled: true
output: s3://stellaops-concelier/json/
trivy:
enabled: true
mode: full
output: s3://stellaops-concelier/trivy/
oras:
enabled: false
repo: ghcr.io/org/concelier
precedence:
vendorWinsOverDistro: true
distroWinsOverOsv: true
severity:
policy: max # or 'vendorPreferred' / 'distroPreferred'
```
---
## 10) Security & compliance
* **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible.
* **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; **merge** can downweight or ignore them by config.
* **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers.
* **Multitenant**: pertenant DBs or prefixes; pertenant S3 prefixes; tenantscoped API tokens.
* **Determinism**: canonical JSON writer; export digests stable across runs given same inputs.
---
## 11) Performance targets & scale
* **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
* **Normalize/map**: ≥ 50k `Affected` rows/min on 4 cores.
* **Merge**: ≤ 10ms P95 per advisory at steadystate updates.
* **Export**: 1M advisories JSON in ≤ 90s (streamed, zstd), Trivy DB in ≤ 60s on 8 cores.
* **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.
**Scale pattern**: add Concelier replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs.
---
## 12) Observability
* **Metrics**
* `concelier.fetch.docs_total{source}`
* `concelier.fetch.bytes_total{source}`
* `concelier.parse.failures_total{source}`
* `concelier.map.affected_total{source}`
* `concelier.merge.changed_total`
* `concelier.export.bytes{kind}`
* `concelier.export.duration_seconds{kind}`
* **Tracing** around fetch/parse/map/merge/export.
* **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`.
---
## 13) Testing matrix
* **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail).
* **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, prereleases).
* **Merge:** conflicting sources (vendor vs distro vs OSV); verify precedence & dual retention.
* **Export determinism:** byteforbyte stable outputs across runs; digest equality.
* **Performance:** soak tests with 1M advisories; cap memory; verify backpressure.
* **API:** pagination, filters, RBAC, error envelopes (RFC 7807).
* **Offline kit:** bundle build & import correctness.
---
## 14) Failure modes & recovery
* **Source outages:** scheduler backs off with exponential delay; `source_state.backoffUntil`; alerts on staleness.
* **Schema drifts:** parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges.
* **Partial exports:** exporters write to temp prefix; **manifest commit** is atomic; only then move to final prefix and update `export_state`.
* **Resume:** all stages idempotent; `source_state.cursor` supports window resume.
---
## 15) Operator runbook (quick)
* **Trigger all sources:** `POST /api/v1/concelier/sources/*/trigger`
* **Force full export JSON:** `POST /api/v1/concelier/exports/json { "full": true, "force": true }`
* **Force Trivy DB delta publish:** `POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }`
* **Inspect advisory:** `GET /api/v1/concelier/advisories?scheme=CVE&value=CVE-2025-12345`
* **Pause noisy source:** `POST /api/v1/concelier/sources/osv/pause`
---
## 16) Rollout plan
1. **MVP**: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export.
2. **Add**: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export.
3. **Attestation handoff**: integrate with **Signer/Attestor** (optional).
4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse.
5. **Offline kit**: endtoend verified bundles for airgap.