feat: Implement advisory event replay API with conflict explainers
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled

- Added `/concelier/advisories/{vulnerabilityKey}/replay` endpoint to return conflict summaries and explainers.
- Introduced `MergeConflictExplainerPayload` to structure conflict details including type, reason, and source rankings.
- Enhanced `MergeConflictSummary` to include structured explainer payloads and hashes for persisted conflicts.
- Updated `MirrorEndpointExtensions` to enforce rate limits and cache headers for mirror distribution endpoints.
- Refactored tests to cover new replay endpoint functionality and validate conflict explainers.
- Documented changes in TASKS.md, noting completion of mirror distribution endpoints and updated operational runbook.
This commit is contained in:
Vladimir Moushkov
2025-10-20 18:59:26 +03:00
parent 44ad31591c
commit 2b6304c9c3
20 changed files with 3966 additions and 3493 deletions

View File

@@ -1,466 +1,468 @@
# component_architecture_concelier.md — **StellaOps Concelier** (2025Q4)
> **Scope.** Implementationready architecture for **Concelier**: the vulnerability ingest/normalize/merge/export subsystem that produces deterministic advisory data for the Scanner + Policy + Excititor pipeline. Covers domain model, connectors, merge rules, storage schema, exports, APIs, performance, security, and test matrices.
---
## 0) Mission & boundaries
**Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), normalize them into a **canonical model**, reconcile aliases and version ranges, and export **deterministic artifacts** (JSON, Trivy DB) for fast backend joins.
**Boundaries.**
* Concelier **does not** sign with private keys. When attestation is required, the export artifact is handed to the **Signer**/**Attestor** pipeline (outofprocess).
* Concelier **does not** decide PASS/FAIL; it provides data to the **Policy** engine.
* Online operation is **allowlistonly**; airgapped deployments use the **Offline Kit**.
---
## 1) Topology & processes
**Process shape:** single ASP.NET Core service `StellaOps.Concelier.WebService` hosting:
* **Scheduler** with distributed locks (Mongo backed).
* **Connectors** (fetch/parse/map).
* **Merger** (canonical record assembly + precedence).
* **Exporters** (JSON, Trivy DB).
* **Minimal REST** for health/status/trigger/export.
**Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter.
---
## 2) Canonical domain model
> Stored in MongoDB (database `concelier`), serialized with a **canonical JSON** writer (stable order, camelCase, normalized timestamps).
### 2.1 Core entities
**Advisory**
```
advisoryId // internal GUID
advisoryKey // stable string key (e.g., CVE-2025-12345 or vendor ID)
title // short title (best-of from sources)
summary // normalized summary (English; i18n optional)
published // earliest source timestamp
modified // latest source timestamp
severity // normalized {none, low, medium, high, critical}
cvss // {v2?, v3?, v4?} objects (vector, baseScore, severity, source)
exploitKnown // bool (e.g., KEV/active exploitation flags)
references[] // typed links (advisory, kb, patch, vendor, exploit, blog)
sources[] // provenance for traceability (doc digests, URIs)
```
**Alias**
```
advisoryId
scheme // CVE, GHSA, RHSA, DSA, USN, MSRC, etc.
value // e.g., "CVE-2025-12345"
```
**Affected**
```
advisoryId
productKey // canonical product identity (see 2.2)
rangeKind // semver | evr | nvra | apk | rpm | deb | generic | exact
introduced? // string (format depends on rangeKind)
fixed? // string (format depends on rangeKind)
lastKnownSafe? // optional explicit safe floor
arch? // arch or platform qualifier if source declares (x86_64, aarch64)
distro? // distro qualifier when applicable (rhel:9, debian:12, alpine:3.19)
ecosystem? // npm|pypi|maven|nuget|golang|…
notes? // normalized notes per source
```
**Reference**
```
advisoryId
url
kind // advisory | patch | kb | exploit | mitigation | blog | cvrf | csaf
sourceTag // e.g., vendor/redhat, distro/debian, oss/ghsa
```
**MergeEvent**
```
advisoryKey
beforeHash // canonical JSON hash before merge
afterHash // canonical JSON hash after merge
mergedAt
inputs[] // source doc digests that contributed
```
**AdvisoryStatement (event log)**
```
statementId // GUID (immutable)
vulnerabilityKey // canonical advisory key (e.g., CVE-2025-12345)
advisoryKey // merge snapshot advisory key (may reference variant)
statementHash // canonical hash of advisory payload
asOf // timestamp of snapshot (UTC)
recordedAt // persistence timestamp (UTC)
inputDocuments[] // document IDs contributing to the snapshot
payload // canonical advisory document (BSON / canonical JSON)
```
**AdvisoryConflict**
```
conflictId // GUID
vulnerabilityKey // canonical advisory key
conflictHash // deterministic hash of conflict payload
asOf // timestamp aligned with originating statement set
recordedAt // persistence timestamp
statementIds[] // related advisoryStatement identifiers
details // structured conflict explanation / merge reasoning
```
- `AdvisoryEventLog` (Concelier.Core) provides the public API for appending immutable statements/conflicts and querying replay history. Inputs are normalized by trimming and lower-casing `vulnerabilityKey`, serializing advisories with `CanonicalJsonSerializer`, and computing SHA-256 hashes (`statementHash`, `conflictHash`) over the canonical JSON payloads. Consumers can replay by key with an optional `asOf` filter to obtain deterministic snapshots ordered by `asOf` then `recordedAt`.
- Concelier.WebService exposes the immutable log via `GET /concelier/advisories/{vulnerabilityKey}/replay[?asOf=UTC_ISO8601]`, returning the latest statements (with hex-encoded hashes) and any conflict explanations for downstream exporters and APIs.
**ExportState**
```
exportKind // json | trivydb
baseExportId? // last full baseline
baseDigest? // digest of last full baseline
lastFullDigest? // digest of last full export
lastDeltaDigest? // digest of last delta export
cursor // per-kind incremental cursor
files[] // last manifest snapshot (path → sha256)
```
### 2.2 Product identity (`productKey`)
* **Primary:** `purl` (Package URL).
* **OS packages:** RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with **EVR/NVRA** preserved.
* **Secondary:** `cpe` retained for compatibility; advisory records may carry both.
* **Image/platform:** `oci:<registry>/<repo>@<digest>` for imagelevel advisories (rare).
* **Unmappable:** if a source is nondeterministic, keep native string under `productKey="native:<provider>:<id>"` and mark **nonjoinable**.
---
## 3) Source families & precedence
### 3.1 Families
* **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium…
* **Linux distros**: Red Hat, SUSE, Ubuntu, Debian, Alpine
* **OSS ecosystems**: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go.
* **CERTs / national CSIRTs**: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERTFR/BUND, etc.
### 3.2 Precedence (when claims conflict)
1. **Vendor PSIRT** (authoritative for their product).
2. **Distro** (authoritative for packages they ship, including backports).
3. **Ecosystem** (OSV/GHSA) for library semantics.
4. **CERTs/aggregators** for enrichment (KEV/known exploited).
> Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**.
---
## 4) Connectors & normalization
### 4.1 Connector contract
```csharp
public interface IFeedConnector {
string SourceName { get; }
Task FetchAsync(IServiceProvider sp, CancellationToken ct); // -> document collection
Task ParseAsync(IServiceProvider sp, CancellationToken ct); // -> dto collection (validated)
Task MapAsync(IServiceProvider sp, CancellationToken ct); // -> advisory/alias/affected/reference
}
```
* **Fetch**: windowed (cursor), conditional GET (ETag/LastModified), retry/backoff, rate limiting.
* **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing.
* **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors).
### 4.2 Version range normalization
* **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges (use `~`, `^`, `<`, `>=` canonicalized to intervals).
* **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query.
* **DEB**: dpkg version comparison semantics mirrored; store computed keys.
* **APK**: Alpine version semantics; compute order keys.
* **Generic**: if provider uses text, retain raw; do **not** invent ranges.
### 4.3 Severity & CVSS
* Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity).
* If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable).
* **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date).
---
## 5) Merge engine
### 5.1 Keying & identity
* Identity graph: **CVE** is primary node; vendor/distro IDs resolved via **Alias** edges (from connectors and Conceliers alias tables).
* `advisoryKey` is the canonical primary key (CVE if present, else vendor/distro key).
### 5.2 Merge algorithm (deterministic)
1. **Gather** all rows for `advisoryKey` (across sources).
2. **Select title/summary** by precedence source (vendor>distro>ecosystem>cert).
3. **Union aliases** (dedupe by scheme+value).
4. **Merge `Affected`** with rules:
* Prefer **vendor** ranges for vendor products; prefer **distro** for **distroshipped** packages.
* If both exist for same `productKey`, keep **both**; mark `sourceTag` and `precedence` so **Policy** can decide.
* Never collapse range semantics across different families (e.g., rpm EVR vs semver).
5. **CVSS/severity**: record all CVSS sets; compute **effectiveSeverity** = max (unless policy override).
6. **References**: union with type precedence (advisory > patch > kb > exploit > blog); dedupe by URL; preserve `sourceTag`.
7. Produce **canonical JSON**; compute **afterHash**; store **MergeEvent** with inputs and hashes.
> The merge is **pure** given inputs. Any change in inputs or precedence matrices changes the **hash** predictably.
---
## 6) Storage schema (MongoDB)
**Collections & indexes**
* `source` `{_id, type, baseUrl, enabled, notes}`
* `source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}`
* `document` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}`
* Index: `{sourceName:1, uri:1}` unique, `{fetchedAt:-1}`
* `dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}`
* Index: `{sourceName:1, documentId:1}`
* `advisory` `{_id, advisoryKey, title, summary, published, modified, severity, cvss, exploitKnown, sources[]}`
* Index: `{advisoryKey:1}` unique, `{modified:-1}`, `{severity:1}`, text index (title, summary)
* `alias` `{advisoryId, scheme, value}`
* Index: `{scheme:1,value:1}`, `{advisoryId:1}`
* `affected` `{advisoryId, productKey, rangeKind, introduced?, fixed?, arch?, distro?, ecosystem?}`
* Index: `{productKey:1}`, `{advisoryId:1}`, `{productKey:1, rangeKind:1}`
* `reference` `{advisoryId, url, kind, sourceTag}`
* Index: `{advisoryId:1}`, `{kind:1}`
* `merge_event` `{advisoryKey, beforeHash, afterHash, mergedAt, inputs[]}`
* Index: `{advisoryKey:1, mergedAt:-1}`
* `export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}`
* `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks)
* `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}`
**GridFS buckets**: `fs.documents` for raw payloads.
---
## 7) Exporters
### 7.1 Deterministic JSON (vulnlist style)
* Folder structure mirroring `/<scheme>/<first-two>/<rest>/…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace.
* `manifest.json` lists all files with SHA256 and a toplevel **export digest**.
### 7.2 Trivy DB exporter
* Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes.
* In delta, unchanged blobs are reused from the base; metadata captures:
```
{
"mode": "delta|full",
"baseExportId": "...",
"baseManifestDigest": "sha256:...",
"changed": ["path1", "path2"],
"removed": ["path3"]
}
```
* Optional ORAS push (OCI layout) for registries.
* Offline kit bundles include Trivy DB + JSON tree + export manifest.
* Mirror-ready bundles: when `concelier.trivy.mirror` defines domains, the exporter emits `mirror/index.json` plus per-domain `manifest.json`, `metadata.json`, and `db.tar.gz` files with SHA-256 digests so Concelier mirrors can expose domain-scoped download endpoints.
### 7.3 Handoff to Signer/Attestor (optional)
* On export completion, if `attest: true` is set in job args, Concelier **posts** the artifact metadata to **Signer**/**Attestor**; Concelier itself **does not** hold signing keys.
* Export record stores returned `{ uuid, index, url }` from **Rekor v2**.
---
## 8) REST APIs
All under `/api/v1/concelier`.
**Health & status**
```
GET /healthz | /readyz
GET /status → sources, last runs, export cursors
```
**Sources & jobs**
```
GET /sources → list of configured sources
POST /sources/{name}/trigger → { jobId }
POST /sources/{name}/pause | /resume → toggle
GET /jobs/{id} → job status
```
**Exports**
```
POST /exports/json { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? }
POST /exports/trivy { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? }
GET /exports/{id} → export metadata (kind, digest, createdAt, rekor?)
GET /concelier/exports/index.json → mirror index describing available domains/bundles
GET /concelier/exports/mirror/{domain}/manifest.json
GET /concelier/exports/mirror/{domain}/bundle.json
GET /concelier/exports/mirror/{domain}/bundle.json.jws
```
**Search (operator debugging)**
```
GET /advisories/{key}
GET /advisories?scheme=CVE&value=CVE-2025-12345
GET /affected?productKey=pkg:rpm/openssl&limit=100
```
**AuthN/Z:** Authority tokens (OpTok) with roles: `concelier.read`, `concelier.admin`, `concelier.export`.
---
## 9) Configuration (YAML)
```yaml
concelier:
mongo: { uri: "mongodb://mongo/concelier" }
s3:
endpoint: "http://minio:9000"
bucket: "stellaops-concelier"
scheduler:
windowSeconds: 30
maxParallelSources: 4
sources:
- name: redhat
kind: csaf
baseUrl: https://access.redhat.com/security/data/csaf/v2/
signature: { type: pgp, keys: [ "…redhat PGP…" ] }
enabled: true
windowDays: 7
- name: suse
kind: csaf
baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
signature: { type: pgp, keys: [ "…suse PGP…" ] }
- name: ubuntu
kind: usn-json
baseUrl: https://ubuntu.com/security/notices.json
signature: { type: none }
- name: osv
kind: osv
baseUrl: https://api.osv.dev/v1/
signature: { type: none }
- name: ghsa
kind: ghsa
baseUrl: https://api.github.com/graphql
auth: { tokenRef: "env:GITHUB_TOKEN" }
exporters:
json:
enabled: true
output: s3://stellaops-concelier/json/
trivy:
enabled: true
mode: full
output: s3://stellaops-concelier/trivy/
oras:
enabled: false
repo: ghcr.io/org/concelier
precedence:
vendorWinsOverDistro: true
distroWinsOverOsv: true
severity:
policy: max # or 'vendorPreferred' / 'distroPreferred'
```
---
## 10) Security & compliance
* **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible.
* **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; **merge** can downweight or ignore them by config.
* **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers.
* **Multitenant**: pertenant DBs or prefixes; pertenant S3 prefixes; tenantscoped API tokens.
* **Determinism**: canonical JSON writer; export digests stable across runs given same inputs.
---
## 11) Performance targets & scale
* **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
* **Normalize/map**: ≥ 50k `Affected` rows/min on 4 cores.
* **Merge**: ≤ 10ms P95 per advisory at steadystate updates.
* **Export**: 1M advisories JSON in ≤ 90s (streamed, zstd), Trivy DB in ≤ 60s on 8 cores.
* **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.
**Scale pattern**: add Concelier replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs.
---
## 12) Observability
* **Metrics**
* `concelier.fetch.docs_total{source}`
* `concelier.fetch.bytes_total{source}`
* `concelier.parse.failures_total{source}`
* `concelier.map.affected_total{source}`
* `concelier.merge.changed_total`
* `concelier.export.bytes{kind}`
* `concelier.export.duration_seconds{kind}`
* **Tracing** around fetch/parse/map/merge/export.
* **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`.
---
## 13) Testing matrix
* **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail).
* **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, prereleases).
* **Merge:** conflicting sources (vendor vs distro vs OSV); verify precedence & dual retention.
* **Export determinism:** byteforbyte stable outputs across runs; digest equality.
* **Performance:** soak tests with 1M advisories; cap memory; verify backpressure.
* **API:** pagination, filters, RBAC, error envelopes (RFC 7807).
* **Offline kit:** bundle build & import correctness.
---
## 14) Failure modes & recovery
* **Source outages:** scheduler backs off with exponential delay; `source_state.backoffUntil`; alerts on staleness.
* **Schema drifts:** parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges.
* **Partial exports:** exporters write to temp prefix; **manifest commit** is atomic; only then move to final prefix and update `export_state`.
* **Resume:** all stages idempotent; `source_state.cursor` supports window resume.
---
## 15) Operator runbook (quick)
* **Trigger all sources:** `POST /api/v1/concelier/sources/*/trigger`
* **Force full export JSON:** `POST /api/v1/concelier/exports/json { "full": true, "force": true }`
* **Force Trivy DB delta publish:** `POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }`
* **Inspect advisory:** `GET /api/v1/concelier/advisories?scheme=CVE&value=CVE-2025-12345`
* **Pause noisy source:** `POST /api/v1/concelier/sources/osv/pause`
---
## 16) Rollout plan
1. **MVP**: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export.
2. **Add**: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export.
3. **Attestation handoff**: integrate with **Signer/Attestor** (optional).
4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse.
5. **Offline kit**: endtoend verified bundles for airgap.
# component_architecture_concelier.md — **StellaOps Concelier** (2025Q4)
> **Scope.** Implementationready architecture for **Concelier**: the vulnerability ingest/normalize/merge/export subsystem that produces deterministic advisory data for the Scanner + Policy + Excititor pipeline. Covers domain model, connectors, merge rules, storage schema, exports, APIs, performance, security, and test matrices.
---
## 0) Mission & boundaries
**Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), normalize them into a **canonical model**, reconcile aliases and version ranges, and export **deterministic artifacts** (JSON, Trivy DB) for fast backend joins.
**Boundaries.**
* Concelier **does not** sign with private keys. When attestation is required, the export artifact is handed to the **Signer**/**Attestor** pipeline (outofprocess).
* Concelier **does not** decide PASS/FAIL; it provides data to the **Policy** engine.
* Online operation is **allowlistonly**; airgapped deployments use the **Offline Kit**.
---
## 1) Topology & processes
**Process shape:** single ASP.NET Core service `StellaOps.Concelier.WebService` hosting:
* **Scheduler** with distributed locks (Mongo backed).
* **Connectors** (fetch/parse/map).
* **Merger** (canonical record assembly + precedence).
* **Exporters** (JSON, Trivy DB).
* **Minimal REST** for health/status/trigger/export.
**Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter.
---
## 2) Canonical domain model
> Stored in MongoDB (database `concelier`), serialized with a **canonical JSON** writer (stable order, camelCase, normalized timestamps).
### 2.1 Core entities
**Advisory**
```
advisoryId // internal GUID
advisoryKey // stable string key (e.g., CVE-2025-12345 or vendor ID)
title // short title (best-of from sources)
summary // normalized summary (English; i18n optional)
published // earliest source timestamp
modified // latest source timestamp
severity // normalized {none, low, medium, high, critical}
cvss // {v2?, v3?, v4?} objects (vector, baseScore, severity, source)
exploitKnown // bool (e.g., KEV/active exploitation flags)
references[] // typed links (advisory, kb, patch, vendor, exploit, blog)
sources[] // provenance for traceability (doc digests, URIs)
```
**Alias**
```
advisoryId
scheme // CVE, GHSA, RHSA, DSA, USN, MSRC, etc.
value // e.g., "CVE-2025-12345"
```
**Affected**
```
advisoryId
productKey // canonical product identity (see 2.2)
rangeKind // semver | evr | nvra | apk | rpm | deb | generic | exact
introduced? // string (format depends on rangeKind)
fixed? // string (format depends on rangeKind)
lastKnownSafe? // optional explicit safe floor
arch? // arch or platform qualifier if source declares (x86_64, aarch64)
distro? // distro qualifier when applicable (rhel:9, debian:12, alpine:3.19)
ecosystem? // npm|pypi|maven|nuget|golang|…
notes? // normalized notes per source
```
**Reference**
```
advisoryId
url
kind // advisory | patch | kb | exploit | mitigation | blog | cvrf | csaf
sourceTag // e.g., vendor/redhat, distro/debian, oss/ghsa
```
**MergeEvent**
```
advisoryKey
beforeHash // canonical JSON hash before merge
afterHash // canonical JSON hash after merge
mergedAt
inputs[] // source doc digests that contributed
```
**AdvisoryStatement (event log)**
```
statementId // GUID (immutable)
vulnerabilityKey // canonical advisory key (e.g., CVE-2025-12345)
advisoryKey // merge snapshot advisory key (may reference variant)
statementHash // canonical hash of advisory payload
asOf // timestamp of snapshot (UTC)
recordedAt // persistence timestamp (UTC)
inputDocuments[] // document IDs contributing to the snapshot
payload // canonical advisory document (BSON / canonical JSON)
```
**AdvisoryConflict**
```
conflictId // GUID
vulnerabilityKey // canonical advisory key
conflictHash // deterministic hash of conflict payload
asOf // timestamp aligned with originating statement set
recordedAt // persistence timestamp
statementIds[] // related advisoryStatement identifiers
details // structured conflict explanation / merge reasoning
```
- `AdvisoryEventLog` (Concelier.Core) provides the public API for appending immutable statements/conflicts and querying replay history. Inputs are normalized by trimming and lower-casing `vulnerabilityKey`, serializing advisories with `CanonicalJsonSerializer`, and computing SHA-256 hashes (`statementHash`, `conflictHash`) over the canonical JSON payloads. Consumers can replay by key with an optional `asOf` filter to obtain deterministic snapshots ordered by `asOf` then `recordedAt`.
- Conflict explainers are serialized as deterministic `MergeConflictExplainerPayload` records (type, reason, source ranks, winning values); replay clients can parse the payload to render human-readable rationales without re-computing precedence.
- Concelier.WebService exposes the immutable log via `GET /concelier/advisories/{vulnerabilityKey}/replay[?asOf=UTC_ISO8601]`, returning the latest statements (with hex-encoded hashes) and any conflict explanations for downstream exporters and APIs.
**ExportState**
```
exportKind // json | trivydb
baseExportId? // last full baseline
baseDigest? // digest of last full baseline
lastFullDigest? // digest of last full export
lastDeltaDigest? // digest of last delta export
cursor // per-kind incremental cursor
files[] // last manifest snapshot (path → sha256)
```
### 2.2 Product identity (`productKey`)
* **Primary:** `purl` (Package URL).
* **OS packages:** RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with **EVR/NVRA** preserved.
* **Secondary:** `cpe` retained for compatibility; advisory records may carry both.
* **Image/platform:** `oci:<registry>/<repo>@<digest>` for imagelevel advisories (rare).
* **Unmappable:** if a source is nondeterministic, keep native string under `productKey="native:<provider>:<id>"` and mark **nonjoinable**.
---
## 3) Source families & precedence
### 3.1 Families
* **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium
* **Linux distros**: Red Hat, SUSE, Ubuntu, Debian, Alpine…
* **OSS ecosystems**: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go.
* **CERTs / national CSIRTs**: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERTFR/BUND, etc.
### 3.2 Precedence (when claims conflict)
1. **Vendor PSIRT** (authoritative for their product).
2. **Distro** (authoritative for packages they ship, including backports).
3. **Ecosystem** (OSV/GHSA) for library semantics.
4. **CERTs/aggregators** for enrichment (KEV/known exploited).
> Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**.
---
## 4) Connectors & normalization
### 4.1 Connector contract
```csharp
public interface IFeedConnector {
string SourceName { get; }
Task FetchAsync(IServiceProvider sp, CancellationToken ct); // -> document collection
Task ParseAsync(IServiceProvider sp, CancellationToken ct); // -> dto collection (validated)
Task MapAsync(IServiceProvider sp, CancellationToken ct); // -> advisory/alias/affected/reference
}
```
* **Fetch**: windowed (cursor), conditional GET (ETag/LastModified), retry/backoff, rate limiting.
* **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing.
* **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors).
### 4.2 Version range normalization
* **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges (use `~`, `^`, `<`, `>=` canonicalized to intervals).
* **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query.
* **DEB**: dpkg version comparison semantics mirrored; store computed keys.
* **APK**: Alpine version semantics; compute order keys.
* **Generic**: if provider uses text, retain raw; do **not** invent ranges.
### 4.3 Severity & CVSS
* Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity).
* If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable).
* **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date).
---
## 5) Merge engine
### 5.1 Keying & identity
* Identity graph: **CVE** is primary node; vendor/distro IDs resolved via **Alias** edges (from connectors and Conceliers alias tables).
* `advisoryKey` is the canonical primary key (CVE if present, else vendor/distro key).
### 5.2 Merge algorithm (deterministic)
1. **Gather** all rows for `advisoryKey` (across sources).
2. **Select title/summary** by precedence source (vendor>distro>ecosystem>cert).
3. **Union aliases** (dedupe by scheme+value).
4. **Merge `Affected`** with rules:
* Prefer **vendor** ranges for vendor products; prefer **distro** for **distroshipped** packages.
* If both exist for same `productKey`, keep **both**; mark `sourceTag` and `precedence` so **Policy** can decide.
* Never collapse range semantics across different families (e.g., rpm EVR vs semver).
5. **CVSS/severity**: record all CVSS sets; compute **effectiveSeverity** = max (unless policy override).
6. **References**: union with type precedence (advisory > patch > kb > exploit > blog); dedupe by URL; preserve `sourceTag`.
7. Produce **canonical JSON**; compute **afterHash**; store **MergeEvent** with inputs and hashes.
> The merge is **pure** given inputs. Any change in inputs or precedence matrices changes the **hash** predictably.
---
## 6) Storage schema (MongoDB)
**Collections & indexes**
* `source` `{_id, type, baseUrl, enabled, notes}`
* `source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}`
* `document` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}`
* Index: `{sourceName:1, uri:1}` unique, `{fetchedAt:-1}`
* `dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}`
* Index: `{sourceName:1, documentId:1}`
* `advisory` `{_id, advisoryKey, title, summary, published, modified, severity, cvss, exploitKnown, sources[]}`
* Index: `{advisoryKey:1}` unique, `{modified:-1}`, `{severity:1}`, text index (title, summary)
* `alias` `{advisoryId, scheme, value}`
* Index: `{scheme:1,value:1}`, `{advisoryId:1}`
* `affected` `{advisoryId, productKey, rangeKind, introduced?, fixed?, arch?, distro?, ecosystem?}`
* Index: `{productKey:1}`, `{advisoryId:1}`, `{productKey:1, rangeKind:1}`
* `reference` `{advisoryId, url, kind, sourceTag}`
* Index: `{advisoryId:1}`, `{kind:1}`
* `merge_event` `{advisoryKey, beforeHash, afterHash, mergedAt, inputs[]}`
* Index: `{advisoryKey:1, mergedAt:-1}`
* `export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}`
* `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks)
* `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}`
**GridFS buckets**: `fs.documents` for raw payloads.
---
## 7) Exporters
### 7.1 Deterministic JSON (vulnlist style)
* Folder structure mirroring `/<scheme>/<first-two>/<rest>/…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace.
* `manifest.json` lists all files with SHA256 and a toplevel **export digest**.
### 7.2 Trivy DB exporter
* Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes.
* In delta, unchanged blobs are reused from the base; metadata captures:
```
{
"mode": "delta|full",
"baseExportId": "...",
"baseManifestDigest": "sha256:...",
"changed": ["path1", "path2"],
"removed": ["path3"]
}
```
* Optional ORAS push (OCI layout) for registries.
* Offline kit bundles include Trivy DB + JSON tree + export manifest.
* Mirror-ready bundles: when `concelier.trivy.mirror` defines domains, the exporter emits `mirror/index.json` plus per-domain `manifest.json`, `metadata.json`, and `db.tar.gz` files with SHA-256 digests so Concelier mirrors can expose domain-scoped download endpoints.
* Concelier.WebService serves `/concelier/exports/index.json` and `/concelier/exports/mirror/{domain}/…` directly from the export tree with hour-long budgets (index: 60s, bundles: 300s, immutable) and per-domain rate limiting; the endpoints honour Stella Ops Authority or CIDR bypass lists depending on mirror topology.
### 7.3 Handoff to Signer/Attestor (optional)
* On export completion, if `attest: true` is set in job args, Concelier **posts** the artifact metadata to **Signer**/**Attestor**; Concelier itself **does not** hold signing keys.
* Export record stores returned `{ uuid, index, url }` from **Rekor v2**.
---
## 8) REST APIs
All under `/api/v1/concelier`.
**Health & status**
```
GET /healthz | /readyz
GET /status → sources, last runs, export cursors
```
**Sources & jobs**
```
GET /sources → list of configured sources
POST /sources/{name}/trigger { jobId }
POST /sources/{name}/pause | /resume → toggle
GET /jobs/{id} → job status
```
**Exports**
```
POST /exports/json { full?:bool, force?:bool, attest?:bool } { exportId, digest, rekor? }
POST /exports/trivy { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? }
GET /exports/{id} → export metadata (kind, digest, createdAt, rekor?)
GET /concelier/exports/index.json → mirror index describing available domains/bundles
GET /concelier/exports/mirror/{domain}/manifest.json
GET /concelier/exports/mirror/{domain}/bundle.json
GET /concelier/exports/mirror/{domain}/bundle.json.jws
```
**Search (operator debugging)**
```
GET /advisories/{key}
GET /advisories?scheme=CVE&value=CVE-2025-12345
GET /affected?productKey=pkg:rpm/openssl&limit=100
```
**AuthN/Z:** Authority tokens (OpTok) with roles: `concelier.read`, `concelier.admin`, `concelier.export`.
---
## 9) Configuration (YAML)
```yaml
concelier:
mongo: { uri: "mongodb://mongo/concelier" }
s3:
endpoint: "http://minio:9000"
bucket: "stellaops-concelier"
scheduler:
windowSeconds: 30
maxParallelSources: 4
sources:
- name: redhat
kind: csaf
baseUrl: https://access.redhat.com/security/data/csaf/v2/
signature: { type: pgp, keys: [ "…redhat PGP…" ] }
enabled: true
windowDays: 7
- name: suse
kind: csaf
baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
signature: { type: pgp, keys: [ "…suse PGP…" ] }
- name: ubuntu
kind: usn-json
baseUrl: https://ubuntu.com/security/notices.json
signature: { type: none }
- name: osv
kind: osv
baseUrl: https://api.osv.dev/v1/
signature: { type: none }
- name: ghsa
kind: ghsa
baseUrl: https://api.github.com/graphql
auth: { tokenRef: "env:GITHUB_TOKEN" }
exporters:
json:
enabled: true
output: s3://stellaops-concelier/json/
trivy:
enabled: true
mode: full
output: s3://stellaops-concelier/trivy/
oras:
enabled: false
repo: ghcr.io/org/concelier
precedence:
vendorWinsOverDistro: true
distroWinsOverOsv: true
severity:
policy: max # or 'vendorPreferred' / 'distroPreferred'
```
---
## 10) Security & compliance
* **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible.
* **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; **merge** can downweight or ignore them by config.
* **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers.
* **Multitenant**: pertenant DBs or prefixes; pertenant S3 prefixes; tenantscoped API tokens.
* **Determinism**: canonical JSON writer; export digests stable across runs given same inputs.
---
## 11) Performance targets & scale
* **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
* **Normalize/map**: ≥ 50k `Affected` rows/min on 4 cores.
* **Merge**: ≤ 10ms P95 per advisory at steadystate updates.
* **Export**: 1M advisories JSON in ≤ 90s (streamed, zstd), Trivy DB in ≤ 60s on 8 cores.
* **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.
**Scale pattern**: add Concelier replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs.
---
## 12) Observability
* **Metrics**
* `concelier.fetch.docs_total{source}`
* `concelier.fetch.bytes_total{source}`
* `concelier.parse.failures_total{source}`
* `concelier.map.affected_total{source}`
* `concelier.merge.changed_total`
* `concelier.export.bytes{kind}`
* `concelier.export.duration_seconds{kind}`
* **Tracing** around fetch/parse/map/merge/export.
* **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`.
---
## 13) Testing matrix
* **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail).
* **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, prereleases).
* **Merge:** conflicting sources (vendor vs distro vs OSV); verify precedence & dual retention.
* **Export determinism:** byteforbyte stable outputs across runs; digest equality.
* **Performance:** soak tests with 1M advisories; cap memory; verify backpressure.
* **API:** pagination, filters, RBAC, error envelopes (RFC 7807).
* **Offline kit:** bundle build & import correctness.
---
## 14) Failure modes & recovery
* **Source outages:** scheduler backs off with exponential delay; `source_state.backoffUntil`; alerts on staleness.
* **Schema drifts:** parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges.
* **Partial exports:** exporters write to temp prefix; **manifest commit** is atomic; only then move to final prefix and update `export_state`.
* **Resume:** all stages idempotent; `source_state.cursor` supports window resume.
---
## 15) Operator runbook (quick)
* **Trigger all sources:** `POST /api/v1/concelier/sources/*/trigger`
* **Force full export JSON:** `POST /api/v1/concelier/exports/json { "full": true, "force": true }`
* **Force Trivy DB delta publish:** `POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }`
* **Inspect advisory:** `GET /api/v1/concelier/advisories?scheme=CVE&value=CVE-2025-12345`
* **Pause noisy source:** `POST /api/v1/concelier/sources/osv/pause`
---
## 16) Rollout plan
1. **MVP**: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export.
2. **Add**: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export.
3. **Attestation handoff**: integrate with **Signer/Attestor** (optional).
4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse.
5. **Offline kit**: endtoend verified bundles for airgap.

View File

@@ -337,7 +337,7 @@ Prometheus + OTLP; Grafana dashboards ship in the charts.
* **Vulnerability response**:
* Concelier red-flag advisories trigger accelerated **stable** patch rollout; UI/CLI “security patch available” notice.
* 2025-10: Pinned `MongoDB.Driver` **3.5.0** and `SharpCompress` **0.41.0** across services (DEVOPS-SEC-10-301) to eliminate NU1902/NU1903 warnings surfaced during scanner cache/worker test runs; future dependency bumps follow the same central override pattern.
* 2025-10: Pinned `MongoDB.Driver` **3.5.0** and `SharpCompress` **0.41.0** across services (DEVOPS-SEC-10-301) to eliminate NU1902/NU1903 warnings surfaced during scanner cache/worker test runs; repacked the local `Mongo2Go` feed so test fixtures inherit the patched dependencies; future bumps follow the same central override pattern.
* **Backups/DR**:

View File

@@ -1,196 +1,224 @@
# Concelier & Excititor Mirror Operations
This runbook describes how StellaOps operates the managed mirrors under `*.stella-ops.org`.
It covers Docker Compose and Helm deployment overlays, secret handling for multi-tenant
authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles current.
## 1. Prerequisites
- **Authority access** client credentials (`client_id` + secret) authorised for
`concelier.mirror.read` and `excititor.mirror.read` scopes. Secrets live outside git.
- **Signed TLS certificates** wildcard or per-domain (`mirror-primary`, `mirror-community`).
Store them under `deploy/compose/mirror-gateway/tls/` or in Kubernetes secrets.
- **Mirror gateway credentials** Basic Auth htpasswd files per domain. Generate with
`htpasswd -B`. Operators distribute credentials to downstream consumers.
- **Export artifact source** read access to the canonical S3 buckets (or rsync share)
that hold `concelier` JSON bundles and `excititor` VEX exports.
- **Persistent volumes** storage for Concelier job metadata and mirror export trees.
For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`,
`excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout.
## 2. Secret & certificate layout
### Docker Compose (`deploy/compose/docker-compose.mirror.yaml`)
- `deploy/compose/env/mirror.env.example` copy to `.env` and adjust quotas or domain IDs.
- `deploy/compose/mirror-secrets/` mount read-only into `/run/secrets`. Place:
- `concelier-authority-client` Authority client secret.
- `excititor-authority-client` (optional) reserve for future authn.
- `deploy/compose/mirror-gateway/tls/` PEM-encoded cert/key pairs:
- `mirror-primary.crt`, `mirror-primary.key`
- `mirror-community.crt`, `mirror-community.key`
- `deploy/compose/mirror-gateway/secrets/` htpasswd files:
- `mirror-primary.htpasswd`
- `mirror-community.htpasswd`
### Helm (`deploy/helm/stellaops/values-mirror.yaml`)
Create secrets in the target namespace:
```bash
kubectl create secret generic concelier-mirror-auth \
--from-file=concelier-authority-client=concelier-authority-client
kubectl create secret generic excititor-mirror-auth \
--from-file=excititor-authority-client=excititor-authority-client
kubectl create secret tls mirror-gateway-tls \
--cert=mirror-primary.crt --key=mirror-primary.key
kubectl create secret generic mirror-gateway-htpasswd \
--from-file=mirror-primary.htpasswd --from-file=mirror-community.htpasswd
```
> Keep Basic Auth lists short-lived (rotate quarterly) and document credential recipients.
## 3. Deployment
### 3.1 Docker Compose (edge mirrors, lab validation)
1. `cp deploy/compose/env/mirror.env.example deploy/compose/env/mirror.env`
2. Populate secrets/tls directories as described above.
3. Sync mirror bundles (see §4) into `deploy/compose/mirror-data/…` and ensure they are mounted
on the host path backing the `concelier-exports` and `excititor-exports` volumes.
4. Run the profile validator: `deploy/tools/validate-profiles.sh`.
5. Launch: `docker compose --env-file env/mirror.env -f docker-compose.mirror.yaml up -d`.
### 3.2 Helm (production mirrors)
1. Provision PVCs sized for mirror bundles (baseline: 20GiB per domain).
2. Create secrets/tls config maps (§2).
3. `helm upgrade --install mirror deploy/helm/stellaops -f deploy/helm/stellaops/values-mirror.yaml`.
4. Annotate the `stellaops-mirror-gateway` service with ingress/LoadBalancer metadata required by
your CDN (e.g., AWS load balancer scheme internal + NLB idle timeout).
## 4. Artifact sync workflow
Mirrors never generate exports—they ingest signed bundles produced by the Concelier and Excititor
export jobs. Recommended sync pattern:
### 4.1 Compose host (systemd timer)
`/usr/local/bin/mirror-sync.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
aws s3 sync s3://mirror-stellaops/concelier/latest \
/opt/stellaops/mirror-data/concelier --delete --size-only
aws s3 sync s3://mirror-stellaops/excititor/latest \
/opt/stellaops/mirror-data/excititor --delete --size-only
```
Schedule with a systemd timer every 5minutes. The Compose volumes mount `/opt/stellaops/mirror-data/*`
into the containers read-only, matching `CONCELIER__MIRROR__EXPORTROOT=/exports/json` and
`EXCITITOR__ARTIFACTS__FILESYSTEM__ROOT=/exports`.
### 4.2 Kubernetes (CronJob)
Create a CronJob running the AWS CLI (or rclone) in the same namespace, writing into the PVCs:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: mirror-sync
spec:
schedule: "*/5 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: sync
image: public.ecr.aws/aws-cli/aws-cli@sha256:5df5f52c29f5e3ba46d0ad9e0e3afc98701c4a0f879400b4c5f80d943b5fadea
command:
- /bin/sh
- -c
- >
aws s3 sync s3://mirror-stellaops/concelier/latest /exports/concelier --delete --size-only &&
aws s3 sync s3://mirror-stellaops/excititor/latest /exports/excititor --delete --size-only
volumeMounts:
- name: concelier-exports
mountPath: /exports/concelier
- name: excititor-exports
mountPath: /exports/excititor
envFrom:
- secretRef:
name: mirror-sync-aws
restartPolicy: OnFailure
volumes:
- name: concelier-exports
persistentVolumeClaim:
claimName: concelier-mirror-exports
- name: excititor-exports
persistentVolumeClaim:
claimName: excititor-mirror-exports
```
## 5. CDN integration
1. Point the CDN origin at the mirror gateway (Compose host or Kubernetes LoadBalancer).
2. Honour the response headers emitted by the gateway and Concelier/Excititor:
`Cache-Control: public, max-age=300, immutable` for mirror payloads.
3. Configure origin shields in the CDN to prevent cache stampedes. Recommended TTLs:
- Index (`/concelier/exports/index.json`, `/excititor/mirror/*/index`) → 60s.
- Bundle/manifest payloads → 300s.
4. Forward the `Authorization` header—Basic Auth terminates at the gateway.
5. Enforce per-domain rate limits at the CDN (matching gateway budgets) and enable logging
to SIEM for anomaly detection.
## 6. Smoke tests
After each deployment or sync cycle:
```bash
# Index with Basic Auth
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/index.json | jq 'keys'
# Mirror manifest signature
curl -u $PRIMARY_CREDS -I https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/manifest.json
# Excititor consensus bundle metadata
curl -u $COMMUNITY_CREDS https://mirror-community.stella-ops.org/excititor/mirror/community/index \
| jq '.exports[].exportKey'
# Signed bundle + detached JWS (spot check digests)
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json.jws \
-o bundle.json.jws
cosign verify-blob --signature bundle.json.jws --key mirror-key.pub bundle.json
```
Watch the gateway metrics (`nginx_vts` or access logs) for cache hits. In Kubernetes, `kubectl logs deploy/stellaops-mirror-gateway`
should show `X-Cache-Status: HIT/MISS`.
## 7. Maintenance & rotation
- **Bundle freshness** alert if sync job lag exceeds 15minutes or if `concelier` logs
`Mirror export root is not configured`.
- **Secret rotation** change Authority client secrets and Basic Auth credentials quarterly.
Update the mounted secrets and restart deployments (`docker compose restart concelier` or
`kubectl rollout restart deploy/stellaops-concelier`).
- **TLS renewal** reissue certificates, place new files, and reload gateway (`docker compose exec mirror-gateway nginx -s reload`).
- **Quota tuning** adjust per-domain `MAXDOWNLOADREQUESTSPERHOUR` in `.env` or values file.
Align CDN rate limits and inform downstreams.
## 8. References
- Deployment profiles: `deploy/compose/docker-compose.mirror.yaml`,
`deploy/helm/stellaops/values-mirror.yaml`
- Mirror architecture dossiers: `docs/ARCHITECTURE_CONCELIER.md`,
`docs/ARCHITECTURE_EXCITITOR_MIRRORS.md`
- Export bundling: `docs/ARCHITECTURE_DEVOPS.md` §3, `docs/ARCHITECTURE_EXCITITOR.md` §7
# Concelier & Excititor Mirror Operations
This runbook describes how StellaOps operates the managed mirrors under `*.stella-ops.org`.
It covers Docker Compose and Helm deployment overlays, secret handling for multi-tenant
authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles current.
## 1. Prerequisites
- **Authority access** client credentials (`client_id` + secret) authorised for
`concelier.mirror.read` and `excititor.mirror.read` scopes. Secrets live outside git.
- **Signed TLS certificates** wildcard or per-domain (`mirror-primary`, `mirror-community`).
Store them under `deploy/compose/mirror-gateway/tls/` or in Kubernetes secrets.
- **Mirror gateway credentials** Basic Auth htpasswd files per domain. Generate with
`htpasswd -B`. Operators distribute credentials to downstream consumers.
- **Export artifact source** read access to the canonical S3 buckets (or rsync share)
that hold `concelier` JSON bundles and `excititor` VEX exports.
- **Persistent volumes** storage for Concelier job metadata and mirror export trees.
For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`,
`excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout.
### 1.1 Service configuration quick reference
Concelier.WebService exposes the mirror HTTP endpoints once `CONCELIER__MIRROR__ENABLED=true`.
Key knobs:
- `CONCELIER__MIRROR__EXPORTROOT` root folder containing export snapshots (`<exportId>/mirror/*`).
- `CONCELIER__MIRROR__ACTIVEEXPORTID` optional explicit export id; otherwise the service auto-falls back to the `latest/` symlink or newest directory.
- `CONCELIER__MIRROR__REQUIREAUTHENTICATION` default auth requirement; override per domain with `CONCELIER__MIRROR__DOMAINS__{n}__REQUIREAUTHENTICATION`.
- `CONCELIER__MIRROR__MAXINDEXREQUESTSPERHOUR` budget for `/concelier/exports/index.json`. Domains inherit this value unless they define `__MAXDOWNLOADREQUESTSPERHOUR`.
- `CONCELIER__MIRROR__DOMAINS__{n}__ID` domain identifier matching the exporter manifest; additional keys configure display name and rate budgets.
> The service honours Stella Ops Authority when `CONCELIER__AUTHORITY__ENABLED=true` and `ALLOWANONYMOUSFALLBACK=false`. Use the bypass CIDR list (`CONCELIER__AUTHORITY__BYPASSNETWORKS__*`) for in-cluster ingress gateways that terminate Basic Auth. Unauthorized requests emit `WWW-Authenticate: Bearer` so downstream automation can detect token failures.
Mirror responses carry deterministic cache headers: `/index.json` returns `Cache-Control: public, max-age=60`, while per-domain manifests/bundles include `Cache-Control: public, max-age=300, immutable`. Rate limiting surfaces `Retry-After` when quotas are exceeded.
## 2. Secret & certificate layout
### Docker Compose (`deploy/compose/docker-compose.mirror.yaml`)
- `deploy/compose/env/mirror.env.example` copy to `.env` and adjust quotas or domain IDs.
- `deploy/compose/mirror-secrets/` mount read-only into `/run/secrets`. Place:
- `concelier-authority-client` Authority client secret.
- `excititor-authority-client` (optional) reserve for future authn.
- `deploy/compose/mirror-gateway/tls/` PEM-encoded cert/key pairs:
- `mirror-primary.crt`, `mirror-primary.key`
- `mirror-community.crt`, `mirror-community.key`
- `deploy/compose/mirror-gateway/secrets/` htpasswd files:
- `mirror-primary.htpasswd`
- `mirror-community.htpasswd`
### Helm (`deploy/helm/stellaops/values-mirror.yaml`)
Create secrets in the target namespace:
```bash
kubectl create secret generic concelier-mirror-auth \
--from-file=concelier-authority-client=concelier-authority-client
kubectl create secret generic excititor-mirror-auth \
--from-file=excititor-authority-client=excititor-authority-client
kubectl create secret tls mirror-gateway-tls \
--cert=mirror-primary.crt --key=mirror-primary.key
kubectl create secret generic mirror-gateway-htpasswd \
--from-file=mirror-primary.htpasswd --from-file=mirror-community.htpasswd
```
> Keep Basic Auth lists short-lived (rotate quarterly) and document credential recipients.
## 3. Deployment
### 3.1 Docker Compose (edge mirrors, lab validation)
1. `cp deploy/compose/env/mirror.env.example deploy/compose/env/mirror.env`
2. Populate secrets/tls directories as described above.
3. Sync mirror bundles (see §4) into `deploy/compose/mirror-data/…` and ensure they are mounted
on the host path backing the `concelier-exports` and `excititor-exports` volumes.
4. Run the profile validator: `deploy/tools/validate-profiles.sh`.
5. Launch: `docker compose --env-file env/mirror.env -f docker-compose.mirror.yaml up -d`.
### 3.2 Helm (production mirrors)
1. Provision PVCs sized for mirror bundles (baseline: 20GiB per domain).
2. Create secrets/tls config maps (§2).
3. `helm upgrade --install mirror deploy/helm/stellaops -f deploy/helm/stellaops/values-mirror.yaml`.
4. Annotate the `stellaops-mirror-gateway` service with ingress/LoadBalancer metadata required by
your CDN (e.g., AWS load balancer scheme internal + NLB idle timeout).
## 4. Artifact sync workflow
Mirrors never generate exports—they ingest signed bundles produced by the Concelier and Excititor
export jobs. Recommended sync pattern:
### 4.1 Compose host (systemd timer)
`/usr/local/bin/mirror-sync.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
aws s3 sync s3://mirror-stellaops/concelier/latest \
/opt/stellaops/mirror-data/concelier --delete --size-only
aws s3 sync s3://mirror-stellaops/excititor/latest \
/opt/stellaops/mirror-data/excititor --delete --size-only
```
Schedule with a systemd timer every 5minutes. The Compose volumes mount `/opt/stellaops/mirror-data/*`
into the containers read-only, matching `CONCELIER__MIRROR__EXPORTROOT=/exports/json` and
`EXCITITOR__ARTIFACTS__FILESYSTEM__ROOT=/exports`.
### 4.2 Kubernetes (CronJob)
Create a CronJob running the AWS CLI (or rclone) in the same namespace, writing into the PVCs:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: mirror-sync
spec:
schedule: "*/5 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: sync
image: public.ecr.aws/aws-cli/aws-cli@sha256:5df5f52c29f5e3ba46d0ad9e0e3afc98701c4a0f879400b4c5f80d943b5fadea
command:
- /bin/sh
- -c
- >
aws s3 sync s3://mirror-stellaops/concelier/latest /exports/concelier --delete --size-only &&
aws s3 sync s3://mirror-stellaops/excititor/latest /exports/excititor --delete --size-only
volumeMounts:
- name: concelier-exports
mountPath: /exports/concelier
- name: excititor-exports
mountPath: /exports/excititor
envFrom:
- secretRef:
name: mirror-sync-aws
restartPolicy: OnFailure
volumes:
- name: concelier-exports
persistentVolumeClaim:
claimName: concelier-mirror-exports
- name: excititor-exports
persistentVolumeClaim:
claimName: excititor-mirror-exports
```
## 5. CDN integration
1. Point the CDN origin at the mirror gateway (Compose host or Kubernetes LoadBalancer).
2. Honour the response headers emitted by the gateway and Concelier/Excititor:
`Cache-Control: public, max-age=300, immutable` for mirror payloads.
3. Configure origin shields in the CDN to prevent cache stampedes. Recommended TTLs:
- Index (`/concelier/exports/index.json`, `/excititor/mirror/*/index`) → 60s.
- Bundle/manifest payloads → 300s.
4. Forward the `Authorization` header—Basic Auth terminates at the gateway.
5. Enforce per-domain rate limits at the CDN (matching gateway budgets) and enable logging
to SIEM for anomaly detection.
## 6. Smoke tests
After each deployment or sync cycle (temporarily set low budgets if you need to observe 429 responses):
```bash
# Index with Basic Auth
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/index.json | jq 'keys'
# Mirror manifest signature and cache headers
curl -u $PRIMARY_CREDS -I https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/manifest.json \
| tee /tmp/manifest-headers.txt
grep -E '^Cache-Control: ' /tmp/manifest-headers.txt # expect public, max-age=300, immutable
# Excititor consensus bundle metadata
curl -u $COMMUNITY_CREDS https://mirror-community.stella-ops.org/excititor/mirror/community/index \
| jq '.exports[].exportKey'
# Signed bundle + detached JWS (spot check digests)
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json.jws \
-o bundle.json.jws
cosign verify-blob --signature bundle.json.jws --key mirror-key.pub bundle.json
# Service-level auth check (inside cluster no gateway credentials)
kubectl exec deploy/stellaops-concelier -- curl -si http://localhost:8443/concelier/exports/mirror/primary/manifest.json \
| head -n 5 # expect HTTP/1.1 401 with WWW-Authenticate: Bearer
# Rate limit smoke (repeat quickly; second call should return 429 + Retry-After)
for i in 1 2; do
curl -s -o /dev/null -D - https://mirror-primary.stella-ops.org/concelier/exports/index.json \
-u $PRIMARY_CREDS | grep -E '^(HTTP/|Retry-After:)'
sleep 1
done
```
Watch the gateway metrics (`nginx_vts` or access logs) for cache hits. In Kubernetes, `kubectl logs deploy/stellaops-mirror-gateway`
should show `X-Cache-Status: HIT/MISS`.
## 7. Maintenance & rotation
- **Bundle freshness** alert if sync job lag exceeds 15minutes or if `concelier` logs
`Mirror export root is not configured`.
- **Secret rotation** change Authority client secrets and Basic Auth credentials quarterly.
Update the mounted secrets and restart deployments (`docker compose restart concelier` or
`kubectl rollout restart deploy/stellaops-concelier`).
- **TLS renewal** reissue certificates, place new files, and reload gateway (`docker compose exec mirror-gateway nginx -s reload`).
- **Quota tuning** adjust per-domain `MAXDOWNLOADREQUESTSPERHOUR` in `.env` or values file.
Align CDN rate limits and inform downstreams.
## 8. References
- Deployment profiles: `deploy/compose/docker-compose.mirror.yaml`,
`deploy/helm/stellaops/values-mirror.yaml`
- Mirror architecture dossiers: `docs/ARCHITECTURE_CONCELIER.md`,
`docs/ARCHITECTURE_EXCITITOR_MIRRORS.md`
- Export bundling: `docs/ARCHITECTURE_DEVOPS.md` §3, `docs/ARCHITECTURE_EXCITITOR.md` §7