feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules

- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
2025-10-30 00:09:39 +02:00
parent 3154c67978
commit 7b5bdcf4d3
503 changed files with 16136 additions and 54638 deletions


@@ -0,0 +1,22 @@
# Concelier agent guide
## Mission
Concelier ingests signed advisories from dozens of sources and converts them into immutable observations plus linksets under the Aggregation-Only Contract (AOC).
## Key docs
- [Module README](./README.md)
- [Architecture](./architecture.md)
- [Implementation plan](./implementation_plan.md)
- [Task board](./TASKS.md)
## How to get started
1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module.
2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED).
3. Read the architecture and README for domain context before editing code or docs.
4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan.
## Guardrails
- Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md).
- Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts.
- Keep Offline Kit parity in mind—document air-gapped workflows for any new feature.
- Update runbooks/observability assets when operational characteristics change.
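A minimal C# sketch of the timestamp rule. It assumes second precision with a literal `Z` suffix is acceptable; the guide does not fix the precision.

```csharp
using System;
using System.Globalization;

// Convert to UTC, then print ISO-8601 with a literal 'Z' (second precision assumed).
static string NormaliseUtc(DateTimeOffset value) =>
    value.ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);

var local = new DateTimeOffset(2025, 9, 1, 15, 4, 5, TimeSpan.FromHours(2));
Console.WriteLine(NormaliseUtc(local)); // 2025-09-01T13:04:05Z
```

Taking a `DateTimeOffset` rather than a `DateTime` keeps the source offset explicit, so the result never depends on the machine's local time zone.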


@@ -0,0 +1,36 @@
# StellaOps Concelier
Concelier ingests signed advisories from dozens of sources and converts them into immutable observations plus linksets under the Aggregation-Only Contract (AOC).
## Responsibilities
- Fetch and normalise vulnerability advisories via restart-time connectors.
- Persist observations and correlation linksets without precedence decisions.
- Emit deterministic exports (JSON, Trivy DB) for downstream policy evaluation.
- Coordinate offline/air-gap updates via Offline Kit bundles.
## Key components
- `StellaOps.Concelier.WebService` orchestration host.
- Connector libraries under `StellaOps.Concelier.Connector.*`.
- Exporter packages (`StellaOps.Concelier.Exporter.*`).
## Integrations & dependencies
- MongoDB for canonical observations and schedules.
- Policy Engine / Export Center / CLI for evidence consumption.
- Notify and UI for advisory deltas.
## Operational notes
- Connector runbooks in ./operations/connectors/.
- Mirror operations for Offline Kit parity.
- Grafana dashboards for connector health.
## Related resources
- ./operations/conflict-resolution.md
- ./operations/mirror.md
## Backlog references
- DOCS-LNM-22-001, DOCS-LNM-22-007 in ../../TASKS.md.
- Connector-specific TODOs in `src/Concelier/**/TASKS.md`.
## Epic alignment
- **Epic 1 AOC enforcement:** uphold raw observation invariants, provenance requirements, linkset-only enrichment, and AOC verifier guardrails across every connector.
- **Epic 10 Export Center:** expose deterministic advisory exports and metadata required by JSON/Trivy/mirror bundles.


@@ -0,0 +1,9 @@
# Task board — Concelier
> Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable.
| ID | Status | Owner(s) | Description | Notes |
|----|--------|----------|-------------|-------|
| CONCELIER-DOCS-0001 | DOING (2025-10-29) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md |
| CONCELIER-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md |
| CONCELIER-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow |


@@ -0,0 +1,600 @@
# component_architecture_concelier.md — **StellaOps Concelier** (Sprint 22)
> Derived from Epic 1 AOC enforcement and aligned with the Export Center evidence interfaces first scoped in Epic 10.
> **Scope.** Implementation-ready architecture for **Concelier**: the advisory ingestion and Link-Not-Merge (LNM) observation pipeline that produces deterministic raw observations, correlation linksets, and evidence events consumed by Policy Engine, Console, CLI, and Export Center. Covers domain models, connectors, observation/linkset builders, storage schema, events, APIs, performance, security, and test matrices.
---
## 0) Mission & boundaries
**Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), persist them as immutable **observations** under the Aggregation-Only Contract (AOC), construct **linksets** that correlate observations without merging or precedence, and export deterministic evidence bundles (JSON, Trivy DB, Offline Kit) for downstream policy evaluation and operator tooling.
**Boundaries.**
* Concelier **does not** sign with private keys. When attestation is required, the export artifact is handed to the **Signer**/**Attestor** pipeline (out-of-process).
* Concelier **does not** decide PASS/FAIL; it provides data to the **Policy** engine.
* Online operation is **allowlist-only**; air-gapped deployments use the **Offline Kit**.
---
## 1) Aggregation-Only Contract guardrails
**Epic 1 distilled:** the service itself is the enforcement point for the AOC. The guardrail checklist is embedded in code (`AOCWriteGuard`) and must be satisfied before any advisory is written to Mongo:
1. **No derived semantics in ingestion.** The DTOs produced by connectors cannot contain severity, consensus, reachability, merged status, or fix hints. Roslyn analyzers (`StellaOps.AOC.Analyzers`) scan connectors and fail builds if forbidden properties appear.
2. **Immutable raw docs.** Every upstream advisory is persisted in `advisory_raw` with append-only semantics. Revisions produce new `_id`s via version suffix (`:v2`, `:v3`), linking back through `supersedes`.
3. **Mandatory provenance.** Collectors record `source`, `upstream` metadata (`document_version`, `fetched_at`, `received_at`, `content_hash`), and signature presence before writing.
4. **Linkset only.** Derived joins (aliases, PURLs, CPEs, references) are stored inside `linkset` and never mutate `content.raw`.
5. **Deterministic canonicalisation.** Writers use canonical JSON (object keys sorted, arrays in lexicographic order) so that identical inputs yield identical hashes and diff-friendly outputs.
6. **Idempotent upserts.** `(source.vendor, upstream.upstream_id, upstream.content_hash)` uniquely identify a document. Duplicate hashes short-circuit; new hashes create a new version.
7. **Verifier & CI.** `StellaOps.AOC.Verifier` processes observation batches in CI and at runtime, rejecting writes lacking provenance, introducing unordered collections, or violating the schema.
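The forbidden-field and provenance checks (items 1 and 3) can be sketched as below (.NET 6 or later). This is not the `AOCWriteGuard` implementation: the forbidden key names and the required `upstream` fields are illustrative assumptions, and only top-level keys are inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Illustrative catalogue only; the real list lives in the shared AOC guard library.
var forbidden = new HashSet<string>(StringComparer.Ordinal)
    { "severity", "consensus", "reachability", "merged_status", "fix_hint" };
string[] requiredUpstream = { "content_hash", "fetched_at", "received_at" };

// Returns every violation found; an empty list means the document may be written.
List<string> Check(JsonObject doc)
{
    var violations = new List<string>();
    foreach (var key in doc.Select(p => p.Key).Where(k => forbidden.Contains(k)))
        violations.Add($"forbidden top-level field '{key}'");
    if (doc["source"] is not JsonObject)
        violations.Add("missing source block");
    if (doc["upstream"] is not JsonObject upstream)
        violations.Add("missing upstream block");
    else
        violations.AddRange(requiredUpstream
            .Where(f => upstream[f] is null)
            .Select(f => $"missing upstream.{f}"));
    return violations;
}

var sample = JsonNode.Parse(
    "{\"source\":{\"vendor\":\"OSV\"},\"upstream\":{\"content_hash\":\"sha256:abc\"},\"severity\":\"HIGH\"}")!.AsObject();
foreach (var message in Check(sample))
    Console.WriteLine(message);
```

Collecting every violation instead of failing on the first one gives the verifier and CI a complete report per document.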
### 1.1 Advisory raw document shape
```jsonc
{
  "_id": "advisory_raw:osv:GHSA-xxxx-....:v3",
  "source": {
    "vendor": "OSV",
    "stream": "github",
    "api": "https://api.osv.dev/v1/.../GHSA-...",
    "collector_version": "concelier/1.7.3"
  },
  "upstream": {
    "upstream_id": "GHSA-xxxx-....",
    "document_version": "2025-09-01T12:13:14Z",
    "fetched_at": "2025-09-01T13:04:05Z",
    "received_at": "2025-09-01T13:04:06Z",
    "content_hash": "sha256:...",
    "signature": {
      "present": true,
      "format": "dsse",
      "key_id": "rekor:.../key/abc",
      "sig": "base64..."
    }
  },
  "content": {
    "format": "OSV",
    "spec_version": "1.6",
    "raw": { /* unmodified upstream document */ }
  },
  "identifiers": {
    "cve": ["CVE-2025-12345"],
    "ghsa": ["GHSA-xxxx-...."],
    "aliases": ["CVE-2025-12345", "GHSA-xxxx-...."]
  },
  "linkset": {
    "purls": ["pkg:npm/lodash@4.17.21"],
    "cpes": ["cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:*:*:*"],
    "references": [
      {"type":"advisory","url":"https://..."},
      {"type":"fix","url":"https://..."}
    ],
    "reconciled_from": ["content.raw.affected.ranges", "content.raw.pkg"]
  },
  "supersedes": "advisory_raw:osv:GHSA-xxxx-....:v2",
  "tenant": "default"
}
```
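Item 5's canonical JSON and the resulting `content_hash` can be sketched with `System.Text.Json` (.NET 6 or later). Assumptions: object keys are ordered by ordinal comparison, arrays are re-emitted in the order received (lexicographic array ordering is assumed to happen earlier, when the DTO is built), and the hash is SHA-256 over the compact UTF-8 output.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using System.Text.Json.Nodes;

// Re-emit a JSON tree with object keys in ordinal order and no insignificant whitespace.
static void WriteCanonical(JsonNode? node, Utf8JsonWriter writer)
{
    switch (node)
    {
        case JsonObject obj:
            writer.WriteStartObject();
            foreach (var (key, value) in obj.OrderBy(p => p.Key, StringComparer.Ordinal))
            {
                writer.WritePropertyName(key);
                WriteCanonical(value, writer);
            }
            writer.WriteEndObject();
            break;
        case JsonArray array:
            writer.WriteStartArray();
            foreach (var item in array)
                WriteCanonical(item, writer); // array order is kept as received
            writer.WriteEndArray();
            break;
        case null:
            writer.WriteNullValue();
            break;
        default:
            node.WriteTo(writer); // strings, numbers, booleans
            break;
    }
}

static string Canonical(string json)
{
    using var buffer = new MemoryStream();
    using (var writer = new Utf8JsonWriter(buffer))
        WriteCanonical(JsonNode.Parse(json), writer);
    return Encoding.UTF8.GetString(buffer.ToArray());
}

static string ContentHash(string json) =>
    "sha256:" + Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(Canonical(json)))).ToLowerInvariant();

Console.WriteLine(Canonical("{\"b\":1,\"a\":{\"d\":true,\"c\":null}}")); // {"a":{"c":null,"d":true},"b":1}
```

Because whitespace and key order are erased before hashing, two fetches of a semantically identical document produce the same `content_hash` and hit the duplicate short-circuit.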
### 1.2 Connector lifecycle
1. **Snapshot stage** — connectors fetch signed feeds or use offline mirrors keyed by `{vendor, stream, snapshot_date}`.
2. **Parse stage** — upstream payloads are normalised into strongly-typed DTOs with UTC timestamps.
3. **Guard stage** — DTOs run through `AOCWriteGuard` performing schema validation, forbidden-field checks, provenance validation, deterministic sorting, and `_id` computation.
4. **Write stage** — append-only Mongo insert; duplicate hash is ignored, changed hash creates a new version and emits `supersedes` pointer.
5. **Event stage** — DSSE-backed events `advisory.observation.updated` and `advisory.linkset.updated` notify downstream services (Policy, Export Center, CLI).
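The write stage's idempotency rule (item 6 of the guardrail checklist) reduces to a small decision, sketched here against an in-memory dictionary instead of Mongo. The `:v1` suffix for the first revision and the lower-cased vendor segment are assumptions extrapolated from the `_id` example in §1.1.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for advisory_raw: (vendor, upstreamId) -> content hashes in revision order.
var store = new Dictionary<(string Vendor, string UpstreamId), List<string>>();

// Returns the new _id and its supersedes pointer, or null when the hash is already stored.
(string Id, string? Supersedes)? Write(string vendor, string upstreamId, string contentHash)
{
    if (!store.TryGetValue((vendor, upstreamId), out var revisions))
    {
        revisions = new List<string>();
        store[(vendor, upstreamId)] = revisions;
    }
    if (revisions.Contains(contentHash))
        return null; // duplicate hash: short-circuit, nothing is written

    revisions.Add(contentHash);
    string IdFor(int revision) => $"advisory_raw:{vendor.ToLowerInvariant()}:{upstreamId}:v{revision}";
    var current = revisions.Count;
    return (IdFor(current), current > 1 ? IdFor(current - 1) : null);
}
```

Nothing is ever updated in place: a changed hash only appends a revision whose `supersedes` points at its predecessor.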
### 1.3 Export readiness
Concelier feeds Export Center profiles (Epic 10) by:
- Maintaining canonical JSON exports with deterministic manifests (`export.json`) listing content hashes, counts, and `supersedes` chains.
- Producing Trivy DB-compatible artifacts (BoltDB database + metadata) packaged under `db/` with hash manifests.
- Surfacing mirror manifests that reference Mongo snapshot digests, enabling Offline Kit bundle verification.
Running the same export job twice against the same snapshot must yield byte-identical archives and manifest hashes.
---
## 2) Topology & processes
**Process shape:** single ASP.NET Core service `StellaOps.Concelier.WebService` hosting:
* **Scheduler** with distributed locks (Mongo backed).
* **Connectors** (fetch/parse/map) that emit immutable observation candidates.
* **Observation writer** enforcing AOC invariants via `AOCWriteGuard`.
* **Linkset builder** that correlates observations into `advisory_linksets` and annotates conflicts.
* **Event publisher** emitting `advisory.observation.updated` and `advisory.linkset.updated` messages.
* **Exporters** (JSON, Trivy DB, Offline Kit slices) fed from observation/linkset stores.
* **Minimal REST** for health/status/trigger/export and observation/linkset reads.
**Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter.
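A lease-based lock of the kind the `locks` collection (§7) supports could look like the sketch below. It uses an in-memory dictionary where the service would need an atomic Mongo update; the job key format and lease length are assumptions.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the `locks` collection: jobKey -> current holder and heartbeat.
var locks = new Dictionary<string, (string Holder, DateTimeOffset HeartbeatAt, TimeSpan Lease)>();

// Grants the lock when it is free, when the previous lease has lapsed (a dead replica),
// or when the caller already holds it (heartbeat renewal).
bool TryAcquire(string jobKey, string holder, DateTimeOffset now, TimeSpan lease)
{
    if (locks.TryGetValue(jobKey, out var current)
        && current.Holder != holder
        && now < current.HeartbeatAt + current.Lease)
        return false; // another replica holds a live lease
    locks[jobKey] = (holder, now, lease);
    return true;
}
```

Expiring leases rather than relying on explicit release means a crashed replica blocks a source only until its last heartbeat plus the lease length.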
---
## 3) Canonical domain model
> Stored in MongoDB (database `concelier`), serialized with a **canonical JSON** writer (stable order, camelCase, normalized timestamps).
### 3.1 Core entities
#### AdvisoryObservation
```jsonc
observationId        // deterministic id: {tenant}:{source.vendor}:{upstreamId}:{revision}
tenant               // issuing tenant (lower-case)
source{
  vendor, stream, api, collectorVersion
}
upstream{
  upstreamId, documentVersion, fetchedAt, receivedAt,
  contentHash, signature{present, format?, keyId?, signature?}
}
content{
  format, specVersion, raw, metadata?
}
identifiers{
  cve?, ghsa?, vendorIds[], aliases[]
}
linkset{
  purls[], cpes[], aliases[], references[{type,url}],
  reconciledFrom[]
}
createdAt            // when Concelier recorded the observation
attributes           // optional provenance metadata (batch ids, ingest cursor)
```
#### AdvisoryLinkset
```jsonc
linksetId            // sha256 over sorted (tenant, product/vuln tuple, observation ids)
tenant
key{
  vulnerabilityId,
  productKey,
  confidence         // low|medium|high
}
observations[] = [
  {
    observationId,
    sourceVendor,
    statement{
      status?, severity?, references?, notes?
    },
    collectedAt
  }
]
aliases{
  primary,
  others[]
}
purls[]
cpes[]
conflicts[]?         // see AdvisoryLinksetConflict
createdAt
updatedAt
```
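The `linksetId` comment above can be read as the following sketch. The field order, the newline separator, and the `sha256:` prefix are assumptions; the point is that sorting observation ids first makes the id independent of arrival order.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hash tenant, vulnerability, product and the sorted observation ids into a stable id.
static string LinksetId(string tenant, string vulnerabilityId, string productKey, string[] observationIds)
{
    var fields = new[] { tenant, vulnerabilityId, productKey }
        .Concat(observationIds.OrderBy(id => id, StringComparer.Ordinal));
    var bytes = Encoding.UTF8.GetBytes(string.Join("\n", fields));
    return "sha256:" + Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
}

Console.WriteLine(LinksetId("default", "CVE-2025-12345", "pkg:npm/lodash",
    new[] { "default:osv:GHSA-xxxx:1", "default:nvd:CVE-2025-12345:1" }));
```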
#### AdvisoryLinksetConflict
```jsonc
conflictId // deterministic hash
type // severity-mismatch | affected-range-divergence | reference-clash | alias-inconsistency | metadata-gap
field? // optional JSON pointer (e.g., /statement/severity/vector)
observations[] // per-source values contributing to the conflict
confidence // low|medium|high (heuristic weight)
detectedAt
```
#### ObservationEvent / LinksetEvent
```jsonc
eventId              // ULID
tenant
type                 // advisory.observation.updated | advisory.linkset.updated
key{
  observationId?     // on observation event
  linksetId?         // on linkset event
  vulnerabilityId?,
  productKey?
}
delta{
  added[], removed[], changed[]   // normalized summary for consumers
}
hash                 // canonical hash of serialized delta payload
occurredAt
```
#### ExportState
```jsonc
exportKind // json | trivydb
baseExportId? // last full baseline
baseDigest? // digest of last full baseline
lastFullDigest? // digest of last full export
lastDeltaDigest? // digest of last delta export
cursor // per-kind incremental cursor
files[] // last manifest snapshot (path → sha256)
```
Legacy `Advisory`, `Affected`, and merge-centric entities remain in the repository for historical exports and replay but are being phased out as Link-Not-Merge takes over. New code paths must interact with `AdvisoryObservation` / `AdvisoryLinkset` exclusively and emit conflicts through the structured payloads described above.
### 3.2 Product identity (`productKey`)
* **Primary:** `purl` (Package URL).
* **OS packages:** RPM (NEVRA → `pkg:rpm`), DEB (dpkg → `pkg:deb`), APK (apk → `pkg:apk`), with **EVR/NVRA** preserved.
* **Secondary:** `cpe` retained for compatibility; advisory records may carry both.
* **Image/platform:** `oci:<registry>/<repo>@<digest>` for image-level advisories (rare).
* **Unmappable:** if a source's product identifiers cannot be mapped deterministically, keep the native string under `productKey="native:<provider>:<id>"` and mark it **non-joinable**.
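A sketch of the fallback order for one affected entry. It assumes a CPE is used as the key only when no purl is available (the list above says only that CPEs are retained) and leaves the `oci:` form out for brevity; `ProductKey` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: picks the productKey for one affected entry and whether it may be joined.
static (string Key, bool Joinable) ProductKey(string provider, string nativeId, string? purl, string? cpe)
{
    if (!string.IsNullOrWhiteSpace(purl))
        return (purl, true);                          // primary: Package URL
    if (!string.IsNullOrWhiteSpace(cpe))
        return (cpe, true);                           // secondary: CPE (assumed joinable)
    return ($"native:{provider}:{nativeId}", false);  // unmappable: kept verbatim, never joined
}

Console.WriteLine(ProductKey("vendorx", "KB5031234", null, null)); // (native:vendorx:KB5031234, False)
```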
---
## 4) Source families & precedence
### 4.1 Families
* **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium…
* **Linux distros**: Red Hat, SUSE, Ubuntu, Debian, Alpine…
* **OSS ecosystems**: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go.
* **CERTs / national CSIRTs**: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERT-FR, CERT-Bund, etc.
### 4.2 Precedence (when claims conflict)
1. **Vendor PSIRT** (authoritative for their product).
2. **Distro** (authoritative for packages they ship, including backports).
3. **Ecosystem** (OSV/GHSA) for library semantics.
4. **CERTs/aggregators** for enrichment (KEV/known exploited).
> Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**.
---
## 5) Connectors & normalization
### 5.1 Connector contract
```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface IFeedConnector
{
    string SourceName { get; }
    Task FetchAsync(IServiceProvider sp, CancellationToken ct); // -> document collection
    Task ParseAsync(IServiceProvider sp, CancellationToken ct); // -> dto collection (validated)
    Task MapAsync(IServiceProvider sp, CancellationToken ct);   // -> advisory/alias/affected/reference
}
```
* **Fetch**: windowed (cursor), conditional GET (ETag/Last-Modified), retry/backoff, rate limiting.
* **Parse**: schema validation (JSON Schema, XSD/CSAF), content-type checks; write **DTO** with normalized casing.
* **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors).
### 5.2 Version range normalization
* **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges, canonicalizing constraint operators (`~`, `^`, `<`, `>=`) into explicit intervals.
* **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query.
* **DEB**: dpkg version comparison semantics mirrored; store computed keys.
* **APK**: Alpine version semantics; compute order keys.
* **Generic**: if provider uses text, retain raw; do **not** invent ranges.
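The `introduced`/`fixed` normalization for SemVer ecosystems amounts to a half-open interval check. The sketch below is deliberately simplified: it assumes exactly three numeric components and ignores pre-release and build metadata, which a real comparator (and the RPM/DEB/APK rules) must handle.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Simplified ordering: exactly three numeric components (MAJOR.MINOR.PATCH).
static int[] Parse(string version) =>
    version.Split('.').Select(part => int.Parse(part, CultureInfo.InvariantCulture)).ToArray();

static int Compare(string a, string b)
{
    int[] x = Parse(a), y = Parse(b);
    for (var i = 0; i < 3; i++)
        if (x[i] != y[i])
            return x[i].CompareTo(y[i]);
    return 0;
}

// Normalized range: half-open interval [introduced, fixed); a null bound is open.
static bool IsAffected(string version, string? introduced, string? fixedIn) =>
    (introduced is null || Compare(version, introduced) >= 0)
    && (fixedIn is null || Compare(version, fixedIn) < 0);

Console.WriteLine(IsAffected("4.17.20", "4.0.0", "4.17.21")); // True
```

Comparing components numerically is what stores the "computed order keys" in usable form: as plain strings, `1.10.0` would sort before `1.9.10`.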
### 5.3 Severity & CVSS
* Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity).
* If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable).
* **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date).
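A sketch of the default `max` policy. The rating bands are the qualitative scale defined by CVSS v3.x and v4.0 (None 0.0, Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0); the tie-break by source name is an added assumption that keeps the result deterministic.

```csharp
using System;
using System.Linq;

// Qualitative rating scale from the CVSS v3.x and v4.0 specifications.
static string Rating(double score) => score switch
{
    0.0 => "None",
    < 4.0 => "Low",
    < 7.0 => "Medium",
    < 9.0 => "High",
    _ => "Critical",
};

// "max" policy: keep the highest base score; ties resolve by source name for determinism.
static (double Score, string Severity, string Source) Effective((string Source, double BaseScore)[] scores)
{
    var top = scores
        .OrderByDescending(s => s.BaseScore)
        .ThenBy(s => s.Source, StringComparer.Ordinal)
        .First();
    return (top.BaseScore, Rating(top.BaseScore), top.Source);
}

Console.WriteLine(Effective(new[] { ("nvd", 7.5), ("redhat", 5.9) }));
```

Returning the winning source alongside the score keeps the effective value traceable to its provenance, while the other scores remain stored.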
---
## 6) Observation & linkset pipeline
> **Goal:** deterministically ingest raw documents into immutable observations, correlate them into evidence-rich linksets, and broadcast changes without precedence or mutation.
### 6.1 Observation flow
1. **Connector fetch/parse/map:** connectors download upstream payloads, validate signatures, and map them to DTOs (identifiers, references, raw payload, provenance).
2. **AOC guard:** `AOCWriteGuard` verifies forbidden keys, provenance completeness, tenant claims, timestamp normalization, and content-hash idempotency. Violations raise `ERR_AOC_00x` codes, which are mapped to structured logs and metrics.
3. **Append-only write:** observations are inserted into `advisory_observations`; duplicates by `(tenant, source.vendor, upstream.upstreamId, upstream.contentHash)` become no-ops, and new content for the same upstream id extends the supersedes chain.
4. **Change feed + event:** Mongo change streams trigger `advisory.observation.updated@1` events with deterministic payloads (IDs, hash, supersedes pointer, linkset summary). Policy Engine, the Offline Kit builder, and guard dashboards subscribe.
### 6.2 Linkset correlation
1. **Queue:** observation deltas enqueue correlation jobs keyed by `(tenant, vulnerabilityId, productKey)` candidates derived from identifiers and the alias graph.
2. **Canonical grouping:** the builder resolves aliases using Concelier's alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores.
3. **Linkset materialization:** `advisory_linksets` documents store sorted observation references, alias sets, product keys, range metadata, and conflict payloads. Writes are idempotent; unchanged hashes skip updates.
4. **Conflict detection:** the builder emits structured conflicts (`severity-mismatch`, `affected-range-divergence`, `reference-clash`, `alias-inconsistency`, `metadata-gap`). Conflicts carry per-observation values for explainability.
5. **Event emission:** `advisory.linkset.updated@1` summarizes deltas (`added`, `removed`, `changed` observation IDs, conflict updates, confidence changes) and includes a canonical hash for replay validation.
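Conflict detection for the `severity-mismatch` case can be sketched as follows. The statement tuple shape and the `id=value` rendering of contributing values are assumptions; the documented behaviour it mirrors is that a conflict keeps every per-observation value instead of picking a winner.

```csharp
using System;
using System.Linq;

// Statement shape (assumed): observation id, contributing vendor, severity or null if the source is silent.
static (string Type, string[] Values)? SeverityConflict(
    (string ObservationId, string Vendor, string? Severity)[] statements)
{
    var reported = statements
        .Where(s => s.Severity is not null)
        .OrderBy(s => s.ObservationId, StringComparer.Ordinal)
        .ToArray();
    if (reported.Select(s => s.Severity).Distinct().Count() < 2)
        return null; // agreement (or silence) is not a conflict
    // Keep every contributing value so consumers can explain the disagreement.
    return ("severity-mismatch", reported.Select(s => $"{s.ObservationId}={s.Severity}").ToArray());
}
```

Sorting by observation id before rendering keeps the conflict payload, and therefore the linkset hash, stable regardless of the order in which observations arrived.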
### 6.3 Event contract
| Event | Schema | Notes |
|-------|--------|-------|
| `advisory.observation.updated@1` | `events/advisory.observation.updated@1.json` | Fired on new or superseded observations. Includes `observationId`, source metadata, `linksetSummary` (aliases/purls), supersedes pointer (if any), SHA-256 hash, and `traceId`. |
| `advisory.linkset.updated@1` | `events/advisory.linkset.updated@1.json` | Fired when correlation changes. Includes `linksetId`, `key{vulnerabilityId, productKey, confidence}`, observation deltas, conflicts, `updatedAt`, and canonical hash. |
Events are emitted via NATS (primary) and Redis Streams (fallback). Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures both topics during bundle creation for air-gapped replay.
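Because every event carries a canonical hash, a consumer can acknowledge duplicates without reprocessing them. A minimal in-memory sketch (the `Handle` function and its bookkeeping are hypothetical):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical consumer-side de-duplication keyed by the event's canonical hash.
// A durable consumer would persist `seen` with its own state instead of keeping it in memory.
var seen = new HashSet<string>(StringComparer.Ordinal);
var applied = new List<string>();

bool Handle(string eventType, string hash)
{
    if (!seen.Add(hash))
        return false; // redelivered duplicate: acknowledge and skip
    applied.Add($"{eventType} {hash}");
    return true;
}
```

The same check covers an event delivered once over NATS and again over the fallback stream.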
---
## 7) Storage schema (MongoDB)
### Collections & indexes (LNM path)
* `concelier.sources` `{_id, type, baseUrl, enabled, notes}`: connector catalog.
* `concelier.source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}`: run-state (TTL indexes on `backoffUntil`).
* `concelier.documents` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}`: raw payload registry.
  * Indexes: `{sourceName:1, uri:1}` unique; `{fetchedAt:-1}` for recent fetches.
* `concelier.dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}`: normalized connector DTOs used for replay.
  * Index: `{sourceName:1, documentId:1}`.
* `concelier.advisory_observations`
```
{
  _id: "tenant:vendor:upstreamId:revision",
  tenant,
  source: { vendor, stream, api, collectorVersion },
  upstream: { upstreamId, documentVersion, fetchedAt, receivedAt, contentHash, signature },
  content: { format, specVersion, raw, metadata? },
  identifiers: { cve?, ghsa?, vendorIds[], aliases[] },
  linkset: { purls[], cpes[], aliases[], references[], reconciledFrom[] },
  supersedes?: "prevObservationId",
  createdAt,
  attributes?: object
}
```
* Indexes: `{tenant:1, upstream.upstreamId:1}`, `{tenant:1, source.vendor:1, linkset.purls:1}`, `{tenant:1, linkset.aliases:1}`, `{tenant:1, createdAt:-1}`.
* `concelier.advisory_linksets`
```
{
  _id: "sha256:...",
  tenant,
  key: { vulnerabilityId, productKey, confidence },
  observations: [
    { observationId, sourceVendor, statement, collectedAt }
  ],
  aliases: { primary, others: [] },
  purls: [],
  cpes: [],
  conflicts: [],
  createdAt,
  updatedAt
}
```
* Indexes: `{tenant:1, key.vulnerabilityId:1, key.productKey:1}`, `{tenant:1, purls:1}`, `{tenant:1, aliases.primary:1}`, `{tenant:1, updatedAt:-1}`.
* `concelier.advisory_events`
```
{
  _id: ObjectId,
  tenant,
  type: "advisory.observation.updated" | "advisory.linkset.updated",
  key,
  delta,
  hash,
  occurredAt
}
```
* TTL index on `occurredAt` (configurable retention), `{type:1, occurredAt:-1}` for replay.
* `concelier.export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}`
* `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks)
* `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}`
**Legacy collections** (`advisory`, `alias`, `affected`, `reference`, `merge_event`) remain read-only during the migration window to support back-compat exports. New code must not write to them; scheduled cleanup removes them after Link-Not-Merge GA.
**GridFS buckets**: `fs.documents` for raw payloads (immutable); `fs.exports` for historical JSON/Trivy archives.
---
## 8) Exporters
### 8.1 Deterministic JSON (vuln-list style)
* Folder structure mirroring `/<scheme>/<first-two>/<rest>/…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace.
* `manifest.json` lists all files with SHA-256 and a top-level **export digest**.
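One way to make `manifest.json` and the export digest reproducible is sketched below. The `<sha256>  <path>` line format (borrowed from `sha256sum` output) and hashing the concatenated lines are assumptions; what matters is that entries are sorted by ordinal path before hashing, so dictionary or filesystem enumeration order cannot leak into the digest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static string Sha256Hex(byte[] data) => Convert.ToHexString(SHA256.HashData(data)).ToLowerInvariant();

// Per-file entries sorted by ordinal path, plus a top-level digest over those entries.
static (string[] Entries, string ExportDigest) BuildManifest(IReadOnlyDictionary<string, byte[]> files)
{
    var entries = files
        .OrderBy(f => f.Key, StringComparer.Ordinal)
        .Select(f => $"{Sha256Hex(f.Value)}  {f.Key}")
        .ToArray();
    var joined = string.Concat(entries.Select(e => e + "\n"));
    return (entries, "sha256:" + Sha256Hex(Encoding.UTF8.GetBytes(joined)));
}
```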
### 8.2 Trivy DB exporter
* Builds BoltDB archives compatible with Trivy; supports **full** and **delta** modes.
* In delta, unchanged blobs are reused from the base; metadata captures:
```json
{
  "mode": "delta|full",
  "baseExportId": "...",
  "baseManifestDigest": "sha256:...",
  "changed": ["path1", "path2"],
  "removed": ["path3"]
}
```
* Optional ORAS push (OCI layout) for registries.
* Offline Kit bundles include Trivy DB + JSON tree + export manifest.
* Mirror-ready bundles: when `concelier.trivy.mirror` defines domains, the exporter emits `mirror/index.json` plus per-domain `manifest.json`, `metadata.json`, and `db.tar.gz` files with SHA-256 digests so Concelier mirrors can expose domain-scoped download endpoints.
* Concelier.WebService serves `/concelier/exports/index.json` and `/concelier/exports/mirror/{domain}/…` directly from the export tree with cache budgets (index: 60 s; bundles: 300 s, immutable) and per-domain rate limiting; the endpoints honour Stella Ops Authority or CIDR bypass lists depending on mirror topology.
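The `changed`/`removed` lists in the delta metadata can be derived by comparing the base manifest with the new one, as in this sketch. Treating newly added paths as `changed` is an assumption, since the metadata shape has no separate `added` list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compares two manifests (path -> sha256) and lists what a delta export must carry.
static (string[] Changed, string[] Removed) Diff(
    IReadOnlyDictionary<string, string> baseManifest,
    IReadOnlyDictionary<string, string> current)
{
    var changed = current
        .Where(f => !baseManifest.TryGetValue(f.Key, out var old) || old != f.Value)
        .Select(f => f.Key)
        .OrderBy(path => path, StringComparer.Ordinal)
        .ToArray();
    var removed = baseManifest.Keys
        .Where(path => !current.ContainsKey(path))
        .OrderBy(path => path, StringComparer.Ordinal)
        .ToArray();
    return (changed, removed); // every other path is reused from the base export
}
```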
### 8.3 Handoff to Signer/Attestor (optional)
* On export completion, if `attest: true` is set in job args, Concelier **posts** the artifact metadata to **Signer**/**Attestor**; Concelier itself **does not** hold signing keys.
* The export record stores the `{ uuid, index, url }` returned by **Rekor v2**.
---
## 9) REST APIs
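All endpoints are under `/api/v1/concelier`, except the mirror endpoints (`/concelier/exports/…`), which Concelier.WebService serves directly from the export tree.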
**Health & status**
```
GET /healthz | /readyz
GET /status → sources, last runs, export cursors
```
**Sources & jobs**
```
GET /sources → list of configured sources
POST /sources/{name}/trigger → { jobId }
POST /sources/{name}/pause | /resume → toggle
GET /jobs/{id} → job status
```
**Exports**
```
POST /exports/json { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? }
POST /exports/trivy { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? }
GET /exports/{id} → export metadata (kind, digest, createdAt, rekor?)
GET /concelier/exports/index.json → mirror index describing available domains/bundles
GET /concelier/exports/mirror/{domain}/manifest.json
GET /concelier/exports/mirror/{domain}/bundle.json
GET /concelier/exports/mirror/{domain}/bundle.json.jws
```
**Search (operator debugging)**
```
GET /advisories/{key}
GET /advisories?scheme=CVE&value=CVE-2025-12345
GET /affected?productKey=pkg:rpm/openssl&limit=100
```
**AuthN/Z:** Authority tokens (OpTok) with roles: `concelier.read`, `concelier.admin`, `concelier.export`.
---
## 10) Configuration (YAML)
```yaml
concelier:
  mongo: { uri: "mongodb://mongo/concelier" }
  s3:
    endpoint: "http://minio:9000"
    bucket: "stellaops-concelier"
  scheduler:
    windowSeconds: 30
    maxParallelSources: 4
  sources:
    - name: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signature: { type: pgp, keys: [ "…redhat PGP…" ] }
      enabled: true
      windowDays: 7
    - name: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signature: { type: pgp, keys: [ "…suse PGP…" ] }
    - name: ubuntu
      kind: usn-json
      baseUrl: https://ubuntu.com/security/notices.json
      signature: { type: none }
    - name: osv
      kind: osv
      baseUrl: https://api.osv.dev/v1/
      signature: { type: none }
    - name: ghsa
      kind: ghsa
      baseUrl: https://api.github.com/graphql
      auth: { tokenRef: "env:GITHUB_TOKEN" }
  exporters:
    json:
      enabled: true
      output: s3://stellaops-concelier/json/
    trivy:
      enabled: true
      mode: full
      output: s3://stellaops-concelier/trivy/
      oras:
        enabled: false
        repo: ghcr.io/org/concelier
  precedence:
    vendorWinsOverDistro: true
    distroWinsOverOsv: true
  severity:
    policy: max # or 'vendorPreferred' / 'distroPreferred'
```
---
## 11) Security & compliance
* **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible.
* **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Documents failing verification may still be ingested but are flagged; Policy Engine or downstream policy can down-weight them.
* **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers.
* **Multi-tenant**: per-tenant DBs or prefixes; per-tenant S3 prefixes; tenant-scoped API tokens.
* **Determinism**: canonical JSON writer; export digests stable across runs given same inputs.
---
## 12) Performance targets & scale
* **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
* **Normalize/map**: ≥ 50k observation statements/min on 4 cores.
* **Observation write**: ≤ 5ms P95 per document (including guard + Mongo write).
* **Linkset build**: ≤ 15ms P95 per `(vulnerabilityId, productKey)` update, even with 20+ contributing observations.
* **Export**: 1M advisories JSON in ≤ 90s (streamed, zstd), Trivy DB in ≤ 60s on 8 cores.
* **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.
**Scale pattern**: add Concelier replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs.
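The memory target (chunked streaming writers with backpressure) can be illustrated with `System.Threading.Channels`. In this sketch the queue capacity and chunk size are arbitrary assumptions: a bounded channel in `Wait` mode makes producers wait once the queue is full instead of growing an unbounded buffer.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Bounded queue between the mapper and the Mongo writer. In Wait mode a full queue
// makes WriteAsync wait (and TryWrite return false) instead of buffering without limit.
static Channel<string> CreateWriteQueue(int capacity) =>
    Channel.CreateBounded<string>(new BoundedChannelOptions(capacity)
    {
        FullMode = BoundedChannelFullMode.Wait,
        SingleReader = true,
    });

// Drains the queue in fixed-size chunks, the way a chunked streaming writer would batch inserts.
static async Task<int> DrainAsync(ChannelReader<string> reader, int chunkSize)
{
    var batches = 0;
    var batch = new List<string>(chunkSize);
    await foreach (var item in reader.ReadAllAsync())
    {
        batch.Add(item);
        if (batch.Count == chunkSize)
        {
            batches++; // a real writer would flush `batch` to storage here
            batch.Clear();
        }
    }
    if (batch.Count > 0)
        batches++; // final partial chunk
    return batches;
}
```

`TryWrite` returning `false` on a full queue is the non-blocking view of the same backpressure that `WriteAsync` applies by waiting, so memory stays bounded by the queue capacity times the document size.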
---
## 13) Observability
* **Metrics**
* `concelier.fetch.docs_total{source}`
* `concelier.fetch.bytes_total{source}`
* `concelier.parse.failures_total{source}`
* `concelier.map.statements_total{source}`
* `concelier.observations.write_total{result=ok|noop|error}`
* `concelier.linksets.updated_total{result=ok|skip|error}`
* `concelier.linksets.conflicts_total{type}`
* `concelier.export.bytes{kind}`
* `concelier.export.duration_seconds{kind}`
* **Tracing** around fetch/parse/map/observe/linkset/export.
* **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`.
---
## 14) Testing matrix
* **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail).
* **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, prereleases).
* **Linkset correlation:** multi-source conflicts (severity, range, alias) produce deterministic conflict payloads; ensure confidence scoring stable.
* **Export determinism:** byte-for-byte stable outputs across runs; digest equality.
* **Performance:** soak tests with 1M advisories; cap memory; verify backpressure.
* **API:** pagination, filters, RBAC, error envelopes (RFC 7807).
* **Offline Kit:** bundle build & import correctness.
---
## 15) Failure modes & recovery
* **Source outages:** scheduler backs off with exponential delay; `source_state.backoffUntil`; alerts on staleness.
* **Schema drifts:** parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges.
* **Partial exports:** exporters write to temp prefix; **manifest commit** is atomic; only then move to final prefix and update `export_state`.
* **Resume:** all stages idempotent; `source_state.cursor` supports window resume.
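The source-outage backoff can be sketched as a capped exponential delay that produces the next `source_state.backoffUntil`. The base delay, the cap, and resetting to "no backoff" when there are no consecutive failures are assumptions.

```csharp
using System;

// backoffUntil = now + min(maxDelay, baseDelay * 2^(failures - 1)); zero failures means no backoff.
static DateTimeOffset BackoffUntil(DateTimeOffset now, int consecutiveFailures, TimeSpan baseDelay, TimeSpan maxDelay)
{
    if (consecutiveFailures <= 0)
        return now;
    // Bound the shift so the multiplication stays in range for minute-scale base delays.
    var exponent = Math.Min(consecutiveFailures - 1, 30);
    var delay = TimeSpan.FromTicks(Math.Min(maxDelay.Ticks, baseDelay.Ticks * (1L << exponent)));
    return now + delay;
}
```

Capping the delay keeps a long outage from pushing the next attempt out indefinitely, which would otherwise mask recovery from the staleness alerts.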
---
## 16) Operator runbook (quick)
* **Trigger all sources:** `POST /api/v1/concelier/sources/*/trigger`
* **Force full export JSON:** `POST /api/v1/concelier/exports/json { "full": true, "force": true }`
* **Force Trivy DB delta publish:** `POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }`
* **Inspect observation:** `GET /api/v1/concelier/observations/{observationId}`
* **Query linkset:** `GET /api/v1/concelier/linksets?vulnerabilityId=CVE-2025-12345&productKey=pkg:rpm/redhat/openssl`
* **Pause noisy source:** `POST /api/v1/concelier/sources/osv/pause`
---
## 17) Rollout plan
1. **MVP**: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export.
2. **Add**: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export.
3. **Attestation handoff**: integrate with **Signer/Attestor** (optional).
4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse.
5. **Offline Kit**: end-to-end verified bundles for air-gapped deployments.


@@ -0,0 +1,67 @@
# Implementation plan — Concelier
## Delivery timeline
- **Phase 1 — Guardrails & schema**
Stand up Mongo JSON validators for `advisory_raw` and `vex_raw`, wire the `AOCWriteGuard` repository interceptor, and seed deterministic linkset builders. Freeze legacy normalisation paths and migrate callers to the new raw schema.
- **Phase 2 — API & observability**
Publish ingestion and verification endpoints (`POST /ingest/*`, `GET /advisories/raw`, `POST /aoc/verify`) with Authority scopes, expose telemetry (`aoc_violation_total`, guard spans, structured logs), and ensure Offline Kit packaging captures validator deployment steps.
- **Phase 3 — Experience polish**
Ship CLI/Console affordances (`stella sources ingest --dry-run`, dashboard tiles, violation drill-downs), finish Export Center hand-off metadata, and close out CI enforcement (`stella aoc verify` preflight, AST lint, seeded fixtures).
## Work breakdown by component
- **Concelier WebService & worker**
- Add Mongo validators and unique indexes over `(tenant, source.vendor, upstream.upstream_id, upstream.content_hash)`.
- Implement write interceptors rejecting forbidden fields, missing provenance, or merge attempts.
- Deterministically compute linksets and persist canonical JSON payloads.
- Introduce `/ingest/advisory`, `/advisories/raw*`, and `/aoc/verify` surfaces guarded by `advisory:*` and `aoc:verify` scopes.
- Emit guard metrics/traces and surface supersedes/violation audit logs.
- **Excititor (shared ingestion contract)**
- Mirror Concelier guard and schema changes for `vex_raw`.
- Maintain restart-time plug-in determinism and linkset extraction parity.
- **Shared libraries**
- Publish `StellaOps.Ingestion.AOC` (forbidden key catalog, guard middleware, provenance helpers, signature verification).
- Share error codes (`ERR_AOC_00x`) and deterministic hashing utilities.
- **Policy Engine integration**
- Enforce `effective_finding_*` write exclusivity.
- Consume only raw documents + linksets, removing any implicit normalisation.
- **Authority scopes**
- Provision `advisory:ingest|read`, `vex:ingest|read`, `aoc:verify`; propagate tenant claims to ingestion services.
- **CLI & Console**
- Implement `stella sources ingest --dry-run` and `stella aoc verify` (with exit codes mapped to `ERR_AOC_00x`).
- Surface AOC dashboards, violation drill-down, and verification shortcuts in the Console.
- **CI/CD**
- Add Roslyn analyzer / AST lint to block forbidden writes.
- Seed fixtures and run `stella aoc verify` against snapshots in pipeline gating.
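The write-guard responsibilities above amount to a pre-write check plus an idempotency key that matches the unique index. The sketch below is a minimal Python illustration, not the shipped `StellaOps.Ingestion.AOC` guard. The forbidden-key subset, the violation labels, and the function names are hypothetical and do not correspond to the real catalog or to specific `ERR_AOC_00x` codes.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical subset of forbidden keys; the authoritative catalog ships in StellaOps.Ingestion.AOC.
FORBIDDEN_TOP_LEVEL_KEYS = frozenset({"severity", "cvss", "effective_status", "merged_from"})
REQUIRED_PROVENANCE = ("tenant", "source", "upstream")


def check_aoc_document(doc: Mapping[str, Any]) -> list[str]:
    """Return violation labels; an empty list means the write may proceed."""
    violations = []
    for key in sorted(doc):  # sorted so the violation report is deterministic
        if key in FORBIDDEN_TOP_LEVEL_KEYS:
            violations.append(f"forbidden_field:{key}")
    for field in REQUIRED_PROVENANCE:
        if not doc.get(field):
            violations.append(f"missing_provenance:{field}")
    upstream = doc.get("upstream") or {}
    for field in ("upstream_id", "content_hash"):
        if not upstream.get(field):
            violations.append(f"missing_provenance:upstream.{field}")
    return violations


def idempotency_key(doc: Mapping[str, Any]) -> tuple:
    """Mirror of the unique index (tenant, source.vendor, upstream.upstream_id, upstream.content_hash)."""
    return (
        doc["tenant"],
        doc["source"]["vendor"],
        doc["upstream"]["upstream_id"],
        doc["upstream"]["content_hash"],
    )
```

Because the idempotency key is built from the same fields as the unique index, a replayed upstream document maps to the same key and can be rejected as a duplicate rather than re-inserted.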
## Documentation deliverables
- Update `docs/ingestion/aggregation-only-contract.md` with guard invariants, schemas, error codes, and migration guidance.
- Refresh `docs/modules/concelier/operations/*.md` (mirror, conflict-resolution, authority audit) with validator rollouts and observability dashboards.
- Cross-link Authority scope definitions, CLI reference, Console sources guide, and observability runbooks to the AOC guard changes.
- Ensure Offline Kit documentation captures validator bootstrap and verify workflows.
## Acceptance criteria
- Mongo validators and runtime guards reject forbidden fields and missing provenance with the documented `ERR_AOC_00x` codes.
- Linksets and supersedes chains are deterministic; rerunning ingestion over identical payloads yields byte-identical documents.
- CLI `stella aoc verify` exits non-zero on seeded violations and zero on clean datasets; Console dashboards show real-time guard status.
- Export Center consumes advisory datasets without relying on legacy normalised fields.
- CI fails if lint rules detect forbidden writes or if seeded guard tests regress.
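Byte-identical reruns depend on a canonical serialisation that is hashed only after key ordering and whitespace have been fixed. The following Python sketch shows that idea. It is a simplified canonical form (sorted keys, compact separators, UTF-8), not RFC 8785 JCS, and it does not normalise number formatting. The `sha256:` prefix is an assumed convention.

```python
import hashlib
import json
from typing import Any


def canonical_json(payload: Any) -> bytes:
    """Serialise with sorted keys and no insignificant whitespace so equal inputs give equal bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def content_hash(payload: Any) -> str:
    """Hash the canonical bytes; the 'sha256:' prefix is an assumed convention."""
    return "sha256:" + hashlib.sha256(canonical_json(payload)).hexdigest()
```

Two payloads that differ only in key order hash identically. Array order is preserved, though, so linkset builders still need to sort their lists before hashing.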
## Risks & mitigations
- **Collector drift introduces new forbidden keys.** Mitigated by guard middleware + CI lint + schema validation; RFC required for linkset changes.
- **Migration complexity from legacy normalisation.** Staged cutover with `_backup_*` copies and temporary views to keep Policy Engine parity.
- **Performance overhead during ingest.** Guard remains O(number of keys); index review ensures insert latency stays within warm (<5s) / cold (<30s) targets.
- **Tenancy leakage.** `tenant` required in schema, Authority-supplied claims enforced per request, observability alerts fire on missing tenant identifiers.
## Test strategy
- **Unit**: guard rejection paths, provenance enforcement, idempotent insertions, linkset determinism.
- **Property**: fuzz upstream payloads to guarantee no forbidden fields emerge.
- **Integration**: batch ingest (50k advisories, mixed VEX fixtures), verifying zero guard violations and consistent supersedes.
- **Contract**: Policy Engine consumers verify raw-only reads; Export Center consumes canonical datasets.
- **End-to-end**: ingest/verify flow with CLI + Console actions to confirm observability and guard reporting.
## Definition of done
- Validators deployed and verified in staging/offline environments.
- Runtime guards, CLI/Console workflows, and CI linting all active.
- Observability dashboards and runbooks updated; metrics visible.
- Documentation updates merged; Offline Kit instructions published.
- ./TASKS.md reflects status transitions; cross-module dependencies acknowledged in ../../TASKS.md.

# Concelier Authority Audit Runbook
_Last updated: 2025-10-22_
This runbook helps operators verify and monitor the StellaOps Concelier ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.
## 1. Prerequisites
- Authority integration is enabled in `concelier.yaml` (or via `CONCELIER_AUTHORITY__*` environment variables) with a valid `clientId`, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (`concelier.telemetry.*`) or container stdout is shipped to your SIEM.
- Operators have access to the Concelier job trigger endpoints via CLI or REST for smoke tests.
- The rollout table in `docs/10_CONCELIER_CLI_QUICKSTART.md` has been reviewed so stakeholders align on the staged → enforced toggle timeline.
### Configuration snippet
```yaml
concelier:
authority:
enabled: true
allowAnonymousFallback: false # keep true only during initial rollout
issuer: "https://authority.internal"
audiences:
- "api://concelier"
requiredScopes:
- "concelier.jobs.trigger"
- "advisory:read"
- "advisory:ingest"
requiredTenants:
- "tenant-default"
bypassNetworks:
- "127.0.0.1/32"
- "::1/128"
clientId: "concelier-jobs"
clientSecretFile: "/run/secrets/concelier_authority_client"
tokenClockSkewSeconds: 60
resilience:
enableRetries: true
retryDelays:
- "00:00:01"
- "00:00:02"
- "00:00:05"
allowOfflineCacheFallback: true
offlineCacheTolerance: "00:10:00"
```
> Store secrets outside source control. Concelier reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service.
### Resilience tuning
- **Connected sites:** keep the default 1s / 2s / 5s retry ladder so Concelier retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Pathfinder restarts.
- **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (15 to 30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Concelier will fail fast but keep deterministic logs.
- Concelier resolves these knobs through `IOptionsMonitor<StellaOpsAuthClientOptions>`. Edits to `concelier.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.
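The following Python sketch shows how a retry ladder and an offline cache tolerance can interact: try the fetch, retry on the configured delays, and fall back to cached discovery/JWKS metadata only if it is younger than the tolerance. This is an illustration under assumptions, not Concelier's actual code path. The function and class names are hypothetical, delays are in seconds (1 / 2 / 5 mirror the `00:00:01` / `00:00:02` / `00:00:05` ladder), and transient failures are assumed to surface as `ConnectionError`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class CachedMetadata:
    document: dict
    fetched_at: float  # epoch seconds when the metadata was last fetched


def fetch_with_resilience(
    fetch: Callable[[], dict],
    retry_delays: Sequence[float],
    cache: Optional[CachedMetadata],
    offline_cache_tolerance: float,
    enable_retries: bool = True,
    allow_offline_cache_fallback: bool = True,
    now: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Hypothetical helper: retry on the ladder, then fall back to fresh-enough cached metadata."""
    delays = list(retry_delays) if enable_retries else []
    last_error: Optional[Exception] = None
    for attempt in range(len(delays) + 1):
        try:
            return fetch()
        except ConnectionError as exc:  # assumed shape of a transient Authority failure
            last_error = exc
            if attempt < len(delays):
                sleep(delays[attempt])
    if (
        allow_offline_cache_fallback
        and cache is not None
        and now() - cache.fetched_at <= offline_cache_tolerance
    ):
        return cache.document
    raise RuntimeError("Authority metadata unavailable and cache expired") from last_error
```

Setting `enable_retries=False` makes the first failure go straight to the cache check. This corresponds to the fail-fast behaviour described for air-gapped installs.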
## 2. Key Signals
### 2.1 Audit log channel
Concelier emits structured audit entries via the `Concelier.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active.
```
Concelier authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=concelier-cli scopes=concelier.jobs.trigger advisory:ingest bypass=False remote=10.1.4.7
```
| Field | Sample value | Meaning |
|--------------|-------------------------|------------------------------------------------------------------------------------------|
| `route` | `/jobs/definitions` | Endpoint that processed the request. |
| `status` | `200` / `401` / `409` | Final HTTP status code returned to the caller. |
| `subject` | `ops@example.com` | User or service principal subject (falls back to `(anonymous)` when unauthenticated). |
| `clientId` | `concelier-cli` | OAuth client ID provided by Authority (`(none)` if the token lacked the claim). |
| `scopes` | `concelier.jobs.trigger advisory:ingest advisory:read` | Normalised scope list extracted from token claims; `(none)` if the token carried none. |
| `tenant` | `tenant-default` | Tenant claim extracted from the Authority token (`(none)` when the token lacked it). |
| `bypass` | `True` / `False` | Indicates whether the request succeeded because its source IP matched a bypass CIDR. |
| `remote` | `10.1.4.7` | Remote IP recorded from the connection / forwarded header test hooks. |
Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:
- `status=401 AND bypass=True`: a bypass network accepted an unauthenticated call (should be temporary during rollout).
- `status=202 AND scopes="(none)"`: a token without scopes triggered a job; tighten client configuration.
- `status=202 AND NOT contains(scopes,"advisory:ingest")`: ingestion attempted without the new AOC scopes; confirm the Authority client registration matches the sample above.
- `tenant!=(tenant-default)`: indicates a cross-tenant token was accepted. Ensure Concelier `requiredTenants` is aligned with Authority client registration.
- Spike in `clientId="(none)"`: indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated.
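If your log backend cannot index these key=value pairs directly, a small parser can apply the same filters offline. The Python sketch below assumes the line format shown in the sample entry above rather than reading `JobAuthorizationAuditFilter.cs`. It also assumes that no value contains a ` key=` sequence, which holds for the sample. The flag names are invented for illustration.

```python
import re

# A key=value field runs until the next " key=" or end of line, so multi-word scopes stay intact.
_FIELD = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_audit_line(line: str) -> dict:
    """Extract key=value fields from a Concelier.Authorization.Audit message."""
    return {m.group(1): m.group(2) for m in _FIELD.finditer(line.strip())}


def audit_flags(fields: dict, expected_tenant: str = "tenant-default") -> list:
    """Apply the suspicious-combination filters listed above; flag names are hypothetical."""
    flags = []
    status = fields.get("status", "")
    scopes = fields.get("scopes", "(none)").split()
    if status == "401" and fields.get("bypass") == "True":
        flags.append("bypass-unauthenticated")
    if status == "202" and scopes == ["(none)"]:
        flags.append("trigger-without-scopes")
    if status == "202" and "advisory:ingest" not in scopes:
        flags.append("trigger-without-ingest-scope")
    tenant = fields.get("tenant")
    if tenant is not None and tenant != expected_tenant:
        flags.append("unexpected-tenant")
    if fields.get("clientId") == "(none)":
        flags.append("missing-client-id")
    return flags
```

The sample line parses cleanly and raises no flags. A `status=202` entry without scopes raises both scope-related flags, matching the second and third filters.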
### 2.2 Metrics
Concelier publishes counters under the OTEL meter `StellaOps.Concelier.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`.
| Metric name | Description | PromQL example |
|-------------------------------|----------------------------------------------------|----------------|
| `web.jobs.triggered` | Accepted job trigger requests. | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` |
| `web.jobs.trigger.conflict` | Rejected triggers (already running, disabled…). | `sum(rate(web_jobs_trigger_conflict_total[5m]))` |
| `web.jobs.trigger.failed` | Server-side job failures. | `sum(rate(web_jobs_trigger_failed_total[5m]))` |
> Prometheus/OTEL collectors typically surface counters with a `_total` suffix. Adjust queries to match your pipeline's generated metric names.
Correlate audit logs with the following global meter exported via `Concelier.SourceDiagnostics`:
- `concelier.source.http.requests_total{concelier_source="jobs-run"}`: confirms that REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Concelier Jobs” board with the above counters plus a table of recent audit log entries.
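The PromQL examples assume that the exporter rewrites dotted OTEL instrument names into Prometheus-safe names and appends `_total` to counters. The Python sketch below approximates that common translation so you can predict series names. It is an approximation only: the exact mapping depends on the collector/exporter and its configuration, and some exporters also add unit suffixes. Check the series your pipeline actually emits.

```python
import re


def prometheus_metric_name(otel_name: str) -> str:
    """Replace characters not allowed in Prometheus metric names with underscores."""
    return re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)


def prometheus_counter_name(otel_name: str) -> str:
    """Approximate OTEL->Prometheus mapping for a monotonic counter (adds _total once)."""
    name = prometheus_metric_name(otel_name)
    return name if name.endswith("_total") else name + "_total"


def prometheus_label_name(otel_attribute: str) -> str:
    """Label names allow only letters, digits and underscores."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", otel_attribute)
```

Under this mapping, `web.jobs.trigger.conflict` becomes `web_jobs_trigger_conflict_total`, and the `job.kind` tag becomes the `job_kind` label used in the `sum by (job_kind)` example.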
## 3. Alerting Guidance
1. **Unauthorized bypass attempt**
- Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`
- Action: verify `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious.
2. **Missing scopes**
- Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`
- Action: audit Authority client registration; ensure `requiredScopes` includes `concelier.jobs.trigger`, `advisory:ingest`, and `advisory:read`.
3. **Trigger failure surge**
- Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0` with severity `warning` if sustained for 10 minutes.
- Action: inspect correlated audit entries and `Concelier.Telemetry` traces for job execution errors.
4. **Conflict spike**
- Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).
- Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.
5. **Authority offline**
- Watch `Concelier.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback.
## 4. Rollout & Verification Procedure
1. **Pre-checks**
- Align with the rollout phases documented in `docs/10_CONCELIER_CLI_QUICKSTART.md` (validation → rehearsal → enforced) and record the target dates in your change request.
- Confirm `allowAnonymousFallback` is `false` in production; keep `true` only during staged validation.
- Validate Authority issuer metadata is reachable from Concelier (`curl https://authority.internal/.well-known/openid-configuration` from the host).
2. **Smoke test with valid token**
- Obtain a token via CLI: `stella auth login --scope "concelier.jobs.trigger advisory:ingest" --scope advisory:read`.
- Trigger a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://concelier.internal/jobs/definitions`.
- Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=concelier.jobs.trigger advisory:ingest advisory:read`, and `tenant=tenant-default`.
3. **Negative test without token**
- Call the same endpoint without a token. Expect HTTP 401, `bypass=False`.
- If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled.
4. **Bypass check (if applicable)**
- From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review business justification and expiry date for such entries.
5. **Metrics validation**
- Ensure `web.jobs.triggered` counter increments during accepted runs.
- Exporters should show corresponding spans (`concelier.job.trigger`) if tracing is enabled.
## 5. Troubleshooting
| Symptom | Probable cause | Remediation |
|---------|----------------|-------------|
| Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. |
| Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Concelier. |
| HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`concelier.jobs.trigger`) and ensure the token audience matches `audiences` config. |
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `concelier.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Concelier.WebService.Jobs` meter. |
| Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Concelier job logs, re-run with tracing enabled, validate Authority latency. |
## 6. References
- `docs/21_INSTALL_GUIDE.md`: Authority configuration quick start.
- `docs/17_SECURITY_HARDENING_GUIDE.md`: security guardrails and enforcement deadlines.
- `docs/modules/authority/operations/monitoring.md`: Authority-side monitoring and alerting playbook.
- `StellaOps.Concelier.WebService/Filters/JobAuthorizationAuditFilter.cs`: source of audit log fields.

# Concelier Conflict Resolution Runbook (Sprint 3)
This runbook equips Concelier operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine landed (`AdvisoryPrecedenceMerger`, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` and the metrics/logging instrumentation delivered this sprint.
---
## 1. Precedence Model (recap)
- **Default ranking:** `GHSA -> NVD -> OSV`, with distro/vendor PSIRTs outranking ecosystem feeds (`AdvisoryPrecedenceDefaults`). Use `concelier:merge:precedence:ranks` to override per source when incident response requires it.
- **Freshness override:** if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps `provenance[].decisionReason = freshness`.
- **Tie-breakers:** when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set `decisionReason = tie-breaker`.
- **Audit trail:** each merged advisory receives a `merge` provenance entry listing the participating sources plus a `merge_event` record with canonical before/after SHA-256 hashes.
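To make the recap concrete, the Python sketch below folds the precedence, freshness, and tie-breaker rules into a per-field decision function. It is not `AdvisoryPrecedenceMerger`, and several details are assumptions. Lower rank numbers win. The field keys are illustrative. "Normalized text" means lowercased with collapsed whitespace. Tie-breaker (1), primary source order, is omitted because its exact definition is not spelled out here.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed encoding of the default GHSA -> NVD -> OSV ranking: lower number wins.
DEFAULT_RANKS = {"ghsa": 0, "nvd": 1, "osv": 2}
# Illustrative keys for the freshness-sensitive fields listed above.
FRESHNESS_FIELDS = {"title", "summary", "affected_ranges", "references", "credits"}
FRESHNESS_WINDOW = timedelta(hours=48)


@dataclass(frozen=True)
class Candidate:
    source: str
    value: str
    modified: datetime


def _normalize(text: str) -> str:
    return " ".join(text.split()).lower()


def _stable_hash(text: str) -> str:
    return hashlib.sha256(_normalize(text).encode("utf-8")).hexdigest()


def resolve_field(field: str, candidates: list, ranks: dict = DEFAULT_RANKS) -> tuple:
    """Return (winning Candidate, decisionReason) for one field."""

    def rank(c: Candidate) -> int:
        # Unknown sources sort after every configured source.
        return ranks.get(c.source, len(ranks))

    # Precedence first, then shortest normalized text, then lowest stable hash.
    ordered = sorted(candidates, key=lambda c: (rank(c), len(_normalize(c.value)), _stable_hash(c.value)))
    winner = ordered[0]
    tied = len(ordered) > 1 and rank(ordered[0]) == rank(ordered[1])
    reason = "tie-breaker" if tied else "precedence"

    if field in FRESHNESS_FIELDS:
        # A strictly lower-ranked source wins if it is at least 48 hours newer.
        fresher = [
            c for c in ordered[1:]
            if rank(c) > rank(winner) and c.modified - winner.modified >= FRESHNESS_WINDOW
        ]
        if fresher:
            return max(fresher, key=lambda c: c.modified), "freshness"
    return winner, reason
```

The returned reason mirrors the `provenance[].decisionReason` values (`precedence`, `freshness`, `tie-breaker`) that the triage steps inspect.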
---
## 2. Telemetry Shipped This Sprint
| Instrument | Type | Key Tags | Purpose |
|------------|------|----------|---------|
| `concelier.merge.operations` | Counter | `inputs` | Total precedence merges executed. |
| `concelier.merge.overrides` | Counter | `primary_source`, `suppressed_source`, `primary_rank`, `suppressed_rank` | Field-level overrides chosen by precedence. |
| `concelier.merge.range_overrides` | Counter | `advisory_key`, `package_type`, `primary_source`, `suppressed_source`, `primary_range_count`, `suppressed_range_count` | Package range overrides emitted by `AffectedPackagePrecedenceResolver`. |
| `concelier.merge.conflicts` | Counter | `type` (`severity`, `precedence_tie`), `reason` (`mismatch`, `primary_missing`, `equal_rank`) | Conflicts requiring operator review. |
| `concelier.merge.identity_conflicts` | Counter | `scheme`, `alias_value`, `advisory_count` | Alias collisions surfaced by the identity graph. |
### Structured logs
- `AdvisoryOverride` (EventId 1000) - logs merge suppressions with alias/provenance counts.
- `PackageRangeOverride` (EventId 1001) - logs package-level precedence decisions.
- `PrecedenceConflict` (EventId 1002) - logs mismatched severity or equal-rank scenarios.
- `Alias collision ...` (no EventId) - emitted when `concelier.merge.identity_conflicts` increments.
Expect all logs at `Information`. Ensure OTEL exporters include the scope `StellaOps.Concelier.Merge`.
---
## 3. Detection & Alerting
1. **Dashboard panels**
- `concelier.merge.conflicts` - table grouped by `type/reason`. Alert when > 0 in a 15-minute window.
- `concelier.merge.range_overrides` - stacked bar by `package_type`. Spikes highlight vendor PSIRT overrides over registry data.
- `concelier.merge.overrides` with `primary_source|suppressed_source` - catches unexpected precedence flips (e.g., OSV overtaking GHSA).
- `concelier.merge.identity_conflicts` - single-stat; alert when alias collisions occur more than once per day.
2. **Log based alerts**
- `eventId=1002` with `reason="equal_rank"` - indicates precedence table gaps; page merge owners.
- `eventId=1002` with `reason="mismatch"` - severity disagreement; open connector bug if sustained.
3. **Job health**
- `stellaops-cli db merge` exit code `1` signifies unresolved conflicts. Pipe to automation that captures logs and notifies #concelier-ops.
### Threshold updates (2025-10-12)
- `concelier.merge.conflicts`: page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging.
- `concelier.merge.overrides`: raise a warning when the 30-minute sum exceeds 10 (canonical triple yields exactly 1 summary override with `primary_source=osv`, `suppressed_source=ghsa`).
- `concelier.merge.range_overrides`: maintain the 15-minute alert at ≥ 3 but annotate dashboards that the regression triple emits a single `package_type=semver` override so ops can spot unexpected spikes.
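In production, these thresholds belong in your alerting stack's rules. The Python sketch below only makes the windowing semantics explicit: it fires when at least N events land inside a trailing window. The class name is hypothetical, events are assumed to arrive in chronological order, and "exceeds 10" is read as a threshold of 11.

```python
from collections import deque
from datetime import datetime, timedelta


class WindowRule:
    """Fire when at least `threshold` events fall inside the trailing `window`."""

    def __init__(self, threshold: int, window: timedelta) -> None:
        self.threshold = threshold
        self.window = window
        self._events: deque = deque()

    def record(self, at: datetime) -> bool:
        """Record one event (chronological order assumed) and report whether the rule fires."""
        self._events.append(at)
        # Drop events that have aged out of the trailing window.
        while self._events and at - self._events[0] > self.window:
            self._events.popleft()
        return len(self._events) >= self.threshold


conflict_page = WindowRule(threshold=2, window=timedelta(minutes=30))
override_warning = WindowRule(threshold=11, window=timedelta(minutes=30))
range_override_alert = WindowRule(threshold=3, window=timedelta(minutes=15))
```

A single conflict never pages. Two conflicts 45 minutes apart also do not page, but a second conflict within 30 minutes of the first does.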
---
## 4. Triage Workflow
1. **Confirm job context**
- `stellaops-cli db merge` (CLI) or `POST /jobs/merge:reconcile` (API) to rehydrate the merge job. Use `--verbose` to stream structured logs during triage.
2. **Inspect metrics**
- Correlate spikes in `concelier.merge.conflicts` with `primary_source`/`suppressed_source` tags from `concelier.merge.overrides`.
3. **Pull structured logs**
- Example (vector output):
```
jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-concelier.log
```
4. **Review merge events**
- `mongosh`:
```javascript
use concelier;
db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5);
```
- Compare `beforeHash` vs `afterHash` to confirm the merge actually changed canonical output.
5. **Interrogate provenance**
- `db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })`
- Check `provenance[].decisionReason` values (`precedence`, `freshness`, `tie-breaker`) to understand why the winning field was chosen.
---
## 5. Conflict Classification Matrix
| Signal | Likely Cause | Immediate Action |
|--------|--------------|------------------|
| `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. |
| `reason="primary_missing"` | Higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow lower-ranked source via precedence override. |
| `reason="equal_rank"` | Two feeds share the same precedence rank (custom config or missing entry). | Update `concelier:merge:precedence:ranks` to break the tie; restart merge job. |
| Rising `concelier.merge.range_overrides` for a package type | Vendor PSIRT now supplies richer ranges. | Validate connectors emit `decisionReason="precedence"` and update dashboards to treat registry ranges as fallback. |
| `concelier.merge.identity_conflicts` > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect `Alias collision` log payload; reconcile the alias graph by adjusting connector alias output. |
---
## 6. Resolution Playbook
1. **Connector data fix**
- Re-run the offending connector stages (`stellaops-cli db fetch --source ghsa --stage map` etc.).
- Once fixed, rerun merge and verify `decisionReason` reflects `freshness` or `precedence` as expected.
2. **Temporary precedence override**
- Edit `etc/concelier.yaml`:
```yaml
concelier:
merge:
precedence:
ranks:
osv: 1
ghsa: 0
```
- Restart Concelier workers; confirm tags in `concelier.merge.overrides` show the new ranks.
- Document the override with expiry in the change log.
3. **Alias remediation**
- Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs).
- Flush cached alias graphs if necessary (`db.alias_graph.drop()` is destructive; coordinate with Storage before issuing it).
4. **Escalation**
- If override metrics spike due to upstream regression, open an incident with Security Guild, referencing merge logs and `merge_event` IDs.
---
## 7. Validation Checklist
- [ ] Merge job rerun returns exit code `0`.
- [ ] `concelier.merge.conflicts` baseline returns to zero after corrective action.
- [ ] Latest `merge_event` entry shows expected hash delta.
- [ ] Affected advisory document shows updated `provenance[].decisionReason`.
- [ ] Ops change log updated with incident summary, config overrides, and rollback plan.
---
## 8. Reference Material
- Canonical conflict rules: `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md`.
- Merge engine internals: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/AdvisoryPrecedenceMerger.cs`.
- Metrics definitions: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/AdvisoryMergeService.cs` (identity conflicts) and `AdvisoryPrecedenceMerger`.
- Storage audit trail: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/MergeEventWriter.cs`, `src/Concelier/__Libraries/StellaOps.Concelier.Storage.Mongo/MergeEvents`.
Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change.
---
## 9. Synthetic Regression Fixtures
- **Locations**: canonical conflict snapshots now live at `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json`, `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json`, and `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/Fixtures/conflict-osv.canonical.json`.
- **Validation commands**: to regenerate and verify the fixtures offline, run:
```bash
dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ghsa.Tests/StellaOps.Concelier.Connector.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests
dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Nvd.Tests/StellaOps.Concelier.Connector.Nvd.Tests.csproj --filter NvdConflictFixtureTests
dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj --filter OsvConflictFixtureTests
dotnet test src/Concelier/__Tests/StellaOps.Concelier.Merge.Tests/StellaOps.Concelier.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions
```
- **Expected signals**: the triple produces one freshness-driven summary override (`primary_source=osv`, `suppressed_source=ghsa`) and one range override for the npm SemVer package while leaving `concelier.merge.conflicts` at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines.
---
## 10. Change Log
| Date (UTC) | Change | Notes |
|------------|--------|-------|
| 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by ICS CISA connector. | Ops sign-off recorded by Concelier Ops Guild; no additional overrides required. |

# Concelier Apple Security Update Connector Operations
This runbook covers staging and production rollout for the Apple security updates connector (`source:vndr-apple:*`), including observability checks and fixture maintenance.
## 1. Prerequisites
- Network egress (or mirrored cache) for `https://gdmf.apple.com/v2/pmv` and the Apple Support domain (`https://support.apple.com/`).
- Optional: corporate proxy exclusions for the Apple hosts if outbound traffic is normally filtered.
- Updated configuration (environment variables or `concelier.yaml`) with an `apple` section. Example baseline:
```yaml
concelier:
sources:
apple:
softwareLookupUri: "https://gdmf.apple.com/v2/pmv"
advisoryBaseUri: "https://support.apple.com/"
localeSegment: "en-us"
maxAdvisoriesPerFetch: 25
initialBackfill: "120.00:00:00"
modifiedTolerance: "02:00:00"
failureBackoff: "00:05:00"
```
> `softwareLookupUri` and `advisoryBaseUri` must stay absolute and aligned with the HTTP allow-list; Concelier automatically adds both hosts to the connector HttpClient.
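Duration values such as `initialBackfill: "120.00:00:00"` and `failureBackoff: "00:05:00"` appear to follow the .NET `TimeSpan` constant format (`[d.]hh:mm:ss[.fffffff]`). The small Python parser below is a sketch for sanity-checking such values in tooling outside .NET. It covers only that invariant format, not culture-specific or negative forms, and the function name is hypothetical.

```python
import re
from datetime import timedelta

# [d.]hh:mm:ss[.fffffff], the invariant TimeSpan layout used in the examples.
_TIMESPAN = re.compile(
    r"^(?:(?P<days>\d+)\.)?(?P<h>\d{1,2}):(?P<m>\d{2}):(?P<s>\d{2})(?:\.(?P<frac>\d{1,7}))?$"
)


def parse_timespan(text: str) -> timedelta:
    """Convert a TimeSpan-style string into a timedelta."""
    match = _TIMESPAN.match(text.strip())
    if match is None:
        raise ValueError(f"not a [d.]hh:mm:ss[.fffffff] value: {text!r}")
    frac = match.group("frac") or "0"
    return timedelta(
        days=int(match.group("days") or 0),
        hours=int(match.group("h")),
        minutes=int(match.group("m")),
        seconds=int(match.group("s")),
        # Fractional seconds: pad/truncate to microseconds (sub-microsecond ticks dropped).
        microseconds=int(frac.ljust(6, "0")[:6]),
    )
```

With this parser, `"120.00:00:00"` yields a 120-day backfill window, and a value such as `"00:00:00.250"` yields 250 milliseconds.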
## 2. Staging Smoke Test
1. Deploy the configuration and restart the Concelier workers to ensure the Apple connector options are bound.
2. Trigger a full connector cycle:
- CLI: `stella db jobs run source:vndr-apple:fetch --and-then source:vndr-apple:parse --and-then source:vndr-apple:map`
- REST: `POST /jobs/run { "kind": "source:vndr-apple:fetch", "chain": ["source:vndr-apple:parse", "source:vndr-apple:map"] }`
3. Validate metrics exported under meter `StellaOps.Concelier.Connector.Vndr.Apple`:
- `apple.fetch.items` (documents fetched)
- `apple.fetch.failures`
- `apple.fetch.unchanged`
- `apple.parse.failures`
- `apple.map.affected.count` (histogram of affected package counts)
4. Cross-check the shared HTTP counters:
- `concelier.source.http.requests_total{concelier_source="vndr-apple"}` should increase for both index and detail phases.
- `concelier.source.http.failures_total{concelier_source="vndr-apple"}` should remain flat (0) during a healthy run.
5. Inspect the info logs:
- `Apple software index fetch … processed=X newDocuments=Y`
- `Apple advisory parse complete … aliases=… affected=…`
- `Mapped Apple advisory … pendingMappings=0`
6. Confirm MongoDB state:
- `raw_documents` store contains the HT article HTML with metadata (`apple.articleId`, `apple.postingDate`).
- `dtos` store has `schemaVersion="apple.security.update.v1"`.
- `advisories` collection includes keys `HTxxxxxx` with normalized SemVer rules.
- `source_states` entry for `apple` shows a recent `cursor.lastPosted`.
## 3. Production Monitoring
- **Dashboards**: add the following expressions to your Concelier Grafana board (OTLP/Prometheus naming assumed):
- `rate(apple_fetch_items_total[15m])` vs `rate(concelier_source_http_requests_total{concelier_source="vndr-apple"}[15m])`
- `rate(apple_fetch_failures_total[5m])` for error spikes (`severity=warning` at `>0`)
- `histogram_quantile(0.95, rate(apple_map_affected_count_bucket[1h]))` to watch affected-package fan-out
- `increase(apple_parse_failures_total[6h])` to catch parser drift (alerts at `>0`)
- **Alerts**: page if `rate(apple_fetch_items_total[2h]) == 0` during business hours while other connectors are active. This often indicates lookup feed failures or misconfigured allow-lists.
- **Logs**: surface warnings `Apple document {DocumentId} missing GridFS payload` or `Apple parse failed`; repeated hits imply storage issues or HTML regressions.
- **Telemetry pipeline**: `StellaOps.Concelier.WebService` now exports `StellaOps.Concelier.Connector.Vndr.Apple` alongside existing Concelier meters; ensure your OTEL collector or Prometheus scraper includes it.
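The paging rule depends on two conditions that the PromQL expression alone does not capture: business hours and whether other connectors are active. The Python sketch below states both explicitly. The business-hours window (Monday to Friday, 09:00 to 18:00 local time) and the function name are assumptions. The two counts stand for the 2-hour fetch totals you would read from your metrics backend.

```python
from datetime import datetime


def apple_fetch_stalled(
    apple_items_2h: float,
    other_connector_items_2h: float,
    now: datetime,
    business_hours: tuple = (9, 18),
) -> bool:
    """True when Apple fetched nothing for 2h during business hours while other connectors fetched data."""
    start, end = business_hours
    in_business_hours = now.weekday() < 5 and start <= now.hour < end  # Monday=0 .. Friday=4
    return in_business_hours and apple_items_2h == 0 and other_connector_items_2h > 0
```

Gating on other connectors' activity keeps a global outage (for example, egress down for every source) from paging as an Apple-specific problem.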
## 4. Fixture Maintenance
Regression fixtures live under `src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests/Apple/Fixtures`. Refresh them whenever Apple reshapes the HT layout or when new platforms appear.
1. Run the helper script matching your platform:
- Bash: `./scripts/update-apple-fixtures.sh`
- PowerShell: `./scripts/update-apple-fixtures.ps1`
2. Each script exports `UPDATE_APPLE_FIXTURES=1`, updates the `WSLENV` passthrough, and touches `.update-apple-fixtures` so WSL+VS Code test runs observe the flag. The subsequent test execution fetches the live HT articles listed in `AppleFixtureManager`, sanitises the HTML, and rewrites the `.expected.json` DTO snapshots.
3. Review the diff for localisation or nav noise. Once satisfied, re-run the tests without the env var (`dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests.csproj`) to verify determinism.
4. Commit fixture updates together with any parser/mapping changes that motivated them.
## 5. Known Issues & Follow-up Tasks
- Apple occasionally throttles anonymous requests after bursts. The connector backs off automatically, but persistent `apple.fetch.failures` spikes might require mirroring the HT content or scheduling wider fetch windows.
- Rapid Security Responses may appear before the general patch notes surface in the lookup JSON. When that happens, the fetch run will log `detailFailures>0`. Collect sample HTML and refresh fixtures to confirm parser coverage.
- Multi-locale content is still under regression sweep (`src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Vndr.Apple/TASKS.md`). Capture non-`en-us` snapshots once the fixture tooling stabilises.

# Concelier CCCS Connector Operations
This runbook covers day-to-day operation of the Canadian Centre for Cyber Security (`source:cccs:*`) connector, including configuration, telemetry, and historical backfill guidance for English/French advisories.
## 1. Configuration Checklist
- Network egress (or mirrored cache) for `https://www.cyber.gc.ca/` and the JSON API endpoints under `/api/cccs/`.
- Set the Concelier options before restarting workers. Example `concelier.yaml` snippet:
```yaml
concelier:
sources:
cccs:
feeds:
- language: "en"
uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat"
- language: "fr"
uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat"
maxEntriesPerFetch: 80 # increase temporarily for backfill runs
maxKnownEntries: 512
requestTimeout: "00:00:30"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
```
> The `/api/cccs/threats/v1/get` endpoint returns thousands of records per language (≈5,100 rows each as of 2025-10-14). The connector honours `maxEntriesPerFetch`, so leave it low for steady-state operation and raise it for planned backfills.
## 2. Telemetry & Logging
- **Metrics (Meter `StellaOps.Concelier.Connector.Cccs`):**
- `cccs.fetch.attempts`, `cccs.fetch.success`, `cccs.fetch.failures`
- `cccs.fetch.documents`, `cccs.fetch.unchanged`
- `cccs.parse.success`, `cccs.parse.failures`, `cccs.parse.quarantine`
- `cccs.map.success`, `cccs.map.failures`
- **Shared HTTP metrics** via `SourceDiagnostics`:
- `concelier.source.http.requests{concelier.source="cccs"}`
- `concelier.source.http.failures{concelier.source="cccs"}`
- `concelier.source.http.duration{concelier.source="cccs"}`
- **Structured logs**
- `CCCS fetch completed feeds=… items=… newDocuments=… pendingDocuments=…`
- `CCCS parse completed parsed=… failures=…`
- `CCCS map completed mapped=… failures=…`
- Warnings fire when GridFS payloads/DTOs go missing or parser sanitisation fails.
Suggested Grafana alerts:
- `increase(cccs_fetch_failures_total[15m]) > 0`
- `rate(cccs_map_success_total[1h]) == 0` while other connectors are active
- `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cccs"}[1h])) > 5s`
## 3. Historical Backfill Plan
1. **Snapshot the source**: the API accepts `page=<n>` and `lang=<en|fr>` query parameters. `page=0` returns the full dataset (observed earliest `date_created`: 2018-06-08 for EN, 2018-06-08 for FR). Mirror those responses into Offline Kit storage when operating air-gapped.
2. **Stage ingestion**:
- Temporarily raise `maxEntriesPerFetch` (e.g. 500) and restart Concelier workers.
- Run chained jobs until `pendingDocuments` drains:
`stella db jobs run source:cccs:fetch --and-then source:cccs:parse --and-then source:cccs:map`
- Monitor `cccs.fetch.unchanged` growth; once it approaches dataset size the backfill is complete.
3. **Optional pagination sweep:** for incremental mirrors, iterate `page=<n>` (0…N) while `response.Count == 50`, persisting JSON to disk. Store alongside metadata (`language`, `page`, SHA-256) so repeated runs detect drift.
4. **Language split:** keep EN/FR payloads separate to preserve canonical language fields. The connector emits `Language` directly from the feed entry, so mixed ingestion simply produces parallel advisories keyed by the same serial number.
5. **Throttle planning:** schedule backfills during maintenance windows. The API tolerates burst downloads, but respect the 250 ms request delay, or raise it if mirrored traffic is not available.
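Step 3 can be sketched in Python as below. This is a minimal illustration, not part of the connector: `sweep_language`, the `fetch_page(lang, page)` callable, and the `cccs-<lang>-pageNNNN.json` file naming are hypothetical, and the stop condition follows the `response.Count == 50` rule above. Serialising with sorted keys and fixed separators keeps each page's SHA-256 stable across runs, so drift shows up as a digest change rather than formatting noise.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List

PAGE_SIZE = 50  # continue while a page returns exactly this many rows


def sweep_language(
    lang: str,
    fetch_page: Callable[[str, int], List[dict]],  # hypothetical: (lang, page) -> rows
    out_dir: Path,
) -> List[dict]:
    """Persist each page as JSON and return manifest records (language, page, sha256)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest: List[dict] = []
    page = 0
    while True:
        rows = fetch_page(lang, page)
        # Canonical serialisation (sorted keys, fixed separators) keeps digests stable.
        payload = json.dumps(
            rows, sort_keys=True, ensure_ascii=False, separators=(",", ":")
        ).encode("utf-8")
        name = f"cccs-{lang}-page{page:04d}.json"
        (out_dir / name).write_bytes(payload)
        manifest.append({
            "language": lang,
            "page": page,
            "file": name,
            "sha256": hashlib.sha256(payload).hexdigest(),
        })
        if len(rows) != PAGE_SIZE:  # a short or empty page ends the sweep
            break
        page += 1
    return manifest
```

Run it once per language (`en`, `fr`) to keep the payloads split as step 4 requires. If `page=0` returns the full dataset, the loop simply stops after that single page.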
## 4. Selector & Sanitiser Notes
- `CccsHtmlParser` now parses the **unsanitised DOM** (via AngleSharp) and only sanitises when persisting `ContentHtml`.
- Product extraction walks headings (`Affected Products`, `Produits touchés`, `Mesures recommandées`) and consumes nested lists within `div/section/article` containers.
- `HtmlContentSanitizer` allows `<h1>…<h6>` and `<section>` so stored HTML keeps headings for UI rendering and downstream summarisation.
## 5. Fixture Maintenance
- Regression fixtures live in `src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/Fixtures`.
- Refresh via `UPDATE_CCCS_FIXTURES=1 dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/StellaOps.Concelier.Connector.Cccs.Tests.csproj`.
- Fixtures capture both EN/FR advisories with nested lists to guard against sanitiser regressions; review diffs for heading/list changes before committing.


@@ -0,0 +1,146 @@
# Concelier CERT-Bund Connector Operations
_Last updated: 2025-10-17_
Germany's Federal Office for Information Security (BSI) operates the Warn- und Informationsdienst (WID) portal. The Concelier CERT-Bund connector (`source:cert-bund:*`) ingests the public RSS feed, hydrates the portal's JSON detail endpoint, and maps the result into canonical advisories while preserving the original German content.
---
## 1. Configuration Checklist
- Allow outbound access (or stage mirrors) for:
- `https://wid.cert-bund.de/content/public/securityAdvisory/rss`
- `https://wid.cert-bund.de/portal/` (session/bootstrap)
- `https://wid.cert-bund.de/portal/api/securityadvisory` (detail/search/export JSON)
- Ensure the HTTP client reuses a cookie container (the connector's dependency injection wiring already sets this up).
Example `concelier.yaml` fragment:
```yaml
concelier:
sources:
cert-bund:
feedUri: "https://wid.cert-bund.de/content/public/securityAdvisory/rss"
portalBootstrapUri: "https://wid.cert-bund.de/portal/"
detailApiUri: "https://wid.cert-bund.de/portal/api/securityadvisory"
maxAdvisoriesPerFetch: 50
maxKnownAdvisories: 512
requestTimeout: "00:00:30"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
```
> Leave `maxAdvisoriesPerFetch` at 50 during normal operation. Raise it only for controlled backfills, then restore the default to avoid overwhelming the portal.
---
## 2. Telemetry & Logging
- **Meter**: `StellaOps.Concelier.Connector.CertBund`
- **Counters / histograms**:
- `certbund.feed.fetch.attempts|success|failures`
- `certbund.feed.items.count`
- `certbund.feed.enqueued.count`
- `certbund.feed.coverage.days`
- `certbund.detail.fetch.attempts|success|not_modified|failures{reason}`
- `certbund.parse.success|failures{reason}`
- `certbund.parse.products.count`, `certbund.parse.cve.count`
- `certbund.map.success|failures{reason}`
- `certbund.map.affected.count`, `certbund.map.aliases.count`
- Shared HTTP metrics remain available through `concelier.source.http.*`.
**Structured logs** (all emitted at information level when work occurs):
- `CERT-Bund fetch cycle: … truncated {Truncated}, coverageDays={CoverageDays}`
- `CERT-Bund parse cycle: parsed {Parsed}, failures {Failures}, …`
- `CERT-Bund map cycle: mapped {Mapped}, failures {Failures}, …`
Alerting ideas:
1. `increase(certbund_detail_fetch_failures_total[10m]) > 0`
2. `rate(certbund_map_success_total[30m]) == 0`
3. `histogram_quantile(0.95, sum by (le) (rate(concelier_source_http_duration_bucket{concelier_source="cert-bund"}[15m]))) > 5` (p95 above 5 s, assuming the duration histogram is recorded in seconds)
The WebService now registers the meter so metrics surface automatically once OpenTelemetry metrics are enabled.
---
## 3. Historical Backfill & Export Strategy
### 3.1 Retention snapshot
- RSS window: ~250 advisories (≈90 days at current cadence).
- Older advisories are accessible through the JSON search/export APIs once the anti-CSRF token is supplied.
### 3.2 JSON search pagination
```bash
# 1. Bootstrap cookies (client_config + XSRF-TOKEN)
curl -s -c cookies.txt "https://wid.cert-bund.de/portal/" > /dev/null
curl -s -b cookies.txt -c cookies.txt \
-H "X-Requested-With: XMLHttpRequest" \
"https://wid.cert-bund.de/portal/api/security/csrf" > /dev/null
XSRF=$(awk '/XSRF-TOKEN/ {print $7}' cookies.txt)
# 2. Page search results
curl -s -b cookies.txt \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "X-XSRF-TOKEN: ${XSRF}" \
-X POST \
--data '{"page":4,"size":100,"sort":["published,desc"]}' \
"https://wid.cert-bund.de/portal/api/securityadvisory/search" \
> certbund-page4.json
```
Iterate `page` until the response `content` array is empty. Pages 0–9 currently cover 2014→present. Persist JSON responses (plus SHA-256 digests) for Offline Kit parity.
> **Shortcut:** run `python src/Tools/certbund_offline_snapshot.py --output seed-data/cert-bund`
> to bootstrap the session, capture the paginated search responses, and regenerate
> the manifest/checksum files automatically. Supply `--cookie-file` and `--xsrf-token`
> if the portal requires a browser-derived session (see options via `--help`).
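For custom scripts, the cookie handling in the `curl` flow can be mirrored in Python. This sketch assumes the Netscape cookie-jar layout that `curl -c` writes (seven tab-separated fields, with `#HttpOnly_` prefixing HttpOnly cookies); `read_cookie` and `search_body` are illustrative names, not functions exported by `certbund_offline_snapshot.py`. Matching on the cookie-name field is slightly stricter than the `awk` pattern, which matches `XSRF-TOKEN` anywhere on the line.

```python
from typing import Optional

HTTPONLY_PREFIX = "#HttpOnly_"  # curl writes HttpOnly cookies with this prefix


def read_cookie(jar_text: str, name: str) -> Optional[str]:
    """Return the value of cookie `name` from Netscape cookie-jar text, or None."""
    for raw in jar_text.splitlines():
        line = raw[len(HTTPONLY_PREFIX):] if raw.startswith(HTTPONLY_PREFIX) else raw
        if not line.strip() or line.startswith("#"):
            continue  # blank line or ordinary comment
        # Fields: domain, include-subdomains, path, secure, expiry, name, value
        fields = line.split("\t")
        if len(fields) >= 7 and fields[5] == name:
            return fields[6]
    return None


def search_body(page: int, size: int = 100) -> dict:
    """Request body for POST .../securityadvisory/search, matching the curl example."""
    return {"page": page, "size": size, "sort": ["published,desc"]}
```

For example, `read_cookie(open("cookies.txt").read(), "XSRF-TOKEN")` returns the value to send in the `X-XSRF-TOKEN` header.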
### 3.3 Export bundles
```bash
python src/Tools/certbund_offline_snapshot.py \
--output seed-data/cert-bund \
--start-year 2014 \
--end-year "$(date -u +%Y)"
```
The helper stores yearly exports under `seed-data/cert-bund/export/`,
captures paginated search snapshots in `seed-data/cert-bund/search/`,
and generates the manifest + SHA files in `seed-data/cert-bund/manifest/`.
Split ranges according to your compliance window (default: one file per
calendar year). Concelier can ingest these JSON payloads directly when
operating offline.
> When automatic bootstrap fails (e.g. portal introduces CAPTCHA), run the
> manual `curl` flow above, then rerun the helper with `--skip-fetch` to
> rebuild the manifest from the existing files.
### 3.4 Connector-driven catch-up
1. Temporarily raise `maxAdvisoriesPerFetch` (e.g. 150) and reduce `requestDelay`.
2. Run `stella db jobs run source:cert-bund:fetch --and-then source:cert-bund:parse --and-then source:cert-bund:map` until the fetch log reports `enqueued=0`.
3. Restore defaults and capture the cursor snapshot for audit.
---
## 4. Locale & Translation Guidance
- Advisories remain in German (`language: "de"`). Preserve wording for provenance and legal accuracy.
- UI localisation: enable the translation bundles documented in `docs/15_UI_GUIDE.md` if English UI copy is required. Operators can overlay machine or human translations, but the canonical database stores the source text.
- Docs guild is compiling a CERT-Bund terminology glossary under `docs/locale/certbund-glossary.md` so downstream teams can reference consistent English equivalents without altering the stored advisories.
---
## 5. Verification Checklist
1. Observe `certbund.feed.fetch.success` and `certbund.detail.fetch.success` increments after runs; `certbund.feed.coverage.days` should hover near the observed RSS window.
2. Ensure summary logs report `truncated=false` in steady state—`true` indicates the fetch cap was hit.
3. During backfills, watch `certbund.feed.enqueued.count` trend to zero.
4. Spot-check stored advisories in Mongo to confirm `language="de"` and reference URLs match the portal detail endpoint.
5. For Offline Kit exports, validate SHA-256 hashes before distribution.
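Step 5 can be automated with a short Python check. It assumes a `sha256sum`-style manifest (`<hex digest>  <relative path>` per line); the exact layout written to `seed-data/cert-bund/manifest/` is not specified in this runbook, so adjust the parsing if it differs. `verify_manifest` is a hypothetical helper.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks so large exports stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path, root: Path) -> List[str]:
    """Return entries that are malformed, missing on disk, or whose digest differs."""
    problems: List[str] = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            problems.append(line)  # malformed manifest line
            continue
        expected, rel = parts
        target = root / rel.lstrip("*")  # '*' marks binary mode in sha256sum output
        if not target.is_file() or sha256_file(target) != expected.lower():
            problems.append(rel)
    return problems
```

An empty list means every file matched; anything else should block distribution. Where GNU coreutils is available, `sha256sum -c` performs the same check.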


@@ -0,0 +1,94 @@
# Concelier Cisco PSIRT Connector OAuth Provisioning SOP
_Last updated: 2025-10-14_
## 1. Scope
This runbook describes how Ops provisions, rotates, and distributes Cisco PSIRT openVuln OAuth client credentials for the Concelier Cisco connector. It covers online and air-gapped (Offline Kit) environments, quota-aware execution, and escalation paths.
## 2. Prerequisites
- Active Cisco.com (CCO) account with access to the Cisco API Console.
- Cisco PSIRT openVuln API entitlement (visible under “My Apps & Keys” once granted).
- Concelier configuration location (typically `/etc/stella/concelier.yaml` in production) or Offline Kit secret bundle staging directory.
## 3. Provisioning workflow
1. **Register the application**
- Sign in at <https://apiconsole.cisco.com>.
- Select **Register a New App** → Application Type: `Service`, Grant Type: `Client Credentials`, API: `Cisco PSIRT openVuln API`.
- Record the generated `clientId` and `clientSecret` in the Ops vault.
2. **Verify token issuance**
- Request an access token with:
```bash
curl -s https://id.cisco.com/oauth2/default/v1/token \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=client_credentials" \
-d "client_id=${CLIENT_ID}" \
-d "client_secret=${CLIENT_SECRET}"
```
- Confirm HTTP 200 and an `expires_in` value of 3600 seconds (tokens live for one hour).
- Preserve the response only long enough to validate syntax; do **not** persist tokens.
3. **Authorize Concelier runtime**
- Update `concelier:sources:cisco:auth` (or the module-specific secret template) with the stored credentials.
- For Offline Kit delivery, export encrypted secrets into `offline-kit/secrets/cisco-openvuln.json` using the platform's sealed secret format.
4. **Connectivity validation**
- From the Concelier control plane, run `stella db jobs run source:vndr-cisco:fetch --dry-run`.
- Ensure the Source HTTP diagnostics record `Bearer` authorization headers and no 401/403 responses.
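The token check in step 2 can be scripted so the response is inspected and then discarded. A minimal sketch, assuming the standard OAuth 2.0 client-credentials response fields (`access_token`, `token_type`, `expires_in`, per RFC 6749 §5.1); `check_token_response` is an illustrative helper that returns findings only, never the token.

```python
import json
from typing import List


def check_token_response(status: int, body: str, expected_ttl: int = 3600) -> List[str]:
    """Return a list of problems; an empty list means the response looks healthy."""
    problems: List[str] = []
    if status != 200:
        problems.append(f"unexpected HTTP status {status}")
        return problems
    data = json.loads(body)
    if not data.get("access_token"):
        problems.append("missing access_token")
    if str(data.get("token_type", "")).lower() != "bearer":
        problems.append("token_type is not Bearer")
    if data.get("expires_in") != expected_ttl:
        problems.append(f"expires_in={data.get('expires_in')} (expected {expected_ttl})")
    # Only findings are returned; the token is never stored or printed.
    return problems
```

Feed it the status code and body captured from the `curl` request above, then drop both from memory and shell history.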
## 4. Rotation SOP
| Step | Owner | Notes |
| --- | --- | --- |
| 1. Schedule rotation | Ops (monthly board) | Rotate every 90 days or immediately after suspected credential exposure. |
| 2. Create replacement app | Ops | Repeat §3.1 with “-next” suffix; verify token issuance. |
| 3. Stage dual credentials | Ops + Concelier On-Call | Publish new credentials to secret store alongside current pair. |
| 4. Cut over | Concelier On-Call | Restart connector workers during a low-traffic window (<10 min) to pick up the new secret. |
| 5. Deactivate legacy app | Ops | Delete prior app in Cisco API Console once telemetry confirms successful fetch/parse cycles for 2 consecutive hours. |
**Automation hooks**
- Rotation reminders are tracked in OpsRunbookOps board (`OPS-RUN-KEYS` swim lane); add checklist items for Concelier Cisco when opening a rotation task.
- Use the secret management pipeline (`ops/secrets/rotate.sh --connector cisco`) to template vault updates; the script renders a redacted diff for audit.
## 5. Offline Kit packaging
1. Generate the credential bundle using the Offline Kit CLI:
`offline-kit secrets add cisco-openvuln --client-id … --client-secret …`
2. Store the encrypted payload under `offline-kit/secrets/cisco-openvuln.enc`.
3. Distribute via the Offline Kit channel; update `offline-kit/MANIFEST.md` with the credential fingerprint (SHA-256 of plaintext concatenated with metadata).
4. Document validation steps for the receiving site (token request from an air-gapped relay or cached token mirror).
## 6. Quota and throttling guidance
- Cisco enforces combined limits of 5 requests/second, 30 requests/minute, and 5,000 requests/day per application.
- Concelier fetch jobs must respect `Retry-After` headers on HTTP 429 responses; Ops should monitor for sustained quota saturation and consider paging window adjustments.
- Telemetry to watch: `concelier.source.http.requests{concelier.source="vndr-cisco"}`, `concelier.source.http.failures{...}`, and connector-specific metrics once implemented.
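Scripts or mirror jobs that call openVuln directly share the same quotas. The sketch below enforces the three published windows client-side and honours `Retry-After` in its delta-seconds form. The 60-second default when the header is absent (or uses the HTTP-date form) is an assumption, and `QuotaGuard` is a hypothetical class, not the connector's code. Time is passed in explicitly (for example from `time.monotonic()`) so the logic is easy to test.

```python
from collections import deque
from typing import Optional

# (window length in seconds, max requests) from Cisco's published limits
WINDOWS = ((1.0, 5), (60.0, 30), (86_400.0, 5_000))


class QuotaGuard:
    def __init__(self, windows=WINDOWS):
        self.windows = windows
        self.sent = deque()  # send timestamps within the largest window
        self.blocked_until = 0.0

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next request may be sent."""
        horizon = max(w for w, _ in self.windows)
        while self.sent and self.sent[0] <= now - horizon:
            self.sent.popleft()  # older than every window; no longer relevant
        wait = max(0.0, self.blocked_until - now)
        for window, limit in self.windows:
            recent = [t for t in self.sent if t > now - window]
            if len(recent) >= limit:
                # Wait until enough of the oldest requests leave this window.
                wait = max(wait, recent[-limit] + window - now)
        return wait

    def record(self, now: float) -> None:
        """Call when a request is actually sent."""
        self.sent.append(now)

    def on_429(self, now: float, retry_after: Optional[str]) -> None:
        """Honour Retry-After (delta-seconds) on HTTP 429; assume 60 s otherwise."""
        delay = float(retry_after) if retry_after and retry_after.isdigit() else 60.0
        self.blocked_until = max(self.blocked_until, now + delay)
```

Typical use: call `wait_time()`, sleep for the returned duration, re-check, then `record()` the actual send time. A single deque sized to the daily window holds at most 5,000 timestamps, which is negligible.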
## 7. Telemetry & Monitoring
- **Metrics (Meter `StellaOps.Concelier.Connector.Vndr.Cisco`)**
- `cisco.fetch.documents`, `cisco.fetch.failures`, `cisco.fetch.unchanged`
- `cisco.parse.success`, `cisco.parse.failures`
- `cisco.map.success`, `cisco.map.failures`, `cisco.map.affected.packages`
- **Shared HTTP metrics** via `SourceDiagnostics`:
- `concelier.source.http.requests{concelier.source="vndr-cisco"}`
- `concelier.source.http.failures{concelier.source="vndr-cisco"}`
- `concelier.source.http.duration{concelier.source="vndr-cisco"}`
- **Structured logs**
- `Cisco fetch completed date=… pages=… added=…` (info)
- `Cisco parse completed parsed=… failures=…` (info)
- `Cisco map completed mapped=… failures=…` (info)
- Warnings surface when DTO serialization fails or GridFS payload is missing.
- Suggested alerts: non-zero `cisco.fetch.failures` in 15m, or `cisco.map.success` flatlines while fetch continues.
## 8. Incident response
- **Token compromise:** revoke the application in the Cisco API Console, purge cached secrets, and rotate immediately per §4.
- **Persistent 401/403:** confirm credentials in vault, then validate token issuance; if unresolved, open a Cisco DevNet support ticket referencing the application ID.
- **429 spikes:** inspect job scheduler cadence and adjust connector options (`maxRequestsPerWindow`) before requesting higher quotas from Cisco.
## 9. References
- Cisco PSIRT openVuln API Authentication Guide.
- Accessing the openVuln API using curl (token lifetime).
- openVuln API rate limit documentation.


@@ -0,0 +1,151 @@
{
"title": "Concelier CVE & KEV Observability",
"uid": "concelier-cve-kev",
"schemaVersion": 38,
"version": 1,
"editable": true,
"timezone": "",
"time": {
"from": "now-24h",
"to": "now"
},
"refresh": "5m",
"templating": {
"list": [
{
"name": "datasource",
"type": "datasource",
"query": "prometheus",
"refresh": 1,
"hide": 0
}
]
},
"panels": [
{
"type": "timeseries",
"title": "CVE fetch success vs failure",
"gridPos": { "h": 9, "w": 12, "x": 0, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_fetch_success_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(cve_fetch_failures_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
}
]
},
{
"type": "timeseries",
"title": "KEV fetch cadence",
"gridPos": { "h": 9, "w": 12, "x": 12, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(kev_fetch_success_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(kev_fetch_failures_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
},
{
"refId": "C",
"expr": "rate(kev_fetch_unchanged_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "unchanged"
}
]
},
{
"type": "table",
"title": "KEV parse anomalies (24h)",
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "short"
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "sum by (reason) (increase(kev_parse_anomalies_total[24h]))",
"format": "table",
"datasource": { "type": "prometheus", "uid": "${datasource}" }
}
],
"transformations": [
{
"id": "organize",
"options": {
"renameByName": {
"Value": "count"
}
}
}
]
},
{
"type": "timeseries",
"title": "Advisories emitted",
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_map_success_total[15m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "CVE"
},
{
"refId": "B",
"expr": "rate(kev_map_advisories_total[24h])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "KEV"
}
]
}
]
}


@@ -0,0 +1,143 @@
# Concelier CVE & KEV Connector Operations
This playbook equips operators with the steps required to roll out and monitor the CVE Services and CISA KEV connectors across environments.
## 1. CVE Services Connector (`source:cve:*`)
### 1.1 Prerequisites
- CVE Services API credentials (organisation ID, user ID, API key) with access to the JSON 5 API.
- Network egress to `https://cveawg.mitre.org` (or a mirrored endpoint) from the Concelier workers.
- Updated `concelier.yaml` (or the matching environment variables) with the following section:
```yaml
concelier:
sources:
cve:
baseEndpoint: "https://cveawg.mitre.org/api/"
apiOrg: "ORG123"
apiUser: "user@example.org"
apiKeyFile: "/var/run/secrets/concelier/cve-api-key"
seedDirectory: "./seed-data/cve"
pageSize: 200
maxPagesPerFetch: 5
initialBackfill: "30.00:00:00"
requestDelay: "00:00:00.250"
failureBackoff: "00:10:00"
```
> Store the API key outside source control. When using `apiKeyFile`, mount the secret file into the container/host; alternatively supply `apiKey` via `CONCELIER_SOURCES__CVE__APIKEY`.
> 🪙 When credentials are not yet available, configure `seedDirectory` to point at mirrored CVE JSON (for example, the repo's `seed-data/cve/` bundle). The connector will ingest those records and log a warning instead of failing the job; live fetching resumes automatically once `apiOrg` / `apiUser` / `apiKey` are supplied.
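The duration settings above (`initialBackfill`, `requestDelay`, `failureBackoff`) use the .NET `TimeSpan` constant format `[d.]hh:mm:ss[.fffffff]`, so `30.00:00:00` is 30 days and `00:00:00.250` is 250 ms. A small Python lint like the following can catch malformed values before a restart. It approximates .NET parsing (negative values are rejected), and `parse_timespan` is a hypothetical helper.

```python
import re
from datetime import timedelta

# [d.]hh:mm:ss[.fffffff]; negative values are not meaningful for these settings
_TIMESPAN = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})(?:\.(\d{1,7}))?$")


def parse_timespan(text: str) -> timedelta:
    """Convert a TimeSpan constant string to a timedelta, or raise ValueError."""
    m = _TIMESPAN.match(text.strip())
    if not m:
        raise ValueError(f"not a TimeSpan constant: {text!r}")
    days, hours, minutes, seconds, fraction = m.groups()
    if int(hours) > 23 or int(minutes) > 59 or int(seconds) > 59:
        raise ValueError(f"component out of range: {text!r}")
    # Fraction digits are 100 ns ticks: pad to 7 digits, then convert to microseconds.
    ticks = int((fraction or "").ljust(7, "0"))
    return timedelta(
        days=int(days or 0),
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
        microseconds=ticks // 10,
    )
```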
### 1.2 Smoke Test (staging)
1. Deploy the updated configuration and restart the Concelier service so the connector picks up the credentials.
2. Trigger one end-to-end cycle:
- Concelier CLI: `stella db jobs run source:cve:fetch --and-then source:cve:parse --and-then source:cve:map`
- REST fallback: `POST /jobs/run { "kind": "source:cve:fetch", "chain": ["source:cve:parse", "source:cve:map"] }`
3. Observe the following metrics (exported via OTEL meter `StellaOps.Concelier.Connector.Cve`):
- `cve.fetch.attempts`, `cve.fetch.success`, `cve.fetch.documents`, `cve.fetch.failures`, `cve.fetch.unchanged`
- `cve.parse.success`, `cve.parse.failures`, `cve.parse.quarantine`
- `cve.map.success`
4. Verify Prometheus shows matching `concelier_source_http_requests_total{concelier_source="cve"}` deltas (list vs detail phases) while `concelier_source_http_failures_total{concelier_source="cve"}` stays flat.
5. Confirm the info-level summary log `CVEs fetch window … pages=X detailDocuments=Y detailFailures=Z` appears once per fetch run and shows `detailFailures=0`.
6. Verify the MongoDB advisory store contains fresh CVE advisories (`advisoryKey` prefix `cve/`) and that the source cursor (`source_states` collection) advanced.
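Step 5 can be asserted in a smoke-test script by splitting the summary log into `key=value` fields. This sketch assumes values contain no spaces, which holds for the sample line in §1.4; `summary_fields` and `smoke_ok` are illustrative names.

```python
import re

_PAIR = re.compile(r"(\w+)=(\S+)")  # key=value with no spaces in the value


def summary_fields(line: str) -> dict:
    """Map each key=value token in a summary log line to its string value."""
    return dict(_PAIR.findall(line))


def smoke_ok(line: str) -> bool:
    """Step 5 check: the fetch summary reports zero detail failures."""
    return summary_fields(line).get("detailFailures") == "0"
```

Values stay strings (for example `pendingDocuments` is `0->1`), so convert only the fields you compare numerically.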
### 1.3 Production Monitoring
- **Dashboards:** Plot `rate(cve_fetch_success_total[5m])`, `rate(cve_fetch_failures_total[5m])`, and `rate(cve_fetch_documents_total[5m])` alongside `concelier_source_http_requests_total{concelier_source="cve"}` to confirm HTTP and connector counters stay aligned. Keep `concelier.range.primitives{scheme=~"semver|vendor"}` on the same board for range coverage. Example alerts:
- `rate(cve_fetch_failures_total[5m]) > 0` for 10 minutes (`severity=warning`)
- `rate(cve_map_success_total[15m]) == 0` while `rate(cve_fetch_success_total[15m]) > 0` (`severity=critical`)
- `increase(cve_parse_quarantine_total[1h]) > 0` to catch schema anomalies
- **Logs:** Monitor warnings such as `Failed fetching CVE record {CveId}` and `Malformed CVE JSON`, and surface the summary info log `CVEs fetch window … detailFailures=0 detailUnchanged=0` on dashboards. A non-zero `detailFailures` usually indicates rate-limit or auth issues on detail requests.
- **Grafana pack:** Import `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` and filter by panel legend (`CVE`, `KEV`) to reuse the canned layout.
- **Backfill window:** Operators can tighten or widen `initialBackfill` / `maxPagesPerFetch` after validating throughput. Update config and restart Concelier to apply changes.
### 1.4 Staging smoke log (2025-10-15)
While Ops finalises long-lived CVE Services credentials, we validated the connector end-to-end against the recorded CVE-2024-0001 payloads used in regression tests:
- Command: `dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Cve.Tests/StellaOps.Concelier.Connector.Cve.Tests.csproj -l "console;verbosity=detailed"`
- Summary log emitted by the connector:
```
CVEs fetch window 2024-09-01T00:00:00Z->2024-10-01T00:00:00Z pages=1 listSuccess=1 detailDocuments=1 detailFailures=0 detailUnchanged=0 pendingDocuments=0->1 pendingMappings=0->1 hasMorePages=False nextWindowStart=2024-09-15T12:00:00Z nextWindowEnd=(none) nextPage=1
```
- Telemetry captured by `Meter` `StellaOps.Concelier.Connector.Cve`:
| Metric | Value |
|--------|-------|
| `cve.fetch.attempts` | 1 |
| `cve.fetch.success` | 1 |
| `cve.fetch.documents` | 1 |
| `cve.parse.success` | 1 |
| `cve.map.success` | 1 |
The Grafana pack `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` has been imported into staging so the panels referenced above render against these counters once the live API keys are in place.
## 2. CISA KEV Connector (`source:kev:*`)
### 2.1 Prerequisites
- Network egress (or mirrored content) for `https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json`.
- No credentials are required, but the HTTP allow-list must include `www.cisa.gov`.
- Confirm the following snippet in `concelier.yaml` (defaults shown; tune as needed):
```yaml
concelier:
sources:
kev:
feedUri: "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
requestTimeout: "00:01:00"
failureBackoff: "00:05:00"
```
### 2.2 Schema validation & anomaly handling
The connector validates each catalog against `Schemas/kev-catalog.schema.json`. Failures increment `kev_parse_failures_total{reason="schema"}` and the document is quarantined (status `Failed`). Additional failure reasons include `download`, `invalidJson`, `deserialize`, `missingPayload`, and `emptyCatalog`. Entry-level anomalies are surfaced through `kev_parse_anomalies_total` with reasons:
| Reason | Meaning |
| --- | --- |
| `missingCveId` | Catalog entry omitted `cveID`; the entry is skipped. |
| `countMismatch` | Catalog `count` field disagreed with the actual entry total. |
| `nullEntry` | Upstream emitted a `null` entry object (rare upstream defect). |
Treat repeated schema failures or growing anomaly counts as an upstream regression and coordinate with CISA or mirror maintainers.
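For offline mirrors, the anomaly rules in the table can be reproduced against a downloaded catalog before import. The sketch assumes the published KEV JSON layout (a top-level `count` and a `vulnerabilities` array whose entries carry `cveID`). It is not the connector's C# implementation, and counting `null` entries toward `countMismatch` is an assumption made here.

```python
from collections import Counter


def kev_anomalies(catalog: dict) -> Counter:
    """Count entry-level anomalies using the reason names from the table above."""
    reasons = Counter()
    entries = catalog.get("vulnerabilities") or []
    for entry in entries:
        if entry is None:
            reasons["nullEntry"] += 1
        elif not entry.get("cveID"):
            reasons["missingCveId"] += 1  # the connector skips such entries
    if catalog.get("count") is not None and catalog["count"] != len(entries):
        reasons["countMismatch"] += 1
    return reasons
```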
### 2.3 Smoke Test (staging)
1. Deploy the configuration and restart Concelier.
2. Trigger a pipeline run:
- CLI: `stella db jobs run source:kev:fetch --and-then source:kev:parse --and-then source:kev:map`
- REST: `POST /jobs/run { "kind": "source:kev:fetch", "chain": ["source:kev:parse", "source:kev:map"] }`
3. Verify the metrics exposed by meter `StellaOps.Concelier.Connector.Kev`:
- `kev.fetch.attempts`, `kev.fetch.success`, `kev.fetch.unchanged`, `kev.fetch.failures`
- `kev.parse.entries` (tag `catalogVersion`), `kev.parse.failures`, `kev.parse.anomalies` (tag `reason`)
- `kev.map.advisories` (tag `catalogVersion`)
4. Confirm `concelier_source_http_requests_total{concelier_source="kev"}` increments once per fetch and that the paired `concelier_source_http_failures_total` stays flat (zero increase).
5. Inspect the info logs `Fetched KEV catalog document … pendingDocuments=…` and `Parsed KEV catalog document … entries=…`—they should appear exactly once per run and `Mapped X/Y… skipped=0` should match the `kev.map.advisories` delta.
6. Confirm MongoDB documents exist for the catalog JSON (`raw_documents` & `dtos`) and that advisories with prefix `kev/` are written.
### 2.4 Production Monitoring
- Alert when `rate(kev_fetch_success_total[8h]) == 0` during working hours (daily cadence breach) and when `increase(kev_fetch_failures_total[1h]) > 0`.
- Page the on-call if `increase(kev_parse_failures_total{reason="schema"}[6h]) > 0`—this usually signals an upstream payload change. Treat repeated `reason="download"` spikes as networking issues to the mirror.
- Track anomaly spikes through `increase(kev_parse_anomalies_total{reason="missingCveId"}[24h])`. Rising `countMismatch` trends point to catalog publishing bugs.
- Surface the fetch/mapping info logs (`Fetched KEV catalog document …` and `Mapped X/Y KEV advisories … skipped=S`) on dashboards; absence of those logs while metrics show success typically means schema validation short-circuited the run.
### 2.5 Known good dashboard tiles
Add the following panels to the Concelier observability board:
| Metric | Recommended visualisation |
|--------|---------------------------|
| `rate(kev_fetch_success_total[30m])` | Single-stat (last 24h) with warning threshold `>0` |
| `rate(kev_parse_entries_total[1h])` by `catalogVersion` | Stacked area: highlights daily release size |
| `increase(kev_parse_anomalies_total[1d])` by `reason` | Table: anomaly breakdown (matches dashboard panel) |
| `rate(cve_map_success_total[15m])` vs `rate(kev_map_advisories_total[24h])` | Comparative timeseries for advisories emitted |
## 3. Runbook updates
- Record staging/production smoke test results (date, catalog version, advisory counts) in your team's change log.
- Add the CVE/KEV job kinds to the standard maintenance checklist so operators can manually trigger them after planned downtime.
- Keep this document in sync with future connector changes (for example, new anomaly reasons or additional metrics).
- Version-control dashboard tweaks alongside `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` so operations can re-import the observability pack during restores.


@@ -0,0 +1,123 @@
# Concelier GHSA Connector Operations Runbook
_Last updated: 2025-10-16_
## 1. Overview
The GitHub Security Advisories (GHSA) connector pulls advisory metadata from the GitHub REST API `/security/advisories` endpoint. GitHub enforces both primary and secondary rate limits, so operators must monitor usage and configure retries to avoid throttling incidents.
## 2. Rate-limit telemetry
The connector now surfaces rate-limit headers on every fetch and exposes the following metrics via OpenTelemetry:
| Metric | Description | Tags |
|--------|-------------|------|
| `ghsa.ratelimit.limit` (histogram) | Samples the reported request quota at fetch time. | `phase` = `list` or `detail`, `resource` (e.g., `core`). |
| `ghsa.ratelimit.remaining` (histogram) | Remaining requests returned by `X-RateLimit-Remaining`. | `phase`, `resource`. |
| `ghsa.ratelimit.reset_seconds` (histogram) | Seconds until `X-RateLimit-Reset`. | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct` (histogram) | Percentage of the quota still available (`remaining / limit * 100`). | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct_current` (observable gauge) | Latest headroom percentage reported per resource. | `phase`, `resource`. |
| `ghsa.ratelimit.exhausted` (counter) | Incremented whenever GitHub returns a zero remaining quota and the connector delays before retrying. | `phase`. |
### Dashboards & alerts
- Plot `ghsa.ratelimit.remaining` as the latest value to watch the runway. Alert when the value stays below **`RateLimitWarningThreshold`** (default `500`) for more than 5 minutes.
- Use `ghsa.ratelimit.headroom_pct_current` to visualise remaining quota % — paging once it sits below **10%** for longer than a single reset window helps avoid secondary limits.
- Raise a separate alert on `increase(ghsa_ratelimit_exhausted_total[15m]) > 0` to catch hard throttles.
- Overlay `ghsa.fetch.attempts` vs `ghsa.fetch.failures` to confirm retries are effective.
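The headroom series is simple arithmetic over GitHub's `X-RateLimit-Limit` and `X-RateLimit-Remaining` headers. The following sketch reproduces it together with the `RateLimitWarningThreshold` comparison (default `500`). `headroom` is a hypothetical helper, and it expects exact header-name casing, so normalise names first if your HTTP client does not.

```python
from typing import Mapping, Optional, Tuple


def headroom(headers: Mapping[str, str], warn_below: int = 500) -> Optional[Tuple[float, bool]]:
    """Return (headroom %, should_warn), or None when the headers are unusable."""
    try:
        limit = int(headers["X-RateLimit-Limit"])
        remaining = int(headers["X-RateLimit-Remaining"])
    except (KeyError, ValueError):
        return None  # headers absent or not numeric
    if limit <= 0:
        return None
    pct = remaining * 100 / limit  # same as remaining / limit * 100
    return pct, remaining < warn_below
```

The 10% paging level from the list above corresponds to `pct < 10`.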
## 3. Logging signals
When `X-RateLimit-Remaining` falls below `RateLimitWarningThreshold`, the connector emits:
```
GHSA rate limit warning: remaining {Remaining}/{Limit} for {Phase} {Resource} (headroom {Headroom}%)
```
When GitHub reports zero remaining calls, the connector logs and sleeps for the reported `Retry-After`/`X-RateLimit-Reset` interval (falling back to `SecondaryRateLimitBackoff`).
After the quota recovers above the warning threshold the connector writes an informational log with the refreshed remaining/headroom, letting operators clear alerts quickly.
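The wait selection described above can be sketched as follows. The precedence (`Retry-After` first, then `X-RateLimit-Reset` as UTC epoch seconds when remaining is zero, then the fallback) is one reading of this section rather than the connector's exact logic. The 120-second default mirrors the `secondaryRateLimitBackoff` example in §4, and `backoff_seconds` is a hypothetical name.

```python
from typing import Mapping


def backoff_seconds(headers: Mapping[str, str], now_epoch: float, fallback: float = 120.0) -> float:
    """Seconds to sleep after GitHub reports an exhausted quota."""
    retry_after = headers.get("Retry-After")
    if retry_after and retry_after.isdigit():
        return float(retry_after)  # explicit server instruction wins
    reset = headers.get("X-RateLimit-Reset")
    if headers.get("X-RateLimit-Remaining") == "0" and reset and reset.isdigit():
        return max(0.0, float(reset) - now_epoch)  # sleep until the window resets
    return fallback  # no usable hint: use the configured backoff
```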
## 4. Configuration knobs (`concelier.yaml`)
```yaml
concelier:
sources:
ghsa:
apiToken: "${GITHUB_PAT}"
pageSize: 50
requestDelay: "00:00:00.200"
failureBackoff: "00:05:00"
rateLimitWarningThreshold: 500 # warn below this many remaining calls
secondaryRateLimitBackoff: "00:02:00" # fallback delay when GitHub omits Retry-After
```
### Recommendations
- Increase `requestDelay` in air-gapped or burst-heavy deployments to smooth token consumption.
- Lower `rateLimitWarningThreshold` only if your dashboards already page on the new histogram; never set it negative.
- For bots using a low-privilege PAT, keep `secondaryRateLimitBackoff` at ≥60 seconds to respect GitHub's secondary-limit guidance.
#### Default job schedule
| Job kind | Cron | Timeout | Lease |
|----------|------|---------|-------|
| `source:ghsa:fetch` | `1,11,21,31,41,51 * * * *` | 6 minutes | 4 minutes |
| `source:ghsa:parse` | `3,13,23,33,43,53 * * * *` | 5 minutes | 4 minutes |
| `source:ghsa:map` | `5,15,25,35,45,55 * * * *` | 5 minutes | 4 minutes |
These defaults spread GHSA stages across the hour so fetch completes before parse/map fire. Override them via `concelier.jobs.definitions[...]` when coordinating multiple connectors on the same runner.
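The spacing between stages can be checked mechanically from the minute lists above. This sketch handles only comma-separated minute fields (every other cron field in the table is `*`); `SCHEDULE` copies the table and `gap_after` is an illustrative helper.

```python
SCHEDULE = {
    # stage: (cron minute field, timeout in minutes), copied from the table above
    "fetch": ("1,11,21,31,41,51", 6),
    "parse": ("3,13,23,33,43,53", 5),
    "map": ("5,15,25,35,45,55", 5),
}


def minutes(field: str) -> list:
    """Expand a comma-separated cron minute field into sorted integers."""
    return sorted(int(m) for m in field.split(","))


def gap_after(upstream: str, downstream: str) -> int:
    """Minutes from each upstream firing to the next downstream firing."""
    up = minutes(SCHEDULE[upstream][0])
    down = minutes(SCHEDULE[downstream][0])
    gaps = {min((d - u) % 60 for d in down if (d - u) % 60 > 0) for u in up}
    if len(gaps) != 1:
        raise ValueError(f"uneven stagger between {upstream} and {downstream}: {sorted(gaps)}")
    return gaps.pop()
```

With the defaults, parse fires 2 minutes after each fetch slot and map 4 minutes after it. Both gaps are shorter than the 6-minute fetch timeout, so a fetch running close to its timeout can still be active when parse starts.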
## 5. Provisioning credentials
Concelier requires a GitHub personal access token (classic) with the **`read:org`** and **`security_events`** scopes to pull GHSA data. Store it as a secret and reference it via `concelier.sources.ghsa.apiToken`.
### Docker Compose (stack operators)
```yaml
services:
concelier:
environment:
CONCELIER__SOURCES__GHSA__APITOKEN: /run/secrets/ghsa_pat
secrets:
- ghsa_pat
secrets:
ghsa_pat:
file: ./secrets/ghsa_pat.txt # contains only the PAT value
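```
> Compose mounts the secret as a file at `/run/secrets/ghsa_pat`, so the variable above passes the file path, not the token itself. Confirm that your Concelier build resolves that path, or inject the PAT value directly (for example from an `env_file` kept out of source control).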
### Helm values (cluster operators)
```yaml
concelier:
extraEnv:
- name: CONCELIER__SOURCES__GHSA__APITOKEN
valueFrom:
secretKeyRef:
name: concelier-ghsa
key: apiToken
extraSecrets:
concelier-ghsa:
apiToken: "<paste PAT here or source from external secret store>"
```
After rotating the PAT, restart the Concelier workers (or run `kubectl rollout restart deployment/concelier`) to ensure the configuration reloads.
When enabling GHSA the first time, run a staged backfill:
1. Trigger `source:ghsa:fetch` manually (CLI or API) outside of peak hours.
2. Watch `concelier.jobs.health` for the GHSA jobs until they report `healthy`.
3. Allow the scheduled cron cadence to resume once the initial backlog drains (typically < 30 minutes).
## 6. Runbook steps when throttled
1. Check `ghsa.ratelimit.exhausted` for the affected phase (`list` vs `detail`).
2. Confirm the connector is delaying: logs will show `GHSA rate limit exhausted...` with the chosen backoff.
3. If rate limits stay exhausted:
- Verify no other jobs are sharing the PAT.
- Temporarily reduce `MaxPagesPerFetch` or `PageSize` to shrink burst size.
- Consider provisioning a dedicated PAT (GHSA permissions only) for Concelier.
4. After the quota resets, restore any settings you changed (`MaxPagesPerFetch`, `PageSize`, `requestDelay`, `rateLimitWarningThreshold`) to their normal values and monitor the histograms for at least one hour.
## 7. Alert integration quick reference
- Prometheus: `ghsa_ratelimit_remaining_bucket` (from histogram); use `histogram_quantile(0.99, ...)` to trend capacity.
- VictoriaMetrics: `last_over_time(ghsa_ratelimit_remaining_sum[5m])` for simple last-value graphs.
- Grafana: stack remaining + used to visualise total limit per resource.
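`histogram_quantile` does not read raw samples. Instead, it estimates the quantile by linear interpolation inside the cumulative `le` buckets. The sketch below reproduces that interpolation for a single series at one instant, so you can sanity-check a dashboard value by hand. It is a simplification (no NaN handling, no bucket-monotonicity repair), not a drop-in replacement for Prometheus.

```python
import math
from typing import Sequence, Tuple


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket,
    mirroring Prometheus `_bucket{le=...}` series at a single instant.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for index, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if index == 0 and bound <= 0:
                # No known lower edge for a non-positive first bucket.
                return bound
            # Linear interpolation between this bucket's lower and upper edges.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound
```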
## 8. Canonical metric fallback analytics
When GitHub omits CVSS vectors/scores, the connector now assigns a deterministic canonical metric id in the form `ghsa:severity/<level>` and publishes it to Merge so severity precedence still resolves against GHSA even without CVSS data.
- Metric: `ghsa.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `reason=no_cvss`.
- Monitor the counter alongside Merge parity checks; a sudden spike suggests GitHub is shipping advisories without vectors and warrants cross-checking downstream exporters.
- Because the canonical id feeds Merge, parity dashboards should overlay this metric to confirm fallback advisories continue to merge ahead of downstream sources when GHSA supplies more recent data.
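Below is a minimal sketch of the fallback id format and the counter tags listed above. It assumes the severity label is trimmed and lowercased before it is embedded; the connector's exact normalisation is not documented here. The same shape applies to OSV's `osv:severity/<level>` ids. The function names are illustrative.

```python
from typing import Dict


def fallback_metric_id(source: str, severity: str) -> str:
    """Build a canonical metric id such as 'ghsa:severity/high' when CVSS data is absent."""
    level = severity.strip().lower()
    if not level:
        raise ValueError("severity is required to build a fallback metric id")
    return f"{source}:severity/{level}"


def fallback_tags(source: str, severity: str) -> Dict[str, str]:
    """Tags matching the documented counter dimensions (severity, canonical_metric_id, reason)."""
    return {
        "severity": severity.strip().lower(),
        "canonical_metric_id": fallback_metric_id(source, severity),
        "reason": "no_cvss",
    }
```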


@@ -0,0 +1,122 @@
# Concelier CISA ICS Connector Operations
This runbook documents how to provision, rotate, and validate credentials for the CISA Industrial Control Systems (ICS) connector (`source:ics-cisa:*`). Follow it before enabling the connector in staging or offline installations.
## 1. Credential Provisioning
1. **Create a service mailbox** reachable by the Ops crew (shared mailbox recommended).
2. Browse to `https://public.govdelivery.com/accounts/USDHSCISA/subscriber/new` and subscribe the mailbox to the following GovDelivery topics:
- `USDHSCISA_16` — ICS-CERT advisories (legacy numbering: `ICSA-YY-###`).
- `USDHSCISA_19` — ICS medical advisories (`ICSMA-YY-###`).
- `USDHSCISA_17` — ICS alerts (`IR-ALERT-YY-###`) for completeness.
3. Complete the verification email. After confirmation, note the **personalised subscription code** included in the “Manage Preferences” link. It has the shape `code=AB12CD34EF`.
4. Store the code in the shared secret vault (or Offline Kit secrets bundle) as `concelier/sources/icscisa/govdelivery/code`.
> GovDelivery does not expose a one-time API key; the personalised code is what authenticates the RSS pull. Never commit it to git.
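If you copy the whole "Manage Preferences" link, the standard library can extract the `code` query parameter for you, which is safer than trimming the URL by hand. Below is a small sketch; the helper name is illustrative.

```python
from urllib.parse import parse_qs, urlparse


def extract_subscription_code(manage_preferences_url: str) -> str:
    """Return the personalised GovDelivery code from a Manage Preferences link."""
    params = parse_qs(urlparse(manage_preferences_url).query)
    codes = params.get("code")
    if not codes or not codes[0]:
        raise ValueError("link does not contain a code= parameter")
    return codes[0]
```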
## 2. Feed Validation
Use the following command to confirm the feed is reachable before wiring it into Concelier (substitute `<CODE>` with the personalised value):
```bash
curl -H "User-Agent: StellaOpsConcelier/ics-cisa" \
"https://content.govdelivery.com/accounts/USDHSCISA/topics/ICS-CERT/feed.rss?format=xml&code=<CODE>"
```
If the endpoint returns HTTP 200 and an RSS payload, record the sample response under `docs/artifacts/icscisa/` (see Task `FEEDCONN-ICSCISA-02-007`). HTTP 403 or 406 usually means the subscription was not confirmed or the code was mistyped.
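To turn the curl check into a repeatable assertion, you can parse the saved response and list its `<item>` titles. The sketch below uses only the standard library. It assumes an RSS 2.0 layout (`rss/channel/item`) and does no further validation of the feed.

```python
import xml.etree.ElementTree as ET
from typing import List, Union


def rss_item_titles(xml_text: Union[str, bytes]) -> List[str]:
    """Return the <title> of every rss/channel/item in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    if root.tag != "rss":
        raise ValueError(f"expected an <rss> root element, got <{root.tag}>")
    return [item.findtext("title", default="") for item in root.findall("./channel/item")]
```

An empty list from a 200 response is worth investigating just like a 403.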
## 3. Configuration Snippet
Add the connector configuration to `concelier.yaml` (or equivalent environment variables):
```yaml
concelier:
  sources:
    icscisa:
      govDelivery:
        code: "${CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE}"
        topics:
          - "USDHSCISA_16"
          - "USDHSCISA_19"
          - "USDHSCISA_17"
      rssBaseUri: "https://content.govdelivery.com/accounts/USDHSCISA"
      requestDelay: "00:00:01"
      failureBackoff: "00:05:00"
```
Environment variable example:
```bash
export CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE="AB12CD34EF"
```
Concelier automatically registers the host with the Source.Common HTTP allow-list when the connector assembly is loaded.
Optional tuning keys (set only when needed):
- `proxyUri` — HTTP/HTTPS proxy URL used when Akamai blocks direct pulls.
- `requestVersion` / `requestVersionPolicy` — override HTTP negotiation when the proxy requires HTTP/1.1.
- `enableDetailScrape` — toggle HTML detail fallback (defaults to true).
- `captureAttachments` — collect PDF attachments from detail pages (defaults to true).
- `detailBaseUri` — alternate host for detail enrichment if CISA changes their layout.
## 4. Seeding Without GovDelivery
If credentials are still pending, populate the connector with the community CSV dataset before enabling the live fetch:
1. Run `./scripts/fetch-ics-cisa-seed.sh` (or `.ps1`) to download the latest `CISA_ICS_ADV_*.csv` files into `seed-data/ics-cisa/`.
2. Copy the CSVs (and the generated `.sha256` files) into your Offline Kit staging area so they ship alongside the other feeds.
3. Import the kit as usual. The connector can parse the seed data for historical context, but **live GovDelivery credentials are still required** for fresh advisories.
4. Once credentials arrive, update `concelier:sources:icscisa:govDelivery:code` and re-trigger `source:ics-cisa:fetch` so the connector switches to the authorised feed.
> The CSVs are licensed under ODbL 1.0 by the ICS Advisory Project. Preserve the attribution when redistributing them.
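Before staging the CSVs, check each one against its `.sha256` sidecar. This sketch assumes the sidecar uses the common `sha256sum` layout (hex digest first, optionally followed by the file name). Adjust the parsing if the seed script writes a different format.

```python
import hashlib
from pathlib import Path


def verify_sidecar(data_path: Path, checksum_path: Path) -> bool:
    """Compare a file's SHA-256 with the first token of its checksum sidecar."""
    expected = checksum_path.read_text(encoding="utf-8").split()[0].lower()
    digest = hashlib.sha256()
    with data_path.open("rb") as handle:
        # Stream in 1 MiB chunks so large CSVs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```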
## 5. Integration Validation
1. Ensure secrets are in place and restart the Concelier workers.
2. Run a dry-run fetch/parse/map chain against an Akamai-protected topic:
```bash
CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=... \
CONCELIER_SOURCES_ICSCISA_ENABLEDETAILSCRAPE=1 \
stella db jobs run source:ics-cisa:fetch --and-then source:ics-cisa:parse --and-then source:ics-cisa:map
```
3. Confirm logs contain `ics-cisa detail fetch` entries and that new documents/DTOs include attachments (see `docs/artifacts/icscisa`). Canonical advisories should expose PDF links as `references.kind == "attachment"` and affected packages should surface `primitives.semVer.exactValue` for single-version hits.
4. If Akamai blocks direct fetches, set `concelier:sources:icscisa:proxyUri` to your allow-listed egress proxy and rerun the dry-run.
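For step 3, a quick way to spot-check exported canonical advisories is to filter their references by `kind`. The sketch below assumes each advisory is a JSON object with a `references` array whose entries carry `kind` and `url` fields. The `url` field name is an assumption; adjust it to the actual schema.

```python
import json
from typing import List


def attachment_urls(advisory_json: str) -> List[str]:
    """Return URLs of references whose kind is 'attachment' (e.g. PDF links)."""
    advisory = json.loads(advisory_json)
    return [
        ref.get("url", "")  # 'url' is an assumed field name
        for ref in advisory.get("references", [])
        if ref.get("kind") == "attachment"
    ]
```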
## 6. Rotation & Incident Response
- Review GovDelivery access quarterly. Rotate the personalised code whenever Ops changes the service mailbox password or membership.
- Revoking the subscription in GovDelivery invalidates the code immediately; update the vault and configuration in the same change.
- If the code leaks, remove the subscription (`https://public.govdelivery.com/accounts/USDHSCISA/subscriber/manage_preferences?code=<CODE>`), resubscribe, and distribute the new value via the vault.
## 7. Offline Kit Handling
Include the personalised code in `offline-kit/secrets/concelier/icscisa.env`:
```
CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=AB12CD34EF
```
The Offline Kit deployment script copies this file into the container secret directory mounted at `/run/secrets/concelier`. Ensure permissions are `600` and ownership matches the Concelier runtime user.
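A pre-flight check on the secrets file can catch the two usual mistakes: a missing key and loose permissions. The sketch assumes POSIX file modes and a plain `KEY=VALUE` file like the one above. The function names are hypothetical, and the check does not verify ownership.

```python
import stat
from pathlib import Path
from typing import Dict, List

REQUIRED_KEY = "CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE"


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blanks and '#' comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def preflight(path: Path) -> List[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems: List[str] = []
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode != 0o600:
        problems.append(f"permissions are {oct(mode)}, expected 0o600")
    if not read_env_file(path).get(REQUIRED_KEY):
        problems.append(f"{REQUIRED_KEY} is missing or empty")
    return problems
```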
## 8. Telemetry & Monitoring
The connector emits metrics under the meter `StellaOps.Concelier.Connector.Ics.Cisa`. They allow operators to track Akamai fallbacks, detail enrichment health, and advisory fan-out.
- `icscisa.fetch.*` counters for `attempts`, `success`, `failures`, `not_modified`, and `fallbacks`, plus histogram `icscisa.fetch.documents` showing documents added per topic pull (tags: `concelier.source`, `icscisa.topic`).
- `icscisa.parse.*` counters for `success`/`failures` and histograms `icscisa.parse.advisories`, `icscisa.parse.attachments`, `icscisa.parse.detail_fetches` to monitor enrichment workload per feed document.
- `icscisa.detail.*` counters `success` / `failures` per advisory (tagged with `icscisa.advisory`) to alert when Akamai blocks detail pages.
- `icscisa.map.*` counters for `success`/`failures` and histograms `icscisa.map.references`, `icscisa.map.packages`, `icscisa.map.aliases` capturing canonical fan-out.
Suggested alerts:
- `increase(icscisa_fetch_failures_total[15m]) > 0` or `increase(icscisa_fetch_fallbacks_total[15m]) > 5`: sustained Akamai or proxy issues.
- `increase(icscisa_detail_failures_total[30m]) > 0`: detail enrichment breaking (potential HTML layout change).
- `histogram_quantile(0.95, rate(icscisa_map_references_bucket[1h]))` trending sharply higher: a sudden advisory reference explosion worth investigating.
- Keep an eye on the shared HTTP metrics (`concelier_source_http_*` with `concelier_source="ics-cisa"`) for request latency and retry patterns.
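The alert expressions above rely on `increase()` over monotonically increasing counters. The sketch below shows the core idea so you can check the thresholds against raw samples: sum the positive deltas and treat any drop as a counter reset. It is simplified. Prometheus also extrapolates to the window edges, so real `increase()` values can be fractional and slightly larger.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase across ordered counter samples, handling resets."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value is the increase.
        total += current - previous if current >= previous else current
    return total


def fetch_alert(failures: Sequence[float], fallbacks: Sequence[float]) -> bool:
    """Mirror the suggested rule: any fetch failure, or more than 5 fallbacks, in the window."""
    return counter_increase(failures) > 0 or counter_increase(fallbacks) > 5
```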
## 9. Related Tasks
- `FEEDCONN-ICSCISA-02-009` (GovDelivery credential onboarding) — completed once this runbook is followed and secrets are placed in the vault.
- `FEEDCONN-ICSCISA-02-007` (document inventory) — archive the first successful RSS response and any attachment URL schema under `docs/artifacts/icscisa/`.


@@ -0,0 +1,74 @@
# Concelier KISA Connector Operations
Operational guidance for the Korea Internet & Security Agency (KISA / KNVD) connector (`source:kisa:*`). Pair this with the engineering brief in `docs/dev/kisa_connector_notes.md`.
## 1. Prerequisites
- Outbound HTTPS (or mirrored cache) for `https://knvd.krcert.or.kr/`.
- Connector options defined under `concelier:sources:kisa`:
```yaml
concelier:
  sources:
    kisa:
      feedUri: "https://knvd.krcert.or.kr/rss/securityInfo.do"
      detailApiUri: "https://knvd.krcert.or.kr/rssDetailData.do"
      detailPageUri: "https://knvd.krcert.or.kr/detailDos.do"
      maxAdvisoriesPerFetch: 10
      requestDelay: "00:00:01"
      failureBackoff: "00:05:00"
```
> Ensure the URIs stay absolute—Concelier adds the `feedUri`/`detailApiUri` hosts to the HttpClient allow-list automatically.
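The allow-list is derived from the configured URIs, so a relative or mistyped URI silently leaves its host off the list. Below is a minimal sketch of that derivation. It assumes host-level matching; the real Source.Common rules may differ, and the helper names are hypothetical.

```python
from typing import Iterable, Set
from urllib.parse import urlparse


def allowed_hosts(configured_uris: Iterable[str]) -> Set[str]:
    """Collect hostnames from absolute URIs, rejecting relative ones."""
    hosts: Set[str] = set()
    for uri in configured_uris:
        parsed = urlparse(uri)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            raise ValueError(f"URI must be absolute http(s): {uri!r}")
        hosts.add(parsed.hostname)
    return hosts


def is_allowed(url: str, hosts: Set[str]) -> bool:
    """True when the URL's host was registered from configuration."""
    return urlparse(url).hostname in hosts
```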
## 2. Staging Smoke Test
1. Restart the Concelier workers so the KISA options bind.
2. Run a full connector cycle:
- CLI: `stella db jobs run source:kisa:fetch --and-then source:kisa:parse --and-then source:kisa:map`
- REST: `POST /jobs/run { "kind": "source:kisa:fetch", "chain": ["source:kisa:parse", "source:kisa:map"] }`
3. Confirm telemetry (Meter `StellaOps.Concelier.Connector.Kisa`):
- `kisa.feed.success`, `kisa.feed.items`
- `kisa.detail.success` / `.failures`
- `kisa.parse.success` / `.failures`
- `kisa.map.success` / `.failures`
- `kisa.cursor.updates`
4. Inspect logs for structured entries:
- `KISA feed returned {ItemCount}`
- `KISA fetched detail for {Idx} … category={Category}`
- `KISA mapped advisory {AdvisoryId} (severity={Severity})`
- Absence of warnings such as `document missing GridFS payload`.
5. Validate MongoDB state:
- `raw_documents.metadata` has `kisa.idx`, `kisa.category`, `kisa.title`.
- DTO store contains `schemaVersion="kisa.detail.v1"`.
- Advisories include aliases (`IDX`, CVE) and `language="ko"`.
- `source_states` entry for `kisa` shows recent `cursor.lastFetchAt`.
## 3. Production Monitoring
- **Dashboards**: add the following Prometheus/OTEL expressions:
- `rate(kisa_feed_items_total[15m])` versus `rate(concelier_source_http_requests_total{concelier_source="kisa"}[15m])`
- `increase(kisa_detail_failures_total{reason!="empty-document"}[1h])`, alerting at `> 0`
- `increase(kisa_parse_failures_total[1h])` for storage/JSON issues
- `increase(kisa_map_failures_total[1h])` to flag schema drift
- `increase(kisa_cursor_updates_total[6h]) == 0` during active windows → warn
- **Alerts**: page when `rate(kisa_feed_success_total[2h]) == 0` while other connectors are active; back off for maintenance windows announced on `https://knvd.krcert.or.kr/`.
- **Logs**: watch for repeated warnings (`document missing`, `DTO missing`) or errors with reason tags `HttpRequestException`, `download`, `parse`, `map`.
## 4. Localisation Handling
- Hangul categories (for example `취약점정보`) flow into telemetry tags (`category=…`) and logs. Dashboards must render UTF-8 and avoid transliteration.
- HTML content is sanitised before storage; translation teams can consume the `ContentHtml` field safely.
- Advisory severity remains as provided by KISA (`High`, `Medium`, etc.). Map-level failures include the severity tag for filtering.
## 5. Fixture & Regression Maintenance
- Regression fixtures: `src/Concelier/__Tests/StellaOps.Concelier.Connector.Kisa.Tests/Fixtures/kisa-feed.xml` and `kisa-detail.json`.
- Refresh via `UPDATE_KISA_FIXTURES=1 dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Kisa.Tests/StellaOps.Concelier.Connector.Kisa.Tests.csproj`.
- The telemetry regression (`KisaConnectorTests.Telemetry_RecordsMetrics`) will fail if counters/log wiring drifts—treat failures as gating.
## 6. Known Issues
- RSS feeds only expose the latest 10 advisories; long outages require replay via archived feeds or manual IDX seeds.
- Detail endpoint occasionally throttles; the connector honours `requestDelay` and reports failures with reason `HttpRequestException`. Consider increasing delay for weekend backfills.
- If `kisa.category` tags suddenly appear as `unknown`, verify KISA has not renamed RSS elements; update the parser fixtures before production rollout.


@@ -0,0 +1,86 @@
# Concelier MSRC Connector Azure AD Onboarding Brief
_Drafted: 2025-10-15_
## 1. App registration requirements
- **Tenant**: shared StellaOps production Azure AD.
- **Application type**: confidential client (web/API) issuing client credentials.
- **API permissions**: `api://api.msrc.microsoft.com/.default` (Application). Admin consent required once.
- **Token audience**: `https://api.msrc.microsoft.com/`.
- **Grant type**: client credentials. Concelier will request tokens via `POST https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token`.
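For reference, the client-credentials exchange is a standard OAuth 2.0 form POST. The sketch builds that request with the standard library but does not send it, which makes it useful for manual troubleshooting; the connector requests tokens itself. The scope value comes from the API permission listed above, and the function name is illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build the Azure AD v2.0 client-credentials token request (not sent)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "api://api.msrc.microsoft.com/.default",
    }).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` returns the token endpoint's JSON response. Avoid logging the request body, since it contains the secret.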
## 2. Secret/credential policy
- Maintain two client secrets (primary + standby) rotating every 90 days.
- Store secrets in the Concelier secrets vault; Offline Kit deployments must mirror the secret payloads in their encrypted store.
- Record rotation cadence in Ops runbook and update Concelier configuration (`CONCELIER__SOURCES__VNDR__MSRC__CLIENTSECRET`) ahead of expiry.
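With secrets rotating every 90 days, it helps to compute when the next rotation falls due and when to start preparing it. The sketch below does that date arithmetic. The 14-day warning window is an arbitrary illustration, not a documented policy.

```python
from datetime import date, timedelta

ROTATION_PERIOD = timedelta(days=90)


def next_rotation(last_rotated: date) -> date:
    """Date on which a secret rotated on `last_rotated` must be replaced."""
    return last_rotated + ROTATION_PERIOD


def rotation_due(last_rotated: date, today: date, warn_days: int = 14) -> bool:
    """True once today is within `warn_days` of the 90-day rotation date."""
    return today >= next_rotation(last_rotated) - timedelta(days=warn_days)
```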
## 3. Concelier configuration sample
```yaml
concelier:
  sources:
    vndr.msrc:
      tenantId: "<azure-tenant-guid>"
      clientId: "<app-registration-client-id>"
      clientSecret: "<pull from secret store>"
      apiVersion: "2024-08-01"
      locale: "en-US"
      requestDelay: "00:00:00.250"
      failureBackoff: "00:05:00"
      cursorOverlapMinutes: 10
      downloadCvrf: false  # set true to persist CVRF ZIP alongside JSON detail
```
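Delay and backoff values use the .NET `TimeSpan` constant format (`[d.]hh:mm:ss[.fffffff]`). So `00:00:00.250` is 250 ms and `00:05:00` is five minutes. The small parser below covers those common forms and can help when scripting checks against config files. It deliberately ignores the other variants .NET accepts.

```python
import re
from datetime import timedelta

_TIMESPAN = re.compile(
    r"^(?:(?P<days>\d+)\.)?(?P<h>\d{1,2}):(?P<m>\d{2}):(?P<s>\d{2})(?:\.(?P<frac>\d{1,7}))?$"
)


def parse_timespan(value: str) -> timedelta:
    """Parse '[d.]hh:mm:ss[.fffffff]' into a timedelta."""
    match = _TIMESPAN.match(value.strip())
    if not match:
        raise ValueError(f"unsupported TimeSpan format: {value!r}")
    frac = match.group("frac") or "0"
    # Fractional digits are fractions of a second; pad to 7 digits (100 ns ticks),
    # then convert ticks to microseconds.
    microseconds = int(frac.ljust(7, "0")) // 10
    return timedelta(
        days=int(match.group("days") or 0),
        hours=int(match.group("h")),
        minutes=int(match.group("m")),
        seconds=int(match.group("s")),
        microseconds=microseconds,
    )
```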
## 4. CVRF artefacts
- The MSRC REST payload exposes `cvrfUrl` per advisory. Current connector persists the link as advisory metadata and reference; it does **not** download the ZIP by default.
- Ops should mirror CVRF ZIPs when preparing Offline Kits so air-gapped deployments can reconcile advisories without direct internet access.
- Once Offline Kit storage guidelines are finalised, extend the connector configuration with `downloadCvrf: true` to enable automatic attachment retrieval.
### 4.1 State seeding helper
Use `src/Tools/SourceStateSeeder` to queue historical advisories (detail JSON + optional CVRF artefacts) for replay without manual Mongo edits. Example seed file:
```json
{
"source": "vndr.msrc",
"cursor": {
"lastModifiedCursor": "2024-01-01T00:00:00Z"
},
"documents": [
{
"uri": "https://api.msrc.microsoft.com/sug/v2.0/vulnerability/ADV2024-0001",
"contentFile": "./seeds/adv2024-0001.json",
"contentType": "application/json",
"metadata": { "msrc.vulnerabilityId": "ADV2024-0001" },
"addToPendingDocuments": true
},
{
"uri": "https://download.microsoft.com/msrc/2024/ADV2024-0001.cvrf.zip",
"contentFile": "./seeds/adv2024-0001.cvrf.zip",
"contentType": "application/zip",
"status": "mapped",
"addToPendingDocuments": false
}
]
}
```
Run the helper:
```bash
dotnet run --project src/Tools/SourceStateSeeder -- \
--connection-string "mongodb://localhost:27017" \
--database concelier \
--input seeds/msrc-backfill.json
```
Any documents marked `addToPendingDocuments` will appear in the connector cursor; `downloadCvrf` can remain disabled if the ZIP artefact is pre-seeded.
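The seeder writes straight into connector state, so a quick structural check of the seed file before running it can save a cleanup. The sketch validates only the fields shown in the example above (`source`, `documents[].uri`, `contentFile`, `contentType`), and it insists on absolute `https` URIs. It is not part of `SourceStateSeeder`, and the tool may enforce different rules.

```python
import json
from typing import List
from urllib.parse import urlparse


def validate_seed(seed_text: str) -> List[str]:
    """Return human-readable problems found in a SourceStateSeeder input file."""
    seed = json.loads(seed_text)
    problems: List[str] = []
    if not seed.get("source"):
        problems.append("missing 'source'")
    for index, doc in enumerate(seed.get("documents", [])):
        uri = urlparse(doc.get("uri", ""))
        if uri.scheme != "https" or not uri.netloc:
            problems.append(f"documents[{index}].uri must be an absolute https URI")
        for field in ("contentFile", "contentType"):
            if not doc.get(field):
                problems.append(f"documents[{index}] missing '{field}'")
    return problems
```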
## 5. Outstanding items
- Ops to confirm tenant/app names and provide client credentials through the secure channel.
- Connector team monitors token cache health (already implemented); validate instrumentation once Ops supplies credentials.
- Offline Kit packaging: add encrypted blob containing client credentials with rotation instructions.


@@ -0,0 +1,48 @@
# NKCKI Connector Operations Guide
## Overview
The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.
## Configuration
Key options exposed through `concelier:sources:ru-nkcki:http`:
- `maxBulletinsPerFetch`: limits new bulletin downloads in a single run (default `5`).
- `maxListingPagesPerFetch`: maximum listing pages visited during pagination (default `3`).
- `listingCacheDuration`: minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`).
- `cacheDirectory`: optional path for persisted bulletin archives used during offline or failure scenarios.
- `requestDelay`: delay inserted between bulletin downloads to respect upstream politeness.
When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/concelier/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit.
## Telemetry
`RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Concelier.Connector.Ru.Nkcki`:
- `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures`
- `nkcki.listing.pages.visited` (histogram, `pages`)
- `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new`
- `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures`
- `nkcki.entries.processed` (histogram, `entries`)
Integrate these counters into standard Concelier observability dashboards to track crawl coverage and cache hit rates.
## Archive Backfill Strategy
Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy:
1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms.
2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`.
4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/concelier-connector-research-20251011.md`).
For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.
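The listing walk in steps 1 and 2 can be pictured as bounded pagination with de-duplication and a rolling window of known IDs. The sketch below illustrates that idea and is not the connector's code. `fetch_page` is a hypothetical stand-in for whatever downloads and parses one `?PAGEN_1=n` listing page. `max_pages` and `max_new` play the roles of `maxListingPagesPerFetch` and `maxBulletinsPerFetch`, and the window size is whatever `maxlen` you give the deque.

```python
from typing import Callable, Deque, Iterable, List


def discover_new_bulletins(
    fetch_page: Callable[[int], Iterable[str]],
    known: Deque[str],
    max_pages: int = 3,
    max_new: int = 5,
) -> List[str]:
    """Walk listing pages newest-first and return unseen bulletin IDs.

    `known` is a bounded deque (create it with maxlen=...) acting as the
    rolling window of already-processed bulletin IDs; new IDs are appended
    so the oldest entries fall out automatically.
    """
    new_ids: List[str] = []
    seen_this_run = set()
    for page in range(1, max_pages + 1):
        ids = list(fetch_page(page))  # e.g. GET <listing>?PAGEN_1=<page>
        if not ids:
            break  # ran past the last page
        for bulletin_id in ids:
            if bulletin_id in known or bulletin_id in seen_this_run:
                continue
            seen_this_run.add(bulletin_id)
            new_ids.append(bulletin_id)
            if len(new_ids) >= max_new:
                known.extend(new_ids)
                return new_ids
    known.extend(new_ids)
    return new_ids
```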
## Failure Handling
- Listing failures mark the source state with exponential backoff while attempting cache replay.
- Bulletin fetches fall back to cached copies before surfacing an error.
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`src/Tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros.
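The listing backoff described in the first bullet is exponential. The sketch below shows how the delay grows after each consecutive failure. It uses a hypothetical base of one minute and a one-hour cap; the connector's actual values are not documented here.

```python
from datetime import timedelta


def backoff_delay(consecutive_failures: int,
                  base: timedelta = timedelta(minutes=1),
                  cap: timedelta = timedelta(hours=1)) -> timedelta:
    """Double the delay for each consecutive failure, never exceeding `cap`."""
    if consecutive_failures <= 0:
        return timedelta(0)
    # Clamp the exponent so very long outages cannot overflow timedelta.
    exponent = min(consecutive_failures - 1, 30)
    return min(base * (2 ** exponent), cap)
```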
Refer to `ru-nkcki` entries in `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md` for outstanding items.


@@ -0,0 +1,24 @@
# Concelier OSV Connector Operations Notes
_Last updated: 2025-10-16_
The OSV connector ingests advisories from OSV.dev across OSS ecosystems. This note highlights the additional merge/export expectations introduced with the canonical metric fallback work in Sprint 4.
## 1. Canonical metric fallbacks
- When OSV omits CVSS vectors (common for CVSS v4-only payloads) the mapper now emits a deterministic canonical metric id in the form `osv:severity/<level>` and normalises the advisory severity to the same `<level>`.
- Metric: `osv.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `ecosystem`, `reason=no_cvss`. Watch this alongside merge parity dashboards to catch spikes where OSV publishes severity-only advisories.
- Merge precedence still prefers GHSA over OSV; the shared severity-based canonical id keeps Merge/export parity deterministic even when only OSV supplies severity data.
## 2. CWE provenance
- `database_specific.cwe_ids` now populates provenance decision reasons for every mapped weakness. Expect `decisionReason="database_specific.cwe_ids"` on OSV weakness provenance and confirm exporters preserve the value.
- If OSV ever attaches `database_specific.cwe_notes`, the connector will surface the joined note string in `decisionReason` instead of the default marker.
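The decision-reason rule fits in a few lines. The sketch assumes `cwe_notes` is a list of strings and joins the notes with `"; "`. The connector's actual separator is not stated here, so treat that value as a placeholder.

```python
from typing import Any, Mapping

DEFAULT_REASON = "database_specific.cwe_ids"


def cwe_decision_reason(database_specific: Mapping[str, Any], separator: str = "; ") -> str:
    """Pick the provenance decisionReason for OSV weaknesses."""
    notes = [
        note.strip()
        for note in (database_specific.get("cwe_notes") or [])
        if note and note.strip()
    ]
    return separator.join(notes) if notes else DEFAULT_REASON
```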
## 3. Dashboards & alerts
- Extend existing merge dashboards with the new counter:
- Overlay `sum(osv_map_canonical_metric_fallbacks_total{ecosystem=~".+"})` with Merge severity overrides to confirm fallback advisories are reconciling cleanly.
- Alert when the 1-hour sum exceeds 50 for any ecosystem; baseline volume is currently <5 per day (mostly GHSA mirrors emitting CVSS v4 only).
- Exporters already surface `canonicalMetricId`; no schema change is required, but ORAS/Trivy bundles should be spot-checked after deploying the connector update.
## 4. Runbook updates
- Fixture parity suites (`osv-ghsa.*`) now assert the fallback id and provenance notes. Regenerate via `dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj`.
- When investigating merge severity conflicts, include the fallback counter and confirm OSV advisories carry the expected `osv:severity/<level>` id before raising connector bugs.


@@ -0,0 +1,238 @@
# Concelier & Excititor Mirror Operations
This runbook describes how StellaOps operates the managed mirrors under `*.stella-ops.org`.
It covers Docker Compose and Helm deployment overlays, secret handling for multi-tenant
authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles current.
## 1. Prerequisites
- **Authority access**: client credentials (`client_id` + secret) authorised for
`concelier.mirror.read` and `excititor.mirror.read` scopes. Secrets live outside git.
- **Signed TLS certificates**: wildcard or per-domain (`mirror-primary`, `mirror-community`).
Store them under `deploy/compose/mirror-gateway/tls/` or in Kubernetes secrets.
- **Mirror gateway credentials**: Basic Auth htpasswd files per domain. Generate with
`htpasswd -B`. Operators distribute credentials to downstream consumers.
- **Export artifact source**: read access to the canonical S3 buckets (or rsync share)
that hold `concelier` JSON bundles and `excititor` VEX exports.
- **Persistent volumes**: storage for Concelier job metadata and mirror export trees.
For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`,
`excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout.
### 1.1 Service configuration quick reference
Concelier.WebService exposes the mirror HTTP endpoints once `CONCELIER__MIRROR__ENABLED=true`.
Key knobs:
- `CONCELIER__MIRROR__EXPORTROOT`: root folder containing export snapshots (`<exportId>/mirror/*`).
- `CONCELIER__MIRROR__ACTIVEEXPORTID`: optional explicit export id; otherwise the service auto-falls back to the `latest/` symlink or newest directory.
- `CONCELIER__MIRROR__REQUIREAUTHENTICATION`: default auth requirement; override per domain with `CONCELIER__MIRROR__DOMAINS__{n}__REQUIREAUTHENTICATION`.
- `CONCELIER__MIRROR__MAXINDEXREQUESTSPERHOUR`: budget for `/concelier/exports/index.json`. Domains inherit this value unless they define `__MAXDOWNLOADREQUESTSPERHOUR`.
- `CONCELIER__MIRROR__DOMAINS__{n}__ID`: domain identifier matching the exporter manifest; additional keys configure display name and rate budgets.
> The service honours Stella Ops Authority when `CONCELIER__AUTHORITY__ENABLED=true` and `ALLOWANONYMOUSFALLBACK=false`. Use the bypass CIDR list (`CONCELIER__AUTHORITY__BYPASSNETWORKS__*`) for in-cluster ingress gateways that terminate Basic Auth. Unauthorized requests emit `WWW-Authenticate: Bearer` so downstream automation can detect token failures.
Mirror responses carry deterministic cache headers: `/index.json` returns `Cache-Control: public, max-age=60`, while per-domain manifests/bundles include `Cache-Control: public, max-age=300, immutable`. Rate limiting surfaces `Retry-After` when quotas are exceeded.
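`Retry-After` may legally carry either a number of seconds or an HTTP-date (RFC 9110, section 10.2.3). Clients that honour the mirror's rate limiting should therefore accept both forms. Below is a standard-library sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value to a non-negative delay in seconds."""
    value = value.strip()
    if value.isdigit():
        return float(value)
    retry_at = parsedate_to_datetime(value)  # raises on malformed input
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())
```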
### 1.2 Mirror connector configuration
Downstream Concelier instances ingest published bundles using the `StellaOpsMirrorConnector`. Operators running the connector in air-gapped or limited-connectivity environments can tune the following options (environment prefix `CONCELIER__SOURCES__STELLAOPSMIRROR__`):
- `BASEADDRESS`: absolute mirror root (e.g., `https://mirror-primary.stella-ops.org`).
- `INDEXPATH`: relative path to the mirror index (`/concelier/exports/index.json` by default).
- `DOMAINID`: mirror domain identifier from the index (`primary`, `community`, etc.).
- `HTTPTIMEOUT`: request timeout; raise when mirrors sit behind slow WAN links.
- `SIGNATURE__ENABLED`: require detached JWS verification for `bundle.json`.
- `SIGNATURE__KEYID` / `SIGNATURE__PROVIDER`: expected signing key metadata.
- `SIGNATURE__PUBLICKEYPATH`: PEM fallback used when the mirror key registry is offline.
The connector keeps a per-export fingerprint (bundle digest + generated-at timestamp) and tracks outstanding document IDs. If a scan is interrupted, the next run resumes parse/map work using the stored fingerprint and pending document lists—no network requests are reissued unless the upstream digest changes.
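The resume rule can be pictured as a comparison of fingerprints built from the bundle digest and its generated-at timestamp. The sketch below is illustrative only. The `sha256:` prefix, the `|` separator, and the `MirrorState` shape are assumptions, not the connector's storage format.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def export_fingerprint(bundle_bytes: bytes, generated_at: str) -> str:
    """Combine the bundle digest with its generated-at timestamp."""
    digest = hashlib.sha256(bundle_bytes).hexdigest()
    return f"sha256:{digest}|{generated_at}"


@dataclass
class MirrorState:
    fingerprint: str = ""
    pending_documents: List[str] = field(default_factory=list)


def plan_run(state: MirrorState, upstream_fingerprint: str) -> str:
    """Decide whether to download the bundle again or resume pending work."""
    if upstream_fingerprint == state.fingerprint:
        # Same export as last time: no network fetch, just finish parse/map work.
        return "resume" if state.pending_documents else "noop"
    return "download"
```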
## 2. Secret & certificate layout
### Docker Compose (`deploy/compose/docker-compose.mirror.yaml`)
- `deploy/compose/env/mirror.env.example`: copy to `mirror.env` (see §3.1) and adjust quotas or domain IDs.
- `deploy/compose/mirror-secrets/`: mount read-only into `/run/secrets`. Place:
- `concelier-authority-client`: Authority client secret.
- `excititor-authority-client` (optional): reserve for future authn.
- `deploy/compose/mirror-gateway/tls/`: PEM-encoded cert/key pairs:
- `mirror-primary.crt`, `mirror-primary.key`
- `mirror-community.crt`, `mirror-community.key`
- `deploy/compose/mirror-gateway/secrets/`: htpasswd files:
- `mirror-primary.htpasswd`
- `mirror-community.htpasswd`
### Helm (`deploy/helm/stellaops/values-mirror.yaml`)
Create secrets in the target namespace:
```bash
kubectl create secret generic concelier-mirror-auth \
--from-file=concelier-authority-client=concelier-authority-client
kubectl create secret generic excititor-mirror-auth \
--from-file=excititor-authority-client=excititor-authority-client
kubectl create secret tls mirror-gateway-tls \
--cert=mirror-primary.crt --key=mirror-primary.key
kubectl create secret generic mirror-gateway-htpasswd \
--from-file=mirror-primary.htpasswd --from-file=mirror-community.htpasswd
```
> Keep Basic Auth lists short-lived (rotate quarterly) and document credential recipients.
## 3. Deployment
### 3.1 Docker Compose (edge mirrors, lab validation)
1. `cp deploy/compose/env/mirror.env.example deploy/compose/env/mirror.env`
2. Populate secrets/tls directories as described above.
3. Sync mirror bundles (see §4) into `deploy/compose/mirror-data/…` and ensure they are mounted
on the host path backing the `concelier-exports` and `excititor-exports` volumes.
4. Run the profile validator: `deploy/tools/validate-profiles.sh`.
5. Launch from `deploy/compose`: `docker compose --env-file env/mirror.env -f docker-compose.mirror.yaml up -d`.
### 3.2 Helm (production mirrors)
1. Provision PVCs sized for mirror bundles (baseline: 20 GiB per domain).
2. Create the secrets described in §2.
3. `helm upgrade --install mirror deploy/helm/stellaops -f deploy/helm/stellaops/values-mirror.yaml`.
4. Annotate the `stellaops-mirror-gateway` service with ingress/LoadBalancer metadata required by
your CDN (e.g., AWS load balancer scheme internal + NLB idle timeout).
## 4. Artifact sync workflow
Mirrors never generate exports—they ingest signed bundles produced by the Concelier and Excititor
export jobs. Recommended sync pattern:
### 4.1 Compose host (systemd timer)
`/usr/local/bin/mirror-sync.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail
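# Credentials are expected in the environment (for example, the timer's EnvironmentFile); fail fast if missing.
: "${AWS_ACCESS_KEY_ID:?AWS_ACCESS_KEY_ID must be set}"
: "${AWS_SECRET_ACCESS_KEY:?AWS_SECRET_ACCESS_KEY must be set}"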
aws s3 sync s3://mirror-stellaops/concelier/latest \
/opt/stellaops/mirror-data/concelier --delete --size-only
aws s3 sync s3://mirror-stellaops/excititor/latest \
/opt/stellaops/mirror-data/excititor --delete --size-only
```
Schedule with a systemd timer every 5 minutes. The Compose volumes mount `/opt/stellaops/mirror-data/*`
into the containers read-only, matching `CONCELIER__MIRROR__EXPORTROOT=/exports/json` and
`EXCITITOR__ARTIFACTS__FILESYSTEM__ROOT=/exports`.
### 4.2 Kubernetes (CronJob)
Create a CronJob running the AWS CLI (or rclone) in the same namespace, writing into the PVCs:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mirror-sync
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: sync
              image: public.ecr.aws/aws-cli/aws-cli@sha256:5df5f52c29f5e3ba46d0ad9e0e3afc98701c4a0f879400b4c5f80d943b5fadea
              command:
                - /bin/sh
                - -c
                - >
                  aws s3 sync s3://mirror-stellaops/concelier/latest /exports/concelier --delete --size-only &&
                  aws s3 sync s3://mirror-stellaops/excititor/latest /exports/excititor --delete --size-only
              volumeMounts:
                - name: concelier-exports
                  mountPath: /exports/concelier
                - name: excititor-exports
                  mountPath: /exports/excititor
              envFrom:
                - secretRef:
                    name: mirror-sync-aws
          restartPolicy: OnFailure
          volumes:
            - name: concelier-exports
              persistentVolumeClaim:
                claimName: concelier-mirror-exports
            - name: excititor-exports
              persistentVolumeClaim:
                claimName: excititor-mirror-exports
```
## 5. CDN integration
1. Point the CDN origin at the mirror gateway (Compose host or Kubernetes LoadBalancer).
2. Honour the response headers emitted by the gateway and Concelier/Excititor:
`Cache-Control: public, max-age=300, immutable` for mirror payloads.
3. Configure origin shields in the CDN to prevent cache stampedes. Recommended TTLs:
- Index (`/concelier/exports/index.json`, `/excititor/mirror/*/index`) → 60s.
- Bundle/manifest payloads → 300s.
4. Forward the `Authorization` header—Basic Auth terminates at the gateway.
5. Enforce per-domain rate limits at the CDN (matching gateway budgets) and enable logging
to SIEM for anomaly detection.
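When you translate the recommended TTLs into CDN rules, a path classifier like the one below keeps index and payload paths distinct. The glob patterns are assumptions based on the paths named above, so check them against your CDN's rule syntax. Note that `fnmatch`'s `*` also matches `/`.

```python
from fnmatch import fnmatchcase

INDEX_PATTERNS = ("/concelier/exports/index.json", "/excititor/mirror/*/index")


def cdn_ttl_seconds(path: str) -> int:
    """Return 60s for index documents and 300s for bundle/manifest payloads."""
    if any(fnmatchcase(path, pattern) for pattern in INDEX_PATTERNS):
        return 60
    return 300
```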
## 6. Smoke tests
After each deployment or sync cycle (temporarily set low budgets if you need to observe 429 responses):
```bash
# Index with Basic Auth
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/index.json | jq 'keys'
# Mirror manifest signature and cache headers
curl -u $PRIMARY_CREDS -I https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/manifest.json \
| tee /tmp/manifest-headers.txt
grep -E '^Cache-Control: ' /tmp/manifest-headers.txt # expect public, max-age=300, immutable
# Excititor consensus bundle metadata
curl -u $COMMUNITY_CREDS https://mirror-community.stella-ops.org/excititor/mirror/community/index \
| jq '.exports[].exportKey'
# Signed bundle + detached JWS (spot check digests)
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json \
-o bundle.json
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json.jws \
-o bundle.json.jws
cosign verify-blob --signature bundle.json.jws --key mirror-key.pub bundle.json
# Service-level auth check (inside cluster no gateway credentials)
kubectl exec deploy/stellaops-concelier -- curl -si http://localhost:8443/concelier/exports/mirror/primary/manifest.json \
| head -n 5 # expect HTTP/1.1 401 with WWW-Authenticate: Bearer
# Rate limit smoke (repeat quickly; second call should return 429 + Retry-After)
for i in 1 2; do
curl -s -o /dev/null -D - https://mirror-primary.stella-ops.org/concelier/exports/index.json \
-u $PRIMARY_CREDS | grep -E '^(HTTP/|Retry-After:)'
sleep 1
done
```
Watch the gateway metrics (`nginx_vts` or access logs) for cache hits. In Kubernetes, `kubectl logs deploy/stellaops-mirror-gateway`
should show `X-Cache-Status: HIT/MISS`.
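The `grep` in the smoke tests only checks that the `Cache-Control` header exists. To assert on the directives themselves, in any order, you can drop a small parser like the one below into a test harness. It handles the simple `token` and `token=value` forms used here, not every quoting rule in RFC 9111.

```python
from typing import Dict, Optional


def parse_cache_control(header_value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into lowercase directive names and optional values."""
    directives: Dict[str, Optional[str]] = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def is_immutable_payload(header_value: str) -> bool:
    """True for 'public, max-age=300, immutable' in any order."""
    directives = parse_cache_control(header_value)
    return "public" in directives and "immutable" in directives and directives.get("max-age") == "300"
```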
## 7. Maintenance & rotation
- **Bundle freshness**: alert if sync job lag exceeds 15 minutes or if `concelier` logs
`Mirror export root is not configured`.
- **Secret rotation**: change Authority client secrets and Basic Auth credentials quarterly.
Update the mounted secrets and restart deployments (`docker compose restart concelier` or
`kubectl rollout restart deploy/stellaops-concelier`).
- **TLS renewal**: reissue certificates, place new files, and reload gateway (`docker compose exec mirror-gateway nginx -s reload`).
- **Quota tuning**: adjust per-domain `MAXDOWNLOADREQUESTSPERHOUR` in `.env` or values file.
Align CDN rate limits and inform downstreams.
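The bundle-freshness alert needs a lag figure. With `--size-only`, unchanged files are not rewritten, so file timestamps show when the data last changed rather than when the last sync ran. One option is to have the sync script touch a marker file after each successful run and alert on the marker's age. The sketch below assumes such a marker (the `.last-sync` name is hypothetical) and applies the 15-minute budget.

```python
import time
from pathlib import Path
from typing import Optional

MAX_LAG_SECONDS = 15 * 60


def sync_lag_seconds(marker: Path, now: Optional[float] = None) -> float:
    """Seconds since the marker file was last touched by a successful sync."""
    now = time.time() if now is None else now
    return max(0.0, now - marker.stat().st_mtime)


def is_stale(marker: Path, now: Optional[float] = None) -> bool:
    """True when the marker is missing or older than the 15-minute budget."""
    if not marker.exists():
        return True
    return sync_lag_seconds(marker, now) > MAX_LAG_SECONDS
```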
## 8. References
- Deployment profiles: `deploy/compose/docker-compose.mirror.yaml`,
`deploy/helm/stellaops/values-mirror.yaml`
- Mirror architecture dossiers: `docs/modules/concelier/architecture.md`,
`docs/modules/excititor/mirrors.md`
- Export bundling: `docs/modules/devops/architecture.md` §3, `docs/modules/excititor/architecture.md` §7