feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules
- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
docs/modules/concelier/AGENTS.md (new file, 22 lines)
							| @@ -0,0 +1,22 @@ | ||||
| # Concelier agent guide | ||||
|  | ||||
| ## Mission | ||||
| Concelier ingests signed advisories from dozens of sources and converts them into immutable observations plus linksets under the Aggregation-Only Contract (AOC). | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes through the main /AGENTS.md and the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
docs/modules/concelier/README.md (new file, 36 lines)
							| @@ -0,0 +1,36 @@ | ||||
| # StellaOps Concelier | ||||
|  | ||||
| Concelier ingests signed advisories from dozens of sources and converts them into immutable observations plus linksets under the Aggregation-Only Contract (AOC). | ||||
|  | ||||
| ## Responsibilities | ||||
| - Fetch and normalise vulnerability advisories via restart-time connectors. | ||||
| - Persist observations and correlation linksets without precedence decisions. | ||||
| - Emit deterministic exports (JSON, Trivy DB) for downstream policy evaluation. | ||||
| - Coordinate offline/air-gap updates via Offline Kit bundles. | ||||
|  | ||||
| ## Key components | ||||
| - `StellaOps.Concelier.WebService` orchestration host. | ||||
| - Connector libraries under `StellaOps.Concelier.Connector.*`. | ||||
| - Exporter packages (`StellaOps.Concelier.Exporter.*`). | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - MongoDB for canonical observations and schedules. | ||||
| - Policy Engine / Export Center / CLI for evidence consumption. | ||||
| - Notify and UI for advisory deltas. | ||||
|  | ||||
| ## Operational notes | ||||
| - Connector runbooks in ./operations/connectors/. | ||||
| - Mirror operations for Offline Kit parity. | ||||
| - Grafana dashboards for connector health. | ||||
|  | ||||
| ## Related resources | ||||
| - ./operations/conflict-resolution.md | ||||
| - ./operations/mirror.md | ||||
|  | ||||
| ## Backlog references | ||||
| - DOCS-LNM-22-001, DOCS-LNM-22-007 in ../../TASKS.md. | ||||
| - Connector-specific TODOs in `src/Concelier/**/TASKS.md`. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 1 – AOC enforcement:** uphold raw observation invariants, provenance requirements, linkset-only enrichment, and AOC verifier guardrails across every connector. | ||||
| - **Epic 10 – Export Center:** expose deterministic advisory exports and metadata required by JSON/Trivy/mirror bundles. | ||||
docs/modules/concelier/TASKS.md (new file, 9 lines)
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — Concelier | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | CONCELIER-DOCS-0001 | DOING (2025-10-29) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | CONCELIER-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | CONCELIER-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
docs/modules/concelier/architecture.md (new file, 600 lines)
							| @@ -0,0 +1,600 @@ | ||||
| # component_architecture_concelier.md — **Stella Ops Concelier** (Sprint 22) | ||||
|  | ||||
| > Derived from Epic 1 – AOC enforcement and aligned with the Export Center evidence interfaces first scoped in Epic 10. | ||||
|  | ||||
| > **Scope.** Implementation-ready architecture for **Concelier**: the advisory ingestion and Link-Not-Merge (LNM) observation pipeline that produces deterministic raw observations, correlation linksets, and evidence events consumed by Policy Engine, Console, CLI, and Export Center. Covers domain models, connectors, observation/linkset builders, storage schema, events, APIs, performance, security, and test matrices. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Mission & boundaries | ||||
|  | ||||
| **Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), persist them as immutable **observations** under the Aggregation-Only Contract (AOC), construct **linksets** that correlate observations without merging or precedence, and export deterministic evidence bundles (JSON, Trivy DB, Offline Kit) for downstream policy evaluation and operator tooling. | ||||
|  | ||||
| **Boundaries.** | ||||
|  | ||||
| * Concelier **does not** sign with private keys. When attestation is required, the export artifact is handed to the **Signer**/**Attestor** pipeline (out‑of‑process). | ||||
| * Concelier **does not** decide PASS/FAIL; it provides data to the **Policy** engine. | ||||
| * Online operation is **allowlist‑only**; air‑gapped deployments use the **Offline Kit**. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Aggregation-Only Contract guardrails | ||||
|  | ||||
| **Epic 1 distilled** — the service itself is the enforcement point for AOC. The guardrail checklist is embedded in code (`AOCWriteGuard`) and must be satisfied before any advisory hits Mongo: | ||||
|  | ||||
| 1. **No derived semantics in ingestion.** The DTOs produced by connectors cannot contain severity, consensus, reachability, merged status, or fix hints. Roslyn analyzers (`StellaOps.AOC.Analyzers`) scan connectors and fail builds if forbidden properties appear. | ||||
| 2. **Immutable raw docs.** Every upstream advisory is persisted in `advisory_raw` with append-only semantics. Revisions produce new `_id`s via version suffix (`:v2`, `:v3`), linking back through `supersedes`. | ||||
| 3. **Mandatory provenance.** Collectors record `source`, `upstream` metadata (`document_version`, `fetched_at`, `received_at`, `content_hash`), and signature presence before writing. | ||||
| 4. **Linkset only.** Derived joins (aliases, PURLs, CPEs, references) are stored inside `linkset` and never mutate `content.raw`. | ||||
| 5. **Deterministic canonicalisation.** Writers use canonical JSON (sorted object keys, lexicographic arrays) ensuring identical inputs yield the same hashes/diff-friendly outputs. | ||||
| 6. **Idempotent upserts.** `(source.vendor, upstream.upstream_id, upstream.content_hash)` uniquely identify a document. Duplicate hashes short-circuit; new hashes create a new version (see the hashing sketch after this checklist). | ||||
| 7. **Verifier & CI.** `StellaOps.AOC.Verifier` processes observation batches in CI and at runtime, rejecting writes lacking provenance, introducing unordered collections, or violating the schema. | ||||
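|  | ||||
| Items 5 and 6 of the checklist rest on deterministic hashing. The sketch below is illustrative only: the helper names are hypothetical, it canonicalises by sorting object keys, and the real `AOCWriteGuard` additionally orders arrays and validates provenance before computing the upsert key. | ||||
|  | ||||
| ```csharp | ||||
| // Minimal sketch of items 5-6: canonical JSON -> SHA-256 content hash -> versioned _id. | ||||
| // CanonicalJson / ObservationId are illustrative names, not the actual AOCWriteGuard API. | ||||
| using System; | ||||
| using System.Collections.Generic; | ||||
| using System.Linq; | ||||
| using System.Security.Cryptography; | ||||
| using System.Text; | ||||
| using System.Text.Json; | ||||
| using System.Text.Json.Nodes; | ||||
|  | ||||
| static class CanonicalJson | ||||
| { | ||||
|     // Re-serialise with object keys sorted so identical upstream payloads hash identically. | ||||
|     public static string Canonicalize(string rawJson) => | ||||
|         Sort(JsonNode.Parse(rawJson))!.ToJsonString(new JsonSerializerOptions { WriteIndented = false }); | ||||
|  | ||||
|     private static JsonNode? Sort(JsonNode? node) => node switch | ||||
|     { | ||||
|         null => null, | ||||
|         JsonObject obj => new JsonObject(obj.OrderBy(p => p.Key, StringComparer.Ordinal) | ||||
|                                             .Select(p => KeyValuePair.Create(p.Key, Sort(p.Value)))), | ||||
|         JsonArray arr => new JsonArray(arr.Select(Sort).ToArray()), | ||||
|         _ => JsonNode.Parse(node.ToJsonString())   // clone leaf values as-is | ||||
|     }; | ||||
|  | ||||
|     public static string Sha256Hex(string canonical) => | ||||
|         "sha256:" + Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonical))).ToLowerInvariant(); | ||||
| } | ||||
|  | ||||
| static class ObservationId | ||||
| { | ||||
|     // (source.vendor, upstream_id, content_hash) decide idempotency; a changed hash bumps the :vN suffix. | ||||
|     public static string Compute(string vendor, string upstreamId, int revision) => | ||||
|         $"advisory_raw:{vendor.ToLowerInvariant()}:{upstreamId}:v{revision}"; | ||||
| } | ||||
| ``` | ||||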
|  | ||||
| ### 1.1 Advisory raw document shape | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "_id": "advisory_raw:osv:GHSA-xxxx-....:v3", | ||||
|   "source": { | ||||
|     "vendor": "OSV", | ||||
|     "stream": "github", | ||||
|     "api": "https://api.osv.dev/v1/.../GHSA-...", | ||||
|     "collector_version": "concelier/1.7.3" | ||||
|   }, | ||||
|   "upstream": { | ||||
|     "upstream_id": "GHSA-xxxx-....", | ||||
|     "document_version": "2025-09-01T12:13:14Z", | ||||
|     "fetched_at": "2025-09-01T13:04:05Z", | ||||
|     "received_at": "2025-09-01T13:04:06Z", | ||||
|     "content_hash": "sha256:...", | ||||
|     "signature": { | ||||
|       "present": true, | ||||
|       "format": "dsse", | ||||
|       "key_id": "rekor:.../key/abc", | ||||
|       "sig": "base64..." | ||||
|     } | ||||
|   }, | ||||
|   "content": { | ||||
|     "format": "OSV", | ||||
|     "spec_version": "1.6", | ||||
|     "raw": { /* unmodified upstream document */ } | ||||
|   }, | ||||
|   "identifiers": { | ||||
|     "cve": ["CVE-2025-12345"], | ||||
|     "ghsa": ["GHSA-xxxx-...."], | ||||
|     "aliases": ["CVE-2025-12345", "GHSA-xxxx-...."] | ||||
|   }, | ||||
|   "linkset": { | ||||
|     "purls": ["pkg:npm/lodash@4.17.21"], | ||||
|     "cpes": ["cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:*:*:*"], | ||||
|     "references": [ | ||||
|       {"type":"advisory","url":"https://..."}, | ||||
|       {"type":"fix","url":"https://..."} | ||||
|     ], | ||||
|     "reconciled_from": ["content.raw.affected.ranges", "content.raw.pkg"] | ||||
|   }, | ||||
|   "supersedes": "advisory_raw:osv:GHSA-xxxx-....:v2", | ||||
|   "tenant": "default" | ||||
| } | ||||
| ``` | ||||
|  | ||||
| ### 1.2 Connector lifecycle | ||||
|  | ||||
| 1. **Snapshot stage** — connectors fetch signed feeds or use offline mirrors keyed by `{vendor, stream, snapshot_date}`. | ||||
| 2. **Parse stage** — upstream payloads are normalised into strongly-typed DTOs with UTC timestamps. | ||||
| 3. **Guard stage** — DTOs run through `AOCWriteGuard` performing schema validation, forbidden-field checks, provenance validation, deterministic sorting, and `_id` computation. | ||||
| 4. **Write stage** — append-only Mongo insert; duplicate hash is ignored, changed hash creates a new version and emits `supersedes` pointer. | ||||
| 5. **Event stage** — DSSE-backed events `advisory.observation.updated` and `advisory.linkset.updated` notify downstream services (Policy, Export Center, CLI). | ||||
|  | ||||
| ### 1.3 Export readiness | ||||
|  | ||||
| Concelier feeds Export Center profiles (Epic 10) by: | ||||
|  | ||||
| - Maintaining canonical JSON exports with deterministic manifests (`export.json`) listing content hashes, counts, and `supersedes` chains. | ||||
| - Producing Trivy DB-compatible artifacts (SQLite + metadata) packaged under `db/` with hash manifests. | ||||
| - Surfacing mirror manifests that reference Mongo snapshot digests, enabling Offline Kit bundle verification. | ||||
|  | ||||
| Running the same export job twice against the same snapshot must yield byte-identical archives and manifest hashes. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2) Topology & processes | ||||
|  | ||||
| **Process shape:** single ASP.NET Core service `StellaOps.Concelier.WebService` hosting: | ||||
|  | ||||
| * **Scheduler** with distributed locks (Mongo backed). | ||||
| * **Connectors** (fetch/parse/map) that emit immutable observation candidates. | ||||
| * **Observation writer** enforcing AOC invariants via `AOCWriteGuard`. | ||||
| * **Linkset builder** that correlates observations into `advisory_linksets` and annotates conflicts. | ||||
| * **Event publisher** emitting `advisory.observation.updated` and `advisory.linkset.updated` messages. | ||||
| * **Exporters** (JSON, Trivy DB, Offline Kit slices) fed from observation/linkset stores. | ||||
| * **Minimal REST** for health/status/trigger/export and observation/linkset reads. | ||||
|  | ||||
| **Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter. | ||||
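|  | ||||
| A lease-style record in the `locks` collection (section 7) is enough to keep replicas from running the same source or exporter job concurrently. The sketch below shows one way to acquire such a lease with the MongoDB .NET driver; the `JobLock` shape and the duplicate-key fallback are assumptions, and heartbeat renewal plus TTL cleanup are omitted. | ||||
|  | ||||
| ```csharp | ||||
| // Sketch only: acquire a per-job lease by upserting into the `locks` collection. | ||||
| // Field names mirror the storage schema in this document; the class itself is hypothetical. | ||||
| using System; | ||||
| using System.Threading; | ||||
| using System.Threading.Tasks; | ||||
| using MongoDB.Driver; | ||||
|  | ||||
| public sealed record JobLock(string Id, string Holder, DateTime AcquiredAt, DateTime HeartbeatAt, long LeaseMs); | ||||
|  | ||||
| public static class JobLocks | ||||
| { | ||||
|     public static async Task<bool> TryAcquireAsync( | ||||
|         IMongoCollection<JobLock> locks, string jobKey, string holder, TimeSpan lease, CancellationToken ct) | ||||
|     { | ||||
|         var now = DateTime.UtcNow; | ||||
|         var expiredBefore = now - lease; | ||||
|  | ||||
|         // Succeed only when the lock is absent or its last heartbeat is older than the lease window. | ||||
|         var filter = Builders<JobLock>.Filter.Eq(l => l.Id, jobKey) | ||||
|                    & Builders<JobLock>.Filter.Lt(l => l.HeartbeatAt, expiredBefore); | ||||
|         var update = Builders<JobLock>.Update | ||||
|             .Set(l => l.Holder, holder) | ||||
|             .Set(l => l.AcquiredAt, now) | ||||
|             .Set(l => l.HeartbeatAt, now) | ||||
|             .Set(l => l.LeaseMs, (long)lease.TotalMilliseconds); | ||||
|  | ||||
|         try | ||||
|         { | ||||
|             await locks.UpdateOneAsync(filter, update, new UpdateOptions { IsUpsert = true }, ct); | ||||
|             return true; | ||||
|         } | ||||
|         catch (MongoWriteException ex) when (ex.WriteError?.Category == ServerErrorCategory.DuplicateKey) | ||||
|         { | ||||
|             return false;   // another replica holds a fresh lease | ||||
|         } | ||||
|     } | ||||
| } | ||||
| ``` | ||||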
|  | ||||
| --- | ||||
|  | ||||
| ## 3) Canonical domain model | ||||
|  | ||||
| > Stored in MongoDB (database `concelier`), serialized with a **canonical JSON** writer (stable order, camelCase, normalized timestamps). | ||||
|  | ||||
| ### 3.1 Core entities | ||||
|  | ||||
| #### AdvisoryObservation | ||||
|  | ||||
| ```jsonc | ||||
| observationId       // deterministic id: {tenant}:{source.vendor}:{upstreamId}:{revision} | ||||
| tenant              // issuing tenant (lower-case) | ||||
| source{ | ||||
|     vendor, stream, api, collectorVersion | ||||
| } | ||||
| upstream{ | ||||
|     upstreamId, documentVersion, fetchedAt, receivedAt, | ||||
|     contentHash, signature{present, format?, keyId?, signature?} | ||||
| } | ||||
| content{ | ||||
|     format, specVersion, raw, metadata? | ||||
| } | ||||
| identifiers{ | ||||
|     cve?, ghsa?, vendorIds[], aliases[] | ||||
| } | ||||
| linkset{ | ||||
|     purls[], cpes[], aliases[], references[{type,url}], | ||||
|     reconciledFrom[] | ||||
| } | ||||
| createdAt           // when Concelier recorded the observation | ||||
| attributes          // optional provenance metadata (batch ids, ingest cursor) | ||||
| ``` | ||||
|  | ||||
| #### AdvisoryLinkset | ||||
|  | ||||
| ```jsonc | ||||
| linksetId           // sha256 over sorted (tenant, product/vuln tuple, observation ids) | ||||
| tenant | ||||
| key{ | ||||
|     vulnerabilityId, | ||||
|     productKey, | ||||
|     confidence        // low|medium|high | ||||
| } | ||||
| observations[] = [ | ||||
|   { | ||||
|     observationId, | ||||
|     sourceVendor, | ||||
|     statement{ | ||||
|       status?, severity?, references?, notes? | ||||
|     }, | ||||
|     collectedAt | ||||
|   } | ||||
| ] | ||||
| aliases{ | ||||
|     primary, | ||||
|     others[] | ||||
| } | ||||
| purls[] | ||||
| cpes[] | ||||
| conflicts[]?        // see AdvisoryLinksetConflict | ||||
| createdAt | ||||
| updatedAt | ||||
| ``` | ||||
|  | ||||
| #### AdvisoryLinksetConflict | ||||
|  | ||||
| ```jsonc | ||||
| conflictId          // deterministic hash | ||||
| type                // severity-mismatch | affected-range-divergence | reference-clash | alias-inconsistency | metadata-gap | ||||
| field?              // optional JSON pointer (e.g., /statement/severity/vector) | ||||
| observations[]      // per-source values contributing to the conflict | ||||
| confidence          // low|medium|high (heuristic weight) | ||||
| detectedAt | ||||
| ``` | ||||
|  | ||||
| #### ObservationEvent / LinksetEvent | ||||
|  | ||||
| ```jsonc | ||||
| eventId             // ULID | ||||
| tenant | ||||
| type                // advisory.observation.updated | advisory.linkset.updated | ||||
| key{ | ||||
|     observationId?  // on observation event | ||||
|     linksetId?      // on linkset event | ||||
|     vulnerabilityId?, | ||||
|     productKey? | ||||
| } | ||||
| delta{ | ||||
|     added[], removed[], changed[]   // normalized summary for consumers | ||||
| } | ||||
| hash               // canonical hash of serialized delta payload | ||||
| occurredAt | ||||
| ``` | ||||
|  | ||||
| #### ExportState | ||||
|  | ||||
| ```jsonc | ||||
| exportKind          // json | trivydb | ||||
| baseExportId?       // last full baseline | ||||
| baseDigest?         // digest of last full baseline | ||||
| lastFullDigest?     // digest of last full export | ||||
| lastDeltaDigest?    // digest of last delta export | ||||
| cursor              // per-kind incremental cursor | ||||
| files[]             // last manifest snapshot (path → sha256) | ||||
| ``` | ||||
|  | ||||
| Legacy `Advisory`, `Affected`, and merge-centric entities remain in the repository for historical exports and replay but are being phased out as Link-Not-Merge takes over. New code paths must interact with `AdvisoryObservation` / `AdvisoryLinkset` exclusively and emit conflicts through the structured payloads described above. | ||||
|  | ||||
| ### 3.2 Product identity (`productKey`) | ||||
|  | ||||
| * **Primary:** `purl` (Package URL). | ||||
| * **OS packages:** RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with **EVR/NVRA** preserved (a purl-mapping sketch follows this list). | ||||
| * **Secondary:** `cpe` retained for compatibility; advisory records may carry both. | ||||
| * **Image/platform:** `oci:<registry>/<repo>@<digest>` for image‑level advisories (rare). | ||||
| * **Unmappable:** if a source is non‑deterministic, keep native string under `productKey="native:<provider>:<id>"` and mark **non‑joinable**. | ||||
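|  | ||||
| For the OS-package rows above, the mapping to a purl is mostly mechanical. A tiny sketch for RPM NEVRA parts follows; the namespace/qualifier layout reflects a plain reading of the purl `rpm` type and skips percent-encoding, so treat it as illustrative rather than canonical. | ||||
|  | ||||
| ```csharp | ||||
| // Sketch: RPM NEVRA -> purl:rpm productKey. Qualifier layout and lowercasing rules are assumptions. | ||||
| public static class RpmPurl | ||||
| { | ||||
|     public static string FromNevra(string distro, string name, int epoch, string version, string release, string arch) | ||||
|     { | ||||
|         var purl = $"pkg:rpm/{distro.ToLowerInvariant()}/{name}@{version}-{release}?arch={arch}"; | ||||
|         return epoch > 0 ? purl + $"&epoch={epoch}" : purl;   // epoch 0 is conventionally omitted | ||||
|     } | ||||
| } | ||||
|  | ||||
| // FromNevra("redhat", "openssl", 1, "3.0.7", "18.el9_2", "x86_64") | ||||
| //   -> "pkg:rpm/redhat/openssl@3.0.7-18.el9_2?arch=x86_64&epoch=1" | ||||
| ``` | ||||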
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Source families & precedence | ||||
|  | ||||
| ### 4.1 Families | ||||
|  | ||||
| * **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium… | ||||
| * **Linux distros**: Red Hat, SUSE, Ubuntu, Debian, Alpine… | ||||
| * **OSS ecosystems**: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go. | ||||
| * **CERTs / national CSIRTs**: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERT‑FR/BUND, etc. | ||||
|  | ||||
| ### 4.2 Precedence (when claims conflict) | ||||
|  | ||||
| 1. **Vendor PSIRT** (authoritative for their product). | ||||
| 2. **Distro** (authoritative for packages they ship, including backports). | ||||
| 3. **Ecosystem** (OSV/GHSA) for library semantics. | ||||
| 4. **CERTs/aggregators** for enrichment (KEV/known exploited). | ||||
|  | ||||
| > Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Connectors & normalization | ||||
|  | ||||
| ### 5.1 Connector contract | ||||
|  | ||||
| ```csharp | ||||
| public interface IFeedConnector { | ||||
|   string SourceName { get; } | ||||
|   Task FetchAsync(IServiceProvider sp, CancellationToken ct);   // -> document collection | ||||
|   Task ParseAsync(IServiceProvider sp, CancellationToken ct);   // -> dto collection (validated) | ||||
|   Task MapAsync(IServiceProvider sp, CancellationToken ct);     // -> advisory/alias/affected/reference | ||||
| } | ||||
| ``` | ||||
|  | ||||
| * **Fetch**: windowed (cursor), conditional GET (ETag/Last‑Modified), retry/backoff, rate limiting (see the skeleton after this list). | ||||
| * **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing. | ||||
| * **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors). | ||||
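|  | ||||
| A minimal connector skeleton under this contract is sketched below. The `IDocumentStore`/`SourceCursor` abstractions and the single-window fetch are placeholders for whatever the host actually wires in; only the fetch stage is fleshed out to show conditional GET and cursor handling, and retry/backoff plus rate limiting are left out. | ||||
|  | ||||
| ```csharp | ||||
| // Illustrative skeleton only: persistence interfaces and the endpoint shape are assumptions. | ||||
| using System; | ||||
| using System.Net; | ||||
| using System.Net.Http; | ||||
| using System.Net.Http.Headers; | ||||
| using System.Threading; | ||||
| using System.Threading.Tasks; | ||||
|  | ||||
| public interface IDocumentStore | ||||
| { | ||||
|     Task<SourceCursor> GetCursorAsync(string source, CancellationToken ct); | ||||
|     Task SaveRawDocumentAsync(string source, string uri, byte[] payload, string? etag, CancellationToken ct); | ||||
|     Task AdvanceCursorAsync(string source, SourceCursor cursor, CancellationToken ct); | ||||
| } | ||||
|  | ||||
| public sealed record SourceCursor(string NextUri, string? ETag); | ||||
|  | ||||
| public sealed class ExampleConnector : IFeedConnector | ||||
| { | ||||
|     private static readonly HttpClient Http = new(); | ||||
|  | ||||
|     public string SourceName => "example"; | ||||
|  | ||||
|     public async Task FetchAsync(IServiceProvider sp, CancellationToken ct) | ||||
|     { | ||||
|         var store = (IDocumentStore)sp.GetService(typeof(IDocumentStore))!;   // placeholder wiring | ||||
|         var cursor = await store.GetCursorAsync(SourceName, ct); | ||||
|  | ||||
|         using var request = new HttpRequestMessage(HttpMethod.Get, cursor.NextUri); | ||||
|         if (cursor.ETag is { } etag) | ||||
|             request.Headers.IfNoneMatch.Add(new EntityTagHeaderValue(etag));  // conditional GET | ||||
|  | ||||
|         using var response = await Http.SendAsync(request, ct); | ||||
|         if (response.StatusCode == HttpStatusCode.NotModified) | ||||
|             return;                                                           // nothing new in this window | ||||
|  | ||||
|         response.EnsureSuccessStatusCode(); | ||||
|         var payload = await response.Content.ReadAsByteArrayAsync(ct); | ||||
|         await store.SaveRawDocumentAsync(SourceName, cursor.NextUri, payload, response.Headers.ETag?.Tag, ct); | ||||
|         await store.AdvanceCursorAsync(SourceName, cursor, ct); | ||||
|     } | ||||
|  | ||||
|     public Task ParseAsync(IServiceProvider sp, CancellationToken ct) => Task.CompletedTask;  // schema-validate into DTOs | ||||
|     public Task MapAsync(IServiceProvider sp, CancellationToken ct) => Task.CompletedTask;    // DTOs -> canonical records | ||||
| } | ||||
| ``` | ||||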
|  | ||||
| ### 5.2 Version range normalization | ||||
|  | ||||
| * **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges (use `~`, `^`, `<`, `>=` canonicalized to intervals); a small sketch follows this list. | ||||
| * **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query. | ||||
| * **DEB**: dpkg version comparison semantics mirrored; store computed keys. | ||||
| * **APK**: Alpine version semantics; compute order keys. | ||||
| * **Generic**: if provider uses text, retain raw; do **not** invent ranges. | ||||
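|  | ||||
| For the SemVer row above, a minimal sketch of turning two common range spellings into `introduced`/`fixed` intervals follows (caret ranges and explicit `>= <` bounds); tilde ranges, pre-release ordering, and everything else the real normaliser handles are deliberately out of scope here. | ||||
|  | ||||
| ```csharp | ||||
| // Sketch only: "^x.y.z" and ">=a <b" forms; real normalisation handles ~, pre-releases, unions, etc. | ||||
| using System; | ||||
|  | ||||
| public sealed record VersionInterval(string Introduced, string? FixedBefore); | ||||
|  | ||||
| public static class SemVerRanges | ||||
| { | ||||
|     public static VersionInterval Normalize(string range) | ||||
|     { | ||||
|         range = range.Trim(); | ||||
|  | ||||
|         if (range.StartsWith("^", StringComparison.Ordinal)) | ||||
|         { | ||||
|             var p = range[1..].Split('.');                                       // "1.2.3" -> major.minor.patch | ||||
|             var upper = int.Parse(p[0]) > 0 ? $"{int.Parse(p[0]) + 1}.0.0"       // ^1.2.3 -> [1.2.3, 2.0.0) | ||||
|                       : int.Parse(p[1]) > 0 ? $"0.{int.Parse(p[1]) + 1}.0"       // ^0.2.3 -> [0.2.3, 0.3.0) | ||||
|                       : $"0.0.{int.Parse(p[2]) + 1}";                            // ^0.0.3 -> [0.0.3, 0.0.4) | ||||
|             return new VersionInterval(range[1..], upper); | ||||
|         } | ||||
|  | ||||
|         if (range.StartsWith(">=", StringComparison.Ordinal)) | ||||
|         { | ||||
|             var tokens = range.Split(' ', StringSplitOptions.RemoveEmptyEntries); | ||||
|             var fixedBefore = tokens.Length > 1 && tokens[1].StartsWith("<") && !tokens[1].StartsWith("<=") | ||||
|                 ? tokens[1][1..]                                                 // ">=1.2.3 <1.3.0" -> fixed 1.3.0 | ||||
|                 : null; | ||||
|             return new VersionInterval(tokens[0][2..], fixedBefore); | ||||
|         } | ||||
|  | ||||
|         throw new NotSupportedException($"Range syntax not covered by this sketch: {range}"); | ||||
|     } | ||||
| } | ||||
| ``` | ||||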
|  | ||||
| ### 5.3 Severity & CVSS | ||||
|  | ||||
| * Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity). | ||||
| * If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable). | ||||
| * **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Observation & linkset pipeline | ||||
|  | ||||
| > **Goal:** deterministically ingest raw documents into immutable observations, correlate them into evidence-rich linksets, and broadcast changes without precedence or mutation. | ||||
|  | ||||
| ### 6.1 Observation flow | ||||
|  | ||||
| 1. **Connector fetch/parse/map** — connectors download upstream payloads, validate signatures, and map to DTOs (identifiers, references, raw payload, provenance). | ||||
| 2. **AOC guard** — `AOCWriteGuard` verifies forbidden keys, provenance completeness, tenant claims, timestamp normalization, and content hash idempotency. Violations raise `ERR_AOC_00x` mapped to structured logs and metrics. | ||||
| 3. **Append-only write** — observations insert into `advisory_observations`; duplicates by `(tenant, source.vendor, upstream.upstreamId, upstream.contentHash)` become no-ops; new content for same upstream id creates a supersedes chain. | ||||
| 4. **Change feed + event** — Mongo change streams trigger `advisory.observation.updated@1` events with deterministic payloads (IDs, hash, supersedes pointer, linkset summary). Policy Engine, Offline Kit builder, and guard dashboards subscribe. | ||||
|  | ||||
| ### 6.2 Linkset correlation | ||||
|  | ||||
| 1. **Queue** — observation deltas enqueue correlation jobs keyed by `(tenant, vulnerabilityId, productKey)` candidates derived from identifiers + alias graph. | ||||
| 2. **Canonical grouping** — builder resolves aliases using Concelier’s alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores. | ||||
| 3. **Linkset materialization** — `advisory_linksets` documents store sorted observation references, alias sets, product keys, range metadata, and conflict payloads. Writes are idempotent; unchanged hashes skip updates. | ||||
| 4. **Conflict detection** — builder emits structured conflicts (`severity-mismatch`, `affected-range-divergence`, `reference-clash`, `alias-inconsistency`, `metadata-gap`). Conflicts carry per-observation values for explainability (see the sketch after this list). | ||||
| 5. **Event emission** — `advisory.linkset.updated@1` summarizes deltas (`added`, `removed`, `changed` observation IDs, conflict updates, confidence changes) and includes a canonical hash for replay validation. | ||||
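|  | ||||
| The conflict detection in step 4 can be pictured as a per-field comparison across the contributing observations. A simplified severity-mismatch sketch follows; the record types are stand-ins for the `AdvisoryLinkset`/`AdvisoryLinksetConflict` documents defined earlier, not the builder's real types. | ||||
|  | ||||
| ```csharp | ||||
| // Sketch of the severity-mismatch check from step 4; shapes are simplified stand-ins. | ||||
| using System; | ||||
| using System.Collections.Generic; | ||||
| using System.Linq; | ||||
|  | ||||
| public sealed record ObservationSeverity(string ObservationId, string SourceVendor, string? Severity); | ||||
| public sealed record LinksetConflict(string Type, string Field, IReadOnlyList<ObservationSeverity> Observations); | ||||
|  | ||||
| public static class ConflictDetector | ||||
| { | ||||
|     public static LinksetConflict? DetectSeverityMismatch(IEnumerable<ObservationSeverity> observations) | ||||
|     { | ||||
|         // Ignore observations that make no severity claim; a single dissenting value is enough to flag. | ||||
|         var claims = observations.Where(o => !string.IsNullOrEmpty(o.Severity)) | ||||
|                                  .OrderBy(o => o.ObservationId, StringComparer.Ordinal)   // determinism | ||||
|                                  .ToList(); | ||||
|  | ||||
|         var distinct = claims.Select(o => o.Severity!.ToUpperInvariant()).Distinct().Count(); | ||||
|         return distinct > 1 | ||||
|             ? new LinksetConflict("severity-mismatch", "/statement/severity", claims) | ||||
|             : null; | ||||
|     } | ||||
| } | ||||
| ``` | ||||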
|  | ||||
| ### 6.3 Event contract | ||||
|  | ||||
| | Event | Schema | Notes | | ||||
| |-------|--------|-------| | ||||
| | `advisory.observation.updated@1` | `events/advisory.observation.updated@1.json` | Fired on new or superseded observations. Includes `observationId`, source metadata, `linksetSummary` (aliases/purls), supersedes pointer (if any), SHA-256 hash, and `traceId`. | | ||||
| | `advisory.linkset.updated@1` | `events/advisory.linkset.updated@1.json` | Fired when correlation changes. Includes `linksetId`, `key{vulnerabilityId, productKey, confidence}`, observation deltas, conflicts, `updatedAt`, and canonical hash. | | ||||
|  | ||||
| Events are emitted via NATS (primary) and Redis Stream (fallback). Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures both topics during bundle creation for air-gapped replay. | ||||
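|  | ||||
| On the consumer side, idempotent acknowledgement by hash can be as simple as a dedup set keyed on the event's canonical hash. A toy sketch follows, with an in-memory store standing in for whatever durable dedup mechanism a real consumer would use. | ||||
|  | ||||
| ```csharp | ||||
| // Sketch of the consumer-side contract: acknowledge by canonical hash so redelivered | ||||
| // `advisory.linkset.updated` events are no-ops. The dedup store here is an in-memory stand-in. | ||||
| using System; | ||||
| using System.Collections.Concurrent; | ||||
| using System.Threading.Tasks; | ||||
|  | ||||
| public sealed record LinksetUpdatedEvent(string LinksetId, string Hash, string Tenant); | ||||
|  | ||||
| public sealed class IdempotentLinksetConsumer | ||||
| { | ||||
|     private readonly ConcurrentDictionary<string, byte> _seenHashes = new(); | ||||
|  | ||||
|     public Task HandleAsync(LinksetUpdatedEvent evt, Func<LinksetUpdatedEvent, Task> process) | ||||
|     { | ||||
|         // Duplicate deliveries (NATS redelivery, Redis Stream replay) carry the same hash and are skipped. | ||||
|         return _seenHashes.TryAdd(evt.Hash, 0) ? process(evt) : Task.CompletedTask; | ||||
|     } | ||||
| } | ||||
| ``` | ||||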
|  | ||||
| --- | ||||
|  | ||||
| ## 7) Storage schema (MongoDB) | ||||
|  | ||||
| ### Collections & indexes (LNM path) | ||||
|  | ||||
| * `concelier.sources` `{_id, type, baseUrl, enabled, notes}` — connector catalog. | ||||
| * `concelier.source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}` — run-state (TTL indexes on `backoffUntil`). | ||||
| * `concelier.documents` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}` — raw payload registry. | ||||
|   * Indexes: `{sourceName:1, uri:1}` unique; `{fetchedAt:-1}` for recent fetches. | ||||
| * `concelier.dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}` — normalized connector DTOs used for replay. | ||||
|   * Index: `{sourceName:1, documentId:1}`. | ||||
| * `concelier.advisory_observations` | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: "tenant:vendor:upstreamId:revision", | ||||
|   tenant, | ||||
|   source: { vendor, stream, api, collectorVersion }, | ||||
|   upstream: { upstreamId, documentVersion, fetchedAt, receivedAt, contentHash, signature }, | ||||
|   content: { format, specVersion, raw, metadata? }, | ||||
|   identifiers: { cve?, ghsa?, vendorIds[], aliases[] }, | ||||
|   linkset: { purls[], cpes[], aliases[], references[], reconciledFrom[] }, | ||||
|   supersedes?: "prevObservationId", | ||||
|   createdAt, | ||||
|   attributes?: object | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{tenant:1, upstream.upstreamId:1}`, `{tenant:1, source.vendor:1, linkset.purls:1}`, `{tenant:1, linkset.aliases:1}`, `{tenant:1, createdAt:-1}`. | ||||
| * `concelier.advisory_linksets` | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: "sha256:...", | ||||
|   tenant, | ||||
|   key: { vulnerabilityId, productKey, confidence }, | ||||
|   observations: [ | ||||
|     { observationId, sourceVendor, statement, collectedAt } | ||||
|   ], | ||||
|   aliases: { primary, others: [] }, | ||||
|   purls: [], | ||||
|   cpes: [], | ||||
|   conflicts: [], | ||||
|   createdAt, | ||||
|   updatedAt | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{tenant:1, key.vulnerabilityId:1, key.productKey:1}`, `{tenant:1, purls:1}`, `{tenant:1, aliases.primary:1}`, `{tenant:1, updatedAt:-1}`. | ||||
| * `concelier.advisory_events` | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: ObjectId, | ||||
|   tenant, | ||||
|   type: "advisory.observation.updated" | "advisory.linkset.updated", | ||||
|   key, | ||||
|   delta, | ||||
|   hash, | ||||
|   occurredAt | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * TTL index on `occurredAt` (configurable retention), `{type:1, occurredAt:-1}` for replay. | ||||
| * `concelier.export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}` | ||||
| * `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks) | ||||
| * `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}` | ||||
|  | ||||
| **Legacy collections** (`advisory`, `alias`, `affected`, `reference`, `merge_event`) remain read-only during the migration window to support back-compat exports. New code must not write to them; scheduled cleanup removes them after Link-Not-Merge GA. | ||||
|  | ||||
| **GridFS buckets**: `fs.documents` for raw payloads (immutable); `fs.exports` for historical JSON/Trivy archives. | ||||
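|  | ||||
| For reference, the `advisory_observations` indexes listed above could be declared with the MongoDB .NET driver roughly as follows; where this bootstrap runs is an assumption, and the other collections follow the same pattern. | ||||
|  | ||||
| ```csharp | ||||
| // Sketch: declare the advisory_observations indexes with the MongoDB .NET driver. | ||||
| // Collection and field names follow this document; the wiring location is an assumption. | ||||
| using System.Threading; | ||||
| using System.Threading.Tasks; | ||||
| using MongoDB.Bson; | ||||
| using MongoDB.Driver; | ||||
|  | ||||
| public static class ObservationIndexes | ||||
| { | ||||
|     public static Task EnsureAsync(IMongoDatabase db, CancellationToken ct) | ||||
|     { | ||||
|         var observations = db.GetCollection<BsonDocument>("advisory_observations"); | ||||
|         var keys = Builders<BsonDocument>.IndexKeys; | ||||
|  | ||||
|         var models = new[] | ||||
|         { | ||||
|             new CreateIndexModel<BsonDocument>(keys.Ascending("tenant").Ascending("upstream.upstreamId")), | ||||
|             new CreateIndexModel<BsonDocument>(keys.Ascending("tenant").Ascending("source.vendor").Ascending("linkset.purls")), | ||||
|             new CreateIndexModel<BsonDocument>(keys.Ascending("tenant").Ascending("linkset.aliases")), | ||||
|             new CreateIndexModel<BsonDocument>(keys.Ascending("tenant").Descending("createdAt")), | ||||
|         }; | ||||
|  | ||||
|         return observations.Indexes.CreateManyAsync(models, ct); | ||||
|     } | ||||
| } | ||||
| ``` | ||||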
|  | ||||
| --- | ||||
|  | ||||
| ## 8) Exporters | ||||
|  | ||||
| ### 8.1 Deterministic JSON (vuln‑list style) | ||||
|  | ||||
| * Folder structure mirroring `/<scheme>/<first-two>/<rest>/…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace. | ||||
| * `manifest.json` lists all files with SHA‑256 and a top‑level **export digest**. | ||||
|  | ||||
| ### 8.2 Trivy DB exporter | ||||
|  | ||||
| * Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes. | ||||
| * In delta, unchanged blobs are reused from the base; metadata captures: | ||||
|  | ||||
|   ```json | ||||
|   { | ||||
|     "mode": "delta|full", | ||||
|     "baseExportId": "...", | ||||
|     "baseManifestDigest": "sha256:...", | ||||
|     "changed": ["path1", "path2"], | ||||
|     "removed": ["path3"] | ||||
|   } | ||||
|   ``` | ||||
| * Optional ORAS push (OCI layout) for registries. | ||||
| * Offline kit bundles include Trivy DB + JSON tree + export manifest. | ||||
| * Mirror-ready bundles: when `concelier.trivy.mirror` defines domains, the exporter emits `mirror/index.json` plus per-domain `manifest.json`, `metadata.json`, and `db.tar.gz` files with SHA-256 digests so Concelier mirrors can expose domain-scoped download endpoints. | ||||
| * Concelier.WebService serves `/concelier/exports/index.json` and `/concelier/exports/mirror/{domain}/…` directly from the export tree with cache budgets (index: 60 s, bundles: 300 s, immutable) and per-domain rate limiting; the endpoints honour Stella Ops Authority or CIDR bypass lists depending on mirror topology. | ||||
|  | ||||
| ### 8.3 Hand‑off to Signer/Attestor (optional) | ||||
|  | ||||
| * On export completion, if `attest: true` is set in job args, Concelier **posts** the artifact metadata to **Signer**/**Attestor**; Concelier itself **does not** hold signing keys. | ||||
| * Export record stores returned `{ uuid, index, url }` from **Rekor v2**. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 9) REST APIs | ||||
|  | ||||
| All under `/api/v1/concelier`. | ||||
|  | ||||
| **Health & status** | ||||
|  | ||||
| ``` | ||||
| GET  /healthz | /readyz | ||||
| GET  /status                              → sources, last runs, export cursors | ||||
| ``` | ||||
|  | ||||
| **Sources & jobs** | ||||
|  | ||||
| ``` | ||||
| GET  /sources                              → list of configured sources | ||||
| POST /sources/{name}/trigger               → { jobId } | ||||
| POST /sources/{name}/pause | /resume       → toggle | ||||
| GET  /jobs/{id}                            → job status | ||||
| ``` | ||||
|  | ||||
| **Exports** | ||||
|  | ||||
| ``` | ||||
| POST /exports/json   { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? } | ||||
| POST /exports/trivy  { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? } | ||||
| GET  /exports/{id}   → export metadata (kind, digest, createdAt, rekor?) | ||||
| GET  /concelier/exports/index.json        → mirror index describing available domains/bundles | ||||
| GET  /concelier/exports/mirror/{domain}/manifest.json | ||||
| GET  /concelier/exports/mirror/{domain}/bundle.json | ||||
| GET  /concelier/exports/mirror/{domain}/bundle.json.jws | ||||
| ``` | ||||
|  | ||||
| **Search (operator debugging)** | ||||
|  | ||||
| ``` | ||||
| GET  /advisories/{key} | ||||
| GET  /advisories?scheme=CVE&value=CVE-2025-12345 | ||||
| GET  /affected?productKey=pkg:rpm/openssl&limit=100 | ||||
| ``` | ||||
|  | ||||
| **AuthN/Z:** Authority tokens (OpTok) with roles: `concelier.read`, `concelier.admin`, `concelier.export`. | ||||
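|  | ||||
| A small client-side sketch of triggering a JSON export with an Authority token follows; the base address, token acquisition, and response property names are assumptions for illustration. | ||||
|  | ||||
| ```csharp | ||||
| // Sketch of a client calling the export endpoint above; the response shape is assumed from this document. | ||||
| using System.Net.Http; | ||||
| using System.Net.Http.Headers; | ||||
| using System.Net.Http.Json; | ||||
| using System.Threading; | ||||
| using System.Threading.Tasks; | ||||
|  | ||||
| public static class ConcelierExportClient | ||||
| { | ||||
|     public static async Task<string> TriggerJsonExportAsync(HttpClient http, string accessToken, CancellationToken ct) | ||||
|     { | ||||
|         http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", accessToken); | ||||
|  | ||||
|         // POST /exports/json { full, force, attest } -> { exportId, digest, rekor? } | ||||
|         var response = await http.PostAsJsonAsync("/api/v1/concelier/exports/json", | ||||
|             new { full = true, force = false, attest = false }, ct); | ||||
|         response.EnsureSuccessStatusCode(); | ||||
|  | ||||
|         var body = await response.Content.ReadFromJsonAsync<ExportResponse>(cancellationToken: ct); | ||||
|         return body!.Digest;   // stable across reruns of the same snapshot | ||||
|     } | ||||
|  | ||||
|     private sealed record ExportResponse(string ExportId, string Digest); | ||||
| } | ||||
| ``` | ||||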
|  | ||||
| --- | ||||
|  | ||||
| ## 10) Configuration (YAML) | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   mongo: { uri: "mongodb://mongo/concelier" } | ||||
|   s3: | ||||
|     endpoint: "http://minio:9000" | ||||
|     bucket: "stellaops-concelier" | ||||
|   scheduler: | ||||
|     windowSeconds: 30 | ||||
|     maxParallelSources: 4 | ||||
|   sources: | ||||
|     - name: redhat | ||||
|       kind: csaf | ||||
|       baseUrl: https://access.redhat.com/security/data/csaf/v2/ | ||||
|       signature: { type: pgp, keys: [ "…redhat PGP…" ] } | ||||
|       enabled: true | ||||
|       windowDays: 7 | ||||
|     - name: suse | ||||
|       kind: csaf | ||||
|       baseUrl: https://ftp.suse.com/pub/projects/security/csaf/ | ||||
|       signature: { type: pgp, keys: [ "…suse PGP…" ] } | ||||
|     - name: ubuntu | ||||
|       kind: usn-json | ||||
|       baseUrl: https://ubuntu.com/security/notices.json | ||||
|       signature: { type: none } | ||||
|     - name: osv | ||||
|       kind: osv | ||||
|       baseUrl: https://api.osv.dev/v1/ | ||||
|       signature: { type: none } | ||||
|     - name: ghsa | ||||
|       kind: ghsa | ||||
|       baseUrl: https://api.github.com/graphql | ||||
|       auth: { tokenRef: "env:GITHUB_TOKEN" } | ||||
|   exporters: | ||||
|     json: | ||||
|       enabled: true | ||||
|       output: s3://stellaops-concelier/json/ | ||||
|     trivy: | ||||
|       enabled: true | ||||
|       mode: full | ||||
|       output: s3://stellaops-concelier/trivy/ | ||||
|       oras: | ||||
|         enabled: false | ||||
|         repo: ghcr.io/org/concelier | ||||
|   precedence: | ||||
|     vendorWinsOverDistro: true | ||||
|     distroWinsOverOsv: true | ||||
|   severity: | ||||
|     policy: max    # or 'vendorPreferred' / 'distroPreferred' | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 11) Security & compliance | ||||
|  | ||||
| * **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible. | ||||
| * **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; Policy Engine or downstream policy can down-weight them. | ||||
| * **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers. | ||||
| * **Multi‑tenant**: per‑tenant DBs or prefixes; per‑tenant S3 prefixes; tenant‑scoped API tokens. | ||||
| * **Determinism**: canonical JSON writer; export digests stable across runs given same inputs. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 12) Performance targets & scale | ||||
|  | ||||
| * **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON). | ||||
| * **Normalize/map**: ≥ 50k observation statements/min on 4 cores. | ||||
| * **Observation write**: ≤ 5 ms P95 per document (including guard + Mongo write). | ||||
| * **Linkset build**: ≤ 15 ms P95 per `(vulnerabilityId, productKey)` update, even with 20+ contributing observations. | ||||
| * **Export**: 1M advisories JSON in ≤ 90 s (streamed, zstd), Trivy DB in ≤ 60 s on 8 cores. | ||||
| * **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes. | ||||
|  | ||||
| **Scale pattern**: add Concelier replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 13) Observability | ||||
|  | ||||
| * **Metrics** (see the counter sketch after this list) | ||||
|  | ||||
|   * `concelier.fetch.docs_total{source}` | ||||
|   * `concelier.fetch.bytes_total{source}` | ||||
|   * `concelier.parse.failures_total{source}` | ||||
|   * `concelier.map.statements_total{source}` | ||||
|   * `concelier.observations.write_total{result=ok|noop|error}` | ||||
|   * `concelier.linksets.updated_total{result=ok|skip|error}` | ||||
|   * `concelier.linksets.conflicts_total{type}` | ||||
|   * `concelier.export.bytes{kind}` | ||||
|   * `concelier.export.duration_seconds{kind}` | ||||
| * **Tracing** around fetch/parse/map/observe/linkset/export. | ||||
| * **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`. | ||||
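|  | ||||
| A sketch of how a couple of these counters could be declared with `System.Diagnostics.Metrics`; the meter name and the tag plumbing are assumptions, and only the metric and tag names come from the list above. | ||||
|  | ||||
| ```csharp | ||||
| // Illustrative only: counter declarations for two of the metrics listed above. | ||||
| using System.Collections.Generic; | ||||
| using System.Diagnostics.Metrics; | ||||
|  | ||||
| public static class ConcelierMetrics | ||||
| { | ||||
|     private static readonly Meter Meter = new("StellaOps.Concelier");   // meter name is an assumption | ||||
|  | ||||
|     private static readonly Counter<long> ObservationWrites = | ||||
|         Meter.CreateCounter<long>("concelier.observations.write_total"); | ||||
|  | ||||
|     private static readonly Counter<long> LinksetConflicts = | ||||
|         Meter.CreateCounter<long>("concelier.linksets.conflicts_total"); | ||||
|  | ||||
|     public static void RecordObservationWrite(string result) => | ||||
|         ObservationWrites.Add(1, new KeyValuePair<string, object?>("result", result));   // ok | noop | error | ||||
|  | ||||
|     public static void RecordLinksetConflict(string type) => | ||||
|         LinksetConflicts.Add(1, new KeyValuePair<string, object?>("type", type)); | ||||
| } | ||||
| ``` | ||||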
|  | ||||
| --- | ||||
|  | ||||
| ## 14) Testing matrix | ||||
|  | ||||
| * **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail). | ||||
| * **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, pre‑releases). | ||||
| * **Linkset correlation:** multi-source conflicts (severity, range, alias) produce deterministic conflict payloads; ensure confidence scoring stable. | ||||
| * **Export determinism:** byte‑for‑byte stable outputs across runs; digest equality. | ||||
| * **Performance:** soak tests with 1M advisories; cap memory; verify backpressure. | ||||
| * **API:** pagination, filters, RBAC, error envelopes (RFC 7807). | ||||
| * **Offline kit:** bundle build & import correctness. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 15) Failure modes & recovery | ||||
|  | ||||
| * **Source outages:** scheduler backs off with exponential delay; `source_state.backoffUntil`; alerts on staleness. | ||||
| * **Schema drifts:** parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges. | ||||
| * **Partial exports:** exporters write to temp prefix; **manifest commit** is atomic; only then move to final prefix and update `export_state`. | ||||
| * **Resume:** all stages idempotent; `source_state.cursor` supports window resume. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 16) Operator runbook (quick) | ||||
|  | ||||
| * **Trigger all sources:** `POST /api/v1/concelier/sources/*/trigger` | ||||
| * **Force full export JSON:** `POST /api/v1/concelier/exports/json { "full": true, "force": true }` | ||||
| * **Force Trivy DB delta publish:** `POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }` | ||||
| * **Inspect observation:** `GET /api/v1/concelier/observations/{observationId}` | ||||
| * **Query linkset:** `GET /api/v1/concelier/linksets?vulnerabilityId=CVE-2025-12345&productKey=pkg:rpm/redhat/openssl` | ||||
| * **Pause noisy source:** `POST /api/v1/concelier/sources/osv/pause` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 17) Rollout plan | ||||
|  | ||||
| 1. **MVP**: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export. | ||||
| 2. **Add**: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export. | ||||
| 3. **Attestation hand‑off**: integrate with **Signer/Attestor** (optional). | ||||
| 4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse. | ||||
| 5. **Offline kit**: end‑to‑end verified bundles for air‑gap. | ||||
docs/modules/concelier/implementation_plan.md (new file, 67 lines)
							| @@ -0,0 +1,67 @@ | ||||
| # Implementation plan — Concelier | ||||
|  | ||||
| ## Delivery timeline | ||||
| - **Phase 1 — Guardrails & schema**   | ||||
|   Stand up Mongo JSON validators for `advisory_raw` and `vex_raw`, wire the `AOCWriteGuard` repository interceptor, and seed deterministic linkset builders. Freeze legacy normalisation paths and migrate callers to the new raw schema. | ||||
| - **Phase 2 — API & observability**   | ||||
|   Publish ingestion and verification endpoints (`POST /ingest/*`, `GET /advisories.raw`, `POST /aoc/verify`) with Authority scopes, expose telemetry (`aoc_violation_total`, guard spans, structured logs), and ensure Offline Kit packaging captures validator deployment steps. | ||||
| - **Phase 3 — Experience polish**   | ||||
|   Ship CLI/Console affordances (`stella sources ingest --dry-run`, dashboard tiles, violation drill-downs), finish Export Center hand-off metadata, and close out CI enforcement (`stella aoc verify` preflight, AST lint, seeded fixtures). | ||||
|  | ||||
| ## Work breakdown by component | ||||
| - **Concelier WebService & worker** | ||||
|   - Add Mongo validators and unique indexes over `(tenant, source.vendor, upstream.upstream_id, upstream.content_hash)`. | ||||
|   - Implement write interceptors rejecting forbidden fields, missing provenance, or merge attempts. | ||||
|   - Deterministically compute linksets and persist canonical JSON payloads. | ||||
|   - Introduce `/ingest/advisory`, `/advisories/raw*`, and `/aoc/verify` surfaces guarded by `advisory:*` and `aoc:verify` scopes. | ||||
|   - Emit guard metrics/traces and surface supersedes/violation audit logs. | ||||
| - **Excititor (shared ingestion contract)** | ||||
|   - Mirror Concelier guard and schema changes for `vex_raw`. | ||||
|   - Maintain restart-time plug-in determinism and linkset extraction parity. | ||||
| - **Shared libraries** | ||||
|   - Publish `StellaOps.Ingestion.AOC` (forbidden key catalog, guard middleware, provenance helpers, signature verification). | ||||
|   - Share error codes (`ERR_AOC_00x`) and deterministic hashing utilities. | ||||
| - **Policy Engine integration** | ||||
|   - Enforce `effective_finding_*` write exclusivity. | ||||
|   - Consume only raw documents + linksets, removing any implicit normalisation. | ||||
| - **Authority scopes** | ||||
|   - Provision `advisory:ingest|read`, `vex:ingest|read`, `aoc:verify`; propagate tenant claims to ingestion services. | ||||
| - **CLI & Console** | ||||
|   - Implement `stella sources ingest --dry-run` and `stella aoc verify` (with exit codes mapped to `ERR_AOC_00x`). | ||||
|   - Surface AOC dashboards, violation drill-down, and verification shortcuts in the Console. | ||||
| - **CI/CD** | ||||
|   - Add Roslyn analyzer / AST lint to block forbidden writes. | ||||
|   - Seed fixtures and run `stella aoc verify` against snapshots in pipeline gating. | ||||
|  | ||||
| ## Documentation deliverables | ||||
| - Update `docs/ingestion/aggregation-only-contract.md` with guard invariants, schemas, error codes, and migration guidance. | ||||
| - Refresh `docs/modules/concelier/operations/*.md` (mirror, conflict-resolution, authority audit) with validator rollouts and observability dashboards. | ||||
| - Cross-link Authority scope definitions, CLI reference, Console sources guide, and observability runbooks to the AOC guard changes. | ||||
| - Ensure Offline Kit documentation captures validator bootstrap and verify workflows. | ||||
|  | ||||
| ## Acceptance criteria | ||||
| - Mongo validators and runtime guards reject forbidden fields and missing provenance with the documented `ERR_AOC_00x` codes. | ||||
| - Linksets and supersedes chains are deterministic; rerunning ingestion over identical payloads yields byte-identical documents. | ||||
| - CLI `stella aoc verify` exits non-zero on seeded violations and zero on clean datasets; Console dashboards show real-time guard status. | ||||
| - Export Center consumes advisory datasets without relying on legacy normalised fields. | ||||
| - CI fails if lint rules detect forbidden writes or if seeded guard tests regress. | ||||
|  | ||||
| ## Risks & mitigations | ||||
| - **Collector drift introduces new forbidden keys.** Mitigated by guard middleware + CI lint + schema validation; RFC required for linkset changes. | ||||
| - **Migration complexity from legacy normalisation.** Staged cutover with `_backup_*` copies and temporary views to keep Policy Engine parity. | ||||
| - **Performance overhead during ingest.** Guard remains O(number of keys); index review ensures insert latency stays within warm (<5 s) / cold (<30 s) targets. | ||||
| - **Tenancy leakage.** `tenant` required in schema, Authority-supplied claims enforced per request, observability alerts fire on missing tenant identifiers. | ||||
|  | ||||
| ## Test strategy | ||||
| - **Unit**: guard rejection paths, provenance enforcement, idempotent insertions, linkset determinism. | ||||
| - **Property**: fuzz upstream payloads to guarantee no forbidden fields emerge. | ||||
| - **Integration**: batch ingest (50k advisories, mixed VEX fixtures), verifying zero guard violations and consistent supersedes. | ||||
| - **Contract**: Policy Engine consumers verify raw-only reads; Export Center consumes canonical datasets. | ||||
| - **End-to-end**: ingest/verify flow with CLI + Console actions to confirm observability and guard reporting. | ||||
|  | ||||
| ## Definition of done | ||||
| - Validators deployed and verified in staging/offline environments. | ||||
| - Runtime guards, CLI/Console workflows, and CI linting all active. | ||||
| - Observability dashboards and runbooks updated; metrics visible. | ||||
| - Documentation updates merged; Offline Kit instructions published. | ||||
| - ./TASKS.md reflects status transitions; cross-module dependencies acknowledged in ../../TASKS.md. | ||||
docs/modules/concelier/operations/authority-audit-runbook.md (new file, 159 lines)
							| @@ -0,0 +1,159 @@ | ||||
| # Concelier Authority Audit Runbook | ||||
|  | ||||
| _Last updated: 2025-10-22_ | ||||
|  | ||||
| This runbook helps operators verify and monitor the StellaOps Concelier ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity. | ||||
|  | ||||
| ## 1. Prerequisites | ||||
|  | ||||
| - Authority integration is enabled in `concelier.yaml` (or via `CONCELIER_AUTHORITY__*` environment variables) with a valid `clientId`, secret, audience, and required scopes. | ||||
| - OTLP metrics/log exporters are configured (`concelier.telemetry.*`) or container stdout is shipped to your SIEM. | ||||
| - Operators have access to the Concelier job trigger endpoints via CLI or REST for smoke tests. | ||||
| - The rollout table in `docs/10_CONCELIER_CLI_QUICKSTART.md` has been reviewed so stakeholders align on the staged → enforced toggle timeline. | ||||
|  | ||||
| ### Configuration snippet | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   authority: | ||||
|     enabled: true | ||||
|     allowAnonymousFallback: false          # keep true only during initial rollout | ||||
|     issuer: "https://authority.internal" | ||||
|     audiences: | ||||
|       - "api://concelier" | ||||
|     requiredScopes: | ||||
|       - "concelier.jobs.trigger" | ||||
|       - "advisory:read" | ||||
|       - "advisory:ingest" | ||||
|     requiredTenants: | ||||
|       - "tenant-default" | ||||
|     bypassNetworks: | ||||
|       - "127.0.0.1/32" | ||||
|       - "::1/128" | ||||
|     clientId: "concelier-jobs" | ||||
|     clientSecretFile: "/run/secrets/concelier_authority_client" | ||||
|     tokenClockSkewSeconds: 60 | ||||
|     resilience: | ||||
|       enableRetries: true | ||||
|       retryDelays: | ||||
|         - "00:00:01" | ||||
|         - "00:00:02" | ||||
|         - "00:00:05" | ||||
|       allowOfflineCacheFallback: true | ||||
|       offlineCacheTolerance: "00:10:00" | ||||
| ``` | ||||
|  | ||||
| > Store secrets outside source control. Concelier reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service. | ||||
|  | ||||
| ### Resilience tuning | ||||
|  | ||||
| - **Connected sites:** keep the default 1 s / 2 s / 5 s retry ladder so Concelier retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Authority restarts. | ||||
| - **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (15–30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Concelier will fail fast but keep deterministic logs. | ||||
| - Concelier resolves these knobs through `IOptionsMonitor<StellaOpsAuthClientOptions>`. Edits to `concelier.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled. | ||||
|  | ||||
| ## 2. Key Signals | ||||
|  | ||||
| ### 2.1 Audit log channel | ||||
|  | ||||
| Concelier emits structured audit entries via the `Concelier.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active. | ||||
|  | ||||
| ``` | ||||
| Concelier authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=concelier-cli scopes=concelier.jobs.trigger advisory:ingest bypass=False remote=10.1.4.7 | ||||
| ``` | ||||
|  | ||||
| | Field        | Sample value            | Meaning                                                                                  | | ||||
| |--------------|-------------------------|------------------------------------------------------------------------------------------| | ||||
| | `route`      | `/jobs/definitions`     | Endpoint that processed the request.                                                     | | ||||
| | `status`     | `200` / `401` / `409`   | Final HTTP status code returned to the caller.                                           | | ||||
| | `subject`    | `ops@example.com`       | User or service principal subject (falls back to `(anonymous)` when unauthenticated).    | | ||||
| | `clientId`   | `concelier-cli`         | OAuth client ID provided by Authority (`(none)` if the token lacked the claim).         | | ||||
| | `scopes`     | `concelier.jobs.trigger advisory:ingest advisory:read` | Normalised scope list extracted from token claims; `(none)` if the token carried none.   | | ||||
| | `tenant`     | `tenant-default`        | Tenant claim extracted from the Authority token (`(none)` when the token lacked it).     | | ||||
| | `bypass`     | `True` / `False`        | Indicates whether the request succeeded because its source IP matched a bypass CIDR.    | | ||||
| | `remote`     | `10.1.4.7`              | Remote IP recorded from the connection / forwarded header test hooks.                    | | ||||
|  | ||||
| Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations: | ||||
|  | ||||
| - `status=401 AND bypass=True` – bypass network accepted an unauthenticated call (should be temporary during rollout). | ||||
| - `status=202 AND scopes="(none)"` – a token without scopes triggered a job; tighten client configuration. | ||||
| - `status=202 AND NOT contains(scopes,"advisory:ingest")` – ingestion attempted without the new AOC scopes; confirm the Authority client registration matches the sample above. | ||||
| - `tenant!=(tenant-default)` – indicates a cross-tenant token was accepted. Ensure Concelier `requiredTenants` is aligned with Authority client registration. | ||||
| - Spike in `clientId="(none)"` – indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated. | ||||
|  | ||||
| ### 2.2 Metrics | ||||
|  | ||||
| Concelier publishes counters under the OTEL meter `StellaOps.Concelier.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`. | ||||
|  | ||||
| | Metric name                   | Description                                        | PromQL example | | ||||
| |-------------------------------|----------------------------------------------------|----------------| | ||||
| | `web.jobs.triggered`          | Accepted job trigger requests.                     | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` | | ||||
| | `web.jobs.trigger.conflict`   | Rejected triggers (already running, disabled…).    | `sum(rate(web_jobs_trigger_conflict_total[5m]))` | | ||||
| | `web.jobs.trigger.failed`     | Server-side job failures.                          | `sum(rate(web_jobs_trigger_failed_total[5m]))` | | ||||
|  | ||||
| > Prometheus/OTEL collectors typically surface counters with `_total` suffix. Adjust queries to match your pipeline’s generated metric names. | ||||
|  | ||||
| Correlate audit logs with the following global meter exported via `Concelier.SourceDiagnostics`: | ||||
|  | ||||
| - `concelier.source.http.requests_total{concelier_source="jobs-run"}` – ensures REST/manual triggers route through Authority. | ||||
| - If Grafana dashboards are deployed, extend the “Concelier Jobs” board with the above counters plus a table of recent audit log entries. | ||||
|  | ||||
| ## 3. Alerting Guidance | ||||
|  | ||||
| 1. **Unauthorized bypass attempt**   | ||||
|    - Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`   | ||||
|    - Action: verify `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious. | ||||
|  | ||||
| 2. **Missing scopes**   | ||||
|    - Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`   | ||||
|    - Action: audit Authority client registration; ensure `requiredScopes` includes `concelier.jobs.trigger`, `advisory:ingest`, and `advisory:read`. | ||||
|  | ||||
| 3. **Trigger failure surge**   | ||||
|    - Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0` with severity `warning` if sustained for 10 minutes.   | ||||
|    - Action: inspect correlated audit entries and `Concelier.Telemetry` traces for job execution errors. | ||||
|  | ||||
| 4. **Conflict spike**   | ||||
|    - Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).   | ||||
|    - Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly. | ||||
|  | ||||
| 5. **Authority offline**   | ||||
|    - Watch `Concelier.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback. | ||||
|  | ||||
| ## 4. Rollout & Verification Procedure | ||||
|  | ||||
| 1. **Pre-checks** | ||||
|    - Align with the rollout phases documented in `docs/10_CONCELIER_CLI_QUICKSTART.md` (validation → rehearsal → enforced) and record the target dates in your change request. | ||||
|    - Confirm `allowAnonymousFallback` is `false` in production; keep `true` only during staged validation. | ||||
|    - Validate Authority issuer metadata is reachable from Concelier (`curl https://authority.internal/.well-known/openid-configuration` from the host). | ||||
|  | ||||
| 2. **Smoke test with valid token** | ||||
|    - Obtain a token via CLI: `stella auth login --scope "concelier.jobs.trigger advisory:ingest" --scope advisory:read`. | ||||
|    - Trigger a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://concelier.internal/jobs/definitions`. | ||||
|    - Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=concelier.jobs.trigger advisory:ingest advisory:read`, and `tenant=tenant-default`. | ||||
|  | ||||
| 3. **Negative test without token** | ||||
|    - Call the same endpoint without a token. Expect HTTP 401, `bypass=False`. | ||||
|    - If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled. | ||||
|  | ||||
| 4. **Bypass check (if applicable)** | ||||
|    - From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review business justification and expiry date for such entries. | ||||
|  | ||||
| 5. **Metrics validation** | ||||
|    - Ensure `web.jobs.triggered` counter increments during accepted runs. | ||||
|    - Exporters should show corresponding spans (`concelier.job.trigger`) if tracing is enabled. | ||||
|  | ||||
| ## 5. Troubleshooting | ||||
|  | ||||
| | Symptom | Probable cause | Remediation | | ||||
| |---------|----------------|-------------| | ||||
| | Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. | | ||||
| | Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Concelier. | | ||||
| | HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`concelier.jobs.trigger`) and ensure the token audience matches `audiences` config. | | ||||
| | Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `concelier.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Concelier.WebService.Jobs` meter. | | ||||
| | Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Concelier job logs, re-run with tracing enabled, validate Authority latency. | | ||||
|  | ||||
| ## 6. References | ||||
|  | ||||
| - `docs/21_INSTALL_GUIDE.md` – Authority configuration quick start. | ||||
| - `docs/17_SECURITY_HARDENING_GUIDE.md` – Security guardrails and enforcement deadlines. | ||||
| - `docs/modules/authority/operations/monitoring.md` – Authority-side monitoring and alerting playbook. | ||||
| - `StellaOps.Concelier.WebService/Filters/JobAuthorizationAuditFilter.cs` – source of audit log fields. | ||||
docs/modules/concelier/operations/conflict-resolution.md (new file, 160 lines)
							| @@ -0,0 +1,160 @@ | ||||
| # Concelier Conflict Resolution Runbook (Sprint 3) | ||||
|  | ||||
| This runbook equips Concelier operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine landed (`AdvisoryPrecedenceMerger`, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` and the metrics/logging instrumentation delivered this sprint. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1. Precedence Model (recap) | ||||
|  | ||||
| - **Default ranking:** `GHSA -> NVD -> OSV`, with distro/vendor PSIRTs outranking ecosystem feeds (`AdvisoryPrecedenceDefaults`). Use `concelier:merge:precedence:ranks` to override per source when incident response requires it. | ||||
| - **Freshness override:** if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps `provenance[].decisionReason = freshness`. | ||||
| - **Tie-breakers:** when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set `decisionReason = tie-breaker`. | ||||
| - **Audit trail:** each merged advisory receives a `merge` provenance entry listing the participating sources plus a `merge_event` record with canonical before/after SHA-256 hashes. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2. Telemetry Shipped This Sprint | ||||
|  | ||||
| | Instrument | Type | Key Tags | Purpose | | ||||
| |------------|------|----------|---------| | ||||
| | `concelier.merge.operations` | Counter | `inputs` | Total precedence merges executed. | | ||||
| | `concelier.merge.overrides` | Counter | `primary_source`, `suppressed_source`, `primary_rank`, `suppressed_rank` | Field-level overrides chosen by precedence. | | ||||
| | `concelier.merge.range_overrides` | Counter | `advisory_key`, `package_type`, `primary_source`, `suppressed_source`, `primary_range_count`, `suppressed_range_count` | Package range overrides emitted by `AffectedPackagePrecedenceResolver`. | | ||||
| | `concelier.merge.conflicts` | Counter | `type` (`severity`, `precedence_tie`), `reason` (`mismatch`, `primary_missing`, `equal_rank`) | Conflicts requiring operator review. | | ||||
| | `concelier.merge.identity_conflicts` | Counter | `scheme`, `alias_value`, `advisory_count` | Alias collisions surfaced by the identity graph. | | ||||
|  | ||||
| ### Structured logs | ||||
|  | ||||
| - `AdvisoryOverride` (EventId 1000) - logs merge suppressions with alias/provenance counts. | ||||
| - `PackageRangeOverride` (EventId 1001) - logs package-level precedence decisions. | ||||
| - `PrecedenceConflict` (EventId 1002) - logs mismatched severity or equal-rank scenarios. | ||||
| - `Alias collision ...` (no EventId) - emitted when `concelier.merge.identity_conflicts` increments. | ||||
|  | ||||
| Expect all logs at `Information`. Ensure OTEL exporters include the scope `StellaOps.Concelier.Merge`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3. Detection & Alerting | ||||
|  | ||||
| 1. **Dashboard panels** | ||||
|    - `concelier.merge.conflicts` - table grouped by `type/reason`. Alert when > 0 in a 15 minute window. | ||||
|    - `concelier.merge.range_overrides` - stacked bar by `package_type`. Spikes highlight vendor PSIRT overrides over registry data. | ||||
|    - `concelier.merge.overrides` with `primary_source|suppressed_source` - catches unexpected precedence flips (e.g., OSV overtaking GHSA). | ||||
|    - `concelier.merge.identity_conflicts` - single-stat; alert when alias collisions occur more than once per day. | ||||
| 2. **Log based alerts** | ||||
|    - `eventId=1002` with `reason="equal_rank"` - indicates precedence table gaps; page merge owners. | ||||
|    - `eventId=1002` with `reason="mismatch"` - severity disagreement; open connector bug if sustained. | ||||
| 3. **Job health** | ||||
|    - `stellaops-cli db merge` exit code `1` signifies unresolved conflicts. Pipe to automation that captures logs and notifies #concelier-ops (a wrapper sketch follows this list). | ||||
|  | ||||
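| The exit-code automation mentioned in item 3 can be as simple as the wrapper sketched below. The log path, webhook variable, and notification payload are placeholders; wire them into whatever alerting channel your team uses. | ||||
|  | ||||
| ```bash | ||||
| #!/usr/bin/env bash | ||||
| # Illustrative wrapper: capture merge output and notify when unresolved conflicts remain (exit code 1). | ||||
| set -uo pipefail | ||||
| log="/var/log/concelier/merge-$(date -u +%Y%m%dT%H%M%SZ).log"   # placeholder log location | ||||
| stellaops-cli db merge | tee "${log}" | ||||
| rc=${PIPESTATUS[0]} | ||||
| if [ "${rc}" -eq 1 ]; then | ||||
|   curl -s -X POST -H 'Content-Type: application/json' \ | ||||
|        --data "{\"text\":\"Concelier merge reported unresolved conflicts; log: ${log}\"}" \ | ||||
|        "${CONCELIER_OPS_WEBHOOK:?set to your #concelier-ops webhook}" | ||||
| fi | ||||
| exit "${rc}" | ||||
| ``` | ||||
|  | ||||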
| ### Threshold updates (2025-10-12) | ||||
|  | ||||
| - `concelier.merge.conflicts` – Page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging. | ||||
| - `concelier.merge.overrides` – Raise a warning when the 30-minute sum exceeds 10 (canonical triple yields exactly 1 summary override with `primary_source=osv`, `suppressed_source=ghsa`). | ||||
| - `concelier.merge.range_overrides` – Maintain the 15-minute alert at ≥ 3 but annotate dashboards that the regression triple emits a single `package_type=semver` override so ops can spot unexpected spikes. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4. Triage Workflow | ||||
|  | ||||
| 1. **Confirm job context** | ||||
|    - `stellaops-cli db merge` (CLI) or `POST /jobs/merge:reconcile` (API) to rehydrate the merge job. Use `--verbose` to stream structured logs during triage. | ||||
| 2. **Inspect metrics** | ||||
|    - Correlate spikes in `concelier.merge.conflicts` with `primary_source`/`suppressed_source` tags from `concelier.merge.overrides` (see the PromQL sketch after this list). | ||||
| 3. **Pull structured logs** | ||||
|    - Example (vector output): | ||||
|      ``` | ||||
|      jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-concelier.log | ||||
|      ``` | ||||
| 4. **Review merge events** | ||||
|    - `mongosh`: | ||||
|      ```javascript | ||||
|      use concelier; | ||||
|      db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5); | ||||
|      ``` | ||||
|    - Compare `beforeHash` vs `afterHash` to confirm the merge actually changed canonical output. | ||||
| 5. **Interrogate provenance** | ||||
|    - `db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })` | ||||
|    - Check `provenance[].decisionReason` values (`precedence`, `freshness`, `tie-breaker`) to understand why the winning field was chosen. | ||||
|  | ||||
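| The correlation in step 2 can also be run as ad-hoc PromQL. The sketches below assume the OTEL counters are exported with dots flattened to underscores and a `_total` suffix (the usual Prometheus export convention); adjust the names if your collector maps them differently. | ||||
|  | ||||
| ``` | ||||
| sum by (type, reason) (increase(concelier_merge_conflicts_total[30m])) | ||||
| sum by (primary_source, suppressed_source) (increase(concelier_merge_overrides_total[30m])) | ||||
| ``` | ||||
|  | ||||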
| --- | ||||
|  | ||||
| ## 5. Conflict Classification Matrix | ||||
|  | ||||
| | Signal | Likely Cause | Immediate Action | | ||||
| |--------|--------------|------------------| | ||||
| | `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. | | ||||
| | `reason="primary_missing"` | Higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow lower-ranked source via precedence override. | | ||||
| | `reason="equal_rank"` | Two feeds share the same precedence rank (custom config or missing entry). | Update `concelier:merge:precedence:ranks` to break the tie; restart merge job. | | ||||
| | Rising `concelier.merge.range_overrides` for a package type | Vendor PSIRT now supplies richer ranges. | Validate connectors emit `decisionReason="precedence"` and update dashboards to treat registry ranges as fallback. | | ||||
| | `concelier.merge.identity_conflicts` > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect `Alias collision` log payload; reconcile the alias graph by adjusting connector alias output. | | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6. Resolution Playbook | ||||
|  | ||||
| 1. **Connector data fix** | ||||
|    - Re-run the offending connector stages (`stellaops-cli db fetch --source ghsa --stage map` etc.). | ||||
|    - Once fixed, rerun merge and verify `decisionReason` reflects `freshness` or `precedence` as expected. | ||||
| 2. **Temporary precedence override** | ||||
|    - Edit `etc/concelier.yaml`: | ||||
|      ```yaml | ||||
|      concelier: | ||||
|        merge: | ||||
|          precedence: | ||||
|            ranks: | ||||
|              osv: 1 | ||||
|              ghsa: 0 | ||||
|      ``` | ||||
|    - Restart Concelier workers; confirm tags in `concelier.merge.overrides` show the new ranks. | ||||
|    - Document the override with expiry in the change log. | ||||
| 3. **Alias remediation** | ||||
|    - Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs). | ||||
|    - Flush cached alias graphs if necessary (`db.alias_graph.drop()` is destructive; coordinate with Storage before issuing it). | ||||
| 4. **Escalation** | ||||
|    - If override metrics spike due to upstream regression, open an incident with Security Guild, referencing merge logs and `merge_event` IDs. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7. Validation Checklist | ||||
|  | ||||
| - [ ] Merge job rerun returns exit code `0`. | ||||
| - [ ] `concelier.merge.conflicts` baseline returns to zero after corrective action. | ||||
| - [ ] Latest `merge_event` entry shows expected hash delta. | ||||
| - [ ] Affected advisory document shows updated `provenance[].decisionReason`. | ||||
| - [ ] Ops change log updated with incident summary, config overrides, and rollback plan. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 8. Reference Material | ||||
|  | ||||
| - Canonical conflict rules: `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md`. | ||||
| - Merge engine internals: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/AdvisoryPrecedenceMerger.cs`. | ||||
| - Metrics definitions: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/AdvisoryMergeService.cs` (identity conflicts) and `AdvisoryPrecedenceMerger`. | ||||
| - Storage audit trail: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/MergeEventWriter.cs`, `src/Concelier/__Libraries/StellaOps.Concelier.Storage.Mongo/MergeEvents`. | ||||
|  | ||||
| Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 9. Synthetic Regression Fixtures | ||||
|  | ||||
| - **Locations** – Canonical conflict snapshots now live at `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json`, `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json`, and `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/Fixtures/conflict-osv.canonical.json`. | ||||
| - **Validation commands** – To regenerate and verify the fixtures offline, run: | ||||
|  | ||||
| ```bash | ||||
| dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ghsa.Tests/StellaOps.Concelier.Connector.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests | ||||
| dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Nvd.Tests/StellaOps.Concelier.Connector.Nvd.Tests.csproj --filter NvdConflictFixtureTests | ||||
| dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj --filter OsvConflictFixtureTests | ||||
| dotnet test src/Concelier/__Tests/StellaOps.Concelier.Merge.Tests/StellaOps.Concelier.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions | ||||
| ``` | ||||
|  | ||||
| - **Expected signals** – The triple produces one freshness-driven summary override (`primary_source=osv`, `suppressed_source=ghsa`) and one range override for the npm SemVer package while leaving `concelier.merge.conflicts` at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 10. Change Log | ||||
|  | ||||
| | Date (UTC) | Change | Notes | | ||||
| |------------|--------|-------| | ||||
| | 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by ICS CISA connector. | Ops sign-off recorded by Concelier Ops Guild; no additional overrides required. | | ||||
docs/modules/concelier/operations/connectors/apple.md (new file, 77 lines)
							| @@ -0,0 +1,77 @@ | ||||
| # Concelier Apple Security Update Connector Operations | ||||
|  | ||||
| This runbook covers staging and production rollout for the Apple security updates connector (`source:vndr-apple:*`), including observability checks and fixture maintenance. | ||||
|  | ||||
| ## 1. Prerequisites | ||||
|  | ||||
| - Network egress (or mirrored cache) for `https://gdmf.apple.com/v2/pmv` and the Apple Support domain (`https://support.apple.com/`). | ||||
| - Optional: corporate proxy exclusions for the Apple hosts if outbound traffic is normally filtered. | ||||
| - Updated configuration (environment variables or `concelier.yaml`) with an `apple` section. Example baseline: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     apple: | ||||
|       softwareLookupUri: "https://gdmf.apple.com/v2/pmv" | ||||
|       advisoryBaseUri: "https://support.apple.com/" | ||||
|       localeSegment: "en-us" | ||||
|       maxAdvisoriesPerFetch: 25 | ||||
|       initialBackfill: "120.00:00:00" | ||||
|       modifiedTolerance: "02:00:00" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| > ℹ️  `softwareLookupUri` and `advisoryBaseUri` must stay absolute and aligned with the HTTP allow-list; Concelier automatically adds both hosts to the connector HttpClient. | ||||
|  | ||||
| ## 2. Staging Smoke Test | ||||
|  | ||||
| 1. Deploy the configuration and restart the Concelier workers to ensure the Apple connector options are bound. | ||||
| 2. Trigger a full connector cycle: | ||||
|    - CLI: `stella db jobs run source:vndr-apple:fetch --and-then source:vndr-apple:parse --and-then source:vndr-apple:map` | ||||
|    - REST: `POST /jobs/run { "kind": "source:vndr-apple:fetch", "chain": ["source:vndr-apple:parse", "source:vndr-apple:map"] }` | ||||
| 3. Validate metrics exported under meter `StellaOps.Concelier.Connector.Vndr.Apple`: | ||||
|    - `apple.fetch.items` (documents fetched) | ||||
|    - `apple.fetch.failures` | ||||
|    - `apple.fetch.unchanged` | ||||
|    - `apple.parse.failures` | ||||
|    - `apple.map.affected.count` (histogram of affected package counts) | ||||
| 4. Cross-check the shared HTTP counters: | ||||
|    - `concelier.source.http.requests_total{concelier_source="vndr-apple"}` should increase for both index and detail phases. | ||||
|    - `concelier.source.http.failures_total{concelier_source="vndr-apple"}` should remain flat (0) during a healthy run. | ||||
| 5. Inspect the info logs: | ||||
|    - `Apple software index fetch … processed=X newDocuments=Y` | ||||
|    - `Apple advisory parse complete … aliases=… affected=…` | ||||
|    - `Mapped Apple advisory … pendingMappings=0` | ||||
| 6. Confirm MongoDB state: | ||||
|    - `raw_documents` store contains the HT article HTML with metadata (`apple.articleId`, `apple.postingDate`). | ||||
|    - `dtos` store has `schemaVersion="apple.security.update.v1"`. | ||||
|    - `advisories` collection includes keys `HTxxxxxx` with normalized SemVer rules. | ||||
|    - `source_states` entry for `apple` shows a recent `cursor.lastPosted`. | ||||
|  | ||||
| ## 3. Production Monitoring | ||||
|  | ||||
| - **Dashboards** – Add the following expressions to your Concelier Grafana board (OTLP/Prometheus naming assumed): | ||||
|   - `rate(apple_fetch_items_total[15m])` vs `rate(concelier_source_http_requests_total{concelier_source="vndr-apple"}[15m])` | ||||
|   - `rate(apple_fetch_failures_total[5m])` for error spikes (`severity=warning` at `>0`) | ||||
|   - `histogram_quantile(0.95, rate(apple_map_affected_count_bucket[1h]))` to watch affected-package fan-out | ||||
|   - `increase(apple_parse_failures_total[6h])` to catch parser drift (alerts at `>0`) | ||||
| - **Alerts** – Page if `rate(apple_fetch_items_total[2h]) == 0` during business hours while other connectors are active. This often indicates lookup feed failures or misconfigured allow-lists. | ||||
| - **Logs** – Surface warnings `Apple document {DocumentId} missing GridFS payload` or `Apple parse failed`—repeated hits imply storage issues or HTML regressions. | ||||
| - **Telemetry pipeline** – `StellaOps.Concelier.WebService` now exports `StellaOps.Concelier.Connector.Vndr.Apple` alongside existing Concelier meters; ensure your OTEL collector or Prometheus scraper includes it. | ||||
|  | ||||
| ## 4. Fixture Maintenance | ||||
|  | ||||
| Regression fixtures live under `src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests/Apple/Fixtures`. Refresh them whenever Apple reshapes the HT layout or when new platforms appear. | ||||
|  | ||||
| 1. Run the helper script matching your platform: | ||||
|    - Bash: `./scripts/update-apple-fixtures.sh` | ||||
|    - PowerShell: `./scripts/update-apple-fixtures.ps1` | ||||
| 2. Each script exports `UPDATE_APPLE_FIXTURES=1`, updates the `WSLENV` passthrough, and touches `.update-apple-fixtures` so WSL+VS Code test runs observe the flag. The subsequent test execution fetches the live HT articles listed in `AppleFixtureManager`, sanitises the HTML, and rewrites the `.expected.json` DTO snapshots. | ||||
| 3. Review the diff for localisation or nav noise. Once satisfied, re-run the tests without the env var (`dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests.csproj`) to verify determinism. | ||||
| 4. Commit fixture updates together with any parser/mapping changes that motivated them. | ||||
|  | ||||
| ## 5. Known Issues & Follow-up Tasks | ||||
|  | ||||
| - Apple occasionally throttles anonymous requests after bursts. The connector backs off automatically, but persistent `apple.fetch.failures` spikes might require mirroring the HT content or scheduling wider fetch windows. | ||||
| - Rapid Security Responses may appear before the general patch notes surface in the lookup JSON. When that happens, the fetch run will log `detailFailures>0`. Collect sample HTML and refresh fixtures to confirm parser coverage. | ||||
| - Multi-locale content is still under regression sweep (`src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Vndr.Apple/TASKS.md`). Capture non-`en-us` snapshots once the fixture tooling stabilises. | ||||
docs/modules/concelier/operations/connectors/cccs.md (new file, 72 lines)
							| @@ -0,0 +1,72 @@ | ||||
| # Concelier CCCS Connector Operations | ||||
|  | ||||
| This runbook covers day‑to‑day operation of the Canadian Centre for Cyber Security (`source:cccs:*`) connector, including configuration, telemetry, and historical backfill guidance for English/French advisories. | ||||
|  | ||||
| ## 1. Configuration Checklist | ||||
|  | ||||
| - Network egress (or mirrored cache) for `https://www.cyber.gc.ca/` and the JSON API endpoints under `/api/cccs/`. | ||||
| - Set the Concelier options before restarting workers. Example `concelier.yaml` snippet: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     cccs: | ||||
|       feeds: | ||||
|         - language: "en" | ||||
|           uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat" | ||||
|         - language: "fr" | ||||
|           uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat" | ||||
|       maxEntriesPerFetch: 80        # increase temporarily for backfill runs | ||||
|       maxKnownEntries: 512 | ||||
|       requestTimeout: "00:00:30" | ||||
|       requestDelay: "00:00:00.250" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| > ℹ️  The `/api/cccs/threats/v1/get` endpoint returns thousands of records per language (≈5 100 rows each as of 2025‑10‑14). The connector honours `maxEntriesPerFetch`, so leave it low for steady‑state and raise it for planned backfills. | ||||
|  | ||||
| ## 2. Telemetry & Logging | ||||
|  | ||||
| - **Metrics (Meter `StellaOps.Concelier.Connector.Cccs`):** | ||||
|   - `cccs.fetch.attempts`, `cccs.fetch.success`, `cccs.fetch.failures` | ||||
|   - `cccs.fetch.documents`, `cccs.fetch.unchanged` | ||||
|   - `cccs.parse.success`, `cccs.parse.failures`, `cccs.parse.quarantine` | ||||
|   - `cccs.map.success`, `cccs.map.failures` | ||||
| - **Shared HTTP metrics** via `SourceDiagnostics`: | ||||
|   - `concelier.source.http.requests{concelier.source="cccs"}` | ||||
|   - `concelier.source.http.failures{concelier.source="cccs"}` | ||||
|   - `concelier.source.http.duration{concelier.source="cccs"}` | ||||
| - **Structured logs** | ||||
|   - `CCCS fetch completed feeds=… items=… newDocuments=… pendingDocuments=…` | ||||
|   - `CCCS parse completed parsed=… failures=…` | ||||
|   - `CCCS map completed mapped=… failures=…` | ||||
|   - Warnings fire when GridFS payloads/DTOs go missing or parser sanitisation fails. | ||||
|  | ||||
| Suggested Grafana alerts: | ||||
| - `increase(cccs.fetch.failures_total[15m]) > 0` | ||||
| - `rate(cccs.map.success_total[1h]) == 0` while other connectors are active | ||||
| - `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cccs"}[1h])) > 5s` | ||||
|  | ||||
| ## 3. Historical Backfill Plan | ||||
|  | ||||
| 1. **Snapshot the source** – the API accepts `page=<n>` and `lang=<en|fr>` query parameters. `page=0` returns the full dataset (observed earliest `date_created`: 2018‑06‑08 for EN, 2018‑06‑08 for FR). Mirror those responses into Offline Kit storage when operating air‑gapped. | ||||
| 2. **Stage ingestion**: | ||||
|    - Temporarily raise `maxEntriesPerFetch` (e.g. 500) and restart Concelier workers. | ||||
|    - Run chained jobs until `pendingDocuments` drains:   | ||||
|      `stella db jobs run source:cccs:fetch --and-then source:cccs:parse --and-then source:cccs:map` | ||||
|    - Monitor `cccs.fetch.unchanged` growth; once it approaches dataset size the backfill is complete. | ||||
| 3. **Optional pagination sweep** – for incremental mirrors, iterate `page=<n>` (0…N) while `response.Count == 50`, persisting JSON to disk (a shell sketch follows this list). Store alongside metadata (`language`, `page`, SHA256) so repeated runs detect drift. | ||||
| 4. **Language split** – keep EN/FR payloads separate to preserve canonical language fields. The connector emits `Language` directly from the feed entry, so mixed ingestion simply produces parallel advisories keyed by the same serial number. | ||||
| 5. **Throttle planning** – schedule backfills during maintenance windows; the API tolerates burst downloads, but respect the 250 ms request delay (or raise it) whenever you are hitting the live API rather than a local mirror. | ||||
|  | ||||
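| The optional pagination sweep from step 3 can be scripted as below. This is a minimal sketch assuming the API shape described above (50 rows per page, a `Count` field on the response) and requires `curl` and `jq`; verify the field name against a live payload and tune the delay and output directory for your environment. | ||||
|  | ||||
| ```bash | ||||
| #!/usr/bin/env bash | ||||
| # Illustrative CCCS pagination sweep: persist one JSON file per page plus SHA256 sums. | ||||
| set -euo pipefail | ||||
| lang="en"                 # or "fr"; keep languages in separate sweeps | ||||
| out="cccs-${lang}" | ||||
| mkdir -p "${out}" | ||||
| page=0 | ||||
| while :; do | ||||
|   file="${out}/page-${page}.json" | ||||
|   curl -s "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=${lang}&content_type=cccs_threat&page=${page}" -o "${file}" | ||||
|   sha256sum "${file}" >> "${out}/SHA256SUMS" | ||||
|   count=$(jq -r '.Count // 0' "${file}")    # field name taken from this runbook; confirm against a live response | ||||
|   echo "page=${page} count=${count}" | ||||
|   [ "${count}" -eq 50 ] || break | ||||
|   page=$((page + 1)) | ||||
|   sleep 0.25                                # mirror the connector's 250 ms request delay | ||||
| done | ||||
| ``` | ||||
|  | ||||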
| ## 4. Selector & Sanitiser Notes | ||||
|  | ||||
| - `CccsHtmlParser` now parses the **unsanitised DOM** (via AngleSharp) and only sanitises when persisting `ContentHtml`. | ||||
| - Product extraction walks headings (`Affected Products`, `Produits touchés`, `Mesures recommandées`) and consumes nested lists within `div/section/article` containers. | ||||
| - `HtmlContentSanitizer` allows `<h1>…<h6>` and `<section>` so stored HTML keeps headings for UI rendering and downstream summarisation. | ||||
|  | ||||
| ## 5. Fixture Maintenance | ||||
|  | ||||
| - Regression fixtures live in `src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/Fixtures`. | ||||
| - Refresh via `UPDATE_CCCS_FIXTURES=1 dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/StellaOps.Concelier.Connector.Cccs.Tests.csproj`. | ||||
| - Fixtures capture both EN/FR advisories with nested lists to guard against sanitiser regressions; review diffs for heading/list changes before committing. | ||||
docs/modules/concelier/operations/connectors/certbund.md (new file, 146 lines)
							| @@ -0,0 +1,146 @@ | ||||
| # Concelier CERT-Bund Connector Operations | ||||
|  | ||||
| _Last updated: 2025-10-17_ | ||||
|  | ||||
| Germany’s Federal Office for Information Security (BSI) operates the Warn- und Informationsdienst (WID) portal. The Concelier CERT-Bund connector (`source:cert-bund:*`) ingests the public RSS feed, hydrates the portal’s JSON detail endpoint, and maps the result into canonical advisories while preserving the original German content. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1. Configuration Checklist | ||||
|  | ||||
| - Allow outbound access (or stage mirrors) for: | ||||
|   - `https://wid.cert-bund.de/content/public/securityAdvisory/rss` | ||||
|   - `https://wid.cert-bund.de/portal/` (session/bootstrap) | ||||
|   - `https://wid.cert-bund.de/portal/api/securityadvisory` (detail/search/export JSON) | ||||
| - Ensure the HTTP client reuses a cookie container (the connector’s dependency injection wiring already sets this up). | ||||
|  | ||||
| Example `concelier.yaml` fragment: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     cert-bund: | ||||
|       feedUri: "https://wid.cert-bund.de/content/public/securityAdvisory/rss" | ||||
|       portalBootstrapUri: "https://wid.cert-bund.de/portal/" | ||||
|       detailApiUri: "https://wid.cert-bund.de/portal/api/securityadvisory" | ||||
|       maxAdvisoriesPerFetch: 50 | ||||
|       maxKnownAdvisories: 512 | ||||
|       requestTimeout: "00:00:30" | ||||
|       requestDelay: "00:00:00.250" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| > Leave `maxAdvisoriesPerFetch` at 50 during normal operation. Raise it only for controlled backfills, then restore the default to avoid overwhelming the portal. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2. Telemetry & Logging | ||||
|  | ||||
| - **Meter**: `StellaOps.Concelier.Connector.CertBund` | ||||
| - **Counters / histograms**: | ||||
|   - `certbund.feed.fetch.attempts|success|failures` | ||||
|   - `certbund.feed.items.count` | ||||
|   - `certbund.feed.enqueued.count` | ||||
|   - `certbund.feed.coverage.days` | ||||
|   - `certbund.detail.fetch.attempts|success|not_modified|failures{reason}` | ||||
|   - `certbund.parse.success|failures{reason}` | ||||
|   - `certbund.parse.products.count`, `certbund.parse.cve.count` | ||||
|   - `certbund.map.success|failures{reason}` | ||||
|   - `certbund.map.affected.count`, `certbund.map.aliases.count` | ||||
| - Shared HTTP metrics remain available through `concelier.source.http.*`. | ||||
|  | ||||
| **Structured logs** (all emitted at information level when work occurs): | ||||
|  | ||||
| - `CERT-Bund fetch cycle: … truncated {Truncated}, coverageDays={CoverageDays}` | ||||
| - `CERT-Bund parse cycle: parsed {Parsed}, failures {Failures}, …` | ||||
| - `CERT-Bund map cycle: mapped {Mapped}, failures {Failures}, …` | ||||
|  | ||||
| Alerting ideas: | ||||
|  | ||||
| 1. `increase(certbund.detail.fetch.failures_total[10m]) > 0` | ||||
| 2. `rate(certbund.map.success_total[30m]) == 0` | ||||
| 3. `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cert-bund"}[15m])) > 5s` | ||||
|  | ||||
| The WebService now registers the meter so metrics surface automatically once OpenTelemetry metrics are enabled. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3. Historical Backfill & Export Strategy | ||||
|  | ||||
| ### 3.1 Retention snapshot | ||||
|  | ||||
| - RSS window: ~250 advisories (≈90 days at current cadence). | ||||
| - Older advisories are accessible through the JSON search/export APIs once the anti-CSRF token is supplied. | ||||
|  | ||||
| ### 3.2 JSON search pagination | ||||
|  | ||||
| ```bash | ||||
| # 1. Bootstrap cookies (client_config + XSRF-TOKEN) | ||||
| curl -s -c cookies.txt "https://wid.cert-bund.de/portal/" > /dev/null | ||||
| curl -s -b cookies.txt -c cookies.txt \ | ||||
|      -H "X-Requested-With: XMLHttpRequest" \ | ||||
|      "https://wid.cert-bund.de/portal/api/security/csrf" > /dev/null | ||||
|  | ||||
| XSRF=$(awk '/XSRF-TOKEN/ {print $7}' cookies.txt) | ||||
|  | ||||
| # 2. Page search results | ||||
| curl -s -b cookies.txt \ | ||||
|      -H "Content-Type: application/json" \ | ||||
|      -H "Accept: application/json" \ | ||||
|      -H "X-XSRF-TOKEN: ${XSRF}" \ | ||||
|      -X POST \ | ||||
|      --data '{"page":4,"size":100,"sort":["published,desc"]}' \ | ||||
|      "https://wid.cert-bund.de/portal/api/securityadvisory/search" \ | ||||
|      > certbund-page4.json | ||||
| ``` | ||||
|  | ||||
| Iterate `page` until the response `content` array is empty. Pages 0–9 currently cover 2014→present. Persist JSON responses (plus SHA256) for Offline Kit parity. | ||||
|  | ||||
| > **Shortcut** – run `python src/Tools/certbund_offline_snapshot.py --output seed-data/cert-bund` | ||||
| > to bootstrap the session, capture the paginated search responses, and regenerate | ||||
| > the manifest/checksum files automatically. Supply `--cookie-file` and `--xsrf-token` | ||||
| > if the portal requires a browser-derived session (see options via `--help`). | ||||
|  | ||||
| ### 3.3 Export bundles | ||||
|  | ||||
| ```bash | ||||
| python src/Tools/certbund_offline_snapshot.py \ | ||||
|   --output seed-data/cert-bund \ | ||||
|   --start-year 2014 \ | ||||
|   --end-year "$(date -u +%Y)" | ||||
| ``` | ||||
|  | ||||
| The helper stores yearly exports under `seed-data/cert-bund/export/`, | ||||
| captures paginated search snapshots in `seed-data/cert-bund/search/`, | ||||
| and generates the manifest + SHA files in `seed-data/cert-bund/manifest/`. | ||||
| Split ranges according to your compliance window (default: one file per | ||||
| calendar year). Concelier can ingest these JSON payloads directly when | ||||
| operating offline. | ||||
|  | ||||
| > When automatic bootstrap fails (e.g. portal introduces CAPTCHA), run the | ||||
| > manual `curl` flow above, then rerun the helper with `--skip-fetch` to | ||||
| > rebuild the manifest from the existing files. | ||||
|  | ||||
| ### 3.4 Connector-driven catch-up | ||||
|  | ||||
| 1. Temporarily raise `maxAdvisoriesPerFetch` (e.g. 150) and reduce `requestDelay`. | ||||
| 2. Run `stella db jobs run source:cert-bund:fetch --and-then source:cert-bund:parse --and-then source:cert-bund:map` until the fetch log reports `enqueued=0`. | ||||
| 3. Restore defaults and capture the cursor snapshot for audit. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4. Locale & Translation Guidance | ||||
|  | ||||
| - Advisories remain in German (`language: "de"`). Preserve wording for provenance and legal accuracy. | ||||
| - UI localisation: enable the translation bundles documented in `docs/15_UI_GUIDE.md` if English UI copy is required. Operators can overlay machine or human translations, but the canonical database stores the source text. | ||||
| - Docs guild is compiling a CERT-Bund terminology glossary under `docs/locale/certbund-glossary.md` so downstream teams can reference consistent English equivalents without altering the stored advisories. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5. Verification Checklist | ||||
|  | ||||
| 1. Observe `certbund.feed.fetch.success` and `certbund.detail.fetch.success` increments after runs; `certbund.feed.coverage.days` should hover near the observed RSS window. | ||||
| 2. Ensure summary logs report `truncated=false` in steady state—`true` indicates the fetch cap was hit. | ||||
| 3. During backfills, watch `certbund.feed.enqueued.count` trend to zero. | ||||
| 4. Spot-check stored advisories in Mongo to confirm `language="de"` and reference URLs match the portal detail endpoint. | ||||
| 5. For Offline Kit exports, validate SHA256 hashes before distribution. | ||||
docs/modules/concelier/operations/connectors/cisco.md (new file, 94 lines)
							| @@ -0,0 +1,94 @@ | ||||
| # Concelier Cisco PSIRT Connector – OAuth Provisioning SOP | ||||
|  | ||||
| _Last updated: 2025-10-14_ | ||||
|  | ||||
| ## 1. Scope | ||||
|  | ||||
| This runbook describes how Ops provisions, rotates, and distributes Cisco PSIRT openVuln OAuth client credentials for the Concelier Cisco connector. It covers online and air-gapped (Offline Kit) environments, quota-aware execution, and escalation paths. | ||||
|  | ||||
| ## 2. Prerequisites | ||||
|  | ||||
| - Active Cisco.com (CCO) account with access to the Cisco API Console. | ||||
| - Cisco PSIRT openVuln API entitlement (visible under “My Apps & Keys” once granted). | ||||
| - Concelier configuration location (typically `/etc/stella/concelier.yaml` in production) or Offline Kit secret bundle staging directory. | ||||
|  | ||||
| ## 3. Provisioning workflow | ||||
|  | ||||
| 1. **Register the application** | ||||
|    - Sign in at <https://apiconsole.cisco.com>. | ||||
|    - Select **Register a New App** → Application Type: `Service`, Grant Type: `Client Credentials`, API: `Cisco PSIRT openVuln API`. | ||||
|    - Record the generated `clientId` and `clientSecret` in the Ops vault. | ||||
| 2. **Verify token issuance** | ||||
|    - Request an access token with: | ||||
|      ```bash | ||||
|      curl -s https://id.cisco.com/oauth2/default/v1/token \ | ||||
|        -H "Content-Type: application/x-www-form-urlencoded" \ | ||||
|        -d "grant_type=client_credentials" \ | ||||
|        -d "client_id=${CLIENT_ID}" \ | ||||
|        -d "client_secret=${CLIENT_SECRET}" | ||||
|      ``` | ||||
|    - Confirm HTTP 200 and an `expires_in` value of 3600 seconds (tokens live for one hour). | ||||
|    - Preserve the response only long enough to validate syntax; do **not** persist tokens. | ||||
| 3. **Authorize Concelier runtime** | ||||
|    - Update `concelier:sources:cisco:auth` (or the module-specific secret template) with the stored credentials. | ||||
|    - For Offline Kit delivery, export encrypted secrets into `offline-kit/secrets/cisco-openvuln.json` using the platform’s sealed secret format. | ||||
| 4. **Connectivity validation** | ||||
|    - From the Concelier control plane, run `stella db jobs run source:vndr-cisco:fetch --dry-run`. | ||||
|    - Ensure the Source HTTP diagnostics record `Bearer` authorization headers and no 401/403 responses. | ||||
|  | ||||
| ## 4. Rotation SOP | ||||
|  | ||||
| | Step | Owner | Notes | | ||||
| | --- | --- | --- | | ||||
| | 1. Schedule rotation | Ops (monthly board) | Rotate every 90 days or immediately after suspected credential exposure. | | ||||
| | 2. Create replacement app | Ops | Repeat §3.1 with “-next” suffix; verify token issuance. | | ||||
| | 3. Stage dual credentials | Ops + Concelier On-Call | Publish new credentials to secret store alongside current pair. | | ||||
| | 4. Cut over | Concelier On-Call | Restart connector workers during a low-traffic window (<10 min) to pick up the new secret. | | ||||
| | 5. Deactivate legacy app | Ops | Delete prior app in Cisco API Console once telemetry confirms successful fetch/parse cycles for 2 consecutive hours. | | ||||
|  | ||||
| **Automation hooks** | ||||
| - Rotation reminders are tracked in OpsRunbookOps board (`OPS-RUN-KEYS` swim lane); add checklist items for Concelier Cisco when opening a rotation task. | ||||
| - Use the secret management pipeline (`ops/secrets/rotate.sh --connector cisco`) to template vault updates; the script renders a redacted diff for audit. | ||||
|  | ||||
| ## 5. Offline Kit packaging | ||||
|  | ||||
| 1. Generate the credential bundle using the Offline Kit CLI:   | ||||
|    `offline-kit secrets add cisco-openvuln --client-id … --client-secret …` | ||||
| 2. Store the encrypted payload under `offline-kit/secrets/cisco-openvuln.enc`. | ||||
| 3. Distribute via the Offline Kit channel; update `offline-kit/MANIFEST.md` with the credential fingerprint (SHA256 of plaintext concatenated with metadata). | ||||
| 4. Document validation steps for the receiving site (token request from an air-gapped relay or cached token mirror). | ||||
|  | ||||
| ## 6. Quota and throttling guidance | ||||
|  | ||||
| - Cisco enforces combined limits of 5 requests/second, 30 requests/minute, and 5 000 requests/day per application. | ||||
| - Concelier fetch jobs must respect `Retry-After` headers on HTTP 429 responses; Ops should monitor for sustained quota saturation and consider paging window adjustments (a manual probe sketch follows this list). | ||||
| - Telemetry to watch: `concelier.source.http.requests{concelier.source="vndr-cisco"}`, `concelier.source.http.failures{...}`, and connector-specific metrics once implemented. | ||||
|  | ||||
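| For ad-hoc quota validation (for example right after provisioning new credentials), the sketch below probes an openVuln endpoint once and sleeps for the advertised `Retry-After` before retrying on HTTP 429. The endpoint URL and token variable are placeholders (obtain the token as in §3.2); Concelier's own fetch jobs handle throttling internally, so treat this purely as a manual check. | ||||
|  | ||||
| ```bash | ||||
| #!/usr/bin/env bash | ||||
| # Manual probe that honours Retry-After on HTTP 429 (illustrative; not part of the connector). | ||||
| set -euo pipefail | ||||
| URL="${OPENVULN_URL:?set OPENVULN_URL to the openVuln endpoint you want to probe}" | ||||
| TOKEN="${TOKEN:?set TOKEN to an access token obtained as in section 3.2}" | ||||
| while :; do | ||||
|   status=$(curl -s -o /tmp/openvuln.json -D /tmp/openvuln-headers.txt \ | ||||
|                 -H "Authorization: Bearer ${TOKEN}" -w '%{http_code}' "${URL}") | ||||
|   if [ "${status}" != "429" ]; then | ||||
|     echo "HTTP ${status}" | ||||
|     break | ||||
|   fi | ||||
|   wait_s=$(awk -F': ' 'tolower($1)=="retry-after" {print $2}' /tmp/openvuln-headers.txt | tr -d '\r') | ||||
|   echo "Rate limited; sleeping ${wait_s:-60}s" | ||||
|   sleep "${wait_s:-60}" | ||||
| done | ||||
| ``` | ||||
|  | ||||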
| ## 7. Telemetry & Monitoring | ||||
|  | ||||
| - **Metrics (Meter `StellaOps.Concelier.Connector.Vndr.Cisco`)** | ||||
|   - `cisco.fetch.documents`, `cisco.fetch.failures`, `cisco.fetch.unchanged` | ||||
|   - `cisco.parse.success`, `cisco.parse.failures` | ||||
|   - `cisco.map.success`, `cisco.map.failures`, `cisco.map.affected.packages` | ||||
| - **Shared HTTP metrics** via `SourceDiagnostics`: | ||||
|   - `concelier.source.http.requests{concelier.source="vndr-cisco"}` | ||||
|   - `concelier.source.http.failures{concelier.source="vndr-cisco"}` | ||||
|   - `concelier.source.http.duration{concelier.source="vndr-cisco"}` | ||||
| - **Structured logs** | ||||
|   - `Cisco fetch completed date=… pages=… added=…` (info) | ||||
|   - `Cisco parse completed parsed=… failures=…` (info) | ||||
|   - `Cisco map completed mapped=… failures=…` (info) | ||||
|   - Warnings surface when DTO serialization fails or GridFS payload is missing. | ||||
| - Suggested alerts: non-zero `cisco.fetch.failures` in 15m, or `cisco.map.success` flatlines while fetch continues. | ||||
|  | ||||
| ## 8. Incident response | ||||
|  | ||||
| - **Token compromise** – revoke the application in the Cisco API Console, purge cached secrets, rotate immediately per §4. | ||||
| - **Persistent 401/403** – confirm credentials in vault, then validate token issuance; if unresolved, open a Cisco DevNet support ticket referencing the application ID. | ||||
| - **429 spikes** – inspect job scheduler cadence and adjust connector options (`maxRequestsPerWindow`) before requesting higher quotas from Cisco. | ||||
|  | ||||
| ## 9. References | ||||
|  | ||||
| - Cisco PSIRT openVuln API Authentication Guide. | ||||
| - Accessing the openVuln API using curl (token lifetime). | ||||
| - openVuln API rate limit documentation. | ||||
| @@ -0,0 +1,151 @@ | ||||
| { | ||||
|   "title": "Concelier CVE & KEV Observability", | ||||
|   "uid": "concelier-cve-kev", | ||||
|   "schemaVersion": 38, | ||||
|   "version": 1, | ||||
|   "editable": true, | ||||
|   "timezone": "", | ||||
|   "time": { | ||||
|     "from": "now-24h", | ||||
|     "to": "now" | ||||
|   }, | ||||
|   "refresh": "5m", | ||||
|   "templating": { | ||||
|     "list": [ | ||||
|       { | ||||
|         "name": "datasource", | ||||
|         "type": "datasource", | ||||
|         "query": "prometheus", | ||||
|         "refresh": 1, | ||||
|         "hide": 0 | ||||
|       } | ||||
|     ] | ||||
|   }, | ||||
|   "panels": [ | ||||
|     { | ||||
|       "type": "timeseries", | ||||
|       "title": "CVE fetch success vs failure", | ||||
|       "gridPos": { "h": 9, "w": 12, "x": 0, "y": 0 }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "ops", | ||||
|           "custom": { | ||||
|             "drawStyle": "line", | ||||
|             "lineWidth": 2, | ||||
|             "fillOpacity": 10 | ||||
|           } | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "rate(cve_fetch_success_total[5m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "success" | ||||
|         }, | ||||
|         { | ||||
|           "refId": "B", | ||||
|           "expr": "rate(cve_fetch_failures_total[5m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "failure" | ||||
|         } | ||||
|       ] | ||||
|     }, | ||||
|     { | ||||
|       "type": "timeseries", | ||||
|       "title": "KEV fetch cadence", | ||||
|       "gridPos": { "h": 9, "w": 12, "x": 12, "y": 0 }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "ops", | ||||
|           "custom": { | ||||
|             "drawStyle": "line", | ||||
|             "lineWidth": 2, | ||||
|             "fillOpacity": 10 | ||||
|           } | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "rate(kev_fetch_success_total[30m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "success" | ||||
|         }, | ||||
|         { | ||||
|           "refId": "B", | ||||
|           "expr": "rate(kev_fetch_failures_total[30m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "failure" | ||||
|         }, | ||||
|         { | ||||
|           "refId": "C", | ||||
|           "expr": "rate(kev_fetch_unchanged_total[30m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "unchanged" | ||||
|         } | ||||
|       ] | ||||
|     }, | ||||
|     { | ||||
|       "type": "table", | ||||
|       "title": "KEV parse anomalies (24h)", | ||||
|       "gridPos": { "h": 8, "w": 12, "x": 0, "y": 9 }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "short" | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "sum by (reason) (increase(kev_parse_anomalies_total[24h]))", | ||||
|           "format": "table", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" } | ||||
|         } | ||||
|       ], | ||||
|       "transformations": [ | ||||
|         { | ||||
|           "id": "organize", | ||||
|           "options": { | ||||
|             "renameByName": { | ||||
|               "Value": "count" | ||||
|             } | ||||
|           } | ||||
|         } | ||||
|       ] | ||||
|     }, | ||||
|     { | ||||
|       "type": "timeseries", | ||||
|       "title": "Advisories emitted", | ||||
|       "gridPos": { "h": 8, "w": 12, "x": 12, "y": 9 }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "ops", | ||||
|           "custom": { | ||||
|             "drawStyle": "line", | ||||
|             "lineWidth": 2, | ||||
|             "fillOpacity": 10 | ||||
|           } | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "rate(cve_map_success_total[15m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "CVE" | ||||
|         }, | ||||
|         { | ||||
|           "refId": "B", | ||||
|           "expr": "rate(kev_map_advisories_total[24h])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "KEV" | ||||
|         } | ||||
|       ] | ||||
|     } | ||||
|   ] | ||||
| } | ||||
docs/modules/concelier/operations/connectors/cve-kev.md (new file, 143 lines)
							| @@ -0,0 +1,143 @@ | ||||
| # Concelier CVE & KEV Connector Operations | ||||
|  | ||||
| This playbook equips operators with the steps required to roll out and monitor the CVE Services and CISA KEV connectors across environments. | ||||
|  | ||||
| ## 1. CVE Services Connector (`source:cve:*`) | ||||
|  | ||||
| ### 1.1 Prerequisites | ||||
|  | ||||
| - CVE Services API credentials (organisation ID, user ID, API key) with access to the JSON 5 API. | ||||
| - Network egress to `https://cveawg.mitre.org` (or a mirrored endpoint) from the Concelier workers. | ||||
| - Updated `concelier.yaml` (or the matching environment variables) with the following section: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     cve: | ||||
|       baseEndpoint: "https://cveawg.mitre.org/api/" | ||||
|       apiOrg: "ORG123" | ||||
|       apiUser: "user@example.org" | ||||
|       apiKeyFile: "/var/run/secrets/concelier/cve-api-key" | ||||
|       seedDirectory: "./seed-data/cve" | ||||
|       pageSize: 200 | ||||
|       maxPagesPerFetch: 5 | ||||
|       initialBackfill: "30.00:00:00" | ||||
|       requestDelay: "00:00:00.250" | ||||
|       failureBackoff: "00:10:00" | ||||
| ``` | ||||
|  | ||||
| > ℹ️  Store the API key outside source control. When using `apiKeyFile`, mount the secret file into the container/host; alternatively supply `apiKey` via `CONCELIER_SOURCES__CVE__APIKEY`. | ||||
|  | ||||
| > 🪙  When credentials are not yet available, configure `seedDirectory` to point at mirrored CVE JSON (for example, the repo’s `seed-data/cve/` bundle). The connector will ingest those records and log a warning instead of failing the job; live fetching resumes automatically once `apiOrg` / `apiUser` / `apiKey` are supplied. | ||||
|  | ||||
| ### 1.2 Smoke Test (staging) | ||||
|  | ||||
| 1. Deploy the updated configuration and restart the Concelier service so the connector picks up the credentials. | ||||
| 2. Trigger one end-to-end cycle: | ||||
|    - Concelier CLI: `stella db jobs run source:cve:fetch --and-then source:cve:parse --and-then source:cve:map` | ||||
|    - REST fallback: `POST /jobs/run { "kind": "source:cve:fetch", "chain": ["source:cve:parse", "source:cve:map"] }` | ||||
| 3. Observe the following metrics (exported via OTEL meter `StellaOps.Concelier.Connector.Cve`): | ||||
|    - `cve.fetch.attempts`, `cve.fetch.success`, `cve.fetch.documents`, `cve.fetch.failures`, `cve.fetch.unchanged` | ||||
|    - `cve.parse.success`, `cve.parse.failures`, `cve.parse.quarantine` | ||||
|    - `cve.map.success` | ||||
| 4. Verify Prometheus shows matching `concelier.source.http.requests_total{concelier_source="cve"}` deltas (list vs detail phases) while `concelier.source.http.failures_total{concelier_source="cve"}` stays flat. | ||||
| 5. Confirm the info-level summary log `CVEs fetch window … pages=X detailDocuments=Y detailFailures=Z` appears once per fetch run and shows `detailFailures=0`. | ||||
| 6. Verify the MongoDB advisory store contains fresh CVE advisories (`advisoryKey` prefix `cve/`) and that the source cursor (`source_states` collection) advanced. | ||||
|  | ||||
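| To spot-check step 6 from `mongosh`, the queries below assume the default `concelier` database and the collection names referenced above; field names other than `advisoryKey` vary by deployment, so treat this as a starting point rather than a canonical query. | ||||
|  | ||||
| ```javascript | ||||
| use concelier; | ||||
| // Latest advisories written by the CVE connector (advisoryKey prefix "cve/") | ||||
| db.advisories.find({ advisoryKey: { $regex: "^cve/" } }).sort({ _id: -1 }).limit(5); | ||||
| // Source cursor documents; confirm the CVE entry advanced after the run | ||||
| db.source_states.find().sort({ _id: -1 }).limit(10); | ||||
| ``` | ||||
|  | ||||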
| ### 1.3 Production Monitoring | ||||
|  | ||||
| - **Dashboards** – Plot `rate(cve_fetch_success_total[5m])`, `rate(cve_fetch_failures_total[5m])`, and `rate(cve_fetch_documents_total[5m])` alongside `concelier_source_http_requests_total{concelier_source="cve"}` to confirm HTTP and connector counters stay aligned. Keep `concelier.range.primitives{scheme=~"semver|vendor"}` on the same board for range coverage. Example alerts: | ||||
|   - `rate(cve_fetch_failures_total[5m]) > 0` for 10 minutes (`severity=warning`) | ||||
|   - `rate(cve_map_success_total[15m]) == 0` while `rate(cve_fetch_success_total[15m]) > 0` (`severity=critical`) | ||||
|   - `sum_over_time(cve_parse_quarantine_total[1h]) > 0` to catch schema anomalies | ||||
| - **Logs** – Monitor warnings such as `Failed fetching CVE record {CveId}` and `Malformed CVE JSON`, and surface the summary info log `CVEs fetch window … detailFailures=0 detailUnchanged=0` on dashboards. A non-zero `detailFailures` usually indicates rate-limit or auth issues on detail requests. | ||||
| - **Grafana pack** – Import `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` and filter by panel legend (`CVE`, `KEV`) to reuse the canned layout. | ||||
| - **Backfill window** – Operators can tighten or widen `initialBackfill` / `maxPagesPerFetch` after validating throughput. Update config and restart Concelier to apply changes. | ||||
|  | ||||
| ### 1.4 Staging smoke log (2025-10-15) | ||||
|  | ||||
| While Ops finalises long-lived CVE Services credentials, we validated the connector end-to-end against the recorded CVE-2024-0001 payloads used in regression tests: | ||||
|  | ||||
| - Command: `dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Cve.Tests/StellaOps.Concelier.Connector.Cve.Tests.csproj -l "console;verbosity=detailed"` | ||||
| - Summary log emitted by the connector: | ||||
|   ``` | ||||
|   CVEs fetch window 2024-09-01T00:00:00Z->2024-10-01T00:00:00Z pages=1 listSuccess=1 detailDocuments=1 detailFailures=0 detailUnchanged=0 pendingDocuments=0->1 pendingMappings=0->1 hasMorePages=False nextWindowStart=2024-09-15T12:00:00Z nextWindowEnd=(none) nextPage=1 | ||||
|   ``` | ||||
| - Telemetry captured by `Meter` `StellaOps.Concelier.Connector.Cve`: | ||||
|   | Metric | Value | | ||||
|   |--------|-------| | ||||
|   | `cve.fetch.attempts` | 1 | | ||||
|   | `cve.fetch.success` | 1 | | ||||
|   | `cve.fetch.documents` | 1 | | ||||
|   | `cve.parse.success` | 1 | | ||||
|   | `cve.map.success` | 1 | | ||||
|  | ||||
| The Grafana pack `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` has been imported into staging so the panels referenced above render against these counters once the live API keys are in place. | ||||
|  | ||||
| ## 2. CISA KEV Connector (`source:kev:*`) | ||||
|  | ||||
| ### 2.1 Prerequisites | ||||
|  | ||||
| - Network egress (or mirrored content) for `https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json`. | ||||
| - No credentials are required, but the HTTP allow-list must include `www.cisa.gov`. | ||||
| - Confirm the following snippet in `concelier.yaml` (defaults shown; tune as needed): | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     kev: | ||||
|       feedUri: "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json" | ||||
|       requestTimeout: "00:01:00" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| ### 2.2 Schema validation & anomaly handling | ||||
|  | ||||
| The connector validates each catalog against `Schemas/kev-catalog.schema.json`. Failures increment `kev.parse.failures_total{reason="schema"}` and the document is quarantined (status `Failed`). Additional failure reasons include `download`, `invalidJson`, `deserialize`, `missingPayload`, and `emptyCatalog`. Entry-level anomalies are surfaced through `kev.parse.anomalies_total` with reasons: | ||||
|  | ||||
| | Reason | Meaning | | ||||
| | --- | --- | | ||||
| | `missingCveId` | Catalog entry omitted `cveID`; the entry is skipped. | | ||||
| | `countMismatch` | Catalog `count` field disagreed with the actual entry total. | | ||||
| | `nullEntry` | Upstream emitted a `null` entry object (rare upstream defect). | | ||||
|  | ||||
| Treat repeated schema failures or growing anomaly counts as an upstream regression and coordinate with CISA or mirror maintainers. | ||||
|  | ||||
| ### 2.3 Smoke Test (staging) | ||||
|  | ||||
| 1. Deploy the configuration and restart Concelier. | ||||
| 2. Trigger a pipeline run: | ||||
|    - CLI: `stella db jobs run source:kev:fetch --and-then source:kev:parse --and-then source:kev:map` | ||||
|    - REST: `POST /jobs/run { "kind": "source:kev:fetch", "chain": ["source:kev:parse", "source:kev:map"] }` | ||||
| 3. Verify the metrics exposed by meter `StellaOps.Concelier.Connector.Kev`: | ||||
|    - `kev.fetch.attempts`, `kev.fetch.success`, `kev.fetch.unchanged`, `kev.fetch.failures` | ||||
|    - `kev.parse.entries` (tag `catalogVersion`), `kev.parse.failures`, `kev.parse.anomalies` (tag `reason`) | ||||
|    - `kev.map.advisories` (tag `catalogVersion`) | ||||
| 4. Confirm `concelier.source.http.requests_total{concelier_source="kev"}` increments once per fetch and that the paired `concelier.source.http.failures_total` stays flat (zero increase). | ||||
| 5. Inspect the info logs `Fetched KEV catalog document … pendingDocuments=…` and `Parsed KEV catalog document … entries=…`—they should appear exactly once per run and `Mapped X/Y… skipped=0` should match the `kev.map.advisories` delta. | ||||
| 6. Confirm MongoDB documents exist for the catalog JSON (`raw_documents` & `dtos`) and that advisories with prefix `kev/` are written. | ||||
|  | ||||
| ### 2.4 Production Monitoring | ||||
|  | ||||
| - Alert when `rate(kev_fetch_success_total[8h]) == 0` during working hours (daily cadence breach) and when `increase(kev_fetch_failures_total[1h]) > 0`. | ||||
| - Page the on-call if `increase(kev_parse_failures_total{reason="schema"}[6h]) > 0`—this usually signals an upstream payload change. Treat repeated `reason="download"` spikes as networking issues to the mirror. | ||||
| - Track anomaly spikes through `sum_over_time(kev_parse_anomalies_total{reason="missingCveId"}[24h])`. Rising `countMismatch` trends point to catalog publishing bugs. | ||||
| - Surface the fetch/mapping info logs (`Fetched KEV catalog document …` and `Mapped X/Y KEV advisories … skipped=S`) on dashboards; absence of those logs while metrics show success typically means schema validation short-circuited the run. | ||||
|  | ||||
| ### 2.5 Known good dashboard tiles | ||||
|  | ||||
| Add the following panels to the Concelier observability board: | ||||
|  | ||||
| | Metric | Recommended visualisation | | ||||
| |--------|---------------------------| | ||||
| | `rate(kev_fetch_success_total[30m])` | Single-stat (last 24 h); healthy while `>0`, warn when the rate drops to zero | | ||||
| | `rate(kev_parse_entries_total[1h])` by `catalogVersion` | Stacked area – highlights daily release size | | ||||
| | `sum_over_time(kev_parse_anomalies_total[1d])` by `reason` | Table – anomaly breakdown (matches dashboard panel) | | ||||
| | `rate(cve_map_success_total[15m])` vs `rate(kev_map_advisories_total[24h])` | Comparative timeseries for advisories emitted | | ||||
|  | ||||
| ## 3. Runbook updates | ||||
|  | ||||
| - Record staging/production smoke test results (date, catalog version, advisory counts) in your team’s change log. | ||||
| - Add the CVE/KEV job kinds to the standard maintenance checklist so operators can manually trigger them after planned downtime. | ||||
| - Keep this document in sync with future connector changes (for example, new anomaly reasons or additional metrics). | ||||
| - Version-control dashboard tweaks alongside `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` so operations can re-import the observability pack during restores. | ||||
							
								
								
									
docs/modules/concelier/operations/connectors/ghsa.md (new file, 123 lines)
							| @@ -0,0 +1,123 @@ | ||||
| # Concelier GHSA Connector – Operations Runbook | ||||
|  | ||||
| _Last updated: 2025-10-16_ | ||||
|  | ||||
| ## 1. Overview | ||||
| The GitHub Security Advisories (GHSA) connector pulls advisory metadata from the GitHub REST API `/security/advisories` endpoint. GitHub enforces both primary and secondary rate limits, so operators must monitor usage and configure retries to avoid throttling incidents. | ||||
|  | ||||
| ## 2. Rate-limit telemetry | ||||
| The connector now surfaces rate-limit headers on every fetch and exposes the following metrics via OpenTelemetry: | ||||
|  | ||||
| | Metric | Description | Tags | | ||||
| |--------|-------------|------| | ||||
| | `ghsa.ratelimit.limit` (histogram) | Samples the reported request quota at fetch time. | `phase` = `list` or `detail`, `resource` (e.g., `core`). | | ||||
| | `ghsa.ratelimit.remaining` (histogram) | Remaining requests returned by `X-RateLimit-Remaining`. | `phase`, `resource`. | | ||||
| | `ghsa.ratelimit.reset_seconds` (histogram) | Seconds until `X-RateLimit-Reset`. | `phase`, `resource`. | | ||||
| | `ghsa.ratelimit.headroom_pct` (histogram) | Percentage of the quota still available (`remaining / limit * 100`). | `phase`, `resource`. | | ||||
| | `ghsa.ratelimit.headroom_pct_current` (observable gauge) | Latest headroom percentage reported per resource. | `phase`, `resource`. | | ||||
| | `ghsa.ratelimit.exhausted` (counter) | Incremented whenever GitHub returns a zero remaining quota and the connector delays before retrying. | `phase`. | | ||||
|  | ||||
| ### Dashboards & alerts | ||||
| - Plot `ghsa.ratelimit.remaining` as the latest value to watch the runway. Alert when the value stays below **`RateLimitWarningThreshold`** (default `500`) for more than 5 minutes. | ||||
| - Use `ghsa.ratelimit.headroom_pct_current` to visualise the remaining quota percentage; paging once it sits below **10%** for longer than a single reset window helps avoid secondary limits. | ||||
| - Raise a separate alert on `increase(ghsa.ratelimit.exhausted[15m]) > 0` to catch hard throttles. | ||||
| - Overlay `ghsa.fetch.attempts` vs `ghsa.fetch.failures` to confirm retries are effective. | ||||
|  | ||||
| ## 3. Logging signals | ||||
| When `X-RateLimit-Remaining` falls below `RateLimitWarningThreshold`, the connector emits: | ||||
| ``` | ||||
| GHSA rate limit warning: remaining {Remaining}/{Limit} for {Phase} {Resource} (headroom {Headroom}%) | ||||
| ``` | ||||
| When GitHub reports zero remaining calls, the connector logs the exhaustion and sleeps for the reported `Retry-After`/`X-RateLimit-Reset` interval (falling back to `SecondaryRateLimitBackoff`). | ||||
|  | ||||
| After the quota recovers above the warning threshold the connector writes an informational log with the refreshed remaining/headroom, letting operators clear alerts quickly. | ||||
|  | ||||
| ## 4. Configuration knobs (`concelier.yaml`) | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     ghsa: | ||||
|       apiToken: "${GITHUB_PAT}" | ||||
|       pageSize: 50 | ||||
|       requestDelay: "00:00:00.200" | ||||
|       failureBackoff: "00:05:00" | ||||
|       rateLimitWarningThreshold: 500    # warn below this many remaining calls | ||||
|       secondaryRateLimitBackoff: "00:02:00"  # fallback delay when GitHub omits Retry-After | ||||
| ``` | ||||
|  | ||||
| ### Recommendations | ||||
| - Increase `requestDelay` in air-gapped or burst-heavy deployments to smooth token consumption. | ||||
| - Lower `rateLimitWarningThreshold` only if your dashboards already page on the new histogram; never set it negative. | ||||
| - For bots using a low-privilege PAT, keep `secondaryRateLimitBackoff` at ≥60 seconds to respect GitHub’s secondary-limit guidance. | ||||
|  | ||||
| #### Default job schedule | ||||
|  | ||||
| | Job kind | Cron | Timeout | Lease | | ||||
| |----------|------|---------|-------| | ||||
| | `source:ghsa:fetch` | `1,11,21,31,41,51 * * * *` | 6 minutes | 4 minutes | | ||||
| | `source:ghsa:parse` | `3,13,23,33,43,53 * * * *` | 5 minutes | 4 minutes | | ||||
| | `source:ghsa:map` | `5,15,25,35,45,55 * * * *` | 5 minutes | 4 minutes | | ||||
|  | ||||
| These defaults spread GHSA stages across the hour so fetch completes before parse/map fire. Override them via `concelier.jobs.definitions[...]` when coordinating multiple connectors on the same runner. | ||||
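|  | ||||
| A sketch of such an override is shown below. The exact shape of `concelier.jobs.definitions` is an assumption inferred from the defaults table above (the key names `cron`, `timeout`, and `lease` may differ in your build), so verify against the deployed configuration schema before applying: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   jobs: | ||||
|     definitions: | ||||
|       "source:ghsa:fetch":                    # job kind from the table above | ||||
|         cron: "7,17,27,37,47,57 * * * *"      # example: shift fetch by six minutes | ||||
|         timeout: "00:06:00" | ||||
|         lease: "00:04:00" | ||||
| ``` | ||||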
|  | ||||
| ## 5. Provisioning credentials | ||||
|  | ||||
| Concelier requires a GitHub personal access token (classic) with the **`read:org`** and **`security_events`** scopes to pull GHSA data. Store it as a secret and reference it via `concelier.sources.ghsa.apiToken`. | ||||
|  | ||||
| ### Docker Compose (stack operators) | ||||
| ```yaml | ||||
| services: | ||||
|   concelier: | ||||
|     environment: | ||||
|       CONCELIER__SOURCES__GHSA__APITOKEN: /run/secrets/ghsa_pat | ||||
|     secrets: | ||||
|       - ghsa_pat | ||||
|  | ||||
| secrets: | ||||
|   ghsa_pat: | ||||
|     file: ./secrets/ghsa_pat.txt  # contains only the PAT value | ||||
| ``` | ||||
|  | ||||
| ### Helm values (cluster operators) | ||||
| ```yaml | ||||
| concelier: | ||||
|   extraEnv: | ||||
|     - name: CONCELIER__SOURCES__GHSA__APITOKEN | ||||
|       valueFrom: | ||||
|         secretKeyRef: | ||||
|           name: concelier-ghsa | ||||
|           key: apiToken | ||||
|  | ||||
| extraSecrets: | ||||
|   concelier-ghsa: | ||||
|     apiToken: "<paste PAT here or source from external secret store>" | ||||
| ``` | ||||
|  | ||||
| After rotating the PAT, restart the Concelier workers (or run `kubectl rollout restart deployment/concelier`) to ensure the configuration reloads. | ||||
|  | ||||
| When enabling GHSA the first time, run a staged backfill: | ||||
|  | ||||
| 1. Trigger `source:ghsa:fetch` manually (CLI or API) outside of peak hours. | ||||
| 2. Watch `concelier.jobs.health` for the GHSA jobs until they report `healthy`. | ||||
| 3. Allow the scheduled cron cadence to resume once the initial backlog drains (typically < 30 minutes). | ||||
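|  | ||||
| For step 1, the trigger mirrors the pattern used by the other connectors; the host in the REST example is a placeholder: | ||||
|  | ||||
| ```bash | ||||
| # CLI (chained fetch -> parse -> map) | ||||
| stella db jobs run source:ghsa:fetch --and-then source:ghsa:parse --and-then source:ghsa:map | ||||
|  | ||||
| # REST equivalent | ||||
| curl -X POST https://concelier.example.internal/jobs/run \ | ||||
|   -H "Content-Type: application/json" \ | ||||
|   -d '{ "kind": "source:ghsa:fetch", "chain": ["source:ghsa:parse", "source:ghsa:map"] }' | ||||
| ``` | ||||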
|  | ||||
| ## 6. Runbook steps when throttled | ||||
| 1. Check `ghsa.ratelimit.exhausted` for the affected phase (`list` vs `detail`). | ||||
| 2. Confirm the connector is delaying—logs will show `GHSA rate limit exhausted...` with the chosen backoff. | ||||
| 3. If rate limits stay exhausted: | ||||
|    - Verify no other jobs are sharing the PAT. | ||||
|    - Temporarily reduce `MaxPagesPerFetch` or `PageSize` to shrink burst size. | ||||
|    - Consider provisioning a dedicated PAT (GHSA permissions only) for Concelier. | ||||
| 4. After the quota resets, reset `rateLimitWarningThreshold`/`requestDelay` to their normal values and monitor the histograms for at least one hour. | ||||
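|  | ||||
| To check whether the PAT itself is exhausted, independent of connector telemetry, query GitHub's rate-limit endpoint directly (the `jq` projection is illustrative): | ||||
|  | ||||
| ```bash | ||||
| # Uses the same PAT as the connector; prints remaining quota, limit, and reset epoch for the core resource. | ||||
| curl -s -H "Authorization: Bearer $GITHUB_PAT" \ | ||||
|      -H "Accept: application/vnd.github+json" \ | ||||
|      https://api.github.com/rate_limit \ | ||||
|   | jq '{remaining: .resources.core.remaining, limit: .resources.core.limit, reset: .resources.core.reset}' | ||||
| ``` | ||||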
|  | ||||
| ## 7. Alert integration quick reference | ||||
| - Prometheus: `ghsa_ratelimit_remaining_bucket` (from histogram) – use `histogram_quantile(0.99, ...)` to trend capacity. | ||||
| - VictoriaMetrics: `last_over_time(ghsa_ratelimit_remaining_sum[5m])` for simple last-value graphs. | ||||
| - Grafana: stack remaining + used to visualise total limit per resource. | ||||
|  | ||||
| ## 8. Canonical metric fallback analytics | ||||
| When GitHub omits CVSS vectors/scores, the connector now assigns a deterministic canonical metric id in the form `ghsa:severity/<level>` and publishes it to Merge so severity precedence still resolves against GHSA even without CVSS data. | ||||
|  | ||||
| - Metric: `ghsa.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `reason=no_cvss`. | ||||
| - Monitor the counter alongside Merge parity checks; a sudden spike suggests GitHub is shipping advisories without vectors and warrants cross-checking downstream exporters. | ||||
| - Because the canonical id feeds Merge, parity dashboards should overlay this metric to confirm fallback advisories continue to merge ahead of downstream sources when GHSA supplies more recent data. | ||||
							
								
								
									
docs/modules/concelier/operations/connectors/ics-cisa.md (new file, 122 lines)
							| @@ -0,0 +1,122 @@ | ||||
| # Concelier CISA ICS Connector Operations | ||||
|  | ||||
| This runbook documents how to provision, rotate, and validate credentials for the CISA Industrial Control Systems (ICS) connector (`source:ics-cisa:*`). Follow it before enabling the connector in staging or offline installations. | ||||
|  | ||||
| ## 1. Credential Provisioning | ||||
|  | ||||
| 1. **Create a service mailbox** reachable by the Ops crew (shared mailbox recommended).   | ||||
| 2. Browse to `https://public.govdelivery.com/accounts/USDHSCISA/subscriber/new` and subscribe the mailbox to the following GovDelivery topics: | ||||
|    - `USDHSCISA_16` — ICS-CERT advisories (legacy numbering: `ICSA-YY-###`). | ||||
|    - `USDHSCISA_19` — ICS medical advisories (`ICSMA-YY-###`). | ||||
|    - `USDHSCISA_17` — ICS alerts (`IR-ALERT-YY-###`) for completeness. | ||||
| 3. Complete the verification email. After confirmation, note the **personalised subscription code** included in the “Manage Preferences” link. It has the shape `code=AB12CD34EF`. | ||||
| 4. Store the code in the shared secret vault (or Offline Kit secrets bundle) as `concelier/sources/icscisa/govdelivery/code`. | ||||
|  | ||||
| > ℹ️  GovDelivery does not expose a one-time API key; the personalised code is what authenticates the RSS pull. Never commit it to git. | ||||
|  | ||||
| ## 2. Feed Validation | ||||
|  | ||||
| Use the following command to confirm the feed is reachable before wiring it into Concelier (substitute `<CODE>` with the personalised value): | ||||
|  | ||||
| ```bash | ||||
| curl -H "User-Agent: StellaOpsConcelier/ics-cisa" \ | ||||
|      "https://content.govdelivery.com/accounts/USDHSCISA/topics/ICS-CERT/feed.rss?format=xml&code=<CODE>" | ||||
| ``` | ||||
|  | ||||
| If the endpoint returns HTTP 200 and an RSS payload, record the sample response under `docs/artifacts/icscisa/` (see Task `FEEDCONN-ICSCISA-02-007`). HTTP 403 or 406 usually means the subscription was not confirmed or the code was mistyped. | ||||
|  | ||||
| ## 3. Configuration Snippet | ||||
|  | ||||
| Add the connector configuration to `concelier.yaml` (or equivalent environment variables): | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     icscisa: | ||||
|       govDelivery: | ||||
|         code: "${CONCELIER_ICS_CISA_GOVDELIVERY_CODE}" | ||||
|         topics: | ||||
|           - "USDHSCISA_16" | ||||
|           - "USDHSCISA_19" | ||||
|           - "USDHSCISA_17" | ||||
|       rssBaseUri: "https://content.govdelivery.com/accounts/USDHSCISA" | ||||
|       requestDelay: "00:00:01" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| Environment variable example: | ||||
|  | ||||
| ```bash | ||||
| export CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE="AB12CD34EF" | ||||
| ``` | ||||
|  | ||||
| Concelier automatically registers the host with the Source.Common HTTP allow-list when the connector assembly is loaded. | ||||
|  | ||||
|  | ||||
| Optional tuning keys (set only when needed): | ||||
|  | ||||
| - `proxyUri` — HTTP/HTTPS proxy URL used when Akamai blocks direct pulls. | ||||
| - `requestVersion` / `requestVersionPolicy` — override HTTP negotiation when the proxy requires HTTP/1.1. | ||||
| - `enableDetailScrape` — toggle HTML detail fallback (defaults to true). | ||||
| - `captureAttachments` — collect PDF attachments from detail pages (defaults to true). | ||||
| - `detailBaseUri` — alternate host for detail enrichment if CISA changes its layout. | ||||
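|  | ||||
| When these knobs are needed, they sit alongside the keys in the snippet above. The values below are placeholders (notably the proxy URL and alternate detail host), not recommended defaults: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     icscisa: | ||||
|       proxyUri: "http://egress-proxy.internal:3128"   # placeholder proxy | ||||
|       requestVersion: "1.1" | ||||
|       requestVersionPolicy: "RequestVersionOrLower"   # example policy value | ||||
|       enableDetailScrape: true | ||||
|       captureAttachments: true | ||||
|       detailBaseUri: "https://www.cisa.gov"           # example alternate host | ||||
| ``` | ||||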
|  | ||||
| ## 4. Seeding Without GovDelivery | ||||
|  | ||||
| If credentials are still pending, populate the connector with the community CSV dataset before enabling the live fetch: | ||||
|  | ||||
| 1. Run `./scripts/fetch-ics-cisa-seed.sh` (or `.ps1`) to download the latest `CISA_ICS_ADV_*.csv` files into `seed-data/ics-cisa/`. | ||||
| 2. Copy the CSVs (and the generated `.sha256` files) into your Offline Kit staging area so they ship alongside the other feeds. | ||||
| 3. Import the kit as usual. The connector can parse the seed data for historical context, but **live GovDelivery credentials are still required** for fresh advisories. | ||||
| 4. Once credentials arrive, update `concelier:sources:icscisa:govDelivery:code` and re-trigger `source:ics-cisa:fetch` so the connector switches to the authorised feed. | ||||
|  | ||||
| > The CSVs are licensed under ODbL 1.0 by the ICS Advisory Project. Preserve the attribution when redistributing them. | ||||
|  | ||||
| ## 5. Integration Validation | ||||
|  | ||||
| 1. Ensure secrets are in place and restart the Concelier workers. | ||||
| 2. Run a dry-run fetch/parse/map chain against an Akamai-protected topic: | ||||
|    ```bash | ||||
|    CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=... \ | ||||
|    CONCELIER_SOURCES_ICSCISA_ENABLEDETAILSCRAPE=1 \ | ||||
|    stella db jobs run source:ics-cisa:fetch --and-then source:ics-cisa:parse --and-then source:ics-cisa:map | ||||
|    ``` | ||||
| 3. Confirm logs contain `ics-cisa detail fetch` entries and that new documents/DTOs include attachments (see `docs/artifacts/icscisa`). Canonical advisories should expose PDF links as `references.kind == "attachment"` and affected packages should surface `primitives.semVer.exactValue` for single-version hits. | ||||
| 4. If Akamai blocks direct fetches, set `concelier:sources:icscisa:proxyUri` to your allow-listed egress proxy and rerun the dry-run. | ||||
|  | ||||
| ## 6. Rotation & Incident Response | ||||
|  | ||||
| - Review GovDelivery access quarterly. Rotate the personalised code whenever Ops changes the service mailbox password or membership.   | ||||
| - Revoking the subscription in GovDelivery invalidates the code immediately; update the vault and configuration in the same change.   | ||||
| - If the code leaks, remove the subscription (`https://public.govdelivery.com/accounts/USDHSCISA/subscriber/manage_preferences?code=<CODE>`), resubscribe, and distribute the new value via the vault. | ||||
|  | ||||
| ## 7. Offline Kit Handling | ||||
|  | ||||
| Include the personalised code in `offline-kit/secrets/concelier/icscisa.env`: | ||||
|  | ||||
| ``` | ||||
| CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=AB12CD34EF | ||||
| ``` | ||||
|  | ||||
| The Offline Kit deployment script copies this file into the container secret directory mounted at `/run/secrets/concelier`. Ensure permissions are `600` and ownership matches the Concelier runtime user. | ||||
|  | ||||
| ## 8. Telemetry & Monitoring | ||||
|  | ||||
| The connector emits metrics under the meter `StellaOps.Concelier.Connector.Ics.Cisa`. They allow operators to track Akamai fallbacks, detail enrichment health, and advisory fan-out. | ||||
|  | ||||
| - `icscisa.fetch.*` – counters for `attempts`, `success`, `failures`, `not_modified`, and `fallbacks`, plus histogram `icscisa.fetch.documents` showing documents added per topic pull (tags: `concelier.source`, `icscisa.topic`). | ||||
| - `icscisa.parse.*` – counters for `success`/`failures` and histograms `icscisa.parse.advisories`, `icscisa.parse.attachments`, `icscisa.parse.detail_fetches` to monitor enrichment workload per feed document. | ||||
| - `icscisa.detail.*` – counters `success` / `failures` per advisory (tagged with `icscisa.advisory`) to alert when Akamai blocks detail pages. | ||||
| - `icscisa.map.*` – counters for `success`/`failures` and histograms `icscisa.map.references`, `icscisa.map.packages`, `icscisa.map.aliases` capturing canonical fan-out. | ||||
|  | ||||
| Suggested alerts: | ||||
|  | ||||
| - `increase(icscisa.fetch.failures_total[15m]) > 0` or `increase(icscisa.fetch.fallbacks_total[15m]) > 5` — sustained Akamai or proxy issues. | ||||
| - `increase(icscisa.detail.failures_total[30m]) > 0` — detail enrichment breaking (potential HTML layout change). | ||||
| - `histogram_quantile(0.95, rate(icscisa.map.references_bucket[1h]))` trending sharply higher — sudden advisory reference explosion worth investigating. | ||||
| - Keep an eye on shared HTTP metrics (`concelier.source.http.*{concelier.source="ics-cisa"}`) for request latency and retry patterns. | ||||
|  | ||||
| ## 9. Related Tasks | ||||
|  | ||||
| - `FEEDCONN-ICSCISA-02-009` (GovDelivery credential onboarding) — completed once this runbook is followed and secrets are placed in the vault. | ||||
| - `FEEDCONN-ICSCISA-02-007` (document inventory) — archive the first successful RSS response and any attachment URL schema under `docs/artifacts/icscisa/`. | ||||
							
								
								
									
docs/modules/concelier/operations/connectors/kisa.md (new file, 74 lines)
							| @@ -0,0 +1,74 @@ | ||||
| # Concelier KISA Connector Operations | ||||
|  | ||||
| Operational guidance for the Korea Internet & Security Agency (KISA / KNVD) connector (`source:kisa:*`). Pair this with the engineering brief in `docs/dev/kisa_connector_notes.md`. | ||||
|  | ||||
| ## 1. Prerequisites | ||||
|  | ||||
| - Outbound HTTPS (or mirrored cache) for `https://knvd.krcert.or.kr/`. | ||||
| - Connector options defined under `concelier:sources:kisa`: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     kisa: | ||||
|       feedUri: "https://knvd.krcert.or.kr/rss/securityInfo.do" | ||||
|       detailApiUri: "https://knvd.krcert.or.kr/rssDetailData.do" | ||||
|       detailPageUri: "https://knvd.krcert.or.kr/detailDos.do" | ||||
|       maxAdvisoriesPerFetch: 10 | ||||
|       requestDelay: "00:00:01" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| > Ensure the URIs stay absolute—Concelier adds the `feedUri`/`detailApiUri` hosts to the HttpClient allow-list automatically. | ||||
|  | ||||
| ## 2. Staging Smoke Test | ||||
|  | ||||
| 1. Restart the Concelier workers so the KISA options bind. | ||||
| 2. Run a full connector cycle: | ||||
|    - CLI: `stella db jobs run source:kisa:fetch --and-then source:kisa:parse --and-then source:kisa:map` | ||||
|    - REST: `POST /jobs/run { "kind": "source:kisa:fetch", "chain": ["source:kisa:parse", "source:kisa:map"] }` | ||||
| 3. Confirm telemetry (Meter `StellaOps.Concelier.Connector.Kisa`): | ||||
|    - `kisa.feed.success`, `kisa.feed.items` | ||||
|    - `kisa.detail.success` / `.failures` | ||||
|    - `kisa.parse.success` / `.failures` | ||||
|    - `kisa.map.success` / `.failures` | ||||
|    - `kisa.cursor.updates` | ||||
| 4. Inspect logs for structured entries: | ||||
|    - `KISA feed returned {ItemCount}` | ||||
|    - `KISA fetched detail for {Idx} … category={Category}` | ||||
|    - `KISA mapped advisory {AdvisoryId} (severity={Severity})` | ||||
|    - Absence of warnings such as `document missing GridFS payload`. | ||||
| 5. Validate MongoDB state: | ||||
|    - `raw_documents.metadata` has `kisa.idx`, `kisa.category`, `kisa.title`. | ||||
|    - DTO store contains `schemaVersion="kisa.detail.v1"`. | ||||
|    - Advisories include aliases (`IDX`, CVE) and `language="ko"`. | ||||
|    - `source_states` entry for `kisa` shows recent `cursor.lastFetchAt`. | ||||
|  | ||||
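| A quick way to spot-check that state from a shell is shown below. The database name, connection string, and the `source` filter field are assumptions; verify them against your deployment before scripting: | ||||
|  | ||||
| ```bash | ||||
| # Assumes Concelier's MongoDB is reachable locally and the database is named "concelier". | ||||
| mongosh "mongodb://localhost:27017/concelier" --quiet --eval ' | ||||
|   printjson(db.source_states.findOne({ source: "kisa" }, { cursor: 1 })); | ||||
|   print("raw_documents with kisa.idx:", | ||||
|         db.raw_documents.countDocuments({ "metadata.kisa.idx": { $exists: true } })); | ||||
| ' | ||||
| ``` | ||||
|  | ||||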
| ## 3. Production Monitoring | ||||
|  | ||||
| - **Dashboards** – Add the following Prometheus/OTEL expressions: | ||||
|   - `rate(kisa_feed_items_total[15m])` versus `rate(concelier_source_http_requests_total{concelier_source="kisa"}[15m])` | ||||
|   - `increase(kisa_detail_failures_total{reason!="empty-document"}[1h])` alert at `>0` | ||||
|   - `increase(kisa_parse_failures_total[1h])` for storage/JSON issues | ||||
|   - `increase(kisa_map_failures_total[1h])` to flag schema drift | ||||
|   - `increase(kisa_cursor_updates_total[6h]) == 0` during active windows → warn | ||||
| - **Alerts** – Page when `rate(kisa_feed_success_total[2h]) == 0` while other connectors are active; back off for maintenance windows announced on `https://knvd.krcert.or.kr/`. | ||||
| - **Logs** – Watch for repeated warnings (`document missing`, `DTO missing`) or errors with reason tags `HttpRequestException`, `download`, `parse`, `map`. | ||||
|  | ||||
| ## 4. Localisation Handling | ||||
|  | ||||
| - Hangul categories (for example `취약점정보`) flow into telemetry tags (`category=…`) and logs. Dashboards must render UTF‑8 and avoid transliteration. | ||||
| - HTML content is sanitised before storage; translation teams can consume the `ContentHtml` field safely. | ||||
| - Advisory severity remains as provided by KISA (`High`, `Medium`, etc.). Map-level failures include the severity tag for filtering. | ||||
|  | ||||
| ## 5. Fixture & Regression Maintenance | ||||
|  | ||||
| - Regression fixtures: `src/Concelier/__Tests/StellaOps.Concelier.Connector.Kisa.Tests/Fixtures/kisa-feed.xml` and `kisa-detail.json`. | ||||
| - Refresh via `UPDATE_KISA_FIXTURES=1 dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Kisa.Tests/StellaOps.Concelier.Connector.Kisa.Tests.csproj`. | ||||
| - The telemetry regression (`KisaConnectorTests.Telemetry_RecordsMetrics`) will fail if counters/log wiring drifts—treat failures as gating. | ||||
|  | ||||
| ## 6. Known Issues | ||||
|  | ||||
| - RSS feeds only expose the latest 10 advisories; long outages require replay via archived feeds or manual IDX seeds. | ||||
| - Detail endpoint occasionally throttles; the connector honours `requestDelay` and reports failures with reason `HttpRequestException`. Consider increasing delay for weekend backfills. | ||||
| - If `kisa.category` tags suddenly appear as `unknown`, verify KISA has not renamed RSS elements; update the parser fixtures before production rollout. | ||||
							
								
								
									
docs/modules/concelier/operations/connectors/msrc.md (new file, 86 lines)
							| @@ -0,0 +1,86 @@ | ||||
| # Concelier MSRC Connector – Azure AD Onboarding Brief | ||||
|  | ||||
| _Drafted: 2025-10-15_ | ||||
|  | ||||
| ## 1. App registration requirements | ||||
|  | ||||
| - **Tenant**: shared StellaOps production Azure AD. | ||||
| - **Application type**: confidential client (web/API) issuing client credentials. | ||||
| - **API permissions**: `api://api.msrc.microsoft.com/.default` (Application). Admin consent required once. | ||||
| - **Token audience**: `https://api.msrc.microsoft.com/`. | ||||
| - **Grant type**: client credentials. Concelier will request tokens via `POST https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token`. | ||||
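|  | ||||
| Before wiring the credentials into Concelier, the registration above can be validated with a direct client-credentials token request (tenant, client id, and secret are placeholders): | ||||
|  | ||||
| ```bash | ||||
| curl -s -X POST "https://login.microsoftonline.com/<tenant-guid>/oauth2/v2.0/token" \ | ||||
|   -d "grant_type=client_credentials" \ | ||||
|   -d "client_id=<app-registration-client-id>" \ | ||||
|   -d "client_secret=<client-secret>" \ | ||||
|   -d "scope=api://api.msrc.microsoft.com/.default" \ | ||||
|   | jq '{token_type, expires_in, has_token: (.access_token != null)}' | ||||
| ``` | ||||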
|  | ||||
| ## 2. Secret/credential policy | ||||
|  | ||||
| - Maintain two client secrets (primary + standby) rotating every 90 days. | ||||
| - Store secrets in the Concelier secrets vault; Offline Kit deployments must mirror the secret payloads in their encrypted store. | ||||
| - Record rotation cadence in Ops runbook and update Concelier configuration (`CONCELIER__SOURCES__VNDR__MSRC__CLIENTSECRET`) ahead of expiry. | ||||
|  | ||||
| ## 3. Concelier configuration sample | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     vndr.msrc: | ||||
|       tenantId: "<azure-tenant-guid>" | ||||
|       clientId: "<app-registration-client-id>" | ||||
|       clientSecret: "<pull from secret store>" | ||||
|       apiVersion: "2024-08-01" | ||||
|       locale: "en-US" | ||||
|       requestDelay: "00:00:00.250" | ||||
|       failureBackoff: "00:05:00" | ||||
|       cursorOverlapMinutes: 10 | ||||
|       downloadCvrf: false  # set true to persist CVRF ZIP alongside JSON detail | ||||
| ``` | ||||
|  | ||||
| ## 4. CVRF artefacts | ||||
|  | ||||
| - The MSRC REST payload exposes `cvrfUrl` per advisory. The current connector persists the link as advisory metadata and a reference; it does **not** download the ZIP by default. | ||||
| - Ops should mirror CVRF ZIPs when preparing Offline Kits so air-gapped deployments can reconcile advisories without direct internet access. | ||||
| - Once Offline Kit storage guidelines are finalised, extend the connector configuration with `downloadCvrf: true` to enable automatic attachment retrieval. | ||||
|  | ||||
| ### 4.1 State seeding helper | ||||
|  | ||||
| Use `src/Tools/SourceStateSeeder` to queue historical advisories (detail JSON + optional CVRF artefacts) for replay without manual Mongo edits. Example seed file: | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "source": "vndr.msrc", | ||||
|   "cursor": { | ||||
|     "lastModifiedCursor": "2024-01-01T00:00:00Z" | ||||
|   }, | ||||
|   "documents": [ | ||||
|     { | ||||
|       "uri": "https://api.msrc.microsoft.com/sug/v2.0/vulnerability/ADV2024-0001", | ||||
|       "contentFile": "./seeds/adv2024-0001.json", | ||||
|       "contentType": "application/json", | ||||
|       "metadata": { "msrc.vulnerabilityId": "ADV2024-0001" }, | ||||
|       "addToPendingDocuments": true | ||||
|     }, | ||||
|     { | ||||
|       "uri": "https://download.microsoft.com/msrc/2024/ADV2024-0001.cvrf.zip", | ||||
|       "contentFile": "./seeds/adv2024-0001.cvrf.zip", | ||||
|       "contentType": "application/zip", | ||||
|       "status": "mapped", | ||||
|       "addToPendingDocuments": false | ||||
|     } | ||||
|   ] | ||||
| } | ||||
| ``` | ||||
|  | ||||
| Run the helper: | ||||
|  | ||||
| ```bash | ||||
| dotnet run --project src/Tools/SourceStateSeeder -- \ | ||||
|   --connection-string "mongodb://localhost:27017" \ | ||||
|   --database concelier \ | ||||
|   --input seeds/msrc-backfill.json | ||||
| ``` | ||||
|  | ||||
| Any documents marked `addToPendingDocuments` will appear in the connector cursor; `DownloadCvrf` can remain disabled if the ZIP artefact is pre-seeded. | ||||
|  | ||||
| ## 5. Outstanding items | ||||
|  | ||||
| - Ops to confirm tenant/app names and provide client credentials through the secure channel. | ||||
| - Connector team monitors token cache health (already implemented); validate instrumentation once Ops supplies credentials. | ||||
| - Offline Kit packaging: add encrypted blob containing client credentials with rotation instructions. | ||||
							
								
								
									
docs/modules/concelier/operations/connectors/nkcki.md (new file, 48 lines)
							| @@ -0,0 +1,48 @@ | ||||
| # NKCKI Connector Operations Guide | ||||
|  | ||||
| ## Overview | ||||
|  | ||||
| The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring. | ||||
|  | ||||
| ## Configuration | ||||
|  | ||||
| Key options exposed through `concelier:sources:ru-nkcki:http`: | ||||
|  | ||||
| - `maxBulletinsPerFetch` – limits new bulletin downloads in a single run (default `5`). | ||||
| - `maxListingPagesPerFetch` – maximum listing pages visited during pagination (default `3`). | ||||
| - `listingCacheDuration` – minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`). | ||||
| - `cacheDirectory` – optional path for persisted bulletin archives used during offline or failure scenarios. | ||||
| - `requestDelay` – delay inserted between bulletin downloads to respect upstream politeness. | ||||
|  | ||||
| When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/concelier/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit. | ||||
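|  | ||||
| Putting these options together, a configuration sketch for an offline-leaning deployment might look like the following (values are examples built from the defaults above, not tuned recommendations): | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     ru-nkcki: | ||||
|       http: | ||||
|         maxBulletinsPerFetch: 5 | ||||
|         maxListingPagesPerFetch: 3 | ||||
|         listingCacheDuration: "00:10:00" | ||||
|         cacheDirectory: "/var/lib/concelier/cache/ru-nkcki" | ||||
|         requestDelay: "00:00:01" | ||||
| ``` | ||||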
|  | ||||
| ## Telemetry | ||||
|  | ||||
| `RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Concelier.Connector.Ru.Nkcki`: | ||||
|  | ||||
| - `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures` | ||||
| - `nkcki.listing.pages.visited` (histogram, `pages`) | ||||
| - `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new` | ||||
| - `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures` | ||||
| - `nkcki.entries.processed` (histogram, `entries`) | ||||
|  | ||||
| Integrate these counters into standard Concelier observability dashboards to track crawl coverage and cache hit rates. | ||||
|  | ||||
| ## Archive Backfill Strategy | ||||
|  | ||||
| Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy: | ||||
|  | ||||
| 1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms. | ||||
| 2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked. | ||||
| 3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`. | ||||
| 4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/concelier-connector-research-20251011.md`). | ||||
|  | ||||
| For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs. | ||||
|  | ||||
| ## Failure Handling | ||||
|  | ||||
| - Listing failures mark the source state with exponential backoff while attempting cache replay. | ||||
| - Bulletin fetches fall back to cached copies before surfacing an error. | ||||
| - Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`src/Tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros. | ||||
|  | ||||
| Refer to `ru-nkcki` entries in `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md` for outstanding items. | ||||
							
								
								
									
docs/modules/concelier/operations/connectors/osv.md (new file, 24 lines)
							| @@ -0,0 +1,24 @@ | ||||
| # Concelier OSV Connector – Operations Notes | ||||
|  | ||||
| _Last updated: 2025-10-16_ | ||||
|  | ||||
| The OSV connector ingests advisories from OSV.dev across OSS ecosystems. This note highlights the additional merge/export expectations introduced with the canonical metric fallback work in Sprint 4. | ||||
|  | ||||
| ## 1. Canonical metric fallbacks | ||||
| - When OSV omits CVSS vectors (common for CVSS v4-only payloads) the mapper now emits a deterministic canonical metric id in the form `osv:severity/<level>` and normalises the advisory severity to the same `<level>`. | ||||
| - Metric: `osv.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `ecosystem`, `reason=no_cvss`. Watch this alongside merge parity dashboards to catch spikes where OSV publishes severity-only advisories. | ||||
| - Merge precedence still prefers GHSA over OSV; the shared severity-based canonical id keeps Merge/export parity deterministic even when only OSV supplies severity data. | ||||
|  | ||||
| ## 2. CWE provenance | ||||
| - `database_specific.cwe_ids` now populates provenance decision reasons for every mapped weakness. Expect `decisionReason="database_specific.cwe_ids"` on OSV weakness provenance and confirm exporters preserve the value. | ||||
| - If OSV ever attaches `database_specific.cwe_notes`, the connector will surface the joined note string in `decisionReason` instead of the default marker. | ||||
|  | ||||
| ## 3. Dashboards & alerts | ||||
| - Extend existing merge dashboards with the new counter: | ||||
|   - Overlay `sum(osv.map.canonical_metric_fallbacks{ecosystem=~".+"})` with Merge severity overrides to confirm fallback advisories are reconciling cleanly. | ||||
|   - Alert when the 1-hour sum exceeds 50 for any ecosystem; baseline volume is currently <5 per day (mostly GHSA mirrors emitting CVSS v4 only). | ||||
| - Exporters already surface `canonicalMetricId`; no schema change is required, but ORAS/Trivy bundles should be spot-checked after deploying the connector update. | ||||
|  | ||||
| ## 4. Runbook updates | ||||
| - Fixture parity suites (`osv-ghsa.*`) now assert the fallback id and provenance notes. Regenerate via `dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj`. | ||||
| - When investigating merge severity conflicts, include the fallback counter and confirm OSV advisories carry the expected `osv:severity/<level>` id before raising connector bugs. | ||||
							
								
								
									
docs/modules/concelier/operations/mirror.md (new file, 238 lines)
							| @@ -0,0 +1,238 @@ | ||||
| # Concelier & Excititor Mirror Operations | ||||
|  | ||||
| This runbook describes how Stella Ops operates the managed mirrors under `*.stella-ops.org`. | ||||
| It covers Docker Compose and Helm deployment overlays, secret handling for multi-tenant | ||||
| authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles current. | ||||
|  | ||||
| ## 1. Prerequisites | ||||
|  | ||||
| - **Authority access** – client credentials (`client_id` + secret) authorised for | ||||
|   `concelier.mirror.read` and `excititor.mirror.read` scopes. Secrets live outside git. | ||||
| - **Signed TLS certificates** – wildcard or per-domain (`mirror-primary`, `mirror-community`). | ||||
|   Store them under `deploy/compose/mirror-gateway/tls/` or in Kubernetes secrets. | ||||
| - **Mirror gateway credentials** – Basic Auth htpasswd files per domain. Generate with | ||||
|   `htpasswd -B`. Operators distribute credentials to downstream consumers. | ||||
| - **Export artifact source** – read access to the canonical S3 buckets (or rsync share) | ||||
|   that hold `concelier` JSON bundles and `excititor` VEX exports. | ||||
| - **Persistent volumes** – storage for Concelier job metadata and mirror export trees. | ||||
|   For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`, | ||||
|   `excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout. | ||||
|  | ||||
| ### 1.1 Service configuration quick reference | ||||
|  | ||||
| Concelier.WebService exposes the mirror HTTP endpoints once `CONCELIER__MIRROR__ENABLED=true`. | ||||
| Key knobs: | ||||
|  | ||||
| - `CONCELIER__MIRROR__EXPORTROOT` – root folder containing export snapshots (`<exportId>/mirror/*`). | ||||
| - `CONCELIER__MIRROR__ACTIVEEXPORTID` – optional explicit export id; otherwise the service auto-falls back to the `latest/` symlink or newest directory. | ||||
| - `CONCELIER__MIRROR__REQUIREAUTHENTICATION` – default auth requirement; override per domain with `CONCELIER__MIRROR__DOMAINS__{n}__REQUIREAUTHENTICATION`. | ||||
| - `CONCELIER__MIRROR__MAXINDEXREQUESTSPERHOUR` – budget for `/concelier/exports/index.json`. Domains inherit this value unless they define `__MAXDOWNLOADREQUESTSPERHOUR`. | ||||
| - `CONCELIER__MIRROR__DOMAINS__{n}__ID` – domain identifier matching the exporter manifest; additional keys configure display name and rate budgets. | ||||
|  | ||||
| > The service honours Stella Ops Authority when `CONCELIER__AUTHORITY__ENABLED=true` and `ALLOWANONYMOUSFALLBACK=false`. Use the bypass CIDR list (`CONCELIER__AUTHORITY__BYPASSNETWORKS__*`) for in-cluster ingress gateways that terminate Basic Auth. Unauthorized requests emit `WWW-Authenticate: Bearer` so downstream automation can detect token failures. | ||||
|  | ||||
| Mirror responses carry deterministic cache headers: `/index.json` returns `Cache-Control: public, max-age=60`, while per-domain manifests/bundles include `Cache-Control: public, max-age=300, immutable`. Rate limiting surfaces `Retry-After` when quotas are exceeded. | ||||
|  | ||||
| ### 1.2 Mirror connector configuration | ||||
|  | ||||
| Downstream Concelier instances ingest published bundles using the `StellaOpsMirrorConnector`. Operators running the connector in air‑gapped or limited connectivity environments can tune the following options (environment prefix `CONCELIER__SOURCES__STELLAOPSMIRROR__`): | ||||
|  | ||||
| - `BASEADDRESS` – absolute mirror root (e.g., `https://mirror-primary.stella-ops.org`). | ||||
| - `INDEXPATH` – relative path to the mirror index (`/concelier/exports/index.json` by default). | ||||
| - `DOMAINID` – mirror domain identifier from the index (`primary`, `community`, etc.). | ||||
| - `HTTPTIMEOUT` – request timeout; raise when mirrors sit behind slow WAN links. | ||||
| - `SIGNATURE__ENABLED` – require detached JWS verification for `bundle.json`. | ||||
| - `SIGNATURE__KEYID` / `SIGNATURE__PROVIDER` – expected signing key metadata. | ||||
| - `SIGNATURE__PUBLICKEYPATH` – PEM fallback used when the mirror key registry is offline. | ||||
|  | ||||
| The connector keeps a per-export fingerprint (bundle digest + generated-at timestamp) and tracks outstanding document IDs. If a scan is interrupted, the next run resumes parse/map work using the stored fingerprint and pending document lists—no network requests are reissued unless the upstream digest changes. | ||||
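|  | ||||
| In environment-variable form, a minimal connector configuration might look like this (the mirror host, domain, key id, and key path are placeholders following the option names above): | ||||
|  | ||||
| ```bash | ||||
| export CONCELIER__SOURCES__STELLAOPSMIRROR__BASEADDRESS="https://mirror-primary.stella-ops.org" | ||||
| export CONCELIER__SOURCES__STELLAOPSMIRROR__INDEXPATH="/concelier/exports/index.json" | ||||
| export CONCELIER__SOURCES__STELLAOPSMIRROR__DOMAINID="primary" | ||||
| export CONCELIER__SOURCES__STELLAOPSMIRROR__HTTPTIMEOUT="00:02:00" | ||||
| export CONCELIER__SOURCES__STELLAOPSMIRROR__SIGNATURE__ENABLED="true" | ||||
| export CONCELIER__SOURCES__STELLAOPSMIRROR__SIGNATURE__KEYID="mirror-signing-key"        # placeholder key id | ||||
| export CONCELIER__SOURCES__STELLAOPSMIRROR__SIGNATURE__PUBLICKEYPATH="/run/secrets/mirror-signing.pub" | ||||
| ``` | ||||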
|  | ||||
| ## 2. Secret & certificate layout | ||||
|  | ||||
| ### Docker Compose (`deploy/compose/docker-compose.mirror.yaml`) | ||||
|  | ||||
| - `deploy/compose/env/mirror.env.example` – copy to `.env` and adjust quotas or domain IDs. | ||||
| - `deploy/compose/mirror-secrets/` – mount read-only into `/run/secrets`. Place: | ||||
|   - `concelier-authority-client` – Authority client secret. | ||||
|   - `excititor-authority-client` (optional) – reserve for future authn. | ||||
| - `deploy/compose/mirror-gateway/tls/` – PEM-encoded cert/key pairs: | ||||
|   - `mirror-primary.crt`, `mirror-primary.key` | ||||
|   - `mirror-community.crt`, `mirror-community.key` | ||||
| - `deploy/compose/mirror-gateway/secrets/` – htpasswd files: | ||||
|   - `mirror-primary.htpasswd` | ||||
|   - `mirror-community.htpasswd` | ||||
|  | ||||
| ### Helm (`deploy/helm/stellaops/values-mirror.yaml`) | ||||
|  | ||||
| Create secrets in the target namespace: | ||||
|  | ||||
| ```bash | ||||
| kubectl create secret generic concelier-mirror-auth \ | ||||
|   --from-file=concelier-authority-client=concelier-authority-client | ||||
|  | ||||
| kubectl create secret generic excititor-mirror-auth \ | ||||
|   --from-file=excititor-authority-client=excititor-authority-client | ||||
|  | ||||
| kubectl create secret tls mirror-gateway-tls \ | ||||
|   --cert=mirror-primary.crt --key=mirror-primary.key | ||||
|  | ||||
| kubectl create secret generic mirror-gateway-htpasswd \ | ||||
|   --from-file=mirror-primary.htpasswd --from-file=mirror-community.htpasswd | ||||
| ``` | ||||
|  | ||||
| > Keep Basic Auth lists short-lived (rotate quarterly) and document credential recipients. | ||||
|  | ||||
| ## 3. Deployment | ||||
|  | ||||
| ### 3.1 Docker Compose (edge mirrors, lab validation) | ||||
|  | ||||
| 1. `cp deploy/compose/env/mirror.env.example deploy/compose/env/mirror.env` | ||||
| 2. Populate secrets/tls directories as described above. | ||||
| 3. Sync mirror bundles (see §4) into `deploy/compose/mirror-data/…` and ensure they are mounted | ||||
|    on the host path backing the `concelier-exports` and `excititor-exports` volumes. | ||||
| 4. Run the profile validator: `deploy/tools/validate-profiles.sh`. | ||||
| 5. Launch: `docker compose --env-file env/mirror.env -f docker-compose.mirror.yaml up -d`. | ||||
|  | ||||
| ### 3.2 Helm (production mirrors) | ||||
|  | ||||
| 1. Provision PVCs sized for mirror bundles (baseline: 20 GiB per domain). | ||||
| 2. Create secrets/tls config maps (§2). | ||||
| 3. `helm upgrade --install mirror deploy/helm/stellaops -f deploy/helm/stellaops/values-mirror.yaml`. | ||||
| 4. Annotate the `stellaops-mirror-gateway` service with ingress/LoadBalancer metadata required by | ||||
|    your CDN (e.g., AWS load balancer scheme internal + NLB idle timeout). | ||||
|  | ||||
| ## 4. Artifact sync workflow | ||||
|  | ||||
| Mirrors never generate exports—they ingest signed bundles produced by the Concelier and Excititor | ||||
| export jobs. Recommended sync pattern: | ||||
|  | ||||
| ### 4.1 Compose host (systemd timer) | ||||
|  | ||||
| `/usr/local/bin/mirror-sync.sh`: | ||||
|  | ||||
| ```bash | ||||
| #!/usr/bin/env bash | ||||
| set -euo pipefail | ||||
| export AWS_ACCESS_KEY_ID=… | ||||
| export AWS_SECRET_ACCESS_KEY=… | ||||
|  | ||||
| aws s3 sync s3://mirror-stellaops/concelier/latest \ | ||||
|   /opt/stellaops/mirror-data/concelier --delete --size-only | ||||
|  | ||||
| aws s3 sync s3://mirror-stellaops/excititor/latest \ | ||||
|   /opt/stellaops/mirror-data/excititor --delete --size-only | ||||
| ``` | ||||
|  | ||||
| Schedule with a systemd timer every 5 minutes. The Compose volumes mount `/opt/stellaops/mirror-data/*` | ||||
| into the containers read-only, matching `CONCELIER__MIRROR__EXPORTROOT=/exports/json` and | ||||
| `EXCITITOR__ARTIFACTS__FILESYSTEM__ROOT=/exports`. | ||||
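|  | ||||
| A minimal pair of units for that timer (paths and unit names are placeholders; adjust the schedule to your sync SLOs): | ||||
|  | ||||
| ```ini | ||||
| # /etc/systemd/system/mirror-sync.service | ||||
| [Unit] | ||||
| Description=Sync Concelier/Excititor mirror bundles | ||||
|  | ||||
| [Service] | ||||
| Type=oneshot | ||||
| ExecStart=/usr/local/bin/mirror-sync.sh | ||||
| ``` | ||||
|  | ||||
| ```ini | ||||
| # /etc/systemd/system/mirror-sync.timer | ||||
| [Unit] | ||||
| Description=Run mirror-sync every 5 minutes | ||||
|  | ||||
| [Timer] | ||||
| OnCalendar=*:0/5 | ||||
| Persistent=true | ||||
|  | ||||
| [Install] | ||||
| WantedBy=timers.target | ||||
| ``` | ||||
|  | ||||
| Enable with `systemctl enable --now mirror-sync.timer`; the timer then re-runs the sync script every five minutes and catches up after host reboots thanks to `Persistent=true`. | ||||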
|  | ||||
| ### 4.2 Kubernetes (CronJob) | ||||
|  | ||||
| Create a CronJob running the AWS CLI (or rclone) in the same namespace, writing into the PVCs: | ||||
|  | ||||
| ```yaml | ||||
| apiVersion: batch/v1 | ||||
| kind: CronJob | ||||
| metadata: | ||||
|   name: mirror-sync | ||||
| spec: | ||||
|   schedule: "*/5 * * * *" | ||||
|   jobTemplate: | ||||
|     spec: | ||||
|       template: | ||||
|         spec: | ||||
|           containers: | ||||
|           - name: sync | ||||
|             image: public.ecr.aws/aws-cli/aws-cli@sha256:5df5f52c29f5e3ba46d0ad9e0e3afc98701c4a0f879400b4c5f80d943b5fadea | ||||
|             command: | ||||
|               - /bin/sh | ||||
|               - -c | ||||
|               - > | ||||
|                 aws s3 sync s3://mirror-stellaops/concelier/latest /exports/concelier --delete --size-only && | ||||
|                 aws s3 sync s3://mirror-stellaops/excititor/latest /exports/excititor --delete --size-only | ||||
|             volumeMounts: | ||||
|               - name: concelier-exports | ||||
|                 mountPath: /exports/concelier | ||||
|               - name: excititor-exports | ||||
|                 mountPath: /exports/excititor | ||||
|             envFrom: | ||||
|               - secretRef: | ||||
|                   name: mirror-sync-aws | ||||
|           restartPolicy: OnFailure | ||||
|           volumes: | ||||
|             - name: concelier-exports | ||||
|               persistentVolumeClaim: | ||||
|                 claimName: concelier-mirror-exports | ||||
|             - name: excititor-exports | ||||
|               persistentVolumeClaim: | ||||
|                 claimName: excititor-mirror-exports | ||||
| ``` | ||||
|  | ||||
| ## 5. CDN integration | ||||
|  | ||||
| 1. Point the CDN origin at the mirror gateway (Compose host or Kubernetes LoadBalancer). | ||||
| 2. Honour the response headers emitted by the gateway and Concelier/Excititor: | ||||
|    `Cache-Control: public, max-age=300, immutable` for mirror payloads. | ||||
| 3. Configure origin shields in the CDN to prevent cache stampedes. Recommended TTLs: | ||||
|    - Index (`/concelier/exports/index.json`, `/excititor/mirror/*/index`) → 60 s. | ||||
|    - Bundle/manifest payloads → 300 s. | ||||
| 4. Forward the `Authorization` header—Basic Auth terminates at the gateway. | ||||
| 5. Enforce per-domain rate limits at the CDN (matching gateway budgets) and enable logging | ||||
|    to SIEM for anomaly detection. | ||||
|  | ||||
| ## 6. Smoke tests | ||||
|  | ||||
| After each deployment or sync cycle (temporarily set low budgets if you need to observe 429 responses): | ||||
|  | ||||
| ```bash | ||||
| # Index with Basic Auth | ||||
| curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/index.json | jq 'keys' | ||||
|  | ||||
| # Mirror manifest signature and cache headers | ||||
| curl -u $PRIMARY_CREDS -I https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/manifest.json \ | ||||
|   | tee /tmp/manifest-headers.txt | ||||
| grep -E '^Cache-Control: ' /tmp/manifest-headers.txt   # expect public, max-age=300, immutable | ||||
|  | ||||
| # Excititor consensus bundle metadata | ||||
| curl -u $COMMUNITY_CREDS https://mirror-community.stella-ops.org/excititor/mirror/community/index \ | ||||
|   | jq '.exports[].exportKey' | ||||
|  | ||||
| # Signed bundle + detached JWS (spot check digests) | ||||
| curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json.jws \ | ||||
|   -o bundle.json.jws | ||||
| cosign verify-blob --signature bundle.json.jws --key mirror-key.pub bundle.json | ||||
|  | ||||
| # Service-level auth check (inside cluster – no gateway credentials) | ||||
| kubectl exec deploy/stellaops-concelier -- curl -si http://localhost:8443/concelier/exports/mirror/primary/manifest.json \ | ||||
|   | head -n 5   # expect HTTP/1.1 401 with WWW-Authenticate: Bearer | ||||
|  | ||||
| # Rate limit smoke (repeat quickly; second call should return 429 + Retry-After) | ||||
| for i in 1 2; do | ||||
|   curl -s -o /dev/null -D - https://mirror-primary.stella-ops.org/concelier/exports/index.json \ | ||||
|     -u $PRIMARY_CREDS | grep -E '^(HTTP/|Retry-After:)' | ||||
|   sleep 1 | ||||
| done | ||||
| ``` | ||||
|  | ||||
| Watch the gateway metrics (`nginx_vts` or access logs) for cache hits. In Kubernetes, `kubectl logs deploy/stellaops-mirror-gateway` | ||||
| should show `X-Cache-Status: HIT/MISS`. | ||||
|  | ||||
| ## 7. Maintenance & rotation | ||||
|  | ||||
| - **Bundle freshness** – alert if sync job lag exceeds 15 minutes or if `concelier` logs | ||||
|   `Mirror export root is not configured`. | ||||
| - **Secret rotation** – change Authority client secrets and Basic Auth credentials quarterly. | ||||
|   Update the mounted secrets and restart deployments (`docker compose restart concelier` or | ||||
|   `kubectl rollout restart deploy/stellaops-concelier`). | ||||
| - **TLS renewal** – reissue certificates, place new files, and reload gateway (`docker compose exec mirror-gateway nginx -s reload`). | ||||
| - **Quota tuning** – adjust per-domain `MAXDOWNLOADREQUESTSPERHOUR` in `.env` or values file. | ||||
|   Align CDN rate limits and inform downstreams. | ||||
|  | ||||
| ## 8. References | ||||
|  | ||||
| - Deployment profiles: `deploy/compose/docker-compose.mirror.yaml`, | ||||
|   `deploy/helm/stellaops/values-mirror.yaml` | ||||
| - Mirror architecture dossiers: `docs/modules/concelier/architecture.md`, | ||||
|   `docs/modules/excititor/mirrors.md` | ||||
| - Export bundling: `docs/modules/devops/architecture.md` §3, `docs/modules/excititor/architecture.md` §7 | ||||