feat: Implement Scheduler Worker Options and Planner Loop
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled

- Added `SchedulerWorkerOptions` class to encapsulate configuration for the scheduler worker.
- Introduced `PlannerBackgroundService` to manage the planner loop, fetching and processing planning runs.
- Created `PlannerExecutionService` to handle the execution logic for planning runs, including impact targeting and run persistence.
- Developed `PlannerExecutionResult` and `PlannerExecutionStatus` to standardize execution outcomes.
- Implemented validation logic within `SchedulerWorkerOptions` to ensure proper configuration.
- Added documentation for the planner loop and impact targeting features.
- Established health check endpoints and authentication mechanisms for the Signals service.
- Created unit tests for the Signals API to ensure proper functionality and response handling.
- Configured options for authority integration and fallback authentication methods.

		| @@ -1,12 +1,12 @@ | ||||
| # component_architecture_concelier.md — **Stella Ops Concelier** (2025Q4) | ||||
| # component_architecture_concelier.md — **Stella Ops Concelier** (Sprint 22) | ||||
|  | ||||
| > **Scope.** Implementation‑ready architecture for **Concelier**: the vulnerability ingest/normalize/merge/export subsystem that produces deterministic advisory data for the Scanner + Policy + Excititor pipeline. Covers domain model, connectors, merge rules, storage schema, exports, APIs, performance, security, and test matrices. | ||||
| > **Scope.** Implementation-ready architecture for **Concelier**: the advisory ingestion and Link-Not-Merge (LNM) observation pipeline that produces deterministic raw observations, correlation linksets, and evidence events consumed by Policy Engine, Console, CLI, and Export centers. Covers domain models, connectors, observation/linkset builders, storage schema, events, APIs, performance, security, and test matrices. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Mission & boundaries | ||||
|  | ||||
| **Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), normalize them into a **canonical model**, reconcile aliases and version ranges, and export **deterministic artifacts** (JSON, Trivy DB) for fast backend joins. | ||||
| **Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), persist them as immutable **observations** under the Aggregation-Only Contract (AOC), construct **linksets** that correlate observations without merging or precedence, and export deterministic evidence bundles (JSON, Trivy DB, Offline Kit) for downstream policy evaluation and operator tooling. | ||||
|  | ||||
| **Boundaries.** | ||||
|  | ||||
| @@ -21,10 +21,12 @@ | ||||
| **Process shape:** single ASP.NET Core service `StellaOps.Concelier.WebService` hosting: | ||||
|  | ||||
| * **Scheduler** with distributed locks (Mongo backed). | ||||
| * **Connectors** (fetch/parse/map). | ||||
| * **Merger** (canonical record assembly + precedence). | ||||
| * **Exporters** (JSON, Trivy DB). | ||||
| * **Minimal REST** for health/status/trigger/export. | ||||
| * **Connectors** (fetch/parse/map) that emit immutable observation candidates. | ||||
| * **Observation writer** enforcing AOC invariants via `AOCWriteGuard`. | ||||
| * **Linkset builder** that correlates observations into `advisory_linksets` and annotates conflicts. | ||||
| * **Event publisher** emitting `advisory.observation.updated` and `advisory.linkset.updated` messages. | ||||
| * **Exporters** (JSON, Trivy DB, Offline Kit slices) fed from observation/linkset stores. | ||||
| * **Minimal REST** for health/status/trigger/export and observation/linkset reads. | ||||
|  | ||||
| **Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter. | ||||
|  | ||||
| @@ -36,113 +38,96 @@ | ||||
|  | ||||
| ### 2.1 Core entities | ||||
|  | ||||
| **Advisory** | ||||
| #### AdvisoryObservation | ||||
|  | ||||
| ``` | ||||
| advisoryId          // internal GUID | ||||
| advisoryKey         // stable string key (e.g., CVE-2025-12345 or vendor ID) | ||||
| title               // short title (best-of from sources) | ||||
| summary             // normalized summary (English; i18n optional) | ||||
| published           // earliest source timestamp | ||||
| modified            // latest source timestamp | ||||
| severity            // normalized {none, low, medium, high, critical} | ||||
| cvss                // {v2?, v3?, v4?} objects (vector, baseScore, severity, source) | ||||
| exploitKnown        // bool (e.g., KEV/active exploitation flags) | ||||
| references[]        // typed links (advisory, kb, patch, vendor, exploit, blog) | ||||
| sources[]           // provenance for traceability (doc digests, URIs) | ||||
| ``` | ||||
|  | ||||
| **Alias** | ||||
|  | ||||
| ``` | ||||
| advisoryId | ||||
| scheme              // CVE, GHSA, RHSA, DSA, USN, MSRC, etc. | ||||
| value               // e.g., "CVE-2025-12345" | ||||
| ``` | ||||
|  | ||||
| **Affected** | ||||
|  | ||||
| ``` | ||||
| advisoryId | ||||
| productKey          // canonical product identity (see 2.2) | ||||
| rangeKind           // semver | evr | nvra | apk | rpm | deb | generic | exact | ||||
| introduced?         // string (format depends on rangeKind) | ||||
| fixed?              // string (format depends on rangeKind) | ||||
| lastKnownSafe?      // optional explicit safe floor | ||||
| arch?               // arch or platform qualifier if source declares (x86_64, aarch64) | ||||
| distro?             // distro qualifier when applicable (rhel:9, debian:12, alpine:3.19) | ||||
| ecosystem?          // npm|pypi|maven|nuget|golang|… | ||||
| notes?              // normalized notes per source | ||||
| ``` | ||||
|  | ||||
| **Reference** | ||||
|  | ||||
| ``` | ||||
| advisoryId | ||||
| url | ||||
| kind                // advisory | patch | kb | exploit | mitigation | blog | cvrf | csaf | ||||
| sourceTag           // e.g., vendor/redhat, distro/debian, oss/ghsa | ||||
| ``` | ||||
|  | ||||
| **MergeEvent** | ||||
|  | ||||
| ``` | ||||
| advisoryKey | ||||
| beforeHash          // canonical JSON hash before merge | ||||
| afterHash           // canonical JSON hash after merge | ||||
| mergedAt | ||||
| inputs[]            // source doc digests that contributed | ||||
| ``` | ||||
|  | ||||
| **AdvisoryStatement (event log)** | ||||
|  | ||||
| ``` | ||||
| statementId         // GUID (immutable) | ||||
| vulnerabilityKey    // canonical advisory key (e.g., CVE-2025-12345) | ||||
| advisoryKey         // merge snapshot advisory key (may reference variant) | ||||
| statementHash       // canonical hash of advisory payload | ||||
| asOf                // timestamp of snapshot (UTC) | ||||
| recordedAt          // persistence timestamp (UTC) | ||||
| inputDocuments[]    // document IDs contributing to the snapshot | ||||
| payload             // canonical advisory document (BSON / canonical JSON) | ||||
| ``` | ||||
|  | ||||
| **AdvisoryConflict** | ||||
|  | ||||
| ``` | ||||
| conflictId          // GUID | ||||
| vulnerabilityKey    // canonical advisory key | ||||
| conflictHash        // deterministic hash of conflict payload | ||||
| asOf                // timestamp aligned with originating statement set | ||||
| recordedAt          // persistence timestamp | ||||
| statementIds[]      // related advisoryStatement identifiers | ||||
| details             // structured conflict explanation / merge reasoning | ||||
| ``` | ||||
|  | ||||
| - `AdvisoryEventLog` (Concelier.Core) provides the public API for appending immutable statements/conflicts and querying replay history. Inputs are normalized by trimming and lower-casing `vulnerabilityKey`, serializing advisories with `CanonicalJsonSerializer`, and computing SHA-256 hashes (`statementHash`, `conflictHash`) over the canonical JSON payloads. Consumers can replay by key with an optional `asOf` filter to obtain deterministic snapshots ordered by `asOf` then `recordedAt`. | ||||
| - Conflict explainers are serialized as deterministic `MergeConflictExplainerPayload` records (type, reason, source ranks, winning values); replay clients can parse the payload to render human-readable rationales without re-computing precedence. | ||||
| - Concelier.WebService exposes the immutable log via `GET /concelier/advisories/{vulnerabilityKey}/replay[?asOf=UTC_ISO8601]`, returning the latest statements (with hex-encoded hashes) and any conflict explanations for downstream exporters and APIs. | ||||
|  | ||||
| **AdvisoryObservation (new in Sprint 24)** | ||||
|  | ||||
| ``` | ||||
| observationId       // deterministic id: {tenant}:{source}:{upstreamId}:{revision} | ||||
| ```jsonc | ||||
| observationId       // deterministic id: {tenant}:{source.vendor}:{upstreamId}:{revision} | ||||
| tenant              // issuing tenant (lower-case) | ||||
| source{vendor,stream,api,collectorVersion} | ||||
| source{ | ||||
|     vendor, stream, api, collectorVersion | ||||
| } | ||||
| upstream{ | ||||
|     upstreamId, documentVersion, contentHash, | ||||
|     fetchedAt, receivedAt, signature{present,format,keyId,signature}} | ||||
| content{format,specVersion,raw,metadata} | ||||
| linkset{aliases[], purls[], cpes[], references[{type,url}]} | ||||
|     upstreamId, documentVersion, fetchedAt, receivedAt, | ||||
|     contentHash, signature{present, format?, keyId?, signature?} | ||||
| } | ||||
| content{ | ||||
|     format, specVersion, raw, metadata? | ||||
| } | ||||
| identifiers{ | ||||
|     cve?, ghsa?, vendorIds[], aliases[] | ||||
| } | ||||
| linkset{ | ||||
|     purls[], cpes[], aliases[], references[{type,url}], | ||||
|     reconciledFrom[] | ||||
| } | ||||
| createdAt           // when Concelier recorded the observation | ||||
| attributes          // optional provenance metadata (e.g., batch, connector) | ||||
| ``` | ||||
| attributes          // optional provenance metadata (batch ids, ingest cursor) | ||||
| ```jsonc | ||||
|  | ||||
| The observation is an immutable projection of the raw ingestion document (post provenance validation, pre-merge) that powers Link‑Not‑Merge overlays and Vuln Explorer. Observations live in the `advisory_observations` collection, keyed by tenant + upstream identity. `linkset` provides normalized aliases/PURLs/CPES that downstream services (Graph/Vuln Explorer) join against without triggering merge logic. Concelier.Core exposes strongly-typed models (`AdvisoryObservation`, `AdvisoryObservationLinkset`, etc.) and a Mongo-backed store for filtered queries by tenant/alias; this keeps overlay consumers read-only while preserving AOC guarantees. | ||||
| #### AdvisoryLinkset | ||||
|  | ||||
| **ExportState** | ||||
| ```jsonc | ||||
| linksetId           // sha256 over sorted (tenant, product/vuln tuple, observation ids) | ||||
| tenant | ||||
| key{ | ||||
|     vulnerabilityId, | ||||
|     productKey, | ||||
|     confidence        // low|medium|high | ||||
| } | ||||
| observations[] = [ | ||||
|   { | ||||
|     observationId, | ||||
|     sourceVendor, | ||||
|     statement{ | ||||
|       status?, severity?, references?, notes? | ||||
|     }, | ||||
|     collectedAt | ||||
|   } | ||||
| ] | ||||
| aliases{ | ||||
|     primary, | ||||
|     others[] | ||||
| } | ||||
| purls[] | ||||
| cpes[] | ||||
| conflicts[]?        // see AdvisoryLinksetConflict | ||||
| createdAt | ||||
| updatedAt | ||||
| ```jsonc | ||||
|  | ||||
| ``` | ||||
| #### AdvisoryLinksetConflict | ||||
|  | ||||
| ```jsonc | ||||
| conflictId          // deterministic hash | ||||
| type                // severity-mismatch | affected-range-divergence | reference-clash | alias-inconsistency | metadata-gap | ||||
| field?              // optional JSON pointer (e.g., /statement/severity/vector) | ||||
| observations[]      // per-source values contributing to the conflict | ||||
| confidence          // low|medium|high (heuristic weight) | ||||
| detectedAt | ||||
| ```jsonc | ||||
|  | ||||
| #### ObservationEvent / LinksetEvent | ||||
|  | ||||
| ```jsonc | ||||
| eventId             // ULID | ||||
| tenant | ||||
| type                // advisory.observation.updated | advisory.linkset.updated | ||||
| key{ | ||||
|     observationId?  // on observation event | ||||
|     linksetId?      // on linkset event | ||||
|     vulnerabilityId?, | ||||
|     productKey? | ||||
| } | ||||
| delta{ | ||||
|     added[], removed[], changed[]   // normalized summary for consumers | ||||
| } | ||||
| hash               // canonical hash of serialized delta payload | ||||
| occurredAt | ||||
| ```jsonc | ||||
|  | ||||
| #### ExportState | ||||
|  | ||||
| ```jsonc | ||||
| exportKind          // json | trivydb | ||||
| baseExportId?       // last full baseline | ||||
| baseDigest?         // digest of last full baseline | ||||
| @@ -150,7 +135,9 @@ lastFullDigest?     // digest of last full export | ||||
| lastDeltaDigest?    // digest of last delta export | ||||
| cursor              // per-kind incremental cursor | ||||
| files[]             // last manifest snapshot (path → sha256) | ||||
| ``` | ||||
| ```jsonc | ||||
|  | ||||
| Legacy `Advisory`, `Affected`, and merge-centric entities remain in the repository for historical exports and replay but are being phased out as Link-Not-Merge takes over. New code paths must interact with `AdvisoryObservation` / `AdvisoryLinkset` exclusively and emit conflicts through the structured payloads described above. | ||||
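
The deterministic identifiers in the entity listings above (`observationId` and `linksetId`) can be illustrated with a minimal Python sketch. The function names, the newline separator, and the exact field ordering fed to SHA-256 are assumptions made for illustration; the document only states that `observationId` follows `{tenant}:{source.vendor}:{upstreamId}:{revision}` and that `linksetId` is a SHA-256 over the sorted tenant, product/vulnerability tuple, and observation ids.

```python
import hashlib


def observation_id(tenant: str, vendor: str, upstream_id: str, revision: str) -> str:
    """Compose {tenant}:{source.vendor}:{upstreamId}:{revision}.

    The tenant is lower-cased, matching the `tenant` field description.
    """
    return f"{tenant.lower()}:{vendor}:{upstream_id}:{revision}"


def linkset_id(tenant: str, vulnerability_id: str, product_key: str,
               observation_ids: list[str]) -> str:
    """Hash the linkset key plus its sorted observation ids.

    Sorting makes the digest independent of the order in which
    observations were correlated. The "\n" separator is an assumption;
    the document does not define the canonical byte encoding.
    """
    parts = [tenant.lower(), vulnerability_id, product_key, *sorted(observation_ids)]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Because the observation ids are sorted before hashing, two builders that see the same observations in a different order would arrive at the same `linksetId` under these assumptions.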
|  | ||||
| ### 2.2 Product identity (`productKey`) | ||||
|  | ||||
| @@ -193,7 +180,7 @@ public interface IFeedConnector { | ||||
|   Task ParseAsync(IServiceProvider sp, CancellationToken ct);   // -> dto collection (validated) | ||||
|   Task MapAsync(IServiceProvider sp, CancellationToken ct);     // -> advisory/alias/affected/reference | ||||
| } | ||||
| ``` | ||||
| ```jsonc | ||||
|  | ||||
| * **Fetch**: windowed (cursor), conditional GET (ETag/Last‑Modified), retry/backoff, rate limiting. | ||||
| * **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing. | ||||
| @@ -215,63 +202,106 @@ public interface IFeedConnector { | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Merge engine | ||||
| ## 5) Observation & linkset pipeline | ||||
|  | ||||
| ### 5.1 Keying & identity | ||||
| > **Goal:** deterministically ingest raw documents into immutable observations, correlate them into evidence-rich linksets, and broadcast changes without precedence or mutation. | ||||
|  | ||||
| * Identity graph: **CVE** is primary node; vendor/distro IDs resolved via **Alias** edges (from connectors and Concelier’s alias tables). | ||||
| * `advisoryKey` is the canonical primary key (CVE if present, else vendor/distro key). | ||||
| ### 5.1 Observation flow | ||||
|  | ||||
| ### 5.2 Merge algorithm (deterministic) | ||||
| 1. **Connector fetch/parse/map** — connectors download upstream payloads, validate signatures, and map to DTOs (identifiers, references, raw payload, provenance). | ||||
| 2. **AOC guard** — `AOCWriteGuard` verifies forbidden keys, provenance completeness, tenant claims, timestamp normalization, and content hash idempotency. Violations raise `ERR_AOC_00x` mapped to structured logs and metrics. | ||||
| 3. **Append-only write** — observations insert into `advisory_observations`; duplicates by `(tenant, source.vendor, upstream.upstreamId, upstream.contentHash)` become no-ops; new content for same upstream id creates a supersedes chain. | ||||
| 4. **Change feed + event** — Mongo change streams trigger `advisory.observation.updated@1` events with deterministic payloads (IDs, hash, supersedes pointer, linkset summary). Policy Engine, Offline Kit builder, and guard dashboards subscribe. | ||||
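
The append-only behaviour in step 3 (duplicates become no-ops, new content for the same upstream id starts a supersedes chain) can be sketched with an in-memory stand-in for `advisory_observations`. This is a hypothetical illustration, not the Mongo-backed store: the class and method names are invented, and using a per-upstream counter as the `revision` is an assumption, since the document does not say how revisions are derived.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    observation_id: str
    tenant: str
    vendor: str
    upstream_id: str
    content_hash: str
    supersedes: Optional[str] = None  # id of the observation this one replaces


class InMemoryObservationStore:
    """Hypothetical stand-in for the append-only observation collection."""

    def __init__(self) -> None:
        self._by_id: dict[str, Observation] = {}
        self._seen: set[tuple[str, str, str, str]] = set()
        self._latest: dict[tuple[str, str, str], str] = {}
        self._revisions: dict[tuple[str, str, str], int] = {}

    def append(self, tenant: str, vendor: str, upstream_id: str, content_hash: str):
        tenant = tenant.lower()
        dedupe_key = (tenant, vendor, upstream_id, content_hash)
        if dedupe_key in self._seen:
            # Same content for the same upstream document: idempotent no-op.
            return "noop", None
        chain_key = (tenant, vendor, upstream_id)
        revision = self._revisions.get(chain_key, 0) + 1  # assumed revision scheme
        obs = Observation(
            observation_id=f"{tenant}:{vendor}:{upstream_id}:{revision}",
            tenant=tenant,
            vendor=vendor,
            upstream_id=upstream_id,
            content_hash=content_hash,
            supersedes=self._latest.get(chain_key),
        )
        # Existing observations are never mutated; only new entries are added.
        self._by_id[obs.observation_id] = obs
        self._seen.add(dedupe_key)
        self._latest[chain_key] = obs.observation_id
        self._revisions[chain_key] = revision
        return "ok", obs
```

The `"ok"` and `"noop"` results mirror the `result` label values listed for `concelier.observations.write_total`.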
|  | ||||
| 1. **Gather** all rows for `advisoryKey` (across sources). | ||||
| 2. **Select title/summary** by precedence source (vendor>distro>ecosystem>cert). | ||||
| 3. **Union aliases** (dedupe by scheme+value). | ||||
| 4. **Merge `Affected`** with rules: | ||||
| ### 5.2 Linkset correlation | ||||
|  | ||||
|    * Prefer **vendor** ranges for vendor products; prefer **distro** for **distro‑shipped** packages. | ||||
|    * If both exist for same `productKey`, keep **both**; mark `sourceTag` and `precedence` so **Policy** can decide. | ||||
|    * Never collapse range semantics across different families (e.g., rpm EVR vs semver). | ||||
| 5. **CVSS/severity**: record all CVSS sets; compute **effectiveSeverity** = max (unless policy override). | ||||
| 6. **References**: union with type precedence (advisory > patch > kb > exploit > blog); dedupe by URL; preserve `sourceTag`. | ||||
| 7. Produce **canonical JSON**; compute **afterHash**; store **MergeEvent** with inputs and hashes. | ||||
| 1. **Queue** — observation deltas enqueue correlation jobs keyed by `(tenant, vulnerabilityId, productKey)` candidates derived from identifiers + alias graph. | ||||
| 2. **Canonical grouping** — builder resolves aliases using Concelier’s alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores. | ||||
| 3. **Linkset materialization** — `advisory_linksets` documents store sorted observation references, alias sets, product keys, range metadata, and conflict payloads. Writes are idempotent; unchanged hashes skip updates. | ||||
| 4. **Conflict detection** — builder emits structured conflicts (`severity-mismatch`, `affected-range-divergence`, `reference-clash`, `alias-inconsistency`, `metadata-gap`). Conflicts carry per-observation values for explainability. | ||||
| 5. **Event emission** — `advisory.linkset.updated@1` summarizes deltas (`added`, `removed`, `changed` observation IDs, conflict updates, confidence changes) and includes a canonical hash for replay validation. | ||||
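
As a rough illustration of step 4, the sketch below detects a `severity-mismatch` across the observations of one linkset and derives a deterministic `conflictId`. The function name, the `value` key inside each per-observation entry, and the use of sorted-key compact JSON as the canonical form are assumptions; only the conflict type, the per-observation values, and the deterministic hash come from the description above.

```python
import hashlib
import json
from typing import Optional


def detect_severity_conflict(observations: list[dict]) -> Optional[dict]:
    """Return a severity-mismatch conflict payload, or None if sources agree.

    Each input dict is expected to carry `observationId`, `sourceVendor`,
    and an optional `statement.severity`, as in the AdvisoryLinkset shape.
    """
    values = [
        {
            "observationId": o["observationId"],
            "sourceVendor": o["sourceVendor"],
            "value": o["statement"]["severity"],
        }
        for o in observations
        if o.get("statement", {}).get("severity") is not None
    ]
    if len({v["value"] for v in values}) < 2:
        return None  # zero or one distinct severity: nothing to report

    # Sort so the payload (and its hash) does not depend on input order.
    values.sort(key=lambda v: v["observationId"])
    payload = {
        "type": "severity-mismatch",
        "field": "/statement/severity",
        "observations": values,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    payload["conflictId"] = "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return payload
```

Keeping the per-source values in the payload is what lets a consumer explain the disagreement without re-reading the raw observations.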
|  | ||||
| > The merge is **pure** given inputs. Any change in inputs or precedence matrices changes the **hash** predictably. | ||||
| ### 5.3 Event contract | ||||
|  | ||||
| | Event | Schema | Notes | | ||||
| |-------|--------|-------| | ||||
| | `advisory.observation.updated@1` | `events/advisory.observation.updated@1.json` | Fired on new or superseded observations. Includes `observationId`, source metadata, `linksetSummary` (aliases/purls), supersedes pointer (if any), SHA-256 hash, and `traceId`. | | ||||
| | `advisory.linkset.updated@1` | `events/advisory.linkset.updated@1.json` | Fired when correlation changes. Includes `linksetId`, `key{vulnerabilityId, productKey, confidence}`, observation deltas, conflicts, `updatedAt`, and canonical hash. | | ||||
|  | ||||
| Events are emitted via NATS (primary) and Redis Stream (fallback). Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures both topics during bundle creation for air-gapped replay. | ||||
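
A consumer that acknowledges idempotently using the event hash might look roughly like the following. This is a sketch under assumptions: the class name is hypothetical, the canonical form is taken to be sorted-key compact JSON of `delta`, and deduplication is keyed on `(tenant, type, hash)` held in memory, whereas a real consumer would likely persist the seen set.

```python
import hashlib
import json
from typing import Callable


def canonical_hash(delta: dict) -> str:
    """SHA-256 over a canonical (sorted-key, compact) JSON encoding of the delta."""
    canonical = json.dumps(delta, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotentConsumer:
    """Processes each distinct event once; redelivered duplicates are skipped."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen: set[tuple[str, str, str]] = set()

    def handle(self, event: dict) -> bool:
        # Replay validation: the advertised hash must match the delta payload.
        if event["hash"] != canonical_hash(event["delta"]):
            raise ValueError("event hash does not match delta payload")
        key = (event["tenant"], event["type"], event["hash"])
        if key in self._seen:
            return False  # duplicate delivery, safe to acknowledge again
        self._handler(event)
        self._seen.add(key)  # mark as seen only after the handler succeeded
        return True
```

Marking an event as seen only after the handler returns means a crash mid-handling leads to a retry rather than a silently dropped event.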
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Storage schema (MongoDB) | ||||
|  | ||||
| **Collections & indexes** | ||||
| ### Collections & indexes (LNM path) | ||||
|  | ||||
| * `source` `{_id, type, baseUrl, enabled, notes}` | ||||
| * `source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}` | ||||
| * `document` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}` | ||||
| * `concelier.sources` `{_id, type, baseUrl, enabled, notes}` — connector catalog. | ||||
| * `concelier.source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}` — run-state (TTL indexes on `backoffUntil`). | ||||
| * `concelier.documents` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}` — raw payload registry. | ||||
|   * Indexes: `{sourceName:1, uri:1}` unique; `{fetchedAt:-1}` for recent fetches. | ||||
| * `concelier.dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}` — normalized connector DTOs used for replay. | ||||
|   * Index: `{sourceName:1, documentId:1}`. | ||||
| * `concelier.advisory_observations` | ||||
|  | ||||
|   * Index: `{sourceName:1, uri:1}` unique, `{fetchedAt:-1}` | ||||
| * `dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}` | ||||
| ``` | ||||
| { | ||||
|   _id: "tenant:vendor:upstreamId:revision", | ||||
|   tenant, | ||||
|   source: { vendor, stream, api, collectorVersion }, | ||||
|   upstream: { upstreamId, documentVersion, fetchedAt, receivedAt, contentHash, signature }, | ||||
|   content: { format, specVersion, raw, metadata? }, | ||||
|   identifiers: { cve?, ghsa?, vendorIds[], aliases[] }, | ||||
|   linkset: { purls[], cpes[], aliases[], references[], reconciledFrom[] }, | ||||
|   supersedes?: "prevObservationId", | ||||
|   createdAt, | ||||
|   attributes?: object | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Index: `{sourceName:1, documentId:1}` | ||||
| * `advisory` `{_id, advisoryKey, title, summary, published, modified, severity, cvss, exploitKnown, sources[]}` | ||||
|   * Indexes: `{tenant:1, upstream.upstreamId:1}`, `{tenant:1, source.vendor:1, linkset.purls:1}`, `{tenant:1, linkset.aliases:1}`, `{tenant:1, createdAt:-1}`. | ||||
| * `concelier.advisory_linksets` | ||||
|  | ||||
|   * Index: `{advisoryKey:1}` unique, `{modified:-1}`, `{severity:1}`, text index (title, summary) | ||||
| * `alias` `{advisoryId, scheme, value}` | ||||
| ``` | ||||
| { | ||||
|   _id: "sha256:...", | ||||
|   tenant, | ||||
|   key: { vulnerabilityId, productKey, confidence }, | ||||
|   observations: [ | ||||
|     { observationId, sourceVendor, statement, collectedAt } | ||||
|   ], | ||||
|   aliases: { primary, others: [] }, | ||||
|   purls: [], | ||||
|   cpes: [], | ||||
|   conflicts: [], | ||||
|   createdAt, | ||||
|   updatedAt | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Index: `{scheme:1,value:1}`, `{advisoryId:1}` | ||||
| * `affected` `{advisoryId, productKey, rangeKind, introduced?, fixed?, arch?, distro?, ecosystem?}` | ||||
|   * Indexes: `{tenant:1, key.vulnerabilityId:1, key.productKey:1}`, `{tenant:1, purls:1}`, `{tenant:1, aliases.primary:1}`, `{tenant:1, updatedAt:-1}`. | ||||
| * `concelier.advisory_events` | ||||
|  | ||||
|   * Index: `{productKey:1}`, `{advisoryId:1}`, `{productKey:1, rangeKind:1}` | ||||
| * `reference` `{advisoryId, url, kind, sourceTag}` | ||||
| ``` | ||||
| { | ||||
|   _id: ObjectId, | ||||
|   tenant, | ||||
|   type: "advisory.observation.updated" | "advisory.linkset.updated", | ||||
|   key, | ||||
|   delta, | ||||
|   hash, | ||||
|   occurredAt | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Index: `{advisoryId:1}`, `{kind:1}` | ||||
| * `merge_event` `{advisoryKey, beforeHash, afterHash, mergedAt, inputs[]}` | ||||
|  | ||||
|   * Index: `{advisoryKey:1, mergedAt:-1}` | ||||
| * `export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}` | ||||
|   * TTL index on `occurredAt` (configurable retention), `{type:1, occurredAt:-1}` for replay. | ||||
| * `concelier.export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}` | ||||
| * `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks) | ||||
| * `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}` | ||||
|  | ||||
| **GridFS buckets**: `fs.documents` for raw payloads. | ||||
| **Legacy collections** (`advisory`, `alias`, `affected`, `reference`, `merge_event`) remain read-only during the migration window to support back-compat exports. New code must not write to them; scheduled cleanup removes them after Link-Not-Merge GA. | ||||
|  | ||||
| **GridFS buckets**: `fs.documents` for raw payloads (immutable); `fs.exports` for historical JSON/Trivy archives. | ||||
|  | ||||
| --- | ||||
|  | ||||
| @@ -287,7 +317,7 @@ public interface IFeedConnector { | ||||
| * Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes. | ||||
| * In delta, unchanged blobs are reused from the base; metadata captures: | ||||
|  | ||||
|   ``` | ||||
|   ```json | ||||
|   { | ||||
|     "mode": "delta|full", | ||||
|     "baseExportId": "...", | ||||
| @@ -409,7 +439,7 @@ concelier: | ||||
| ## 10) Security & compliance | ||||
|  | ||||
| * **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible. | ||||
| * **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; **merge** can down‑weight or ignore them by config. | ||||
| * **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; Policy Engine or downstream policy can down-weight them. | ||||
| * **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers. | ||||
| * **Multi‑tenant**: per‑tenant DBs or prefixes; per‑tenant S3 prefixes; tenant‑scoped API tokens. | ||||
| * **Determinism**: canonical JSON writer; export digests stable across runs given same inputs. | ||||
| @@ -419,8 +449,9 @@ concelier: | ||||
| ## 11) Performance targets & scale | ||||
|  | ||||
| * **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON). | ||||
| * **Normalize/map**: ≥ 50k `Affected` rows/min on 4 cores. | ||||
| * **Merge**: ≤ 10 ms P95 per advisory at steady‑state updates. | ||||
| * **Normalize/map**: ≥ 50k observation statements/min on 4 cores. | ||||
| * **Observation write**: ≤ 5 ms P95 per document (including guard + Mongo write). | ||||
| * **Linkset build**: ≤ 15 ms P95 per `(vulnerabilityId, productKey)` update, even with 20+ contributing observations. | ||||
| * **Export**: 1M advisories JSON in ≤ 90 s (streamed, zstd), Trivy DB in ≤ 60 s on 8 cores. | ||||
| * **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes. | ||||
|  | ||||
| @@ -435,11 +466,13 @@ concelier: | ||||
|   * `concelier.fetch.docs_total{source}` | ||||
|   * `concelier.fetch.bytes_total{source}` | ||||
|   * `concelier.parse.failures_total{source}` | ||||
|   * `concelier.map.affected_total{source}` | ||||
|   * `concelier.merge.changed_total` | ||||
|   * `concelier.map.statements_total{source}` | ||||
|   * `concelier.observations.write_total{result=ok|noop|error}` | ||||
|   * `concelier.linksets.updated_total{result=ok|skip|error}` | ||||
|   * `concelier.linksets.conflicts_total{type}` | ||||
|   * `concelier.export.bytes{kind}` | ||||
|   * `concelier.export.duration_seconds{kind}` | ||||
| * **Tracing** around fetch/parse/map/merge/export. | ||||
| * **Tracing** around fetch/parse/map/observe/linkset/export. | ||||
| * **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`. | ||||
|  | ||||
| --- | ||||
| @@ -448,7 +481,7 @@ concelier: | ||||
|  | ||||
| * **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail). | ||||
| * **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, pre‑releases). | ||||
| * **Merge:** conflicting sources (vendor vs distro vs OSV); verify precedence & dual retention. | ||||
| * **Linkset correlation:** multi-source conflicts (severity, range, alias) produce deterministic conflict payloads; ensure confidence scoring is stable. | ||||
| * **Export determinism:** byte‑for‑byte stable outputs across runs; digest equality. | ||||
| * **Performance:** soak tests with 1M advisories; cap memory; verify backpressure. | ||||
| * **API:** pagination, filters, RBAC, error envelopes (RFC 7807). | ||||
| @@ -470,7 +503,8 @@ concelier: | ||||
| * **Trigger all sources:** `POST /api/v1/concelier/sources/*/trigger` | ||||
| * **Force full export JSON:** `POST /api/v1/concelier/exports/json { "full": true, "force": true }` | ||||
| * **Force Trivy DB delta publish:** `POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }` | ||||
| * **Inspect advisory:** `GET /api/v1/concelier/advisories?scheme=CVE&value=CVE-2025-12345` | ||||
| * **Inspect observation:** `GET /api/v1/concelier/observations/{observationId}` | ||||
| * **Query linkset:** `GET /api/v1/concelier/linksets?vulnerabilityId=CVE-2025-12345&productKey=pkg:rpm/redhat/openssl` | ||||
| * **Pause noisy source:** `POST /api/v1/concelier/sources/osv/pause` | ||||
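
Observation ids and purl-based product keys contain reserved characters such as `:` and `/`, so a client calling the inspect and query endpoints above will usually want to percent-encode them. The helper below is a small sketch; the host name is a placeholder, the function names are hypothetical, and whether the service requires encoded values is an assumption.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://concelier.example.internal"  # placeholder host, not from the document


def observation_url(observation_id: str, base_url: str = BASE_URL) -> str:
    """Build the inspect-observation URL, encoding ':' and '/' in the path segment."""
    return f"{base_url}/api/v1/concelier/observations/{quote(observation_id, safe='')}"


def linkset_query_url(vulnerability_id: str, product_key: str,
                      base_url: str = BASE_URL) -> str:
    """Build the linkset query URL with percent-encoded query parameters."""
    query = urlencode({"vulnerabilityId": vulnerability_id, "productKey": product_key})
    return f"{base_url}/api/v1/concelier/linksets?{query}"
```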
|  | ||||
| --- | ||||
| @@ -482,4 +516,3 @@ concelier: | ||||
| 3. **Attestation hand‑off**: integrate with **Signer/Attestor** (optional). | ||||
| 4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse. | ||||
| 5. **Offline kit**: end‑to‑end verified bundles for air‑gap. | ||||
|  | ||||
|   | ||||