feat: Implement Scheduler Worker Options and Planner Loop

- Added `SchedulerWorkerOptions` class to encapsulate configuration for the scheduler worker.
- Introduced `PlannerBackgroundService` to manage the planner loop, fetching and processing planning runs.
- Created `PlannerExecutionService` to handle the execution logic for planning runs, including impact targeting and run persistence.
- Developed `PlannerExecutionResult` and `PlannerExecutionStatus` to standardize execution outcomes.
- Implemented validation logic within `SchedulerWorkerOptions` to ensure proper configuration.
- Added documentation for the planner loop and impact targeting features.
- Established health check endpoints and authentication mechanisms for the Signals service.
- Created unit tests for the Signals API to ensure proper functionality and response handling.
- Configured options for authority integration and fallback authentication methods.
2025-10-27 09:46:31 +02:00
parent 96d52884e8
commit 730354a1af
135 changed files with 10721 additions and 946 deletions


@@ -1,12 +1,12 @@
# component_architecture_concelier.md — **StellaOps Concelier** (Sprint22)
> **Scope.** Implementation-ready architecture for **Concelier**: the advisory ingestion and Link-Not-Merge (LNM) observation pipeline that produces deterministic raw observations, correlation linksets, and evidence events consumed by Policy Engine, Console, CLI, and Export centers. Covers domain models, connectors, observation/linkset builders, storage schema, events, APIs, performance, security, and test matrices.
---
## 0) Mission & boundaries
**Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), persist them as immutable **observations** under the Aggregation-Only Contract (AOC), construct **linksets** that correlate observations without merging or precedence, and export deterministic evidence bundles (JSON, Trivy DB, Offline Kit) for downstream policy evaluation and operator tooling.
**Boundaries.**
@@ -21,10 +21,12 @@
**Process shape:** single ASP.NET Core service `StellaOps.Concelier.WebService` hosting:
* **Scheduler** with distributed locks (Mongo backed).
* **Connectors** (fetch/parse/map) that emit immutable observation candidates.
* **Observation writer** enforcing AOC invariants via `AOCWriteGuard`.
* **Linkset builder** that correlates observations into `advisory_linksets` and annotates conflicts.
* **Event publisher** emitting `advisory.observation.updated` and `advisory.linkset.updated` messages.
* **Exporters** (JSON, Trivy DB, Offline Kit slices) fed from observation/linkset stores.
* **Minimal REST** for health/status/trigger/export and observation/linkset reads.
**Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter.
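
A minimal sketch of the leasing pattern behind those locks, using the `locks` collection shape from the storage section; the helper name and lease handling are illustrative, not the actual StellaOps scheduler API.

```csharp
using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Hypothetical helper: one replica wins the per-job lease; the others back off.
// Field names mirror the `locks` collection described in the storage section.
public static class JobLock
{
    public static async Task<bool> TryAcquireAsync(
        IMongoCollection<BsonDocument> locks, string jobKey, string holder, TimeSpan lease)
    {
        var now = DateTime.UtcNow;

        // Match only a missing or expired lease for this job key.
        var filter = Builders<BsonDocument>.Filter.Eq("_id", jobKey)
                   & Builders<BsonDocument>.Filter.Lt("ttlAt", now);

        var update = Builders<BsonDocument>.Update
            .Set("holder", holder)
            .Set("acquiredAt", now)
            .Set("heartbeatAt", now)
            .Set("leaseMs", (long)lease.TotalMilliseconds)
            .Set("ttlAt", now + lease);

        try
        {
            // Upsert succeeds when no lease exists or the previous one has expired.
            await locks.UpdateOneAsync(filter, update, new UpdateOptions { IsUpsert = true });
            return true;
        }
        catch (MongoWriteException ex) when (ex.WriteError?.Category == ServerErrorCategory.DuplicateKey)
        {
            return false; // another replica holds an unexpired lease
        }
    }
}
```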
@@ -36,113 +38,96 @@
### 2.1 Core entities
#### AdvisoryObservation
```jsonc
observationId      // deterministic id: {tenant}:{source.vendor}:{upstreamId}:{revision}
tenant             // issuing tenant (lower-case)
source{
  vendor, stream, api, collectorVersion
}
upstream{
  upstreamId, documentVersion, fetchedAt, receivedAt,
  contentHash, signature{present, format?, keyId?, signature?}
}
content{
  format, specVersion, raw, metadata?
}
identifiers{
  cve?, ghsa?, vendorIds[], aliases[]
}
linkset{
  purls[], cpes[], aliases[], references[{type,url}],
  reconciledFrom[]
}
createdAt          // when Concelier recorded the observation
attributes         // optional provenance metadata (batch ids, ingest cursor)
```
The observation is an immutable projection of the raw ingestion document (post provenance validation, pre-merge) that powers Link-Not-Merge overlays and Vuln Explorer. Observations live in the `advisory_observations` collection, keyed by tenant + upstream identity. `linkset` provides normalized aliases/PURLs/CPEs that downstream services (Graph/Vuln Explorer) join against without triggering merge logic. Concelier.Core exposes strongly-typed models (`AdvisoryObservation`, `AdvisoryObservationLinkset`, etc.) and a Mongo-backed store for filtered queries by tenant/alias; this keeps overlay consumers read-only while preserving AOC guarantees.
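
A small illustration of the deterministic `observationId` format and a tenant/alias lookup of the kind described above; the real strongly-typed store in Concelier.Core exposes richer filters, so treat this as a sketch.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Illustrative only: compose the deterministic observation id and query
// `advisory_observations` by tenant + alias. The production store API differs.
public static class ObservationQueries
{
    public static string BuildObservationId(string tenant, string sourceVendor, string upstreamId, int revision)
        => $"{tenant.ToLowerInvariant()}:{sourceVendor}:{upstreamId}:{revision}";

    public static Task<List<BsonDocument>> FindByAliasAsync(
        IMongoCollection<BsonDocument> observations, string tenant, string alias)
    {
        var filter = Builders<BsonDocument>.Filter.Eq("tenant", tenant)
                   & Builders<BsonDocument>.Filter.AnyEq("linkset.aliases", alias);

        // Served by the {tenant:1, linkset.aliases:1} index listed in the storage schema.
        return observations.Find(filter)
                           .SortByDescending(o => o["createdAt"])
                           .ToListAsync();
    }
}
```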
#### AdvisoryLinkset
```jsonc
linksetId          // sha256 over sorted (tenant, product/vuln tuple, observation ids)
tenant
key{
  vulnerabilityId,
  productKey,
  confidence       // low|medium|high
}
observations[] = [
  {
    observationId,
    sourceVendor,
    statement{
      status?, severity?, references?, notes?
    },
    collectedAt
  }
]
aliases{
  primary,
  others[]
}
purls[]
cpes[]
conflicts[]?       // see AdvisoryLinksetConflict
createdAt
updatedAt
```
#### AdvisoryLinksetConflict
```jsonc
conflictId         // deterministic hash
type               // severity-mismatch | affected-range-divergence | reference-clash | alias-inconsistency | metadata-gap
field?             // optional JSON pointer (e.g., /statement/severity/vector)
observations[]     // per-source values contributing to the conflict
confidence         // low|medium|high (heuristic weight)
detectedAt
```
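
By way of illustration, a severity-mismatch check of the kind the linkset builder performs could look like the sketch below; the record shapes mirror the conflict document above, while the helper and the confidence heuristic are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical shapes mirroring the AdvisoryLinksetConflict document above.
public sealed record ObservedSeverity(string ObservationId, string SourceVendor, string Severity);

public sealed record LinksetConflict(
    string ConflictId, string Type, string? Field,
    IReadOnlyList<ObservedSeverity> Observations, string Confidence, DateTimeOffset DetectedAt);

public static class ConflictDetector
{
    // Emit a severity-mismatch conflict when sources disagree; the id is a deterministic
    // hash over the sorted per-observation values, so repeated runs produce the same record.
    public static LinksetConflict? DetectSeverityMismatch(IReadOnlyList<ObservedSeverity> statements)
    {
        var distinct = statements.Select(s => s.Severity)
                                 .Distinct(StringComparer.OrdinalIgnoreCase)
                                 .Count();
        if (distinct <= 1)
            return null;

        var canonical = string.Join("|",
            statements.OrderBy(s => s.ObservationId, StringComparer.Ordinal)
                      .Select(s => $"{s.ObservationId}={s.Severity.ToLowerInvariant()}"));
        var conflictId = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonical)))
                                .ToLowerInvariant();

        return new LinksetConflict(
            conflictId, "severity-mismatch", "/statement/severity",
            statements, Confidence: distinct > 2 ? "high" : "medium", DetectedAt: DateTimeOffset.UtcNow);
    }
}
```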
#### ObservationEvent / LinksetEvent
```jsonc
eventId            // ULID
tenant
type               // advisory.observation.updated | advisory.linkset.updated
key{
  observationId?   // on observation event
  linksetId?       // on linkset event
  vulnerabilityId?,
  productKey?
}
delta{
  added[], removed[], changed[]   // normalized summary for consumers
}
hash               // canonical hash of serialized delta payload
occurredAt
```
#### ExportState
```jsonc
exportKind         // json | trivydb
baseExportId?      // last full baseline
baseDigest?        // digest of last full baseline
lastFullDigest?    // digest of last full export
lastDeltaDigest?   // digest of last delta export
cursor             // per-kind incremental cursor
files[]            // last manifest snapshot (path → sha256)
```
Legacy `Advisory`, `Affected`, and merge-centric entities remain in the repository for historical exports and replay but are being phased out as Link-Not-Merge takes over. New code paths must interact with `AdvisoryObservation` / `AdvisoryLinkset` exclusively and emit conflicts through the structured payloads described above.
### 2.2 Product identity (`productKey`)
@@ -193,7 +180,7 @@ public interface IFeedConnector {
  Task ParseAsync(IServiceProvider sp, CancellationToken ct);   // -> dto collection (validated)
  Task MapAsync(IServiceProvider sp, CancellationToken ct);     // -> advisory/alias/affected/reference
}
```
* **Fetch**: windowed (cursor), conditional GET (ETag/Last-Modified), retry/backoff, rate limiting (see the sketch after this list).
* **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing.
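
A sketch of that conditional-GET discipline, assuming a hypothetical `FetchCursor` record; the real connector plumbing (windowing, retries, rate limits) is omitted here.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Illustrative conditional fetch: replay the stored ETag/Last-Modified validators and
// treat 304 Not Modified as "nothing new". The cursor shape is an assumption.
public sealed record FetchCursor(string? ETag, DateTimeOffset? LastModified);

public static class ConditionalFetch
{
    public static async Task<(FetchCursor Cursor, byte[]? Body)> FetchAsync(
        HttpClient http, Uri uri, FetchCursor cursor, CancellationToken ct)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, uri);
        if (cursor.ETag is not null)
            request.Headers.IfNoneMatch.ParseAdd(cursor.ETag);
        if (cursor.LastModified is not null)
            request.Headers.IfModifiedSince = cursor.LastModified;

        using var response = await http.SendAsync(request, ct);
        if (response.StatusCode == HttpStatusCode.NotModified)
            return (cursor, null); // upstream unchanged; keep the old cursor

        response.EnsureSuccessStatusCode();
        var next = new FetchCursor(
            response.Headers.ETag?.Tag,
            response.Content.Headers.LastModified);
        return (next, await response.Content.ReadAsByteArrayAsync(ct));
    }
}
```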
@@ -215,63 +202,106 @@ public interface IFeedConnector {
---
## 5) Observation & linkset pipeline
> **Goal:** deterministically ingest raw documents into immutable observations, correlate them into evidence-rich linksets, and broadcast changes without precedence or mutation.
### 5.1 Observation flow
1. **Connector fetch/parse/map**: connectors download upstream payloads, validate signatures, and map them to DTOs (identifiers, references, raw payload, provenance).
2. **AOC guard**: `AOCWriteGuard` verifies forbidden keys, provenance completeness, tenant claims, timestamp normalization, and content-hash idempotency. Violations raise `ERR_AOC_00x`, mapped to structured logs and metrics (a minimal guard sketch follows this list).
3. **Append-only write**: observations are inserted into `advisory_observations`; duplicates by `(tenant, source.vendor, upstream.upstreamId, upstream.contentHash)` become no-ops; new content for the same upstream id creates a supersedes chain.
4. **Change feed + event**: Mongo change streams trigger `advisory.observation.updated@1` events with deterministic payloads (IDs, hash, supersedes pointer, linkset summary). Policy Engine, Offline Kit builder, and guard dashboards subscribe.
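
A minimal guard sketch, assuming simplified rules and placeholder error codes; the real `AOCWriteGuard` rule set and `ERR_AOC_00x` mapping are richer.

```csharp
using System.Collections.Generic;

// Illustrative guard checks in the spirit of AOCWriteGuard; the actual rules,
// method names, and error-code mapping live in the real implementation.
public static class AocGuardSketch
{
    // Keys that would imply merge/precedence decisions and therefore must not
    // appear on an immutable observation (list is a placeholder).
    private static readonly string[] ForbiddenKeys = { "severity", "effectiveStatus", "mergedFrom" };

    public static IReadOnlyList<string> Validate(IDictionary<string, object?> candidate, string tenantClaim)
    {
        var violations = new List<string>();

        foreach (var key in ForbiddenKeys)
            if (candidate.ContainsKey(key))
                violations.Add($"ERR_AOC_001: forbidden key '{key}'"); // placeholder code

        if (!candidate.TryGetValue("tenant", out var tenant) || !Equals(tenant, tenantClaim))
            violations.Add("ERR_AOC_002: tenant mismatch or missing");  // placeholder code

        if (!candidate.TryGetValue("upstream", out var upstream) || upstream is null)
            violations.Add("ERR_AOC_003: missing upstream provenance"); // placeholder code

        return violations; // empty list => safe to append to advisory_observations
    }
}
```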
### 5.2 Linkset correlation
1. **Queue**: observation deltas enqueue correlation jobs keyed by `(tenant, vulnerabilityId, productKey)` candidates derived from identifiers and the alias graph.
2. **Canonical grouping**: the builder resolves aliases using Concelier's alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores.
3. **Linkset materialization**: `advisory_linksets` documents store sorted observation references, alias sets, product keys, range metadata, and conflict payloads. Writes are idempotent; unchanged hashes skip updates (see the linkset-id sketch after this list).
4. **Conflict detection**: the builder emits structured conflicts (`severity-mismatch`, `affected-range-divergence`, `reference-clash`, `alias-inconsistency`, `metadata-gap`). Conflicts carry per-observation values for explainability.
5. **Event emission**: `advisory.linkset.updated@1` summarizes deltas (`added`, `removed`, `changed` observation IDs, conflict updates, confidence changes) and includes a canonical hash for replay validation.
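
A sketch of the deterministic linkset identity described in step 3 and in the `linksetId` field above; the exact canonicalization used by the builder may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Deterministic linkset id: sha256 over the tenant / vulnerability / product tuple plus
// the sorted observation ids, matching the `linksetId` description. Names are illustrative.
public static class LinksetIdentity
{
    public static string Compute(string tenant, string vulnerabilityId, string productKey,
        IEnumerable<string> observationIds)
    {
        var parts = new List<string> { tenant.ToLowerInvariant(), vulnerabilityId, productKey };
        parts.AddRange(observationIds.OrderBy(id => id, StringComparer.Ordinal));

        // Sorting before hashing makes the id independent of observation arrival order.
        var canonical = string.Join("\n", parts);
        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```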
### 5.3 Event contract
| Event | Schema | Notes |
|-------|--------|-------|
| `advisory.observation.updated@1` | `events/advisory.observation.updated@1.json` | Fired on new or superseded observations. Includes `observationId`, source metadata, `linksetSummary` (aliases/purls), supersedes pointer (if any), SHA-256 hash, and `traceId`. |
| `advisory.linkset.updated@1` | `events/advisory.linkset.updated@1.json` | Fired when correlation changes. Includes `linksetId`, `key{vulnerabilityId, productKey, confidence}`, observation deltas, conflicts, `updatedAt`, and canonical hash. |
Events are emitted via NATS (primary) and Redis Stream (fallback). Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures both topics during bundle creation for air-gapped replay.
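
A consumer-side sketch of that idempotent acknowledgement, using an in-memory set of seen hashes as a stand-in for whatever dedup store a real subscriber keeps.

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative idempotent consumer: duplicate deliveries of advisory.*.updated events
// are detected by the canonical payload hash, so re-processing is a safe no-op.
public sealed class IdempotentEventHandler
{
    private readonly ConcurrentDictionary<string, byte> _seenHashes = new();

    public bool Handle(string eventType, string canonicalHash, Action process)
    {
        // TryAdd returns false when this hash was already processed (duplicate delivery).
        if (!_seenHashes.TryAdd(canonicalHash, 0))
            return false;

        process(); // e.g., refresh a Policy Engine overlay or an Offline Kit manifest
        return true;
    }
}
```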
---
## 6) Storage schema (MongoDB)
### Collections & indexes (LNM path)
* `concelier.sources` `{_id, type, baseUrl, enabled, notes}` – connector catalog.
* `concelier.source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}` – run-state (TTL indexes on `backoffUntil`).
* `concelier.documents` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}` – raw payload registry.
* Indexes: `{sourceName:1, uri:1}` unique; `{fetchedAt:-1}` for recent fetches.
* `concelier.dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}` – normalized connector DTOs used for replay.
* Index: `{sourceName:1, documentId:1}`.
* `concelier.advisory_observations`
```
{
_id: "tenant:vendor:upstreamId:revision",
tenant,
source: { vendor, stream, api, collectorVersion },
upstream: { upstreamId, documentVersion, fetchedAt, receivedAt, contentHash, signature },
content: { format, specVersion, raw, metadata? },
identifiers: { cve?, ghsa?, vendorIds[], aliases[] },
linkset: { purls[], cpes[], aliases[], references[], reconciledFrom[] },
supersedes?: "prevObservationId",
createdAt,
attributes?: object
}
```
* Indexes: `{tenant:1, upstream.upstreamId:1}`, `{tenant:1, source.vendor:1, linkset.purls:1}`, `{tenant:1, linkset.aliases:1}`, `{tenant:1, createdAt:-1}`.
* `concelier.advisory_linksets`
```
{
_id: "sha256:...",
tenant,
key: { vulnerabilityId, productKey, confidence },
observations: [
{ observationId, sourceVendor, statement, collectedAt }
],
aliases: { primary, others: [] },
purls: [],
cpes: [],
conflicts: [],
createdAt,
updatedAt
}
```
* Indexes: `{tenant:1, key.vulnerabilityId:1, key.productKey:1}`, `{tenant:1, purls:1}`, `{tenant:1, aliases.primary:1}`, `{tenant:1, updatedAt:-1}`.
* `concelier.advisory_events`
```
{
_id: ObjectId,
tenant,
type: "advisory.observation.updated" | "advisory.linkset.updated",
key,
delta,
hash,
occurredAt
}
```
* TTL index on `occurredAt` (configurable retention), `{type:1, occurredAt:-1}` for replay.
* `concelier.export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}`
* `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks)
* `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}`
**Legacy collections** (`advisory`, `alias`, `affected`, `reference`, `merge_event`) remain read-only during the migration window to support back-compat exports. New code must not write to them; scheduled cleanup removes them after Link-Not-Merge GA.
**GridFS buckets**: `fs.documents` for raw payloads (immutable); `fs.exports` for historical JSON/Trivy archives.
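
A bootstrap sketch for the `concelier.advisory_observations` indexes listed above, using the MongoDB C# driver; database and collection names here are assumptions.

```csharp
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Illustrative index bootstrap mirroring the index list for advisory_observations.
public static class ObservationIndexes
{
    public static Task EnsureAsync(IMongoDatabase db)
    {
        var observations = db.GetCollection<BsonDocument>("advisory_observations");
        var keys = Builders<BsonDocument>.IndexKeys;

        var models = new[]
        {
            new CreateIndexModel<BsonDocument>(keys.Ascending("tenant").Ascending("upstream.upstreamId")),
            new CreateIndexModel<BsonDocument>(keys.Ascending("tenant").Ascending("linkset.aliases")),
            new CreateIndexModel<BsonDocument>(keys.Ascending("tenant").Descending("createdAt")),
        };

        // CreateManyAsync is idempotent for identical index definitions.
        return observations.Indexes.CreateManyAsync(models);
    }
}
```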
---
@@ -287,7 +317,7 @@ public interface IFeedConnector {
* Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes.
* In delta, unchanged blobs are reused from the base; metadata captures:
```json
{
"mode": "delta|full",
"baseExportId": "...",
@@ -409,7 +439,7 @@ concelier:
## 10) Security & compliance
* **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible.
* **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; Policy Engine or downstream policy can down-weight them.
* **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers.
* **Multi-tenant**: per-tenant DBs or prefixes; per-tenant S3 prefixes; tenant-scoped API tokens.
* **Determinism**: canonical JSON writer; export digests stable across runs given same inputs.
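
To make the determinism point concrete, a minimal canonical-JSON digest can sort object keys before hashing; this is a stand-in for the real `CanonicalJsonSerializer`, not its implementation.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Minimal canonical-JSON digest: recursively sort object keys, then hash the UTF-8 bytes,
// so the same logical payload always yields the same digest.
public static class CanonicalDigest
{
    public static string Compute(JsonElement element)
    {
        var canonical = Canonicalize(element);
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonical))).ToLowerInvariant();
    }

    private static string Canonicalize(JsonElement element) => element.ValueKind switch
    {
        JsonValueKind.Object => "{" + string.Join(",",
            element.EnumerateObject()
                   .OrderBy(p => p.Name, StringComparer.Ordinal)
                   .Select(p => JsonSerializer.Serialize(p.Name) + ":" + Canonicalize(p.Value))) + "}",
        JsonValueKind.Array => "[" + string.Join(",", element.EnumerateArray().Select(Canonicalize)) + "]",
        _ => element.GetRawText(), // strings, numbers, booleans, null pass through as-is
    };
}
```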
@@ -419,8 +449,9 @@ concelier:
## 11) Performance targets & scale
* **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
* **Normalize/map**: ≥ 50k observation statements/min on 4 cores.
* **Observation write**: ≤ 5ms P95 per document (including guard + Mongo write).
* **Linkset build**: ≤ 15ms P95 per `(vulnerabilityId, productKey)` update, even with 20+ contributing observations.
* **Export**: 1M advisories JSON in ≤ 90s (streamed, zstd), Trivy DB in ≤ 60s on 8 cores.
* **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.
@@ -435,11 +466,13 @@ concelier:
* `concelier.fetch.docs_total{source}`
* `concelier.fetch.bytes_total{source}`
* `concelier.parse.failures_total{source}`
* `concelier.map.statements_total{source}`
* `concelier.observations.write_total{result=ok|noop|error}`
* `concelier.linksets.updated_total{result=ok|skip|error}`
* `concelier.linksets.conflicts_total{type}`
* `concelier.export.bytes{kind}`
* `concelier.export.duration_seconds{kind}`
* **Tracing** around fetch/parse/map/observe/linkset/export.
* **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`.
---
@@ -448,7 +481,7 @@ concelier:
* **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail).
* **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, prereleases).
* **Linkset correlation:** multi-source conflicts (severity, range, alias) produce deterministic conflict payloads; ensure confidence scoring stays stable across runs.
* **Export determinism:** byte-for-byte stable outputs across runs; digest equality.
* **Performance:** soak tests with 1M advisories; cap memory; verify backpressure.
* **API:** pagination, filters, RBAC, error envelopes (RFC 7807).
@@ -470,7 +503,8 @@ concelier:
* **Trigger all sources:** `POST /api/v1/concelier/sources/*/trigger`
* **Force full export JSON:** `POST /api/v1/concelier/exports/json { "full": true, "force": true }`
* **Force Trivy DB delta publish:** `POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }`
* **Inspect observation:** `GET /api/v1/concelier/observations/{observationId}`
* **Query linkset:** `GET /api/v1/concelier/linksets?vulnerabilityId=CVE-2025-12345&productKey=pkg:rpm/redhat/openssl`
* **Pause noisy source:** `POST /api/v1/concelier/sources/osv/pause`
---
@@ -482,4 +516,3 @@ concelier:
3. **Attestation handoff**: integrate with **Signer/Attestor** (optional).
4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse.
5. **Offline kit**: end-to-end verified bundles for air-gap.