feat: Implement Scheduler Worker Options and Planner Loop

- Added `SchedulerWorkerOptions` class to encapsulate configuration for the scheduler worker.
- Introduced `PlannerBackgroundService` to manage the planner loop, fetching and processing planning runs.
- Created `PlannerExecutionService` to handle the execution logic for planning runs, including impact targeting and run persistence.
- Developed `PlannerExecutionResult` and `PlannerExecutionStatus` to standardize execution outcomes.
- Implemented validation logic within `SchedulerWorkerOptions` to ensure proper configuration.
- Added documentation for the planner loop and impact targeting features.
- Established health check endpoints and authentication mechanisms for the Signals service.
- Created unit tests for the Signals API to ensure proper functionality and response handling.
- Configured options for authority integration and fallback authentication methods.
2025-10-27 09:46:31 +02:00
parent 96d52884e8
commit 730354a1af
135 changed files with 10721 additions and 946 deletions


@@ -1,17 +1,17 @@
# component_architecture_excititor.md — **StellaOps Excititor** (2025Q4)
> **Scope.** This document specifies the **Excititor** service: its purpose, trust model, data structures, APIs, plugin contracts, storage schema, normalization/consensus algorithms, performance budgets, testing matrix, and how it integrates with Scanner, Policy, Concelier, and the attestation chain. It is implementation-ready.
# component_architecture_excititor.md — **StellaOps Excititor** (Sprint22)
> **Scope.** This document specifies the **Excititor** service: its purpose, trust model, data structures, observation/linkset pipelines, APIs, plug-in contracts, storage schema, performance budgets, testing matrix, and how it integrates with Concelier, Policy Engine, and evidence surfaces. It is implementation-ready.
---
## 0) Mission & role in the platform
**Mission.** Convert heterogeneous **VEX** statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into **canonical, queryable claims**; compute **deterministic consensus** per *(vuln, product)*; preserve **conflicts with provenance**; publish **stable, attestable exports** that the backend uses to suppress non-exploitable findings, prioritize remaining risk, and explain decisions.
**Mission.** Convert heterogeneous **VEX** statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into immutable **VEX observations**, correlate them into **linksets** that retain provenance/conflicts without precedence, and publish deterministic evidence exports and events that Policy Engine, Console, and CLI use to suppress or explain findings.
**Boundaries.**
* Excititor **does not** decide PASS/FAIL. It supplies **evidence** (statuses + justifications + provenance weights).
* Excititor preserves **conflicting claims** unchanged; consensus encodes how we would pick, but the raw set is always exportable.
* Excititor preserves **conflicting observations** unchanged; consensus (when enabled) merely annotates how policy might choose, but raw evidence remains exportable.
* VEX consumption is **backend-only**: Scanner never applies VEX. The backend's **Policy Engine** asks Excititor for status evidence and then decides what to show.
---
@@ -27,38 +27,121 @@
All connectors register **source metadata**: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors.
### 1.2 Canonical model (normalized)
Every incoming statement becomes a set of **VexClaim** records:
```
VexClaim
- providerId // 'redhat', 'suse', 'ubuntu', 'github', 'vendorX'
- vulnId // 'CVE-2025-12345', 'GHSA-xxxx', canonicalized
- productKey // canonical product identity (see §2.2)
- status // affected | not_affected | fixed | under_investigation
- justification? // for 'not_affected'/'affected' where provided
- introducedVersion? // semantics per provider (range or exact)
- fixedVersion? // where provided (range or exact)
- lastObserved // timestamp from source or fetch time
- provenance // doc digest, signature status, fetch URI, line/offset anchors
- evidence[] // raw source snippets for explainability
- supersedes? // optional cross-doc chain (docDigest → docDigest)
```
### 1.3 Exports (consumption)
* **VexConsensus** per `(vulnId, productKey)` with:
* `rollupStatus` (after policy weights/justification gates),
* `sources[]` (winning + losing claims with weights & reasons),
* `policyRevisionId` (identifier of the Excititor policy used),
* `consensusDigest` (stable SHA256 over canonical JSON).
* **Raw claims** export for auditing (unchanged, with provenance).
* **Provider snapshots** (per source, last N days) for operator debugging.
* **Index** optimized for backend joins: `(productKey, vulnId) → (status, confidence, sourceSet)`.
All exports are **deterministic**, and (optionally) **attested** via DSSE and logged to Rekor v2.
### 1.2 Canonical model (observations & linksets)
#### VexObservation
```jsonc
observationId // {tenant}:{providerId}:{upstreamId}:{revision}
tenant
providerId // e.g., redhat, suse, ubuntu, osv
streamId // connector stream (csaf, openvex, cyclonedx, attestation)
upstream{
upstreamId,
documentVersion?,
fetchedAt,
receivedAt,
contentHash,
signature{present, format?, keyId?, signature?}
}
statements[
{
vulnerabilityId,
productKey,
status, // affected | not_affected | fixed | under_investigation
justification?,
introducedVersion?,
fixedVersion?,
lastObserved,
locator?, // JSON Pointer/line for provenance
evidence?[]
}
]
content{
format,
specVersion?,
raw
}
linkset{
aliases[], // CVE/GHSA/vendor IDs
purls[],
cpes[],
references[{type,url}],
reconciledFrom[]
}
supersedes?
createdAt
attributes?
```
#### VexLinkset
```jsonc
linksetId // sha256 over sorted (tenant, vulnId, productKey, observationIds)
tenant
key{
vulnerabilityId,
productKey,
confidence // low|medium|high
}
observations[] = [
{
observationId,
providerId,
status,
justification?,
introducedVersion?,
fixedVersion?,
evidence?,
collectedAt
}
]
aliases{
primary,
others[]
}
purls[]
cpes[]
conflicts[]? // see VexLinksetConflict
createdAt
updatedAt
```
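For illustration, a minimal C# sketch of how the deterministic `linksetId` could be derived, assuming SHA-256 over the sorted key tuple (the helper name and join format are ours, not the shipped code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class LinksetIdentity
{
    // Hypothetical helper: hash the (tenant, vulnerabilityId, productKey, observationIds)
    // tuple with observation IDs sorted, so identical evidence always yields the same ID.
    public static string Compute(string tenant, string vulnerabilityId, string productKey,
                                 IEnumerable<string> observationIds)
    {
        var parts = new List<string> { tenant, vulnerabilityId, productKey };
        parts.AddRange(observationIds.OrderBy(id => id, StringComparer.Ordinal));
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(string.Join("\n", parts)));
        return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Sorting the observation IDs before hashing is what keeps the identity stable across re-correlation runs.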
#### VexLinksetConflict
```jsonc
conflictId
type // status-mismatch | justification-divergence | version-range-clash | non-joinable-overlap | metadata-gap
field? // optional pointer for UI rendering
statements[] // per-observation values with providerId + status/justification/version data
confidence
detectedAt
```
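As a sketch of how a `status-mismatch` conflict might be detected when a linkset is rebuilt (the record type and payload shape below are assumptions for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

record ObservationRef(string ObservationId, string ProviderId, string Status);

static class ConflictDetector
{
    // Emits one status-mismatch conflict when providers disagree on status;
    // statements are ordered so the conflict payload hashes deterministically.
    public static object? DetectStatusMismatch(IReadOnlyList<ObservationRef> observations)
    {
        if (observations.Select(o => o.Status).Distinct().Count() <= 1)
            return null; // all providers agree; no conflict record emitted

        return new
        {
            type = "status-mismatch",
            statements = observations
                .OrderBy(o => o.ProviderId, StringComparer.Ordinal)
                .Select(o => new { o.ObservationId, o.ProviderId, o.Status })
                .ToArray(),
            detectedAt = DateTimeOffset.UtcNow
        };
    }
}
```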
#### VexConsensus (optional)
```jsonc
consensusId // sha256(vulnerabilityId, productKey, policyRevisionId)
vulnerabilityId
productKey
rollupStatus // derived by Excititor policy adapter (linkset aware)
sources[] // observation references with weight, accepted flag, reason
policyRevisionId
evaluatedAt
consensusDigest
```
Consensus persists only when Excititor policy adapters require pre-computed rollups (e.g., Offline Kit). Policy Engine can also compute consensus on demand from linksets.
### 1.3 Exports & evidence bundles
* **Raw observations** — JSON tree per observation for auditing/offline.
* **Linksets** — grouped evidence for policy/Console/CLI consumption.
* **Consensus (optional)** — if enabled, mirrors existing API contracts.
* **Provider snapshots** — last N days of observations per provider to support diagnostics.
* **Index** — `(productKey, vulnerabilityId) → {status candidates, confidence, observationIds}` for high-speed joins.
All exports remain deterministic and, when configured, attested via DSSE + Rekor v2.
---
@@ -98,73 +181,106 @@ enabled: bool
createdAt, modifiedAt
```
**`vex.raw`** (immutable raw documents)
```
_id: sha256(doc bytes)
providerId
uri
ingestedAt
contentType
sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? }
payload: GridFS pointer (if large)
disposition: kept|replaced|superseded
correlation: { replaces?: sha256, replacedBy?: sha256 }
```
**`vex.observations`**
```
{
_id: "tenant:providerId:upstreamId:revision",
tenant,
providerId,
streamId,
upstream: { upstreamId, documentVersion?, fetchedAt, receivedAt, contentHash, signature },
statements: [
{
vulnerabilityId,
productKey,
status,
justification?,
introducedVersion?,
fixedVersion?,
lastObserved,
locator?,
evidence?
}
],
content: { format, specVersion?, raw },
linkset: { aliases[], purls[], cpes[], references[], reconciledFrom[] },
supersedes?,
createdAt,
attributes?
}
```
* Indexes: `{tenant:1, providerId:1, upstream.upstreamId:1}`, `{tenant:1, statements.vulnerabilityId:1}`, `{tenant:1, linkset.purls:1}`, `{tenant:1, createdAt:-1}`.
**`vex.linksets`**
```
{
_id: "sha256:...",
tenant,
key: { vulnerabilityId, productKey, confidence },
observations: [
{ observationId, providerId, status, justification?, introducedVersion?, fixedVersion?, evidence?, collectedAt }
],
aliases: { primary, others: [] },
purls: [],
cpes: [],
conflicts: [],
createdAt,
updatedAt
}
```
* Indexes: `{tenant:1, key.vulnerabilityId:1, key.productKey:1}`, `{tenant:1, purls:1}`, `{tenant:1, updatedAt:-1}`.
**`vex.statements`** (immutable normalized rows; append-only event log)
```
_id: ObjectId
providerId
vulnId
productKey
status
justification?
introducedVersion?
fixedVersion?
lastObserved
docDigest
provenance { uri, line?, pointer?, signatureState }
evidence[] { key, value, locator }
signals? {
severity? { scheme, score?, label?, vector? }
kev?: bool
epss?: double
}
insertedAt
indices:
- {vulnId:1, productKey:1}
- {providerId:1, insertedAt:-1}
- {docDigest:1}
- {status:1}
- text index (optional) on evidence.value for debugging
```
**`vex.consensus`** (rollups)
```
_id: sha256(canonical(vulnId, productKey, policyRevision))
vulnId
productKey
rollupStatus
sources[]: [
{ providerId, status, justification?, weight, lastObserved, accepted:bool, reason }
]
policyRevisionId
evaluatedAt
signals? {
severity? { scheme, score?, label?, vector? }
kev?: bool
epss?: double
}
consensusDigest // same as _id
indices:
- {vulnId:1, productKey:1}
- {policyRevisionId:1, evaluatedAt:-1}
```
**`vex.events`** (observation/linkset events, optional long retention)
```
{
_id: ObjectId,
tenant,
type: "vex.observation.updated" | "vex.linkset.updated",
key,
delta,
hash,
occurredAt
}
```
* Indexes: `{type:1, occurredAt:-1}`, TTL on `occurredAt` for configurable retention.
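A sketch of the corresponding index bootstrap, assuming the official MongoDB .NET driver (collection and field names follow the schema above; the retention window would come from configuration):

```csharp
using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

static class VexEventIndexes
{
    public static async Task EnsureAsync(IMongoDatabase db, TimeSpan retention)
    {
        var events = db.GetCollection<BsonDocument>("vex.events");
        var keys = Builders<BsonDocument>.IndexKeys;
        await events.Indexes.CreateManyAsync(new[]
        {
            // query path: latest events by type
            new CreateIndexModel<BsonDocument>(keys.Ascending("type").Descending("occurredAt")),
            // TTL index: Mongo purges documents once occurredAt + retention has passed
            new CreateIndexModel<BsonDocument>(
                keys.Ascending("occurredAt"),
                new CreateIndexOptions { ExpireAfter = retention })
        });
    }
}
```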
**`vex.consensus`** (optional rollups)
```
_id: sha256(canonical(vulnerabilityId, productKey, policyRevisionId))
vulnerabilityId
productKey
rollupStatus
sources[] // observation references with weights/reasons
policyRevisionId
evaluatedAt
signals? // optional severity/kev/epss hints
consensusDigest
```
* Indexes: `{vulnerabilityId:1, productKey:1}`, `{policyRevisionId:1, evaluatedAt:-1}`.
**`vex.exports`** (manifest of emitted artifacts)
```
_id
@@ -177,23 +293,15 @@ policyRevisionId
cacheable: bool
```
**`vex.cache`**
```
querySignature -> exportId (for fast reuse)
ttl, hits
```
**`vex.migrations`**
* ordered migrations applied at bootstrap to ensure indexes.
* `20251019-consensus-signals-statements` introduces the statements log indexes and the `policyRevisionId + evaluatedAt` lookup for consensus — rerun consensus writers once to hydrate newly persisted signals.
### 3.2 Indexing strategy
* Hot path queries use exact `(vulnId, productKey)` and time-bounded windows; compound indexes cover both.
* Provider list views sort by `lastObserved` for staleness monitoring.
* `vex.consensus` keyed by `(vulnId, productKey, policyRevision)` for deterministic reuse.
**`vex.cache`** — observation/linkset export cache: `{querySignature, exportId, ttl, hits}`.
**`vex.migrations`** — ordered migrations ensuring new indexes (`20251027-linksets-introduced`, etc.).
### 3.2 Indexing strategy
* Hot path queries rely on `{tenant, key.vulnerabilityId, key.productKey}` covering linkset lookup.
* Observability queries use `{tenant, updatedAt}` to monitor staleness.
* Consensus (if enabled) keyed by `{vulnerabilityId, productKey, policyRevisionId}` for deterministic reuse.
---
@@ -202,30 +310,30 @@ ttl, hits
### 4.1 Connector contract
```csharp
public interface IVexConnector
{
string ProviderId { get; }
Task FetchAsync(VexConnectorContext ctx, CancellationToken ct); // raw docs
Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> VexClaim[]
}
```
* **Fetch** must implement: window scheduling, conditional GET (ETag/If-Modified-Since), rate limiting, retry/backoff.
* **Normalize** parses the format, validates schema, maps product identities deterministically, emits `VexClaim` records with **provenance**.
```csharp
public interface IVexConnector
{
string ProviderId { get; }
Task FetchAsync(VexConnectorContext ctx, CancellationToken ct); // raw docs
Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> ObservationStatements[]
}
```
* **Fetch** must implement: window scheduling, conditional GET (ETag/If-Modified-Since), rate limiting, retry/backoff.
* **Normalize** parses the format, validates schema, maps product identities deterministically, emits observation statements with **provenance** metadata (locator, justification, version ranges).
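To make the fetch contract concrete, here is a hedged sketch of the conditional-GET discipline; the `VexConnectorContext` members used (`SourceUri`, `LastETag`, `HttpClient`, `StoreRawAsync`) are illustrative assumptions, not the actual contract surface:

```csharp
// Sketch only: the context members below are assumed for illustration.
public async Task FetchAsync(VexConnectorContext ctx, CancellationToken ct)
{
    using var request = new HttpRequestMessage(HttpMethod.Get, ctx.SourceUri);
    if (ctx.LastETag is not null)
        request.Headers.TryAddWithoutValidation("If-None-Match", ctx.LastETag);

    using var response = await ctx.HttpClient.SendAsync(request, ct);
    if (response.StatusCode == System.Net.HttpStatusCode.NotModified)
        return; // unchanged since the last window; nothing to ingest

    response.EnsureSuccessStatusCode();
    var bytes = await response.Content.ReadAsByteArrayAsync(ct);

    // Content-addressed write into vex.raw; retry/backoff and rate limiting
    // would wrap this whole method in the real connector host.
    await ctx.StoreRawAsync(bytes, response.Headers.ETag?.Tag, ct);
}
```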
### 4.2 Signature verification (per provider)
* **cosign (keyless or keyful)** for OCI referrers or HTTP-served JSON with Sigstore bundles.
* **PGP** (provider keyrings) for distro/vendor feeds that sign docs.
* **x509** (mutual TLS / provider-pinned certs) where applicable.
* Signature state is stored on **vex.raw.sig** and copied into **provenance.signatureState** on claims.
> Claims from sources failing signature policy are marked `"signatureState.verified=false"` and **policy** can down-weight or ignore them.
* Signature state is stored on **vex.raw.sig** and copied into `statements[].signatureState` so downstream policy can gate by verification result.
> Observation statements from sources failing signature policy are marked `"signatureState.verified=false"` and policy can down-weight or ignore them.
### 4.3 Time discipline
* For each doc, prefer the **provider's document timestamp**; if absent, use fetch time.
* Claims carry `lastObserved` which drives **tie-breaking** within equal weight tiers.
* Statements carry `lastObserved` which drives **tie-breaking** within equal weight tiers.
---
@@ -235,7 +343,7 @@ public interface IVexConnector
* **purl** first; **cpe** second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb).
* Where a provider publishes **platform-level** VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits **evidence** indicating the rule applied.
* If expansion would be speculative, the claim remains **platform-scoped** with `productKey="platform:redhat:rhel:9"` and is flagged **non-joinable**; backend can decide to use platform VEX only when Scanner proves the platform runtime.
* If expansion would be speculative, the statement remains **platform-scoped** with `productKey="platform:redhat:rhel:9"` and is flagged **non-joinable**; backend can decide to use platform VEX only when Scanner proves the platform runtime.
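For example, an rpm-based connector's canonical table would resolve NVRA coordinates to purls along these lines (inline parse shown only for illustration; percent-encoding of components is omitted):

```csharp
static class RpmIdentity
{
    // ("redhat", "openssl", "3.0.7", "27.el9", "x86_64")
    //   -> "pkg:rpm/redhat/openssl@3.0.7-27.el9?arch=x86_64"
    public static string NvraToPurl(string ns, string name, string version,
                                    string release, string arch)
        => $"pkg:rpm/{ns}/{name}@{version}-{release}?arch={arch}";
}
```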
### 5.2 Status + justification mapping
@@ -254,11 +362,11 @@ public interface IVexConnector
## 6) Consensus algorithm
**Goal:** produce a **stable**, explainable `rollupStatus` per `(vulnId, productKey)` given possibly conflicting claims.
**Goal:** produce a **stable**, explainable `rollupStatus` per `(vulnId, productKey)` when consumers opt into Excititor-managed consensus derived from linksets.
### 6.1 Inputs
* Set **S** of `VexClaim` for the key.
* Set **S** of observation statements drawn from the current `VexLinkset` for `(tenant, vulnId, productKey)`.
* **Excititor policy snapshot**:
* **weights** per provider tier and per provider overrides.
@@ -268,19 +376,19 @@ public interface IVexConnector
### 6.2 Steps
1. **Filter invalid** claims by signature policy & justification gates → set `S'`.
2. **Score** each claim:
`score = weight(provider) * freshnessFactor(lastObserved)` where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect).
3. **Aggregate** scores per status: `W(status) = Σ score(claims with that status)`.
1. **Filter invalid** statements by signature policy & justification gates → set `S'`.
2. **Score** each statement:
`score = weight(provider) * freshnessFactor(lastObserved)` where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect). Observations lacking verified signatures receive policy-configured penalties.
3. **Aggregate** scores per status: `W(status) = Σ score(statements with that status)`.
4. **Pick** `rollupStatus = argmax_status W(status)`.
5. **Tie-breakers** (in order):
* Higher **max single** provider score wins (vendor > distro > platform > hub).
* More **recent** lastObserved wins.
* Deterministic lexicographic order of status (`fixed` > `not_affected` > `under_investigation` > `affected`) as the final tie-breaker.
6. **Explain**: mark accepted sources (`accepted=true; reason="weight"`/`"freshness"`), mark rejected sources with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`).
6. **Explain**: mark accepted observations (`accepted=true; reason="weight"`/`"freshness"`/`"confidence"`) and rejected ones with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`, `"low_confidence_linkset"`).
> The algorithm is **pure** given S and policy snapshot; result is reproducible and hashed into `consensusDigest`.
> The algorithm is **pure** given `S` and policy snapshot; result is reproducible and hashed into `consensusDigest`.
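A minimal sketch of steps 2-4 (score, aggregate, argmax) under assumed types; the explain phase and the full step-5 tie-breaker chain are elided, and the ordinal fallback below only guarantees determinism on exact ties:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

record Statement(string ProviderId, string Status, DateTimeOffset LastObserved, bool SignatureVerified);

static class Consensus
{
    public static string Rollup(IEnumerable<Statement> filtered,           // S' after gates
                                Func<string, double> providerWeight,
                                Func<DateTimeOffset, double> freshness,    // maps into [0.8, 1.0]
                                double unverifiedPenalty)                  // policy-configured
    {
        var weights = new Dictionary<string, double>();
        foreach (var s in filtered)
        {
            var score = providerWeight(s.ProviderId) * freshness(s.LastObserved);
            if (!s.SignatureVerified) score *= unverifiedPenalty;
            weights[s.Status] = weights.GetValueOrDefault(s.Status) + score; // W(status)
        }
        return weights.OrderByDescending(kv => kv.Value)                   // argmax W(status)
                      .ThenBy(kv => kv.Key, StringComparer.Ordinal)        // deterministic on ties
                      .First().Key;
    }
}
```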
---
@@ -291,9 +399,13 @@ All endpoints are versioned under `/api/v1/vex`.
### 7.1 Query (online)
```
POST /claims/search
body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
→ { claims[], nextPageToken? }
POST /observations/search
body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
→ { observations[], nextPageToken? }
POST /linksets/search
body: { vulnIds?: string[], productKeys?: string[], confidence?: string[], since?: timestamp, limit?: int, pageToken?: string }
→ { linksets[], nextPageToken? }
POST /consensus/search
body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string }
@@ -301,7 +413,7 @@ POST /consensus/search
POST /excititor/resolve (scope: vex.read)
body: { productKeys?: string[], purls?: string[], vulnerabilityIds: string[], policyRevisionId?: string }
→ { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, sources[], conflicts[], decisions[], signals?, summary?, envelope: { artifact, contentSignature?, attestation?, attestationEnvelope?, attestationSignature? } } ] }
→ { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, observations[], conflicts[], linksetConfidence, consensus?, signals?, envelope? } ] }
```
### 7.2 Exports (cacheable snapshots)
@@ -407,17 +519,18 @@ Run the ingestion endpoint once after applying migration `20251019-consensus-sig
* **Connector allowlists**: outbound fetch constrained to configured domains.
* **Tenant isolation**: per-tenant DB prefixes or separate DBs; per-tenant S3 prefixes; per-tenant policies.
* **AuthN/Z**: Authority-issued OpToks; RBAC roles (`vex.read`, `vex.admin`, `vex.export`).
* **No secrets in logs**; deterministic logging contexts include providerId, docDigest, claim keys.
* **No secrets in logs**; deterministic logging contexts include providerId, docDigest, observationId, and linksetId.
---
## 11) Performance & scale
* **Targets:**
* Normalize 10k VEX claims/minute/core.
* Consensus compute ≤ 50ms for 1k unique `(vuln, product)` pairs in hot cache.
* Export (consensus) 1M rows in ≤ 60s on 8 cores with streaming writer.
* **Targets:**
* Normalize 10k observation statements/minute/core.
* Linkset rebuild ≤ 20ms P95 for 1k unique `(vuln, product)` pairs in hot cache.
* Consensus (when enabled) compute ≤ 50ms for 1k unique `(vuln, product)` pairs.
* Export (observations + linksets) 1M rows in ≤60s on 8 cores with streaming writer.
* **Scaling:**
@@ -465,26 +578,29 @@ Excititor.Worker ships with a background refresh service that re-evaluates stale
## 12) Observability
* **Metrics:**
* `vex.ingest.docs_total{provider}`
* `vex.normalize.claims_total{provider}`
* `vex.signature.failures_total{provider,method}`
* `vex.consensus.conflicts_total{vulnId}`
* `vex.exports.bytes{format}` / `vex.exports.latency_seconds`
* **Tracing:** spans for fetch, verify, parse, map, consensus, export.
* **Dashboards:** provider staleness, top conflicting vulns/components, signature posture, export cache hit-rate.
* **Metrics:**
* `vex.fetch.requests_total{provider}` / `vex.fetch.bytes_total{provider}`
* `vex.fetch.failures_total{provider,reason}` / `vex.signature.failures_total{provider,method}`
* `vex.normalize.statements_total{provider}`
* `vex.observations.write_total{result}`
* `vex.linksets.updated_total{result}` / `vex.linksets.conflicts_total{type}`
* `vex.consensus.rollup_total{status}` (when enabled)
* `vex.exports.bytes_total{format}` / `vex.exports.latency_seconds{format}`
* **Tracing:** spans for fetch, verify, parse, map, observe, linkset, consensus, export.
* **Dashboards:** provider staleness, linkset conflict hot spots, signature posture, export cache hit-rate.
---
## 13) Testing matrix
* **Connectors:** golden raw docs → deterministic claims (fixtures per provider/format).
* **Connectors:** golden raw docs → deterministic observation statements (fixtures per provider/format).
* **Signature policies:** valid/invalid PGP/cosign/x509 samples; ensure rejects are recorded but not accepted.
* **Normalization edge cases:** platform-only claims, free-text justifications, non-purl products.
* **Consensus:** conflict scenarios across tiers; check tie-breakers; justification gates.
* **Performance:** 1M-row export timing; memory ceilings; stream correctness.
* **Determinism:** same inputs + policy → identical `consensusDigest` and export bytes.
* **Normalization edge cases:** platform-scoped statements, free-text justifications, non-purl products.
* **Linksets:** conflict scenarios across tiers; verify confidence scoring + conflict payload stability.
* **Consensus (optional):** ensure tie-breakers honour policy weights/justification gates.
* **Performance:** 1M-row observation/linkset export timing; memory ceilings; stream correctness.
* **Determinism:** same inputs + policy → identical linkset hashes, conflict payloads, optional `consensusDigest`, and export bytes.
* **API contract tests:** pagination, filters, RBAC, rate limits.
---
@@ -493,8 +609,8 @@ Excititor.Worker ships with a background refresh service that re-evaluates stale
* **Backend Policy Engine** (in Scanner.WebService): calls `POST /excititor/resolve` (scope `vex.read`) with batched `(purl, vulnId)` pairs to fetch `rollupStatus + sources`.
* **Concelier**: provides alias graph (CVE↔vendor IDs) and may supply VEX-adjacent metadata (e.g., KEV flag) for policy escalation.
* **UI**: VEX explorer screens use `/claims/search` and `/consensus/search`; show conflicts & provenance.
* **CLI**: `stellaops vex export --consensus --since 7d --out vex.json` for audits.
* **UI**: VEX explorer screens use `/observations/search`, `/linksets/search`, and `/consensus/search`; show conflicts & provenance.
* **CLI**: `stella vex linksets export --since 7d --out vex-linksets.json` (optionally `--include-consensus`) for audits and Offline Kit parity.
---