	component_architecture_excititor.md — Stella Ops Excititor (Sprint 22)
Scope. This document specifies the Excititor service: its purpose, trust model, data structures, observation/linkset pipelines, APIs, plug-in contracts, storage schema, performance budgets, testing matrix, and how it integrates with Concelier, Policy Engine, and evidence surfaces. It is implementation-ready.
0) Mission & role in the platform
Mission. Convert heterogeneous VEX statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into immutable VEX observations, correlate them into linksets that retain provenance/conflicts without precedence, and publish deterministic evidence exports and events that Policy Engine, Console, and CLI use to suppress or explain findings.
Boundaries.
- Excititor does not decide PASS/FAIL. It supplies evidence (statuses + justifications + provenance weights).
- Excititor preserves conflicting observations unchanged; consensus (when enabled) merely annotates how policy might choose, but raw evidence remains exportable.
- VEX consumption is backend‑only: Scanner never applies VEX. The backend’s Policy Engine asks Excititor for status evidence and then decides what to show.
1) Inputs, outputs & canonical domain
1.1 Accepted input formats (ingest)
- OpenVEX JSON documents (attested or raw).
- CSAF VEX 2.x (vendor PSIRTs and distros commonly publish CSAF).
- CycloneDX VEX 1.4+ (standalone VEX or embedded VEX blocks).
- OCI‑attached attestations (VEX statements shipped as OCI referrers) — optional connectors.
All connectors register source metadata: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors.
1.2 Canonical model (observations & linksets)
VexObservation
observationId       // {tenant}:{providerId}:{upstreamId}:{revision}
tenant
providerId          // e.g., redhat, suse, ubuntu, osv
streamId            // connector stream (csaf, openvex, cyclonedx, attestation)
upstream{
    upstreamId,
    documentVersion?,
    fetchedAt,
    receivedAt,
    contentHash,
    signature{present, format?, keyId?, signature?}
}
statements[
  {
    vulnerabilityId,
    productKey,
    status,                    // affected | not_affected | fixed | under_investigation
    justification?,
    introducedVersion?,
    fixedVersion?,
    lastObserved,
    locator?,                  // JSON Pointer/line for provenance
    evidence?[]
  }
]
content{
    format,
    specVersion?,
    raw
}
linkset{
    aliases[],                 // CVE/GHSA/vendor IDs
    purls[],
    cpes[],
    references[{type,url}],
    reconciledFrom[]
}
supersedes?
createdAt
attributes?
VexLinkset
linksetId           // sha256 over sorted (tenant, vulnId, productKey, observationIds)
tenant
key{
    vulnerabilityId,
    productKey,
    confidence          // low|medium|high
}
observations[] = [
  {
    observationId,
    providerId,
    status,
    justification?,
    introducedVersion?,
    fixedVersion?,
    evidence?,
    collectedAt
  }
]
aliases{
    primary,
    others[]
}
purls[]
cpes[]
conflicts[]?        // see VexLinksetConflict
createdAt
updatedAt
VexLinksetConflict
conflictId
type                // status-mismatch | justification-divergence | version-range-clash | non-joinable-overlap | metadata-gap
field?              // optional pointer for UI rendering
statements[]        // per-observation values with providerId + status/justification/version data
confidence
detectedAt
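The conflict taxonomy above can be illustrated with a small detector. This is a hedged Python sketch, not the service's implementation: field names follow the `VexLinksetConflict` schema, while the helper itself and its exact output shape are hypothetical.

```python
# Illustrative conflict detection over a linkset's observation statements.
# Covers two of the conflict types defined above: status-mismatch and
# justification-divergence. Helper and shapes are hypothetical sketches.

def detect_conflicts(observations):
    """Return conflict records for divergent statuses/justifications."""
    conflicts = []
    statuses = {o["status"] for o in observations}
    if len(statuses) > 1:
        conflicts.append({
            "type": "status-mismatch",
            "statements": [
                {"providerId": o["providerId"], "status": o["status"]}
                for o in observations
            ],
        })
    # not_affected claims should agree on a justification
    justifications = {
        o.get("justification") for o in observations
        if o["status"] == "not_affected"
    }
    if len(justifications) > 1:
        conflicts.append({
            "type": "justification-divergence",
            "statements": [
                {"providerId": o["providerId"],
                 "justification": o.get("justification")}
                for o in observations
                if o["status"] == "not_affected"
            ],
        })
    return conflicts
```

A real implementation would also populate `conflictId`, `confidence`, and `detectedAt`, and cover the version-range and metadata-gap types.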
VexConsensus (optional)
consensusId         // sha256(vulnerabilityId, productKey, policyRevisionId)
vulnerabilityId
productKey
rollupStatus        // derived by Excititor policy adapter (linkset aware)
sources[]           // observation references with weight, accepted flag, reason
policyRevisionId
evaluatedAt
consensusDigest
Consensus persists only when Excititor policy adapters require pre-computed rollups (e.g., Offline Kit). Policy Engine can also compute consensus on demand from linksets.
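The identifier conventions above can be sketched as follows. This assumes a `:`-joined tuple for `observationId` and a newline-joined canonical byte layout for the `linksetId` hash; the exact byte layout in the service may differ.

```python
import hashlib

# Sketch of the identity rules: observationId is a composite key,
# linksetId is a sha256 over the sorted key tuple. Byte layout assumed.

def observation_id(tenant, provider_id, upstream_id, revision):
    # {tenant}:{providerId}:{upstreamId}:{revision}
    return f"{tenant}:{provider_id}:{upstream_id}:{revision}"

def linkset_id(tenant, vuln_id, product_key, observation_ids):
    # sha256 over sorted (tenant, vulnId, productKey, observationIds)
    parts = [tenant, vuln_id, product_key, *sorted(observation_ids)]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Sorting the observation IDs before hashing is what makes the linkset ID insensitive to ingestion order, which is a precondition for the determinism guarantees in the export section.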
1.3 Exports & evidence bundles
- Raw observations — JSON tree per observation for auditing/offline.
- Linksets — grouped evidence for policy/Console/CLI consumption.
- Consensus (optional) — if enabled, mirrors existing API contracts.
- Provider snapshots — last N days of observations per provider to support diagnostics.
- Index — `(productKey, vulnerabilityId) → {status candidates, confidence, observationIds}` for high-speed joins.

All exports remain deterministic and, when configured, attested via DSSE + Rekor v2.
2) Identity model — products & joins
2.1 Vuln identity
- Accepts CVE, GHSA, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to `vulnId` with alias sets.
- Alias graph maintained (from Concelier) to map vendor/distro IDs → CVE (primary) and to GHSA where applicable.
2.2 Product identity (productKey)
- Primary: `purl` (Package URL).
- Secondary links: `cpe`, OS package NVRA/EVR, NuGet/Maven/Golang identity, and OS package name when purl unavailable.
- Fallback: `oci:<registry>/<repo>@<digest>` for image‑level VEX.
- Special cases: kernel modules, firmware, platforms → provider‑specific mapping helpers (connector captures provider’s product taxonomy → canonical `productKey`).
Excititor does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native product string and mark the claim as non‑joinable; the backend will ignore it unless a policy explicitly whitelists that provider mapping.
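The fallback chain can be sketched as a deterministic mapping function. This is illustrative only: the table entry, helper name, and return shape are hypothetical, and real connectors use versioned per-provider tables rather than a single dict.

```python
# Hypothetical canonical table: (distro, NVRA) -> purl. Real tables are
# versioned and maintained per provider; this single entry is a stand-in.
NVRA_TO_PURL = {
    ("rhel9", "openssl-3.0.7-1.el9.x86_64"):
        "pkg:rpm/redhat/openssl@3.0.7-1.el9?arch=x86_64",
}

def map_product(provider_product, distro=None):
    """Return (productKey, joinable) without inventing identities."""
    if provider_product.startswith("pkg:"):       # already a purl
        return provider_product, True
    if provider_product.startswith("cpe:"):       # secondary link
        return provider_product, True
    mapped = NVRA_TO_PURL.get((distro, provider_product))
    if mapped:                                    # NVRA -> purl via table
        return mapped, True
    # no deterministic mapping: keep the native string, flag non-joinable
    return provider_product, False
```

The key property is the last branch: an unmappable product is never guessed into a purl, only carried forward as a non-joinable native string for policy to whitelist explicitly.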
3) Storage schema (MongoDB)
Database: excititor
3.1 Collections
vex.providers
_id: providerId
name, homepage, contact
trustTier: enum {vendor, distro, platform, hub, attestation}
signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] }
fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays }
enabled: bool
createdAt, modifiedAt
vex.raw (immutable raw documents)
_id: sha256(doc bytes)
providerId
uri
ingestedAt
contentType
sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? }
payload: GridFS pointer (if large)
disposition: kept|replaced|superseded
correlation: { replaces?: sha256, replacedBy?: sha256 }
vex.observations
{
  _id: "tenant:providerId:upstreamId:revision",
  tenant,
  providerId,
  streamId,
  upstream: { upstreamId, documentVersion?, fetchedAt, receivedAt, contentHash, signature },
  statements: [
    {
      vulnerabilityId,
      productKey,
      status,
      justification?,
      introducedVersion?,
      fixedVersion?,
      lastObserved,
      locator?,
      evidence?
    }
  ],
  content: { format, specVersion?, raw },
  linkset: { aliases[], purls[], cpes[], references[], reconciledFrom[] },
  supersedes?,
  createdAt,
  attributes?
}
- Indexes: `{tenant:1, providerId:1, upstream.upstreamId:1}`, `{tenant:1, statements.vulnerabilityId:1}`, `{tenant:1, linkset.purls:1}`, `{tenant:1, createdAt:-1}`.
vex.linksets
{
  _id: "sha256:...",
  tenant,
  key: { vulnerabilityId, productKey, confidence },
  observations: [
    { observationId, providerId, status, justification?, introducedVersion?, fixedVersion?, evidence?, collectedAt }
  ],
  aliases: { primary, others: [] },
  purls: [],
  cpes: [],
  conflicts: [],
  createdAt,
  updatedAt
}
- Indexes: `{tenant:1, key.vulnerabilityId:1, key.productKey:1}`, `{tenant:1, purls:1}`, `{tenant:1, updatedAt:-1}`.
vex.events (observation/linkset events, optional long retention)
{
  _id: ObjectId,
  tenant,
  type: "vex.observation.updated" | "vex.linkset.updated",
  key,
  delta,
  hash,
  occurredAt
}
- Indexes: `{type:1, occurredAt:-1}`; TTL on `occurredAt` for configurable retention.
vex.consensus (optional rollups)
_id: sha256(canonical(vulnerabilityId, productKey, policyRevisionId))
vulnerabilityId
productKey
rollupStatus
sources[]      // observation references with weights/reasons
policyRevisionId
evaluatedAt
signals?       // optional severity/kev/epss hints
consensusDigest
- Indexes: `{vulnerabilityId:1, productKey:1}`, `{policyRevisionId:1, evaluatedAt:-1}`.
vex.exports (manifest of emitted artifacts)
_id
querySignature
format: raw|consensus|index
artifactSha256
rekor { uuid, index, url }?
createdAt
policyRevisionId
cacheable: bool
vex.cache — observation/linkset export cache: `{querySignature, exportId, ttl, hits}`.
vex.migrations — ordered migrations ensuring new indexes (`20251027-linksets-introduced`, etc.).
3.2 Indexing strategy
- Hot path queries rely on `{tenant, key.vulnerabilityId, key.productKey}` covering linkset lookup.
- Observability queries use `{tenant, updatedAt}` to monitor staleness.
- Consensus (if enabled) keyed by `{vulnerabilityId, productKey, policyRevisionId}` for deterministic reuse.
4) Ingestion pipeline
4.1 Connector contract
public interface IVexConnector
{
    string ProviderId { get; }
    Task FetchAsync(VexConnectorContext ctx, CancellationToken ct);   // raw docs
    Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> ObservationStatements[]
}
- Fetch must implement: window scheduling, conditional GET (ETag/If‑Modified‑Since), rate limiting, retry/backoff.
- Normalize parses the format, validates schema, maps product identities deterministically, emits observation statements with provenance metadata (locator, justification, version ranges).
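Two of the Fetch obligations can be sketched concretely. The base delay, growth factor, and cap below are illustrative defaults, not values taken from the service's configuration.

```python
# Sketch of fetch-side plumbing a connector might use: a deterministic
# capped exponential backoff schedule, and conditional-GET headers so
# unchanged documents come back as 304 instead of full bodies.

def backoff_schedule(attempts, base=1.0, factor=2.0, cap=60.0):
    """Delays in seconds for successive retries, capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(attempts)]

def conditional_headers(etag=None, last_modified=None):
    """Headers for a conditional GET against a provider feed."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers
```

Production connectors would typically add jitter to the schedule; it is omitted here to keep the sketch deterministic.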
 
4.2 Signature verification (per provider)
- cosign (keyless or keyful) for OCI referrers or HTTP‑served JSON with Sigstore bundles.
- PGP (provider keyrings) for distro/vendor feeds that sign docs.
- x509 (mutual TLS / provider‑pinned certs) where applicable.
- Signature state is stored on `vex.raw.sig` and copied into `statements[].signatureState` so downstream policy can gate by verification result.

Observation statements from sources failing signature policy are marked `signatureState.verified=false`, and policy can down-weight or ignore them.
4.3 Time discipline
- For each doc, prefer the provider’s document timestamp; if absent, use fetch time.
- Statements carry `lastObserved`, which drives tie-breaking within equal weight tiers.
5) Normalization: product & status semantics
5.1 Product mapping
- purl first; cpe second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb).
- Where a provider publishes platform‑level VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits evidence indicating the rule applied.
- If expansion would be speculative, the statement remains platform-scoped with `productKey="platform:redhat:rhel:9"` and is flagged non-joinable; the backend can decide to use platform VEX only when Scanner proves the platform runtime.
5.2 Status + justification mapping
- Canonical status: `affected | not_affected | fixed | under_investigation`.
- Justifications normalized to a controlled vocabulary (CISA‑aligned), e.g.: `component_not_present`, `vulnerable_code_not_in_execute_path`, `vulnerable_configuration_unused`, `inline_mitigation_applied`, `fix_available` (with `fixedVersion`), `under_investigation`.
- Providers with free‑text justifications are mapped by deterministic tables; raw text preserved as `evidence`.
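A deterministic justification table might look like the sketch below. The table entries and helper name are hypothetical examples, not the service's actual mapping data.

```python
# Hypothetical deterministic mapping from provider free text to the
# controlled vocabulary above; the raw text is always kept as evidence.
JUSTIFICATION_TABLE = {
    "the vulnerable code is not shipped": "component_not_present",
    "code not reachable at runtime": "vulnerable_code_not_in_execute_path",
    "mitigation in place": "inline_mitigation_applied",
}

def normalize_justification(free_text):
    """Map free text to a canonical justification; preserve the raw text."""
    key = free_text.strip().lower()
    canonical = JUSTIFICATION_TABLE.get(key)  # None => unmapped; policy decides
    return {
        "justification": canonical,
        "evidence": [{"rawJustification": free_text}],
    }
```

Because the lookup is an exact (case-folded) match against a fixed table, the same input always yields the same output, which keeps normalization replayable.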
6) Consensus algorithm
Goal: produce a stable, explainable rollupStatus per (vulnId, productKey) when consumers opt into Excititor-managed consensus derived from linksets.
6.1 Inputs
- Set `S` of observation statements drawn from the current `VexLinkset` for `(tenant, vulnId, productKey)`.
- Excititor policy snapshot:
  - weights per provider tier and per‑provider overrides.
  - justification gates (e.g., require a justification for `not_affected` to be acceptable).
  - minEvidence rules (e.g., `not_affected` must come from ≥1 vendor or 2 distros).
  - signature requirements (e.g., require a verified signature for `fixed` to be considered).
6.2 Steps
- Filter invalid statements by signature policy & justification gates → set `S'`.
- Score each statement: `score = weight(provider) * freshnessFactor(lastObserved)`, where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect). Observations lacking verified signatures receive policy-configured penalties.
- Aggregate scores per status: `W(status) = Σ score(statements with that status)`.
- Pick `rollupStatus = argmax_status W(status)`.
- Tie‑breakers (in order):
  - Higher max single provider score wins (vendor > distro > platform > hub).
  - More recent `lastObserved` wins.
  - Deterministic lexicographic order of status (`fixed` > `not_affected` > `under_investigation` > `affected`) as final tiebreaker.
- Explain: mark accepted observations (`accepted=true`; `reason="weight"/"freshness"/"confidence"`) and rejected ones with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`, `"low_confidence_linkset"`).

The algorithm is pure given `S` and the policy snapshot; the result is reproducible and hashed into `consensusDigest`.
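The steps above can be condensed into a small pure function. This is a hedged Python sketch, not the shipped C# implementation: the tier weights mirror the configuration in section 9, but the linear freshness decay, the 0.5 unverified-signature penalty, and the simplified gate handling are assumptions for illustration.

```python
# Pure sketch of the rollup: gate, score, aggregate, tie-break.
# Weights follow section 9; decay and penalty values are assumed.
TIER_WEIGHTS = {"vendor": 1.0, "distro": 0.9, "platform": 0.7, "hub": 0.5}
STATUS_ORDER = ["fixed", "not_affected", "under_investigation", "affected"]

def freshness_factor(age_days, max_age_days=30):
    """Linear staleness decay clamped to [0.8, 1.0]."""
    return max(0.8, 1.0 - 0.2 * min(age_days, max_age_days) / max_age_days)

def rollup(statements):
    """statements: [{tier, status, age_days, verified?, justification?}]"""
    totals = {}
    for s in statements:
        # justification gate for not_affected claims
        if s["status"] == "not_affected" and not s.get("justification"):
            continue
        score = TIER_WEIGHTS[s["tier"]] * freshness_factor(s["age_days"])
        if not s.get("verified", True):
            score *= 0.5  # policy-configured penalty (assumed value)
        totals[s["status"]] = totals.get(s["status"], 0.0) + score
    if not totals:
        return "under_investigation"   # nothing survived the gates
    best = max(totals.values())
    winners = [st for st, w in totals.items() if w == best]
    # final deterministic tiebreaker per the status order above
    return min(winners, key=STATUS_ORDER.index)
```

Since the function only reads its arguments and fixed tables, hashing its inputs and the policy snapshot is sufficient to reproduce (and audit) any `consensusDigest`.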
7) Query & export APIs
All endpoints are versioned under /api/v1/vex.
7.1 Query (online)
POST /observations/search
  body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
  → { observations[], nextPageToken? }
POST /linksets/search
  body: { vulnIds?: string[], productKeys?: string[], confidence?: string[], since?: timestamp, limit?: int, pageToken?: string }
  → { linksets[], nextPageToken? }
POST /consensus/search
  body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string }
  → { entries[], nextPageToken? }
POST /excititor/resolve (scope: vex.read)
  body: { productKeys?: string[], purls?: string[], vulnerabilityIds: string[], policyRevisionId?: string }
  → { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, observations[], conflicts[], linksetConfidence, consensus?, signals?, envelope? } ] }
7.2 Exports (cacheable snapshots)
POST /exports
  body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool }
  → { exportId, artifactSha256, rekor? }
GET  /exports/{exportId}        → bytes (application/json or binary index)
GET  /exports/{exportId}/meta   → { signature, policyRevisionId, createdAt, artifactSha256, rekor? }
7.3 Provider operations
GET  /providers                  → provider list & signature policy
POST /providers/{id}/refresh     → trigger fetch/normalize window
GET  /providers/{id}/status      → last fetch, doc counts, signature stats
Auth: service‑to‑service via Authority tokens; operator operations via UI/CLI with RBAC.
8) Attestation integration
- Exports can be DSSE‑signed via Signer and logged to Rekor v2 via Attestor (optional but recommended for regulated pipelines).
- `vex.exports.rekor` stores `{uuid, index, url}` when present.
- Predicate type: `https://stella-ops.org/attestations/vex-export/1` with fields `querySignature`, `policyRevisionId`, `artifactSha256`, `createdAt`.
9) Configuration (YAML)
excititor:
  mongo: { uri: "mongodb://mongo/excititor" }
  s3:
    endpoint: http://minio:9000
    bucket: stellaops
  policy:
    weights:
      vendor: 1.0
      distro: 0.9
      platform: 0.7
      hub: 0.5
      attestation: 0.6
      ceiling: 1.25
    scoring:
      alpha: 0.25
      beta: 0.5
    providerOverrides:
      redhat: 1.0
      suse: 0.95
    requireJustificationForNotAffected: true
    signatureRequiredForFixed: true
    minEvidence:
      not_affected:
        vendorOrTwoDistros: true
  connectors:
    - providerId: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] }
      windowDays: 7
    - providerId: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] }
    - providerId: ubuntu
      kind: openvex
      baseUrl: https://…/vex/
      signaturePolicy: { type: none }
    - providerId: vendorX
      kind: cyclonedx-vex
      ociRef: ghcr.io/vendorx/vex@sha256:…
      signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] }
9.1 WebService endpoints
With storage configured, the WebService exposes the following ingress and diagnostic APIs:
- `GET /excititor/status` – returns the active storage configuration and registered artifact stores.
- `GET /excititor/health` – simple liveness probe.
- `POST /excititor/statements` – accepts normalized VEX statements and persists them via `IVexClaimStore`; use this for migrations/backfills.
- `GET /excititor/statements/{vulnId}/{productKey}?since=` – returns the immutable statement log for a vulnerability/product pair.
- `POST /excititor/resolve` – requires the `vex.read` scope; accepts up to 256 `(vulnId, productKey)` pairs via `productKeys` or `purls` and returns deterministic consensus results, decision telemetry, and a signed envelope (artifact digest, optional signer signature, optional attestation metadata + DSSE envelope). Returns 409 Conflict when the requested `policyRevisionId` mismatches the active snapshot.
Run the ingestion endpoint once after applying migration `20251019-consensus-signals-statements` to repopulate historical statements with the new severity/KEV/EPSS signal fields.
- `weights.ceiling` raises the deterministic clamp applied to provider tiers/overrides (range 1.0‒5.0). Values outside the range are clamped with warnings so operators can spot typos.
- `scoring.alpha`/`scoring.beta` configure KEV/EPSS boosts for the Phase 1 → Phase 2 scoring pipeline. Defaults (0.25, 0.5) preserve prior behaviour; negative or excessively large values fall back with diagnostics.
10) Security model
- Input signature verification enforced per provider policy (PGP, cosign, x509).
- Connector allowlists: outbound fetch constrained to configured domains.
- Tenant isolation: per‑tenant DB prefixes or separate DBs; per‑tenant S3 prefixes; per‑tenant policies.
- AuthN/Z: Authority‑issued OpToks; RBAC roles (`vex.read`, `vex.admin`, `vex.export`).
- No secrets in logs; deterministic logging contexts include providerId, docDigest, observationId, and linksetId.
11) Performance & scale
- Targets:
  - Normalize 10k observation statements/minute/core.
  - Linkset rebuild ≤ 20 ms P95 for 1k unique `(vuln, product)` pairs in hot cache.
  - Consensus (when enabled) compute ≤ 50 ms for 1k unique `(vuln, product)` pairs.
  - Export (observations + linksets) 1M rows in ≤ 60 s on 8 cores with streaming writer.
- Scaling:
  - WebService handles control APIs; Worker background services (same image) execute fetch/normalize in parallel with rate‑limits; Mongo writes batched; upserts by natural keys.
  - Exports stream straight to S3 (MinIO) with rolling buffers.
- Caching: `vex.cache` maps query signatures → export; TTL to avoid stampedes; optimistic reuse unless `force`.
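A query signature suitable for that cache can be sketched as a hash over canonical JSON of the filters. The exact signature scheme the service uses is not specified here, so treat this as an assumed construction.

```python
import hashlib
import json

# Sketch of a deterministic cache key: canonical JSON of the export
# filters hashed with sha256, so equivalent queries share one export.
def query_signature(filters):
    canonical = json.dumps(filters, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Key ordering in the input dict does not affect the signature, which is exactly the property that lets `vex.cache` deduplicate stampeding identical requests.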
 
11.1 Worker TTL refresh controls
Excititor.Worker ships with a background refresh service that re-evaluates stale consensus rows and applies stability dampers before publishing status flips. Operators can tune its behaviour through the following configuration (shown in appsettings.json syntax):
{
  "Excititor": {
    "Worker": {
      "Refresh": {
        "Enabled": true,
        "ConsensusTtl": "02:00:00",       // refresh consensus older than 2 hours
        "ScanInterval": "00:10:00",       // sweep cadence
        "ScanBatchSize": 250,              // max documents examined per sweep
        "Damper": {
          "Minimum": "1.00:00:00",       // lower bound before status flip publishes
          "Maximum": "2.00:00:00",       // upper bound guardrail
          "DefaultDuration": "1.12:00:00",
          "Rules": [
            { "MinWeight": 0.90, "Duration": "1.00:00:00" },
            { "MinWeight": 0.75, "Duration": "1.06:00:00" },
            { "MinWeight": 0.50, "Duration": "1.12:00:00" }
          ]
        }
      }
    }
  }
}
- `ConsensusTtl` governs when the worker issues a fresh resolve for cached consensus data.
- `Damper` lengths are clamped between `Minimum`/`Maximum`; the duration is bypassed when component fingerprints (`VexProduct.ComponentIdentifiers`) change.
- The same keys are available through environment variables (e.g., `Excititor__Worker__Refresh__ConsensusTtl=02:00:00`).
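The rule selection implied by the config above can be sketched as follows, assuming the highest-`MinWeight` rule at or below the consensus weight wins and the result is clamped to `Minimum`/`Maximum`. Durations are expressed in hours here for brevity; the config uses `d.hh:mm:ss` timespans.

```python
# Sketch of damper selection; values mirror the example config above
# (1.00:00:00 = 24 h, 1.06:00:00 = 30 h, 1.12:00:00 = 36 h).
RULES = [(0.90, 24), (0.75, 30), (0.50, 36)]   # (MinWeight, hours), descending
MINIMUM, MAXIMUM, DEFAULT = 24, 48, 36

def damper_hours(weight, fingerprint_changed=False):
    """Hours to hold a status flip before publishing."""
    if fingerprint_changed:
        return 0  # component fingerprints changed: bypass the damper
    for min_weight, hours in RULES:
        if weight >= min_weight:
            return max(MINIMUM, min(MAXIMUM, hours))
    return max(MINIMUM, min(MAXIMUM, DEFAULT))
```

The intuition: the heavier the evidence behind the new status, the shorter the hold before the flip is published.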
12) Observability
- Metrics:
  - `vex.fetch.requests_total{provider}` / `vex.fetch.bytes_total{provider}`
  - `vex.fetch.failures_total{provider,reason}` / `vex.signature.failures_total{provider,method}`
  - `vex.normalize.statements_total{provider}`
  - `vex.observations.write_total{result}`
  - `vex.linksets.updated_total{result}` / `vex.linksets.conflicts_total{type}`
  - `vex.consensus.rollup_total{status}` (when enabled)
  - `vex.exports.bytes_total{format}` / `vex.exports.latency_seconds{format}`
- Tracing: spans for fetch, verify, parse, map, observe, linkset, consensus, export.
- Dashboards: provider staleness, linkset conflict hot spots, signature posture, export cache hit-rate.
13) Testing matrix
- Connectors: golden raw docs → deterministic observation statements (fixtures per provider/format).
- Signature policies: valid/invalid PGP/cosign/x509 samples; ensure rejects are recorded but not accepted.
- Normalization edge cases: platform-scoped statements, free-text justifications, non-purl products.
- Linksets: conflict scenarios across tiers; verify confidence scoring + conflict payload stability.
- Consensus (optional): ensure tie-breakers honour policy weights/justification gates.
- Performance: 1M-row observation/linkset export timing; memory ceilings; stream correctness.
- Determinism: same inputs + policy → identical linkset hashes, conflict payloads, optional `consensusDigest`, and export bytes.
- API contract tests: pagination, filters, RBAC, rate limits.
14) Integration points
- Backend Policy Engine (in Scanner.WebService): calls `POST /excititor/resolve` (scope `vex.read`) with batched `(purl, vulnId)` pairs to fetch `rollupStatus + sources`.
- Concelier: provides the alias graph (CVE↔vendor IDs) and may supply VEX‑adjacent metadata (e.g., KEV flag) for policy escalation.
- UI: VEX explorer screens use `/observations/search`, `/linksets/search`, and `/consensus/search`; show conflicts & provenance.
- CLI: `stella vex linksets export --since 7d --out vex-linksets.json` (optionally `--include-consensus`) for audits and Offline Kit parity.
15) Failure modes & fallback
- Provider unreachable: stale thresholds trigger warnings; policy can down‑weight stale providers automatically (freshness factor).
- Signature outage: continue to ingest but mark `signatureState.verified=false`; consensus will likely exclude or down‑weight per policy.
- Schema drift: unknown fields are preserved as `evidence`; normalization rejects only on invalid identity or status.
16) Rollout plan (incremental)
- MVP: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + `/excititor/resolve`.
- Signature policies: PGP for distros; cosign for OCI.
- Exports + optional attestation.
- CycloneDX VEX connectors; platform claim expansion tables; UI explorer.
- Scale hardening: export indexes; conflict analytics.
 
17) Operational runbooks
- Statement backfill — see `docs/dev/EXCITITOR_STATEMENT_BACKFILL.md` for the CLI workflow, required permissions, observability guidance, and rollback steps.
18) Appendix — canonical JSON (stable ordering)
All exports and consensus entries are serialized via VexCanonicalJsonSerializer:
- UTF‑8 without BOM;
- keys sorted (ASCII);
- arrays sorted by `(providerId, vulnId, productKey, lastObserved)` unless semantic order mandated;
- timestamps in `YYYY‑MM‑DDThh:mm:ssZ`;
- no insignificant whitespace.
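A minimal stand-in for these rules, in Python rather than the service's `VexCanonicalJsonSerializer`, covering sorted ASCII keys, no insignificant whitespace, and UTF-8 without BOM (array sorting is the caller's responsibility in this sketch):

```python
import json

# Sketch of the canonical serialization rules listed above; the real
# implementation is VexCanonicalJsonSerializer in the service.
def canonical_json(obj):
    return json.dumps(
        obj,
        sort_keys=True,          # keys sorted (ASCII)
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,
    ).encode("utf-8")            # UTF-8, no BOM
```

Byte-for-byte stable output is what makes export hashes and `consensusDigest` values comparable across runs and machines.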