# component_architecture_excititor.md — **Stella Ops Excititor** (2025Q4)

> **Scope.** This document specifies the **Excititor** service: its purpose, trust model, data structures, APIs, plug‑in contracts, storage schema, normalization/consensus algorithms, performance budgets, testing matrix, and how it integrates with Scanner, Policy, Concelier, and the attestation chain. It is implementation‑ready.

---

## 0) Mission & role in the platform

**Mission.** Convert heterogeneous **VEX** statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into **canonical, queryable claims**; compute **deterministic consensus** per *(vuln, product)*; preserve **conflicts with provenance**; publish **stable, attestable exports** that the backend uses to suppress non‑exploitable findings, prioritize remaining risk, and explain decisions.

**Boundaries.**

* Excititor **does not** decide PASS/FAIL. It supplies **evidence** (statuses + justifications + provenance weights).
* Excititor preserves **conflicting claims** unchanged; consensus encodes how we would pick, but the raw set is always exportable.
* VEX consumption is **backend‑only**: Scanner never applies VEX. The backend’s **Policy Engine** asks Excititor for status evidence and then decides what to show.

---

## 1) Inputs, outputs & canonical domain

### 1.1 Accepted input formats (ingest)

* **OpenVEX** JSON documents (attested or raw).
* **CSAF VEX** 2.x (vendor PSIRTs and distros commonly publish CSAF).
* **CycloneDX VEX** 1.4+ (standalone VEX or embedded VEX blocks).
* **OCI‑attached attestations** (VEX statements shipped as OCI referrers) — optional connectors.

All connectors register **source metadata**: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors.

### 1.2 Canonical model (normalized)

Every incoming statement becomes a set of **VexClaim** records:

```
VexClaim
- providerId           // 'redhat', 'suse', 'ubuntu', 'github', 'vendorX'
- vulnId               // 'CVE-2025-12345', 'GHSA-xxxx', canonicalized
- productKey           // canonical product identity (see §2.2)
- status               // affected | not_affected | fixed | under_investigation
- justification?       // for 'not_affected'/'affected' where provided
- introducedVersion?   // semantics per provider (range or exact)
- fixedVersion?        // where provided (range or exact)
- lastObserved         // timestamp from source or fetch time
- provenance           // doc digest, signature status, fetch URI, line/offset anchors
- evidence[]           // raw source snippets for explainability
- supersedes?          // optional cross-doc chain (docDigest → docDigest)
```

### 1.3 Exports (consumption)

* **VexConsensus** per `(vulnId, productKey)` with:
  * `rollupStatus` (after policy weights/justification gates),
  * `sources[]` (winning + losing claims with weights & reasons),
  * `policyRevisionId` (identifier of the Excititor policy used),
  * `consensusDigest` (stable SHA‑256 over canonical JSON).
* **Raw claims** export for auditing (unchanged, with provenance).
* **Provider snapshots** (per source, last N days) for operator debugging.
* **Index** optimized for backend joins: `(productKey, vulnId) → (status, confidence, sourceSet)`.

All exports are **deterministic**, and (optionally) **attested** via DSSE and logged to Rekor v2.

---

## 2) Identity model — products & joins

### 2.1 Vuln identity

* Accepts **CVE**, **GHSA**, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to `vulnId` with alias sets.
* **Alias graph** maintained (from Concelier) to map vendor/distro IDs → CVE (primary) and to **GHSA** where applicable.

### 2.2 Product identity (`productKey`)

* **Primary:** `purl` (Package URL).
* **Secondary links:** `cpe`, **OS package NVRA/EVR**, NuGet/Maven/Golang identity, and **OS package name** when purl unavailable.
* **Fallback:** `oci:/@` for image‑level VEX.
* **Special cases:** kernel modules, firmware, platforms → provider‑specific mapping helpers (connector captures provider’s product taxonomy → canonical `productKey`).

> Excititor does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native **product string** and mark the claim as **non‑joinable**; the backend will ignore it unless a policy explicitly whitelists that provider mapping.

---

## 3) Storage schema (MongoDB)

Database: `excititor`

### 3.1 Collections

**`vex.providers`**

```
_id: providerId
name, homepage, contact
trustTier: enum {vendor, distro, platform, hub, attestation}
signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] }
fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays }
enabled: bool
createdAt, modifiedAt
```

**`vex.raw`** (immutable raw documents)

```
_id: sha256(doc bytes)
providerId
uri
ingestedAt
contentType
sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? }
payload: GridFS pointer (if large)
disposition: kept|replaced|superseded
correlation: { replaces?: sha256, replacedBy?: sha256 }
```

**`vex.statements`** (immutable normalized rows; append-only event log)

```
_id: ObjectId
providerId
vulnId
productKey
status
justification?
introducedVersion?
fixedVersion?
lastObserved
docDigest
provenance { uri, line?, pointer?, signatureState }
evidence[] { key, value, locator }
signals? {
  severity? { scheme, score?, label?, vector? }
  kev?: bool
  epss?: double
}
insertedAt
indices:
  - {vulnId:1, productKey:1}
  - {providerId:1, insertedAt:-1}
  - {docDigest:1}
  - {status:1}
  - text index (optional) on evidence.value for debugging
```

**`vex.consensus`** (rollups)

```
_id: sha256(canonical(vulnId, productKey, policyRevision))
vulnId
productKey
rollupStatus
sources[]: [
  { providerId, status, justification?, weight, lastObserved, accepted:bool, reason }
]
policyRevisionId
evaluatedAt
signals? {
  severity? { scheme, score?, label?, vector? }
  kev?: bool
  epss?: double
}
consensusDigest  // same as _id
indices:
  - {vulnId:1, productKey:1}
  - {policyRevisionId:1, evaluatedAt:-1}
```

**`vex.exports`** (manifest of emitted artifacts)

```
_id
querySignature
format: raw|consensus|index
artifactSha256
rekor { uuid, index, url }?
createdAt
policyRevisionId
cacheable: bool
```

**`vex.cache`**

```
querySignature -> exportId (for fast reuse)
ttl, hits
```

**`vex.migrations`**

* ordered migrations applied at bootstrap to ensure indexes.
* `20251019-consensus-signals-statements` introduces the statements log indexes and the `policyRevisionId + evaluatedAt` lookup for consensus — rerun consensus writers once to hydrate newly persisted signals.

### 3.2 Indexing strategy

* Hot path queries use exact `(vulnId, productKey)` and time‑bounded windows; compound indexes cover both.
* Providers list view by `lastObserved` for monitoring staleness.
* `vex.consensus` keyed by `(vulnId, productKey, policyRevision)` for deterministic reuse.

---

## 4) Ingestion pipeline

### 4.1 Connector contract

```csharp
using System.Threading;
using System.Threading.Tasks;

public interface IVexConnector
{
    // Stable provider identifier, e.g. "redhat".
    string ProviderId { get; }

    Task FetchAsync(VexConnectorContext ctx, CancellationToken ct);     // raw docs
    Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> VexClaim[]
}
```

* **Fetch** must implement: window scheduling, conditional GET (ETag/If‑Modified‑Since), rate limiting, retry/backoff.
* **Normalize** parses the format, validates schema, maps product identities deterministically, emits `VexClaim` records with **provenance**.

### 4.2 Signature verification (per provider)

* **cosign (keyless or keyful)** for OCI referrers or HTTP‑served JSON with Sigstore bundles.
* **PGP** (provider keyrings) for distro/vendor feeds that sign docs.
* **x509** (mutual TLS / provider‑pinned certs) where applicable.
* Signature state is stored on **vex.raw.sig** and copied into **provenance.signatureState** on claims.

> Claims from sources failing signature policy are marked `"signatureState.verified=false"` and **policy** can down‑weight or ignore them.

### 4.3 Time discipline

* For each doc, prefer **provider’s document timestamp**; if absent, use fetch time.
* Claims carry `lastObserved` which drives **tie‑breaking** within equal weight tiers.

---

## 5) Normalization: product & status semantics

### 5.1 Product mapping

* **purl** first; **cpe** second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb).
* Where a provider publishes **platform‑level** VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits **evidence** indicating the rule applied.
* If expansion would be speculative, the claim remains **platform‑scoped** with `productKey="platform:redhat:rhel:9"` and is flagged **non‑joinable**; backend can decide to use platform VEX only when Scanner proves the platform runtime.

### 5.2 Status + justification mapping

* Canonical **status**: `affected | not_affected | fixed | under_investigation`.
* **Justifications** normalized to a controlled vocabulary (CISA‑aligned), e.g.:
  * `component_not_present`
  * `vulnerable_code_not_in_execute_path`
  * `vulnerable_configuration_unused`
  * `inline_mitigation_applied`
  * `fix_available` (with `fixedVersion`)
  * `under_investigation`
* Providers with free‑text justifications are mapped by deterministic tables; raw text preserved as `evidence`.

---

## 6) Consensus algorithm

**Goal:** produce a **stable**, explainable `rollupStatus` per `(vulnId, productKey)` given possibly conflicting claims.

### 6.1 Inputs

* Set **S** of `VexClaim` for the key.
* **Excititor policy snapshot**:
  * **weights** per provider tier and per provider overrides.
  * **justification gates** (e.g., require justification for `not_affected` to be acceptable).
  * **minEvidence** rules (e.g., `not_affected` must come from ≥1 vendor or 2 distros).
  * **signature requirements** (e.g., require verified signature for ‘fixed’ to be considered).

### 6.2 Steps

1. **Filter invalid** claims by signature policy & justification gates → set `S'`.
2. **Score** each claim: `score = weight(provider) * freshnessFactor(lastObserved)` where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect).
3. **Aggregate** scores per status: `W(status) = Σ score(claims with that status)`.
4. **Pick** `rollupStatus = argmax_status W(status)`.
5. **Tie‑breakers** (in order):
   * Higher **max single** provider score wins (vendor > distro > platform > hub).
   * More **recent** lastObserved wins.
   * Fixed status precedence (`fixed` > `not_affected` > `under_investigation` > `affected`) as the final, deterministic tie‑breaker.
6. **Explain**: mark accepted sources (`accepted=true; reason="weight"`/`"freshness"`), mark rejected sources with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`).

> The algorithm is **pure** given S and policy snapshot; result is reproducible and hashed into `consensusDigest`.

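
The steps in §6.2 can be sketched as a small pure function. The Python below is a minimal illustration under stated assumptions, not the service's implementation: the `Claim` fields, the linear 30‑day freshness decay, and the two hard‑coded gates are hypothetical choices made for the example. Tier weights are copied from the sample policy in §9; `minEvidence` rules, per‑provider overrides, and the per‑source `accepted`/`reason` bookkeeping for losing claims are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Tier weights mirror the sample policy in §9.
TIER_WEIGHTS = {"vendor": 1.0, "distro": 0.9, "platform": 0.7, "hub": 0.5, "attestation": 0.6}
# Final tie-breaker: earlier entries win.
STATUS_PRECEDENCE = ["fixed", "not_affected", "under_investigation", "affected"]
# Assumption: freshness decays linearly over 30 days, then stays at the 0.8 floor.
DECAY_HORIZON = timedelta(days=30)


@dataclass(frozen=True)
class Claim:  # hypothetical, trimmed-down stand-in for VexClaim
    provider_id: str
    tier: str
    status: str
    last_observed: datetime
    justification: Optional[str] = None
    signature_verified: bool = True


def rejection_reason(claim):
    """Step 1: return a rejection reason, or None when the claim passes the gates."""
    if claim.status == "not_affected" and not claim.justification:
        return "insufficient_justification"
    if claim.status == "fixed" and not claim.signature_verified:
        return "signature_unverified"
    return None


def freshness_factor(last_observed, now):
    """Step 2 helper: value in [0.8, 1.0], lower for older claims."""
    age = max(now - last_observed, timedelta(0))
    return 1.0 - 0.2 * min(age / DECAY_HORIZON, 1.0)


def rollup(claims, now):
    """Return (rollupStatus or None, [(providerId, rejectionReason), ...])."""
    scored, rejected = [], []
    for claim in claims:
        reason = rejection_reason(claim)
        if reason:
            rejected.append((claim.provider_id, reason))
        else:
            score = TIER_WEIGHTS[claim.tier] * freshness_factor(claim.last_observed, now)
            scored.append((claim, score))
    if not scored:
        return None, rejected

    totals, best, newest = {}, {}, {}
    for claim, score in scored:  # step 3: aggregate per status
        status = claim.status
        totals[status] = totals.get(status, 0.0) + score
        best[status] = max(best.get(status, 0.0), score)
        newest[status] = max(newest.get(status, claim.last_observed), claim.last_observed)

    def rank(status):
        # Steps 4-5: total weight, then max single score, then recency, then precedence.
        return (totals[status], best[status], newest[status], -STATUS_PRECEDENCE.index(status))

    return max(totals, key=rank), rejected


now = datetime(2025, 10, 20, tzinfo=timezone.utc)
status, rejected = rollup(
    [
        Claim("redhat", "vendor", "not_affected", now, "component_not_present"),
        Claim("ubuntu", "distro", "affected", now),
        Claim("hubx", "hub", "fixed", now, signature_verified=False),
    ],
    now,
)
print(status, rejected)
```

In this run the vendor claim (score 1.0) outweighs the distro claim (0.9), and the unsigned `fixed` claim is dropped at the gate. Because `now` is passed in explicitly rather than read from the clock, the function stays pure, which is what makes the result hashable into a stable digest.
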

---

## 7) Query & export APIs

All endpoints are versioned under `/api/v1/vex`.

### 7.1 Query (online)

```
POST /claims/search
  body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
  → { claims[], nextPageToken? }

POST /consensus/search
  body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string }
  → { entries[], nextPageToken? }

POST /excititor/resolve (scope: vex.read)
  body: { productKeys?: string[], purls?: string[], vulnerabilityIds: string[], policyRevisionId?: string }
  → { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, sources[], conflicts[], decisions[], signals?, summary?, envelope: { artifact, contentSignature?, attestation?, attestationEnvelope?, attestationSignature? } } ] }
```

### 7.2 Exports (cacheable snapshots)

```
POST /exports
  body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool }
  → { exportId, artifactSha256, rekor? }

GET  /exports/{exportId}        → bytes (application/json or binary index)
GET  /exports/{exportId}/meta   → { signature, policyRevisionId, createdAt, artifactSha256, rekor? }
```

### 7.3 Provider operations

```
GET  /providers                  → provider list & signature policy
POST /providers/{id}/refresh     → trigger fetch/normalize window
GET  /providers/{id}/status      → last fetch, doc counts, signature stats
```

**Auth:** service‑to‑service via Authority tokens; operator operations via UI/CLI with RBAC.

---

## 8) Attestation integration

* Exports can be **DSSE‑signed** via **Signer** and logged to **Rekor v2** via **Attestor** (optional but recommended for regulated pipelines).
* `vex.exports.rekor` stores `{uuid, index, url}` when present.
* **Predicate type**: `https://stella-ops.org/attestations/vex-export/1` with fields:
  * `querySignature`, `policyRevisionId`, `artifactSha256`, `createdAt`.

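
To make the predicate concrete, the sketch below assembles the four predicate fields listed above and computes the byte string a DSSE signer would sign. Two things here are assumptions rather than statements from this document: that the predicate is wrapped in an in‑toto Statement v1, and the subject name `vex-export`. The pre‑authentication encoding (PAE) follows the DSSE specification; actual signing and Rekor submission belong to Signer and Attestor and are not shown.

```python
import hashlib
import json

PREDICATE_TYPE = "https://stella-ops.org/attestations/vex-export/1"
# Assumption: exports are wrapped in an in-toto Statement before DSSE signing.
IN_TOTO_PAYLOAD_TYPE = "application/vnd.in-toto+json"


def build_statement(artifact: bytes, query_signature: str,
                    policy_revision_id: str, created_at: str) -> dict:
    """Assemble a Statement carrying the vex-export predicate fields from §8."""
    digest = hashlib.sha256(artifact).hexdigest()
    return {
        "_type": "https://in-toto.io/Statement/v1",
        # The subject name is a hypothetical label; the digest binds the export bytes.
        "subject": [{"name": "vex-export", "digest": {"sha256": digest}}],
        "predicateType": PREDICATE_TYPE,
        "predicate": {
            "querySignature": query_signature,
            "policyRevisionId": policy_revision_id,
            "artifactSha256": digest,
            "createdAt": created_at,
        },
    }


def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


statement = build_statement(b'{"entries":[]}', "sig-example", "rev-example", "2025-10-19T00:00:00Z")
payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
to_sign = dsse_pae(IN_TOTO_PAYLOAD_TYPE, payload)
print(to_sign[:40])
```

Signing the PAE output instead of the raw payload binds the payload type into the signature, so a verifier cannot be tricked into interpreting the same bytes under a different type.
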

---

## 9) Configuration (YAML)

```yaml
excititor:
  mongo: { uri: "mongodb://mongo/excititor" }
  s3:
    endpoint: http://minio:9000
    bucket: stellaops
  policy:
    weights:
      vendor: 1.0
      distro: 0.9
      platform: 0.7
      hub: 0.5
      attestation: 0.6
      ceiling: 1.25
    scoring:
      alpha: 0.25
      beta: 0.5
    providerOverrides:
      redhat: 1.0
      suse: 0.95
    requireJustificationForNotAffected: true
    signatureRequiredForFixed: true
    minEvidence:
      not_affected:
        vendorOrTwoDistros: true
  connectors:
    - providerId: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] }
      windowDays: 7
    - providerId: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] }
    - providerId: ubuntu
      kind: openvex
      baseUrl: https://…/vex/
      signaturePolicy: { type: none }
    - providerId: vendorX
      kind: cyclonedx-vex
      ociRef: ghcr.io/vendorx/vex@sha256:…
      signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] }
```

### 9.1 WebService endpoints

With storage configured, the WebService exposes the following ingress and diagnostic APIs:

* `GET /excititor/status` – returns the active storage configuration and registered artifact stores.
* `GET /excititor/health` – simple liveness probe.
* `POST /excititor/statements` – accepts normalized VEX statements and persists them via `IVexClaimStore`; use this for migrations/backfills.
* `GET /excititor/statements/{vulnId}/{productKey}?since=` – returns the immutable statement log for a vulnerability/product pair.
* `POST /excititor/resolve` – requires `vex.read` scope; accepts up to 256 `(vulnId, productKey)` pairs via `productKeys` or `purls` and returns deterministic consensus results, decision telemetry, and a signed envelope (`artifact` digest, optional signer signature, optional attestation metadata + DSSE envelope). Returns **409 Conflict** when the requested `policyRevisionId` mismatches the active snapshot.

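
As an illustration of calling `POST /excititor/resolve`, the sketch below builds (but does not send) a request with Python's standard library. Several details are assumptions for the example: the host name, the `Bearer` scheme for the Authority token, and counting the 256‑pair limit as the cross product of `vulnerabilityIds` and `purls`. The body field names come from §7.1.

```python
import json
import urllib.request

MAX_PAIRS = 256  # limit stated for /excititor/resolve


def build_resolve_request(base_url, token, vulnerability_ids, purls, policy_revision_id=None):
    """Build a POST request for /excititor/resolve without sending it."""
    # Assumption: the pair budget is the cross product of the two lists.
    pairs = len(vulnerability_ids) * len(purls)
    if pairs == 0 or pairs > MAX_PAIRS:
        raise ValueError(f"expected 1..{MAX_PAIRS} (vulnId, productKey) pairs, got {pairs}")

    body = {"vulnerabilityIds": list(vulnerability_ids), "purls": list(purls)}
    if policy_revision_id is not None:
        body["policyRevisionId"] = policy_revision_id

    return urllib.request.Request(
        base_url.rstrip("/") + "/excititor/resolve",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Authority tokens are presented as bearer tokens.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_resolve_request(
    "https://excititor.example.internal",  # hypothetical host
    "example-token",
    ["CVE-2025-12345"],
    ["pkg:rpm/redhat/openssl@3.0.7"],
)
print(request.get_method(), request.full_url)
```

When such a request is sent with `urllib.request.urlopen`, a policy revision mismatch would surface as `urllib.error.HTTPError` with `code == 409`; a caller could then retry without `policyRevisionId` or refresh its cached revision, depending on how strictly it needs to pin the policy.
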
Run the ingestion endpoint once after applying migration `20251019-consensus-signals-statements` to repopulate historical statements with the new severity/KEV/EPSS signal fields.

* `weights.ceiling` raises the deterministic clamp applied to provider tiers/overrides (range 1.0‒5.0). Values outside the range are clamped with warnings so operators can spot typos.
* `scoring.alpha` / `scoring.beta` configure KEV/EPSS boosts for the Phase 1 → Phase 2 scoring pipeline. Defaults (0.25, 0.5) preserve prior behaviour; negative or excessively large values fall back with diagnostics.

---

## 10) Security model

* **Input signature verification** enforced per provider policy (PGP, cosign, x509).
* **Connector allowlists**: outbound fetch constrained to configured domains.
* **Tenant isolation**: per‑tenant DB prefixes or separate DBs; per‑tenant S3 prefixes; per‑tenant policies.
* **AuthN/Z**: Authority‑issued OpToks; RBAC roles (`vex.read`, `vex.admin`, `vex.export`).
* **No secrets in logs**; deterministic logging contexts include providerId, docDigest, claim keys.

---

## 11) Performance & scale

* **Targets:**
  * Normalize 10k VEX claims/minute/core.
  * Consensus compute ≤ 50 ms for 1k unique `(vuln, product)` pairs in hot cache.
  * Export (consensus) 1M rows in ≤ 60 s on 8 cores with streaming writer.
* **Scaling:**
  * WebService handles control APIs; **Worker** background services (same image) execute fetch/normalize in parallel with rate‑limits; Mongo writes batched; upserts by natural keys.
  * Exports stream straight to S3 (MinIO) with rolling buffers.
* **Caching:**
  * `vex.cache` maps query signatures → export; TTL to avoid stampedes; optimistic reuse unless `force`.

### 11.1 Worker TTL refresh controls

Excititor.Worker ships with a background refresh service that re-evaluates stale consensus rows and applies stability dampers before publishing status flips.
Operators can tune its behaviour through the following configuration (shown in `appsettings.json` syntax):

```jsonc
{
  "Excititor": {
    "Worker": {
      "Refresh": {
        "Enabled": true,
        "ConsensusTtl": "02:00:00",      // refresh consensus older than 2 hours
        "ScanInterval": "00:10:00",      // sweep cadence
        "ScanBatchSize": 250,            // max documents examined per sweep
        "Damper": {
          "Minimum": "1.00:00:00",       // lower bound before status flip publishes
          "Maximum": "2.00:00:00",       // upper bound guardrail
          "DefaultDuration": "1.12:00:00",
          "Rules": [
            { "MinWeight": 0.90, "Duration": "1.00:00:00" },
            { "MinWeight": 0.75, "Duration": "1.06:00:00" },
            { "MinWeight": 0.50, "Duration": "1.12:00:00" }
          ]
        }
      }
    }
  }
}
```

* `ConsensusTtl` governs when the worker issues a fresh resolve for cached consensus data.
* `Damper` lengths are clamped between `Minimum`/`Maximum`; duration is bypassed when component fingerprints (`VexProduct.ComponentIdentifiers`) change.
* The same keys are available through environment variables (e.g., `Excititor__Worker__Refresh__ConsensusTtl=02:00:00`).

---

## 12) Observability

* **Metrics:**
  * `vex.ingest.docs_total{provider}`
  * `vex.normalize.claims_total{provider}`
  * `vex.signature.failures_total{provider,method}`
  * `vex.consensus.conflicts_total{vulnId}`
  * `vex.exports.bytes{format}` / `vex.exports.latency_seconds`
* **Tracing:** spans for fetch, verify, parse, map, consensus, export.
* **Dashboards:** provider staleness, top conflicting vulns/components, signature posture, export cache hit‑rate.

---

## 13) Testing matrix

* **Connectors:** golden raw docs → deterministic claims (fixtures per provider/format).
* **Signature policies:** valid/invalid PGP/cosign/x509 samples; ensure rejects are recorded but not accepted.
* **Normalization edge cases:** platform‑only claims, free‑text justifications, non‑purl products.
* **Consensus:** conflict scenarios across tiers; check tie‑breakers; justification gates.
* **Performance:** 1M‑row export timing; memory ceilings; stream correctness.
* **Determinism:** same inputs + policy → identical `consensusDigest` and export bytes.
* **API contract tests:** pagination, filters, RBAC, rate limits.

---

## 14) Integration points

* **Backend Policy Engine** (in Scanner.WebService): calls `POST /excititor/resolve` (scope `vex.read`) with batched `(purl, vulnId)` pairs to fetch `rollupStatus + sources`.
* **Concelier**: provides alias graph (CVE↔vendor IDs) and may supply VEX‑adjacent metadata (e.g., KEV flag) for policy escalation.
* **UI**: VEX explorer screens use `/claims/search` and `/consensus/search`; show conflicts & provenance.
* **CLI**: `stellaops vex export --consensus --since 7d --out vex.json` for audits.

---

## 15) Failure modes & fallback

* **Provider unreachable:** stale thresholds trigger warnings; policy can down‑weight stale providers automatically (freshness factor).
* **Signature outage:** continue to ingest but mark `signatureState.verified=false`; consensus will likely exclude or down‑weight per policy.
* **Schema drift:** unknown fields are preserved as `evidence`; normalization rejects only on **invalid identity** or **status**.

---

## 16) Rollout plan (incremental)

1. **MVP**: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + `/excititor/resolve`.
2. **Signature policies**: PGP for distros; cosign for OCI.
3. **Exports + optional attestation**.
4. **CycloneDX VEX** connectors; platform claim expansion tables; UI explorer.
5. **Scale hardening**: export indexes; conflict analytics.

---

## 17) Operational runbooks

* **Statement backfill** — see `docs/dev/EXCITITOR_STATEMENT_BACKFILL.md` for the CLI workflow, required permissions, observability guidance, and rollback steps.


---

## 18) Appendix — canonical JSON (stable ordering)

All exports and consensus entries are serialized via `VexCanonicalJsonSerializer`:

* UTF‑8 without BOM;
* keys sorted (ASCII);
* arrays sorted by `(providerId, vulnId, productKey, lastObserved)` unless semantic order mandated;
* timestamps in `YYYY‑MM‑DDThh:mm:ssZ`;
* no insignificant whitespace.
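
The rules above can be approximated in a few lines. The Python sketch below is an assumption‑laden stand‑in, not `VexCanonicalJsonSerializer` itself: it sorts every array of objects by the four listed fields (treating missing fields as empty strings), leaves arrays of scalars in their given order, and expects timezone‑aware timestamps.

```python
import hashlib
import json
from datetime import datetime, timezone

# Sort key for arrays of objects, taken from the appendix rules.
SORT_FIELDS = ("providerId", "vulnId", "productKey", "lastObserved")


def format_timestamp(ts: datetime) -> str:
    """Render an aware datetime as UTC with second precision and a Z suffix."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonicalize(value):
    """Normalize timestamps and array order recursively; key order is handled at dump time."""
    if isinstance(value, datetime):
        return format_timestamp(value)
    if isinstance(value, dict):
        return {key: canonicalize(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [canonicalize(item) for item in value]
        # Assumption: only arrays made entirely of objects are reordered.
        if items and all(isinstance(item, dict) for item in items):
            items.sort(key=lambda item: tuple(str(item.get(f, "")) for f in SORT_FIELDS))
        return items
    return value


def canonical_bytes(value) -> bytes:
    """UTF-8 (no BOM), sorted keys, no insignificant whitespace."""
    text = json.dumps(canonicalize(value), sort_keys=True,
                      separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(value) -> str:
    """SHA-256 over the canonical bytes, hex-encoded."""
    return hashlib.sha256(canonical_bytes(value)).hexdigest()


entry = {
    "vulnId": "CVE-2025-12345",
    "sources": [{"providerId": "suse"}, {"providerId": "redhat"}],
    "evaluatedAt": datetime(2025, 10, 19, 12, 0, tzinfo=timezone.utc),
}
print(canonical_bytes(entry).decode("utf-8"))
```

Two inputs that differ only in key order or in the order of their source objects produce the same bytes under this scheme, and therefore the same digest, which is the property the determinism tests in §13 check for.
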