
component_architecture_excititor.md — StellaOps Excititor (Sprint 22)

Consolidates the VEX ingestion guardrails from Epic 1 with consensus and AI-facing requirements from Epics 7 and 8. This is the authoritative architecture record for Excititor.

Scope. This document specifies the Excititor service: its purpose, trust model, data structures, observation/linkset pipelines, APIs, plug-in contracts, storage schema, performance budgets, testing matrix, and how it integrates with Concelier, Policy Engine, and evidence surfaces. It is implementation-ready. The immutable observation store schema lives in vex_observations.md.


0) Mission & role in the platform

Mission. Convert heterogeneous VEX statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into immutable VEX observations, correlate them into linksets that retain provenance/conflicts without precedence, and publish deterministic evidence exports and events that Policy Engine, Console, and CLI use to suppress or explain findings.

Boundaries.

  • Excititor does not decide PASS/FAIL. It supplies evidence (statuses + justifications + provenance weights).
  • Excititor preserves conflicting observations unchanged; consensus (when enabled) merely annotates how policy might choose, but raw evidence remains exportable.
  • VEX consumption is backend-only: Scanner never applies VEX. The backend Policy Engine asks Excititor for status evidence and then decides what to show.

1) Aggregation guardrails (AOC baseline)

Excititor enforces the same ingestion covenant as Concelier, tailored to VEX payloads:

  1. Immutable vex_raw documents. Upstream OpenVEX/CSAF/CycloneDX files are stored verbatim (content.raw) with provenance (issuer, statement_id, timestamps, signatures). Revisions append new versions linked by supersedes.
  2. No derived consensus at ingest time. Fields such as effective_status, merged_state, severity, or reachability are forbidden. Roslyn analyzers and runtime guards block violations before writes (a guard sketch follows the endpoint list below).
  3. Linkset-only joins. Product aliases, CVE keys, SBOM hints, and references live under linkset; ingestion must never mutate the underlying statement.
  4. Deterministic canonicalisation. Writers sort JSON keys/arrays, normalize timestamps (UTC ISO-8601), and hash content for reproducible exports.
  5. AOC verifier. StellaOps.AOC.Verifier runs in CI and production, checking schema compliance, provenance completeness, sorted collections, and signature metadata.

Raw VEX endpoints (WebService)

  • POST /ingest/vex (scope: vex.admin) accepts deterministic VexIngestRequest payloads. Clients must send X-Stella-Tenant.
  • GET /vex/raw, GET /vex/raw/{digest}, and GET /vex/raw/{digest}/provenance (scope: vex.read) expose raw documents, cursored listings, and metadata-only projections.
  • POST /aoc/verify replays stored documents through the Aggregation-Only Contract for audits and Grafana alert sources.
  • To satisfy the AOC rule forbidding derived data, serialized raw responses omit the statements array unless replay tooling explicitly materializes it.
  • Optional/minor DI dependencies (e.g., orchestrators, loggers) are declared as [FromServices] IFoo? foo = null parameters so host startup and tests remain stable when the service is not registered.
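
A minimal sketch of the kind of runtime guard implied by guardrail 2: reject any raw payload carrying derived/consensus fields before it reaches the store. The class name and the exact forbidden-field list are illustrative, not the shipped analyzers/guards (which also inspect nested content).

using System.Text.Json;

public static class VexAocWriteGuardSketch
{
    // Derived/consensus fields that must never appear in vex_raw documents.
    private static readonly string[] ForbiddenFields =
        { "effective_status", "merged_state", "severity", "reachability" };

    public static void EnsureAocCompliant(JsonElement rawDocument)
    {
        foreach (var field in ForbiddenFields)
        {
            // Top-level check only; the real guards also walk nested objects.
            if (rawDocument.TryGetProperty(field, out _))
                throw new InvalidOperationException(
                    $"AOC violation: derived field '{field}' is not allowed in vex_raw documents.");
        }
    }
}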

1.1 VEX raw document shape

{
  "_id": "vex_raw:openvex:VEX-2025-00001:v2",
  "source": {
    "issuer": "vendor:redhat",
    "stream": "openvex",
    "api": "https://vendor/api/vex/VEX-2025-00001.json",
    "collector_version": "excititor/0.9.4"
  },
  "upstream": {
    "statement_id": "VEX-2025-00001",
    "document_version": "2025-08-30T12:00:00Z",
    "fetched_at": "2025-08-30T12:05:00Z",
    "received_at": "2025-08-30T12:05:01Z",
    "content_hash": "sha256:...",
    "signature": {
      "present": true,
      "format": "dsse",
      "key_id": "rekor:uuid",
      "sig": "base64..."
    }
  },
  "content": {
    "format": "openvex",
    "spec_version": "1.0",
    "raw": { /* upstream statement */ }
  },
  "identifiers": {
    "cve": ["CVE-2025-13579"],
    "products": [
      {"purl": "pkg:rpm/redhat/openssl@3.0.9", "component": "openssl"}
    ]
  },
  "linkset": {
    "aliases": ["REDHAT:RHSA-2025:1234"],
    "sbom_products": ["pkg:rpm/redhat/openssl@3.0.9"],
    "justifications": ["reasonable_worst_case_assumption"],
    "references": [
      {"type": "advisory", "url": "https://..."}
    ]
  },
  "supersedes": "vex_raw:openvex:VEX-2025-00001:v1",
  "tenant": "default"
}

1.2 Issuer trust registry

To enable Epic 7's consensus lens, Excititor maintains vex_issuer_registry documents containing:

  • issuer_id, canonical name, and allowed domains.
  • trust.tier (critical, high, medium, low), trust.confidence (0–1).
  • products PURL patterns the issuer is authoritative for.
  • signing_keys with key IDs and expiry.
  • last_validated_at, revocation_status.

The registry is distributed as a signed bundle and cached locally; ingestion rejects statements from issuers without registry entries or valid signatures.
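
As a hedged illustration of that rejection rule, the sketch below shows an ingest-time gate that requires a registry entry, a non-revoked issuer, and a registered signing key. The record, interface, and method names are assumptions, not the shipped registry client.

using System.Linq;

public sealed record IssuerEntry(
    string IssuerId, string Tier, double Confidence,
    IReadOnlyCollection<string> SigningKeyIds, DateTimeOffset? RevokedAt);

public interface IIssuerRegistry
{
    bool TryGet(string issuerId, out IssuerEntry? entry);
}

public static class IssuerGateSketch
{
    public static bool Accept(IIssuerRegistry registry, string issuerId, string? signatureKeyId)
    {
        if (!registry.TryGet(issuerId, out var entry) || entry is null) return false; // no registry entry
        if (entry.RevokedAt is not null) return false;                                // revoked issuer
        if (string.IsNullOrEmpty(signatureKeyId)) return false;                       // unsigned statement
        return entry.SigningKeyIds.Contains(signatureKeyId);                          // key must be registered
    }
}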

1.3 Normalised tuple store

Excititor derives vex_normalized tuples (without making decisions) for downstream consumers:

{
  "advisory_key": "CVE-2025-13579",
  "artifact": "pkg:rpm/redhat/openssl@3.0.9",
  "issuer": "vendor:redhat",
  "status": "not_affected",
  "justification": "component_not_present",
  "scope": "runtime_path",
  "timestamp": "2025-08-30T12:00:00Z",
  "trust": {"tier": "high", "confidence": 0.95},
  "statement_id": "VEX-2025-00001:v2",
  "content_hash": "sha256:..."
}

These tuples allow VEX Lens to compute deterministic consensus without re-parsing heavy upstream documents.
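
For reference, the same tuple expressed as a C# record; field names mirror the JSON above, while the record and sub-record names are illustrative, not shipped types.

public sealed record VexTrustHint(string Tier, double Confidence);

public sealed record VexNormalizedTuple(
    string AdvisoryKey,        // e.g. "CVE-2025-13579"
    string Artifact,           // purl of the product
    string Issuer,             // e.g. "vendor:redhat"
    string Status,             // affected | not_affected | fixed | under_investigation
    string? Justification,
    string? Scope,
    DateTimeOffset Timestamp,
    VexTrustHint Trust,
    string StatementId,
    string ContentHash);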

Excititor workers now hydrate signature metadata with issuer trust data retrieved from the Issuer Directory service. The worker-side IssuerDirectoryClient performs tenant-aware lookups (including global fallbacks) and caches responses offline so attestation verification exposes an effective trust weight alongside the cryptographic details captured on ingest.

1.4 AI-ready citations

GET /v1/vex/statements/{advisory_key} produces sorted JSON responses containing raw statement metadata (issuer, content_hash, signature), normalised tuples, and provenance pointers. Advisory AI consumes this endpoint to build retrieval contexts with explicit citations.

1.5 Postgres raw store (replaces Mongo/GridFS)

Mongo/BSON/GridFS are being removed. This is the canonical design for the Postgres-backed raw store that powers /vex/raw and ingestion.

Schema: vex

  • vex_raw_documents (append-only)

    • digest TEXT PRIMARY KEY (sha256:{hex} of canonical UTF-8 JSON bytes).
    • tenant TEXT NOT NULL
    • provider_id TEXT NOT NULL
    • format TEXT NOT NULL CHECK (format IN ('openvex','csaf','cyclonedx','custom'))
    • source_uri TEXT NOT NULL, etag TEXT NULL
    • retrieved_at TIMESTAMPTZ NOT NULL, recorded_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    • supersedes_digest TEXT NULL REFERENCES vex_raw_documents(digest)
    • content_json JSONB NOT NULL — canonicalised payload (truncated when blobbed)
    • content_size_bytes INT NOT NULL
    • metadata_json JSONB NOT NULL — statement_id, issuer, spec_version, content_type, connector version, hashes, quarantine flags
    • provenance_json JSONB NOT NULL — DSSE/chain/rekor/trust info
    • inline_payload BOOLEAN NOT NULL DEFAULT TRUE
    • UNIQUE (tenant, provider_id, source_uri, etag)
    • Indexes: (tenant, retrieved_at DESC), (tenant, provider_id, retrieved_at DESC), (tenant, supersedes_digest), GIN on metadata_json, GIN on provenance_json.
  • vex_raw_blobs (large payloads)

    • digest TEXT PRIMARY KEY REFERENCES vex_raw_documents(digest) ON DELETE CASCADE
    • payload BYTEA NOT NULL (canonical JSON bytes; no compression to preserve determinism)
    • payload_hash TEXT NOT NULL (hash of stored bytes)
  • vex_raw_attachments (optional future)

    • digest TEXT REFERENCES vex_raw_documents(digest) ON DELETE CASCADE
    • name TEXT NOT NULL, media_type TEXT NOT NULL
    • payload BYTEA NOT NULL, payload_hash TEXT NOT NULL
    • PRIMARY KEY (digest, name)
  • Observations/linksets — use the append-only Postgres linkset schema already defined for IAppendOnlyLinksetStore (tables vex_linksets, vex_linkset_observations, vex_linkset_disagreements, vex_linkset_mutations) with indexes on (tenant, vulnerability_id, product_key) and updated_at.

Canonicalisation & hashing

  1. Parse upstream JSON; sort keys; normalize newlines; encode UTF-8 without BOM. Preserve array order.
  2. Compute digest = "sha256:{hex}" over canonical bytes.
  3. If size <= inline_threshold_bytes (default 256 KiB) set inline_payload=true and store in content_json; otherwise store bytes in vex_raw_blobs and set inline_payload=false.
  4. Persist content_size_bytes (pre-canonical length) and payload_hash for integrity.
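
A sketch of steps 1–3 using System.Text.Json (JsonNode.DeepClone requires .NET 8). The helper name is illustrative, and newline normalisation inside string values is omitted for brevity; the caller compares the byte length against inline_threshold_bytes to decide between content_json and vex_raw_blobs.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

public static class VexCanonicalHashSketch
{
    public static (string Digest, byte[] CanonicalBytes) Compute(string upstreamJson)
    {
        var canonical = Canonicalize(JsonNode.Parse(upstreamJson)!);
        var bytes = Encoding.UTF8.GetBytes(canonical.ToJsonString());   // compact output, UTF-8, no BOM
        var digest = "sha256:" + Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
        return (digest, bytes);
    }

    // Sort object keys ordinally; preserve array order (step 1).
    private static JsonNode Canonicalize(JsonNode node) => node switch
    {
        JsonObject obj => new JsonObject(obj
            .OrderBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => KeyValuePair.Create(p.Key, p.Value is null ? null : Canonicalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(x => x is null ? null : Canonicalize(x)).ToArray()),
        _ => node.DeepClone()
    };
}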

API mapping (replaces Mongo/BSON)
List/query /vex/raw via SELECT ... FROM vex.vex_raw_documents WHERE tenant=@t ORDER BY retrieved_at DESC, digest LIMIT @n OFFSET @offset; cursor uses (retrieved_at, digest). GET /vex/raw/{digest} loads the row and optional blob; GET /vex/raw/{digest}/provenance projects provenance_json + metadata_json. Filters (providerId, format, since, until, supersedes, hasAttachments) map to indexed predicates; JSON subfields use metadata_json ->> 'field'.

Write semantics

  • IVexRawStore Postgres implementation enforces append-only inserts; duplicate digest => no-op; duplicate (tenant, provider_id, source_uri, etag) with new digest inserts a new row and sets supersedes_digest.
  • IVexRawWriteGuard runs before insert; tenant is mandatory on every query and write.
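
A hedged Npgsql sketch of the append-only insert, where ON CONFLICT on the digest makes duplicate writes a no-op. Column names follow the schema above; the class shape is an assumption, and the write-guard call plus supersedes resolution for a changed (tenant, provider_id, source_uri, etag) are left to the caller.

using Npgsql;
using NpgsqlTypes;

public sealed class PostgresVexRawStoreSketch
{
    private readonly NpgsqlDataSource _dataSource;
    public PostgresVexRawStoreSketch(NpgsqlDataSource dataSource) => _dataSource = dataSource;

    // Returns true when a new row was written, false when the digest already existed.
    // retrievedAt must be UTC (offset zero) for the timestamptz mapping.
    public async Task<bool> TryInsertAsync(
        string digest, string tenant, string providerId, string format, string sourceUri,
        string? etag, DateTimeOffset retrievedAt, string contentJson, int contentSizeBytes,
        string metadataJson, string provenanceJson, string? supersedesDigest, CancellationToken ct)
    {
        const string sql = """
            INSERT INTO vex.vex_raw_documents
                (digest, tenant, provider_id, format, source_uri, etag, retrieved_at,
                 supersedes_digest, content_json, content_size_bytes, metadata_json,
                 provenance_json, inline_payload)
            VALUES (@digest, @tenant, @provider, @format, @uri, @etag, @retrieved,
                    @supersedes, @content, @size, @metadata, @provenance, TRUE)
            ON CONFLICT (digest) DO NOTHING;
            """;

        await using var cmd = _dataSource.CreateCommand(sql);
        cmd.Parameters.AddWithValue("digest", digest);
        cmd.Parameters.AddWithValue("tenant", tenant);
        cmd.Parameters.AddWithValue("provider", providerId);
        cmd.Parameters.AddWithValue("format", format);
        cmd.Parameters.AddWithValue("uri", sourceUri);
        cmd.Parameters.AddWithValue("etag", (object?)etag ?? DBNull.Value);
        cmd.Parameters.AddWithValue("retrieved", retrievedAt);
        cmd.Parameters.AddWithValue("supersedes", (object?)supersedesDigest ?? DBNull.Value);
        cmd.Parameters.Add(new NpgsqlParameter("content", NpgsqlDbType.Jsonb) { Value = contentJson });
        cmd.Parameters.AddWithValue("size", contentSizeBytes);
        cmd.Parameters.Add(new NpgsqlParameter("metadata", NpgsqlDbType.Jsonb) { Value = metadataJson });
        cmd.Parameters.Add(new NpgsqlParameter("provenance", NpgsqlDbType.Jsonb) { Value = provenanceJson });

        return await cmd.ExecuteNonQueryAsync(ct) == 1;   // 0 => duplicate digest, no-op
    }
}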

Rollout

  1. Add migration under src/Excititor/__Libraries/StellaOps.Excititor.Storage.Postgres/Migrations creating the tables/indexes above.
  2. Implement PostgresVexRawStore and switch WebService/Worker DI to AddExcititorPostgresStorage; remove VexMongoStorageOptions, IMongoDatabase, and GridFS paths.
  3. Update /vex/raw endpoints/tests to the Postgres store; delete Mongo fixtures once parity is green. Mark Mongo storage paths as deprecated and remove them in the next release.

2) Inputs, outputs & canonical domain

2.1 Accepted input formats (ingest)

  • OpenVEX JSON documents (attested or raw).
  • CSAF VEX 2.x (vendor PSIRTs and distros commonly publish CSAF).
  • CycloneDX VEX 1.4+ (standalone VEX or embedded VEX blocks).
  • OCI-attached attestations (VEX statements shipped as OCI referrers) — optional connectors.

All connectors register source metadata: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors.

2.2 Canonical model (observations & linksets)

VexObservation

observationId       // {tenant}:{providerId}:{upstreamId}:{revision}
tenant
providerId          // e.g., redhat, suse, ubuntu, osv
streamId            // connector stream (csaf, openvex, cyclonedx, attestation)
upstream{
    upstreamId,
    documentVersion?,
    fetchedAt,
    receivedAt,
    contentHash,
    signature{present, format?, keyId?, signature?}
}
statements[
  {
    vulnerabilityId,
    productKey,
    status,                    // affected | not_affected | fixed | under_investigation
    justification?,
    introducedVersion?,
    fixedVersion?,
    lastObserved,
    locator?,                  // JSON Pointer/line for provenance
    evidence?[]
  }
]
content{
    format,
    specVersion?,
    raw
}
linkset{
    aliases[],                 // CVE/GHSA/vendor IDs
    purls[],
    cpes[],
    references[{type,url}],
    reconciledFrom[]
}
supersedes?
createdAt
attributes?

VexLinkset

linksetId           // sha256 over sorted (tenant, vulnId, productKey, observationIds)
tenant
key{
    vulnerabilityId,
    productKey,
    confidence          // low|medium|high
}
observations[] = [
  {
    observationId,
    providerId,
    status,
    justification?,
    introducedVersion?,
    fixedVersion?,
    evidence?,
    collectedAt
  }
]
aliases{
    primary,
    others[]
}
purls[]
cpes[]
conflicts[]?        // see VexLinksetConflict
createdAt
updatedAt

VexLinksetConflict

conflictId
type                // status-mismatch | justification-divergence | version-range-clash | non-joinable-overlap | metadata-gap
field?              // optional pointer for UI rendering
statements[]        // per-observation values with providerId + status/justification/version data
confidence
detectedAt

VexConsensus (optional)

consensusId         // sha256(vulnerabilityId, productKey, policyRevisionId)
vulnerabilityId
productKey
rollupStatus        // derived by Excititor policy adapter (linkset aware)
sources[]           // observation references with weight, accepted flag, reason
policyRevisionId
evaluatedAt
consensusDigest

Consensus persists only when Excititor policy adapters require pre-computed rollups (e.g., Offline Kit). Policy Engine can also compute consensus on demand from linksets.

2.3 Exports & evidence bundles

  • Raw observations — JSON tree per observation for auditing/offline.
  • Linksets — grouped evidence for policy/Console/CLI consumption.
  • Consensus (optional) — if enabled, mirrors existing API contracts.
  • Provider snapshots — last N days of observations per provider to support diagnostics.
  • Index(productKey, vulnerabilityId) → {status candidates, confidence, observationIds} for high-speed joins.

All exports remain deterministic and, when configured, attested via DSSE + Rekor v2.


3) Identity model — products & joins

3.1 Vuln identity

  • Accepts CVE, GHSA, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to vulnId with alias sets.
  • Alias graph maintained (from Concelier) to map vendor/distro IDs → CVE (primary) and to GHSA where applicable.

3.2 Product identity (productKey)

  • Primary: purl (Package URL).
  • Secondary links: cpe, OS package NVRA/EVR, NuGet/Maven/Golang identity, and OS package name when purl unavailable.
  • Fallback: oci:<registry>/<repo>@<digest> for image-level VEX.
  • Special cases: kernel modules, firmware, platforms → provider-specific mapping helpers (the connector captures the provider's product taxonomy → canonical productKey).

Excititor does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native product string and mark the claim as non-joinable; the backend will ignore it unless a policy explicitly whitelists that provider mapping.
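
A small sketch of that deterministic precedence (purl first, then CPE, then OCI digest, then the native string flagged non-joinable); the record and helper names are illustrative, and NVRA/EVR-to-purl conversion is assumed to have happened upstream.

public sealed record ProductKeyResult(string ProductKey, bool Joinable);

public static class ProductKeyMapperSketch
{
    public static ProductKeyResult Map(string? purl, string? cpe, string? ociDigestRef, string nativeProduct)
    {
        if (!string.IsNullOrEmpty(purl))
            return new ProductKeyResult(purl, Joinable: true);

        if (!string.IsNullOrEmpty(cpe))
            return new ProductKeyResult(cpe, Joinable: true);

        if (!string.IsNullOrEmpty(ociDigestRef))            // e.g. "oci:registry.example/repo@sha256:..."
            return new ProductKeyResult(ociDigestRef, Joinable: true);

        // No deterministic mapping: keep the provider's native product string and
        // flag the claim non-joinable so policy ignores it unless explicitly whitelisted.
        return new ProductKeyResult(nativeProduct, Joinable: false);
    }
}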


4) Storage schema (MongoDB)

Database: excititor

4.1 Collections

vex.providers

_id: providerId
name, homepage, contact
trustTier: enum {vendor, distro, platform, hub, attestation}
signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] }
fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays }
enabled: bool
createdAt, modifiedAt

vex.raw (immutable raw documents)

_id: sha256(doc bytes)
providerId
uri
ingestedAt
contentType
sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? }
payload: GridFS pointer (if large)
disposition: kept|replaced|superseded
correlation: { replaces?: sha256, replacedBy?: sha256 }

vex.observations

{
  _id: "tenant:providerId:upstreamId:revision",
  tenant,
  providerId,
  streamId,
  upstream: { upstreamId, documentVersion?, fetchedAt, receivedAt, contentHash, signature },
  statements: [
    {
      vulnerabilityId,
      productKey,
      status,
      justification?,
      introducedVersion?,
      fixedVersion?,
      lastObserved,
      locator?,
      evidence?
    }
  ],
  content: { format, specVersion?, raw },
  linkset: { aliases[], purls[], cpes[], references[], reconciledFrom[] },
  supersedes?,
  createdAt,
  attributes?
}
  • Indexes: {tenant:1, providerId:1, upstream.upstreamId:1}, {tenant:1, statements.vulnerabilityId:1}, {tenant:1, linkset.purls:1}, {tenant:1, createdAt:-1}.

vex.linksets

{
  _id: "sha256:...",
  tenant,
  key: { vulnerabilityId, productKey, confidence },
  observations: [
    { observationId, providerId, status, justification?, introducedVersion?, fixedVersion?, evidence?, collectedAt }
  ],
  aliases: { primary, others: [] },
  purls: [],
  cpes: [],
  conflicts: [],
  createdAt,
  updatedAt
}
  • Indexes: {tenant:1, key.vulnerabilityId:1, key.productKey:1}, {tenant:1, purls:1}, {tenant:1, updatedAt:-1}.

vex.events (observation/linkset events, optional long retention)

{
  _id: ObjectId,
  tenant,
  type: "vex.observation.updated" | "vex.linkset.updated",
  key,
  delta,
  hash,
  occurredAt
}
  • Indexes: {type:1, occurredAt:-1}, TTL on occurredAt for configurable retention.

vex.consensus (optional rollups)

_id: sha256(canonical(vulnerabilityId, productKey, policyRevisionId))
vulnerabilityId
productKey
rollupStatus
sources[]      // observation references with weights/reasons
policyRevisionId
evaluatedAt
signals?       // optional severity/kev/epss hints
consensusDigest
  • Indexes: {vulnerabilityId:1, productKey:1}, {policyRevisionId:1, evaluatedAt:-1}.

vex.exports (manifest of emitted artifacts)

_id
querySignature
format: raw|consensus|index
artifactSha256
rekor { uuid, index, url }?
createdAt
policyRevisionId
cacheable: bool

vex.cache — observation/linkset export cache: {querySignature, exportId, ttl, hits}.

vex.migrations — ordered migrations ensuring new indexes (20251027-linksets-introduced, etc.).

4.2 Indexing strategy

  • Hot path queries rely on {tenant, key.vulnerabilityId, key.productKey} covering linkset lookup.
  • Observability queries use {tenant, updatedAt} to monitor staleness.
  • Consensus (if enabled) keyed by {vulnerabilityId, productKey, policyRevisionId} for deterministic reuse.

5) Ingestion pipeline

5.1 Connector contract

public interface IVexConnector
{
    string ProviderId { get; }
    Task FetchAsync(VexConnectorContext ctx, CancellationToken ct);   // raw docs
    Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> ObservationStatements[]
}
  • Fetch must implement: window scheduling, conditional GET (ETag/If-Modified-Since), rate limiting, retry/backoff (a conditional-GET sketch follows this list).
  • Normalize parses the format, validates schema, maps product identities deterministically, emits observation statements with provenance metadata (locator, justification, version ranges).
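
A conditional-GET sketch for the Fetch side; HttpClient usage is standard, while the helper shape and how it plugs into VexConnectorContext are assumptions rather than the shipped connector SDK.

using System.Net;
using System.Net.Http.Headers;

public static class ConditionalFetchSketch
{
    // previousEtag must be the quoted value previously returned by the server (e.g. "\"abc123\"").
    public static async Task<(bool NotModified, byte[]? Body, string? ETag)> GetAsync(
        HttpClient http, Uri uri, string? previousEtag, CancellationToken ct)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, uri);
        if (!string.IsNullOrEmpty(previousEtag))
            request.Headers.IfNoneMatch.Add(new EntityTagHeaderValue(previousEtag));

        using var response = await http.SendAsync(request, HttpCompletionOption.ResponseHeadersRead, ct);
        if (response.StatusCode == HttpStatusCode.NotModified)
            return (true, null, previousEtag);                   // nothing new in this fetch window

        response.EnsureSuccessStatusCode();                      // retry/backoff handled by the caller
        var body = await response.Content.ReadAsByteArrayAsync(ct);
        return (false, body, response.Headers.ETag?.Tag);
    }
}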

5.2 Signature verification (per provider)

  • cosign (keyless or keyful) for OCI referrers or HTTP-served JSON with Sigstore bundles.
  • PGP (provider keyrings) for distro/vendor feeds that sign docs.
  • x509 (mutual TLS / provider-pinned certs) where applicable.
  • Signature state is stored on vex.raw.sig and copied into statements[].signatureState so downstream policy can gate by verification result.

Observation statements from sources failing signature policy are marked "signatureState.verified=false" and policy can down-weight or ignore them.

5.3 Time discipline

  • For each doc, prefer the provider's document timestamp; if absent, use fetch time.
  • Statements carry lastObserved which drives tie-breaking within equal weight tiers.

6) Normalization: product & status semantics

6.1 Product mapping

  • purl first; cpe second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb).
  • Where a provider publishes platform-level VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits evidence indicating the rule applied.
  • If expansion would be speculative, the statement remains platform-scoped with productKey="platform:redhat:rhel:9" and is flagged non-joinable; backend can decide to use platform VEX only when Scanner proves the platform runtime.

6.2 Status + justification mapping

  • Canonical status: affected | not_affected | fixed | under_investigation.

  • Justifications normalized to a controlled vocabulary (CISA-aligned), e.g.:

    • component_not_present
    • vulnerable_code_not_in_execute_path
    • vulnerable_configuration_unused
    • inline_mitigation_applied
    • fix_available (with fixedVersion)
    • under_investigation
  • Providers with free-text justifications are mapped by deterministic tables (illustrated below); raw text preserved as evidence.
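
A hedged illustration of such a deterministic table; the mapping entries are examples only, and unmapped text is deliberately left unnormalised while the raw text is kept as evidence.

using System.Collections.Generic;

public static class JustificationMapperSketch
{
    private static readonly IReadOnlyDictionary<string, string> Table =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["vulnerable code is not present"] = "component_not_present",
            ["code not reachable at runtime"]  = "vulnerable_code_not_in_execute_path",
            ["feature disabled by default"]    = "vulnerable_configuration_unused",
        };

    // Returns the canonical justification (or null when unmapped) plus the raw text kept as evidence.
    public static (string? Canonical, string RawEvidence) Map(string providerText)
        => Table.TryGetValue(providerText.Trim(), out var canonical)
            ? (canonical, providerText)
            : ((string?)null, providerText);
}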


7) Consensus algorithm

Goal: produce a stable, explainable rollupStatus per (vulnId, productKey) when consumers opt into Excititor-managed consensus derived from linksets.

7.1 Inputs

  • Set S of observation statements drawn from the current VexLinkset for (tenant, vulnId, productKey).

  • Excititor policy snapshot:

    • weights per provider tier and per provider overrides.
    • justification gates (e.g., require justification for not_affected to be acceptable).
    • minEvidence rules (e.g., not_affected must come from ≥1 vendor or 2 distros).
    • signature requirements (e.g., require verified signature for fixed to be considered).

7.2 Steps

  1. Filter invalid statements by signature policy & justification gates → set S'.

  2. Score each statement: score = weight(provider) * freshnessFactor(lastObserved) where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect). Observations lacking verified signatures receive policy-configured penalties.

  3. Aggregate scores per status: W(status) = Σ score(statements with that status).

  4. Pick rollupStatus = argmax_status W(status).

  5. Tiebreakers (in order):

    • Higher max single provider score wins (vendor > distro > platform > hub).
    • More recent lastObserved wins.
    • A fixed status precedence (fixed > not_affected > under_investigation > affected) as the final tiebreaker.
  6. Explain: mark accepted observations (accepted=true; reason="weight"/"freshness"/"confidence") and rejected ones with explicit reason ("insufficient_justification", "signature_unverified", "lower_weight", "low_confidence_linkset").

The algorithm is pure given S and policy snapshot; result is reproducible and hashed into consensusDigest.
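
A compact sketch of steps 2–5, assuming the gate filtering of step 1 has already produced S' and that provider weights come from the policy snapshot. The freshness decay curve shown here is an assumption (the real factor is configurable), and the type names are illustrative.

using System;
using System.Collections.Generic;
using System.Linq;

public sealed record ScoredStatement(string ProviderId, string Status, double Weight, DateTimeOffset LastObserved);

public static class ConsensusRollupSketch
{
    // Final tiebreaker order: fixed > not_affected > under_investigation > affected.
    private static readonly string[] StatusPrecedence =
        { "fixed", "not_affected", "under_investigation", "affected" };

    public static string Resolve(IReadOnlyList<ScoredStatement> filtered, DateTimeOffset now)
    {
        // Assumed decay: linear from 1.0 (fresh) down to 0.8 at >= 1 year old.
        double Freshness(DateTimeOffset t) =>
            Math.Clamp(1.0 - 0.2 * Math.Min((now - t).TotalDays / 365.0, 1.0), 0.8, 1.0);

        var ranked = filtered
            .GroupBy(s => s.Status)
            .Select(g => new
            {
                Status     = g.Key,
                TotalScore = g.Sum(s => s.Weight * Freshness(s.LastObserved)),   // W(status)
                MaxSingle  = g.Max(s => s.Weight * Freshness(s.LastObserved)),   // tiebreaker: strongest single provider
                Newest     = g.Max(s => s.LastObserved)                          // tiebreaker: most recent observation
            })
            .OrderByDescending(x => x.TotalScore)
            .ThenByDescending(x => x.MaxSingle)
            .ThenByDescending(x => x.Newest)
            .ThenBy(x => Array.IndexOf(StatusPrecedence, x.Status))
            .ToList();

        return ranked.Count > 0 ? ranked[0].Status : "under_investigation";
    }
}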


8) Query & export APIs

All endpoints are versioned under /api/v1/vex.

8.1 Query (online)

POST /observations/search
  body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
  → { observations[], nextPageToken? }

POST /linksets/search
  body: { vulnIds?: string[], productKeys?: string[], confidence?: string[], since?: timestamp, limit?: int, pageToken?: string }
  → { linksets[], nextPageToken? }

POST /consensus/search
  body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string }
  → { entries[], nextPageToken? }

POST /excititor/resolve (scope: vex.read)
  body: { productKeys?: string[], purls?: string[], vulnerabilityIds: string[], policyRevisionId?: string }
  → { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, observations[], conflicts[], linksetConfidence, consensus?, signals?, envelope? } ] }
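
A hedged example of calling POST /excititor/resolve from another backend service: the route shown assumes the /api/v1/vex prefix noted above, and the request record simply mirrors the body shape (it is not an official client SDK).

using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Text.Json;

public sealed record ResolveRequest(
    string[] VulnerabilityIds, string[]? ProductKeys, string[]? Purls, string? PolicyRevisionId);

public static class ExcititorResolveExample
{
    public static async Task<JsonElement> ResolveAsync(HttpClient http, string bearerToken, CancellationToken ct)
    {
        http.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", bearerToken);        // Authority token with vex.read scope

        var request = new ResolveRequest(
            VulnerabilityIds: new[] { "CVE-2025-13579" },
            ProductKeys: null,
            Purls: new[] { "pkg:rpm/redhat/openssl@3.0.9" },
            PolicyRevisionId: null);

        using var response = await http.PostAsJsonAsync("/api/v1/vex/excititor/resolve", request, ct);
        response.EnsureSuccessStatusCode();                              // 409 signals a policyRevisionId mismatch
        return await response.Content.ReadFromJsonAsync<JsonElement>(cancellationToken: ct);
    }
}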

8.2 Exports (cacheable snapshots)

POST /exports
  body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool }
  → { exportId, artifactSha256, rekor? }

GET  /exports/{exportId}        → bytes (application/json or binary index)
GET  /exports/{exportId}/meta   → { signature, policyRevisionId, createdAt, artifactSha256, rekor? }

8.3 Provider operations

GET  /providers                  → provider list & signature policy
POST /providers/{id}/refresh     → trigger fetch/normalize window
GET  /providers/{id}/status      → last fetch, doc counts, signature stats

Auth: service-to-service via Authority tokens; operator operations via UI/CLI with RBAC.


9) Attestation integration

  • Exports can be DSSE-signed via Signer and logged to Rekor v2 via Attestor (optional but recommended for regulated pipelines).

  • vex.exports.rekor stores {uuid, index, url} when present.

  • Predicate type: https://stella-ops.org/attestations/vex-export/1 with fields:

    • querySignature, policyRevisionId, artifactSha256, createdAt.

10) Configuration (YAML)

excititor:
  mongo: { uri: "mongodb://mongo/excititor" }
  s3:
    endpoint: http://minio:9000
    bucket: stellaops
  policy:
    weights:
      vendor: 1.0
      distro: 0.9
      platform: 0.7
      hub: 0.5
      attestation: 0.6
      ceiling: 1.25
    scoring:
      alpha: 0.25
      beta: 0.5
    providerOverrides:
      redhat: 1.0
      suse: 0.95
    requireJustificationForNotAffected: true
    signatureRequiredForFixed: true
    minEvidence:
      not_affected:
        vendorOrTwoDistros: true
  connectors:
    - providerId: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] }
      windowDays: 7
    - providerId: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] }
    - providerId: ubuntu
      kind: openvex
      baseUrl: https://…/vex/
      signaturePolicy: { type: none }
    - providerId: vendorX
      kind: cyclonedx-vex
      ociRef: ghcr.io/vendorx/vex@sha256:…
      signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] }

10.1 WebService endpoints

With storage configured, the WebService exposes the following ingress and diagnostic APIs (deterministic ordering, offline-friendly):

  • GET /excititor/status returns the active storage configuration and registered artifact stores.
  • GET /excititor/health simple liveness probe.
  • POST /excititor/statements accepts normalized VEX statements and persists them via IVexClaimStore; use this for migrations/backfills.
  • GET /excititor/statements/{vulnId}/{productKey}?since= returns the immutable statement log for a vulnerability/product pair.
  • POST /vex/evidence/chunks submits aggregation-only chunks (OpenAPI: schemas/vex-chunk-api.yaml); responds with deterministic chunk_digest and queue id. Telemetry published under meter StellaOps.Excititor.Chunks (see Operations).
  • POST /v1/attestations/verify verifies Evidence Locker attestations for exports/chunks using IVexAttestationVerifier; returns { valid, diagnostics } (deterministic key order). Aligns with Evidence Locker contract v1.
  • POST /excititor/resolve requires vex.read scope; accepts up to 256 (vulnId, productKey) pairs via productKeys or purls and returns deterministic consensus results, decision telemetry, and a signed envelope (artifact digest, optional signer signature, optional attestation metadata + DSSE envelope). Returns 409 Conflict when the requested policyRevisionId mismatches the active snapshot.

Run the ingestion endpoint once after applying migration 20251019-consensus-signals-statements to repopulate historical statements with the new severity/KEV/EPSS signal fields.

  • weights.ceiling raises the deterministic clamp applied to provider tiers/overrides (range 1.0–5.0). Values outside the range are clamped with warnings so operators can spot typos.
  • scoring.alpha / scoring.beta configure KEV/EPSS boosts for the Phase 1 → Phase 2 scoring pipeline. Defaults (0.25, 0.5) preserve prior behaviour; negative or excessively large values fall back with diagnostics.

11) Security model

  • Input signature verification enforced per provider policy (PGP, cosign, x509).
  • Connector allowlists: outbound fetch constrained to configured domains.
  • Tenant isolation: per-tenant DB prefixes or separate DBs; per-tenant S3 prefixes; per-tenant policies.
  • AuthN/Z: Authority-issued OpToks; RBAC roles (vex.read, vex.admin, vex.export).
  • No secrets in logs; deterministic logging contexts include providerId, docDigest, observationId, and linksetId.

12) Performance & scale

  • Targets:

    • Normalize 10k observation statements/minute/core.
    • Linkset rebuild ≤20ms P95 for 1k unique (vuln, product) pairs in hot cache.
    • Consensus (when enabled) compute ≤50ms for 1k unique (vuln, product) pairs.
    • Export (observations + linksets) 1M rows in ≤60s on 8 cores with streaming writer.
  • Scaling:

    • WebService handles control APIs; Worker background services (same image) execute fetch/normalize in parallel with rate limits; Mongo writes batched; upserts by natural keys.
    • Exports stream straight to S3 (MinIO) with rolling buffers.
  • Caching:

    • vex.cache maps query signatures → export; TTL to avoid stampedes; optimistic reuse unless force.

12.1 Worker TTL refresh controls

Excititor.Worker ships with a background refresh service that re-evaluates stale consensus rows and applies stability dampers before publishing status flips. Operators can tune its behaviour through the following configuration (shown in appsettings.json syntax):

{
  "Excititor": {
    "Worker": {
      "Refresh": {
        "Enabled": true,
        "ConsensusTtl": "02:00:00",       // refresh consensus older than 2 hours
        "ScanInterval": "00:10:00",       // sweep cadence
        "ScanBatchSize": 250,              // max documents examined per sweep
        "Damper": {
          "Minimum": "1.00:00:00",       // lower bound before status flip publishes
          "Maximum": "2.00:00:00",       // upper bound guardrail
          "DefaultDuration": "1.12:00:00",
          "Rules": [
            { "MinWeight": 0.90, "Duration": "1.00:00:00" },
            { "MinWeight": 0.75, "Duration": "1.06:00:00" },
            { "MinWeight": 0.50, "Duration": "1.12:00:00" }
          ]
        }
      }
    }
  }
}
  • ConsensusTtl governs when the worker issues a fresh resolve for cached consensus data.
  • Damper lengths are clamped between Minimum/Maximum; duration is bypassed when component fingerprints (VexProduct.ComponentIdentifiers) change.
  • The same keys are available through environment variables (e.g., Excititor__Worker__Refresh__ConsensusTtl=02:00:00).
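
A sketch of options classes these keys could bind to (e.g. via services.Configure<RefreshOptions>(configuration.GetSection("Excititor:Worker:Refresh"))), plus the damper clamp described above. Class names and the rule-selection detail are assumptions, not the shipped worker types.

using System;
using System.Collections.Generic;
using System.Linq;

public sealed class RefreshOptions
{
    public bool Enabled { get; set; } = true;
    public TimeSpan ConsensusTtl { get; set; } = TimeSpan.FromHours(2);
    public TimeSpan ScanInterval { get; set; } = TimeSpan.FromMinutes(10);
    public int ScanBatchSize { get; set; } = 250;
    public DamperOptions Damper { get; set; } = new();
}

public sealed class DamperRule
{
    public double MinWeight { get; set; }
    public TimeSpan Duration { get; set; }
}

public sealed class DamperOptions
{
    public TimeSpan Minimum { get; set; } = TimeSpan.FromDays(1);
    public TimeSpan Maximum { get; set; } = TimeSpan.FromDays(2);
    public TimeSpan DefaultDuration { get; set; } = new(1, 12, 0, 0);   // 1.12:00:00
    public List<DamperRule> Rules { get; set; } = new();

    // Pick the strongest rule the consensus weight satisfies, then clamp to [Minimum, Maximum].
    public TimeSpan Resolve(double weight)
    {
        var duration = Rules
            .OrderByDescending(r => r.MinWeight)
            .FirstOrDefault(r => weight >= r.MinWeight)?.Duration ?? DefaultDuration;
        return duration < Minimum ? Minimum : duration > Maximum ? Maximum : duration;
    }
}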

13) Observability

  • Metrics:

    • vex.fetch.requests_total{provider} / vex.fetch.bytes_total{provider}
    • vex.fetch.failures_total{provider,reason} / vex.signature.failures_total{provider,method}
    • vex.normalize.statements_total{provider}
    • vex.observations.write_total{result}
    • vex.linksets.updated_total{result} / vex.linksets.conflicts_total{type}
    • vex.consensus.rollup_total{status} (when enabled)
    • vex.exports.bytes_total{format} / vex.exports.latency_seconds{format}
  • Tracing: spans for fetch, verify, parse, map, observe, linkset, consensus, export.

  • Dashboards: provider staleness, linkset conflict hot spots, signature posture, export cache hit-rate.

  • Telemetry configuration: Excititor:Telemetry toggles OpenTelemetry for the host (Enabled, EnableTracing, EnableMetrics, ServiceName, OtlpEndpoint, optional OtlpHeaders and ResourceAttributes). Point it at the collector profile listed in docs/observability/observability.md so Excititor's ingestion_* metrics land in the same Grafana dashboards as Concelier.

  • Health endpoint: /obs/excititor/health (scope vex.admin) surfaces ingest/link/signature/conflict SLOs for Console + Grafana. Thresholds are configurable via Excititor:Observability:* (see docs/observability/observability.md).

  • Local replica set: tools/mongodb/local-mongo.sh start downloads the vetted MongoDB binaries (6.0.x), boots a rs0 single-node replica set, and prints the EXCITITOR_TEST_MONGO_URI export line so storage/integration tests can bypass Mongo2Go. restart restarts in-place, clean wipes the managed data/logs for deterministic runs, and stop/status/logs cover teardown/inspection.

  • API headers: responses echo X-Stella-TraceId and X-Stella-CorrelationId to keep Console/Loki links deterministic; inbound correlation headers are preserved when present.


14) Testing matrix

  • Connectors: golden raw docs → deterministic observation statements (fixtures per provider/format).
  • Signature policies: valid/invalid PGP/cosign/x509 samples; ensure rejects are recorded but not accepted.
  • Normalization edge cases: platform-scoped statements, free-text justifications, non-purl products.
  • Linksets: conflict scenarios across tiers; verify confidence scoring + conflict payload stability.
  • Consensus (optional): ensure tie-breakers honour policy weights/justification gates.
  • Batch ingest validation: dotnet test src/Excititor/__Tests/StellaOps.Excititor.WebService.Tests/StellaOps.Excititor.WebService.Tests.csproj --filter "Category=BatchIngestValidation" ingests mixed CycloneDX/CSAF/OpenVEX fixtures, asserts /vex/raw parity, confirms ingestion_write_total tags, and checks /aoc/verify output—run after touching ingest/telemetry code.
  • Performance: 1M-row observation/linkset export timing; memory ceilings; stream correctness.
  • Determinism: same inputs + policy → identical linkset hashes, conflict payloads, optional consensusDigest, and export bytes.
  • API contract tests: pagination, filters, RBAC, rate limits.

15) Integration points

  • Backend Policy Engine (in Scanner.WebService): calls POST /excititor/resolve (scope vex.read) with batched (purl, vulnId) pairs to fetch rollupStatus + sources.
  • Concelier: provides alias graph (CVE↔vendor IDs) and may supply VEX-adjacent metadata (e.g., KEV flag) for policy escalation.
  • UI: VEX explorer screens use /observations/search, /linksets/search, and /consensus/search; show conflicts & provenance.
  • CLI: stella vex linksets export --since 7d --out vex-linksets.json (optionally --include-consensus) for audits and Offline Kit parity.

16) Failure modes & fallback

  • Provider unreachable: stale thresholds trigger warnings; policy can down-weight stale providers automatically (freshness factor).
  • Signature outage: continue to ingest but mark signatureState.verified=false; consensus will likely exclude or down-weight per policy.
  • Schema drift: unknown fields are preserved as evidence; normalization rejects only on invalid identity or status.

17) Rollout plan (incremental)

  1. MVP: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + /excititor/resolve.
  2. Signature policies: PGP for distros; cosign for OCI.
  3. Exports + optional attestation.
  4. CycloneDX VEX connectors; platform claim expansion tables; UI explorer.
  5. Scale hardening: export indexes; conflict analytics.

18) Operational runbooks

  • Statement backfill — see docs/dev/EXCITITOR_STATEMENT_BACKFILL.md for the CLI workflow, required permissions, observability guidance, and rollback steps.

19) Appendix — canonical JSON (stable ordering)

All exports and consensus entries are serialized via VexCanonicalJsonSerializer:

  • UTF-8 without BOM;
  • keys sorted (ASCII);
  • arrays sorted by (providerId, vulnId, productKey, lastObserved) unless a semantic order is mandated;
  • timestamps in YYYY-MM-DDThh:mm:ssZ;
  • no insignificant whitespace.