	component_architecture_excititor.md — Stella Ops Excititor (2025Q4)
Scope. This document specifies the Excititor service: its purpose, trust model, data structures, APIs, plug‑in contracts, storage schema, normalization/consensus algorithms, performance budgets, testing matrix, and how it integrates with Scanner, Policy, Concelier, and the attestation chain. It is implementation‑ready.
0) Mission & role in the platform
Mission. Convert heterogeneous VEX statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into canonical, queryable claims; compute deterministic consensus per (vuln, product); preserve conflicts with provenance; publish stable, attestable exports that the backend uses to suppress non‑exploitable findings, prioritize remaining risk, and explain decisions.
Boundaries.
- Excititor does not decide PASS/FAIL. It supplies evidence (statuses + justifications + provenance weights).
- Excititor preserves conflicting claims unchanged; consensus encodes how we would pick, but the raw set is always exportable.
- VEX consumption is backend‑only: Scanner never applies VEX. The backend’s Policy Engine asks Excititor for status evidence and then decides what to show.
1) Inputs, outputs & canonical domain
1.1 Accepted input formats (ingest)
- OpenVEX JSON documents (attested or raw).
- CSAF VEX 2.x (vendor PSIRTs and distros commonly publish CSAF).
- CycloneDX VEX 1.4+ (standalone VEX or embedded VEX blocks).
- OCI‑attached attestations (VEX statements shipped as OCI referrers) — optional connectors.
All connectors register source metadata: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors.
1.2 Canonical model (normalized)
Every incoming statement becomes a set of VexClaim records:
VexClaim
- providerId           // 'redhat', 'suse', 'ubuntu', 'github', 'vendorX'
- vulnId               // 'CVE-2025-12345', 'GHSA-xxxx', canonicalized
- productKey           // canonical product identity (see §2.2)
- status               // affected | not_affected | fixed | under_investigation
- justification?       // for 'not_affected'/'affected' where provided
- introducedVersion?   // semantics per provider (range or exact)
- fixedVersion?        // where provided (range or exact)
- lastObserved         // timestamp from source or fetch time
- provenance           // doc digest, signature status, fetch URI, line/offset anchors
- evidence[]           // raw source snippets for explainability
- supersedes?          // optional cross-doc chain (docDigest → docDigest)
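For reference, a minimal C# record sketch of this shape. Property names follow the field list above; the nested provenance/evidence types and their members are illustrative assumptions, not the final types.
// Sketch only: field names mirror the VexClaim listing above; the Provenance and
// Evidence shapes beyond the listed fields are illustrative assumptions.
public enum VexStatus { Affected, NotAffected, Fixed, UnderInvestigation }

public sealed record VexProvenance(string DocDigest, string FetchUri, string SignatureState, string? Locator);

public sealed record VexEvidence(string Key, string Value, string Locator);

public sealed record VexClaim(
    string ProviderId,                    // 'redhat', 'suse', 'ubuntu', 'github', 'vendorX'
    string VulnId,                        // canonicalized CVE/GHSA/vendor/distro id
    string ProductKey,                    // canonical product identity (see §2.2)
    VexStatus Status,
    string? Justification,
    string? IntroducedVersion,
    string? FixedVersion,
    DateTimeOffset LastObserved,
    VexProvenance Provenance,
    IReadOnlyList<VexEvidence> Evidence,
    string? Supersedes);                  // optional docDigest → docDigest chain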
1.3 Exports (consumption)
- VexConsensus per (vulnId, productKey) with:
  - rollupStatus (after policy weights/justification gates),
  - sources[] (winning + losing claims with weights & reasons),
  - policyRevisionId (identifier of the Excititor policy used),
  - consensusDigest (stable SHA‑256 over canonical JSON).
- Raw claims export for auditing (unchanged, with provenance).
- Provider snapshots (per source, last N days) for operator debugging.
- Index optimized for backend joins: (productKey, vulnId) → (status, confidence, sourceSet).
All exports are deterministic and (optionally) attested via DSSE and logged to Rekor v2.
2) Identity model — products & joins
2.1 Vuln identity
- Accepts CVE, GHSA, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to vulnId with alias sets.
- Alias graph maintained (from Concelier) to map vendor/distro IDs → CVE (primary) and to GHSA where applicable.
2.2 Product identity (productKey)
- Primary: purl (Package URL).
- Secondary links: cpe, OS package NVRA/EVR, NuGet/Maven/Golang identity, and OS package name when purl unavailable.
- Fallback: oci:<registry>/<repo>@<digest> for image‑level VEX.
- Special cases: kernel modules, firmware, platforms → provider‑specific mapping helpers (connector captures provider’s product taxonomy → canonical productKey).
Excititor does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native product string and mark the claim as non‑joinable; the backend will ignore it unless a policy explicitly whitelists that provider mapping.
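A hypothetical helper illustrating this fallback order; NvraToPurl stands in for the distro mapping helpers of §5.1, and the function never fabricates an identity:
// Hypothetical helper illustrating the identity fallback order of §2.2.
// Unmappable products keep their native string and are marked non-joinable.
public static (string ProductKey, bool Joinable) ResolveProductKey(
    string? purl, string? cpe, string? nvra, string? ociRef, string nativeProductString)
{
    if (!string.IsNullOrEmpty(purl))   return (purl, true);                 // primary identity
    if (!string.IsNullOrEmpty(cpe))    return (cpe, true);                  // secondary link
    if (!string.IsNullOrEmpty(nvra))   return (NvraToPurl(nvra), true);     // distro helper (see §5.1)
    if (!string.IsNullOrEmpty(ociRef)) return ($"oci:{ociRef}", true);      // image-level VEX fallback
    return (nativeProductString, false);   // non-joinable; ignored unless a policy whitelists the mapping
}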
3) Storage schema (MongoDB)
Database: excititor
3.1 Collections
vex.providers
_id: providerId
name, homepage, contact
trustTier: enum {vendor, distro, platform, hub, attestation}
signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] }
fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays }
enabled: bool
createdAt, modifiedAt
vex.raw (immutable raw documents)
_id: sha256(doc bytes)
providerId
uri
ingestedAt
contentType
sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? }
payload: GridFS pointer (if large)
disposition: kept|replaced|superseded
correlation: { replaces?: sha256, replacedBy?: sha256 }
vex.claims (normalized rows; dedupe on providerId+vulnId+productKey+docDigest)
_id
providerId
vulnId
productKey
status
justification?
introducedVersion?
fixedVersion?
lastObserved
docDigest
provenance { uri, line?, pointer?, signatureState }
evidence[] { key, value, locator }
indices: 
  - {vulnId:1, productKey:1}
  - {providerId:1, lastObserved:-1}
  - {status:1}
  - text index (optional) on evidence.value for debugging
vex.consensus (rollups)
_id: sha256(canonical(vulnId, productKey, policyRevision))
vulnId
productKey
rollupStatus
sources[]: [
  { providerId, status, justification?, weight, lastObserved, accepted:bool, reason }
]
policyRevisionId
evaluatedAt
consensusDigest  // same as _id
indices:
  - {vulnId:1, productKey:1}
  - {policyRevisionId:1, evaluatedAt:-1}
vex.exports (manifest of emitted artifacts)
_id
querySignature
format: raw|consensus|index
artifactSha256
rekor { uuid, index, url }?
createdAt
policyRevisionId
cacheable: bool
vex.cache
querySignature -> exportId (for fast reuse)
ttl, hits
vex.migrations
- ordered migrations applied at bootstrap to ensure indexes.
3.2 Indexing strategy
- Hot path queries use exact (vulnId, productKey) and time‑bounded windows; compound indexes cover both.
- Provider list views sort by lastObserved for monitoring staleness.
- vex.consensus keyed by (vulnId, productKey, policyRevision) for deterministic reuse.
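A bootstrap sketch of these indexes using the official MongoDB C# driver (collection and field names as in §3.1; connection details are illustrative):
using MongoDB.Bson;
using MongoDB.Driver;

// Sketch: ensure the §3.2 compound indexes exist at bootstrap (see vex.migrations).
var db = new MongoClient("mongodb://mongo/excititor").GetDatabase("excititor");
var claims = db.GetCollection<BsonDocument>("vex.claims");
var consensus = db.GetCollection<BsonDocument>("vex.consensus");

await claims.Indexes.CreateManyAsync(new[]
{
    new CreateIndexModel<BsonDocument>(Builders<BsonDocument>.IndexKeys
        .Ascending("vulnId").Ascending("productKey")),           // hot-path join
    new CreateIndexModel<BsonDocument>(Builders<BsonDocument>.IndexKeys
        .Ascending("providerId").Descending("lastObserved")),    // provider staleness views
    new CreateIndexModel<BsonDocument>(Builders<BsonDocument>.IndexKeys
        .Ascending("status")),
});

await consensus.Indexes.CreateManyAsync(new[]
{
    new CreateIndexModel<BsonDocument>(Builders<BsonDocument>.IndexKeys
        .Ascending("vulnId").Ascending("productKey")),
    new CreateIndexModel<BsonDocument>(Builders<BsonDocument>.IndexKeys
        .Ascending("policyRevisionId").Descending("evaluatedAt")),
});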
4) Ingestion pipeline
4.1 Connector contract
public interface IVexConnector
{
    string ProviderId { get; }
    Task FetchAsync(VexConnectorContext ctx, CancellationToken ct);   // raw docs
    Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> VexClaim[]
}
- Fetch must implement: window scheduling, conditional GET (ETag/If‑Modified‑Since), rate limiting, retry/backoff.
- Normalize parses the format, validates schema, maps product identities deterministically, emits VexClaim records with provenance.
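A skeletal connector satisfying this contract. The VexConnectorContext members used here (Http, GetEtag, StoreRaw, LoadRaw, EmitClaims) and the CsafParser helper are assumptions for illustration only:
// Sketch of a CSAF connector; ctx members and CsafParser are hypothetical.
public sealed class RedHatCsafConnector : IVexConnector
{
    public string ProviderId => "redhat";

    public async Task FetchAsync(VexConnectorContext ctx, CancellationToken ct)
    {
        // Relative to the configured provider baseUrl; conditional GET via stored ETag.
        var request = new HttpRequestMessage(HttpMethod.Get, "index.json");
        if (ctx.GetEtag(ProviderId) is { } etag)
            request.Headers.IfNoneMatch.Add(new System.Net.Http.Headers.EntityTagHeaderValue(etag));

        using var response = await ctx.Http.SendAsync(request, ct);
        if (response.StatusCode == System.Net.HttpStatusCode.NotModified) return;  // nothing new this window
        response.EnsureSuccessStatusCode();

        var bytes = await response.Content.ReadAsByteArrayAsync(ct);
        await ctx.StoreRaw(ProviderId, request.RequestUri!.ToString(), bytes, ct);  // immutable vex.raw entry
    }

    public async Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct)
    {
        await foreach (var raw in ctx.LoadRaw(ProviderId, ct))
        {
            // Parse CSAF, validate the schema, map product identities deterministically,
            // and emit VexClaim records carrying provenance (docDigest, uri, signatureState).
            IEnumerable<VexClaim> claims = CsafParser.ToClaims(raw);
            await ctx.EmitClaims(claims, ct);
        }
    }
}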
4.2 Signature verification (per provider)
- cosign (keyless or keyful) for OCI referrers or HTTP‑served JSON with Sigstore bundles.
- PGP (provider keyrings) for distro/vendor feeds that sign docs.
- x509 (mutual TLS / provider‑pinned certs) where applicable.
- Signature state is stored on vex.raw.sig and copied into provenance.signatureState on claims.
Claims from sources failing signature policy are marked "signatureState.verified=false", and policy can down‑weight or ignore them.
4.3 Time discipline
- For each doc, prefer provider’s document timestamp; if absent, use fetch time.
- Claims carry lastObserved, which drives tie‑breaking within equal weight tiers.
5) Normalization: product & status semantics
5.1 Product mapping
- purl first; cpe second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb).
- Where a provider publishes platform‑level VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits evidence indicating the rule applied.
- If expansion would be speculative, the claim remains platform‑scoped with productKey="platform:redhat:rhel:9" and is flagged non‑joinable; backend can decide to use platform VEX only when Scanner proves the platform runtime.
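As an illustration of the rpm→purl mapping helpers mentioned above, a sketch that converts an RPM NVRA into a pkg:rpm purl; the provider→namespace choice is an assumption:
// Sketch: "bash-5.1.8-6.el9_1.x86_64" → "pkg:rpm/redhat/bash@5.1.8-6.el9_1?arch=x86_64".
// Splitting follows rpm NVRA conventions; the provider→namespace table is assumed.
public static string NvraToPurl(string nvra, string providerNamespace = "redhat")
{
    var lastDot = nvra.LastIndexOf('.');                 // arch suffix
    var arch = nvra[(lastDot + 1)..];
    var nvr = nvra[..lastDot];

    var relDash = nvr.LastIndexOf('-');                  // release
    var verDash = nvr.LastIndexOf('-', relDash - 1);     // version
    var name = nvr[..verDash];
    var version = nvr[(verDash + 1)..relDash];
    var release = nvr[(relDash + 1)..];

    return $"pkg:rpm/{providerNamespace}/{name}@{version}-{release}?arch={arch}";
}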
5.2 Status + justification mapping
- Canonical status: affected | not_affected | fixed | under_investigation.
- Justifications normalized to a controlled vocabulary (CISA‑aligned), e.g.:
  - component_not_present
  - vulnerable_code_not_in_execute_path
  - vulnerable_configuration_unused
  - inline_mitigation_applied
  - fix_available (with fixedVersion)
  - under_investigation
- Providers with free‑text justifications are mapped by deterministic tables; raw text preserved as evidence.
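A sketch of such a deterministic mapping table; the source phrases shown are hypothetical examples, and real tables are maintained per provider:
// Sketch: per-provider deterministic mapping of free-text justifications to the
// controlled vocabulary; unmapped text stays null and is preserved as evidence.
private static readonly Dictionary<string, string> JustificationMap =
    new(StringComparer.OrdinalIgnoreCase)
{
    ["the vulnerable component is not shipped"]   = "component_not_present",   // hypothetical source phrase
    ["code path is never executed"]               = "vulnerable_code_not_in_execute_path",
    ["feature disabled by default configuration"] = "vulnerable_configuration_unused",
    ["mitigated by seccomp profile"]              = "inline_mitigation_applied",
};

public static string? NormalizeJustification(string? freeText) =>
    freeText is not null && JustificationMap.TryGetValue(freeText.Trim(), out var canonical)
        ? canonical
        : null;   // raw text is still recorded in evidence[] for explainability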
6) Consensus algorithm
Goal: produce a stable, explainable rollupStatus per (vulnId, productKey) given possibly conflicting claims.
6.1 Inputs
- Set S of VexClaim for the key.
- Excititor policy snapshot:
  - weights per provider tier and per‑provider overrides.
  - justification gates (e.g., require justification for not_affected to be acceptable).
  - minEvidence rules (e.g., not_affected must come from ≥1 vendor or 2 distros).
  - signature requirements (e.g., require verified signature for ‘fixed’ to be considered).
6.2 Steps
- Filter invalid claims by signature policy & justification gates → set S'.
- Score each claim: score = weight(provider) * freshnessFactor(lastObserved), where freshnessFactor ∈ [0.8, 1.0] applies a small, configurable staleness decay.
- Aggregate scores per status: W(status) = Σ score over claims with that status.
- Pick rollupStatus = argmax_status W(status).
- Tie‑breakers (in order):
  - Higher maximum single‑provider score wins (vendor > distro > platform > hub).
  - More recent lastObserved wins.
  - Deterministic lexicographic order of status (fixed > not_affected > under_investigation > affected) as the final tie‑breaker.
- Explain: mark accepted sources (accepted=true; reason="weight"/"freshness") and mark rejected sources with an explicit reason ("insufficient_justification", "signature_unverified", "lower_weight").
The algorithm is pure given S and the policy snapshot; the result is reproducible and hashed into consensusDigest.
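A condensed sketch of the scoring and tie‑breaking steps, reusing the VexClaim/VexStatus sketch from §1.2; the weight and freshness functions come from the policy snapshot and the method names are illustrative:
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of §6.2: a pure function of the filtered claim set S' and the policy snapshot.
public static class VexConsensusMath
{
    public static VexStatus ComputeRollup(
        IReadOnlyList<VexClaim> sPrime,                         // claims surviving the gates
        Func<string, double> weightOf,                          // provider tier/override weight
        Func<DateTimeOffset, double> freshnessFactor)           // ∈ [0.8, 1.0]
    {
        // Deterministic final tie-breaker: fixed > not_affected > under_investigation > affected.
        var statusOrder = new[]
        {
            VexStatus.Fixed, VexStatus.NotAffected, VexStatus.UnderInvestigation, VexStatus.Affected,
        };

        return sPrime
            .Select(c => (Claim: c, Score: weightOf(c.ProviderId) * freshnessFactor(c.LastObserved)))
            .GroupBy(x => x.Claim.Status)
            .Select(g => new
            {
                Status = g.Key,
                Total = g.Sum(x => x.Score),                    // W(status) = Σ score
                MaxSingle = g.Max(x => x.Score),                // tie-break 1: strongest single source
                Newest = g.Max(x => x.Claim.LastObserved),      // tie-break 2: most recent lastObserved
            })
            .OrderByDescending(a => a.Total)
            .ThenByDescending(a => a.MaxSingle)
            .ThenByDescending(a => a.Newest)
            .ThenBy(a => Array.IndexOf(statusOrder, a.Status))  // tie-break 3: deterministic status order
            .First()
            .Status;
    }
}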
7) Query & export APIs
All endpoints are versioned under /api/v1/vex.
7.1 Query (online)
POST /claims/search
  body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
  → { claims[], nextPageToken? }
POST /consensus/search
  body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string }
  → { entries[], nextPageToken? }
POST /resolve
  body: { purls: string[], vulnIds: string[], policyRevisionId?: string }
  → { results: [ { vulnId, productKey, rollupStatus, sources[] } ] }
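To show how a consumer might drive /resolve, a small client sketch; the base address, auth wiring, and example identifiers are assumptions, and the request/response shapes follow the spec above:
using System.Net.Http.Json;
using System.Text.Json;

// Sketch: batched resolve call against the §7.1 endpoint shape.
using var http = new HttpClient { BaseAddress = new Uri("https://excititor.internal/api/v1/vex/") };
// http.DefaultRequestHeaders.Authorization = ... Authority-issued OpTok (omitted).

var request = new
{
    purls = new[] { "pkg:rpm/redhat/openssl@3.0.7-18.el9?arch=x86_64" },   // illustrative purl
    vulnIds = new[] { "CVE-2025-12345" },
    policyRevisionId = (string?)null,                                      // default to current policy revision
};

var response = await http.PostAsJsonAsync("resolve", request);
response.EnsureSuccessStatusCode();
var body = await response.Content.ReadFromJsonAsync<JsonElement>();

foreach (var result in body.GetProperty("results").EnumerateArray())
    Console.WriteLine($"{result.GetProperty("vulnId")} {result.GetProperty("productKey")} → {result.GetProperty("rollupStatus")}");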
7.2 Exports (cacheable snapshots)
POST /exports
  body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool }
  → { exportId, artifactSha256, rekor? }
GET  /exports/{exportId}        → bytes (application/json or binary index)
GET  /exports/{exportId}/meta   → { signature, policyRevisionId, createdAt, artifactSha256, rekor? }
7.3 Provider operations
GET  /providers                  → provider list & signature policy
POST /providers/{id}/refresh     → trigger fetch/normalize window
GET  /providers/{id}/status      → last fetch, doc counts, signature stats
Auth: service‑to‑service via Authority tokens; operator operations via UI/CLI with RBAC.
8) Attestation integration
- Exports can be DSSE‑signed via Signer and logged to Rekor v2 via Attestor (optional but recommended for regulated pipelines).
- vex.exports.rekor stores {uuid, index, url} when present.
- Predicate type: https://stella-ops.org/attestations/vex-export/1 with fields: querySignature, policyRevisionId, artifactSha256, createdAt.
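An illustrative predicate body with the fields listed above; all values are placeholders:
{
  "querySignature": { "vulnFilter": "CVE-2025-*", "providers": ["redhat", "suse"], "since": "2025-10-01T00:00:00Z" },
  "policyRevisionId": "policy-rev-42",
  "artifactSha256": "sha256:…",
  "createdAt": "2025-10-17T08:00:00Z"
}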
 
9) Configuration (YAML)
excititor:
  mongo: { uri: "mongodb://mongo/excititor" }
  s3:
    endpoint: http://minio:9000
    bucket: stellaops
  policy:
    weights:
      vendor: 1.0
      distro: 0.9
      platform: 0.7
      hub: 0.5
      attestation: 0.6
    providerOverrides:
      redhat: 1.0
      suse: 0.95
    requireJustificationForNotAffected: true
    signatureRequiredForFixed: true
    minEvidence:
      not_affected:
        vendorOrTwoDistros: true
  connectors:
    - providerId: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] }
      windowDays: 7
    - providerId: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] }
    - providerId: ubuntu
      kind: openvex
      baseUrl: https://…/vex/
      signaturePolicy: { type: none }
    - providerId: vendorX
      kind: cyclonedx-vex
      ociRef: ghcr.io/vendorx/vex@sha256:…
      signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] }
10) Security model
- Input signature verification enforced per provider policy (PGP, cosign, x509).
- Connector allowlists: outbound fetch constrained to configured domains.
- Tenant isolation: per‑tenant DB prefixes or separate DBs; per‑tenant S3 prefixes; per‑tenant policies.
- AuthN/Z: Authority‑issued OpToks; RBAC roles (vex.read, vex.admin, vex.export).
- No secrets in logs; deterministic logging contexts include providerId, docDigest, claim keys.
11) Performance & scale
- Targets:
  - Normalize 10k VEX claims/minute/core.
  - Consensus compute ≤ 50 ms for 1k unique (vuln, product) pairs in hot cache.
  - Export (consensus) 1M rows in ≤ 60 s on 8 cores with streaming writer.
- Scaling:
  - WebService handles control APIs; Worker background services (same image) execute fetch/normalize in parallel with rate limits; Mongo writes batched; upserts by natural keys.
  - Exports stream straight to S3 (MinIO) with rolling buffers.
- Caching:
  - vex.cache maps query signatures → exports; TTL to avoid stampedes; optimistic reuse unless force.
12) Observability
- Metrics:
  - vex.ingest.docs_total{provider}
  - vex.normalize.claims_total{provider}
  - vex.signature.failures_total{provider,method}
  - vex.consensus.conflicts_total{vulnId}
  - vex.exports.bytes{format} / vex.exports.latency_seconds
- Tracing: spans for fetch, verify, parse, map, consensus, export.
- Dashboards: provider staleness, top conflicting vulns/components, signature posture, export cache hit‑rate.
13) Testing matrix
- Connectors: golden raw docs → deterministic claims (fixtures per provider/format).
- Signature policies: valid/invalid PGP/cosign/x509 samples; ensure rejects are recorded but not accepted.
- Normalization edge cases: platform‑only claims, free‑text justifications, non‑purl products.
- Consensus: conflict scenarios across tiers; check tie‑breakers; justification gates.
- Performance: 1M‑row export timing; memory ceilings; stream correctness.
- Determinism: same inputs + policy → identical consensusDigest and export bytes.
- API contract tests: pagination, filters, RBAC, rate limits.
14) Integration points
- Backend Policy Engine (in Scanner.WebService): calls POST /resolve with batched (purl, vulnId) pairs to fetch rollupStatus + sources.
- Concelier: provides alias graph (CVE↔vendor IDs) and may supply VEX‑adjacent metadata (e.g., KEV flag) for policy escalation.
- UI: VEX explorer screens use /claims/search and /consensus/search; show conflicts & provenance.
- CLI: stellaops vex export --consensus --since 7d --out vex.json for audits.
15) Failure modes & fallback
- Provider unreachable: stale thresholds trigger warnings; policy can down‑weight stale providers automatically (freshness factor).
- Signature outage: continue to ingest but mark signatureState.verified=false; consensus will likely exclude or down‑weight per policy.
- Schema drift: unknown fields are preserved as evidence; normalization rejects only on invalid identity or status.
16) Rollout plan (incremental)
- MVP: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + /resolve.
- Signature policies: PGP for distros; cosign for OCI.
- Exports + optional attestation.
- CycloneDX VEX connectors; platform claim expansion tables; UI explorer.
- Scale hardening: export indexes; conflict analytics.
17) Appendix — canonical JSON (stable ordering)
All exports and consensus entries are serialized via VexCanonicalJsonSerializer:
- UTF‑8 without BOM;
- keys sorted (ASCII);
- arrays sorted by (providerId, vulnId, productKey, lastObserved) unless semantic order mandated;
- timestamps in YYYY‑MM‑DDThh:mm:ssZ;
- no insignificant whitespace.
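A sketch of the key‑sorting and compact‑output rules, assuming System.Text.Json (.NET 8 for DeepClone); array ordering and timestamp normalization are elided here and would be applied before serialization:
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Nodes;

// Sketch of VexCanonicalJsonSerializer's key ordering: rebuild the JSON tree with
// ASCII-sorted object keys, then emit compact UTF-8 without a BOM.
public static class CanonicalJson
{
    public static byte[] Serialize<T>(T value)
    {
        var node = JsonSerializer.SerializeToNode(value);
        var canonical = Canonicalize(node);
        return JsonSerializer.SerializeToUtf8Bytes(canonical,
            new JsonSerializerOptions { WriteIndented = false });   // no insignificant whitespace
    }

    private static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(obj
            .OrderBy(p => p.Key, StringComparer.Ordinal)            // keys sorted (ASCII)
            .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Canonicalize).ToArray()),
        _ => node?.DeepClone(),                                     // leaf values copied as-is
    };
}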