Restructure solution layout by module

2025-10-28 15:10:40 +02:00
parent 4e3e575db5
commit 68da90a11a
4103 changed files with 192899 additions and 187024 deletions
--- a/docs/ARCHITECTURE_VEXER.md
+++ b/docs/ARCHITECTURE_VEXER.md
@@ -1,463 +1,463 @@
-# component_architecture_vexer.md — **Stella Ops Vexer** (2025Q4)
-
-> **Scope.** This document specifies the **Vexer** service: its purpose, trust model, data structures, APIs, plug‑in contracts, storage schema, normalization/consensus algorithms, performance budgets, testing matrix, and how it integrates with Scanner, Policy, Feedser, and the attestation chain. It is implementation‑ready.
-
---
-
-## 0) Mission & role in the platform
-
-**Mission.** Convert heterogeneous **VEX** statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into **canonical, queryable claims**; compute **deterministic consensus** per *(vuln, product)*; preserve **conflicts with provenance**; publish **stable, attestable exports** that the backend uses to suppress non‑exploitable findings, prioritize remaining risk, and explain decisions.
-
-**Boundaries.**
-
-* Vexer **does not** decide PASS/FAIL. It supplies **evidence** (statuses + justifications + provenance weights).
-* Vexer preserves **conflicting claims** unchanged; consensus encodes how we would pick, but the raw set is always exportable.
-* VEX consumption is **backend‑only**: Scanner never applies VEX. The backend’s **Policy Engine** asks Vexer for status evidence and then decides what to show.
-
---
-
-## 1) Inputs, outputs & canonical domain
-
-### 1.1 Accepted input formats (ingest)
-
-* **OpenVEX** JSON documents (attested or raw).
-* **CSAF VEX** 2.x (vendor PSIRTs and distros commonly publish CSAF).
-* **CycloneDX VEX** 1.4+ (standalone VEX or embedded VEX blocks).
-* **OCI‑attached attestations** (VEX statements shipped as OCI referrers) — optional connectors.
-
-All connectors register **source metadata**: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors.
-
-### 1.2 Canonical model (normalized)
-
-Every incoming statement becomes a set of **VexClaim** records:
-
-```
-VexClaim
- providerId           // 'redhat', 'suse', 'ubuntu', 'github', 'vendorX'
- vulnId               // 'CVE-2025-12345', 'GHSA-xxxx', canonicalized
- productKey           // canonical product identity (see §2.2)
- status               // affected | not_affected | fixed | under_investigation
- justification?       // for 'not_affected'/'affected' where provided
- introducedVersion?   // semantics per provider (range or exact)
- fixedVersion?        // where provided (range or exact)
- lastObserved         // timestamp from source or fetch time
- provenance           // doc digest, signature status, fetch URI, line/offset anchors
- evidence[]           // raw source snippets for explainability
- supersedes?          // optional cross-doc chain (docDigest → docDigest)
-```
-
-### 1.3 Exports (consumption)
-
-* **VexConsensus** per `(vulnId, productKey)` with:
-
-  * `rollupStatus` (after policy weights/justification gates),
-  * `sources[]` (winning + losing claims with weights & reasons),
-  * `policyRevisionId` (identifier of the Vexer policy used),
-  * `consensusDigest` (stable SHA‑256 over canonical JSON).
-* **Raw claims** export for auditing (unchanged, with provenance).
-* **Provider snapshots** (per source, last N days) for operator debugging.
-* **Index** optimized for backend joins: `(productKey, vulnId) → (status, confidence, sourceSet)`.
-
-All exports are **deterministic**, and (optionally) **attested** via DSSE and logged to Rekor v2.
-
---
-
-## 2) Identity model — products & joins
-
-### 2.1 Vuln identity
-
-* Accepts **CVE**, **GHSA**, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to `vulnId` with alias sets.
-* **Alias graph** maintained (from Feedser) to map vendor/distro IDs → CVE (primary) and to **GHSA** where applicable.
-
-### 2.2 Product identity (`productKey`)
-
-* **Primary:** `purl` (Package URL).
-* **Secondary links:** `cpe`, **OS package NVRA/EVR**, NuGet/Maven/Golang identity, and **OS package name** when purl unavailable.
-* **Fallback:** `oci:<registry>/<repo>@<digest>` for image‑level VEX.
-* **Special cases:** kernel modules, firmware, platforms → provider‑specific mapping helpers (connector captures provider’s product taxonomy → canonical `productKey`).
-
-> Vexer does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native **product string** and mark the claim as **non‑joinable**; the backend will ignore it unless a policy explicitly whitelists that provider mapping.
-
---
-
-## 3) Storage schema (MongoDB)
-
-Database: `vexer`
-
-### 3.1 Collections
-
-**`vex.providers`**
-
-```
-_id: providerId
-name, homepage, contact
-trustTier: enum {vendor, distro, platform, hub, attestation}
-signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] }
-fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays }
-enabled: bool
-createdAt, modifiedAt
-```
-
-**`vex.raw`** (immutable raw documents)
-
-```
-_id: sha256(doc bytes)
-providerId
-uri
-ingestedAt
-contentType
-sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? }
-payload: GridFS pointer (if large)
-disposition: kept|replaced|superseded
-correlation: { replaces?: sha256, replacedBy?: sha256 }
-```
-
-**`vex.claims`** (normalized rows; dedupe on providerId+vulnId+productKey+docDigest)
-
-```
-_id
-providerId
-vulnId
-productKey
-status
-justification?
-introducedVersion?
-fixedVersion?
-lastObserved
-docDigest
-provenance { uri, line?, pointer?, signatureState }
-evidence[] { key, value, locator }
-indices: 
-  - {vulnId:1, productKey:1}
-  - {providerId:1, lastObserved:-1}
-  - {status:1}
-  - text index (optional) on evidence.value for debugging
-```
-
-**`vex.consensus`** (rollups)
-
-```
-_id: sha256(canonical(vulnId, productKey, policyRevision))
-vulnId
-productKey
-rollupStatus
-sources[]: [
-  { providerId, status, justification?, weight, lastObserved, accepted:bool, reason }
-]
-policyRevisionId
-evaluatedAt
-consensusDigest  // same as _id
-indices:
-  - {vulnId:1, productKey:1}
-  - {policyRevisionId:1, evaluatedAt:-1}
-```
-
-**`vex.exports`** (manifest of emitted artifacts)
-
-```
-_id
-querySignature
-format: raw|consensus|index
-artifactSha256
-rekor { uuid, index, url }?
-createdAt
-policyRevisionId
-cacheable: bool
-```
-
-**`vex.cache`**
-
-```
-querySignature -> exportId (for fast reuse)
-ttl, hits
-```
-
-**`vex.migrations`**
-
-* ordered migrations applied at bootstrap to ensure indexes.
-
-### 3.2 Indexing strategy
-
-* Hot path queries use exact `(vulnId, productKey)` and time‑bounded windows; compound indexes cover both.
-* Providers list view by `lastObserved` for monitoring staleness.
-* `vex.consensus` keyed by `(vulnId, productKey, policyRevision)` for deterministic reuse.
-
---
-
-## 4) Ingestion pipeline
-
-### 4.1 Connector contract
-
-```csharp
-public interface IVexConnector
-{
-    string ProviderId { get; }
-    Task FetchAsync(VexConnectorContext ctx, CancellationToken ct);   // raw docs
-    Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> VexClaim[]
-}
-```
-
-* **Fetch** must implement: window scheduling, conditional GET (ETag/If‑Modified‑Since), rate limiting, retry/backoff.
-* **Normalize** parses the format, validates schema, maps product identities deterministically, emits `VexClaim` records with **provenance**.
-
-### 4.2 Signature verification (per provider)
-
-* **cosign (keyless or keyful)** for OCI referrers or HTTP‑served JSON with Sigstore bundles.
-* **PGP** (provider keyrings) for distro/vendor feeds that sign docs.
-* **x509** (mutual TLS / provider‑pinned certs) where applicable.
-* Signature state is stored on **vex.raw.sig** and copied into **provenance.signatureState** on claims.
-
-> Claims from sources failing signature policy are marked `"signatureState.verified=false"` and **policy** can down‑weight or ignore them.
-
-### 4.3 Time discipline
-
-* For each doc, prefer **provider’s document timestamp**; if absent, use fetch time.
-* Claims carry `lastObserved`, which drives **tie‑breaking** within equal weight tiers.
-
---
-
-## 5) Normalization: product & status semantics
-
-### 5.1 Product mapping
-
-* **purl** first; **cpe** second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb).
-* Where a provider publishes **platform‑level** VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits **evidence** indicating the rule applied.
-* If expansion would be speculative, the claim remains **platform‑scoped** with `productKey="platform:redhat:rhel:9"` and is flagged **non‑joinable**; backend can decide to use platform VEX only when Scanner proves the platform runtime.
-
-### 5.2 Status + justification mapping
-
-* Canonical **status**: `affected | not_affected | fixed | under_investigation`.
-* **Justifications** normalized to a controlled vocabulary (CISA‑aligned), e.g.:
-
-  * `component_not_present`
-  * `vulnerable_code_not_in_execute_path`
-  * `vulnerable_configuration_unused`
-  * `inline_mitigation_applied`
-  * `fix_available` (with `fixedVersion`)
-  * `under_investigation`
-* Providers with free‑text justifications are mapped by deterministic tables; raw text preserved as `evidence`.
-
---
-
-## 6) Consensus algorithm
-
-**Goal:** produce a **stable**, explainable `rollupStatus` per `(vulnId, productKey)` given possibly conflicting claims.
-
-### 6.1 Inputs
-
-* Set **S** of `VexClaim` for the key.
-* **Vexer policy snapshot**:
-
-  * **weights** per provider tier and per provider overrides.
-  * **justification gates** (e.g., require justification for `not_affected` to be acceptable).
-  * **minEvidence** rules (e.g., `not_affected` must come from ≥1 vendor or 2 distros).
-  * **signature requirements** (e.g., require verified signature for ‘fixed’ to be considered).
-
-### 6.2 Steps
-
-1. **Filter invalid** claims by signature policy & justification gates → set `S'`.
-2. **Score** each claim:
-   `score = weight(provider) * freshnessFactor(lastObserved)` where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect).
-3. **Aggregate** scores per status: `W(status) = Σ score(claims with that status)`.
-4. **Pick** `rollupStatus = argmax_status W(status)`.
-5. **Tie‑breakers** (in order):
-
-   * Higher **max single** provider score wins (vendor > distro > platform > hub).
-   * More **recent** lastObserved wins.
-   * Fixed, deterministic precedence of status (`fixed` > `not_affected` > `under_investigation` > `affected`) as final tiebreaker.
-6. **Explain**: mark accepted sources (`accepted=true; reason="weight"`/`"freshness"`), mark rejected sources with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`).
-
-> The algorithm is **pure** given S and policy snapshot; result is reproducible and hashed into `consensusDigest`.
-
---
-
-## 7) Query & export APIs
-
-All endpoints are versioned under `/api/v1/vex`.
-
-### 7.1 Query (online)
-
-```
-POST /claims/search
-  body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
-  → { claims[], nextPageToken? }
-
-POST /consensus/search
-  body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string }
-  → { entries[], nextPageToken? }
-
-POST /excititor/resolve (scope: vex.read)
-  body: { productKeys?: string[], purls?: string[], vulnerabilityIds: string[], policyRevisionId?: string }
-  → { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, sources[], conflicts[], decisions[], signals?, summary?, envelope: { artifact, contentSignature?, attestation?, attestationEnvelope?, attestationSignature? } } ] }
-```
-
-### 7.2 Exports (cacheable snapshots)
-
-```
-POST /exports
-  body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool }
-  → { exportId, artifactSha256, rekor? }
-
-GET  /exports/{exportId}        → bytes (application/json or binary index)
-GET  /exports/{exportId}/meta   → { signature, policyRevisionId, createdAt, artifactSha256, rekor? }
-```
-
-### 7.3 Provider operations
-
-```
-GET  /providers                  → provider list & signature policy
-POST /providers/{id}/refresh     → trigger fetch/normalize window
-GET  /providers/{id}/status      → last fetch, doc counts, signature stats
-```
-
-**Auth:** service‑to‑service via Authority tokens; operator operations via UI/CLI with RBAC.
-
---
-
-## 8) Attestation integration
-
-* Exports can be **DSSE‑signed** via **Signer** and logged to **Rekor v2** via **Attestor** (optional but recommended for regulated pipelines).
-* `vex.exports.rekor` stores `{uuid, index, url}` when present.
-* **Predicate type**: `https://stella-ops.org/attestations/vex-export/1` with fields:
-
-  * `querySignature`, `policyRevisionId`, `artifactSha256`, `createdAt`.
-
---
-
-## 9) Configuration (YAML)
-
-```yaml
-vexer:
-  mongo: { uri: "mongodb://mongo/vexer" }
-  s3:
-    endpoint: http://minio:9000
-    bucket: stellaops
-  policy:
-    weights:
-      vendor: 1.0
-      distro: 0.9
-      platform: 0.7
-      hub: 0.5
-      attestation: 0.6
-    providerOverrides:
-      redhat: 1.0
-      suse: 0.95
-    requireJustificationForNotAffected: true
-    signatureRequiredForFixed: true
-    minEvidence:
-      not_affected:
-        vendorOrTwoDistros: true
-  connectors:
-    - providerId: redhat
-      kind: csaf
-      baseUrl: https://access.redhat.com/security/data/csaf/v2/
-      signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] }
-      windowDays: 7
-    - providerId: suse
-      kind: csaf
-      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
-      signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] }
-    - providerId: ubuntu
-      kind: openvex
-      baseUrl: https://…/vex/
-      signaturePolicy: { type: none }
-    - providerId: vendorX
-      kind: cyclonedx-vex
-      ociRef: ghcr.io/vendorx/vex@sha256:…
-      signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] }
-```
-
---
-
-## 10) Security model
-
-* **Input signature verification** enforced per provider policy (PGP, cosign, x509).
-* **Connector allowlists**: outbound fetch constrained to configured domains.
-* **Tenant isolation**: per‑tenant DB prefixes or separate DBs; per‑tenant S3 prefixes; per‑tenant policies.
-* **AuthN/Z**: Authority‑issued OpToks; RBAC roles (`vex.read`, `vex.admin`, `vex.export`).
-* **No secrets in logs**; deterministic logging contexts include providerId, docDigest, claim keys.
-
---
-
-## 11) Performance & scale
-
-* **Targets:**
-
-  * Normalize 10k VEX claims/minute/core.
-  * Consensus compute ≤ 50 ms for 1k unique `(vuln, product)` pairs in hot cache.
-  * Export (consensus) 1M rows in ≤ 60 s on 8 cores with streaming writer.
-
-* **Scaling:**
-
-  * WebService handles control APIs; **Worker** background services (same image) execute fetch/normalize in parallel with rate‑limits; Mongo writes batched; upserts by natural keys.
-  * Exports stream straight to S3 (MinIO) with rolling buffers.
-
-* **Caching:**
-
-  * `vex.cache` maps query signatures → export; TTL to avoid stampedes; optimistic reuse unless `force`.
-
---
-
-## 12) Observability
-
-* **Metrics:**
-
-  * `vex.ingest.docs_total{provider}`
-  * `vex.normalize.claims_total{provider}`
-  * `vex.signature.failures_total{provider,method}`
-  * `vex.consensus.conflicts_total{vulnId}`
-  * `vex.exports.bytes{format}` / `vex.exports.latency_seconds`
-* **Tracing:** spans for fetch, verify, parse, map, consensus, export.
-* **Dashboards:** provider staleness, top conflicting vulns/components, signature posture, export cache hit‑rate.
-
---
-
-## 13) Testing matrix
-
-* **Connectors:** golden raw docs → deterministic claims (fixtures per provider/format).
-* **Signature policies:** valid/invalid PGP/cosign/x509 samples; ensure rejects are recorded but not accepted.
-* **Normalization edge cases:** platform‑only claims, free‑text justifications, non‑purl products.
-* **Consensus:** conflict scenarios across tiers; check tie‑breakers; justification gates.
-* **Performance:** 1M‑row export timing; memory ceilings; stream correctness.
-* **Determinism:** same inputs + policy → identical `consensusDigest` and export bytes.
-* **API contract tests:** pagination, filters, RBAC, rate limits.
-
---
-
-## 14) Integration points
-
-* **Backend Policy Engine** (in Scanner.WebService): calls `POST /excititor/resolve` (scope `vex.read`) with batched `(purl, vulnId)` pairs to fetch `rollupStatus + sources`.
-* **Feedser**: provides alias graph (CVE↔vendor IDs) and may supply VEX‑adjacent metadata (e.g., KEV flag) for policy escalation.
-* **UI**: VEX explorer screens use `/claims/search` and `/consensus/search`; show conflicts & provenance.
-* **CLI**: `stellaops vex export --consensus --since 7d --out vex.json` for audits.
-
---
-
-## 15) Failure modes & fallback
-
-* **Provider unreachable:** stale thresholds trigger warnings; policy can down‑weight stale providers automatically (freshness factor).
-* **Signature outage:** continue to ingest but mark `signatureState.verified=false`; consensus will likely exclude or down‑weight per policy.
-* **Schema drift:** unknown fields are preserved as `evidence`; normalization rejects only on **invalid identity** or **status**.
-
---
-
-## 16) Rollout plan (incremental)
-
-1. **MVP**: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + `/excititor/resolve`.
-2. **Signature policies**: PGP for distros; cosign for OCI.
-3. **Exports + optional attestation**.
-4. **CycloneDX VEX** connectors; platform claim expansion tables; UI explorer.
-5. **Scale hardening**: export indexes; conflict analytics.
-
---
-
-## 17) Appendix — canonical JSON (stable ordering)
-
-All exports and consensus entries are serialized via `VexCanonicalJsonSerializer`:
-
-* UTF‑8 without BOM;
-* keys sorted (ASCII);
-* arrays sorted by `(providerId, vulnId, productKey, lastObserved)` unless semantic order mandated;
-* timestamps in `YYYY‑MM‑DDThh:mm:ssZ`;
-* no insignificant whitespace.
-
+# component_architecture_vexer.md — **Stella Ops Vexer** (2025Q4)
+
+> **Scope.** This document specifies the **Vexer** service: its purpose, trust model, data structures, APIs, plug‑in contracts, storage schema, normalization/consensus algorithms, performance budgets, testing matrix, and how it integrates with Scanner, Policy, Feedser, and the attestation chain. It is implementation‑ready.
+
+---
+
+## 0) Mission & role in the platform
+
+**Mission.** Convert heterogeneous **VEX** statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into **canonical, queryable claims**; compute **deterministic consensus** per *(vuln, product)*; preserve **conflicts with provenance**; publish **stable, attestable exports** that the backend uses to suppress non‑exploitable findings, prioritize remaining risk, and explain decisions.
+
+**Boundaries.**
+
+* Vexer **does not** decide PASS/FAIL. It supplies **evidence** (statuses + justifications + provenance weights).
+* Vexer preserves **conflicting claims** unchanged; consensus encodes how we would pick, but the raw set is always exportable.
+* VEX consumption is **backend‑only**: Scanner never applies VEX. The backend’s **Policy Engine** asks Vexer for status evidence and then decides what to show.
+
+---
+
+## 1) Inputs, outputs & canonical domain
+
+### 1.1 Accepted input formats (ingest)
+
+* **OpenVEX** JSON documents (attested or raw).
+* **CSAF VEX** 2.x (vendor PSIRTs and distros commonly publish CSAF).
+* **CycloneDX VEX** 1.4+ (standalone VEX or embedded VEX blocks).
+* **OCI‑attached attestations** (VEX statements shipped as OCI referrers) — optional connectors.
+
+All connectors register **source metadata**: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors.
+
+### 1.2 Canonical model (normalized)
+
+Every incoming statement becomes a set of **VexClaim** records:
+
+```
+VexClaim
+- providerId           // 'redhat', 'suse', 'ubuntu', 'github', 'vendorX'
+- vulnId               // 'CVE-2025-12345', 'GHSA-xxxx', canonicalized
+- productKey           // canonical product identity (see §2.2)
+- status               // affected | not_affected | fixed | under_investigation
+- justification?       // for 'not_affected'/'affected' where provided
+- introducedVersion?   // semantics per provider (range or exact)
+- fixedVersion?        // where provided (range or exact)
+- lastObserved         // timestamp from source or fetch time
+- provenance           // doc digest, signature status, fetch URI, line/offset anchors
+- evidence[]           // raw source snippets for explainability
+- supersedes?          // optional cross-doc chain (docDigest → docDigest)
+```
+
+### 1.3 Exports (consumption)
+
+* **VexConsensus** per `(vulnId, productKey)` with:
+
+  * `rollupStatus` (after policy weights/justification gates),
+  * `sources[]` (winning + losing claims with weights & reasons),
+  * `policyRevisionId` (identifier of the Vexer policy used),
+  * `consensusDigest` (stable SHA‑256 over canonical JSON).
+* **Raw claims** export for auditing (unchanged, with provenance).
+* **Provider snapshots** (per source, last N days) for operator debugging.
+* **Index** optimized for backend joins: `(productKey, vulnId) → (status, confidence, sourceSet)`.
+
+All exports are **deterministic**, and (optionally) **attested** via DSSE and logged to Rekor v2.
+
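A minimal sketch of how a stable `consensusDigest` could be derived, written in Python for brevity. The helper names `canonical_json` and `consensus_digest` are hypothetical, and the canonicalization rules used here (sorted keys, UTF-8 without BOM, no insignificant whitespace) are assumptions for illustration rather than the service's actual serializer.

```python
import hashlib
import json


def canonical_json(value) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, as UTF-8 without BOM."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def consensus_digest(entry: dict) -> str:
    """Hex-encoded SHA-256 over the canonical bytes of an entry."""
    return hashlib.sha256(canonical_json(entry)).hexdigest()


# Two logically identical entries with different key order (illustrative values).
a = {"vulnId": "CVE-2025-12345", "productKey": "pkg:rpm/redhat/openssl", "rollupStatus": "fixed"}
b = {"rollupStatus": "fixed", "productKey": "pkg:rpm/redhat/openssl", "vulnId": "CVE-2025-12345"}
print(consensus_digest(a) == consensus_digest(b))
```

Because key order and whitespace are normalized before hashing, the digest depends only on the entry's content, which is what makes exports comparable across runs.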
+---
+
+## 2) Identity model — products & joins
+
+### 2.1 Vuln identity
+
+* Accepts **CVE**, **GHSA**, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to `vulnId` with alias sets.
+* **Alias graph** maintained (from Feedser) to map vendor/distro IDs → CVE (primary) and to **GHSA** where applicable.
+
+### 2.2 Product identity (`productKey`)
+
+* **Primary:** `purl` (Package URL).
+* **Secondary links:** `cpe`, **OS package NVRA/EVR**, NuGet/Maven/Golang identity, and **OS package name** when purl unavailable.
+* **Fallback:** `oci:<registry>/<repo>@<digest>` for image‑level VEX.
+* **Special cases:** kernel modules, firmware, platforms → provider‑specific mapping helpers (connector captures provider’s product taxonomy → canonical `productKey`).
+
+> Vexer does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native **product string** and mark the claim as **non‑joinable**; the backend will ignore it unless a policy explicitly whitelists that provider mapping.
+
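The precedence in §2.2 can be sketched as a small resolver, shown in Python for brevity. `ProductRef` and `resolve_product_key` are hypothetical names; the sketch covers only purl, CPE, OCI reference, and the native product string, and leaves out the NVRA/EVR and platform mapping helpers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProductRef:
    """Identifiers a provider may supply for one product (all optional)."""
    purl: Optional[str] = None
    cpe: Optional[str] = None
    oci_ref: Optional[str] = None   # "<registry>/<repo>@<digest>"
    native: Optional[str] = None    # provider's own product string


def resolve_product_key(ref: ProductRef) -> Tuple[str, bool]:
    """Return (productKey, joinable), trying identifiers in precedence order."""
    if ref.purl:
        return ref.purl, True
    if ref.cpe:
        return ref.cpe, True
    if ref.oci_ref:
        return f"oci:{ref.oci_ref}", True
    if ref.native:
        # No deterministic mapping: keep the native string, flag as non-joinable.
        return ref.native, False
    raise ValueError("claim carries no product identity")
```

Returning the joinable flag alongside the key keeps the "do not invent identities" rule explicit: an unmapped product is still stored, but the caller can see that it must not be joined by default.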
+---
+
+## 3) Storage schema (MongoDB)
+
+Database: `vexer`
+
+### 3.1 Collections
+
+**`vex.providers`**
+
+```
+_id: providerId
+name, homepage, contact
+trustTier: enum {vendor, distro, platform, hub, attestation}
+signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] }
+fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays }
+enabled: bool
+createdAt, modifiedAt
+```
+
+**`vex.raw`** (immutable raw documents)
+
+```
+_id: sha256(doc bytes)
+providerId
+uri
+ingestedAt
+contentType
+sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? }
+payload: GridFS pointer (if large)
+disposition: kept|replaced|superseded
+correlation: { replaces?: sha256, replacedBy?: sha256 }
+```
+
+**`vex.claims`** (normalized rows; dedupe on providerId+vulnId+productKey+docDigest)
+
+```
+_id
+providerId
+vulnId
+productKey
+status
+justification?
+introducedVersion?
+fixedVersion?
+lastObserved
+docDigest
+provenance { uri, line?, pointer?, signatureState }
+evidence[] { key, value, locator }
+indices: 
+  - {vulnId:1, productKey:1}
+  - {providerId:1, lastObserved:-1}
+  - {status:1}
+  - text index (optional) on evidence.value for debugging
+```
+
+**`vex.consensus`** (rollups)
+
+```
+_id: sha256(canonical(vulnId, productKey, policyRevision))
+vulnId
+productKey
+rollupStatus
+sources[]: [
+  { providerId, status, justification?, weight, lastObserved, accepted:bool, reason }
+]
+policyRevisionId
+evaluatedAt
+consensusDigest  // same as _id
+indices:
+  - {vulnId:1, productKey:1}
+  - {policyRevisionId:1, evaluatedAt:-1}
+```
+
+**`vex.exports`** (manifest of emitted artifacts)
+
+```
+_id
+querySignature
+format: raw|consensus|index
+artifactSha256
+rekor { uuid, index, url }?
+createdAt
+policyRevisionId
+cacheable: bool
+```
+
+**`vex.cache`**
+
+```
+querySignature -> exportId (for fast reuse)
+ttl, hits
+```
+
+**`vex.migrations`**
+
+* ordered migrations applied at bootstrap to ensure indexes.
+
+### 3.2 Indexing strategy
+
+* Hot path queries use exact `(vulnId, productKey)` and time‑bounded windows; compound indexes cover both.
+* Providers list view by `lastObserved` for monitoring staleness.
+* `vex.consensus` keyed by `(vulnId, productKey, policyRevision)` for deterministic reuse.
+
+---
+
+## 4) Ingestion pipeline
+
+### 4.1 Connector contract
+
+```csharp
+public interface IVexConnector
+{
+    string ProviderId { get; }
+    Task FetchAsync(VexConnectorContext ctx, CancellationToken ct);   // raw docs
+    Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> VexClaim[]
+}
+```
+
+* **Fetch** must implement: window scheduling, conditional GET (ETag/If‑Modified‑Since), rate limiting, retry/backoff.
+* **Normalize** parses the format, validates schema, maps product identities deterministically, emits `VexClaim` records with **provenance**.
+
+### 4.2 Signature verification (per provider)
+
+* **cosign (keyless or keyful)** for OCI referrers or HTTP‑served JSON with Sigstore bundles.
+* **PGP** (provider keyrings) for distro/vendor feeds that sign docs.
+* **x509** (mutual TLS / provider‑pinned certs) where applicable.
+* Signature state is stored on **vex.raw.sig** and copied into **provenance.signatureState** on claims.
+
+> Claims from sources failing signature policy are marked `"signatureState.verified=false"` and **policy** can down‑weight or ignore them.
+
+### 4.3 Time discipline
+
+* For each doc, prefer **provider’s document timestamp**; if absent, use fetch time.
+* Claims carry `lastObserved`, which drives **tie‑breaking** within equal weight tiers.
+
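The timestamp rule in §4.3 is small enough to sketch directly (Python, for brevity). The function name `last_observed` is hypothetical, and treating naive timestamps as UTC is an assumption that the document does not state.

```python
from datetime import datetime, timezone
from typing import Optional


def last_observed(doc_timestamp: Optional[datetime], fetched_at: datetime) -> datetime:
    """Prefer the provider's document timestamp; fall back to fetch time."""
    chosen = doc_timestamp if doc_timestamp is not None else fetched_at
    if chosen.tzinfo is None:
        # Assumption: timestamps without an offset are interpreted as UTC.
        chosen = chosen.replace(tzinfo=timezone.utc)
    return chosen.astimezone(timezone.utc)
```

Normalizing to UTC at this point means later comparisons between claims from different providers do not depend on each provider's local offset.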
+---
+
+## 5) Normalization: product & status semantics
+
+### 5.1 Product mapping
+
+* **purl** first; **cpe** second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb).
+* Where a provider publishes **platform‑level** VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits **evidence** indicating the rule applied.
+* If expansion would be speculative, the claim remains **platform‑scoped** with `productKey="platform:redhat:rhel:9"` and is flagged **non‑joinable**; backend can decide to use platform VEX only when Scanner proves the platform runtime.
+
+### 5.2 Status + justification mapping
+
+* Canonical **status**: `affected | not_affected | fixed | under_investigation`.
+* **Justifications** normalized to a controlled vocabulary (CISA‑aligned), e.g.:
+
+  * `component_not_present`
+  * `vulnerable_code_not_in_execute_path`
+  * `vulnerable_configuration_unused`
+  * `inline_mitigation_applied`
+  * `fix_available` (with `fixedVersion`)
+  * `under_investigation`
+* Providers with free‑text justifications are mapped by deterministic tables; raw text preserved as `evidence`.
+
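A minimal sketch of the deterministic table mapping for free-text justifications, in Python for brevity. The table contents, the `justification.raw` evidence key, and the function name are all hypothetical; real tables would be versioned per provider.

```python
from typing import Optional, Tuple

# Hypothetical mapping table: normalized free text -> controlled vocabulary.
JUSTIFICATION_TABLE = {
    "component not present": "component_not_present",
    "vulnerable code not in execute path": "vulnerable_code_not_in_execute_path",
    "inline mitigation applied": "inline_mitigation_applied",
}


def normalize_justification(raw: str) -> Tuple[Optional[str], dict]:
    """Map free text to a canonical justification and keep the raw text as evidence."""
    key = " ".join(raw.lower().split())        # case-fold and collapse whitespace
    canonical = JUSTIFICATION_TABLE.get(key)   # None when no table entry matches
    evidence = {"key": "justification.raw", "value": raw}
    return canonical, evidence
```

An unmatched string yields `None` instead of a guess, so the original wording still reaches the evidence list without being forced into the vocabulary.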
+---
+
+## 6) Consensus algorithm
+
+**Goal:** produce a **stable**, explainable `rollupStatus` per `(vulnId, productKey)` given possibly conflicting claims.
+
+### 6.1 Inputs
+
+* Set **S** of `VexClaim` for the key.
+* **Vexer policy snapshot**:
+
+  * **weights** per provider tier and per provider overrides.
+  * **justification gates** (e.g., require justification for `not_affected` to be acceptable).
+  * **minEvidence** rules (e.g., `not_affected` must come from ≥1 vendor or 2 distros).
+  * **signature requirements** (e.g., require verified signature for ‘fixed’ to be considered).
+
+### 6.2 Steps
+
+1. **Filter invalid** claims by signature policy & justification gates → set `S'`.
+2. **Score** each claim:
+   `score = weight(provider) * freshnessFactor(lastObserved)` where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect).
+3. **Aggregate** scores per status: `W(status) = Σ score(claims with that status)`.
+4. **Pick** `rollupStatus = argmax_status W(status)`.
+5. **Tie‑breakers** (in order):
+
+   * Higher **max single** provider score wins (vendor > distro > platform > hub).
+   * More **recent** lastObserved wins.
+   * Fixed, deterministic precedence of status (`fixed` > `not_affected` > `under_investigation` > `affected`) as final tiebreaker.
+6. **Explain**: mark accepted sources (`accepted=true; reason="weight"`/`"freshness"`), mark rejected sources with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`).
+
+> The algorithm is **pure** given S and policy snapshot; result is reproducible and hashed into `consensusDigest`.
+
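The steps in §6.2 can be sketched as one pure function, shown in Python for brevity. The `Claim` and `Policy` shapes, the linear decay, and the `stale_after_days` knob are assumptions made for illustration. The `minEvidence` rules are omitted, and exact float equality is used for ties, which a real implementation would probably want to treat more carefully.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple

# Final tiebreaker from step 5: earlier in this list wins.
STATUS_PRECEDENCE = ["fixed", "not_affected", "under_investigation", "affected"]


@dataclass(frozen=True)
class Claim:
    provider_id: str
    tier: str                      # vendor | distro | platform | hub | attestation
    status: str
    last_observed: datetime        # timezone-aware
    justification: Optional[str] = None
    signature_verified: bool = False


@dataclass(frozen=True)
class Policy:
    weights: Dict[str, float]
    overrides: Dict[str, float] = field(default_factory=dict)
    require_justification_for_not_affected: bool = True
    signature_required_for_fixed: bool = True
    stale_after_days: float = 30.0  # hypothetical knob for the decay window


def freshness(claim: Claim, now: datetime, policy: Policy) -> float:
    """Linear decay from 1.0 (fresh) down to 0.8 (stale_after_days or older)."""
    age_days = (now - claim.last_observed).total_seconds() / 86400.0
    staleness = min(max(age_days / policy.stale_after_days, 0.0), 1.0)
    return 1.0 - 0.2 * staleness


def rejection_reason(claim: Claim, policy: Policy) -> Optional[str]:
    """Step 1: signature and justification gates."""
    if (claim.status == "fixed" and policy.signature_required_for_fixed
            and not claim.signature_verified):
        return "signature_unverified"
    if (claim.status == "not_affected" and policy.require_justification_for_not_affected
            and not claim.justification):
        return "insufficient_justification"
    return None


def consensus(claims: List[Claim], policy: Policy,
              now: datetime) -> Tuple[Optional[str], List[dict]]:
    totals: Dict[str, float] = defaultdict(float)   # W(status)
    best: Dict[str, float] = defaultdict(float)     # max single score per status
    newest: Dict[str, datetime] = {}
    sources: List[dict] = []
    for c in sorted(claims, key=lambda c: (c.provider_id, c.status)):
        entry = {"providerId": c.provider_id, "status": c.status}
        reason = rejection_reason(c, policy)
        if reason is not None:
            entry.update(accepted=False, reason=reason, weight=0.0)
        else:
            weight = policy.overrides.get(c.provider_id, policy.weights[c.tier])
            score = weight * freshness(c, now, policy)           # step 2
            totals[c.status] += score                            # step 3
            best[c.status] = max(best[c.status], score)
            newest[c.status] = max(newest.get(c.status, c.last_observed), c.last_observed)
            entry.update(accepted=None, reason=None, weight=score)
        sources.append(entry)
    if not totals:
        return None, sources
    # Steps 4 and 5: highest total, then max single score, recency, precedence.
    winner = min(totals, key=lambda s: (-totals[s], -best[s], -newest[s].timestamp(),
                                        STATUS_PRECEDENCE.index(s)))
    for entry in sources:                                        # step 6
        if entry["accepted"] is None:
            entry["accepted"] = entry["status"] == winner
            entry["reason"] = "weight" if entry["accepted"] else "lower_weight"
    return winner, sources


now = datetime(2025, 10, 28, tzinfo=timezone.utc)
policy = Policy(weights={"vendor": 1.0, "distro": 0.9, "platform": 0.7,
                         "hub": 0.5, "attestation": 0.6})
claims = [
    Claim("vendorX", "vendor", "fixed", now),  # unsigned, so gated out in step 1
    Claim("ubuntu", "distro", "not_affected", now, justification="component_not_present"),
    Claim("hubY", "hub", "affected", now - timedelta(days=15)),
]
status, sources = consensus(claims, policy, now)
print(status, [(s["providerId"], s["reason"]) for s in sources])
```

Sorting the claims before scoring and passing `now` in explicitly keeps the function free of hidden inputs, so the same claim set and policy snapshot always produce the same `sources[]` ordering and rollup.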
+---
+
+## 7) Query & export APIs
+
+All endpoints are versioned under `/api/v1/vex`.
+
+### 7.1 Query (online)
+
+```
+POST /claims/search
+  body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
+  → { claims[], nextPageToken? }
+
+POST /consensus/search
+  body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string }
+  → { entries[], nextPageToken? }
+
+POST /excititor/resolve (scope: vex.read)
+  body: { productKeys?: string[], purls?: string[], vulnerabilityIds: string[], policyRevisionId?: string }
+  → { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, sources[], conflicts[], decisions[], signals?, summary?, envelope: { artifact, contentSignature?, attestation?, attestationEnvelope?, attestationSignature? } } ] }
+```
+
+### 7.2 Exports (cacheable snapshots)
+
+```
+POST /exports
+  body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool }
+  → { exportId, artifactSha256, rekor? }
+
+GET  /exports/{exportId}        → bytes (application/json or binary index)
+GET  /exports/{exportId}/meta   → { signature, policyRevisionId, createdAt, artifactSha256, rekor? }
+```
+
+### 7.3 Provider operations
+
+```
+GET  /providers                  → provider list & signature policy
+POST /providers/{id}/refresh     → trigger fetch/normalize window
+GET  /providers/{id}/status      → last fetch, doc counts, signature stats
+```
+
+**Auth:** service‑to‑service via Authority tokens; operator operations via UI/CLI with RBAC.
+
+---
+
+## 8) Attestation integration
+
+* Exports can be **DSSE‑signed** via **Signer** and logged to **Rekor v2** via **Attestor** (optional but recommended for regulated pipelines).
+* `vex.exports.rekor` stores `{uuid, index, url}` when present.
+* **Predicate type**: `https://stella-ops.org/attestations/vex-export/1` with fields:
+
+  * `querySignature`, `policyRevisionId`, `artifactSha256`, `createdAt`.
+
+---
+
+## 9) Configuration (YAML)
+
+```yaml
+vexer:
+  mongo: { uri: "mongodb://mongo/vexer" }
+  s3:
+    endpoint: http://minio:9000
+    bucket: stellaops
+  policy:
+    weights:
+      vendor: 1.0
+      distro: 0.9
+      platform: 0.7
+      hub: 0.5
+      attestation: 0.6
+    providerOverrides:
+      redhat: 1.0
+      suse: 0.95
+    requireJustificationForNotAffected: true
+    signatureRequiredForFixed: true
+    minEvidence:
+      not_affected:
+        vendorOrTwoDistros: true
+  connectors:
+    - providerId: redhat
+      kind: csaf
+      baseUrl: https://access.redhat.com/security/data/csaf/v2/
+      signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] }
+      windowDays: 7
+    - providerId: suse
+      kind: csaf
+      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
+      signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] }
+    - providerId: ubuntu
+      kind: openvex
+      baseUrl: https://…/vex/
+      signaturePolicy: { type: none }
+    - providerId: vendorX
+      kind: cyclonedx-vex
+      ociRef: ghcr.io/vendorx/vex@sha256:…
+      signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] }
+```
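+
+How `weights` and `providerOverrides` combine is not spelled out in the configuration itself. A minimal Python sketch, assuming an override replaces the tier weight outright and that the caller already knows each provider's tier (`effective_weight` and the tier assignments in the usage note are hypothetical):
+
+```python
+def effective_weight(provider_id: str, tier: str, policy: dict) -> float:
+    """Resolve a provider's weight: explicit override first, else its tier weight."""
+    overrides = policy.get("providerOverrides", {})
+    if provider_id in overrides:  # assumption: overrides win over tier weights
+        return float(overrides[provider_id])
+    return float(policy["weights"][tier])
+
+
+# Mirrors the `policy` block of the YAML above.
+policy = {
+    "weights": {"vendor": 1.0, "distro": 0.9, "platform": 0.7, "hub": 0.5, "attestation": 0.6},
+    "providerOverrides": {"redhat": 1.0, "suse": 0.95},
+}
+```
+
+Under these assumptions, `effective_weight("suse", "distro", policy)` yields the override `0.95`, while a provider without an override, such as `ubuntu` in the `distro` tier, falls back to `0.9`.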
+
+---
+
+## 10) Security model
+
+* **Input signature verification** enforced per provider policy (PGP, cosign, x509).
+* **Connector allowlists**: outbound fetch constrained to configured domains.
+* **Tenant isolation**: per‑tenant DB prefixes or separate DBs; per‑tenant S3 prefixes; per‑tenant policies.
+* **AuthN/Z**: Authority‑issued OpToks; RBAC roles (`vex.read`, `vex.admin`, `vex.export`).
+* **No secrets in logs**; deterministic logging contexts include providerId, docDigest, claim keys.
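+
+The connector allowlist can be illustrated with a small Python check. This is a sketch under stated assumptions: matching is exact on host name, only `https` is permitted, and `is_fetch_allowed` is a hypothetical helper rather than the service's actual enforcement point.
+
+```python
+from typing import AbstractSet
+from urllib.parse import urlsplit
+
+
+def is_fetch_allowed(url: str, allowed_hosts: AbstractSet[str]) -> bool:
+    """Allow only https URLs whose host exactly matches a configured connector domain."""
+    parts = urlsplit(url)
+    # `hostname` excludes userinfo and port, so "allowed.host@evil.example" is judged by its real host.
+    host = (parts.hostname or "").lower()
+    return parts.scheme == "https" and host in allowed_hosts
+```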
+
+---
+
+## 11) Performance & scale
+
+* **Targets:**
+
+  * Normalize 10k VEX claims/minute/core.
+  * Consensus compute ≤ 50 ms for 1k unique `(vuln, product)` pairs in hot cache.
+  * Export (consensus) 1M rows in ≤ 60 s on 8 cores with streaming writer.
+
+* **Scaling:**
+
+  * WebService handles control APIs; **Worker** background services (same image) run fetch/normalize in parallel under rate limits; Mongo writes are batched and upserted by natural keys.
+  * Exports stream straight to S3 (MinIO) with rolling buffers.
+
+* **Caching:**
+
+  * `vex.cache` maps query signatures → export; entries carry a TTL to avoid stampedes; a cached export is reused optimistically unless the request sets `force`.
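+
+For a cache keyed by query signature to be effective, equivalent requests must map to the same key. One way to get that, sketched in Python, is to normalize the signature and hash it. The normalization rules and the `cache_key` helper are assumptions for illustration, not the documented scheme.
+
+```python
+import hashlib
+import json
+
+
+def cache_key(signature: dict, fmt: str, policy_revision_id: str = "") -> str:
+    """Hash a normalized query signature so equivalent requests share one export."""
+    normalized = {
+        "format": fmt,
+        "policyRevisionId": policy_revision_id,
+        # Sort list-valued filters so element order does not change the key.
+        "signature": {k: sorted(v) if isinstance(v, list) else v for k, v in signature.items()},
+    }
+    blob = json.dumps(normalized, sort_keys=True, separators=(",", ":")).encode("utf-8")
+    return hashlib.sha256(blob).hexdigest()
+```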
+
+---
+
+## 12) Observability
+
+* **Metrics:**
+
+  * `vex.ingest.docs_total{provider}`
+  * `vex.normalize.claims_total{provider}`
+  * `vex.signature.failures_total{provider,method}`
+  * `vex.consensus.conflicts_total{vulnId}`
+  * `vex.exports.bytes{format}` / `vex.exports.latency_seconds`
+* **Tracing:** spans for fetch, verify, parse, map, consensus, export.
+* **Dashboards:** provider staleness, top conflicting vulns/components, signature posture, export cache hit‑rate.
+
+---
+
+## 13) Testing matrix
+
+* **Connectors:** golden raw docs → deterministic claims (fixtures per provider/format).
+* **Signature policies:** valid/invalid PGP/cosign/x509 samples; ensure rejected documents are recorded but never accepted.
+* **Normalization edge cases:** platform‑only claims, free‑text justifications, non‑purl products.
+* **Consensus:** conflict scenarios across tiers; check tie‑breakers; justification gates.
+* **Performance:** 1M‑row export timing; memory ceilings; stream correctness.
+* **Determinism:** same inputs + policy → identical `consensusDigest` and export bytes.
+* **API contract tests:** pagination, filters, RBAC, rate limits.
+
+---
+
+## 14) Integration points
+
+* **Backend Policy Engine** (in Scanner.WebService): calls `POST /excititor/resolve` (scope `vex.read`) with batched `(purl, vulnId)` pairs to fetch `rollupStatus + sources`.
+* **Feedser**: provides alias graph (CVE↔vendor IDs) and may supply VEX‑adjacent metadata (e.g., KEV flag) for policy escalation.
+* **UI**: VEX explorer screens use `/claims/search` and `/consensus/search`; show conflicts & provenance.
+* **CLI**: `stellaops vex export --consensus --since 7d --out vex.json` for audits.
+
+---
+
+## 15) Failure modes & fallback
+
+* **Provider unreachable:** staleness thresholds trigger warnings; policy can down‑weight stale providers automatically (freshness factor).
+* **Signature outage:** continue to ingest but mark `signatureState.verified=false`; consensus then excludes or down‑weights those claims per policy.
+* **Schema drift:** unknown fields are preserved as `evidence`; normalization rejects a document only when its **identity** or **status** is invalid.
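+
+The freshness factor is named above but not defined. Purely as an illustration of one possible shape, the Python sketch below keeps a provider at full weight until a staleness threshold and then halves the factor once per half-life. The decay curve, both default durations, and the `freshness_factor` helper are all assumptions.
+
+```python
+from datetime import datetime, timedelta
+
+
+def freshness_factor(
+    last_fetch: datetime,
+    now: datetime,
+    stale_after: timedelta = timedelta(days=7),  # assumed threshold
+    half_life: timedelta = timedelta(days=7),    # assumed decay rate
+) -> float:
+    """Return 1.0 while the provider is fresh, then halve once per half_life of staleness."""
+    overdue = (now - last_fetch) - stale_after
+    if overdue <= timedelta(0):
+        return 1.0
+    return 0.5 ** (overdue / half_life)
+```
+
+A policy could multiply a provider's configured weight by this factor before consensus, so that a provider which has been unreachable for a long time gradually loses influence instead of being cut off abruptly.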
+
+---
+
+## 16) Rollout plan (incremental)
+
+1. **MVP**: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + `/excititor/resolve`.
+2. **Signature policies**: PGP for distros; cosign for OCI.
+3. **Exports + optional attestation**.
+4. **CycloneDX VEX** connectors; platform claim expansion tables; UI explorer.
+5. **Scale hardening**: export indexes; conflict analytics.
+
+---
+
+## 17) Appendix — canonical JSON (stable ordering)
+
+All exports and consensus entries are serialized via `VexCanonicalJsonSerializer`:
+
+* UTF‑8 without BOM;
+* keys sorted (ASCII);
+* arrays sorted by `(providerId, vulnId, productKey, lastObserved)` unless a semantic order is mandated;
+* timestamps in UTC as `YYYY-MM-DDThh:mm:ssZ`;
+* no insignificant whitespace.
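+
+These rules can be approximated in a few lines of Python. This is a sketch of the listed rules, not a port of `VexCanonicalJsonSerializer`, and the helper names are hypothetical. Python's `sort_keys` orders by Unicode code point, which matches ASCII order only when keys are ASCII.
+
+```python
+import json
+from datetime import datetime, timezone
+
+
+def canonical_timestamp(ts: datetime) -> str:
+    """Render a timezone-aware datetime as UTC with second precision and a Z suffix."""
+    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+
+
+def sort_claims(claims: list) -> list:
+    """Order array elements by the documented tuple so output ignores input order."""
+    return sorted(
+        claims,
+        key=lambda c: (c["providerId"], c["vulnId"], c["productKey"], c["lastObserved"]),
+    )
+
+
+def canonical_json(value) -> bytes:
+    """Serialize with sorted keys and no insignificant whitespace, as UTF-8 without BOM."""
+    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
+    return text.encode("utf-8")  # the plain "utf-8" codec never emits a BOM
+```
+
+Hashing the bytes returned by `canonical_json` is what makes a digest reproducible: two logically equal documents serialize to identical bytes regardless of the key or element order in which they were built.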
+