VexHub Architecture

Scope. Architecture and operational contract for the VexHub aggregation service that normalizes, validates, and distributes VEX statements with deterministic, offline-friendly outputs.

1) Purpose

VexHub collects VEX statements from multiple upstream sources, validates and normalizes them, detects conflicts, and exposes a distribution API for internal services and external tools (Trivy/Grype). It is the canonical aggregation layer that feeds VexLens trust scoring and Policy Engine decisioning.

2) Responsibilities

Scheduled ingestion of upstream VEX sources (connectors + mirrored feeds).
Canonical normalization to OpenVEX-compatible structures.
Validation pipeline (schema + signature/provenance checks).
Conflict detection and provenance capture.
Distribution API for CVE/PURL/source queries and bulk exports.

Non-goals: policy decisioning (Policy Engine), consensus computation (VexLens), raw ingestion guardrails (Excititor AOC).

3) Component Model

VexHub.WebService: Minimal API host for distribution endpoints and admin controls.
VexHub.Worker: Background workers for ingestion schedules and validation pipelines.
Normalization Pipeline: Canonicalizes statements, deduplicates, and links provenance.
Validation Pipeline: Schema validation (OpenVEX/CycloneDX/CSAF) and signature checks.
Storage: PostgreSQL schema vexhub for normalized statements, provenance, conflicts, and export cursors.

4) Data Model (Draft)

vexhub.statement
- id, source_id, vuln_id, product_key, status, justification, timestamp, statement_hash
vexhub.provenance
- statement_id, issuer, signature_valid, signature_ref, source_uri, ingested_at
vexhub.conflict
- vuln_id, product_key, statement_ids[], detected_at, reason
vexhub.export_cursor
- source_id, last_exported_at, snapshot_hash

All tables must include tenant_id, UTC timestamps, and deterministic ordering keys.
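As a rough illustration of how rows in vexhub.conflict might be derived from vexhub.statement, the sketch below groups statements by (vuln_id, product_key) and flags groups whose statuses disagree. The conflict rule itself, the Statement class, the detect_conflicts function, and the reason string format are assumptions made for illustration; the draft does not define what counts as a conflict.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    # Subset of the draft vexhub.statement columns; the class itself is hypothetical.
    id: str
    vuln_id: str
    product_key: str
    status: str


def detect_conflicts(statements):
    """Return one conflict record per (vuln_id, product_key) with differing statuses.

    Assumption: a "conflict" means two or more statements for the same
    vulnerability and product report different status values.
    """
    groups = defaultdict(list)
    for s in statements:
        groups[(s.vuln_id, s.product_key)].append(s)

    conflicts = []
    # Iterate keys in sorted order so the output does not depend on input order.
    for vuln_id, product_key in sorted(groups):
        members = groups[(vuln_id, product_key)]
        statuses = sorted({s.status for s in members})
        if len(statuses) > 1:
            conflicts.append({
                "vuln_id": vuln_id,
                "product_key": product_key,
                "statement_ids": sorted(s.id for s in members),
                "reason": "status_disagreement: " + ",".join(statuses),
            })
    return conflicts
```

Sorting both the group keys and the statement ids keeps the result independent of ingestion order, in line with the determinism requirement.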

5) API Surface (Draft)

GET /api/v1/vex/cve/{cve-id}
GET /api/v1/vex/package/{purl}
GET /api/v1/vex/source/{source-id}
GET /api/v1/vex/export (bulk OpenVEX feed)
GET /api/v1/vex/index (vex-index.json)

Responses are deterministic: stable ordering by timestamp DESC, then source_id ASC, then statement_hash ASC.
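The ordering rule can be sketched as a two-pass stable sort. This is a minimal illustration, assuming timestamps are timezone-aware UTC datetimes; the Statement class and the order_for_response function are hypothetical names, not part of the VexHub codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Statement:
    # Only the fields that take part in ordering; the class is hypothetical.
    source_id: str
    statement_hash: str
    timestamp: datetime  # assumed to be timezone-aware UTC


def order_for_response(statements):
    """Order by timestamp DESC, then source_id ASC, then statement_hash ASC."""
    # Pass 1: apply the ascending tie-breakers.
    by_tiebreak = sorted(statements, key=lambda s: (s.source_id, s.statement_hash))
    # Pass 2: newest first. Python's sort is stable (also with reverse=True),
    # so statements with equal timestamps keep their pass-1 order.
    return sorted(by_tiebreak, key=lambda s: s.timestamp, reverse=True)
```

Because every field in the key is part of the row, two requests over the same data return the same sequence regardless of how the rows came back from storage.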

6) Determinism & Offline Posture

Ingestion runs against frozen snapshots where possible; all outputs include snapshot_hash.
Canonical JSON serialization with stable key ordering.
No network egress outside configured connectors (sealed mode supported).
Bulk exports are immutable and content-addressed.
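A minimal sketch of canonical serialization plus content addressing follows. The document does not name a canonicalization scheme or hash algorithm, so sorted keys with compact separators and SHA-256 are assumptions here, as are the canonical_json and content_address function names. Note that this is not a full RFC 8785 (JCS) implementation; number formatting in particular may differ.

```python
import hashlib
import json


def canonical_json(doc) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace.

    Assumption: this stands in for whatever canonical form VexHub uses.
    """
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def content_address(doc) -> str:
    """Derive an address from the canonical bytes (SHA-256 is assumed)."""
    return "sha256:" + hashlib.sha256(canonical_json(doc)).hexdigest()


# Key insertion order does not affect the address.
a = {"vuln_id": "CVE-2025-0001", "status": "not_affected"}
b = {"status": "not_affected", "vuln_id": "CVE-2025-0001"}
assert content_address(a) == content_address(b)
```

An export addressed this way can be mirrored offline and verified by recomputing the hash over its canonical bytes.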

7) Security & Auth

API access requires Authority scopes (vexhub.read, vexhub.admin).
Signature verification follows issuer registry rules; failures are surfaced as metadata, not silent drops.
Rate limiting is enforced at the API gateway and per client token.
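The "surfaced as metadata, not silent drops" rule for signature failures could look roughly like the sketch below: a failed check still yields a provenance record, with the outcome recorded on it. The Provenance class, the check_signature function, the verify callback, and the failure_reason field are all hypothetical; the draft data model lists only signature_valid and signature_ref, and the issuer registry interface is not described in this document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Provenance:
    issuer: str
    signature_valid: bool
    signature_ref: Optional[str]
    # Hypothetical extra field; not part of the draft vexhub.provenance table.
    failure_reason: Optional[str] = None


def check_signature(
    issuer: str,
    signature_ref: Optional[str],
    verify: Callable[[str, str], bool],
) -> Provenance:
    """Always return a record; never drop the statement on failure.

    `verify` is a stand-in for the issuer-registry check.
    """
    if signature_ref is None:
        return Provenance(issuer, False, None, "unsigned")
    try:
        ok = verify(issuer, signature_ref)
    except Exception as exc:  # verifier problems are recorded, not swallowed
        return Provenance(issuer, False, signature_ref, f"verifier_error: {exc}")
    return Provenance(issuer, ok, signature_ref, None if ok else "signature_mismatch")
```

Downstream consumers such as VexLens can then weigh or filter on signature_valid themselves instead of never seeing the statement.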

8) Observability

Metrics: vexhub_ingest_total, vexhub_validation_failures_total, vexhub_conflicts_total, vexhub_export_duration_seconds.
Logs: include tenant_id, source_id, statement_hash, and trace_id.
Traces: spans for ingestion, normalization, validation, export.

9) Integration Points

Excititor: upstream connectors provide source payloads and trust hints.
VexLens: consumes normalized statements and provenance for trust scoring and consensus.
Policy Engine: reads VexLens consensus results; VexHub provides external distribution.
UI: the VEX conflict studio will consume the conflict API once it is available.

10) Testing Strategy

Unit tests for normalization and validation pipelines.
Integration tests with Postgres for ingestion and API outputs.
Determinism tests comparing repeated exports with identical inputs.

Last updated: 2025-12-22.
