VexHub Architecture

Scope. Architecture and operational contract for the VexHub aggregation service that normalizes, validates, and distributes VEX statements with deterministic, offline-friendly outputs.

1) Purpose

VexHub collects VEX statements from multiple upstream sources, validates and normalizes them, detects conflicts, and exposes a distribution API for internal services and external tools (Trivy/Grype). It is the canonical aggregation layer that feeds VexLens trust scoring and Policy Engine decisioning.

2) Responsibilities

  • Scheduled ingestion of upstream VEX sources (connectors + mirrored feeds).
  • Canonical normalization to OpenVEX-compatible structures.
  • Validation pipeline (schema + signature/provenance checks).
  • Conflict detection and provenance capture.
  • Distribution API for CVE/PURL/source queries and bulk exports.

Non-goals: policy decisioning (Policy Engine), consensus computation (VexLens), raw ingestion guardrails (Excititor AOC).

3) Component Model

  • VexHub.WebService: Minimal API host for distribution endpoints and admin controls.
  • VexHub.Worker: Background workers for ingestion schedules and validation pipelines.
  • Normalization Pipeline: Canonicalizes statements, deduplicates, and links provenance.
  • Validation Pipeline: Schema validation (OpenVEX/CycloneDX/CSAF) and signature checks.
  • Storage: PostgreSQL schema vexhub for normalized statements, provenance, conflicts, and export cursors.

4) Data Model (Draft)

  • vexhub.statement
    • id, source_id, vuln_id, product_key, status, justification, timestamp, statement_hash
  • vexhub.provenance
    • statement_id, issuer, signature_valid, signature_ref, source_uri, ingested_at
  • vexhub.conflict
    • vuln_id, product_key, statement_ids[], detected_at, reason
  • vexhub.export_cursor
    • source_id, last_exported_at, snapshot_hash

All tables must include tenant_id, UTC timestamps, and deterministic ordering keys.
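
To make the draft model concrete, the sketch below mirrors a subset of the vexhub.statement and vexhub.conflict columns as Python dataclasses and flags a conflict when statements for the same (vuln_id, product_key) pair disagree on status. The document does not define what counts as a conflict, so the status-mismatch rule, the reason string format, and the detect_conflicts helper are assumptions for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    # Subset of the draft vexhub.statement columns.
    id: str
    source_id: str
    vuln_id: str
    product_key: str
    status: str
    statement_hash: str


@dataclass(frozen=True)
class Conflict:
    # Subset of the draft vexhub.conflict columns.
    vuln_id: str
    product_key: str
    statement_ids: tuple
    reason: str


def detect_conflicts(statements):
    """Hypothetical rule: differing statuses for one (vuln_id, product_key)."""
    groups = defaultdict(list)
    for stmt in statements:
        groups[(stmt.vuln_id, stmt.product_key)].append(stmt)

    conflicts = []
    # Iterate keys in sorted order so the output is deterministic.
    for vuln_id, product_key in sorted(groups):
        group = groups[(vuln_id, product_key)]
        statuses = sorted({stmt.status for stmt in group})
        if len(statuses) > 1:
            conflicts.append(
                Conflict(
                    vuln_id=vuln_id,
                    product_key=product_key,
                    statement_ids=tuple(sorted(stmt.id for stmt in group)),
                    reason="status_mismatch: " + ",".join(statuses),
                )
            )
    return conflicts
```

Sorting both the group keys and the statement ids keeps the conflict list stable across runs, in line with the deterministic ordering requirement above.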

5) API Surface (Draft)

  • GET /api/v1/vex/cve/{cve-id}
  • GET /api/v1/vex/package/{purl}
  • GET /api/v1/vex/source/{source-id}
  • GET /api/v1/vex/export (bulk OpenVEX feed)
  • GET /api/v1/vex/index (vex-index.json)
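
A PURL contains characters such as ":" , "/" and "@", so a client placing it in the {purl} path segment will likely need to percent-encode it. The sketch below shows one way to build the request URL; the host name is a placeholder, and whether the service expects a fully percent-encoded PURL is an assumption that this draft does not confirm.

```python
from urllib.parse import quote

# Hypothetical host; substitute the real VexHub base URL.
BASE_URL = "https://vexhub.example.internal"


def package_url(purl: str, base_url: str = BASE_URL) -> str:
    """Build the package lookup URL with the PURL encoded as one path segment."""
    # safe="" also encodes "/" so the PURL cannot be split into extra segments.
    encoded = quote(purl, safe="")
    return f"{base_url}/api/v1/vex/package/{encoded}"
```

For example, package_url("pkg:npm/lodash@4.17.21") yields a path ending in pkg%3Anpm%2Flodash%404.17.21.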

Responses are deterministic: stable ordering by timestamp DESC, then source_id ASC, then statement_hash ASC.
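
The ordering rule above mixes a descending key with two ascending ones. A minimal sketch, assuming statements are plain dicts with timestamp, source_id, and statement_hash keys (the dict shape and the sort_statements name are illustrative), applies it with two stable sorts:

```python
def sort_statements(statements):
    """Order by timestamp DESC, then source_id ASC, then statement_hash ASC."""
    # Pass 1: apply the ascending tie-breakers.
    ordered = sorted(statements, key=lambda s: (s["source_id"], s["statement_hash"]))
    # Pass 2: sort by timestamp descending. Python's sort is stable even with
    # reverse=True, so the tie-breaker order from pass 1 is preserved.
    ordered.sort(key=lambda s: s["timestamp"], reverse=True)
    return ordered
```

Because statement_hash is the final tie-breaker, two statements compare equal only if all three keys match, so the result does not depend on input order as long as hashes are unique within a source and timestamp.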

6) Determinism & Offline Posture

  • Ingestion runs against frozen snapshots where possible; all outputs include snapshot_hash.
  • Canonical JSON serialization with stable key ordering.
  • No network egress outside configured connectors (sealed mode supported).
  • Bulk exports are immutable and content-addressed.
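
The canonical-serialization and content-addressing points can be sketched as follows. This assumes SHA-256 and Python's json module with sorted keys and compact separators; the document does not name a hash algorithm or a canonicalization scheme (this is not a full RFC 8785 implementation), and both function names are illustrative.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def content_address(obj) -> str:
    """Derive an address from the canonical bytes (SHA-256 is an assumption)."""
    return "sha256:" + hashlib.sha256(canonical_json(obj)).hexdigest()
```

Two exports with the same logical content then map to the same address regardless of the key order in which they were assembled, which is the property a determinism test would compare.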

7) Security & Auth

  • API access requires Authority scopes (vexhub.read, vexhub.admin).
  • Signature verification follows issuer registry rules; failures are surfaced as metadata, not silent drops.
  • Rate limiting is enforced at the API gateway and per client token.
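
As an illustration of surfacing signature failures as metadata rather than dropping statements, the sketch below always returns a provenance record and stores the verification outcome in signature_valid. The verify callable stands in for the issuer-registry check, whose real interface is not described here; ProvenanceRecord and check_signature are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    # Subset of the draft vexhub.provenance columns.
    statement_id: str
    issuer: str
    signature_valid: bool
    signature_ref: Optional[str]


def check_signature(
    statement_id: str,
    issuer: str,
    signature_ref: Optional[str],
    verify: Callable[[str, str], bool],
) -> ProvenanceRecord:
    """Record the verification outcome; never discard the statement."""
    # A missing signature is recorded as invalid instead of being skipped.
    valid = signature_ref is not None and bool(verify(issuer, signature_ref))
    return ProvenanceRecord(statement_id, issuer, valid, signature_ref)
```

Callers keep every statement and can filter or down-rank on signature_valid later, so the decision stays visible to downstream consumers.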

8) Observability

  • Metrics: vexhub_ingest_total, vexhub_validation_failures_total, vexhub_conflicts_total, vexhub_export_duration_seconds.
  • Logs: include tenant_id, source_id, statement_hash, and trace_id.
  • Traces: spans for ingestion, normalization, validation, export.
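
A minimal sketch of a structured log line carrying the four required fields is shown below. The JSON layout, the event field, and the format_log helper are assumptions; the document only states which identifiers must be present.

```python
import json


def format_log(event: str, *, tenant_id: str, source_id: str,
               statement_hash: str, trace_id: str) -> str:
    """Render one log line with the required correlation fields."""
    record = {
        "event": event,
        "tenant_id": tenant_id,
        "source_id": source_id,
        "statement_hash": statement_hash,
        "trace_id": trace_id,
    }
    # Sorted keys keep log lines stable and easy to diff.
    return json.dumps(record, sort_keys=True)
```

Making the fields keyword-only means a call site cannot omit or misorder one of them without an immediate error.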

9) Integration Points

  • Excititor: upstream connectors provide source payloads and trust hints.
  • VexLens: consumes normalized statements and provenance for trust scoring and consensus.
  • Policy Engine: reads consensus results from VexLens; VexHub's role here is distribution to external consumers.
  • UI: the VEX conflict studio will consume the conflict API once it is available.

10) Testing Strategy

  • Unit tests for normalization and validation pipelines.
  • Integration tests with Postgres for ingestion and API outputs.
  • Determinism tests comparing repeated exports with identical inputs.

Last updated: 2025-12-22.