Refactor code structure for improved readability and maintainability; optimize performance in key functions.

2025-12-22 19:06:31 +02:00
parent dfaa2079aa
commit 4602ccc3a3
1444 changed files with 109919 additions and 8058 deletions
--- a/docs/modules/vexhub/architecture.md
+++ b/docs/modules/vexhub/architecture.md
@@ -0,0 +1,72 @@
+# VexHub Architecture
+
+> **Scope.** Architecture and operational contract for the VexHub aggregation service that normalizes, validates, and distributes VEX statements with deterministic, offline-friendly outputs.
+
+## 1) Purpose
+VexHub collects VEX statements from multiple upstream sources, validates and normalizes them, detects conflicts, and exposes a distribution API for internal services and external tools (Trivy/Grype). It is the canonical aggregation layer that feeds VexLens trust scoring and Policy Engine decisioning.
+
+## 2) Responsibilities
+- Scheduled ingestion of upstream VEX sources (connectors + mirrored feeds).
+- Canonical normalization to OpenVEX-compatible structures.
+- Validation pipeline (schema + signature/provenance checks).
+- Conflict detection and provenance capture.
+- Distribution API for CVE/PURL/source queries and bulk exports.
+
+Non-goals: policy decisioning (Policy Engine), consensus computation (VexLens), raw ingestion guardrails (Excititor AOC).
+
+## 3) Component Model
+- **VexHub.WebService**: Minimal API host for distribution endpoints and admin controls.
+- **VexHub.Worker**: Background workers for ingestion schedules and validation pipelines.
+- **Normalization Pipeline**: Canonicalizes statements, deduplicates, and links provenance.
+- **Validation Pipeline**: Schema validation (OpenVEX/CycloneDX/CSAF) and signature checks.
+- **Storage**: PostgreSQL schema `vexhub` for normalized statements, provenance, conflicts, and export cursors.
+
+## 4) Data Model (Draft)
+- `vexhub.statement`
+  - `id`, `source_id`, `vuln_id`, `product_key`, `status`, `justification`, `timestamp`, `statement_hash`
+- `vexhub.provenance`
+  - `statement_id`, `issuer`, `signature_valid`, `signature_ref`, `source_uri`, `ingested_at`
+- `vexhub.conflict`
+  - `vuln_id`, `product_key`, `statement_ids[]`, `detected_at`, `reason`
+- `vexhub.export_cursor`
+  - `source_id`, `last_exported_at`, `snapshot_hash`
+
+All tables must include `tenant_id`, UTC timestamps, and deterministic ordering keys.
+
+## 5) API Surface (Draft)
+- `GET /api/v1/vex/cve/{cve-id}`
+- `GET /api/v1/vex/package/{purl}`
+- `GET /api/v1/vex/source/{source-id}`
+- `GET /api/v1/vex/export` (bulk OpenVEX feed)
+- `GET /api/v1/vex/index` (vex-index.json)
+
+Responses are deterministic: stable ordering by `timestamp DESC`, then `source_id ASC`, then `statement_hash ASC`.
+
+## 6) Determinism & Offline Posture
+- Ingestion runs against frozen snapshots where possible; all outputs include `snapshot_hash`.
+- Canonical JSON serialization with stable key ordering.
+- No network egress outside configured connectors (sealed mode supported).
+- Bulk exports are immutable and content-addressed.
+
+## 7) Security & Auth
+- API access requires Authority scopes (`vexhub.read`, `vexhub.admin`).
+- Signature verification follows issuer registry rules; failures are surfaced as metadata rather than silently dropped.
+- Rate limiting is enforced at the API gateway and per client token.
+
+## 8) Observability
+- Metrics: `vexhub_ingest_total`, `vexhub_validation_failures_total`, `vexhub_conflicts_total`, `vexhub_export_duration_seconds`.
+- Logs: include `tenant_id`, `source_id`, `statement_hash`, and `trace_id`.
+- Traces: spans for ingestion, normalization, validation, export.
+
+## 9) Integration Points
+- **Excititor**: upstream connectors provide source payloads and trust hints.
+- **VexLens**: consumes normalized statements and provenance for trust scoring and consensus.
+- **Policy Engine**: reads VexLens consensus results rather than querying VexHub directly; VexHub provides external distribution.
+- **UI**: the VEX conflict studio will consume the conflict API once it is available.
+
+## 10) Testing Strategy
+- Unit tests for normalization and validation pipelines.
+- Integration tests with Postgres for ingestion and API outputs.
+- Determinism tests comparing repeated exports with identical inputs.
+
+*Last updated: 2025-12-22.*
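Note on sections 5 and 6 of the added file: the ordering rule (`timestamp DESC`, then `source_id ASC`, then `statement_hash ASC`) and the canonical-JSON plus content-addressing requirements can be checked together in a few lines. The sketch below is illustrative only and is not taken from the VexHub code base. The function names (`order_statements`, `canonical_json`, `snapshot_hash`), the `sha256:` prefix, and the sample rows are all assumptions, and `json.dumps(..., sort_keys=True)` is a simple stand-in for whatever canonicalization scheme the service actually adopts (it is not a full RFC 8785 implementation).

```python
import hashlib
import json
from datetime import datetime


def order_statements(statements):
    """Order by timestamp DESC, then source_id ASC, then statement_hash ASC."""
    # Python's sort is stable, so sort by the ascending tie-breakers first...
    by_tiebreak = sorted(
        statements, key=lambda s: (s["source_id"], s["statement_hash"])
    )
    # ...then by timestamp descending; rows with equal timestamps keep the
    # tie-break order established above. Timestamps are assumed to be
    # ISO 8601 strings with an explicit UTC offset.
    return sorted(
        by_tiebreak,
        key=lambda s: datetime.fromisoformat(s["timestamp"]),
        reverse=True,
    )


def canonical_json(value):
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def snapshot_hash(statements):
    """Content address of an export: SHA-256 over the ordered, canonical bytes."""
    digest = hashlib.sha256(canonical_json(order_statements(statements)))
    return "sha256:" + digest.hexdigest()


# Hypothetical rows using a subset of the draft `vexhub.statement` columns.
SAMPLE = [
    {"source_id": "src-b", "vuln_id": "CVE-2025-0001", "status": "not_affected",
     "timestamp": "2025-12-20T08:00:00+00:00", "statement_hash": "bb"},
    {"source_id": "src-a", "vuln_id": "CVE-2025-0001", "status": "fixed",
     "timestamp": "2025-12-21T08:00:00+00:00", "statement_hash": "aa"},
    {"source_id": "src-a", "vuln_id": "CVE-2025-0002", "status": "affected",
     "timestamp": "2025-12-20T08:00:00+00:00", "statement_hash": "cc"},
]

if __name__ == "__main__":
    for row in order_statements(SAMPLE):
        print(row["timestamp"], row["source_id"], row["statement_hash"])
    print(snapshot_hash(SAMPLE))
```

Because the hash is computed over the ordered, canonical bytes, feeding the same rows in a different input order yields the same `snapshot_hash`. That property is what the determinism tests in section 10 would compare across repeated exports.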