Restructure solution layout by module

2025-10-28 15:10:40 +02:00
parent 4e3e575db5
commit 68da90a11a
4103 changed files with 192899 additions and 187024 deletions
--- a/docs/vex/aggregation.md
+++ b/docs/vex/aggregation.md
@@ -1,229 +1,229 @@
-# VEX Observations & Linksets
-
-> Imposed rule: Work of this type or tasks of this type on this component must
-> also be applied everywhere else it should be applied.
-
-Link-Not-Merge brings the same immutable observation model to Excititor that
-Concelier now uses for advisories. VEX statements are stored as append-only
-observations; linksets correlate them, capture conflicts, and keep provenance so
-Policy Engine and UI surfaces can explain decisions without collapsing sources.
-
----
-
-## 1. Model overview
-
-### 1.1 Observation lifecycle
-
-1. **Ingest** – Connectors fetch OpenVEX, CSAF VEX, CycloneDX VEX, or VEX
-   attestations, validate signatures, and strip any derived consensus data
-   forbidden by the Aggregation-Only Contract (AOC).
-2. **Persist** – Excititor writes immutable `vex_observations` keyed by tenant,
-   provider, upstream identifier, and `contentHash`. Supersedes chains record
-   revisions; the original payload is never mutated.
-3. **Expose** – WebService will surface paginated observation APIs and Offline
-   Kit snapshots mirror the same data for air-gapped sites.
-
-Observation schema sketch (final shape lands with `EXCITITOR-LNM-21-001`):
-
-```text
-observationId = {tenant}:{providerId}:{upstreamId}:{revision}
-tenant, providerId, streamId
-upstream{ upstreamId, documentVersion, fetchedAt, receivedAt,
-          contentHash, signature{present, format?, keyId?, signature?} }
-content{ format, specVersion, raw }
-statements[
-  { vulnerabilityId, productKey, status, justification?,
-    introducedVersion?, fixedVersion?, locator }
-]
-linkset{ purls[], cpes[], aliases[], references[],
-         reconciledFrom[], conflicts[]? }
-attributes{ batchId?, replayCursor? }
-createdAt
-```
-
-- **Raw payload** (`content.raw`) remains lossless (Relaxed Extended JSON).
-- **Statements** provide normalized tuples for each claim contained in the
-  document, including justification and version hints.
-- **Linkset** mirrors identifiers extracted during ingestion, retaining JSON
-  pointer metadata so audits can trace back to the source fragment.
-
-### 1.2 Linkset lifecycle
-
-Linksets correlate claims referring to the same `(vulnerabilityId, productKey)`
-pair across providers.
-
-1. **Seed** – Observations push normalized identifiers (CVE, GHSA, vendor IDs)
-   plus canonical product keys (purl preferred, cpe fallback). Platform-scoped
-   statements remain marked `non_joinable`.
-2. **Correlate** – The linkset builder groups statements by tenant and identity,
-   combines alias graphs from Concelier, and uses justification/product overlap
-   to assign correlation confidence.
-3. **Annotate** – Conflicts (status disagreement, justification mismatch, range
-   inconsistencies) are recorded as structured entries.
-4. **Persist** – Results land in `vex_linksets` with deterministic IDs (hash of
-   sorted `(vulnerabilityId, productKey, observationIds)`) and append-only
-   history for replay/debugging.
-
-Linksets never override statements or invent consensus; they simply align
-evidence for Policy Engine and consumers.
-
----
-
-## 2. Observation vs. linkset
-
-- **Purpose**
-  - Observation: Immutable record of a single upstream VEX document.
-  - Linkset: Correlated evidence spanning observations that describe the same
-    product-vulnerability pair.
-- **Mutation**
-  - Observation: Append-only via supersedes.
-  - Linkset: Regenerated deterministically by correlation jobs.
-- **Allowed fields**
-  - Observation: Raw payload, provenance, normalized statement tuples, join
-    hints.
-  - Linkset: Observation references, statement IDs, confidence metrics, conflict
-    annotations.
-- **Forbidden fields**
-  - Observation: Derived consensus, suppression flags, risk scores.
-  - Linkset: Derived severity or policy decisions (only evidence + conflicts).
-- **Consumers**
-  - Observation: Evidence exports, Offline Kit mirrors, CLI raw dumps.
-  - Linkset: Policy Engine VEX overlay, Console evidence panes, Vuln Explorer.
-
-### 2.1 Example sequence
-
-1. Canonical vendor issues an attested OpenVEX declaring `CVE-2025-2222` as
-   `not_affected` for `pkg:rpm/redhat/openssl@1.1.1w-12`. Excititor inserts a
-   new observation referencing that statement.
-2. Upstream CycloneDX VEX from a distro reports the same product as `affected`
-   with `under_investigation` justification.
-3. Linkset builder groups both statements by alias overlap and product key,
-   setting confidence `high` because CVE and purl match.
-4. Conflict annotation records `status-mismatch` and retains both justifications;
-   Policy Engine uses this to explain why suppression cannot proceed without
-   policy override.
-
----
-
-## 3. Conflict handling
-
-Structured conflicts capture disagreements without mutating source statements.
-
-```json
-{
-  "type": "status-mismatch",
-  "vulnerabilityId": "CVE-2025-2222",
-  "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
-  "statements": [
-    {
-      "observationId": "tenant:redhat:openvex:3",
-      "providerId": "redhat",
-      "status": "not_affected",
-      "justification": "component_not_present"
-    },
-    {
-      "observationId": "tenant:ubuntu:cyclonedx:12",
-      "providerId": "ubuntu",
-      "status": "affected",
-      "justification": "under_investigation"
-    }
-  ],
-  "confidence": "medium",
-  "detectedAt": "2025-10-27T14:30:00Z"
-}
-```
-
-Conflict classes (tracked via `EXCITITOR-LNM-21-003`):
-
-- `status-mismatch` – Different statuses for the same pair (affected vs
-  not_affected vs fixed vs under_investigation).
-- `justification-divergence` – Same status but incompatible justifications or
-  missing justification where policy requires it.
-- `version-range-clash` – Introduced/fixed ranges contradict each other.
-- `non-joinable-overlap` – Platform-scoped statements collide with package
-  statements; flagged as warning but retained.
-- `metadata-gap` – Missing provenance/signature field on specific statements.
-
-Conflicts surface through:
-
-- `/vex/linksets/{id}` APIs (`conflicts[]` payload).
-- Console evidence panels (badges + drawer detail).
-- CLI exports (`stella vex linkset …` planned in `CLI-LNM-22-002`).
-- Metrics dashboards (`vex_linkset_conflicts_total{type}`).
-
----
-
-## 4. AOC alignment
-
-- **Raw-first** – `content.raw` and `statements[]` mirror upstream input; no
-  derived consensus or suppression values are written by ingestion.
-- **No merges** – Each upstream statement persists independently; linksets refer
-  back via `observationId`.
-- **Provenance mandatory** – Missing signature or source metadata yields
-  `ERR_AOC_004`; ingestion blocks until connectors fix the feed.
-- **Idempotent writes** – Duplicate `(providerId, upstreamId, contentHash)`
-  results in a no-op; revisions append with a `supersedes` pointer.
-- **Deterministic output** – Correlator sorts identifiers, normalizes timestamps
-  (UTC ISO-8601), and hashes canonical JSON to generate stable linkset IDs.
-- **Scope-aware** – Tenant claims enforced on write/read; Authority scopes
-  `vex:ingest` / `vex:read` are required (see `AUTH-AOC-22-001`).
-
-Violations raise `ERR_AOC_00x`, emit `aoc_violation_total`, and prevent the data
-from landing downstream.
-
----
-
-## 5. Downstream consumption
-
-- **Policy Engine** – Evaluates VEX evidence alongside advisory linksets to gate
-  suppression, severity downgrades, or explainability.
-- **Console UI** – Evidence panel renders VEX statements grouped by provider and
-  highlights conflicts or missing signatures.
-- **CLI** – Planned commands export observations/linksets for offline analysis
-  (`CLI-LNM-22-002`).
-- **Offline Kit** – Bundled snapshots keep VEX data aligned with advisory
-  observations for air-gapped parity.
-- **Observability** – Dashboards track ingestion latency, conflict counts, and
-  supersedes depth per provider.
-
-New consumers must treat both collections as read-only and preserve deterministic
-ordering when caching.
-
----
-
-## 6. Validation & testing
-
-- **Unit tests** (`StellaOps.Excititor.Core.Tests`) to cover schema guards,
-  deterministic linkset hashing, conflict classification, and supersedes
-  behaviour.
-- **Mongo integration tests** (`StellaOps.Excititor.Storage.Mongo.Tests`) to
-  verify indexes, shard keys, and idempotent writes across tenants.
-- **CLI smoke suites** (`stella vex observations`, `stella vex linksets`) for
-  JSON determinism and exit code coverage.
-- **Replay determinism** – Feed identical upstream payloads twice and ensure
-  observation/linkset hashes match across runs.
-- **Offline kit verification** – Validate VEX exports packaged in Offline Kit
-  snapshots against live service outputs.
-- **Fixture refresh** – Samples (`SAMPLES-LNM-22-002`) must include multi-source
-  conflicts and justification variants used by docs and UI tests.
-
----
-
-## 7. Reviewer checklist
-
-- Observation schema aligns with `EXCITITOR-LNM-21-001` once the schema lands;
-  update references as soon as the final contract is published.
-- Linkset lifecycle covers correlation signals (alias graphs, product keys,
-  justification rules) and deterministic ID strategy.
-- Conflict classes include status, justification, version range, platform overlap
-  scenarios.
-- AOC guardrails called out with relevant error codes and Authority scopes.
-- Downstream consumer list matches active APIs/CLI features (update when
-  `CLI-LNM-22-002` and WebService endpoints ship).
-- Validation section references Core, Storage, CLI, and Offline test suites plus
-  fixture requirements.
-- Imposed rule reminder retained at top.
-
-Dependencies outstanding (2025-10-27): `EXCITITOR-LNM-21-001..005` and
-`EXCITITOR-LNM-21-101..102` are still TODO; revisit this document once schemas,
-APIs, and fixtures are implemented.
+# VEX Observations & Linksets
+
+> Imposed rule: Work of this type or tasks of this type on this component must
+> also be applied everywhere else it should be applied.
+
+Link-Not-Merge brings the same immutable observation model to Excititor that
+Concelier now uses for advisories. VEX statements are stored as append-only
+observations; linksets correlate them, capture conflicts, and keep provenance so
+Policy Engine and UI surfaces can explain decisions without collapsing sources.
+
+---
+
+## 1. Model overview
+
+### 1.1 Observation lifecycle
+
+1. **Ingest** – Connectors fetch OpenVEX, CSAF VEX, CycloneDX VEX, or VEX
+   attestations, validate signatures, and strip any derived consensus data
+   forbidden by the Aggregation-Only Contract (AOC).
+2. **Persist** – Excititor writes immutable `vex_observations` keyed by tenant,
+   provider, upstream identifier, and `contentHash`. Supersedes chains record
+   revisions; the original payload is never mutated.
+3. **Expose** – WebService will surface paginated observation APIs and Offline
+   Kit snapshots mirror the same data for air-gapped sites.
+
+Observation schema sketch (final shape lands with `EXCITITOR-LNM-21-001`):
+
+```text
+observationId = {tenant}:{providerId}:{upstreamId}:{revision}
+tenant, providerId, streamId
+upstream{ upstreamId, documentVersion, fetchedAt, receivedAt,
+          contentHash, signature{present, format?, keyId?, signature?} }
+content{ format, specVersion, raw }
+statements[
+  { vulnerabilityId, productKey, status, justification?,
+    introducedVersion?, fixedVersion?, locator }
+]
+linkset{ purls[], cpes[], aliases[], references[],
+         reconciledFrom[], conflicts[]? }
+attributes{ batchId?, replayCursor? }
+createdAt
+```
+
+- **Raw payload** (`content.raw`) remains lossless (Relaxed Extended JSON).
+- **Statements** provide normalized tuples for each claim contained in the
+  document, including justification and version hints.
+- **Linkset** mirrors identifiers extracted during ingestion, retaining JSON
+  pointer metadata so audits can trace back to the source fragment.
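Review note: the schema sketch above can be illustrated with a small, hypothetical Python model. The class name `Statement`, the helper `observation_id`, and the reading of `locator` as a JSON pointer are assumptions for illustration only; the final contract is whatever `EXCITITOR-LNM-21-001` publishes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """One normalized claim from a VEX document (field names follow the sketch)."""
    vulnerability_id: str
    product_key: str
    status: str
    locator: str  # assumed here to be a JSON pointer into content.raw
    justification: Optional[str] = None


def observation_id(tenant: str, provider_id: str, upstream_id: str, revision: int) -> str:
    """Join the four parts in the documented order, separated by colons."""
    return f"{tenant}:{provider_id}:{upstream_id}:{revision}"
```

Under these assumptions, `observation_id("tenant", "redhat", "openvex", 3)` yields `tenant:redhat:openvex:3`, the same shape the conflict example in section 3 uses. The frozen dataclass mirrors the rule that a stored statement is never mutated.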
+
+### 1.2 Linkset lifecycle
+
+Linksets correlate claims referring to the same `(vulnerabilityId, productKey)`
+pair across providers.
+
+1. **Seed** – Observations push normalized identifiers (CVE, GHSA, vendor IDs)
+   plus canonical product keys (purl preferred, cpe fallback). Platform-scoped
+   statements remain marked `non_joinable`.
+2. **Correlate** – The linkset builder groups statements by tenant and identity,
+   combines alias graphs from Concelier, and uses justification/product overlap
+   to assign correlation confidence.
+3. **Annotate** – Conflicts (status disagreement, justification mismatch, range
+   inconsistencies) are recorded as structured entries.
+4. **Persist** – Results land in `vex_linksets` with deterministic IDs (hash of
+   sorted `(vulnerabilityId, productKey, observationIds)`) and append-only
+   history for replay/debugging.
+
+Linksets never override statements or invent consensus; they simply align
+evidence for Policy Engine and consumers.
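Review note: the deterministic ID in step 4 could look roughly like the sketch below. The document only says the ID is a hash of the sorted `(vulnerabilityId, productKey, observationIds)` tuple; the choice of SHA-256, the compact JSON encoding, and the function name `linkset_id` are assumptions.

```python
import hashlib
import json


def linkset_id(vulnerability_id, product_key, observation_ids):
    """Hash a canonical encoding so input order never changes the ID."""
    canonical = json.dumps(
        {
            "vulnerabilityId": vulnerability_id,
            "productKey": product_key,
            # Sort and de-duplicate so replays with a different arrival
            # order produce the same bytes.
            "observationIds": sorted(set(observation_ids)),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the observation IDs are sorted before hashing, two correlation runs that see the same observations in a different order produce the same linkset ID, which is what the replay tests in section 6 rely on.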
+
+---
+
+## 2. Observation vs. linkset
+
+- **Purpose**
+  - Observation: Immutable record of a single upstream VEX document.
+  - Linkset: Correlated evidence spanning observations that describe the same
+    product-vulnerability pair.
+- **Mutation**
+  - Observation: Append-only via supersedes.
+  - Linkset: Regenerated deterministically by correlation jobs.
+- **Allowed fields**
+  - Observation: Raw payload, provenance, normalized statement tuples, join
+    hints.
+  - Linkset: Observation references, statement IDs, confidence metrics, conflict
+    annotations.
+- **Forbidden fields**
+  - Observation: Derived consensus, suppression flags, risk scores.
+  - Linkset: Derived severity or policy decisions (only evidence + conflicts).
+- **Consumers**
+  - Observation: Evidence exports, Offline Kit mirrors, CLI raw dumps.
+  - Linkset: Policy Engine VEX overlay, Console evidence panes, Vuln Explorer.
+
+### 2.1 Example sequence
+
+1. Canonical vendor issues an attested OpenVEX declaring `CVE-2025-2222` as
+   `not_affected` for `pkg:rpm/redhat/openssl@1.1.1w-12`. Excititor inserts a
+   new observation referencing that statement.
+2. Upstream CycloneDX VEX from a distro reports the same product as `affected`
+   with `under_investigation` justification.
+3. Linkset builder groups both statements by alias overlap and product key,
+   setting confidence `high` because CVE and purl match.
+4. Conflict annotation records `status-mismatch` and retains both justifications;
+   Policy Engine uses this to explain why suppression cannot proceed without
+   policy override.
+
+---
+
+## 3. Conflict handling
+
+Structured conflicts capture disagreements without mutating source statements.
+
+```json
+{
+  "type": "status-mismatch",
+  "vulnerabilityId": "CVE-2025-2222",
+  "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
+  "statements": [
+    {
+      "observationId": "tenant:redhat:openvex:3",
+      "providerId": "redhat",
+      "status": "not_affected",
+      "justification": "component_not_present"
+    },
+    {
+      "observationId": "tenant:ubuntu:cyclonedx:12",
+      "providerId": "ubuntu",
+      "status": "affected",
+      "justification": "under_investigation"
+    }
+  ],
+  "confidence": "medium",
+  "detectedAt": "2025-10-27T14:30:00Z"
+}
+```
+
+Conflict classes (tracked via `EXCITITOR-LNM-21-003`):
+
+- `status-mismatch` – Different statuses for the same pair (affected vs
+  not_affected vs fixed vs under_investigation).
+- `justification-divergence` – Same status but incompatible justifications or
+  missing justification where policy requires it.
+- `version-range-clash` – Introduced/fixed ranges contradict each other.
+- `non-joinable-overlap` – Platform-scoped statements collide with package
+  statements; flagged as warning but retained.
+- `metadata-gap` – Missing provenance/signature field on specific statements.
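Review note: a minimal sketch of how the first two conflict classes might be detected for statements that already share a `(vulnerabilityId, productKey)` pair. The function name `classify` is hypothetical, and treating any difference in justification as a divergence is a simplification; the real compatibility rules are tracked under `EXCITITOR-LNM-21-003`.

```python
def classify(statements):
    """Return conflict class names for statements about one product-vulnerability pair."""
    conflicts = []
    statuses = {s["status"] for s in statements}
    if len(statuses) > 1:
        # Providers disagree on the status itself.
        conflicts.append("status-mismatch")
    else:
        # Same status everywhere: compare justifications (None counts as missing).
        justifications = {s.get("justification") for s in statements}
        if len(justifications) > 1:
            conflicts.append("justification-divergence")
    return conflicts
```

Fed the two statements from the JSON example above, this sketch reports `status-mismatch`. The input statements are only read, never modified, in line with the rule that conflicts annotate evidence without mutating it.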
+
+Conflicts surface through:
+
+- `/vex/linksets/{id}` APIs (`conflicts[]` payload).
+- Console evidence panels (badges + drawer detail).
+- CLI exports (`stella vex linkset …` planned in `CLI-LNM-22-002`).
+- Metrics dashboards (`vex_linkset_conflicts_total{type}`).
+
+---
+
+## 4. AOC alignment
+
+- **Raw-first** – `content.raw` and `statements[]` mirror upstream input; no
+  derived consensus or suppression values are written by ingestion.
+- **No merges** – Each upstream statement persists independently; linksets refer
+  back via `observationId`.
+- **Provenance mandatory** – Missing signature or source metadata yields
+  `ERR_AOC_004`; ingestion blocks until connectors fix the feed.
+- **Idempotent writes** – Duplicate `(providerId, upstreamId, contentHash)`
+  results in a no-op; revisions append with a `supersedes` pointer.
+- **Deterministic output** – Correlator sorts identifiers, normalizes timestamps
+  (UTC ISO-8601), and hashes canonical JSON to generate stable linkset IDs.
+- **Scope-aware** – Tenant claims enforced on write/read; Authority scopes
+  `vex:ingest` / `vex:read` are required (see `AUTH-AOC-22-001`).
+
+Violations raise `ERR_AOC_00x`, emit `aoc_violation_total`, and prevent the data
+from landing downstream.
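Review note: the idempotent-write and supersedes rules above can be sketched with an in-memory store. Everything here is illustrative: the class `ObservationStore`, the use of SHA-256 for `contentHash`, the 1-based revision counter, and the inclusion of the tenant in the key are assumptions, not the Mongo-backed implementation.

```python
import hashlib


class ObservationStore:
    """Append-only store: duplicates are no-ops, revisions chain via supersedes."""

    def __init__(self):
        self._by_key = {}   # (tenant, providerId, upstreamId, contentHash) -> observation
        self._chains = {}   # (tenant, providerId, upstreamId) -> [observationId, ...]

    def write(self, tenant, provider_id, upstream_id, raw: bytes):
        """Return (observation, created); created is False for a duplicate."""
        content_hash = hashlib.sha256(raw).hexdigest()
        key = (tenant, provider_id, upstream_id, content_hash)
        if key in self._by_key:
            return self._by_key[key], False  # same payload seen before: no-op
        chain = self._chains.setdefault((tenant, provider_id, upstream_id), [])
        revision = len(chain) + 1
        observation = {
            "observationId": f"{tenant}:{provider_id}:{upstream_id}:{revision}",
            "contentHash": content_hash,
            # Point at the previous revision instead of overwriting it.
            "supersedes": chain[-1] if chain else None,
        }
        chain.append(observation["observationId"])
        self._by_key[key] = observation
        return observation, True
```

Writing the same payload twice returns the stored observation with `created` set to `False`, while a changed payload for the same upstream document appends revision 2 with `supersedes` pointing at revision 1.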
+
+---
+
+## 5. Downstream consumption
+
+- **Policy Engine** – Evaluates VEX evidence alongside advisory linksets to gate
+  suppression, severity downgrades, or explainability.
+- **Console UI** – Evidence panel renders VEX statements grouped by provider and
+  highlights conflicts or missing signatures.
+- **CLI** – Planned commands export observations/linksets for offline analysis
+  (`CLI-LNM-22-002`).
+- **Offline Kit** – Bundled snapshots keep VEX data aligned with advisory
+  observations for air-gapped parity.
+- **Observability** – Dashboards track ingestion latency, conflict counts, and
+  supersedes depth per provider.
+
+New consumers must treat both collections as read-only and preserve deterministic
+ordering when caching.
+
+---
+
+## 6. Validation & testing
+
+- **Unit tests** (`StellaOps.Excititor.Core.Tests`) to cover schema guards,
+  deterministic linkset hashing, conflict classification, and supersedes
+  behaviour.
+- **Mongo integration tests** (`StellaOps.Excititor.Storage.Mongo.Tests`) to
+  verify indexes, shard keys, and idempotent writes across tenants.
+- **CLI smoke suites** (`stella vex observations`, `stella vex linksets`) for
+  JSON determinism and exit code coverage.
+- **Replay determinism** – Feed identical upstream payloads twice and ensure
+  observation/linkset hashes match across runs.
+- **Offline kit verification** – Validate VEX exports packaged in Offline Kit
+  snapshots against live service outputs.
+- **Fixture refresh** – Samples (`SAMPLES-LNM-22-002`) must include multi-source
+  conflicts and justification variants used by docs and UI tests.
+
+---
+
+## 7. Reviewer checklist
+
+- Observation schema aligns with `EXCITITOR-LNM-21-001` once the schema lands;
+  update references as soon as the final contract is published.
+- Linkset lifecycle covers correlation signals (alias graphs, product keys,
+  justification rules) and deterministic ID strategy.
+- Conflict classes include status, justification, version range, platform overlap
+  scenarios.
+- AOC guardrails called out with relevant error codes and Authority scopes.
+- Downstream consumer list matches active APIs/CLI features (update when
+  `CLI-LNM-22-002` and WebService endpoints ship).
+- Validation section references Core, Storage, CLI, and Offline test suites plus
+  fixture requirements.
+- Imposed rule reminder retained at top.
+
+Dependencies outstanding (2025-10-27): `EXCITITOR-LNM-21-001..005` and
+`EXCITITOR-LNM-21-101..102` are still TODO; revisit this document once schemas,
+APIs, and fixtures are implemented.