Restructure solution layout by module

root
2025-10-28 15:10:40 +02:00
parent 4e3e575db5
commit 68da90a11a
4103 changed files with 192899 additions and 187024 deletions


@@ -1,229 +1,229 @@
# VEX Observations & Linksets

> Imposed rule: Work of this type or tasks of this type on this component must
> also be applied everywhere else it should be applied.

Link-Not-Merge brings the same immutable observation model to Excititor that
Concelier now uses for advisories. VEX statements are stored as append-only
observations; linksets correlate them, capture conflicts, and keep provenance so
Policy Engine and UI surfaces can explain decisions without collapsing sources.

---

## 1. Model overview

### 1.1 Observation lifecycle

1. **Ingest**: Connectors fetch OpenVEX, CSAF VEX, CycloneDX VEX, or VEX
   attestations, validate signatures, and strip any derived consensus data
   forbidden by the Aggregation-Only Contract (AOC).
2. **Persist**: Excititor writes immutable `vex_observations` keyed by tenant,
   provider, upstream identifier, and `contentHash`. Supersedes chains record
   revisions; the original payload is never mutated.
3. **Expose**: WebService will surface paginated observation APIs, and Offline
   Kit snapshots mirror the same data for air-gapped sites.

Observation schema sketch (final shape lands with `EXCITITOR-LNM-21-001`):

```text
observationId = {tenant}:{providerId}:{upstreamId}:{revision}
tenant, providerId, streamId
upstream{ upstreamId, documentVersion, fetchedAt, receivedAt,
          contentHash, signature{present, format?, keyId?, signature?} }
content{ format, specVersion, raw }
statements[
  { vulnerabilityId, productKey, status, justification?,
    introducedVersion?, fixedVersion?, locator }
]
linkset{ purls[], cpes[], aliases[], references[],
         reconciledFrom[], conflicts[]? }
attributes{ batchId?, replayCursor? }
createdAt
```
- **Raw payload** (`content.raw`) remains lossless (Relaxed Extended JSON).
- **Statements** provide normalized tuples for each claim contained in the
document, including justification and version hints.
- **Linkset** mirrors identifiers extracted during ingestion, retaining JSON
pointer metadata so audits can trace back to the source fragment.
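
To make the identifier and hash fields in the sketch concrete, here is a minimal Python illustration. It is not Excititor's implementation: the use of SHA-256, the `sha256:` prefix, hashing a canonical JSON rendering (rather than the raw bytes), and the helper names `content_hash`, `observation_id`, and `build_observation` are all assumptions made for the example.

```python
import hashlib
import json


def content_hash(raw_document: dict) -> str:
    """Hash a canonical JSON rendering of the payload (SHA-256 is an assumption)."""
    canonical = json.dumps(raw_document, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def observation_id(tenant: str, provider_id: str, upstream_id: str, revision: int) -> str:
    """Compose the ID following the {tenant}:{providerId}:{upstreamId}:{revision} sketch."""
    return f"{tenant}:{provider_id}:{upstream_id}:{revision}"


def build_observation(tenant, provider_id, upstream_id, revision, raw_document):
    """Return a minimal observation; only a subset of the sketched fields is shown."""
    return {
        "observationId": observation_id(tenant, provider_id, upstream_id, revision),
        "tenant": tenant,
        "providerId": provider_id,
        "upstream": {
            "upstreamId": upstream_id,
            "contentHash": content_hash(raw_document),
        },
        # The raw payload is stored as received and never rewritten.
        "content": {"raw": raw_document},
    }


example = build_observation("tenant", "redhat", "openvex", 3, {"status": "not_affected"})
print(example["observationId"])  # tenant:redhat:openvex:3
```

Because the hash is computed over sorted keys, two payloads that differ only in key order produce the same `contentHash` in this sketch; a real connector may prefer to hash the exact upstream bytes instead.
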
### 1.2 Linkset lifecycle
Linksets correlate claims referring to the same `(vulnerabilityId, productKey)`
pair across providers.
1. **Seed**: Observations push normalized identifiers (CVE, GHSA, vendor IDs)
   plus canonical product keys (purl preferred, cpe fallback). Platform-scoped
   statements remain marked `non_joinable`.
2. **Correlate**: The linkset builder groups statements by tenant and identity,
   combines alias graphs from Concelier, and uses justification/product overlap
   to assign correlation confidence.
3. **Annotate**: Conflicts (status disagreement, justification mismatch, range
   inconsistencies) are recorded as structured entries.
4. **Persist**: Results land in `vex_linksets` with deterministic IDs (hash of
   sorted `(vulnerabilityId, productKey, observationIds)`) and append-only
   history for replay/debugging.

Linksets never override statements or invent consensus; they simply align
evidence for Policy Engine and consumers.
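
The deterministic ID described in the Persist step can be sketched as follows. This is an illustration under stated assumptions: the hash function (SHA-256), the canonical JSON layout, and the function name `linkset_id` are not taken from the Excititor code base.

```python
import hashlib
import json


def linkset_id(vulnerability_id: str, product_key: str, observation_ids: list[str]) -> str:
    """Derive a stable ID from the pair plus the sorted, de-duplicated observation IDs."""
    canonical = json.dumps(
        {
            "vulnerabilityId": vulnerability_id,
            "productKey": product_key,
            "observationIds": sorted(set(observation_ids)),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = linkset_id("CVE-2025-2222", "pkg:rpm/redhat/openssl@1.1.1w-12",
               ["tenant:ubuntu:cyclonedx:12", "tenant:redhat:openvex:3"])
b = linkset_id("CVE-2025-2222", "pkg:rpm/redhat/openssl@1.1.1w-12",
               ["tenant:redhat:openvex:3", "tenant:ubuntu:cyclonedx:12"])
print(a == b)  # True: input order does not affect the ID
```

Sorting before hashing is what makes the ID reproducible across correlation runs, whatever order the observations arrive in.
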

---

## 2. Observation vs. linkset
| Aspect | Observation | Linkset |
| --- | --- | --- |
| Purpose | Immutable record of a single upstream VEX document. | Correlated evidence spanning observations that describe the same product-vulnerability pair. |
| Mutation | Append-only via supersedes. | Regenerated deterministically by correlation jobs. |
| Allowed fields | Raw payload, provenance, normalized statement tuples, join hints. | Observation references, statement IDs, confidence metrics, conflict annotations. |
| Forbidden fields | Derived consensus, suppression flags, risk scores. | Derived severity or policy decisions (only evidence + conflicts). |
| Consumers | Evidence exports, Offline Kit mirrors, CLI raw dumps. | Policy Engine VEX overlay, Console evidence panes, Vuln Explorer. |

### 2.1 Example sequence
1. Canonical vendor issues an attested OpenVEX declaring `CVE-2025-2222` as
`not_affected` for `pkg:rpm/redhat/openssl@1.1.1w-12`. Excititor inserts a
new observation referencing that statement.
2. Upstream CycloneDX VEX from a distro reports the same product as `affected`
with `under_investigation` justification.
3. Linkset builder groups both statements by alias overlap and product key,
setting confidence `high` because CVE and purl match.
4. Conflict annotation records `status-mismatch` and retains both justifications;
Policy Engine uses this to explain why suppression cannot proceed without
policy override.
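
The grouping in step 3 can be approximated with a few lines of Python. This sketch matches only on exact `(tenant, vulnerabilityId, productKey)` equality; the alias graph from Concelier and the confidence scoring are deliberately left out, and the dictionary shapes and the name `group_statements` are assumptions for illustration.

```python
from collections import defaultdict


def group_statements(observations):
    """Group statements by (tenant, vulnerabilityId, productKey); sources stay untouched."""
    groups = defaultdict(list)
    for obs in observations:
        for stmt in obs["statements"]:
            key = (obs["tenant"], stmt["vulnerabilityId"], stmt["productKey"])
            groups[key].append({
                "observationId": obs["observationId"],
                "providerId": obs["providerId"],
                "status": stmt["status"],
            })
    return dict(groups)


observations = [
    {"observationId": "tenant:redhat:openvex:3", "tenant": "tenant", "providerId": "redhat",
     "statements": [{"vulnerabilityId": "CVE-2025-2222",
                     "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
                     "status": "not_affected"}]},
    {"observationId": "tenant:ubuntu:cyclonedx:12", "tenant": "tenant", "providerId": "ubuntu",
     "statements": [{"vulnerabilityId": "CVE-2025-2222",
                     "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
                     "status": "affected"}]},
]

groups = group_statements(observations)
for key, members in groups.items():
    statuses = sorted({m["status"] for m in members})
    print(key[1], statuses)  # CVE-2025-2222 ['affected', 'not_affected']
```

Both statements end up in one group with two distinct statuses, which is the situation the `status-mismatch` annotation in step 4 records.
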

---

## 3. Conflict handling
Structured conflicts capture disagreements without mutating source statements.
```json
{
  "type": "status-mismatch",
  "vulnerabilityId": "CVE-2025-2222",
  "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
  "statements": [
    {
      "observationId": "tenant:redhat:openvex:3",
      "providerId": "redhat",
      "status": "not_affected",
      "justification": "component_not_present"
    },
    {
      "observationId": "tenant:ubuntu:cyclonedx:12",
      "providerId": "ubuntu",
      "status": "affected",
      "justification": "under_investigation"
    }
  ],
  "confidence": "medium",
  "detectedAt": "2025-10-27T14:30:00Z"
}
```
Conflict classes (tracked via `EXCITITOR-LNM-21-003`):
- `status-mismatch`: Different statuses for the same pair (affected vs
  not_affected vs fixed vs under_investigation).
- `justification-divergence`: Same status but incompatible justifications or
  missing justification where policy requires it.
- `version-range-clash`: Introduced/fixed ranges contradict each other.
- `non-joinable-overlap`: Platform-scoped statements collide with package
  statements; flagged as warning but retained.
- `metadata-gap`: Missing provenance/signature field on specific statements.

Conflicts surface through:
- `/vex/linksets/{id}` APIs (`conflicts[]` payload).
- Console evidence panels (badges + drawer detail).
- CLI exports (`stella vex linkset …` planned in `CLI-LNM-22-002`).
- Metrics dashboards (`vex_linkset_conflicts_total{type}`).
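
As a rough illustration of how two of the conflict classes above might be detected, the sketch below classifies a group of statements that already share one `(vulnerabilityId, productKey)` pair. It is a simplification and not the `EXCITITOR-LNM-21-003` design: any difference in justification counts as divergence here, whereas the real rule speaks of incompatible justifications and policy requirements, and the other three classes are omitted. The function name `classify_conflicts` is hypothetical.

```python
def classify_conflicts(statements):
    """Return conflict class names for statements sharing one (vulnerabilityId, productKey)."""
    conflicts = []
    statuses = {s["status"] for s in statements}
    if len(statuses) > 1:
        # Providers disagree on the status itself.
        conflicts.append("status-mismatch")
    else:
        # Same status everywhere: compare justifications (a missing one counts as None).
        justifications = {s.get("justification") for s in statements}
        if len(justifications) > 1:
            conflicts.append("justification-divergence")
    return conflicts


statements = [
    {"providerId": "redhat", "status": "not_affected", "justification": "component_not_present"},
    {"providerId": "ubuntu", "status": "affected", "justification": "under_investigation"},
]
print(classify_conflicts(statements))  # ['status-mismatch']
```

The input statements are only read, never modified, which matches the rule that conflicts are recorded alongside the evidence rather than resolved in place.
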

---

## 4. AOC alignment
- **Raw-first**: `content.raw` and `statements[]` mirror upstream input; no
  derived consensus or suppression values are written by ingestion.
- **No merges**: Each upstream statement persists independently; linksets refer
  back via `observationId`.
- **Provenance mandatory**: Missing signature or source metadata yields
  `ERR_AOC_004`; ingestion blocks until connectors fix the feed.
- **Idempotent writes**: Duplicate `(providerId, upstreamId, contentHash)`
  results in a no-op; revisions append with a `supersedes` pointer.
- **Deterministic output**: Correlator sorts identifiers, normalizes timestamps
  (UTC ISO-8601), and hashes canonical JSON to generate stable linkset IDs.
- **Scope-aware**: Tenant claims enforced on write/read; Authority scopes
  `vex:ingest` / `vex:read` are required (see `AUTH-AOC-22-001`).

Violations raise `ERR_AOC_00x`, emit `aoc_violation_total`, and prevent the data
from landing downstream.
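
The provenance and idempotency guardrails can be pictured with a small in-memory model. Everything below is a sketch under assumptions: `ObservationStore` and `AocViolation` are hypothetical names, revisions are assumed to start at 1, the tenant is included in the duplicate key, and "missing signature metadata" is interpreted as the metadata block being absent altogether. The real storage layer is MongoDB-backed and may behave differently.

```python
class AocViolation(Exception):
    """Hypothetical exception type carrying an ERR_AOC_* code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


class ObservationStore:
    """In-memory stand-in for the `vex_observations` collection."""

    def __init__(self):
        self._by_key = {}   # (tenant, providerId, upstreamId, contentHash) -> observation
        self._latest = {}   # (tenant, providerId, upstreamId) -> newest observation

    def append(self, tenant, provider_id, upstream_id, content_hash, signature_metadata):
        if signature_metadata is None:
            # Provenance is mandatory: reject before anything is written.
            raise AocViolation("ERR_AOC_004", "missing signature or source metadata")
        key = (tenant, provider_id, upstream_id, content_hash)
        if key in self._by_key:
            return self._by_key[key]  # duplicate write is a no-op
        stream = key[:3]
        previous = self._latest.get(stream)
        revision = previous["revision"] + 1 if previous else 1
        observation = {
            "observationId": f"{tenant}:{provider_id}:{upstream_id}:{revision}",
            "revision": revision,
            "contentHash": content_hash,
            "supersedes": previous["observationId"] if previous else None,
        }
        self._by_key[key] = observation
        self._latest[stream] = observation
        return observation


store = ObservationStore()
first = store.append("tenant", "redhat", "openvex", "sha256:aaa", {"present": True})
again = store.append("tenant", "redhat", "openvex", "sha256:aaa", {"present": True})
second = store.append("tenant", "redhat", "openvex", "sha256:bbb", {"present": True})
print(first is again)        # True
print(second["supersedes"])  # tenant:redhat:openvex:1
```

Earlier revisions are never touched: a changed payload produces a new record that points back at its predecessor, while an identical payload returns the record already stored.
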

---

## 5. Downstream consumption
- **Policy Engine**: Evaluates VEX evidence alongside advisory linksets to gate
  suppression, severity downgrades, or explainability.
- **Console UI**: Evidence panel renders VEX statements grouped by provider and
  highlights conflicts or missing signatures.
- **CLI**: Planned commands export observations/linksets for offline analysis
  (`CLI-LNM-22-002`).
- **Offline Kit**: Bundled snapshots keep VEX data aligned with advisory
  observations for air-gapped parity.
- **Observability**: Dashboards track ingestion latency, conflict counts, and
  supersedes depth per provider.

New consumers must treat both collections as read-only and preserve deterministic
ordering when caching.

---

## 6. Validation & testing
- **Unit tests** (`StellaOps.Excititor.Core.Tests`) to cover schema guards,
deterministic linkset hashing, conflict classification, and supersedes
behaviour.
- **Mongo integration tests** (`StellaOps.Excititor.Storage.Mongo.Tests`) to
verify indexes, shard keys, and idempotent writes across tenants.
- **CLI smoke suites** (`stella vex observations`, `stella vex linksets`) for
JSON determinism and exit code coverage.
- **Replay determinism**: Feed identical upstream payloads twice and ensure
  observation/linkset hashes match across runs.
- **Offline kit verification**: Validate VEX exports packaged in Offline Kit
  snapshots against live service outputs.
- **Fixture refresh**: Samples (`SAMPLES-LNM-22-002`) must include multi-source
  conflicts and justification variants used by docs and UI tests.
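
A replay-determinism check could look roughly like the sketch below. It is not one of the listed test suites: the digest layout, the `issuedAt` field, and the helper names `normalize_timestamp` and `linkset_digest` are assumptions, and timestamps are assumed to carry an explicit offset or a trailing `Z`.

```python
import hashlib
import json
from datetime import datetime, timezone


def normalize_timestamp(value):
    """Convert an ISO-8601 timestamp with an offset to UTC with a trailing 'Z'."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def linkset_digest(linkset):
    """Hash a canonical form: sorted identifiers, UTC timestamps, sorted JSON keys."""
    normalized = {
        "vulnerabilityId": linkset["vulnerabilityId"],
        "productKey": linkset["productKey"],
        "observationIds": sorted(linkset["observationIds"]),
        "issuedAt": normalize_timestamp(linkset["issuedAt"]),
    }
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two runs over the same evidence, differing only in ordering and timestamp offset.
run_a = {
    "vulnerabilityId": "CVE-2025-2222",
    "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
    "observationIds": ["tenant:redhat:openvex:3", "tenant:ubuntu:cyclonedx:12"],
    "issuedAt": "2025-10-27T14:30:00Z",
}
run_b = {
    "vulnerabilityId": "CVE-2025-2222",
    "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
    "observationIds": ["tenant:ubuntu:cyclonedx:12", "tenant:redhat:openvex:3"],
    "issuedAt": "2025-10-27T16:30:00+02:00",
}
print(linkset_digest(run_a) == linkset_digest(run_b))  # True
```
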

---

## 7. Reviewer checklist
- Observation schema aligns with `EXCITITOR-LNM-21-001` once the schema lands;
update references as soon as the final contract is published.
- Linkset lifecycle covers correlation signals (alias graphs, product keys,
justification rules) and deterministic ID strategy.
- Conflict classes include status, justification, version range, platform overlap
scenarios.
- AOC guardrails called out with relevant error codes and Authority scopes.
- Downstream consumer list matches active APIs/CLI features (update when
`CLI-LNM-22-002` and WebService endpoints ship).
- Validation section references Core, Storage, CLI, and Offline test suites plus
fixture requirements.
- Imposed rule reminder retained at top.

Dependencies outstanding (2025-10-27): `EXCITITOR-LNM-21-001..005` and
`EXCITITOR-LNM-21-101..102` are still TODO; revisit this document once schemas,
APIs, and fixtures are implemented.