VEX Observations & Linksets
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
Link-Not-Merge brings the same immutable observation model to Excititor that Concelier now uses for advisories. VEX statements are stored as append-only observations; linksets correlate them, capture conflicts, and keep provenance so Policy Engine and UI surfaces can explain decisions without collapsing sources.
1. Model overview
1.1 Observation lifecycle
- Ingest – Connectors fetch OpenVEX, CSAF VEX, CycloneDX VEX, or VEX attestations, validate signatures, and strip any derived consensus data forbidden by the Aggregation-Only Contract (AOC).
- Persist – Excititor writes immutable vex_observations keyed by tenant, provider, upstream identifier, and contentHash. Supersedes chains record revisions; the original payload is never mutated.
- Expose – WebService will surface paginated observation APIs, and Offline Kit snapshots mirror the same data for air-gapped sites.
Observation schema sketch (final shape lands with EXCITITOR-LNM-21-001):
observationId = {tenant}:{providerId}:{upstreamId}:{revision}
tenant, providerId, streamId
upstream{ upstreamId, documentVersion, fetchedAt, receivedAt,
          contentHash, signature{present, format?, keyId?, signature?} }
content{ format, specVersion, raw }
statements[
  { vulnerabilityId, productKey, status, justification?,
    introducedVersion?, fixedVersion?, locator }
]
linkset{ purls[], cpes[], aliases[], references[],
         reconciledFrom[], conflicts[]? }
attributes{ batchId?, replayCursor? }
createdAt
- Raw payload (content.raw) remains lossless (Relaxed Extended JSON).
- Statements provide normalized tuples for each claim contained in the document, including justification and version hints.
- Linkset mirrors identifiers extracted during ingestion, retaining JSON pointer metadata so audits can trace back to the source fragment.
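As a rough illustration of the identifier pattern in the sketch above, the following Python composes an observationId and derives a content hash. The helper names and the hashing scheme (SHA-256 over key-sorted JSON) are assumptions made for this example; the real contract is whatever lands with EXCITITOR-LNM-21-001.

```python
import hashlib
import json


def content_hash(raw_document: dict) -> str:
    """Hash a key-sorted JSON rendering of the raw payload.

    Assumption: the real service may hash the upstream bytes directly instead.
    """
    canonical = json.dumps(raw_document, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def observation_id(tenant: str, provider_id: str, upstream_id: str, revision: int) -> str:
    """Compose {tenant}:{providerId}:{upstreamId}:{revision} as in the schema sketch."""
    return f"{tenant}:{provider_id}:{upstream_id}:{revision}"
```

With these helpers, observation_id("tenant", "redhat", "openvex", 3) yields the same shape of identifier used in the conflict example in section 3, and two payloads that differ only in key order hash to the same value.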
1.2 Linkset lifecycle
Linksets correlate claims referring to the same (vulnerabilityId, productKey)
pair across providers.
- Seed – Observations push normalized identifiers (CVE, GHSA, vendor IDs) plus canonical product keys (purl preferred, cpe fallback). Platform-scoped statements remain marked non_joinable.
- Correlate – The linkset builder groups statements by tenant and identity, combines alias graphs from Concelier, and uses justification/product overlap to assign correlation confidence.
- Annotate – Conflicts (status disagreement, justification mismatch, range inconsistencies) are recorded as structured entries.
- Persist – Results land in vex_linksets with deterministic IDs (hash of sorted (vulnerabilityId, productKey, observationIds)) and append-only history for replay/debugging.
Linksets never override statements or invent consensus; they simply align evidence for Policy Engine and consumers.
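One way to picture the deterministic ID strategy from the Persist step is the sketch below. It assumes SHA-256 over a canonical JSON document and de-duplicates observation IDs before sorting; the function name, field names, and hash choice are illustrative rather than the service's actual algorithm.

```python
import hashlib
import json


def linkset_id(vulnerability_id: str, product_key: str, observation_ids: list[str]) -> str:
    """Derive a stable ID that does not depend on the order observations arrived in."""
    payload = {
        "vulnerabilityId": vulnerability_id,
        "productKey": product_key,
        # Sorting (and de-duplicating, an assumption) removes arrival-order effects.
        "observationIds": sorted(set(observation_ids)),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the inputs are sorted before hashing, a correlation job that sees the same observations in a different order regenerates the same ID, which is what makes replay comparisons meaningful.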
2. Observation vs. linkset
- Purpose
  - Observation: Immutable record of a single upstream VEX document.
  - Linkset: Correlated evidence spanning observations that describe the same product-vulnerability pair.
- Mutation
  - Observation: Append-only via supersedes.
  - Linkset: Regenerated deterministically by correlation jobs.
- Allowed fields
  - Observation: Raw payload, provenance, normalized statement tuples, join hints.
  - Linkset: Observation references, statement IDs, confidence metrics, conflict annotations.
- Forbidden fields
  - Observation: Derived consensus, suppression flags, risk scores.
  - Linkset: Derived severity or policy decisions (only evidence + conflicts).
- Consumers
  - Observation: Evidence exports, Offline Kit mirrors, CLI raw dumps.
  - Linkset: Policy Engine VEX overlay, Console evidence panes, Vuln Explorer.
2.1 Example sequence
- Canonical vendor issues an attested OpenVEX declaring CVE-2025-2222 as not_affected for pkg:rpm/redhat/openssl@1.1.1w-12. Excititor inserts a new observation referencing that statement.
- Upstream CycloneDX VEX from a distro reports the same product as affected with under_investigation justification.
- Linkset builder groups both statements by alias overlap and product key, setting confidence high because CVE and purl match.
- Conflict annotation records status-mismatch and retains both justifications; Policy Engine uses this to explain why suppression cannot proceed without policy override.
3. Conflict handling
Structured conflicts capture disagreements without mutating source statements.
{
"type": "status-mismatch",
"vulnerabilityId": "CVE-2025-2222",
"productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
"statements": [
{
"observationId": "tenant:redhat:openvex:3",
"providerId": "redhat",
"status": "not_affected",
"justification": "component_not_present"
},
{
"observationId": "tenant:ubuntu:cyclonedx:12",
"providerId": "ubuntu",
"status": "affected",
"justification": "under_investigation"
}
],
"confidence": "medium",
"detectedAt": "2025-10-27T14:30:00Z"
}
Conflict classes (tracked via EXCITITOR-LNM-21-003):
- status-mismatch – Different statuses for the same pair (affected vs not_affected vs fixed vs under_investigation).
- justification-divergence – Same status but incompatible justifications or missing justification where policy requires it.
- version-range-clash – Introduced/fixed ranges contradict each other.
- non-joinable-overlap – Platform-scoped statements collide with package statements; flagged as warning but retained.
- metadata-gap – Missing provenance/signature field on specific statements.
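The first two conflict classes can be sketched as a small classifier over normalized statement tuples. This is a simplified illustration, not the EXCITITOR-LNM-21-003 implementation: the function name is hypothetical, any difference in justification is treated as a divergence, and the remaining classes (version ranges, non-joinable overlap, metadata gaps) are left out.

```python
from collections import defaultdict


def detect_conflicts(statements: list[dict]) -> list[dict]:
    """Group statements by (vulnerabilityId, productKey) and flag disagreements."""
    groups = defaultdict(list)
    for statement in statements:
        key = (statement["vulnerabilityId"], statement["productKey"])
        groups[key].append(statement)

    conflicts = []
    for key in sorted(groups):  # sorted keys keep the output order deterministic
        group = groups[key]
        statuses = {s["status"] for s in group}
        justifications = {s.get("justification") for s in group}
        if len(statuses) > 1:
            kind = "status-mismatch"
        elif len(justifications) > 1:
            # Simplification: real rules would judge compatibility, not equality.
            kind = "justification-divergence"
        else:
            continue
        conflicts.append({
            "type": kind,
            "vulnerabilityId": key[0],
            "productKey": key[1],
            # Source statements are referenced as-is, never rewritten.
            "statements": sorted(group, key=lambda s: s["observationId"]),
        })
    return conflicts
```

Fed the two statements from the JSON example above, this returns a single status-mismatch entry that carries both statements unchanged.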
Conflicts surface through:
- /vex/linksets/{id} APIs (conflicts[] payload).
- Console evidence panels (badges + drawer detail).
- CLI exports (stella vex linkset … planned in CLI-LNM-22-002).
- Metrics dashboards (vex_linkset_conflicts_total{type}).
4. AOC alignment
- Raw-first – content.raw and statements[] mirror upstream input; no derived consensus or suppression values are written by ingestion.
- No merges – Each upstream statement persists independently; linksets refer back via observationId.
- Provenance mandatory – Missing signature or source metadata yields ERR_AOC_004; ingestion blocks until connectors fix the feed.
- Idempotent writes – Duplicate (providerId, upstreamId, contentHash) results in a no-op; revisions append with a supersedes pointer.
- Deterministic output – Correlator sorts identifiers, normalizes timestamps (UTC ISO-8601), and hashes canonical JSON to generate stable linkset IDs.
- Scope-aware – Tenant claims enforced on write/read; Authority scopes vex:ingest/vex:read are required (see AUTH-AOC-22-001).
Violations raise ERR_AOC_00x, emit aoc_violation_total, and prevent the data
from landing downstream.
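The provenance and idempotency guardrails can be pictured with an in-memory stand-in for the vex_observations collection. Everything here is a sketch under stated assumptions: the class and method names are hypothetical, revisions are assumed to start at 1, tenant is added to the duplicate key because observations are tenant-scoped, and a missing signature record is the only provenance check modelled.

```python
class AocViolation(Exception):
    """Raised when a write would break the Aggregation-Only Contract."""


class ObservationStore:
    """In-memory stand-in for vex_observations (illustrative only)."""

    def __init__(self):
        self._by_key = {}  # (tenant, providerId, upstreamId, contentHash) -> observation
        self._latest = {}  # (tenant, providerId, upstreamId) -> newest observation

    def append(self, tenant, provider_id, upstream_id, content_hash, signature):
        # Provenance mandatory: reject writes that carry no signature metadata.
        if signature is None:
            raise AocViolation("ERR_AOC_004")

        key = (tenant, provider_id, upstream_id, content_hash)
        if key in self._by_key:
            # Idempotent write: identical content is a no-op.
            return self._by_key[key]

        stream = (tenant, provider_id, upstream_id)
        previous = self._latest.get(stream)
        revision = previous["revision"] + 1 if previous else 1
        observation = {
            "observationId": f"{tenant}:{provider_id}:{upstream_id}:{revision}",
            "revision": revision,
            "contentHash": content_hash,
            "signature": signature,
            # New content appends a revision that points at the one it replaces.
            "supersedes": previous["observationId"] if previous else None,
        }
        self._by_key[key] = observation
        self._latest[stream] = observation
        return observation
```

Writing the same content twice returns the stored record untouched, while changed content for the same upstream document appends a new revision whose supersedes pointer names the earlier one; earlier records are never modified.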
5. Downstream consumption
- Policy Engine – Evaluates VEX evidence alongside advisory linksets to gate suppression, severity downgrades, or explainability.
- Console UI – Evidence panel renders VEX statements grouped by provider and highlights conflicts or missing signatures.
- CLI – Planned commands export observations/linksets for offline analysis (CLI-LNM-22-002).
- Offline Kit – Bundled snapshots keep VEX data aligned with advisory observations for air-gapped parity.
- Observability – Dashboards track ingestion latency, conflict counts, and supersedes depth per provider.
New consumers must treat both collections as read-only and preserve deterministic ordering when caching.
6. Validation & testing
- Unit tests (StellaOps.Excititor.Core.Tests) to cover schema guards, deterministic linkset hashing, conflict classification, and supersedes behaviour.
- Mongo integration tests (StellaOps.Excititor.Storage.Mongo.Tests) to verify indexes, shard keys, and idempotent writes across tenants.
- CLI smoke suites (stella vex observations, stella vex linksets) for JSON determinism and exit code coverage.
- Replay determinism – Feed identical upstream payloads twice and ensure observation/linkset hashes match across runs.
- Offline kit verification – Validate VEX exports packaged in Offline Kit snapshots against live service outputs.
- Fixture refresh – Samples (SAMPLES-LNM-22-002) must include multi-source conflicts and justification variants used by docs and UI tests.
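The replay determinism check might look like the sketch below. The ingest function is a hypothetical stand-in for a full pipeline run, and shuffling the second run's input order goes slightly beyond the stated requirement (identical payloads fed twice) to also catch order-dependent output.

```python
import hashlib
import json
import random


def ingest(payloads):
    """Hypothetical stand-in for one pipeline run: sorted content hashes."""
    hashes = []
    for payload in payloads:
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        hashes.append(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
    return sorted(hashes)


def run_digest(payloads):
    """Collapse the outputs of one run into a single comparable digest."""
    return hashlib.sha256("\n".join(ingest(payloads)).encode("utf-8")).hexdigest()


def replay_is_deterministic(payloads, seed=0):
    """Run twice, the second time in shuffled order, and compare digests."""
    first = run_digest(payloads)
    shuffled = list(payloads)
    random.Random(seed).shuffle(shuffled)
    return first == run_digest(shuffled)
```

A real suite would swap ingest for the actual connector-to-linkset path and compare the stored observation and linkset hashes rather than a synthetic digest.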
7. Reviewer checklist
- Observation schema aligns with EXCITITOR-LNM-21-001 once the schema lands; update references as soon as the final contract is published.
- Linkset lifecycle covers correlation signals (alias graphs, product keys, justification rules) and deterministic ID strategy.
- Conflict classes include status, justification, version range, platform overlap scenarios.
- AOC guardrails called out with relevant error codes and Authority scopes.
- Downstream consumer list matches active APIs/CLI features (update when CLI-LNM-22-002 and WebService endpoints ship).
- Validation section references Core, Storage, CLI, and Offline test suites plus fixture requirements.
- Imposed rule reminder retained at top.
Dependencies outstanding (2025-10-27): EXCITITOR-LNM-21-001..005 and
EXCITITOR-LNM-21-101..102 are still TODO; revisit this document once schemas,
APIs, and fixtures are implemented.