VEX Observations & Linksets

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

Link-Not-Merge brings the same immutable observation model to Excititor that Concelier now uses for advisories. VEX statements are stored as append-only observations; linksets correlate them, capture conflicts, and keep provenance so Policy Engine and UI surfaces can explain decisions without collapsing sources.

1. Model overview

1.1 Observation lifecycle

Ingest – Connectors fetch OpenVEX, CSAF VEX, CycloneDX VEX, or VEX attestations, validate signatures, and strip any derived consensus data forbidden by the Aggregation-Only Contract (AOC).
Persist – Excititor writes immutable vex_observations keyed by tenant, provider, upstream identifier, and contentHash. Supersedes chains record revisions; the original payload is never mutated.
Expose – WebService will surface paginated observation APIs, and Offline Kit snapshots will mirror the same data for air-gapped sites.

Observation schema sketch (final shape lands with EXCITITOR-LNM-21-001):

observationId = {tenant}:{providerId}:{upstreamId}:{revision}
tenant, providerId, streamId
upstream{ upstreamId, documentVersion, fetchedAt, receivedAt,
          contentHash, signature{present, format?, keyId?, signature?} }
content{ format, specVersion, raw }
statements[
  { vulnerabilityId, productKey, status, justification?,
    introducedVersion?, fixedVersion?, locator }
]
linkset{ purls[], cpes[], aliases[], references[],
         reconciledFrom[], conflicts[]? }
attributes{ batchId?, replayCursor? }
createdAt

Raw payload (content.raw) remains lossless (Relaxed Extended JSON).
Statements provide normalized tuples for each claim contained in the document, including justification and version hints.
Linkset mirrors identifiers extracted during ingestion, retaining JSON pointer metadata so audits can trace back to the source fragment.
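
As a rough illustration of the schema sketch above, the following Python composes an observation key and a content hash. The function names, the choice of SHA-256, and the `sha256:` prefix are assumptions made for this sketch; the final shape lands with EXCITITOR-LNM-21-001.

```python
import hashlib


def content_hash(raw: bytes) -> str:
    """Hash the untouched upstream payload (SHA-256 is an assumption)."""
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def observation_id(tenant: str, provider_id: str, upstream_id: str, revision: int) -> str:
    """Compose {tenant}:{providerId}:{upstreamId}:{revision} as in the sketch."""
    return f"{tenant}:{provider_id}:{upstream_id}:{revision}"


# Hypothetical payload bytes; a real connector would pass the fetched document.
raw_payload = b'{"statements": []}'
print(observation_id("tenant", "redhat", "openvex", 3))
print(content_hash(raw_payload))
```

Hashing the raw bytes rather than a parsed form keeps the hash tied to the lossless payload, so a revision is detected whenever the upstream document changes at all.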

1.2 Linkset lifecycle

Linksets correlate claims referring to the same (vulnerabilityId, productKey) pair across providers.

Seed – Observations push normalized identifiers (CVE, GHSA, vendor IDs) plus canonical product keys (purl preferred, cpe fallback). Platform-scoped statements remain marked non_joinable.
Correlate – The linkset builder groups statements by tenant and identity, combines alias graphs from Concelier, and uses justification/product overlap to assign correlation confidence.
Annotate – Conflicts (status disagreement, justification mismatch, range inconsistencies) are recorded as structured entries.
Persist – Results land in vex_linksets with deterministic IDs (hash of sorted (vulnerabilityId, productKey, observationIds)) and append-only history for replay/debugging.

Linksets never override statements or invent consensus; they simply align evidence for Policy Engine and consumers.
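
The deterministic ID described in the Persist step can be sketched as follows. This is a minimal illustration, assuming SHA-256 over compact, key-sorted JSON; the `linkset_id` helper and the exact canonical form are hypothetical, not the shipped implementation.

```python
import hashlib
import json
from typing import Iterable


def linkset_id(vulnerability_id: str, product_key: str, observation_ids: Iterable[str]) -> str:
    """Derive a stable ID from the pair plus its sorted, de-duplicated observation IDs."""
    canonical = json.dumps(
        {
            "vulnerabilityId": vulnerability_id,
            "productKey": product_key,
            "observationIds": sorted(set(observation_ids)),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


ids = ["tenant:ubuntu:cyclonedx:12", "tenant:redhat:openvex:3"]
print(linkset_id("CVE-2025-2222", "pkg:rpm/redhat/openssl@1.1.1w-12", ids))
```

Because the observation IDs are sorted before hashing, the order in which correlation jobs encounter observations does not change the resulting ID.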

2. Observation vs. linkset

Purpose
- Observation: Immutable record of a single upstream VEX document.
- Linkset: Correlated evidence spanning observations that describe the same product-vulnerability pair.
Mutation
- Observation: Append-only via supersedes.
- Linkset: Regenerated deterministically by correlation jobs.
Allowed fields
- Observation: Raw payload, provenance, normalized statement tuples, join hints.
- Linkset: Observation references, statement IDs, confidence metrics, conflict annotations.
Forbidden fields
- Observation: Derived consensus, suppression flags, risk scores.
- Linkset: Derived severity or policy decisions (only evidence + conflicts).
Consumers
- Observation: Evidence exports, Offline Kit mirrors, CLI raw dumps.
- Linkset: Policy Engine VEX overlay, Console evidence panes, Vuln Explorer.

2.1 Example sequence

The canonical vendor (Red Hat in this example) issues an attested OpenVEX document declaring CVE-2025-2222 as not_affected for pkg:rpm/redhat/openssl@1.1.1w-12. Excititor inserts a new observation referencing that statement.
An upstream CycloneDX VEX document from a distro reports the same product as affected, with under_investigation recorded as its justification.
Linkset builder groups both statements by alias overlap and product key, setting confidence high because CVE and purl match.
Conflict annotation records status-mismatch and retains both justifications; Policy Engine uses this to explain why suppression cannot proceed without policy override.
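
The sequence above can be approximated with a toy correlator. This sketch only groups by exact (vulnerabilityId, productKey) match and ignores alias graphs and confidence scoring; the dict layout and the `correlate` function are hypothetical, and the observation IDs are borrowed from the conflict example in section 3.

```python
from collections import defaultdict

# Two statements about the same pair, following the example sequence.
statements = [
    {"observationId": "tenant:ubuntu:cyclonedx:12", "providerId": "ubuntu",
     "vulnerabilityId": "CVE-2025-2222",
     "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12", "status": "affected"},
    {"observationId": "tenant:redhat:openvex:3", "providerId": "redhat",
     "vulnerabilityId": "CVE-2025-2222",
     "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12", "status": "not_affected"},
]


def correlate(stmts):
    """Group statements per pair and annotate status disagreement without altering them."""
    groups = defaultdict(list)
    for s in stmts:
        groups[(s["vulnerabilityId"], s["productKey"])].append(s)

    linksets = []
    for (vuln, product), members in sorted(groups.items()):
        members = sorted(members, key=lambda m: m["observationId"])
        statuses = sorted({m["status"] for m in members})
        conflicts = []
        if len(statuses) > 1:
            conflicts.append({"type": "status-mismatch", "statuses": statuses})
        linksets.append({
            "vulnerabilityId": vuln,
            "productKey": product,
            "observationIds": [m["observationId"] for m in members],
            "conflicts": conflicts,
        })
    return linksets


for linkset in correlate(statements):
    print(linkset["observationIds"], linkset["conflicts"])
```

Both statements stay untouched in the output; the linkset only points at them and records that their statuses disagree.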

3. Conflict handling

Structured conflicts capture disagreements without mutating source statements.

{
  "type": "status-mismatch",
  "vulnerabilityId": "CVE-2025-2222",
  "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
  "statements": [
    {
      "observationId": "tenant:redhat:openvex:3",
      "providerId": "redhat",
      "status": "not_affected",
      "justification": "component_not_present"
    },
    {
      "observationId": "tenant:ubuntu:cyclonedx:12",
      "providerId": "ubuntu",
      "status": "affected",
      "justification": "under_investigation"
    }
  ],
  "confidence": "medium",
  "detectedAt": "2025-10-27T14:30:00Z"
}

Conflict classes (tracked via EXCITITOR-LNM-21-003):

status-mismatch – Different statuses for the same pair (affected vs not_affected vs fixed vs under_investigation).
justification-divergence – Same status but incompatible justifications or missing justification where policy requires it.
version-range-clash – Introduced/fixed ranges contradict each other.
non-joinable-overlap – Platform-scoped statements collide with package statements; flagged as warning but retained.
metadata-gap – Missing provenance/signature field on specific statements.
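
A pairwise classifier for the first two classes might look like the sketch below. It is deliberately simplified: "incompatible" justifications are approximated as unequal values, and version-range-clash, non-joinable-overlap, and metadata-gap are left out. The `classify_pair` name is hypothetical; the real rules are tracked via EXCITITOR-LNM-21-003.

```python
from typing import Optional


def classify_pair(a: dict, b: dict) -> Optional[str]:
    """Return a conflict class for two statements on the same pair, or None."""
    if a["status"] != b["status"]:
        return "status-mismatch"
    # Same status: treat differing (or one-sided missing) justifications as divergence.
    if a.get("justification") != b.get("justification"):
        return "justification-divergence"
    return None


redhat = {"status": "not_affected", "justification": "component_not_present"}
other = {"status": "not_affected", "justification": "vulnerable_code_not_present"}
print(classify_pair(redhat, other))
```

Checking status first means a pair that disagrees on both status and justification is reported once, under the more severe class.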

Conflicts surface through:

/vex/linksets/{id} APIs (conflicts[] payload).
Console evidence panels (badges + drawer detail).
CLI exports (stella vex linkset … planned in CLI-LNM-22-002).
Metrics dashboards (vex_linkset_conflicts_total{type}).

4. AOC alignment

Raw-first – content.raw and statements[] mirror upstream input; no derived consensus or suppression values are written by ingestion.
No merges – Each upstream statement persists independently; linksets refer back via observationId.
Provenance mandatory – Missing signature or source metadata yields ERR_AOC_004; ingestion blocks until connectors fix the feed.
Idempotent writes – Duplicate (providerId, upstreamId, contentHash) results in a no-op; revisions append with a supersedes pointer.
Deterministic output – Correlator sorts identifiers, normalizes timestamps (UTC ISO-8601), and hashes canonical JSON to generate stable linkset IDs.
Scope-aware – Tenant claims enforced on write/read; Authority scopes vex:ingest / vex:read are required (see AUTH-AOC-22-001).

Violations raise ERR_AOC_00x, emit aoc_violation_total, and prevent the data from landing downstream.
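
The provenance and idempotency guardrails can be illustrated with an in-memory stand-in for vex_observations. Everything here is a sketch under stated assumptions: the `ObservationStore` and `AocViolation` classes are hypothetical, the tenant is added to the duplicate key, and revisions are numbered from 1. Only the ERR_AOC_004 code and the supersedes behaviour come from the rules above.

```python
class AocViolation(Exception):
    """Raised when a write would break the Aggregation-Only Contract."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


class ObservationStore:
    """In-memory stand-in for the vex_observations collection (illustrative only)."""

    def __init__(self):
        self.observations = []  # append-only; entries are never mutated
        self._seen = {}         # (tenant, providerId, upstreamId, contentHash) -> observationId
        self._latest = {}       # (tenant, providerId, upstreamId) -> latest observation

    def write(self, tenant, provider_id, upstream_id, content_hash, signature):
        if signature is None:
            # Provenance is mandatory: block the write instead of storing partial data.
            raise AocViolation("ERR_AOC_004", "missing signature metadata")

        key = (tenant, provider_id, upstream_id, content_hash)
        if key in self._seen:
            return self._seen[key]  # duplicate payload: no-op

        stream = (tenant, provider_id, upstream_id)
        previous = self._latest.get(stream)
        revision = previous["revision"] + 1 if previous else 1
        observation = {
            "observationId": f"{tenant}:{provider_id}:{upstream_id}:{revision}",
            "revision": revision,
            "contentHash": content_hash,
            "supersedes": previous["observationId"] if previous else None,
        }
        self.observations.append(observation)
        self._seen[key] = observation["observationId"]
        self._latest[stream] = observation
        return observation["observationId"]


store = ObservationStore()
first = store.write("tenant", "redhat", "openvex", "hash-1", {"present": True})
again = store.write("tenant", "redhat", "openvex", "hash-1", {"present": True})
second = store.write("tenant", "redhat", "openvex", "hash-2", {"present": True})
print(first, again, second, store.observations[-1]["supersedes"])
```

A changed payload appends a new revision that points back at its predecessor, while a replayed payload returns the existing ID and leaves the collection unchanged.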

5. Downstream consumption

Policy Engine – Evaluates VEX evidence alongside advisory linksets to gate suppression, severity downgrades, or explainability.
Console UI – Evidence panel renders VEX statements grouped by provider and highlights conflicts or missing signatures.
CLI – Planned commands export observations/linksets for offline analysis (CLI-LNM-22-002).
Offline Kit – Bundled snapshots keep VEX data aligned with advisory observations for air-gapped parity.
Observability – Dashboards track ingestion latency, conflict counts, and supersedes depth per provider.

New consumers must treat both collections as read-only and preserve deterministic ordering when caching.

6. Validation & testing

Unit tests (StellaOps.Excititor.Core.Tests) should cover schema guards, deterministic linkset hashing, conflict classification, and supersedes behaviour.
Mongo integration tests (StellaOps.Excititor.Storage.Mongo.Tests) should verify indexes, shard keys, and idempotent writes across tenants.
CLI smoke suites (stella vex observations, stella vex linksets) should check JSON determinism and exit code coverage.
Replay determinism – Feed identical upstream payloads twice and ensure observation/linkset hashes match across runs.
Offline kit verification – Validate VEX exports packaged in Offline Kit snapshots against live service outputs.
Fixture refresh – Samples (SAMPLES-LNM-22-002) must include multi-source conflicts and justification variants used by docs and UI tests.
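
A replay determinism check could be shaped like the sketch below. It uses a toy correlator and SHA-256 over canonical JSON as stand-ins; `build_linksets`, `digest`, and the test name are hypothetical, and the real suites would run against the actual Excititor pipeline.

```python
import hashlib
import json
import random


def build_linksets(statements):
    """Toy correlator: group by (vulnerabilityId, productKey) and sort all output."""
    groups = {}
    for s in statements:
        key = (s["vulnerabilityId"], s["productKey"])
        groups.setdefault(key, []).append(s["observationId"])
    return [
        {"vulnerabilityId": vuln, "productKey": product, "observationIds": sorted(ids)}
        for (vuln, product), ids in sorted(groups.items())
    ]


def digest(linksets):
    """Hash a canonical JSON rendering so two runs can be compared byte for byte."""
    canonical = json.dumps(linksets, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def test_replay_is_deterministic():
    statements = [
        {"observationId": f"tenant:provider{i % 3}:doc:{i}",
         "vulnerabilityId": f"CVE-2025-{2222 + i % 2}",
         "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12"}
        for i in range(10)
    ]
    first = digest(build_linksets(statements))

    # Replay the same payloads in a different arrival order.
    shuffled = list(statements)
    random.Random(7).shuffle(shuffled)
    assert digest(build_linksets(shuffled)) == first


test_replay_is_deterministic()
```

Shuffling the input on the second run exercises the ordering guarantees as well as plain repeatability, which is where nondeterminism tends to hide.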

7. Reviewer checklist

Observation schema aligns with EXCITITOR-LNM-21-001 once the schema lands; update references as soon as the final contract is published.
Linkset lifecycle covers correlation signals (alias graphs, product keys, justification rules) and deterministic ID strategy.
Conflict classes include status, justification, version range, platform overlap scenarios.
AOC guardrails called out with relevant error codes and Authority scopes.
Downstream consumer list matches active APIs/CLI features (update when CLI-LNM-22-002 and WebService endpoints ship).
Validation section references Core, Storage, CLI, and Offline test suites plus fixture requirements.
Imposed rule reminder retained at top.

Dependencies outstanding (2025-10-27): EXCITITOR-LNM-21-001..005 and EXCITITOR-LNM-21-101..102 are still TODO; revisit this document once schemas, APIs, and fixtures are implemented.
