# VEX Observations & Linksets
> Imposed rule: Work of this type or tasks of this type on this component must
> also be applied everywhere else it should be applied.

Link-Not-Merge brings the same immutable observation model to Excititor that
Concelier now uses for advisories. VEX statements are stored as append-only
observations; linksets correlate them, capture conflicts, and keep provenance so
Policy Engine and UI surfaces can explain decisions without collapsing sources.

---
## 1. Model overview
### 1.1 Observation lifecycle
1. **Ingest**: Connectors fetch OpenVEX, CSAF VEX, CycloneDX VEX, or VEX
   attestations, validate signatures, and strip any derived consensus data
   forbidden by the Aggregation-Only Contract (AOC).
2. **Persist**: Excititor writes immutable `vex_observations` keyed by tenant,
   provider, upstream identifier, and `contentHash`. Supersedes chains record
   revisions; the original payload is never mutated.
3. **Expose**: WebService will surface paginated observation APIs and Offline
   Kit snapshots mirror the same data for air-gapped sites.

Observation schema sketch (final shape lands with `EXCITITOR-LNM-21-001`):
```text
observationId = {tenant}:{providerId}:{upstreamId}:{revision}
tenant, providerId, streamId
upstream{ upstreamId, documentVersion, fetchedAt, receivedAt,
          contentHash, signature{present, format?, keyId?, signature?} }
content{ format, specVersion, raw }
statements[
  { vulnerabilityId, productKey, status, justification?,
    introducedVersion?, fixedVersion?, locator }
]
linkset{ purls[], cpes[], aliases[], references[],
         reconciledFrom[], conflicts[]? }
attributes{ batchId?, replayCursor? }
createdAt
```
- **Raw payload** (`content.raw`) remains lossless (Relaxed Extended JSON).
- **Statements** provide normalized tuples for each claim contained in the
document, including justification and version hints.
- **Linkset** mirrors identifiers extracted during ingestion, retaining JSON
pointer metadata so audits can trace back to the source fragment.
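
The identifier layout in the schema sketch can be illustrated with a few lines of Python. This is a minimal sketch, not the Excititor implementation: the `ObservationKey` class and `content_hash` helper are hypothetical names, and the choice of SHA-256 with a `sha256:` prefix is an assumption, since the document does not state which hash backs `contentHash`.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen mirrors the "never mutated" rule for observations
class ObservationKey:
    tenant: str
    provider_id: str
    upstream_id: str
    revision: int

    def observation_id(self) -> str:
        # Layout from the schema sketch: {tenant}:{providerId}:{upstreamId}:{revision}
        return f"{self.tenant}:{self.provider_id}:{self.upstream_id}:{self.revision}"


def content_hash(raw: bytes) -> str:
    # Assumption: SHA-256 over the untouched upstream bytes, with an algorithm prefix.
    return "sha256:" + hashlib.sha256(raw).hexdigest()


key = ObservationKey("tenant-a", "redhat", "openvex", 3)
print(key.observation_id())
print(content_hash(b'{"statements": []}'))
```

Hashing the raw bytes rather than a parsed form keeps the hash tied to the lossless payload, so a re-fetch of an unchanged document yields the same value.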
### 1.2 Linkset lifecycle
Linksets correlate claims referring to the same `(vulnerabilityId, productKey)`
pair across providers.
1. **Seed**: Observations push normalized identifiers (CVE, GHSA, vendor IDs)
   plus canonical product keys (purl preferred, cpe fallback). Platform-scoped
   statements remain marked `non_joinable`.
2. **Correlate**: The linkset builder groups statements by tenant and identity,
   combines alias graphs from Concelier, and uses justification/product overlap
   to assign correlation confidence.
3. **Annotate**: Conflicts (status disagreement, justification mismatch, range
   inconsistencies) are recorded as structured entries.
4. **Persist**: Results land in `vex_linksets` with deterministic IDs (hash of
   sorted `(vulnerabilityId, productKey, observationIds)`) and append-only
   history for replay/debugging.

Linksets never override statements or invent consensus; they simply align
evidence for Policy Engine and consumers.

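
The deterministic ID in the Persist step could be derived along the following lines. This is a sketch under stated assumptions: the function name `linkset_id`, the use of SHA-256, and the compact JSON encoding are illustrative choices; the document only specifies a hash over the sorted `(vulnerabilityId, productKey, observationIds)` tuple.

```python
import hashlib
import json


def linkset_id(vulnerability_id: str, product_key: str, observation_ids: list[str]) -> str:
    # Sort and de-duplicate the observation IDs so arrival order cannot change the ID.
    canonical = json.dumps(
        {
            "vulnerabilityId": vulnerability_id,
            "productKey": product_key,
            "observationIds": sorted(set(observation_ids)),
        },
        sort_keys=True,            # stable key order
        separators=(",", ":"),     # no whitespace differences between runs
    )
    # Assumption: SHA-256 hex digest; the real hash algorithm is not stated.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = linkset_id(
    "CVE-2025-2222",
    "pkg:rpm/redhat/openssl@1.1.1w-12",
    ["tenant:ubuntu:cyclonedx:12", "tenant:redhat:openvex:3"],
)
second = linkset_id(
    "CVE-2025-2222",
    "pkg:rpm/redhat/openssl@1.1.1w-12",
    ["tenant:redhat:openvex:3", "tenant:ubuntu:cyclonedx:12"],
)
print(first == second)
```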
---
## 2. Observation vs. linkset
- **Purpose**
  - Observation: Immutable record of a single upstream VEX document.
  - Linkset: Correlated evidence spanning observations that describe the same
    product-vulnerability pair.
- **Mutation**
  - Observation: Append-only via supersedes.
  - Linkset: Regenerated deterministically by correlation jobs.
- **Allowed fields**
  - Observation: Raw payload, provenance, normalized statement tuples, join
    hints.
  - Linkset: Observation references, statement IDs, confidence metrics, conflict
    annotations.
- **Forbidden fields**
  - Observation: Derived consensus, suppression flags, risk scores.
  - Linkset: Derived severity or policy decisions (only evidence + conflicts).
- **Consumers**
  - Observation: Evidence exports, Offline Kit mirrors, CLI raw dumps.
  - Linkset: Policy Engine VEX overlay, Console evidence panes, Vuln Explorer.
### 2.1 Example sequence
1. Canonical vendor issues an attested OpenVEX declaring `CVE-2025-2222` as
`not_affected` for `pkg:rpm/redhat/openssl@1.1.1w-12`. Excititor inserts a
new observation referencing that statement.
2. Upstream CycloneDX VEX from a distro reports the same product as `affected`
with `under_investigation` justification.
3. Linkset builder groups both statements by alias overlap and product key,
setting confidence `high` because CVE and purl match.
4. Conflict annotation records `status-mismatch` and retains both justifications;
Policy Engine uses this to explain why suppression cannot proceed without
policy override.
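
The grouping in step 3 can be sketched as below. This is an illustrative simplification: `group_statements` is a hypothetical helper, the statements are plain dictionaries, and alias resolution through Concelier is assumed to have already mapped every identifier to the same CVE. Confidence scoring is left out because its rules are not defined here.

```python
from collections import defaultdict

# The two statements from the example sequence, reduced to the fields needed for grouping.
statements = [
    {
        "observationId": "tenant:redhat:openvex:3",
        "providerId": "redhat",
        "vulnerabilityId": "CVE-2025-2222",
        "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
        "status": "not_affected",
    },
    {
        "observationId": "tenant:ubuntu:cyclonedx:12",
        "providerId": "ubuntu",
        "vulnerabilityId": "CVE-2025-2222",
        "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
        "status": "affected",
    },
]


def group_statements(items):
    # Group by (vulnerabilityId, productKey); every statement is kept, none is merged.
    groups = defaultdict(list)
    for item in items:
        groups[(item["vulnerabilityId"], item["productKey"])].append(item)
    return dict(groups)


groups = group_statements(statements)
for (vulnerability_id, product_key), members in groups.items():
    providers = sorted(m["providerId"] for m in members)
    print(vulnerability_id, product_key, providers)
```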
---
## 3. Conflict handling
Structured conflicts capture disagreements without mutating source statements.
```json
{
  "type": "status-mismatch",
  "vulnerabilityId": "CVE-2025-2222",
  "productKey": "pkg:rpm/redhat/openssl@1.1.1w-12",
  "statements": [
    {
      "observationId": "tenant:redhat:openvex:3",
      "providerId": "redhat",
      "status": "not_affected",
      "justification": "component_not_present"
    },
    {
      "observationId": "tenant:ubuntu:cyclonedx:12",
      "providerId": "ubuntu",
      "status": "affected",
      "justification": "under_investigation"
    }
  ],
  "confidence": "medium",
  "detectedAt": "2025-10-27T14:30:00Z"
}
```
Conflict classes (tracked via `EXCITITOR-LNM-21-003`):
- `status-mismatch`: Different statuses for the same pair (affected vs
  not_affected vs fixed vs under_investigation).
- `justification-divergence`: Same status but incompatible justifications or
  missing justification where policy requires it.
- `version-range-clash`: Introduced/fixed ranges contradict each other.
- `non-joinable-overlap`: Platform-scoped statements collide with package
  statements; flagged as warning but retained.
- `metadata-gap`: Missing provenance/signature field on specific statements.

Conflicts surface through:
- `/vex/linksets/{id}` APIs (`conflicts[]` payload).
- Console evidence panels (badges + drawer detail).
- CLI exports (`stella vex linkset …` planned in `CLI-LNM-22-002`).
- Metrics dashboards (`vex_linkset_conflicts_total{type}`).
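
As a rough illustration of how the first two conflict classes might be told apart, the sketch below classifies a group of statements that already share one `(vulnerabilityId, productKey)` pair and builds a record shaped like the JSON example above. The names `classify` and `build_conflict` are hypothetical, treating any difference in justification (including a missing one) as divergence is a simplification, and the remaining classes and the `confidence` field are deliberately left out because their rules are not specified here.

```python
def classify(statements):
    """Return a conflict type for statements on one product-vulnerability pair, or None."""
    statuses = {s["status"] for s in statements}
    if len(statuses) > 1:
        return "status-mismatch"
    # Same status everywhere: compare justifications (a missing one counts as None).
    justifications = {s.get("justification") for s in statements}
    if len(justifications) > 1:
        return "justification-divergence"
    return None


def build_conflict(vulnerability_id, product_key, statements, detected_at):
    conflict_type = classify(statements)
    if conflict_type is None:
        return None
    return {
        "type": conflict_type,
        "vulnerabilityId": vulnerability_id,
        "productKey": product_key,
        # Sort by observationId so identical inputs always serialize identically.
        "statements": sorted(statements, key=lambda s: s["observationId"]),
        "detectedAt": detected_at,
    }


example = build_conflict(
    "CVE-2025-2222",
    "pkg:rpm/redhat/openssl@1.1.1w-12",
    [
        {"observationId": "tenant:ubuntu:cyclonedx:12", "providerId": "ubuntu",
         "status": "affected", "justification": "under_investigation"},
        {"observationId": "tenant:redhat:openvex:3", "providerId": "redhat",
         "status": "not_affected", "justification": "component_not_present"},
    ],
    "2025-10-27T14:30:00Z",
)
print(example["type"])
```

The source statements are copied into the record unchanged, which matches the rule that conflicts never mutate what a provider asserted.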
---
## 4. AOC alignment
- **Raw-first**: `content.raw` and `statements[]` mirror upstream input; no
  derived consensus or suppression values are written by ingestion.
- **No merges**: Each upstream statement persists independently; linksets refer
  back via `observationId`.
- **Provenance mandatory**: Missing signature or source metadata yields
  `ERR_AOC_004`; ingestion blocks until connectors fix the feed.
- **Idempotent writes**: Duplicate `(providerId, upstreamId, contentHash)`
  results in a no-op; revisions append with a `supersedes` pointer.
- **Deterministic output**: Correlator sorts identifiers, normalizes timestamps
  (UTC ISO-8601), and hashes canonical JSON to generate stable linkset IDs.
- **Scope-aware**: Tenant claims enforced on write/read; Authority scopes
  `vex:ingest` / `vex:read` are required (see `AUTH-AOC-22-001`).

Violations raise `ERR_AOC_00x`, emit `aoc_violation_total`, and prevent the data
from landing downstream.

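
The idempotent-write rule can be modelled with a small in-memory store. This is a sketch only: `ObservationStore` is a hypothetical stand-in for the `vex_observations` collection, the tenant is added to the duplicate key because observations are tenant-scoped, and numbering revisions from 1 is an assumption.

```python
class ObservationStore:
    """In-memory stand-in for vex_observations (illustrative, not the real storage layer)."""

    def __init__(self):
        self._by_key = {}   # (tenant, providerId, upstreamId, contentHash) -> observation
        self._latest = {}   # (tenant, providerId, upstreamId) -> latest observation

    def write(self, tenant, provider_id, upstream_id, content_hash, raw):
        key = (tenant, provider_id, upstream_id, content_hash)
        if key in self._by_key:
            # Duplicate content: no-op, hand back the stored observation.
            return self._by_key[key], False

        stream = (tenant, provider_id, upstream_id)
        previous = self._latest.get(stream)
        revision = 1 if previous is None else previous["revision"] + 1
        observation = {
            "observationId": f"{tenant}:{provider_id}:{upstream_id}:{revision}",
            "revision": revision,
            "contentHash": content_hash,
            "raw": raw,
            # A new revision points at the one it replaces; the old record stays untouched.
            "supersedes": None if previous is None else previous["observationId"],
        }
        self._by_key[key] = observation
        self._latest[stream] = observation
        return observation, True


store = ObservationStore()
first, created_first = store.write("tenant-a", "redhat", "openvex", "sha256:aaa", b"{}")
again, created_again = store.write("tenant-a", "redhat", "openvex", "sha256:aaa", b"{}")
second, created_second = store.write("tenant-a", "redhat", "openvex", "sha256:bbb", b"{ }")
print(created_first, created_again, second["supersedes"])
```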
---
## 5. Downstream consumption
- **Policy Engine**: Evaluates VEX evidence alongside advisory linksets to gate
  suppression, severity downgrades, or explainability.
- **Console UI**: Evidence panel renders VEX statements grouped by provider and
  highlights conflicts or missing signatures.
- **CLI**: Planned commands export observations/linksets for offline analysis
  (`CLI-LNM-22-002`).
- **Offline Kit**: Bundled snapshots keep VEX data aligned with advisory
  observations for air-gapped parity.
- **Observability**: Dashboards track ingestion latency, conflict counts, and
  supersedes depth per provider.

New consumers must treat both collections as read-only and preserve deterministic
ordering when caching.

---
## 6. Validation & testing
- **Unit tests** (`StellaOps.Excititor.Core.Tests`) to cover schema guards,
deterministic linkset hashing, conflict classification, and supersedes
behaviour.
- **Mongo integration tests** (`StellaOps.Excititor.Storage.Mongo.Tests`) to
verify indexes, shard keys, and idempotent writes across tenants.
- **CLI smoke suites** (`stella vex observations`, `stella vex linksets`) for
JSON determinism and exit code coverage.
- **Replay determinism**: Feed identical upstream payloads twice and ensure
  observation/linkset hashes match across runs.
- **Offline kit verification**: Validate VEX exports packaged in Offline Kit
  snapshots against live service outputs.
- **Fixture refresh**: Samples (`SAMPLES-LNM-22-002`) must include multi-source
  conflicts and justification variants used by docs and UI tests.
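
A replay-determinism check along the lines of the bullet above might look like the following. This is a hedged sketch rather than the project's test code: `normalize_timestamp`, `canonical_hash`, and `project` are hypothetical helpers, SHA-256 is an assumed hash, and input timestamps are assumed to carry an explicit UTC offset.

```python
import hashlib
import json
from datetime import datetime, timezone


def normalize_timestamp(value: str) -> str:
    # Parse an ISO-8601 timestamp that carries an offset and re-emit it in UTC.
    parsed = datetime.fromisoformat(value)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_hash(document: dict) -> str:
    # Sorted keys and fixed separators give one byte sequence per logical document.
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def project(payload: dict) -> dict:
    # Normalize the fields that may legitimately vary in representation between runs.
    return {**payload, "detectedAt": normalize_timestamp(payload["detectedAt"])}


# Two runs over the same logical payload: different key order, different UTC offset.
run_a = {"type": "status-mismatch", "detectedAt": "2025-10-27T14:30:00+00:00"}
run_b = {"detectedAt": "2025-10-27T16:30:00+02:00", "type": "status-mismatch"}

assert canonical_hash(project(run_a)) == canonical_hash(project(run_b))
```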
---
## 7. Reviewer checklist
- Observation schema aligns with `EXCITITOR-LNM-21-001` once the schema lands;
update references as soon as the final contract is published.
- Linkset lifecycle covers correlation signals (alias graphs, product keys,
justification rules) and deterministic ID strategy.
- Conflict classes include status, justification, version-range, and
  platform-overlap scenarios.
- AOC guardrails called out with relevant error codes and Authority scopes.
- Downstream consumer list matches active APIs/CLI features (update when
`CLI-LNM-22-002` and WebService endpoints ship).
- Validation section references Core, Storage, CLI, and Offline test suites plus
fixture requirements.
- Imposed rule reminder retained at top.

Dependencies outstanding (2025-10-27): `EXCITITOR-LNM-21-001..005` and
`EXCITITOR-LNM-21-101..102` are still TODO; revisit this document once schemas,
APIs, and fixtures are implemented.