# Advisory Observations & Linksets
> Imposed rule: Work of this type or tasks of this type on this component must also
> be applied everywhere else it should be applied.

The Link-Not-Merge (LNM) initiative replaces the legacy "merge" pipeline with
immutable observations and correlation linksets. This guide explains how
Concelier ingests advisory statements, preserves upstream truth, and produces
linksets that downstream services (Policy Engine, Vuln Explorer, Console) can
use without collapsing sources together.

---

## 1. Model overview
### 1.1 Observation lifecycle
1. **Ingest**: Connectors fetch upstream payloads (CSAF, OSV, vendor feeds),
   validate signatures, and drop any derived fields prohibited by the
   Aggregation-Only Contract (AOC).
2. **Persist**: Concelier writes immutable `advisory_observations` scoped by
   `tenant`, `(source.vendor, upstreamId)`, and `contentHash`. Supersedes chains
   capture revisions without mutating history.
3. **Expose**: WebService surfaces paged/read APIs; Offline Kit snapshots
   include the same documents for air-gapped installs.

Observation schema highlights:
```text
observationId = {tenant}:{source.vendor}:{upstreamId}:{revision}
tenant, source{vendor, stream, api, collectorVersion}
upstream{upstreamId, documentVersion, fetchedAt, receivedAt,
contentHash, signature{present, format, keyId, signature}}
content{format, specVersion, raw}
identifiers{cve?, ghsa?, aliases[], osvIds[]}
linkset{purls[], cpes[], aliases[], references[], conflicts[]?}
createdAt, attributes{batchId?, replayCursor?}
```
- **Immutable raw** (`content.raw`) mirrors upstream payloads exactly.
- **Provenance** (`source.*`, `upstream.*`) satisfies AOC guardrails and enables
cryptographic attestations.
- **Identifiers** retain lossless extracts (CVE, GHSA, vendor aliases) that seed
linksets.
- **Linkset** captures join hints but never merges or adds derived severity.
### 1.2 Linkset lifecycle
Linksets correlate observations that describe the same vulnerable product while
keeping each source intact.
1. **Seed**: Observations emit normalized identifiers (`purl`, `cpe`,
   `alias`) during ingestion.
2. **Correlate**: Linkset builder groups observations by tenant, product
   coordinates, and equivalence signals (PURL alias graph, CVE overlap, CVSS
   vector equality, fuzzy titles).
3. **Annotate**: Detected conflicts (severity disagreements, affected-range
   mismatch, incompatible references) are recorded with structured payloads and
   preserved for UI/API export.
4. **Persist**: Results land in `advisory_linksets` with deterministic IDs
   (`linksetId = {tenant}:{hash(aliases+purls+seedIds)}`) and append-only history
   for reproducibility.
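
The deterministic ID in step 4 depends on hashing the same inputs in the same order every time. A minimal sketch of that idea, assuming SHA-256 and a sorted, de-duplicated JSON encoding (the document specifies neither the hash function nor how the three inputs are concatenated, and `linkset_id` is a hypothetical helper name):

```python
import hashlib
import json
from typing import Iterable


def linkset_id(tenant: str, aliases: Iterable[str], purls: Iterable[str],
               seed_ids: Iterable[str]) -> str:
    """Build {tenant}:{hash(aliases+purls+seedIds)} independent of input order."""
    # De-duplicate and sort each input so ordering never changes the digest.
    canonical = json.dumps(
        {
            "aliases": sorted(set(aliases)),
            "purls": sorted(set(purls)),
            "seedIds": sorted(set(seed_ids)),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{tenant}:{digest}"


# The same signals in a different order yield the same ID.
a = linkset_id("acme", ["CVE-2025-0001", "RHSA-2025:1234"],
               ["pkg:rpm/redhat/openssl@1.1.1w-12"], ["obs-1", "obs-2"])
b = linkset_id("acme", ["RHSA-2025:1234", "CVE-2025-0001"],
               ["pkg:rpm/redhat/openssl@1.1.1w-12"], ["obs-2", "obs-1"])
assert a == b
```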
Linksets never suppress or prefer one source; they provide aligned evidence so
other services can apply policy.

---

## 2. Observation vs. linkset
- **Purpose**
  - Observation: Immutable record per vendor and revision.
  - Linkset: Correlates observations that share product identity.
- **Mutation**
  - Observation: Append-only via supersedes chain.
  - Linkset: Rebuilt deterministically from canonical signals.
- **Allowed fields**
  - Observation: Raw payload, provenance, identifiers, join hints.
  - Linkset: Observation references, normalized product metadata, conflicts.
- **Forbidden fields**
  - Observation: Derived severity, policy status, opinionated dedupe.
  - Linkset: Derived severity (conflicts recorded but unresolved).
- **Consumers**
  - Observation: Evidence API, Offline Kit, CLI exports.
  - Linkset: Policy Engine overlay, UI evidence panel, Vuln Explorer.
### 2.1 Example sequence
1. Red Hat PSIRT publishes RHSA-2025:1234 for OpenSSL; Concelier inserts an
observation for vendor `redhat` with `pkg:rpm/redhat/openssl@1.1.1w-12`.
2. NVD issues CVE-2025-0001; a second observation is inserted for vendor `nvd`.
3. Linkset builder runs, groups the two observations, records alias and PURL
overlap, and flags a CVSS disagreement (`7.5` vs `7.2`).
4. Policy Engine reads the linkset, recognises the severity variance, and relies
on configured rules to decide the effective output.
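
Steps 1 to 3 of the sequence above can be sketched as follows. This is an illustrative reduction of the linkset builder, not its real logic: the `Obs` type, the `correlate` function, and the `aliasOverlap`/`purlOverlap` keys are hypothetical, and the example assumes that both sources list `CVE-2025-0001` and the same PURL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Obs:
    source: str
    aliases: frozenset
    purls: frozenset
    cvss_score: Optional[str] = None  # copied verbatim from the upstream payload


def correlate(left: Obs, right: Obs) -> Optional[dict]:
    """Group two observations if they share an alias or PURL; record disagreements."""
    alias_overlap = sorted(left.aliases & right.aliases)
    purl_overlap = sorted(left.purls & right.purls)
    if not alias_overlap and not purl_overlap:
        return None  # nothing ties the two observations together
    conflicts = []
    if left.cvss_score and right.cvss_score and left.cvss_score != right.cvss_score:
        # Record the disagreement; neither value is preferred or rewritten.
        conflicts.append({
            "type": "severity-mismatch",
            "field": "cvss.baseScore",
            "observations": [
                {"source": left.source, "value": left.cvss_score},
                {"source": right.source, "value": right.cvss_score},
            ],
        })
    return {
        "observations": [left.source, right.source],
        "aliasOverlap": alias_overlap,
        "purlOverlap": purl_overlap,
        "conflicts": conflicts,
    }


purl = "pkg:rpm/redhat/openssl@1.1.1w-12"
redhat = Obs("redhat", frozenset({"RHSA-2025:1234", "CVE-2025-0001"}), frozenset({purl}), "7.5")
nvd = Obs("nvd", frozenset({"CVE-2025-0001"}), frozenset({purl}), "7.2")
linkset = correlate(redhat, nvd)
```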
---
## 3. Conflict handling
Conflicts record disagreements without altering source payloads. The builder
emits structured entries:
```json
{
  "type": "severity-mismatch",
  "field": "cvss.baseScore",
  "observations": [
    {
      "source": "redhat",
      "value": "7.5",
      "vector": "AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"
    },
    {
      "source": "nvd",
      "value": "7.2",
      "vector": "AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N"
    }
  ],
  "confidence": "medium",
  "detectedAt": "2025-10-27T14:00:00Z"
}
```
Supported conflict classes:
- `severity-mismatch`: CVSS or qualitative severities differ.
- `affected-range-divergence`: Product ranges, fixed versions, or platforms
  disagree.
- `statement-disagreement`: One observation declares `not_affected` while
  another states `affected`.
- `reference-clash`: URL or classifier collisions (for example, exploit URL vs
  conflicting advisory).
- `alias-inconsistency`: Aliases map to different canonical IDs (GHSA vs CVE).
- `metadata-gap`: Required provenance missing on one source; logged as a
  warning.
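
As one concrete case from the list above, a `statement-disagreement` check might look like the sketch below. The enum mirrors the documented class names; the function name, its input shape (source mapped to declared status), and the output fields beyond `type` are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class ConflictType(str, Enum):
    SEVERITY_MISMATCH = "severity-mismatch"
    AFFECTED_RANGE_DIVERGENCE = "affected-range-divergence"
    STATEMENT_DISAGREEMENT = "statement-disagreement"
    REFERENCE_CLASH = "reference-clash"
    ALIAS_INCONSISTENCY = "alias-inconsistency"
    METADATA_GAP = "metadata-gap"


def detect_statement_disagreement(statements: dict) -> Optional[dict]:
    """Return a conflict entry when sources disagree on affected vs. not_affected.

    `statements` maps a source name to the status it declared (hypothetical shape).
    """
    statuses = set(statements.values())
    if {"affected", "not_affected"} <= statuses:
        return {
            "type": ConflictType.STATEMENT_DISAGREEMENT.value,
            # Sorted by source so repeated runs emit identical entries.
            "observations": [
                {"source": source, "value": status}
                for source, status in sorted(statements.items())
            ],
        }
    return None  # all sources agree, or only one status is present
```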
Conflict surfaces:
- WebService endpoints (`GET /advisories/linksets/{id}` → `conflicts[]`).
- UI evidence panel chips and conflict badges.
- CLI exports (JSON/OSV) exposed through LNM commands.
- Observability metrics (`advisory_linkset_conflicts_total{type}`).
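
For the metrics surface, a per-type tally is enough to feed a counter such as `advisory_linkset_conflicts_total{type}`. The sketch below counts conflicts across linksets and renders them as Prometheus-style text lines; the helper names are hypothetical, and how Concelier actually registers the metric is not described here.

```python
from collections import Counter


def conflict_counts(linksets) -> Counter:
    """Tally conflict entries by their `type` field across all linksets."""
    counts = Counter()
    for linkset in linksets:
        for conflict in linkset.get("conflicts", []):
            counts[conflict["type"]] += 1
    return counts


def render_metric(counts: Counter) -> list:
    """Render one text line per conflict type, sorted for stable output."""
    return [
        f'advisory_linkset_conflicts_total{{type="{conflict_type}"}} {total}'
        for conflict_type, total in sorted(counts.items())
    ]


sample = [
    {"conflicts": [{"type": "severity-mismatch"}, {"type": "reference-clash"}]},
    {"conflicts": [{"type": "severity-mismatch"}]},
    {},  # a linkset without conflicts contributes nothing
]
lines = render_metric(conflict_counts(sample))
```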
---
## 4. AOC alignment
Observations and linksets must satisfy Aggregation-Only Contract invariants:
- **No derived severity**: `content.raw` may include upstream severity, but the
  observation body never injects or edits severity.
- **No merges**: Each upstream document stays separate; linksets reference
  observations via deterministic IDs.
- **Provenance mandatory**: Missing `signature` or `source` metadata is an AOC
  violation (`ERR_AOC_004`).
- **Idempotent writes**: Duplicate `contentHash` yields a no-op; supersedes
  pointer captures new revisions.
- **Deterministic output**: Linkset builder sorts keys, normalizes timestamps
  (UTC ISO-8601), and uses canonical JSON hashing.
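
A write guard for the first and third invariants could be sketched like this. Only `ERR_AOC_004` is taken from this document; the code for derived fields is not given, so the placeholder `ERR_AOC_00x` is kept as-is. The `AocViolation` class, the `guard` function, and the forbidden field names are hypothetical.

```python
class AocViolation(Exception):
    """Raised before persistence when an observation breaks an AOC invariant."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


# Hypothetical names for derived fields that must not appear on the observation body.
FORBIDDEN_FIELDS = {"severity", "effectiveSeverity", "policyStatus"}


def guard(observation: dict) -> None:
    """Reject observations with missing provenance or derived fields."""
    source = observation.get("source")
    signature = observation.get("upstream", {}).get("signature")
    if not source or signature is None:
        raise AocViolation("ERR_AOC_004", "missing source or signature metadata")
    derived = sorted(FORBIDDEN_FIELDS & observation.keys())
    if derived:
        # Specific code not stated in this document; placeholder left unresolved.
        raise AocViolation("ERR_AOC_00x", f"derived fields present: {derived}")
    # content.raw is never inspected: upstream severity inside it is allowed.
```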
Violations trigger guard errors (`ERR_AOC_00x`), emit `aoc_violation_total`
metrics, and block persistence until corrected.

---

## 5. Downstream consumption
- **Policy Engine**: Computes effective severity and risk overlays from linkset
  evidence and conflicts.
- **Console UI**: Renders per-source statements, signed hashes, and conflict
  banners inside the evidence panel.
- **CLI (`stella advisories linkset …`)**: Exports observations and linksets as
  JSON or OSV for offline triage.
- **Offline Kit**: Shipping snapshots include observation and linkset
  collections for air-gap parity.
- **Observability**: Dashboards track ingestion latency, conflict counts, and
  supersedes depth.
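
A consumer that derives a value from a linkset should write its result to a separate overlay rather than back into the linkset. The sketch below shows that pattern with a made-up rule ("highest score wins"); the rule, the `effective_severity` function, and the overlay field names are assumptions and do not describe how Policy Engine is actually configured.

```python
from types import MappingProxyType


def effective_severity(linkset) -> dict:
    """Derive an overlay from linkset evidence without touching the linkset."""
    scores = []
    for conflict in linkset.get("conflicts", []):
        if conflict["type"] == "severity-mismatch":
            for entry in conflict["observations"]:
                scores.append((float(entry["value"]), entry["source"]))
    overlay = {"linksetId": linkset["linksetId"], "rule": "highest-score-wins"}
    if scores:
        score, source = max(scores)  # hypothetical policy rule
        overlay.update({"effectiveScore": score, "chosenSource": source})
    return overlay  # stored by the consumer, never written into the linkset


stored = {
    "linksetId": "acme:example",
    "conflicts": [{
        "type": "severity-mismatch",
        "observations": [
            {"source": "redhat", "value": "7.5"},
            {"source": "nvd", "value": "7.2"},
        ],
    }],
}
# MappingProxyType gives a read-only top-level view (nested values are not frozen).
overlay = effective_severity(MappingProxyType(stored))
```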
When adding new consumers, ensure they honour append-only semantics and do not
mutate observation or linkset collections.

---

## 6. Validation & testing
- **Unit tests** (`StellaOps.Concelier.Core.Tests`) validate schema guards,
deterministic linkset hashing, conflict detection fixtures, and supersedes
chains.
- **Mongo integration tests** (`StellaOps.Concelier.Storage.Mongo.Tests`) verify
indexes and idempotent writes under concurrency.
- **CLI smoke suites** confirm `stella advisories observations` and `stella
advisories linksets` export stable JSON.
- **Determinism checks** replay identical upstream payloads and assert that the
resulting observation and linkset documents match byte for byte.
- **Offline kit verification** simulates air-gapped bootstrap to confirm that
snapshots align with live data.
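
A determinism check of the kind listed above reduces to serializing two replays and comparing bytes. The sketch below uses sorted keys, compact separators, and UTC timestamp normalization as a stand-in for canonical JSON; it is not a full canonicalization scheme, and both helper names are hypothetical. The real suites are .NET tests, so this only illustrates the assertion.

```python
import json
from datetime import datetime, timezone


def normalize_timestamp(value: str) -> str:
    """Convert an ISO-8601 timestamp with an offset to UTC with a trailing Z."""
    # Replace "Z" so older Python versions can parse the value.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_json(document: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


# Two replays that differ only in key order and timestamp offset.
first = canonical_json({
    "detectedAt": normalize_timestamp("2025-10-27T16:00:00+02:00"),
    "type": "severity-mismatch",
})
second = canonical_json({
    "type": "severity-mismatch",
    "detectedAt": normalize_timestamp("2025-10-27T14:00:00Z"),
})
assert first == second  # byte-for-byte match
```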
Add fixtures whenever a new conflict type or correlation signal is introduced.
Ensure canonical JSON serialization remains stable across .NET runtime updates.

---

## 7. Reviewer checklist
- Observation schema segment matches the latest `StellaOps.Concelier.Models`
contract.
- Linkset lifecycle covers correlation signals, conflict classes, and
deterministic IDs.
- AOC invariants are explicitly called out with violation codes.
- Examples include multi-source correlation plus conflict annotation.
- Downstream consumer guidance reflects active APIs and CLI features.
- Testing section lists required suites (Core, Storage, CLI, Offline).
- Imposed rule reminder is present at the top of the document.

Confirmed against Concelier Link-Not-Merge tasks:
`CONCELIER-LNM-21-001..005`, `CONCELIER-LNM-21-101..103`,
`CONCELIER-LNM-21-201..203`.