# Ingestion, aggregation, and linksets

StellaOps ingestion is governed by the Aggregation-Only Contract (AOC). The rules enforce deterministic, policy-neutral collection of advisory and VEX data.

## AOC core rules
- Ingestion writes raw facts only. No derived severity, consensus, or policy hints.
- No merges. Each upstream document is stored independently.
- Provenance is mandatory: source metadata, content hashes, signature fields.
- Idempotent writes keyed by vendor + upstream id + content hash.
- Append-only revisions via supersedes pointers.
- Deterministic output: canonical JSON, UTC timestamps, stable ordering.

## Ingestion pipeline (high level)
1) Fetch upstream payload.
2) Validate signature and schema.
3) Normalize metadata (timestamps, ids, content hash).
4) Persist raw document (append-only).
5) Emit observation (immutable record).
6) Build linksets (deterministic correlation).
7) Expose via API and Offline Kit snapshots.

## Advisory observations (Concelier)
- observationId format: `{tenant}:{source.vendor}:{upstreamId}:{revision}`.
- Key fields: tenant, source, upstream, content.raw, identifiers, linkset hints.
- Supersedes pointer links revisions without mutation.

## VEX observations (Excititor)
- observationId format: `{tenant}:{providerId}:{upstreamId}:{revision}`.
- Raw VEX payload plus normalized statement tuples.
- Linkset hints include purls, cpes, aliases, references.

## Linksets and conflicts
- Linksets correlate observations by product identity while preserving sources.
- Deterministic ids are hashes of sorted identifiers and observation references.
- Conflicts are recorded, not resolved.


Common conflict types:
- severity mismatch
- affected range divergence
- status or justification mismatch
- alias inconsistency
- metadata gap (missing provenance)

## Observation example (short)
```json
{
  "observationId": "tenant-a:redhat:CVE-2025-0001:1",
  "tenant": "tenant-a",
  "source": { "vendor": "redhat", "stream": "csaf" },
  "upstream": {
    "upstreamId": "CVE-2025-0001",
    "documentVersion": "2025-01-10",
    "contentHash": "sha256:1111...",
    "signature": { "present": true }
  },
  "identifiers": { "cve": "CVE-2025-0001", "aliases": ["RHSA-2025:1234"] },
  "linkset": { "purls": ["pkg:rpm/redhat/openssl@1.1.1w-12"] }
}
```

## Deterministic linkset id
- Build a canonical string with sorted identifiers and observation ids.
- `linksetId = sha256(tenant + "|" + join(sorted(purls)) + "|" + join(sorted(observationIds)))`

## Linkset example (short)
```json
{
  "linksetId": "tenant-a:sha256:2222...",
  "observations": [
    "tenant-a:redhat:CVE-2025-0001:1",
    "tenant-a:nvd:CVE-2025-0001:3"
  ],
  "purls": ["pkg:rpm/redhat/openssl@1.1.1w-12"],
  "conflicts": [{ "type": "severity-mismatch" }]
}
```

## Idempotency and supersedes
- Same content hash results in a no-op.
- New content hash creates a new observation with supersedes set.
- Supersedes chains are append-only and acyclic.

## AOC error model
- ERR_AOC_001: forbidden derived fields detected.
- ERR_AOC_002: merge attempt detected.
- ERR_AOC_003: idempotency violation.
- ERR_AOC_004: missing provenance.
- ERR_AOC_005: signature or checksum mismatch.
- ERR_AOC_006: derived findings write attempt.
- ERR_AOC_007: schema violation.

## Downstream consumers
- Policy Engine applies rules and produces effective findings.
- Console and CLI render evidence panels and conflicts.
- Offline Kit bundles observations and linksets for air-gapped parity.

## Validation and tests
- Schema validators and guard libraries enforce AOC rules.
- Unit and integration tests validate idempotency and linkset hashes.
- CLI verifier and offline kit checks confirm determinism.

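
As an illustration of the kind of linkset hash check mentioned above, here is a minimal Python sketch of the deterministic linkset id formula. It is not the StellaOps implementation. The function name `linkset_id` is hypothetical. The comma used as the `join` separator, the UTF-8 encoding, and the `{tenant}:sha256:{hex}` output shape (inferred from the linkset example) are all assumptions.

```python
import hashlib
from typing import Sequence


def linkset_id(
    tenant: str,
    purls: Sequence[str],
    observation_ids: Sequence[str],
    sep: str = ",",  # assumption: the document does not specify the join separator
) -> str:
    """Hypothetical helper: derive a linkset id from sorted inputs."""
    # Canonical string: tenant | sorted purls | sorted observation ids.
    canonical = "|".join(
        [
            tenant,
            sep.join(sorted(purls)),
            sep.join(sorted(observation_ids)),
        ]
    )
    # Assumption: UTF-8 bytes, hex digest, tenant-prefixed id as in the example.
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{tenant}:sha256:{digest}"


if __name__ == "__main__":
    observations = [
        "tenant-a:redhat:CVE-2025-0001:1",
        "tenant-a:nvd:CVE-2025-0001:3",
    ]
    purls = ["pkg:rpm/redhat/openssl@1.1.1w-12"]

    first = linkset_id("tenant-a", purls, observations)
    # Reversing the input order must not change the id.
    second = linkset_id("tenant-a", purls, list(reversed(observations)))
    assert first == second
    print(first)
```

Sorting before hashing is what makes the id independent of the order in which observations arrive, so two ingestion runs over the same inputs should produce the same linkset id.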
## Related references
- docs/ingestion/aggregation-only-contract.md
- docs/aoc/aoc-guardrails.md
- docs2/ingestion/aoc-guardrails.md
- ingestion/backfill.md
- docs/advisories/aggregation.md
- docs/vex/aggregation.md