Aggregation-Only Contract Reference
The Aggregation-Only Contract (AOC) is the governing rule set that keeps StellaOps ingestion services deterministic, policy-neutral, and auditable. It applies to Concelier, Excititor, and any future collectors that write raw advisory or VEX documents.
1. Purpose and Scope
- Defines the canonical behaviour for `advisory_raw` and `vex_raw` collections and the linkset hints they may emit.
- Applies to every ingestion runtime (`StellaOps.Concelier.*`, `StellaOps.Excititor.*`), the Authority scopes that guard them, and the DevOps/QA surfaces that verify compliance.
- Complements the high-level architecture in Concelier and Authority enforcement documented in Authority Architecture.
- Paired guidance: see the guard-rail checkpoints in AOC Guardrails and CLI usage that will land in `/docs/cli/` as part of Sprint 19 follow-up.
2. Philosophy and Goals
- Preserve upstream truth: ingestion only captures immutable raw facts plus provenance, never derived severity or policy decisions.
 - Defer interpretation: Policy Engine and downstream overlays remain the sole writers of materialised findings, severity, consensus, or risk scores.
 - Make every write explainable: provenance, signatures, and content hashes are required so operators can prove where each fact originated.
 - Keep outputs reproducible: identical inputs must yield identical documents, hashes, and linksets across replays and air-gapped installs.
 
3. Contract Invariants
| # | Invariant | What it forbids or requires | Enforcement surfaces |
|---|---|---|---|
| 1 | No derived severity at ingest | Reject top-level keys such as `severity`, `cvss`, `effective_status`, `consensus_provider`, `risk_score`. Raw upstream CVSS remains inside `content.raw`. | Mongo schema validator, `AOCWriteGuard`, Roslyn analyzer, `stella aoc verify`. |
| 2 | No merges or opinionated dedupe | Each upstream document persists on its own; ingestion never collapses multiple vendors into one document. | Repository interceptors, unit/fixture suites. |
| 3 | Provenance is mandatory | `source.*`, `upstream.*`, and signature metadata must be present; missing provenance triggers `ERR_AOC_004`. | Schema validator, guard, CLI verifier. |
| 4 | Idempotent upserts | Writes keyed by `(vendor, upstream_id, content_hash)` either no-op or insert a new revision with `supersedes`. Duplicate hashes map to the same document. | Repository guard, storage unique index, CI smoke tests. |
| 5 | Append-only revisions | Updates create a new document with a `supersedes` pointer; no in-place mutation of content. | Mongo schema (`supersedes` format), guard, data migration scripts. |
| 6 | Linkset only | Ingestion may compute link hints (purls, cpes, IDs) to accelerate joins, but must not transform or infer severity or policy. | Linkset builders reviewed via fixtures and analyzers. |
| 7 | Policy-only effective findings | Only Policy Engine identities can write `effective_finding_*`; ingestion callers receive `ERR_AOC_006` if they attempt it. | Authority scopes, Policy Engine guard. |
| 8 | Schema safety | Unknown top-level keys reject with `ERR_AOC_007`; timestamps use ISO 8601 UTC strings; `tenant` is required. | Mongo validator, JSON schema tests. |
| 9 | Clock discipline | Collectors stamp `fetched_at` and `received_at` monotonically per batch to support reproducibility windows. | Collector contracts, QA fixtures. |
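
To make invariants 1, 3, and 8 concrete, here is a minimal Python sketch of a top-level key check. It is not the actual `AOCWriteGuard`: the function name `check_top_level`, the allow-list (copied from the `advisory_raw` field table in section 4.1), and the decision to report forbidden keys as `ERR_AOC_001` rather than `ERR_AOC_007` are all assumptions made for illustration.

```python
# Sketch only: key lists come from invariant 1 and the advisory_raw field table;
# the function name and the error code chosen per case are assumptions.
FORBIDDEN_KEYS = {"severity", "cvss", "effective_status", "consensus_provider", "risk_score"}
ALLOWED_KEYS = {"_id", "tenant", "source", "upstream", "content",
                "identifiers", "linkset", "supersedes"}


def check_top_level(doc: dict) -> list:
    """Return the AOC error codes triggered by a candidate raw document."""
    codes = []
    keys = set(doc)
    if keys & FORBIDDEN_KEYS:
        codes.append("ERR_AOC_001")  # derived field at ingest (invariant 1)
    if keys - ALLOWED_KEYS - FORBIDDEN_KEYS:
        codes.append("ERR_AOC_007")  # unknown top-level key (invariant 8)
    upstream = doc.get("upstream")
    has_signature = isinstance(upstream, dict) and "signature" in upstream
    if "source" not in doc or not has_signature:
        codes.append("ERR_AOC_004")  # missing provenance (invariant 3)
    return codes
```

Note that the check only inspects top-level keys, so an upstream CVSS value nested under `content.raw` passes untouched, as invariant 1 requires.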
4. Raw Schemas
4.1 advisory_raw
| Field | Type | Notes |
|---|---|---|
| `_id` | string | `advisory_raw:{source}:{upstream_id}:{revision}`; deterministic and tenant-scoped. |
| `tenant` | string | Required; injected by Authority middleware and asserted by schema validator. |
| `source.vendor` | string | Provider identifier (e.g., `redhat`, `osv`, `ghsa`). |
| `source.stream` | string | Connector stream name (`csaf`, `osv`, etc.). |
| `source.api` | string | Absolute URI of upstream document; stored for traceability. |
| `source.collector_version` | string | Semantic version of the collector. |
| `upstream.upstream_id` | string | Vendor- or ecosystem-provided identifier (CVE, GHSA, vendor ID). |
| `upstream.document_version` | string | Upstream issued timestamp or revision string. |
| `upstream.fetched_at` / `received_at` | string | ISO 8601 UTC timestamps recorded by the collector. |
| `upstream.content_hash` | string | `sha256:` digest of the raw payload used for idempotency. |
| `upstream.signature` | object | Required structure storing `present`, `format`, `key_id`, `sig`; even unsigned payloads set `present: false`. |
| `content.format` | string | Source format (`CSAF`, `OSV`, etc.). |
| `content.spec_version` | string | Upstream spec version when known. |
| `content.raw` | object | Full upstream payload, untouched except for transport normalisation. |
| `identifiers` | object | Upstream identifiers (`cve`, `ghsa`, `aliases`, etc.) captured as provided (trimmed, order preserved, duplicates allowed). |
| `linkset` | object | Join hints (see section 4.3). |
| `supersedes` | string or null | Points to previous revision of same upstream doc when content hash changes. |
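
As an illustration of the table above, the following Python sketch assembles a document with the `advisory_raw` shape. The helper names `content_hash` and `build_advisory_raw` are hypothetical, and several details are assumptions rather than documented behaviour: the digest is hex-encoded, `{source}` in `_id` is taken to mean `source.vendor`, `received_at` is nested under `upstream`, the revision is supplied by the caller, and unsigned payloads store `null` for the remaining signature fields.

```python
import hashlib
import json


def content_hash(raw_payload: bytes) -> str:
    """sha256 digest of the untouched upstream bytes (hex encoding is assumed)."""
    return "sha256:" + hashlib.sha256(raw_payload).hexdigest()


def build_advisory_raw(tenant, vendor, stream, api, upstream_id, revision,
                       document_version, content_format, raw_payload,
                       fetched_at, received_at):
    """Assemble a dict shaped like the advisory_raw table (illustrative helper)."""
    return {
        "_id": f"advisory_raw:{vendor}:{upstream_id}:{revision}",
        "tenant": tenant,
        "source": {"vendor": vendor, "stream": stream, "api": api,
                   "collector_version": "0.0.0"},  # placeholder version
        "upstream": {
            "upstream_id": upstream_id,
            "document_version": document_version,
            "fetched_at": fetched_at,
            "received_at": received_at,
            "content_hash": content_hash(raw_payload),
            # Unsigned payloads still carry the structure with present=False.
            "signature": {"present": False, "format": None,
                          "key_id": None, "sig": None},
        },
        "content": {"format": content_format, "raw": json.loads(raw_payload)},
        "identifiers": {},
        "linkset": {},
        "supersedes": None,  # first revision has no predecessor
    }
```

The hash is computed over the raw bytes before they are parsed, so re-serialising `content.raw` cannot change the idempotency key.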
4.2 vex_raw
| Field | Type | Notes |
|---|---|---|
| `_id` | string | `vex_raw:{source}:{upstream_id}:{revision}`. |
| `tenant` | string | Required; matches advisory collection requirements. |
| `source.*` | object | Same shape and requirements as `advisory_raw`. |
| `upstream.*` | object | Includes `document_version`, timestamps, `content_hash`, and `signature`. |
| `content.format` | string | Typically `CycloneDX-VEX` or `CSAF-VEX`. |
| `content.raw` | object | Entire upstream VEX payload. |
| `identifiers.statements` | array | Normalised statement summaries (IDs, PURLs, status, justification) to accelerate policy joins. |
| `linkset` | object | CVEs, GHSA IDs, and PURLs referenced in the document. |
| `supersedes` | string or null | Same convention as advisory documents. |
4.3 Linkset Fields
- `purls`: fully qualified Package URLs extracted from raw ranges or product nodes.
- `cpes`: Common Platform Enumerations when upstream docs provide them.
- `aliases`: Any alternate advisory identifiers present in the payload.
- `references`: Array of `{ type, url }` pairs pointing back to vendor advisories, patches, or exploits.
- `reconciled_from`: Provenance of linkset entries (JSON Pointer or field origin) to make automated checks auditable.
Canonicalisation rules:
- Package URLs are rendered in canonical form without qualifiers/subpaths (`pkg:type/namespace/name@version`).
- CPE values are normalised to the 2.3 binding (`cpe:2.3:part:vendor:product:version:*:*:*:*:*:*:*`).
- Connector mapping stages are responsible for the canonical form; ingestion trims whitespace but otherwise preserves the original order and duplicate entries so downstream policy can reason about upstream intent.
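
The split of responsibilities in these rules can be sketched in Python as two small helpers. Both function names are hypothetical. The first is the kind of step a connector mapping stage might apply to drop qualifiers and subpaths; it assumes a well-formed Package URL in which any literal `?` or `#` inside a component is percent-encoded. The second mirrors what ingestion is allowed to do: trim whitespace and nothing else.

```python
def strip_purl_qualifiers(purl: str) -> str:
    """Connector-side: drop the '#subpath' and '?qualifiers' parts of a PURL."""
    without_subpath = purl.split("#", 1)[0]
    return without_subpath.split("?", 1)[0]


def ingest_linkset_values(values):
    """Ingestion-side: trim whitespace only; keep order and duplicates."""
    return [value.strip() for value in values]
```

For example, `strip_purl_qualifiers("pkg:npm/%40angular/core@12.0.0?arch=x64#src")` yields `pkg:npm/%40angular/core@12.0.0`, while `ingest_linkset_values` deliberately leaves repeated entries in place.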
 
4.4 advisory_observations
advisory_observations is an immutable projection of the validated raw document used by Link‑Not‑Merge overlays. Fields mirror the JSON contract surfaced by StellaOps.Concelier.Models.Observations.AdvisoryObservation.
| Field | Type | Notes |
|---|---|---|
| `_id` | string | Deterministic observation id: `{tenant}:{source.vendor}:{upstreamId}:{revision}`. |
| `tenant` | string | Lower-case tenant identifier. |
| `source.vendor` / `source.stream` | string | Connector identity (e.g., `vendor/redhat`, `ecosystem/osv`). |
| `source.api` | string | Absolute URI the connector fetched from. |
| `source.collectorVersion` | string | Optional semantic version of the connector build. |
| `upstream.upstream_id` | string | Advisory identifier as issued by the provider (CVE, vendor ID, etc.). |
| `upstream.document_version` | string | Upstream revision/version string. |
| `upstream.fetchedAt` / `upstream.receivedAt` | datetime | UTC timestamps recorded by the connector. |
| `upstream.contentHash` | string | `sha256:` digest used for idempotency. |
| `upstream.signature` | object | `{present, format?, keyId?, signature?}` describing upstream signature material. |
| `content.format` / `content.specVersion` | string | Raw payload format metadata (CSAF, OSV, JSON, etc.). |
| `content.raw` | object | Full upstream document stored losslessly (Relaxed Extended JSON). |
| `content.metadata` | object | Optional connector-specific metadata (batch ids, hints). |
| `linkset.aliases` | array | Connector-supplied aliases (trimmed, order preserved, duplicates allowed). |
| `linkset.purls` | array | Connector-supplied PURLs (ingestion preserves order and duplicates). |
| `linkset.cpes` | array | Connector-supplied CPE URIs (trimmed, order preserved). |
| `linkset.references` | array | `{ type, url }` pairs (trimmed; ingestion preserves order). |
| `createdAt` | datetime | Timestamp when Concelier persisted the observation. |
| `attributes` | object | Optional provenance attributes keyed by connector. |
5. Error Model
| Code | Description | HTTP status | Surfaces |
|---|---|---|---|
| `ERR_AOC_001` | Forbidden field detected (severity, cvss, effective data). | 400 | Ingestion APIs, CLI verifier, CI guard. |
| `ERR_AOC_002` | Merge attempt detected (multiple upstream sources fused into one document). | 400 | Ingestion APIs, CLI verifier. |
| `ERR_AOC_003` | Idempotency violation (duplicate without supersedes pointer). | 409 | Repository guard, Mongo unique index, CLI verifier. |
| `ERR_AOC_004` | Missing provenance metadata (source, upstream, signature). | 422 | Schema validator, ingestion endpoints. |
| `ERR_AOC_005` | Signature or checksum mismatch. | 422 | Collector validation, CLI verifier. |
| `ERR_AOC_006` | Attempt to persist derived findings from ingestion context. | 403 | Policy engine guard, Authority scopes. |
| `ERR_AOC_007` | Unknown top-level fields (schema violation). | 400 | Mongo validator, CLI verifier. |
Consumers should map these codes to CLI exit codes and structured log events so automation can fail fast and produce actionable guidance.
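
One way a consumer might hold this mapping is sketched below in Python. The HTTP statuses are copied from the table above. The exit-code scheme (0 when clean, otherwise 10 plus the lowest `ERR_AOC` number seen) and the function name `exit_code_for` are purely hypothetical; this document does not define the CLI's actual exit codes.

```python
# HTTP statuses as listed in the error model table.
HTTP_STATUS = {
    "ERR_AOC_001": 400,
    "ERR_AOC_002": 400,
    "ERR_AOC_003": 409,
    "ERR_AOC_004": 422,
    "ERR_AOC_005": 422,
    "ERR_AOC_006": 403,
    "ERR_AOC_007": 400,
}


def exit_code_for(violations) -> int:
    """Hypothetical mapping: 0 if clean, else 10 + the lowest ERR_AOC number."""
    if not violations:
        return 0
    numbers = [int(code.rsplit("_", 1)[1]) for code in violations]
    return 10 + min(numbers)
```

Keeping the table in one dictionary lets API handlers and the verifier share a single source for status codes, which is the fail-fast behaviour the paragraph above asks for.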
6. API and Tooling Interfaces
- Concelier ingestion (`StellaOps.Concelier.WebService`)
  - `POST /ingest/advisory`: accepts upstream payload metadata; server-side guard constructs and persists raw document.
  - `GET /advisories/raw/{id}` and filterable list endpoints expose raw documents for debugging and offline analysis.
  - `POST /aoc/verify`: runs guard checks over recent documents and returns summary totals plus first violations.
- Excititor ingestion (`StellaOps.Excititor.WebService`) mirrors the same surface for VEX documents.
- CLI workflows (`stella aoc verify`, `stella sources ingest --dry-run`) surface pre-flight verification; documentation will live in `/docs/cli/` alongside Sprint 19 CLI updates.
- Authority scopes: new `advisory:ingest`, `advisory:read`, `vex:ingest`, and `vex:read` scopes enforce least privilege; see Authority Architecture for scope grammar.
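
The least-privilege idea behind these scopes can be sketched as a lookup from route to required scope. The route-to-scope pairing below is an assumption inferred from the scope names; the authoritative grammar lives in Authority Architecture. The `authorize` function is hypothetical, and the scope required for `POST /aoc/verify` is not stated here, so that route is left out.

```python
# Assumed pairing of Concelier routes to scopes; not taken from Authority docs.
REQUIRED_SCOPE = {
    ("POST", "/ingest/advisory"): "advisory:ingest",
    ("GET", "/advisories/raw/{id}"): "advisory:read",
}


def authorize(method, route, token_scopes, tenant):
    """Return (allowed, reason) for a request; tenant claim is always required."""
    if not tenant:
        return False, "missing tenant claim"
    required = REQUIRED_SCOPE.get((method, route))
    if required is None:
        return False, "unknown route"
    if required not in token_scopes:
        return False, f"missing scope {required}"
    return True, "ok"
```

A token holding only `advisory:read` can fetch raw documents but cannot ingest, which keeps debugging identities from writing to the raw collections.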
7. Idempotency and Supersedes Rules
- Compute `content_hash` before any transformation; use it with `(source.vendor, upstream.upstream_id)` to detect duplicates.
- If a document with the same hash already exists, skip the write and log a no-op.
- When a new hash arrives for an existing upstream document, insert a new record and set `supersedes` to the previous `_id`.
- Keep supersedes chains acyclic; collectors must resolve conflicts by rewinding before they insert.
- Expose idempotency counters via metrics (`ingestion_write_total{result=ok|noop}`) to catch regressions early.
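
The rules above can be sketched with an in-memory store in Python. `RawStore` is a hypothetical stand-in for the real repository and unique index, the integer revision counter is an assumption (the format of `{revision}` is not specified here), and the `counters` dictionary merely imitates the `ingestion_write_total` metric.

```python
import hashlib


class RawStore:
    """In-memory stand-in for the raw collection (illustrative only)."""

    def __init__(self):
        self.docs = {}     # _id -> document
        self.latest = {}   # (vendor, upstream_id) -> (_id, revision) of newest doc
        self.seen = set()  # (vendor, upstream_id, content_hash) uniqueness key
        self.counters = {"ok": 0, "noop": 0}

    def upsert(self, vendor, upstream_id, raw_payload: bytes):
        # Hash the untouched payload before any transformation.
        digest = "sha256:" + hashlib.sha256(raw_payload).hexdigest()
        if (vendor, upstream_id, digest) in self.seen:
            self.counters["noop"] += 1  # same hash already stored: skip the write
            return None
        previous_id, previous_rev = self.latest.get((vendor, upstream_id), (None, 0))
        revision = previous_rev + 1
        doc_id = f"advisory_raw:{vendor}:{upstream_id}:{revision}"
        self.docs[doc_id] = {
            "_id": doc_id,
            "source": {"vendor": vendor},
            "upstream": {"upstream_id": upstream_id, "content_hash": digest},
            "supersedes": previous_id,  # None for the first revision
        }
        self.latest[(vendor, upstream_id)] = (doc_id, revision)
        self.seen.add((vendor, upstream_id, digest))
        self.counters["ok"] += 1
        return doc_id
```

Because each new document only ever points at the revision that was newest when it was inserted, the `supersedes` chain in this sketch stays acyclic by construction. A payload that reverts to a previously seen hash is treated as a no-op here, following the statement in invariant 4 that duplicate hashes map to the same document.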
8. Migration Playbook
- Freeze ingestion writes except for raw pass-through paths while deploying schema validators.
- Snapshot existing collections to `_backup_*` for rollback safety.
- Strip forbidden fields from historical documents into a temporary `advisory_view_legacy` used only during transition.
- Enable Mongo JSON schema validators for `advisory_raw` and `vex_raw`.
- Run collectors in `--dry-run` to confirm only allowed keys appear; fix violations before lifting the freeze.
- Point Policy Engine to consume exclusively from raw collections and compute derived outputs downstream.
- Delete legacy normalisation paths from ingestion code and enable runtime guards plus CI linting.
- Roll forward CLI, Console, and dashboards so operators can monitor AOC status end-to-end.
 
9. Observability and Diagnostics
- Metrics: `ingestion_write_total{result=ok|reject}`, `aoc_violation_total{code}`, `ingestion_signature_verified_total{result}`, `ingestion_latency_seconds`, `advisory_revision_count`.
- Traces: spans `ingest.fetch`, `ingest.transform`, `ingest.write`, and `aoc.guard` with correlation IDs shared across workers.
- Logs: structured entries must include `tenant`, `source.vendor`, `upstream.upstream_id`, `content_hash`, and `violation_code` when applicable.
- Dashboards: DevOps should add panels for violation counts, signature failures, supersedes growth, and CLI verifier outcomes for each tenant.
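
A structured log entry carrying the required fields might look like the Python sketch below. The helper name `aoc_log_entry` is hypothetical, and emitting the dotted names as flat JSON keys (rather than nested objects) is an assumption about the log schema.

```python
import json


def aoc_log_entry(tenant, vendor, upstream_id, content_hash, violation_code=None):
    """Serialise the required log fields; violation_code only when applicable."""
    entry = {
        "tenant": tenant,
        "source.vendor": vendor,
        "upstream.upstream_id": upstream_id,
        "content_hash": content_hash,
    }
    if violation_code is not None:
        entry["violation_code"] = violation_code
    # Sorted keys keep the output byte-stable across runs.
    return json.dumps(entry, sort_keys=True)
```

Sorting the keys keeps identical inputs producing identical log lines, in line with the reproducibility goal in section 2.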
 
10. Security and Tenancy Checklist
- Enforce Authority scopes (`advisory:ingest`, `vex:ingest`, `advisory:read`, `vex:read`) and require tenant claims on every request.
- Maintain pinned trust stores for signature verification; capture verification result in metrics and logs.
- Ensure collectors never log secrets or raw authentication headers; redact tokens before persistence.
- Validate that Policy Engine remains the only identity with permission to write `effective_finding_*` documents.
- Verify offline bundles include the raw collections, guard configuration, and verifier binaries so air-gapped installs can audit parity.
- Document operator steps for recovering from violations, including rollback to superseded revisions and re-running policy evaluation.
 
11. Compliance Checklist
- Deterministic guard enabled in Concelier and Excititor repositories.
- Mongo validators deployed for `advisory_raw` and `vex_raw`.
- Authority scopes and tenant enforcement verified via integration tests.
- CLI and CI pipelines run `stella aoc verify` against seeded snapshots.
- Observability feeds (metrics, logs, traces) wired into dashboards with alerts.
- Offline kit instructions updated to bundle validators and verifier tooling.
- Security review recorded covering ingestion, tenancy, and rollback procedures.
 
Last updated: 2025-10-27 (Sprint 19).