Aggregation-Only Contract Reference
The Aggregation-Only Contract (AOC) is the governing rule set that keeps StellaOps ingestion services deterministic, policy-neutral, and auditable. It applies to Concelier, Excititor, and any future collectors that write raw advisory or VEX documents.
1. Purpose and Scope
- Defines the canonical behaviour for `advisory_raw` and `vex_raw` collections and the linkset hints they may emit.
- Applies to every ingestion runtime (`StellaOps.Concelier.*`, `StellaOps.Excititor.*`), the Authority scopes that guard them, and the DevOps/QA surfaces that verify compliance.
- Complements the high-level architecture in Concelier and Authority enforcement documented in Authority Architecture.
- Paired guidance: see the guard-rail checkpoints in AOC Guardrails, the implementation reference in AOC Guard Library, and CLI usage that will land in `/docs/modules/cli/guides/` as part of Sprint 19 follow-up.
2. Philosophy and Goals
- Preserve upstream truth: ingestion only captures immutable raw facts plus provenance, never derived severity or policy decisions.
- Defer interpretation: Policy Engine and downstream overlays remain the sole writers of materialised findings, severity, consensus, or risk scores.
- Make every write explainable: provenance, signatures, and content hashes are required so operators can prove where each fact originated.
- Keep outputs reproducible: identical inputs must yield identical documents, hashes, and linksets across replays and air-gapped installs.
3. Contract Invariants
| # | Invariant | What it forbids or requires | Enforcement surfaces |
|---|---|---|---|
| 1 | No derived severity at ingest | Reject top-level keys such as `severity`, `cvss`, `effective_status`, `consensus_provider`, `risk_score`. Raw upstream CVSS remains inside `content.raw`. | Mongo schema validator, `AOCWriteGuard`, Roslyn analyzer, `stella aoc verify`. |
| 2 | No merges or opinionated dedupe | Each upstream document persists on its own; ingestion never collapses multiple vendors into one document. | Repository interceptors, unit/fixture suites. |
| 3 | Provenance is mandatory | `source.*`, `upstream.*`, and signature metadata must be present; missing provenance triggers `ERR_AOC_004`. | Schema validator, guard, CLI verifier. |
| 4 | Idempotent upserts | Writes keyed by `(vendor, upstream_id, content_hash)` either no-op or insert a new revision with `supersedes`. Duplicate hashes map to the same document. | Repository guard, storage unique index, CI smoke tests. |
| 5 | Append-only revisions | Updates create a new document with `supersedes` pointer; no in-place mutation of content. | Mongo schema (`supersedes` format), guard, data migration scripts. |
| 6 | Linkset only | Ingestion may compute link hints (purls, cpes, IDs) to accelerate joins, but must not transform or infer severity or policy. Observations now persist both canonical linksets (for indexed queries) and raw linksets (preserving upstream order/duplicates) so downstream policy can decide how to normalise. | Linkset builders reviewed via fixtures/analyzers; raw-vs-canonical parity covered by observation fixtures. |
| 7 | Policy-only effective findings | Only Policy Engine identities can write `effective_finding_*`; ingestion callers receive `ERR_AOC_006` if they attempt it. | Authority scopes, Policy Engine guard. |
| 8 | Schema safety | Unknown top-level keys reject with `ERR_AOC_007`; timestamps use ISO 8601 UTC strings; `tenant` is required. | Mongo validator, JSON schema tests. |
| 9 | Clock discipline | Collectors stamp `fetched_at` and `received_at` monotonically per batch to support reproducibility windows. | Collector contracts, QA fixtures. |
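
Invariants 1, 3, and 8 can be illustrated with a small guard function. This is a minimal sketch, not the actual `AOCWriteGuard`: the function name `check_document` and the allowed-key set (taken from the `advisory_raw` top-level fields) are assumptions, and the tenant check is omitted because the contract does not name an error code for it.

```python
# Hypothetical sketch of an AOC guard check; names are illustrative only.
FORBIDDEN_KEYS = {"severity", "cvss", "effective_status", "consensus_provider", "risk_score"}
# Assumed allow-list: the top-level fields of advisory_raw.
ALLOWED_KEYS = {"_id", "tenant", "source", "upstream", "content",
                "identifiers", "linkset", "supersedes"}


def check_document(doc):
    """Return the AOC error codes a candidate raw document would trigger."""
    codes = []
    keys = set(doc)
    # Invariant 1: derived fields must never appear at the top level.
    if keys & FORBIDDEN_KEYS:
        codes.append("ERR_AOC_001")
    # Invariant 8: anything else outside the allow-list is a schema violation.
    if keys - ALLOWED_KEYS - FORBIDDEN_KEYS:
        codes.append("ERR_AOC_007")
    # Invariant 3: source, upstream, and signature metadata are mandatory.
    upstream = doc.get("upstream") or {}
    if not doc.get("source") or not upstream or "signature" not in upstream:
        codes.append("ERR_AOC_004")
    return codes
```

An unsigned but otherwise complete document passes, because `upstream.signature` is present even when it only records `present: false`.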
4. Raw Schemas
4.1 advisory_raw
| Field | Type | Notes |
|---|---|---|
| `_id` | string | `advisory_raw:{source}:{upstream_id}:{revision}`; deterministic and tenant-scoped. |
| `tenant` | string | Required; injected by Authority middleware and asserted by schema validator. |
| `source.vendor` | string | Provider identifier (e.g., `redhat`, `osv`, `ghsa`). |
| `source.stream` | string | Connector stream name (`csaf`, `osv`, etc.). |
| `source.api` | string | Absolute URI of upstream document; stored for traceability. |
| `source.collector_version` | string | Semantic version of the collector. |
| `upstream.upstream_id` | string | Vendor- or ecosystem-provided identifier (CVE, GHSA, vendor ID). |
| `upstream.document_version` | string | Upstream issued timestamp or revision string. |
| `upstream.fetched_at` / `received_at` | string | ISO 8601 UTC timestamps recorded by the collector. |
| `upstream.content_hash` | string | `sha256:` digest of the raw payload used for idempotency. |
| `upstream.signature` | object | Required structure storing `present`, `format`, `key_id`, `sig`; even unsigned payloads set `present: false`. |
| `content.format` | string | Source format (`CSAF`, `OSV`, etc.). |
| `content.spec_version` | string | Upstream spec version when known. |
| `content.raw` | object | Full upstream payload, untouched except for transport normalisation. |
| `identifiers` | object | Upstream identifiers (`cve`, `ghsa`, `aliases`, etc.) captured as provided (trimmed, order preserved, duplicates allowed). |
| `linkset` | object | Join hints (see section 4.3). |
| `supersedes` | string or null | Points to previous revision of same upstream doc when content hash changes. |
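
The `_id` and `upstream.content_hash` conventions above can be sketched in a few lines. The helper names are hypothetical, the hex encoding after the `sha256:` prefix is an assumption (the schema only states the prefix), and `source` and `revision` are treated as opaque strings because their exact form is not specified here.

```python
import hashlib


def content_hash(raw_payload: bytes) -> str:
    """Digest of the untouched upstream bytes, prefixed as in the schema.

    Assumption: the digest is hex-encoded.
    """
    return "sha256:" + hashlib.sha256(raw_payload).hexdigest()


def advisory_raw_id(source: str, upstream_id: str, revision: str) -> str:
    """Build the deterministic advisory_raw:{source}:{upstream_id}:{revision} id."""
    return f"advisory_raw:{source}:{upstream_id}:{revision}"


def unsigned_signature() -> dict:
    """Signature block for a payload that arrived without a signature."""
    return {"present": False}
```

Hashing the payload bytes before any parsing or transformation keeps the digest stable across replays, which is what the reproducibility goal in section 2 depends on.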
4.2 vex_raw
| Field | Type | Notes |
|---|---|---|
| `_id` | string | `vex_raw:{source}:{upstream_id}:{revision}`. |
| `tenant` | string | Required; matches advisory collection requirements. |
| `source.*` | object | Same shape and requirements as `advisory_raw`. |
| `upstream.*` | object | Includes `document_version`, timestamps, `content_hash`, and `signature`. |
| `content.format` | string | Typically `CycloneDX-VEX` or `CSAF-VEX`. |
| `content.raw` | object | Entire upstream VEX payload. |
| `identifiers.statements` | array | Normalised statement summaries (IDs, PURLs, status, justification) to accelerate policy joins. |
| `linkset` | object | CVEs, GHSA IDs, and PURLs referenced in the document. |
| `supersedes` | string or null | Same convention as advisory documents. |
4.3 Linkset Fields
- `purls`: fully qualified Package URLs extracted from raw ranges or product nodes.
- `cpes`: Common Platform Enumerations when upstream docs provide them.
- `aliases`: Any alternate advisory identifiers present in the payload.
- `references`: Array of `{ type, url }` pairs pointing back to vendor advisories, patches, or exploits.
- `reconciled_from`: Provenance of linkset entries (JSON Pointer or field origin) to make automated checks auditable.

Canonicalisation rules:

- Package URLs are rendered in canonical form without qualifiers/subpaths (`pkg:type/namespace/name@version`).
- CPE values are normalised to the 2.3 binding (`cpe:2.3:part:vendor:product:version:*:*:*:*:*:*:*`).
- Connector mapping stages are responsible for the canonical form; ingestion trims whitespace but otherwise preserves the original order and duplicate entries so downstream policy can reason about upstream intent.
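
The Package URL rule can be sketched as follows. This is a simplified illustration with hypothetical function names: it only strips qualifiers and subpaths, and does not attempt the other normalisation steps defined by the Package URL specification (for example, case rules for the type).

```python
def canonical_purl(purl: str) -> str:
    """Drop the qualifiers (?...) and subpath (#...) from a Package URL."""
    purl = purl.strip()
    base = purl.split("#", 1)[0]  # the subpath follows '#'
    base = base.split("?", 1)[0]  # qualifiers follow '?'
    return base


def raw_linkset_purls(purls):
    """Ingestion-side handling: trim only, keeping order and duplicates."""
    return [p.strip() for p in purls]
```

For example, `canonical_purl("pkg:npm/%40angular/core@12.0.0?arch=x64#src")` yields `pkg:npm/%40angular/core@12.0.0`, while `raw_linkset_purls` leaves the same value untouched apart from surrounding whitespace.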
4.4 advisory_observations
advisory_observations is an immutable projection of the validated raw document used by Link‑Not‑Merge overlays. Fields mirror the JSON contract surfaced by StellaOps.Concelier.Models.Observations.AdvisoryObservation.
| Field | Type | Notes |
|---|---|---|
| `_id` | string | Deterministic observation id: `{tenant}:{source.vendor}:{upstreamId}:{revision}`. |
| `tenant` | string | Lower-case tenant identifier. |
| `source.vendor` / `source.stream` | string | Connector identity (e.g., `vendor/redhat`, `ecosystem/osv`). |
| `source.api` | string | Absolute URI the connector fetched from. |
| `source.collectorVersion` | string | Optional semantic version of the connector build. |
| `upstream.upstream_id` | string | Advisory identifier as issued by the provider (CVE, vendor ID, etc.). |
| `upstream.document_version` | string | Upstream revision/version string. |
| `upstream.fetchedAt` / `upstream.receivedAt` | datetime | UTC timestamps recorded by the connector. |
| `upstream.contentHash` | string | `sha256:` digest used for idempotency. |
| `upstream.signature` | object | `{present, format?, keyId?, signature?}` describing upstream signature material. |
| `content.format` / `content.specVersion` | string | Raw payload format metadata (CSAF, OSV, JSON, etc.). |
| `content.raw` | object | Full upstream document stored losslessly (Relaxed Extended JSON). |
| `content.metadata` | object | Optional connector-specific metadata (batch ids, hints). |
| `linkset.aliases` | array | Connector-supplied aliases (trimmed, order preserved, duplicates allowed). |
| `linkset.purls` | array | Connector-supplied PURLs (ingestion preserves order and duplicates). |
| `linkset.cpes` | array | Connector-supplied CPE URIs (trimmed, order preserved). |
| `linkset.references` | array | `{ type, url }` pairs (trimmed; ingestion preserves order). |
| `createdAt` | datetime | Timestamp when Concelier persisted the observation. |
| `attributes` | object | Optional provenance attributes keyed by connector. |
5. Error Model
| Code | Description | HTTP status | Surfaces |
|---|---|---|---|
| `ERR_AOC_001` | Forbidden field detected (severity, cvss, effective data). | 400 | Ingestion APIs, CLI verifier, CI guard. |
| `ERR_AOC_002` | Merge attempt detected (multiple upstream sources fused into one document). | 400 | Ingestion APIs, CLI verifier. |
| `ERR_AOC_003` | Idempotency violation (duplicate without supersedes pointer). | 409 | Repository guard, Mongo unique index, CLI verifier. |
| `ERR_AOC_004` | Missing provenance metadata (source, upstream, signature). | 422 | Schema validator, ingestion endpoints. |
| `ERR_AOC_005` | Signature or checksum mismatch. | 422 | Collector validation, CLI verifier. |
| `ERR_AOC_006` | Attempt to persist derived findings from ingestion context. | 403 | Policy engine guard, Authority scopes. |
| `ERR_AOC_007` | Unknown top-level fields (schema violation). | 400 | Mongo validator, CLI verifier. |
Consumers should map these codes to CLI exit codes and structured log events so automation can fail fast and produce actionable guidance.
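
One way a consumer might do that mapping is sketched below. The HTTP statuses come from the table above; the exit-code scheme (10 plus the numeric suffix) and the `aoc.violation` event name are purely illustrative assumptions, not documented CLI behaviour.

```python
# HTTP statuses as listed in the error model table.
HTTP_STATUS = {
    "ERR_AOC_001": 400,
    "ERR_AOC_002": 400,
    "ERR_AOC_003": 409,
    "ERR_AOC_004": 422,
    "ERR_AOC_005": 422,
    "ERR_AOC_006": 403,
    "ERR_AOC_007": 400,
}


def exit_code_for(code: str) -> int:
    """Hypothetical CLI exit code: 10 + the numeric suffix of the AOC code."""
    return 10 + int(code.rsplit("_", 1)[1])


def violation_event(code: str, tenant: str) -> dict:
    """Structured log event for a violation (field names are assumptions)."""
    return {
        "event": "aoc.violation",
        "violation_code": code,
        "http_status": HTTP_STATUS[code],
        "tenant": tenant,
    }
```

Keeping the mapping in one table means an unknown code raises a `KeyError` immediately instead of being silently reported as a generic failure.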
6. API and Tooling Interfaces
- Concelier ingestion (`StellaOps.Concelier.WebService`)
  - `POST /ingest/advisory`: accepts upstream payload metadata; server-side guard constructs and persists raw document.
  - `GET /advisories/raw/{id}` and filterable list endpoints expose raw documents for debugging and offline analysis.
  - `POST /aoc/verify`: runs guard checks over recent documents and returns summary totals plus first violations.
- Excititor ingestion (`StellaOps.Excititor.WebService`) mirrors the same surface for VEX documents.
- CLI workflows (`stella aoc verify`, `stella sources ingest --dry-run`) surface pre-flight verification; documentation will live in `/docs/modules/cli/guides/` alongside Sprint 19 CLI updates.
- Authority scopes: new `advisory:ingest`, `advisory:read`, `vex:ingest`, and `vex:read` scopes enforce least privilege; see Authority Architecture for scope grammar.
7. Idempotency and Supersedes Rules
- Compute `content_hash` before any transformation; use it with `(source.vendor, upstream.upstream_id)` to detect duplicates.
- If a document with the same hash already exists, skip the write and log a no-op.
- When a new hash arrives for an existing upstream document, insert a new record and set `supersedes` to the previous `_id`.
- Keep supersedes chains acyclic; collectors must resolve conflicts by rewinding before they insert.
- Expose idempotency counters via metrics (`ingestion_write_total{result=ok|noop}`) to catch regressions early.
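
The rules above can be sketched with an in-memory store. This is an illustration under stated assumptions, not the repository guard itself: the `RawStore` class, the integer revision counter, and the hex-encoded digest are all hypothetical choices.

```python
import hashlib
from collections import Counter


class RawStore:
    """Toy append-only store keyed by (vendor, upstream_id, content_hash)."""

    def __init__(self):
        self.docs = {}      # _id -> document
        self.by_hash = {}   # (vendor, upstream_id, content_hash) -> _id
        self.latest = {}    # (vendor, upstream_id) -> _id of newest revision
        self.metrics = Counter()  # stands in for ingestion_write_total{result}

    def upsert(self, vendor, upstream_id, raw_payload: bytes) -> str:
        # Hash the raw bytes before any transformation.
        digest = "sha256:" + hashlib.sha256(raw_payload).hexdigest()
        key = (vendor, upstream_id, digest)
        if key in self.by_hash:
            # Same hash already stored: skip the write and record a no-op.
            self.metrics["noop"] += 1
            return self.by_hash[key]
        # New hash: append a revision that points at the previous one.
        revision = sum(1 for k in self.by_hash if k[:2] == key[:2]) + 1
        doc_id = f"advisory_raw:{vendor}:{upstream_id}:{revision}"
        self.docs[doc_id] = {
            "_id": doc_id,
            "upstream": {"upstream_id": upstream_id, "content_hash": digest},
            "supersedes": self.latest.get(key[:2]),  # None for the first revision
        }
        self.by_hash[key] = doc_id
        self.latest[key[:2]] = doc_id
        self.metrics["ok"] += 1
        return doc_id
```

Because a new revision only ever points at an already stored `_id`, the supersedes chain in this sketch cannot form a cycle.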
8. Migration Playbook
- Freeze ingestion writes except for raw pass-through paths while deploying schema validators.
- Snapshot existing collections to `_backup_*` for rollback safety.
- Strip forbidden fields from historical documents into a temporary `advisory_view_legacy` used only during transition.
- Enable Mongo JSON schema validators for `advisory_raw` and `vex_raw`.
- Run collectors in `--dry-run` to confirm only allowed keys appear; fix violations before lifting the freeze.
- Point Policy Engine to consume exclusively from raw collections and compute derived outputs downstream.
- Delete legacy normalisation paths from ingestion code and enable runtime guards plus CI linting.
- Roll forward CLI, Console, and dashboards so operators can monitor AOC status end-to-end.
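
The field-stripping step of the playbook might look like the sketch below. The function name and the shape of the `advisory_view_legacy` record (the stripped fields plus the originating `_id`) are assumptions for illustration; the forbidden keys are those listed under invariant 1.

```python
FORBIDDEN_KEYS = ("severity", "cvss", "effective_status", "consensus_provider", "risk_score")


def split_legacy(doc: dict):
    """Split a historical document into an AOC-clean copy and a legacy record.

    Returns (cleaned, legacy); legacy is empty when nothing had to be stripped.
    """
    legacy = {k: doc[k] for k in FORBIDDEN_KEYS if k in doc}
    cleaned = {k: v for k, v in doc.items() if k not in FORBIDDEN_KEYS}
    if legacy:
        # Keep a pointer back to the source document for the transition view.
        legacy["_id"] = doc.get("_id")
    return cleaned, legacy
```

The input document is never mutated, so the `_backup_*` snapshot and the cleaned copy can be compared during rollback checks.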
9. Observability and Diagnostics
- Metrics: `ingestion_write_total{result=ok|reject}`, `aoc_violation_total{code}`, `ingestion_signature_verified_total{result}`, `ingestion_latency_seconds`, `advisory_revision_count`.
- Traces: spans `ingest.fetch`, `ingest.transform`, `ingest.write`, and `aoc.guard` with correlation IDs shared across workers.
- Logs: structured entries must include `tenant`, `source.vendor`, `upstream.upstream_id`, `content_hash`, and `violation_code` when applicable.
- Dashboards: DevOps should add panels for violation counts, signature failures, supersedes growth, and CLI verifier outcomes for each tenant.
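
A structured log entry carrying the required fields could be built as in this sketch. The function name and the JSON-lines encoding are assumptions; only the field names come from the requirement above.

```python
import json


def log_entry(doc: dict, violation_code=None) -> str:
    """Serialise the mandatory AOC log fields for one raw document."""
    entry = {
        "tenant": doc["tenant"],
        "source.vendor": doc["source"]["vendor"],
        "upstream.upstream_id": doc["upstream"]["upstream_id"],
        "content_hash": doc["upstream"]["content_hash"],
    }
    # violation_code is only included when a guard rejected the document.
    if violation_code is not None:
        entry["violation_code"] = violation_code
    # Sorted keys keep the output byte-stable across replays.
    return json.dumps(entry, sort_keys=True)
```

A missing mandatory field raises `KeyError` rather than emitting a partial entry, which keeps log gaps visible during testing.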
10. Security and Tenancy Checklist
- Enforce Authority scopes (`advisory:ingest`, `vex:ingest`, `advisory:read`, `vex:read`) and require tenant claims on every request.
- Maintain pinned trust stores for signature verification; capture verification result in metrics and logs.
- Ensure collectors never log secrets or raw authentication headers; redact tokens before persistence.
- Validate that Policy Engine remains the only identity with permission to write `effective_finding_*` documents.
- Verify offline bundles include the raw collections, guard configuration, and verifier binaries so air-gapped installs can audit parity.
- Document operator steps for recovering from violations, including rollback to superseded revisions and re-running policy evaluation.
11. Compliance Checklist
- Deterministic guard enabled in Concelier and Excititor repositories.
- Mongo validators deployed for `advisory_raw` and `vex_raw`.
- Authority scopes and tenant enforcement verified via integration tests.
- CLI and CI pipelines run `stella aoc verify` against seeded snapshots.
- Observability feeds (metrics, logs, traces) wired into dashboards with alerts.
- Offline kit instructions updated to bundle validators and verifier tooling.
- Security review recorded covering ingestion, tenancy, and rollback procedures.
Last updated: 2025-10-27 (Sprint 19).