Aggregation-Only Contract Reference

The Aggregation-Only Contract (AOC) is the governing rule set that keeps StellaOps ingestion services deterministic, policy-neutral, and auditable. It applies to Concelier, Excititor, and any future collectors that write raw advisory or VEX documents.

1. Purpose and Scope

  • Defines the canonical behaviour for advisory_raw and vex_raw collections and the linkset hints they may emit.
  • Applies to every ingestion runtime (StellaOps.Concelier.*, StellaOps.Excititor.*), the Authority scopes that guard them, and the DevOps/QA surfaces that verify compliance.
  • Complements the high-level architecture described in the Concelier documentation and the Authority enforcement documented in Authority Architecture.
  • Paired guidance: see the guard-rail checkpoints in AOC Guardrails and the CLI usage that will land in /docs/cli/ as part of the Sprint 19 follow-up.

2. Philosophy and Goals

  • Preserve upstream truth: ingestion only captures immutable raw facts plus provenance, never derived severity or policy decisions.
  • Defer interpretation: Policy Engine and downstream overlays remain the sole writers of materialised findings, severity, consensus, or risk scores.
  • Make every write explainable: provenance, signatures, and content hashes are required so operators can prove where each fact originated.
  • Keep outputs reproducible: identical inputs must yield identical documents, hashes, and linksets across replays and air-gapped installs.

3. Contract Invariants

| # | Invariant | What it forbids or requires | Enforcement surfaces |
|---|---|---|---|
| 1 | No derived severity at ingest | Reject top-level keys such as severity, cvss, effective_status, consensus_provider, risk_score. Raw upstream CVSS remains inside content.raw. | Mongo schema validator, AOCWriteGuard, Roslyn analyzer, stella aoc verify. |
| 2 | No merges or opinionated dedupe | Each upstream document persists on its own; ingestion never collapses multiple vendors into one document. | Repository interceptors, unit/fixture suites. |
| 3 | Provenance is mandatory | source.*, upstream.*, and signature metadata must be present; missing provenance triggers ERR_AOC_004. | Schema validator, guard, CLI verifier. |
| 4 | Idempotent upserts | Writes keyed by (vendor, upstream_id, content_hash) either no-op or insert a new revision with supersedes. Duplicate hashes map to the same document. | Repository guard, storage unique index, CI smoke tests. |
| 5 | Append-only revisions | Updates create a new document with supersedes pointer; no in-place mutation of content. | Mongo schema (supersedes format), guard, data migration scripts. |
| 6 | Linkset only | Ingestion may compute link hints (purls, cpes, IDs) to accelerate joins, but must not transform or infer severity or policy. | Linkset builders reviewed via fixtures and analyzers. |
| 7 | Policy-only effective findings | Only Policy Engine identities can write effective_finding_*; ingestion callers receive ERR_AOC_006 if they attempt it. | Authority scopes, Policy Engine guard. |
| 8 | Schema safety | Unknown top-level keys reject with ERR_AOC_007; timestamps use ISO 8601 UTC strings; tenant is required. | Mongo validator, JSON schema tests. |
| 9 | Clock discipline | Collectors stamp fetched_at and received_at monotonically per batch to support reproducibility windows. | Collector contracts, QA fixtures. |
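
Several of the invariants above can be checked mechanically before a write reaches storage. The following is a minimal Python sketch of such a check, not the actual AOCWriteGuard. The function name `check_raw_document`, the allowed-key set (assumed from the advisory_raw field table in section 4.1), and the order of the checks are illustrative assumptions. The forbidden-key set contains only the examples named in invariant 1 and may not be exhaustive.

```python
# Hypothetical guard sketch; not the StellaOps AOCWriteGuard implementation.
FORBIDDEN_KEYS = {"severity", "cvss", "effective_status", "consensus_provider", "risk_score"}
# Assumed from the advisory_raw field table in section 4.1.
ALLOWED_KEYS = {"_id", "tenant", "source", "upstream", "content",
                "identifiers", "linkset", "supersedes"}


def check_raw_document(doc: dict) -> list:
    """Return the AOC error codes a candidate raw document would trigger."""
    violations = []
    keys = set(doc)
    # Invariant 1: no derived severity at ingest.
    if keys & FORBIDDEN_KEYS:
        violations.append("ERR_AOC_001")
    # Invariant 3: provenance (source, upstream, signature) is mandatory.
    upstream = doc.get("upstream") or {}
    if not doc.get("source") or not upstream or "signature" not in upstream:
        violations.append("ERR_AOC_004")
    # Invariant 8: unknown top-level keys are rejected.
    if keys - ALLOWED_KEYS - FORBIDDEN_KEYS:
        violations.append("ERR_AOC_007")
    return violations


doc = {
    "tenant": "acme",
    "source": {"vendor": "osv"},
    "upstream": {"signature": {"present": False}},
    "severity": "high",   # forbidden derived field
    "notes": "x",         # unknown top-level key
}
print(check_raw_document(doc))  # ['ERR_AOC_001', 'ERR_AOC_007']
```

Returning every violation rather than stopping at the first one mirrors the way the verifier is described as reporting summary totals plus first violations.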

4. Raw Schemas

4.1 advisory_raw

| Field | Type | Notes |
|---|---|---|
| _id | string | advisory_raw:{source}:{upstream_id}:{revision}; deterministic and tenant-scoped. |
| tenant | string | Required; injected by Authority middleware and asserted by schema validator. |
| source.vendor | string | Provider identifier (e.g., redhat, osv, ghsa). |
| source.stream | string | Connector stream name (csaf, osv, etc.). |
| source.api | string | Absolute URI of upstream document; stored for traceability. |
| source.collector_version | string | Semantic version of the collector. |
| upstream.upstream_id | string | Vendor- or ecosystem-provided identifier (CVE, GHSA, vendor ID). |
| upstream.document_version | string | Upstream issued timestamp or revision string. |
| upstream.fetched_at / received_at | string | ISO 8601 UTC timestamps recorded by the collector. |
| upstream.content_hash | string | sha256: digest of the raw payload used for idempotency. |
| upstream.signature | object | Required structure storing present, format, key_id, sig; even unsigned payloads set present: false. |
| content.format | string | Source format (CSAF, OSV, etc.). |
| content.spec_version | string | Upstream spec version when known. |
| content.raw | object | Full upstream payload, untouched except for transport normalisation. |
| identifiers | object | Normalised identifiers (cve, ghsa, aliases, etc.) derived losslessly from raw content. |
| linkset | object | Join hints (see section 4.3). |
| supersedes | string or null | Points to previous revision of same upstream doc when content hash changes. |
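
To make the field table concrete, here is a rough Python sketch that assembles an advisory_raw-shaped document. The helper name `build_advisory_raw`, the reading of `{source}` in the _id pattern as source.vendor, the caller-supplied revision number, and all sample values (tenant, URL, identifier) are placeholders assumed for illustration. The optional content.spec_version field is omitted.

```python
import hashlib
import json


def build_advisory_raw(tenant, source, upstream_id, document_version,
                       fetched_at, received_at, content_format, raw_bytes,
                       revision, supersedes=None):
    """Assemble an advisory_raw-shaped dict from an upstream payload (illustrative)."""
    return {
        # Assumption: {source} in the _id pattern refers to source.vendor.
        "_id": f"advisory_raw:{source['vendor']}:{upstream_id}:{revision}",
        "tenant": tenant,
        "source": source,  # vendor, stream, api, collector_version
        "upstream": {
            "upstream_id": upstream_id,
            "document_version": document_version,
            "fetched_at": fetched_at,
            "received_at": received_at,
            # Hash the payload bytes exactly as received, before any parsing.
            "content_hash": "sha256:" + hashlib.sha256(raw_bytes).hexdigest(),
            # Unsigned payloads still carry the signature object.
            "signature": {"present": False},
        },
        "content": {"format": content_format, "raw": json.loads(raw_bytes)},
        "identifiers": {},
        "linkset": {},
        "supersedes": supersedes,
    }


doc = build_advisory_raw(
    tenant="acme",
    source={"vendor": "osv", "stream": "osv",
            "api": "https://example.test/osv/EXAMPLE-2025-0001.json",
            "collector_version": "1.0.0"},
    upstream_id="EXAMPLE-2025-0001",
    document_version="2025-01-01T00:00:00Z",
    fetched_at="2025-01-02T00:00:00Z",
    received_at="2025-01-02T00:00:01Z",
    content_format="OSV",
    raw_bytes=b'{"id": "EXAMPLE-2025-0001"}',
    revision=1,
)
print(doc["_id"])  # advisory_raw:osv:EXAMPLE-2025-0001:1
```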

4.2 vex_raw

| Field | Type | Notes |
|---|---|---|
| _id | string | vex_raw:{source}:{upstream_id}:{revision}. |
| tenant | string | Required; matches advisory collection requirements. |
| source.* | object | Same shape and requirements as advisory_raw. |
| upstream.* | object | Includes document_version, timestamps, content_hash, and signature. |
| content.format | string | Typically CycloneDX-VEX or CSAF-VEX. |
| content.raw | object | Entire upstream VEX payload. |
| identifiers.statements | array | Normalised statement summaries (IDs, PURLs, status, justification) to accelerate policy joins. |
| linkset | object | CVEs, GHSA IDs, and PURLs referenced in the document. |
| supersedes | string or null | Same convention as advisory documents. |

4.3 Linkset Fields

  • purls: fully qualified Package URLs extracted from raw ranges or product nodes.
  • cpes: Common Platform Enumerations when upstream docs provide them.
  • aliases: Any alternate advisory identifiers present in the payload.
  • references: Array of { type, url } pairs pointing back to vendor advisories, patches, or exploits.
  • reconciled_from: Provenance of linkset entries (JSON Pointer or field origin) to make automated checks auditable.

Canonicalisation rules:

  • Package URLs are rendered in canonical form without qualifiers/subpaths (pkg:type/namespace/name@version).
  • CPE values are normalised to the 2.3 binding (cpe:2.3:part:vendor:product:version:*:*:*:*:*:*:*).
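
The Package URL rule above can be illustrated with a short Python sketch. It relies only on the purl syntax, in which qualifiers follow `?` and the subpath follows `#`. The function name `canonical_purl` is an assumption, and the sketch does not validate the purl or normalise case. CPE normalisation is not covered here.

```python
def canonical_purl(purl: str) -> str:
    """Drop the qualifiers (?...) and subpath (#...) from a Package URL."""
    # The subpath is the trailing '#...' component, so strip it first.
    base = purl.split("#", 1)[0]
    # Qualifiers are the '?key=value&...' component that precedes the subpath.
    return base.split("?", 1)[0]


print(canonical_purl("pkg:npm/%40angular/core@16.0.0?vcs_url=x#src"))
# pkg:npm/%40angular/core@16.0.0
```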

4.4 advisory_observations

advisory_observations is an immutable projection of the validated raw document used by LinkNotMerge overlays. Fields mirror the JSON contract surfaced by StellaOps.Concelier.Models.Observations.AdvisoryObservation.

| Field | Type | Notes |
|---|---|---|
| _id | string | Deterministic observation id: {tenant}:{source.vendor}:{upstreamId}:{revision}. |
| tenant | string | Lower-case tenant identifier. |
| source.vendor / source.stream | string | Connector identity (e.g., vendor/redhat, ecosystem/osv). |
| source.api | string | Absolute URI the connector fetched from. |
| source.collectorVersion | string | Optional semantic version of the connector build. |
| upstream.upstream_id | string | Advisory identifier as issued by the provider (CVE, vendor ID, etc.). |
| upstream.document_version | string | Upstream revision/version string. |
| upstream.fetchedAt / upstream.receivedAt | datetime | UTC timestamps recorded by the connector. |
| upstream.contentHash | string | sha256: digest used for idempotency. |
| upstream.signature | object | {present, format?, keyId?, signature?} describing upstream signature material. |
| content.format / content.specVersion | string | Raw payload format metadata (CSAF, OSV, JSON, etc.). |
| content.raw | object | Full upstream document stored losslessly (Relaxed Extended JSON). |
| content.metadata | object | Optional connector-specific metadata (batch ids, hints). |
| linkset.aliases | array | Normalized aliases (lower-case, sorted). |
| linkset.purls | array | Normalized PURLs extracted from the document. |
| linkset.cpes | array | Normalized CPE URIs. |
| linkset.references | array | { type, url } pairs (type lower-case). |
| createdAt | datetime | Timestamp when Concelier persisted the observation. |
| attributes | object | Optional provenance attributes keyed by connector. |
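
The linkset normalisation notes in the table (lower-case, sorted aliases; lower-case reference types) can be sketched as follows. This is an illustrative Python sketch, not the StellaOps.Concelier model code. The function name `normalise_linkset`, the whitespace trimming, and the de-duplication of aliases are assumptions that the table does not state.

```python
def normalise_linkset(aliases, references):
    """Normalise aliases and references the way the observation table describes."""
    # Lower-case and sort aliases; trimming and de-duplication are assumptions.
    norm_aliases = sorted({alias.strip().lower() for alias in aliases})
    # Lower-case only the reference type; the URL is kept exactly as given.
    norm_refs = [{"type": ref["type"].lower(), "url": ref["url"]} for ref in references]
    return {"aliases": norm_aliases, "references": norm_refs}


linkset = normalise_linkset(
    ["GHSA-xxxx", "EXAMPLE-2025-0001", "example-2025-0001"],
    [{"type": "Advisory", "url": "https://example.test/a"}],
)
print(linkset["aliases"])  # ['example-2025-0001', 'ghsa-xxxx']
```

Sorting makes the output independent of the order in which a connector encounters aliases, which supports the reproducibility goal in section 2.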

5. Error Model

| Code | Description | HTTP status | Surfaces |
|---|---|---|---|
| ERR_AOC_001 | Forbidden field detected (severity, cvss, effective data). | 400 | Ingestion APIs, CLI verifier, CI guard. |
| ERR_AOC_002 | Merge attempt detected (multiple upstream sources fused into one document). | 400 | Ingestion APIs, CLI verifier. |
| ERR_AOC_003 | Idempotency violation (duplicate without supersedes pointer). | 409 | Repository guard, Mongo unique index, CLI verifier. |
| ERR_AOC_004 | Missing provenance metadata (source, upstream, signature). | 422 | Schema validator, ingestion endpoints. |
| ERR_AOC_005 | Signature or checksum mismatch. | 422 | Collector validation, CLI verifier. |
| ERR_AOC_006 | Attempt to persist derived findings from ingestion context. | 403 | Policy engine guard, Authority scopes. |
| ERR_AOC_007 | Unknown top-level fields (schema violation). | 400 | Mongo validator, CLI verifier. |

Consumers should map these codes to CLI exit codes and structured log events so automation can fail fast and produce actionable guidance.
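
One way a consumer might hold this mapping is sketched below in Python. The HTTP statuses are taken from the table above. The exit-code scheme (10 plus the numeric suffix of the code) and the function name `describe` are purely hypothetical, since this document does not define exit codes.

```python
# HTTP statuses copied from the error model table.
HTTP_STATUS = {
    "ERR_AOC_001": 400,
    "ERR_AOC_002": 400,
    "ERR_AOC_003": 409,
    "ERR_AOC_004": 422,
    "ERR_AOC_005": 422,
    "ERR_AOC_006": 403,
    "ERR_AOC_007": 400,
}


def describe(code: str) -> tuple:
    """Return (http_status, exit_code) for an AOC error code."""
    # Hypothetical scheme: exit code = 10 + numeric suffix of the error code.
    exit_code = 10 + int(code.rsplit("_", 1)[1])
    return HTTP_STATUS[code], exit_code


print(describe("ERR_AOC_003"))  # (409, 13)
```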

6. API and Tooling Interfaces

  • Concelier ingestion (StellaOps.Concelier.WebService)
    • POST /ingest/advisory: accepts upstream payload metadata; the server-side guard constructs and persists the raw document.
    • GET /advisories/raw/{id} and filterable list endpoints expose raw documents for debugging and offline analysis.
    • POST /aoc/verify: runs guard checks over recent documents and returns summary totals plus first violations.
  • Excititor ingestion (StellaOps.Excititor.WebService) mirrors the same surface for VEX documents.
  • CLI workflows (stella aoc verify, stella sources ingest --dry-run) surface pre-flight verification; documentation will live in /docs/cli/ alongside Sprint 19 CLI updates.
  • Authority scopes: new advisory:ingest, advisory:read, vex:ingest, and vex:read scopes enforce least privilege; see Authority Architecture for scope grammar.

7. Idempotency and Supersedes Rules

  1. Compute content_hash before any transformation; use it with (source.vendor, upstream.upstream_id) to detect duplicates.
  2. If a document with the same hash already exists, skip the write and log a no-op.
  3. When a new hash arrives for an existing upstream document, insert a new record and set supersedes to the previous _id.
  4. Keep supersedes chains acyclic; collectors must resolve conflicts by rewinding before they insert.
  5. Expose idempotency counters via metrics (ingestion_write_total{result=ok|noop}) to catch regressions early.
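
Steps 1 to 3 and the counter in step 5 can be sketched with an in-memory store. This Python sketch is only an illustration of the rules above, not the repository guard itself. The class name `RawStore`, the revision numbering, and the use of a plain `Counter` in place of a metrics client are assumptions.

```python
import hashlib
from collections import Counter


class RawStore:
    """In-memory stand-in for a raw collection (illustrative only)."""

    def __init__(self):
        self.docs = {}       # _id -> document
        self.by_hash = {}    # (vendor, upstream_id, content_hash) -> _id
        self.latest = {}     # (vendor, upstream_id) -> _id of newest revision
        self.metrics = Counter()

    def upsert(self, vendor, upstream_id, raw_bytes):
        # Step 1: hash the untouched payload before any transformation.
        content_hash = "sha256:" + hashlib.sha256(raw_bytes).hexdigest()
        key = (vendor, upstream_id, content_hash)
        # Step 2: same hash already stored, so skip the write and count a no-op.
        if key in self.by_hash:
            self.metrics["noop"] += 1
            return self.by_hash[key]
        # Step 3: new hash for this upstream document, so insert a new revision
        # that points at the previous one (None for the first revision).
        prev_id = self.latest.get((vendor, upstream_id))
        revision = sum(1 for k in self.by_hash if k[:2] == (vendor, upstream_id)) + 1
        doc_id = f"advisory_raw:{vendor}:{upstream_id}:{revision}"
        self.docs[doc_id] = {"_id": doc_id, "content_hash": content_hash,
                             "supersedes": prev_id}
        self.by_hash[key] = doc_id
        self.latest[(vendor, upstream_id)] = doc_id
        self.metrics["ok"] += 1
        return doc_id


store = RawStore()
first = store.upsert("osv", "EXAMPLE-1", b"payload-a")
again = store.upsert("osv", "EXAMPLE-1", b"payload-a")   # no-op replay
second = store.upsert("osv", "EXAMPLE-1", b"payload-b")  # new revision
print(first == again, store.docs[second]["supersedes"] == first)  # True True
```

Because each new revision only ever points at an older _id, the supersedes chain in this sketch is acyclic by construction.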

8. Migration Playbook

  1. Freeze ingestion writes except for raw pass-through paths while deploying schema validators.
  2. Snapshot existing collections to _backup_* for rollback safety.
  3. Strip forbidden fields from historical documents into a temporary advisory_view_legacy used only during transition.
  4. Enable Mongo JSON schema validators for advisory_raw and vex_raw.
  5. Run collectors in --dry-run to confirm only allowed keys appear; fix violations before lifting the freeze.
  6. Point Policy Engine to consume exclusively from raw collections and compute derived outputs downstream.
  7. Delete legacy normalisation paths from ingestion code and enable runtime guards plus CI linting.
  8. Roll forward CLI, Console, and dashboards so operators can monitor AOC status end-to-end.
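
Step 3 of the playbook amounts to splitting each historical document into a raw part and a legacy part. The Python sketch below shows one possible shape for that split. The function name `split_legacy`, the choice to key the legacy record by the original _id, and the forbidden-key set (the examples listed in invariant 1) are assumptions.

```python
# Examples of forbidden top-level keys from invariant 1; may not be exhaustive.
FORBIDDEN_KEYS = {"severity", "cvss", "effective_status", "consensus_provider", "risk_score"}


def split_legacy(doc: dict) -> tuple:
    """Split a historical document into (raw_doc, legacy_view_entry)."""
    raw = {k: v for k, v in doc.items() if k not in FORBIDDEN_KEYS}
    legacy = {k: v for k, v in doc.items() if k in FORBIDDEN_KEYS}
    if legacy:
        # Assumption: legacy entries are keyed by the original document id.
        legacy["_id"] = doc.get("_id")
    return raw, legacy


raw, legacy = split_legacy({"_id": "a", "tenant": "acme", "severity": "high"})
print(raw)     # {'_id': 'a', 'tenant': 'acme'}
print(legacy)  # {'severity': 'high', '_id': 'a'}
```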

9. Observability and Diagnostics

  • Metrics: ingestion_write_total{result=ok|reject}, aoc_violation_total{code}, ingestion_signature_verified_total{result}, ingestion_latency_seconds, advisory_revision_count.
  • Traces: spans ingest.fetch, ingest.transform, ingest.write, and aoc.guard with correlation IDs shared across workers.
  • Logs: structured entries must include tenant, source.vendor, upstream.upstream_id, content_hash, and violation_code when applicable.
  • Dashboards: DevOps should add panels for violation counts, signature failures, supersedes growth, and CLI verifier outcomes for each tenant.
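
The required log fields can be illustrated with a small Python helper that renders one structured entry as JSON. The function name `ingestion_log_entry` and the flat dotted key names are assumptions. The sketch expects a document shaped like the advisory_raw table in section 4.1.

```python
import json


def ingestion_log_entry(doc: dict, violation_code=None) -> str:
    """Render the required structured log fields for one raw document."""
    entry = {
        "tenant": doc["tenant"],
        "source.vendor": doc["source"]["vendor"],
        "upstream.upstream_id": doc["upstream"]["upstream_id"],
        "content_hash": doc["upstream"]["content_hash"],
    }
    # violation_code is only emitted when a guard rejected the document.
    if violation_code is not None:
        entry["violation_code"] = violation_code
    return json.dumps(entry, sort_keys=True)


sample = {
    "tenant": "acme",
    "source": {"vendor": "osv"},
    "upstream": {"upstream_id": "EXAMPLE-1", "content_hash": "sha256:placeholder"},
}
print(ingestion_log_entry(sample, "ERR_AOC_001"))
```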

10. Security and Tenancy Checklist

  • Enforce Authority scopes (advisory:ingest, vex:ingest, advisory:read, vex:read) and require tenant claims on every request.
  • Maintain pinned trust stores for signature verification; capture verification result in metrics and logs.
  • Ensure collectors never log secrets or raw authentication headers; redact tokens before persistence.
  • Validate that Policy Engine remains the only identity with permission to write effective_finding_* documents.
  • Verify offline bundles include the raw collections, guard configuration, and verifier binaries so air-gapped installs can audit parity.
  • Document operator steps for recovering from violations, including rollback to superseded revisions and re-running policy evaluation.
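
The first checklist item can be sketched as a simple request check. This Python sketch assumes a decoded claims dictionary with a space-separated `scope` string and a `tenant` claim. Both claim names and the function name `authorise` are hypothetical; the real scope grammar is defined in Authority Architecture.

```python
def authorise(claims: dict, required_scope: str) -> bool:
    """Allow a request only if it carries a tenant claim and the required scope."""
    # Assumption: scopes arrive as one space-separated string.
    scopes = set(claims.get("scope", "").split())
    # Every request must carry a non-empty tenant claim.
    if not claims.get("tenant"):
        return False
    return required_scope in scopes


claims = {"scope": "advisory:ingest advisory:read", "tenant": "acme"}
print(authorise(claims, "advisory:ingest"))  # True
print(authorise(claims, "vex:ingest"))       # False
```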

11. Compliance Checklist

  • Deterministic guard enabled in Concelier and Excititor repositories.
  • Mongo validators deployed for advisory_raw and vex_raw.
  • Authority scopes and tenant enforcement verified via integration tests.
  • CLI and CI pipelines run stella aoc verify against seeded snapshots.
  • Observability feeds (metrics, logs, traces) wired into dashboards with alerts.
  • Offline kit instructions updated to bundle validators and verifier tooling.
  • Security review recorded covering ingestion, tenancy, and rollback procedures.

Last updated: 2025-10-27 (Sprint 19).