feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules
- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
		| @@ -1,180 +1,180 @@ | ||||
| # Aggregation-Only Contract Reference | ||||
|  | ||||
| > The Aggregation-Only Contract (AOC) is the governing rule set that keeps StellaOps ingestion services deterministic, policy-neutral, and auditable. It applies to Concelier, Excititor, and any future collectors that write raw advisory or VEX documents. | ||||
|  | ||||
| ## 1. Purpose and Scope | ||||
|  | ||||
| - Defines the canonical behaviour for `advisory_raw` and `vex_raw` collections and the linkset hints they may emit. | ||||
| - Applies to every ingestion runtime (`StellaOps.Concelier.*`, `StellaOps.Excititor.*`), the Authority scopes that guard them, and the DevOps/QA surfaces that verify compliance. | ||||
| - Complements the high-level architecture in [Concelier](../modules/concelier/architecture.md) and Authority enforcement documented in [Authority Architecture](../modules/authority/architecture.md). | ||||
| - Paired guidance: see the guard-rail checkpoints in [AOC Guardrails](../aoc/aoc-guardrails.md) and CLI usage that will land in `/docs/modules/cli/guides/` as part of Sprint 19 follow-up. | ||||
|  | ||||
| ## 2. Philosophy and Goals | ||||
|  | ||||
| - Preserve upstream truth: ingestion only captures immutable raw facts plus provenance, never derived severity or policy decisions. | ||||
| - Defer interpretation: Policy Engine and downstream overlays remain the sole writers of materialised findings, severity, consensus, or risk scores. | ||||
| - Make every write explainable: provenance, signatures, and content hashes are required so operators can prove where each fact originated. | ||||
| - Keep outputs reproducible: identical inputs must yield identical documents, hashes, and linksets across replays and air-gapped installs. | ||||
|  | ||||
| ## 3. Contract Invariants | ||||
|  | ||||
| | # | Invariant | What it forbids or requires | Enforcement surfaces | | ||||
| |---|-----------|-----------------------------|----------------------| | ||||
| | 1 | No derived severity at ingest | Reject top-level keys such as `severity`, `cvss`, `effective_status`, `consensus_provider`, `risk_score`. Raw upstream CVSS remains inside `content.raw`. | Mongo schema validator, `AOCWriteGuard`, Roslyn analyzer, `stella aoc verify`. | | ||||
| | 2 | No merges or opinionated dedupe | Each upstream document persists on its own; ingestion never collapses multiple vendors into one document. | Repository interceptors, unit/fixture suites. | | ||||
| | 3 | Provenance is mandatory | `source.*`, `upstream.*`, and `signature` metadata must be present; missing provenance triggers `ERR_AOC_004`. | Schema validator, guard, CLI verifier. | | ||||
| | 4 | Idempotent upserts | Writes keyed by `(vendor, upstream_id, content_hash)` either no-op or insert a new revision with `supersedes`. Duplicate hashes map to the same document. | Repository guard, storage unique index, CI smoke tests. | | ||||
| | 5 | Append-only revisions | Updates create a new document with `supersedes` pointer; no in-place mutation of content. | Mongo schema (`supersedes` format), guard, data migration scripts. | | ||||
| | 6 | Linkset only | Ingestion may compute link hints (`purls`, `cpes`, IDs) to accelerate joins, but must not transform or infer severity or policy. | Linkset builders reviewed via fixtures and analyzers. | | ||||
| | 7 | Policy-only effective findings | Only Policy Engine identities can write `effective_finding_*`; ingestion callers receive `ERR_AOC_006` if they attempt it. | Authority scopes, Policy Engine guard. | | ||||
| | 8 | Schema safety | Unknown top-level keys are rejected with `ERR_AOC_007`; timestamps use ISO 8601 UTC strings; tenant is required. | Mongo validator, JSON schema tests. | | ||||
| | 9 | Clock discipline | Collectors stamp `fetched_at` and `received_at` monotonically per batch to support reproducibility windows. | Collector contracts, QA fixtures. | | ||||
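
A minimal sketch of how a write guard might apply invariants 1, 3, and 8 to a candidate document. The function name `check_document`, the allowed-key set (taken from the `advisory_raw` table in section 4.1), and the order of checks are assumptions for illustration; this is not the actual `AOCWriteGuard`.

```python
# Forbidden top-level keys listed under invariant 1.
FORBIDDEN_KEYS = {"severity", "cvss", "effective_status", "consensus_provider", "risk_score"}
# Assumed allow-list: the top-level fields of advisory_raw from section 4.1.
ALLOWED_KEYS = {"_id", "tenant", "source", "upstream", "content",
                "identifiers", "linkset", "supersedes"}


def check_document(doc: dict) -> list:
    """Return the AOC error codes a candidate raw document would trigger."""
    violations = []
    keys = set(doc)
    # Invariant 1: derived severity or policy fields must not appear at the top level.
    if keys & FORBIDDEN_KEYS:
        violations.append("ERR_AOC_001")
    # Invariant 8: any other unknown top-level key is a schema violation.
    if keys - ALLOWED_KEYS - FORBIDDEN_KEYS:
        violations.append("ERR_AOC_007")
    # Invariant 3: source, upstream, and signature metadata are mandatory.
    upstream = doc.get("upstream") or {}
    if not doc.get("source") or not upstream or "signature" not in upstream:
        violations.append("ERR_AOC_004")
    return violations
```

Returning every violated code, not only the first, lets a verifier report summary totals in one pass.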
|  | ||||
| ## 4. Raw Schemas | ||||
|  | ||||
| ### 4.1 `advisory_raw` | ||||
|  | ||||
| | Field | Type | Notes | | ||||
| |-------|------|-------| | ||||
| | `_id` | string | `advisory_raw:{source}:{upstream_id}:{revision}`; deterministic and tenant-scoped. | | ||||
| | `tenant` | string | Required; injected by Authority middleware and asserted by schema validator. | | ||||
| | `source.vendor` | string | Provider identifier (e.g., `redhat`, `osv`, `ghsa`). | | ||||
| | `source.stream` | string | Connector stream name (`csaf`, `osv`, etc.). | | ||||
| | `source.api` | string | Absolute URI of upstream document; stored for traceability. | | ||||
| | `source.collector_version` | string | Semantic version of the collector. | | ||||
| | `upstream.upstream_id` | string | Vendor- or ecosystem-provided identifier (CVE, GHSA, vendor ID). | | ||||
| | `upstream.document_version` | string | Upstream issued timestamp or revision string. | | ||||
| | `upstream.fetched_at` / `upstream.received_at` | string | ISO 8601 UTC timestamps recorded by the collector. | | ||||
| | `upstream.content_hash` | string | `sha256:` digest of the raw payload used for idempotency. | | ||||
| | `upstream.signature` | object | Required structure storing `present`, `format`, `key_id`, `sig`; even unsigned payloads set `present: false`. | | ||||
| | `content.format` | string | Source format (`CSAF`, `OSV`, etc.). | | ||||
| | `content.spec_version` | string | Upstream spec version when known. | | ||||
| | `content.raw` | object | Full upstream payload, untouched except for transport normalisation. | | ||||
| | `identifiers` | object | Normalised identifiers (`cve`, `ghsa`, `aliases`, etc.) derived losslessly from raw content. | | ||||
| | `linkset` | object | Join hints (see section 4.3). | | ||||
| | `supersedes` | string or null | Points to previous revision of same upstream doc when content hash changes. | | ||||
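
To make the table concrete, here is a sketch of one `advisory_raw` document built as a Python dictionary. Every value is a placeholder (the advisory ID, URL, tenant, and versions are made up), and several details are assumptions: that `{source}` in `_id` is the vendor, that revisions count from 1, that the hash covers the payload bytes exactly as received, and the inner shape of `identifiers`.

```python
import hashlib
import json

# Placeholder upstream payload; not a real advisory.
raw_bytes = b'{"id": "GHSA-xxxx-yyyy-zzzz", "aliases": ["CVE-0000-0000"]}'
raw = json.loads(raw_bytes)

vendor, upstream_id, revision = "ghsa", raw["id"], 1  # revision numbering is assumed

doc = {
    "_id": f"advisory_raw:{vendor}:{upstream_id}:{revision}",
    "tenant": "example-tenant",
    "source": {
        "vendor": vendor,
        "stream": "osv",
        "api": "https://example.invalid/advisories/GHSA-xxxx-yyyy-zzzz",
        "collector_version": "0.0.0",
    },
    "upstream": {
        "upstream_id": upstream_id,
        "document_version": "2025-10-01T00:00:00Z",
        "fetched_at": "2025-10-27T12:00:00Z",
        "received_at": "2025-10-27T12:00:01Z",
        # Digest of the untouched payload bytes, prefixed as the table requires.
        "content_hash": "sha256:" + hashlib.sha256(raw_bytes).hexdigest(),
        # Unsigned payloads still carry the structure with present set to false.
        "signature": {"present": False},
    },
    # spec_version is left out here because it is unknown for this placeholder.
    "content": {"format": "OSV", "raw": raw},
    "identifiers": {"ghsa": [upstream_id], "aliases": raw["aliases"]},
    "linkset": {"purls": [], "cpes": [], "aliases": raw["aliases"], "references": []},
    "supersedes": None,  # first revision, so nothing to point back to
}
```

Note that no `severity` or `cvss` key appears at the top level; any upstream scoring would stay inside `content.raw`.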
|  | ||||
| ### 4.2 `vex_raw` | ||||
|  | ||||
| | Field | Type | Notes | | ||||
| |-------|------|-------| | ||||
| | `_id` | string | `vex_raw:{source}:{upstream_id}:{revision}`. | | ||||
| | `tenant` | string | Required; matches advisory collection requirements. | | ||||
| | `source.*` | object | Same shape and requirements as `advisory_raw`. | | ||||
| | `upstream.*` | object | Includes `document_version`, timestamps, `content_hash`, and `signature`. | | ||||
| | `content.format` | string | Typically `CycloneDX-VEX` or `CSAF-VEX`. | | ||||
| | `content.raw` | object | Entire upstream VEX payload. | | ||||
| | `identifiers.statements` | array | Normalised statement summaries (IDs, PURLs, status, justification) to accelerate policy joins. | | ||||
| | `linkset` | object | CVEs, GHSA IDs, and PURLs referenced in the document. | | ||||
| | `supersedes` | string or null | Same convention as advisory documents. | | ||||
|  | ||||
| ### 4.3 Linkset Fields | ||||
|  | ||||
| - `purls`: fully qualified Package URLs extracted from raw ranges or product nodes. | ||||
| - `cpes`: Common Platform Enumerations when upstream docs provide them. | ||||
| - `aliases`: Any alternate advisory identifiers present in the payload. | ||||
| - `references`: Array of `{ type, url }` pairs pointing back to vendor advisories, patches, or exploits. | ||||
| - `reconciled_from`: Provenance of linkset entries (JSON Pointer or field origin) to make automated checks auditable. | ||||
|  | ||||
| Canonicalisation rules: | ||||
| - Package URLs are rendered in canonical form without qualifiers/subpaths (`pkg:type/namespace/name@version`). | ||||
| - CPE values are normalised to the 2.3 binding (`cpe:2.3:part:vendor:product:version:*:*:*:*:*:*:*`). | ||||
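
The PURL rule can be sketched with plain string handling, relying on the Package URL layout `pkg:type/namespace/name@version?qualifiers#subpath`. The helper names are hypothetical, and de-duplicating and sorting the result is an assumption made here to keep linksets deterministic.

```python
def canonical_purl(purl: str) -> str:
    """Drop the subpath (after '#') and the qualifiers (after '?') from a PURL."""
    without_subpath = purl.split("#", 1)[0]
    return without_subpath.split("?", 1)[0]


def canonical_purls(purls) -> list:
    """Canonicalise a collection of PURLs into a sorted, de-duplicated list."""
    return sorted({canonical_purl(p) for p in purls})
```

For example, `canonical_purl("pkg:npm/%40angular/core@16.0.0?vcs_url=x#src")` yields `pkg:npm/%40angular/core@16.0.0`.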
|  | ||||
| ### 4.4 `advisory_observations` | ||||
|  | ||||
| `advisory_observations` is an immutable projection of the validated raw document used by Link‑Not‑Merge overlays. Fields mirror the JSON contract surfaced by `StellaOps.Concelier.Models.Observations.AdvisoryObservation`. | ||||
|  | ||||
| | Field | Type | Notes | | ||||
| |-------|------|-------| | ||||
| | `_id` | string | Deterministic observation id — `{tenant}:{source.vendor}:{upstreamId}:{revision}`. | | ||||
| | `tenant` | string | Lower-case tenant identifier. | | ||||
| | `source.vendor` / `source.stream` | string | Connector identity (e.g., `vendor/redhat`, `ecosystem/osv`). | | ||||
| | `source.api` | string | Absolute URI the connector fetched from. | | ||||
| | `source.collectorVersion` | string | Optional semantic version of the connector build. | | ||||
| | `upstream.upstream_id` | string | Advisory identifier as issued by the provider (CVE, vendor ID, etc.). | | ||||
| | `upstream.document_version` | string | Upstream revision/version string. | | ||||
| | `upstream.fetchedAt` / `upstream.receivedAt` | datetime | UTC timestamps recorded by the connector. | | ||||
| | `upstream.contentHash` | string | `sha256:` digest used for idempotency. | | ||||
| | `upstream.signature` | object | `{present, format?, keyId?, signature?}` describing upstream signature material. | | ||||
| | `content.format` / `content.specVersion` | string | Raw payload format metadata (CSAF, OSV, JSON, etc.). | | ||||
| | `content.raw` | object | Full upstream document stored losslessly (Relaxed Extended JSON). | | ||||
| | `content.metadata` | object | Optional connector-specific metadata (batch ids, hints). | | ||||
| | `linkset.aliases` | array | Normalized aliases (lower-case, sorted). | | ||||
| | `linkset.purls` | array | Normalized PURLs extracted from the document. | | ||||
| | `linkset.cpes` | array | Normalized CPE URIs. | | ||||
| | `linkset.references` | array | `{ type, url }` pairs (type lower-case). | | ||||
| | `createdAt` | datetime | Timestamp when Concelier persisted the observation. | | ||||
| | `attributes` | object | Optional provenance attributes keyed by connector. | | ||||
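
The linkset normalisation notes in the table (aliases lower-case and sorted, reference types lower-case) might look like the sketch below. The function names are hypothetical, and trimming whitespace and removing duplicate aliases are assumptions that the table does not state.

```python
def normalise_aliases(aliases) -> list:
    """Lower-case, de-duplicate, and sort alias identifiers."""
    return sorted({a.strip().lower() for a in aliases if a.strip()})


def normalise_references(references) -> list:
    """Lower-case the type of each { type, url } pair; URLs are left untouched."""
    return [{"type": r["type"].lower(), "url": r["url"]} for r in references]
```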
|  | ||||
| ## 5. Error Model | ||||
|  | ||||
| | Code | Description | HTTP status | Surfaces | | ||||
| |------|-------------|-------------|----------| | ||||
| | `ERR_AOC_001` | Forbidden field detected (severity, cvss, effective data). | 400 | Ingestion APIs, CLI verifier, CI guard. | | ||||
| | `ERR_AOC_002` | Merge attempt detected (multiple upstream sources fused into one document). | 400 | Ingestion APIs, CLI verifier. | | ||||
| | `ERR_AOC_003` | Idempotency violation (duplicate without supersedes pointer). | 409 | Repository guard, Mongo unique index, CLI verifier. | | ||||
| | `ERR_AOC_004` | Missing provenance metadata (`source`, `upstream`, `signature`). | 422 | Schema validator, ingestion endpoints. | | ||||
| | `ERR_AOC_005` | Signature or checksum mismatch. | 422 | Collector validation, CLI verifier. | | ||||
| | `ERR_AOC_006` | Attempt to persist derived findings from ingestion context. | 403 | Policy engine guard, Authority scopes. | | ||||
| | `ERR_AOC_007` | Unknown top-level fields (schema violation). | 400 | Mongo validator, CLI verifier. | | ||||
|  | ||||
| Consumers should map these codes to CLI exit codes and structured log events so automation can fail fast and produce actionable guidance. | ||||
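
One way a consumer could carry out that mapping is sketched below. The HTTP statuses come from the table above; the exit-code scheme (10 plus the numeric suffix) and the event field names are purely hypothetical and do not describe what `stella aoc verify` returns.

```python
import json

# HTTP statuses as listed in the error model table.
HTTP_STATUS = {
    "ERR_AOC_001": 400,
    "ERR_AOC_002": 400,
    "ERR_AOC_003": 409,
    "ERR_AOC_004": 422,
    "ERR_AOC_005": 422,
    "ERR_AOC_006": 403,
    "ERR_AOC_007": 400,
}


def exit_code_for(code: str) -> int:
    """Hypothetical scheme: ERR_AOC_00N maps to exit code 10 + N."""
    return 10 + int(code.rsplit("_", 1)[1])


def violation_event(code: str, tenant: str) -> str:
    """Render a structured log event as JSON with stable key order."""
    event = {
        "event": "aoc.violation",  # assumed event name
        "code": code,
        "http_status": HTTP_STATUS[code],
        "exit_code": exit_code_for(code),
        "tenant": tenant,
    }
    return json.dumps(event, sort_keys=True)
```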
|  | ||||
| ## 6. API and Tooling Interfaces | ||||
|  | ||||
| - **Concelier ingestion** (`StellaOps.Concelier.WebService`) | ||||
|   - `POST /ingest/advisory`: accepts upstream payload metadata; a server-side guard constructs and persists the raw document. | ||||
|   - `GET /advisories/raw/{id}` and filterable list endpoints expose raw documents for debugging and offline analysis. | ||||
|   - `POST /aoc/verify`: runs guard checks over recent documents and returns summary totals plus first violations. | ||||
| - **Excititor ingestion** (`StellaOps.Excititor.WebService`) mirrors the same surface for VEX documents. | ||||
| - **CLI workflows** (`stella aoc verify`, `stella sources ingest --dry-run`) surface pre-flight verification; documentation will live in `/docs/modules/cli/guides/` alongside Sprint 19 CLI updates. | ||||
| - **Authority scopes**: new `advisory:ingest`, `advisory:read`, `vex:ingest`, and `vex:read` scopes enforce least privilege; see [Authority Architecture](../modules/authority/architecture.md) for scope grammar. | ||||
|  | ||||
| ## 7. Idempotency and Supersedes Rules | ||||
|  | ||||
| 1. Compute `content_hash` before any transformation; use it with `(source.vendor, upstream.upstream_id)` to detect duplicates. | ||||
| 2. If a document with the same hash already exists, skip the write and log a no-op. | ||||
| 3. When a new hash arrives for an existing upstream document, insert a new record and set `supersedes` to the previous `_id`. | ||||
| 4. Keep supersedes chains acyclic; collectors must resolve conflicts by rewinding before they insert. | ||||
| 5. Expose idempotency counters via metrics (`ingestion_write_total{result=ok|noop}`) to catch regressions early. | ||||
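
The rules above can be sketched against an in-memory store. `RawStore` is a hypothetical stand-in for the Mongo-backed repository, the stored `revision` field is a helper that is not part of the documented schema, and the `_id` layout assumes `{source}` means the vendor.

```python
import hashlib


class RawStore:
    """In-memory stand-in for the raw collection; illustrative only."""

    def __init__(self):
        self.docs = {}      # _id -> stored document
        self.by_hash = {}   # (vendor, upstream_id, content_hash) -> _id
        self.latest = {}    # (vendor, upstream_id) -> _id of the newest revision
        self.counters = {"ok": 0, "noop": 0}

    def upsert(self, vendor: str, upstream_id: str, raw_bytes: bytes) -> str:
        # Rule 1: hash the payload before any transformation.
        digest = "sha256:" + hashlib.sha256(raw_bytes).hexdigest()
        key = (vendor, upstream_id, digest)
        # Rule 2: the same hash is already stored, so skip the write.
        if key in self.by_hash:
            self.counters["noop"] += 1
            return self.by_hash[key]
        # Rule 3: a new hash appends a revision pointing at the previous one.
        previous = self.latest.get((vendor, upstream_id))
        revision = 1 if previous is None else self.docs[previous]["revision"] + 1
        doc_id = f"advisory_raw:{vendor}:{upstream_id}:{revision}"
        self.docs[doc_id] = {
            "_id": doc_id,
            "revision": revision,
            "content_hash": digest,
            "supersedes": previous,
        }
        self.by_hash[key] = doc_id
        self.latest[(vendor, upstream_id)] = doc_id
        # Rule 5: count the outcome so regressions show up in metrics.
        self.counters["ok"] += 1
        return doc_id
```

Because each new revision can only point at a document that already exists, the supersedes chain in this sketch stays acyclic by construction.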
|  | ||||
| ## 8. Migration Playbook | ||||
|  | ||||
| 1. Freeze ingestion writes except for raw pass-through paths while deploying schema validators. | ||||
| 2. Snapshot existing collections to `_backup_*` for rollback safety. | ||||
| 3. Strip forbidden fields from historical documents into a temporary `advisory_view_legacy` used only during transition. | ||||
| 4. Enable Mongo JSON schema validators for `advisory_raw` and `vex_raw`. | ||||
| 5. Run collectors in `--dry-run` to confirm only allowed keys appear; fix violations before lifting the freeze. | ||||
| 6. Point Policy Engine to consume exclusively from raw collections and compute derived outputs downstream. | ||||
| 7. Delete legacy normalisation paths from ingestion code and enable runtime guards plus CI linting. | ||||
| 8. Roll forward CLI, Console, and dashboards so operators can monitor AOC status end-to-end. | ||||
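
Step 3 of the playbook could be sketched as a pure function that separates a historical document into its cleaned raw form and the legacy fields destined for `advisory_view_legacy`. The name `split_legacy` and the choice to key the legacy record by the original `_id` are assumptions; the forbidden keys are those listed under invariant 1.

```python
FORBIDDEN_KEYS = ("severity", "cvss", "effective_status", "consensus_provider", "risk_score")


def split_legacy(doc: dict):
    """Return (cleaned, legacy); legacy is empty when nothing was stripped."""
    legacy = {k: doc[k] for k in FORBIDDEN_KEYS if k in doc}
    cleaned = {k: v for k, v in doc.items() if k not in FORBIDDEN_KEYS}
    if legacy:
        # Keep a pointer back to the source document for the transition period.
        legacy = {"_id": doc["_id"], **legacy}
    return cleaned, legacy
```

The input document is never mutated, so the `_backup_*` snapshot from step 2 and the migrated copy can be compared afterwards.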
|  | ||||
| ## 9. Observability and Diagnostics | ||||
|  | ||||
| - **Metrics**: `ingestion_write_total{result=ok|reject}`, `aoc_violation_total{code}`, `ingestion_signature_verified_total{result}`, `ingestion_latency_seconds`, `advisory_revision_count`. | ||||
| - **Traces**: spans `ingest.fetch`, `ingest.transform`, `ingest.write`, and `aoc.guard` with correlation IDs shared across workers. | ||||
| - **Logs**: structured entries must include `tenant`, `source.vendor`, `upstream.upstream_id`, `content_hash`, and `violation_code` when applicable. | ||||
| - **Dashboards**: DevOps should add panels for violation counts, signature failures, supersedes growth, and CLI verifier outcomes for each tenant. | ||||
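
A sketch of a structured log entry that carries the fields required above. The flat dotted key names mirror the field names in the text, but the exact serialisation used by the services is an assumption, as is the helper name `ingest_log_entry`.

```python
import json


def ingest_log_entry(tenant, vendor, upstream_id, content_hash, violation_code=None) -> str:
    """Serialise one ingestion log entry as JSON with stable key order."""
    entry = {
        "tenant": tenant,
        "source.vendor": vendor,
        "upstream.upstream_id": upstream_id,
        "content_hash": content_hash,
    }
    # violation_code is only present when a guard rejected the write.
    if violation_code is not None:
        entry["violation_code"] = violation_code
    return json.dumps(entry, sort_keys=True)
```

Sorting keys keeps identical inputs producing byte-identical log lines, in line with the reproducibility goal in section 2.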
|  | ||||
| ## 10. Security and Tenancy Checklist | ||||
|  | ||||
| - Enforce Authority scopes (`advisory:ingest`, `vex:ingest`, `advisory:read`, `vex:read`) and require tenant claims on every request. | ||||
| - Maintain pinned trust stores for signature verification; capture verification result in metrics and logs. | ||||
| - Ensure collectors never log secrets or raw authentication headers; redact tokens before persistence. | ||||
| - Validate that Policy Engine remains the only identity with permission to write `effective_finding_*` documents. | ||||
| - Verify offline bundles include the raw collections, guard configuration, and verifier binaries so air-gapped installs can audit parity. | ||||
| - Document operator steps for recovering from violations, including rollback to superseded revisions and re-running policy evaluation. | ||||
|  | ||||
| ## 11. Compliance Checklist | ||||
|  | ||||
| - [ ] Deterministic guard enabled in Concelier and Excititor repositories. | ||||
| - [ ] Mongo validators deployed for `advisory_raw` and `vex_raw`. | ||||
| - [ ] Authority scopes and tenant enforcement verified via integration tests. | ||||
| - [ ] CLI and CI pipelines run `stella aoc verify` against seeded snapshots. | ||||
| - [ ] Observability feeds (metrics, logs, traces) wired into dashboards with alerts. | ||||
| - [ ] Offline kit instructions updated to bundle validators and verifier tooling. | ||||
| - [ ] Security review recorded covering ingestion, tenancy, and rollback procedures. | ||||
|  | ||||
| --- | ||||
|  | ||||
| *Last updated: 2025-10-27 (Sprint 19).* | ||||
|   | ||||