docs consolidation and others

This commit is contained in:
master
2026-01-06 19:02:21 +02:00
parent d7bdca6d97
commit 4789027317
849 changed files with 16551 additions and 66770 deletions

@@ -0,0 +1,181 @@
# Aggregation-Only Contract Reference
> The Aggregation-Only Contract (AOC) is the governing rule set that keeps StellaOps ingestion services deterministic, policy-neutral, and auditable. It applies to Concelier, Excititor, and any future collectors that write raw advisory or VEX documents.
## 1. Purpose and Scope
- Defines the canonical behaviour for `advisory_raw` and `vex_raw` collections and the linkset hints they may emit.
- Applies to every ingestion runtime (`StellaOps.Concelier.*`, `StellaOps.Excititor.*`), the Authority scopes that guard them, and the DevOps/QA surfaces that verify compliance.
- Complements the high-level architecture in [Concelier](../modules/concelier/architecture.md) and Authority enforcement documented in [Authority Architecture](../modules/authority/architecture.md).
- Paired guidance: see the guard-rail checkpoints in [AOC Guardrails](../aoc/aoc-guardrails.md), the implementation reference in [AOC Guard Library](../aoc/guard-library.md), and CLI usage that will land in `/docs/modules/cli/guides/` as part of Sprint 19 follow-up.
## 2. Philosophy and Goals
- Preserve upstream truth: ingestion only captures immutable raw facts plus provenance, never derived severity or policy decisions.
- Defer interpretation: Policy Engine and downstream overlays remain the sole writers of materialised findings, severity, consensus, or risk scores.
- Make every write explainable: provenance, signatures, and content hashes are required so operators can prove where each fact originated.
- Keep outputs reproducible: identical inputs must yield identical documents, hashes, and linksets across replays and air-gapped installs.
## 3. Contract Invariants
| # | Invariant | What it forbids or requires | Enforcement surfaces |
|---|-----------|-----------------------------|----------------------|
| 1 | No derived severity at ingest | Reject top-level keys such as `severity`, `cvss`, `effective_status`, `consensus_provider`, `risk_score`. Raw upstream CVSS remains inside `content.raw`. | PostgreSQL schema validator, `AOCWriteGuard`, Roslyn analyzer, `stella aoc verify`. |
| 2 | No merges or opinionated dedupe | Each upstream document persists on its own; ingestion never collapses multiple vendors into one document. | Repository interceptors, unit/fixture suites. |
| 3 | Provenance is mandatory | `source.*`, `upstream.*`, and `signature` metadata must be present; missing provenance triggers `ERR_AOC_004`. | Schema validator, guard, CLI verifier. |
| 4 | Idempotent upserts | Writes keyed by `(vendor, upstream_id, content_hash)` either no-op or insert a new revision with `supersedes`. Duplicate hashes map to the same document. | Repository guard, storage unique index, CI smoke tests. |
| 5 | Append-only revisions | Updates create a new document with `supersedes` pointer; no in-place mutation of content. | PostgreSQL schema (`supersedes` format), guard, data migration scripts. |
| 6 | Linkset only | Ingestion may compute link hints (`purls`, `cpes`, IDs) to accelerate joins, but must not transform or infer severity or policy. Observations now persist both canonical linksets (for indexed queries) and raw linksets (preserving upstream order/duplicates) so downstream policy can decide how to normalise. When `concelier:features:noMergeEnabled=true`, all merge-derived canonicalisation paths must be disabled. | Linkset builders reviewed via fixtures/analyzers; raw-vs-canonical parity covered by observation fixtures; analyzer `CONCELIER0002` blocks merge API usage. |
| 7 | Policy-only effective findings | Only Policy Engine identities can write `effective_finding_*`; ingestion callers receive `ERR_AOC_006` if they attempt it. | Authority scopes, Policy Engine guard. |
| 8 | Schema safety | Unknown top-level keys reject with `ERR_AOC_007`; timestamps use ISO 8601 UTC strings; tenant is required. | PostgreSQL validator, JSON schema tests. |
| 9 | Clock discipline | Collectors stamp `fetched_at` and `received_at` monotonically per batch to support reproducibility windows. | Collector contracts, QA fixtures. |
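
The top-level checks behind invariants 1, 3, and 8 can be sketched as a small guard function. This is a minimal illustration, not the real `AOCWriteGuard`: the function name, the violation shape (`code`, `path`), and the allow-list (copied from the `advisory_raw` fields in section 4.1) are assumptions made for the example.

```python
# Hypothetical guard sketch; names and structure are illustrative only.
FORBIDDEN_KEYS = {"severity", "cvss", "effective_status",
                  "consensus_provider", "risk_score"}
# Top-level fields taken from the advisory_raw schema in section 4.1.
ALLOWED_KEYS = {"_id", "tenant", "source", "upstream", "content",
                "identifiers", "linkset", "supersedes"}


def check_document(doc: dict) -> list[dict]:
    """Return AOC violations for a raw document; an empty list means it passes."""
    violations = []
    for key in sorted(doc):  # sorted so the report order is deterministic
        if key in FORBIDDEN_KEYS:
            violations.append({"code": "ERR_AOC_001", "path": f"/{key}"})
        elif key not in ALLOWED_KEYS:
            violations.append({"code": "ERR_AOC_007", "path": f"/{key}"})
    # Invariant 3: provenance (source, upstream, signature) is mandatory.
    if not doc.get("source"):
        violations.append({"code": "ERR_AOC_004", "path": "/source"})
    upstream = doc.get("upstream") or {}
    if not upstream:
        violations.append({"code": "ERR_AOC_004", "path": "/upstream"})
    elif "signature" not in upstream:
        violations.append({"code": "ERR_AOC_004", "path": "/upstream/signature"})
    return violations
```

Returning every violation at once, instead of failing on the first, matches the `violations[]` payload described in the error model and lets a verifier report all problems in one pass.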
## 4. Raw Schemas
### 4.1 `advisory_raw`
| Field | Type | Notes |
|-------|------|-------|
| `_id` | string | `advisory_raw:{source}:{upstream_id}:{revision}`; deterministic and tenant-scoped. |
| `tenant` | string | Required; injected by Authority middleware and asserted by schema validator. |
| `source.vendor` | string | Provider identifier (e.g., `redhat`, `osv`, `ghsa`). |
| `source.stream` | string | Connector stream name (`csaf`, `osv`, etc.). |
| `source.api` | string | Absolute URI of upstream document; stored for traceability. |
| `source.collector_version` | string | Semantic version of the collector. |
| `upstream.upstream_id` | string | Vendor- or ecosystem-provided identifier (CVE, GHSA, vendor ID). |
| `upstream.document_version` | string | Upstream issued timestamp or revision string. |
| `upstream.fetched_at` / `received_at` | string | ISO 8601 UTC timestamps recorded by the collector. |
| `upstream.content_hash` | string | `sha256:` digest of the raw payload used for idempotency. |
| `upstream.signature` | object | Required structure storing `present`, `format`, `key_id`, `sig`; even unsigned payloads set `present: false`. |
| `content.format` | string | Source format (`CSAF`, `OSV`, etc.). |
| `content.spec_version` | string | Upstream spec version when known. |
| `content.raw` | object | Full upstream payload, untouched except for transport normalisation. |
| `identifiers` | object | Upstream identifiers (`cve`, `ghsa`, `aliases`, etc.) captured as provided (trimmed, order preserved, duplicates allowed). |
| `linkset` | object | Join hints (see section 4.3). |
| `supersedes` | string or null | Points to previous revision of same upstream doc when content hash changes. |
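
As an illustration of the table above, the following sketch assembles an `advisory_raw` document from an unsigned payload. The helper name and its parameters are hypothetical, and it assumes that `{source}` in the `_id` template is the vendor identifier and that `received_at` sits under `upstream`; the field names themselves come from the schema.

```python
import hashlib
import json


def build_advisory_raw(*, tenant, vendor, stream, api, collector_version,
                       upstream_id, document_version, revision,
                       fetched_at, received_at, raw_payload: bytes,
                       content_format, previous_id=None) -> dict:
    """Assemble an advisory_raw document from an unsigned upstream payload."""
    return {
        # Assumption: {source} in the _id template is the vendor identifier.
        "_id": f"advisory_raw:{vendor}:{upstream_id}:{revision}",
        "tenant": tenant,
        "source": {"vendor": vendor, "stream": stream, "api": api,
                   "collector_version": collector_version},
        "upstream": {
            "upstream_id": upstream_id,
            "document_version": document_version,
            "fetched_at": fetched_at,
            "received_at": received_at,
            # Hash the bytes exactly as received, before any parsing.
            "content_hash": "sha256:" + hashlib.sha256(raw_payload).hexdigest(),
            # Unsigned payloads still carry the signature object.
            "signature": {"present": False},
        },
        "content": {"format": content_format, "raw": json.loads(raw_payload)},
        "identifiers": {},
        "linkset": {},
        "supersedes": previous_id,
    }
```

Note that no `severity` or `cvss` key appears at the top level: any upstream CVSS data stays inside `content.raw`.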
### 4.2 `vex_raw`
| Field | Type | Notes |
|-------|------|-------|
| `_id` | string | `vex_raw:{source}:{upstream_id}:{revision}`. |
| `tenant` | string | Required; matches advisory collection requirements. |
| `source.*` | object | Same shape and requirements as `advisory_raw`. |
| `upstream.*` | object | Includes `document_version`, timestamps, `content_hash`, and `signature`. |
| `content.format` | string | Typically `CycloneDX-VEX` or `CSAF-VEX`. |
| `content.raw` | object | Entire upstream VEX payload. |
| `identifiers.statements` | array | Normalised statement summaries (IDs, PURLs, status, justification) to accelerate policy joins. |
| `linkset` | object | CVEs, GHSA IDs, and PURLs referenced in the document. |
| `supersedes` | string or null | Same convention as advisory documents. |
### 4.3 Linkset Fields
- `purls`: fully qualified Package URLs extracted from raw ranges or product nodes.
- `cpes`: Common Platform Enumerations when upstream docs provide them.
- `aliases`: Any alternate advisory identifiers present in the payload.
- `references`: Array of `{ type, url }` pairs pointing back to vendor advisories, patches, or exploits.
- `reconciled_from`: Provenance of linkset entries (JSON Pointer or field origin) to make automated checks auditable.
Canonicalisation rules:
- Package URLs are rendered in canonical form without qualifiers/subpaths (`pkg:type/namespace/name@version`).
- CPE values are normalised to the 2.3 binding (`cpe:2.3:part:vendor:product:version:*:*:*:*:*:*:*`).
- Connector mapping stages are responsible for the canonical form; ingestion trims whitespace but otherwise preserves the original order and duplicate entries so downstream policy can reason about upstream intent.
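
The split of responsibilities in these rules can be sketched as two small functions. Both names are hypothetical, and the first covers only the qualifier and subpath stripping mentioned above, not full Package URL normalisation.

```python
def canonical_purl(purl: str) -> str:
    """Connector-stage step: drop qualifiers and subpath from a Package URL."""
    base = purl.strip().split("#", 1)[0]  # the subpath follows '#'
    return base.split("?", 1)[0]          # the qualifiers follow '?'


def ingest_linkset_values(values: list[str]) -> list[str]:
    """Ingestion-stage step: trim only; keep upstream order and duplicates."""
    return [value.strip() for value in values]
```

For example, `canonical_purl("pkg:rpm/redhat/openssl@1.1.1w-12?arch=x86_64")` yields `pkg:rpm/redhat/openssl@1.1.1w-12`, while `ingest_linkset_values` leaves repeated entries in place so downstream policy can still see them.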
### 4.4 `advisory_observations`
`advisory_observations` is an immutable projection of the validated raw document used by LinkNotMerge overlays. Fields mirror the JSON contract surfaced by `StellaOps.Concelier.Models.Observations.AdvisoryObservation`.
| Field | Type | Notes |
|-------|------|-------|
| `_id` | string | Deterministic observation id — `{tenant}:{source.vendor}:{upstreamId}:{revision}`. |
| `tenant` | string | Lower-case tenant identifier. |
| `source.vendor` / `source.stream` | string | Connector identity (e.g., `vendor/redhat`, `ecosystem/osv`). |
| `source.api` | string | Absolute URI the connector fetched from. |
| `source.collectorVersion` | string | Optional semantic version of the connector build. |
| `upstream.upstream_id` | string | Advisory identifier as issued by the provider (CVE, vendor ID, etc.). |
| `upstream.document_version` | string | Upstream revision/version string. |
| `upstream.fetchedAt` / `upstream.receivedAt` | datetime | UTC timestamps recorded by the connector. |
| `upstream.contentHash` | string | `sha256:` digest used for idempotency. |
| `upstream.signature` | object | `{present, format?, keyId?, signature?}` describing upstream signature material. |
| `content.format` / `content.specVersion` | string | Raw payload format metadata (CSAF, OSV, JSON, etc.). |
| `content.raw` | object | Full upstream document stored losslessly (Relaxed Extended JSON). |
| `content.metadata` | object | Optional connector-specific metadata (batch ids, hints). |
| `linkset.aliases` | array | Connector-supplied aliases (trimmed, order preserved, duplicates allowed). |
| `linkset.purls` | array | Connector-supplied PURLs (ingestion preserves order and duplicates). |
| `linkset.cpes` | array | Connector-supplied CPE URIs (trimmed, order preserved). |
| `linkset.references` | array | `{ type, url }` pairs (trimmed; ingestion preserves order). |
| `createdAt` | datetime | Timestamp when Concelier persisted the observation. |
| `attributes` | object | Optional provenance attributes keyed by connector. |
## 5. Error Model
| Code | Description | HTTP status | Surfaces |
|------|-------------|-------------|----------|
| `ERR_AOC_001` | Forbidden field detected (severity, cvss, effective data). | 400 | Ingestion APIs, CLI verifier, CI guard. |
| `ERR_AOC_002` | Merge attempt detected (multiple upstream sources fused into one document). | 400 | Ingestion APIs, CLI verifier. |
| `ERR_AOC_003` | Idempotency violation (duplicate without supersedes pointer). | 409 | Repository guard, PostgreSQL unique index, CLI verifier. |
| `ERR_AOC_004` | Missing provenance metadata (`source`, `upstream`, `signature`). | 422 | Schema validator, ingestion endpoints. |
| `ERR_AOC_005` | Signature or checksum mismatch. | 422 | Collector validation, CLI verifier. |
| `ERR_AOC_006` | Attempt to persist derived findings from ingestion context. | 403 | Policy engine guard, Authority scopes. |
| `ERR_AOC_007` | Unknown top-level fields (schema violation). | 400 | PostgreSQL validator, CLI verifier. |
Consumers should map these codes to CLI exit codes and structured log events so automation can fail fast and produce actionable guidance. The shared guard library (`StellaOps.Aoc.AocError`) emits consistent payloads (`code`, `message`, `violations[]`) for HTTP APIs, CLI tooling, and verifiers.
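
A consumer-side mapping might look like the sketch below. The HTTP statuses are taken from the table above; the helper names and the exit-code scheme (10 plus the numeric suffix) are purely hypothetical and are not defined by `StellaOps.Aoc.AocError`.

```python
# HTTP statuses as listed in the error model table.
HTTP_STATUS = {
    "ERR_AOC_001": 400,
    "ERR_AOC_002": 400,
    "ERR_AOC_003": 409,
    "ERR_AOC_004": 422,
    "ERR_AOC_005": 422,
    "ERR_AOC_006": 403,
    "ERR_AOC_007": 400,
}


def error_payload(code: str, message: str, violations: list[dict]) -> dict:
    """Build the shared payload shape: code, message, violations[]."""
    if code not in HTTP_STATUS:
        raise KeyError(f"unknown AOC error code: {code}")
    return {"code": code, "message": message, "violations": violations}


def exit_code(code: str) -> int:
    """Hypothetical CLI exit code: 10 plus the numeric suffix of the error code."""
    if code not in HTTP_STATUS:
        raise KeyError(f"unknown AOC error code: {code}")
    return 10 + int(code.rsplit("_", 1)[1])
```

Keeping one lookup table for all surfaces is one way to stop HTTP, CLI, and CI behaviour from drifting apart.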
## 6. API and Tooling Interfaces
- **Concelier ingestion** (`StellaOps.Concelier.WebService`)
  - `POST /ingest/advisory`: accepts upstream payload metadata; server-side guard constructs and persists raw document.
  - `GET /advisories/raw/{id}` and filterable list endpoints expose raw documents for debugging and offline analysis.
  - `POST /aoc/verify`: runs guard checks over recent documents and returns summary totals plus first violations.
- **Excititor ingestion** (`StellaOps.Excititor.WebService`) mirrors the same surface for VEX documents.
- **CLI workflows** (`stella aoc verify`, `stella sources ingest --dry-run`) surface pre-flight verification; documentation will live in `/docs/modules/cli/guides/` alongside Sprint 19 CLI updates.
- **Authority scopes**: new `advisory:ingest`, `advisory:read`, `vex:ingest`, and `vex:read` scopes enforce least privilege; see [Authority Architecture](../modules/authority/architecture.md) for scope grammar.
## 7. Idempotency and Supersedes Rules
1. Compute `content_hash` before any transformation; use it with `(source.vendor, upstream.upstream_id)` to detect duplicates.
2. If a document with the same hash already exists, skip the write and log a no-op.
3. When a new hash arrives for an existing upstream document, insert a new record and set `supersedes` to the previous `_id`.
4. Keep supersedes chains acyclic; collectors must resolve conflicts by rewinding before they insert.
5. Expose idempotency counters via metrics (`ingestion_write_total{result=ok|noop}`) to catch regressions early.
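
Rules 1 to 3 and the counters in rule 5 can be sketched with an in-memory store. This is an illustration under stated assumptions, not the repository guard itself: the class name, the integer revision scheme, and the index layout are invented for the example, and a real implementation would rely on the storage unique index.

```python
import hashlib


class RawStore:
    """In-memory stand-in for the raw collection; illustrative only."""

    def __init__(self):
        self.docs = {}     # (tenant, _id) -> document
        self.by_hash = {}  # (tenant, vendor, upstream_id, hash) -> _id
        self.latest = {}   # (tenant, vendor, upstream_id) -> (revision, _id)
        self.counters = {"ok": 0, "noop": 0}

    def upsert(self, tenant: str, vendor: str, upstream_id: str,
               raw_payload: bytes) -> str:
        # Rule 1: hash the payload before any transformation.
        content_hash = "sha256:" + hashlib.sha256(raw_payload).hexdigest()
        key = (tenant, vendor, upstream_id)
        existing = self.by_hash.get(key + (content_hash,))
        if existing is not None:
            # Rule 2: same hash already stored, so this write is a no-op.
            self.counters["noop"] += 1
            return existing
        # Rule 3: new hash, so insert a revision that points at the previous one.
        revision, previous_id = self.latest.get(key, (0, None))
        revision += 1
        doc_id = f"advisory_raw:{vendor}:{upstream_id}:{revision}"
        self.docs[(tenant, doc_id)] = {
            "_id": doc_id,
            "tenant": tenant,
            "upstream": {"upstream_id": upstream_id, "content_hash": content_hash},
            "supersedes": previous_id,
        }
        self.by_hash[key + (content_hash,)] = doc_id
        self.latest[key] = (revision, doc_id)
        self.counters["ok"] += 1
        return doc_id
```

Because each new revision only ever points at the revision that existed before it, the supersedes chain in this sketch cannot form a cycle.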
## 8. Migration Playbook
1. Freeze ingestion writes except for raw pass-through paths while deploying schema validators.
2. Snapshot existing collections to `_backup_*` for rollback safety.
3. Strip forbidden fields from historical documents into a temporary `advisory_view_legacy` used only during transition.
4. Enable PostgreSQL JSON schema validators for `advisory_raw` and `vex_raw`.
5. Run collectors in `--dry-run` to confirm only allowed keys appear; fix violations before lifting the freeze.
6. Point Policy Engine to consume exclusively from raw collections and compute derived outputs downstream.
7. Delete legacy normalisation paths from ingestion code and enable runtime guards plus CI linting.
8. Roll forward CLI, Console, and dashboards so operators can monitor AOC status end-to-end.
## 9. Observability and Diagnostics
- **Metrics**: `ingestion_write_total{result=ok|reject}`, `aoc_violation_total{code}`, `ingestion_signature_verified_total{result}`, `ingestion_latency_seconds`, `advisory_revision_count`.
- **Traces**: spans `ingest.fetch`, `ingest.transform`, `ingest.write`, and `aoc.guard` with correlation IDs shared across workers.
- **Logs**: structured entries must include `tenant`, `source.vendor`, `upstream.upstream_id`, `content_hash`, and `violation_code` when applicable.
- **Dashboards**: DevOps should add panels for violation counts, signature failures, supersedes growth, and CLI verifier outcomes for each tenant.
## 10. Security and Tenancy Checklist
- Enforce Authority scopes (`advisory:ingest`, `vex:ingest`, `advisory:read`, `vex:read`) and require tenant claims on every request.
- Maintain pinned trust stores for signature verification; capture verification result in metrics and logs.
- Ensure collectors never log secrets or raw authentication headers; redact tokens before persistence.
- Validate that Policy Engine remains the only identity with permission to write `effective_finding_*` documents.
- Verify offline bundles include the raw collections, guard configuration, and verifier binaries so air-gapped installs can audit parity.
- Document operator steps for recovering from violations, including rollback to superseded revisions and re-running policy evaluation.
## 11. Compliance Checklist
- [ ] Deterministic guard enabled in Concelier and Excititor repositories.
- [ ] PostgreSQL validators deployed for `advisory_raw` and `vex_raw`.
- [ ] Authority scopes and tenant enforcement verified via integration tests.
- [ ] CLI and CI pipelines run `stella aoc verify` against seeded snapshots.
- [ ] Observability feeds (metrics, logs, traces) wired into dashboards with alerts.
- [ ] Offline kit instructions updated to bundle validators and verifier tooling.
- [ ] Security review recorded covering ingestion, tenancy, and rollback procedures.
---
*Last updated: 2025-10-27 (Sprint 19).*

@@ -0,0 +1,218 @@
# Advisory Observations & Linksets
> Imposed rule: Work of this type or tasks of this type on this component must also
> be applied everywhere else it should be applied.
The Link-Not-Merge (LNM) initiative replaces the legacy "merge" pipeline with
immutable observations and correlation linksets. This guide explains how
Concelier ingests advisory statements, preserves upstream truth, and produces
linksets that downstream services (Policy Engine, Vuln Explorer, Console) can
use without collapsing sources together.
---
## 1. Model overview
### 1.1 Observation lifecycle
1. **Ingest**: Connectors fetch upstream payloads (CSAF, OSV, vendor feeds),
   validate signatures, and drop any derived fields prohibited by the
   Aggregation-Only Contract (AOC).
2. **Persist**: Concelier writes immutable `advisory_observations` scoped by
   `tenant`, `(source.vendor, upstreamId)`, and `contentHash`. Supersedes chains
   capture revisions without mutating history.
3. **Expose**: WebService surfaces paged/read APIs; Offline Kit snapshots
   include the same documents for air-gapped installs.
Observation schema highlights:
```text
observationId = {tenant}:{source.vendor}:{upstreamId}:{revision}
tenant, source{vendor, stream, api, collectorVersion}
upstream{upstreamId, documentVersion, fetchedAt, receivedAt,
         contentHash, signature{present, format, keyId, signature}}
content{format, specVersion, raw}
identifiers{cve?, ghsa?, aliases[], osvIds[]}
linkset{purls[], cpes[], aliases[], references[], conflicts[]?}
createdAt, attributes{batchId?, replayCursor?}
```
- **Immutable raw** (`content.raw`) mirrors upstream payloads exactly.
- **Provenance** (`source.*`, `upstream.*`) satisfies AOC guardrails and enables
cryptographic attestations.
- **Identifiers** retain lossless extracts (CVE, GHSA, vendor aliases) that seed
linksets.
- **Linkset** captures join hints but never merges or adds derived severity.
### 1.2 Linkset lifecycle
Linksets correlate observations that describe the same vulnerable product while
keeping each source intact.
1. **Seed**: Observations emit normalized identifiers (`purl`, `cpe`,
   `alias`) during ingestion.
2. **Correlate**: Linkset builder groups observations by tenant, product
   coordinates, and equivalence signals (PURL alias graph, CVE overlap, CVSS
   vector equality, fuzzy titles).
3. **Annotate**: Detected conflicts (severity disagreements, affected-range
   mismatch, incompatible references) are recorded with structured payloads and
   preserved for UI/API export.
4. **Persist**: Results land in `advisory_linksets` with deterministic IDs
   (`linksetId = {tenant}:{hash(aliases+purls+seedIds)}`) and append-only history
   for reproducibility.
Linksets never suppress or prefer one source; they provide aligned evidence so
other services can apply policy.
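
The deterministic ID from step 4 could be derived along the following lines. The source only gives the template `{tenant}:{hash(aliases+purls+seedIds)}`; the choice of SHA-256, the JSON encoding of the inputs, and the deduplication and sorting step are assumptions made for this sketch.

```python
import hashlib
import json


def linkset_id(tenant: str, aliases: list[str], purls: list[str],
               seed_ids: list[str]) -> str:
    """Derive a linkset ID that does not depend on input order or duplicates."""
    # Sort and deduplicate each group so replays produce the same material.
    material = json.dumps(
        [sorted(set(aliases)), sorted(set(purls)), sorted(set(seed_ids))],
        separators=(",", ":"),
    )
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return f"{tenant}:{digest}"
```

Encoding the three groups as separate JSON arrays avoids ambiguity between, say, an alias and a PURL that happen to share the same string.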
---
## 2. Observation vs. linkset
- **Purpose**
  - Observation: Immutable record per vendor and revision.
  - Linkset: Correlates observations that share product identity.
- **Mutation**
  - Observation: Append-only via supersedes chain.
  - Linkset: Rebuilt deterministically from canonical signals.
- **Allowed fields**
  - Observation: Raw payload, provenance, identifiers, join hints.
  - Linkset: Observation references, normalized product metadata, conflicts.
- **Forbidden fields**
  - Observation: Derived severity, policy status, opinionated dedupe.
  - Linkset: Derived severity (conflicts recorded but unresolved).
- **Consumers**
  - Observation: Evidence API, Offline Kit, CLI exports.
  - Linkset: Policy Engine overlay, UI evidence panel, Vuln Explorer.
### 2.1 Example sequence
1. Red Hat PSIRT publishes RHSA-2025:1234 for OpenSSL; Concelier inserts an
observation for vendor `redhat` with `pkg:rpm/redhat/openssl@1.1.1w-12`.
2. NVD issues CVE-2025-0001; a second observation is inserted for vendor `nvd`.
3. Linkset builder runs, groups the two observations, records alias and PURL
overlap, and flags a CVSS disagreement (`7.5` vs `7.2`).
4. Policy Engine reads the linkset, recognises the severity variance, and relies
on configured rules to decide the effective output.
---
## 3. Conflict handling
Conflicts record disagreements without altering source payloads. The builder
emits structured entries:
```json
{
  "type": "severity-mismatch",
  "field": "cvss.baseScore",
  "observations": [
    {
      "source": "redhat",
      "value": "7.5",
      "vector": "AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"
    },
    {
      "source": "nvd",
      "value": "7.2",
      "vector": "AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N"
    }
  ],
  "confidence": "medium",
  "detectedAt": "2025-10-27T14:00:00Z"
}
```
Supported conflict classes:
- `severity-mismatch`: CVSS or qualitative severities differ.
- `affected-range-divergence`: Product ranges, fixed versions, or platforms
  disagree.
- `statement-disagreement`: One observation declares `not_affected` while
  another states `affected`.
- `reference-clash`: URL or classifier collisions (for example, exploit URL vs
  conflicting advisory).
- `alias-inconsistency`: Aliases map to different canonical IDs (GHSA vs CVE).
- `metadata-gap`: Required provenance missing on one source; logged as a
  warning.
Conflict surfaces:
- WebService endpoints (`GET /advisories/linksets/{id}` → `conflicts[]`).
- UI evidence panel chips and conflict badges.
- CLI exports (JSON/OSV) exposed through LNM commands.
- Observability metrics (`advisory_linkset_conflicts_total{type}`).
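
A `severity-mismatch` entry like the one in the JSON example can be produced by a check along these lines. This is a sketch only: the function name is hypothetical, the input shape reuses the `source`, `value`, and `vector` keys from the example, and how the builder actually assigns `confidence` is not documented here, so it is passed in as a parameter.

```python
from typing import Optional


def detect_severity_mismatch(observations: list[dict], detected_at: str,
                             confidence: str = "medium") -> Optional[dict]:
    """Return a conflict entry when base scores differ, otherwise None."""
    scores = {obs["value"] for obs in observations}
    if len(scores) < 2:
        return None  # every source agrees, so there is nothing to record
    return {
        "type": "severity-mismatch",
        "field": "cvss.baseScore",
        # Copy the entries so the source observations are never mutated.
        "observations": [dict(obs) for obs in observations],
        "confidence": confidence,
        "detectedAt": detected_at,
    }
```

The function records the disagreement without choosing a winner, which leaves resolution to the Policy Engine.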
---
## 4. AOC alignment
Observations and linksets must satisfy Aggregation-Only Contract invariants:
- **No derived severity**: `content.raw` may include upstream severity, but the
  observation body never injects or edits severity.
- **No merges**: Each upstream document stays separate; linksets reference
  observations via deterministic IDs.
- **Provenance mandatory**: Missing `signature` or `source` metadata is an AOC
  violation (`ERR_AOC_004`).
- **Idempotent writes**: Duplicate `contentHash` yields a no-op; supersedes
  pointer captures new revisions.
- **Deterministic output**: Linkset builder sorts keys, normalizes timestamps
  (UTC ISO-8601), and uses canonical JSON hashing.
Violations trigger guard errors (`ERR_AOC_00x`), emit `aoc_violation_total`
metrics, and block persistence until corrected.
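
The deterministic-output requirement (sorted keys, UTC ISO-8601 timestamps, canonical JSON hashing) can be illustrated as follows. The function names are hypothetical, and this sketch uses Python's `json.dumps` with sorted keys as a stand-in; the exact canonical JSON scheme used by the linkset builder is not specified in this guide.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def normalize_timestamp(value: datetime) -> str:
    """Render an aware datetime as an ISO 8601 UTC string with a 'Z' suffix."""
    if value.tzinfo is None:
        raise ValueError("timestamps must carry a timezone")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_json(doc: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def canonical_hash(doc: dict) -> str:
    """Hash the canonical form so key order never changes the digest."""
    return "sha256:" + hashlib.sha256(canonical_json(doc)).hexdigest()


# A local time two hours ahead of UTC normalizes to the same instant in UTC.
local = datetime(2025, 10, 27, 16, 0, tzinfo=timezone(timedelta(hours=2)))
print(normalize_timestamp(local))  # 2025-10-27T14:00:00Z
```

Two documents that differ only in key order hash to the same value under this scheme, which is the property the determinism checks rely on.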
---
## 5. Downstream consumption
- **Policy Engine**: Computes effective severity and risk overlays from linkset
  evidence and conflicts.
- **Console UI**: Renders per-source statements, signed hashes, and conflict
  banners inside the evidence panel.
- **CLI (`stella advisories linkset …`)**: Exports observations and linksets as
  JSON or OSV for offline triage.
- **Offline Kit**: Shipping snapshots include observation and linkset
  collections for air-gap parity.
- **Observability**: Dashboards track ingestion latency, conflict counts, and
  supersedes depth.
When adding new consumers, ensure they honour append-only semantics and do not
mutate observation or linkset collections.
---
## 6. Validation & testing
- **Unit tests** (`StellaOps.Concelier.Core.Tests`) validate schema guards,
deterministic linkset hashing, conflict detection fixtures, and supersedes
chains.
- **PostgreSQL integration tests** (`StellaOps.Concelier.Storage.Postgres.Tests`) verify
indexes and idempotent writes under concurrency.
- **CLI smoke suites** confirm `stella advisories observations` and `stella
advisories linksets` export stable JSON.
- **Determinism checks** replay identical upstream payloads and assert that the
resulting observation and linkset documents match byte for byte.
- **Offline kit verification** simulates air-gapped bootstrap to confirm that
snapshots align with live data.
Add fixtures whenever a new conflict type or correlation signal is introduced.
Ensure canonical JSON serialization remains stable across .NET runtime updates.
---
## 7. Reviewer checklist
- Observation schema segment matches the latest `StellaOps.Concelier.Models`
contract.
- Linkset lifecycle covers correlation signals, conflict classes, and
deterministic IDs.
- AOC invariants are explicitly called out with violation codes.
- Examples include multi-source correlation plus conflict annotation.
- Downstream consumer guidance reflects active APIs and CLI features.
- Testing section lists required suites (Core, Storage, CLI, Offline).
- Imposed rule reminder is present at the top of the document.
Confirmed against Concelier Link-Not-Merge tasks:
`CONCELIER-LNM-21-001..005`, `CONCELIER-LNM-21-101..103`,
`CONCELIER-LNM-21-201..203`.