
Here's the full write-up you can drop into your repo as the canonical reference for Epic 1. It's written in clean product-doc style so it's safe to check in as Markdown. No fluff, just everything you need to build, test, and police it.
---
# Epic 1: Aggregation-Only Contract (AOC) Enforcement
> Short name: **AOC enforcement**
> Services touched: **Conseiller (advisory ingestion), Excitator (VEX ingestion), Web API, Workers, Policy Engine, CLI, Console, Authority**
> Data stores: **MongoDB primary, optional Redis/NATS for jobs**
---
## 1) What it is
**Aggregation-Only Contract (AOC)** is the ingestion covenant for StellaOps. It defines a hard boundary between **collection** and **interpretation**:
* **Ingestion (Conseiller/Excitator)** only **collects** data and preserves it as immutable raw facts with provenance. It does not decide, merge, normalize, prioritize, or assign severity. It may compute **links** that help future joins (aliases, PURLs, CPEs), but never derived judgments.
* **Policy evaluation** is the only place where merges, deduplication, consensus, severity computation, and status folding are allowed. It is reproducible and traceable.
The AOC establishes:
* **Immutable raw stores**: `advisory_raw` and `vex_raw` documents with full provenance, signatures, checksums, and upstream identifiers.
* **Linksets**: machine-generated join hints (aliases, PURLs, CPEs, CVE/GHSA IDs) that never change the underlying source content.
* **Invariants**: a strict set of “never do this in ingestion” rules enforced by schema validation, runtime guards, and CI checks.
* **AOC Verifier**: a build-time and runtime watchdog that blocks non-compliant code and data writes.
This epic delivers: schemas, guards, error codes, APIs, tests, migration, docs, and ops dashboards to make AOC non-negotiable across the platform.
---
## 2) Why
AOC makes results **auditable, deterministic, and organization-specific**. Source vendors disagree; your policies decide. By removing hidden heuristics from ingestion, we avoid unexplainable risk changes, race conditions between collectors, and vendor bias. Policy-time evaluation yields reproducible deltas with complete “why” traces.
---
## 3) How it should work (deep details)
### 3.1 Core invariants
The following must be true for every write to `advisory_raw` and `vex_raw` and for every ingestion pipeline:
1. **No severity in ingestion**
   * Forbidden fields: `severity`, `cvss`, `cvss_vector`, `effective_status`, `effective_range`, `merged_from`, `consensus_provider`, `reachability`, `asset_criticality`, `risk_score`.
2. **No merges or dedups in ingestion**
   * No combining two upstream advisories into one. No picking a single truth when multiple VEX statements exist.
3. **Provenance is mandatory**
   * Every raw doc includes provenance (`source` and `upstream`) plus a signature or checksum (`upstream.signature`, `upstream.content_hash`).
4. **Idempotent upserts**
   * Same upstream document (by `upstream_id` + `source` + `content_hash`) must not create duplicates.
5. **Append-only versioning**
   * Revisions from the source create new immutable documents with `supersedes` pointers; no in-place edits.
6. **Linkset only**
   * Ingestion can compute and store a `linkset` for join performance. Linkset does not alter or infer severity/status.
7. **Policy-time only for effective findings**
   * Only the Policy Engine can write `effective_finding_*` materializations.
8. **Schema safety**
   * Strict JSON schema validation at DB level; writes containing unknown fields are rejected.
9. **Clock discipline**
   * Timestamps are UTC, monotonic within a batch; collectors record `fetched_at` and `received_at`.
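
Invariant 1 can be illustrated with a minimal guard sketch. The key list is copied from the invariant above; the names `AOC_FORBIDDEN_KEYS`, `AocViolation`, and `assert_no_forbidden_keys` are hypothetical and chosen only for this example, and the real services may be written in a different language.

```python
from typing import Any, Mapping

# Forbidden top-level keys, copied from invariant 1.
AOC_FORBIDDEN_KEYS = frozenset({
    "severity", "cvss", "cvss_vector", "effective_status", "effective_range",
    "merged_from", "consensus_provider", "reachability", "asset_criticality",
    "risk_score",
})


class AocViolation(Exception):
    """Hypothetical exception type carrying an AOC error code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def assert_no_forbidden_keys(doc: Mapping[str, Any]) -> None:
    """Raise if a raw document carries a forbidden key at the top level.

    Only top-level keys are inspected, so upstream CVSS or severity data
    nested inside content.raw passes through untouched.
    """
    found = sorted(AOC_FORBIDDEN_KEYS.intersection(doc))
    if found:
        raise AocViolation("ERR_AOC_001", f"forbidden fields present: {found}")
```

A document such as `{"source": {}, "content": {"raw": {"severity": "HIGH"}}}` passes this check, while `{"source": {}, "cvss": 9.8}` is rejected with `ERR_AOC_001`.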
### 3.2 Data model
#### 3.2.1 `advisory_raw` (Mongo collection)
```json
{
  "_id": "advisory_raw:osv:GHSA-xxxx-....:v3",
  "source": {
    "vendor": "OSV",
    "stream": "github",
    "api": "https://api.osv.dev/v1/.../GHSA-...",
    "collector_version": "conseiller/1.7.3"
  },
  "upstream": {
    "upstream_id": "GHSA-xxxx-....",
    "document_version": "2024-09-01T12:13:14Z",
    "fetched_at": "2025-01-02T03:04:05Z",
    "received_at": "2025-01-02T03:04:06Z",
    "content_hash": "sha256:...",
    "signature": {
      "present": true,
      "format": "dsse",
      "key_id": "rekor:.../key/abc",
      "sig": "base64..."
    }
  },
  "content": {
    "format": "OSV",
    "spec_version": "1.6",
    "raw": { /* full upstream JSON, unmodified */ }
  },
  "identifiers": {
    "cve": ["CVE-2023-1234"],
    "ghsa": ["GHSA-xxxx-...."],
    "aliases": ["CVE-2023-1234", "GHSA-xxxx-...."]
  },
  "linkset": {
    "purls": ["pkg:npm/lodash@4.17.21", "pkg:maven/..."],
    "cpes": ["cpe:2.3:a:..."],
    "references": [
      {"type":"advisory","url":"https://..."},
      {"type":"fix","url":"https://..."}
    ],
    "reconciled_from": ["content.raw.affected.ranges", "content.raw.pkg"]
  },
  "supersedes": "advisory_raw:osv:GHSA-xxxx-....:v2",
  "tenant": "default"
}
```
> Note: No `severity`, no `cvss`, no `effective_*`. If the upstream payload includes CVSS, it stays inside `content.raw` and is not promoted or normalized at ingestion.
#### 3.2.2 `vex_raw` (Mongo collection)
```json
{
  "_id": "vex_raw:vendorX:doc-123:v4",
  "source": {
    "vendor": "VendorX",
    "stream": "vex",
    "api": "https://.../vex/doc-123",
    "collector_version": "excitator/0.9.2"
  },
  "upstream": {
    "upstream_id": "doc-123",
    "document_version": "2025-01-15T08:09:10Z",
    "fetched_at": "2025-01-16T01:02:03Z",
    "received_at": "2025-01-16T01:02:03Z",
    "content_hash": "sha256:...",
    "signature": { "present": true, "format": "cms", "key_id": "kid:...", "sig": "..." }
  },
  "content": {
    "format": "CycloneDX-VEX", // or "CSAF-VEX"
    "spec_version": "1.5",
    "raw": { /* full upstream VEX */ }
  },
  "identifiers": {
    "statements": [
      {
        "advisory_ids": ["CVE-2023-1234","GHSA-..."],
        "component_purls": ["pkg:deb/openssl@1.1.1"],
        "status": "not_affected",
        "justification": "component_not_present"
      }
    ]
  },
  "linkset": {
    "purls": ["pkg:deb/openssl@1.1.1"],
    "cves": ["CVE-2023-1234"],
    "ghsas": ["GHSA-..."]
  },
  "supersedes": "vex_raw:vendorX:doc-123:v3",
  "tenant": "default"
}
```
> VEX statuses remain as raw facts. No cross-provider consensus is computed here.
### 3.3 Database validation
* MongoDB JSON Schema validators on both collections:
  * Reject forbidden fields at the top level.
  * Enforce presence of `source`, `upstream`, `content`, `linkset`, `tenant`.
  * Enforce string formats for timestamps and hashes.
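
As a rough illustration, a `$jsonSchema` validator for `advisory_raw` could look like the sketch below, expressed as a Python dict. The two regex patterns and the choice of which `upstream` fields are required are assumptions; the text above only states that timestamp and hash formats are enforced. Rejection of forbidden fields falls out of `additionalProperties: false`, since none of them appear in `properties`.

```python
# Assumed string formats (not specified in this document).
ISO_UTC_PATTERN = r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$"
SHA256_PATTERN = r"^sha256:[0-9a-f]{64}$"

ADVISORY_RAW_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["source", "upstream", "content", "linkset", "tenant"],
        # Any top-level key not listed under "properties" is rejected.
        # "_id" must be listed explicitly, otherwise every insert would fail.
        "additionalProperties": False,
        "properties": {
            "_id": {"bsonType": "string"},
            "source": {"bsonType": "object"},
            "upstream": {
                "bsonType": "object",
                "required": ["upstream_id", "fetched_at", "received_at", "content_hash"],
                "properties": {
                    "upstream_id": {"bsonType": "string"},
                    "fetched_at": {"bsonType": "string", "pattern": ISO_UTC_PATTERN},
                    "received_at": {"bsonType": "string", "pattern": ISO_UTC_PATTERN},
                    "content_hash": {"bsonType": "string", "pattern": SHA256_PATTERN},
                },
            },
            "content": {"bsonType": "object"},
            "identifiers": {"bsonType": "object"},
            "linkset": {"bsonType": "object"},
            "supersedes": {"bsonType": "string"},
            "tenant": {"bsonType": "string"},
        },
    }
}

# With a pymongo Database handle `db`, the validator could be applied with:
#   db.command("collMod", "advisory_raw", validator=ADVISORY_RAW_VALIDATOR,
#              validationLevel="strict", validationAction="error")
```

A `vex_raw` validator would follow the same shape with its own `identifiers` and `linkset` sub-schemas.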
### 3.4 Write paths
1. **Collector fetches upstream**
   * Normalize transport (gzip/json), compute `content_hash`, verify signature if available.
2. **Build raw doc**
   * Populate `source`, `upstream`, `content.raw`, `identifiers`, `linkset`.
3. **Idempotent upsert**
   * Lookup by `(source.vendor, upstream.upstream_id, upstream.content_hash)`. If exists, skip; if new content hash, insert new revision with `supersedes`.
4. **AOC guard**
   * Runtime interceptor inspects write payload; if any forbidden field detected, reject with `ERR_AOC_001`.
5. **Metrics**
   * Emit `ingestion_write_ok` or `ingestion_write_reject` with reason code.
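
The idempotent upsert in step 3 can be sketched with an in-memory stand-in for the collection. `RawStore` and `content_hash` are hypothetical names, the `vN` id suffix simply mirrors the example `_id` values in section 3.2, and hashing the decoded payload bytes is an assumption about what `content_hash` covers.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Tuple


def content_hash(payload: bytes) -> str:
    """Hash the upstream payload bytes (assumed: after transport decoding)."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


@dataclass
class RawStore:
    """In-memory stand-in for advisory_raw, for illustration only."""

    docs: Dict[str, dict] = field(default_factory=dict)
    by_key: Dict[Tuple[str, str, str], str] = field(default_factory=dict)
    latest: Dict[Tuple[str, str], str] = field(default_factory=dict)
    versions: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def upsert(self, vendor: str, upstream_id: str, payload: bytes) -> Tuple[str, bool]:
        """Return (document id, inserted). Identical content is skipped."""
        digest = content_hash(payload)
        key = (vendor, upstream_id, digest)
        if key in self.by_key:
            return self.by_key[key], False  # same document seen before: no duplicate

        stream = (vendor, upstream_id)
        version = self.versions.get(stream, 0) + 1
        doc_id = f"advisory_raw:{vendor}:{upstream_id}:v{version}"
        doc = {
            "_id": doc_id,
            "source": {"vendor": vendor},
            "upstream": {"upstream_id": upstream_id, "content_hash": digest},
        }
        if stream in self.latest:
            # New content hash for a known upstream doc: append a revision
            # that points at the previous one instead of editing in place.
            doc["supersedes"] = self.latest[stream]

        self.docs[doc_id] = doc
        self.by_key[key] = doc_id
        self.latest[stream] = doc_id
        self.versions[stream] = version
        return doc_id, True
```

In a real deployment the duplicate check would more likely rest on the unique index described in section 3.8 than on an application-side lookup; the sketch only shows the intended outcomes (skip, insert, insert with `supersedes`).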
### 3.5 Read paths (ingestion scope)
* Allow only listing, getting raw docs, and searching by linkset. No endpoints return “effective findings” from ingestion services.
### 3.6 Error codes
| Code | Meaning | HTTP |
| ------------- | ------------------------------------------------------------ | ---- |
| `ERR_AOC_001` | Forbidden field present (severity/consensus/normalized data) | 400 |
| `ERR_AOC_002` | Merge attempt detected (multiple upstreams fused) | 400 |
| `ERR_AOC_003` | Idempotency violation (duplicate without supersedes) | 409 |
| `ERR_AOC_004` | Missing provenance fields | 422 |
| `ERR_AOC_005` | Signature/checksum mismatch | 422 |
| `ERR_AOC_006` | Attempt to write effective findings from ingestion context | 403 |
| `ERR_AOC_007` | Unknown top-level fields (schema violation)                  | 400  |
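
One way to keep the codes and HTTP statuses from the table in a single place is an enumeration, sketched below. The member names and the `from_code` helper are hypothetical; the code strings and status values come from the table.

```python
from enum import Enum
from http import HTTPStatus


class AocError(Enum):
    """AOC error codes paired with the HTTP status from the table above."""

    FORBIDDEN_FIELD = ("ERR_AOC_001", HTTPStatus.BAD_REQUEST)
    MERGE_ATTEMPT = ("ERR_AOC_002", HTTPStatus.BAD_REQUEST)
    IDEMPOTENCY_VIOLATION = ("ERR_AOC_003", HTTPStatus.CONFLICT)
    MISSING_PROVENANCE = ("ERR_AOC_004", HTTPStatus.UNPROCESSABLE_ENTITY)
    SIGNATURE_MISMATCH = ("ERR_AOC_005", HTTPStatus.UNPROCESSABLE_ENTITY)
    EFFECTIVE_WRITE_FROM_INGESTION = ("ERR_AOC_006", HTTPStatus.FORBIDDEN)
    UNKNOWN_FIELD = ("ERR_AOC_007", HTTPStatus.BAD_REQUEST)

    @property
    def code(self) -> str:
        return self.value[0]

    @property
    def http_status(self) -> HTTPStatus:
        return self.value[1]

    @classmethod
    def from_code(cls, code: str) -> "AocError":
        """Look up a member by its ERR_AOC_00x string."""
        for member in cls:
            if member.code == code:
                return member
        raise KeyError(code)
```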
### 3.7 AOC Verifier
A build-time and runtime safeguard:
* **Static checks (CI)**
  * Block imports of `*.Policy*` or `*.Merge*` from ingestion modules.
  * AST lint rule: any write to `advisory_raw` or `vex_raw` setting a forbidden key fails the build.
* **Runtime checks**
  * Repository layer interceptor inspects documents before insert/update; rejects forbidden fields and multi-source merges.
* **Drift detection job**
  * Nightly job scans newest N docs; if violation found, pages ops and blocks new pipeline runs.
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
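
The import-blocking static check can be approximated with a small text scan, sketched here under the assumption that ingestion modules are C# sources using `using` directives. The glob patterns are the ones named in the static checks above; `blocked_imports` is a hypothetical helper, and a production lint would more likely use a real analyzer than a regex.

```python
import re
from fnmatch import fnmatchcase
from typing import List

# Namespace globs that ingestion code must not import.
BLOCKED_PATTERNS = ("*.Policy*", "*.Merge*")

# Matches `using X.Y;`, `using static X.Y;` and `using Alias = X.Y;`.
USING_RE = re.compile(
    r"^\s*using\s+(?:static\s+)?(?:\w+\s*=\s*)?([\w.]+)\s*;",
    re.MULTILINE,
)


def blocked_imports(source_text: str) -> List[str]:
    """Return imported namespaces that match a blocked pattern, in file order."""
    return [
        namespace
        for namespace in USING_RE.findall(source_text)
        if any(fnmatchcase(namespace, pattern) for pattern in BLOCKED_PATTERNS)
    ]
```

A CI step could run this over every file in the ingestion projects and fail the build when the returned list is non-empty.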
### 3.8 Indexing strategy
* `advisory_raw`:
  * `{ "identifiers.cve": 1 }`, `{ "identifiers.ghsa": 1 }`, `{ "linkset.purls": 1 }`, `{ "source.vendor": 1, "upstream.upstream_id": 1, "upstream.content_hash": 1 }` (unique), `{ "tenant": 1 }`.
* `vex_raw`:
  * `{ "identifiers.statements.advisory_ids": 1 }`, `{ "linkset.purls": 1 }`, `{ "source.vendor": 1, "upstream.upstream_id": 1, "upstream.content_hash": 1 }` (unique), `{ "tenant": 1 }`.
### 3.9 Interaction with Policy Engine
* Policy Engine pulls raw docs by identifiers/linksets and computes:
  * Dedup/merge per policy
  * Consensus for VEX statements
  * Severity normalization and risk scoring
* Writes **only** to `effective_finding_{policyId}` collections.

A dedicated write guard refuses `effective_finding_*` writes from any caller that isn't the Policy Engine service identity.
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
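
A minimal sketch of that write guard follows. The identity string `"policy-engine"` and the function name `authorize_write` are assumptions; how the caller identity is established (for example via Authority tokens) is outside the scope of the sketch.

```python
# Hypothetical service identity for the Policy Engine.
POLICY_ENGINE_IDENTITY = "policy-engine"
EFFECTIVE_PREFIX = "effective_finding_"


def authorize_write(collection: str, caller_identity: str) -> None:
    """Refuse effective_finding_* writes from anyone but the Policy Engine."""
    if collection.startswith(EFFECTIVE_PREFIX) and caller_identity != POLICY_ENGINE_IDENTITY:
        raise PermissionError(
            f"ERR_AOC_006: {caller_identity!r} may not write to {collection!r}"
        )
```

Writes to `advisory_raw` or `vex_raw` pass through this particular check unchanged; they are covered by the ingestion-side guards instead.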
### 3.10 Migration plan
1. **Freeze ingestion writes** except raw passthrough.
2. **Backfill**: copy existing ingestion collections to `_backup_*`.
3. **Strip forbidden fields** from raw copies, move them into a temporary `advisory_view_legacy` used only by Policy Engine for parity.
4. **Enable DB schema validators**.
5. **Run collectors** in dry-run; ensure only allowed keys land.
6. **Switch Policy Engine** to pull exclusively from `*_raw` and to compute everything else.
7. **Delete legacy normalized fields** in ingestion codepaths.
8. **Enable runtime guards** and CI lint.
### 3.11 Observability
* Metrics:
  * `aoc_violation_total{code=...}`, `ingestion_write_total{result=ok|reject}`, `ingestion_signature_verified_total{result=ok|fail}`, `ingestion_latency_seconds`, `advisory_revision_count`.
* Tracing: span `ingest.fetch`, `ingest.transform`, `ingest.write`, `aoc.guard`.
* Logs: include `tenant`, `source.vendor`, `upstream.upstream_id`, `content_hash`, `correlation_id`.
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
### 3.12 Security and tenancy
* Every raw doc carries a `tenant` field.
* Authority enforces `advisory:write` and `vex:write` scopes for ingestion endpoints.
* Cross-tenant reads/writes are blocked by default.
* Secrets never logged; signatures verified with pinned trust stores.
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
### 3.13 CLI and Console behavior
* **CLI**
  * `stella sources ingest --dry-run` prints the would-write payload and explicitly shows that no severity/status fields are present.
  * `stella aoc verify` scans last K documents and reports violations with exit codes.
* **Console**
  * Sources dashboard shows AOC pass/fail per job, most recent violation codes, and a drill-down to the offending document.
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
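
The reporting half of `stella aoc verify` might be shaped like the sketch below. The exit-code scheme (0 when clean, otherwise the numeric suffix of the lowest violation code) and the JSON field names are assumptions made for illustration; this document does not fix either.

```python
import json
from typing import Dict, List, Tuple


def verify_report(violations: List[Dict[str, str]]) -> Tuple[int, str]:
    """Build (exit code, JSON body) from a list of {"id", "code"} violations."""
    codes = sorted({violation["code"] for violation in violations})
    # Assumed mapping: ERR_AOC_004 -> exit code 4; no violations -> 0.
    exit_code = int(codes[0].rsplit("_", 1)[1]) if codes else 0
    body = json.dumps(
        {
            "violation_count": len(violations),
            "codes": codes,
            "violations": violations,
        },
        sort_keys=True,
    )
    return exit_code, body
```

A CI stage can then fail on any non-zero exit code and archive the JSON body as the violation list.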
---
## 4) API surface (ingestion scope)
### 4.1 Conseiller (Advisories)
* `POST /ingest/advisory`
  * Body: raw upstream advisory with metadata; the server constructs the document, not the client.
  * Rejections: `ERR_AOC_00x` per the error code table in section 3.6.
* `GET /advisories/raw/{id}`
* `GET /advisories/raw?cve=CVE-...&purl=pkg:...&tenant=...`
* `GET /advisories/raw/{id}/provenance`
* `POST /aoc/verify?since=ISO8601` returns summary stats and first N violations.
### 4.2 Excitator (VEX)
* `POST /ingest/vex`
* `GET /vex/raw/{id}`
* `GET /vex/raw?advisory_id=CVE-...&purl=pkg:...`
* `POST /aoc/verify?since=ISO8601`
All endpoints require `tenant` scope and appropriate `:write` or `:read`.
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
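
Because PURLs contain `:`, `/`, and `@`, clients need to percent-encode query values when calling the search endpoints. The helper below is a hypothetical client-side sketch for `GET /advisories/raw`; the base URL is a placeholder and authentication is omitted.

```python
from typing import Optional
from urllib.parse import urlencode


def advisories_raw_url(
    base_url: str,
    *,
    tenant: str,
    cve: Optional[str] = None,
    purl: Optional[str] = None,
) -> str:
    """Build a GET /advisories/raw search URL with encoded query parameters."""
    candidates = (("cve", cve), ("purl", purl), ("tenant", tenant))
    params = {name: value for name, value in candidates if value}
    return f"{base_url.rstrip('/')}/advisories/raw?{urlencode(params)}"
```

For example, `advisories_raw_url("https://conseiller.example", tenant="default", purl="pkg:npm/lodash@4.17.21")` yields a URL whose `purl` parameter is safely encoded.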
---
## 5) Example: end-to-end flow
1. Collector fetches `GHSA-1234` from OSV.
2. Build `advisory_raw` with linkset PURLs.
3. Insert; AOC guard approves.
4. Policy Engine later evaluates SBOM `S-42` under `policy P-7`, reads raw advisory and any VEX raw docs, computes effective findings, and writes to `effective_finding_P-7`.
5. CLI `stella aoc verify --since 24h` returns `0` violations.
---
## 6) Implementation tasks
Breakdown by component with exact work items. Each section ends with the imposed sentence you requested.
### 6.1 Conseiller (advisory ingestion, WS + Worker)
* [ ] Add Mongo JSON schema validation for `advisory_raw`.
* [ ] Implement repository layer with **write interceptors** that reject forbidden fields.
* [ ] Compute `linkset` from upstream using deterministic mappers.
* [ ] Enforce idempotency by unique index on `(source.vendor, upstream.upstream_id, upstream.content_hash, tenant)`.
* [ ] Remove any normalization pipelines; relocate to Policy Engine.
* [ ] Add `POST /ingest/advisory` and `GET /advisories/raw*` endpoints with Authority scope checks.
* [ ] Emit observability metrics and traces.
* [ ] Unit tests: schema violations, idempotency, supersedes chain, forbidden fields.
* [ ] Integration tests: large batch ingest, linkset correctness against golden fixtures.
**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
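
To make "deterministic mappers" concrete, here is a sketch of a linkset mapper for OSV input. It relies on the OSV fields `affected[].package.purl` and `references[]` (`type`, `url`); lowercasing the reference type to match the example in section 3.2.1 is an assumption, and `build_linkset` is a hypothetical name. Sorting and de-duplicating make the output independent of upstream ordering.

```python
from typing import Any, Dict, List, Mapping


def build_linkset(osv: Mapping[str, Any]) -> Dict[str, List[Any]]:
    """Extract join hints from an OSV document without touching severity."""
    purls = sorted({
        affected["package"]["purl"]
        for affected in osv.get("affected", [])
        if affected.get("package", {}).get("purl")
    })
    reference_pairs = sorted({
        (ref["type"].lower(), ref["url"])
        for ref in osv.get("references", [])
        if ref.get("type") and ref.get("url")
    })
    return {
        "purls": purls,
        "references": [{"type": t, "url": u} for t, u in reference_pairs],
    }
```

Running the mapper twice on the same upstream document, or on the same document with its arrays shuffled, produces an identical linkset, which is what golden-fixture tests can assert.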
### 6.2 Excitator (VEX ingestion, WS + Worker)
* [ ] Add Mongo JSON schema validation for `vex_raw`.
* [ ] Implement repository layer guard identical to Conseiller.
* [ ] Deterministic `linkset` extraction for advisory IDs and PURLs.
* [ ] Endpoints `POST /ingest/vex`, `GET /vex/raw*` with scopes.
* [ ] Remove any consensus or folding logic; leave VEX statements as raw.
* [ ] Tests as per Conseiller, with rich fixtures for CycloneDX-VEX and CSAF.
**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
### 6.3 Web API shared library
* [ ] Define `AOCForbiddenKeys` and export for both services.
* [ ] Provide `AOCWriteGuard` middleware and `AOCError` types.
* [ ] Provide `ProvenanceBuilder` utility.
* [ ] Provide `SignatureVerifier` and `Checksum` helpers.
**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
### 6.4 Policy Engine
* [ ] Block any import/use from ingestion modules by lint rule.
* [ ] Add hard gate on `effective_finding_*` writes that verifies caller identity is Policy Engine.
* [ ] Update readers to pull fields only from `content.raw`, `identifiers`, `linkset`, not any legacy normalized fields.
**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
### 6.5 Authority
* [ ] Introduce scopes: `advisory:write`, `advisory:read`, `vex:write`, `vex:read`, `aoc:verify`.
* [ ] Add `tenant` claim propagation to ingestion services.
**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
### 6.6 CLI
* [ ] `stella sources ingest --dry-run` and `stella aoc verify` commands.
* [ ] Exit codes mapping to `ERR_AOC_00x`.
* [ ] JSON output schema including violation list.
**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
### 6.7 Console
* [ ] Sources dashboard tiles: last run, AOC violations, top error codes.
* [ ] Drill-down page rendering offending doc with highlight on forbidden keys.
* [ ] “Verify last 24h” action calling the AOC Verifier endpoint.
**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
### 6.8 CI/CD
* [ ] AST linter to forbid writes of banned keys in ingestion modules.
* [ ] Unit test coverage gates for AOC guard code.
* [ ] Pipeline stage that runs `stella aoc verify` against seeded DB snapshots.
**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
---
## 7) Documentation changes (create/update these files)
1. **`/docs/ingestion/aggregation-only-contract.md`**
   * Add: philosophy, invariants, schemas for `advisory_raw`/`vex_raw`, error codes, linkset definition, examples, idempotency rules, supersedes, API references, migration steps, observability, security.
2. **`/docs/architecture/overview.md`**
   * Update system diagram to show AOC boundary and raw stores; add sequence diagram: fetch → guard → raw insert → policy evaluation.
3. **`/docs/architecture/policy-engine.md`**
   * Clarify ingestion boundary; list inputs consumed from raw; note that any severity/consensus is policy-time only.
4. **`/docs/ui/console.md`**
   * Add Sources dashboard section: AOC tiles and violation drill-down.
5. **`/docs/cli/cli-reference.md`**
   * Add `stella aoc verify` and `stella sources ingest --dry-run` usage and exit codes.
6. **`/docs/observability/observability.md`**
   * Document new metrics, traces, logs keys for AOC.
7. **`/docs/security/authority-scopes.md`**
   * Add new scopes and tenancy enforcement for ingestion endpoints.
8. **`/docs/deploy/containers.md`**
   * Note DB validators must be enabled; environment flags for AOC guards; read-only user for verify endpoint.
Each file should include a “Compliance checklist” subsection for AOC.
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
---
## 8) Acceptance criteria
* DB validators are active and reject writes with forbidden fields.
* AOC runtime guards log and reject violations with correct error codes.
* CI linter prevents shipping code that writes forbidden keys to raw stores.
* Ingestion of known fixture sets results in zero normalized fields outside `content.raw`.
* Policy Engine is the only writer of `effective_finding_*` materializations.
* CLI `stella aoc verify` returns success on clean datasets and nonzero on seeded violations.
* Console shows AOC status and violation drill-downs.
---
## 9) Risks and mitigations
* **Collector drift**: new upstream fields tempt developers to normalize.
  * Mitigation: CI linter + guard + schema validators; require RFC to extend linkset.
* **Performance impact**: extra validation on write.
  * Mitigation: guard is O(number of keys) and schema check is bounded; indexes sized appropriately.
* **Migration complexity**: moving legacy normalized fields out.
  * Mitigation: temporary `advisory_view_legacy` for parity; stepwise cutover.
* **Tenant leakage**: missing tenant on write.
  * Mitigation: schema requires `tenant`; middleware injects and asserts.
---
## 10) Test plan
* **Unit tests**
  * Guard rejects forbidden keys; idempotency; supersedes chain; provenance required.
  * Signature verification paths: good, bad, absent.
* **Property tests**
  * Randomized upstream docs never produce forbidden keys at top level.
* **Integration tests**
  * Batch ingest of 50k advisories: throughput, zero violations.
  * Mixed VEX sources with contradictory statements remain separate in raw.
* **Contract tests**
  * Policy Engine refuses to run without raw inputs; writes only to `effective_finding_*`.
* **End-to-end**
  * Seed SBOM + advisories + VEX; ensure findings are identical pre/post migration.
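
The property test in the plan above could take roughly the following form, shown with the standard library's `random` module rather than a property-testing framework. `build_raw_doc` is a hypothetical stand-in for the real document builder; the property under test is that upstream keys, even forbidden-looking ones, only ever land under `content.raw`.

```python
import random
import string

FORBIDDEN = frozenset({
    "severity", "cvss", "cvss_vector", "effective_status", "effective_range",
    "merged_from", "consensus_provider", "reachability", "asset_criticality",
    "risk_score",
})


def build_raw_doc(upstream: dict, tenant: str = "default") -> dict:
    """Hypothetical builder: upstream content is nested, never promoted."""
    return {
        "source": {"vendor": "example"},
        "upstream": {"upstream_id": str(upstream.get("id", ""))},
        "content": {"raw": upstream},
        "linkset": {},
        "tenant": tenant,
    }


def random_upstream(rng: random.Random) -> dict:
    """Generate an upstream doc mixing forbidden-looking and random keys."""
    keys = rng.sample(sorted(FORBIDDEN), k=rng.randint(0, 3))
    keys += [
        "".join(rng.choices(string.ascii_lowercase, k=6))
        for _ in range(rng.randint(0, 4))
    ]
    return {key: rng.random() for key in keys}


def check_property(trials: int = 500, seed: int = 0) -> int:
    """Assert the top-level invariant over many random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        doc = build_raw_doc(random_upstream(rng))
        assert FORBIDDEN.isdisjoint(doc), doc
    return trials
```

Seeding the generator keeps failures reproducible, which matters when the test runs as a CI gate.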
---
## 11) Developer checklists
**Definition of Ready**
* Upstream spec reference attached.
* Linkset mappers defined.
* Example fixtures added.
**Definition of Done**
* DB validators deployed and tested.
* Runtime guards enabled.
* CI linter merged and enforced.
* Docs updated (files in section 7).
* Metrics visible on dashboard.
* CLI verify passes.
---
## 12) Glossary
* **Raw document**: exact upstream content plus provenance, with join hints.
* **Linkset**: PURLs/CPEs/IDs extracted to accelerate joins later.
* **Supersedes**: pointer from a newer raw doc to the previous revision of the same upstream doc.
* **Policy-time**: evaluation phase where merges, consensus, and severity are computed.
* **AOC**: Aggregation-Only Contract.
---
### Final imposed reminder
**Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.**