# Link-Not-Merge (LNM) Observation & Linkset Schema
_Frozen v1 (add-only) — approved 2025-11-17 for CONCELIER-LNM-21-001/002/101._
## Goals
- Immutable storage of raw advisory observations per source/tenant.
- Deterministic linksets built from observations without merging or mutating originals.
- Stable across online/offline deployments; replayable from raw inputs.
## Status
- Frozen v1 as of 2025-11-17; further schema changes must go through ADR + sprint gating (CONCELIER-LNM-22x+).
## Observation document (Mongo JSON Schema excerpt)
```json
{
"bsonType": "object",
"required": ["_id","tenantId","source","advisoryId","affected","provenance","ingestedAt"],
"properties": {
"_id": {"bsonType": "objectId"},
"tenantId": {"bsonType": "string"},
"source": {"bsonType": "string", "description": "Adapter id, e.g., ghsa, nvd, cert-bund"},
"advisoryId": {"bsonType": "string"},
"title": {"bsonType": "string"},
"summary": {"bsonType": "string"},
"severities": {
"bsonType": "array",
"items": {"bsonType": "object", "required": ["system","score"],
"properties": {"system":{"bsonType":"string"},"score":{"bsonType":"double"},"vector":{"bsonType":"string"}}}
},
"affected": {
"bsonType": "array",
"items": {"bsonType":"object","required":["purl"],
"properties": {
"purl": {"bsonType":"string"},
"package": {"bsonType":"string"},
"versions": {"bsonType":"array","items":{"bsonType":"string"}},
"ranges": {"bsonType":"array","items":{"bsonType":"object",
"required":["type","events"],
"properties": {"type":{"bsonType":"string"},"events":{"bsonType":"array","items":{"bsonType":"object"}}}}},
"ecosystem": {"bsonType":"string"},
"cpe": {"bsonType":"array","items":{"bsonType":"string"}},
"cpes": {"bsonType":"array","items":{"bsonType":"string"}}
}
}
},
"references": {"bsonType": "array", "items": {"bsonType":"string"}},
"scopes": {"bsonType":"array","items":{"bsonType":"string"}},
"relationships": {
"bsonType": "array",
"items": {"bsonType":"object","required":["type","source","target"],
"properties": {
"type":{"bsonType":"string"},
"source":{"bsonType":"string"},
"target":{"bsonType":"string"},
"provenance":{"bsonType":"string"}
}}
},
"weaknesses": {"bsonType":"array","items":{"bsonType":"string"}},
"published": {"bsonType": "date"},
"modified": {"bsonType": "date"},
"provenance": {
"bsonType": "object",
"required": ["sourceArtifactSha","fetchedAt"],
"properties": {
"sourceArtifactSha": {"bsonType":"string"},
"fetchedAt": {"bsonType":"date"},
"ingestJobId": {"bsonType":"string"},
"signature": {"bsonType":"object"}
}
},
"ingestedAt": {"bsonType": "date"}
}
}
```
### Observation invariants
- **Immutable:** no in-place updates; a new revision is written as a new document that may point to its predecessor through an optional `supersedesId`.
- **Deterministic keying:** `_id` derived from `hash(tenantId|source|advisoryId|provenance.sourceArtifactSha)` to keep inserts idempotent in replay.
- **Normalization guardrails:** version ranges must be stored as raw-from-source; no inferred merges.
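
The deterministic keying rule can be sketched as follows. This is a minimal illustration, not the Concelier implementation: the invariant only names the four fields that feed the hash, so the hash function (SHA-256), the UTF-8 encoding, the literal `|` separator, and the helper name `observation_key` are all assumptions.

```python
import hashlib


def observation_key(tenant_id: str, source: str, advisory_id: str, source_artifact_sha: str) -> str:
    """Derive a replay-stable key from the four fields named in the invariant.

    Assumptions (not from the spec): SHA-256, UTF-8 encoding, '|' as the
    field separator, and lowercase hex output.
    """
    material = "|".join([tenant_id, source, advisory_id, source_artifact_sha])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Illustrative values only. Replaying the same raw input yields the same key,
# so a repeated insert targets the same document.
first = observation_key("tenant-a", "ghsa", "GHSA-xxxx-yyyy-zzzz", "abc123")
replay = observation_key("tenant-a", "ghsa", "GHSA-xxxx-yyyy-zzzz", "abc123")

# A revised upstream artifact has a different sha, hence a different key.
revised = observation_key("tenant-a", "ghsa", "GHSA-xxxx-yyyy-zzzz", "def456")
```

How such a digest maps onto the `objectId` type declared for `_id` in the schema excerpt is not specified in this document, so the sketch stops at the hex string.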
## Aggregation-Only Contract (AOC): LNM-21-004
The Aggregation-Only Contract (AOC) makes observations immutable after creation: writes are append-only. `IAdvisoryObservationWriteGuard` enforces this.
### Write disposition rules
| Existing Hash | New Hash | Disposition | Action |
|--------------|----------|-------------|--------|
| null/empty | any | `Proceed` | Insert new observation |
| X | X (identical) | `SkipIdentical` | Idempotent re-insert, no write |
| X | Y (different) | `RejectMutation` | Reject with `AppendOnlyViolationException` |
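
The table above reduces to a small decision function. The sketch below is a Python stand-in for illustration only; it is not the `IAdvisoryObservationWriteGuard` code, and the names `validate_write`, `write_observation`, and `AppendOnlyViolation`, the in-memory `store`, and the use of exact string comparison for hashes are assumptions.

```python
from enum import Enum
from typing import Dict, Optional


class Disposition(Enum):
    PROCEED = "Proceed"
    SKIP_IDENTICAL = "SkipIdentical"
    REJECT_MUTATION = "RejectMutation"


class AppendOnlyViolation(Exception):
    """Hypothetical Python stand-in for AppendOnlyViolationException."""


def validate_write(new_hash: str, existing_hash: Optional[str]) -> Disposition:
    """Map (existing hash, new hash) to a disposition, row by row."""
    if not existing_hash:            # null/empty: nothing stored yet
        return Disposition.PROCEED
    if existing_hash == new_hash:    # identical content: idempotent re-insert
        return Disposition.SKIP_IDENTICAL
    return Disposition.REJECT_MUTATION   # same key, different content


def write_observation(store: Dict[str, str], key: str, new_hash: str) -> Disposition:
    """Apply the disposition against an in-memory store (key -> content hash)."""
    disposition = validate_write(new_hash, store.get(key))
    if disposition is Disposition.REJECT_MUTATION:
        raise AppendOnlyViolation(f"observation {key} exists with different content")
    if disposition is Disposition.PROCEED:
        store[key] = new_hash        # the only path that writes
    return disposition
```

Only `Proceed` performs a write; `SkipIdentical` returns without touching the store, which is what keeps replays idempotent.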
### Supersession model
When an advisory source publishes a revised version of an advisory:
1. A **new observation** is created with its own unique `observationId` and `contentHash`.
2. The new observation MAY carry a `supersedesId` pointing to the previous observation.
3. The **original observation remains immutable** — it is never updated or deleted.
4. Linksets are rebuilt to include all non-superseded observations; superseded observations remain queryable for audit but are excluded from active linkset aggregation.
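
As a rough sketch of step 4, selecting the active set amounts to dropping every observation that another observation points at. The dictionary shape and the helper name `active_observations` are assumptions made for illustration; only `observationId` and `supersedesId` come from the text above.

```python
from typing import Iterable, List, Optional, TypedDict


class Observation(TypedDict):
    observationId: str
    supersedesId: Optional[str]


def active_observations(observations: Iterable[Observation]) -> List[Observation]:
    """Return the observations that no other observation supersedes."""
    items = list(observations)
    superseded = {o["supersedesId"] for o in items if o.get("supersedesId")}
    return [o for o in items if o["observationId"] not in superseded]


# Illustrative history: obs-2 is a revision of obs-1; obs-3 is unrelated.
history: List[Observation] = [
    {"observationId": "obs-1", "supersedesId": None},
    {"observationId": "obs-2", "supersedesId": "obs-1"},
    {"observationId": "obs-3", "supersedesId": None},
]
active = active_observations(history)
```

The input list is left untouched, so `obs-1` stays available for audit queries. Because `supersedesId` is optional, a revision written without the pointer would not exclude its predecessor under this filter.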
### Implementation checklist (LNM-21-004)
- [x] `IAdvisoryObservationWriteGuard` interface with `ValidateWrite(observation, existingContentHash)` method.
- [x] `AdvisoryObservationWriteGuard` implementation enforcing append-only semantics.
- [x] `AppendOnlyViolationException` for mutation rejections.
- [x] DI registration via `AddConcelierAocGuards()` extension.
- [x] Unit tests covering Proceed/SkipIdentical/RejectMutation scenarios.
- [x] Legacy merge logic deprecated with `[Obsolete]` and gated by `NoMergeEnabled` feature flag (defaults to `true`).
- [x] Roslyn analyzer `StellaOps.Concelier.Analyzers.NoMergeApiAnalyzer` emits warnings for merge API usage.
## Linkset document
```json
{
"bsonType":"object",
"required":["_id","tenantId","advisoryId","source","observations","createdAt"],
"properties":{
"_id":{"bsonType":"objectId"},
"tenantId":{"bsonType":"string"},
"advisoryId":{"bsonType":"string"},
"source":{"bsonType":"string"},
"observations":{"bsonType":"array","items":{"bsonType":"objectId"}},
"normalized": {
"bsonType":"object",
"properties":{
"purls":{"bsonType":"array","items":{"bsonType":"string"}},
"versions":{"bsonType":"array","items":{"bsonType":"string"}},
"ranges": {"bsonType":"array","items":{"bsonType":"object"}},
"severities": {"bsonType":"array","items":{"bsonType":"object"}}
}
},
"confidence": {"bsonType":"double", "description":"Optional correlation confidence (0 to 1)"},
"conflicts": {"bsonType":"array","items":{"bsonType":"object",
"required":["field","reason"],
"properties":{
"field":{"bsonType":"string"},
"reason":{"bsonType":"string"},
"values":{"bsonType":"array","items":{"bsonType":"string"}},
"sourceIds":{"bsonType":"array","items":{"bsonType":"string"}}
}}},
"createdAt":{"bsonType":"date"},
"builtByJobId":{"bsonType":"string"},
"provenance": {"bsonType":"object","properties":{
"observationHashes":{"bsonType":"array","items":{"bsonType":"string"}},
"toolVersion" : {"bsonType":"string"},
"policyHash" : {"bsonType":"string"}
}}
}
}
```
### Linkset invariants
- Built from a set of observation IDs; never overwrites observations.
- Carries the hash list of source observations for audit/replay.
- Deterministic sort: observations sorted by `source, advisoryId, fetchedAt` before hashing.
- Conflicts are additive only and now carry optional `sourceIds[]` to trace which upstream sources produced divergent values.
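
The deterministic sort and hash list can be sketched like this. It is a minimal illustration under stated assumptions: each observation is taken to expose a `contentHash` field at the top level, and `contentHash` is also used as a final tie-breaker so that two observations with identical sort keys still order the same way regardless of input order. Neither detail is specified above.

```python
from datetime import datetime, timezone
from typing import Dict, List


def observation_hashes(observations: List[Dict]) -> List[str]:
    """Sort by (source, advisoryId, fetchedAt) and return hashes in that order.

    The trailing contentHash in the sort key is an assumed tie-breaker.
    """
    ordered = sorted(
        observations,
        key=lambda o: (
            o["source"],
            o["advisoryId"],
            o["provenance"]["fetchedAt"],
            o["contentHash"],
        ),
    )
    return [o["contentHash"] for o in ordered]


def _obs(content_hash: str, hour: int) -> Dict:
    """Build an illustrative observation with only the fields the sort needs."""
    return {
        "source": "ghsa",
        "advisoryId": "GHSA-xxxx-yyyy-zzzz",
        "contentHash": content_hash,
        "provenance": {"fetchedAt": datetime(2025, 11, 17, hour, tzinfo=timezone.utc)},
    }


older, newer = _obs("hash-b", 9), _obs("hash-a", 15)
# Input order does not matter: both calls produce the same hash list.
forward = observation_hashes([older, newer])
backward = observation_hashes([newer, older])
```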
## Indexes (Mongo)
- Observations: `{ tenantId:1, source:1, advisoryId:1, "provenance.fetchedAt":-1 }` (compound, for ingest); `{ "provenance.sourceArtifactSha":1 }`, unique, to avoid duplicate writes.
- Linksets: `{ tenantId:1, advisoryId:1, source:1 }` unique; `{ observations:1 }` sparse for reverse lookups.
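
For illustration, the same index definitions can be written as plain data and applied through a PyMongo-style `create_index(keys, **options)` call. This is a sketch, not the project's migration code (which lives in the .NET storage library); the `INDEXES` table and the `ensure_indexes` helper are hypothetical names, and the collection names are the ones listed under Collections.

```python
# Index specs as plain data: (collection, keys, options).
# Directions use 1 (ascending) and -1 (descending).
INDEXES = [
    (
        "advisory_observations",
        [("tenantId", 1), ("source", 1), ("advisoryId", 1), ("provenance.fetchedAt", -1)],
        {},
    ),
    ("advisory_observations", [("provenance.sourceArtifactSha", 1)], {"unique": True}),
    (
        "advisory_linksets",
        [("tenantId", 1), ("advisoryId", 1), ("source", 1)],
        {"unique": True},
    ),
    ("advisory_linksets", [("observations", 1)], {"sparse": True}),
]


def ensure_indexes(db) -> list:
    """Create every index and return whatever create_index returns for each.

    `db` is assumed to be indexable by collection name, with collections that
    expose create_index(keys, **options), for example a PyMongo Database.
    """
    return [db[name].create_index(keys, **options) for name, keys, options in INDEXES]
```

Keeping the specs as data makes them easy to diff against the list above when the schema is reviewed.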
## Collections
- `advisory_observations` — raw per-source docs (immutable).
- `advisory_linksets` — derived normalized aggregates with observation pointers and hashes.
## Determinism & replay
- Replay rebuild: order observations by `fetchedAt`, recompute the linkset hash list, and verify that the resulting linkset JSON is byte-identical.
- All timestamps UTC ISO-8601; no server-local time.
- String normalization: lowercase `source`, trim/normalize PURLs, stable sort arrays.
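
A minimal sketch of how these rules combine into a byte-identical serialization follows. The helper names, the choice of sorted keys with compact separators as the canonical JSON form, and the second-precision `Z`-suffixed timestamp format are assumptions; PURL handling here is limited to trimming whitespace and does not attempt full PURL normalization.

```python
import json
from datetime import datetime, timedelta, timezone


def utc_iso(ts: datetime) -> str:
    """Render a timezone-aware datetime as UTC ISO-8601 with a trailing 'Z'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def normalize_linkset(linkset: dict) -> dict:
    """Lowercase `source`, trim PURLs, sort arrays, and render createdAt in UTC."""
    out = dict(linkset)
    out["source"] = linkset["source"].lower()
    normalized = dict(linkset.get("normalized", {}))
    normalized["purls"] = sorted(p.strip() for p in normalized.get("purls", []))
    normalized["versions"] = sorted(normalized.get("versions", []))
    out["normalized"] = normalized
    out["createdAt"] = utc_iso(linkset["createdAt"])
    return out


def canonical_json(doc: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


# Two inputs that differ only in casing, whitespace, array order, and UTC offset.
online = {
    "source": "GHSA",
    "createdAt": datetime(2025, 11, 17, 12, 0, tzinfo=timezone.utc),
    "normalized": {"purls": [" pkg:npm/b@1", "pkg:npm/a@1"], "versions": ["2", "1"]},
}
offline = {
    "source": "ghsa",
    "createdAt": datetime(2025, 11, 17, 14, 0, tzinfo=timezone(timedelta(hours=2))),
    "normalized": {"purls": ["pkg:npm/a@1", "pkg:npm/b@1 "], "versions": ["1", "2"]},
}
same_bytes = canonical_json(normalize_linkset(online)) == canonical_json(normalize_linkset(offline))
```

A replay check can then compare the stored bytes with the recomputed bytes directly instead of comparing parsed structures.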
## Sample documents
See `docs/samples/lnm/observation-ghsa.json` and `docs/samples/lnm/linkset-ghsa.json` (added with this draft) for concrete payloads.
## Approval path
1) Architecture + Concelier Core review this document.
2) If accepted, freeze JSON Schema and roll into `src/Concelier/__Libraries/StellaOps.Concelier.Storage.Mongo` migrations.
3) Update consumers (policy/CLI/export) to read from linksets only; deprecate Merge endpoints.
---
Tracking: CONCELIER-LNM-21-001/002/101; Sprint 110 blockers (Concelier/Excititor waves).