This commit is contained in:
StellaOps Bot
2025-12-09 00:20:52 +02:00
parent 3d01bf9edc
commit bc0762e97d
261 changed files with 14033 additions and 4427 deletions

@@ -124,6 +124,62 @@ Excititor workers now hydrate signature metadata with issuer trust data retrieve
`GET /v1/vex/statements/{advisory_key}` produces sorted JSON responses containing raw statement metadata (`issuer`, `content_hash`, `signature`), normalised tuples, and provenance pointers. Advisory AI consumes this endpoint to build retrieval contexts with explicit citations.
### 1.5 Postgres raw store (replaces Mongo/GridFS)
> Mongo/BSON/GridFS are being removed. This is the canonical design for the Postgres-backed raw store that powers `/vex/raw` and ingestion.

Schema: `vex`

- **`vex_raw_documents`** (append-only)
  - `digest TEXT PRIMARY KEY`: `sha256:{hex}` of the canonical UTF-8 JSON bytes.
  - `tenant TEXT NOT NULL`
  - `provider_id TEXT NOT NULL`
  - `format TEXT NOT NULL CHECK (format IN ('openvex','csaf','cyclonedx','custom'))`
  - `source_uri TEXT NOT NULL`, `etag TEXT NULL`
  - `retrieved_at TIMESTAMPTZ NOT NULL`, `recorded_at TIMESTAMPTZ NOT NULL DEFAULT NOW()`
  - `supersedes_digest TEXT NULL REFERENCES vex_raw_documents(digest)`
  - `content_json JSONB NOT NULL`: canonicalised payload (truncated when the bytes are stored in `vex_raw_blobs`)
  - `content_size_bytes INT NOT NULL`
  - `metadata_json JSONB NOT NULL`: statement_id, issuer, spec_version, content_type, connector version, hashes, quarantine flags
  - `provenance_json JSONB NOT NULL`: DSSE/chain/Rekor/trust info
  - `inline_payload BOOLEAN NOT NULL DEFAULT TRUE`
  - UNIQUE (`tenant`, `provider_id`, `source_uri`, `etag`)
  - Indexes: `(tenant, retrieved_at DESC)`, `(tenant, provider_id, retrieved_at DESC)`, `(tenant, supersedes_digest)`, GIN on `metadata_json`, GIN on `provenance_json`.
- **`vex_raw_blobs`** (large payloads)
  - `digest TEXT PRIMARY KEY REFERENCES vex_raw_documents(digest) ON DELETE CASCADE`
  - `payload BYTEA NOT NULL` (canonical JSON bytes; no compression, to preserve determinism)
  - `payload_hash TEXT NOT NULL` (hash of the stored bytes)
- **`vex_raw_attachments`** (optional, future)
  - `digest TEXT REFERENCES vex_raw_documents(digest) ON DELETE CASCADE`
  - `name TEXT NOT NULL`, `media_type TEXT NOT NULL`
  - `payload BYTEA NOT NULL`, `payload_hash TEXT NOT NULL`
  - PRIMARY KEY (`digest`, `name`)
- **Observations/linksets** — use the append-only Postgres linkset schema already defined for `IAppendOnlyLinksetStore` (tables `vex_linksets`, `vex_linkset_observations`, `vex_linkset_disagreements`, `vex_linkset_mutations`) with indexes on `(tenant, vulnerability_id, product_key)` and `updated_at`.
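
Application code that builds these rows can enforce the same constraints before the database does. The following is a minimal, hypothetical Python mirror of one `vex_raw_documents` row: `VexRawDocumentRow`, `ALLOWED_FORMATS`, `DIGEST_RE` and `validate` are illustrative names that are not part of the Excititor codebase, and requiring lowercase hex in digests is an assumption.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional

# Mirrors the CHECK constraint on vex_raw_documents.format.
ALLOWED_FORMATS = frozenset({"openvex", "csaf", "cyclonedx", "custom"})

# "sha256:" followed by 64 hex characters; lowercase hex is an assumption.
DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


@dataclass(frozen=True)
class VexRawDocumentRow:
    """Hypothetical in-memory mirror of one vex.vex_raw_documents row."""

    digest: str
    tenant: str
    provider_id: str
    format: str
    source_uri: str
    retrieved_at: datetime
    content_json: dict[str, Any]
    content_size_bytes: int
    metadata_json: dict[str, Any] = field(default_factory=dict)
    provenance_json: dict[str, Any] = field(default_factory=dict)
    etag: Optional[str] = None
    supersedes_digest: Optional[str] = None
    inline_payload: bool = True

    def validate(self) -> None:
        """Raise ValueError when the row would violate a constraint from the schema."""
        if not DIGEST_RE.match(self.digest):
            raise ValueError(f"digest is not sha256:{{hex}}: {self.digest!r}")
        if self.format not in ALLOWED_FORMATS:
            raise ValueError(f"format not allowed by CHECK: {self.format!r}")
        # NOT NULL text columns; rejecting empty strings too is an assumption.
        for name in ("tenant", "provider_id", "source_uri"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")
        # supersedes_digest references another document's digest.
        if self.supersedes_digest is not None and not DIGEST_RE.match(self.supersedes_digest):
            raise ValueError("supersedes_digest must be a sha256 digest")
```

Because the table is append-only, a row that slips through with bad data cannot be corrected in place, which is one argument for rejecting it before the insert.
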

**Canonicalisation & hashing**

1. Parse the upstream JSON, sort object keys (preserving array order), normalise newlines, and encode as UTF-8 without a BOM.
2. Compute `digest = "sha256:{hex}"` over the canonical bytes.
3. If `size <= inline_threshold_bytes` (default 256 KiB), set `inline_payload=true` and store the payload in `content_json`; otherwise store the bytes in `vex_raw_blobs` and set `inline_payload=false`.
4. Persist `content_size_bytes` (pre-canonical length) and `payload_hash` for integrity checks.
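
A minimal Python sketch of these four steps follows. It is an illustration, not the store's implementation: the compact separators, `ensure_ascii=False`, and the helper names `canonical_bytes`, `digest_of` and `plan_storage` are assumptions. Because the output carries no insignificant whitespace, CRLF versus LF line endings in the upstream file cannot change the hash; the text does not say whether escaped newlines inside string values should also be normalised, so the sketch leaves them untouched.

```python
import hashlib
import json

# Default inline threshold from step 3 (256 KiB).
INLINE_THRESHOLD_BYTES = 256 * 1024


def canonical_bytes(raw: bytes) -> bytes:
    """Step 1: parse, sort keys, emit UTF-8 without a BOM (illustrative choices)."""
    # "utf-8-sig" strips a leading BOM if the upstream file has one.
    doc = json.loads(raw.decode("utf-8-sig"))
    # sort_keys orders object keys at every nesting level; list order is untouched.
    # Compact separators leave no whitespace, hence no line endings, in the output.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest_of(data: bytes) -> str:
    """Step 2: "sha256:{hex}" over the given bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def plan_storage(raw: bytes, threshold: int = INLINE_THRESHOLD_BYTES) -> dict:
    """Steps 3-4: choose inline vs blob storage and collect the integrity fields."""
    canonical = canonical_bytes(raw)
    inline = len(canonical) <= threshold
    return {
        "digest": digest_of(canonical),
        "inline_payload": inline,
        "content_size_bytes": len(raw),  # pre-canonical length, as in step 4
        # Non-inline payloads go to vex_raw_blobs with a hash of the stored bytes.
        "blob_payload": None if inline else canonical,
        "payload_hash": None if inline else digest_of(canonical),
    }
```

Since the blob holds exactly the canonical bytes, `payload_hash` equals `digest` in this sketch. Python's `json` module also re-renders numbers (`1E5` parses to a float and prints as `100000.0`), so a production canonicaliser needs a fixed number format; RFC 8785 (JSON Canonicalization Scheme) is one published option.
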

**API mapping (replaces Mongo/BSON)**

- List/query `/vex/raw` via `SELECT ... FROM vex.vex_raw_documents WHERE tenant=@t ORDER BY retrieved_at DESC, digest LIMIT @n OFFSET @offset`; the cursor is the `(retrieved_at, digest)` pair of the last row returned.
- `GET /vex/raw/{digest}` loads the row and, when `inline_payload=false`, its blob.
- `GET /vex/raw/{digest}/provenance` projects `provenance_json` + `metadata_json`.
- Filters (`providerId`, `format`, `since`, `until`, `supersedes`, `hasAttachments`) map to indexed predicates; JSON subfields use `metadata_json ->> 'field'`.
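
Paging with a `(retrieved_at, digest)` cursor is keyset pagination: each request resumes strictly after the last row it returned instead of skipping `@offset` rows, so documents ingested between requests do not shift the rows on later pages. Because the sort mixes `retrieved_at DESC` with `digest` ascending, the resume condition must be spelled out as `retrieved_at < @ts OR (retrieved_at = @ts AND digest > @digest)`; a row-value comparison such as `(retrieved_at, digest) < (@ts, @digest)` only matches orderings in which both columns sort in the same direction. The in-memory Python sketch below illustrates that predicate; `RawRow`, `list_page` and the tuple-shaped cursor are hypothetical, and how the real endpoint serialises its cursor is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical cursor shape: (retrieved_at, digest) of the last row on a page.
Cursor = tuple[datetime, str]


@dataclass(frozen=True)
class RawRow:
    tenant: str
    digest: str
    retrieved_at: datetime


def ordered(rows: Iterable[RawRow]) -> list[RawRow]:
    """ORDER BY retrieved_at DESC, digest ASC, using two stable sorts."""
    by_digest = sorted(rows, key=lambda r: r.digest)
    # reverse=True keeps the sort stable, so equal timestamps stay in digest order.
    return sorted(by_digest, key=lambda r: r.retrieved_at, reverse=True)


def after(row: RawRow, cursor: Cursor) -> bool:
    """retrieved_at < @ts OR (retrieved_at = @ts AND digest > @digest)."""
    ts, digest = cursor
    return row.retrieved_at < ts or (row.retrieved_at == ts and row.digest > digest)


def list_page(rows: Iterable[RawRow], tenant: str, limit: int,
              cursor: Optional[Cursor] = None) -> tuple[list[RawRow], Optional[Cursor]]:
    """Return one page for a tenant and the cursor for the next page (None at the end)."""
    if not tenant:
        raise ValueError("tenant is mandatory on every query")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    candidates = ordered(r for r in rows if r.tenant == tenant)
    if cursor is not None:
        candidates = [r for r in candidates if after(r, cursor)]
    page = candidates[:limit]
    has_more = len(candidates) > limit
    next_cursor = (page[-1].retrieved_at, page[-1].digest) if has_more else None
    return page, next_cursor
```

In SQL the same predicate sits in the `WHERE` clause next to `tenant = @t`, `LIMIT @n` stays, and `OFFSET` is dropped.
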

**Write semantics**

- The Postgres implementation of `IVexRawStore` enforces append-only inserts: a duplicate `digest` is a no-op, and a duplicate (`tenant`, `provider_id`, `source_uri`, `etag`) with a new digest inserts a new row whose `supersedes_digest` points at the earlier document.
- `IVexRawWriteGuard` runs before every insert, and the tenant is mandatory on every query and write.
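
The write rules can be sketched as a small in-memory Python store. `RawDoc`, `InMemoryRawStore` and the `guard` callable (a stand-in for `IVexRawWriteGuard`) are hypothetical names; the real store is the Postgres implementation of `IVexRawStore`.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class RawDoc:
    tenant: str
    provider_id: str
    source_uri: str
    etag: Optional[str]
    digest: str
    supersedes_digest: Optional[str] = None


class InMemoryRawStore:
    """Hypothetical append-only store following the write semantics above."""

    def __init__(self, guard: Optional[Callable[[RawDoc], None]] = None) -> None:
        self._guard = guard  # stand-in for IVexRawWriteGuard; raises to reject
        self._by_digest: dict[str, RawDoc] = {}
        # Latest digest per (tenant, provider_id, source_uri, etag). None etags
        # compare equal here, whereas a Postgres UNIQUE treats NULLs as distinct.
        self._latest: dict[tuple[str, str, str, Optional[str]], str] = {}

    def insert(self, doc: RawDoc) -> Optional[RawDoc]:
        """Append doc; return the stored row, or None when the digest already exists."""
        if not doc.tenant:
            raise ValueError("tenant is mandatory on every write")
        if self._guard is not None:
            self._guard(doc)  # runs before the insert
        if doc.digest in self._by_digest:
            return None  # duplicate digest: no-op
        key = (doc.tenant, doc.provider_id, doc.source_uri, doc.etag)
        # Same source key with a new digest: link the new row to the previous one.
        stored = replace(doc, supersedes_digest=self._latest.get(key))
        self._by_digest[stored.digest] = stored  # rows are never updated or deleted
        self._latest[key] = stored.digest
        return stored

    def get(self, digest: str) -> Optional[RawDoc]:
        return self._by_digest.get(digest)
```

Taken literally, the schema's UNIQUE (`tenant`, `provider_id`, `source_uri`, `etag`) constraint would make Postgres reject the second row in the superseding case, so the migration has to reconcile that constraint with this rule.
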

**Rollout**

1. Add migration under `src/Excititor/__Libraries/StellaOps.Excititor.Storage.Postgres/Migrations` creating the tables/indexes above.
2. Implement `PostgresVexRawStore` and switch WebService/Worker DI to `AddExcititorPostgresStorage`; remove `VexMongoStorageOptions`, `IMongoDatabase`, and GridFS paths.
3. Update `/vex/raw` endpoints/tests to the Postgres store; delete Mongo fixtures once parity is green. Mark Mongo storage paths as deprecated and remove them in the next release.
---
## 2) Inputs, outputs & canonical domain