up
@@ -124,6 +124,62 @@ Excititor workers now hydrate signature metadata with issuer trust data retrieve
`GET /v1/vex/statements/{advisory_key}` produces sorted JSON responses containing raw statement metadata (`issuer`, `content_hash`, `signature`), normalised tuples, and provenance pointers. Advisory AI consumes this endpoint to build retrieval contexts with explicit citations.

### 1.5 Postgres raw store (replaces Mongo/GridFS)

> Mongo/BSON/GridFS are being removed. This is the canonical design for the Postgres-backed raw store that powers `/vex/raw` and ingestion.

Schema: `vex`

- **`vex_raw_documents`** (append-only)
  - `digest TEXT PRIMARY KEY`: `sha256:{hex}` of canonical UTF-8 JSON bytes.
  - `tenant TEXT NOT NULL`
  - `provider_id TEXT NOT NULL`
  - `format TEXT NOT NULL CHECK (format IN ('openvex','csaf','cyclonedx','custom'))`
  - `source_uri TEXT NOT NULL`, `etag TEXT NULL`
  - `retrieved_at TIMESTAMPTZ NOT NULL`, `recorded_at TIMESTAMPTZ NOT NULL DEFAULT NOW()`
  - `supersedes_digest TEXT NULL REFERENCES vex_raw_documents(digest)`
  - `content_json JSONB NOT NULL`: canonicalised payload (truncated when blobbed)
  - `content_size_bytes INT NOT NULL`
  - `metadata_json JSONB NOT NULL`: statement_id, issuer, spec_version, content_type, connector version, hashes, quarantine flags
  - `provenance_json JSONB NOT NULL`: DSSE/chain/rekor/trust info
  - `inline_payload BOOLEAN NOT NULL DEFAULT TRUE`
  - UNIQUE (`tenant`, `provider_id`, `source_uri`, `etag`)
  - Indexes: `(tenant, retrieved_at DESC)`, `(tenant, provider_id, retrieved_at DESC)`, `(tenant, supersedes_digest)`, GIN on `metadata_json`, GIN on `provenance_json`.

- **`vex_raw_blobs`** (large payloads)
  - `digest TEXT PRIMARY KEY REFERENCES vex_raw_documents(digest) ON DELETE CASCADE`
  - `payload BYTEA NOT NULL` (canonical JSON bytes; no compression to preserve determinism)
  - `payload_hash TEXT NOT NULL` (hash of stored bytes)

- **`vex_raw_attachments`** (optional future)
  - `digest TEXT REFERENCES vex_raw_documents(digest) ON DELETE CASCADE`
  - `name TEXT NOT NULL`, `media_type TEXT NOT NULL`
  - `payload BYTEA NOT NULL`, `payload_hash TEXT NOT NULL`
  - PRIMARY KEY (`digest`, `name`)

- **Observations/linksets**: use the append-only Postgres linkset schema already defined for `IAppendOnlyLinksetStore` (tables `vex_linksets`, `vex_linkset_observations`, `vex_linkset_disagreements`, `vex_linkset_mutations`) with indexes on `(tenant, vulnerability_id, product_key)` and `updated_at`.

**Canonicalisation & hashing**

1. Parse upstream JSON; sort keys; normalise newlines; encode UTF-8 without BOM. Preserve array order.
2. Compute `digest = "sha256:{hex}"` over canonical bytes.
3. If `size <= inline_threshold_bytes` (default 256 KiB), set `inline_payload=true` and store the payload in `content_json`; otherwise store the bytes in `vex_raw_blobs` and set `inline_payload=false`.
4. Persist `content_size_bytes` (pre-canonical length) and `payload_hash` for integrity.
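
The four steps above can be sketched in Python as follows. This is a minimal illustration, not the Excititor implementation: the names `canonicalise` and `plan_storage`, the compact separators, the unescaped Unicode output, reading `size` as the canonical byte length, and SHA-256 for `payload_hash` are all assumptions made for the example. Newlines between tokens disappear because the output is compact; newlines inside string values are left untouched.

```python
import hashlib
import json

INLINE_THRESHOLD_BYTES = 256 * 1024  # default from the text (256 KiB)


def canonicalise(raw: bytes) -> bytes:
    """Return canonical UTF-8 JSON bytes: keys sorted, array order kept, no BOM."""
    # "utf-8-sig" drops a leading BOM if the upstream document carries one.
    document = json.loads(raw.decode("utf-8-sig"))
    # Assumption: compact separators and raw (non-escaped) Unicode characters.
    text = json.dumps(document, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def plan_storage(raw: bytes, inline_threshold_bytes: int = INLINE_THRESHOLD_BYTES) -> dict:
    """Compute the digest and decide between inline and blob storage."""
    canonical = canonicalise(raw)
    digest = "sha256:" + hashlib.sha256(canonical).hexdigest()
    # Assumption: the threshold is compared against the canonical length.
    inline = len(canonical) <= inline_threshold_bytes
    return {
        "digest": digest,
        "canonical": canonical,
        "content_size_bytes": len(raw),  # pre-canonical length, per step 4
        "inline_payload": inline,
        # Assumption: payload_hash is the SHA-256 hex of the stored blob bytes.
        "payload_hash": None if inline else hashlib.sha256(canonical).hexdigest(),
    }
```

Two inputs that differ only in key order, whitespace, or a leading BOM yield the same `digest` under this sketch, which is the property the append-only store relies on for de-duplication.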

**API mapping (replaces Mongo/BSON)**

- List/query `/vex/raw` via `SELECT ... FROM vex.vex_raw_documents WHERE tenant=@t ORDER BY retrieved_at DESC, digest LIMIT @n OFFSET @offset`; the cursor uses `(retrieved_at, digest)`.
- `GET /vex/raw/{digest}` loads the row and the optional blob.
- `GET /vex/raw/{digest}/provenance` projects `provenance_json` + `metadata_json`.
- Filters (`providerId`, `format`, `since`, `until`, `supersedes`, `hasAttachments`) map to indexed predicates; JSON subfields use `metadata_json ->> 'field'`.
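
One way the `(retrieved_at, digest)` cursor could drive paging is a keyset (seek) query. The sketch below deliberately swaps that technique in for the `OFFSET` form quoted above. It uses an in-memory SQLite table as a stand-in for Postgres, a cut-down three-column table, and ISO-8601 text timestamps; `fetch_page`, `demo`, and the sample rows are hypothetical names and data chosen for illustration only.

```python
import sqlite3

# Seek past the last row of the previous page instead of counting an OFFSET.
# The ordering matches the text: retrieved_at DESC, then digest ascending.
PAGE_SQL = """
SELECT digest, retrieved_at
FROM vex_raw_documents
WHERE tenant = :tenant
  AND (:ts IS NULL
       OR retrieved_at < :ts
       OR (retrieved_at = :ts AND digest > :digest))
ORDER BY retrieved_at DESC, digest ASC
LIMIT :limit
"""


def fetch_page(conn, tenant, limit, cursor=None):
    """Return (rows, next_cursor); cursor is a (retrieved_at, digest) pair or None."""
    ts, digest = cursor if cursor else (None, None)
    params = {"tenant": tenant, "ts": ts, "digest": digest, "limit": limit}
    rows = conn.execute(PAGE_SQL, params).fetchall()
    # A full page may have more rows behind it; a short page is the last one.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor


def demo():
    """Page through tenant t1 two rows at a time and collect the digests."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vex_raw_documents ("
        "digest TEXT PRIMARY KEY, tenant TEXT NOT NULL, retrieved_at TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO vex_raw_documents VALUES (?, ?, ?)",
        [
            ("sha256:aa", "t1", "2025-01-02T00:00:00Z"),
            ("sha256:bb", "t1", "2025-01-02T00:00:00Z"),
            ("sha256:cc", "t1", "2025-01-01T00:00:00Z"),
            ("sha256:dd", "t2", "2025-01-03T00:00:00Z"),
        ],
    )
    pages, cursor = [], None
    while True:
        rows, cursor = fetch_page(conn, "t1", 2, cursor)
        pages.append([row[0] for row in rows])
        if cursor is None:
            return pages
```

Because every page is bounded by the tenant predicate and the cursor pair, the query stays on the `(tenant, retrieved_at DESC)` index regardless of how deep the client has paged, and rows from other tenants never appear.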

**Write semantics**

- The `IVexRawStore` Postgres implementation enforces append-only inserts: a duplicate `digest` is a no-op; a duplicate (`tenant`, `provider_id`, `source_uri`, `etag`) with a new digest inserts a new row and sets `supersedes_digest`.
- `IVexRawWriteGuard` runs before insert; tenant is mandatory on every query and write.
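
The write rules above can be modelled with a small in-memory store. This is a sketch under stated assumptions, not the `IVexRawStore` contract: `RawDocument`, `InMemoryRawStore`, and the `_latest` lookup are hypothetical, and the tenant check only stands in for `IVexRawWriteGuard`, whose actual checks the text does not describe.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class RawDocument:
    digest: str
    tenant: str
    provider_id: str
    source_uri: str
    etag: Optional[str]
    supersedes_digest: Optional[str] = None


class InMemoryRawStore:
    """Append-only model: rows are inserted, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: Dict[str, RawDocument] = {}
        # Assumption: track the newest digest per source key to fill supersedes_digest.
        self._latest: Dict[Tuple[str, str, str, Optional[str]], str] = {}

    def __len__(self) -> int:
        return len(self._rows)

    def append(self, doc: RawDocument) -> Optional[RawDocument]:
        """Insert doc; return the stored row, or None when the digest already exists."""
        if not doc.tenant:
            # Stand-in for the write guard: tenant is mandatory on every write.
            raise ValueError("tenant is mandatory")
        if doc.digest in self._rows:
            return None  # duplicate digest: no-op
        key = (doc.tenant, doc.provider_id, doc.source_uri, doc.etag)
        # Same source key with a new digest: new row pointing at the previous one.
        stored = replace(doc, supersedes_digest=self._latest.get(key))
        self._rows[doc.digest] = stored
        self._latest[key] = doc.digest
        return stored
```

Appending the same digest twice leaves the store unchanged, while a second document for the same source key with a different digest is stored alongside the first with `supersedes_digest` set to the earlier digest.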

**Rollout**

1. Add a migration under `src/Excititor/__Libraries/StellaOps.Excititor.Storage.Postgres/Migrations` creating the tables/indexes above.
2. Implement `PostgresVexRawStore` and switch WebService/Worker DI to `AddExcititorPostgresStorage`; remove `VexMongoStorageOptions`, `IMongoDatabase`, and GridFS paths.
3. Update `/vex/raw` endpoints/tests to the Postgres store; delete Mongo fixtures once parity is green. Mark Mongo storage paths as deprecated and remove them in the next release.

---

## 2) Inputs, outputs & canonical domain