Here’s a quick, practical cheat‑sheet on choosing **PostgreSQL vs MongoDB** for security/DevOps apps—plus how I’d model SBOM/VEX and queues in Stella Ops without adding moving parts.

---

# PostgreSQL you can lean on (why it often wins for ops apps)

* **JSONB that flies:** Store documents yet query like SQL. Add **GIN indexes** on JSONB fields for fast lookups (`jsonb_ops` general; `jsonb_path_ops` great for `@>` containment).
* **Queue pattern built‑in:** `SELECT … FOR UPDATE SKIP LOCKED` lets multiple workers pop jobs from the same table safely—no head‑of‑line blocking, no extra broker.
* **Cooperative locks:** **Advisory locks** (session/transaction) for “at‑most‑once” sections or leader election.
* **Lightweight pub/sub:** **LISTEN/NOTIFY** for async nudges between services (poke a worker to re‑scan, refresh cache, etc.).
* **Search included:** **Full‑text search** (tsvector/tsquery) is native—no separate search service for moderate needs.
* **Serious backups:** **PITR** with WAL archiving / `pg_basebackup` for deterministic rollbacks and offline bundles.

# MongoDB facts to factor in

* **Flexible ingest:** Schemaless docs make it easy to absorb varied telemetry and vendor feeds.
* **Horizontal scale:** Sharding is mature for huge, read‑heavy datasets.
* **Consistency is a choice:** Design embedding vs refs and when to use multi‑document transactions.

---

# A simple rule of thumb (Stella Ops‑style)

* **System of record:** PostgreSQL (JSONB first).
* **Hot paths:** Materialized views + JSONB GIN indexes.
* **Queues & coordination:** PostgreSQL (skip‑locked + advisory locks).
* **Cache/accel only:** Valkey (ephemeral).
* **MongoDB:** Optional for **very large, read‑optimized graph snapshots** (e.g., periodically baked reachability graphs) if Postgres starts to strain.

---

# Concrete patterns you can drop in today

**1) SBOM/VEX storage (Postgres JSONB)**

```sql
-- Documents
CREATE TABLE sbom (
  id BIGSERIAL PRIMARY KEY,
  artifact_purl TEXT NOT NULL,
  doc JSONB NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX sbom_purl_idx ON sbom(artifact_purl);
CREATE INDEX sbom_doc_gin ON sbom USING GIN (doc jsonb_path_ops);

-- Common queries
-- find components by name/version:
-- SELECT * FROM sbom WHERE doc @> '{"components":[{"name":"openssl","version":"3.0.14"}]}';

-- VEX
CREATE TABLE vex (
  id BIGSERIAL PRIMARY KEY,
  subject_purl TEXT NOT NULL,
  vex_doc JSONB NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX vex_subject_idx ON vex(subject_purl);
CREATE INDEX vex_doc_gin ON vex USING GIN (vex_doc jsonb_path_ops);
```

**2) Hot reads via materialized views**

```sql
CREATE MATERIALIZED VIEW mv_open_findings AS
SELECT
  s.artifact_purl,
  c->>'name' AS comp,
  c->>'version' AS ver,
  v.vex_doc
FROM sbom s
CROSS JOIN LATERAL jsonb_array_elements(s.doc->'components') c
LEFT JOIN vex v ON v.subject_purl = s.artifact_purl
-- add WHERE clauses to pre‑filter only actionable rows
;
CREATE INDEX mv_open_findings_idx ON mv_open_findings(artifact_purl, comp);
```

Refresh cadence: on feed import or via a scheduler; `REFRESH MATERIALIZED VIEW CONCURRENTLY mv_open_findings;`

**3) Queue without a broker**

```sql
CREATE TABLE job_queue(
  id BIGSERIAL PRIMARY KEY,
  kind TEXT NOT NULL,          -- e.g., 'scan', 'sbom-diff'
  payload JSONB NOT NULL,
  run_after TIMESTAMPTZ DEFAULT now(),
  attempts INT DEFAULT 0,
  locked_at TIMESTAMPTZ,
  locked_by TEXT
);
CREATE INDEX job_queue_ready_idx ON job_queue(kind, run_after);

-- Worker loop
WITH cte AS (
  SELECT id FROM job_queue
  WHERE kind = $1 AND run_after <= now() AND locked_at IS NULL
  ORDER BY id
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
UPDATE job_queue j
SET locked_at = now(), locked_by = $2
FROM cte
WHERE j.id = cte.id
RETURNING j.*;
```

Release/fail with: set `locked_at=NULL, locked_by=NULL, attempts=attempts+1` or delete on success.

**4) Advisory lock for singletons**

```sql
-- Acquire (per tenant, per artifact)
SELECT pg_try_advisory_xact_lock(hashtextextended('recalc:'||tenant||':'||artifact, 0));
```

**5) Nudge workers without a bus**

```sql
NOTIFY stella_scan, json_build_object('purl', $1, 'priority', 5)::TEXT;
-- workers LISTEN stella_scan and enqueue quickly
```

---

# When to add MongoDB

* You need **interactive exploration** over **hundreds of millions of nodes/edges** (e.g., historical “proof‑of‑integrity” graphs) where document fan‑out and denormalized reads beat relational joins.
* Snapshot cadence is **batchy** (hourly/daily), and you can **re‑emit** snapshots deterministically from Postgres (single source of truth).
* You want to isolate read spikes from the transactional core.

**Snapshot pipe:** Postgres → (ETL) → MongoDB collection `{graph_id, node, edges[], attrs}` with **compound shard keys** tuned to your UI traversal.

---

# Why this fits Stella Ops

* Fewer moving parts on‑prem/air‑gapped.
* Deterministic replays (PITR + immutable imports).
* Clear performance levers (GIN indexes, MVs, skip‑locked queues).
* MongoDB stays optional, purpose‑built for giant read graphs—not a default dependency.

If you want, I can turn the above into ready‑to‑run `.sql` migrations and a small **.NET 10** worker (Dapper/EF Core) that implements the queue loop + advisory locks + LISTEN/NOTIFY hooks.
Below is a handoff-ready set of **PostgreSQL tables/views engineering guidelines** intended for developer review. It is written as a **gap-finding checklist** with **concrete DDL patterns** and **performance red flags** (Postgres as system of record, JSONB where useful, derived projections where needed).

---

# PostgreSQL Tables & Views Engineering Guide

## 0) Non-negotiable principles

1. **Every hot query must have an index story.** If you cannot name the index that serves it, you have a performance gap.
2. **Write path stays simple.** Prefer **append-only** versioning to large updates (especially for JSONB).
3. **Multi-tenant must be explicit.** Every core table includes `tenant_id` and indexes are tenant-prefixed.
4. **Derived data is a product.** If the UI needs it fast, model it as a **projection table or materialized view**, not as an ad-hoc mega-join.
5. **Idempotency is enforced in the DB.** Unique keys for imports/jobs/results; no “best effort” dedupe in application only.

---

# 1) Table taxonomy and what to look for

Use this to classify every table; each class has different indexing/retention/locking rules.

### A. Source-of-truth (SOR) tables

Examples: `sbom_document`, `vex_document`, `feed_import`, `scan_manifest`, `attestation`.

* **Expect:** immutable rows, versioning via new row inserts.
* **Gaps:** frequent updates to large JSONB; missing `content_hash`; no unique idempotency key.

### B. Projection tables (query-optimized)

Examples: `open_findings`, `artifact_risk_summary`, `component_index`.

* **Expect:** denormalized, indexed for UI/API; refresh/update strategy defined.
* **Gaps:** projections rebuilt from scratch too often; missing incremental update plan; no retention plan.

### C. Queue/outbox tables

Examples: `job_queue`, `outbox_events`.

* **Expect:** `SKIP LOCKED` claim pattern; retry + DLQ; minimal lock duration.
* **Gaps:** holding row locks while doing work; missing partial index for “ready” jobs.

### D. Audit/event tables

Examples: `scan_run_event`, `decision_event`, `access_audit`.

* **Expect:** append-only; partitioned by time; BRIN on timestamps.
* **Gaps:** single huge table without partitioning; slow deletes instead of partition drops.

---

# 2) Naming, keys, and required columns

## Required columns per class

### SOR documents (SBOM/VEX/Attestations)

* `tenant_id uuid`
* `id bigserial` (internal PK)
* `external_id uuid` (optional API-facing id)
* `content_hash bytea` (sha256) **NOT NULL**
* `doc jsonb` **NOT NULL**
* `created_at timestamptz` **NOT NULL default now()**
* `supersedes_id bigint NULL` (version chain) OR `version int`

**Checklist**

* [ ] Unique constraint exists: `(tenant_id, content_hash)`
* [ ] Version strategy exists (supersedes/version) and is queryable
* [ ] “Latest” access is index-backed (see §4)

### Queue

* `tenant_id uuid` (if multi-tenant)
* `id bigserial`
* `kind text`
* `payload jsonb`
* `run_after timestamptz`
* `attempts int`
* `locked_at timestamptz NULL`
* `locked_by text NULL`
* `status smallint` (optional; e.g., ready/running/done/dead)

**Checklist**

* [ ] “Ready to claim” has a partial index (see §4)
* [ ] Claim transaction is short (claim+commit; work outside lock)

---

# 3) JSONB rules that prevent “looks fine → melts in prod”

## When JSONB is appropriate

* Storing signed envelopes (DSSE), SBOM/VEX raw docs, vendor payloads.
* Ingest-first scenarios where schema evolves.

## When JSONB is a performance hazard

* You frequently query deep keys/arrays (components, vulnerabilities, call paths).
* You need sorting/aggregations on doc fields.

**Mandatory pattern for hot JSON fields**

1. Keep the raw JSONB for fidelity.
2. Extract **hot keys** into **stored generated columns** (or real columns), index those.
3. Extract **hot arrays** into child tables (components, vulnerabilities).

Example:

```sql
CREATE TABLE sbom_document (
  id           bigserial PRIMARY KEY,
  tenant_id    uuid NOT NULL,
  artifact_purl text NOT NULL,
  content_hash bytea NOT NULL,
  doc          jsonb NOT NULL,
  created_at   timestamptz NOT NULL DEFAULT now(),

  -- hot keys as generated columns
  bom_format   text GENERATED ALWAYS AS ((doc->>'bomFormat')) STORED,
  spec_version text GENERATED ALWAYS AS ((doc->>'specVersion')) STORED
);

CREATE UNIQUE INDEX ux_sbom_doc_hash ON sbom_document(tenant_id, content_hash);
CREATE INDEX ix_sbom_doc_tenant_artifact ON sbom_document(tenant_id, artifact_purl, created_at DESC);
CREATE INDEX ix_sbom_doc_json_gin ON sbom_document USING GIN (doc jsonb_path_ops);
CREATE INDEX ix_sbom_doc_bomformat ON sbom_document(tenant_id, bom_format);
```

**Checklist**

* [ ] Any query using `doc->>` in WHERE has either an expression index or a generated column index
* [ ] Any query using `jsonb_array_elements(...)` in hot path has been replaced by a normalized child table or a projection table

---

# 4) Indexing standards (what devs must justify)

## Core rules

1. **Tenant-first**: `INDEX(tenant_id, …)` for anything read per tenant.
2. **Sort support**: if query uses `ORDER BY created_at DESC`, index must end with `created_at DESC`.
3. **Partial indexes** for sparse predicates (status/locked flags).
4. **BRIN** for massive append-only time series.
5. **GIN jsonb_path_ops** for containment (`@>`) on JSONB; avoid GIN for everything.

## Required index patterns by use case

### “Latest version per artifact”

If you store versions as rows:

```sql
-- supports: WHERE tenant_id=? AND artifact_purl=? ORDER BY created_at DESC LIMIT 1
CREATE INDEX ix_sbom_latest ON sbom_document(tenant_id, artifact_purl, created_at DESC);
```

### Ready queue claims

```sql
CREATE INDEX ix_job_ready
ON job_queue(kind, run_after, id)
WHERE locked_at IS NULL;

-- Optional: tenant scoped
CREATE INDEX ix_job_ready_tenant
ON job_queue(tenant_id, kind, run_after, id)
WHERE locked_at IS NULL;
```

### JSON key lookup (expression index)

```sql
-- supports: WHERE (doc->>'subject') = ?
CREATE INDEX ix_vex_subject_expr
ON vex_document(tenant_id, (doc->>'subject'));
```

### Massive event table time filtering

```sql
CREATE INDEX brin_scan_events_time
ON scan_run_event USING BRIN (occurred_at);
```

**Red flags**

* GIN index on a JSONB column + frequent updates = bloat and write amplification.
* No partial index for queue readiness → sequential scans under load.
* Composite indexes with wrong leading column order (e.g., `created_at, tenant_id`) → not used.

---

# 5) Partitioning and retention (avoid “infinite tables”)

Use partitioning for:

* audit/events
* scan run logs
* large finding histories
* anything > tens of millions rows with time-based access

## Standard approach

* Partition by `occurred_at` (monthly) for event/audit tables.
* Retention by dropping partitions (fast and vacuum-free).

Example:

```sql
CREATE TABLE scan_run_event (
  tenant_id uuid NOT NULL,
  scan_run_id bigint NOT NULL,
  occurred_at timestamptz NOT NULL,
  event_type text NOT NULL,
  payload jsonb NOT NULL
) PARTITION BY RANGE (occurred_at);
```

**Checklist**

* [ ] Partition creation/rollover process exists (migration or scheduler)
* [ ] Retention is “DROP PARTITION”, not “DELETE WHERE occurred_at < …”
* [ ] Each partition has needed local indexes (BRIN/time + tenant filters)

---

# 6) Views vs Materialized Views vs Projection Tables

## Use a normal VIEW when

* It’s thin (renaming columns, simple joins) and not used in hot paths.

## Use a MATERIALIZED VIEW when

* It accelerates complex joins/aggregations and can be refreshed on a schedule.
* You can tolerate refresh lag.

**Materialized view requirements**

* Must have a **unique index** to use `REFRESH … CONCURRENTLY`.
* Refresh must be **outside** an explicit transaction block.

Example:

```sql
CREATE MATERIALIZED VIEW mv_artifact_risk AS
SELECT tenant_id, artifact_purl, max(score) AS risk_score
FROM open_findings
GROUP BY tenant_id, artifact_purl;

CREATE UNIQUE INDEX ux_mv_artifact_risk
ON mv_artifact_risk(tenant_id, artifact_purl);
```

## Prefer projection tables over MV when

* You need **incremental updates** (on import/scan completion).
* You need deterministic “point-in-time” snapshots per manifest.

**Checklist**

* [ ] Every MV has refresh cadence + owner (which worker/job triggers it)
* [ ] UI/API queries do not depend on a heavy non-materialized view
* [ ] If “refresh cost” scales with whole dataset, projection table exists instead

---

# 7) Queue and outbox patterns that do not deadlock

## Claim pattern (short transaction)

```sql
WITH cte AS (
  SELECT id
  FROM job_queue
  WHERE kind = $1
    AND run_after <= now()
    AND locked_at IS NULL
  ORDER BY id
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
UPDATE job_queue j
SET locked_at = now(),
    locked_by = $2
FROM cte
WHERE j.id = cte.id
RETURNING j.*;
```

**Rules**

* Claim + commit quickly.
* Do work outside the lock.
* On completion: update row to done (or delete if you want compactness).
* On failure: increment attempts, set `run_after = now() + backoff`, release lock.

**Checklist**

* [ ] Worker does not keep transaction open while scanning/importing
* [ ] Backoff policy is encoded (in DB columns) and observable
* [ ] DLQ condition exists (attempts > N) and is queryable

---

# 8) Query performance review checklist (what to require in PRs)

For each new endpoint/query:

* [ ] Provide the query (SQL) and the intended parameters.
* [ ] Provide `EXPLAIN (ANALYZE, BUFFERS)` from a dataset size that resembles staging.
* [ ] Identify the serving index(es).
* [ ] Confirm row estimates are not wildly wrong (if they are: stats or predicate mismatch).
* [ ] Confirm it is tenant-scoped and uses the tenant-leading index.

**Common fixes**

* Replace `IN (SELECT …)` with `EXISTS` for correlated checks.
* Replace `ORDER BY … LIMIT` without index with an index that matches ordering.
* Avoid exploding joins with JSON arrays; pre-extract.

---

# 9) Vacuum, bloat, and “why is disk growing”

## Design to avoid bloat

* Append-only for large docs and events.
* If frequent updates are needed, isolate hot-updated columns into a smaller table.

Example split:

* `job_queue_payload` (stable)
* `job_queue_state` (locked/status/attempts updated frequently)

**Checklist**

* [ ] Large frequently-updated JSONB tables have been questioned
* [ ] Updates do not rewrite big TOAST values repeatedly
* [ ] Retention is partition-drop where possible

---

# 10) Migration safety rules (prevent production locks)

* Index creation: `CREATE INDEX CONCURRENTLY`.
* Dropping indexes: `DROP INDEX CONCURRENTLY`.
* New column with default on large table:

  1. `ADD COLUMN` nullable
  2. backfill in batches
  3. `ALTER COLUMN SET NOT NULL`
  4. add default if needed

**Checklist**

* [ ] No long-running `ALTER TABLE` on huge tables without plan
* [ ] Any new NOT NULL constraint is staged safely

---

# 11) Stella Ops-specific schema guidance (SBOM/VEX/Finding)

## Minimum recommended normalized tables

Even if you keep raw SBOM/VEX JSON:

* `sbom_document` (raw, immutable)
* `sbom_component` (extracted components)
* `vex_document` (raw, immutable)
* `vex_statement` (extracted statements per CVE/component)
* `finding` (facts: CVE ↔ component ↔ artifact ↔ scan_run)
* `scan_manifest` (determinism: feed versions/hashes, policy hash)
* `scan_run` (links results to manifest)

**Key gap detectors**

* If “find all artifacts affected by CVE X” is slow → missing `finding` indexing.
* If “component search” is slow → missing `sbom_component` and its indexes.
* If “replay this scan” is not exact → missing `scan_manifest` + feed import hashes.

---

# 12) Minimal “definition of done” for a new table/view

A PR adding a table/view is incomplete unless it includes:

* [ ] Table classification (SOR / projection / queue / event)
* [ ] Primary key and idempotency unique key
* [ ] Tenant scoping strategy
* [ ] Index plan mapped to known queries
* [ ] Retention plan (especially for event/projection tables)
* [ ] Refresh/update plan if derived
* [ ] Example query + `EXPLAIN` for the top 1–3 access patterns

---

If you want this as a single drop-in repo document, tell me the target path (e.g., `/docs/platform/postgres-table-view-guidelines.md`) and I will format it exactly as a team-facing guideline, including a one-page “Architecture/Performance Gaps” review form that engineers can paste into PR descriptions.