Evaluate PostgreSQL vs MongoDB for StellaOps

Here's a quick, practical cheat sheet on choosing PostgreSQL vs MongoDB for security/DevOps apps, plus how I'd model SBOM/VEX and queues in StellaOps without adding moving parts.


PostgreSQL you can lean on (why it often wins for ops apps)

  • JSONB that flies: Store documents yet query like SQL. Add GIN indexes on JSONB fields for fast lookups (jsonb_ops general; jsonb_path_ops great for @> containment).
  • Queue pattern built in: SELECT … FOR UPDATE SKIP LOCKED lets multiple workers pop jobs from the same table safely, with no head-of-line blocking and no extra broker.
  • Cooperative locks: advisory locks (session- or transaction-scoped) for "at-most-once" sections or leader election.
  • Lightweight pub/sub: LISTEN/NOTIFY for async nudges between services (poke a worker to rescan, refresh cache, etc.).
  • Search included: full-text search (tsvector/tsquery) is native, so moderate needs don't require a separate search service.
  • Serious backups: PITR with WAL archiving / pg_basebackup for deterministic rollbacks and offline bundles.

MongoDB facts to factor in

  • Flexible ingest: Schemaless docs make it easy to absorb varied telemetry and vendor feeds.
  • Horizontal scale: sharding is mature for huge, read-heavy datasets.
  • Consistency is a choice: design embedding vs. references deliberately, and decide when multi-document transactions are worth their cost.

A simple rule of thumb (StellaOps-style)

  • System of record: PostgreSQL (JSONB first).
  • Hot paths: Materialized views + JSONB GIN indexes.
  • Queues & coordination: PostgreSQL (SKIP LOCKED + advisory locks).
  • Cache/accel only: Valkey (ephemeral).
  • MongoDB: Optional for very large, read-optimized graph snapshots (e.g., periodically baked reachability graphs) if Postgres starts to strain.

Concrete patterns you can drop in today

1) SBOM/VEX storage (Postgres JSONB)

-- Documents
CREATE TABLE sbom (
  id BIGSERIAL PRIMARY KEY,
  artifact_purl TEXT NOT NULL,
  doc JSONB NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX sbom_purl_idx ON sbom(artifact_purl);
CREATE INDEX sbom_doc_gin ON sbom USING GIN (doc jsonb_path_ops);

-- Common queries
-- find components by name/version:
-- SELECT * FROM sbom WHERE doc @> '{"components":[{"name":"openssl","version":"3.0.14"}]}';

-- VEX
CREATE TABLE vex (
  id BIGSERIAL PRIMARY KEY,
  subject_purl TEXT NOT NULL,
  vex_doc JSONB NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX vex_subject_idx ON vex(subject_purl);
CREATE INDEX vex_doc_gin ON vex USING GIN (vex_doc jsonb_path_ops);

2) Hot reads via materialized views

CREATE MATERIALIZED VIEW mv_open_findings AS
SELECT
  s.artifact_purl,
  c->>'name' AS comp,
  c->>'version' AS ver,
  v.vex_doc
FROM sbom s
CROSS JOIN LATERAL jsonb_array_elements(s.doc->'components') c
LEFT JOIN vex v ON v.subject_purl = s.artifact_purl
-- add WHERE clauses to prefilter only actionable rows
;
CREATE INDEX mv_open_findings_idx ON mv_open_findings(artifact_purl, comp);

Refresh cadence: on feed import or via a scheduler, e.g. REFRESH MATERIALIZED VIEW CONCURRENTLY mv_open_findings; note that CONCURRENTLY requires a unique index on the view (see §6 of the guide below).

3) Queue without a broker

CREATE TABLE job_queue(
  id BIGSERIAL PRIMARY KEY,
  kind TEXT NOT NULL,          -- e.g., 'scan', 'sbom-diff'
  payload JSONB NOT NULL,
  run_after TIMESTAMPTZ DEFAULT now(),
  attempts INT DEFAULT 0,
  locked_at TIMESTAMPTZ,
  locked_by TEXT
);
CREATE INDEX job_queue_ready_idx ON job_queue(kind, run_after);

-- Worker loop
WITH cte AS (
  SELECT id FROM job_queue
  WHERE kind = $1 AND run_after <= now() AND locked_at IS NULL
  ORDER BY id
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
UPDATE job_queue j
SET locked_at = now(), locked_by = $2
FROM cte
WHERE j.id = cte.id
RETURNING j.*;

Release the claim on failure (locked_at = NULL, locked_by = NULL, attempts = attempts + 1) and delete or mark done on success, as sketched below.
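A minimal sketch of both release paths (the 30-second linear backoff is an illustrative policy, not a project standard):

-- On failure: release the claim and back off before retrying
UPDATE job_queue
SET locked_at = NULL,
    locked_by = NULL,
    attempts  = attempts + 1,
    run_after = now() + interval '30 seconds' * (attempts + 1)  -- illustrative linear backoff
WHERE id = $1;

-- On success: delete for compactness (or update a status column to done)
DELETE FROM job_queue WHERE id = $1;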

4) Advisory lock for singletons

-- Acquire (per tenant, per artifact)
SELECT pg_try_advisory_xact_lock(hashtextextended('recalc:'||tenant||':'||artifact, 0));
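-- Returns true if the lock was acquired (skip the critical section on false);
-- transaction-scoped locks release automatically at COMMIT/ROLLBACK.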

5) Nudge workers without a bus

-- NOTIFY only accepts a string-literal payload, so use pg_notify() for dynamic values:
SELECT pg_notify('stella_scan', json_build_object('purl', $1, 'priority', 5)::text);
-- workers LISTEN stella_scan and enqueue quickly

When to add MongoDB

  • You need interactive exploration over hundreds of millions of nodes/edges (e.g., historical "proof-of-integrity" graphs) where document fan-out and denormalized reads beat relational joins.
  • Snapshot cadence is batchy (hourly/daily), and you can re-emit snapshots deterministically from Postgres (the single source of truth).
  • You want to isolate read spikes from the transactional core.

Snapshot pipe: Postgres → (ETL) → MongoDB collection {graph_id, node, edges[], attrs} with compound shard keys tuned to your UI traversal.
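
A hedged sketch of the export side, assuming a hypothetical graph_node projection table baked from Postgres; it emits one JSON document per node in the collection shape above:

-- graph_node is an illustrative projection table, not an existing StellaOps schema
SELECT json_build_object(
  'graph_id', n.graph_id,
  'node',     n.node_id,
  'edges',    n.edges,   -- jsonb array of adjacent node ids
  'attrs',    n.attrs    -- jsonb attribute bag
) AS mongo_doc
FROM graph_node n
WHERE n.graph_id = $1;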


Why this fits StellaOps

  • Fewer moving parts on-prem/air-gapped.
  • Deterministic replays (PITR + immutable imports).
  • Clear performance levers (GIN indexes, MVs, SKIP LOCKED queues).
  • MongoDB stays optional, purpose-built for giant read graphs, not a default dependency.

The patterns above can be turned into ready-to-run .sql migrations and a small .NET 10 worker (Dapper/EF Core) that implements the queue loop, advisory locks, and LISTEN/NOTIFY hooks. Below is a handoff-ready set of PostgreSQL tables/views engineering guidelines intended for developer review. It is written as a gap-finding checklist with concrete DDL patterns and performance red flags (Postgres as system of record, JSONB where useful, derived projections where needed).


PostgreSQL Tables & Views Engineering Guide

0) Non-negotiable principles

  1. Every hot query must have an index story. If you cannot name the index that serves it, you have a performance gap.
  2. Write path stays simple. Prefer append-only versioning to large updates (especially for JSONB).
  3. Multi-tenant must be explicit. Every core table includes tenant_id and indexes are tenant-prefixed.
  4. Derived data is a product. If the UI needs it fast, model it as a projection table or materialized view, not as an ad-hoc mega-join.
  5. Idempotency is enforced in the DB. Unique keys for imports/jobs/results; no “best effort” dedupe in application only.

1) Table taxonomy and what to look for

Use this to classify every table; each class has different indexing/retention/locking rules.

A. Source-of-truth (SOR) tables

Examples: sbom_document, vex_document, feed_import, scan_manifest, attestation.

  • Expect: immutable rows, versioning via new row inserts.
  • Gaps: frequent updates to large JSONB; missing content_hash; no unique idempotency key.

B. Projection tables (query-optimized)

Examples: open_findings, artifact_risk_summary, component_index.

  • Expect: denormalized, indexed for UI/API; refresh/update strategy defined.
  • Gaps: projections rebuilt from scratch too often; missing incremental update plan; no retention plan.

C. Queue/outbox tables

Examples: job_queue, outbox_events.

  • Expect: SKIP LOCKED claim pattern; retry + DLQ; minimal lock duration.
  • Gaps: holding row locks while doing work; missing partial index for “ready” jobs.

D. Audit/event tables

Examples: scan_run_event, decision_event, access_audit.

  • Expect: append-only; partitioned by time; BRIN on timestamps.
  • Gaps: single huge table without partitioning; slow deletes instead of partition drops.

2) Naming, keys, and required columns

Required columns per class

SOR documents (SBOM/VEX/Attestations)

  • tenant_id uuid
  • id bigserial (internal PK)
  • external_id uuid (optional API-facing id)
  • content_hash bytea (sha256) NOT NULL
  • doc jsonb NOT NULL
  • created_at timestamptz NOT NULL default now()
  • supersedes_id bigint NULL (version chain) OR version int

Checklist

  • Unique constraint exists: (tenant_id, content_hash)
  • Version strategy exists (supersedes/version) and is queryable
  • “Latest” access is index-backed (see §4 and the sketch below)
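
As a sketch, a vex_document table that satisfies this contract (mirroring the sbom_document DDL in §3; column names beyond the required set are assumptions):

CREATE TABLE vex_document (
  id            bigserial PRIMARY KEY,
  tenant_id     uuid NOT NULL,
  external_id   uuid,                          -- optional API-facing id
  subject_purl  text NOT NULL,
  content_hash  bytea NOT NULL,                -- sha256 of the raw doc
  doc           jsonb NOT NULL,
  created_at    timestamptz NOT NULL DEFAULT now(),
  supersedes_id bigint                         -- version chain
);
CREATE UNIQUE INDEX ux_vex_doc_hash ON vex_document(tenant_id, content_hash);
CREATE INDEX ix_vex_doc_latest ON vex_document(tenant_id, subject_purl, created_at DESC);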

Queue

  • tenant_id uuid (if multi-tenant)
  • id bigserial
  • kind text
  • payload jsonb
  • run_after timestamptz
  • attempts int
  • locked_at timestamptz NULL
  • locked_by text NULL
  • status smallint (optional; e.g., ready/running/done/dead)

Checklist

  • “Ready to claim” has a partial index (see §4)
  • Claim transaction is short (claim+commit; work outside lock)

3) JSONB rules that prevent “looks fine → melts in prod”

When JSONB is appropriate

  • Storing signed envelopes (DSSE), SBOM/VEX raw docs, vendor payloads.
  • Ingest-first scenarios where schema evolves.

When JSONB is a performance hazard

  • You frequently query deep keys/arrays (components, vulnerabilities, call paths).
  • You need sorting/aggregations on doc fields.

Mandatory pattern for hot JSON fields

  1. Keep the raw JSONB for fidelity.
  2. Extract hot keys into stored generated columns (or real columns), index those.
  3. Extract hot arrays into child tables (components, vulnerabilities).

Example:

CREATE TABLE sbom_document (
  id           bigserial PRIMARY KEY,
  tenant_id    uuid NOT NULL,
  artifact_purl text NOT NULL,
  content_hash bytea NOT NULL,
  doc          jsonb NOT NULL,
  created_at   timestamptz NOT NULL DEFAULT now(),

  -- hot keys as generated columns
  bom_format   text GENERATED ALWAYS AS ((doc->>'bomFormat')) STORED,
  spec_version text GENERATED ALWAYS AS ((doc->>'specVersion')) STORED
);

CREATE UNIQUE INDEX ux_sbom_doc_hash ON sbom_document(tenant_id, content_hash);
CREATE INDEX ix_sbom_doc_tenant_artifact ON sbom_document(tenant_id, artifact_purl, created_at DESC);
CREATE INDEX ix_sbom_doc_json_gin ON sbom_document USING GIN (doc jsonb_path_ops);
CREATE INDEX ix_sbom_doc_bomformat ON sbom_document(tenant_id, bom_format);
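
For step 3 (hot arrays into child tables), a sketch of the sbom_component table referenced in §11; the column set is an assumption, populated at ingest time from doc->'components':

CREATE TABLE sbom_component (
  sbom_id   bigint NOT NULL REFERENCES sbom_document(id),
  tenant_id uuid   NOT NULL,
  name      text   NOT NULL,
  version   text,
  purl      text
);
CREATE INDEX ix_sbom_component_lookup ON sbom_component(tenant_id, name, version);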

Checklist

  • Any query using doc->> in WHERE has either an expression index or a generated column index
  • Any query using jsonb_array_elements(...) in hot path has been replaced by a normalized child table or a projection table

4) Indexing standards (what devs must justify)

Core rules

  1. Tenant-first: INDEX(tenant_id, …) for anything read per tenant.
  2. Sort support: if query uses ORDER BY created_at DESC, index must end with created_at DESC.
  3. Partial indexes for sparse predicates (status/locked flags).
  4. BRIN for massive append-only time series.
  5. GIN jsonb_path_ops for containment (@>) on JSONB; avoid GIN for everything.

Required index patterns by use case

“Latest version per artifact”

If you store versions as rows:

-- supports: WHERE tenant_id=? AND artifact_purl=? ORDER BY created_at DESC LIMIT 1
CREATE INDEX ix_sbom_latest ON sbom_document(tenant_id, artifact_purl, created_at DESC);

Ready queue claims

CREATE INDEX ix_job_ready
ON job_queue(kind, run_after, id)
WHERE locked_at IS NULL;

-- Optional: tenant scoped
CREATE INDEX ix_job_ready_tenant
ON job_queue(tenant_id, kind, run_after, id)
WHERE locked_at IS NULL;

JSON key lookup (expression index)

-- supports: WHERE (doc->>'subject') = ?
CREATE INDEX ix_vex_subject_expr
ON vex_document(tenant_id, (doc->>'subject'));

Massive event table time filtering

CREATE INDEX brin_scan_events_time
ON scan_run_event USING BRIN (occurred_at);

Red flags

  • GIN index on a JSONB column + frequent updates = bloat and write amplification.
  • No partial index for queue readiness → sequential scans under load.
  • Composite indexes with wrong leading column order (e.g., created_at, tenant_id) → not used.

5) Partitioning and retention (avoid “infinite tables”)

Use partitioning for:

  • audit/events
  • scan run logs
  • large finding histories
  • anything beyond tens of millions of rows with time-based access

Standard approach

  • Partition by occurred_at (monthly) for event/audit tables.
  • Retention by dropping partitions (fast and vacuum-free).

Example:

CREATE TABLE scan_run_event (
  tenant_id uuid NOT NULL,
  scan_run_id bigint NOT NULL,
  occurred_at timestamptz NOT NULL,
  event_type text NOT NULL,
  payload jsonb NOT NULL
) PARTITION BY RANGE (occurred_at);
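
A sketch of rollover and retention against this table (partition names and dates are illustrative):

-- Rollover: create next month's partition ahead of time (scheduler or migration)
CREATE TABLE scan_run_event_2026_01
  PARTITION OF scan_run_event
  FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE INDEX ON scan_run_event_2026_01 USING BRIN (occurred_at);

-- Retention: drop whole partitions instead of DELETE
DROP TABLE scan_run_event_2025_01;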

Checklist

  • Partition creation/rollover process exists (migration or scheduler)
  • Retention is “DROP PARTITION”, not “DELETE WHERE occurred_at < …”
  • Each partition has needed local indexes (BRIN/time + tenant filters)

6) Views vs Materialized Views vs Projection Tables

Use a normal VIEW when

  • It's thin (renaming columns, simple joins) and not used in hot paths.

Use a MATERIALIZED VIEW when

  • It accelerates complex joins/aggregations and can be refreshed on a schedule.
  • You can tolerate refresh lag.

Materialized view requirements

  • Must have a unique index to use REFRESH … CONCURRENTLY.
  • Refresh must be outside an explicit transaction block.

Example:

CREATE MATERIALIZED VIEW mv_artifact_risk AS
SELECT tenant_id, artifact_purl, max(score) AS risk_score
FROM open_findings
GROUP BY tenant_id, artifact_purl;

CREATE UNIQUE INDEX ux_mv_artifact_risk
ON mv_artifact_risk(tenant_id, artifact_purl);
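
-- With the unique index in place, refresh without blocking readers:
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_artifact_risk;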

Prefer projection tables over MV when

  • You need incremental updates (on import/scan completion).
  • You need deterministic “point-in-time” snapshots per manifest.

Checklist

  • Every MV has refresh cadence + owner (which worker/job triggers it)
  • UI/API queries do not depend on a heavy non-materialized view
  • If “refresh cost” scales with whole dataset, projection table exists instead

7) Queue and outbox patterns that do not deadlock

Claim pattern (short transaction)

WITH cte AS (
  SELECT id
  FROM job_queue
  WHERE kind = $1
    AND run_after <= now()
    AND locked_at IS NULL
  ORDER BY id
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
UPDATE job_queue j
SET locked_at = now(),
    locked_by = $2
FROM cte
WHERE j.id = cte.id
RETURNING j.*;

Rules

  • Claim + commit quickly.
  • Do work outside the lock.
  • On completion: update row to done (or delete if you want compactness).
  • On failure: increment attempts, set run_after = now() + backoff, release lock.

Checklist

  • Worker does not keep transaction open while scanning/importing
  • Backoff policy is encoded (in DB columns) and observable
  • DLQ condition exists (attempts > N) and is queryable (sketched below)
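
To make the DLQ condition queryable, a minimal sketch (the threshold N = 10 is an illustrative choice):

CREATE VIEW job_queue_dead AS
SELECT id, kind, payload, attempts, locked_by
FROM job_queue
WHERE attempts > 10;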

8) Query performance review checklist (what to require in PRs)

For each new endpoint/query:

  • Provide the query (SQL) and the intended parameters.
  • Provide EXPLAIN (ANALYZE, BUFFERS) from a dataset size that resembles staging.
  • Identify the serving index(es).
  • Confirm row estimates are not wildly wrong (if they are: stats or predicate mismatch).
  • Confirm it is tenant-scoped and uses the tenant-leading index.

Common fixes

  • Replace IN (SELECT …) with EXISTS for correlated checks (sketched below).
  • Back ORDER BY … LIMIT queries with an index that matches the ordering.
  • Avoid exploding joins with JSON arrays; pre-extract into child tables.
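
For the first fix, a sketch of the IN → EXISTS rewrite using the §3/§4 tables:

-- Correlated EXISTS lets the planner stop at the first matching row:
SELECT s.artifact_purl
FROM sbom_document s
WHERE EXISTS (
  SELECT 1
  FROM vex_document v
  WHERE v.tenant_id = s.tenant_id
    AND (v.doc->>'subject') = s.artifact_purl
);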

9) Vacuum, bloat, and “why is disk growing”

Design to avoid bloat

  • Append-only for large docs and events.
  • If frequent updates are needed, isolate hot-updated columns into a smaller table.

Example split (DDL sketched below):

  • job_queue_payload (stable)
  • job_queue_state (locked/status/attempts updated frequently)
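
Sketched DDL for that split (columns mirror the queue contract in §2):

-- Stable payload: written once, never updated
CREATE TABLE job_queue_payload (
  id      bigserial PRIMARY KEY,
  kind    text  NOT NULL,
  payload jsonb NOT NULL
);

-- Hot state: small rows, frequent updates, cheap to vacuum
CREATE TABLE job_queue_state (
  job_id    bigint PRIMARY KEY REFERENCES job_queue_payload(id),
  run_after timestamptz NOT NULL DEFAULT now(),
  attempts  int NOT NULL DEFAULT 0,
  locked_at timestamptz,
  locked_by text
);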

Checklist

  • Large frequently-updated JSONB tables have been questioned
  • Updates do not rewrite big TOAST values repeatedly
  • Retention is partition-drop where possible

10) Migration safety rules (prevent production locks)

  • Index creation: CREATE INDEX CONCURRENTLY.

  • Dropping indexes: DROP INDEX CONCURRENTLY.

  • New column with default on large table (staged as sketched below):

    1. ADD COLUMN nullable
    2. backfill in batches
    3. ALTER COLUMN SET NOT NULL
    4. add default if needed
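
An illustrative staging of those steps (external_id is a hypothetical column name):

ALTER TABLE sbom_document ADD COLUMN external_id uuid;  -- step 1: nullable, metadata-only

-- step 2: backfill in bounded batches; repeat until 0 rows are updated
UPDATE sbom_document
SET external_id = gen_random_uuid()  -- gen_random_uuid(): core in PG13+, else pgcrypto
WHERE id IN (
  SELECT id FROM sbom_document
  WHERE external_id IS NULL
  LIMIT 10000
);

-- step 3: brief exclusive lock plus a validating scan
ALTER TABLE sbom_document ALTER COLUMN external_id SET NOT NULL;
-- step 4: adding a default afterwards is metadata-only
ALTER TABLE sbom_document ALTER COLUMN external_id SET DEFAULT gen_random_uuid();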

Checklist

  • No long-running ALTER TABLE on huge tables without plan
  • Any new NOT NULL constraint is staged safely

11) StellaOps-specific schema guidance (SBOM/VEX/Finding)

Even if you keep raw SBOM/VEX JSON:

  • sbom_document (raw, immutable)
  • sbom_component (extracted components)
  • vex_document (raw, immutable)
  • vex_statement (extracted statements per CVE/component)
  • finding (facts: CVE ↔ component ↔ artifact ↔ scan_run)
  • scan_manifest (determinism: feed versions/hashes, policy hash)
  • scan_run (links results to manifest)

Key gap detectors

  • If “find all artifacts affected by CVE X” is slow → missing finding indexing (see the sketch below).
  • If “component search” is slow → missing sbom_component and its indexes.
  • If “replay this scan” is not exact → missing scan_manifest + feed import hashes.
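
A sketch of the finding fact table and the index behind the first gap detector (the column set is an assumed minimum):

-- Minimal finding fact table; serves "all artifacts affected by CVE X"
CREATE TABLE finding (
  tenant_id     uuid   NOT NULL,
  cve_id        text   NOT NULL,
  component_id  bigint NOT NULL,  -- FK to sbom_component (assumed)
  artifact_purl text   NOT NULL,
  scan_run_id   bigint NOT NULL   -- FK to scan_run (assumed)
);
CREATE INDEX ix_finding_by_cve ON finding(tenant_id, cve_id, artifact_purl);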

12) Minimal “definition of done” for a new table/view

A PR adding a table/view is incomplete unless it includes:

  • Table classification (SOR / projection / queue / event)
  • Primary key and idempotency unique key
  • Tenant scoping strategy
  • Index plan mapped to known queries
  • Retention plan (especially for event/projection tables)
  • Refresh/update plan if derived
  • Example query + EXPLAIN for the top 1-3 access patterns

This guide is intended to land as a single drop-in repo document (e.g., /docs/platform/postgres-table-view-guidelines.md), formatted as a team-facing guideline and accompanied by a one-page “Architecture/Performance Gaps” review form that engineers can paste into PR descriptions.