Here’s a crisp, opinionated storage blueprint you can hand to your Stella Ops devs right now, plus zero‑downtime conversion tactics so you can keep prototyping fast without painting yourself into a corner. # Module → store map (deterministic by default) * **Authority / OAuth / Accounts & Audit** * **PostgreSQL** as the primary source of truth. * Tables: `users`, `clients`, `oauth_tokens`, `roles`, `grants`, `audit_log`. * **Row‑Level Security (RLS)** on `users`, `grants`, `audit_log`; **STRICT FK + CHECK** constraints; **immutable UUID PKs**. * **Audit**: `audit_log(actor_id, action, entity, entity_id, at timestamptz default now(), diff jsonb)`. * **Why**: ACID + RLS keeps authz decisions and audit trails deterministic and reviewable. * **VEX & Vulnerability Writes** * **PostgreSQL** with **JSONB facts + relational decisions**. * Tables: `vuln_fact(jsonb)`, `vex_decision(package_id, vuln_id, status, rationale, proof_ref, updated_at)`. * **Materialized views** for triage queues, e.g. `mv_triage_hotset` (refresh on commit or scheduled). * **Why**: JSONB lets you ingest vendor‑shaped docs; decisions stay relational for joins, integrity, and explainability. * **Routing / Feature Flags / Rate‑limits** * **PostgreSQL** (truth) + **Redis** (cache). * Tables: `feature_flag(key, rules jsonb, version)`, `route(domain, service, instance_id, last_heartbeat)`, `rate_limiter(bucket, quota, interval)`. * Redis keys: `flag:{key}:{version}`, `route:{domain}`, `rl:{bucket}` with short TTLs. * **Why**: one canonical RDBMS for consistency; Redis for hot path latency. * **Unknowns Registry (ambiguity tracker)** * **PostgreSQL** with **temporal tables** (bitemporal pattern via `valid_from/valid_to`, `sys_from/sys_to`). * Table: `unknowns(subject_hash, kind, context jsonb, valid_from, valid_to, sys_from default now(), sys_to)`. * Views: `unknowns_current` where `valid_to is null`. * **Why**: preserves how/when uncertainty changed (critical for proofs and audits). * **Artifacts / SBOM / VEX files** * **OCI‑compatible CAS** (e.g., self‑hosted registry or MinIO bucket as content‑addressable store). * Keys by **digest** (`sha256:...`), metadata in Postgres `artifact(index)` with `digest`, `media_type`, `size`, `signatures`. * **Why**: blobs don’t belong in your RDBMS; use CAS for scale + cryptographic addressing. --- # PostgreSQL implementation essentials (copy/paste starters) * **RLS scaffold (Authority)**: ```sql alter table audit_log enable row level security; create policy p_audit_read_self on audit_log for select using (actor_id = current_setting('app.user_id')::uuid or exists (select 1 from grants g where g.user_id = current_setting('app.user_id')::uuid and g.role = 'auditor')); ``` * **JSONB facts + relational decisions**: ```sql create table vuln_fact ( id uuid primary key default gen_random_uuid(), source text not null, payload jsonb not null, received_at timestamptz default now() ); create table vex_decision ( package_id uuid not null, vuln_id text not null, status text check (status in ('not_affected','affected','fixed','under_investigation')), rationale text, proof_ref text, decided_at timestamptz default now(), primary key (package_id, vuln_id) ); ``` * **Materialized view for triage**: ```sql create materialized view mv_triage_hotset as select v.id as fact_id, v.payload->>'vuln' as vuln, v.received_at from vuln_fact v where (now() - v.received_at) < interval '7 days'; -- refresh concurrently via job ``` * **Temporal pattern (Unknowns)**: ```sql create table unknowns ( id uuid primary key default gen_random_uuid(), subject_hash text not null, kind text not null, context jsonb not null, valid_from timestamptz not null default now(), valid_to timestamptz, sys_from timestamptz not null default now(), sys_to timestamptz ); create view unknowns_current as select * from unknowns where valid_to is null; ``` --- # Conversion (not migration): zero‑downtime, prototype‑friendly Even if you’re “not migrating anything yet,” set these rails now so cutting over later is painless. 1. **Encode Mongo‑shaped docs into JSONB with versioned schemas** * Ingest pipeline writes to `*_fact(payload jsonb, schema_version int)`. * Add a **`validate(schema_version, payload)`** step in your service layer (JSON Schema or SQL checks). * Keep a **forward‑compatible view** that projects stable columns from JSONB (e.g., `payload->>'id' as vendor_id`) so downstream code doesn’t break when payload evolves. 2. **Outbox pattern for exactly‑once side‑effects** * Add `outbox(id, topic, key, payload jsonb, created_at, dispatched bool default false)`. * On the same transaction as your write, insert the outbox row. * A background dispatcher reads `dispatched=false`, publishes to MQ/Webhook, then marks `dispatched=true`. * Guarantees: no lost events, no duplicates to external systems. 3. **Parallel read adapters behind feature flags** * Keep old readers (e.g., Mongo driver) and new Postgres readers in the same service. * Gate by `feature_flag('pg_reads')` per tenant or env; flip gradually. * Add a **read‑diff monitor** that compares results and logs mismatches to `audit_log(diff)`. 4. **CDC for analytics without coupling** * Enable **logical replication** (pgoutput) on your key tables. * Stream changes into analyzers (reachability, heuristics) without hitting primaries. * This lets you keep OLTP clean and still power dashboards/tests. 5. **Materialized views & job cadence** * Refresh `mv_*` on a fixed cadence (e.g., every 2–5 minutes) or post‑commit for hot paths. * Keep **“cold path”** analytics in separate schemas (`analytics.*`) sourced from CDC. 6. **Cutover playbook (phased)** * Phase A (Dark Read): write Postgres, still serve from Mongo; compare results silently. * Phase B (Shadow Serve): 5–10% traffic from Postgres via flag; auto‑rollback switch. * Phase C (Authoritative): Postgres becomes source; Mongo path left for emergency read‑only. * Phase D (Retire): freeze Mongo, back up, remove writes, delete code paths after 2 stable sprints. --- # Rate‑limits & flags: single truth, fast edges * **Truth in Postgres** with versioned flag docs: ```sql create table feature_flag ( key text primary key, rules jsonb not null, version int not null default 1, updated_at timestamptz default now() ); ``` * **Edge cache** in Redis: * `SETEX flag:{key}:{version} ` * On update, bump `version`; readers compose cache key with version (cache‑busting without deletes). * **Rate limiting**: Persist quotas in Postgres; counters in Redis (`INCR rl:{bucket}:{window}`), with periodic reconciliation jobs writing summaries back to Postgres for audits. --- # CAS for SBOM/VEX/attestations * Push blobs to OCI/MinIO by digest; store only pointers in Postgres: ```sql create table artifact_index ( digest text primary key, media_type text not null, size bigint not null, created_at timestamptz default now(), signature_refs jsonb ); ``` * Benefits: immutable, deduped, easy to mirror into offline kits. --- # Guardrails your team should follow * **Always** wrap multi‑table writes (facts + outbox + decisions) in a single transaction. * **Prefer** `jsonb_path_query` for targeted reads; **avoid** scanning entire payloads. * **Enforce** RLS + least‑privilege roles; application sets `app.user_id` at session start. * **Version everything**: schemas, flags, materialized views; never “change in place” without bumping version. * **Observability**: expose `pg_stat_statements`, refresh latency for `mv_*`, outbox lag, Redis hit ratio, and RLS policy hits. --- If you want, I can turn this into: * ready‑to‑run **EF Core 10** migrations, * a **/docs/architecture/store-map.md** for your repo, * and a tiny **dev seed** (Docker + sample data) so the team can poke it immediately. Here’s a focused “PostgreSQL patterns per module” doc you can hand straight to your StellaOps devs. --- # StellaOps – PostgreSQL Patterns per Module **Scope:** How each StellaOps module should use PostgreSQL: schema patterns, constraints, RLS, indexing, and transaction rules. --- ## 0. Cross‑cutting PostgreSQL Rules These apply everywhere unless explicitly overridden. ### 0.1 Core conventions * **Schemas** * Use **one logical schema** per module: `authority`, `routing`, `vex`, `unknowns`, `artifact`. * Shared utilities (e.g., `outbox`) live in a `core` schema. * **Naming** * Tables: `snake_case`, singular: `user`, `feature_flag`, `vuln_fact`. * PK: `id uuid primary key`. * FKs: `_id` (e.g., `user_id`, `tenant_id`). * Timestamps: * `created_at timestamptz not null default now()` * `updated_at timestamptz not null default now()` * **Multi‑tenancy** * All tenant‑scoped tables must have `tenant_id uuid not null`. * Enforce tenant isolation with **RLS** on `tenant_id`. * **Time & timezones** * Always `timestamptz`, always store **UTC**, let the DB default `now()`. ### 0.2 RLS & security * RLS must be **enabled** on any table reachable from a user‑initiated path. * Every session must set: ```sql select set_config('app.user_id', '', false); select set_config('app.tenant_id', '', false); select set_config('app.roles', 'role1,role2', false); ``` * RLS policies: * Base policy: `tenant_id = current_setting('app.tenant_id')::uuid`. * Extra predicates for per‑user privacy (e.g., only see own tokens, only own API clients). * DB users: * Each module’s service has its **own role** with access only to its schema + `core.outbox`. ### 0.3 JSONB & versioning * Any JSONB column must have: * `payload jsonb not null`, * `schema_version int not null`. * Always index: * by source (`source` / `origin`), * by a small set of projected fields used in WHERE clauses. ### 0.4 Migrations * All schema changes via migrations, forward‑only. * Backwards‑compat pattern: 1. Add new columns / tables. 2. Backfill. 3. Flip code to use new structure (behind a feature flag). 4. After stability, remove old columns/paths. --- ## 1. Authority Module (auth, accounts, audit) **Schema:** `authority.*` **Mission:** identity, OAuth, roles, grants, audit. ### 1.1 Core tables & patterns * `authority.user` ```sql create table authority.user ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, email text not null, display_name text not null, is_disabled boolean not null default false, created_at timestamptz not null default now(), updated_at timestamptz not null default now(), unique (tenant_id, email) ); ``` * Never hard‑delete users: use `is_disabled` (and optionally `disabled_at`). * `authority.role` ```sql create table authority.role ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, name text not null, description text, created_at timestamptz not null default now(), updated_at timestamptz not null default now(), unique (tenant_id, name) ); ``` * `authority.grant` ```sql create table authority.grant ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, user_id uuid not null references authority.user(id), role_id uuid not null references authority.role(id), created_at timestamptz not null default now(), unique (tenant_id, user_id, role_id) ); ``` * `authority.oauth_client`, `authority.oauth_token` * Enforce token uniqueness: ```sql create table authority.oauth_token ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, user_id uuid not null references authority.user(id), client_id uuid not null references authority.oauth_client(id), token_hash text not null, -- hash, never raw expires_at timestamptz not null, created_at timestamptz not null default now(), revoked_at timestamptz, unique (token_hash) ); ``` ### 1.2 Audit log pattern * `authority.audit_log` ```sql create table authority.audit_log ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, actor_id uuid, -- null for system action text not null, entity_type text not null, entity_id uuid, at timestamptz not null default now(), diff jsonb not null ); ``` * Insert audit rows in the **same transaction** as the change. ### 1.3 RLS patterns * Base RLS: ```sql alter table authority.user enable row level security; create policy p_user_tenant on authority.user for all using (tenant_id = current_setting('app.tenant_id')::uuid); ``` * Extra policies: * Audit log is visible only to: * actor themself, or * users with an `auditor` or `admin` role. --- ## 2. Routing & Feature Flags Module **Schema:** `routing.*` **Mission:** where instances live, what features are on, rate‑limit configuration. ### 2.1 Feature flags * `routing.feature_flag` ```sql create table routing.feature_flag ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, key text not null, rules jsonb not null, version int not null default 1, is_enabled boolean not null default true, created_at timestamptz not null default now(), updated_at timestamptz not null default now(), unique (tenant_id, key) ); ``` * **Immutability by version**: * On update, **increment `version`**, don’t overwrite historical data. * Mirror changes into a history table via trigger: ```sql create table routing.feature_flag_history ( id uuid primary key default gen_random_uuid(), feature_flag_id uuid not null references routing.feature_flag(id), tenant_id uuid not null, key text not null, rules jsonb not null, version int not null, changed_at timestamptz not null default now(), changed_by uuid ); ``` ### 2.2 Instance registry * `routing.instance` ```sql create table routing.instance ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, instance_key text not null, domain text not null, last_heartbeat timestamptz not null default now(), status text not null check (status in ('active','draining','offline')), created_at timestamptz not null default now(), updated_at timestamptz not null default now(), unique (tenant_id, instance_key), unique (tenant_id, domain) ); ``` * Pattern: * Heartbeats use `update ... set last_heartbeat = now()` without touching other fields. * Routing logic filters by `status='active'` and heartbeat recency. ### 2.3 Rate‑limit configuration * Config in Postgres, counters in Redis: ```sql create table routing.rate_limit_config ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, key text not null, limit_per_interval int not null, interval_seconds int not null, created_at timestamptz not null default now(), updated_at timestamptz not null default now(), unique (tenant_id, key) ); ``` --- ## 3. VEX & Vulnerability Module **Schema:** `vex.*` **Mission:** ingest vulnerability facts, keep decisions & triage state. ### 3.1 Facts as JSONB * `vex.vuln_fact` ```sql create table vex.vuln_fact ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, source text not null, -- e.g. "nvd", "vendor_x_vex" external_id text, -- e.g. CVE, advisory id payload jsonb not null, schema_version int not null, received_at timestamptz not null default now() ); ``` * Index patterns: ```sql create index on vex.vuln_fact (tenant_id, source); create index on vex.vuln_fact (tenant_id, external_id); create index vuln_fact_payload_gin on vex.vuln_fact using gin (payload); ``` ### 3.2 Decisions as relational data * `vex.package` ```sql create table vex.package ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, name text not null, version text not null, ecosystem text not null, -- e.g. "pypi", "npm" created_at timestamptz not null default now(), unique (tenant_id, name, version, ecosystem) ); ``` * `vex.vex_decision` ```sql create table vex.vex_decision ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, package_id uuid not null references vex.package(id), vuln_id text not null, status text not null check (status in ( 'not_affected', 'affected', 'fixed', 'under_investigation' )), rationale text, proof_ref text, -- CAS digest or URL decided_by uuid, decided_at timestamptz not null default now(), created_at timestamptz not null default now(), updated_at timestamptz not null default now(), unique (tenant_id, package_id, vuln_id) ); ``` * For history: * Keep current state in `vex_decision`. * Mirror previous versions into `vex_decision_history` table (similar to feature flags). ### 3.3 Triage queues with materialized views * Example triage view: ```sql create materialized view vex.mv_triage_queue as select d.tenant_id, p.name, p.version, d.vuln_id, d.status, d.decided_at from vex.vex_decision d join vex.package p on p.id = d.package_id where d.status = 'under_investigation'; ``` * Refresh options: * Scheduled refresh (cron/worker). * Or **incremental** via triggers (more complex; use only when needed). ### 3.4 RLS for VEX * All tables scoped by `tenant_id`. * Typical policy: ```sql alter table vex.vex_decision enable row level security; create policy p_vex_tenant on vex.vex_decision for all using (tenant_id = current_setting('app.tenant_id')::uuid); ``` --- ## 4. Unknowns Module **Schema:** `unknowns.*` **Mission:** represent uncertainty and how it changes over time. ### 4.1 Bitemporal unknowns table * `unknowns.unknown` ```sql create table unknowns.unknown ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, subject_hash text not null, -- stable identifier for "thing" being reasoned about kind text not null, -- e.g. "reachability", "version_inferred" context jsonb not null, -- extra info: call graph node, evidence, etc. valid_from timestamptz not null default now(), valid_to timestamptz, sys_from timestamptz not null default now(), sys_to timestamptz, created_at timestamptz not null default now() ); ``` * “Exactly one open unknown per subject/kind” pattern: ```sql create unique index unknown_one_open_per_subject on unknowns.unknown (tenant_id, subject_hash, kind) where valid_to is null; ``` ### 4.2 Closing an unknown * Close by setting `valid_to` and `sys_to`: ```sql update unknowns.unknown set valid_to = now(), sys_to = now() where id = :id and valid_to is null; ``` * Never hard-delete; keep all rows for audit/explanation. ### 4.3 Convenience views * Current unknowns: ```sql create view unknowns.current as select * from unknowns.unknown where valid_to is null; ``` ### 4.4 RLS * Same tenant policy as other modules; unknowns are tenant‑scoped. --- ## 5. Artifact Index / CAS Module **Schema:** `artifact.*` **Mission:** index of immutable blobs stored in OCI / S3 / MinIO etc. ### 5.1 Artifact index * `artifact.artifact` ```sql create table artifact.artifact ( digest text primary key, -- e.g. "sha256:..." tenant_id uuid not null, media_type text not null, size_bytes bigint not null, created_at timestamptz not null default now(), created_by uuid ); ``` * Validate digest shape with a CHECK: ```sql alter table artifact.artifact add constraint chk_digest_format check (digest ~ '^sha[0-9]+:[0-9a-fA-F]{32,}$'); ``` ### 5.2 Signatures and tags * `artifact.signature` ```sql create table artifact.signature ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, artifact_digest text not null references artifact.artifact(digest), signer text not null, signature_payload jsonb not null, created_at timestamptz not null default now() ); ``` * `artifact.tag` ```sql create table artifact.tag ( id uuid primary key default gen_random_uuid(), tenant_id uuid not null, name text not null, artifact_digest text not null references artifact.artifact(digest), created_at timestamptz not null default now(), unique (tenant_id, name) ); ``` ### 5.3 RLS * Ensure that tenants cannot see each other’s digests, even if the CAS backing store is shared: ```sql alter table artifact.artifact enable row level security; create policy p_artifact_tenant on artifact.artifact for all using (tenant_id = current_setting('app.tenant_id')::uuid); ``` --- ## 6. Shared Outbox / Event Pattern **Schema:** `core.*` **Mission:** reliable events for external side‑effects. ### 6.1 Outbox table * `core.outbox` ```sql create table core.outbox ( id uuid primary key default gen_random_uuid(), tenant_id uuid, aggregate_type text not null, -- e.g. "vex_decision", "feature_flag" aggregate_id uuid, topic text not null, payload jsonb not null, created_at timestamptz not null default now(), dispatched_at timestamptz, dispatch_attempts int not null default 0, error text ); ``` ### 6.2 Usage rule * For anything that must emit an event (webhook, Kafka, notifications): * In the **same transaction** as the change: * write primary data (e.g. `vex.vex_decision`), * insert an `outbox` row. * A background worker: * pulls undelivered rows, * sends to external system, * updates `dispatched_at`/`dispatch_attempts`/`error`. --- ## 7. Indexing & Query Patterns per Module ### 7.1 Authority * Index: * `user(tenant_id, email)` * `grant(tenant_id, user_id)` * `oauth_token(token_hash)` * Typical query patterns: * Look up user by `tenant_id + email`. * All roles/grants for a user; design composite indexes accordingly. ### 7.2 Routing & Flags * Index: * `feature_flag(tenant_id, key)` * partial index on enabled flags: ```sql create index on routing.feature_flag (tenant_id, key) where is_enabled; ``` * `instance(tenant_id, status)`, `instance(tenant_id, domain)`. ### 7.3 VEX * Index: * `package(tenant_id, name, version, ecosystem)` * `vex_decision(tenant_id, package_id, vuln_id)` * GIN on `vuln_fact.payload` for flexible querying. ### 7.4 Unknowns * Index: * unique open unknown per subject/kind (shown above). * `unknown(tenant_id, kind)` for filtering by kind. ### 7.5 Artifact * Index: * PK on `digest`. * `signature(tenant_id, artifact_digest)`. * `tag(tenant_id, name)`. --- ## 8. Transaction & Isolation Guidelines * Default isolation: **READ COMMITTED**. * For critical sequences (e.g., provisioning a tenant, bulk role updates): * consider **REPEATABLE READ** or **SERIALIZABLE** and keep transactions short. * Pattern: * One transaction per logical user action (e.g., “set flag”, “record decision”). * Never do long‑running external calls inside a database transaction. --- If you’d like, next step I can turn this into: * concrete `CREATE SCHEMA` + `CREATE TABLE` migration files, and * a short “How to write queries in each module” cheat‑sheet for devs (with example SELECT/INSERT/UPDATE patterns).