Files
git.stella-ops.org/docs/product-advisories/01-Dec-2025 - PostgreSQL Patterns for Each StellaOps Module.md
2025-12-01 17:50:11 +02:00

23 KiB
Raw Blame History

Heres a crisp, opinionated storage blueprint you can hand to your StellaOps devs right now, plus zerodowntime conversion tactics so you can keep prototyping fast without painting yourself into a corner.

Module → store map (deterministic by default)

  • Authority / OAuth / Accounts & Audit

    • PostgreSQL as the primary source of truth.
    • Tables: users, clients, oauth_tokens, roles, grants, audit_log.
    • RowLevel Security (RLS) on users, grants, audit_log; STRICT FK + CHECK constraints; immutable UUID PKs.
    • Audit: audit_log(actor_id, action, entity, entity_id, at timestamptz default now(), diff jsonb).
    • Why: ACID + RLS keeps authz decisions and audit trails deterministic and reviewable.
  • VEX & Vulnerability Writes

    • PostgreSQL with JSONB facts + relational decisions.
    • Tables: vuln_fact(jsonb), vex_decision(package_id, vuln_id, status, rationale, proof_ref, updated_at).
    • Materialized views for triage queues, e.g. mv_triage_hotset (refresh on commit or scheduled).
    • Why: JSONB lets you ingest vendorshaped docs; decisions stay relational for joins, integrity, and explainability.
  • Routing / Feature Flags / Ratelimits

    • PostgreSQL (truth) + Redis (cache).
    • Tables: feature_flag(key, rules jsonb, version), route(domain, service, instance_id, last_heartbeat), rate_limiter(bucket, quota, interval).
    • Redis keys: flag:{key}:{version}, route:{domain}, rl:{bucket} with short TTLs.
    • Why: one canonical RDBMS for consistency; Redis for hot path latency.
  • Unknowns Registry (ambiguity tracker)

    • PostgreSQL with temporal tables (bitemporal pattern via valid_from/valid_to, sys_from/sys_to).
    • Table: unknowns(subject_hash, kind, context jsonb, valid_from, valid_to, sys_from default now(), sys_to).
    • Views: unknowns_current where valid_to is null.
    • Why: preserves how/when uncertainty changed (critical for proofs and audits).
  • Artifacts / SBOM / VEX files

    • OCIcompatible CAS (e.g., selfhosted registry or MinIO bucket as contentaddressable store).
    • Keys by digest (sha256:...), metadata in Postgres artifact(index) with digest, media_type, size, signatures.
    • Why: blobs dont belong in your RDBMS; use CAS for scale + cryptographic addressing.

PostgreSQL implementation essentials (copy/paste starters)

  • RLS scaffold (Authority):

    alter table audit_log enable row level security;
    create policy p_audit_read_self
      on audit_log for select
      using (actor_id = current_setting('app.user_id')::uuid or
             exists (select 1 from grants g where g.user_id = current_setting('app.user_id')::uuid and g.role = 'auditor'));
    
  • JSONB facts + relational decisions:

    create table vuln_fact (
      id uuid primary key default gen_random_uuid(),
      source text not null,
      payload jsonb not null,
      received_at timestamptz default now()
    );
    
    create table vex_decision (
      package_id uuid not null,
      vuln_id text not null,
      status text check (status in ('not_affected','affected','fixed','under_investigation')),
      rationale text,
      proof_ref text,
      decided_at timestamptz default now(),
      primary key (package_id, vuln_id)
    );
    
  • Materialized view for triage:

    create materialized view mv_triage_hotset as
    select v.id as fact_id, v.payload->>'vuln' as vuln, v.received_at
    from vuln_fact v
    where (now() - v.received_at) < interval '7 days';
    -- refresh concurrently via job
    
  • Temporal pattern (Unknowns):

    create table unknowns (
      id uuid primary key default gen_random_uuid(),
      subject_hash text not null,
      kind text not null,
      context jsonb not null,
      valid_from timestamptz not null default now(),
      valid_to   timestamptz,
      sys_from   timestamptz not null default now(),
      sys_to     timestamptz
    );
    
    create view unknowns_current as
    select * from unknowns where valid_to is null;
    

Conversion (not migration): zerodowntime, prototypefriendly

Even if youre “not migrating anything yet,” set these rails now so cutting over later is painless.

  1. Encode Mongoshaped docs into JSONB with versioned schemas
  • Ingest pipeline writes to *_fact(payload jsonb, schema_version int).
  • Add a validate(schema_version, payload) step in your service layer (JSON Schema or SQL checks).
  • Keep a forwardcompatible view that projects stable columns from JSONB (e.g., payload->>'id' as vendor_id) so downstream code doesnt break when payload evolves.
  1. Outbox pattern for exactlyonce sideeffects
  • Add outbox(id, topic, key, payload jsonb, created_at, dispatched bool default false).
  • On the same transaction as your write, insert the outbox row.
  • A background dispatcher reads dispatched=false, publishes to MQ/Webhook, then marks dispatched=true.
  • Guarantees: no lost events, no duplicates to external systems.
  1. Parallel read adapters behind feature flags
  • Keep old readers (e.g., Mongo driver) and new Postgres readers in the same service.
  • Gate by feature_flag('pg_reads') per tenant or env; flip gradually.
  • Add a readdiff monitor that compares results and logs mismatches to audit_log(diff).
  1. CDC for analytics without coupling
  • Enable logical replication (pgoutput) on your key tables.
  • Stream changes into analyzers (reachability, heuristics) without hitting primaries.
  • This lets you keep OLTP clean and still power dashboards/tests.
  1. Materialized views & job cadence
  • Refresh mv_* on a fixed cadence (e.g., every 25 minutes) or postcommit for hot paths.
  • Keep “cold path” analytics in separate schemas (analytics.*) sourced from CDC.
  1. Cutover playbook (phased)
  • Phase A (Dark Read): write Postgres, still serve from Mongo; compare results silently.
  • Phase B (Shadow Serve): 510% traffic from Postgres via flag; autorollback switch.
  • Phase C (Authoritative): Postgres becomes source; Mongo path left for emergency readonly.
  • Phase D (Retire): freeze Mongo, back up, remove writes, delete code paths after 2 stable sprints.

Ratelimits & flags: single truth, fast edges

  • Truth in Postgres with versioned flag docs:

    create table feature_flag (
      key text primary key,
      rules jsonb not null,
      version int not null default 1,
      updated_at timestamptz default now()
    );
    
  • Edge cache in Redis:

    • SETEX flag:{key}:{version} <ttl> <json>
    • On update, bump version; readers compose cache key with version (cachebusting without deletes).
  • Rate limiting: Persist quotas in Postgres; counters in Redis (INCR rl:{bucket}:{window}), with periodic reconciliation jobs writing summaries back to Postgres for audits.


CAS for SBOM/VEX/attestations

  • Push blobs to OCI/MinIO by digest; store only pointers in Postgres:

    create table artifact_index (
      digest text primary key,
      media_type text not null,
      size bigint not null,
      created_at timestamptz default now(),
      signature_refs jsonb
    );
    
  • Benefits: immutable, deduped, easy to mirror into offline kits.


Guardrails your team should follow

  • Always wrap multitable writes (facts + outbox + decisions) in a single transaction.
  • Prefer jsonb_path_query for targeted reads; avoid scanning entire payloads.
  • Enforce RLS + leastprivilege roles; application sets app.user_id at session start.
  • Version everything: schemas, flags, materialized views; never “change in place” without bumping version.
  • Observability: expose pg_stat_statements, refresh latency for mv_*, outbox lag, Redis hit ratio, and RLS policy hits.

If you want, I can turn this into:

  • readytorun EF Core 10 migrations,
  • a /docs/architecture/store-map.md for your repo,
  • and a tiny dev seed (Docker + sample data) so the team can poke it immediately. Heres a focused “PostgreSQL patterns per module” doc you can hand straight to your StellaOps devs.

StellaOps PostgreSQL Patterns per Module

Scope: How each StellaOps module should use PostgreSQL: schema patterns, constraints, RLS, indexing, and transaction rules.


0. Crosscutting PostgreSQL Rules

These apply everywhere unless explicitly overridden.

0.1 Core conventions

  • Schemas

    • Use one logical schema per module: authority, routing, vex, unknowns, artifact.
    • Shared utilities (e.g., outbox) live in a core schema.
  • Naming

    • Tables: snake_case, singular: user, feature_flag, vuln_fact.

    • PK: id uuid primary key.

    • FKs: <referenced_table>_id (e.g., user_id, tenant_id).

    • Timestamps:

      • created_at timestamptz not null default now()
      • updated_at timestamptz not null default now()
  • Multitenancy

    • All tenantscoped tables must have tenant_id uuid not null.
    • Enforce tenant isolation with RLS on tenant_id.
  • Time & timezones

    • Always timestamptz, always store UTC, let the DB default now().

0.2 RLS & security

  • RLS must be enabled on any table reachable from a userinitiated path.

  • Every session must set:

    select set_config('app.user_id',   '<uuid>', false);
    select set_config('app.tenant_id', '<uuid>', false);
    select set_config('app.roles',     'role1,role2', false);
    
  • RLS policies:

    • Base policy: tenant_id = current_setting('app.tenant_id')::uuid.
    • Extra predicates for peruser privacy (e.g., only see own tokens, only own API clients).
  • DB users:

    • Each modules service has its own role with access only to its schema + core.outbox.

0.3 JSONB & versioning

  • Any JSONB column must have:

    • payload jsonb not null,
    • schema_version int not null.
  • Always index:

    • by source (source / origin),
    • by a small set of projected fields used in WHERE clauses.

0.4 Migrations

  • All schema changes via migrations, forwardonly.

  • Backwardscompat pattern:

    1. Add new columns / tables.
    2. Backfill.
    3. Flip code to use new structure (behind a feature flag).
    4. After stability, remove old columns/paths.

1. Authority Module (auth, accounts, audit)

Schema: authority.* Mission: identity, OAuth, roles, grants, audit.

1.1 Core tables & patterns

  • authority.user

    create table authority.user (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      email text not null,
      display_name text not null,
      is_disabled boolean not null default false,
      created_at timestamptz not null default now(),
      updated_at timestamptz not null default now(),
      unique (tenant_id, email)
    );
    
    • Never harddelete users: use is_disabled (and optionally disabled_at).
  • authority.role

    create table authority.role (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      name text not null,
      description text,
      created_at timestamptz not null default now(),
      updated_at timestamptz not null default now(),
      unique (tenant_id, name)
    );
    
  • authority.grant

    create table authority.grant (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      user_id uuid not null references authority.user(id),
      role_id uuid not null references authority.role(id),
      created_at timestamptz not null default now(),
      unique (tenant_id, user_id, role_id)
    );
    
  • authority.oauth_client, authority.oauth_token

    • Enforce token uniqueness:

      create table authority.oauth_token (
        id uuid primary key default gen_random_uuid(),
        tenant_id uuid not null,
        user_id uuid not null references authority.user(id),
        client_id uuid not null references authority.oauth_client(id),
        token_hash text not null, -- hash, never raw
        expires_at timestamptz not null,
        created_at timestamptz not null default now(),
        revoked_at timestamptz,
        unique (token_hash)
      );
      

1.2 Audit log pattern

  • authority.audit_log

    create table authority.audit_log (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      actor_id uuid, -- null for system
      action text not null,
      entity_type text not null,
      entity_id uuid,
      at timestamptz not null default now(),
      diff jsonb not null
    );
    
  • Insert audit rows in the same transaction as the change.

1.3 RLS patterns

  • Base RLS:

    alter table authority.user enable row level security;
    
    create policy p_user_tenant on authority.user
      for all using (tenant_id = current_setting('app.tenant_id')::uuid);
    
  • Extra policies:

    • Audit log is visible only to:

      • actor themself, or
      • users with an auditor or admin role.

2. Routing & Feature Flags Module

Schema: routing.* Mission: where instances live, what features are on, ratelimit configuration.

2.1 Feature flags

  • routing.feature_flag

    create table routing.feature_flag (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      key text not null,
      rules jsonb not null,
      version int not null default 1,
      is_enabled boolean not null default true,
      created_at timestamptz not null default now(),
      updated_at timestamptz not null default now(),
      unique (tenant_id, key)
    );
    
  • Immutability by version:

    • On update, increment version, dont overwrite historical data.

    • Mirror changes into a history table via trigger:

      create table routing.feature_flag_history (
        id uuid primary key default gen_random_uuid(),
        feature_flag_id uuid not null references routing.feature_flag(id),
        tenant_id uuid not null,
        key text not null,
        rules jsonb not null,
        version int not null,
        changed_at timestamptz not null default now(),
        changed_by uuid
      );
      

2.2 Instance registry

  • routing.instance

    create table routing.instance (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      instance_key text not null,
      domain text not null,
      last_heartbeat timestamptz not null default now(),
      status text not null check (status in ('active','draining','offline')),
      created_at timestamptz not null default now(),
      updated_at timestamptz not null default now(),
      unique (tenant_id, instance_key),
      unique (tenant_id, domain)
    );
    
  • Pattern:

    • Heartbeats use update ... set last_heartbeat = now() without touching other fields.
    • Routing logic filters by status='active' and heartbeat recency.

2.3 Ratelimit configuration

  • Config in Postgres, counters in Redis:

    create table routing.rate_limit_config (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      key text not null,
      limit_per_interval int not null,
      interval_seconds int not null,
      created_at timestamptz not null default now(),
      updated_at timestamptz not null default now(),
      unique (tenant_id, key)
    );
    

3. VEX & Vulnerability Module

Schema: vex.* Mission: ingest vulnerability facts, keep decisions & triage state.

3.1 Facts as JSONB

  • vex.vuln_fact

    create table vex.vuln_fact (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      source text not null,               -- e.g. "nvd", "vendor_x_vex"
      external_id text,                   -- e.g. CVE, advisory id
      payload jsonb not null,
      schema_version int not null,
      received_at timestamptz not null default now()
    );
    
  • Index patterns:

    create index on vex.vuln_fact (tenant_id, source);
    create index on vex.vuln_fact (tenant_id, external_id);
    create index vuln_fact_payload_gin on vex.vuln_fact using gin (payload);
    

3.2 Decisions as relational data

  • vex.package

    create table vex.package (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      name text not null,
      version text not null,
      ecosystem text not null, -- e.g. "pypi", "npm"
      created_at timestamptz not null default now(),
      unique (tenant_id, name, version, ecosystem)
    );
    
  • vex.vex_decision

    create table vex.vex_decision (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      package_id uuid not null references vex.package(id),
      vuln_id text not null,
      status text not null check (status in (
        'not_affected', 'affected', 'fixed', 'under_investigation'
      )),
      rationale text,
      proof_ref text, -- CAS digest or URL
      decided_by uuid,
      decided_at timestamptz not null default now(),
      created_at timestamptz not null default now(),
      updated_at timestamptz not null default now(),
      unique (tenant_id, package_id, vuln_id)
    );
    
  • For history:

    • Keep current state in vex_decision.
    • Mirror previous versions into vex_decision_history table (similar to feature flags).

3.3 Triage queues with materialized views

  • Example triage view:

    create materialized view vex.mv_triage_queue as
    select
      d.tenant_id,
      p.name,
      p.version,
      d.vuln_id,
      d.status,
      d.decided_at
    from vex.vex_decision d
    join vex.package p on p.id = d.package_id
    where d.status = 'under_investigation';
    
  • Refresh options:

    • Scheduled refresh (cron/worker).
    • Or incremental via triggers (more complex; use only when needed).

3.4 RLS for VEX

  • All tables scoped by tenant_id.

  • Typical policy:

    alter table vex.vex_decision enable row level security;
    
    create policy p_vex_tenant on vex.vex_decision
      for all using (tenant_id = current_setting('app.tenant_id')::uuid);
    

4. Unknowns Module

Schema: unknowns.* Mission: represent uncertainty and how it changes over time.

4.1 Bitemporal unknowns table

  • unknowns.unknown

    create table unknowns.unknown (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      subject_hash text not null, -- stable identifier for "thing" being reasoned about
      kind text not null,         -- e.g. "reachability", "version_inferred"
      context jsonb not null,     -- extra info: call graph node, evidence, etc.
      valid_from timestamptz not null default now(),
      valid_to timestamptz,
      sys_from timestamptz not null default now(),
      sys_to timestamptz,
      created_at timestamptz not null default now()
    );
    
  • “Exactly one open unknown per subject/kind” pattern:

    create unique index unknown_one_open_per_subject
      on unknowns.unknown (tenant_id, subject_hash, kind)
      where valid_to is null;
    

4.2 Closing an unknown

  • Close by setting valid_to and sys_to:

    update unknowns.unknown
    set valid_to = now(), sys_to = now()
    where id = :id and valid_to is null;
    
  • Never hard-delete; keep all rows for audit/explanation.

4.3 Convenience views

  • Current unknowns:

    create view unknowns.current as
    select *
    from unknowns.unknown
    where valid_to is null;
    

4.4 RLS

  • Same tenant policy as other modules; unknowns are tenantscoped.

5. Artifact Index / CAS Module

Schema: artifact.* Mission: index of immutable blobs stored in OCI / S3 / MinIO etc.

5.1 Artifact index

  • artifact.artifact

    create table artifact.artifact (
      digest text primary key, -- e.g. "sha256:..."
      tenant_id uuid not null,
      media_type text not null,
      size_bytes bigint not null,
      created_at timestamptz not null default now(),
      created_by uuid
    );
    
  • Validate digest shape with a CHECK:

    alter table artifact.artifact
      add constraint chk_digest_format
      check (digest ~ '^sha[0-9]+:[0-9a-fA-F]{32,}$');
    

5.2 Signatures and tags

  • artifact.signature

    create table artifact.signature (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      artifact_digest text not null references artifact.artifact(digest),
      signer text not null,
      signature_payload jsonb not null,
      created_at timestamptz not null default now()
    );
    
  • artifact.tag

    create table artifact.tag (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid not null,
      name text not null,
      artifact_digest text not null references artifact.artifact(digest),
      created_at timestamptz not null default now(),
      unique (tenant_id, name)
    );
    

5.3 RLS

  • Ensure that tenants cannot see each others digests, even if the CAS backing store is shared:

    alter table artifact.artifact enable row level security;
    
    create policy p_artifact_tenant on artifact.artifact
      for all using (tenant_id = current_setting('app.tenant_id')::uuid);
    

6. Shared Outbox / Event Pattern

Schema: core.* Mission: reliable events for external sideeffects.

6.1 Outbox table

  • core.outbox

    create table core.outbox (
      id uuid primary key default gen_random_uuid(),
      tenant_id uuid,
      aggregate_type text not null, -- e.g. "vex_decision", "feature_flag"
      aggregate_id uuid,
      topic text not null,
      payload jsonb not null,
      created_at timestamptz not null default now(),
      dispatched_at timestamptz,
      dispatch_attempts int not null default 0,
      error text
    );
    

6.2 Usage rule

  • For anything that must emit an event (webhook, Kafka, notifications):

    • In the same transaction as the change:

      • write primary data (e.g. vex.vex_decision),
      • insert an outbox row.
    • A background worker:

      • pulls undelivered rows,
      • sends to external system,
      • updates dispatched_at/dispatch_attempts/error.

7. Indexing & Query Patterns per Module

7.1 Authority

  • Index:

    • user(tenant_id, email)
    • grant(tenant_id, user_id)
    • oauth_token(token_hash)
  • Typical query patterns:

    • Look up user by tenant_id + email.
    • All roles/grants for a user; design composite indexes accordingly.

7.2 Routing & Flags

  • Index:

    • feature_flag(tenant_id, key)

    • partial index on enabled flags:

      create index on routing.feature_flag (tenant_id, key)
        where is_enabled;
      
    • instance(tenant_id, status), instance(tenant_id, domain).

7.3 VEX

  • Index:

    • package(tenant_id, name, version, ecosystem)
    • vex_decision(tenant_id, package_id, vuln_id)
    • GIN on vuln_fact.payload for flexible querying.

7.4 Unknowns

  • Index:

    • unique open unknown per subject/kind (shown above).
    • unknown(tenant_id, kind) for filtering by kind.

7.5 Artifact

  • Index:

    • PK on digest.
    • signature(tenant_id, artifact_digest).
    • tag(tenant_id, name).

8. Transaction & Isolation Guidelines

  • Default isolation: READ COMMITTED.

  • For critical sequences (e.g., provisioning a tenant, bulk role updates):

    • consider REPEATABLE READ or SERIALIZABLE and keep transactions short.
  • Pattern:

    • One transaction per logical user action (e.g., “set flag”, “record decision”).
    • Never do longrunning external calls inside a database transaction.

If youd like, next step I can turn this into:

  • concrete CREATE SCHEMA + CREATE TABLE migration files, and
  • a short “How to write queries in each module” cheatsheet for devs (with example SELECT/INSERT/UPDATE patterns).