Files
git.stella-ops.org/docs/operations/postgresql-patterns-runbook.md
StellaOps Bot b058dbe031 up
2025-12-14 23:20:14 +02:00

10 KiB

PostgreSQL Patterns Operational Runbook

Version: 1.0 Last Updated: 2025-12-14 Related Sprints: SPRINT_3420, SPRINT_3421, SPRINT_3422, SPRINT_3423


Table of Contents

  1. Row-Level Security (RLS)
  2. Bitemporal Unknowns
  3. Time-Based Partitioning
  4. Generated Columns
  5. Troubleshooting

1. Row-Level Security (RLS)

1.1 Overview

RLS provides database-level tenant isolation. All tenant-scoped tables enforce that queries can only access rows matching the current tenant context.

1.2 Setting Tenant Context

Before executing queries, the application must set the tenant context:

-- Set tenant context for the session
SELECT set_config('app.tenant_id', 'my-tenant-id', false);

-- Verify tenant context
SELECT current_setting('app.tenant_id', true);

1.3 Validating RLS Configuration

Run the validation script to check RLS status:

psql -f deploy/postgres-validation/001_validate_rls.sql

Expected output shows:

  • All tenant-scoped tables with rls_enabled = true
  • All tenant-scoped tables with rls_forced = true
  • One policy per table named *_tenant_isolation

1.4 Bypassing RLS (Admin Operations)

For migrations or cross-tenant operations, use the admin role:

-- Grant admin role to current user (one-time setup)
GRANT scheduler_admin TO your_migration_user;

-- Execute as admin (bypasses RLS)
SET ROLE scheduler_admin;
SELECT * FROM scheduler.runs; -- Returns all tenants

-- Return to normal role
RESET ROLE;

1.5 Common Issues

Issue: Query returns no rows unexpectedly

-- Check if tenant context is set
SELECT current_setting('app.tenant_id', true);
-- If NULL or empty, set it before querying

Issue: "app.tenant_id session variable not set" error

-- Set the tenant context
SELECT set_config('app.tenant_id', 'tenant-123', false);

Issue: Need to query across tenants

-- Use admin bypass role
SET ROLE scheduler_admin;
-- Run your cross-tenant query
RESET ROLE;

2. Bitemporal Unknowns

2.1 Overview

The unknowns schema tracks ambiguities discovered during vulnerability scanning using bitemporal semantics:

  • Valid time (valid_from, valid_to): When the unknown was relevant in the real world
  • System time (sys_from, sys_to): When the system knew about the unknown

2.2 Querying Current Unknowns

-- Get all current open unknowns for a tenant
SELECT * FROM unknowns.current
WHERE tenant_id = 'my-tenant';

-- Or use the view with tenant context set
SELECT set_config('app.tenant_id', 'my-tenant', false);
SELECT * FROM unknowns.current;

2.3 Point-in-Time Queries

Query the state of unknowns at any point in history:

-- What unknowns existed on January 1, 2025?
SELECT * FROM unknowns.as_of(
    'my-tenant',
    '2025-01-01 00:00:00+00'::TIMESTAMPTZ
);

-- What did we know about unknowns on Jan 1, as of Jan 15?
SELECT * FROM unknowns.as_of(
    'my-tenant',
    '2025-01-01 00:00:00+00'::TIMESTAMPTZ,  -- valid_at
    '2025-01-15 00:00:00+00'::TIMESTAMPTZ   -- sys_at
);

2.4 Resolving Unknowns

-- Resolve an unknown (sets valid_to and resolution fields)
UPDATE unknowns.unknown
SET resolved_at = NOW(),
    resolution_type = 'feed_updated',
    resolution_ref = 'nvd-feed-2025-01',
    resolution_notes = 'NVD now covers this package',
    valid_to = NOW()
WHERE id = 'unknown-uuid'
  AND tenant_id = 'my-tenant';

2.5 Metrics Queries

-- Count unknowns by kind
SELECT * FROM unknowns.count_by_kind('my-tenant');

-- Count unknowns by severity
SELECT * FROM unknowns.count_by_severity('my-tenant');

3. Time-Based Partitioning

3.1 Overview

High-volume tables are partitioned by month for:

  • Efficient time-range queries (partition pruning)
  • O(1) data retention (drop partition vs DELETE)
  • Improved vacuum performance

3.2 Checking Partition Status

Run the validation script:

psql -f deploy/postgres-validation/002_validate_partitions.sql

Or query directly:

-- View all partitions with sizes
SELECT * FROM partition_mgmt.partition_stats
WHERE schema_name = 'scheduler'
ORDER BY table_name, partition_name;

3.3 Creating Future Partitions

Partitions should be created ahead of time. Run monthly:

-- Create partitions for next 3 months
SELECT partition_mgmt.create_monthly_partitions(
    'scheduler',    -- schema
    'audit',        -- table
    'created_at',   -- partition column
    CURRENT_DATE,   -- start from now
    3               -- months ahead
);

3.4 Archiving Old Partitions

Preview what would be archived:

-- Dry run: see what would be archived (12 month retention)
SELECT * FROM partition_mgmt.cleanup_old_partitions(
    'scheduler', 'audit', 12, 'archive', TRUE
);

Execute archiving:

-- Actually archive partitions older than 12 months
SELECT * FROM partition_mgmt.cleanup_old_partitions(
    'scheduler', 'audit', 12, 'archive', FALSE
);

3.5 Dropping Archived Partitions

After verifying archived data is backed up:

-- Drop archived partition
DROP TABLE archive.audit_2024_01;

3.6 Maintenance Checklist

Weekly:

  • Check partition_mgmt.partition_stats for unexpected growth
  • Verify default partitions are empty

Monthly:

  • Create partitions for 3 months ahead
  • Review partitions approaching retention cutoff

Quarterly:

  • Archive partitions beyond retention period
  • Verify archived data is backed up
  • Drop archived partitions if space needed

4. Generated Columns

4.1 Overview

Generated columns extract frequently-queried JSONB fields into indexed columns for 10-50x query speedup.

4.2 Affected Tables

Table Generated Columns Source
scheduler.runs finding_count, critical_count, high_count, medium_count, low_count stats JSONB
vuln.advisories provenance_source_key, provenance_feed_id, provenance_ingested_at provenance JSONB
vuln.advisory_affected purl_type, purl_name purl TEXT

4.3 Query Optimization

Before (slow):

SELECT * FROM scheduler.runs
WHERE (stats->>'findingCount')::int > 100;

After (fast):

SELECT * FROM scheduler.runs
WHERE finding_count > 100;

4.4 Adding New Generated Columns

-- Add a new generated column
ALTER TABLE your_table
    ADD COLUMN new_field TEXT
        GENERATED ALWAYS AS (jsonb_column->>'field_name') STORED;

-- Create index
CREATE INDEX ix_your_table_new_field
    ON your_table (new_field)
    WHERE new_field IS NOT NULL;

4.5 Monitoring

Check column usage in queries:

-- See which indexes are being used
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE schemaname = 'scheduler'
ORDER BY idx_scan DESC;

5. Troubleshooting

5.1 RLS Issues

Symptom: Application gets "permission denied" or empty results

Diagnosis:

-- Check if RLS is enabled
SELECT tablename, rowsecurity, forcerowsecurity
FROM pg_tables WHERE schemaname = 'scheduler';

-- Check policies
SELECT * FROM pg_policies WHERE schemaname = 'scheduler';

-- Check tenant context
SELECT current_setting('app.tenant_id', true);

Resolution:

  1. Ensure set_config('app.tenant_id', ...) is called before queries
  2. Verify the tenant ID matches data in the table
  3. For admin operations, use bypass role

5.2 Partition Issues

Symptom: Inserts fail with "no partition of relation"

Diagnosis:

-- Check existing partitions
SELECT * FROM partition_mgmt.partition_stats
WHERE table_name = 'audit';

-- Check default partition for orphaned data
SELECT COUNT(*) FROM scheduler.audit_default;

Resolution:

-- Create missing partition
SELECT partition_mgmt.create_partition(
    'scheduler', 'audit', 'created_at',
    '2025-12-01'::DATE, '2026-01-01'::DATE
);

Symptom: Queries scanning all partitions (slow)

Diagnosis:

-- Check query plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM scheduler.audit
WHERE created_at >= '2025-01-01' AND created_at < '2025-02-01';

Resolution:

  • Ensure WHERE clause includes partition key
  • Use >= and < instead of BETWEEN for clearer pruning

5.3 Bitemporal Query Issues

Symptom: as_of() returns unexpected results

Diagnosis:

-- Check raw temporal data
SELECT id, valid_from, valid_to, sys_from, sys_to
FROM unknowns.unknown
WHERE tenant_id = 'my-tenant'
ORDER BY valid_from;

Resolution:

  • Verify timestamps are in UTC
  • Check that valid_from <= query_time < valid_to
  • Check that sys_from <= query_time < sys_to

5.4 Generated Column Issues

Symptom: Generated column has NULL when source field exists

Diagnosis:

-- Check source field format
SELECT provenance->>'source_key', provenance_source_key
FROM vuln.advisories
WHERE id = 'advisory-uuid';

Resolution:

  • Verify JSONB field name matches exactly (case-sensitive)
  • Check for type casting issues in generation expression

Appendix A: Quick Reference

Schema Admin Roles

Schema Admin Role Bypass RLS
scheduler scheduler_admin Yes
notify notify_admin Yes
authority authority_admin Yes
vex vex_admin Yes
policy policy_admin Yes
unknowns unknowns_admin Yes

Helper Functions

Function Purpose
*_app.require_current_tenant() Returns tenant from session, raises if not set
unknowns.as_of(tenant, valid_at, sys_at) Point-in-time temporal query
unknowns.count_by_kind(tenant) Aggregate unknowns by kind
unknowns.count_by_severity(tenant) Aggregate unknowns by severity
partition_mgmt.create_monthly_partitions(...) Create partitions ahead
partition_mgmt.cleanup_old_partitions(...) Archive old partitions

Validation Scripts

Script Purpose
deploy/postgres-validation/001_validate_rls.sql Verify RLS configuration
deploy/postgres-validation/002_validate_partitions.sql Verify partition health