up
This commit is contained in:
425
docs/operations/postgresql-patterns-runbook.md
Normal file
425
docs/operations/postgresql-patterns-runbook.md
Normal file
@@ -0,0 +1,425 @@
|
||||
# PostgreSQL Patterns Operational Runbook
|
||||
|
||||
**Version:** 1.0
|
||||
**Last Updated:** 2025-12-14
|
||||
**Related Sprints:** SPRINT_3420, SPRINT_3421, SPRINT_3422, SPRINT_3423
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Row-Level Security (RLS)](#1-row-level-security-rls)
|
||||
2. [Bitemporal Unknowns](#2-bitemporal-unknowns)
|
||||
3. [Time-Based Partitioning](#3-time-based-partitioning)
|
||||
4. [Generated Columns](#4-generated-columns)
|
||||
5. [Troubleshooting](#5-troubleshooting)
|
||||
|
||||
---
|
||||
|
||||
## 1. Row-Level Security (RLS)
|
||||
|
||||
### 1.1 Overview
|
||||
|
||||
RLS provides database-level tenant isolation. All tenant-scoped tables enforce that queries can only access rows matching the current tenant context.
|
||||
|
||||
### 1.2 Setting Tenant Context
|
||||
|
||||
Before executing queries, the application must set the tenant context:
|
||||
|
||||
```sql
|
||||
-- Set tenant context for the session
|
||||
SELECT set_config('app.tenant_id', 'my-tenant-id', false);
|
||||
|
||||
-- Verify tenant context
|
||||
SELECT current_setting('app.tenant_id', true);
|
||||
```
|
||||
|
||||
### 1.3 Validating RLS Configuration
|
||||
|
||||
Run the validation script to check RLS status:
|
||||
|
||||
```bash
|
||||
psql -f deploy/postgres-validation/001_validate_rls.sql
|
||||
```
|
||||
|
||||
Expected output shows:
|
||||
- All tenant-scoped tables with `rls_enabled = true`
|
||||
- All tenant-scoped tables with `rls_forced = true`
|
||||
- One policy per table named `*_tenant_isolation`
|
||||
|
||||
### 1.4 Bypassing RLS (Admin Operations)
|
||||
|
||||
For migrations or cross-tenant operations, use the admin role:
|
||||
|
||||
```sql
|
||||
-- Grant admin role to current user (one-time setup)
|
||||
GRANT scheduler_admin TO your_migration_user;
|
||||
|
||||
-- Execute as admin (bypasses RLS)
|
||||
SET ROLE scheduler_admin;
|
||||
SELECT * FROM scheduler.runs; -- Returns all tenants
|
||||
|
||||
-- Return to normal role
|
||||
RESET ROLE;
|
||||
```
|
||||
|
||||
### 1.5 Common Issues
|
||||
|
||||
**Issue:** Query returns no rows unexpectedly
|
||||
```sql
|
||||
-- Check if tenant context is set
|
||||
SELECT current_setting('app.tenant_id', true);
|
||||
-- If NULL or empty, set it before querying
|
||||
```
|
||||
|
||||
**Issue:** "app.tenant_id session variable not set" error
|
||||
```sql
|
||||
-- Set the tenant context
|
||||
SELECT set_config('app.tenant_id', 'tenant-123', false);
|
||||
```
|
||||
|
||||
**Issue:** Need to query across tenants
|
||||
```sql
|
||||
-- Use admin bypass role
|
||||
SET ROLE scheduler_admin;
|
||||
-- Run your cross-tenant query
|
||||
RESET ROLE;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Bitemporal Unknowns
|
||||
|
||||
### 2.1 Overview
|
||||
|
||||
The `unknowns` schema tracks ambiguities discovered during vulnerability scanning using bitemporal semantics:
|
||||
- **Valid time** (`valid_from`, `valid_to`): When the unknown was relevant in the real world
|
||||
- **System time** (`sys_from`, `sys_to`): When the system knew about the unknown
|
||||
|
||||
### 2.2 Querying Current Unknowns
|
||||
|
||||
```sql
|
||||
-- Get all current open unknowns for a tenant
|
||||
SELECT * FROM unknowns.current
|
||||
WHERE tenant_id = 'my-tenant';
|
||||
|
||||
-- Or use the view with tenant context set
|
||||
SELECT set_config('app.tenant_id', 'my-tenant', false);
|
||||
SELECT * FROM unknowns.current;
|
||||
```
|
||||
|
||||
### 2.3 Point-in-Time Queries
|
||||
|
||||
Query the state of unknowns at any point in history:
|
||||
|
||||
```sql
|
||||
-- What unknowns existed on January 1, 2025?
|
||||
SELECT * FROM unknowns.as_of(
|
||||
'my-tenant',
|
||||
'2025-01-01 00:00:00+00'::TIMESTAMPTZ
|
||||
);
|
||||
|
||||
-- What did we know about unknowns on Jan 1, as of Jan 15?
|
||||
SELECT * FROM unknowns.as_of(
|
||||
'my-tenant',
|
||||
'2025-01-01 00:00:00+00'::TIMESTAMPTZ, -- valid_at
|
||||
'2025-01-15 00:00:00+00'::TIMESTAMPTZ -- sys_at
|
||||
);
|
||||
```
|
||||
|
||||
### 2.4 Resolving Unknowns
|
||||
|
||||
```sql
|
||||
-- Resolve an unknown (sets valid_to and resolution fields)
|
||||
UPDATE unknowns.unknown
|
||||
SET resolved_at = NOW(),
|
||||
resolution_type = 'feed_updated',
|
||||
resolution_ref = 'nvd-feed-2025-01',
|
||||
resolution_notes = 'NVD now covers this package',
|
||||
valid_to = NOW()
|
||||
WHERE id = 'unknown-uuid'
|
||||
AND tenant_id = 'my-tenant';
|
||||
```
|
||||
|
||||
### 2.5 Metrics Queries
|
||||
|
||||
```sql
|
||||
-- Count unknowns by kind
|
||||
SELECT * FROM unknowns.count_by_kind('my-tenant');
|
||||
|
||||
-- Count unknowns by severity
|
||||
SELECT * FROM unknowns.count_by_severity('my-tenant');
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Time-Based Partitioning
|
||||
|
||||
### 3.1 Overview
|
||||
|
||||
High-volume tables are partitioned by month for:
|
||||
- Efficient time-range queries (partition pruning)
|
||||
- O(1) data retention (drop partition vs DELETE)
|
||||
- Improved vacuum performance
|
||||
|
||||
### 3.2 Checking Partition Status
|
||||
|
||||
Run the validation script:
|
||||
|
||||
```bash
|
||||
psql -f deploy/postgres-validation/002_validate_partitions.sql
|
||||
```
|
||||
|
||||
Or query directly:
|
||||
|
||||
```sql
|
||||
-- View all partitions with sizes
|
||||
SELECT * FROM partition_mgmt.partition_stats
|
||||
WHERE schema_name = 'scheduler'
|
||||
ORDER BY table_name, partition_name;
|
||||
```
|
||||
|
||||
### 3.3 Creating Future Partitions
|
||||
|
||||
Partitions should be created ahead of time. Run monthly:
|
||||
|
||||
```sql
|
||||
-- Create partitions for next 3 months
|
||||
SELECT partition_mgmt.create_monthly_partitions(
|
||||
'scheduler', -- schema
|
||||
'audit', -- table
|
||||
'created_at', -- partition column
|
||||
CURRENT_DATE, -- start from now
|
||||
3 -- months ahead
|
||||
);
|
||||
```
|
||||
|
||||
### 3.4 Archiving Old Partitions
|
||||
|
||||
Preview what would be archived:
|
||||
|
||||
```sql
|
||||
-- Dry run: see what would be archived (12 month retention)
|
||||
SELECT * FROM partition_mgmt.cleanup_old_partitions(
|
||||
'scheduler', 'audit', 12, 'archive', TRUE
|
||||
);
|
||||
```
|
||||
|
||||
Execute archiving:
|
||||
|
||||
```sql
|
||||
-- Actually archive partitions older than 12 months
|
||||
SELECT * FROM partition_mgmt.cleanup_old_partitions(
|
||||
'scheduler', 'audit', 12, 'archive', FALSE
|
||||
);
|
||||
```
|
||||
|
||||
### 3.5 Dropping Archived Partitions
|
||||
|
||||
After verifying archived data is backed up:
|
||||
|
||||
```sql
|
||||
-- Drop archived partition
|
||||
DROP TABLE archive.audit_2024_01;
|
||||
```
|
||||
|
||||
### 3.6 Maintenance Checklist
|
||||
|
||||
**Weekly:**
|
||||
- [ ] Check `partition_mgmt.partition_stats` for unexpected growth
|
||||
- [ ] Verify default partitions are empty
|
||||
|
||||
**Monthly:**
|
||||
- [ ] Create partitions for 3 months ahead
|
||||
- [ ] Review partitions approaching retention cutoff
|
||||
|
||||
**Quarterly:**
|
||||
- [ ] Archive partitions beyond retention period
|
||||
- [ ] Verify archived data is backed up
|
||||
- [ ] Drop archived partitions if space needed
|
||||
|
||||
---
|
||||
|
||||
## 4. Generated Columns
|
||||
|
||||
### 4.1 Overview
|
||||
|
||||
Generated columns extract frequently-queried JSONB fields into indexed columns for 10-50x query speedup.
|
||||
|
||||
### 4.2 Affected Tables
|
||||
|
||||
| Table | Generated Columns | Source |
|
||||
|-------|-------------------|--------|
|
||||
| `scheduler.runs` | `finding_count`, `critical_count`, `high_count`, `medium_count`, `low_count` | `stats` JSONB |
|
||||
| `vuln.advisories` | `provenance_source_key`, `provenance_feed_id`, `provenance_ingested_at` | `provenance` JSONB |
|
||||
| `vuln.advisory_affected` | `purl_type`, `purl_name` | `purl` TEXT |
|
||||
|
||||
### 4.3 Query Optimization
|
||||
|
||||
**Before (slow):**
|
||||
```sql
|
||||
SELECT * FROM scheduler.runs
|
||||
WHERE (stats->>'findingCount')::int > 100;
|
||||
```
|
||||
|
||||
**After (fast):**
|
||||
```sql
|
||||
SELECT * FROM scheduler.runs
|
||||
WHERE finding_count > 100;
|
||||
```
|
||||
|
||||
### 4.4 Adding New Generated Columns
|
||||
|
||||
```sql
|
||||
-- Add a new generated column
|
||||
ALTER TABLE your_table
|
||||
ADD COLUMN new_field TEXT
|
||||
GENERATED ALWAYS AS (jsonb_column->>'field_name') STORED;
|
||||
|
||||
-- Create index
|
||||
CREATE INDEX ix_your_table_new_field
|
||||
ON your_table (new_field)
|
||||
WHERE new_field IS NOT NULL;
|
||||
```
|
||||
|
||||
### 4.5 Monitoring
|
||||
|
||||
Check column usage in queries:
|
||||
|
||||
```sql
|
||||
-- See which indexes are being used
|
||||
SELECT indexrelname, idx_scan, idx_tup_read
|
||||
FROM pg_stat_user_indexes
|
||||
WHERE schemaname = 'scheduler'
|
||||
ORDER BY idx_scan DESC;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Troubleshooting
|
||||
|
||||
### 5.1 RLS Issues
|
||||
|
||||
**Symptom:** Application gets "permission denied" or empty results
|
||||
|
||||
**Diagnosis:**
|
||||
```sql
|
||||
-- Check if RLS is enabled
|
||||
SELECT tablename, rowsecurity, forcerowsecurity
|
||||
FROM pg_tables WHERE schemaname = 'scheduler';
|
||||
|
||||
-- Check policies
|
||||
SELECT * FROM pg_policies WHERE schemaname = 'scheduler';
|
||||
|
||||
-- Check tenant context
|
||||
SELECT current_setting('app.tenant_id', true);
|
||||
```
|
||||
|
||||
**Resolution:**
|
||||
1. Ensure `set_config('app.tenant_id', ...)` is called before queries
|
||||
2. Verify the tenant ID matches data in the table
|
||||
3. For admin operations, use bypass role
|
||||
|
||||
### 5.2 Partition Issues
|
||||
|
||||
**Symptom:** Inserts fail with "no partition of relation"
|
||||
|
||||
**Diagnosis:**
|
||||
```sql
|
||||
-- Check existing partitions
|
||||
SELECT * FROM partition_mgmt.partition_stats
|
||||
WHERE table_name = 'audit';
|
||||
|
||||
-- Check default partition for orphaned data
|
||||
SELECT COUNT(*) FROM scheduler.audit_default;
|
||||
```
|
||||
|
||||
**Resolution:**
|
||||
```sql
|
||||
-- Create missing partition
|
||||
SELECT partition_mgmt.create_partition(
|
||||
'scheduler', 'audit', 'created_at',
|
||||
'2025-12-01'::DATE, '2026-01-01'::DATE
|
||||
);
|
||||
```
|
||||
|
||||
**Symptom:** Queries scanning all partitions (slow)
|
||||
|
||||
**Diagnosis:**
|
||||
```sql
|
||||
-- Check query plan
|
||||
EXPLAIN (ANALYZE, BUFFERS)
|
||||
SELECT * FROM scheduler.audit
|
||||
WHERE created_at >= '2025-01-01' AND created_at < '2025-02-01';
|
||||
```
|
||||
|
||||
**Resolution:**
|
||||
- Ensure WHERE clause includes partition key
|
||||
- Use `>=` and `<` instead of BETWEEN for clearer pruning
|
||||
|
||||
### 5.3 Bitemporal Query Issues
|
||||
|
||||
**Symptom:** `as_of()` returns unexpected results
|
||||
|
||||
**Diagnosis:**
|
||||
```sql
|
||||
-- Check raw temporal data
|
||||
SELECT id, valid_from, valid_to, sys_from, sys_to
|
||||
FROM unknowns.unknown
|
||||
WHERE tenant_id = 'my-tenant'
|
||||
ORDER BY valid_from;
|
||||
```
|
||||
|
||||
**Resolution:**
|
||||
- Verify timestamps are in UTC
|
||||
- Check that `valid_from <= query_time < valid_to`
|
||||
- Check that `sys_from <= query_time < sys_to`
|
||||
|
||||
### 5.4 Generated Column Issues
|
||||
|
||||
**Symptom:** Generated column has NULL when source field exists
|
||||
|
||||
**Diagnosis:**
|
||||
```sql
|
||||
-- Check source field format
|
||||
SELECT provenance->>'source_key', provenance_source_key
|
||||
FROM vuln.advisories
|
||||
WHERE id = 'advisory-uuid';
|
||||
```
|
||||
|
||||
**Resolution:**
|
||||
- Verify JSONB field name matches exactly (case-sensitive)
|
||||
- Check for type casting issues in generation expression
|
||||
|
||||
---
|
||||
|
||||
## Appendix A: Quick Reference
|
||||
|
||||
### Schema Admin Roles
|
||||
|
||||
| Schema | Admin Role | Bypass RLS |
|
||||
|--------|------------|------------|
|
||||
| scheduler | `scheduler_admin` | Yes |
|
||||
| notify | `notify_admin` | Yes |
|
||||
| authority | `authority_admin` | Yes |
|
||||
| vex | `vex_admin` | Yes |
|
||||
| policy | `policy_admin` | Yes |
|
||||
| unknowns | `unknowns_admin` | Yes |
|
||||
|
||||
### Helper Functions
|
||||
|
||||
| Function | Purpose |
|
||||
|----------|---------|
|
||||
| `*_app.require_current_tenant()` | Returns tenant from session, raises if not set |
|
||||
| `unknowns.as_of(tenant, valid_at, sys_at)` | Point-in-time temporal query |
|
||||
| `unknowns.count_by_kind(tenant)` | Aggregate unknowns by kind |
|
||||
| `unknowns.count_by_severity(tenant)` | Aggregate unknowns by severity |
|
||||
| `partition_mgmt.create_monthly_partitions(...)` | Create partitions ahead |
|
||||
| `partition_mgmt.cleanup_old_partitions(...)` | Archive old partitions |
|
||||
|
||||
### Validation Scripts
|
||||
|
||||
| Script | Purpose |
|
||||
|--------|---------|
|
||||
| `deploy/postgres-validation/001_validate_rls.sql` | Verify RLS configuration |
|
||||
| `deploy/postgres-validation/002_validate_partitions.sql` | Verify partition health |
|
||||
Reference in New Issue
Block a user