add advisories

This commit is contained in:
master
2025-12-01 17:50:11 +02:00
parent c11d87d252
commit 790801f329
7 changed files with 3723 additions and 0 deletions

Here's a crisp, opinionated storage blueprint you can hand to your StellaOps devs right now, plus zero-downtime conversion tactics so you can keep prototyping fast without painting yourself into a corner.
# Module → store map (deterministic by default)
* **Authority / OAuth / Accounts & Audit**
* **PostgreSQL** as the primary source of truth.
* Tables: `users`, `clients`, `oauth_tokens`, `roles`, `grants`, `audit_log`.
* **Row-Level Security (RLS)** on `users`, `grants`, `audit_log`; **strict FK + CHECK** constraints; **immutable UUID PKs**.
* **Audit**: `audit_log(actor_id, action, entity, entity_id, at timestamptz default now(), diff jsonb)`.
* **Why**: ACID + RLS keeps authz decisions and audit trails deterministic and reviewable.
* **VEX & Vulnerability Writes**
* **PostgreSQL** with **JSONB facts + relational decisions**.
* Tables: `vuln_fact(jsonb)`, `vex_decision(package_id, vuln_id, status, rationale, proof_ref, updated_at)`.
* **Materialized views** for triage queues, e.g. `mv_triage_hotset` (refresh on commit or scheduled).
* **Why**: JSONB lets you ingest vendor-shaped docs; decisions stay relational for joins, integrity, and explainability.
* **Routing / Feature Flags / Rate-limits**
* **PostgreSQL** (truth) + **Redis** (cache).
* Tables: `feature_flag(key, rules jsonb, version)`, `route(domain, service, instance_id, last_heartbeat)`, `rate_limiter(bucket, quota, interval)`.
* Redis keys: `flag:{key}:{version}`, `route:{domain}`, `rl:{bucket}` with short TTLs.
* **Why**: one canonical RDBMS for consistency; Redis for hot path latency.
* **Unknowns Registry (ambiguity tracker)**
* **PostgreSQL** with **temporal tables** (bitemporal pattern via `valid_from/valid_to`, `sys_from/sys_to`).
* Table: `unknowns(subject_hash, kind, context jsonb, valid_from, valid_to, sys_from default now(), sys_to)`.
* Views: `unknowns_current` where `valid_to is null`.
* **Why**: preserves how/when uncertainty changed (critical for proofs and audits).
* **Artifacts / SBOM / VEX files**
* **OCI-compatible CAS** (e.g., self-hosted registry or MinIO bucket as a content-addressable store).
* Keys by **digest** (`sha256:...`), metadata in a Postgres `artifact_index` table with `digest`, `media_type`, `size`, `signatures`.
* **Why**: blobs don't belong in your RDBMS; use CAS for scale + cryptographic addressing.
---
# PostgreSQL implementation essentials (copy/paste starters)
* **RLS scaffold (Authority)**:
```sql
alter table audit_log enable row level security;
create policy p_audit_read_self
on audit_log for select
using (actor_id = current_setting('app.user_id')::uuid or
exists (select 1 from grants g where g.user_id = current_setting('app.user_id')::uuid and g.role = 'auditor'));
```
* **JSONB facts + relational decisions**:
```sql
create table vuln_fact (
id uuid primary key default gen_random_uuid(),
source text not null,
payload jsonb not null,
received_at timestamptz default now()
);
create table vex_decision (
package_id uuid not null,
vuln_id text not null,
status text check (status in ('not_affected','affected','fixed','under_investigation')),
rationale text,
proof_ref text,
decided_at timestamptz default now(),
primary key (package_id, vuln_id)
);
```
* **Materialized view for triage**:
```sql
create materialized view mv_triage_hotset as
select v.id as fact_id, v.payload->>'vuln' as vuln, v.received_at
from vuln_fact v
where (now() - v.received_at) < interval '7 days';
-- refresh concurrently via job
```
* **Temporal pattern (Unknowns)**:
```sql
create table unknowns (
id uuid primary key default gen_random_uuid(),
subject_hash text not null,
kind text not null,
context jsonb not null,
valid_from timestamptz not null default now(),
valid_to timestamptz,
sys_from timestamptz not null default now(),
sys_to timestamptz
);
create view unknowns_current as
select * from unknowns where valid_to is null;
```
---
# Conversion (not migration): zero-downtime, prototype-friendly
Even if you're “not migrating anything yet,” set these rails now so cutting over later is painless.
1. **Encode Mongo-shaped docs into JSONB with versioned schemas**
* Ingest pipeline writes to `*_fact(payload jsonb, schema_version int)`.
* Add a **`validate(schema_version, payload)`** step in your service layer (JSON Schema or SQL checks).
* Keep a **forward-compatible view** that projects stable columns from JSONB (e.g., `payload->>'id' as vendor_id`) so downstream code doesn't break when the payload evolves.
2. **Outbox pattern for reliable side-effects**
* Add `outbox(id, topic, key, payload jsonb, created_at, dispatched bool default false)`.
* On the same transaction as your write, insert the outbox row.
* A background dispatcher reads `dispatched=false`, publishes to MQ/Webhook, then marks `dispatched=true`.
* Guarantees: no lost events. Delivery is at-least-once, so external consumers should deduplicate by outbox `id` where duplicates matter.
3. **Parallel read adapters behind feature flags**
* Keep old readers (e.g., Mongo driver) and new Postgres readers in the same service.
* Gate by `feature_flag('pg_reads')` per tenant or env; flip gradually.
* Add a **read-diff monitor** that compares results and logs mismatches to `audit_log(diff)`.
4. **CDC for analytics without coupling**
* Enable **logical replication** (pgoutput) on your key tables.
* Stream changes into analyzers (reachability, heuristics) without hitting primaries.
* This lets you keep OLTP clean and still power dashboards/tests.
5. **Materialized views & job cadence**
* Refresh `mv_*` on a fixed cadence (e.g., every 2-5 minutes) or post-commit for hot paths.
* Keep **“cold path”** analytics in separate schemas (`analytics.*`) sourced from CDC.
6. **Cutover playbook (phased)**
* Phase A (Dark Read): write Postgres, still serve from Mongo; compare results silently.
* Phase B (Shadow Serve): 5-10% of traffic from Postgres via flag; auto-rollback switch.
* Phase C (Authoritative): Postgres becomes the source; the Mongo path is left for emergency read-only access.
* Phase D (Retire): freeze Mongo, back up, remove writes, delete code paths after 2 stable sprints.
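The forward-compatible view from step 1 can be sketched as a plain view over the fact table. Column names here are illustrative, assuming the `*_fact(payload jsonb, schema_version int)` shape described above:

```sql
-- Stable projection over a versioned JSONB fact table.
-- Downstream code reads the view; the raw payload can evolve freely.
create view vuln_fact_v1 as
select
  id,
  schema_version,
  payload->>'id'       as vendor_id,
  payload->>'vuln'     as vuln,
  payload->>'severity' as severity,
  received_at
from vuln_fact
where schema_version >= 1;
```

When a later schema version renames a payload field, the view absorbs the change (e.g., with a `case` on `schema_version`) and downstream queries stay untouched.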
---
# Rate-limits & flags: single truth, fast edges
* **Truth in Postgres** with versioned flag docs:
```sql
create table feature_flag (
key text primary key,
rules jsonb not null,
version int not null default 1,
updated_at timestamptz default now()
);
```
* **Edge cache** in Redis:
* `SETEX flag:{key}:{version} <ttl> <json>`
* On update, bump `version`; readers compose the cache key with the version (cache-busting without deletes).
* **Rate limiting**: Persist quotas in Postgres; counters in Redis (`INCR rl:{bucket}:{window}`), with periodic reconciliation jobs writing summaries back to Postgres for audits.
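The reconciliation half of that split might look like the following; `routing.rate_limit_summary` is a hypothetical table introduced here purely for illustration:

```sql
-- Hypothetical reconciliation target: periodic summaries of Redis counters,
-- written back to Postgres so audits don't depend on Redis retention.
create table routing.rate_limit_summary (
  tenant_id uuid not null,
  key text not null,
  window_start timestamptz not null,
  request_count bigint not null,
  primary key (tenant_id, key, window_start)
);

-- The reconciliation job upserts the counts it read from Redis:
insert into routing.rate_limit_summary (tenant_id, key, window_start, request_count)
values (:tenant_id, :key, :window_start, :count)
on conflict (tenant_id, key, window_start)
do update set request_count = excluded.request_count;
```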
---
# CAS for SBOM/VEX/attestations
* Push blobs to OCI/MinIO by digest; store only pointers in Postgres:
```sql
create table artifact_index (
digest text primary key,
media_type text not null,
size bigint not null,
created_at timestamptz default now(),
signature_refs jsonb
);
```
* Benefits: immutable, deduped, easy to mirror into offline kits.
---
# Guardrails your team should follow
* **Always** wrap multi-table writes (facts + outbox + decisions) in a single transaction.
* **Prefer** `jsonb_path_query` for targeted reads; **avoid** scanning entire payloads.
* **Enforce** RLS + least-privilege roles; the application sets `app.user_id` at session start.
* **Version everything**: schemas, flags, materialized views; never “change in place” without bumping version.
* **Observability**: expose `pg_stat_statements`, refresh latency for `mv_*`, outbox lag, Redis hit ratio, and RLS policy hits.
---
If you want, I can turn this into:
* ready-to-run **EF Core 10** migrations,
* a **/docs/architecture/store-map.md** for your repo,
* and a tiny **dev seed** (Docker + sample data) so the team can poke it immediately.
Here's a focused “PostgreSQL patterns per module” doc you can hand straight to your StellaOps devs.
---
# StellaOps PostgreSQL Patterns per Module
**Scope:** How each StellaOps module should use PostgreSQL: schema patterns, constraints, RLS, indexing, and transaction rules.
---
## 0. Cross-cutting PostgreSQL Rules
These apply everywhere unless explicitly overridden.
### 0.1 Core conventions
* **Schemas**
* Use **one logical schema** per module: `authority`, `routing`, `vex`, `unknowns`, `artifact`.
* Shared utilities (e.g., `outbox`) live in a `core` schema.
* **Naming**
* Tables: `snake_case`, singular: `user`, `feature_flag`, `vuln_fact`.
* PK: `id uuid primary key`.
* FKs: `<referenced_table>_id` (e.g., `user_id`, `tenant_id`).
* Timestamps:
* `created_at timestamptz not null default now()`
* `updated_at timestamptz not null default now()`
* **Multi-tenancy**
* All tenant-scoped tables must have `tenant_id uuid not null`.
* Enforce tenant isolation with **RLS** on `tenant_id`.
* **Time & timezones**
* Always `timestamptz`, always store **UTC**, let the DB default `now()`.
### 0.2 RLS & security
* RLS must be **enabled** on any table reachable from a user-initiated path.
* Every session must set:
```sql
select set_config('app.user_id', '<uuid>', false);
select set_config('app.tenant_id', '<uuid>', false);
select set_config('app.roles', 'role1,role2', false);
-- behind a connection pooler, pass true instead so the settings
-- are scoped to the current transaction only
```
* RLS policies:
* Base policy: `tenant_id = current_setting('app.tenant_id')::uuid`.
* Extra predicates for per-user privacy (e.g., only see own tokens, only own API clients).
* DB users:
* Each module's service has its **own role** with access only to its schema + `core.outbox`.
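A minimal sketch of that per-module role setup, using the schema names from §0.1 (role names are illustrative):

```sql
-- One least-privilege role per module service.
create role vex_service login password '...';  -- manage the secret outside SQL
grant usage on schema vex to vex_service;
grant select, insert, update on all tables in schema vex to vex_service;
alter default privileges in schema vex
  grant select, insert, update on tables to vex_service;

-- The only thing it touches outside its own schema:
grant usage on schema core to vex_service;
grant insert on core.outbox to vex_service;
```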
### 0.3 JSONB & versioning
* Any JSONB column must have:
* `payload jsonb not null`,
* `schema_version int not null`.
* Always index:
* by source (`source` / `origin`),
* by a small set of projected fields used in WHERE clauses.
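For those projected fields, a B-tree expression index is usually cheaper and more selective than a broad GIN index. A sketch, assuming a `payload->>'vuln'` field that WHERE clauses filter on:

```sql
-- B-tree on the one projected field queries actually filter by:
create index vuln_fact_vuln_idx
  on vex.vuln_fact ((payload->>'vuln'));

-- Matching query: the planner can use the index because the
-- expression is written exactly as indexed.
select id, received_at
from vex.vuln_fact
where payload->>'vuln' = 'CVE-2025-0001';
```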
### 0.4 Migrations
* All schema changes via migrations, forward-only.
* Backwards-compat pattern:
1. Add new columns / tables.
2. Backfill.
3. Flip code to use new structure (behind a feature flag).
4. After stability, remove old columns/paths.
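A sketch of steps 1-2 for a hypothetical `advisory_url` field being promoted out of JSONB (column name invented for illustration):

```sql
-- Step 1: additive change only; old code keeps working.
alter table vex.vuln_fact add column advisory_url text;

-- Step 2: backfill (batch this in production to avoid long-held locks).
update vex.vuln_fact
set advisory_url = payload->>'advisory_url'
where advisory_url is null;

-- Step 4, only after the flag flip has been stable:
-- alter table vex.vuln_fact drop column <old_column>;
```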
---
## 1. Authority Module (auth, accounts, audit)
**Schema:** `authority.*`
**Mission:** identity, OAuth, roles, grants, audit.
### 1.1 Core tables & patterns
* `authority.user`
```sql
create table authority."user" (  -- "user" is a reserved word in SQL; quote it
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
email text not null,
display_name text not null,
is_disabled boolean not null default false,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, email)
);
```
* Never hard-delete users: use `is_disabled` (and optionally `disabled_at`).
* `authority.role`
```sql
create table authority.role (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
name text not null,
description text,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, name)
);
```
* `authority.grant`
```sql
create table authority."grant" (  -- "grant" is a reserved word in SQL; quote it
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
user_id uuid not null references authority."user"(id),
role_id uuid not null references authority.role(id),
created_at timestamptz not null default now(),
unique (tenant_id, user_id, role_id)
);
```
* `authority.oauth_client`, `authority.oauth_token`
* Enforce token uniqueness:
```sql
create table authority.oauth_token (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
user_id uuid not null references authority."user"(id),
client_id uuid not null references authority.oauth_client(id),
token_hash text not null, -- hash, never raw
expires_at timestamptz not null,
created_at timestamptz not null default now(),
revoked_at timestamptz,
unique (token_hash)
);
```
### 1.2 Audit log pattern
* `authority.audit_log`
```sql
create table authority.audit_log (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
actor_id uuid, -- null for system
action text not null,
entity_type text not null,
entity_id uuid,
at timestamptz not null default now(),
diff jsonb not null
);
```
* Insert audit rows in the **same transaction** as the change.
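A sketch of that rule for a hypothetical rename action (the `"user"` table name is quoted because `user` is a reserved word):

```sql
begin;

update authority."user"
set display_name = 'New Name', updated_at = now()
where id = :user_id
  and tenant_id = current_setting('app.tenant_id')::uuid;

-- Audit row in the SAME transaction: either both commit or neither does.
insert into authority.audit_log
  (tenant_id, actor_id, action, entity_type, entity_id, diff)
values (
  current_setting('app.tenant_id')::uuid,
  current_setting('app.user_id')::uuid,
  'user.rename',
  'user',
  :user_id,
  jsonb_build_object('display_name', jsonb_build_object('new', 'New Name'))
);

commit;
```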
### 1.3 RLS patterns
* Base RLS:
```sql
alter table authority."user" enable row level security;
create policy p_user_tenant on authority."user"
for all using (tenant_id = current_setting('app.tenant_id')::uuid);
```
* Extra policies:
* Audit log is visible only to:
* the actor themselves, or
* users with an `auditor` or `admin` role.
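One way to express that audit-log policy with the `app.*` session settings from §0.2 (a sketch; the comma-separated `app.roles` format is an assumption carried over from that section):

```sql
alter table authority.audit_log enable row level security;

create policy p_audit_read on authority.audit_log
for select using (
  tenant_id = current_setting('app.tenant_id')::uuid
  and (
    -- the actor can see their own entries...
    actor_id = current_setting('app.user_id')::uuid
    -- ...and auditors/admins can see everything in the tenant
    or string_to_array(current_setting('app.roles'), ',')
       && array['auditor', 'admin']
  )
);
```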
---
## 2. Routing & Feature Flags Module
**Schema:** `routing.*`
**Mission:** where instances live, what features are on, rate-limit configuration.
### 2.1 Feature flags
* `routing.feature_flag`
```sql
create table routing.feature_flag (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
key text not null,
rules jsonb not null,
version int not null default 1,
is_enabled boolean not null default true,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, key)
);
```
* **Immutability by version**:
* On update, **increment `version`**; don't overwrite historical data.
* Mirror changes into a history table via trigger:
```sql
create table routing.feature_flag_history (
id uuid primary key default gen_random_uuid(),
feature_flag_id uuid not null references routing.feature_flag(id),
tenant_id uuid not null,
key text not null,
rules jsonb not null,
version int not null,
changed_at timestamptz not null default now(),
changed_by uuid
);
```
### 2.2 Instance registry
* `routing.instance`
```sql
create table routing.instance (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
instance_key text not null,
domain text not null,
last_heartbeat timestamptz not null default now(),
status text not null check (status in ('active','draining','offline')),
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, instance_key),
unique (tenant_id, domain)
);
```
* Pattern:
* Heartbeats use `update ... set last_heartbeat = now()` without touching other fields.
* Routing logic filters by `status='active'` and heartbeat recency.
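Sketched as SQL (the 30-second freshness window is an illustrative choice, not a prescribed value):

```sql
-- Heartbeat: touch only the heartbeat column.
update routing.instance
set last_heartbeat = now()
where tenant_id = :tenant_id and instance_key = :instance_key;

-- Routing lookup: active and recently seen.
select id, domain
from routing.instance
where tenant_id = :tenant_id
  and status = 'active'
  and last_heartbeat > now() - interval '30 seconds';
```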
### 2.3 Rate-limit configuration
* Config in Postgres, counters in Redis:
```sql
create table routing.rate_limit_config (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
key text not null,
limit_per_interval int not null,
interval_seconds int not null,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, key)
);
```
---
## 3. VEX & Vulnerability Module
**Schema:** `vex.*`
**Mission:** ingest vulnerability facts, keep decisions & triage state.
### 3.1 Facts as JSONB
* `vex.vuln_fact`
```sql
create table vex.vuln_fact (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
source text not null, -- e.g. "nvd", "vendor_x_vex"
external_id text, -- e.g. CVE, advisory id
payload jsonb not null,
schema_version int not null,
received_at timestamptz not null default now()
);
```
* Index patterns:
```sql
create index on vex.vuln_fact (tenant_id, source);
create index on vex.vuln_fact (tenant_id, external_id);
create index vuln_fact_payload_gin on vex.vuln_fact using gin (payload);
```
### 3.2 Decisions as relational data
* `vex.package`
```sql
create table vex.package (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
name text not null,
version text not null,
ecosystem text not null, -- e.g. "pypi", "npm"
created_at timestamptz not null default now(),
unique (tenant_id, name, version, ecosystem)
);
```
* `vex.vex_decision`
```sql
create table vex.vex_decision (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
package_id uuid not null references vex.package(id),
vuln_id text not null,
status text not null check (status in (
'not_affected', 'affected', 'fixed', 'under_investigation'
)),
rationale text,
proof_ref text, -- CAS digest or URL
decided_by uuid,
decided_at timestamptz not null default now(),
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, package_id, vuln_id)
);
```
* For history:
* Keep current state in `vex_decision`.
* Mirror previous versions into `vex_decision_history` table (similar to feature flags).
### 3.3 Triage queues with materialized views
* Example triage view:
```sql
create materialized view vex.mv_triage_queue as
select
d.tenant_id,
p.name,
p.version,
d.vuln_id,
d.status,
d.decided_at
from vex.vex_decision d
join vex.package p on p.id = d.package_id
where d.status = 'under_investigation';
```
* Refresh options:
* Scheduled refresh (cron/worker).
* Or **incremental** via triggers (more complex; use only when needed).
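For non-blocking scheduled refreshes, note that `REFRESH MATERIALIZED VIEW CONCURRENTLY` requires a unique index on the view. A sketch, assuming the chosen columns are genuinely unique per row of this view:

```sql
-- CONCURRENTLY needs a unique index covering every row of the view.
create unique index mv_triage_queue_uq
  on vex.mv_triage_queue (tenant_id, vuln_id, name, version);

-- Scheduled job: readers are not blocked while this runs.
refresh materialized view concurrently vex.mv_triage_queue;
```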
### 3.4 RLS for VEX
* All tables scoped by `tenant_id`.
* Typical policy:
```sql
alter table vex.vex_decision enable row level security;
create policy p_vex_tenant on vex.vex_decision
for all using (tenant_id = current_setting('app.tenant_id')::uuid);
```
---
## 4. Unknowns Module
**Schema:** `unknowns.*`
**Mission:** represent uncertainty and how it changes over time.
### 4.1 Bitemporal unknowns table
* `unknowns.unknown`
```sql
create table unknowns.unknown (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
subject_hash text not null, -- stable identifier for "thing" being reasoned about
kind text not null, -- e.g. "reachability", "version_inferred"
context jsonb not null, -- extra info: call graph node, evidence, etc.
valid_from timestamptz not null default now(),
valid_to timestamptz,
sys_from timestamptz not null default now(),
sys_to timestamptz,
created_at timestamptz not null default now()
);
```
* “Exactly one open unknown per subject/kind” pattern:
```sql
create unique index unknown_one_open_per_subject
on unknowns.unknown (tenant_id, subject_hash, kind)
where valid_to is null;
```
### 4.2 Closing an unknown
* Close by setting `valid_to` and `sys_to`:
```sql
update unknowns.unknown
set valid_to = now(), sys_to = now()
where id = :id and valid_to is null;
```
* Never hard-delete; keep all rows for audit/explanation.
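Closing and replacing an unknown can then be done in one short transaction; the partial unique index from §4.1 rejects a second open row if two writers race:

```sql
begin;

-- Close the current open row...
update unknowns.unknown
set valid_to = now(), sys_to = now()
where tenant_id = :tenant_id
  and subject_hash = :subject_hash
  and kind = :kind
  and valid_to is null;

-- ...then open its successor; the partial unique index guarantees
-- at most one open unknown per (tenant, subject, kind).
insert into unknowns.unknown (tenant_id, subject_hash, kind, context)
values (:tenant_id, :subject_hash, :kind, :new_context);

commit;
```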
### 4.3 Convenience views
* Current unknowns:
```sql
create view unknowns.current as
select *
from unknowns.unknown
where valid_to is null;
```
### 4.4 RLS
* Same tenant policy as other modules; unknowns are tenant-scoped.
---
## 5. Artifact Index / CAS Module
**Schema:** `artifact.*`
**Mission:** index of immutable blobs stored in OCI / S3 / MinIO etc.
### 5.1 Artifact index
* `artifact.artifact`
```sql
create table artifact.artifact (
digest text primary key, -- e.g. "sha256:..."
tenant_id uuid not null,
media_type text not null,
size_bytes bigint not null,
created_at timestamptz not null default now(),
created_by uuid
);
```
* Validate digest shape with a CHECK:
```sql
alter table artifact.artifact
add constraint chk_digest_format
check (digest ~ '^sha[0-9]+:[0-9a-fA-F]{32,}$');
```
### 5.2 Signatures and tags
* `artifact.signature`
```sql
create table artifact.signature (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
artifact_digest text not null references artifact.artifact(digest),
signer text not null,
signature_payload jsonb not null,
created_at timestamptz not null default now()
);
```
* `artifact.tag`
```sql
create table artifact.tag (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
name text not null,
artifact_digest text not null references artifact.artifact(digest),
created_at timestamptz not null default now(),
unique (tenant_id, name)
);
```
### 5.3 RLS
* Ensure that tenants cannot see each other's digests, even if the CAS backing store is shared:
```sql
alter table artifact.artifact enable row level security;
create policy p_artifact_tenant on artifact.artifact
for all using (tenant_id = current_setting('app.tenant_id')::uuid);
```
---
## 6. Shared Outbox / Event Pattern
**Schema:** `core.*`
**Mission:** reliable events for external side-effects.
### 6.1 Outbox table
* `core.outbox`
```sql
create table core.outbox (
id uuid primary key default gen_random_uuid(),
tenant_id uuid,
aggregate_type text not null, -- e.g. "vex_decision", "feature_flag"
aggregate_id uuid,
topic text not null,
payload jsonb not null,
created_at timestamptz not null default now(),
dispatched_at timestamptz,
dispatch_attempts int not null default 0,
error text
);
```
### 6.2 Usage rule
* For anything that must emit an event (webhook, Kafka, notifications):
* In the **same transaction** as the change:
* write primary data (e.g. `vex.vex_decision`),
* insert an `outbox` row.
* A background worker:
* pulls undelivered rows,
* sends to external system,
* updates `dispatched_at`/`dispatch_attempts`/`error`.
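A sketch of the worker's claim/publish/ack cycle. `FOR UPDATE SKIP LOCKED` lets several workers share the queue without contending, and the external publish happens outside any transaction, consistent with the guideline in §8:

```sql
-- Step 1: claim a batch and record the attempt in one short transaction.
with claimed as (
  select id
  from core.outbox
  where dispatched_at is null
  order by created_at
  limit 100
  for update skip locked
)
update core.outbox o
set dispatch_attempts = o.dispatch_attempts + 1
from claimed c
where o.id = c.id
returning o.id, o.topic, o.payload;

-- Step 2 (outside any transaction): publish each row to the broker/webhook.

-- Step 3: mark success, or record the failure for retry.
update core.outbox set dispatched_at = now() where id = :id;
-- on failure: update core.outbox set error = :error where id = :id;
```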
---
## 7. Indexing & Query Patterns per Module
### 7.1 Authority
* Index:
* `user(tenant_id, email)`
* `grant(tenant_id, user_id)`
* `oauth_token(token_hash)`
* Typical query patterns:
* Look up user by `tenant_id + email`.
* All roles/grants for a user; design composite indexes accordingly.
### 7.2 Routing & Flags
* Index:
* `feature_flag(tenant_id, key)`
* partial index on enabled flags:
```sql
create index on routing.feature_flag (tenant_id, key)
where is_enabled;
```
* `instance(tenant_id, status)`, `instance(tenant_id, domain)`.
### 7.3 VEX
* Index:
* `package(tenant_id, name, version, ecosystem)`
* `vex_decision(tenant_id, package_id, vuln_id)`
* GIN on `vuln_fact.payload` for flexible querying.
### 7.4 Unknowns
* Index:
* unique open unknown per subject/kind (shown above).
* `unknown(tenant_id, kind)` for filtering by kind.
### 7.5 Artifact
* Index:
* PK on `digest`.
* `signature(tenant_id, artifact_digest)`.
* `tag(tenant_id, name)`.
---
## 8. Transaction & Isolation Guidelines
* Default isolation: **READ COMMITTED**.
* For critical sequences (e.g., provisioning a tenant, bulk role updates):
* consider **REPEATABLE READ** or **SERIALIZABLE** and keep transactions short.
* Pattern:
* One transaction per logical user action (e.g., “set flag”, “record decision”).
* Never do long-running external calls inside a database transaction.
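For example, a tenant-provisioning sequence under SERIALIZABLE (table choices are illustrative; the application retries on serialization failures):

```sql
begin isolation level serializable;

-- Several related writes that must agree with each other:
insert into authority.role (tenant_id, name)
values (:tenant_id, 'admin');

insert into routing.feature_flag (tenant_id, key, rules)
values (:tenant_id, 'pg_reads', '{}'::jsonb);

commit;
-- On SQLSTATE 40001 (serialization_failure), retry the whole
-- transaction; keep it short so retries stay cheap.
```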
---
If you'd like, as a next step I can turn this into:
* concrete `CREATE SCHEMA` + `CREATE TABLE` migration files, and
* a short “How to write queries in each module” cheatsheet for devs (with example SELECT/INSERT/UPDATE patterns).