Files
StellaOps Bot ca578801fd save progress
2026-01-03 00:49:19 +02:00

584 lines
16 KiB
Markdown

# Schema Mapping Reference
This document provides a comprehensive mapping of all data storage schemas across PostgreSQL, Valkey, and RustFS (S3), organized by module ownership.
## Table of Contents
- [1. Storage Overview](#1-storage-overview)
- [2. PostgreSQL Schema Ownership](#2-postgresql-schema-ownership)
- [3. Valkey Keyspace Reference](#3-valkey-keyspace-reference)
- [4. RustFS (S3) Path Conventions](#4-rustfs-s3-path-conventions)
- [5. Module-to-Storage Mapping](#5-module-to-storage-mapping)
- [6. Cross-References](#6-cross-references)
---
## 1. Storage Overview
StellaOps uses three primary storage systems:
| Storage | Purpose | Technology | Notes |
|---------|---------|------------|-------|
| **PostgreSQL** | Canonical persistent store | PostgreSQL v16+ | Per-module schema isolation |
| **Valkey** | Cache, queues, events | Valkey v8.0 (Redis-compatible) | DPoP nonces, task streams |
| **RustFS** | Object storage | S3-compatible API | Content-addressed blobs |
### Storage Principles
1. **Schema Isolation**: Each module owns its PostgreSQL schema
2. **Append-Only for Evidence**: Advisory, VEX, and SBOM raw data is immutable (AOC)
3. **Content-Addressable**: Blob storage uses SHA256 digest prefixes
4. **Deterministic Keys**: Valkey keys follow predictable patterns
5. **TTL Management**: Cache entries have explicit TTLs
---
## 2. PostgreSQL Schema Ownership
### Schema-to-Module Mapping
| Schema | Owner Module | Purpose | Key Tables |
|--------|--------------|---------|------------|
| `authority` | Authority | Identity, clients, keys, auth audit | `clients`, `keys`, `tokens`, `audit_trail` |
| `scanner` | Scanner | Scan manifests, triage, metadata | `scans`, `artifacts`, `images`, `layers` |
| `vuln` | Concelier | Advisory raw documents, linksets | `advisory_raw`, `linksets`, `observations` |
| `vex` | Excititor | VEX raw statements, consensus | `vex_raw`, `consensus`, `issuer_trust` |
| `policy` | Policy | Policies, exceptions, findings | `policies`, `exceptions`, `effective_finding_*` |
| `scheduler` | Scheduler | Jobs, runs, schedules | `schedules`, `runs`, `impact_snapshots` |
| `notify` | Notifier | Rules, channels, delivery history | `rules`, `channels`, `templates`, `delivery_log` |
| `orchestrator` | Orchestrator | Workflows, tasks | `workflows`, `tasks`, `task_runs` |
| `registry` | Registry | Token service, image metadata | `tokens`, `repositories` |
| `symbols` | Symbols | Symbol resolution | `symbols`, `debug_info` |
| `unknowns` | Unknowns | Unknown components | `unknown_components`, `tracking` |
### Detailed Schema Definitions
#### Schema: `authority`
```sql
-- Core identity tables
clients (
client_id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
name VARCHAR(255),
client_type VARCHAR(50), -- confidential, public, service
redirect_uris TEXT[],
scopes TEXT[],
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ
);
keys (
key_id UUID PRIMARY KEY,
client_id UUID REFERENCES clients,
algorithm VARCHAR(50), -- RS256, ES256, EdDSA
public_key TEXT,
created_at TIMESTAMPTZ,
expires_at TIMESTAMPTZ
);
tokens (
token_id UUID PRIMARY KEY,
client_id UUID REFERENCES clients,
subject VARCHAR(255),
scopes TEXT[],
issued_at TIMESTAMPTZ,
expires_at TIMESTAMPTZ,
revoked_at TIMESTAMPTZ
);
audit_trail (
audit_id UUID PRIMARY KEY,
action VARCHAR(100),
actor_id VARCHAR(255),
resource_type VARCHAR(100),
resource_id VARCHAR(255),
occurred_at TIMESTAMPTZ,
metadata JSONB
);
```
#### Schema: `scanner`
```sql
-- Scan lifecycle tables
scans (
scan_id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
image_digest VARCHAR(100),
image_reference TEXT,
state VARCHAR(50), -- pending, acquired, running, completed, failed
created_at TIMESTAMPTZ,
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
metadata JSONB
);
artifacts (
artifact_id VARCHAR(100) PRIMARY KEY, -- sha256:...
scan_id UUID REFERENCES scans,
format VARCHAR(50), -- cdx-json, spdx-json, cdx-pb
created_at TIMESTAMPTZ,
rekor_proof JSONB
);
images (
image_digest VARCHAR(100) PRIMARY KEY,
repository TEXT,
tag TEXT,
architecture VARCHAR(50),
os VARCHAR(50),
created_at TIMESTAMPTZ
);
layers (
layer_digest VARCHAR(100) PRIMARY KEY,
media_type VARCHAR(255),
size BIGINT,
created_at TIMESTAMPTZ
);
scan_artifacts (
scan_id UUID REFERENCES scans,
artifact_id VARCHAR(100) REFERENCES artifacts,
PRIMARY KEY (scan_id, artifact_id)
);
```
#### Schema: `vuln`
```sql
-- Advisory storage (append-only, AOC enforced)
advisory_raw (
raw_id UUID PRIMARY KEY,
advisory_id VARCHAR(100), -- CVE-2024-xxxx
source VARCHAR(50), -- NVD, RED_HAT, OSV, GHSA
raw_document JSONB NOT NULL, -- Original JSON as-received
published_at TIMESTAMPTZ,
revision INTEGER,
created_at TIMESTAMPTZ,
UNIQUE (advisory_id, source, revision)
);
linksets (
linkset_id UUID PRIMARY KEY,
advisory_id VARCHAR(100),
purl TEXT,
version_range JSONB,
created_at TIMESTAMPTZ
);
observations (
observation_id UUID PRIMARY KEY,
advisory_id VARCHAR(100),
severity_cvss3 JSONB,
severity_cvss4 JSONB,
description TEXT,
references JSONB,
created_at TIMESTAMPTZ
);
```
#### Schema: `vex`
```sql
-- VEX storage (append-only, AOC enforced)
vex_raw (
raw_id UUID PRIMARY KEY,
issuer_id VARCHAR(255),
component_purl TEXT,
vulnerability_id VARCHAR(100),
status VARCHAR(50), -- not_affected, affected, under_investigation
justification VARCHAR(100),
raw_statement JSONB NOT NULL,
published_at TIMESTAMPTZ,
signature JSONB, -- DSSE envelope if signed
created_at TIMESTAMPTZ
);
consensus (
consensus_id UUID PRIMARY KEY,
component_purl TEXT,
vulnerability_id VARCHAR(100),
resolved_status VARCHAR(50),
conflict_detected BOOLEAN,
contributing_vex_ids UUID[],
computed_at TIMESTAMPTZ
);
issuer_trust (
issuer_id VARCHAR(255) PRIMARY KEY,
trust_score DECIMAL(3,2), -- 0.00 to 1.00
priority INTEGER, -- Lower = higher priority
issuer_type VARCHAR(50), -- vendor, distro, researcher, community
updated_at TIMESTAMPTZ
);
```
#### Schema: `policy`
```sql
-- Policy definitions and lifecycle
policies (
policy_id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
name VARCHAR(255),
version INTEGER,
state VARCHAR(50), -- DRAFT, SHADOW, ACTIVE, ENFORCING
yaml_content TEXT,
compiled_ir JSONB,
ir_hash VARCHAR(100),
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ
);
policy_runs (
run_id UUID PRIMARY KEY,
policy_id UUID REFERENCES policies,
tenant_id UUID NOT NULL,
cursor JSONB,
stats JSONB,
determinism_hash VARCHAR(100),
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ
);
exceptions (
exception_id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
policy_id UUID,
component_purl TEXT,
vulnerability_id VARCHAR(100),
effect VARCHAR(50), -- suppress, defer, downgrade, require_control
approval_level VARCHAR(10), -- G0, G1, G2, G3, G4
expires_at TIMESTAMPTZ,
created_at TIMESTAMPTZ,
created_by VARCHAR(255)
);
exception_approval_audit (
audit_id UUID PRIMARY KEY,
exception_id UUID REFERENCES exceptions,
approval_level VARCHAR(10),
approver_id VARCHAR(255),
approved_at TIMESTAMPTZ,
comment TEXT
);
-- Dynamic per-policy finding tables (created dynamically)
-- effective_finding_{policyId} - Current snapshot
-- effective_finding_{policyId}_history - Audit trail
```
#### Schema: `scheduler`
```sql
-- Job scheduling
schedules (
schedule_id VARCHAR(100) PRIMARY KEY,
tenant_id UUID NOT NULL,
name VARCHAR(255),
enabled BOOLEAN,
cron_expression VARCHAR(100),
timezone VARCHAR(50),
mode VARCHAR(50),
selection JSONB,
notify JSONB,
limits JSONB,
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ
);
runs (
run_id VARCHAR(100) PRIMARY KEY,
tenant_id UUID NOT NULL,
schedule_id VARCHAR(100) REFERENCES schedules,
trigger VARCHAR(50),
state VARCHAR(50),
stats JSONB,
deltas JSONB,
created_at TIMESTAMPTZ,
started_at TIMESTAMPTZ,
finished_at TIMESTAMPTZ,
error TEXT
);
run_summaries (
summary_id VARCHAR(200) PRIMARY KEY, -- tenant:schedule
tenant_id UUID NOT NULL,
schedule_id VARCHAR(100),
last_run JSONB,
recent JSONB[],
counters JSONB,
updated_at TIMESTAMPTZ
);
```
#### Schema: `notify`
```sql
-- Notification routing
rules (
rule_id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
name VARCHAR(255),
match JSONB,
actions JSONB,
enabled BOOLEAN,
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ,
deleted_at TIMESTAMPTZ
);
channels (
channel_id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
name VARCHAR(255),
type VARCHAR(50), -- slack, teams, email, webhook
config JSONB,
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ,
deleted_at TIMESTAMPTZ
);
templates (
template_id UUID PRIMARY KEY,
channel_type VARCHAR(50),
key VARCHAR(100),
locale VARCHAR(10),
render_mode VARCHAR(50),
body TEXT,
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ
);
delivery_log (
delivery_id UUID PRIMARY KEY,
rule_id UUID REFERENCES rules,
channel_id UUID REFERENCES channels,
event_id UUID,
status VARCHAR(50),
delivered_at TIMESTAMPTZ,
error TEXT
);
```
---
## 3. Valkey Keyspace Reference
### Key Patterns by Module
| Module | Key Pattern | Type | TTL | Purpose |
|--------|-------------|------|-----|---------|
| **Authority** | `dpop:{jti}` | string | 5m | DPoP nonce cache (RFC 9449) |
| **Scanner** | `scan:{digest}` | string | Infinite | Last scan JSON result |
| **Scanner** | `layers:{digest}` | set | 90d | Layers with SBOMs (delta cache) |
| **Scanner** | `locator:{imageDigest}` | string | 30d | Image digest to sbomBlobId mapping |
| **Policy** | `policy:active` | string | Infinite | Active policy YAML/Rego |
| **Policy** | `policy:history` | list | Infinite | Change audit IDs |
| **Concelier** | `feed:nvd:json` | string | 24h | Normalized feed snapshot |
| **General** | `quota:{token}` | string | Until UTC midnight | Per-token scan counter |
| **Scheduler** | `scheduler:jobs` | stream | - | Job queue |
| **Notifier** | `notify:delivery` | stream | 7d | Delivery events |
| **All** | `events:*` | stream | 7d | Event streams |
| **All** | `queue:*` | stream | - | Task queues |
| **Telemetry** | `metrics:*` | various | - | Runtime metrics |
### Stream Definitions
| Stream | Producers | Consumers | Events |
|--------|-----------|-----------|--------|
| `scanner:events` | Scanner.Worker | Policy, Notify, TimelineIndexer, ExportCenter | scan.submitted, scan.completed, scan.failed |
| `concelier:drift` | Concelier.Worker | Scheduler, Policy, Notify | advisory.new, advisory.updated |
| `policy:evaluated` | Policy.Worker | Notify, Findings, ExportCenter | evaluation.completed, verdict.changed |
| `scheduler:jobs` | Scheduler | Scanner, Policy | run.started, rescan.triggered |
| `notify:delivery` | Notifier | Audit, TimelineIndexer | notification.sent, notification.failed |
### Valkey Configuration
```yaml
# Recommended Valkey configuration for StellaOps
maxmemory: 2gb
maxmemory-policy: volatile-lru
stream-node-max-bytes: 4096
stream-node-max-entries: 100
# Consumer groups for job processing
scanner:jobs:
consumer_group: scanner-workers
idle_timeout: 60s
notify:delivery:
consumer_group: notify-workers
idle_timeout: 30s
```
---
## 4. RustFS (S3) Path Conventions
### Blob Storage Layout
```
blobs/
+-- {sha256_prefix}/ # First 4 chars of digest
+-- {full_digest}/
+-- sbom.json # SBOM payload (any format)
+-- sbom.meta.json # Wrapper envelope
+-- sbom.cdx.pb # CycloneDX Protobuf (compact)
+-- attestation.dsse # DSSE envelope
+-- evidence.bundle # Evidence package
images/
+-- {imageDigest}/
+-- inventory.cdx.json # Inventory SBOM
+-- inventory.cdx.pb # Inventory (Protobuf)
+-- usage.cdx.json # Usage SBOM (entrypoint closure)
+-- usage.cdx.pb # Usage (Protobuf)
+-- call-graph.json # Call graph data
+-- reachability.json # Reachability analysis
evidence/
+-- {bundleId}/
+-- manifest.json # Bundle manifest
+-- manifest.dsse # Signed manifest
+-- sboms/ # SBOM files
+-- attestations/ # Attestation files
+-- proofs/ # Verification proofs
offline-kits/
+-- {kitId}/
+-- feeds/ # Advisory snapshots
+-- images/ # Container images
+-- signatures/ # DSSE signatures
+-- trust-roots/ # CA certificates
+-- manifest.json # Kit manifest
```
### SBOM Wrapper Envelope
```json
{
"id": "sha256:417f...",
"imageDigest": "sha256:e2b9...",
"created": "2025-01-02T15:30:00Z",
"format": "cdx-json",
"layers": [
"sha256:d38b...",
"sha256:af45..."
],
"partial": false,
"provenanceId": "prov_0291"
}
```
### Content Types
| Extension | Content-Type | Description |
|-----------|--------------|-------------|
| `.json` | `application/json` | JSON documents |
| `.cdx.json` | `application/vnd.cyclonedx+json` | CycloneDX JSON |
| `.cdx.pb` | `application/vnd.cyclonedx+protobuf` | CycloneDX Protobuf |
| `.spdx.json` | `application/spdx+json` | SPDX JSON |
| `.dsse` | `application/vnd.dsse+json` | DSSE envelope |
| `.bundle` | `application/zip` | Evidence bundle |
---
## 5. Module-to-Storage Mapping
### Complete Reference Table
| Module | PostgreSQL Schema | Valkey Keys | RustFS Paths |
|--------|------------------|-------------|--------------|
| **Authority** | `authority` | `dpop:{jti}` | - |
| **Gateway** | - (stateless) | - | - |
| **Router** | - | connection state | - |
| **Scanner** | `scanner` | `scan:{digest}`, `layers:{digest}`, `locator:{imageDigest}` | `blobs/`, `images/` |
| **Concelier** | `vuln` | `feed:*`, `concelier:drift` stream | - |
| **Excititor** | `vex` | - | - |
| **VexLens** | - (reads `vex`) | - | - |
| **VexHub** | `vex` (extension) | - | - |
| **IssuerDirectory** | - (reads `vex.issuer_trust`) | - | - |
| **Policy** | `policy` | `policy:active`, `policy:history` | - |
| **RiskEngine** | - (reads `policy`) | - | - |
| **Scheduler** | `scheduler` | `scheduler:jobs` stream | - |
| **Notifier** | `notify` | `notify:delivery` stream | - |
| **Orchestrator** | `orchestrator` | `orchestrator:*` streams | - |
| **Attestor** | - (uses `scanner`) | - | `blobs/*/attestation.dsse` |
| **Signer** | - (uses `authority`) | - | - |
| **SbomService** | - (reads `scanner`) | - | `blobs/`, `images/` |
| **EvidenceLocker** | - | - | `evidence/` |
| **ExportCenter** | - | - | `evidence/`, `offline-kits/` |
| **AirGap** | - | - | `offline-kits/` |
| **Registry** | `registry` | - | - |
| **Symbols** | `symbols` | - | - |
| **Unknowns** | `unknowns` | - | - |
| **TimelineIndexer** | - (writes to `scanner`, etc.) | - | - |
---
## 6. Cross-References
### Data Flow Dependencies
```
+---------+ +-----------+ +--------+ +--------+
| Scanner |---->| SbomService|---->| Policy |---->| Notify |
+---------+ +-----------+ +--------+ +--------+
| | | |
v v v v
+----------+ +-----------+ +-----------+ +---------+
| scanner | | blobs/ | | policy | | notify |
| (PG) | | (RustFS) | | (PG) | | (PG) |
+----------+ +-----------+ +-----------+ +---------+
+-----------+ +----------+
| Concelier |---->| Policy |
+-----------+ +----------+
| ^
v |
+----------+ +-----------+
| vuln | | Excititor |
| (PG) | +-----------+
+----------+ |
v
+----------+
| vex |
| (PG) |
+----------+
```
### Schema Version Tracking
All schemas support versioning:
```sql
-- Every schema has a version tracking table
schema_migrations (
version INTEGER PRIMARY KEY,
description TEXT,
applied_at TIMESTAMPTZ,
checksum VARCHAR(64)
);
```
### Backup Considerations
| Storage | Backup Strategy | Retention |
|---------|-----------------|-----------|
| PostgreSQL | pg_dump + WAL archiving | 30 days |
| Valkey | RDB snapshots + AOF | 7 days |
| RustFS | Object versioning | 90 days |
---
## Related Documentation
- [Data Schemas](../../11_DATA_SCHEMAS.md) - Detailed schema definitions
- [Data Flows](data-flows.md) - How data moves through the system
- [Data Isolation](data-isolation.md) - Per-tenant isolation model
- [Module Matrix](module-matrix.md) - Complete module inventory
- [Request Flows](request-flows.md) - HTTP/Binary protocol flows