Add reference architecture and testing strategy documentation

- Created a new document for the Stella Ops Reference Architecture outlining the system's topology, trust boundaries, artifact association, and interfaces.
- Developed a comprehensive Testing Strategy document detailing the importance of offline readiness, interoperability, determinism, and operational guardrails.
- Introduced a README for the Testing Strategy, summarizing processing details and key concepts implemented.
- Added guidance for AI agents and developers in the tests directory, including directory structure, test categories, key patterns, and rules for test development.
2025-12-22 07:59:15 +02:00
parent 5d398ec442
commit 53503cb407
96 changed files with 37565 additions and 71 deletions


@@ -0,0 +1,680 @@
# Binaries Schema Specification
**Version:** 1.0.0
**Status:** DRAFT
**Owner:** BinaryIndex Module
**Last Updated:** 2025-12-21

---
## 1. Overview
The `binaries` schema stores binary identity, vulnerability mappings, fingerprints, and patch-aware fix status for the BinaryIndex module. This enables detection of vulnerable binaries independent of package metadata.
## 2. Schema Definition
```sql
-- ============================================================================
-- BINARIES SCHEMA
-- ============================================================================
-- Purpose: Binary identity, fingerprint, and vulnerability mapping for
-- the BinaryIndex module (vulnerable binaries database).
-- ============================================================================
CREATE SCHEMA IF NOT EXISTS binaries;
CREATE SCHEMA IF NOT EXISTS binaries_app;
-- ----------------------------------------------------------------------------
-- RLS Helper Function
-- ----------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION binaries_app.require_current_tenant()
RETURNS TEXT
LANGUAGE plpgsql STABLE SECURITY DEFINER
AS $$
DECLARE
v_tenant TEXT;
BEGIN
v_tenant := current_setting('app.tenant_id', true);
IF v_tenant IS NULL OR v_tenant = '' THEN
RAISE EXCEPTION 'app.tenant_id session variable not set';
END IF;
RETURN v_tenant;
END;
$$;
-- ============================================================================
-- CORE IDENTITY TABLES
-- ============================================================================
-- ----------------------------------------------------------------------------
-- Table: binary_identity
-- Purpose: Known binary identities extracted from packages
-- ----------------------------------------------------------------------------
CREATE TABLE binaries.binary_identity (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
-- Primary identity (Build-ID preferred for ELF)
binary_key TEXT NOT NULL, -- build_id if present, otherwise file_sha256 (normalized)
build_id TEXT, -- ELF GNU Build-ID (hex)
build_id_type TEXT CHECK (build_id_type IN ('gnu-build-id', 'pe-cv', 'macho-uuid')),
-- Hashes
file_sha256 TEXT NOT NULL, -- sha256 of entire file
text_sha256 TEXT, -- sha256 of .text section (ELF)
blake3_hash TEXT, -- Optional faster hash
-- Binary metadata
format TEXT NOT NULL CHECK (format IN ('elf', 'pe', 'macho')),
architecture TEXT NOT NULL, -- x86-64, aarch64, arm, etc.
osabi TEXT, -- linux, windows, darwin
binary_type TEXT CHECK (binary_type IN ('executable', 'shared_library', 'static_library', 'object')),
is_stripped BOOLEAN DEFAULT FALSE,
-- Tracking
first_seen_snapshot_id UUID,
last_seen_snapshot_id UUID,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT binary_identity_key_unique UNIQUE (tenant_id, binary_key)
);
-- ----------------------------------------------------------------------------
-- Table: binary_package_map
-- Purpose: Maps binaries to source packages (per snapshot)
-- ----------------------------------------------------------------------------
CREATE TABLE binaries.binary_package_map (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
-- Binary reference
binary_identity_id UUID NOT NULL REFERENCES binaries.binary_identity(id) ON DELETE CASCADE,
binary_key TEXT NOT NULL,
-- Package info
distro TEXT NOT NULL, -- debian, ubuntu, rhel, alpine
release TEXT NOT NULL, -- bookworm, jammy, 9, 3.19
source_pkg TEXT NOT NULL, -- Source package name (e.g., openssl)
binary_pkg TEXT NOT NULL, -- Binary package name (e.g., libssl3)
pkg_version TEXT NOT NULL, -- Full distro version (e.g., 1.1.1n-0+deb11u5)
pkg_purl TEXT, -- PURL if derivable
architecture TEXT NOT NULL,
-- File location
file_path_in_pkg TEXT NOT NULL, -- /usr/lib/x86_64-linux-gnu/libssl.so.3
-- Snapshot reference
snapshot_id UUID NOT NULL,
-- Metadata
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT binary_package_map_unique UNIQUE (binary_identity_id, snapshot_id, file_path_in_pkg)
);
-- ----------------------------------------------------------------------------
-- Table: corpus_snapshots
-- Purpose: Tracks corpus ingestion snapshots
-- ----------------------------------------------------------------------------
CREATE TABLE binaries.corpus_snapshots (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
-- Snapshot identification
distro TEXT NOT NULL,
release TEXT NOT NULL,
architecture TEXT NOT NULL,
snapshot_id TEXT NOT NULL, -- Unique snapshot identifier
-- Content tracking
packages_processed INT NOT NULL DEFAULT 0,
binaries_indexed INT NOT NULL DEFAULT 0,
repo_metadata_digest TEXT, -- SHA-256 of repo metadata
-- Signing
signing_key_id TEXT,
dsse_envelope_ref TEXT, -- RustFS reference to DSSE envelope
-- Status
status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'processing', 'completed', 'failed')),
error TEXT,
-- Timestamps
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT corpus_snapshots_unique UNIQUE (tenant_id, distro, release, architecture, snapshot_id)
);
-- ============================================================================
-- VULNERABILITY MAPPING TABLES
-- ============================================================================
-- ----------------------------------------------------------------------------
-- Table: vulnerable_buildids
-- Purpose: Build-IDs known to be associated with vulnerable packages
-- ----------------------------------------------------------------------------
CREATE TABLE binaries.vulnerable_buildids (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
-- Build-ID reference
buildid_type TEXT NOT NULL CHECK (buildid_type IN ('gnu-build-id', 'pe-cv', 'macho-uuid')),
buildid_value TEXT NOT NULL, -- Hex string
-- Package info
purl TEXT NOT NULL, -- Package URL
pkg_version TEXT NOT NULL,
distro TEXT,
release TEXT,
-- Confidence
confidence TEXT NOT NULL DEFAULT 'exact' CHECK (confidence IN ('exact', 'inferred', 'heuristic')),
-- Provenance
provenance JSONB DEFAULT '{}',
snapshot_id UUID REFERENCES binaries.corpus_snapshots(id),
-- Tracking
indexed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT vulnerable_buildids_unique UNIQUE (tenant_id, buildid_value, buildid_type, purl, pkg_version)
);
-- ----------------------------------------------------------------------------
-- Table: binary_vuln_assertion
-- Purpose: CVE status assertions for specific binaries
-- ----------------------------------------------------------------------------
CREATE TABLE binaries.binary_vuln_assertion (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
-- Binary reference
binary_key TEXT NOT NULL,
binary_identity_id UUID REFERENCES binaries.binary_identity(id),
-- CVE reference
cve_id TEXT NOT NULL,
advisory_id UUID, -- Reference to vuln.advisories
-- Status
status TEXT NOT NULL CHECK (status IN ('affected', 'not_affected', 'fixed', 'unknown')),
-- Method used to determine status
method TEXT NOT NULL CHECK (method IN ('range_match', 'buildid_catalog', 'fingerprint_match', 'fix_index')),
confidence NUMERIC(3,2) CHECK (confidence >= 0 AND confidence <= 1),
-- Evidence
evidence_ref TEXT, -- RustFS reference to evidence bundle
evidence_digest TEXT, -- SHA-256 of evidence
-- Tracking
evaluated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT binary_vuln_assertion_unique UNIQUE (tenant_id, binary_key, cve_id)
);
-- ============================================================================
-- FIX INDEX TABLES (Patch-Aware Backport Handling)
-- ============================================================================
-- ----------------------------------------------------------------------------
-- Table: cve_fix_evidence
-- Purpose: Raw evidence of CVE fixes (append-only)
-- ----------------------------------------------------------------------------
CREATE TABLE binaries.cve_fix_evidence (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
-- Key fields
distro TEXT NOT NULL,
release TEXT NOT NULL,
source_pkg TEXT NOT NULL,
cve_id TEXT NOT NULL,
-- Fix information
state TEXT NOT NULL CHECK (state IN ('fixed', 'vulnerable', 'not_affected', 'wontfix', 'unknown')),
fixed_version TEXT, -- Distro version string (nullable for not_affected)
-- Method and confidence
method TEXT NOT NULL CHECK (method IN ('security_feed', 'changelog', 'patch_header', 'upstream_patch_match')),
confidence NUMERIC(3,2) NOT NULL CHECK (confidence >= 0 AND confidence <= 1),
-- Evidence details
evidence JSONB NOT NULL, -- Method-specific evidence payload
-- Snapshot reference
snapshot_id UUID REFERENCES binaries.corpus_snapshots(id),
-- Tracking
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- ----------------------------------------------------------------------------
-- Table: cve_fix_index
-- Purpose: Merged best-record for CVE fix status per distro/package
-- ----------------------------------------------------------------------------
CREATE TABLE binaries.cve_fix_index (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
-- Key fields
distro TEXT NOT NULL,
release TEXT NOT NULL,
source_pkg TEXT NOT NULL,
cve_id TEXT NOT NULL,
architecture TEXT, -- NULL means all architectures
-- Fix status
state TEXT NOT NULL CHECK (state IN ('fixed', 'vulnerable', 'not_affected', 'wontfix', 'unknown')),
fixed_version TEXT,
-- Merge metadata
primary_method TEXT NOT NULL, -- Method of highest-confidence evidence
confidence NUMERIC(3,2) NOT NULL,
evidence_ids UUID[], -- References to cve_fix_evidence
-- Tracking
computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
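-- Note: architecture is nullable and PostgreSQL treats NULLs as distinct in UNIQUE
-- constraints, so this does not deduplicate rows where architecture IS NULL
-- (PostgreSQL 15+ offers UNIQUE NULLS NOT DISTINCT for that case).
CONSTRAINT cve_fix_index_unique UNIQUE (tenant_id, distro, release, source_pkg, cve_id, architecture)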
);
-- ============================================================================
-- FINGERPRINT TABLES
-- ============================================================================
-- ----------------------------------------------------------------------------
-- Table: vulnerable_fingerprints
-- Purpose: Function fingerprints for CVE detection
-- ----------------------------------------------------------------------------
CREATE TABLE binaries.vulnerable_fingerprints (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
-- CVE and component
cve_id TEXT NOT NULL,
component TEXT NOT NULL, -- e.g., openssl, glibc
purl TEXT, -- Package URL if applicable
-- Fingerprint data
algorithm TEXT NOT NULL CHECK (algorithm IN ('basic_block', 'control_flow_graph', 'string_refs', 'combined')),
fingerprint_id TEXT NOT NULL, -- Unique ID (e.g., "bb-abc123...")
fingerprint_hash BYTEA NOT NULL, -- Raw fingerprint bytes (16-32 bytes)
architecture TEXT NOT NULL, -- x86-64, aarch64
-- Function hints
function_name TEXT, -- Original function name if known
source_file TEXT, -- Source file path
source_line INT,
-- Confidence and validation
similarity_threshold NUMERIC(3,2) DEFAULT 0.95,
confidence NUMERIC(3,2) CHECK (confidence >= 0 AND confidence <= 1),
validated BOOLEAN DEFAULT FALSE,
validation_stats JSONB DEFAULT '{}', -- precision, recall, etc.
-- Reference builds
vuln_build_ref TEXT, -- RustFS ref to vulnerable reference build
fixed_build_ref TEXT, -- RustFS ref to fixed reference build
-- Metadata
notes TEXT,
evidence_ref TEXT, -- RustFS ref to evidence bundle
-- Tracking
indexed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT vulnerable_fingerprints_unique UNIQUE (tenant_id, cve_id, algorithm, fingerprint_id, architecture)
);
-- ----------------------------------------------------------------------------
-- Table: fingerprint_corpus_metadata
-- Purpose: Tracks which packages have been fingerprinted
-- ----------------------------------------------------------------------------
CREATE TABLE binaries.fingerprint_corpus_metadata (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
-- Package identification
purl TEXT NOT NULL,
version TEXT NOT NULL,
-- Fingerprinting info
algorithm TEXT NOT NULL,
binary_digest TEXT, -- sha256 of the binary analyzed
-- Statistics
function_count INT NOT NULL DEFAULT 0,
fingerprints_indexed INT NOT NULL DEFAULT 0,
-- Provenance
indexed_by TEXT, -- Service/user that indexed
indexed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Tracking
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT fingerprint_corpus_metadata_unique UNIQUE (tenant_id, purl, version, algorithm)
);
-- ============================================================================
-- MATCH RESULTS TABLES
-- ============================================================================
-- ----------------------------------------------------------------------------
-- Table: fingerprint_matches
-- Purpose: Records fingerprint matches during scans
-- ----------------------------------------------------------------------------
CREATE TABLE binaries.fingerprint_matches (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
-- Scan reference
scan_id UUID NOT NULL, -- Reference to scanner.scan_manifest
-- Match details
match_type TEXT NOT NULL CHECK (match_type IN ('fingerprint', 'buildid', 'hash_exact')),
binary_key TEXT NOT NULL,
binary_identity_id UUID REFERENCES binaries.binary_identity(id),
-- Vulnerable package
vulnerable_purl TEXT NOT NULL,
vulnerable_version TEXT NOT NULL,
-- Fingerprint match specifics (nullable for non-fingerprint matches)
matched_fingerprint_id UUID REFERENCES binaries.vulnerable_fingerprints(id),
matched_function TEXT,
similarity NUMERIC(3,2), -- 0.00-1.00
-- CVE linkage
advisory_ids TEXT[], -- Linked CVE/GHSA IDs
-- Reachability (populated later by Scanner)
reachability_status TEXT CHECK (reachability_status IN ('reachable', 'unreachable', 'unknown', 'partial')),
-- Evidence
evidence JSONB DEFAULT '{}',
-- Tracking
matched_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- ============================================================================
-- INDEXES
-- ============================================================================
-- binary_identity indexes
CREATE INDEX idx_binary_identity_tenant ON binaries.binary_identity(tenant_id);
CREATE INDEX idx_binary_identity_buildid ON binaries.binary_identity(build_id) WHERE build_id IS NOT NULL;
CREATE INDEX idx_binary_identity_sha256 ON binaries.binary_identity(file_sha256);
CREATE INDEX idx_binary_identity_key ON binaries.binary_identity(binary_key);
-- binary_package_map indexes
CREATE INDEX idx_binary_package_map_tenant ON binaries.binary_package_map(tenant_id);
CREATE INDEX idx_binary_package_map_binary ON binaries.binary_package_map(binary_identity_id);
CREATE INDEX idx_binary_package_map_distro ON binaries.binary_package_map(distro, release, source_pkg);
CREATE INDEX idx_binary_package_map_snapshot ON binaries.binary_package_map(snapshot_id);
CREATE INDEX idx_binary_package_map_purl ON binaries.binary_package_map(pkg_purl) WHERE pkg_purl IS NOT NULL;
-- corpus_snapshots indexes
CREATE INDEX idx_corpus_snapshots_tenant ON binaries.corpus_snapshots(tenant_id);
CREATE INDEX idx_corpus_snapshots_distro ON binaries.corpus_snapshots(distro, release, architecture);
CREATE INDEX idx_corpus_snapshots_status ON binaries.corpus_snapshots(status) WHERE status IN ('pending', 'processing');
-- vulnerable_buildids indexes
CREATE INDEX idx_vulnerable_buildids_tenant ON binaries.vulnerable_buildids(tenant_id);
CREATE INDEX idx_vulnerable_buildids_value ON binaries.vulnerable_buildids(buildid_type, buildid_value);
CREATE INDEX idx_vulnerable_buildids_purl ON binaries.vulnerable_buildids(purl);
-- binary_vuln_assertion indexes
CREATE INDEX idx_binary_vuln_assertion_tenant ON binaries.binary_vuln_assertion(tenant_id);
CREATE INDEX idx_binary_vuln_assertion_binary ON binaries.binary_vuln_assertion(binary_key);
CREATE INDEX idx_binary_vuln_assertion_cve ON binaries.binary_vuln_assertion(cve_id);
CREATE INDEX idx_binary_vuln_assertion_status ON binaries.binary_vuln_assertion(status) WHERE status = 'affected';
-- cve_fix_evidence indexes
CREATE INDEX idx_cve_fix_evidence_tenant ON binaries.cve_fix_evidence(tenant_id);
CREATE INDEX idx_cve_fix_evidence_key ON binaries.cve_fix_evidence(distro, release, source_pkg, cve_id);
-- cve_fix_index indexes
CREATE INDEX idx_cve_fix_index_tenant ON binaries.cve_fix_index(tenant_id);
CREATE INDEX idx_cve_fix_index_lookup ON binaries.cve_fix_index(distro, release, source_pkg, cve_id);
CREATE INDEX idx_cve_fix_index_state ON binaries.cve_fix_index(state) WHERE state = 'fixed';
-- vulnerable_fingerprints indexes
CREATE INDEX idx_vulnerable_fingerprints_tenant ON binaries.vulnerable_fingerprints(tenant_id);
CREATE INDEX idx_vulnerable_fingerprints_cve ON binaries.vulnerable_fingerprints(cve_id);
CREATE INDEX idx_vulnerable_fingerprints_component ON binaries.vulnerable_fingerprints(component, architecture);
CREATE INDEX idx_vulnerable_fingerprints_hash ON binaries.vulnerable_fingerprints USING hash (fingerprint_hash);
CREATE INDEX idx_vulnerable_fingerprints_validated ON binaries.vulnerable_fingerprints(validated) WHERE validated = TRUE;
-- fingerprint_corpus_metadata indexes
CREATE INDEX idx_fingerprint_corpus_tenant ON binaries.fingerprint_corpus_metadata(tenant_id);
CREATE INDEX idx_fingerprint_corpus_purl ON binaries.fingerprint_corpus_metadata(purl, version);
-- fingerprint_matches indexes
CREATE INDEX idx_fingerprint_matches_tenant ON binaries.fingerprint_matches(tenant_id);
CREATE INDEX idx_fingerprint_matches_scan ON binaries.fingerprint_matches(scan_id);
CREATE INDEX idx_fingerprint_matches_type ON binaries.fingerprint_matches(match_type);
CREATE INDEX idx_fingerprint_matches_purl ON binaries.fingerprint_matches(vulnerable_purl);
-- ============================================================================
-- ROW-LEVEL SECURITY
-- ============================================================================
-- Enable RLS on all tenant-scoped tables
ALTER TABLE binaries.binary_identity ENABLE ROW LEVEL SECURITY;
ALTER TABLE binaries.binary_identity FORCE ROW LEVEL SECURITY;
CREATE POLICY binary_identity_tenant_isolation ON binaries.binary_identity
FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant())
WITH CHECK (tenant_id::text = binaries_app.require_current_tenant());
ALTER TABLE binaries.binary_package_map ENABLE ROW LEVEL SECURITY;
ALTER TABLE binaries.binary_package_map FORCE ROW LEVEL SECURITY;
CREATE POLICY binary_package_map_tenant_isolation ON binaries.binary_package_map
FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant())
WITH CHECK (tenant_id::text = binaries_app.require_current_tenant());
ALTER TABLE binaries.corpus_snapshots ENABLE ROW LEVEL SECURITY;
ALTER TABLE binaries.corpus_snapshots FORCE ROW LEVEL SECURITY;
CREATE POLICY corpus_snapshots_tenant_isolation ON binaries.corpus_snapshots
FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant())
WITH CHECK (tenant_id::text = binaries_app.require_current_tenant());
ALTER TABLE binaries.vulnerable_buildids ENABLE ROW LEVEL SECURITY;
ALTER TABLE binaries.vulnerable_buildids FORCE ROW LEVEL SECURITY;
CREATE POLICY vulnerable_buildids_tenant_isolation ON binaries.vulnerable_buildids
FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant())
WITH CHECK (tenant_id::text = binaries_app.require_current_tenant());
ALTER TABLE binaries.binary_vuln_assertion ENABLE ROW LEVEL SECURITY;
ALTER TABLE binaries.binary_vuln_assertion FORCE ROW LEVEL SECURITY;
CREATE POLICY binary_vuln_assertion_tenant_isolation ON binaries.binary_vuln_assertion
FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant())
WITH CHECK (tenant_id::text = binaries_app.require_current_tenant());
ALTER TABLE binaries.cve_fix_evidence ENABLE ROW LEVEL SECURITY;
ALTER TABLE binaries.cve_fix_evidence FORCE ROW LEVEL SECURITY;
CREATE POLICY cve_fix_evidence_tenant_isolation ON binaries.cve_fix_evidence
FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant())
WITH CHECK (tenant_id::text = binaries_app.require_current_tenant());
ALTER TABLE binaries.cve_fix_index ENABLE ROW LEVEL SECURITY;
ALTER TABLE binaries.cve_fix_index FORCE ROW LEVEL SECURITY;
CREATE POLICY cve_fix_index_tenant_isolation ON binaries.cve_fix_index
FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant())
WITH CHECK (tenant_id::text = binaries_app.require_current_tenant());
ALTER TABLE binaries.vulnerable_fingerprints ENABLE ROW LEVEL SECURITY;
ALTER TABLE binaries.vulnerable_fingerprints FORCE ROW LEVEL SECURITY;
CREATE POLICY vulnerable_fingerprints_tenant_isolation ON binaries.vulnerable_fingerprints
FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant())
WITH CHECK (tenant_id::text = binaries_app.require_current_tenant());
ALTER TABLE binaries.fingerprint_corpus_metadata ENABLE ROW LEVEL SECURITY;
ALTER TABLE binaries.fingerprint_corpus_metadata FORCE ROW LEVEL SECURITY;
CREATE POLICY fingerprint_corpus_metadata_tenant_isolation ON binaries.fingerprint_corpus_metadata
FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant())
WITH CHECK (tenant_id::text = binaries_app.require_current_tenant());
ALTER TABLE binaries.fingerprint_matches ENABLE ROW LEVEL SECURITY;
ALTER TABLE binaries.fingerprint_matches FORCE ROW LEVEL SECURITY;
CREATE POLICY fingerprint_matches_tenant_isolation ON binaries.fingerprint_matches
FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant())
WITH CHECK (tenant_id::text = binaries_app.require_current_tenant());
```
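As a usage sketch for the policies above: each policy calls `binaries_app.require_current_tenant()`, which reads the `app.tenant_id` setting, so a client has to set that value before touching any table. The example below assumes the setting is scoped to a single transaction via `set_config(..., true)`; the tenant UUID is a made-up placeholder. Superusers and roles with `BYPASSRLS` are not subject to these policies.

```sql
BEGIN;

-- set_config(name, value, is_local): is_local = true limits the value to this transaction.
-- The UUID below is a hypothetical tenant id used only for illustration.
SELECT set_config('app.tenant_id', '00000000-0000-0000-0000-000000000001', true);

-- Only rows whose tenant_id matches the setting are visible or writable here.
SELECT count(*) FROM binaries.binary_identity;

COMMIT;

-- After COMMIT the transaction-local value no longer applies (PostgreSQL typically
-- reports it as an empty string, which the helper treats the same as NULL), so this
-- statement is expected to fail with 'app.tenant_id session variable not set'.
SELECT count(*) FROM binaries.binary_identity;
```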
---
## 3. Table Relationships
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ BINARIES SCHEMA │
│ │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ corpus_snapshots │<────────│ binary_package_map │ │
│ │ (ingestion state) │ │ (binary→pkg) │ │
│ └─────────┬──────────┘ └────────┬───────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌────────────────────┐ │
│ └───────────────────>│ binary_identity │<─────────────────┐ │
│ │ (Build-ID, hashes) │ │ │
│ └────────┬───────────┘ │ │
│ │ │ │
│ ┌─────────────────────────────┼───────────────────────────────┤ │
│ │ │ │ │
│ ▼ ▼ │ │
│ ┌────────────────────┐ ┌─────────────────────┐ ┌──────────┴───┐
│ │ vulnerable_buildids│ │ binary_vuln_ │ │fingerprint_ │
│ │ (known vuln builds)│ │ assertion │ │matches │
│ └────────────────────┘ │ (CVE status) │ │(scan results)│
│ └─────────────────────┘ └──────────────┘
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ FIX INDEX (Patch-Aware) ││
│ │ ┌────────────────────┐ ┌────────────────────┐ ││
│ │ │ cve_fix_evidence │────────>│ cve_fix_index │ ││
│ │ │ (raw evidence) │ merge │ (merged best) │ ││
│ │ └────────────────────┘ └────────────────────┘ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ FINGERPRINTS ││
│ │ ┌────────────────────┐ ┌──────────────────────┐ ││
│ │ │vulnerable_ │ │fingerprint_corpus_ │ ││
│ │ │fingerprints │ │metadata │ ││
│ │ │(CVE fingerprints) │ │(what's indexed) │ ││
│ │ └────────────────────┘ └──────────────────────┘ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────────────┘
```
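The merge arrow between `cve_fix_evidence` and `cve_fix_index` is not specified further in this document. One possible reading, sketched below, keeps the highest-confidence evidence row per (tenant, distro, release, source_pkg, cve_id) and collects the ids of all contributing rows. The recency tie-break and the absence of per-architecture handling are assumptions, and the upsert into `cve_fix_index` is deliberately left out.

```sql
-- Sketch only: pick one "best" evidence row per key and collect all contributing ids.
SELECT DISTINCT ON (e.tenant_id, e.distro, e.release, e.source_pkg, e.cve_id)
       e.tenant_id, e.distro, e.release, e.source_pkg, e.cve_id,
       e.state,
       e.fixed_version,
       e.method     AS primary_method,   -- method of the winning evidence row
       e.confidence,
       array_agg(e.id) OVER (
           PARTITION BY e.tenant_id, e.distro, e.release, e.source_pkg, e.cve_id
       ) AS evidence_ids                  -- every evidence row for the key
FROM binaries.cve_fix_evidence e
ORDER BY e.tenant_id, e.distro, e.release, e.source_pkg, e.cve_id,
         e.confidence DESC,               -- highest confidence first
         e.created_at DESC;               -- assumed tie-break: newest evidence
```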
---
## 4. Query Patterns
### 4.1 Lookup by Build-ID
```sql
-- Find vulnerabilities for a specific Build-ID
SELECT ba.cve_id, ba.status, ba.confidence, ba.method
FROM binaries.binary_vuln_assertion ba
JOIN binaries.binary_identity bi ON bi.binary_key = ba.binary_key
WHERE bi.build_id = :build_id
AND bi.build_id_type = 'gnu-build-id'
AND ba.status = 'affected';
```
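The `buildid_catalog` value allowed in `binary_vuln_assertion.method` suggests a direct lookup against `vulnerable_buildids` as well. A minimal sketch of such a lookup, assuming Build-IDs are stored as lowercase hex (the schema does not state a case convention):

```sql
-- Hypothetical catalog lookup: which vulnerable packages is this Build-ID known from?
-- The predicates match the column order of idx_vulnerable_buildids_value
-- (buildid_type, buildid_value).
SELECT vb.purl, vb.pkg_version, vb.distro, vb.release, vb.confidence
FROM binaries.vulnerable_buildids vb
WHERE vb.buildid_type = 'gnu-build-id'
  AND vb.buildid_value = lower(:build_id)   -- lowercase normalization is an assumption
ORDER BY vb.indexed_at DESC;
```

Note that `confidence` in this table is a text label (`exact`, `inferred`, `heuristic`), not the numeric score used in `binary_vuln_assertion`.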
### 4.2 Check Fix Status (Patch-Aware)
```sql
-- Check if a CVE is fixed for a specific distro/package
SELECT cfi.state, cfi.fixed_version, cfi.confidence, cfi.primary_method
FROM binaries.cve_fix_index cfi
WHERE cfi.distro = :distro
AND cfi.release = :release
AND cfi.source_pkg = :source_pkg
AND cfi.cve_id = :cve_id;
```
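Because `cve_fix_index.architecture` is nullable and NULL means "all architectures", a lookup for one architecture may match both a specific row and a generic one. The variant below is a sketch that prefers the architecture-specific row; that preference and the `:architecture` bind parameter are assumptions, not something the schema mandates.

```sql
-- Architecture-aware variant: an exact architecture match wins over the generic (NULL) row.
SELECT cfi.state, cfi.fixed_version, cfi.confidence, cfi.primary_method
FROM binaries.cve_fix_index cfi
WHERE cfi.distro = :distro
  AND cfi.release = :release
  AND cfi.source_pkg = :source_pkg
  AND cfi.cve_id = :cve_id
  AND (cfi.architecture = :architecture OR cfi.architecture IS NULL)
ORDER BY (cfi.architecture IS NULL)   -- FALSE sorts before TRUE, so specific rows come first
LIMIT 1;
```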
### 4.3 Fingerprint Similarity Search
```sql
-- Fetch candidate fingerprints; similarity itself is computed at application level
SELECT vf.cve_id, vf.component, vf.function_name, vf.confidence,
vf.fingerprint_hash, vf.similarity_threshold
FROM binaries.vulnerable_fingerprints vf
WHERE vf.algorithm = :algorithm
AND vf.architecture = :architecture
AND vf.validated = TRUE;
-- Application performs similarity comparison on the returned fingerprint_hash
```
---
## 5. Migration Strategy
### 5.1 Initial Migration
```sql
-- V001__create_binaries_schema.sql
-- Creates all tables, indexes, and RLS policies
```
### 5.2 Seed Data
```sql
-- S001__seed_reference_fingerprints.sql
-- Seeds fingerprints for high-impact CVEs from golden corpus
```
---
## 6. Performance Considerations
### 6.1 Table Sizing Estimates
| Table | Expected Rows | Growth Rate |
|-------|---------------|-------------|
| binary_identity | 10M | 1M/month |
| binary_package_map | 50M | 5M/month |
| vulnerable_buildids | 1M | 100K/month |
| cve_fix_index | 500K | 50K/month |
| vulnerable_fingerprints | 100K | 10K/month |
| fingerprint_matches | 10M | 1M/month |
### 6.2 Partitioning Candidates
- `fingerprint_matches` - Partition by `matched_at` (monthly)
- `cve_fix_evidence` - Partition by `created_at` (monthly)
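
PostgreSQL declarative partitioning requires the partition key to be part of every primary key or unique constraint on the partitioned table, so partitioning `fingerprint_matches` by `matched_at` would mean widening its primary key. A sketch under that assumption, with a hypothetical table name, an abbreviated column list, and an arbitrary example month:

```sql
-- Hypothetical partitioned variant of binaries.fingerprint_matches.
CREATE TABLE binaries.fingerprint_matches_partitioned (
    id          UUID NOT NULL DEFAULT gen_random_uuid(),
    tenant_id   UUID NOT NULL,
    scan_id     UUID NOT NULL,
    match_type  TEXT NOT NULL,
    binary_key  TEXT NOT NULL,
    matched_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    -- remaining columns as in binaries.fingerprint_matches would go here
    PRIMARY KEY (id, matched_at)          -- partition key must be part of the primary key
) PARTITION BY RANGE (matched_at);

-- One partition per month; the upper bound is exclusive.
CREATE TABLE binaries.fingerprint_matches_2026_01
    PARTITION OF binaries.fingerprint_matches_partitioned
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
```

The same pattern would apply to `cve_fix_evidence` with `created_at` as the partition key.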
### 6.3 Index Maintenance
- Hash index on `fingerprint_hash` for exact matches
- Consider bloom filter for fingerprint similarity pre-filtering