# Binaries Schema Specification **Version:** 1.0.0 **Status:** DRAFT **Owner:** BinaryIndex Module **Last Updated:** 2025-12-21 --- ## 1. Overview The `binaries` schema stores binary identity, vulnerability mappings, fingerprints, and patch-aware fix status for the BinaryIndex module. This enables detection of vulnerable binaries independent of package metadata. ## 2. Schema Definition ```sql -- ============================================================================ -- BINARIES SCHEMA -- ============================================================================ -- Purpose: Binary identity, fingerprint, and vulnerability mapping for -- the BinaryIndex module (vulnerable binaries database). -- ============================================================================ CREATE SCHEMA IF NOT EXISTS binaries; CREATE SCHEMA IF NOT EXISTS binaries_app; -- ---------------------------------------------------------------------------- -- RLS Helper Function -- ---------------------------------------------------------------------------- CREATE OR REPLACE FUNCTION binaries_app.require_current_tenant() RETURNS TEXT LANGUAGE plpgsql STABLE SECURITY DEFINER AS $$ DECLARE v_tenant TEXT; BEGIN v_tenant := current_setting('app.tenant_id', true); IF v_tenant IS NULL OR v_tenant = '' THEN RAISE EXCEPTION 'app.tenant_id session variable not set'; END IF; RETURN v_tenant; END; $$; -- ============================================================================ -- CORE IDENTITY TABLES -- ============================================================================ -- ---------------------------------------------------------------------------- -- Table: binary_identity -- Purpose: Known binary identities extracted from packages -- ---------------------------------------------------------------------------- CREATE TABLE binaries.binary_identity ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tenant_id UUID NOT NULL, -- Primary identity (Build-ID preferred for ELF) binary_key TEXT NOT NULL, -- build_id || file_sha256 (normalized) build_id TEXT, -- ELF GNU Build-ID (hex) build_id_type TEXT CHECK (build_id_type IN ('gnu-build-id', 'pe-cv', 'macho-uuid')), -- Hashes file_sha256 TEXT NOT NULL, -- sha256 of entire file text_sha256 TEXT, -- sha256 of .text section (ELF) blake3_hash TEXT, -- Optional faster hash -- Binary metadata format TEXT NOT NULL CHECK (format IN ('elf', 'pe', 'macho')), architecture TEXT NOT NULL, -- x86-64, aarch64, arm, etc. osabi TEXT, -- linux, windows, darwin binary_type TEXT CHECK (binary_type IN ('executable', 'shared_library', 'static_library', 'object')), is_stripped BOOLEAN DEFAULT FALSE, -- Tracking first_seen_snapshot_id UUID, last_seen_snapshot_id UUID, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), CONSTRAINT binary_identity_key_unique UNIQUE (tenant_id, binary_key) ); -- ---------------------------------------------------------------------------- -- Table: binary_package_map -- Purpose: Maps binaries to source packages (per snapshot) -- ---------------------------------------------------------------------------- CREATE TABLE binaries.binary_package_map ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tenant_id UUID NOT NULL, -- Binary reference binary_identity_id UUID NOT NULL REFERENCES binaries.binary_identity(id) ON DELETE CASCADE, binary_key TEXT NOT NULL, -- Package info distro TEXT NOT NULL, -- debian, ubuntu, rhel, alpine release TEXT NOT NULL, -- bookworm, jammy, 9, 3.19 source_pkg TEXT NOT NULL, -- Source package name (e.g., openssl) binary_pkg TEXT NOT NULL, -- Binary package name (e.g., libssl3) pkg_version TEXT NOT NULL, -- Full distro version (e.g., 1.1.1n-0+deb11u5) pkg_purl TEXT, -- PURL if derivable architecture TEXT NOT NULL, -- File location file_path_in_pkg TEXT NOT NULL, -- /usr/lib/x86_64-linux-gnu/libssl.so.3 -- Snapshot reference snapshot_id UUID NOT NULL, -- Metadata created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), CONSTRAINT binary_package_map_unique UNIQUE (binary_identity_id, snapshot_id, file_path_in_pkg) ); -- ---------------------------------------------------------------------------- -- Table: corpus_snapshots -- Purpose: Tracks corpus ingestion snapshots -- ---------------------------------------------------------------------------- CREATE TABLE binaries.corpus_snapshots ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tenant_id UUID NOT NULL, -- Snapshot identification distro TEXT NOT NULL, release TEXT NOT NULL, architecture TEXT NOT NULL, snapshot_id TEXT NOT NULL, -- Unique snapshot identifier -- Content tracking packages_processed INT NOT NULL DEFAULT 0, binaries_indexed INT NOT NULL DEFAULT 0, repo_metadata_digest TEXT, -- SHA-256 of repo metadata -- Signing signing_key_id TEXT, dsse_envelope_ref TEXT, -- RustFS reference to DSSE envelope -- Status status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'processing', 'completed', 'failed')), error TEXT, -- Timestamps started_at TIMESTAMPTZ, completed_at TIMESTAMPTZ, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), CONSTRAINT corpus_snapshots_unique UNIQUE (tenant_id, distro, release, architecture, snapshot_id) ); -- ============================================================================ -- VULNERABILITY MAPPING TABLES -- ============================================================================ -- ---------------------------------------------------------------------------- -- Table: vulnerable_buildids -- Purpose: Build-IDs known to be associated with vulnerable packages -- ---------------------------------------------------------------------------- CREATE TABLE binaries.vulnerable_buildids ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tenant_id UUID NOT NULL, -- Build-ID reference buildid_type TEXT NOT NULL CHECK (buildid_type IN ('gnu-build-id', 'pe-cv', 'macho-uuid')), buildid_value TEXT NOT NULL, -- Hex string -- Package info purl TEXT NOT NULL, -- Package URL pkg_version TEXT NOT NULL, distro TEXT, release TEXT, -- Confidence confidence TEXT NOT NULL DEFAULT 'exact' CHECK (confidence IN ('exact', 'inferred', 'heuristic')), -- Provenance provenance JSONB DEFAULT '{}', snapshot_id UUID REFERENCES binaries.corpus_snapshots(id), -- Tracking indexed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), CONSTRAINT vulnerable_buildids_unique UNIQUE (tenant_id, buildid_value, buildid_type, purl, pkg_version) ); -- ---------------------------------------------------------------------------- -- Table: binary_vuln_assertion -- Purpose: CVE status assertions for specific binaries -- ---------------------------------------------------------------------------- CREATE TABLE binaries.binary_vuln_assertion ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tenant_id UUID NOT NULL, -- Binary reference binary_key TEXT NOT NULL, binary_identity_id UUID REFERENCES binaries.binary_identity(id), -- CVE reference cve_id TEXT NOT NULL, advisory_id UUID, -- Reference to vuln.advisories -- Status status TEXT NOT NULL CHECK (status IN ('affected', 'not_affected', 'fixed', 'unknown')), -- Method used to determine status method TEXT NOT NULL CHECK (method IN ('range_match', 'buildid_catalog', 'fingerprint_match', 'fix_index')), confidence NUMERIC(3,2) CHECK (confidence >= 0 AND confidence <= 1), -- Evidence evidence_ref TEXT, -- RustFS reference to evidence bundle evidence_digest TEXT, -- SHA-256 of evidence -- Tracking evaluated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), CONSTRAINT binary_vuln_assertion_unique UNIQUE (tenant_id, binary_key, cve_id) ); -- ============================================================================ -- FIX INDEX TABLES (Patch-Aware Backport Handling) -- ============================================================================ -- ---------------------------------------------------------------------------- -- Table: cve_fix_evidence -- Purpose: Raw evidence of CVE fixes (append-only) -- ---------------------------------------------------------------------------- CREATE TABLE binaries.cve_fix_evidence ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tenant_id UUID NOT NULL, -- Key fields distro TEXT NOT NULL, release TEXT NOT NULL, source_pkg TEXT NOT NULL, cve_id TEXT NOT NULL, -- Fix information state TEXT NOT NULL CHECK (state IN ('fixed', 'vulnerable', 'not_affected', 'wontfix', 'unknown')), fixed_version TEXT, -- Distro version string (nullable for not_affected) -- Method and confidence method TEXT NOT NULL CHECK (method IN ('security_feed', 'changelog', 'patch_header', 'upstream_patch_match')), confidence NUMERIC(3,2) NOT NULL CHECK (confidence >= 0 AND confidence <= 1), -- Evidence details evidence JSONB NOT NULL, -- Method-specific evidence payload -- Snapshot reference snapshot_id UUID REFERENCES binaries.corpus_snapshots(id), -- Tracking created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ); -- ---------------------------------------------------------------------------- -- Table: cve_fix_index -- Purpose: Merged best-record for CVE fix status per distro/package -- ---------------------------------------------------------------------------- CREATE TABLE binaries.cve_fix_index ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tenant_id UUID NOT NULL, -- Key fields distro TEXT NOT NULL, release TEXT NOT NULL, source_pkg TEXT NOT NULL, cve_id TEXT NOT NULL, architecture TEXT, -- NULL means all architectures -- Fix status state TEXT NOT NULL CHECK (state IN ('fixed', 'vulnerable', 'not_affected', 'wontfix', 'unknown')), fixed_version TEXT, -- Merge metadata primary_method TEXT NOT NULL, -- Method of highest-confidence evidence confidence NUMERIC(3,2) NOT NULL, evidence_ids UUID[], -- References to cve_fix_evidence -- Tracking computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), CONSTRAINT cve_fix_index_unique UNIQUE (tenant_id, distro, release, source_pkg, cve_id, architecture) ); -- ============================================================================ -- FINGERPRINT TABLES -- ============================================================================ -- ---------------------------------------------------------------------------- -- Table: vulnerable_fingerprints -- Purpose: Function fingerprints for CVE detection -- ---------------------------------------------------------------------------- CREATE TABLE binaries.vulnerable_fingerprints ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tenant_id UUID NOT NULL, -- CVE and component cve_id TEXT NOT NULL, component TEXT NOT NULL, -- e.g., openssl, glibc purl TEXT, -- Package URL if applicable -- Fingerprint data algorithm TEXT NOT NULL CHECK (algorithm IN ('basic_block', 'control_flow_graph', 'string_refs', 'combined')), fingerprint_id TEXT NOT NULL, -- Unique ID (e.g., "bb-abc123...") fingerprint_hash BYTEA NOT NULL, -- Raw fingerprint bytes (16-32 bytes) architecture TEXT NOT NULL, -- x86-64, aarch64 -- Function hints function_name TEXT, -- Original function name if known source_file TEXT, -- Source file path source_line INT, -- Confidence and validation similarity_threshold NUMERIC(3,2) DEFAULT 0.95, confidence NUMERIC(3,2) CHECK (confidence >= 0 AND confidence <= 1), validated BOOLEAN DEFAULT FALSE, validation_stats JSONB DEFAULT '{}', -- precision, recall, etc. -- Reference builds vuln_build_ref TEXT, -- RustFS ref to vulnerable reference build fixed_build_ref TEXT, -- RustFS ref to fixed reference build -- Metadata notes TEXT, evidence_ref TEXT, -- RustFS ref to evidence bundle -- Tracking indexed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), CONSTRAINT vulnerable_fingerprints_unique UNIQUE (tenant_id, cve_id, algorithm, fingerprint_id, architecture) ); -- ---------------------------------------------------------------------------- -- Table: fingerprint_corpus_metadata -- Purpose: Tracks which packages have been fingerprinted -- ---------------------------------------------------------------------------- CREATE TABLE binaries.fingerprint_corpus_metadata ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tenant_id UUID NOT NULL, -- Package identification purl TEXT NOT NULL, version TEXT NOT NULL, -- Fingerprinting info algorithm TEXT NOT NULL, binary_digest TEXT, -- sha256 of the binary analyzed -- Statistics function_count INT NOT NULL DEFAULT 0, fingerprints_indexed INT NOT NULL DEFAULT 0, -- Provenance indexed_by TEXT, -- Service/user that indexed indexed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), -- Tracking created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), CONSTRAINT fingerprint_corpus_metadata_unique UNIQUE (tenant_id, purl, version, algorithm) ); -- ============================================================================ -- MATCH RESULTS TABLES -- ============================================================================ -- ---------------------------------------------------------------------------- -- Table: fingerprint_matches -- Purpose: Records fingerprint matches during scans -- ---------------------------------------------------------------------------- CREATE TABLE binaries.fingerprint_matches ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tenant_id UUID NOT NULL, -- Scan reference scan_id UUID NOT NULL, -- Reference to scanner.scan_manifest -- Match details match_type TEXT NOT NULL CHECK (match_type IN ('fingerprint', 'buildid', 'hash_exact')), binary_key TEXT NOT NULL, binary_identity_id UUID REFERENCES binaries.binary_identity(id), -- Vulnerable package vulnerable_purl TEXT NOT NULL, vulnerable_version TEXT NOT NULL, -- Fingerprint match specifics (nullable for non-fingerprint matches) matched_fingerprint_id UUID REFERENCES binaries.vulnerable_fingerprints(id), matched_function TEXT, similarity NUMERIC(3,2), -- 0.00-1.00 -- CVE linkage advisory_ids TEXT[], -- Linked CVE/GHSA IDs -- Reachability (populated later by Scanner) reachability_status TEXT CHECK (reachability_status IN ('reachable', 'unreachable', 'unknown', 'partial')), -- Evidence evidence JSONB DEFAULT '{}', -- Tracking matched_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ); -- ============================================================================ -- INDEXES -- ============================================================================ -- binary_identity indexes CREATE INDEX idx_binary_identity_tenant ON binaries.binary_identity(tenant_id); CREATE INDEX idx_binary_identity_buildid ON binaries.binary_identity(build_id) WHERE build_id IS NOT NULL; CREATE INDEX idx_binary_identity_sha256 ON binaries.binary_identity(file_sha256); CREATE INDEX idx_binary_identity_key ON binaries.binary_identity(binary_key); -- binary_package_map indexes CREATE INDEX idx_binary_package_map_tenant ON binaries.binary_package_map(tenant_id); CREATE INDEX idx_binary_package_map_binary ON binaries.binary_package_map(binary_identity_id); CREATE INDEX idx_binary_package_map_distro ON binaries.binary_package_map(distro, release, source_pkg); CREATE INDEX idx_binary_package_map_snapshot ON binaries.binary_package_map(snapshot_id); CREATE INDEX idx_binary_package_map_purl ON binaries.binary_package_map(pkg_purl) WHERE pkg_purl IS NOT NULL; -- corpus_snapshots indexes CREATE INDEX idx_corpus_snapshots_tenant ON binaries.corpus_snapshots(tenant_id); CREATE INDEX idx_corpus_snapshots_distro ON binaries.corpus_snapshots(distro, release, architecture); CREATE INDEX idx_corpus_snapshots_status ON binaries.corpus_snapshots(status) WHERE status IN ('pending', 'processing'); -- vulnerable_buildids indexes CREATE INDEX idx_vulnerable_buildids_tenant ON binaries.vulnerable_buildids(tenant_id); CREATE INDEX idx_vulnerable_buildids_value ON binaries.vulnerable_buildids(buildid_type, buildid_value); CREATE INDEX idx_vulnerable_buildids_purl ON binaries.vulnerable_buildids(purl); -- binary_vuln_assertion indexes CREATE INDEX idx_binary_vuln_assertion_tenant ON binaries.binary_vuln_assertion(tenant_id); CREATE INDEX idx_binary_vuln_assertion_binary ON binaries.binary_vuln_assertion(binary_key); CREATE INDEX idx_binary_vuln_assertion_cve ON binaries.binary_vuln_assertion(cve_id); CREATE INDEX idx_binary_vuln_assertion_status ON binaries.binary_vuln_assertion(status) WHERE status = 'affected'; -- cve_fix_evidence indexes CREATE INDEX idx_cve_fix_evidence_tenant ON binaries.cve_fix_evidence(tenant_id); CREATE INDEX idx_cve_fix_evidence_key ON binaries.cve_fix_evidence(distro, release, source_pkg, cve_id); -- cve_fix_index indexes CREATE INDEX idx_cve_fix_index_tenant ON binaries.cve_fix_index(tenant_id); CREATE INDEX idx_cve_fix_index_lookup ON binaries.cve_fix_index(distro, release, source_pkg, cve_id); CREATE INDEX idx_cve_fix_index_state ON binaries.cve_fix_index(state) WHERE state = 'fixed'; -- vulnerable_fingerprints indexes CREATE INDEX idx_vulnerable_fingerprints_tenant ON binaries.vulnerable_fingerprints(tenant_id); CREATE INDEX idx_vulnerable_fingerprints_cve ON binaries.vulnerable_fingerprints(cve_id); CREATE INDEX idx_vulnerable_fingerprints_component ON binaries.vulnerable_fingerprints(component, architecture); CREATE INDEX idx_vulnerable_fingerprints_hash ON binaries.vulnerable_fingerprints USING hash (fingerprint_hash); CREATE INDEX idx_vulnerable_fingerprints_validated ON binaries.vulnerable_fingerprints(validated) WHERE validated = TRUE; -- fingerprint_corpus_metadata indexes CREATE INDEX idx_fingerprint_corpus_tenant ON binaries.fingerprint_corpus_metadata(tenant_id); CREATE INDEX idx_fingerprint_corpus_purl ON binaries.fingerprint_corpus_metadata(purl, version); -- fingerprint_matches indexes CREATE INDEX idx_fingerprint_matches_tenant ON binaries.fingerprint_matches(tenant_id); CREATE INDEX idx_fingerprint_matches_scan ON binaries.fingerprint_matches(scan_id); CREATE INDEX idx_fingerprint_matches_type ON binaries.fingerprint_matches(match_type); CREATE INDEX idx_fingerprint_matches_purl ON binaries.fingerprint_matches(vulnerable_purl); -- ============================================================================ -- ROW-LEVEL SECURITY -- ============================================================================ -- Enable RLS on all tenant-scoped tables ALTER TABLE binaries.binary_identity ENABLE ROW LEVEL SECURITY; ALTER TABLE binaries.binary_identity FORCE ROW LEVEL SECURITY; CREATE POLICY binary_identity_tenant_isolation ON binaries.binary_identity FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant()) WITH CHECK (tenant_id::text = binaries_app.require_current_tenant()); ALTER TABLE binaries.binary_package_map ENABLE ROW LEVEL SECURITY; ALTER TABLE binaries.binary_package_map FORCE ROW LEVEL SECURITY; CREATE POLICY binary_package_map_tenant_isolation ON binaries.binary_package_map FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant()) WITH CHECK (tenant_id::text = binaries_app.require_current_tenant()); ALTER TABLE binaries.corpus_snapshots ENABLE ROW LEVEL SECURITY; ALTER TABLE binaries.corpus_snapshots FORCE ROW LEVEL SECURITY; CREATE POLICY corpus_snapshots_tenant_isolation ON binaries.corpus_snapshots FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant()) WITH CHECK (tenant_id::text = binaries_app.require_current_tenant()); ALTER TABLE binaries.vulnerable_buildids ENABLE ROW LEVEL SECURITY; ALTER TABLE binaries.vulnerable_buildids FORCE ROW LEVEL SECURITY; CREATE POLICY vulnerable_buildids_tenant_isolation ON binaries.vulnerable_buildids FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant()) WITH CHECK (tenant_id::text = binaries_app.require_current_tenant()); ALTER TABLE binaries.binary_vuln_assertion ENABLE ROW LEVEL SECURITY; ALTER TABLE binaries.binary_vuln_assertion FORCE ROW LEVEL SECURITY; CREATE POLICY binary_vuln_assertion_tenant_isolation ON binaries.binary_vuln_assertion FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant()) WITH CHECK (tenant_id::text = binaries_app.require_current_tenant()); ALTER TABLE binaries.cve_fix_evidence ENABLE ROW LEVEL SECURITY; ALTER TABLE binaries.cve_fix_evidence FORCE ROW LEVEL SECURITY; CREATE POLICY cve_fix_evidence_tenant_isolation ON binaries.cve_fix_evidence FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant()) WITH CHECK (tenant_id::text = binaries_app.require_current_tenant()); ALTER TABLE binaries.cve_fix_index ENABLE ROW LEVEL SECURITY; ALTER TABLE binaries.cve_fix_index FORCE ROW LEVEL SECURITY; CREATE POLICY cve_fix_index_tenant_isolation ON binaries.cve_fix_index FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant()) WITH CHECK (tenant_id::text = binaries_app.require_current_tenant()); ALTER TABLE binaries.vulnerable_fingerprints ENABLE ROW LEVEL SECURITY; ALTER TABLE binaries.vulnerable_fingerprints FORCE ROW LEVEL SECURITY; CREATE POLICY vulnerable_fingerprints_tenant_isolation ON binaries.vulnerable_fingerprints FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant()) WITH CHECK (tenant_id::text = binaries_app.require_current_tenant()); ALTER TABLE binaries.fingerprint_corpus_metadata ENABLE ROW LEVEL SECURITY; ALTER TABLE binaries.fingerprint_corpus_metadata FORCE ROW LEVEL SECURITY; CREATE POLICY fingerprint_corpus_metadata_tenant_isolation ON binaries.fingerprint_corpus_metadata FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant()) WITH CHECK (tenant_id::text = binaries_app.require_current_tenant()); ALTER TABLE binaries.fingerprint_matches ENABLE ROW LEVEL SECURITY; ALTER TABLE binaries.fingerprint_matches FORCE ROW LEVEL SECURITY; CREATE POLICY fingerprint_matches_tenant_isolation ON binaries.fingerprint_matches FOR ALL USING (tenant_id::text = binaries_app.require_current_tenant()) WITH CHECK (tenant_id::text = binaries_app.require_current_tenant()); ``` --- ## 3. Table Relationships ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ BINARIES SCHEMA │ │ │ │ ┌────────────────────┐ ┌────────────────────┐ │ │ │ corpus_snapshots │<────────│ binary_package_map │ │ │ │ (ingestion state) │ │ (binary→pkg) │ │ │ └─────────┬──────────┘ └────────┬───────────┘ │ │ │ │ │ │ │ ▼ │ │ │ ┌────────────────────┐ │ │ └───────────────────>│ binary_identity │<─────────────────┐ │ │ │ (Build-ID, hashes) │ │ │ │ └────────┬───────────┘ │ │ │ │ │ │ │ ┌─────────────────────────────┼───────────────────────────────┤ │ │ │ │ │ │ │ ▼ ▼ │ │ │ ┌────────────────────┐ ┌─────────────────────┐ ┌──────────┴───┐ │ │ vulnerable_buildids│ │ binary_vuln_ │ │fingerprint_ │ │ │ (known vuln builds)│ │ assertion │ │matches │ │ └────────────────────┘ │ (CVE status) │ │(scan results)│ │ └─────────────────────┘ └──────────────┘ │ │ │ ┌─────────────────────────────────────────────────────────────────────────┐│ │ │ FIX INDEX (Patch-Aware) ││ │ │ ┌────────────────────┐ ┌────────────────────┐ ││ │ │ │ cve_fix_evidence │────────>│ cve_fix_index │ ││ │ │ │ (raw evidence) │ merge │ (merged best) │ ││ │ │ └────────────────────┘ └────────────────────┘ ││ │ └─────────────────────────────────────────────────────────────────────────┘│ │ │ │ ┌─────────────────────────────────────────────────────────────────────────┐│ │ │ FINGERPRINTS ││ │ │ ┌────────────────────┐ ┌──────────────────────┐ ││ │ │ │vulnerable_ │ │fingerprint_corpus_ │ ││ │ │ │fingerprints │ │metadata │ ││ │ │ │(CVE fingerprints) │ │(what's indexed) │ ││ │ │ └────────────────────┘ └──────────────────────┘ ││ │ └─────────────────────────────────────────────────────────────────────────┘│ └─────────────────────────────────────────────────────────────────────────────┘ ``` --- ## 4. Query Patterns ### 4.1 Lookup by Build-ID ```sql -- Find vulnerabilities for a specific Build-ID SELECT ba.cve_id, ba.status, ba.confidence, ba.method FROM binaries.binary_vuln_assertion ba JOIN binaries.binary_identity bi ON bi.binary_key = ba.binary_key WHERE bi.build_id = :build_id AND bi.build_id_type = 'gnu-build-id' AND ba.status = 'affected'; ``` ### 4.2 Check Fix Status (Patch-Aware) ```sql -- Check if a CVE is fixed for a specific distro/package SELECT cfi.state, cfi.fixed_version, cfi.confidence, cfi.primary_method FROM binaries.cve_fix_index cfi WHERE cfi.distro = :distro AND cfi.release = :release AND cfi.source_pkg = :source_pkg AND cfi.cve_id = :cve_id; ``` ### 4.3 Fingerprint Similarity Search ```sql -- Find fingerprints with similar hash (requires application-level similarity) SELECT vf.cve_id, vf.component, vf.function_name, vf.confidence FROM binaries.vulnerable_fingerprints vf WHERE vf.algorithm = :algorithm AND vf.architecture = :architecture AND vf.validated = TRUE -- Application performs similarity comparison on fingerprint_hash ``` --- ## 5. Migration Strategy ### 5.1 Initial Migration ```sql -- V001__create_binaries_schema.sql -- Creates all tables, indexes, and RLS policies ``` ### 5.2 Seed Data ```sql -- S001__seed_reference_fingerprints.sql -- Seeds fingerprints for high-impact CVEs from golden corpus ``` --- ## 6. Performance Considerations ### 6.1 Table Sizing Estimates | Table | Expected Rows | Growth Rate | |-------|---------------|-------------| | binary_identity | 10M | 1M/month | | binary_package_map | 50M | 5M/month | | vulnerable_buildids | 1M | 100K/month | | cve_fix_index | 500K | 50K/month | | vulnerable_fingerprints | 100K | 10K/month | | fingerprint_matches | 10M | 1M/month | ### 6.2 Partitioning Candidates - `fingerprint_matches` - Partition by `matched_at` (monthly) - `cve_fix_evidence` - Partition by `created_at` (monthly) ### 6.3 Index Maintenance - Hash index on `fingerprint_hash` for exact matches - Consider bloom filter for fingerprint similarity pre-filtering --- *Document Version: 1.0.0* *Last Updated: 2025-12-21*