# Golden Corpus KPI Specification

> **Version**: 1.0.0
> **Last Updated**: 2026-01-21
> **Source Advisory**: Golden Corpus Patch-Paired Artifacts Advisory

This document specifies the Key Performance Indicators (KPIs) for the golden corpus of patch-paired artifacts, enabling measurement of SBOM reproducibility and binary-level patch provenance verification.

---

## Overview

The golden corpus KPIs measure:

1. **Accuracy** - How well the system detects patched vs. vulnerable code
2. **Reproducibility** - Whether outputs are deterministic across runs
3. **Performance** - Time to verify evidence offline

These metrics enable regression detection in CI and demonstrate corpus quality for auditors.

---

## KPI Definitions

### Per-Target KPIs

Computed for each artifact pair in the corpus:

| KPI | Formula | Target | Description |
|-----|---------|--------|-------------|
| **Per-function match rate** | `matched_functions_after / total_functions_post * 100` | >= 90% | Percentage of post-patch functions matched by the system |
| **False-negative patch detection** | `missed_patched_funcs / total_true_patched_funcs * 100` | <= 5% | Percentage of known-patched functions that the system fails to detect as patched |
| **SBOM canonical-hash stability** | `runs_with_same_hash / 3` | 3/3 | Determinism across 3 independent runs |
| **Binary reconstruction equivalence** | `bytewise_equiv_rebuild / 1` | 1/1 (trend) | Whether rebuilt binary matches original |

### Aggregate KPIs

Computed across the entire corpus:

| KPI | Formula | Target | Description |
|-----|---------|--------|-------------|
| **Corpus precision** | `TP / (TP + FP)` | >= 95% | Overall precision of vulnerability detection |
| **Corpus recall** | `TP / (TP + FN)` | >= 90% | Overall recall of vulnerability detection |
| **F1 score** | `2 * (precision * recall) / (precision + recall)` | >= 92% | Harmonic mean of precision and recall |
| **Deterministic replay rate** | `deterministic_pairs / total_pairs` | 100% | Pairs with identical results across runs |
| **Verify time (median, cold)** | `p50(verify_time_cold)` | Track trend | Cold-start offline verification time |
| **Verify time (p95, cold)** | `p95(verify_time_cold)` | Track trend | 95th percentile cold verification time |

---

## Measurement Methodology

### Function Match Rate

```
Input: Post-patch binary B_post, ground-truth function list F_gt
Output: Match rate percentage

1. Lift all functions in B_post to IR
2. Generate semantic fingerprints for each function
3. For each f in F_gt:
   - Find best-matching function in B_post by fingerprint similarity
   - Mark as matched if similarity >= 0.90
4. match_rate = |matched| / |F_gt| * 100
```

### False-Negative Detection

```
Input: Pre-patch binary B_pre, post-patch binary B_post, CVE patch metadata
Output: False-negative rate percentage

1. Identify functions modified by the CVE patch (from delta-sig)
2. For each modified function f_patched:
   - Compare fingerprint(f_pre) vs fingerprint(f_post)
   - Mark as "detected" if diff confidence >= 0.85
3. false_neg_rate = |undetected| / |f_patched| * 100
```

### SBOM Canonical-Hash Stability

```
Input: Target artifact A
Output: Stability score (1, 2, or 3; H_1 always matches itself)

1. For i in 1..3:
   - Spawn fresh process (no cache)
   - Generate SBOM for A
   - Compute canonical hash H_i
2. stability = count of (H_i == H_1)
```

### Binary Reconstruction Equivalence

```
Input: Source package S, original binary B_orig
Output: Equivalence boolean

1. Rebuild S in deterministic chroot with SOURCE_DATE_EPOCH
2. Extract rebuilt binary B_rebuilt
3. equivalence = (sha256(B_orig) == sha256(B_rebuilt))
```

---

## CI Regression Gates

### Gate Thresholds

Deltas are computed as current minus baseline; "pp" denotes percentage points. A gate triggers when the delta satisfies the condition shown.

| Metric | Fail Threshold | Warn Threshold |
|--------|----------------|----------------|
| Precision delta | < -1.0 pp | < -0.5 pp |
| Recall delta | < -1.0 pp | < -0.5 pp |
| F1 delta | < -1.0 pp | < -0.5 pp |
| False-negative rate delta | > +1.0 pp | > +0.5 pp |
| Deterministic replay | < 100% | N/A |
| TTFRP p95 delta | > +20% | > +10% |

### Gate Actions

- **Fail**: Block merge, require investigation
- **Warn**: Allow merge, create tracking issue
- **Pass**: No action required

### Baseline Management

```bash
# View current baseline
stella groundtruth baseline show

# Update baseline after validated improvements
stella groundtruth baseline update \
  --results bench/results/20260121.json \
  --output bench/baselines/current.json \
  --reason "Improved semantic matching accuracy"

# Compare results against baseline
stella groundtruth validate check \
  --results bench/results/20260121.json \
  --baseline bench/baselines/current.json
```

---

## Database Schema

```sql
-- KPI storage for validation runs
CREATE TABLE groundtruth.validation_kpis (
    run_id UUID PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    corpus_version TEXT NOT NULL,
    scanner_version TEXT NOT NULL,

    -- Per-run aggregates
    pair_count INT NOT NULL,
    function_match_rate_mean DECIMAL(5,2),
    function_match_rate_min DECIMAL(5,2),
    function_match_rate_max DECIMAL(5,2),
    false_negative_rate_mean DECIMAL(5,2),
    false_negative_rate_max DECIMAL(5,2),

    -- Stability metrics
    sbom_hash_stability_3of3_count INT,
    sbom_hash_stability_2of3_count INT,
    sbom_hash_stability_1of3_count INT,
    reconstruction_equiv_count INT,
    reconstruction_total_count INT,

    -- Performance metrics
    verify_time_median_ms INT,
    verify_time_p95_ms INT,
    verify_time_p99_ms INT,

    -- Computed aggregates
    precision DECIMAL(5,4),
    recall DECIMAL(5,4),
    f1_score DECIMAL(5,4),
    deterministic_replay_rate DECIMAL(5,4),

    computed_at TIMESTAMPTZ NOT NULL DEFAULT now(),

    -- Constraints
    CONSTRAINT fk_tenant FOREIGN KEY (tenant_id) REFERENCES tenants.tenant(id)
);

CREATE INDEX idx_validation_kpis_tenant_time
    ON groundtruth.validation_kpis(tenant_id, computed_at DESC);

CREATE INDEX idx_validation_kpis_corpus_version
    ON groundtruth.validation_kpis(corpus_version, computed_at DESC);

-- Baseline storage
CREATE TABLE groundtruth.kpi_baselines (
    baseline_id UUID PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    corpus_version TEXT NOT NULL,

    -- Reference metrics
    precision_baseline DECIMAL(5,4) NOT NULL,
    recall_baseline DECIMAL(5,4) NOT NULL,
    f1_baseline DECIMAL(5,4) NOT NULL,
    fn_rate_baseline DECIMAL(5,4) NOT NULL,
    verify_p95_baseline_ms INT NOT NULL,

    -- Metadata
    source_run_id UUID REFERENCES groundtruth.validation_kpis(run_id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    created_by TEXT NOT NULL,
    reason TEXT,
    is_active BOOLEAN NOT NULL DEFAULT true
);

-- At most one active baseline per tenant and corpus version
CREATE UNIQUE INDEX idx_kpi_baselines_active
    ON groundtruth.kpi_baselines(tenant_id, corpus_version)
    WHERE is_active = true;
```

---

## Reporting

### Validation Run Report (Markdown)

```markdown
# Golden Corpus Validation Report

**Run ID:** bench-20260121-001
**Timestamp:** 2026-01-21T03:00:00Z
**Corpus Version:** 1.0.0
**Scanner Version:** 1.5.0

## Summary

| Metric | Value | Target | Status |
|--------|-------|--------|--------|
| Precision | 96.2% | >= 95% | PASS |
| Recall | 91.5% | >= 90% | PASS |
| F1 Score | 93.8% | >= 92% | PASS |
| False-Negative Rate | 3.2% | <= 5% | PASS |
| Deterministic Replay | 100% | 100% | PASS |
| SBOM Hash Stability | 10/10 pairs at 3/3 | All 3/3 | PASS |
| Verify Time (p95) | 420ms | Trend | - |

## Regression Check

Compared against baseline `baseline-20260115-001`:

| Metric | Baseline | Current | Delta | Status |
|--------|----------|---------|-------|--------|
| Precision | 95.8% | 96.2% | +0.4 pp | IMPROVED |
| Recall | 91.2% | 91.5% | +0.3 pp | IMPROVED |
| Verify p95 | 450ms | 420ms | -6.7% | IMPROVED |

## Per-Package Results

| Package | Advisory | Match Rate | FN Rate | SBOM Stable | Recon Equiv |
|---------|----------|------------|---------|-------------|-------------|
| openssl | DSA-5678 | 94.2% | 2.1% | 3/3 | Yes |
| zlib | DSA-5432 | 98.1% | 0.0% | 3/3 | Yes |
| curl | DSA-5555 | 91.8% | 4.5% | 3/3 | No |
...
```

### JSON Report Schema

```json
{
  "$schema": "https://stellaops.io/schemas/validation-report.v1.json",
  "runId": "bench-20260121-001",
  "timestamp": "2026-01-21T03:00:00Z",
  "corpusVersion": "1.0.0",
  "scannerVersion": "1.5.0",
  "metrics": {
    "precision": 0.962,
    "recall": 0.915,
    "f1Score": 0.938,
    "falseNegativeRate": 0.032,
    "deterministicReplayRate": 1.0,
    "verifyTimeMedianMs": 280,
    "verifyTimeP95Ms": 420
  },
  "regressionCheck": {
    "baselineId": "baseline-20260115-001",
    "precisionDelta": 0.004,
    "recallDelta": 0.003,
    "status": "pass"
  },
  "packages": [
    {
      "package": "openssl",
      "advisory": "DSA-5678",
      "matchRate": 0.942,
      "falseNegativeRate": 0.021,
      "sbomHashStability": 3,
      "reconstructionEquivalent": true,
      "verifyTimeMs": 350
    }
  ]
}
```

---

## Related Documentation

- [Ground-Truth Corpus Specification](ground-truth-corpus.md)
- [BinaryIndex Architecture](../modules/binary-index/architecture.md)
- [Golden Corpus Seed List](golden-corpus-seed-list.md)
- [Determinism and Reproducibility Reference](../product/advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md)
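
---

## Appendix: Illustrative KPI and Gate Computation

The aggregate formulas (precision, recall, F1) and the CI regression gates defined in this document can be sketched in a few lines of Python. This is a minimal illustration, not the scanner's actual implementation: the names `Aggregates`, `harmonic_f1`, `aggregate_kpis`, `gate_accuracy`, and `gate_latency` are hypothetical. It assumes that an accuracy gate trips when the metric drops by more than the threshold in percentage points, and that the timing gate trips when p95 grows by more than the threshold as a percentage of the baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregates:
    """Corpus-level accuracy KPIs, stored as fractions in [0, 1]."""
    precision: float
    recall: float
    f1: float


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 = 2 * (precision * recall) / (precision + recall); 0.0 if both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def aggregate_kpis(tp: int, fp: int, fn: int) -> Aggregates:
    """Compute corpus precision, recall, and F1 from raw detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Aggregates(precision, recall, harmonic_f1(precision, recall))


def gate_accuracy(baseline: float, current: float,
                  fail_pp: float = 1.0, warn_pp: float = 0.5) -> str:
    """Gate a higher-is-better metric on its drop in percentage points.

    Assumption: inputs are fractions, so the difference is scaled by 100.
    """
    drop_pp = (baseline - current) * 100
    if drop_pp > fail_pp:
        return "fail"
    if drop_pp > warn_pp:
        return "warn"
    return "pass"


def gate_latency(baseline_ms: float, current_ms: float,
                 fail_pct: float = 20.0, warn_pct: float = 10.0) -> str:
    """Gate a lower-is-better timing on its relative increase over baseline."""
    increase_pct = (current_ms - baseline_ms) / baseline_ms * 100
    if increase_pct > fail_pct:
        return "fail"
    if increase_pct > warn_pct:
        return "warn"
    return "pass"


if __name__ == "__main__":
    # Values taken from the sample validation report in this document.
    print(round(harmonic_f1(0.962, 0.915), 3))
    print(gate_accuracy(0.958, 0.962))
    print(gate_latency(450, 420))
```

With the sample report's precision of 0.962 and recall of 0.915, `harmonic_f1` rounds to 0.938, which agrees with the reported F1 score of 93.8%. Both the precision check (95.8% to 96.2%) and the verify p95 check (450 ms to 420 ms) evaluate to `"pass"` under the assumed gate semantics.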