# Ground-Truth Corpus CLI Guide **Sprint:** SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli ## Overview The `stella groundtruth` command group provides CLI access to the ground-truth corpus for patch-paired binary verification. This corpus enables precision validation of security advisories by maintaining symbol and binary pairs from upstream distribution sources. ## Use Cases - **Security teams**: Validate patch presence in production binaries - **Compliance auditors**: Generate evidence bundles for air-gapped verification - **DevSecOps**: Integrate corpus validation into CI/CD pipelines - **Researchers**: Query symbol databases for vulnerability analysis ## Prerequisites - Stella CLI installed and configured - Backend connectivity to Platform service (or offline bundle) - For sync operations: network access to upstream sources ## Command Structure ``` stella groundtruth ├── sources # Manage symbol source connectors │ ├── list # List available connectors │ ├── enable # Enable a connector │ ├── disable # Disable a connector │ └── sync # Sync from upstream ├── symbols # Query symbols in corpus │ ├── lookup # Lookup by debug ID │ └── search # Search by package/distro ├── pairs # Manage security pairs │ ├── create # Create vuln/patch pair │ ├── list # List existing pairs │ └── delete # Remove a pair └── validate # Run validation harness ├── run # Execute validation ├── metrics # View run metrics └── export # Export report ``` ## Source Connectors The ground-truth corpus ingests data from multiple upstream sources: | Connector ID | Distribution | Data Type | Description | |--------------|--------------|-----------|-------------| | `debuginfod-fedora` | Fedora | Debug symbols | ELF debuginfo via debuginfod protocol | | `debuginfod-ubuntu` | Ubuntu | Debug symbols | ELF debuginfo via debuginfod protocol | | `ddeb-ubuntu` | Ubuntu | Debug packages | `.ddeb` debug symbol packages | | `buildinfo-debian` | Debian | Build metadata | `.buildinfo` reproducibility records | | `secdb-alpine` | Alpine | Security DB | `secfixes` YAML from APKBUILD | ### List Sources ```bash stella groundtruth sources list # Output: ID Display Name Status Last Sync ------------------------------------------------------------------------------------------ debuginfod-fedora Fedora Debuginfod Enabled 2026-01-22T10:00:00Z debuginfod-ubuntu Ubuntu Debuginfod Enabled 2026-01-22T10:00:00Z ddeb-ubuntu Ubuntu ddebs Enabled 2026-01-22T09:30:00Z buildinfo-debian Debian Buildinfo Enabled 2026-01-22T08:00:00Z secdb-alpine Alpine SecDB Enabled 2026-01-22T06:00:00Z ``` ### Enable/Disable Sources ```bash # Enable a source connector stella groundtruth sources enable debuginfod-fedora # Disable a source connector (stops future syncs) stella groundtruth sources disable debuginfod-fedora ``` ### Sync Sources ```bash # Incremental sync of all enabled sources stella groundtruth sources sync # Full sync of a specific source stella groundtruth sources sync --source buildinfo-debian --full # Sync with verbose output stella groundtruth sources sync --source ddeb-ubuntu -v ``` ## Symbol Operations ### Lookup by Debug ID Query symbols using the ELF GNU Build-ID or equivalent identifier: ```bash # Lookup by build-id stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a # JSON output stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a --output-format json ``` **Example output:** ``` Binary: libcrypto.so.3 Architecture: x86_64 Distribution: debian-bookworm Package: openssl@3.0.11-1 Symbol Count: 4523 Sources: debuginfod-fedora, buildinfo-debian ``` ### Search Symbols Search across the corpus by package name or distribution: ```bash # Search by package stella groundtruth symbols search --package openssl # Filter by distribution stella groundtruth symbols search --package openssl --distro debian # Limit results stella groundtruth symbols search --package curl --limit 100 ``` ## Security Pairs Security pairs link vulnerable and patched binary versions for a specific CVE. ### Create a Pair ```bash stella groundtruth pairs create \ --cve CVE-2024-1234 \ --vuln-pkg openssl=3.0.10-1 \ --patch-pkg openssl=3.0.11-1 \ --distro debian-bookworm ``` ### List Pairs ```bash # List all pairs stella groundtruth pairs list # Filter by CVE pattern stella groundtruth pairs list --cve "CVE-2024-*" # Filter by package stella groundtruth pairs list --package openssl --limit 50 # JSON output stella groundtruth pairs list --output-format json ``` **Example output:** ``` Pair ID CVE Package Vuln Version Patch Version ------------------------------------------------------------------------------- pair-001 CVE-2024-1234 openssl 3.0.10-1 3.0.11-1 pair-002 CVE-2024-5678 curl 8.4.0-1 8.5.0-1 ``` ### Delete a Pair ```bash # Delete with confirmation prompt stella groundtruth pairs delete pair-001 # Skip confirmation stella groundtruth pairs delete pair-001 --force ``` ## Validation Harness The validation harness runs end-to-end verification against security pairs. ### Run Validation ```bash # Validate all pairs stella groundtruth validate run # Validate specific pairs (pattern match) stella groundtruth validate run --pairs "openssl:CVE-2024-*" # Use specific matcher stella groundtruth validate run --matcher semantic-diffing # Parallel validation with report output stella groundtruth validate run \ --pairs "curl:*" \ --parallel 8 \ --output validation-report.md ``` **Matcher types:** | Matcher | Description | |---------|-------------| | `semantic-diffing` | IR-level semantic comparison (default) | | `hash-based` | Function hash matching | | `hybrid` | Combined semantic + hash approach | ### View Metrics ```bash stella groundtruth validate metrics --run-id vr-20260122100532 # JSON output stella groundtruth validate metrics --run-id vr-20260122100532 --output-format json ``` **Example output:** ``` Run ID: vr-20260122100532 Duration: 2026-01-22T10:00:00Z - 2026-01-22T10:15:32Z Pairs: 48/50 successful Function Match Rate: 94.2% False-Negative Rate: 2.1% SBOM Hash Stability: 3/3 Verify Time (p50/p95): 423ms / 1.2s ``` ### Export Reports ```bash # Export as Markdown stella groundtruth validate export \ --run-id vr-20260122100532 \ --format markdown \ --output report.md # Export as HTML stella groundtruth validate export \ --run-id vr-20260122100532 \ --format html \ --output report.html # Export as JSON (machine-readable) stella groundtruth validate export \ --run-id vr-20260122100532 \ --format json \ --output report.json ``` ## CI/CD Integration ### GitHub Actions Example ```yaml name: Corpus Validation on: schedule: - cron: '0 6 * * 1' # Weekly on Monday jobs: validate: runs-on: ubuntu-latest steps: - name: Sync corpus sources run: stella groundtruth sources sync - name: Run validation run: | stella groundtruth validate run \ --matcher semantic-diffing \ --parallel 4 \ --output validation-${{ github.run_id }}.md - name: Check metrics run: | MATCH_RATE=$(stella groundtruth validate metrics --run-id $(cat run-id.txt) --output-format json | jq '.functionMatchRate') if (( $(echo "$MATCH_RATE < 90" | bc -l) )); then echo "Match rate below threshold: $MATCH_RATE%" exit 1 fi ``` ### GitLab CI Example ```yaml corpus-validation: stage: verify script: - stella groundtruth sources sync --source buildinfo-debian - stella groundtruth validate run --pairs "openssl:*" --output report.md artifacts: paths: - report.md expire_in: 1 week rules: - if: $CI_PIPELINE_SOURCE == "schedule" ``` ## Offline Usage For air-gapped environments, use offline bundles: ```bash # Export corpus for offline use stella bundle export \ --include-corpus \ --output corpus-bundle-$(date +%F).tar.gz # Import on air-gapped system stella bundle import --package corpus-bundle-2026-01-22.tar.gz # Run validation offline stella groundtruth validate run --offline ``` ## Troubleshooting ### Common Issues **Sync fails with network error:** ```bash # Check source status stella groundtruth sources list # Retry with verbose output stella groundtruth sources sync --source debuginfod-ubuntu -v ``` **Symbol lookup returns no results:** ```bash # Verify debug-id format (hex string) stella groundtruth symbols lookup --debug-id abc123 -v # Try searching by package instead stella groundtruth symbols search --package libcrypto ``` **Validation metrics show low match rate:** - Check that both vuln and patch binaries are present in corpus - Verify symbol sources are synced and enabled - Consider using `hybrid` matcher for complex cases ## See Also - [CLI Command Reference](commands/reference.md#ground-truth-corpus-commands) - [BinaryIndex Architecture](../../binary-index/architecture.md) - [Golden Corpus KPIs](../../benchmarks/golden-corpus-kpis.md) - [Air-Gap Bundle Guide](../../modules/airgap/README.md)