9.3 KiB
Ground-Truth Corpus CLI Guide
Sprint: SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli
Overview
The stella groundtruth command group provides CLI access to the ground-truth corpus for patch-paired binary verification. This corpus enables precision validation of security advisories by maintaining symbol and binary pairs from upstream distribution sources.
Use Cases
- Security teams: Validate patch presence in production binaries
- Compliance auditors: Generate evidence bundles for air-gapped verification
- DevSecOps: Integrate corpus validation into CI/CD pipelines
- Researchers: Query symbol databases for vulnerability analysis
Prerequisites
- Stella CLI installed and configured
- Backend connectivity to Platform service (or offline bundle)
- For sync operations: network access to upstream sources
Command Structure
stella groundtruth
├── sources # Manage symbol source connectors
│ ├── list # List available connectors
│ ├── enable # Enable a connector
│ ├── disable # Disable a connector
│ └── sync # Sync from upstream
├── symbols # Query symbols in corpus
│ ├── lookup # Lookup by debug ID
│ └── search # Search by package/distro
├── pairs # Manage security pairs
│ ├── create # Create vuln/patch pair
│ ├── list # List existing pairs
│ └── delete # Remove a pair
└── validate # Run validation harness
├── run # Execute validation
├── metrics # View run metrics
└── export # Export report
Source Connectors
The ground-truth corpus ingests data from multiple upstream sources:
| Connector ID | Distribution | Data Type | Description |
|---|---|---|---|
debuginfod-fedora |
Fedora | Debug symbols | ELF debuginfo via debuginfod protocol |
debuginfod-ubuntu |
Ubuntu | Debug symbols | ELF debuginfo via debuginfod protocol |
ddeb-ubuntu |
Ubuntu | Debug packages | .ddeb debug symbol packages |
buildinfo-debian |
Debian | Build metadata | .buildinfo reproducibility records |
secdb-alpine |
Alpine | Security DB | secfixes YAML from APKBUILD |
List Sources
stella groundtruth sources list
# Output:
ID Display Name Status Last Sync
------------------------------------------------------------------------------------------
debuginfod-fedora Fedora Debuginfod Enabled 2026-01-22T10:00:00Z
debuginfod-ubuntu Ubuntu Debuginfod Enabled 2026-01-22T10:00:00Z
ddeb-ubuntu Ubuntu ddebs Enabled 2026-01-22T09:30:00Z
buildinfo-debian Debian Buildinfo Enabled 2026-01-22T08:00:00Z
secdb-alpine Alpine SecDB Enabled 2026-01-22T06:00:00Z
Enable/Disable Sources
# Enable a source connector
stella groundtruth sources enable debuginfod-fedora
# Disable a source connector (stops future syncs)
stella groundtruth sources disable debuginfod-fedora
Sync Sources
# Incremental sync of all enabled sources
stella groundtruth sources sync
# Full sync of a specific source
stella groundtruth sources sync --source buildinfo-debian --full
# Sync with verbose output
stella groundtruth sources sync --source ddeb-ubuntu -v
Symbol Operations
Lookup by Debug ID
Query symbols using the ELF GNU Build-ID or equivalent identifier:
# Lookup by build-id
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a
# JSON output
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a --output-format json
Example output:
Binary: libcrypto.so.3
Architecture: x86_64
Distribution: debian-bookworm
Package: openssl@3.0.11-1
Symbol Count: 4523
Sources: debuginfod-fedora, buildinfo-debian
Search Symbols
Search across the corpus by package name or distribution:
# Search by package
stella groundtruth symbols search --package openssl
# Filter by distribution
stella groundtruth symbols search --package openssl --distro debian
# Limit results
stella groundtruth symbols search --package curl --limit 100
Security Pairs
Security pairs link vulnerable and patched binary versions for a specific CVE.
Create a Pair
stella groundtruth pairs create \
--cve CVE-2024-1234 \
--vuln-pkg openssl=3.0.10-1 \
--patch-pkg openssl=3.0.11-1 \
--distro debian-bookworm
List Pairs
# List all pairs
stella groundtruth pairs list
# Filter by CVE pattern
stella groundtruth pairs list --cve "CVE-2024-*"
# Filter by package
stella groundtruth pairs list --package openssl --limit 50
# JSON output
stella groundtruth pairs list --output-format json
Example output:
Pair ID CVE Package Vuln Version Patch Version
-------------------------------------------------------------------------------
pair-001 CVE-2024-1234 openssl 3.0.10-1 3.0.11-1
pair-002 CVE-2024-5678 curl 8.4.0-1 8.5.0-1
Delete a Pair
# Delete with confirmation prompt
stella groundtruth pairs delete pair-001
# Skip confirmation
stella groundtruth pairs delete pair-001 --force
Validation Harness
The validation harness runs end-to-end verification against security pairs.
Run Validation
# Validate all pairs
stella groundtruth validate run
# Validate specific pairs (pattern match)
stella groundtruth validate run --pairs "openssl:CVE-2024-*"
# Use specific matcher
stella groundtruth validate run --matcher semantic-diffing
# Parallel validation with report output
stella groundtruth validate run \
--pairs "curl:*" \
--parallel 8 \
--output validation-report.md
Matcher types:
| Matcher | Description |
|---|---|
semantic-diffing |
IR-level semantic comparison (default) |
hash-based |
Function hash matching |
hybrid |
Combined semantic + hash approach |
View Metrics
stella groundtruth validate metrics --run-id vr-20260122100532
# JSON output
stella groundtruth validate metrics --run-id vr-20260122100532 --output-format json
Example output:
Run ID: vr-20260122100532
Duration: 2026-01-22T10:00:00Z - 2026-01-22T10:15:32Z
Pairs: 48/50 successful
Function Match Rate: 94.2%
False-Negative Rate: 2.1%
SBOM Hash Stability: 3/3
Verify Time (p50/p95): 423ms / 1.2s
Export Reports
# Export as Markdown
stella groundtruth validate export \
--run-id vr-20260122100532 \
--format markdown \
--output report.md
# Export as HTML
stella groundtruth validate export \
--run-id vr-20260122100532 \
--format html \
--output report.html
# Export as JSON (machine-readable)
stella groundtruth validate export \
--run-id vr-20260122100532 \
--format json \
--output report.json
CI/CD Integration
GitHub Actions Example
name: Corpus Validation
on:
schedule:
- cron: '0 6 * * 1' # Weekly on Monday
jobs:
validate:
runs-on: ubuntu-latest
steps:
- name: Sync corpus sources
run: stella groundtruth sources sync
- name: Run validation
run: |
stella groundtruth validate run \
--matcher semantic-diffing \
--parallel 4 \
--output validation-${{ github.run_id }}.md
- name: Check metrics
run: |
MATCH_RATE=$(stella groundtruth validate metrics --run-id $(cat run-id.txt) --output-format json | jq '.functionMatchRate')
if (( $(echo "$MATCH_RATE < 90" | bc -l) )); then
echo "Match rate below threshold: $MATCH_RATE%"
exit 1
fi
GitLab CI Example
corpus-validation:
stage: verify
script:
- stella groundtruth sources sync --source buildinfo-debian
- stella groundtruth validate run --pairs "openssl:*" --output report.md
artifacts:
paths:
- report.md
expire_in: 1 week
rules:
- if: $CI_PIPELINE_SOURCE == "schedule"
Offline Usage
For air-gapped environments, use offline bundles:
# Export corpus for offline use
stella bundle export \
--include-corpus \
--output corpus-bundle-$(date +%F).tar.gz
# Import on air-gapped system
stella bundle import --package corpus-bundle-2026-01-22.tar.gz
# Run validation offline
stella groundtruth validate run --offline
Troubleshooting
Common Issues
Sync fails with network error:
# Check source status
stella groundtruth sources list
# Retry with verbose output
stella groundtruth sources sync --source debuginfod-ubuntu -v
Symbol lookup returns no results:
# Verify debug-id format (hex string)
stella groundtruth symbols lookup --debug-id abc123 -v
# Try searching by package instead
stella groundtruth symbols search --package libcrypto
Validation metrics show low match rate:
- Check that both vuln and patch binaries are present in corpus
- Verify symbol sources are synced and enabled
- Consider using
hybridmatcher for complex cases