352 lines
9.3 KiB
Markdown
352 lines
9.3 KiB
Markdown
# Ground-Truth Corpus CLI Guide
|
|
|
|
**Sprint:** SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli
|
|
|
|
## Overview
|
|
|
|
The `stella groundtruth` command group provides CLI access to the ground-truth corpus for patch-paired binary verification. This corpus enables precision validation of security advisories by maintaining symbol and binary pairs from upstream distribution sources.
|
|
|
|
## Use Cases
|
|
|
|
- **Security teams**: Validate patch presence in production binaries
|
|
- **Compliance auditors**: Generate evidence bundles for air-gapped verification
|
|
- **DevSecOps**: Integrate corpus validation into CI/CD pipelines
|
|
- **Researchers**: Query symbol databases for vulnerability analysis
|
|
|
|
## Prerequisites
|
|
|
|
- Stella CLI installed and configured
|
|
- Backend connectivity to Platform service (or offline bundle)
|
|
- For sync operations: network access to upstream sources
|
|
|
|
## Command Structure
|
|
|
|
```
|
|
stella groundtruth
|
|
├── sources # Manage symbol source connectors
|
|
│ ├── list # List available connectors
|
|
│ ├── enable # Enable a connector
|
|
│ ├── disable # Disable a connector
|
|
│ └── sync # Sync from upstream
|
|
├── symbols # Query symbols in corpus
|
|
│ ├── lookup # Lookup by debug ID
|
|
│ └── search # Search by package/distro
|
|
├── pairs # Manage security pairs
|
|
│ ├── create # Create vuln/patch pair
|
|
│ ├── list # List existing pairs
|
|
│ └── delete # Remove a pair
|
|
└── validate # Run validation harness
|
|
├── run # Execute validation
|
|
├── metrics # View run metrics
|
|
└── export # Export report
|
|
```
|
|
|
|
## Source Connectors
|
|
|
|
The ground-truth corpus ingests data from multiple upstream sources:
|
|
|
|
| Connector ID | Distribution | Data Type | Description |
|
|
|--------------|--------------|-----------|-------------|
|
|
| `debuginfod-fedora` | Fedora | Debug symbols | ELF debuginfo via debuginfod protocol |
|
|
| `debuginfod-ubuntu` | Ubuntu | Debug symbols | ELF debuginfo via debuginfod protocol |
|
|
| `ddeb-ubuntu` | Ubuntu | Debug packages | `.ddeb` debug symbol packages |
|
|
| `buildinfo-debian` | Debian | Build metadata | `.buildinfo` reproducibility records |
|
|
| `secdb-alpine` | Alpine | Security DB | `secfixes` YAML from APKBUILD |
|
|
|
|
### List Sources
|
|
|
|
```bash
|
|
stella groundtruth sources list
|
|
|
|
# Output:
|
|
ID Display Name Status Last Sync
|
|
------------------------------------------------------------------------------------------
|
|
debuginfod-fedora Fedora Debuginfod Enabled 2026-01-22T10:00:00Z
|
|
debuginfod-ubuntu Ubuntu Debuginfod Enabled 2026-01-22T10:00:00Z
|
|
ddeb-ubuntu Ubuntu ddebs Enabled 2026-01-22T09:30:00Z
|
|
buildinfo-debian Debian Buildinfo Enabled 2026-01-22T08:00:00Z
|
|
secdb-alpine Alpine SecDB Enabled 2026-01-22T06:00:00Z
|
|
```
|
|
|
|
### Enable/Disable Sources
|
|
|
|
```bash
|
|
# Enable a source connector
|
|
stella groundtruth sources enable debuginfod-fedora
|
|
|
|
# Disable a source connector (stops future syncs)
|
|
stella groundtruth sources disable debuginfod-fedora
|
|
```
|
|
|
|
### Sync Sources
|
|
|
|
```bash
|
|
# Incremental sync of all enabled sources
|
|
stella groundtruth sources sync
|
|
|
|
# Full sync of a specific source
|
|
stella groundtruth sources sync --source buildinfo-debian --full
|
|
|
|
# Sync with verbose output
|
|
stella groundtruth sources sync --source ddeb-ubuntu -v
|
|
```
|
|
|
|
## Symbol Operations
|
|
|
|
### Lookup by Debug ID
|
|
|
|
Query symbols using the ELF GNU Build-ID or equivalent identifier:
|
|
|
|
```bash
|
|
# Lookup by build-id
|
|
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a
|
|
|
|
# JSON output
|
|
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a --output-format json
|
|
```
|
|
|
|
**Example output:**
|
|
```
|
|
Binary: libcrypto.so.3
|
|
Architecture: x86_64
|
|
Distribution: debian-bookworm
|
|
Package: openssl@3.0.11-1
|
|
Symbol Count: 4523
|
|
Sources: debuginfod-fedora, buildinfo-debian
|
|
```
|
|
|
|
### Search Symbols
|
|
|
|
Search across the corpus by package name or distribution:
|
|
|
|
```bash
|
|
# Search by package
|
|
stella groundtruth symbols search --package openssl
|
|
|
|
# Filter by distribution
|
|
stella groundtruth symbols search --package openssl --distro debian
|
|
|
|
# Limit results
|
|
stella groundtruth symbols search --package curl --limit 100
|
|
```
|
|
|
|
## Security Pairs
|
|
|
|
Security pairs link vulnerable and patched binary versions for a specific CVE.
|
|
|
|
### Create a Pair
|
|
|
|
```bash
|
|
stella groundtruth pairs create \
|
|
--cve CVE-2024-1234 \
|
|
--vuln-pkg openssl=3.0.10-1 \
|
|
--patch-pkg openssl=3.0.11-1 \
|
|
--distro debian-bookworm
|
|
```
|
|
|
|
### List Pairs
|
|
|
|
```bash
|
|
# List all pairs
|
|
stella groundtruth pairs list
|
|
|
|
# Filter by CVE pattern
|
|
stella groundtruth pairs list --cve "CVE-2024-*"
|
|
|
|
# Filter by package
|
|
stella groundtruth pairs list --package openssl --limit 50
|
|
|
|
# JSON output
|
|
stella groundtruth pairs list --output-format json
|
|
```
|
|
|
|
**Example output:**
|
|
```
|
|
Pair ID CVE Package Vuln Version Patch Version
|
|
-------------------------------------------------------------------------------
|
|
pair-001 CVE-2024-1234 openssl 3.0.10-1 3.0.11-1
|
|
pair-002 CVE-2024-5678 curl 8.4.0-1 8.5.0-1
|
|
```
|
|
|
|
### Delete a Pair
|
|
|
|
```bash
|
|
# Delete with confirmation prompt
|
|
stella groundtruth pairs delete pair-001
|
|
|
|
# Skip confirmation
|
|
stella groundtruth pairs delete pair-001 --force
|
|
```
|
|
|
|
## Validation Harness
|
|
|
|
The validation harness runs end-to-end verification against security pairs.
|
|
|
|
### Run Validation
|
|
|
|
```bash
|
|
# Validate all pairs
|
|
stella groundtruth validate run
|
|
|
|
# Validate specific pairs (pattern match)
|
|
stella groundtruth validate run --pairs "openssl:CVE-2024-*"
|
|
|
|
# Use specific matcher
|
|
stella groundtruth validate run --matcher semantic-diffing
|
|
|
|
# Parallel validation with report output
|
|
stella groundtruth validate run \
|
|
--pairs "curl:*" \
|
|
--parallel 8 \
|
|
--output validation-report.md
|
|
```
|
|
|
|
**Matcher types:**
|
|
| Matcher | Description |
|
|
|---------|-------------|
|
|
| `semantic-diffing` | IR-level semantic comparison (default) |
|
|
| `hash-based` | Function hash matching |
|
|
| `hybrid` | Combined semantic + hash approach |
|
|
|
|
### View Metrics
|
|
|
|
```bash
|
|
stella groundtruth validate metrics --run-id vr-20260122100532
|
|
|
|
# JSON output
|
|
stella groundtruth validate metrics --run-id vr-20260122100532 --output-format json
|
|
```
|
|
|
|
**Example output:**
|
|
```
|
|
Run ID: vr-20260122100532
|
|
Duration: 2026-01-22T10:00:00Z - 2026-01-22T10:15:32Z
|
|
Pairs: 48/50 successful
|
|
Function Match Rate: 94.2%
|
|
False-Negative Rate: 2.1%
|
|
SBOM Hash Stability: 3/3
|
|
Verify Time (p50/p95): 423ms / 1.2s
|
|
```
|
|
|
|
### Export Reports
|
|
|
|
```bash
|
|
# Export as Markdown
|
|
stella groundtruth validate export \
|
|
--run-id vr-20260122100532 \
|
|
--format markdown \
|
|
--output report.md
|
|
|
|
# Export as HTML
|
|
stella groundtruth validate export \
|
|
--run-id vr-20260122100532 \
|
|
--format html \
|
|
--output report.html
|
|
|
|
# Export as JSON (machine-readable)
|
|
stella groundtruth validate export \
|
|
--run-id vr-20260122100532 \
|
|
--format json \
|
|
--output report.json
|
|
```
|
|
|
|
## CI/CD Integration
|
|
|
|
### GitHub Actions Example
|
|
|
|
```yaml
|
|
name: Corpus Validation
|
|
on:
|
|
schedule:
|
|
- cron: '0 6 * * 1' # Weekly on Monday
|
|
|
|
jobs:
|
|
validate:
|
|
runs-on: ubuntu-latest
|
|
steps:
|
|
- name: Sync corpus sources
|
|
run: stella groundtruth sources sync
|
|
|
|
- name: Run validation
|
|
run: |
|
|
stella groundtruth validate run \
|
|
--matcher semantic-diffing \
|
|
--parallel 4 \
|
|
--output validation-${{ github.run_id }}.md
|
|
|
|
- name: Check metrics
|
|
run: |
|
|
MATCH_RATE=$(stella groundtruth validate metrics --run-id $(cat run-id.txt) --output-format json | jq '.functionMatchRate')
|
|
if (( $(echo "$MATCH_RATE < 90" | bc -l) )); then
|
|
echo "Match rate below threshold: $MATCH_RATE%"
|
|
exit 1
|
|
fi
|
|
```
|
|
|
|
### GitLab CI Example
|
|
|
|
```yaml
|
|
corpus-validation:
|
|
stage: verify
|
|
script:
|
|
- stella groundtruth sources sync --source buildinfo-debian
|
|
- stella groundtruth validate run --pairs "openssl:*" --output report.md
|
|
artifacts:
|
|
paths:
|
|
- report.md
|
|
expire_in: 1 week
|
|
rules:
|
|
- if: $CI_PIPELINE_SOURCE == "schedule"
|
|
```
|
|
|
|
## Offline Usage
|
|
|
|
For air-gapped environments, use offline bundles:
|
|
|
|
```bash
|
|
# Export corpus for offline use
|
|
stella bundle export \
|
|
--include-corpus \
|
|
--output corpus-bundle-$(date +%F).tar.gz
|
|
|
|
# Import on air-gapped system
|
|
stella bundle import --package corpus-bundle-2026-01-22.tar.gz
|
|
|
|
# Run validation offline
|
|
stella groundtruth validate run --offline
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
### Common Issues
|
|
|
|
**Sync fails with network error:**
|
|
```bash
|
|
# Check source status
|
|
stella groundtruth sources list
|
|
|
|
# Retry with verbose output
|
|
stella groundtruth sources sync --source debuginfod-ubuntu -v
|
|
```
|
|
|
|
**Symbol lookup returns no results:**
|
|
```bash
|
|
# Verify debug-id format (hex string)
|
|
stella groundtruth symbols lookup --debug-id abc123 -v
|
|
|
|
# Try searching by package instead
|
|
stella groundtruth symbols search --package libcrypto
|
|
```
|
|
|
|
**Validation metrics show low match rate:**
|
|
- Check that both vuln and patch binaries are present in corpus
|
|
- Verify symbol sources are synced and enabled
|
|
- Consider using `hybrid` matcher for complex cases
|
|
|
|
## See Also
|
|
|
|
- [CLI Command Reference](commands/reference.md#ground-truth-corpus-commands)
|
|
- [BinaryIndex Architecture](../../binary-index/architecture.md)
|
|
- [Golden Corpus KPIs](../../benchmarks/golden-corpus-kpis.md)
|
|
- [Air-Gap Bundle Guide](../../modules/airgap/README.md)
|