Files
git.stella-ops.org/docs/modules/cli/guides/ground-truth-cli.md
2026-01-22 19:08:46 +02:00

352 lines
9.3 KiB
Markdown

# Ground-Truth Corpus CLI Guide
**Sprint:** SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli
## Overview
The `stella groundtruth` command group provides CLI access to the ground-truth corpus for patch-paired binary verification. This corpus enables precision validation of security advisories by maintaining symbol and binary pairs from upstream distribution sources.
## Use Cases
- **Security teams**: Validate patch presence in production binaries
- **Compliance auditors**: Generate evidence bundles for air-gapped verification
- **DevSecOps**: Integrate corpus validation into CI/CD pipelines
- **Researchers**: Query symbol databases for vulnerability analysis
## Prerequisites
- Stella CLI installed and configured
- Backend connectivity to Platform service (or offline bundle)
- For sync operations: network access to upstream sources
## Command Structure
```
stella groundtruth
├── sources # Manage symbol source connectors
│ ├── list # List available connectors
│ ├── enable # Enable a connector
│ ├── disable # Disable a connector
│ └── sync # Sync from upstream
├── symbols # Query symbols in corpus
│ ├── lookup # Lookup by debug ID
│ └── search # Search by package/distro
├── pairs # Manage security pairs
│ ├── create # Create vuln/patch pair
│ ├── list # List existing pairs
│ └── delete # Remove a pair
└── validate # Run validation harness
├── run # Execute validation
├── metrics # View run metrics
└── export # Export report
```
## Source Connectors
The ground-truth corpus ingests data from multiple upstream sources:
| Connector ID | Distribution | Data Type | Description |
|--------------|--------------|-----------|-------------|
| `debuginfod-fedora` | Fedora | Debug symbols | ELF debuginfo via debuginfod protocol |
| `debuginfod-ubuntu` | Ubuntu | Debug symbols | ELF debuginfo via debuginfod protocol |
| `ddeb-ubuntu` | Ubuntu | Debug packages | `.ddeb` debug symbol packages |
| `buildinfo-debian` | Debian | Build metadata | `.buildinfo` reproducibility records |
| `secdb-alpine` | Alpine | Security DB | `secfixes` YAML from APKBUILD |
### List Sources
```bash
stella groundtruth sources list
# Output:
ID Display Name Status Last Sync
------------------------------------------------------------------------------------------
debuginfod-fedora Fedora Debuginfod Enabled 2026-01-22T10:00:00Z
debuginfod-ubuntu Ubuntu Debuginfod Enabled 2026-01-22T10:00:00Z
ddeb-ubuntu Ubuntu ddebs Enabled 2026-01-22T09:30:00Z
buildinfo-debian Debian Buildinfo Enabled 2026-01-22T08:00:00Z
secdb-alpine Alpine SecDB Enabled 2026-01-22T06:00:00Z
```
### Enable/Disable Sources
```bash
# Enable a source connector
stella groundtruth sources enable debuginfod-fedora
# Disable a source connector (stops future syncs)
stella groundtruth sources disable debuginfod-fedora
```
### Sync Sources
```bash
# Incremental sync of all enabled sources
stella groundtruth sources sync
# Full sync of a specific source
stella groundtruth sources sync --source buildinfo-debian --full
# Sync with verbose output
stella groundtruth sources sync --source ddeb-ubuntu -v
```
## Symbol Operations
### Lookup by Debug ID
Query symbols using the ELF GNU Build-ID or equivalent identifier:
```bash
# Lookup by build-id
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a
# JSON output
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a --output-format json
```
**Example output:**
```
Binary: libcrypto.so.3
Architecture: x86_64
Distribution: debian-bookworm
Package: openssl@3.0.11-1
Symbol Count: 4523
Sources: debuginfod-fedora, buildinfo-debian
```
### Search Symbols
Search across the corpus by package name or distribution:
```bash
# Search by package
stella groundtruth symbols search --package openssl
# Filter by distribution
stella groundtruth symbols search --package openssl --distro debian
# Limit results
stella groundtruth symbols search --package curl --limit 100
```
## Security Pairs
Security pairs link vulnerable and patched binary versions for a specific CVE.
### Create a Pair
```bash
stella groundtruth pairs create \
--cve CVE-2024-1234 \
--vuln-pkg openssl=3.0.10-1 \
--patch-pkg openssl=3.0.11-1 \
--distro debian-bookworm
```
### List Pairs
```bash
# List all pairs
stella groundtruth pairs list
# Filter by CVE pattern
stella groundtruth pairs list --cve "CVE-2024-*"
# Filter by package
stella groundtruth pairs list --package openssl --limit 50
# JSON output
stella groundtruth pairs list --output-format json
```
**Example output:**
```
Pair ID CVE Package Vuln Version Patch Version
-------------------------------------------------------------------------------
pair-001 CVE-2024-1234 openssl 3.0.10-1 3.0.11-1
pair-002 CVE-2024-5678 curl 8.4.0-1 8.5.0-1
```
### Delete a Pair
```bash
# Delete with confirmation prompt
stella groundtruth pairs delete pair-001
# Skip confirmation
stella groundtruth pairs delete pair-001 --force
```
## Validation Harness
The validation harness runs end-to-end verification against security pairs.
### Run Validation
```bash
# Validate all pairs
stella groundtruth validate run
# Validate specific pairs (pattern match)
stella groundtruth validate run --pairs "openssl:CVE-2024-*"
# Use specific matcher
stella groundtruth validate run --matcher semantic-diffing
# Parallel validation with report output
stella groundtruth validate run \
--pairs "curl:*" \
--parallel 8 \
--output validation-report.md
```
**Matcher types:**
| Matcher | Description |
|---------|-------------|
| `semantic-diffing` | IR-level semantic comparison (default) |
| `hash-based` | Function hash matching |
| `hybrid` | Combined semantic + hash approach |
### View Metrics
```bash
stella groundtruth validate metrics --run-id vr-20260122100532
# JSON output
stella groundtruth validate metrics --run-id vr-20260122100532 --output-format json
```
**Example output:**
```
Run ID: vr-20260122100532
Duration: 2026-01-22T10:00:00Z - 2026-01-22T10:15:32Z
Pairs: 48/50 successful
Function Match Rate: 94.2%
False-Negative Rate: 2.1%
SBOM Hash Stability: 3/3
Verify Time (p50/p95): 423ms / 1.2s
```
### Export Reports
```bash
# Export as Markdown
stella groundtruth validate export \
--run-id vr-20260122100532 \
--format markdown \
--output report.md
# Export as HTML
stella groundtruth validate export \
--run-id vr-20260122100532 \
--format html \
--output report.html
# Export as JSON (machine-readable)
stella groundtruth validate export \
--run-id vr-20260122100532 \
--format json \
--output report.json
```
## CI/CD Integration
### GitHub Actions Example
```yaml
name: Corpus Validation
on:
schedule:
- cron: '0 6 * * 1' # Weekly on Monday
jobs:
validate:
runs-on: ubuntu-latest
steps:
- name: Sync corpus sources
run: stella groundtruth sources sync
- name: Run validation
run: |
stella groundtruth validate run \
--matcher semantic-diffing \
--parallel 4 \
--output validation-${{ github.run_id }}.md
- name: Check metrics
run: |
MATCH_RATE=$(stella groundtruth validate metrics --run-id $(cat run-id.txt) --output-format json | jq '.functionMatchRate')
if (( $(echo "$MATCH_RATE < 90" | bc -l) )); then
echo "Match rate below threshold: $MATCH_RATE%"
exit 1
fi
```
### GitLab CI Example
```yaml
corpus-validation:
stage: verify
script:
- stella groundtruth sources sync --source buildinfo-debian
- stella groundtruth validate run --pairs "openssl:*" --output report.md
artifacts:
paths:
- report.md
expire_in: 1 week
rules:
- if: $CI_PIPELINE_SOURCE == "schedule"
```
## Offline Usage
For air-gapped environments, use offline bundles:
```bash
# Export corpus for offline use
stella bundle export \
--include-corpus \
--output corpus-bundle-$(date +%F).tar.gz
# Import on air-gapped system
stella bundle import --package corpus-bundle-2026-01-22.tar.gz
# Run validation offline
stella groundtruth validate run --offline
```
## Troubleshooting
### Common Issues
**Sync fails with network error:**
```bash
# Check source status
stella groundtruth sources list
# Retry with verbose output
stella groundtruth sources sync --source debuginfod-ubuntu -v
```
**Symbol lookup returns no results:**
```bash
# Verify debug-id format (hex string)
stella groundtruth symbols lookup --debug-id abc123 -v
# Try searching by package instead
stella groundtruth symbols search --package libcrypto
```
**Validation metrics show low match rate:**
- Check that both vuln and patch binaries are present in corpus
- Verify symbol sources are synced and enabled
- Consider using `hybrid` matcher for complex cases
## See Also
- [CLI Command Reference](commands/reference.md#ground-truth-corpus-commands)
- [BinaryIndex Architecture](../../binary-index/architecture.md)
- [Golden Corpus KPIs](../../benchmarks/golden-corpus-kpis.md)
- [Air-Gap Bundle Guide](../../modules/airgap/README.md)