Files
git.stella-ops.org/docs/modules/cli/guides/ground-truth-cli.md
2026-01-22 19:08:46 +02:00

9.3 KiB

Ground-Truth Corpus CLI Guide

Sprint: SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli

Overview

The stella groundtruth command group provides CLI access to the ground-truth corpus for patch-paired binary verification. This corpus enables precision validation of security advisories by maintaining symbol and binary pairs from upstream distribution sources.

Use Cases

  • Security teams: Validate patch presence in production binaries
  • Compliance auditors: Generate evidence bundles for air-gapped verification
  • DevSecOps: Integrate corpus validation into CI/CD pipelines
  • Researchers: Query symbol databases for vulnerability analysis

Prerequisites

  • Stella CLI installed and configured
  • Backend connectivity to Platform service (or offline bundle)
  • For sync operations: network access to upstream sources

Command Structure

stella groundtruth
├── sources        # Manage symbol source connectors
│   ├── list       # List available connectors
│   ├── enable     # Enable a connector
│   ├── disable    # Disable a connector
│   └── sync       # Sync from upstream
├── symbols        # Query symbols in corpus
│   ├── lookup     # Lookup by debug ID
│   └── search     # Search by package/distro
├── pairs          # Manage security pairs
│   ├── create     # Create vuln/patch pair
│   ├── list       # List existing pairs
│   └── delete     # Remove a pair
└── validate       # Run validation harness
    ├── run        # Execute validation
    ├── metrics    # View run metrics
    └── export     # Export report

Source Connectors

The ground-truth corpus ingests data from multiple upstream sources:

Connector ID Distribution Data Type Description
debuginfod-fedora Fedora Debug symbols ELF debuginfo via debuginfod protocol
debuginfod-ubuntu Ubuntu Debug symbols ELF debuginfo via debuginfod protocol
ddeb-ubuntu Ubuntu Debug packages .ddeb debug symbol packages
buildinfo-debian Debian Build metadata .buildinfo reproducibility records
secdb-alpine Alpine Security DB secfixes YAML from APKBUILD

List Sources

stella groundtruth sources list

# Output:
ID                        Display Name              Status       Last Sync
------------------------------------------------------------------------------------------
debuginfod-fedora         Fedora Debuginfod         Enabled      2026-01-22T10:00:00Z
debuginfod-ubuntu         Ubuntu Debuginfod         Enabled      2026-01-22T10:00:00Z
ddeb-ubuntu               Ubuntu ddebs              Enabled      2026-01-22T09:30:00Z
buildinfo-debian          Debian Buildinfo          Enabled      2026-01-22T08:00:00Z
secdb-alpine              Alpine SecDB              Enabled      2026-01-22T06:00:00Z

Enable/Disable Sources

# Enable a source connector
stella groundtruth sources enable debuginfod-fedora

# Disable a source connector (stops future syncs)
stella groundtruth sources disable debuginfod-fedora

Sync Sources

# Incremental sync of all enabled sources
stella groundtruth sources sync

# Full sync of a specific source
stella groundtruth sources sync --source buildinfo-debian --full

# Sync with verbose output
stella groundtruth sources sync --source ddeb-ubuntu -v

Symbol Operations

Lookup by Debug ID

Query symbols using the ELF GNU Build-ID or equivalent identifier:

# Lookup by build-id
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a

# JSON output
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a --output-format json

Example output:

Binary: libcrypto.so.3
Architecture: x86_64
Distribution: debian-bookworm
Package: openssl@3.0.11-1
Symbol Count: 4523
Sources: debuginfod-fedora, buildinfo-debian

Search Symbols

Search across the corpus by package name or distribution:

# Search by package
stella groundtruth symbols search --package openssl

# Filter by distribution
stella groundtruth symbols search --package openssl --distro debian

# Limit results
stella groundtruth symbols search --package curl --limit 100

Security Pairs

Security pairs link vulnerable and patched binary versions for a specific CVE.

Create a Pair

stella groundtruth pairs create \
  --cve CVE-2024-1234 \
  --vuln-pkg openssl=3.0.10-1 \
  --patch-pkg openssl=3.0.11-1 \
  --distro debian-bookworm

List Pairs

# List all pairs
stella groundtruth pairs list

# Filter by CVE pattern
stella groundtruth pairs list --cve "CVE-2024-*"

# Filter by package
stella groundtruth pairs list --package openssl --limit 50

# JSON output
stella groundtruth pairs list --output-format json

Example output:

Pair ID      CVE                Package      Vuln Version    Patch Version
-------------------------------------------------------------------------------
pair-001     CVE-2024-1234      openssl      3.0.10-1        3.0.11-1
pair-002     CVE-2024-5678      curl         8.4.0-1         8.5.0-1

Delete a Pair

# Delete with confirmation prompt
stella groundtruth pairs delete pair-001

# Skip confirmation
stella groundtruth pairs delete pair-001 --force

Validation Harness

The validation harness runs end-to-end verification against security pairs.

Run Validation

# Validate all pairs
stella groundtruth validate run

# Validate specific pairs (pattern match)
stella groundtruth validate run --pairs "openssl:CVE-2024-*"

# Use specific matcher
stella groundtruth validate run --matcher semantic-diffing

# Parallel validation with report output
stella groundtruth validate run \
  --pairs "curl:*" \
  --parallel 8 \
  --output validation-report.md

Matcher types:

Matcher Description
semantic-diffing IR-level semantic comparison (default)
hash-based Function hash matching
hybrid Combined semantic + hash approach

View Metrics

stella groundtruth validate metrics --run-id vr-20260122100532

# JSON output
stella groundtruth validate metrics --run-id vr-20260122100532 --output-format json

Example output:

Run ID: vr-20260122100532
Duration: 2026-01-22T10:00:00Z - 2026-01-22T10:15:32Z
Pairs: 48/50 successful
Function Match Rate: 94.2%
False-Negative Rate: 2.1%
SBOM Hash Stability: 3/3
Verify Time (p50/p95): 423ms / 1.2s

Export Reports

# Export as Markdown
stella groundtruth validate export \
  --run-id vr-20260122100532 \
  --format markdown \
  --output report.md

# Export as HTML
stella groundtruth validate export \
  --run-id vr-20260122100532 \
  --format html \
  --output report.html

# Export as JSON (machine-readable)
stella groundtruth validate export \
  --run-id vr-20260122100532 \
  --format json \
  --output report.json

CI/CD Integration

GitHub Actions Example

name: Corpus Validation
on:
  schedule:
    - cron: '0 6 * * 1'  # Weekly on Monday

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Sync corpus sources
        run: stella groundtruth sources sync

      - name: Run validation
        run: |
          stella groundtruth validate run \
            --matcher semantic-diffing \
            --parallel 4 \
            --output validation-${{ github.run_id }}.md

      - name: Check metrics
        run: |
          MATCH_RATE=$(stella groundtruth validate metrics --run-id $(cat run-id.txt) --output-format json | jq '.functionMatchRate')
          if (( $(echo "$MATCH_RATE < 90" | bc -l) )); then
            echo "Match rate below threshold: $MATCH_RATE%"
            exit 1
          fi

GitLab CI Example

corpus-validation:
  stage: verify
  script:
    - stella groundtruth sources sync --source buildinfo-debian
    - stella groundtruth validate run --pairs "openssl:*" --output report.md
  artifacts:
    paths:
      - report.md
    expire_in: 1 week
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"

Offline Usage

For air-gapped environments, use offline bundles:

# Export corpus for offline use
stella bundle export \
  --include-corpus \
  --output corpus-bundle-$(date +%F).tar.gz

# Import on air-gapped system
stella bundle import --package corpus-bundle-2026-01-22.tar.gz

# Run validation offline
stella groundtruth validate run --offline

Troubleshooting

Common Issues

Sync fails with network error:

# Check source status
stella groundtruth sources list

# Retry with verbose output
stella groundtruth sources sync --source debuginfod-ubuntu -v

Symbol lookup returns no results:

# Verify debug-id format (hex string)
stella groundtruth symbols lookup --debug-id abc123 -v

# Try searching by package instead
stella groundtruth symbols search --package libcrypto

Validation metrics show low match rate:

  • Check that both vuln and patch binaries are present in corpus
  • Verify symbol sources are synced and enabled
  • Consider using hybrid matcher for complex cases

See Also