9.3 KiB
Golden Corpus Operations Runbook
Sprint: SPRINT_20260121_036_BinaryIndex_golden_corpus_bundle_verification Task: GCB-006 - Document corpus folder layout and maintenance procedures
Overview
This runbook provides operational procedures for the golden corpus infrastructure, including troubleshooting, incident response, and common maintenance tasks.
Quick Reference
| Task | Command |
|---|---|
| Check corpus health | stella doctor --check "check.binaryanalysis.corpus.*" |
| Run validation | stella groundtruth validate run --output results.json |
| Check regression | stella groundtruth validate check --results results.json --baseline current.json |
| Update baseline | stella groundtruth baseline update --from-results results.json --output current.json |
| Export bundle | stella groundtruth bundle export --packages openssl --distros debian --output bundle.tar.gz |
| Verify bundle | stella groundtruth bundle import --input bundle.tar.gz --verify |
Troubleshooting
Mirror Sync Failures
Symptoms
- Doctor check
check.binaryanalysis.corpus.mirror.freshnessfails - Validation runs fail with "source not found" errors
- Alerts for stale mirrors
Diagnosis
# Check mirror last sync times
ls -la /data/golden-corpus/mirrors/*/.last-sync
# Check sync logs
tail -100 /var/log/corpus/debian-sync.log
tail -100 /var/log/corpus/ubuntu-sync.log
tail -100 /var/log/corpus/osv-sync.log
# Test connectivity
curl -I https://snapshot.debian.org/
curl -I https://buildinfos.debian.net/
curl -I https://ubuntu.com/security/notices.json
Resolution
-
Network connectivity issues
# Check firewall rules iptables -L -n | grep -E "80|443" # Check DNS resolution nslookup snapshot.debian.org # Test with proxy if applicable export https_proxy=http://proxy:3128 curl -I https://snapshot.debian.org/ -
Upstream service unavailable
- Check upstream service status
- Wait and retry (services may be temporarily unavailable)
- Switch to backup mirror if available
-
Disk space issues
# Check disk usage df -h /data/golden-corpus # Clean up old archives /opt/golden-corpus/scripts/archive-old-results.sh -
Permission issues
# Check file ownership ls -la /data/golden-corpus/mirrors/ # Fix permissions chown -R corpus:corpus /data/golden-corpus/mirrors/ chmod -R 755 /data/golden-corpus/mirrors/
Validation Failures
Symptoms
- CI pipeline fails on regression check
- Validation run exits with non-zero code
- Lower than expected KPI metrics
Diagnosis
# Check latest validation results
stella groundtruth validate metrics --run-id latest --detailed
# Compare with baseline
stella groundtruth validate check \
--results bench/results/latest.json \
--baseline bench/baselines/current.json \
--verbose
# Review specific failures
jq '.failedPairs[]' bench/results/latest.json
Resolution
-
True regression (algorithm degradation)
- Review recent code changes
- Identify the causing commit
- Either fix the regression or update baseline if intentional
-
False positive (ground truth incorrect)
# Review ground truth for specific pair cat corpus/debian/openssl/DSA-5678-1/metadata/ground-truth.json # Update ground truth if incorrect # (Requires manual review by security team) -
Infrastructure issues
- Check if build environment is consistent
- Verify debug symbols are available
- Check Ghidra/BSim connectivity
-
Baseline drift
- If corpus was significantly updated, baseline may need refresh
- Run full validation and update baseline following procedures
Bundle Verification Failures
Symptoms
stella groundtruth bundle import --verifyfails- Signature verification errors
- Timestamp validation errors
Diagnosis
# Verbose verification
stella groundtruth bundle import \
--input bundle.tar.gz \
--verify \
--verbose \
--output report.json
# Check specific failures
jq '.signatureResult, .timestampResult, .digestResult' report.json
Resolution
-
Signature verification failure
# Check trusted keys cat /etc/stellaops/trusted-keys.pub # Verify key hasn't expired openssl x509 -in /etc/stellaops/trusted-keys.pub -noout -dates # Check if bundle was signed with different key # May need to add signing key to trusted keys -
Timestamp verification failure
- Check TSA certificate validity
- Verify system clock is accurate
- Check if timestamp is within validity window
-
Digest mismatch
- Bundle may be corrupted during transfer
- Re-download or re-generate the bundle
- Check for partial transfers
Baseline Not Found
Symptoms
- Doctor check
check.binaryanalysis.corpus.kpi.baselinefails - Regression check errors with "baseline not found"
Resolution
# Check baseline path
ls -la bench/baselines/current.json
# If missing, create from latest results
stella groundtruth baseline update \
--from-results bench/results/latest.json \
--output bench/baselines/current.json \
--description "Initial baseline"
# Or restore from archive
ls bench/baselines/archive/
cp bench/baselines/archive/baseline-20260115.json \
bench/baselines/current.json
Debuginfod Connectivity Issues
Symptoms
- Doctor check
check.binaryanalysis.debuginfod.availabilityfails - Missing debug symbols during validation
Diagnosis
# Check DEBUGINFOD_URLS environment
echo $DEBUGINFOD_URLS
# Test debuginfod connectivity
curl -I "https://debuginfod.fedoraproject.org/buildid/xyz/debuginfo"
curl -I "https://debuginfod.ubuntu.com/buildid/xyz/debuginfo"
Resolution
-
Configure DEBUGINFOD_URLS
export DEBUGINFOD_URLS="https://debuginfod.fedoraproject.org/ https://debuginfod.ubuntu.com/" -
Use local fallback
- Enable local debug symbol cache
- Sync ddeb packages for Ubuntu
- Download debug packages from archives
Incident Response
KPI Regression Detected in Production
Severity: High Response Time: 4 hours
-
Acknowledge and assess
# Get current status stella groundtruth validate check \ --results bench/results/latest.json \ --baseline bench/baselines/current.json -
Identify root cause
- Check recent code changes
- Review validation logs
- Compare with previous runs
-
Mitigate
- If code regression: revert the change
- If ground truth issue: fix ground truth
- If infrastructure issue: fix and re-run
-
Verify fix
# Re-run validation stella groundtruth validate run --output results-fix.json # Verify regression is fixed stella groundtruth validate check \ --results results-fix.json \ --baseline bench/baselines/current.json -
Post-incident
- Document in incident log
- Update runbook if new issue type
- Consider adding monitoring/alerting
Mirror Corruption Detected
Severity: Medium Response Time: 24 hours
-
Identify corrupted files
# Check file integrity find /data/golden-corpus/mirrors -name "*.deb" -exec dpkg-deb --info {} \; 2>&1 | grep -i error -
Remove corrupted files
# Move corrupted files to quarantine mkdir -p /data/golden-corpus/quarantine mv /data/golden-corpus/mirrors/debian/path/to/corrupted.deb \ /data/golden-corpus/quarantine/ -
Re-sync affected mirror
/opt/golden-corpus/scripts/sync-debian-mirrors.sh -
Verify fix
stella doctor --check check.binaryanalysis.corpus.mirror.freshness
Disk Space Critical
Severity: High Response Time: 1 hour
-
Check usage
df -h /data/golden-corpus du -sh /data/golden-corpus/* -
Quick cleanup
# Archive old results /opt/golden-corpus/scripts/archive-old-results.sh # Prune old baselines /opt/golden-corpus/scripts/prune-baselines.sh # Remove old evidence bundles find /data/golden-corpus/evidence -name "*.tar.gz" -mtime +90 -delete -
Expand storage if needed
- Request additional storage
- Mount new volume
- Migrate data if necessary
Scheduled Maintenance
Weekly Tasks
- Review Doctor health checks
- Check mirror freshness alerts
- Review validation results trends
- Archive old results
Monthly Tasks
- Generate compliance evidence bundles
- Review and update ground truth annotations
- Prune old baselines (keep last 10)
- Review storage usage trends
Quarterly Tasks
- Full corpus validation (not just seed)
- Review and update documentation
- Test disaster recovery procedures
- Review access permissions
Contact Information
| Role | Contact | Escalation |
|---|---|---|
| Corpus Owner | corpus-team@stella-ops.org | 1st |
| BinaryIndex Guild | binaryindex@stella-ops.org | 2nd |
| Platform On-Call | oncall@stella-ops.org | 3rd |