VexLens Operations Runbook
VexLens provides VEX consensus computation across multiple issuer sources. This runbook covers deployment, configuration, operations, and troubleshooting.
1. Service scope
VexLens computes deterministic consensus over VEX (Vulnerability Exploitability eXchange) statements from multiple issuers. Operations owns:
- Consensus engine scaling, projection storage, and event bus connectivity.
- Monitoring and alerts for consensus latency, conflict rates, and trust weight anomalies.
- Runbook execution for recovery, offline bundle import, and issuer trust management.
- Coordination with Policy Engine and Vuln Explorer consumers.
Related documentation:
- docs/modules/vex-lens/README.md
- docs/modules/vex-lens/architecture.md
- docs/modules/vex-lens/implementation_plan.md
- docs/modules/vex-lens/runbooks/observability.md
2. Contacts & tooling
| Area | Owner(s) | Escalation |
|---|---|---|
| VexLens service | VEX Lens Guild | #vex-lens-ops, on-call rotation |
| Issuer Directory | Issuer Directory Guild | #issuer-directory |
| Policy Engine integration | Policy Guild | #policy-engine |
| Offline Kit | Offline Kit Guild | #offline-kit |
Primary tooling:
- stella vex consensus CLI (query, export, verify).
- VexLens API (/api/v1/vex/consensus/*) for automation.
- Grafana dashboards (VEX Lens / Consensus Health, VEX Lens / Conflicts).
- Alertmanager routes (VexLens.ConsensusLatency, VexLens.Conflicts).
3. Configuration
3.1 Options reference
Configure via vexlens.yaml or environment variables with VEXLENS_ prefix:
VexLens:
  Storage:
    Driver: mongo # "memory" for testing, "mongo" for production
    ConnectionString: "mongodb://..."
    Database: stellaops
    ProjectionsCollection: vex_consensus
    HistoryCollection: vex_consensus_history
    MaxHistoryEntries: 100
    CommandTimeoutSeconds: 30
  Trust:
    AuthoritativeWeight: 1.0
    TrustedWeight: 0.8
    KnownWeight: 0.5
    UnknownWeight: 0.3
    UntrustedWeight: 0.1
    SignedMultiplier: 1.2
    FreshnessDecayDays: 30
    MinFreshnessFactor: 0.5
    JustifiedNotAffectedBoost: 1.1
    FixedStatusBoost: 1.05
  Consensus:
    DefaultMode: WeightedVote # HighestWeight, WeightedVote, Lattice, AuthoritativeFirst
    MinimumWeightThreshold: 0.1
    ConflictThreshold: 0.3
    RequireJustificationForNotAffected: false
    MaxStatementsPerComputation: 100
    EnableConflictDetection: true
    EmitEvents: true
  Normalization:
    EnabledFormats:
      - OpenVEX
      - CSAF
      - CycloneDX
    StrictMode: false
    MaxDocumentSizeBytes: 10485760 # 10 MB
    MaxStatementsPerDocument: 10000
  AirGap:
    SealedMode: false
    BundlePath: /var/lib/stellaops/vex-bundles
    VerifyBundleSignatures: true
    AllowedBundleSources: []
    ExportFormat: jsonl
  Telemetry:
    MetricsEnabled: true
    TracingEnabled: true
    MeterName: StellaOps.VexLens
    ActivitySourceName: StellaOps.VexLens
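The runbook does not state how the Trust options combine, so the following is only a minimal sketch under assumptions: the tier weight is multiplied by SignedMultiplier for signed statements and by a freshness factor that decays linearly to MinFreshnessFactor over FreshnessDecayDays. The names `TrustOptions`, `freshness_factor`, and `effective_weight` are hypothetical, and the status boosts (JustifiedNotAffectedBoost, FixedStatusBoost) are left out. It can help when reasoning about what a weight change might do before tuning.

```python
from dataclasses import dataclass

# Tier weights copied from the Trust section of the options reference.
DEFAULT_TIERS = {
    "Authoritative": 1.0,
    "Trusted": 0.8,
    "Known": 0.5,
    "Unknown": 0.3,
    "Untrusted": 0.1,
}


@dataclass(frozen=True)
class TrustOptions:
    tier_weights: dict
    signed_multiplier: float = 1.2
    freshness_decay_days: int = 30
    min_freshness_factor: float = 0.5


def freshness_factor(age_days: float, opts: TrustOptions) -> float:
    # ASSUMPTION: linear decay from 1.0 down to min_freshness_factor.
    # The real service may use a different curve.
    if age_days <= 0:
        return 1.0
    span = 1.0 - opts.min_freshness_factor
    decayed = 1.0 - span * (age_days / opts.freshness_decay_days)
    return max(opts.min_freshness_factor, decayed)


def effective_weight(tier: str, signed: bool, age_days: float,
                     opts: TrustOptions) -> float:
    # ASSUMPTION: the factors are multiplied together.
    weight = opts.tier_weights[tier]
    if signed:
        weight *= opts.signed_multiplier
    return weight * freshness_factor(age_days, opts)


if __name__ == "__main__":
    opts = TrustOptions(tier_weights=DEFAULT_TIERS)
    for age in (0, 15, 30, 90):
        print(age, effective_weight("Trusted", True, age, opts))
```

Under these assumptions a statement never drops below half of its signed tier weight, however old it is.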
3.2 Environment variable overrides
VEXLENS_STORAGE__DRIVER=mongo
VEXLENS_STORAGE__CONNECTIONSTRING=mongodb://localhost:27017
VEXLENS_CONSENSUS__DEFAULTMODE=WeightedVote
VEXLENS_AIRGAP__SEALEDMODE=true
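The overrides above suggest that a double underscore separates nesting levels under the VEXLENS_ prefix. A minimal sketch of that mapping, assuming exactly this convention, is shown below; `env_to_config` is a hypothetical helper, not part of VexLens, and it keeps the upper-case segment names as written in the environment.

```python
def env_to_config(environ: dict, prefix: str = "VEXLENS_") -> dict:
    """Turn prefixed environment variables into a nested dictionary."""
    config: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # ignore variables that do not belong to VexLens
        path = name[len(prefix):].split("__")
        node = config
        for segment in path[:-1]:
            node = node.setdefault(segment, {})
        node[path[-1]] = value
    return config


if __name__ == "__main__":
    import os
    print(env_to_config(dict(os.environ)))
```

Printing the result before a restart is a quick way to spot a misspelled section name, since a typo shows up as an unexpected top-level key.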
3.3 Consensus mode selection
| Mode | Use case |
|---|---|
| HighestWeight | Single authoritative source preferred |
| WeightedVote | Democratic consensus from multiple sources |
| Lattice | Formal lattice join (most conservative) |
| AuthoritativeFirst | Short-circuit on authoritative issuer |
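To illustrate how two of the modes in the table can disagree on the same input, here is a simplified sketch. It assumes each statement has already been reduced to a (status, weight) pair and that weights below the minimum threshold are ignored; the function names and the tie-breaking rule (alphabetical by status) are assumptions, not the engine's documented behaviour.

```python
from collections import defaultdict


def highest_weight(statements):
    """Return the status of the single heaviest statement."""
    # Ties are broken by status name so the result is order-independent.
    return min(statements, key=lambda s: (-s[1], s[0]))[0]


def weighted_vote(statements, minimum_weight=0.1):
    """Sum weights per status and return the status with the largest total."""
    totals = defaultdict(float)
    for status, weight in statements:
        if weight >= minimum_weight:
            totals[status] += weight
    if not totals:
        return None  # nothing met the threshold
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[0][0]


if __name__ == "__main__":
    statements = [
        ("affected", 0.9),
        ("not_affected", 0.5),
        ("not_affected", 0.5),
    ]
    print(highest_weight(statements))
    print(weighted_vote(statements))
```

In this example one strong issuer says "affected" while two weaker issuers agree on "not_affected", so the first function follows the strong issuer and the second follows the combined weight of the weaker pair.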
4. Monitoring & SLOs
Key metrics (exposed by VexLensMetrics):
| Metric | SLO / Alert | Notes |
|---|---|---|
| vexlens.consensus.duration_seconds | p95 < 500ms | Per-computation latency |
| vexlens.consensus.conflicts_total | Monitor trend | Conflicts by reason |
| vexlens.consensus.confidence | avg > 0.7 | Low confidence indicates issuer gaps |
| vexlens.normalization.duration_seconds | p95 < 200ms | Per-document normalization |
| vexlens.normalization.errors_total | Alert on spike | By format |
| vexlens.trust.weight_value | Distribution | Trust weight distribution |
| vexlens.projection.query_duration_seconds | p95 < 100ms | Projection lookups |
Dashboards must include:
- Consensus computation rate by mode and outcome.
- Conflict breakdown (status disagreement, weight tie, insufficient data).
- Trust weight distribution by issuer category.
- Normalization success/failure by VEX format.
- Projection query latency and throughput.
Alerts (Alertmanager):
- VexLensConsensusLatencyHigh - consensus duration p95 > 500ms for 5 minutes.
- VexLensConflictSpike - conflict rate increase > 50% in 10 minutes.
- VexLensNormalizationFailures - normalization error rate > 5% for 5 minutes.
- VexLensLowConfidence - average confidence < 0.5 for 10 minutes.
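The alert thresholds can be restated as a small check against a metrics snapshot, which is sometimes handy in a smoke test. This is a sketch only: the snapshot keys are hypothetical names chosen for illustration, the values are assumed to be pre-aggregated, and the "for N minutes" duration conditions are not modelled.

```python
def firing_alerts(snapshot: dict) -> list:
    """Return the names of alerts whose threshold is breached."""
    rules = [
        # p95 consensus duration above 500 ms
        ("VexLensConsensusLatencyHigh", snapshot["consensus_p95_seconds"] > 0.5),
        # conflict rate grew by more than 50%
        ("VexLensConflictSpike", snapshot["conflict_rate_increase"] > 0.5),
        # more than 5% of normalizations failed
        ("VexLensNormalizationFailures", snapshot["normalization_error_rate"] > 0.05),
        # average confidence below 0.5
        ("VexLensLowConfidence", snapshot["avg_confidence"] < 0.5),
    ]
    return [name for name, breached in rules if breached]


if __name__ == "__main__":
    print(firing_alerts({
        "consensus_p95_seconds": 0.62,
        "conflict_rate_increase": 0.10,
        "normalization_error_rate": 0.01,
        "avg_confidence": 0.80,
    }))
```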
5. Routine operations
5.1 Daily checklist
- Review dashboard for consensus latency and conflict rates.
- Check normalization error logs for malformed VEX documents.
- Verify projection storage growth is within capacity thresholds.
- Review trust weight distribution for anomalies.
- Scan logs for issuer_not_found or signature_verification_failed.
5.2 Weekly tasks
- Review issuer directory for new registrations or revocations.
- Audit conflict queue for persistent disagreements.
- Test consensus determinism with sample documents.
- Verify Policy Engine and Vuln Explorer integrations are functional.
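One way to approach the weekly determinism test is to feed the same statements in different orders and compare a digest of the canonicalized result. The sketch below assumes the consensus computation can be called as a plain function returning JSON-serializable data; `compute`, `projection_digest`, and `is_deterministic` are hypothetical names, and the real check may go through the CLI or API instead.

```python
import hashlib
import json
import random


def projection_digest(projection) -> str:
    """Hash a projection after serializing it in a canonical key order."""
    canonical = json.dumps(projection, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_deterministic(compute, statements, runs=5, seed=0) -> bool:
    """Check that shuffling the input never changes the output digest."""
    rng = random.Random(seed)  # fixed seed keeps the check itself repeatable
    baseline = projection_digest(compute(list(statements)))
    for _ in range(runs):
        shuffled = list(statements)
        rng.shuffle(shuffled)
        if projection_digest(compute(shuffled)) != baseline:
            return False
    return True
```

A failing check usually points at iteration order leaking into the result, for example an unsorted collection or a tie that is broken by arrival order.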
5.3 Monthly tasks
- Review and tune trust weights based on issuer performance.
- Archive old projection history beyond retention period.
- Update issuer trust tiers based on incident history.
- Test offline bundle import/export workflow.
6. Offline operations
6.1 Bundle export
# Export consensus projections to offline bundle
stella vex consensus export \
--format jsonl \
--output /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
--manifest /var/lib/stellaops/vex-bundles/manifest.json \
--sign
# Verify bundle integrity
stella vex consensus verify \
--bundle /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
--manifest /var/lib/stellaops/vex-bundles/manifest.json
6.2 Bundle import (air-gapped)
# Enable sealed mode
export VEXLENS_AIRGAP__SEALEDMODE=true
export VEXLENS_AIRGAP__BUNDLEPATH=/var/lib/stellaops/vex-bundles
# Import bundle
stella vex consensus import \
--bundle /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
--verify-signatures
# Verify import
stella vex consensus status
6.3 Air-gap verification
- Confirm VEXLENS_AIRGAP__SEALEDMODE=true in environment.
- Verify no external network calls in service logs.
- Check bundle manifest hashes match imported data.
- Run determinism check on imported projections.
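The manifest hash check in the list above can be sketched as follows. The manifest layout is an assumption made for illustration (a JSON object with a "files" array of {"path", "sha256"} entries, paths relative to the manifest); the actual manifest written by the export command may differ, in which case stella vex consensus verify remains the authoritative check.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path) -> str:
    """Stream a file through SHA-256 so large bundles fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(manifest_path) -> list:
    """Return manifest entries that are missing or whose hash differs."""
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    mismatches = []
    # ASSUMPTION: entries look like {"path": "...", "sha256": "..."}.
    for entry in manifest["files"]:
        target = manifest_path.parent / entry["path"]
        if not target.is_file() or sha256_of(target) != entry["sha256"]:
            mismatches.append(entry["path"])
    return mismatches
```

An empty list means every listed file is present and matches its recorded hash.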
7. Troubleshooting
7.1 High conflict rates
Symptoms: vexlens.consensus.conflicts_total spiking.
Investigation:
- Check conflict breakdown by reason in dashboard.
- Identify issuers with conflicting statements.
- Review issuer trust tiers and weights.
Resolution:
- Adjust ConflictThreshold if the disagreements are legitimate.
- Update issuer trust tiers based on authority.
- Contact issuer owners to resolve source conflicts.
7.2 Normalization failures
Symptoms: vexlens.normalization.errors_total increasing.
Investigation:
- Check error logs for specific format failures.
- Identify malformed documents in input stream.
- Validate document against format schema.
Resolution:
- Set StrictMode: false for lenient parsing.
- Report malformed documents to source issuers.
- Update normalizers if format specification changed.
7.3 Low consensus confidence
Symptoms: Average confidence below 0.5.
Investigation:
- Check issuer coverage for affected vulnerabilities.
- Review trust weight distribution.
- Identify missing or untrusted issuers.
Resolution:
- Register additional trusted issuers.
- Adjust trust tier assignments.
- Import offline bundles from authoritative sources.
7.4 Projection storage growth
Symptoms: Storage usage increasing beyond capacity.
Investigation:
- Check MaxHistoryEntries setting.
- Review projection count and history depth.
- Identify high-churn vulnerability/product pairs.
Resolution:
- Reduce MaxHistoryEntries.
- Implement history pruning job.
- Archive old projections to cold storage.
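The pruning and archiving steps above amount to keeping the newest MaxHistoryEntries records per projection and moving the rest elsewhere. A minimal in-memory sketch follows; the `computed_at` field and the `prune_history` function are hypothetical, and timestamps are assumed to be ISO-8601 strings in one consistent format so that they sort lexicographically. A real job would run against the history collection rather than a Python list.

```python
def prune_history(entries, max_entries=100):
    """Split history into (kept, archived), newest entries first."""
    ordered = sorted(entries, key=lambda e: e["computed_at"], reverse=True)
    return ordered[:max_entries], ordered[max_entries:]


if __name__ == "__main__":
    history = [{"computed_at": f"2025-01-{day:02d}T00:00:00Z"} for day in range(1, 6)]
    kept, archived = prune_history(history, max_entries=2)
    print(len(kept), len(archived))
```

Returning the archived entries instead of discarding them lets the caller write them to cold storage before deleting anything.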
8. Recovery procedures
8.1 Storage failover
- Stop VexLens service instances.
- Switch storage connection to replica.
- Verify connectivity with health check.
- Restart service instances.
- Monitor for consensus recomputation.
8.2 Issuer directory sync
- Export current issuer registry backup.
- Resync from authoritative issuer directory source.
- Verify issuer fingerprints and trust tiers.
- Restart VexLens to reload issuer cache.
8.3 Consensus recomputation
- Trigger recomputation for affected vulnerability/product pairs.
- Monitor recomputation progress in logs.
- Verify consensus outcomes match expected state.
- Emit status change events if outcomes differ.
9. Evidence locations
- Sprint tracker: docs/implplan/SPRINT_0129_0001_0001_policy_reasoning.md
- Module docs: docs/modules/vex-lens/
- Source code: src/VexLens/StellaOps.VexLens/
- Dashboard stub: docs/modules/vex-lens/runbooks/dashboards/vex-lens-observability.json