# VexLens Operations Runbook

> VexLens provides VEX consensus computation across multiple issuer sources. This runbook covers deployment, configuration, operations, and troubleshooting.

## 1. Service scope

VexLens computes deterministic consensus over VEX (Vulnerability Exploitability eXchange) statements from multiple issuers. Operations owns:

- Consensus engine scaling, projection storage, and event bus connectivity.
- Monitoring and alerts for consensus latency, conflict rates, and trust weight anomalies.
- Runbook execution for recovery, offline bundle import, and issuer trust management.
- Coordination with Policy Engine and Vuln Explorer consumers.

Related documentation:

- `docs/modules/vex-lens/README.md`
- `docs/modules/vex-lens/architecture.md`
- `docs/modules/vex-lens/implementation_plan.md`
- `docs/modules/vex-lens/runbooks/observability.md`

## 2. Contacts & tooling

| Area | Owner(s) | Escalation |
|------|----------|------------|
| VexLens service | VEX Lens Guild | `#vex-lens-ops`, on-call rotation |
| Issuer Directory | Issuer Directory Guild | `#issuer-directory` |
| Policy Engine integration | Policy Guild | `#policy-engine` |
| Offline Kit | Offline Kit Guild | `#offline-kit` |

Primary tooling:

- `stella vex consensus` CLI (query, export, verify).
- VexLens API (`/api/v1/vex/consensus/*`) for automation.
- Grafana dashboards (`VEX Lens / Consensus Health`, `VEX Lens / Conflicts`).
- Alertmanager routes (`VexLens.ConsensusLatency`, `VexLens.Conflicts`).

## 3. Configuration

### 3.1 Options reference

Configure via `vexlens.yaml` or environment variables with the `VEXLENS_` prefix:

```yaml
VexLens:
  Storage:
    Driver: mongo # "memory" for testing, "mongo" for production
    ConnectionString: "mongodb://..."
    Database: stellaops
    ProjectionsCollection: vex_consensus
    HistoryCollection: vex_consensus_history
    MaxHistoryEntries: 100
    CommandTimeoutSeconds: 30

  Trust:
    AuthoritativeWeight: 1.0
    TrustedWeight: 0.8
    KnownWeight: 0.5
    UnknownWeight: 0.3
    UntrustedWeight: 0.1
    SignedMultiplier: 1.2
    FreshnessDecayDays: 30
    MinFreshnessFactor: 0.5
    JustifiedNotAffectedBoost: 1.1
    FixedStatusBoost: 1.05

  Consensus:
    DefaultMode: WeightedVote # HighestWeight, WeightedVote, Lattice, AuthoritativeFirst
    MinimumWeightThreshold: 0.1
    ConflictThreshold: 0.3
    RequireJustificationForNotAffected: false
    MaxStatementsPerComputation: 100
    EnableConflictDetection: true
    EmitEvents: true

  Normalization:
    EnabledFormats:
      - OpenVEX
      - CSAF
      - CycloneDX
    StrictMode: false
    MaxDocumentSizeBytes: 10485760 # 10 MB
    MaxStatementsPerDocument: 10000

  AirGap:
    SealedMode: false
    BundlePath: /var/lib/stellaops/vex-bundles
    VerifyBundleSignatures: true
    AllowedBundleSources: []
    ExportFormat: jsonl

  Telemetry:
    MetricsEnabled: true
    TracingEnabled: true
    MeterName: StellaOps.VexLens
    ActivitySourceName: StellaOps.VexLens
```

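This runbook does not spell out how the `Trust` options combine into a single weight. The Python sketch below shows one plausible reading, purely as an assumption: the tier weight is multiplied by `SignedMultiplier` for signed documents and by a freshness factor that decays linearly over `FreshnessDecayDays` down to `MinFreshnessFactor`. The function names, the linear decay and the cap at 1.0 are hypothetical and not taken from the VexLens source; the status boosts are left out.

```python
# Values copied from the Trust section of the sample configuration.
TIER_WEIGHTS = {
    "authoritative": 1.0,
    "trusted": 0.8,
    "known": 0.5,
    "unknown": 0.3,
    "untrusted": 0.1,
}
SIGNED_MULTIPLIER = 1.2
FRESHNESS_DECAY_DAYS = 30
MIN_FRESHNESS_FACTOR = 0.5


def freshness_factor(age_days: float) -> float:
    """Assumed linear decay from 1.0 at age 0 to the floor at FRESHNESS_DECAY_DAYS."""
    if age_days <= 0:
        return 1.0
    decayed = 1.0 - (1.0 - MIN_FRESHNESS_FACTOR) * (age_days / FRESHNESS_DECAY_DAYS)
    return max(MIN_FRESHNESS_FACTOR, decayed)


def trust_weight(tier: str, signed: bool, age_days: float) -> float:
    """Hypothetical combination of tier weight, signature bonus and freshness."""
    weight = TIER_WEIGHTS[tier]
    if signed:
        weight *= SIGNED_MULTIPLIER
    weight *= freshness_factor(age_days)
    return min(weight, 1.0)  # assumed cap so weights stay comparable
```

Under these assumptions a signed statement from a trusted issuer published today weighs 0.8 × 1.2 = 0.96, while an unsigned statement from a known issuer that is 60 days old weighs 0.5 × 0.5 = 0.25.
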
### 3.2 Environment variable overrides

```bash
VEXLENS_STORAGE__DRIVER=mongo
VEXLENS_STORAGE__CONNECTIONSTRING=mongodb://localhost:27017
VEXLENS_CONSENSUS__DEFAULTMODE=WeightedVote
VEXLENS_AIRGAP__SEALEDMODE=true
```

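The double underscore in these names separates configuration sections, which is the usual .NET configuration convention; that VexLens follows it is inferred from the examples rather than stated here. A small Python sketch of the mapping, with a hypothetical helper name:

```python
def env_to_config(environ: dict[str, str], prefix: str = "VEXLENS_") -> dict:
    """Turn PREFIX_SECTION__KEY=value pairs into a nested dict (keys lower-cased)."""
    config: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        path = name[len(prefix):].lower().split("__")
        node = config
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return config


overrides = env_to_config({
    "VEXLENS_STORAGE__DRIVER": "mongo",
    "VEXLENS_AIRGAP__SEALEDMODE": "true",
    "PATH": "/usr/bin",  # ignored: no VEXLENS_ prefix
})
print(overrides)
# {'storage': {'driver': 'mongo'}, 'airgap': {'sealedmode': 'true'}}
```

Values stay strings in this sketch; a real configuration binder would also convert them to booleans and numbers.
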
### 3.3 Consensus mode selection

| Mode | Use case |
|------|----------|
| HighestWeight | Single authoritative source preferred |
| WeightedVote | Democratic consensus from multiple sources |
| Lattice | Formal lattice join (most conservative) |
| AuthoritativeFirst | Short-circuit on authoritative issuer |

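To make the difference between two of these modes concrete, here is a minimal Python sketch of `HighestWeight` and `WeightedVote` as the table describes them. It is an illustration under assumptions, not the VexLens implementation: the function names, the tuple layout and the tie-breaking rule are hypothetical, and the weights are made up.

```python
from collections import defaultdict

# Each statement is (status, trust_weight); the values are made up for illustration.
statements = [
    ("not_affected", 0.96),
    ("affected", 0.50),
    ("affected", 0.50),
]


def highest_weight(stmts):
    """HighestWeight: the single heaviest statement decides."""
    return max(stmts, key=lambda s: s[1])[0]


def weighted_vote(stmts, minimum_weight=0.1):
    """WeightedVote: sum weights per status, ignoring statements below the threshold."""
    totals = defaultdict(float)
    for status, weight in stmts:
        if weight >= minimum_weight:
            totals[status] += weight
    # Iterate statuses in sorted order so ties resolve the same way on every run.
    return max(sorted(totals), key=lambda status: totals[status])


print(highest_weight(statements))  # not_affected
print(weighted_vote(statements))   # affected (1.00 vs 0.96)
```

The same three statements produce different outcomes: one strong issuer wins under `HighestWeight`, while two weaker issuers that agree outvote it under `WeightedVote`.
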
## 4. Monitoring & SLOs

Key metrics (exposed by `VexLensMetrics`):

| Metric | SLO / Alert | Notes |
|--------|-------------|-------|
| `vexlens.consensus.duration_seconds` | p95 < 500ms | Per-computation latency |
| `vexlens.consensus.conflicts_total` | Monitor trend | Conflicts by reason |
| `vexlens.consensus.confidence` | avg > 0.7 | Low confidence indicates issuer gaps |
| `vexlens.normalization.duration_seconds` | p95 < 200ms | Per-document normalization |
| `vexlens.normalization.errors_total` | Alert on spike | By format |
| `vexlens.trust.weight_value` | Distribution | Trust weight distribution |
| `vexlens.projection.query_duration_seconds` | p95 < 100ms | Projection lookups |

Dashboards must include:

- Consensus computation rate by mode and outcome.
- Conflict breakdown (status disagreement, weight tie, insufficient data).
- Trust weight distribution by issuer category.
- Normalization success/failure by VEX format.
- Projection query latency and throughput.

Alerts (Alertmanager):

- `VexLensConsensusLatencyHigh` - consensus duration p95 > 500ms for 5 minutes.
- `VexLensConflictSpike` - conflict rate increase > 50% in 10 minutes.
- `VexLensNormalizationFailures` - normalization error rate > 5% for 5 minutes.
- `VexLensLowConfidence` - average confidence < 0.5 for 10 minutes.

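The alert conditions above reduce to simple arithmetic on metric samples. The Python sketch below evaluates two of them offline, for example against exported samples during a post-incident review. The function names are hypothetical, the percentile uses the nearest-rank method (the monitoring stack may interpolate differently), and the "for N minutes" hold time is not modeled.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def latency_alert(durations_seconds, threshold_seconds=0.5):
    """True when p95 consensus duration exceeds the 500 ms threshold."""
    return p95(durations_seconds) > threshold_seconds


def conflict_spike(previous_rate, current_rate, max_increase=0.5):
    """True when the conflict rate grew by more than 50% between two windows."""
    if previous_rate == 0:
        return current_rate > 0
    return (current_rate - previous_rate) / previous_rate > max_increase


print(latency_alert([0.1] * 18 + [0.9] * 2))  # True: p95 is 0.9 s
print(conflict_spike(10, 15))                 # False: exactly 50% is not a spike
```
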
## 5. Routine operations

### 5.1 Daily checklist

- Review dashboard for consensus latency and conflict rates.
- Check normalization error logs for malformed VEX documents.
- Verify projection storage growth is within capacity thresholds.
- Review trust weight distribution for anomalies.
- Scan logs for `issuer_not_found` or `signature_verification_failed`.

### 5.2 Weekly tasks

- Review issuer directory for new registrations or revocations.
- Audit conflict queue for persistent disagreements.
- Test consensus determinism with sample documents.
- Verify Policy Engine and Vuln Explorer integrations are functional.

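One way to approach the determinism test is to feed the same statements in different orders and compare a digest of the canonicalized result. The Python sketch below does that for any callable; `compute` stands in for whatever invokes the consensus engine, and all names here are hypothetical.

```python
import hashlib
import json
import random


def canonical_digest(result: dict) -> str:
    """SHA-256 over JSON with sorted keys, so key order cannot change the digest."""
    payload = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_deterministic(compute, statements, runs=5, seed=0):
    """Re-run `compute` on shuffled copies of the input and compare digests."""
    rng = random.Random(seed)
    baseline = canonical_digest(compute(list(statements)))
    for _ in range(runs):
        shuffled = list(statements)
        rng.shuffle(shuffled)
        if canonical_digest(compute(shuffled)) != baseline:
            return False
    return True


def sample_compute(stmts):
    # Stand-in for a real consensus call: picks the heaviest statement,
    # breaking ties by status name so input order does not matter.
    status, weight = max(stmts, key=lambda s: (s[1], s[0]))
    return {"status": status, "weight": weight}


sample = [("affected", 0.5), ("fixed", 0.5), ("not_affected", 0.3)]
print(is_deterministic(sample_compute, sample))  # True
```
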
### 5.3 Monthly tasks

- Review and tune trust weights based on issuer performance.
- Archive old projection history beyond retention period.
- Update issuer trust tiers based on incident history.
- Test offline bundle import/export workflow.

## 6. Offline operations

### 6.1 Bundle export

```bash
# Export consensus projections to offline bundle
stella vex consensus export \
  --format jsonl \
  --output /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
  --manifest /var/lib/stellaops/vex-bundles/manifest.json \
  --sign

# Verify bundle integrity
stella vex consensus verify \
  --bundle /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
  --manifest /var/lib/stellaops/vex-bundles/manifest.json
```

### 6.2 Bundle import (air-gapped)

```bash
# Enable sealed mode
export VEXLENS_AIRGAP__SEALEDMODE=true
export VEXLENS_AIRGAP__BUNDLEPATH=/var/lib/stellaops/vex-bundles

# Import bundle
stella vex consensus import \
  --bundle /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
  --verify-signatures

# Verify import
stella vex consensus status
```

### 6.3 Air-gap verification

1. Confirm `VEXLENS_AIRGAP__SEALEDMODE=true` is set in the environment.
2. Verify that service logs show no external network calls.
3. Check that bundle manifest hashes match the imported data.
4. Run a determinism check on the imported projections.

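The manifest hash check in step 3 can also be done by hand when the CLI is unavailable. The Python sketch below recomputes SHA-256 digests and compares them with the manifest. The manifest layout used here (`{"files": [{"name": ..., "sha256": ...}]}`) is a hypothetical example, not the documented VexLens manifest format; adapt the field names to the real file.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream the file in chunks so large bundles are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(manifest_path: Path) -> list[str]:
    """Return the names of bundle files whose hash differs from the manifest.

    Assumed (hypothetical) layout: {"files": [{"name": ..., "sha256": ...}]},
    with names relative to the manifest's directory.
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    base = manifest_path.parent
    return [
        entry["name"]
        for entry in manifest["files"]
        if sha256_file(base / entry["name"]) != entry["sha256"]
    ]
```

An empty list means every listed file matches its recorded digest; any name in the result should block the import until the bundle is re-exported or re-transferred.
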
## 7. Troubleshooting

### 7.1 High conflict rates

**Symptoms:** `vexlens.consensus.conflicts_total` spiking.

**Investigation:**
1. Check conflict breakdown by reason in dashboard.
2. Identify issuers with conflicting statements.
3. Review issuer trust tiers and weights.

**Resolution:**
- Adjust `ConflictThreshold` if the disagreements are legitimate.
- Update issuer trust tiers based on authority.
- Contact issuer owners to resolve source conflicts.

### 7.2 Normalization failures

**Symptoms:** `vexlens.normalization.errors_total` increasing.

**Investigation:**
1. Check error logs for specific format failures.
2. Identify malformed documents in input stream.
3. Validate document against format schema.

**Resolution:**
- Set `StrictMode: false` for lenient parsing.
- Report malformed documents to source issuers.
- Update normalizers if the format specification changed.

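Before reporting a document to its issuer, a quick pre-flight check can tell a malformed document apart from one that merely exceeds the configured limits. The Python sketch below applies the `MaxDocumentSizeBytes` and `MaxStatementsPerDocument` values from section 3.1 to an OpenVEX JSON document, whose statements live in a top-level `statements` array. The function name and messages are hypothetical, and this does not replace full schema validation.

```python
import json

MAX_DOCUMENT_SIZE_BYTES = 10_485_760   # MaxDocumentSizeBytes (10 MB)
MAX_STATEMENTS_PER_DOCUMENT = 10_000   # MaxStatementsPerDocument


def preflight(raw: bytes) -> list[str]:
    """Return a list of problems found in a raw OpenVEX JSON document."""
    if len(raw) > MAX_DOCUMENT_SIZE_BYTES:
        return ["document exceeds MaxDocumentSizeBytes"]
    try:
        document = json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(document, dict):
        return ["top-level JSON value is not an object"]
    statements = document.get("statements")
    if not isinstance(statements, list):
        return ["missing 'statements' array"]
    if len(statements) > MAX_STATEMENTS_PER_DOCUMENT:
        return ["document exceeds MaxStatementsPerDocument"]
    return []
```
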
### 7.3 Low consensus confidence

**Symptoms:** Average confidence below 0.5.

**Investigation:**
1. Check issuer coverage for affected vulnerabilities.
2. Review trust weight distribution.
3. Identify missing or untrusted issuers.

**Resolution:**
- Register additional trusted issuers.
- Adjust trust tier assignments.
- Import offline bundles from authoritative sources.

### 7.4 Projection storage growth

**Symptoms:** Storage usage increasing beyond capacity.

**Investigation:**
1. Check `MaxHistoryEntries` setting.
2. Review projection count and history depth.
3. Identify high-churn vulnerability/product pairs.

**Resolution:**
- Reduce `MaxHistoryEntries`.
- Implement a history pruning job.
- Archive old projections to cold storage.

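The core of a pruning job is small: keep the newest `MaxHistoryEntries` records per projection and hand the rest to the archive step. A minimal Python sketch, assuming each history entry carries a sortable timestamp field (named `computed_at` here, which is a hypothetical name):

```python
def prune_history(entries, max_entries=100):
    """Keep the newest `max_entries` items; return (kept, archived).

    Each entry is a dict with a sortable "computed_at" field (hypothetical name).
    """
    ordered = sorted(entries, key=lambda e: e["computed_at"], reverse=True)
    return ordered[:max_entries], ordered[max_entries:]


history = [{"computed_at": day} for day in range(5)]
kept, archived = prune_history(history, max_entries=3)
print([e["computed_at"] for e in kept])      # [4, 3, 2]
print([e["computed_at"] for e in archived])  # [1, 0]
```

Returning the archived entries instead of deleting them lets the caller write them to cold storage before removing them from the history collection.
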
## 8. Recovery procedures

### 8.1 Storage failover

1. Stop VexLens service instances.
2. Switch storage connection to replica.
3. Verify connectivity with health check.
4. Restart service instances.
5. Monitor for consensus recomputation.

### 8.2 Issuer directory sync

1. Export current issuer registry backup.
2. Resync from authoritative issuer directory source.
3. Verify issuer fingerprints and trust tiers.
4. Restart VexLens to reload issuer cache.

### 8.3 Consensus recomputation

1. Trigger recomputation for affected vulnerability/product pairs.
2. Monitor recomputation progress in logs.
3. Verify consensus outcomes match expected state.
4. Emit status change events if outcomes differ.

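Steps 3 and 4 amount to diffing the outcomes recorded before and after recomputation. The Python sketch below lists every vulnerability/product pair whose status changed, which is the set that would need a status change event. The function name, the key shape and the CVE identifiers are made up for illustration.

```python
def changed_outcomes(before: dict, after: dict) -> list[tuple]:
    """List (key, old_status, new_status) for every pair whose outcome changed.

    Keys are (vulnerability_id, product_key) tuples; a missing side is None.
    """
    changes = []
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes.append((key, old, new))
    return changes


before = {("CVE-2024-0001", "pkg:a"): "affected", ("CVE-2024-0002", "pkg:b"): "fixed"}
after = {
    ("CVE-2024-0001", "pkg:a"): "not_affected",
    ("CVE-2024-0002", "pkg:b"): "fixed",
    ("CVE-2024-0003", "pkg:c"): "affected",
}
for change in changed_outcomes(before, after):
    print(change)
```
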
## 9. Evidence locations

- Sprint tracker: `docs/implplan/SPRINT_0129_0001_0001_policy_reasoning.md`
- Module docs: `docs/modules/vex-lens/`
- Source code: `src/VexLens/StellaOps.VexLens/`
- Dashboard stub: `docs/modules/vex-lens/runbooks/dashboards/vex-lens-observability.json`