# VexLens Operations Runbook
> VexLens provides VEX consensus computation across multiple issuer sources. This runbook covers deployment, configuration, operations, and troubleshooting.
## 1. Service scope
VexLens computes deterministic consensus over VEX (Vulnerability Exploitability eXchange) statements from multiple issuers. Operations owns:
- Consensus engine scaling, projection storage, and event bus connectivity.
- Monitoring and alerts for consensus latency, conflict rates, and trust weight anomalies.
- Runbook execution for recovery, offline bundle import, and issuer trust management.
- Coordination with Policy Engine and Vuln Explorer consumers.
Related documentation:
- `docs/modules/vex-lens/README.md`
- `docs/modules/vex-lens/architecture.md`
- `docs/modules/vex-lens/implementation_plan.md`
- `docs/modules/vex-lens/runbooks/observability.md`
## 2. Contacts & tooling
| Area | Owner(s) | Escalation |
|------|----------|------------|
| VexLens service | VEX Lens Guild | `#vex-lens-ops`, on-call rotation |
| Issuer Directory | Issuer Directory Guild | `#issuer-directory` |
| Policy Engine integration | Policy Guild | `#policy-engine` |
| Offline Kit | Offline Kit Guild | `#offline-kit` |
Primary tooling:
- `stella vex consensus` CLI (query, export, verify).
- VexLens API (`/api/v1/vex/consensus/*`) for automation.
- Grafana dashboards (`VEX Lens / Consensus Health`, `VEX Lens / Conflicts`).
- Alertmanager routes (`VexLens.ConsensusLatency`, `VexLens.Conflicts`).
## 3. Configuration
### 3.1 Options reference
Configure VexLens through `vexlens.yaml` or through environment variables with the `VEXLENS_` prefix:
```yaml
VexLens:
  Storage:
    Driver: mongo # "memory" for testing, "mongo" for production
    ConnectionString: "mongodb://..."
    Database: stellaops
    ProjectionsCollection: vex_consensus
    HistoryCollection: vex_consensus_history
    MaxHistoryEntries: 100
    CommandTimeoutSeconds: 30
  Trust:
    AuthoritativeWeight: 1.0
    TrustedWeight: 0.8
    KnownWeight: 0.5
    UnknownWeight: 0.3
    UntrustedWeight: 0.1
    SignedMultiplier: 1.2
    FreshnessDecayDays: 30
    MinFreshnessFactor: 0.5
    JustifiedNotAffectedBoost: 1.1
    FixedStatusBoost: 1.05
  Consensus:
    DefaultMode: WeightedVote # HighestWeight, WeightedVote, Lattice, AuthoritativeFirst
    MinimumWeightThreshold: 0.1
    ConflictThreshold: 0.3
    RequireJustificationForNotAffected: false
    MaxStatementsPerComputation: 100
    EnableConflictDetection: true
    EmitEvents: true
  Normalization:
    EnabledFormats:
      - OpenVEX
      - CSAF
      - CycloneDX
    StrictMode: false
    MaxDocumentSizeBytes: 10485760 # 10 MB
    MaxStatementsPerDocument: 10000
  AirGap:
    SealedMode: false
    BundlePath: /var/lib/stellaops/vex-bundles
    VerifyBundleSignatures: true
    AllowedBundleSources: []
    ExportFormat: jsonl
  Telemetry:
    MetricsEnabled: true
    TracingEnabled: true
    MeterName: StellaOps.VexLens
    ActivitySourceName: StellaOps.VexLens
```
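The `Trust` settings combine into a per-statement weight. The exact formula lives in the VexLens source; the following is a minimal sketch, assuming the factors simply multiply and that freshness decays linearly from 1.0 to `MinFreshnessFactor` over `FreshnessDecayDays`. The function names and the multiplicative model are hypothetical and only meant for reasoning about how one knob shifts the result.

```python
# Values mirror the Trust section of vexlens.yaml above.
TIER_WEIGHTS = {
    "Authoritative": 1.0,
    "Trusted": 0.8,
    "Known": 0.5,
    "Unknown": 0.3,
    "Untrusted": 0.1,
}
SIGNED_MULTIPLIER = 1.2
FRESHNESS_DECAY_DAYS = 30
MIN_FRESHNESS_FACTOR = 0.5
JUSTIFIED_NOT_AFFECTED_BOOST = 1.1
FIXED_STATUS_BOOST = 1.05


def freshness_factor(age_days: float) -> float:
    """Hypothetical linear decay: 1.0 for a new statement, floored at MIN_FRESHNESS_FACTOR."""
    if age_days <= 0:
        return 1.0
    decayed = 1.0 - (1.0 - MIN_FRESHNESS_FACTOR) * age_days / FRESHNESS_DECAY_DAYS
    return max(MIN_FRESHNESS_FACTOR, decayed)


def effective_weight(tier: str, signed: bool, age_days: float,
                     status: str, justified: bool = False) -> float:
    """Assumed model: tier weight x signature multiplier x freshness x status boost."""
    weight = TIER_WEIGHTS[tier]
    if signed:
        weight *= SIGNED_MULTIPLIER
    weight *= freshness_factor(age_days)
    if status == "not_affected" and justified:
        weight *= JUSTIFIED_NOT_AFFECTED_BOOST
    elif status == "fixed":
        weight *= FIXED_STATUS_BOOST
    return weight


if __name__ == "__main__":
    # Signed, 15-day-old, justified not_affected statement from a Trusted issuer.
    print(round(effective_weight("Trusted", True, 15, "not_affected", justified=True), 3))  # prints 0.792
```

Under this model a fresh, signed Authoritative statement reaches 1.2; whether the real engine caps weights at 1.0 is not stated here.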
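### 3.2 Environment variable overrides
Environment variables take precedence over `vexlens.yaml`. Nested keys are joined with a double underscore (`__`):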
```bash
VEXLENS_STORAGE__DRIVER=mongo
VEXLENS_STORAGE__CONNECTIONSTRING=mongodb://localhost:27017
VEXLENS_CONSENSUS__DEFAULTMODE=WeightedVote
VEXLENS_AIRGAP__SEALEDMODE=true
```
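The variable names above follow the standard .NET configuration convention, where `__` stands in for the `:` section separator and keys are case-insensitive; the `VEXLENS_` prefix appears to map to the `VexLens` section. Assuming that convention, a small helper can translate YAML paths into variable names and list which overrides a host currently sets. The helper names are hypothetical.

```python
import os

PREFIX = "VEXLENS_"
# Substrings of config paths whose values are masked when printed.
SECRET_HINTS = ("CONNECTIONSTRING", "PASSWORD", "SECRET")


def env_var_name(config_path: str) -> str:
    """Map a path relative to the VexLens section (e.g. 'Storage:Driver') to its variable name."""
    return PREFIX + "__".join(part.upper() for part in config_path.split(":"))


def active_overrides(environ: dict) -> list:
    """List (config path, value) pairs for every VEXLENS_* variable, sorted for stable output."""
    pairs = []
    for name, value in environ.items():
        if name.startswith(PREFIX):
            path = ":".join(name[len(PREFIX):].split("__"))
            pairs.append((path, value))
    return sorted(pairs)


if __name__ == "__main__":
    for path, value in active_overrides(dict(os.environ)):
        # Mask values that commonly carry credentials before printing.
        shown = "***" if any(hint in path for hint in SECRET_HINTS) else value
        print(f"{path} = {shown}")
```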
### 3.3 Consensus mode selection
| Mode | Use case |
|------|----------|
| HighestWeight | Single authoritative source preferred |
| WeightedVote | Trust-weighted vote across multiple sources |
| Lattice | Formal lattice join (most conservative) |
| AuthoritativeFirst | Short-circuit on an authoritative issuer's statement when one is present |
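To see how the modes can disagree on the same input, here is a minimal sketch of each one. It is an illustration, not the VexLens engine: the `Statement` shape, the lattice ordering of statuses (`not_affected` < `fixed` < `under_investigation` < `affected`), the `>=` reading of `MinimumWeightThreshold`, and the AuthoritativeFirst fallback to a weighted vote are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical conservativeness order for Lattice mode; the higher rank wins the join.
LATTICE_RANK = {"not_affected": 0, "fixed": 1, "under_investigation": 2, "affected": 3}


@dataclass(frozen=True)
class Statement:
    issuer: str
    tier: str      # "Authoritative", "Trusted", "Known", "Unknown", "Untrusted"
    status: str    # OpenVEX-style status
    weight: float  # effective trust weight


def weighted_vote(statements) -> str:
    totals = defaultdict(float)
    # Sum in a fixed order so floating-point totals do not depend on input order.
    for s in sorted(statements, key=lambda s: (s.issuer, s.status, s.weight)):
        totals[s.status] += s.weight
    # Highest total wins; ties break on status name to stay deterministic.
    return min(totals, key=lambda status: (-totals[status], status))


def consensus(statements, mode: str, min_weight: float = 0.1) -> Optional[str]:
    """Return the consensus status, or None if no statement reaches min_weight."""
    eligible = [s for s in statements if s.weight >= min_weight]
    if not eligible:
        return None
    if mode == "HighestWeight":
        # Ties break on issuer name so the same input always yields the same answer.
        return min(eligible, key=lambda s: (-s.weight, s.issuer)).status
    if mode == "WeightedVote":
        return weighted_vote(eligible)
    if mode == "Lattice":
        return max(eligible, key=lambda s: LATTICE_RANK[s.status]).status
    if mode == "AuthoritativeFirst":
        authoritative = [s for s in eligible if s.tier == "Authoritative"]
        # Assumed fallback: weighted vote over everyone when no authoritative issuer spoke.
        return weighted_vote(authoritative or eligible)
    raise ValueError(f"unknown consensus mode: {mode}")


if __name__ == "__main__":
    statements = [
        Statement("vendor", "Authoritative", "not_affected", 1.0),
        Statement("distro-a", "Trusted", "affected", 0.8),
        Statement("distro-b", "Trusted", "affected", 0.8),
        Statement("scanner", "Unknown", "under_investigation", 0.3),
    ]
    for mode in ("HighestWeight", "WeightedVote", "Lattice", "AuthoritativeFirst"):
        print(mode, consensus(statements, mode))
```

With this input, HighestWeight and AuthoritativeFirst follow the vendor (`not_affected`), while WeightedVote (1.6 vs 1.0) and Lattice both land on `affected`.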
## 4. Monitoring & SLOs
Key metrics (exposed by `VexLensMetrics`):
| Metric | SLO / Alert | Notes |
|--------|-------------|-------|
| `vexlens.consensus.duration_seconds` | p95 < 500ms | Per-computation latency |
| `vexlens.consensus.conflicts_total` | Monitor trend | Conflicts by reason |
| `vexlens.consensus.confidence` | avg > 0.7 | Low confidence indicates issuer gaps |
| `vexlens.normalization.duration_seconds` | p95 < 200ms | Per-document normalization |
| `vexlens.normalization.errors_total` | Alert on spike | By format |
| `vexlens.trust.weight_value` | Monitor for anomalies | Trust weight distribution |
| `vexlens.projection.query_duration_seconds` | p95 < 100ms | Projection lookups |
Dashboards must include:
- Consensus computation rate by mode and outcome.
- Conflict breakdown (status disagreement, weight tie, insufficient data).
- Trust weight distribution by issuer category.
- Normalization success/failure by VEX format.
- Projection query latency and throughput.
Alerts (Alertmanager):
- `VexLensConsensusLatencyHigh` - consensus duration p95 > 500ms for 5 minutes.
- `VexLensConflictSpike` - conflict rate increase > 50% in 10 minutes.
- `VexLensNormalizationFailures` - normalization error rate > 5% for 5 minutes.
- `VexLensLowConfidence` - average confidence < 0.5 for 10 minutes.
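When testing these alert rules offline, for example by replaying exported metric samples, the conditions reduce to threshold arithmetic. The sketch below assumes one sample per minute and treats "for N minutes" as N consecutive breaching samples, which approximates (but is not identical to) Prometheus-style `for:` clauses. The function names and the zero-baseline rule in `conflict_spike` are assumptions.

```python
def error_rate(errors: int, documents: int) -> float:
    """Fraction of documents that failed normalization; 0.0 when nothing was processed."""
    return errors / documents if documents else 0.0


def sustained(samples, predicate, minutes: int) -> bool:
    """True when each of the last `minutes` one-minute samples satisfies the predicate."""
    recent = samples[-minutes:]
    return minutes > 0 and len(recent) == minutes and all(predicate(v) for v in recent)


def conflict_spike(previous_window: int, current_window: int, increase: float = 0.5) -> bool:
    """True when conflicts in the current 10-minute window rose by more than `increase`."""
    if previous_window == 0:
        # Assumption: with no baseline, any new conflicts count as a spike.
        return current_window > 0
    return (current_window - previous_window) / previous_window > increase


# Hypothetical mapping of the four alerts onto these helpers:
#   VexLensConsensusLatencyHigh:  sustained(p95_seconds, lambda p: p > 0.5, 5)
#   VexLensConflictSpike:         conflict_spike(conflicts_prev_10m, conflicts_last_10m)
#   VexLensNormalizationFailures: sustained(error_rates, lambda r: r > 0.05, 5)
#   VexLensLowConfidence:         sustained(avg_confidence, lambda c: c < 0.5, 10)
```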
## 5. Routine operations
### 5.1 Daily checklist
- Review dashboard for consensus latency and conflict rates.
- Check normalization error logs for malformed VEX documents.
- Verify projection storage growth is within capacity thresholds.
- Review trust weight distribution for anomalies.
- Scan logs for `issuer_not_found` or `signature_verification_failed`.
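The last checklist item can be scripted. A minimal sketch that counts the two markers in one or more log files; it assumes the markers appear verbatim in plain-text log lines, and log locations are deployment-specific.

```python
import re
import sys
from collections import Counter

# Markers taken from the daily checklist above.
MARKERS = ("issuer_not_found", "signature_verification_failed")
PATTERN = re.compile(r"\b(" + "|".join(MARKERS) + r")\b")


def scan(lines) -> Counter:
    """Count how many times each marker appears across the given log lines."""
    counts = Counter()
    for line in lines:
        for match in PATTERN.finditer(line):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python scan_logs.py LOGFILE [LOGFILE ...]
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            totals += scan(fh)
    for marker, count in sorted(totals.items()):
        print(f"{marker}: {count}")
```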
### 5.2 Weekly tasks
- Review issuer directory for new registrations or revocations.
- Audit conflict queue for persistent disagreements.
- Test consensus determinism with sample documents.
- Verify Policy Engine and Vuln Explorer integrations are functional.
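The weekly determinism test can be approximated by feeding the same statements in different orders and comparing a hash of a canonical serialization of each result. A minimal sketch, assuming results are JSON-serializable; `stable_vote` and `first_wins` are toy stand-ins rather than VexLens code, and `compute` would wrap however your environment obtains real consensus results.

```python
import hashlib
import json
import random
from collections import defaultdict


def canonical_hash(result) -> str:
    """SHA-256 over canonical JSON (sorted keys, no whitespace) of a consensus result."""
    payload = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_determinism(compute, statements, runs: int = 20, seed: int = 0) -> bool:
    """Recompute over shuffled copies of the input and compare canonical hashes."""
    rng = random.Random(seed)
    expected = canonical_hash(compute(list(statements)))
    for _ in range(runs):
        shuffled = list(statements)
        rng.shuffle(shuffled)
        if canonical_hash(compute(shuffled)) != expected:
            return False
    return True


def stable_vote(statements):
    """Toy weighted vote over (issuer, status, weight) tuples that sorts before summing."""
    totals = defaultdict(float)
    for _issuer, status, weight in sorted(statements):
        totals[status] += weight
    winner = min(totals, key=lambda s: (-totals[s], s))
    return {"status": winner, "totals": dict(totals)}


def first_wins(statements):
    """Deliberately order-dependent: whichever statement arrives first decides."""
    return {"status": statements[0][1]}
```

`stable_vote` passes because it sorts before summing; `first_wins` fails as soon as a shuffle changes the leading statement.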
### 5.3 Monthly tasks
- Review and tune trust weights based on issuer performance.
- Archive old projection history beyond retention period.
- Update issuer trust tiers based on incident history.
- Test offline bundle import/export workflow.
## 6. Offline operations
### 6.1 Bundle export
```bash
# Export consensus projections to offline bundle
stella vex consensus export \
--format jsonl \
--output /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
--manifest /var/lib/stellaops/vex-bundles/manifest.json \
--sign
# Verify bundle integrity
stella vex consensus verify \
--bundle /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
--manifest /var/lib/stellaops/vex-bundles/manifest.json
```
### 6.2 Bundle import (air-gapped)
```bash
# Enable sealed mode
export VEXLENS_AIRGAP__SEALEDMODE=true
export VEXLENS_AIRGAP__BUNDLEPATH=/var/lib/stellaops/vex-bundles
# Import bundle
stella vex consensus import \
--bundle /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
--verify-signatures
# Verify import
stella vex consensus status
```
### 6.3 Air-gap verification
1. Confirm `VEXLENS_AIRGAP__SEALEDMODE=true` in environment.
2. Verify no external network calls in service logs.
3. Check bundle manifest hashes match imported data.
4. Run determinism check on imported projections.
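Step 3 can be scripted. The manifest schema is not documented in this runbook, so the sketch below assumes a hypothetical layout `{"files": [{"path": ..., "sha256": ...}]}` with paths relative to the manifest; adapt the field names to the manifest that `stella vex consensus export --manifest` actually writes. `stella vex consensus verify` remains the supported check.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large bundles need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path) -> list:
    """Return the relative paths that are missing or whose SHA-256 does not match."""
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["files"]:  # hypothetical schema
        target = manifest_path.parent / entry["path"]
        if not target.is_file() or sha256_file(target) != entry["sha256"].lower():
            problems.append(entry["path"])
    return problems
```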
## 7. Troubleshooting
### 7.1 High conflict rates
**Symptoms:** `vexlens.consensus.conflicts_total` spiking.
**Investigation:**
1. Check conflict breakdown by reason in dashboard.
2. Identify issuers with conflicting statements.
3. Review issuer trust tiers and weights.
**Resolution:**
- Adjust `ConflictThreshold` if the conflicts reflect legitimate disagreements between issuers.
- Update issuer trust tiers based on authority.
- Contact issuer owners to resolve source conflicts.
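Before changing `ConflictThreshold`, it helps to know how much weight sits behind the losing statuses. This runbook does not define the conflict rule, so the sketch below assumes one: a conflict is flagged when the share of total weight behind statuses other than the winner exceeds the threshold. The function name and rule are hypothetical.

```python
from collections import defaultdict


def detect_conflict(statements, conflict_threshold: float = 0.3):
    """statements: iterable of (status, weight) pairs.

    Returns (winning status, dissent share, conflict flag), where the dissent share is
    the fraction of total weight behind statuses other than the winner.
    """
    totals = defaultdict(float)
    for status, weight in sorted(statements):
        totals[status] += weight
    total = sum(totals.values())
    if total == 0:
        return None, 0.0, False
    # Highest total wins; ties break on status name.
    winner = min(totals, key=lambda s: (-totals[s], s))
    dissent = (total - totals[winner]) / total
    return winner, dissent, dissent > conflict_threshold
```

Under this rule, a 1.6 vs 1.0 split has a dissent share of about 0.38, so it is flagged at the default 0.3 but not at 0.4. Raising the threshold therefore hides real disagreement, which is why it should follow a review of the issuers rather than replace it.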
### 7.2 Normalization failures
**Symptoms:** `vexlens.normalization.errors_total` increasing.
**Investigation:**
1. Check error logs for specific format failures.
2. Identify malformed documents in input stream.
3. Validate document against format schema.
**Resolution:**
- Keep `StrictMode: false` (the default) to allow lenient parsing.
- Report malformed documents to source issuers.
- Update normalizers if format specification changed.
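To find out which inputs fail and why, a quick pre-check can separate oversized files, invalid JSON, and unrecognized formats before documents reach the normalizers. The sketch relies on well-known top-level markers of each format: an OpenVEX `@context` under `https://openvex.dev/ns`, CSAF `document.csaf_version`, and CycloneDX `bomFormat`. It handles JSON only (CycloneDX XML is out of scope) and does not replace schema validation. The size limit mirrors `MaxDocumentSizeBytes`.

```python
import json

# Mirrors Normalization.MaxDocumentSizeBytes (10485760) in vexlens.yaml.
MAX_DOCUMENT_SIZE_BYTES = 10_485_760


def triage(raw: bytes):
    """Return (detected format or None, reason) for a raw VEX document."""
    if len(raw) > MAX_DOCUMENT_SIZE_BYTES:
        return None, "document exceeds MaxDocumentSizeBytes"
    try:
        doc = json.loads(raw)
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        return None, f"invalid JSON: {exc}"
    if not isinstance(doc, dict):
        return None, "top-level JSON value is not an object"
    context = doc.get("@context")
    if isinstance(context, str) and context.startswith("https://openvex.dev/ns"):
        return "OpenVEX", "ok"
    document = doc.get("document")
    if isinstance(document, dict) and "csaf_version" in document:
        return "CSAF", "ok"
    if doc.get("bomFormat") == "CycloneDX":
        return "CycloneDX", "ok"
    return None, "no OpenVEX, CSAF, or CycloneDX marker found"
```

Running it over the documents named in the error logs shows whether failures cluster on one format or one issuer.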
### 7.3 Low consensus confidence
**Symptoms:** Average confidence below 0.5.
**Investigation:**
1. Check issuer coverage for affected vulnerabilities.
2. Review trust weight distribution.
3. Identify missing or untrusted issuers.
**Resolution:**
- Register additional trusted issuers.
- Adjust trust tier assignments.
- Import offline bundles from authoritative sources.
### 7.4 Projection storage growth
**Symptoms:** Storage usage increasing beyond capacity.
**Investigation:**
1. Check `MaxHistoryEntries` setting.
2. Review projection count and history depth.
3. Identify high-churn vulnerability/product pairs.
**Resolution:**
- Reduce `MaxHistoryEntries`.
- Implement history pruning job.
- Archive old projections to cold storage.
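The core of a pruning job is a per-pair "keep the newest N" split, which also yields the records to archive to cold storage. A minimal in-memory sketch, assuming each history record carries a vulnerability ID, a product key, and an ISO-8601 UTC timestamp in one uniform format so that string order matches time order. The field names are hypothetical; a production job could apply the same rule to `vex_consensus_history`.

```python
from collections import defaultdict
from typing import NamedTuple


class HistoryEntry(NamedTuple):
    vulnerability_id: str
    product_key: str
    computed_at: str  # ISO-8601 UTC, uniform format, so lexicographic order == time order
    status: str


def split_history(entries, max_history_entries: int = 100):
    """Return (keep, archive) so each pair keeps only its newest max_history_entries records."""
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry.vulnerability_id, entry.product_key)].append(entry)
    keep, archive = [], []
    for key in sorted(groups):
        ordered = sorted(groups[key], key=lambda e: e.computed_at, reverse=True)
        keep.extend(ordered[:max_history_entries])
        archive.extend(ordered[max_history_entries:])
    return keep, archive
```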
## 8. Recovery procedures
### 8.1 Storage failover
1. Stop VexLens service instances.
2. Switch storage connection to replica.
3. Verify connectivity with health check.
4. Restart service instances.
5. Monitor for consensus recomputation.
### 8.2 Issuer directory sync
1. Export current issuer registry backup.
2. Resync from authoritative issuer directory source.
3. Verify issuer fingerprints and trust tiers.
4. Restart VexLens to reload issuer cache.
### 8.3 Consensus recomputation
1. Trigger recomputation for affected vulnerability/product pairs.
2. Monitor recomputation progress in logs.
3. Verify consensus outcomes match expected state.
4. Emit status change events if outcomes differ.
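Steps 3 and 4 amount to diffing snapshots of consensus outcomes taken before and after recomputation. The sketch assumes both snapshots are available as maps from (vulnerability, product) pairs to status, for example loaded from exports. The record shape is hypothetical and is not the VexLens event schema.

```python
def status_changes(before: dict, after: dict) -> list:
    """Compare {(vulnerability_id, product_key): status} snapshots.

    Returns one change record per pair whose status differs, sorted by key.
    A pair missing from one snapshot is reported with None for that side.
    """
    events = []
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            vulnerability_id, product_key = key
            events.append({
                "vulnerability_id": vulnerability_id,
                "product_key": product_key,
                "previous_status": old,
                "status": new,
            })
    return events
```

An empty result means the recomputation reproduced the previous state, so no events need to be emitted.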
## 9. Evidence locations
- Sprint tracker: `docs/implplan/SPRINT_0129_0001_0001_policy_reasoning.md`
- Module docs: `docs/modules/vex-lens/`
- Source code: `src/VexLens/StellaOps.VexLens/`
- Dashboard stub: `docs/modules/vex-lens/runbooks/dashboards/vex-lens-observability.json`