VexLens Operations Runbook

VexLens provides VEX consensus computation across multiple issuer sources. This runbook covers deployment, configuration, operations, and troubleshooting.

1. Service scope

VexLens computes deterministic consensus over VEX (Vulnerability Exploitability eXchange) statements from multiple issuers. Operations owns:

  • Consensus engine scaling, projection storage, and event bus connectivity.
  • Monitoring and alerts for consensus latency, conflict rates, and trust weight anomalies.
  • Runbook execution for recovery, offline bundle import, and issuer trust management.
  • Coordination with Policy Engine and Vuln Explorer consumers.

Related documentation:

  • docs/modules/vex-lens/README.md
  • docs/modules/vex-lens/architecture.md
  • docs/modules/vex-lens/implementation_plan.md
  • docs/modules/vex-lens/runbooks/observability.md

2. Contacts & tooling

| Area | Owner(s) | Escalation |
|------|----------|------------|
| VexLens service | VEX Lens Guild | #vex-lens-ops, on-call rotation |
| Issuer Directory | Issuer Directory Guild | #issuer-directory |
| Policy Engine integration | Policy Guild | #policy-engine |
| Offline Kit | Offline Kit Guild | #offline-kit |

Primary tooling:

  • stella vex consensus CLI (query, export, verify).
  • VexLens API (/api/v1/vex/consensus/*) for automation.
  • Grafana dashboards (VEX Lens / Consensus Health, VEX Lens / Conflicts).
  • Alertmanager routes (VexLens.ConsensusLatency, VexLens.Conflicts).

3. Configuration

3.1 Options reference

Configure VexLens via vexlens.yaml or environment variables with the VEXLENS_ prefix:

VexLens:
  Storage:
    Driver: mongo              # "memory" for testing, "mongo" for production
    ConnectionString: "mongodb://..."
    Database: stellaops
    ProjectionsCollection: vex_consensus
    HistoryCollection: vex_consensus_history
    MaxHistoryEntries: 100
    CommandTimeoutSeconds: 30

  Trust:
    AuthoritativeWeight: 1.0
    TrustedWeight: 0.8
    KnownWeight: 0.5
    UnknownWeight: 0.3
    UntrustedWeight: 0.1
    SignedMultiplier: 1.2
    FreshnessDecayDays: 30
    MinFreshnessFactor: 0.5
    JustifiedNotAffectedBoost: 1.1
    FixedStatusBoost: 1.05

  Consensus:
    DefaultMode: WeightedVote  # HighestWeight, WeightedVote, Lattice, AuthoritativeFirst
    MinimumWeightThreshold: 0.1
    ConflictThreshold: 0.3
    RequireJustificationForNotAffected: false
    MaxStatementsPerComputation: 100
    EnableConflictDetection: true
    EmitEvents: true

  Normalization:
    EnabledFormats:
      - OpenVEX
      - CSAF
      - CycloneDX
    StrictMode: false
    MaxDocumentSizeBytes: 10485760   # 10 MB
    MaxStatementsPerDocument: 10000

  AirGap:
    SealedMode: false
    BundlePath: /var/lib/stellaops/vex-bundles
    VerifyBundleSignatures: true
    AllowedBundleSources: []
    ExportFormat: jsonl

  Telemetry:
    MetricsEnabled: true
    TracingEnabled: true
    MeterName: StellaOps.VexLens
    ActivitySourceName: StellaOps.VexLens
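
As a rough illustration of how the Trust options above might interact, the sketch below multiplies a tier weight by the signature, freshness, and status factors. The multiplicative combination, the linear freshness decay, and the names `Statement`, `freshness_factor`, and `effective_weight` are assumptions made for illustration only; the formula the VexLens trust engine actually applies may differ.

```python
from dataclasses import dataclass

# Values copied from the Trust section of the options reference.
TIER_WEIGHTS = {
    "authoritative": 1.0,
    "trusted": 0.8,
    "known": 0.5,
    "unknown": 0.3,
    "untrusted": 0.1,
}
SIGNED_MULTIPLIER = 1.2
FRESHNESS_DECAY_DAYS = 30
MIN_FRESHNESS_FACTOR = 0.5
JUSTIFIED_NOT_AFFECTED_BOOST = 1.1
FIXED_STATUS_BOOST = 1.05


@dataclass(frozen=True)
class Statement:
    """Hypothetical, minimal view of a normalized VEX statement."""
    tier: str
    status: str          # e.g. "not_affected", "fixed", "affected"
    signed: bool
    age_days: float
    justified: bool = False


def freshness_factor(age_days: float) -> float:
    """Assumed linear decay from 1.0 down to MIN_FRESHNESS_FACTOR."""
    if age_days <= 0:
        return 1.0
    progress = min(age_days / FRESHNESS_DECAY_DAYS, 1.0)
    return 1.0 - progress * (1.0 - MIN_FRESHNESS_FACTOR)


def effective_weight(s: Statement) -> float:
    """Assumed combination: tier weight times each applicable factor."""
    weight = TIER_WEIGHTS[s.tier]
    if s.signed:
        weight *= SIGNED_MULTIPLIER
    weight *= freshness_factor(s.age_days)
    if s.status == "not_affected" and s.justified:
        weight *= JUSTIFIED_NOT_AFFECTED_BOOST
    elif s.status == "fixed":
        weight *= FIXED_STATUS_BOOST
    return weight
```

Under these assumptions, a signed statement from a trusted issuer that is 15 days old would weigh 0.8 × 1.2 × 0.75 = 0.72, which gives a feel for how much a change to SignedMultiplier or FreshnessDecayDays could shift results before you tune the values.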

3.2 Environment variable overrides

VEXLENS_STORAGE__DRIVER=mongo
VEXLENS_STORAGE__CONNECTIONSTRING=mongodb://localhost:27017
VEXLENS_CONSENSUS__DEFAULTMODE=WeightedVote
VEXLENS_AIRGAP__SEALEDMODE=true
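
The examples above follow the pattern prefix, section, double underscore, option. When auditing a host, it can help to list which overrides are active. The helper below is a minimal sketch of that idea; `env_overrides` is a hypothetical name, and the assumption that `__` separates nesting levels is inferred from the examples rather than taken from the VexLens source.

```python
import os
from typing import Dict, Mapping, Optional

PREFIX = "VEXLENS_"


def env_overrides(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Map VEXLENS_SECTION__KEY variables to "SECTION.KEY" paths.

    Names are kept exactly as set (upper-case in the examples); no attempt
    is made to match them back to the casing used in vexlens.yaml.
    """
    source = os.environ if environ is None else environ
    overrides: Dict[str, str] = {}
    for name, value in source.items():
        if not name.startswith(PREFIX):
            continue
        path = name[len(PREFIX):].split("__")
        overrides[".".join(path)] = value
    return overrides
```

Calling `env_overrides()` with no argument inspects the current process environment; passing a dictionary makes the helper easy to test.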

3.3 Consensus mode selection

| Mode | Use case |
|------|----------|
| HighestWeight | Single authoritative source preferred |
| WeightedVote | Democratic consensus from multiple sources |
| Lattice | Formal lattice join (most conservative) |
| AuthoritativeFirst | Short-circuit on authoritative issuer |
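
To make the difference between HighestWeight and WeightedVote concrete, the sketch below applies both to the same set of statements. It is a simplified model written for this runbook, not the VexLens implementation: the function names are hypothetical, and thresholds, tie-breaking, and conflict detection are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (status, weight) pairs; weights would come from the trust configuration.
Statements = List[Tuple[str, float]]


def highest_weight(statements: Statements) -> str:
    """Return the status of the single heaviest statement."""
    return max(statements, key=lambda s: s[1])[0]


def weighted_vote(statements: Statements) -> Tuple[str, float]:
    """Sum weights per status; return the winner and its share of the total."""
    totals: Dict[str, float] = defaultdict(float)
    for status, weight in statements:
        totals[status] += weight
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())


sample: Statements = [
    ("affected", 1.0),
    ("not_affected", 0.8),
    ("not_affected", 0.5),
]
```

With `sample`, `highest_weight` selects "affected" because one statement carries the largest weight, while `weighted_vote` selects "not_affected" because the two lighter statements together outweigh it. A production engine would also need an explicit, order-independent tie-break to stay deterministic.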

4. Monitoring & SLOs

Key metrics (exposed by VexLensMetrics):

| Metric | SLO / Alert | Notes |
|--------|-------------|-------|
| vexlens.consensus.duration_seconds | p95 < 500ms | Per-computation latency |
| vexlens.consensus.conflicts_total | Monitor trend | Conflicts by reason |
| vexlens.consensus.confidence | avg > 0.7 | Low confidence indicates issuer gaps |
| vexlens.normalization.duration_seconds | p95 < 200ms | Per-document normalization |
| vexlens.normalization.errors_total | Alert on spike | By format |
| vexlens.trust.weight_value | Distribution | Trust weight distribution |
| vexlens.projection.query_duration_seconds | p95 < 100ms | Projection lookups |

Dashboards must include:

  • Consensus computation rate by mode and outcome.
  • Conflict breakdown (status disagreement, weight tie, insufficient data).
  • Trust weight distribution by issuer category.
  • Normalization success/failure by VEX format.
  • Projection query latency and throughput.

Alerts (Alertmanager):

  • VexLensConsensusLatencyHigh - consensus duration p95 > 500ms for 5 minutes.
  • VexLensConflictSpike - conflict rate increase > 50% in 10 minutes.
  • VexLensNormalizationFailures - normalization error rate > 5% for 5 minutes.
  • VexLensLowConfidence - average confidence < 0.5 for 10 minutes.
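
The latency and conflict conditions above can be checked offline against exported samples, for example when validating a threshold change. The sketch below is a hedged illustration: the function names are hypothetical, the percentile uses the nearest-rank method (the monitoring backend may interpolate differently), and the "for 5 minutes" hold period is not modelled.

```python
from typing import Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample."""
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_breached(durations_seconds: Sequence[float],
                     limit_seconds: float = 0.5) -> bool:
    """True when p95 exceeds the limit (500ms expressed in seconds)."""
    return p95(durations_seconds) > limit_seconds


def conflict_spike(previous_rate: float, current_rate: float,
                   max_increase: float = 0.5) -> bool:
    """True when the rate grew by more than max_increase (50% by default)."""
    if previous_rate <= 0:
        return current_rate > 0
    return (current_rate - previous_rate) / previous_rate > max_increase
```

For instance, a conflict rate moving from 10 to 16 per window is a 60% increase and would count as a spike, whereas 10 to 15 is exactly 50% and would not.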

5. Routine operations

5.1 Daily checklist

  • Review dashboard for consensus latency and conflict rates.
  • Check normalization error logs for malformed VEX documents.
  • Verify projection storage growth is within capacity thresholds.
  • Review trust weight distribution for anomalies.
  • Scan logs for issuer_not_found or signature_verification_failed.
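
The log scan in the last item can be scripted. The helper below is a minimal sketch that counts lines containing either marker; `count_markers` is a hypothetical name, and where the service logs live depends on your deployment.

```python
from collections import Counter
from typing import Iterable

MARKERS = ("issuer_not_found", "signature_verification_failed")


def count_markers(lines: Iterable[str]) -> Counter:
    """Count log lines that mention each marker string."""
    counts = Counter({marker: 0 for marker in MARKERS})
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts
```

An open file handle can be passed directly as `lines`, so large log files are read one line at a time.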

5.2 Weekly tasks

  • Review issuer directory for new registrations or revocations.
  • Audit conflict queue for persistent disagreements.
  • Test consensus determinism with sample documents.
  • Verify Policy Engine and Vuln Explorer integrations are functional.
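
One way to approach the determinism test is to feed the same statements in different orders and compare a digest of the canonicalized result. The sketch below assumes a hypothetical `compute` callable that wraps whatever consensus entry point you use (CLI, API, or library) and returns JSON-serializable data; it is an illustration of the idea, not an official test harness.

```python
import hashlib
import json
import random
from typing import Any, Callable, List


def canonical_digest(result: Any) -> str:
    """SHA-256 over JSON with sorted keys and fixed separators."""
    payload = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_deterministic(compute: Callable[[List[dict]], Any],
                     statements: List[dict],
                     rounds: int = 5,
                     seed: int = 0) -> bool:
    """Re-run compute on shuffled copies of the input and compare digests."""
    rng = random.Random(seed)
    expected = canonical_digest(compute(list(statements)))
    for _ in range(rounds):
        shuffled = list(statements)
        rng.shuffle(shuffled)
        if canonical_digest(compute(shuffled)) != expected:
            return False
    return True
```

A fixed seed keeps the check itself reproducible, so a failure can be replayed with the same shuffles.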

5.3 Monthly tasks

  • Review and tune trust weights based on issuer performance.
  • Archive old projection history beyond retention period.
  • Update issuer trust tiers based on incident history.
  • Test offline bundle import/export workflow.

6. Offline operations

6.1 Bundle export

# Export consensus projections to offline bundle
stella vex consensus export \
  --format jsonl \
  --output /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
  --manifest /var/lib/stellaops/vex-bundles/manifest.json \
  --sign

# Verify bundle integrity
stella vex consensus verify \
  --bundle /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
  --manifest /var/lib/stellaops/vex-bundles/manifest.json

6.2 Bundle import (air-gapped)

# Enable sealed mode
export VEXLENS_AIRGAP__SEALEDMODE=true
export VEXLENS_AIRGAP__BUNDLEPATH=/var/lib/stellaops/vex-bundles

# Import bundle
stella vex consensus import \
  --bundle /var/lib/stellaops/vex-bundles/consensus-2025-01.jsonl \
  --verify-signatures

# Verify import
stella vex consensus status

6.3 Air-gap verification

  1. Confirm VEXLENS_AIRGAP__SEALEDMODE=true is set in the service environment.
  2. Verify no external network calls in service logs.
  3. Check bundle manifest hashes match imported data.
  4. Run determinism check on imported projections.
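
For step 3, an independent hash comparison can complement `stella vex consensus verify`. The sketch below assumes a hypothetical manifest layout of `{"files": {"<file name>": "<sha256 hex>"}}`; the real manifest schema is not described in this runbook, so adapt the lookup to the actual format before relying on it.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_matches_manifest(bundle: Path, manifest: Path) -> bool:
    """Compare the bundle digest with the manifest entry for its file name.

    Assumes the hypothetical layout {"files": {"<name>": "<sha256 hex>"}}.
    """
    entries = json.loads(manifest.read_text(encoding="utf-8")).get("files", {})
    expected = entries.get(bundle.name)
    return expected is not None and expected.lower() == sha256_file(bundle)
```

A missing manifest entry is treated as a mismatch, so an unexpected bundle file fails the check instead of passing silently.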

7. Troubleshooting

7.1 High conflict rates

Symptoms: vexlens.consensus.conflicts_total spiking.

Investigation:

  1. Check conflict breakdown by reason in dashboard.
  2. Identify issuers with conflicting statements.
  3. Review issuer trust tiers and weights.

Resolution:

  • Adjust ConflictThreshold if the disagreements are legitimate.
  • Update issuer trust tiers based on authority.
  • Contact issuer owners to resolve source conflicts.

7.2 Normalization failures

Symptoms: vexlens.normalization.errors_total increasing.

Investigation:

  1. Check error logs for specific format failures.
  2. Identify malformed documents in input stream.
  3. Validate document against format schema.

Resolution:

  • Set StrictMode: false for lenient parsing (the value shown in the options reference).
  • Report malformed documents to source issuers.
  • Update normalizers if format specification changed.

7.3 Low consensus confidence

Symptoms: Average confidence below 0.5.

Investigation:

  1. Check issuer coverage for affected vulnerabilities.
  2. Review trust weight distribution.
  3. Identify missing or untrusted issuers.

Resolution:

  • Register additional trusted issuers.
  • Adjust trust tier assignments.
  • Import offline bundles from authoritative sources.

7.4 Projection storage growth

Symptoms: Storage usage increasing beyond capacity.

Investigation:

  1. Check MaxHistoryEntries setting.
  2. Review projection count and history depth.
  3. Identify high-churn vulnerability/product pairs.

Resolution:

  • Reduce MaxHistoryEntries.
  • Implement a history pruning job.
  • Archive old projections to cold storage.
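
The core of a pruning job is small. The sketch below keeps only the newest entries for one vulnerability/product pair; it assumes the history is already ordered oldest to newest, and `prune_history` is a hypothetical helper, not part of VexLens. Persisting the result and archiving the dropped entries are left to the surrounding job.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def prune_history(entries: Sequence[T], max_entries: int = 100) -> List[T]:
    """Keep the newest max_entries items (input ordered oldest to newest)."""
    if max_entries <= 0:
        # Guard: entries[-0:] would return everything instead of nothing.
        return []
    return list(entries[-max_entries:])
```

The default of 100 mirrors the MaxHistoryEntries value in the options reference.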

8. Recovery procedures

8.1 Storage failover

  1. Stop VexLens service instances.
  2. Switch storage connection to replica.
  3. Verify connectivity with health check.
  4. Restart service instances.
  5. Monitor for consensus recomputation.

8.2 Issuer directory sync

  1. Export current issuer registry backup.
  2. Resync from authoritative issuer directory source.
  3. Verify issuer fingerprints and trust tiers.
  4. Restart VexLens to reload issuer cache.

8.3 Consensus recomputation

  1. Trigger recomputation for affected vulnerability/product pairs.
  2. Monitor recomputation progress in logs.
  3. Verify consensus outcomes match expected state.
  4. Emit status change events if outcomes differ.
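
Steps 3 and 4 amount to diffing outcomes before and after recomputation. The sketch below shows one way to list the pairs whose status changed, assuming outcomes have been exported into dictionaries keyed by (vulnerability, product); the key shape and the `changed_outcomes` name are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (vulnerability id, product key)
Change = Tuple[Key, Optional[str], str]  # (key, old status, new status)


def changed_outcomes(before: Dict[Key, str], after: Dict[Key, str]) -> List[Change]:
    """List pairs whose status differs after recomputation, in sorted key order."""
    changes: List[Change] = []
    for key in sorted(after):
        old = before.get(key)
        if old != after[key]:
            changes.append((key, old, after[key]))
    return changes
```

Pairs that appear only after recomputation are reported with an old status of `None`, which makes newly created projections easy to distinguish from genuine status changes.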

9. Evidence locations

  • Sprint tracker: docs/implplan/SPRINT_0129_0001_0001_policy_reasoning.md
  • Module docs: docs/modules/vex-lens/
  • Source code: src/VexLens/StellaOps.VexLens/
  • Dashboard stub: docs/modules/vex-lens/runbooks/dashboards/vex-lens-observability.json