git.stella-ops.org/docs/guides/vex-trust-gate-rollout.md
StellaOps Bot cec4265a40 save progress
2025-12-28 01:40:52 +02:00

VexTrustGate Rollout Guide

This guide describes the phased rollout procedure for the VexTrustGate policy feature, which enforces VEX signature verification trust thresholds.

Overview

VexTrustGate adds a new policy gate that:

  1. Validates VEX signature verification trust scores
  2. Enforces per-environment thresholds (production stricter than staging/dev)
  3. Blocks or warns on status transitions when trust is insufficient
  4. Contributes to confidence scoring via VexTrustConfidenceFactorProvider
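The per-environment decision described above can be sketched as a single function. This is a Python illustration, not the C# VexTrustGate. It makes several assumptions. The TrustStatus shape is a hypothetical stand-in for VexTrustStatus. The checks run in a fixed order, and the first failing check supplies the reason code. The "accuracy_rate" and "ok" reason codes are invented for the sketch. The other codes ("composite_score", "issuer_verified", "freshness", "missing_vex_trust_data") are taken from the Phase 3 and Troubleshooting sections.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Decision(Enum):
    PASS = "pass"
    WARN = "warn"
    BLOCK = "block"


@dataclass
class Thresholds:
    # Mirrors the per-environment keys under PolicyGates:VexTrust:Thresholds.
    min_composite_score: float
    require_issuer_verified: bool
    failure_action: Decision  # Decision.WARN or Decision.BLOCK
    min_accuracy_rate: Optional[float] = None
    acceptable_freshness: Optional[List[str]] = None


@dataclass
class TrustStatus:
    # Hypothetical stand-in for the VexTrustStatus data supplied by Excititor.
    composite_score: float
    issuer_verified: bool
    accuracy_rate: float = 1.0
    freshness: str = "fresh"


def evaluate(status: Optional[TrustStatus], t: Thresholds,
             missing_trust_behavior: Decision = Decision.WARN) -> Tuple[Decision, str]:
    """Return (decision, reason) for one VEX status transition.

    The first failing check decides the reason code; this ordering is an
    assumption made for the sketch, not the real gate's logic.
    """
    if status is None:
        return missing_trust_behavior, "missing_vex_trust_data"
    if status.composite_score < t.min_composite_score:
        return t.failure_action, "composite_score"
    if t.require_issuer_verified and not status.issuer_verified:
        return t.failure_action, "issuer_verified"
    if t.min_accuracy_rate is not None and status.accuracy_rate < t.min_accuracy_rate:
        return t.failure_action, "accuracy_rate"  # hypothetical reason code
    if t.acceptable_freshness is not None and status.freshness not in t.acceptable_freshness:
        return t.failure_action, "freshness"
    return Decision.PASS, "ok"  # hypothetical reason code
```

Use the Phase 4 production thresholds (MinCompositeScore 0.80, FailureAction Block) and a status with a composite score of 0.72. The function returns (Decision.BLOCK, "composite_score"). With the Phase 2 thresholds, the same status yields Decision.WARN with the same reason. That is why shadow-mode warnings preview what enforcement will block.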

Gate Order

VexTrustGate is positioned in the policy gate chain at order 250:

  • 100: EvidenceCompleteness
  • 200: LatticeState
  • 250: VexTrust ← NEW
  • 300: UncertaintyTier
  • 400: Confidence
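The chain can be pictured as a list of gates sorted by their order value. The Python sketch below uses hypothetical stub gates, and only VexTrust inspects its input. It also assumes that a "block" decision stops evaluation of later gates. Confirm that semantic against the policy engine before relying on it.

```python
from typing import Callable, List, Tuple

# (order, name, gate). Each gate returns "pass", "warn" or "block".
# These are hypothetical stubs, not the real policy gates.
Gate = Tuple[int, str, Callable[[dict], str]]

GATES: List[Gate] = [
    (400, "Confidence", lambda ctx: "pass"),
    (100, "EvidenceCompleteness", lambda ctx: "pass"),
    (250, "VexTrust", lambda ctx: "warn" if ctx["trust_score"] < 0.80 else "pass"),
    (300, "UncertaintyTier", lambda ctx: "pass"),
    (200, "LatticeState", lambda ctx: "pass"),
]


def run_chain(ctx: dict, gates: List[Gate]) -> List[Tuple[str, str]]:
    """Run gates in ascending order; stop after the first "block" (assumed)."""
    results = []
    for _order, name, gate in sorted(gates, key=lambda g: g[0]):
        decision = gate(ctx)
        results.append((name, decision))
        if decision == "block":
            break
    return results
```

Registration order does not matter in this sketch. Sorting by the numeric order places VexTrust (250) after LatticeState (200) and before UncertaintyTier (300).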

Prerequisites

  1. The VEX signature verification pipeline is active (SPRINT_1227_0004_0001)
  2. IssuerDirectory is populated with trusted VEX sources
  3. Excititor correctly populates VexTrustStatus in its API responses

Rollout Phases

Phase 1: Feature Flag Deployment

Deploy with the gate disabled to establish a baseline:

PolicyGates:
  VexTrust:
    Enabled: false  # Gate off initially

Duration: 1-2 days
Monitoring: Verify deployment health and confirm that existing gates show no regression.

Phase 2: Shadow Mode (Warn Everywhere)

Enable the gate in warn-only mode across all environments:

PolicyGates:
  VexTrust:
    Enabled: true
    Thresholds:
      production:
        MinCompositeScore: 0.80
        RequireIssuerVerified: true
        FailureAction: Warn  # Changed from Block
      staging:
        MinCompositeScore: 0.60
        RequireIssuerVerified: true
        FailureAction: Warn
      development:
        MinCompositeScore: 0.40
        RequireIssuerVerified: false
        FailureAction: Warn
    MissingTrustBehavior: Warn

Duration: 1-2 weeks
Monitoring:

  • Review the stellaops.policy.vex_trust_gate.decisions.total metric
  • Analyze warn events to understand the impact of each threshold
  • Collect feedback from operators on false positives
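To analyze warn events, group them by reason code. You can also estimate how much would be blocked once FailureAction flips to Block. The Python sketch below assumes decision events are exported as records with environment, decision and reason fields. That record shape is hypothetical. Missing-trust warnings are left out of the projection because the configuration controls them separately, through MissingTrustBehavior rather than FailureAction.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical exported event shape: {"environment": ..., "decision": ..., "reason": ...}
Event = Mapping[str, str]


def warn_breakdown(events: Iterable[Event]) -> Counter:
    """Count warn decisions per reason code."""
    return Counter(e["reason"] for e in events if e["decision"] == "warn")


def projected_block_rate(events: Iterable[Event], environment: str) -> float:
    """Share of one environment's evaluations that would block if
    FailureAction switched to Block with unchanged thresholds."""
    scoped = [e for e in events if e["environment"] == environment]
    if not scoped:
        return 0.0
    would_block = sum(
        1 for e in scoped
        if e["decision"] == "warn" and e["reason"] != "missing_vex_trust_data"
    )
    return would_block / len(scoped)
```

A high projected rate in production, dominated by a single reason, tells you which threshold to revisit in Phase 3.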

Phase 3: Threshold Tuning

Based on Phase 2 data, adjust thresholds:

  1. Review the decision breakdown by reason:

    • composite_score: MinCompositeScore may need to be lowered
    • issuer_verified: check IssuerDirectory for missing issuers
    • freshness: consider adding states to AcceptableFreshness
  2. Tenant-specific adjustments (if needed):

    PolicyGates:
      VexTrust:
        TenantOverrides:
          tenant-with-internal-vex:
            production:
              MinCompositeScore: 0.70  # Lower for self-signed internal VEX
          high-security-tenant:
            production:
              MinCompositeScore: 0.90  # Higher for regulated workloads
    

Duration: 1 week
Outcome: Validated threshold configuration
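To see which thresholds a tenant ends up with, overlay its TenantOverrides entry onto the environment's base thresholds. The Python sketch below assumes a field-by-field merge, where keys missing from the override fall back to the base value. VexTrustGateOptions.cs is the authority on the real merge behavior. The config dict mirrors the YAML keys used in this guide.

```python
import copy
from typing import Any, Dict


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay override onto a copy of base (assumed semantics).
    Lists such as AcceptableFreshness are replaced, not concatenated."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def effective_thresholds(config: Dict[str, Any], tenant: str, environment: str) -> Dict[str, Any]:
    """Base thresholds for the environment, with the tenant override applied."""
    vex = config["PolicyGates"]["VexTrust"]
    base = vex.get("Thresholds", {}).get(environment, {})
    override = vex.get("TenantOverrides", {}).get(tenant, {}).get(environment, {})
    return merge(base, override)
```

Under this assumed merge, an override that sets only MinCompositeScore and RequireIssuerVerified still inherits MinAccuracyRate and AcceptableFreshness from the production thresholds. That applies to the Per-Tenant Disable recipe in Rollback Procedure, so that recipe may not bypass every check.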

Phase 4: Production Enforcement

Enable blocking in production only:

PolicyGates:
  VexTrust:
    Enabled: true
    Thresholds:
      production:
        MinCompositeScore: 0.80
        RequireIssuerVerified: true
        MinAccuracyRate: 0.85
        AcceptableFreshness:
          - fresh
        FailureAction: Block  # Now enforcing
      staging:
        FailureAction: Warn  # Still warn only
      development:
        FailureAction: Warn

Duration: Ongoing, with monitoring
Rollback: Set FailureAction: Warn or Enabled: false if issues arise.

Phase 5: Full Rollout

After production stabilization, optionally enable blocking in staging:

PolicyGates:
  VexTrust:
    Thresholds:
      staging:
        MinCompositeScore: 0.60
        RequireIssuerVerified: true
        FailureAction: Block  # Optional stricter staging
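The phases above differ mainly in the FailureAction applied per environment. The Python summary below encodes that progression as a checklist helper. None stands for Phase 1, where the gate is disabled. The block_staging flag models the optional Phase 5 step. The function name and signature are illustrative only.

```python
from typing import Dict, Optional

ENVIRONMENTS = ("production", "staging", "development")


def failure_actions(phase: int, block_staging: bool = False) -> Optional[Dict[str, str]]:
    """FailureAction per environment for each rollout phase in this guide.
    Returns None for Phase 1, where Enabled is false."""
    if phase == 1:
        return None
    if phase in (2, 3):  # shadow mode and threshold tuning: warn everywhere
        return {env: "Warn" for env in ENVIRONMENTS}
    if phase == 4:  # production enforcement only
        return {"production": "Block", "staging": "Warn", "development": "Warn"}
    if phase == 5:  # staging blocking is optional
        return {
            "production": "Block",
            "staging": "Block" if block_staging else "Warn",
            "development": "Warn",
        }
    raise ValueError(f"unknown rollout phase: {phase}")
```

Development remains in warn mode in every phase of this guide.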

Monitoring

Key Metrics

| Metric | Description | Alert threshold |
|---|---|---|
| stellaops.policy.vex_trust_gate.evaluations.total | Total evaluations | Deviation from baseline |
| stellaops.policy.vex_trust_gate.decisions.total{decision="block"} | Block decisions | Sudden spike |
| stellaops.policy.vex_trust_gate.trust_score | Trust score distribution | Mean < 0.50 |
| stellaops.policy.vex_trust_gate.evaluation_duration_ms | Evaluation latency | p99 > 100 ms |
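Evaluating these alert conditions over raw samples makes the thresholds concrete. The Python sketch below uses a nearest-rank p99. "Sudden spike" is not defined in this guide, so the sketch assumes it means block decisions exceeding a hypothetical spike_factor times a baseline count. In practice, these rules would normally live in your metrics backend rather than in code.

```python
from statistics import mean
from typing import List, Sequence


def p99(values: Sequence[float]) -> float:
    """Nearest-rank 99th percentile; rank = ceil(0.99 * n) in integer math."""
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def alerts(trust_scores: Sequence[float], durations_ms: Sequence[float],
           block_count: int, baseline_block_count: int,
           spike_factor: float = 3.0) -> List[str]:
    """Return the alert conditions from the Key Metrics table that fire."""
    fired = []
    if trust_scores and mean(trust_scores) < 0.50:
        fired.append("trust_score mean < 0.50")
    if durations_ms and p99(durations_ms) > 100:
        fired.append("evaluation_duration_ms p99 > 100ms")
    # Hypothetical spike rule: more than spike_factor times the baseline.
    if baseline_block_count > 0 and block_count > spike_factor * baseline_block_count:
        fired.append("block decisions spike")
    return fired
```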

Trace Spans

  • VexTrustGate.EvaluateAsync
    • Attributes: environment, trust_score, decision, issuer_id

Audit Trail

PolicyAuditEntity now includes VEX trust fields:

  • VexTrustScore: Composite score at decision time
  • VexTrustTier: Tier classification
  • VexSignatureVerified: Whether signature was verified
  • VexIssuerId/VexIssuerName: Issuer ID and name
  • VexTrustGateResult: Gate decision
  • VexTrustGateReason: Reason code
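For ad-hoc analysis of exported audit rows, you can mirror the VEX trust fields in a small record type. The Python sketch below is not PolicyAuditEntity itself. The snake_case names, the Python types and the lowercase "block" result value are all assumptions. The helper lists the issuers behind blocked decisions whose signatures were not verified, a quick way to spot IssuerDirectory gaps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VexTrustAudit:
    # Hypothetical mirror of the VEX trust fields on PolicyAuditEntity.
    vex_trust_score: Optional[float]
    vex_trust_tier: Optional[str]
    vex_signature_verified: bool
    vex_issuer_id: Optional[str]
    vex_issuer_name: Optional[str]
    vex_trust_gate_result: str
    vex_trust_gate_reason: str


def blocked_unverified(entries: List[VexTrustAudit]) -> List[str]:
    """Sorted issuer IDs of blocked decisions with unverified signatures."""
    return sorted({
        e.vex_issuer_id or "<unknown>"
        for e in entries
        if e.vex_trust_gate_result == "block" and not e.vex_signature_verified
    })
```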

Rollback Procedure

Immediate Disable

PolicyGates:
  VexTrust:
    Enabled: false

Switch to Warn Mode

PolicyGates:
  VexTrust:
    Thresholds:
      production:
        FailureAction: Warn
      staging:
        FailureAction: Warn
      development:
        FailureAction: Warn

Per-Tenant Disable

PolicyGates:
  VexTrust:
    TenantOverrides:
      affected-tenant:
        production:
          MinCompositeScore: 0.01  # Effectively bypass
          RequireIssuerVerified: false

Troubleshooting

Common Issues

| Symptom | Likely cause | Resolution |
|---|---|---|
| All VEX statements blocked | Missing IssuerDirectory entries | Populate IssuerDirectory with trusted issuers |
| High false-positive rate | Threshold too strict | Lower MinCompositeScore |
| "missing_vex_trust_data" warnings | Verification pipeline not running | Check Excititor logs |
| Inconsistent decisions | Stale trust cache | Verify cache TTL settings |

Debug Logging

Enable debug logging for the gate:

Logging:
  LogLevel:
    StellaOps.Policy.Engine.Gates.VexTrustGate: Debug

Support

  • Sprint: SPRINT_1227_0004_0003
  • Component: StellaOps.Policy.Engine.Gates
  • Files:
    • src/Policy/StellaOps.Policy.Engine/Gates/VexTrustGate.cs
    • src/Policy/StellaOps.Policy.Engine/Gates/VexTrustGateOptions.cs
    • etc/policy-gates.yaml.sample