git.stella-ops.org/docs/operations/trust-lattice-runbook.md
Trust Lattice Operations Runbook

Version: 1.0.0
Last Updated: 2025-12-22
Audience: Operations and Support teams


1. Overview

The Trust Lattice is a VEX claim scoring framework that produces explainable, deterministic verdicts. This runbook covers operational procedures for monitoring, troubleshooting, and maintaining the system.


2. System Components

| Component | Service | Purpose |
|---|---|---|
| TrustVector | Excititor | 3-component trust scoring (P/C/R) |
| ClaimScoreMerger | Policy | Merge scored claims into verdicts |
| PolicyGates | Policy | Enforce trust thresholds |
| VerdictManifest | Authority | Store signed verdicts |
| Calibration | Excititor | Adjust trust vectors over time |

3. Monitoring

3.1 Key Metrics

| Metric | Alert Threshold | Description |
|---|---|---|
| trustlattice_score_latency_p95 | > 100ms | Claim scoring latency |
| trustlattice_merge_conflicts_total | Rate increase | Claims with status conflicts |
| policy_gate_failures_total | Rate increase | Gate rejections |
| verdict_manifest_replay_failures | > 0 | Non-deterministic verdicts |
| calibration_drift_percent | > 10% | Trust vector drift from baseline |
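
The fixed thresholds in this table can be checked with a few lines of Python. This is a minimal sketch, not the project's alerting configuration: the function name and the input shape are hypothetical, and the two "rate increase" alerts are left out because they need a baseline to compare against.

```python
# Static alert thresholds copied from the Key Metrics table.
# Assumed units: latency in milliseconds, drift in percent.
THRESHOLDS = {
    "trustlattice_score_latency_p95": 100.0,
    "verdict_manifest_replay_failures": 0.0,
    "calibration_drift_percent": 10.0,
}


def breached_alerts(samples: dict[str, float]) -> list[str]:
    """Return the names of metrics whose sample exceeds its threshold."""
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if name in samples and samples[name] > limit
    )
```

A value equal to the threshold does not alert, matching the strict ">" in the table.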

3.2 Dashboards

Access dashboards at:

  • Grafana: https://<grafana>/d/trustlattice
  • Prometheus queries:
    # Average claim score by source class
    avg(trustlattice_claim_score) by (source_class)
    
    # Gate failure rate
    rate(policy_gate_failures_total[5m])
    
    # Confidence distribution
    histogram_quantile(0.5, trustlattice_verdict_confidence_bucket)
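
The same queries can be run outside Grafana through the Prometheus HTTP API (`/api/v1/query`). The sketch below uses only the Python standard library; the base URL is a placeholder and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def run_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run an instant query and return the list of result series."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]
```

For example, `run_query("http://prometheus.example:9090", "rate(policy_gate_failures_total[5m])")` would return one entry per matching series, assuming a Prometheus server is reachable at that placeholder address.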
    

3.3 Log Queries

Key log entries (Loki/ELK):

# Claim scoring
{app="excititor"} |= "ClaimScore computed"

# Gate failures
{app="policy"} |= "Gate failed" | json | gate_name != ""

# Verdict replay failures
{app="authority"} |= "Replay mismatch"

4. Common Operations

4.1 Viewing Current Trust Vectors

# Via CLI
stella trustvector list --source-class vendor

# Via API
curl -H "Authorization: Bearer $TOKEN" \
  https://api.example.com/api/v1/trustlattice/vectors

4.2 Inspecting a Verdict

# Get verdict details
stella verdict show verd:acme:abc123:CVE-2025-12345:1734873600

# Verify verdict replay
stella verdict replay verd:acme:abc123:CVE-2025-12345:1734873600
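
The example ID above appears to consist of five colon-separated fields. When scripting around the CLI it can help to split it apart. The following is a sketch only: the field names are guesses inferred from the example, not a documented schema, and the last field is assumed to be a Unix timestamp in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerdictId:
    # Field names are assumptions inferred from the example ID.
    tenant: str
    asset: str
    vulnerability: str
    evaluated_at: int  # assumed: Unix timestamp in seconds


def parse_verdict_id(raw: str) -> VerdictId:
    """Split a 'verd:' identifier into its five assumed fields."""
    parts = raw.split(":")
    if len(parts) != 5 or parts[0] != "verd":
        raise ValueError(f"unexpected verdict ID format: {raw!r}")
    _, tenant, asset, vulnerability, timestamp = parts
    return VerdictId(tenant, asset, vulnerability, int(timestamp))
```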

4.3 Viewing Gate Configuration

# List enabled gates
stella gates list --environment production

# Show gate thresholds
stella gates show minimumConfidence --environment production

4.4 Triggering Manual Calibration

# Trigger calibration epoch for a source
stella calibration run --source vendor:redhat \
  --start 2025-11-01 --end 2025-12-01

# View calibration history
stella calibration history vendor:redhat

5. Emergency Procedures

5.1 High Gate Failure Rate

Symptoms:

  • Spike in policy_gate_failures_total
  • Many builds failing due to low confidence

Steps:

  1. Check if VEX source is unavailable:

    stella vex source status vendor:redhat
    
  2. If the source is stale, consider temporarily reducing the threshold:

    # Edit etc/policy-gates.yaml
    gates:
      minimumConfidence:
        thresholds:
          production: 0.60  # Reduced from 0.75
    
  3. Restart Policy Engine to apply changes

  4. Monitor the gate failure rate and restore the original threshold (0.75) once the source recovers
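
Before lowering the threshold in step 2, it can be useful to estimate how many recent verdicts would pass at each value. This is a minimal sketch: it assumes the gate passes when confidence is greater than or equal to the threshold, which this runbook does not confirm, and the confidence values are illustrative, not real measurements.

```python
def pass_rate(confidences: list[float], threshold: float) -> float:
    """Fraction of verdicts whose confidence meets the threshold.

    Assumes the gate passes when confidence >= threshold.
    """
    if not confidences:
        return 0.0
    passed = sum(1 for c in confidences if c >= threshold)
    return passed / len(confidences)


# Illustrative values only.
recent = [0.82, 0.71, 0.64, 0.58, 0.77]
print(pass_rate(recent, 0.75))  # current production threshold
print(pass_rate(recent, 0.60))  # temporarily reduced threshold
```

If the pass rate barely changes between the two thresholds, the reduction is unlikely to relieve the failures and the cause is probably elsewhere.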

5.2 Verdict Replay Failures

Symptoms:

  • verdict_manifest_replay_failures > 0
  • Audit compliance check failures

Steps:

  1. Identify failing verdict:

    stella verdict list --replay-status failed --limit 10
    
  2. Compare original and replayed inputs:

    stella verdict diff <manifestId>
    
  3. Common causes:

    • VEX document modified after verdict
    • Clock drift during evaluation
    • Policy configuration changed

  4. For clock drift, verify NTP synchronization:

    timedatectl status
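
To narrow down which of the causes in step 3 applies, each input can be fingerprinted separately and the fingerprints compared between the original and the replayed run. The sketch below assumes the inputs are available as JSON-compatible values. The input names used in the usage note are hypothetical labels, and the real manifest format may differ.

```python
import hashlib
import json


def fingerprint(value) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_inputs(original: dict, replayed: dict) -> list[str]:
    """Return the names of inputs whose fingerprints differ between two runs."""
    names = sorted(set(original) | set(replayed))
    return [
        name
        for name in names
        if fingerprint(original.get(name)) != fingerprint(replayed.get(name))
    ]
```

For example, `changed_inputs({"vex_document": doc_a, "policy_config": cfg}, {"vex_document": doc_b, "policy_config": cfg})` returns `["vex_document"]` when only the VEX document differs. Note that an input missing from one run and an explicit null value hash identically in this sketch.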
    

5.3 Trust Vector Drift Emergency

Symptoms:

  • calibration_drift_percent > 20%
  • Sudden confidence changes across many assets

Steps:

  1. Freeze calibration:

    stella calibration freeze vendor:redhat
    
  2. Investigate recent calibration epochs:

    stella calibration history vendor:redhat --epochs 5
    
  3. If the false positive rate increased, roll back:

    stella calibration rollback vendor:redhat --to-epoch 41
    
  4. Unfreeze after investigation:

    stella calibration unfreeze vendor:redhat
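
This runbook does not define how calibration_drift_percent is computed. As one plausible reading, the sketch below reports the largest relative change of any trust vector component against its baseline. The formula, the component keys, and any values passed in are assumptions for illustration only.

```python
def drift_percent(baseline: dict[str, float], current: dict[str, float]) -> float:
    """Largest relative change of any component, in percent.

    This formula is an assumption; the real metric may be defined differently.
    Baseline components must be non-zero.
    """
    return max(
        abs(current[key] - baseline[key]) / baseline[key] * 100.0
        for key in baseline
    )
```

With a baseline of `{"P": 0.8, "C": 0.5, "R": 0.5}` and a current vector of `{"P": 0.8, "C": 0.6, "R": 0.5}`, the C component moved by about 20% of its baseline, which would meet the emergency level listed under Symptoms.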
    

6. Configuration

6.1 Configuration Files

| File | Purpose |
|---|---|
| etc/trust-lattice.yaml | Trust vector weights and defaults |
| etc/policy-gates.yaml | Gate thresholds and rules |
| etc/excititor-calibration.yaml | Calibration parameters |

6.2 Environment Variables

| Variable | Default | Description |
|---|---|---|
| TRUSTLATTICE_WEIGHTS_PROVENANCE | 0.45 | Provenance weight |
| TRUSTLATTICE_WEIGHTS_COVERAGE | 0.35 | Coverage weight |
| TRUSTLATTICE_FRESHNESS_HALFLIFE | 90 | Freshness half-life (days) |
| GATES_MINIMUM_CONFIDENCE_PROD | 0.75 | Production confidence threshold |
| CALIBRATION_LEARNING_RATE | 0.02 | Calibration learning rate |
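
A half-life setting suggests exponential decay of a claim's freshness with age. The sketch below assumes the factor is 0.5 raised to (age / half-life). This runbook does not state the formula, so treat it as an illustration of what the 90-day default would imply, not as the scoring code. The function name is hypothetical.

```python
import os
from typing import Optional


def freshness_factor(age_days: float, half_life_days: Optional[float] = None) -> float:
    """Exponential decay: the factor halves every `half_life_days`.

    Assumes TRUSTLATTICE_FRESHNESS_HALFLIFE is applied this way; the real
    scoring code may differ.
    """
    if half_life_days is None:
        # Fall back to the documented default of 90 days.
        half_life_days = float(os.environ.get("TRUSTLATTICE_FRESHNESS_HALFLIFE", "90"))
    return 0.5 ** (age_days / half_life_days)
```

Under this assumption a claim that is 90 days old keeps half of its freshness, and one that is 180 days old keeps a quarter.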

7. Maintenance Tasks

7.1 Daily

  • Review gate failure alerts
  • Check verdict replay success rate
  • Monitor trust vector stability

7.2 Weekly

  • Review calibration epoch results
  • Analyze conflict rate trends
  • Update trust vectors for new sources

7.3 Monthly

  • Audit high-drift sources
  • Review and tune gate thresholds
  • Clean up expired verdict manifests

8. Contact

  • On-call: #trustlattice-oncall (Slack)
  • Escalation: VEX Guild Lead
  • Documentation: docs/modules/excititor/trust-lattice.md

Document Version: 1.0.0
Sprint: 7100.0003.0002