# Trust Lattice Operations Runbook
> **Version**: 1.0.0
> **Last Updated**: 2025-12-22
> **Audience**: Operations and Support teams
---
## 1. Overview
The Trust Lattice is a VEX claim scoring framework that produces explainable, deterministic verdicts. This runbook covers operational procedures for monitoring, troubleshooting, and maintaining the system.

---
## 2. System Components
| Component | Service | Purpose |
|-----------|---------|---------|
| TrustVector | Excititor | 3-component trust scoring (P/C/R) |
| ClaimScoreMerger | Policy | Merge scored claims into verdicts |
| PolicyGates | Policy | Enforce trust thresholds |
| VerdictManifest | Authority | Store signed verdicts |
| Calibration | Excititor | Adjust trust vectors over time |
---
## 3. Monitoring
### 3.1 Key Metrics
| Metric | Alert Threshold | Description |
|--------|-----------------|-------------|
| `trustlattice_score_latency_p95` | > 100 ms | Claim scoring latency |
| `trustlattice_merge_conflicts_total` | Rate increase | Claims with status conflicts |
| `policy_gate_failures_total` | Rate increase | Gate rejections |
| `verdict_manifest_replay_failures` | > 0 | Non-deterministic verdicts |
| `calibration_drift_percent` | > 10% | Trust vector drift from baseline |
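
The absolute thresholds in the table can be checked mechanically. The following is a minimal Python sketch, not part of the StellaOps tooling: the function names, the 2x baseline factor used to interpret "rate increase", and the assumption that latency samples arrive in milliseconds are all illustrative.

```python
from typing import Dict

# Absolute thresholds copied from the table above.
# ASSUMPTION: latency values are expressed in milliseconds.
ABSOLUTE_THRESHOLDS: Dict[str, float] = {
    "trustlattice_score_latency_p95": 100.0,
    "verdict_manifest_replay_failures": 0.0,
    "calibration_drift_percent": 10.0,
}


def breached(metric: str, value: float) -> bool:
    """Return True when value is strictly above the metric's alert threshold."""
    return value > ABSOLUTE_THRESHOLDS[metric]


def rate_increased(current: float, baseline: float, factor: float = 2.0) -> bool:
    """Hypothetical reading of 'rate increase': current exceeds factor x baseline."""
    return current > baseline * factor
```

For example, `breached("calibration_drift_percent", 12.5)` is true, while a value of exactly 10.0 is not, because the table specifies a strict `>` comparison.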
### 3.2 Dashboards
Access dashboards at:
- Grafana: `https://<grafana>/d/trustlattice`
- Prometheus queries:
```promql
# Average claim score by source class
avg(trustlattice_claim_score) by (source_class)
# Gate failure rate
rate(policy_gate_failures_total[5m])
# Confidence distribution
histogram_quantile(0.5, sum by (le) (rate(trustlattice_verdict_confidence_bucket[5m])))
```
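
These queries can also be run outside Grafana through the Prometheus HTTP API (`/api/v1/query`). The sketch below uses only the Python standard library. The base URL is a placeholder and the helper names are hypothetical; only the endpoint path and the instant-vector response shape come from Prometheus itself.

```python
import json
from typing import Dict, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

Labels = Tuple[Tuple[str, str], ...]


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> Dict[Labels, float]:
    """Map each series' sorted label set to its sample value."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    result: Dict[Labels, float] = {}
    for series in data["result"]:
        labels = tuple(sorted(series["metric"].items()))
        # Each sample is [unix_timestamp, "value-as-string"].
        result[labels] = float(series["value"][1])
    return result


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> Dict[Labels, float]:
    """Run an instant query against a reachable Prometheus server."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

A call such as `instant_query("http://prometheus.example:9090", "rate(policy_gate_failures_total[5m])")` would return one entry per label set, assuming the server is reachable at that placeholder address.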
### 3.3 Log Queries
Key log entries (Loki/ELK):
```logql
# Claim scoring
{app="excititor"} |= "ClaimScore computed"
# Gate failures
{app="policy"} |= "Gate failed" | json | gate_name != ""
# Verdict replay failures
{app="authority"} |= "Replay mismatch"
```
---
## 4. Common Operations
### 4.1 Viewing Current Trust Vectors
```bash
# Via CLI
stella trustvector list --source-class vendor
# Via API
curl -H "Authorization: Bearer $TOKEN" \
  https://api.example.com/api/v1/trustlattice/vectors
```
### 4.2 Inspecting a Verdict
```bash
# Get verdict details
stella verdict show verd:acme:abc123:CVE-2025-12345:1734873600
# Verify verdict replay
stella verdict replay verd:acme:abc123:CVE-2025-12345:1734873600
```
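
The verdict ID in these commands has five colon-separated parts. The sketch below splits such an ID for use in scripts. The field names (`tenant`, `asset`, `vulnerability`, `evaluated_at`) are guesses from the shape of the example and are not documented in this runbook; in particular, treating the last part as a Unix timestamp is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class VerdictId:
    tenant: str             # ASSUMPTION: second field is a tenant name
    asset: str              # ASSUMPTION: third field identifies the asset
    vulnerability: str      # e.g. a CVE identifier
    evaluated_at: datetime  # ASSUMPTION: last field is Unix epoch seconds


def parse_verdict_id(raw: str) -> VerdictId:
    """Split a 'verd:<a>:<b>:<c>:<epoch>' string into its parts."""
    parts = raw.split(":")
    if len(parts) != 5 or parts[0] != "verd":
        raise ValueError(f"unexpected verdict ID shape: {raw!r}")
    _, tenant, asset, vulnerability, epoch = parts
    return VerdictId(
        tenant=tenant,
        asset=asset,
        vulnerability=vulnerability,
        evaluated_at=datetime.fromtimestamp(int(epoch), tz=timezone.utc),
    )
```

Under that reading, the trailing `1734873600` in the example decodes to 2024-12-22 13:20:00 UTC.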
### 4.3 Viewing Gate Configuration
```bash
# List enabled gates
stella gates list --environment production
# Show gate thresholds
stella gates show minimumConfidence --environment production
```
### 4.4 Triggering Manual Calibration
```bash
# Trigger calibration epoch for a source
stella calibration run --source vendor:redhat \
  --start 2025-11-01 --end 2025-12-01
# View calibration history
stella calibration history vendor:redhat
```
---
## 5. Emergency Procedures
### 5.1 High Gate Failure Rate
**Symptoms:**
- Spike in `policy_gate_failures_total`
- Many builds failing due to low confidence
**Steps:**
1. Check whether the VEX source is unavailable or stale:
```bash
stella vex source status vendor:redhat
```
2. If the source is stale, consider a temporary threshold reduction:
```yaml
# etc/policy-gates.yaml
gates:
  minimumConfidence:
    thresholds:
      production: 0.60  # Reduced from 0.75
```
3. Restart the Policy Engine to apply the change
4. Monitor gate failures and restore the original threshold once the source recovers
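
Before lowering the threshold, it can help to estimate how many recent verdicts would pass at each value. This is a minimal sketch with made-up sample confidences. It assumes the gate passes a verdict when its confidence is greater than or equal to the threshold; this runbook does not state the comparison operator.

```python
from typing import Iterable


def pass_rate(confidences: Iterable[float], threshold: float) -> float:
    """Fraction of verdicts whose confidence meets the threshold.

    ASSUMPTION: the gate uses '>=' when comparing against the threshold.
    """
    values = list(confidences)
    if not values:
        raise ValueError("no confidences supplied")
    passed = sum(1 for c in values if c >= threshold)
    return passed / len(values)


# Illustrative data only, not taken from a real system.
sample = [0.55, 0.62, 0.70, 0.78, 0.91]
current = pass_rate(sample, 0.75)  # 2 of 5 verdicts pass
reduced = pass_rate(sample, 0.60)  # 4 of 5 verdicts pass
```

If the reduced threshold barely changes the pass rate, the failures are probably not caused by the stale source, and lowering the gate only weakens enforcement.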
### 5.2 Verdict Replay Failures
**Symptoms:**
- `verdict_manifest_replay_failures` > 0
- Audit compliance check failures
**Steps:**
1. Identify failing verdict:
```bash
stella verdict list --replay-status failed --limit 10
```
2. Compare original and replayed inputs:
```bash
stella verdict diff <manifestId>
```
3. Common causes:
- VEX document modified after verdict
- Clock drift during evaluation
- Policy configuration changed
4. For clock drift, verify NTP synchronization:
```bash
timedatectl status
```
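
The common causes above all come down to one property: a replay reproduces the verdict only if every input is byte-for-byte identical. The sketch below illustrates that idea by hashing a canonical JSON form of the inputs. It is not the StellaOps implementation, and the input field names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def input_digest(inputs: Dict[str, Any]) -> str:
    """SHA-256 over canonical JSON (sorted keys, fixed separators)."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_inputs(original: Dict[str, Any], replayed: Dict[str, Any]) -> List[str]:
    """Top-level keys whose values differ between the two evaluations."""
    keys = sorted(set(original) | set(replayed))
    return [k for k in keys if original.get(k) != replayed.get(k)]


# Hypothetical inputs: a VEX document hash, a policy version, an evaluation time.
original = {
    "vex_document_sha256": "aaa111",
    "policy_version": "1.0.0",
    "evaluated_at": "2025-12-22T00:00:00Z",
}
replayed = dict(original, policy_version="1.0.1")

mismatch = input_digest(original) != input_digest(replayed)
culprits = changed_inputs(original, replayed)
```

Here `mismatch` is true and `culprits` names only `policy_version`, which corresponds to the "policy configuration changed" cause. Key order does not affect the digest because the keys are sorted before hashing.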
### 5.3 Trust Vector Drift Emergency
**Symptoms:**
- `calibration_drift_percent` > 20%
- Sudden confidence changes across many assets
**Steps:**
1. Freeze calibration:
```bash
stella calibration freeze vendor:redhat
```
2. Investigate recent calibration epochs:
```bash
stella calibration history vendor:redhat --epochs 5
```
3. If the false-positive rate has increased, roll back to the last known-good epoch:
```bash
stella calibration rollback vendor:redhat --to-epoch 41
```
4. Unfreeze after investigation:
```bash
stella calibration unfreeze vendor:redhat
```
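
When reviewing the calibration history, drift can be recomputed by hand to confirm the alert. This runbook does not define how `calibration_drift_percent` is calculated. The sketch below assumes it is the largest relative change of any trust vector component against the baseline; the example values are illustrative.

```python
from typing import Mapping

ALERT_PERCENT = 10.0      # alert threshold from section 3.1
EMERGENCY_PERCENT = 20.0  # emergency threshold from section 5.3


def drift_percent(baseline: Mapping[str, float], current: Mapping[str, float]) -> float:
    """Largest relative component change, in percent (ASSUMED definition)."""
    if set(baseline) != set(current):
        raise ValueError("baseline and current must have the same components")
    if any(v <= 0 for v in baseline.values()):
        raise ValueError("baseline components must be positive")
    return max(abs(current[k] - baseline[k]) / baseline[k] for k in baseline) * 100.0


def classify(drift: float) -> str:
    """Map a drift value onto the two thresholds used in this runbook."""
    if drift > EMERGENCY_PERCENT:
        return "emergency"
    if drift > ALERT_PERCENT:
        return "alert"
    return "ok"


# Illustrative P/C/R vectors: only C moved, from 0.50 to 0.62 (a 24% change).
baseline = {"P": 0.80, "C": 0.50, "R": 0.60}
current = {"P": 0.80, "C": 0.62, "R": 0.60}
status = classify(drift_percent(baseline, current))
```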
---
## 6. Configuration
### 6.1 Configuration Files
| File | Purpose |
|------|---------|
| `etc/trust-lattice.yaml` | Trust vector weights and defaults |
| `etc/policy-gates.yaml` | Gate thresholds and rules |
| `etc/excititor-calibration.yaml` | Calibration parameters |
### 6.2 Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `TRUSTLATTICE_WEIGHTS_PROVENANCE` | 0.45 | Provenance weight |
| `TRUSTLATTICE_WEIGHTS_COVERAGE` | 0.35 | Coverage weight |
| `TRUSTLATTICE_FRESHNESS_HALFLIFE` | 90 | Freshness half-life (days) |
| `GATES_MINIMUM_CONFIDENCE_PROD` | 0.75 | Production confidence threshold |
| `CALIBRATION_LEARNING_RATE` | 0.02 | Calibration learning rate |
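
To show how these settings might interact, here is a hedged sketch. It assumes the trust score is a weighted sum of the P/C/R components multiplied by an exponential freshness decay. That formula is not given in this runbook; see `docs/modules/excititor/trust-lattice.md` for the actual scoring rules. The weight of the third component is not listed above, so the sketch uses the remainder to 1.0, which is also an assumption.

```python
import os


def env_float(name: str, default: float) -> float:
    """Read a float from the environment, falling back to the documented default."""
    raw = os.environ.get(name)
    return default if raw is None else float(raw)


def freshness(age_days: float, half_life_days: float) -> float:
    """Exponential decay: the factor halves every half_life_days."""
    if age_days < 0 or half_life_days <= 0:
        raise ValueError("age must be >= 0 and half-life must be > 0")
    return 0.5 ** (age_days / half_life_days)


def trust_score(p: float, c: float, r: float, age_days: float) -> float:
    """ASSUMED formula: weighted P/C/R sum scaled by freshness."""
    w_p = env_float("TRUSTLATTICE_WEIGHTS_PROVENANCE", 0.45)
    w_c = env_float("TRUSTLATTICE_WEIGHTS_COVERAGE", 0.35)
    w_r = 1.0 - w_p - w_c  # ASSUMPTION: third weight is the remainder
    half_life = env_float("TRUSTLATTICE_FRESHNESS_HALFLIFE", 90.0)
    base = w_p * p + w_c * c + w_r * r
    return base * freshness(age_days, half_life)
```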
---
## 7. Maintenance Tasks
### 7.1 Daily
- [ ] Review gate failure alerts
- [ ] Check verdict replay success rate
- [ ] Monitor trust vector stability
### 7.2 Weekly
- [ ] Review calibration epoch results
- [ ] Analyze conflict rate trends
- [ ] Update trust vectors for new sources
### 7.3 Monthly
- [ ] Audit high-drift sources
- [ ] Review and tune gate thresholds
- [ ] Clean up expired verdict manifests
---
## 8. Contact
- **On-call**: #trustlattice-oncall (Slack)
- **Escalation**: VEX Guild Lead
- **Documentation**: `docs/modules/excititor/trust-lattice.md`
---
*Sprint: 7100.0003.0002*