# Trust Lattice Operations Runbook

> **Version**: 1.0.0
> **Last Updated**: 2025-12-22
> **Audience**: Operations and Support teams

---

## 1. Overview

The Trust Lattice is a VEX claim scoring framework that produces explainable, deterministic verdicts. This runbook covers operational procedures for monitoring, troubleshooting, and maintaining the system.

---

## 2. System Components

| Component | Service | Purpose |
|-----------|---------|---------|
| TrustVector | Excititor | 3-component trust scoring (P/C/R) |
| ClaimScoreMerger | Policy | Merge scored claims into verdicts |
| PolicyGates | Policy | Enforce trust thresholds |
| VerdictManifest | Authority | Store signed verdicts |
| Calibration | Excititor | Adjust trust vectors over time |

---

## 3. Monitoring

### 3.1 Key Metrics

| Metric | Alert Threshold | Description |
|--------|-----------------|-------------|
| `trustlattice_score_latency_p95` | > 100ms | Claim scoring latency |
| `trustlattice_merge_conflicts_total` | Rate increase | Claims with status conflicts |
| `policy_gate_failures_total` | Rate increase | Gate rejections |
| `verdict_manifest_replay_failures` | > 0 | Non-deterministic verdicts |
| `calibration_drift_percent` | > 10% | Trust vector drift from baseline |

### 3.2 Dashboards

Access dashboards at:

- Grafana: `https://<grafana>/d/trustlattice`
- Prometheus queries:

```promql
# Average claim score by source class
avg(trustlattice_claim_score) by (source_class)

# Gate failure rate
rate(policy_gate_failures_total[5m])

# Median verdict confidence over the last 5 minutes
histogram_quantile(0.5, rate(trustlattice_verdict_confidence_bucket[5m]))
```

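
The queries above can also be run from a shell through the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch, not part of the Trust Lattice tooling: the `PROM_URL` variable, its default value, and the `prom_query` helper name are assumptions for illustration.

```bash
# Base URL of the Prometheus server.
# Hypothetical default; set PROM_URL for your deployment.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# prom_query QUERY
# Runs an instant query against the Prometheus HTTP API and prints the raw
# JSON response. --get sends the URL-encoded data as a query string.
prom_query() {
  curl -fsS --get "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Example (requires network access to Prometheus):
#   prom_query 'rate(policy_gate_failures_total[5m])'
```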
### 3.3 Log Queries

Key log queries (LogQL syntax for Loki; translate the filters for ELK):

```logql
# Claim scoring
{app="excititor"} |= "ClaimScore computed"

# Gate failures
{app="policy"} |= "Gate failed" | json | gate_name != ""

# Verdict replay failures
{app="authority"} |= "Replay mismatch"
```

---

## 4. Common Operations

### 4.1 Viewing Current Trust Vectors

```bash
# Via CLI
stella trustvector list --source-class vendor

# Via API
curl -H "Authorization: Bearer $TOKEN" \
  https://api.example.com/api/v1/trustlattice/vectors
```

### 4.2 Inspecting a Verdict

```bash
# Get verdict details
stella verdict show verd:acme:abc123:CVE-2025-12345:1734873600

# Verify verdict replay
stella verdict replay verd:acme:abc123:CVE-2025-12345:1734873600
```

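
When several verdicts need checking, the replay command can be looped over a list of manifest IDs. The sketch below assumes that `stella verdict replay` exits with a non-zero status when the replay does not match; this runbook does not state that behavior, so confirm it before relying on the helper. `replay_all` is a hypothetical name.

```bash
# replay_all
# Reads manifest IDs from stdin (one per line), replays each one, prints the
# IDs whose replay failed, and returns non-zero if any replay failed.
# ASSUMPTION: `stella verdict replay` exits non-zero on a replay mismatch.
replay_all() {
  failed=0
  while IFS= read -r id; do
    [ -n "$id" ] || continue   # skip blank lines
    # </dev/null keeps the CLI from consuming the ID list on stdin.
    if ! stella verdict replay "$id" </dev/null >/dev/null 2>&1; then
      printf 'replay failed: %s\n' "$id"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# Example:
#   printf '%s\n' verd:acme:abc123:CVE-2025-12345:1734873600 | replay_all
```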
### 4.3 Viewing Gate Configuration

```bash
# List enabled gates
stella gates list --environment production

# Show gate thresholds
stella gates show minimumConfidence --environment production
```

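
As an illustration of how a confidence threshold such as `minimumConfidence` is applied, the sketch below compares a verdict confidence against a threshold. It assumes the gate passes when the confidence is greater than or equal to the threshold; the inclusive comparison and the `gate_passes` helper are assumptions, not the Policy Engine's documented behavior. The thresholds 0.75 and 0.60 are the production default and the temporarily reduced value used in this runbook, and 0.68 is an illustrative confidence.

```bash
# gate_passes CONFIDENCE THRESHOLD
# Exits 0 when CONFIDENCE >= THRESHOLD, 1 otherwise.
# ASSUMPTION: the real gate uses an inclusive comparison.
gate_passes() {
  # awk handles the floating-point comparison; "+ 0" forces numeric context.
  awk -v c="$1" -v t="$2" 'BEGIN { exit !((c + 0) >= (t + 0)) }'
}

# Example with an illustrative confidence of 0.68:
#   gate_passes 0.68 0.75   # fails at the default production threshold
#   gate_passes 0.68 0.60   # passes at the reduced threshold
```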
### 4.4 Triggering Manual Calibration

```bash
# Trigger calibration epoch for a source
stella calibration run --source vendor:redhat \
  --start 2025-11-01 --end 2025-12-01

# View calibration history
stella calibration history vendor:redhat
```

---

## 5. Emergency Procedures

### 5.1 High Gate Failure Rate

**Symptoms:**
- Spike in `policy_gate_failures_total`
- Many builds failing due to low confidence

**Steps:**
1. Check whether the VEX source is available:
   ```bash
   stella vex source status vendor:redhat
   ```

2. If the source is stale, consider temporarily reducing the threshold in `etc/policy-gates.yaml`:
   ```yaml
   # etc/policy-gates.yaml
   gates:
     minimumConfidence:
       thresholds:
         production: 0.60  # Reduced from 0.75
   ```

3. Restart the Policy Engine to apply the change.

4. Monitor the gate failure rate and restore the threshold once the source recovers.

### 5.2 Verdict Replay Failures

**Symptoms:**
- `verdict_manifest_replay_failures` > 0
- Audit compliance check failures

**Steps:**
1. Identify the failing verdicts:
   ```bash
   stella verdict list --replay-status failed --limit 10
   ```

2. Compare the original and replayed inputs:
   ```bash
   stella verdict diff <manifestId>
   ```

3. Check for the common causes:
   - VEX document modified after the verdict was issued
   - Clock drift during evaluation
   - Policy configuration changed

4. If clock drift is suspected, verify NTP synchronization:
   ```bash
   timedatectl status
   ```

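
For scripted checks, the synchronization state can be read in machine-readable form. This sketch assumes a systemd version that provides `timedatectl show`; the `ntp_synced` helper name is hypothetical.

```bash
# ntp_synced
# Exits 0 when systemd reports the system clock as NTP-synchronized.
ntp_synced() {
  [ "$(timedatectl show --property=NTPSynchronized --value)" = "yes" ]
}

# Example:
#   ntp_synced || echo "clock is not synchronized; replay results may differ"
```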
### 5.3 Trust Vector Drift Emergency

**Symptoms:**
- `calibration_drift_percent` > 20%
- Sudden confidence changes across many assets

**Steps:**
1. Freeze calibration for the affected source:
   ```bash
   stella calibration freeze vendor:redhat
   ```

2. Investigate recent calibration epochs:
   ```bash
   stella calibration history vendor:redhat --epochs 5
   ```

3. If the false positive rate increased, roll back to an earlier epoch:
   ```bash
   stella calibration rollback vendor:redhat --to-epoch 41
   ```

4. Unfreeze calibration once the investigation is complete:
   ```bash
   stella calibration unfreeze vendor:redhat
   ```

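
Steps 1 and 2 can be combined so that the history is only inspected once the freeze has succeeded. This is a minimal sketch built from the commands above; the `freeze_and_inspect` helper is hypothetical, and it assumes that `stella calibration freeze` exits non-zero when the freeze fails.

```bash
# freeze_and_inspect SOURCE [EPOCHS]
# Freezes calibration for SOURCE, then prints its most recent epochs
# (default 5). Stops without printing history if the freeze fails.
# ASSUMPTION: `stella calibration freeze` exits non-zero on failure.
freeze_and_inspect() {
  src="$1"
  epochs="${2:-5}"
  stella calibration freeze "$src" || return 1
  stella calibration history "$src" --epochs "$epochs"
}

# Example:
#   freeze_and_inspect vendor:redhat
```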

---

## 6. Configuration

### 6.1 Configuration Files

| File | Purpose |
|------|---------|
| `etc/trust-lattice.yaml` | Trust vector weights and defaults |
| `etc/policy-gates.yaml` | Gate thresholds and rules |
| `etc/excititor-calibration.yaml` | Calibration parameters |

### 6.2 Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `TRUSTLATTICE_WEIGHTS_PROVENANCE` | 0.45 | Provenance weight |
| `TRUSTLATTICE_WEIGHTS_COVERAGE` | 0.35 | Coverage weight |
| `TRUSTLATTICE_FRESHNESS_HALFLIFE` | 90 | Freshness half-life (days) |
| `GATES_MINIMUM_CONFIDENCE_PROD` | 0.75 | Production confidence threshold |
| `CALIBRATION_LEARNING_RATE` | 0.02 | Calibration learning rate |

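
A half-life parameter is usually read as exponential decay: a claim's freshness factor halves every `TRUSTLATTICE_FRESHNESS_HALFLIFE` days. The sketch below computes that factor as 0.5^(age / half-life). Whether the Trust Lattice applies exactly this formula (for example, without a floor) is an assumption, and `freshness_factor` is a hypothetical helper.

```bash
# freshness_factor AGE_DAYS [HALFLIFE_DAYS]
# Prints 0.5^(AGE_DAYS / HALFLIFE_DAYS) with four decimal places.
# The half-life defaults to TRUSTLATTICE_FRESHNESS_HALFLIFE, or 90 if unset.
# ASSUMPTION: plain exponential decay with no floor or cap.
freshness_factor() {
  awk -v age="$1" -v hl="${2:-${TRUSTLATTICE_FRESHNESS_HALFLIFE:-90}}" \
    'BEGIN { printf "%.4f\n", exp(-log(2) * age / hl) }'
}

# Example:
#   freshness_factor 45   # claim aged 45 days, default 90-day half-life
```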

---

## 7. Maintenance Tasks

### 7.1 Daily

- [ ] Review gate failure alerts
- [ ] Check verdict replay success rate
- [ ] Monitor trust vector stability

### 7.2 Weekly

- [ ] Review calibration epoch results
- [ ] Analyze conflict rate trends
- [ ] Update trust vectors for new sources

### 7.3 Monthly

- [ ] Audit high-drift sources
- [ ] Review and tune gate thresholds
- [ ] Clean up expired verdict manifests

---

## 8. Contact

- **On-call**: #trustlattice-oncall (Slack)
- **Escalation**: VEX Guild Lead
- **Documentation**: `docs/modules/excititor/trust-lattice.md`

---

*Document Version: 1.0.0*
*Sprint: 7100.0003.0002*