feat: add security sink detection patterns for JavaScript/TypeScript
- Introduced `sink-detect.js` with various security sink detection patterns categorized by type (e.g., command injection, SQL injection, file operations).
- Implemented functions to build a lookup map for fast sink detection and to match sink calls against known patterns.
- Added `package-lock.json` for dependency management.
197
docs/operations/router-chaos-testing-runbook.md
Normal file
@@ -0,0 +1,197 @@
# Router Chaos Testing Runbook

**Sprint:** SPRINT_5100_0005_0001
**Last Updated:** 2025-12-22

## Overview

This document describes the chaos testing approach for the StellaOps router, focusing on backpressure handling, graceful degradation under load, and recovery behavior.

## Test Categories

### 1. Load Testing (k6)

**Location:** `tests/load/router/`

#### Spike Test Scenarios

| Scenario | Rate | Duration | Purpose |
|----------|------|----------|---------|
| Baseline | 100 req/s | 1 min | Establish normal operation |
| 10x Spike | 1000 req/s | 30s | Moderate overload |
| 50x Spike | 5000 req/s | 30s | Severe overload |
| Recovery | 100 req/s | 2 min | Measure recovery time |
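
To get a feel for the volume each stage generates, the scenario table can be turned into simple arithmetic. This is a minimal sketch, assuming the stages run back to back at a constant arrival rate; the `Stage` class and `spike_multiplier` helper are hypothetical names and do not come from `spike-test.js`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    rate_per_s: int  # target request rate
    duration_s: int  # stage length in seconds

    @property
    def total_requests(self) -> int:
        # Constant-rate assumption: requests = rate * duration.
        return self.rate_per_s * self.duration_s


# Values copied from the scenario table above.
STAGES = [
    Stage("Baseline", 100, 60),
    Stage("10x Spike", 1000, 30),
    Stage("50x Spike", 5000, 30),
    Stage("Recovery", 100, 120),
]


def spike_multiplier(stage: Stage, baseline: Stage) -> float:
    """How many times the baseline rate a stage targets."""
    return stage.rate_per_s / baseline.rate_per_s


if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage.name:<10} {stage.total_requests:>7} requests "
              f"({spike_multiplier(stage, STAGES[0]):.0f}x baseline)")
    print("total:", sum(s.total_requests for s in STAGES))
```

Under these assumptions the 50x stage alone accounts for most of the traffic in a run, which is worth keeping in mind when sizing the staging environment.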

#### Running Load Tests

```bash
# Install k6
brew install k6  # macOS
# or
choco install k6  # Windows

# Run spike test against local router
k6 run tests/load/router/spike-test.js \
  -e ROUTER_URL=http://localhost:8080

# Run against staging
k6 run tests/load/router/spike-test.js \
  -e ROUTER_URL=https://router.staging.stellaops.io

# Output results to JSON
k6 run tests/load/router/spike-test.js \
  --out json=results.json
```

### 2. Backpressure Verification

**Location:** `tests/chaos/BackpressureVerificationTests.cs`

Tests verify:
- HTTP 429 responses include `Retry-After` header
- HTTP 503 responses include `Retry-After` header
- Retry-After values are reasonable (1-60 seconds)
- No data loss during throttling

#### Expected Behavior

| Load Level | Expected Response | Retry-After |
|------------|-------------------|-------------|
| Normal | 200 OK | N/A |
| High (>80% capacity) | 429 Too Many Requests | 1-10s |
| Critical (>95% capacity) | 503 Service Unavailable | 10-60s |
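
The expected-behavior table can be expressed as a small lookup, which is handy when writing assertions for a new test. The sketch below is illustrative only: `expected_response` and `retry_after_ok` are hypothetical helpers, load is assumed to be a fraction of capacity, and only the delay-seconds form of `Retry-After` is handled.

```python
from typing import Optional, Tuple

HIGH_LOAD = 0.80      # above this the table expects 429
CRITICAL_LOAD = 0.95  # above this the table expects 503

# Expected Retry-After range in seconds per throttling status (from the table).
RETRY_AFTER_RANGES = {429: (1, 10), 503: (10, 60)}


def expected_response(load: float) -> Tuple[int, Optional[Tuple[int, int]]]:
    """Map load (fraction of capacity) to (status, Retry-After range)."""
    if load > CRITICAL_LOAD:
        return 503, RETRY_AFTER_RANGES[503]
    if load > HIGH_LOAD:
        return 429, RETRY_AFTER_RANGES[429]
    return 200, None


def retry_after_ok(status: int, retry_after: Optional[str]) -> bool:
    """Check a raw Retry-After header value against the range for a status."""
    if status not in RETRY_AFTER_RANGES:
        # Non-throttled responses are expected to carry no Retry-After.
        return retry_after is None
    try:
        seconds = int(retry_after)
    except (TypeError, ValueError):
        # Missing header or HTTP-date form: treated as invalid in this sketch.
        return False
    low, high = RETRY_AFTER_RANGES[status]
    return low <= seconds <= high
```

For example, `retry_after_ok(503, "5")` is false because 5 seconds is below the 10-60s range expected at critical load.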

### 3. Recovery Testing

**Location:** `tests/chaos/RecoveryTests.cs`

Tests verify:
- Router recovers within 30 seconds after load drops
- No request queue corruption
- Metrics return to baseline

#### Recovery Thresholds

| Metric | Target | Critical |
|--------|--------|----------|
| P95 Recovery Time | <15s | <30s |
| P99 Recovery Time | <25s | <45s |
| Data Loss | 0% | 0% |
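
As an illustration of how a set of measured recovery times maps onto these thresholds, the sketch below computes nearest-rank percentiles and labels each one. It is a sketch under assumptions: the percentile method and the label strings are choices made here, not taken from `RecoveryTests.cs`.

```python
from typing import Dict, List

# (target, critical) limits in seconds, copied from the table above.
THRESHOLDS = {95: (15.0, 30.0), 99: (25.0, 45.0)}


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def classify(samples: List[float]) -> Dict[int, str]:
    """Label P95 and P99 as within target, within critical, or breach."""
    result = {}
    for pct, (target, critical) in THRESHOLDS.items():
        value = percentile(samples, pct)
        if value < target:
            result[pct] = "within target"
        elif value < critical:
            result[pct] = "within critical"
        else:
            result[pct] = "breach"
    return result
```

With twenty samples of 1s through 20s, P95 is 19s (over the 15s target but under the 30s critical limit) and P99 is 20s (under the 25s target).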

### 4. Valkey Failure Injection

**Location:** `tests/chaos/ValkeyFailureTests.cs`

Tests verify router behavior when Valkey (cache/session store) fails:
- Graceful degradation to stateless mode
- No crashes or hangs
- Proper error logging
- Recovery when Valkey returns

#### Failure Scenarios

| Scenario | Expected Behavior |
|----------|-------------------|
| Valkey unreachable | Fallback to direct processing |
| Valkey slow (>500ms) | Timeout and continue |
| Valkey returns | Resume normal caching |
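
The three scenarios describe a timeout-and-fallback pattern. The sketch below shows one way such a pattern can look; it is not the router's implementation. `fetch_with_fallback`, `cache_get`, and `compute` are hypothetical names, and the 500ms budget is taken from the table.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Optional

CACHE_TIMEOUT_S = 0.5  # "Valkey slow (>500ms)" threshold from the table above

_pool = ThreadPoolExecutor(max_workers=4)


def fetch_with_fallback(
    key: str,
    cache_get: Callable[[str], Optional[str]],
    compute: Callable[[str], str],
    timeout_s: float = CACHE_TIMEOUT_S,
) -> str:
    """Try the cache first; on timeout or connection failure, process directly."""
    future = _pool.submit(cache_get, key)
    try:
        cached = future.result(timeout=timeout_s)
    except (FutureTimeout, ConnectionError):
        # Slow or unreachable cache: degrade to stateless processing.
        return compute(key)
    # A cache miss (None) also falls through to direct processing.
    return cached if cached is not None else compute(key)
```

Note that a timed-out lookup keeps running in its worker thread in this sketch; a real client would normally enforce the timeout at the socket level instead.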

## CI Integration

**Workflow:** `.gitea/workflows/router-chaos.yml`

The chaos tests run:
- On every PR to `main` that touches router code
- Nightly against staging environment
- Before production deployments

### Workflow Stages

1. **Build** - Compile router and test projects
2. **Unit Tests** - Run BackpressureVerificationTests
3. **Integration Tests** - Run RecoveryTests, ValkeyFailureTests
4. **Load Tests** - Run k6 spike scenarios (staging only)
5. **Report** - Upload results as artifacts

## Interpreting Results

### Success Criteria

| Metric | Pass | Fail |
|--------|------|------|
| Request success rate during normal load | >=99% | <95% |
| Throttle response rate during spike | >0% (expected) | 0% (no backpressure) |
| Recovery time P95 | <30s | >=45s |
| Data loss | 0% | >0% |
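
The pass and fail columns leave a band in between for two metrics (a success rate from 95% up to 99%, and a P95 recovery time from 30s up to 45s). The sketch below applies the table literally and labels that band "inconclusive"; that label and the `judge_run` helper are assumptions made here, since the table does not say how such results are treated.

```python
from typing import Dict


def judge_run(success_rate: float, throttle_rate: float,
              recovery_p95_s: float, data_loss: float) -> Dict[str, str]:
    """Apply the success criteria; rates are fractions in [0, 1]."""

    def verdict(passed: bool, failed: bool) -> str:
        if passed:
            return "pass"
        return "fail" if failed else "inconclusive"

    return {
        "success_rate": verdict(success_rate >= 0.99, success_rate < 0.95),
        "throttle_rate": verdict(throttle_rate > 0.0, throttle_rate == 0.0),
        "recovery_p95": verdict(recovery_p95_s < 30.0, recovery_p95_s >= 45.0),
        "data_loss": verdict(data_loss == 0.0, data_loss > 0.0),
    }
```

A run with a 97% success rate and a 40s P95 recovery would come back "inconclusive" on both metrics, which is a signal to inspect the run by hand instead of relying on the table alone.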

### Common Failure Patterns

#### No Throttling Under Load
**Symptom:** 0% throttled requests during 50x spike
**Cause:** Backpressure not configured or circuit breaker disabled
**Fix:** Check router configuration `backpressure.enabled=true`

#### Slow Recovery
**Symptom:** Recovery time >45s
**Cause:** Request queue not draining properly
**Fix:** Check `maxQueueSize` and `drainTimeoutSeconds` settings

#### Missing Retry-After Header
**Symptom:** 429/503 without Retry-After
**Cause:** Header middleware not applied
**Fix:** Ensure `UseRetryAfterMiddleware()` is in pipeline

## Metrics & Dashboards

### Key Metrics to Monitor

```promql
# Throttle rate (fraction of all requests answered with 429)
sum(rate(http_requests_total{status="429"}[5m])) / sum(rate(http_requests_total[5m]))

# Recovery time
histogram_quantile(0.95, sum by (le) (rate(request_recovery_seconds_bucket[5m])))

# Queue depth
router_request_queue_depth
```

### Alert Thresholds

| Alert | Condition | Severity |
|-------|-----------|----------|
| High Throttle Rate | throttle_rate > 10% for 5m | Warning |
| Extended Throttle | throttle_rate > 50% for 2m | Critical |
| Slow Recovery | p95_recovery > 30s | Warning |
| No Recovery | p99_recovery > 60s | Critical |

## Troubleshooting

### Test Environment Setup

```bash
# Start router locally
docker-compose up router valkey

# Verify router health
curl http://localhost:8080/health

# Verify Valkey connection
docker exec -it valkey redis-cli ping
```

### Debug Mode

```bash
# Run tests with verbose logging
dotnet test tests/chaos/ --logger "console;verbosity=detailed"

# k6 with debug output
k6 run tests/load/router/spike-test.js --verbose
```

## References

- [Router Architecture](../modules/router/architecture.md)
- [Backpressure Design](../product-advisories/15-Dec-2025%20-%20Designing%20202%20+%20Retry-After%20Backpressure%20Control.md)
- [Testing Strategy](../product-advisories/20-Dec-2025%20-%20Testing%20strategy.md)
253
docs/operations/trust-lattice-runbook.md
Normal file
@@ -0,0 +1,253 @@
# Trust Lattice Operations Runbook

> **Version**: 1.0.0
> **Last Updated**: 2025-12-22
> **Audience**: Operations and Support teams

---

## 1. Overview

The Trust Lattice is a VEX claim scoring framework that produces explainable, deterministic verdicts. This runbook covers operational procedures for monitoring, troubleshooting, and maintaining the system.

---

## 2. System Components

| Component | Service | Purpose |
|-----------|---------|---------|
| TrustVector | Excititor | 3-component trust scoring (P/C/R) |
| ClaimScoreMerger | Policy | Merge scored claims into verdicts |
| PolicyGates | Policy | Enforce trust thresholds |
| VerdictManifest | Authority | Store signed verdicts |
| Calibration | Excititor | Adjust trust vectors over time |

---

## 3. Monitoring

### 3.1 Key Metrics

| Metric | Alert Threshold | Description |
|--------|-----------------|-------------|
| `trustlattice_score_latency_p95` | > 100ms | Claim scoring latency |
| `trustlattice_merge_conflicts_total` | Rate increase | Claims with status conflicts |
| `policy_gate_failures_total` | Rate increase | Gate rejections |
| `verdict_manifest_replay_failures` | > 0 | Non-deterministic verdicts |
| `calibration_drift_percent` | > 10% | Trust vector drift from baseline |

### 3.2 Dashboards

Access dashboards at:
- Grafana: `https://<grafana>/d/trustlattice`
- Prometheus queries:
```promql
# Average claim score by source class
avg(trustlattice_claim_score) by (source_class)

# Gate failure rate
rate(policy_gate_failures_total[5m])

# Confidence distribution (median over the last 5 minutes)
histogram_quantile(0.5, sum by (le) (rate(trustlattice_verdict_confidence_bucket[5m])))
```

### 3.3 Log Queries

Key log entries (Loki/ELK):
```
# Claim scoring
{app="excititor"} |= "ClaimScore computed"

# Gate failures
{app="policy"} |= "Gate failed" | json | gate_name != ""

# Verdict replay failures
{app="authority"} |= "Replay mismatch"
```

---

## 4. Common Operations

### 4.1 Viewing Current Trust Vectors

```bash
# Via CLI
stella trustvector list --source-class vendor

# Via API
curl -H "Authorization: Bearer $TOKEN" \
  https://api.example.com/api/v1/trustlattice/vectors
```

### 4.2 Inspecting a Verdict

```bash
# Get verdict details
stella verdict show verd:acme:abc123:CVE-2025-12345:1734873600

# Verify verdict replay
stella verdict replay verd:acme:abc123:CVE-2025-12345:1734873600
```

### 4.3 Viewing Gate Configuration

```bash
# List enabled gates
stella gates list --environment production

# Show gate thresholds
stella gates show minimumConfidence --environment production
```

### 4.4 Triggering Manual Calibration

```bash
# Trigger calibration epoch for a source
stella calibration run --source vendor:redhat \
  --start 2025-11-01 --end 2025-12-01

# View calibration history
stella calibration history vendor:redhat
```

---

## 5. Emergency Procedures

### 5.1 High Gate Failure Rate

**Symptoms:**
- Spike in `policy_gate_failures_total`
- Many builds failing due to low confidence

**Steps:**
1. Check whether the VEX source is unavailable:
```bash
stella vex source status vendor:redhat
```

2. If the source is stale, consider a temporary threshold reduction:
```yaml
# Edit etc/policy-gates.yaml
gates:
  minimumConfidence:
    thresholds:
      production: 0.60  # Reduced from 0.75
```

3. Restart the Policy Engine to apply the change

4. Monitor, and restore the threshold once the source recovers

### 5.2 Verdict Replay Failures

**Symptoms:**
- `verdict_manifest_replay_failures` > 0
- Audit compliance check failures

**Steps:**
1. Identify the failing verdict:
```bash
stella verdict list --replay-status failed --limit 10
```

2. Compare original and replayed inputs:
```bash
stella verdict diff <manifestId>
```

3. Common causes:
- VEX document modified after verdict
- Clock drift during evaluation
- Policy configuration changed

4. For clock drift, verify NTP synchronization:
```bash
timedatectl status
```

### 5.3 Trust Vector Drift Emergency

**Symptoms:**
- `calibration_drift_percent` > 20%
- Sudden confidence changes across many assets

**Steps:**
1. Freeze calibration:
```bash
stella calibration freeze vendor:redhat
```

2. Investigate recent calibration epochs:
```bash
stella calibration history vendor:redhat --epochs 5
```

3. If the false positive rate increased, roll back:
```bash
stella calibration rollback vendor:redhat --to-epoch 41
```

4. Unfreeze after investigation:
```bash
stella calibration unfreeze vendor:redhat
```

---

## 6. Configuration

### 6.1 Configuration Files

| File | Purpose |
|------|---------|
| `etc/trust-lattice.yaml` | Trust vector weights and defaults |
| `etc/policy-gates.yaml` | Gate thresholds and rules |
| `etc/excititor-calibration.yaml` | Calibration parameters |

### 6.2 Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `TRUSTLATTICE_WEIGHTS_PROVENANCE` | 0.45 | Provenance weight |
| `TRUSTLATTICE_WEIGHTS_COVERAGE` | 0.35 | Coverage weight |
| `TRUSTLATTICE_FRESHNESS_HALFLIFE` | 90 | Freshness half-life (days) |
| `GATES_MINIMUM_CONFIDENCE_PROD` | 0.75 | Production confidence threshold |
| `CALIBRATION_LEARNING_RATE` | 0.02 | Calibration learning rate |
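
To make the weight and half-life defaults more concrete, here is a rough sketch of how they could combine. It rests on assumptions that this runbook does not confirm: that freshness decays exponentially (which the term "half-life" suggests), that base trust is a weighted sum of the P/C/R components, and that the replayability weight is whatever remains after provenance and coverage. The function names are hypothetical.

```python
from typing import Optional

HALF_LIFE_DAYS = 90.0  # TRUSTLATTICE_FRESHNESS_HALFLIFE default
W_PROVENANCE = 0.45    # TRUSTLATTICE_WEIGHTS_PROVENANCE default
W_COVERAGE = 0.35      # TRUSTLATTICE_WEIGHTS_COVERAGE default


def freshness_multiplier(age_days: float,
                         half_life_days: float = HALF_LIFE_DAYS) -> float:
    """ASSUMPTION: plain exponential decay, halving every half_life_days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def base_trust(provenance: float, coverage: float, replayability: float,
               w_replayability: Optional[float] = None) -> float:
    """ASSUMPTION: weighted sum of the three trust vector components."""
    if w_replayability is None:
        # ASSUMPTION: replayability takes the remaining weight (about 0.20).
        w_replayability = 1.0 - W_PROVENANCE - W_COVERAGE
    return (W_PROVENANCE * provenance
            + W_COVERAGE * coverage
            + w_replayability * replayability)
```

Under these assumptions a 90-day-old claim keeps half of its weight and a 180-day-old claim a quarter; the authoritative formula is in the Excititor trust lattice documentation referenced in the Contact section.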

---

## 7. Maintenance Tasks

### 7.1 Daily

- [ ] Review gate failure alerts
- [ ] Check verdict replay success rate
- [ ] Monitor trust vector stability

### 7.2 Weekly

- [ ] Review calibration epoch results
- [ ] Analyze conflict rate trends
- [ ] Update trust vectors for new sources

### 7.3 Monthly

- [ ] Audit high-drift sources
- [ ] Review and tune gate thresholds
- [ ] Clean up expired verdict manifests

---

## 8. Contact

- **On-call**: #trustlattice-oncall (Slack)
- **Escalation**: VEX Guild Lead
- **Documentation**: `docs/modules/excititor/trust-lattice.md`

---

*Document Version: 1.0.0*
*Sprint: 7100.0003.0002*
405
docs/operations/trust-lattice-troubleshooting.md
Normal file
@@ -0,0 +1,405 @@
# Trust Lattice Troubleshooting Guide

> **Version**: 1.0.0
> **Last Updated**: 2025-12-22
> **Audience**: Support and Development teams

---

## Quick Reference

| Symptom | Likely Cause | Section |
|---------|--------------|---------|
| Low confidence scores | Stale VEX data or missing sources | [2.1](#21-low-confidence-scores) |
| Gate failures blocking builds | Threshold too high or source issues | [2.2](#22-gate-failures) |
| Verdict replay mismatches | Non-deterministic inputs | [2.3](#23-verdict-replay-failures) |
| Unexpected trust changes | Calibration drift | [2.4](#24-calibration-issues) |
| Conflicting verdicts | Multi-source disagreement | [2.5](#25-claim-conflicts) |

---

## 1. Diagnostic Commands

### 1.1 Check System Health

```bash
# Excititor health
curl https://api.example.com/excititor/health

# Policy Engine health
curl https://api.example.com/policy/health

# Authority health
curl https://api.example.com/authority/health
```

### 1.2 Trace a Verdict

```bash
# Get detailed verdict explanation
stella verdict explain <manifestId>

# Output includes:
# - All claims considered
# - Trust vector scores
# - Strength/freshness multipliers
# - Gate evaluation results
# - Conflict detection
```

### 1.3 Check VEX Source Status

```bash
# List all sources with status
stella vex source list

# Check specific source
stella vex source status vendor:redhat

# Sample output:
# Source: vendor:redhat
# Status: healthy
# Last fetch: 2025-12-22T10:00:00Z
# Documents: 15234
# Freshness: 2.3 hours
```

---

## 2. Common Issues

### 2.1 Low Confidence Scores

**Symptoms:**
- Verdicts have confidence < 0.5
- Many "under_investigation" statuses

**Diagnosis:**

1. Check claim freshness:
```bash
stella claim analyze --cve CVE-2025-12345 --asset sha256:abc123

# Look for:
# - Freshness multiplier < 0.5 (claim older than one half-life; 90 days with the default TRUSTLATTICE_FRESHNESS_HALFLIFE)
# - No high-trust sources
```

2. Check trust vector values:
```bash
stella trustvector show vendor:redhat

# Low scores indicate:
# - Signature verification issues (P)
# - Poor scope matching (C)
# - Non-deterministic outputs (R)
```

3. Check for missing VEX coverage:
```bash
stella vex coverage --purl pkg:npm/lodash@4.17.21

# No claims? Source may not cover this package
```

**Resolution:**

- If freshness is low: Check whether the source is still publishing updates
- If trust vector is low: Review source verification settings
- If coverage is missing: Add additional VEX sources

### 2.2 Gate Failures

**Symptoms:**
- Builds failing with "Gate: MinimumConfidenceGate FAILED"
- Policy violations despite VEX claims

**Diagnosis:**

1. Check gate thresholds:
```bash
stella gates show minimumConfidence

# Thresholds:
# production: 0.75
# staging: 0.60
# development: 0.40
```

2. Compare with verdict confidence:
```bash
stella verdict show <manifestId> | grep confidence

# confidence: 0.68 <- Below 0.75 production threshold
```

3. Check which gate failed:
```bash
stella verdict gates <manifestId>

# Gates:
# MinimumConfidenceGate: FAILED (0.68 < 0.75)
# SourceQuotaGate: PASSED
# UnknownsBudgetGate: PASSED
```

**Resolution:**

- Temporary: Lower threshold (with approval)
- Long-term: Add corroborating VEX sources
- If single-source: Check SourceQuotaGate corroboration
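
The threshold comparison in the diagnosis steps above can be reproduced offline when triaging a batch of verdicts. This is a minimal sketch using the sample thresholds shown in step 1; `minimum_confidence_gate` is a hypothetical helper, and treating a confidence exactly equal to the threshold as a pass is an assumption.

```python
# Sample per-environment thresholds from the diagnosis output above.
THRESHOLDS = {"production": 0.75, "staging": 0.60, "development": 0.40}


def minimum_confidence_gate(confidence: float, environment: str) -> str:
    """Return a PASSED/FAILED line in the style of `stella verdict gates`."""
    threshold = THRESHOLDS[environment]
    if confidence >= threshold:  # ASSUMPTION: equality counts as a pass
        return f"PASSED ({confidence:.2f} >= {threshold:.2f})"
    return f"FAILED ({confidence:.2f} < {threshold:.2f})"


if __name__ == "__main__":
    for env in THRESHOLDS:
        print(env, minimum_confidence_gate(0.68, env))
```

The same 0.68 verdict fails in production but passes in staging and development, which is why a build can be blocked in only one environment.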

### 2.3 Verdict Replay Failures

**Symptoms:**
- Replay verification returns success: false
- Audit failures due to non-determinism

**Diagnosis:**

1. Get detailed diff:
```bash
stella verdict replay --diff <manifestId>

# Differences:
# result.confidence: 0.82 -> 0.79
# inputs.vexDocumentDigests[2]: sha256:abc... (missing)
```

2. Common causes:

| Difference | Likely Cause |
|------------|--------------|
| VEX digest mismatch | Document was modified after verdict |
| Confidence delta | Clock cutoff drift (freshness calc) |
| Missing claims | Source was unavailable during replay |
| Different status | Policy version changed |

3. Check input availability:
```bash
# Verify all pinned inputs exist
stella cas verify --digest sha256:abc123
```

**Resolution:**

- Clock drift: Ensure NTP synchronization across nodes
- Missing inputs: Restore from backup or acknowledge drift
- Policy change: Compare policy hashes between original and replay

### 2.4 Calibration Issues

**Symptoms:**
- Trust vectors changed unexpectedly
- Accuracy metrics declining

**Diagnosis:**

1. Review recent calibrations:
```bash
stella calibration history vendor:redhat --epochs 5

# Epoch 42: accuracy=0.92, delta=(-0.02, +0.02, 0)
# Epoch 41: accuracy=0.94, delta=(-0.01, +0.01, 0)
```

2. Check comparison results:
```bash
stella calibration epoch 42 --details

# Total claims: 1500
# Correct: 1380
# False positives: 45
# False negatives: 75
# Detected bias: OptimisticBias
```

3. Check for data quality issues:
```bash
# Look for corrupted truth data
stella calibration validate-truth --epoch 42
```

**Resolution:**

- High false positive: Reduce provenance score
- High false negative: Review coverage matching
- Data quality issue: Re-run with corrected truth set
- Emergency: Rollback to previous epoch
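
The epoch figures in step 2 can be cross-checked by hand: the three counts should add up to the total, and the accuracy reported in the history should equal correct divided by total. A small sketch follows; `epoch_shares` is a hypothetical helper, and the shares are fractions of all claims, not classical false positive or false negative rates.

```python
from typing import Dict


def epoch_shares(total: int, correct: int,
                 false_pos: int, false_neg: int) -> Dict[str, float]:
    """Return accuracy and error shares as fractions of all claims."""
    if total <= 0:
        raise ValueError("total must be positive")
    if correct + false_pos + false_neg != total:
        raise ValueError("counts do not add up to total")
    return {
        "accuracy": correct / total,
        "false_positive_share": false_pos / total,
        "false_negative_share": false_neg / total,
    }


if __name__ == "__main__":
    # Sample counts from the epoch 42 output above.
    print(epoch_shares(1500, 1380, 45, 75))
```

For the sample epoch, 1380 of 1500 claims gives the 0.92 accuracy shown in the history, with 3% false positives and 5% false negatives.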

### 2.5 Claim Conflicts

**Symptoms:**
- Verdicts show hasConflicts: true
- Confidence reduced due to conflict penalty

**Diagnosis:**

1. View conflict details:
```bash
stella verdict conflicts <manifestId>

# Conflicts:
# vendor:redhat claims: not_affected
# hub:osv claims: affected
# Conflict penalty applied: 0.25
```

2. Investigate source disagreement:
```bash
# Get raw claims from each source
stella vex claim --source vendor:redhat --cve CVE-2025-12345
stella vex claim --source hub:osv --cve CVE-2025-12345
```

3. Check claim timestamps:
```bash
# Older claim may be outdated
stella claim compare vendor:redhat hub:osv --cve CVE-2025-12345
```

**Resolution:**

- If one source is stale: Flag for review
- If genuine disagreement: Higher-trust source wins (by design)
- If persistent: Consider source override in policy

---

## 3. Performance Issues

### 3.1 Slow Claim Scoring

**Symptoms:**
- Scoring latency > 100ms
- Timeouts during high load

**Diagnosis:**

```bash
# Check scoring performance
stella perf scoring --samples 100

# Look for:
# - Cache miss rate
# - Trust vector lookups
# - Freshness calculation overhead
```

**Resolution:**

- Enable trust vector caching
- Pre-compute freshness for common cutoffs
- Scale Excititor horizontally

### 3.2 Slow Verdict Replay

**Symptoms:**
- Replay verification > 5 seconds
- Timeout during audit

**Diagnosis:**

```bash
# Check input retrieval time
stella verdict replay --timing <manifestId>

# Timing:
# Input fetch: 3.2s
# Score compute: 0.1s
# Merge: 0.05s
# Total: 3.35s
```

**Resolution:**

- Ensure CAS storage is local or cached
- Pre-warm verdict cache for critical assets
- Increase timeout for large manifests
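
When reading a timing breakdown like the sample in the diagnosis step, it helps to see which stage dominates before choosing a fix. A small sketch, assuming the timings have already been parsed into a dictionary; `dominant_stage` is a hypothetical helper and not part of the `stella` CLI.

```python
from typing import Dict, Tuple


def dominant_stage(timings_s: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage and its share of the total time."""
    if not timings_s:
        raise ValueError("no timings")
    total = sum(timings_s.values())
    name = max(timings_s, key=timings_s.get)
    return name, timings_s[name] / total


if __name__ == "__main__":
    # Sample values from the timing output above.
    sample = {"Input fetch": 3.2, "Score compute": 0.1, "Merge": 0.05}
    stage, share = dominant_stage(sample)
    print(f"{stage}: {share:.0%} of {sum(sample.values()):.2f}s")
```

In the sample, input fetch accounts for roughly 96% of the replay time, which points at CAS locality or caching and not at scoring or merging.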

---

## 4. Integration Issues

### 4.1 VEX Source Not Recognized

**Symptoms:**
- Claims from source not included in verdicts
- Source shows as "unknown" class

**Resolution:**

1. Register source in configuration:
```yaml
# etc/trust-lattice.yaml
sources:
  - id: vendor:newvendor
    class: vendor
    trustVector:
      provenance: 0.85
      coverage: 0.70
      replayability: 0.60
```

2. Reload configuration:
```bash
stella config reload --service excititor
```

### 4.2 Gate Not Evaluating

**Symptoms:**
- Expected gate not appearing in results
- Gate shows as "disabled"

**Resolution:**

1. Check gate configuration:
```bash
stella gates list --show-disabled
```

2. Enable gate:
```yaml
# etc/policy-gates.yaml
gates:
  minimumConfidence:
    enabled: true  # Ensure this is true
```

---

## 5. Support Information

### 5.1 Collecting Diagnostic Bundle

```bash
stella support bundle --include trust-lattice \
  --since 1h --output /tmp/diag.zip
```

Bundle includes:
- Trust vector snapshots
- Recent verdicts
- Gate evaluations
- Calibration history
- System metrics

### 5.2 Log Locations

| Service | Log Path |
|---------|----------|
| Excititor | `/var/log/stellaops/excititor.log` |
| Policy | `/var/log/stellaops/policy.log` |
| Authority | `/var/log/stellaops/authority.log` |

### 5.3 Contact

- **Support**: support@stella-ops.org
- **Documentation**: `docs/modules/excititor/trust-lattice.md`
- **GitHub Issues**: https://github.com/stella-ops/stella-ops/issues

---

*Document Version: 1.0.0*
*Sprint: 7100.0003.0002*