feat: add security sink detection patterns for JavaScript/TypeScript

- Introduced `sink-detect.js` with various security sink detection patterns categorized by type (e.g., command injection, SQL injection, file operations).
- Implemented functions to build a lookup map for fast sink detection and to match sink calls against known patterns.
- Added `package-lock.json` for dependency management.
StellaOps Bot
2025-12-22 23:21:21 +02:00
parent 3ba7157b00
commit 5146204f1b
529 changed files with 73579 additions and 5985 deletions

# Router Chaos Testing Runbook
**Sprint:** SPRINT_5100_0005_0001
**Last Updated:** 2025-12-22
## Overview
This document describes the chaos testing approach for the StellaOps router, focusing on backpressure handling, graceful degradation under load, and recovery behavior.
## Test Categories
### 1. Load Testing (k6)
**Location:** `tests/load/router/`
#### Spike Test Scenarios
| Scenario | Rate | Duration | Purpose |
|----------|------|----------|---------|
| Baseline | 100 req/s | 1 min | Establish normal operation |
| 10x Spike | 1000 req/s | 30s | Moderate overload |
| 50x Spike | 5000 req/s | 30s | Severe overload |
| Recovery | 100 req/s | 2 min | Measure recovery time |
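The scenario table maps onto k6's staged load model. A minimal sketch of the `options` object `spike-test.js` might declare (the actual file contents are not shown here, so treat stage targets and thresholds as illustrative):

```javascript
// Hypothetical k6 stage layout mirroring the spike scenario table above.
// In a real k6 script this is exported as `options` next to a default
// function that issues requests against ROUTER_URL.
const options = {
  stages: [
    { duration: "1m", target: 100 },   // Baseline: establish normal operation
    { duration: "30s", target: 1000 }, // 10x spike: moderate overload
    { duration: "30s", target: 5000 }, // 50x spike: severe overload
    { duration: "2m", target: 100 },   // Recovery: measure return to baseline
  ],
  thresholds: {
    // Assumed failure budget for the run; tune per environment.
    http_req_failed: ["rate<0.05"],
  },
};
```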
#### Running Load Tests
```bash
# Install k6
brew install k6 # macOS
# or
choco install k6 # Windows
# Run spike test against local router
k6 run tests/load/router/spike-test.js \
-e ROUTER_URL=http://localhost:8080
# Run against staging
k6 run tests/load/router/spike-test.js \
-e ROUTER_URL=https://router.staging.stellaops.io
# Output results to JSON
k6 run tests/load/router/spike-test.js \
--out json=results.json
```
### 2. Backpressure Verification
**Location:** `tests/chaos/BackpressureVerificationTests.cs`
Tests verify:
- HTTP 429 responses include `Retry-After` header
- HTTP 503 responses include `Retry-After` header
- Retry-After values are reasonable (1-60 seconds)
- No data loss during throttling
#### Expected Behavior
| Load Level | Expected Response | Retry-After |
|------------|-------------------|-------------|
| Normal | 200 OK | N/A |
| High (>80% capacity) | 429 Too Many Requests | 1-10s |
| Critical (>95% capacity) | 503 Service Unavailable | 10-60s |
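The table above amounts to a small decision function over capacity utilization. The router itself is implemented in C#; the sketch below only illustrates the expected mapping, with the exact Retry-After formulas being an assumption:

```javascript
// Illustrative backpressure decision mirroring the Expected Behavior table.
// `utilization` is a 0-1 capacity ratio; thresholds 0.80 and 0.95 come from
// the table, the Retry-After interpolation is an assumption.
function backpressureResponse(utilization) {
  if (utilization > 0.95) {
    // Critical: shed load, ask clients to back off 10-60s.
    return { status: 503, retryAfter: Math.min(60, 10 + Math.round(utilization * 50)) };
  }
  if (utilization > 0.80) {
    // High: throttle with a short Retry-After (1-10s).
    return { status: 429, retryAfter: Math.max(1, Math.round((utilization - 0.80) * 50)) };
  }
  return { status: 200, retryAfter: null };
}
```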
### 3. Recovery Testing
**Location:** `tests/chaos/RecoveryTests.cs`
Tests verify:
- Router recovers within 30 seconds after load drops
- No request queue corruption
- Metrics return to baseline
#### Recovery Thresholds
| Metric | Target | Critical |
|--------|--------|----------|
| P95 Recovery Time | <15s | <30s |
| P99 Recovery Time | <25s | <45s |
| Data Loss | 0% | 0% |
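Recovery percentiles are computed from sampled recovery times. A minimal nearest-rank sketch (not necessarily the exact method `RecoveryTests.cs` uses) evaluated against the threshold table:

```javascript
// Nearest-rank percentile over recovery-time samples (seconds).
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(rank, sorted.length) - 1];
}

// Evaluate a run against the Recovery Thresholds table above.
function recoveryVerdict(samples) {
  const p95 = percentile(samples, 95);
  const p99 = percentile(samples, 99);
  return {
    p95,
    p99,
    pass: p95 < 15 && p99 < 25,       // within Target
    critical: p95 >= 30 || p99 >= 45, // beyond Critical
  };
}
```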
### 4. Valkey Failure Injection
**Location:** `tests/chaos/ValkeyFailureTests.cs`
Tests verify router behavior when Valkey (cache/session store) fails:
- Graceful degradation to stateless mode
- No crashes or hangs
- Proper error logging
- Recovery when Valkey returns
#### Failure Scenarios
| Scenario | Expected Behavior |
|----------|-------------------|
| Valkey unreachable | Fallback to direct processing |
| Valkey slow (>500ms) | Timeout and continue |
| Valkey returns | Resume normal caching |
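The degradation behavior in this table can be sketched as a small tracker that counts consecutive cache failures (unreachable or slower than the 500ms threshold) and falls back to direct processing. The class and field names here are hypothetical; the router's real fallback logic lives in its C# codebase:

```javascript
// Illustrative degradation tracker for the Valkey failure scenarios above.
class CacheFallback {
  constructor(slowThresholdMs = 500, trips = 3) {
    this.slowThresholdMs = slowThresholdMs;
    this.trips = trips;    // consecutive failures before degrading (assumed)
    this.failures = 0;
    this.degraded = false;
  }

  // Record one cache-call observation and return the processing mode.
  observe({ reachable, latencyMs }) {
    const failed = !reachable || latencyMs > this.slowThresholdMs;
    this.failures = failed ? this.failures + 1 : 0;
    if (this.failures >= this.trips) this.degraded = true; // fall back to direct
    if (!failed) this.degraded = false;                    // Valkey returned
    return this.degraded ? "direct" : "cached";
  }
}
```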
## CI Integration
**Workflow:** `.gitea/workflows/router-chaos.yml`
The chaos tests run:
- On every PR to `main` that touches router code
- Nightly against staging environment
- Before production deployments
### Workflow Stages
1. **Build** - Compile router and test projects
2. **Unit Tests** - Run BackpressureVerificationTests
3. **Integration Tests** - Run RecoveryTests, ValkeyFailureTests
4. **Load Tests** - Run k6 spike scenarios (staging only)
5. **Report** - Upload results as artifacts
## Interpreting Results
### Success Criteria
| Metric | Pass | Fail |
|--------|------|------|
| Request success rate during normal load | >=99% | <95% |
| Throttle response rate during spike | >0% (expected) | 0% (no backpressure) |
| Recovery time P95 | <30s | >=45s |
| Data loss | 0% | >0% |
### Common Failure Patterns
#### No Throttling Under Load
**Symptom:** 0% throttled requests during 50x spike
**Cause:** Backpressure not configured or circuit breaker disabled
**Fix:** Check router configuration `backpressure.enabled=true`
#### Slow Recovery
**Symptom:** Recovery time >45s
**Cause:** Request queue not draining properly
**Fix:** Check `maxQueueSize` and `drainTimeoutSeconds` settings
#### Missing Retry-After Header
**Symptom:** 429/503 without Retry-After
**Cause:** Header middleware not applied
**Fix:** Ensure `UseRetryAfterMiddleware()` is in pipeline
## Metrics & Dashboards
### Key Metrics to Monitor
```promql
# Throttle rate
rate(http_requests_total{status="429"}[5m]) / rate(http_requests_total[5m])
# Recovery time
histogram_quantile(0.95, rate(request_recovery_seconds_bucket[5m]))
# Queue depth
router_request_queue_depth
```
### Alert Thresholds
| Alert | Condition | Severity |
|-------|-----------|----------|
| High Throttle Rate | throttle_rate > 10% for 5m | Warning |
| Extended Throttle | throttle_rate > 50% for 2m | Critical |
| Slow Recovery | p95_recovery > 30s | Warning |
| No Recovery | p99_recovery > 60s | Critical |
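These thresholds translate into Prometheus alerting rules along the following lines. Rule names and the recovery-quantile expression are a sketch; adapt the metric names to your deployment:

```yaml
groups:
  - name: router-chaos
    rules:
      - alert: HighThrottleRate
        expr: |
          rate(http_requests_total{status="429"}[5m])
            / rate(http_requests_total[5m]) > 0.10
        for: 5m
        labels:
          severity: warning
      - alert: ExtendedThrottle
        expr: |
          rate(http_requests_total{status="429"}[5m])
            / rate(http_requests_total[5m]) > 0.50
        for: 2m
        labels:
          severity: critical
      - alert: SlowRecovery
        expr: histogram_quantile(0.95, rate(request_recovery_seconds_bucket[5m])) > 30
        labels:
          severity: warning
```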
## Troubleshooting
### Test Environment Setup
```bash
# Start router locally
docker-compose up router valkey
# Verify router health
curl http://localhost:8080/health
# Verify Valkey connection
docker exec -it valkey redis-cli ping
```
### Debug Mode
```bash
# Run tests with verbose logging
dotnet test tests/chaos/ --logger "console;verbosity=detailed"
# k6 with debug output
k6 run tests/load/router/spike-test.js --verbose
```
## References
- [Router Architecture](../modules/router/architecture.md)
- [Backpressure Design](../product-advisories/15-Dec-2025%20-%20Designing%20202%20+%20Retry-After%20Backpressure%20Control.md)
- [Testing Strategy](../product-advisories/20-Dec-2025%20-%20Testing%20strategy.md)

# Trust Lattice Operations Runbook
> **Version**: 1.0.0
> **Last Updated**: 2025-12-22
> **Audience**: Operations and Support teams
---
## 1. Overview
The Trust Lattice is a VEX claim scoring framework that produces explainable, deterministic verdicts. This runbook covers operational procedures for monitoring, troubleshooting, and maintaining the system.
---
## 2. System Components
| Component | Service | Purpose |
|-----------|---------|---------|
| TrustVector | Excititor | 3-component trust scoring (P/C/R) |
| ClaimScoreMerger | Policy | Merge scored claims into verdicts |
| PolicyGates | Policy | Enforce trust thresholds |
| VerdictManifest | Authority | Store signed verdicts |
| Calibration | Excititor | Adjust trust vectors over time |
---
## 3. Monitoring
### 3.1 Key Metrics
| Metric | Alert Threshold | Description |
|--------|-----------------|-------------|
| `trustlattice_score_latency_p95` | > 100ms | Claim scoring latency |
| `trustlattice_merge_conflicts_total` | Rate increase | Claims with status conflicts |
| `policy_gate_failures_total` | Rate increase | Gate rejections |
| `verdict_manifest_replay_failures` | > 0 | Non-deterministic verdicts |
| `calibration_drift_percent` | > 10% | Trust vector drift from baseline |
### 3.2 Dashboards
Access dashboards at:
- Grafana: `https://<grafana>/d/trustlattice`
- Prometheus queries:
```promql
# Average claim score by source class
avg(trustlattice_claim_score) by (source_class)
# Gate failure rate
rate(policy_gate_failures_total[5m])
# Confidence distribution
histogram_quantile(0.5, trustlattice_verdict_confidence_bucket)
```
### 3.3 Log Queries
Key log entries (Loki/ELK):
```
# Claim scoring
{app="excititor"} |= "ClaimScore computed"
# Gate failures
{app="policy"} |= "Gate failed" | json | gate_name != ""
# Verdict replay failures
{app="authority"} |= "Replay mismatch"
```
---
## 4. Common Operations
### 4.1 Viewing Current Trust Vectors
```bash
# Via CLI
stella trustvector list --source-class vendor
# Via API
curl -H "Authorization: Bearer $TOKEN" \
https://api.example.com/api/v1/trustlattice/vectors
```
### 4.2 Inspecting a Verdict
```bash
# Get verdict details
stella verdict show verd:acme:abc123:CVE-2025-12345:1734873600
# Verify verdict replay
stella verdict replay verd:acme:abc123:CVE-2025-12345:1734873600
```
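The verdict identifier in these commands appears to follow a `verd:<tenant>:<asset>:<CVE>:<unix-timestamp>` layout. A hypothetical parser, inferred from the example above rather than from a published spec:

```javascript
// Hypothetical parser for IDs like verd:acme:abc123:CVE-2025-12345:1734873600.
// Field layout is inferred from the runbook examples, not from a spec.
function parseVerdictId(id) {
  const parts = id.split(":");
  if (parts.length !== 5 || parts[0] !== "verd") {
    throw new Error(`unrecognized verdict id: ${id}`);
  }
  const [, tenant, asset, cve, ts] = parts;
  return { tenant, asset, cve, evaluatedAt: new Date(Number(ts) * 1000) };
}
```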
### 4.3 Viewing Gate Configuration
```bash
# List enabled gates
stella gates list --environment production
# Show gate thresholds
stella gates show minimumConfidence --environment production
```
### 4.4 Triggering Manual Calibration
```bash
# Trigger calibration epoch for a source
stella calibration run --source vendor:redhat \
--start 2025-11-01 --end 2025-12-01
# View calibration history
stella calibration history vendor:redhat
```
---
## 5. Emergency Procedures
### 5.1 High Gate Failure Rate
**Symptoms:**
- Spike in `policy_gate_failures_total`
- Many builds failing due to low confidence
**Steps:**
1. Check if VEX source is unavailable:
```bash
stella vex source status vendor:redhat
```
2. If source is stale, consider temporary threshold reduction:
```yaml
# etc/policy-gates.yaml
gates:
  minimumConfidence:
    thresholds:
      production: 0.60  # Reduced from 0.75
```
3. Restart Policy Engine to apply changes
4. Monitor and restore threshold once source recovers
### 5.2 Verdict Replay Failures
**Symptoms:**
- `verdict_manifest_replay_failures` > 0
- Audit compliance check failures
**Steps:**
1. Identify failing verdict:
```bash
stella verdict list --replay-status failed --limit 10
```
2. Compare original and replayed inputs:
```bash
stella verdict diff <manifestId>
```
3. Common causes:
- VEX document modified after verdict
- Clock drift during evaluation
- Policy configuration changed
4. For clock drift, verify NTP synchronization:
```bash
timedatectl status
```
### 5.3 Trust Vector Drift Emergency
**Symptoms:**
- `calibration_drift_percent` > 20%
- Sudden confidence changes across many assets
**Steps:**
1. Freeze calibration:
```bash
stella calibration freeze vendor:redhat
```
2. Investigate recent calibration epochs:
```bash
stella calibration history vendor:redhat --epochs 5
```
3. If false positive rate increased, rollback:
```bash
stella calibration rollback vendor:redhat --to-epoch 41
```
4. Unfreeze after investigation:
```bash
stella calibration unfreeze vendor:redhat
```
---
## 6. Configuration
### 6.1 Configuration Files
| File | Purpose |
|------|---------|
| `etc/trust-lattice.yaml` | Trust vector weights and defaults |
| `etc/policy-gates.yaml` | Gate thresholds and rules |
| `etc/excititor-calibration.yaml` | Calibration parameters |
### 6.2 Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `TRUSTLATTICE_WEIGHTS_PROVENANCE` | 0.45 | Provenance weight |
| `TRUSTLATTICE_WEIGHTS_COVERAGE` | 0.35 | Coverage weight |
| `TRUSTLATTICE_FRESHNESS_HALFLIFE` | 90 | Freshness half-life (days) |
| `GATES_MINIMUM_CONFIDENCE_PROD` | 0.75 | Production confidence threshold |
| `CALIBRATION_LEARNING_RATE` | 0.02 | Calibration learning rate |
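Using the defaults above, claim scoring reduces to a weighted sum of the P/C/R components decayed by freshness. The replayability weight is not listed; the sketch below assumes it is the remainder (0.20, so the weights sum to 1.0), which is an inference, not documented configuration:

```javascript
// Sketch of trust-weighted claim scoring using the environment defaults above.
// WEIGHTS.replayability = 0.20 is an assumption (remainder to 1.0).
const WEIGHTS = { provenance: 0.45, coverage: 0.35, replayability: 0.20 };
const FRESHNESS_HALFLIFE_DAYS = 90;

// Exponential decay: the multiplier halves every FRESHNESS_HALFLIFE_DAYS.
function freshnessMultiplier(ageDays) {
  return Math.pow(0.5, ageDays / FRESHNESS_HALFLIFE_DAYS);
}

function claimScore(vector, ageDays) {
  const base =
    WEIGHTS.provenance * vector.provenance +
    WEIGHTS.coverage * vector.coverage +
    WEIGHTS.replayability * vector.replayability;
  return base * freshnessMultiplier(ageDays);
}
```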
---
## 7. Maintenance Tasks
### 7.1 Daily
- [ ] Review gate failure alerts
- [ ] Check verdict replay success rate
- [ ] Monitor trust vector stability
### 7.2 Weekly
- [ ] Review calibration epoch results
- [ ] Analyze conflict rate trends
- [ ] Update trust vectors for new sources
### 7.3 Monthly
- [ ] Audit high-drift sources
- [ ] Review and tune gate thresholds
- [ ] Clean up expired verdict manifests
---
## 8. Contact
- **On-call**: #trustlattice-oncall (Slack)
- **Escalation**: VEX Guild Lead
- **Documentation**: `docs/modules/excititor/trust-lattice.md`
---
*Document Version: 1.0.0*
*Sprint: 7100.0003.0002*

# Trust Lattice Troubleshooting Guide
> **Version**: 1.0.0
> **Last Updated**: 2025-12-22
> **Audience**: Support and Development teams
---
## Quick Reference
| Symptom | Likely Cause | Section |
|---------|--------------|---------|
| Low confidence scores | Stale VEX data or missing sources | [2.1](#21-low-confidence-scores) |
| Gate failures blocking builds | Threshold too high or source issues | [2.2](#22-gate-failures) |
| Verdict replay mismatches | Non-deterministic inputs | [2.3](#23-verdict-replay-failures) |
| Unexpected trust changes | Calibration drift | [2.4](#24-calibration-issues) |
| Conflicting verdicts | Multi-source disagreement | [2.5](#25-claim-conflicts) |
---
## 1. Diagnostic Commands
### 1.1 Check System Health
```bash
# Excititor health
curl https://api.example.com/excititor/health
# Policy Engine health
curl https://api.example.com/policy/health
# Authority health
curl https://api.example.com/authority/health
```
### 1.2 Trace a Verdict
```bash
# Get detailed verdict explanation
stella verdict explain <manifestId>
# Output includes:
# - All claims considered
# - Trust vector scores
# - Strength/freshness multipliers
# - Gate evaluation results
# - Conflict detection
```
### 1.3 Check VEX Source Status
```bash
# List all sources with status
stella vex source list
# Check specific source
stella vex source status vendor:redhat
# Sample output:
# Source: vendor:redhat
# Status: healthy
# Last fetch: 2025-12-22T10:00:00Z
# Documents: 15234
# Freshness: 2.3 hours
```
---
## 2. Common Issues
### 2.1 Low Confidence Scores
**Symptoms:**
- Verdicts have confidence < 0.5
- Many "under_investigation" statuses
**Diagnosis:**
1. Check claim freshness:
```bash
stella claim analyze --cve CVE-2025-12345 --asset sha256:abc123
# Look for:
# - Freshness multiplier < 0.5 (claim older than 180 days)
# - No high-trust sources
```
2. Check trust vector values:
```bash
stella trustvector show vendor:redhat
# Low scores indicate:
# - Signature verification issues (P)
# - Poor scope matching (C)
# - Non-deterministic outputs (R)
```
3. Check for missing VEX coverage:
```bash
stella vex coverage --purl pkg:npm/lodash@4.17.21
# No claims? Source may not cover this package
```
**Resolution:**
- If freshness is low: Check if source is publishing updates
- If trust vector is low: Review source verification settings
- If coverage is missing: Add additional VEX sources
### 2.2 Gate Failures
**Symptoms:**
- Builds failing with "Gate: MinimumConfidenceGate FAILED"
- Policy violations despite VEX claims
**Diagnosis:**
1. Check gate thresholds:
```bash
stella gates show minimumConfidence
# Thresholds:
# production: 0.75
# staging: 0.60
# development: 0.40
```
2. Compare with verdict confidence:
```bash
stella verdict show <manifestId> | grep confidence
# confidence: 0.68 <- Below 0.75 production threshold
```
3. Check which gate failed:
```bash
stella verdict gates <manifestId>
# Gates:
# MinimumConfidenceGate: FAILED (0.68 < 0.75)
# SourceQuotaGate: PASSED
# UnknownsBudgetGate: PASSED
```
**Resolution:**
- Temporary: Lower threshold (with approval)
- Long-term: Add corroborating VEX sources
- If single-source: Check SourceQuotaGate corroboration
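The MinimumConfidenceGate evaluation shown above amounts to a per-environment threshold comparison. A sketch using the thresholds from `stella gates show` (output format is illustrative):

```javascript
// Sketch of MinimumConfidenceGate using the thresholds shown above.
const THRESHOLDS = { production: 0.75, staging: 0.60, development: 0.40 };

function minimumConfidenceGate(confidence, environment) {
  const threshold = THRESHOLDS[environment];
  if (threshold === undefined) throw new Error(`unknown environment: ${environment}`);
  const passed = confidence >= threshold;
  return {
    gate: "MinimumConfidenceGate",
    passed,
    detail: passed ? "PASSED" : `FAILED (${confidence} < ${threshold})`,
  };
}
```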
### 2.3 Verdict Replay Failures
**Symptoms:**
- Replay verification returns success: false
- Audit failures due to non-determinism
**Diagnosis:**
1. Get detailed diff:
```bash
stella verdict replay --diff <manifestId>
# Differences:
# result.confidence: 0.82 -> 0.79
# inputs.vexDocumentDigests[2]: sha256:abc... (missing)
```
2. Common causes:
| Difference | Likely Cause |
|------------|--------------|
| VEX digest mismatch | Document was modified after verdict |
| Confidence delta | Clock cutoff drift (freshness calc) |
| Missing claims | Source was unavailable during replay |
| Different status | Policy version changed |
3. Check input availability:
```bash
# Verify all pinned inputs exist
stella cas verify --digest sha256:abc123
```
**Resolution:**
- Clock drift: Ensure NTP synchronization across nodes
- Missing inputs: Restore from backup or acknowledge drift
- Policy change: Compare policy hashes between original and replay
### 2.4 Calibration Issues
**Symptoms:**
- Trust vectors changed unexpectedly
- Accuracy metrics declining
**Diagnosis:**
1. Review recent calibrations:
```bash
stella calibration history vendor:redhat --epochs 5
# Epoch 42: accuracy=0.92, delta=(-0.02, +0.02, 0)
# Epoch 41: accuracy=0.94, delta=(-0.01, +0.01, 0)
```
2. Check comparison results:
```bash
stella calibration epoch 42 --details
# Total claims: 1500
# Correct: 1380
# False positives: 45
# False negatives: 75
# Detected bias: OptimisticBias
```
3. Check for data quality issues:
```bash
# Look for corrupted truth data
stella calibration validate-truth --epoch 42
```
**Resolution:**
- High false-positive rate: Reduce provenance score
- High false-negative rate: Review coverage matching
- Data quality issue: Re-run with corrected truth set
- Emergency: Rollback to previous epoch
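The epoch deltas above (e.g. `delta=(-0.02, +0.02, 0)` with `CALIBRATION_LEARNING_RATE=0.02`) are consistent with a simple learning-rate nudge per detected bias. The mechanism below is an assumption for illustration, not the documented calibration algorithm:

```javascript
// Hypothetical calibration step: on OptimisticBias, nudge provenance down and
// coverage up by the learning rate, clamped to [0, 1].
const LEARNING_RATE = 0.02;

function calibrate(vector, bias) {
  const clamp = (x) => Math.min(1, Math.max(0, x));
  if (bias === "OptimisticBias") {
    // Source over-claims not_affected: trust its provenance slightly less.
    return {
      provenance: clamp(vector.provenance - LEARNING_RATE),
      coverage: clamp(vector.coverage + LEARNING_RATE),
      replayability: vector.replayability,
    };
  }
  return { ...vector }; // no detected bias: leave the vector unchanged
}
```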
### 2.5 Claim Conflicts
**Symptoms:**
- Verdicts show hasConflicts: true
- Confidence reduced due to conflict penalty
**Diagnosis:**
1. View conflict details:
```bash
stella verdict conflicts <manifestId>
# Conflicts:
# vendor:redhat claims: not_affected
# hub:osv claims: affected
# Conflict penalty applied: 0.25
```
2. Investigate source disagreement:
```bash
# Get raw claims from each source
stella vex claim --source vendor:redhat --cve CVE-2025-12345
stella vex claim --source hub:osv --cve CVE-2025-12345
```
3. Check claim timestamps:
```bash
# Older claim may be outdated
stella claim compare vendor:redhat hub:osv --cve CVE-2025-12345
```
**Resolution:**
- If one source is stale: Flag for review
- If genuine disagreement: Higher-trust source wins (by design)
- If persistent: Consider source override in policy
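The "higher-trust source wins" rule combined with the 0.25 conflict penalty seen in `stella verdict conflicts` output can be sketched as follows. The real merge lives in the Policy service's ClaimScoreMerger; this is only illustrative:

```javascript
// Illustrative merge: the highest-scored claim sets the status; any status
// disagreement marks the verdict conflicted and applies the penalty.
const CONFLICT_PENALTY = 0.25;

function mergeClaims(claims) {
  if (claims.length === 0) throw new Error("no claims to merge");
  const sorted = [...claims].sort((a, b) => b.score - a.score);
  const winner = sorted[0];
  const hasConflicts = claims.some((c) => c.status !== winner.status);
  const confidence = hasConflicts
    ? Math.max(0, winner.score - CONFLICT_PENALTY)
    : winner.score;
  return { status: winner.status, confidence, hasConflicts };
}
```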
---
## 3. Performance Issues
### 3.1 Slow Claim Scoring
**Symptoms:**
- Scoring latency > 100ms
- Timeouts during high load
**Diagnosis:**
```bash
# Check scoring performance
stella perf scoring --samples 100
# Look for:
# - Cache miss rate
# - Trust vector lookups
# - Freshness calculation overhead
```
**Resolution:**
- Enable trust vector caching
- Pre-compute freshness for common cutoffs
- Scale Excititor horizontally
### 3.2 Slow Verdict Replay
**Symptoms:**
- Replay verification > 5 seconds
- Timeout during audit
**Diagnosis:**
```bash
# Check input retrieval time
stella verdict replay --timing <manifestId>
# Timing:
# Input fetch: 3.2s
# Score compute: 0.1s
# Merge: 0.05s
# Total: 3.35s
```
**Resolution:**
- Ensure CAS storage is local or cached
- Pre-warm verdict cache for critical assets
- Increase timeout for large manifests
---
## 4. Integration Issues
### 4.1 VEX Source Not Recognized
**Symptoms:**
- Claims from source not included in verdicts
- Source shows as "unknown" class
**Resolution:**
1. Register source in configuration:
```yaml
# etc/trust-lattice.yaml
sources:
  - id: vendor:newvendor
    class: vendor
    trustVector:
      provenance: 0.85
      coverage: 0.70
      replayability: 0.60
```
2. Reload configuration:
```bash
stella config reload --service excititor
```
### 4.2 Gate Not Evaluating
**Symptoms:**
- Expected gate not appearing in results
- Gate shows as "disabled"
**Resolution:**
1. Check gate configuration:
```bash
stella gates list --show-disabled
```
2. Enable gate:
```yaml
# etc/policy-gates.yaml
gates:
  minimumConfidence:
    enabled: true  # Ensure this is true
```
---
## 5. Support Information
### 5.1 Collecting Diagnostic Bundle
```bash
stella support bundle --include trust-lattice \
--since 1h --output /tmp/diag.zip
```
Bundle includes:
- Trust vector snapshots
- Recent verdicts
- Gate evaluations
- Calibration history
- System metrics
### 5.2 Log Locations
| Service | Log Path |
|---------|----------|
| Excititor | `/var/log/stellaops/excititor.log` |
| Policy | `/var/log/stellaops/policy.log` |
| Authority | `/var/log/stellaops/authority.log` |
### 5.3 Contact
- **Support**: support@stella-ops.org
- **Documentation**: `docs/modules/excititor/trust-lattice.md`
- **GitHub Issues**: https://github.com/stella-ops/stella-ops/issues
---
*Document Version: 1.0.0*
*Sprint: 7100.0003.0002*