docs consolidation

2025-12-25 12:16:13 +02:00
parent deb82b4f03
commit 223843f1d1
34 changed files with 2141 additions and 106 deletions
--- a/docs/modules/policy/design/confidence-to-ews-migration.md
+++ b/docs/modules/policy/design/confidence-to-ews-migration.md
@@ -0,0 +1,422 @@
+# Confidence to Evidence-Weighted Score Migration Guide
+
+> **Version:** 1.0  
+> **Status:** Active  
+> **Last Updated:** 2025-12-31  
+> **Sprint:** 8200.0012.0003 (Policy Engine Integration)
+
+## Overview
+
+This document describes the migration path from the legacy **Confidence** scoring system to the new **Evidence-Weighted Score (EWS)** system. The migration is designed to be gradual, low-risk, and fully reversible at each stage.
+
+### Key Differences
+
+| Aspect | Confidence (Legacy) | Evidence-Weighted Score |
+|--------|---------------------|------------------------|
+| **Score Range** | 0.0–1.0 (decimal) | 0–100 (integer) |
+| **Direction** | Higher = more confident | Higher = higher risk/priority |
+| **Basis** | Heuristic confidence in finding | Evidence-backed exploitability |
+| **Breakdown** | Single value | 6 dimensions (Rch, Rts, Bkp, Xpl, Src, Mit) |
+| **Determinism** | Limited | Fully deterministic with proofs |
+| **Attestation** | Not attested | Included in verdict attestation |
+
+### Semantic Inversion
+
+The most important difference is **semantic inversion**:
+
+- **Confidence**: Higher values indicate higher confidence that a finding is accurate
+- **EWS**: Higher values indicate higher exploitability evidence (more urgent to fix)
+
+A high-confidence finding may have a low EWS if evidence shows it's mitigated. Conversely, a low-confidence finding may have a high EWS if runtime signals indicate active exploitation.
+
+---
+
+## Migration Phases
+
+### Phase 1: Feature Flag (Opt-In)
+
+**Duration:** Immediate → 2 weeks  
+**Risk:** None (off by default)
+
+Deploy the EWS configuration with every flag off, so the scoring code ships but stays dormant:
+
+```json
+{
+  "Policy": {
+    "EvidenceWeightedScore": {
+      "Enabled": false,
+      "DualEmitMode": false,
+      "UseAsPrimaryScore": false
+    }
+  }
+}
+```
+
+**What happens:**
+- EWS infrastructure is loaded but dormant
+- No performance impact
+- No output changes
+
+**When to proceed:** After infrastructure validation
+
+---
+
+### Phase 2: Dual-Emit Mode (Parallel Calculation)
+
+**Duration:** 2–4 weeks  
+**Risk:** Low (additive only)
+
+Enable both scoring systems in parallel:
+
+```json
+{
+  "Policy": {
+    "EvidenceWeightedScore": {
+      "Enabled": true,
+      "DualEmitMode": true,
+      "UseAsPrimaryScore": false
+    }
+  }
+}
+```
+
+**What happens:**
+- Both Confidence AND EWS are calculated
+- Both appear in verdicts and attestations
+- Telemetry compares rankings
+- Existing rules use Confidence (unchanged behavior)
+
+**Verdict output example:**
+```json
+{
+  "findingId": "CVE-2024-1234:pkg:npm/lodash@4.17.0",
+  "status": "block",
+  "confidence": {
+    "value": 0.85,
+    "tier": "High"
+  },
+  "evidenceWeightedScore": {
+    "score": 72,
+    "bucket": "ScheduleNext",
+    "breakdown": {
+      "rch": { "weighted": 18, "weight": 0.25, "raw": 0.72 },
+      "rts": { "weighted": 24, "weight": 0.30, "raw": 0.80 },
+      "bkp": { "weighted": 0, "weight": 0.10, "raw": 0.00 },
+      "xpl": { "weighted": 10, "weight": 0.15, "raw": 0.67 },
+      "src": { "weighted": 12, "weight": 0.15, "raw": 0.80 },
+      "mit": { "weighted": 8, "weight": 0.05, "raw": 1.60 }
+    },
+    "flags": ["live-signal"],
+    "explanations": ["Runtime signal detected (score +8)", "Reachable via call graph"]
+  }
+}
+```
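+
+The breakdown in this example is internally consistent under one simple reading, sketched below in Python. The formula is an assumption inferred from the numbers (each `weighted` value equals `raw × weight × 100` rounded to the nearest integer, and `score` is their sum); it is not confirmed as the engine's actual algorithm. The `mit` raw value of 1.60 falls outside the 0–1 range of the other dimensions and is reproduced here as-is.
+
+```python
+# (raw, weight) pairs copied from the verdict example above.
+BREAKDOWN = {
+    "rch": (0.72, 0.25),
+    "rts": (0.80, 0.30),
+    "bkp": (0.00, 0.10),
+    "xpl": (0.67, 0.15),
+    "src": (0.80, 0.15),
+    "mit": (1.60, 0.05),
+}
+
+
+def weighted_points(raw: float, weight: float) -> int:
+    """Assumed formula: one dimension's contribution on the 0-100 scale."""
+    return round(raw * weight * 100)
+
+
+def total_score(breakdown: dict) -> int:
+    """Assumed aggregation: plain sum of the per-dimension contributions."""
+    return sum(weighted_points(raw, weight) for raw, weight in breakdown.values())
+```
+
+Under these assumptions, `total_score(BREAKDOWN)` gives 72, which matches the `score` field in the example.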
+
+**Monitoring during this phase:**
+- Use `IMigrationTelemetryService` to track alignment
+- Review `MigrationTelemetryStats.AlignmentRate`
+- Investigate samples where rankings diverge significantly
+
+**When to proceed:** When `AlignmentRate` exceeds 80%, or when the remaining divergences have been investigated and explained
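+
+The numeric half of that gate can be sketched as follows, assuming the alignment rate is simply aligned comparisons divided by total comparisons. The function and parameter names are hypothetical and are not part of `IMigrationTelemetryService`; whether divergences are "understood" remains a human judgment.
+
+```python
+def alignment_rate(aligned: int, comparisons: int) -> float:
+    """Fraction of comparisons in which both systems ranked a finding alike."""
+    if comparisons <= 0:
+        return 0.0  # No data yet: report zero rather than dividing by zero.
+    return aligned / comparisons
+
+
+def ready_for_phase_3(aligned: int, comparisons: int, threshold: float = 0.80) -> bool:
+    """True when the alignment rate is strictly above the threshold."""
+    return alignment_rate(aligned, comparisons) > threshold
+```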
+
+---
+
+### Phase 3: EWS as Primary (Shadow Confidence)
+
+**Duration:** 2–4 weeks  
+**Risk:** Medium (behavior change possible)
+
+Switch primary scoring while keeping Confidence for validation:
+
+```json
+{
+  "Policy": {
+    "EvidenceWeightedScore": {
+      "Enabled": true,
+      "DualEmitMode": true,
+      "UseAsPrimaryScore": true
+    }
+  }
+}
+```
+
+**What happens:**
+- EWS is used for policy rule evaluation
+- Confidence is still calculated and emitted (deprecated field)
+- Policy rules should be migrated to use `score` instead of `confidence`
+
+**Rule migration example:**
+
+Before (Confidence-based):
+```yaml
+rules:
+  - name: block-high-confidence
+    when: confidence >= 0.9
+    then: block
+```
+
+After (EWS-based):
+```yaml
+rules:
+  - name: block-high-evidence
+    when: score >= 85
+    then: block
+
+  # Or use bucket-based for clearer semantics:
+  - name: block-act-now
+    when: score.bucket == "ActNow"
+    then: block
+```
+
+**Recommended rule patterns:**
+
+| Confidence Rule | EWS Equivalent | Notes |
+|----------------|----------------|-------|
+| `confidence >= 0.9` | `score >= 85` or `score.bucket == "ActNow"` | Very high certainty |
+| `confidence >= 0.7` | `score >= 60` or `score.bucket in ["ActNow", "ScheduleNext"]` | High certainty |
+| `confidence >= 0.5` | `score >= 40` | Medium certainty |
+| `confidence < 0.3` | `score < 25` | Low evidence |
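+
+The table can be captured as a lookup so that a rule inventory is rewritten consistently. This is a sketch only: the `translate` helper is hypothetical, and the thresholds are the suggested starting points from the table rather than exact equivalents, because the two scores measure different things.
+
+```python
+# (operator, confidence threshold) -> suggested EWS condition, from the table above.
+SUGGESTED = {
+    (">=", 0.9): "score >= 85",
+    (">=", 0.7): "score >= 60",
+    (">=", 0.5): "score >= 40",
+    ("<", 0.3): "score < 25",
+}
+
+
+def translate(operator: str, threshold: float) -> str:
+    """Return the suggested EWS condition, or raise if the table has no entry."""
+    try:
+        return SUGGESTED[(operator, threshold)]
+    except KeyError:
+        raise ValueError(
+            f"no suggested mapping for 'confidence {operator} {threshold}'; review manually"
+        ) from None
+```
+
+Raising on an unlisted threshold, instead of interpolating, keeps a human in the loop for rules the table does not cover.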
+
+**When to proceed:** After rule migration and 2+ weeks of stable operation
+
+---
+
+### Phase 4: EWS-Only (Deprecation Complete)
+
+**Duration:** Permanent  
+**Risk:** Low (rollback path exists)
+
+Disable legacy Confidence scoring:
+
+```json
+{
+  "Policy": {
+    "EvidenceWeightedScore": {
+      "Enabled": true,
+      "DualEmitMode": false,
+      "UseAsPrimaryScore": true
+    }
+  }
+}
+```
+
+**What happens:**
+- Only EWS is calculated
+- Confidence field is null in verdicts
+- Performance improvement (single calculation)
+- Consumers must use EWS fields
+
+**Breaking changes to document:**
+- `Verdict.Confidence` returns null
+- `ConfidenceScore` type is deprecated (will be removed in v3.0)
+- Rules referencing `confidence` will fail validation
+
+---
+
+## Configuration Reference
+
+### Full Configuration Schema
+
+```json
+{
+  "Policy": {
+    "EvidenceWeightedScore": {
+      "Enabled": true,
+      "DualEmitMode": true,
+      "UseAsPrimaryScore": false,
+      "EnableCaching": true,
+      "CacheDurationSeconds": 300,
+      "Weights": {
+        "Reachability": 0.25,
+        "RuntimeSignal": 0.30,
+        "BackportStatus": 0.10,
+        "ExploitMaturity": 0.15,
+        "SourceTrust": 0.15,
+        "MitigationStatus": 0.05
+      },
+      "BucketThresholds": {
+        "ActNow": 85,
+        "ScheduleNext": 60,
+        "Investigate": 40
+      },
+      "Telemetry": {
+        "EnableMigrationMetrics": true,
+        "SampleRate": 0.1,
+        "MaxSamples": 1000
+      }
+    }
+  }
+}
+```
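+
+How `BucketThresholds` might translate a score into a bucket is sketched below in Python. It assumes each threshold is an inclusive lower bound, which agrees with the verdict example (72 falls in `ScheduleNext`) but is not stated explicitly. This guide does not name the bucket below `Investigate`, so the sketch returns `None` for it.
+
+```python
+from typing import Optional
+
+# Default thresholds from the configuration schema above, highest first.
+BUCKET_THRESHOLDS = [("ActNow", 85), ("ScheduleNext", 60), ("Investigate", 40)]
+
+
+def bucket_for(score: int) -> Optional[str]:
+    """Return the first bucket whose assumed lower bound the score reaches."""
+    if not 0 <= score <= 100:
+        raise ValueError("EWS must be between 0 and 100")
+    for name, lower_bound in BUCKET_THRESHOLDS:
+        if score >= lower_bound:
+            return name
+    return None  # Below "Investigate": bucket name not documented in this guide.
+```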
+
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `POLICY_EWS_ENABLED` | `false` | Enable EWS calculation |
+| `POLICY_EWS_DUAL_EMIT` | `false` | Emit both scores |
+| `POLICY_EWS_PRIMARY` | `false` | Use EWS as primary score |
+| `POLICY_EWS_CACHE_ENABLED` | `true` | Enable score caching |
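+
+Taken together, the first three flags identify the migration phase. The sketch below derives the phase from an environment mapping. The `migration_phase` helper is hypothetical; it assumes the values are the literal strings `true` and `false`, and it treats any configuration with EWS disabled as Phase 1.
+
+```python
+import os
+from typing import Mapping
+
+
+def _flag(env: Mapping[str, str], name: str) -> bool:
+    """Read a boolean flag; unset variables default to false, as in the table."""
+    return env.get(name, "false").strip().lower() == "true"
+
+
+def migration_phase(env: Mapping[str, str] = os.environ) -> int:
+    """Map the three flags onto the phase numbers used in this guide."""
+    enabled = _flag(env, "POLICY_EWS_ENABLED")
+    dual_emit = _flag(env, "POLICY_EWS_DUAL_EMIT")
+    primary = _flag(env, "POLICY_EWS_PRIMARY")
+
+    if not enabled:
+        return 1  # Assumption: the other flags are ignored while EWS is off.
+    if dual_emit:
+        return 3 if primary else 2
+    if primary:
+        return 4
+    raise ValueError("EWS enabled without dual-emit or primary is not a documented phase")
+```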
+
+---
+
+## Telemetry & Monitoring
+
+### Metrics
+
+The migration telemetry service exposes these metrics:
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `stellaops.policy.migration.comparisons_total` | Counter | Total comparisons made |
+| `stellaops.policy.migration.aligned_total` | Counter | Comparisons where rankings aligned |
+| `stellaops.policy.migration.score_difference` | Histogram | Distribution of score differences |
+| `stellaops.policy.migration.tier_bucket_match_total` | Counter | Tier/bucket matches |
+| `stellaops.policy.dual_emit.verdicts_total` | Counter | Dual-emit verdicts produced |
+
+### Dashboard Queries
+
+**Alignment rate over time:**
+```promql
+sum(rate(stellaops_policy_migration_aligned_total[5m]))
+/ sum(rate(stellaops_policy_migration_comparisons_total[5m]))
+```
+
+**Score difference distribution:**
+```promql
+histogram_quantile(0.95, sum by (le) (rate(stellaops_policy_migration_score_difference_bucket[5m])))
+```
+
+### Sample Analysis
+
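+Use `IMigrationTelemetryService.GetRecentSamples()` to fetch recent samples, then filter for the divergent ones:
+
+```csharp
+using System;
+using System.Linq;
+using Microsoft.Extensions.DependencyInjection;
+
+// `serviceProvider` is the application's configured IServiceProvider.
+var telemetry = serviceProvider.GetRequiredService<IMigrationTelemetryService>();
+var stats = telemetry.GetStats();
+
+// Only inspect samples when fewer than 80% of rankings agree.
+if (stats.AlignmentRate < 0.8m)
+{
+    // Keep the misaligned samples, largest score gap first.
+    var samples = telemetry.GetRecentSamples(50)
+        .Where(s => !s.IsAligned)
+        .OrderByDescending(s => Math.Abs(s.ScoreDifference));
+
+    foreach (var sample in samples)
+    {
+        Console.WriteLine($"{sample.FindingId}: Conf={sample.ConfidenceValue:F2} → EWS={sample.EwsScore} (Δ={sample.ScoreDifference})");
+    }
+}
+```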
+
+---
+
+## Rollback Procedures
+
+### Phase 4 → Phase 3 (Re-enable Dual-Emit)
+
+```json
+{
+  "Policy": {
+    "EvidenceWeightedScore": {
+      "DualEmitMode": true
+    }
+  }
+}
+```
+
+Restart services. Confidence will be calculated again.
+
+### Phase 3 → Phase 2 (Revert to Confidence Primary)
+
+```json
+{
+  "Policy": {
+    "EvidenceWeightedScore": {
+      "UseAsPrimaryScore": false
+    }
+  }
+}
+```
+
+Rules using `confidence` drive policy evaluation again. Rules using `score` keep working, because dual-emit mode still calculates EWS.
+
+### Phase 2 → Phase 1 (Disable EWS)
+
+```json
+{
+  "Policy": {
+    "EvidenceWeightedScore": {
+      "Enabled": false
+    }
+  }
+}
+```
+
+No EWS calculation, no performance impact.
+
+### Emergency Rollback
+
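+Disable EWS through its environment variable, set in the environment the service process reads (an `export` in an unrelated shell does not reach a running process). The change takes effect without a restart only if hot-reload is enabled: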
+
+```bash
+export POLICY_EWS_ENABLED=false
+```
+
+---
+
+## Rule Migration Checklist
+
+- [ ] Inventory all policies using `confidence` field
+- [ ] Map confidence thresholds to EWS thresholds (see table above)
+- [ ] Update rules to use `score` syntax
+- [ ] Consider using bucket-based rules for clearer semantics
+- [ ] Test rules in dual-emit mode before switching primary
+- [ ] Update documentation and runbooks
+- [ ] Train operators on new score interpretation
+- [ ] Update alerting thresholds
+
+---
+
+## FAQ
+
+### Q: Will existing rules break?
+
+**A:** Not during dual-emit mode (Phase 2), where rules using `confidence` continue to work unchanged. Once `UseAsPrimaryScore` is `true` (Phase 3), write new rules against `score`; existing `confidence` rules still run but emit deprecation warnings. In Phase 4 they fail validation.
+
+### Q: How do I interpret the score difference?
+
+**A:** The `ConfidenceToEwsAdapter` maps Confidence (0–1) to an approximate EWS (0–100) with semantic inversion. A difference of up to ±15 points is normal because the two models rest on different inputs. Investigate differences greater than 30 points.
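+
+This guidance can be expressed as a small triage helper. The sketch is hypothetical: the labels are illustrative, and the handling of differences between 15 and 30 points, which the answer above does not address, is an assumption.
+
+```python
+def triage_difference(score_difference: float) -> str:
+    """Classify an adapter-vs-EWS difference by its magnitude in points."""
+    magnitude = abs(score_difference)
+    if magnitude <= 15:
+        return "normal"
+    if magnitude > 30:
+        return "investigate"
+    return "monitor"  # Assumption: the 15-30 band is left to operator discretion.
+```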
+
+### Q: What if my rankings diverge significantly?
+
+**A:** This is expected for findings where:
+- Runtime signals (Rts) differ from static analysis
+- Vendor VEX overrides traditional severity
+- Reachability analysis shows unreachable code
+
+Review these cases manually. EWS is likely more accurate due to evidence integration.
+
+### Q: Can I customize the EWS weights?
+
+**A:** Yes, via `Weights` configuration. However, changing weights affects determinism proofs. Document any changes and bump the policy version.
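+
+Before deploying custom weights, a quick sanity check can catch typos. The sketch below assumes the six weights must be non-negative and sum to 1.0, as the defaults do; this guide does not state whether the engine enforces that, and `validate_weights` is a hypothetical helper.
+
+```python
+import math
+
+# Default weights from the configuration schema in this guide.
+DEFAULT_WEIGHTS = {
+    "Reachability": 0.25,
+    "RuntimeSignal": 0.30,
+    "BackportStatus": 0.10,
+    "ExploitMaturity": 0.15,
+    "SourceTrust": 0.15,
+    "MitigationStatus": 0.05,
+}
+
+
+def validate_weights(weights: dict) -> None:
+    """Raise ValueError if the weights break the assumed constraints."""
+    if set(weights) != set(DEFAULT_WEIGHTS):
+        raise ValueError("weights must define exactly the six documented dimensions")
+    if any(value < 0 for value in weights.values()):
+        raise ValueError("weights must be non-negative")
+    total = sum(weights.values())
+    if not math.isclose(total, 1.0, abs_tol=1e-9):
+        raise ValueError(f"weights sum to {total}, expected 1.0")
+```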
+
+### Q: What about attestations?
+
+**A:** During dual-emit, attestations include both scores. After Phase 4, only EWS is attested. Old attestations remain verifiable with their original scores.
+
+---
+
+## Related Documents
+
+- [Evidence-Weighted Score Architecture](../../signals/architecture.md)
+- [Policy DSL Reference](../contracts/policy-dsl.md)
+- [Verdict Attestation](../verdict-attestation.md)
+- [Sprint 8200.0012.0003](../../../../implplan/SPRINT_8200_0012_0003_policy_engine_integration.md)
+
+---
+
+## Revision History
+
+| Version | Date | Author | Changes |
+|---------|------|--------|---------|
+| 1.0 | 2025-12-31 | Implementer | Initial migration guide |