# Confidence to Evidence-Weighted Score Migration Guide

> **Version:** 1.0
> **Status:** Active
> **Last Updated:** 2025-12-31
> **Sprint:** 8200.0012.0003 (Policy Engine Integration)

## Overview

This document describes the migration path from the legacy **Confidence** scoring system to the new **Evidence-Weighted Score (EWS)** system. The migration is designed to be gradual, low-risk, and fully reversible at each stage.

### Key Differences

| Aspect | Confidence (Legacy) | Evidence-Weighted Score |
|--------|---------------------|------------------------|
| **Score Range** | 0.0–1.0 (decimal) | 0–100 (integer) |
| **Direction** | Higher = more confident | Higher = higher risk/priority |
| **Basis** | Heuristic confidence in finding | Evidence-backed exploitability |
| **Breakdown** | Single value | 6 dimensions (Rch, Rts, Bkp, Xpl, Src, Mit) |
| **Determinism** | Limited | Fully deterministic with proofs |
| **Attestation** | Not attested | Included in verdict attestation |

### Semantic Inversion

The most important difference is **semantic inversion**:

- **Confidence**: Higher values indicate higher confidence that a finding is accurate
- **EWS**: Higher values indicate higher exploitability evidence (more urgent to fix)

A high-confidence finding may have a low EWS if evidence shows it's mitigated. Conversely, a low-confidence finding may have a high EWS if runtime signals indicate active exploitation.

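
To make the inversion concrete, the following minimal Python sketch ranks three findings by each score. The `Finding` type, the finding IDs, and every score value are hypothetical and chosen only for illustration; none of them come from the policy engine. Under these assumed values, sorting by Confidence and sorting by EWS produce opposite orders, which is why the two scores should not be treated as interchangeable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding carrying both scores, as in dual-emit output."""
    finding_id: str
    confidence: float  # legacy score, 0.0-1.0, higher = more likely accurate
    ews: int           # Evidence-Weighted Score, 0-100, higher = more urgent


# Illustrative values only (assumptions, not real engine output).
FINDINGS = [
    Finding("finding-a", confidence=0.92, ews=18),  # accurate, but mitigated
    Finding("finding-b", confidence=0.35, ews=88),  # uncertain, but live runtime signal
    Finding("finding-c", confidence=0.70, ews=55),
]


def rank_by_confidence(findings):
    """Return finding IDs ordered from highest to lowest Confidence."""
    ordered = sorted(findings, key=lambda f: f.confidence, reverse=True)
    return [f.finding_id for f in ordered]


def rank_by_ews(findings):
    """Return finding IDs ordered from highest to lowest EWS."""
    ordered = sorted(findings, key=lambda f: f.ews, reverse=True)
    return [f.finding_id for f in ordered]


if __name__ == "__main__":
    print("By Confidence:", rank_by_confidence(FINDINGS))
    print("By EWS:       ", rank_by_ews(FINDINGS))
```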

---

## Migration Phases

### Phase 1: Feature Flag (Opt-In)

**Duration:** Immediate → 2 weeks
**Risk:** None (off by default)

Deploy the EWS configuration with every flag off, so behavior does not change:

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": false,
      "DualEmitMode": false,
      "UseAsPrimaryScore": false
    }
  }
}
```

**What happens:**

- EWS infrastructure is loaded but dormant
- No performance impact
- No output changes

**When to proceed:** After infrastructure validation

---

### Phase 2: Dual-Emit Mode (Parallel Calculation)

**Duration:** 2–4 weeks
**Risk:** Low (additive only)

Enable both scoring systems in parallel:

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": false
    }
  }
}
```

**What happens:**

- Both Confidence AND EWS are calculated
- Both appear in verdicts and attestations
- Telemetry compares rankings
- Existing rules use Confidence (unchanged behavior)

**Verdict output example:**

```json
{
  "findingId": "CVE-2024-1234:pkg:npm/lodash@4.17.0",
  "status": "block",
  "confidence": {
    "value": 0.85,
    "tier": "High"
  },
  "evidenceWeightedScore": {
    "score": 72,
    "bucket": "ScheduleNext",
    "breakdown": {
      "rch": { "weighted": 18, "weight": 0.25, "raw": 0.72 },
      "rts": { "weighted": 24, "weight": 0.30, "raw": 0.80 },
      "bkp": { "weighted": 0, "weight": 0.10, "raw": 0.00 },
      "xpl": { "weighted": 10, "weight": 0.15, "raw": 0.67 },
      "src": { "weighted": 12, "weight": 0.15, "raw": 0.80 },
      "mit": { "weighted": 8, "weight": 0.05, "raw": 1.60 }
    },
    "flags": ["live-signal"],
    "explanations": ["Runtime signal detected (score +8)", "Reachable via call graph"]
  }
}
```

**Monitoring during this phase:**

- Use `IMigrationTelemetryService` to track alignment
- Review `MigrationTelemetryStats.AlignmentRate`
- Investigate samples where rankings diverge significantly

**When to proceed:** When the alignment rate exceeds 80%, or the remaining divergences are understood

---

### Phase 3: EWS as Primary (Shadow Confidence)

**Duration:** 2–4 weeks
**Risk:** Medium (behavior change
possible)

Switch primary scoring while keeping Confidence for validation:

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": true
    }
  }
}
```

**What happens:**

- EWS is used for policy rule evaluation
- Confidence is still calculated and emitted (deprecated field)
- Policy rules should be migrated to use `score` instead of `confidence`

**Rule migration example:**

Before (Confidence-based):

```yaml
rules:
  - name: block-high-confidence
    when: confidence >= 0.9
    then: block
```

After (EWS-based):

```yaml
rules:
  - name: block-high-evidence
    when: score >= 85
    then: block

  # Or use bucket-based for clearer semantics:
  - name: block-act-now
    when: score.bucket == "ActNow"
    then: block
```

**Recommended rule patterns:**

| Confidence Rule | EWS Equivalent | Notes |
|----------------|----------------|-------|
| `confidence >= 0.9` | `score >= 85` or `score.bucket == "ActNow"` | Very high certainty |
| `confidence >= 0.7` | `score >= 60` or `score.bucket in ["ActNow", "ScheduleNext"]` | High certainty |
| `confidence >= 0.5` | `score >= 40` | Medium certainty |
| `confidence < 0.3` | `score < 25` | Low evidence |

Treat these mappings as starting points rather than exact equivalents: the two scores measure different things (see Semantic Inversion), so validate each migrated rule against dual-emit data.

**When to proceed:** After rule migration and 2+ weeks of stable operation

---

### Phase 4: EWS-Only (Deprecation Complete)

**Duration:** Permanent
**Risk:** Low (rollback path exists)

Disable legacy Confidence scoring:

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": false,
      "UseAsPrimaryScore": true
    }
  }
}
```

**What happens:**

- Only EWS is calculated
- Confidence field is null in verdicts
- Performance improvement (single calculation)
- Consumers must use EWS fields

**Breaking changes to document:**

- `Verdict.Confidence` returns null
- `ConfidenceScore` type is deprecated (will be removed in v3.0)
- Rules referencing `confidence` will fail validation

---

## Configuration Reference

### Full Configuration Schema

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": false,
      "EnableCaching": true,
      "CacheDurationSeconds": 300,
      "Weights": {
        "Reachability": 0.25,
        "RuntimeSignal": 0.30,
        "BackportStatus": 0.10,
        "ExploitMaturity": 0.15,
        "SourceTrust": 0.15,
        "MitigationStatus": 0.05
      },
      "BucketThresholds": {
        "ActNow": 85,
        "ScheduleNext": 60,
        "Investigate": 40
      },
      "Telemetry": {
        "EnableMigrationMetrics": true,
        "SampleRate": 0.1,
        "MaxSamples": 1000
      }
    }
  }
}
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `POLICY_EWS_ENABLED` | `false` | Enable EWS calculation |
| `POLICY_EWS_DUAL_EMIT` | `false` | Emit both scores |
| `POLICY_EWS_PRIMARY` | `false` | Use EWS as primary score |
| `POLICY_EWS_CACHE_ENABLED` | `true` | Enable score caching |

---

## Telemetry & Monitoring

### Metrics

The migration telemetry service exposes these metrics:

| Metric | Type | Description |
|--------|------|-------------|
| `stellaops.policy.migration.comparisons_total` | Counter | Total comparisons made |
| `stellaops.policy.migration.aligned_total` | Counter | Comparisons where rankings aligned |
| `stellaops.policy.migration.score_difference` | Histogram | Distribution of score differences |
| `stellaops.policy.migration.tier_bucket_match_total` | Counter | Tier/bucket matches |
| `stellaops.policy.dual_emit.verdicts_total` | Counter | Dual-emit verdicts produced |

### Dashboard Queries

**Alignment rate over time:**

```promql
rate(stellaops_policy_migration_aligned_total[5m])
/
rate(stellaops_policy_migration_comparisons_total[5m])
```

**Score difference distribution (95th percentile):**

```promql
histogram_quantile(
  0.95,
  sum by (le) (rate(stellaops_policy_migration_score_difference_bucket[5m]))
)
```

### Sample Analysis

Use `IMigrationTelemetryService.GetRecentSamples()` to retrieve recent samples, then filter for the divergent ones:

```csharp
using System;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;

// Resolve the migration telemetry service from the DI container.
var telemetry = serviceProvider.GetRequiredService<IMigrationTelemetryService>();
var stats = telemetry.GetStats();

// Only inspect samples when alignment drops below the 80% target.
if (stats.AlignmentRate < 0.8m)
{
    // From the 50 most recent samples, keep the divergent ones, largest gap first.
    var samples = telemetry.GetRecentSamples(50)
        .Where(s => !s.IsAligned)
        .OrderByDescending(s => Math.Abs(s.ScoreDifference));

    foreach (var sample in samples)
    {
        Console.WriteLine($"{sample.FindingId}: Conf={sample.ConfidenceValue:F2} → EWS={sample.EwsScore} (Δ={sample.ScoreDifference})");
    }
}
```

---

## Rollback Procedures

### Phase 4 → Phase 3 (Re-enable Dual-Emit)

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "DualEmitMode": true
    }
  }
}
```

Restart services. Confidence will be calculated again.

### Phase 3 → Phase 2 (Revert to Confidence Primary)

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "UseAsPrimaryScore": false
    }
  }
}
```

Rules using `confidence` will work again. Rules using `score` will still work.

### Phase 2 → Phase 1 (Disable EWS)

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": false
    }
  }
}
```

No EWS calculation, no performance impact.

### Emergency Rollback

Set the environment variable to disable EWS. The change takes effect without a restart only if hot-reload is enabled; otherwise, restart the services:

```bash
export POLICY_EWS_ENABLED=false
```

---

## Rule Migration Checklist

- [ ] Inventory all policies using `confidence` field
- [ ] Map confidence thresholds to EWS thresholds (see the recommended rule patterns table in Phase 3)
- [ ] Update rules to use `score` syntax
- [ ] Consider using bucket-based rules for clearer semantics
- [ ] Test rules in dual-emit mode before switching primary
- [ ] Update documentation and runbooks
- [ ] Train operators on new score interpretation
- [ ] Update alerting thresholds

---

## FAQ

### Q: Will existing rules break?

**A:** Not during dual-emit mode. Rules using `confidence` continue to work. Once `UseAsPrimaryScore` is `true`, new rules should use `score`. Old `confidence` rules will emit deprecation warnings and fail validation in Phase 4.

### Q: How do I interpret the score difference?

**A:** The `ConfidenceToEwsAdapter` maps Confidence (0–1) to an approximate EWS (0–100) with semantic inversion. A "difference" of ±15 points is normal due to the different underlying models. Investigate differences > 30 points.

### Q: What if my rankings diverge significantly?
**A:** This is expected for findings where:

- Runtime signals (Rts) differ from static analysis
- Vendor VEX overrides traditional severity
- Reachability analysis shows unreachable code

Review these cases manually. EWS is likely more accurate due to evidence integration.

### Q: Can I customize the EWS weights?

**A:** Yes, via `Weights` configuration. However, changing weights affects determinism proofs. Document any changes and bump the policy version.

### Q: What about attestations?

**A:** During dual-emit, attestations include both scores. After Phase 4, only EWS is attested. Old attestations remain verifiable with their original scores.

---

## Related Documents

- [Evidence-Weighted Score Architecture](../../signals/architecture.md)
- [Policy DSL Reference](../contracts/policy-dsl.md)
- [Verdict Attestation](../verdict-attestation.md)
- [Sprint 8200.0012.0003](../../../../implplan/SPRINT_8200_0012_0003_policy_engine_integration.md)

---

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2025-12-31 | Implementer | Initial migration guide |